ElasticSearch Curator delete unless last entry - elasticsearch-curator

Is there a way to stop curator deleting the last index when deleting by time?
actions:
  1:
    action: delete_indices
    description: Delete kube- indices older than 14 days. Ignore the error if there are none and exit cleanly.
    options:
      disable_action: False
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: kube-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 14
I have this, which works great to keep a currently running K8s cluster's logs in check. However, when we move AWS region the index name changes, e.g. from kube-eu-west-1-<date> to kube-eu-west-2-<date>.
Curator diligently cleans up all the data after 14 days. What I'd like is to prevent it from removing the last index for a particular prefix, so there is always a record of what happened the last time the cluster was in that region.
(It would also "fix" some less well written pieces of code that throw errors when the data they expect to be there has legitimately gone away.)

You could use the count filter:
filters:
# your existing filters go BEFORE the count filter...
- filtertype: count
  count: 1
This should exclude the most recent matching index from the list of actionable indices, preserving it. If that is not the index you want to keep, experiment with exclude: true/false (default is true) and/or reverse: true/false (default is true) until the filter excludes the index you want.
NOTE: Always use the --dry-run flag to test your filters before deploying them on actual data. Iterate until it looks right. Use of loglevel: DEBUG in your client settings will show how filters make their decisions, if that helps.
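Putting it together with the filters from the question, the action's filter list would look roughly like this (exclude and reverse are written out even though they are the defaults noted above):
filters:
- filtertype: pattern
  kind: prefix
  value: kube-
- filtertype: age
  source: name
  direction: older
  timestring: '%Y.%m.%d'
  unit: days
  unit_count: 14
# The count filter only sees what survived the filters above, so with count: 1
# the newest of those indices is pulled back out of the deletion list.
- filtertype: count
  count: 1
  exclude: True    # default, per the note above: the counted index is preserved
  reverse: True    # default, per the note above: sort newest first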

Related

Wazuh - Filebeat - Elasticsearch non-zero metrics

Could you please help me solve this Filebeat error?
It's a Wazuh manager server. Everything is working: I can connect to the Kibana web UI, open the Wazuh app, and see my three Wazuh agents connected and active.
I want FIM monitoring, and if I change a file on an agent server, an alert is created and I can see it in alert.log on the manager server. The issue is that Filebeat won't send this alert to Elasticsearch, so I can't see it in Kibana.
Wazuh manager>
Wazuh 4.2.5
Filebeat 7.14.2
Elasticsearch 7.14.2
Kibana 7.14.2
Wazuh alert log - /var/ossec/logs/alerts/2022/Feb/ and /var/ossec/logs/alerts
systemctl status filebeat shows active, but I can see lines like:
WARN [elasticsearch] elasticsearch/client.go:405 Cannot>
This is the error from filebeat -e:
2022-02-03T12:46:20.386+0100 INFO [monitoring] log/log.go:153 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"id":"session-248447.scope","mem":{"limit":{"bytes":9223372036854771712},"usage":{"bytes":622415872}}}},"cpu":{"system":{"ticks":70,"time":{"ms":72}},"total":{"ticks":300,"time":{"ms":311},"value":300},"user":{"ticks":230,"time":{"ms":239}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":9},"info":{"ephemeral_id":"641d7fdd-47a0-4b10-bda9-36f29c29fdef","uptime":{"ms":98413},"version":"7.14.2"},"memstats":{"gc_next":18917616,"memory_alloc":14197072,"memory_sys":75383816,"memory_total":71337840,"rss":115638272},"runtime":{"goroutines":11}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":2,"starts":2},"reloads":1,"scans":1},"output":{"events":{"active":0},"type":"elasticsearch"},"
And here is error found in /var/log/messages
Feb 3 10:27:54 filebeat[2531915]: 2022-02-03T10:27:54.707+0100#011WARN#011[elasticsearch]#011elasticsearch/client.go:405#011Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc07705e669760167, ext:958857091513, loc:(*time.Location)(0x5620964fb2a0)}, Meta:{"pipeline":"filebeat-7.14.0-wazuh-alerts-pipeline"}, Fields:{"agent":{"ephemeral_id":"33cb9baa-af71-4b44-99a6-1379c747722f","hostname":"xlc","id":"03fb57ca-9940-4886-9e6e-a3b3e635cd35","name":"xlc","type":"filebeat","version":"7.14.0"},"ecs":{"version":"1.10.0"},"event":{"dataset":"wazuh.alerts","module":"wazuh"},"fields":{"index_prefix":"wazuh-monitoring-"},"fileset":{"name":"alerts"},"host":{"name":"xlc"},"input":{"type":"log"},"log":{"file":{"path":"/var/ossec/logs/alerts/alerts.json"},"offset":122695554},"message":"{\"timestamp\":\"2022-02-03T10:27:52.438+0100\",\"rule\":{\"level\":5,\"description\":\"Registry Value Integrity Checksum Changed\",\"id\":\"750\",\"mitre\":{\"id\":[\"T1492\"],\"tactic\":[\"Impact\"],\"technique\":[\"Stored Data Manipulation\"]},\"firedtimes\":7,\"mail\":false,\"groups\":[\"ossec\",\"syscheck\",\"syscheck_entry_modified\",\"syscheck_registry\"],\"pci_dss\":[\"11.5\"],\"gpg13\":[\"4.13\"],\"gdpr\":[\"II_5.1.f\"],\"hipaa\":[\"164.312.c.1\",\"164.312.c.2\"],\"nist_800_53\":[\"SI.7\"],\"tsc\":[\"PI1.4\",\"PI1.5\",\"CC6.1\",\"CC6.8\",\"CC7.2\",\"CC7.3\"]},\"agent\":{\"id\":\"006\",\"name\":\"CPP\",\"ip\":\"10.74.37.3\"},\"manager\":{\"name\":\"xlc\"},\"id\":\"1643880472.68132386\",\"full_log\":\"Registry Value '[x32] HKEY_LOCAL_MACHINE\\\\System\\\\CurrentControlSet\\\\Services\\\\W32Time\\\\Config\\\\LastKnownGoodTime' modified\\nMode: scheduled\\nChanged attributes: md5,sha1,sha256\\nOld md5sum was: '5df5b1598b729d98734105148103abf2'\\nNew md5sum is : '361334bf60bdd83e30894c4f313d16ec'\\nOld sha1sum was: 'c233c8ccb56fbd363c44b51a9d51c7fa32512474'\\nNew sha1sum is : '7163cffa48f1a7c0bcb4a3ddff6278ae9a4895a6'\\nOld sha256sum was: '3aad3da22f2d53e8ac33c46c73f40c3e8f5db07188d166e24957d8a20b62b5f1'\\nNew sha256sum is : 'bee8072335d870a1624a541cb13ca5085ba85646a8417d4d894deff71c3f4a92'\\n\",\"syscheck\":{\"path\":\"HKEY_LOCAL_MACHINE\\\\System\\\\CurrentControlSet\\\\Services\\\\W32Time\\\\Config\",\"mode\":\"scheduled\",\"arch\":\"[x32]\",\"value_name\":\"LastKnownGoodTime\",\"size_after\":\"8\",\"md5_before\":\"5df5b1598b729d98734105148103abf2\",\"md5_after\":\"361334bf60bdd83e30894c4f313d16ec\",\"sha1_before\":\"c233c8ccb56fbd363c44b51a9d51c7fa32512474\",\"sha1_after\":\"7163cffa48f1a7c0bcb4a3ddff6278ae9a4895a6\",\"sha256_before\":\"3aad3da22f2d53e8ac33c46c73f40c3e8f5db07188d166e24957d8a20b62b5f1\",\"sha256_after\":\"bee8072335d870a1624a541cb13ca5085ba85646a8417d4d894deff71c3f4a92\",\"changed_attributes\":[\"md5\",\"sha1\",\"sha256\"],\"event\":\"modified\"},\"decoder\":{\"name\":\"syscheck_registry_value_modified\"},\"location\":\"syscheck\"}","service":{"type":"wazuh"}}, Private:file.State{Id:"native::1049-64776", PrevId:"", Finished:false, Fileinfo:(*os.fileStat)(0xc000fc9380), Source:"/var/ossec/logs/alerts/alerts.json", Offset:122697450, Timestamp:time.Time{wall:0xc07704f6d4cb3764, ext:510354422, loc:(*time.Location)(0x5620964fb2a0)}, TTL:-1, Type:"log", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x419, Device:0xfd08}, IdentifierName:"native"}, TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"illegal_argument_exception","reason":"data_stream [<wazuh-monitoring-{2022.02.03||/d{yyyy.MM.dd|UTC}}>] must not 
contain the following characters [ , \", *, \\, <, |, ,, >, /, ?]"}
Could you please help with this? I tried google but with no success. Thank you.
Filebeat reads from alerts.json; you can check this file to see whether the alerts are being generated. Judging from the log you provided, it looks like Filebeat cannot send some logs to Elasticsearch (Cannot index event publisher.Event), but we would need more details about the complete error and the source logs causing it. The output of the command # journalctl -f -u filebeat would be useful here to provide further assistance.
Based on previous experience, the problem could be that you have reached the maximum number of open shards, which by default is set to 1000. If this is the case, you will see an error like the following: {"type":"validation_exception","reason":"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;"}
If that's the case, you can either reduce the number of shards or increase the limit to resolve the immediate situation. I'd recommend the first approach if you only have one Elasticsearch node; having 1000 shards is not healthy in that kind of environment.
To reduce the number of shards, edit the index settings in /etc/filebeat/wazuh-template.json and change the number of shards to "1", then restart Filebeat. This will only affect indices created from now on; this guide can help you with cases like this one.
Also, you can try to remove old indices. I would first check which indices you have stored. Some of them are probably related to statistics or other metadata, so I would try to remove those before actual data (wazuh-alerts-).
You can use:
GET /_cat/indices
Since indices are created per day by default, you can, for instance, remove all indices older than one month and keep only the last month of data; a sketch of how to automate that follows below.
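If you run Curator (as in the first question on this page), here is a hedged sketch of such a clean-up action, assuming daily indices named wazuh-monitoring-YYYY.MM.DD as in the error message above; verify the actual prefix and date format against GET /_cat/indices before enabling it:
actions:
  1:
    action: delete_indices
    description: Delete wazuh-monitoring- indices older than 30 days (sketch, not an official Wazuh policy).
    options:
      disable_action: False
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: wazuh-monitoring-     # assumption: monitoring/statistics data, not wazuh-alerts-
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 30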
To prevent this from happening in the future, you may try implementing an Index Management Policy after you solve the issue at hand.

How to push logs to elasticsearch in filebeat instantly?

Here is my filebeat.yml:
filebeat.inputs:
- type: log
  enabled: true
  paths:
    - ../typescript/rate-limit-test/logs/*.log
  json.message_key: "message"
  json.keys_under_root: true
  json.overwrite_keys: true
  scan_frequency: 1s
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 1
logging.level: debug
output.elasticsearch:
  hosts: ["34.97.108.113:9200"]
  index: "filebeat-%{+yyyy-MM-dd}"
setup.template:
  name: 'filebeat'
  pattern: 'filebeat-*'
  enabled: true
setup.template.overwrite: true
setup.template.append_fields:
- name: time
  type: date
processors:
- drop_fields:
    fields: ["agent","host","ecs","input","log"]
setup.ilm.enabled: false
I changed scan_frequency but Elasticsearch still doesn't receive the logs any faster.
How can I get logs into Elasticsearch instantly?
Please help me.
There will never be an 'instantly' available log line in Elasticsearch. The file needs to be watched for changes, then the newly added lines are sent to Elasticsearch in a bulk request and indexed into the appropriate shard on the correct cluster node. Network latency, TLS, authentication + authorization, and concurrent write/search load all affect the 'instant' experience.
The speed of log ingestion and NRT (near-real-time) search depend on many factors and configuration options in Elasticsearch and Filebeat.
Regarding tuning Elasticsearch for indexing speed, have a look at this documentation and apply what you have not covered yet. A brief overview:
Disable swapping and enable memory locking (bootstrap.memory_lock: true); see the sketch after this list.
Consider lowering index.refresh_interval (defaults to 1s) for the index so that documents become searchable sooner (at the cost of more I/O in the cluster).
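For the memory-locking point, here is a minimal elasticsearch.yml sketch (index.refresh_interval, by contrast, is an index-level setting applied via the index settings API or an index template, not elasticsearch.yml):
# /etc/elasticsearch/elasticsearch.yml (sketch)
bootstrap.memory_lock: true   # also requires an unlimited memlock limit for the service,
                              # e.g. LimitMEMLOCK=infinity in a systemd unit override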
For Filebeat, there is also good documentation about tuning, but in general I see the following options:
Try different output.elasticsearch.bulk_max_size values (the default batch size is 50) and monitor the ingestion speed; every cluster configuration has its own optimal setting.
In high-load scenarios, when logs are written fast, consider increasing the number of workers with output.elasticsearch.worker (defaults to 1).
In the opposite scenario, with only a few log lines being written, consider increasing the close_inactive and scan_frequency values for the harvester. Specifying a more suitable backoff will affect how aggressively Filebeat checks files for updates. A sketch of the output section follows below.
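As a starting point, here is a minimal sketch of the corresponding output section (the host is taken from the question's config; the numbers are illustrative values to tune and measure, not recommendations):
output.elasticsearch:
  hosts: ["34.97.108.113:9200"]
  bulk_max_size: 200   # illustrative: try different batch sizes and watch ingestion speed
  worker: 2            # illustrative: extra workers only pay off under sustained load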

Error decoding JSON and broken logs in Elastic search

I'm using the ELK stack, version 5.5, on Ubuntu 16.0.
My logs are getting broken and are not written properly into Elasticsearch, which results in json errors
like the one below:
Error decoding JSON: invalid character 'e' in literal null (expecting 'u')"
I get these json.errors very frequently, and the affected logs are not read or written properly into Elasticsearch.
This happens every 5 to 10 minutes. Please help me solve it.
screenshot of broken logs in kibana
My sample log is:
{"log":"2019-10-01 07:18:26:854*[DEBUG]*cluster2-nio-worker-0*Connection*userEventTriggered*Connection[cassandraclient/10.3.254.137:9042-1, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat\n","stream":"stdout","time":"2019-10-01T07:18:26.85462769Z"}
Since you have stated that the JSON logs are not pretty-printed, I assume that the multiline settings of your input configuration are causing the problems.
In my opinion you don't need any multiline settings when your logs are in JSON format and are not pretty-printed, meaning each whole JSON object (= one log event) is written on a single line.
You have already specified
json.message_key: log
That alone should get the job done.
So to sum it up:
Remove the multiline settings and try again. Your configuration should look like this:
filebeat.inputs:
- type: log
  paths:
    - "/var/log/containers/*.log"
  tags: ["kube-logs"]
  symlinks: true
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true

Elasticsearch Delete by Query Version Conflict

I am using Elasticsearch version 5.6.10. I have a query that deletes records for a given agency, so they can later be updated by a nightly script.
The query is in elasticsearch-dsl and looks like this:
def remove_employees_from_search(jurisdiction_slug, year):
    s = EmployeeDocument.search()
    s = s.filter('term', year=year)
    s = s.query('nested', path='jurisdiction', query=Q("term", **{'jurisdiction.slug': jurisdiction_slug}))
    response = s.delete()
    return response
The problem is that I get a ConflictError exception when trying to delete the records via that function. I have read that this occurs because the documents changed between the time the delete-by-query started and when it executed, but I don't know how that can be, because nothing else is modifying the records during the delete process.
I am going to add s = s.params(conflicts='proceed') in order to silence the exception. But this is a band-aid as I do not understand why the delete is not processing as expected. Any ideas on how to troubleshoot this? A snapshot of the error is below:
ConflictError:TransportError(409,
u'{
  "took":10,
  "timed_out":false,
  "total":55,
  "deleted":0,
  "batches":1,
  "version_conflicts":55,
  "noops":0,
  "retries":{
    "bulk":0,
    "search":0
  },
  "throttled_millis":0,
  "requests_per_second":-1.0,
  "throttled_until_millis":0,
  "failures":[
    {
      "index":"employees",
      "type":"employee_document",
      "id":"24681043",
      "cause":{
        "type":"version_conflict_engine_exception",
        "reason":"[employee_document][24681043]: version conflict, current version [5] is different than the one provided [4]",
        "index_uuid":"G1QPF-wcRUOCLhubdSpqYQ",
        "shard":"0",
        "index":"employees"
      },
      "status":409
    },
    {
      "index":"employees",
      "type":"employee_document",
      "id":"24681063",
      "cause":{
        "type":"version_conflict_engine_exception",
        "reason":"[employee_document][24681063]: version conflict, current version [5] is different than the one provided [4]",
        "index_uuid":"G1QPF-wcRUOCLhubdSpqYQ",
        "shard":"0",
        "index":"employees"
      },
      "status":409
    }
You could try making it do a refresh first
client.indices.refresh(index='your-index')
source https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#_indices_refresh
First, this is a question that was asked 2 years ago, so take my response with a grain of salt due to the time gap.
I am using the JavaScript API, but I would bet the flags are similar. When you index or delete, there is a refresh flag which lets you force the index to make the result visible to search.
I am not an Elasticsearch guru, but the engine must perform some systematic maintenance on the indices and shards to move them to a stable state. That probably happens over time, so you would not necessarily get an immediate state update. Furthermore, from personal experience, I have seen deletes that do not immediately remove the item from the index: it may be marked as "deleted" and given a new version number, but it seems to "stick around" (probably until general maintenance sweeps run).
Here I am showing the js API for delete, but it is the same for index and some of the other calls.
client.delete({
  id: string,
  index: string,
  type: string,
  wait_for_active_shards: string,
  refresh: 'true' | 'false' | 'wait_for',
  routing: string,
  timeout: string,
  if_seq_no: number,
  if_primary_term: number,
  version: number,
  version_type: 'internal' | 'external' | 'external_gte' | 'force'
})
https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/api-reference.html#_delete
refresh
'true' | 'false' | 'wait_for' - If true then refresh the affected shards to make this operation visible to search, if wait_for then wait for a refresh to make this operation visible to search, if false (the default) then do nothing with refreshes.
For additional reference, here is the page on Elasticsearch refresh info and what might be a fairly relevant blurb for you.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
Use the refresh API to explicitly refresh one or more indices. If the request targets a data stream, it refreshes the stream’s backing indices. A refresh makes all operations performed on an index since the last refresh available for search.
By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds. You can change this default interval using the index.refresh_interval setting.

Elastalert constant realerting.

I'm having some difficulties setting up an ElastAlert rule. It's quite a basic one, and I've read the documentation but clearly not understood it, so I'm after some help.
I have a basic test rule that I want to alert when the data input to Elasticsearch from certain devices stops for more than 5 minutes.
es_host: localhost
es_port: 9200
name: Example rule
type: flatline
index: test_mapping-*
threshold: 1
timeframe:
  minutes: 5
filter:
- term:
    device: "ggYthy767b"
alert:
- command
command: ["/bin/test"]
realert:
  minutes: 10
This works: when data stops I get an alert, then that alert is silenced until, 10 minutes later, it re-alerts. The issue is that it re-alerts every 10 minutes and I don't know how to stop it. Is there a way to get it to re-alert just once and then stop? Or have I misunderstood? Also, I have 10+ different devices, and I want the same alert to apply if any of them stops sending data for 5 minutes. Is that possible within one rule? Thanks very much in advance.
The question you need to ask yourself is how often you want to be alerted: once in a lifetime, once a year, monthly, fortnightly, or something else? So realert is the part you want to edit. You might change it to something like the snippet below, so that even if the alert is triggered multiple times you'll only get it once a day. It uses plain English terms, so you can adjust it as you like (weeks, hours, etc.).
realert:
  days: 1
But if you're getting alerted much more often than you want, either your system is too unstable or your alerts are too paranoid. For this alert, every 5 minutes you're looking for at least one record that simply never gets populated; since it's a 'flatline' alert, you should raise the timeframe or use less selective filters. You can also use query_key so the flatline check is applied on a per-key basis, which covers the multiple-devices case; see the sketch below.
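For the per-device part of the question, here is a hedged sketch of the rule with query_key added (es_host/es_port stay as in the original rule; the second device id is a placeholder). As I understand the flatline type, with query_key it alerts whenever a key that has previously been seen stops reporting:
name: Example rule
type: flatline
index: test_mapping-*
threshold: 1
timeframe:
  minutes: 5
query_key: device                  # alert separately for each device value that goes quiet
filter:
- terms:
    device: ["ggYthy767b", "anotherDeviceId"]   # "anotherDeviceId" is a placeholder
alert:
- command
command: ["/bin/test"]
realert:
  days: 1                          # re-alert at most once a day per key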
