Wazuh - Filebeat - Elasticsearch non-zero metrics - elasticsearch

Could you please help me solve this Filebeat error?
This is a Wazuh manager server. Everything is working: I can connect to the Kibana web interface, open the Wazuh app, and see my three Wazuh agents connected and active.
I want FIM monitoring. If I change a file on an agent server, an alert is created and I can see it in alert.log on the manager server. The issue is that Filebeat won't send this alert to Elasticsearch, so I can't see it in Kibana.
Wazuh manager:
Wazuh 4.2.5
Filebeat 7.14.2
Elasticsearch 7.14.2
Kibana 7.14.2
Wazuh alert logs: /var/ossec/logs/alerts/2022/Feb/ and /var/ossec/logs/alerts
systemctl status filebeat shows the service as active, but I can see these lines:
WARN [elasticsearch] elasticsearch/client.go:405 Cannot>
This is the output of filebeat -e:
2022-02-03T12:46:20.386+0100 INFO [monitoring] log/log.go:153 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"id":"session-248447.scope","mem":{"limit":{"bytes":9223372036854771712},"usage":{"bytes":622415872}}}},"cpu":{"system":{"ticks":70,"time":{"ms":72}},"total":{"ticks":300,"time":{"ms":311},"value":300},"user":{"ticks":230,"time":{"ms":239}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":9},"info":{"ephemeral_id":"641d7fdd-47a0-4b10-bda9-36f29c29fdef","uptime":{"ms":98413},"version":"7.14.2"},"memstats":{"gc_next":18917616,"memory_alloc":14197072,"memory_sys":75383816,"memory_total":71337840,"rss":115638272},"runtime":{"goroutines":11}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":2,"starts":2},"reloads":1,"scans":1},"output":{"events":{"active":0},"type":"elasticsearch"},"
And here is the error found in /var/log/messages:
Feb 3 10:27:54 filebeat[2531915]: 2022-02-03T10:27:54.707+0100#011WARN#011[elasticsearch]#011elasticsearch/client.go:405#011Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc07705e669760167, ext:958857091513, loc:(*time.Location)(0x5620964fb2a0)}, Meta:{"pipeline":"filebeat-7.14.0-wazuh-alerts-pipeline"}, Fields:{"agent":{"ephemeral_id":"33cb9baa-af71-4b44-99a6-1379c747722f","hostname":"xlc","id":"03fb57ca-9940-4886-9e6e-a3b3e635cd35","name":"xlc","type":"filebeat","version":"7.14.0"},"ecs":{"version":"1.10.0"},"event":{"dataset":"wazuh.alerts","module":"wazuh"},"fields":{"index_prefix":"wazuh-monitoring-"},"fileset":{"name":"alerts"},"host":{"name":"xlc"},"input":{"type":"log"},"log":{"file":{"path":"/var/ossec/logs/alerts/alerts.json"},"offset":122695554},"message":"{\"timestamp\":\"2022-02-03T10:27:52.438+0100\",\"rule\":{\"level\":5,\"description\":\"Registry Value Integrity Checksum Changed\",\"id\":\"750\",\"mitre\":{\"id\":[\"T1492\"],\"tactic\":[\"Impact\"],\"technique\":[\"Stored Data Manipulation\"]},\"firedtimes\":7,\"mail\":false,\"groups\":[\"ossec\",\"syscheck\",\"syscheck_entry_modified\",\"syscheck_registry\"],\"pci_dss\":[\"11.5\"],\"gpg13\":[\"4.13\"],\"gdpr\":[\"II_5.1.f\"],\"hipaa\":[\"164.312.c.1\",\"164.312.c.2\"],\"nist_800_53\":[\"SI.7\"],\"tsc\":[\"PI1.4\",\"PI1.5\",\"CC6.1\",\"CC6.8\",\"CC7.2\",\"CC7.3\"]},\"agent\":{\"id\":\"006\",\"name\":\"CPP\",\"ip\":\"10.74.37.3\"},\"manager\":{\"name\":\"xlc\"},\"id\":\"1643880472.68132386\",\"full_log\":\"Registry Value '[x32] HKEY_LOCAL_MACHINE\\\\System\\\\CurrentControlSet\\\\Services\\\\W32Time\\\\Config\\\\LastKnownGoodTime' modified\\nMode: scheduled\\nChanged attributes: md5,sha1,sha256\\nOld md5sum was: '5df5b1598b729d98734105148103abf2'\\nNew md5sum is : '361334bf60bdd83e30894c4f313d16ec'\\nOld sha1sum was: 'c233c8ccb56fbd363c44b51a9d51c7fa32512474'\\nNew sha1sum is : '7163cffa48f1a7c0bcb4a3ddff6278ae9a4895a6'\\nOld sha256sum was: 
'3aad3da22f2d53e8ac33c46c73f40c3e8f5db07188d166e24957d8a20b62b5f1'\\nNew sha256sum is : 'bee8072335d870a1624a541cb13ca5085ba85646a8417d4d894deff71c3f4a92'\\n\",\"syscheck\":{\"path\":\"HKEY_LOCAL_MACHINE\\\\System\\\\CurrentControlSet\\\\Services\\\\W32Time\\\\Config\",\"mode\":\"scheduled\",\"arch\":\"[x32]\",\"value_name\":\"LastKnownGoodTime\",\"size_after\":\"8\",\"md5_before\":\"5df5b1598b729d98734105148103abf2\",\"md5_after\":\"361334bf60bdd83e30894c4f313d16ec\",\"sha1_before\":\"c233c8ccb56fbd363c44b51a9d51c7fa32512474\",\"sha1_after\":\"7163cffa48f1a7c0bcb4a3ddff6278ae9a4895a6\",\"sha256_before\":\"3aad3da22f2d53e8ac33c46c73f40c3e8f5db07188d166e24957d8a20b62b5f1\",\"sha256_after\":\"bee8072335d870a1624a541cb13ca5085ba85646a8417d4d894deff71c3f4a92\",\"changed_attributes\":[\"md5\",\"sha1\",\"sha256\"],\"event\":\"modified\"},\"decoder\":{\"name\":\"syscheck_registry_value_modified\"},\"location\":\"syscheck\"}","service":{"type":"wazuh"}}, Private:file.State{Id:"native::1049-64776", PrevId:"", Finished:false, Fileinfo:(*os.fileStat)(0xc000fc9380), Source:"/var/ossec/logs/alerts/alerts.json", Offset:122697450, Timestamp:time.Time{wall:0xc07704f6d4cb3764, ext:510354422, loc:(*time.Location)(0x5620964fb2a0)}, TTL:-1, Type:"log", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x419, Device:0xfd08}, IdentifierName:"native"}, TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"illegal_argument_exception","reason":"data_stream [<wazuh-monitoring-{2022.02.03||/d{yyyy.MM.dd|UTC}}>] must not contain the following characters [ , \", *, \\, <, |, ,, >, /, ?]"}
Could you please help with this? I searched Google without success. Thank you.

Filebeat reads from alerts.json, so you can check this file to see whether the alerts are being generated. Judging from the log you provided, it looks like Filebeat cannot send some logs to Elasticsearch (Cannot index event publisher.Event), but we would need more details about the complete error and the source logs causing it. The output of the command journalctl -f -u filebeat would be useful for providing further assistance.
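The "Cannot index event" lines are very long, and the part that matters is the Elasticsearch response at the very end. As a minimal sketch (the function name extract_rejection is hypothetical, and it assumes the line ends with "(status=NNN): {json}" as in the log quoted in the question), the following Python pulls out just the status, error type, and reason:

```python
import json
import re

# Matches the trailing "(status=NNN): {json}" part of a Filebeat warning line.
_REJECTION = re.compile(r"\(status=(\d+)\):\s*(\{.*\})\s*$")


def extract_rejection(log_line):
    """Return (status, error_type, reason), or None if the line has no rejection."""
    match = _REJECTION.search(log_line)
    if match is None:
        return None
    try:
        body = json.loads(match.group(2))
    except json.JSONDecodeError:
        return None  # the tail was truncated or is not valid JSON
    if not isinstance(body, dict):
        return None
    return int(match.group(1)), body.get("type"), body.get("reason")
```

Run against the /var/log/messages line quoted in the question, this should surface the illegal_argument_exception and its data_stream reason, which makes it easier to compare with the shard limit error described next.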
Based on previous experience, the problem could be that you have reached the maximum number of open shards, which is set to 1000 by default. If this is the case, you will see an error like the following: {"type":"validation_exception","reason":"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;"}
If so, you can either reduce the number of shards or increase the limit to resolve the situation right away. I'd recommend the first approach if you only have one Elasticsearch node, since having 1000 shards is not healthy for such an environment.
To reduce the number of shards, check the shard setting in /etc/filebeat/wazuh-template.json and change it to "1", then restart Filebeat. This only affects indices created from now on; this guide can help you with cases like this one.
You can also try removing old indices. I would first check which indices you have stored. I suppose some of them are related to statistics or other data, so I would try removing those before the actual alert data (wazuh-alerts-).
You can use:
GET /_cat/indices
Since the indices are stored per day by default, you can remove, for instance, those older than one month and keep only the last month of indices.
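To decide which daily indices fall outside the one-month window, a small helper can filter the names returned by GET /_cat/indices. This is only a sketch: the function name indices_older_than is hypothetical, and it assumes each daily index name ends in a YYYY.MM.DD date (for example wazuh-alerts-4.x-2022.01.01). It only selects names; it does not delete anything.

```python
import re
from datetime import date, timedelta

# Assumption: daily indices end with a YYYY.MM.DD suffix.
_DATE_SUFFIX = re.compile(r"(\d{4})\.(\d{2})\.(\d{2})$")


def indices_older_than(index_names, days, today):
    """Return the names whose date suffix is more than `days` days before `today`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in index_names:
        match = _DATE_SUFFIX.search(name)
        if match is None:
            continue  # no date suffix (e.g. .kibana): leave it alone
        try:
            index_day = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # looks like a date but is not a valid one
        if index_day < cutoff:
            old.append(name)
    return old
```

The returned names can then be reviewed by hand before removing anything, which is safer than deleting by wildcard.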
To prevent this from happening in the future, you may try implementing an Index Management Policy after you solve the issue at hand.

Related

Kibana Fatal Error (Server is not ready yet)

retries=0,throttledUntil=0s,bulk_failures=[],search_failures=[]]", "cluster.uuid": "5FrxkY3GRbGzR2nSuEaxow", "node.id": "kHGHvefFTViG8CdPdF5pxw" }
kibana | {"type":"log","@timestamp":"2021-05-28T09:15:07+00:00","tags":["info","savedobjects-service"],"pid":7,"message":"[.kibana_task_manager] UPDATE_TARGET_MAPPINGS_WAIT_FOR_TASK -> DONE"}
kibana | {"type":"log","@timestamp":"2021-05-28T09:15:07+00:00","tags":["info","savedobjects-service"],"pid":7,"message":"[.kibana_task_manager] Migration completed after 8648ms"}
kibana | {"type":"log","@timestamp":"2021-05-28T09:15:28+00:00","tags":["warning","plugins","licensing"],"pid":7,"message":"License information could not be obtained from Elasticsearch due to Error: Cluster client cannot be used after it has been closed. error"}
kibana | {"type":"log","@timestamp":"2021-05-28T09:15:37+00:00","tags":["warning","plugins-system"],"pid":7,"message":"\"eventLog\" plugin didn't stop in 30sec., move on to the next."}
kibana |
kibana | FATAL Error: Unable to complete saved object migrations for the [.kibana] index. RequestAbortedError: The content length (823769731) is bigger than the maximum allowed string (536870888)
kibana |
kibana exited with code 1
I got this error after creating a number of visualizations with Vega-Lite. How can I solve it?
If needed, I can post all the information.
After going through different answers on the web, I managed to solve it:
Step 1:
`http://localhost:9200/.kibana/_search?q=type:dashboard&size=100`
Step 2:
`curl -X DELETE "localhost:5601/api/saved_objects/kibanaSavedObjectMeta.searchSourceJSON.index/index-pattern/e68f45b0-ab73-11eb-a01c-8590ef1580f4/" -H 'kbn-xsrf: true'
# (respectively <name>/<type>/<id>)
`
Before doing this, I would follow the instructions in this link to make sure that everything works correctly.
In practice, the problem comes from having too many saved objects for the available space. There are different ways to solve it: in my case I deleted all of them, but you can also increase the space. The other suggestion is to avoid partitions.

Error decoding JSON and broken logs in Elastic search

I'm using ELK stack version 5.5 on Ubuntu 16.0.
My logs are getting broken and are not written properly into Elasticsearch, which results in json.errors like the one below:
Error decoding JSON: invalid character 'e' in literal null (expecting 'u')
I get json.errors very frequently, and those logs are not read or written properly into Elasticsearch.
This happens every 5 to 10 minutes. Please help me solve it.
Screenshot of broken logs in Kibana
My sample log is:
{"log":"2019-10-01 07:18:26:854*[DEBUG]*cluster2-nio-worker-0*Connection*userEventTriggered*Connection[cassandraclient/10.3.254.137:9042-1, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat\n","stream":"stdout","time":"2019-10-01T07:18:26.85462769Z"}
Since you have stated that the JSON logs are not pretty-printed, I assume that the multiline settings of your input configuration are causing the problems.
In my opinion, you don't need any multiline settings when your logs are in JSON format and not pretty-printed, meaning the whole JSON object (= log event) is written on one line.
You have already specified
json.message_key: log
This alone should get the job done.
To sum up:
Remove the multiline settings and try again. Your configuration should look like this:
filebeat.inputs:
- type: log
  paths:
    - "/var/log/containers/*.log"
  tags: ["kube-logs"]
  symlinks: true
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true
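To illustrate why one JSON object per line needs no multiline handling, here is a minimal Python sketch (not Filebeat's implementation; decode_json_lines is a hypothetical name). Each line is decoded on its own, and any line that was split or cut short shows up as a decode error, much like the json.errors in the question:

```python
import json


def decode_json_lines(text, message_key="log"):
    """Decode one JSON object per line; return (messages, errors)."""
    messages, errors = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            # A line that is only part of an event cannot be parsed.
            errors.append((number, str(exc)))
            continue
        if not isinstance(record, dict):
            errors.append((number, "not a JSON object"))
            continue
        messages.append(record.get(message_key))
    return messages, errors
```

If complete lines decode cleanly here but Filebeat still reports errors, that points at the input settings (such as multiline) rather than at the log content itself.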

Filebeat duplicating events

I am running a basic ELK stack setup using Filebeat > Logstash > Elasticsearch > Kibana, all on version 5.2.
When I remove Filebeat and configure Logstash to read a file directly, it ingests the correct number of events.
If I delete the data and re-ingest the file using Filebeat to pass the same log file contents to Logstash, over 10% more events are created. I have checked a number of these to confirm that the duplicates are being created by Filebeat.
Has anyone seen this issue, or have any suggestions as to why this would happen?
First, I need to understand what you mean by removing Filebeat.
Possibility 1:
If you uninstalled and reinstalled it, Filebeat will read the data from the path again (the data you re-ingested) and post it to Logstash -> Elasticsearch -> Kibana. Assuming the old data has not been removed from the Elasticsearch node, this produces duplicates.
Possibility 2:
You just stopped Filebeat, reconfigured it for Logstash, and restarted it, and the registry file may not have been updated properly during shutdown. Filebeat reads line by line and records in the registry file up to which line it has successfully published to Logstash, Elasticsearch, Kafka, etc. If any of those output servers has difficulty processing a heavy load of input from Filebeat, Filebeat waits until they are available again. Once they are, Filebeat reads the registry file, finds the last line it published, and resumes publishing from the next line onwards.
A sample registry entry looks like this:
{
  "source": "/var/log/sample/sample.log",
  "offset": 88,
  "FileStateOS": {
    "inode": 243271678,
    "device": 51714
  },
  "timestamp": "2017-02-03T06:22:36.688837822-05:00",
  "ttl": -2
}
As you can see, the registry file stores the offset and a timestamp for each file. If it is not updated properly, Filebeat publishes lines it has already sent.
So this is one of the reasons for duplicates.
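The effect of a stale offset can be shown with a small Python sketch. This is not Filebeat's code: read_new_lines is a hypothetical helper that only mimics the idea of resuming from the "offset" field of a registry entry like the one above.

```python
def read_new_lines(log_path, registry_entry):
    """Return (lines, updated_entry), reading only the bytes after the stored offset."""
    start = registry_entry["offset"]
    with open(log_path, "rb") as handle:
        handle.seek(start)
        data = handle.read()
    # Consume complete lines only; a trailing partial line is left for the next run.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8").splitlines()
    updated_entry = dict(registry_entry, offset=start + end)
    return lines, updated_entry
```

If the updated entry is saved, the next call returns nothing new. If the process stops before it is saved, the next call starts from the old offset and returns the same lines again, which is how the duplicates appear downstream.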
For further reference, see the links below:
https://discuss.elastic.co/t/filebeat-sending-old-logs-on-restart/46189
https://discuss.elastic.co/t/deleting-filebeat-registry-file/46112
https://discuss.elastic.co/t/filebeat-stop-cleaning-registry/58902
Hope that helps.

percolate returns empty matches under heavy load during elasticsearch cluster resizing

We have an Elasticsearch cluster that resizes dynamically according to the number of percolate messages in a RabbitMQ queue.
We have a single shard and ~18K queries in our index, and we use auto_expand_replicas: "0-all" in the index settings to copy the single shard to every node as soon as a node becomes available.
But during heavy load and cluster resizing, some requests produce unexpected empty matches.
We send ~1M percolate requests daily and were losing ~1K pieces of content. We added a cluster status check to our code: if the cluster status is not green before and after a percolate request, we wait for green status and re-send the request. This reduced the lost content count from ~1K to ~100. We do not experience this problem in a cluster with a fixed number of nodes.
Unfortunately, no loss is acceptable in our scenario, and we don't want to give up auto scaling, so we need to find a workaround for this problem.
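The status check described above can be sketched roughly as follows. This is an illustration in Python, not our production code: percolate and cluster_status are hypothetical callables standing in for the real percolate request and the cluster health lookup, and the retry limits are arbitrary.

```python
import time


def wait_for_green(cluster_status, sleep, poll_seconds, max_polls):
    """Poll until the cluster reports green, or give up after max_polls checks."""
    for _ in range(max_polls):
        if cluster_status() == "green":
            return
        sleep(poll_seconds)
    raise TimeoutError("cluster did not become green")


def percolate_when_green(percolate, cluster_status, max_attempts=5,
                         poll_seconds=1.0, max_polls=60, sleep=time.sleep):
    """Send the percolate request, re-sending it unless the cluster was green before and after."""
    for _ in range(max_attempts):
        wait_for_green(cluster_status, sleep, poll_seconds, max_polls)
        result = percolate()
        if cluster_status() == "green":
            return result  # green before and after: trust this result
        # Status changed during the request: discard the result and try again.
    raise RuntimeError("cluster never stayed green for a whole request")
```

As noted, this reduces the losses but does not remove them, since the status can change and return to green between the two checks.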
To reproduce the problem, you can use the following bash script:
https://gist.github.com/ekesken/de41598a1e7e54c6f33c
This script will download and install Elasticsearch 1.5.2 in your current directory, create a local cluster with 10 nodes, create the index and percolation queries, and start testing.
Normally we expect the following output for a single percolate request:
curl -XGET 'localhost:9200/my-index/my-type/_percolate' -d '{
  "doc" : {
    "message" : "A new bonsai tree in the office"
  }
}'
{"took":95,"_shards":{"total":1,"successful":1,"failed":0},"total":1,"matches":[{"_index":"my-index","_id":"tree"}]}
After running the script, if the http://localhost:9200/_cat/shards response shows that all shards on all nodes are started and the test script is still running, you could not reproduce the problem. Try increasing the node count, which is 10 by default:
./repeat_percolation_loss.sh 15 test-only
When you reproduce the problem, the script will exit with the following output:
{"took":209,"_shards":{"total":1,"successful":1,"failed":0},"total":0,"matches":[]}
Problem repeated! Congratulations.
You can shut down all servers and clean up all directories and files created by the script with this command:
./repeat_percolation_loss.sh 15 clean
Replace the node count above with the latest node count you tried.

logstash with elasticsearch_http

Apparently the Logstash OnDemand account did not work when I wanted to post an issue.
Anyway, I have a Logstash setup with Redis, Elasticsearch, and Kibana. My Logstash instances collect logs from several files and put them in Redis just fine.
Logstash version 1.3.3
Elasticsearch version 1.0.1
The only thing I have set in elasticsearch_http for Logstash is the host name. The whole setup seems to fit together just fine.
The problem is that elasticsearch_http is not consuming the Redis entries as they arrive. Running it in debug mode, I have seen that it flushes about 100 entries every minute (the default values of flush_size and idle_flush_time). As I understand the documentation, however, it should force a flush when flush_size is not reached within that time (for example, if we had only 10 messages in the last minute). It seems to work the other way around: it flushes about 100 messages only once a minute. I changed the size to 2000 and it flushes 2000 every minute or so.
Here is my logstash-indexer.conf
input {
  redis {
    host => "1xx.xxx.xxx.93"
    data_type => "list"
    key => "testlogs"
    codec => json
  }
}
output {
  elasticsearch_http {
    host => "1xx.xxx.xxx.93"
  }
}
Here is my elasticsearch.yml
cluster.name: logger
node.name: "logstash"
transport.tcp.port: 9300
http.port: 9200
discovery.zen.ping.unicast.hosts: ["1xx.xxx.xxx.93:9300"]
discovery.zen.ping.multicast.enabled: false
#discovery.zen.ping.unicast.enabled: true
network.bind_host: 1xx.xxx.xxx.93
network.publish_host: 1xx.xxx.xxx.93
The indexer, Elasticsearch, Redis, and Kibana are on the same server. Log collection from files is done on another server.
I'm going to suggest a couple of different approaches to solve your problem. Logstash, as you are discovering, can be a bit quirky, so I've found these approaches useful in dealing with unexpected behavior from Logstash.
1. Use the elasticsearch output instead of elasticsearch_http. You can get the same functionality by using the elasticsearch output with protocol set to http. The elasticsearch output is more mature (milestone 2 vs milestone 3) and I've seen this change make a difference before.
2. Set the defaults for idle_flush_time and flush_size. There have been issues with Logstash defaults previously, and I've found it a lot safer to set them explicitly. idle_flush_time is in seconds; flush_size is the number of records to flush.
3. Upgrade to a more recent version of Logstash. There is enough of a change in how Logstash is deployed with version 1.4.X (http://logstash.net/docs/1.4.1/release-notes) that I'd bite the bullet and upgrade. It's also significantly easier to get attention if you still have a problem with the most recent stable major release.
4. Make certain your Redis version matches those supported by your Logstash version.
5. Experiment with setting the batch, batch_events and batch_timeout values for the Redis output. You are using the list data_type. list supports various batch options, and as with some other parameters it's best not to assume the defaults are always being set correctly.
6. Do all of the above. In addition to trying the first set of suggestions, I'd try all of them together in various combinations. Keep careful records of each test run. It seems obvious, but with all the variations above it's easy to lose track, so I'd keep careful records and try to change only one variation at a time.
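For reference while testing different flush_size and idle_flush_time values, this is the behaviour the question expects from the documentation: flush as soon as flush_size events are buffered, or when idle_flush_time has passed with a partial batch. The Python below is only a sketch of that expectation, not Logstash's implementation; FlushBuffer and its parameters are hypothetical names.

```python
class FlushBuffer:
    """Buffer events; flush on size, or on idle time when the batch is partial."""

    def __init__(self, flush_size, idle_flush_time, flush, clock):
        self.flush_size = flush_size            # number of events per full batch
        self.idle_flush_time = idle_flush_time  # seconds before a partial flush
        self._flush_callback = flush            # called with the list of events
        self._clock = clock                     # returns the current time in seconds
        self._events = []
        self._last_flush = clock()

    def add(self, event):
        self._events.append(event)
        if len(self._events) >= self.flush_size:
            self._flush()

    def tick(self):
        """Call periodically; flushes a partial batch once the idle time has elapsed."""
        waited = self._clock() - self._last_flush
        if self._events and waited >= self.idle_flush_time:
            self._flush()

    def _flush(self):
        batch, self._events = self._events, []
        self._last_flush = self._clock()
        self._flush_callback(batch)
```

Comparing debug output against this model makes it easier to tell whether a given test run is flushing on size, on time, or on neither.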

Resources