Error decoding JSON and broken logs in Elasticsearch - elasticsearch

I'm using the ELK stack, version 5.5, on Ubuntu 16.0.
My logs are getting broken and are not being written properly into Elasticsearch, which results in JSON errors like the one below:
Error decoding JSON: invalid character 'e' in literal null (expecting 'u')"
I get these json.errors very frequently, and the affected logs are not read or written properly into Elasticsearch. This happens every 5 to 10 minutes. Please help me solve it.
[screenshot of broken logs in Kibana]
My sample log is:
{"log":"2019-10-01 07:18:26:854*[DEBUG]*cluster2-nio-worker-0*Connection*userEventTriggered*Connection[cassandraclient/10.3.254.137:9042-1, inFlight=0, closed=false] was inactive for 30 seconds, sending heartbeat\n","stream":"stdout","time":"2019-10-01T07:18:26.85462769Z"}

Since you have stated that the JSON logs are not pretty-printed, I assume that the multiline settings of your input configuration are causing the problems.
In my opinion you don't need any multiline settings when your logs are in JSON format and are not pretty-printed, meaning the whole JSON object (= log event) is written on a single line.
You have already specified
json.message_key: log
This alone should get the job done.
So to sum it up: remove the multiline settings and try again. Your configuration should look like this:
filebeat.inputs:
- type: log
  paths:
    - "/var/log/containers/*.log"
  tags: ["kube-logs"]
  symlinks: true
  json.message_key: log
  json.keys_under_root: true
  json.add_error_key: true

Related

Wazuh - Filebeat - Elasticsearch non-zero metrics

Could you please help me solve this Filebeat error?
It's a Wazuh manager server. Everything works: I can connect to the Kibana web UI, open the Wazuh app, and see my three Wazuh agents connected and active.
I want FIM monitoring, and if I change a file on an agent server, an alert is created and I can see it in alert.log on the manager server. The issue is that Filebeat won't send this alert to Elasticsearch, so I can't see the alert in the Kibana web UI.
Wazuh manager>
Wazuh 4.2.5
Filebeat 7.14.2
Elasticsearch 7.14.2
Kibana 7.14.2
Wazuh alert log - /var/ossec/logs/alerts/2022/Feb/ and /var/ossec/logs/alerts
systemctl status filebeat shows it is active, but I can see lines like:
WARN [elasticsearch] elasticsearch/client.go:405 Cannot>
This is the error from filebeat -e:
2022-02-03T12:46:20.386+0100 INFO [monitoring] log/log.go:153 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"id":"session-248447.scope","mem":{"limit":{"bytes":9223372036854771712},"usage":{"bytes":622415872}}}},"cpu":{"system":{"ticks":70,"time":{"ms":72}},"total":{"ticks":300,"time":{"ms":311},"value":300},"user":{"ticks":230,"time":{"ms":239}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":9},"info":{"ephemeral_id":"641d7fdd-47a0-4b10-bda9-36f29c29fdef","uptime":{"ms":98413},"version":"7.14.2"},"memstats":{"gc_next":18917616,"memory_alloc":14197072,"memory_sys":75383816,"memory_total":71337840,"rss":115638272},"runtime":{"goroutines":11}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":2,"starts":2},"reloads":1,"scans":1},"output":{"events":{"active":0},"type":"elasticsearch"},"
And here is error found in /var/log/messages
Feb 3 10:27:54 filebeat[2531915]: 2022-02-03T10:27:54.707+0100#011WARN#011[elasticsearch]#011elasticsearch/client.go:405#011Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc07705e669760167, ext:958857091513, loc:(*time.Location)(0x5620964fb2a0)}, Meta:{"pipeline":"filebeat-7.14.0-wazuh-alerts-pipeline"}, Fields:{"agent":{"ephemeral_id":"33cb9baa-af71-4b44-99a6-1379c747722f","hostname":"xlc","id":"03fb57ca-9940-4886-9e6e-a3b3e635cd35","name":"xlc","type":"filebeat","version":"7.14.0"},"ecs":{"version":"1.10.0"},"event":{"dataset":"wazuh.alerts","module":"wazuh"},"fields":{"index_prefix":"wazuh-monitoring-"},"fileset":{"name":"alerts"},"host":{"name":"xlc"},"input":{"type":"log"},"log":{"file":{"path":"/var/ossec/logs/alerts/alerts.json"},"offset":122695554},"message":"{\"timestamp\":\"2022-02-03T10:27:52.438+0100\",\"rule\":{\"level\":5,\"description\":\"Registry Value Integrity Checksum Changed\",\"id\":\"750\",\"mitre\":{\"id\":[\"T1492\"],\"tactic\":[\"Impact\"],\"technique\":[\"Stored Data Manipulation\"]},\"firedtimes\":7,\"mail\":false,\"groups\":[\"ossec\",\"syscheck\",\"syscheck_entry_modified\",\"syscheck_registry\"],\"pci_dss\":[\"11.5\"],\"gpg13\":[\"4.13\"],\"gdpr\":[\"II_5.1.f\"],\"hipaa\":[\"164.312.c.1\",\"164.312.c.2\"],\"nist_800_53\":[\"SI.7\"],\"tsc\":[\"PI1.4\",\"PI1.5\",\"CC6.1\",\"CC6.8\",\"CC7.2\",\"CC7.3\"]},\"agent\":{\"id\":\"006\",\"name\":\"CPP\",\"ip\":\"10.74.37.3\"},\"manager\":{\"name\":\"xlc\"},\"id\":\"1643880472.68132386\",\"full_log\":\"Registry Value '[x32] HKEY_LOCAL_MACHINE\\\\System\\\\CurrentControlSet\\\\Services\\\\W32Time\\\\Config\\\\LastKnownGoodTime' modified\\nMode: scheduled\\nChanged attributes: md5,sha1,sha256\\nOld md5sum was: '5df5b1598b729d98734105148103abf2'\\nNew md5sum is : '361334bf60bdd83e30894c4f313d16ec'\\nOld sha1sum was: 'c233c8ccb56fbd363c44b51a9d51c7fa32512474'\\nNew sha1sum is : '7163cffa48f1a7c0bcb4a3ddff6278ae9a4895a6'\\nOld sha256sum was: '3aad3da22f2d53e8ac33c46c73f40c3e8f5db07188d166e24957d8a20b62b5f1'\\nNew sha256sum is : 'bee8072335d870a1624a541cb13ca5085ba85646a8417d4d894deff71c3f4a92'\\n\",\"syscheck\":{\"path\":\"HKEY_LOCAL_MACHINE\\\\System\\\\CurrentControlSet\\\\Services\\\\W32Time\\\\Config\",\"mode\":\"scheduled\",\"arch\":\"[x32]\",\"value_name\":\"LastKnownGoodTime\",\"size_after\":\"8\",\"md5_before\":\"5df5b1598b729d98734105148103abf2\",\"md5_after\":\"361334bf60bdd83e30894c4f313d16ec\",\"sha1_before\":\"c233c8ccb56fbd363c44b51a9d51c7fa32512474\",\"sha1_after\":\"7163cffa48f1a7c0bcb4a3ddff6278ae9a4895a6\",\"sha256_before\":\"3aad3da22f2d53e8ac33c46c73f40c3e8f5db07188d166e24957d8a20b62b5f1\",\"sha256_after\":\"bee8072335d870a1624a541cb13ca5085ba85646a8417d4d894deff71c3f4a92\",\"changed_attributes\":[\"md5\",\"sha1\",\"sha256\"],\"event\":\"modified\"},\"decoder\":{\"name\":\"syscheck_registry_value_modified\"},\"location\":\"syscheck\"}","service":{"type":"wazuh"}}, Private:file.State{Id:"native::1049-64776", PrevId:"", Finished:false, Fileinfo:(*os.fileStat)(0xc000fc9380), Source:"/var/ossec/logs/alerts/alerts.json", Offset:122697450, Timestamp:time.Time{wall:0xc07704f6d4cb3764, ext:510354422, loc:(*time.Location)(0x5620964fb2a0)}, TTL:-1, Type:"log", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x419, Device:0xfd08}, IdentifierName:"native"}, TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"illegal_argument_exception","reason":"data_stream [<wazuh-monitoring-{2022.02.03||/d{yyyy.MM.dd|UTC}}>] must not 
contain the following characters [ , \", *, \\, <, |, ,, >, /, ?]"}
Could you please help with this? I tried Google but with no success. Thank you.
Filebeat reads from alerts.json; you can check this file to see whether the alerts are being generated. Judging from the log you provided, it looks like Filebeat cannot send some logs to Elasticsearch (Cannot index event publisher.Event), but we would need more details about the complete error and the source logs causing it. The output of the command # journalctl -f -u filebeat would be useful here to provide further assistance.
Based on previous experience, the problem could be that you have reached the maximum number of open shards, which by default is set to 1000. If this is the case, you will see an error like the following: {"type":"validation_exception","reason":"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;"}
If that's the case, you can either reduce the number of shards or increase the limit. I'd recommend the first approach if you only have one Elasticsearch node; having 1000 shards is not healthy for an environment of that size.
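If you do need to raise the limit as a stopgap, it can be changed with a cluster settings update (a sketch, assuming Elasticsearch 7.x, where the setting is cluster.max_shards_per_node; 2000 is just an example value), e.g. from the Kibana Dev Tools console:
PUT /_cluster/settings
{
  "persistent": {
    "cluster.max_shards_per_node": 2000
  }
}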
To reduce the number of shards, edit /etc/filebeat/wazuh-template.json, change the number of shards to "1", and then restart Filebeat. This only affects indices created from now on; checking this guide can help you with cases like this one.
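The fragment to change sits under the template's index settings; it typically looks something like this (exact keys and defaults vary by Wazuh version, so verify against your own file):
{
  "settings": {
    "index": {
      "number_of_shards": "1"
    }
  }
}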
Also, you can try to remove old indices. I would first check which indices you have stored. I suppose some of them are related to statistics or other data, so I would try to remove those before actual alert data (wazuh-alerts-*).
You can use:
GET /_cat/indices
Since the indices are stored per day by default, you can remove, for instance, all indices older than one month, keeping only one month of data.
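For example, to delete one daily index (the index name below is hypothetical; list your real indices with GET /_cat/indices first, since deletion cannot be undone):
DELETE /wazuh-monitoring-2021.12.01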
To prevent this from happening in the future, you may try implementing an Index Management Policy after you solve the issue at hand.

Get Logstash to parse a whole log line by line

Currently, at the end of my Jenkins build, I grab the console log and add it to a JSON blob along with the build details, and I send that to Logstash via curl:
def payload = JsonOutput.toJson([
    CONSOLE: getConsoleText(),
    BUILD_RESULT: currentBuild.result,
] << manager.getEnvVars())
// write the payload to a file so curl can read it with @data.json
writeFile file: 'data.json', text: payload
sh "curl -i -X PUT -H 'content-type: application/json' --insecure -d @data.json http://mylogstash/jenkins"
Logstash then puts this straight into Elasticsearch, against a Jenkins index for the day. This works great, and the whole log gets stored in Elasticsearch, but it doesn't make it very searchable.
What I would like to do is send the log to Logstash as a whole (as it is quite large), have Logstash parse it line by line and apply filters, and then have any lines I don't filter out posted to ES as documents of their own.
Is this possible, or would I have to send it line by line from Jenkins? As the log files are thousands of lines long, that would result in loads of requests to Logstash.
If you have the flexibility to do it, I would suggest writing the console logs to a log file. That way you can use Filebeat to automatically read the log line by line and send it over to Logstash. By using Filebeat, you get the advantage of guaranteed (at-least-once) delivery of the data and automatic retries if and when Logstash goes down.
Once the data reaches Logstash, you can use a pipeline to parse/filter the data as per your requirements. The Grok debugger available at http://grokdebug.herokuapp.com/ is handy for building patterns.
After transforming the data, the documents can be sent to ES for persistence.
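A minimal sketch of such a pipeline (the port, the grok pattern, and the drop condition are illustrative assumptions, not taken from the question):
input {
  beats {
    port => 5044
  }
}
filter {
  # parse lines like "2019-10-01 07:18:26 INFO some message"
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  # discard lines that did not match, so only parsed lines reach ES
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "jenkins-%{+YYYY.MM.dd}"
  }
}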

Filebeat duplicating events

I am running a basic ELK stack setup using Filebeat > Logstash > Elasticsearch > Kibana, all on version 5.2.
When I remove Filebeat and configure Logstash to look directly at a file, it ingests the correct number of events.
If I delete the data and re-ingest the file using Filebeat to pass the same log file contents to Logstash, I get over 10% more events created. I have checked a number of these to confirm the duplicates are being created by Filebeat.
Has anyone seen this issue, or have any suggestions why this would happen?
First I need to understand what you mean by removing Filebeat.
Possibility 1: If you have uninstalled and installed it again, then obviously Filebeat will read the data from the path again (which you have re-ingested) and post it to Logstash -> Elasticsearch -> Kibana (assuming the old data has not been removed from the Elasticsearch node), hence the duplicates.
Possibility 2: You just stopped Filebeat, configured Logstash, and restarted Filebeat, and your registry file may not have been updated properly during shutdown. (As you know, Filebeat reads line by line and records in the registry file the point up to which it has successfully published to Logstash/Elasticsearch/Kafka etc. If any of those output servers has difficulty processing the volume of input coming from Filebeat, Filebeat waits until they are available again, then reads the registry file to find how far it has published and resumes from the next line onwards.)
A sample registry file looks like this:
{
  "source": "/var/log/sample/sample.log",
  "offset": 88,
  "FileStateOS": {
    "inode": 243271678,
    "device": 51714
  },
  "timestamp": "2017-02-03T06:22:36.688837822-05:00",
  "ttl": -2
}
As you can see, it maintains an offset and a timestamp per file in the registry.
So this is one of the reasons for duplicates.
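If you suspect a stale registry, you can stop Filebeat and inspect it directly (a sketch; the path below is the default for 5.x deb/rpm packages and may differ on your install, and note that deleting the registry makes Filebeat re-read files from the beginning, which itself produces duplicates):
sudo systemctl stop filebeat
cat /var/lib/filebeat/registry       # check offsets/timestamps per file
sudo rm /var/lib/filebeat/registry   # only if you accept a full re-read
sudo systemctl start filebeat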
For further reference, you can follow the links below:
https://discuss.elastic.co/t/filebeat-sending-old-logs-on-restart/46189
https://discuss.elastic.co/t/deleting-filebeat-registry-file/46112
https://discuss.elastic.co/t/filebeat-stop-cleaning-registry/58902
Hope that helps.

Configure logstash to take updated values

I have connected Logstash, Elasticsearch, and Kibana, and it all works fine.
I use Logstash to ingest the Tomcat logs:
input {
  file {
    path => "/tom_logs/*"
    type => "tomcat"
    start_position => "end"
  }
}
Once I update the log file, it ingests the whole log file again instead of only the updated lines. I just want to load the log entries that were last appended.
Can anyone help me? Thanks in advance.
Your problem is a bit strange, because I have never experienced it. To be sure that I understand correctly: when a new log line arrives, Logstash starts analysing all the logs in the file again?
You correctly specify start_position => "end", which is actually the default option. In this case, Logstash should consider only new changes in the file (that is, new logs) since its start-up.
So I think the cause of this "bug" is not in Logstash but in how Tomcat writes its logs. But if I were you, I'd try to specify path => "/tom_logs/*.log" instead of * only, as in the sketch below.
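For reference, the input block with the narrower pattern would look like this (otherwise identical to your configuration):
input {
  file {
    path => "/tom_logs/*.log"
    type => "tomcat"
    start_position => "end"
  }
}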
Hope it will help.

logstash with elasticsearch_http

Apparently my Logstash OnDemand account does not work when I want to post an issue.
Anyway, I have a Logstash setup with Redis, Elasticsearch, and Kibana. My Logstash shippers are collecting logs from several files and putting them into Redis just fine.
Logstash version 1.3.3
Elasticsearch version 1.0.1
The only thing I have set in elasticsearch_http for Logstash is the host name. The whole setup seems to glue together just fine.
The problem is that elasticsearch_http is not consuming the Redis entries as they come. Running it in debug mode, I have seen that it flushes about 100 entries every 1 minute (the default values of flush_size and idle_flush_time). The documentation, however, states (as I understand it) that it will force a flush even when the flush_size of 100 is not reached (for example, if we had only 10 messages in the last minute). But it seems to work the other way around: it flushes about 100 messages every 1 minute only. I changed the size to 2000 and it flushed 2000 every minute or so.
Here is my logstash-indexer.conf:
input {
  redis {
    host => "1xx.xxx.xxx.93"
    data_type => "list"
    key => "testlogs"
    codec => json
  }
}
output {
  elasticsearch_http {
    host => "1xx.xxx.xxx.93"
  }
}
Here is my elasticsearch.yml:
cluster.name: logger
node.name: "logstash"
transport.tcp.port: 9300
http.port: 9200
discovery.zen.ping.unicast.hosts: ["1xx.xxx.xxx.93:9300"]
discovery.zen.ping.multicast.enabled: false
#discovery.zen.ping.unicast.enabled: true
network.bind_host: 1xx.xxx.xxx.93
network.publish_host: 1xx.xxx.xxx.93
The indexer, Elasticsearch, Redis, and Kibana are on the same server; the log collection from files is done on another server.
So I'm going to suggest a couple of different approaches to solve your problem. Logstash, as you are discovering, can be a bit quirky, so I've found these approaches useful in dealing with unexpected behavior from it.
1. Use the elasticsearch output instead of elasticsearch_http. You can get the same functionality by using the elasticsearch output with protocol set to http. The elasticsearch output is more mature (milestone 2 vs milestone 3) and I've seen this change make a difference before (see the sketch after this list).
2. Set idle_flush_time and flush_size explicitly, even to their default values. There have been issues with Logstash defaults previously, and I've found it a lot safer to set them explicitly. idle_flush_time is in seconds; flush_size is the number of records to flush.
3. Upgrade to a more recent version of Logstash. There is enough of a change in how Logstash is deployed with version 1.4.X (http://logstash.net/docs/1.4.1/release-notes) that I'd bite the bullet and upgrade. It's also significantly easier to get attention if you still have a problem with the most recent stable major release.
4. Make certain your Redis version matches those supported by your Logstash version.
5. Experiment with setting the batch, batch_events, and batch_timeout values for the Redis output. You are using the list data_type, and list supports various batch options; as with some other parameters, it's best not to assume the defaults are always set correctly.
6. Do all of the above. In addition to trying the first set of suggestions, I'd try them together in various combinations. Keep careful records of each test run. It seems obvious, but with all the variations above it's easy to lose track, so I'd keep careful records and try to change only one variable at a time.
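As a sketch of suggestions 1 and 2 combined (assuming a 1.4.x-era elasticsearch output, where protocol => "http" is available; the values shown are the documented defaults, just set explicitly):
output {
  elasticsearch {
    host => "1xx.xxx.xxx.93"
    protocol => "http"
    flush_size => 100        # records per bulk request
    idle_flush_time => 1     # seconds before a partial batch is flushed
  }
}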
