Get Logstash to parse a whole log line by line

Get Logstash to parse a whole log line by line - elasticsearch

Currently at the end of my Jenkins build I grab the console log and add it to a json blob along with the build details, and I send that to logstash via curl
def payload = JsonOutput.toJson([
CONSOLE: getConsoleText(),
BUILD_RESULT: currentBuild.result,
] << manager.getEnvVars()
)
sh "curl -i -X PUT -H \'content-type: application/json\' --insecure -d #data.json http://mylogstash/jenkins"
Logstash then put this straight into elasticsearch against a Jenkins index for the day. This works great and the whole log gets stored in elasticsearch, but it doesnt make it very searchable.
What I would like to do is send the log to logstash as a whole (as it is quite large), and for logstash to parse it line by line and apply filters. Then any lines I dont filter out to be posted to ES as a document by itself.
Is this possible, or would I have to send it line by line from Jenkins? As the log files are thousands of lines long would result in loads of requests to logstash.

If you have the flexibilty to it, i would suggest you to write the console logs to a log file. In this way you can use filebeat to automatically read the log line by line and send it over to the logstash. By using filebeat, you get the advantage of guaranteed single delivery of the data and automatic retires if and when the logstash goes down.
Once the data reaches logstash, you can use the pipeline to parse/filter the data as per your requirement. The Grok debugger available in this link is handy --> http://grokdebug.herokuapp.com/.
After transforming the data, the document can be sent to ES for persistance.

Related

Wazuh - Filebeat - Elasticsearch non-zero metrics

Could you please help me solve this Filebeat error?
Its Wazuh manager server. All is working, I can connect to Kibana web, enter Wazuh app and I can see there my three Wazuh agents connected and active.
I want FIM monitoring nad If I change file on agent server, alert is created and I can see that alert in alert.log on manager server. Issue is, that Filebeat wont send this alert to elasticsearch so I cant see that alert on Kibana web.
Wazuh manager>
Wazuh 4.2.5
Filebeat 7.14.2
Elasticsearch 7.14.2
Kibana 7.14.2
Wazuh alert log - /var/ossec/logs/alerts/2022/Feb/ and /var/ossec/logs/alerts
systemctl status filebeat is active, but I can see there lines:
WARN [elasticsearch] elasticsearch/client.go:405 Cannot>
This is error from > filebeat -e
2022-02-03T12:46:20.386+0100 INFO [monitoring] log/log.go:153 Total non-zero metrics {"monitoring": {"metrics": {"beat":{"cgroup":{"memory":{"id":"session-248447.scope","mem":{"limit":{"bytes":9223372036854771712},"usage":{"bytes":622415872}}}},"cpu":{"system":{"ticks":70,"time":{"ms":72}},"total":{"ticks":300,"time":{"ms":311},"value":300},"user":{"ticks":230,"time":{"ms":239}}},"handles":{"limit":{"hard":262144,"soft":1024},"open":9},"info":{"ephemeral_id":"641d7fdd-47a0-4b10-bda9-36f29c29fdef","uptime":{"ms":98413},"version":"7.14.2"},"memstats":{"gc_next":18917616,"memory_alloc":14197072,"memory_sys":75383816,"memory_total":71337840,"rss":115638272},"runtime":{"goroutines":11}},"filebeat":{"harvester":{"open_files":0,"running":0}},"libbeat":{"config":{"module":{"running":2,"starts":2},"reloads":1,"scans":1},"output":{"events":{"active":0},"type":"elasticsearch"},"
And here is error found in /var/log/messages
Feb 3 10:27:54 filebeat[2531915]: 2022-02-03T10:27:54.707+0100#011WARN#011[elasticsearch]#011elasticsearch/client.go:405#011Cannot index event publisher.Event{Content:beat.Event{Timestamp:time.Time{wall:0xc07705e669760167, ext:958857091513, loc:(*time.Location)(0x5620964fb2a0)}, Meta:{"pipeline":"filebeat-7.14.0-wazuh-alerts-pipeline"}, Fields:{"agent":{"ephemeral_id":"33cb9baa-af71-4b44-99a6-1379c747722f","hostname":"xlc","id":"03fb57ca-9940-4886-9e6e-a3b3e635cd35","name":"xlc","type":"filebeat","version":"7.14.0"},"ecs":{"version":"1.10.0"},"event":{"dataset":"wazuh.alerts","module":"wazuh"},"fields":{"index_prefix":"wazuh-monitoring-"},"fileset":{"name":"alerts"},"host":{"name":"xlc"},"input":{"type":"log"},"log":{"file":{"path":"/var/ossec/logs/alerts/alerts.json"},"offset":122695554},"message":"{\"timestamp\":\"2022-02-03T10:27:52.438+0100\",\"rule\":{\"level\":5,\"description\":\"Registry Value Integrity Checksum Changed\",\"id\":\"750\",\"mitre\":{\"id\":[\"T1492\"],\"tactic\":[\"Impact\"],\"technique\":[\"Stored Data Manipulation\"]},\"firedtimes\":7,\"mail\":false,\"groups\":[\"ossec\",\"syscheck\",\"syscheck_entry_modified\",\"syscheck_registry\"],\"pci_dss\":[\"11.5\"],\"gpg13\":[\"4.13\"],\"gdpr\":[\"II_5.1.f\"],\"hipaa\":[\"164.312.c.1\",\"164.312.c.2\"],\"nist_800_53\":[\"SI.7\"],\"tsc\":[\"PI1.4\",\"PI1.5\",\"CC6.1\",\"CC6.8\",\"CC7.2\",\"CC7.3\"]},\"agent\":{\"id\":\"006\",\"name\":\"CPP\",\"ip\":\"10.74.37.3\"},\"manager\":{\"name\":\"xlc\"},\"id\":\"1643880472.68132386\",\"full_log\":\"Registry Value '[x32] HKEY_LOCAL_MACHINE\\\\System\\\\CurrentControlSet\\\\Services\\\\W32Time\\\\Config\\\\LastKnownGoodTime' modified\\nMode: scheduled\\nChanged attributes: md5,sha1,sha256\\nOld md5sum was: '5df5b1598b729d98734105148103abf2'\\nNew md5sum is : '361334bf60bdd83e30894c4f313d16ec'\\nOld sha1sum was: 'c233c8ccb56fbd363c44b51a9d51c7fa32512474'\\nNew sha1sum is : '7163cffa48f1a7c0bcb4a3ddff6278ae9a4895a6'\\nOld sha256sum was: '3aad3da22f2d53e8ac33c46c73f40c3e8f5db07188d166e24957d8a20b62b5f1'\\nNew sha256sum is : 'bee8072335d870a1624a541cb13ca5085ba85646a8417d4d894deff71c3f4a92'\\n\",\"syscheck\":{\"path\":\"HKEY_LOCAL_MACHINE\\\\System\\\\CurrentControlSet\\\\Services\\\\W32Time\\\\Config\",\"mode\":\"scheduled\",\"arch\":\"[x32]\",\"value_name\":\"LastKnownGoodTime\",\"size_after\":\"8\",\"md5_before\":\"5df5b1598b729d98734105148103abf2\",\"md5_after\":\"361334bf60bdd83e30894c4f313d16ec\",\"sha1_before\":\"c233c8ccb56fbd363c44b51a9d51c7fa32512474\",\"sha1_after\":\"7163cffa48f1a7c0bcb4a3ddff6278ae9a4895a6\",\"sha256_before\":\"3aad3da22f2d53e8ac33c46c73f40c3e8f5db07188d166e24957d8a20b62b5f1\",\"sha256_after\":\"bee8072335d870a1624a541cb13ca5085ba85646a8417d4d894deff71c3f4a92\",\"changed_attributes\":[\"md5\",\"sha1\",\"sha256\"],\"event\":\"modified\"},\"decoder\":{\"name\":\"syscheck_registry_value_modified\"},\"location\":\"syscheck\"}","service":{"type":"wazuh"}}, Private:file.State{Id:"native::1049-64776", PrevId:"", Finished:false, Fileinfo:(*os.fileStat)(0xc000fc9380), Source:"/var/ossec/logs/alerts/alerts.json", Offset:122697450, Timestamp:time.Time{wall:0xc07704f6d4cb3764, ext:510354422, loc:(*time.Location)(0x5620964fb2a0)}, TTL:-1, Type:"log", Meta:map[string]string(nil), FileStateOS:file.StateOS{Inode:0x419, Device:0xfd08}, IdentifierName:"native"}, TimeSeries:false}, Flags:0x1, Cache:publisher.EventCache{m:common.MapStr(nil)}} (status=400): {"type":"illegal_argument_exception","reason":"data_stream [<wazuh-monitoring-{2022.02.03||/d{yyyy.MM.dd|UTC}}>] must not contain the following characters [ , \", *, \\, <, |, ,, >, /, ?]"}
Could you please help with this? I tried google but with no success. Thank you.

Filebeat reads from alerts.json, you can check this file to see if the alerts are being generated. Judging from the log you provided, it looks like filebeat cannot send some logs to elasticsearch (Cannot index event publisher.Event), but we would need more details about the complete error and source logs causing that error. The output of the command # journalctl -f -u filebeat will be useful in this case to provide further assistance.
Based on previous experience. the problem could be that you have reached the maximum limit of shards opened, by default this number is set to 1000. If this is the case, you will see an error like the following: {"type":"validation_exception","reason":"Validation Failed: 1: this action would add [2] total shards, but this cluster currently has [1000]/[1000] maximum shards open;"}
If that's the case, you can either reduce the number of shards, or increase the limit to solve the situation right now. I'd recommend the first approach if you only have 1 Elasticsearch node, having 1000 shards is not healthy for the environment in these cases.
To reduce the number of shards in /etc/filebeat/wazuh-template.json check this information and change it to "1", then restart filebeat. These actions will affect the index from now on, but checking This guide can help you with cases like this one.
Also, you can try to remove old indexes. I would first check what are the indices you have stored. I suppose some of them are related to statistics or other stuff, so I would first try to remove those before actual data (wazuh-alerts-)
You can use:
GET /_cat/indices
As the indices are stored per day by default, so you can remove, for instance, those indices older than 1 month and we only keep one month of those indices
To prevent this from happening in the future, you may try implementing an Index Management Policy after you solve the issue at hand.

logstash won’t read single line files

I'm trying to make a pipeline that sends xml documents to elasticsearch. Problem is that each document is in its own separate file as a single line without \n in the end.
Any way to tell logstash not to wait for \n but read whole file till EOF and send it?

Can you specify which logstash version you are using, and can you share your configuration?
It may depends on the mode you set: it may be tail or read. it defaults to tail, which means it listens on your file and it waits for default 1 hour before closing it and stopping waiting for new lines.
You may have to change this parameter fro 1 hour to 1 second if you know you have reached the EOF yet:
file {
close_older=> "1 second"
}
Let me know if that works!
Docs here: https://www.elastic.co/guide/en/logstash/current/plugins-inputs-file.html#plugins-inputs-file-close_older

extracting data from multiple events from Elasticsearch using single logstash filter

I have log lines loaded in ElasticSearch which has the data scattered in multiple events, say event_id is in event (line) number 5 and event_action is available in event number 88, further event_port information is available in event number 455. How can i extract this data so that my output looks like following. For this case multiline codec will not work.
{
event_id: 1223
event_action: "socket_open"
event_port: 76654
}
Currently I have the log files persisted so i can get the file path from ES. I tried to have a shell script executed from ruby filter, this shell script can perform grep commands and put the stdout data in a new event, like following.
input {
elasticsearch {
hosts => "localhost:9200"
index => "my-logs"
}
}
filter
{
ruby {
code => 'require "open3"
file_path = event.get("file_path")
cmd = "my_filter.sh -f #{file_path}"
stdin, stdout, stderr = Open3.popen3(cmd)
event.set("process_result", stdout.read)
err = stderr.read
if err.to_s.empty?
filter_matched(event)
else
event.set("ext_script_err_msg", err)
end'
remove_field => ["file_path"]
}
}
With above approach I am facing problems.
1) Doing grep in huge files can be time consuming. Is there any alternative, whithout having to grep on files?
2) My input plugin (attached above) is taking events from Elastic Search which has file_path set for "ALL" events in an index, this makes my_filter.sh to be executed multiple times which is something I want to avoid. How can I extract unique file_path from ES?

Elasticsearch was not made to build output stream depending on input. Elastic is a noSQL database, where the data should be consumed over the time (in a real time approach). It means that you should first store everything in your Elasticsearch, and process the data after. In your case, you are stucking the flow by waiting different events.
If you really need to catch these events and process it in background, you could try something like nxlog before filtering in logstash (input is nxlog) or a python script (use it as a filter in logstash). In your case I would pre-process my data to consolidate it, and then send it to logstash

Filebeat duplicating events

I am running a basic elk stack setup using Filebeat > logstash > elasticsearch > kibana - all on version 5.2
When I remove Filebeat and configure logstash to look directly at a file, it ingests the correct number of events.
If I delete the data and re-ingest the file using Filebeat to pass the same log file contents to logstash, I get over 10% more events created. I have checked a number of these to confirm the duplicates are being created by filebeat.
Has anyone seen this issue? or have any suggestions why this would happen?

I need to understand first what do you mean by removing file beat!!
Possibility-1
if you have uninstalled and installed again, then obviously file beat will read the data from the path again(which you have re-ingested and post it to logstash->elasticsearch->kibana(assuming old data is not been removed from elastic node) hence the duplicates.
Possibility-2.
You just have stopped filebeat,configured for logstash and restarted filebeat and may be your registry file is not been updated properly during shutdown(as you know,file beat reads line by line and update the registry file upto what line it has successfully published to logstash/elasticsearch/kafka etc and if any of those output servers face any difficulty processing huge load of input coming from filebeat then filebeat waits until those servers are available for further processing of input data.Once those output servers are available,filebeat reads the registry file and scan upto what line it has published and starts publishing next line onwards).
Sample registry file will be like
{
"source": "/var/log/sample/sample.log",
"offset": 88,
"FileStateOS": {
"inode": 243271678,
"device": 51714
},
"timestamp": "2017-02-03T06:22:36.688837822-05:00",
"ttl": -2
}
As you can see, it maintains timestamp in the registry file.
So this is one of the reasons for duplicates.
For further references, you can follow below links
https://discuss.elastic.co/t/filebeat-sending-old-logs-on-restart/46189
https://discuss.elastic.co/t/deleting-filebeat-registry-file/46112
https://discuss.elastic.co/t/filebeat-stop-cleaning-registry/58902
Hope that helps.

Configure logstash to take updated values

I have connected logstash, Elasticsearch and Kibana. It all works fine.
I used logstash to take the tomcat logs.
input {
file {
path => "/tom_logs/*"
type => "tomcat"
start_position => "end"
}
}
Once i updated the log file, It takes the whole logs in the file instead of updated log. I just want to load the log which is last updated.
Any one help me.
Thanks in advance

Your problem is a bit strange because I never experienced it. To be sure that I understand correctly : when a new log comes, logstash start analysing again all the logs in the file ?
You correctly specify the start_position=>"end" which is actually the default option. In this case, logstash must consider only new changes in the file (so, new logs) since its start-up.
So, I think the issue of this "bug" is not in logstash but in "how" tomcat writes logs... But if I were you, I'd try to specify path=>"tom_logs/*.log" instead of * only.
Hope it will help.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio