JSON parser in logstash ignoring data? - elasticsearch

I've been at this a while now, and I feel like the JSON filter in logstash is removing data for me. I originally followed the tutorial from https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-ubuntu-14-04
I've made some changes, but it's mostly the same. My filter section looks like this:
uuid { #uuid and fingerprint to avoid duplicates
  target => "@uuid"
  overwrite => true
}
fingerprint {
  key => "78787878"
  concatenate_sources => true
}
grok { #Get device name from the name of the log
  match => { "source" => "%{GREEDYDATA}%{IPV4:DEVICENAME}%{GREEDYDATA}" }
}
grok { #get all the other data from the log
  match => { "message" => "%{NUMBER:unixTime}..." }
}
date { #Set the unix times to proper times.
  match => [ "unixTime", "UNIX" ]
  target => "TIMESTAMP"
}
grok { #Split up the message if it can
  match => { "MSG_FULL" => "%{WORD:MSG_START}%{SPACE}%{GREEDYDATA:MSG_END}" }
}
json {
  source => "MSG_END"
  target => "JSON"
}
So I think the bit causing problems is at the bottom. My grok stuff should all be correct. When I run this config, I see everything displayed correctly in Kibana, except for the logs that have JSON in them (not all of the logs have JSON). When I run it again without the json filter, it displays everything.
I've tried using an if statement so that the json filter only runs when the message actually contains JSON, but that didn't solve anything.
However, when I added an if statement to only parse one specific JSON format (so, if MSG_START is x, y or z, then MSG_END has a different JSON format; in this case let's say I'm only parsing the z format), then in Kibana I would see all the logs containing the x and y formats (not parsed, though), but it wouldn't show z. So I'm sure it must be something to do with how I'm using the json filter.
Also, whenever I want to test with new data, I've started clearing the old data in Elasticsearch, so that if it works I know it's my Logstash config that's working and not just Elasticsearch showing old data. I've done this using XDELETE 'http://localhost:9200/logstash-*/'. But Logstash won't create new indexes in Elasticsearch unless I feed Filebeat new logs. I don't know if this is another problem or not, just thought I should mention it.
I hope that all makes sense.
EDIT: I just checked the logstash.stdout file, and it turns out it is parsing the JSON, but Kibana is only showing entries tagged with "_jsonparsefailure", so something must be going wrong with Elasticsearch. Maybe. I don't know, just brainstorming :)
SAMPLE LOGS:
1452470936.88 1448975468.00 1 7 mfd_status 000E91DCB5A2 load {"up":[38,1.66,0.40,0.13],"mem":[967364,584900,3596,116772],"cpu":[1299,812,1791,3157,480,144],"cpu_dvfs":[996,1589,792,871,396,1320],"cpu_op":[996,50]}
MSG_START is load and MSG_END is everything after it in the above example, so MSG_END is valid JSON that I want to parse.
The log below has no JSON in it, but my Logstash will try to parse everything after "Inf:" and emit a "_jsonparsefailure".
1452470931.56 1448975463.00 1 6 rc.app 02:11:03.301 Inf: NOSApp: UpdateSplashScreen not implemented on this platform
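For clarity, the kind of if statement I mean is roughly this, only running the json filter when MSG_END starts with a curly brace (the check itself is just my guess at what it should look like):
if [MSG_END] =~ /^\{/ {
  json {
    source => "MSG_END"
    target => "JSON"
  }
}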
Also, this is my Logstash output section, since I feel like that is important now:
elasticsearch {
  hosts => ["localhost:9200"]
  document_id => "%{fingerprint}"
}
stdout { codec => rubydebug }

I experienced a similar issue and found that some of my logs were using a UTC time/date stamp and others were not.
Fixing the code to use UTC exclusively sorted the issue for me.
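If you can't change the logs themselves, the date filter also has a timezone option that tells it which zone to assume for timestamps that don't carry one; a rough sketch (the field name and format here are only placeholders, not taken from the question):
date {
  match => [ "logTime", "ISO8601" ]
  timezone => "UTC"
}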

I asked this question later on: Logstash output from json parser not being sent to elasticsearch. It has more relevant information, and maybe a better answer, so if anyone ever has a similar problem to mine you can check out that link.

Related

Parsing log data through grok filter (logstash)

I'm pretty new to ELK, and I'm trying to parse my logs through Logstash. The logs are sent by Filebeat.
The logs look like:
2019.12.02 16:21:54.330536 [ 1 ] {} <Information> Application: starting up
2020.03.21 13:14:54.941405 [ 28 ] {xxx23xx-xxx23xx-4f0e-a3c6-rge3gu1} <Debug> executeQuery: (from [::ffff:192.0.0.0]:9999) blahblahblah
2020.03.21 13:14:54.941469 [ 28 ] {xxx23xx-xxx23xx-4f0e-a3c6-rge3gu0} <Error> executeQuery: Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 1
My default logstash configuration is:
input {
  beats {
    port => 5044
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}
In my log example, I extract fields like this:
timestamp
code
pipelineId
logLevel
program
message.
But I have several problems with my grok pattern. First, the timestamp in the log is quite different from a classic timestamp. How can I get it recognized?
I also have problems with the {} part, which can be empty or not. Can you give me some advice on what the correct grok pattern should be, please?
Also, in Kibana I have a lot of information, such as hostname, OS details, agent details, source, etc. I've read that these fields are ES metadata, so it's not possible to remove them. It's a lot of information though; is there any way to "hide" it?
Grok pattern
I constructed the pattern below for your example log and checked it in the Grok Debugger.
Is this the result you're looking for?
Logstash config
# logstash.conf
…
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => {
      "message" => "%{CUSTOM_DATE:timestamp}\s\[\s%{BASE10NUM:code}\s\]\s\{%{GREEDYDATA:pipeline_id}\}\s\<%{GREEDYDATA:log_level}\>\s%{GREEDYDATA:program_message}"
    }
  }
}
…
Custom pattern
As you can see, I told grok to look for my custom patterns in the patterns directory which I put in the same location as my logstash.conf file. In this directory I created the custom.txt file with the following content:
# patterns/custom.txt
CUSTOM_DATE (?>\d\d){1,2}\.(?:0?[1-9]|1[0-2])\.(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])\s(?!<[0-9])(?:2[0123]|[01]?[0-9]):(?:[0-5][0-9])(?::(?:(?:[0-5][0-9]|60)(?:[:.,][0-9]+)?))(?![0-9])
I didn't write this long pattern on my own. I started with this line:
CUSTOM_DATE %{YEAR}\.%{MONTHNUM}\.%{MONTHDAY}\s%{TIME}
Then, I replaced every predefined pattern with the corresponding regular expression (one by one, directly in the Grok Debugger). You can use %{YEAR}\.%{MONTHNUM}\.%{MONTHDAY}\s%{TIME} directly in your application, but the Grok Debugger interface will print every part separately.
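If you'd rather not maintain a separate patterns file, the grok filter also lets you define custom patterns inline via the pattern_definitions option; something like this should be equivalent (a sketch, not tested against your logs):
grok {
  pattern_definitions => {
    "CUSTOM_DATE" => "%{YEAR}\.%{MONTHNUM}\.%{MONTHDAY}\s%{TIME}"
  }
  match => {
    "message" => "%{CUSTOM_DATE:timestamp}\s\[\s%{BASE10NUM:code}\s\]\s\{%{GREEDYDATA:pipeline_id}\}\s\<%{GREEDYDATA:log_level}\>\s%{GREEDYDATA:program_message}"
  }
}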
Do you want to remove empty fields?
I don't know what you want to do in case the pipeline_id field is empty. If you want to remove it completely you can try adding the following lines to your config:
# logstash.conf
…
filter {
  grok {
    …
  }
  if [pipeline_id] == "" {
    mutate {
      remove_field => ["pipeline_id"]
    }
  }
}
…
Useful resources
Available patterns that I used in my pattern
What to do when part of one field got caught in a different pattern

Logstash extracting and customizing field with grok and ruby

I have this data in my Elasticsearch logs, saved in a referer field:
/clientReq?sessionid=3332&UID=ed91b-517234-4f4c211-a20e-d2e1aefc126a&signUp=false
I want to use Ruby to save the ed91b-517234-4f4c211-a20e-d2e1aefc126a part in a separate field.
I have tried this with the ruby filter in my pipeline configuration file:
ruby {
  code => "
    saveid=event[referer].match((\w+[-]?)+)+)
    event.set('saved',saveid) "
}
This doesn't even save the entire field, so I went ahead and tried a grok filter instead:
grok {
  match => { "message" => "%{COMBINEDAPACHELOG}" }
  add_field => { "savedData" => "%{referer}" }
}
Neither of these works. I have tested the configuration and it loads successfully, but when I visit the Kibana front end I don't see the new field created either.
The Ruby hash syntax event[field] = foo is not used anymore; it has been replaced by the event get/set API, for example event.get('referer').
Besides that, your regex is not correct for the desired result. One solution is to use a positive lookbehind to check for UID=.
This should work:
ruby {
  code => "
    saveid = event.get('referer').match(/(?<=UID=)((\w+[-]?)+)+/)[1]
    event.set('saved', saveid)
  "
}
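Note that if referer is missing or doesn't contain UID=, match returns nil and the [1] lookup will raise an error; a slightly more defensive variant could look like this (just a sketch):
ruby {
  code => "
    m = event.get('referer').to_s.match(/(?<=UID=)[\w-]+/)
    event.set('saved', m[0]) if m
  "
}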
For grok, you can create a new filter for your referer field and use grok's predefined UUID pattern to match your string. Can you try this:
grok {
  match => { "referer" => "UID=%{UUID:saveData}" }
}
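Keep in mind that grok's built-in UUID pattern expects the standard 8-4-4-4-12 hex layout, and the sample value above doesn't quite follow it, so if that match fails you could fall back to an inline custom capture, for example something like:
grok {
  match => { "referer" => "UID=(?<saveData>[\w-]+)" }
}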
Hope this helps.

Add extra value to field before sending to elasticsearch

I'm using Logstash, Filebeat and grok to send data from logs to my Elasticsearch instance. This is the grok configuration in the pipeline:
filter {
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:messageDate} %{GREEDYDATA:messagge}"
    }
  }
}
This works fine; the issue is that messageDate is in this format, Jan 15 11:18:25, and it doesn't have a year entry.
Now, I actually know the year these files were created in, and I was wondering if it is possible to add that value to the field during processing, that is, somehow turn Jan 15 11:18:25 into 2016 Jan 15 11:18:25 before sending it to Elasticsearch (obviously without editing the files, which I could do easily enough, but that would only be a temporary fix and not a definitive solution).
I have tried googling whether this is possible, but no luck...
Valepu,
One way to modify the data in a field is to use the ruby filter:
filter {
  ruby {
    code => "#your code here#"
  }
}
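For example, a minimal sketch that prepends a known year to the field from your grok filter (assuming the messageDate field name from your question):
filter {
  ruby {
    # prepend the known year to the syslog-style date
    code => "event.set('messageDate', '2016 ' + event.get('messageDate').to_s)"
  }
}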
For more information, such as how to get and set field values, here is the link:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html
If you have a separate field for date as a string, you can use logstash date plugin:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html
If you don't have it as a separate field (as in this case) use this site to construct your own grok pattern:
http://grokconstructor.appspot.com/do/match
I made this to preprocess the values:
%{YEAR:yearVal} %{MONTH:monthVal} %{NUMBER:dayVal} %{TIME:timeVal} %{GREEDYDATA:message}
Not the most elegant, I guess, but you get the values in different fields. Using this you can create your own date field and parse it with the date filter so you get a comparable value, or you can use these fields by themselves. I'm sure there is a better solution, for example you could make your own grok pattern and use that, but I'm going to leave some exploration for you too. :)
By reading the grok documentation thoroughly I found what Google couldn't find for me, and which I apparently missed the first time I read that page:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html#plugins-filters-grok-add_field
Using the add_field and remove_field options I managed to add the year to my date, and then I used the date plugin to turn it into a proper timestamp. My filter configuration now looks like this:
filter {
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:tMessageDate} %{GREEDYDATA:messagge}"
    }
    add_field => { "messageDate" => "2016 %{tMessageDate}" }
    remove_field => ["tMessageDate"]
  }
  date {
    match => [ "messageDate", "YYYY MMM dd HH:mm:ss" ]
  }
}
And it worked fine.

Logstash filter - half json line parse

I'm using Filebeat as a shipper on the client, which sends to Redis; Logstash reads from Redis and sends to ES.
I'm trying to parse the following example line:
09:24:01.969 watchdog - INFO - 100.140.2 PASSED: Mobile:Mobile[].popover["mc1814"].select(2,) :706<<<<<<<<<<<<<<<<<<< {"actionDuration":613}
In the end I want to have a field named "actionDuration" with the value 613.
As you can see, the line is only partially JSON.
I've tried to use a grok filter with add_field and match, and I've tried changing a few configurations in Filebeat and Logstash.
I'm using the basic configurations:
filebeat.conf:
filebeat.prospectors:
  - input_type: log
    paths:
      - /sketch/workspace/sanity-dev-kennel/out/*.log
    fields:
      type: watchdog
      BUILD_ID: 82161
If it's possible to do this on the Filebeat side I'd prefer that, but doing it on the Logstash side is also fine.
Thanks a lot,
Moshe
This sort of partial formatting is best handled on the Logstash side, not the shipper. The filters/transforms available in Filebeat aren't up to it; a Logstash filter pipeline is, though.
filter {
  grok {
    match => {
      "message" => [ "(?<plain_prefix>^.*?) (?<json_segment>{.*$)" ]
    }
  }
  json {
    source => "json_segment"
  }
  mutate {
    remove_field => [ "json_segment" ]
  }
}
This basic example will split your incoming message into two fields: a plain_prefix and a json_segment. The json {} filter is then used to parse the JSON data into the event. Finally, a mutate {} filter removes the json_segment field from the event, since its contents have already been parsed and merged in.
Note: the .*? in the plain_prefix is critical in this filter. Constructed this way, everything from the first { onward is considered part of the JSON segment. If you used .* instead, the JSON segment would be everything from the last {, which would be a problem with complex JSON data structures.
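For the sample line above, the event should then come out looking roughly like this in the rubydebug output (other fields such as @timestamp and message omitted); since the json filter has no target set, the parsed keys are merged into the top level of the event:
{
  "plain_prefix" => "09:24:01.969 watchdog - INFO - 100.140.2 PASSED: Mobile:Mobile[].popover[\"mc1814\"].select(2,) :706<<<<<<<<<<<<<<<<<<<",
  "actionDuration" => 613
}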

Logstash not parsing multiple named capture groups

I have just started playing around with Logstash, ElasticSearch and Kibana for visualisation of logs and am currently experiencing some problems.
I have a log file that is being gathered by logstash and I want to extract fields from log entries before writing these into ElasticSearch.
I have defined a filter with a number of named capture groups in my Logstash config file, but at this point only the first of those named capture groups is matching.
My log file looks something like the following:
[2014-01-31 12:00:00] [FIELD1:SOMEVALUE] [FIELD2:SOMEVALUE]
and my Logstash filter looks like the following:
if [type] == "mytype" { grok { match => [ "message", "(?<TIMESTAMP>regex)", "message", "(?<FIELD1>regex)", "message", "(?<FIELD2>regex)" ] } }
I have verified that the regexes for all my fields are correct, but when I go to the Kibana dashboard, FIELD1 and FIELD2 are not appearing.
If anyone could shed some light on this I would be grateful.
Thanks
Kevin
grok's default behavior is to stop processing after the first match.
You can change this by setting break_on_match to false:
if[type] == "mytype {
grok
{
match => [
"message", "(?<TIMESTAMP>regex)",
"message", "(?<FIELD1>regex)",
"message", "(?<FIELD2>regex)"
]
break_on_match => false
}
}
After learning a bit more about parsing with grok, I've found that a lot of the time it isn't necessary to write my own regexes. There are a number of predefined grok patterns I can use, and I can extend these to create my own custom patterns when parsing logs with Logstash.
A useful link on the grok patterns supported by logstash: https://github.com/elastic/logstash/blob/v1.4.2/patterns/grok-patterns.
Using this newfound knowledge I was able to change my match configuration to the one below.
if[type] == "mytype" {
grok {
match => ["\[%{TIMESTAMP_ISO8601:dateTime}\]%{SPACE}\[%{WORD}\:%{FLOATINGPOINT:cpu}\]%{SPACE}\[%{WORD}\:%{FLOATINGPOINT:memory}\]"]
}
}
This uses the built-in grok pattern TIMESTAMP_ISO8601 to pick out the date in my logs, and I have created a very simple custom pattern, FLOATINGPOINT, to pick out the floating point values for memory and cpu in my example. The FLOATINGPOINT pattern looks like:
FLOATINGPOINT %{INT}\.%{INT}
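If you keep custom patterns like FLOATINGPOINT in a separate file, grok needs to be told where that file lives with the patterns_dir option; a rough sketch, assuming the pattern sits in a file under a ./patterns directory next to the config:
if [type] == "mytype" {
  grok {
    patterns_dir => ["./patterns"]
    match => [ "message", "\[%{TIMESTAMP_ISO8601:dateTime}\]%{SPACE}\[%{WORD}\:%{FLOATINGPOINT:cpu}\]%{SPACE}\[%{WORD}\:%{FLOATINGPOINT:memory}\]" ]
  }
}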
