Grok doesn't match multiline log entries properly? - elasticsearch

I've been going at this for weeks now and I can't seem to wrap my head around what's wrong about this.
I'm trying to get all of these lines to fit into a multiline match with grok, but it only picks up the last one, and even discards the digit at the beginning of the line.
11:31:03.936 < : 1> 5: Load times per type (ms):
12: aaaaaa.aaaaaaaaa.aaaaaaa.aaaaaaa
1: bbbb.bbbb.bbbbbbbbbbbbb.bbbbbbbbb
3: cccc.cccccccc.ccccccccccc.cccccc
64: ddd.dddddddddddd.ddddddd.ddddddd
Expected result:
message_processed = Load times per type (ms):
12: aaaaaa.aaaaaaaaa.aaaaaaa.aaaaaaa
1: bbbb.bbbb.bbbbbbbbbbbbb.bbbbbbbbb
3: cccc.cccccccc.ccccccccccc.cccccc
64: ddd.dddddddddddd.ddddddd.ddddddd
Actual result:
message_processed = ddd.dddddddddddd.ddddddd.ddddddd
I'm using the following grok pattern:
grok {
match => [ "message" , "%{TIME:time}.*%{NUMBER:loglevel}:\s%{GREEDYDATA:message_processed}" ]
}
It is being shipped to logstash with filebeat on a windows server with the following multi-line config in filebeat.yml:
multiline.pattern: ^[0-9]{2}\:[0-9]{2}\:[0-9]{2}
multiline.negate: true
multiline.match: after
I've tried using (?m) flag but to no avail, and using multi-line codec with filebeat is a no-go according to the official documentation.
What am I doing wrong?

Try the following:
%{TIME:time}.*%{NUMBER:loglevel}:\s(?<message_processed>(.|\‌​r|\n)*)
As pointed out here GREEDYDATA won't match newlines while (?<message>(.|\r|\n)*) will. So the above works for me.
See the results for your example with your multiline config on https://grokconstructor.appspot.com/do/match:

Related

Using actual regex in Ansible's search_regex parameter

I'm trying to use the wait_for module in Ansible to repeatedly check a log file and end when it finds either success (the word Successfully appears in the log) or failure (the string [FATAL] appears in the log).
- name: Wait for Logstash API Endpoint to be running
wait_for:
path: /var/log/logstash/logstash-plain.log
search_regex: '(\\[FATAL\\]|Successfully)'
delay: 30
timeout: 120
I've tried various approaches to the search_regex parameter including no escape characters, single escape characters etc.
I've checked that the logs' output includes [FATAL] and it definitely does, but I can't get this module to work.
Where am I going wrong?
** EDIT **
Attempting to use the following code:
On the following log:
[2019-01-15T15:43:14,735][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.5.2"}
[2019-01-15T15:44:00,534][ERROR][logstash.agent ] Failed to execute action {:action=>LogStash::PipelineAction::Create/pipeline_id:main, :exception=>"Java::OrgLogstashConfigIr::InvalidIRException", :message=>"Config has duplicate Ids: \nID: jsonfilter P[filter-json{\"id\"=>\"jsonfilter\", \"source\"=>\"message\", \"remove_field\"=>[\"_sourceUri\", \"_user\", \"sourceUri\", \"user\", \"pid\", \"v\"]}|[str]pipeline:41:7:```\njson {\n id => \"jsonfilter\"\n source => \"message\"\n # remove some irrelevant fields\n remove_field => [\"_sourceUri\", \"_user\", \"sourceUri\", \"user\", \"pid\", \"v\"]\n }\n```]\nP[filter-json{\"id\"=>\"jsonfilter\", \"source\"=>\"message\", \"remove_field\"=>[\"_sourceUri\", \"_user\", \"sourceUri\", \"user\", \"pid\", \"v\"]}|[str]pipeline:191:7:```\njson {\n id => \"jsonfilter\"\n source => \"message\"\n # remove some irrelevant fields\n remove_field => [\"_sourceUri\", \"_user\", \"sourceUri\", \"user\", \"pid\", \"v\"]\n }\n```]", :backtrace=>["org.logstash.config.ir.graph.Graph.validate(org/logstash/config/ir/graph/Graph.java:294)", "org.logstash.config.ir.PipelineIR.<init>(org/logstash/config/ir/PipelineIR.java:52)", "java.lang.reflect.Constructor.newInstance(java/lang/reflect/Constructor.java:423)", "org.jruby.javasupport.JavaConstructor.newInstanceDirect(org/jruby/javasupport/JavaConstructor.java:246)", "org.jruby.RubyClass.newInstance(org/jruby/RubyClass.java:1022)", "org.jruby.RubyClass$INVOKER$i$newInstance.call(org/jruby/RubyClass$INVOKER$i$newInstance.gen)", "usr.share.logstash.logstash_minus_core.lib.logstash.compiler.compile_sources(/usr/share/logstash/logstash-core/lib/logstash/compiler.rb:29)", "usr.share.logstash.logstash_minus_core.lib.logstash.compiler.RUBY$method$compile_sources$0$__VARARGS__(usr/share/logstash/logstash_minus_core/lib/logstash//usr/share/logstash/logstash-core/lib/logstash/compiler.rb)", "org.jruby.RubyClass.finvoke(org/jruby/RubyClass.java:899)", "org.jruby.RubyBasicObject.callMethod(org/jruby/RubyBasicObject.java:372)", "org.logstash.config.ir.ConfigCompiler.configToPipelineIR(org/logstash/config/ir/ConfigCompiler.java:32)", "org.logstash.execution.AbstractPipelineExt.initialize(org/logstash/execution/AbstractPipelineExt.java:149)", "org.logstash.execution.AbstractPipelineExt$INVOKER$i$3$0$initialize.call(org/logstash/execution/AbstractPipelineExt$INVOKER$i$3$0$initialize.gen)", "usr.share.logstash.logstash_minus_core.lib.logstash.pipeline.initialize(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:22)", "usr.share.logstash.logstash_minus_core.lib.logstash.pipeline.initialize(/usr/share/logstash/logstash-core/lib/logstash/pipeline.rb:90)", "org.jruby.RubyClass.newInstance(org/jruby/RubyClass.java:1022)", "org.jruby.RubyClass$INVOKER$i$newInstance.call(org/jruby/RubyClass$INVOKER$i$newInstance.gen)", "usr.share.logstash.logstash_minus_core.lib.logstash.pipeline_action.create.block in execute(/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:42)", "org.jruby.RubyProc.call(org/jruby/RubyProc.java:289)", "org.jruby.RubyProc.call19(org/jruby/RubyProc.java:273)", "org.jruby.RubyProc$INVOKER$i$0$0$call19.call(org/jruby/RubyProc$INVOKER$i$0$0$call19.gen)", "usr.share.logstash.logstash_minus_core.lib.logstash.agent.block in exclusive(/usr/share/logstash/logstash-core/lib/logstash/agent.rb:92)", "org.jruby.ext.thread.Mutex.synchronize(org/jruby/ext/thread/Mutex.java:148)", "org.jruby.ext.thread.Mutex$INVOKER$i$0$0$synchronize.call(org/jruby/ext/thread/Mutex$INVOKER$i$0$0$synchronize.gen)", "usr.share.logstash.logstash_minus_core.lib.logstash.agent.exclusive(/usr/share/logstash/logstash-core/lib/logstash/agent.rb:92)", "usr.share.logstash.logstash_minus_core.lib.logstash.agent.RUBY$method$exclusive$0$__VARARGS__(usr/share/logstash/logstash_minus_core/lib/logstash//usr/share/logstash/logstash-core/lib/logstash/agent.rb)", "usr.share.logstash.logstash_minus_core.lib.logstash.pipeline_action.create.execute(/usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb:38)", "usr.share.logstash.logstash_minus_core.lib.logstash.pipeline_action.create.RUBY$method$execute$0$__VARARGS__(usr/share/logstash/logstash_minus_core/lib/logstash/pipeline_action//usr/share/logstash/logstash-core/lib/logstash/pipeline_action/create.rb)", "usr.share.logstash.logstash_minus_core.lib.logstash.agent.block in converge_state(/usr/share/logstash/logstash-core/lib/logstash/agent.rb:317)", "org.jruby.RubyProc.call(org/jruby/RubyProc.java:289)", "org.jruby.RubyProc.call(org/jruby/RubyProc.java:246)", "java.lang.Thread.run(java/lang/Thread.java:748)"]}
[2019-01-15T15:44:01,189][FATAL][logstash.runner ] An unexpected error occurred! {:error=>#<LogStash::Error: Don't know how to handle `Java::OrgLogstashConfigIr::InvalidIRException` for `PipelineAction::Create<main>`>, :backtrace=>["org/logstash/execution/ConvergeResultExt.java:103:in `create'", "org/logstash/execution/ConvergeResultExt.java:34:in `add'", "/usr/share/logstash/logstash-core/lib/logstash/agent.rb:329:in `block in converge_state'"]}
[2019-01-15T15:44:02,220][ERROR][org.logstash.Logstash ] java.lang.IllegalStateException: Logstash stopped processing because of an error: (SystemExit) exit
Results in this failure:
Further info:
Based on my testing you were soooooooooo close. This:
search_regex: "(\\[FATAL\\]|Successfully)"
works for me. (Note " rather than ')
Also, check that Ansible has permission to read the log file. In the case that it doesn't, wait_for doesn't give you any indication that it can't read the file.

Grok filter for logstash to match a specific value from a log file

I have the following log:
2018-10-30 11:47:52 INFO 30464 SMS-MT [cid:300038] [queue-msgid:bb7a195d-fb23-42ae-bbfa-d2dcda405af9] [smpp-msgid:j.11082.639364178944.#MARKET SETU] [status:ESME_ROK] [prio:1] [dlr:NO_SMSC_DELIVERY_RECEIPT_REQUESTED] [validity:none] [from:2323232] [to:23232132312] [content:'#MARKET SETUP\nadsadadadadasdasdadaasdada mo ang:\nC jean_rivera\n--Mag reply ng A-C']
I've created a grok filter based on pattern in logstash so I can parse the log the way I want. And I have this:
%{DATESTAMP:Timestamp} %{LOGLEVEL:Level} %{BASE10NUM:Pid} %{USERNAME:SMS_TYPE} %{CID:CID} %{GREEDYDATA:Message}
I'm trying to create a GROK patter that will match 300038, which is the number coming after cid:. The syntax is always the same, [cid:number]. What I have now is:
CID (\[cid:[0-9]{6}\])
but that results into:
"CID": [
[
"[cid:300038]"
]
],
and I only want to match the 300038, without the [cid:] part
I have noticed that there are more than single space character between LOG and pid, you can match all of them using \s*.
To match just a number from [cid:300038] you can use custom pattern, \[cid:(?<CID>[0-9]{1,})\] this will match cid of any length, not just 6 digits.
Your pattern will become,
%{DATESTAMP:Timestamp} %{LOGLEVEL:Level}\s*%{BASE10NUM:Pid} %{USERNAME:SMS_TYPE} \[cid:(?<CID>[0-9]{1,})\] %{GREEDYDATA:Message}
Use
%{DATESTAMP:Timestamp} %{LOGLEVEL:Level} %{BASE10NUM:Pid} %{USERNAME:SMS_TYPE} \[cid:(?<CID>[0-9]{6})\] %{GREEDYDATA:Message}

Telegraf tail with grok pattern error

I am using Telegraf to get logs information from Apache NiFi, for this task I am using this config:
[[inputs.tail]]
## files to tail.
files = ["/var/log/nifi/nifi-app.log"]
## Read file from beginning.
from_beginning = true
#name_override = "nifi_app"
## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "grok"
grok_patterns = [ "%{DATE:date} %{TIME:time} %{WORD:EventType} \[%{GREEDYDATA:NifiTask} %{NOTSPACE:Thread}\] %{NOTSPACE:NifiEventType} %{GREEDYDATA:EventText} %{NUMBER:EventDuration} %{WORD:EventDurationUnits}" ]
When I try to start telegraf it give me this error:
Error parsing /etc/telegraf/telegraf.conf, toml: line 10: parse error
The pattern I wrote was tested in a Grok debugger with this text:
2018-08-02 10:53:16,976 INFO [Heartbeat Monitor Thread-1]
o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 1 heartbeats
in 11863 nanos
These are the results of some testing:
grok_patterns = ["\[%{GREEDYDATA:NifiTask}\]"] ==> toml: line 10: parse error
grok_patterns = ["[%{GREEDYDATA:NifiTask}]"] ==> Invalid data format: grok
grok_patterns = ['\[%{GREEDYDATA:NifiTask}\]'] ==> Invalid data format: grok
grok_patterns = ["\\[%{GREEDYDATA:NifiTask}\\]"] ==> Invalid data format: grok
grok_patterns = ['[%{GREEDYDATA:NifiTask}]'] -> Invalid data format: grok
The first option for me is the right one, but doesn't works, and the problem seems to be the way the bracket is being escaped.
How is possible to solve this issue?
To fix the issue about escaping bracket, a "partial" solution is to change double quote by simple quote, with this way, in my case (telegraph version 1.13.4) the bracket is correctly escaped by \
There was more than one problem:
First problem: the grok dataformat is added to Telegraf in the 1.8 release (ref), so I must use a nightly install until this version is released.
Second problem: how to escape the brackets, there are problems doing it in a regular way, so what I finally did was to put this part in a custom pattern file, this way it works perfectly.

Logstash: Attaching to previous line using multiline attaches somewhere else

I have a filter that looks like so:
multiline {
pattern => "(^.+Exception.*)|(^\tat .+)"
negate => false
what => "previous"
}
But for some reason, it's not attaching to the previous line for lines with ^\tat. Sometimes it does, but most of the time it doesn't. It attaches to the line way far back. I don't see anything wrong with my code.
Does anyone know if this is a bug?
Edit: This worked properly just now but couple minutes after it doesn't work again. Is it a buffer overflow? How would I debug this?
Edit: Example of success:
2014-06-20 09:09:07,989 http-bio-8080-exec-629 WARN com.rubiconproject.rfm.adserver.filter.impl.PriorityFilter - Request : NBA_DIV=Zedge_Tier1_App_MPBTAG_320x50_ROS_Android&NBA_APPID=4E51A330AD7A0131112022000A93D4E6&NBA_PUBID=111657&NBA_LOCATION_LAT=&NBA_LOCATION_LNG=&NBA_KV=device_id_sha-1_key=5040e46d15bd2f37b3ba58860cc94c1308c0ca4b&_v=2_0_0&id=84472439740784460, Response : Unable to Score Ads.. Selecting first one and Continuing...
java.lang.IndexOutOfBoundsException: Index: 8, Size: 1
at java.util.ArrayList.rangeCheck(ArrayList.java:604)
at java.util.ArrayList.get(ArrayList.java:382)
Edit: Example of failure:
2014-06-20 09:02:31,139 http-bio-8080-exec-579 WARN com.rubiconproject.rfm.adserver.web.AdRequestController - Request : car=vodafone UK&con=0&model=iPhone&bdl=com.racingpost.general&sup=adm,dfp,iAd&id=8226846&mak=Apple&sze=320x50&TYP=1&rtyp=json&app=F99D88D0FDEC01300BF5123139244773&clt=MBS_iOS_SDK_2.4.0&dpr=2.000000&apver=10.4&osver=7.1&udid=115FC62F-D4FF-44E0-8D92-5A060043EFDD&pub=111407&tud=3&osn=iPhone OS&, Response : No Ad Selected to Serve..Exiting
at java.util.ArrayList.get(ArrayList.java:382)
My file has 13000+ lines, and when it errors, it attaches to couple hundred lines back. But strangely each attaches to a line with the exact same offset in between (by offset I mean those couple hundred lines that it skips).
Your logs is java stack logs.
You can try to use this pattern. Use the date as the pattern, which is the beginning of each log.
input {
stdin{}
}
filter {
multiline {
pattern => "^(?>\d\d){1,2}-(?:0?[1-9]|1[0-2])-(?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])"
what => "previous"
}
}
output {
stdout {
codec => "rubydebug"
}
}
This pattern parses the date, if the line do not start with date, logstash will multiline it.
I have try it with your logs, it's worked on both two logs.
Hope this can help you.

Cannot parse multiple files with logstash input "file"

I am trying to parse a folder with logstash on windows but I have some strange results.
I have a file samplea.log with the following content :
1
2
3
4
5
I have a file sampleb.log with the following content:
6
7
8
9
10
This is my config file:
input {
file {
path=> "C:/monitoring/samples/*.log"
}
}
output {
stdout { debug => "true" }
elasticsearch {
protocol => "transport"
host => "127.0.0.1"
}
}
For unknown reasons, the events displayed on my console are 6,7,8,9 and the same events are stored in elasticsearch. The last line of my file sampleb is ignored and the whole file samplea is ignored.
Thanks in advance for your help.
To solve this problem, I've patched original logstash (1.4.2) distribution with the filewatch-0.6.1 gem (original is filewatch-0.5.1)
Logstash is delimiting lines with return character. Make sure you hit enter on the last line.
Known bug, the inner library tracks the file with inode. On Windows the library use a function that always return 0. In your case, I think logstash reads the simpleb.log, then reads simplea.log from index 6, which is end-of-file.
Bug tracked here : https://github.com/logstash-plugins/logstash-input-file/issues/2

Resources