Logstash multiline failing with custom json parsing - elasticsearch

I have a Kafka queue of JSON objects, which I fill with a Java-based offline producer. The structure of the JSON objects is shown in this example:
{
  "key": "999998",
  "message": "dummy \n Messages \n Line 1 ",
  "type": "app_event",
  "stackTrace": "dummyTraces",
  "tags": "dummyTags"
}
Note the \n in the "message".
I loaded the queue with a million objects and started Logstash with the following configuration:
input {
  kafka {
    zk_connect => "localhost:2181"
    topic_id => "MemoryTest"
    type => "app_event"
    group_id => "dash_prod"
  }
}
filter {
  if [type] == "app_event" {
    multiline {
      pattern => "^\s"
      what => "previous"
    }
  }
}
output {
  if [type] == "app_event" {
    stdout {
      codec => rubydebug
    }
    elasticsearch {
      host => "localhost"
      protocol => "http"
      port => "9200"
      index => "app_events"
      index_type => "event"
    }
  }
}
I expected the multiline filter to remove the \n characters from the message field. When I start Logstash, I run into two issues:
None of the events are pushed into Elasticsearch; every event is tagged with _jsonparsefailure. Notice also that the message of one event 'gobbles up' the consecutive events:
{
"message" => "{ \n\t\"key\": \"146982\", \n\t\"message\" : \"dummy \n Messages \n Line 1 \", \n\t\"type\" : \"app_event\", \n\t\"stackTrace\" : \"dummyTraces\", \n\t\"tags\" : \"dummyTags\" \n \t \n}\n{ \n\t\"key\": \"146983\", \n\t\"message\" : \"dummy \n Messages \n Line 1 \", \n\t\"type\" : \"app_event\", \n\t\"stackTrace\" : \"dummyTraces\", \n\t\"tags\" : \"dummyTags\" \n \t \n}\n{ \n\t\"key\": \"146984\", \n\t\"message\" : \"dummy \n Messages \n Line 1 \", \n\t\"type\" : \"app_event\", \n\t\"stackTrace\" : \"dummyTraces\", \n\t\"tags\" : \"dummyTags\" \n \t \n},
"tags" => [
[0] "_jsonparsefailure",
1 "multiline"
],
"#version" => "1",
"#timestamp" => "2015-09-21T18:38:32.005Z",
"type" => "app_event"
}
After a few minutes, the available heap memory hit its cap and Logstash stopped.
A memory profile is attached to this issue; after 13 minutes, Logstash hit the memory cap and stopped responding.
I am trying to understand how to get multiline working for this scenario and what causes the memory crash.

To replace part of a string, use mutate->gsub{}.
filter {
  mutate {
    gsub => [
      # replace all forward slashes with underscore
      "fieldname", "/", "_",
      # replace backslashes, question marks, hashes, and minuses
      # with a dot "."
      "fieldname2", "[\\?#-]", "."
    ]
  }
}
multiline is, as you've discovered, for combining several events into one.
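For the Kafka input in the question, a minimal sketch of this approach, assuming each Kafka message arrives as one complete, self-contained JSON object (so a json codec can decode it before gsub flattens the embedded newlines):
input {
  kafka {
    zk_connect => "localhost:2181"
    topic_id => "MemoryTest"
    type => "app_event"
    group_id => "dash_prod"
    # assumption: every Kafka message is a single, complete JSON object
    codec => json
  }
}
filter {
  if [type] == "app_event" {
    mutate {
      # replace the literal newlines inside the message field with spaces
      gsub => [ "message", "\n", " " ]
    }
  }
}
With the JSON decoded at the input, there is nothing left for multiline to buffer, which should also avoid the unbounded event growth seen in the memory profile.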

Related

ElasticSearch - not setting the date type

I am trying the ELK stack, and so far so good :)
I have run into a strange situation regarding parsing the date field and sending it to Elasticsearch. I manage to parse the field, and it really does get created in Elasticsearch, but it always ends up as a string.
I have tried many different combinations, as well as many different things that people suggested, but I still fail.
This is my setup:
The strings that come from Filebeat:
[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {"key":"securitysecured_area"} []
[2017-04-26 09:50:42] request.INFO: Matched route "home_logged_in". {"route_parameters":{"controller":"AppBundle\Controller\HomeLoggedInController::showAction","locale":"de","route":"homelogged_in"},"request_uri":"https://qa.someserver.de/de/home"} []
The logstash parsing section:
if [@metadata][type] == "feprod" or [@metadata][type] == "feqa" {
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:logdate}" }
  }
  date {
    #timezone => "Europe/Berlin"
    match => [ "logdate", "yyyy-MM-dd HH:mm:ss" ]
  }
}
According to the documentation, my @timestamp field should be overwritten with the logdate value, but that is not happening.
In Elasticsearch I can see that the logdate field is created and has the value 2017-04-26 09:40:33, but its type is string.
I always create the index from scratch: I delete it first and let Logstash populate it.
I need either @timestamp overwritten with the actual date (not the date when it was indexed), or the logdate field created with a date type. Either is fine.
Unless you are explicitly adding [@metadata][type] somewhere that you aren't showing, that is your problem. It's not set by default; [type] is set by default from the type => parameter on your input.
You can validate this with a minimal complete example:
input {
  stdin {
    type => 'feprod'
  }
}
filter {
  if [@metadata][type] == "feprod" or [@metadata][type] == "feqa" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:logdate}" }
    }
    date {
      match => [ "logdate", "yyyy-MM-dd HH:mm:ss" ]
    }
  }
}
output {
  stdout { codec => "rubydebug" }
}
And running it:
echo '[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {"key":"securitysecured_area"} []' | bin/logstash -f test.conf
And getting the output:
{
    "@timestamp" => 2017-05-02T15:15:05.875Z,
    "@version" => "1",
    "host" => "xxxxxxxxx",
    "message" => "[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {\"key\":\"securitysecured_area\"} []",
    "type" => "feprod",
    "tags" => []
}
If you use just if [type] == ... instead, it will work fine:
{
    "@timestamp" => 2017-04-26T14:40:33.000Z,
    "logdate" => "2017-04-26 09:40:33",
    "@version" => "1",
    "host" => "xxxxxxxxx",
    "message" => "[2017-04-26 09:40:33] security.DEBUG: Stored the security token in the session. {\"key\":\"securitysecured_area\"} []",
    "type" => "feprod",
    "tags" => []
}
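Applied to the original configuration, the fix is just the conditional; a sketch of the corrected filter section (the grok and date parts are unchanged from the question):
filter {
  # [type] is set by the input's type => parameter, unlike [@metadata][type]
  if [type] == "feprod" or [type] == "feqa" {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:logdate}" }
    }
    date {
      # overwrites @timestamp with the parsed logdate value
      match => [ "logdate", "yyyy-MM-dd HH:mm:ss" ]
    }
  }
}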

Grok parse error while parsing multiple line messages

I am trying to figure out a grok pattern for parsing multi-line messages such as exception traces; below is one such log:
2017-03-30 14:57:41 [12345] [qtp1533780180-12] ERROR com.app.XYZ - Exception occurred while processing
java.lang.NullPointerException: null
at spark.webserver.MatcherFilter.doFilter(MatcherFilter.java:162)
at spark.webserver.JettyHandler.doHandle(JettyHandler.java:61)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:189)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
at org.eclipse.jetty.server.Server.handle(Server.java:517)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:302)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:245)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)
Here is my logstash.conf
input {
  file {
    path => ["/debug.log"]
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => previous
    }
  }
}
filter {
  mutate {
    gsub => ["message", "r", ""]
  }
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} \[%{NOTSPACE:uid}\] \[%{NOTSPACE:thread}\] %{LOGLEVEL:loglevel} %{DATA:class}\-%{GREEDYDATA:message}" ]
    overwrite => [ "message" ]
  }
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss" ]
  }
}
output {
  elasticsearch { hosts => localhost }
  stdout { codec => rubydebug }
}
This works fine for parsing single-line logs, but for multi-line exception traces it fails with
[0] "_grokparsefailure"
Can someone please suggest the correct filter pattern for parsing multi-line logs?
If you are working with multiline logs, use the multiline filter provided by Logstash. You first need to tell the multiline filter how to recognize the start of a new record; from your logs I can see that a new record starts with a timestamp.
Example usage:
filter {
  multiline {
    type => "/debug.log"
    pattern => "^%{TIMESTAMP}"
    what => "previous"
  }
}
You can then use gsub to replace the "\n" and "\r" characters that the multiline filter adds to your record. After that, use grok.
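Putting those steps together, a rough sketch of that ordering (multiline first, then gsub to strip the joined line breaks, then grok; it reuses the timestamp-anchored pattern and grok expression from the question and is an illustration rather than a tested config):
filter {
  multiline {
    pattern => "^%{TIMESTAMP_ISO8601}"
    negate => true
    what => "previous"
  }
  mutate {
    # remove the \r and \n characters introduced when the lines are joined
    gsub => [ "message", "\r", "", "message", "\n", " " ]
  }
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} \[%{NOTSPACE:uid}\] \[%{NOTSPACE:thread}\] %{LOGLEVEL:loglevel} %{DATA:class}\-%{GREEDYDATA:message}" ]
    overwrite => [ "message" ]
  }
}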
The above logstash config worked fine after removing
mutate {
  gsub => ["message", "r", ""]
}
So the working Logstash config for parsing both single-line and multi-line inputs for the above log pattern is:
input {
  file {
    path => ["./debug.log"]
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => previous
    }
  }
}
filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} \[%{NOTSPACE:uid}\] \[%{NOTSPACE:thread}\] %{LOGLEVEL:loglevel} %{DATA:class}\-%{GREEDYDATA:message}" ]
    overwrite => [ "message" ]
  }
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss" ]
  }
}
output {
  elasticsearch { hosts => localhost }
  stdout { codec => rubydebug }
}

Logstash can't add fields?

I have been using Logstash to read some DB restore logs. Here are some lines of sample records:
07/08/2016 6:33:22.50: START restore database
SQL2540W Restore is successful, however a warning "2539" was encountered
during Database Restore while processing in No Interrupt mode.
07/08/2016 6:33:28.93: END restore database
SQL4406W The DB2 Administration Server was started successfully.
07/08/2016 6:35:35.29: END restart server
connect reset
DB20000I The SQL command completed successfully.
07/08/2016 6:35:38.48: END p:\s6\source\system\CMD\res_uw.cmd
Here is the filter part of my conf file.
if ([message] =~ /Backup successful/) {
  grok {
    match => { "message" => ['%{GREEDYDATA:Message}'] }
  }
  mutate {
    add_tag => "send_to_es"
    add_field => { "Timestamp" => "%{GREEDYDATA:DATETIME}" }
  }
}
if ([message] =~ /warning "2539"/) {
  grok {
    match => { "message" => ['%{GREEDYDATA:Message}'] }
  }
  mutate {
    add_tag => "send_to_es"
    add_field => { "Timestamp" => "%{GREEDYDATA:DATETIME}" }
  }
}
if ([message] =~ /(END p:|END P:)/) {
  grok {
    match => { "message" => ['%{GREEDYDATA:DATETIME}:%{SPACE}END%{SPACE}%{GREEDYDATA:Mis}'] }
    remove_field => "%{GREEDYDATA:Mis}"
  }
  mutate {
    add_tag => "send_to_es"
  }
}
I want to take the DATETIME value extracted from the last line of the record and add it to the other messages when they are indexed. However, the field is not added successfully. The output becomes:
"message": "SQL2540W Restore is successful, however a warning \"2539\" was encountered \r\r",
"#version": "1",
"#timestamp": "2016-07-12T02:28:52.337Z",
"path": "C:/CIGNA/hkiapp67_db_restore/res_uw.log",
"host": "SIMSPad",
"type": "txt",
"Message": "SQL2540W Restore is successful, however a warning \"2539\" was encountered \r\r",
"Timestamp": "%{GREEDYDATA:DATETIME}",
"tags": [
"send_to_es"
]
How could I solve this?
Logstash, when receiving a line, does not have knowledge of any other line. You'll have to use a multiline codec or filter to group the lines you need together with the line containing the date. Then you use the grok filter to extract the date and add it to your document.
The configuration of the multiline codec/filter will look like this:
multiline {
  pattern => "%{DATE}"
  negate => "true"
  what => "next"
}
With this, all the lines not beginning with the pattern DATE will be joined with the next line.
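Once the lines are combined, extracting the date is a separate grok step. A rough sketch, using the multiline block above as a filter (the DATESTAMP pattern is an approximation for the "07/08/2016 6:33:22.50" prefix in the sample log, and the sprintf reference %{DATETIME} replaces the grok-style %{GREEDYDATA:DATETIME} that add_field cannot evaluate):
filter {
  multiline {
    pattern => "%{DATE}"
    negate => "true"
    what => "next"
  }
  grok {
    # DATESTAMP matches a date/time such as "07/08/2016 6:33:22.50"
    match => { "message" => "%{DATESTAMP:DATETIME}" }
  }
  mutate {
    add_tag => "send_to_es"
    # sprintf references an already-extracted field; grok syntax does not work inside add_field
    add_field => { "Timestamp" => "%{DATETIME}" }
  }
}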

Having trouble parsing checkpoint firewall logs using grok filter

They are Check Point firewall logs and they look like this (first row = field names, second row and all rows thereafter = values of the respective fields):
"Number" "Date" "Time" "Interface" "Origin" "Type" "Action" "Service" "Source Port" "Source" "Destination" "Protocol" "Rule" "Rule Name" "Current Rule Number" "Information"
"7319452" "18Mar2015" "15:00:00" "eth1-04" "grog1" "Log" "Accept" "domain-udp" "20616" "172.16.36.250" "8.8.8.8" "udp" "7" "" "7-open_1" "inzone: Internal; outzone: External; service_id: domain-udp" "Security Gateway/Management"
I have tried doing this bit by bit using some code I found online (grok filters).
I have a file that has nothing more than
"GoLpoT" "502" (quotes included)
and some code that reads this file which is pasted below:
input {
  file {
    path => "/usr/local/bin/firewall_log"
  }
}
filter {
  grok {
    match => ["message", "%{WORD:type}\|%{NUMBER:nums}"]
  }
}
output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
When I run the code, I get the following output with a _grokparsefailure:
"message" => "",
"#version" => "1",
"#timestamp" => "2015-04-30T15:52:48.331Z",
"host" => "UOD-220076",
"path" => "/usr/local/bin/firewall_log",
"tags" => [
[0] "_grokparsefailure"
Any help please.
My second question - how do I parse the Date and Time - together or separately ?
The date doesn't change - it's all logs from one day - it's only the time that changes.
Many thanks.
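For the two-field test file, one hedged sketch is to match quoted, space-separated tokens with the QS (quoted string) grok pattern instead of the pipe-delimited pattern above; the field names here are purely illustrative:
filter {
  grok {
    # QS matches a double-quoted string, quotes included
    match => { "message" => "%{QS:type} %{QS:nums}" }
  }
  mutate {
    # strip the surrounding quotes captured by QS
    gsub => [ "type", '"', "", "nums", '"', "" ]
  }
}
The same quoted-token approach could be extended to the full firewall line, with the date and time captured as separate fields and combined in a date filter if needed.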

Logstash Doesn't Read Entire Line With File Input

I'm using Logstash and I'm having trouble getting a rather simple configuration to work.
input {
  file {
    path => "C:/path/test-data/*.log"
    start_position => beginning
    type => "usage_data"
  }
}
filter {
  if [type] == "usage_data" {
    grok {
      match => { "message" => "^\s*%{NUMBER:lineNumber}\s+%{TIMESTAMP_ISO8601:date},(?<value1>[A-Za-z0-9+/]+),(?<value2>[A-Za-z0-9+/]+),(?<value3>[A-Za-z0-9+/]+),(?<value4>[^,]+),(?<value5>[^\r]*)" }
    }
  }
  if "_grokparsefailure" not in [tags] {
    drop { }
  }
}
output {
  stdout { codec => rubydebug }
}
I call Logstash like this:
SET LS_MAX_MEM=2g
DEL "%USERPROFILE%\.sincedb_*" 2> NUL
"C:\Program Files (x86)\logstash-1.4.1\bin\logstash.bat" agent -p "C:\path\\." -w 1 -f "logstash.conf"
The output:
Using milestone 2 input plugin 'file'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.1/plugin-milestones {:level=>:warn}
{
    "message" => ",",
    "@version" => "1",
    "@timestamp" => "2014-11-20T09:16:08.591Z",
    "type" => "usage_data",
    "host" => "my-machine",
    "path" => "C:/path/test-data/monitor_20141116223000.log",
    "tags" => [
        [0] "_grokparsefailure"
    ]
}
If I parse only C:\path\test-data\monitor_20141116223000.log all lines are read and there is no grokparsefailure. If I remove C:\path\test-data\monitor_20141116223000.log the same grokparsefailure pops up in another log-file:
{
    "message" => "atches in another context\r",
    "@version" => "1",
    "@timestamp" => "2014-11-20T09:14:04.779Z",
    "type" => "usage_data",
    "host" => "my-machine",
    "path" => "C:/path/test-data/monitor_20140829235900.log",
    "tags" => [
        [0] "_grokparsefailure"
    ]
}
Especially the last output shows that Logstash doesn't read the entire line, or interprets a newline where there is none. It always breaks at the same line, at the same position.
Maybe I should add that the log files use \n as the line separator and that I'm running Logstash on Windows. However, I'm not getting a whole lot of errors, just that one, and there are quite a lot of lines in there; they all appear properly when I remove the if "_grokparsefailure" ... block.
I assume that there is some problem with buffering, but I have no clue how to make this work. Any ideas?
Workaround:
# diff -Nur /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb.orig /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb
--- /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb.orig 2015-02-25 10:46:06.916321816 +0700
+++ /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb 2015-02-12 18:39:34.943833909 +0700
@@ -86,7 +86,9 @@
         _read_file(path, &block)
         @files[path].close
         @files.delete(path)
-        @statcache.delete(path)
+        #@statcache.delete(path)
+        inode = @statcache.delete(path)
+        @sincedb[inode] = 0
       else
         @logger.warn("unknown event type #{event} for #{path}")
       end
