logstash multiline codec with java stack trace - elasticsearch

I am trying to parse a log file with grok. The configuration I use allows me to parse a single-line event, but not a multiline one (with a Java stack trace).
What I get in Kibana for a single-line event:
{
"_index": "logstash-2015.02.05",
"_type": "logs",
"_id": "mluzA57TnCpH-XBRbeg",
"_score": null,
"_source": {
"message": " - 2014-01-14 11:09:35,962 [main] INFO (api.batch.ThreadPoolWorker) user.country=US",
"#version": "1",
"#timestamp": "2015-02-05T09:38:21.310Z",
"path": "/root/test2.log",
"time": "2014-01-14 11:09:35,962",
"main": "main",
"loglevel": "INFO",
"class": "api.batch.ThreadPoolWorker",
"mydata": " user.country=US"
},
"sort": [
1423129101310,
1423129101310
]
}
What I get for a multiline event with a stack trace:
{
"_index": "logstash-2015.02.05",
"_type": "logs",
"_id": "9G6LsSO-aSpsas_jOw",
"_score": null,
"_source": {
"message": "\tat oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:20)",
"#version": "1",
"#timestamp": "2015-02-05T09:38:21.380Z",
"path": "/root/test2.log",
"tags": [
"_grokparsefailure"
]
},
"sort": [
1423129101380,
1423129101380
]
}
input {
file {
path => "/root/test2.log"
start_position => "beginning"
codec => multiline {
pattern => "^ - %{TIMESTAMP_ISO8601} "
negate => true
what => "previous"
}
}
}
filter {
grok {
match => [ "message", " -%{SPACE}%{SPACE}%{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel}%{SPACE}%{SPACE}\(%{JAVACLASS:class}\) %{GREEDYDATA:mydata} %{JAVASTACKTRACEPART}"]
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch {
host => "194.3.227.23"
}
# stdout { codec => rubydebug}
}
Can anyone please tell me what I'm doing wrong in my configuration file? Thanks.
Here's a sample of my log file:
- 2014-01-14 11:09:36,447 [main] INFO (support.context.ContextFactory) Creating default context
- 2014-01-14 11:09:38,623 [main] ERROR (support.context.ContextFactory) Error getting connection to database jdbc:oracle:thin:@HAL9000:1521:DEVPRINT, with user cisuser and driver oracle.jdbc.driver.OracleDriver
java.sql.SQLException: ORA-28001: the password has expired
at oracle.jdbc.driver.SQLStateMapping.newSQLException(SQLStateMapping.java:70)
at oracle.jdbc.driver.DatabaseError.newSQLException(DatabaseError.java:131)
EDIT: here's the latest configuration I'm using:
https://gist.github.com/anonymous/9afe80ad604f9a3d3c00#file-output-L1

First point: when testing repeatedly with the file input, be sure to use sincedb_path => "/dev/null" so that the file is read from the beginning on every run.
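For example, a minimal sketch of the input section from the question with that setting added (same path and multiline codec as above):
input {
  file {
    path => "/root/test2.log"
    start_position => "beginning"
    # forget the read position between runs so the whole file is re-read
    sincedb_path => "/dev/null"
    codec => multiline {
      pattern => "^ - %{TIMESTAMP_ISO8601} "
      negate => true
      what => "previous"
    }
  }
}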
About multiline, there must be something wrong either with your question content or with your multiline pattern, because none of the events have the multiline tag that the multiline codec or filter adds when it aggregates lines.
Your message field should contain all the lines, separated by line feed characters \n (\r\n in my case, since I am on Windows). Here is the expected output from your input configuration:
{
"#timestamp" => "2015-02-10T11:03:33.298Z",
"message" => " - 2014-01-14 11:09:35,962 [main] INFO (api.batch.ThreadPoolWorker) user.country=US\r\n\tat oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:20\r",
"#version" => "1",
"tags" => [
[0] "multiline"
],
"host" => "localhost",
"path" => "/root/test.file"
}
About grok: since you want to match a multiline string, you should use a pattern like this.
filter {
grok {
match => {"message" => [
"(?m)^ -%{SPACE}%{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] % {LOGLEVEL:loglevel}%{SPACE}\(%{JAVACLASS:class}\) %{DATA:mydata}\n%{GREEDYDATA:stack}",
"^ -%{SPACE}%{TIMESTAMP_ISO8601:time} \[%{WORD:main}\] %{LOGLEVEL:loglevel}%{SPACE}\(%{JAVACLASS:class}\) %{GREEDYDATA:mydata}"]
}
}
}
The (?m) prefix instructs the regex engine to do multiline matching.
And then you get an event like
{
"#timestamp" => "2015-02-10T10:47:20.078Z",
"message" => " - 2014-01-14 11:09:35,962 [main] INFO (api.batch.ThreadPoolWorker) user.country=US\r\n\tat oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:20\r",
"#version" => "1",
"tags" => [
[0] "multiline"
],
"host" => "localhost",
"path" => "/root/test.file",
"time" => "2014-01-14 11:09:35,962",
"main" => "main",
"loglevel" => "INFO",
"class" => "api.batch.ThreadPoolWorker",
"mydata" => " user.country=US\r",
"stack" => "\tat oracle.jdbc.driver.DatabaseError.throwSqlException(DatabaseError.java:20\r"
}
You can build and validate your multiline patterns with this online tool http://grokconstructor.appspot.com/do/match
A final warning: there is currently a bug in the Logstash file input with the multiline codec that mixes up content from several files if you use a list or a wildcard in the path setting. The only workaround is to use the multiline filter.
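If you go that route, here is a minimal sketch of the equivalent multiline filter, reusing the same pattern, negate and what options as the codec above (it has to come before the grok filter):
filter {
  multiline {
    pattern => "^ - %{TIMESTAMP_ISO8601} "
    negate => true
    what => "previous"
  }
  # grok / date filters follow as shown earlier
}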
HTH
EDIT: I was focusing on the multiline strings; you also need to add a similar pattern for single-line strings (the second entry in the pattern array above).

Related

How to get fields inside message array from Logstash?

I've been trying to configure a Logstash pipeline whose input type is snmptrap, along with yamlmibdir. Here's the code:
input {
snmptrap {
host => "abc"
port => 1062
yamlmibdir => "/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/snmp-1.3.2/data/ruby/snmp/mibs"
}
}
filter {
mutate {
gsub => ["message","^\"{","{"]
gsub => ["message","}\"$","}"]
gsub => ["message","[\\]",""]
}
json { source => "message" }
split {
field => "message"
target => "evetns"
}
}
output {
elasticsearch {
hosts => "xyz"
index => "logstash-%{+YYYY.MM.dd}"
}
stdout { codec => rubydebug }
}
and the result shown in Kibana (JSON format)
{
"_index": "logstash-2019.11.18-000001",
"_type": "_doc",
"_id": "Y_5zjG4B6M9gb7sxUJwG",
"_version": 1,
"_score": null,
"_source": {
"#version": "1",
"#timestamp": "2019-11-21T05:33:07.675Z",
"tags": [
"_jsonparsefailure"
],
"1.11.12.13.14.15": "teststring",
"message": "#<SNMP::SNMPv1_Trap:0x244bf33f #enterprise=[1.2.3.4.5.6], #timestamp=#<SNMP::TimeTicks:0x196a1590 #value=55>, #varbind_list=[#<SNMP::VarBind:0x21f5e155 #name=[1.11.12.13.14.15], #value=\"teststring\">], #specific_trap=99, #source_ip=\"xyz\", #agent_addr=#<SNMP::IpAddress:0x5a5c3c5f #value=\"xC0xC1xC2xC3\">, #generic_trap=6>",
"host": "xyz"
},
"fields": {
"#timestamp": [
"2019-11-21T05:33:07.675Z"
]
},
"sort": [
1574314387675
]
}
As you can see, the message field is an array, so how can I get all the fields inside the array, and also be able to select those fields to display in Kibana?
PS1: I still get the _jsonparsefailure tag if I select the 'Table' type in the Expanded document view.
PS2: even though I use gsub to remove '\' from the expected JSON result, why do I still get a result with '\'?

Distributed tracing and Elastic Stack visualisation

2019-06-03 10:45:00.051 INFO [currency-exchange,411a0496b048bcf4,8d40fcfea92613ad,true] 45648 --- [x-Controller-10] logger : inside exchange
This is the log format in my console. I am using Spring Cloud Stream to transport my logs from the application to Logstash. This is the grok pattern I use for parsing in Logstash:
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:severity}\s+\[%{DATA:service},%{DATA:trace},%{DATA:span},%{DATA:exportable}\]\s+%{DATA:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{GREEDYDATA:rest}" }
}
This is my logstash.conf:
input {
  kafka {
    topics => ['zipkin']
  }
}
filter {
  # pattern matching logback pattern
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:severity}\s+\[%{DATA:service},%{DATA:trace},%{DATA:span},%{DATA:exportable}\]\s+%{DATA:pid}\s+---\s+\[%{DATA:thread}\]\s+%{DATA:class}\s+:\s+%{GREEDYDATA:rest}" }
  }
}
output {
  elasticsearch {
    hosts => ['localhost:9200']
    index => 'logging'
  }
  stdout {}
}
And this is my output in the Logstash console, which shows a parsing failure:
{
"message" => "[{\"traceId\":\"411a0496b048bcf4\",\"parentId\":\"8d40fcfea92613ad\",\"id\":\"f14c1c332d2ef077\",\"kind\":\"CLIENT\",\"name\":\"get\",\"timestamp\":1559538900053889,\"duration\":16783,\"localEndpoint\":{\"serviceName\":\"currency-exchange\",\"ipv4\":\"10.8.0.7\"},\"tags\":{\"http.method\":\"GET\",\"http.path\":\"/convert/1/to/4\"}},{\"traceId\":\"411a0496b048bcf4\",\"parentId\":\"411a0496b048bcf4\",\"id\":\"8d40fcfea92613ad\",\"name\":\"hystrix\",\"timestamp\":1559538900050039,\"duration\":34500,\"localEndpoint\":{\"serviceName\":\"currency-exchange\",\"ipv4\":\"10.8.0.7\"}},{\"traceId\":\"411a0496b048bcf4\",\"id\":\"411a0496b048bcf4\",\"kind\":\"SERVER\",\"name\":\"get
/convert\",\"timestamp\":1559538900041446,\"duration\":44670,\"localEndpoint\":{\"serviceName\":\"currency-exchange\",\"ipv4\":\"10.8.0.7\"},\"remoteEndpoint\":{\"ipv6\":\"::1\",\"port\":62200},\"tags\":{\"http.method\":\"GET\",\"http.path\":\"/convert\",\"mvc.controller.class\":\"Controller\",\"mvc.controller.method\":\"convert\"}}]",
"#timestamp" => 2019-06-03T05:15:00.296Z,
"#version" => "1",
"tags" => [
[0] "_grokparsefailure"
] }
When I use the Grok Debugger that is built into Kibana (under Dev Tools) I get the following result from your sample log and grok pattern:
{
"severity": "DEBUG",
"rest": "GET \"/convert/4/to/5\", parameters={}",
"pid": "35973",
"thread": "nio-9090-exec-1",
"trace": "62132b44a444425e",
"exportable": "true",
"service": "currency-conversion",
"class": "o.s.web.servlet.DispatcherServlet",
"timestamp": "2019-05-31 05:31:42.667",
"span": "62132b44a444425e"
}
That looks correct to me. So what is the missing part?
Also the logging output you are showing contains "ipv4":"192.168.xx.xxx"},"remoteEndpoint": {"ipv6":"::1","port":55394},"tags": ..., which is not in the sample log. Where is that coming from?

insert data into elasticsearch using logstash and visualize in kibana

I have the following CSV file
tstp,voltage_A_real,voltage_B_real,voltage_C_real #header not present in actual file
2000-01-01 00:00:00,2535.53,-1065.7,-575.754
2000-01-01 01:00:00,2528.31,-1068.67,-576.866
2000-01-01 02:00:00,2528.76,-1068.49,-576.796
2000-01-01 03:00:00,2530.12,-1067.93,-576.586
2000-01-01 04:00:00,2531.02,-1067.56,-576.446
2000-01-01 05:00:00,2533.28,-1066.63,-576.099
2000-01-01 06:00:00,2535.53,-1065.7,-575.754
2000-01-01 07:00:00,2535.53,-1065.7,-575.754
....
I am trying to insert the data into elasticsearch through logstash and have the following logstash config
input {
file {
path => "path_to_csv_file"
sincedb_path=> "/dev/null"
start_position => beginning
}
}
filter {
csv {
columns => [
"tstp",
"Voltage_A_real",
"Voltage_B_real",
"Voltage_C_real"
]
separator => ","
}
date {
match => [ "tstp", "yyyy-MM-dd HH:mm:ss"]
}
mutate {
convert => ["Voltage_A_real", "float"]
convert => ["Voltage_B_real", "float"]
convert => ["Voltage_C_real", "float"]
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "temp_load_index"
}
}
My output from rubydebug when I run logstash -f conf_file -v is
{
"message" => "2000-02-18 16:00:00,2532.38,-1067,-576.238",
"#version" => "1",
"#timestamp" => "2000-02-18T21:00:00.000Z",
"path" => "path_to_csv",
"host" => "myhost",
"tstp" => "2000-02-18 16:00:00",
"Voltage_A_real" => 2532.38,
"Voltage_B_real" => -1067.0,
"Voltage_C_real" => -576.238
}
However, I see only 2 events in Kibana when I look at the dashboard, and both have the current datetime stamp rather than one from the year 2000, which is the range of my data. Could someone please help me figure out what is happening?
A sample kibana object is as follows
{
"_index": "temp_load_index",
"_type": "logs",
"_id": "myid",
"_score": null,
"_source": {
"message": "2000-04-02 02:00:00,2528.76,-1068.49,-576.796",
"#version": "1",
"#timestamp": "2016-09-27T05:15:29.753Z",
"path": "path_to_csv",
"host": "myhost",
"tstp": "2000-04-02 02:00:00",
"Voltage_A_real": 2528.76,
"Voltage_B_real": -1068.49,
"Voltage_C_real": -576.796,
"tags": [
"_dateparsefailure"
]
},
"fields": {
"#timestamp": [
1474953329753
]
},
"sort": [
1474953329753
]
}
When you open Kibana, it usually shows you only the events from the last 15 minutes, according to the @timestamp field. So you need to set the time filter to the appropriate time range (cf. the documentation), in your case using the absolute option and starting at 2000-01-01.
Or you can put the parsed timestamp in another field (for example original_tst), so that the @timestamp added by Logstash will be kept:
date {
match => [ "tstp", "yyyy-MM-dd HH:mm:ss"]
target => "original_tst"
}

logstash splits event field values and assigns them to the @metadata field

I have a logstash event, which has the following field
{
"_index": "logstash-2016.08.09",
"_type": "log",
"_id": "AVZvz2ix",
"_score": null,
"_source": {
"message": "function_name~execute||line_no~128||debug_message~id was not found",
"#version": "1",
"#timestamp": "2016-08-09T14:57:00.147Z",
"beat": {
"hostname": "coredev",
"name": "coredev"
},
"count": 1,
"fields": null,
"input_type": "log",
"offset": 22299196,
"source": "/project_root/project_1/log/core.log",
"type": "log",
"host": "coredev",
"tags": [
"beats_input_codec_plain_applied"
]
},
"fields": {
"#timestamp": [
1470754620147
]
},
"sort": [
1470754620147
]
}
I am wondering how to use a filter (kv maybe?) to extract core.log from "source": "/project_root/project_1/log/core.log" and put it in e.g. [@metadata][log_type], so that later on I can use log_type in the output to create a unique index composed of hostname + logtype + timestamp, e.g.
output {
elasticsearch {
hosts => "localhost:9200"
manage_template => false
index => "%{[#metadata][_source][host]}-%{[#metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
stdout { codec => rubydebug }
}
You can leverage the mutate/gsub filter in order to achieve this:
filter {
# add the log_type metadata field
mutate {
add_field => {"[#metadata][log_type]" => "%{source}"}
}
# remove everything up to the last slash
mutate {
gsub => [ "[#metadata][log_type]", "^.*\/", "" ]
}
}
Then you can modify your elasticsearch output like this:
output {
elasticsearch {
hosts => ["localhost:9200"]
manage_template => false
index => "%{host}-%{[#metadata][log_type]}-%{+YYYY.MM.dd}"
document_type => "%{[#metadata][type]}"
}
stdout { codec => rubydebug }
}
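Note that @metadata fields never appear in the output by default, so if you want to double-check the value of [@metadata][log_type] while testing, you can ask the rubydebug codec to print metadata as well; a small sketch:
output {
  # metadata => true makes rubydebug display the @metadata fields too
  stdout { codec => rubydebug { metadata => true } }
}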

Logstash and ElasticSearch date filter @timestamp issue

I'm trying to index some data from a file into ElasticSearch by using Logstash.
If I'm not using the date filter to replace @timestamp, everything works very well, but when I'm using the filter I do not get all the data.
I can't figure out why there is a difference between the Logstash command line and Elasticsearch in the @timestamp value.
Logstash conf
filter {
mutate {
replace => {
"type" => "dashboard_a"
}
}
grok {
match => [ "message", "%{DATESTAMP:Logdate} \[%{WORD:Severity}\] %{JAVACLASS:Class} %{GREEDYDATA:Stack}" ]
}
date {
match => [ "Logdate", "dd-MM-yyyy hh:mm:ss,SSS" ]
}
}
Logstash Command line trace
{
**"#timestamp" => "2014-08-26T08:16:18.021Z",**
"message" => "26-08-2014 11:16:18,021 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r",
"#version" => "1",
"host" => "bts10d1",
"path" => "D:\\ElasticSearch\\logstash-1.4.2\\Dashboard_A\\Log_1\\6.log",
"type" => "dashboard_a",
"Logdate" => "26-08-2014 11:16:18,021",
"Severity" => "DEBUG",
"Class" => "com.fnx.snapshot.mdb.SnapshotMDB",
"Stack" => " - SnapshotMDB Ctor is called\r"
}
ElasticSearch result
{
"_index": "logstash-2014.08.28",
"_type": "dashboard_a",
"_id": "-y23oNeLQs2mMbyz6oRyew",
"_score": 1,
"_source": {
**"#timestamp": "2014-08-28T14:31:38.753Z",
**"message": "15:07,565 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r",
"#version": "1",
"host": "bts10d1",
"path": "D:\\ElasticSearch\\logstash-1.4.2\\Dashboard_A\\Log_1\\6.log",
"type": "dashboard_a",
"tags": ["_grokparsefailure"]
}
}
Please make sure all your logs are in the same format!
You can see in the Logstash command line trace that the log line is
26-08-2014 11:16:18,021 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r
But in Elasticsearch the log is
15:07,565 [DEBUG] com.fnx.snapshot.mdb.SnapshotMDB - SnapshotMDB Ctor is called\r",
The two logs have different times and their formats are not the same! The second one does not have any date information, so it causes a grok parsing error. Go check the original logs, or provide a sample of the original logs for more discussion if all of them are in a consistent format.
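As an aside, if those date-less lines turn out to be continuation lines of a longer entry (wrapped messages, stack traces), the multiline codec from the first question on this page can glue them back onto the event that carries the timestamp. A hypothetical sketch, reusing the path from the question and assuming every new entry starts with a date like 26-08-2014 11:16:18,021:
input {
  file {
    path => "D:/ElasticSearch/logstash-1.4.2/Dashboard_A/Log_1/6.log"
    codec => multiline {
      # any line that does not start with a date is appended to the previous event
      pattern => "^%{DATESTAMP}"
      negate => true
      what => "previous"
    }
  }
}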
