Logstash can't add fields? - elasticsearch

I have been using Logstash to read some DB restore logs. Here are some lines of sample records.
07/08/2016 6:33:22.50: START restore database
SQL2540W Restore is successful, however a warning "2539" was encountered
during Database Restore while processing in No Interrupt mode.
07/08/2016 6:33:28.93: END restore database
SQL4406W The DB2 Administration Server was started successfully.
07/08/2016 6:35:35.29: END restart server
connect reset
DB20000I The SQL command completed successfully.
07/08/2016 6:35:38.48: END p:\s6\source\system\CMD\res_uw.cmd
Here is the filter part of my conf file.
if ([message] =~ /Backup successful/) {
  grok {
    match => { "message" => ['%{GREEDYDATA:Message}'] }
  }
  mutate {
    add_tag => "send_to_es"
    add_field => { "Timestamp" => "%{GREEDYDATA:DATETIME}" }
  }
}
if ([message] =~ /warning "2539"/) {
  grok {
    match => { "message" => ['%{GREEDYDATA:Message}'] }
  }
  mutate {
    add_tag => "send_to_es"
    add_field => { "Timestamp" => "%{GREEDYDATA:DATETIME}" }
  }
}
if ([message] =~ /(END p:|END P:)/) {
  grok {
    match => { "message" => ['%{GREEDYDATA:DATETIME}:%{SPACE}END%{SPACE}%{GREEDYDATA:Mis}'] }
    remove_field => "%{GREEDYDATA:Mis}"
  }
  mutate {
    add_tag => "send_to_es"
  }
}
I want to add the DATETIME value extracted from the last line of my record to the other messages so that they are indexed with the same timestamp. However, the field could not be added successfully. The output becomes:
"message": "SQL2540W Restore is successful, however a warning \"2539\" was encountered \r\r",
"#version": "1",
"#timestamp": "2016-07-12T02:28:52.337Z",
"path": "C:/CIGNA/hkiapp67_db_restore/res_uw.log",
"host": "SIMSPad",
"type": "txt",
"Message": "SQL2540W Restore is successful, however a warning \"2539\" was encountered \r\r",
"Timestamp": "%{GREEDYDATA:DATETIME}",
"tags": [
"send_to_es"
]
How could I solve this?

Logstash, when it receives a line, has no knowledge of any other line. You'll have to use a multiline codec or filter to group the lines you need together with the line that contains the date. Then you use the grok filter to extract the date and add it to your document.
The configuration of the multiline codec/filter will look like this:
multiline {
  pattern => "%{DATE}"
  negate => "true"
  what => "next"
}
With this, all the lines not beginning with the pattern DATE will be joined with the next line.
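To tie the two steps together, here is a minimal sketch of the follow-up grok/mutate step (the DATESTAMP pattern and the DATETIME/Message field names are assumptions based on the sample log, not the original poster's config). The key point is that an already-captured field is referenced as %{DATETIME}, not %{GREEDYDATA:DATETIME}, which is why the original add_field produced a literal string:
filter {
  # After the multiline codec has joined the record, pull the leading
  # date out of the first line and keep the rest as the message text.
  grok {
    match => { "message" => "%{DATESTAMP:DATETIME}:%{SPACE}%{GREEDYDATA:Message}" }
  }
  # Only copy the value when grok actually matched something.
  if "_grokparsefailure" not in [tags] {
    mutate {
      add_tag   => "send_to_es"
      add_field => { "Timestamp" => "%{DATETIME}" }   # sprintf reference to the captured field
    }
  }
}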

Related

How to filter data with Logstash before storing parsed data in Elasticsearch

I understand that Logstash is for aggregating and processing logs. I have NGINX logs and have my Logstash config set up as:
filter {
  grok {
    match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"]
    overwrite => [ "message" ]
  }
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
  useragent {
    source => "agent"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "weblogs-%{+YYYY.MM}"
    document_type => "nginx_logs"
  }
  stdout { codec => rubydebug }
}
This would parse the unstructured logs into a structured form of data, and store the data into monthly indexes.
What I discovered is that the majority of logs were contributed by robots/web-crawlers. In Python I would filter them out with:
browser_names = browser_names[~browser_names.str.\
match('^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]*$', na=False)]
However, I would like to filter them out with Logstash so I can save a lot of disk space in Elasticsearch server. Is there a way to do that? Thanks in advance!
Thanks LeBigCat for generously giving a hint. I solved this problem by adding the following under the filter:
if [browser_names] =~ /(?i)^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]*$/ {
  drop {}
}
The (?i) flag is for case-insensitive matching.
In your filter you can use the drop filter (https://www.elastic.co/guide/en/logstash/current/plugins-filters-drop.html). As you already have your pattern, it should be pretty fast ;)
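For illustration, a hedged sketch of that approach against the config above might look like this (it keys off the agent field that %{COMBINEDAPACHELOG} already captures; the regex is only an example):
filter {
  # Drop events whose user-agent string looks like a crawler before they
  # reach the elasticsearch output. The "agent" field comes from the
  # COMBINEDAPACHELOG grok pattern used earlier in the filter.
  if [agent] =~ /(?i)(google|bot|spider|crawl|headless)/ {
    drop {}
  }
}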

Grok parse error while parsing multiple line messages

I am trying to figure out a grok pattern for parsing multi-line messages such as exception traces, and below is one such log:
2017-03-30 14:57:41 [12345] [qtp1533780180-12] ERROR com.app.XYZ - Exception occurred while processing
java.lang.NullPointerException: null
at spark.webserver.MatcherFilter.doFilter(MatcherFilter.java:162)
at spark.webserver.JettyHandler.doHandle(JettyHandler.java:61)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:189)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
at org.eclipse.jetty.server.Server.handle(Server.java:517)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:302)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:245)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)
Here is my logstash.conf
input {
  file {
    path => ["/debug.log"]
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => previous
    }
  }
}
filter {
  mutate {
    gsub => ["message", "r", ""]
  }
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} \[%{NOTSPACE:uid}\] \[%{NOTSPACE:thread}\] %{LOGLEVEL:loglevel} %{DATA:class}\-%{GREEDYDATA:message}" ]
    overwrite => [ "message" ]
  }
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss" ]
  }
}
output {
  elasticsearch { hosts => localhost }
  stdout { codec => rubydebug }
}
This works fine for parsing single-line logs, but it fails with
    [0] "_grokparsefailure"
for multiline exception traces.
Can someone please suggest the correct filter pattern for parsing multiline logs?
If you are working with multiline logs, then please use the multiline filter provided by Logstash. You first need to tell the multiline filter how to recognise the start of a new record; from your logs I can see that a new record starts with a timestamp. Below is an example usage:
filter {
  multiline {
    type => "/debug.log"
    pattern => "^%{TIMESTAMP}"
    what => "previous"
  }
}
You can then use gsub to remove the "\n" and "\r" characters which the multiline filter adds to your record. After that, use grok.
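As an illustrative sketch (not part of the original answer), such a gsub could look like this:
mutate {
  # Strip carriage returns and collapse the newlines introduced when the
  # multiline filter joined the stack-trace lines into one event.
  gsub => [
    "message", "\r", "",
    "message", "\n", " "
  ]
}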
The above logstash config worked fine after removing
mutate {
gsub => ["message", "r", ""]
}
So the working Logstash config for parsing single-line and multiline inputs for the above log pattern is:
input {
  file {
    path => ["./debug.log"]
    codec => multiline {
      # Grok pattern names are valid! :)
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => previous
    }
  }
}
filter {
  grok {
    match => [ "message", "%{TIMESTAMP_ISO8601:timestamp} \[%{NOTSPACE:uid}\] \[%{NOTSPACE:thread}\] %{LOGLEVEL:loglevel} %{DATA:class}\-%{GREEDYDATA:message}" ]
    overwrite => [ "message" ]
  }
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss" ]
  }
}
output {
  elasticsearch { hosts => localhost }
  stdout { codec => rubydebug }
}

Logstash Doesn't Read Entire Line With File Input

I'm using Logstash and I'm having trouble getting a rather simple configuration to work.
input {
  file {
    path => "C:/path/test-data/*.log"
    start_position => beginning
    type => "usage_data"
  }
}
filter {
  if [type] == "usage_data" {
    grok {
      match => { "message" => "^\s*%{NUMBER:lineNumber}\s+%{TIMESTAMP_ISO8601:date},(?<value1>[A-Za-z0-9+/]+),(?<value2>[A-Za-z0-9+/]+),(?<value3>[A-Za-z0-9+/]+),(?<value4>[^,]+),(?<value5>[^\r]*)" }
    }
  }
  if "_grokparsefailure" not in [tags] {
    drop { }
  }
}
output {
  stdout { codec => rubydebug }
}
I call Logstash like this:
SET LS_MAX_MEM=2g
DEL "%USERPROFILE%\.sincedb_*" 2> NUL
"C:\Program Files (x86)\logstash-1.4.1\bin\logstash.bat" agent -p "C:\path\\." -w 1 -f "logstash.conf"
The output:
←[33mUsing milestone 2 input plugin 'file'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.1/plugin-milestones {:level=>:warn}←[0m
{
    "message" => ",",
    "@version" => "1",
    "@timestamp" => "2014-11-20T09:16:08.591Z",
    "type" => "usage_data",
    "host" => "my-machine",
    "path" => "C:/path/test-data/monitor_20141116223000.log",
    "tags" => [
        [0] "_grokparsefailure"
    ]
}
If I parse only C:\path\test-data\monitor_20141116223000.log all lines are read and there is no grokparsefailure. If I remove C:\path\test-data\monitor_20141116223000.log the same grokparsefailure pops up in another log-file:
{
    "message" => "atches in another context\r",
    "@version" => "1",
    "@timestamp" => "2014-11-20T09:14:04.779Z",
    "type" => "usage_data",
    "host" => "my-machine",
    "path" => "C:/path/test-data/monitor_20140829235900.log",
    "tags" => [
        [0] "_grokparsefailure"
    ]
}
Especially the last output proves that Logstash doesn't read the entire line or attempts to interpret a newline where there is none. It always breaks at the same line at the same position.
Maybe I should add that the log-files contain \n as a line separator and I'm running Logstash on Windows. However, I'm not getting a whole lot of errors, just that one. And there are quite a lot of lines in there. They all appear properly when I remove the if "_grokparsefailure" ....
I assume that there is some problem with buffering, but I have no clue how to make this work. Any ideas?
Workaround:
# diff -Nur /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb.orig /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb
--- /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb.orig 2015-02-25 10:46:06.916321816 +0700
+++ /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb 2015-02-12 18:39:34.943833909 +0700
@@ -86,7 +86,9 @@
         _read_file(path, &block)
         @files[path].close
         @files.delete(path)
-        @statcache.delete(path)
+        #@statcache.delete(path)
+        inode = @statcache.delete(path)
+        @sincedb[inode] = 0
       else
         @logger.warn("unknown event type #{event} for #{path}")
       end

logstash, syslog and grok

I am working on an ELK-stack configuration. logstash-forwarder is used as a log shipper, and each type of log is tagged with a type-tag:
{
  "network": {
    "servers": [ "___:___" ],
    "ssl ca": "___",
    "timeout": 15
  },
  "files": [
    {
      "paths": [
        "/var/log/secure"
      ],
      "fields": {
        "type": "syslog"
      }
    }
  ]
}
That part works fine... Now, I want Logstash to split the message string into its parts; luckily, that is already implemented in the default grok patterns, so the logstash.conf remains simple so far:
input {
  lumberjack {
    port => 6782
    ssl_certificate => "___"
    ssl_key => "___"
  }
}
filter {
  if [type] == "syslog" {
    grok {
      match => [ "message", "%{SYSLOGLINE}" ]
    }
  }
}
output {
  elasticsearch {
    cluster => "___"
    template => "___"
    template_overwrite => true
    node_name => "logstash-___"
    bind_host => "___"
  }
}
The issue I have here is that the document received by Elasticsearch still holds the whole line (including timestamp etc.) in the message field. Also, @timestamp still shows the date when Logstash received the message, which makes it bad to search, since Kibana queries @timestamp in order to filter by date... Any idea what I'm doing wrong?
Thanks, Daniel
The reason your "message" field contains the original log line (including timestamps etc) is that the grok filter by default won't allow existing fields to be overwritten. In other words, even though the SYSLOGLINE pattern,
SYSLOGLINE %{SYSLOGBASE2} %{GREEDYDATA:message}
captures the message into a "message" field, it won't overwrite the current field value. The solution is to set the grok filter's "overwrite" parameter.
grok {
  match => [ "message", "%{SYSLOGLINE}" ]
  overwrite => [ "message" ]
}
To populate the "@timestamp" field, use the date filter. This will probably work for you:
date {
  match => [ "timestamp", "MMM dd HH:mm:ss", "MMM d HH:mm:ss" ]
}
It is hard to know where the problem is without seeing an example event that causes it. I can suggest you try the grok debugger in order to verify that the pattern is correct and to adjust it to your needs once you see the problem.

Logstash date parsing as timestamp using the date filter

Well, after looking around quite a lot, I could not find a solution to my problem, as it "should" work, but obviously doesn't.
I'm using Logstash 1.4.2-1-2-2c0f5a1 on an Ubuntu 14.04 LTS machine, and I am receiving messages such as the following one:
2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
BTW, I am also multiline
In the input configuration, I do have a multiline codec and the event is parsed correctly. I also separate the event text in several parts so that it is easier to read.
In the end, I obtain, as seen in Kibana, something like the following (JSON view):
{
  "_index": "logstash-2014.08.06",
  "_type": "customType",
  "_id": "PRtj-EiUTZK3HWAm5RiMwA",
  "_score": null,
  "_source": {
    "@timestamp": "2014-08-06T08:51:21.160Z",
    "@version": "1",
    "tags": [
      "multiline"
    ],
    "type": "utg-su",
    "host": "ubuntu-14",
    "path": "/mnt/folder/thisIsTheLogFile.log",
    "logTimestamp": "2014-08-05;10:21:13.618",
    "logThreadId": "17",
    "logLevel": "INFO",
    "logMessage": "Class.Type - This is a log message from the class:\r\n BTW, I am also multiline\r"
  },
  "sort": [
    "21",
    1407315081160
  ]
}
You may have noticed that I put a ";" in the timestamp. The reason is that I want to be able to sort the logs using the timestamp string, and apparently logstash is not that good at that (e.g.: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html).
I have unsuccessfully tried to use the date filter in multiple ways, and it apparently did not work.
date {
  locale => "en"
  match => ["logTimestamp", "YYYY-MM-dd;HH:mm:ss.SSS", "ISO8601"]
  timezone => "Europe/Vienna"
  target => "@timestamp"
  add_field => { "debug" => "timestampMatched"}
}
Since I read that the Joda library may have problems if the string is not strictly ISO 8601-compliant (very picky and expects a T, see https://logstash.jira.com/browse/LOGSTASH-180), I also tried to use mutate to convert the string to something like 2014-08-05T10:21:13.618 and then use "YYYY-MM-dd'T'HH:mm:ss.SSS". That also did not work.
I do not want to have to manually put a +02:00 on the time because that would give problems with daylight saving.
In any of these cases, the event goes to Elasticsearch, but the date filter apparently does nothing, as @timestamp and logTimestamp are different and no debug field is added.
Any idea how I could make the logTime strings properly sortable? I focused on converting them to a proper timestamp, but any other solution would also be welcome.
As you can see below:
When sorting over @timestamp, elasticsearch can do it properly, but since this is not the "real" log timestamp, but rather when the logstash event was read, I need (obviously) to be able to sort also over logTimestamp. This is what then is output. Obviously not that useful:
Any help is welcome! Just let me know if I forgot some information that may be useful.
Update:
Here is the filter config file that finally worked:
# Filters messages like this:
# 2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
# BTW, I am also multiline
# Take only type- events (type-componentA, type-componentB, etc)
filter {
  # You cannot write an "if" outside of the filter!
  if "type-" in [type] {
    grok {
      # Parse timestamp data. We need the "(?m)" so that grok (Oniguruma internally) correctly parses multi-line events
      patterns_dir => "./patterns"
      match => [ "message", "(?m)%{TIMESTAMP_ISO8601:logTimestampString}[ ;]\[%{DATA:logThreadId}\][ ;]%{LOGLEVEL:logLevel}[ ;]*%{GREEDYDATA:logMessage}" ]
    }
    # The timestamp may have commas instead of dots. Convert so as to store everything in the same way
    mutate {
      gsub => [
        # replace all commas with dots
        "logTimestampString", ",", "."
      ]
    }
    mutate {
      gsub => [
        # make the logTimestamp sortable. With a space, it is not! This does not work that well, in the end
        # but somehow apparently makes things easier for the date filter
        "logTimestampString", " ", ";"
      ]
    }
    date {
      locale => "en"
      match => ["logTimestampString", "YYYY-MM-dd;HH:mm:ss.SSS"]
      timezone => "Europe/Vienna"
      target => "logTimestamp"
    }
  }
}
filter {
  if "type-" in [type] {
    # Remove already-parsed data
    mutate {
      remove_field => [ "message" ]
    }
  }
}
I have tested your date filter. It works for me!
Here is my configuration
input {
  stdin {}
}
filter {
  date {
    locale => "en"
    match => ["message", "YYYY-MM-dd;HH:mm:ss.SSS"]
    timezone => "Europe/Vienna"
    target => "@timestamp"
    add_field => { "debug" => "timestampMatched"}
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
}
And I use this input:
2014-08-01;11:00:22.123
The output is:
{
    "message" => "2014-08-01;11:00:22.123",
    "@version" => "1",
    "@timestamp" => "2014-08-01T09:00:22.123Z",
    "host" => "ABCDE",
    "debug" => "timestampMatched"
}
So, please make sure that your logTimestamp has the correct value. It is probably some other problem. Or can you provide your log event and Logstash configuration for more discussion? Thank you.
This worked for me - with a slightly different datetime format:
# 2017-11-22 13:00:01,621 INFO [AtlassianEvent::0-BAM::EVENTS:pool-2-thread-2] [BuildQueueManagerImpl] Sent ExecutableQueueUpdate: addToQueue, agents known to be affected: []
input {
  file {
    path => "/data/atlassian-bamboo.log"
    start_position => "beginning"
    type => "logs"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601} "
      charset => "ISO-8859-1"
      negate => true
      what => "previous"
    }
  }
}
filter {
  grok {
    match => [ "message", "(?m)^%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}\[%{DATA:thread_id}\]%{SPACE}\[%{WORD:classname}\]%{SPACE}%{GREEDYDATA:logmessage}" ]
  }
  date {
    match => ["logtime", "yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss,SSS Z", "MMM dd, yyyy HH:mm:ss a" ]
    timezone => "Europe/Berlin"
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}