Drop log messages containing a specific string - elasticsearch

So I have log messages of the format:
[INFO] <blah.blah> 2016-06-27 21:41:38,263 some text
[INFO] <blah.blah> 2016-06-28 18:41:38,262 some other text
Now I want to drop all logs that do not contain a specific string "xyz" and keep all the rest. I also want to index the timestamp.
grokdebug is not helping much.
This is my attempt:
input {
  file {
    path => "/Users/username/Desktop/validateLogconf/logs/*"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => {
      "message" => '%{SYSLOG5424SD:loglevel} <%{JAVACLASS:job}> %{GREEDYDATA:content}'
    }
  }
  date {
    match => [ "Date", "YYYY-mm-dd HH:mm:ss" ]
    locale => en
  }
}
output {
  stdout {
    codec => plain {
      charset => "ISO-8859-1"
    }
  }
  elasticsearch {
    hosts => "http://localhost:9201"
    index => "hello"
  }
}
I am new to grok, so the patterns above might not make sense. Please help.

To drop the message that does not contain the string xyz:
if ([message] !~ "xyz") {
drop { }
}
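If it helps, here is a minimal sketch of how that conditional could sit inside your filter block, reusing the file input and the grok pattern below (placing it before grok is just a suggestion):
filter {
  # drop every event whose message does not contain "xyz"
  if [message] !~ "xyz" {
    drop { }
  }
  grok {
    match => {
      "message" => '%{SYSLOG5424SD:loglevel} <%{JAVACLASS:job}> %{TIMESTAMP_ISO8601:Date} %{GREEDYDATA:content}'
    }
  }
}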
Your grok pattern is not grabbing the date part of your logs.
Once you have a field from your grok pattern containing the date, you can invoke the date filter on this field.
So your grok filter should look like this:
grok {
match => {
"message" => '%{SYSLOG5424SD:loglevel} <%{JAVACLASS:job}> %{TIMESTAMP_ISO8601:Date} %{GREEDYDATA:content}'
}
}
I added a part to grab the date, which will be in the field Date. Then you can use the date filter:
date {
  match => [ "Date", "yyyy-MM-dd HH:mm:ss,SSS" ]
  locale => "en"
}
I added the ,SSS so that the format matches the one in the Date field.
The parsed date will be stored in the @timestamp field, unless specified differently with the target parameter.
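For instance, a hedged sketch using target (the field name parsed_date is only an illustration, not something from your logs):
date {
  match  => [ "Date", "yyyy-MM-dd HH:mm:ss,SSS" ]
  locale => "en"
  # keep @timestamp untouched and store the parsed date here instead
  target => "parsed_date"
}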

To check if your message contains a substring, you can do:
if [message] =~ "a" {
mutate {
add_field => { "hello" => "world" }
}
}
So in your case you can use the if to invoke the drop{} filter, or you can wrap your output plugin in it.
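For example, a sketch of the second option, where only events containing xyz reach Elasticsearch (host and index name taken from the question):
output {
  if [message] =~ "xyz" {
    elasticsearch {
      hosts => "http://localhost:9201"
      index => "hello"
    }
  }
}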
To parse a date and write it back to your timestamp field, you can use something like this:
date {
  locale => "en"
  match => ["timestamp", "ISO8601"]
  timezone => "UTC"
  target => "@timestamp"
  add_field => { "debug" => "timestampMatched"}
}
This matches my timestamp in:
Source field: "timestamp" (see match)
Format is "ISO...", you can use a custom format that matches your timestamp
timezone - self explanatory
target - write it back into the event's "@timestamp" field
Add a debug field to check that it has been matched correctly
Hope that helps,
Artur

Related

Logstash - Setting a timestamp from a JSON parsed object

I am having an issue with setting a timestamp from a JSON parse.
I have this string:
[{"orderNumber":"423523-4325-3212-4235-463a72e76fe8","externalOrderNumber":"reactivate_22d6ff0d8f55eb821be14df9d35505a6","operation":{"name":"CAPTURE","amount":134,"status":"SUCCESS","createdAt":"2015-05-11T09:14:30.969Z","updatedAt":{}}}]
I parse it as JSON using this Logstash filter:
filter {
  grok {
    match => { "message" => "\[%{GREEDYDATA:firstjson}\]%{SPACE} \[%{GREEDYDATA:secondjson}\}]}]"}
  }
  json {
    source => "firstjson"
  }
  date {
    match => [ "operation.createdAt", "ISO8601"]
  }
  mutate {
    remove_field => [ "firstjson", "secondjson" ]
  }
}
This creates a document inside Elasticsearch. I have a field named operation.createdAt which is properly recognised as a date field. But for some reason, this line:
date {
match => [ "operation.createdAt", "ISO8601"]
}
is not setting the @timestamp field. Currently the @timestamp field is set at the moment of document insertion. What am I doing wrong?
Thanks to the nice people at the ES/Logstash community, I have found the answer.
Instead of:
date {
match => [ "operation.createdAt", "ISO8601"]
}
I use this:
date {
match => [ "[operation][createdAt]", "ISO8601"]
}
and that properly extracts and parses the JSON time object.
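As an aside, the same [field][subfield] bracket notation works anywhere Logstash expects a field reference, for example in a conditional (a small sketch using the status field from the sample document; the tag name is made up):
if [operation][status] == "SUCCESS" {
  mutate {
    add_tag => [ "capture_succeeded" ]
  }
}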

Logstash - Add fields from the log - Grok

I'm learning Logstash and I'm using Kibana to see the logs. I would like to know if there is any way to add fields using data from the message property.
For example, the log is like this:
@timestamp:December 21st 2016, 21:39:12.444 port:47,144
appid:%{[path]} host:172.18.0.5 levell:level message:
{"@timestamp":"2016-12-22T00:39:12.438+00:00","@version":1,"message":"Hello","logger_name":"com.empresa.miAlquiler.controllers.UserController","thread_name":"http-nio-7777-exec-1","level":"INFO","level_value":20000,
"HOSTNAME":"6f92ae402cb4","X-Span-Export":"false","X-B3-SpanId":"8f548829e9d18a8a","X-B3-TraceId":"8f548829e9d18a8a"}
My logstash conf is like:
filter {
  grok {
    match => {
      "message" => "^%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:level}\s+%{NUMBER:pid}\s+---\s+\[\s*%{USERNAME:thread}\s*\]\s+%{JAVAFILE:class}\s*:\s*%{DATA:themessage}(?:\n+(?<stacktrace>(?:.|\r|\n)+))?$"
    }
  }
  date {
    match => [ "timestamp" , "yyyy-MM-dd HH:mm:ss.SSS" ]
  }
  mutate {
    remove_field => ["@version"]
    add_field => {
      "appid" => "%{[path]}"
    }
    add_field => {
      "levell" => "level"
    }
  }
}
I would like to take level (which in the log is INFO) and message (which in the log is Hello) and add them as fields.
Is there any way to do that?
What if you do something like this using mutate:
filter {
  mutate {
    # this should concatenate your appid and levell into a new field
    add_field => { "newfield" => "%{appid} %{levell}" }
  }
}
You might have a look at this thread.
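If the goal is specifically to pull level and message out of the embedded JSON, another option (not from the linked thread, just a sketch assuming the message field holds the JSON document) is to parse it with the json filter and copy the fields you need:
filter {
  json {
    source => "message"
    target => "parsed"   # keep the decoded document under one key
  }
  mutate {
    add_field => {
      "levell"     => "%{[parsed][level]}"
      "themessage" => "%{[parsed][message]}"
    }
  }
}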

Extract Parameter (sub-string) from URL GROK Pattern

I have ELK running for log analysis. I have everything working. There are just a few tweaks I would like to make. To all the ES/ELK Gods in stackoverflow, I'd appreciate any help on this. I'd gladly buy you a cup of coffee! :D
Example:
URL: /origina-www.domain.com/this/is/a/path?page=2
First I would like to get the entire path as seen above.
Second, I would like to get just the path before the parameter: /origina-www.domain.com/this/is/a/path
Third, I would like to get just the parameter: ?page=2
Fourth, I would like to make the timestamp in the logfile the main timestamp in Kibana. Currently, the timestamp Kibana is showing is the date and time the event was processed by ES.
This is what a sample entry looks like:
2016-10-19 23:57:32 192.168.0.1 GET /origin-www.example.com/url 200 1144 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-" "-"
Here's my config:
if [type] == "syslog" {
grok {
match => ["message", "%{IP:client}\s+%{WORD:method}\s+%{URIPATHPARAM:request}\s+%{NUMBER:bytes}\s+%{NUMBER:duration}\s+%{USER-AGENT}\s+%{QS:referrer}\s+%{QS:agent}%{GREEDYDATA}"]
}
date {
match => [ "timestamp", "MMM dd, yyyy HH:mm:ss a" ]
locale => "en"
}
}
ES Version: 5.0.1
Logstash Version: 5.0
Kibana: 5.0
UPDATE: I was actually able to solve it by using:
grok {
match => ["message", "%{IP:client}\s+%{WORD:method}\s+%{URIPATHPARAM:request}\s+%{NUMBER:bytes}\s+%{NUMBER:duration}\s+%{USER-AGENT}\s+%{QS:referrer}\s+%{QS:agent}%{GREEDYDATA}"]
}
grok {
match => [ "request", "%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}" ]
}
kv {
source => "uri_query"
field_split => "&"
target => "query"
}
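For the example URL /origina-www.domain.com/this/is/a/path?page=2, those two extra filters should, roughly, leave you with fields along these lines (a sketch, not verbatim rubydebug output):
"uri_path"  => "/origina-www.domain.com/this/is/a/path"
"uri_query" => "page=2"
"query"     => { "page" => "2" }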
In order to use the actual timestamp of your log entry rather than the indexed time, you could use the date and mutate plugins to override the existing timestamp value. Your Logstash filter could look something like this:
# filtering your log file
grok {
  # You could have a pattern file containing an expression such as
  # LOGTIMESTAMP %{YEAR}%{MONTHNUM}%{MONTHDAY} %{TIME}
  # if you need to change the timestamp format.
  patterns_dir => ["/pathto/patterns"]
  match => { "message" => "^%{LOGTIMESTAMP:logtimestamp}%{GREEDYDATA}" }
}
# overriding the existing timestamp with the new field logtimestamp
mutate {
  add_field => { "timestamp" => "%{logtimestamp}" }
  remove_field => ["logtimestamp"]
}
# inserting the timestamp as UTC
date {
  match => [ "timestamp" , "ISO8601" , "yyyyMMdd HH:mm:ss.SSS" ]
  target => "timestamp"
  locale => "en"
  timezone => "UTC"
}
You could also follow up the related question for more details. Hope it helps.

logstash elastic search date output is different

My system audit log contains the date in a format like created_at":1422765535789, so the Elasticsearch output also displays the date in the same style. However, I would like to convert this 1422765535789 Unix-style value and print it as a date.
I've used this in the syslog config (as suggested by another question thread), but I am still not getting the above value converted to a date format:
date {
match => ["created_at", "UNIX_MS"]
}
Hi, I've updated the code in the syslog config; however, created_at is still output to the Elasticsearch page in the same format, like 1422765535789. Please find the modified code:
input {
  stdin {
  }
}
filter {
  grok {
    match => [ "message", "%{NUMBER:created_at}" ]
  }
  if [message] =~ /^created_at/ {
    date {
      match => [ "created_at" , "UNIX_MS" ]
    }
    ruby {
      code => "
        event['created_at'] = Time.at(event['created_at']/1000);
      "
    }
  }
}
output {
  elasticsearch { host => localhost }
  stdout { codec => rubydebug }
}
The date filter is used to update the @timestamp field value.
input {
  stdin {
  }
}
filter {
  grok {
    match => [ "message", "%{NUMBER:created_at:int}" ]
  }
  if "_grokparsefailure" not in [tags] {
    date {
      match => [ "created_at" , "UNIX_MS" ]
    }
    ruby {
      code => "
        event['created_at'] = Time.at(event['created_at']/1000);
      "
    }
  }
}
output {
  stdout {
    codec => rubydebug
  }
}
Here is my config. When I input 1422765535789, it can parse the value and update the @timestamp field value.
The output is
{
       "message" => "1422765535789",
      "@version" => "1",
    "@timestamp" => "2015-02-01T04:38:55.789Z",
          "host" => "ABC",
    "created_at" => "2015-02-01T12:38:55.000+08:00"
}
You can see that the value of @timestamp is the same time as created_at.
And the ruby filter is used to convert created_at to a date format.
FYI.
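One caveat, stated as an assumption since the question does not give a version: on Logstash 5.x and later the ruby filter can no longer index the event directly with event['field'] and has to go through the event API. A rough sketch of the equivalent of the snippet above:
ruby {
  code => "
    # read the epoch milliseconds, divide by 1000 and store the result back as a time
    event.set('created_at', Time.at(event.get('created_at').to_i / 1000))
  "
}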

Logstash date parsing as timestamp using the date filter

Well, after looking around quite a lot, I could not find a solution to my problem, as it "should" work, but obviously doesn't.
I'm using Logstash 1.4.2-1-2-2c0f5a1 on an Ubuntu 14.04 LTS machine, and I am receiving messages such as the following one:
2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
BTW, I am also multiline
In the input configuration, I do have a multiline codec and the event is parsed correctly. I also separate the event text into several parts so that it is easier to read.
In the end, I obtain, as seen in Kibana, something like the following (JSON view):
{
  "_index": "logstash-2014.08.06",
  "_type": "customType",
  "_id": "PRtj-EiUTZK3HWAm5RiMwA",
  "_score": null,
  "_source": {
    "@timestamp": "2014-08-06T08:51:21.160Z",
    "@version": "1",
    "tags": [
      "multiline"
    ],
    "type": "utg-su",
    "host": "ubuntu-14",
    "path": "/mnt/folder/thisIsTheLogFile.log",
    "logTimestamp": "2014-08-05;10:21:13.618",
    "logThreadId": "17",
    "logLevel": "INFO",
    "logMessage": "Class.Type - This is a log message from the class:\r\n BTW, I am also multiline\r"
  },
  "sort": [
    "21",
    1407315081160
  ]
}
You may have noticed that I put a ";" in the timestamp. The reason is that I want to be able to sort the logs using the timestamp string, and apparently logstash is not that good at that (e.g.: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html).
I have unsuccessfully tried to use the date filter in multiple ways, and it apparently did not work.
date {
  locale => "en"
  match => ["logTimestamp", "YYYY-MM-dd;HH:mm:ss.SSS", "ISO8601"]
  timezone => "Europe/Vienna"
  target => "@timestamp"
  add_field => { "debug" => "timestampMatched"}
}
Since I read that the Joda library may have problems if the string is not strictly ISO 8601-compliant (very picky and expects a T, see https://logstash.jira.com/browse/LOGSTASH-180), I also tried to use mutate to convert the string to something like 2014-08-05T10:21:13.618 and then use "YYYY-MM-dd'T'HH:mm:ss.SSS". That also did not work.
I do not want to have to manually put a +02:00 on the time because that would give problems with daylight saving.
In any of these cases, the event goes to Elasticsearch, but the date filter apparently does nothing, as @timestamp and logTimestamp are different and no debug field is added.
Any idea how I could make the logTime strings properly sortable? I focused on converting them to a proper timestamp, but any other solution would also be welcome.
When sorting over @timestamp, elasticsearch can do it properly, but since this is not the "real" log timestamp, but rather the time at which the logstash event was read, I (obviously) need to be able to sort also over logTimestamp, and the resulting output is not that useful.
Any help is welcome! Just let me know if I forgot some information that may be useful.
Update:
Here is the filter config file that finally worked:
# Filters messages like this:
# 2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
# BTW, I am also multiline
# Take only type- events (type-componentA, type-componentB, etc)
filter {
  # You cannot write an "if" outside of the filter!
  if "type-" in [type] {
    grok {
      # Parse timestamp data. We need the "(?m)" so that grok (Oniguruma internally) correctly parses multi-line events
      patterns_dir => "./patterns"
      match => [ "message", "(?m)%{TIMESTAMP_ISO8601:logTimestampString}[ ;]\[%{DATA:logThreadId}\][ ;]%{LOGLEVEL:logLevel}[ ;]*%{GREEDYDATA:logMessage}" ]
    }
    # The timestamp may have commas instead of dots. Convert so as to store everything in the same way
    mutate {
      gsub => [
        # replace all commas with dots
        "logTimestampString", ",", "."
      ]
    }
    mutate {
      gsub => [
        # make the logTimestamp sortable. With a space, it is not! This does not work that well, in the end
        # but somehow apparently makes things easier for the date filter
        "logTimestampString", " ", ";"
      ]
    }
    date {
      locale => "en"
      match => ["logTimestampString", "YYYY-MM-dd;HH:mm:ss.SSS"]
      timezone => "Europe/Vienna"
      target => "logTimestamp"
    }
  }
}
filter {
  if "type-" in [type] {
    # Remove already-parsed data
    mutate {
      remove_field => [ "message" ]
    }
  }
}
I have tested your date filter. It works for me!
Here is my configuration
input {
  stdin {}
}
filter {
  date {
    locale => "en"
    match => ["message", "YYYY-MM-dd;HH:mm:ss.SSS"]
    timezone => "Europe/Vienna"
    target => "@timestamp"
    add_field => { "debug" => "timestampMatched"}
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
}
And I use this input:
2014-08-01;11:00:22.123
The output is:
{
       "message" => "2014-08-01;11:00:22.123",
      "@version" => "1",
    "@timestamp" => "2014-08-01T09:00:22.123Z",
          "host" => "ABCDE",
         "debug" => "timestampMatched"
}
So, please make sure that your logTimestamp has the correct value.
It is probably some other problem. Or could you provide your log event and Logstash configuration for more discussion? Thank you.
This worked for me - with a slightly different datetime format:
# 2017-11-22 13:00:01,621 INFO [AtlassianEvent::0-BAM::EVENTS:pool-2-thread-2] [BuildQueueManagerImpl] Sent ExecutableQueueUpdate: addToQueue, agents known to be affected: []
input {
  file {
    path => "/data/atlassian-bamboo.log"
    start_position => "beginning"
    type => "logs"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601} "
      charset => "ISO-8859-1"
      negate => true
      what => "previous"
    }
  }
}
filter {
  grok {
    match => [ "message", "(?m)^%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}\[%{DATA:thread_id}\]%{SPACE}\[%{WORD:classname}\]%{SPACE}%{GREEDYDATA:logmessage}" ]
  }
  date {
    match => ["logtime", "yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss,SSS Z", "MMM dd, yyyy HH:mm:ss a" ]
    timezone => "Europe/Berlin"
  }
}
output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
