Logstash: How to use date/time in a filename as an imported field - elasticsearch

I have a bunch of log files named like 'XXXXXX_XX_yymmdd_hh:mm:ss.txt'. I need to extract the date and time from the filename and add them as separate fields to the events Logstash creates.
Can anyone help?
Thanks

Use a grok filter to extract the date and time:
filter {
  grok {
    match => [
      "path",
      "^%{GREEDYDATA}/[^/]+_%{INT:date}_%{TIME:time}\.txt$"
    ]
  }
}
Depending on what goes in place of XXXXXX_XX you might prefer a stricter expression. Also, GREEDYDATA isn't very efficient; this might yield better performance:
filter {
  grok {
    match => [
      "path", "^(?:/[^/]+)+/[^/]+_%{INT:date}_%{TIME:time}\.txt$"
    ]
  }
}
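If you then want to turn the two captured fields into the event's timestamp as well, here is a minimal sketch (assuming the filename really encodes yymmdd and hh:mm:ss; the temporary filetime field is just an illustration and is dropped once the date filter succeeds):
filter {
  mutate {
    # Combine the two fields captured by grok into one parseable string
    add_field => { "filetime" => "%{date} %{time}" }
  }
  date {
    # Joda-style pattern matching a two-digit year, month, day and time
    match => [ "filetime", "yyMMdd HH:mm:ss" ]
    remove_field => [ "filetime" ]
  }
}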

Related

Logstash normalise URL from JSON logs

I have logs in newline-separated JSON like the following:
{
  "httpRequest": {
    "requestMethod": "GET",
    "requestUrl": "/foo/submit?proj=56"
  }
}
Now I need the URL without the dynamic parts, i.e. with the first path segment (someTenant) and the query parameters replaced, added as a field in Elasticsearch. The expected normalised URL is:
"requestUrl": "/{{someTenant}}/submit?{{someParams}}"
I already have the following filter in my Logstash config, but I'm not sure how to run a sequence of regex operations on a specific field and add the result as a new field.
json {
  source => "message"
}
This way I could aggregate unique endpoints even though the URLs in the logs differ due to variable path and query parameters.
Since this question is tagged with grok, I will go ahead and assume you can use grok filters.
Use a grok filter to create new fields from the requestUrl field; the URIPATHPARAM grok pattern separates the various components of requestUrl as follows:
grok {
  match => { "requestUrl" => "%{URIPATHPARAM:request_data}" }
}
This will produce the following output:
{
  "request_data": [
    [
      "/foo/submit?proj=56"
    ]
  ],
  "URIPATH": [
    [
      "/foo/submit"
    ]
  ],
  "URIPARAM": [
    [
      "?proj=56"
    ]
  ]
}
Can be tested on Grok Online Debugger
thanks
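If the goal is to actually store the placeholder form (/{{someTenant}}/submit?{{someParams}}) in its own field, one way is to copy requestUrl and rewrite the copy with mutate/gsub. A rough sketch, assuming a mutate filter recent enough to have the copy option; the normalized_url field name and the two patterns are only assumptions about your URL shape:
filter {
  json {
    source => "message"
  }
  mutate {
    # Copy the original URL so requestUrl itself stays untouched
    copy => { "[httpRequest][requestUrl]" => "normalized_url" }
  }
  mutate {
    gsub => [
      # Replace the first path segment (the tenant) with a placeholder
      "normalized_url", "^/[^/]+", "/{{someTenant}}",
      # Replace everything after the '?' (including it) with a placeholder
      "normalized_url", "\?.*$", "?{{someParams}}"
    ]
  }
}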

Not able to remove all fields with a prefix in Logstash

I have the following fields after parsing my JSON in Logstash:
parsedcontent.userinfo.appId
parsedcontent.userinfo.deviceId
parsedcontent.userinfo.id
parsedcontent.userinfo.token
parsedcontent.userinfo.type
I want to remove all these fields using a filter. I can do it with:
filter {
  mutate {
    remove_field => "[parsedcontent][userinfo][appId]"
  }
}
But I have to write field names with the same prefix many times, and I have many such fields. Is there a filter to remove fields with a given prefix more easily? Please guide.
You can use wildcards or regex.
filter {
  mutate {
    remove_field => "[parsedcontent*]"
  }
}
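I'm not certain that wildcard matching in remove_field works in every Logstash version. Since all of the listed fields sit under the same parent, an alternative that should work everywhere is to remove the parent field, which drops the whole subtree in one go:
filter {
  mutate {
    # Removing [parsedcontent][userinfo] also removes appId, deviceId, id, token and type
    remove_field => [ "[parsedcontent][userinfo]" ]
  }
}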

Modify the content of a field using logstash

I am using Logstash to get data from a SQL database. There is a field called "code" whose content has this structure:
PO0000001209
ST0000000909
What I would like to do is remove the six zeros after the letters to get the following result:
PO1209
ST0909
I will put the result in another field called "code_short" and use it for my query in Elasticsearch. I have configured the input and the output in Logstash, but I am not sure how to do this using grok or maybe the mutate filter. I have read some examples but I am quite new to this and a bit stuck.
Any help would be appreciated. Thanks.
You could use a mutate/gsub filter for this but that will replace the value of the code field:
filter {
  mutate {
    gsub => [
      "code", "000000", ""
    ]
  }
}
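If you want to keep code intact and only populate code_short, you can copy the field first and run gsub on the copy. A sketch, assuming your mutate filter supports the copy option and that the only run of six zeros in the value is the padding:
filter {
  mutate {
    copy => { "code" => "code_short" }   # keep the original value in "code"
  }
  mutate {
    gsub => [ "code_short", "0{6}", "" ] # strip the six-zero padding from the copy
  }
}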
Another option is to use a grok filter like this:
filter {
  grok {
    match => { "code" => "(?<prefix>[a-zA-Z]+)000000%{INT:suffix}" }
    add_field => { "code_short" => "%{prefix}%{suffix}" }
  }
}

Logstash change time format

My log statement looks like this.
2014-04-23 06:40:29 INFO [1605853264] [ModuleName] - [ModuleName] -
Blah blah
I am able to parse it fine and it gets logged to ES correctly with the following ES field:
"LogTimestamp": "2014-04-23T13:40:29.000Z"
But my requirement is to log the timestamp as follows; note that the 'Z' is dropped in favour of +0000. I tried replace and gsub, but neither changes the output.
"LogTimestamp": "2014-04-23T13:40:29.000+0000"
Can somebody help?
Here is my pattern
TEMP_TIMESTAMP %{YEAR}-%{MONTHNUM}-%{MONTHDAY}\s%{HOUR}:%{MINUTE}:%{SECOND}
TEMP_LOG %{TEMP_TIMESTAMP:logdate}\s*?%{LOGLEVEL:TempLogLevel}\s*?\[\s?*%{BASE10NUM:TempThreadId}\]%{GREEDYDATA}
This is the filter config:
grok {
  patterns_dir => ["patterns"]
  match => ["message", "%{TEMP_LOG}"]
}
date {
  match => [ "logdate", "yyyy-MM-dd HH:mm:ss" ]
  target => "LogTimestamp"
  timezone => "PST8PDT"
}
mutate {
  gsub => ["logdate", ".000Z", ".000+0000"]
}
I haven't quite understood the meaning of fields in Logstash and how they map to Elasticsearch; that confusion is tripping me up in this case.
You can use the ruby filter plugin to do what you want.
Per your requirement, you want to change this:
"LogTimestamp": "2014-04-23T13:40:29.000Z"
to
"LogTimestamp": "2014-04-23T13:40:29.000+0000"
Try this filter:
filter {
  ruby {
    code => "
      event['LogTimestamp'] = event['LogTimestamp'].localtime('+00:00')
    "
  }
}
Hope this can help you.
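Note that on newer Logstash releases the event['field'] syntax is no longer available; the ruby filter uses event.get and event.set instead. Also, a timestamp field is always serialized in ISO8601 form with a trailing Z, so to get the +0000 form you generally have to write it into a separate string field. A sketch along those lines, assuming the field names above; LogTimestamp_str is a hypothetical new field:
filter {
  ruby {
    code => "
      ts = event.get('LogTimestamp')
      if ts
        # Convert to a Ruby Time and format it with a numeric UTC offset instead of 'Z'
        event.set('LogTimestamp_str', Time.at(ts.to_f).utc.strftime('%Y-%m-%dT%H:%M:%S.%L%z'))
      end
    "
  }
}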

How to handle non-matching Logstash grok filters

I am wondering what the best approach is to take with my Logstash grok filters. I have some filters that are for specific log entries and won't apply to all entries. The ones that don't apply always generate _grokparsefailure tags. For example, I have one grok filter that's for every log entry and it works fine. Then I have another filter that's for error messages with tracebacks. The traceback filter throws a _grokparsefailure for every single log entry that doesn't have a traceback.
I'd prefer to have it just pass the rule if there isn't a match instead of adding the parsefailure tag. I use the parsefailure tag to find things that aren't parsing properly, not things that simply didn't match a particular filter. Maybe it's just the nomenclature "parse failure" that gets me. To me that means there's something wrong with the filter (e.g. badly formatted), not that it didn't match.
So the question is, how should I handle this?
Make the filter pattern optional using ?
(ab)use the tag_on_failure option by setting it to nothing []
make the filter conditional using something like "if traceback in message"
something else I'm not considering?
Thanks in advance.
EDIT
I took the path of adding a conditional around the filter:
if [message] =~ /took\s\d+/ {
  grok {
    patterns_dir => "/etc/logstash/patterns"
    match => ["message", "took\s+(?<servicetime>[\d\.]+)"]
    add_tag => [ "stats", "servicetime" ]
  }
}
Still interested in feedback though. What is considered "best practice" here?
When possible, I'd go with a conditional wrapper just like the one you're using. Feel free to post that as an answer!
If your application produces only a few different line formats, you can use multiple match patterns with the grok filter. By default, the filter will process up to the first successful match:
grok {
  patterns_dir => "./patterns"
  match => {
    "message" => [
      "%{BASE_PATTERN} %{EXTRA_PATTERN}",
      "%{BASE_PATTERN}",
      "%{SOME_OTHER_PATTERN}"
    ]
  }
}
If your logic is less straightforward (maybe you need to check the same condition more than once), the grep filter can be useful to add a tag. Something like this:
grep {
  drop => false # grep normally drops non-matching events
  match => ["message", "/took\s\d+/"]
  add_tag => "has_traceback"
}
...
if "has_traceback" in [tags] {
  ...
}
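Note that the grep filter has since been deprecated and removed from newer Logstash releases. If it isn't available to you, a conditional plus mutate gives the same tag; a sketch, reusing the pattern from the question:
filter {
  if [message] =~ /took\s\d+/ {
    mutate {
      add_tag => [ "has_traceback" ]
    }
  }
}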
You can also add tag_on_failure => [] to your grok stanza like so:
grok {
  match => ["context", "\"tags\":\[%{DATA:apptags}\]"]
  tag_on_failure => [ ]
}
grok will still fail, but will do so without adding to the tags array.
The most efficient way of doing this is to drop the events where the filter doesn't match:
filter {
  grok {
    match => [ "message", "something" ]
  }
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}
You can also do this
remove_tag => [ "_grokparsefailure" ]
whenever you have a match.
