Extract Parameter (sub-string) from URL GROK Pattern - elasticsearch

I have ELK running for log analysis. I have everything working. There are just a few tweaks I would like to make. To all the ES/ELK Gods in stackoverflow, I'd appreciate any help on this. I'd gladly buy you a cup of coffee! :D
Example:
URL: /origina-www.domain.com/this/is/a/path?page=2
First I would like to get the entire path as seen above.
Second, I would like to get just the path before the parameter: /origina-www.domain.com/this/is/a/path
Third, I would like to get just the parameter: ?page=2
Fourth, I would like to make the timestamp on the logfile be the main time stamp on kibana. Currently, the timestamp kibana is showing is the date and time the ES was processed.
This is what a sample entry looks like:
2016-10-19 23:57:32 192.168.0.1 GET /origin-www.example.com/url 200 1144 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-" "-"
Here's my config:
if [type] == "syslog" {
grok {
match => ["message", "%{IP:client}\s+%{WORD:method}\s+%{URIPATHPARAM:request}\s+%{NUMBER:bytes}\s+%{NUMBER:duration}\s+%{USER-AGENT}\s+%{QS:referrer}\s+%{QS:agent}%{GREEDYDATA}"]
}
date {
match => [ "timestamp", "MMM dd, yyyy HH:mm:ss a" ]
locale => "en"
}
}
ES Version: 5.0.1
Logstash Version: 5.0
Kibana: 5.0
UPDATE: I was actually able to solve it by using:
grok {
match => ["message", "%{IP:client}\s+%{WORD:method}\s+%{URIPATHPARAM:request}\s+%{NUMBER:bytes}\s+%{NUMBER:duration}\s+%{USER-AGENT}\s+%{QS:referrer}\s+%{QS:agent}%{GREEDYDATA}"]
}
grok {
match => [ "request", "%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}" ]
}
kv {
source => "uri_query"
field_split => "&"
target => "query"
}

In order to use the actual timestamp of your log entry rather than the indexed time, you could use the date and mutate plugins as such to override the existing timestamp value. You could have your logstash filter look, something like this:
//filtering your log file
grok {
patterns_dir => ["/pathto/patterns"] <--- you could have a pattern file with such expression LOGTIMESTAMP %{YEAR}%{MONTHNUM}%{MONTHDAY} %{TIME} if you have to change the timestamp format.
match => { "message" => "^%{LOGTIMESTAMP:logtimestamp}%{GREEDYDATA}" }
}
//overriding the existing timestamp with the new field logtimestamp
mutate {
add_field => { "timestamp" => "%{logtimestamp}" }
remove_field => ["logtimestamp"]
}
//inserting the timestamp as UTC
date {
match => [ "timestamp" , "ISO8601" , "yyyyMMdd HH:mm:ss.SSS" ]
target => "timestamp"
locale => "en"
timezone => "UTC"
}
You could follow up Question for more as well. Hope it helps.

grok {
match => ["message", "%{IP:client}\s+%{WORD:method}\s+%{URIPATHPARAM:request}\s+%{NUMBER:bytes}\s+%{NUMBER:duration}\s+%{USER-AGENT}\s+%{QS:referrer}\s+%{QS:agent}%{GREEDYDATA}"]
}
grok {
match => [ "request", "%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}" ]
}
kv {
source => "uri_query"
field_split => "&"
target => "query"
}

Related

Logstash to Opensearch , _dateparsefailure tag

I have some problems while using logstash to opensearch.
filter{
grok {
patterns_dir => ["/etc/logstash/conf.d/patterns"]
match => [ "message","%{DATE_FORM:logdate}%{LOGTYPE:logtype}:%{SPACE}%{GREEDYDATA:msgbody}" ]
}
date {
match => ["logdate", "yyyy.MM.dd-HH.mm.ss:SSS"]
timezone => "UTC"
target=>"timestamp"
}
mutate {
remove_field => ["message"]
add_field => {
"file" => "%{[#metadata][s3][key]}"
}
}
}
This is the conf file I'm using for logstash.
In the opensearch console
#timestamp : Dec 15, 2022 # 18:10:56.975
logdate [2022.12.10-11.57.36:345]
tags _dateparsefailure
The timestamp , logdate are different and _dateparsefailure error occurs.
In the raw logs , it starts with
[2022.12.10-11.57.36:345]
this format.
Right now ,
logdate : raw log's timestamp
#timestamp : the time that log send to opensearch
I want to match logdate and #timestamp.
How can I modify the filter.date.match part to make the results of the logdate and #timestamp filters the same?
If you have multiple times you can have more than one filter.date.match, you can do this:
filter{
date {
match => ["logdate", "yyyy.MM.dd-HH.mm.ss:SSS"]
timezone => "UTC"
target=>"logdate"
}
date {
match => ["#timestamp", "yyyy.MM.dd-HH.mm.ss:SSS"]
timezone => "UTC"
target=>"#timestamp"
}
}
If your time field has multiple formats, you can do this:
date {
match => [ "logdate", "yyyy.MM.dd-HH.mm.ss:SSS", "third_format", "ISO8601" ]
target=> "#timestamp"
}
Reference: https://www.elastic.co/guide/en/logstash/current/plugins-filters-date.html#plugins-filters-date-match

Drop log messages containing a specific string

So I have log messages of the format :
[INFO] <blah.blah> 2016-06-27 21:41:38,263 some text
[INFO] <blah.blah> 2016-06-28 18:41:38,262 some other text
Now I want to drop all logs that does not contain a specific string "xyz" and keep all the rest. I also want to index timestamp.
grokdebug is not helping much.
This is my attempt :
input {
file {
path => "/Users/username/Desktop/validateLogconf/logs/*"
start_position => "beginning"
}
}
filter {
grok {
match => {
"message" => '%{SYSLOG5424SD:loglevel} <%{JAVACLASS:job}> %{GREEDYDATA:content}'
}
}
date {
match => [ "Date", "YYYY-mm-dd HH:mm:ss" ]
locale => en
}
}
output {
stdout {
codec => plain {
charset => "ISO-8859-1"
}
}
elasticsearch {
hosts => "http://localhost:9201"
index => "hello"
}
}
I am new to grok so patterns above might not be making sense. Please help.
To drop the message that does not contain the string xyz:
if ([message] !~ "xyz") {
drop { }
}
Your grok pattern is not grabbing the date part of your logs.
Once you have a field from your grok pattern containing the date, you can invoque the date filter on this field.
So your grok filter should look like this:
grok {
match => {
"message" => '%{SYSLOG5424SD:loglevel} <%{JAVACLASS:job}> %{TIMESTAMP_ISO8601:Date} %{GREEDYDATA:content}'
}
}
I added a part to grab the date, which will be in the field Date. Then you can use the date filter:
date {
match => [ "Date", "YYYY-mm-dd HH:mm:ss,SSS" ]
locale => en
}
I added the ,SSS so that the format match the one from the Date field.
The parsed date will be stored in the #timestamp field, unless specified differently with the target parameter.
to check if your message contains a substring, you can do:
if [message] =~ "a" {
mutate {
add_field => { "hello" => "world" }
}
}
So in your case you can use the if to invoke the drop{} filter, or you can wrap your output plugin in it.
To parse a date and write it back to your timestamp field, you can use something like this:
date {
locale => "en"
match => ["timestamp", "ISO8601"]
timezone => "UTC"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
This matches my timestamp in:
Source field: "timestamp" (see match)
Format is "ISO...", you can use a custom format that matches your timestamp
timezone - self explanatory
target - write it back into the event's "#timestamp" field
Add a debug field to check that it has been matched correctly
Hope that helps,
Artur

Logstash - Error log event date going as string to ES

I'm using Logstash to forward error logs from app servers to ES. Everything is working fine except that log timestamp going as string to ES.
Here is my log format
[Date:2015-03-25 01:29:09,554] [ThreadId:4432] [HostName:AEPLWEB1] [Host:(null)] [ClientIP:(null)] [Browser:(null)] [UserAgent:(null)] [PhysicalPath:(null)] [Url:(null)] [QueryString:(null)] [Referrer:(null)] [Carwale.Notifications.ExceptionHandler] System.InvalidCastException: Unable to cast object of type 'Carwale.Entity.CMS.Articles.ArticleDetails' to type 'Carwale.Entity.CMS.Articles.ArticlePageDetails'. at Carwale.Cache.Core.MemcacheManager.GetFromCacheCore[T](String key, TimeSpan cacheDuration, Func`1 dbCallback, Boolean& isKeyFirstTimeCreated)
Filter configuration for logstash forwarder
filter {
multiline {
pattern => "^\[Date:%{TIMESTAMP_ISO8601}"
negate => true
what => "previous"
}
grok {
match => [ "message", "(?:Date:%{TIMESTAMP_ISO8601:log_timestamp})\] \[(?:ThreadId:%{NUMBER:ThreadId})\] \[(?:HostName:%{WORD:HostName})\] \[(?:Host:\(%{WORD:Host})\)\] \[(?:ClientIP:\(%{WORD:ClientIP})\)\] \[(?:Browser:\(%{WORD:Browser})\)\] \[(?:UserAgent:\(%{WORD:UserAgent})\)\] \[(?:PhysicalPath:\(%{WORD:PhysicalPath})\)\] \[(?:Url:\(%{WORD:Url})\)\] \[(?:QueryString:\(%{WORD:QueryString})\)\] \[(?:Referrer:\(%{WORD:Referrer})\)\] \[%{DATA:Logger}\] %{GREEDYDATA:err_message}" ]
}
date {
match => [ "log_timestamp", "MMM dd YYY HH:mm:ss","MMM d YYY HH:mm:ss", "ISO8601" ]
target => "log_timestamp"
}
mutate {
convert => ["ThreadId", "integer"]
}
}
How I can make it date in ES? Please help. Thanks in advance.
I had the similar issue. Now fixed it with the below workaround.
grok {
match => {
"message" => "%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}[T ]%{HOUR:hour}:%{MINUTE:minute}:%{SECOND:second}"
}
}
grok{
match => {
"second" => "(?<asecond>(^[^,]*))" }
}
mutate {
add_field => {
"timestamp" => "%{year}-%{month}-%{day} %{hour}:%{minute}:%{asecond}"
}
}
date{ match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ] timezone=> "UTC" target => "log_timestamp" }
Thanks,

Logstash date parsing as timestamp using the date filter

Well, after looking around quite a lot, I could not find a solution to my problem, as it "should" work, but obviously doesn't.
I'm using on a Ubuntu 14.04 LTS machine Logstash 1.4.2-1-2-2c0f5a1, and I am receiving messages such as the following one:
2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
BTW, I am also multiline
In the input configuration, I do have a multiline codec and the event is parsed correctly. I also separate the event text in several parts so that it is easier to read.
In the end, I obtain, as seen in Kibana, something like the following (JSON view):
{
"_index": "logstash-2014.08.06",
"_type": "customType",
"_id": "PRtj-EiUTZK3HWAm5RiMwA",
"_score": null,
"_source": {
"#timestamp": "2014-08-06T08:51:21.160Z",
"#version": "1",
"tags": [
"multiline"
],
"type": "utg-su",
"host": "ubuntu-14",
"path": "/mnt/folder/thisIsTheLogFile.log",
"logTimestamp": "2014-08-05;10:21:13.618",
"logThreadId": "17",
"logLevel": "INFO",
"logMessage": "Class.Type - This is a log message from the class:\r\n BTW, I am also multiline\r"
},
"sort": [
"21",
1407315081160
]
}
You may have noticed that I put a ";" in the timestamp. The reason is that I want to be able to sort the logs using the timestamp string, and apparently logstash is not that good at that (e.g.: http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/multi-fields.html).
I have unsuccessfull tried to use the date filter in multiple ways, and it apparently did not work.
date {
locale => "en"
match => ["logTimestamp", "YYYY-MM-dd;HH:mm:ss.SSS", "ISO8601"]
timezone => "Europe/Vienna"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
Since I read that the Joda library may have problems if the string is not strictly ISO 8601-compliant (very picky and expects a T, see https://logstash.jira.com/browse/LOGSTASH-180), I also tried to use mutate to convert the string to something like 2014-08-05T10:21:13.618 and then use "YYYY-MM-dd'T'HH:mm:ss.SSS". That also did not work.
I do not want to have to manually put a +02:00 on the time because that would give problems with daylight saving.
In any of these cases, the event goes to elasticsearch, but date does apparently nothing, as #timestamp and logTimestamp are different and no debug field is added.
Any idea how I could make the logTime strings properly sortable? I focused on converting them to a proper timestamp, but any other solution would also be welcome.
As you can see below:
When sorting over #timestamp, elasticsearch can do it properly, but since this is not the "real" log timestamp, but rather when the logstash event was read, I need (obviously) to be able to sort also over logTimestamp. This is what then is output. Obviously not that useful:
Any help is welcome! Just let me know if I forgot some information that may be useful.
Update:
Here is the filter config file that finally worked:
# Filters messages like this:
# 2014-08-05 10:21:13,618 [17] INFO Class.Type - This is a log message from the class:
# BTW, I am also multiline
# Take only type- events (type-componentA, type-componentB, etc)
filter {
# You cannot write an "if" outside of the filter!
if "type-" in [type] {
grok {
# Parse timestamp data. We need the "(?m)" so that grok (Oniguruma internally) correctly parses multi-line events
patterns_dir => "./patterns"
match => [ "message", "(?m)%{TIMESTAMP_ISO8601:logTimestampString}[ ;]\[%{DATA:logThreadId}\][ ;]%{LOGLEVEL:logLevel}[ ;]*%{GREEDYDATA:logMessage}" ]
}
# The timestamp may have commas instead of dots. Convert so as to store everything in the same way
mutate {
gsub => [
# replace all commas with dots
"logTimestampString", ",", "."
]
}
mutate {
gsub => [
# make the logTimestamp sortable. With a space, it is not! This does not work that well, in the end
# but somehow apparently makes things easier for the date filter
"logTimestampString", " ", ";"
]
}
date {
locale => "en"
match => ["logTimestampString", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "logTimestamp"
}
}
}
filter {
if "type-" in [type] {
# Remove already-parsed data
mutate {
remove_field => [ "message" ]
}
}
}
I have tested your date filter. it works on me!
Here is my configuration
input {
stdin{}
}
filter {
date {
locale => "en"
match => ["message", "YYYY-MM-dd;HH:mm:ss.SSS"]
timezone => "Europe/Vienna"
target => "#timestamp"
add_field => { "debug" => "timestampMatched"}
}
}
output {
stdout {
codec => "rubydebug"
}
}
And I use this input:
2014-08-01;11:00:22.123
The output is:
{
"message" => "2014-08-01;11:00:22.123",
"#version" => "1",
"#timestamp" => "2014-08-01T09:00:22.123Z",
"host" => "ABCDE",
"debug" => "timestampMatched"
}
So, please make sure that your logTimestamp has the correct value.
It is probably other problem. Or can you provide your log event and logstash configuration for more discussion. Thank you.
This worked for me - with a slightly different datetime format:
# 2017-11-22 13:00:01,621 INFO [AtlassianEvent::0-BAM::EVENTS:pool-2-thread-2] [BuildQueueManagerImpl] Sent ExecutableQueueUpdate: addToQueue, agents known to be affected: []
input {
file {
path => "/data/atlassian-bamboo.log"
start_position => "beginning"
type => "logs"
codec => multiline {
pattern => "^%{TIMESTAMP_ISO8601} "
charset => "ISO-8859-1"
negate => true
what => "previous"
}
}
}
filter {
grok {
match => [ "message", "(?m)^%{TIMESTAMP_ISO8601:logtime}%{SPACE}%{LOGLEVEL:loglevel}%{SPACE}\[%{DATA:thread_id}\]%{SPACE}\[%{WORD:classname}\]%{SPACE}%{GREEDYDATA:logmessage}" ]
}
date {
match => ["logtime", "yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss,SSS Z", "MMM dd, yyyy HH:mm:ss a" ]
timezone => "Europe/Berlin"
}
}
output {
elasticsearch { hosts => ["localhost:9200"] }
stdout { codec => rubydebug }
}

Logstash converting date to valid joda time (#timestamp)

Hope someone can help me out!
I have a question about logstash. I grok the following date with succes: 26/Jun/2013:14:00:26 +0200
Next, I want this date to be used as the #timestamp of the event. As you know logstash automatically adds a timestamp.
Replacing the timestamp that logstash is adding can be done by the date filter. I have added the following date filter: match => [ "date", "dd/MMM/YYYY:HH:mm:ss Z"]
But, for some reason, that doesn't work. When I test it out, I see that logstash just adds his own timestamp.
Code:
grok {
type => "log-date"
pattern => "%{HTTPDATE:date}"
}
date{
type => "log-date"
match => [ "date", "dd/MMM/YYYY:HH:mm:ss Z"]
}
I need to do this, so I can add events to elasticsearch.
Thanks in advance!
I used the following approach:
# strip the timestamp and force event timestamp to be the same.
# the original string is saved in field %{log_timestamp}.
# the original logstash input timestamp is saved in field %{event_timestamp}.
grok {
patterns_dir => "./patterns"
match => [ "message", "%{IRODS_TIMESTAMP:log_timestamp}" ]
add_tag => "got_syslog_timestamp"
add_field => [ "event_timestamp", "%{#timestamp}" ]
}
date {
match => [ "log_timestamp", "MMM dd HH:mm:ss" ]
}
mutate {
replace => [ "#timestamp", "%{log_timestamp}" ]
}
My problem now is that, even if #timestamp is replaced, I would like to convert it to a ISO8601-compatible format first so that other programs don't have problems interpreting it, like the timestamp present in "event_timestamp":
"#timestamp" => "Mar 5 14:38:40",
"#version" => "1",
"type" => "irods.relog",
"host" => "ids-dev",
"path" => "/root/logstash/reLog.2013.03.01",
"pid" => "5229",
"level" => "NOTICE",
"log_timestamp" => "Mar 5 14:38:40",
"event_timestamp" => "2013-09-17 12:20:28 UTC",
"tags" => [
[0] "got_syslog_timestamp"
]
You could convert it easily since you have the year information... In my case I would have to parse it out of the "path" (filename) attribute... but still, there does not seem to be an convert_to_iso8901 => #timestamp directive.
Hope this helps with your issue anyway! :)
The above answer is just a work around !, try to add locale => "en" to your code.
If not added, the date weekdays and month names will be parsed with the default platform locale language (spanish, french or whatever) and that's why it didn't work (since your log is in english).
date{
type => "log-date"
match => [ "date", "dd/MMM/YYYY:HH:mm:ss Z"]
locale => "en"
}

Resources