Parsing out text from a string using a logstash filter

I have an Apache Access Log that I would like to parse out some text from within the REQUEST field:
GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1"
What I would like to do is extract the 1212121212 and assign it to a field, but the target field depends on the prefix ABC&_ (so I think I need an if statement or something). The prefix could take other forms (e.g., DDD&_)
So basically I would like to say
if (prefix == ABC&_)
ABCID = 1212121212
elseif (prefix == DDD&_)
DDDID = <whatever value>
else
do nothing
I have been struggling to build the right filter in logstash to extract the id based on the prefix. Any help would be great.
Thank you

For this you would use a grok filter.
For example:
artur#pandaadb:~/dev/logstash$ ./logstash-2.3.2/bin/logstash -f conf2
Settings: Default pipeline workers: 8
Pipeline main started
GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1"
{
"message" => "GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1\"",
"#version" => "1",
"#timestamp" => "2016-07-28T15:59:12.787Z",
"host" => "pandaadb",
"prefix" => "ABC&_",
"id" => "1212121212"
}
This is your sample input, with the prefix and id parsed out.
There is no need for an if here, since the grok filter's regular expression takes care of it.
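For reference, the grok-only filter behind that first output could look something like this (a minimal sketch; stdin input and the rubydebug output codec are assumed, matching the console session above):
input { stdin {} }

filter {
  grok {
    # everything between "contentId=" and the last "=" becomes "prefix",
    # the trailing number becomes "id"
    match => { "message" => ".*contentId=%{GREEDYDATA:prefix}=%{NUMBER:id}" }
  }
}

output { stdout { codec => rubydebug } }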
You can, however (if you need the id in different fields depending on the prefix), analyse the prefix field and add the id to a different field.
That would produce output like this:
GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1"
{
"message" => "GET /foo/bar?contentId=ABC&_=1212121212 HTTP/1.1\"",
"#version" => "1",
"#timestamp" => "2016-07-28T16:05:07.442Z",
"host" => "pandaadb",
"prefix" => "ABC&_",
"id" => "1212121212",
"ABCID" => "1212121212"
}
GET /foo/bar?contentId=DDD&_=1212121212 HTTP/1.1"
{
"message" => "GET /foo/bar?contentId=DDD&_=1212121212 HTTP/1.1\"",
"#version" => "1",
"#timestamp" => "2016-07-28T16:05:20.026Z",
"host" => "pandaadb",
"prefix" => "DDD&_",
"id" => "1212121212",
"DDDID" => "1212121212"
}
The filter I used for this looks like this:
filter {
  grok {
    match => { "message" => ".*contentId=%{GREEDYDATA:prefix}=%{NUMBER:id}" }
  }
  if [prefix] =~ "ABC" {
    mutate {
      add_field => { "ABCID" => "%{id}" }
    }
  }
  if [prefix] =~ "DDD" {
    mutate {
      add_field => { "DDDID" => "%{id}" }
    }
  }
}
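As a side note (an addition, not part of the original answer): GREEDYDATA is very permissive, so if the prefix is always a few letters followed by &_, a tighter pattern is less likely to swallow neighbouring query parameters, for example:
filter {
  grok {
    # capture a short alphabetic prefix ending in "&_" instead of using GREEDYDATA
    match => { "message" => "contentId=(?<prefix>[A-Z]+&_)=%{NUMBER:id}" }
  }
}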
I hope that illustrates how to go about it. You can use this to test your grok regex:
http://grokdebug.herokuapp.com/
Have fun!
Artur

Related

Automatically parse logs fields with Logstash

Let's say I have this kind of log:
Jun 2 00:00:00 192.168.14.4 date=2016-06-01 time=23:56:05
devname=POPB-FW-01 devid=FG1K2D3I14800220 logid=1059028704 type=utm
subtype=app-ctrl eventtype=app-ctrl-all level=information vd="root"
appid=40568 user="" srcip=10.20.4.35 srcport=52438
srcintf="VRF-PUBLIC" dstip=125.209.230.238 dstport=443 dstintf="OUT"
proto=6 service="HTTPS" sessionid=424666004 applist="Monitor-all"
appcat="Web.Others" app="HTTPS.BROWSER" action=pass
hostname="lcs.naver.com" url="/" msg="Web.Others: HTTPS.BROWSER,"
apprisk=medium
So with the code below, I can extract the timestamp and the IP into future Elastic fields:
filter {
  grok {
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{IP:client}" }
  }
}
Now, how do I automatically get fields for the rest of the log? Is there a simple way to say:
The thing before the "=" is the field name and the thing after is the value.
So I can obtain a JSON document for the Elastic index with many fields for each log line:
{
"path" => "C:/Users/yoyo/Documents/yuyu/temp.txt",
"#timestamp" => 2017-11-29T10:50:18.947Z,
"#version" => "1",
"client" => "192.168.14.4",
"timestamp" => "Jun 2 00:00:00",
"date" => "2016-06-01",
"time" => "23:56:05",
"devname" => "POPB-FW-01 ",
"devid" => "FG1K2D3I14800220",
etc,...
}
Thanks in advance
Okay, I am really dumb.
It was easy: rather than searching Google for how to match on equals signs, I just had to search for key-value matching with Logstash.
So I just have to write:
filter {
  kv {
  }
}
And it's done!
Sorry
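If you also want the leading syslog timestamp and source IP from those lines in their own fields, one way (a sketch of my own; the intermediate field name kvpairs is arbitrary) is to grok the header first and point kv at the remainder:
filter {
  grok {
    # split off the syslog header, keep the key=value remainder in "kvpairs"
    match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{IP:client} %{GREEDYDATA:kvpairs}" }
  }
  kv {
    # parse only the key=value part
    source => "kvpairs"
  }
}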

Extract Parameter (sub-string) from URL GROK Pattern

I have ELK running for log analysis, and everything is working. There are just a few tweaks I would like to make. To all the ES/ELK gods on Stack Overflow, I'd appreciate any help on this. I'd gladly buy you a cup of coffee! :D
Example:
URL: /origina-www.domain.com/this/is/a/path?page=2
First I would like to get the entire path as seen above.
Second, I would like to get just the path before the parameter: /origina-www.domain.com/this/is/a/path
Third, I would like to get just the parameter: ?page=2
Fourth, I would like to make the timestamp in the logfile the main timestamp in Kibana. Currently, the timestamp Kibana is showing is the date and time the event was processed by ES.
This is what a sample entry looks like:
2016-10-19 23:57:32 192.168.0.1 GET /origin-www.example.com/url 200 1144 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-" "-"
Here's my config:
if [type] == "syslog" {
grok {
match => ["message", "%{IP:client}\s+%{WORD:method}\s+%{URIPATHPARAM:request}\s+%{NUMBER:bytes}\s+%{NUMBER:duration}\s+%{USER-AGENT}\s+%{QS:referrer}\s+%{QS:agent}%{GREEDYDATA}"]
}
date {
match => [ "timestamp", "MMM dd, yyyy HH:mm:ss a" ]
locale => "en"
}
}
ES Version: 5.0.1
Logstash Version: 5.0
Kibana: 5.0
UPDATE: I was actually able to solve it by using:
grok {
  match => ["message", "%{IP:client}\s+%{WORD:method}\s+%{URIPATHPARAM:request}\s+%{NUMBER:bytes}\s+%{NUMBER:duration}\s+%{USER-AGENT}\s+%{QS:referrer}\s+%{QS:agent}%{GREEDYDATA}"]
}
grok {
  match => [ "request", "%{GREEDYDATA:uri_path}\?%{GREEDYDATA:uri_query}" ]
}
kv {
  source => "uri_query"
  field_split => "&"
  target => "query"
}
In order to use the actual timestamp of your log entry rather than the indexing time, you could use the date and mutate plugins to override the existing timestamp value. You could have your Logstash filter look something like this:
# filter your log file
grok {
  # patterns_dir can point to a pattern file containing, for example,
  # LOGTIMESTAMP %{YEAR}%{MONTHNUM}%{MONTHDAY} %{TIME}
  # if you have to change the timestamp format
  patterns_dir => ["/pathto/patterns"]
  match => { "message" => "^%{LOGTIMESTAMP:logtimestamp}%{GREEDYDATA}" }
}
# override the existing timestamp with the new field logtimestamp
mutate {
  add_field => { "timestamp" => "%{logtimestamp}" }
  remove_field => ["logtimestamp"]
}
# insert the timestamp as UTC
date {
  match => [ "timestamp" , "ISO8601" , "yyyyMMdd HH:mm:ss.SSS" ]
  target => "timestamp"
  locale => "en"
  timezone => "UTC"
}
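For the sample line in this question specifically (2016-10-19 23:57:32 ...), the match pattern would be "yyyy-MM-dd HH:mm:ss" rather than ISO8601. A sketch, assuming your grok puts the leading date and time into a field called logtimestamp and that the log is written in UTC:
date {
  # matches "2016-10-19 23:57:32"
  match => [ "logtimestamp", "yyyy-MM-dd HH:mm:ss" ]
  target => "@timestamp"
  timezone => "UTC"
}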
You could also follow up with another question for more. Hope it helps.

CSV filter in logstash throwing "_csvparsefailure" error

I asked another question earlier which I think might be related to this question:
JSON parser in logstash ignoring data?
The reason I think it's related is that, in the previous question, Kibana wasn't displaying results from the JSON parser that have the "PROGRAM" field set to "mfd_status". Now I'm changing the way I do things: I removed the JSON parser in case it was interfering with something, but I still don't have any logs with "mfd_status" in them showing up.
csv {
  columns => ["unixTime", "unixTime2", "FACILITY_NUM", "LEVEL_NUM", "PROGRAM", "PID", "MSG_FULL"]
  source => "message"
  separator => " "
}
In my filter from the previous question I used two grok filters; now I've replaced them with a csv filter. I also have two date filters and a fingerprint filter, but they're irrelevant for this question, I think.
Example log messages:
"1452564798.76\t1452496397.00\t1\t4\tkernel\t\t[ 6252.000246] sonar: sonar_write(): waiting..."
OUTPUT:
"unixTime" => "1452564798.76",
"unixTime2" => "1452496397.00",
"FACILITY_NUM" => "1",
"LEVEL_NUM" => "4",
"PROGRAM" => "kernel",
"PID" => nil,
"MSG_FULL" => "[ 6252.000246] sonar: sonar_write(): waiting...",
"TIMESTAMP" => "2016-01-12T02:13:18.760Z",
"TIMESTAMP_second" => "2016-01-11T07:13:17.000Z"
"1452564804.57\t1452496403.00\t1\t7\tmfd_status\t\t00800F08CFB0\textra\t{\"date\":1452543203,\"host\":\"ABCD1234\",\"inet\":[\"169.254.42.207/16\",\"10.8.207.176/32\",\"172.22.42.207/16\"],\"fb0\":[\"U:1280x800p-60\",32]}"
OUTPUT:
"tags" => [
[0] "_csvparsefailure"
After it says kernel or mfd_status in the log line, there shouldn't be any more delimiters, and everything should go under the MSG_FULL field.
So, to summarize: why does one of my log messages parse correctly and the other one not? Also, even if a line doesn't parse correctly, it should still be sent to Elasticsearch, just with empty fields, I think; why doesn't that happen either?
You're almost there; you need to override two more parameters in your csv filter and both lines will be parsed correctly.
The first is skip_empty_columns => true because you have one empty field in your second log line and you need to ignore it.
The second is quote_char => "'" (or anything other than the double quote ") since your JSON contains double quotes.
csv {
  columns => ["unixTime", "unixTime2", "FACILITY_NUM", "LEVEL_NUM", "PROGRAM", "PID", "MSG_FULL"]
  source => "message"
  separator => " "
  skip_empty_columns => true
  quote_char => "'"
}
Using this, your first log line parses as:
{
"message" => "1452564798.76\\t1452496397.00\\t1\\t4\\tkernel\\t\\t[ 6252.000246] sonar: sonar_write(): waiting...",
"#version" => "1",
"#timestamp" => "2016-01-12T04:21:34.051Z",
"host" => "iMac.local",
"unixTime" => "1452564798.76",
"unixTime2" => "1452496397.00",
"FACILITY_NUM" => "1",
"LEVEL_NUM" => "4",
"PROGRAM" => "kernel",
"MSG_FULL" => "[ 6252.000246] sonar: sonar_write(): waiting..."
}
And the second log line parses as:
{
"message" => "1452564804.57\\t1452496403.00\\t1\\t7\\tmfd_status\\t\\t00800F08CFB0\\textra\\t{\\\"date\\\":1452543203,\\\"host\\\":\\\"ABCD1234\\\",\\\"inet\\\":[\\\"169.254.42.207/16\\\",\\\"10.8.207.176/32\\\",\\\"172.22.42.207/16\\\"],\\\"fb0\\\":[\\\"U:1280x800p-60\\\",32]}",
"#version" => "1",
"#timestamp" => "2016-01-12T04:21:07.974Z",
"host" => "iMac.local",
"unixTime" => "1452564804.57",
"unixTime2" => "1452496403.00",
"FACILITY_NUM" => "1",
"LEVEL_NUM" => "7",
"PROGRAM" => "mfd_status",
"MSG_FULL" => "00800F08CFB0",
"column8" => "extra",
"column9" => "{\\\"date\\\":1452543203,\\\"host\\\":\\\"ABCD1234\\\",\\\"inet\\\":[\\\"169.254.42.207/16\\\",\\\"10.8.207.176/32\\\",\\\"172.22.42.207/16\\\"],\\\"fb0\\\":[\\\"U:1280x800p-60\\\",32]}"
}
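If you still want the embedded JSON from those mfd_status lines expanded into fields (the question mentions a JSON parser that was removed), you could layer a json filter on top of the csv filter; a sketch, where the target field name mfd is my own choice:
filter {
  # only the mfd_status lines carry JSON in column9
  if [PROGRAM] == "mfd_status" {
    json {
      source => "column9"
      target => "mfd"
    }
  }
}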

How to write an expression for a special KV string in the logstash kv filter?

I've got plenty of log lines like this:
uid[118930] pageview h5_act, actTag[cyts] corpId[2] inviteType[0] clientId[3] clientVer[2.3.0] uniqueId[d317de16a78a0089b0d94d684e7a9585565ffa236138c0.85354991] srcId[0] subSrc[]
Most of these are key-value expressions in KEY[VALUE] form.
I have read the documentation but still cannot figure out how to write the configuration.
Any help would be appreciated!
You can simply configure your kv filter using the value_split and trim settings, like below:
filter {
  kv {
    value_split => "\["
    trim => "\]"
  }
}
For the sample log line you've given, you'll get:
{
"message" => "uid[118930] pageview h5_act, actTag[cyts] corpId[2] inviteType[0] clientId[3] clientVer[2.3.0] uniqueId[d317de16a78a0089b0d94d684e7a9585565ffa236138c0.85354991] srcId[0] subSrc[]",
"#version" => "1",
"#timestamp" => "2015-12-12T05:04:00.888Z",
"host" => "iMac.local",
"uid" => "118930",
"actTag" => "cyts",
"corpId" => "2",
"inviteType" => "0",
"clientId" => "3",
"clientVer" => "2.3.0",
"uniqueId" => "d317de16a78a0089b0d94d684e7a9585565ffa236138c0.85354991",
"srcId" => "0",
"subSrc" => ""
}
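Note that newer versions of the kv filter use trim_value instead of trim, so on a recent Logstash the equivalent configuration would be something like this (a sketch of the same idea as above):
filter {
  kv {
    value_split => "\["
    # trim_value replaces the older trim setting
    trim_value => "\]"
  }
}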

Logstash Doesn't Read Entire Line With File Input

I'm using Logstash and I'm having trouble getting a rather simple configuration to work.
input {
  file {
    path => "C:/path/test-data/*.log"
    start_position => beginning
    type => "usage_data"
  }
}
filter {
  if [type] == "usage_data" {
    grok {
      match => { "message" => "^\s*%{NUMBER:lineNumber}\s+%{TIMESTAMP_ISO8601:date},(?<value1>[A-Za-z0-9+/]+),(?<value2>[A-Za-z0-9+/]+),(?<value3>[A-Za-z0-9+/]+),(?<value4>[^,]+),(?<value5>[^\r]*)" }
    }
  }
  if "_grokparsefailure" not in [tags] {
    drop { }
  }
}
output {
  stdout { codec => rubydebug }
}
I call Logstash like this:
SET LS_MAX_MEM=2g
DEL "%USERPROFILE%\.sincedb_*" 2> NUL
"C:\Program Files (x86)\logstash-1.4.1\bin\logstash.bat" agent -p "C:\path\\." -w 1 -f "logstash.conf"
The output:
Using milestone 2 input plugin 'file'. This plugin should be stable, but if you see strange behavior, please let us know! For more information on plugin milestones, see http://logstash.net/docs/1.4.1/plugin-milestones {:level=>:warn}
{
"message" => ",",
"#version" => "1",
"#timestamp" => "2014-11-20T09:16:08.591Z",
"type" => "usage_data",
"host" => "my-machine",
"path" => "C:/path/test-data/monitor_20141116223000.log",
"tags" => [
[0] "_grokparsefailure"
]
}
If I parse only C:\path\test-data\monitor_20141116223000.log all lines are read and there is no grokparsefailure. If I remove C:\path\test-data\monitor_20141116223000.log the same grokparsefailure pops up in another log-file:
{
"message" => "atches in another context\r",
"#version" => "1",
"#timestamp" => "2014-11-20T09:14:04.779Z",
"type" => "usage_data",
"host" => "my-machine",
"path" => "C:/path/test-data/monitor_20140829235900.log",
"tags" => [
[0] "_grokparsefailure"
]
}
The last output in particular shows that Logstash doesn't read the entire line, or interprets a newline where there is none. It always breaks at the same line, at the same position.
Maybe I should add that the log-files contain \n as a line separator and I'm running Logstash on Windows. However, I'm not getting a whole lot of errors, just that one. And there are quite a lot of lines in there. They all appear properly when I remove the if "_grokparsefailure" ....
I assume that there is some problem with buffering, but I have no clue how to make this work. Any ideas?
Workaround: patch the filewatch library bundled with Logstash so that when a watched file is deleted, the sincedb offset for its inode is reset to 0 and the file is re-read from the beginning if it reappears:
# diff -Nur /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb.orig /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb
--- /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb.orig 2015-02-25 10:46:06.916321816 +0700
+++ /opt/logstash/vendor/bundle/jruby/1.9/gems/filewatch-0.5.1/lib/filewatch/tail.rb 2015-02-12 18:39:34.943833909 +0700
@@ -86,7 +86,9 @@
       _read_file(path, &block)
       @files[path].close
       @files.delete(path)
-      @statcache.delete(path)
+      #@statcache.delete(path)
+      inode = @statcache.delete(path)
+      @sincedb[inode] = 0
     else
       @logger.warn("unknown event type #{event} for #{path}")
     end
