Logstash - find length of split result inside mutate - elasticsearch

I'm a newbie with Logstash. Currently I'm trying to parse a log in CSV format. I need to split a field on a whitespace delimiter, then add new field(s) based on the split result.
Here is the filter I need to create:
filter {
  ...
  mutate {
    split => ["user", " "]
    if [user.length] == 2 {
      add_field => { "sourceUsername" => "%{user[0]}" }
      add_field => { "sourceAddress" => "%{user[1]}" }
    }
    else if [user.length] == 1 {
      add_field => { "sourceAddress" => "%{user[0]}" }
    }
  }
  ...
}
I get an error because of the if statement.
Please advise: is there any way to capture the length of the split result inside the mutate plugin?
Thanks,
Heri

According to your code example, I suppose you are done with the CSV parsing and already have a field user whose value contains either just a sourceAddress or a sourceUsername and a sourceAddress separated by whitespace.
Now, there are a lot of filters that can be used to extract further fields; you don't need the mutate filter to split the field. In this case, a more flexible approach would be the grok filter.
Filter:
grok {
  match => {
    "user" => [
      "%{WORD:sourceUsername} %{IP:sourceAddress}",
      "%{IP:sourceAddress}"
    ]
  }
}
A field "user" => "192.168.0.99" would result in
"sourceAddress" => "191.168.0.99".
A field "user" => "Herry 192.168.0.99" would result in
"sourceUsername" => "Herry", "sourceAddress" => "191.168.0.99"
Of course you can change IP to WORD if your sourceAddress is not an IP.
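If you would rather split on whitespace and branch on the number of parts, as in the question, that logic can be done in a single ruby filter instead of mutate, since mutate has no built-in conditionals. A minimal sketch, assuming the field names from the question and the Logstash 5+ event API:
ruby {
  code => "
    # split 'user' on whitespace and set fields based on how many parts we get
    parts = event.get('user').to_s.split(' ')
    if parts.length == 2
      event.set('sourceUsername', parts[0])
      event.set('sourceAddress', parts[1])
    elsif parts.length == 1
      event.set('sourceAddress', parts[0])
    end
  "
}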

Related

Create a new field with ruby filter

So here params is a hash, and I would like to concatenate all of its keys and store the result in a new field. How can I achieve that? I found that it is possible to use inline Ruby code in the configuration file, but I have no idea how to assign the return value of the concatenation.
grok { match => { "request" => [ "url", "%{URIPATH:url_path}%{URIPARAM:url_params}?" ] } }
urldecode { field => "url_path" }
mutate { gsub => [ "url_params", "\?", "" ] }
kv {
  field_split => "&"
  source => "url_params"
  target => "params"
}
urldecode { field => "params" }
ruby {
  code => 'pattern = params.keys.join(",")'
  # pattern should be the new field that contains the keys, separated by commas
}
Expected result should be:
pattern = "param1,param2,param3 ... and so on"
The solution is to assign the value through the event API:
event.set('field_name', field_value)
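Applied to the configuration above, the ruby filter could look like the following sketch; it assumes the kv filter has already placed the parsed parameters under the params field:
ruby {
  # join all keys of the "params" hash into a comma-separated string
  # and store it in a new top-level field called "pattern"
  code => "event.set('pattern', event.get('params').keys.join(','))"
}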

Is it possible to change a field by a previous value in logstash

I'm searching the internet for a way to store a variable in Logstash and use or modify its value when a term matches a pattern.
Here is an example of my data source:
2017-04-12 15:49:57,641|OK|file1|98|||
2017-04-12 15:49:58,929|OK|file2|1387|null|msg_fils|
2017-04-12 15:49:58,931|OK|file3|2|msg_pere|msg_fils|
2017-04-12 15:50:17,666|OK|file1|25|||
2017-04-12 15:50:17,929|OK|file2|1387|null|msg_fils|
I'm using this grok code to parse my source.
grok {
  match => { "message" => '%{TIMESTAMP_ISO8601:msgdates:date}\|%{WORD:verb}\|%{DATA:component}\|%{NUMBER:temps:int}\|%{DATA:msg_pere}\|%{DATA:msg_fils}\|' }
}
But in fact I want to modify the first field using the previous value from the line that contains file1.
Can you tell me if it's possible or not?
Thanks
I have found a solution to my issue and am sharing it here.
I'm using a plugin named logstash-filter-memorize; it can be installed with the command:
logstash-plugin install logstash-filter-memorize
So my filter looks like this:
grok {
  match => { "message" => '%{TIMESTAMP_ISO8601:msgdates:date}\|%{WORD:verb}\|%{DATA:component}\|%{NUMBER:temps:int}\|%{DATA:msg_pere}\|%{DATA:msg_fils}\|' }
}
if [component] =~ "file1" {
  mutate {
    add_field => [ "msg_id", "%{msgdates}" ]
  }
  memorize {
    fields => [ "msg_id" ]
    default => { "msg_id" => "NOTFOUND" }
  }
}
memorize {
  fields => [ "msg_id" ]
}
I hope that it can be useful for others.

How to remove part of the string before specific word using grok or gsub in logstash?

I have a string field "origin_message". It is a pretty big one (I used multiline to get the mail content). Example of "origin_message":
Delivered-to: somemail@domain.com A LOT OF OTHER CONTENT Subject: Subject goes here AND THE REST OF THE MESSAGE
Desired result:
Subject goes here AND THE REST OF THE MESSAGE
Is there a way to trim everything before the "Subject:" phrase?
I have tried the following filter with no luck:
filter {
  mutate {
    add_field => { "original_message" => "%{message}" }
    convert => {
      "original_message" => "string"
    }
    gsub => [
      "original_message", "^(.*)Subject", " "
    ]
  }
}
Not sure why, but using gsub on the "message" field before copying it to a separate "original_message" field fixed the issue. (Most likely because mutate applies its operations in a fixed internal order, so in the first attempt the gsub ran before add_field had created "original_message".)
filter {
  mutate {
    gsub => ["message", "^(.*)Subject", " "]
    add_field => { "original_message" => "%{message}" }
    convert => {
      "original_message" => "string"
    }
  }
}
@Val, thanks for the verification. The issue turned out not to be pattern-related.
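Since the question also mentions grok, the same trim could be done by capturing everything from "Subject:" onward into the new field. A rough sketch, reusing the field names from the question:
filter {
  grok {
    # capture everything that follows "Subject: " into original_message;
    # the source and target field names mirror the question
    match => { "message" => "Subject: %{GREEDYDATA:original_message}" }
  }
}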

Data type conversion using logstash grok

Basic is a float field. The mentioned index is not present in Elasticsearch. When running the config file with logstash -f, I get no exception. Yet the data that ends up in Elasticsearch shows the mapping of Basic as string. How do I rectify this? And how do I do this for multiple fields?
input {
  file {
    path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
    type => "promosms_dec15"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => [
      "Basic", " %{NUMBER:Basic:float}"
    ]
  }
  csv {
    columns => ["Generation_Date","Basic"]
    separator => ","
  }
  ruby {
    code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "promosms-%{+dd.MM.YYYY}"
    workers => 1
  }
}
You have two problems. First, your grok filter is listed before the csv filter, and because filters are applied in order, there won't be a "Basic" field to convert when the grok filter runs.
Second, unless you explicitly allow it, grok won't overwrite existing fields. In other words,
grok {
  match => [
    "Basic", " %{NUMBER:Basic:float}"
  ]
}
will always be a no-op. Either specify overwrite => ["Basic"] or, preferably, use mutate's type conversion feature:
mutate {
  convert => ["Basic", "float"]
}
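Putting it together, the filter section might be reordered like this: a sketch that keeps the csv and ruby parts from the question and swaps the grok for mutate's convert:
filter {
  csv {
    columns => ["Generation_Date", "Basic"]
    separator => ","
  }
  # convert the Basic column to a float after csv has created it
  mutate {
    convert => ["Basic", "float"]
  }
  # unchanged from the question (old Logstash 1.4 event API)
  ruby {
    code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
  }
}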

reparsing a logstash record? fix extracts?

I'm taking a JSON message (CloudTrail, many objects concatenated together) and by the time I'm done filtering it, Logstash doesn't seem to be parsing the message correctly. It's as if the hash was simply dumped into a string.
Anyhow, here's the input and filter.
input {
  s3 {
    bucket => "stanson-ops"
    delete => false
    # snipped unimportant bits
    type => "cloudtrail"
  }
}
filter {
  if [type] == "cloudtrail" {
    json { # http://logstash.net/docs/1.4.2/filters/json
      source => "message"
    }
    ruby {
      code => "event['RecordStr'] = event['Records'].join('~~~')"
    }
    split {
      field => "RecordStr"
      terminator => "~~~"
      remove_field => [ "message", "Records" ]
    }
  }
}
By the time I'm done, elasticsearch entries include a RecordStr key with the following data. It doesn't have a message field, nor does it have a Records field.
{"eventVersion"=>"1.01", "userIdentity"=>{"type"=>"IAMUser", "principalId"=>"xxx"}}
Note that this is not JSON-style; it has been parsed (which is important for the concat->split approach to work).
So the RecordStr key doesn't look quite right as one value. Further, in Kibana, the filterable fields include RecordStr (no subfields), and they still include some entries that aren't there anymore: Records.eventVersion, Records.userIdentity.type.
Why is that? How can I get the proper fields?
Edit 1: here's part of the input.
{"Records":[{"eventVersion":"1.01","userIdentity":{"type":"IAMUser",
It's unprettified JSON. It appears the body of the file (the above) is in the message field; the json filter extracts it and I end up with an array of records in the Records field. That's why I join and split it: I then end up with individual documents, each with a single RecordStr entry. However, the template(?) doesn't seem to understand the new structure.
I've worked out a method that allows for indexing the appropriate CloudTrail fields as you requested. Here are the modified input and filter configs:
input {
  s3 {
    backup_add_prefix => "processed-logs/"
    backup_to_bucket => "test-bucket"
    bucket => "test-bucket"
    delete => true
    interval => 30
    prefix => "AWSLogs/<account-id>/CloudTrail/"
    type => "cloudtrail"
  }
}
filter {
  if [type] == "cloudtrail" {
    json {
      source => "message"
    }
    ruby {
      code => "event.set('RecordStr', event.get('Records').join('~~~'))"
    }
    split {
      field => "RecordStr"
      terminator => "~~~"
      remove_field => [ "message", "Records" ]
    }
    mutate {
      gsub => [
        "RecordStr", "=>", ":"
      ]
    }
    mutate {
      gsub => [
        "RecordStr", "nil", "null"
      ]
    }
    json {
      skip_on_invalid_json => true
      source => "RecordStr"
      target => "cloudtrail"
    }
    mutate {
      add_tag => ["cloudtrail"]
      remove_field => ["RecordStr", "@version"]
    }
    date {
      match => ["[cloudtrail][eventTime]", "ISO8601"]
    }
  }
}
The key observation here is that once the split is done we no longer have valid JSON in the event, so the mutate replacements ('=>' to ':' and 'nil' to 'null') are required before re-parsing. Additionally, I found it useful to get the timestamp out of the CloudTrail eventTime and do some cleanup of unnecessary fields.
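For what it's worth, the split filter can also split an array field directly, which avoids the join/gsub/re-parse round trip entirely. A minimal sketch of that variant (not tested against this exact pipeline):
filter {
  if [type] == "cloudtrail" {
    json {
      source => "message"
    }
    # emit one event per element of the Records array; each element
    # remains a structured object, so no string round trip is needed
    split {
      field => "Records"
    }
    date {
      match => ["[Records][eventTime]", "ISO8601"]
    }
  }
}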
