Is it possible to change a field using a previous value in Logstash / Elasticsearch?

I've been searching the internet for a way to store a variable in Logstash and use or modify its value when a term matches a pattern.
Here is an example of my data source:
2017-04-12 15:49:57,641|OK|file1|98|||
2017-04-12 15:49:58,929|OK|file2|1387|null|msg_fils|
2017-04-12 15:49:58,931|OK|file3|2|msg_pere|msg_fils|
2017-04-12 15:50:17,666|OK|file1|25|||
2017-04-12 15:50:17,929|OK|file2|1387|null|msg_fils|
I'm using this grok filter to parse my source:
grok {
  match => { "message" => '%{TIMESTAMP_ISO8601:msgdates:date}\|%{WORD:verb}\|%{DATA:component}\|%{NUMBER:temps:int}\|%{DATA:msg_pere}\|%{DATA:msg_fils}\|' }
}
But in fact I want to modify the first field using its value from the previous line that contains file1.
Can you tell me if it's possible or not?
Thanks

I have found a solution to my issue and I'm sharing it here.
I'm using a plugin named logstash-filter-memorize; it can be installed with the command:
logstash-plugin install logstash-filter-memorize
So my filter looks like this:
grok {
  match => { "message" => '%{TIMESTAMP_ISO8601:msgdates:date}\|%{WORD:verb}\|%{DATA:component}\|%{NUMBER:temps:int}\|%{DATA:msg_pere}\|%{DATA:msg_fils}\|' }
}
if [component] =~ "file1" {
  mutate {
    add_field => [ "msg_id", "%{msgdates}" ]
  }
  memorize {
    fields  => [ "msg_id" ]
    default => { "msg_id" => "NOTFOUND" }
  }
}
# fills msg_id on lines that do not contain file1 from the memorized value
memorize {
  fields => [ "msg_id" ]
}
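To try this end to end, the same filter can be wrapped in a minimal test pipeline (stdin in, rubydebug out) and fed the sample lines above. This is only a sketch: the :date suffix is dropped here (grok only converts to int and float), and the pipeline should be run with a single worker (logstash -w 1) so line order is preserved for memorize.
input { stdin {} }

filter {
  grok {
    match => { "message" => '%{TIMESTAMP_ISO8601:msgdates}\|%{WORD:verb}\|%{DATA:component}\|%{NUMBER:temps:int}\|%{DATA:msg_pere}\|%{DATA:msg_fils}\|' }
  }
  if [component] =~ "file1" {
    # remember the timestamp of the latest file1 line
    mutate {
      add_field => { "msg_id" => "%{msgdates}" }
    }
    memorize {
      fields  => [ "msg_id" ]
      default => { "msg_id" => "NOTFOUND" }
    }
  }
  # copy the memorized msg_id onto lines that don't have one (file2, file3, ...)
  memorize {
    fields => [ "msg_id" ]
  }
}

output { stdout { codec => rubydebug } }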
I hope that it can be useful for others.

Related

How to filter data with Logstash before storing parsed data in Elasticsearch

I understand that Logstash is for aggregating and processing logs. I have NGINX logs and have my Logstash config set up as:
filter {
  grok {
    match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" ]
    overwrite => [ "message" ]
  }
  mutate {
    convert => ["response", "integer"]
    convert => ["bytes", "integer"]
    convert => ["responsetime", "float"]
  }
  geoip {
    source => "clientip"
    target => "geoip"
    add_tag => [ "nginx-geoip" ]
  }
  date {
    match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    remove_field => [ "timestamp" ]
  }
  useragent {
    source => "agent"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "weblogs-%{+YYYY.MM}"
    document_type => "nginx_logs"
  }
  stdout { codec => rubydebug }
}
This would parse the unstructured logs into a structured form of data, and store the data into monthly indexes.
What I discovered is that the majority of logs were contributed by robots/web-crawlers. In Python I would filter them out with:
browser_names = browser_names[~browser_names.str.\
match('^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]*$', na=False)]
However, I would like to filter them out with Logstash so I can save a lot of disk space in Elasticsearch server. Is there a way to do that? Thanks in advance!
Thanks LeBigCat for generously giving a hint. I solved this problem by adding the following under the filter:
if [browser_names] =~ /(?i)^[\w\W]*(google|bot|spider|crawl|headless)[\w\W]*$/ {
  drop {}
}
The (?i) flag is for case-insensitive matching.
In your filter you can use drop (https://www.elastic.co/guide/en/logstash/current/plugins-filters-drop.html). As you already have your pattern, it should be pretty fast ;)
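For context, here is a sketch of where such a check could sit in the filter from the question, testing the raw user-agent string (the agent field extracted by the legacy COMBINEDAPACHELOG pattern) before any further processing. The field name is an assumption based on those patterns, not the poster's exact solution:
filter {
  grok {
    match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}" ]
    overwrite => [ "message" ]
  }
  # drop crawler traffic early so later filters and Elasticsearch never see it
  if [agent] =~ /(?i)(google|bot|spider|crawl|headless)/ {
    drop {}
  }
  # ... mutate / geoip / date / useragent filters as above ...
}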

Logstash create a new field based on existing field

I have data coming from database queries using the jdbc input plugin, and the query results contain a url field from which I want to extract a few properties.
Example urls:
/incident.do?sys_id=0dc18b246faa17007a64cbe64f3ee4e1&sysparm_view
/navpage_form_default.do
/u_pm_prov_project_list.do?sysparm_userpref_module=fa547ce26f661
JOB: email read events process
JOB: System - reduce resources
I added regex patterns to my grok patterns file:
webpage_category .*
job_type .*
I have two types of url, so I used an if in the filter block to distinguish between them.
The config I have tried so far:
filter {
  if [url] =~ /JOB: .*/ {
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => {
        "url" => "JOB: %{job_type:job_type}"
      }
    }
  } else if [url] =~ /\/.*\.do\?.*/ {
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => {
        "url" => "/{webpage_category:webpage_category}\.do\?.*"
      }
    }
  }
}
Creating a new field for urls starting with JOB: works properly, but webpage_category is not working at all. Is it because a regex cannot be used inside match?
The problem is that you are trying to use a grok pattern inside a mutate filter, which won't work; mutate and grok are two separate filter plugins.
You need to use add_field inside the grok filter if you want to use a grok pattern to create a field. Please remember that add_field is supported by all filter plugins.
Please have a look at the following example:
filter {
  grok {
    add_field => { "foo_%{somefield}" => "Hello world, from %{host}" }
  }
}
In your case, it would be:
filter {
  grok {
    add_field => {
      "webpage_category" => "%{webpage_category:url}"
      "job_type" => "%{job_type:url}"
    }
  }
}
Please also make sure that patterns_dir is set:
patterns_dir => ["./patterns"]
Please check out the grok filter documentation as well.
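For completeness, the custom patterns can also be referenced directly in grok's match. Note the % in front of %{webpage_category:...}: it is missing in the question's second pattern, which on its own would stop the field from being extracted. A sketch keeping the question's conditionals:
filter {
  if [url] =~ /JOB: .*/ {
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => { "url" => "JOB: %{job_type:job_type}" }
    }
  } else if [url] =~ /\/.*\.do\?.*/ {
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      # %{webpage_category:webpage_category}, not {webpage_category:webpage_category}
      match => { "url" => "/%{webpage_category:webpage_category}\.do\?.*" }
    }
  }
}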

Logstash - find length of split result inside mutate

I'm a newbie with Logstash. Currently I'm trying to parse a log in CSV format. I need to split a field on a whitespace delimiter, then add new field(s) based on the split result.
Here is the filter I need to create:
filter {
  ...
  mutate {
    split => ["user", " "]
    if [user.length] == 2 {
      add_field => { "sourceUsername" => "%{user[0]}" }
      add_field => { "sourceAddress" => "%{user[1]}" }
    }
    else if [user.length] == 1 {
      add_field => { "sourceAddress" => "%{user[0]}" }
    }
  }
  ...
}
I got an error for the if block.
Please advise: is there any way to capture the length of the split result inside the mutate plugin?
Thanks,
Heri
Judging from your code example, I assume you are done with the csv parsing and already have a field user whose value contains either just a sourceAddress, or a sourceUsername and a sourceAddress separated by whitespace.
Now, there are a lot of filters that can be used to retrieve further fields. You don't need to use the mutate filter to split the field. In this case, a more flexible approach would be the grok filter.
Filter:
grok {
  match => {
    "user" => [
      "%{WORD:sourceUsername} %{IP:sourceAddress}",
      "%{IP:sourceAddress}"
    ]
  }
}
A field "user" => "192.168.0.99" would result in
"sourceAddress" => "191.168.0.99".
A field "user" => "Herry 192.168.0.99" would result in
"sourceUsername" => "Herry", "sourceAddress" => "191.168.0.99"
Of course you can change IP to WORD if your sourceAddress is not an IP.
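If you would rather keep the mutate split, note that conditionals cannot be placed inside a filter plugin; a ruby filter can branch on the array length instead. A minimal sketch, assuming the field is called user and the Logstash 5+ event API (event.get/event.set):
filter {
  mutate {
    split => ["user", " "]
  }
  ruby {
    code => "
      parts = event.get('user')
      if parts.is_a?(Array) && parts.length == 2
        event.set('sourceUsername', parts[0])
        event.set('sourceAddress', parts[1])
      elsif parts.is_a?(Array) && parts.length == 1
        event.set('sourceAddress', parts[0])
      end
    "
  }
}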

Data type conversion using logstash grok

Basic is a float field. The mentioned index is not present in Elasticsearch. When running the config file with logstash -f, I get no exception. Yet the data entered into Elasticsearch shows the mapping of Basic as string. How do I rectify this? And how do I do this for multiple fields?
input {
  file {
    path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
    type => "promosms_dec15"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  grok {
    match => [
      "Basic", " %{NUMBER:Basic:float}"
    ]
  }
  csv {
    columns => ["Generation_Date","Basic"]
    separator => ","
  }
  ruby {
    code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
  }
}
output {
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "promosms-%{+dd.MM.YYYY}"
    workers => 1
  }
}
You have two problems. First, your grok filter is listed prior to the csv filter and because filters are applied in order there won't be a "Basic" field to convert when the grok filter is applied.
Secondly, unless you explicitly allow it, grok won't overwrite existing fields. In other words,
grok {
  match => [
    "Basic", " %{NUMBER:Basic:float}"
  ]
}
will always be a no-op. Either specify overwrite => ["Basic"] or, preferably, use mutate's type conversion feature:
mutate {
  convert => ["Basic", "float"]
}
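As for multiple fields, convert accepts several mappings in one mutate block; the second column name below is purely hypothetical for illustration, and the mutate should come after the csv filter so the fields already exist:
filter {
  csv {
    columns => ["Generation_Date","Basic"]
    separator => ","
  }
  mutate {
    convert => {
      "Basic"   => "float"
      "Premium" => "float"   # hypothetical second numeric column
    }
  }
}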

reparsing a logstash record? fix extracts?

I'm taking a JSON message (Cloudtrail, many objects concatenated together) and by the time I'm done filtering it, Logstash doesn't seem to be parsing the message correctly. It's as if the hash was simply dumped into a string.
Anyhow, here's the input and filter.
input {
  s3 {
    bucket => "stanson-ops"
    delete => false
    #snipped unimportant bits
    type => "cloudtrail"
  }
}
filter {
  if [type] == "cloudtrail" {
    json { # http://logstash.net/docs/1.4.2/filters/json
      source => "message"
    }
    ruby {
      code => "event['RecordStr'] = event['Records'].join('~~~')"
    }
    split {
      field => "RecordStr"
      terminator => "~~~"
      remove_field => [ "message", "Records" ]
    }
  }
}
By the time I'm done, elasticsearch entries include a RecordStr key with the following data. It doesn't have a message field, nor does it have a Records field.
{"eventVersion"=>"1.01", "userIdentity"=>{"type"=>"IAMUser", "principalId"=>"xxx"}}
Note that this is not JSON syntax; it has already been parsed (which is important for the concat->split approach to work).
So the RecordStr key doesn't look quite right as a single value. Further, in Kibana, filterable fields include RecordStr (no subfields). It also includes some entries that aren't there anymore: Records.eventVersion, Records.userIdentity.type.
Why is that? How can I get the proper fields?
Edit 1: here's part of the input.
{"Records":[{"eventVersion":"1.01","userIdentity":{"type":"IAMUser",
It's unprettified JSON. It appears the body of the file (the above) ends up in the message field, the json filter extracts it, and I end up with an array of records in the Records field. That's why I join and split it; I then end up with individual documents, each with a single RecordStr entry. However, the template(?) doesn't seem to understand the new structure.
I've worked out a method that allows for indexing the appropriate CloudTrail fields as you requested. Here are the modified input and filter configs:
input {
  s3 {
    backup_add_prefix => "processed-logs/"
    backup_to_bucket => "test-bucket"
    bucket => "test-bucket"
    delete => true
    interval => 30
    prefix => "AWSLogs/<account-id>/CloudTrail/"
    type => "cloudtrail"
  }
}
filter {
  if [type] == "cloudtrail" {
    json {
      source => "message"
    }
    ruby {
      code => "event.set('RecordStr', event.get('Records').join('~~~'))"
    }
    split {
      field => "RecordStr"
      terminator => "~~~"
      remove_field => [ "message", "Records" ]
    }
    mutate {
      gsub => [
        "RecordStr", "=>", ":"
      ]
    }
    mutate {
      gsub => [
        "RecordStr", "nil", "null"
      ]
    }
    json {
      skip_on_invalid_json => true
      source => "RecordStr"
      target => "cloudtrail"
    }
    mutate {
      add_tag => ["cloudtrail"]
      remove_field => ["RecordStr", "#version"]
    }
    date {
      match => ["[cloudtrail][eventTime]", "ISO8601"]
    }
  }
}
The key observation here is that once the split is done we no longer possess valid json in the event and are therefore required to execute the mutate replacements ('=>' to ':' and 'nil' to 'null'). Additionally, I found it useful to get the timestamp out of the CloudTrail eventTime and do some cleanup of unnecessary fields.
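As an aside, the split filter can also split an array field directly (one output event per element), which avoids the join/gsub round-trip entirely; a sketch, assuming the same event layout:
filter {
  if [type] == "cloudtrail" {
    json {
      source => "message"
    }
    # each output event carries a single record under [Records]
    split {
      field => "Records"
    }
    date {
      match => ["[Records][eventTime]", "ISO8601"]
    }
  }
}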
