Tagging the Logs by Logstash - Grok - ElasticSearch

Summary:
I am using Logstash, Grok, and Elasticsearch. My main aim is to first accept the logs with Logstash, parse them with Grok, associate tags with the messages depending on the type of log, and finally feed everything to the Elasticsearch server to query with Kibana.
I have already written this config but am not able to get the tags into Elasticsearch.
This is my Logstash config file.
input {
  stdin {
    type => "stdin-type"
  }
}
filter {
  grok {
    tags => "mytags"
    pattern => "I am a %{USERNAME}"
    add_tag => "mytag"
    named_captures_only => true
  }
}
output {
  stdout { debug => true debug_format => "json" }
  elasticsearch {}
}
Where am I going wrong?

1) I would first start by editing your values to match the data type they represent. For example,
add_tag => "mytag"
should actually have an array as its value, not a simple string. Change that to
add_tag => ["mytag"]
as a good start. Double-check all your values and verify they are of the correct type for Logstash.
2) You are limiting your grok filter to messages that are already tagged with "mytags", because of the config line
tags => "mytags"
I don't see anywhere that you add that tag ahead of time, so none of your messages will even go through your grok filter. A corrected filter block is sketched after this list.
3) Please read the Logstash docs carefully. I am rather new to the Logstash/Grok/ES/Kibana world as well, but I have had very similar problems to yours, and all of them were solved by paying close attention to what the documentation says.
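Putting points 1) and 2) together, a minimal corrected filter block might look like the sketch below. This is only a sketch: it assumes you simply want the tag applied whenever the pattern matches, so the tags => restriction is dropped, and note that newer Logstash versions use match => instead of the older pattern => option.
filter {
  grok {
    # match the incoming line against the pattern from the question
    pattern => "I am a %{USERNAME}"
    named_captures_only => true
    # add_tag takes an array, not a bare string
    add_tag => ["mytag"]
  }
}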

You can run Logstash by hand (you may already be doing this) with /opt/logstash/bin/logstash -f $CONFIG_FILE, and you can check that your config file is valid with /opt/logstash/bin/logstash -f $CONFIG_FILE --configtest. I bet you're already doing that, though.
You may need to put your add_tag stanza into an array:
grok {
  ...
  add_tag => [ "mytag" ]
}
It could also be that what you're piping into STDIN isn't being matched by the grok pattern. If grok doesn't match, it should result in _grokparsefailure being added to your tags. If you see those, it means your grok pattern isn't firing.
A better way to do this may be:
input {
  stdin {
    type => 'stdin'
  }
}
filter {
  if [type] == 'stdin' {
    mutate {
      add_tag => [ "mytag" ]
    }
  }
}
output {
  stdout {
    codec => 'rubydebug'
  }
}
This will add a "mytag" tag to everything coming from standard in, whether it's grokked or not.
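For completeness, here is a sketch that combines this with the grok pattern from the original question. It assumes you still want to attempt the "I am a %{USERNAME}" parse after tagging; the captured field name username is just an example.
input {
  stdin {
    type => "stdin"
  }
}
filter {
  if [type] == "stdin" {
    # tag everything arriving on stdin, whether it parses or not
    mutate {
      add_tag => [ "mytag" ]
    }
    # then attempt the parse; non-matching lines pick up _grokparsefailure
    grok {
      match => { "message" => "I am a %{USERNAME:username}" }
    }
  }
}
output {
  stdout {
    codec => "rubydebug"
  }
  elasticsearch {}
}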

Related

After adding Prune filter along with KV filter - logs are not going to Elasticsearch

I am learning ELK and trying to do a POC for my project. I am applying the KV filter to sample integration logs from my project, and I could see a lot of extra fields coming through as a result, so I applied the prune filter and whitelisted certain fields. I can see the logs getting printed on the Logstash server, but the logs are not going to Elasticsearch. If I remove the filter, they do go to Elasticsearch. Please advise how to debug this issue further.
filter {
  kv {
    field_split => "{},?\[\]"
    transform_key => "capitalize"
    transform_value => "capitalize"
    trim_key => "\s"
    trim_value => "\s"
    include_brackets => false
  }
  prune {
    whitelist_names => [ "App_version", "Correlation_id", "Env", "Flow_name", "host", "Instance_id", "log_level", "log_thread", "log_timestamp", "message", "patient_id", "status_code", "type", "detail" ]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "mule-%{+YYYY.MM.dd}"
    #user => "elastic"
    #password => "changeme"
  }
  stdout { codec => rubydebug }
}
I also need two more suggestions.
I am also trying to use a grok filter on the initial logs, to take the log-level fields (time and log type) from the sample log and send the remaining text to the KV filter. Is there any reference for this? Please share it. This is what I have tried, but I am getting _grokparsefailure. I have passed msgbody to the kv filter with the source option.
grok {
  match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:loglevel}\s+%{GREEDYDATA:msgbody}" }
  overwrite => [ "msgbody" ]
}
I have message fields inside the sample logs as shown below. When the data goes to Kibana, I can see two message fields: one with the full log and the other with the correct message (highlighted). Will mutate work for this case? Is there any way to rename the full-log field to something else?
[2020-02-10 11:20:07.172] INFO Mule.api [[MuleRuntime].cpuLight.04:
[main-api-test].api-main.CPU_LITE #256c5cf5:
[main-api-test].main-api-main/processors/0/processors/0.CPU_LITE
#378f34b0]: event:00000003 {app_name=main-api-main, app_version=v1,
env=Test, timestamp=2020-02-10T11:20:07.172Z,
log={correlation_id=00000003, patient_id=12345678,
instance_id=hospital, message=Start of System API,
flow_name=main-api-main}}
prune filter error
Your prune filter does not have the @timestamp field in the whitelist_names list. Your output is date based (%{+YYYY.MM.dd}), and Logstash needs the @timestamp field in the output to extract the date.
I've run your pipeline with your sample message and it worked as expected: with the prune filter the message is sent to Elasticsearch, but it is stored in an index named just mule-, without any datetime field.
Without the prune filter your message uses the time when Logstash received the event as the @timestamp, since you do not have any date filter to change it.
If you created the index pattern for the index mule-* with a datetime field like @timestamp, you won't see in Kibana any documents in an index that doesn't have that datetime field.
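In other words, adding @timestamp to the whitelist should be enough for the date-based index name to work again. A sketch of the adjusted prune filter, keeping the original whitelist:
prune {
  # @timestamp must survive the prune or the %{+YYYY.MM.dd} index name cannot be built
  whitelist_names => [ "@timestamp", "App_version", "Correlation_id", "Env", "Flow_name", "host", "Instance_id", "log_level", "log_thread", "log_timestamp", "message", "patient_id", "status_code", "type", "detail" ]
}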
grok error
Your grok is wrong: you need to escape the square brackets surrounding your timestamp. Kibana has a grok debugger where you can try out your patterns.
The following grok works; move your kv to run after the grok, with msgbody as its source.
grok {
  match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\]\s+%{LOGLEVEL:loglevel}\s+%{GREEDYDATA:msgbody}" }
  overwrite => [ "msgbody" ]
}
kv {
  source => "msgbody"
  field_split => "{},?\[\]"
  transform_key => "capitalize"
  transform_value => "capitalize"
  trim_key => "\s"
  trim_value => "\s"
  include_brackets => false
}
Just run it with output only to stdout to see the resulting fields, then adjust your prune filter accordingly.
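For that kind of inspection a stripped-down output section is enough; something like this while debugging:
output {
  # put the elasticsearch output back once the fields look right
  stdout { codec => rubydebug }
}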
duplicated message fields
If you put your kv filter after the grok, you won't have duplicated message fields: since your kv is capitalizing your fields, you will end up with a message field containing your full log and a Message field containing your internal message (Logstash field names are case sensitive).
However, you can rename any field using the mutate filter.
mutate {
  rename => ["message", "fullLogMessage"]
}

Duplicate field values for grok-parsed data

I have a Filebeat that captures logs from a uwsgi application running in Docker. The data is sent to Logstash, which parses it and forwards it to Elasticsearch.
Here is the logstash conf file:
input {
  beats {
    port => 5044
  }
}
filter {
  grok {
    match => { "log" => "\[pid: %{NUMBER:worker.pid}\] %{IP:request.ip} \{%{NUMBER:request.vars} vars in %{NUMBER:request.size} bytes} \[%{HTTPDATE:timestamp}] %{URIPROTO:request.method} %{URIPATH:request.endpoint}%{URIPARAM:request.params}? => generated %{NUMBER:response.size} bytes in %{NUMBER:response.time} msecs(?: via sendfile\(\))? \(HTTP/%{NUMBER:request.http_version} %{NUMBER:response.code}\) %{NUMBER:headers} headers in %{NUMBER:response.size} bytes \(%{NUMBER:worker.switches} switches on core %{NUMBER:worker.core}\)" }
  }
  date {
    # 29/Oct/2018:06:50:38 +0700
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  kv {
    source => "request.params"
    field_split => "&?"
    target => "request.query"
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "test-index"
  }
}
Everything was fine, but I've noticed that all the values captured by the grok pattern are duplicated. Here is how it looks in Kibana:
Note that raw data like log, which wasn't grok output, is fine. I've seen that the kv filter has an allow_duplicate_values parameter, but it doesn't apply to grok.
What is wrong with my configuration? Also, is it possible to rerun grok patterns on existing data in Elasticsearch?
Maybe your Filebeat is already doing the job and creating these fields.
Did you try adding this parameter to your grok?
overwrite => [ "request.ip", "request.endpoint", ... ]
To rerun grok on already-indexed data, you need to use the elasticsearch input plugin to read the data from ES and re-index it after grok.
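A rough sketch of that re-indexing pipeline is below. The index names, query, and grok pattern are placeholders to replace with your own, and writing to a new index (rather than back into the original) is just a cautious assumption.
input {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "test-index"                         # read back the already-indexed documents
    query => '{ "query": { "match_all": {} } }'
  }
}
filter {
  grok {
    # re-run the (corrected) pattern against the stored log field; placeholder pattern
    match => { "log" => "%{GREEDYDATA:parsed}" }
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "test-index-reparsed"                # write to a new index instead of the original
  }
}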

Logstash: create a new field based on an existing field

I have data coming from database queries using the jdbc input plugin, and the query results contain a url field from which I want to extract a few properties.
Example urls:
/incident.do?sys_id=0dc18b246faa17007a64cbe64f3ee4e1&sysparm_view
/navpage_form_default.do
/u_pm_prov_project_list.do?sysparm_userpref_module=fa547ce26f661
JOB: email read events process
JOB: System - reduce resources
I added these regex patterns to my grok patterns file:
webpage_category .*
job_type .*
I have two types of url, so I used an if in the filter block to distinguish between them.
The config I have tried so far:
filter {
  if [url] =~ /JOB: .*/ {
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => {
        "url" => "JOB: %{job_type:job_type}"
      }
    }
  } else if [url] =~ /\/.*\.do\?.*/ {
    grok {
      patterns_dir => ["/etc/logstash/patterns"]
      match => {
        "url" => "/{webpage_category:webpage_category}\.do\?.*"
      }
    }
  }
}
Creating a new field for urls starting with JOB: works properly, but webpage_category is not working at all. Is it because regex cannot be used inside of match?
The problem is that you are trying to use a grok pattern inside a mutate filter, which won't work; mutate and grok are two separate filter plugins.
You need to use add_field inside the grok filter if you want to use a grok pattern to create a field. Please remember that add_field is supported by all filter plugins.
Please have a look at the following example:
filter {
  grok {
    add_field => { "foo_%{somefield}" => "Hello world, from %{host}" }
  }
}
In your case, it will be:
filter {
  grok {
    add_field => {
      "webpage_category" => "%{webpage_category:url}"
      "job_type" => "%{job_type:url}"
    }
  }
}
Please also make sure patterns_dir is imported:
patterns_dir => ["./patterns"]
Please check out the grok filter documentation as well.
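For reference, the custom patterns from the question live in a plain file inside the patterns_dir, one NAME REGEX pair per line, and are then referenced with the usual %{NAME:field} syntax. A sketch using the question's own pattern names (note the leading %, which the question's webpage_category match line appears to be missing):
# /etc/logstash/patterns/custom -- one "NAME regex" pair per line:
# webpage_category .*
# job_type .*
filter {
  grok {
    patterns_dir => ["/etc/logstash/patterns"]
    # custom patterns are referenced exactly like built-in ones
    match => { "url" => "%{webpage_category:webpage_category}\.do\?.*" }
  }
}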

Drop filter not working in Logstash

I have multiple log messages in a file which I am processing using Logstash filter plugins. The filtered logs are then sent to Elasticsearch.
There is one field called addID in each log message. I want to drop all log messages that have a particular addID present. These particular addIDs are listed in an ID.yml file.
Scenario: if the addID of a log message matches any of the addIDs present in the ID.yml file, that log message should be dropped.
Could anyone help me achieve this?
Below is my config file.
input {
  file {
    path => "/Users/jshaw/logs/access_logs.logs"
    ignore_older => 0
  }
}
filter {
  grok {
    patterns_dir => ["/Users/jshaw/patterns"]
    match => ["message", "%{TIMESTAMP:Timestamp}+{IP:ClientIP}+{URI:Uri}"]
  }
  kv {
    field_split => "&?"
    include_keys => [ "addID" ]
    allow_duplicate_values => "false"
  }
  if [addID] in "/Users/jshaw/addID.yml" {
    drop{}
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
You are using the in operator wrong. It is used to check whether a value is in an array, not in a file; files are usually a bit more complicated to work with.
One solution would be to use the ruby filter to open the file each time; a sketch of that is shown after the simpler alternative below.
Or put the addID values directly in your configuration file, like this:
if [addID] == "addID" {
  drop{}
}

How to handle non-matching Logstash grok filters

I am wondering what the best approach is to take with my Logstash grok filters. I have some filters that are meant for specific log entries and won't apply to all entries. The ones that don't apply always generate _grokparsefailure tags. For example, I have one grok filter that's for every log entry, and it works fine. Then I have another filter that's for error messages with tracebacks. The traceback filter throws a _grokparsefailure for every single log entry that doesn't have a traceback.
I'd prefer to have it just skip the rule if there isn't a match instead of adding the parse-failure tag. I use the parse-failure tag to find things that aren't parsing properly, not things that simply didn't match a particular filter. Maybe it's just the nomenclature "parse failure" that gets me. To me that means there's something wrong with the filter (e.g. badly formatted), not that it didn't match.
So the question is, how should I handle this?
Make the filter pattern optional using ?
(ab)use the tag_on_failure option by setting it to nothing []
make the filter conditional using something like "if traceback in message"
something else I'm not considering?
Thanks in advance.
EDIT
I took the path of adding a conditional around the filter:
if [message] =~ /took\s\d+/ {
  grok {
    patterns_dir => "/etc/logstash/patterns"
    match => ["message", "took\s+(?<servicetime>[\d\.]+)"]
    add_tag => [ "stats", "servicetime" ]
  }
}
Still interested in feedback though. What is considered "best practice" here?
When possible, I'd go with a conditional wrapper just like the one you're using. Feel free to post that as an answer!
If your application produces only a few different line formats, you can use multiple match patterns with the grok filter. By default, the filter will process up to the first successful match:
grok {
  patterns_dir => "./patterns"
  match => {
    "message" => [
      "%{BASE_PATTERN} %{EXTRA_PATTERN}",
      "%{BASE_PATTERN}",
      "%{SOME_OTHER_PATTERN}"
    ]
  }
}
If your logic is less straightforward (maybe you need to check the same condition more than once), the grep filter can be useful to add a tag. Something like this:
grep {
  drop => false  # grep normally drops non-matching events
  match => ["message", "/took\s\d+/"]
  add_tag => "has_traceback"
}
...
if "has_traceback" in [tags] {
  ...
}
You can also add tag_on_failure => [] to your grok stanza like so:
grok {
  match => ["context", "\"tags\":\[%{DATA:apptags}\]"]
  tag_on_failure => [ ]
}
grok will still fail, but will do so without adding to the tags array.
This is the most efficient way of doing this: drop any event the grok filter doesn't match.
filter {
  grok {
    match => [ "message", "something" ]
  }
  if "_grokparsefailure" in [tags] {
    drop { }
  }
}
You can also do this
remove_tag => [ "_grokparsefailure" ]
whenever you have a match.
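In context, that looks roughly like this (the pattern is just the servicetime example from earlier in the thread):
grok {
  match => [ "message", "took\s+(?<servicetime>[\d\.]+)" ]
  # only runs when this grok matches, clearing a failure tag left by an earlier grok
  remove_tag => [ "_grokparsefailure" ]
}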
