Duplicate field values for grok-parsed data - elasticsearch

I have a Filebeat instance that captures logs from a uWSGI application running in Docker. The data is sent to Logstash, which parses it and forwards it to Elasticsearch.
Here is the logstash conf file:
input {
beats {
port => 5044
}
}
filter {
grok {
match => { "log" => "\[pid: %{NUMBER:worker.pid}\] %{IP:request.ip} \{%{NUMBER:request.vars} vars in %{NUMBER:request.size} bytes} \[%{HTTPDATE:timestamp}] %{URIPROTO:request.method} %{URIPATH:request.endpoint}%{URIPARAM:request.params}? => generated %{NUMBER:response.size} bytes in %{NUMBER:response.time} msecs(?: via sendfile\(\))? \(HTTP/%{NUMBER:request.http_version} %{NUMBER:response.code}\) %{NUMBER:headers} headers in %{NUMBER:response.size} bytes \(%{NUMBER:worker.switches} switches on core %{NUMBER:worker.core}\)" }
}
date {
# 29/Oct/2018:06:50:38 +0700
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z"]
}
kv {
source => "request.params"
field_split => "&?"
target => "request.query"
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "test-index"
}
}
Everything was fine, but I've noticed that all values captured by the grok pattern are duplicated when viewed in Kibana.
Note that raw fields such as log, which are not produced by grok, are fine. I've seen that the kv filter has an allow_duplicate_values parameter, but it doesn't apply to grok.
What is wrong with my configuration? Also, is it possible to rerun grok patterns on existing data in elasticsearch?

Maybe your Filebeat is already doing the job and creating these fields.
Did you try adding this parameter to your grok filter?
overwrite => [ "request.ip", "request.endpoint", ... ]
To rerun grok on already indexed data, you need to use the elasticsearch input plugin to read the data from Elasticsearch and re-index it after grok.
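A minimal sketch of such a re-indexing pipeline, assuming the documents live in test-index on localhost and that writing to a separate index (test-index-reparsed, a name made up here) is acceptable. The grok pattern is shortened for illustration; the full pattern and the date and kv filters from the question would go in its place. docinfo_target is set explicitly because its default differs between plugin versions.

```
input {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test-index"
    # Keep _index and _id of each source document under @metadata
    docinfo => true
    docinfo_target => "[@metadata][doc]"
  }
}
filter {
  grok {
    # Shortened pattern, for illustration only
    match => { "log" => "\[pid: %{NUMBER:worker.pid}\]" }
    # Replace the value already stored instead of appending a second one
    overwrite => [ "worker.pid" ]
  }
}
output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "test-index-reparsed"
    # Reuse the original _id so a rerun updates documents instead of duplicating them
    document_id => "%{[@metadata][doc][_id]}"
  }
}
```

Because the re-read events already carry the previously parsed fields, overwrite matters here: without it grok would add the new capture next to the old value.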

Related

Logstash Elasticsearch plugin. Compare results from two sources

I have two deployed Elasticsearch clusters. The data should supposedly be the same in both clusters. My main aim is to compare the _source field of each Elasticsearch document between the source and target clusters.
I created a Logstash config in which I define an elasticsearch input plugin that runs over each document in the source cluster. Next, using the elasticsearch filter, I want to look up the target cluster, query the document by the _id taken from the source cluster, and compare the _source fields of both documents.
Could you please help me implement such a config?
input {
elasticsearch {
hosts => ["source_cluster:9200"]
ssl => true
user => "user"
password => "password"
index => "my_index_pattern"
}
}
filter {
mutate {
remove_field => ["@version", "@timestamp"]
}
elasticsearch {
hosts => ["target_cluster:9200"]
ssl => true
user => "user"
password => "password"
query => ???????
match _source field ????
}
}
output {
stdout { codec => rubydebug }
}
Maybe print some results of comparison...

After adding Prune filter along with KV filter - logs are not going to Elastic search

I am learning ELK and building a proof of concept for my project. I am applying the kv filter to sample integration logs from my project, and I could see a lot of extra fields coming through as a result, so I tried applying the prune filter and whitelisted certain fields. I can see the logs being printed on the Logstash server, but they are not reaching Elasticsearch. If I remove the filter, they do reach Elasticsearch. Please advise how to debug this issue further.
filter {
kv {
field_split => "{},?\[\]"
transform_key => "capitalize"
transform_value => "capitalize"
trim_key => "\s"
trim_value => "\s"
include_brackets => false
}
prune
{
whitelist_names => [ "App_version", "Correlation_id", "Env", "Flow_name", "host", "Instance_id", "log_level","log_thread", "log_timestamp", "message", "patient_id", "status_code", "type", "detail"]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "mule-%{+YYYY.MM.dd}"
#user => "elastic"
#password => "changeme"
}
stdout { codec => rubydebug }
}
I also need two more suggestions.
I am also trying to use the grok filter on the initial logs to extract the log-level fields (time and log type) from the sample log and send the rest of the log to the kv filter. If there is any reference for this, please share it. This is what I have tried, but I am getting _grokparsefailure. I have passed msgbody to the kv filter with the source option.
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:loglevel}\s+%{GREEDYDATA:msgbody}"}
overwrite => [ "msgbody" ]
}
My sample logs contain a message field, as shown below. When the data reaches Kibana, I can see two message fields: one with the full log and the other with the correct message (highlighted). Will mutate work for this case? Is there any way to rename the full-log field to something else?
[2020-02-10 11:20:07.172] INFO Mule.api [[MuleRuntime].cpuLight.04:
[main-api-test].api-main.CPU_LITE #256c5cf5:
[main-api-test].main-api-main/processors/0/processors/0.CPU_LITE
#378f34b0]: event:00000003 {app_name=main-api-main, app_version=v1,
env=Test, timestamp=2020-02-10T11:20:07.172Z,
log={correlation_id=00000003, patient_id=12345678,
instance_id=hospital, message=Start of System API,
flow_name=main-api-main}}
prune filter error
Your prune filter does not have the @timestamp field in the whitelist_names list. Your output index is date based (%{+YYYY.MM.dd}), and Logstash needs the @timestamp field on the event to extract the date.
I ran your pipeline with your sample message and it worked as expected: with the prune filter, the message is sent to Elasticsearch, but it is stored in an index named mule- without any datetime field.
Without the prune filter, your message uses the time when Logstash received the event as the @timestamp, since you do not have a date filter to change it.
If you created the index pattern for mule-* with a datetime field like @timestamp, Kibana will not show any documents in the index that lack that datetime field.
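A sketch of the whitelist with @timestamp added, otherwise identical to the list in the question. Note that whitelist_names entries are treated as regular expressions, so an unanchored name such as host also keeps any other field whose name contains host.

```
prune {
  # @timestamp is kept so the date-based index name mule-%{+YYYY.MM.dd} can be resolved
  whitelist_names => [ "@timestamp", "App_version", "Correlation_id", "Env", "Flow_name", "host", "Instance_id", "log_level", "log_thread", "log_timestamp", "message", "patient_id", "status_code", "type", "detail" ]
}
```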
grok error
Your grok is wrong: you need to escape the square brackets surrounding your timestamp. Kibana has a Grok Debugger where you can try your patterns.
The following grok works. Move your kv filter to run after the grok, with msgbody as its source.
grok {
match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\]\s+%{LOGLEVEL:loglevel}\s+%{GREEDYDATA:msgbody}"}
overwrite => [ "msgbody" ]
}
kv {
source => "msgbody"
field_split => "{},?\[\]"
transform_key => "capitalize"
transform_value => "capitalize"
trim_key => "\s"
trim_value => "\s"
include_brackets => false
}
Run it with output only to stdout to see which fields the filters produce, then adjust your prune filter accordingly.
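Since the pipeline has no date filter, @timestamp stays at the time Logstash received the event. A possible sketch for using the log's own time instead, assuming the grok above has stored a value such as 2020-02-10 11:20:07.172 in the timestamp field:

```
date {
  # Format of the bracketed timestamp in the sample log
  match => [ "timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
  # No target is given, so the parsed value is written to @timestamp
}
```

The sample timestamp carries no zone, so the date filter interprets it in the Logstash host's time zone unless the timezone option is set. The date filter has to run after the grok, and @timestamp still has to survive the prune filter.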
duplicated message fields
If you put your kv filter after the grok, you won't have duplicated message fields. Since your kv filter capitalizes field names, you will end up with a message field containing your full log and a Message field containing your internal message; Logstash field names are case sensitive.
However, you can rename any field using the mutate filter.
mutate {
rename => { "message" => "fullLogMessage" }
}

How to avoid elasticsearch duplicate documents

How do I avoid elasticsearch duplicate documents?
The Elasticsearch index document count (20,010,253) doesn't match the log line count (13,411,790).
documentation:
File input plugin.
File rotation is detected and handled by this input,
regardless of whether the file is rotated via a rename or a copy operation.
nifi:
A real-time NiFi pipeline copies logs from the NiFi server to the ELK server.
NiFi has rolling log files.
Log line count on the ELK server:
wc -l /mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log
13,411,790 total
elasticsearch index docs count:
curl -XGET 'ip:9200/_cat/indices?v&pretty'
docs.count = 20,010,253
logstash input conf file:
cat /mnt/elk/logstash/input_conf_files/test_4.conf
input {
file {
path => "/mnt/elk/logstash/data/from/nifi/dev/logs/nifi/*.log"
type => "test_4"
sincedb_path => "/mnt/elk/logstash/scripts/sincedb/test_4"
}
}
filter {
if [type] == "test_4" {
grok {
match => {
"message" => "%{DATE:date} %{TIME:time} %{WORD:EventType} %{GREEDYDATA:EventText}"
}
}
}
}
output {
if [type] == "test_4" {
elasticsearch {
hosts => "ip:9200"
index => "test_4"
}
}
else {
stdout {
codec => rubydebug
}
}
}
You can use the fingerprint filter plugin: https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html
This can e.g. be used to create consistent document ids when inserting
events into Elasticsearch, allowing events in Logstash to cause
existing documents to be updated rather than new documents to be
created.
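A minimal sketch of that idea applied to the pipeline in the question, assuming a recent fingerprint plugin version (older versions required a key for the SHA methods) and assuming the raw message line is a good enough identity for an event:

```
filter {
  fingerprint {
    source => "message"
    # Kept under @metadata so the hash is not indexed as a field
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}
output {
  elasticsearch {
    hosts => "ip:9200"
    index => "test_4"
    # Same line read twice -> same _id -> the document is overwritten, not duplicated
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

One consequence of this choice: two genuinely distinct log lines with identical text also collapse into a single document, so the document count may end up below the line count.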

ElasticSearch 2, Logstash and Kibana : grok match can't create fields

I am trying to parse a message field to generate different fields. After some research, the solution seems to be grok with match. But in Kibana, I can't see the new fields (even after refreshing or recreating the field list from the Logstash indexes).
I tried this in the filter config:
grok {
match => {
"message" => "\[32m%{LOGLEVEL:loglevel}\[39m: memory: %{NOTSPACE:memory}, uptime \(seconds\): %{NUMBER:uptime}, load: %{NUMBER:load1},%{NUMBER:load5},%{NUMBER:load15}"
}
}
mutate {
rename => { "docker.id" => "container_id" }
rename => { "docker.name" => "container_name" }
rename => { "docker.image" => "docker_image" }
rename => { "docker.hostname" => "docker_hostname" }
}
To transform this type of message:
[32minfo[39m: memory: 76Mb, uptime (seconds): 5529.927, load: 0.05322265625,0.1298828125,0.19384765625
Into these variables:
load15 0.19384765625
uptime 5529.927
load1 0.05322265625
load5 0.1298828125
memory 76Mb
loglevel info
I tested the pattern at http://grokconstructor.appspot.com/do/match and my matches work fine. But in Kibana I can't retrieve these fields.
Thanks

Tagging the Logs by Logstash - Grok - ElasticSearch

Summary:
I am using Logstash with grok and Elasticsearch. My main aim is to first accept the logs with Logstash, parse them with grok, associate tags with the messages depending on the type of log, and finally feed them to the Elasticsearch server to query with Kibana.
I have already written this code but am not able to get the tags in Elasticsearch.
This is my Logstash config file.
input {
stdin {
type => "stdin-type"
}
}
filter {
grok {
tags => "mytags"
pattern => "I am a %{USERNAME}"
add_tag => "mytag"
named_captures_only => true
}
}
output {
stdout { debug => true debug_format => "json"}
elasticsearch {}
}
Where am I going wrong?
1) I would start by editing your values to match the data types they represent. For example,
add_tag => "mytag"
should actually have an array as its value, not a simple string. Change that to
add_tag => ["mytag"]
as a good start. Double-check all your values and verify they are of the correct type for Logstash.
2) You are limiting your grok filter to messages that are already tagged with "mytags", based on the config line
tags => "mytags"
I don't see anywhere that you have added that tag ahead of time. Therefore, none of your messages will even go through your grok filter.
3) Please read the logstash docs carefully. I am rather new to the Logstash/Grok/ES/Kibana etc. world as well, but I have had very similar problems to what you have had, and all of them were solved by paying attention to what the documentation says.
You can run Logstash by hand (you may already be doing this) with /opt/logstash/bin/logstash -f $CONFIG_FILE, and you can check that your config file is valid with /opt/logstash/bin/logstash -f $CONFIG_FILE --configtest. I bet you're already doing that, though.
You may need to put your add_tag value into an array:
grok {
...
add_tag => [ "mytag" ]
}
It could also be that what you're piping into stdin isn't matched by the grok pattern. If grok doesn't match, it should result in _grokparsefailure being added to your tags. If you see that tag, it means your grok pattern isn't matching.
A better way to do this may be...
input {
stdin {
type => 'stdin'
}
}
filter {
if [type] == 'stdin' {
mutate {
add_tag => [ "mytag" ]
}
}
}
output {
stdout {
codec => 'rubydebug'
}
}
This will add a "mytag" tag to everything coming from standard input, whether it is grokked or not.
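If the goal is instead to tag only the lines that grok actually matched, the filter from the question could be written in current syntax roughly as follows. This is a sketch, assuming a Logstash version where grok takes match rather than pattern; the capture name name is chosen here for illustration.

```
filter {
  grok {
    match => { "message" => "I am a %{USERNAME:name}" }
    # Applied only when the pattern matches
    add_tag => [ "mytag" ]
  }
}
```

Lines that do not match get the _grokparsefailure tag and no mytag, which makes the two cases easy to tell apart in the rubydebug output.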
