How to find documents which are causing mapping conflict in Elasticsearch - elasticsearch

I have following Logstash filter:
...
if [type] == "binarysize" {
if ([file] =~ /svn/) {
drop {} }
grok {
match => {
"file" => "\A/home/data/binaries_size_stats/%{WORD:branch}/%{WORD:binary}/%{WORD:architecture}/%{WORD}"
}
}
grok {
match => {
"message" => "\A%{INT:commit_number},%{INT:binary_size},%{TIMESTAMP_ISO8601:date_and_time_of_commit}"
}
}
date {
match => ["date_and_time_of_commit", "ISO8601"]
#timezone => "CET"
}
}
...
I started it, pushed some data into Elasticsearch, did some plots in Kibana and left everything working nicely for the weekend.
When I returned, my plots are not updated with new data and I receive constant "14 Courier Fetch: 15 of 2465 shards failed." message no matter how much I reload page in browser.
After reloading field list I found that I have one conflict in "binary_size" field. I was plotting data based on this field, so my guess is that something weird happened over weekend with new documents pushed to Elasticsearch by Logstash.
My question is: "How can I find those documents with conflicting fields?". Or what should I do alternatively to be able to plot fresh data again.

Related

After adding Prune filter along with KV filter - logs are not going to Elastic search

I am learning ELK and trying to do as a POC for my project. I am applying KV filter for the sample integration logs from my project and i could see lot of extra fields are coming as a result so i have tried to apply prune filter and white-listed certain fields. I can see the logs getting printed in the logstash server but logs are not going to elastic search. If i remove the filter it is going to the elastic search. Please advise how to further debug on this issue.
filter {
kv {
field_split => "{},?\[\]"
transform_key => "capitalize"
transform_value => "capitalize"
trim_key => "\s"
trim_value => "\s"
include_brackets => false
}
prune
{
whitelist_names => [ "App_version", "Correlation_id", "Env", "Flow_name", "host", "Instance_id", "log_level","log_thread", "log_timestamp", "message", "patient_id", "status_code", "type", "detail"]
}
}
output {
elasticsearch {
hosts => ["http://localhost:9200"]
index => "mule-%{+YYYY.MM.dd}"
#user => "elastic"
#password => "changeme"
}
stdout { codec => rubydebug }
}
I also need two more suggestion,
I am also trying to use the grok filter in the initial logs and trying to take log level fields(time and log type) from the sample log and send the remaining logs to the KV filter. Is there any reference please share for it. This is what i have tried for it. but getting as _grokparsefailure. I have passed the msgbody to the kv filter with the source option.
grok {
match => { "message" => "%{TIMESTAMP_ISO8601:timestamp}\s+%{LOGLEVEL:loglevel}\s+%{GREEDYDATA:msgbody}"}
overwrite => [ "msgbody" ]
}
I am having message fields inside sample logs as like below. When the data goes to Kibana i can see two message field tag one is with full log and other is with correct message(highlighted). Will the mutate works for this case? Is there any way we can change the full log name as something else ??
[2020-02-10 11:20:07.172] INFO Mule.api [[MuleRuntime].cpuLight.04:
[main-api-test].api-main.CPU_LITE #256c5cf5:
[main-api-test].main-api-main/processors/0/processors/0.CPU_LITE
#378f34b0]: event:00000003 {app_name=main-api-main, app_version=v1,
env=Test, timestamp=2020-02-10T11:20:07.172Z,
log={correlation_id=00000003, patient_id=12345678,
instance_id=hospital, message=Start of System API,
flow_name=main-api-main}}
prune filter error
Your prune filter does not have the #timestamp field in the whitelist_names list, your output is date based (%{+YYYY.MM.dd}), logstash needs the #timestamp field in the output to extract the date.
I've ran your pipeline with your sample message and it worked as expected, with the prune filter the message is sent to elasticsearch, but it is stored in an index named mule- without any datetime field.
Without the prune filter your message uses the time when logstash received the event as the #timestamp, since you do not have any date filter to change it.
If you created the index pattern for the index mule-* with a datetime field like #timestamp, you won't see on Kibana any documents on the index that doesn't have the same datetime field.
grok error
Your grok is wrong, you need to escape the square brackets surrounding your timestamp. Kibana has a grok debugger where you can try your patterns.
The following grok works, move your kv to run after the grok and with the msgbody as source.
grok {
match => { "message" => "\[%{TIMESTAMP_ISO8601:timestamp}\]\s+%{LOGLEVEL:loglevel}\s+%{GREEDYDATA:msgbody}"}
overwrite => [ "msgbody" ]
}
kv {
source => "msgbody"
field_split => "{},?\[\]"
transform_key => "capitalize"
transform_value => "capitalize"
trim_key => "\s"
trim_value => "\s"
include_brackets => false
}
Just run it with output only to stdout to see the filters you need to change your prune filter.
duplicated message fields
If you put your kv filter after the grok you wouldn't have duplicated message fields since your kv is capitalizing your fields, you will end with a message field containing your full log, and a Message field containing your internal message, logstash fields are case sensitive.
However you can rename any field using the mutate filter.
mutate {
rename => ["message", "fullLogMessage"]
}

Parse integer and show in timeline in Kibana

Here is the Kibana UI and I want to parse some Integer in the message. The number in the end of message is the process time for one method and I what to visualize the average process time by hour in Kibana. Is that possible?
I tried some conf in logstash:
filter{
json{
source => "message"
}
grok {
match => {
"message" => "^Finish validate %{NUMBER:cto_validate_time}$"
}
}
grok {
match => {
"message" => "^Finish customize %{NUMBER:cto_customize_time}$"
}
}
}
It works. But when I create the timechart I can not get the new field.
Since you don't care about performance issues, you may create a scripted field named process_time in your index pattern with the following painless code. What it does is simply take the last numerical value from your message field.
def m = /.*\s(\d+)$/.matcher(doc['message.keyword'].value);
if ( m.matches() ) {
return m.group(1)
} else {
return 0
}
Then you can build a chart to show the average process time by hour. Go to the Visualize tab and create a new vertical bar chart. On the Y-Axis you'll create an Average aggregation on the process_time field and on the X-Axis you'll use a Date histogram aggregation on your timestamp field. A sample is shown below:
Note: You also need to add the following line in your elasticsearch.yml file and restart ES:
script.painless.regex.enabled: true
UPDATE
If you want to do it via Logstash you can add the following grok filter
filter{
grok {
match => {
"message" => "^Finish customize in controller %{NUMBER:cto_customize_time}$"
}
}
mutate {
convert => { "cto_customize_time" => "integer" }
}
}

Can't access Elasticsearch index name metadata in Logstash filter

I want to add the elasticsearch index name as a field in the event when processing in Logstash. This is suppose to be pretty straight forward but the index name does not get printed out. Here is the complete Logstash config.
input {
elasticsearch {
hosts => "elasticsearch.example.com"
index => "*-logs"
}
}
filter {
mutate {
add_field => {
"log_source" => "%{[#metadata][_index]}"
}
}
}
output {
elasticsearch {
index => "logstash-%{+YYYY.MM}"
}
}
This will result in log_source being set to %{[#metadata][_index]} and not the actual name of the index. I have tried this with _id and without the underscores but it will always just output the reference and not the value.
Doing just %{[#metadata]} crashes Logstash with the error that it's trying to accessing the list incorrectly so [#metadata] is being set but it seems like index or any values are missing.
Does anyone have a another way of assigning the index name to the event?
I am using 5.0.1 of both Logstash and Elasticsearch.
You're almost there, you're simply missing the docinfo setting, which is false by default:
input {
elasticsearch {
hosts => "elasticsearch.example.com"
index => "*-logs"
docinfo => true
}
}

Logstash -> Elasticsearch : update document #timestamp if newer, discard if older

Using the elasticsearch output in logstash, how can i update only the #timestamp for a log message if newer?
I don't want to reindex the whole document, nor have the same log message indexed twice.
Also, if the #timestamp is older, it must not update/replace the current version.
Currently, i'm doing this:
filter {
if ("cloned" in [tags]) {
fingerprint {
add_tag => [ "lastlogin" ]
key => "lastlogin"
method => "SHA1"
}
}
}
output {
if ("cloned" in [tags]) {
elasticsearch {
action => "update"
doc_as_upsert => true
document_id => "%{fingerprint}"
index => "lastlogin-%{+YYYY.MM}"
sniffing => true
template_overwrite => true
}
}
}
It is similar to How to deduplicate documents while indexing into elasticsearch from logstash but i do not want to always update the message field; only if the #timestamp field is more recent.
You can't decide from Logstash level if a document needs to be updated or nothing should be done, this needs to be decided at Elasticsearch level. Which means that you need to experiment and test with _update API.
I suggest looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts. Meaning, if the document exists the script is executed (where you can check, if you want, the #timestamp), otherwise the content of upsert is considered as a new document.

Logstash: error for querying elasticsearch

Hello everyone,
Through logstash, I want to query elasticsearch in order to get fields from previous events and do some computation with fields of my current event and add new fields. Here is what I did:
input file:
{"device":"device1","count":5}
{"device":"device2","count":11}
{"device":"device1","count":8}
{"device":"device3","count":100}
{"device":"device3","count":95}
{"device":"device3","count":155}
{"device":"device2","count":15}
{"device":"device1","count":55}
My expected output:
{"device":"device1","count":5,"previousCount=0","delta":0}
{"device":"device2","count":11,"previousCount=0","delta":0}
{"device":"device1","count":8,"previousCount=5","delta":3}
{"device":"device3","count":100,"previousCount=0","delta":0}
{"device":"device3","count":95,"previousCount=100","delta":-5}
{"device":"device3","count":155,"previousCount=95","delta":60}
{"device":"device2","count":15,"previousCount=11","delta":4}
{"device":"device1","count":55,"previousCount=8","delta":47}
Logstash filter part:
filter {
elasticsearch {
hosts => ["localhost:9200/device"]
query => 'device:"%{[device]}"'
sort => "#timestamp:desc"
fields => ['count','previousCount']
}
if [previousCount]{
ruby {
code => "event[delta] = event[count] - event[previousCount]"
}
}
else{
mutate {
add_field => { "previousCount" => "0" }
add_field => { "delta" => "0" }
}
}
}
My problem:
For every line of my input file I got the following error : Failed to query elasticsearch for previous event ..
It seems that every line completely treated is not put in elasticsearch before logstash starts to treat the next line.
I don't know if my conclusion is correct and, if yes, why it happens.
So, do you know how I could solve this problem please ?!
Thank you for your attention and your help.
S

Resources