Logstash Update a document in elasticsearch - elasticsearch

Trying to update a specific field in elasticsearch through logstash. Is it possible to update only a set of fields through logstash ?
Please find the code below,
input {
file {
path => "/**/**/logstash/bin/*.log"
start_position => "beginning"
sincedb_path => "/dev/null"
type => "multi"
}
}
filter {
csv {
separator => "|"
columns => ["GEOREFID","COUNTRYNAME", "G_COUNTRY", "G_UPDATE", "G_DELETE", "D_COUNTRY", "D_UPDATE", "D_DELETE"]
}
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
query => "GEOREFID:%{GEOREFID}"
fields => [["JSON_COUNTRY","G_COUNTRY"],
["XML_COUNTRY","D_COUNTRY"]]
}
if [G_COUNTRY] {
mutate {
update => { "D_COUNTRY" => "%{D_COUNTRY}"
}
}
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
document_id => "%{GEOREFID}"
}
}
We are using the above configuration when we use this the null value field is getting removed instead of skipping null value update.
Data comes from 2 different source. One is from XML file and the other is from JSON file.
XML log format : GEO-1|CD|23|John|892|Canada|31-01-2017|QC|-|-|-|-|-
JSON log format : GEO-1|AS|33|-|-|-|-|-|Mike|123|US|31-01-2017|QC
When adding one log new document will get created in the index. When reading the second log file the existing document should get updated. The update should happen only in the first 5 fields if log file is XML and last 5 fields if the log file is JSON. Please suggest us on how to do this in logstash.
Tried with the above code. Please check and can any one help on how to fix this ?

For the Elasticsearch output to do any action other than index you need to tell it to do something else.
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
action => "update"
document_id => "%{GEOREFID}"
}
This should probably be wrapped in a conditional to ensure you're only updating records that need updating. There is another option, though, doc_as_upsert
elasticsearch {
hosts => ["localhost:9200"]
index => "logstash-data-monitor"
action => "update"
doc_as_upsert => true
document_id => "%{GEOREFID}"
}
This tells the plugin to insert if it is new, and update if it is not.
However, you're attempting to use two inputs to define a document. This makes things complicated. Also, you're not providing both inputs, so I'll improvise. To provide different output behavior, you will need to define two outputs.
input {
file {
path => "/var/log/xmlhome.log"
[other details]
}
file {
path => "/var/log/jsonhome.log"
[other details]
}
}
filter { [some stuff ] }
output {
if [path] == '/var/log/xmlhome.log' {
elasticsearch {
[XML file case]
}
} else if [path] == '/var/log/jsonhome.log' {
elasticsearch {
[JSON file case]
action => "update"
}
}
}
Setting it up like this will allow you to change the ElasticSearch behavior based on where the event originated.

Related

Logstash If configuration dropping events

I am using the ELK stack to save custom events. The events I am pushing may or may not contain a field called feed.name.
I am using this field to dynamically set the index, so if it doesn't exist I want to set it to unknown before sending it to Elastic.
Here is the full config I have:
input {
http{
host => "XXXXXXXXX"
port => xxxx
codec => "json"
}
}
filter{
if ![feed.name]{
mutate { add_field => { "feed.name"=> "unknown" }}
}
if [source.asn]{
mutate { convert => {"source.asn" => "string"}}
}
if [destination.asn]{
mutate { convert => {"destination.asn" => "string"}}
}
}
output {
elasticsearch {
hosts => ["xxxxxxxxx:XXXX"]
index => "l-%{feed.name}-%{+YYYY.MM.dd}"
}
}
Here is my problem: When there is no feed.name set, Logstash sets it correctly and everything is fine. However, if the field exists, the event seems to be dropped.
So 2 questions arise here: How is this behaviour explained? And also, how can I make it work (or are there any workarounds)?

Can't access Elasticsearch index name metadata in Logstash filter

I want to add the elasticsearch index name as a field in the event when processing in Logstash. This is suppose to be pretty straight forward but the index name does not get printed out. Here is the complete Logstash config.
input {
elasticsearch {
hosts => "elasticsearch.example.com"
index => "*-logs"
}
}
filter {
mutate {
add_field => {
"log_source" => "%{[#metadata][_index]}"
}
}
}
output {
elasticsearch {
index => "logstash-%{+YYYY.MM}"
}
}
This will result in log_source being set to %{[#metadata][_index]} and not the actual name of the index. I have tried this with _id and without the underscores but it will always just output the reference and not the value.
Doing just %{[#metadata]} crashes Logstash with the error that it's trying to accessing the list incorrectly so [#metadata] is being set but it seems like index or any values are missing.
Does anyone have a another way of assigning the index name to the event?
I am using 5.0.1 of both Logstash and Elasticsearch.
You're almost there, you're simply missing the docinfo setting, which is false by default:
input {
elasticsearch {
hosts => "elasticsearch.example.com"
index => "*-logs"
docinfo => true
}
}

How to split a large json file input into different elastic search index?

The input to logstash is
input {
file {
path => "/tmp/very-large.json"
type => "json"
start_position => "beginning"
sincedb_path => "/dev/null"
}
and sample json file
{"type":"type1", "msg":"..."}
{"type":"type2", "msg":"..."}
{"type":"type1", "msg":"..."}
{"type":"type3", "msg":"..."}
Is it possible to make them feed into different elastic search index, so I can process them easier in the future?
I know if it is possible to assign them with a tag, then I can do something like
if "type1" in [tags] {
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "logstash-type1%{+YYYY.MM.dd}"
flush_size => 50
}
}
How to do similar thing by looking at a specific json field value, e.g. type in my above example?
Even simpler, just use the type field to build the index name like this:
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "logstash-%{type}%{+YYYY.MM.dd}"
flush_size => 50
}
You can compare on any fields. You'll have to first parse your json with the json filter or codec.
Then you'll have a type field to work on, like this:
if [type] == "type1" {
elasticsearch {
...
index => "logstash-type1%{+YYYY.MM.dd}"
}
} else if [type] == "type2" {
elasticsearch {
...
index => "logstash-type2%{+YYYY.MM.dd}"
}
} ...
Or like in Val's answer:
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "logstash-%{type}%{+YYYY.MM.dd}"
flush_size => 50
}

Drop filter not working logstash

I have multiple log messages in a file which I am processing using logstash filter plugins. Then, the filtered logs are getting sent to elasticsearch.
There is one field called addID in a log message. I want to drop all the log messages which have a particular addID present. These particular addIDS are present in a ID.yml file.
Scenario: If the addID of a log message matches with any of the addIDs present in the ID.yml file, that log message should be dropped.
Could anyone help me in achieving this?
Below is my config file.
input {
file {
path => "/Users/jshaw/logs/access_logs.logs
ignore_older => 0
}
}
filter {
grok {
patterns_dir => ["/Users/jshaw/patterns"]
match => ["message", "%{TIMESTAMP:Timestamp}+{IP:ClientIP}+{URI:Uri}"]
}
kv{
field_split => "&?"
include_keys => [ "addID" ]
allow_duplicate_values => "false"
}
if [addID] in "/Users/jshaw/addID.yml" {
drop{}
}
}
output {
elasticsearch{
hosts => ["localhost:9200"]
}
}
You are using the in operator wrong. It is used to check if a value is in an array, not in a file, which are usually a bit more complicated to use.
A solution would be to use the ruby filter to open the file each time.
Or put the addId value in your configuration file, like this:
if [addID] == "addID" {
drop{}
}

Data type conversion using logstash grok

Basic is a float field. The mentioned index is not present in elasticsearch. When running the config file with logstash -f, I am getting no exception. Yet, the data reflected and entered in elasticsearch shows the mapping of Basic as string. How do I rectify this? And how do I do this for multiple fields?
input {
file {
path => "/home/sagnik/work/logstash-1.4.2/bin/promosms_dec15.csv"
type => "promosms_dec15"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
grok{
match => [
"Basic", " %{NUMBER:Basic:float}"
]
}
csv {
columns => ["Generation_Date","Basic"]
separator => ","
}
ruby {
code => "event['Generation_Date'] = Date.parse(event['Generation_Date']);"
}
}
output {
elasticsearch {
action => "index"
host => "localhost"
index => "promosms-%{+dd.MM.YYYY}"
workers => 1
}
}
You have two problems. First, your grok filter is listed prior to the csv filter and because filters are applied in order there won't be a "Basic" field to convert when the grok filter is applied.
Secondly, unless you explicitly allow it, grok won't overwrite existing fields. In other words,
grok{
match => [
"Basic", " %{NUMBER:Basic:float}"
]
}
will always be a no-op. Either specify overwrite => ["Basic"] or, preferably, use mutate's type conversion feature:
mutate {
convert => ["Basic", "float"]
}

Resources