Remove an event field and reference it in Logstash - elasticsearch

Using Logstash, I want to index documents into Elasticsearch and specify the type, id etc of the document that needs to be indexed. How can I specify those in my config without keeping useless fields in my documents?
Example: I want to specify the id used for insertion:
input {
stdin {
codec => json {}
}
}
output {
elasticsearch { document_id => "%{[id]}" }
}
This will insert the document in Elasticsearch with the id id but the document will keep a redundant field "id" in the mapping. How can I avoid that?
I thought of adding
filter{ mutate { remove_field => "%{[id]}"} }
in the config, but the field is removed and cannot consequently be used as document_id...

Right now this isn't possible. Logstash 1.5 introduces a #metadata field whose contents aren't included in what's eventually sent to the outputs, so you'd be able to create a [#metadata][id] field and refer to that in your output,
output {
elasticsearch { document_id => "%{[#metadata][id]}" }
}
without that field polluting the message payload indexed to Elasticsearch. See the #metadata documentation.

Related

How to change key name in a JSON at logstash level?

I am using logstash to ship json data to elasticsearch. But I need to change the key name of a particular key at logstash level before shipping to ElasticSearch.
Is it possible to do so? If yes, do I need to include some plugins for logstash?
Original data:
{"keyA": "dataA", "keyB": "dataB"}
ElasticSearch data:
{"keyC": "dataA", "keyB": "dataB"}
Yes it is possible.
Use the rename configuration option on the mutate filter to rename one or more fields.
By default, it should already come included with Logstash.
Example:
filter {
mutate {
# Renames the 'keyA' field to 'KeyC'
rename => { "keyA" => "keyC" }
}
}

Can't access Elasticsearch index name metadata in Logstash filter

I want to add the elasticsearch index name as a field in the event when processing in Logstash. This is suppose to be pretty straight forward but the index name does not get printed out. Here is the complete Logstash config.
input {
elasticsearch {
hosts => "elasticsearch.example.com"
index => "*-logs"
}
}
filter {
mutate {
add_field => {
"log_source" => "%{[#metadata][_index]}"
}
}
}
output {
elasticsearch {
index => "logstash-%{+YYYY.MM}"
}
}
This will result in log_source being set to %{[#metadata][_index]} and not the actual name of the index. I have tried this with _id and without the underscores but it will always just output the reference and not the value.
Doing just %{[#metadata]} crashes Logstash with the error that it's trying to accessing the list incorrectly so [#metadata] is being set but it seems like index or any values are missing.
Does anyone have a another way of assigning the index name to the event?
I am using 5.0.1 of both Logstash and Elasticsearch.
You're almost there, you're simply missing the docinfo setting, which is false by default:
input {
elasticsearch {
hosts => "elasticsearch.example.com"
index => "*-logs"
docinfo => true
}
}

Logstash -> Elasticsearch : update document #timestamp if newer, discard if older

Using the elasticsearch output in logstash, how can i update only the #timestamp for a log message if newer?
I don't want to reindex the whole document, nor have the same log message indexed twice.
Also, if the #timestamp is older, it must not update/replace the current version.
Currently, i'm doing this:
filter {
if ("cloned" in [tags]) {
fingerprint {
add_tag => [ "lastlogin" ]
key => "lastlogin"
method => "SHA1"
}
}
}
output {
if ("cloned" in [tags]) {
elasticsearch {
action => "update"
doc_as_upsert => true
document_id => "%{fingerprint}"
index => "lastlogin-%{+YYYY.MM}"
sniffing => true
template_overwrite => true
}
}
}
It is similar to How to deduplicate documents while indexing into elasticsearch from logstash but i do not want to always update the message field; only if the #timestamp field is more recent.
You can't decide from Logstash level if a document needs to be updated or nothing should be done, this needs to be decided at Elasticsearch level. Which means that you need to experiment and test with _update API.
I suggest looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts. Meaning, if the document exists the script is executed (where you can check, if you want, the #timestamp), otherwise the content of upsert is considered as a new document.

ElasticSearch assign own IDs while indexing with LogStash

I am indexing a large corpora of information and I have a string-key that I know is unique. I would like to avoid using the search and rather access documents by this artificial identifier.
Since the Path directive is discontinued in ES 1.5, anyone know a workaround to this problem!?
My data look like:
{unique-string},val1, val2, val3...
{unique-string2},val4, val5, val6...
I am using logstash to index the files and would prefer to fetch the documents through a direct get, rather than through an exact-match.
In your elasticsearch output plugin, just specify the document_id setting with a reference to the field you want to use as id, i.e. the one named 1 in your csv filter.
input {
file {...}
}
filter {
csv{
columns=>["1","2","3"]
separator => ","
}
}
output {
elasticsearch {
action => "index"
host => "localhost"
port => "9200"
index => "index-name"
document_id => "%{1}" <--- add this line
workers => 2
cluster => "elasticsearch-cluster"
protocol => "http"
}
}

Make logstash add different inputs to different indices

I have setup logstash to use an embedded elastisearch.
I can log events.
My logstash conf looks thus:
https://gist.github.com/khebbie/42d72d212cf3727a03a0
Now I would like to add another udp input and have that input be indexed in another index.
Is that somehow possible?
I would do it to make reporting easier, so I could have system log events in one index, and business log events in another index.
Use an if conditional in your output section, based on e.g. the message type or whatever message field is significant to the choice of index.
input {
udp {
...
type => "foo"
}
file {
...
type => "bar"
}
}
output {
if [type] == "foo" {
elasticsearch {
...
index => "foo-index"
}
} else {
elasticsearch {
...
index => "bar-index"
}
}
}
Or, if the message type can go straight into the index name you can have a single output declaration:
elasticsearch {
...
index => "%{type}-index"
}

Resources