Logstash: optionally set Elasticsearch output options based on fields

Suppose I have some documents with a join field, so they may contain a parent field (e.g. [join_field][parent]). Per the docs, I need to pass that to _routing in the Logstash elasticsearch output:
...
output {
  elasticsearch {
    ...
    routing => "%{[join_field][parent]}"
  }
}
Now if there is no join_field in the doc, the above will set its routing to the literal string %{[join_field][parent]} in ES.
Is there any way I can make it optional, so the ES output sets routing only if [join_field][parent] is present?
Or is the only option an if/else conditional on the field, with a separate output for each case (it feels odd to need multiple ifs for many options)? Also, can this have any performance impact?
...
output {
  if [join_field][parent] {
    elasticsearch {
      ...
      routing => "%{[join_field][parent]}"
    }
  } else {
    elasticsearch {
      ...
    }
  }
}
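One way to avoid duplicating the output (a sketch, not from the original thread) is to compute the routing value into @metadata in the filter block and fall back to a default when no parent exists, so a single elasticsearch output suffices. For join fields, documents without a parent are routed by their own _id by default, so routing such a document by its own id field preserves that behavior; the [id] field here is an assumption:

filter {
  if [join_field][parent] {
    # route children by their parent's id
    mutate { add_field => { "[@metadata][routing]" => "%{[join_field][parent]}" } }
  } else {
    # assumed: each event carries an [id] field used as document_id
    mutate { add_field => { "[@metadata][routing]" => "%{[id]}" } }
  }
}
output {
  elasticsearch {
    ...
    routing => "%{[@metadata][routing]}"
  }
}

Since @metadata is never sent to the output, no extra field ends up in the indexed document.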

Related

How to bypass filter if a tag is included?

I have a logstash instance processing incoming requests on port 5044.
I have then filebeat and metricbeat sending data.
Problem is that currently logstash is configured to accept only filebeat files due to a filter:
input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    match => {...
  }
}
output { ...
and metricbeat data is, of course, discarded.
Would it be possible to include metricbeat data by defining an exception to that filter? My idea would be to add a tag to metricbeat so it can be recognized... is that possible?
You can filter by the beat type using the field [@metadata][beat] in a conditional.
It would be something like this:
filter {
  if [@metadata][beat] == "filebeat" {
    # filters for filebeat
  }
  if [@metadata][beat] == "metricbeat" {
    # filters for metricbeat
  }
}
You can also use the same conditional in the output block if you want to store the data from each beat in a different index.
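For instance, the output block could route each beat to its own daily index (a sketch; the hosts value and index names are assumptions):

output {
  if [@metadata][beat] == "filebeat" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "filebeat-%{+YYYY.MM.dd}"
    }
  } else if [@metadata][beat] == "metricbeat" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "metricbeat-%{+YYYY.MM.dd}"
    }
  }
}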

Can't access Elasticsearch index name metadata in Logstash filter

I want to add the Elasticsearch index name as a field in the event when processing in Logstash. This is supposed to be pretty straightforward, but the index name does not get printed out. Here is the complete Logstash config:
input {
  elasticsearch {
    hosts => "elasticsearch.example.com"
    index => "*-logs"
  }
}
filter {
  mutate {
    add_field => {
      "log_source" => "%{[@metadata][_index]}"
    }
  }
}
output {
  elasticsearch {
    index => "logstash-%{+YYYY.MM}"
  }
}
This results in log_source being set to %{[@metadata][_index]} and not the actual name of the index. I have tried this with _id and without the underscores, but it always just outputs the reference and not the value.
Doing just %{[@metadata]} crashes Logstash with an error that it's trying to access the list incorrectly, so [@metadata] is being set, but the index (and apparently every other value) is missing.
Does anyone have another way of assigning the index name to the event?
I am using 5.0.1 of both Logstash and Elasticsearch.
You're almost there; you're simply missing the docinfo setting, which is false by default:
input {
  elasticsearch {
    hosts => "elasticsearch.example.com"
    index => "*-logs"
    docinfo => true
  }
}
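With docinfo enabled, the input plugin stores the document's _index, _type, and _id under [@metadata], so the mutate filter from the question then resolves as expected, for example:

filter {
  mutate {
    # [@metadata][_index] is populated by docinfo => true
    add_field => { "log_source" => "%{[@metadata][_index]}" }
  }
}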

Converting nginx access log bytes to number in Kibana4

I would like to create a visualization of the sum of bytes sent using the data from my nginx access logs. When trying to create a "Metric" visualization, I can't use the bytes field as a sum because it is a string type.
And I'm not able to change it under settings.
How do I go about changing this field type to a number/bytes type?
Here is my logstash config for nginx access logs
filter {
  if [type] == "nginx-access" {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
    geoip {
      source => "clientip"
    }
    useragent {
      source => "agent"
      target => "useragent"
    }
  }
}
Since each Logstash index is created daily, I'm guessing I need to change it here.
I tried adding
mutate {
  convert => { "bytes" => "integer" }
}
But it doesn't seem to make a difference.
Field types are configured using mappings, which are defined at the index level and cannot easily be changed. Since Logstash creates a new index every day, if you want to change these mappings you can either wait for the next day or, if that's an option, delete the current index.
By default these mappings are generated automatically by Elasticsearch depending on the syntax of the indexed JSON document and the applied Index Templates:
# Type String
{"bytes":"123"}
# Type Integer
{"bytes":123}
In the end there are two solutions:
Tune Logstash to make it generate an integer and let Elasticsearch guess the field type → use the mutate/convert filter.
Tune Elasticsearch to force the field bytes for the document type nginx-access to be of type integer → use an Index Template:
Index Template API:
PUT _template/logstash-nginx-access
{
  "order": 1,
  "template": "logstash-*",
  "mappings": {
    "nginx-access": {
      "properties": {
        "bytes": {
          "type": "integer"
        }
      }
    }
  }
}
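For the first solution, the convert would go inside the existing nginx-access conditional, after the grok that extracts the field (a sketch based on the config above):

filter {
  if [type] == "nginx-access" {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
    # convert only after grok has extracted "bytes" from the message
    mutate {
      convert => { "bytes" => "integer" }
    }
  }
}

Either way, remember the existing mapping stays string-typed; the change only takes effect once a new index is created.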

Remove an event field and reference it in Logstash

Using Logstash, I want to index documents into Elasticsearch and specify the type, id etc of the document that needs to be indexed. How can I specify those in my config without keeping useless fields in my documents?
Example: I want to specify the id used for insertion:
input {
  stdin {
    codec => json {}
  }
}
output {
  elasticsearch { document_id => "%{[id]}" }
}
This will insert the document into Elasticsearch with the given id, but the document keeps a redundant "id" field in the mapping. How can I avoid that?
I thought of adding
filter { mutate { remove_field => "%{[id]}" } }
in the config, but the field is removed and cannot consequently be used as document_id...
Right now this isn't possible. Logstash 1.5 introduces a @metadata field whose contents aren't included in what's eventually sent to the outputs, so you'd be able to create a [@metadata][id] field and refer to that in your output,
output {
  elasticsearch { document_id => "%{[@metadata][id]}" }
}
without that field polluting the message payload indexed to Elasticsearch. See the @metadata documentation.
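On Logstash 1.5+, the whole pipeline could then look like this sketch (field names taken from the question): the rename moves id out of the event body into @metadata, so it is usable as document_id but never reaches the index:

input {
  stdin {
    codec => json {}
  }
}
filter {
  mutate {
    # move "id" out of the event payload into @metadata
    rename => { "id" => "[@metadata][id]" }
  }
}
output {
  elasticsearch { document_id => "%{[@metadata][id]}" }
}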

Make logstash add different inputs to different indices

I have set up Logstash to use an embedded Elasticsearch.
I can log events.
My Logstash conf looks like this:
https://gist.github.com/khebbie/42d72d212cf3727a03a0
Now I would like to add another udp input and have that input be indexed in another index.
Is that somehow possible?
I would do it to make reporting easier, so I could have system log events in one index, and business log events in another index.
Use an if conditional in your output section, based on e.g. the message type or whatever message field is significant to the choice of index.
input {
  udp {
    ...
    type => "foo"
  }
  file {
    ...
    type => "bar"
  }
}
output {
  if [type] == "foo" {
    elasticsearch {
      ...
      index => "foo-index"
    }
  } else {
    elasticsearch {
      ...
      index => "bar-index"
    }
  }
}
Or, if the message type can go straight into the index name you can have a single output declaration:
elasticsearch {
  ...
  index => "%{type}-index"
}
