add custom mapping for elasticsearch in logstash - elasticsearch

I am using Logstash to send my logs to Elasticsearch. Every day, it creates a new index.
Here is the output part of my Logstash config file:
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["127.0.0.1"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
I want some fields to not be analyzed, but every day when a new index is created, a new mapping is created and all the fields are analyzed. How can I force Elasticsearch to use a particular mapping every time a new index is created?

You can do this by assigning a template and letting Logstash manage it. For example, my configuration:
elasticsearch {
  hosts => ["localhost:9200"]
  index => "XXX-%{+YYYY.ww}"
  template => "/opt/logstash/templates/XXX.json"
  template_name => "XXX"
  manage_template => true
}
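The file referenced by template => is an ordinary Elasticsearch index template. As a rough sketch only (the field names below are illustrative, using the pre-5.x not_analyzed setting to keep string fields from being analyzed, which is what the question asks for):
{
  "template": "XXX-*",
  "mappings": {
    "_default_": {
      "properties": {
        "hostname":     { "type": "string", "index": "not_analyzed" },
        "request_path": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}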
My configuration may be slightly out of date, as we are sadly on an older version of Logstash, so it would be helpful to read up on this in the docs: https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html
This is definitely possible inside Logstash though.
Artur

You can use an ES index template, which will then be used when creating an index: https://www.elastic.co/guide/en/elasticsearch/reference/2.4/indices-templates.html
In your case the template would look like this:
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      ...
    }
  }
}
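If you would rather register the template yourself instead of letting Logstash manage it, here is a sketch of the API call (the template name logstash_custom and the status field are only illustrative):
curl -XPUT 'localhost:9200/_template/logstash_custom' -H 'Content-Type: application/json' -d '
{
  "template": "logstash-*",
  "mappings": {
    "_default_": {
      "properties": {
        "status": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
Any logstash-* index created after that will pick up the mapping automatically.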

Related

Logstash Elasticsearch plugin. Compare results from two sources

I have two deployed Elasticsearch clusters. The data should supposedly be the same in both clusters. My main aim is to compare the _source field of each Elasticsearch document between the source and target ES clusters.
I created a Logstash config in which I define an Elasticsearch input plugin that runs over each document in the source cluster. Next, using the elasticsearch filter, I want to look up the target Elasticsearch cluster, query it for the document by the _id taken from the source cluster, and match the results of the _source field for both documents.
Could you please help me implement such a config?
input {
  elasticsearch {
    hosts => ["source_cluster:9200"]
    ssl => true
    user => "user"
    password => "password"
    index => "my_index_pattern"
  }
}
filter {
  mutate {
    remove_field => ["@version", "@timestamp"]
  }
  elasticsearch {
    hosts => ["target_cluster:9200"]
    ssl => true
    user => "user"
    password => "password"
    query => ???????
    match _source field ????
  }
}
output {
  stdout { codec => rubydebug }
}
Maybe print some results of the comparison...
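One possible shape for the missing pieces, as a hedged sketch only: it assumes docinfo => true on the input so the source document's _id is available under @metadata, and it compares a couple of illustrative fields (some_field, target_some_field) copied from the target document rather than the whole _source.
input {
  elasticsearch {
    hosts => ["source_cluster:9200"]
    index => "my_index_pattern"
    docinfo => true                                      # exposes [@metadata][_id]
  }
}
filter {
  elasticsearch {
    hosts => ["target_cluster:9200"]
    index => "my_index_pattern"
    query => "_id:%{[@metadata][_id]}"                   # look up the same document in the target cluster
    fields => { "some_field" => "target_some_field" }    # copy an illustrative field from the target doc
  }
  if [some_field] != [target_some_field] {
    mutate { add_tag => ["mismatch"] }                   # tag events whose values differ
  }
}
output {
  stdout { codec => rubydebug }
}
Comparing the entire _source in one go would need something like a ruby filter querying the target cluster directly; the sketch above only checks the fields you explicitly copy over.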

Can't access Elasticsearch index name metadata in Logstash filter

I want to add the Elasticsearch index name as a field in the event when processing in Logstash. This is supposed to be pretty straightforward, but the index name does not get printed out. Here is the complete Logstash config.
input {
  elasticsearch {
    hosts => "elasticsearch.example.com"
    index => "*-logs"
  }
}
filter {
  mutate {
    add_field => {
      "log_source" => "%{[@metadata][_index]}"
    }
  }
}
output {
  elasticsearch {
    index => "logstash-%{+YYYY.MM}"
  }
}
This will result in log_source being set to %{[@metadata][_index]} and not the actual name of the index. I have tried this with _id and without the underscores, but it will always just output the reference and not the value.
Doing just %{[@metadata]} crashes Logstash with an error that it is trying to access the list incorrectly, so [@metadata] is being set, but it seems like the index or any other values are missing.
Does anyone have another way of assigning the index name to the event?
I am using 5.0.1 of both Logstash and Elasticsearch.
You're almost there; you're simply missing the docinfo setting, which is false by default:
input {
  elasticsearch {
    hosts => "elasticsearch.example.com"
    index => "*-logs"
    docinfo => true
  }
}
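With docinfo enabled, the input stores the document metadata under @metadata (by default _index, _type and _id), so the mutate filter from the question works as written. A small sketch of what becomes available (the doc_type and doc_id fields are just extra illustrations):
filter {
  mutate {
    add_field => {
      "log_source" => "%{[@metadata][_index]}"   # the index the document came from
      "doc_type"   => "%{[@metadata][_type]}"    # its mapping type
      "doc_id"     => "%{[@metadata][_id]}"      # its document id
    }
  }
}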

Logstash -> Elasticsearch: update document @timestamp if newer, discard if older

Using the elasticsearch output in Logstash, how can I update only the @timestamp for a log message if it is newer?
I don't want to reindex the whole document, nor have the same log message indexed twice.
Also, if the @timestamp is older, it must not update/replace the current version.
Currently, I'm doing this:
filter {
  if ("cloned" in [tags]) {
    fingerprint {
      add_tag => [ "lastlogin" ]
      key => "lastlogin"
      method => "SHA1"
    }
  }
}
output {
  if ("cloned" in [tags]) {
    elasticsearch {
      action => "update"
      doc_as_upsert => true
      document_id => "%{fingerprint}"
      index => "lastlogin-%{+YYYY.MM}"
      sniffing => true
      template_overwrite => true
    }
  }
}
It is similar to How to deduplicate documents while indexing into elasticsearch from logstash, but I do not want to always update the message field; only if the @timestamp field is more recent.
You can't decide at the Logstash level whether a document needs to be updated or nothing should be done; this has to be decided at the Elasticsearch level, which means you need to experiment and test with the _update API.
I suggest looking at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#upserts. Meaning, if the document exists the script is executed (where you can check, if you want, the @timestamp); otherwise the content of upsert is indexed as a new document.
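As a rough sketch of that idea (index name, document id and timestamp values are illustrative; the lexicographic compareTo works because both values are ISO 8601 strings in UTC; older Elasticsearch releases use the index/type/id/_update URL form and "inline" instead of "source"):
POST lastlogin-2016.05/_update/<fingerprint>
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source['@timestamp'] == null || params.ts.compareTo(ctx._source['@timestamp']) > 0) { ctx._source['@timestamp'] = params.ts } else { ctx.op = 'none' }",
    "params": { "ts": "2016-05-01T12:00:00.000Z" }
  },
  "upsert": {
    "@timestamp": "2016-05-01T12:00:00.000Z",
    "message": "user logged in"
  }
}
The elasticsearch output plugin does expose script, script_type, script_lang and scripted_upsert settings, so a script along these lines can in principle be sent from Logstash, but the exact wiring depends on your versions and is worth testing first.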

Logstash Elasticsearch compression

I have a working ELK stack and would like to enable index compression.
The official store compression documentation tells me that I need to do it at index creation.
I couldn't find anything related to store compression or even index settings in the related Logstash output documentation.
Below is my Logstash output configuration:
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
And the created index settings:
{
  "filebeat-2016.04.28": {
    "settings": {
      "index": {
        "creation_date": "1461915752875",
        "uuid": "co8bvXI7RFKFwB7oJqs8cA",
        "number_of_replicas": "1",
        "number_of_shards": "5",
        "version": {
          "created": "2030199"
        }
      }
    }
  }
}
You need to provide your own index template file in order to enable index compression.
So you need to create a filebeat-template.json file like the one below. This file will be used by Logstash when creating a new filebeat index.
{
  "template" : "filebeat-*",
  "settings" : {
    "index.codec" : "best_compression"
  }
}
Then your elasticsearch output should be modified like this:
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    sniffing => true
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    template_name => "filebeat-template"
    template => "/path/to/filebeat-template.json"
  }
}
Then you can delete your existing filebeat-2016.04.28 index and relaunch Logstash. The latter will create an index template called /_template/filebeat-template which will kick in every time ES needs to create a new index whose name starts with filebeat-, and it will apply the settings (among which the store compression one) present in the template.
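A quick way to verify, assuming the names used above: check that the template was registered and that a newly created daily index (the date below is just an example) picked up the codec:
curl -XGET 'localhost:9200/_template/filebeat-template?pretty'
curl -XGET 'localhost:9200/filebeat-2016.04.29/_settings?pretty'
The second response should show "codec": "best_compression" under the index settings.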

How to move data from one Elasticsearch index to another using the Bulk API

I am new to Elasticsearch. How can I move data from one Elasticsearch index to another using the Bulk API?
I'd suggest using Logstash for this, i.e. you use one elasticsearch input plugin to retrieve the data from your index and another elasticsearch output plugin to push the data to your other index.
The Logstash config file would look like this:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "source_index"              # the name of your source index
  }
}
filter {
  mutate {
    remove_field => [ "@version", "@timestamp" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    manage_template => false
    index => "target_index"              # the name of your target index
    document_type => "your_doc_type"     # make sure to set the appropriate type
    document_id => "%{id}"
    workers => 5
  }
}
After installing Logstash, you can run it like this:
bin/logstash -f logstash.conf
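To sanity-check the copy afterwards (the index names are the illustrative ones from the config above), you can compare document counts on both sides:
curl -XGET 'localhost:9200/source_index/_count?pretty'
curl -XGET 'localhost:9200/target_index/_count?pretty'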
