Dynamic Index in ElasticSearch from Logstash - elasticsearch

I have the following configuration in Logstash, which creates a dynamic "document_type" in ES based on the input JSON received:
elasticsearch {
  hosts => ["localhost:9200"]
  index => "queuelogs"
  document_type => "%{action}"
}
Here, "action" is the parameter that I receive in JSON and different document_type gets created as per different action received.
Now I want this to be done same for Index creation, such as following:
elasticsearch {
  hosts => ["localhost:9200"]
  index => "%{logtype}"
  document_type => "%{action}"
}
Here, "logtype" is the parameter that I receive in JSON.
But somehow in ES, it creates index as "%{logtype}" only, not as per actual logtype value .
The input JSON is as following:
{
  "action": "UPLOAD",
  "user": "123",
  "timestamp": "2016 Jun 14 12:00:12",
  "data": {
    "file_id": "2345",
    "file_name": "xyz.pdf"
  },
  "header": {
    "proj_id": "P123",
    "logtype": "httplogs"
  },
  "comments": "Check comments"
}
I tried to generate the index in the following ways:
index => "%{logtype}"
index => "%{header.logtype}"
But in both cases, Logstash does not substitute the actual value of logtype from the JSON.

Since logtype is nested inside the header object, you need to reference it with the full field reference syntax, like this:
elasticsearch {
  hosts => ["localhost:9200"]
  index => "%{[header][logtype]}"
  document_type => "%{action}"
}
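Note that if some events do not contain header.logtype, the index name would again come out as the literal sprintf string. A minimal guard (a sketch only; the fallback name default-logs is just an illustration) could add the field when it is missing:
filter {
  # Hypothetical fallback: only set header.logtype if it is absent from the event
  if ![header][logtype] {
    mutate {
      add_field => { "[header][logtype]" => "default-logs" }
    }
  }
}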

Related

How to use an ingest pipeline with the Logstash elasticsearch output update feature

I am using Logstash Elasticsearch output to publish data to Elasticsearch. Two records are merged to create a single record from a request and a response. This code is working with no issues.
elasticsearch {
  hosts => [ "localhost:9200" ]
  index => "transactions"
  action => "update"
  doc_as_upsert => true
  document_id => "%{tid}"
  script => '
    if (ctx._source.transaction == "request") {
      ctx._source.status = params.event.get("status");
    } else if (ctx._source.transaction == "response") {
      ctx._source.api = params.event.get("api");
    }
  '
}
Now I am trying to add a new field, alongside the above record update, using ingest pipelines.
PUT _ingest/pipeline/ingest_pipe2
{
  "description" : "describe pipeline",
  "processors" : [
    {
      "set" : {
        "field": "api-test",
        "value": "new"
      }
    }
  ]
}
This will add a new field to the incoming event. It works fine with the following configuration.
elasticsearch {
  hosts => [ "localhost:9200" ]
  index => "transactions"
  pipeline => "ingest_pipe2"
}
The problem is that the Logstash update and the ingest pipeline update don't work together.
elasticsearch {
  hosts => [ "localhost:9200" ]
  index => "transactions"
  pipeline => "ingest_pipe2"
  action => "update"
  doc_as_upsert => true
  document_id => "%{tid}"
  script => '
    if (ctx._source.transaction == "request") {
      ctx._source.status = params.event.get("status");
    } else if (ctx._source.transaction == "response") {
      ctx._source.api = params.event.get("api");
    }
  '
}
Using an ingest pipeline together with doc_as_upsert is not supported by the Elasticsearch output plugin.
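As a workaround (a sketch only, assuming the goal is simply to add a static field to each event), the same enrichment can be done in Logstash itself with a mutate filter, so the scripted update keeps working without an ingest pipeline:
filter {
  mutate {
    # Adds the same static field the ingest pipeline's set processor would have added
    add_field => { "api-test" => "new" }
  }
}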

How to split a large JSON file input into different Elasticsearch indices?

The input to logstash is
input {
  file {
    path => "/tmp/very-large.json"
    type => "json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
and a sample of the JSON file:
{"type":"type1", "msg":"..."}
{"type":"type2", "msg":"..."}
{"type":"type1", "msg":"..."}
{"type":"type3", "msg":"..."}
Is it possible to feed them into different Elasticsearch indices, so I can process them more easily in the future?
I know that if it is possible to assign them a tag, then I can do something like:
if "type1" in [tags] {
elasticsearch {
hosts => ["localhost:9200"]
action => "index"
index => "logstash-type1%{+YYYY.MM.dd}"
flush_size => 50
}
}
How can I do a similar thing by looking at a specific JSON field value, e.g. type in my example above?
Even simpler, just use the type field to build the index name like this:
elasticsearch {
  hosts => ["localhost:9200"]
  action => "index"
  index => "logstash-%{type}%{+YYYY.MM.dd}"
  flush_size => 50
}
You can compare on any field. You'll first have to parse your JSON with the json filter or codec; then you'll have a type field to work on, like this:
if [type] == "type1" {
  elasticsearch {
    ...
    index => "logstash-type1%{+YYYY.MM.dd}"
  }
} else if [type] == "type2" {
  elasticsearch {
    ...
    index => "logstash-type2%{+YYYY.MM.dd}"
  }
} ...
Or like in Val's answer:
elasticsearch {
  hosts => ["localhost:9200"]
  action => "index"
  index => "logstash-%{type}%{+YYYY.MM.dd}"
  flush_size => 50
}
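In either case the JSON lines need to be parsed into fields first; a minimal sketch (assuming each raw line ends up in the message field) would be:
filter {
  json {
    # Parse the raw JSON line so fields like "type" are available to conditionals and sprintf references
    source => "message"
  }
}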

Logstash Elasticsearch compression

I have a working ELK stack and would like to enable index compression.
The official store compression documentation tells me that I need to do it at index creation.
I couldn't find anything related to store compression, or even index settings, in the related Logstash output documentation.
Below is my logstash output configuration:
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    sniffing => true
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
And the created index settings:
{
  "filebeat-2016.04.28": {
    "settings": {
      "index": {
        "creation_date": "1461915752875",
        "uuid": "co8bvXI7RFKFwB7oJqs8cA",
        "number_of_replicas": "1",
        "number_of_shards": "5",
        "version": {
          "created": "2030199"
        }
      }
    }
  }
}
You need to provide your own index template file in order to enable index compression.
So you need to create a filebeat-template.json file like the one below. This file will be used by Logstash when creating a new filebeat index.
{
  "template" : "filebeat-*",
  "settings" : {
    "index.codec" : "best_compression"
  }
}
Then your elasticsearch output should be modified like this:
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    sniffing => true
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
    template_name => "filebeat-template"
    template => "/path/to/filebeat-template.json"
  }
}
Then you can delete your existing filebeat-2016.04.28 index and relaunch Logstash. Logstash will create an index template called /_template/filebeat-template, which will kick in every time ES needs to create a new index whose name starts with filebeat-, applying the settings (among them the store compression one) present in the template.
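To verify that the template was installed and that newly created indices pick up the codec, something like the following can be used (a sketch; the dated index name is just an example):
curl -XGET 'http://localhost:9200/_template/filebeat-template?pretty'
curl -XGET 'http://localhost:9200/filebeat-2016.04.29/_settings/index.codec?pretty'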

How to move data from one Elasticsearch index to another using the Bulk API

I am new to Elasticsearch. How to move data from one Elasticsearch index to another using the Bulk API?
I'd suggest using Logstash for this, i.e. you use one elasticsearch input plugin to retrieve the data from your index and another elasticsearch output plugin to push the data to your other index.
The Logstash config file would look like this:
input {
  elasticsearch {
    hosts => "localhost:9200"
    index => "source_index"            # the name of your source index
  }
}
filter {
  mutate {
    remove_field => [ "@version", "@timestamp" ]
  }
}
output {
  elasticsearch {
    host => "localhost"
    port => 9200
    protocol => "http"
    manage_template => false
    index => "target_index"            # the name of your target index
    document_type => "your_doc_type"   # make sure to set the appropriate type
    document_id => "%{id}"
    workers => 5
  }
}
After installing Logstash, you can run it like this:
bin/logstash -f logstash.conf
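Once the run completes, a quick sanity check (a sketch; index names match the example above) is to compare document counts between the two indices:
curl -XGET 'http://localhost:9200/source_index/_count?pretty'
curl -XGET 'http://localhost:9200/target_index/_count?pretty'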

How to stop logstash from creating a default mapping in ElasticSearch

I am using logstash to feed logs into ElasticSearch.
I am configuring Logstash as follows:
input {
  file {
    path => "/tmp/foo.log"
    codec => plain {
      format => "%{message}"
    }
  }
}
output {
  elasticsearch {
    #host => localhost
    codec => json {}
    manage_template => false
    index => "4glogs"
  }
}
I notice that as soon as I start Logstash it creates a mapping (logs) in ES, as shown below.
{
  "4glogs": {
    "mappings": {
      "logs": {
        "properties": {
          "@timestamp": {
            "type": "date",
            "format": "dateOptionalTime"
          },
          "@version": {
            "type": "string"
          },
          "message": {
            "type": "string"
          }
        }
      }
    }
  }
}
How can I prevent logstash from creating this mapping ?
UPDATE:
I have now resolved this error too. "object mapping for [logs] tried to parse as object, but got EOF, has a concrete value been provided to it?"
As John Petrone has stated below, once you define a mapping, you have to ensure that your documents conform to the mapping. In my case, I had defined a mapping of "type: nested" but the output from logstash was a string.
So I removed all codecs ( whether json or plain ) from my logstash config and that allowed the json document to pass through without changes.
Here is my new Logstash config (with some additional filters for multiline logs).
input {
  kafka {
    zk_connect => "localhost:2181"
    group_id => "logstash_group"
    topic_id => "platform-logger"
    reset_beginning => false
    consumer_threads => 1
    queue_size => 2000
    consumer_id => "logstash-1"
    fetch_message_max_bytes => 1048576
  }
  file {
    path => "/tmp/foo.log"
  }
}
filter {
  multiline {
    pattern => "^\s"
    what => "previous"
  }
  multiline {
    pattern => "[0-9]+$"
    what => "previous"
  }
  multiline {
    pattern => "^$"
    what => "previous"
  }
  mutate {
    remove_field => ["kafka"]
    remove_field => ["@version"]
    remove_field => ["@timestamp"]
    remove_tag => ["multiline"]
  }
}
output {
  elasticsearch {
    manage_template => false
    index => "4glogs"
  }
}
You will need a mapping to store data in Elasticsearch and to search on it - that's how ES knows how to index and search those content types. You can either let logstash create it dynamically or you can prevent it from doing so and instead create it manually.
Keep in mind you cannot change existing mappings (although you can add to them). So first off you will need to delete the existing index. You would then modify your settings to prevent dynamic mapping creation. At the same time you will want to create your own mapping.
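Deleting the existing index first might look like this (index name taken from the question):
curl -XDELETE 'http://localhost:9200/4glogs'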
For example, this will create the mappings for the logstash data but also restrict any dynamic mapping creation via "strict":
$ curl -XPUT 'http://localhost:9200/4glogs/logs/_mapping' -d '
{
  "logs" : {
    "dynamic": "strict",
    "properties" : {
      "@timestamp": {
        "type": "date",
        "format": "dateOptionalTime"
      },
      "@version": {
        "type": "string"
      },
      "message": {
        "type": "string"
      }
    }
  }
}
'
Keep in mind that the index name "4glogs" and the type "logs" need to match what is coming from logstash.
For my production systems I generally prefer to turn off dynamic mapping as it avoids accidental mapping creation.
The following links should be useful if you want to make adjustments to your dynamic mappings:
https://www.elastic.co/guide/en/elasticsearch/guide/current/dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/custom-dynamic-mapping.html
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/dynamic-mapping.html
logs in this case is the index_type. If you don't want it to be created as logs, specify some other index_type on your elasticsearch output. Every record in Elasticsearch is required to have an index and a type. Logstash defaults to logs if you haven't specified one.
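For example (a sketch only; the type name app_logs is just an illustration, and depending on the Logstash version the option is called index_type or document_type):
output {
  elasticsearch {
    manage_template => false
    index => "4glogs"
    document_type => "app_logs"   # explicit type instead of the default "logs"
  }
}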
There's always an implicit mapping created when you insert records into Elasticsearch, so you can't prevent it from being created. You can create the mapping yourself before you insert anything (via say a template mapping).
The setting manage_template of false just prevents it from creating the template mapping for the index you've specified. You can delete the existing template if it's already been created by using something like curl -XDELETE http://localhost:9200/_template/logstash?pretty
Index templates can help you. You can create index templates with wildcard support to match an index name and supply your default mappings.
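A minimal template sketch along those lines (template name and pattern are illustrative, using the pre-5.x _template API that matches the era of this question) could be:
curl -XPUT 'http://localhost:9200/_template/4glogs-template' -d '
{
  "template": "4glogs*",
  "mappings": {
    "logs": {
      "dynamic": "strict",
      "properties": {
        "message": { "type": "string" }
      }
    }
  }
}
'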
