logstash output to elasticsearch index and mapping - elasticsearch

I'm trying to have logstash output to elasticsearch but I'm not sure how to use the mapping I defined in elasticsearch...
In Kibana, I created an index and mapping like this:
PUT /kafkajmx2
{
"mappings": {
"kafka_mbeans": {
"properties": {
"@timestamp": {
"type": "date"
},
"@version": {
"type": "integer"
},
"host": {
"type": "keyword"
},
"metric_path": {
"type": "text"
},
"type": {
"type": "keyword"
},
"path": {
"type": "text"
},
"metric_value_string": {
"type": "keyword"
},
"metric_value_number": {
"type": "float"
}
}
}
}
}
I can write data to it like this:
POST /kafkajmx2/kafka_mbeans
{
"metric_value_number":159.03478490788203,
"path":"/home/usrxxx/logstash-5.2.0/bin/jmxconf",
"@timestamp":"2017-02-12T23:08:40.934Z",
"@version":"1","host":"localhost",
"metric_path":"node1.kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec.FifteenMinuteRate",
"type":null
}
Now my Logstash config looks like this:
input {
kafka {
kafka details here
}
}
output {
elasticsearch {
hosts => "http://elasticsearch:9050"
index => "kafkajmx2"
}
}
It just writes to the kafkajmx2 index but doesn't use the mapping. When I query it like this in Kibana:
GET /kafkajmx2/kafka_mbeans/_search?q=*
{
}
I get this back:
{
"_index": "kafkajmx2",
"_type": "logs",
"_id": "AVo34xF_j-lM6k7wBavd",
"_score": 1,
"_source": {
"@timestamp": "2017-02-13T14:31:53.337Z",
"@version": "1",
"message": """
{"metric_value_number":0,"path":"/home/usrxxx/logstash-5.2.0/bin/jmxconf","@timestamp":"2017-02-13T14:31:52.654Z","@version":"1","host":"localhost","metric_path":"node1.kafka.server:type=SessionExpireListener,name=ZooKeeperAuthFailuresPerSec.Count","type":null}
"""
}
}
How do I tell it to use the kafka_mbeans mapping in the Logstash output?
-----EDIT-----
I tried my output like this but still get the same results:
output {
elasticsearch {
hosts => "http://10.204.93.209:9050"
index => "kafkajmx2"
template_name => "kafka_mbeans"
codec => plain {
format => "%{message}"
}
}
}
The data in Elasticsearch should look like this:
{
"@timestamp": "2017-02-13T14:31:52.654Z",
"@version": "1",
"host": "localhost",
"metric_path": "node1.kafka.server:type=SessionExpireListener,name=ZooKeeperAuthFailuresPerSec.Count",
"metric_value_number": 0,
"path": "/home/usrxxx/logstash-5.2.0/bin/jmxconf",
"type": null
}
--------EDIT 2--------------
I at least got the message to parse into JSON by adding a filter like this:
input {
kafka {
...kafka details....
}
}
filter {
json {
source => "message"
remove_field => ["message"]
}
}
output {
elasticsearch {
hosts => "http://node1:9050"
index => "kafkajmx2"
template_name => "kafka_mbeans"
}
}
It still doesn't use the template, but at least it parses the JSON correctly, so now I get this:
{
"_index": "kafkajmx2",
"_type": "logs",
"_id": "AVo4a2Hzj-lM6k7wBcMS",
"_score": 1,
"_source": {
"metric_value_number": 0.9967205071482902,
"path": "/home/usrxxx/logstash-5.2.0/bin/jmxconf",
"@timestamp": "2017-02-13T16:54:16.701Z",
"@version": "1",
"host": "localhost",
"metric_path": "kafka1.kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent.Value",
"type": null
}
}

What you need to change is very simple. First, use the json codec in your kafka input. You then no longer need the json filter, so you can remove it.
kafka {
...kafka details....
codec => "json"
}
Then, in your elasticsearch output, you're missing the mapping type (the document_type parameter below). This matters because it otherwise defaults to logs (as you can see in your results), which doesn't match your kafka_mbeans mapping type. Moreover, you don't need template_name, since your index already exists. Make the following modification:
elasticsearch {
hosts => "http://node1:9050"
index => "kafkajmx2"
document_type => "kafka_mbeans"
}
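To see why the parsing step matters, here is a rough Python sketch (not Logstash code) of what the json codec or json filter effectively does to an event: the JSON string held in message is decoded and its keys become top-level fields. The function name parse_message is made up for illustration, and the sample event is a shortened version of the one in the question.

```python
import json

def parse_message(event):
    """Decode the JSON string in event["message"] and lift its keys to the top level."""
    parsed = json.loads(event["message"])
    # Keep every existing field except the raw message, then add the decoded fields.
    result = {k: v for k, v in event.items() if k != "message"}
    result.update(parsed)
    return result

raw_event = {
    "message": '{"metric_value_number":0,"host":"localhost","type":null}',
}
parsed_event = parse_message(raw_event)
print(parsed_event)
```

Only after this step do fields such as metric_value_number exist as separate fields that the kafka_mbeans mapping can apply to.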

This is defined with the template_name parameter on the elasticsearch output.
elasticsearch {
hosts => "http://elasticsearch:9050"
index => "kafkajmx2"
template_name => "kafka_mbeans"
}
One warning, though. If you want to start creating time-based indexes, such as one index per week, you will have to take a few more steps to ensure your mapping is applied to each of them. You have a couple of options:
Create an Elasticsearch index template and define it to apply to indexes matching a glob, such as kafkajmx2-*
Use the template parameter on the output, which specifies a JSON file defining the mapping that will be used for all indexes created through that output.
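To make the weekly naming concrete, here is a small Python sketch (plain Python, not Logstash) of how per-week index names could be derived from a date. The helper weekly_index_name and the YYYY.WW suffix are assumptions chosen for illustration; any scheme works as long as every name shares the kafkajmx2- prefix that the glob covers.

```python
from datetime import date

def weekly_index_name(prefix, day):
    """Return a per-week index name such as 'kafkajmx2-2017.07' (ISO week numbering)."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}-{iso_year}.{iso_week:02d}"

name = weekly_index_name("kafkajmx2", date(2017, 2, 13))
print(name)
# Every name built this way starts with "kafkajmx2-", which is what a
# template pattern of kafkajmx2-* needs in order to match.
print(name.startswith("kafkajmx2-"))
```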

Related

Combine two index into third index in elastic search using logstash

I have two indices:
1) employee_data
{"code":1, "name":"xyz", "city":"Mumbai" }
2) transaction_data
{"code":1, "Month":"June", "payment":78000 }
I want a third index like this:
3) join_index
{"code":1, "name":"xyz", "city":"Mumbai", "Month":"June", "payment":78000 }
How is this possible?
I am trying this in Logstash:
input {
elasticsearch {
hosts => "localhost"
index => "employees_data,transaction_data"
query => '{ "query": { "match": { "code": 1} } }'
scroll => "5m"
docinfo => true
}
}
output {
elasticsearch {
hosts => ["localhost"]
index => "join1"
}
}
You can use the elasticsearch input on employees_data.
In your filters, use the elasticsearch filter on transaction_data:
input {
elasticsearch {
hosts => "localhost"
index => "employees_data"
query => '{ "query": { "match_all": { } } }'
sort => "code:desc"
scroll => "5m"
docinfo => true
}
}
filter {
elasticsearch {
hosts => "localhost"
index => "transaction_data"
query => "code:\"%{[code]}\""
fields => {
"Month" => "Month"
"payment" => "payment"
}
}
}
output {
elasticsearch {
hosts => ["localhost"]
index => "join1"
}
}
Then send the new document to your third index with the elasticsearch output.
You'll have three Elasticsearch connections, so the result can be a little slow.
But it works.
You don't need Logstash to do this. Elasticsearch itself supports it through the enrich processor.
First, you need to create an enrich policy (use the smallest index, let's say it's employees_data):
PUT /_enrich/policy/employee-policy
{
"match": {
"indices": "employees_data",
"match_field": "code",
"enrich_fields": ["name", "city"]
}
}
Then you can execute that policy in order to create an enrichment index:
POST /_enrich/policy/employee-policy/_execute
When the enrichment index has been created and populated, the next step requires you to create an ingest pipeline that uses the above enrich policy/index:
PUT /_ingest/pipeline/employee_lookup
{
"description" : "Enriching transactions with employee data",
"processors" : [
{
"enrich" : {
"policy_name": "employee-policy",
"field" : "code",
"target_field": "tmp",
"max_matches": "1"
}
},
{
"script": {
"if": "ctx.tmp != null",
"source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
}
}
]
}
Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:
POST _reindex
{
"source": {
"index": "transaction_data"
},
"dest": {
"index": "join1",
"pipeline": "employee_lookup"
}
}
After running this, the join1 index will contain exactly what you need, for instance:
{
"_index" : "join1",
"_type" : "_doc",
"_id" : "0uA8dXMBU9tMsBeoajlw",
"_score" : 1.0,
"_source" : {
"code":1,
"name": "xyz",
"city": "Mumbai",
"Month": "June",
"payment": 78000
}
}
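If it helps to picture what the enrich policy plus the reindex do, the logic is roughly a lookup join. Below is a minimal Python sketch of that idea on in-memory dicts. It is not how Elasticsearch implements it, and the function name enrich is only illustrative.

```python
def enrich(transactions, employees, match_field="code", enrich_fields=("name", "city")):
    """Copy enrich_fields from the matching employee onto each transaction."""
    # Build a lookup table keyed by the match field (the role of the enrich index).
    lookup = {e[match_field]: e for e in employees}
    joined = []
    for t in transactions:
        doc = dict(t)
        match = lookup.get(doc.get(match_field))
        if match is not None:
            for field in enrich_fields:
                if field in match:
                    doc[field] = match[field]
        joined.append(doc)
    return joined

employees_data = [{"code": 1, "name": "xyz", "city": "Mumbai"}]
transaction_data = [{"code": 1, "Month": "June", "payment": 78000}]
join1 = enrich(transaction_data, employees_data)
print(join1)
```

Transactions without a matching employee pass through unchanged, which mirrors the "if ctx.tmp != null" guard in the pipeline above.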
As far as I know, this cannot be done using the Elasticsearch APIs alone. To handle it, you need to set a unique ID on the related documents. For example, the code field mentioned in your question would make a good document ID. You can then reindex the first index into the third one, and use the Update API to update those documents by ID with the fields read from the second index. I hope this helps.

Replica and shard settings not applied in elasticsearch template

I've added a template like this:
curl -X PUT "e.f.g.h:9200/_template/impression-template" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["impression-%{+YYYY.MM.dd}"],
"settings": {
"number_of_shards": 2,
"number_of_replicas": 2
},
"mappings": {
"_doc": {
"_source": {
"enabled": false
},
"dynamic": false,
"properties": {
"message": {
"type": "object",
"properties": {
...
I also have a Logstash instance that reads events from Kafka and writes them to ES. Here is my Logstash config:
input {
kafka {
topics => ["impression"]
bootstrap_servers => "a.b.c.d:9092"
}
}
filter {
json {
source => "message"
target => "message"
}
}
output {
elasticsearch {
hosts => ["e.f.g.h:9200"]
index => "impression-%{+YYYY.MM.dd}"
template_name => "impression-template"
}
}
But each day I get an index with 5 shards and 1 replica (the default ES config). How can I fix this so that I get 2 shards and 2 replicas?
I'm not sure you can set index_patterns to my_index-%{+YYYY.MM.dd}: when you then create an index such as my_index-2019.03.10, it gets an empty mapping because the pattern isn't recognized. I had the same issue, and my workaround was to set index_patterns to my_index-* and add a year suffix to the index names, which then look like my_index-2017, my_index-2018...
{
"my_index_template" : {
"order" : 0,
"index_patterns" : [
"my_index-*"
],
"settings" : {
"index" : {
"number_of_shards" : "5",
"number_of_replicas" : "1"
}
},...
I took the year part from the timestamp field (YYYY-MM-dd) and had Logstash append it to the index name:
filter {
grok {
match => [
"timestamp", "(?<index_year>%{YEAR})"
]
}
mutate {
add_field => {
"[@metadata][index_year]" => "%{index_year}"
}
}
mutate {
remove_field => [ "index_year", "@version" ]
}
}
output{
elasticsearch {
hosts => ["localhost:9200"]
index => "my_index-%{[@metadata][index_year]}"
document_id => "%{some_field}"
}
}
After Logstash finished, I had the my_index-2017, my_index-2018 and my_index-2019 indices, each with 5 shards, 1 replica, and the correct mapping as predefined in my template.
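The mismatch is easy to reproduce outside Elasticsearch. The Python sketch below uses fnmatchcase only as a stand-in for Elasticsearch's wildcard matching (an assumption, the real matcher is not Python), and template_applies is a hypothetical helper. The point it shows: the %{+YYYY.MM.dd} syntax is expanded by Logstash, not by Elasticsearch, so inside a template it is just a literal string that no real index name equals.

```python
from fnmatch import fnmatchcase

def template_applies(index_patterns, index_name):
    """Return True if any pattern (with * as wildcard) matches the index name."""
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)

daily_index = "impression-2019.03.10"

# The date placeholder is treated as literal text, so it never matches.
literal_match = template_applies(["impression-%{+YYYY.MM.dd}"], daily_index)
# A wildcard pattern covers every daily index.
wildcard_match = template_applies(["impression-*"], daily_index)

print(literal_match, wildcard_match)
```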

Kibana not taking data types

I am trying my hand at the ELK stack and have a problem with Kibana. One of the fields in my data is an integer, but Kibana shows its type as undefined. Below is the sample data I am working with.
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "products",
"_type": "logs",
"_id": "AVivMgCnKd2m9Wr-3jBk",
"_score": 1,
"_source": {
"message": "product10,990\r",
"@version": "1",
"@timestamp": "2016-11-29T08:27:18.792Z",
"path": "/Users/B0079855/Documents/SERVERS/logstash-2.2.2/samples/products.csv",
"host": "LTB0079855-MAC.local",
"product_name": "product10",
"product_price": 990
}
}
]
}
}
Kibana does not identify product_price as an integer.
logstash conf:
input {
file {
path => "{filepath}"
# to read from the beginning of file
start_position => beginning
sincedb_path => "/dev/null"
}
}
filter {
csv {
columns => ["product_name", "product_price"]
}
mutate {
convert => { "product_price" => "integer" }
}
}
output {
elasticsearch {
hosts => ["localhost:9200"]
index => "products"
}
stdout { codec => rubydebug }
}
How do I make this work?
If you haven't defined a mapping for your fields, you could create one at the same time as you create the index, like this:
PUT request: http://yourhost:9200/yourindex
Request BODY:
{
"mappings": {
"your_type": { <--- document_type value in logstash conf
"properties": {
"product_price": {
"type": "integer"
}
}
}
}
}
Alternatively, you can update the mapping of an existing index with the PUT mapping API:
curl -XPUT 'http://localhost:9200/your_index/_mapping/your_type' -d '
{
"your_type" : {
"properties" : {
//your new mapping properties
"product_price": {
"type": "integer"
}
}
}
}'
Hope this SO helps as well.
EDIT:
In your case, you're converting the field with a separate mutate filter, outside of the csv plugin. Try converting it within the csv plugin itself, using its convert option:
filter {
csv {
columns => ["product_name", "product_price"]
convert => { "product_price" => "integer" }
}
}
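For reference, this is roughly what the csv step with a conversion does to one line of the file, written as a plain Python sketch rather than Logstash code. The function parse_product_line is hypothetical, and the sample line is the message value from the question.

```python
import csv

def parse_product_line(line):
    """Split one CSV line into named columns and convert the price to an int."""
    # Drop the trailing carriage return / newline before parsing.
    row = next(csv.reader([line.rstrip("\r\n")]))
    event = dict(zip(["product_name", "product_price"], row))
    event["product_price"] = int(event["product_price"])
    return event

event = parse_product_line("product10,990\r")
print(event)
```

Without the conversion, product_price would stay the string "990", and Elasticsearch would map the field as a string the first time it sees it.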

ELK - Kibana doesn't recognize geo_point field

I'm trying to create a Tile map on Kibana, with GEO location points.
For some reason, When I'm trying to create the map, I get the following message on Kibana:
No Compatible Fields: The "logs" index pattern does not contain any of
the following field types: geo_point
My settings:
Logstash (version 2.3.1):
filter {
grok {
match => {
"message" => "MY PATTERN"
}
}
geoip {
source => "ip"
target => "geoip"
add_field => [ "location", "%{[geoip][latitude]}, %{[geoip][longitude]}" ] #added this extra field in case the nested field is the problem
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
hosts => ["localhost:9200"]
index => "logs"
}
}
When log input arrives, I can see it parse it as should and I do get the geoIp data for a given IP:
"geoip" => {
"ip" => "XXX.XXX.XXX.XXX",
"country_code2" => "XX",
"country_code3" => "XXX",
"country_name" => "XXXXXX",
"continent_code" => "XX",
"region_name" => "XX",
"city_name" => "XXXXX",
"latitude" => XX.0667,
"longitude" => XX.766699999999986,
"timezone" => "XXXXXX",
"real_region_name" => "XXXXXX",
"location" => [
[0] XX.766699999999986,
[1] XX.0667
]
},
"location" => "XX.0667, XX.766699999999986"
ElasticSearch (version 2.3.1):
GET /logs/_mapping returns:
{
"logs": {
"mappings": {
"logs": {
"properties": {
"@timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
"geoip": {
"properties": {
.
.
.
"latitude": {
"type": "double"
},
"location": {
"type": "geo_point"
},
"longitude": {
"type": "double"
}
}
},
"location": {
"type": "geo_point"
}
}
}
}
}
}
Kibana (version 4.5.0):
I do see all the data and everything seems to be fine.
Just when I go to "Visualize" -> "Tile map" -> "From a new search" -> "Geo Coordinates", I get this error message:
No Compatible Fields: The "logs" index pattern does not contain any of the following field types: geo_point
Even though I see in the Elasticsearch mapping that the location type is geo_point.
What am I missing?
Found the issue!
I had named the index "logs". I changed the index name to "logstash-logs" (it needs the logstash-* prefix) and everything started to work!

upload csv with logstash to elasticsearch with new mappings

I have a CSV file which I'm trying to upload to ES using Logstash. My conf file is as follows:
input {
file {
path => ["filename"]
start_position => "beginning"
}
}
filter {
csv {
columns => ["name1", "name2", "name3", ...]
separator => ","
}
}
filter {
mutate {
remove_field => ["name31", "name32", "name33"]
}
}
output {
stdout{
codec => rubydebug
}
elasticsearch {
action => "index"
host => "localhost"
index => "newindex"
template_overwrite => true
document_type => "newdoc"
template => "template.json"
}
}
My template file looks like the following:
{
"mappings": {
"newdoc": {
"properties": {
"name1": {
"type": "integer"
},
"name2": {
"type": "float"
},
"name3": {
"format": "dateOptionalTime",
"type": "date"
},
"name4": {
"index": "not_analyzed",
"type": "string"
},
....
}
}
},
"settings": {
"number_of_replicas": 0,
"number_of_shards": 1
},
"template": "newindex"
}
When I try to overwrite the default mapping, I get a 400 error even when I only try to write one line:
failed action with response of 400, dropping action: ["index", + ...
What can be the problem? Everything works fine if I don't overwrite the mapping but that is not a solution for me. I'm using Logstash 1.5.1 and Elasticsearch 1.5.0 on Red Hat.
Thanks
You should POST your mapping to Elasticsearch before loading any data into it.
POST mapping
You don't need to create the index before running Logstash; it creates the index if it doesn't exist yet. Still, it's better to create your own mapping before running your conf file with Logstash, since that gives you more control over your field types, etc. Here is a simple tutorial on how to import CSV into Elasticsearch using Logstash: http://freefilesdl.com/how-to-connect-logstash-to-elasticsearch-output
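As a sketch of what "create your own mapping first" could look like, the Python snippet below only builds the request body you would send when creating newindex. It does not talk to Elasticsearch. build_index_body is a hypothetical helper, and the body follows the typed mapping layout of the template in the question (old Elasticsearch versions); newer versions lay mappings out differently.

```python
import json

def build_index_body(doc_type, field_types, shards=1, replicas=0):
    """Build a create-index request body with explicit settings and field types."""
    properties = {name: {"type": ftype} for name, ftype in field_types.items()}
    return {
        "settings": {"number_of_shards": shards, "number_of_replicas": replicas},
        "mappings": {doc_type: {"properties": properties}},
    }

body = build_index_body("newdoc", {"name1": "integer", "name2": "float"})
print(json.dumps(body, indent=2))
```

The printed JSON is what you would send as the body when creating the index, before starting Logstash.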
