I am trying my hand at the ELK stack. I have a problem with Kibana: one of the fields in my data is of type integer, but Kibana shows its type as undefined. Below is the sample data I am working with.
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "products",
        "_type": "logs",
        "_id": "AVivMgCnKd2m9Wr-3jBk",
        "_score": 1,
        "_source": {
          "message": "product10,990\r",
          "@version": "1",
          "@timestamp": "2016-11-29T08:27:18.792Z",
          "path": "/Users/B0079855/Documents/SERVERS/logstash-2.2.2/samples/products.csv",
          "host": "LTB0079855-MAC.local",
          "product_name": "product10",
          "product_price": 990
        }
      }
    ]
  }
}
Kibana is not identifying product_price as an integer.
Logstash conf:
input {
  file {
    path => "{filepath}"
    # read from the beginning of the file
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    columns => ["product_name", "product_price"]
  }
  mutate {
    convert => { "product_price" => "integer" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "products"
  }
  stdout { codec => rubydebug }
}
How do I make this work?
If you haven't mapped your fields explicitly, you could do something like the following to create a mapping before you create the index:
PUT request: http://yourhost:9200/yourindex
Request BODY:
{
  "mappings": {
    "your_type": {          <--- document_type value in logstash conf
      "properties": {
        "product_price": {
          "type": "integer"
        }
      }
    }
  }
}
If not, you can even update the index mapping afterwards with the PUT mapping API:
curl -XPUT 'http://localhost:9200/your_index/_mapping/your_type' -d '
{
  "your_type" : {
    "properties" : {
      // your new mapping properties
      "product_price": {
        "type": "integer"
      }
    }
  }
}'
Hope this SO helps as well.
EDIT:
In your case, since you're converting it with Logstash, you're doing the conversion outside of the csv plugin. Try converting it within the plugin itself:
filter {
  csv {
    columns => ["product_name", "product_price"]
    convert => { "product_price" => "integer" }
  }
}
Related
I have two indices:
employee_data
{ "code": 1, "name": "xyz", "city": "Mumbai" }
transaction_data
{ "code": 1, "Month": "June", "payment": 78000 }
I want a third index like this:
join_index
{ "code": 1, "name": "xyz", "city": "Mumbai", "Month": "June", "payment": 78000 }
How is this possible?
I am trying this in Logstash:
input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data,transaction_data"
    query => '{ "query": { "match": { "code": 1 } } }'
    scroll => "5m"
    docinfo => true
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "join1"
  }
}
You can use the elasticsearch input on employees_data.
In your filters, use the elasticsearch filter on transaction_data:
input {
  elasticsearch {
    hosts => "localhost"
    index => "employees_data"
    query => '{ "query": { "match_all": { } } }'
    sort => "code:desc"
    scroll => "5m"
    docinfo => true
  }
}
filter {
  elasticsearch {
    hosts => "localhost"
    index => "transaction_data"
    query => 'code:"%{[code]}"'
    fields => {
      "Month" => "Month",
      "payment" => "payment"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost"]
    index => "join1"
  }
}
Then send the resulting documents to your third index with the elasticsearch output.
You'll have three Elasticsearch connections, so the run can be a little slow, but it works.
You don't need Logstash to do this; Elasticsearch itself supports it by leveraging the enrich processor.
First, you need to create an enrich policy (use the smallest index, let's say it's employees_data):
PUT /_enrich/policy/employee-policy
{
  "match": {
    "indices": "employees_data",
    "match_field": "code",
    "enrich_fields": ["name", "city"]
  }
}
Then you can execute that policy in order to create the enrich index:
POST /_enrich/policy/employee-policy/_execute
When the enrichment index has been created and populated, the next step requires you to create an ingest pipeline that uses the above enrich policy/index:
PUT /_ingest/pipeline/employee_lookup
{
  "description": "Enriching transactions with employee data",
  "processors": [
    {
      "enrich": {
        "policy_name": "employee-policy",
        "field": "code",
        "target_field": "tmp",
        "max_matches": "1"
      }
    },
    {
      "script": {
        "if": "ctx.tmp != null",
        "source": "ctx.putAll(ctx.tmp); ctx.remove('tmp');"
      }
    }
  ]
}
Finally, you're now ready to create your target index with the joined data. Simply leverage the _reindex API combined with the ingest pipeline we've just created:
POST _reindex
{
  "source": {
    "index": "transaction_data"
  },
  "dest": {
    "index": "join1",
    "pipeline": "employee_lookup"
  }
}
After running this, the join1 index will contain exactly what you need, for instance:
{
  "_index" : "join1",
  "_type" : "_doc",
  "_id" : "0uA8dXMBU9tMsBeoajlw",
  "_score" : 1.0,
  "_source" : {
    "code": 1,
    "name": "xyz",
    "city": "Mumbai",
    "Month": "June",
    "payment": 78000
  }
}
As far as I know, this cannot be done with the Elasticsearch APIs alone. To handle it, you need to set a unique ID on the documents that belong together; for example, the code you mentioned in your question is a good candidate for the document ID. You can then reindex the first index into the third one and use the Update API to merge in the documents read from the second index, updating them by ID in the third index. I hope this helps.
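A minimal sketch of that idea, assuming code is unique per document and using ES 7+ request syntax (index names taken from the question). First, copy employee_data into the new index while using code as the document _id:
POST _reindex
{
  "source": { "index": "employee_data" },
  "dest": { "index": "join_index" },
  "script": { "source": "ctx._id = ctx._source.code" }
}
Then, for every document read from transaction_data, merge its fields into the matching document by that ID (here code = 1):
POST join_index/_update/1
{
  "doc": { "Month": "June", "payment": 78000 }
}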
I've added a template like this:
curl -X PUT "e.f.g.h:9200/_template/impression-template" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["impression-%{+YYYY.MM.dd}"],
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 2
  },
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      },
      "dynamic": false,
      "properties": {
        "message": {
          "type": "object",
          "properties": {
            ...
And I have a Logstash instance that reads events from Kafka and writes them to ES. Here is my Logstash config:
input {
  kafka {
    topics => ["impression"]
    bootstrap_servers => "a.b.c.d:9092"
  }
}
filter {
  json {
    source => "message"
    target => "message"
  }
}
output {
  elasticsearch {
    hosts => ["e.f.g.h:9200"]
    index => "impression-%{+YYYY.MM.dd}"
    template_name => "impression-template"
  }
}
But each day I get an index with 5 shards and 1 replica (the ES defaults). How can I fix this so I get 2 shards and 2 replicas?
I'm not sure you can use an index pattern like my_index-%{+YYYY.MM.dd}, because when my_index-2019.03.10 is created it isn't recognized by the template and ends up with an empty mapping. I had the same issue, and my workaround was to set index_patterns to my_index-* and add a year suffix to the indices, so they look like my_index-2017, my_index-2018, and so on:
{
  "my_index_template" : {
    "order" : 0,
    "index_patterns" : [
      "my_index-*"
    ],
    "settings" : {
      "index" : {
        "number_of_shards" : "5",
        "number_of_replicas" : "1"
      }
    },...
I took the year part from the timestamp field (YYYY-MM-dd) to generate the year, and appended it to the end of the index name in Logstash:
filter {
  grok {
    match => [
      "timestamp", "(?<index_year>%{YEAR})"
    ]
  }
  mutate {
    add_field => {
      "[@metadata][index_year]" => "%{index_year}"
    }
  }
  mutate {
    remove_field => [ "index_year", "@version" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index-%{[@metadata][index_year]}"
    document_id => "%{some_field}"
  }
}
After Logstash completed, I ended up with my_index-2017, my_index-2018 and my_index-2019 indices with 5 shards, 1 replica, and the correct mapping as predefined in my template.
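If you want to double-check what a newly created index actually picked up, a quick sanity check could look like this (index name taken from the example above):
GET /my_index-2018/_settings
GET /my_index-2018/_mapping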
I'm trying to have Logstash output to Elasticsearch, but I'm not sure how to use the mapping I defined in Elasticsearch...
In Kibana, I did this:
Created an index and mapping like this:
PUT /kafkajmx2
{
  "mappings": {
    "kafka_mbeans": {
      "properties": {
        "@timestamp": {
          "type": "date"
        },
        "@version": {
          "type": "integer"
        },
        "host": {
          "type": "keyword"
        },
        "metric_path": {
          "type": "text"
        },
        "type": {
          "type": "keyword"
        },
        "path": {
          "type": "text"
        },
        "metric_value_string": {
          "type": "keyword"
        },
        "metric_value_number": {
          "type": "float"
        }
      }
    }
  }
}
I can write data to it like this:
POST /kafkajmx2/kafka_mbeans
{
  "metric_value_number": 159.03478490788203,
  "path": "/home/usrxxx/logstash-5.2.0/bin/jmxconf",
  "@timestamp": "2017-02-12T23:08:40.934Z",
  "@version": "1",
  "host": "localhost",
  "metric_path": "node1.kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec.FifteenMinuteRate",
  "type": null
}
Now my Logstash config looks like this:
input {
  kafka {
    kafka details here
  }
}
output {
  elasticsearch {
    hosts => "http://elasticsearch:9050"
    index => "kafkajmx2"
  }
}
and it just writes to the kafkajmx2 index but doesn't use the mapping. When I query it like this in Kibana:
GET /kafkajmx2/kafka_mbeans/_search?q=*
{
}
I get this back:
{
  "_index": "kafkajmx2",
  "_type": "logs",
  "_id": "AVo34xF_j-lM6k7wBavd",
  "_score": 1,
  "_source": {
    "@timestamp": "2017-02-13T14:31:53.337Z",
    "@version": "1",
    "message": """
      {"metric_value_number":0,"path":"/home/usrxxx/logstash-5.2.0/bin/jmxconf","@timestamp":"2017-02-13T14:31:52.654Z","@version":"1","host":"localhost","metric_path":"node1.kafka.server:type=SessionExpireListener,name=ZooKeeperAuthFailuresPerSec.Count","type":null}
    """
  }
}
How do I tell it to use the kafka_mbeans mapping in the Logstash output?
-----EDIT-----
I tried my output like this but still get the same results:
output {
  elasticsearch {
    hosts => "http://10.204.93.209:9050"
    index => "kafkajmx2"
    template_name => "kafka_mbeans"
    codec => plain {
      format => "%{message}"
    }
  }
}
The data in Elasticsearch should look like this:
{
  "@timestamp": "2017-02-13T14:31:52.654Z",
  "@version": "1",
  "host": "localhost",
  "metric_path": "node1.kafka.server:type=SessionExpireListener,name=ZooKeeperAuthFailuresPerSec.Count",
  "metric_value_number": 0,
  "path": "/home/usrxxx/logstash-5.2.0/bin/jmxconf",
  "type": null
}
--------EDIT 2--------------
I at least got the message to parse into JSON by adding a filter like this:
input {
  kafka {
    ...kafka details....
  }
}
filter {
  json {
    source => "message"
    remove_field => ["message"]
  }
}
output {
  elasticsearch {
    hosts => "http://node1:9050"
    index => "kafkajmx2"
    template_name => "kafka_mbeans"
  }
}
It still doesn't use the template, but this at least parses the JSON correctly, so now I get this:
{
  "_index": "kafkajmx2",
  "_type": "logs",
  "_id": "AVo4a2Hzj-lM6k7wBcMS",
  "_score": 1,
  "_source": {
    "metric_value_number": 0.9967205071482902,
    "path": "/home/usrxxx/logstash-5.2.0/bin/jmxconf",
    "@timestamp": "2017-02-13T16:54:16.701Z",
    "@version": "1",
    "host": "localhost",
    "metric_path": "kafka1.kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent.Value",
    "type": null
  }
}
What you need to change is very simple. First, use the json codec in your kafka input. There's no need for the json filter; you can remove it.
kafka {
  ...kafka details....
  codec => "json"
}
Then, in your elasticsearch output, you're missing the mapping type (the document_type parameter below). This is important because otherwise it defaults to logs (as you can see), which doesn't match your kafka_mbeans mapping type. Moreover, you don't really need a template since your index already exists. Make the following modification:
elasticsearch {
  hosts => "http://node1:9050"
  index => "kafkajmx2"
  document_type => "kafka_mbeans"
}
This is defined with the template_name parameter on the elasticsearch output.
elasticsearch {
  hosts => "http://elasticsearch:9050"
  index => "kafkajmx2"
  template_name => "kafka_mbeans"
}
One warning, though. If you want to start creating indexes that are partitioned by time, such as one index per week, you will have to take a few more steps to ensure your mapping stays with each one. You have a couple of options there (see the sketch after this list):
Create an Elasticsearch template, and define it to apply to indexes using a glob, such as kafkajmx2-*.
Use the template parameter on the output, which specifies a JSON file defining your mapping that will be used for all indexes created through that output.
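A minimal sketch of the second option, assuming weekly indexes and a template file at the hypothetical path /etc/logstash/kafka_mbeans_template.json (the index_patterns glob inside that file would need to match kafkajmx2-*):
output {
  elasticsearch {
    hosts => "http://elasticsearch:9050"
    # one index per week; the template is applied to every new index
    index => "kafkajmx2-%{+xxxx.ww}"
    template => "/etc/logstash/kafka_mbeans_template.json"
    template_name => "kafka_mbeans"
    template_overwrite => true
  }
}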
I'm trying to simply disable dynamic mapping for any fields not explicitly defined in the mapping at index creation time. Nothing worked, so I even tried the example from the docs:
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic": false,
      "properties": {
        "user": {
          "type": "text"
        }
      }
    }
  }
}
Made a test insert:
POST my_index/my_type
{
  "user": "tester",
  "some_unknown_field": "lsdkfjsd"
}
Then searching the index shows:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "AViPrfwVko8c8Q3co8Qz",
        "_score": 1,
        "_source": {
          "user": "tester",
          "some_unknown_field": "lsdkfjsd"
        }
      }
    ]
  }
}
I'm expecting "some_unknown_field" to not be indexed, since it was not defined in the mapping. So why is it still being indexed? Am I missing something?
UPDATE
It turns out that it isn't currently possible in version 5.0.0 to do what I wanted, so I removed the fields in my app before sending to elasticsearch and achieved the same end result.
What a mapping does is fix a field to the type you specify when you create the index. For a field you haven't mentioned in the mapping, ES will treat it as a new field when you insert values and add it to the index with a default (dynamic) mapping. So if you don't want to see a particular field within your _source, you could do some source filtering, for example as sketched below.
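A minimal source-filtering sketch that hides the unwanted field at search time (index and field names taken from the question):
GET /my_index/_search
{
  "_source": {
    "excludes": ["some_unknown_field"]
  },
  "query": { "match_all": {} }
}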
Workarounds:
If that's not the case, try disabling dynamic mapping when you're creating the index.
Try setting the dynamic property to "strict":
PUT /test
{
  "settings": {
    "index.mapper.dynamic": false
  },
  "mappings": {
    "testing_type": {
      "dynamic": "strict",
      "properties": {
        "field1": {
          "type": "string"
        }
      }
    }
  }
}
If the above two don't work out, try setting index.mapper.dynamic to false. This SO answer could be handy. Hope it helps.
I have a csv file which I'm trying to upload to ES using Logstash. My conf file is as follows:
input {
  file {
    path => ["filename"]
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => ["name1", "name2", "name3", ...]
    separator => ","
  }
}
filter {
  mutate {
    remove_field => ["name31", "name32", "name33"]
  }
}
output {
  stdout {
    codec => rubydebug
  }
  elasticsearch {
    action => "index"
    host => "localhost"
    index => "newindex"
    template_overwrite => true
    document_type => "newdoc"
    template => "template.json"
  }
}
My template file looks like the following:
{
  "mappings": {
    "newdoc": {
      "properties": {
        "name1": {
          "type": "integer"
        },
        "name2": {
          "type": "float"
        },
        "name3": {
          "format": "dateOptionalTime",
          "type": "date"
        },
        "name4": {
          "index": "not_analyzed",
          "type": "string"
        },
        ....
      }
    }
  },
  "settings": {
    "number_of_replicas": 0,
    "number_of_shards": 1
  },
  "template": "newindex"
}
When I try to overwrite the default mapping, I get a 400 error even when I only try to write one line:
failed action with response of 400, dropping action: ["index", + ...
What could the problem be? Everything works fine if I don't overwrite the mapping, but that is not a solution for me. I'm using Logstash 1.5.1 and Elasticsearch 1.5.0 on Red Hat.
Thanks
You should POST your mapping request to Elasticsearch before loading data into Elasticsearch:
POST mapping
You don't need to create the index before running Logstash; it creates the index if it doesn't exist yet, but it's better to create your own mapping before running your conf file with Logstash. It gives you more control over your field types, etc. Here is a simple tutorial on how to import a csv to Elasticsearch using Logstash: http://freefilesdl.com/how-to-connect-logstash-to-elasticsearch-output
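For instance, a minimal sketch of creating the index and mapping up front with curl (names and types taken from the question's template, field list shortened, Elasticsearch 1.x syntax):
curl -XPUT 'http://localhost:9200/newindex' -d '
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "newdoc": {
      "properties": {
        "name1": { "type": "integer" },
        "name2": { "type": "float" },
        "name4": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'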