upload csv with logstash to elasticsearch with new mappings

I have a CSV file which I'm trying to upload to ES using Logstash. My conf file is as follows:
input {
file {
path => ["filename"]
start_position => "beginning"
}
}
filter {
csv {
columns => ["name1", "name2", "name3", ...]
separator => ","
}
}
filter {
mutate {
remove_field => ["name31", "name32", "name33"]
}
}
output {
stdout{
codec => rubydebug
}
elasticsearch {
action => "index"
host => "localhost"
index => "newindex"
template_overwrite => true
document_type => "newdoc"
template => "template.json"
}
}
My template file looks like the following:
{
"mappings": {
"newdoc": {
"properties": {
"name1": {
"type": "integer"
},
"name2": {
"type": "float"
},
"name3": {
"format": "dateOptionalTime",
"type": "date"
},
"name4": {
"index": "not_analyzed",
"type": "string"
},
....
}
}
},
"settings": {
"number_of_replicas": 0,
"number_of_shards": 1
},
"template": "newindex"
}
When I try to overwrite the default mapping, I get a 400 error even when I only try to write one line:
failed action with response of 400, dropping action: ["index", + ...
What can be the problem? Everything works fine if I don't overwrite the mapping but that is not a solution for me. I'm using Logstash 1.5.1 and Elasticsearch 1.5.0 on Red Hat.
Thanks

You should POST your mapping to Elasticsearch before loading the data into Elasticsearch, e.g. using the put mapping API.
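For instance, a minimal sketch of pre-creating the index together with its mapping before running Logstash (this reuses the newindex/newdoc names and a few fields from the question; adjust the properties to match your real template):
curl -XPUT "localhost:9200/newindex" -d '
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 0 },
  "mappings": {
    "newdoc": {
      "properties": {
        "name1": { "type": "integer" },
        "name2": { "type": "float" },
        "name3": { "type": "date", "format": "dateOptionalTime" },
        "name4": { "type": "string", "index": "not_analyzed" }
      }
    }
  }
}'
Once the index exists with the mapping you want, the template / template_overwrite options in the elasticsearch output shouldn't be needed for that index.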

You don't need to create the index before running Logstash; it will create the index if you haven't yet, but it's better to create your own mapping before running your conf file with Logstash. That gives you more control over your field types, etc. Here is a simple tutorial on how to import a CSV into Elasticsearch using Logstash: http://freefilesdl.com/how-to-connect-logstash-to-elasticsearch-output

Related

Replica and shard settings not applied in elasticsearch template

I've added a template like this:
curl -X PUT "e.f.g.h:9200/_template/impression-template" -H 'Content-Type: application/json' -d'
{
"index_patterns": ["impression-%{+YYYY.MM.dd}"],
"settings": {
"number_of_shards": 2,
"number_of_replicas": 2
},
"mappings": {
"_doc": {
"_source": {
"enabled": false
},
"dynamic": false,
"properties": {
"message": {
"type": "object",
"properties": {
...
And I have a Logstash instance that reads events from Kafka and writes them to ES. Here is my Logstash config:
input {
kafka {
topics => ["impression"]
bootstrap_servers => "a.b.c.d:9092"
}
}
filter {
json {
source => "message"
target => "message"
}
}
output {
elasticsearch {
hosts => ["e.f.g.h:9200"]
index => "impression-%{+YYYY.MM.dd}"
template_name => "impression-template"
}
}
But each day I get an index with 5 shards and 1 replica (which is the default ES config). How can I fix that so I get 2 shards and 2 replicas?
I'm not sure you can set the index pattern to my_index-%{+YYYY.MM.dd}, because that is a Logstash sprintf expression, not something Elasticsearch recognizes; when an index like my_index-2019.03.10 is created it ends up with an empty mapping because it doesn't match the template. I had the same issue, and my workaround was to set the index pattern to my_index-* and add a year suffix to the indices, so they look like my_index-2017, my_index-2018...
{
"my_index_template" : {
"order" : 0,
"index_patterns" : [
"my_index-*"
],
"settings" : {
"index" : {
"number_of_shards" : "5",
"number_of_replicas" : "1"
}
},...
I took the year part from the timestamp field (YYYY-MM-dd) and appended it to the end of the index name in Logstash:
filter {
  grok {
    match => [
      "timestamp", "(?<index_year>%{YEAR})"
    ]
  }
  mutate {
    add_field => {
      "[@metadata][index_year]" => "%{index_year}"
    }
  }
  mutate {
    remove_field => [ "index_year", "@version" ]
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "my_index-%{[@metadata][index_year]}"
    document_id => "%{some_field}"
  }
}
After Logstash completed, I ended up with my_index-2017, my_index-2018 and my_index-2019 indices with 5 shards, 1 replica, and the correct mapping as predefined in my template.
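To double-check that the template actually applied, inspecting one of the generated indices is enough (a quick sketch; my_index-2018 is just one of the names from above):
GET my_index-2018/_settings
GET my_index-2018/_mapping
The settings should report the shard/replica counts from the template, and the mapping should match what was predefined.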

logstash output to elasticsearch index and mapping

I'm trying to have logstash output to elasticsearch but I'm not sure how to use the mapping I defined in elasticsearch...
In Kibana, I did this:
Created an index and mapping like this:
PUT /kafkajmx2
{
"mappings": {
"kafka_mbeans": {
"properties": {
"#timestamp": {
"type": "date"
},
"#version": {
"type": "integer"
},
"host": {
"type": "keyword"
},
"metric_path": {
"type": "text"
},
"type": {
"type": "keyword"
},
"path": {
"type": "text"
},
"metric_value_string": {
"type": "keyword"
},
"metric_value_number": {
"type": "float"
}
}
}
}
}
I can write data to it like this:
POST /kafkajmx2/kafka_mbeans
{
"metric_value_number":159.03478490788203,
"path":"/home/usrxxx/logstash-5.2.0/bin/jmxconf",
"#timestamp":"2017-02-12T23:08:40.934Z",
"#version":"1","host":"localhost",
"metric_path":"node1.kafka.server:type=BrokerTopicMetrics,name=TotalFetchRequestsPerSec.FifteenMinuteRate",
"type":null
}
Now my Logstash config looks like this:
input {
kafka {
kafka details here
}
}
output {
elasticsearch {
hosts => "http://elasticsearch:9050"
index => "kafkajmx2"
}
}
It writes to the kafkajmx2 index but doesn't use the mapping. When I query it like this in Kibana:
get /kafkajmx2/kafka_mbeans/_search?q=*
{
}
I get this back:
{
"_index": "kafkajmx2",
"_type": "logs",
"_id": "AVo34xF_j-lM6k7wBavd",
"_score": 1,
"_source": {
"#timestamp": "2017-02-13T14:31:53.337Z",
"#version": "1",
"message": """
{"metric_value_number":0,"path":"/home/usrxxx/logstash-5.2.0/bin/jmxconf","#timestamp":"2017-02-13T14:31:52.654Z","#version":"1","host":"localhost","metric_path":"node1.kafka.server:type=SessionExpireListener,name=ZooKeeperAuthFailuresPerSec.Count","type":null}
"""
}
}
How do I tell it to use the kafka_mbeans mapping in the Logstash output?
-----EDIT-----
I tried my output like this but still get the same results:
output {
elasticsearch {
hosts => "http://10.204.93.209:9050"
index => "kafkajmx2"
template_name => "kafka_mbeans"
codec => plain {
format => "%{message}"
}
}
}
The data in Elasticsearch should look like this:
{
"#timestamp": "2017-02-13T14:31:52.654Z",
"#version": "1",
"host": "localhost",
"metric_path": "node1.kafka.server:type=SessionExpireListener,name=ZooKeeperAuthFailuresPerSec.Count",
"metric_value_number": 0,
"path": "/home/usrxxx/logstash-5.2.0/bin/jmxconf",
"type": null
}
--------EDIT 2--------------
I at least got the message to parse into JSON by adding a filter like this:
input {
kafka {
...kafka details....
}
}
filter {
json {
source => "message"
remove_field => ["message"]
}
}
output {
elasticsearch {
hosts => "http://node1:9050"
index => "kafkajmx2"
template_name => "kafka_mbeans"
}
}
It still doesn't use the template, but this at least parses the JSON correctly... so now I get this:
{
"_index": "kafkajmx2",
"_type": "logs",
"_id": "AVo4a2Hzj-lM6k7wBcMS",
"_score": 1,
"_source": {
"metric_value_number": 0.9967205071482902,
"path": "/home/usrxxx/logstash-5.2.0/bin/jmxconf",
"#timestamp": "2017-02-13T16:54:16.701Z",
"#version": "1",
"host": "localhost",
"metric_path": "kafka1.kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent.Value",
"type": null
}
}
What you need to change is very simple. First, use the json codec in your kafka input. There's no need for the json filter; you can remove it.
kafka {
...kafka details....
codec => "json"
}
Then, in your elasticsearch output, you're missing the mapping type (the document_type parameter below). That's important; otherwise it defaults to logs (as you can see), which doesn't match your kafka_mbeans mapping type. Moreover, you don't really need a template since your index already exists. Make the following modification:
elasticsearch {
hosts => "http://node1:9050"
index => "kafkajmx2"
document_type => "kafka_mbeans"
}
This is defined with the template_name parameter on the elasticsearch output.
elasticsearch {
hosts => "http://elasticsearch:9050"
index => "kafkajmx2"
template_name => "kafka_mbeans"
}
One warning, though. If you want to start creating time-based indexes, such as one index per week, you will have to take a few more steps to ensure your mapping stays with each. You have a couple of options there:
Create an Elasticsearch template, and define it to apply to indexes using a glob such as kafkajmx2-* (see the sketch after this list).
Use the template parameter on the output, which points to a JSON file defining the mapping to be used with all indexes created through that output.
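For the first option, a rough sketch of such a template, using the kafka_mbeans fields from the question (abbreviated; note that on Elasticsearch 6.x+ the "template" key is called "index_patterns"):
PUT _template/kafkajmx2
{
  "template": "kafkajmx2-*",
  "mappings": {
    "kafka_mbeans": {
      "properties": {
        "host": { "type": "keyword" },
        "metric_path": { "type": "text" },
        "metric_value_string": { "type": "keyword" },
        "metric_value_number": { "type": "float" }
      }
    }
  }
}
With that in place, any index whose name matches kafkajmx2-* (for example kafkajmx2-2017.02) picks up the mapping automatically.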

How to use Mutate/Convert in logstash config file for nested fields in Json file

I have the below JSON as input to Logstash.
{
"totalTurnoverUSD":11111.456,
"children":[
{
"totalTurnoverUSD":11100.456,
"children":[
{
"totalTurnoverUSD":11.00,
"children":[
]
}
]
}
]
}
And I'm using the below config file to output it to Elasticsearch and stdout.
input {
file {
type => $type
path => $filePathofJsonFile
codec => "json"
start_position => "beginning"
sincedb_path => "/dev/null"
ignore_older => 0
close_older => 2
max_open_files => 10
}
}
filter {
mutate {
convert => { "totalTurnoverUSD" => "string" }
}
}
output {
elasticsearch{
hosts => $elasticHost
index =>"123"
}
stdout {
codec => rubydebug
}
}
But I'm getting the below error message:
"error"=>{"type"=>"illegal_argument_exception", "reason"=>"mapper [children.totalTurnoverUSD] of different type, current_type [long], merged_type [double]"}}}, :level=>:warn}
because I am not converting the totalTurnoverUSD field in the nested children documents of the JSON input file.
So, is there any way to access the nested fields in the JSON document so I can mutate/convert their datatype to string?
One way to solve this is to let Logstash send whatever numeric type of totalTurnoverUSD it comes up with, but then to use a dynamic template in Elasticsearch.
You can modify your index like this:
PUT my_index
{
"mappings": {
"my_type": {
"dynamic_templates": [
{
"full_name": {
"path_match": "*.totalTurnoverUSD",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
}
What this achieves is that whenever any document is indexed into that index, any field named totalTurnoverUSD at any level of the document will get the type keyword.
You might need to delete your index first and recreate it from scratch, but try it out without deleting it first.
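One thing to check before deleting anything: dynamic templates only affect fields that are not yet mapped, so if children.totalTurnoverUSD already exists in the index mapping with a numeric type (the long vs. double conflict from the error), the template won't retroactively change it. A quick way to see the current state (my_index here stands for whatever index name you actually use, e.g. the "123" index from the config):
GET my_index/_mapping
If the field is already mapped as long or double, that's when the delete-and-recreate route becomes necessary.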
UPDATE
If you want to apply this to all your indices, you can create an index template like this:
PUT _template/all_indices
{
"template": "*",
"mappings": {
"_default_": {
"dynamic_templates": [
{
"full_name": {
"path_match": "*.totalTurnoverUSD",
"mapping": {
"type": "keyword"
}
}
}
]
}
}
}
As a result, all mapping types in all indices will get the dynamic template for totalTurnoverUSD.
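A quick way to sanity-check the _default_ template is to index a document containing a nested totalTurnoverUSD into a brand-new index and look at the resulting mapping (test_turnover and doc are hypothetical names used only for this check):
POST test_turnover/doc
{
  "children": [ { "totalTurnoverUSD": 11.0 } ]
}
GET test_turnover/_mapping
The children.totalTurnoverUSD field should come back mapped as keyword rather than as a numeric type.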

ElasticSearch 5.0.0 - error about object name is already in use

I am learning Elasticsearch and have hit a block. I am trying to use Logstash to load a simple CSV into Elasticsearch. This is the data; each row is a postcode, longitude, latitude:
ZE1 0BH,-1.136758103355,60.150855671143
ZE1 0NW,-1.15526666950369,60.1532197533966
I am using the following logstash conf file to filter the CSV to create a "location" field
input {
file {
path => "postcodes.csv"
start_position => "beginning"
sincedb_path => "/dev/null"
}
}
filter {
csv {
columns => ["postcode", "lat", "lon"]
separator => ","
}
mutate { convert => {"lat" => "float"} }
mutate { convert => {"lon" => "float"} }
mutate { rename => {"lat" => "[location][lat]"} }
mutate { rename => {"lon" => "[location][lon]"} }
mutate { convert => { "[location]" => "float" } }
}
output {
elasticsearch {
action => "index"
hosts => "localhost"
index => "postcodes"
}
stdout { codec => rubydebug }
}
And I have added the mapping to Elasticsearch using the console in Kibana:
PUT postcodes
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"feature": {
"_all": { "enabled": true },
"properties": {
"postcode": {"type": "text"},
"location": {"type": "geo_point"}
}
}
}
}
I check the mappings for the index using:
GET postcodes/_mapping
{
"postcodes": {
"mappings": {
"feature": {
"_all": {
"enabled": true
},
"properties": {
"location": {
"type": "geo_point"
},
"postcode": {
"type": "text"
}
}
}
}
}
}
So this all seems to be correct, having looked at the documentation and the other questions posted.
However, when I run
bin/logstash -f postcodes.conf
I get an error:
[location] is defined as an object in mapping [logs] but this name is already used for a field in other types
I have tried a number of alternative methods:
Deleted the index, created a template.json, and changed my conf file to have the extra settings:
manage_template => true
template => "postcode_template.json"
template_name =>"open_names"
template_overwrite => true
and this gets the same error.
I have managed to get the data loaded by not supplying a template; however, the data never gets loaded as a geo_point, so you cannot use the Kibana Tile Map to visualise the data.
Can anyone explain why I am receiving that error and what method I should use?
Your problem is that you don't have document_type => feature on your elasticsearch output. Without that, it's going to index the documents under the type logs, which is why you are getting this conflict.
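So the output block would look roughly like this (a sketch; everything else in the conf stays as it is):
output {
  elasticsearch {
    action => "index"
    hosts => "localhost"
    index => "postcodes"
    document_type => "feature"
  }
  stdout { codec => rubydebug }
}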

ELK - Kibana doesn't recognize geo_point field

I'm trying to create a Tile map on Kibana, with GEO location points.
For some reason, When I'm trying to create the map, I get the following message on Kibana:
No Compatible Fields: The "logs" index pattern does not contain any of
the following field types: geo_point
My settings:
Logstash (version 2.3.1):
filter {
grok {
match => {
"message" => "MY PATTERN"
}
}
geoip {
source => "ip"
target => "geoip"
add_field => [ "location", "%{[geoip][latitude]}, %{[geoip][longitude]}" ] #added this extra field in case the nested field is the problem
}
}
output {
stdout { codec => rubydebug }
elasticsearch {
hosts => ["localhost:9200"]
index => "logs"
}
}
When log input arrives, I can see it being parsed as it should, and I do get the GeoIP data for a given IP:
"geoip" => {
"ip" => "XXX.XXX.XXX.XXX",
"country_code2" => "XX",
"country_code3" => "XXX",
"country_name" => "XXXXXX",
"continent_code" => "XX",
"region_name" => "XX",
"city_name" => "XXXXX",
"latitude" => XX.0667,
"longitude" => XX.766699999999986,
"timezone" => "XXXXXX",
"real_region_name" => "XXXXXX",
"location" => [
[0] XX.766699999999986,
[1] XX.0667
]
},
"location" => "XX.0667, XX.766699999999986"
ElasticSearch (version 2.3.1):
GET /logs/_mapping returns:
{
"logs": {
"mappings": {
"logs": {
"properties": {
"#timestamp": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
.
.
.
"geoip": {
"properties": {
.
.
.
"latitude": {
"type": "double"
},
"location": {
"type": "geo_point"
},
"longitude": {
"type": "double"
}
}
},
"location": {
"type": "geo_point"
}
}
}
}
}
}
Kibana (version 4.5.0):
I do see all the data and everything seems to be fine.
Just when I go to "Visualize" -> "Tile map" -> "From a new search" -> "Geo Coordinates", I get this error message:
No Compatible Fields: The "logs" index pattern does not contain any of the following field types: geo_point
Even though I see in the Elasticsearch mapping that the location type is geo_point.
What am I missing?
Found the issue!
I had called the index "logs". I changed the index name to "logstash-logs" (it needs the logstash-* prefix) and everything started to function!
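In Logstash terms that was just a change to the index option of the elasticsearch output (a sketch of the renamed setup; nothing else needs to change):
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-logs"
  }
}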
