How to upload data to an existing index with Logstash? - elasticsearch

I am trying to insert data from a .csv file into an already existing index (one that already contains data) using Logstash.
This is my logstash_true.config file:
input {
  file {
    path => "pathToFile"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
filter {
  csv {
    separator => ","
    columns => ["title", "text", "subject", "date"]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => "127.0.0.1:9200"
    index => "news"
    document_type => "true_news"
    document_id => "%{id}"
  }
}
When uploading the data, I can see in the command line that there is nothing wrong with the file or the data, and that the document_type true_news actually exists.
But when I query the document count (a _count request), I get:
{
  "count" : 0,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  }
}
The data wasn't loaded.
UPDATE
When enabling debugging I get the following error:
Could not index event to Elasticsearch. {:status=>400, :action=>
["index", {:_id=>"%{id}", :_index=>"news", :routing=>nil,
:_type=>"true_news"}, #<LogStash::Event:0x7e10d60f>], :response=>
{"index"=>{"_index"=>"news", "_type"=>"true_news", "_id"=>"%{id}",
"status"=>400, "error"=>{"type"=>"illegal_argument_exception",
"reason"=>"Rejecting mapping update to [news] as the final mapping
would have more than 1 type: [fake_news, true_news]"}}}}

Since Elasticsearch version 6.0 you can't have multiple types in your index.
It seems that your index news already has documents or a mapping with the type fake_news, and you are trying to insert documents with the type true_news. This is not possible, which is why you are getting this error:
"type"=>"illegal_argument_exception",
"reason"=>"Rejecting mapping update to [news] as the final mapping
would have more than 1 type: [fake_news, true_news]"
Since you can have only one type, and you want to be able to distinguish between true_news and fake_news, it is better to recreate your index to use the default type, doc, for every document, and add a true_news or fake_news tag to your documents using the tags option in your inputs (add_tag does the same in a filter), as in the sketch below.
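A minimal sketch of that approach, based on the config from the question (the tag value is illustrative; with document_type omitted, the Logstash Elasticsearch output falls back to a single default type):

input {
  file {
    path => "pathToFile"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    tags => ["true_news"]        # tag every event read from this file
  }
}
filter {
  csv {
    separator => ","
    columns => ["title", "text", "subject", "date"]
  }
}
output {
  elasticsearch {
    hosts => "127.0.0.1:9200"
    index => "news"
    # no document_type here, so all documents share one type
  }
}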

Related

Logstash unable to index into elasticsearch because it can't parse date

I am getting a lot of errors like the following when I run Logstash to index documents into Elasticsearch:
[2019-11-02T18:48:13,812][WARN ][logstash.outputs.elasticsearch] Could not index event to Elasticsearch. {:status=>400, :action=>["index", {:_id=>nil, :_index=>"my-index-2019-09-28", :_type=>"doc", :_routing=>nil}, #<LogStash::Event:0x729fc561>], :response=>{"index"=>{"_index"=>"my-index-2019-09-28", "_type"=>"doc", "_id"=>"BhlNLm4Ba4O_5bsE_PxF", "status"=>400, "error"=>{"type"=>"mapper_parsing_exception", "reason"=>"failed to parse field [timestamp] of type [date] in document with id 'BhlNLm4Ba4O_5bsE_PxF'", "caused_by"=>{"type"=>"illegal_argument_exception", "reason"=>"Invalid format: \"2019-09-28 23:32:10.586\" is malformed at \" 23:32:10.586\""}}}}}
It clearly has a problem with the date format, but I don't see what that problem could be. Below are excerpts from my Logstash config and the Elasticsearch template. I include these because I'm trying to use the timestamp field to build the index name in my Logstash config, by copying timestamp into @timestamp, formatting that as YYYY-MM-dd, and using that stored metadata to build my index name.
Logstash config:
input {
  stdin { type => stdin }
}
filter {
  csv {
    separator => " " # this is a tab (\t), not just whitespace
    columns => ["timestamp", "field1", "field2", ...]
    convert => {
      "timestamp" => "date_time"
      ...
    }
  }
}
filter {
  date {
    match => ["timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS"]
    target => "@timestamp"
  }
}
filter {
  date_formatter {
    source => "@timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}
filter {
  mutate {
    remove_field => [
      "@timestamp",
      ...
    ]
  }
}
output {
  amazon_es {
    hosts => ["my-es-cluster.us-east-1.es.amazonaws.com"]
    index => "my-index-%{[@metadata][date]}"
    template => "my-config.json"
    template_name => "my-index-*"
    region => "us-east-1"
  }
}
Template:
{
  "template" : "my-index-*",
  "mappings" : {
    "doc" : {
      "dynamic" : "false",
      "properties" : {
        "timestamp" : {
          "type" : "date"
        },
        ...
      }
    }
  },
  "settings" : {
    "index" : {
      "number_of_shards" : "12",
      "number_of_replicas" : "0"
    }
  }
}
When I inspect the raw data, it looks exactly like what the error is showing me, and that appears to be well formed, so I'm not sure what my issue is.
Here is an example row; it's been redacted, but the problem field is untouched and is the first one:
2019-09-28 07:29:46.454 NA 2019-09-28 07:29:00 someApp 62847957802 62847957802
It turns out the source of the problem was the convert block: Logstash is unable to understand the time format specified in the file. To address this I renamed the original timestamp field to unformatted_timestamp and applied the date filter I was already using:
filter {
  date {
    match => ["unformatted_timestamp", "yyyy-MM-dd' 'HH:mm:ss'.'SSS"]
    target => "timestamp"
  }
}
filter {
  date_formatter {
    source => "timestamp"
    target => "[@metadata][date]"
    pattern => "YYYY-MM-dd"
  }
}
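The csv block then has to list the renamed column and no longer convert it, roughly like this (a sketch; the remaining column names are elided as in the original config):

filter {
  csv {
    separator => " " # still a tab (\t)
    columns => ["unformatted_timestamp", "field1", "field2", ...]
    # no convert on the timestamp column; the date filter above handles the parsing
  }
}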
You are parsing your lines using the csv filter and setting the separator to a space, but your date also contains a space. As a result, your first field, named timestamp, only gets the date 2019-09-28, and the time ends up in the field named field1.
You can solve this by creating a new field named date_and_time with the contents of the date and time fields, for example:
csv {
  separator => " "
  columns => ["date","time","field1","field2","field3","field4","field5","field6"]
}
mutate {
  add_field => { "date_and_time" => "%{date} %{time}" }
}
mutate {
  remove_field => ["date","time"]
}
This will create a field named date_and_time with the value 2019-09-28 07:29:46.454. You can now use the date filter to parse this value into the @timestamp field, the default for Logstash.
date {
  match => ["date_and_time", "YYYY-MM-dd HH:mm:ss.SSS"]
}
This will leave you with two fields holding the same value, date_and_time and @timestamp. Since @timestamp is the default for Logstash, I would suggest keeping it and removing the date_and_time field created before.
mutate {
  remove_field => ["date_and_time"]
}
Now you can create your date-based index using the format YYYY-MM-dd, and Logstash will extract the date from the @timestamp field. Just change the index line in your output to this one:
index => "my-index-%{+YYYY-MM-dd}"
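Putting the pieces of this answer together, the filter section would look roughly like this (a sketch; the trailing column names are placeholders matching the example row):

filter {
  csv {
    separator => " "
    columns => ["date","time","field1","field2","field3","field4","field5","field6"]
  }
  mutate {
    add_field => { "date_and_time" => "%{date} %{time}" }
  }
  mutate {
    remove_field => ["date","time"]
  }
  date {
    match => ["date_and_time", "YYYY-MM-dd HH:mm:ss.SSS"]
  }
  mutate {
    remove_field => ["date_and_time"]
  }
}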

Elasticsearch change field type from filter dissect

I use logstash-logback-encoder to send Java log files to Logstash, and then on to Elasticsearch. To parse the message in the Java logs, I use the following filter to dissect the message:
input {
  file {
    path => "/Users/MacBook-201965/Work/java/logs/oauth-logstash.log"
    start_position => "beginning"
    codec => "json"
  }
}
filter {
  if "EXECUTION_TIME" in [tags] {
    dissect {
      mapping => {
        "message" => "%{endpoint} timeMillis:[%{execution_time_millis}] data:%{additional_data}"
      }
    }
    mutate {
      convert => { "execution_time_millis" => "integer" }
    }
  }
}
output {
  elasticsearch {
    hosts => "localhost:9200"
    index => "elk-%{+YYYY}"
    document_type => "log"
  }
  stdout {
    codec => json
  }
}
It dissects the message so I can get the value of execution_time_millis. However, its data type is string. I created the index using a Kibana index pattern. How can I change the data type of execution_time_millis to long?
Here are the sample JSON messages from logback:
{
  "message":"/tests/{id} timeMillis:[142] data:2282||0:0:0:0:0:0:0:1",
  "logger_name":"com.timpamungkas.oauth.client.controller.ElkController",
  "level_value":20000,
  "endpoint":"/tests/{id}",
  "execution_time_millis":"142",
  "@version":1,
  "host":"macbook201965s-MacBook-Air.local",
  "thread_name":"http-nio-8080-exec-7",
  "path":"/Users/MacBook-201965/Work/java/logs/oauth-logstash.log",
  "@timestamp":"2018-01-04T11:20:20.100Z",
  "level":"INFO",
  "tags":[
    "EXECUTION_TIME"
  ],
  "additional_data":"2282||0:0:0:0:0:0:0:1"
}
{
  "message":"/tests/{id} timeMillis:[110] data:2280||0:0:0:0:0:0:0:1",
  "logger_name":"com.timpamungkas.oauth.client.controller.ElkController",
  "level_value":20000,
  "endpoint":"/tests/{id}",
  "execution_time_millis":"110",
  "@version":1,
  "host":"macbook201965s-MacBook-Air.local",
  "thread_name":"http-nio-8080-exec-5",
  "path":"/Users/MacBook-201965/Work/java/logs/oauth-logstash.log",
  "@timestamp":"2018-01-04T11:20:19.780Z",
  "level":"INFO",
  "tags":[
    "EXECUTION_TIME"
  ],
  "additional_data":"2280||0:0:0:0:0:0:0:1"
}
Thank you
If you have already indexed the documents, you'll have to reindex the data after changing the datatype of any field.
However, you can use the mutate filter's convert option to change the type of the millis field from string to integer (long is not supported there):
https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-convert
Also, try defining an Elasticsearch index template before creating the index if you are going to add multiple indices whose names follow some pattern. Otherwise, you can define your index mapping beforehand and then start indexing; a sketch follows below.
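A minimal sketch of such a template, taking the index pattern elk-* and the type log from the output above (the template name is illustrative; this uses the legacy _template API of pre-7.x clusters, and on 5.x the "index_patterns" key is spelled "template" instead):

PUT _template/elk_template
{
  "index_patterns": ["elk-*"],
  "mappings": {
    "log": {
      "properties": {
        "execution_time_millis": { "type": "long" }
      }
    }
  }
}

With this in place, new indices matching elk-* will map execution_time_millis as long from the start; existing indices still need a reindex, as noted above.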

Importing csv file into elastic search

I am trying to import a huge csv file into Elasticsearch, and I'm trying to use Logstash for it.
Sample csv file [Note: multiple values]:
Shop_name,Review_Title,Review_Text,,,,
Accord ,Excellent ,Nice Collection.,,,,,
Accord , Bad ,Not too comfortable,,,
Accord , Good ,excellent location and Staff,,,
Accord , Good ,Great Colletion,,,
Shopon,good, staff very good ,,,
Harrisons ,Spacious,Nice Colletion
Logstash configuration:
input {
  file {
    path => ["shopreview.csv"]
    start_position => "beginning"
  }
}
filter {
  csv {
    columns => [
      "Shop_name",
      "Review_Title",
      "Review_Text"
    ]
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    action => "index"
    hosts => ["127.0.0.1:9200"]
    index => "reviews"
    document_type => "shopreview"
    document_id => "%{Shop_name}"
    workers => 1
  }
}
Here, when I query for a review, I should get all the reviews for the particular shop.
Issue
When I query localhost:9020/review/shopreview/Accord I only get one entry, not all of the values. Is the config missing something? I am a little new to the ELK stack.

Creating/updating array of objects in elasticsearch logstash output

I am facing an issue using the Elasticsearch output with Logstash. Here is my sample event:
{
  "guid":"someguid",
  "nestedObject":{
    "field1":"val1",
    "field2":"val2"
  }
}
I expect the document with this id to already be present in Elasticsearch when this update happens.
Here is what I want to have in my Elasticsearch document after 2 upserts:
{
  "oldField":"Some old field from original document before upserts.",
  "nestedObjects":[
    {
      "field1":"val1",
      "field2":"val2"
    },
    {
      "field3":"val3",
      "field4":"val4"
    }
  ]
}
Here is my current Elasticsearch output setting:
elasticsearch {
  index => "elastictest"
  action => "update"
  document_type => "summary"
  document_id => "%{guid}"
  doc_as_upsert => true
  script_lang => "groovy"
  script_type => "inline"
  retry_on_conflict => 3
  script => "
    if (ctx._source.nestedObjects) {
      ctx._source.nestedObjects += event.nestedObject
    } else {
      ctx._source.nestedObjects = [event.nestedObject]
    }
  "
}
Here is the error I am getting:
response=>{"update"=>{"_index"=>"elastictest", "_type"=>"summary",
"_id"=>"64648dd3-c1e9-45fd-a00b-5a4332c91ee9", "status"=>400,
"error"=>{"type"=>"mapper_parsing_exception",
"reason"=>"failed to parse [event.nestedObject]",
"caused_by"=>{"type"=>"illegal_argument_exception",
"reason"=>"unknown property [field1]"}}}}
The issue turned out to be an internally generated mapping in Elasticsearch, caused by other documents with the same document_type having a conflicting type on the field nestedObject. This made Elasticsearch throw a mapper parsing exception. Fixing that mapping conflict fixed this issue.
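A quick way to spot this kind of conflict is to inspect the mapping Elasticsearch generated for the type (a sketch; the index and type names are taken from the output config above, written in the REST style of the Elasticsearch docs):

GET elastictest/_mapping/summary

If the conflicting field shows up there with an unexpected type (for example a string instead of an object), documents carrying an object for that field will be rejected until the mapping is corrected or the data is reindexed.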

How to split a large json file input into different elastic search index?

The input to Logstash is:
input {
  file {
    path => "/tmp/very-large.json"
    type => "json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}
and a sample of the json file:
{"type":"type1", "msg":"..."}
{"type":"type2", "msg":"..."}
{"type":"type1", "msg":"..."}
{"type":"type3", "msg":"..."}
Is it possible to feed them into different Elasticsearch indices, so I can process them more easily in the future?
I know that if it were possible to assign them a tag, then I could do something like:
if "type1" in [tags] {
  elasticsearch {
    hosts => ["localhost:9200"]
    action => "index"
    index => "logstash-type1%{+YYYY.MM.dd}"
    flush_size => 50
  }
}
How can I do a similar thing by looking at a specific json field value, e.g. type in my example above?
Even simpler, just use the type field to build the index name like this:
elasticsearch {
  hosts => ["localhost:9200"]
  action => "index"
  index => "logstash-%{type}%{+YYYY.MM.dd}"
  flush_size => 50
}
You can compare on any field. You'll first have to parse your json with the json filter or codec (see the sketch at the end of this answer).
Then you'll have a type field to work on, like this:
if [type] == "type1" {
  elasticsearch {
    ...
    index => "logstash-type1%{+YYYY.MM.dd}"
  }
} else if [type] == "type2" {
  elasticsearch {
    ...
    index => "logstash-type2%{+YYYY.MM.dd}"
  }
} ...
Or like in Val's answer:
elasticsearch {
  hosts => ["localhost:9200"]
  action => "index"
  index => "logstash-%{type}%{+YYYY.MM.dd}"
  flush_size => 50
}
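For the json parsing mentioned above, a minimal sketch (note that the file input's type => "json" option only labels events, it does not parse them, so you need either a json codec on the input or a json filter):

input {
  file {
    path => "/tmp/very-large.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => "json"   # each line is parsed into fields, so [type] becomes available
  }
}
# alternatively, read plain lines and parse in the filter section:
# filter {
#   json {
#     source => "message"
#   }
# }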
