Update multiple records in Elasticsearch using Logstash

Hi guys, I have an issue with updating multiple records in Elasticsearch using Logstash.
My Logstash configuration is below:
output {
  elasticsearch {
    hosts => "******"
    user => "xxxxx"
    password => "yyyyyy"
    index => "index_name"
    document_type => "doc_type"
    action => "update"
    script_lang => "painless"
    script_type => "inline"
    document_id => "%{Id}"
    script => 'ctx._source.Tags = params.event.get("Tags");'
  }
}
The output in my Logstash dump folder looks like this:
{"index_name":"feed_name","doc_type":"doc_type","Id":["b504d808-f82d-4eaa-b192-446ec0ba487f", "1bcbc54f-fa7a-4079-90e7-71da527f56a5"],"es_action":"update","Tags": ["tag1","tag2"]}
My biggest issue is that I cannot update those two records at once; instead, I have to create two records, each with a different ID.
Is there a way to solve this by writing a query in my output configuration?
In SQL it would look something like this:
UPDATE Table
SET Tags = <new tags>
WHERE ID IN (guid1, guid2)
I know that in this case I could emit two records from Logstash and the problem would be solved, but I also need to solve a second issue: replacing tag1 with newTag in all records that have it.

Have you considered using the split filter to clone the event into multiple events with one Id each? That filter seems like it could help you.
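In Logstash itself this would be a `split { field => "Id" }` block in the filter section. The Python sketch below only emulates that behaviour on the event from the question (it is not Logstash code): each element of the `Id` array becomes its own copy of the event, so `document_id => "%{Id}"` resolves to a single GUID per update. The helper name `split_event` is hypothetical.

```python
import copy


def split_event(event, field):
    """Emulate splitting on an array field: one deep copy of the event per element."""
    values = event.get(field)
    if not isinstance(values, list):
        # Nothing to split: pass the event through unchanged.
        return [event]
    events = []
    for value in values:
        clone = copy.deepcopy(event)
        clone[field] = value  # the array is replaced by a single element
        events.append(clone)
    return events


event = {
    "index_name": "feed_name",
    "doc_type": "doc_type",
    "Id": ["b504d808-f82d-4eaa-b192-446ec0ba487f",
           "1bcbc54f-fa7a-4079-90e7-71da527f56a5"],
    "es_action": "update",
    "Tags": ["tag1", "tag2"],
}

for e in split_event(event, "Id"):
    # Each event now carries one scalar Id, usable as document_id => "%{Id}".
    print(e["Id"], e["Tags"])
```

The original event is left untouched because every clone is a deep copy.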

Related

'tags' in Logstash modifying the 'tags' column from an SQL query

I have a Logstash setup that fetches data from Postgres. The problem is that I have a column called "tags" in SQL, which is an array of strings, but during ingestion Logstash appends the "Logstash tag" to the tags array from my column. Is there any way to override that?
jdbc {
  tags => "color_list"
  jdbc_connection_string => "jdbc:postgresql://${DB_HOST}:${DB_PORT}/${DB_NAME}"
  jdbc_user => "${DB_USER}"
  jdbc_password => "${DB_PASSWORD}"
  schedule => "* * * * *"
  jdbc_driver_class => "org.postgresql.Driver"
  statement => "SELECT tags
    from color_table;"
}
In my table, the tags column is empty, so I am expecting an empty array, but I am receiving [color_list] instead of []. Is there any way to override that?
Logstash manages "tags" on each event. It is a kind of metadata that you can manipulate in your pipeline: all filters have add_tag and remove_tag options, for example. Some filters automatically add tags on failure (for example, _grokparsefailure if the grok pattern doesn't match the event's contents). In your case, the tags => "color_list" option on the jdbc input is what adds color_list.
I would advise against using the tags field for anything else.
I suggest you rename your field from the DB instead. How about this?
statement => "SELECT tags as db_tags
from color_table;"
Then in your document you can process the [db_tags] field as you expect, and leave the [tags] field to Logstash.

Elasticsearch: maintaining a unique _id across the indices behind an alias

We have ES data with several indices that belong to the same alias. One of them is the write index.
How can we keep document _id values unique across the indices that belong to the same alias?
Right now we have duplicated _id values in our alias: each index has one record with the same ID. We only want the latest record for that _id; the newer one should overwrite the older.
If I understand the problem correctly, you can get uniqueness by using a fingerprint value as the _id via Logstash (assuming Logstash is being used).
You can have something like the following in your Logstash filter:
fingerprint {
  source => ["session_id"]
  method => "SHA1"
}
The value in the fingerprint field can then be used as the document ID, so that an incoming event updates an already existing document instead of creating a new one.
Below is an example of the output section in Logstash:
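As a rough illustration of why this works: a fingerprint is a deterministic hash, so the same `session_id` always yields the same ID. The Python sketch below computes a plain SHA-1 hex digest with `hashlib`; it assumes no `key` is configured on the fingerprint filter (with a key the plugin computes an HMAC instead), and the plugin's exact output may differ. `make_doc_id` is a hypothetical helper.

```python
import hashlib


def make_doc_id(session_id: str) -> str:
    """Deterministic document ID: hex SHA-1 of the session_id value (no HMAC key)."""
    return hashlib.sha1(session_id.encode("utf-8")).hexdigest()


first = make_doc_id("session-42")
second = make_doc_id("session-42")
other = make_doc_id("session-43")

print(first == second)  # True: same input, same ID
print(first == other)   # False: different input, different ID
```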
elasticsearch {
  hosts => ["http://elasticsearch:9200"]
  index => "indexname"
  action => "update"
  document_id => "%{fingerprint}"
  doc_as_upsert => true
}
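The `update` action with `doc_as_upsert => true` behaves like "insert the document if it is missing, otherwise merge the new fields into it". The Python sketch below models that with a plain dict standing in for a single index; it is a simplification (Elasticsearch merges nested objects recursively, while this uses a shallow `dict.update`), and `upsert` and the field names are hypothetical.

```python
def upsert(index: dict, doc_id: str, doc: dict) -> None:
    """Emulate update + doc_as_upsert within one index (shallow merge)."""
    if doc_id in index:
        index[doc_id].update(doc)  # existing document: newer fields overwrite older ones
    else:
        index[doc_id] = dict(doc)  # missing document: the partial doc becomes the document


index = {}
upsert(index, "abc", {"status": "old", "count": 1})
upsert(index, "abc", {"status": "new"})

print(len(index))    # 1
print(index["abc"])  # {'status': 'new', 'count': 1}
```

Because the dict stands in for one index, this also shows the limit of the approach: the ID is only deduplicated inside the index it is written to.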

Create index in Kibana without using Kibana

I'm very new to the Elasticsearch/Kibana/Logstash stack and can't seem to find a solution for this one.
I'm trying to create an index that I will see in Kibana without having to use the POST command in the Dev Tools section.
I have set up test.conf:
input {
  file {
    path => "/home/user/logs/server.log"
    type => "test-type"
    start_position => "beginning"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "new-index"
  }
}
and then ran
bin/logstash -f test.conf from the Logstash directory.
What I get is that I can't find new-index in Kibana (in the index patterns section). When I query Elasticsearch at http://localhost:9200/new-index/ it returns an error, and when I go to http://localhost:9600/ (the port Logstash is showing) there don't seem to be any errors.
Thanks a lot for the help!!
You won't be able to find the index you've created using Logstash in Kibana unless you manually create an index pattern for it in the Management section of Kibana.
Make sure the index pattern matches the name of the index you created using Logstash. Have a look at the docs, which state:
When you define an index pattern, indices that match that pattern must exist in Elasticsearch. Those indices must contain data.
In other words, the index must exist (and contain data) before you can create an index pattern for it in Kibana. Hope it helps!
I actually succeeded in creating the index without first creating it in Kibana.
I used the following config file:
input {
  file {
    path => "/home/hadar/tmp/logs/server.log"
    type => "test-type"
    id => "NEWTRY"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day} %{HOUR:hour}:%{MINUTE:minute}:%{SECOND:second} - %{LOGLEVEL:level} - %{WORD:scriptName}.%{WORD:scriptEND} - " }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "new-index"
    codec => line { format => "%{year}-%{month}-%{day} %{hour}:%{minute}:%{second} - %{level} - %{scriptName}.%{scriptEND} - \"%{message}\"" }
  }
}
I made sure that the index wasn't already in Kibana (I tried other index names too, just to be sure), and eventually I did see the index with the log's info in both Kibana (after adding it in the index pattern section) and Elasticsearch at http://localhost:9200/new-index.
The only thing I had to do was erase the .sincedb_XXX files that are created under data/plugins/inputs/file/ after every Logstash run. The file input uses them to remember how far it has already read each file, so on later runs it did not re-read server.log from the beginning.
OR
The other solution (for test environments only) is to add sincedb_path => "/dev/null" to the file input plugin, which tells it not to persist a .sincedb_XXX file.
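To see why deleting the sincedb file (or pointing it at /dev/null) made the data show up: the file input remembers how far it has read each file and resumes from there on the next run. The Python sketch below models that idea with an in-memory offset table; it is not the real sincedb format, and `read_new_lines` is a hypothetical helper.

```python
import os
import tempfile


def read_new_lines(path: str, offsets: dict) -> list:
    """Return lines added since the last recorded offset, then record the new offset."""
    start = offsets.get(path, 0)
    with open(path, "rb") as f:
        f.seek(start)
        data = f.read()
        offsets[path] = f.tell()  # the byte offset, one of the things a sincedb entry records
    return data.decode("utf-8").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "server.log")
    with open(log, "w", encoding="utf-8") as f:
        f.write("line 1\nline 2\n")

    sincedb = {}                            # offsets persisted between runs
    first_run = read_new_lines(log, sincedb)
    second_run = read_new_lines(log, sincedb)
    fresh_run = read_new_lines(log, {})     # like deleting the sincedb / using /dev/null

print(first_run)   # ['line 1', 'line 2']
print(second_run)  # []
print(fresh_run)   # ['line 1', 'line 2']
```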
You can create an index directly in Elasticsearch using the create index API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html), and those indices can then be used in Kibana.
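A create-index call is an HTTP PUT to `/<index-name>`. The Python sketch below only builds such a request with the standard library; it assumes an unsecured cluster at `http://localhost:9200`, and the `number_of_shards` setting is an arbitrary illustrative choice. `create_index_request` is a hypothetical helper.

```python
import json
import urllib.request


def create_index_request(base_url: str, index: str, shards: int = 1) -> urllib.request.Request:
    """Build (but do not send) a PUT /<index> request for the create index API."""
    body = json.dumps({"settings": {"number_of_shards": shards}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = create_index_request("http://localhost:9200", "new-index")
print(req.get_method(), req.full_url)  # PUT http://localhost:9200/new-index
# urllib.request.urlopen(req) would send it to a running cluster.
```

Per the documentation quoted earlier in this thread, the index must also contain data before Kibana will accept an index pattern for it.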

Logstash doc_as_upsert cross index in Elasticsearch to eliminate duplicates

I have a logstash configuration that uses the following in the output block in an attempt to mitigate duplicates.
output {
  if [type] == "usage" {
    elasticsearch {
      hosts => ["elastic4:9204"]
      index => "usage-%{+YYYY-MM-dd-HH}"
      document_id => "%{[@metadata][fingerprint]}"
      action => "update"
      doc_as_upsert => true
    }
  }
}
The fingerprint is calculated from a SHA1 hash of two unique fields.
This works when Logstash sees the same doc in the same index, but since the command that generates the input data doesn't have a reliable rate at which different documents appear, Logstash sometimes inserts duplicate docs into a different date-stamped index.
For example, the command that Logstash runs to get the input generally returns the last two hours of data. However, since I can't definitively tell when a doc will appear or disappear, I run the command every fifteen minutes.
This is fine when the duplicates occur within the same hour. However, when the hour or day date stamp rolls over, and the document still appears, elastic/logstash thinks it's a new doc.
Is there a way to make the upsert work across indices? These would all be the same type of doc; they would simply apply to every index that matches "usage-*".
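The rollover problem is easy to reproduce: the index name is derived from the event timestamp, so the same document seen on two runs on either side of an hour boundary is routed to two different indices, and both upserts succeed. The Python sketch below mimics the `usage-%{+YYYY-MM-dd-HH}` pattern with `strftime` (Logstash formats the event's @timestamp in UTC); `index_for` and the timestamps are hypothetical.

```python
from datetime import datetime, timezone


def index_for(ts: datetime) -> str:
    """Approximate Logstash's usage-%{+YYYY-MM-dd-HH} index naming (UTC)."""
    return ts.astimezone(timezone.utc).strftime("usage-%Y-%m-%d-%H")


seen_first = datetime(2020, 1, 1, 10, 55, tzinfo=timezone.utc)
seen_again = datetime(2020, 1, 1, 11, 10, tzinfo=timezone.utc)  # next 15-minute run

print(index_for(seen_first))  # usage-2020-01-01-10
print(index_for(seen_again))  # usage-2020-01-01-11
```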
A new index is an entirely new keyspace and there's no way to tell ES to not index two documents with the same ID in two different indices.
However, you could prevent this by adding an elasticsearch filter to your pipeline which would look up the document in all indices and if it finds one, it could drop the event.
Something like this would do (note that usages would be an alias spanning all usage-* indices):
filter {
  elasticsearch {
    hosts => ["elastic4:9204"]
    index => "usages"
    query => "_id:%{[@metadata][fingerprint]}"
    fields => {"_id" => "other_id"}
  }
  # if the document was found, drop this one
  if [other_id] {
    drop {}
  }
}
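The lookup-then-drop logic can be modelled outside Logstash as: check every index behind the alias for the ID, and only write if it is absent. The Python sketch below uses a dict of sets as a stand-in for the `usage-*` indices; `should_index` and the sample data are hypothetical. Note that the lookup and the write are separate steps, so two events with the same fingerprint processed close together could still both get through, and a document indexed moments ago may not be searchable until the next refresh.

```python
from fnmatch import fnmatch


def should_index(doc_id: str, indices: dict, pattern: str = "usage-*") -> bool:
    """Return False if any index matching the pattern already holds doc_id."""
    return not any(
        doc_id in ids for name, ids in indices.items() if fnmatch(name, pattern)
    )


indices = {
    "usage-2020-01-01-10": {"fp-1", "fp-2"},
    "usage-2020-01-01-11": {"fp-3"},
    "other-index": {"fp-9"},
}

print(should_index("fp-1", indices))  # False: already in an older hourly index, drop it
print(should_index("fp-4", indices))  # True: new fingerprint, let it through
print(should_index("fp-9", indices))  # True: only present outside usage-*
```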

Copy an index from one machine to another in Elasticsearch

I have some indices on one of my machines. I need to copy them to another machine. How can I do that in Elasticsearch?
I did find some good documentation here, but since I'm a newbie to the Elasticsearch ecosystem and I'm only toying with small indices, I'd prefer a plugin or approach that is less time-consuming.
I would use Logstash with an elasticsearch input plugin and an elasticsearch output plugin.
After installing Logstash, you can create a configuration file copy.conf that looks like this:
input {
  elasticsearch {
    hosts => "localhost:9200"          # source ES host
    index => "source_index"
  }
}
filter {
  mutate {
    remove_field => [ "@version", "@timestamp" ]   # remove fields added by Logstash
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]        # target ES host
    manage_template => false
    index => "target_index"
    document_id => "%{id}"             # name of your ID field
  }
}
After setting the correct values (source/target host and source/target index), you can run it with bin/logstash -f copy.conf.
I can see three options here:
Snapshot/Restore - You can move your data across geographical locations.
Logstash reindex - As pointed out by Val
Stream2ES - This is a simpler solution
You can also use the snapshot and restore feature: take a snapshot (backup) of an index and then restore it somewhere else.
Have a look at:
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
