Add documents to Elasticsearch if they do not exist - elasticsearch

I transferred some data from a log generated every day to Elasticsearch using Logstash, and my Logstash output section looks like:
I keep the same id (id_ot) in both my log file and Elasticsearch. What I would like to do is: if an incoming id (id_ot) already exists in Elasticsearch, do not insert it. How can I do that in Logstash?
Any help would be really appreciated!

You simply need to add the action => "create" parameter; if a document already exists with that ID, it is not indexed:
output {
  elasticsearch {
    ...
    action => "create"
  }
}
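Note that this only prevents duplicates if the Elasticsearch _id is derived from your own id_ot field; otherwise each event gets a random _id and create always succeeds. A minimal sketch, assuming the field is called id_ot and the index name is a placeholder:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mylogs"
    # use the log's own id as the Elasticsearch _id
    document_id => "%{id_ot}"
    # "create" fails (and the event is skipped) if that _id already exists
    action => "create"
  }
}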

What you are asking for is an upsert: create the document if it doesn't exist, or update an existing one.
Elasticsearch supports this via the doc_as_upsert option:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#doc_as_upsert
On the right-hand side of the linked page you can choose the Elasticsearch version that matches yours.
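In Logstash this maps onto the elasticsearch output's update action; a rough sketch, assuming the document ID comes from the id_ot field and the index name is a placeholder:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mylogs"
    document_id => "%{id_ot}"
    # update the existing document, or insert it if it is missing
    action => "update"
    doc_as_upsert => true
  }
}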

Related

nifi: How to specify dynamic index name when sending data to elasticsearch

I am new to Apache NiFi.
I am trying to put data into Elasticsearch using NiFi.
I want to specify an index name by combining a specific string and the value of a timestamp field converted into a date format.
I was able to create the desired shape with the expression below, but it uses the current time; I failed to build the index name from the value of the timestamp field in the content.
${now():format('yyyy-MM-dd')}
example JSON data:
{
  "timestamp": 1625579799000,
  "data1": "abcd",
  "date2": 12345
}
I would like to get the following result:
index: "myindex-2021.07.06"
What should I do?
I know that if you use the PutElasticSearch processor, you can provide it with a specific index name to use. As long as the index name meets the proper Elasticsearch naming rules, and automatic index creation is enabled in Elasticsearch, the new index with the desired name will be created when the data is sent. This has worked for me. Double-check the Elasticsearch index naming rules, for example at https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-indexing.html
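If the index name should come from the timestamp field inside the JSON rather than from now(), one common approach (a sketch, not tested against your flow) is to first copy that field into a flow-file attribute, for example with an EvaluateJsonPath processor writing $.timestamp to an attribute named timestamp, and then build the index name from that attribute. Since format() applied to a number treats it as epoch milliseconds, the index property could then be:
myindex-${timestamp:format('yyyy.MM.dd', 'GMT')}
With the example data above this would yield myindex-2021.07.06.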

Add _id to the source as a separate field for all existing docs in an index

I'm new to Elasticsearch. I need to go through all the documents, take the _id and add it to the _source as a separate field, by script. Is it possible? If yes, can I have an example of something similar, or a link to similar scripts? I haven't seen anything like that in the docs. Why do I need it? Because afterwards I will run SELECT queries with Open Distro SQL, and that framework cannot return fields that are not in _source. If anyone can suggest something I would be very grateful.
There are two options:
First option: Add the new field to your existing index, populate it, and build the index again.
Second option: Simply define the new field in a new index mapping (keep all other fields the same) and then use the reindex API with the script below.
"script": {
"source": "ctx._source.<your-field-name> = ctx._id"
}
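For reference, a rough sketch of the complete reindex request, assuming the old index is my-old-index, the new one my-new-index, and the new field is called doc_id (all three names are placeholders):
POST _reindex
{
  "source": { "index": "my-old-index" },
  "dest":   { "index": "my-new-index" },
  "script": {
    "source": "ctx._source.doc_id = ctx._id"
  }
}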

Logstash replace old index

I'm using Logstash to create an Elasticsearch index. The steps are:
1. Logstash starts
2. data is retrieved with a jdbc input plugin
3. data is indexed with an elasticsearch output plugin (using a template that includes an alias)
4. Logstash stops
The first time, I get an index called myindex-1, which can be queried through the alias myindex.
The second time, I get an index called myindex-2, which can be queried through the same alias myindex. The first index is now deprecated and I need to delete it just before (or after) step 4.
Do you know how to do this?
First things first: if you know the deprecated index name, then it's just a question of adding a step 5:
curl -XDELETE 'http://localhost:9200/myindex-1'
So you'd wrap your Logstash run into a script with this additional step; to my knowledge there is no option for Logstash to delete an index, it's simply not its purpose.
But from the way you describe your situation, it seems you're trying to keep the data available while the new index is being created; could you elaborate a bit on your use case?
The reason for asking is that with the current procedure, you're likely to end up with duplicate data (old and new versions) during the indexing period.
If there is indeed a need to refresh the data, and assuming that you have an id in the data retrieved from the DB, you might consider another approach: configuring two elasticsearch outputs in your Logstash pipeline,
the first one with action set to "delete", targeting the old entry in the previous index,
the second being your standard create into the new index.
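A rough sketch of such a dual-output configuration (host, index names and the id field are placeholders, assuming the DB row id is stored in a field called id):
output {
  # remove the previous version of the document from the old index
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "myindex-1"
    document_id => "%{id}"
    action      => "delete"
  }
  # index the fresh version into the new index
  elasticsearch {
    hosts       => ["localhost:9200"]
    index       => "myindex-2"
    document_id => "%{id}"
    action      => "index"
  }
}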
Depending on the nature of your data, there might also be other possibilities.
1. Create and populate myindex-2, but don't alias it yet.
2. Simultaneously add the alias to myindex-2 and remove it from myindex-1.
REST request for step 2:
POST /_aliases
{
  "actions": [
    { "remove": { "index": "myindex-1", "alias": "myindex" } },
    { "add": { "index": "myindex-2", "alias": "myindex" } }
  ]
}
See the Elasticsearch _aliases API documentation for details.

logstash metadata not passed to elasticsearch

I am trying to follow the example https://www.digitalocean.com/community/tutorials/how-to-install-elasticsearch-logstash-and-kibana-elk-stack-on-centos-7
But the index name set by 30-elasticsearch-output.conf is not being resolved. In the example 30-elasticsearch-output.conf file:
index => "%{[#metadata][beat]}-%{+YYYY.MM.dd}"
In my case, the result elasticsearch index name is:
"%{[#metadata][beat]}-2016.09.07"
Only the date portion of the index name is set correctly.
What is responsible for setting the metadata value? I must have missed something in following the example.
This is related to a question asked earlier: ELK not passing metadata from filebeat into logstash
You can create the index like this:
index => "%{[beat][name]}-%{+YYYY.MM.dd}"
This will definitely work.
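For context, a minimal output block using that pattern (host settings are placeholders); it relies on the beat name that Filebeat ships inside the event itself, so it does not depend on [@metadata] being forwarded:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{[beat][name]}-%{+YYYY.MM.dd}"
  }
}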

Create a new index per day for Elasticsearch in Logstash configuration

I intend to have an ELK stack setup where daily JSON inputs get stored in log files, one for each date. My Logstash listens to the input via these logs and stores it in Elasticsearch, in an index corresponding to the date of the log file entry.
My logstash-output.conf goes something like:
output {
  elasticsearch {
    host => "localhost"
    cluster => "elasticsearch_prod"
    index => "test"
  }
}
Thus, for now, all input to Logstash gets stored in the Elasticsearch index test. What I want is that an entry arriving at Logstash on, say, 2015.11.19, which gets stored in the log file named logstash-2015.11.19.log, should correspondingly be stored in an index test-2015.11.19.
How should I edit my Logstash configuration file to enable this?
Posting this as an answer because a comment can't be formatted and it would look awful.
Your filename (I assume you use a file input) is stored in the path field, like so:
file {
  path => "/logs/**/*my_log_file*.log"
  type => "myType"
}
This field is accessible throughout your whole configuration, so you can use a regex filter to parse your date out of the path. For example, using grok, you could do something like this (note: pseudocode):
if [type] == "myType" {
grok {
match => {
"path" => "%{MY_DATE_PATTERN:myTimeStampVar}"
}
}
}
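A more concrete variant of that pseudocode, assuming the date appears in the file name exactly as in logstash-2015.11.19.log (the named capture replaces the placeholder pattern):
if [type] == "myType" {
  grok {
    match => {
      # capture a yyyy.MM.dd date from the file path into myTimeStampVar
      "path" => "(?<myTimeStampVar>\d{4}\.\d{2}\.\d{2})"
    }
  }
}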
With this you now have your variable in "myTimeStampVar" and you can use it in your output:
elasticsearch {
  host => "127.0.0.1"
  cluster => "logstash"
  index => "events-%{myTimeStampVar}"
}
Having said all this, I am not quite sure why you need this. I think it is better to have Elasticsearch do the job for you: it will know the timestamp of your log and index it accordingly, so you have easy access to it. However, the setup above should work for you; I used a very similar approach to parse out a client name and create sub-indexes on a per-client basis, for example: myIndex-%{client}-%{+YYYY.MM.dd}
Hope this helps,
Artur
Edit: I did some digging because I suspect you are worried that your logs get put in the wrong index when they are parsed at the wrong time. If that is the case, the solution is not to parse the index out of the log file name, but to parse the timestamp out of each log line.
I assume each of your log lines has a timestamp. Logstash will create an @timestamp field set to the time the event is processed, which may not match the intended index. The correct way to solve this is to overwrite the @timestamp field with the timestamp parsed from your log line. That way Logstash will compute the correct index and put the event there.
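A minimal sketch of that approach, assuming the timestamp from the log line has already been grokked into a field called logdate in ISO8601 format (the field name and formats are placeholders):
filter {
  date {
    # overwrite @timestamp with the value parsed from the log line
    match  => ["logdate", "ISO8601"]
    target => "@timestamp"
  }
}
output {
  elasticsearch {
    host => "localhost"
    cluster => "elasticsearch_prod"
    # the %{+YYYY.MM.dd} part is rendered from @timestamp, i.e. the event's own time
    index => "test-%{+YYYY.MM.dd}"
  }
}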
