Logstash replace old index - elasticsearch

I'm using Logstash to create an Elasticsearch index. The steps are:
1. Logstash starts
2. data is retrieved with a jdbc input plugin
3. data is indexed with an elasticsearch output plugin (using a template that includes an alias)
4. Logstash stops
The first time, I get an index called myindex-1, which can be queried through the alias myindex.
The second time, I get an index called myindex-2, which can be queried through the same alias myindex. The first index is now deprecated and I need to delete it just before (or after) step 4.
Do you know how to do this?

First things first: if you know the deprecated index name, then it's just a question of adding a step 5:
curl -XDELETE 'http://localhost:9200/myindex-1'
So you'd wrap your Logstash run into a script with this additional step; to my knowledge there is no option for Logstash to delete an index, that's simply not its purpose.
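A minimal wrapper sketch (the pipeline file name myindex.conf is an assumption):
#!/bin/sh
# run the Logstash pipeline that builds the new index (config file name assumed)
logstash -f myindex.conf
# step 5: drop the now-deprecated index
curl -XDELETE 'http://localhost:9200/myindex-1'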
But from the way you describe your situation, it seems you're trying to keep the data available while the new index is being created. Could you elaborate a bit on your use case?
The reason for asking is that with the current procedure, you're likely to end up with duplicate data (old and new versions) during the indexing period.
If there is indeed a need to refresh the data, and assuming that you have an id in the data retrieved from the DB, you might consider another approach: configuring two elasticsearch outputs in your Logstash pipeline, the first with action set to "delete", targeting the old entry in the previous index, and the second being your standard index into the new index, as sketched below.
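A minimal sketch of such a pipeline, assuming the document id is carried in an id field and the index names follow the pattern from the question:
output {
  # first output: remove the previous version of the document from the old index
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex-1"
    document_id => "%{id}"
    action => "delete"
  }
  # second output: index the fresh version into the new index
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex-2"
    document_id => "%{id}"
    action => "index"
  }
}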
Depending on the nature of your data, there might also be other possibilities.

1. Create and populate myindex-2, but don't alias it yet
2. Simultaneously add the alias to myindex-2 and remove it from myindex-1
REST request for step 2:
POST /_aliases
{
  "actions" : [
    { "remove" : { "index" : "myindex-1", "alias" : "myindex" } },
    { "add" : { "index" : "myindex-2", "alias" : "myindex" } }
  ]
}
Documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html

Related

Add documents to Elasticsearch if they do not exist

I transferred some data from a log generated every day to Elasticsearch using Logstash, and my logstash output section looks like:
I keep the same id (id_ot) in both my log file and Elasticsearch, but what I would like to do is: if a newly arriving id (id_ot) already exists in Elasticsearch, do not insert it. How can I do that in Logstash?
Any help would be really appreciated!
You simply need to add the action => "create" parameter; if a document already exists with that ID, it is not indexed:
output {
  elasticsearch {
    ...
    action => "create"
  }
}
What you are asking for is an upsert: create the document if it doesn't exist, or update an existing one. Elasticsearch supports this via the doc_as_upsert option:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html#doc_as_upsert
On the right-hand side of that page you can choose the Elasticsearch version that matches yours. In Logstash this maps to the doc_as_upsert setting of the elasticsearch output, as sketched below.
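A minimal sketch of the corresponding Logstash output, assuming the id_ot field from the question carries the document id (index name is an assumption):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "myindex"
    document_id => "%{id_ot}"
    action => "update"
    doc_as_upsert => true
  }
}
Note that this updates existing documents rather than skipping them; use action => "create" as above if you want existing documents left untouched instead.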

What is the use of maintaining two aliases for a single Elasticsearch index

I have been exploring Elastic Search lately.
I have been going through aliases. I see ES provides an API to create multiple aliases to a single index like below:
{ "actions" : [{ "add" : { "indices" : ["test1", "test2"], "alias" : "alias1" } }] }
Refer: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html#indices-aliases
I'm wondering what is the use case of this.
Won't the queries on an alias get split if the alias points to multiple indices?
I have tried getting this information, but failed to do so, as everywhere it's explained how to achieve this but not the use case.
Directing me to a resource where I could get more info would also help.
A possible use case is when your application has to switch from an old index to a new index with zero downtime.
Let's say you want to reindex an index for some reason. If you're not using an alias, you then need to update your application to use the new index name.
How is an alias helpful here?
Assume that your application is using the alias instead of an index name.
Let's create an index:
PUT /my_index
Create its alias:
PUT /my_index/_alias/my_index_alias
Now you've decided to reindex your index (maybe you want to change the existing mapping).
Once documents have been reindexed correctly, you can switch your alias to point to the new index.
Note: you need to remove the alias from the old index at the same time as you add it to the new index. You can do both atomically using the _aliases endpoint, as sketched below.
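A sketch of the atomic switch, assuming the reindexed data lives in a new index called my_index_v2 (the name is an assumption):
POST /_aliases
{
  "actions" : [
    { "remove" : { "index" : "my_index", "alias" : "my_index_alias" } },
    { "add" : { "index" : "my_index_v2", "alias" : "my_index_alias" } }
  ]
}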
A good read: the Elastic documentation on index aliases.
As for your question about the use of maintaining two aliases for a single index:
Create “views” on a subset of the documents in an index.
Using multiple indices with the same alias:
Group multiple indices under the same name, which is helpful if you want to run a single query on multiple indices at the same time.
But you can't insert/index data through an alias that points to multiple indices.
Let's say that you have two types of events, eventA and eventB. You want to "partition" them by time, so you use an alias to map multiple indices (e.g. eventA-20220920) to one alias ('eventA' in this case). You also want one alias for all the event types, so you give all the eventA-* and eventB-* indices another alias, 'event'.
That way, when you add a third type of event (eventC), you can just add its indices to the 'event' alias and your queries don't have to change. A sketch of such a setup is shown below.
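For illustration, aliasing one day's indices this way might look like the following (index names follow the example above):
POST /_aliases
{
  "actions" : [
    { "add" : { "index" : "eventA-20220920", "alias" : "eventA" } },
    { "add" : { "index" : "eventA-20220920", "alias" : "event" } },
    { "add" : { "index" : "eventB-20220920", "alias" : "eventB" } },
    { "add" : { "index" : "eventB-20220920", "alias" : "event" } }
  ]
}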

Changing field type in Elasticsearch 2.3.4

I am building a project using Elasticsearch for indexing and querying large data.
The automatic mapping created one of my fields as double and I would like to change it to integer. I can of course do it manually, but I have been trying to do it with a PUT command on the index/type.
What I tried is:
PUT myindex/_mapping/model
{
  "model" : {
    "properties" : {
      "programnumber" : {"type" : "integer", "store" : "yes"}
    }
  }
}
Thank you
Once created, fields cannot be changed anymore (with a few notable exceptions).
If your programnumber field was created as a double, it's most probably because the value of that field in the first document you indexed was a floating-point value.
Nowadays Elasticsearch lets you work around this: create a new index with the mapping you want, then reindex your data into it.
As you can read from the answer to a similar question: Change field type from string to integer? and how to re-index?
... you have to reindex. You can do that with Logstash (example configs have been posted in the past) or third-party tools like es-reindex. After reindexing to a new name (e.g. the original name with an underscore appended) you can delete the original index and create an alias named like the original index that points to the new index. Thereby everything will work as before.
Interesting link: Reindex API — https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
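A minimal sketch of that approach, assuming a new index myindex_v2 has already been created with programnumber mapped as integer (the new index name is an assumption):
POST /_reindex
{
  "source" : { "index" : "myindex" },
  "dest" : { "index" : "myindex_v2" }
}
Afterwards you can delete myindex and point an alias named myindex at myindex_v2, as the quoted answer suggests.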

How to handle multiple updates / deletes with Elasticsearch?

I need to update or delete several documents.
When I update, I do this:
1. I first search for the documents, setting a larger limit for the returned results (let's say, size: 10000).
2. For each of the returned documents, I modify certain values.
3. I resend the whole modified list to Elasticsearch (bulk index).
4. This operation repeats until step 1 no longer returns results.
When I delete, I do this:
1. I first search for the documents, setting a larger limit for the returned results (let's say, size: 10000).
2. I delete every found document by sending its _id to Elasticsearch (10000 requests).
3. This operation repeats until step 1 no longer returns results.
Is this the right way to make an update?
When I delete, is there a way I can send several ids to delete multiple documents at once?
For your massive index/update operation, if you don't already use it (not sure), you can take a look at the bulk API documentation; it is tailored for this kind of job.
If you want to retrieve lots of documents in small batches, you should use a scan-scroll search instead of from/size. Related information can be found in the scroll documentation.
To sum up:
the scroll API is used to load results in memory and to iterate over them efficiently
the scan search type disables sorting, which is costly
Give it a try; depending on the data volume, it could improve the performance of your batch operations. A sketch of the scan/scroll flow follows.
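Roughly, on the Elasticsearch 1.x API of that era (the index name myindex is an assumption):
# open a scan over the index with a 1-minute scroll window and 1000 docs per shard per batch
curl -XGET 'localhost:9200/myindex/_search?search_type=scan&scroll=1m&size=1000' -d '{
  "query" : { "match_all" : {} }
}'
# then repeatedly fetch the next batch with the _scroll_id returned by the previous call,
# stopping when a call comes back with no hits
curl -XGET 'localhost:9200/_search/scroll?scroll=1m' -d '<_scroll_id from previous response>'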
For the delete operation, you can use this same _bulk API to send multiple delete operations at once.
The format of each line is the following:
{ "delete" : { "_index" : "indexName", "_type" : "typeName", "_id" : "1" } }
{ "delete" : { "_index" : "indexName", "_type" : "typeName", "_id" : "2" } }
For deletion and update, if you want to delete or update by id, you can use the bulk API:
Bulk API
The bulk API makes it possible to perform many index/delete operations in a single API call. This can greatly increase the indexing speed. The possible actions are index, create, delete and update. index and create expect a source on the next line, and have the same semantics as the op_type parameter to the standard index API (i.e. create will fail if a document with the same index and type exists already, whereas index will add or replace a document as necessary). delete does not expect a source on the following line, and has the same semantics as the standard delete API. update expects that the partial doc, upsert and script and its options are specified on the next line.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
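For example, a bulk body mixing an update and a delete could look like this (the doc field and value are hypothetical):
{ "update" : { "_index" : "indexName", "_type" : "typeName", "_id" : "1" } }
{ "doc" : { "status" : "archived" } }
{ "delete" : { "_index" : "indexName", "_type" : "typeName", "_id" : "2" } }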
You can also delete by query instead:
Delete By Query API
The delete by query API allows to delete documents from one or more indices and one or more types based on a query. The query can either be provided using a simple query string as a parameter, or using the Query DSL defined within the request body.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
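A sketch against this 1.x endpoint (the term field and value in the query are hypothetical):
curl -XDELETE 'localhost:9200/indexName/_query' -d '{
  "query" : { "term" : { "status" : "deprecated" } }
}'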

ElasticSearch run script on document insertion (Insert API)

Is it possible to specify a script be executed when inserting a document into ElasticSearch using its Index API? This functionality exists when updating an existing document with new information using its Update API, by passing in a script attribute in the HTTP request body. I think it would be useful too in the Index API because perhaps there are some fields the user wants to be auto-calculated and populated during insertion, without having to send an additional Update request after the insertion to have the script be executed.
Elasticsearch 1.3
If you just need to search/filter on the fields that you'd like to add, the mapping transform capability that was added in 1.3.0 could work for you:
The document can be transformed before it is indexed by registering a script in the transform element of the mapping. The result of the transform is indexed but the original source is stored in the _source field.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-transform.html
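A sketch of such a mapping (index, type and field names are assumptions, not from the question):
PUT /my_index
{
  "mappings" : {
    "my_type" : {
      "transform" : {
        "script" : "ctx._source['full_name'] = ctx._source['first'] + ' ' + ctx._source['last']"
      },
      "properties" : {
        "first" : { "type" : "string" },
        "last" : { "type" : "string" },
        "full_name" : { "type" : "string" }
      }
    }
  }
}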
You can also have the same transformation run when you get a document, by adding the _source_transform URL parameter to the request:
The get endpoint will retransform the source if the _source_transform parameter is set. The transform is performed before any source filtering, but it is mostly designed to make it easy to see what was passed to the index for debugging.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/_get_transformed.html
However, I don't think the _search endpoint accepts the _source_transform url parameter so I don't think you can apply the transformation to search results. That would be a nice feature request.
Elasticsearch 1.4
Elasticsearch 1.4 added a couple of features which make all this much nicer. As you mentioned, the update API allows you to specify a script to be executed. The update API in 1.4 can also accept a default document to be used in the case of an upsert. From the 1.4 docs:
curl -XPOST 'localhost:9200/test/type1/1/_update' -d '{
  "script" : "ctx._source.counter += count",
  "params" : {
    "count" : 4
  },
  "upsert" : {
    "counter" : 1
  }
}'
In the example above, if the document doesn't exist it uses the contents of the upsert key to initialize the document. So in the case above the counter key in the newly created document will have a value of 1.
Now, if we set scripted_upsert to true (scripted_upsert is another new option in 1.4), our script will run against the newly initialized document:
curl -XPOST 'localhost:9200/test/type1/2/_update' -d '{
  "script" : "ctx._source.counter += count",
  "params" : {
    "count" : 4
  },
  "upsert" : {
    "counter" : 1
  },
  "scripted_upsert" : true
}'
In this example, if the document didn't exist the counter key would have a value of 5.
Full documentation on the Elasticsearch site: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html
