Update ElasticSearch Document while maintaining its external version the same? - elasticsearch

I would like to update an ElasticSearch Document while maintaining the document's version the same. I'm using version_type=external as indicated in the versioning section of the index_ documentation. Updating a document with another of the same version is normally prevented as indicated in that section: "If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail."
The reason I want to keep the version unaltered is because I do not create a new version of my object (stored in my database) when one adds new tags to that object, but I would like the new tags to show up in my ElasticSearch index. Is this possible with ElasticSearch?
I tried deleting the document and then adding a new document with the same Id and Version but that still gives me the following exception:
VersionConflictEngineException[[myindex][2] [mytype][6]: version
conflict, current 1, provided 1]
Just for reference, I'm using PHP Elastica (with methods $type->deleteDocument($doc); and $type->addDocument($doc);) but this question should apply to ElasticSearch in general.

The time for which elasticsearch keeps information about deleted documents is controlled by index.gc_deletes parameter. By default this time is 1m. So, theoretically, you can decrease this time to 0s, wait for a second, delete the document, index a new document with the same version, and set index.gc_deletes back to 1m. But at the moment that would work only on master due to a bug. If you are using older version of elasticsearch, you will not be able to change index.gc_deletes without closing the index first.
There is a good blog post on elasticsearch.org web site that describes how versions are handled by elasticsearch in details.

Related

Copy documents in another index on creation in Elasticsearch

We want to keep track of all the changes of a document, so we want to store all the document versions in separate index.
Is there a way when a new document is added or changes to send the entire document in another index? Maybe there is a processor for this use case?
As far as I know, Elasticsearch as such supports only version numbers but there is no way to trace back to previous version.
You could maintain version history in a seperate elastic index
Whenever you update main_index ensure that you update main_index as well
POST main_index/_doc/doc_id
POST main_index/_doc/doc_id_version
May be you can configure logstash to do this...not sure

Does updating a doc increase the "delete" count of the index?

I am facing a strange issue in the number of docs getting deleted in an elasticsearch index. The data is never deleted, only inserted and/or updated. While I can see that the total number of docs are increasing, I have also been seeing some non-zero values in the docs deleted column. I am unable to understand from where did this number come from.
I tried reading whether the update doc first deletes the doc and then re-indexes it so in this way the delete count gets increased. However, I could not get any information on this.
The command I type to check the index is:
curl -XGET localhost:9200/_cat/indices
The output I get is:
yellow open e0399e012222b9fe70ec7949d1cc354f17369f20 zcq1wToKRpOICKE9-cDnvg 5 1 21219975 4302430 64.3gb 64.3gb
Note: It is a single node elasticsearch.
I expect to know the reason behind deletion of docs.
You are correct that updates are the cause that you see a count for documents delete.
If we talk about lucene then there is nothing like update there. It can also be said that documents in lucene are immutable.
So how does elastic provides the feature of update?
It does so by making use of _source field. Therefore it is said that _source should be enabled to make use of elastic update feature. When using update api, elastic refers to the _source to get all the fields and their existing values and replace the value for only the fields sent in update request. It marks the existing document as deleted and index a new document with the updated _source.
What is the advantage of this if its not an actual update?
It removes the overhead from application to always compile the complete document even when a small subset of fields need to update. Rather than sending the full document, only the fields that need an update can be sent using update api. Rest is taken care by elastic.
It reduces some extra network round-trips, reduce payload size and also reduces the chances of version conflict.
You can read more how update works here.

elasticsearch:update the doc if exists in all the shards of an index

I googled on update the docs in ES across all the shards of index if exists. I found a way (/_bulk api), but it requires we need to specify the routing values. I was not able to find the solution to my problem. If does anybody aware of the below things please update me.
Is there any way to update the doc in all the shards of an index if exists using a single update query?.
If not, is there any way to generate routing values such that we should be able to hit all shards with update query?
Ideally for bulk update, ES recommends get the documents by query which needs to get updated using scan and scroll, update the document and index them again. Internally also, ES never updates a document although it provides an Update API through scripting. It always reindexes the new document with updated field/value and deletes the older document.
Is there any way to update the doc in all the shards of an index if exists using a single update query?.
You can check the update API if its suits your purpose. Also there are plugins which can provide you update by query. Check this.
Now comes the routing part and updating all shards. If you have specified a routing value while indexing the document for very first time, then whenever you update your document, you need to set the original routing value. Otherwise ES would never know which shard did the document resided and it can send it to any shard(algo based).
If you don't use routing value, then based on the ID of the document, ES uses an algo to decide the shard it needs to go. Hence when you update a document through a bulk API and keeps the same ID without the routing, the document will be saved in the same shard as it was previous and you would see the update.

Elasticsearch : How to get all indices that ever existed

is there a way to find out the names of all the indices ever created? Even after the index might have been deleted. Does elastic store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were at least created. Not sure this plugin works with the latest versions of ES, though.

ElasticSearch getting the last version of document

I store some documents in index "blog".
When I open URL http://localhost:9200/blog/post/90?pretty=true by browser
I have the different value in "_version" field. ElasticSearch store 2 version of my document and return it randomly.
How to get the last document?
The _version property is used to implement optimistic locking. There cannot be two documents with a different version in the index. At least not in the same shard. Their might be a very short time frame in which the replicate shard can have an older version. Each update to the documents increases the version number. You can find more information about this in this blog post:
http://www.elasticsearch.org/blog/versioning/

Resources