I store some documents in the index "blog".
When I open the URL http://localhost:9200/blog/post/90?pretty=true in a browser, I get different values in the "_version" field. It looks as if Elasticsearch stores two versions of my document and returns one of them at random.
How do I get the latest document?
The _version property is used to implement optimistic locking. There cannot be two documents with different versions in the index, at least not in the same shard. There might be a very short time frame in which a replica shard still has an older version. Each update to a document increases the version number. You can find more information about this in this blog post:
http://www.elasticsearch.org/blog/versioning/
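For illustration, here is a minimal sketch of what that looks like in practice, using the index, type, and id from the URL above and a hypothetical "title" field (the ?version= check shown here is the older, type-based API this question uses):

# Index a document; the response reports "_version": 1
curl -XPUT 'http://localhost:9200/blog/post/90' -d '{"title": "first draft"}'

# Index it again; the version increases to 2
curl -XPUT 'http://localhost:9200/blog/post/90' -d '{"title": "second draft"}'

# Write conditionally on the last version we saw; if another writer has
# bumped the version in the meantime, this fails with a version conflict
# instead of silently overwriting
curl -XPUT 'http://localhost:9200/blog/post/90?version=2' -d '{"title": "third draft"}'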
The documentation suggests that the _version field is per document and increases by 1 each time the document is updated. After querying the data in my Elasticsearch cluster, however, the _version field appears to be global for the whole index. It looks like every update is tracked, so my documents have versions in the thousands, and a single update can increase the version by more than one, typically by some seemingly random amount which, I guess, correlates with the overall number of updates in the cluster.
How can I change this?
My ES version is 7.14.0
EDIT1:
I think I need to clarify further:
After a _search I can see that my documents indeed have values like "_version": 410084.
Maybe this is because I am using Kafka Connect with the Elasticsearch sink to push documents from Kafka? Although I don't see any configuration option for this sink to manage the version by itself.
The _version field is per document. I think you are confusing it with _seq_no, which is a counter of the number of write operations in a shard. It is also not writeable, i.e. only Elasticsearch itself updates the _seq_no field.
The _version field, on the other hand, can be maintained externally, i.e. you can set its value yourself. If you are seeing it increase by more than one, it is not being updated by Elasticsearch; something in your application is updating it. In any case, it does not correlate with global updates in the cluster.
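As an illustration of external versioning, here is a minimal sketch against a hypothetical index "test", using the Elasticsearch 7.x typeless endpoints (this is what a client or connector has to do to set _version itself):

# The caller supplies the version and Elasticsearch stores it as-is,
# so _version can jump by an arbitrary amount between writes
curl -XPUT 'http://localhost:9200/test/_doc/1?version=410084&version_type=external' -H 'Content-Type: application/json' -d '{"name": "foo"}'

# A write with a lower or equal external version is rejected with a
# version_conflict_engine_exception
curl -XPUT 'http://localhost:9200/test/_doc/1?version=410084&version_type=external' -H 'Content-Type: application/json' -d '{"name": "bar"}'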
I have to build an index in Elasticsearch which will have more than 500,000 unique documents. The documents have nested fields as well.
All the documents in the index are updated every 10 minutes (using PUT).
I read that updating a document involves reindexing it, and that this can affect search performance.
Has anyone faced a similar scenario using Elasticsearch? Can someone share their experience of search/query response times on such an index, given that the expected query response time is under 2 seconds?
Update:
Now, I indexed a document with id 1 using an update request. Then I updated the document (id=1) via the _update endpoint with
"doc_as_upsert": true and a doc field. The response contains the same version as before the update and has the attribute result = "noop" in the output.
I assume that no reindexing happened, since the document's version was not incremented.
Does this reduce the impact on search responses (assuming 100 requests/second) and on indexing for my use case, if I do the same for 500,000 documents every 10 minutes, compared to using PUT (the Index API)?
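For reference, a minimal sketch of that noop behaviour against a hypothetical index "myindex" (Elasticsearch 7.x endpoints):

# The first call creates the document ("result": "created")
curl -XPOST 'http://localhost:9200/myindex/_update/1' -H 'Content-Type: application/json' -d '{"doc": {"price": 10}, "doc_as_upsert": true}'

# Sending an identical doc again changes nothing: the response carries
# "result": "noop", _version stays the same, and no reindexing occurs
curl -XPOST 'http://localhost:9200/myindex/_update/1' -H 'Content-Type: application/json' -d '{"doc": {"price": 10}, "doc_as_upsert": true}'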
I googled how to update documents in Elasticsearch across all shards of an index, if they exist. I found one way (the /_bulk API), but it requires specifying routing values. I was not able to find a solution to my problem. If anybody is aware of the following, please let me know.
Is there any way to update a document in all the shards of an index, if it exists, using a single update query?
If not, is there any way to generate routing values such that an update query can hit all shards?
For bulk updates, Elasticsearch recommends fetching the documents that need updating with a scan-and-scroll query, modifying them, and indexing them again. Internally, Elasticsearch never updates a document in place, although it provides an Update API through scripting: it always reindexes a new document with the updated fields/values and deletes the older document.
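A rough sketch of that flow, assuming a hypothetical index "myindex" and a "status" field that marks the documents to update (exact endpoints and parameter names vary across versions; newer releases replace the dedicated scan search type with a plain scroll):

# 1. Open a scroll over the documents that need updating
curl -XGET 'http://localhost:9200/myindex/_search?scroll=1m' -d '{"size": 500, "query": {"term": {"status": "stale"}}}'

# 2. Pull further batches with the _scroll_id returned by each response
curl -XGET 'http://localhost:9200/_search/scroll' -d '{"scroll": "1m", "scroll_id": "<_scroll_id from the previous response>"}'

# 3. Modify each hit client-side and reindex it through the bulk API;
#    keeping the same _id overwrites the old document
curl -XPOST 'http://localhost:9200/_bulk' -d '
{"index": {"_index": "myindex", "_type": "mytype", "_id": "1"}}
{"status": "fresh"}
'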
Is there any way to update a document in all the shards of an index, if it exists, using a single update query?
You can check the Update API to see if it suits your purpose. There are also plugins that can provide update-by-query. Check this.
Now comes the routing part and updating all shards. If you specified a routing value when indexing the document for the very first time, then whenever you update that document you need to supply the same original routing value. Otherwise Elasticsearch has no way of knowing which shard the document resides in, and the update could be sent to any shard (the choice is algorithm-based).
If you don't use a routing value, then Elasticsearch uses an algorithm based on the document's ID to decide which shard it goes to. Hence, when you update a document through the bulk API and keep the same ID without routing, the document will be saved in the same shard as before, and you will see the update.
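A small sketch of that, using a hypothetical index "myindex": whatever routing value was used at index time has to be repeated on every subsequent write (the bulk metadata key is "_routing" on older, type-based versions; newer ones use "routing").

# Indexed with an explicit routing value...
curl -XPUT 'http://localhost:9200/myindex/mytype/1?routing=user42' -d '{"msg": "v1"}'

# ...so every later write must carry the same routing value, otherwise
# the document may be hashed to a different shard
curl -XPOST 'http://localhost:9200/_bulk' -d '
{"index": {"_index": "myindex", "_type": "mytype", "_id": "1", "_routing": "user42"}}
{"msg": "v2"}
'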
I would like to update an Elasticsearch document while keeping the document's version the same. I'm using version_type=external, as indicated in the versioning section of the index API documentation. Updating a document with another of the same version is normally prevented, as indicated in that section: "If the value provided is less than or equal to the stored document's version number, a version conflict will occur and the index operation will fail."
The reason I want to keep the version unaltered is that I do not create a new version of my object (stored in my database) when someone adds new tags to it, but I would still like the new tags to show up in my Elasticsearch index. Is this possible with Elasticsearch?
I tried deleting the document and then adding a new document with the same id and version, but that still gives me the following exception:
VersionConflictEngineException[[myindex][2] [mytype][6]: version
conflict, current 1, provided 1]
Just for reference, I'm using PHP Elastica (with the methods $type->deleteDocument($doc); and $type->addDocument($doc);), but this question should apply to Elasticsearch in general.
The time for which Elasticsearch keeps information about deleted documents is controlled by the index.gc_deletes setting. By default this time is 1m. So, theoretically, you could decrease this time to 0s, wait for a second, delete the document, index a new document with the same version, and set index.gc_deletes back to 1m. But at the moment that would work only on master, due to a bug. If you are using an older version of Elasticsearch, you will not be able to change index.gc_deletes without closing the index first.
There is a good blog post on the elasticsearch.org web site that describes in detail how versions are handled by Elasticsearch.
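The sequence described above would look roughly like this, reusing the index, type, and id from the exception in the question (a sketch only; see the caveats about the bug and about older versions needing the index closed):

# 1. Stop keeping delete tombstones around
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{"index": {"gc_deletes": "0s"}}'

# 2. Delete the document and wait a moment for the tombstone to be purged
curl -XDELETE 'http://localhost:9200/myindex/mytype/6'
sleep 1

# 3. Reindex with the same external version; with the tombstone gone,
#    there is nothing left to raise a version conflict against
curl -XPUT 'http://localhost:9200/myindex/mytype/6?version=1&version_type=external' -d '{"tags": ["new", "tag"]}'

# 4. Restore the default
curl -XPUT 'http://localhost:9200/myindex/_settings' -d '{"index": {"gc_deletes": "1m"}}'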
I indexed some data and Elasticsearch assigned it a version.
Then I updated the index using the Update API, and a new version was assigned to it.
Now I want to search the data based on the version.
For example:
my original indexed data is (version=1):
{
"name":"Lav"
}
after execution of the update query (version=2):
{
"name":"Lav",
"message":"hello elasticsearch"
}
Now I want to perform a search operation first on version 1 and then on version 2.
How can this be done?
If I understand your question correctly, you want to run a search based on the values your document had in version 1.
In fact, the version field is just an additional piece of information attached to the document. Elasticsearch does not keep the field values of older versions. Therefore, it is not possible to search the data of an older version.
The version field is useful primarily for "optimistic concurrency control", as explained in this blog post.
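What you can do is ask the search API to report the current version alongside each hit, e.g. (a sketch against a hypothetical index "myindex", reusing the "name" field from the question):

# "version": true adds the current _version to every hit, but the
# returned _source is always the latest data, never an older version
curl -XGET 'http://localhost:9200/myindex/_search' -d '{"version": true, "query": {"match": {"name": "Lav"}}}'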