I am using Elasticsearch 7.9.0.
I was updating a document very frequently, so I was getting the exception below:
Elasticsearch exception [type=version_conflict_engine_exception, reason=[111]: version conflict, required seqNo [4348], primary term [2]. current document has seqNo [4427] and primary term [2]]
I then added a delay of 1 second between each update (I can't afford more than that), but the problem still exists. How can I solve this?
Thanks.
This issue happens because of document versioning in Elasticsearch. This feature exists to prevent concurrent changes to the same document by tasks that run simultaneously.
When you try to update a document that is already being updated by another task, you can run into this issue.
If you want to track the update process of your documents, you may want to use the Task management API: https://www.elastic.co/guide/en/elasticsearch/reference/current/tasks.html
You might also want to check the Index API documentation, as it explains this further: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html
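If the updates don't strictly depend on the previous value, a common mitigation (a sketch; the index name and field are made up) is to let Elasticsearch retry internally on conflict via the retry_on_conflict parameter of the Update API:

```
POST /my-index/_update/111?retry_on_conflict=5
{
  "doc": {
    "counter": 42
  }
}
```

With retry_on_conflict, Elasticsearch re-fetches the document and re-applies the update up to the given number of times when the seqNo has changed in between, which avoids surfacing the version_conflict_engine_exception for most concurrent-update races.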
I received nearly the same error in an OpenSearch setup, but it wasn't due to overly frequent updates as in the OP's case.
In my case, I was unknowingly trying to update an existing role in the domain: my requests were trying to create a 'new' role when it already existed, and that is when I received the error.
My resolution was to create a role with an entirely new name and then update that one.
Env details:
Elasticsearch version 7.8.1
The routing param is optional in index settings.
As per the Elasticsearch docs - https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html:
When indexing documents specifying a custom _routing, the uniqueness of the _id is not guaranteed across all of the shards in the index. In fact, documents with the same _id might end up on different shards if indexed with different _routing values.
We have landed in exactly this scenario: earlier we were using a custom routing param (let's say customerId), and for some reason we now need to remove the custom routing.
This means the docId will now be used as the default routing value, which creates duplicate records with the same id across different shards during index operations. Earlier (before removing the custom routing) the same operation resulted in an update of the record, as expected.
I am thinking of the following approaches to get out of this; please advise if you have a better one to suggest. The key here is to AVOID DOWNTIME.
Approach 1:
As we receive an update request, let the duplicate record get created. Once the record without custom routing is created, issue a delete request for the record with the custom routing.
CONS: If a record never receives an update, it will linger around with its custom routing. We want to avoid this, as it might result in unforeseen scenarios in the future.
Approach 2:
We use the Reindex API to migrate the data to a new index (turning off custom routing during the migration). The application switches to the new index after a successful migration.
CONS: Some of our indexes are huge and take 12+ hours to reindex, and since the Elasticsearch Reindex API uses a snapshot mechanism, it will not migrate newer records created during that 12-hour window. This approach needs downtime.
Please suggest alternative if you have faced this before.
Thanks #Val. I also found a few other approaches, such as writing to both indexes and reading from the old one, then shifting reads to the new one after re-indexing is finished. Something along the following lines:
- Create aliases pointing to the old indices (*_v1)
- Point the application to these aliases instead of the actual indices
- Create new indices (*_v2) with the same mapping
- Move data from the old indices to the new ones using re-indexing, making sure we don't retain the custom routing during this
- Post re-indexing, change the aliases to point to the new indices instead of the old ones (need to verify this, though, but there are easy alternatives if it doesn't work)
- Once verification is done, delete the old indices
What do we do in the transition period (the window between re-indexing start and re-indexing finish)? Write to both indices (old and new) and read from the old indices via the aliases.
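The alias switch in a plan like the one above can be done atomically with the _aliases API (index and alias names here are illustrative):

```
POST /_aliases
{
  "actions": [
    { "remove": { "index": "orders_v1", "alias": "orders" } },
    { "add":    { "index": "orders_v2", "alias": "orders" } }
  ]
}
```

Because both actions execute as a single atomic operation, readers never observe a moment where no index is behind the alias.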
I'm using the new enrich API of Elasticsearch (ver 7.11).
To my understanding, I need to execute the policy ("PUT /_enrich/policy/my-policy/_execute") each time the source index changes, which leads to the creation of a new .enrich index.
Is there an option to make this happen automatically and avoid the creation of an index on every change of the source index?
This is not (yet) supported, and there have been other reports of similar needs.
It seems to be complex to provide the ability to regularly update an enrich index based on a changing source index, and the issue above explains why.
That feature might be available some day; something seems to be in the works. I agree it would be super useful.
You can add a default pipeline to your index; that pipeline will process the documents.
See here.
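For reference, a default pipeline can be attached via the index settings (the index and pipeline names here are made up), so every indexed document passes through it without specifying ?pipeline=... on each request:

```
PUT /my-index/_settings
{
  "index.default_pipeline": "my-enrich-pipeline"
}
```

Any document indexed into my-index will then run through my-enrich-pipeline automatically, which pairs well with an enrich processor defined in that pipeline.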
I am using Elasticsearch for search purposes, but recently I observed some random errors while adding data to Elasticsearch:
version conflict, required seqNo [113789], primary term [19]. current document has seqNo [113797] and primary term [19]
This error comes randomly, and when it does I am not able to add/update data in Elasticsearch.
Can you please help me understand:
What is the root cause of this issue?
How can I reproduce it? Since it occurs randomly, I need the basic steps to reproduce it.
What is the solution? How can I solve this issue?
This error happens when a document is updated at the same time that others update it. Check your parallelism.
When your process read the document it had seqNo 113789, but by the time ES received the update the current document had seqNo 113797, so the two did not match. That is what causes the version conflict.
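To see the mechanism explicitly (a sketch; the index name and field are made up, the seqNo/primary term values mirror the error message), you can send the seqNo and primary term you read back with the write. This is Elasticsearch's optimistic concurrency control:

```
PUT /my-index/_doc/1?if_seq_no=113789&if_primary_term=19
{
  "field": "value"
}
```

If another writer has bumped the document to seqNo 113797 in the meantime, this request fails with exactly the version_conflict_engine_exception shown above, and the application should re-read the document and retry the write.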
Is there a way to update documents, similar to UpdateByQuery, but in bulk and without fetching them first?
According to the documentation, we are unable to set a size for UpdateByQuery requests, i.e., update 5 documents at a time rather than all at once.
One solution that seems obvious is to GET 5 documents and then UPDATE them, but I'm trying to come up with a way where I don't have to do a GET request for every update.
You can set the batch size on UpdateByQueryRequest with setBatchSize, as shown on this page of the docs:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/master/java-rest-high-document-update-by-query.html
Note that this is based on the latest version of the Java client; if you are using a different client or version, it may not be present. Hope that helps.
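For the plain REST API (rather than the Java client), the equivalent knob is the scroll_size query parameter of _update_by_query, which controls how many documents each internal batch processes (the index name and script are illustrative):

```
POST /my-index/_update_by_query?scroll_size=5
{
  "script": {
    "source": "ctx._source.counter += 1"
  },
  "query": {
    "match_all": {}
  }
}
```

Note that scroll_size only changes the batch granularity; all matching documents are still updated eventually. To cap the total number of documents touched, the max_docs parameter can be used instead (or in addition).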
I googled how to update docs in ES across all the shards of an index, if they exist. I found a way (the /_bulk API), but it requires specifying the routing values. I was not able to find a solution to my problem. If anybody is aware of the following, please let me know:
Is there any way to update a doc in all the shards of an index, if it exists, using a single update query?
If not, is there any way to generate routing values such that an update query can hit all shards?
Ideally, for bulk updates, ES recommends getting the documents that need updating via a query using scan and scroll, updating them, and indexing them again. Internally, ES never updates a document in place, even though it provides an Update API through scripting: it always indexes a new document with the updated field/value and deletes the older one.
Is there any way to update the doc in all the shards of an index if exists using a single update query?
You can check the update API to see if it suits your purpose. There are also plugins that can provide update by query. Check this.
Now comes the routing part and updating all shards. If you specified a routing value when indexing the document for the very first time, then whenever you update that document you need to pass the original routing value. Otherwise ES has no way of knowing which shard the document resides on and may send the update to any shard (based on its routing algorithm).
If you don't use a routing value, then ES uses an algorithm based on the ID of the document to decide which shard it goes to. Hence, when you update a document through the bulk API and keep the same ID without routing, the document is saved in the same shard as before, and you will see the update.
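To illustrate the routing point (the index name, doc id, and routing value are made up): a document indexed with a custom routing value must be updated with that same routing value, otherwise ES hashes the document id instead and may target the wrong shard:

```
PUT /my-index/_doc/1?routing=customer-42
{ "status": "new" }

POST /my-index/_update/1?routing=customer-42
{ "doc": { "status": "processed" } }
```

Omitting ?routing=customer-42 on the update would make ES route by the id "1" alone, which can land on a different shard and either fail to find the document or create a duplicate there.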