Copy documents in another index on creation in Elasticsearch - elasticsearch

We want to keep track of all changes to a document, so we want to store every document version in a separate index.
Is there a way, when a document is added or changed, to send the entire document to another index? Maybe there is a processor for this use case?

As far as I know, Elasticsearch itself only supports version numbers; there is no way to trace back to a previous version of a document.
You could maintain the version history in a separate Elasticsearch index.
Whenever you update main_index, make sure you also write the document to the history index (version_index below is a placeholder name):
POST main_index/_doc/doc_id
POST version_index/_doc/doc_id_version
Maybe you can configure Logstash to do this, but I'm not sure.
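The dual write described above could be sent as a single _bulk request. This is only a minimal sketch: version_index is a hypothetical name for the history index, and the id scheme (doc id, underscore, version) is just one way to read the doc_id_version placeholder. It assumes the typeless bulk format of Elasticsearch 7 and later, and it only builds the request body; sending it is left to whatever client you use.

```python
import json

MAIN_INDEX = "main_index"        # index name used in the answer
VERSION_INDEX = "version_index"  # hypothetical name for the history index


def build_dual_write_bulk(doc_id, version, doc):
    """Return an NDJSON _bulk body that writes `doc` to both indices."""
    lines = [
        # Overwrite the current copy in the main index.
        {"index": {"_index": MAIN_INDEX, "_id": doc_id}},
        doc,
        # Add a history copy keyed by id and version; "create" fails
        # if that id already exists, so history entries are not overwritten.
        {"create": {"_index": VERSION_INDEX, "_id": f"{doc_id}_{version}"}},
        doc,
    ]
    # The _bulk API expects one JSON object per line and a trailing newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


if __name__ == "__main__":
    print(build_dual_write_bulk("42", 3, {"title": "hello"}), end="")
```

The resulting string would be POSTed to /_bulk with the Content-Type application/x-ndjson. Note that the two writes are not transactional: one can succeed while the other fails, so the per-item results in the response need checking.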

Related

Keeping the .enrich index updated to source index elasticsearch

I'm using the new enrich API of Elasticsearch (version 7.11).
To my understanding, I need to execute the policy with "PUT /_enrich/policy/my-policy/_execute" each time the source index changes, which leads to the creation of a new .enrich index.
Is there an option to make this happen automatically and avoid creating a new index on every change to the source index?
This is not (yet) supported and there have been other reports of similar needs.
It seems to be complex to provide the ability to regularly update an enrich index based on a changing source index and the issue above explains why.
That feature might be available some day, something seems to be in the works. I agree it would be super useful.
You can add a default pipeline to your index. That pipeline will then process the documents as they are indexed.
See here.
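As a rough illustration of the default-pipeline suggestion, the sketch below builds the two request bodies involved: an ingest pipeline containing an enrich processor, and an index setting that makes that pipeline the default. The pipeline name, match field and target field are hypothetical; my-policy is the policy name from the question. This does not refresh the .enrich index by itself; the policy still has to be executed when the source data changes.

```python
import json

PIPELINE_ID = "my-enrich-pipeline"  # hypothetical pipeline name
POLICY_NAME = "my-policy"           # policy name taken from the question


def build_pipeline_body(match_field, target_field):
    """Body for PUT _ingest/pipeline/<PIPELINE_ID>."""
    return {
        "processors": [
            {
                "enrich": {
                    "policy_name": POLICY_NAME,
                    "field": match_field,          # field in the incoming doc
                    "target_field": target_field,  # where enrich data is added
                }
            }
        ]
    }


def build_default_pipeline_settings():
    """Body for PUT <index>/_settings, making the pipeline the index default."""
    return {"index": {"default_pipeline": PIPELINE_ID}}


if __name__ == "__main__":
    # "email" and "user" are placeholder field names.
    print(json.dumps(build_pipeline_body("email", "user"), indent=2))
    print(json.dumps(build_default_pipeline_settings(), indent=2))
```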

Updating document and adding new field in elastic search

We have a use case where the data is updated daily. Some attributes of existing documents change, and some records are new. Is it possible to reindex the data so that documents already in the index get the updated values and new records are added?
If yes, please explain how.
Is it with update API?
I am indexing like this:
String json = getJsonMapper().writeValueAsString(data);
bulkRequestBuilder.add(getClient().prepareIndex(indexName, typeName).setSource(json));
I am not passing any id. How can I update this? What is the best way?
Elasticsearch uses Apache Lucene under the covers. In Lucene, documents are immutable.
You can use the Update API for your use case. This API does a delete and save underneath, but that doesn't concern you. You can even update part of a document, which means that Elasticsearch will retrieve the old document, generate the new one, delete the old one and save the new one.
For all this to work, you need to use the same id every time. If you don't pass an id to the Index API, Elasticsearch will generate one for you, which means the data is saved as a new document.
The Update API needs the id, otherwise it doesn't know what to update.
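One way to get a stable id for a daily load is to derive it from the fields that identify a record, and then send each record as an update with doc_as_upsert so that existing documents are updated and new ones are created. The sketch below is an assumption-laden illustration rather than the asker's setup: the key field names are hypothetical, the SHA-1 scheme is just one possible choice, and the bulk metadata uses the typeless format of Elasticsearch 7 and later (older versions with types would also need _type).

```python
import hashlib
import json


def stable_id(record, key_fields):
    """Derive a deterministic _id from the fields that identify a record."""
    raw = "|".join(str(record[field]) for field in key_fields)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def build_upsert_bulk(index_name, records, key_fields):
    """Return an NDJSON _bulk body that updates or creates each record."""
    lines = []
    for record in records:
        doc_id = stable_id(record, key_fields)
        lines.append({"update": {"_index": index_name, "_id": doc_id}})
        # doc_as_upsert: merge into the existing doc, or create it if missing.
        lines.append({"doc": record, "doc_as_upsert": True})
    return "\n".join(json.dumps(line) for line in lines) + "\n"


if __name__ == "__main__":
    # "sku" is a hypothetical natural key for the records.
    day1 = [{"sku": "A-1", "price": 10}]
    day2 = [{"sku": "A-1", "price": 12}, {"sku": "B-7", "price": 5}]
    print(build_upsert_bulk("products", day1, ["sku"]), end="")
    print(build_upsert_bulk("products", day2, ["sku"]), end="")
```

Because the id depends only on the key fields, the second day's record for A-1 targets the same document as the first day's, while B-7 is created as a new one.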

elasticsearch:update the doc if exists in all the shards of an index

I searched for a way to update a document across all the shards of an index, if it exists. I found one (the /_bulk API), but it requires specifying routing values, so it does not solve my problem. If anybody knows the answers to the questions below, please let me know.
Is there any way to update the doc in all the shards of an index, if it exists, using a single update query?
If not, is there any way to generate routing values so that an update query hits all shards?
For bulk updates, the recommended approach is to fetch the documents that need updating with scan and scroll, update them, and index them again. Internally, ES never updates a document in place, even though it provides an Update API with scripting. It always indexes a new document with the updated field values and deletes the old one.
Is there any way to update the doc in all the shards of an index if exists using a single update query?
You can check whether the Update API suits your purpose. There are also plugins that provide update by query. Check this.
Now for the routing part and updating all shards. If you specified a routing value when the document was first indexed, then you need to set the same routing value whenever you update it. Otherwise ES has no way of knowing which shard the document resides on, and it may send the request to a different shard (the target is computed by an algorithm).
If you don't use a routing value, ES uses an algorithm based on the document's ID to decide which shard it goes to. Hence, when you update a document through the bulk API and keep the same ID without routing, the document is saved to the same shard as before and you will see the update.
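The routing behaviour described above boils down to hashing a routing value (the document ID by default) and taking it modulo the number of primary shards. The toy model below illustrates only that idea: it uses CRC32 as a stand-in hash, which is not the hash Elasticsearch actually uses, so the shard numbers it prints will not match a real cluster.

```python
import zlib


def pick_shard(doc_id, num_primary_shards, routing=None):
    """Toy model of shard selection: hash(routing or id) % number of shards.

    CRC32 is a stand-in; it is NOT the hash function Elasticsearch uses.
    """
    routing_value = routing if routing is not None else doc_id
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


if __name__ == "__main__":
    shards = 5
    # Same id, no routing: always the same shard, so an update finds the doc.
    print(pick_shard("doc-1", shards), pick_shard("doc-1", shards))
    # Custom routing decides the shard regardless of the id, so an update
    # sent without that routing value may be directed to a different shard.
    print(pick_shard("doc-1", shards, routing="user-a"))
    print(pick_shard("doc-2", shards, routing="user-a"))
```

This is why a document never lives on "all shards": each ID or routing value maps to exactly one primary shard, and the update only has to reach that one.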

Elasticsearch : How to get all indices that ever existed

Is there a way to find out the names of all the indices ever created, even after an index has been deleted? Does Elasticsearch store such historical info?
Thanks
Using a plugin that keeps an audit trail for all changes that happened in your ES cluster might do the trick.
If you use the changes plugin (or a more recent one), then you can query it for all the changes in all indices using
curl -XGET http://localhost:9200/_changes
and your response will contain all the index names that were at least created. Not sure this plugin works with the latest versions of ES, though.

Update ElasticSearch Document while maintaining its external version the same?

I would like to update an ElasticSearch Document while maintaining the document's version the same. I'm using version_type=external as indicated in the versioning section of the index_ documentation. Updating a document with another of the same version is normally prevented as indicated in that section: "If the value provided is less than or equal to the stored document’s version number, a version conflict will occur and the index operation will fail."
The reason I want to keep the version unaltered is because I do not create a new version of my object (stored in my database) when one adds new tags to that object, but I would like the new tags to show up in my ElasticSearch index. Is this possible with ElasticSearch?
I tried deleting the document and then adding a new document with the same Id and Version but that still gives me the following exception:
VersionConflictEngineException[[myindex][2] [mytype][6]: version conflict, current 1, provided 1]
Just for reference, I'm using PHP Elastica (with methods $type->deleteDocument($doc); and $type->addDocument($doc);) but this question should apply to ElasticSearch in general.
The time for which Elasticsearch keeps information about deleted documents is controlled by the index.gc_deletes parameter. By default this time is 1m. So, theoretically, you can decrease this time to 0s, wait for a second, delete the document, index a new document with the same version, and set index.gc_deletes back to 1m. But at the moment that would work only on master due to a bug. If you are using an older version of Elasticsearch, you will not be able to change index.gc_deletes without closing the index first.
There is a good blog post on the elasticsearch.org web site that describes in detail how versions are handled by Elasticsearch.
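The conflict in the question follows directly from the rule quoted from the documentation: with version_type=external, the provided version must be strictly greater than the stored one. The small simulation below models only that comparison; it is not Elasticsearch code. The allow_equal flag models the documented external_gte version type, which also accepts an equal version; whether it is available and appropriate for your Elasticsearch version is something to verify before relying on it.

```python
class VersionConflict(Exception):
    """Raised when the provided version is rejected."""


def check_external_version(stored, provided, allow_equal=False):
    """Return the version to store, or raise VersionConflict.

    allow_equal=False models version_type=external (provided > stored).
    allow_equal=True models version_type=external_gte (provided >= stored).
    `stored` is None when no document (or remembered delete) exists.
    """
    if stored is not None:
        accepted = provided >= stored if allow_equal else provided > stored
        if not accepted:
            raise VersionConflict(
                f"version conflict, current {stored}, provided {provided}"
            )
    return provided


if __name__ == "__main__":
    print(check_external_version(stored=1, provided=2))
    print(check_external_version(stored=1, provided=1, allow_equal=True))
    try:
        check_external_version(stored=1, provided=1)
    except VersionConflict as exc:
        print(exc)
```

In this model, a recently deleted document still counts as "stored" with its last version for as long as index.gc_deletes keeps the delete around, which is why deleting and re-adding with the same version fails as well.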
