Non partial update on elasticsearch node API - elasticsearch

As the 2.0 docs say, updates performed are partial - they do not override the whole document but only merge the existing one with the value given.
Is there a way to perform a full update using this API?

Yes. Simply re-index the document with client.index, as if it were a new document, but using the same id.
A new version of the document will be indexed and stored, replacing the one stored previously.
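To make the difference concrete, here is a minimal sketch in plain Python that only imitates the two behaviours on dictionaries. It does not call Elasticsearch; the function names and sample fields are made up for illustration, and the merge rule (nested objects merged recursively, other values and arrays replaced) is my understanding of how a partial update behaves.

```python
import copy


def partial_update(stored, patch):
    """Imitate a partial update: merge `patch` into a copy of `stored`."""
    merged = copy.deepcopy(stored)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested objects are merged recursively.
            merged[key] = partial_update(merged[key], value)
        else:
            # Scalars and arrays are replaced outright.
            merged[key] = copy.deepcopy(value)
    return merged


def full_reindex(stored, new_doc):
    """Imitate indexing under the same id: the old body is discarded."""
    return copy.deepcopy(new_doc)


# Hypothetical documents, for illustration only.
stored = {
    "title": "Draft",
    "tags": ["a"],
    "author": {"name": "Ann", "email": "ann@example.com"},
}
incoming = {"title": "Final", "author": {"name": "Ann"}}

merged = partial_update(stored, incoming)    # keeps "tags" and "author.email"
replaced = full_reindex(stored, incoming)    # contains only what was sent
```

With the merge, fields missing from the incoming document survive; with the re-index, they are gone, which is what a "full update" means here.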

Related

Copy documents in another index on creation in Elasticsearch

We want to keep track of all changes to a document, so we want to store every document version in a separate index.
Is there a way, when a document is added or changed, to send the entire document to another index? Maybe there is a processor for this use case?
As far as I know, Elasticsearch only supports version numbers; there is no way to trace back to a previous version of a document.
You could maintain the version history in a separate Elasticsearch index.
Whenever you update main_index, make sure you also write the document to the history index:
POST main_index/_doc/doc_id
POST history_index/_doc/doc_id_version
Maybe you can configure Logstash to do this, but I'm not sure.
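One way to keep the two writes together is to send them in a single _bulk request. The sketch below only builds the newline-delimited request body; it does not send anything. The index name history_index, the function name and the "<id>_<version>" id scheme are assumptions for illustration, and the version counter is something the application would have to track itself.

```python
import json


def versioned_bulk_body(doc_id, version, source,
                        main_index="main_index",
                        history_index="history_index"):
    """Build a _bulk body that writes the current document and a history copy."""
    lines = [
        # Current state: same id every time, so it overwrites.
        {"index": {"_index": main_index, "_id": doc_id}},
        source,
        # History copy: id includes the version, so each write is kept.
        {"index": {"_index": history_index, "_id": f"{doc_id}_{version}"}},
        source,
    ]
    # The bulk format is one JSON object per line, ending with a newline.
    return "".join(json.dumps(line) + "\n" for line in lines)


body = versioned_bulk_body("42", 3, {"title": "Final"})
print(body)
```

The body would be POSTed to the _bulk endpoint with a newline-delimited JSON content type.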

Elastic search for batch update of old documents

In my application, I am using Elasticsearch for indexing and searching documents. As expected, the documents have some fields.
Due to new requirements, users want those documents to have some additional fields. I can add the new fields to newly created documents, but I also need the old documents to have them.
I am thinking of writing a framework that would accept generic criteria to read old documents and update them. By generic criteria, I mean it must be able to accept any user-defined condition for selecting the older documents.
I am new to ES, and hence not sure if it's feasible.
So I want to know whether it is feasible to write such a framework using Elasticsearch.
If you provide a custom document id, you can reindex your existing data with the Update API (also available in upsert mode). That way, re-importing the old data updates the existing documents and adds the new fields.
It is important to provide a document id; otherwise it is impossible to add fields to the existing documents, since only inserts are possible.
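As a rough sketch of what that could look like with the bulk API: each document gets an "update" action carrying only the new fields, with "doc_as_upsert" so that a missing document is created instead of failing. The index name, ids and field names below are invented for the example, and the code only builds the request body.

```python
import json


def add_fields_bulk_body(index, new_fields_by_id):
    """Build a _bulk body of partial updates, keyed by known document id."""
    lines = []
    for doc_id, new_fields in new_fields_by_id.items():
        # The id tells Elasticsearch which existing document to change.
        lines.append({"update": {"_index": index, "_id": doc_id}})
        # Only the new fields are sent; doc_as_upsert inserts if absent.
        lines.append({"doc": new_fields, "doc_as_upsert": True})
    return "".join(json.dumps(line) + "\n" for line in lines)


# Hypothetical backfill: add a "category" field to two old documents.
backfill = add_fields_bulk_body(
    "documents",
    {"doc-1": {"category": "invoice"}, "doc-2": {"category": "report"}},
)
print(backfill)
```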

Updating document and adding new field in elastic search

We have a use case where data is updated daily. Some attributes of existing documents change, and some records are new. Is it possible to reindex the data so that documents already in the index get the updated values and new records are added?
If yes, please explain how.
Is it done with the Update API?
I am indexing like this:
String json = getJsonMapper().writeValueAsString(data);
bulkRequestBuilder.add(getClient().prepareIndex(indexName, typeName).setSource(json));
I am not passing any id. How can I update the documents? What is the best way?
Elasticsearch uses Apache Lucene under the covers. In Lucene, documents are immutable.
You can use the Update API for your use case. This API does a delete and save underneath, but that doesn't need to concern you. You can even update just part of a document, in which case Elasticsearch retrieves the old document, generates the new one, deletes the old one and saves the new one.
For all this to work, you need to use the same id. If you don't pass one to the Index API, Elasticsearch generates one for you, which means the data is saved as a new document.
The Update API needs the id, otherwise it doesn't know what to update.
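If the records have no id of their own, one option is to derive a stable id from fields that identify a record, so that each daily load targets the same document. This is only a sketch of that idea: the field name "sku", the index name and the use of SHA-1 are assumptions for illustration, not something the question specifies.

```python
import hashlib


def stable_id(record, key_fields):
    """Derive a repeatable document id from the fields that identify a record."""
    raw = "|".join(str(record[field]) for field in key_fields)
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


def index_action(index, record, key_fields):
    """Bulk 'index' action line with an explicit id instead of a generated one."""
    return {"index": {"_index": index, "_id": stable_id(record, key_fields)}}


# Hypothetical daily loads: same product, changed price.
day1 = {"sku": "A-100", "price": 10}
day2 = {"sku": "A-100", "price": 12}
other = {"sku": "B-200", "price": 5}

action_day1 = index_action("products", day1, ["sku"])
action_day2 = index_action("products", day2, ["sku"])
action_other = index_action("products", other, ["sku"])
```

Because day1 and day2 map to the same id, the second load replaces the first document; the record with a different key becomes a new document.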

elasticsearch:update the doc if exists in all the shards of an index

I searched for a way to update a document across all the shards of an index, if it exists. I found one (the /_bulk API), but it requires specifying routing values, so it does not solve my problem. If anybody knows the answer to the questions below, please let me know.
Is there any way to update a document in all the shards of an index, if it exists, using a single update query?
If not, is there any way to generate routing values so that an update query hits all shards?
For bulk updates, the approach ES recommends is to fetch the documents that need updating with a query using scan and scroll, update them, and index them again. Internally, ES never updates a document in place, even though it provides an Update API with scripting. It always reindexes the new document with the updated field/value and deletes the older document.
Is there any way to update the doc in all the shards of an index if exists using a single update query?
You can check whether the Update API suits your purpose. There are also plugins that provide update by query.
Now for the routing part and updating all shards. If you specified a routing value when you first indexed the document, then you need to set the same routing value whenever you update it. Otherwise ES cannot know which shard the document resides on, and it picks a shard with its default algorithm.
If you don't use a routing value, ES uses an algorithm based on the ID of the document to decide which shard it goes to. Hence, when you update a document through the bulk API and keep the same ID without routing, the document is saved in the same shard as before and you will see the update.
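The shard choice can be pictured as "hash of the routing value, modulo the number of primary shards", where the routing value defaults to the document id. The sketch below is a simplified model of that rule, not Elasticsearch's implementation: CRC32 stands in for the real hash function, and the ids, routing value and shard count are invented.

```python
import zlib


def shard_for(routing_value, num_primary_shards):
    """Simplified model: hash the routing value and take it modulo the shard count."""
    # CRC32 is only a stand-in; Elasticsearch uses its own hash function.
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards


def target_shard(doc_id, num_primary_shards, routing=None):
    """Without custom routing, the document id itself is the routing value."""
    key = routing if routing is not None else doc_id
    return shard_for(key, num_primary_shards)


SHARDS = 5  # hypothetical number of primary shards

# Same id, no routing: the update lands on the same shard as the original write.
first_write = target_shard("doc-1", SHARDS)
later_update = target_shard("doc-1", SHARDS)

# Custom routing: the shard depends on the routing value, not on the id,
# so an update must repeat the routing value used at index time.
routed_write = target_shard("doc-1", SHARDS, routing="user-7")
routed_update = target_shard("doc-1", SHARDS, routing="user-7")
```

This is why a document lives on exactly one shard: an update is sent to the shard the routing rule selects, not broadcast to all of them.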

How to create an index from search results, all on the server?

I will be getting documents from a filtered query (quite a lot of documents). I will then immediately create an index from them (in Python, using requests to directly query the REST API), without any modification.
Is it possible to make this operation directly on the server, without the round-trip of data to the script and back?
Another question was similar in intent, and its only answer is to go via Logstash (equivalent to using my code, though possibly more efficient).
Refer to http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
In short, what you need to do is:
0.) Ensure you have _source enabled.
1.) Use the scan and scroll API: pass your filtered query with search type scan.
2.) Fetch the documents using the scroll id.
3.) Bulk index the results using the _source field, which returns the JSON originally used to index the data.
refer:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
guide/en/elasticsearch/guide/current/bulk.html
guide/en/elasticsearch/guide/current/reindex.html
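The last step, turning one page of scroll hits into a bulk request body, might look roughly like this. The sketch assumes each hit is a dict with "_id" and "_source" keys, as in a search response; the sample hits and the destination index name are made up, and the scroll and HTTP calls themselves are left out.

```python
import json


def hits_to_bulk_body(hits, dest_index):
    """Convert one page of search hits into a newline-delimited _bulk body."""
    lines = []
    for hit in hits:
        # Keep the original id so re-running the copy overwrites, not duplicates.
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        # _source holds the JSON that was originally indexed.
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical page of hits, shaped like the "hits.hits" array of a response.
page = [
    {"_id": "1", "_source": {"user": "ann", "status": "active"}},
    {"_id": "2", "_source": {"user": "bob", "status": "active"}},
]

bulk_body = hits_to_bulk_body(page, "filtered_copy")
print(bulk_body)
```

A driver loop would call this once per scroll page and POST each body to _bulk until a page comes back empty. Note that this still moves the data through the client.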
ES 2.3 has an experimental feature that allows reindexing from a query, entirely on the server:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
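For reference, the request body for that API has a "source" part (index plus optional query) and a "dest" part. The sketch below just assembles such a body; the index names and the term filter are placeholders, and sending it (a POST to _reindex) is left to whichever HTTP client is in use.

```python
import json


def reindex_body(source_index, dest_index, query=None):
    """Assemble a _reindex request body, optionally restricted by a query."""
    source = {"index": source_index}
    if query is not None:
        # Only documents matching the query are copied.
        source["query"] = query
    return {"source": source, "dest": {"index": dest_index}}


# Hypothetical filter: copy only documents whose status is "active".
request_body = reindex_body(
    "documents",
    "documents_active",
    query={"term": {"status": "active"}},
)
print(json.dumps(request_body, indent=2))
```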
