Elasticsearch for batch update of old documents

In my application, I am using Elasticsearch for indexing and searching of documents. As expected, documents have some fields.
Due to new requirements, users want those documents to have some additional fields. I can add the new fields to newly created documents, but I also need the old documents to have these fields.
I am thinking of writing a framework which would accept generic criteria to read old documents and update them. By generic criteria, I mean it must be able to accept any user-defined condition for selecting the older documents.
I am new to ES, and hence not sure if it's feasible.
So I want to know whether it is feasible to write such a framework using Elasticsearch.

If you provide a custom document id, you can reindex your existing data with the update API (also available in upsert mode). In this way you can add the new fields to the documents when you re-import the old data.
It is important to provide a document id, otherwise it is impossible to add fields to the existing documents, since only inserts are possible.
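For illustration, here is a minimal sketch with the Java client (the same style as the bulk-indexing snippet further down this page), assuming ES 1.x, a known document id, and placeholder index, type, and field names:

import org.elasticsearch.client.Client;

// Sketch only: "myindex", "mytype" and the field names are hypothetical placeholders.
public class AddFieldsOnReindex {
    public static void addNewFields(Client client, String id) {
        String newFields = "{\"newField1\":\"defaultValue\",\"newField2\":0}";
        client.prepareUpdate("myindex", "mytype", id)
              .setDoc(newFields)        // merged into the existing document
              .setDocAsUpsert(true)     // inserted as a new document if the id does not exist yet
              .get();
    }
}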

Related

How about including a JSON doc version? Is it possible for Elasticsearch to include different versions of JSON docs, to save and to search?

We are using Elasticsearch to save and manage information on complex transactions. We might need to add more information for every transaction in the near future.
How about including a JSON doc version?
Is it possible for Elasticsearch to include different versions of JSON docs, to save and to search?
How does this affect performance in Elasticsearch?
It's completely possible. By default Elasticsearch uses dynamic mapping for every new document, such as your JSON documents, to index them. For each field in your documents Elasticsearch builds an inverted index, and search queries are executed against it, so regardless of field variation between document versions, as long as you know which field you want to query, data throughput and performance will not be affected.
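As a rough illustration with the Java client (index, type, and field names are made up), two "versions" of a transaction document can live side by side, and a query on a newer field simply ignores older documents that lack it:

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class VersionedDocs {
    public static void demo(Client client) {
        // v1 document: only the original fields
        client.prepareIndex("transactions", "tx", "1")
              .setSource("{\"amount\":100,\"currency\":\"USD\"}")
              .get();

        // v2 document: same type, extra field; dynamic mapping picks it up automatically
        client.prepareIndex("transactions", "tx", "2")
              .setSource("{\"amount\":200,\"currency\":\"EUR\",\"channel\":\"mobile\"}")
              .get();

        // Searching a field that only exists in newer documents skips the old ones
        SearchResponse resp = client.prepareSearch("transactions")
              .setQuery(QueryBuilders.termQuery("channel", "mobile"))
              .get();
    }
}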

ElasticSearch 1.5: Add new field to an existing working index

I have an existing index named "MyIndex", which I am using to store a kind of data in Elasticsearch. That index has millions of records. I am using Elasticsearch version 1.5.
Now I have a new requirement for which I want to add two more fields to the same document type I am storing in the "MyIndex" index. I want to use both the new-schema and old-schema documents in the future.
What can I do?
Can I insert new documents into the same index?
Do we need to make changes to the Elasticsearch mapping?
If we don't change anything, will it affect the existing search capability?
Please share your opinions on how to resolve this.
Thanks in advance.
You can add new fields to an existing index by updating the mapping, but in many cases it would be fine to simply index documents with the new fields directly and let ES infer the types (although that is not always recommended). Which approach to take depends on what type of data you're indexing and whether you need special analyzers for strings.
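If you go the mapping-update route, a minimal sketch with the 1.x Java client could look like this (the type name "mytype" and the field definitions are only examples; adjust them to your schema):

import org.elasticsearch.client.Client;

public class AddFieldsToMapping {
    public static void addFields(Client client) {
        // Mapping fragment that adds two new fields to the existing type;
        // fields that already exist keep their current mapping.
        String mappingUpdate =
            "{\"properties\":{"
          + "\"newField1\":{\"type\":\"string\"},"
          + "\"newField2\":{\"type\":\"long\"}"
          + "}}";

        client.admin().indices()
              .preparePutMapping("MyIndex")
              .setType("mytype")
              .setSource(mappingUpdate)
              .get();
    }
}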

Updating a document and adding a new field in Elasticsearch

We have a use case where data is updated daily. Some attributes of existing documents change, and some records are new. Is it possible to reindex data with the updated values for documents that are already there, and also add the new records?
If yes, please explain how.
Is it done with the update API?
I am indexing like this:
String json = getJsonMapper().writeValueAsString(data);
bulkRequestBuilder.add(getClient().prepareIndex(indexName, typeName).setSource(json));
I am not passing any id. How can I update this? What is the best way?
Elasticsearch uses Apache Lucene underneath the covers. In Lucene documents are immutable.
You can use the Update API for your use case. This API does a delete and save underneath but that doesn't concern you. You can even update a part of the document, which means that Elasticsearch will retrieve the old document, generate the new one, delete the old one and save the new one.
The catch is that for all of this to work you need to use the same id. If you don't provide one, Elasticsearch will generate one for you when you use the Index API, which means the document will be saved as a new one.
The Update API needs the id, otherwise it doesn't know what to update.
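A minimal sketch of what that looks like against the snippet in the question (getId() is a hypothetical stand-in for whatever stable business key your data has):

String json = getJsonMapper().writeValueAsString(data);
String id = data.getId();  // hypothetical accessor: any stable, unique key works

// Full reindex under the same id replaces the previous version of the document
bulkRequestBuilder.add(
    getClient().prepareIndex(indexName, typeName, id).setSource(json));

// Or: partial update of only the changed attributes via the Update API
bulkRequestBuilder.add(
    getClient().prepareUpdate(indexName, typeName, id).setDoc(json));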

Does ElasticSearch Snapshot/Restore functionality cause the data to be analyzed again during restore?

I have a decent amount of data in my Elasticsearch index. I changed the default analyzer for the index, so essentially I need to reindex my data so that it is analyzed again with the new analyzer. Instead of writing a test script that deletes all of the existing data in the ES index and re-adds it, I wondered whether there is a backup/restore module I could use. I found the snapshot/restore module that ES supports - ElasticSearch-SnapshotAndRestore.
My question is: if I use the ES snapshot/restore module, will it actually cause the data to be re-analyzed? Since I changed the default analyzer, I need the data to be re-analyzed. If not, is there an alternate tool/module you would suggest that allows a pure export and import of data, so that the data is re-analyzed during import?
No it does not re-analyze the data. You will need to reindex your data.
Fortunately that's fairly straightforward with Elasticsearch as it by default stores the source of your documents:
Reindexing your data
While you can add new types to an index, or add new fields to a type, you can’t add new analyzers or make changes to existing fields. If you were to do so, the data that has already been indexed would be incorrect and your searches would no longer work as expected.
The simplest way to apply these changes to your existing data is just to reindex: create a new index with the new settings and copy all of your documents from the old index to the new index.
One of the advantages of the _source field is that you already have the whole document available to you in Elasticsearch itself. You don’t have to rebuild your index from the database, which is usually much slower.
To reindex all of the documents from the old index efficiently, use scan & scroll to retrieve batches of documents from the old index, and the bulk API to push them into the new index.
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/reindex.html
I'd read up on Scan and Scroll prior to taking this approach:
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/scan-scroll.html
TaskRabbit has open-sourced an import/export tool; I've not used it so I can't recommend it, but it is worth a look:
https://github.com/taskrabbit/elasticsearch-dump
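For the Java client used elsewhere on this page, a minimal scan-and-scroll reindex sketch might look like the following (ES 1.x APIs; index names and batch size are illustrative, and the new index is assumed to already exist with the new analyzer):

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class Reindexer {
    // Copy every document from oldIndex into newIndex so it is re-analyzed there
    public static void reindex(Client client, String oldIndex, String newIndex) {
        SearchResponse scroll = client.prepareSearch(oldIndex)
                .setSearchType(SearchType.SCAN)
                .setScroll(TimeValue.timeValueMinutes(2))
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(500)                      // hits per shard per batch
                .get();

        while (true) {
            scroll = client.prepareSearchScroll(scroll.getScrollId())
                    .setScroll(TimeValue.timeValueMinutes(2))
                    .get();
            if (scroll.getHits().getHits().length == 0) {
                break;                             // no more documents to copy
            }
            BulkRequestBuilder bulk = client.prepareBulk();
            for (SearchHit hit : scroll.getHits()) {
                bulk.add(client.prepareIndex(newIndex, hit.getType(), hit.getId())
                               .setSource(hit.getSourceAsString()));
            }
            bulk.get();
        }
    }
}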

Recommended way to store data in elasticsearch

I want to use Elasticsearch on my backend and I have a few questions:
My DB contains semi-structured data on products, i.e. each product may have different attributes.
I want to be able to search a text across most of the fields and also search a text on one specific field.
What is the recommended way to store the documents in ES? To store all text in one field (maybe using the _all feature) or leave it in different fields?
My concern with different fields is that I might end up with a lot of indexes (because I have many different product attributes).
I'm using Couchbase as my main DB.
What is the recommended way to move the documents from it to ES, assuming I need to make some modifications to the documents?
Should I update the index from my code explicitly or use an external tool?
Thanks,
It depends on how many docs you are indexing at a time. If the number of docs is large, say more than 2 million, then it's better to store everything in one field, which will save time while indexing.
If the number of docs indexed is small, then index them field by field and search on the _all field. This gives a clearer view of the data and will be really helpful for deciding what to display and what not to display.
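To make the trade-off concrete, here is a small sketch with the Java client (the "products" index name is hypothetical): one query goes through the _all field across every attribute, the other targets a single attribute.

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;

public class ProductSearch {
    // Free-text search across every indexed field via _all
    public static SearchResponse searchEverywhere(Client client, String text) {
        return client.prepareSearch("products")
                .setQuery(QueryBuilders.matchQuery("_all", text))
                .get();
    }

    // Search restricted to one specific attribute
    public static SearchResponse searchField(Client client, String field, String text) {
        return client.prepareSearch("products")
                .setQuery(QueryBuilders.matchQuery(field, text))
                .get();
    }
}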
