Elasticsearch: updating all the documents

I am using Elasticsearch to implement an autocomplete feature. I have an API that returns a list of all the autocomplete values, and I index those values as documents in Elasticsearch. The problem I am having is that those values can change, not very often, but roughly once a week.
I am thinking of deleting all the documents and indexing them again once a week, much like the TTL of a cache. Is there a better way to achieve this?
Thank you in advance.

A slightly more elegant approach than deleting and re-inserting: create a new index xxxx_v2, put the new documents into xxxx_v2, then switch an alias so that your application code points at the new index, and finally delete the old index.
The idea comes from https://www.elastic.co/blog/changing-mapping-with-zero-downtime.
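A minimal sketch of the alias flip, assuming the names xxxx, xxxx_v1, and xxxx_v2 are placeholders; the body is what you would POST to the _aliases endpoint, which applies both actions atomically:

```python
# Build the request body for an atomic alias swap (POST /_aliases).
# Index and alias names are illustrative placeholders.
def alias_swap_body(alias, old_index, new_index):
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

body = alias_swap_body("xxxx", "xxxx_v1", "xxxx_v2")
```

Because both actions are in one request, searches against the alias never see a moment where it points at no index.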

Related

Elastic Search: applying changes of analyzers/tokenizers/filters settings to existing indices

I'm quite new to ElasticSearch, so please forgive me if I overlook something obvious/basic.
I'd like to change the settings of analyzers/tokenizers/filters, and then apply them to existing indices. The only way I can come up with is the following:
1. Create a new index. Suppose you want to change the settings of the posts index; create a posts-copy index.
2. Reindex posts to posts-copy.
3. Delete posts.
4. Re-create the posts index, applying the new settings.
5. Reindex posts-copy back to posts.
6. Delete posts-copy.
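The steps above can be sketched as a sequence of REST calls, assuming a version of Elasticsearch that has the _reindex API (2.3+); on older versions step 2 and step 5 would be a scroll-and-bulk loop instead. The settings body here is only an example:

```python
# The re-index dance above, expressed as (HTTP method, path, JSON body)
# tuples against the Elasticsearch REST API. "new_settings" stands in
# for the analyzer/tokenizer/filter settings you want to apply.
def reindex_steps(new_settings):
    reindex = lambda src, dst: {"source": {"index": src}, "dest": {"index": dst}}
    return [
        ("PUT", "/posts-copy", {}),                             # 1. create a temp index
        ("POST", "/_reindex", reindex("posts", "posts-copy")),  # 2. copy the docs out
        ("DELETE", "/posts", None),                             # 3. drop the old index
        ("PUT", "/posts", {"settings": new_settings}),          # 4. recreate with new settings
        ("POST", "/_reindex", reindex("posts-copy", "posts")),  # 5. copy the docs back
        ("DELETE", "/posts-copy", None),                        # 6. clean up
    ]

steps = reindex_steps(
    {"analysis": {"analyzer": {"my_analyzer": {"type": "standard"}}}}
)
```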
I tried this using the Elasticsearch Ruby client, and it looked like it worked. However, I'm not sure whether this approach is really proper, and it feels quite redundant; there might be a more efficient way of applying new settings.
I would appreciate it if you could shed some light on this problem.
It depends on what type of changes you are making to the analyzers/tokenizers/filters. If you are changing them on existing fields, these are breaking changes, and you have to recreate the indices with the new settings (as you mentioned). But if you are adding new fields to the index and creating new settings for them, you don't have to recreate the index; these are called incremental changes.
Hope this helps.
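For the incremental case, a sketch of the call sequence: analyzer settings can only be changed on a closed index, so the order is close, update settings, open, then map the new field. The analyzer and field names (my_new_analyzer, summary) are examples, not from the question:

```python
# Incremental change: add a new analyzer and a new field that uses it,
# without recreating the index. Expressed as (method, path, body) tuples.
def incremental_change_steps(index):
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", {
            "analysis": {"analyzer": {"my_new_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase"]}}}}),
        ("POST", f"/{index}/_open", None),
        ("PUT", f"/{index}/_mapping", {
            "properties": {"summary": {
                "type": "text", "analyzer": "my_new_analyzer"}}}),
    ]

steps = incremental_change_steps("posts")
```

Existing fields keep their old analyzers; only documents indexed into the new field are analyzed with the new settings.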

Does Indexing rank-feature values in Elasticsearch cause a full update cycle?

https://www.elastic.co/guide/en/elasticsearch/reference/current/rank-feature.html is a really cool way to quickly boost scoring with values known at index time, but what if I need to update those values in the index a lot? Do rank_feature and rank_features cause a full update of a document (deleting the whole document and then re-indexing it) when I update them?
Apologies if I messed anything up, I am new here! Thanks!
Documents in Elasticsearch (Lucene) are immutable, so any time you update a field, the whole document is re-indexed. The field type shouldn't make a difference here.
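To make that concrete, here is a sketch (the index and field names, pagerank in particular, are invented for illustration): even a partial _update request that touches only a rank_feature field results in Elasticsearch fetching the source, merging the change, and writing a new version of the whole document.

```python
# Mapping a rank_feature field (PUT /my-index with this mapping body).
mapping = {"properties": {"pagerank": {"type": "rank_feature"}}}

# Partial-update body (POST /my-index/_update/<id>). It looks cheap,
# but internally the entire document is re-indexed, not just this field.
update_body = {"doc": {"pagerank": 42.0}}
```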

Will the crawler re-index the records after they are deleted?

I am working with StormCrawler 1.12.1 and Elasticsearch 6.5.2, and I need to increase the efficiency of my search engine. After indexing documents into Elasticsearch, I deleted some of them for security reasons. My question is: will StormCrawler grab the deleted URLs again and re-index them? I don't want the deleted records to be re-crawled. How can I achieve this?
I assume you deleted the documents from the content index. They are probably still in the status index, and even if they are not, they might be rediscovered and added back.
The best thing to do is to add new entries to whichever flavour of URL filters you are using so that these URLs are covered; that way they won't be added back if rediscovered. Then delete them from the status index.
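As a sketch, if you are using StormCrawler's Nutch-style regex URL filter, the rules file could gain deny entries like the ones below. The file name and the URL patterns here are illustrative; check which filter classes your urlfilters.json actually configures and edit the corresponding resource:

```
# Example regex filter rules (e.g. in default-regex-filters.txt):
# a leading "-" rejects matching URLs, a leading "+" accepts them.
# Patterns below are placeholders for the deleted URLs.
-^https?://example\.com/private/.*
-^https?://example\.com/secret-report\.pdf$
# accept everything else
+.
```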

How to delete data from Elasticsearch through the Java API

I'm trying to find out how to delete data from Elasticsearch according to a criterion. I know that older versions of Elasticsearch had a Delete By Query feature, but it had really serious performance issues, so it was removed. I also know that there is a Java plugin for delete-by-query:
org.elasticsearch.plugin:delete-by-query:2.2.0
But I don't know whether its implementation of delete performs better, or whether it is the same as the old one.
Also, someone suggested using scroll to remove data, but I only know how to retrieve data by scrolling, not how to use scroll to remove it!
Does anyone have an idea? (The number of documents to remove in one call would be huge, over 50k documents.)
Thanks in advance!
In the end I used this guy's third option.
You are correct that you want to use scroll/scan. Here are the steps:
1. Begin a new scroll/scan.
2. Get the next N records.
3. Take the IDs from each record and do a bulk delete of those IDs.
4. Go back to step 2.
So you don't delete using the scroll/scan itself; you just use it as a tool to get the IDs of all the records you want to delete. This way you are only deleting N records at a time rather than all 50,000 in one chunk (which would cause you all kinds of problems).
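The loop above can be sketched as follows. To keep the control flow visible, the cluster is faked with an in-memory set; in real code, the page fetch would be the scroll API and bulk_delete would be a bulk request of delete actions:

```python
# Sketch of scroll-then-bulk-delete: page through matching IDs N at a
# time and delete each page, instead of deleting 50k docs in one call.
# "bulk_delete" stands in for a real bulk request of delete actions.
def delete_by_scroll(all_matching_ids, bulk_delete, page_size=1000):
    deleted = 0
    pages = [all_matching_ids[i:i + page_size]
             for i in range(0, len(all_matching_ids), page_size)]
    for page in pages:        # step 2: get the next N records
        bulk_delete(page)     # step 3: bulk-delete those IDs
        deleted += len(page)  # step 4: loop until the scroll is exhausted
    return deleted

# Tiny in-memory stand-in for the cluster:
store = {f"doc-{i}" for i in range(50_000)}
count = delete_by_scroll(sorted(store), store.difference_update, page_size=1000)
```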

Reindexing Elasticsearch or updating indices?

I am new to Elasticsearch, and I can't figure out how to update an Elasticsearch index, type, or document without deleting and reindexing. Or is that the best way to achieve it?
So if I have products in my SQL product table, is it better to delete the product type and reindex it, or even to reindex the entire DB as an index on Elasticsearch? What is the best approach, and how can I achieve it?
I would like to do it with NEST preferably, but if it is easier, the Elasticsearch API works for me as well.
Thanks
This can be a real challenge! Historic records in Elasticsearch will need to be reindexed when the template changes. New records are automatically formatted according to the template you specify.
Using this link has helped us a lot:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-templates.html
You'll want to be sure to have the logstash filter set up to match the fields in your template.
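A minimal sketch of an index template body (the _template API described at that link; the pattern and field names below are examples, not from the question). New indices whose names match index_patterns pick up these settings and mappings at creation time; existing indices must be reindexed:

```python
# Legacy index template body (PUT /_template/my-logs). The fields here
# are the ones your Logstash filter output would need to match.
template_body = {
    "index_patterns": ["logs-*"],
    "settings": {"number_of_shards": 1},
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "message": {"type": "text"},
            "client_ip": {"type": "ip"},
        }
    },
}
```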
