Does Indexing rank-feature values in Elasticsearch cause a full update cycle? - elasticsearch

https://www.elastic.co/guide/en/elasticsearch/reference/current/rank-feature.html is a really cool way to quickly assist scoring results with values known at index time, but what if I need to update those values in the index a lot? Do rank-feature and rank-features cause a full update to a document (delete the whole document and then re-index it) when I update them?
Apologies if I messed anything up, I am new here! Thanks!

Documents in Elasticsearch (Lucene) are immutable, so any time you update a field, it requires a full re-index of the document. The field type shouldn't make a difference here.
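You can see this in practice by doing a partial update through the Update API and watching the document's _version counter increase; even though only one rank-feature field changes, the whole document is re-indexed under the hood. A sketch (the index name, document id, and pagerank field are just examples):

$ curl -XPOST 'http://localhost:9200/myindex/_update/1' -H 'Content-Type: application/json' -d '
{
  "doc": { "pagerank": 42.5 }
}'

Each such call bumps _version (and _seq_no), which reflects the delete-and-reinsert happening inside Lucene.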

Related

ElasticSearch updating all the documents

I am using Elasticsearch to implement an autocomplete feature. I have an API from which I get a list of all the values for autocomplete, and I put those as documents in Elasticsearch. The problem I am having is that those values could change, not very often but once a week.
I am thinking of deleting all the documents and re-indexing them once a week, much like the TTL of a cache. Is there any better way to achieve this?
Thank you in advance.
Maybe there is a more elegant way than deleting and re-inserting: you could create a new index xxxx_v2, put the new docs into xxxx_v2, then use an alias to point your app code at the new index, and finally delete the old index.
The idea is from https://www.elastic.co/blog/changing-mapping-with-zero-downtime.
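A minimal sketch of that alias swap, assuming made-up index names autocomplete_v1/autocomplete_v2 and an alias called autocomplete:

$ curl -XPOST 'http://localhost:9200/_aliases' -H 'Content-Type: application/json' -d '
{
  "actions": [
    { "add":    { "index": "autocomplete_v2", "alias": "autocomplete" } },
    { "remove": { "index": "autocomplete_v1", "alias": "autocomplete" } }
  ]
}'

Both actions in one _aliases request are applied atomically, so queries against the alias never hit a half-populated or missing index during the switch.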

Can multiple add/delete of document to an index make it inconsistent?

For a use-case, I'll need to add and remove multiple documents in an Elasticsearch index. My understanding is that the tf-idf or BM25 scores are affected by term frequencies that are calculated using the postings list (?)... But if I add and remove many documents in a day, will that affect the document/word statistics?
I've already gone through a lot of the APIs, but my untrained eyes could not spot whether this is the case, or whether there's a way for me to force Elasticsearch to update/recompute the index every day or so...
Any help would be appreciated.
Thanks
"The IDF portion of the score can be affected by deletions and modifications" the rest should be fine... (Igor Motov)
Link to discussion:
https://discuss.elastic.co/t/can-multiple-add-delete-of-document-to-an-index-make-it-inconsistent/137030
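If you want to check how many deleted documents are still inflating the index statistics, the cat indices API reports them (the index name here is just an example):

$ curl -XGET 'http://localhost:9200/_cat/indices/myindex?v&h=index,docs.count,docs.deleted'

The docs.deleted column counts documents that are marked deleted but not yet merged away; those are what can skew the IDF portion of the score until segment merging cleans them up.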

Is there a good way to track the timestamp when a particular Elastic Search Index gets updated?

Is there a good way to track when a particular index/type is updated with new documents? I have a use-case where we constantly update an index/type with new documents and was wondering what the recommended way is to go about this.
Link to a similar question asked earlier
I do understand how documents are stored and updated in a distributed system (i.e. keeping track of a timestamp for each document). But I was wondering: what's the recommended way of knowing when the index was updated because of a change, or does that not make any logical sense?

sometimes when adding new fields in index, they don't get indexed in elasticsearch

Let's say I have an index test which already exists. I want to add a new field newfield1 with some data for all documents in the database. Currently I am simply deleting everything and then re-inserting the data with the newfield1 data added in. I understand this isn't the most efficient way, but that's not my question right now.
Sometimes the data in newfield1 does not get indexed and I can't visualize it in Kibana. It's pretty annoying. Is there something wrong with what I'm doing?
NOTE: I CAN query this field in ElasticSearch which makes me think there's a problem with Kibana
Kibana caches the field mapping. Go to Settings -> Indices, select your index, and click the orange "Refresh" button.
Not much to go on here but first make sure your cluster is Green.
$ curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
If you are still struggling to understand the state of your cluster, then perhaps consider installing one of the plugins, like HQ: https://github.com/royrusso/elasticsearch-HQ

How exactly does elasticsearch versioning work?

My understanding was that Elasticsearch would store the latest copy of the document and just update the version field number. But I was playing around with a few thousand documents and had the need to index them repeatedly without changing any data in the documents. My thinking was that the index size would remain the same, but that wasn't the case... the index size seemed to increase.
This confused me a little bit, so I just wanted to seek clarification on the internal mechanism of versioning within Elasticsearch.
An update is a delete + insert Lucene operation behind the scenes.
But you should know that Lucene does not really delete the document; it only marks it as deleted.
To remove deleted docs, you have to optimize your Lucene segments.
$ curl -XPOST 'http://localhost:9200/twitter/_optimize?only_expunge_deletes=true'
See the Optimize API. Also have a look at the merge options. Merging segments happens behind the scenes at some point.
For a general overview of versioning support in Elasticsearch, please refer to the Elasticsearch Versioning Support.
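Note that in more recent Elasticsearch versions the _optimize endpoint was renamed to _forcemerge; on those versions the equivalent call would be something like:

$ curl -XPOST 'http://localhost:9200/twitter/_forcemerge?only_expunge_deletes=true'

This rewrites segments and physically drops the documents that were only marked as deleted, which is why the index size shrinks afterwards.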
