Elasticsearch refresh interval when index is not searched - elasticsearch

The documentation on refreshes says:
By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds.
What happens if the index was not queried in the last 30 seconds? When does it get refreshed? If I, for example, write to an index, don't search it for a long time, and then search it, will I get up-to-date results? When I write, I do get the results immediately on the first search, so I seem to be misunderstanding something.

Related

Why is ElasticSearch index searchable when refresh_interval is set to -1 on initial data upload?

I'm performing a large upload of data to an empty index.
This article suggests setting "refresh_interval=-1" and "number_of_replicas=0" to increase upload performance, and then restoring the original values afterwards.
The interesting thing is that even if I don't restore them, I can still send queries to the newly created index and get results.
I'd like to know why that is and what I got wrong. (My expectation was that I should get zero results because indexing is disabled.)
One more thing I'd like to understand: if I set refresh_interval back to its original value, do I need to execute the /_refresh operation?
By default, Elasticsearch periodically refreshes indices every second,
but only on indices that have received one search request or more in
the last 30 seconds. You can change this default interval using the
index.refresh_interval setting.
So the documentation says that when you send a search request, a refresh is triggered along with it. You can therefore still search your data, but the first search may be very slow or may miss some data. It is better to keep a refresh_interval set if you index new data into your indices.
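The bulk-load pattern discussed in this thread (disable periodic refresh, load, restore, refresh once) can be sketched as below. This is a minimal sketch using only Python's standard library. The cluster address `http://localhost:9200`, the index name `my-index`, the restored values `1s` and `1`, and the helper names are assumptions for illustration. The code only builds the HTTP requests and does not send them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local cluster address


def settings_request(index, refresh_interval, replicas):
    """Build a PUT /<index>/_settings request (hypothetical helper)."""
    body = {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def refresh_request(index):
    """Build a POST /<index>/_refresh request (hypothetical helper)."""
    return urllib.request.Request(url=f"{BASE_URL}/{index}/_refresh",
                                  method="POST")


# Before the bulk load: disable periodic refresh and replicas.
before = settings_request("my-index", "-1", 0)
# After the bulk load: restore the (assumed) original values.
after = settings_request("my-index", "1s", 1)
# Then force a single refresh so everything loaded is searchable.
refresh = refresh_request("my-index")

# To actually send a request: urllib.request.urlopen(before)
```

Sending `refresh` after the load makes the new documents visible without waiting for the next periodic refresh.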

elasticsearch - refresh interval of one second

I am aware of how refresh works and that a refresh happens every second by default. However, what confuses me is this:
Does it mean that data of any size will appear in search after exactly one second, or that it will take at least one second for the searcher to see the new documents?
From the documentation: "The default refresh interval is one second for indices that receive one or more search requests in the last 30 seconds." This doesn't seem to apply to all indices. Can someone explain what "indices that receive one or more search requests in the last 30 seconds" really means, and what happens to the other indices that didn't receive a search request in the last 30 seconds?
Really nice question, let me try to explain.
1. Does it mean that data of any size will appear in search after exactly one second, or that it will take at least one second for the searcher to see the new documents?
Answer: The size of the data has nothing to do with it. Refresh is simply a background process in Elasticsearch that writes data from the in-memory buffer (which is not available to searches) into segments (I hope you know what segments are in ES and Lucene), so that it becomes available for searches.
2. The default refresh interval is one second for indices that receive one or more search requests in the last 30 seconds.
Answer: This is a smart optimization by Elasticsearch to reduce the overhead of refreshing (explained above). If an index didn't get any search request in the last 30 seconds, there is no need to refresh it explicitly, because the latest data only needs to be visible when you search. Hence, on indices that have not received any search requests in the last 30 seconds, ES can skip the refresh, even if their refresh interval is 1 second.
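The rule described in this answer can be written down as a toy model. This is only a sketch of the quoted documentation sentence, not Elasticsearch's actual scheduler. The function name and parameters are hypothetical, all times are in seconds, and the defaults of 1 second and 30 seconds are taken from the quote.

```python
def background_refresh_due(now, last_refresh, last_search,
                           refresh_interval=1.0, idle_after=30.0):
    """Toy model: would a periodic refresh run at time `now`?

    Simplification of the documented rule: an index that has not been
    searched for more than `idle_after` seconds is skipped.
    """
    search_idle = (now - last_search) > idle_after
    if search_idle:
        return False  # no recent searches: skip the background refresh
    return (now - last_refresh) >= refresh_interval


# Searched 5 s ago, last refresh 1 s ago: a refresh is due.
recently_searched = background_refresh_due(now=100.0, last_refresh=99.0,
                                           last_search=95.0)
# Searched 40 s ago: the index is idle, so the refresh is skipped.
idle = background_refresh_due(now=100.0, last_refresh=99.0,
                              last_search=60.0)
```

In this model an idle index simply accumulates unrefreshed writes until something else refreshes it, such as the next search or an explicit refresh call.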

Elasticsearch incorrect document count after `scan` & `scroll` and then `refresh` & `flush`

I use scan and scroll to reindex from old_index to new_index. Right after the reindex completes, I do a refresh and flush, hoping that the data will be persisted to disk. I then immediately read the document counts of both indices. The new index will usually have zero documents. I have to repeatedly read the document counts in a loop (with a 1-second pause in each iteration). Only after 20 seconds or so will I see equal document counts. (The refresh interval of both indices was set to 30 seconds.)
From my understanding, after calling either refresh or flush, I should see the actual document counts. But that is not the case.
In Elasticsearch 1.6 you can use:
curl -XPOST localhost:9200/index/_flush/synced
to perform a synced flush and wait for it to finish.

Elastic search - get last second changes

How do I get all changes that were indexed in my Elasticsearch cluster within the last second?
I've tried adding a timestamp and querying on it, but indexing some of the items took a few seconds, and therefore these items were missing from the result (I didn't get them in the next second either, since the timestamp refers to the start of the indexing process).

Getting an indexes item count with ElasticSearch

I am writing some code where we are inserting 200,000 items into an ElasticSearch index.
Whilst this works fine, when we get a count of items in the index to ascertain everything went in, we are not getting the same number. However, if we wait a second or two, the count is correct.
Therefore, is there a programmatic way we can get a real count from ElasticSearch without having to sleep or similar?
Newly indexed records become visible in search results only after the Refresh operation. Refresh is called automatically at the frequency specified by the index.refresh_interval setting, which is 1s by default. When writing Elasticsearch tests, it's customary to call refresh after indexing to make sure that all indexed records are available in searches. However, excessive refresh calls (after each record, for example) in production code might hamper Elasticsearch indexing performance.
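The refresh-then-count approach from this answer might look like the sketch below, using only Python's standard library. The cluster address `http://localhost:9200` and the helper names are assumptions for illustration. `fetch_count` needs a running cluster, so nothing is sent when the block is merely loaded.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local cluster address


def refresh_then_count_requests(index):
    """Build POST /<index>/_refresh and GET /<index>/_count (hypothetical helper)."""
    refresh = urllib.request.Request(url=f"{BASE_URL}/{index}/_refresh",
                                     method="POST")
    count = urllib.request.Request(url=f"{BASE_URL}/{index}/_count",
                                   method="GET")
    return refresh, count


def fetch_count(index):
    """Refresh the index once, then return its document count."""
    refresh, count = refresh_then_count_requests(index)
    # Make everything indexed so far visible to searches.
    urllib.request.urlopen(refresh).close()
    with urllib.request.urlopen(count) as resp:
        # The _count response carries the total in its "count" field.
        return json.load(resp)["count"]


refresh_req, count_req = refresh_then_count_requests("my-index")
```

Calling refresh once after the whole batch, not after every record, keeps the indexing overhead mentioned above low.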
