Elasticsearch - get last second changes

How do I get all changes that were indexed in my Elasticsearch cluster within the last second?
I've tried adding a timestamp and querying on it, but indexing some of the items took a few seconds, so these items were missing from the result. (I didn't get them in the next second either, since the timestamp refers to the start of the indexing process.)

Related

Elasticsearch refresh interval when index is not searched

The documentation on refreshes says:
By default, Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds.
What happens if the index was not queried in the last 30 seconds? When does it get refreshed? If, for example, I write to an index, don't search it for a long time and then search it, will I get up-to-date results? When I write, I get the results immediately on the first search, so I seem to be misunderstanding something.

elasticsearch - refresh interval of one second

I am aware of how refresh works and that a refresh happens every second by default. What still confuses me is this:
Does it mean that data of any size will appear in search after exactly one second, or that it will take at least one second for the searcher to see the new documents?
From the documentation: "The default refresh interval is one second for indices that receive one or more search requests in the last 30 seconds." It doesn't seem to apply to all indices. Can someone explain what "indices that receive one or more search requests in the last 30 seconds" really means, and what happens to the other indices that didn't receive a search request in the last 30 seconds?
Really nice question, let me try to explain.
1. Does it mean that data of any size will appear in search after exactly one second, or that it will take at least one second for the searcher to see the new documents?
Answer: The size of the data has nothing to do with it. Refresh is simply a background process in Elasticsearch that writes data from the in-memory buffer (which is not available to searches) into a new segment (hopefully you know what segments are in ES and Lucene), so that it becomes available to searches. A newly indexed document becomes searchable at the next refresh, so typically within about a second of being indexed rather than after exactly one second.
2. The default refresh interval is one second for indices that receive one or more search requests in the last 30 seconds.
Answer: This is an optimization in Elasticsearch to reduce the overhead of refreshing (explained above). A refresh only matters when you search, because its purpose is to make the latest data visible to searches. So if an index has not received any search request in the last 30 seconds, ES can skip the periodic refresh on it, even though its refresh interval is 1 second.
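The skipping rule described in this answer can be sketched as a small toy model. This is only an illustration of the documented behaviour, not Elasticsearch's actual implementation; the function name and parameters are hypothetical. It assumes that the 30 second threshold corresponds to the `index.search.idle.after` setting and that the skipping only applies when `index.refresh_interval` has not been set explicitly.

```python
def background_refresh_runs(seconds_since_last_search: float,
                            idle_after: float = 30.0,
                            refresh_interval_explicit: bool = False) -> bool:
    """Toy model: does the periodic (background) refresh still run on a shard?

    Hypothetical helper for illustration only, not an Elasticsearch API.
    """
    if refresh_interval_explicit:
        # Assumption: an explicitly configured index.refresh_interval is always
        # honored, whether or not the index is being searched.
        return True
    # With the default setting, a shard counts as "search idle" once no search
    # has reached it for idle_after seconds, and periodic refreshes are skipped.
    return seconds_since_last_search < idle_after


print(background_refresh_runs(5))    # searched 5 seconds ago
print(background_refresh_runs(45))   # no search for 45 seconds
print(background_refresh_runs(45, refresh_interval_explicit=True))
```

In this model an idle index simply stops refreshing in the background. The first search that arrives on such an index still returns up-to-date results, because that search waits for (or triggers) a refresh before it is answered.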

elastic query returns same results after insert

I'm using elasticsearch.js to move a document from one index to another.
1a) Query index_new for all docs and display on the page.
1b) Use query of index_old to obtain a document by id.
2) Use an insert to index_new, inserting result from index_old.
3) Delete document from index_old (by id).
4) Requery index_new to see all docs (including the new one). However, at this point, it returns the same list of results as returned in 1a. Not including the new document.
Is this because of caching? When I refresh the whole page and 1a is triggered again, the new document is there, but not without a page refresh.
Thanks,
Daniel
This is due to the segment refreshing and merging that happens inside the Elasticsearch indexes, per shard and replica.
Whenever you write to the index, you never write to the original index file. Instead you write to newer, smaller files called segments, which are then merged into bigger files by background batch jobs. A new segment only becomes visible to searches after a refresh.
The next question you might have is:
How often does this happen, and how can I control it?
There is an index-level setting called refresh_interval. It can take different values depending on the strategy you want to use.
refresh_interval -
-1 : Stops Elasticsearch from refreshing automatically, so that you control it yourself with the _refresh API.
X : a time interval such as 1s or 30s. Elasticsearch will refresh the index once per interval.
If you have replication enabled on your indexes, you might also see result values toggling. This happens because an index has multiple shards and a shard has multiple replicas, and different replicas refresh at different moments. While you are querying, requests are routed to different shard copies, which can show different states within that time window.
Hence, if you use a periodic refresh interval of X seconds, expect to reach a consistent state within the next X to 2X seconds at most.
Segment Merge Background details
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.4/indices-update-settings.html
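As a rough sketch of the options discussed in this answer, the following Python builds the three relevant HTTP requests without sending them: changing refresh_interval, calling the _refresh API manually, and indexing a document with the `refresh=wait_for` parameter so that the call returns only once the document is searchable. The base URL, the function names and the `/<index>/_doc/<id>` URL layout (Elasticsearch 7 or later) are assumptions for illustration. The question itself uses elasticsearch.js, whose calls are not shown here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumption: a local cluster without auth


def refresh_interval_request(index: str, interval: str) -> Request:
    """PUT /<index>/_settings with a new refresh_interval ("1s", "30s" or "-1")."""
    body = json.dumps({"index": {"refresh_interval": interval}}).encode("utf-8")
    return Request(
        f"{BASE_URL}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def manual_refresh_request(index: str) -> Request:
    """POST /<index>/_refresh makes everything indexed so far searchable."""
    return Request(f"{BASE_URL}/{index}/_refresh", method="POST")


def index_document_request(index: str, doc_id: str, document: dict,
                           refresh: str = "wait_for") -> Request:
    """PUT /<index>/_doc/<id>?refresh=... (URL layout assumes Elasticsearch 7+)."""
    query = urlencode({"refresh": refresh})
    return Request(
        f"{BASE_URL}/{index}/_doc/{doc_id}?{query}",
        data=json.dumps(document).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = index_document_request("index_new", "42", {"title": "moved document"})
print(request.get_method(), request.full_url)
```

Each object can be sent with `urllib.request.urlopen(request)` against a running cluster. For the move-a-document workflow in the question, indexing with `refresh=wait_for` (or refreshing index_new once before step 4) is usually cheaper than lowering refresh_interval for the whole index.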

Last updated time for an index in Elasticsearch

I have a use case where I ran batch code to first create and then subsequently update my index in Elasticsearch.
My program crashed prematurely, and now I want to know when the last update was made to my Elasticsearch index.
Is there any API which could give me the last update time of the index?
I have not been able to find any such resources. I looked specifically in https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-stats.html
and tried,
curl http://{myhost}/{indexName}/_stats

Getting an index's item count with ElasticSearch

I am writing some code where we are inserting 200,000 items into an ElasticSearch index.
Whilst this works fine, when we get a count of items in the index to ascertain everything went in, we are not getting the same number. However, if we wait a second or two, the count is correct.
Therefore, is there a programmatic way we can get a real count from ElasticSearch without having to sleep or similar?
Newly indexed records become visible in search results only after a refresh operation. Refresh is called automatically with the frequency specified by the index.refresh_interval setting, which is 1s by default. When writing Elasticsearch tests, it's customary to call refresh after indexing to make sure that all indexed records are available to searches. However, excessive refresh calls (after each record, for example) in production code might hamper Elasticsearch indexing performance.
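A minimal sketch of that refresh-then-count pattern, using only the Python standard library, might look as follows. The base URL and the helper names are assumptions for illustration. The transport is passed in as a function so that the logic can be exercised without a running cluster.

```python
import json
from typing import Callable
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:9200"  # assumption: a local cluster without auth


def http_send(method: str, url: str) -> dict:
    """Send a body-less request and decode the JSON response."""
    with urlopen(Request(url, method=method)) as response:
        return json.load(response)


def refreshed_count(index: str,
                    send: Callable[[str, str], dict] = http_send) -> int:
    """Refresh the index once, then return the document count from _count."""
    # One refresh after the whole bulk load, not one per document.
    send("POST", f"{BASE_URL}/{index}/_refresh")
    return send("GET", f"{BASE_URL}/{index}/_count")["count"]


# Example with a stand-in transport instead of a real cluster:
def fake_send(method: str, url: str) -> dict:
    return {"count": 200000} if url.endswith("/_count") else {"_shards": {}}


print(refreshed_count("items", send=fake_send))
```

Calling `refreshed_count("items")` with the default transport would issue the two requests against the assumed local cluster. The single refresh after the load replaces the sleep mentioned in the question.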

Resources