Elasticsearch search api to get total hit count? - elasticsearch

I have a use case:
I need to use the _search API to fetch a large number of records in a paginated way.
At the same time, I want to get the total hit count in the same _search API call.
Example:
The page size is 50, that is, I want to fetch results in batches of 50. At the same time, I want each search call to return the total hit count, say 5000.
I have 2 questions:
Is this possible? Can I get the total hit count as part of the result of a _search API call?
Would the total hit count be affected by the pagination?

You can get the total hit count from the search API by adding the track_total_hits=true option:
GET localhost:9200/_search?pretty&track_total_hits=true
If you paginate the search API with from=X&size=50, then yes, the number of matching docs can change while you are paginating. How often that happens depends on the refresh interval, because newly indexed documents only become searchable after a refresh; increasing the refresh interval reduces it. Another solution to this problem is the point in time (PIT) API:
https://www.elastic.co/guide/en/elasticsearch/reference/current/point-in-time-api.html
Also, pagination with from=X&size=50 has a limit (I think you can only fetch 10000 docs this way). You can raise this limit, or use the scroll API.
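As a rough illustration of the from/size approach described above, here is a minimal Python sketch that builds the URL for one page with track_total_hits=true and refuses pages that would go past the 10000 limit. The helper names (page_params, search_url) and the 0-based page numbering are assumptions made for this example, not part of Elasticsearch.

```python
from urllib.parse import urlencode

# Default value of index.max_result_window; from + size may not exceed it.
MAX_RESULT_WINDOW = 10_000


def page_params(page, size=50, max_window=MAX_RESULT_WINDOW):
    """Return query-string parameters for one from/size page (page is 0-based)."""
    start = page * size
    if start + size > max_window:
        raise ValueError(
            f"from + size = {start + size} exceeds the result window of {max_window}"
        )
    # track_total_hits=true asks for an exact total on every page.
    return {"from": start, "size": size, "track_total_hits": "true"}


def search_url(base, page, size=50):
    """Build the _search URL for the given page (hypothetical helper)."""
    return f"{base}/_search?{urlencode(page_params(page, size))}"


print(search_url("http://localhost:9200", 2))
```

Every page request carries the same track_total_hits flag, so each response reports the total for the whole query, not just for the current page.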

(Image from the Search API section of the ES docs.) You can use hits -> total in the response.
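To make hits -> total concrete, here is a small Python sketch that reads it from an already parsed response. It assumes the 7.x response shape, where hits.total is an object with "value" and "relation", and falls back to the plain integer that 6.x returned. The function name read_total is made up for this example.

```python
def read_total(response):
    """Return (total, is_exact) from a parsed _search response dict."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # 7.x shape: relation is "eq" for an exact count, "gte" for a lower bound.
        return total["value"], total["relation"] == "eq"
    # 6.x shape: hits.total is a plain integer.
    return total, True


sample = {"hits": {"total": {"value": 5000, "relation": "eq"}, "hits": []}}
print(read_total(sample))
```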

Related

Elastic Search Version 7.17 Java Rest API returns incorrect totalElements and total pages using queryBuilder

We are currently upgrading our system from ElasticSearch 6.8.8 to ElasticSearch 7.17. When we run pageable queries using the Java Rest API, the results are incorrect.
For example, in version 6.8.8, if we query for data and request page 2 with a page size of 10, the query returns the 10 items on page 2 and gives us a totalElements value of 10000 records, which is correct. When we run the exact same query on version 7.17, it returns 10 items on page 2 but only gives us a totalElements value of 10 instead of the correct number. We need the correct number so that our grid view handles paging correctly. Is there a setting I am missing in Elasticsearch 7.17?
Elasticsearch introduced the track_total_hits option for all searches in ES 7.x.
Generally the total hit count can’t be computed accurately without visiting all matches, which is costly for queries that match lots of documents. The track_total_hits parameter allows you to control how the total number of hits should be tracked. Given that it is often enough to have a lower bound of the number of hits, such as "there are at least 10000 hits", the default is set to 10,000. This means that requests will count the total hit accurately up to 10,000 hits. It is a good trade-off to speed up searches if you don’t need the accurate number of hits after a certain threshold.
So to force ES to count all matching documents, you should set track_total_hits to true. For more information, see the official ES documentation on track_total_hits.
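For a paged grid like the one in the question, the page count follows directly from the reported total. The Python sketch below (the question uses Java; Python is used here only for illustration) shows the arithmetic and flags the case where the total is just a lower bound. The function name page_info and the returned keys are assumptions modeled on the question's wording.

```python
import math


def page_info(total_value, relation, page_size):
    """Derive paging numbers from hits.total.value and hits.total.relation."""
    return {
        "totalElements": total_value,
        "totalPages": math.ceil(total_value / page_size),
        # "gte" means the count stopped at the tracking threshold,
        # so the real number of pages may be higher.
        "exact": relation == "eq",
    }


print(page_info(10000, "gte", 10))
print(page_info(25, "eq", 10))
```

With track_total_hits set to true the relation should always be "eq", so the page count can be trusted.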

Grafana with Elastic - Show requests count together with average response time

I'm new to Grafana and I'm trying to create a graph that shows the request count together with the average response time of the requests. I was able to create the request count, but now I'm struggling to add the response time. Is there an option to show both in one panel, or do I need to create two panels, one with the request count and another with the average time?
Another question: is there an option to show the average time in milliseconds?

ElasticSearch paginate over 10K result

Elasticsearch's search feature only supports 10K results by default. I know I can specify the "size" parameter in the search query, but this only applies to the number of results returned in one call.
If I want to iterate over 20K results using size=100, making 200 calls in total, how should I do it?

ES returns only 10 records by default. How to get all records without using scroll API

When we query ES for records, it returns 10 records by default. How can I get all the records in the same query without using the scroll API?
There is an option to specify the size, but the size is not known in advance.
You can retrieve up to 10k results in one request (setting "size": 10000). If you have fewer than 10k matching documents, you can paginate over them using a combination of the from/size parameters. If there are more, you have to use other methods:
Note that from + size can not be more than the index.max_result_window index setting which defaults to 10,000. See the Scroll or Search After API for more efficient ways to do deep scrolling.
To paginate over an unknown number of documents, you will have to get the total count from the response to the first query.
Note that if there are concurrent changes in the data, results of paginated retrieval may not be consistent (for example, if one document gets inserted or deleted while you are paginating). Scroll is consistent since it is created from a "snapshot" of the index created at query start time.
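The Search After approach mentioned above can be sketched as follows in Python. The search callable is a placeholder for whatever client call sends the request body and returns the parsed JSON; the sort field "id" and the in-memory fake_search are assumptions used only to make the example runnable. A real sort should end in a unique tiebreaker field so that no document is skipped or repeated.

```python
def iterate_hits(search, query, sort, size=100):
    """Yield every hit by feeding the last hit's sort values into search_after."""
    body = {"size": size, "query": query, "sort": sort}
    while True:
        hits = search(body)["hits"]["hits"]
        if not hits:
            return  # an empty page means everything has been read
        yield from hits
        # Continue after the sort values of the last hit on this page.
        body = {**body, "search_after": hits[-1]["sort"]}


def fake_search(body, docs=tuple(range(250))):
    """In-memory stand-in for a real _search call, for demonstration only."""
    after = body.get("search_after", [-1])[0]
    page = [d for d in docs if d > after][: body["size"]]
    return {"hits": {"hits": [{"_id": str(d), "sort": [d]} for d in page]}}


all_hits = list(iterate_hits(fake_search, {"match_all": {}}, [{"id": "asc"}]))
print(len(all_hits))
```

Because each request only asks for the next size hits after a known position, from stays at 0 and the index.max_result_window limit on from + size is never reached.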

Query for the lack of requests in specific points in time

I have an Elasticsearch/Kibana stack that stores every request the application receives. It stores general information about the request (RequestTimestamp, IP, Headers, HttpStatus, Route etc.), and there are at least a few requests per minute.
I would like to know if there's some way to query Kibana/Elastic to find the points in time when the application didn't receive any request for, let's say, 3 minutes.
I know it can be done programmatically, but it needs to be done purely with queries (so I can show it on the Dashboard).
You could use a date histogram aggregation.
You could specify 3m interval and query for a specified day.
So you would get 24*60/3 = 480 values for each day.
You could plot it on the chart and see the gaps.
If you are an expert ES user, you could try filtering the buckets with a bucket selector pipeline aggregation, or smooth them with a moving average aggregation.
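A minimal Python sketch of the date histogram idea, assuming the timestamp field is RequestTimestamp as in the question and an Elasticsearch version that accepts fixed_interval (older versions use interval instead). The aggregation name requests_over_time and both helper names are made up for this example. The second function finds the empty 3-minute buckets on the client side, which is the programmatic route the question wants to avoid, but it shows what the gaps look like in the response.

```python
def gap_histogram_body(start, end, field="RequestTimestamp", interval="3m"):
    """Build a search body that counts requests per fixed interval."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {field: {"gte": start, "lt": end}}},
        "aggs": {
            "requests_over_time": {
                "date_histogram": {
                    "field": field,
                    "fixed_interval": interval,
                    "min_doc_count": 0,  # keep empty buckets so gaps are visible
                }
            }
        },
    }


def empty_buckets(response, agg="requests_over_time"):
    """Return the start times of buckets that contain no requests."""
    buckets = response["aggregations"][agg]["buckets"]
    return [b["key_as_string"] for b in buckets if b["doc_count"] == 0]


sample = {"aggregations": {"requests_over_time": {"buckets": [
    {"key_as_string": "2024-01-01T00:00:00.000Z", "doc_count": 12},
    {"key_as_string": "2024-01-01T00:03:00.000Z", "doc_count": 0},
    {"key_as_string": "2024-01-01T00:06:00.000Z", "doc_count": 4},
]}}}
print(empty_buckets(sample))
```

Note that a date histogram only fills gaps between the first and last non-empty bucket; gaps at the very start or end of the queried range are reported only if extended_bounds is also set.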
