How to terminate an Elasticsearch query when it times out? - elasticsearch

This is my HTTP POST query.
URL: http://127.0.0.1:9200/*-2023.02.*/_search?timeout=10ms
Request:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "event.code": "1"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "asc"
      }
    }
  ],
  "size": 10000
}
Response:
{
  "took": 1557,
  "timed_out": false,
  "_shards": {
    "total": 984,
    "successful": 984,
    "skipped": 826,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
May I ask why the time spent (took) is 1557ms when I set the timeout to 10ms?
How can I set a timeout so that Elasticsearch actually terminates the query?
Elasticsearch version: 7.8.1.

The timeout parameter applies per shard. If the time spent on one shard exceeds the timeout value, the search on that shard is cancelled and the hits gathered up to that point are returned (with timed_out set to true in the response).
As you can see, you have 984 shards, so if you had a single node with a single processor it could in theory take up to 9.84 seconds to return even with a 10ms timeout. That's probably not what happened here, since the query returned in about 1.5 seconds, but it illustrates that the timeout does not work the way you expect it to: it bounds the work done on each shard, not the total duration of the request.
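If the goal is to actually kill a runaway search rather than bound per-shard work, one option on 7.x is the task management API: list the running search tasks, then cancel one by ID. A minimal sketch (the task ID below is purely illustrative):
GET _tasks?actions=*search&detailed=true
POST _tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel
Since you are on 7.8, the async search API (POST <index>/_async_search) is another option; deleting the returned search ID with DELETE _async_search/<id> cancels the running search.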

Related

How do I use the whitespace analyzer correctly?

I am currently having an issue where I cannot search for UUIDs in my logs. For instance, I have a field named "log" which contains a full log line, for example:
"log": "time=\"2022-10-10T07:46:00Z\" level=info msg=\"message to endpoint (outgoing)\" message=\"{8503fb5a-3899-4305-8480-6ddc0f5df296 2022-10-10T09:45:59+02:00}\"\n",
I want to find this log in Elasticsearch, and via Postman I send this:
{
  "query": {
    "match": {
      "log": {
        "analyzer": "whitespace",
        "query": "8503fb5a-3899-4305-8480-6ddc0f5df296"
      }
    }
  },
  "size": 50,
  "from": 0
}
As a response I get:
{
  "took": 930,
  "timed_out": false,
  "num_reduce_phases": 2,
  "_shards": {
    "total": 581,
    "successful": 581,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 0,
      "relation": "eq"
    },
    "max_score": null,
    "hits": []
  }
}
But when I search for "8503fb5a" alone, I get the expected results. This means the dashes are still causing issues, but I thought using the whitespace analyzer would fix this? Am I doing something wrong?
You are not required to use the whitespace analyzer.
You have two options to search for the entire UUID.
First, you can use a match query with the operator set to and:
{
  "query": {
    "match": {
      "log": {
        "query": "8503fb5a-3899-4305-8480-6ddc0f5df296",
        "operator": "and"
      }
    }
  }
}
Second, you can use a match_phrase query, which searches for the exact phrase:
{
  "query": {
    "match_phrase": {
      "log": "8503fb5a-3899-4305-8480-6ddc0f5df296"
    }
  }
}
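To see why the hyphens behave this way, you can run the UUID through the _analyze API. Assuming the log field is indexed with the default standard analyzer:
POST _analyze
{
  "analyzer": "standard",
  "text": "8503fb5a-3899-4305-8480-6ddc0f5df296"
}
This returns the separate tokens 8503fb5a, 3899, 4305, 8480 and 6ddc0f5df296: the hyphens are dropped at index time, so a single whitespace-analyzed query token has nothing to match, while operator and and match_phrase match against all of the indexed tokens.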

Elasticsearch aggregation shows incorrect total

Elasticsearch version is 7.4.2
I suck at Elasticsearch and I'm trying to figure out what's wrong with this query.
{
  "size": 10,
  "from": 0,
  "query": {
    "bool": {
      "must": [
        {
          "exists": {
            "field": "firstName"
          }
        },
        {
          "query_string": {
            "query": "*",
            "fields": [
              "params.display",
              "params.description",
              "params.name",
              "lastName"
            ]
          }
        },
        {
          "match": {
            "status": "DONE"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "success": true
          }
        }
      ]
    }
  },
  "sort": {
    "createDate": "desc"
  },
  "collapse": {
    "field": "lastName.keyword",
    "inner_hits": {
      "name": "lastChange",
      "size": 1,
      "sort": [
        {
          "createDate": "desc"
        }
      ]
    }
  },
  "aggs": {
    "total": {
      "cardinality": {
        "field": "lastName.keyword"
      }
    }
  }
}
It returns:
"aggregations": {
"total": {
"value": 429896
}
}
So ~430k results, but in pagination we stop getting results around the 426k mark. Meaning, when I run the query with
{
  "size": 10,
  "from": 427000,
  ...
}
I get:
{
  "took": 2215,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "total": {
      "value": 429896
    }
  }
}
But if I change from to 426000, I still get results.
You are comparing the cardinality aggregation value of your field lastName.keyword to the total number of documents in the index, which are two different things.
You can check the total number of documents in your index using the count API. The from/size you define at the query level pages through the documents matching your search query, and since you don't set track_total_hits, the response reports 10000 with relation gte, meaning more than 10,000 documents match your search query.
As for your aggregation, it returns the count 429896 in both cases because the aggregation does not depend on the from/size you specify for your query.
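A minimal count request looks like this (it returns an exact count, unaffected by from/size; the index name is a placeholder):
GET your-index/_count
{
  "query": {
    "match_all": {}
  }
}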
I was surprised when I found out that the cardinality aggregation has precision control.
Setting it to the maximum value was the solution for me.
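A sketch of what that looks like with the aggregation from the question (40000 is the maximum value precision_threshold accepts):
"aggs": {
  "total": {
    "cardinality": {
      "field": "lastName.keyword",
      "precision_threshold": 40000
    }
  }
}
Counts up to the threshold are close to exact, which should close the gap between the reported 429896 and the actual number of distinct lastName values.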

Elasticsearch aggregation limitation

When I create an aggregation query, what scope is it applied to: all entries in the index, or just the first 10000?
For example, here is a response I got for a script metric aggregation:
{
  "took": 76,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 10000,
      "relation": "gte"
    },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "number_of_operations_in_progress": {
      "value": 2
    }
  }
}
hits->total->value is 10000 what makes me think that the aggregate function is applied to first 10000 entries only, not the whole data set in the index.
Is my understanding correct? If yes, is there a way to apply an aggregate function to all entries?
Aggregations are always applied to the whole document set that is selected by the query.
hits.total.value only gives a hint at how many documents match the query; in this case, more than 10,000 documents match.
You can use track_total_hits to control how the total number of hits is tracked:
POST index1/_search
{
  "track_total_hits": true,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "groupbyk1": {
      "terms": {
        "field": "k1"
      }
    }
  }
}

Building backend for reporting using elasticsearch: Is this query possible?

Is it possible to query Elasticsearch to sum the number of minutes an entry is in a given status, based on the datetimes, for a month?
For example, entries would be of the form:
Datetime      Cluster  Hosts_on  Hosts_off  Hosts_on_percentage
Oct 10 12:01  c101     10        2          .8333
Oct 10 12:02  c101     10        2          .8333
Oct 10 12:03  c101     10        2          .8333
Is it possible to sum the number of minutes c101 has had greater than 60% of hosts on, based on the datetime?
Not exactly, but you can get pretty close with something like this:
POST /test_index/_search?search_type=count
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "Cluster": "c101"
              }
            },
            {
              "range": {
                "Hosts_on_percentage": {
                  "gt": 0.6
                }
              }
            }
          ]
        }
      }
    }
  },
  "aggs": {
    "min_datetime": {
      "min": {
        "field": "Datetime"
      }
    },
    "max_datetime": {
      "max": {
        "field": "Datetime"
      }
    }
  }
}
With the data you posted, this query returns:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "max_datetime": {
      "value": 820980000,
      "value_as_string": "Jan 10 12:03"
    },
    "min_datetime": {
      "value": 820860000,
      "value_as_string": "Jan 10 12:01"
    }
  }
}
So then you could calculate the difference in the min and max time client-side.
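With the response above that is (820980000 - 820860000) / 60000 = 2 minutes between the first and last matching sample, or 3 minutes if each one-minute sample is counted inclusively.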
Or, if you just want a count of the documents returned, you can get it from:
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
Here is some code I used to test it (getting the date mapping right is important here):
http://sense.qbox.io/gist/c62289926a18e34b1b1b31e3643f36cbe5a7b4cf
You can definitely sum up minutes if they are stored in a field for each cluster and datetime. You have to use bucket and metrics aggregations; the condition could be expressed with a range aggregation.
I put the links below; I hope they give you an idea of how to solve this task :-)
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-range-aggregation.html
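For what it's worth, the filtered query used in the accepted answer was removed in Elasticsearch 5.0. On current versions a rough equivalent, assuming each document represents exactly one minute so that counting matching documents yields minutes, would be:
POST /test_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "Cluster": "c101" } },
        { "range": { "Hosts_on_percentage": { "gt": 0.6 } } }
      ]
    }
  },
  "aggs": {
    "minutes_above_threshold": {
      "value_count": {
        "field": "Datetime"
      }
    }
  }
}
Here minutes_above_threshold equals the number of one-minute samples above the threshold (3 for the example data).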

How can an aggregation be greater than the total number of hits?

I have records of the type
{
  "_index": "constant",
  "_type": "host",
  "_id": "AU7TX249tNLhGJRMfUXb",
  "_score": 1,
  "_source": {
    "private": true,
    "host-ip": "172.22.69.64"
  }
}
If I look for aggregates of private and host-ip via
POST constant/host/_search
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "test": {
      "cardinality": {
        "field": "host-ip"
      }
    },
    "test2": {
      "cardinality": {
        "field": "private"
      }
    }
  }
}
I get as a result
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 7730,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "test": {
      "value": 7860
    },
    "test2": {
      "value": 2
    }
  }
}
My understanding of the result above is the following:
there is a total of 7730 documents of type host in the index constant
there are two different values for private (this is expected)
What I do not understand is how it is possible to have 7860 distinct values of host-ip when the total number of documents in the index is 7730?
Is my understanding of total in hits correct?
The cardinality aggregation is not exact. As the docs say:
A single-value metrics aggregation that calculates an approximate count of distinct values.
So that's the reason behind the greater number.
You can play with the precision_threshold option to make the results more accurate, but it will consume more resources.
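A minimal sketch against the query above (40000 is the maximum value precision_threshold accepts):
POST constant/host/_search
{
  "size": 0,
  "aggs": {
    "test": {
      "cardinality": {
        "field": "host-ip",
        "precision_threshold": 40000
      }
    }
  }
}
With only 7730 documents, the threshold is then well above the true cardinality, so the reported count becomes near-exact.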
