Query returns fewer buckets than I intended in Elasticsearch

I am using the REST API to query Elasticsearch.
The request is shown below.
GET /..../_search
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "#timestamp": {
              "time_zone": "+09:00",
              "gte": "2023-01-24T00:00:00.000Z",
              "lt": "2023-01-24T03:03:00.000Z"
            }
          }
        },
        {
          "term": {
            "serviceid.keyword": {
              "value": "430011397"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "by_day": {
      "auto_date_histogram": {
        "field": "#timestamp",
        "minimum_interval": "minute"
      },
      "aggs": {
        "agg-type": {
          "terms": {
            "field": "nxlogtype.keyword",
            "size": 100000
          },
          "aggs": {
            "my-sub-agg-name": {
              "avg": {
                "field": "size"
              }
            }
          }
        }
      }
    }
  }
}
As you can see, I specified a time range of about three hours in the gte and lt fields.
However, the result contains only 6 buckets at 30-minute intervals.
I expected many buckets at one-minute intervals across the time range I specified, but the result is always the same, even when I extend the time range.
Since I am quite new to Elasticsearch, I am not familiar with its query usage.
How can I resolve this?
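
A likely explanation, offered as a sketch rather than a confirmed diagnosis: auto_date_histogram aims for a target number of buckets (the buckets parameter, 10 by default) and picks an interval that keeps the bucket count at or below that target, while minimum_interval only sets a lower bound. For a window of about three hours, 30 minutes is the smallest minute-level interval that stays under 10 buckets, which is consistent with the 6 buckets observed. If fixed one-minute buckets are wanted, a plain date_histogram with fixed_interval is one option. The Python below only builds the request body; the helper name build_minute_histogram_query and the aggregation name by_minute are hypothetical, and fixed_interval assumes Elasticsearch 7.2 or later.

```python
import json


def build_minute_histogram_query(gte, lt, service_id):
    """Build a search body that buckets matching documents per minute."""
    return {
        "size": 0,
        "query": {
            "bool": {
                "must": [
                    {
                        "range": {
                            "#timestamp": {
                                "time_zone": "+09:00",
                                "gte": gte,
                                "lt": lt,
                            }
                        }
                    },
                    {"term": {"serviceid.keyword": {"value": service_id}}},
                ]
            }
        },
        "aggs": {
            "by_minute": {
                # date_histogram uses exactly the interval it is given,
                # whereas auto_date_histogram chooses one to meet a bucket target.
                "date_histogram": {"field": "#timestamp", "fixed_interval": "1m"},
                "aggs": {
                    "agg-type": {
                        "terms": {"field": "nxlogtype.keyword", "size": 100000},
                        "aggs": {"my-sub-agg-name": {"avg": {"field": "size"}}},
                    }
                },
            }
        },
    }


body = build_minute_histogram_query(
    "2023-01-24T00:00:00.000Z", "2023-01-24T03:03:00.000Z", "430011397"
)
print(json.dumps(body, indent=2))
```

Alternatively, raising buckets on auto_date_histogram to at least the number of minutes in the range (183 here) should also permit one-minute buckets, although Elasticsearch still chooses the interval in that case.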

Related

Filter an elasticsearch result after an aggregation

I have this Elasticsearch query that gets every x-location for which the number of documents (with timestamp gte 1 month ago) is greater than 5000. I'm also able to get the most recent data timestamp for each of these x-locations.
Is it possible to add an additional filter at the end of the query, in order to ignore all x-locations for which the most recent timestamp is older than 2 days ago?
The query:
GET /mypattern-*/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"method": "GET"}},
        {
          "range": {
            "timestamp": {
              "gte": "now-1M"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "location_terms": {
      "terms": {
        "field": "x-location.keyword",
        "min_doc_count": 500,
        "size": 1000,
        "order": {
          "recent_timestamp": "desc"
        }
      },
      "aggs": {
        "recent_timestamp": {
          "max": {
            "field": "timestamp"
          }
        }
      }
    }
  }
}
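
One way to drop buckets after the aggregation, sketched here under assumptions, is a bucket_selector pipeline aggregation placed next to recent_timestamp. A max aggregation on a date field yields epoch milliseconds, so the cutoff is computed on the client and passed in as a script parameter. The helper name build_recent_locations_query, the aggregation name only_recent, and the parameter name cutoff_ms are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def build_recent_locations_query(now, max_age_days=2):
    """Keep only x-location buckets whose newest document is recent enough."""
    cutoff_ms = int((now - timedelta(days=max_age_days)).timestamp() * 1000)
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"method": "GET"}},
                    {"range": {"timestamp": {"gte": "now-1M"}}},
                ]
            }
        },
        "aggs": {
            "location_terms": {
                "terms": {
                    "field": "x-location.keyword",
                    "min_doc_count": 500,
                    "size": 1000,
                    "order": {"recent_timestamp": "desc"},
                },
                "aggs": {
                    "recent_timestamp": {"max": {"field": "timestamp"}},
                    # Pipeline step: discard buckets whose max timestamp
                    # (epoch milliseconds) is older than the cutoff.
                    "only_recent": {
                        "bucket_selector": {
                            "buckets_path": {"latest": "recent_timestamp"},
                            "script": {
                                "source": "params.latest >= params.cutoff_ms",
                                "params": {"cutoff_ms": cutoff_ms},
                            },
                        }
                    },
                },
            }
        },
    }


body = build_recent_locations_query(datetime.now(timezone.utc))
print(json.dumps(body, indent=2))
```

The selector runs after the terms aggregation has picked its top buckets, so it can only remove buckets from that set; it does not pull in further ones.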

Get very large total result count from pipeline aggregation

I have a query that I'm executing on an event table, which finds all productIds for product events where the active field changed from one date to another. This query returns an extremely large dataset, which I plan to paginate using partitions.
In order to know how large my partitions should be, I need a total count of docs returned by this query. However, if I run the query itself and return all of the docs, I unsurprisingly get a memory error (this occurs even if I use filter to return just the count).
Is there a way to process and return just the total result count?
{
  "query": {
    "bool": {
      "should": [
        {
          "range": {
            "timeRange": { "gte": "2022-05-22T00:00:00.000Z", "lte": "2022-05-22T00:00:00.000Z" }
          }
        },
        {
          "range": {
            "timeRange": { "gte": "2022-05-01T00:00:00.000Z", "lte": "2022-05-01T00:00:00.000Z" }
          }
        }
      ]
    }
  },
  "version": true,
  "aggs": {
    "total_entities": {
      "stats_bucket": {
        "buckets_path": "group_by_entity_id>distinct_val_count"
      }
    },
    "group_by_entity_id": {
      "terms": {
        "field": "productId",
        "size": 500000
      },
      "aggs": {
        "distinct_val_count": {
          "cardinality": {
            "field": "active"
          }
        },
        "distinct_val_count_filter": {
          "bucket_selector": {
            "buckets_path": {
              "distinct_val_count": "distinct_val_count"
            },
            "script": "params.distinct_val_count > 1"
          }
        }
      }
    }
  }
}
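
One possible approach, assuming the memory error comes from building all 500000 terms buckets in a single request, is to split the terms aggregation into partitions with include.partition and include.num_partitions, run one request per partition, and add up the surviving buckets on the client. The sketch below only builds the request bodies and sums bucket counts; the helper names, the per-partition size of 10000, the hits size of 0, and the sample responses are assumptions made for illustration, not output from a real cluster.

```python
import json


def build_partition_query(partition, num_partitions, query):
    """Request body for one partition of the productId terms aggregation."""
    return {
        "size": 0,  # assumption: the hits themselves are not needed for counting
        "query": query,
        "aggs": {
            "group_by_entity_id": {
                "terms": {
                    "field": "productId",
                    "include": {
                        "partition": partition,
                        "num_partitions": num_partitions,
                    },
                    # Assumption: must be at least the number of distinct
                    # productIds that fall into a single partition.
                    "size": 10000,
                },
                "aggs": {
                    "distinct_val_count": {"cardinality": {"field": "active"}},
                    "distinct_val_count_filter": {
                        "bucket_selector": {
                            "buckets_path": {
                                "distinct_val_count": "distinct_val_count"
                            },
                            "script": "params.distinct_val_count > 1",
                        }
                    },
                },
            }
        },
    }


def count_surviving_buckets(responses):
    """Sum the buckets left after bucket_selector across partition responses."""
    return sum(
        len(r["aggregations"]["group_by_entity_id"]["buckets"]) for r in responses
    )


# Made-up responses standing in for two partition requests.
fake_responses = [
    {"aggregations": {"group_by_entity_id": {"buckets": [{"key": 1}, {"key": 7}]}}},
    {"aggregations": {"group_by_entity_id": {"buckets": [{"key": 4}]}}},
]
print(json.dumps(build_partition_query(0, 20, {"match_all": {}}), indent=2))
print(count_surviving_buckets(fake_responses))
```

More partitions mean smaller responses per request at the cost of more round trips.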

Elasticsearch Pagination with timestamp range

The official Elasticsearch documentation says that pagination can be implemented with composite aggregations.
A composite aggregation is fetched over multiple requests to collect all results.
So my question is: can I use a range from now-1h to now when I execute a composite aggregation?
If I can, how does the composite aggregation keep the source data unchanged when every request resolves a different now?
If I can't, note that my query below runs without error and the result seems to be right.
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "timestamp": {
              "gte": "now-1h"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "user_device": {
      "composite": {
        "after": {
          "user_name": "alen.lv"
        },
        "size": 100,
        "sources": [
          {
            "user_name": {
              "terms": {
                "field": "user_name"
              }
            }
          }
        ]
      },
      "aggs": {
        "user_mac": {
          "terms": {
            "field": "user_mac",
            "size": 1000
          }
        }
      }
    }
  }
}
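
Each page of a composite aggregation is an independent search request, so a relative expression such as now-1h is resolved again for every page and the window can drift between pages. One way to keep the window stable, sketched here under assumptions, is to resolve the time range once on the client and send the same absolute timestamps with every page. The helper names fixed_window and build_page are hypothetical, and the timestamp format assumes the field uses the default date mapping.

```python
import json
from datetime import datetime, timedelta, timezone

TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def fixed_window(now, hours=1):
    """Resolve the relative range once so every page sees the same bounds."""
    now = now.astimezone(timezone.utc)
    return {
        "gte": (now - timedelta(hours=hours)).strftime(TS_FORMAT),
        "lte": now.strftime(TS_FORMAT),
    }


def build_page(window, after_key=None):
    """Request body for one composite page; after_key comes from the previous page."""
    composite = {
        "size": 100,
        "sources": [{"user_name": {"terms": {"field": "user_name"}}}],
    }
    if after_key is not None:
        composite["after"] = after_key
    return {
        "size": 0,
        "query": {"bool": {"filter": [{"range": {"timestamp": dict(window)}}]}},
        "aggs": {
            "user_device": {
                "composite": composite,
                "aggs": {"user_mac": {"terms": {"field": "user_mac", "size": 1000}}},
            }
        },
    }


window = fixed_window(datetime.now(timezone.utc))
first_page = build_page(window)
next_page = build_page(window, after_key={"user_name": "alen.lv"})
print(json.dumps(next_page, indent=2))
```

Fixing the bounds keeps the filter identical across pages; documents indexed or deleted inside that window while paging can still change the results.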

Need aggregation of only the query results

I need to run an aggregation over only the limited results I get from the query, but it is not working: it includes results outside the size limit of the query. Here is the query I am using:
{
  "size": 500,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "tags.keyword": "possiblePurchase"
          }
        },
        {
          "term": {
            "clientName": "Ci"
          }
        },
        {
          "range": {
            "firstSeenDate": {
              "gte": "now-30d"
            }
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "tags.keyword": "skipPurchase"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "firstSeenDate": {
        "order": "desc"
      }
    }
  ],
  "aggs": {
    "byClient": {
      "terms": {
        "field": "clientName",
        "size": 25
      },
      "aggs": {
        "byTarget": {
          "terms": {
            "field": "targetName",
            "size": 6
          },
          "aggs": {
            "byId": {
              "terms": {
                "field": "id",
                "size": 5
              }
            }
          }
        }
      }
    }
  }
}
I need the aggregations to consider only the first 500 results of the query, sorted by the field I specify in the query. I am completely lost. Thanks for the help.
The scope of an aggregation is every document that matches your query; the size parameter only controls how many hits are fetched and returned.
If you want to restrict the scope of the aggregation to the first n hits of a query, I would suggest the sampler aggregation in combination with your query.
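
As a rough sketch of that suggestion, the existing aggregations can be nested under a sampler aggregation. Two caveats apply: shard_size is a per-shard limit, and the sampler keeps the top-scoring documents rather than following the sort clause, so sampling the newest documents by firstSeenDate would require the relevance score to reflect that field. The helper name wrap_aggs_in_sampler and the aggregation name sample are hypothetical, and the body passed in below is a shortened stand-in for the query in the question.

```python
import copy
import json


def wrap_aggs_in_sampler(body, shard_size):
    """Return a copy of body whose aggregations run only on a per-shard sample."""
    wrapped = copy.deepcopy(body)
    wrapped["aggs"] = {
        "sample": {
            # Limits sub-aggregations to the top-scoring shard_size docs per shard.
            "sampler": {"shard_size": shard_size},
            "aggs": wrapped.get("aggs", {}),
        }
    }
    return wrapped


# Shortened stand-in for the request in the question.
original = {
    "size": 500,
    "query": {"term": {"tags.keyword": "possiblePurchase"}},
    "aggs": {"byClient": {"terms": {"field": "clientName", "size": 25}}},
}
sampled = wrap_aggs_in_sampler(original, shard_size=500)
print(json.dumps(sampled, indent=2))
```

In the response, the nested results would then appear under aggregations.sample instead of directly under aggregations.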

How can I count the number of documents where a field is within a certain range?

I am trying to build an Elasticsearch query that counts the number of documents where a certain field is within a certain range. This aggregation is nested inside a date histogram aggregation, but I don't think that matters for the purpose of this question.
Example Data:
ID: Score
01: 4
02: 5
03: 10
04: 9
I would like to count the number of documents where 'Score' is >= 9. I have tried scripts and filters within this aggregation, but I can't get it to work.
This aggregation counts all documents, not just the ones that match the script.
"aggs": {
  "report_days": {
    "date_histogram": {
      "field": "Date",
      "interval": "day"
    },
    "aggs": {
      "value_count": {
        "field": "Score",
        "script": "_value >=9"
      }
    }
  }
}
The following aggregation gives me a parse failure, saying Parse Failure [Expected [START_OBJECT] under [field], but got a [VALUE_STRING] in [value_count]]:
"aggs": {
  "report_days": {
    "date_histogram": {
      "field": "Date",
      "interval": "day"
    },
    "aggs": {
      "value_count": {
        "field": "Score",
        "filter": {
          "range": {
            "Score": {
              "gte": 9
            }
          }
        }
      }
    }
  }
}
Thanks for any suggestions!
This query will give you the number of docs with Score >= 9:
{
  "query": {
    "range": {
      "Score": {
        "gte": 9
      }
    }
  }
}
and this agg will do the same:
{
  "aggs": {
    "my agg": {
      "range": {
        "field": "Score",
        "ranges": [
          {
            "from": 9
          }
        ]
      }
    }
  }
}
Run the query ("Score:>=9") and check the hits.total value in the response. See the examples in the docs.
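
To get the count per day rather than one overall number, a filter aggregation can be nested under the date histogram; each daily bucket then carries its own doc_count for the matching documents. This is a sketch under assumptions: the helper names and the aggregation name high_scores are hypothetical, the sample response is made up for illustration, and the interval key follows the question (newer Elasticsearch versions use calendar_interval instead).

```python
import json


def build_daily_threshold_count(field="Score", threshold=9):
    """Per-day count of documents whose field value is >= threshold."""
    return {
        "size": 0,
        "aggs": {
            "report_days": {
                "date_histogram": {"field": "Date", "interval": "day"},
                "aggs": {
                    # A filter aggregation is a named bucket of its own,
                    # not an option inside value_count.
                    "high_scores": {
                        "filter": {"range": {field: {"gte": threshold}}}
                    }
                },
            }
        },
    }


def counts_per_day(response):
    """Map each day key to the number of documents that passed the filter."""
    buckets = response["aggregations"]["report_days"]["buckets"]
    return {b["key_as_string"]: b["high_scores"]["doc_count"] for b in buckets}


# Made-up response shape for illustration only.
fake_response = {
    "aggregations": {
        "report_days": {
            "buckets": [
                {"key_as_string": "day-1", "doc_count": 4, "high_scores": {"doc_count": 2}}
            ]
        }
    }
}
print(json.dumps(build_daily_threshold_count(), indent=2))
print(counts_per_day(fake_response))
```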
