Elasticsearch: How to get the size of the result buckets

Here is my query:
GET _search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"serviceName.keyword": "directory-view-service"
}
},
{
"match": {
"path": "thewall"
}
},
{
"range": {
"#timestamp": {
"from": "now-31d",
"to": "now"
}
}
}
]
}
},
"aggs": {
"by_day": {
"date_histogram": {
"field": "date",
"interval": "7d"
},
"aggs": {
"byUserUid": {
"terms": {
"field": "token_userId.keyword",
"size": 150000
},
"aggs": {
"filterByCallNumber": {
"bucket_selector": {
"buckets_path": {
"doc_count": "_count"
},
"script": {
"inline": "params.doc_count <= 1"
}
}
}
}
}
}
}
}
}
I want my query to return every user who called my endpoint at least once over a 1-month range, split into 7-day intervals. So far, so good.
But the result contains a buckets array with 370 elements, and I only need to know the size of that array.
Is there a keyword for this, or how else can I handle it?
Thanks
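One client-side option, as a minimal sketch: take the length of each byUserUid buckets array after the response comes back. The aggregation names by_day and byUserUid come from the query above; the sample response, its keys, and the helper name count_user_buckets are hypothetical and only there to make the snippet runnable. A real response would come from whatever client you use.

```python
def count_user_buckets(response):
    """Return {interval label: number of byUserUid buckets} for a search response."""
    counts = {}
    for interval in response["aggregations"]["by_day"]["buckets"]:
        # key_as_string is the formatted date; fall back to the raw epoch key.
        label = interval.get("key_as_string", interval["key"])
        counts[label] = len(interval["byUserUid"]["buckets"])
    return counts


# Hypothetical, heavily trimmed response used only for illustration.
sample_response = {
    "aggregations": {
        "by_day": {
            "buckets": [
                {
                    "key_as_string": "2019-01-07",
                    "key": 1546819200000,
                    "byUserUid": {
                        "buckets": [
                            {"key": "u1", "doc_count": 1},
                            {"key": "u2", "doc_count": 1},
                        ]
                    },
                },
                {
                    "key_as_string": "2019-01-14",
                    "key": 1547424000000,
                    "byUserUid": {"buckets": [{"key": "u3", "doc_count": 1}]},
                },
            ]
        }
    }
}

bucket_counts = count_user_buckets(sample_response)
print(bucket_counts)
```

This still transfers every bucket over the wire, so it only avoids the counting work, not the response size.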

Related

ElasticSearch - Aggregation result not matching total hits

I have a query like the one below. It returns 320 results for the following conditions:
{
"size": "5000",
"sort": [
{
"errorDateTime": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"range": {
"errorDateTime": {
"gte": "2021-04-07T20:08:20.516",
"lte": "2021-04-08T00:08:20.516"
}
}
},
{
"bool": {
"should": [
{
"match": {
"businessFunction": "PriceUpdate"
}
},
{
"match": {
"businessFunction": "PriceFeedIntegration"
}
},
{
"match": {
"businessFunction": "StoreConnectivity"
}
},
{
"match": {
"businessFunction": "Transaction"
}
},
{
"match": {
"businessFunction": "SalesSummary"
}
}
]
}
}
]
}
},
"aggs": {
"genres_and_store": {
"terms": {
"field": "storeId"
},
"aggs": {
"genres_and_error": {
"terms": {
"field": "errorCode"
},
"aggs": {
"genres_and_business": {
"terms": {
"field": "businessFunction"
}
}
}
}
}
}
}
}
However, the aggregation results do not match. Many stores that I can see in the query results are not returned in the aggregation. What am I missing? My schema looks like this:
{
"errorDescription": "FTP Service unable to connect to Store to list the files for Store 12345",
"errorDateTime": "2021-04-07T21:01:15.040546",
"readBy": [],
"errorCode": "e004",
"businessFunction": "TransactionError",
"storeId": "12345"
}
Please let me know if I am writing the query incorrectly. I want to aggregate per store, per errorCode, and per businessFunction.
If no size param is set in the terms aggregation, it returns only the top 10 terms by default, ordered by doc_count. You need to add the size param to the terms aggregation to get buckets for all the matching stores.
Try out the query below:
{
"size": "5000",
"sort": [
{
"errorDateTime": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"range": {
"errorDateTime": {
"gte": "2021-04-07T20:08:20.516",
"lte": "2021-04-08T00:08:20.516"
}
}
},
{
"bool": {
"should": [
{
"match": {
"businessFunction": "PriceUpdate"
}
},
{
"match": {
"businessFunction": "PriceFeedIntegration"
}
},
{
"match": {
"businessFunction": "StoreConnectivity"
}
},
{
"match": {
"businessFunction": "Transaction"
}
},
{
"match": {
"businessFunction": "SalesSummary"
}
}
]
}
}
]
}
},
"aggs": {
"genres_and_store": {
"terms": {
"field": "storeId",
"size": 100 // note this
},
"aggs": {
"genres_and_error": {
"terms": {
"field": "errorCode"
},
"aggs": {
"genres_and_business": {
"terms": {
"field": "businessFunction"
}
}
}
}
}
}
}
}
I think I was missing the size parameter inside aggs and was getting only the default 10 buckets:
"aggs": {
"genres_and_store": {
"terms": {
"field": "storeId",
"size": 1000
},
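Note that the default of 10 applies to every terms aggregation separately, so in the queries above the nested errorCode and businessFunction levels are still capped at 10 buckets each. A small sketch of building the nested aggregation with an explicit size on every level follows; the helper name nested_terms and the by_<field> aggregation names are made up for illustration, and 1000 is just the value used above.

```python
def nested_terms(fields, size):
    """Build nested terms aggregations, outermost field first, each with an explicit size."""
    aggs = None
    # Work from the innermost field outwards so each level can wrap the previous one.
    for field in reversed(fields):
        level = {"terms": {"field": field, "size": size}}
        if aggs is not None:
            level["aggs"] = aggs
        aggs = {f"by_{field}": level}
    return aggs


aggs = nested_terms(["storeId", "errorCode", "businessFunction"], size=1000)
print(aggs)
```

The resulting dict can be placed under the "aggs" key of the search body in place of the hand-written block.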

ElasticSearch: Nested buckets aggregation

I'm new to ElasticSearch, so this question may be trivial for you, but here goes:
I'm using kibana_sample_data_ecommerce, whose documents have a mapping like this:
{
...
"order_date" : <datetime>
"taxful_total_price" : <double>
...
}
I want to get the basic daily behavior of the data, and I expect documents like this:
[
{
"qtime" : "00:00",
"mean" : 20,
"std" : 40
},
{
"qtime" : "01:00",
"mean" : 150,
"std" : 64
},
...
]
So, the process I think I need is:
Group all records by day ->
Group by time window within each day ->
Sum all records in each time window ->
Take the cumulative sum of each sum by time window, which gives the behavior of a day ->
Run extended_stats on the same time window across all days
And that can be expressed like this:
But I can't unwrap those buckets to compute those statistics. Could you give me some advice on how to do that operation and get that result?
Here is my current query (Kibana Dev Tools):
POST kibana_sample_data_ecommerce/_search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"order_date": {
"gt": "now-1M",
"lte": "now"
}
}
}
]
}
},
"aggs": {
"day_histo": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "day"
},
"aggs": {
"qmin_histo": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "hour"
},
"aggs": {
"qminute_sum": {
"sum": {
"field": "taxful_total_price"
}
},
"cumulative_qminute_sum": {
"cumulative_sum": {
"buckets_path": "qminute_sum"
}
}
}
}
}
}
}
}
Here's how you pull off the extended stats:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"order_date": {
"gt": "now-4M",
"lte": "now"
}
}
}
]
}
},
"aggs": {
"by_day": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "day"
},
"aggs": {
"by_hour": {
"date_histogram": {
"field": "order_date",
"calendar_interval": "hour"
},
"aggs": {
"by_taxful_total_price": {
"extended_stats": {
"field": "taxful_total_price"
}
}
}
}
}
}
}
}
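Both queries above nest the hour buckets inside the day buckets, so each hour bucket only ever sees one day. One way to get a mean and standard deviation per hour of day across all days is to regroup the buckets client-side. The sketch below assumes the response shape of the question's query (day_histo > qmin_histo > qminute_sum, or cumulative_qminute_sum); the helper names and the sample numbers are invented for illustration. It uses the population standard deviation.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean, pstdev


def hourly_profile(response, metric="qminute_sum"):
    """Group hourly bucket values by hour of day (UTC) and summarize across days."""
    by_hour = defaultdict(list)
    for day in response["aggregations"]["day_histo"]["buckets"]:
        for hour in day["qmin_histo"]["buckets"]:
            # Bucket keys are epoch milliseconds.
            ts = datetime.fromtimestamp(hour["key"] / 1000, tz=timezone.utc)
            by_hour[ts.strftime("%H:%M")].append(hour[metric]["value"])
    return [
        {"qtime": qtime, "mean": mean(values), "std": pstdev(values)}
        for qtime, values in sorted(by_hour.items())
    ]


def _ms(day, hour):
    """Epoch milliseconds for a made-up date in April 2021 (illustration only)."""
    return int(datetime(2021, 4, day, hour, tzinfo=timezone.utc).timestamp() * 1000)


def _hour_bucket(day, hour, value):
    return {"key": _ms(day, hour), "qminute_sum": {"value": value}}


# Hypothetical two-day response with two hourly buckets per day.
sample_response = {
    "aggregations": {
        "day_histo": {
            "buckets": [
                {"key": _ms(7, 0), "qmin_histo": {"buckets": [
                    _hour_bucket(7, 0, 10.0), _hour_bucket(7, 1, 100.0)]}},
                {"key": _ms(8, 0), "qmin_histo": {"buckets": [
                    _hour_bucket(8, 0, 30.0), _hour_bucket(8, 1, 200.0)]}},
            ]
        }
    }
}

profile = hourly_profile(sample_response)
print(profile)
```

Passing metric="cumulative_qminute_sum" would summarize the cumulative sums instead, provided that aggregation is present in the response.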

ElasticSearch query with prefix for aggregation

I am trying to add a prefix condition for my ES query in a "must" clause.
My current query looks something like this:
body = {
"query": {
"bool": {
"must":
{ "term": { "article_lang": 0 }}
,
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I need to add a mandatory condition to my query to filter articles whose id starts with "article-".
So far, I have tried this:
{
"query": {
"bool": {
"should": [
{ "term": { "article_lang": 0 }},
{ "prefix": { "article_id": {"value": "article-"} }}
],
"filter": {
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
I am fairly new to ES, and from the online documentation I know that "should" is used for "OR" conditions and "must" for "AND". This returns some data, but per the condition it consists of documents that have either article_lang=0 or an id starting with article-. When I use "must", it doesn't return anything.
I am certain that there are articles with id starting with this prefix because currently, we are iterating through this result to filter out such articles. What am I missing here?
In your prefix query, you need to use the article_id.keyword field, not article_id. Also, you should prefer filter over must since you're simply doing yes/no matching (aka filters):
{
"query": {
"bool": {
"filter": [ <-- change this
{
"term": {
"article_lang": 0
}
},
{
"prefix": {
"article_id.keyword": { <-- and this
"value": "article-"
}
}
},
{
"range": {
"created_time": {
"gte": "now-3h"
}
}
}
]
}
},
"aggs": {
"articles": {
"terms": {
"field": "article_id.keyword",
"order": {
"score": "desc"
},
"size": 1000
},
"aggs": {
"score": {
"sum": {
"field": "score"
}
}
}
}
}
}
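As for why the prefix on article_id finds nothing: assuming article_id is a text field analyzed with the default standard analyzer (the usual result of dynamic mapping, with a .keyword subfield next to it), an id such as "article-12345" is indexed as separate tokens, and none of them starts with "article-". The sketch below only approximates that tokenization with a regex to illustrate the idea; it is not the real analyzer, and the id is made up.

```python
import re


def rough_standard_tokens(text):
    """Very rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def prefix_matches_text_field(value, prefix):
    # The prefix is not analyzed, so it is compared against each indexed token as-is.
    return any(token.startswith(prefix) for token in rough_standard_tokens(value))


def prefix_matches_keyword_field(value, prefix):
    # A keyword field indexes the whole value as a single term.
    return value.startswith(prefix)


article_id = "article-12345"  # hypothetical id

print(rough_standard_tokens(article_id))
print(prefix_matches_text_field(article_id, "article-"))
print(prefix_matches_keyword_field(article_id, "article-"))
```

The _analyze API on the real index shows the actual tokens if you want to confirm this for your mapping.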

Filter based on scripted field in an aggregation

Here is my source code:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"range": {
"datetime": {
"gte": "2017-04-01T00:00:00.000Z",
"lte": "2018-03-31T23:59:59.999Z"
}
}
}
]
}
},
"aggs": {
"all_match": {
"filters": {
"filters": {
"all": {
"match_all": {}
}
}
},
"aggs": {
"jobs": {
"terms": {
"field": "job_num",
"size": 200000
},
"aggs": {
"latest_job": {
"top_hits": {
"size": 1,
"sort": [{"rec_date": "desc"}],
"script_fields": {
"reqd_flag":{
"script": {
"lang": "painless",
"inline": "params['_source']['required_flag'] == 'Y' ? 0 : 1"
}
}
}
}
}
}
}
}
}
}
}
I need to filter out the records returned in the 'latest_job' aggregation based on the value present in the field 'reqd_flag'.
NOTE: the 'all_match' aggregation was created as a workaround to apply buckets_path.
Any input / pointers / suggestions are appreciated.
Thank you in advance.
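If doing the filtering on the client is acceptable, one possible sketch is to walk the response and keep only the job buckets whose latest hit carries the wanted reqd_flag. It assumes the response shape produced by the query above (a keyed filters aggregation, and script fields returned as arrays under "fields" on each hit); the helper name, job numbers, and sample response are hypothetical.

```python
def jobs_with_flag(response, wanted):
    """Return job_num keys whose latest hit has reqd_flag == wanted."""
    buckets = response["aggregations"]["all_match"]["buckets"]["all"]["jobs"]["buckets"]
    selected = []
    for bucket in buckets:
        hits = bucket["latest_job"]["hits"]["hits"]
        if not hits:
            continue
        # Script field values come back as lists, even for a single value.
        flag_values = hits[0].get("fields", {}).get("reqd_flag", [])
        if flag_values and flag_values[0] == wanted:
            selected.append(bucket["key"])
    return selected


# Hypothetical, trimmed response for illustration only.
sample_response = {
    "aggregations": {
        "all_match": {
            "buckets": {
                "all": {
                    "doc_count": 3,
                    "jobs": {
                        "buckets": [
                            {"key": "J-1", "doc_count": 2, "latest_job": {
                                "hits": {"hits": [{"fields": {"reqd_flag": [0]}}]}}},
                            {"key": "J-2", "doc_count": 1, "latest_job": {
                                "hits": {"hits": [{"fields": {"reqd_flag": [1]}}]}}},
                        ]
                    },
                }
            }
        }
    }
}

kept = jobs_with_flag(sample_response, wanted=1)
print(kept)
```

With 200000 possible job buckets this moves a lot of data to the client, so it is a workaround rather than a server-side filter.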

Query elasticsearch with multiple numeric ranges

{
"query": {
"filtered": {
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"filter": {
"numeric_range": {
"time_taken": {
"gte": 10
}
}
}
}
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
I have to run this query 20 times because I want to know the notification times above each of the following thresholds: [10,30,60,120,240,300,600,1200..]. Right now, I am running a loop and making 20 queries to fetch this.
Is there a saner way to query elasticsearch once and get the results for each of these thresholds?
What you probably want is a "range aggregation".
Here is a possible query; you can add more ranges or alter them:
{
"size": 0,
"query": {
"match": {
"log_path": "message_notification.log"
}
},
"aggs": {
"intervals": {
"range": {
"field": "time_taken",
"ranges": [
{
"to": 50
},
{
"from": 50,
"to": 100
},
{
"from": 100
}
]
},
"aggs": {
"distinct_user_ids": {
"cardinality": {
"field": "user_id"
}
}
}
}
}
}
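Since the question asks for "above each threshold" rather than disjoint bands, the ranges can be left open-ended with only a "from" value, one per threshold. As far as I know the range aggregation accepts overlapping ranges like this, but treat that as an assumption to verify on your version. Below is a small sketch that generates the request body from the threshold list; the helper names and the ">=N" bucket keys are illustrative.

```python
import json


def threshold_ranges(thresholds):
    """One open-ended range per threshold; 'from' is inclusive, matching gte."""
    return [{"key": f">={t}", "from": t} for t in thresholds]


def build_body(thresholds):
    """Search body with one bucket, and one distinct user count, per threshold."""
    return {
        "size": 0,
        "query": {"match": {"log_path": "message_notification.log"}},
        "aggs": {
            "intervals": {
                "range": {
                    "field": "time_taken",
                    "ranges": threshold_ranges(thresholds),
                },
                "aggs": {
                    "distinct_user_ids": {"cardinality": {"field": "user_id"}}
                },
            }
        },
    }


body = build_body([10, 30, 60, 120, 240, 300, 600, 1200])
print(json.dumps(body, indent=2))
```

Each bucket in the response would then carry its own distinct_user_ids value, replacing the loop of separate queries.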
