Elasticsearch filter on aggregated results (SQL HAVING)

I have an ES query that aggregates data from a monitoring tool.
Currently, I've found the number of documents in each relevant group (by "externalId").
Now, I wish to filter the results by the number of records in each group.
(Similar to the "HAVING" clause in SQL, e.g. doc_count > 1.)
For instance, to find the "externalId" values that were stored more than once.
This is my ES query:
{
  "query": {
    "match": {
      "method": "METHOD_NAME"
    }
  },
  "size": 0,
  "aggs": {
    "group_by_external_id": {
      "terms": {
        "field": "externalId"
      }
    }
  }
}
The results look like this:
"aggregations": {
"group_by_external_id": {
"doc_count_error_upper_bound": 5,
"sum_other_doc_count": 53056,
"buckets": [
{
"key": "6088417651626873",
"doc_count": 1
},
{
"key": "6088417688232882",
"doc_count": 1
}

Terms aggregations have a min_doc_count option you can use. For example,
"aggs":
{
"group_by_external_id":
{
"terms":
{
"field": "externalId",
"min_doc_count": 2
}
}
}
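Plugged into the request from the question, the whole thing would look roughly like this (only min_doc_count is new; the match query and field name are unchanged):
{
  "query": {
    "match": {
      "method": "METHOD_NAME"
    }
  },
  "size": 0,
  "aggs": {
    "group_by_external_id": {
      "terms": {
        "field": "externalId",
        "min_doc_count": 2
      }
    }
  }
}
For HAVING-style conditions on metrics other than the plain document count, a bucket_selector pipeline aggregation is usually the closer analogue.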

Related

Filtering aggregation results

This question is a subquestion of this question. Posting as a separate question for attention.
Sample Docs:
{
  "id": 1,
  "product": "p1",
  "cat_ids": [1, 2, 3]
}
{
  "id": 2,
  "product": "p2",
  "cat_ids": [3, 4, 5]
}
{
  "id": 3,
  "product": "p3",
  "cat_ids": [4, 5, 6]
}
Ask: to get the products belonging to a particular category, e.g. cat_id = 3.
Query:
GET product/_search
{
  "size": 0,
  "aggs": {
    "cats": {
      "terms": {
        "field": "cat_ids",
        "size": 10
      },
      "aggs": {
        "products": {
          "terms": {
            "field": "product.keyword",
            "size": 10
          }
        }
      }
    }
  }
}
Question:
How can I filter the aggregated result for cat_id = 3 here? I tried bucket_selector as well, but it is not working.
Note: because cat_ids is multi-valued, filtering first and then aggregating doesn't work.
You can filter the values for which buckets will be created. This is done with the include and exclude parameters, which accept either regular expression strings or arrays of exact values. Additionally, include clauses can filter using partition expressions.
Adding a working example with index data, search query, and search result:
Index Data:
{
  "id": 1,
  "product": "p1",
  "cat_ids": [1, 2, 3]
}
{
  "id": 2,
  "product": "p2",
  "cat_ids": [3, 4, 5]
}
{
  "id": 3,
  "product": "p3",
  "cat_ids": [4, 5, 6]
}
Search Query:
{
  "size": 0,
  "aggs": {
    "cats": {
      "terms": {
        "field": "cat_ids",
        "include": [        <-- note this
          3
        ]
      },
      "aggs": {
        "products": {
          "terms": {
            "field": "product.keyword",
            "size": 10
          }
        }
      }
    }
  }
}
Search Result:
"aggregations": {
"cats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3,
"doc_count": 2,
"products": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "p1",
"doc_count": 1
},
{
"key": "p2",
"doc_count": 1
}
]
}
}
]
}

How can I know if two different aggregations aggregated the same docs?

Suppose I have two aggs:
GET .../_search
{
  "size": 0,
  "aggs": {
    "foo": {
      "terms": {
        "field": "foo"
      }
    },
    "bar": {
      "terms": {
        "field": "bar"
      }
    }
  }
}
Which returns the following:
...
"aggregations": {
"foo": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Africa",
"doc_count": 23
}
]
},
"bar": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Oil",
"doc_count": 23
}
]
}
}
My question is, how can I know if both "foo" and "bar" aggs are aggregating the same 23 docs?
I tried adding a sub agg to both "foo" and "bar" aggs to sum an arbitrary numeric field, but that's not remotely foolproof.
You can add a sub-aggregation that aggregates the identity field of the documents; you can do this with either a terms or a composite aggregation. When using terms, you need to provide a size. See this example:
GET .../_search
{
  "size": 0,
  "aggs": {
    "foo": {
      "terms": {
        "field": "foo"
      },
      "aggs": {
        "ids": {
          "terms": {
            "field": "your_id_here",
            "size": 100
          }
        }
      }
    },
    "bar": {
      "terms": {
        "field": "bar"
      },
      "aggs": {
        "ids": {
          "terms": {
            "field": "your_id_here",
            "size": 100
          }
        }
      }
    }
  }
}
You will then need to compare the nested aggregations.
Another approach would be to simply filter the desired documents with the search query.
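For instance, a minimal sketch of that second approach, using the bucket keys from the sample response above and assuming foo and bar are keyword (aggregatable) fields: if the filtered hit count also comes back as 23, both buckets cover exactly the same documents.
GET .../_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "foo": "Africa" } },
        { "term": { "bar": "Oil" } }
      ]
    }
  }
}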

Elasticsearch - Sum with group by

I couldn't apply the concept of chained aggregations... I need help with this scenario:
My documents look like this:
{
  "date": "2019-01-30",
  "value": 1234.56,
  "partnerId": 9876
}
and I would like to filter by date (month), sum the values per partnerId, and count the documents, obtaining a result like:
{
  "partnerId": 9876,
  "totalValue": 12345567.87,
  "count": 6574
}
What would this query look like?
What you are trying to achieve can be done with a sub-aggregation, in other words an aggregation inside an aggregation.
For your case, you first want to group by partnerId, so you need a terms aggregation on the partnerId field. Let's call this aggregation partner. It gives you two values of your expected result: partnerId and count.
Now, for each partnerId group (bucket), totalValue is required, i.e. the sum of value per partnerId. This can be done by adding a sum aggregation inside the terms aggregation partner. The final query, along with the filter for the date (month), is:
{
  "query": {
    "bool": {
      "filter": {
        "range": {
          "date": {
            "gte": "2019-01-01",
            "lte": "2019-01-31"
          }
        }
      }
    }
  },
  "aggs": {
    "partner": {
      "terms": {
        "field": "partnerId"
      },
      "aggs": {
        "totalValue": {
          "sum": {
            "field": "value"
          }
        }
      }
    }
  }
}
Sample Result (agg only):
"aggregations": {
"partner": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 9876,
"doc_count": 3,
"totalValue": {
"value": 3704.68017578125
}
},
{
"key": 9878,
"doc_count": 2,
"totalValue": {
"value": 2454.1201171875
}
}
]
}
In the result above, key is the partnerId, doc_count is the count, and totalValue.value is the totalValue of your expected result.
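If you would rather have the count as an explicit field in the aggregation output instead of reading doc_count, a value_count sub-aggregation can be added next to the sum; a minimal sketch (the aggregation name count is arbitrary, and it assumes partnerId is single-valued per document):
"aggs": {
  "partner": {
    "terms": {
      "field": "partnerId"
    },
    "aggs": {
      "totalValue": {
        "sum": {
          "field": "value"
        }
      },
      "count": {
        "value_count": {
          "field": "partnerId"
        }
      }
    }
  }
}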

Getting description when aggregating with Elasticsearch

When we use the aggregation feature in Elasticsearch, we get back the value of the field we are aggregating on, but we also want the description of that field. We have to use sectors.id because other parts of our API use it later on.
For example, our data looks like this:
[
  {
    "id": "123",
    "sectors": [
      {
        "id": "sector-1",
        "name": "Automotive"
      }
    ]
  },
  {
    "id": "123",
    "sectors": [
      {
        "id": "sector-2",
        "name": "Biology"
      }
    ]
  }
]
When we aggregate over sectors.id our response looks like:
"aggregations": {
"sector": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "sector-2",
"doc_count": 19672
},
{
"key": "sector-1",
"doc_count": 11699
}]
}
}
Is there any way to get sectors.name as well as the key in the results?
It seems like sectors should be a nested field. Assuming that the sector name is unique per sector id, you can use a sub-aggregation to figure out the related names:
GET _search
{
  "size": 0,
  "aggs": {
    "sectors": {
      "nested": {
        "path": "sectors"
      },
      "aggs": {
        "sector_id": {
          "terms": {
            "field": "sectors.id"
          },
          "aggs": {
            "sector_name": {
              "terms": {
                "field": "sectors.name"
              }
            }
          }
        }
      }
    }
  }
}
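If the uniqueness assumption does not hold, or you want the whole sector object back rather than just its name, an alternative sketch is a top_hits sub-aggregation that returns one sample nested document per sectors.id bucket (same assumed nested mapping as above):
GET _search
{
  "size": 0,
  "aggs": {
    "sectors": {
      "nested": {
        "path": "sectors"
      },
      "aggs": {
        "sector_id": {
          "terms": {
            "field": "sectors.id"
          },
          "aggs": {
            "sample_sector": {
              "top_hits": {
                "size": 1,
                "_source": ["sectors.name"]
              }
            }
          }
        }
      }
    }
  }
}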

How to use ElasticSearch to bucket historical data from midnight to now?

So I have an index with timestamps in the following format:
2015-03-20T12:00:00+0500
What I would like to do in the SQL equivalent is the following:
select date(timestamp), sum(orders)
from data
where time(timestamp) < time(now)
group by date(timestamp)
I know I need an aggregation, but for now I've tried the basic search query below and I'm getting a malformed-query error:
{
  "size": 0,
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "range": {
          "#timestamp": {
            "from": "00:00:01.000",
            "to": "15:00:00.000"
          }
        }
      }
    }
  }
}
You do indeed want an aggregation, specifically the date histogram aggregation. Something like:
{
  "query": {"match_all": {}},
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      },
      "aggs": {
        "order_sum": {
          "sum": {"field": "orders"}
        }
      }
    }
  }
}
First you have a bucketing aggregation that groups your documents by date, then inside it a metric aggregation that computes a value (in this case a sum) for each bucket. This would return data of the form:
{
  ...
  "aggregations": {
    "by_date": {
      "buckets": [
        {
          "key_as_string": "2015-03-01T00:00:00.000Z",
          "key": 1425168000000,
          "doc_count": 8644,
          "order_sum": {
            "value": 1234
          }
        },
        {
          "key_as_string": "2015-03-02T00:00:00.000Z",
          "key": 1425254400000,
          "doc_count": 8819,
          "order_sum": {
            "value": 45678
          }
        },
        ...
      ]
    }
  }
}
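As for the "midnight to now" part of the question: if the goal is simply to restrict the request to today's documents between midnight and the current time, a date-math range filter can be combined with the aggregation above. A rough sketch, which assumes the field is called timestamp and uses the bool/filter form rather than the deprecated filtered query from the question:
{
  "size": 0,
  "query": {
    "bool": {
      "filter": {
        "range": {
          "timestamp": {
            "gte": "now/d",
            "lte": "now"
          }
        }
      }
    }
  },
  "aggs": {
    "by_date": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      },
      "aggs": {
        "order_sum": {
          "sum": {"field": "orders"}
        }
      }
    }
  }
}
Note that this only covers today; applying the same time-of-day cutoff to every historical day, as the SQL does, would require a script filter on the time of day, which is beyond this sketch.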
There is a good intro to aggregations on the elasticsearch blog (part 1 and part 2) if you want to do some more reading.
