Elasticsearch minBy

Is there a way in Elasticsearch to get a field from the document that has the maximum value of another field? (Basically, something that works like maxBy in Scala.)
For example (mocked):
{
"aggregations": {
"grouped": {
"terms": {
"field": "grouping",
"order": {
"docWithMin": "asc"
}
},
"aggregations": {
"withMax": {
"max": {
"maxByField": "a",
"field": "b"
}
}
}
}
}
}
For which {"grouping":1,"a":2,"b":5},{"grouping":1,"a":1,"b":10}
would return (something like): {"grouped":1,"withMax":5}, where the max comes from the first object due to "a" being higher there.

Assuming you just want the document back for which a is maximum, you can do this:
{
"size": 0,
"aggs": {
"grouped": {
"terms": {
"field": "grouping"
},
"aggs": {
"maxByA": {
"top_hits": {
"sort": [
{"a": {"order": "desc"}}
],
"size": 1
}
}
}
}
}
}
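To make the intent of the terms + top_hits combination concrete, here is a plain-Python sketch of what it computes on the two example documents from the question. This is not Elasticsearch client code; the function name max_by_per_group is made up for illustration.

```python
def max_by_per_group(docs, group_field, by_field, value_field):
    """For each group, return value_field from the doc with the largest by_field."""
    best = {}
    for doc in docs:
        key = doc[group_field]
        # Keep the doc with the highest by_field, like sorting desc with size 1.
        if key not in best or doc[by_field] > best[key][by_field]:
            best[key] = doc
    return {key: doc[value_field] for key, doc in best.items()}


# The two documents from the question.
docs = [
    {"grouping": 1, "a": 2, "b": 5},
    {"grouping": 1, "a": 1, "b": 10},
]

result = max_by_per_group(docs, "grouping", "a", "b")
print(result)  # {1: 5}
```

Note that top_hits returns the whole hit, so you would read b from the _source of the single hit in each bucket.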

Related

How to get the last Elasticsearch document for each unique value of a field?

I have a data structure in Elasticsearch that looks like:
{
"name": "abc",
"date": "2022-10-08T21:30:40.000Z",
"rank": 3
}
I want to get, for each unique name, the rank of the document (or the whole document) with the most recent date.
I currently have this:
"aggs": {
"group-by-name": {
"terms": {
"field": "name"
},
"aggs": {
"max-date": {
"max": {
"field": "date"
}
}
}
}
}
How can I get the rank (or the whole document) for each result, if possible in one request?
You can use either of the options below.
Collapse
{
"collapse": {
"field": "name"
},
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
Top hits aggregation
{
"aggs": {
"group-by-name": {
"terms": {
"field": "name",
"size": 100
},
"aggs": {
"top_doc": {
"top_hits": {
"sort": [
{
"date": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
}
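Both options boil down to the same idea: order by date descending and keep the first document seen for each name. A rough Python sketch of that logic, assuming all dates are UTC timestamps in the same ISO-8601 format as the sample (so they can be compared as plain strings). Only the first document comes from the question; the other two are made up for illustration.

```python
def latest_per_name(docs):
    """Return the most recent doc for each name."""
    # Same-format UTC ISO-8601 strings sort chronologically as plain strings.
    ordered = sorted(docs, key=lambda d: d["date"], reverse=True)
    latest = {}
    for doc in ordered:
        # The first doc seen per name is the newest one.
        latest.setdefault(doc["name"], doc)
    return latest


docs = [
    {"name": "abc", "date": "2022-10-08T21:30:40.000Z", "rank": 3},
    {"name": "abc", "date": "2022-10-09T08:00:00.000Z", "rank": 1},  # hypothetical
    {"name": "xyz", "date": "2022-10-01T12:00:00.000Z", "rank": 7},  # hypothetical
]

ranks = {name: doc["rank"] for name, doc in latest_per_name(docs).items()}
print(ranks)
```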

Order by doc_count in composite aggregation (or suitable alternatives)

I have a search like the following
{
"size": 0,
"query": { "...": "..." },
"_source": false,
"aggregations": {
"agg1": { "...": "..." },
"agg2": { "...": "..." }
}
}
where each agg* is a composite aggregation of the following kind:
"agg1" : {
"composite": {
"size": 300,
"sources": [
{
"field1": {
"terms": {
"field": "field1.keyword",
"missing_bucket": true
}
}
},
{
"field2": {
"terms": {
"field": "field2.keyword",
"missing_bucket": true,
"order": "asc"
}
}
}
]
},
"aggregations": {
"field3": {
"filter": { "term": { "field3.keyword": "xyz" } }
}
}
}
I want to order the buckets by doc_count, because I don't need all of them, just the top n, as in some Kibana visualizations. According to the documentation, composite aggregations cannot order results the way terms aggregations can. Is there a workaround or an alternative query for this?

Is it possible to fetch count of total number of docs that contain a qualifying aggregation condition in elasticsearch?

I use ES v7.3. I aggregate on some fields to fetch the required docs, and I also need the total count of all docs that contain a nested field satisfying the aggregation condition described below, but I have not found a way to do that.
The aggregation query I currently use to fetch the documents is:
"aggs": {
"users": {
"composite": {
"sources": [
{
"users": {
"terms": {
"field": "co_profileId.keyword"
}
}
}
],
"size": 5000
},
"aggs": {
"sessions": {
"nested": {
"path": "co_score"
},
"aggs": {
"last_4_days": {
"filter": {
"range": {
"co_score.sessionTime": {
"gte": "2021-01-10T00:00:31.399Z",
"lte": "2021-01-14T01:37:31.399Z"
}
}
},
"aggs": {
"score_count": {
"sum": {
"field": "co_score.value"
}
}
}
}
}
},
"page_view_count_filter": {
"bucket_selector": {
"buckets_path": {
"sessionCount": "sessions > last_4_days > score_count"
},
"script": "params.sessionCount > 100"
}
},
"filtered_users": {
"top_hits": {
"size": 1,
"_source": {
"includes": [
"co_profileId",
"co_type",
"co_score"
]
}
}
}
}
}
}
Sample doc:
{
"co_profileId": "14654325",
"co_type": "identify",
"co_updatedAt": "2021-01-11T11:37:33.499Z",
"co_score": [
{
"value": 3,
"sessionTime": "2021-01-09T01:37:31.399Z"
},
{
"value": 3,
"sessionTime": "2021-01-10T10:47:33.419Z"
},
{
"value": 6,
"sessionTime": "2021-01-11T11:37:33.499Z"
}
]
}

Elasticsearch sort terms agg by arbitrary order

I have a terms aggregation and want some specific values to always be at the top.
Like:
POST _search
{ "size": 0,
"aggs": {
"pets": {
"terms": {
"field": "species",
"order": "Dogs, Cats"
}
}
}
}
Where the results would be like "Dog", "Cat", "Iguana".
Dog and Cat at the top and everything else below.
Is this possible without scripting?
Thanks!
One way to do it is by filtering values in the terms aggregation. You'd create two terms aggregations, one with the desired terms and another with all other terms.
{
"size": 0,
"aggs": {
"top_terms": {
"terms": {
"field": "species",
"include": ["Dogs", "Cats"],
"order": { "_key" : "desc" }
}
},
"other_terms": {
"terms": {
"field": "species",
"exclude": ["Dogs", "Cats"]
}
}
}
}
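With this approach the final ordering is assembled on the client: take the buckets of top_terms first, then those of other_terms. A small sketch, assuming the usual buckets / key / doc_count layout of a terms aggregation response. The mocked response and its counts are invented for illustration, and pinned_first is a hypothetical helper name.

```python
def pinned_first(aggregations):
    """Concatenate the pinned buckets with the remaining ones, pinned first."""
    top = aggregations["top_terms"]["buckets"]
    rest = aggregations["other_terms"]["buckets"]
    return [bucket["key"] for bucket in top + rest]


# Mocked "aggregations" part of a response (counts are made up).
mock_aggregations = {
    "top_terms": {
        "buckets": [
            {"key": "Dogs", "doc_count": 4},
            {"key": "Cats", "doc_count": 9},
        ]
    },
    "other_terms": {
        "buckets": [
            {"key": "Iguanas", "doc_count": 2},
        ]
    },
}

species_order = pinned_first(mock_aggregations)
print(species_order)
```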
A script wouldn't be too complicated though -- first boost the two species, then sort by the scores first and then by _count:
GET pets/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"species": [
"dog",
"cat"
],
"boost": 10
}
},
{
"match_all": {}
}
]
}
},
"aggs": {
"pets": {
"terms": {
"field": "species.keyword",
"order": [
{
"max_score": "desc"
},
{
"_count": "desc"
}
]
},
"aggs": {
"max_score": {
"max": {
"script": "_score"
}
}
}
}
}
}
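To see what ordering this produces, here is a simplified Python model of the query above. It assumes match_all contributes a score of 1.0 and the boosted terms clause adds a constant 10.0; real scores may differ, so treat the numbers as illustrative only. The sample documents are made up.

```python
from collections import defaultdict


def ordered_species(docs, boosted, boost=10.0):
    """Order species by max score (desc), then by doc count (desc)."""
    max_score = defaultdict(float)
    count = defaultdict(int)
    for doc in docs:
        species = doc["species"]
        # Simplified scoring: 1.0 for match_all plus the boost if the species is pinned.
        score = 1.0 + (boost if species in boosted else 0.0)
        max_score[species] = max(max_score[species], score)
        count[species] += 1
    return sorted(count, key=lambda s: (-max_score[s], -count[s]))


docs = [{"species": s} for s in ["iguana", "iguana", "iguana", "dog", "cat", "cat"]]
order = ordered_species(docs, {"dog", "cat"})
print(order)
```

In this model the two boosted species always come before the rest, but their order relative to each other falls back to _count, so "cat" lands ahead of "dog" here.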

ElasticSearch Query to find intersection of two queries

Records exist in this format: {user_id, state}.
I need to write an Elasticsearch query to find all user_ids that have both states present in the records list.
For example, if sample records stored are:
{1,a}
{1,b}
{2,a}
{2,b}
{1,a}
{3,b}
{3,b}
The output from running the query for this example would be
{"1", "2"}
I've tried this so far:
{
"size": 0,
"query": {
"bool": {
"filter": {
"terms": {
"state": [
"a",
"b"
]
}
}
}
},
"aggs": {
"user_id_intersection": {
"terms": {
"field": "user_id",
"min_doc_count": 2,
"size": 100
}
}
}
}
but this will return
{"1", "2", "3"}
Assuming you know the cardinality of the states set, here 2, you can use the Bucket Selector Aggregation:
GET test/_search
{
"size": 0,
"aggs": {
"user_ids": {
"terms": {
"field": "user_id"
},
"aggs": {
"states_card": {
"cardinality": {
"field": "state"
}
},
"state_filter": {
"bucket_selector": {
"buckets_path": {
"states_card": "states_card"
},
"script": "params.states_card == 2"
}
}
}
}
}
}
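The logic behind this query, sketched in plain Python on the sample records from the question: collect the distinct states per user_id and keep only the users whose distinct count equals the expected cardinality. The function name users_with_all_states is made up for illustration.

```python
from collections import defaultdict


def users_with_all_states(records, expected_states=2):
    """Return user_ids whose number of distinct states equals expected_states."""
    states = defaultdict(set)
    for user_id, state in records:
        states[user_id].add(state)
    return sorted(u for u, seen in states.items() if len(seen) == expected_states)


# Sample records from the question, as (user_id, state) pairs.
records = [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (1, "a"), (3, "b"), (3, "b")]

matching_users = users_with_all_states(records)
print(matching_users)  # [1, 2]
```

User 3 is dropped because it only ever has state b, even though it has two records. If the index can contain states other than a and b, you would probably still want the terms filter on state from the original query, so that only those two states are counted.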
