How to take more fields when grouping - elasticsearch

I'm trying to group data and also return the other fields of the grouped documents.
GET /testnews/default/_search
{
"size": 10,
"from":50,
"query":{
"multi_match": {
"query": "serenay",
"fields": ["Data.Title", "Data.Description", "Data.Tags.Title", "Data.MentionTitle", "Data.Program.title", "Data.Program.description", "Data.Program.original_title"]
}
},
"sort":[{
"Data.CreatedAt": {
"order": "desc"
},
"Data.ViewCount": {
"order": "desc"
}
}],
"aggs": {
"group_by_state": {
"terms": {
"field": "Data.Program.title.keyword"
}
}
}
}
But when I run it, each bucket in the grouped result contains only the program title and a count.
For example:
{
"key": "Kocamın Ailesi",
"doc_count": 3
}
What I want instead is something like:
{
"key": "Kocamın Ailesi",
"description": "blabla",
"image": "blabla.jpg",
"date": "YYYY-mm-dd",
"doc_count": 3
}
In other words, something like this in SQL:
select * from x group by field

Regarding the SQL example, to get the behaviour of
select a, b, count(*) from x group by a, b
you can aggregate on a, then b like this:
"aggs": {
"group_by_a": {
"terms": {
"field": "a"
},
"aggs": {
"group_by_b": {
"terms": {
"field":"b"
}
}
}
}
}
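To read those nested buckets back as SQL-style rows, a small client-side helper can walk the response. This is only a minimal Python sketch: the sample response is made up for illustration, and the helper name and its parameters are my own. Only the bucket layout (buckets, key, doc_count, nested aggregation name) follows what a nested terms aggregation returns.

```python
def flatten_two_level(aggregations, outer="group_by_a", inner="group_by_b"):
    """Turn nested terms buckets into (a, b, count) rows."""
    rows = []
    for a_bucket in aggregations[outer]["buckets"]:
        for b_bucket in a_bucket[inner]["buckets"]:
            rows.append((a_bucket["key"], b_bucket["key"], b_bucket["doc_count"]))
    return rows


# Hypothetical "aggregations" fragment of a search response (values invented).
sample_aggregations = {
    "group_by_a": {
        "buckets": [
            {"key": "x", "doc_count": 3, "group_by_b": {"buckets": [
                {"key": "p", "doc_count": 2},
                {"key": "q", "doc_count": 1},
            ]}},
            {"key": "y", "doc_count": 1, "group_by_b": {"buckets": [
                {"key": "p", "doc_count": 1},
            ]}},
        ]
    }
}

rows = flatten_two_level(sample_aggregations)
for row in rows:
    print(row)
```

Each printed tuple corresponds to one row of select a, b, count(*) ... group by a, b.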
But I don't think that is what you're looking for?
If you want the full documents in aggregations you can use the "top_hits" aggregation to select the top n hits within each aggregation:
{
"aggs": {
"group_by_state": {
"terms": {
"field": "Data.Program.title.keyword"
},
"aggs": {
"state_top_hits": {
"top_hits": {
"sort": [
{ "Data.CreatedAt": { "order": "desc" } },
{ "Data.ViewCount": { "order": "desc" } }
],
"_source": {
"includes": [ "key", "description", "image", "date" ]
},
"size": 10 //Will show top 10 hits within keyword agg ordered according to the sort
}
}
}
}
}
}
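The response to the query above nests the documents inside each bucket, so getting the flat shape from the question takes a small client-side step. Here is a rough Python sketch of that step. The helper name is mine, and the sample response and its _source field names are invented to match the desired output in the question, so treat them as assumptions rather than real index fields.

```python
def buckets_to_rows(aggregations, terms_name="group_by_state", hits_name="state_top_hits"):
    """Merge each bucket's key and doc_count with the _source of its first top hit."""
    rows = []
    for bucket in aggregations[terms_name]["buckets"]:
        hits = bucket[hits_name]["hits"]["hits"]
        # Start from the best hit's fields, if the bucket has any hit at all.
        row = dict(hits[0]["_source"]) if hits else {}
        row["key"] = bucket["key"]
        row["doc_count"] = bucket["doc_count"]
        rows.append(row)
    return rows


# Hypothetical "aggregations" fragment (field names and values are made up).
sample_aggregations = {
    "group_by_state": {
        "buckets": [
            {
                "key": "Kocamın Ailesi",
                "doc_count": 3,
                "state_top_hits": {"hits": {"hits": [
                    {"_source": {"description": "blabla", "image": "blabla.jpg", "date": "2017-01-01"}},
                ]}},
            }
        ]
    }
}

rows = buckets_to_rows(sample_aggregations)
print(rows[0])
```

If only one document per group is needed, setting "size": 1 in top_hits keeps the response small.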

Related

How to get the last Elasticsearch document for each unique value of a field?

I have a data structure in Elasticsearch that looks like:
{
"name": "abc",
"date": "2022-10-08T21:30:40.000Z",
"rank": 3
}
I want to get, for each unique name, the rank of the document (or the whole document) with the most recent date.
I currently have this:
"aggs": {
"group-by-name": {
"terms": {
"field": "name"
},
"aggs": {
"max-date": {
"max": {
"field": "date"
}
}
}
}
}
How can I get the rank (or the whole document) for each result, if possible in a single request?
You can use either of the options below.
Collapse
"collapse": {
"field": "name"
},
"sort": [
{
"date": {
"order": "desc"
}
}
]
Top hits aggregation
{
"aggs": {
"group-by-name": {
"terms": {
"field": "name",
"size": 100
},
"aggs": {
"top_doc": {
"top_hits": {
"sort": [
{
"date": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
}
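Both options boil down to "sort by date descending and keep the first document per name". To make that concrete, here is a plain-Python emulation of the logic. It does not talk to Elasticsearch, and every sample document except the first is invented.

```python
def latest_per_name(docs):
    """Keep, for each name, the document with the most recent date."""
    latest = {}
    # ISO-8601 timestamps in the same format and timezone sort correctly as strings.
    for doc in sorted(docs, key=lambda d: d["date"], reverse=True):
        # The first document seen for a name is the newest one; later ones are ignored.
        latest.setdefault(doc["name"], doc)
    return latest


docs = [
    {"name": "abc", "date": "2022-10-08T21:30:40.000Z", "rank": 3},
    {"name": "abc", "date": "2022-10-09T08:00:00.000Z", "rank": 1},  # made-up sample
    {"name": "xyz", "date": "2022-10-07T12:00:00.000Z", "rank": 5},  # made-up sample
]

result = latest_per_name(docs)
print({name: doc["rank"] for name, doc in result.items()})
```

With collapse the surviving documents come back as regular hits, while with top_hits they sit inside each bucket of the terms aggregation.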

Elasticsearch aggregation with unique counting

My documents consist of a history of orders and their state, here a minimal example:
{
"orderNumber" : "xyz",
"state" : "shipping",
"day" : "2022-07-20",
"timestamp" : "2022-07-20T15:06:44.290Z"
}
The state can be a string such as shipping, processing, redo, and so on.
For every possible state, I need to count the number of orders that had this state at some point during a day, without counting a state twice for the same orderNumber that day (which can happen if there is a problem and it needs to start from the beginning that same day).
My aggregation looks like this:
GET order-history/_search
{
"aggs": {
"countDays": {
"terms": {
"field": "day",
"order": {
"_key": "desc"
},
"size": 20
},
"aggs": {
"countStates": {
"terms": {
"field": "state.keyword",
"size": 10
}
}
}
}
},
"size": 1
}
However, this will count a state for a given orderNumber twice if it reappears that same day. How would I prevent it from counting a state twice for each orderNumber, if it is on the same day?
TL;DR
I don't think there is a simple, flexible solution.
But if you know the set of possible states in advance (for example, by first fetching them with a separate terms aggregation), you can use one filter aggregation per state with a cardinality aggregation on the order number inside it.
You could do the following:
POST /_bulk
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"shipping","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"redo","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"shipping","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"bbb","state":"processing","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"bbb","state":"shipping","day":"2022-07-20"}
GET 73138766/_search
{
"size": 0,
"aggs": {
"per_day": {
"date_histogram": {
"field": "day",
"calendar_interval": "day"
},
"aggs": {
"shipping": {
"filter": { "term": { "state.keyword": "shipping" }
},
"aggs": {
"orders": {
"cardinality": {
"field": "orderNumber.keyword"
}
}
}
},
"processing": {
"filter": { "term": { "state.keyword": "processing" }
},
"aggs": {
"orders": {
"cardinality": {
"field": "orderNumber.keyword"
}
}
}
},
"redo": {
"filter": { "term": { "state.keyword": "redo" }
},
"aggs": {
"orders": {
"cardinality": {
"field": "orderNumber.keyword"
}
}
}
}
}
}
}
}
You will obtain the following results
{
"aggregations": {
"per_day": {
"buckets": [
{
"key_as_string": "2022-07-20T00:00:00.000Z",
"key": 1658275200000,
"doc_count": 5,
"shipping": {
"doc_count": 3,
"orders": {
"value": 2
}
},
"processing": {
"doc_count": 1,
"orders": {
"value": 1
}
},
"redo": {
"doc_count": 1,
"orders": {
"value": 1
}
}
}
]
}
}
}
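Since the request repeats the same block once per state, it can be generated from the list of states instead of being written by hand. Below is a minimal Python sketch of that idea. The function name and its parameters are my own, and it only builds the request body shown above; sending it is left to whatever client you use. Keep in mind that cardinality is an approximate count, which can matter for very large numbers of orders.

```python
import json


def build_per_day_unique_orders(states, day_field="day",
                                state_field="state.keyword",
                                order_field="orderNumber.keyword"):
    """Build the per-day request body with one filter + cardinality pair per state."""
    per_state = {
        state: {
            "filter": {"term": {state_field: state}},
            "aggs": {"orders": {"cardinality": {"field": order_field}}},
        }
        for state in states
    }
    return {
        "size": 0,
        "aggs": {
            "per_day": {
                "date_histogram": {"field": day_field, "calendar_interval": "day"},
                "aggs": per_state,
            }
        },
    }


body = build_per_day_unique_orders(["shipping", "processing", "redo"])
print(json.dumps(body, indent=2))
```

The list of states passed in could come from a separate terms aggregation on state.keyword, as suggested above.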

Elasticsearch sort terms agg by arbitrary order

I have a terms aggregation and I want some specific values to always be at the top.
Like:
POST _search
{ "size": 0,
"aggs": {
"pets": {
"terms": {
"field": "species",
"order": "Dogs, Cats"
}
}
}
}
Where the results would be like "Dog", "Cat", "Iguana".
Dog and Cat at the top and everything else below.
Is this possible without scripting?
Thanks!
One way to do it is by filtering values in the terms aggregation. You'd create two terms aggregations, one with the desired terms and another with all other terms.
{
"size": 0,
"aggs": {
"top_terms": {
"terms": {
"field": "species",
"include": ["Dogs", "Cats"],
"order": { "_key" : "desc" }
}
},
"other_terms": {
"terms": {
"field": "species",
"exclude": ["Dogs", "Cats"]
}
}
}
}
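The two aggregations come back as two separate bucket lists, so the client still has to stitch them together. Ordering by "_key" descending happens to put Dogs before Cats, but for a truly arbitrary order the pinned buckets can be reordered on the client. A small Python sketch, with a hypothetical helper name and made-up bucket counts:

```python
def pinned_first(top_buckets, other_buckets, pinned_order):
    """Return buckets with the pinned terms first (in the given order), then the rest."""
    by_key = {bucket["key"]: bucket for bucket in top_buckets}
    # Skip pinned terms that have no bucket, for example because no document matched.
    pinned = [by_key[key] for key in pinned_order if key in by_key]
    return pinned + list(other_buckets)


# Hypothetical buckets as returned under "top_terms" and "other_terms".
top_terms = [{"key": "Cats", "doc_count": 7}, {"key": "Dogs", "doc_count": 4}]
other_terms = [{"key": "Iguanas", "doc_count": 9}, {"key": "Birds", "doc_count": 2}]

ordered = pinned_first(top_terms, other_terms, ["Dogs", "Cats"])
print([bucket["key"] for bucket in ordered])
```

The remaining buckets keep whatever order Elasticsearch returned them in (by default, descending doc_count).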
A script wouldn't be too complicated though -- first boost the two species, then sort by the scores first and then by _count:
GET pets/_search
{
"size": 0,
"query": {
"bool": {
"should": [
{
"terms": {
"species": [
"dog",
"cat"
],
"boost": 10
}
},
{
"match_all": {}
}
]
}
},
"aggs": {
"pets": {
"terms": {
"field": "species.keyword",
"order": [
{
"max_score": "desc"
},
{
"_count": "desc"
}
]
},
"aggs": {
"max_score": {
"max": {
"script": "_score"
}
}
}
}
}
}

How to sort the records by another field for top_hits in ES

I have data such as:
Id name startTime(timestamp)
1 c 1510000000000
2 c 1500000000000
3 a 1510000000000
4 a 1500000000000
5 b 1500662700000
I want to get the max startTime record for each name, and then sort by name.
The result should be:
Id name startTime(timestamp)
3 a 1510000000000
5 b 1500662700000
1 c 1510000000000
Currently, I can get the max startTime record for each name, but I don't know how to sort the results by name.
Here is my query:
GET index/default/_search
{
"aggs": {
"group": {
"terms": {
"field": "name"
},
"aggs": {
"tops": {
"top_hits": {
"sort": [
{
"startTime": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
},
"size": 0
}
As I understand it, in addition to the top_hits sort, you want the name buckets themselves to be sorted by name.
Have a look at the order option of the Terms Aggregation. All you have to do is order by key in the terms aggregation.
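Here is my suggestion. The "order" block inside "terms" is what does the trick (on Elasticsearch 6.0 and later, "_term" is deprecated in favor of "_key"):
{
"aggs": {
"group": {
"terms": {
"field": "name",
"order": {
"_term": "asc"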
}
},
"aggs": {
"tops": {
"top_hits": {
"sort": [
{
"startTime": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
},
"size": 0
}
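As a sanity check of what the query should return, here is the same logic emulated in plain Python on the sample data from the question: pick the record with the largest startTime per name, then order the groups by name. The function name is mine, and this is just an illustration of the expected result, not how Elasticsearch computes it.

```python
records = [
    {"id": 1, "name": "c", "startTime": 1510000000000},
    {"id": 2, "name": "c", "startTime": 1500000000000},
    {"id": 3, "name": "a", "startTime": 1510000000000},
    {"id": 4, "name": "a", "startTime": 1500000000000},
    {"id": 5, "name": "b", "startTime": 1500662700000},
]


def top_hit_per_name(rows):
    """Mimic terms (ordered by key asc) + top_hits (size 1, startTime desc)."""
    best = {}
    for row in rows:
        current = best.get(row["name"])
        if current is None or row["startTime"] > current["startTime"]:
            best[row["name"]] = row
    # Sorting the keys plays the role of the ascending order on the terms buckets.
    return [best[name] for name in sorted(best)]


result = top_hit_per_name(records)
for row in result:
    print(row["id"], row["name"], row["startTime"])
```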

ElasticSearch Query to find intersection of two queries

Records exist in this format: {user_id, state}.
I need to write an Elasticsearch query to find all user_ids that have both states present in the records list.
For example, if sample records stored are:
{1,a}
{1,b}
{2,a}
{2,b}
{1,a}
{3,b}
{3,b}
The output from running the query for this example would be
{"1", "2"}
I've tried this so far:
{
"size": 0,
"query": {
"bool": {
"filter": {
"terms": {
"state": [
"a",
"b"
]
}
}
}
},
"aggs": {
"user_id_intersection": {
"terms": {
"field": "user_id",
"min_doc_count": 2,
"size": 100
}
}
}
}
but this also returns user 3, because "min_doc_count": 2 only counts records and user 3 has two records with state b:
{"1", "2", "3"}
Assuming you know the cardinality of the set of states (here 2), you can use the Bucket Selector Aggregation:
GET test/_search
{
"size": 0,
"aggs": {
"user_ids": {
"terms": {
"field": "user_id"
},
"aggs": {
"states_card": {
"cardinality": {
"field": "state"
}
},
"state_filter": {
"bucket_selector": {
"buckets_path": {
"states_card": "states_card"
},
"script": "params.states_card == 2"
}
}
}
}
}
}
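To see why counting distinct states works where counting documents does not, here is a plain-Python emulation of both approaches on the sample records from the question. The function names are mine. The sketch also filters to the required states first, mirroring the terms filter from the question. I'm assuming that filter is still wanted if the index can contain other states, since otherwise a user with states a and c would also have a cardinality of 2.

```python
from collections import defaultdict

records = [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (1, "a"), (3, "b"), (3, "b")]


def users_by_doc_count(rows, min_doc_count=2):
    """Emulates terms + min_doc_count: counts records, not distinct states."""
    counts = defaultdict(int)
    for user_id, _state in rows:
        counts[user_id] += 1
    return sorted(user for user, count in counts.items() if count >= min_doc_count)


def users_with_all_states(rows, required=("a", "b")):
    """Emulates cardinality + bucket_selector: keeps users having every required state."""
    states_by_user = defaultdict(set)
    for user_id, state in rows:
        if state in required:  # same role as the terms filter on "state"
            states_by_user[user_id].add(state)
    wanted = len(set(required))
    return sorted(user for user, states in states_by_user.items() if len(states) == wanted)


print(users_by_doc_count(records))
print(users_with_all_states(records))
```

The first call includes user 3 (two records, one state), the second does not.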

Resources