How to fetch the maximum record for a particular channel_id in Elasticsearch 7.5

I have multiple documents for each channel ID, and I want to sort them in descending order on the views column for a given channel ID.
{
  "query": {
    "match": {
      "channel_id": "UCQOd1f6pYldvhgvdQ_ktpGA"
    }
  },
  "aggs": {
    "video_views": {
      "sort": {
        "views": "desc"
      },
      "_source": ["channel_id", "views"]
    }
  }
}

You can get the top document sorted on the views column. In the request below, "size": 1 returns only the top document, the term query filters on channel_id.keyword, and the sort orders by views in descending order:
{
  "size": 1,
  "query": {
    "term": {
      "channel_id.keyword": {
        "value": "UCQOd1f6pYldvhgvdQ_ktpGA"
      }
    }
  },
  "sort": [
    {
      "views": {
        "order": "desc"
      }
    }
  ]
}
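To see what "size": 1 plus a descending sort amounts to, here is a minimal Python sketch that builds the same request body as a dict and reproduces the selection on a few in-memory documents. The helper names and sample documents are hypothetical; in practice Elasticsearch does the filtering and sorting, and documents tied on views may come back in either order.

```python
import json
from typing import List, Optional

def top_doc_query(channel_id: str) -> dict:
    """Request body equivalent to the query above (hypothetical helper)."""
    return {
        "size": 1,  # keep only the highest-ranked hit
        "query": {"term": {"channel_id.keyword": {"value": channel_id}}},
        "sort": [{"views": {"order": "desc"}}],  # most-viewed first
    }

def top_doc_local(docs: List[dict], channel_id: str) -> Optional[dict]:
    """In-memory equivalent: exact match on channel_id, then the doc with most views."""
    matching = [d for d in docs if d["channel_id"] == channel_id]
    return max(matching, key=lambda d: d["views"], default=None)

# Hypothetical sample documents
docs = [
    {"channel_id": "UCQOd1f6pYldvhgvdQ_ktpGA", "views": 120},
    {"channel_id": "UCQOd1f6pYldvhgvdQ_ktpGA", "views": 950},
    {"channel_id": "some_other_channel", "views": 5000},
]

print(json.dumps(top_doc_query("UCQOd1f6pYldvhgvdQ_ktpGA"), indent=2))
print(top_doc_local(docs, "UCQOd1f6pYldvhgvdQ_ktpGA"))
```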
Alternatively, use a max aggregation:
{
  "query": {
    "term": {
      "channel_id.keyword": {
        "value": "UCQOd1f6pYldvhgvdQ_ktpGA"
      }
    }
  },
  "aggs": {
    "views": {
      "max": {
        "field": "views",
        "missing": 0
      }
    }
  }
}
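The max aggregation returns only the number, not the document that holds it, so it fits when you need just the highest view count. The "missing": 0 setting makes documents without a views field count as 0. A small Python sketch of those semantics on in-memory documents (the helper and data are hypothetical):

```python
from typing import Iterable, Optional

def max_views(docs: Iterable[dict], channel_id: str, missing: float = 0) -> Optional[float]:
    """Mimic the max aggregation: filter on channel_id, treat an absent views field as `missing`."""
    values = [d.get("views", missing) for d in docs if d.get("channel_id") == channel_id]
    # With no matching documents Elasticsearch reports a null value; None stands in for it.
    return max(values) if values else None

# Hypothetical sample documents; the second one has no views field.
docs = [
    {"channel_id": "UCQOd1f6pYldvhgvdQ_ktpGA", "views": 950},
    {"channel_id": "UCQOd1f6pYldvhgvdQ_ktpGA"},
    {"channel_id": "some_other_channel", "views": 5000},
]
print(max_views(docs, "UCQOd1f6pYldvhgvdQ_ktpGA"))
```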

Related

How to get the last Elasticsearch document for each unique value of a field?

I have documents in Elasticsearch that look like this:
{
  "name": "abc",
  "date": "2022-10-08T21:30:40.000Z",
  "rank": 3
}
I want to get, for each unique name, the rank of the document (or the whole document) with the most recent date.
I currently have this:
"aggs": {
  "group-by-name": {
    "terms": {
      "field": "name"
    },
    "aggs": {
      "max-date": {
        "max": {
          "field": "date"
        }
      }
    }
  }
}
How can I get the rank (or the whole document) for each result, ideally in one request?
You can use either of the following options.
Collapse
"collapse": {
  "field": "name"
},
"sort": [
  {
    "date": {
      "order": "desc"
    }
  }
]
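Field collapsing keeps one hit per distinct name, and the sort decides which hit survives, so with date descending each name is represented by its most recent document. Collapsing requires a single-valued keyword or numeric field with doc values, so if name is mapped as text you would collapse on a keyword sub-field instead (for example name.keyword, assuming such a multi-field exists). A rough in-memory Python equivalent, with hypothetical documents:

```python
from typing import List

def collapse_latest(docs: List[dict]) -> List[dict]:
    """Sort by date descending, then keep the first hit seen for each name."""
    seen = set()
    hits = []
    # ISO 8601 strings in one format and timezone sort correctly as plain strings.
    for doc in sorted(docs, key=lambda d: d["date"], reverse=True):
        if doc["name"] not in seen:
            seen.add(doc["name"])
            hits.append(doc)
    return hits

# Hypothetical sample documents
docs = [
    {"name": "abc", "date": "2022-10-08T21:30:40.000Z", "rank": 3},
    {"name": "abc", "date": "2022-10-09T08:00:00.000Z", "rank": 1},
    {"name": "xyz", "date": "2022-10-07T12:00:00.000Z", "rank": 7},
]
for hit in collapse_latest(docs):
    print(hit["name"], hit["rank"])
```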
Top hits aggregation
{
  "aggs": {
    "group-by-name": {
      "terms": {
        "field": "name",
        "size": 100
      },
      "aggs": {
        "top_doc": {
          "top_hits": {
            "sort": [
              {
                "date": {
                  "order": "desc"
                }
              }
            ],
            "size": 1
          }
        }
      }
    }
  }
}
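With the terms plus top_hits variant, the latest document sits inside each bucket instead of in the hit list, and the terms aggregation only returns its top "size" buckets (100 here), ordered by doc_count by default. This Python sketch mimics that response shape on hypothetical in-memory documents; breaking bucket ties by key is a simplification:

```python
from collections import defaultdict
from typing import Dict, List

def latest_per_name(docs: List[dict], size: int = 100) -> List[Dict]:
    """Group by name (terms), keep `size` buckets, attach the newest doc (top_hits size 1)."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["name"]].append(doc)
    # Largest buckets first; ties broken by key here (a simplification).
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))[:size]
    return [
        {
            "key": name,
            "doc_count": len(group),
            "top_doc": max(group, key=lambda d: d["date"]),  # newest ISO 8601 date
        }
        for name, group in ordered
    ]

# Hypothetical sample documents
docs = [
    {"name": "abc", "date": "2022-10-08T21:30:40.000Z", "rank": 3},
    {"name": "abc", "date": "2022-10-09T08:00:00.000Z", "rank": 1},
    {"name": "xyz", "date": "2022-10-07T12:00:00.000Z", "rank": 7},
]
for bucket in latest_per_name(docs):
    print(bucket["key"], bucket["doc_count"], bucket["top_doc"]["rank"])
```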

Bucket sort on dynamic aggregation name

I would like to sort my aggregations by their quantity value.
But my problem is that each aggregation has a name that can't be known in advance.
Given this query:
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "datetime": {
              "gte": "2021-01-01",
              "lte": "2021-12-09"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "sorting": {
      "bucket_sort": {
        "sort": [
          {
            "year>quantity": {
              "order": "desc"
            }
          }
        ]
      }
    },
    "UNKNOWN_1": {
      "aggs": {
        "year": {
          "filter": {
            "bool": {
              "must": [
                {
                  "range": {
                    "datetime": {
                      "gte": "2021-01-01",
                      "lte": "2021-12-09"
                    }
                  }
                }
              ]
            }
          },
          "aggs": {
            "quantity": {
              "sum": {
                "field": "item.quantity"
              }
            }
          }
        }
      }
    },
    "UNKNOWN_2": {
      "aggs": {
        "year": {
          "aggs": {
            "quantity": {
              "sum": {
                "field": "item.quantity"
              }
            }
          }
        }
      }
    },
    ....
  }
}
My bucket_sort aggregation is missing one level to reach that quantity value.
Here is one Elasticsearch record:
{
  "datetime": "2021-12-01",
  "item.quantity": 5
}
Note that I have removed most of the request for readability, such as the filter aggregations, etc.
I tried something with a wildcard:
"sorting": {
  "bucket_sort": {
    "sort": [
      {
        "*>year>quantity": {
          "order": "desc"
        }
      }
    ]
  }
},
But I got the same error.
Is it possible to achieve this behaviour?
I think you misunderstood the "bucket_sort" aggregation: it doesn't sort your aggregations; it sorts the buckets produced by a single multi-bucket aggregation. Also, the bucket_sort aggregation has to be placed under that multi-bucket aggregation.
From the docs:
[The bucket sort aggregation is] "a parent pipeline aggregation which sorts the buckets of its parent multi-bucket aggregation"
If I understand correctly, you are trying to create "buckets" with specific filter aggregations, and you can't know in advance how many of those filter aggregations you will create.
For that you can use the multi-bucket "filters" aggregation, where you can specify as many filters as you want, each of which creates a bucket.
Under that filters aggregation you can create a single sum aggregation on item.quantity.
Also under the filters aggregation, you then add your bucket_sort aggregation, where you just have to name the sibling "sum" aggregation.
All in all, it might look like this:
{
  "aggs": {
    "your_filters": {
      "filters": {
        "filters": {
          "unknown_1": {
            "range": {
              "datetime": {
                "gte": "2021-01-01",
                "lte": "2021-12-09"
              }
            }
          },
          "unknown_2": {
            /** more filters here... **/
          }
        }
      },
      "aggs": {
        "quantity": {
          "sum": {
            "field": "item.quantity"
          }
        },
        "sorting": {
          "bucket_sort": {
            "sort": [
              { "quantity": { "order": "desc" } }
            ]
          }
        }
      }
    }
  }
}
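Because the bucket names only exist at runtime, they can simply be generated into the filters object. The Python sketch below builds the request body from a hypothetical name-to-date-range mapping and, separately, computes locally what the sorted buckets should contain: each filter bucket sums item.quantity over the records it matches (a record may land in several buckets), and bucket_sort orders the buckets by that sum. The second bucket name, the ranges, and the records are illustrative only.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical bucket names and date ranges, only known at runtime.
RANGES: Dict[str, Tuple[str, str]] = {
    "unknown_1": ("2021-01-01", "2021-12-09"),
    "example_june": ("2021-06-01", "2021-06-30"),
}

def build_body(ranges: Dict[str, Tuple[str, str]]) -> dict:
    """One filters aggregation, one bucket per runtime name, sorted by summed quantity."""
    filters = {
        name: {"range": {"datetime": {"gte": gte, "lte": lte}}}
        for name, (gte, lte) in ranges.items()
    }
    return {
        "size": 0,
        "aggs": {
            "your_filters": {
                "filters": {"filters": filters},
                "aggs": {
                    "quantity": {"sum": {"field": "item.quantity"}},
                    "sorting": {"bucket_sort": {"sort": [{"quantity": {"order": "desc"}}]}},
                },
            }
        },
    }

def local_sorted_sums(records: List[dict], ranges: Dict[str, Tuple[str, str]]) -> List[Tuple[str, int]]:
    """What the sorted buckets should hold: per-name sums, largest first."""
    sums = {name: 0 for name in ranges}
    for rec in records:
        for name, (gte, lte) in ranges.items():
            if gte <= rec["datetime"] <= lte:  # yyyy-MM-dd strings compare like dates
                sums[name] += rec["item.quantity"]
    return sorted(sums.items(), key=lambda kv: kv[1], reverse=True)

records = [
    {"datetime": "2021-12-01", "item.quantity": 5},
    {"datetime": "2021-06-15", "item.quantity": 3},
]
print(json.dumps(build_body(RANGES), indent=2))
print(local_sorted_sums(records, RANGES))
```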

How to diversify the result of top-hits aggregation?

Let's start with a concrete example. My index has this mapping:
{
  "template": {
    "mappings": {
      "template": {
        "properties": {
          "tid": {
            "type": "long"
          },
          "folder_id": {
            "type": "long"
          },
          "status": {
            "type": "integer"
          },
          "major_num": {
            "type": "integer"
          }
        }
      }
    }
  }
}
I want to aggregate the query results by the folder_id field and, for each folder_id group, retrieve the _source of the top-N documents. So I wrote this query DSL:
GET /template/template/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "status": 1
          }
        }
      ]
    }
  },
  "aggs": {
    "folder": {
      "terms": {
        "field": "folder_id",
        "size": 10
      },
      "aggs": {
        "top_hit": {
          "top_hits": {
            "size": 5,
            "_source": ["major_num"]
          }
        }
      }
    }
  }
}
However, there is now a requirement that the top-hits documents for each folder_id be diversified on the major_num field: for each folder_id, the documents returned by the top_hits sub-aggregation under the terms aggregation must be unique on major_num, with at most one document per major_num value.
The top_hits aggregation cannot accept sub-aggregations, so how should I solve this?
Why not simply add another terms aggregation on the major_num field?
GET /template/template/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "status": 1
          }
        }
      ]
    }
  },
  "aggs": {
    "folder": {
      "terms": {
        "field": "folder_id",
        "size": 10
      },
      "aggs": {
        "majornum": {
          "terms": {
            "field": "major_num",
            "size": 10
          },
          "aggs": {
            "top_hit": {
              "top_hits": {
                "size": 1
              }
            }
          }
        }
      }
    }
  }
}
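The extra terms level turns "at most one document per major_num" into bucket structure: each major_num bucket contributes exactly one top_hits document, and the inner terms size caps how many distinct major_num values (and so documents) each folder returns, so setting it to N gives the top-N diversified documents. The Python sketch below reproduces that on hypothetical in-memory documents. With a filter-only query every hit scores the same, so which document top_hits picks per bucket is not meaningful unless you add a sort to it; the sketch simply takes the first one it sees, and breaks bucket ties by key as a simplification.

```python
from collections import defaultdict
from typing import Dict, List

def one_doc_per_major_num(docs: List[dict], status: int = 1,
                          folder_size: int = 10, major_size: int = 10) -> Dict[int, List[dict]]:
    """terms(folder_id) > terms(major_num) > top_hits(size=1), computed in memory."""
    by_folder = defaultdict(lambda: defaultdict(list))
    for doc in docs:
        if doc["status"] == status:  # the bool filter term clause
            by_folder[doc["folder_id"]][doc["major_num"]].append(doc)

    folder_counts = {f: sum(len(g) for g in majors.values()) for f, majors in by_folder.items()}
    result = {}
    # Buckets ordered by doc_count descending, as terms does by default.
    for folder in sorted(folder_counts, key=lambda f: (-folder_counts[f], f))[:folder_size]:
        majors = by_folder[folder]
        chosen = sorted(majors, key=lambda m: (-len(majors[m]), m))[:major_size]
        result[folder] = [majors[m][0] for m in chosen]  # one document per major_num
    return result

# Hypothetical sample documents
docs = [
    {"tid": 1, "folder_id": 7, "status": 1, "major_num": 2},
    {"tid": 2, "folder_id": 7, "status": 1, "major_num": 2},
    {"tid": 3, "folder_id": 7, "status": 1, "major_num": 5},
    {"tid": 4, "folder_id": 8, "status": 0, "major_num": 1},
]
for folder, hits in one_doc_per_major_num(docs).items():
    print(folder, [h["tid"] for h in hits])
```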

Need an aggregation on an inner array object of a document

I am trying to run an aggregation over the following document:
{
  "pid": 900000,
  "mid": 9000,
  "cid": 90,
  "bid": 1000,
  "gmv": 1000000,
  "vol": 200,
  "data": [
    {
      "date": "25-11-2018",
      "gmv": 100000,
      "vol": 20
    },
    {
      "date": "24-11-2018",
      "gmv": 100000,
      "vol": 20
    },
    {
      "date": "23-11-2018",
      "gmv": 100000,
      "vol": 20
    }
  ]
}
The analysis that needs to be done here is:
1. Filter all documents on mid and/or cid.
2. Filter data.date to a range covering the last 7 days and sum data.vol over that range for each pid.
3. Sort the documents by the sum obtained in the previous step, in descending order.
4. Group these results by pid.
In other words, we are trying to get the top products by the sum of the volume (quantity sold) within a date range for a specific cid/mid.
Here PID refers to the product ID, MID to the merchant ID, and CID to the category ID.
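Spelled out in plain Python, the four steps amount to the sketch below. It runs on in-memory copies of documents shaped like the one above (the second product is hypothetical) and parses data.date with the day-month-year layout the sample uses. Compared as raw strings, values like "25-11-2018" would not sort chronologically, which is also why the field needs a date mapping in Elasticsearch.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import List, Optional, Tuple

def top_products(docs: List[dict], start: date, end: date,
                 mid: Optional[int] = None, cid: Optional[int] = None) -> List[Tuple[int, int]]:
    totals = defaultdict(int)
    for doc in docs:
        # Step 1: optional filters on merchant and/or category
        if mid is not None and doc["mid"] != mid:
            continue
        if cid is not None and doc["cid"] != cid:
            continue
        # Step 2: sum data.vol over the entries inside the date range
        for entry in doc["data"]:
            day = datetime.strptime(entry["date"], "%d-%m-%Y").date()
            if start <= day <= end:
                totals[doc["pid"]] += entry["vol"]
    # Steps 3 and 4: one row per pid, ordered by the summed volume
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

docs = [
    {"pid": 900000, "mid": 9000, "cid": 90, "data": [
        {"date": "25-11-2018", "vol": 20},
        {"date": "24-11-2018", "vol": 20},
        {"date": "23-11-2018", "vol": 20},
    ]},
    {"pid": 900001, "mid": 9000, "cid": 90, "data": [  # hypothetical second product
        {"date": "25-11-2018", "vol": 30},
        {"date": "01-11-2018", "vol": 500},  # outside the range, ignored
    ]},
]
print(top_products(docs, date(2018, 11, 22), date(2018, 11, 28), mid=9000))
```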
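First, you need to change your mapping so that you can run queries on nested fields: change the type of the data field to nested.
Then you can use a range query inside a nested query, together with a terms filter on mid/cid, to filter the data. Once you have the correct data set, you can aggregate on pid with a sub-aggregation that sums data.vol. Because data is nested, the sum has to run inside a nested aggregation. data.date also has to be mapped as a date field; the "format" parameter tells the range query how to parse the dd-MM-yyyy values. Note that the terms "size" must be greater than 0, so set it to the number of products you need.
Here is the query:
{
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "mid": ["9000"]
          }
        },
        {
          "nested": {
            "path": "data",
            "query": {
              "range": {
                "data.date": {
                  "gte": "25-11-2018",
                  "lte": "28-11-2018",
                  "format": "dd-MM-yyyy"
                }
              }
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "AGG_PID": {
      "terms": {
        "field": "pid",
        "size": 10,
        "order": {
          "DATA>TOTAL_SUM": "desc"
        },
        "min_doc_count": 1
      },
      "aggs": {
        "DATA": {
          "nested": {
            "path": "data"
          },
          "aggs": {
            "TOTAL_SUM": {
              "sum": {
                "field": "data.vol"
              }
            }
          }
        }
      }
    }
  }
}
You can modify the query accordingly. Keep in mind that this sum covers every data entry of each matching document, not only the entries inside the date range; to sum only the in-range entries, add a filter sub-aggregation inside the nested aggregation, as the next answer does. Hope this will be helpful.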
Here is a nested aggregation query that sorts the pid buckets by their summed volume (vol). You can add any number of filters in the query part.
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "mid": "2"
          }
        }
      ]
    }
  },
  "aggs": {
    "top_products_sorted_by_order_volume": {
      "terms": {
        "field": "pid",
        "order": {
          "nested_data_object>order_volume_by_range>order_volume_sum": "desc"
        }
      },
      "aggs": {
        "nested_data_object": {
          "nested": {
            "path": "data"
          },
          "aggs": {
            "order_volume_by_range": {
              "filter": {
                "range": {
                  "data.date": {
                    "gte": "2018-11-26",
                    "lte": "2018-11-27"
                  }
                }
              },
              "aggs": {
                "order_volume_sum": {
                  "sum": {
                    "field": "data.vol"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
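The order path "nested_data_object>order_volume_by_range>order_volume_sum" follows the same nesting the response uses, so the sums can be read back by walking each bucket with those names. The response below is a hypothetical, hand-written fragment of that shape, not real output:

```python
from typing import List, Tuple

ORDER_PATH = "nested_data_object>order_volume_by_range>order_volume_sum"

def value_at(bucket: dict, path: str) -> float:
    """Follow an aggregation path (names separated by '>') down to a metric value."""
    node = bucket
    for name in path.split(">"):
        node = node[name]
    return node["value"]

def volumes_by_pid(response: dict) -> List[Tuple[int, float]]:
    buckets = response["aggregations"]["top_products_sorted_by_order_volume"]["buckets"]
    return [(b["key"], value_at(b, ORDER_PATH)) for b in buckets]

# Hypothetical response fragment, for illustration only.
response = {
    "aggregations": {
        "top_products_sorted_by_order_volume": {
            "buckets": [
                {"key": 900001, "doc_count": 1,
                 "nested_data_object": {"doc_count": 2,
                     "order_volume_by_range": {"doc_count": 1,
                         "order_volume_sum": {"value": 30.0}}}},
                {"key": 900000, "doc_count": 1,
                 "nested_data_object": {"doc_count": 3,
                     "order_volume_by_range": {"doc_count": 1,
                         "order_volume_sum": {"value": 20.0}}}},
            ]
        }
    }
}
print(volumes_by_pid(response))
```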

How to take more fields when grouping

I am trying to group data and also retrieve all of its fields.
GET /testnews/default/_search
{
  "size": 10,
  "from": 50,
  "query": {
    "multi_match": {
      "query": "serenay",
      "fields": ["Data.Title", "Data.Description", "Data.Tags.Title", "Data.MentionTitle", "Data.Program.title", "Data.Program.description", "Data.Program.original_title"]
    }
  },
  "sort": [{
    "Data.CreatedAt": {
      "order": "desc"
    },
    "Data.ViewCount": {
      "order": "desc"
    }
  }],
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "Data.Program.title.keyword"
      }
    }
  }
}
But when I run it, the grouped result contains only the program title (as the bucket key) and the document count, like this:
{
  "key": "Kocamın Ailesi",
  "doc_count": 3
}
But I want something like:
{
  "key": "Kocamın Ailesi",
  "description": "blabla",
  "image": "blabla.jpg",
  "date": "YYYY-mm-dd",
  "doc_count": 3
}
just like this SQL:
select * from x group by field
Regarding the SQL example, to get the behaviour of
select a, b, count(*) from x group by a, b
you can aggregate on a, then on b, like this:
"aggs": {
  "group_by_a": {
    "terms": {
      "field": "a"
    },
    "aggs": {
      "group_by_b": {
        "terms": {
          "field": "b"
        }
      }
    }
  }
}
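In the nested terms response every outer a bucket carries its own b buckets, so reading off (a, b, doc_count) triples gives the same rows as the SQL query, with one caveat: each terms level returns only its top 10 buckets by default, whereas SQL returns every group. A small Python sketch of the equivalence on hypothetical rows:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def nested_terms(rows: List[dict]) -> Dict[str, Dict[str, int]]:
    """Shape of group_by_a > group_by_b: outer key -> inner key -> doc_count."""
    nested = defaultdict(Counter)
    for row in rows:
        nested[row["a"]][row["b"]] += 1
    return {a: dict(inner) for a, inner in nested.items()}

def flatten(nested: Dict[str, Dict[str, int]]) -> List[Tuple[str, str, int]]:
    """(a, b, count) rows, like the SQL group by a, b."""
    return sorted((a, b, n) for a, inner in nested.items() for b, n in inner.items())

# Hypothetical rows
rows = [
    {"a": "x", "b": "p"},
    {"a": "x", "b": "p"},
    {"a": "x", "b": "q"},
    {"a": "y", "b": "p"},
]
print(flatten(nested_terms(rows)))
```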
But I don't think that is what you're looking for.
If you want the full documents in the aggregation results, you can use the "top_hits" aggregation to select the top n hits within each bucket. With "size": 10, it shows the top 10 hits within each keyword bucket, ordered according to the sort. Adjust the "includes" list to the actual _source paths of the fields you want.
{
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "Data.Program.title.keyword"
      },
      "aggs": {
        "state_top_hits": {
          "top_hits": {
            "sort": [
              { "Data.CreatedAt": { "order": "desc" } },
              { "Data.ViewCount": { "order": "desc" } }
            ],
            "_source": {
              "includes": [ "key", "description", "image", "date" ]
            },
            "size": 10
          }
        }
      }
    }
  }
}
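The top_hits documents come back inside each bucket, under state_top_hits.hits.hits, so getting the flat shape from the question (bucket key, doc_count, plus source fields) means merging each hit's _source with its bucket; with "size": 1 in top_hits you get exactly one row per group. A Python sketch over a hypothetical, hand-written response fragment (field names taken from the question's desired output):

```python
from typing import List

def flatten_top_hits(response: dict) -> List[dict]:
    rows = []
    for bucket in response["aggregations"]["group_by_state"]["buckets"]:
        for hit in bucket["state_top_hits"]["hits"]["hits"]:
            # bucket-level fields first, then the document's own _source fields
            rows.append({"key": bucket["key"], "doc_count": bucket["doc_count"], **hit["_source"]})
    return rows

# Hypothetical response fragment, for illustration only.
response = {
    "aggregations": {
        "group_by_state": {
            "buckets": [
                {
                    "key": "Kocamın Ailesi",
                    "doc_count": 3,
                    "state_top_hits": {
                        "hits": {
                            "hits": [
                                {"_source": {"description": "blabla", "image": "blabla.jpg"}}
                            ]
                        }
                    },
                }
            ]
        }
    }
}
print(flatten_top_hits(response))
```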
