I am trying to execute a query in Elasticsearch to get a list of products with the largest sales change percentage. The aggregation results should be grouped by productId and sorted by salesChangePercent. I have searched around for a solution and tried approaches such as sorting Elasticsearch top hits results, but I am not able to sort the aggregation buckets by salesChangePercent. The following query is the only one that works; however, it does not seem right to me, as I am using "max_salesChangePercent" to do the sorting.
Am I doing something wrong here? Is there a better or cleaner way to get the aggregation buckets sorted? I would really appreciate any help to improve the query.
GET product_sales/_search
{
  "size": 0,
  "query": {
    "range": {
      "salesChangePercent": { "gte": 50 }
    }
  },
  "aggs": {
    "unique_products": {
      "terms": {
        "field": "productId",
        "order": {
          "max_salesChangePercent": "desc"
        }
      },
      "aggs": {
        "top-sales": {
          "top_hits": {
            "size": 1,
            "_source": {
              "includes": [
                "productId",
                "productName",
                "salesChangePercent"
              ]
            }
          }
        },
        "max_salesChangePercent": {
          "max": {
            "field": "salesChangePercent"
          }
        }
      }
    }
  }
}
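For what it is worth, ordering a terms aggregation by a single-value metric sub-aggregation such as max is a documented feature, so sorting on max_salesChangePercent is not a workaround in itself. The sketch below is only a plain-Python illustration of what the query computes, not Elasticsearch code. The sample documents and the top_products helper are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample documents; only the field names come from the query above.
hits = [
    {"productId": "A", "productName": "Alpha", "salesChangePercent": 55.0},
    {"productId": "A", "productName": "Alpha", "salesChangePercent": 80.0},
    {"productId": "B", "productName": "Beta", "salesChangePercent": 120.0},
    {"productId": "C", "productName": "Gamma", "salesChangePercent": 30.0},
]


def top_products(docs, min_change=50):
    """Group by productId and order the groups by their max salesChangePercent."""
    groups = defaultdict(list)
    for doc in docs:
        # Mirrors the range query: keep documents with salesChangePercent >= 50.
        if doc["salesChangePercent"] >= min_change:
            groups[doc["productId"]].append(doc)

    buckets = [
        {
            "key": product_id,
            "doc_count": len(members),
            # Mirrors the "max" sub-aggregation used for ordering.
            "max_salesChangePercent": max(d["salesChangePercent"] for d in members),
        }
        for product_id, members in groups.items()
    ]
    # Mirrors "order": {"max_salesChangePercent": "desc"}.
    return sorted(buckets, key=lambda b: b["max_salesChangePercent"], reverse=True)


result = top_products(hits)
for bucket in result:
    print(bucket)
```

One thing that may be worth checking: the top_hits in the query has no sort, so the single hit it returns is not necessarily the document with the highest salesChangePercent in its bucket.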
I'm new to OpenSearch and couldn't find an answer that worked for this use case. Essentially, my query uses scripts to access document field values within a multi_terms aggregation, then aggregates them into buckets reflecting certain metrics. The bucket key is an array of strings in the format ['val1', 'val2', 'val3'], with an associated key_as_string of 'val1|val2|val3'.
My goal is to be able to sort these buckets after aggregation based on any of these 3 values. The problem is that I can't seem to get sorting to work outside of a root "order" entry that sorts by the entire key (I think). The query is here:
aggregations: {
  plans: {
    multi_terms: {
      size: 10000,
      terms: [
        {
          script: "doc['plan.title.keyword'].value"
        },
        {
          script: "doc['plan.type.keyword'].value"
        },
        {
          script: "doc['plan.id.keyword'].value"
        }
      ],
      order: { _key: order } // This orders buckets by entire key?
    },
    aggregations: {
      completed: {
        filter: {
          term: { 'status.keyword': 'Completed' }
        }
      },
      in_progress: {
        filter: {
          term: { 'status.keyword': 'Started' }
        }
      },
      stopped: {
        filter: {
          term: { 'status.keyword': 'Stopped' }
        }
      },
      assigned: {
        filter: {
          term: { 'status.keyword': 'Assigned' }
        }
      },
      my_bucket: {
        bucket_sort: {
          sort: [{_key: {order: 'asc'}}] // Breaks sort
        }
      }
    }
  }
},
The output of the query is correct, but the order of the returned buckets is not, and I can't seem to get it right. I've attempted various ways of implementing bucket_sort, to no avail. It feels like there is an easy solution to this and I'm just not finding it. My end goal is to be able to sort the returned buckets by a specified index of the key.
Can anyone tell me what I'm doing wrong here?
Note: Using OpenSearch v2.3
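One fallback, assuming the number of buckets is small enough to handle in application code, is to reorder the returned buckets client-side by the wanted position in the array key. This says nothing about whether bucket_sort can do it server-side. The sample buckets and the sort_buckets_by_key_index helper below are hypothetical and only mimic the key / key_as_string shape described above.

```python
# Hypothetical buckets shaped like the multi_terms response described above:
# "key" is [plan title, plan type, plan id], "key_as_string" joins them with "|".
buckets = [
    {"key": ["Gold", "annual", "p3"], "key_as_string": "Gold|annual|p3", "doc_count": 4},
    {"key": ["Basic", "monthly", "p9"], "key_as_string": "Basic|monthly|p9", "doc_count": 7},
    {"key": ["Silver", "annual", "p1"], "key_as_string": "Silver|annual|p1", "doc_count": 2},
]


def sort_buckets_by_key_index(buckets, index, descending=False):
    """Return a new list of buckets ordered by one element of the array key."""
    return sorted(buckets, key=lambda b: b["key"][index], reverse=descending)


by_title = sort_buckets_by_key_index(buckets, 0)
by_id_desc = sort_buckets_by_key_index(buckets, 2, descending=True)

print([b["key_as_string"] for b in by_title])
print([b["key_as_string"] for b in by_id_desc])
```

Because sorted is stable, buckets that tie on the chosen key element keep the order the server returned them in.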
I'm trying to understand how to speed up a sort stage inside an aggregation, where the sort is on new fields created by an $addFields stage. I have a fairly elaborate pipeline, so I'll show only the part I'm interested in:
[
... other steps ...
{ '$addFields': { 'list.a_new_field': { ... } } },
{ '$addFields': { 'list.other_new_field': { '$sum': [ { '$max': '$list.a_new_field' } ] } } },
{ '$sort': { 'list.other_new_field': -1 } },
... other steps ...
]
The sort takes about 60 s to compute, according to explain:
{ '$sort': { sortKey: { 'list.other_new_field': -1 } },
nReturned: 3053,
executionTimeMillisEstimate: 60667 } ]
The collection has 464 documents.
The problem is that I don't know how to index the sort, because it is on a newly computed field. Is there any way I can optimize the query without changing the logic of the pipeline?
I am having trouble writing a specific query in Elasticsearch.
The context:
I have an index where each document represents a "SKU": a variant of a product (identified by pId).
For example, the first 3 documents are color and price variants of product 235.
BS stands for "Best SKU": for a given product, SKUs are ranked from the most representative to the least representative.
After a search, only the best SKUs matching the search should be used for further sorting or aggregations.
This is a script to create a test index:
POST /test/skus/DOC_1
{
  "pId": 235,
  "BS": 3,
  "color": "red",
  "price": 59.00
}
POST /test/skus/DOC_2
{
  "pId": 235,
  "BS": 2,
  "color": "red",
  "price": 29.00
}
POST /test/skus/DOC_3
{
  "pId": 235,
  "BS": 1,
  "color": "green",
  "price": 69.00
}
POST /test/skus/DOC_4
{
  "pId": 236,
  "BS": 2,
  "color": "blue",
  "price": 19.00
}
POST /test/skus/DOC_5
{
  "pId": 236,
  "BS": 1,
  "color": "red",
  "price": 99.00
}
POST /test/skus/DOC_6
{
  "pId": 236,
  "BS": 3,
  "color": "red",
  "price": 39.00
}
POST /test/skus/DOC_7
{
  "pId": 237,
  "BS": 2,
  "color": "red",
  "price": 10.00
}
POST /test/skus/DOC_8
{
  "pId": 237,
  "BS": 1,
  "color": "blue",
  "price": 50.00
}
POST /test/skus/DOC_9
{
  "pId": 237,
  "BS": 3,
  "color": "green",
  "price": 20.00
}
The query I'm trying to write searches for, say, the red SKUs, aggregates them by product (using a terms aggregation on pId), retains only the best SKU in each bucket, and THEN sorts those buckets by the price of that best SKU.
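To make the intended result concrete, here is the same logic as a plain-Python sketch over the nine test documents. It is not an Elasticsearch query, and it assumes two things not stated explicitly: that the lowest BS value is the best SKU (matching an ascending sort on BS) and that buckets should be ordered by ascending price. The best_sku_buckets helper is a name made up for the example.

```python
# The nine test documents from the script above.
skus = [
    {"pId": 235, "BS": 3, "color": "red", "price": 59.00},
    {"pId": 235, "BS": 2, "color": "red", "price": 29.00},
    {"pId": 235, "BS": 1, "color": "green", "price": 69.00},
    {"pId": 236, "BS": 2, "color": "blue", "price": 19.00},
    {"pId": 236, "BS": 1, "color": "red", "price": 99.00},
    {"pId": 236, "BS": 3, "color": "red", "price": 39.00},
    {"pId": 237, "BS": 2, "color": "red", "price": 10.00},
    {"pId": 237, "BS": 1, "color": "blue", "price": 50.00},
    {"pId": 237, "BS": 3, "color": "green", "price": 20.00},
]


def best_sku_buckets(docs, color):
    """Filter by color, keep the best SKU per pId, then sort by that SKU's price."""
    best = {}
    for doc in docs:
        if doc["color"] != color:
            continue  # the search step: only matching SKUs take part
        current = best.get(doc["pId"])
        # Assumption: a lower BS means a more representative SKU.
        if current is None or doc["BS"] < current["BS"]:
            best[doc["pId"]] = doc
    # Assumption: cheapest best SKU first.
    return sorted(best.values(), key=lambda d: d["price"])


result = best_sku_buckets(skus, "red")
for doc in result:
    print(doc["pId"], doc["BS"], doc["price"])
# Bucket order by pId: 237 (price 10.0), 235 (price 29.0), 236 (price 99.0)
```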
Here is what I've got so far:
GET /test/skus/_search
{
  "size": 0,
  "query": {
    "term": {
      "color": {
        "value": "red"
      }
    }
  },
  "aggs": {
    "bypId": {
      "terms": {
        "field": "pId",
        "size": 10
      },
      "aggs": {
        "mytophits": {
          "top_hits": {
            "size": 1,
            "sort": ["BS"]
          }
        }
      }
    }
  }
}
From here, I don't know how to sort the buckets by the price of the best SKU.
I've taken some screenshots to better explain what I'm trying to achieve: screenshot1, screenshot2, screenshot3, screenshot4, screenshot5.
Update: Still stuck.
An answer that tells me it is not possible to do such a thing is also welcome :)
I am trying to run this query in Elasticsearch. I'm trying to run a custom scripted_metric aggregation on my buckets. Within the metric script, I want to get access to the key of the bucket it is aggregating on.
My documents in ES look like this:
{
  user_id: 5,
  data: {
    5: 200,
    8: 300
  }
},
{
  user_id: 8,
  data: {
    5: 889,
    8: 22
  }
}
My aggregation query looks like this:
aggs = {
  approvers: {
    terms: {
      field: 'user_id'
    },
    aggs: {
      new_metric: {
        scripted_metric: {
          map_script: `
            // IS IT POSSIBLE TO GET THE BUCKET KEY HERE?
            // The bucket key here would be the user_id
            // so I can do stuff like
            doc['data'][**_term**]....
          `
        }
      }
    }
  }
}
I had to do some digging, and I likely had the same difficulty you did in finding a way to retrieve parent values. The only thing I could find concerned a special "_count" value on the child agg, but nothing related to its parent bucket names/keys.
If using a scripted_metric child agg is not a strict requirement, I was able to find a way that at least lets you access the bucket key within the parent. Maybe this can get you started in the direction of a solution:
aggs = {
  approvers: {
    terms: {
      field: 'user_id',
      script: '"There seems to be a magic value here: " + _value'
    }
  }
}
Sample adapted from this
I have simple documents with a ScheduleId. I would like to get the count of documents for the most recent ScheduleId. Assuming the max ScheduleId is the most recent, how would we write that query? I have been searching and reading for a few hours and could not get it to work.
{
  "aggs": {
    "max_schedule": {
      "max": {
        "field": "ScheduleId"
      }
    }
  }
}
That gets me the max ScheduleId, plus the total count of documents outside of that aggregation.
I would appreciate it if someone could help me with how to take this aggregate value and apply it as a filter (like a subquery in SQL!).
This should do it:
{
  "aggs": {
    "max_ScheduleId": {
      "terms": {
        "field": "ScheduleId",
        "order": { "_term": "desc" },
        "size": 1
      }
    }
  }
}
The terms aggregation gives you document counts for each term, and it works for integers. You just need to order the results by the term instead of by the count (the default). And since you only want the highest ScheduleId, "size": 1 is sufficient.
Here is the code I used to test it:
http://sense.qbox.io/gist/93fb979393754b8bd9b19cb903a64027cba40ece
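As a rough illustration of what that terms aggregation computes, here is a plain-Python sketch over made-up documents. It is not Elasticsearch code; the sample data and the count_for_max_schedule helper are hypothetical.

```python
from collections import Counter

# Hypothetical documents; only the ScheduleId field name comes from the question.
docs = [
    {"ScheduleId": 1},
    {"ScheduleId": 3},
    {"ScheduleId": 3},
    {"ScheduleId": 2},
    {"ScheduleId": 3},
]


def count_for_max_schedule(documents):
    """Count documents per ScheduleId and keep only the bucket with the highest id."""
    counts = Counter(d["ScheduleId"] for d in documents)  # one bucket per term
    # Order buckets by the term itself, descending, instead of by count.
    ordered = sorted(counts.items(), key=lambda kv: kv[0], reverse=True)
    # "size": 1 keeps just the first bucket.
    return [{"key": key, "doc_count": n} for key, n in ordered[:1]]


result = count_for_max_schedule(docs)
print(result)  # [{'key': 3, 'doc_count': 3}]
```

As far as I know, newer Elasticsearch versions (6.0 onward) spell the order key "_key" rather than "_term", so it is worth checking the terms aggregation documentation for the version in use.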