Bucket_selector on nested agg bucket doesn't works - elasticsearch

I'm trying to make an aggregation on a sibling's children aggregation to filter bucket based on a requested quantity condition, so here is my query :
GET _search
{
"size": 0,
"query": {
"bool": {
"must": [
{
"terms": {
"product.id": [20,21,22,23,24]
}
}
]
}
},
"aggs": {
"carts": {
"terms": {
"field": "item.cart_key"
},
"aggs": {
"unique_product": {
"terms": {
"field": "product.id"
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
}
}
},
"filtered_product_quantity": {
"bucket_selector": {
"buckets_path": {
"productId": "unique_product.key",
"productQuantity": "unique_product>quantity"
},
"script": {
"params": {
"requiredQuantities": {
"20": null,
"21": null,
"22": null,
"23": 3,
"24": null
}
},
"lang": "painless",
"source": "params.requiredQuantities[params.productId] <= params.productQuantity"
}
}
}
}
}
}
}
And the error :
{
"error": {
"root_cause": [],
"type": "search_phase_execution_exception",
"reason": "",
"phase": "fetch",
"grouped": true,
"failed_shards": [],
"caused_by": {
"type": "aggregation_execution_exception",
"reason": "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [Object[]] at aggregation [unique_product]"
}
},
"status": 500
}
Here is a sample document set :
[
{
product.id: 12,
item.cart_key: abc_123,
item.quantity: 2
},
{
product.id: 11,
item.cart_key: abc_123,
item.quantity: 1
},
{
product.id: 23,
item.cart_key: def_456,
item.quantity: 1
}
]
Is it the appropriate aggregation in use ?
In other way, I would like to :
Aggregate my documents by cart_key.
Per product.id , sum the quantity
Filter aggregations that have a quantity higher than a given Record object {[product.id]: minimum_quantity} (here is the requiredQuantities param
I don't know if the source script will works as elasticsearch can't reach it.

I don't think you handle the problem correctly, but here an hacky working solution
{
"size": 0,
"aggs": {
"carts": {
"terms": {
"field": "item.cart_key"
},
"aggs": {
"unique_product": {
"terms": {
"field": "product.id"
},
"aggs": {
"quantity": {
"sum": {
"field": "item.quantity"
}
},
"hackProductId": {
"max": {
"field": "product.id"
}
},
"filtered_product_quantity": {
"bucket_selector": {
"buckets_path": {
"productQuantity": "quantity",
"productId": "hackProductId"
},
"script": {
"params": {
"requiredQuantities": {
"12": 0,
"11": 0,
"22": 0,
"23": 0,
"24": 0
}
},
"lang": "painless",
"source": "params.requiredQuantities[((int)params.productId).toString()] <= params.productQuantity"
}
}
}
}
}
}
}
}
}

Related

Elasticsearch aggregation with unqiue counting

My documents consist of a history of orders and their state, here a minimal example:
{
"orderNumber" : "xyz",
"state" : "shipping",
"day" : "2022-07-20",
"timestamp" : "2022-07-20T15:06:44.290Z",
}
the state can be strings like shipping, processing, redo,...
For every possible state, I need to count the number of orders that had this state at some point during a day, without counting a state twice for the same orderNumber that day (which can happen if there is a problem and it needs to start from the beginning that same day).
My aggregation looks like this:
GET order-history/_search
{
"aggs": {
"countDays": {
"terms": {
"field": "day",
"order": {
"_key": "desc"
},
"size": 20
},
"aggs": {
"countStates": {
"terms": {
"field": "state.keyword",
"size": 10
}
}
}
}
}
, "size": 1
}
However, this will count a state for a given orderNumber twice if it reappears that same day. How would I prevent it from counting a state twice for each orderNumber, if it is on the same day?
Tldr;
I don't think there is a flexible and simple solution.
But if you know in advance the number of state that exists. Maybe through another aggregation query, to get all type of state.
You could do the following
POST /_bulk
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"shipping","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"redo","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"xyz","state":"shipping","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"bbb","state":"processing","day":"2022-07-20"}
{"index":{"_index":"73138766"}}
{"orderNumber":"bbb","state":"shipping","day":"2022-07-20"}
GET 73138766/_search
{
"size": 0,
"aggs": {
"per_day": {
"date_histogram": {
"field": "day",
"calendar_interval": "day"
},
"aggs": {
"shipping": {
"filter": { "term": { "state.keyword": "shipping" }
},
"aggs": {
"orders": {
"cardinality": {
"field": "orderNumber.keyword"
}
}
}
},
"processing": {
"filter": { "term": { "state.keyword": "processing" }
},
"aggs": {
"orders": {
"cardinality": {
"field": "orderNumber.keyword"
}
}
}
},
"redo": {
"filter": { "term": { "state.keyword": "redo" }
},
"aggs": {
"orders": {
"cardinality": {
"field": "orderNumber.keyword"
}
}
}
}
}
}
}
}
You will obtain the following results
{
"aggregations": {
"per_day": {
"buckets": [
{
"key_as_string": "2022-07-20T00:00:00.000Z",
"key": 1658275200000,
"doc_count": 5,
"shipping": {
"doc_count": 3,
"orders": {
"value": 2
}
},
"processing": {
"doc_count": 1,
"orders": {
"value": 1
}
},
"redo": {
"doc_count": 1,
"orders": {
"value": 1
}
}
}
]
}
}
}

Is it possible to fetch count of total number of docs that contain a qualifying aggregation condition in elasticsearch?

I use ES v7.3 and as per my requirements I am aggregating some fields to fetch the required docs in response, further their is a requirement to fetch the count of total number of all such docs also that contain the nested field which qualifies the aggregation condition as described below but I did not find a way where I am able to do that.
Current aggregation query that I am using to fetch the documents is,
"aggs": {
"users": {
"composite": {
"sources": [
{
"users": {
"terms": {
"field": "co_profileId.keyword"
}
}
}
],
"size": 5000
},
"aggs": {
"sessions": {
"nested": {
"path": "co_score"
},
"aggs": {
"last_4_days": {
"filter": {
"range": {
"co_score.sessionTime": {
"gte": "2021-01-10T00:00:31.399Z",
"lte": "2021-01-14T01:37:31.399Z"
}
}
},
"aggs": {
"score_count": {
"sum": {
"field": "co_score.value"
}
}
}
}
}
},
"page_view_count_filter": {
"bucket_selector": {
"buckets_path": {
"sessionCount": "sessions > last_4_days > score_count"
},
"script": "params.sessionCount > 100"
}
},
"filtered_users": {
"top_hits": {
"size": 1,
"_source": {
"includes": [
"co_profileId",
"co_type",
"co_score"
]
}
}
}
}
}
}
Sample doc:
{
"co_profileId": "14654325",
"co_type": "identify",
"co_updatedAt": "2021-01-11T11:37:33.499Z",
"co_score": [
{
"value": 3,
"sessionTime": "2021-01-09T01:37:31.399Z"
},
{
"value": 3,
"sessionTime": "2021-01-10T10:47:33.419Z"
},
{
"value": 6,
"sessionTime": "2021-01-11T11:37:33.499Z"
}
]
}

Perform multi-field / multi-dimensional aggregations with nested fields in Elastic Search

I am tracking attendance of few students. I am storing their details in the index like the below.
Each doc in "entries" have few other fields. The following data shows that a student has attended 6 classes on "Monday".
"entries" is of type "nested"
{
reg_id: 1111,
"entires" : [
{
id: "123"
day: 'Monday'
},
{
id: "1234",
attendance: true
},
{
id: "12345",
classes_attended: 6
}
],
}
I want the count of each classes_attended of students for each day.
For Example "72 entries of students found for "Monday", who has attended 6 classes"
Sample desired output - This is just a sample I am completely fine if the output schema is changed.
[
{
"day" : "monday",
"classes_attended": 6,
count: 4
},
{
"day" : "monday",
"classes_attended": 1,
count: 5
},
{
"day" : "tuesday",
"classes_attended": 5,
count: 2
},
{
"day" : "tuesday",
"classes_attended": 6,
count: 1
}
]
Not sure How to start with the aggregations query:
I tried with the following query but I know its not the correct solution
"aggs": {
"attendance_aggs": {
"nested": {
"path": "entries"
},
"aggs": {
"days": {
"terms": {
"field": "entries.day"
},
"aggs": {
"attended": {
"reverse_nested": {},
"aggs":{
"class_attended_day": {
"terms": {
"field": "entries.classes_attended"
},
"aggs": {
"class_attended_days_count": {
"reverse_nested": {},
"aggs": {
"classes_attended_final": {
"cardinality": {
"field": "entries.class_attended"
}
}
}
}
}
}
}
}
}
}
}
}
}
It's unclear what the top-level object is so I'm going to assume it's a "student attendance entry per day". I'm also unsure what the entries.ids represent but I'll assume you'll be needing them at some later point so I'll keep them untouched.
Now, since all that your entries objects have in common is the id, they can be decoupled. Meaning that you should be using nested if any only if you share some attributes across all objects which need their attribute connections preserved. Since I don't see entries.id anywhere in your aggs, I'd recommend the following adjustments to your mapping:
PUT students
{
"mappings": {
"properties": {
"day": { ------------
"type": "keyword" |
}, |
"attendance": { |
"type": "boolean" | <--
}, |
"classes_attended": { |
"type": "integer" |
}, ------------
"entries": {
"type": "nested",
"properties": {
"day": {
"type": "keyword",
"copy_to": "day" <--
},
"attendance": {
"type": "boolean",
"copy_to": "attendance" <--
},
"classes_attended": {
"type": "integer",
"copy_to": "classes_attended" <--
}
}
}
}
}
}
and here's your query:
GET students/_search
{
"size": 0,
"aggs": {
"days": {
"terms": {
"field": "day"
},
"aggs": {
"classes_attended": {
"terms": {
"field": "classes_attended"
},
"aggs": {
"student_count": {
"cardinality": {
"field": "_id"
}
}
}
}
}
}
}
}
The response can then be post-processed into whatever you prefer.
EDIT
You could hijack reverse_nested but will need to come back to it as you're referencing other nested entries:
GET students/_search
{
"size": 0,
"aggs": {
"attendance_aggs": {
"nested": {
"path": "entries"
},
"aggs": {
"days": {
"terms": {
"field": "entries.day"
},
"aggs": {
"attended": {
"reverse_nested": {},
"aggs": {
"class_attended_day": {
"nested": {
"path": "entries"
},
"aggs": {
"class_attended_day": {
"terms": {
"field": "entries.classes_attended"
},
"aggs": {
"classes_attended_final": {
"cardinality": {
"field": "entries.classes_attended"
}
}
}
}
}
}
}
}
}
}
}
}
}
}

access nested variable from sub-aggregation on elasticsearch

I have an index with documents that look like:
{
"id": 1,
"timeline": [{
"amount": {
"mpe": 30,
"drawn": 20
},
"interval": {
"gte": "2020-03-01",
"lte": "2020-04-01"
}
}, {
"amount": {
"mpe": 40,
"drawn": 10
},
"interval": {
"gte": "2020-04-01",
"lte": "2020-06-01"
}
}]
}
Then I have the following query that produces a time bucketed sum of the values from the original intervals:
{
"aggs": {
"cp-timeline": {
"nested": {
"path": "timeline"
},
"aggs": {
"mpes": {
"date_histogram": {
"field": "timeline.interval",
"calendar_interval": "day"
},
"aggs": {
"sum_mpe": {
"sum": {
"field": "timeline.amount.mpe"
}
},
"sum_drawn": {
"sum": {
"field": "timeline.amount.drawn"
}
}
}
}
}
}
}
}
The above works like a charm yielding the correct sum for each day. Now I want to improve it so I can dynamically multiply the values by a given number that may vary between query executions, although for simplicity I will just use a fixed number 2. I've tried the following:
{
"aggs": {
"cp-timeline": {
"nested": {
"path": "timeline"
},
"aggs": {
"mpes": {
"date_histogram": {
"field": "timeline.interval",
"calendar_interval": "day"
},
"aggs": {
"sum_mpe": {
"sum": {
"script": "timeline.amount.mpe * 2"
}
},
"sum_drawn": {
"sum": {
"script": "timeline.amount.drawn * 2"
}
}
}
}
}
}
}
}
But I get the following error:
{
"reason": {
"type": "script_exception",
"reason": "compile error",
"script_stack": [
"timeline.amount.mpe * 2",
"^---- HERE"
],
"script": "timeline.amount.mpe * 2",
"lang": "painless",
"position": {
"offset": 0,
"start": 0,
"end": 23
},
"caused_by": {
"type": "illegal_argument_exception",
"reason": "Variable [timeline] is not defined."
}
}
}
Is there a way to make the nested variable declared above available in the script?
This link states as how to access the fields via script. Note that you can only use this for fields which are analyzed i.e. text type.
The below should help:
POST <your_index_name>/_search
{
"size": 0,
"aggs": {
"cp-timeline": {
"nested": {
"path": "timeline"
},
"aggs": {
"mpes": {
"date_histogram": {
"field": "timeline.interval.gte",
"calendar_interval": "day",
"min_doc_count": 1 <---- Note this
},
"aggs": {
"sum_mpe": {
"sum": {
"script": "doc['timeline.amount.mpe'].value * 2" <---- Note this
}
},
"sum_drawn": {
"sum": {
"script": "doc['timeline.amount.drawn'].value * 2" <---- Note this
}
}
}
}
}
}
}
}
Also note that I've made use of min_doc_count so that your histogram would only show you the valid dates.

Sum and count aggregations over Elasticsearch fields

I am new to Elasticsearch and I am looking to perform certain aggregations over the fields from an Elasticsearch 5.x index. I have an index that contains the documents with fields langs (which have nested structure) and docLang. These are dynamically mapped fields. Following are the examples documents
DOC 1:
{
"_index":"A",
"_type":"document",
"_id":"1",
"_source":{
"text":"This is a test sentence.",
"langs":{
"X":{
"en":1,
"es":2,
"zh":3
},
"Y":{
"en":4,
"es":5,
"zh":6
}
},
"docLang": "en"
}
}
DOC 2:
{
"_index":"A",
"_type":"document",
"_id":"2",
"_source":{
"text":"This is a test sentence.",
"langs":{
"X":{
"en":1,
"es":2
},
"Y":{
"en":3,
"es":4
}
},
"docLang": "es"
}
}
DOC 3:
{
"_index":"A",
"_type":"document",
"_id":"2",
"_source":{
"text":"This is a test sentence.",
"langs":{
"X":{
"en":1
},
"Y":{
"en":2
}
},
"docLang": "en"
}
}
I want to perform sum aggregation over the langs field in a way that for each key (X/Y) and for each language, I can get the sum across all documents in an index. Also, I want to produce the counts of documents for each type of language from docLang field.
e.g.: For above 3 documents, sum aggregation over langs field would look like below:
"langs":{
"X":{
"en":3,
"es":4,
"zh":3
},
"Y":{
"en":9,
"es":9,
"zh":6
}
}
And the docLang count would look like below:
"docLang":{
"en" : 2,
"es" : 1
}
Also because of some production env restrictions, I cannot use scripts in Elasticsearch. So, I was wondering if it is possible to use just field aggregation type for above fields?
{
"size": 0,
"aggs": {
"X": {
"nested": {
"path": "langs.X"
},
"aggs": {
"X_sum_en": {
"sum": {
"field": "langs.X.en"
}
},
"X_sum_es": {
"sum": {
"field": "langs.X.es"
}
},
"X_sum_zh": {
"sum": {
"field": "langs.X.zh"
}
}
}
},
"Y": {
"nested": {
"path": "langs.Y"
},
"aggs": {
"Y_sum_en": {
"sum": {
"field": "langs.Y.en"
}
},
"Y_sum_es": {
"sum": {
"field": "langs.Y.es"
}
},
"Y_sum_zh": {
"sum": {
"field": "langs.Y.zh"
}
}
}
},
"sum_docLang": {
"terms": {
"field": "docLang.keyword",
"size": 10
}
}
}
}
Since you didn't mention, but I think it's important. I made X and Y as nested fields:
"langs": {
"properties": {
"X": {
"type": "nested",
"properties": {
"en": {
"type": "long"
},
"es": {
"type": "long"
},
"zh": {
"type": "long"
}
}
},
"Y": {
"type": "nested",
"properties": {
"en": {
"type": "long"
},
"es": {
"type": "long"
},
"zh": {
"type": "long"
}
}
}
}
}
But, if you fields are not nested at all and here I mean actually the nested field type in Elasticsearch, a simple aggregation like this one should be enough:
{
"size": 0,
"aggs": {
"X_sum_en": {
"sum": {
"field": "langs.X.en"
}
},
"X_sum_es": {
"sum": {
"field": "langs.X.es"
}
},
"X_sum_zh": {
"sum": {
"field": "langs.X.zh"
}
},
"Y_sum_en": {
"sum": {
"field": "langs.Y.en"
}
},
"Y_sum_es": {
"sum": {
"field": "langs.Y.es"
}
},
"Y_sum_zh": {
"sum": {
"field": "langs.Y.zh"
}
},
"sum_docLang": {
"terms": {
"field": "docLang.keyword",
"size": 10
}
}
}
}

Resources