Run aggregation on multiple fields in Elasticsearch

I am working on a used cars marketplace website similar to https://www.kijijiautos.ca.
I want to run an aggregation on my dataset that returns the number of cars for each make (e.g. "Ford"), and in the same request I also want the total number of used cars, new cars, and other specifications.
I tried the following:
GET auto/_search
{
  "size": 0,
  "aggs": {
    "stats": {
      "multi_terms": {
        "terms": [{"field": "Make"}, {"field": "Type"}]
      }
    }
  }
}
I got the following result
{
"took" : 149,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"stats" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 14378,
"buckets" : [
{
"key" : [
"BMW",
"Used"
],
"key_as_string" : "BMW|Used",
"doc_count" : 2826
},
{
"key" : [
"Volkswagen",
"Used"
],
"key_as_string" : "Volkswagen|Used",
"doc_count" : 2592
},
{
"key" : [
"Audi",
"Used"
],
"key_as_string" : "Audi|Used",
"doc_count" : 2310
},
{
"key" : [
"Opel",
"Used"
],
"key_as_string" : "Opel|Used",
"doc_count" : 1494
},
{
"key" : [
"Ford",
"Used"
],
"key_as_string" : "Ford|Used",
"doc_count" : 1485
},
{
"key" : [
"Renault",
"Used"
],
"key_as_string" : "Renault|Used",
"doc_count" : 1303
},
{
"key" : [
"Peugeot",
"Used"
],
"key_as_string" : "Peugeot|Used",
"doc_count" : 1196
},
{
"key" : [
"Fiat",
"Used"
],
"key_as_string" : "Fiat|Used",
"doc_count" : 1149
},
{
"key" : [
"Skoda",
"Used"
],
"key_as_string" : "Skoda|Used",
"doc_count" : 668
},
{
"key" : [
"SEAT",
"Used"
],
"key_as_string" : "SEAT|Used",
"doc_count" : 629
}
]
}
}
}
but that's not what I am expecting.
What I expect is something like the following:
{
  // first bucket
  [
    {
      "doc_count" : 123,
      "key" : "BMW"
    }
    // other makes here
  ]
  // second bucket
  [
    {
      "doc_count" : 2500,
      "key" : "Used"
    },
    {
      "doc_count" : 500,
      "key" : "New"
    }
  ]
}
If this is possible in Elasticsearch, please help me write this query.
Thank you.

I found the solution to my problem, and I would like to leave it here for reference (found the solution here).
The idea is to specify multiple sibling aggregations inside the aggs object, that's all:
GET auto/_search
{
  "size": 0,
  "aggs": {
    // aggregate on types
    "types": {
      "terms": {
        "field": "Type",
        "size": 10
      }
    },
    // aggregate on makes
    "makes": {
      "terms": {
        "field": "Make",
        "size": 10
      }
    },
    // aggregate on body types
    "body type": {
      "terms": {
        "field": "Body Type",
        "size": 10
      }
    }
  }
}
The result was as expected:
{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"types" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Used",
"doc_count" : 22414
},
{
"key" : "New",
"doc_count" : 3021
},
{
"key" : "Pre-registered",
"doc_count" : 2071
},
{
"key" : "Demonstration",
"doc_count" : 1427
},
{
"key" : "Employee's car",
"doc_count" : 900
},
{
"key" : "Antique / Classic",
"doc_count" : 195
},
{
"key" : "Automatic",
"doc_count" : 1
},
{
"key" : "Metallic",
"doc_count" : 1
}
]
},
"makes" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 9814,
"buckets" : [
{
"key" : "Volkswagen",
"doc_count" : 3295
},
{
"key" : "BMW",
"doc_count" : 3212
},
{
"key" : "Audi",
"doc_count" : 2768
},
{
"key" : "Ford",
"doc_count" : 2109
},
{
"key" : "Opel",
"doc_count" : 1840
},
{
"key" : "Fiat",
"doc_count" : 1692
},
{
"key" : "Renault",
"doc_count" : 1691
},
{
"key" : "Peugeot",
"doc_count" : 1458
},
{
"key" : "Skoda",
"doc_count" : 1193
},
{
"key" : "SEAT",
"doc_count" : 958
}
]
},
"body type" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Sedans",
"doc_count" : 8410
},
{
"key" : "Off-Road/Pick-up",
"doc_count" : 6498
},
{
"key" : "Station wagon",
"doc_count" : 4894
},
{
"key" : "Compact",
"doc_count" : 3516
},
{
"key" : "Van",
"doc_count" : 1844
},
{
"key" : "Transporter",
"doc_count" : 1539
},
{
"key" : "Coupe",
"doc_count" : 1144
},
{
"key" : "Convertible",
"doc_count" : 1104
},
{
"key" : "Other",
"doc_count" : 654
}
]
}
}
}

Related

ES cumulative_sum cannot limit the number of returned docs

I am confused about how to limit the number of docs/buckets returned from a cumulative_sum aggregation. This is my search:
{
  "query": {"match_all": {}},
  "size": 0,
  "aggs": {
    "group_by_date": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      },
      "aggs": {
        "cumulative_docs": {
          "cumulative_sum": {"buckets_path": "_count"}
        }
      }
    }
  }
}
and it returns the maximum number of buckets:
"aggregations" : {
"group_by_date" : {
"buckets" : [
{
"key_as_string" : "2022-09-03T00:00:00.000Z",
"key" : 1662163200000,
"doc_count" : 19,
"cumulative_docs" : {
"value" : 19.0
}
},
{
"key_as_string" : "2022-09-04T00:00:00.000Z",
"key" : 1662249600000,
"doc_count" : 0,
"cumulative_docs" : {
"value" : 19.0
}
},
{
"key_as_string" : "2022-09-05T00:00:00.000Z",
"key" : 1662336000000,
"doc_count" : 0,
"cumulative_docs" : {
"value" : 19.0
}
},
{
"key_as_string" : "2022-09-06T00:00:00.000Z",
"key" : 1662422400000,
"doc_count" : 0,
"cumulative_docs" : {
"value" : 19.0
}
},
{
"key_as_string" : "2022-09-07T00:00:00.000Z",
"key" : 1662508800000,
"doc_count" : 0,
"cumulative_docs" : {
"value" : 19.0
}
},
{
"key_as_string" : "2022-09-08T00:00:00.000Z",
"key" : 1662595200000,
"doc_count" : 0,
"cumulative_docs" : {
"value" : 19.0
}
},
...
I tried to use bucket_selector to keep only the top 10 (or N) cumulative_sum buckets, but it returned an error along the lines of sub-aggregations not being supported in cumulative_sum, and I also tried the size param, which is not supported either.
If I want to return only ten buckets or so (I can specify the number myself), how can I revise my query?
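One approach that might work here (a sketch on my part, not a verified answer, reusing the field names from the query above) is the bucket_sort pipeline aggregation: when given only a size and no sort, it simply truncates the buckets returned by its parent date_histogram while leaving their order and values untouched.
{
  "query": { "match_all": {} },
  "size": 0,
  "aggs": {
    "group_by_date": {
      "date_histogram": {
        "field": "timestamp",
        "interval": "day"
      },
      "aggs": {
        "cumulative_docs": {
          "cumulative_sum": { "buckets_path": "_count" }
        },
        // keep only the first N date buckets; the value 10 is an assumption, set it yourself
        "limit_buckets": {
          "bucket_sort": {
            "size": 10
          }
        }
      }
    }
  }
}
The cumulative_docs values inside the kept buckets are unchanged; only the number of returned buckets shrinks.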

Aggregating all fields for an object in a search query, without manually specifying the fields

I have an index products which has an inner object attributes that looks like:
{
  properties: {
    id: {...},
    name: {...},
    colors: {...},
    // remaining fields
  }
}
I'm trying to produce a search query with this form and I need to figure out how to write the aggs object.
{ query: {...}, aggs: {...} }
I can write this out manually for two fields to get the desired result; however, the object contains 50+ fields, so I need it to handle them automatically:
"aggs": {
"attributes.color_group.id": {
"terms": {
"field": "attributes.color_group.id.keyword"
}
},
"attributes.product_type.id": {
"terms": {
"field": "attributes.product_type.id.keyword"
}
}
}
Gives me the result:
"aggregations" : {
"attributes.product_type.id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 34,
"buckets" : [
{
"key" : "374",
"doc_count" : 203
},
{
"key" : "439",
"doc_count" : 79
},
{
"key" : "460",
"doc_count" : 28
},
{
"key" : "451",
"doc_count" : 24
},
{
"key" : "558",
"doc_count" : 18
},
{
"key" : "500",
"doc_count" : 10
},
{
"key" : "1559",
"doc_count" : 9
},
{
"key" : "1560",
"doc_count" : 9
},
{
"key" : "455",
"doc_count" : 7
},
{
"key" : "501",
"doc_count" : 6
}
]
},
"attributes.color_group.id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 35,
"buckets" : [
{
"key" : "12",
"doc_count" : 98
},
{
"key" : "54",
"doc_count" : 48
},
{
"key" : "118",
"doc_count" : 43
},
{
"key" : "110",
"doc_count" : 41
},
{
"key" : "111",
"doc_count" : 35
},
{
"key" : "71",
"doc_count" : 35
},
{
"key" : "119",
"doc_count" : 24
},
{
"key" : "62",
"doc_count" : 21
},
{
"key" : "115",
"doc_count" : 20
},
{
"key" : "113",
"doc_count" : 15
}
]
}
}
Which is exactly what I want. After some research I found query_string, which would allow me to match everything starting with attributes., however it does not seem to work inside aggregations.
As far as I know, what you are asking is not possible with the built-in functionality of Elasticsearch, but there are some workarounds you can use:
Use a search template:
Below is an example of a search template, where you provide the list of fields as an array and it creates a terms aggregation for each provided field. You can store the search template using the scripts API and use the id of the search template when calling the search request (a sketch of that is shown after the response below).
POST dyagg/_search/template
{
  "source": """{
    "query": {
      "match_all": {}
    },
    "aggs": {
      {{#filter}}
      "{{.}}": {
        "terms": {
          "field": "{{.}}",
          "size": 10
        }
      }, {{/filter}}
      "name": {
        "terms": {
          "field": "name",
          "size": 10
        }
      }
    }
  }""",
  "params": {
    "filter": ["lastname", "firstname", "city", "country"]
  }
}
Response:
"aggregations" : {
"country" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "India",
"doc_count" : 4
}
]
},
"firstname" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Rajan",
"doc_count" : 1
},
{
"key" : "Sagar",
"doc_count" : 1
},
{
"key" : "Sajan",
"doc_count" : 1
},
{
"key" : "Sunny",
"doc_count" : 1
}
]
},
"city" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Mumbai",
"doc_count" : 2
},
{
"key" : "Pune",
"doc_count" : 2
}
]
},
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Rajan Desai",
"doc_count" : 1
},
{
"key" : "Sagar Patel",
"doc_count" : 1
},
{
"key" : "Sajan Patel",
"doc_count" : 1
},
{
"key" : "Sunny Desai",
"doc_count" : 1
}
]
},
"lastname" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Desai",
"doc_count" : 2
},
{
"key" : "Patel",
"doc_count" : 2
}
]
}
}
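As mentioned above, the template can also be stored via the scripts API and then referenced by id instead of sending the source inline. A minimal sketch, assuming a hypothetical template id my-agg-template:
PUT _scripts/my-agg-template
{
  "script": {
    "lang": "mustache",
    "source": """{
      "query": { "match_all": {} },
      "aggs": {
        {{#filter}}
        "{{.}}": {
          "terms": { "field": "{{.}}", "size": 10 }
        }, {{/filter}}
        "name": {
          "terms": { "field": "name", "size": 10 }
        }
      }
    }"""
  }
}

GET dyagg/_search/template
{
  "id": "my-agg-template",
  "params": {
    "filter": ["lastname", "firstname", "city", "country"]
  }
}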
The second way is to build the aggregations programmatically. Please check this Stack Overflow answer, where they explain how to do it in PHP; you can follow the same approach in other languages.
NOTE:
If you look closely at the search template, I have added one static aggregation on the name field. The reason is to avoid a trailing comma at the end of the loop; if you do not add it, you will get a json_parse_exception.

Elasticsearch: order aggregation buckets based on a field (can be text/string)

My document has a category id.
This is my aggregation query:
"aggs": {
"categories": {
"filter": {
"bool": {
"must": [
{
"exists": {
"field": "price"
}
}
]
}
},
"aggs": {
"categories": {
"terms": {
"field": "category_id",
"order": {
"_count": "desc"
},
"size": 15
}
}
}
}
It produces the following results:
"categories" : {
"doc_count" : 92485,
"categories" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 4780,
"buckets" : [ {
"key" : 5053,
"doc_count" : 21827
}, {
"key" : 5413,
"doc_count" : 15760
}, {
"key" : 5057,
"doc_count" : 12473
}, {
"key" : 77978,
"doc_count" : 11388
}, {
"key" : 5030,
"doc_count" : 9898
}, {
"key" : 5055,
"doc_count" : 2492
}, {
"key" : 8543,
"doc_count" : 2461
}, {
"key" : 5684,
"doc_count" : 2106
}, {
"key" : 5050,
"doc_count" : 2001
}, {
"key" : 8544,
"doc_count" : 1803
}, {
"key" : 5049,
"doc_count" : 1635
}, {
"key" : 5054,
"doc_count" : 1284
}, {
"key" : 5035,
"doc_count" : 977
}, {
"key" : 8731,
"doc_count" : 817
}, {
"key" : 8732,
"doc_count" : 783
} ]
}
}
Is it possible to get the response such that the buckets are ordered by category_id or any other field after bucketing? I still want to select only the 15 buckets with the maximum doc_count.
Also, if possible, is there a way to do it based on a field which is text/string?
I tried a sub-aggregation but couldn't figure it out.
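A sketch of one way this might be done, assuming category_id is a numeric field and a reasonably recent Elasticsearch (bucket_sort needs 6.1+); it is not a verified solution. The terms aggregation still selects the 15 largest buckets by doc_count, and a bucket_sort pipeline sub-aggregation then reorders those selected buckets by a min metric over category_id. For a text/string field this particular trick does not apply, because min/max metrics require a numeric (or date) field.
"aggs": {
  "categories": {
    "filter": {
      "bool": {
        "must": [
          { "exists": { "field": "price" } }
        ]
      }
    },
    "aggs": {
      "categories": {
        "terms": {
          "field": "category_id",
          "order": { "_count": "desc" },
          "size": 15
        },
        "aggs": {
          // the bucket key itself, exposed as a sortable metric
          "cat_id": {
            "min": { "field": "category_id" }
          },
          // reorders the 15 already-selected buckets by category_id
          "reorder": {
            "bucket_sort": {
              "sort": [{ "cat_id": { "order": "asc" } }]
            }
          }
        }
      }
    }
  }
}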

Elasticsearch 2: Can I filter terms aggregation by keys

Is it possible to filter a terms aggregation by key? I did a terms aggregation on an array field, which resulted in a lot of unnecessary buckets (even though I applied a filter to the query).
For example, for a result like this:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 9928,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"variables" : {
"buckets" : [ {
"key" : "00",
"doc_count" : 158
}, {
"key" : "1",
"doc_count" : 158
}, {
"key" : "2",
"doc_count" : 158
}, {
"key" : "3",
"doc_count" : 158
}, {
"key" : "4",
"doc_count" : 158
}, {
"key" : "5",
"doc_count" : 158
}, {
"key" : "6",
"doc_count" : 156
}, {
"key" : "7",
"doc_count" : 127
}, {
"key" : "8",
"doc_count" : 121
}, {
"key" : "9",
"doc_count" : 104
} ]
}
}
}
What I want to do is tell Elasticsearch to only keep the buckets where the key is in [1,2,3,4,5] (a possible approach is sketched after the query below).
Update:
Here is the query I used:
POST xxtransaction/_search?ignore_unavailable=true
{
  "size" : 0,
  "timeout" : 30000,
  "terminate_after" : 10000000,
  "query" : {
    "filtered" : {
      "filter" : {
        "and" : {
          "filters" : [ {
            "range" : {
              "transaction_time" : {
                "from" : 1459461600000,
                "to" : 1459547999999,
                "include_lower" : true,
                "include_upper" : true
              }
            }
          }, {
            "term" : {
              "site" : "xxx.com"
            }
          }, {
            "terms" : {
              "category_codes" : [ 1,2,3,4,5,6,7,8,9 ]
            }
          } ]
        }
      }
    }
  },
  "aggregations" : {
    "category_codes" : {
      "terms" : {
        "field" : "category_codes",
        "size" : 20000
      }
    }
  }
}
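A possible approach (a sketch, assuming category_codes is indexed as a string, which the bucket keys in the result above suggest): the terms aggregation accepts an include parameter with an explicit list of values, which restricts the returned buckets to exactly those keys. Only the aggregations part of the query above would change:
"aggregations" : {
  "category_codes" : {
    "terms" : {
      "field" : "category_codes",
      // only buckets whose key is in this list are returned
      "include" : ["1", "2", "3", "4", "5"],
      "size" : 20000
    }
  }
}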

Sorting the sub-aggregated result of a sub-sub-aggregation in Elasticsearch

I have to rewrite the following SQL query as a NoSQL (Elasticsearch) query.
SELECT count(1) as total,
count(CASE WHEN updated >= now() - '1 day'::interval THEN 1 END) as daily,
count(CASE WHEN updated >= now() - '7 day'::interval THEN 1 END) as weekly,
count(CASE WHEN updated >= now() - '30 day'::interval THEN 1 END) as monthly,
status_code, state
FROM alerts
GROUP BY status_code, state
ORDER BY total DESC, status_code, state
The following is the output of the SQL query:
total | daily | weekly | monthly | status_code | state
------+-------+--------+---------+-------------+----------
    2 |     0 |      0 |       1 | test1       | ACTIVE
    2 |     0 |      1 |       2 | test1       | INACTIVE
    2 |     1 |      1 |       1 | test2       | INACTIVE
    1 |     0 |      0 |       1 | test3       | ACTIVE
I got stuck on ordering by the 'total' column while writing the NoSQL query.
Below is the NoSQL query I used:
{
  "stateAggregation": {
    "terms": {
      "field": "state"
    },
    "aggs": {
      "statusCodeAggregation": {
        "terms": {
          "field": "status_code"
        },
        "aggs": {
          "total" : {
            "date_range": {
              "field": "updated",
              "ranges": [{ "to": "now" }]
            }
          },
          "daily" : {
            "date_range": {
              "field": "updated",
              "ranges": [{ "from": "now-1d/d" }]
            }
          },
          "weekly" : {
            "date_range": {
              "field": "updated",
              "ranges": [{ "from": "now-7d/d" }]
            }
          },
          "monthly" : {
            "date_range": {
              "field": "updated",
              "ranges": [{ "from": "now-30d/d" }]
            }
          }
        }
      }
    }
  }
}
The following is the output of the NoSQL query:
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"aggregations" : {
"stateAggregation" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "active",
"doc_count" : 3,
"statusCodeAggregation" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "test 1",
"doc_count" : 2,
"weekly" : {
"buckets" : [ {
"key" : "2015-09-04T00:00:00.000Z-*",
"from" : 1.4413248E12,
"from_as_string" : "2015-09-04T00:00:00.000Z",
"doc_count" : 0
} ]
},
"total" : {
"buckets" : [ {
"key" : "*-2015-09-11T12:42:58.463Z",
"to" : 1.441975378463E12,
"to_as_string" : "2015-09-11T12:42:58.463Z",
"doc_count" : 2
} ]
},
"monthly" : {
"buckets" : [ {
"key" : "2015-08-12T00:00:00.000Z-*",
"from" : 1.4393376E12,
"from_as_string" : "2015-08-12T00:00:00.000Z",
"doc_count" : 1
} ]
},
"daily" : {
"buckets" : [ {
"key" : "2015-09-10T00:00:00.000Z-*",
"from" : 1.4418432E12,
"from_as_string" : "2015-09-10T00:00:00.000Z",
"doc_count" : 0
} ]
}
}, {
"key" : "test",
"doc_count" : 1,
"weekly" : {
"buckets" : [ {
"key" : "2015-09-04T00:00:00.000Z-*",
"from" : 1.4413248E12,
"from_as_string" : "2015-09-04T00:00:00.000Z",
"doc_count" : 1
} ]
},
"total" : {
"buckets" : [ {
"key" : "*-2015-09-11T12:42:58.463Z",
"to" : 1.441975378463E12,
"to_as_string" : "2015-09-11T12:42:58.463Z",
"doc_count" : 1
} ]
},
"monthly" : {
"buckets" : [ {
"key" : "2015-08-12T00:00:00.000Z-*",
"from" : 1.4393376E12,
"from_as_string" : "2015-08-12T00:00:00.000Z",
"doc_count" : 1
} ]
},
"daily" : {
"buckets" : [ {
"key" : "2015-09-10T00:00:00.000Z-*",
"from" : 1.4418432E12,
"from_as_string" : "2015-09-10T00:00:00.000Z",
"doc_count" : 1
} ]
}
} ]
}
}, {
"key" : "mute",
"doc_count" : 2,
"statusCodeAggregation" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "test",
"doc_count" : 2,
"weekly" : {
"buckets" : [ {
"key" : "2015-09-04T00:00:00.000Z-*",
"from" : 1.4413248E12,
"from_as_string" : "2015-09-04T00:00:00.000Z",
"doc_count" : 1
} ]
},
"total" : {
"buckets" : [ {
"key" : "*-2015-09-11T12:42:58.463Z",
"to" : 1.441975378463E12,
"to_as_string" : "2015-09-11T12:42:58.463Z",
"doc_count" : 2
} ]
},
"monthly" : {
"buckets" : [ {
"key" : "2015-08-12T00:00:00.000Z-*",
"from" : 1.4393376E12,
"from_as_string" : "2015-08-12T00:00:00.000Z",
"doc_count" : 2
} ]
},
"daily" : {
"buckets" : [ {
"key" : "2015-09-10T00:00:00.000Z-*",
"from" : 1.4418432E12,
"from_as_string" : "2015-09-10T00:00:00.000Z",
"doc_count" : 1
} ]
}
} ]
}
} ]
}
}
}
Can anyone please help me modify the NoSQL query to apply ordering on the 'total' aggregation?
When I try to add an order on total in the status code aggregation:
"statusCodeAggregation": {
"terms": {
"field": "status_code",
"order" :{ "total._count" : "desc" }
}
Then I get the following error:
AggregationExecutionException[Invalid terms aggregation order path [total._count]. Terms buckets can only be sorted on a sub-aggregator path that is built out of zero or more single-bucket aggregations within the path and a final single-bucket or a metrics aggregation at the path end.]}
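A sketch of one possible workaround (an assumption on my part, not a verified fix): the error occurs because date_range is a multi-bucket aggregation, while the order path of a terms aggregation may only pass through single-bucket aggregations and must end in a single-bucket or metric aggregation. Since total here is simply the document count up to now, it can be expressed as a single-bucket filter aggregation instead, and the terms buckets can then be ordered on it; the daily/weekly/monthly ranges can stay as they are or be rewritten the same way.
"statusCodeAggregation": {
  "terms": {
    "field": "status_code",
    // ordering by a single-bucket sub-aggregation uses its doc_count
    "order": { "total": "desc" }
  },
  "aggs": {
    // single-bucket aggregation playing the role of the SQL total column
    "total": {
      "filter": {
        "range": { "updated": { "lte": "now" } }
      }
    }
  }
}
Since total counts everything up to now, it should usually equal the bucket's own doc_count, so ordering by the default _count would in practice give the same result.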