Elasticsearch - Sort aggregation keys as numbers

I have a query that aggregates some data, and the aggregation key is a number. I tried to sort the aggregation result by key, but Elasticsearch treats the key as a string.
Since the number of result buckets is pretty large, it is not practical to re-sort them on the client side. Any ideas?
Here is my query.
"aggregations" : {
"startcount" : {
"terms" : {
"script" : "round(doc['startat'].value/1000)",
"size" : 1000,
"order" : { "_term" : "asc" }
}
}
}
And here is the current result bucket:
"buckets": [
{
"key": "0",
"doc_count": 68
},
{
"key": "1",
"doc_count": 21
},
{
"key": "10",
"doc_count": 6
},
{
"key": "11",
"doc_count": 16
},
This is my expected result:
"buckets": [
{
"key": "0",
"doc_count": 68
},
{
"key": "1",
"doc_count": 21
},
{
"key": "2", // not '10'
"doc_count": 6
},
{
"key": "3", // not '11'
"doc_count": 16
},

Using the value_script approach should fix the alphabetical sort issue:
Example:
{
"size": 0,
"aggregations": {
"startcount": {
"terms": {
"field": "startat",
"script": "round(_value/1000)",
"size": 1000,
"order": {
"_term": "asc"
}
}
}
}
}
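Alternatively, assuming startat is a plain numeric field holding millisecond values (as the division by 1000 suggests), a histogram aggregation sidesteps the string-key problem entirely: its bucket keys are numeric and always come back in ascending numeric order. Note that the keys are then the lower bound of each 1000-wide bucket (0, 1000, 2000, ...) rather than the divided value. A minimal sketch:
{
  "size": 0,
  "aggregations": {
    "startcount": {
      "histogram": {
        "field": "startat",
        "interval": 1000,
        "min_doc_count": 1
      }
    }
  }
}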

This is a multiple group-by scenario where the data is sorted by key in descending order.
{
"size": 0,
"aggs": {
"categories": {
"filter": {
"exists": {
"field": "organization_industries"
}
},
"aggs": {
"names": {
"terms": {
"field": "organization_revenue_in_thousands_int.keyword",
"size": 200,
"order": {
"_key": "desc"
}
},
"aggs": {
"industry_stats": {
"terms": {
"field": "organization_industries.keyword"
}
}
}
}
}
}
}
}
Output
"aggregations": {
"categories": {
"doc_count": 195161605,
"names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 19226983,
"buckets": [
{
"key": "99900",
"doc_count": 1742,
"industry_stats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "internet",
"doc_count": 1605
},
{
"key": "investment management",
"doc_count": 81
},
{
"key": "biotechnology",
"doc_count": 54
},
{
"key": "computer & network security",
"doc_count": 2
}
]
}
},
{
"key": "998000",
"doc_count": 71,
"industry_stats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "finance",
"doc_count": 48
},
{
"key": "information technology & services",
"doc_count": 23
}
]
}
}
]
}
}
}

Related

Elastic search terms aggregation for getting filter options

I'm trying to implement product search and want to return search results along with the filters to filter by. I have managed to get the filter keys, but I also want the values of those keys.
My product body is:
{
...product,
"attributes": [
{
"name": "Color",
"value": "Aqua Blue"
},
{
"name": "Gender",
"value": "Female"
},
{
"name": "Occasion",
"value": "Active Wear"
},
{
"name": "Size",
"value": "0"
}
],
}
and I'm using this query in ES:
GET product/_search
{
"aggs": {
"filters": {
"terms": {
"field": "attributes.name"
},
"aggs": {
"values": {
"terms": {
"field": "attributes.value",
"size": 10
}
}
}
}
}
}
Not sure why, but I'm getting all values for each key:
"aggregations": {
"filters": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Color",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
},
{
"key": "Gender",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
},
{
"key": "Occasion",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
},
{
"key": "Size",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
}
]
}
}
Also, I don't want to manually specify all the keys explicitly (like Color, Size) to get their respective values.
Thanks :)
To keep things simple you could use a single field to store each attribute:
"gender":"Male"
I assume you have tons of attributes, so you create an array instead; to handle that you will have to use the "nested" field type.
The nested type preserves the relation between the properties of each nested document. If you don't use nested, all the properties and values get mixed together and you will not be able to aggregate by one property without manually adding filters.
You can read an article I wrote about that here:
https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/
Mappings:
PUT test_product_nested
{
"mappings": {
"properties": {
"attributes": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
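For reference, a document matching this mapping (the same "Product 1" that shows up in the results below) could be indexed like this:
POST test_product_nested/_doc
{
  "title": "Product 1",
  "attributes": [
    { "name": "Color", "value": "Red" },
    { "name": "Gender", "value": "Female" },
    { "name": "Occasion", "value": "Active Wear" },
    { "name": "Size", "value": "XL" }
  ]
}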
This query will only show Red products of size XL and aggregate by attributes.
If you want OR's instead of AND's, you must use "should" clauses instead of "filter" clauses (see the sketch after the results below).
Query
POST test_product_nested/_search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "attributes",
"query": {
"bool": {
"filter": [
{
"term": {
"attributes.name.keyword": "Color"
}
},
{
"term": {
"attributes.value.keyword": "Red"
}
}
]
}
}
}
},
{
"nested": {
"path": "attributes",
"query": {
"bool": {
"filter": [
{
"term": {
"attributes.name.keyword": "Size"
}
},
{
"term": {
"attributes.value.keyword": "XL"
}
}
]
}
}
}
}
]
}
},
"aggs": {
"attributes": {
"nested": {
"path": "attributes"
},
"aggs": {
"name": {
"terms": {
"field": "attributes.name.keyword"
},
"aggs": {
"values": {
"terms": {
"field": "attributes.value.keyword",
"size": 10
}
}
}
}
}
}
}
}
Results
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0,
"hits": [
{
"_index": "test_product_nested",
"_id": "aJRayoQBtNG1OrZoEOQi",
"_score": 0,
"_source": {
"title": "Product 1",
"attributes": [
{
"name": "Color",
"value": "Red"
},
{
"name": "Gender",
"value": "Female"
},
{
"name": "Occasion",
"value": "Active Wear"
},
{
"name": "Size",
"value": "XL"
}
]
}
}
]
},
"aggregations": {
"attributes": {
"doc_count": 4,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Color",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Red",
"doc_count": 1
}
]
}
},
{
"key": "Gender",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Female",
"doc_count": 1
}
]
}
},
{
"key": "Occasion",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 1
}
]
}
},
{
"key": "Size",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "XL",
"doc_count": 1
}
]
}
}
]
}
}
}
}
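As mentioned before the query, turning the Color=Red / Size=XL conditions into an OR instead of an AND means moving the two nested clauses from filter into should. A sketch of just the query portion (minimum_should_match makes at least one of them required):
{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "attributes",
            "query": {
              "bool": {
                "filter": [
                  { "term": { "attributes.name.keyword": "Color" } },
                  { "term": { "attributes.value.keyword": "Red" } }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "attributes",
            "query": {
              "bool": {
                "filter": [
                  { "term": { "attributes.name.keyword": "Size" } },
                  { "term": { "attributes.value.keyword": "XL" } }
                ]
              }
            }
          }
        }
      ],
      "minimum_should_match": 1
    }
  }
}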

Elasticsearch aggregation to get difference in date per bucket

I would like to find the difference between the min date of a series of buckets and the date of that bucket. For instance, I have an Elasticsearch aggregation similar to the one below:
"id": {
"terms": {
"field": "data.id"
},
"aggs": {
"min_date": {
"min": {
"field": "data.dateSold"
}
},
"date": {
"date_histogram": {
"field": "data.dateSold",
"calendar_interval": "year"
},
"aggs": {
"sales": {
"sum": {
"field": "data.sales"
}
}
}
}
}
}
}
This produces a result similar to:
{
"uwi": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 8559203,
"buckets": [
{
"key": "Tshirts",
"doc_count": 1826,
"date_histogram#date": {
"buckets": [
{
"key_as_string": "2021-01-01T00:00:00.000Z",
"key": 1609459200000,
"doc_count": 364,
"sum#sales": {
"value": 31438.67796
}
},
{
"key_as_string": "2022-01-01T00:00:00.000Z",
"key": 1640995200000,
"doc_count": 365,
"sum#sales": {
"value": 16095.7913
}
}
]
},
"min#min_date": {
"value": 1609459200000,
"value_as_string": "2021-01-01T00:00:00.000Z"
}
...
...
I would like to add an extra value per bucket that is the difference between the date (key) and the min date, e.g. a result similar to the one below, with an extra 'difference_with_min_date' value per bucket that is the difference between the 'min#min_date' agg and that bucket's 'key':
{
"uwi": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 8559203,
"buckets": [
{
"key": "Tshirts",
"doc_count": 1826,
"date_histogram#date": {
"buckets": [
{
"key_as_string": "2021-01-01T00:00:00.000Z",
"key": 1609459200000,
"doc_count": 364,
sum#sales": {
"value": 31438.67796
},
"difference_with_min_date": {
"value": 0
}
},
{
"key_as_string": "2022-01-01T00:00:00.000Z",
"key": 1640995200000,
"doc_count": 365,
"sum#sales": {
"value": 16095.7913
},
"difference_with_min_date": {
"value": 31536000000
}
}
]
},
"min#min_date": {
"value": 1609459200000,
"value_as_string": "2021-01-01T00:00:00.000Z"
}
...
...
Any ideas would be helpful. I have tried to do this with a script with little success, since you need to supply the bucket_script path within the 'sales' aggs (i.e. as a sibling) to do it per bucket value, but then you can't reference the uncle min aggregation.
Thanks

How to use elasticsearch facet query to groupby the result

I have a json data in the below format
{
"ID": { "Color": "Black", "Product": "Car" },
"ID": { "Color": "Black", "Product": "Car" },
"ID": { "Color": "Black", "Product": "Van" },
"ID": { "Color": "Black", "Product": "Van" },
"ID": { "Color": "Ash", "Product": "Bike" }
}
I want to calculate the count of cars and the corresponding color. I am using an Elasticsearch facet to do this.
My query:
$http.post('http://localhost:9200/product/productinfoinfo/_search?size=5', {
  "aggregations": {
    "ProductInfo": {
      "terms": { "field": "product" }
    }
  },
  "facets": {
    "ProductColor": {
      "terms": { "field": "Color", "size": 10 }
    }
  }
})
I am getting the output like below:
"facets": {
  "ProductColor": {
    "_type": "terms",
    "missing": 0,
    "total": 7115,
    "other": 1448,
    "terms": [
      { "term": "Black", "count": 4 },
      { "term": "Ash", "count": 1 }
    ]
  }
},
"aggregations": {
  "ProductInfo": {
    "doc_count_error_upper_bound": 94,
    "sum_other_doc_count": 11414,
    "buckets": [
      { "key": "Car", "doc_count": 2 },
      { "key": "Van", "doc_count": 2 },
      { "key": "Bike", "doc_count": 1 }
    ]
  }
}
What I actually want is,
[ { "key": "Car", "doc_count": 2, "Color":"Black", "count":2 }, { "key": "Van", "doc_count": 2,"Color":"Black", "count":2 }, { "key": "Bike", "doc_count": 1,"Color":"Ash", "count":1 } ]
I would like to group by the result. Is it possible to do this in an Elasticsearch query?
Thanks in advance
This is because you're using both aggregations and facets which, even though they are similar, are not meant to be used together.
Facets are deprecated and will soon be removed from Elasticsearch.
Aggregations are the way to go to make "group by"-like queries.
You just have to nest another terms aggregation in the first one, like this:
{
"aggs": {
"By_type": {
"terms": {
"field": "Product"
},
"aggs": {
"By_color": {
"terms": {
"field": "Color"
}
}
}
}
}
}
And the result will be close to what you want:
"aggregations": {
"By_type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "bike",
"doc_count": 2,
"By_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "ash",
"doc_count": 1
},
{
"key": "black",
"doc_count": 1
}
]
}
},
{
"key": "car",
"doc_count": 2,
"By_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "black",
"doc_count": 2
}
]
}
},
{
"key": "van",
"doc_count": 1,
"By_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "black",
"doc_count": 1
}
]
}
}
]
}
}

elasticsearch - additional field in aggregation results

I have the following aggregation for Categories
{
"aggs": {
"category": {
"terms": { "field": "category.name" }
}
}
}
// results
"category": {
"buckets": [
{
"key": "computer & office",
"doc_count": 365
},
{
"key": "home & garden",
"doc_count": 171
},
{
"key": "consumer electronics",
"doc_count": 49
},
]
}
How can I pass an additional field, like category.id, to the category buckets, so that I could query by category.id when a certain aggregation is clicked by a user? I'm not really clear on how to query from aggregations, whether there's a direct way or whether you have to make a new query and pass the bucket key to the query filters.
Use a sub-aggregation on category.id. You will do a bit more work when looking at the results, but I think it's better than changing the mapping:
{
"aggs": {
"name": {
"terms": {
"field": "name"
},
"aggs": {
"id": {
"terms": {
"field": "id"
}
}
}
}
}
}
And the results will look like the following:
"aggregations": {
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "consumer electronics",
"doc_count": 2,
"id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2,
"doc_count": 2
}
]
}
},
{
"key": "computer & office",
"doc_count": 1,
"id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 5,
"doc_count": 1
}
]
}
},
{
"key": "home & garden",
"doc_count": 1,
"id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 1
}
]
}
},
{
"key": "whatever",
"doc_count": 1,
"id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 3,
"doc_count": 1
}
]
}
}
]
}
}
You will still have the category name, but now you also have the id from the second aggregation as a sub-bucket inside the root bucket:
"key": "consumer electronics",
...
"id": {
...
"buckets": [
{
"key": 2,
"doc_count": 2
You could add a sub aggregation:
{
  "aggs": {
    "category": {
      "terms": {
        "field": "category.name"
      },
      "aggs": {
        "id": {
          "terms": { "field": "category.id" }
        }
      }
    }
  }
}
This way each category.name bucket will contain a single bucket containing the id for that category.

How to Perform a MultiTerms Aggregation using Script?

In the Elastic documentation it says I can perform a multi-term aggregation if I use a script (reference). It is not clear to me how this is done. Basically what I am after is count(*) ... GROUP BY logsource, pid. Without a script, it seems I can only do one group by.
Can someone show me an example?
Using a script can be costly, but to answer your question:
POST /_search
{
"size": 0,
"aggs": {
"test": {
"terms": {
"script": "doc['logsource'].value+\":\"+doc['pid'].value",
"size": 0
}
}
}
}
Will do!
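On newer Elasticsearch versions (7.12 and later) there is also a dedicated multi_terms aggregation that groups by both fields without a script; a sketch, assuming the same logsource and pid fields:
POST /_search
{
  "size": 0,
  "aggs": {
    "logsource_pid": {
      "multi_terms": {
        "terms": [
          { "field": "logsource" },
          { "field": "pid" }
        ]
      }
    }
  }
}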
I think by using sub-aggregations I can get the intended result. Take for example:
{
"query" : {
"match": {
"message": "error"
}
},
"aggs": {
"g_logsource": {
"terms": {
"field": "logsource"
},
"aggs": {
"g_pid": {
"terms": {
"field": "pid"
},
"aggs" : {
"ts" : {
"date_histogram" : {
"field" : "#timestamp",
"interval" : "1h"
}
}
}
}
}
}
}
}
Returns:
"aggregations": {
"g_logsource": {
"doc_count_error_upper_bound": 0,
"buckets": [
{
"key": "nyhq",
"doc_count": 2129,
"g_pid": {
"doc_count_error_upper_bound": 5,
"buckets": [
{
"key": "5641",
"doc_count": 9,
"ts": {
"buckets": [
{
"key_as_string": "2014-12-07T04:00:00.000Z",
"key": 1417924800000,
"doc_count": 2
},
{
"key_as_string": "2014-12-07T08:00:00.000Z",
"key": 1417939200000,
"doc_count": 4
},
{
"key_as_string": "2014-12-07T18:00:00.000Z",
"key": 1417975200000,
"doc_count": 1
},
{
"key_as_string": "2014-12-07T20:00:00.000Z",
"key": 1417982400000,
"doc_count": 2
}
]
}
},
{
"key": "14839",
"doc_count": 3,
"ts": {
"buckets": [
{
"key_as_string": "2014-12-07T09:00:00.000Z",
"key": 1417942800000,
"doc_count": 1
},
{
"key_as_string": "2014-12-07T20:00:00.000Z",
"key": 1417982400000,
"doc_count": 2
}
]
}
}
In my code, I can then combine the groups into {logsource: nyhq, pid: 5641} as the identifier for each time series. I think this is the same as GROUP BY in SQL. Would appreciate any comments confirming this.
