How to group by a duplicate field in an array list: Elasticsearch

I have a problem with a nested aggregation in Elasticsearch. My mapping has a nested field:
"Topics":{"type":"nested","properties":{
"CategoryLev1":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},
"CategoryLev2":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}} }}
After indexing two documents:
"Topics": [
{
"CategoryRelevancy": "1.0",
"CategoryLev2": "Money",
"CategoryLev1": "Sales"
},
{
"CategoryRelevancy": "2.0",
"CategoryLev2": "Money",
"CategoryLev1": "Sales"
},
{
"CategoryRelevancy": "1.0",
"CategoryLev2": "Electrical",
"CategoryLev1": "Product"
}
]
"Topics": [
{
"CategoryRelevancy": "1.0",
"CategoryLev2": "Money",
"CategoryLev1": "Sales"
},
{
"CategoryRelevancy": "2.0",
"CategoryLev2": "Methods",
"CategoryLev1": "Sales"
},
{
"CategoryRelevancy": "1.0",
"CategoryLev2": "Engine",
"CategoryLev1": "Product"
}
]
As you can see, in the first document the nested array contains two topics with duplicate CategoryLev1 and CategoryLev2 values (Sales / Money). I then run this query:
{
"size": 10,
"aggregations": {
"resellers": {
"nested": {
"path": "Topics"
},
"aggregations": {
"topicGroup": {
"terms": {
"field": "Topics.CategoryLev1.keyword",
"size": 10
},
"aggregations": {
"Subtopic": {
"terms": {
"field": "Topics.CategoryLev2.keyword"
}
}
}
}
}
}
}
}
I get the following result, grouped by topic category:
{
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"resellers": {
"doc_count": 6,
"topicGroup": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Sales",
"doc_count": 4,
"Subtopic": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Money",
"doc_count": 3
},
{
"key": "Methods",
"doc_count": 1
}
]
}
},
{
"key": "Product",
"doc_count": 2,
"Subtopic": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Electrical",
"doc_count": 1
},
{
"key": "Engine",
"doc_count": 1
}
]
}
}
]
}
}
}
}
But I want a result like this, where each document is counted only once per bucket:
"buckets": [
{
"key": "Sales",
"doc_count": 2,
"Subtopic": {
"buckets": [
{
"key": "Money",
"doc_count": 2
},
{
"key": "Methods",
"doc_count": 1
}
]
}
},
{
"key": "Product",
"doc_count": 2,
"Subtopic": {
"buckets": [
{
"key": "Electrical",
"doc_count": 1
},
{
"key": "Engine",
"doc_count": 1
}]
}
}]
Thanks in advance :)
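To see where the extra counts come from, here is a minimal Python simulation (plain Python, not an Elasticsearch API) of the two counting rules, using the two documents from the question with CategoryRelevancy dropped. A plain nested + terms aggregation counts nested Topics objects, so the duplicated Sales / Money topic is counted twice; the desired output instead counts each parent document at most once per bucket. In Elasticsearch itself, parent-level counts are usually obtained by adding a reverse_nested sub-aggregation inside the nested terms buckets and reading its doc_count. The function names below are hypothetical.

```python
from collections import Counter, defaultdict

# The two documents from the question, as (CategoryLev1, CategoryLev2) pairs.
# CategoryRelevancy is left out because it does not affect the counts.
docs = [
    [("Sales", "Money"), ("Sales", "Money"), ("Product", "Electrical")],
    [("Sales", "Money"), ("Sales", "Methods"), ("Product", "Engine")],
]

def nested_counts(docs):
    """Count nested Topics objects, as nested + terms + terms does."""
    lev1 = Counter()
    lev2 = defaultdict(Counter)
    for topics in docs:
        for cat1, cat2 in topics:
            lev1[cat1] += 1
            lev2[cat1][cat2] += 1
    return lev1, lev2

def parent_counts(docs):
    """Count each parent document at most once per bucket (the desired output)."""
    lev1 = Counter()
    lev2 = defaultdict(Counter)
    for topics in docs:
        for cat1 in {cat1 for cat1, _ in topics}:
            lev1[cat1] += 1
        for cat1, cat2 in set(topics):
            lev2[cat1][cat2] += 1
    return lev1, lev2

print(nested_counts(docs)[0])  # Sales: 4, Product: 2
print(parent_counts(docs)[0])  # Sales: 2, Product: 2
```

The nested counts add up to 6, matching the "resellers" doc_count, while the parent counts match the buckets the question asks for.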

Related

Elasticsearch terms aggregation for getting filter options

I'm trying to implement product search and want to get the search results together with the filters the user can choose from. I have managed to get the filter keys, but I also want the values of those keys.
My product body is:
{
...product,
"attributes": [
{
"name": "Color",
"value": "Aqua Blue"
},
{
"name": "Gender",
"value": "Female"
},
{
"name": "Occasion",
"value": "Active Wear"
},
{
"name": "Size",
"value": "0"
}
],
}
and I'm using this query in Elasticsearch:
GET product/_search
{
"aggs": {
"filters": {
"terms": {
"field": "attributes.name"
},
"aggs": {
"values": {
"terms": {
"field": "attributes.value",
"size": 10
}
}
}
}
}
}
I'm not sure why, but I get all values for every key:
"aggregations": {
"filters": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Color",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
},
{
"key": "Gender",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
},
{
"key": "Occasion",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
},
{
"key": "Size",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
}
]
}
Also, I do not want to specify every key (Color, Size, ...) explicitly to get its values.
Thanks :)
To keep things simple, you could use a separate field for each attribute:
"gender":"Male"
I assume you have many attributes, which is why you store them in an array instead. To handle that, you have to use the "nested" field type.
The nested type preserves the relationship between the properties of each nested object. If you don't use nested, the array is flattened, so all names and values are mixed together and you cannot aggregate by a property without adding filters manually.
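A minimal Python sketch of why the original query returns every value under every name (plain Python, not Elasticsearch code; the function names are illustrative and the sample product is the one from the question). Without the nested type, a document only records that attributes.name contains {Color, Gender, ...} and attributes.value contains {Aqua Blue, Female, ...}, so a terms sub-aggregation sees all of the document's values in every name bucket. With nested, each name/value pair stays together.

```python
from collections import Counter, defaultdict

def flattened_aggs(products):
    """Mimic terms(attributes.name) -> terms(attributes.value) on a plain object array."""
    result = defaultdict(Counter)
    for attrs in products:
        names = {a["name"] for a in attrs}    # flattened attributes.name
        values = {a["value"] for a in attrs}  # flattened attributes.value
        for name in names:
            result[name].update(values)       # every value lands in every name bucket
    return result

def nested_aggs(products):
    """Mimic the same aggregation on a nested field: each pair is kept together."""
    result = defaultdict(Counter)
    for attrs in products:
        for a in attrs:
            result[a["name"]][a["value"]] += 1  # counts nested objects
    return result

product = [
    {"name": "Color", "value": "Aqua Blue"},
    {"name": "Gender", "value": "Female"},
    {"name": "Occasion", "value": "Active Wear"},
    {"name": "Size", "value": "0"},
]
print(dict(flattened_aggs([product])["Color"]))  # all four values
print(dict(nested_aggs([product])["Color"]))     # only Aqua Blue
```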
You can read an article I wrote about that here:
https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/
Mappings:
PUT test_product_nested
{
"mappings": {
"properties": {
"attributes": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
This query will only show Red products of size XL and aggregate by attributes.
If you want OR instead of AND semantics, use "should" clauses instead of "filter" clauses.
Query
POST test_product_nested/_search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "attributes",
"query": {
"bool": {
"filter": [
{
"term": {
"attributes.name.keyword": "Color"
}
},
{
"term": {
"attributes.value.keyword": "Red"
}
}
]
}
}
}
},
{
"nested": {
"path": "attributes",
"query": {
"bool": {
"filter": [
{
"term": {
"attributes.name.keyword": "Size"
}
},
{
"term": {
"attributes.value.keyword": "XL"
}
}
]
}
}
}
}
]
}
},
"aggs": {
"attributes": {
"nested": {
"path": "attributes"
},
"aggs": {
"name": {
"terms": {
"field": "attributes.name.keyword"
},
"aggs": {
"values": {
"terms": {
"field": "attributes.value.keyword",
"size": 10
}
}
}
}
}
}
}
}
Results
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0,
"hits": [
{
"_index": "test_product_nested",
"_id": "aJRayoQBtNG1OrZoEOQi",
"_score": 0,
"_source": {
"title": "Product 1",
"attributes": [
{
"name": "Color",
"value": "Red"
},
{
"name": "Gender",
"value": "Female"
},
{
"name": "Occasion",
"value": "Active Wear"
},
{
"name": "Size",
"value": "XL"
}
]
}
}
]
},
"aggregations": {
"attributes": {
"doc_count": 4,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Color",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Red",
"doc_count": 1
}
]
}
},
{
"key": "Gender",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Female",
"doc_count": 1
}
]
}
},
{
"key": "Occasion",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 1
}
]
}
},
{
"key": "Size",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "XL",
"doc_count": 1
}
]
}
}
]
}
}
}
}
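The query above wraps each name/value condition in its own nested query, so both terms must match within the same attribute object; the outer "filter" clauses then require every condition (AND), while "should" clauses would accept any of them (OR). A rough Python model of that logic, not Elasticsearch code; the products and function names are hypothetical.

```python
def has_attribute(attrs, name, value):
    """One nested query: name and value must match in the same attribute object."""
    return any(a["name"] == name and a["value"] == value for a in attrs)

def search(products, conditions, clause="filter"):
    """'filter' requires every condition (AND); 'should' requires at least one (OR)."""
    combine = all if clause == "filter" else any
    return [
        p["title"]
        for p in products
        if combine(has_attribute(p["attributes"], n, v) for n, v in conditions)
    ]

# Hypothetical sample data
products = [
    {"title": "Product 1", "attributes": [{"name": "Color", "value": "Red"}, {"name": "Size", "value": "XL"}]},
    {"title": "Product 2", "attributes": [{"name": "Color", "value": "Red"}, {"name": "Size", "value": "M"}]},
    # Values swapped across objects: must NOT match Color=Red or Size=XL
    {"title": "Product 3", "attributes": [{"name": "Color", "value": "XL"}, {"name": "Size", "value": "Red"}]},
]
conditions = [("Color", "Red"), ("Size", "XL")]
print(search(products, conditions))            # ['Product 1']
print(search(products, conditions, "should"))  # ['Product 1', 'Product 2']
```

Product 3 shows why the pairing matters: on a flattened (non-nested) field it would contain both "Color" and "Red", just not in the same object.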

Elasticsearch: Filter results in deep aggregation

I'm trying to retrieve the number of students participating in each combination of sport activities. I've tried a deep (nested terms) aggregation, but I would like to exclude pairs that come from the same eventid: a combination such as 100m + 100m should only be counted if the two 100m events have different eventids. Is this achievable with Elasticsearch?
Mapping:
{
"properties": {
"events": {
"type": "nested",
"include_in_parent": true
}
}
}
Student data:
{
"name": "Alice",
"events": [
{"activity": "400m", "eventid": "4000"},
{"activity": "800m", "eventid": "8000"},
{"activity": "100m", "eventid": "1000"},
{"activity": "100m", "eventid": "1001"}
]
},
{
"name": "Bob",
"events": [
{"activity": "100m", "eventid": "1000"},
{"activity": "400m", "eventid": "4000"}
]
},
{
"name": "Cat",
"events": [
{"activity": "400m", "eventid": "4000"},
{"activity": "400m", "eventid": "4001"}
]
},
{
"name": "Dillian",
"events": [
{"activity": "100m", "eventid": "1001"},
{"activity": "800m", "eventid": "8000"}
]
}
Query:
{
"from": 0,
"size": 0,
"aggregations": {
"activity1": {
"terms": {
"field": "events.activity.keyword",
"size": 5,
"order": {
"_term": "asc"
}
},
"aggregations": {
"activity2": {
"terms": {
"field": "events.activity.keyword",
"size": 5,
"order": {
"_term": "asc"
}
}
}
}
}
}
}
Incorrect Result:
"aggregations": {
"activity1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100m",
"doc_count": 3,
"activity2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100m",
"doc_count": 3
},
{
"key": "400m",
"doc_count": 2
},
{
"key": "800m",
"doc_count": 2
}
]
}
},
{
"key": "400m",
"doc_count": 3,
"activity2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100m",
"doc_count": 2
},
{
"key": "400m",
"doc_count": 3
},
{
"key": "800m",
"doc_count": 1
}
]
}
},
{
"key": "800m",
"doc_count": 2,
"activity2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100m",
"doc_count": 2
},
{
"key": "400m",
"doc_count": 1
},
{
"key": "800m",
"doc_count": 2
}
]
}
}
]
}
}
Required Result:
"aggregations": {
"activity1": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100m",
"doc_count": 3,
"activity2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100m",
"doc_count": 1
},
{
"key": "400m",
"doc_count": 2
},
{
"key": "800m",
"doc_count": 2
}
]
}
},
{
"key": "400m",
"doc_count": 3,
"activity2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100m",
"doc_count": 2
},
{
"key": "400m",
"doc_count": 1
},
{
"key": "800m",
"doc_count": 1
}
]
}
},
{
"key": "800m",
"doc_count": 2,
"activity2": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "100m",
"doc_count": 2
},
{
"key": "400m",
"doc_count": 1
},
{
"key": "800m",
"doc_count": 0
}
]
}
}
]
}
}
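The required result can be pinned down precisely with a small client-side Python sketch (this is not an Elasticsearch feature; the data is the question's four students and the function name is hypothetical): for each student, collect the ordered pairs (activity1, activity2) formed by two different event entries whose eventids differ, and count each pair at most once per student.

```python
from collections import defaultdict
from itertools import permutations

# Student data from the question: (activity, eventid) per event entry
students = {
    "Alice": [("400m", "4000"), ("800m", "8000"), ("100m", "1000"), ("100m", "1001")],
    "Bob": [("100m", "1000"), ("400m", "4000")],
    "Cat": [("400m", "4000"), ("400m", "4001")],
    "Dillian": [("100m", "1001"), ("800m", "8000")],
}

def combination_counts(students):
    """Count students per (activity1, activity2) pair built from two events with different eventids."""
    counts = defaultdict(lambda: defaultdict(int))
    for events in students.values():
        pairs = set()
        # permutations() pairs each event entry with every other entry, in both orders
        for (act1, id1), (act2, id2) in permutations(events, 2):
            if id1 != id2:  # the same eventid means the same event, so skip it
                pairs.add((act1, act2))
        for act1, act2 in pairs:  # each student counts once per pair
            counts[act1][act2] += 1
    return counts

counts = combination_counts(students)
for act1 in sorted(counts):
    print(act1, dict(sorted(counts[act1].items())))
```

Running it reproduces the activity2 counts in the required result; 800m / 800m never appears, which corresponds to the required doc_count of 0.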

Elasticsearch aggregation by all tokens in a string field

I'm using Elasticsearch 2.4 and trying to aggregate on a string field whose text contains multiple tokens. The field is an address field called mailingAddress. For example, the query below searches for NY in the address field:
{
"from": 0,
"size": 100,
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"match": {
"customerprofile.mailingAddress": {
"query": "NY",
"fuzziness": 0,
"operator": "or"
}
}
},
{
"match": {
"customerprofile.companyId": {
"query": "999",
"fuzziness": 0,
"operator": "or"
}
}
}
]
}
}
]
}
}
}
It returns:
"hits":[
{
"_index":"wht_index_prod_v33_es24",
"_type":"customerprofile",
"_id":"2044",
"_score":2.9787974,
"_source":{
"customerId":2044,
"companyId":2007,
"fullName":"John Doe",
"email":"jon#aol.com",
"pictureURL":"john.png",
"profilePictureContentType":"image/png",
"phone":"(703) 999-8888",
"mailingAddress":"100 Lake Braddock Drive\nBurke, NY 22015",
"gender":"Male",
"emergencyContactsIds":[
],
"wantCorrespondence":false
}
},
{
"_index":"wht_index_prod_v33_es24",
"_type":"customerprofile",
"_id":"2045",
"_score":2.9787974,
"_source":{
"customerId":2045,
"companyId":2007,
"fullName":"Jane Anderson",
"email":"janea#touchva.net",
"pictureURL":"JAnderson.png",
"profilePictureContentType":"image/png",
"phone":"(434) 111-2345",
"mailingAddress":"PO Box 333, Boydton, NY 23917",
"gender":"Male",
"emergencyContactsIds":[
],
"wantCorrespondence":false
}
},
..
..
]
The question
When I aggregate by mailingAddress, I expect a bucket for each word in the field. Given the results above, I also expect a bucket with the key 'NY', but there isn't one. Can anyone explain why? My guess is that it has too few entries.
The aggregation:
{
"size": 0,
"aggs": {
"group_by_age": {
"terms": {
"field": "mailingAddress"
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender"
}
}
}
}
}
}
Aggregation results:
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 401,
"max_score": 0,
"hits": [
]
},
"aggregations": {
"group_by_age": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 1041,
"buckets": [
{
"key": "st",
"doc_count": 30,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 17
},
{
"key": "male",
"doc_count": 13
}
]
}
},
{
"key": "ca",
"doc_count": 28,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 21
},
{
"key": "male",
"doc_count": 7
}
]
}
},
{
"key": "dr",
"doc_count": 16,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 13
},
{
"key": "male",
"doc_count": 3
}
]
}
},
{
"key": "street",
"doc_count": 15,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 11
},
{
"key": "male",
"doc_count": 4
}
]
}
},
{
"key": "ave",
"doc_count": 14,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 7
},
{
"key": "male",
"doc_count": 7
}
]
}
},
{
"key": "box",
"doc_count": 11,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 9
},
{
"key": "male",
"doc_count": 2
}
]
}
},
{
"key": "fl",
"doc_count": 11,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 9
},
{
"key": "male",
"doc_count": 2
}
]
}
},
{
"key": "va",
"doc_count": 11,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "male",
"doc_count": 6
},
{
"key": "female",
"doc_count": 5
}
]
}
},
{
"key": "n",
"doc_count": 10,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 7
},
{
"key": "male",
"doc_count": 3
}
]
}
},
{
"key": "az",
"doc_count": 9,
"group_by_gender": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "female",
"doc_count": 7
},
{
"key": "male",
"doc_count": 2
}
]
}
}
]
}
}
}
By default, a terms aggregation returns only the top 10 terms (ranked by document count), but you can return more by setting "size" in the aggregation (note the added "size": 50):
{
"size": 0,
"aggs": {
"group_by_age": {
"terms": {
"field": "mailingAddress",
"size": 50
},
"aggs": {
"group_by_gender": {
"terms": {
"field": "gender"
}
}
}
}
}
}
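Depending on your data, you may need an even larger size before the NY bucket appears. Also note that mailingAddress is analyzed, so the bucket key will be the lowercased token ny rather than NY, just like the st, ca and va keys in the output above.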
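A minimal Python model of why a term can be missing from the buckets (an approximation for illustration, not Elasticsearch's actual implementation): a terms aggregation ranks terms by how many documents contain them and returns only the first size buckets, while the doc counts of all remaining terms are summed into sum_other_doc_count. The sample addresses and the function name are made up, the tokens are lowercase as the analyzer would produce them, and ties are broken by key here.

```python
from collections import Counter

def top_terms(token_lists, size=10):
    """Return the top `size` (term, doc_count) buckets and the summed count of the rest."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(set(tokens))  # a term counts at most once per document
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    buckets = ranked[:size]
    sum_other_doc_count = sum(count for _, count in ranked[size:])
    return buckets, sum_other_doc_count

# Hypothetical analyzed addresses (lowercased tokens)
addresses = [
    ["100", "lake", "braddock", "drive", "burke", "ny", "22015"],
    ["po", "box", "333", "boydton", "ny", "23917"],
    ["1", "main", "st", "ca"],
    ["2", "oak", "st", "ca"],
    ["3", "elm", "st", "va"],
]
print(top_terms(addresses, size=2))  # "ny" is cut off
print(top_terms(addresses, size=3))  # "ny" now appears
```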

Elasticsearch: sort aggregation keys as numbers

I wrote a query that aggregates some data, and the aggregation key is a number. I tried to sort the aggregation result by key, but Elasticsearch treated the key as a string.
Since the result has a large number of buckets, sorting them on the client side isn't feasible. Any ideas?
Here is my query:
"aggregations" : {
"startcount" : {
"terms" : {
"script" : "round(doc['startat'].value/1000)",
"size" : 1000,
"order" : { "_term" : "asc" }
}
}
}
and the current result buckets (excerpt):
"buckets": [
{
"key": "0",
"doc_count": 68
},
{
"key": "1",
"doc_count": 21
},
{
"key": "10",
"doc_count": 6
},
{
"key": "11",
"doc_count": 16
},
This is the result I expect:
"buckets": [
{
"key": "0",
"doc_count": 68
},
{
"key": "1",
"doc_count": 21
},
{
"key": "2", // not '10'
"doc_count": 6
},
{
"key": "3", // not '11'
"doc_count": 16
},
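The two bucket orders above are exactly the difference between comparing keys as strings and comparing them as numbers. A short Python illustration (not Elasticsearch code) using the keys from the question:

```python
keys = ["0", "1", "10", "11", "2", "3"]

# String (lexicographic) order: "10" sorts before "2" because '1' < '2'
lexicographic = sorted(keys)
# Numeric order: compare the keys as integers instead
numeric = sorted(keys, key=int)

print(lexicographic)  # ['0', '1', '10', '11', '2', '3']
print(numeric)        # ['0', '1', '2', '3', '10', '11']
```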
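Using a value script (the field plus a script that transforms _value) should fix the alphabetical sort issue, presumably because the terms then keep the numeric type of the startat field instead of being compared as strings.
Example: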
{
"size": 0,
"aggregations": {
"startcount": {
"terms": {
"field": "startat",
"script": "round(_value/1000)",
"size": 1000,
"order": {
"_term": "asc"
}
}
}
}
}
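This is a multi-level group-by example in which the buckets are sorted by key in descending order. Note that organization_revenue_in_thousands_int.keyword is a keyword field, so _key ordering is lexicographic: in the output, "99900" comes before "998000" even though it is numerically smaller.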
{
"size": 0,
"aggs": {
"categories": {
"filter": {
"exists": {
"field": "organization_industries"
}
},
"aggs": {
"names": {
"terms": {
"field": "organization_revenue_in_thousands_int.keyword",
"size": 200,
"order": {
"_key": "desc"
}
},
"aggs": {
"industry_stats": {
"terms": {
"field": "organization_industries.keyword"
}
}
}
}
}
}
}
}
Output
"aggregations": {
"categories": {
"doc_count": 195161605,
"names": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 19226983,
"buckets": [
{
"key": "99900",
"doc_count": 1742,
"industry_stats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "internet",
"doc_count": 1605
},
{
"key": "investment management",
"doc_count": 81
},
{
"key": "biotechnology",
"doc_count": 54
},
{
"key": "computer & network security",
"doc_count": 2
}
]
}
},
{
"key": "998000",
"doc_count": 71,
"industry_stats": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "finance",
"doc_count": 48
},
{
"key": "information technology & services",
"doc_count": 23
}
]
}
}
]
}
}
}

How to use an Elasticsearch facet query to group the results

I have JSON data in the following format:
{
"ID": { "Color": "Black", "Product": "Car" },
"ID": { "Color": "Black", "Product": "Car" },
"ID": { "Color": "Black", "Product": "Van" },
"ID": { "Color": "Black", "Product": "Van" },
"ID": { "Color": "Ash", "Product": "Bike" }
}
I want to count each product (for example, Car) together with its corresponding color. I am using an Elasticsearch facet to do this.
My query:
$http.post('http://localhost:9200/product/productinfoinfo/_search?size=5', { "aggregations": { "ProductInfo": { "terms": { "field": "product" } } }, "facets": { "ProductColor": { "terms": { "field": "Color", "size": 10 } } } })
I get output like this:
"facets": { "ProductColor": { "_type": "terms", "missing": 0, "total": 7115, "other": 1448, "terms": [ { "term": "Black", "count": 4 }, { "term": "Ash", "count": 1 } ] } },
"aggregations": { "ProductInfo": { "doc_count_error_upper_bound": 94, "sum_other_doc_count": 11414, "buckets": [ { "key": "Car", "doc_count": 2 }, { "key": "Van", "doc_count": 2 }, { "key": "Bike", "doc_count": 1 } ] } } }
What I actually want is:
[ { "key": "Car", "doc_count": 2, "Color":"Black", "count":2 }, { "key": "Van", "doc_count": 2,"Color":"Black", "count":2 }, { "key": "Bike", "doc_count": 1,"Color":"Ash", "count":1 } ]
I would like to group the results. Is it possible to do this in an Elasticsearch query?
Thanks in advance
This is because you're using both aggregations and facets, which, although similar, are not meant to be used together.
Facets are deprecated and will soon be removed from Elasticsearch (they were removed in version 2.0).
Aggregations are the way to go for "group by"-like queries.
You just have to nest a second terms aggregation inside the first one, like this:
{
"aggs": {
"By_type": {
"terms": {
"field": "Product"
},
"aggs": {
"By_color": {
"terms": {
"field": "Color"
}
}
}
}
}
}
The result will be close to what you want:
"aggregations": {
"By_type": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "bike",
"doc_count": 2,
"By_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "ash",
"doc_count": 1
},
{
"key": "black",
"doc_count": 1
}
]
}
},
{
"key": "car",
"doc_count": 2,
"By_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "black",
"doc_count": 2
}
]
}
},
{
"key": "van",
"doc_count": 1,
"By_color": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "black",
"doc_count": 1
}
]
}
}
]
}
}
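Conceptually, the nested terms aggregation is a two-level group by: split the documents by Product, then split each bucket by Color. A minimal Python sketch of that grouping over the five records from the question; the function name and the output shape are illustrative, only loosely modelled on the bucket format, and not an Elasticsearch API. (The Elasticsearch result above shows lowercased keys, presumably because those string fields are analyzed.)

```python
from collections import Counter, defaultdict

def group_by(records, outer, inner):
    """Bucket records by `outer`, then count `inner` values inside each bucket."""
    groups = defaultdict(Counter)
    for record in records:
        groups[record[outer]][record[inner]] += 1
    buckets = []
    for key, sub in groups.items():
        buckets.append({
            "key": key,
            "doc_count": sum(sub.values()),
            inner: [{"key": k, "doc_count": c} for k, c in sub.most_common()],
        })
    # Largest buckets first, ties broken by key
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets

records = [
    {"Color": "Black", "Product": "Car"},
    {"Color": "Black", "Product": "Car"},
    {"Color": "Black", "Product": "Van"},
    {"Color": "Black", "Product": "Van"},
    {"Color": "Ash", "Product": "Bike"},
]
for bucket in group_by(records, "Product", "Color"):
    print(bucket)
```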
