ElasticSearch aggregations - to lowercase or not to lowercase

ElasticSearch aggregations - to lowercase or not to lowercase - elasticsearch

Please observe this secenario:
Define mappings
PUT /my_index
{
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Add data
PUT /my_index/my_type/1
{
"city": "New York"
}
PUT /my_index/my_type/2
{
"city": "York"
}
PUT /my_index/my_type/3
{
"city": "york"
}
Query for facets
GET /my_index/_search
{
"size": 0,
"aggs": {
"Cities": {
"terms": {
"field": "city.raw"
}
}
}
}
Result
{
...
"aggregations": {
"Cities": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
},
{
"key": "York",
"doc_count": 1
},
{
"key": "york",
"doc_count": 1
}
]
}
}
}
Dilemma
I would like to 2 thing:
"York" and "york" should be combined so instead of 3 buckets with each 1 hit I would 2 buckets, one for "New York (1)" and one for "York (2)"
The casing of the city must be preserved - I don't want facet values to be all lowercased
Dream result
{
...
"aggregations": {
"Cities": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
},
{
"key": "York",
"doc_count": 2
}
]
}
}
}

It's going to make your client-side code slightly more complicated, but you could always do something like this.
Set up the index with an additional sub-field that is only lower-cased (not split on white space):
PUT /my_index
{
"settings": {
"analysis": {
"analyzer": {
"lowercase_analyzer": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"my_type": {
"properties": {
"city": {
"type": "string",
"fields": {
"lowercase": {
"type": "string",
"analyzer": "lowercase_analyzer"
},
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
PUT /my_index/my_type/_bulk
{"index":{"_id":1}}
{"city":"New York"}
{"index":{"_id":2}}
{"city":"York"}
{"index":{"_id":3}}
{"city":"york"}
Then use a two-level aggregation like this, where the second orders alphabetically ascending (so that upper-case term will come first) and only returns the top raw term for each lower-case term:
GET /my_index/_search
{
"size": 0,
"aggs": {
"city_lowercase": {
"terms": {
"field": "city.lowercase"
},
"aggs": {
"city_terms": {
"terms": {
"field": "city.raw",
"order" : { "_term" : "asc" },
"size": 1
}
}
}
}
}
}
which returns:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0,
"hits": []
},
"aggregations": {
"city_lowercase": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "york",
"doc_count": 2,
"city_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 1,
"buckets": [
{
"key": "York",
"doc_count": 1
}
]
}
},
{
"key": "new york",
"doc_count": 1,
"city_terms": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "New York",
"doc_count": 1
}
]
}
}
]
}
}
}
Here's the code I used (with a few more doc examples):
http://sense.qbox.io/gist/f3781d58fbaadcc1585c30ebb087108d2752dfff

Related

Elastic search terms aggregation for getting filter options

im trying to implement product searching and want to get search results along with filters to filter from. i have managed to get the filter keys reference, but also want values of those keys
my product body is
{
...product,
"attributes": [
{
"name": "Color",
"value": "Aqua Blue"
},
{
"name": "Gender",
"value": "Female"
},
{
"name": "Occasion",
"value": "Active Wear"
},
{
"name": "Size",
"value": "0"
}
],
}
and im using the this query in es
GET product/_search
{
"aggs": {
"filters": {
"terms": {
"field": "attributes.name"
},
"aggs": {
"values": {
"terms": {
"field": "attributes.value",
"size": 10
}
}
}
}
}
}
Not sure why, but im getting all values for each key
"aggregations": {
"filters": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Color",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
},
{
"key": "Gender",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
},
{
"key": "Occasion",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
},
{
"key": "Size",
"doc_count": 3,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 3
},
{
"key": "Aqua Blue",
"doc_count": 3
},
{
"key": "Female",
"doc_count": 3
},
{
"key": "0",
"doc_count": 2
},
{
"key": "10XL",
"doc_count": 1
}
]
}
}
]
}
Also i do not want to specify manually all keys explicitly like Color, Size to get their respective values each.
Thanks :)

To keep things simple must you use a single field to store attributes:
"gender":"Male"
I assume you have tons of attributes so you create an array instead, to handle that you will have to use "nested" field type.
Nested type preserves the relation between each of the nested document properties. If you dont use nested you will see all the properties and values mixed and you will not be able to aggregate by a property without manually adding filters.
You can read an article I wrote about that here:
https://opster.com/guides/elasticsearch/data-architecture/elasticsearch-nested-field-object-field/
Mappings :
PUT test_product_nested
{
"mappings": {
"properties": {
"attributes": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
This query will only show Red products of size XL and aggregate by attributes.
If you want to do OR's instead of AND's you must use "should" clauses instead of "filter" clauses.
Query
POST test_product_nested/_search
{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "attributes",
"query": {
"bool": {
"filter": [
{
"term": {
"attributes.name.keyword": "Color"
}
},
{
"term": {
"attributes.value.keyword": "Red"
}
}
]
}
}
}
},
{
"nested": {
"path": "attributes",
"query": {
"bool": {
"filter": [
{
"term": {
"attributes.name.keyword": "Size"
}
},
{
"term": {
"attributes.value.keyword": "XL"
}
}
]
}
}
}
}
]
}
},
"aggs": {
"attributes": {
"nested": {
"path": "attributes"
},
"aggs": {
"name": {
"terms": {
"field": "attributes.name.keyword"
},
"aggs": {
"values": {
"terms": {
"field": "attributes.value.keyword",
"size": 10
}
}
}
}
}
}
}
}
Results
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 0,
"hits": [
{
"_index": "test_product_nested",
"_id": "aJRayoQBtNG1OrZoEOQi",
"_score": 0,
"_source": {
"title": "Product 1",
"attributes": [
{
"name": "Color",
"value": "Red"
},
{
"name": "Gender",
"value": "Female"
},
{
"name": "Occasion",
"value": "Active Wear"
},
{
"name": "Size",
"value": "XL"
}
]
}
}
]
},
"aggregations": {
"attributes": {
"doc_count": 4,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Color",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Red",
"doc_count": 1
}
]
}
},
{
"key": "Gender",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Female",
"doc_count": 1
}
]
}
},
{
"key": "Occasion",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Active Wear",
"doc_count": 1
}
]
}
},
{
"key": "Size",
"doc_count": 1,
"values": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "XL",
"doc_count": 1
}
]
}
}
]
}
}
}
}

elastic search nested sub aggregations

We are using elastic search which holds records as documents with following definition
{
"loadtender": {
"aliases": {},
"mappings": {
"_doc": {
"_meta": {
"version": 20
},
"properties": {
"carrierId": {
"type": "long"
},
"destinationData": {
"type": "keyword"
},
"destinationZip": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 50
}
}
},
"effStartTime": {
"type": "date"
},
"endTime": {
"type": "date"
},
"id": {
"type": "long"
},
"mustRespondByTime": {
"type": "date"
},
"orgdiv": {
"type": "keyword"
},
"originData": {
"type": "keyword"
},
"originZip": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 50
}
}
},
"purchaseOrderNum": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 255
}
}
},
"startTime": {
"type": "date"
},
"tenderStatus": {
"type": "keyword"
},
"tenderedTime": {
"type": "date"
}
}
}
},
"settings": {
"index": {
"creation_date": "1655105542470",
"number_of_shards": "5",
"number_of_replicas": "1",
"uuid": "ohcXgA8EQ5iJj0X6_4BqXA",
"version": {
"created": "6080499"
},
"provided_name": "loadtender"
}
}
}
}
I am trying to search records to return me following filtered results
Input Parameter : startDate (yesterday), originData.originCity and originData.destinationCity
Output Required:
Three buckets for 0-30 days, 30-60 days and 60-90 days
buckets of distinct originData.city and destinationData.city combinations under each of the above
Under each of the above, buckets of data for each unique carrierId and the corresponding record list / count
Basically I was trying to achieve something like the below
{
"aggregations": {
"aggr": {
"buckets": [
{
"key": "0-30 days",
"doc_count": 10,
"aggr": {
"buckets": [
{
"key": "(originCity)Menasha, WI, US|Hanover, MD, US (DestinationCity)",
"aggr": {
"buckets": [
{
"key": "10183-carrierId",
"count": 10
}
]
}
}
]
}
},
{
"key": "30-60 days",
"doc_count": 11,
"aggr": {
"buckets": [
{
"key": "Dallas, TX, US|Houston, TX, US",
"aggr": {
"buckets": [
{
"key": "10183-carrierId",
"count": 10
},
{
"key": "10022-carrierId",
"count": 1
}
]
}
}
]
}
}
]
}
}
}
I've tried the following but I think I am not finding a way to filter it further using the sub aggregators.
{
"_source":["id", "effStartTime", "carrierId", "originData", "destinationData"],
"size": 100,
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"range": {
"startTime": {
"from": "2021-08-27T23:59:59.000Z",
"to": "2022-09-01T00:00:00.000Z",
"include_lower": true,
"include_upper": true,
"boost": 1
}
}
}
],
"adjust_pure_negative": true,
"boost": 1
}
}
],
"must_not": [
{
"term": {
"tenderStatus": {
"value": "REMOVED",
"boost": 1
}
}
}
],
"filter" : {
"exists" : {
"field" : "carrierId"
}
},
"adjust_pure_negative": true,
"boost": 1
}
},
"aggregations": {
"aggr": {
"terms": {
"script": "doc['originData'].values[0] + '|' + doc['destinationData'].values[0]"
}
}
}
}
I started beginning to think if this is even possible OR should I shift to issuing multiple queries for the same

I was able to achieve the same using the following sub-aggregations:
"aggregations": {
"aggr":{
"date_range": {
"field": "startTime",
"format": "MM-yyyy",
"ranges": [
{"to": "now-1M/M", "from": "now"}, --> now to 30 days back
{"to": "now-1M/M", "from": "now-2M/M"}, from 30 days back to 60 days back
{"to": "now-2M/M", "from": "now-3M/M"}, from 60 days back to 90 days back
{"to": "now-3M/M", "from": "now-12M/M"}
]
},
"aggregations": {
"aggr":{
"terms": {
"script": "doc['originData'].values[0] + '|' + doc['destinationData'].values[0]" --> concatenated origin and destination address as a key
},
"aggregations": {
"aggr": {
"terms": {
"field": "carrierId" --> nested carrier count
}
}
}
}
}
}
}
Following is the response template that I receive.
"aggregations": {
"aggr": {
"buckets": [
{
"key": "09-2021-06-2022",
"from": 1630454400000,
"from_as_string": "09-2021",
"to": 1654041600000,
"to_as_string": "06-2022",
"doc_count": 1,
"aggr": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Dallas, TX, US|Houston, TX, US",
"doc_count": 14,
"aggr": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 10022,
"doc_count": 14
}
]
}
}
]
}
}
]
}
}
Thank you to all of you for your efforts and time. Do let me know if you discover any better way.

What is the elastic search query for nested aggregation to return buckets of count values?

I have data of individual customers in Elastic Search, whose likings of Food_Item are stored as shown below. A customer likes many "Food_Items". So its a list. I have many customers also.
I have data in the following format:
{
"id": 1,
"customerName":"John",
"likings":[
{
"Food_Item": "Pizza",
"OnAScaleOfTen": 9
},
{
"Food_Item": "Chinese",
"OnAScaleOfTen": 10
}
]
},
{
"id": 2,
"customerName":"Mary",
"likings":[
{
"Food_Item": "Burger",
"OnAScaleOfTen": 10
},
{
"Food_Item": "Chinese",
"OnAScaleOfTen": 6
}
]
}
Now if i want to bucket list the unique "Food_Items" and their corresponding count something like this in the AGGR result:
"Liking_Status": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Chinese",
"Liking Count": {
"value": 2
}
},
{
"key": "Pizza",
"Liking Count": {
"value": 1
}
},
{
"key": "Burger",
"Liking Count": {
"value": 1
}
}]}
My mapping for the index is:
{
"mappings": {
"doc": {
"properties": {
"customerName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"id": {
"type": "long"
},
"likings": {
"type":"nested",
"properties": {
"Food_Item": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"OnAScaleOfTen": {
"type": "long"
}
}
}
}
}
}
}
Can anyone help me with the Elastic Search Query. Thank you.

What you need is nested aggregation.
{
"size": 0,
"aggs": {
"buckets": { //aggregating on nested field
"nested": {
"path": "likings"
},
"aggs": {
"liking_count": {//term aggregation on the obj
"terms": {
"field": "likings.Food_Item.keyword"
}
}
}
}
}
}
Mapping:
I just mentioned that likings as nested. Apart from others are default. In this case, Food_Item is a text. Terms aggs works on keywords. So used keyword version of it from the index.
Output:
"aggregations": {
"buckets": {
"doc_count": 4,
"liking_count": { //You can name what you want here
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Chinese",
"doc_count": 2
},
{
"key": "Burger",
"doc_count": 1
},
{
"key": "Pizza",
"doc_count": 1
}
]
}
}
}

ES - Sub buckets based on values in bucket property rather than document values

I am new to ElasticSearch and I am trying to bucket objects coming from a search by hierarchical categories.
I apologize in advance for the length of the question but I wanted to give ample samples and information to make the need as clear as possible.
What I am Trying to Achieve
The problem is that categories form a hierarchy but are represented as a flat array of objects, each with a depth. I would like to generate an aggregation that would bucket by category and category depth.
Here is a simplified mapping for the document that contains only the minimum data:
{
"mappings": {
"_doc": {
"properties": {
"categoriesList": {
"properties": {
"depth": {
"type": "long"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Here is a simplified sample document:
{
"_index": "x",
"_type": "_doc",
"_id": "wY0w5GYBOIOl7fi31c_b",
"_score": 22.72073,
"_source": {
"categoriesList": [
{
"title": "category_lvl_2_2",
"depth": 2
},
{
"title": "category_lvl_2",
"depth": 2,
},
{
"title": "category_lvl_1",
"depth": 1
}
]
}
}
Now, what I am trying to achieve is to get hierarchical buckets of categories based on their depth i.e. I want to have a bucket that contains all titles of categories of depth 1 across all hits, then another bucket (or sub-bucket with the titles of just the categories of depth 2 across all hits, and so on.
Something like:
"aggregations": {
"depth": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47,
"depth_1": {
"doc_count": 47
}
}
]
}
},
{
"key": 2,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_2_1",
"doc_count": 47
},
{
"key": "category_lvl_2_2",
"doc_count": 33
}
]
}
}
]
}
}
What I have tried
At first I tried to simply create nested aggregations as follows:
"aggs": {
"depth": {
"terms": {
"field": "categoriesList.depth"
},
"aggs": {
"name": {
"terms": {
"field": "categoriesList.title.keyword"
},
}
}
}
}
This, of course, did not give what I wanted. It basically gave me buckets whose keys were by depth but that contained all titles of all categories no matter what their depth was; the contents were the same. Something like the following:
"aggregations": {
"depth": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 1,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47
},
{
"key": "category_lvl_2_1",
"doc_count": 33
},
{
"key": "category_lvl_2_2",
"doc_count": 15
}
]
}
},
{
"key": 2,
"doc_count": 47,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47
},
{
"key": "category_lvl_2_1",
"doc_count": 33
},
{
"key": "category_lvl_2_2",
"doc_count": 15
}
]
}
}
]
}
}
Then I tried to see if a filtered aggregation would work by trying to filter one sub-bucket by value of depth 1:
"aggs": {
"depth": {
"terms": {
"field": "categoriesList.depth"
},
"aggs": {
"name": {
"terms": {
"field": "categoriesList.title.keyword"
},
"aggs": {
"depth_1": {
"filter": {
"term": {
"categoriesList.depth": 1
}
}
}
}
}
}
}
}
This gave the same results as the simple aggregation query above but with an extra nesting level that served no purpose.
The question
With my current understanding of ES, what I am seeing makes sense: it goes over each document from the search and then creates buckets based on category depth but since each document has at least one category with each depth, the entire categories list is added to the bucket.
Is what I am trying to do possible with ES? I get the feeling that this will not work because I am basically trying to bucket and filter the properties used by the initial bucketing query rather than working on the document properties.
I could also bucket myself directly in code since we are getting the categories results but I wanted to know if it was possible to get this done on ES' side which would save me from modifying quite a bit of existing code we have.
Thanks!

Based on sramalingam24's comment I did the following to get it working:
Create an index with a mapping specifying nested types
I changed the mapping to tell ES that the categoriesList property was a nested object. To do so I created a new index with the following mapping:
{
"mappings": {
"_doc": {
"properties": {
"categoriesList": {
"type": "nested",
"properties": {
"depth": {
"type": "long"
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
}
Reindex into the new index
Then I reindex from the old index to the new one.
{
"source": {
"index": "old_index"
},
"dest": {
"index": "index_with_nested_mapping"
}
}
Use a nested aggregation
Then I used a nested aggregation similar to this:
{
"aggs": {
"categories": {
"nested": {
"path": "categoriesList"
},
"aggs": {
"depth": {
"terms": {
"field": "categoriesList.depth"
},
"aggs": {
"sub-categories": {
"terms": {
"field": "categoriesList.title.keyword"
}
}
}
}
}
}
}
}
Which gave me the results I desired:
{
"aggregations": {
"categories": {
"doc_count": 96,
"depth": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": 2,
"doc_count": 49,
"sub-categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_2_1",
"doc_count": 33
},
{
"key": "category_lvl_2_2",
"doc_count": 15
}
]
}
},
{
"key": 1,
"doc_count": 47,
"sub-categories": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "category_lvl_1",
"doc_count": 47
}
]
}
}
]
}
}
}
}

ElasticSearch - Aggregations with document details

I need to aggregate the following documents:
{
"title": "American Psycho",
"releaseDate": "7/06/2000",
"imdbRate": "7.6",
"casting": [
{
"name": "Christian Bale",
"category": "Actor"
},
{
"name": "Justin Theroux",
"category": "Actor"
}
]
}
{
"title": "The Dark Knight",
"releaseDate": "13/08/2008",
"imdbRate": "9.0",
"casting": [
{
"name": "Christian Bale",
"category": "Actor"
},
{
"name": "Morgan Freeman",
"category": "Actor"
}
]
}
by actor, and would like to get the following structure:
[
{"name": "Christian Bale"},
{"movies": [
{
"title": "American Psycho",
"releaseDate": "7/06/2000",
"imdbRate": "7.6"
},
{
"title": "The Dark Knight",
"releaseDate": "13/08/2008",
"imdbRate": "9.0"
}, ...
]
Beyong using a standard term aggregation based on the casting.name field, how can I retrieve the releaseDate and imdbRate of the related documents?
For each actor, I also need movies to be sorted by releaseDate asc.
Can I perform this using one single request?

As you have an array of casting objects in your documents you'll need to use the nested type in your mapping. To get the aggregations you want you need a combination of Terms Aggregations, Nested Aggregations and Reverse Nested Aggregations. Below is an example.
Create and index with the mapping:
POST /test
{
"mappings": {
"movie": {
"properties": {
"title": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"releaseDate": {
"type": "string",
"index": "not_analyzed"
},
"casting": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"fields":{
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"category": {
"type": "string",
"fields":{
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}
Index the documents:
POST /test/movie/1
{
"title": "American Psycho",
"releaseDate": "7/06/2000",
"imdbRate": "7.6",
"casting": [
{
"name": "Christian Bale",
"category": "Actor"
},
{
"name": "Justin Theroux",
"category": "Actor"
}
]
}
POST /test/movie/2
{
"title": "The Dark Knight",
"releaseDate": "13/08/2008",
"imdbRate": "9.0",
"casting": [
{
"name": "Christian Bale",
"category": "Actor"
},
{
"name": "Morgan Freeman",
"category": "Actor"
}
]
}
And finally search:
POST /test/movie/_search?search_type=count
{
"aggs": {
"nested_path": {
"nested": {
"path": "casting"
},
"aggs": {
"actor_name": {
"terms": {
"field": "casting.name.raw"
},
"aggs": {
"movies": {
"reverse_nested": {},
"aggs": {
"movie_title": {
"terms": {
"field": "title.raw"
},
"aggs": {
"release_date": {
"terms": {
"field": "releaseDate"
}
},
"imdbRate_date": {
"terms": {
"field": "imdbRate"
}
}
}
}
}
}
}
}
}
}
}
}
The response for Christian Bale is:
{
"key": "Christian Bale",
"doc_count": 2,
"movies": {
"doc_count": 2,
"movie_title": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "American Psycho",
"doc_count": 1,
"release_date": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "7/06/2000",
"doc_count": 1
}
]
},
"imdbRate_date": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "7.6",
"doc_count": 1
}
]
}
},
{
"key": "The Dark Knight",
"doc_count": 1,
"release_date": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "13/08/2008",
"doc_count": 1
}
]
},
"imdbRate_date": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "9.0",
"doc_count": 1
}
]
}
}
]
}
}
}

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

ElasticSearch aggregations - to lowercase or not to lowercase - elasticsearch

Related

Elastic search terms aggregation for getting filter options

elastic search nested sub aggregations

What is the elastic search query for nested aggregation to return buckets of count values?

ES - Sub buckets based on values in bucket property rather than document values

ElasticSearch - Aggregations with document details

Categories

Resources