Elasticsearch aggregation to retrieve array of values associated with another value - elasticsearch

I am working with an Elasticsearch index with data like this:
"_source": {
"article_number": "123456",
"title": "Example item #1",
"attributes": [
{
"key": "Type",
"value": "Bag"
},
{
"key": "Color",
"value": "Grey"
}
]
},
"_source": {
"article_number": "654321",
"title": "Example item #2",
"attributes": [
{
"key": "Type",
"value": "Bag"
},
{
"key": "Color",
"value": "Red"
}
]
}
The goal is to dynamically generate search inputs in a page where there is one search input for each unique value of attributes.key and within that input one value for each corresponding value of attributes.value. So in this case I would want to render a "Type" input offering only the value "Bag" and a "Color" input offering the values "Grey" and "Red."
I am trying to accomplish this with an aggregation that will give me a unique set of all values of attributes.key along with an array of all the values of attributes.value that are associated with each key. An example of a result that would fit what I am hoping for would be this:
[
{
"key": "Type",
"values": [{
"name": "Bag",
"doc_count": 2
}]
},
{
"key": "Color",
"values": [{
"name": "Grey",
"doc_count": 1
},
{
"name": "Red",
"doc_count": 1
}]
}
]
I have tried nested and reverse nested aggregations, as well as composite aggregations, but so far without success.

Assuming your index mapping looks like this:
PUT attrs
{
"mappings": {
"properties": {
"attributes": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
}
}
you can achieve the desired results with the following combination of a nested terms aggregation and its sub-aggregation:
POST attrs/_search
{
"size": 0,
"aggs": {
"nested_context": {
"nested": {
"path": "attributes"
},
"aggs": {
"by_keys": {
"terms": {
"field": "attributes.key",
"size": 10
},
"aggs": {
"by_values": {
"terms": {
"field": "attributes.value",
"size": 10
}
}
}
}
}
}
}
}
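The response nests the `by_values` buckets under each `by_keys` bucket, so a small client-side step is still needed to get the exact shape from the question. A minimal sketch, assuming the aggregation names used above (`nested_context`, `by_keys`, `by_values`); the function name `reshape_attribute_buckets` and the sample response are hand-written for illustration, not actual Elasticsearch output:

```python
def reshape_attribute_buckets(response):
    """Turn the nested terms aggregation response into
    [{"key": ..., "values": [{"name": ..., "doc_count": ...}]}]."""
    key_buckets = response["aggregations"]["nested_context"]["by_keys"]["buckets"]
    return [
        {
            "key": key_bucket["key"],
            "values": [
                {"name": value_bucket["key"], "doc_count": value_bucket["doc_count"]}
                for value_bucket in key_bucket["by_values"]["buckets"]
            ],
        }
        for key_bucket in key_buckets
    ]


# Hand-written sample shaped like a response for the two example documents.
sample_response = {
    "aggregations": {
        "nested_context": {
            "doc_count": 4,
            "by_keys": {
                "buckets": [
                    {
                        "key": "Color",
                        "doc_count": 2,
                        "by_values": {
                            "buckets": [
                                {"key": "Grey", "doc_count": 1},
                                {"key": "Red", "doc_count": 1},
                            ]
                        },
                    },
                    {
                        "key": "Type",
                        "doc_count": 2,
                        "by_values": {"buckets": [{"key": "Bag", "doc_count": 2}]},
                    },
                ]
            },
        }
    }
}

facets = reshape_attribute_buckets(sample_response)
```

Note that inside a nested aggregation `doc_count` counts the nested attribute objects, not the parent articles, and that `"size": 10` caps how many keys and values come back.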

Related

Doing a Range Query over particular Nested Document

I have a document structure like the one below. Both documents contain nested documents called interactionInfo. I need to get only the documents that have an entry with the title duration whose value is greater than 60.
[
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": "11"
},
{
"title": "timetaken",
"value": "9"
},
{
"title": "talk_time",
"value": "145"
}
]
},
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": "120"
},
{
"title": "timetaken",
"value": "9"
},
{
"title": "talk_time",
"value": "60"
}
]
}
]
Is it possible to get only the documents that have an entry with title: duration whose value is greater than 60? The value property in the nested document is mapped as text and keyword.
There are a few basic mistakes in your approach. To use a range query (i.e. to find documents whose value is greater than 60), you need to store the values as integers.
Also, please refer to the official guide, which has a similar example.
Let me show you a step-by-step example on how to do it.
Index def
{
"mappings" :{
"properties" :{
"interactionInfo" :{
"type" : "nested"
},
"key" : {
"type" : "keyword"
}
}
}
}
Index sample docs (note that the values are not wrapped in double quotes, so they are stored as integers)
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": 120
},
{
"title": "timetaken",
"value": 9
},
{
"title": "talk_time",
"value": 60
}
]
}
{
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": 11
},
{
"title": "timetaken",
"value": 9
},
{
"title": "talk_time",
"value": 145
}
]
}
Search query
{
"query": {
"nested": {
"path": "interactionInfo",
"query": {
"bool": {
"must": [
{
"match": {
"interactionInfo.title": "duration"
}
},
{
"range": {
"interactionInfo.value": {
"gt": 60
}
}
}
]
}
}
}
}
}
And your expected search result
"hits": [
{
"_index": "nestedsoint",
"_type": "_doc",
"_id": "2",
"_score": 2.0296195,
"_source": {
"key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
"interactionInfo": [
{
"title": "duration",
"value": 120
},
{
"title": "timetaken",
"value": 9
},
{
"title": "talk_time",
"value": 60
}
]
}
}
]
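The reason the original string values do not work is that a range comparison on string terms is lexicographic rather than numeric. A minimal sketch of the pitfall and of converting the documents before indexing; the helper name `to_numeric_values` is hypothetical, and it assumes every `value` holds a whole number:

```python
def to_numeric_values(doc):
    """Return a copy of the document with interactionInfo values cast to int."""
    return {
        **doc,
        "interactionInfo": [
            {**info, "value": int(info["value"])}
            for info in doc["interactionInfo"]
        ],
    }


# Strings compare character by character, so "120" sorts before "60".
string_comparison = "120" > "60"
numeric_comparison = 120 > 60

original = {
    "key": "f07ff9ba-36e4-482a-9c1c-d888e89f926e",
    "interactionInfo": [
        {"title": "duration", "value": "120"},
        {"title": "timetaken", "value": "9"},
        {"title": "talk_time", "value": "60"},
    ],
}
converted = to_numeric_values(original)
```

If the field is already mapped as text in an existing index, the mapping type cannot be changed in place, so the converted documents would need to go into a new index.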

Is there any solution with elasticsearch parent-child join

I have an Elasticsearch mapping like the following:
PUT /test
{
"mappings": {
"doc": {
"properties": {
"status": {
"type": "keyword"
},
"counting": {
"type": "integer"
},
"join": {
"type": "join",
"relations": {
"vsim": ["pool", "package"]
}
},
"poolId": {
"type": "keyword"
},
"packageId": {
"type": "keyword"
},
"countries": {
"type": "keyword"
},
"vId": {
"type": "keyword"
}
}
}
}}
Then add data:
// add vsim
PUT /test/doc/doc1
{"counting":6, "join": {"name": "vsim"}, "content": "1", "status": "disabled"}
PUT /test/doc/doc2
{"counting":5,"join": {"name": "vsim"}, "content": "2", "status": "disabled"}
PUT /test/doc/doc3
{"counting":5,"join": {"name": "vsim"}, "content": "2", "status": "enabled"}
// add package
PUT /test/doc/ner2?routing=doc2
{"join": {"name": "package", "parent": "doc2"}, "countries":["CN", "UK"]}
PUT test/doc/ner12?routing=doc1
{"join": {"name": "package", "parent": "doc1"}, "countries":["CN", "US"]}
PUT /test/doc/ner11?routing=doc1
{"join":{"name": "package", "parent": "doc1"}, "countries":["US", "KR"]}
PUT /test/doc/ner13?routing=doc3
{"join":{"name": "package", "parent": "doc3"}, "countries":["UK", "AU"]}
// add pool
PUT /test/doc/ner21?routing=doc1
{"join": {"name": "pool", "parent": "doc1"}, "poolId": "MER"}
PUT /test/doc/ner22?routing=doc2
{"join": {"name": "pool", "parent": "doc2"}, "poolId": "MER"}
PUT /test/doc/ner23?routing=doc2
{"join": {"name": "pool", "parent": "doc2"}, "poolId": "NER"}
Then I want to count the counting field grouped by status (vsim), poolId (pool) and countries (package). The expected result looks like:
disabled-MER-CN: 3
disabled-MER-US: 3
enabled-MR-CN: 1
... and so on.
I'm new to Elasticsearch, and I have read documentation such as
https://www.elastic.co/guide/en/elasticsearch/reference/current/joining-queries.html
and
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-children-aggregation.html
but I still have no idea how to implement this aggregation query. Any suggestions would be appreciated, thanks!
If I understood your document structure correctly, the pool and package types are on the same level (they are siblings), and I wasn't able to achieve exactly your expected results. I also highly doubt that it's possible with those types being siblings.
However, it's still possible to slice per one field in your doc (status) and later separately slice both by poolId and countries with a query like this:
{
"aggs": {
"status-aggs": {
"terms": {
"field": "status",
"size": 10
},
"aggs": {
"to-pool": {
"children": {
"type": "pool"
},
"aggs": {
"top-poolid": {
"terms": {
"field": "poolId",
"size": 10
}
}
}
},
"to-package": {
"children": {
"type": "package"
},
"aggs": {
"top-countries": {
"terms": {
"field": "countries",
"size": 10
}
}
}
}
}
}
}
}
with a response from Elasticsearch like this (I've omitted some part of json for readability):
{
"status-aggs": {
"buckets": [
{
"key": "disabled",
"doc_count": 2,
"to-pool": {
"doc_count": 3,
"top-poolid": {
"buckets": [
{
"key": "MER",
"doc_count": 2
},
{
"key": "NER",
"doc_count": 1
}
]
}
},
"to-package": {
"doc_count": 3,
"top-countries": {
"buckets": [
{
"key": "CN",
"doc_count": 2
},
{
"key": "US",
"doc_count": 2
},
{
"key": "KR",
"doc_count": 1
},
{
"key": "UK",
"doc_count": 1
}
]
}
}
},
{
"key": "enabled",
"doc_count": 1,
"to-pool": {
"doc_count": 0,
"top-poolid": {
"buckets": []
}
},
"to-package": {
"doc_count": 1,
"top-countries": {
"buckets": [
{
"key": "AU",
"doc_count": 1
},
{
"key": "UK",
"doc_count": 1
}
]
}
}
}
]
}
}
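The two child slices can then be read out of the response on the client. A minimal sketch, assuming the aggregation names from the query above; the function name `flatten_status_counts` and the label format are illustrative choices, and the sample is an abbreviated copy of the response shown. It does not produce the three-way status-poolId-country combinations from the question, only the two separate slices:

```python
def flatten_status_counts(aggregations):
    """Flatten the status -> pool / package buckets into labelled counts."""
    counts = {}
    for status_bucket in aggregations["status-aggs"]["buckets"]:
        status = status_bucket["key"]
        for pool_bucket in status_bucket["to-pool"]["top-poolid"]["buckets"]:
            counts[f"{status}-pool-{pool_bucket['key']}"] = pool_bucket["doc_count"]
        for country_bucket in status_bucket["to-package"]["top-countries"]["buckets"]:
            counts[f"{status}-country-{country_bucket['key']}"] = country_bucket["doc_count"]
    return counts


# Abbreviated copy of the response above.
sample_aggregations = {
    "status-aggs": {
        "buckets": [
            {
                "key": "disabled",
                "to-pool": {"top-poolid": {"buckets": [
                    {"key": "MER", "doc_count": 2},
                    {"key": "NER", "doc_count": 1},
                ]}},
                "to-package": {"top-countries": {"buckets": [
                    {"key": "CN", "doc_count": 2},
                    {"key": "US", "doc_count": 2},
                ]}},
            },
            {
                "key": "enabled",
                "to-pool": {"top-poolid": {"buckets": []}},
                "to-package": {"top-countries": {"buckets": [
                    {"key": "AU", "doc_count": 1},
                ]}},
            },
        ]
    }
}

counts = flatten_status_counts(sample_aggregations)
```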

How to do aggregation on nested objects - Elasticsearch

I'm pretty new to Elasticsearch so please bear with me.
This is part of my document in ES.
{
"source": {
"detail": {
"attribute": {
"Size": ["32 Gb",4],
"Type": ["Tools",4],
"Brand": ["Sandisk",4],
"Color": ["Black",4],
"Model": ["Sdcz36-032g-b35",4],
"Manufacturer": ["Sandisk",4]
}
},
"title": {
"list": [
"Sandisk Cruzer 32gb Usb 32 Gb Flash Drive , Black - Sdcz36-032g"
]
}
}
}
So what I want to achieve is to find the best three or top three hits of the attribute object. For example, if I do a search for "sandisk", I want to get three attributes like ["Size", "Color", "Model"] or whatever attributes based on the top hits aggregation.
So I wrote a query like this:
{
"size": 0,
"aggs": {
"categoryList": {
"filter": {
"bool": {
"filter": [
{
"term": {
"title.list": "sandisk"
}
}
]
}
},
"aggs": {
"results": {
"terms": {
"field": "detail.attribute",
"size": 3
}
}
}
}
}
}
But it doesn't seem to work. How do I fix this? Any hints would be much appreciated.
This is the _mappings. It is not the complete one, but I guess this would suffice.
{
"catalog2_0": {
"mappings": {
"product": {
"dynamic": "strict",
"dynamic_templates": [
{
"attributes": {
"path_match": "detail.attribute.*",
"mapping": {
"type": "text"
}
}
}
],
"properties": {
"detail": {
"properties": {
"attMaxScore": {
"type": "scaled_float",
"scaling_factor": 100
},
"attribute": {
"dynamic": "true",
"properties": {
"Brand": {
"type": "text"
},
"Color": {
"type": "text"
},
"MPN": {
"type": "text"
},
"Manufacturer": {
"type": "text"
},
"Model": {
"type": "text"
},
"Operating System": {
"type": "text"
},
"Size": {
"type": "text"
},
"Type": {
"type": "text"
}
}
},
"description": {
"type": "text"
},
"feature": {
"type": "text"
},
"tag": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
},
"title": {
"properties": {
"en": {
"type": "text"
}
}
}
}
}
}
}
}
According to the documentation, you can't aggregate on a field that has the text datatype (unless fielddata is enabled on it, which is off by default). Such fields should have the keyword datatype.
You also can't aggregate on the detail.attribute field in that way. The detail.attribute field doesn't store any value: it is an object datatype, not a nested one as you have written in the question, which means it is a container for other fields like Size, Brand, etc. So you should aggregate against a field such as detail.attribute.Size, provided it has the keyword datatype.
Another likely error is that you are running a term query on a text field (what is the datatype of the title.list field?). A term query is meant for fields with the keyword datatype, while a match query is used to query text fields.
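To see which fields in a mapping could be used in a terms aggregation as they stand, one option is to walk the mapping and collect the keyword fields. A minimal sketch, assuming the mapping has already been fetched as a Python dict; the function name `keyword_field_paths` is hypothetical, and the sample is a shortened excerpt of the mapping in the question:

```python
def keyword_field_paths(properties, prefix=""):
    """Recursively collect the full paths of all keyword fields."""
    paths = []
    for name, definition in properties.items():
        path = f"{prefix}{name}"
        if definition.get("type") == "keyword":
            paths.append(path)
        # Multi-fields such as "raw" live under "fields".
        for sub_name, sub_definition in definition.get("fields", {}).items():
            if sub_definition.get("type") == "keyword":
                paths.append(f"{path}.{sub_name}")
        # Object and nested fields hold their children under "properties".
        if "properties" in definition:
            paths.extend(keyword_field_paths(definition["properties"], f"{path}."))
    return paths


# Shortened excerpt of the "product" mapping from the question.
product_properties = {
    "detail": {
        "properties": {
            "attribute": {
                "dynamic": "true",
                "properties": {
                    "Brand": {"type": "text"},
                    "Size": {"type": "text"},
                },
            },
            "tag": {
                "type": "text",
                "fields": {"raw": {"type": "keyword"}},
            },
        }
    },
    "title": {"properties": {"en": {"type": "text"}}},
}

aggregatable = keyword_field_paths(product_properties)
```

For this excerpt only `detail.tag.raw` qualifies, which matches the point above: none of the `detail.attribute.*` fields is a keyword field.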
Here is what I have used for a nested aggs query, minus the actual value names.
The actual field is a keyword, which as already mentioned is required, that is part of a nested JSON object:
"STATUS_ID": {
"type": "keyword",
"index": "not_analyzed",
"doc_values": true
},
Query
GET index name/_search?size=200
{
"aggs": {
"panels": {
"nested": {
"path": "nested path"
},
"aggs": {
"statusCodes": {
"terms": {
"field": "nested path.STATUS.STATUS_ID",
"size": 50
}
}
}
}
}
}
Result
"aggregations": {
"panels": {
"doc_count": 12108963,
"statusCodes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O",
"doc_count": 5912218
},
{
"key": "C",
"doc_count": 401586
},
{
"key": "E",
"doc_count": 135628
},
{
"key": "Y",
"doc_count": 3742
},
{
"key": "N",
"doc_count": 1012
},
{
"key": "L",
"doc_count": 719
},
{
"key": "R",
"doc_count": 243
},
{
"key": "H",
"doc_count": 86
}
]
}
}

Further filtering of aggregations

I have a question regarding aggregation in Elasticsearch. I have a document like the following:
{
"_index": "products",
"_type": "product",
"_id": "ID-12345",
"_score": 1,
"_source": {
"created_at": "2017-08-04T17:56:44.592Z",
"updated_at": "2017-08-04T17:56:44.592Z",
"product_information": {
"sku": "12345",
"name": "Product Name",
"price": 25,
"brand": "Brand Name",
"url": "URL"
},
"product_detail": {
"description": "Product description text here.",
"string_facets": [
{
"facet_name": "Colour",
"facet_value": "Grey"
},
{
"facet_name": "Category",
"facet_value": "Linen"
},
{
"facet_name": "Category",
"facet_value": "Throws & Blanket"
},
{
"facet_name": "Keyword",
"facet_value": "Contemporary"
},
{
"facet_name": "Keyword",
"facet_value": "Sophisticated"
}
]
}
}
}
I am storing product information such as Colour, Material, Category and Keywords within the product_detail.string_facets field. I'd like to use this for aggregation to get Colour/Material/Category/Keyword suggestions, but as separate buckets, i.e. a separate bucket for each of the string_facet types defined in product_detail.string_facets.facet_name.
This is the query I have at the moment which is returning data, but not as I expect. First the query (this was just to try and get Colours):
{
"from": 0,
"size": 12,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "Rug",
"fields": ["product_information.name", "product_detail.string_facets.facet_value"]
}
},
{
"multi_match": {
"query": "Blue",
"fields": ["product_information.name", "product_detail.string_facets.facet_name"]
}
}
],
"minimum_should_match": "100%"
}
},
"aggs": {
"suggestions": {
"filter": { "term": { "product_detail.string_facets.facet_name.keyword": "Colour" }},
"aggs": {
"colours": {
"terms": {
"field": "product_detail.string_facets.facet_value.keyword",
"size": 10
}
}
}
}
}
}
This is giving me output like the following:
"aggregations": {
"suggestions": {
"doc_count": 21,
"colours": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 23,
"buckets": [
{
"key": "Rug",
"doc_count": 21
},
{
"key": "Blue",
"doc_count": 18
},
{
"key": "Bold",
"doc_count": 7
},
{
"key": "Modern",
"doc_count": 6
},
{
"key": "Multi-Coloured",
"doc_count": 5
},
{
"key": "Contemporary",
"doc_count": 4
},
{
"key": "Traditional",
"doc_count": 4
},
{
"key": "White",
"doc_count": 4
},
{
"key": "Luxurious",
"doc_count": 3
},
{
"key": "Minimal",
"doc_count": 3
}
]
}
}
}
It has given me the values for all facet_name entries rather than only those with facet_name Colour, as I thought it would.
Any help would be greatly appreciated. Elasticsearch seems very powerful but the documentation is quite daunting!
You did not show what the mapping looks like, but I suppose that the product_detail.string_facets field is just an inner object field, and that is the reason why you get this kind of result. With this type of mapping, Elasticsearch flattens the array into a simple list of field names and values. In your case it becomes:
{
"product_detail.string_facets.facet_name": ["Colour", "Category", "Keyword"],
"product_detail.string_facets.facet_value": ["Grey", "Linen", "Throws & Blanket", "Contemporary", "Sophisticated"]
}
As you can see, based on this structure, Elasticsearch cannot know how to aggregate the data.
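The effect of the flattening can be modelled in a few lines. This is only an illustration of the idea, not how Elasticsearch is implemented; the function name `flatten_object_array` is made up for the example:

```python
def flatten_object_array(objects, prefix):
    """Model how an array of inner objects collapses into one
    multi-valued list per field, losing the name/value pairing."""
    flattened = {}
    for obj in objects:
        for field, value in obj.items():
            values = flattened.setdefault(f"{prefix}.{field}", [])
            if value not in values:
                values.append(value)
    return flattened


string_facets = [
    {"facet_name": "Colour", "facet_value": "Grey"},
    {"facet_name": "Category", "facet_value": "Linen"},
    {"facet_name": "Category", "facet_value": "Throws & Blanket"},
    {"facet_name": "Keyword", "facet_value": "Contemporary"},
    {"facet_name": "Keyword", "facet_value": "Sophisticated"},
]

flat = flatten_object_array(string_facets, "product_detail.string_facets")

# The filter on facet_name "Colour" matches the whole document...
matches_colour = "Colour" in flat["product_detail.string_facets.facet_name"]
# ...so every facet_value of that document ends up in the "colours" buckets.
bucketed_values = flat["product_detail.string_facets.facet_value"] if matches_colour else []
```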
To make it work, the product_detail.string_facets field should be of type nested. The mapping for string_facets should be similar to this (note "type": "nested"):
"string_facets": {
"type": "nested",
"properties": {
"facet_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"facet_value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
Now I index the following document:
{
"created_at": "2017-08-04T17:56:44.592Z",
"updated_at": "2017-08-04T17:56:44.592Z",
"product_information": {
"sku": "12345",
"name": "Rug",
"price": 25,
"brand": "Brand Name",
"url": "URL"
},
"product_detail": {
"description": "Product description text here.",
"string_facets": [
{
"facet_name": "Colour",
"facet_value": "Blue"
},
{
"facet_name": "Colour",
"facet_value": "Red"
},
{
"facet_name": "Category",
"facet_value": "Throws & Blanket"
},
{
"facet_name": "Keyword",
"facet_value": "Contemporary"
}
]
}
}
Now, to get aggregation of colour suggestions as separate buckets, you can try this query (I simplified the bool query for the need of my document):
{
"from": 0,
"size": 12,
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "Rug",
"fields": ["product_information.name", "product_detail.string_facets.facet_value"]
}
}
]
}
},
"aggs": {
"facets": {
"nested" : {
"path" : "product_detail.string_facets"
},
"aggs": {
"suggestions": {
"filter": { "term": { "product_detail.string_facets.facet_name.keyword": "Colour" }},
"aggs": {
"colours": {
"terms": {
"field": "product_detail.string_facets.facet_value.keyword",
"size": 10
}
}
}
}
}
}
}
}
And result:
{
...,
"hits": {
...
},
"aggregations": {
"facets": {
"doc_count": 5,
"suggestions": {
"doc_count": 2,
"colours": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Blue",
"doc_count": 1
},
{
"key": "Red",
"doc_count": 1
}
]
}
}
}
}
}
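The same pattern extends to one bucket set per facet type (Colour, Category, Keyword and so on) by adding one filter sub-aggregation per name under the nested aggregation. A minimal sketch that builds the `aggs` part of the request body as a Python dict; the function name `build_facet_aggs` and the generated aggregation names (the lower-cased facet name and `values`) are assumptions made for this example:

```python
def build_facet_aggs(facet_names, path="product_detail.string_facets", size=10):
    """Build a nested aggregation with one filtered terms aggregation per facet name."""
    return {
        "facets": {
            "nested": {"path": path},
            "aggs": {
                name.lower(): {
                    "filter": {"term": {f"{path}.facet_name.keyword": name}},
                    "aggs": {
                        "values": {
                            "terms": {
                                "field": f"{path}.facet_value.keyword",
                                "size": size,
                            }
                        }
                    },
                }
                for name in facet_names
            },
        }
    }


aggs = build_facet_aggs(["Colour", "Category", "Keyword"])
```

The resulting dict can be placed under the `aggs` key of the search body in place of the single `suggestions` filter used above.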

ElasticSearch aggregations - sorting values

In this sample I have some cars with an unknown number of facets on them.
When doing aggregations I would like the values in the aggregations to be sorted alphabetically. However, some of the facets are integers, and that will produce these aggregations
Color
blue (2)
red (1)
Top speed
100 (1)
120 (1)
90 (1)
Year
2015 (1)
As you can see the topspeed facet is sorted wrong - 90 should be first.
Sample data
PUT /my_index
{
"mappings": {
"product": {
"properties": {
"displayname" :{"type": "string"},
"facets": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"value": { "type": "string" },
"datatype": { "type": "string" }
}
}
}
}
}
}
PUT /my_index/product/1
{
"displayname": "HONDA",
"facets": [
{
"name": "topspeed",
"value": "100",
"datatype": "integer"
},
{
"name": "color",
"value": "Blue",
"datatype": "string"
}
]
}
PUT /my_index/product/2
{
"displayname": "WV",
"facets": [
{
"name": "topspeed",
"value": "90",
"datatype": "integer"
},
{
"name": "color",
"value": "Red",
"datatype": "string"
}
]
}
PUT /my_index/product/3
{
"displayname": "FORD",
"facets": [
{
"name": "topspeed",
"value": "120",
"datatype": "integer"
},
{
"name": "color",
"value": "Blue",
"datatype": "string"
},
{
"name": "year",
"value": "2015",
"datatype": "integer"
}
]
}
GET my_index/product/1
GET /my_index/product/_search
{
"size": 0,
"aggs": {
"facets": {
"nested": {
"path": "facets"
},
"aggs": {
"nested_facets": {
"terms": {
"field": "facets.name"
},
"aggs": {
"facet_value": {
"terms": {
"field": "facets.value",
"size": 0,
"order": {
"_term": "asc"
}
}
}
}
}
}
}
}
}
As you can see each facet has a datatype (integer or string).
Any ideas how I can get the sorting of values to be like this:
Color
blue (2)
red (1)
Top speed
90 (1)
100 (1)
120 (1)
Year
2015 (1)
I've played around with adding a new field to the facet, "sortable_value", where I pad the integer values like this "00000000090" at index time, but I could not get the aggregations to work.
Any help is appreciated
That's an uncommon way of representing your data.
I'd suggest changing your data structure to the following:
{
"displayname": "FORD",
"facets": {
"topspeed": 120,
"color": "Blue",
"year": 2015
}
}
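Existing documents could be converted to this shape before reindexing. A minimal sketch, assuming each facet name appears at most once per product and that `datatype` is either "integer" or "string" as in the sample data; the function name `to_flat_facets` is hypothetical:

```python
def to_flat_facets(product):
    """Convert the list of {name, value, datatype} facets into a flat object
    with real integer values where the datatype says so."""
    facets = {}
    for facet in product["facets"]:
        value = facet["value"]
        if facet["datatype"] == "integer":
            value = int(value)
        # Assumption: a repeated facet name would overwrite the earlier value.
        facets[facet["name"]] = value
    return {"displayname": product["displayname"], "facets": facets}


ford = {
    "displayname": "FORD",
    "facets": [
        {"name": "topspeed", "value": "120", "datatype": "integer"},
        {"name": "color", "value": "Blue", "datatype": "string"},
        {"name": "year", "value": "2015", "datatype": "integer"},
    ],
}

flat_ford = to_flat_facets(ford)
```

With `topspeed` indexed as a numeric field, a terms aggregation on it ordered by key should sort 90 before 100 and 120, because the keys are compared as numbers rather than as strings.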

Resources