ElasticSearch aggregations - sorting values

In this sample I have some cars with an unknown number of facets on them.
When doing aggregations I would like the values in each aggregation to be sorted alphabetically. However, some of the facets are integers, which produces aggregations like these:
Color
blue (2)
red (1)
Top speed
100 (1)
120 (1)
90 (1)
Year
2015 (1)
As you can see, the topspeed facet is sorted incorrectly: 90 should come first.
Sample data
PUT /my_index
{
"mappings": {
"product": {
"properties": {
"displayname" :{"type": "string"},
"facets": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"value": { "type": "string" },
"datatype": { "type": "string" }
}
}
}
}
}
}
PUT /my_index/product/1
{
"displayname": "HONDA",
"facets": [
{
"name": "topspeed",
"value": "100",
"datatype": "integer"
},
{
"name": "color",
"value": "Blue",
"datatype": "string"
}
]
}
PUT /my_index/product/2
{
"displayname": "WV",
"facets": [
{
"name": "topspeed",
"value": "90",
"datatype": "integer"
},
{
"name": "color",
"value": "Red",
"datatype": "string"
}
]
}
PUT /my_index/product/3
{
"displayname": "FORD",
"facets": [
{
"name": "topspeed",
"value": "120",
"datatype": "integer"
},
{
"name": "color",
"value": "Blue",
"datatype": "string"
},
{
"name": "year",
"value": "2015",
"datatype": "integer"
}
]
}
GET my_index/product/1
GET /my_index/product/_search
{
"size": 0,
"aggs": {
"facets": {
"nested": {
"path": "facets"
},
"aggs": {
"nested_facets": {
"terms": {
"field": "facets.name"
},
"aggs": {
"facet_value": {
"terms": {
"field": "facets.value",
"size": 0,
"order": {
"_term": "asc"
}
}
}
}
}
}
}
}
}
As you can see each facet has a datatype (integer or string).
Any ideas how I can get the sorting of values to be like this:
Color
blue (2)
red (1)
Top speed
90(1)
100 (1)
120 (1)
Year
2015 (1)
I've played around with adding a new field to the facet, "sortable_value", where I pad the integer values at index time like this: "00000000090". However, I could not get the aggregations to work.
Any help is appreciated
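For illustration, the padding idea described above could look roughly like this in Python. It is a minimal sketch and has not been run against a cluster: the helper name `sortable_value` and the width of 11 characters are assumptions (the width is chosen to match the "00000000090" example), and it only handles non-negative integers.

```python
PAD_WIDTH = 11  # assumption: matches the 11-character example "00000000090"


def sortable_value(value: str, datatype: str) -> str:
    """Return a string whose lexicographic order matches the intended order."""
    if datatype == "integer":
        # Zero-pad so that "90" sorts before "100" when compared as strings.
        return str(int(value)).zfill(PAD_WIDTH)
    # Lowercase plain strings so that "Blue" and "blue" sort together.
    return value.lower()


speeds = ["100", "120", "90"]
ordered = sorted(speeds, key=lambda v: sortable_value(v, "integer"))
print(ordered)  # ['90', '100', '120']
```

The padded string would be indexed next to the original value, so the original can still be shown to the user while the padded one drives the ordering.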

That's an uncommon way of representing your data.
I'd suggest changing your data structure to the following, so that numeric facets are indexed as numbers and sort numerically:
{
"displayname": "FORD",
"facets": {
"topspeed": 120,
"color": "Blue",
"year": 2015
}
}
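A minimal Python sketch of how existing documents could be converted to this structure before reindexing. The function name `flatten_facets` is hypothetical, and the sketch assumes the only datatypes are "integer" and "string", as in the sample data.

```python
def flatten_facets(doc: dict) -> dict:
    """Convert the name/value/datatype list into a flat, typed object."""
    flat = {}
    for facet in doc.get("facets", []):
        value = facet["value"]
        if facet["datatype"] == "integer":
            # Store numbers as numbers so they sort numerically.
            value = int(value)
        flat[facet["name"]] = value
    return {"displayname": doc["displayname"], "facets": flat}


ford = {
    "displayname": "FORD",
    "facets": [
        {"name": "topspeed", "value": "120", "datatype": "integer"},
        {"name": "color", "value": "Blue", "datatype": "string"},
        {"name": "year", "value": "2015", "datatype": "integer"},
    ],
}
converted = flatten_facets(ford)
print(converted)
```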


Elasticsearch aggregation to retrieve array of values associated with another value

I am working with an Elasticsearch index with data like this:
"_source": {
"article_number": "123456",
"title": "Example item #1",
"attributes": [
{
"key": "Type",
"value": "Bag"
},
{
"key": "Color",
"value": "Grey"
}
]
},
"_source": {
"article_number": "654321",
"title": "Example item #2",
"attributes": [
{
"key": "Type",
"value": "Bag"
},
{
"key": "Color",
"value": "Red"
}
]
}
The goal is to dynamically generate search inputs in a page where there is one search input for each unique value of attributes.key and within that input one value for each corresponding value of attributes.value. So in this case I would want to render a "Type" input offering only the value "Bag" and a "Color" input offering the values "Grey" and "Red."
I am trying to accomplish this with an aggregation that will give me a unique set of all values of attributes.key along with an array of all the values of attributes.value that are associated with each key. An example of a result that would fit what I am hoping for would be this:
[
{
"key": "Type",
"values": [{
"name": "Bag",
"doc_count": 2
}]
},
{
"key": "Color",
"values": [{
"name": "Grey",
"doc_count": 1
},
{
"name": "Red",
"doc_count": 1
}]
}
]
I have tried nested and reverse nested aggregations, as well as composite aggregations, but so far without success.
Assuming your index mapping looks like this:
PUT attrs
{
"mappings": {
"properties": {
"attributes": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
}
}
}
}
you can achieve the desired results with the following combination of a nested terms aggregation and its sub-aggregation:
POST attrs/_search
{
"size": 0,
"aggs": {
"nested_context": {
"nested": {
"path": "attributes"
},
"aggs": {
"by_keys": {
"terms": {
"field": "attributes.key",
"size": 10
},
"aggs": {
"by_values": {
"terms": {
"field": "attributes.value",
"size": 10
}
}
}
}
}
}
}
}
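The response of this aggregation is a tree of buckets rather than the exact shape asked for in the question, so a small client-side step is still needed. Below is a minimal Python sketch of that step. The `reshape` helper is hypothetical, and `sample_response` is hand-written to follow the usual bucket layout for the aggregation names used above; it is not output captured from a real cluster.

```python
import json


def reshape(response: dict) -> list:
    """Turn by_keys/by_values buckets into [{key, values: [{name, doc_count}]}]."""
    buckets = response["aggregations"]["nested_context"]["by_keys"]["buckets"]
    return [
        {
            "key": key_bucket["key"],
            "values": [
                {"name": v["key"], "doc_count": v["doc_count"]}
                for v in key_bucket["by_values"]["buckets"]
            ],
        }
        for key_bucket in buckets
    ]


# Hand-written response for the two example items (an assumption, not real output).
sample_response = {
    "aggregations": {
        "nested_context": {
            "doc_count": 4,
            "by_keys": {
                "buckets": [
                    {"key": "Color", "doc_count": 2, "by_values": {"buckets": [
                        {"key": "Grey", "doc_count": 1},
                        {"key": "Red", "doc_count": 1},
                    ]}},
                    {"key": "Type", "doc_count": 2, "by_values": {"buckets": [
                        {"key": "Bag", "doc_count": 2},
                    ]}},
                ]
            },
        }
    }
}

result = reshape(sample_response)
print(json.dumps(result, indent=2))
```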

How to do aggregation on nested objects - Elasticsearch

I'm pretty new to Elasticsearch so please bear with me.
This is part of my document in ES.
{
"source": {
"detail": {
"attribute": {
"Size": ["32 Gb",4],
"Type": ["Tools",4],
"Brand": ["Sandisk",4],
"Color": ["Black",4],
"Model": ["Sdcz36-032g-b35",4],
"Manufacturer": ["Sandisk",4]
}
},
"title": {
"list": [
"Sandisk Cruzer 32gb Usb 32 Gb Flash Drive , Black - Sdcz36-032g"
]
}
}
}
So what I want to achieve is to find the best three or top three hits of the attribute object. For example, if I do a search for "sandisk", I want to get three attributes like ["Size", "Color", "Model"] or whatever attributes based on the top hits aggregation.
So I ran a query like this:
{
"size": 0,
"aggs": {
"categoryList": {
"filter": {
"bool": {
"filter": [
{
"term": {
"title.list": "sandisk"
}
}
]
}
},
"aggs": {
"results": {
"terms": {
"field": "detail.attribute",
"size": 3
}
}
}
}
}
}
But it seems to be not working. How do I fix this? Any hints would be much appreciated.
This is the _mappings. It is not the complete one, but I guess this would suffice.
{
"catalog2_0": {
"mappings": {
"product": {
"dynamic": "strict",
"dynamic_templates": [
{
"attributes": {
"path_match": "detail.attribute.*",
"mapping": {
"type": "text"
}
}
}
],
"properties": {
"detail": {
"properties": {
"attMaxScore": {
"type": "scaled_float",
"scaling_factor": 100
},
"attribute": {
"dynamic": "true",
"properties": {
"Brand": {
"type": "text"
},
"Color": {
"type": "text"
},
"MPN": {
"type": "text"
},
"Manufacturer": {
"type": "text"
},
"Model": {
"type": "text"
},
"Operating System": {
"type": "text"
},
"Size": {
"type": "text"
},
"Type": {
"type": "text"
}
}
},
"description": {
"type": "text"
},
"feature": {
"type": "text"
},
"tag": {
"type": "text",
"fields": {
"raw": {
"type": "keyword"
}
}
}
}
},
"title": {
"properties": {
"en": {
"type": "text"
}
}
}
}
}
}
}
}
According to the documentation, you can't run a terms aggregation on a field with the text datatype by default; the field should have the keyword datatype.
Also, you can't aggregate on the detail.attribute field that way. The detail.attribute field doesn't store any value: it is an object datatype (not a nested one, as you wrote in the question), which means it is a container for other fields such as Size, Brand, etc. So you should aggregate on, for example, the detail.attribute.Size field, provided it has the keyword datatype.
Another likely error is that you are running a term query on a text field. What is the datatype of the title.list field? A term query is meant for fields with the keyword datatype, while a match query is used to query text fields.
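To make these points concrete, here is a minimal Python sketch that only builds the request bodies as dictionaries; it does not talk to a cluster. It assumes a keyword sub-field named "raw" is added to the attribute (mirroring the existing "tag.raw" sub-field in the mapping), and the helper name `build_search` is hypothetical. The title field defaults to "title.list" as used in the question.

```python
import json

# Assumed mapping for one attribute: text with a keyword sub-field "raw".
size_mapping = {"type": "text", "fields": {"raw": {"type": "keyword"}}}


def build_search(term: str, attribute: str,
                 title_field: str = "title.list", size: int = 3) -> dict:
    """Build a search body: match query on the title, terms agg on a keyword."""
    return {
        "size": 0,
        # match (not term) because the title is an analyzed text field.
        "query": {"match": {title_field: term}},
        "aggs": {
            "results": {
                "terms": {
                    # Aggregate on the leaf keyword field, not on the object.
                    "field": f"detail.attribute.{attribute}.raw",
                    "size": size,
                }
            }
        },
    }


body = build_search("sandisk", "Size")
print(json.dumps(body, indent=2))
```

Documents indexed before the sub-field exists would have to be reindexed for the aggregation to see their values.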
Here is what I have used for a nested aggs query, minus the actual value names.
The actual field is a keyword, which as already mentioned is required, that is part of a nested JSON object:
"STATUS_ID": {
"type": "keyword",
"index": "not_analyzed",
"doc_values": true
},
Query
GET index name/_search?size=200
{
"aggs": {
"panels": {
"nested": {
"path": "nested path"
},
"aggs": {
"statusCodes": {
"terms": {
"field": "nested path.STATUS.STATUS_ID",
"size": 50
}
}
}
}
}
}
Result
"aggregations": {
"status": {
"doc_count": 12108963,
"statusCodes": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "O",
"doc_count": 5912218
},
{
"key": "C",
"doc_count": 401586
},
{
"key": "E",
"doc_count": 135628
},
{
"key": "Y",
"doc_count": 3742
},
{
"key": "N",
"doc_count": 1012
},
{
"key": "L",
"doc_count": 719
},
{
"key": "R",
"doc_count": 243
},
{
"key": "H",
"doc_count": 86
}
]
}
}

ElasticSearch - Getting paged result in a nested list (nested pagination)

I have the following JSON that describes a country-city (1:n) relation:
{
"country": [
{
"id": 1,
"name": "Country1",
"city": [
{"id": 1, "name": "City1"},
{"id": 2,"name": "City2"}
]
}, {
"id": 2,
"name": "Country2",
"city": [
{"id": 3,"name": "City3"},
{"id": 4,"name": "City4"}
]
}, {
"id": 3,
"name": "Country3",
"city": [
{"id": 5,"name": "City5"},
{"id": 6,"name": "City6"}
]
}
]
}
I have loaded it into an ES index as 3 documents, one for each of the three countries.
I have mapped the city property as nested:
...
"city": {
"type": "nested",
...
I want to query all cities and get a paged result.
For instance, a page of 3 hits should return City1, City2, City3.
I also want to filter by country name.
I tried:
GET /127.0.0.1:9200/country_city/_search
{
"from": 0,
"size": 2,
"fields": [
"city.id", "city.name"
]
}
and
GET /127.0.0.1:9200/country_city/country/_search?_source=false
{
"query": {
"nested": {
"path": "city",
"query": {
"match_all": {}
},
"inner_hits": {
"sort": "city.id",
"from": 0,
"size": 3
}
}
},
"fields": [
"name",
"city.id",
"city.name"
]
}
But the first returned 4 cities instead of 2
(2 countries with 2 cities each).
The second returned all documents (although size is 2 in the request) and, in an inner element, the first 3 cities of each country.
How can I get a page of the nested objects,
and then progress to the next page?
This should work
Mappings
{
"mappings": {
"type": {
"properties": {
"country": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text"
},
"city": {
"type": "nested",
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "keyword"
}
}
}
}
}
}
}
}
}
Query
{
"query": {
"nested": {
"path": "country",
"inner_hits": {},
"query": {
"nested": {
"path": "country.city",
"query": {
"match_all": {}
},
"inner_hits": {
"from": 0,
"size": 1,
"_source": {
"includes": ["country.city.name", "country.city.id"]
}
}
}
}
}
}
}
github bug
source filtering
Thanks
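Note that from and size inside inner_hits apply to each parent document separately, which matches what the question observed (the first cities of each country). If one global page across all cities is needed and the result set is small, a client-side alternative is to flatten the cities and slice. The following is a minimal Python sketch of that alternative, not an Elasticsearch feature; `page_cities` is a hypothetical helper working on the documents' _source values.

```python
from typing import Optional

countries = [
    {"id": 1, "name": "Country1",
     "city": [{"id": 1, "name": "City1"}, {"id": 2, "name": "City2"}]},
    {"id": 2, "name": "Country2",
     "city": [{"id": 3, "name": "City3"}, {"id": 4, "name": "City4"}]},
    {"id": 3, "name": "Country3",
     "city": [{"id": 5, "name": "City5"}, {"id": 6, "name": "City6"}]},
]


def page_cities(docs: list, page: int, size: int,
                country_name: Optional[str] = None) -> list:
    """Flatten cities across countries, sort by id, and return one page."""
    cities = [
        city
        for country in docs
        if country_name is None or country["name"] == country_name
        for city in country["city"]
    ]
    cities.sort(key=lambda c: c["id"])
    start = page * size  # page is zero-based
    return cities[start:start + size]


first_page = page_cities(countries, page=0, size=3)
print([c["name"] for c in first_page])  # ['City1', 'City2', 'City3']
```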

ElasticSearch Advanced Aggregations

I currently have documents indexed with the following structure:
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string"
},
"Type": {
"type": "string"
}
}
}
}
}
I want to aggregate on results when searching over this type. I initially just wanted the terms from the Source field, which was easy: I just used a terms aggregation on the Source field.
Now I would like to aggregate the Type field as well. However, the types are related to the sources. For example, I could have two Sources like this:
{
"Source": "The Store",
"Type": "Purchase"
}
and
{
"Source": "The Store",
"Type": "Return"
}
I want to show the different types and their counts for each different source. In other words, I would want my response to be something like this:
{
"aggs": {
"Sources": [
{
"Key": "The Store",
"DocCount": 2,
"Aggregations": {
"Types": [
{
"Key": "Purchase",
"DocCount": 1
},
{
"Key": "Return",
"DocCount": 1
}
]
}
}
]
}
}
Is there a way to get these sub-aggregations?
Yes, there is, but you need to slightly change your mapping to make your fields `not_analyzed`:
"ProductInteractions": {
"properties": {
"SKU": {
"type": "string"
},
"Name": {
"type": "string"
},
"Sources": {
"properties": {
"Source": {
"type": "string",
"index": "not_analyzed"
},
"Type": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Then you can use the following aggregation in order to get what you want:
{
"aggs": {
"sources": {
"terms": {
"field": "Sources.Source"
},
"aggs": {
"types": {
"terms": {
"field": "Sources.Type"
}
}
}
}
}
}
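One caveat that may be worth checking: Sources is mapped as a plain object, not as nested, so when a single document holds several sources, the Source and Type values are indexed as two independent value lists and the pairing between them is lost. The Python sketch below only simulates that behaviour to show the effect; it is not Elasticsearch code. The second source name, "The Web", is a hypothetical value added for the illustration.

```python
from collections import Counter, defaultdict


def flatten(doc: dict) -> dict:
    """Mimic object (non-nested) indexing: one independent value set per field."""
    return {
        "Sources.Source": {s["Source"] for s in doc["Sources"]},
        "Sources.Type": {s["Type"] for s in doc["Sources"]},
    }


def terms_with_sub_terms(docs: list) -> dict:
    """Simulate terms(Sources.Source) with a terms(Sources.Type) sub-aggregation."""
    result = defaultdict(Counter)
    for doc in docs:
        flat = flatten(doc)
        for source in flat["Sources.Source"]:
            # Every Type of the document is counted under every Source bucket.
            for type_ in flat["Sources.Type"]:
                result[source][type_] += 1
    return {source: dict(types) for source, types in result.items()}


doc = {"Sources": [
    {"Source": "The Store", "Type": "Purchase"},
    {"Source": "The Web", "Type": "Return"},  # hypothetical second source
]}
counts = terms_with_sub_terms([doc])
print(counts)
```

In this simulation "The Store" also gets a "Return" count even though no such pair exists in the document. Mapping Sources as nested and wrapping the aggregation in a nested aggregation is the usual way to keep the pairs together.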

Sort nested object in Elasticsearch

I'm using the following mapping:
PUT /my_index
{
"mappings": {
"blogpost": {
"properties": {
"title": {"type": "string"},
"comments": {
"type": "nested",
"properties": {
"comment": { "type": "string" },
"date": { "type": "date" }
}
}
}
}
}
}
Example of document:
PUT /my_index/blogpost/1
{
"title": "Nest eggs",
"comments": [
{
"comment": "Great article",
"date": "2014-09-01"
},
{
"comment": "More like this please",
"date": "2014-10-22"
},
{
"comment": "Visit my website",
"date": "2014-07-02"
},
{
"comment": "Awesome",
"date": "2014-08-23"
}
]
}
My question is how to retrieve this document with the nested "comments" objects sorted by "date". The desired result:
PUT /my_index/blogpost/1
{
"title": "Nest eggs",
"comments": [
{
"comment": "Visit my website",
"date": "2014-07-02"
},
{
"comment": "Awesome",
"date": "2014-08-23"
},
{
"comment": "Great article",
"date": "2014-09-01"
},
{
"comment": "More like this please",
"date": "2014-10-22"
}
]
}
You need to sort on the inner_hits to sort the nested objects. This will give you the desired output
GET my_index/_search
{
"query": {
"nested": {
"path": "comments",
"query": {
"match_all": {}
},
"inner_hits": {
"sort": {
"comments.date": {
"order": "asc"
}
},
"size": 5
}
}
},
"_source": [
"title"
]
}
I am using source filtering to return only "title", since the comments are retrieved inside inner_hits, but you can drop that if you want.
The size is 5 because the default value is 3 and there are 4 objects in the given example.
Hope this helps!
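The sorted comments come back under inner_hits rather than in _source, so the client has to put them back together. Here is a minimal Python sketch of that step. The helper `with_sorted_comments` is hypothetical, and `sample_hit` is hand-written to follow the usual inner_hits response layout for the query above; it is not output captured from a real cluster.

```python
def with_sorted_comments(hit: dict) -> dict:
    """Merge the sorted inner hits back into the document as "comments"."""
    inner = hit["inner_hits"]["comments"]["hits"]["hits"]
    return {**hit["_source"], "comments": [h["_source"] for h in inner]}


# Hand-written search hit (an assumption about the response shape).
sample_hit = {
    "_id": "1",
    "_source": {"title": "Nest eggs"},
    "inner_hits": {"comments": {"hits": {"hits": [
        {"_source": {"comment": "Visit my website", "date": "2014-07-02"}},
        {"_source": {"comment": "Awesome", "date": "2014-08-23"}},
        {"_source": {"comment": "Great article", "date": "2014-09-01"}},
        {"_source": {"comment": "More like this please", "date": "2014-10-22"}},
    ]}}},
}

doc = with_sorted_comments(sample_hit)
print([c["date"] for c in doc["comments"]])
```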
