Elasticsearch multi-select facet functionality with child aggregation

Given the following data:
curl -XPUT 'http://localhost:9200/products/'
curl -XPOST 'http://localhost:9200/products/product/_mapping' -d '{
"product": {
"_parent": {"type": "product_group"}
}
}'
curl -XPUT 'http://localhost:9200/products/product_group/1' -d '{
"title": "Product 1"
}'
curl -XPOST localhost:9200/products/product/1?parent=1 -d '{
"height": 190,
"width": 120
}'
curl -XPOST localhost:9200/products/product/2?parent=1 -d '{
"height": 120,
"width": 100
}'
curl -XPOST localhost:9200/products/product/3?parent=1 -d '{
"height": 110,
"width": 120
}'
Child aggregation on product results in the following facets:
Height
110 (1)
120 (1)
190 (1)
Width
120 (2)
100 (1)
If I now filter on height 190, I would like the height aggregation to ignore that filter, so the results would be:
Height
110 (1)
120 (1)
190 (1)
Width
120 (1)
This is solvable with a filter aggregation, but I'm not sure whether it works, or what the syntax is, with parent-child relations.
See http://distinctplace.com/2014/07/29/build-zappos-like-products-facets-with-elasticsearch/
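To pin down the desired semantics, here is a minimal client-side sketch in plain Python (not an Elasticsearch API): each facet is counted over the products that match every active filter except the filter on that facet's own field. The products list mirrors the three documents indexed above; facet_counts is a hypothetical helper introduced only for illustration.

```python
from collections import Counter

# The three child "product" documents indexed above.
products = [
    {"height": 190, "width": 120},
    {"height": 120, "width": 100},
    {"height": 110, "width": 120},
]

def facet_counts(docs, filters, facet_fields):
    """Count each facet over docs matching all filters except the facet's own."""
    result = {}
    for field in facet_fields:
        # Multi-select behaviour: ignore the filter on this facet's own field.
        other = {f: vals for f, vals in filters.items() if f != field}
        matching = [d for d in docs
                    if all(d[f] in vals for f, vals in other.items())]
        result[field] = dict(Counter(d[field] for d in matching))
    return result

print(facet_counts(products, {"height": {190}}, ["height", "width"]))
```

In the request itself, the same idea becomes one filter aggregation per facet that contains only the other facets' filters, which is what the attempt below does with match_all for height and the height term for width.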
What I've tried so far:
curl -XGET 'http://localhost:9200/products/product_group/_search?pretty=true' -d '{
"filter": {
"has_child": {
"type": "product",
"filter": {
"term": {"height": 190}
},
"inner_hits": {}
}
},
"aggs": {
"to-products": {
"children": {"type": "product"},
"aggs": {
"height": {
"filter": {"match_all": {}},
"aggs": {
"height": {
"terms": {"field": "height", "size": 10}
}
}
},
"width": {
"filter": {
"and": [{"terms": { "height": [190]}}]
},
"aggs": {
"width": {
"terms": {"field": "width", "size": 10}
}
}
}
}
}
}
}
'

I don't fully understand your question, but if you want multiple aggregations inside a children aggregation, you have to prefix every field with the child type name (here product, as in product.height).
Here is the modified query:
curl -XPOST "http://localhost:9200/products/product_group/_search?pretty=true" -d'
{
"size": 0,
"filter": {
"has_child": {
"type": "product",
"filter": {
"term": {
"product.height": 190
}
},
"inner_hits": {}
}
},
"aggs": {
"to-products": {
"children": {
"type": "product"
},
"aggs": {
"height": {
"filter": {
"match_all": {}
},
"aggs": {
"height": {
"terms": {
"field": "product.height",
"size": 10
}
}
}
},
"width": {
"filter": {
"and": [
{
"terms": {
"product.height": [
190
]
}
}
]
},
"aggs": {
"width": {
"terms": {
"field": "product.width",
"size": 10
}
}
}
}
}
}
}
}'
This isn't mentioned anywhere in the documentation, which confuses many users. My guess is that the children aggregation is treated like a nested aggregation, so fields are referenced the same way.

Related

Search for documents by minimum value of field

I'm trying to filter products by their price, and I'm completely stumped as to how to proceed.
Hoping someone can shed some light on this, and maybe point me in the right direction.
Concept
Each product has multiple prices.
These prices are valid during a certain date-range.
The actual price of the product at a certain date is the lowest price that is valid on that date.
Goal
I want to be able to:
get the lowest and highest price for a certain date
filter the products by a max/min price on a certain date
Caveat: I have simplified the price restrictions for this example, but I cannot consolidate the date ranges so that only one price is valid per date.
Example
Mapping:
curl -XPUT 'http://localhost:9200/price-filter-test'
curl -XPUT 'http://localhost:9200/price-filter-test/_mapping/_doc' -H 'Content-Type: application/json' -d '{
"properties": {
"id": {"type": "integer"},
"name": {"type": "text"},
"prices": {
"type": "nested",
"properties": {
"price": {"type": "integer"},
"from": {"type": "date"},
"untill": {"type": "date"}
}
}
}
}'
Test entries:
curl -XPUT 'http://localhost:9200/price-filter-test/_doc/1' -H 'Content-Type: application/json' -d '{
"id": 1,
"name": "Product A",
"prices": [
{
"price": 10,
"from": "2020-02-01",
"untill": "2020-03-01"
},
{
"price": 8,
"from": "2020-02-20",
"untill": "2020-02-21"
},
{
"price": 12,
"from": "2020-02-22",
"untill": "2020-02-23"
}
]
}'
curl -XPUT 'http://localhost:9200/price-filter-test/_doc/2' -H 'Content-Type: application/json' -d '{
"id": 2,
"name": "Product B",
"prices": [
{
"price": 20,
"from": "2020-02-01",
"untill": "2020-03-01"
},
{
"price": 18,
"from": "2020-02-20",
"untill": "2020-02-21"
},
{
"price": 22,
"from": "2020-02-22",
"untill": "2020-02-23"
}
]
}'
On 2020-02-20 the following prices are valid; the correct (lowest) price is 8 for Product A and 18 for Product B:
Product A:
10
8
Product B:
20
18
Solution
Min/Max
I have figured out how to get the min and max values of the applicable prices.
This was pretty doable using aggregations:
curl -XGET 'http://localhost:9200/price-filter-test/_search?pretty=true' -H 'Content-Type: application/json' -d '{
"query": {"match_all": {}},
"size": 0,
"aggs": {
"product_ids": {
"terms": {"field": "id"},
"aggs": {
"nested_prices": {
"nested": {"path": "prices"},
"aggs": {
"applicable_prices": {
"filter": {
"bool": {
"must": [
{"range": {"prices.from": {"lte": "2020-02-20"}}},
{"range": {"prices.untill": {"gte": "2020-02-20"}}}
]
}
},
"aggs": {
"min_price": {
"min": {"field": "prices.price"}
}
}
}
}
}
}
},
"stats_min_prices": {
"stats_bucket": {
"buckets_path": "product_ids>nested_prices>applicable_prices>min_price"
}
}
}
}'
Here I first aggregate over the different ids, to ensure prices are checked per product, then I filter by applicable dates, and then get the min prices for each.
Using the stats_bucket aggregation, I'm then able to get the min and max values of these minimum prices.
{
// ...
"aggregations" : {
// ...
"stats_min_prices" : {
"count" : 2,
"min" : 8.0,
"max" : 18.0,
"avg" : 13.0,
"sum" : 26.0
}
}
}
Here we see the correct min (8 for Product A) and max (18 for Product B).
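As a sanity check, the aggregation pipeline above can be reproduced on the client in a few lines of Python. This is a sketch under the assumption that the from/untill bounds are inclusive (matching the lte/gte range filters); it takes the per-product minimum, like the min_price sub-aggregation, and then computes stats over those minimums, like stats_bucket. The helper names are hypothetical.

```python
from datetime import date
from statistics import mean

# (price, from, untill) tuples copied from the two test documents above.
products = {
    "Product A": [(10, "2020-02-01", "2020-03-01"),
                  (8, "2020-02-20", "2020-02-21"),
                  (12, "2020-02-22", "2020-02-23")],
    "Product B": [(20, "2020-02-01", "2020-03-01"),
                  (18, "2020-02-20", "2020-02-21"),
                  (22, "2020-02-22", "2020-02-23")],
}

def min_valid_price(prices, on):
    """Lowest price whose inclusive [from, untill] range contains `on`, else None."""
    day = date.fromisoformat(on)
    valid = [price for price, start, end in prices
             if date.fromisoformat(start) <= day <= date.fromisoformat(end)]
    return min(valid) if valid else None

def stats_over_min_prices(products, on):
    """Per-product minimum, then count/min/max/avg/sum over those minimums."""
    mins = [m for m in (min_valid_price(p, on) for p in products.values())
            if m is not None]  # skip products with no valid price, like gap_policy skip
    return {"count": len(mins), "min": min(mins), "max": max(mins),
            "avg": mean(mins), "sum": sum(mins)}

stats = stats_over_min_prices(products, "2020-02-20")
print(stats)
```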
Filtering
For filtering, I need to be able to exclude products based on their lowest price.
For example, if I search for products that cost at least 19, I shouldn't find any, because Product B's lowest price is 18.
curl -X GET "localhost:9200/price-filter-test/_search?pretty" -H 'Content-Type: application/json' -d '{
"query": {
"nested": {
"path": "prices",
"query": {
"bool": {
"must": [
{
"range" : {
"prices.price" : {"gte" : 19}
}
},
{"range": {"prices.from": {"lte": "2020-02-20"}}},
{"range": {"prices.untill": {"gte": "2020-02-20"}}}
]
}
}
}
}
}'
This attempt, however, still yields "Product B" as a match, as one of the prices in this date range is higher than 19. However, as it is not the lowest price in this date range, it is not the "correct" price.
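The mismatch can be reproduced locally: a nested query keeps a parent document if any nested object matches, whereas the requirement compares only the lowest valid price. The plain-Python sketch below (hypothetical helper names, inclusive date bounds assumed) contrasts the two semantics on the test data; it illustrates the problem and is not an Elasticsearch-side solution.

```python
from datetime import date

# (price, from, untill) tuples from the two test documents above.
products = {
    "Product A": [(10, "2020-02-01", "2020-03-01"),
                  (8, "2020-02-20", "2020-02-21"),
                  (12, "2020-02-22", "2020-02-23")],
    "Product B": [(20, "2020-02-01", "2020-03-01"),
                  (18, "2020-02-20", "2020-02-21"),
                  (22, "2020-02-22", "2020-02-23")],
}

def valid_prices(prices, on):
    """Prices whose inclusive [from, untill] range contains the date `on`."""
    day = date.fromisoformat(on)
    return [price for price, start, end in prices
            if date.fromisoformat(start) <= day <= date.fromisoformat(end)]

def any_valid_price_at_least(products, on, threshold):
    """Nested-query semantics: keep a product if ANY valid price matches."""
    return [name for name, prices in products.items()
            if any(price >= threshold for price in valid_prices(prices, on))]

def lowest_valid_price_at_least(products, on, threshold):
    """Wanted semantics: keep a product only if its LOWEST valid price matches."""
    result = []
    for name, prices in products.items():
        valid = valid_prices(prices, on)
        if valid and min(valid) >= threshold:
            result.append(name)
    return result

print(any_valid_price_at_least(products, "2020-02-20", 19))     # ['Product B']
print(lowest_valid_price_at_least(products, "2020-02-20", 19))  # []
```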
I'm completely stumped as to how to do this.
I've thought about using scripted fields, but I think I'd need to combine two (one to calculate the applicable prices, one to get the lowest), and that doesn't appear to be possible.
Hope you can point me in the right direction.
If I understand correctly, you are looking for inner_hits:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-inner-hits.html
I wasn't sure about the aggregation part (you can't inject inner_hits into an aggregation), which is why I didn't post this at first.
Hope it's what you need.
{
"query": {
"nested": {
"path": "prices",
"query": {
"range": {
"prices.price": {
"gte": 10,
"lte": 20
}
}
},
"inner_hits": {}
}
}
}
This keeps only the nested documents matching the range in the inner_hits part:
"inner_hits":{
"prices":{
"hits":{
"total":2,
"max_score":1,
"hits":[
{
"_nested":{
"field":"prices",
"offset":1
},
"_score":1,
"_source":{
"price":18,
"from":"2020-02-20",
"untill":"2020-02-21"
}
},
{
"_nested":{
"field":"prices",
"offset":0
},
"_score":1,
"_source":{
"price":20,
"from":"2020-02-01",
"untill":"2020-03-01"
}
}
]
}
}
}
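What inner_hits reports can be mimicked in a short Python sketch: for one parent document, return the nested objects that satisfy the nested query together with their position in the source array (the _nested.offset). The inner_hits function here is a hypothetical stand-in; Elasticsearch orders inner hits by score, so the order can differ from the array order used below (the response above lists offset 1 before offset 0).

```python
# Product B's nested prices in source order; the list index is the nested offset.
prices_b = [
    {"price": 20, "from": "2020-02-01", "untill": "2020-03-01"},
    {"price": 18, "from": "2020-02-20", "untill": "2020-02-21"},
    {"price": 22, "from": "2020-02-22", "untill": "2020-02-23"},
]

def inner_hits(nested_docs, gte, lte):
    """(offset, source) pairs for nested docs whose price lies in [gte, lte]."""
    return [(offset, doc) for offset, doc in enumerate(nested_docs)
            if gte <= doc["price"] <= lte]

for offset, source in inner_hits(prices_b, 10, 20):
    print(offset, source["price"])
```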

Subtract numeric fields between two documents with different timestamp

Let's say I have these data samples:
{
"date": "2019-06-16",
"rank": 150,
"name": "doc 1"
}
{
"date": "2019-07-16",
"rank": 100,
"name": "doc 1"
}
{
"date": "2019-06-16",
"rank": 50,
"name": "doc 2"
}
{
"date": "2019-07-16",
"rank": 80,
"name": "doc 2"
}
The expected result subtracts the rank of the newer document from the rank of the older document with the same name (old date - new date):
{
"name": "doc 1",
"diff_rank": 50
}
{
"name": "doc 2",
"diff_rank": -30
}
I'd also like to sort by diff_rank if possible; otherwise I will just sort manually after getting the result.
I have tried using date_histogram and serial_diff, but some results are missing the diff_rank value even though I am sure the data exist:
{
"aggs" : {
"group_by_name": {
"terms": {
"field": "name"
},
"aggs": {
"days": {
"date_histogram": {
"field": "date",
"interval": "day"
},
"aggs": {
"the_rank": {
"sum": {
"field": "rank"
}
},
"diff_rank": {
"serial_diff": {
"buckets_path": "the_rank",
"lag" : 30 // 1 month or 30 days in this case
}
}
}
}
}
}
}
}
Any help solving this would be much appreciated!
Finally, I found an approach in the official documentation that uses the Filter, Bucket Script and Bucket Sort aggregations to compute and sort the result. Here is the final snippet:
{
"size": 0,
"aggs" : {
"group_by_name": {
"terms": {
"field": "name",
"size": 50,
"shard_size": 10000
},
"aggs": {
"last_month_rank": {
"filter": {
"term": {"date": "2019-06-17"}
},
"aggs": {
"rank": {
"sum": {
"field": "rank"
}
}
}
},
"latest_rank": {
"filter": {
"term": {"date": "2019-07-17"}
},
"aggs": {
"rank": {
"sum": {
"field": "rank"
}
}
}
},
"diff_rank": {
"bucket_script": {
"buckets_path": {
"lastMonthRank": "last_month_rank>rank",
"latestRank": "latest_rank>rank"
},
"script": "params.lastMonthRank - params.latestRank"
}
},
"rank_bucket_sort": {
"bucket_sort": {
"sort": [
{"diff_rank": {"order": "desc"}}
],
"size": 50
}
}
}
}
}
}
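The aggregation chain can be checked against the expected output with a plain-Python sketch of the same steps: a bucket per name (terms), a filtered sum per date (filter + sum, where an empty sum is 0), the difference (bucket_script), and a descending sort (bucket_sort). The snippet filters on 2019-06-17 and 2019-07-17; the sketch uses the sample's dates instead so the numbers match the expected result. diff_ranks is a hypothetical helper.

```python
docs = [
    {"date": "2019-06-16", "rank": 150, "name": "doc 1"},
    {"date": "2019-07-16", "rank": 100, "name": "doc 1"},
    {"date": "2019-06-16", "rank": 50, "name": "doc 2"},
    {"date": "2019-07-16", "rank": 80, "name": "doc 2"},
]

def diff_ranks(docs, last_month, latest):
    sums = {}
    for d in docs:
        # terms bucket per name, holding one filtered sum per date (0 if no docs)
        bucket = sums.setdefault(d["name"], {last_month: 0, latest: 0})
        if d["date"] in bucket:
            bucket[d["date"]] += d["rank"]
    # bucket_script: lastMonthRank - latestRank
    rows = [{"name": name, "diff_rank": b[last_month] - b[latest]}
            for name, b in sums.items()]
    # bucket_sort: order by diff_rank descending
    return sorted(rows, key=lambda r: r["diff_rank"], reverse=True)

result = diff_ranks(docs, "2019-06-16", "2019-07-16")
print(result)
```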

Elasticsearch: Aggregating only specific nested documents

I want to aggregate only the nested documents that satisfy a given query.
Let me explain it through an example. I have inserted two records in my index:
The first document is:
{
"project": [
{
"subject": "maths",
"marks": 47
},
{
"subject": "computers",
"marks": 22
}
]
}
The second document is:
{
"project": [
{
"subject": "maths",
"marks": 65
},
{
"subject": "networks",
"marks": 72
}
]
}
Each record contains subjects along with their marks. From these documents, I need the average marks for the maths subject alone.
The query I tried is:
{
"size": 0,
"aggs": {
"avg_marks": {
"avg": {
"field": "project.marks"
}
}
},
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "project.subject:maths",
"analyze_wildcard": true,
"default_field": "*"
}
}
]
}
}
}
This returns the average of all the marks, which is not what I need:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"avg_marks": {
"value": 51.5
}
}
}
I just need the average of the maths marks across these documents; the expected result is 56.00.
Any help with the query, or any idea, would be appreciated.
Thanks in advance.
First, you need to declare project as a nested field in your mapping, like this:
PUT /nested-index
{
"mappings": {
"document": {
"properties": {
"project": {
"type": "nested",
"properties": {
"subject": {
"type": "keyword"
},
"marks": {
"type": "long"
}
}
}
}
}
}
}
Then insert your documents:
PUT nested-index/document/1
{
"project": [
{
"subject": "maths",
"marks": 47
},
{
"subject": "computers",
"marks": 22
}
]
}
Then insert the second document:
PUT nested-index/document/2
{
"project": [
{
"subject": "maths",
"marks": 65
},
{
"subject": "networks",
"marks": 72
}
]
}
Then run the aggregation, specifying that project is a nested structure:
GET nested-index/_search
{
"size": 0,
"aggs": {
"subjects": {
"nested": {
"path": "project"
},
"aggs": {
"subjects": {
"terms": {
"field": "project.subject",
"size": 10
},
"aggs": {
"average": {
"avg": {
"field": "project.marks"
}
}
}
}
}
}
}
}
As for why your query doesn't work and gives that result: when project is an array of plain objects and you compute an average, the query selects whole documents, and the avg aggregation then uses every value in each matching document's marks array, no matter which subject you searched for.
So with those two documents, since both contain the maths subject, the average is calculated like this:
(47 + 22 + 65 + 72) / 4 = 51.5
If you wanted the average for networks, it would return the following (only one document contains networks, but the average is taken over all values in its array):
(65 + 72) / 2 = 68.5
So you need to use a nested structure in this case.
If you are only interested in one subject, you can then aggregate only the nested documents whose subject equals a given value, for example "maths":
GET nested-index/_search
{
"size": 0,
"aggs": {
"project": {
"nested": {
"path": "project"
},
"aggs": {
"subjects": {
"filter": {
"term": {
"project.subject": "maths"
}
},
"aggs": {
"average": {
"avg": {
"field": "project.marks"
}
}
}
}
}
}
}
}
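The arithmetic in this answer can be verified with a small Python sketch that models the two mappings on the client: with a plain object mapping, Elasticsearch flattens the array, so the query selects whole documents and avg sees every mark in them; with a nested mapping and a filter inside the nested aggregation, only the matching objects contribute. flattened_avg and nested_avg are hypothetical names for illustration.

```python
from statistics import mean

# The two documents' "project" arrays.
docs = [
    [{"subject": "maths", "marks": 47}, {"subject": "computers", "marks": 22}],
    [{"subject": "maths", "marks": 65}, {"subject": "networks", "marks": 72}],
]

def flattened_avg(docs, subject):
    """Object mapping: match whole documents, then average every mark in them."""
    matching = [d for d in docs if any(p["subject"] == subject for p in d)]
    return mean(p["marks"] for d in matching for p in d)

def nested_avg(docs, subject):
    """Nested mapping + filter inside the nested agg: only matching objects count."""
    return mean(p["marks"] for d in docs for p in d if p["subject"] == subject)

print(flattened_avg(docs, "maths"))     # 51.5
print(flattened_avg(docs, "networks"))  # 68.5
print(nested_avg(docs, "maths"))        # 56
```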

Elasticsearch aggregation doesn't work with nested-type fields

I can't get an Elasticsearch aggregation with a filter to work on nested fields. The data schema (relevant part) is like this:
"mappings": {
"rb": {
"properties": {
"project": {
"type": "nested",
"properties": {
"age": {
"type": "long"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Essentially, the "rb" object contains a nested field called "project", which contains two more fields: "name" and "age". The query I'm running:
"aggs": {
"root": {
"aggs": {
"group": {
"aggs": {
"filtered": {
"aggs": {
"order": {
"percentiles": {
"field": "project.age",
"percents": ["50"]
}
}
},
"filter": {
"range": {
"last_updated": {
"gte": "2015-01-01",
"lt": "2015-07-01"
}
}
}
}
},
"terms": {
"field": "project.name",
"min_doc_count": 5,
"order": {
"filtered>order.50": "asc"
},
"shard_size": 10,
"size": 10
}
}
},
"nested": {
"path": "project"
}
}
}
This query is supposed to produce the top 10 projects (project.name field) that match the date filter, sorted by their median age, ignoring projects with fewer than 5 mentions in the database. The median should be calculated only for projects matching the filter (date range).
Despite there being more than a hundred thousand objects in the database, this query produces an empty list: no errors, just an empty response. I've tried it on both ES 1.6 and ES 2.0-beta.
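I've reorganized your aggregation query a bit and got some results. The main point is that you are aggregating over a nested type: inside the nested aggregation, each bucket contains only the nested project objects, and last_updated is not one of their fields (it presumably lives on the root rb document), so a filter on it matches nothing there. I therefore took the filter aggregation on the last_updated field out and moved it up the hierarchy as the first aggregation. Then comes the nested aggregation on the project field, and finally the terms and percentiles aggregations.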
That seems to work out pretty well. Please try.
{
"size": 0,
"aggs": {
"filtered": {
"filter": {
"range": {
"last_updated": {
"gte": "2015-01-01",
"lt": "2015-07-01"
}
}
},
"aggs": {
"root": {
"nested": {
"path": "project"
},
"aggs": {
"group": {
"terms": {
"field": "project.name",
"min_doc_count": 5,
"shard_size": 10,
"order": {
"order.50": "asc"
},
"size": 10
},
"aggs": {
"order": {
"percentiles": {
"field": "project.age",
"percents": [
"50"
]
}
}
}
}
}
}
}
}
}
}
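Why the order matters can be shown with a tiny plain-Python model. It assumes, as above, that last_updated is a root-level field; the sample documents are hypothetical. Filtering on that field after stepping into the nested project objects finds nothing, while filtering the root documents first and then stepping into their projects keeps the expected ones.

```python
# Hypothetical rb documents: last_updated on the root, projects nested.
rb_docs = [
    {"last_updated": "2015-03-01",
     "project": [{"name": "alpha", "age": 3}, {"name": "beta", "age": 5}]},
    {"last_updated": "2015-09-01",
     "project": [{"name": "alpha", "age": 7}]},
]

def in_range(value, gte="2015-01-01", lt="2015-07-01"):
    # ISO dates compare correctly as strings; a missing field never matches.
    return value is not None and gte <= value < lt

# Original order (nested -> terms -> filter): inside the nested context each
# "document" is a project object, which has no last_updated field.
nested_objects = [p for doc in rb_docs for p in doc["project"]]
filtered_inside = [p for p in nested_objects if in_range(p.get("last_updated"))]

# Reorganized order (filter -> nested -> terms): filter root docs first.
filtered_first = [p for doc in rb_docs if in_range(doc.get("last_updated"))
                  for p in doc["project"]]

print(len(filtered_inside), len(filtered_first))  # 0 2
```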

Elasticsearch multi-select faceting

Let's imagine we have the following documents:
curl -XPUT 'http://localhost:9200/multiselect/demo/1' -d '{
"title": "One",
"tags": ["tag1"],
"keywords": ["keyword1"]
}'
curl -XPUT 'http://localhost:9200/multiselect/demo/2' -d '{
"title": "Two",
"tags": ["tag2"],
"keywords": ["keyword2"]
}'
If we run the query:
curl -XGET '
{
"post_filter": {
"and": [
{
"terms": {
"tags": [
"tag1",
"tag2"
]
}
},
{
"terms": {
"keywords": [
"keyword1"
]
}
}
]
},
"aggs": {
"tagFacet": {
"aggs": {
"aggs": {
"terms": {
"field": "tags",
"size": 0
}
}
},
"filter": {
"terms": {
"keywords": [
"keyword1"
]
}
}
},
"keywordFacet": {
"aggs": {
"aggs": {
"terms": {
"field": "keywords",
"size": 0
}
}
},
"filter": {
"terms": {
"tags": [
"tag1",
"tag2"
]
}
}
}
}
}
'
We get document "One" and the following facets: tag1 (1), keyword1 (1), keyword2 (0) and tag2 (1). But tag2 should not be there, because keyword2 is not part of our filter (or the facets).
The question is: is there any way to get the facets without tag2, without making two requests?
Let me know if you need a better explanation, but I think the basic idea is clear.
PS: You can find a better explanation of this pattern here: https://gist.github.com/mattweber/1947215. It's the same approach, and it has the same issue described here.
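For reference, the request in the question follows a mechanical pattern that can be generated from the user's selections: post_filter applies all selections to the hits only, and each facet gets a filter aggregation containing every selection except its own. The Python sketch below builds such a body; multiselect_body, the "<field>Facet" names and the "values" sub-aggregation name are hypothetical, and it uses a bool filter with must instead of the and filter (which was removed in Elasticsearch 5.0). It only reproduces the pattern; it does not by itself answer the tag2 question.

```python
import json

def multiselect_body(selections):
    """Build a post_filter plus one filter aggregation per facet.

    `selections` maps a field name to the values the user has ticked.
    """
    def terms_filters(exclude=None):
        return [{"terms": {field: values}}
                for field, values in selections.items()
                if field != exclude and values]

    aggs = {}
    for field in selections:
        others = terms_filters(exclude=field)
        aggs[field + "Facet"] = {
            # With no other selections, count this facet over all documents.
            "filter": {"bool": {"must": others}} if others else {"match_all": {}},
            "aggs": {"values": {"terms": {"field": field}}},
        }
    return {"post_filter": {"bool": {"must": terms_filters()}}, "aggs": aggs}

body = multiselect_body({"tags": ["tag1", "tag2"], "keywords": ["keyword1"]})
print(json.dumps(body, indent=2))
```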
