Search for documents by minimum value of field - elasticsearch

I'm trying to filter products by their price, and I'm completely stumped as to how to proceed.
Hoping someone can shed some light on this, and maybe point me in the right direction.
Concept
Each product has multiple prices.
These prices are valid during a certain date-range.
The actual price of the product at a certain date is the lowest price that is valid on that date.
Goal
I want to be able to:
get the lowest and highest price for a certain date
filter the products by a max/min price on a certain date
caveat: I have simplified the restrictions on the prices for this example, but I'm not able to consolidate the date ranges so that only one price is valid per date range.
Example
Mapping:
curl -XPUT 'http://localhost:9200/price-filter-test'
curl -XPUT 'http://localhost:9200/price-filter-test/_mapping/_doc' -H 'Content-Type: application/json' -d '{
"properties": {
"id": {"type": "integer"},
"name": {"type": "text"},
"prices": {
"type": "nested",
"properties": {
"price": {"type": "integer"},
"from": {"type": "date"},
"untill": {"type": "date"}
}
}
}
}'
Test entries:
curl -XPUT 'http://localhost:9200/price-filter-test/_doc/1' -H 'Content-Type: application/json' -d '{
"id": 1,
"name": "Product A",
"prices": [
{
"price": 10,
"from": "2020-02-01",
"untill": "2020-03-01"
},
{
"price": 8,
"from": "2020-02-20",
"untill": "2020-02-21"
},
{
"price": 12,
"from": "2020-02-22",
"untill": "2020-02-23"
}
]
}'
curl -XPUT 'http://localhost:9200/price-filter-test/_doc/2' -H 'Content-Type: application/json' -d '{
"id": 2,
"name": "Product B",
"prices": [
{
"price": 20,
"from": "2020-02-01",
"untill": "2020-03-01"
},
{
"price": 18,
"from": "2020-02-20",
"untill": "2020-02-21"
},
{
"price": 22,
"from": "2020-02-22",
"untill": "2020-02-23"
}
]
}'
On 2020-02-20 the following prices are valid; the correct price (the lowest valid one) is marked with *:
Product A:
10
8 *
Product B:
20
18 *
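To make the intended semantics concrete, here is a small plain-Python model of the example data. It does not talk to Elasticsearch; it only encodes the rule from the Concept section, assuming both ends of a date range are inclusive (as in the lte/gte range filters used in the queries). The function names are hypothetical.

```python
from datetime import date

# Example data from the question (field name "untill" kept as in the mapping).
PRODUCTS = {
    "Product A": [
        {"price": 10, "from": date(2020, 2, 1), "untill": date(2020, 3, 1)},
        {"price": 8, "from": date(2020, 2, 20), "untill": date(2020, 2, 21)},
        {"price": 12, "from": date(2020, 2, 22), "untill": date(2020, 2, 23)},
    ],
    "Product B": [
        {"price": 20, "from": date(2020, 2, 1), "untill": date(2020, 3, 1)},
        {"price": 18, "from": date(2020, 2, 20), "untill": date(2020, 2, 21)},
        {"price": 22, "from": date(2020, 2, 22), "untill": date(2020, 2, 23)},
    ],
}


def valid_prices(prices, on):
    """Prices whose [from, untill] range contains `on` (both ends inclusive)."""
    return [p["price"] for p in prices if p["from"] <= on <= p["untill"]]


def actual_price(prices, on):
    """The actual price is the lowest valid price, or None if none is valid."""
    valid = valid_prices(prices, on)
    return min(valid) if valid else None


day = date(2020, 2, 20)
for name, prices in PRODUCTS.items():
    print(name, valid_prices(prices, day), "->", actual_price(prices, day))
```

This prints the valid prices [10, 8] and [20, 18] with actual prices 8 and 18, matching the marked values above.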
Solution
Min/Max
I have figured out how to get the min and max values of the applicable prices.
This was pretty doable using aggregations:
curl -XGET 'http://localhost:9200/price-filter-test/_search?pretty=true' -H 'Content-Type: application/json' -d '{
"query": {"match_all": {}},
"size": 0,
"aggs": {
"product_ids": {
"terms": {"field": "id"},
"aggs": {
"nested_prices": {
"nested": {"path": "prices"},
"aggs": {
"applicable_prices": {
"filter": {
"bool": {
"must": [
{"range": {"prices.from": {"lte": "2020-02-20"}}},
{"range": {"prices.untill": {"gte": "2020-02-20"}}}
]
}
},
"aggs": {
"min_price": {
"min": {"field": "prices.price"}
}
}
}
}
}
}
},
"stats_min_prices": {
"stats_bucket": {
"buckets_path": "product_ids>nested_prices>applicable_prices>min_price"
}
}
}
}'
Here I first aggregate over the different ids, to ensure prices are checked per product, then I filter by applicable dates, and then get the min prices for each.
Using the stats_bucket aggregation, I'm then able to get the min and max values of these minimum prices.
{
// ...
"aggregations" : {
// ...
"stats_min_prices" : {
"count" : 2,
"min" : 8.0,
"max" : 18.0,
"avg" : 13.0,
"sum" : 26.0
}
}
}
Here we see the correct min (8 for Product A) and max (18 for Product B).
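The aggregation tree above can be mirrored step by step in plain Python, which helps when checking what stats_bucket is summarizing. This is a local model, not client code; skipping products with no valid price on the date is an assumption about how empty buckets should be treated.

```python
from datetime import date

# (price, from, untill) per product, taken from the example documents.
PRODUCTS = {
    "Product A": [(10, date(2020, 2, 1), date(2020, 3, 1)),
                  (8, date(2020, 2, 20), date(2020, 2, 21)),
                  (12, date(2020, 2, 22), date(2020, 2, 23))],
    "Product B": [(20, date(2020, 2, 1), date(2020, 3, 1)),
                  (18, date(2020, 2, 20), date(2020, 2, 21)),
                  (22, date(2020, 2, 22), date(2020, 2, 23))],
}


def stats_of_min_prices(products, on):
    mins = []
    for prices in products.values():  # terms on "id": one bucket per product
        valid = [p for p, start, end in prices if start <= on <= end]  # date filter
        if valid:  # assumption: products without a valid price are left out
            mins.append(min(valid))  # "min_price" metric
    if not mins:
        return {"count": 0}
    # stats_bucket: summary statistics over the per-product minima
    return {"count": len(mins), "min": min(mins), "max": max(mins),
            "avg": sum(mins) / len(mins), "sum": sum(mins)}


print(stats_of_min_prices(PRODUCTS, date(2020, 2, 20)))
# {'count': 2, 'min': 8, 'max': 18, 'avg': 13.0, 'sum': 26}
```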
Filtering
For filtering, I need to be able to exclude products based on their lowest price.
e.g. if I search for products that cost at least 19 on 2020-02-20, I shouldn't find any, since Product B's lowest price on that date is 18.
curl -X GET "localhost:9200/price-filter-test/_search?pretty" -H 'Content-Type: application/json' -d '{
"query": {
"nested": {
"path": "prices",
"query": {
"bool": {
"must": [
{
"range" : {
"prices.price" : {"gte" : 19}
}
},
{"range": {"prices.from": {"lte": "2020-02-20"}}},
{"range": {"prices.untill": {"gte": "2020-02-20"}}}
]
}
}
}
}
}'
This attempt, however, still returns "Product B" as a match, because one of its prices valid on that date (20) is at least 19. But since 20 is not the lowest price valid on that date, it is not the "correct" price.
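The mismatch is between "any valid price is in range", which is what a nested query expresses (the parent matches as soon as at least one nested document matches), and "the lowest valid price is in range", which is what is wanted. A plain-Python model of the two semantics, using the valid prices listed above, makes the difference visible (function names are hypothetical):

```python
# Prices valid on 2020-02-20 per product, from the example data.
VALID_ON_2020_02_20 = {"Product A": [10, 8], "Product B": [20, 18]}


def nested_range_matches(valid, gte):
    # Semantics of the nested bool query: a product matches as soon as ANY
    # of its valid nested prices satisfies the range.
    return sorted(name for name, prices in valid.items()
                  if any(p >= gte for p in prices))


def lowest_price_matches(valid, gte):
    # Desired semantics: only the LOWEST valid price is compared.
    return sorted(name for name, prices in valid.items()
                  if prices and min(prices) >= gte)


print(nested_range_matches(VALID_ON_2020_02_20, 19))  # ['Product B']
print(lowest_price_matches(VALID_ON_2020_02_20, 19))  # []
```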
I'm completely stumped as to how to do this.
I've thought about using scripted fields, but I think I'd need to combine two of them (one to compute the applicable prices, one to take the lowest of those), and that doesn't appear to be possible.
Hope you can point me in the right direction.

If I understand correctly, you are looking for inner_hits:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-inner-hits.html
I wasn't sure about the aggregation part (you can't use inner_hits inside an aggregation), which is why I didn't post this at first.
Hope it's what you need.
{
"query": {
"nested": {
"path": "prices",
"query": {
"range": {
"prices.price": {
"gte": 10,
"lte": 20
}
}
},
"inner_hits": {}
}
}
}
This keeps only the nested documents matching the range in the inner_hits part (the excerpt below is for Product B):
"inner_hits":{
"prices":{
"hits":{
"total":2,
"max_score":1,
"hits":[
{
"_nested":{
"field":"prices",
"offset":1
},
"_score":1,
"_source":{
"price":18,
"from":"2020-02-20",
"untill":"2020-02-21"
}
},
{
"_nested":{
"field":"prices",
"offset":0
},
"_score":1,
"_source":{
"price":20,
"from":"2020-02-01",
"untill":"2020-03-01"
}
}
]
}
}
}
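What inner_hits reports can be modelled in plain Python: it lists the nested documents that matched, identified by their offset in the array, while the parent itself is still returned whenever at least one nested document matches. So in this approach, picking the lowest valid price out of the inner hits is left to the caller. The model below is a sketch with hypothetical function names; it returns hits in array order, whereas the response above happens to list offset 1 first.

```python
# Product B's nested "prices" in array order; the list index plays the
# role of _nested.offset in the response.
PRODUCT_B_PRICES = [
    {"price": 20, "from": "2020-02-01", "untill": "2020-03-01"},
    {"price": 18, "from": "2020-02-20", "untill": "2020-02-21"},
    {"price": 22, "from": "2020-02-22", "untill": "2020-02-23"},
]


def inner_hits(nested_docs, gte, lte):
    """(offset, _source) of every nested doc matching the range query."""
    return [(offset, doc) for offset, doc in enumerate(nested_docs)
            if gte <= doc["price"] <= lte]


def parent_matches(nested_docs, gte, lte):
    # The nested query returns the parent if at least one nested doc matches.
    return len(inner_hits(nested_docs, gte, lte)) > 0


print([(offset, doc["price"]) for offset, doc in inner_hits(PRODUCT_B_PRICES, 10, 20)])
# [(0, 20), (1, 18)]
```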

Related

Group and count by array of objects' keys

Given the following index definition and query:
curl -XDELETE "localhost:9200/products"
curl -XPUT "localhost:9200/products"
curl -XPUT "localhost:9200/products/_mapping" -H 'Content-Type: application/json' -d'
{
"properties": {
"opinions": {
"type": "nested",
"properties": {
"topic": {"type": "keyword"},
"count": {"type": "long"}
},
"include_in_parent": true
}
}
}'
curl -X POST "localhost:9200/products/_bulk" -H 'Content-Type: application/json' -d'
{"index":{"_id":1}}
{"opinions":[{"topic": "room", "count": 2}, {"topic": "kitchen", "count": 1}]}
{"index":{"_id":2}}
{"opinions":[{"topic": "room", "count": 1}, {"topic": "restroom", "count": 1}]}
'
sleep 2
curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"per_topic": {
"terms": {"field": "opinions.topic"},
"aggs": {
"counts": {
"sum": {"field": "opinions.count"}
}
}
}
}
}
'
Produces the result:
"aggregations" : {
"per_topic" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "room",
"doc_count" : 2,
"counts" : {
"value" : 5.0
}
},
{
"key" : "kitchen",
"doc_count" : 1,
"counts" : {
"value" : 3.0
}
},
{
"key" : "restroom",
"doc_count" : 1,
"counts" : {
"value" : 2.0
}
}
]
}
}
}
I'm expecting the sum for room to be 3, kitchen to be 1 and restroom to be 1, counting only the related nested documents, but instead it sums all the nested count fields in all the matched documents.
How can I sum only the matched aggregated nested documents?
UPDATE: solution based on comments
curl -X POST "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
"size": 0,
"aggs": {
"opinions": {
"nested": {"path": "opinions"},
"aggs": {
"per_topic": {
"terms": {"field": "opinions.topic"},
"aggs": {
"counts": {
"sum": {"field": "opinions.count"}
}
}
}
}
}
}
}
'
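The main initial problem was the use of object fields instead of nested fields: only nested fields preserve the structure [{"room", 2}, {"kitchen", 1}], because in object fields the data is flattened to {["room", "kitchen"], [1, 2]}, with no relationship between "room" and 2. Note that with "include_in_parent": true the nested fields are also copied into the root document as flattened fields, so a top-level (non-nested) aggregation on opinions.topic still sees the flattened data; wrapping it in a nested aggregation, as above, avoids this.
Unfortunately, at the moment it is not possible to use the SQL API to group by (some?) nested fields, but you can write a native Elasticsearch query using nested aggregations.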
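The two results in this question can be reproduced with a small plain-Python model of the flattened and nested views. This is not Elasticsearch code; it assumes, as the numbers in the first response indicate, that the flattened numeric field keeps duplicate values (doc 2 contributes 1 + 1 = 2).

```python
from collections import defaultdict

DOCS = [
    {"opinions": [{"topic": "room", "count": 2}, {"topic": "kitchen", "count": 1}]},
    {"opinions": [{"topic": "room", "count": 1}, {"topic": "restroom", "count": 1}]},
]


def flattened_sums(docs):
    # Flattened view: each doc only knows its set of topics and its list of
    # counts, so every topic bucket gets the sum of ALL counts in the doc.
    sums = defaultdict(int)
    for doc in docs:
        topics = {o["topic"] for o in doc["opinions"]}
        total = sum(o["count"] for o in doc["opinions"])
        for topic in topics:
            sums[topic] += total
    return dict(sums)


def nested_sums(docs):
    # Nested view: each opinion is its own hidden document, so topic and
    # count stay paired.
    sums = defaultdict(int)
    for doc in docs:
        for o in doc["opinions"]:
            sums[o["topic"]] += o["count"]
    return dict(sums)


print(flattened_sums(DOCS))  # room 5, kitchen 3, restroom 2
print(nested_sums(DOCS))     # room 3, kitchen 1, restroom 1
```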

Difference of two query results in Elasticsearch

Let's say we have an index of e-commerce store data, and we want the difference between the lists of products present in two stores.
Each document in the index looks like this:
{
"product_name": "sample 1",
"store_slug": "store 1",
"sales_count": 42,
"date": "2018-04-04"
}
Below are the queries that get all products in each of the two stores individually:
Data for store 1
curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d'
{
"_source": ["product_name"],
"query": {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{ "term" : { "store_slug" : "store_1"}}]}}}}}'
Data for store 2
curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d'
{
"_source": ["product_name"],
"query": {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{ "term" : { "store_slug" : "store_2"}}]}}}}}'
Is it possible to get the difference of both results with an Elasticsearch query alone (without using a script or another language)?
For example, if "store 1" sells ["product 1", "product 2"] and "store 2" sells ["product 1", "product 3"], the expected difference of the products of "store 1" and "store 2" is "product 2".
Why not do it in a single query?
Products that are in store 1 but not in store 2:
curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d '{
"_source": [
"product_name"
],
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"term": {
"store_slug": "store_1"
}
}
],
"must_not": [
{
"term": {
"store_slug": "store_2"
}
}
]
}
}
}
}
}'
You can easily do the opposite, too.
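Note that the bool query above is evaluated one document at a time. If, as in the sample document, each document carries a single store_slug, the must_not clause can never exclude a document that the store_1 filter kept, so the query returns every store_1 product rather than the difference. A plain-Python model with hypothetical one-document-per-(product, store) data for the example shows this:

```python
# Hypothetical documents: one per (product_name, store_slug) pair, following
# the example: store 1 sells products 1 and 2, store 2 sells products 1 and 3.
DOCS = [
    {"product_name": "product 1", "store_slug": "store_1"},
    {"product_name": "product 2", "store_slug": "store_1"},
    {"product_name": "product 1", "store_slug": "store_2"},
    {"product_name": "product 3", "store_slug": "store_2"},
]


def bool_filter_must_not(docs):
    # filter: store_slug == store_1, must_not: store_slug == store_2,
    # evaluated independently for each document.
    return sorted(d["product_name"] for d in docs
                  if d["store_slug"] == "store_1" and d["store_slug"] != "store_2")


def store_difference(docs, a, b):
    # The intended result: products sold in store a but not in store b.
    in_a = {d["product_name"] for d in docs if d["store_slug"] == a}
    in_b = {d["product_name"] for d in docs if d["store_slug"] == b}
    return sorted(in_a - in_b)


print(bool_filter_must_not(DOCS))                    # ['product 1', 'product 2']
print(store_difference(DOCS, "store_1", "store_2"))  # ['product 2']
```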
UPDATE
After reading your updates, I think the best way to solve this is to use terms aggregations, first by product and then by store, and to keep only the products that have a single store bucket (using a bucket_selector pipeline aggregation).
curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d '{
"size": 0,
"aggs": {
"products": {
"terms": {
"field": "product_name"
},
"aggs": {
"stores": {
"terms": {
"field": "store_slug"
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"count": "stores._bucket_count"
},
"script": {
"source": "params.count == 1"
}
}
}
}
}
}
}'
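The selector keeps products that appear in exactly one store. When only the two stores are involved, that is the symmetric difference of their product sets, so to get "store 1 minus store 2" you still have to check which store bucket a kept product has. A plain-Python model of the aggregation, using hypothetical documents for the example in the question:

```python
from collections import defaultdict

# Hypothetical documents: one per (product_name, store_slug) pair.
DOCS = [
    {"product_name": "product 1", "store_slug": "store_1"},
    {"product_name": "product 2", "store_slug": "store_1"},
    {"product_name": "product 1", "store_slug": "store_2"},
    {"product_name": "product 3", "store_slug": "store_2"},
]


def single_store_products(docs):
    # terms on product_name, then terms on store_slug inside each product bucket
    stores = defaultdict(set)
    for d in docs:
        stores[d["product_name"]].add(d["store_slug"])
    # bucket_selector: params.count == 1 on stores._bucket_count
    return {p: s for p, s in stores.items() if len(s) == 1}


def only_in(docs, store):
    # Direction of the difference: keep products whose single store is `store`.
    return sorted(p for p, s in single_store_products(docs).items() if s == {store})


print(sorted(single_store_products(DOCS)))  # ['product 2', 'product 3']
print(only_in(DOCS, "store_1"))             # ['product 2']
```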

How to get the number of hits of several matching fields in one record?

I have records similar to
{
"who": "John",
"hobby": [
{"name": "gardening",
"skills": 2
},
{"name": "sleeping",
"skills": 3
},
{"name": "darts",
"skills": 2
}
]
}
,
{
"who": "Mary",
"hobby": [
{"name": "gardening",
"skills": 2
},
{"name": "volleyball",
"skills": 3
},
{"name": "kung-fu",
"skills": 2
}
]
}
I am looking at building a query which would answer the question: "how many hobbies with skills=2 do we have?"
The answer for the example above would be 3 ("gardening" is common to both, and each has another unique one).
Every "query" or "query"+"aggs" I tried returns in ['hits']['hits'] or ['aggregations']['sources']['buckets'] the number of matching documents, that is two in the case above (one for "John" and one for "Mary", each of them satisfying the query).
Is there a way to build a query so that it returns the total number of fields (in the example above: the elements of the list "hobby") which matched that query? (fields, not documents)
Note: If my documents were flat:
{"who": "John", "name": "gardening", "skills": 2},
{"who": "John", "name": "sleeping", "skills": 3},
(...)
{"who": "Mary", "name": "kung-fu", "skills": 2}
then a simple "query" to match "skills": 2 plus an aggregation on "name" would have done the work.
Yes, you can achieve this with the nested type and using inner_hits and/or nested aggregations.
So here is the mapping you should use (note that the hobby list is of type nested):
curl -XPUT localhost:9200/hobbies -d '{
"mappings": {
"hob": {
"properties": {
"who": {
"type": "string"
},
"hobby": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"skills": {
"type": "integer"
}
}
}
}
}
}
}
Then we can insert your two sample documents using the _bulk endpoint like this:
curl -XPOST localhost:9200/hobbies/hob/_bulk -d '
{"index":{}}
{"who":"John", "hobby":[{"name": "gardening","skills": 2},{"name": "sleeping","skills": 3},{"name": "darts","skills": 2}]}
{"index":{}}
{"who":"Mary", "hobby":[{"name": "gardening","skills": 2},{"name": "volley-ball","skills": 3},{"name": "kung-fu","skills": 2}]}
'
And finally, we can query your index for how many hobbies have skills: 2 like this:
curl -XPOST localhost:9200/hobbies/hob/_search -d '{
"_source": false,
"query": {
"nested": {
"path": "hobby",
"query": {
"term": {
"hobby.skills": 2
}
},
"inner_hits": {}
}
},
"aggs": {
"hobbies": {
"nested": {
"path": "hobby"
},
"aggs": {
"skills": {
"filter": {
"term": {
"hobby.skills": 2
}
},
"aggs": {
"by_field": {
"terms": {
"field": "hobby.name"
}
}
}
}
}
}
}
}'
This query returns:
In the inner_hits part of each hit, only the matching nested hobbies: four in total across both documents, each with skills: 2
In the aggs part, a breakdown of the 3 distinct hobbies that have skills: 2
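The difference between the document count the question kept getting (2) and the nested counts the answer describes (4 matching entries, 3 distinct names) can be checked with a plain-Python model of the sample records (a sketch; the function name is hypothetical):

```python
from collections import Counter

PEOPLE = [
    {"who": "John", "hobby": [{"name": "gardening", "skills": 2},
                              {"name": "sleeping", "skills": 3},
                              {"name": "darts", "skills": 2}]},
    {"who": "Mary", "hobby": [{"name": "gardening", "skills": 2},
                              {"name": "volleyball", "skills": 3},
                              {"name": "kung-fu", "skills": 2}]},
]


def count_hobbies(people, skills):
    # Every nested hobby entry with the given skills (what inner_hits lists).
    matching = [h["name"] for person in people for h in person["hobby"]
                if h["skills"] == skills]
    # Top-level documents with at least one such entry (what hits.total counts).
    matched_docs = sum(1 for person in people
                       if any(h["skills"] == skills for h in person["hobby"]))
    # Counter by name plays the role of the terms aggregation on hobby.name.
    return matched_docs, len(matching), Counter(matching)


docs_hit, nested_hits, by_name = count_hobbies(PEOPLE, 2)
print(docs_hit, nested_hits, len(by_name))  # 2 4 3
```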

Elasticsearch multi-select facet functionality with child aggregation

Given the following data:
curl -XPUT 'http://localhost:9200/products/'
curl -XPOST 'http://localhost:9200/products/product/_mapping' -d '{
"product": {
"_parent": {"type": "product_group"}
}
}'
curl -XPUT 'http://localhost:9200/products/product_group/1' -d '{
"title": "Product 1"
}'
curl -XPOST localhost:9200/products/product/1?parent=1 -d '{
"height": 190,
"width": 120
}'
curl -XPOST localhost:9200/products/product/2?parent=1 -d '{
"height": 120,
"width": 100
}'
curl -XPOST localhost:9200/products/product/3?parent=1 -d '{
"height": 110,
"width": 120
}'
Child aggregation on product results in the following facets:
Height
110 (1)
120 (1)
190 (1)
Width
120 (2)
100 (1)
If I now filter on height 190, what I would like is to have the height aggregation excluded from the filter so the results would be:
Height
110 (1)
120 (1)
190 (1)
Width
120 (1)
This is solvable with a filter aggregation, but I'm not sure whether it works, or what the syntax is, when using parent-child relations.
See http://distinctplace.com/2014/07/29/build-zappos-like-products-facets-with-elasticsearch/
What I've tried so far:
curl -XGET 'http://localhost:9200/products/product_group/_search?pretty=true' -d '{
"filter": {
"has_child": {
"type": "product",
"filter": {
"term": {"height": 190}
},
"inner_hits": {}
}
},
"aggs": {
"to-products": {
"children": {"type": "product"},
"aggs": {
"height": {
"filter": {"match_all": {}},
"aggs": {
"height": {
"terms": {"field": "height", "size": 10}
}
}
},
"width": {
"filter": {
"and": [{"terms": { "height": [190]}}]
},
"aggs": {
"width": {
"terms": {"field": "width", "size": 10}
}
}
}
}
}
}
}
'
I don't fully understand your question, but if you want multiple aggregations inside a children aggregation, you have to prefix every field in the aggregation with the child type name (here product.height and product.width).
Here is the modified query:
curl -XPOST "http://localhost:9200/products/product_group/_search?pretty=true" -d'
{
"size": 0,
"filter": {
"has_child": {
"type": "product",
"filter": {
"term": {
"product.height": 190
}
},
"inner_hits": {}
}
},
"aggs": {
"to-products": {
"children": {
"type": "product"
},
"aggs": {
"height": {
"filter": {
"match_all": {}
},
"aggs": {
"height": {
"terms": {
"field": "product.height",
"size": 10
}
}
}
},
"width": {
"filter": {
"and": [
{
"terms": {
"product.height": [
190
]
}
}
]
},
"aggs": {
"width": {
"terms": {
"field": "product.width",
"size": 10
}
}
}
}
}
}
}
}'
This isn't mentioned anywhere in the documentation, which confuses many users. I guess the children aggregation is treated like the nested aggregation, so fields have to be addressed the same way.
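The multi-select facet behaviour the question asks for can be stated as: each facet is computed with every active filter except its own. A plain-Python model with the three child products from the question (a sketch covering only the height filter; the function name is hypothetical):

```python
from collections import Counter

# Child "product" documents of product_group 1 from the question.
PRODUCTS = [
    {"height": 190, "width": 120},
    {"height": 120, "width": 100},
    {"height": 110, "width": 120},
]


def multi_select_facets(products, selected_heights):
    def height_ok(p):
        return not selected_heights or p["height"] in selected_heights

    # Height facet ignores its own (height) filter, so all options stay selectable.
    height = Counter(p["height"] for p in products)
    # Width facet is computed only on products that pass the height filter.
    width = Counter(p["width"] for p in products if height_ok(p))
    return dict(height), dict(width)


print(multi_select_facets(PRODUCTS, {190}))
# ({190: 1, 120: 1, 110: 1}, {120: 1})
```

In the query above, the height facet's match_all filter and the width facet's height filter play these two roles.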

Sort an Elasticsearch result set based on a filter term

For an e-commerce site I am implementing Elasticsearch to get a sorted and paginated result set of product IDs for a category.
I have a product document which looks like this:
PUT /products_test/product/1
{
"id": "1",
"title": "foobar",
"sort": 102,
"categories": [
"28554568",
"28554577",
"28554578"
]
}
To get the result set I filter and sort like this:
POST /products/_search
{
"filter": {
"term": {
"categories": "28554666"
}
},
"sort" : [
{ "sort" : {"order" : "asc"}}
]
}
However, I have now learned that the product sort order has to depend on the category. In the example above, this means I need to store a different sort value for each value in the categories array, and sort by the value that belongs to the category I filter by.
The document should look something like this:
PUT /products_test/product/1
{
"id": "1",
"title": "foobar",
"categories": [
{ "id": "28554568", "sort": "102" },
{ "id": "28554577", "sort": "482" },
{ "id": "28554578", "sort": "2" }
]
}
My query now should be able to sort something like this:
POST /products/_search
{
"filter": {
"term": {
"categories.id": "28554666"
}
},
"sort" : [
{ "categories.{filtered_category_id}.sort" : {"order" : "asc"}}
]
}
Is it somehow possible to accomplish this?
To achieve this, you will have to store your categories as nested documents. If not, Elasticsearch will not know what sort is associated with what category ID.
Then you have to sort on the nested field, using a nested filter to pick the entry for the category you filtered on.
Here's a runnable example you can play with: https://www.found.no/play/gist/47282a07414e1432de6d
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"mappings": {
"type": {
"properties": {
"categories": {
"type": "nested"
}
}
}
}
}'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"id":1,"title":"foobar","categories":[{"id":"28554568","sort":102},{"id":"28554577","sort":482},{"id":"28554578","sort":2}]}
{"index":{"_index":"play","_type":"type"}}
{"id":2,"title":"barbaz","categories":[{"id":"28554577","sort":0}]}
'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"nested": {
"path": "categories",
"query": {
"term": {
"categories.id": {
"value": 28554577
}
}
}
}
},
"sort": {
"categories.sort": {
"order": "asc",
"nested_filter": {
"term": {
"categories.id": 28554577
}
}
}
}
}
'
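The effect of sorting on categories.sort with a nested filter can be modelled in plain Python: only nested entries whose id matches the filter contribute a sort value, and for ascending order the smallest matching value is used (Elasticsearch's default sort mode for asc is min). This is a sketch over the two example documents, not client code; the function name is hypothetical.

```python
DOCS = [
    {"id": 1, "title": "foobar", "categories": [{"id": "28554568", "sort": 102},
                                                {"id": "28554577", "sort": 482},
                                                {"id": "28554578", "sort": 2}]},
    {"id": 2, "title": "barbaz", "categories": [{"id": "28554577", "sort": 0}]},
]


def sort_by_category(docs, category_id):
    # nested query: keep documents that have the category at all
    hits = [d for d in docs if any(c["id"] == category_id for c in d["categories"])]

    def key(doc):
        # nested filter: only matching entries contribute; asc uses the minimum
        return min(c["sort"] for c in doc["categories"] if c["id"] == category_id)

    return [d["id"] for d in sorted(hits, key=key)]


print(sort_by_category(DOCS, "28554577"))  # [2, 1]
```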
