Elasticsearch group and order by nested field's min value - sorting

I've got a structure of products which are available in different stores with different prices:
[{
"name": "SomeProduct",
"store_prices": [
{
"store": "FooStore1",
"price": 123.45
},
{
"store": "FooStore2",
"price": 345.67
}
]
},{
"name": "OtherProduct",
"store_prices": [
{
"store": "FooStore1",
"price": 456.78
},
{
"store": "FooStore2",
"price": 234.56
}
]
}]
I want to show a list of products, ordered by the lowest price ascending, limited to 10 results, in this way:
SomeProduct: 123.45 USD
OtherProduct: 234.56 USD
How to do this? I've tried the nested aggregation approach described in https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html but it only returns the min price of all products, not the respective min price for each product:
{
"_source": [
"name",
"store_prices.price"
],
"query": {
"match_all": {}
},
"sort": {
"store_prices.price": "asc"
},
"aggs": {
"stores": {
"nested": {
"path": "store_prices"
},
"aggs": {
"min_price": {"min": {"field": "store_prices.price"}}
}
}
},
"from": 0,
"size": 10
}
In SQL, what I want could be expressed with the following query. I'm afraid I'm thinking too much "in SQL":
SELECT
p.name,
MIN(s.price) AS price
FROM
products p
INNER JOIN
store_prices s ON s.product_id = p.id
GROUP BY
p.id
ORDER BY
price ASC
LIMIT 10

You need a nested sort:
{
"query": // HERE YOUR QUERY,
"sort": {
"store_prices.price": {
"order" : "asc",
"nested_path" : "store_prices",
"nested_filter": {
// HERE THE FILTERS WHICH ARE EVENTUALLY
// FILTERING OUT SOME OF YOUR STORES
}
}
}
}
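Note that any nested query conditions have to be repeated inside nested_filter; otherwise prices from stores that your query excludes still take part in the sort. The price used for sorting is returned in the sort field of each hit, not in the score. On Elasticsearch 6.1 and later, nested_path and nested_filter are deprecated in favor of a nested object with path and filter options.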
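To make the behaviour concrete, here is a minimal Python sketch that mimics, on the client side, what an ascending nested sort does with the sample data from the question. The function name cheapest_first and the store_filter parameter are made up for this illustration (store_filter plays the role of nested_filter); nothing here calls Elasticsearch.

```python
# Sample documents copied from the question.
products = [
    {"name": "SomeProduct", "store_prices": [
        {"store": "FooStore1", "price": 123.45},
        {"store": "FooStore2", "price": 345.67}]},
    {"name": "OtherProduct", "store_prices": [
        {"store": "FooStore1", "price": 456.78},
        {"store": "FooStore2", "price": 234.56}]},
]


def cheapest_first(docs, limit=10, store_filter=None):
    """Order docs by their lowest nested price (hypothetical helper).

    store_filter stands in for nested_filter: only nested entries that
    pass it contribute a sort value.
    """
    rows = []
    for doc in docs:
        prices = [sp["price"] for sp in doc["store_prices"]
                  if store_filter is None or store_filter(sp)]
        # Simplification: a doc without any matching nested entry is skipped.
        if prices:
            rows.append((doc["name"], min(prices)))
    rows.sort(key=lambda row: row[1])
    return rows[:limit]


result = cheapest_first(products)
for name, price in result:
    print(f"{name}: {price} USD")
```

Passing store_filter=lambda sp: sp["store"] == "FooStore2" shows why the filter matters: only FooStore2 prices are considered, so OtherProduct moves to the top.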

Related

How can we sort records by a specific value of a field in Elasticsearch

We want to sort records by specific values of a field. For example:
We have data with a country code, a name, and other details, and we want records with country code 'US' at the top, followed by records with country code 'AR'.
So if we search for obama, all obama records from US should come first, followed by the obama records from AR. Within each country, we also want to sort the records by a rating score.
I tried a filter query with boost, but I am not getting the expected data: the filter only restricts the results to the matching records, whereas we want to order the records based on a boost for specific values of the country field.
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"match_phrase_prefix": {
"name": {
"query": "obama"
}
}
}
],
"boost": 2.0
}
}
],
"filter": {
"bool": {
"should": [
{
"term": {
"countryCode": {
"value": "US",
"boost": 4
}
}
},
{
"term": {
"countryCode": {
"value": "AR",
"boost": 3
}
}
},
{
"term": {
"countryCode": {
"value": "ES",
"boost": 2
}
}
}
]
}
}
}
},
"size": 50,
"sort": [
{
"rating": {
"order": "desc"
}
},
{
"_score": {
"order": "desc"
}
}
]
}
Expectation:
All records for country US should come first, sorted by rating.
All records for country AR should come after the US records, sorted by rating.
All records for country ES should come after the AR records, sorted by rating.
Expected example:
[
{"name": "obama a", "countryCode": "us", "rating": 5},
{"name": "obama b", "countryCode": "us", "rating": 4},
{"name": "obama ac", "countryCode": "ar", "rating": 3},
{"name": "obama ess", "countryCode": "es", "rating": 3.5}
]
If you want to tune the score without dropping documents, use should instead of filter. The bool query documentation describes the clause types:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
must
The clause (query) must appear in matching documents and will
contribute to the score.
filter
The clause (query) must appear in matching documents. However unlike
must the score of the query will be ignored. Filter clauses are
executed in filter context, meaning that scoring is ignored and
clauses are considered for caching.
should
The clause (query) should appear in the matching document.
must_not
The clause (query) must not appear in the matching documents. Clauses
are executed in filter context meaning that scoring is ignored and
clauses are considered for caching. Because scoring is ignored, a
score of 0 for all documents is returned.
Here is an example:
POST test_stackoverflow_us/_bulk?refresh=true&pretty
{ "index": {}}
{"name":"obama a", "countryCode":"us", "rating":5}
{ "index": {}}
{"name":"obama b", "countryCode":"us", "rating":4}
{ "index": {}}
{"name":"obama ac", "countryCode":"ar", "rating":3}
{ "index": {}}
{"name":"obama ess", "countryCode":"es", "rating":3.5}
GET test_stackoverflow_us/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"match_phrase_prefix": {
"name": {
"query": "obama"
}
}
}
],
"boost": 2
}
}
],
"should": [
{
"term": {
"countryCode": {
"value": "us",
"boost": 4
}
}
},
{
"term": {
"countryCode": {
"value": "ar",
"boost": 3
}
}
},
{
"term": {
"countryCode": {
"value": "es",
"boost": 2
}
}
}
]
}
},
"size": 50,
"sort": [
{
"rating": {
"order": "desc"
}
},
{
"_score": {
"order": "desc"
}
}
]
}
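One thing to keep in mind is that the sort array is applied from left to right, so whichever key comes first dominates and the next key only breaks ties. The Python sketch below is a rough model of that, using the four example documents. The country_boost table is a hypothetical stand-in for _score that only models the should boosts; a real _score also includes the relevance of the name match, so treat this as an illustration rather than a prediction of actual scores.

```python
docs = [
    {"name": "obama a", "countryCode": "us", "rating": 5},
    {"name": "obama b", "countryCode": "us", "rating": 4},
    {"name": "obama ac", "countryCode": "ar", "rating": 3},
    {"name": "obama ess", "countryCode": "es", "rating": 3.5},
]

# Hypothetical stand-in for _score: only the should-clause boosts are modelled.
country_boost = {"us": 4, "ar": 3, "es": 2}


def boost(doc):
    return country_boost.get(doc["countryCode"], 0)


def names(sorted_docs):
    return [d["name"] for d in sorted_docs]


# rating first, score second: the score only breaks ties between equal ratings.
rating_first = names(sorted(docs, key=lambda d: (-d["rating"], -boost(d))))

# score first, rating second: documents are grouped by country, then by rating.
score_first = names(sorted(docs, key=lambda d: (-boost(d), -d["rating"])))

print(rating_first)
print(score_first)
```

In this model, only the second ordering puts "obama ac" (AR) before "obama ess" (ES), which is the order the question asks for.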

Find all documents with max value in field

I am new to Elasticsearch; it is my first NoSQL database, and I have a problem.
I have something like this:
index_logs: [
{
"entity_id": "id_1",
"field_name": "name",
"old_value": null,
"new_value": "some_name",
"number": 1
},
{
"entity_id": "id_1",
"field_name": "description",
"old_value": null,
"new_value": "some_descr",
"number": 1
},
{
"entity_id": "id_1",
"field_name": "description",
"old_value": "some_descr",
"new_value": null,
"number": 2
},
{
"entity_id": "id_2",
"field_name": "enabled",
"old_value": true,
"new_value": false,
"number": 25
}
]
I need to find all documents with a specified entity_id and the maximum number, which I do not know in advance.
In PostgreSQL, it would be the following query:
SELECT *
FROM logs AS x
WHERE x.entity_id = <some_entity_id>
AND x.number = (SELECT MAX(y.number) FROM logs AS y WHERE y.entity_id = x.entity_id)
How can I do it in elasticsearch?
Use the following query:
{
"query": {
"term": {
"entity_id": {
"value": "1"
}
}
},
"aggs": {
"max_number": {
"terms": {
"field": "number",
"size": 1,
"order": {
"_key": "desc"
}
},
"aggs": {
"top_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
The terms aggregation groups the documents by number, orders the buckets by key in descending order, and keeps only the first bucket (size: 1), i.e. the one with the highest number. The top_hits sub-aggregation then returns up to 10 documents with that number.
Another way to get all the documents is to run the same query without any aggregations and sort on the number field in descending order. On the client side, paginate with search_after until the number field changes. As soon as it changes, exit the pagination loop: you then have all the records with the given entity ID and the maximum number.
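The pagination loop could look roughly like the Python sketch below. It runs against an in-memory list instead of a cluster: fetch_page is a hypothetical stand-in for one search request sorted by number descending plus a unique tiebreaker (here the position in the list), and its search_after argument behaves like the sort values of the last hit of the previous page. The sample data is a trimmed version of the logs in the question.

```python
logs = [
    {"entity_id": "id_1", "field_name": "name", "number": 1},
    {"entity_id": "id_1", "field_name": "description", "number": 1},
    {"entity_id": "id_1", "field_name": "description", "number": 2},
    {"entity_id": "id_2", "field_name": "enabled", "number": 25},
]


def fetch_page(entity_id, page_size, search_after=None):
    """Hypothetical stand-in for one search request.

    Returns (sort_key, doc) pairs sorted by number descending, with the
    list position as a tiebreaker so every sort key is unique.
    """
    keys = sorted((-doc["number"], pos) for pos, doc in enumerate(logs)
                  if doc["entity_id"] == entity_id)
    if search_after is not None:
        keys = [key for key in keys if key > search_after]
    return [(key, logs[key[1]]) for key in keys[:page_size]]


def docs_with_max_number(entity_id, page_size=2):
    """Collect hits page by page until the number field changes."""
    result, max_number, cursor = [], None, None
    while True:
        page = fetch_page(entity_id, page_size, cursor)
        if not page:
            return result  # ran out of hits
        for key, doc in page:
            if max_number is None:
                max_number = doc["number"]  # first hit carries the maximum
            elif doc["number"] != max_number:
                return result  # number changed: stop paginating
            result.append(doc)
        cursor = page[-1][0]  # sort key of the last hit, as with search_after


print(docs_with_max_number("id_1", page_size=1))
```

The tiebreaker matters: without a unique sort key, search_after can skip or repeat documents that share the same number.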

Sort multi-bucket aggregation by source fields inside inner multi-bucket aggregation

TL;DR: Using an inner multi-bucket aggregation (top_hits with size: 1) inside an outer multi-bucket aggregation, is it possible to sort the buckets of the outer aggregation by the data in the inner buckets?
I have the following index mappings
{
"parent": {
"properties": {
"children": {
"type": "nested",
"properties": {
"child_id": { "type": "keyword" }
}
}
}
}
}
and each child (in data) has also the properties last_modified: Date and other_property: String.
I need to fetch a list of children (of all the parents but without the parents), but only the one with the latest last_modified per each child_id. Then I need to sort and paginate those results to return manageable amounts of data.
I'm able to get the data and paginate over it with a combination of nested, terms, top_hits, and bucket_sort aggregations (and also get the total count with cardinality)
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"children": {
"nested": {
"path": "children"
},
"aggs": {
"totalCount": {
"cardinality": {
"field": "children.child_id"
}
},
"oneChildPerId": {
"terms": {
"field": "children.child_id",
"order": { "_term": "asc" },
"size": 1000000
},
"aggs": {
"lastModified": {
"top_hits": {
"_source": [
"children.other_property"
],
"sort": {
"children.last_modified": {
"order": "desc"
}
},
"size": 1
}
},
"paginate": {
"bucket_sort": {
"from": 36,
"size": 3
}
}
}
}
}
}
}
}
but after more than a solid day of going through the docs and experimenting, I seem to be no closer to figuring out, how to sort the buckets of my oneChildPerId aggregation by the other_property of that single child retrieved by lastModified aggregation.
Is there a way to sort a multi-bucket aggregation by results in a nested multi-bucket aggregation?
What I've tried:
I thought I could use bucket_sort for that too, but apparently its sort only accepts paths that pass through single-bucket aggregations and end in a metric one.
I've tried to find a way to transform the one-result multi-bucket of lastModified into a single bucket, but haven't found any.
I'm using Elasticsearch 6.8.6 (bucket_sort and similar tools weren't available in ES 5.x and older).
I had the same problem: I needed a terms aggregation with a nested top_hits and wanted to sort by a specific field inside the nested aggregation.
I'm not sure how well my solution performs, but the desired behaviour can be achieved with a single-value metric aggregation at the same level as the top_hits. You can then sort by this new aggregation using the order field of the terms aggregation.
Here is an example:
POST books/_doc
{ "genre": "action", "title": "bookA", "pages": 200 }
POST books/_doc
{ "genre": "action", "title": "bookB", "pages": 35 }
POST books/_doc
{ "genre": "action", "title": "bookC", "pages": 170 }
POST books/_doc
{ "genre": "comedy", "title": "bookD", "pages": 80 }
POST books/_doc
{ "genre": "comedy", "title": "bookE", "pages": 90 }
GET books/_search
{
"size": 0,
"aggs": {
"by_genre": {
"terms": {
"field": "genre.keyword",
"order": {"max_pages": "asc"}
},
"aggs": {
"top_book": {
"top_hits": {
"size": 1,
"sort": [{"pages": {"order": "desc"}}]
}
},
"max_pages": {"max": {"field": "pages"}}
}
}
}
}
by_genre has an order field that sorts by a sub-aggregation called max_pages. max_pages was added only for this purpose: it produces a single-value metric that order can sort by.
Query above returns (I've shortened the output for clarity):
{ "genre" : "comedy", "title" : "bookE", "pages" : 90 }
{ "genre" : "action", "title" : "bookA", "pages" : 200 }
If you change "order": {"max_pages": "asc"} to "order": {"max_pages": "desc"}, the output becomes:
{ "genre" : "action", "title" : "bookA", "pages" : 200 }
{ "genre" : "comedy", "title" : "bookE", "pages" : 90 }
The type of the max_pages aggregation can be changed as needed, as long as it is a single-value metric aggregation (e.g. sum, avg).
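For readers who find it easier to reason about this outside the query DSL, here is a small Python sketch of the same idea applied to the five example books: group by genre, compute one metric per bucket, keep one top document per bucket, and order the buckets by the metric. The function name by_genre and the bucket layout are made up for this illustration and do not mirror the real response format.

```python
from collections import defaultdict

books = [
    {"genre": "action", "title": "bookA", "pages": 200},
    {"genre": "action", "title": "bookB", "pages": 35},
    {"genre": "action", "title": "bookC", "pages": 170},
    {"genre": "comedy", "title": "bookD", "pages": 80},
    {"genre": "comedy", "title": "bookE", "pages": 90},
]


def by_genre(docs, order="asc"):
    """Group docs by genre and order the groups by their max page count."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["genre"]].append(doc)

    buckets = []
    for genre, members in groups.items():
        buckets.append({
            "key": genre,
            # Single-value metric used only for ordering (like max_pages).
            "max_pages": max(d["pages"] for d in members),
            # One document per bucket (like top_hits with size 1).
            "top_book": max(members, key=lambda d: d["pages"]),
        })
    buckets.sort(key=lambda b: b["max_pages"], reverse=(order == "desc"))
    return buckets


for bucket in by_genre(books):
    print(bucket["key"], bucket["top_book"]["title"], bucket["max_pages"])
```

The metric and the top_hits sort agree here because both use pages; if the top document is chosen by a different field than the one the metric is computed on, the bucket order no longer reflects the document shown.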

Group by terms and get count of nested array property?

I would like to get the count from a document series where an array item matches some value.
I have documents like these:
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED",
"Timer": 10
},{
"State": "PENDING",
"Timer": 5
}]
}
{
"Name": "jason",
"Todos": [{
"State": "COMPLETED",
"Timer": 5
},{
"State": "PENDING",
"Timer": 2
}]
}
{
"Name": "martin",
"Todos": [{
"State": "COMPLETED",
"Timer": 15
},{
"State": "PENDING",
"Timer": 10
}]
}
I would like to count how many documents I have where they have any Todos with COMPLETED State. And group by Name.
So from the above I would need to get:
jason: 2
martin: 1
Usually I do this with a terms aggregation on the Name, and another sub-aggregation for the other items:
"aggs": {
"statistics": {
"terms": {
"field": "Name"
},
"aggs": {
"test": {
"filter": {
"bool": {
"must": [{
"match_phrase": {
"SomeProperty.keyword": {
"query": "THEVALUE"
}
}
}
]
}
},
But not sure how to do this here as I have items in an array.
Elasticsearch has no problem with arrays because in fact it flattens them by default:
Arrays of inner object fields do not work the way you may expect. Lucene has no concept of inner objects, so Elasticsearch flattens object hierarchies into a simple list of field names and values.
So a query like the one you posted will do. I would use term query for keyword datatype, though:
POST mytodos/_search
{
"size": 0,
"aggs": {
"by name": {
"terms": {
"field": "Name"
},
"aggs": {
"how many completed": {
"filter": {
"term": {
"Todos.State": "COMPLETED"
}
}
}
}
}
}
}
I am assuming your mapping looks something like this:
PUT mytodos/_mappings
{
"properties": {
"Name": {
"type": "keyword"
},
"Todos": {
"properties": {
"State": {
"type": "keyword"
},
"Timer": {
"type": "integer"
}
}
}
}
}
The example documents that you posted will be transformed internally into something like this:
{
"Name": "jason",
"Todos.State": ["COMPLETED", "PENDING"],
"Todos.Timer": [10, 5]
}
However, if you need to query Todos.State and Todos.Timer together, for example to find todos that are "COMPLETED" and also have Timer > 10, this mapping will not work, because Elasticsearch loses the link between the fields of each object in the array.
In that case you need to map such arrays with the nested datatype and query them with the nested query.
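The lost link can be illustrated with a short Python sketch. It is only a model of the behaviour described above, not how Elasticsearch is implemented: flatten, flattened_match and nested_match are hypothetical helpers, and the sample documents are taken from the question.

```python
jason = {"Name": "jason", "Todos": [
    {"State": "COMPLETED", "Timer": 10},
    {"State": "PENDING", "Timer": 5}]}
martin = {"Name": "martin", "Todos": [
    {"State": "COMPLETED", "Timer": 15},
    {"State": "PENDING", "Timer": 10}]}


def flatten(doc):
    """Model of the default object mapping: one value list per field path."""
    flat = {"Name": doc["Name"]}
    for todo in doc["Todos"]:
        for field, value in todo.items():
            flat.setdefault(f"Todos.{field}", []).append(value)
    return flat


def flattened_match(doc, state, min_timer):
    """Each condition is checked against the whole value list."""
    flat = flatten(doc)
    return (state in flat["Todos.State"]
            and any(timer > min_timer for timer in flat["Todos.Timer"]))


def nested_match(doc, state, min_timer):
    """Both conditions must hold for the same array item."""
    return any(todo["State"] == state and todo["Timer"] > min_timer
               for todo in doc["Todos"])


print(flatten(jason))
print(flattened_match(martin, "PENDING", 10))  # True: a false positive
print(nested_match(martin, "PENDING", 10))     # False: no single todo matches
```

martin has a PENDING todo and, separately, a todo with Timer above 10, so the flattened check matches even though no single todo satisfies both conditions.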
Hope that helps!

How to build a range aggregation on parent by minimum value in children docs

I have a parent/child relationship between Product and Pricing documents. A Product has many Pricing documents, each with its own subtotal field, and I'd simply like to create a range aggregation that only considers the minimum subtotal for each product and filters out the others.
I think this is possible using nested aggregations and filters, but this is the closest I've gotten:
POST /test_index/Product/_search
{
"aggs": {
"offered-at": {
"children": {
"type": "Pricing"
},
"aggs": {
"prices": {
"aggs": {
"min_price": {
"min": {
"field": "subtotal"
},
"aggs": {
"min_price_buckets": {
"range": {
"field": "subtotal",
"ranges": [
{
"to": 100
},
{
"from": 100,
"to": 200
},
{
"from": 200
}
]
}
}
}
}
}
}
}
}
}
}
However this results in the error nested: AggregationInitializationException[Aggregator [min_price] of type [min] cannot accept sub-aggregations]; }], which sort of makes sense because once you reduce to a single value there is nothing left to aggregate.
But how can I structure this so that the range aggregation is only pulling the minimum value from each set of children?
(Here is a Sense gist with mappings and test data: http://sense.qbox.io/gist/01b072b4566ef6885113dc94a796f3bdc56f19a9)

Resources