how to build a range aggregation on parent by minimum value in children docs - elasticsearch

I have a parent/child relationship created between Product and Pricing documents. A Product has many Pricing documents, each with its own subtotal field, and I'd simply like to create a range aggregation that only considers the minimum subtotal for each product and filters out the others.
I think this is possible using nested aggregations and filters, but this is the closest I've gotten:
POST /test_index/Product/_search
{
  "aggs": {
    "offered-at": {
      "children": {
        "type": "Pricing"
      },
      "aggs": {
        "prices": {
          "aggs": {
            "min_price": {
              "min": {
                "field": "subtotal"
              },
              "aggs": {
                "min_price_buckets": {
                  "range": {
                    "field": "subtotal",
                    "ranges": [
                      { "to": 100 },
                      { "from": 100, "to": 200 },
                      { "from": 200 }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
However, this results in the error nested: AggregationInitializationException[Aggregator [min_price] of type [min] cannot accept sub-aggregations], which sort of makes sense, because once you reduce to a single value there is nothing left to aggregate.
But how can I structure this so that the range aggregation is only pulling the minimum value from each set of children?
(Here is a Sense gist with mappings and test data: http://sense.qbox.io/gist/01b072b4566ef6885113dc94a796f3bdc56f19a9)

Related

Elasticsearch aggregate on term multiple times per different time range

I'm trying to aggregate a field by each half of the time-range given in the query. For example, here's the query:
{
  "query": {
    "simple_query_string": {
      "query": "+sitetype:(redacted) +sort_date:[now-2h TO now]"
    }
  }
}
...and I want to aggregate on term "product1.keyword" from now-2h to now-1h and aggregate on the same term "product1.keyword" from now-1h to now, so like:
"terms": {
"field": "product1",
"size": 10,
}
^ aggregate the top 10 results on product1 in now-2h TO now-1h,
and aggregate the top 10 results on product1 in now-1h TO now.
Clarification: product1 is not a date or time-related field. It would be like a type of car, phone, etc.
If you want to use now in your query, you must make the product1 field a date type; then you can try the following:
GET index1/_search
{
  "size": 0,
  "aggs": {
    "dataAgg": {
      "date_range": {
        "field": "product1",
        "ranges": [
          { "from": "now-2h", "to": "now-1h" },
          { "from": "now-1h", "to": "now" }
        ]
      },
      "aggs": {
        "top10": {
          "top_hits": {
            "size": 10
          }
        }
      }
    }
  }
}
And if you can't change product1's type, you can try a plain range aggregation, but you must write the times explicitly instead of using now.
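The answer doesn't show that variant. As a rough sketch of one way it could look, the following uses a filters aggregation with explicit timestamps on the sort_date field from the question, and a terms sub-aggregation per window (the index name, field names, and timestamps are assumptions, not from the original answer):
GET index1/_search
{
  "size": 0,
  "aggs": {
    "time_windows": {
      "filters": {
        "filters": {
          "older_hour": {
            "range": { "sort_date": { "gte": "2023-01-01T10:00:00", "lt": "2023-01-01T11:00:00" } }
          },
          "newer_hour": {
            "range": { "sort_date": { "gte": "2023-01-01T11:00:00", "lt": "2023-01-01T12:00:00" } }
          }
        }
      },
      "aggs": {
        "top10_products": {
          "terms": { "field": "product1.keyword", "size": 10 }
        }
      }
    }
  }
}
Each named filter bucket then carries its own top-10 terms list for product1.keyword.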

Sort multi-bucket aggregation by source fields inside inner multi-bucket aggregation

TL;DR: Using an inner multi-bucket aggregation (top_hits with size: 1) inside an outer multi-bucket aggregation, is it possible to sort the buckets of the outer aggregation by the data in the inner buckets?
I have the following index mappings
{
  "parent": {
    "properties": {
      "children": {
        "type": "nested",
        "properties": {
          "child_id": { "type": "keyword" }
        }
      }
    }
  }
}
Each child (in the data) also has the properties last_modified: Date and other_property: String.
I need to fetch a list of children (of all the parents, but without the parents), keeping only the one with the latest last_modified for each child_id. Then I need to sort and paginate those results to return manageable amounts of data.
I'm able to get the data and paginate over it with a combination of nested, terms, top_hits, and bucket_sort aggregations (and also get the total count with cardinality)
{
  "query": {
    "match_all": {}
  },
  "size": 0,
  "aggs": {
    "children": {
      "nested": {
        "path": "children"
      },
      "aggs": {
        "totalCount": {
          "cardinality": {
            "field": "children.child_id"
          }
        },
        "oneChildPerId": {
          "terms": {
            "field": "children.child_id",
            "order": { "_term": "asc" },
            "size": 1000000
          },
          "aggs": {
            "lastModified": {
              "top_hits": {
                "_source": [
                  "children.other_property"
                ],
                "sort": {
                  "children.last_modified": {
                    "order": "desc"
                  }
                },
                "size": 1
              }
            },
            "paginate": {
              "bucket_sort": {
                "from": 36,
                "size": 3
              }
            }
          }
        }
      }
    }
  }
}
but after more than a solid day of going through the docs and experimenting, I seem to be no closer to figuring out how to sort the buckets of my oneChildPerId aggregation by the other_property of that single child retrieved by the lastModified aggregation.
Is there a way to sort a multi-bucket aggregation by results in a nested multi-bucket aggregation?
What I've tried:
I thought I could use bucket_sort for that too, but apparently its sort can only be used with paths that contain other single-bucket aggregations and end in a metric one.
I've tried to find a way to somehow transform the 1-result multi-bucket of lastModified into a single-bucket, but haven't found any.
I'm using ElasticSearch 6.8.6 (the bucket_sort and similar tools weren't available in ES 5.x and older).
I had the same problem. I needed a terms aggregation with a nested top_hits, and wanted to sort by a specific field inside the nested aggregation.
Not sure how performant my solution is, but the desired behaviour can be achieved with a single-value metric aggregation at the same level as the top_hits. You can then sort by this new aggregation via the order field of the terms aggregation.
Here is an example:
POST books/_doc
{ "genre": "action", "title": "bookA", "pages": 200 }
POST books/_doc
{ "genre": "action", "title": "bookB", "pages": 35 }
POST books/_doc
{ "genre": "action", "title": "bookC", "pages": 170 }
POST books/_doc
{ "genre": "comedy", "title": "bookD", "pages": 80 }
POST books/_doc
{ "genre": "comedy", "title": "bookE", "pages": 90 }
GET books/_search
{
  "size": 0,
  "aggs": {
    "by_genre": {
      "terms": {
        "field": "genre.keyword",
        "order": { "max_pages": "asc" }
      },
      "aggs": {
        "top_book": {
          "top_hits": {
            "size": 1,
            "sort": [{ "pages": { "order": "desc" } }]
          }
        },
        "max_pages": { "max": { "field": "pages" } }
      }
    }
  }
}
by_genre has the order field, which sorts by a sub-aggregation called max_pages. max_pages has only been added for this purpose: it creates a single-value metric that order can sort by.
The query above returns (I've shortened the output for clarity):
{ "genre" : "comedy", "title" : "bookE", "pages" : 90 }
{ "genre" : "action", "title" : "bookA", "pages" : 200 }
If you change "order": {"max_pages": "asc"} to "order": {"max_pages": "desc"}, the output becomes:
{ "genre" : "action", "title" : "bookA", "pages" : 200 }
{ "genre" : "comedy", "title" : "bookE", "pages" : 90 }
The type of the max_pages aggregation can be changed as needed, as long as it is a single-value metric aggregation (e.g. sum, avg, etc.).
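For instance, a variant of the query above (a sketch, not from the original answer) that orders genres by their shortest book using a min metric instead:
GET books/_search
{
  "size": 0,
  "aggs": {
    "by_genre": {
      "terms": {
        "field": "genre.keyword",
        "order": { "min_pages": "asc" }
      },
      "aggs": {
        "top_book": {
          "top_hits": {
            "size": 1,
            "sort": [{ "pages": { "order": "asc" } }]
          }
        },
        "min_pages": { "min": { "field": "pages" } }
      }
    }
  }
}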

Elasticsearch group and order by nested field's min value

I've got a structure of products which are available in different stores with different prices:
[
  {
    "name": "SomeProduct",
    "store_prices": [
      { "store": "FooStore1", "price": 123.45 },
      { "store": "FooStore2", "price": 345.67 }
    ]
  },
  {
    "name": "OtherProduct",
    "store_prices": [
      { "store": "FooStore1", "price": 456.78 },
      { "store": "FooStore2", "price": 234.56 }
    ]
  }
]
I want to show a list of products, ordered by the lowest price ascending, limited to 10 results, in this way:
SomeProduct: 123.45 USD
OtherProduct: 234.56 USD
How to do this? I've tried the nested aggregation approach described in https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html but it only returns the min price of all products, not the respective min price for each product:
{
  "_source": [
    "name",
    "store_prices.price"
  ],
  "query": {
    "match_all": {}
  },
  "sort": {
    "store_prices.price": "asc"
  },
  "aggs": {
    "stores": {
      "nested": {
        "path": "store_prices"
      },
      "aggs": {
        "min_price": { "min": { "field": "store_prices.price" } }
      }
    }
  },
  "from": 0,
  "size": 10
}
In SQL, what I want to do could be described with the following query. I'm afraid I'm thinking too much "in SQL":
SELECT
  p.name,
  MIN(s.price) AS price
FROM
  products p
INNER JOIN
  store_prices s ON s.product_id = p.id
GROUP BY
  p.id
ORDER BY
  price ASC
LIMIT 10
You need nested sorting:
{
  "query": // HERE YOUR QUERY,
  "sort": {
    "store_prices.price": {
      "order": "asc",
      "nested_path": "store_prices",
      "nested_filter": {
        // HERE THE FILTERS WHICH ARE EVENTUALLY
        // FILTERING OUT SOME OF YOUR STORES
      }
    }
  }
}
Note that you have to repeat any nested query filters inside the nested_filter field. You will then find the price in the sort values of each hit in the response.
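Filled in for the documents from the question, that answer might look roughly like this (a sketch assuming store_prices is mapped as a nested field; since there is no store filter here, nested_filter is omitted):
{
  "_source": ["name"],
  "size": 10,
  "query": {
    "match_all": {}
  },
  "sort": {
    "store_prices.price": {
      "order": "asc",
      "mode": "min",
      "nested_path": "store_prices"
    }
  }
}
Each hit then carries its minimum store price in its sort values, which gives the SomeProduct: 123.45, OtherProduct: 234.56 ordering asked for above.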

Elasticsearch get n ordered records and then apply grouping

Here's an example of what I'm looking for. Let's say I have records of some purchases. I want to get records where price is > $50 and order by price descending. I want to limit those ordered records to 100 and then group them by zip code.
The final result should have counts of hits for each zip code, where the sum of those counts would total 100 records.
ES v2.1.1
What do you mean by "group them by zip code"?
1. You just want to know the number of docs in each group?
2. You want a hash with zip code as the key, associated with the docs?
If 1:
{
  "size": 100,
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "price": {
            "gt": 50
          }
        }
      }
    }
  },
  "sort": {
    "price": "desc"
  },
  "aggs": {
    "by_zip_code": {
      "terms": {
        "field": "zip_code"
      }
    }
  }
}
If 2, you may use the top_hits aggregation. However, sorting the buckets by price is not possible (how could we do that?), and by default Elasticsearch orders them by _count (check out the intrinsic sorts). If the sort is not a big deal, the following will work:
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "range": {
          "price": {
            "gt": 50
          }
        }
      }
    }
  },
  "sort": {
    "price": "desc"
  },
  "aggs": {
    "by_zip_code": {
      "terms": {
        "field": "zip_code",
        "size": 100
      },
      "aggs": {
        "hits": {
          "top_hits": {}
        }
      }
    }
  }
}
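If ordering the zip-code buckets by each bucket's highest price would be acceptable, the terms aggregation can also be ordered by a metric sub-aggregation (the same pattern as the max_pages example earlier on this page). This is a sketch, not part of the original answer:
{
  "size": 0,
  "query": {
    "filtered": {
      "filter": {
        "range": { "price": { "gt": 50 } }
      }
    }
  },
  "aggs": {
    "by_zip_code": {
      "terms": {
        "field": "zip_code",
        "size": 100,
        "order": { "max_price": "desc" }
      },
      "aggs": {
        "max_price": { "max": { "field": "price" } },
        "hits": { "top_hits": { "size": 5 } }
      }
    }
  }
}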
You need to use the Search API to get the 100 results and then post-process to perform the aggregation (since an aggregation of top hits cannot be done directly using the ES API).
"I want to get records where price is > $50" - You need a range filter.
"...order by price descending" - You need a sort.
"I want to limit those ordered records to 100" - You need to specify
the size parameter.
"...then group them by zip code" - You need to post-process the "hits":"hits" array to do this (e.g. inserting into a hash table / dictionary with zip code as the key values).
For steps 1-3 you need:
$ curl -XGET 'http://localhost:9200/my_index/_search?pretty' -d '{
  "query": { "filtered": { "filter": { "range": { "price": { "gt": 50 } } } } },
  "size": 100,
  "sort": { "price": { "order": "desc" } }
}'

Elasticsearch: Limit filtered query to 5 items per type per day

I'm using Elasticsearch to gather data for the front page of my event portal. The current query is as follows:
{
  "query": {
    "function_score": {
      "filter": {
        "and": [
          {
            "geo_distance": {
              "distance": "50km",
              "location": {
                "lat": 50.78,
                "lon": 6.08
              },
              "_cache": true
            }
          },
          {
            "or": [
              {
                "and": [
                  {
                    "term": {
                      "type": "event"
                    }
                  },
                  {
                    "range": {
                      "datetime": {
                        "gt": "now"
                      }
                    }
                  }
                ]
              },
              {
                "not": {
                  "term": {
                    "type": "event"
                  }
                }
              }
            ]
          }
        ]
      },
      "functions": [
        ...
      ]
    }
  }
}
So basically, all events within a 50km distance which are future events, plus all other types. Other types could be status, photo, video, soundcloud, etc. All these items have a datetime field and a parent field indicating which account the item belongs to. There are some functions after the filter that score objects based on their distance and age.
Now my question:
Is there a way to filter the query to get only the first (or, even better, the highest-scored) 5 items per type per account per day?
Currently I have accounts which upload 20 images at the same time. This is too much to display on the front page.
I thought about using filter scripts in a post_filter, but I am not very familiar with this topic.
Any ideas?
many thanks in advance
DTFagus
I solved it this way:
"aggs": {
"byParent": {
"terms": {
"field": "parent_id"
},
"aggs": {
"byType": {
"terms": {
"field": "type"
},
"aggs": {
"perDay": {
"date_histogram" : {
"field" : "datetime",
"interval": "day"
},
"aggs": {
"topHits": {
"top_hits": {
"size": 5,
"_source": {
"include": ["path"]
}
}
}
}
}
}
}
}
}
}
Unfortunately there is no pagination for aggregations (or, the other way around, the pagination of the query is not applied to them). So I will get the paginated query results plus the aggregation over all hits and intersect the arrays in JS. That does not sound very efficient, but I currently have no better idea. Anyone?
The only way around this I see would be to index all data into two indices: one containing all data and one with only the top 5 per day per type per account. This would be less time-consuming to query but more time- and storage-consuming while indexing :/
You can limit the number of results returned by your query using the "size" parameter. If you set size to 5, then you will get the first 5 results returned by your query.
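A minimal sketch of what that looks like in a search body:
{
  "size": 5,
  "query": {
    "match_all": {}
  }
}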
Check the documentation http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/pagination.html
Hope this helps!
