How to get the parent document in a nested top_hits aggregation? - elasticsearch

This is my document/mapping with a nested prices array:
{
"name": "Foobar",
"type": 1,
"prices": [
{
"date": "2016-03-22",
"price": 100.41
},
{
"date": "2016-03-23",
"price": 200.41
}
]
}
Mapping:
{
"properties": {
"name": {
"index": "not_analyzed",
"type": "string"
},
"type": {
"type": "byte"
},
"prices": {
"type": "nested",
"properties": {
"date": {
"format": "dateOptionalTime",
"type": "date"
},
"price": {
"type": "double"
}
}
}
}
}
I use a top_hits aggregation to get the min price of the nested price array. I also have to filter the prices by date. Here is the query and the response:
POST /index/type/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"prices": {
"nested": {
"path": "prices"
},
"aggs": {
"date_filter": {
"filter": {
"range": {
"prices.date": {
"gte": "2016-03-21"
}
}
},
"aggs": {
"min": {
"top_hits": {
"sort": {
"prices.price": {
"order": "asc"
}
},
"size": 1
}
}
}
}
}
}
}
}
Response:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 3,
"successful": 3,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": [
]
},
"aggregations": {
"prices": {
"doc_count": 4,
"date_filter": {
"doc_count": 4,
"min": {
"hits": {
"total": 4,
"max_score": null,
"hits": [
{
"_index": "index",
"_type": "type",
"_id": "4225796ALL2016061541031",
"_nested": {
"field": "prices",
"offset": 0
},
"_score": null,
"_source": {
"date": "2016-03-22",
"price": 100.41
},
"sort": [
100.41
]
}
]
}
}
}
}
}
}
Is there a way to get the parent source document (or some fields from it) with _id="4225796ALL2016061541031" in the response (e.g. name)? A second query is not an option.

Instead of using an aggregation, use a nested query with inner_hits, like this:
{
"query": {
"nested": {
"path": "prices",
"query": {
"range": {
"prices.date": {
"gte": "2016-03-21"
}
}
},
"inner_hits": {
"sort": {
"prices.price": {
"order": "asc"
}
},
"size": 1
}
}
}
}
The parent document's fields (name, for example) come back in each hit's _source, and the lowest-priced matching nested entry comes back in that hit's inner_hits.
Hope it helps
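As a rough illustration of reading such a response, the sketch below pairs each parent's name with its cheapest matching price. The response literal is a hand-written assumption modelled on the question's document (inner hits are keyed by the nested path, prices, when no explicit name is given), and cheapest_price_per_parent is a hypothetical helper, not part of any client library.

```python
# Hypothetical, hand-written response fragment shaped like the result of a
# nested query with inner_hits (sorted by prices.price asc, size 1).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "4225796ALL2016061541031",
                "_source": {"name": "Foobar", "type": 1},
                "inner_hits": {
                    "prices": {
                        "hits": {
                            "hits": [
                                {"_source": {"date": "2016-03-22", "price": 100.41}}
                            ]
                        }
                    }
                },
            }
        ]
    }
}


def cheapest_price_per_parent(response, path="prices"):
    """Return one row per parent hit: parent fields plus its first inner hit."""
    rows = []
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"][path]["hits"]["hits"]
        if not inner:
            continue  # defensive: no nested entry came back for this parent
        first = inner[0]["_source"]
        rows.append(
            {
                "_id": hit["_id"],
                "name": hit["_source"].get("name"),
                "min_price": first["price"],
                "date": first["date"],
            }
        )
    return rows


print(cheapest_price_per_parent(sample_response))
```

Because the inner hits are already sorted ascending with size 1, the first inner hit is the minimum price that passed the date filter, so no second request is needed.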

Related

Elasticsearch - Nested field sorting

I have an index defined by the following :
{
"mappings": {
"properties": {
"firstName": {
"type": "keyword"
},
"lastName": {
"type": "keyword"
},
"affiliations": {
"type": "nested",
"properties": {
"organisation": {
"type": "keyword"
},
"team": {
"type": "keyword"
},
"dateBeginning": {
"type": "date",
"format": "yyyy-MM-dd"
},
"dateEnding": {
"type": "date",
"format": "yyyy-MM-dd"
},
"country": {
"type": "keyword"
}
}
}
}
}
}
Basically, for each researcher (researchers is the name of my index) I want to sort the affiliations by dateBeginning, in descending order. I've read about inner hits in the official Elasticsearch docs and, not being exactly sure how they work, I tried this for the researcher with _id 3:
{
"query": {
"nested": {
"path": "affiliations",
"query": {
"match": { "_id": 3 }
},
"inner_hits": {
"sort" : [
{
"affiliations.dateBeginning" : {
"order" : "desc",
"nested": {
"path": "affiliations",
"filter": {
"term": { "_id": 3 }
}
}
}
}
]
}
}
}
}
And it doesn't really work.
The researcher with _id 3 has two affiliations, one with dateBeginning set to 2015-06-30 and the other to 2017-06-30. So I also tried this:
{
"sort" : [
{
"affiliations.dateBeginning" : {
"order" : "desc",
"nested": {
"path": "affiliations"
}
}
}
],
"query": {
"nested": {
"path": "affiliations",
"query": {
"match": { "_id": 3 }
}
}
}
}
And it doesn't sort the affiliations by dateBeginning.
I've also tried the SQL API (since I'm more familiar with SQL), and I still can't get the data I want.
I'm quite new to Elasticsearch (I'm using version 7.10) and I don't know what else to try.
Any suggestions about what I'm doing wrong here?
EDIT
Here's an example of a document from that index, as returned by a search:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.0,
"hits": [{
"_index": "researchers",
"_type": "_doc",
"_id": "3",
"_score": 1.0,
"_source": {
"firstName": "Kimmich",
"lastName": "Yoshua",
"affiliations": [{
"organisation": "University of Ottawa",
"team": "Neural Network Elite Team",
"dateBeginning": "2015-06-30",
"datEnding": "2017-01-31",
"country": "Canada"
},
{
"organisation": "University of Montréal",
"team": "Picture processing team",
"dateBeginning": "2017-06-30",
"dateEnding": null,
"country": "Canada"
}
]
}
}]
}
}
Inside a nested query, inner_hits already runs in the context of the nested path, so its sort doesn't need an extra nested clause. Remove it and the sort will work properly:
{
"query": {
"nested": {
"path": "affiliations",
"query": {
"match": {
"_id": 3
}
},
"inner_hits": {
"sort": [
{
"affiliations.dateBeginning": {
"order": "desc"
}
}
]
}
}
}
}
Note that this wouldn't sort the top-level hits -- only the inner hits.
But you can sort on the top level by the values of affiliations.dateBeginning like so:
POST researchers/_search
{
"sort": [
{
"affiliations.dateBeginning": {
"order": "desc",
"nested_path": "affiliations"
}
}
]
}
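but note that this uses the older nested_path shorthand instead of a nested object with a path. nested_path is deprecated in 7.x; the equivalent current form is "nested": { "path": "affiliations" }.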
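The first request in this answer (sorted inner hits) can also be assembled programmatically. This is only a minimal sketch: inner_hits_sorted_query is a hypothetical helper name, and the dictionary it returns is simply the JSON body of that request.

```python
import json


def inner_hits_sorted_query(path, doc_id, sort_field, order="desc"):
    """Build a nested query whose inner hits are sorted by one nested field.

    No extra "nested" clause is placed inside the sort, because inner_hits
    already runs in the context of `path`.
    """
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {"match": {"_id": doc_id}},
                "inner_hits": {
                    "sort": [{f"{path}.{sort_field}": {"order": order}}]
                },
            }
        }
    }


body = inner_hits_sorted_query("affiliations", 3, "dateBeginning")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against the researchers index; the sorted affiliations are then read from each hit's inner_hits rather than from _source.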

Elastic Search: Aggregation sum on a particular field

I am new to Elasticsearch and would appreciate some help.
I have about 2 million documents in Elasticsearch, and they look like this:
{
"_index": "flipkart",
"_type": "PSAD_ThirdParty",
"_id": "430001_MAM_2016-02-04",
"_version": 1,
"_score": 1,
"_source": {
"metrics": [
{
"id": "Metric1",
"value": 70
},
{
"id": "Metric2",
"value": 90
},
{
"id": "Metric3",
"value": 120
}
],
"primary": true,
"ticketId": 1,
"pliId": 206,
"bookedNumbers": 15000,
"ut": 1454567400000,
"startDate": 1451629800000,
"endDate": 1464589800000,
"tz": "EST"
}
}
I want to write an aggregation query that satisfies the conditions below:
1) First query based on "_index", "_type" and "pliId".
2) Do aggregation sum on metrics.value based on metrics.id = "Metric1".
In other words, I need to query records based on some fields and sum a particular metric's value, selected by its metrics id.
Could you please help me get this query right?
Your metrics field needs to be of type nested:
"metrics": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
}
}
}
If you want the exact value Metric1 (including the upper-case letter) to match a term query, then, as shown above, id needs to be not_analyzed; an analyzed string field would be indexed in lower case.
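To see why the casing matters, here is a toy model of the behaviour. It is only a sketch: toy_standard_analyze is a crude stand-in (split on non-alphanumeric characters and lowercase) for what the default analyzer does to an analyzed string field, and term_matches is a hypothetical helper. The point is that a term query compares the search value to the indexed tokens without analyzing it.

```python
import re


def toy_standard_analyze(text):
    """Crude stand-in for the default analyzer: split and lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]


def indexed_tokens(value, not_analyzed):
    # A not_analyzed field is indexed as one verbatim token.
    return [value] if not_analyzed else toy_standard_analyze(value)


def term_matches(search_value, stored_value, not_analyzed):
    """A term query looks up the search value as-is among the indexed tokens."""
    return search_value in indexed_tokens(stored_value, not_analyzed)


print(term_matches("Metric1", "Metric1", not_analyzed=False))  # False
print(term_matches("metric1", "Metric1", not_analyzed=False))  # True
print(term_matches("Metric1", "Metric1", not_analyzed=True))   # True
```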
Then, if you want the aggregation to cover only metrics.id = "Metric1", you need something like this:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"pliId": 206
}
}
]
}
}
}
},
"aggs": {
"by_metrics": {
"nested": {
"path": "metrics"
},
"aggs": {
"metric1_only": {
"filter": {
"bool": {
"must": [
{
"term": {
"metrics.id": {
"value": "Metric1"
}
}
}
]
}
},
"aggs": {
"by_metric_id": {
"terms": {
"field": "metrics.id"
},
"aggs": {
"total_delivery": {
"sum": {
"field": "metrics.value"
}
}
}
}
}
}
}
}
}
}
I created a new index:
Method: PUT
URL: http://localhost:9200/google/
Body:
{
"mappings": {
"PSAD_Primary": {
"properties": {
"metrics": {
"type": "nested",
"properties": {
"id": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "integer",
"index": "not_analyzed"
}
}
}
}
}
}
}
Then I inserted some 200 thousand documents, ran the query, and it worked.
Response:
{
"took": 34,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "google",
"_type": "PSAD_Primary",
"_id": "383701291_MAM_2016-01-06",
"_score": 1,
"_source": {
"metrics": [
{
"id": "Metric1",
"value": 70
},
{
"id": "Metric2",
"value": 90
},
{
"id": "Metric3",
"value": 120
}
],
"primary": true,
"ticketId": 1,
"pliId": 221244,
"bookedNumbers": 15000,
"ut": 1452061800000,
"startDate": 1451629800000,
"endDate": 1464589800000,
"tz": "EST"
}
}
]
},
"aggregations": {
"by_metrics": {
"doc_count": 3,
"metric1_only": {
"doc_count": 1,
"by_metric_id": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Metric1",
"doc_count": 1,
"total_delivery": {
"value": 70
}
}
]
}
}
}
}
}
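As a sanity check on what the aggregation computes, the same nested, filter, sum pipeline can be mimicked over in-memory documents. This is an illustrative sketch only: sum_metric is a hypothetical helper, and docs holds just the sample document from the response above.

```python
docs = [
    {
        "pliId": 221244,
        "metrics": [
            {"id": "Metric1", "value": 70},
            {"id": "Metric2", "value": 90},
            {"id": "Metric3", "value": 120},
        ],
    }
]


def sum_metric(documents, pli_id, metric_id):
    """Mimic: term filter on pliId, nested on metrics, filter by id, sum value."""
    total = 0
    for doc in documents:
        if doc["pliId"] != pli_id:
            continue  # top-level query: keep only the requested pliId
        for metric in doc["metrics"]:  # "nested" step: visit each metric entry
            if metric["id"] == metric_id:  # "filter" step
                total += metric["value"]  # "sum" step
    return total


print(sum_metric(docs, 221244, "Metric1"))  # 70
```

The result agrees with total_delivery in the response: only the Metric1 entry contributes, even though the parent document also carries Metric2 and Metric3.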

summing a bunch of values given a condition in elasticsearch

Given the following elasticsearch document, how would I construct a search that would sum the values of the seconds column for a given datetime range?
See below for my current query.
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "searchdb",
"_type": "profile",
"_id": "1825",
"_score": 1,
"_source": {
"id": 1825,
"market": "Chicago",
"geo_location": {
"lat": 41.1234,
"lon": -87.5678
},
"hourly_values": [
{
"datetime": "1997-07-16T19:00:00.00+00:00",
"seconds": 1200
},
{
"datetime": "1997-07-16T19:20:00.00+00:00",
"seconds": 1200
},
{
"datetime": "1997-07-16T19:20:00.00+00:00",
"seconds": 1200
}
]
}
},
{
"_index": "searchdb",
"_type": "profile",
"_id": "1808",
"_score": 1,
"_source": {
"id": 1808,
"market": "Chicago",
"geo_location": {
"lat": 41.1234,
"lon": -87.5678
},
"hourly_values": [
{
"datetime": "1997-07-16T19:00:00.00+00:00",
"seconds": 900
},
{
"datetime": "1997-07-16T19:20:00.00+00:00",
"seconds": 1200
},
{
"datetime": "1997-07-16T19:20:00.00+00:00",
"seconds": 800
}
]
}
}
]
}
}
Below is my current query. The problem with it is it doesn't take into consideration the datetime field. I need to be able to sum only the seconds values that fall within a given datetime range in the query.
{
"aggs": {
"Ids": {
"terms": {
"field": "id",
"size": 0
},
"aggs": {
"Nesting": {
"nested": {
"path": "hourly_values"
},
"aggs": {
"availability": {
"sum": {
"field": "hourly_values.seconds"
}
}
}
}
}
}
}
}
I know you can use a range, something like this:
"filter" : {
"range" : { "timestamp" : { "from" : "now/1d+9.5h", "to" : "now/1d+16h" }}
}
but I can't figure out how to integrate that into my query to get the desired output.
For clarity, my desired output is to return each of the objects returned from the query, and the values of the summation of the seconds fields, but I only want to sum the values for the given time range.
I think this can be done with a filter aggregation placed inside the nested aggregation. Try this:
{
"aggs": {
"Ids": {
"terms": {
"field": "id",
"size": 0
},
"aggs": {
"Nesting": {
"nested": {
"path": "hourly_values"
},
"aggs": {
"filtered_result": {
"filter": {
"query": {
"range": {
"hourly_values.datetime": {
"gt": "1997-07-16T19:10:00.00+00:00",
"lt": "1997-07-16T19:22:00.00+00:00"
}
}
}
},
"aggs": {
"availability": {
"sum": {
"field": "hourly_values.seconds"
}
}
}
}
}
}
}
}
},
"size": 0
}
This is the result I get:
"aggregations": {
"Ids": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1808",
"doc_count": 1,
"Nesting": {
"doc_count": 3,
"filtered_result": {
"doc_count": 2,
"availability": {
"value": 2000
}
}
}
},
{
"key": "1825",
"doc_count": 1,
"Nesting": {
"doc_count": 3,
"filtered_result": {
"doc_count": 2,
"availability": {
"value": 2400
}
}
}
}
]
}
}
Does this help?
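The numbers in that result can be reproduced locally from the two sample documents in the question. The sketch below is only a check of the arithmetic, not a replacement for the aggregation; sum_seconds_in_range is a hypothetical helper and the bounds are exclusive, like gt and lt in the range filter.

```python
from datetime import datetime

# Matches timestamps such as 1997-07-16T19:20:00.00+00:00
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"

docs = [
    {
        "id": 1825,
        "hourly_values": [
            {"datetime": "1997-07-16T19:00:00.00+00:00", "seconds": 1200},
            {"datetime": "1997-07-16T19:20:00.00+00:00", "seconds": 1200},
            {"datetime": "1997-07-16T19:20:00.00+00:00", "seconds": 1200},
        ],
    },
    {
        "id": 1808,
        "hourly_values": [
            {"datetime": "1997-07-16T19:00:00.00+00:00", "seconds": 900},
            {"datetime": "1997-07-16T19:20:00.00+00:00", "seconds": 1200},
            {"datetime": "1997-07-16T19:20:00.00+00:00", "seconds": 800},
        ],
    },
]


def sum_seconds_in_range(documents, gt, lt):
    """Per document id, sum seconds of entries with gt < datetime < lt."""
    lower = datetime.strptime(gt, FMT)
    upper = datetime.strptime(lt, FMT)
    totals = {}
    for doc in documents:
        totals[doc["id"]] = sum(
            entry["seconds"]
            for entry in doc["hourly_values"]
            if lower < datetime.strptime(entry["datetime"], FMT) < upper
        )
    return totals


print(
    sum_seconds_in_range(
        docs, "1997-07-16T19:10:00.00+00:00", "1997-07-16T19:22:00.00+00:00"
    )
)  # {1825: 2400, 1808: 2000}
```

In each document the 19:00 entry falls outside the range, which is why filtered_result reports a doc_count of 2 out of 3 nested entries.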

Configuring elasticsearch to search and filter with has many / belongs to relationship

I have a Product model where each product has many skus.
I need to be able to search and filter via elasticsearch across both models, but not quite sure how to go about it. I'm currently uploading to elasticsearch in this format:
[{
id: 1,
title: 'Product 1',
image: 'image1.jpg',
skus: [{
id: 1,
material: 'cotton',
quantity: 4
},{
id: 2,
material: 'polyester',
quantity: 22
}]
},{
...
}]
I can search the title just fine, but I am unsure as to how I could do something like
Search for title 'foobar' and filter by material 'cotton' and quantity > 5
Is this possible with elasticsearch?
Edit
I am open to uploading in a different format or using multiple indices.
I think the parent/child relationship is what you're looking for.
As a quick example, I can set up an index with a parent type and child type like this:
PUT /test_index
{
"mappings": {
"product": {
"properties": {
"id": {
"type": "long"
},
"image": {
"type": "string"
},
"title": {
"type": "string"
}
}
},
"sku": {
"_parent": {
"type": "product"
},
"properties": {
"id": {
"type": "long"
},
"material": {
"type": "string"
},
"quantity": {
"type": "long"
}
}
}
}
}
Then add a parent document and two child documents:
POST /test_index/_bulk
{"index":{"_type":"product","_id":1}}
{"id": 1,"title": "Product1","image": "image1.jpg"}
{"index":{"_type":"sku", "_id":1,"_parent":1}}
{"id": 1,"material": "cotton","quantity": 4}
{"index":{"_type":"sku","_id":2,"_parent":1}}
{"id": 2,"material": "polyester","quantity": 22}
Now if I search for a "product" with "title": "Product1" that has a child "sku" with "material": "cotton" and "quantity" greater than 5, I won't find one:
POST /test_index/product/_search
{
"query": {
"filtered": {
"query": {
"match": {
"title": "Product1"
}
},
"filter": {
"has_child": {
"type": "sku",
"filter": {
"bool": {
"must": [
{
"term": {
"material": "cotton"
}
},
{
"range": {
"quantity": {
"gt": 5
}
}
}
]
}
}
}
}
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
But if I search for a "product" with "title": "Product1" that has a child "sku" with "material": "polyester" and "quantity" greater than 5, I will find one:
POST /test_index/product/_search
{
"query": {
"filtered": {
"query": {
"match": {
"title": "Product1"
}
},
"filter": {
"has_child": {
"type": "sku",
"filter": {
"bool": {
"must": [
{
"term": {
"material": "polyester"
}
},
{
"range": {
"quantity": {
"gt": 5
}
}
}
]
}
}
}
}
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.4054651,
"hits": [
{
"_index": "test_index",
"_type": "product",
"_id": "1",
"_score": 1.4054651,
"_source": {
"id": 1,
"title": "Product1",
"image": "image1.jpg"
}
}
]
}
}
Here is some code I used for testing:
http://sense.qbox.io/gist/d1989a28372ac9daae335d585601c11818b2fa11
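The reason for keeping skus as separate child documents is that both conditions must hold for the same sku. The sketch below contrasts that with a plain array of objects, where Elasticsearch pools the values per field so the two conditions can be satisfied by different skus. Both functions are hypothetical in-memory models for illustration, not client calls.

```python
product = {
    "title": "Product1",
    "skus": [
        {"material": "cotton", "quantity": 4},
        {"material": "polyester", "quantity": 22},
    ],
}


def matches_per_child(skus, material, min_quantity):
    """has_child-style matching: a single sku must satisfy both conditions."""
    return any(
        sku["material"] == material and sku["quantity"] > min_quantity
        for sku in skus
    )


def matches_flattened(skus, material, min_quantity):
    """Plain object array: values are pooled per field across all skus."""
    materials = {sku["material"] for sku in skus}
    quantities = [sku["quantity"] for sku in skus]
    return material in materials and any(q > min_quantity for q in quantities)


print(matches_per_child(product["skus"], "cotton", 5))     # False
print(matches_flattened(product["skus"], "cotton", 5))     # True (false positive)
print(matches_per_child(product["skus"], "polyester", 5))  # True
```

The first and third results mirror the two searches above: no cotton sku has more than 5 in stock, while the polyester sku does.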

Nested query in nested, filter aggregation fails

I am trying to use a nested query filter inside of a nested, filter aggregation. When I do so, the aggregation returns with no items. If I change the query to just a plain old match_all filter, I do get items back in the bucket.
Here is a simplified version of the mapping I'm working with:
"player": {
"properties": {
"rating": {
"type": "float"
},
"playerYears": {
"type": "nested",
"properties": {
"schoolsOfInterest": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
This query, with a match_all filter on the aggregation:
GET /players/_search
{
"size": 0,
"aggs": {
"rating": {
"nested": {
"path": "playerYears"
},
"aggs": {
"rating-filtered": {
"filter": {
"match_all": {}
},
"aggs": {
"rating": {
"histogram": {
"field": "playerYears.rating",
"interval": 1
}
}
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"match_all": {}
}
}
}
}
returns the following:
{
"took": 16,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 167316,
"max_score": 0,
"hits": []
},
"aggregations": {
"rating": {
"doc_count": 363550,
"rating-filtered": {
"doc_count": 363550,
"rating": {
"buckets": [
{
"key_as_string": "-1",
"key": -1,
"doc_count": 20978
},
{
"key_as_string": "0",
"key": 0,
"doc_count": 312374
},
{
"key_as_string": "1",
"key": 1,
"doc_count": 1162
},
{
"key_as_string": "2",
"key": 2,
"doc_count": 12104
},
{
"key_as_string": "3",
"key": 3,
"doc_count": 9558
},
{
"key_as_string": "4",
"key": 4,
"doc_count": 5549
},
{
"key_as_string": "5",
"key": 5,
"doc_count": 1825
}
]
}
}
}
}
}
But this query, which has a nested filter in the aggregation, returns an empty bucket:
GET /players/_search
{
"size": 0,
"aggs": {
"rating": {
"nested": {
"path": "playerYears"
},
"aggs": {
"rating-filtered": {
"filter": {
"nested": {
"query": {
"match_all": {}
},
"path": "playerYears.schoolsOfInterest"
}
},
"aggs": {
"rating": {
"histogram": {
"field": "playerYears.rating",
"interval": 1
}
}
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"match_all": {}
}
}
}
}
the empty bucket:
{
"took": 8,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 167316,
"max_score": 0,
"hits": []
},
"aggregations": {
"rating": {
"doc_count": 363550,
"rating-filtered": {
"doc_count": 0,
"rating": {
"buckets": []
}
}
}
}
}
Is it possible to use nested filters inside of nested, filtered aggregations? Is there a known bug in elasticsearch about this? The nested filter works fine in the query context of the search, and it works fine if I don't use a nested aggregation.
Based on the information provided, and a few assumptions, I would like to offer two suggestions. I hope they help solve your problem.
Case 1: using a reverse_nested aggregation (nest on playerYears.schoolsOfInterest, then join back to the root document for the histogram):
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"rating": {
"nested": {
"path": "playerYears.schoolsOfInterest"
},
"aggs": {
"rating-filtered": {
"filter": {
"match_all": {}
},
"aggs": {
"rating_nested": {
"reverse_nested": {},
"aggs": {
"rating": {
"histogram": {
"field": "rating",
"interval": 1
}
}
}
}
}
}
}
}
}
}
Case 2: dropping the outer nested aggregation and applying the nested filter directly in a top-level filter aggregation:
{
"size": 0,
"aggs": {
"rating-filtered": {
"filter": {
"nested": {
"query": {
"match_all": {}
},
"path": "playerYears.schoolsOfInterest"
}
},
"aggs": {
"rating": {
"histogram": {
"field": "playerYears.rating",
"interval": 1
}
}
}
}
},
"query": {
"filtered": {
"filter": {
"match_all": {}
}
}
}
}
I would suggest trying case 1 first and verifying that it gives the results you need.
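To make the intent of case 1 concrete, here is a small in-memory model of it. It rests on assumptions: the players list is invented sample data, rating is taken to live on the root document as in the simplified mapping, and rating_histogram is a hypothetical helper. An empty reverse_nested joins back to the root document, so each player is counted once no matter how many schoolsOfInterest entries it has.

```python
import math
from collections import Counter

# Invented sample data, shaped like the simplified mapping in the question.
players = [
    {"rating": 4.5, "playerYears": [{"schoolsOfInterest": [{"name": "A"}, {"name": "B"}]}]},
    {"rating": 4.0, "playerYears": [{"schoolsOfInterest": []}]},
    {"rating": 2.2, "playerYears": [{"schoolsOfInterest": [{"name": "A"}]}]},
]


def rating_histogram(documents, interval=1):
    """Bucket root-level ratings of players having at least one school of interest."""
    buckets = Counter()
    for player in documents:
        has_school = any(
            year["schoolsOfInterest"] for year in player["playerYears"]
        )
        if has_school:
            # Histogram key: the value rounded down to a multiple of interval.
            key = math.floor(player["rating"] / interval) * interval
            buckets[key] += 1  # one count per root document
    return dict(sorted(buckets.items()))


print(rating_histogram(players))  # {2: 1, 4: 1}
```

The second player has no schoolsOfInterest entries, so it never enters the nested bucket and is absent from the histogram.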
