Elasticsearch Match Date Range or Number in Array

My goal is to filter my records by date and by day of the week (Mon = 1, Tue = 2, Wed = 3, ..., Sun = 7). A record should match if either the date falls in the range or the weekday matches any of the days in the array (or both, of course). I am new to Elasticsearch and seem to have a number of mistakes in my query. I have documented everything I have so far and hope for a couple of helpful insights. Thanks in advance.
Current Mapping
{
"index":{
"mappings":{
"entity":{
"_meta":{
"model":"AppBundle\\Entity\\Entity"
},
"properties":{
"subEntity":{
"properties":{
"date":{
"type":"date",
"format":"strict_date_optional_time||epoch_millis"
},
"days":{
"properties":{
"day":{
"type":"string"
}
}
}
}
}
}
}
}
}
}
Current Records
curl -XGET 'localhost:9200/index/_search?pretty=1'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 1.0,
"hits" : [ {
"_index" : "index",
"_type" : "entity",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"subEntity" : [ {
"date" : "2016-09-20T00:00:00+02:00",
"days" : [ ]
}, {
"date" : "2016-09-21T00:00:00+02:00",
"days" : [ ]
}, {
"date" : "2016-09-22T00:00:00+02:00",
"days" : [ {
"day" : 4
}, {
"day" : 5
}, {
"day" : 6
} ]
}, {
"date" : "2016-09-20T00:00:00+02:00",
"days" : [ ]
} ]
}
},
[...]
}
}
Current Request
{
"query":{
"should":{
"filter":[ {
"range":{
"entity.subEntity.date":{
"gte":"2016-09-20",
"lte":"2016-09-21"
}
}
}, {
"term":{
"entity.subEntity.days.day": 2
}
} ]
}
}
}
MySQL Equivalent
SELECT entity
FROM entity
LEFT JOIN subEntity ON (subEntity.entity_id = entity.id)
LEFT JOIN day ON (day.subEntity_id = subEntity.id)
WHERE subEntity.date BETWEEN '2016-09-20' AND '2016-09-21'
OR day = 2

If you want to query across properties of a sub-object within a document (where a document may have a collection of such sub-objects), you need to map subEntity as a nested type. In your example, since you are only looking for documents that are within the date range or match the day value, you can use an object mapping as you have. But if you ever need to combine conditions on the same sub-object with an and operation, you need a nested type mapping, so it makes sense to map subEntity as nested now. Additionally, since day is a numeric value, you should map it as a numeric type such as byte.
{
"index":{
"mappings":{
"entity":{
"_meta":{
"model":"AppBundle\\Entity\\Entity"
},
"properties":{
"subEntity":{
"type": "nested",
"properties":{
"date":{
"type":"date",
"format":"strict_date_optional_time||epoch_millis"
},
"days":{
"properties":{
"day":{
"type":"byte"
}
}
}
}
}
}
}
}
}
}
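To see why the mapping type matters, here is a minimal Python sketch (it does not call Elasticsearch) that imitates how an object mapping pools the values of all sub-objects per field, while a nested mapping keeps each sub-object separate. The helper names matches_flattened and matches_nested and the sample data are invented for illustration, and dates are compared as plain ISO date strings for simplicity.

```python
# Illustration only: imitates object vs. nested matching in plain Python.
# The helper names and sample data are hypothetical.

sub_entities = [
    {"date": "2016-09-20", "days": []},
    {"date": "2016-09-25", "days": [2]},
]

def in_range(date, gte, lte):
    # ISO-8601 date strings of equal length sort chronologically
    return gte <= date <= lte

def matches_flattened(subs, gte, lte, day):
    """Object mapping: values from all sub-objects are pooled per field."""
    all_dates = [s["date"] for s in subs]
    all_days = [d for s in subs for d in s["days"]]
    return any(in_range(d, gte, lte) for d in all_dates) and day in all_days

def matches_nested(subs, gte, lte, day):
    """Nested mapping: both conditions must hold within one sub-object."""
    return any(in_range(s["date"], gte, lte) and day in s["days"] for s in subs)

# "date in range AND day = 2": no single sub-object satisfies both conditions
print(matches_flattened(sub_entities, "2016-09-20", "2016-09-21", 2))  # True
print(matches_nested(sub_entities, "2016-09-20", "2016-09-21", 2))     # False
```

With an or condition both approaches agree, which is why the object mapping is enough for the original question.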
Now that subEntity is mapped as a nested type, a nested query needs to be used to query against it, so the query becomes
{
"query": {
"nested": {
"query": {
"bool": {
"should": [
{
"bool": {
"filter": [
{
"range": {
"subEntity.date": {
"gte": "2016-09-20",
"lte": "2016-09-21"
}
}
}
]
}
},
{
"bool": {
"filter": [
{
"terms": {
"subEntity.days.day": [
2
]
}
}
]
}
}
]
}
},
"path": "subEntity"
}
}
}
Both queries are issued as bool filter queries because we don't need to calculate a relevancy score for either; we simply need to know whether a document matches, i.e. a simple yes/no answer. Wrapping a query in a bool filter means that the query runs in a filter context.
Next, either query can match, so we add both as should clauses to an outer bool query.
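If the date range and the list of days come from user input, the request body can be assembled programmatically. The following is a minimal sketch in Python that builds the same structure as a plain dict; the function name date_or_days_query and its parameters are assumptions for illustration, and sending the body to Elasticsearch is left to whatever client you use.

```python
import json

def date_or_days_query(gte, lte, days, path="subEntity"):
    """Build the nested 'date in range OR day in list' query as a dict."""
    # Each condition runs in filter context, so no score is computed for it
    date_clause = {"bool": {"filter": [
        {"range": {f"{path}.date": {"gte": gte, "lte": lte}}}
    ]}}
    days_clause = {"bool": {"filter": [
        {"terms": {f"{path}.days.day": list(days)}}
    ]}}
    # Either clause may match, so both go into the outer should
    return {"query": {"nested": {
        "path": path,
        "query": {"bool": {"should": [date_clause, days_clause]}},
    }}}

body = date_or_days_query("2016-09-20", "2016-09-21", [2])
print(json.dumps(body, indent=2))
```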
As a complete example:
Create index and mapping
PUT http://localhost:9200/entities?pretty=true
{
"settings": {
"index.number_of_replicas": 0,
"index.number_of_shards": 1
},
"mappings": {
"entity": {
"properties": {
"id": {
"type": "integer"
},
"subEntity": {
"type": "nested",
"properties": {
"date": {
"type": "date"
},
"days": {
"properties": {
"day": {
"type": "short"
}
},
"type": "object"
}
}
}
}
}
}
}
Bulk index four entities
POST http://localhost:9200/_bulk?pretty=true
{"index":{"_index":"entities","_type":"entity","_id":"1"}}
{"subEntity":{"date":"2016-09-19T05:00:00+00:00"}}
{"index":{"_index":"entities","_type":"entity","_id":"2"}}
{"subEntity":{"date":"2016-09-20T05:00:00+00:00"}}
{"index":{"_index":"entities","_type":"entity","_id":"3"}}
{"subEntity":{"date":"2016-09-18T18:00:00+00:00","days":[{"day":2},{"day":5}]}}
{"index":{"_index":"entities","_type":"entity","_id":"4"}}
{"subEntity":{"date":"2016-09-18T18:00:00+00:00","days":[{"day":3},{"day":4}]}}
Issue the search query above
POST http://localhost:9200/entities/entity/_search?pretty=true
{
"query": {
"nested": {
"query": {
"bool": {
"should": [
{
"bool": {
"filter": [
{
"range": {
"subEntity.date": {
"gte": "2016-09-20",
"lte": "2016-09-21"
}
}
}
]
}
},
{
"bool": {
"filter": [
{
"terms": {
"subEntity.days.day": [
2
]
}
}
]
}
}
]
}
},
"path": "subEntity"
}
}
}
We should only get back entities with ids 2 and 3; id 2 matches on date and id 3 matches on day
{
"took" : 4,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.0,
"hits" : [ {
"_index" : "entities",
"_type" : "entity",
"_id" : "2",
"_score" : 0.0,
"_source" : {
"subEntity" : {
"date" : "2016-09-20T05:00:00+00:00"
}
}
}, {
"_index" : "entities",
"_type" : "entity",
"_id" : "3",
"_score" : 0.0,
"_source" : {
"subEntity" : {
"date" : "2016-09-18T18:00:00+00:00",
"days" : [ {
"day" : 2
}, {
"day" : 5
} ]
}
}
} ]
}
}

Your solution can easily be achieved with an "or" query, but from ES 2.0.0 onwards the "or" query is deprecated. Instead of the "or" query we can now use a "bool" query. A sample query is given below:
{
"query": {
"bool" : {
"should" : [
{
"term" : { "CREAT_DT": "2015-11-03T07:49:07.000Z" }
},
{
"term" : { "TableName": "dwd" }
}
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
}
More details about its usage can be found at the link below:
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-bool-query.html
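The sample sets minimum_should_match explicitly. As far as I understand the bool query documentation for recent Elasticsearch releases, the default is 1 when the bool query has at least one should clause and no must or filter clauses, and 0 otherwise; older versions may behave differently, so setting it explicitly is the safe choice. The rule can be sketched in Python as follows (the function name is made up for illustration):

```python
def effective_minimum_should_match(bool_query):
    """Return the explicit setting if present, else the documented default."""
    if "minimum_should_match" in bool_query:
        return bool_query["minimum_should_match"]
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    # Default: 1 only when should clauses stand alone, otherwise 0
    return 1 if has_should and not has_must_or_filter else 0

only_should = {"should": [{"term": {"TableName": "dwd"}}]}
with_filter = {
    "filter": [{"term": {"TableName": "dwd"}}],
    "should": [{"term": {"CREAT_DT": "2015-11-03T07:49:07.000Z"}}],
}

print(effective_minimum_should_match(only_should))  # 1
print(effective_minimum_should_match(with_filter))  # 0
```

In the second case the should clause only influences scoring, so an "or" written that way would silently stop filtering.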

Related

elasticsearch filter nested object

I have an index with a nested object containing two attributes namely scopeId and categoryName. Following is the mappings part of the index
"mappedCategories" : {
"type" : "nested",
"properties": {
"scopeId": {"type":"long"},
"categoryName": {"type":"text",
"analyzer" : "productSearchAnalyzer",
"search_analyzer" : "productSearchQueryAnalyzer"}
}
}
A sample document containing the nested mappedCategories object is as follows:
POST productsearchna_2/_doc/1
{
"categoryName" : "Operating Systems",
"contexts" : [
0
],
"countryCode" : "US",
"id" : "10076327-1",
"languageCode" : "EN",
"localeId" : 1,
"mfgpartno" : "test123",
"manufacturerName" : "Hewlett Packard Enterprise",
"productDescription" : "HPE Microsoft Windows 2000 Datacenter Server - Complete Product - Complete Product - 1 Server - Standard",
"productId" : 10076327,
"skus" : [
{"sku": "43233004",
"skuName": "UNSPSC"},
{"sku": "43233049",
"skuName": "SP Richards"},
{"sku": "43234949",
"skuName": "Ingram Micro"}
],
"mappedCategories" : [
{"scopeId": 3228552,
"categoryName": "Laminate Bookcases"},
{"scopeId": 3228553,
"categoryName": "Bookcases"},
{"scopeId": 3228554,
"categoryName": "Laptop"}
]
}
I want to filter categoryName "lap" on scopeId: 3228553 i.e. my query should return 0 hits since Laptop is mapped to scopeId 3228554. But my following query is returning 1 hit with scopeId : 3228554
POST productsearchna_2/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "mappedCategories",
"query": {
"term": {
"mappedCategories.categoryName": "lap"
}
},
"inner_hits": {}
}
}
],
"filter": [
{
"nested": {
"path": "mappedCategories",
"query": {
"term": {
"mappedCategories.scopeId": {
"value": 3228552
}
}
}
}
}
]
}
},
"_source": ["mappedCategories.categoryName", "productId"]
}
Following is part of the result of the query:
"inner_hits" : {
"mappedCategories" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.5586993,
"hits" : [
{
"_index" : "productsearchna_2",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "mappedCategories",
"offset" : 2
},
"_score" : 1.5586993,
"_source" : {
"scopeId" : 3228554,
"categoryName" : "Laptop"
}
}
]
}
}
I want my query to return zero hits, and in case I search for "book" with scopeId: 3228552, I want my query to return 2 hits, 1 for Bookcases and another for Laminate Bookcases categoryNames. Please help.
This query solves part of the problem, but when searching for "book" with scopeId: 3228552 it will only get 1 result.
GET idx_test/_search?filter_path=hits.hits.inner_hits
{
"query": {
"nested": {
"path": "mappedCategories",
"query": {
"bool": {
"filter": [
{
"term": {
"mappedCategories.scopeId": {
"value": 3228553
}
}
}
],
"must": [
{
"match": {
"mappedCategories.categoryName": "laptop"
}
}
]
}
},
"inner_hits": {}
}
}
}
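The difference between the two requests can be imitated in plain Python. This is only a sketch under assumptions: name_matches is a stand-in for whatever productSearchAnalyzer does (here, a case-insensitive prefix match on each word), and the function names are invented. It shows that two separate nested clauses may each be satisfied by a different nested object, while a single nested clause requires one object to satisfy both conditions.

```python
categories = [
    {"scopeId": 3228552, "categoryName": "Laminate Bookcases"},
    {"scopeId": 3228553, "categoryName": "Bookcases"},
    {"scopeId": 3228554, "categoryName": "Laptop"},
]

def name_matches(category, text):
    # Assumption: stand-in for the custom analyzer, prefix match per word
    words = category["categoryName"].lower().split()
    return any(w.startswith(text.lower()) for w in words)

def separate_nested_clauses(cats, text, scope_id):
    """Two nested queries: each may be satisfied by a different object."""
    return (any(name_matches(c, text) for c in cats)
            and any(c["scopeId"] == scope_id for c in cats))

def single_nested_clause(cats, text, scope_id):
    """One nested query: return the objects that satisfy both conditions."""
    return [c for c in cats if name_matches(c, text) and c["scopeId"] == scope_id]

print(separate_nested_clauses(categories, "lap", 3228553))  # True
print(single_nested_clause(categories, "lap", 3228553))     # []
print([c["categoryName"]
       for c in single_nested_clause(categories, "book", 3228552)])  # ['Laminate Bookcases']
```

The last line also shows why "book" with scopeId 3228552 yields a single inner hit: in the sample document only "Laminate Bookcases" carries that scopeId.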

Elastic search dynamic field mapping with range query on price field

I have two fields in my Elasticsearch index: lowest_local_price and lowest_global_price.
I want to map a dynamic value to a third field, price, at run time based on the local or global country.
If the local country matches, I want to map the lowest_local_price value to the price field.
If the global country matches, I want to map the lowest_global_price value to the price field.
If the local or global country matches, I want to apply a range query on the price field and boost that doc by 2.0.
Note: this is not a compulsory filter or query; if it matches, I just want to boost the doc.
I have tried the solutions below, but they do not work for me.
Query 1:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price]]
],
"boost" => 2.0
]
]
];
Query 2:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price, "boost" => 2.0]]
],
]
]
];
Neither of them works for me, because the doc does not get boosted. I know filter does not work with boost, so what is the solution for dynamic field mapping with a range query and boost?
Please help me solve this.
Thank you in advance!
You can (most likely) achieve what you want without runtime_mappings by using a combination of bool queries, here's how.
Let's define test mapping
We need to clarify what mapping we are working with, because different field types require different query types.
Let's assume that your mapping looks like this:
PUT my-index-000001
{
"mappings": {
"dynamic": "runtime",
"properties": {
"country_en_name": {
"type": "text"
},
"lowest_local_price": {
"type": "float"
},
"global_rates": {
"properties": {
"UK": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"FR": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"US": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
}
}
}
}
}
}
Note that country_en_name is of type text. In general such fields should be indexed as keyword, but to demonstrate the use of runtime_mappings I kept it as text and will show later how to overcome this limitation.
bool is the same as if for Elasticsearch
The query without runtime mappings might look like this:
POST my-index-000001/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"country_en_name": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
This can be interpreted as the following:
Any document
OR (
(document with country_en_name=UK AND lowest_local_price >= X)
OR
(document with global_rates.UK.lowest_global_price >= X)
)[boost this part of OR]
The match_all is needed to return also documents that do not match the other queries.
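Since the country and the minimum price change per request, the body can be generated. Below is a minimal Python sketch that produces the same structure as a dict; the function name boosted_price_query and its parameters are assumptions for illustration, and the field names follow the test mapping assumed above.

```python
import json

def boosted_price_query(country, min_price, boost=2.0):
    """Return every document, boosting those whose price matches."""
    # Local price applies only when the document's own country matches
    local = {"bool": {"must": [
        {"match": {"country_en_name": country}},
        {"range": {"lowest_local_price": {"gte": min_price}}},
    ]}}
    # Global price is looked up under the requested country's key
    global_price = {"range": {
        f"global_rates.{country}.lowest_global_price": {"gte": min_price}
    }}
    return {"query": {"bool": {"should": [
        {"match_all": {}},  # keeps non-matching documents in the result
        {"bool": {"should": [local, global_price], "boost": boost}},
    ]}}}

body = boosted_price_query("UK", 1000)
print(json.dumps(body, indent=2))
```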
What will the response of the query look like?
Let's put some documents in the ES:
POST my-index-000001/_doc/1
{
"country_en_name": "UK",
"lowest_local_price": 1500,
"global_rates": {
"FR": {
"lowest_global_price": 1000
},
"US": {
"lowest_global_price": 1200
}
}
}
POST my-index-000001/_doc/2
{
"country_en_name": "FR",
"lowest_local_price": 900,
"global_rates": {
"UK": {
"lowest_global_price": 950
},
"US": {
"lowest_global_price": 1500
}
}
}
POST my-index-000001/_doc/3
{
"country_en_name": "US",
"lowest_local_price": 950,
"global_rates": {
"UK": {
"lowest_global_price": 1100
},
"FR": {
"lowest_global_price": 1000
}
}
}
Now the result of the search query above will be something like:
{
...
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 4.9616585,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 4.9616585,
"_source" : {
"country_en_name" : "UK",
"lowest_local_price" : 1500,
...
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 3.0,
"_source" : {
"country_en_name" : "US",
"lowest_local_price" : 950,
"global_rates" : {
"UK" : {
"lowest_global_price" : 1100
},
...
}
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"country_en_name" : "FR",
"lowest_local_price" : 900,
"global_rates" : {
"UK" : {
"lowest_global_price" : 950
},
...
}
}
}
]
}
}
Note that the document with _id 2 is at the bottom because it didn't match any of the boosted queries.
Will runtime_mappings be of any use?
Runtime mappings are useful when an existing mapping has data types that do not permit a certain type of query. In versions before 7.11 one would have to reindex in such cases, but now it is possible to use runtime mappings instead (at the cost of a more expensive query).
In our case, country_en_name is indexed as text, which is suited for full-text search and not for exact lookups; keyword would be the better choice. This is how the query may look with the help of runtime_mappings:
POST my-index-000001/_search
{
"runtime_mappings": {
"country_en_name_keyword": {
"type": "keyword",
"script": {
"source": "emit(params['_source']['country_en_name'])"
}
}
},
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"country_en_name_keyword": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
Notice how we created a new runtime field country_en_name_keyword with type keyword and used a term lookup instead of match query.

Include parent _source fields in nested top hits aggregation

I am trying to aggregate on a field and get the top records using top_hits, but I want to include other fields in the response which are not part of the nested property mapping. Currently, if I specify _source:{"include":[]}, I can only get the fields that are in the current nested property.
Here is my mapping
{
"my_cart":{
"mappings":{
"properties":{
"store":{
"properties":{
"name":{
"type":"keyword"
}
}
},
"sales":{
"type":"nested",
"properties":{
"Price":{
"type":"float"
},
"Time":{
"type":"date"
},
"product":{
"properties":{
"name":{
"type":"text",
"fields":{
"keyword":{
"type":"keyword"
}
}
}
}
}
}
}
}
}
}
}
UPDATE
Joe's answer solved my above issue.
My current issue with the response is that, although I am getting the product name as "key" along with other details, the hits also contain the other product names that were part of the same transaction in the billing receipt. I want to aggregate on the product's name and find the last sold date of each product along with other details such as price, quantity, etc.
Current Response
"aggregations" : {
"aggregate_by_most_sold_product" : {
"doc_count" : 2878592,
"all_products" : {
"buckets" : [
{
"key" : "shampoo",
"doc_count" : 1,
"lastSold" : {
"value" : 1.602569793E12,
"value_as_string" : "2018-10-13T06:16:33.000Z"
},
"using_reverse_nested" : {
"doc_count" : 1,
"latest product" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "my_cart",
"_type" : "_doc",
"_id" : "36303258-9r7w-4b3e-ba3d-fhds7cfec7aa",
"_source" : {
"cashier" : {
"firstname" : "romeo",
"uuid" : "2828dhd-0911-7229-a4f8-8ab80dde86a6"
},
"product_price": {
"price":20,
"discount_offered":10
},
"sales" : [
{
"product" : {
"name" : "shampoo",
"time":"2018-10-13T04:44:26+00:00"
},
"product" : {
"name" : "noodles",
"time":"2018-10-13T04:42:26+00:00"
},
"product" : {
"name" : "biscuits",
"time":"2018-10-13T04:41:26+00:00"
}
}
]
}
}
]
}
}
]
Expected Response
It gives me all the product names in that transaction, which increases the bucket size. I only want a single product name with the last date sold, along with other details for each product.
My aggregation is the same as the aggregation in Joe's answer.
I would also like to know whether I can add scripts to perform calculations on the fields I get in _source.
Ex: price - discount_offered = final amount.
The nested context does not have access to the parent unless you use reverse_nested. In that case, however, you've lost the ability to only retrieve the applicable nested subdocument. But there is luckily a way to sort a terms aggregation by the result of a different, numeric one:
GET my_cart/_search
{
"size": 0,
"aggs": {
"aggregate": {
"nested": {
"path": "sales"
},
"aggs": {
"all_products": {
"terms": {
"field": "sales.product.name.keyword",
"size": 6500,
"order": { <--
"lowest_date": "asc"
}
},
"aggs": {
"lowest_date": { <--
"min": {
"field": "sales.Time"
}
},
"using_reverse_nested": {
"reverse_nested": {}, <--
"aggs": {
"latest product": {
"top_hits": {
"_source": {
"includes": [
"store.name"
]
},
"size": 1
}
}
}
}
}
}
}
}
}
}
The caveat is that you won't be getting the store.name inside of the top_hits -- though I suspect you're probably already doing some post-processing on the client side where you could combine those entries:
"aggregate" : {
...
"all_products" : {
...
"buckets" : [
{
"key" : "myproduct", <--
...
"using_reverse_nested" : {
...
"latest product" : {
"hits" : {
...
"hits" : [
{
...
"_source" : {
"store" : {
"name" : "mystore" <--
}
}
}
]
}
}
},
"lowest_date" : {
"value" : 1.4200704E12,
"value_as_string" : "2015/01/01" <--
}
}
]
}
}
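The client-side post-processing mentioned above could look roughly like the following Python sketch. It assumes the aggregation names used in the request (aggregate, all_products, using_reverse_nested, "latest product", lowest_date) and the response shape shown; the function name summarize_buckets and the sample values are placeholders.

```python
def summarize_buckets(aggregations):
    """Flatten each terms bucket into one row: product, date, store name."""
    rows = []
    for bucket in aggregations["aggregate"]["all_products"]["buckets"]:
        hits = bucket["using_reverse_nested"]["latest product"]["hits"]["hits"]
        # top_hits has size 1, so there is at most one parent document
        store = hits[0]["_source"]["store"]["name"] if hits else None
        rows.append({
            "product": bucket["key"],
            "date": bucket["lowest_date"].get("value_as_string"),
            "store": store,
        })
    return rows

# Placeholder response fragment in the shape shown above
sample = {"aggregate": {"all_products": {"buckets": [{
    "key": "myproduct",
    "using_reverse_nested": {"latest product": {"hits": {"hits": [
        {"_source": {"store": {"name": "mystore"}}}
    ]}}},
    "lowest_date": {"value": 1.4200704e12, "value_as_string": "2015/01/01"},
}]}}}

print(summarize_buckets(sample))
```

A derived value such as price minus discount_offered can be computed in the same loop once those fields are included in the top_hits _source.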

SQL equivalent Correlated query for ElasticSearch Aggregation

I have a use case for writing an aggregation which, if written in SQL, can be achieved using correlated queries.
I have an index called listings where the properties/columns are ListDate, ListPrice, SoldDate, SoldPrice, OffMarketDate.
ListDate is not nullable, but SoldDate, SoldPrice and OffMarketDate can be null.
I want to aggregate stats from the above index based on the following requirement.
I want to have monthly stats, which I see can be achieved by DateHistogramAggregation.
For each month from the DateHistogramAggregation, I want to find the listings as follows:
Example: For Jan 2019, get all the listings where (ListDate < Feb 1st, 2019) and (SoldDate is null or SoldDate < Jan 1st, 2019) and (OffMarketDate is null or OffMarketDate < Jan 1st, 2019)
Then run the aggregation function for those listings each month.
I appreciate any suggestions to implement this use case. Thanks in advance for the help.
Please see the below details and info as how you can approach this problem:
Mapping:
PUT listings
{
"mappings": {
"properties": {
"listDate":{
"type": "date"
},
"listPrice":{
"type": "long"
},
"soldDate":{
"type": "date"
},
"soldPrice": {
"type": "long"
},
"offMarketDate": {
"type": "date"
}
}
}
}
Note that I've constructed the above mapping looking at your question.
Sample Documents:
POST listings/_doc/1
{
"listDate": "2020-01-01",
"listPrice": "100.00",
"soldDate": "2019-12-25",
"soldPrice": "120.00",
"offMarketDate": "2019-12-20"
}
POST listings/_doc/2
{
"listDate": "2020-01-01",
"listPrice": "100.00",
"soldDate": "2019-12-24",
"soldPrice": "122.00",
"offMarketDate": "2019-12-20"
}
POST listings/_doc/3
{
"listDate": "2020-01-25",
"listPrice": "120.00",
"soldDate": "2020-01-30",
"soldPrice": "140.00",
"offMarketDate": "2020-01-26"
}
POST listings/_doc/4
{
"listDate": "2020-01-25",
"listPrice": "120.00",
"soldDate": "2020-02-02",
"soldPrice": "135.00",
"offMarketDate": "2020-01-26"
}
POST listings/_doc/5
{
"listDate": "2020-01-25",
"listPrice": "120.00"
}
POST listings/_doc/6
{
"listDate": "2020-02-02",
"listPrice": "120.00"
}
Note that I've left soldDate and offMarketDate out of docs 5 and 6, as that is a better option than storing them with a null value.
Request Query:
So I've come up with the below query for your use-case.
Also, for the sake of the aggregation, let's say I calculate the total soldPrice for the docs having
listDate in the month of Jan 2020 AND
(soldDate either null OR soldDate before the month of Jan 2020) AND
(offMarketDate either null OR offMarketDate before the month of Jan 2020).
Below is the query:
POST listings/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"listDate": {
"gte": "2020-01-01",
"lt": "2020-02-01"
}
}
},
{
"bool": {
"should": [
{
"range": {
"soldDate": {
"lt": "2020-01-01"
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "soldDate"
}
}
]
}
}
],
"minimum_should_match": 1
}
},
{
"bool": {
"should": [
{
"range": {
"offMarketDate": {
"lt": "2020-01-01"
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "offMarketDate"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
]
}
},
"aggs": {
"my_histogram": {
"date_histogram": {
"field": "listDate",
"calendar_interval": "month"
},
"aggs": {
"total_sales_price": {
"sum": {
"field": "soldPrice"
}
}
}
}
}
}
The query above is easy to read and self-explanatory. I'd suggest reading about the queries and aggregations I've made use of:
Bool Query
Range Query
Field Exists Query to verify if field exists or not.
Date Histogram Aggregation
Sum Metrics Aggregation
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 3.0,
"hits" : [
{
"_index" : "listings",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.0,
"_source" : {
"listDate" : "2020-01-01",
"listPrice" : "100.00",
"soldDate" : "2019-12-25",
"soldPrice" : "120.00",
"offMarketDate" : "2019-12-20"
}
},
{
"_index" : "listings",
"_type" : "_doc",
"_id" : "2",
"_score" : 3.0,
"_source" : {
"listDate" : "2020-01-01",
"listPrice" : "100.00",
"soldDate" : "2019-12-24",
"soldPrice" : "122.00",
"offMarketDate" : "2019-12-20"
}
},
{
"_index" : "listings",
"_type" : "_doc",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"listDate" : "2020-01-25",
"listPrice" : "120.00"
}
}
]
},
"aggregations" : {
"my_histogram" : {
"buckets" : [
{
"key_as_string" : "2020-01-01T00:00:00.000Z",
"key" : 1577836800000,
"doc_count" : 3,
"total_sales_price" : {
"value" : 242.0
}
}
]
}
}
}
As expected, documents 1,2 and 5 are showing up with the correct aggregated sum of soldPrice.
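The query is written for a single month. To get stats for each month, one option is to generate the same query per month and run it once for each; the Python sketch below only builds the query part as a dict. The helper names (month_bounds, before_or_missing, month_query) are made up for illustration, and it uses strict upper bounds (lt), following the "<" comparisons in the question.

```python
from datetime import date

def month_bounds(year, month):
    """Return (first day of the month, first day of the next month) as ISO strings."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start.isoformat(), end.isoformat()

def before_or_missing(field, bound):
    """Match documents where the field is absent or strictly before the bound."""
    return {"bool": {
        "should": [
            {"range": {field: {"lt": bound}}},
            {"bool": {"must_not": [{"exists": {"field": field}}]}},
        ],
        "minimum_should_match": 1,
    }}

def month_query(year, month):
    start, end = month_bounds(year, month)
    return {"bool": {"must": [
        {"range": {"listDate": {"gte": start, "lt": end}}},
        before_or_missing("soldDate", start),
        before_or_missing("offMarketDate", start),
    ]}}

for m in (1, 2, 3):
    # Show the listDate window generated for each month
    print(month_query(2020, m)["bool"]["must"][0])
```

Each generated dict goes under "query" in the search body, with the sum aggregation attached as before.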
Hope that helps!

Elasticsearch filter by multiple fields in an object which is in an array field

The goal is to filter products with multiple prices.
The data looks like this:
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
I would like to filter by membershipLevel and price. For example, if I am a silver member and query price range 0-10, the product should not appear, but if I am a gold member, the product "a" should appear. Is this kind of query supported by Elasticsearch?
You need to make use of nested datatype for price and make use of nested query for your use case.
Please see the below mapping, sample document, query and response:
Mapping:
PUT my_price_index
{
"mappings": {
"properties": {
"name":{
"type":"text"
},
"price":{
"type":"nested",
"properties": {
"membershipLevel":{
"type":"keyword"
},
"price":{
"type":"double"
}
}
}
}
}
}
Sample Document:
POST my_price_index/_doc/1
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
Query:
POST my_price_index/_search
{
"query": {
"nested": {
"path": "price",
"query": {
"bool": {
"must": [
{
"term": {
"price.membershipLevel": "Gold"
}
},
{
"range": {
"price.price": {
"gte": 0,
"lte": 10
}
}
}
]
}
},
"inner_hits": {} <---- Do note this.
}
}
}
The above query means: return all the documents that have a nested price entry whose price.price is in the range 0 to 10 and whose price.membershipLevel is Gold.
Notice that I've made use of inner_hits. The reason is that, even though price is a nested field, ES returns the entire document in the response rather than only the nested documents that matched the query clause.
In order to find the exact nested doc that has been matched, you would need to make use of inner_hits.
Below is how the response would return.
Response:
{
"took" : 128,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9808291,
"_source" : {
"name" : "a",
"price" : [
{
"membershipLevel" : "Gold",
"price" : "5"
},
{
"membershipLevel" : "Silver",
"price" : "50"
},
{
"membershipLevel" : "Bronze",
"price" : "100"
}
]
},
"inner_hits" : {
"price" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "price",
"offset" : 0
},
"_score" : 1.9808291,
"_source" : {
"membershipLevel" : "Gold",
"price" : "5"
}
}
]
}
}
}
}
]
}
}
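On the client, the matched nested entries can be pulled out of inner_hits. The following Python sketch assumes the response has already been parsed into a dict with the shape shown above; the function name matched_nested_docs is invented, and the sample is a trimmed copy of that response.

```python
def matched_nested_docs(response, path):
    """Collect (parent id, array offset, nested source) for every inner hit."""
    matches = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        for nested in inner:
            matches.append((hit["_id"], nested["_nested"]["offset"], nested["_source"]))
    return matches

# Trimmed copy of the response above, keeping only the fields that are read
response = {"hits": {"hits": [{
    "_id": "1",
    "inner_hits": {"price": {"hits": {"hits": [{
        "_nested": {"field": "price", "offset": 0},
        "_source": {"membershipLevel": "Gold", "price": "5"},
    }]}}},
}]}}

print(matched_nested_docs(response, "price"))
```

The offset is the position of the matched entry inside the parent's price array, which is handy if you need to map it back to the full _source.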
Hope this helps!
Let me show you how to do it using nested fields together with query and filter context. I will use your example to show how to define the index mapping, index sample documents, and write the search query.
It's important to note the include_in_parent param in the Elasticsearch mapping, which allows us to query these nested fields without using a nested query.
Please refer to Elasticsearch documentation about it.
If true, all fields in the nested object are also added to the parent
document as standard (flat) fields. Defaults to false.
Index Def
{
"mappings": {
"properties": {
"product": {
"type": "nested",
"include_in_parent": true
}
}
}
}
Index sample docs
{
"product": {
"price" : 5,
"membershipLevel" : "Gold"
}
}
{
"product": {
"price" : 50,
"membershipLevel" : "Silver"
}
}
{
"product": {
"price" : 100,
"membershipLevel" : "Bronze"
}
}
Search query to show Gold with price range 0-10
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Gold"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "so-60620921-nested",
"_type": "_doc",
"_id": "1",
"_score": 1.0296195,
"_source": {
"product": {
"price": 5,
"membershipLevel": "Gold"
}
}
}
]
Search query to exclude Silver, with same price range
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Silver"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
The above query doesn't return any results, as no document matches.
P.S :- this SO answer might help you to understand nested fields and query on them in detail.
You have to use nested fields and a nested query to achieve this: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
Define your price property with type "nested" and then you will be able to filter by every property of the nested object.
