How to return 0 for requested data in aggregation if no documents matched - elasticsearch

I have an index of users with structure:
User
book_ids:[] //array of book ids
books : [{
book_id:
name:
}] //array of books
I want to create a query that returns a map of Book Id and number of users that read it.
The result of the query should include books that are not used by any user.
I have a very simplified version of the query:
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"books",
"query": {
"bool": {
"must": {
"terms": {
"books.book_id": [100,200] //book ids that provided as a parameter
}
}
}
}
}
}
]
}
},
"aggs":{
"books":{
"terms":{
"field":"book_ids",
"include":[100,200] //book ids that provided as a parameter
}
}
},
"size":0
}
The result of the query will be
buckets: [
{key: 100, doc_count: 53}
]
So 53 users read the book with id 100, but no user read the book with id 200 (it is missing from the response).
The question is: how can I change the query to get the following result?
buckets: [
{key: 100, doc_count: 53},
{key: 200, doc_count: 0}
]

A terms aggregation doesn't add a bucket to the result if the given term does not exist in the index.
You can use a filters aggregation for this purpose:
{
"query": {
...
},
"aggs": {
"books": {
"filters": {
"filters": {
"100": { "match": { "book_ids": 100 } },
"200": { "match": { "book_ids": 200 } }
}
}
}
},
"size": 0
}
To reproduce
# index some book ids; id 5 is deliberately missing
POST /_bulk
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [1, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [4, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [6, 2, 3] }
{ "index" : { "_index" : "72201832" } }
{ "book_ids": [7, 2, 3] }
GET /72201832/_search
{
"size": 0,
"aggs": {
"books": {
"filters": {
"filters": {
"1": { "term": {"book_ids": "1"} },
"2": { "term": {"book_ids": "2"} },
"3": { "term": {"book_ids": "3"} },
"4": { "term": {"book_ids": "4"} },
"5": { "term": {"book_ids": "5"} },
"6": { "term": {"book_ids": "6"} },
"7": { "term": {"book_ids": "7"} }
}
}
}
}
}
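When the book ids arrive as a parameter, the filters aggregation can be generated instead of written by hand. The following is a minimal Python sketch, not part of the original answer: the helper names `build_filters_agg` and `counts_from_response` are hypothetical, and `sample_response` is a hand-written dictionary shaped like a keyed filters response, using the counts from the question.

```python
import json


def build_filters_agg(field, ids):
    """Build a filters aggregation with one named bucket per requested id."""
    return {
        "filters": {
            "filters": {str(i): {"term": {field: i}} for i in ids}
        }
    }


def counts_from_response(response, agg_name="books"):
    """Turn the keyed buckets of a filters aggregation into {id: doc_count}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {int(key): bucket["doc_count"] for key, bucket in buckets.items()}


# Request body for the ids passed in as a parameter.
body = {
    "size": 0,
    "aggs": {"books": build_filters_agg("book_ids", [100, 200])},
}
print(json.dumps(body, indent=2))

# Hand-written sample (an assumption), shaped like a keyed filters response.
sample_response = {
    "aggregations": {
        "books": {
            "buckets": {
                "100": {"doc_count": 53},
                "200": {"doc_count": 0},
            }
        }
    }
}
counts = counts_from_response(sample_response)
print(counts)
```

Because every requested id gets its own named filter, a bucket is present even when its doc_count is 0.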

Related

Elasticsearch filter by nested fields

I have a problem with creating a query to Elasticsearch with many conditions. My model looks like:
data class Product(
@Id
val id: String? = null,
val category: String,
val imagesUrls: List<String>,
@Field(type = FieldType.Double)
val price: Double?,
@Field(type = FieldType.Nested)
val parameters: List<Parameter>?
)
data class Parameter(
val key: String,
val values: List<String>
)
I would like to query products by:
category (for example cars)
price (between 20k $ and 50k $)
and parameters -> For example products with many parameters, like key capacity values 4L, 5L and second parameter gear transmission values manual
My current query looks like this:
GET data/_search
{
"size": 10,
"query": {
"bool": {
"must": [
{
"term": {
"category.keyword": {
"value": "cars"
}
}
},
{
"nested": {
"path": "parameters",
"query": {
"bool": {
"must": [
{"term": {
"parameters.key.keyword": {
"value": "Capacity"
}
}},
{
"term": {
"parameters.key": {
"value": "4L, 5L"
}
}
}
]
}
}
}
}
]
}
}
}
Could you tell me how to filter products where the parameter key equals Capacity and the values list contains at least one of the given values?
How can I combine several conditions of this kind in one query?
Example data:
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"capacity",
"values":"4L"
},
{
"key":"gear transmission",
"values":"automcatic"
}
]
}
The search query shown below queries the data based on:
category (for example cars)
And parameters -> For example products with many parameters, like key capacity values 4L, 5L and second parameter gear transmission values manual
Adding a working example with index data, mapping, search query, and search result:
Index Mapping:
{
"mappings": {
"properties": {
"parameters": {
"type": "nested"
}
}
}
}
Index Data:
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"gear transmission",
"values":["4L","5L"]
},
{
"key":"capacity",
"values":"automcatic"
}
]
}
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"capacity",
"values":["4L","5L"]
},
{
"key":"gear transmission",
"values":"automcatic"
}
]
}
{
"category":"cars",
"name":"Ferrari",
"price":50000,
"parameters":[
{
"key":"capacity",
"values":"4L"
},
{
"key":"gear transmission",
"values":"automcatic"
}
]
}
Search Query:
{
"query": {
"bool": {
"must": [
{
"term": {
"category.keyword": {
"value": "cars"
}
}
},
{
"nested": {
"path": "parameters",
"query": {
"bool": {
"must": [
{
"match": {
"parameters.key": "capacity"
}
},
{
"terms": {
"parameters.values": [
"4l",
"5l"
]
}
}
]
}
}
}
},
{
"nested": {
"path": "parameters",
"query": {
"bool": {
"must": [
{
"match": {
"parameters.key": "gear transmission"
}
},
{
"match": {
"parameters.values": "automcatic"
}
}
]
}
}
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "bstof",
"_type": "_doc",
"_id": "1",
"_score": 3.9281754,
"_source": {
"category": "cars",
"name": "Ferrari",
"price": 50000,
"parameters": [
{
"key": "capacity",
"values": "4L"
},
{
"key": "gear transmission",
"values": "automcatic"
}
]
}
},
{
"_index": "bstof",
"_type": "_doc",
"_id": "2",
"_score": 3.9281754,
"_source": {
"category": "cars",
"name": "Ferrari",
"price": 50000,
"parameters": [
{
"key": "capacity",
"values": [
"4L",
"5L"
]
},
{
"key": "gear transmission",
"values": "automcatic"
}
]
}
}
]
When you need to match any one value from a list, use a terms query instead of term. Change this part of the query from:
{
"term": {
"parameters.key": {
"value": "4L, 5L"
}
}
}
to below:
{
"terms": {
"parameters.values": [
"4L",
"5L"
]
}
}
Note that if parameters.values is an analyzed text field and it has a keyword sub-field, use the sub-field instead, e.g. parameters.values.keyword
You can read more in the documentation for the terms query.
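To combine several parameter conditions, one nested clause per parameter can be generated and placed in the outer bool. The sketch below is only an illustration in Python: the helper names are hypothetical, it assumes a `parameters.values.keyword` sub-field exists (as discussed above), and the price range of 20000 to 50000 is taken from the question rather than from the answers.

```python
import json


def nested_param_clause(key, values):
    """One nested clause: the same parameter object must have this key
    and at least one of the given values."""
    return {
        "nested": {
            "path": "parameters",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"parameters.key": key}},
                        # Assumes a keyword sub-field exists on parameters.values.
                        {"terms": {"parameters.values.keyword": values}},
                    ]
                }
            },
        }
    }


def build_product_query(category, min_price, max_price, params):
    """Combine category, price range and one nested clause per parameter."""
    must = [
        {"term": {"category.keyword": {"value": category}}},
        {"range": {"price": {"gte": min_price, "lte": max_price}}},
    ]
    must.extend(nested_param_clause(k, v) for k, v in params.items())
    return {"query": {"bool": {"must": must}}}


query = build_product_query(
    "cars",
    20000,
    50000,
    {"capacity": ["4L", "5L"], "gear transmission": ["manual"]},
)
print(json.dumps(query, indent=2))
```

Each parameter gets its own nested clause so that the key and its values are checked against the same nested object, while different parameters may be satisfied by different objects.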

Elasticsearch aggregation over children document field values

I'm facing the following problem of selecting and sorting parent documents based on an aggregated value over its children documents. The aggregation (e.g. sum) itself depends on a query string, i.e. which children documents are relevant for the aggregation.
Example: Given the documents basket A and basket B, for each basket document, I am looking to sum over the number field of its fruit children if the name field matches my query, e.g. apples.
PUT /baskets/_doc/0
{
"name": "basket A",
"fruit": [
{
"name": "apples",
"number": 2
},
{
"name": "oranges",
"number": 3
}
]
}
PUT /baskets/_doc/1
{
"name": "basket B",
"fruit": [
{
"name": "apples",
"number": 3
},
{
"name": "apples",
"number": 3
}
]
}
Mappings:
PUT /baskets
{
"mappings": {
"properties": {
"name": { "type": "text" },
"fruit": {
"type": "nested",
"properties": {
"name": { "type": "text" },
"number": { "type": "long" }
}
}
}
}
}
Use case 1: Which basket has (strictly) more than 5 apples? Would expect only basket B
Use case 2: Sort baskets by number of apples. Would expect basket B with a total of 6 apples, then basket A with a total of 2 apples.
How can one implement this using the Elasticsearch (7.8.0) query DSL?
I have tried so far with nested queries and aggregations without success.
Thanks!
Edit: Added mappings
Edit: Updated the numbers to better reflect the problem
Edit: Added a possible answer to use case 2 (see the comments on the answer from @joe):
GET /baskets/_search
{
"aggs": {
"aggs_baskets": {
"terms": {
"field": "name",
"order": {"nest > fruit_filter > fruit_sum": "desc"}
},
"aggs": {
"nest":{
"nested":{
"path": "fruit"
},
"aggs":{
"fruit_filter":{
"filter": {
"term": {"fruit.name": "apples"}
},
"aggs":{
"fruit_sum":{
"sum": {"field": "fruit.number"}
}
}
}
}
}
}
}
}
}
Use case 1:
GET baskets/_search
{
"query": {
"nested": {
"path": "fruit",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"term": {
"fruit.name": {
"value": "apples"
}
}
},
{
"range": {
"fruit.number": {
"gte": 5
}
}
}
]
}
}
}
}
}
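Use gt for strictly more than 5 and gte for 5 or more. Note that this range is checked against each nested fruit document individually, not against the per-basket sum.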
Also notice the inner_hits part -- this gives you the actual nested subdocument which caused this particular basket to match the query. It's not required but good-to-know.
Use case 2:
GET baskets/_search
{
"sort": [
{
"fruit.number": {
"nested_path": "fruit",
"order": "desc"
}
}
]
}
Use case 2 Edit:
There are probably cleaner ways of doing this but I'd go with the following:
GET baskets/_search
{
"size": 0,
"aggs": {
"multiply_and_add": {
"scripted_metric": {
"params": {
"only_fruit_name": "apples"
},
"init_script": "state.by_basket_name = [:]",
"map_script": """
def basket_name = params._source['name'];
def fruits = params._source['fruit'].findAll(group -> group.name == params.only_fruit_name);
for (def fruit_group : fruits) {
def number = fruit_group.number;
if (state.by_basket_name.containsKey(basket_name)) {
state.by_basket_name[basket_name] += number;
} else {
state.by_basket_name[basket_name] = number;
}
}
""",
"combine_script": "return state.by_basket_name",
"reduce_script": "return states"
}
}
}
}
yielding a hash map along the lines of
{
...
"aggregations":{
"multiply_and_add":{
"value":[
{
"basket A":2,
"basket B":6
}
]
}
}
}
Sorting can either be done in the reduce_script or within your ES response post-processing pipeline. You could of course choose to go w/ (sorted) lists and lambdas...
Notice the required nested_path in the sort clause of the first use case 2 query.
After some searching and testing, here are (in addition to @joe's answer to use case 2) possible queries for both use cases. Note that both use cases require changing the mapping of the field name to type keyword.
Use case 1: Which basket has (strictly) more than 5 apples? Would expect only basket B
For more information on filtering results by their aggregation value see Bucket Selectors
GET /baskets/_search
{
"aggs": {
"aggs_baskets": {
"terms": {
"field": "name"
},
"aggs": {
"nest":{
"nested":{
"path": "fruit"
},
"aggs":{
"fruit_filter":{
"filter": {
"match": {"fruit.name": "apples"}
},
"aggs":{
"fruit_sum":{
"sum": {"field": "fruit.number"}
}
}
}
}
},
"basket_sum_filter":{
"bucket_selector":{
"buckets_path":{
"fruitSum":"nest > fruit_filter > fruit_sum"
},
"script":"params.fruitSum > 5"
}
}
}
}
}
}
... yielding
...,
"buckets": [
{
"key": "basket B",
"doc_count": 1,
"nest": {
"doc_count": 2,
"fruit_filter": {
"doc_count": 2,
"fruit_sum": {
"value": 6
}
}
}
}
]
Use case 2: Sort baskets by number of apples. Would expect basket B with a total of 6 apples, then basket A with a total of 2 apples.
GET /baskets/_search
{
"aggs": {
"aggs_baskets": {
"terms": {
"field": "name",
"order": {"nest > fruit_filter > fruit_sum": "desc"}
},
"aggs": {
"nest":{
"nested":{
"path": "fruit"
},
"aggs":{
"fruit_filter":{
"filter": {
"term": {"fruit.name": "apples"}
},
"aggs":{
"fruit_sum":{
"sum": {"field": "fruit.number"}
}
}
}
}
}
}
}
}
}
... yielding
...,
"buckets": [
{
"key": "basket B",
"doc_count": 1,
"nest": {
"doc_count": 2,
"fruit_filter": {
"doc_count": 2,
"fruit_sum": {
"value": 6
}
}
}
},
{
"key": "basket A",
"doc_count": 1,
"nest": {
"doc_count": 2,
"fruit_filter": {
"doc_count": 1,
"fruit_sum": {
"value": 2
}
}
}
}
]
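As a sanity check on the expected numbers, the same logic (filter the nested fruit by name, sum per basket, then select or sort on the sum) can be reproduced in plain Python on the two sample documents. This is only a sketch of the arithmetic the aggregations perform, not a replacement for them; the function name `fruit_sum` is made up for the illustration.

```python
# The two sample documents from the question.
baskets = [
    {
        "name": "basket A",
        "fruit": [
            {"name": "apples", "number": 2},
            {"name": "oranges", "number": 3},
        ],
    },
    {
        "name": "basket B",
        "fruit": [
            {"name": "apples", "number": 3},
            {"name": "apples", "number": 3},
        ],
    },
]


def fruit_sum(basket, fruit_name):
    """Sum `number` over the nested fruit entries whose name matches
    (the nested > filter > sum chain)."""
    return sum(f["number"] for f in basket["fruit"] if f["name"] == fruit_name)


# One "bucket" per basket name, as the terms aggregation on name would produce.
sums = {b["name"]: fruit_sum(b, "apples") for b in baskets}

# Use case 1: keep only buckets whose sum is strictly greater than 5
# (the bucket_selector step).
more_than_five = [name for name, total in sums.items() if total > 5]

# Use case 2: order buckets by the sum, descending (the terms order step).
ranked = sorted(sums.items(), key=lambda item: item[1], reverse=True)

print(sums)
print(more_than_five)
print(ranked)
```

No single fruit entry in basket B exceeds 5, so a range condition on individual nested documents cannot find it; only the per-basket sum of 6 does.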

ElasticSearch simple query

I have structure like this in my ElasticSearch
{
_index: 'index',
_type: 'product',
_id: '896',
_score: 0,
_source: {
entity_id: '896',
category: [
{
category_id: 2,
is_virtual: 'false'
},
{
category_id: 82,
is_virtual: 'false'
}
]
}
}
I want to return all "products" that have category_id 82.
{
"query": {
"bool": {
"filter": {
"terms": {
"category.category_id": [
82
]
}
}
}
}
}
This query gives me 0 hits.
What is the right way to do this?
Adding a working example. You need to define category as a nested field and modify your search query to include the nested path.
Index Mapping
{
"mappings": {
"properties": {
"entity_id": {
"type": "text"
},
"category": {
"type": "nested"
}
}
}
}
Index your document
{
"entity_id": "896",
"category": [
{
"category_id": 2,
"is_virtual": false
},
{
"category_id": 82,
"is_virtual": false
}
]
}
Search query. Note that a nested field has to be queried through a nested query; a plain terms filter on category.category_id does not see the nested documents, which is why your query returns 0 hits.
{
"query": {
"nested": {
"path": "category",
"query": {
"bool": {
"must": [
{
"match": {
"category.category_id": 82
}
}
]
}
}
}
}
}
The search result returns the indexed document:
"hits": [
{
"_index": "complexnested",
"_type": "_doc",
"_id": "1",
"_score": 1.0,
"_source": {
"entity_id": "896",
"category": [
{
"category_id": 2,
"is_virtual": false
},
{
"category_id": 82,
"is_virtual": false
}
]
}
}
]
If your query gives you no results, I suspect that category is of type nested in your index mapping. If that's the case, that's good and you can modify your query like this to use the nested query:
{
"query": {
"bool": {
"filter": {
"nested": {
"path": "category",
"query": {
"terms": {
"category.category_id": [
82
]
}
}
}
}
}
}
}

Elasticsearch: don't return document if any of nested object field matches term value

I struggle with writing a query that should not return a document if any of its nested objects field value matches a term value passed in a query.
Document sample:
{
"id": 1,
"test": "name",
"rules": [
{
"id": 2,
"name": "rule3",
"questionDetailConditionalRules": [
{
"questionDetailId": 1
},
{
"questionDetailId": 2
}
]
},
{
"id": 3,
"name": "rule3",
"questionDetailConditionalRules": [
{
"questionDetailId": 4
},
{
"questionDetailId": 5
}
]
}
]
}
The rules field has the nested type.
My nested search query is:
{
"query": {
"nested": {
"path": "rules",
"query": {
"bool": {
"must_not": [
{
"terms": {
"rules.questionDetailConditionalRules.questionDetailId": [
1
]
}
}
]
}
}
}
}
}
Expected result: the document should not be returned
Actual result: document is returned.
Am I missing anything in my query?
I was able to reproduce your issue and fix it; a step-by-step solution follows. You need to move nested inside the must_not block and make some modifications to your query.
Index def
{
"mappings" :{
"properties" :{
"rules" :{
"type" : "nested"
}
}
}
}
Index your sample doc
{
"rules": [
{
"id": 2,
"name": "rule3",
"questionDetailConditionalRules": [
{
"questionDetailId": 1
},
{
"questionDetailId": 2
}
]
},
{
"id": 3,
"name": "rule3",
"questionDetailConditionalRules": [
{
"questionDetailId": 4
},
{
"questionDetailId": 5
}
]
}
]
}
Search Query (note that `nested` is inside the `must_not` block)
{
"query": {
"bool": {
"must_not": [
{
"nested": {
"path": "rules",
"query": {
"bool": {
"filter": [
{
"term": {
"rules.questionDetailConditionalRules.questionDetailId": 1
}
}
]
}
}
}
}
]
}
}
}
Search result
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
Note: you can find more info in this link.
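Why the placement of must_not matters can be illustrated outside Elasticsearch. The Python sketch below models the matching logic on the sample document under a simplifying assumption: a nested query matches the parent document if any single nested rule matches its inner query. The function names are hypothetical and this is not how Elasticsearch is implemented internally.

```python
# The sample document, reduced to the fields that matter here.
doc = {
    "rules": [
        {
            "id": 2,
            "questionDetailConditionalRules": [
                {"questionDetailId": 1},
                {"questionDetailId": 2},
            ],
        },
        {
            "id": 3,
            "questionDetailConditionalRules": [
                {"questionDetailId": 4},
                {"questionDetailId": 5},
            ],
        },
    ]
}


def rule_has_id(rule, target):
    """True if this single nested rule contains the questionDetailId."""
    return any(
        c["questionDetailId"] == target
        for c in rule["questionDetailConditionalRules"]
    )


def nested_then_must_not(document, target):
    """nested { must_not { term } }: the document matches as soon as
    ANY rule does not contain the id."""
    return any(not rule_has_id(r, target) for r in document["rules"])


def must_not_then_nested(document, target):
    """must_not { nested { term } }: the document matches only if
    NO rule contains the id."""
    return not any(rule_has_id(r, target) for r in document["rules"])


original_query_matches = nested_then_must_not(doc, 1)
fixed_query_matches = must_not_then_nested(doc, 1)
print(original_query_matches, fixed_query_matches)
```

With the original query the rule with id 3 satisfies the must_not, so the document is returned; with nested inside must_not, the rule with id 2 excludes the whole document.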

ElasticSearch: Complex filter by nested document

I have following document structure:
{
product_name: "Product1",
product_id: 1,
...,
articles: [
{
article_name: 'Article 101',
id: 101,
some_param: 10,
clients: []
},
{
article_name: 'Article 102',
id: 102,
some_param: 11,
clients: [
{
client_id: 10001,
client_name: "some client 1001"
}
...
]
}
]
},
{
product_name: "Product2",
product_id: 2,
...,
articles: [
{
article_name: 'Article 101',
id: 101,
some_param: 10,
clients: []
},
{
article_name: 'Article 102',
id: 102,
some_param: 10,
clients: [
{
client_id: 10001,
client_name: "some client 1001"
}
...
]
}
]
}
I need to get documents (product) ONLY if some of its articles match 2 conditions (single article should match both conditions): articles.some_param = 10 AND articles.clients.client_id = 10001
So I need to get only product with id 2.
I'm using this query now, which is incorrect (and I know why), because it fetches both documents:
{
"query": {
"bool": {
"filter": [
{
"term": {
"articles.clients.client_id": 10001
}
},
{
"term": {
"articles.some_param": 10
}
}
]
}
}
}
How can I write a query that returns only products with at least one article matching both conditions: articles.some_param = 10 AND articles.clients.client_id = 10001
i.e., to get only the product with ID 2?
Something like this:
{
"query": {
"nested": {
"path": "articles",
"query": {
"bool": {
"must": [
{
"term": {
"articles.some_param": {
"value": 10
}
}
},
{
"nested": {
"path": "articles.clients",
"query": {
"term": {
"articles.clients.client_id": {
"value": 10001
}
}
}
}
}
]
}
}
}
}
}
UPDATE:
Try wrapping the second query in a bool.
{
"query": {
"nested": {
"path": "articles",
"query": {
"bool": {
"must": [
{
"term": {
"articles.some_param": {
"value": 10
}
}
},
{
"bool":{
"must" : [
{
"nested": {
"path": "articles.clients",
"query": {
"term": {
"articles.clients.client_id": {
"value": 10001
}
}
}
}
}
]
}
}
]
}
}
}
}
}
p.s. I could be mistaken about the path in the second nested query; I couldn't check it, so you may need to experiment with that path.
p.p.s. The filter clause is not what you need here. It does not calculate scores.
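The difference between the flat query from the question and a nested query can be modelled in a few lines of Python. This is a sketch under the assumption that articles and articles.clients are both mapped as nested; the sample data is trimmed to the fields used by the conditions, and the function names are hypothetical.

```python
# The two products from the question, trimmed to the relevant fields.
products = [
    {
        "product_id": 1,
        "articles": [
            {"id": 101, "some_param": 10, "clients": []},
            {"id": 102, "some_param": 11, "clients": [{"client_id": 10001}]},
        ],
    },
    {
        "product_id": 2,
        "articles": [
            {"id": 101, "some_param": 10, "clients": []},
            {"id": 102, "some_param": 10, "clients": [{"client_id": 10001}]},
        ],
    },
]


def has_client(article, client_id):
    """True if this article lists the given client."""
    return any(c["client_id"] == client_id for c in article["clients"])


def flat_match(product, some_param, client_id):
    """Non-nested behaviour: each condition may be satisfied by a
    different article of the same product."""
    articles = product["articles"]
    return any(a["some_param"] == some_param for a in articles) and any(
        has_client(a, client_id) for a in articles
    )


def nested_match(product, some_param, client_id):
    """Nested behaviour: one single article must satisfy both conditions."""
    return any(
        a["some_param"] == some_param and has_client(a, client_id)
        for a in product["articles"]
    )


flat_ids = [p["product_id"] for p in products if flat_match(p, 10, 10001)]
nested_ids = [p["product_id"] for p in products if nested_match(p, 10, 10001)]
print(flat_ids, nested_ids)
```

Product 1 passes the flat check because article 101 supplies some_param = 10 and article 102 supplies the client, which is exactly the cross-object match that the nested query rules out.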

Resources