How to combine AND with OR in an Elasticsearch query

I'm working on a product search function using elasticsearch and am having trouble figuring out how to represent the following logic in a nested query:
(A or B) && (C or D)
I want this to behave as it would in a traditional programming language: a product must match at least one condition from each OR group to be returned. I don't want the OR conditions to merely boost the score; products that fail either group must not be selected.
In my particular case A,B,C,D are all tests against a nested property (a list of category records).
Here are two sample index records to illustrate:
{
"ProductId":1111,
"Name":"First Product",
"AllCategories":[
{"CatId":15,"CatName":"Some Tag Name", "ParentCatId":99, "ParentCatName":"Tags"},
{"CatId":352,"CatName":"Some child menu", "ParentCatId":88, "ParentCatName":"Some parent menu"}
]
},
{
"ProductId":2222,
"Name":"Second Product",
"AllCategories":[
{"CatId":20,"CatName":"Some Tag Name2", "ParentCatId":99, "ParentCatName":"Tags"},
{"CatId":352,"CatName":"Some child menu", "ParentCatId":88, "ParentCatName":"Some parent menu"}
]
}
I've tried many variants of my query but haven't found one that works the way I want. The question "Elastic search combine two must with OR" asks the same basic thing, but its only answer isn't working for me (my code below is modeled on that answer):
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"AllCategories",
"query":{
"bool": {
"must": [
{
"bool":{
"minimum_should_match": 1,
"should":[
{"term":{"AllCategories.CatId":{"value":352} } },
{"term":{"AllCategories.ParentCatId":{"value":352} } }
]
}
},
{
"bool":{
"minimum_should_match": 1,
"should":[
{"term":{"AllCategories.CatId":{"value":15} } },
{"term":{"AllCategories.CatId":{"value":8 } } }
]
}
}
]
}
}
}
}]
}
}
}
UPDATE:
Based on the posted answer I reformatted the query as follows but it's still not working for me. It's the second bool inside the nested query that's causing the trouble. I wonder if it might be an issue with testing the same field in the nested subquery in both conditions (AllCategories.CatId):
{
"query":{
"bool":{
"must":[
{
"nested":{
"path":"AllCategories",
"query":{
"bool": {
"minimum_should_match": 2,
"should": [
{
"bool":{
"minimum_should_match": 1,
"should":[
{"term":{"AllCategories.CatId":{"value":352} } },
{"term":{"AllCategories.ParentCatId":{"value":352} } }
]
}
},
{
"bool":{
"minimum_should_match": 1,
"should":[
{"term":{"AllCategories.CatId":{"value":15} } },
{"term":{"AllCategories.CatId":{"value":8 } } }
]
}
}
]
}
}
}
}]
}
}
}
This is the mapping for the index in question
{
"mappings": {
"properties": {
"ProductId": { "type": "integer" },
"Name": { "type": "text" },
"AllCategories": {
"type": "nested",
"properties": {
"CatId": { "type": "integer" },
"ParentCatId": { "type": "integer" },
"CatName": { "type": "text" },
"ParentCatName": { "type": "text" }
}
},
"SalesRank": { "type": "integer" }
}
}
}
Using the sample products, I want the search to return product 1111 but not product 2222. Product 1111 is linked to cat 15, one of the two categories in the second condition (15 or 8); product 2222 is linked to neither. Both products satisfy the first condition because both are linked to cat 352. In my current testing, the second bool/should condition causes the search to return no results; if I remove it, I get matches.

This is a simplified version of what I'm running for one of my APIs.
The outer bool contains a should with a minimum_should_match of 2 (this is your AND clause).
Inside that should are two bool queries, each holding one of the OR clauses.
Each OR clause is a should with a minimum_should_match of 1.
Note: fieldA and fieldB are two distinct fields, and values a-d are the values you want to test.
//GET /index/type/_search
{
"from": 0,
"size": 1000,
"query": {
"bool": {
"minimum_should_match": 2,
"should": [
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"fieldA": "value a"
}
},
{
"term": {
"fieldA": "value b"
}
}
]
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"term": {
"fieldB": "value c"
}
},
{
"term": {
"fieldB": "value d"
}
}
]
}
}
]
}
}
}
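A note on the nested case from the question, offered as a sketch rather than a confirmed diagnosis: a nested query runs its inner query against each nested object separately, so when both OR groups sit inside one nested clause, a single AllCategories entry has to satisfy both groups. In product 1111 the entry that matches 352 and the entry that matches 15 are different objects, which would explain the empty result. Giving each OR group its own nested clause lets different entries satisfy different groups. The Python below builds such a query and models both behaviours on the sample products in memory. The helper names (or_group, and_of_or_groups, matches_single_nested, matches_separate_nested) are made up for illustration, and the in-memory check is only a model of the semantics, not Elasticsearch itself.

```python
import json

def or_group(path, terms):
    """One nested clause: some object under `path` matches any (field, value) pair."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "minimum_should_match": 1,
                    "should": [
                        {"term": {f"{path}.{field}": {"value": value}}}
                        for field, value in terms
                    ],
                }
            },
        }
    }

def and_of_or_groups(path, groups):
    """AND the groups by listing one nested clause per group under filter."""
    return {"query": {"bool": {"filter": [or_group(path, g) for g in groups]}}}

def matches_single_nested(doc, path, groups):
    """Model of one nested clause holding every group: one object must satisfy all."""
    return any(
        all(any(obj.get(f) == v for f, v in g) for g in groups)
        for obj in doc[path]
    )

def matches_separate_nested(doc, path, groups):
    """Model of one nested clause per group: different objects may satisfy each."""
    return all(
        any(obj.get(f) == v for obj in doc[path] for f, v in g)
        for g in groups
    )

GROUPS = [
    [("CatId", 352), ("ParentCatId", 352)],  # A or B
    [("CatId", 15), ("CatId", 8)],           # C or D
]
PRODUCT_1111 = {"ProductId": 1111, "AllCategories": [
    {"CatId": 15, "ParentCatId": 99},
    {"CatId": 352, "ParentCatId": 88},
]}
PRODUCT_2222 = {"ProductId": 2222, "AllCategories": [
    {"CatId": 20, "ParentCatId": 99},
    {"CatId": 352, "ParentCatId": 88},
]}

# Request body with one nested clause per OR group.
print(json.dumps(and_of_or_groups("AllCategories", GROUPS), indent=2))
for doc in (PRODUCT_1111, PRODUCT_2222):
    print(
        doc["ProductId"],
        matches_single_nested(doc, "AllCategories", GROUPS),
        matches_separate_nested(doc, "AllCategories", GROUPS),
    )
```

The clauses go under filter because the question only needs inclusion or exclusion; must would work the same way if scoring matters.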

Related

What is the purpose of nesting must inside filter in Elasticsearch?

What's the difference between the following two ES filter queries?
1. Filter context with multiple query conditions:
{
"query": {
"bool": {
"filter": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
2. must inside filter context:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
]
}
}
}
The first query is for cases where you just want to AND filters on different fields. Clauses listed directly under filter are combined with AND by default.
In your scenario the second query does exactly the same thing as the first (two ways of writing the same filter). The reason the nested bool form is also allowed is that it can express more complex filters that mix AND and OR.
Note that in Elasticsearch AND is expressed with must (or filter) clauses, while OR is expressed with should clauses.
Say I want all documents that have either
sales from department 101, or
sales from department 101B with price >= 150.
You would probably end up writing the query like this:
POST sometestindex/_search
{
"query":{
"bool":{
"filter":[
{
"bool":{
"should":[
{
"term":{
"dept.keyword":"101"
}
},
{
"bool":{
"must":[
{
"term":{
"dept.keyword":"101B"
}
},
{
"range":{
"price":{
"gte":150
}
}
}
]
}
}
],
"minimum_should_match": 1
}
}
]
}
}
}
In short, for your scenario the first query is just shorthand for the second. For more complex filter logic, however, you need a bool query inside your filter, as in your second query and in the example above.
Hope that clarifies!
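To make the AND/OR mapping concrete, here is a small Python sketch that composes the department filter above from two helpers. The helper names (term, all_of, any_of) are hypothetical, chosen only for illustration; the output is a plain dict that could be sent as the request body.

```python
import json

def term(field, value):
    """Exact-value clause."""
    return {"term": {field: value}}

def all_of(*clauses):
    """AND: every clause must match."""
    return {"bool": {"must": list(clauses)}}

def any_of(*clauses):
    """OR: at least one clause must match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}

# dept 101 OR (dept 101B AND price >= 150), run in filter context (no scoring)
dept_filter = any_of(
    term("dept.keyword", "101"),
    all_of(
        term("dept.keyword", "101B"),
        {"range": {"price": {"gte": 150}}},
    ),
)
body = {"query": {"bool": {"filter": [dept_filter]}}}
print(json.dumps(body, indent=2))
```

Setting minimum_should_match explicitly in any_of keeps the OR strict even if the clause is later moved next to a must or filter, where should clauses would otherwise become optional.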

How to join ElasticSearch query with multi_match, boosting, wildcard and filter?

I'm trying to achieve these goals:
Filter results by a bool query, like "status=1"
Filter results by a bool range query, like "distance: gte 10 AND lte 60"
Filter results by matching at least one int value from an int array
Search words in many fields while calculating the document score. Some fields need a wildcard, some need boosting, like importantfield^2, somefield*, someotherfield^0.75
All the points above are joined by AND. All terms within one point are joined by OR.
So far I have written something like this, but the wildcards aren't working: searching for "abc" doesn't find "abcd" in the "name" field.
How can I solve this?
{
"filtered": {
"query": {
"multi_match": {
"query": "John Doe",
"fields": [
"*name*^1.75",
"someObject.name",
"tagsArray",
"*description*",
"ownerName"
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"status": 2
}
},
{
"bool": {
"should": [
{
"term": {
"someIntsArray": 1
}
},
{
"term": {
"someIntsArray": 5
}
}
]
}
},
{
"range": {
"distanceA": {
"lte": 100
}
}
},
{
"range": {
"distanceB": {
"gte": 50,
"lte": 100
}
}
}
]
}
}
}
}
Mappings:
{
"documentId": {
"type": "integer"
},
"ownerName": {
"type": "string",
"index": "not_analyzed"
},
"description": {
"type": "string"
},
"status": {
"type": "byte"
},
"distanceA": {
"type": "short"
},
"createdAt": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"distanceB": {
"type": "short"
},
"someObject": {
"properties": {
"someObject_id": {
"type": "integer"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
},
"someIntsArray": {
"type": "integer"
},
"tags": {
"type": "string",
"index": "not_analyzed"
}
}
You can use a query_string query if you want to apply a wildcard across multiple fields while also giving individual fields different boosts.
Your query would look like this:
POST <your_index_name>/_search
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"abc*",
"fields":[
"*name*^1.75",
"someObject.name",
"tagsArray",
"*description*",
"ownerName"
]
}
}
],
"filter":{
"bool":{
"must":[
{
"term":{
"status":"2"
}
},
{
"bool":{
"minimum_should_match":1,
"should":[
{
"term":{
"someIntsArray":1
}
},
{
"term":{
"someIntsArray":5
}
}
]
}
},
{
"range":{
"distanceA":{
"lte":100
}
}
},
{
"range":{
"distanceB":{
"gte": 50,
"lte":100
}
}
}
]
}
}
}
}
}
Note that for the field someIntsArray I've used "minimum_should_match":1 so that you won't get documents that have neither of those values.
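The query above hard-codes "abc*". If the search text comes from a user, something has to turn it into that form first. The Python below is one possible sketch, assuming you want every whitespace-separated word treated as a prefix; to_prefix_query and wildcard_clause are hypothetical helper names, not part of any client library. It backslash-escapes the characters query_string treats as syntax and drops < and >, which query_string cannot escape.

```python
import re

# Characters and operators that query_string treats as syntax.
_RESERVED = re.compile(r'(&&|\|\||[+\-=!(){}\[\]^"~*?:\\/])')

def to_prefix_query(text):
    """Turn free text into escaped prefix terms, e.g. 'abc' -> 'abc*'."""
    # < and > cannot be escaped in query_string, so remove them.
    cleaned = text.replace("<", " ").replace(">", " ")
    # Escape each word, then append the trailing wildcard.
    terms = [_RESERVED.sub(r"\\\1", word) + "*" for word in cleaned.split()]
    return " ".join(terms)

def wildcard_clause(text, fields):
    """Build the query_string clause used in the bool/must above."""
    return {"query_string": {"query": to_prefix_query(text), "fields": list(fields)}}

print(wildcard_clause("abc", ["*name*^1.75", "*description*"]))
```

Empty input yields an empty query string, so callers would likely want to skip the clause in that case.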
Updated Answer:
Going by the updated comment, you can keep wildcard search on the fields that need it with query_string and use a simple match query with boosting for the others, as shown below. Put both queries (and more match queries if you need them) in a combined should clause. That way you control where the wildcard applies and where it does not.
{
"query":{
"bool":{
"should":[
{
"query_string":{
"query":"joh*",
"fields":[
"name^2"
]
}
},
{
"match":{
"description":{
"query":"john",
"boost":15
}
}
}
],
"minimum_should_match":1,
"filter":{
"bool":{
"must":[
{
"term":{
"status":"2"
}
},
{
"bool":{
"minimum_should_match":1,
"should":[
{
"term":{
"someIntsArray":1
}
},
{
"term":{
"someIntsArray":5
}
}
]
}
},
{
"range":{
"distanceA":{
"lte":100
}
}
},
{
"range":{
"distanceB":{
"gte":50,
"lte":100
}
}
}
]
}
}
}
}
}
Let me know if this helps

In ElasticSearch, how do I filter the nested documents in my result?

Suppose, in ElasticSearch 5, I have data with nesting like:
{"number":1234, "names": [
{"firstName": "John", "lastName": "Smith"},
{"firstName": "Al", "lastName": "Jones"}
]},
...
And I want to query for hits with number 1234 but return only the names that match "lastName": "Jones", so that my result omits names that don't match. In other words, I want to get back only part of the matching document, based on a term query or similar.
A simple nested query won't do, since it only filters top-level results. Any ideas?
{ "query" : { "bool": { "filter":[
{ "term": { "number":1234} },
???? something with "lastName": "Jones" ????
] } } }
I want back:
hits: [
{"number":1234, "names": [
{"firstName": "Al", "lastName": "Jones"}
]},
...
]
The hits section returns a _source, which is exactly the document you indexed.
You are right that a nested query filters top-level results, but with inner_hits it also shows which nested objects caused each top-level document to match, which is exactly what you need.
The names field can be excluded from the top-level hits using the _source parameter.
{
"_source": {
"excludes": ["names"]
},
"query":{
"bool":{
"must":[
{
"term":{
"number":{
"value":"1234"
}
}
},
{
"nested":{
"path":"names",
"query":{
"term":{
"names.lastName":"Jones"
}
},
"inner_hits":{
}
}
}
]
}
}
}
Top-level documents are now returned without the names field, and each hit has an additional inner_hits section with the names that matched.
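On the client side, the matching names then have to be read out of inner_hits rather than _source. A minimal Python sketch, assuming the response has already been parsed into a dict: matching_names is a hypothetical helper, and sample_response is a hand-written fragment for illustration, not real server output.

```python
def matching_names(response, inner_name="names"):
    """Map each top-level hit's _id to the nested objects that matched."""
    result = {}
    for hit in response["hits"]["hits"]:
        # inner_hits is keyed by its name, which defaults to the nested path.
        inner = hit.get("inner_hits", {}).get(inner_name, {})
        inner_hits = inner.get("hits", {}).get("hits", [])
        result[hit["_id"]] = [h["_source"] for h in inner_hits]
    return result

# Hand-written response fragment for illustration only.
sample_response = {
    "hits": {"hits": [{
        "_id": "1",
        "_source": {"number": 1234},
        "inner_hits": {"names": {"hits": {"hits": [
            {"_source": {"firstName": "Al", "lastName": "Jones"}}
        ]}}},
    }]}
}
print(matching_names(sample_response))
```

By default inner_hits returns only the top 3 matches per document, so set "size" inside inner_hits if a document can have more matching names than that.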
You should treat nested objects as part of a top-level document.
If you really need them to be separate - consider parent/child relations.
Try something like this
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{ "term": { "number":1234} },
{
"nested": {
"path": "something",
"query": {
"term": {
"something.lastName": "Jones"
}
},
"inner_hits" : {}
}
}
]
}
}
}
}
}
I used this reference.
Similar but slightly different: put the nested query in a should clause and then look at inner_hits for the names. This returns the top-level document either way, and inner_hits holds any names that matched.
{
"_source": {
"excludes": ["names"]
},
"query":{
"bool":{
"must":[
{
"term":{
"number":{
"value":"1234"
}
}
}
],
"should": [
{
"nested":{
"path":"names",
"query":{
"term":{
"names.lastName":"Jones"
}
},
"inner_hits":{
}
}
}
]
}
}
}

Elasticsearch get all parents with no children

Originally I was trying to get a list of parents with the single most recent child for each one. I figured out how to do that with the following query:
{"query":
{"has_child":
{"inner_hits":
{"name": "latest", "size": 1, "sort":
[{"started_at": {"order": "desc"}}]
},
"type": "child_type",
"query": {"match_all": {}}
}
}
}
The problem is that the results do not include parents with no children. Adding min_children: 0 doesn't help either. So I thought I could write a query for all parents with no children and combine the two in a single OR query, but I'm having trouble building such a query. I would appreciate any suggestions.
Here is your query:
{
"query":{
"bool":{
"should":[
{
"bool":{
"must_not":[
{
"has_child":{
"type":"child_type",
"query":{
"match_all":{}
}
}
}
]
}
},
{
"has_child":{
"inner_hits":{
"name":"latest",
"size":1, "sort":[{"started_at": {"order": "desc"}}]
},
"type":"child_type",
"query":{
"match_all":{}
}
}
}
]
}
}
}
Another point: using only must_not with has_child returns not just parents without children but also every child document, because children have no children of their own.
So another restriction should be added to the bool query:
{
"query":{
"bool": {
"must_not": [
{
"has_child": {
"type": "<child-type>",
"query": {
"match_all": {}
}
}
}
],
"should": [
{
"term": {
"<the join field>": {
"value": "<parent-type>"
}
}
}
]
}
}
}

Multiple search conditions in one ES query, distinguishing the returned items by condition

For one case I need to put multiple search conditions in one query to reduce the number of queries we need.
However, I need to distinguish the returning items based on the conditions.
Currently I do this with a function score query: each condition is assigned a score, and I differentiate the results based on those scores.
However, the performance is not that good, and we now also need the doc count for each condition.
Is there a better way to do it? I'm thinking of using an aggregation, but I'm not sure whether that would work.
Thanks!
update:
curl -X GET 'localhost:9200/locations/_search?fields=_id&from=0&size=1000&pretty' -d '{
"query":{
"bool":{
"should":[
{
"filtered":{
"filter":{
"bool":{
"must":[{"term":{"city":"new york"}},{"term":{"state":"ny"}}]
}
}
}
},
{
"filtered":{
"filter":{
"bool":{
"must":[{"term":{"city":"los angeles"}},{"term":{"state":"ca"}}]
}
}
}
}
]
}
}}'
To answer the first part of your question, named queries are the best fit.
For example:
{
"query": {
"bool": {
"should": [
{
"match": {
"field1": {
"query": "qbox",
"_name": "firstQuery"
}
}
},
{
"match": {
"field2": {
"query": "hosted Elasticsearch",
"_name": "secondQuery"
}
}
}
]
}
}
}
This returns an additional field called matched_queries on each hit, listing the named queries that document matched.
You can find more info on named queries here.
However, this information can't be used for aggregation.
So the second part of your question needs to be handled separately.
A filter aggregation for each query type would be the ideal solution here.
For example:
{
"query": {
"bool": {
"should": [
{
"match": {
"text": {
"query": "qbox",
"_name": "firstQuery"
}
}
},
{
"match": {
"source": {
"query": "elasticsearch",
"_name": "secondQuery"
}
}
}
]
}
},
"aggs": {
"firstQuery": {
"filter": {
"term": {
"text": "qbox"
}
}
},
"secondQuery": {
"filter": {
"term": {
"source": "elasticsearch"
}
}
}
}
}
You can find more on filter aggregation here
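Reading both pieces back out of the response could look like the Python sketch below. It assumes the response is already a parsed dict; hits_by_condition and counts_by_condition are hypothetical helper names, and sample_response is a hand-written fragment rather than real server output.

```python
from collections import defaultdict

def hits_by_condition(response):
    """Group returned hit ids by the named queries each hit matched."""
    groups = defaultdict(list)
    for hit in response["hits"]["hits"]:
        for name in hit.get("matched_queries", []):
            groups[name].append(hit["_id"])
    return dict(groups)

def counts_by_condition(response):
    """Read the doc_count of each filter aggregation."""
    return {
        name: agg["doc_count"]
        for name, agg in response.get("aggregations", {}).items()
        if "doc_count" in agg
    }

# Hand-written response fragment for illustration only.
sample_response = {
    "hits": {"hits": [
        {"_id": "1", "matched_queries": ["firstQuery"]},
        {"_id": "2", "matched_queries": ["firstQuery", "secondQuery"]},
    ]},
    "aggregations": {
        "firstQuery": {"doc_count": 2},
        "secondQuery": {"doc_count": 1},
    },
}
print(hits_by_condition(sample_response))
print(counts_by_condition(sample_response))
```

The grouping only covers the hits on the returned page, while the aggregation counts cover every document the query matched, so the two can differ once results are paginated.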
