ElasticSearch NEST combining AND with OR queries - elasticsearch

Problem
How do you write NEST code to generate an elastic search query for this simple boolean logic?
term1 && (term2 || term3 || term4)
Pseudocode for the logic I am implementing with NEST (5.2) against Elasticsearch (5.2):
// additional requirements
( truckOemName = "HYSTER" && truckModelName = "S40FT" && partCategoryCode = "RECO" && partID != "")
//Section I can't get working correctly
AND (
( SerialRangeInclusiveFrom <= "F187V-6785D" AND SerialRangeInclusiveTo >= "F187V-6060D" )
OR
( SerialRangeInclusiveFrom = "" || SerialRangeInclusiveTo = "" )
)
Interpretation of Related Documentation
The "Combining queries with || or should clauses" in Writing Bool Queries mentions
The bool query does not quite follow the same boolean logic you expect from a programming language. term1 && (term2 || term3 || term4) does not become
bool
|___must
| |___term1
|
|___should
|___term2
|___term3
|___term4
you could get back results that only contain term1
which is exactly what I think is happening.
But their solution is beyond my understanding of how to apply it with NEST. Is the answer one of these?
Add parentheses to force evaluation order (I already do)
Use boost factor? (what?)
Code
Here's the NEST code
var searchDescriptor = new SearchDescriptor<ElasticPart>();
var terms = new List<Func<QueryContainerDescriptor<ElasticPart>, QueryContainer>>
{
s =>
(s.TermRange(r => r.Field(f => f.SerialRangeInclusiveFrom)
.LessThanOrEquals(dataSearchParameters.SerialRangeEnd))
&&
s.TermRange(r => r.Field(f => f.SerialRangeInclusiveTo)
.GreaterThanOrEquals(dataSearchParameters.SerialRangeStart)))
//None of the data that matches these ORs returns with the query this code generates, below.
||
(!s.Exists(exists => exists.Field(f => f.SerialRangeInclusiveFrom))
||
!s.Exists(exists => exists.Field(f => f.SerialRangeInclusiveTo))
)
};
//Terms is the piece in question
searchDescriptor.Query(s => s.Bool(bq => bq.Filter(terms))
&& !s.Terms(term => term.Field(x => x.OemID)
.Terms(RulesHelper.GetOemExclusionList(exclusions))));
searchDescriptor.Aggregations(a => a
.Terms(aggPartInformation, t => t.Script(s => s.Inline(script)).Size(50000))
);
searchDescriptor.Type(string.Empty);
searchDescriptor.Size(0);
var searchResponse = ElasticClient.Search<ElasticPart>(searchDescriptor);
Here's the ES JSON query it generates
{
"query":{
"bool":{
"must":[
{
"term":{ "truckOemName": { "value":"HYSTER" }}
},
{
"term":{ "truckModelName": { "value":"S40FT" }}
},
{
"term":{ "partCategoryCode": { "value":"RECO" }}
},
{
"bool":{
"should":[
{
"bool":{
"must":[
{
"range":{ "serialRangeInclusiveFrom": { "lte":"F187V-6785D" }}
},
{
"range":{ "serialRangeInclusiveTo": { "gte":"F187V-6060D" }}
}
]
}
},
{
"bool":{
"must_not":[
{
"exists":{ "field":"serialRangeInclusiveFrom" }
}
]
}
},
{
"bool":{
"must_not":[
{
"exists":{ "field":"serialRangeInclusiveTo" }
}
]
}
}
]
}
},
{
"exists":{
"field":"partID"
}
}
]
}
}
}
Here's the query we'd like it to generate that seems to work.
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"term": { "truckOemName": { "value": "HYSTER" }}
},
{
"term": {"truckModelName": { "value": "S40FT" }}
},
{
"term": {"partCategoryCode": { "value": "RECO" }}
},
{
"exists": { "field": "partID" }
}
],
"should": [
{
"bool": {
"must": [
{
"range": { "serialRangeInclusiveFrom": {"lte": "F187V-6785D"}}
},
{
"range": {"serialRangeInclusiveTo": {"gte": "F187V-6060D"}}
}
]
}
},
{
"bool": {
"must_not": [
{
"exists": {"field": "serialRangeInclusiveFrom"}
},
{
"exists": { "field": "serialRangeInclusiveTo"}
}
]
}
}
]
}
}
]
}
}
}
Documentation
Combining Filters
Bool Query
Writing Bool Queries

With overloaded operators for bool queries, it is not possible to express a must clause combined with a should clause i.e.
term1 && (term2 || term3 || term4)
becomes
bool
|___must
|___term1
|___bool
|___should
|___term2
|___term3
|___term4
which is a bool query with two must clauses where the second must clause is a bool query where there has to be a match for at least one of the should clauses. NEST combines the queries like this because it matches the expectation for boolean logic within .NET.
If it did become
bool
|___must
| |___term1
|
|___should
|___term2
|___term3
|___term4
a document is considered a match if it satisfies only the must clause. The should clauses in this case act as a boost i.e. if a document matches one or more of the should clauses in addition to the must clause, then it will have a higher relevancy score, assuming that term2, term3 and term4 are queries that calculate a relevancy score.
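The difference between the two trees can be checked with a toy model. The following is a minimal sketch in Python, not Elasticsearch or NEST code: documents are plain sets of terms, scoring is ignored, and the `matches` helper and the sample documents are made up for illustration. It assumes the documented default for `minimum_should_match` (1 when a bool query has only should clauses, 0 when a must clause is present) and does not model `filter`.

```python
# Toy model of bool-query matching (no scoring, no text analysis).
# A query is either a term string or a dict with optional "must" / "should" lists.

def matches(query, doc):
    if isinstance(query, str):              # leaf: a single term
        return query in doc
    must = query.get("must", [])
    should = query.get("should", [])
    # Assumed default: should clauses are optional once a must clause exists.
    min_should = query.get("minimum_should_match", 0 if must else 1)
    if not all(matches(q, doc) for q in must):
        return False
    if not should:
        return True
    return sum(matches(q, doc) for q in should) >= min_should


# must and should as siblings (the tree that NEST deliberately does not build)
flat = {"must": ["term1"], "should": ["term2", "term3", "term4"]}
# should wrapped in its own bool inside must (what the && / || operators build)
nested = {"must": ["term1", {"should": ["term2", "term3", "term4"]}]}

only_term1 = {"term1"}
term1_and_term3 = {"term1", "term3"}

print(matches(flat, only_term1))         # True: the should clauses only boost
print(matches(nested, only_term1))       # False: the inner bool needs one match
print(matches(nested, term1_and_term3))  # True
```

In the sibling form, a document containing only term1 still matches, which is the behaviour the documentation warns about.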
On this basis, the query that you would like to generate expresses that for a document to be considered a match, it must match all of the 4 queries in the must clause
"must": [
{
"term": { "truckOemName": { "value": "HYSTER" }}
},
{
"term": {"truckModelName": { "value": "S40FT" }}
},
{
"term": {"partCategoryCode": { "value": "RECO" }}
},
{
"exists": { "field": "partID" }
}
],
then, for documents matching the must clauses, if
it has a serialRangeInclusiveFrom less than or equal to "F187V-6785D" and a serialRangeInclusiveTo greater than or equal to "F187V-6060D"
or
it has neither a serialRangeInclusiveFrom nor a serialRangeInclusiveTo field
then boost that document's relevancy score. The crucial point is that
If a document matches the must clauses but does not match any
of the should clauses, it will still be a match for the query (but
have a lower relevancy score).
If that is the intent, this query can be constructed using the longer form of the Bool query

Related

ElasticSearch / OpenSearch term search with logical OR

I have been scratching my head for a while looking at OpenSearch documentation and stackoverflow questions. How can I do something like this:
Select documents WHERE studentId in [1234, 5678] OR applicationId in [2468, 1357].
As long as studentId exactly matches one of the supplied values, or applicationId exactly matches one of the supplied values, then that document should be included in the response.
When I want to search for multiple values for a single field and get an exact match the following works:
{
"must":[
{
"terms": {
"studentId":["1234", "5678"]
}
}
]
}
This will find me exact matches on studentId in [1234, 5678].
If I try to add the condition to also look for (logical or) applicationId in [2468, 1357] then the following will not work:
{
"must":[
{
"terms": {
"studentId":["1234", "5678"]
}
},
{
"terms": {
"applicationId":["2468", "1357"]
}
}
]
}
because this will do a logical and on the two queries. I want logical or.
I cannot use should because this returns irrelevant results. The following does not work for me:
{
"should":[
{
"terms": {
"studentId":["1234", "5678"]
}
},
{
"terms": {
"applicationId":["2468", "1357"]
}
}
]
}
This seems to return all results, ranked by relevance. I find that the returned results do not actually match, despite the fact that this is a terms search.
Can you try the following query?
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"terms": {
"studentId":["1234", "5678"]
}
}
]
}
},
{
"bool": {
"must": [
{
"terms": {
"applicationId":["2468", "1357"]
}
}
]
}
}
]
}
}
}
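For readers who build the request body in code, here is a small Python sketch of the same OR-of-terms shape. The `any_of_terms` helper is hypothetical. It places each `terms` clause directly in `should` (the extra `bool`/`must` wrappers in the answer are not strictly needed around a single clause) and sets `minimum_should_match` explicitly, which is an addition to the answer rather than part of it.

```python
import json

def any_of_terms(**field_values):
    """Build a bool query that matches when ANY field holds one of its values."""
    should = [
        {"terms": {field: list(values)}}
        for field, values in field_values.items()
    ]
    # minimum_should_match is spelled out so the OR semantics survive
    # if a must or filter clause is added next to should later.
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


body = any_of_terms(
    studentId=["1234", "5678"],
    applicationId=["2468", "1357"],
)
print(json.dumps(body, indent=2))
```

Note that the whole `bool` object must sit under the top-level `query` key of the search request.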

How To Combine Multiple Queries In ElasticSearch

I am having trouble correctly combining Elasticsearch queries. In SQL, my query would look something like this:
Select * from ABPs where (PId = 10 and PUId = 1130) or (PId = 30 and PUId = 2000) or (PlayerID = '12345')
I can achieve each of these by themselves and get correct results.
Query A) (PId = 10 and PUId = 1130)
translates to
{
"query": {
"bool": {
"must": [
{
"term": {
"PId": "1366"
}
},
{
"term": {
"PUId": "10"
}
}
]
}
}
}
Query B) (PId = 30 and PUId = 2000)
translates the same as above just with different values
Query C) (PlayerID = '12345')
translates to
{
"query": {
"match": {
"PlayerUuid": "62fe0832-7881-477c-88bb-9cbccdbfb3c3"
}
}
}
I have been trying to figure out how to get all of these into the same ES search query and I am just not having any luck at all and was hoping someone with more extensive ES experience would be able to give me a hand.
You can use a bool query with should (logical OR) and must (logical AND) clauses.
Below is the ES query representation of the clause Select * from ABPs where (PId = 10 and PUId = 1130) or (PId = 30 and PUId = 2000) or (PlayerID = '12345')
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"PId": "10"
}
},
{
"term": {
"PUId": {
"value": "1130"
}
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"PId": "30"
}
},
{
"term": {
"PUId": "2000"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"PlayerId": "12345"
}
}
]
}
}
]
}
}
}
Note that I'm assuming the fields PId, PUId and PlayerId are all of type keyword.
Wrap all your queries into a should-clause of a bool-query which you put in the filter-clause of another top-level bool-query.
Pseudo-code (as I’m typing on a cell phone):
"bool": {
"filter": {
"bool": {
"should": [
{query1},
{query2},
{query3}
]
}
}
}
A bool query made up of only should clauses requires at least one of the queries in the should clause to match (minimum_should_match defaults to 1 in such a scenario).
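The default can be written down as a small rule. This Python sketch follows the wording of the bool query documentation for recent Elasticsearch versions (older versions treated should clauses in filter context differently, so treat it as an assumption for the version you run). The function name is made up for illustration.

```python
def default_minimum_should_match(bool_query):
    """Return the minimum_should_match value assumed when none is given."""
    has_should = bool(bool_query.get("should"))
    has_must_or_filter = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    # Only-should queries need one match; otherwise should is optional.
    return 1 if has_should and not has_must_or_filter else 0


only_should = {"should": [{"term": {"PlayerId": "12345"}}]}
must_and_should = {
    "must": [{"term": {"PId": "10"}}],
    "should": [{"term": {"PlayerId": "12345"}}],
}

print(default_minimum_should_match(only_should))
print(default_minimum_should_match(must_and_should))
```

This is why wrapping the should clauses in their own inner bool keeps them mandatory even though the outer bool has a filter clause.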
Update with the actual query (additional explanation):
POST <your_index_name>/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"should": [
{"bool": {"must": [ {"term": {"PId": "10"}},{"term": {"PUId": "1130"}} ]}},
{"bool": {"must": [ {"term": {"PId": "30"}},{"term": {"PUId": "2000"}} ]}},
{"term": {"PlayerId": "12345"}}
]
}
}
}
}
}
The example above wraps your actual bool-query in a filter-clause of another top-level bool-query to follow best practices and get better performance: whenever you don't care about the score, especially when the queries are all exact-match queries, you should put them into filter-clauses. For those queries Elasticsearch will not calculate a score and can therefore potentially cache the results of that query for even better performance.
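If the request body is assembled in code, the same shape can be produced by two tiny helpers. This is a sketch in Python with hypothetical helper names (`and_group`, `or_filter`), using the field names and values from the question's SQL and assuming, as the other answer does, that the fields are of type keyword.

```python
import json

def and_group(**fields):
    """AND: every field must equal its value (term queries inside must)."""
    return {"bool": {"must": [{"term": {f: v}} for f, v in fields.items()]}}

def or_filter(*queries):
    """OR in filter context: at least one query must match, no scoring."""
    return {"query": {"bool": {"filter": {"bool": {"should": list(queries)}}}}}


body = or_filter(
    and_group(PId="10", PUId="1130"),
    and_group(PId="30", PUId="2000"),
    {"term": {"PlayerId": "12345"}},
)
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `POST <your_index_name>/_search`.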

How to express optional matching using Query String Query

I have a query as below using the query string dsl which I'm expecting to return results that match all the three assertions. (status = active, price = 100 , language = en).
How can I make the language parameter optional, so that a document that does not match it just scores lower instead of not matching at all?
(status:"active") AND (price:100) AND (language:"en")
You can try this
(status:active AND language:en) OR (price:100 AND language:en)
Or a shorter version like this:
+status:active +price:100 language:en
An equivalent query rewritten using bool and match queries is this one:
{
"bool": {
"must": [
{
"match": {
"status": "active"
}
},
{
"match": {
"price": 100
}
}
],
"should": {
"match": {
"language": "en"
}
}
}
}
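The mapping from the `+` prefix to must and from an unprefixed term to should can be illustrated with a toy converter. This Python sketch is not the real query string parser: it only handles whitespace-separated `field:value` tokens with an optional `+` or `-` prefix, and the function name is invented for this example. Values stay strings, so price comes out as "100" rather than the number 100.

```python
import json

def prefixed_terms_to_bool(query_string):
    """Map '+a:b -c:d e:f' to must / must_not / should match clauses."""
    clauses = {"must": [], "should": [], "must_not": []}
    for token in query_string.split():
        if token.startswith("+"):
            occur, token = "must", token[1:]
        elif token.startswith("-"):
            occur, token = "must_not", token[1:]
        else:
            occur = "should"                 # optional: only affects the score
        field, _, value = token.partition(":")
        clauses[occur].append({"match": {field: value}})
    # Drop empty clause lists to keep the output compact.
    return {"bool": {occur: found for occur, found in clauses.items() if found}}


result = prefixed_terms_to_bool("+status:active +price:100 language:en")
print(json.dumps(result, indent=2))
```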

Search document even when some fields are missing in elastic search

I want to search for students based on centerId, courseId and batchId. For example, I have student data as below.
{
"s1":{
"name":"alex",
"centerId":"N001",
"courseId":"ncjava",
"batchId":"nb1"},
"s2":{
"name":"John",
"centerId":"N001",
"courseId":"nc02",
"batchId":"ncb2"},
"s3":{
"name":"David",
"centerId":"N001",
"courseId":"ncjava"
}
}
Now I want to find students whose centerId, courseId and batchId all match, and also students with a matching centerId and courseId whose batchId is missing. I wrote the query below.
{
"query": {
"bool": {"must": [
{
"match": {
"centerId":"N001"
}},
{ "match": {
"courseId": "ncjava"
}}
],
"should":[
{
"match": {
"batchId": "nb1"
}
}
]
}
}
}
This query returns all the students that match centerId and courseId, but it also returns students who have a different batchId. I only want a student when batchId matches or does not exist.
You can add query terms which are "bool", in order to make "or" logic like you want. batchId = X OR batchId is missing can be represented with a should expression (and batchId is missing with a must_not and exists), like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"centerId": "N001"
}
},
{
"match": {
"courseId": "ncjava"
}
},
{
"bool": {
"minimum_should_match": 1,
"should": [
{
"match": {
"batchId": "nb1"
}
},
{
"bool": {
"must_not": {
"exists": {
"field": "batchId"
}
}
}
}
]
}
}
]
}
}
}
You can consider must like "and", and should like "or" (though more flexible than boolean or), and must_not as boolean "not". So, the above query means something like centerId == N001 AND courseId == ncjava AND (batchId == nb1 OR NOT exists batchId).
In this particular context, minimum_should_match actually isn't required (the default behavior is already what you want), but since the behavior is different in different contexts, I like to include it explicitly, in case the query is edited in an unexpected way in the future (then the behavior of the should will remain the same despite the changed context). minimum_should_match of 1 means that at least 1 of the should clauses must match.
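To see the effect on the sample students, here is a rough Python simulation of the query above. It is only a sketch: `match` is approximated as exact equality, scoring is ignored, the evaluator covers just the clause types used here, and student s4 is a hypothetical record (same center and course, different batch) added to show the exclusion.

```python
students = {
    "s1": {"name": "alex", "centerId": "N001", "courseId": "ncjava", "batchId": "nb1"},
    "s2": {"name": "John", "centerId": "N001", "courseId": "nc02", "batchId": "ncb2"},
    "s3": {"name": "David", "centerId": "N001", "courseId": "ncjava"},
    # Hypothetical: right center and course, wrong batch.
    "s4": {"name": "Maria", "centerId": "N001", "courseId": "ncjava", "batchId": "nb9"},
}

def as_list(clauses):
    return clauses if isinstance(clauses, list) else [clauses]

def matches(query, doc):
    kind, body = next(iter(query.items()))
    if kind == "match":                       # approximated as exact equality
        field, value = next(iter(body.items()))
        return doc.get(field) == value
    if kind == "exists":
        return body["field"] in doc
    if kind == "bool":
        must = as_list(body.get("must", []))
        should = as_list(body.get("should", []))
        must_not = as_list(body.get("must_not", []))
        needed = body.get("minimum_should_match", 1 if should and not must else 0)
        return (all(matches(q, doc) for q in must)
                and not any(matches(q, doc) for q in must_not)
                and sum(matches(q, doc) for q in should) >= needed)
    raise ValueError(f"unsupported query type: {kind}")

query = {"bool": {"must": [
    {"match": {"centerId": "N001"}},
    {"match": {"courseId": "ncjava"}},
    {"bool": {"minimum_should_match": 1, "should": [
        {"match": {"batchId": "nb1"}},
        {"bool": {"must_not": {"exists": {"field": "batchId"}}}},
    ]}},
]}}

hits = sorted(sid for sid, doc in students.items() if matches(query, doc))
print(hits)  # ['s1', 's3']
```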
Here's the docs for each of these components:
bool query
exists query
minimum_should_match

A simple AND query with Elasticsearch

I am trying to do a simple query for two specified fields, and the manual and Google are proving to be of little help. The example below should make it pretty clear what I want to do.
{
"query": {
"and": {
"term": {
"name.family_name": "daniel",
"name.given_name": "tyrone"
}
}
}
}
As a bonus question, why does it find "Daniel Tyrone" with "daniel", but NOT if I search for "Daniel"? It behaves like a really weird anti-case-sensitive search.
Edit: Updated, sorry. You need a separate Term object for each field, inside of a Bool query:
{
"query": {
"bool": {
"must" : [
{
"term": {
"name.family_name": "daniel"
}
},
{
"term": {
"name.given_name": "tyrone"
}
}
]
}
}
}
Term queries are not analyzed by ElasticSearch, which makes them case sensitive. A Term query says to ES "look for this exact token inside your index, including case and punctuation".
If you want case insensitivity, you could add a keyword + lowercase filter to your analyzer. Alternatively, you could use a query that analyzes your text at query time (like a Match query)
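The bonus question comes down to where analysis happens. The sketch below imitates it in Python under a simplifying assumption: the field uses an analyzer that splits on non-word characters and lowercases (a rough stand-in for the standard analyzer). The function names are illustrative and are not Elasticsearch APIs.

```python
import re

def standard_analyze(text):
    """Rough stand-in for the standard analyzer: tokenize, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

# At index time "Daniel Tyrone" is stored as the tokens "daniel" and "tyrone".
indexed_tokens = set(standard_analyze("Daniel Tyrone"))

def term_query(value):
    # A term query looks the value up verbatim, with no analysis.
    return value in indexed_tokens

def match_query(text):
    # A match query analyzes the text first (default operator is OR).
    return any(token in indexed_tokens for token in standard_analyze(text))

print(term_query("daniel"))   # True: identical to the stored token
print(term_query("Daniel"))   # False: no token with a capital D was stored
print(match_query("Daniel"))  # True: lowercased before the lookup
```

So the search is not inverted case sensitivity: the index only ever contains the lowercase token, and the term query does not lowercase its input.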
Edit2: You could also use And or Bool filters too.
I found a solution for at least multiple text comparisons on the same field:
{
"query": {
"match": {
"name.given_name": {
"query": "daniel tyrone",
"operator": "and"
}
}
}
}
And I found this for multiple fields, is this the correct way?
{
"query": {
"bool": {
"must": [
{
"match": {
"name.formatted": {
"query": "daniel tyrone",
"operator": "and"
}
}
},
{
"match": {
"display_name": "tyrone"
}
}
]
}
}
}
If composing the json with PHP, these 2 examples worked for me.
$activeFilters is just a comma separated string like: 'attractions, limpopo'
$articles = Article::searchByQuery(array(
'match' => array(
'cf_categories' => array(
'query' => $activeFilters,
'operator' =>'and'
)
)
));
// The below code is also working 100%
// Using Query String https://www.elastic.co/guide/en/elasticsearch/reference/1.4/query-dsl-query-filter.html
// https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html
/* $articles = Article::searchByQuery(array(
'query_string' => array(
'query' => 'cf_categories:attractions AND cf_categories:limpopo'
)
)); */
This worked for me. minimum_should_match is set to 2 because the AND query has 2 clauses.
{
"query": {
"bool": {
"should": [
{"term": { "name.family_name": "daniel"}},
{"term": { "name.given_name": "tyrone" }}
],
"minimum_should_match" : 2
}
}
}
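With this approach the threshold has to track the number of clauses. A small Python sketch (the helper name is hypothetical) that derives it from the clause list so the two cannot drift apart:

```python
import json

def and_via_should(*clauses):
    """Require every should clause by setting the threshold to their count."""
    return {
        "query": {
            "bool": {
                "should": list(clauses),
                "minimum_should_match": len(clauses),
            }
        }
    }


body = and_via_should(
    {"term": {"name.family_name": "daniel"}},
    {"term": {"name.given_name": "tyrone"}},
)
print(json.dumps(body, indent=2))
```

Putting the same clauses in must, as in the earlier answer, expresses the AND more directly. The should form is mainly useful when the threshold may later be lowered to "at least N of M".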
