ElasticSearch lucene query with subclauses conversion to ES syntax

I've been trying to convert a lucene style query to ES query syntax but I'm getting stuck on sub-clauses. e.g.
(title:history^10 or series:history) and (NOT(language:eng) OR language:eng^5) and (isfree eq 'true' OR (isfree eq 'false' AND owned eq 'abc^5'))
This states: "get me a match for history in 'title' or 'series', but boost the title match, AND where the language doesn't have to be English, but if it is then boost it, AND where the match is free, or where it isn't free then make sure it's owned by customer abc".
I feel this is a tricky query, but it seems to work correctly. Converting the clauses to ES syntax is confusing me, as I don't really have the concept of brackets. I think I need to use bool queries... I have the following, which I know doesn't apply the criteria correctly: it says you should have (language:eng OR isFree eq 'true' OR owned:abc). I can't seem to make the mental leap to build the must/should with NOTs in it.
Help please?
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "history",
"fields": [
"title^10.0",
"series"
]
}
}
],
"should": [
{
"term": {
"language": {
"value": "eng",
"boost": 5
}
}
},
{
"term": {
"isFree": {
"value": true
}
}
},
{
"term": {
"owned": {
"value": "abc",
"boost": 5
}
}
}
]
}
},

Your query is almost correct. The only part that wasn't translated correctly is this clause:
(isfree eq 'true' OR (isfree eq 'false' AND owned eq 'abc^5'))
If I understand your post correctly, this is basically saying: boost the 'owned' field by a factor of five when its value is 'abc' and the item is not free. To implement this, you need an additional bool query that:
Filters results by isFree: false
Boosts the owned field of any documents matching abc
"bool": {
"filter": [
{
"term": {
"isFree": {
"value": false
}
}
}
],
"must": [
{
"term": {
"owned": {
"value": "abc",
"boost": 5
}
}
}
]
}
Since this is intended only to boost results that meet these criteria, not to limit the result set, the bool query above should be placed inside your parent bool's should section. The final query looks like:
POST /myindex/_search
{
"explain": true,
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "history",
"fields": [
"title^10",
"series"
]
}
}
],
"should": [
{
"term": {
"language": {
"value": "eng",
"boost": 5
}
}
},
{
"bool": {
"filter": [
{
"term": {
"isFree": {
"value": false
}
}
}
],
"must": [
{
"term": {
"owned": {
"value": "abc",
"boost": 5
}
}
}
]
}
}
]
}
}
}
Note: Using should or must yields the same results for that inner bool. I'm honestly not sure which is better, so I arbitrarily used must.
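The answer above treats the isFree/owned group purely as a boost. If that group is instead meant to restrict the results, as the question's wording ("make sure it's owned by customer abc") suggests, one possible reading is sketched below in Python. The term() helper and the variable names are hypothetical, and the query body is only built and printed, not sent to a cluster.

```python
import json

# Hypothetical helper: wraps a field/value pair in a term query, optionally boosted.
def term(field, value, boost=None):
    body = {"value": value}
    if boost is not None:
        body["boost"] = boost
    return {"term": {field: body}}

# (isfree eq 'true' OR (isfree eq 'false' AND owned eq 'abc^5')) as a required group:
# at least one of the two should clauses has to match.
free_or_owned = {
    "bool": {
        "should": [
            term("isFree", True),
            {
                "bool": {
                    "filter": [term("isFree", False)],
                    "must": [term("owned", "abc", boost=5)],
                }
            },
        ],
        "minimum_should_match": 1,
    }
}

query = {
    "bool": {
        "must": [
            {"multi_match": {"query": "history", "fields": ["title^10", "series"]}},
            free_or_owned,
        ],
        # (NOT language:eng OR language:eng^5) is true for every document,
        # so it only needs to contribute a score boost.
        "should": [term("language", "eng", boost=5)],
    }
}

print(json.dumps({"query": query}, indent=2))
```

Placing the group under must makes it a hard requirement, while the language clause stays in should because it never excludes anything.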

ElasticSearch multimatch substring search

I have to combine two filters to meet these requirements:
- the r.status field holds one of a specific list of values
- one of several text fields contains the value.
The resulting query (built with NEST, but that doesn't matter) looks like:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"bool": {
"should": [
{
"match": {
"r.g.firstName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
},
{
"match": {
"r.g.lastName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
}
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
Also tried with multi_match query:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"multi_match": {
"query": "SUBSTRING_VALUE",
"fields": [
"r.g.firstName",
"r.g.lastName"
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
FirstName and LastName are configured in index mappings as text:
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
Elastic offers a lot of full-text search options: multi_match, phrase, wildcards, etc. But all of them fail in my case when looking for a substring in my text fields. (The terms query and the isActive one work well; I tried running only those.)
What other options do I have, or where did I make a mistake?
UPD: Combined wildcards worked for me, but such a query looks ugly. I'm looking for a more elegant solution.
The Elasticsearch way is to use an ngram tokenizer.
An ngram analyzer splits your terms with a sliding window. For example, the input "Hello World" will generate the following terms:
Hel
Hell
Hello
ell
ello
...
Wor
World
orl
...
You can configure the minimum and maximum size of the sliding window (in the example the minimum size is 3). Once the sub-terms are generated, you can use a match query on the subfield.
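The sliding window described above can be illustrated with a small Python sketch. It is not Elasticsearch's implementation: the char_ngrams name is made up, a maximum size of 5 is an assumption, and it assumes only letters and digits are kept as token characters, so whitespace separates words.

```python
import re

def char_ngrams(text, min_gram=3, max_gram=5):
    """Return character n-grams per word, shortest first at each start position."""
    grams = []
    # Assumption: only ASCII letters and digits are kept, so whitespace splits words.
    for word in re.findall(r"[A-Za-z0-9]+", text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    grams.append(word[start:start + size])
    return grams

print(char_ngrams("Hello World"))
# ['Hel', 'Hell', 'Hello', 'ell', 'ello', 'llo', 'Wor', 'Worl', 'World', 'orl', 'orld', 'rld']
```

Because every substring between the minimum and maximum size becomes its own term, a plain match query for a fragment such as "ell" can hit the document without wildcards.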
Another point: it is odd to use must within a filter. If you are interested in the score, use must; otherwise use filter. Read this article for a good understanding.

Elasticsearch: should + minimum_should_match vs must

I test with these 2 queries
Query with must
{
"size": 200,
"from": 0,
"query": {
"bool": {
"must": [ {
"match": {
"_all": "science"
}
},
{
"match": {
"category": "fiction"
}
},
{
"match": {
"country": "us"
}
}
]
}
}
}
Query with should + minimum_should_match
{
"size": 200,
"from": 0,
"query": {
"bool": {
"should": [ {
"match": {
"_all": "science"
}
},
{
"match": {
"category": "fiction"
}
},
{
"match": {
"country": "us"
}
}
],
"minimum_should_match": 3
}
}
}
Both queries give me the same result, and I don't see the difference between the two. When should we use minimum_should_match?
I guess you mean minimum_number_should_match, right?
In both cases the result is the same, because the number you require equals the number of clauses in should. minimum_number_should_match is usually used when you have more should clauses than the number you specify there.
For example if you have 5 should clauses, but for some reason you only need three of them to be fulfilled you would do something like this:
{
"query": {
"bool": {
"should": [
{
"term": {
"tag": "wow"
}
},
{
"term": {
"tag": "elasticsearch"
}
},
{
"term": {
"tag": "tech"
}
},
{
"term": {
"user": "plchia"
}
},
{
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
],
"minimum_should_match": 3
}
}
}
That's correct and desired behavior. Let's decipher it a little bit:
A bool query with must clauses means that all clauses under the must section are required to match. Just like in English, it means strong obligation.
A bool query with should clauses means that some clauses are required to match, whereas the others are not (i.e. soft obligation). When the bool query has no must or filter clauses, the default number of should clauses that must match is simply 1. The minimum_should_match parameter overrides this behavior. If you specify minimum_should_match=3, it means 3 clauses under should must match. With exactly three should clauses, that is practically the same as specifying those clauses with must.
Hope that explains it in detail.
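To make the equivalence concrete, here is a toy Python model of bool-query matching. It only models which documents match, not how they are scored, and bool_matches, the predicates and the sample documents are all invented for illustration.

```python
def bool_matches(doc, must=(), should=(), minimum_should_match=None):
    """Toy model of bool-query matching; each clause is a predicate on a dict."""
    if not all(clause(doc) for clause in must):
        return False
    if minimum_should_match is None:
        # Assumed default: 1 when there are only should clauses, otherwise 0.
        minimum_should_match = 1 if should and not must else 0
    return sum(1 for clause in should if clause(doc)) >= minimum_should_match

clauses = [
    lambda d: "science" in d["text"],
    lambda d: d["category"] == "fiction",
    lambda d: d["country"] == "us",
]

docs = [
    {"text": "science stories", "category": "fiction", "country": "us"},
    {"text": "science stories", "category": "fiction", "country": "uk"},
    {"text": "cooking", "category": "food", "country": "us"},
]

with_must = [bool_matches(d, must=clauses) for d in docs]
with_should_3 = [bool_matches(d, should=clauses, minimum_should_match=3) for d in docs]
with_should_2 = [bool_matches(d, should=clauses, minimum_should_match=2) for d in docs]
with_should_default = [bool_matches(d, should=clauses) for d in docs]

print(with_must)            # [True, False, False]
print(with_should_3)        # [True, False, False]
print(with_should_2)        # [True, True, False]
print(with_should_default)  # [True, True, True]
```

The parameter only changes the outcome once the required count is lower than the number of should clauses, as the last two lines show.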

match query on elastic search with multiple or conditions

I have three fields: status, type and search. I want to find the documents where status equals NEW or IN PROGRESS, and type equals abc or xyz, and search contains a given value (partial match).
My call looks like this:
{
"query": {
"bool" : {
"must" : [{
"match": {
"status": {
"query": "abc"
}
}
}, {
"match": {
"type": {
"query": "NEW"
}
}
},{
"query_string": {
"query": "*abc*", /* for partial search */
"fields": ["title", "name"]
}
}]
}
}
}
Nest your bool queries. I think what you are missing is this:
"bool": { "should": [
{ "match": { "status": "abc" } },
{ "match": { "status": "xyz" } }
]}
This query MUST match at least one of its should clauses, because only should clauses are given.
EDIT to explain the differences:
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"status": "abc"
}
},
{
"match": {
"status": "xyz"
}
}
]
}
},
{
"terms": {
"type": [
"NEW",
"IN_PROGRESS"
]
}
},
{
"query_string": {
"query": "*abc*",
"fields": [
"title",
"name"
]
}
}
]
}
}
}
So you have a bool query at the top. Each of the three inner queries must match.
The first is a nested bool query, which matches if status matches either abc or xyz.
The second matches if type is exactly NEW or IN_PROGRESS. Note the difference here: the first one would also match ABC or aBc, or potentially "abc XYZ", depending on your analyzer. You might want terms for both.
The third is what you had before.
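When several fields each need their own OR group, the nesting can be generated rather than written by hand. Below is a minimal Python sketch that rebuilds the query above; any_of is a hypothetical helper, not part of any Elasticsearch client.

```python
import json

def any_of(field, values):
    """Hypothetical helper: bool/should of analyzed match queries (OR semantics)."""
    return {"bool": {"should": [{"match": {field: v}} for v in values]}}

query = {
    "query": {
        "bool": {
            "must": [
                # OR group: status matches abc or xyz (analyzed match).
                any_of("status", ["abc", "xyz"]),
                # Exact values: type is NEW or IN_PROGRESS.
                {"terms": {"type": ["NEW", "IN_PROGRESS"]}},
                # Partial match via wildcards, as in the original call.
                {"query_string": {"query": "*abc*", "fields": ["title", "name"]}},
            ]
        }
    }
}

print(json.dumps(query, indent=2))
```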

Boosting has no effect in a Boolean-filtered query in Elasticsearch

I'm trying to add a boost to documents that match a term filter. The basis is a Boolean/MatchAll query. But the boosting in my Elasticsearch query has no effect; all result scores are set to 1:
curl -XPOST localhost:9200/wiki_content/_search?pretty -d '
{
"_source": [
"title"
],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [
{
"bool": {
"should": [
{
"term": {
"title.keyword": {
"value": "Main Page",
"boost": 9
}
}
},
{
"term": {
"title.keyword": {
"value": "Top Page",
"boost": 999
}
}
}
]
}
}
]
}
}
}
'
However, when using a filtered query, the boosting works. But due to restrictions in my system I cannot use a filtered query. So is there any method to make the boosting in the original query work?
In the filter part of the query, boosting will have no effect, as the filter's only job is to, ehhm, filter documents that match certain values. Try instead:
curl -XPOST localhost:9200/wiki_content/_search?pretty -d '
{
"_source": [
"title"
],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"should": [
{
"term": {
"title.keyword": {
"value": "Main Page",
"boost": 9
}
}
},
{
"term": {
"title.keyword": {
"value": "Top Page",
"boost": 999
}
}
}
]
}
}
}
'
...moving the two term-queries directly into the should-clause in your top level bool query.

I want my query to treat the content of two columns as one

I have a set of news articles. These have both tags and articleTags.
Our API has an endpoint that returns articles that match all tags.
E.g. searching for an article that contains both sport and fail:
"bool": {
"must": [
[
{
"term": {
"tags": "sport"
}
},
{
"term": {
"tags": "fail"
}
},
{
"term": {
"articleTags": "sport"
}
},
{
"term": {
"articleTags": "fail"
}
}
]
]
}
This worked when we only had tags, but when we introduced articleTags then it obviously didn't work as expected.
Is there a way we could make Elasticsearch treat tags and articleTags as one namespace so I could do a query like this?
"bool": {
"must": [
[
{
"term": {
"mergedTags": "sport"
}
},
{
"term": {
"mergedTags": "fail"
}
}
]
]
}
I feel multi match query would be the best solution here.
There is a type of multi_match query called cross_fields.
Its function, as described in the documentation, is:
"Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See cross_fields."
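A cross_fields query for this case might look roughly like the body built below in Python. This is a sketch under assumptions: both fields share the same analyzer, each tag is a single token with no whitespace, and the required_tags name is made up. The "and" operator asks for every tag to be present in at least one of the two fields.

```python
import json

# Tags that must all be present, in either tags or articleTags.
required_tags = ["sport", "fail"]

query = {
    "query": {
        "multi_match": {
            "query": " ".join(required_tags),
            "type": "cross_fields",
            "fields": ["tags", "articleTags"],
            "operator": "and",
        }
    }
}

print(json.dumps(query, indent=2))
```

Unlike the term queries in the question, multi_match analyzes the query text, so it fits best when the tag fields are indexed with a matching analyzer.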
My suggestion involves using copy_to to create that "merged" field:
"tags": {
"type": "string",
"copy_to": "mergedTags"
},
"articleTags": {
"type": "string",
"copy_to": "mergedTags"
},
"mergedTags": {
"type": "string"
}
And the updated query is as simple as:
"query": {
"bool": {
"must": [
[
{
"term": {
"mergedTags": "sport"
}
},
{
"term": {
"mergedTags": "fail"
}
}
]
]
}
}
