Constant weight for each part in a bool query in Elasticsearch

Is it possible to get the number of "should" clauses that matched, or to return a constant weight for each should clause that matched? For example, if I have a bool query like this:
"query": {
"bool": {
"should": [
{ "match": { "last_name": "Shcheklein" }},
{ "match": { "first_name": "Bart" }}
]
}
}
I would like to get:
score == 1 if one of the fields (last_name, first_name) matches (even in a fuzzy sense)
score == 2 if both fields match
And is it possible to get a list of fields matched?

You could probably use constant_score to achieve this, and use highlighting to figure out which fields matched.
Example:
{
"query": {
"bool": {
"disable_coord": true,
"should": [
{
"constant_score": {
"query": {
"match": {
"last_name": "Shcheklein"
}
},
"boost": 1
}
},
{
"constant_score": {
"query": {
"match": {
"first_name": "bart"
}
},
"boost": 1
}
}
]
}
}
}
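A minimal Python sketch of the same idea, for building the request body programmatically. The helper names `constant_weight_query` and `matched_fields` are made up for illustration. It assumes a newer Elasticsearch version, where, as far as I know, `constant_score` wraps its clause in `filter` rather than `query` and `disable_coord` is no longer accepted; adjust both for older versions. The `highlight` section is what lets you read back which fields matched.

```python
import json


def constant_weight_query(field_values, fuzziness=None):
    """Build a bool query in which every matching field adds a fixed 1 to the score."""
    should = []
    for field, text in field_values.items():
        match_body = {"query": text}
        if fuzziness is not None:
            # Optional: lets the match succeed "in a fuzzy sense".
            match_body["fuzziness"] = fuzziness
        should.append({
            "constant_score": {
                # Assumption: newer versions expect "filter" here, not "query".
                "filter": {"match": {field: match_body}},
                "boost": 1,
            }
        })
    return {
        "query": {"bool": {"should": should}},
        # Ask for highlights on the same fields so matched fields can be listed.
        "highlight": {"fields": {field: {} for field in field_values}},
    }


def matched_fields(hit):
    """Return the field names that carry a highlight in a single search hit."""
    return sorted(hit.get("highlight", {}))


body = constant_weight_query(
    {"last_name": "Shcheklein", "first_name": "Bart"}, fuzziness="AUTO"
)
print(json.dumps(body, indent=2))
```

With two should clauses, a document matching one field should score 1 and a document matching both should score 2, provided no other scoring factor is applied.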

Related

Is it possible to limit the number of Match Queries inside a Bool Query that contribute to the score?

Let's say I have the following Documents:
[
{
"name": "Berlin",
"name_english": "Berlin"
},
{
"name": "München",
"name_english": "Munich"
}
]
Now I do query 1:
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "Munich"
}
}
},
{
"match": {
"name_english": {
"query": "Munich"
}
}
}
]
}
}
}
Then I do query 2:
{
"query": {
"bool": {
"should": [
{
"match": {
"name": {
"query": "Berlin"
}
}
},
{
"match": {
"name_english": {
"query": "Berlin"
}
}
}
]
}
}
}
Query 1 will have a lower score than query 2, because query 2 matches in both fields. My goal now is to have at most one matching field contribute to the score. Is that possible somehow? Something like "if there is a hit in the first match query, don't run the second one".
There is no out-of-the-box solution, but it may be possible with a Painless script. Another way is to handle it in your application by sending the queries one after another in if..else conditions.
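One option that may get close to "only the best field counts" is the `dis_max` query, which scores a document by its best-scoring clause instead of summing all clauses. The sketch below is an illustration under assumptions, not a tested drop-in: `best_field_query` is a hypothetical helper that builds such a body, and the two scoring functions only mimic how the scores are combined.

```python
def best_field_query(text, fields, tie_breaker=0.0):
    """Build a dis_max query: the best-matching field decides the score."""
    return {
        "query": {
            "dis_max": {
                "queries": [{"match": {field: {"query": text}}} for field in fields],
                # 0.0 means the other matching clauses contribute nothing.
                "tie_breaker": tie_breaker,
            }
        }
    }


def bool_should_score(clause_scores):
    """How a bool/should query combines clause scores: it adds them up."""
    return sum(clause_scores)


def dis_max_score(clause_scores, tie_breaker=0.0):
    """How dis_max combines clause scores: best score plus a share of the rest."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    return best + tie_breaker * (sum(clause_scores) - best)


# "Berlin" matches both fields, "Munich" only one (illustrative scores).
berlin, munich = [1.5, 1.5], [1.5]
print(bool_should_score(berlin), bool_should_score(munich))
print(dis_max_score(berlin), dis_max_score(munich))
```

Under `bool`/`should` the Berlin query scores twice as high as the Munich query; under `dis_max` with a `tie_breaker` of 0 both score the same.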

ElasticSearch multimatch substring search

I have to combine two filters to match requirements:
- a specific list of values in r.status field
- one of the multiple text fields contains the value.
The resulting query (generated with NEST, but that doesn't matter) looks like this:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"bool": {
"should": [
{
"match": {
"r.g.firstName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
},
{
"match": {
"r.g.lastName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
}
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
Also tried with multi_match query:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"multi_match": {
"query": "SUBSTRING_VALUE",
"fields": [
"r.g.firstName",
"r.g.lastName"
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
FirstName and LastName are configured in index mappings as text:
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
Elasticsearch offers a lot of full-text search options: multi_match, phrase, wildcards, etc. But all of them fail in my case when looking for a substring in my text fields. (The terms query and the isActive one work well; I checked by running only those.)
What other options do I have, or where did I make a mistake?
UPD: Combined wildcards worked for me, but such a query looks ugly. I am looking for a more elegant solution.
The Elasticsearch way is to use an ngram tokenizer.
The ngram analyzer will split your terms with a sliding window. For example, the input "Hello World" will generate the following terms:
Hel
Hell
Hello
ell
ello
...
Wor
Worl
World
orl
...
You can configure the minimum and maximum size of the sliding window (in the example the minimum size is 3 and the maximum is 5). Once the sub-terms are generated, you can use a match query on the subfield.
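To make the sliding window concrete, here is a small Python sketch that imitates the token output described above. It is not the Elasticsearch implementation: the function name `ngrams` is made up, and it assumes the text is first split into words (roughly what restricting the tokenizer to letters and digits would do).

```python
import re


def ngrams(text, min_gram=3, max_gram=5):
    """Emit every substring of each word whose length is within [min_gram, max_gram]."""
    tokens = []
    for word in re.findall(r"\w+", text):
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size > len(word):
                    break  # the window would run past the end of the word
                tokens.append(word[start:start + size])
    return tokens


print(ngrams("Hello World"))
```

Because every substring of length 3 to 5 is indexed as its own term, a plain match query for a substring such as "ell" can find "Hello" without wildcards.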
Another point: it is odd to use must within a filter. If you are interested in the score, use must; otherwise use filter. Read this article for a good understanding.

Elasticsearch: should + minimum_should_match vs must

I test with these 2 queries
Query with must
{
"size": 200,
"from": 0,
"query": {
"bool": {
"must": [ {
"match": {
"_all": "science"
}
},
{
"match": {
"category": "fiction"
}
},
{
"match": {
"country": "us"
}
}
]
}
}
}
Query with should + minimum_should_match
{
"size": 200,
"from": 0,
"query": {
"bool": {
"should": [ {
"match": {
"_all": "science"
}
},
{
"match": {
"category": "fiction"
}
},
{
"match": {
"country": "us"
}
}
],
"minimum_should_match": 3
}
}
}
Both queries give me the same result, and I don't know the difference between the two. When should we use minimum_should_match?
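I guess you mean minimum_number_should_match, right? (As far as I know, that is an older name for the same setting; minimum_should_match is the name the example below uses.)
In both cases the result is the same, because the number you specify equals the number of should clauses. The parameter is usually used when you have more should clauses than the number you specify.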
For example if you have 5 should clauses, but for some reason you only need three of them to be fulfilled you would do something like this:
{
"query": {
"bool": {
"should": [
{
"term": {
"tag": "wow"
}
},
{
"term": {
"tag": "elasticsearch"
}
},
{
"term": {
"tag": "tech"
}
},
{
"term": {
"user": "plchia"
}
},
{
"range": {
"age": {
"gte": 10,
"lte": 20
}
}
}
],
"minimum_should_match": 3
}
}
}
That's the correct and desired behavior. Let's decipher it a little bit:
A bool query with must clauses means that all clauses under the must section are required to match. Just like in English, it means a strong obligation.
A bool query with should clauses means that some clauses are required to match, whereas the others are not (i.e. a soft obligation). When the bool query contains only should clauses, the default number of clauses that must match is simply 1. To override this behavior, the minimum_should_match parameter comes into play. If you specify minimum_should_match=3, it means 3 clauses under should must match. With exactly three should clauses, that is in practice the same as specifying those clauses with must.
Hope this explains it in detail.
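The matching rule described in these answers can be sketched in a few lines of Python. This only models which documents match, not how they are scored, and `bool_matches` is a hypothetical helper, not part of any client library. The default it applies (1 when there are only should clauses, 0 otherwise) is my understanding of the documented behavior.

```python
from itertools import product


def bool_matches(must_results, should_results, minimum_should_match=None):
    """Decide whether a document matches a bool query.

    must_results / should_results hold one boolean per clause, telling
    whether that clause matched the document.
    """
    if minimum_should_match is None:
        # Assumed default: 1 if there are only should clauses, else 0.
        minimum_should_match = 1 if should_results and not must_results else 0
    enough_should = sum(should_results) >= minimum_should_match
    return all(must_results) and enough_should


# Three must clauses behave like three should clauses with minimum_should_match=3.
for combo in product([True, False], repeat=3):
    assert bool_matches(list(combo), []) == bool_matches([], list(combo), 3)

# With five should clauses, requiring three is a genuinely different condition.
print(bool_matches([], [True, True, True, False, False], 3))
print(bool_matches([], [True, True, False, False, False], 3))
```

The loop shows why the two queries in the question return the same documents, while the last two lines show the case where `minimum_should_match` is actually useful.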

match query on elastic search with multiple or conditions

I have three fields: status, type and search. I want to find the documents where status equals NEW or IN PROGRESS, type equals abc or xyz, and search contains a given value (partial match).
My call looks like this:
{
"query": {
"bool" : {
"must" : [{
"match": {
"status": {
"query": "abc"
}
}
}, {
"match": {
"type": {
"query": "NEW"
}
}
},{
"query_string": {
"query": "*abc*", /* for partial search */
"fields": ["title", "name"]
}
}]
}
}
}
Nest your bool queries. I think what you are missing is this:
"bool": { "should": [
{ "match": { "status": "abc" } },
{ "match": { "status": "xyz" } }
]}
This is a query which MUST match one of the should clauses as only should clauses are given.
EDIT to explain the differences:
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"status": "abc"
}
},
{
"match": {
"status": "xyz"
}
}
]
}
},
{
"terms": {
"type": [
"NEW",
"IN_PROGRESS"
]
}
},
{
"query_string": {
"query": "*abc*",
"fields": [
"title",
"name"
]
}
}
]
}
}
}
So you have a bool query at the top, and each of its three inner queries must be true.
The first is a nested bool query, which is true if status matches either abc or xyz.
The second is true if type matches exactly NEW or IN_PROGRESS. Note the difference here: the first one would also match ABC or aBc, or potentially "abc XYZ", depending on your analyzer. You might want terms for both.
The third is what you had before.
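If the allowed values change from request to request, the nested structure can be generated rather than written by hand. The following Python sketch assumes the field assignment used in the answer's query (status matched against abc/xyz, type against NEW/IN_PROGRESS); `any_of` and `build_query` are illustrative names, not part of any client library.

```python
import json


def any_of(field, values):
    """OR group: a bool query that matches if any of the values matches the field."""
    return {"bool": {"should": [{"match": {field: value}} for value in values]}}


def build_query(statuses, types, search_text, search_fields):
    """AND together: status is one of ..., type is exactly one of ..., partial text match."""
    return {
        "query": {
            "bool": {
                "must": [
                    any_of("status", statuses),
                    {"terms": {"type": list(types)}},
                    {
                        "query_string": {
                            # Leading and trailing wildcards give the partial match.
                            "query": f"*{search_text}*",
                            "fields": list(search_fields),
                        }
                    },
                ]
            }
        }
    }


body = build_query(["abc", "xyz"], ["NEW", "IN_PROGRESS"], "abc", ["title", "name"])
print(json.dumps(body, indent=2))
```

The printed body should have the same shape as the query in the answer: one must list whose first element is the nested should group.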

Select distinct values of bool query elastic search

I have a query that gets me some user post data from an Elasticsearch index. I am happy with that query, but I need it to return rows with unique usernames. Currently it displays relevant posts by users, but it may display one user twice.
{
"query": {
"bool": {
"should": [
{ "match_phrase": { "gtitle": {"query": "voice","boost": 1}}},
{ "match_phrase": { "gdesc": {"query": "voice","boost": 1}}},
{ "match": { "city": {"query": "voice","boost": 2}}},
{ "match": { "gtags": {"query": "voice","boost": 1} }}
],"must_not": [
{ "term": { "profilepicture": ""}}
],"minimum_should_match" : 1
}
}
}
I have read about aggregations but didn't understand much (I also tried to use aggs, but that didn't work either). Any help is appreciated.
You would need to use a terms aggregation to get all unique users, and then a top_hits aggregation to get only one result for each user. This is how it looks:
{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"gtitle": {
"query": "voice",
"boost": 1
}
}
},
{
"match_phrase": {
"gdesc": {
"query": "voice",
"boost": 1
}
}
},
{
"match": {
"city": {
"query": "voice",
"boost": 2
}
}
},
{
"match": {
"gtags": {
"query": "voice",
"boost": 1
}
}
}
],
"must_not": [
{
"term": {
"profilepicture": ""
}
}
],
"minimum_should_match": 1
}
},
"aggs": {
"unique_user": {
"terms": {
"field": "userid",
"size": 100
},
"aggs": {
"only_one_post": {
"top_hits": {
"size": 1
}
}
}
}
},
"size": 0
}
Here the size inside the terms aggregation is 100; you can increase it if you have more unique users (the default is 10). The outermost size is zero so that only the aggregation results are returned. One important thing to remember is that your user ids have to be unique, i.e. ABC and abc will be considered different users; you might have to make your userid not_analyzed to be sure about that. More on that.
Hope this helps!!
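For intuition, this is roughly what the terms + top_hits combination computes, written as a client-side Python sketch over plain search hits. It is an illustration only: `one_post_per_user` is a made-up helper, the sample hits are invented, and the real aggregation orders its buckets differently (by document count by default, as far as I know) and runs on the server.

```python
def one_post_per_user(hits, user_field="userid"):
    """Keep only the best-scoring hit for each distinct user id."""
    best = {}
    for hit in hits:
        user = hit["_source"][user_field]  # exact value: "ABC" and "abc" differ
        if user not in best or hit["_score"] > best[user]["_score"]:
            best[user] = hit
    # Return the surviving hits, most relevant first.
    return sorted(best.values(), key=lambda h: h["_score"], reverse=True)


# Invented sample hits, shaped like the entries of a search response.
sample_hits = [
    {"_score": 2.0, "_source": {"userid": "abc", "gtitle": "voice lessons"}},
    {"_score": 3.5, "_source": {"userid": "abc", "gtitle": "voice coach"}},
    {"_score": 1.0, "_source": {"userid": "ABC", "gtitle": "voice over"}},
]
for hit in one_post_per_user(sample_hits):
    print(hit["_source"]["userid"], hit["_score"])
```

Doing this on the client only deduplicates the page of hits you fetched, which is why the aggregation is the better fit when there are many posts per user.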
