Boosting has no effect in a Boolean-filtered query in Elasticsearch

I'm trying to add a boost to documents that match a term filter. The basis is a bool query with match_all. But the boosting in my Elasticsearch query has no effect; all result scores are set to 1:
curl -XPOST 'localhost:9200/wiki_content/_search?pretty' -H 'Content-Type: application/json' -d '
{
"_source": [
"title"
],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"filter": [
{
"bool": {
"should": [
{
"term": {
"title.keyword": {
"value": "Main Page",
"boost": 9
}
}
},
{
"term": {
"title.keyword": {
"value": "Top Page",
"boost": 999
}
}
}
]
}
}
]
}
}
}
'
However, when using a filtered query, the boosting works. But due to restrictions in my system I cannot use a filtered query. So is there any method to make the boosting in the original query work?

In the filter part of the query, boosting has no effect, because a filter's only job is to, ehhm, filter documents that match certain values; clauses in filter context do not contribute to the score. Try instead:
curl -XPOST 'localhost:9200/wiki_content/_search?pretty' -H 'Content-Type: application/json' -d '
{
"_source": [
"title"
],
"query": {
"bool": {
"must": [
{
"match_all": {}
}
],
"should": [
{
"term": {
"title.keyword": {
"value": "Main Page",
"boost": 9
}
}
},
{
"term": {
"title.keyword": {
"value": "Top Page",
"boost": 999
}
}
}
]
}
}
}
'
...moving the two term-queries directly into the should-clause in your top level bool query.
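The same move can be sketched programmatically. This is a minimal Python sketch, not part of the original answer: the helper name promote_filter_shoulds is hypothetical, and it assumes the query is held as a plain dict whose filter entries are either ordinary filters or a bool wrapper containing only should clauses.

```python
import copy
import json


def promote_filter_shoulds(query):
    """Return a copy of a bool query in which the `should` clauses of any
    bool wrapper found under `filter` are moved to the top-level `should`.

    Hypothetical helper for illustration; it only handles the shape used
    in the question.
    """
    result = copy.deepcopy(query)
    bool_query = result["bool"]
    remaining_filters = []
    for clause in bool_query.get("filter", []):
        inner = clause.get("bool", {})
        if set(inner) == {"should"}:
            # Scoring clauses: move them to a place where boosts count.
            bool_query.setdefault("should", []).extend(inner["should"])
        else:
            # Genuine filters stay where they are.
            remaining_filters.append(clause)
    if remaining_filters:
        bool_query["filter"] = remaining_filters
    else:
        bool_query.pop("filter", None)
    return result


original = {
    "bool": {
        "must": [{"match_all": {}}],
        "filter": [
            {"bool": {"should": [
                {"term": {"title.keyword": {"value": "Main Page", "boost": 9}}},
                {"term": {"title.keyword": {"value": "Top Page", "boost": 999}}},
            ]}}
        ],
    }
}

rewritten = promote_filter_shoulds(original)
print(json.dumps(rewritten, indent=2))
```

Note that this changes which documents are returned: with a must clause present, the should clauses become optional, so documents matching neither title are still returned, only with a lower score. If the restriction is still wanted, adding "minimum_should_match": 1 to the top-level bool should bring it back.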

Related

ElasticSearch lucene query with subclauses conversion to ES syntax

I've been trying to convert a lucene style query to ES query syntax but I'm getting stuck on sub-clauses. e.g.
(title:history^10 or series:history) and (NOT(language:eng) OR language:eng^5) and (isfree eq 'true' OR (isfree eq 'false' AND owned eq 'abc^5'))
This states that "get me a match for history in 'title' or 'series' but boost the title match AND where the language doesn't have to be english, but if it is then boost it AND where the match is free or where it isn't free then make sure it's owned by customer abc".
I feel this is a tricky query but it seems to work correctly. Converting the clauses to ES syntax is confusing me as I don't really have the concept of brackets. I think I need to use bool queries... I have the following which I know doesn't apply the criteria correctly - it says you should have (language:eng OR isFree eq 'true' OR owned:abc). I can't seem to make the mental leap to build the must/should with NOT's in it.
Help please?
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "history",
"fields": [
"title^10.0",
"series"
]
}
}
],
"should": [
{
"term": {
"language": {
"value": "eng",
"boost": 5
}
}
},
{
"term": {
"isFree": {
"value": true
}
}
},
{
"term": {
"owned": {
"value": "abc",
"boost": 5
}
}
}
]
}
},
Your query is almost correct, the only thing that wasn't translated correctly was this part of the query:
(isfree eq 'true' OR (isfree eq 'false' AND owned eq 'abc^5'))
If I understand your post correctly, this is basically saying: boost the 'owned' field by a factor of five when its value is 'abc' and the item is not free. To implement this, you need to use an additional bool query that:
Filters results by isFree: false
Boosts the owned field of any documents matching abc
"bool": {
"filter": [
{
"term": {
"isFree": {
"value": false
}
}
}
],
"must": [
{
"term": {
"owned": {
"value": "abc",
"boost": 5
}
}
}
]
}
Since this is not intended to limit the result set and only boosts results that meet these criteria, the bool query above should be placed inside your parent bool's should section. The final query looks like:
POST /myindex/_search
{
"explain": true,
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "history",
"fields": [
"title^10",
"series"
]
}
}
],
"should": [
{
"term": {
"language": {
"value": "eng",
"boost": 5
}
}
},
{
"bool": {
"filter": [
{
"term": {
"isFree": {
"value": false
}
}
}
],
"must": [
{
"term": {
"owned": {
"value": "abc",
"boost": 5
}
}
}
]
}
}
]
}
}
}
Note: Using should or must yields the same results for that inner bool. I honestly am not sure which would be better to use, so I just arbitrarily used must.
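On the "concept of brackets": each bracketed group can be treated as its own bool query, with AND mapping to must, OR to should and NOT to must_not. The Python sketch below is a literal, bracket-for-bracket translation of the Lucene expression from the question, not the query given in the answer above. The helper names (all_of, any_of, none_of, term, match) are hypothetical, and using match for title/series is an assumption about how those fields are analyzed.

```python
import json


def all_of(*clauses):
    """AND: every clause must match."""
    return {"bool": {"must": list(clauses)}}


def any_of(*clauses):
    """OR: at least one clause must match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}


def none_of(*clauses):
    """NOT: no clause may match."""
    return {"bool": {"must_not": list(clauses)}}


def term(field, value, boost=None):
    body = {"value": value}
    if boost is not None:
        body["boost"] = boost
    return {"term": {field: body}}


def match(field, text, boost=None):
    body = {"query": text}
    if boost is not None:
        body["boost"] = boost
    return {"match": {field: body}}


# (title:history^10 OR series:history)
# AND (NOT(language:eng) OR language:eng^5)
# AND (isFree:true OR (isFree:false AND owned:abc^5))
query = all_of(
    any_of(match("title", "history", boost=10), match("series", "history")),
    any_of(none_of(term("language", "eng")), term("language", "eng", boost=5)),
    any_of(
        term("isFree", True),
        all_of(term("isFree", False), term("owned", "abc", boost=5)),
    ),
)

print(json.dumps({"query": query}, indent=2))
```

Unlike the answer's version, this form keeps the third group as a hard requirement, so non-free items not owned by abc are excluded rather than merely left unboosted.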

ElasticSearch multimatch substring search

I have to combine two filters to match requirements:
- a specific list of values in r.status field
- one of the multiple text fields contains the value.
The resulting query (built with NEST, but that doesn't matter) looks like:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"bool": {
"should": [
{
"match": {
"r.g.firstName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
},
{
"match": {
"r.g.lastName": {
"type": "phrase",
"query": "SUBSTRING_VALUE"
}
}
}
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
Also tried with multi_match query:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"isActive": {
"value": true
}
}
},
{
"nested": {
"query": {
"bool": {
"must": [
{
"terms": {
"r.status": [
"VALUE_1",
"VALUE_2",
"VALUE_3"
]
}
},
{
"multi_match": {
"query": "SUBSTRING_VALUE",
"fields": [
"r.g.firstName",
"r.g.lastName"
]
}
}
]
}
},
"path": "r"
}
}
]
}
}
]
}
}
}
FirstName and LastName are configured in index mappings as text:
"firstName": {
"type": "text"
},
"lastName": {
"type": "text"
}
Elastic gives a lot of full-text search options: multi_match, phrase, wildcards etc. But all of them fail in my case when looking for a substring in my text fields. (The terms query and the isActive one work well; I tried running only them.)
What other options do I have, or where did I make a mistake?
UPD: Combined wildcards worked for me, but such query looks ugly. Looking for a more elegant solution.
The Elasticsearch way is to use an ngram tokenizer.
The ngram analyzer will split your terms with a sliding window. For example, the input "Hello World" will generate the following terms:
Hel
Hell
Hello
ell
ello
...
Wor
World
orl
...
You can configure the minimum and maximum size of the sliding window (in the example the minimum size is 3). Once the sub-terms are generated, you can use a match query on the subfield.
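To make the sliding window concrete, here is a small Python sketch that imitates it outside Elasticsearch. It is an illustration only: the function name ngrams is hypothetical, it splits on whitespace first, and it assumes a window of 3 to 5 characters, which may not match how a given ngram tokenizer is configured.

```python
def ngrams(text, min_gram=3, max_gram=5):
    """Return every substring of each word whose length is between
    min_gram and max_gram, in sliding-window order."""
    terms = []
    for word in text.split():
        for start in range(len(word)):
            for size in range(min_gram, max_gram + 1):
                if start + size <= len(word):
                    terms.append(word[start:start + size])
    return terms


terms = ngrams("Hello World")
print(terms)

# A substring search then becomes an exact lookup among the indexed terms.
print("orl" in terms)
```

Because every substring within the window is indexed as its own term, a plain match query can find "orl" inside "World" without wildcards, at the cost of a larger index.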
Another point: it is weird to use must within a filter. If you are interested in the score, you should always use must; otherwise use filter. Read this article for a good understanding.

ElasticSearch How to AND a nested query

I am trying to figure out how to AND my Elastic Search query. I've tried a few different variations but I am always hitting a parser error.
What I have is a structure like this:
{
"title": "my title",
"details": [
{ "name": "one", "value": 100 },
{ "name": "two", "value": 21 }
]
}
I have defined details as a nested type in my mappings. What I'm trying to achieve is a query where it matches a part of the title and it matches various details by the detail's name and value.
I have the following query which gets me nearly there but I haven't been able to figure out how to AND the details. As an example I'd like to find anything that has:
detail of one with value less than or equal to 100
AND detail of two with value less than or equal to 25
The following query only allows me to search by one detail name/value:
"query" : {
"bool": {
"must": [
{ "match": {"title": {"query": titleQuery, "operator": "and" } } },
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{ "match": {"details.name" : "one"} },
{ "range": {"details.value" : { "lte": 100 } } }
]
}
}
} // nested
}
] // must
}
}
As a second question, would it be better to query the title and then move the nested part of the query into a filter?
You were so close! Just add another "nested" clause in your outer "must":
POST /test_index/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"title": {
"query": "title",
"operator": "and"
}
}
},
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{"match": {"details.name": "one" } },
{ "range": { "details.value": { "lte": 100 } } }
]
}
}
}
},
{
"nested": {
"path": "details",
"query": {
"bool": {
"must": [
{"match": {"details.name": "two" } },
{ "range": { "details.value": { "lte": 25 } } }
]
}
}
}
}
]
}
}
}
Here is some code I used to test it:
http://sense.qbox.io/gist/1fc30d49a810d22e85fa68d781114c2865a7c92e
EDIT: Oh, the answer to your second question is "yes", though if you're using 2.0 things have changed a little.
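If the number of detail constraints varies, the repeated "nested" clauses can be generated rather than written by hand. A minimal Python sketch, assuming the query body is built as a dict before being sent; the helper names nested_detail and build_query are hypothetical.

```python
import json


def nested_detail(name, max_value, path="details"):
    """One nested clause: a detail with this name and value <= max_value."""
    return {
        "nested": {
            "path": path,
            "query": {
                "bool": {
                    "must": [
                        {"match": {f"{path}.name": name}},
                        {"range": {f"{path}.value": {"lte": max_value}}},
                    ]
                }
            },
        }
    }


def build_query(title, limits):
    """AND the title match with one nested clause per (name, limit) pair."""
    must = [{"match": {"title": {"query": title, "operator": "and"}}}]
    must.extend(nested_detail(name, limit) for name, limit in limits.items())
    return {"query": {"bool": {"must": must}}}


body = build_query("title", {"one": 100, "two": 25})
print(json.dumps(body, indent=2))
```

Each constraint needs its own nested clause because a single nested query is evaluated against one nested object at a time, so "name is one" and "name is two" can never both hold inside the same clause.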

Elasticsearch Parse Exception for boolean queries

I'm trying to write Elasticsearch queries that behave like my Kibana (Lucene) queries. What I'm basically trying to do is match some phrases. For example, my Kibana query looks like this: (+"anna smith") AND ( (+"university"), (+"chairman"), (+"women rights")). It searches for "anna smith" as a must and the other phrases as should (at least one of them should exist in the text). I wrote a query to do this, but it gives "elasticsearch parse exception: expected field name but got start_object". How can I solve this? Here is my query:
{
"query": {
"bool": {
"must": {
"match": {
"text": {
"query": "anna smith",
"operator": "and"
}
}
}
},
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"text": {
"query": "university",
"boost": 2
}
}
},
{
"match": {
"text": {
"query": "chairman",
"boost": 2
}
}
}
]
}
}]
}}}}
Your second query at the bottom cannot be there: its "query" key is a sibling of the first "bool", which the parser does not accept. It needs to go inside the first bool/must, like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"text": {
"query": "anna smith",
"operator": "and"
}
}
},
{
"bool": {
"should": [
{
"match": {
"text": {
"query": "university",
"boost": 2
}
}
},
{
"match": {
"text": {
"query": "chairman",
"boost": 2
}
}
}
]
}
}
]
}
}
}
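The corrected shape (one required match plus a group of alternatives, all inside a single bool/must) can also be generated. A minimal Python sketch with a hypothetical helper name, must_with_any; the explicit minimum_should_match is an addition that only spells out the "at least one" requirement, and the third phrase is taken from the Kibana query in the question.

```python
import json


def must_with_any(field, required, optional, boost=2):
    """Require `required`, plus at least one of the `optional` texts."""
    alternatives = [
        {"match": {field: {"query": text, "boost": boost}}}
        for text in optional
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {field: {"query": required, "operator": "and"}}},
                    {"bool": {"should": alternatives, "minimum_should_match": 1}},
                ]
            }
        }
    }


body = must_with_any("text", "anna smith", ["university", "chairman", "women rights"])
print(json.dumps(body, indent=2))
```

For multi-word alternatives such as "women rights", a plain match accepts either word; a match_phrase clause may be closer to the quoted phrases of the Kibana query.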

Elasticsearch boost score with nested query

I have the following query in Elasticsearch version 1.3.4:
{
"filtered": {
"query": {
"bool": {
"should": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
},
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "adobe creative suite"
}
}
]
}
}
]
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "skills",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
},
{
"bool": {
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 5
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 5
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
}
}
}
]
}
}
],
"minimum_should_match": "100%"
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
},
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "ajax"
}
},
{
"term": {
"skills.name.original": "html"
}
}
]
}
}
]
}
}
}
Mappings look like this:
skills: {
type: "nested",
include_in_parent: true,
properties: {
name: {
type: "multi_field",
fields: {
name: {type: "string"},
original: {type : "string", analyzer : "string_lowercase"}
}
}
}
}
and finally the document structure, for skills (excluded other parts), looks like this:
"skills":
[
{
"name": "java",
"source": [
"linkedin",
"facebook"
]
},
{
"name": "html",
"source": [
"meetup"
]
}
]
My goal with this query is to first filter out some irrelevant hits with the filters (bottom of the query), then score a person by searching the whole document for the match_phrase "java", with extra boosting if it also contains the match_phrase "adobe creative suite", then check the nested value where we get a hit in "skills" to see what kind of "source(s)" the skill came from, and then give the query a boost based on which source or sources the nested object has.
This kind of works, at least I don't get any errors, but the final score is odd and it's hard to see if it's working. If I give a small boost, let's say 2, the score goes DOWN slightly. My top hit at the moment has a score of 32.176407 with boost = 1. With a boost of 5 it goes down to 31.637703. I would expect it to go up, not down. With a boost of 1000, the score goes down to 2.433376.
Is this the right way to do this, or is there a better/easier way? I could change the structure and mappings etc. And why is my score decreasing?
Edit: I have simplified the query a little, only dealing with one "skill":
{
"filtered": {
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase": {
"_all": "java"
}
}
],
"minimum_should_match": 1
}
}
]
}
}
],
"should": [
{
"nested": {
"path": "skills",
"score_mode": "avg",
"query": {
"bool": {
"must": [
{
"term": {
"skills.name.original": "java"
}
}
],
"should": [
{
"match": {
"skills.source": {
"query": "linkedin",
"boost": 1.2
}
}
},
{
"match": {
"skills.source": {
"query": "meetup",
"boost": 1.2
}
}
}
]
}
}
}
}
]
}
},
"filter": {
"and": [
{
"bool": {
"should": [
{
"term": {
"skills.name.original": "java"
}
}
]
}
}
]
}
}
}
The problem now is that I expect two similar documents, where the only difference is the "source" value on the skill "java". They are "linkedin" and "meetup" respectively. In my new query, they both get the same boost, but the final _score is very different for the two documents.
From the query explanation for doc 1:
"value": 3.82485,
"description": "Score based on child doc range from 0 to 125"
and for doc two:
"value": 2.1993546,
"description": "Score based on child doc range from 0 to 125"
These values are the only ones that differ, and I can't see why.
I can't answer the question regarding the boost, but how many shards do you have on the index?
TF and IDF are calculated per shard not per index and this could be creating the difference in score.
https://groups.google.com/forum/#!topic/elasticsearch/FK-PYb43zcQ.
If you reindex with only 1 shard, does that change the outcome?
Edit: Also, the doc range is the range of docs for each document in the shard and you can use this to calculate IDF for each doc to verify scores.
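To see how per-shard statistics alone can separate two otherwise identical documents, here is a small Python sketch using the classic Lucene TF/IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)). The shard sizes and document frequencies are made-up numbers for illustration, not values from the question, and the function name classic_idf is hypothetical.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Inverse document frequency as in Lucene's classic TF/IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics: the same term is rare on one shard, common on another.
idf_shard_a = classic_idf(doc_freq=5, num_docs=1000)
idf_shard_b = classic_idf(doc_freq=40, num_docs=1000)

print(f"shard A idf: {idf_shard_a:.4f}")
print(f"shard B idf: {idf_shard_b:.4f}")
```

Two documents with the same term frequency would then get different scores purely because they live on different shards, which is why reindexing into a single shard is a useful check.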
