Elasticsearch: multiple simple_query_string queries with boost

I have an index set up for all my documents:
{
"mappings": {
"book": {
"_source": { "enabled": true },
"properties": {
"title": { "type": "string", "analyzer": "standard", "search_analyzer": "standard" },
"description": { "type": "string", "analyzer": "standard", "search_analyzer": "standard" },
"author": { "type": "string", "analyzer": "standard", "search_analyzer": "standard" }
}
}
}
}
I push this through into an index called "library".
What I want to do is execute a search with the following requirements. Assume the user entered something like "simple yellow shovel".
Execute a search of the user-entered keywords in three ways:
As is, as a whole phrase: "simple yellow shovel"
As a set of AND keywords: "simple+yellow+shovel"
As a set of OR keywords: "simple|yellow|shovel"
Ensure that the keyword sets are executed in order of priority (boosted?):
Full text first
AND'd second
OR'd third
Using a simple query works fine for a single search:
{
"query": {
"simple_query_string": {
"query": "\"simple yellow shovel\""
}
}
}
How do I execute the multiple search with boosting?
Or should I be using something like a "match" query on the indexed fields?

I am not sure I got this one right. I have assumed a field priority order of
author > title > description:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"multi_match": {
"query": "simple yellow shovel",
"fields": [
"author^7",
"title^3",
"description"
],
"type": "phrase",
"boost": 10
}
}
]
}
},
{
"bool": {
"must": [
{
"multi_match": {
"query": "simple",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 5
}
},
{
"multi_match": {
"query": "yellow",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 5
}
},
{
"multi_match": {
"query": "shovel",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 5
}
}
]
}
},
{
"multi_match": {
"query": "simple",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 2
}
},
{
"multi_match": {
"query": "yellow",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 2
}
},
{
"multi_match": {
"query": "shovel",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 2
}
}
]
}
}
}
Could anyone please verify this? You could refer to the Boost Query documentation for more info. Is this what you are looking for?
I hope this helps!
EDIT: Rewritten with dis_max
{
"query": {
"bool": {
"should": [
{
"dis_max": {
"tie_breaker": 0.7,
"queries": [
{
"bool": {
"must": [
{
"multi_match": {
"query": "simple yellow shovel",
"fields": [
"author^7",
"title^3",
"description"
],
"type": "phrase",
"boost": 10
}
}
]
}
},
{
"bool": {
"must": [
{
"dis_max": {
"tie_breaker": 0.7,
"queries": [
{
"multi_match": {
"query": "simple",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 5
}
},
{
"multi_match": {
"query": "yellow",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 5
}
},
{
"multi_match": {
"query": "shovel",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 5
}
}
]
}
}
]
}
},
{
"multi_match": {
"query": "simple",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 2
}
},
{
"multi_match": {
"query": "yellow",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 2
}
},
{
"multi_match": {
"query": "shovel",
"fields": [
"author^7",
"title^3",
"description"
],
"boost": 2
}
}
]
}
}
]
}
}
}
This seems to give me much better results, at least on my dataset. This is a great source to understand dismax
Please experiment with this and check whether you get the expected results.
The Explain API can help here.
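To make the tie_breaker in the dis_max rewrite above easier to reason about, here is a small Python sketch of how a dis_max clause is documented to combine the scores of its matching sub-queries: the best score, plus tie_breaker times the sum of the other scores. The function name and the example scores are made up for illustration; real scores come from Elasticsearch and can be inspected with the Explain API.

```python
def dis_max_score(sub_scores, tie_breaker=0.0):
    """Combine sub-query scores the way a dis_max clause does.

    sub_scores: scores of the individual sub-queries for one document
                (0 means the sub-query did not match).
    """
    matching = [s for s in sub_scores if s > 0]
    if not matching:
        return 0.0
    best = max(matching)
    # The best clause counts in full; the others are scaled by tie_breaker.
    return best + tie_breaker * (sum(matching) - best)


# Hypothetical scores for the phrase, AND and OR clauses of one document.
scores = [3.0, 1.0, 0.5]
print(dis_max_score(scores, tie_breaker=0.7))
print(dis_max_score(scores, tie_breaker=0.0))  # pure "best clause wins"
```

With tie_breaker at 0 only the best clause counts; with values close to 1 the result approaches a plain sum, which is roughly what a bool/should would give.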

I've rewritten this using Dis Max Query. Keep in mind that you could try different types to get better results. See these:
best_fields
most_fields
cross_fields
Query:
POST /your_index/your_type/_search
{
"query": {
"dis_max": {
"tie_breaker": 0.7,
"boost": 1.2,
"queries": [
{
"multi_match": {
"query": "simple yellow shovel",
"type": "phrase",
"boost": 3,
"fields": [
"title^3",
"author^2",
"description"
]
}
},
{
"multi_match": {
"query": "simple yellow shovel",
"operator": "and",
"boost": 2,
"fields": [
"title^3",
"author^2",
"description"
]
}
},
{
"multi_match": {
"query": "simple yellow shovel",
"fields": [
"title^3",
"author^2",
"description"
]
}
}
]
}
}
}
The Dis Max query scores each document by whichever of the three queries matched it best, plus tie_breaker times the scores of the other matching queries. We give an additional boost to the "type": "phrase" and "operator": "and" queries, while we leave the last query unboosted.
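If the search text comes from user input, the three-tier query above can be generated instead of hand-written. The following is only a sketch in Python that builds the request body as a plain dict; the function name tiered_query and the FIELDS constant are my own, and the boosts and tie_breaker are simply the values from the query above.

```python
import json

# Field boosts taken from the query above.
FIELDS = ["title^3", "author^2", "description"]


def tiered_query(text, fields=FIELDS, tie_breaker=0.7):
    """Build a dis_max body: phrase match > all terms (AND) > any term (OR)."""
    phrase = {"multi_match": {"query": text, "type": "phrase",
                              "boost": 3, "fields": list(fields)}}
    all_terms = {"multi_match": {"query": text, "operator": "and",
                                 "boost": 2, "fields": list(fields)}}
    # Default operator is OR, and this tier is left unboosted.
    any_term = {"multi_match": {"query": text, "fields": list(fields)}}
    return {
        "query": {
            "dis_max": {
                "tie_breaker": tie_breaker,
                "boost": 1.2,
                "queries": [phrase, all_terms, any_term],
            }
        }
    }


print(json.dumps(tiered_query("simple yellow shovel"), indent=2))
```

The printed JSON can be sent as the body of the _search request; sending it is left out here so the sketch does not depend on any particular client library.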

Related

Has there been any change in the format of using function_score in ES 6.8?

I have a query in the format below, and it runs in ES 2.4:
{"query":{"function_score":{"filter":{"bool":{"must":[{"exists":{"field":"x"}},{"query_string":{"query":"en","fields":["locale"]}},{"query_string":{"query":"US","fields":["channel"]}},{"query_string":{"query":"UG","fields":["usergroups"]}}]}},"query":{"bool":{"should":{"multi_match":{"query":"refund","fields":["doc","key","title","title.standard_analyzed^3","x"],"type":"phrase","slop":20}},"must":{"multi_match":{"fuzziness":"0","query":"refund","prefix_length":"6","fields":["doc","key","title","title.standard_analyzed^3","x"],"max_expansions":"30"}}}},"functions":[{"field_value_factor":{"field":"usage","factor":1,"modifier":"log2p","missing":1}}]}},"from":0,"size":21}
But when I try the same query in 6.8, it returns an error:
{"error":{"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [function_score]",
If I put the filters inside the query, I get a response, but the order of the docs doesn't match because the scores differ.
Below function_score, the search itself should go only under the "query" key. You have to move the filter into the bool query.
I don't know your mapping, but I would use the "term" query instead of the query string.
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"exists": {
"field": "x"
}
},
{
"query_string": {
"query": "en",
"fields": [
"locale"
]
}
},
{
"query_string": {
"query": "US",
"fields": [
"channel"
]
}
},
{
"query_string": {
"query": "UG",
"fields": [
"usergroups"
]
}
}
]
}
},
"should": {
"multi_match": {
"query": "refund",
"fields": [
"doc",
"key",
"title",
"title.standard_analyzed^3",
"x"
],
"type": "phrase",
"slop": 20
}
},
"must": {
"multi_match": {
"fuzziness": "0",
"query": "refund",
"prefix_length": "6",
"fields": [
"doc",
"key",
"title",
"title.standard_analyzed^3",
"x"
],
"max_expansions": "30"
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "usage",
"factor": 1,
"modifier": "log2p",
"missing": 1
}
}
]
}
},
"from": 0,
"size": 21
}
See the function_score documentation for 6.8.
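If there are many such 2.x-style bodies to migrate, the move of the filter into the bool query can be done mechanically. This is a minimal Python sketch operating on the request body as a dict; the helper name move_filter_into_query and the small example body (including its term filter on locale) are hypothetical, and it only handles the shape shown in the question.

```python
import copy


def move_filter_into_query(body):
    """Return a copy of body with function_score.filter moved to
    function_score.query.bool.filter. The input is not modified."""
    new_body = copy.deepcopy(body)
    fs = new_body["query"]["function_score"]
    old_filter = fs.pop("filter", None)
    if old_filter is None:
        return new_body  # nothing to migrate

    inner = fs.get("query", {"match_all": {}})
    if set(inner) == {"bool"}:
        bool_q = inner["bool"]
    else:
        # Wrap a non-bool query so that it still has to match.
        bool_q = {"must": inner}

    existing = bool_q.get("filter", [])
    if isinstance(existing, dict):
        existing = [existing]
    bool_q["filter"] = list(existing) + [old_filter]
    fs["query"] = {"bool": bool_q}
    return new_body


# Hypothetical, shortened version of the query from the question.
old = {
    "query": {
        "function_score": {
            "filter": {"term": {"locale": "en"}},
            "query": {"bool": {"must": {"match": {"title": "refund"}}}},
            "functions": [{"field_value_factor": {"field": "usage"}}],
        }
    }
}
new = move_filter_into_query(old)
print(new["query"]["function_score"]["query"])
```

Clauses under "filter" run in filter context and do not contribute to the score, so only the must and should clauses and the functions influence the ordering.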

Boost certain keywords in search query

I am trying to boost certain keywords in my multi_match query that are more important than other words.
Data set: ['black kurta','blue kurta','green kurta','black pant' ]
E.g. (search for "black kurta"):
the first result should be 'black kurta', then 'blue kurta' and 'green kurta', and lastly kurta
{
"query": {
"multi_match" : {
"query": "Black kurta",
"type": "best_fields",
"fields": [ "name^3","meta_title^3","meta_description","short_description","meta_keyword^3","description^1" ],
"tie_breaker": 0.3
}
}
}
Try this; note the boost values.
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "black kurta",
"fields": [
"name",
"meta_title",
"meta_description",
"short_description", ...
],
"type": "phrase",
"boost": 10
}
},
{
"multi_match": {
"query": "blue kurta",
"fields": [
"name",
"meta_title",
"meta_description",
"short_description",
...
],
"operator": "and",
"boost": 4
}
},
{
"multi_match": {
"query": "green kurta",
"fields": [
"name",
"meta_title",
"meta_description",
"short_description",
...
],
"operator": "and",
"boost": 2
}
}
]
}
}

How to boost certain words/phrases of a should clause in ES?

I am using the following query:
{
"_source": [
"title",
"bench",
"id_",
"court",
"date",
"content"
],
"size": 15,
"from": 0,
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "london",
"fields": [
"title",
"content"
]
}
}
],
"should": [
{
"multi_match": {
"query": "London is a beautiful city and has a lot of amazing landmarks. I love the Thames!",
"fields": [
"title",
"content^2"
],
"operator": "or"
}
}
]
}
},
"highlight": {
"pre_tags": [
"<tag1>"
],
"post_tags": [
"</tag1>"
],
"fields": {
"content": {}
},
"number_of_fragments": 5,
"fragment_size": 300
}
}
The rationale of the query is that the word London must be present, while the terms in the should clause should just boost the score. Within the should clause, I would like to boost the phrase "beautiful city" and the word "Thames". How do I do it?
PS: Content and Title are standard text fields with no analyzers applied to them.
Regards
You can add multiple clauses to the should query:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "london",
"fields": [
"title",
"content"
]
}
}
],
"should": [
{
"multi_match": {
"query": "beautiful city",
"fields": [
"title",
"content^2"
],
"type": "phrase"
}
},
{
"multi_match": {
"query": "Thames",
"fields": [
"title",
"content^2"
]
}
}
]
}
}
}
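The same pattern can be generated for any list of phrases or words to favour. Below is a rough Python sketch that builds the body as a dict; the function name boosted_should is my own, and the explicit per-clause "boost" values are an assumption added for illustration (the query above relies on the clauses merely being present in should).

```python
def boosted_should(must_text, boosts, fields=("title", "content^2")):
    """Build a bool query: must_text is required, each (text, boost)
    pair in boosts becomes an optional should clause."""
    should = []
    for text, boost in boosts:
        clause = {"query": text, "fields": list(fields), "boost": boost}
        if " " in text.strip():
            # Multi-word entries are matched as a phrase.
            clause["type"] = "phrase"
        should.append({"multi_match": clause})
    return {
        "query": {
            "bool": {
                "must": [{"multi_match": {"query": must_text,
                                          "fields": ["title", "content"]}}],
                "should": should,
            }
        }
    }


# Boost values 3 and 2 are arbitrary examples.
body = boosted_should("london", [("beautiful city", 3), ("Thames", 2)])
print(body["query"]["bool"]["should"])
```

Documents still only need to match the must clause; each should clause that also matches adds to the score, scaled by its boost.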

How to give different weights to exact, phonetic and fuzzy queries?

Note: I checked out this answer, but could not solve the problem.
So currently I am using the following query:
{
"_source": [
"title",
"bench",
"id_",
"court",
"date"
],
"size": 15,
"from": 0,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "knife",
"fields": [
"title",
"body"
],
"operator": "and"
}
},
"should": {
"multi_match": {
"query": "knife",
"fields": [
"title",
"body"
],
"fuzziness" : 1,
"operator": "and"
}
}
}
},
"highlight": {
"pre_tags": [
"<tag1>"
],
"post_tags": [
"</tag1>"
],
"fields": {
"content": {}
},
"fragment_size": 30
}
}
What I want to achieve is to give different weights to exact, phonetic and fuzzy queries, in the order exact > fuzzy > phonetic. How do I achieve this?
This is my mapping - (My analyzer is a Metaphone analyzer)
{
"courts_2": {
"mappings": {
"properties": {
"author": {
"type": "text",
"analyzer": "my_analyzer"
},
"bench": {
"type": "text",
"analyzer": "my_analyzer"
},
"citation": {
"type": "text"
},
"content": {
"type": "text",
"fields": {
"standard": {
"type": "text"
}
},
"analyzer": "my_analyzer"
},
"court": {
"type": "text"
},
"date": {
"type": "text"
},
"id_": {
"type": "text"
},
"title": {
"type": "text",
"fields": {
"standard": {
"type": "text"
}
},
"analyzer": "my_analyzer"
},
"verdict": {
"type": "text"
}
}
}
}
}
You might index the phonetic fields in a separate sub-field, as follows:
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"phonetic": {
"type": "text",
"analyzer": "my_analyzer"
}
}}}}
Then you can use a function_score query to get the order exact > fuzzy > phonetic:
{
"_source": [
"title",
"bench",
"id_",
"court",
"date"
],
"size": 15,
"from": 0,
"query": {
"bool": {
"should": [
{
"function_score": {
"query": {
"multi_match": {
"query": "knife",
"fields": [
"title",
"body"
],
"operator": "and"
}
},
"boost": 3
}
},
{
"function_score": {
"query": {
"multi_match": {
"query": "knife",
"fields": [
"title",
"body"
],
"fuzziness": 1,
"operator": "and"
}
},
"boost": 2
}
},
{
"function_score": {
"query": {
"multi_match": {
"query": "knife",
"fields": [
"title.phonetic",
"body.phonetic"
],
"operator": "and"
}
},
"boost": 1
}
}
]
}
}
}
Hope this helps !
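For experimenting with different weights, the three function_score clauses above can be generated from a small table of tiers. This is just a Python sketch; weighted_tiers and the TIERS list are hypothetical names, and the fields and boosts are copied from the query above.

```python
# One entry per tier: which fields to search, extra multi_match options,
# and the boost applied through function_score.
TIERS = [
    {"fields": ["title", "body"], "extra": {}, "boost": 3},                     # exact
    {"fields": ["title", "body"], "extra": {"fuzziness": 1}, "boost": 2},       # fuzzy
    {"fields": ["title.phonetic", "body.phonetic"], "extra": {}, "boost": 1},   # phonetic
]


def weighted_tiers(text, tiers=TIERS):
    """Build a bool/should query with one function_score clause per tier."""
    should = []
    for tier in tiers:
        mm = {"query": text, "fields": list(tier["fields"]), "operator": "and"}
        mm.update(tier["extra"])
        should.append({"function_score": {"query": {"multi_match": mm},
                                          "boost": tier["boost"]}})
    return {"query": {"bool": {"should": should}}}


body = weighted_tiers("knife")
for clause in body["query"]["bool"]["should"]:
    fs = clause["function_score"]
    print(fs["boost"], fs["query"]["multi_match"]["fields"])
```

Keep in mind that a boost only scales each clause's score, so the resulting order is worth checking on real data, for example with "explain": true.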

How to convert Lucene query string to Elasticsearch Match/Match_Prefix etc equivalent

I am currently working on migrating from SOLR v3 to Elasticsearch v5.11. My question is, how would I convert the below query string to an Elasticsearch Match/Match Phrase etc equivalent. Is this even possible?
(entityName:(john AND lewis OR "john lewis")
OR entityNameText:(john AND lewis OR "john lewis"))
AND (status( "A" OR "I" status))
I tried to do so, so far only with the first set of brackets but it doesn't seem correct:
{
"bool": {
"should": [
[{
"bool": {
"should": [
[{
"match_phrase": {
"entityName": "john lewis"
}
}]
],
"must": [
[{
"match": {
"entityName": {
"query": "john lewis",
"operator": "and"
}
}
}]
]
}
}, {
"bool": {
"should": [
[{
"match_phrase": {
"entityNameText": "john lewis"
}
}]
],
"must": [
[{
"match": {
"entityNameText": {
"query": "john lewis",
"operator": "and"
}
}
}]
]
}
}]
]
}
}
Thanks
Updated:
entityName and entityNameText are both mapped as text types with custom analyzers for both search and query. Status is mapped as a keyword type.
Posting the answer for anyone who is interested in this in the future.
Not entirely sure why but I wrote two alternative queries using ES Query DSL and found them to be equivalent to the original Lucene query, returning exactly the same results. Not sure if that's a pro or con of the ES Query DSL.
Original Lucene Query:
{
"query": {
"query_string" : {
"query" : "entityName:(john AND Lewis OR \"john Lewis\") OR entityNameText:(john AND Lewis OR \"john Lewis\")"
}
}
}
Query alternative 1:
{
"bool": {
"should": [
[{
"bool": {
"should": [
[{
"match": {
"entityName": {
"query": "john Lewis",
"operator": "and"
}
}
}, {
"match_phrase": {
"entityName": "john Lewis"
}
}]
]
}
}, {
"bool": {
"should": [
[{
"match": {
"entityNameText": {
"query": "john Lewis",
"operator": "and"
}
}
}, {
"match_phrase": {
"entityNameText": "john Lewis"
}
}]
]
}
}]
]
}
}
Query alternative 2
{
"bool": {
"should": [
[{
"multi_match": {
"query": "john Lewis",
"type": "most_fields",
"fields": ["entityName", "entityNameText"],
"operator": "and"
}
}, {
"multi_match": {
"query": "john Lewis",
"type": "phrase",
"fields": ["entityName", "entityNameText"]
}
}]
]
}
}
With this mapping:
{
"entity": {
"dynamic_templates": [{
"catch_all": {
"match_mapping_type": "*",
"mapping": {
"type": "text",
"store": true,
"analyzer": "phonetic_index",
"search_analyzer": "phonetic_query"
}
}
}],
"_all": {
"enabled": false
},
"properties": {
"entityName": {
"type": "text",
"store": true,
"analyzer": "indexed_index",
"search_analyzer": "indexed_query",
"fields": {
"entityNameLower": {
"type": "text",
"analyzer": "lowercase"
},
"entityNameText": {
"type": "text",
"store": true,
"analyzer": "text_index",
"search_analyzer": "text_query"
},
"entityNameNgram": {
"type": "text",
"analyzer": "ngram_index",
"search_analyzer": "ngram_query"
},
"entityNamePhonetic": {
"type": "text",
"analyzer": "ngram_index",
"search_analyzer": "ngram_query"
}
}
},
"status": {
"type": "keyword",
"norms": false,
"store": true
}
}
}
}
The answer will depend on how you've specified your mapping, but I'll assume that you did zero custom mapping.
Let's break down the different parts first, then we'll put them all back together.
status( "A" OR "I" status)
This is a "terms" query, think of it as a SQL "IN" clause.
"terms": {
"status": [
"a",
"i"
]
}
entityName:(john AND lewis OR "john lewis")
Elasticsearch breaks text fields down into individual terms. We can use this to our advantage here with another "terms" query, which matches documents containing any of the listed terms. We don't need to specify it as 3 different parts; ES will handle that under the hood.
"terms": {
"entityName": [
"john",
"lewis"
]
}
entityNameText:(john AND lewis OR "john lewis"))
Exactly the same logic as above, just searching on a different field
"terms": {
"entityNameText": [
"john",
"lewis"
]
}
AND vs OR
In an ES bool query, AND = "must" and OR = "should".
Put it all together
GET test1/type1/_search
{
"query": {
"bool": {
"must": [
{
"terms": {
"status": [
"a",
"i"
]
}
},
{
"bool": {
"should": [
{
"terms": {
"entityName": [
"john",
"lewis"
]
}
},
{
"terms": {
"entityNameText": [
"john",
"lewis"
]
}
}
]
}
}
]
}
}
}
Below is a link to the full setup I used to test the query.
https://gist.github.com/jayhilden/cf251cd751ef8dce7a57df1d03396778
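The AND = "must", OR = "should" rule can be applied mechanically to a nested expression. Here is a minimal Python sketch, assuming the expression is already available as nested ("and", [...]) / ("or", [...]) tuples with query dicts as leaves; it does not parse Lucene syntax, and the to_bool name and the explicit minimum_should_match of 1 are my own additions.

```python
def to_bool(node):
    """Translate a nested and/or expression into an Elasticsearch bool query.

    A node is either a leaf query (a dict) or a tuple (op, children)
    with op being "and" or "or".
    """
    if isinstance(node, dict):
        return node  # leaf: already a query clause
    op, children = node
    clauses = [to_bool(child) for child in children]
    if op == "and":
        return {"bool": {"must": clauses}}
    if op == "or":
        # Require at least one alternative, even inside an outer must.
        return {"bool": {"should": clauses, "minimum_should_match": 1}}
    raise ValueError("unknown operator: %r" % (op,))


# status AND (entityName OR entityNameText), using the terms clauses from above.
expr = ("and", [
    {"terms": {"status": ["a", "i"]}},
    ("or", [
        {"terms": {"entityName": ["john", "lewis"]}},
        {"terms": {"entityNameText": ["john", "lewis"]}},
    ]),
])
print({"query": to_bool(expr)})
```

Apart from the explicit minimum_should_match, the result has the same shape as the combined query above.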

Resources