Optimize the time of an Elasticsearch query that uses fuzziness

The search query below takes around 2356 ms to fetch 50 records.
Fuzziness makes the search slower. How can I improve performance while still using fuzziness?
(Highlighting cannot be dropped.)
{
"from": 0,
"size": 50,
"query": {
"bool": {
"must": {
"multi_match": {
"query": "shall have the right",
"fields": [
"subType",
"title",
"type",
"content"
],
"fuzziness": "AUTO",
"minimum_should_match": "80%"
}
},
"should": {
"multi_match": {
"query": "shall have the right",
"fields": [
"subType",
"title",
"type",
"content"
],
"type": "phrase",
"slop": 1
}
}
}
},
"aggregations": {
"agg_example": {
"terms": {
"field": "type.keyword"
}
}
},
"highlight": {
"type": "unified",
"fields": {
"*": {}
}
}
}
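One direction that is often suggested for slow fuzzy queries, sketched here under assumptions rather than measured against this index: set a non-zero `prefix_length` and a lower `max_expansions` on the fuzzy `multi_match`, and highlight only the searched fields instead of `*`. The helper name `tune_fuzzy_search` and the values 2 and 20 are illustrative and do not come from the original post; they would need benchmarking on real data.

```python
import copy
import json


def tune_fuzzy_search(body, prefix_length=2, max_expansions=20):
    """Return a copy of a search body with cheaper fuzzy and highlight settings.

    Assumes the layout used in the question: a bool query whose "must"
    clause is a single fuzzy multi_match. The default values are
    illustrative only.
    """
    tuned = copy.deepcopy(body)
    fuzzy = tuned["query"]["bool"]["must"]["multi_match"]
    # Require the first characters to match exactly, so fewer candidate
    # terms are generated for each query word.
    fuzzy["prefix_length"] = prefix_length
    # Cap how many terms each fuzzy word may expand to.
    fuzzy["max_expansions"] = max_expansions
    # Highlight only the fields that are searched instead of every field.
    tuned["highlight"]["fields"] = {name: {} for name in fuzzy["fields"]}
    return tuned


search_body = {
    "from": 0,
    "size": 50,
    "query": {
        "bool": {
            "must": {
                "multi_match": {
                    "query": "shall have the right",
                    "fields": ["subType", "title", "type", "content"],
                    "fuzziness": "AUTO",
                    "minimum_should_match": "80%",
                }
            }
        }
    },
    "highlight": {"type": "unified", "fields": {"*": {}}},
}

tuned_body = tune_fuzzy_search(search_body)
print(json.dumps(tuned_body, indent=2))
```

A higher `prefix_length` trades recall for speed: typos in the first characters of a word are no longer tolerated, so the value is a judgment call for the data at hand.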

Related

Has there been any change in the format of using function_score in ES 6.8?

I have the query in the format below, and it runs on ES 2.4:
{"query":{"function_score":{"filter":{"bool":{"must":[{"exists":{"field":"x"}},{"query_string":{"query":"en","fields":["locale"]}},{"query_string":{"query":"US","fields":["channel"]}},{"query_string":{"query":"UG","fields":["usergroups"]}}]}},"query":{"bool":{"should":{"multi_match":{"query":"refund","fields":["doc","key","title","title.standard_analyzed^3","x"],"type":"phrase","slop":20}},"must":{"multi_match":{"fuzziness":"0","query":"refund","prefix_length":"6","fields":["doc","key","title","title.standard_analyzed^3","x"],"max_expansions":"30"}}}},"functions":[{"field_value_factor":{"field":"usage","factor":1,"modifier":"log2p","missing":1}}]}},"from":0,"size":21}
But when I try the same query on 6.8, it returns an error:
{"error":{"root_cause":[{"type":"parsing_exception","reason":"no [query] registered for [function_score]",
If I put the filters inside the query, I get a response, but the order of the documents doesn't match because the scores differ.
Only the "query" key should appear directly under function_score. You have to move the filter into the bool query.
I don't know your mapping, but I would use term queries instead of query_string.
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"exists": {
"field": "x"
}
},
{
"query_string": {
"query": "en",
"fields": [
"locale"
]
}
},
{
"query_string": {
"query": "US",
"fields": [
"channel"
]
}
},
{
"query_string": {
"query": "UG",
"fields": [
"usergroups"
]
}
}
]
}
},
"should": {
"multi_match": {
"query": "refund",
"fields": [
"doc",
"key",
"title",
"title.standard_analyzed^3",
"x"
],
"type": "phrase",
"slop": 20
}
},
"must": {
"multi_match": {
"fuzziness": "0",
"query": "refund",
"prefix_length": "6",
"fields": [
"doc",
"key",
"title",
"title.standard_analyzed^3",
"x"
],
"max_expansions": "30"
}
}
}
},
"functions": [
{
"field_value_factor": {
"field": "usage",
"factor": 1,
"modifier": "log2p",
"missing": 1
}
}
]
}
},
"from": 0,
"size": 21
}
About FunctionScore (doc 6.8)
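The suggestion to use term queries instead of query_string could look like the sketch below. It assumes that `locale`, `channel` and `usergroups` hold exact, unanalyzed values (for example keyword fields); on an analyzed text field a term query for "US" may not match the indexed token. The helper name `query_string_to_term` is hypothetical.

```python
import json


def query_string_to_term(clause):
    """Rewrite a single-field query_string clause as a term clause.

    Assumption: the target field holds exact, unanalyzed values, so the
    query text equals the indexed term. Other clauses pass through.
    """
    query_string = clause.get("query_string")
    if query_string is None or len(query_string.get("fields", [])) != 1:
        return clause
    field = query_string["fields"][0]
    return {"term": {field: query_string["query"]}}


filters = [
    {"exists": {"field": "x"}},
    {"query_string": {"query": "en", "fields": ["locale"]}},
    {"query_string": {"query": "US", "fields": ["channel"]}},
    {"query_string": {"query": "UG", "fields": ["usergroups"]}},
]

term_filters = [query_string_to_term(clause) for clause in filters]
# The rewritten clauses go under bool.filter, which does not affect scoring.
print(json.dumps({"bool": {"filter": term_filters}}, indent=2))
```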

Why do two Elasticsearch nodes return different results?

My Elasticsearch cluster has three nodes, and the target index has 1 shard and 1 replica.
The _cat/shards API returns:
article-test 0 p STARTED 210236 23.8gb 10.0.2.37 node-03
article-test 0 r STARTED 210236 23.8gb 10.0.2.182 node-01
But when I search this index, node-01 returns no data, while node-03 returns the correct results.
I have been struggling with this problem for a long time and can't find a solution on other websites.
Can anyone help?
The query is below.
{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {"term": {"qqq": {"value": true}}},
            {
              "bool": {
                "should": [
                  {
                    "multi_match": {
                      "query": "query",
                      "type": "best_fields",
                      "fields": ["xx^3", "yy", "zz"],
                      "operator": "and",
                      "analyzer": "ik_smart"
                    }
                  },
                  {
                    "multi_match": {
                      "query": "query",
                      "type": "phrase",
                      "fields": ["xx^3", "yy", "zz"],
                      "operator": "and",
                      "analyzer": "ik_smart",
                      "boost": 3
                    }
                  },
                  {"term": {"xx.keyword": {"value": "query"}}},
                  {"term": {"aaa.keyword": {"value": "query"}}}
                ]
              }
            }
          ]
        }
      },
      "functions": [
        {
          "gauss": {
            "aaa": {
              "origin": "2019-12-19",
              "scale": "50d",
              "offset": "90d",
              "decay": 0.4
            }
          }
        },
        {
          "filter": {
            "bool": {
              "must": [
                {"match": {"xxx": true}},
                {"match": {"yyy": 1}}
              ]
            }
          },
          "weight": 5
        }
      ]
    }
  },
  "from": 0,
  "size": 10,
  "_source": ["xx", "yy", "zz"],
  "sort": {"_score": {"order": "desc"}},
  "highlight": {
    "pre_tags": ["<strong>"],
    "post_tags": ["</strong>"],
    "fields": {"xx": {}, "yy": {}, "zz": {}}
  }
}

How to filter based on condition where the result hits come from a certain field in Elasticsearch?

I have an Elasticsearch bool query with multiple should clauses that combine match queries on several fields with different boost values.
Say I have these fields:
productName / currency / country / identifierNumber
I want to filter my results conditionally with this logic:
If the hits of the bool query come from a match query on the field productName, then filter by currency.
If the hits of the bool query come from a match query on the field identifierNumber, then do not filter by currency.
UPDATE
{
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "parameter",
"fields": [
"productName.test^8",
"productName.raw^4",
"_all^2",
"Zone^10",
"category^12",
"class^71"
],
"fuzziness": "1",
"prefix_length": 1
}
},
{
"match": {
"productName.test": {
"query": "parameter",
"operator": "and",
"fuzziness": "1",
"prefix_length": 3,
"boost": 1000
}
}
},
{
"match": {
"productName.raw": {
"query": "parameter",
"operator": "or",
"fuzziness": "AUTO",
"prefix_length": 3,
"boost": 10
}
}
},
{
"match": {
"identifierNumber": {
"query": "parameter",
"boost": 3000
}
}
},
{
"term": {
"tic": {
"value": "parameter",
"boost": 30000
}
}
},
{
"match": {
"_all": {
"query": "parameter",
"operator": "or",
"fuzziness": "1",
"prefix_length": 2,
"boost": 10
}
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "productvalue"
}
}
],
"boost_mode": "multiply"
}
},
"size": 30,
"highlight": {
"fields": {
"*": {}
},
"require_field_match": false
},
"post_filter": {
"bool": {
"must": [
{
"match": {
"countryName": "France"
}
}]
}
}
}
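The conditional logic described above can be expressed as two alternative branches of a bool query, one of which carries its own filter. The following is a minimal sketch, not a tested answer for this mapping: it assumes `currency` is an exact-value field that a term query can match, it uses the plain field names from the description rather than the sub-fields in the posted query, and the function name and the value "EUR" are purely illustrative.

```python
import json


def conditional_currency_query(text, currency):
    """Build a bool query that filters by currency only for productName matches."""
    return {
        "bool": {
            "should": [
                {
                    # Branch 1: productName matches must also have the currency.
                    "bool": {
                        "must": {"match": {"productName": text}},
                        "filter": {"term": {"currency": currency}},
                    }
                },
                # Branch 2: identifierNumber matches are not filtered.
                {"match": {"identifierNumber": text}},
            ],
            # A document must satisfy at least one of the two branches.
            "minimum_should_match": 1,
        }
    }


query = conditional_currency_query("parameter", "EUR")
print(json.dumps(query, indent=2))
```

Because the currency restriction lives inside the productName branch, it cannot be applied through a single top-level post_filter, which would affect both branches equally.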

Combine elasticsearch bool query with range boost

I have a complex bool query as follows. I use a bogus search term dgbdrtgndgfndrtgb to fabricate my example, which should not match anything.
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"boost_mode": "replace",
"query": {
"filtered": {
"filter": {
# ...
},
"query": {
"bool": {
"should": [
{
"match": {
"name.suggest_ngrams": {
"query": "dgbdrtgndgfndrtgb",
"fuzziness": "AUTO",
"prefix_length": 1,
"operator": "AND",
"boost": 10
}
}
},
{
"multi_match": {
"query": "dgbdrtgndgfndrtgb",
"fields": [
"name.untouched_lowercase"
],
"boost": 5
}
},
{
"query_string": {
"fields": [
"name.suggest"
],
"query": "dgbdrtgndgfndrtgb*",
"boost": 10
}
},
{
"query_string": {
"fields": [
"name.suggest"
],
"query": "dgbdrtgndgfndrtgb",
"boost": 10
}
},
{
"match": {
"first_word": {
"query": "dgbdrtgndgfndrtgb",
"operator": "AND",
"boost": 10
}
}
},
{
"match": {
"name": {
"query": "dgbdrtgndgfndrtgb",
"operator": "AND",
"boost": 5
}
}
}
]
}
}
}
}
}
}
}
This works well. Now, for any of those matches, I want to add a boost where the name field has fewer than 2 words. In other words, boost single-word matches or sort them to the top of the result set.
So I tried adding a range boost like this:
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"boost_mode": "replace",
"query": {
"filtered": {
"filter": {
# ...
},
"query": {
"bool": {
"should": [
{
"match": {
"name.suggest_ngrams": {
"query": "dgbdrtgndgfndrtgb",
"fuzziness": "AUTO",
"prefix_length": 1,
"operator": "AND",
"boost": 10
}
}
},
{
"multi_match": {
"query": "dgbdrtgndgfndrtgb",
"fields": [
"name.untouched_lowercase"
],
"boost": 5
}
},
{
"query_string": {
"fields": [
"name.suggest"
],
"query": "dgbdrtgndgfndrtgb*",
"boost": 10
}
},
{
"query_string": {
"fields": [
"name.suggest"
],
"query": "dgbdrtgndgfndrtgb",
"boost": 10
}
},
{
"match": {
"first_word": {
"query": "dgbdrtgndgfndrtgb",
"operator": "AND",
"boost": 10
}
}
},
{
"match": {
"name": {
"query": "dgbdrtgndgfndrtgb",
"operator": "AND",
"boost": 5
}
}
},
{
"range": {
"name.word_count": {
"lt": 2,
"boost": 40
}
}
}
]
}
}
}
}
}
}
}
This sorts things like I want, but it also returns single-word matches which do not match the search term dgbdrtgndgfndrtgb.
Is there a way to only boost single-word matches, which also match the search term? I've tried lowering the boost value, which breaks the desired sorting when using a valid (found) search term.
It seems like there should be a way to AND the entire bool query with the range boost. I've tried various permutations to achieve this with no luck and the docs are less than helpful.
One caveat: I cannot use scripting as the index is hosted on AWS which doesn't support it.
Any advice is appreciated.
After sleeping on the problem, it hit me that it is just Boolean logic. So I came up with this solution, which works perfectly: I wrapped the working query logic in a must clause and put the range boost in a should clause.
{
"from": 0,
"size": 10,
"query": {
"function_score": {
"boost_mode": "replace",
"query": {
"filtered": {
"filter": {
# ...
},
"query": {
"bool": {
"must": {
"bool": {
"should": [
{
"match": {
"name.suggest_ngrams": {
"query": "dgbdrtgndgfndrtgb",
"fuzziness": "AUTO",
"prefix_length": 1,
"operator": "AND",
"boost": 10
}
}
},
{
"multi_match": {
"query": "dgbdrtgndgfndrtgb",
"fields": [
"name.untouched_lowercase"
],
"boost": 5
}
},
{
"query_string": {
"fields": [
"name.suggest"
],
"query": "dgbdrtgndgfndrtgb*",
"boost": 10
}
},
{
"query_string": {
"fields": [
"name.suggest"
],
"query": "dgbdrtgndgfndrtgb",
"boost": 10
}
},
{
"match": {
"first_word": {
"query": "dgbdrtgndgfndrtgb",
"operator": "AND",
"boost": 10
}
}
},
{
"match": {
"name": {
"query": "dgbdrtgndgfndrtgb",
"operator": "AND",
"boost": 5
}
}
}
]
}
},
"should": {
"range": {
"name.word_count": {
"lt": 2,
"boost": 40
}
}
}
}
}
}
}
}
}
}
Yay!
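The pattern in that solution can be written as a small helper, shown here as a sketch: the original should clauses go inside a nested bool under must, so a document has to match at least one of them, while the range clause stays in the outer should and only adds score. The function name `boost_matching_only` is hypothetical, and only two of the original match clauses are repeated to keep the example short.

```python
import json


def boost_matching_only(matching_clauses, boost_clause):
    """Wrap alternatives in must and keep the boost as an optional should."""
    return {
        "bool": {
            # At least one of the original clauses has to match.
            "must": {"bool": {"should": matching_clauses}},
            # Optional clause: raises the score but never adds new hits.
            "should": boost_clause,
        }
    }


term = "dgbdrtgndgfndrtgb"
matching = [
    {"match": {"first_word": {"query": term, "operator": "AND", "boost": 10}}},
    {"match": {"name": {"query": term, "operator": "AND", "boost": 5}}},
]
single_word_boost = {"range": {"name.word_count": {"lt": 2, "boost": 40}}}

combined = boost_matching_only(matching, single_word_boost)
print(json.dumps(combined, indent=2))
```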

Elasticsearch - Filtered query with weighted types

I have inherited an Elasticsearch query that I am trying to modify. The query I have at the moment is:
{
"fields": [
],
"from": 0,
"size": 51,
"query": {
"filtered": {
"query": {
"query_string": {
"fields": [
"data.*"
],
"default_operator": "AND",
"query": "*Search term*"
}
},
"filter": [
{
"terms": {
"type": [
"typeOne",
"typeTwo",
"typeThree"
]
}
}
]
}
}
}
Now what I have been trying to do is boost one of these terms over the other 2 in the results but have not been able to get it to work. I have tried adding a "boost" value but this has oddly given me the opposite effect - it disables any type that is given a boost.
I tried the following as the "filter" object:
"filter": [
{
"bool": {
"should": [
{
"term": {
"type": "typeOne"
}
},
{
"term": {
"type": "typeTwo"
}
},
{
"term": {
"type": "typeThree",
"boost": 2
}
}
]
}
}
]
But as I said before, instead of boosting "typeThree" it removes all "typeThree" from the results.
Can anyone help me boost a specific term type?
There are multiple ways to structure the query to achieve this. One approach is to use function_score. It would look something along these lines.
Example:
"query": {
"function_score": {
"functions": [
{
"filter": {
"term": {
"type": "typeThree"
}
},
"weight": 2
}
],
"score_mode": "sum",
"boost_mode": "sum",
"query": {
"filtered": {
"query": {
"query_string": {
"fields": [
"data.*"
],
"default_operator": "AND",
"query": "*search term*"
}
},
"filter": [
{
"terms": {
"type": [
"typeOne",
"typeTwo",
"typeThree"
]
}
}
]
}
}
}
}
You can enable explain to see how this affects the scoring.
While keety's answer was 98% of the way there, it took a bit of extra googling to get it all together. The problem is that "weight" doesn't work here; you must use "boost_factor" instead. The final query looks like this:
{
"fields": [
],
"from": 0,
"size": 51,
"query": {
"function_score": {
"functions": [
{
"filter": {
"term": {
"type": "typeOne"
}
},
"boost_factor": 1.2
},
{
"filter": {
"term": {
"type": "typeTwo"
}
},
"boost_factor": 1.1
},
{
"filter": {
"term": {
"type": "typeThree"
}
},
"boost_factor": 1
}
],
"score_mode": "sum",
"boost_mode": "sum",
"query": {
"filtered": {
"query": {
"query_string": {
"fields": [
"data.*"
],
"default_operator": "AND",
"query": "*search term*"
}
},
"filter": [
{
"terms": {
"type": [
"typeOne",
"typeTwo",
"typeThree"
]
}
}
]
}
}
}
}
}
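The per-type functions in the final query follow one pattern, so they can be generated from a mapping of type name to factor. This is a sketch only: `type_boost_functions` and its `factor_key` parameter are hypothetical names, and `factor_key` exists just to switch between the "weight" key from the first answer and the "boost_factor" key that worked here. Which key a cluster accepts depends on its Elasticsearch version.

```python
import json


def type_boost_functions(factors, factor_key="weight"):
    """Build one function_score function per document type.

    factors maps a type name to its factor; factor_key names the JSON
    key that carries the factor ("weight" or "boost_factor").
    """
    return [
        {"filter": {"term": {"type": type_name}}, factor_key: factor}
        for type_name, factor in factors.items()
    ]


functions = type_boost_functions(
    {"typeOne": 1.2, "typeTwo": 1.1, "typeThree": 1},
    factor_key="boost_factor",
)
print(json.dumps(functions, indent=2))
```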
