How to add fuzziness to search as you type field in Elasticsearch? - elasticsearch

I've been trying to add some fuzziness to my search_as_you_type field in Elasticsearch, but I never got the query I needed. Does anyone have an idea how to implement this?

A fuzzy query returns documents that contain terms similar to the search term, as measured by Levenshtein edit distance.
The fuzziness parameter can be specified as:
AUTO -- generates an edit distance based on the length of the term.
For lengths:
0..2 -- must match exactly
3..5 -- one edit allowed
Greater than 5 -- two edits allowed
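The length rule above can be sketched in a few lines of Python. This is only an illustration of the thresholds listed here, not Elasticsearch's own code; the function name `auto_fuzziness` and its `low`/`high` parameters are made up for this example.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Return the maximum edit distance AUTO would allow for a term.

    Terms shorter than `low` must match exactly, terms shorter than
    `high` may differ by one edit, and longer terms by two edits.
    """
    length = len(term)
    if length < low:
        return 0
    if length < high:
        return 1
    return 2


for term in ("ab", "prodc", "product"):
    print(term, auto_fuzziness(term))
```

With the default thresholds, a five-character term such as "prodc" gets one edit and a seven-character term such as "product" gets two.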
Here is a working example with index data and a search query.
Index Data:
{
"title":"product"
}
{
"title":"prodct"
}
Search Query:
{
"query": {
"fuzzy": {
"title": {
"value": "prodc",
"fuzziness":2,
"transpositions":true,
"boost": 5
}
}
}
}
Search Result:
"hits": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_score": 2.0794415,
"_source": {
"title": "product"
}
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_score": 2.0794415,
"_source": {
"title": "produt"
}
}
]
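To see why a fuzziness of 2 is enough for the search term "prodc" to reach these titles, it can help to compute the edit distances by hand. The following is a minimal sketch of plain Levenshtein distance (the helper name `levenshtein` is my own). Note that with "transpositions": true Elasticsearch also counts swapping two adjacent characters as a single edit, which this simple version does not model.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep if equal)
                )
            )
        previous = current
    return previous[-1]


for title in ("product", "prodct"):
    print(title, levenshtein("prodc", title))
```

Both distances are within 2, so both documents are returned by the fuzzy query.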
Refer to these blogs for a detailed explanation of the fuzzy query:
https://www.elastic.co/blog/found-fuzzy-search
https://qbox.io/blog/elasticsearch-optimization-fuzziness-performance
Update 1:
Refer to the official ES documentation, which states:
The fuzziness , prefix_length , max_expansions , rewrite , and
fuzzy_transpositions parameters are supported for the terms that are
used to construct term queries, but do not have an effect on the
prefix query constructed from the final term.
There are some open issues and discussion threads stating that fuzziness does not work with a bool_prefix multi_match (search-as-you-type):
https://github.com/elastic/elasticsearch/issues/56229
https://discuss.elastic.co/t/fuzziness-not-work-with-bool-prefix-multi-match-search-as-you-type/229602/3

I know this question was asked long ago, but the following worked for me.
Since Elasticsearch allows a single field to be indexed in multiple ways (multi-fields), my mapping looks like this:
PUT products
{
"mappings": {
"properties": {
"title": {
"type": "text",
"fields": {
"product_type": {
"type": "search_as_you_type"
}
}
}
}
}
}
After adding some data to the index, I queried it like this:
GET products/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "prodc",
"type": "bool_prefix",
"fields": [
"title.product_type",
"title.product_type._2gram",
"title.product_type._3gram"
]
}
},
{
"multi_match": {
"query": "prodc",
"fuzziness": 2
}
}
]
}
}
}
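If the request body is assembled in application code, the same shape can be produced by a small helper. This is a sketch only: the function name `build_typeahead_query` and the optional `fuzzy_fields` argument are my own additions, and the field names are taken from the mapping above.

```python
import json
from typing import Optional


def build_typeahead_query(
    text: str,
    field: str = "title.product_type",
    fuzziness: int = 2,
    fuzzy_fields: Optional[list] = None,
) -> dict:
    """Combine a bool_prefix multi_match with a fuzzy multi_match in one bool/should."""
    prefix_clause = {
        "multi_match": {
            "query": text,
            "type": "bool_prefix",
            "fields": [field, f"{field}._2gram", f"{field}._3gram"],
        }
    }
    fuzzy_clause = {"multi_match": {"query": text, "fuzziness": fuzziness}}
    if fuzzy_fields:
        # Restrict the fuzzy clause to explicit fields instead of the index default.
        fuzzy_clause["multi_match"]["fields"] = list(fuzzy_fields)
    return {"query": {"bool": {"should": [prefix_clause, fuzzy_clause]}}}


print(json.dumps(build_typeahead_query("prodc"), indent=2))
```

The prefix clause handles the partially typed last word, while the fuzzy clause tolerates typos in complete words; a document matching either one is returned.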

Related

Elastic Search Query Priority

If we search for "Padma Priya" and there is an exact match, we need to show that result first. What actually happens is that "Padma" posts are shown first, purely because of keyword density and weighting.
If there is no "Padma Priya" match, we want to show the "Padma" results and the "Priya" results according to keyword density and weighting.
If "Padma Priya" is found in both the text and the URL, we need to give the highest priority to the URL, then the title, and then the page content.
This is my query:
{
searchBody = {
"from" : 0,
"size" : size,
"query": {
"bool": {
"should" : [
{
"match": {
"location": {
"query": q,
// "boost": 5
}
}
},
{
"match": {
"title": {
"query": q,
"boost": 5
}
}
}
,
{
"match": {
"description": {
"query": q
}
}
}
]
}
}
};
}
As the first example in the ES boost documentation shows, you can give more boost/priority to the title field than to the page content, which is a very common use case.
{
"mappings": {
"properties": {
"title": {
"type": "text",
"boost": 2 ---> note this same text will get twice priority/boost if found in title field.
},
"content": {
"type": "text"
}
}
}
}
For your first use case (searching for "Padma Priya" and showing an exact match first), you need to combine a phrase query with your existing query so that the exact match comes out on top.
The example below should make the concept clear.
Index a sample doc (the index will be created automatically):
{
"title" : "Padma is the author of post",
"content" : "If we search for Padma and is there any exact match then we need to show that result first but what happening is It's showing Padma posts first only because of keyword density and weightage."
}
Index another doc which has Padma Priya as a phrase:
{
"title" : "Padma Priya",
"content" : "If we search for Padma and is there any exact match then we need to show that result first but what happening is It's showing Padma posts first only because of keyword density and weightage."
}
Search query
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "Padma Priya",
"fields": [
"title^3", --> this will give 3X priority of title field
"content"
]
}
},
{
"match_phrase": {
"title": "Padma Priya"
}
}
]
}
}
}
And search result
"hits": [
{
"_index": "indexboost",
"_type": "_doc",
"_id": "1",
"_score": 1.8336343, // note high score for exact match
"_source": {
"title": "Padma Priya",
"content": "If we search for Padma and is there any exact match then we need to show that result first but what happening is It's showing Padma posts first only because of keyword density and weightage."
}
},
{
"_index": "indexboost",
"_type": "_doc",
"_id": "2",
"_score": 0.9081677,
"_source": {
"title": "Padma is the author of post",
"content": "If we search for Padma and is there any exact match then we need to show that result first but what happening is It's showing Padma posts first only because of keyword density and weightage."
}
}
]
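The same idea can be extended to the URL, title, content priority from the question. The sketch below is an assumption-laden illustration: the field name `url`, the boost values, and the helper name `build_priority_query` are all hypothetical. It uses the `phrase` type of multi_match so the phrase clause covers every field.

```python
import json


def build_priority_query(text: str, boosts: dict) -> dict:
    """Build a bool/should query: per-field boosts plus an extra phrase clause.

    `boosts` maps field name -> boost, listed from highest to lowest priority.
    """
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {
        "query": {
            "bool": {
                "should": [
                    # Any-word match, weighted per field.
                    {"multi_match": {"query": text, "fields": fields}},
                    # Documents containing the exact phrase score additionally.
                    {"multi_match": {"query": text, "type": "phrase", "fields": fields}},
                ]
            }
        }
    }


# Hypothetical field names and boosts: URL first, then title, then content.
body = build_priority_query("Padma Priya", {"url": 5, "title": 3, "content": 1})
print(json.dumps(body, indent=2))
```

Because both clauses sit under should, a document that only contains "Padma" still matches, but one containing the full phrase collects score from both clauses and ranks higher.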

What is difference between match and bool must match query in Elasticsearch

What is the difference between a plain match query and a bool must match query in ES?
First, using only the match query:
{
"query":{
"match":{
"address":"mill"
}
}
}
Second, using a compound query:
{
"query": {
"bool": {
"must": [
{ "match": { "address": "mill" } }
]
}
}
}
Can you explain this in detail?
What is the difference between them?
When you use only one match inside a bool must clause, there is no difference. The bool query is useful when you want to combine multiple (boolean) criteria; see the official ES documentation for more information. It supports the following clauses:
must
must_not
filter
should
Let me show by taking a small example from your question.
Index mapping with just address and first_name
{
"mappings": {
"properties": {
"address": {
"type": "text"
},
"first_name" :{
"type" : "text"
}
}
}
}
Index 3 docs, all having same address mill, but different first_name
{
"address" : "mill",
"first_name" : "Johnson"
}
{
"address" : "mill",
"first_name" : "Parker"
}
{
"address" : "mill",
"first_name" : "opster"
}
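Search query to show all addresses matching mill, excluding documents whose first_name is parker. Note that must_not is a sibling of must inside bool, not an entry in the must array:
{
"query": {
"bool": {
"must": [
{
"match": {
"address": "mill"
}
}
],
"must_not": [
{
"match": {
"first_name": "parker"
}
}
]
}
}
}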
The result contains only 2 addresses:
"hits": [
{
"_index": "so-60620921-bool",
"_type": "_doc",
"_id": "2",
"_score": 0.13353139,
"_source": {
"address": "mill",
"first_name": "opster"
}
},
{
"_index": "so-60620921-bool",
"_type": "_doc",
"_id": "3",
"_score": 0.13353139,
"_source": {
"address": "mill",
"first_name": "Johnson"
}
}
]
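To make the boolean logic concrete, here is a toy in-memory model of what must and must_not do to the three sample documents. It is only an illustration of the semantics, assuming simple lowercase whitespace tokenization; it says nothing about how Elasticsearch executes or scores the query, and all names in it are my own.

```python
docs = [
    {"address": "mill", "first_name": "Johnson"},
    {"address": "mill", "first_name": "Parker"},
    {"address": "mill", "first_name": "opster"},
]


def matches(doc: dict, field: str, term: str) -> bool:
    """Rough stand-in for a match query: case-insensitive token lookup."""
    return term.lower() in doc[field].lower().split()


def bool_search(documents: list, must: list, must_not: list) -> list:
    """Keep documents that satisfy every must pair and none of the must_not pairs."""
    return [
        doc
        for doc in documents
        if all(matches(doc, field, term) for field, term in must)
        and not any(matches(doc, field, term) for field, term in must_not)
    ]


result = bool_search(
    docs,
    must=[("address", "mill")],
    must_not=[("first_name", "parker")],
)
print([doc["first_name"] for doc in result])
```

With an empty must_not list, the function returns all three documents, which mirrors the point that a single match and a bool with a single must behave the same.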
Based on the OP's comments, see the documentation on query and filter context to understand the performance aspects in detail.
As written in your question, they will perform the same action.
The match query is a very straightforward full-text condition.
The bool query lets you combine multiple fields and multiple conditions, for example an exists query (to check that a certain field is present in the documents), should clauses (an OR equivalent) and must_not clauses (a NOT equivalent).
Taking your example again: since the bool query has only a single must clause with a match condition, it returns exactly the documents whose address field contains the value mill.
Hope this is helpful! :)

Elasticsearch - pass fuzziness parameter in query_string

I have a fuzzy query with customized AUTO:10,20 fuzziness value.
{
"query": {
"match": {
"name": {
"query": "nike",
"fuzziness": "AUTO:10,20"
}
}
}
}
How to convert it to a query_string query? I tried nike~AUTO:10,20 but it is not working.
It's possible with query_string as well. Let me show this using the same example the OP provided: both the OP's match query and the query_string query fetch the same document with the same score.
According to the ES docs, Elasticsearch supports the AUTO:10,20 format, which my example uses as well.
Index mapping
{
"mappings": {
"properties": {
"name": {
"type": "text"
}
}
}
}
Index some doc
{
"name" : "nike"
}
Search query using match with fuzziness
{
"query": {
"match": {
"name": {
"query": "nike",
"fuzziness": "AUTO:10,20"
}
}
}
}
And result
"hits": [
{
"_index": "so-query",
"_type": "_doc",
"_id": "1",
"_score": 0.9808292,
"_source": {
"name": "nike"
}
}
]
Query_string with fuzziness
{
"query": {
"query_string": {
"fields": ["name"],
"query": "nike",
"fuzziness": "AUTO:10,20"
}
}
}
And result
"hits": [
{
"_index": "so-query",
"_type": "_doc",
"_id": "1",
"_score": 0.9808292,
"_source": {
"name": "nike"
}
}
]
Lucene syntax only allows you to specify "fuzziness" with the tilde symbol "~", optionally followed by 0, 1 or 2 to indicate the edit distance.
Elasticsearch Query DSL supports a configurable special value for AUTO which then is used to build the proper Lucene query.
You would need to implement that logic on the application side: evaluate the desired edit distance based on the length of your search term, and then use <searchTerm>~<editDistance> in your query_string query.
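A minimal sketch of that application-side logic might look like the following. The function names are my own, and the thresholds mirror the AUTO:low,high semantics described earlier (shorter than low: exact, shorter than high: one edit, otherwise two). It does not escape characters that are reserved in query_string syntax, so real input would need that handled separately.

```python
def auto_edit_distance(term: str, low: int, high: int) -> int:
    """Edit distance that AUTO:low,high would pick for a term of this length."""
    if len(term) < low:
        return 0
    if len(term) < high:
        return 1
    return 2


def to_fuzzy_query_string(text: str, low: int = 3, high: int = 6) -> str:
    """Append ~<distance> to each whitespace-separated term that allows edits."""
    parts = []
    for term in text.split():
        distance = auto_edit_distance(term, low, high)
        parts.append(f"{term}~{distance}" if distance else term)
    return " ".join(parts)


body = {
    "query": {
        "query_string": {
            "fields": ["name"],
            "query": to_fuzzy_query_string("nike", low=10, high=20),
        }
    }
}
print(body["query"]["query_string"]["query"])
```

With AUTO:10,20 the four-letter term "nike" is below the lower threshold, so it is left as an exact term; a term of 10 to 19 characters would get ~1 and anything longer ~2.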

Why is Elasticsearch filter showing all records?

I am using Elasticsearch 5.5 and trying to run a filter query on some metrics data. For example:
{
"_index": "zabbix_test-us-east-2-node2-2017.10.29",
"_type": "jmx",
"_id": "AV9lcbNtvbkfeNFaDYH2",
"_score": 0.00015684571,
"_source": {
"metric_value_number": 95721248,
"path": "/home/ubuntu/etc_logstash/jmx/zabbix_test",
"@timestamp": "2017-10-29T00:04:31.014Z",
"@version": "1",
"host": "18.221.245.150",
"index": "zabbix_test-us-east-2-node2",
"metric_path": "zabbix_test-us-east-2-node2.Memory.NonHeapMemoryUsage.used",
"type": "jmx"
}
},
{
"_index": "zabbix_test-us-east-2-node2-2017.10.29",
"_type": "jmx",
"_id": "AV9lcbNtvbkfeNFaDYIU",
"_score": 0.00015684571,
"_source": {
"metric_value_number": 0,
"path": "/home/ubuntu/etc_logstash/jmx/zabbix_test",
"@timestamp": "2017-10-29T00:04:31.030Z",
"@version": "1",
"host": "18.221.245.150",
"index": "zabbix_test-us-east-2-node2",
"metric_path": "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count",
"type": "jmx"
}
}
I am running the following query:
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
"query": {
"bool": {
"must": {
"match": {
"metric_path" : "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"
}
}
}
}
}
Even so, it displays all records. However, if I use the following text, it works and shows only exact matches:
GET /zabbix_test-us-east-2-node2-2017.10.29/jmx/_search
{
"query": {
"bool": {
"must": {
"match": {
"metric_path" : "zabbix_test-us-east-2-node2.Memory.NonHeapMemoryUsage.used"
}
}
}
}
}
Can anyone please tell me what I am doing wrong here?
Thanks.
You didn't mention anything about mappings, so I suppose you're using dynamic mapping, i.e. you simply indexed documents like these two into Elasticsearch.
Once you visit
{yourhost}/zabbix_test-us-east-2-node2-2017.10.29/_mapping
you will see that the metric_path field probably has type text, which is the default for strings. As the documentation states:
A field to index full-text values, such as the body of an email or the description of a product. These fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed
So your field is processed by an analyzer, and your match query is not executed against the literal string zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count but against its analyzed form, probably split on hyphens, periods and some other special characters. Because match combines the resulting terms with OR by default, any document that shares even one of those terms (such as zabbix_test) matches.
So if you want to filter the way you posted, you should define the mapping explicitly before indexing any documents. You don't have to do it for every property, but at least metric_path should be defined as keyword. You can start with:
PUT {yourhost}/zabbix_test-us-east-2-node2-2017.10.29
{
"mappings": {
"jmx": {
"properties": {
"metric_path": {
"type": "keyword"
}
}
}
}
}
Then index your documents. The mapping for the other fields will be established dynamically by ES, and both of your queries will return exactly one result, just as you expect.
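Once metric_path is a keyword field, an exact lookup can also be expressed as a term query in filter context, which skips analysis and scoring. The helper below is a small sketch of such a request body; the function name is my own, and it assumes the keyword mapping shown above is in place.

```python
import json


def build_exact_metric_query(metric_path: str) -> dict:
    """Request body that matches documents whose metric_path equals the given string exactly."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"metric_path": metric_path}}
                ]
            }
        }
    }


body = build_exact_metric_query(
    "zabbix_test-us-east-2-node2.ClientRequest.ReadLatency.Count"
)
print(json.dumps(body, indent=2))
```

A term query compares the whole stored value, so it only behaves as an exact match on keyword fields; on an analyzed text field it would be compared against individual tokens instead.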

Elasticsearch: positive boost when term is not present

I'm trying to implement a simple search for products using Elasticsearch.
One of the problems that I'm having is that often search queries have implied terms. For example, consider that when someone types in "lenovo thinkpad battery" they want a battery. However, when someone types in just "lenovo thinkpad" they want a laptop, even though that term doesn't appear in the query.
My solution for this is the following. Manually put together a bunch of related terms. For example, for the computer/laptop category I could have the terms "battery", "keyboard", "power cord", "adapter", "cable", "protection plan" etc. Then, whenever no such term is present in the search query, I positive boost all the results that don't contain those terms.
Is this possible with Elasticsearch?
EDIT:
Example documents
{
"item_title": "lenovo thinkpad white/black"
}
{
"item_title": "lenovo thinkpad battery"
}
Mapping
{
"properties": {
"item_title": {
"type": "string"
}
}
}
Query
POST my_index/my_type/_search
{
"from": 0,
"size": 10,
"query": {
"match": {
"item_title": "lenovo thinkpad"
}
}
}
Query result:
"hits": {
"total": 2,
"max_score": 0.2169777,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 0.2169777,
"_source": {
"item_title": "lenovo thinkpad battery"
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.2169777,
"_source": {
"item_title": "lenovo thinkpad black/white"
}
}
]
}
Notice that the score for these two results is the same. However, since the query "lenovo thinkpad" doesn't contain one of those special terms that I manually picked out, like "battery", I would like documents that don't contain that term to be positive boosted, so that the document with "item_title": "lenovo thinkpad white/black" should have higher score in the query results.
If I execute the following query against my Wikipedia index:
GET /_search
{
"query": {
"query_string": {
"query": "(Darmstadt)^10 (NOT School)^8",
"fields": [
"title^3"
],
"phrase_slop": 3,
"use_dis_max": true
}
}
}
I still get Darmstadt School in the results, but further down the list (it normally comes in the first 10).
If I execute the following query:
GET /_search
{
"query": {
"query_string": {
"query": "(Darmstadt AND SCHOOL )^10 (NOT School)^8",
"fields": [
"title^3"
],
"phrase_slop": 3,
"use_dis_max": true
}
}
}
I get Darmstadt School as the first result, despite it being in the NOT clause.
So I suggest you do something similar.
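For the product case from the question, the same idea can also be written with the structured query DSL instead of query_string. The sketch below is one possible way to do it, not the approach used above: when the user's text contains none of the accessory terms, it adds a should clause that gives a constant extra score to documents that do not mention any of them. The helper name, the boost value, and the simple substring check are assumptions made for this example.

```python
ACCESSORY_TERMS = ["battery", "keyboard", "power cord", "adapter", "cable", "protection plan"]


def build_product_query(
    text: str,
    accessory_terms: list = ACCESSORY_TERMS,
    boost: float = 2.0,
    field: str = "item_title",
) -> dict:
    """Match on the title; boost accessory-free documents if the query names no accessory."""
    query = {"bool": {"must": [{"match": {field: text}}]}}
    lowered = text.lower()
    if not any(term in lowered for term in accessory_terms):
        query["bool"]["should"] = [
            {
                "constant_score": {
                    # Matches documents that contain none of the accessory phrases.
                    "filter": {
                        "bool": {
                            "must_not": [
                                {"match_phrase": {field: term}} for term in accessory_terms
                            ]
                        }
                    },
                    "boost": boost,
                }
            }
        ]
    return {"query": query}


print(build_product_query("lenovo thinkpad"))
print(build_product_query("lenovo thinkpad battery"))
```

For "lenovo thinkpad" the should clause is present, so "lenovo thinkpad white/black" would receive the extra score while "lenovo thinkpad battery" would not; for "lenovo thinkpad battery" no boost clause is added at all.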
