Fuzzy sentence search in Elasticsearch based on edit distance of words

For a given index I have added documents like:
[
{"expression": "tell me something about elasticsearch"},
{"expression": "this is a new feature for elasticsearch"},
{"expression": "tell me something about kibana"},
# ... and so on
]
Now I want to query Elasticsearch in such a way that, for the given input expression
"tell me something on elasticsearch", it returns:
{"expression": "tell me something about elasticsearch"},
{"expression": "tell me something about kibana"}
since in this case the edit distance with respect to words (not characters) is smaller.
Can we perform such a query in Elasticsearch?
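To make the notion of "edit distance with respect to words" concrete, here is a minimal Python sketch that runs the classic Levenshtein recurrence over whitespace-separated words instead of characters. It is only an illustration of the metric the question describes, not something Elasticsearch computes; the function name `word_edit_distance` and the whitespace tokenization are assumptions made for this example.

```python
def word_edit_distance(a: str, b: str) -> int:
    """Levenshtein distance where the unit of editing is a whole word."""
    x, y = a.split(), b.split()
    # prev[j] holds the distance between the words consumed so far and y[:j]
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        cur = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            cur.append(min(prev[j] + 1,          # delete a word from x
                           cur[j - 1] + 1,       # insert a word from y
                           prev[j - 1] + cost))  # substitute (or keep) a word
        prev = cur
    return prev[-1]


query = "tell me something on elasticsearch"
docs = [
    "tell me something about elasticsearch",
    "this is a new feature for elasticsearch",
    "tell me something about kibana",
]
# Rank the sample documents from the question by word-level distance.
ranked = sorted(docs, key=lambda d: word_edit_distance(query, d))
for d in ranked:
    print(word_edit_distance(query, d), d)
```

With the sample documents from the question, the two "tell me something about ..." expressions come out closest to the input, which is the ordering the question asks for.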

As far as I understand, fuzziness cannot be combined with phrase queries (match_phrase).
Here are a few use cases that may help.
If you want the search to tolerate missing words, use slop with match_phrase rather than fuzziness (this may work for you):
GET demo_index/_search
{
"query": {
"match_phrase": {
"field1": {
"query": "tell me something elasticsearch",
"slop": 1 // increase as required
}
}
}
}
Secondly, if you want the search to tolerate character-level changes, you can use the queries below with fuzziness.
A single search across several fields:
GET index_name/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "enginere", // misspelled on purpose; still matches wherever the field contains "engineer". You can increase fuzziness.
"fields": [
"field_name1",
"field_name2",
...
],
"fuzziness": 1,
"slop": 1 // optional
}
}
]
}
}
}
Separate searches on different fields:
GET index_name/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"field1": {
"query": "text1",
"fuzziness": 1
}
}
},
{
"match": {
"field2": {
"query": "text2",
"fuzziness": 2
}
}
}
],
"filter": [
{
"match": {
"field3": "text3"
}
}
]
}
}
}

Related

Fuzzy match words in any order in Elasticsearch

What I need to achieve is to match documents based on a single field (the product name, which basically contains all possible filter values). I know it is not the most reliable solution, but I only have this one field to work with.
I need to be able to send a search query and have the words in that query matched in any order against the name field (the name should contain all words from the search query). At this point a simple match_phrase_prefix works pretty well, but what is missing is fuzziness: we also need to allow users to make some typos and still get relevant results.
My question is, is there any way to have match_phrase_prefix-like query, but with fuzziness?
I tried some nested bool queries with match, but I don't get anything near match_phrase_prefix this way.
Examples of what I tried:
Pretty good results, but no fuzziness:
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"name.standard": {
"query": "brand thing model",
"slop": 10
}
}
}
]
}
}
}
Fuzziness, but very limited matches:
{
"query": {
"bool": {
"must": [
{
"match": {
"name.standard": {
"query": "thing",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
},
{
"match": {
"name.standard": {
"query": "brand",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
}
]
}
}
}
Using should instead of must above, I get more results, but they are far less relevant than the ones from the first query.
This can be achieved with a simple match query:
{
"query": {
"match": {
"name.standard": {
"query": "brand thing model",
"operator": "and", // all three tokens must be present, in any order
"fuzziness": "AUTO" // or a value of your choice
}
}
}
}

Elasticsearch: alternative to cross_fields with fuzziness

I have an Elasticsearch index with the standard analyzer. I would like to perform search queries containing multiple words, e.g. human anatomy. This search should be performed across several fields:
Title
Subject
Description
All the words in the query should be present in any of the fields (e.g. 'human' in title and 'anatomy' in description, etc.). If not all the words are present across these fields, the result shouldn't be returned.
Now, more importantly, I want to get fuzzy matches. For example, these queries should return approximately the same results as human anatomy:
human anatom
human anatomic
humanic anatomic
etc.
So fuzziness should apply to every word in the query.
As Elasticsearch doesn't support fuzziness for the multi-match cross-fields queries, I have been trying to achieve the desired behaviour this way:
{
"query": {
"bool" : {
"must": [
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "human",
"fuzziness": 2,
}
}
},
{
"match": {
"description": {
"query": "human",
"fuzziness": 2,
}
}
},
{
"match": {
"subject": {
"query": "human",
"fuzziness": 2,
}
}
},
]
}
}
},
{
"query": {
"bool":
{
"should": [
{
"match": {
"title": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
{
"match": {
"description": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
{
"match": {
"subject": {
"query": "anatomy",
"fuzziness": 2,
}
}
},
]
}
}
},
]
}
}
}
The idea behind this code is the following: find the results where
any of the fields contains human (within an edit distance of 2 characters, e.g.: humane, humon, humanic, etc.)
and
any of the fields contains anatomy (within an edit distance of 2 characters, e.g.: anatom, anatomic, etc.).
Unfortunately, this code does not work and fails to retrieve a great number of relevant results. For example (the edit distance between corresponding words in the two queries is at most 2):
human anatomic – 0 results
humans anatomy – 21 results
How can I make fuzziness work within the given conditions? Recreating the index with n-gram is currently not an option, so I would like to make fuzziness work.
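The idea described in the question (every query word must fuzzily match at least one of the fields) can be generated programmatically rather than written out by hand. The Python sketch below only builds the request body as a dictionary; it is not a confirmed fix for the missing results. The helper name `fuzzy_all_words_query` is hypothetical, and the sketch assumes an Elasticsearch version in which bool clauses hold query objects directly, so the extra "query" wrappers from the hand-written body are omitted.

```python
import json
from typing import Dict, List, Union


def fuzzy_all_words_query(text: str,
                          fields: List[str],
                          fuzziness: Union[int, str] = "AUTO") -> Dict:
    """Build a bool query: each word must fuzzily match in at least one field."""
    must = []
    for word in text.split():
        # One fuzzy match clause per field; any single field is enough.
        should = [
            {"match": {field: {"query": word, "fuzziness": fuzziness}}}
            for field in fields
        ]
        must.append({"bool": {"should": should, "minimum_should_match": 1}})
    return {"query": {"bool": {"must": must}}}


body = fuzzy_all_words_query("human anatomy",
                             ["title", "subject", "description"],
                             fuzziness=2)
print(json.dumps(body, indent=2))
```

Generating the body this way keeps the per-word and per-field clauses consistent and avoids the trailing commas that are easy to introduce when editing the JSON by hand.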

prefixQuery in Elastic search not working

{
"from": 0,
"size": 100,
"timeout": "10m",
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{
"bool": {
"filter": [
{
"bool": {
"must": [
{
"term": {
"input.custom_attrs.index": {
"value": "1",
"boost": 1
}
}
}
]
}
},
{
"bool": {
"must": [
{
"prefix": {
"input.custom_attrs.value": {
"value": "An*",
"boost": 1
}
}
}
]
}
}
]
}
}
]
}
}
]
}
}
}
Explanation:
I want to search the field with "An" as the prefix.
I am also sure that there is data with the values "Annual" and "Annual Fund", which should appear in any matching search.
But these records do not appear with the prefix query given above. I also tried regexp and wildcard queries, but they do not work either. Please suggest how to make the query work.
Possible cause:
It looks like the data was indexed with the default mapping or as a text field, which uses the standard analyzer by default. That analyzer lowercases the generated tokens.
Prefix queries, on the other hand, are not analyzed: the search term does not go through any analyzer and is not lowercased.
In your case you are searching for An (note the capital A), while the tokens for Annual and Annual Fund are annual, and annual and fund, respectively, so nothing matches.
Solution:
Use an as the prefix value (lowercase, and without the trailing *, which a prefix query treats as a literal character) and you should get your search results.
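The mismatch can be reproduced outside Elasticsearch with a small Python sketch. The function `standard_like_tokens` is a rough stand-in for the standard analyzer (split on non-word characters, then lowercase) and is an assumption made for illustration; the real analyzer follows Unicode text segmentation rules. The prefix is compared as typed, mirroring the fact that a prefix query is not analyzed.

```python
import re
from typing import List


def standard_like_tokens(text: str) -> List[str]:
    """Rough approximation of the standard analyzer: word split + lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def prefix_matches(indexed_text: str, prefix: str) -> bool:
    """Compare the raw, unanalyzed prefix against the indexed tokens."""
    return any(tok.startswith(prefix) for tok in standard_like_tokens(indexed_text))


for prefix in ["An*", "An", "an"]:
    print(repr(prefix), prefix_matches("Annual Fund", prefix))
```

Only the lowercase prefix without the asterisk lines up with the indexed tokens annual and fund.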

Is there any option to minimize this elastic search must not match query?

I'm trying to exclude some text values of a field, and for that I have used a must_not condition. However, it is static and takes many lines. Please let me know if there is another way to shorten this query.
Here is the query:
"must_not": [
{
"match": {
"field.keyword": "welcome"
}
},
{
"match": {
"field.keyword": "Welcome"
}
},
{
"match": {
"field.keyword": "entry_point"
}
},
{
"match": {
"field.keyword": "Entry point"
}
}
]
Thanks,
If the search text is the same, you can use multi_match, which searches for the text in multiple fields:
"bool": {
"must_not": [
{
"multi_match": {
"query": "text",
"fields": ["field1.keyword","field2.keyword"]
}
}
]
}
If the field is the same and the texts are different, you can use a terms query:
"must_not": [
{
"terms": {
"field.keyword": [
"VALUE1",
"VALUE2"
]
}
}
]
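As a minimal sketch of the terms approach applied to the four values from the question, the Python snippet below builds a single must_not clause from a list of values. The helper name `exclude_values` is hypothetical, and the snippet only constructs the request fragment; it does not talk to a cluster.

```python
import json
from typing import Dict, Iterable


def exclude_values(field: str, values: Iterable[str]) -> Dict:
    """Build a bool/must_not clause with one terms query for all values."""
    # dict.fromkeys removes duplicates while keeping the original order.
    unique = list(dict.fromkeys(values))
    return {"bool": {"must_not": [{"terms": {field: unique}}]}}


clause = exclude_values(
    "field.keyword",
    ["welcome", "Welcome", "entry_point", "Entry point"],
)
print(json.dumps(clause, indent=2))
```

Keeping the excluded values in a list means new ones can be added without repeating a match block for each.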
If both the fields and the texts are different, you will have to use the query in your question.
As you said you are not looking for an exact match, I would just use query_string for single words and match_phrase for phrases.
"must_not": [
{
"query_string": {
"query": "welcome OR Welcome"
}
},
{
"match_phrase": {
"title": {
"query": "entry point",
}
}
}
]
I'm not sure which analyzer you use, but if it lowercases and keeps only alphanumeric characters, for example, you won't need "duplicate" queries like "welcome" and "Welcome".

How to implement 'Starts with' search in elasticsearch 2.x

I have a requirement where I need to return only those records whose comments do not start with a given string. The query below does not work with this approach. I need help.
{
"size": 0,
"fields": ["id","comment"],
"query": {
"bool": {
"must_not": [
{
"wildcard": {
"comment":
"AG//*"
}
}
]
}
}
}
First, you should remove the "size": 0 from your query (or set the required size) to see the results.
Now, the best way to implement 'Starts with' in elasticsearch is by using the Prefix Query as follows:
{
"fields": ["id", "comment"],
"query": {
"bool": {
"must_not": [
{
"prefix": {
"comment": "AG" // no wildcards needed
}
}
]
}
}
}
Note: The Prefix Query and the Wildcard Query make sense only on not_analyzed fields, so make sure your "comment" field is mapped that way.

Resources