Fuzzy match words in any order in Elasticsearch

What I need to achieve is to match documents based on a single field (the product name, which basically consists of all possible filter values). I know it is not the most reliable solution, but I only have this one field to work with.
I need to be able to send a search query and have the words in that query matched in any order against the name field (the name should contain all words from the search query). At this point a simple match_phrase_prefix works pretty well, but what is missing is fuzziness, because we also need to let the user make some typos and still get relevant results.
My question is: is there any way to have a match_phrase_prefix-like query, but with fuzziness?
I tried some nested bool queries with match, but I don't get anything close to the match_phrase_prefix results this way.
Examples of what I tried:
Pretty good results, but no fuzziness:
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"name.standard": {
"query": "brand thing model",
"slop": 10
}
}
}
]
}
}
}
Fuzziness, but very limited matches:
{
"query": {
"bool": {
"must": [
{
"match": {
"name.standard": {
"query": "thing",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
},
{
"match": {
"name.standard": {
"query": "brand",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
}
]
}
}
}
Using should instead of must above, I get more results, but they are far less relevant than the ones from the first query.

The above can be achieved with a simple match query:
{
"query": {
"match": {
"name.standard": {
"query": "brand thing model",
"operator": "and", // all three tokens must be present, in any order
"fuzziness": "AUTO" // or another value of your choice
}
}
}
}
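For readers generating this request from code, the match query above can be assembled as a plain dictionary. This is a minimal sketch, not part of the original answer: the helper name fuzzy_all_words_query is hypothetical, and the optional prefix_length argument is borrowed from the question's own attempt.

```python
import json


def fuzzy_all_words_query(field, text, fuzziness="AUTO", prefix_length=0):
    """Build a match query body in which every word of `text` must match.

    `fuzzy_all_words_query` is a hypothetical helper name; the body it
    returns mirrors the match query shown in the answer above.
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": "and",  # all words required, order ignored
                    "fuzziness": fuzziness,
                    "prefix_length": prefix_length,
                }
            }
        }
    }


body = fuzzy_all_words_query("name.standard", "brand thing model")
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the request body of a _search call with whatever HTTP or Elasticsearch client is already in use.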

Related

Fuzzy sentence search in elasticsearch based on edit distance of words

For a given index I have added documents like:
[
{"expression": "tell me something about elasticsearch"},
{"expression": "this is a new feature for elasticsearch"},
{"expression": "tell me something about kibana"},
# ... and so on
]
Now, I want to query Elasticsearch in such a way that for the given input expression
"tell me something on elasticsearch" it returns:
{"expression": "tell me something about elasticsearch"},
{"expression": "tell me something about kibana"}
since in this case the edit distance with respect to words (not at the character level) is smallest.
Can we perform such a query in Elasticsearch?
As per my understanding, fuzziness cannot be used with phrase queries (match_phrase).
But let me share a few use cases; try them and see if they are helpful.
If you want to search while ignoring missing words, use slop with match_phrase rather than fuzziness (this may work for you):
GET demo_index/_search
{
"query": {
"match_phrase": {
"field1": {
"query": "tell me something elasticsearch",
"slop": 1 // you can increase it as per your requirement
}
}
}
}
Secondly, if you want to search with character-level changes, you can use the queries below with fuzziness.
Single search on different fields:
GET index_name/_search
{
"query": {
"bool": {
"should": [
{
"multi_match": {
"query": "enginere", // misspelled on purpose; still matches documents containing engineer. You can increase fuzziness if needed.
"fields": [
"field_name1",
"field_name2",
...
],
"fuzziness": 1,
"slop": 1 // optional
}
}
]
}
}
}
Multiple searches on different fields:
GET index_name/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"field1": {
"query": "text1",
"fuzziness": 1
}
}
},
{
"match": {
"field2": {
"query": "text2",
"fuzziness": 2
}
}
}
],
"filter": [
{
"match": {
"field3": "text3"
}
}
]
}
}
}
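The word-level edit distance the question asks about can be made concrete outside Elasticsearch. The sketch below is an assumption-laden illustration, not an Elasticsearch feature: it computes a Levenshtein distance over whitespace-separated words and could be used client-side to re-rank candidates returned by one of the queries above. The function name word_edit_distance is hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance where the unit of editing is a whole word.

    Hypothetical client-side helper: Elasticsearch fuzziness works on
    characters within a term, not on words within a sentence.
    """
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        curr = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute a word
        prev = curr
    return prev[-1]


query = "tell me something on elasticsearch"
docs = [
    "tell me something about elasticsearch",
    "this is a new feature for elasticsearch",
    "tell me something about kibana",
]
# Smallest word-level distance first.
ranked = sorted(docs, key=lambda d: word_edit_distance(query, d))
for d in ranked:
    print(word_edit_distance(query, d), d)
```

With the documents from the question, the two "tell me something about ..." expressions rank ahead of the unrelated one, which is the ordering the question expects.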

How to make a match query on an array field more accurate

example:
here is a document:
{
"_source": {
"name": [
"beef soup",
"chicken rice"
]
}
}
It is matched by the query below:
{
"match": {
"name": {
"query": "soup chicken noodle",
"minimum_should_match": "67%"
}
}
}
But I only want it to be matched by keywords such as hot beef soup or rice chicken hainan. Is there any way other than a nested or span query to do this? Thanks.
My ES query is complex; does anyone know how to rewrite it as a span query?
{
"query": {
"bool": {
"filter": [
...
],
"must": {
"dis_max": {
"queries": [
{
"match": {
"array_field_3": {
"boost": 2,
"minimum_should_match": "67%",
"query": "keyword aa bb"
}
}
},
......
{
"nested": {
"path": "path_1",
"query": {
"bool": {
"must": {
"match": {
"array_field_6": {
......
"query": "keyword aa bb"
}
}
}
}
}
}
}
],
"tie_breaker": 0.15
}
}
}
}
}
You can use match_phrase, but it only works for an entire phrase. If you want a keyword match against each element of the array individually, that is not possible without nested or span queries, as mentioned in the documentation:
Arrays of objects do not work as you would expect: you cannot query
each object independently of the other objects in the array. If you
need to be able to do this then you should use the nested data type
instead of the object data type.
When you get a document back from Elasticsearch, any arrays will be in the same order as when you indexed the document. The _source field that you get back contains exactly the same JSON document that you indexed.
However, arrays are indexed — made searchable — as multi-value fields, which are unordered. At search time you can’t refer to “the first element” or “the last element”.
Please try a match_phrase query:
POST index1/_search
{
"query": {
"match_phrase": {
"text": {
"query": "chicken soup"
}
}
}
}
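Why the original match query hits the document follows from the flattening of arrays described in the quoted documentation. The following is a simplified simulation, not Elasticsearch code: it assumes whitespace tokenization and lowercasing, and it rounds the minimum_should_match percentage down, as the Elasticsearch documentation describes. All function names are hypothetical.

```python
def required_terms(n_terms, percent):
    # The count derived from a percentage is rounded down.
    return (n_terms * percent) // 100


def matches_flattened(values, query, percent=67):
    """Simulate a match query on an array field: all elements share one term pool."""
    indexed = {tok for v in values for tok in v.lower().split()}
    terms = query.lower().split()
    hits = sum(1 for t in terms if t in indexed)
    return hits >= required_terms(len(terms), percent)


def matches_per_element(values, query, percent=67):
    """The behaviour the question wants: each array element is checked on its own."""
    terms = query.lower().split()
    need = required_terms(len(terms), percent)
    return any(
        sum(1 for t in terms if t in set(v.lower().split())) >= need
        for v in values
    )


names = ["beef soup", "chicken rice"]
print(matches_flattened(names, "soup chicken noodle"))    # terms pooled across elements
print(matches_per_element(names, "soup chicken noodle"))  # no single element qualifies
print(matches_per_element(names, "hot beef soup"))
print(matches_per_element(names, "rice chicken hainan"))
```

Per-element matching is what nested documents (or span queries) provide; the flattened version is what a plain match query on a multi-value field sees.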

Elasticsearch: alternative to cross_fields with fuzziness

I have an Elasticsearch index with the standard analyzer. I would like to perform search queries containing multiple words, e.g. human anatomy. This search should be performed across several fields:
Title
Subject
Description
Every word in the query should be present in at least one of the fields (e.g. 'human' in title and 'anatomy' in description). If not all the words are present across these fields, the result shouldn't be returned.
Now, more importantly, I want to get fuzzy matches. For example, these queries should return approximately the same results as human anatomy:
human anatom
human anatomic
humanic anatomic
etc.
So fuzziness should apply to every word in the query.
As Elasticsearch doesn't support fuzziness for multi_match cross_fields queries, I have been trying to achieve the desired behaviour this way:
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match": {
"title": {
"query": "human",
"fuzziness": 2
}
}
},
{
"match": {
"description": {
"query": "human",
"fuzziness": 2
}
}
},
{
"match": {
"subject": {
"query": "human",
"fuzziness": 2
}
}
}
]
}
},
{
"bool": {
"should": [
{
"match": {
"title": {
"query": "anatomy",
"fuzziness": 2
}
}
},
{
"match": {
"description": {
"query": "anatomy",
"fuzziness": 2
}
}
},
{
"match": {
"subject": {
"query": "anatomy",
"fuzziness": 2
}
}
}
]
}
}
]
}
}
}
The idea behind this code is the following: find the results where
any of the fields contains human (within an edit distance of 2, e.g. humane, humon, humanic, etc.)
and
any of the fields contains anatomy (within an edit distance of 2, e.g. anatom, anatomic, etc.).
Unfortunately, this code does not work and fails to retrieve a great number of relevant results. For example (the edit distance between the corresponding words in the two queries is <= 2):
human anatomic – 0 results
humans anatomy – 21 results
How can I make fuzziness work within the given conditions? Recreating the index with n-gram is currently not an option, so I would like to make fuzziness work.
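The structure described above (every word must fuzzily match at least one field) does not have to be written out by hand for each word. A minimal sketch of generating it, assuming whitespace-separated query words; the helper name all_words_any_field is hypothetical, and the explicit minimum_should_match of 1 is an addition that only spells out the intent. This reproduces the same logic as the hand-written query and is not claimed to fix the missing results.

```python
import json


def all_words_any_field(text, fields, fuzziness=2):
    """Build a bool query: each word must fuzzily match in at least one field.

    Hypothetical helper; it generates one should-group per query word and
    requires all groups via `must`.
    """
    per_word = []
    for word in text.split():
        per_word.append({
            "bool": {
                "should": [
                    {"match": {f: {"query": word, "fuzziness": fuzziness}}}
                    for f in fields
                ],
                # Explicit for clarity: at least one field must contain the word.
                "minimum_should_match": 1,
            }
        })
    return {"query": {"bool": {"must": per_word}}}


body = all_words_any_field("human anatomy", ["title", "description", "subject"])
print(json.dumps(body, indent=2))
```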

Elasticsearch fuzzy query and match with fuzziness

So I saw these two queries.
The first one is match with the fuzziness option:
{
"query": {
"match": {
"user": {
"query": "ki",
"fuzziness": "AUTO"
}
}
}
}
The second one is a plain fuzzy query:
{
"query": {
"fuzzy": {
"user": {
"value": "ki"
}
}
}
}
The results are pretty much the same. But my question is: do the two queries really do the same thing, and which one is best practice for fuzziness?
In your example the results are the same. However, the fuzzy query behaves like a term query, so it does not perform analysis beforehand, whereas the match query does.
So if you searched an address field containing pigeon street, indexed with the standard analyzer, this query would work:
GET my-index/_search
{
"query": {
"match": {
"address": {
"query": "wigeon street",
"fuzziness": 1
}
}
}
}
but this one would not:
GET my-index/_search
{
"query": {
"fuzzy": {
"address": {
"value": "wigeon street"
}
}
}
}
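The difference between the two requests can be illustrated with a toy model. This is a simulation under stated assumptions, not how Elasticsearch is implemented: analysis is reduced to lowercasing and splitting on whitespace, and plain Levenshtein distance stands in for the fuzzy term comparison (Elasticsearch additionally counts a transposition as a single edit by default). The names match_like and fuzzy_like are hypothetical.

```python
def levenshtein(a, b):
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def analyze(text):
    # Stand-in for the standard analyzer: lowercase, split on whitespace.
    return text.lower().split()


def match_like(indexed_text, query, max_edits):
    """match: the query is analyzed first, then each token is compared fuzzily."""
    index_terms = analyze(indexed_text)
    return any(levenshtein(q, t) <= max_edits
               for q in analyze(query) for t in index_terms)


def fuzzy_like(indexed_text, value, max_edits):
    """fuzzy: the value is NOT analyzed; it is compared as one single term."""
    index_terms = analyze(indexed_text)
    return any(levenshtein(value, t) <= max_edits for t in index_terms)


print(match_like("pigeon street", "wigeon street", 1))
print(fuzzy_like("pigeon street", "wigeon street", 2))
```

Because the indexed terms are pigeon and street, the unanalyzed value "wigeon street" is far more than two edits away from either of them, which is why the fuzzy request finds nothing.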

Does the structure of a match query affect the server?

I'm writing some code to generate queries and I wondered if there was any one way of generating the queries that was kinder to the server.
So this query:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"match": {
"Text": {
"query": "Scooby Shaggy corridor",
"fuzziness": 1,
"operator": "AND"
}
}
}
]
}
}
}
is logically equivalent to this:
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"match": {
"Text": {
"query": "Scooby",
"fuzziness": 1
}
}
},
{
"match": {
"Text": {
"query": "Shaggy",
"fuzziness": 1
}
}
},
{
"match": {
"Text": {
"query": "corridor",
"fuzziness": 1
}
}
}
]
}
}
}
but is either one easier for the server to process?
Or does it make no difference?
I realise this is a trivial example but could it make a difference with more complex queries?
If someone who knows a bit about how Elasticsearch behaves under the hood could make an observation, I'd be grateful.
Thanks,
Adam.
Elasticsearch will itself rewrite your multi-term match query to the logical equivalent. See here for more details.
The match query is of type boolean. It means that the text provided is
analyzed and the analysis process constructs a boolean query from the
provided text.
But you should keep the multi-term match query and let Elasticsearch do the job. It's more maintainable, and you can control the rewriting thanks to the rewrite parameter (see here).
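The logical equivalence between the two requests in the question can be sketched as a small transformation. This only illustrates the equivalence and is not the actual rewrite performed inside Elasticsearch: it assumes whitespace-separated terms and the AND operator, and expand_match is a hypothetical name.

```python
def expand_match(field, text, fuzziness):
    """Expand a multi-word AND match into one must clause per term.

    Hypothetical illustration of the logical equivalence only; real analysis
    depends on the field's analyzer, not on whitespace splitting.
    """
    return {
        "bool": {
            "must": [
                {"match": {field: {"query": term, "fuzziness": fuzziness}}}
                for term in text.split()
            ]
        }
    }


expanded = expand_match("Text", "Scooby Shaggy corridor", 1)
for clause in expanded["bool"]["must"]:
    print(clause)
```

Since the single match query already expresses this, generating the expanded form by hand mainly adds code to maintain.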
