Elasticsearch: alternative to cross_fields with fuzziness

I have an Elasticsearch index with the standard analyzer. I would like to perform search queries containing multiple words, e.g. human anatomy. This search should be performed across several fields:
Title
Subject
Description
All the words in the query should be present in any of the fields (e.g. 'human' in title and 'anatomy' in description, etc.). If not all the words are present across these fields, the result shouldn't be returned.
Now, more importantly, I want to get fuzzy matches. For example, these queries should return approximately the same results as human anatomy:
human anatom
human anatomic
humanic anatomic
etc.
So fuzziness should apply to every word in the query.
As Elasticsearch doesn't support fuzziness for the multi-match cross-fields queries, I have been trying to achieve the desired behaviour this way:
{
  "query": {
    "bool": {
      "must": [
        {
          "query": {
            "bool": {
              "should": [
                {
                  "match": {
                    "title": {
                      "query": "human",
                      "fuzziness": 2
                    }
                  }
                },
                {
                  "match": {
                    "description": {
                      "query": "human",
                      "fuzziness": 2
                    }
                  }
                },
                {
                  "match": {
                    "subject": {
                      "query": "human",
                      "fuzziness": 2
                    }
                  }
                }
              ]
            }
          }
        },
        {
          "query": {
            "bool": {
              "should": [
                {
                  "match": {
                    "title": {
                      "query": "anatomy",
                      "fuzziness": 2
                    }
                  }
                },
                {
                  "match": {
                    "description": {
                      "query": "anatomy",
                      "fuzziness": 2
                    }
                  }
                },
                {
                  "match": {
                    "subject": {
                      "query": "anatomy",
                      "fuzziness": 2
                    }
                  }
                }
              ]
            }
          }
        }
      ]
    }
  }
}
The idea behind this code is the following: find the results where
any of the fields contains human (within an edit distance of 2, e.g. humane, humon, humanic, etc.)
and
any of the fields contains anatomy (within an edit distance of 2, e.g. anatom, anatomic, etc.).
Unfortunately, this code does not work and fails to retrieve a great number of relevant results. For example (the edit distance between each of the words in the two queries <= 2):
human anatomic – 0 results
humans anatomy – 21 results
How can I make fuzziness work within the given conditions? Recreating the index with n-gram is currently not an option, so I would like to make fuzziness work.
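The per-word, per-field structure described in this question can also be generated programmatically instead of being written out by hand. The following is a minimal Python sketch, not a confirmed fix for the missing results. The function name fuzzy_all_terms_query is made up for illustration, splitting the input on whitespace is an assumption (the standard analyzer tokenizes differently in edge cases), and each inner bool is placed directly in must without the extra "query" wrapper, which is the shape the bool query documentation shows.

```python
import json


def fuzzy_all_terms_query(text, fields, fuzziness=2):
    """Build a bool query in which every word must fuzzily match at least one field.

    Hypothetical helper: the name and the whitespace split are illustrative assumptions.
    """
    must_clauses = []
    for term in text.split():
        # One fuzzy match clause per field for this word.
        per_field = [
            {"match": {field: {"query": term, "fuzziness": fuzziness}}}
            for field in fields
        ]
        # The word has to match in at least one of the fields.
        must_clauses.append(
            {"bool": {"should": per_field, "minimum_should_match": 1}}
        )
    # Every word of the input has to be satisfied.
    return {"query": {"bool": {"must": must_clauses}}}


body = fuzzy_all_terms_query("human anatomy", ["title", "subject", "description"])
print(json.dumps(body, indent=2))
```

The printed body can be sent to the _search endpoint of the index. Setting minimum_should_match explicitly makes the "at least one field" requirement visible rather than relying on defaults.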

Related

Fuzzy sentence search in elasticsearch based on edit distance of words

For a given index I have added documents like:
[
{"expression": "tell me something about elasticsearch"},
{"expression": "this is a new feature for elasticsearch"},
{"expression": "tell me something about kibana"},
# ... and so on
]
Now, I want to query Elasticsearch in such a way that, for the given input expression
"tell me something on elasticsearch", it returns:
{"expression": "tell me something about elasticsearch"},
{"expression": "tell me something about kibana"}
since in this case the edit distance with respect to words (not at the character level) is smaller.
Can we perform such a query on elasticsearch?
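To make the notion of a word-level edit distance concrete, here is a small Python sketch that computes it outside Elasticsearch for the example expressions above. It only illustrates the ranking being asked for; it is not an Elasticsearch feature, and the function name word_edit_distance is made up for illustration.

```python
def word_edit_distance(a, b):
    """Levenshtein distance computed over whitespace-separated words, not characters."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        curr = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


documents = [
    "tell me something about elasticsearch",
    "this is a new feature for elasticsearch",
    "tell me something about kibana",
]
query = "tell me something on elasticsearch"

# Sort the documents so that the closest expressions come first.
ranked = sorted(documents, key=lambda doc: word_edit_distance(query, doc))
for doc in ranked:
    print(word_edit_distance(query, doc), doc)
```

With these inputs the two "tell me something about ..." expressions are 1 and 2 word edits away from the query, while the "new feature" expression is 6 edits away, which matches the expected output in the question.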
As per my understanding, fuzziness cannot be combined with phrase or match_phrase queries.
But let me share a few use cases; see if they are helpful.
If you want the search to tolerate missing words, use slop with match_phrase rather than fuzziness (this may work for you):
GET demo_index/_search
{
  "query": {
    "match_phrase": {
      "field1": {
        "query": "tell me something elasticsearch",
        "slop": 1 // you can increase it as per your requirement
      }
    }
  }
}
Secondly, if you want the search to tolerate character-level changes, you can use the queries below with fuzziness.
A single search term across different fields:
GET index_name/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match": {
            "query": "enginere", // wrong spelling, but we still get results wherever the query matches the keyword "engineer"; again, you can increase fuzziness
            "fields": [
              "field_name1",
              "field_name2",
              ...
            ],
            "fuzziness": 1,
            "slop": 1 // not compulsory
          }
        }
      ]
    }
  }
}
Different search terms in different fields:
GET index_name/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"field1": {
"query": "text1",
"fuzziness": 1
}
}
},
{
"match": {
"field2": {
"query": "text2",
"fuzziness": 2
}
}
}
],
"filter": [
{
"match": {
"field3": "text3"
}
}
]
}
}
}

Elasticsearch: find doc by id and highlight words based on query string

I would like to find a document in Elasticsearch and highlight terms based on a query string.
Is this possible?
I tried to run a query_string search and filter the result by ID. But that does not sound very efficient, because Elasticsearch first generates a huge list of all documents matching the query string (which could be millions) and then picks only one document based on the filter.
Is there a way or query construct to combine the query string and a "search for a term in the _id field" in one boolean search?
Something like this (which is not working):
"query": {
"bool": {
"must": {
"query_string": {
"query": "red*",
"fields": [
"text",
"title"
]
},
"term": {
"_id":"fda72434fa172"
}
}
}
},
"highlight": {
"fields": {
[...]
I made a small example that can be a starting point.
Use a filter to retrieve the document by ID.
Then use match and highlight to highlight the terms you want.
POST test/_doc/fda72434fa172
{
"text": "I like to find an document in elastic search an highlight terms based on an query string. Is this possible?"
}
GET test/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"_id": "fda72434fa172"
}
}
],
"must": [
{
"match": {
"text": {
"query": "elastic search"
}
}
}
]
}
},
"highlight": {
"fields": {
"text": {}
}
}
}

Fuzzy match words in any order in Elasticsearch

What I need to achieve is to match documents based on single field (product name, which consists of basically all possible filter values). I know it is not the most reliable solution, but I only have this one field to work with.
I need to be able to send a search query and have the words in that query matched in any order against the name field (the name should contain all words from the search query). At this point a simple match_phrase_prefix works pretty well, but what is missing there is fuzziness, because we also need to allow users to make some typos and still get relevant results.
My question is, is there any way to have match_phrase_prefix-like query, but with fuzziness?
I tried some nested bool queries with match, but I don't get anything near match_phrase_prefix this way.
Examples of what I tried:
Pretty good results, but no fuzziness:
{
"query": {
"bool": {
"must": [
{
"match_phrase_prefix": {
"name.standard": {
"query": "brand thing model",
"slop": 10
}
}
}
]
}
}
}
Fuzziness, but very limited matches:
{
"query": {
"bool": {
"must": [
{
"match": {
"name.standard": {
"query": "thing",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
},
{
"match": {
"name.standard": {
"query": "brand",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
}
]
}
}
}
Using should above, I get more results, but they are way less relevant than the ones from first query.
The above can be achieved with a simple match query:
{
  "query": {
    "match": {
      "name.standard": {
        "query": "brand thing model",
        "operator": "and", // all 3 tokens above must be present, in any order
        "fuzziness": "AUTO" // value as per your choice
      }
    }
  }
}

"match_phrase" hit with no highlights returned

I have an index that includes the full text of different books belonging to a specific series. Each document represents a different volume in a series, and each volume has a set of nested documents on it corresponding to a section of text in that book. This is the query we are using in order to get highlights matching a specific phrase within all the books of a given series:
{
"from": 0,
"size": 3,
"query": {
"bool": {
"must": [
{
"nested": {
"query": {
"bool": {
"must": [
{
"match": {
"sections.content.phrase": {
"query": "theory legal",
"type": "phrase",
"slop": X
}
}
}
]
}
},
"path": "sections",
"inner_hits": {
"highlight": {
"order": "score",
"fields": {
"sections.content.phrase": {}
}
},
"_source": {
"include": [
"title",
"id"
]
}
}
}
}
],
"filter": [
{
"term": {
"series": "00410"
}
}
]
}
}
}
Normally this query works fine, but for some series we can get hits in books with no highlighted text returned. For example, with the above phrase query, series, and a slop value of 1, we correctly get a single hit for one book in the series: (each allegation of discrimination or each <em>theory</em> of <em>legal</em> recovery not required to be set forth in separate). If we take the same query and raise the slop value to 3, we suddenly get hits in 5 different books, each with no matching highlights found. Not even the original hit from when the slop value was 1 is returned. Why are we getting these results?

Exact and fuzzy search

My setup:
I have some documents with name "Apple", "Apple delicous", ...
This is my query:
GET p_index/_search
{
"query": {
"bool": {
"should": [
{"match": {
"name": "apple"
}},
{ "fuzzy": {
"name": "apple"
}}
]
}
}
}
I want the exact match to be shown first, and then the fuzzy one:
apple
apple delicous
Second, I am wondering why I did not get any results when I enter only app in the search:
GET p_index/_search
{
"query": {
"bool": {
"should": [
{"match": {
"name": "app"
}},
{ "fuzzy": {
"name": "app"
}}
]
}
}
}
There are two problems here.
1) To give a higher score to an exact match, you could add a "raw" sub-field with "index": "not_analyzed" to your name field, like this:
"name": {
  "type": "string",
  "fields": {
    "raw": {
      "type": "string",
      "index": "not_analyzed"
    }
  }
}
After that, your query would look like this:
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "apple"
          }
        },
        {
          "match": {
            "name.raw": {
              "query": "apple",
              "boost": 5
            }
          }
        }
      ]
    }
  }
}
This will give a higher score to the document "apple" than to "apple delicous".
2) To better understand fuzziness you should go through this and this article.
From the Docs
The fuzziness parameter can be set to AUTO, which results in the
following maximum edit distances:
0 for strings of one or two characters
1 for strings of three, four, or five characters
2 for strings of more than five characters
So the reason your fuzzy query did not return apple for app is that the edit distance between those two words is 2, and since "app" is only a three-letter word, the fuzziness value under AUTO is 1. You could achieve the desired result with the following query:
{
"query": {
"fuzzy": {
"name": {
"value": "app",
"fuzziness": 2
}
}
}
}
I seriously would not recommend using this query, because it will return bizarre results: it will also return cap, arm, pip and a lot of other words, as they fall within an edit distance of 2.
This would be a better query:
{
"query": {
"fuzzy": {
"name": {
"value": "appl"
}
}
}
}
It will return apple.
I hope this helps.
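The AUTO thresholds quoted above and the edit distances used in this answer can be checked with a short Python sketch. It uses a plain Levenshtein distance, and the names levenshtein and auto_fuzziness are made up for illustration. Elasticsearch itself also counts a transposition of two adjacent characters as a single edit by default, so treat this as an approximation of what the fuzzy query does rather than a reimplementation.

```python
def levenshtein(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            # deletion, insertion, substitution (or match)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Maximum edit distance allowed by AUTO, following the thresholds quoted from the docs."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


for term in ("app", "appl"):
    allowed = auto_fuzziness(term)
    needed = levenshtein(term, "apple")
    print(term, "allowed:", allowed, "needed:", needed, "matches:", needed <= allowed)
```

For app the required distance to apple (2) exceeds what AUTO allows for a three-letter term (1), while appl needs only one edit, which is within the limit.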
I think this will help you:
{"query":{"bool":{"must":[{"function_score":{"query":{"multi_match":{"query":"airetl","fields":["brand_lower"],"boost":1,"fuzziness":"AUTO","prefix_length":1}}}}]}}}
