Elasticsearch term suggester is not returning correct suggestions when one character is missing (instead of misspelled)

I'm using the Elasticsearch term suggester for spell correction. My index contains a huge list of ads, and each ad has subject and body fields. I've found a problematic example for which the suggester does not return the correct suggestions.
I have lots of ads whose subject contains the word "soffa" and also 5 ads whose subject contains the word "sofa". Ideally, when I send "sofa" (wrong spelling) as the text to the suggester, it should return "soffa" (correct spelling) as a suggestion, since "soffa" is the correct spelling, most ads contain "soffa", and only a few ads contain "sofa".
Here is my suggester query body:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject",
"suggest_mode": "popular",
"min_word_length": 1
}
}
}
}
When I send the above query, I get the following response:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"suggest": {
"subjectSuggester": [
{
"text": "sof",
"offset": 0,
"length": 4,
"options": [
{
"text": "soff",
"score": 0.6666666,
"freq": 298
},
{
"text": "sol",
"score": 0.6666666,
"freq": 101
},
{
"text": "saf",
"score": 0.6666666,
"freq": 6
}
]
}
]
}
}
As you can see in the above response, it returned "soff" but not "soffa", although I have lots of docs whose subject contains "soffa".
I even played with parameters like suggest_mode and string_distance, but still had no luck.
I also tried the phrase suggester instead of the term suggester, but got the same result. Here is my phrase suggester query:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"phrase": {
"field": "subject",
"size": 10,
"gram_size": 3,
"direct_generator": [
{
"field": "subject.trigram",
"suggest_mode": "always",
"min_word_length":1
}
]
}
}
}
}
I suspect it doesn't work when one character is missing rather than misspelled: in the "soffa" example, one "f" is missing.
It works fine for misspellings, though; e.g. it works fine for "vovlo".
When I send "vovlo" it gives me "volvo".
Any help would be hugely appreciated.
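To compare the two cases by edit distance: the term suggester docs describe the default `internal` string distance as based on Damerau-Levenshtein, under which a missing character (sofa → soffa) and a swapped pair (vovlo → volvo) each count as a single edit. The sketch below uses the optimal string alignment variant as a simplified stand-in; it is not Lucene's implementation.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    # d[i][j] holds the distance between a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # adjacent transposition, e.g. "vl" <-> "lv"
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


print(osa_distance("sofa", "soffa"))   # 1: one inserted "f"
print(osa_distance("vovlo", "volvo"))  # 1: "vl" transposed
```

If both inputs are one edit away from the intended word, the missing character by itself should not explain why "soffa" is never suggested.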

Try changing the "string_distance" parameter:
{
"suggest": {
"text": "sof",
"subjectSuggester": {
"term": {
"field": "title",
"min_word_length":2,
"string_distance":"ngram"
}
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#term-suggester

I've found a workaround myself.
I added a shingle filter with max_shingle_size 3 (i.e. word trigrams) and an analyzer that uses it, then added a subfield with that analyzer (trigram) and ran the suggester query on that subfield instead of the actual field, and it worked.
Here are the mapping changes:
{
"settings": {
"analysis": {
"filter": {
"shingle": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3
}
},
"analyzer": {
"trigram": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"shingle"
],
"char_filter": [
"diacritical_marks_filter"
]
}
}
}
},
"mappings": {
"properties": {
"subject": {
"type": "text",
"fields": {
"trigram": {
"type": "text",
"analyzer": "trigram"
}
}
}
}
}
}
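A rough model of what the trigram analyzer emits may help explain why the subfield still works for single words. The sketch below assumes the shingle filter keeps unigrams (its default, as far as I know) and ignores the custom diacritical_marks_filter char filter, whose definition is not shown; the sample subject "grey soffa for sale" is hypothetical.

```python
def shingles(tokens, min_size=2, max_size=3, output_unigrams=True):
    """Rough model of a shingle token filter configured like the mapping
    above (min_shingle_size 2, max_shingle_size 3)."""
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])  # the single word itself
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


# Crude stand-in for the standard tokenizer plus lowercase filter.
tokens = "Grey Soffa for sale".lower().split()
print(shingles(tokens))
```

Because this analyzer only lowercases and shingles, a word like "soffa" survives unchanged as a unigram term in subject.trigram.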
And here is my corrected query:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject.trigram",
"suggest_mode": "popular",
"min_word_length": 1,
"string_distance": "ngram"
}
}
}
}
Note that I'm running the suggester against subject.trigram instead of subject itself.
Here is the result:
{
"suggest": {
"subjectSuggester": [
{
"text": "sofa",
"offset": 0,
"length": 4,
"options": [
{
"text": "soffa",
"score": 0.8,
"freq": 282
},
{
"text": "soffan",
"score": 0.6666666,
"freq": 5
},
{
"text": "som",
"score": 0.625,
"freq": 102
},
{
"text": "sol",
"score": 0.625,
"freq": 82
},
{
"text": "sony",
"score": 0.625,
"freq": 50
}
]
}
]
}
}
As you can see above, "soffa" appears as the first suggestion.

There is something weird in your result for the term suggester for the word "sofa"; take a look at the text that is being corrected:
"suggest": {
"subjectSuggester": [
{
"text": "sof",
"offset": 0,
"length": 4,
"options": [
{
"text": "soff",
"score": 0.6666666,
"freq": 298
},
{
"text": "sol",
"score": 0.6666666,
"freq": 101
},
{
"text": "saf",
"score": 0.6666666,
"freq": 6
}
]
}
]
}
As you can see, it's "sof" and not "sofa", which means the correction is not for "sofa" but for "sof". So I suspect this issue is related to the analyzer you were using on this field, especially when looking at the results: "soff" instead of "soffa". The analyzer appears to be removing the trailing "a".
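The point is that the suggester corrects analyzed tokens against analyzed index terms. The sketch below substitutes a hypothetical analyzer that just lowercases and strips one trailing "a" (the real analyzer on subject is not shown in the question). It also uses a scoring formula, 1 - edits / length of the shorter term, that appears consistent with the 0.6666666 scores in the response above; that formula is an observation, not a documented API.

```python
def toy_analyze(text: str) -> str:
    """HYPOTHETICAL stand-in for the field's analyzer: lowercase and strip
    one trailing 'a'. Not a real Elasticsearch analyzer."""
    token = text.lower()
    return token[:-1] if token.endswith("a") and len(token) > 1 else token


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def observed_score(term: str, candidate: str) -> float:
    """Formula that matches the scores seen above (an assumption)."""
    return 1 - levenshtein(term, candidate) / min(len(term), len(candidate))


query_token = toy_analyze("sofa")   # what the suggester actually corrects
index_term = toy_analyze("soffa")   # what is actually stored in the index
print(query_token, index_term, f"{observed_score(query_token, index_term):.4f}")
# sof soff 0.6667
```

Under this assumption, "soff" is the only form of "soffa" the suggester can ever offer, which matches the response.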

Related

Elasticsearch - How does one combine term suggestions from multiple fields?

The term suggester documentation lays out the basics of term suggester, but it leaves me wondering how I can find suggestions from multiple fields and combine them. I can probably come up with some implementation after-the-fact, but I'm wondering if there are some settings I'm missing.
For example, let's say I want to get suggestions from three different fields:
GET product-search-product/_search
{
"suggest": {
"text": "som typu here",
"my-suggest-1": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_one"
}
},
"my-suggest-2": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_two"
}
},
"my-suggest-3": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_three"
}
}
}
}
This returns results I can use, but I have to figure out which field had the "best" suggestion.
"suggest": {
"my-suggest-1": [
{
"text": "som",
...
"options": [
{
"text": "somi"
...
}
]
},
{
"text": "typu",
...
"options": [
{
"text": "typo"
...
}
]
},
{
"text": "here",
...
"options": []
}
],
"my-suggest-2": [
{
"text": "som",
...
"options": [
{
"text": "some"
...
}
]
},
{
"text": "typu",
...
"options": []
},
{
"text": "here",
...
"options": []
}
],
"my-suggest-3": [
{
"text": "som",
...
"options": []
},
{
"text": "typu",
...
"options": [
{
"text": "typa"
...
}
]
},
{
"text": "here",
...
"options": []
}
]
}
It looks to me as if I have to implement something to determine which field came up with the best suggestions. Is there no way to combine these in the suggester so it can do that for me?
The phrase suggester turned out to be appropriate for my case: it supports candidate generators, which appear to solve my problem.
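If you stay with separate term suggesters instead, the "after-the-fact" merge mentioned in the question could look like the sketch below: for each input token, keep the highest-scoring option across all named suggesters, breaking ties by freq. The scores and freqs in the sample are hypothetical, since the response above elides them; also note that scores from different fields are computed against different term dictionaries, so comparing them is a heuristic.

```python
def best_per_token(response_suggest: dict) -> dict:
    """For each input token, pick the best option across all named term
    suggesters (highest score, then highest freq)."""
    best = {}
    for entries in response_suggest.values():   # one list per suggester name
        for entry in entries:                   # one entry per input token
            for opt in entry["options"]:
                cur = best.get(entry["text"])
                if cur is None or (opt["score"], opt["freq"]) > (cur["score"], cur["freq"]):
                    best[entry["text"]] = opt
    return {token: opt["text"] for token, opt in best.items()}


suggest = {  # hypothetical scores/freqs shaped like the response above
    "my-suggest-1": [
        {"text": "som", "options": [{"text": "somi", "score": 0.67, "freq": 3}]},
        {"text": "typu", "options": [{"text": "typo", "score": 0.75, "freq": 10}]},
        {"text": "here", "options": []},
    ],
    "my-suggest-2": [
        {"text": "som", "options": [{"text": "some", "score": 0.67, "freq": 40}]},
        {"text": "typu", "options": []},
    ],
    "my-suggest-3": [
        {"text": "typu", "options": [{"text": "typa", "score": 0.75, "freq": 2}]},
    ],
}
print(best_per_token(suggest))  # {'som': 'some', 'typu': 'typo'}
```

Tokens with no options in any field (like "here") are simply left uncorrected.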

Elasticsearch, HOW to make phrase suggester return the exact suggestion?

I am using elasticsearch 5.5.2
I am trying the phrase suggester and am NOT able to configure it to return the exact suggestion that is already in the index. My index settings, type mappings and phrase suggest query are given below. Please help.
My index settings and type mappings are:
PUT test
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"trigram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["shingle"]
}
},
"filter": {
"shingle": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3
}
}
}
}
},
"mappings": {
"test": {
"properties": {
"title": {
"type": "text",
"fields": {
"trigram": {
"type": "text",
"analyzer": "trigram_analyzer"
}
}
}
}
}
}
}
I indexed a document using:
POST test/test?refresh=true
{"title": "noble prize"}
The phrase suggester query I am using:
POST test/_search
{
"suggest": {
"text": "nobe priz",
"simple_phrase": {
"phrase": {
"field": "title.trigram",
"size": 1,
"gram_size": 3,
"direct_generator": [ {
"field": "title.trigram",
"suggest_mode": "always"
} ],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
The result I am getting is:
"suggest": {
"simple_phrase": [
{
"text": "nobe priz",
"offset": 0,
"length": 9,
"options": [
{
"text": "noble priz",
"highlighted": "<em>noble</em> priz",
"score": 0.09049256
}
]
}
]
}
My question is: for the search text 'nobe priz', why am I NOT getting 'noble prize' as the suggestion? Why am I instead getting just 'noble priz'?
Note that 'noble prize' is the document I have indexed.
Even if I increase size to 2, I still do NOT get 'noble prize' as one of the suggestions.
With size set to 2, for the search text 'nobe priz' I get the following response:
"suggest": {
"simple_phrase": [
{
"text": "nobe priz",
"offset": 0,
"length": 9,
"options": [
{
"text": "noble priz",
"highlighted": "<em>noble</em> priz",
"score": 0.09049256
},
{
"text": "nobe prize",
"highlighted": "nobe <em>prize</em>",
"score": 0.09049256
}
]
}
]
}
What should I do to get 'noble prize' as the suggestion?
Please help.
I found the answer myself. You need to tell ES how many terms in the search text may be misspelled, using the 'max_errors' parameter. 'max_errors' can be given either as a float in the range [0..1), interpreted as a fraction of the query terms, or as an absolute number of terms (>= 1). The default is 1.0, meaning at most one misspelled term is corrected.
See the ES documentation on the phrase suggester and its max_errors parameter:
https://www.elastic.co/guide/en/elasticsearch/reference/master/search-suggesters-phrase.html
Accordingly, I set max_errors to 2:
POST test/_search
{
"suggest": {
"text": "nobe priz",
"simple_phrase": {
"phrase": {
"field": "title.trigram",
"size": 1,
"gram_size": 3,
"max_errors": 2,
"direct_generator": [ {
"field": "title.trigram",
"suggest_mode": "always"
} ],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
And I got the exact matching phrase suggestion:
"suggest": {
"simple_phrase": [
{
"text": "nobe priz",
"offset": 0,
"length": 9,
"options": [
{
"text": "noble prize",
"highlighted": "<em>noble prize</em>",
"score": 0.4833575
}
]
}
]
}
So with max_errors set to 2, the suggestion 'noble prize' is returned.
Cheers :)
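The effect of max_errors can be sketched as a filter on combinations of per-token candidates. This is a simplified model, assuming an absolute max_errors only (no fractional values) and ignoring the phrase suggester's language-model scoring; the candidate lists are taken from the responses above.

```python
from itertools import product


def candidate_phrases(tokens, candidates, max_errors):
    """Combine per-token candidates (keeping each original token as an
    option) and keep phrases with 1..max_errors corrected tokens.
    max_errors is treated as an absolute count only."""
    options = [[tok] + candidates.get(tok, []) for tok in tokens]
    results = []
    for combo in product(*options):
        corrections = sum(1 for orig, new in zip(tokens, combo) if orig != new)
        if 0 < corrections <= max_errors:  # the unchanged input is skipped
            results.append(" ".join(combo))
    return results


tokens = ["nobe", "priz"]
cands = {"nobe": ["noble"], "priz": ["prize"]}
print(candidate_phrases(tokens, cands, max_errors=1))  # ['nobe prize', 'noble priz']
print(candidate_phrases(tokens, cands, max_errors=2))  # adds 'noble prize'
```

With the default of one error, 'noble prize' needs two corrections and is never a candidate, which matches the two responses with size 1 and 2.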

Elasticsearch suggest action doesn't return all possible suggestions

It seems like the Elasticsearch suggest option doesn't return what I expect. I guess it's due to bad params, but I don't see which.
If the body contains a word like "TOURNAMENT" and I run the suggest API for "TOURNAME", I get "TOURNAMENT" as a suggestion. But if I run the same API for "TOURNA", I don't get "TOURNAMENT" as a suggestion.
EXPECTED BEHAVIOR:
{
"size": 0,
"suggest" : {
"my-sugg": {
"text" : "TOURNAME",
"term": {
"field" : "body",
"min_word_length" : "3"
}
}
}
}
I get the expected result:
"my-sugg": [
{
"text": "tourname",
"offset": 0,
"length": 8,
"options": [
{
"text": "tournament",
"score": 0.75,
"freq": 60
}
]
}
NOT EXPECTED BEHAVIOR:
If I run exactly the same request as above, but with "text" : "TOURNA", I get:
"suggest": {
"my-sugg": [
{
"text": "tourna",
"offset": 0,
"length": 6,
"options": [
{
"text": "tour’s",
"score": 0.6666666,
"freq": 5
},
{
"text": "tour's",
"score": 0.6666666,
"freq": 3
},
{
"text": "tour",
"score": 0.5,
"freq": 34
},
{
"text": "turn",
"score": 0.5,
"freq": 2
}
]
}
]
I get the wrong result. I would like to get "tournament" as one of the possible suggestions. Any clue?
I think you might be mixing up the term suggester with the completion suggester.
The first one lets you implement "did you mean" functionality by suggesting similar terms, whereas the second lets you implement search-as-you-type functionality, which I think is what you are after here.
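Edit distance shows why the term suggester cannot bridge "tourna" to "tournament". Its max_edits parameter defaults to 2 and only accepts 1 or 2, so candidates further away are never considered. The sketch uses plain Levenshtein distance as a simplified stand-in for the suggester's internal distance; here only insertions are involved, so the two agree.

```python
MAX_EDITS = 2  # documented default for the term suggester's max_edits (1 or 2 allowed)


def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


for typed in ("tourname", "tourna"):
    edits = levenshtein(typed, "tournament")
    status = "candidate" if edits <= MAX_EDITS else "too far for the term suggester"
    print(f"{typed}: {edits} edits -> {status}")
# tourname: 2 edits -> candidate
# tourna: 4 edits -> too far for the term suggester
```

The 0.75 score for "tournament" above is also consistent with 1 - 2 / 8 (two edits over the shorter term's length), though that formula is an observation rather than documented behaviour.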

Elasticsearch term suggester does not return results on exact match

When I request the suggester with:
{
"my-title-suggestions-1": {
"text": "tücher ",
"term": {
"field": "name"
}
},
"my-title-suggestions-2": {
"text": "tüchers ",
"term": {
"field": "name"
}
}
}
It returns:
{
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"my-title-suggestions-1": [
{
"text": "tücher",
"offset": 0,
"length": 6,
"options": []
}
],
"my-title-suggestions-2": [
{
"text": "tüchers",
"offset": 0,
"length": 7,
"options": [
{
"text": "tücher",
"score": 0.8333333,
"freq": 6
}
]
}
]
}
I wonder why the first suggester does not return the exact match, since the second suggester obviously has that result.
Can I add other options that will change this behavior?
Edit:
The minimal mapping is just this:
{
"name" : {
"analyzer" : "standard",
"type" : "string"
}
}
To add to what @ChintanShah25 said: according to https://www.elastic.co/guide/en/elasticsearch/reference/2.0/search-suggesters-term.html (see suggest_mode), the term suggester will by default:
Only provide suggestions for suggest text terms that are not in the index.
I don't think you can do that, and I am not sure why you want exact matches in suggestions; after all, they are "suggestions".
Normally they are used to catch misspellings: the term suggester gives you candidate suggestions that are similar to the word you entered, within an edit distance of 2.
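The documented suggest_mode values can be modelled as a filter over candidates that are already within max_edits. This is a sketch of the semantics only, not the Lucene code path; the neighbour "tüchern" is hypothetical, while "tücher" with freq 6 comes from the response above.

```python
def filter_by_mode(mode, input_freq, candidates):
    """Model of the documented suggest_mode semantics.
    input_freq: doc frequency of the suggest-text term (0 if not indexed).
    candidates: (text, freq) pairs already within max_edits."""
    if mode == "missing":
        # Default: only suggest for terms that are not in the index.
        return candidates if input_freq == 0 else []
    if mode == "popular":
        # Only suggestions that occur in more docs than the input term.
        return [c for c in candidates if c[1] > input_freq]
    if mode == "always":
        # Any matching suggestion.
        return candidates
    raise ValueError(f"unknown suggest_mode: {mode}")


# "tücher" is indexed (freq 6), so the default mode returns no options.
print(filter_by_mode("missing", 6, [("tüchern", 2)]))  # []
# "tüchers" is not indexed, so its neighbour "tücher" is offered.
print(filter_by_mode("missing", 0, [("tücher", 6)]))   # [('tücher', 6)]
```

This sketch does not model whether the exact input term itself could ever appear among the options.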

Elasticsearch suggester filter

I am implementing a suggester filter for search operations using the Elasticsearch API.
I have run into a problem: I can only search by prefix; I can't match a word in the middle of the text.
I tried the example below:
PUT /bls
{
"mappings": {
"bl": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
},
"name_suggest": {
"type": "completion",
"context": {
"store": {
"type": "category"
},
"status": {
"type": "category"
}
}
}
}
}
}
}
and
POST /bls/bl/1
{
"name": "LG 32LN5110 32 inches LED TV",
"name_suggest": {
"input": ["sony 32LN5110 32 inches LED TV"],
"context": {
"store": [
44,
45
],
"status": "Active"
}
}
}
POST /bls/_suggest?pretty
{
"name_suggest": {
"text": "sony",
"completion": {
"field": "name_suggest",
"context": {
"store": "44",
"status": "Active"
}
}
}
}
I get results with the above query, but none with the query below:
POST /bls/_suggest?pretty
{
"name_suggest": {
"text": "LED",
"completion": {
"field": "name_suggest",
"context": {
"store": "44",
"status": "Active"
}
}
}
}
This query returns:
{
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"name_suggest": [{
"text": "LED",
"offset": 0,
"length": 3,
"options": []
}]
}
String fields are indexed by default, so even without specifying it, they are analyzed with the default analyzer if no specific analyzer is specified.
In your case, you must specify
index: analyzed for the name_suggest property,
so that an analyzer such as the whitespace analyzer is used, which will tokenize all the words in your input field. Hence you can search anywhere across the text.
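Why "sony" matches but "LED" does not can be seen by modelling the completion suggester as prefix matching against each input, lowercased roughly as its default simple analyzer would. A different technique from the analyzer change above, and one that works with the input array the question already uses, is to index several inputs per document, one starting at each word. The helper word_suffix_inputs is hypothetical, and the extra inputs make the suggester data larger.

```python
def word_suffix_inputs(text):
    """HYPOTHETICAL helper: every tail of the phrase that starts at a word."""
    words = text.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def prefix_matches(inputs, prefix):
    """Rough model of completion lookup: case-insensitive prefix match
    against each input string."""
    p = prefix.lower()
    return [s for s in inputs if s.lower().startswith(p)]


single = ["sony 32LN5110 32 inches LED TV"]
print(prefix_matches(single, "sony"))  # ['sony 32LN5110 32 inches LED TV']
print(prefix_matches(single, "LED"))   # []
expanded = word_suffix_inputs(single[0])
print(prefix_matches(expanded, "LED"))  # ['LED TV']
```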
