Elasticsearch: how to make the phrase suggester return the exact suggestion?

I am using Elasticsearch 5.5.2.
I am trying out the phrase suggester and am NOT able to configure it to return the exact suggestion that is already in the index. My index settings, type mappings and phrase suggest query are given below. Please help.
My index settings and type mappings are:
PUT test
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"trigram_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": ["shingle"]
}
},
"filter": {
"shingle": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3
}
}
}
}
},
"mappings": {
"test": {
"properties": {
"title": {
"type": "text",
"fields": {
"trigram": {
"type": "text",
"analyzer": "trigram_analyzer"
}
}
}
}
}
}
}
I indexed a document using:
POST test/test?refresh=true
{"title": "noble prize"}
The phrase suggester query I am using:
POST test/_search
{
"suggest": {
"text": "nobe priz",
"simple_phrase": {
"phrase": {
"field": "title.trigram",
"size": 1,
"gram_size": 3,
"direct_generator": [ {
"field": "title.trigram",
"suggest_mode": "always"
} ],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
The result I am getting is:
"suggest": {
"simple_phrase": [
{
"text": "nobe priz",
"offset": 0,
"length": 9,
"options": [
{
"text": "noble priz",
"highlighted": "<em>noble</em> priz",
"score": 0.09049256
}
]
}
]
}
My question is: for the search text 'nobe priz', why am I NOT getting 'noble prize' as the suggestion? Why am I getting just 'noble priz' instead?
Note that 'noble prize' is exactly the document I have saved.
Even if I increase size to 2, I still do NOT get 'noble prize' as one of the suggestions.
With size set to 2, for the search text 'nobe priz' I get the following response:
"suggest": {
"simple_phrase": [
{
"text": "nobe priz",
"offset": 0,
"length": 9,
"options": [
{
"text": "noble priz",
"highlighted": "<em>noble</em> priz",
"score": 0.09049256
},
{
"text": "nobe prize",
"highlighted": "nobe <em>prize</em>",
"score": 0.09049256
}
]
}
]
}
What should I do to get 'noble prize' as the suggestion?
Please help.

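I found the answer myself. You need to tell ES how many terms in the search text may be misspelled, using the parameter max_errors. max_errors accepts either a float in the range [0..1), taken as a fraction of the query terms, or a number >= 1, taken as an absolute number of terms. The default is 1.0, which means only corrections with at most one misspelled term are returned. That is why 'noble priz' and 'nobe prize' came back (each corrects one term) but 'noble prize' (which corrects both) did not.
ES documentation on the phrase suggester, including the max_errors parameter:
https://www.elastic.co/guide/en/elasticsearch/reference/master/search-suggesters-phrase.html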
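The effect of max_errors can be illustrated with a deliberately simplified model. This is not how Elasticsearch ranks candidates (the real phrase suggester scores combinations with an n-gram language model). It only shows how capping the number of corrected terms at 1 rules out 'noble prize', which needs two corrections. The candidate lists and the round-down of fractional values are assumptions made for the sketch.

```python
from itertools import product
import math


def max_corrected_terms(max_errors: float, num_terms: int) -> int:
    """Turn max_errors into an absolute limit on corrected terms.

    Values >= 1 are an absolute number of terms; values in [0, 1) are a
    fraction of the query terms. Rounding down is an assumption here.
    """
    if max_errors >= 1:
        return int(max_errors)
    return math.floor(max_errors * num_terms)


def phrase_corrections(terms, candidates, max_errors=1.0):
    """List corrected phrases that change at most the allowed number of terms.

    `candidates` maps each input term to hypothetical corrections, standing
    in for what a direct generator might produce.
    """
    limit = max_corrected_terms(max_errors, len(terms))
    # Each position either keeps the original term or uses one candidate.
    choices = [[t] + [c for c in candidates.get(t, []) if c != t] for t in terms]
    results = []
    for combo in product(*choices):
        changed = sum(1 for old, new in zip(terms, combo) if old != new)
        if 0 < changed <= limit:
            results.append(" ".join(combo))
    return results


terms = ["nobe", "priz"]
candidates = {"nobe": ["noble"], "priz": ["prize"]}
print(phrase_corrections(terms, candidates))                # ['nobe prize', 'noble priz']
print(phrase_corrections(terms, candidates, max_errors=2))  # ['nobe prize', 'noble priz', 'noble prize']
```

With the default limit of one corrected term, the two single-term fixes are the only candidates, which matches the two options returned with size 2.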
Accordingly, I set max_errors to 2, as below:
POST test/_search
{
"suggest": {
"text": "nobe priz",
"simple_phrase": {
"phrase": {
"field": "title.trigram",
"size": 1,
"gram_size": 3,
"max_errors": 2,
"direct_generator": [ {
"field": "title.trigram",
"suggest_mode": "always"
} ],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
I then got the exactly matching phrase suggestion:
"suggest": {
"simple_phrase": [
{
"text": "nobe priz",
"offset": 0,
"length": 9,
"options": [
{
"text": "noble prize",
"highlighted": "<em>noble prize</em>",
"score": 0.4833575
}
]
}
]
}
So with max_errors set to 2, the suggestion 'noble prize' is returned.
Cheers :)

Related

Elasticsearch Term suggester is not returning correct suggestions when one character is missing (instead of misspelling)

I'm using the Elasticsearch term suggester for spell correction. My index contains a huge list of ads. Each ad has subject and body fields. I've found a problematic example for which the suggester does not return the correct suggestions.
I have lots of ads whose subject contains the word "soffa" and also 5 ads whose subject contains the word "sofa". Ideally, when I send "sofa" (the wrong spelling) as text to the suggester, it should return "soffa" (the correct spelling) as a suggestion, since most ads contain "soffa" and only a few contain "sofa".
Here is my suggester query body:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject",
"suggest_mode": "popular",
"min_word_length": 1
}
}
}
}
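The query above relies on suggest_mode "popular". A simplified Python model of the three documented modes (missing, popular, always) shows why "popular" is the right choice when the misspelling itself is indexed. The document frequencies are taken from this post (5 ads with "sofa", and the freq values from the responses); the function is a hypothetical illustration, not Elasticsearch code.

```python
def filter_by_suggest_mode(term, doc_freq, candidates, mode="missing"):
    """Simplified model of the term suggester's suggest_mode options.

    doc_freq maps a term to the number of documents that contain it.
    """
    original = doc_freq.get(term, 0)
    if mode == "missing":
        # Only suggest for input terms that are not in the index at all.
        return [] if original > 0 else list(candidates)
    if mode == "popular":
        # Only keep suggestions that occur in more docs than the input term.
        return [c for c in candidates if doc_freq.get(c, 0) > original]
    if mode == "always":
        # Suggest regardless of whether the input term exists.
        return list(candidates)
    raise ValueError(f"unknown suggest_mode: {mode}")


doc_freq = {"sofa": 5, "soffa": 282, "sol": 82}
print(filter_by_suggest_mode("sofa", doc_freq, ["soffa", "sol"], "missing"))  # []
print(filter_by_suggest_mode("sofa", doc_freq, ["soffa", "sol"], "popular"))  # ['soffa', 'sol']
```

Because "sofa" occurs in 5 documents, the default "missing" mode would suggest nothing for it.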
When I send the above query, I get the following response:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"suggest": {
"subjectSuggester": [
{
"text": "sof",
"offset": 0,
"length": 4,
"options": [
{
"text": "soff",
"score": 0.6666666,
"freq": 298
},
{
"text": "sol",
"score": 0.6666666,
"freq": 101
},
{
"text": "saf",
"score": 0.6666666,
"freq": 6
}
]
}
]
}
}
As you can see in the above response, it returned "soff" but not "soffa", although I have lots of docs whose subject contains "soffa".
I even played with parameters like suggest_mode and string_distance, but still had no luck.
I also tried the phrase suggester instead of the term suggester, with the same result. Here is my phrase suggester query:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"phrase": {
"field": "subject",
"size": 10,
"gram_size": 3,
"direct_generator": [
{
"field": "subject.trigram",
"suggest_mode": "always",
"min_word_length":1
}
]
}
}
}
}
I suspect it doesn't work when a character is missing rather than misspelled: in the "soffa" example, one "f" is missing.
It works fine for misspellings, e.g. "vovlo":
when I send "vovlo", it gives me "volvo".
Any help would be hugely appreciated.
Try changing the "string_distance".
{
"suggest": {
"text": "sof",
"subjectSuggester": {
"term": {
"field": "title",
"min_word_length":2,
"string_distance":"ngram"
}
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#term-suggester
I've found the workaround myself.
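I added a shingle filter (min_shingle_size 2, max_shingle_size 3, i.e. shingles of up to three words) and a custom analyzer that uses it, then added a subfield with that analyzer (trigram) and ran the suggester query against that subfield instead of the original field. That worked.
Here are the mapping changes (diacritical_marks_filter is a char filter whose definition is not included here):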
{
"settings": {
"analysis": {
"filter": {
"shingle": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3
}
},
"analyzer": {
"trigram": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"shingle"
],
"char_filter": [
"diacritical_marks_filter"
]
}
}
}
},
"mappings": {
"properties": {
"subject": {
"type": "text",
"fields": {
"trigram": {
"type": "text",
"analyzer": "trigram"
}
}
}
}
}
}
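Despite its name, the trigram analyzer above is built on a shingle filter, not an n-gram filter: it joins neighbouring words rather than splitting words into character grams. A rough Python approximation of its output follows. It assumes the input is already tokenized and lowercased, omits token positions and offsets, and keeps unigrams because output_unigrams defaults to true.

```python
def shingles(tokens, min_size=2, max_size=3, output_unigrams=True):
    """Approximate a shingle filter's output (positions and offsets omitted).

    For each start position: the unigram first, then shingles of
    increasing size up to max_size.
    """
    out = []
    for start in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[start])
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


print(shingles(["noble", "prize"]))  # ['noble', 'noble prize', 'prize']
print(shingles(["sofa"]))            # ['sofa']
```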
And here is my corrected query:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject.trigram",
"suggest_mode": "popular",
"min_word_length": 1,
"string_distance": "ngram"
}
}
}
}
Note that I'm running the suggester against subject.trigram instead of subject itself.
Here is the result:
{
"suggest": {
"subjectSuggester": [
{
"text": "sofa",
"offset": 0,
"length": 4,
"options": [
{
"text": "soffa",
"score": 0.8,
"freq": 282
},
{
"text": "soffan",
"score": 0.6666666,
"freq": 5
},
{
"text": "som",
"score": 0.625,
"freq": 102
},
{
"text": "sol",
"score": 0.625,
"freq": 82
},
{
"text": "sony",
"score": 0.625,
"freq": 50
}
]
}
]
}
}
As you can see above, "soffa" appears as the first suggestion.
There is something odd in your term suggester result for the word "sofa". Take a look at the text that is being corrected:
"suggest": {
"subjectSuggester": [
{
"text": "sof",
"offset": 0,
"length": 4,
"options": [
{
"text": "soff",
"score": 0.6666666,
"freq": 298
},
{
"text": "sol",
"score": 0.6666666,
"freq": 101
},
{
"text": "saf",
"score": 0.6666666,
"freq": 6
}
]
}
]
}
As you can see, the text being corrected is "sof", not "sofa", so the correction is for "sof" rather than "sofa". I therefore suspect the issue is related to the analyzer you were using on this field, especially since the results show "soff" instead of "soffa": the analyzer appears to be removing the trailing "a".
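The asker's hypothesis, that a missing character is treated differently from a misspelling, can be checked against edit distance. The term suggester's default string_distance ("internal") is documented as being based on Damerau-Levenshtein, and its default max_edits is 2. The Python below is a textbook optimal string alignment distance, not Lucene's implementation. It gives a missing letter and a swapped pair of letters the same distance of 1, which supports the point above that the analyzer, not the kind of typo, is the likely cause.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            # Adjacent transposition, e.g. "vl" <-> "lv".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


for typo, word in [("sofa", "soffa"), ("vovlo", "volvo"), ("sof", "soff")]:
    print(typo, word, osa_distance(typo, word))
# sofa soffa 1
# vovlo volvo 1
# sof soff 1
```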

Elasticsearch - How does one combine term suggestions from multiple fields?

The term suggester documentation lays out the basics of the term suggester, but it leaves me wondering how I can get suggestions from multiple fields and combine them. I can probably come up with some implementation after the fact, but I'm wondering whether there are settings I'm missing.
For example, let's say I want to get suggestions from three different fields:
GET product-search-product/_search
{
"suggest": {
"text": "som typu here",
"my-suggest-1": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_one"
}
},
"my-suggest-2": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_two"
}
},
"my-suggest-3": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_three"
}
}
}
}
This returns results I can use, but I have to figure out which field had the "best" suggestion.
"suggest": {
"my-suggest-1": [
{
"text": "som",
...
"options": [
{
"text": "somi"
...
}
]
},
{
"text": "typu",
...
"options": [
{
"text": "typo"
...
}
]
},
{
"text": "here",
...
"options": []
}
],
"my-suggest-2": [
{
"text": "som",
...
"options": [
{
"text": "some"
...
}
]
},
{
"text": "typu",
...
"options": []
},
{
"text": "here",
...
"options": []
}
],
"my-suggest-3": [
{
"text": "som",
...
"options": []
},
{
"text": "typu",
...
"options": [
{
"text": "typa"
...
}
]
},
{
"text": "here",
...
"options": []
}
]
}
It looks to me as if I have to implement something to determine which field came up with the best suggestions. Is there no way to combine these in the suggester so it can do that for me?
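If you do want to combine the three term suggesters after the fact, a minimal client-side sketch in Python can pick, for each input token, the highest-scoring option across all suggesters. It assumes that every suggester splits the text into the same tokens in the same order, which holds only if the fields share an analyzer. Ranking by score and then freq is an arbitrary choice. The scores and frequencies in the sample are hypothetical because the response above elides them.

```python
def merge_term_suggestions(suggest, names):
    """Pick, per input token, the best option across several term suggesters.

    `suggest` is the "suggest" object of a search response.
    Assumes all named suggesters return entries for the same tokens in order.
    """
    merged = []
    for i, entry in enumerate(suggest[names[0]]):
        options = []
        for name in names:
            options.extend(suggest[name][i]["options"])
        if options:
            best = max(options, key=lambda o: (o.get("score", 0.0), o.get("freq", 0)))
            merged.append(best["text"])
        else:
            merged.append(entry["text"])  # no suggestion: keep the original token
    return " ".join(merged)


# Hypothetical scores/freqs; the real response elides them with "...".
sample = {
    "my-suggest-1": [
        {"text": "som", "options": [{"text": "somi", "score": 0.66, "freq": 3}]},
        {"text": "typu", "options": [{"text": "typo", "score": 0.75, "freq": 10}]},
        {"text": "here", "options": []},
    ],
    "my-suggest-2": [
        {"text": "som", "options": [{"text": "some", "score": 0.66, "freq": 40}]},
        {"text": "typu", "options": []},
        {"text": "here", "options": []},
    ],
    "my-suggest-3": [
        {"text": "som", "options": []},
        {"text": "typu", "options": [{"text": "typa", "score": 0.75, "freq": 2}]},
        {"text": "here", "options": []},
    ],
}
print(merge_term_suggestions(sample, ["my-suggest-1", "my-suggest-2", "my-suggest-3"]))
# some typo here
```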
The phrase suggester turned out to be appropriate for my case: it supports multiple candidate generators (each direct_generator takes its own field), which appears to solve my problem.
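As a sketch of that approach, the Python below builds a phrase suggester request body with one direct_generator per field. The suggestion name "combined" and the choice of field_one as the language-model field (the phrase suggester's own "field") are hypothetical. Only the structure, a list of direct_generator objects each with its own field, follows the phrase suggester's request format.

```python
import json


def phrase_suggest_body(text, lm_field, generator_fields, size=1):
    """Build a phrase suggester request with one direct generator per field."""
    return {
        "suggest": {
            "text": text,
            "combined": {  # hypothetical suggestion name
                "phrase": {
                    "field": lm_field,  # field used for the n-gram language model
                    "size": size,
                    "direct_generator": [
                        {"field": f, "suggest_mode": "always"} for f in generator_fields
                    ],
                }
            },
        }
    }


body = phrase_suggest_body(
    "som typu here", "field_one", ["field_one", "field_two", "field_three"]
)
print(json.dumps(body, indent=2))
```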

Elasticsearch: Why can't I use "5m" for precision in context queries?

I'm running Elasticsearch 5.5.
I have an index with the following mapping:
"mappings": {
"shops": {
"properties": {
"locations": {
"type": "geo_point"
},
"name": {
"type": "keyword"
},
"suggest": {
"type": "completion",
"contexts": [
{
"name": "location",
"type": "GEO",
"precision": "10m",
"path": "locations"
}
]
}
}
}
}
I add a document as follows:
POST my_index/shops
{
"name":"random shop",
"suggest":{
"input":"random shop"
},
"locations":[
{
"lat":42.38471212,
"lon":-71.12612357
}
]
}
I try to query for the document with the following JSON call:
GET my_index/_search
{
"suggest": {
"result": {
"prefix": "random",
"completion": {
"field": "suggest",
"size": 5,
"fuzzy": true,
"contexts": {
"location": [{
"lat": 42.38471212,
"lon": -71.12612357,
"precision": "10mi"
}]
}
}
}
}
}
This query returns a context error.
But when I change the "precision" value to an integer, I get the intended search results.
I'm confused on two fronts.
Why is there a context error? The documentation seems to say that this is OK:
https://www.elastic.co/guide/en/elasticsearch/reference/5.5/suggester-context.html
Why can't I use string values for the precision values?
At the bottom of the page, I see that the precision values can take either distances or numeric values.
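Note that the mapping uses "10m" (10 metres) while the query uses "10mi" (10 miles). The documentation linked above says precision can be a distance or a raw geohash level from 1 to 12. The Python sketch below converts both strings to metres and maps them to an approximate geohash level using the cell sizes listed in the geohash_grid aggregation docs. The unit table is a subset of Elasticsearch's distance units. The level heuristic, "finest level whose cell is at least that wide", is an assumption and not Elasticsearch's exact conversion.

```python
import re

# Subset of Elasticsearch distance units, converted to metres.
UNIT_TO_METERS = {
    "mi": 1609.344, "miles": 1609.344,
    "yd": 0.9144, "ft": 0.3048, "in": 0.0254,
    "km": 1000.0, "m": 1.0, "cm": 0.01, "mm": 0.001,
    "nmi": 1852.0, "NM": 1852.0,
}

# Approximate geohash cell widths in metres for levels 1..12.
CELL_WIDTH_M = [5009.4e3, 1252.3e3, 156.5e3, 39.1e3, 4.9e3, 1.2e3,
                152.9, 38.2, 4.8, 1.2, 0.149, 0.037]


def parse_distance(value: str) -> float:
    """Parse strings like '10m' or '10mi' into metres."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", value)
    if not match or match.group(2) not in UNIT_TO_METERS:
        raise ValueError(f"unsupported distance: {value!r}")
    return float(match.group(1)) * UNIT_TO_METERS[match.group(2)]


def approx_geohash_level(meters: float) -> int:
    """Finest level whose cell is still at least `meters` wide (heuristic)."""
    level = 1
    for i, width in enumerate(CELL_WIDTH_M, start=1):
        if width >= meters:
            level = i
    return level


for p in ["10m", "10mi"]:
    metres = parse_distance(p)
    print(p, f"{metres:.2f}", approx_geohash_level(metres))
# 10m 10.00 8
# 10mi 16093.44 4
```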

Mappings on a field in Elasticsearch

I am using Elasticsearch for autocompletion and also to correct spelling mistakes. I have this mapping for my field (for autocompletion):
**Mapping:**
"name": {
"type": "text",
"analyzer": "autocomplete"
}
Now I want to implement the phrase suggester on this field. When I use it, it gives wrong results. I think that's because of the existing mapping.
**POST XYZ/_search**
{
"suggest": {
"text": "ipone 16",
"simple_phrase": {
"phrase": {
"field": "name",
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
**Results:**
"options": [
{
"text": "i ip ipo iphon iphone 1 16",
"highlighted": "i ip ipo <em>iphon iphone</em> 1 16",
"score": 1.6111489e-8
},
{
"text": "i ip ipo iphon iphon 1 16",
"highlighted": "i ip ipo <em>iphon iphon</em> 1 16",
"score": 1.4219211e-8
},
{
"text": "i ip ipo ipho iphone 1 16",
"highlighted": "i ip ipo <em>ipho iphone</em> 1 16",
"score": 1.3510152e-8
},
{
"text": "i ip ipo ipho iphon 1 16",
"highlighted": "i ip ipo <em>ipho iphon</em> 1 16",
"score": 1.1923397e-8
},
{
"text": "i ip ipo iron iphone 1 16",
"highlighted": "i ip ipo <em>iron iphone</em> 1 16",
"score": 6.443544e-9
}
]
**From the documentation, I should use this for the phrase suggester:**
"mappings": {
"test": {
"properties": {
"title": {
"type": "text",
"fields": {
"trigram": {
"type": "text",
"analyzer": "trigram"
},
"reverse": {
"type": "text",
"analyzer": "reverse"
}
}
}
}
}
}
**How can I use two different mappings on the same field?**
Since your results are not tokenized properly, the problem could be your autocomplete analyzer. Please provide your _settings so we can see the definitions of your analyzers.
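The autocomplete analyzer definition is not shown, so the following is a guess. If it uses an edge n-gram filter with min_gram 1, then "ipone 16" is analyzed into every prefix of each word. That matches the token pattern in the suggestions above, where only the prefixes "ipon" and "ipone" were corrected to "iphon" and "iphone". The min_gram/max_gram values and the whitespace tokenization in this Python sketch are assumptions.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """All prefixes of `token` with lengths from min_gram to max_gram."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(text, min_gram=1, max_gram=20):
    """Hypothetical autocomplete analyzer: lowercase, split on spaces, edge n-grams."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word, min_gram, max_gram))
    return tokens


print(" ".join(analyze("ipone 16")))  # i ip ipo ipon ipone 1 16
```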
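Run your query on name.trigram. To have two analyzers on the same field, use multi-fields: keep name with the autocomplete analyzer and add a trigram subfield under "fields", as in the mapping from the documentation above.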
After solving this problem, it's good to prune your results using collate. You can write a query like the one below; please provide the output for this query.
It would also help to see your trigram analyzer settings (tokenizer, char filters and token filters).
{
"suggest": {
"text": "ipone 16",
"simple_phrase": {
"phrase": {
"field": "name.trigram",
"size": 1,
"gram_size": 3,
"direct_generator": [
{
"field": "name.trigram",
"suggest_mode": "always"
}
],
"collate": {
"query": {
"inline": {
"match": {
"name": "{{suggestion}}"
}
}
},
"prune": true
},
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
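With "prune": true, the phrase suggester does not drop suggestions whose collate query matches no documents. Instead it keeps every option and adds a boolean collate_match flag. A client can then filter on that flag, as in this minimal Python sketch; the option list is hypothetical.

```python
def matching_options(options):
    """Keep only phrase-suggester options whose collate query found documents."""
    return [o for o in options if o.get("collate_match", False)]


# Hypothetical options from a response produced with "prune": true.
opts = [
    {"text": "iphone 16", "collate_match": True},
    {"text": "iron 16", "collate_match": False},
]
print([o["text"] for o in matching_options(opts)])  # ['iphone 16']
```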

Elasticsearch unexpected results when sorting against deeply nested attributes

I'm trying to perform some sorting based on the attributes of a document's deeply nested children.
Let's say we have an index filled with publisher documents. A publisher has a collection of books, and
each book has a title, a published flag, and a collection of genre scores. A genre_score represents how well
a particular book matches a particular genre, or in this case a genre_id.
First, let's define some mappings (for simplicity, we will only be explicit about the nested types):
curl -XPUT 'localhost:9200/book_index' -d '
{
"mappings": {
"publisher": {
"properties": {
"books": {
"type": "nested",
"properties": {
"genre_scores": {
"type": "nested"
}
}
}
}
}
}
}'
Here are our two publishers:
curl -XPUT 'localhost:9200/book_index/publisher/1' -d '
{
"name": "Best Books Publishing",
"books": [
{
"name": "Published with medium genre_id of 1",
"published": true,
"genre_scores": [
{ "genre_id": 1, "score": 50 },
{ "genre_id": 2, "score": 15 }
]
}
]
}'
curl -XPUT 'localhost:9200/book_index/publisher/2' -d '
{
"name": "Puffin Publishers",
"books": [
{
"name": "Published book with low genre_id of 1",
"published": true,
"genre_scores": [
{ "genre_id": 1, "score": 10 },
{ "genre_id": 4, "score": 10 }
]
},
{
"name": "Unpublished book with high genre_id of 1",
"published": false,
"genre_scores": [
{ "genre_id": 1, "score": 100 },
{ "genre_id": 2, "score": 35 }
]
}
]
}'
And here is the final definition of our index & mappings...
curl -XGET 'localhost:9200/book_index/_mappings?pretty=true'
...
{
"book_index": {
"mappings": {
"publisher": {
"properties": {
"books": {
"type": "nested",
"properties": {
"genre_scores": {
"type": "nested",
"properties": {
"genre_id": {
"type": "long"
},
"score": {
"type": "long"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"published": {
"type": "boolean"
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
Now suppose we want to query for a list of publishers and have them sorted by how well their books perform in a particular genre. In other words, sort the publishers by the genre_scores.score of one of their books for the target genre_id.
We might write a search query like this...
curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
{
"size": 5,
"from": 0,
"sort": [
{
"books.genre_scores.score": {
"order": "desc",
"nested_path": "books.genre_scores",
"nested_filter": {
"term": {
"books.genre_scores.genre_id": 1
}
}
}
}
],
"_source":false,
"query": {
"nested": {
"path": "books",
"query": {
"bool": {
"must": []
}
},
"inner_hits": {
"size": 5,
"sort": []
}
}
}
}'
This correctly returns Puffin first (with a sort value of [100]) and Best Books second (with a sort value of [50]).
But suppose we only want to consider books for which published is true. This would change our expectation to have Best Books first (with a sort of [50]) and Puffin second (with a sort of [10]).
Let's update our nested_filter and query to the following...
curl -XGET 'localhost:9200/book_index/_search?pretty=true' -d '
{
"size": 5,
"from": 0,
"sort": [
{
"books.genre_scores.score": {
"order": "desc",
"nested_path": "books.genre_scores",
"nested_filter": {
"bool": {
"must": [
{
"term": {
"books.genre_scores.genre_id": 1
}
}, {
"term": {
"books.published": true
}
}
]
}
}
}
}
],
"_source": false,
"query": {
"nested": {
"path": "books",
"query": {
"term": {
"books.published": true
}
},
"inner_hits": {
"size": 5,
"sort": []
}
}
}
}'
Suddenly, the sort values for both publishers have become [-9223372036854775808].
Why does adding an additional term to our nested_filter in the top-level sort have this impact?
Can anyone provide some insight into why this behavior is happening? And, additionally, are there any viable solutions for the proposed query/sort?
This occurs in both ES 1.x and ES 5.
Thanks!
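The value -9223372036854775808 is -2^63, Java's Long.MIN_VALUE. Elasticsearch reports this as the sort key for a descending sort on a long field when a document has no value to sort on. That suggests the second nested_filter matched no books.genre_scores entries at all. The Python below is a simplified model of one plausible reading, which is an assumption and not confirmed in this thread: the filter is tested against each genre_scores entry, which has only genre_id and score, so a condition on books.published can never pass there. The "book" mode shows the result the question expects when the published condition is applied to the enclosing book. How to express that in the Elasticsearch query is not covered here.

```python
LONG_MIN = -(2 ** 63)  # Java's Long.MIN_VALUE: -9223372036854775808

BEST_BOOKS = {"books": [
    {"published": True, "genre_scores": [
        {"genre_id": 1, "score": 50}, {"genre_id": 2, "score": 15}]},
]}
PUFFIN = {"books": [
    {"published": True, "genre_scores": [
        {"genre_id": 1, "score": 10}, {"genre_id": 4, "score": 10}]},
    {"published": False, "genre_scores": [
        {"genre_id": 1, "score": 100}, {"genre_id": 2, "score": 35}]},
]}


def desc_sort_value(publisher, genre_id, published_check=None):
    """Max score for genre_id, or LONG_MIN when nothing matches (simplified model).

    published_check:
      None        -> no published condition
      "genre_doc" -> condition tested on each genre_scores entry, which has
                     no 'published' field, so it never passes
      "book"      -> condition tested on the enclosing book
    """
    best = None
    for book in publisher["books"]:
        if published_check == "book" and not book["published"]:
            continue
        for gs in book["genre_scores"]:
            if gs["genre_id"] != genre_id:
                continue
            if published_check == "genre_doc" and not gs.get("published", False):
                continue
            best = gs["score"] if best is None else max(best, gs["score"])
    return LONG_MIN if best is None else best


for mode in (None, "genre_doc", "book"):
    print(mode, desc_sort_value(PUFFIN, 1, mode), desc_sort_value(BEST_BOOKS, 1, mode))
# None 100 50
# genre_doc -9223372036854775808 -9223372036854775808
# book 10 50
```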
