Elasticsearch suggest action doesn't return all possible suggestions

The Elasticsearch suggest API doesn't return what I expected. I suspect a bad parameter, but I can't see which one.
If the body field contains a word like "TOURNAMENT" and I run the suggest API for "TOURNAME", I get "TOURNAMENT" as a suggestion. But if I run the same request for "TOURNA", I don't get "TOURNAMENT" as a suggestion.
EXPECTED BEHAVIOR:
{
"size": 0,
"suggest" : {
"my-sugg": {
"text" : "TOURNAME",
"term": {
"field" : "body",
"min_word_length" : "3"
}
}
}
}
I get the expected result:
"my-sugg": [
{
"text": "tourname",
"offset": 0,
"length": 8,
"options": [
{
"text": "tournament",
"score": 0.75,
"freq": 60
}
]
}
]
UNEXPECTED BEHAVIOR:
If I run exactly the same request as above, but with
"my-sugg": {
"text" : "TOURNA",
I get:
"suggest": {
"my-sugg": [
{
"text": "tourna",
"offset": 0,
"length": 6,
"options": [
{
"text": "tour’s",
"score": 0.6666666,
"freq": 5
},
{
"text": "tour's",
"score": 0.6666666,
"freq": 3
},
{
"text": "tour",
"score": 0.5,
"freq": 34
},
{
"text": "turn",
"score": 0.5,
"freq": 2
}
]
}
]
This is the wrong result. I would like to get "tournament" as one of the suggestions. Any clue?

I think you might be mixing up the term suggester with the completion suggester.
The first gives you "did you mean" functionality by suggesting similar terms, whereas the second lets you implement search-as-you-type functionality. I think the completion suggester is what you are after here.
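One way to see why the term suggester drops "tournament" for the shorter input: it only considers candidates within a small edit distance (its max_edits option accepts 1 or 2), and "tourna" is four insertions away from "tournament". A minimal Python sketch of that arithmetic, using a plain Levenshtein implementation as a stand-in for Elasticsearch's internal distance measure:

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # largest value the term suggester accepts for max_edits

for typed in ("tourname", "tourna"):
    distance = edit_distance(typed, "tournament")
    verdict = "candidate" if distance <= MAX_EDITS else "too far"
    print(typed, distance, verdict)
```

"tourname" is 2 edits away and stays within range, while "tourna" is 4 edits away, so a prefix-oriented suggester is probably the better fit for that input.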

Related

Elasticsearch Term suggester is not returning correct suggestions when one character is missing (instead of misspelling)

I'm using the Elasticsearch term suggester for spell correction. My index contains a huge list of ads. Each ad has subject and body fields. I've found a problematic example for which the suggester does not return the correct suggestions.
I have lots of ads whose subject contains the word "soffa" and only 5 ads whose subject contains the word "sofa". Ideally, when I send "sofa" (wrong spelling) as the text to the suggester, it should return "soffa" (correct spelling) as a suggestion, since most ads contain "soffa" and only a few contain "sofa".
Here is my suggester query body:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject",
"suggest_mode": "popular",
"min_word_length": 1
}
}
}
}
When I send the query above, I get the following response:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"suggest": {
"subjectSuggester": [
{
"text": "sof",
"offset": 0,
"length": 4,
"options": [
{
"text": "soff",
"score": 0.6666666,
"freq": 298
},
{
"text": "sol",
"score": 0.6666666,
"freq": 101
},
{
"text": "saf",
"score": 0.6666666,
"freq": 6
}
]
}
]
}
}
As you can see in the response above, it returned "soff" but not "soffa", although I have lots of docs whose subject contains "soffa".
I also played with parameters like suggest_mode and string_distance, but still no luck.
I also tried the phrase suggester instead of the term suggester, with the same result. Here is my phrase suggester query:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"phrase": {
"field": "subject",
"size": 10,
"gram_size": 3,
"direct_generator": [
{
"field": "subject.trigram",
"suggest_mode": "always",
"min_word_length":1
}
]
}
}
}
}
I suspect it doesn't work when a character is missing rather than misspelled. In the "soffa" example, one "f" is missing,
while it works fine for misspellings such as "vovlo":
when I send "vovlo", it gives me "volvo".
Any help would be hugely appreciated.
Try changing the "string_distance".
{
"suggest": {
"text": "sof",
"subjectSuggester": {
"term": {
"field": "title",
"min_word_length":2,
"string_distance":"ngram"
}
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#term-suggester
I found a workaround myself.
I added a shingle filter with max_shingle_size 3 (i.e. up to trigrams) and a custom analyzer (trigram) that uses it, then added a subfield with that analyzer and ran the suggester query on that subfield instead of the actual field, and it worked.
Here are the mapping changes:
{
"settings": {
"analysis": {
"filter": {
"shingle": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3
}
},
"analyzer": {
"trigram": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"shingle"
],
"char_filter": [
"diacritical_marks_filter"
]
}
}
}
},
"mappings": {
"properties": {
"subject": {
"type": "text",
"fields": {
"trigram": {
"type": "text",
"analyzer": "trigram"
}
}
}
}
}
}
And here is my corrected query:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject.trigram",
"suggest_mode": "popular",
"min_word_length": 1,
"string_distance": "ngram"
}
}
}
}
Note that I'm running the suggester against subject.trigram instead of subject itself.
Here is the result:
{
"suggest": {
"subjectSuggester": [
{
"text": "sofa",
"offset": 0,
"length": 4,
"options": [
{
"text": "soffa",
"score": 0.8,
"freq": 282
},
{
"text": "soffan",
"score": 0.6666666,
"freq": 5
},
{
"text": "som",
"score": 0.625,
"freq": 102
},
{
"text": "sol",
"score": 0.625,
"freq": 82
},
{
"text": "sony",
"score": 0.625,
"freq": 50
}
]
}
]
}
}
As you can see above, soffa appears as the first suggestion.
There is something odd in your term suggester result for the word sofa. Take a look at the text that is being corrected:
"suggest": {
"subjectSuggester": [
{
"text": "sof",
"offset": 0,
"length": 4,
"options": [
{
"text": "soff",
"score": 0.6666666,
"freq": 298
},
{
"text": "sol",
"score": 0.6666666,
"freq": 101
},
{
"text": "saf",
"score": 0.6666666,
"freq": 6
}
]
}
]
}
As you can see, it's sof and not sofa, which means the correction is computed for sof rather than sofa. So I suspect this issue is related to the analyzer used on this field, especially since the results contain soff instead of soffa: it is removing the trailing a.
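To check what the analyzer does to the input, you can call the _analyze API with the field name and the text. A minimal Python sketch using only the standard library; the base URL and the index name ads are assumptions, not taken from the question:

```python
import json
from urllib import request


def analyze_request(base_url: str, index: str, field: str, text: str) -> request.Request:
    """Build a POST <index>/_analyze request that runs `text` through the analyzer of `field`."""
    body = json.dumps({"field": field, "text": text}).encode("utf-8")
    return request.Request(
        f"{base_url}/{index}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical cluster address and index name; adjust to your setup.
req = analyze_request("http://localhost:9200", "ads", "subject", "sofa")

# Sending it needs a running cluster, so it is left commented out here:
# with request.urlopen(req) as resp:
#     tokens = [t["token"] for t in json.load(resp)["tokens"]]
#     print(tokens)
```

If the returned token is sof rather than sofa, the field's analyzer is stemming the word before the suggester ever sees it.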

Elasticsearch - How does one combine term suggestions from multiple fields?

The term suggester documentation lays out the basics of the term suggester, but it leaves me wondering how I can find suggestions from multiple fields and combine them. I can probably come up with some implementation after the fact, but I'm wondering if there are some settings I'm missing.
For example, let's say I want to get suggestions from three different fields:
GET product-search-product/_search
{
"suggest": {
"text": "som typu here",
"my-suggest-1": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_one"
}
},
"my-suggest-2": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_two"
}
},
"my-suggest-3": {
"term": {
"size": 1,
"max_edits": 1,
"prefix_length": 3,
"field": "field_three"
}
}
}
}
This returns results I can use, but I have to figure out which field had the "best" suggestion:
"suggest": {
"my-suggest-1": [
{
"text": "som",
...
"options": [
{
"text": "somi"
...
}
]
},
{
"text": "typu",
...
"options": [
{
"text": "typo"
...
}
]
},
{
"text": "here",
...
"options": []
}
],
"my-suggest-2": [
{
"text": "som",
...
"options": [
{
"text": "some"
...
}
]
},
{
"text": "typu",
...
"options": []
},
{
"text": "here",
...
"options": []
}
],
"my-suggest-3": [
{
"text": "som",
...
"options": []
},
{
"text": "typu",
...
"options": [
{
"text": "typa"
...
}
]
},
{
"text": "here",
...
"options": []
}
]
}
It looks to me as if I have to implement something myself to determine which field came up with the best suggestions. Is there no way to combine these in the suggester so it can do that for me?
The phrase suggester turned out to be appropriate for my case: it supports candidate generators, which appear to solve my problem.
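If the term suggesters stay separate, the merge can also be done client-side. A minimal Python sketch, assuming each option carries the score and freq fields that term suggester responses normally include (they are elided in the output above); the helper name best_per_token and the numbers in the sample are made up for illustration:

```python
from typing import Any, Dict, List, Optional


def best_per_token(
    suggest: Dict[str, List[Dict[str, Any]]]
) -> Dict[str, Optional[Dict[str, Any]]]:
    """For every input token, keep the highest-scoring option across all suggesters."""
    best: Dict[str, Optional[Dict[str, Any]]] = {}
    for name, entries in suggest.items():
        for entry in entries:
            token = entry["text"]
            best.setdefault(token, None)  # tokens without options map to None
            for option in entry["options"]:
                candidate = {**option, "suggester": name}
                current = best[token]
                # Rank by score first, then by document frequency as a tie-breaker.
                rank = (candidate.get("score", 0.0), candidate.get("freq", 0))
                if current is None or rank > (current.get("score", 0.0), current.get("freq", 0)):
                    best[token] = candidate
    return best


# Illustrative response fragment; scores and frequencies are invented.
sample = {
    "my-suggest-1": [{"text": "som", "options": [{"text": "somi", "score": 0.5, "freq": 2}]}],
    "my-suggest-2": [
        {"text": "som", "options": [{"text": "some", "score": 0.75, "freq": 10}]},
        {"text": "here", "options": []},
    ],
}
merged = best_per_token(sample)
print(merged["som"]["text"], merged["som"]["suggester"])
```

Comparing scores across fields is a simplification; whether that ranking is good enough depends on your data.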

Elasticsearch - Show index-wide count for each returned result based from a given term

Firstly, I apologise if the terminology I use is incorrect; I am learning Elasticsearch day by day and may use incorrect phrases.
After spending several days trying to figure this out, I keep hitting brick walls.
I am trying to get Elasticsearch to provide a document count for each returned result. I will provide an example below:
{
"suggest": {
"text": "aberdeen",
"city": {
"completion": {
"field": "city_suggest",
"size": "2"
}
},
"street": {
"completion": {
"field": "street_suggest",
"size": "2"
}
}
},
"size": 0,
"aggs": {
"meta": {
"filter": {
"term": {
"city.raw": "aberdeen"
}
},
"aggs": {
"name": {
"terms": {
"field": "city.raw"
}
}
}
}
}
}
The above query returns the following results:
{
"took": 37,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1870535,
"max_score": 0,
"hits": []
},
"aggregations": {
"meta": {
"doc_count": 119196,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Aberdeen",
"doc_count": 119196
}
]
}
}
},
"suggest": {
"city": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Aberdeen",
"score": 100
}
]
}
],
"street": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Davidson House, Aberdeen, AB15",
"score": 80
},
{
"text": "Bruce House, Aberdeen, AB15",
"score": 80
}
]
}
]
}
}
The result I am trying to achieve is an overall document count for each returned result. For example, the returned street address "Davidson House, Aberdeen, AB15" would say how many documents in the index match that address, and this would be repeated for each result. The same would apply to the city, similar to how the city aggregation currently shows the overall count:
{
"key": "Aberdeen",
"doc_count": 119196
}
Here is an example of something similar in production
The problem I believe I have with aggregations is that I do not know in advance which values will be returned. Otherwise I could predefine them in aggregations, as I did for the city, and request the overall count of each result that way.
To give an overall example, this is how I pictured the working results:
"suggest": {
"city": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Aberdeen",
"score": 100,
"total_addresses": 196152
}
]
}
],
"street": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Davidson House, Aberdeen, AB15",
"score": 80,
"total_addresses": 158
},
{
"text": "Bruce House, Aberdeen, AB15",
"score": 80,
"total_addresses": 30
}
]
}
]
}
In terms of the Elasticsearch version, I have two dev servers running Elasticsearch 2.3 and 5.5 to see if the newer version would make a difference. Unfortunately it did not, so I have been using 2.3 in favour of 5.5.
Any help or advice would be greatly appreciated, Thanks all.
You need to divide your query in two. First use the suggest API to gather suggestions, then run the aggregation on the result. The drawback of this solution is that you pair a very fast suggestion (less than a millisecond, if you're lucky) with a longer-running aggregation. If that's OK for you, this might be a good approach.
Another idea might be to maintain a separate suggestion index with pre-aggregated data that contains such a count; this index gets recreated regularly in the background.
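The two-step approach could be sketched roughly like this in Python. The helper names are made up, and the keyword field street.raw is an assumption modelled on the city.raw field from the question; the second request uses a filters aggregation with one named bucket per suggestion:

```python
from typing import Any, Dict, List


def suggestion_texts(response: Dict[str, Any], name: str) -> List[str]:
    """Collect the option texts returned by one named suggester."""
    return [
        option["text"]
        for entry in response["suggest"][name]
        for option in entry["options"]
    ]


def count_request(texts: List[str], field: str) -> Dict[str, Any]:
    """Build a search body with one filter bucket per suggestion text."""
    return {
        "size": 0,
        "aggs": {
            "per_suggestion": {
                "filters": {
                    # Each bucket is keyed by the suggestion text itself.
                    "filters": {text: {"term": {field: text}} for text in texts}
                }
            }
        },
    }


# Step 1 result, trimmed from the response shown in the question.
suggest_response = {
    "suggest": {
        "street": [
            {
                "text": "Aberdeen",
                "options": [
                    {"text": "Davidson House, Aberdeen, AB15", "score": 80},
                    {"text": "Bruce House, Aberdeen, AB15", "score": 80},
                ],
            }
        ]
    }
}

# Step 2: body for the follow-up search; "street.raw" is a hypothetical field.
body = count_request(suggestion_texts(suggest_response, "street"), "street.raw")
```

Each bucket in the response of the second request would then carry a doc_count for its suggestion, which can be attached to the option before it is shown to the user.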

Why does the elasticsearch suggesters return multiple equal objects?

I'm currently playing with suggesters and wonder why the result set always contains multiple equal objects.
Example request:
{"suggest": {
"test" : {
"text": "holz",
"term" : {
"field":"title"
}
}
}}
Result:
{"suggest": {
"test": [
{
"text": "holz",
"offset": 0,
"length": 4,
"options": [...]
},
{
"text": "holz",
"offset": 0,
"length": 4,
"options": [...]
},
{
"text": "holz",
"offset": 0,
"length": 4,
"options": [...]
},
{
"text": "holz",
"offset": 0,
"length": 4,
"options": [...]
}
]
}}
Even the objects in options are exactly the same. It's always the same, no matter what text I want suggestions for. Is there any explanation for this?
ES version is 2.3.4
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html#skip_duplicates
You have to add the skip_duplicates parameter.
Have a nice day,
Daniel
Have you tried adding payloads to your docs?
https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-suggesters-completion.html
curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
"name" : "Nevermind",
"suggest" : {
"input": [ "Nevermind", "Nirvana" ],
"output": "Nirvana - Nevermind",
"payload" : { "artistId" : 2321 },
"weight" : 34
}
}'

Elasticsearch term suggester does not return results on exact match

When I request the suggester with
{
"my-title-suggestions-1": {
"text": "tücher ",
"term": {
"field": "name",
}
},
"my-title-suggestions-2": {
"text": "tüchers ",
"term": {
"field": "name"
}
}
}
it returns
{
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"my-title-suggestions-1": [
{
"text": "tücher",
"offset": 0,
"length": 6,
"options": []
}
],
"my-title-suggestions-2": [
{
"text": "tüchers",
"offset": 0,
"length": 7,
"options": [
{
"text": "tücher",
"score": 0.8333333,
"freq": 6
}
]
}
]
}
I wonder why the first suggester does not return the exact match.
The second suggester obviously has that result.
Can I add other options to change this behavior?
Edit:
The minimal mapping is just this:
{
"name" : {
"analyzer" : "standard",
"type" : "string"
}
}
To add to what @ChintanShah25 said: according to https://www.elastic.co/guide/en/elasticsearch/reference/2.0/search-suggesters-term.html (see suggest_mode), the term suggester will by default:
Only provide suggestions for suggest text terms that are not in the index.
I don't think you can do that, and I am not sure why you want an exact match in suggestions; after all, they are "suggestions".
Normally they are used to catch misspellings. The suggester gives you candidate suggestions that are similar to the word you entered and fall within an edit distance of 2 of it.
