Elasticsearch term suggester does not return results on exact match - elasticsearch

When I request the suggester with:
{
"my-title-suggestions-1": {
"text": "tücher ",
"term": {
"field": "name",
}
},
"my-title-suggestions-2": {
"text": "tüchers ",
"term": {
"field": "name"
}
}
}
it returns:
{
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"my-title-suggestions-1": [
{
"text": "tücher",
"offset": 0,
"length": 6,
"options": []
}
],
"my-title-suggestions-2": [
{
"text": "tüchers",
"offset": 0,
"length": 7,
"options": [
{
"text": "tücher",
"score": 0.8333333,
"freq": 6
}
]
}
]
}
I wonder why the first suggester does not return the exact match,
when the second suggester obviously has that result.
Can I add other options that will change this behavior?
Edit:
The minimal mapping is just this:
{
"name" : {
"analyzer" : "standard",
"type" : "string"
}
}

To add to what @ChintanShah25 said: according to https://www.elastic.co/guide/en/elasticsearch/reference/2.0/search-suggesters-term.html (see suggest_mode), the term suggester will by default:
Only provide suggestions for suggest text terms that are not in the index.
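If the goal is to get options back for a term that does exist in the index, suggest_mode can be set explicitly to "popular" or "always". The Python sketch below only builds the request body; the helper name `term_suggester` is hypothetical. As far as I can tell, even with "always" the suggester does not echo the input term back as an option; it returns other similar terms, so this changes which alternatives you get rather than producing the exact match itself.

```python
import json

# Values documented for the term suggester's suggest_mode option.
SUGGEST_MODES = ("missing", "popular", "always")


def term_suggester(name, text, field, suggest_mode="missing"):
    """Build one named term suggester (2.x-style _suggest body).

    "missing" is the default: a term that already exists in the index gets
    no options, which matches the empty result for "tücher".
    """
    if suggest_mode not in SUGGEST_MODES:
        raise ValueError(f"unknown suggest_mode: {suggest_mode!r}")
    return {
        name: {
            "text": text,
            "term": {"field": field, "suggest_mode": suggest_mode},
        }
    }


body = term_suggester("my-title-suggestions-1", "tücher", "name", "always")
print(json.dumps(body, ensure_ascii=False, indent=2))
```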

I don't think you can do that, and I'm not sure why you want an exact match in suggestions; after all, they are "suggestions".
Normally they are used to catch misspellings. The suggester gives you candidate terms that are similar to the word you entered and fall within an edit distance of 2 (the default max_edits).
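To make the "missing" default and the edit-distance limit concrete, here is a simplified Python model; it is not Lucene's implementation. Assumptions: an in-memory dict stands in for the index's term dictionary, "bücher" and its frequency are made up, prefix_length defaults to 1 as in the term suggester docs, and candidates are ordered by distance then frequency instead of by the real score.

```python
def osa_distance(a, b):
    """Edit distance with insertions, deletions, substitutions and adjacent
    transpositions (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def toy_term_suggest(term, term_freqs, max_edits=2, prefix_length=1):
    """Toy model of suggest_mode="missing" over a term -> frequency dict."""
    if term in term_freqs:  # term exists in the index: no suggestions
        return []
    hits = []
    for cand, freq in term_freqs.items():
        if cand[:prefix_length] != term[:prefix_length]:
            continue  # the first prefix_length characters must match
        dist = osa_distance(term, cand)
        if dist <= max_edits:
            hits.append((dist, -freq, cand, freq))
    return [(cand, freq) for _, _, cand, freq in sorted(hits)]


index = {"tücher": 6, "bücher": 4}  # hypothetical term dictionary
print(toy_term_suggest("tücher", index))
print(toy_term_suggest("tüchers", index))
print(toy_term_suggest("tüchers", index, prefix_length=0))
```

The first call mirrors "my-title-suggestions-1": the term is already indexed, so nothing comes back.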

Related

Elasticsearch Term suggester is not returning correct suggestions when one character is missing (instead of misspelling)

I'm using the Elasticsearch term suggester for spell correction. My index contains a huge list of ads, and each ad has subject and body fields. I've found a problematic example for which the suggester does not return the correct suggestion.
I have lots of ads whose subject contains the word "soffa" and also 5 ads whose subject contains the word "sofa". Ideally, when I send "sofa" (wrong spelling) as text to the suggester, it should return "soffa" (correct spelling) as a suggestion, since "soffa" is the correct spelling: most ads contain "soffa" and only a few contain "sofa".
Here is my suggester query body:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject",
"suggest_mode": "popular",
"min_word_length": 1
}
}
}
}
When I send the above query, I get the following response:
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
},
"suggest": {
"subjectSuggester": [
{
"text": "sof",
"offset": 0,
"length": 4,
"options": [
{
"text": "soff",
"score": 0.6666666,
"freq": 298
},
{
"text": "sol",
"score": 0.6666666,
"freq": 101
},
{
"text": "saf",
"score": 0.6666666,
"freq": 6
}
]
}
]
}
}
As you can see in the above response, it returned "soff" but not "soffa", although I have lots of docs whose subject contains "soffa".
I even played with parameters like suggest_mode and string_distance, but still no luck.
I also tried the phrase suggester instead of the term suggester, with the same result. Here is my phrase suggester query:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"phrase": {
"field": "subject",
"size": 10,
"gram_size": 3,
"direct_generator": [
{
"field": "subject.trigram",
"suggest_mode": "always",
"min_word_length":1
}
]
}
}
}
}
I suspect it doesn't work when one character is missing rather than misspelled: in the "soffa" example, one "f" is missing.
It works fine for misspellings, though; for example, it works for "vovlo":
when I send "vovlo", it gives me "volvo".
Any help would be hugely appreciated.
Try changing "string_distance":
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject",
"min_word_length":2,
"string_distance":"ngram"
}
}
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#term-suggester
I've found a workaround myself.
I added a shingle filter with max_shingle_size 3 (hence the name "trigram") and an analyzer that uses it, then added a subfield with that analyzer and ran the suggester query on that subfield instead of the actual field, and it worked.
Here are the mapping changes:
{
"settings": {
"analysis": {
"filter": {
"shingle": {
"type": "shingle",
"min_shingle_size": 2,
"max_shingle_size": 3
}
},
"analyzer": {
"trigram": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"shingle"
],
"char_filter": [
"diacritical_marks_filter"
]
}
}
}
},
"mappings": {
"properties": {
"subject": {
"type": "text",
"fields": {
"trigram": {
"type": "text",
"analyzer": "trigram"
}
}
}
}
}
}
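The mapping above uses a shingle token filter, which builds word-level n-grams (here two- and three-word shingles) rather than character n-grams; since output_unigrams defaults to true, single words are indexed as well. The Python sketch below is only a rough model of the "trigram" analyzer: the regex split approximates the standard tokenizer, and the diacritical_marks_filter char filter is ignored (its definition isn't shown in the snippet, so it presumably lives elsewhere in the settings). One plausible reason the workaround helps, given the "sof"/"soff" terms in the earlier response, is that this analyzer does no stemming, so "soffa" stays "soffa".

```python
import re


def toy_shingles(text, min_size=2, max_size=3, output_unigrams=True):
    """Rough model of standard tokenizer + lowercase + shingle filter."""
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])  # single word at this position
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(" ".join(tokens[i:i + size]))  # word shingle
    return out


print(toy_shingles("Blue Soffa for sale"))
```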
And here is my corrected query:
{
"suggest": {
"text": "sofa",
"subjectSuggester": {
"term": {
"field": "subject.trigram",
"suggest_mode": "popular",
"min_word_length": 1,
"string_distance": "ngram"
}
}
}
}
Note that I'm running the suggester against subject.trigram instead of subject itself.
Here is the result:
{
"suggest": {
"subjectSuggester": [
{
"text": "sofa",
"offset": 0,
"length": 4,
"options": [
{
"text": "soffa",
"score": 0.8,
"freq": 282
},
{
"text": "soffan",
"score": 0.6666666,
"freq": 5
},
{
"text": "som",
"score": 0.625,
"freq": 102
},
{
"text": "sol",
"score": 0.625,
"freq": 82
},
{
"text": "sony",
"score": 0.625,
"freq": 50
}
]
}
]
}
}
As you can see above, "soffa" appears as the first suggestion.
There is something weird in your term suggester result for the word "sofa". Take a look at the text that is being corrected:
"suggest": {
"subjectSuggester": [
{
"text": "sof",
"offset": 0,
"length": 4,
"options": [
{
"text": "soff",
"score": 0.6666666,
"freq": 298
},
{
"text": "sol",
"score": 0.6666666,
"freq": 101
},
{
"text": "saf",
"score": 0.6666666,
"freq": 6
}
]
}
]
}
As you can see, it's "sof" and not "sofa", which means the correction is not for "sofa" but for "sof". So I suspect this issue is related to the analyzer you were using on this field, especially looking at the results: "soff" instead of "soffa". The analyzer is removing the trailing "a".
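The scores in this thread fit a simple formula: 1 minus the edit distance divided by the length of the shorter of the two terms. That is my reading of the numbers, not a quote from the docs. It gives 1 - 1/3 for "sof" against "soff", "sol" and "saf" (0.6666666 above), 1 - 1/6 for "tüchers" against "tücher" (0.8333333) and 1 - 1/8 for "combine5" against "combined" (0.875). The Python sketch checks those numbers and shows that "soffa" is a single edit away from "sofa", so on an unstemmed field it should be within reach of the default suggester.

```python
def osa_distance(a, b):
    """Edit distance with insertions, deletions, substitutions and adjacent
    transpositions (optimal string alignment)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def internal_score(query, candidate):
    """Assumed formula: 1 - edits / length of the shorter term."""
    return 1.0 - osa_distance(query, candidate) / min(len(query), len(candidate))


for cand in ("soff", "sol", "saf"):
    print(cand, internal_score("sof", cand))
print("edits from sofa to soffa:", osa_distance("sofa", "soffa"))
```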

Elasticsearch - Show index-wide count for each returned result based from a given term

Firstly, I apologise if the terminology I use is incorrect; I am learning Elasticsearch day by day and may use incorrect phrases.
After spending several days pulling my hair out trying to figure this out, I seem to hit a brick wall every time.
I am trying to get Elasticsearch to provide a document count for each returned result. Here is an example:
{
"suggest": {
"text": "aberdeen",
"city": {
"completion": {
"field": "city_suggest",
"size": "2"
}
},
"street": {
"completion": {
"field": "street_suggest",
"size": "2"
}
}
},
"size": 0,
"aggs": {
"meta": {
"filter": {
"term": {
"city.raw": "aberdeen"
}
},
"aggs": {
"name": {
"terms": {
"field": "city.raw"
}
}
}
}
}
}
The above query returns the following results:
{
"took": 37,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1870535,
"max_score": 0,
"hits": []
},
"aggregations": {
"meta": {
"doc_count": 119196,
"name": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "Aberdeen",
"doc_count": 119196
}
]
}
}
},
"suggest": {
"city": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Aberdeen",
"score": 100
}
]
}
],
"street": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Davidson House, Aberdeen, AB15",
"score": 80
},
{
"text": "Bruce House, Aberdeen, AB15",
"score": 80
}
]
}
]
}
}
The result I am trying to achieve is an overall document count for each returned result. For example, the returned street address "Davidson House, Aberdeen, AB15" would say how many documents in the index match that address; this would be repeated for each result, and likewise for the city, similar to how the aggregated city currently shows the overall count:
{
"key": "Aberdeen",
"doc_count": 119196
}
Here is an example of something similar in production
The problem I believe I have with aggregations is that I do not know which values are going to be returned; otherwise I could predefine them in aggregations, as I did with the city, and request the overall count of each result that way.
To give an overall example, here is how I picture the results could look:
"suggest": {
"city": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Aberdeen",
"score": 100,
"total_addresses": 196152
}
]
}
],
"street": [
{
"text": "Aberdeen",
"offset": 0,
"length": 8,
"options": [
{
"text": "Davidson House, Aberdeen, AB15",
"score": 80,
"total_addresses": 158
},
{
"text": "Bruce House, Aberdeen, AB15",
"score": 80,
"total_addresses": 30
}
]
}
]
}
In terms of the Elasticsearch version, I have two dev servers running Elasticsearch 2.3 and 5.5 to see whether the newer version would make a difference. Unfortunately it didn't, so I have been using 2.3 rather than 5.5.
Any help or advice would be greatly appreciated. Thanks, all.
You need to split your query in two. First use the suggest API to gather suggestions, then run the aggregation on the results. The drawback of this solution is that you pair a very fast suggestion (less than a millisecond, if you're lucky) with a longer-running aggregation. If that's OK for you, this might be a good approach.
Another idea is to have your own suggestion index with pre-aggregated data that contains such a count; this index gets recreated regularly in the background.
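The first approach (suggest, then aggregate) could look like this in Python. It only builds the second request body from an already-parsed suggest response; the helper names are hypothetical, and sending the requests is left to whatever client you use. The street suggestions are combined address strings, so counting them would need a field that stores that exact string, which the question doesn't show; the sketch therefore uses city.raw.

```python
def suggestion_texts(response, suggester):
    """Collect the option texts returned by one suggester."""
    return [opt["text"]
            for entry in response["suggest"][suggester]
            for opt in entry["options"]]


def count_request(field, values):
    """Second request: count documents for each suggested value.

    `field` must hold exact (not_analyzed) values, like city.raw.
    """
    return {
        "size": 0,
        "query": {"terms": {field: values}},
        "aggs": {
            "per_value": {"terms": {"field": field, "size": max(len(values), 1)}}
        },
    }


# Trimmed copy of the suggest part of the response shown in the question.
suggest_response = {
    "suggest": {
        "city": [{"text": "Aberdeen", "offset": 0, "length": 8,
                  "options": [{"text": "Aberdeen", "score": 100}]}]
    }
}
cities = suggestion_texts(suggest_response, "city")
body = count_request("city.raw", cities)
```

Each bucket's doc_count in the second response would then be the per-suggestion total to merge back into the suggestions on the client.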

ES suggest query in Elasticsearch 5.0.1

I have a question: I want to search for results using a suggester.
My documents look like this:
{
"name": {
"input": [
"user1"
]
},
"usertype": 1
}
{
"name": {
"input": [
"user2"
]
},
"usertype": 2
}
I want to search the data with a suggester, using a query like this:
{
"suggest": {
"person_suggest": {
"text": "us",
"completion": {
"field": "name"
}
}
}
}
And the result looks like this:
{
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"person_suggest": [
{
"text": "us",
"offset": 0,
"length": 2,
"options": [
{
"text": "user1",
"usertype": 1,
"score": 1
},
{
"text": "user2",
"usertype": 2,
"score": 1
}
]
}
]
}
But I only want results where usertype = 1, like adding a WHERE condition in MySQL. Can anybody help me? I want a DSL query. Thanks a lot.
You can't filter in completion suggest queries. One solution is to make a separate completion field for each usertype, or to use standard queries with nGram analyzers.
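A minimal Python sketch of the first suggestion (one completion field per usertype). It only builds the mapping properties, documents and query as dicts; the field names name_suggest_1 and name_suggest_2 are hypothetical. Separately, the 5.x completion suggester documentation also describes category contexts, which may be worth checking as a built-in way to restrict suggestions.

```python
USERTYPES = (1, 2)


def suggest_field(usertype):
    """Hypothetical naming scheme: one completion field per usertype."""
    return f"name_suggest_{usertype}"


def mapping_properties():
    """Properties block for the type mapping; each field is a completion."""
    props = {"usertype": {"type": "integer"}}
    for t in USERTYPES:
        props[suggest_field(t)] = {"type": "completion"}
    return {"properties": props}


def suggest_doc(name, usertype):
    """Put the suggestion input only into the field for this usertype."""
    return {"usertype": usertype, suggest_field(usertype): {"input": [name]}}


def suggest_query(text, usertype):
    """Querying a single field acts like WHERE usertype = N."""
    return {"suggest": {"person_suggest": {
        "text": text,
        "completion": {"field": suggest_field(usertype)},
    }}}
```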

Why do the Elasticsearch suggesters return multiple equal objects?

I'm currently playing with suggesters and wonder why the result set always contains multiple equal objects.
Example request:
{"suggest": {
"test" : {
"text": "holz",
"term" : {
"field":"title"
}
}
}}
Result:
{"suggest": {
"test": [
{
"text": "holz",
"offset": 0,
"length": 4,
"options": [...]
},
{
"text": "holz",
"offset": 0,
"length": 4,
"options": [...]
},
{
"text": "holz",
"offset": 0,
"length": 4,
"options": [...]
},
{
"text": "holz",
"offset": 0,
"length": 4,
"options": [...]
}
]
}}
Even the objects in options are exactly the same. It's always the same, no matter what text I want suggestions for. Is there any explanation for this?
ES version is 2.3.4
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-completion.html#skip_duplicates
You have to add the skip_duplicates parameter.
Have a nice day,
Daniel
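Note that skip_duplicates is documented on the completion suggester page linked above, while the question uses a term suggester on 2.3.4, so that parameter may not apply here. If the duplicates can't be avoided server-side, a client-side workaround is to drop identical entries after parsing the response. A minimal Python sketch (the helper name is hypothetical):

```python
import json


def dedupe_entries(entries):
    """Keep the first of any entries that are exactly equal (text, offset,
    length and options), preserving order."""
    seen = set()
    unique = []
    for entry in entries:
        key = json.dumps(entry, sort_keys=True)  # hashable, key-order independent
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


entry = {"text": "holz", "offset": 0, "length": 4, "options": []}
print(dedupe_entries([dict(entry) for _ in range(4)]))
```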
Have you tried adding payloads to your docs?
https://www.elastic.co/guide/en/elasticsearch/reference/2.1/search-suggesters-completion.html
curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
"name" : "Nevermind",
"suggest" : {
"input": [ "Nevermind", "Nirvana" ],
"output": "Nirvana - Nevermind",
"payload" : { "artistId" : 2321 },
"weight" : 34
}
}'

ElasticSearch: Attempting to get spelling suggestion on proper names

Before I begin, let me just say that I'm no ElasticSearch expert, but I am currently tasked with tweaking some analyzers to get spelling suggestions working better in a couple of different situations. I've seen examples of people who are doing spelling suggestions on proper names, so I know it must be possible, but I've been at this for a couple days now, and I must be missing something, because ElasticSearch doesn't seem to recognize the name I'm looking for. Can you please help me figure this out? Thanks in advance!
Here's the analyzer I'm using for index as well as search:
"full_text": {
"filter": [
"lowercase",
"asciifolding"
],
"type": "custom",
"tokenizer": "keyword"
},
The following query should demonstrate that the field is tokenized into one long keyword, which is what I want.
{
"query": {
"match": {
"_all": "combine 5"
}
},
"script_fields": {
"terms" : {
"script": "doc[field].values",
"params": {
"field": "my_field"
}
}
}
}
...and it outputs something like this, which shows how the field is being tokenized. Looks good:
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 75,
"max_score": 0.58574116,
"hits": [
{
"_index": "my_index",
"_type": "thing",
"_id": "1",
"_score": 0.58574116,
"fields": {
"terms": [
[
"combine 5"
]
]
}
}
]
}
... but when I run a suggest query, it doesn't suggest the field's value, even though it's off by just a space.
{
"query": {
"match": {
"_all": "combine 5"
}
},
"suggest": {
"suggest-0": {
"term": {
"field": "_all",
"size": 5
},
"text": "combine5"
}
}
}
Which returns a bunch of documents and this suggestion:
"suggest": {
"suggest-0": [
{
"text": "combine5",
"offset": 0,
"length": 8,
"options": [
{
"text": "combined",
"score": 0.875,
"freq": 15
},
{
"text": "combine",
"score": 0.85714287,
"freq": 17
}
]
}
]
}
Note that if I point the spelling suggester at the field that contains the text, it does suggest it, but not when I'm using _all. Is there a way to get the terms of a specific field suggested when suggesting against _all?
I'm not sure this is exactly the answer I was looking for, but I ended up solving this by adding a field to the document containing the keyword value I was looking for ("combine5"). Now it is registered as a term, so if I suggest on that field or on _all, the word is suggested. It's also found by queries against _all.
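As I understand it, _all is analyzed with the index's default analyzer (standard, unless configured otherwise), not with the full_text analyzer of the individual field, so _all contains the separate terms "combine" and "5" but never the single term "combine 5". The term suggester only proposes terms that exist in the field it is pointed at, which would explain the "combined"/"combine" options. The toy Python analyzers below illustrate the difference; they are rough approximations (NFKD accent stripping stands in for asciifolding, and a regex split stands in for the standard tokenizer).

```python
import re
import unicodedata


def keyword_analyzer(text):
    """Approximates full_text: keyword tokenizer + lowercase + asciifolding."""
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(c for c in decomposed if not unicodedata.combining(c))
    return [folded.lower()]  # the whole value stays one token


def standard_like_analyzer(text):
    """Rough stand-in for the standard analyzer used by _all by default."""
    return [t.lower() for t in re.findall(r"\w+", text)]


print(keyword_analyzer("Combine 5"))
print(standard_like_analyzer("Combine 5"))
```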
