How to use phrase suggester results as part of a query - elasticsearch

Having spent ages reading the docs and various websites, I still don't understand how one is supposed to use the phrase suggester to influence the results of a query. My understanding was that when running the following query and suggester together, the results from the suggester would be used for the query.
POST test/test/_search
{
"query": {
"multi_match": {
"query": "anti-inefffective",
"fields": ["*#value"]
}
},
"highlight" : {
"fields" : {
"*#value" : {
"pre_tags" : ["<mark>"],
"post_tags" : ["</mark>"]
}
}
},
"suggest" : {
"text" : "anti-inefffective",
"simple_phrase" : {
"phrase" : {
"analyzer" : "default",
"field" : "_all",
"size" : 1,
"real_word_error_likelihood" : 0.95,
"max_errors" : 0.5,
"gram_size" : 2,
"direct_generator" : [ {
"field" : "_all",
"suggest_mode" : "always",
"min_word_length" : 1
} ],
"highlight": {
"pre_tag": "<em>",
"post_tag": "</em>"
}
}
}
}
}
How can I get the results of the suggester to be used as the query term, all within a single JSON request? All the examples I've seen have the phrase suggester executed after the query, which seems bizarre to me. The only way I can see to do this would be to run a phrase suggester request, extract the suggested value, add it programmatically to a query, and then run that query with the suggested text.
In other words, I would like to do what Google does: if you type "cancerous tummour", Google returns results for "cancerous tumour" and gives you the option to search for the original phrase instead, but the corrected phrase is used automatically for the query.

You should take a look at the collate+query option of the Phrase Suggester when used together with the confidence parameter.
The phrase suggester workflow looks like this:
1. Suggests candidate terms for cancerous and tummour based on the parameters passed to the candidate generator section.
2. Generates a number of 'mad-lib' phrase suggestions using the term candidates, combining the word-frequency of the phrase terms to generate a score for each suggestion.
3. With the collate/match option, actually runs each candidate inside a query template (defined by you, the query author) so that queries w/zero-results can be discarded.
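As a rough illustration of the collate step, here is a minimal Python sketch that builds such a request body as a plain dict. The helper name `build_phrase_suggest_body` and the field names `title.trigram` and `title` are assumptions for illustration only. The `source` key is what more recent versions use for the collate template; older versions used `inline` instead.

```python
import json


def build_phrase_suggest_body(text, suggest_field, match_field, confidence=1.0):
    """Build a phrase suggester body whose candidates are collated against a match query."""
    return {
        "suggest": {
            "text": text,
            "simple_phrase": {
                "phrase": {
                    "field": suggest_field,
                    "size": 1,                 # ask for the top-scoring suggestion only
                    "confidence": confidence,  # 1.0: suggestion must outscore the input
                    "collate": {
                        # Each candidate phrase is substituted for {{suggestion}}
                        # and executed as a query.
                        "query": {
                            "source": {
                                "match": {"{{field_name}}": "{{suggestion}}"}
                            }
                        },
                        "params": {"field_name": match_field},
                        # prune=False (the default): candidates that match no
                        # documents are left out of the response.
                        "prune": False,
                    },
                }
            },
        }
    }


if __name__ == "__main__":
    # Hypothetical field names; adjust to your own mapping.
    body = build_phrase_suggest_body("cancerous tummour", "title.trigram", "title")
    print(json.dumps(body, indent=2))
```

The resulting dict can be sent as the body of a `_search` request with whatever HTTP or Elasticsearch client you already use.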
To emulate the Google functionality you describe, when you run the user's query you'd also:
1. Use the phrase suggester with "size": 1 to generate the single top-scoring, collated (non-zero-results) phrase suggestion for the original user input query.
2. With the default "confidence": 1.0, the phrase suggester only returns a suggestion that it scores higher than the original user input query.
3. When you see the (higher-confidence) suggestion come back alongside the original query result, your client can decide to take the suggestion and execute the suggested query in place of the original query (while preserving the original query text to display as a fallback search option).
Short answer: There's no option to automatically use the top suggestion within Elasticsearch as the query text. But you could build that in your search client using the functionality currently provided by the phrase suggester.
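The client-side part could look something like the sketch below. It only inspects a search response that has already been parsed into a Python dict, so it does not depend on any particular client library. The function name `choose_query_text` is made up for this example, and the sample response (including its score) is invented to show the shape of the data, not real output.

```python
def choose_query_text(original_text, search_response, suggester_name="simple_phrase"):
    """Return (text_to_search, fallback_text) based on the suggest section.

    fallback_text is the user's original input when a suggestion replaces it,
    and None when the original input is kept.
    """
    entries = search_response.get("suggest", {}).get(suggester_name, [])
    for entry in entries:
        options = entry.get("options", [])
        if options:
            # Assumes options are returned best-first, so take the first one.
            return options[0]["text"], original_text
    return original_text, None


if __name__ == "__main__":
    # Invented sample response, for illustration only.
    response = {
        "suggest": {
            "simple_phrase": [
                {
                    "text": "cancerous tummour",
                    "offset": 0,
                    "length": 17,
                    "options": [{"text": "cancerous tumour", "score": 0.1}],
                }
            ]
        }
    }
    query_text, fallback = choose_query_text("cancerous tummour", response)
    print(query_text, "| fallback:", fallback)
```

If a suggestion comes back, the client would run a second search with `query_text` and show `fallback` as a "search instead for" link; otherwise it keeps the results it already has.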

Related

Filter on score after rescore in Elasticsearch

I have been on an internet manhunt for days for this and getting ready to give up. I need to filter on _score in Elasticsearch after the rescore function has completed. So given an example query like this:
POST /_search
{
"query" : {
"match" : {
"message" : {
"operator" : "or",
"query" : "the quick brown"
}
}
},
"rescore" : {
"window_size" : 50,
"query" : {
"rescore_query" : {
"match_phrase" : {
"message" : {
"query" : "the quick brown",
"slop" : 2
}
}
},
"query_weight" : 0.7,
"rescore_query_weight" : 1.2
}
}
}
Say just for simplicity's sake that the above returns 5 documents with scores ranging from 0.0 to 1.0. I want the final returned results set to only be the documents with a score above 0.90. In other words, take those newly-rescored docs, and hand them off to a filter where it drops all documents scored below 0.90.
I have tried many, many different ways but nothing is working. Post_filter is apparently meant to come after the main query but before rescore, so that one doesn't work. min_score does not work at all with rescore, it only works with the original ES scores from the main query. Aggs is one functionality that I am able to get to work after rescore, but aggregating is not what I need to do here. But at least it shows me that ES has the ability to continue operating on the data after a rescore query.
Any thoughts on how to get this seemingly simple task accomplished? I have also tried using function_score and script_score but really those are just ways to further modify the scores, whereas I need to filter on the scores generated by the rescore. The requirement here is to get it done in the query. We can't do it as a post-processing step.

Phrase suggester returns unexpected result when first letter is misspelled

I'm using the Elasticsearch Phrase Suggester to correct users' misspellings. Everything works as I expect unless the user enters a query whose first letter is misspelled. In that situation the phrase suggester returns nothing or returns unexpected results.
My query for suggestion:
{
"suggest": {
"text": "user_query",
"simple_phrase": {
"phrase": {
"field": "title.phrase",
"collate": {
"query": {
"inline" : {
"bool": {
"should": [
{ "match": {"title": "{{suggestion}}"}},
{ "match": {"participants": "{{suggestion}}"}}
]
}
}
}
}
}
}
}
}
Example when first letter is misspelled:
"simple_phrase" : [
{
"text" : "گاشانچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "گارانتی",
"score" : 0.00253151
}]
}
]
Example when fifth letter is misspelled:
"simple_phrase" : [
{
"text" : "کاشاوچی",
"offset" : 0,
"length" : 11,
"options" : [ {
"text" : "کاشانچی",
"score" : 0.1121
},
{
"text" : "کاشانجی",
"score" : 0.0021
},
{
"text" : "کاشنچی",
"score" : 0.0020
}]
}
]
I expect these two misspelled queries to produce the same suggestions (the ones I expect are those in the second example). What is wrong?
P.S.: I'm using this feature for the Persian language.
I have a solution for your problem; you only need to add a field to your schema.
P.S.: I don't have much expertise in Elasticsearch, but I have solved the same problem in Solr, and you can implement it the same way in Elasticsearch.
Create a new ngram field and copy all your titles into it.
When a query for a misspelled word returns an empty result, split the word into n-grams and run the query again; you should then get the expected results.
Example: suppose the user is searching for the word Akshay but types Skshay. Build the query as shown here (a Solr example; the same approach works in Elasticsearch):
(ngram:"skshay" OR ngram:"sk" OR ngram:"ks" OR ngram:"sh" OR ngram:"ha" OR ngram:"ay")
The word is split into consecutive character pairs, and the query runs against the ngram field.
Hope it helps.
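For what it's worth, the splitting step described in this answer can be sketched in a few lines of Python. The function names are made up for illustration, and the field name `ngram` is taken from the example query; the output is a Solr-style query string with the terms joined by OR.

```python
def char_ngrams(word, n=2):
    """Return the consecutive character n-grams of word, in order."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def build_ngram_query(word, field="ngram", n=2):
    """Build an OR query over the whole word and each of its n-grams."""
    word = word.lower()
    terms = [word] + char_ngrams(word, n)
    return "(" + " OR ".join(f'{field}:"{term}"' for term in terms) + ")"


if __name__ == "__main__":
    print(build_ngram_query("Skshay"))
```

Because the misspelled first letter only affects the first pair ("sk"), the remaining pairs still overlap with the correctly spelled word.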
From Elasticsearch doc:
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-suggesters-phrase.html
prefix_length
The number of minimal prefix characters that must match in order be a
candidate suggestions. Defaults to 1. Increasing this number improves
spellcheck performance. Usually misspellings don’t occur in the
beginning of terms. (Old name "prefix_len" is deprecated)
So by default phrase-suggester assumes that the first character is correct because the default value for prefix_length is 1.
Note: setting this value to 0 is not recommended, because it has performance implications.
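If you still want to experiment with it, `prefix_length` is set on the direct generator. A minimal sketch of such a request body as a Python dict follows; the helper name `build_suggest_body` is an assumption for illustration, and `title.phrase` is simply the field from the question.

```python
import json


def build_suggest_body(text, field, prefix_length=1):
    """Build a phrase suggester body with an explicit prefix_length on the generator."""
    return {
        "suggest": {
            "text": text,
            "simple_phrase": {
                "phrase": {
                    "field": field,
                    "direct_generator": [
                        {
                            "field": field,
                            "suggest_mode": "always",
                            # 0 lets candidates differ in the first character,
                            # at the cost of slower candidate generation.
                            "prefix_length": prefix_length,
                        }
                    ],
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_suggest_body("user_query", "title.phrase", prefix_length=0), indent=2))
```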
You need to use the reverse analyzer.
I explained it in my answer to this post:
Elasticsearch spell check suggestions even if first letter missed
And regarding the duplicates, you can use
skip_duplicates
Whether duplicate suggestions should be filtered out (defaults to
false).
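To make the reverse-analyzer idea a bit more concrete, here is a hedged sketch of the two pieces involved, written as Python dicts. It assumes you add a `title.reverse` sub-field indexed with a custom analyzer named `reverse`; both names, and the helper function names, are illustrative choices and not something from the question. The second direct generator reverses the input before lookup (`pre_filter`) and reverses the candidates back afterwards (`post_filter`), so a wrong first letter becomes a wrong last letter during candidate generation.

```python
def build_reverse_analysis_settings():
    """Index settings defining an analyzer that lowercases and reverses each token."""
    return {
        "analysis": {
            "analyzer": {
                "reverse": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "reverse"],
                }
            }
        }
    }


def build_reverse_suggest_body(text, field="title.phrase", reverse_field="title.reverse"):
    """Phrase suggester body with a normal generator plus a reversed-term generator."""
    return {
        "suggest": {
            "text": text,
            "simple_phrase": {
                "phrase": {
                    "field": field,
                    "direct_generator": [
                        {"field": field, "suggest_mode": "always"},
                        {
                            "field": reverse_field,
                            "suggest_mode": "always",
                            "pre_filter": "reverse",   # analyzer applied to input terms
                            "post_filter": "reverse",  # analyzer applied to candidates
                        },
                    ],
                }
            },
        }
    }
```

The settings go into the index creation request, and the `title.reverse` sub-field has to be mapped with `"analyzer": "reverse"` before any documents are indexed.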

How to run Elasticsearch completion suggester query on limited set of documents

I'm using a completion suggester in Elasticsearch on a single field. The type contains documents of several users. Is there a way to limit the returned suggestions to documents that match a specific query?
I'm currently using this query:
{
"name" : {
"text" : "Peter",
"completion" : {
"field" : "name_suggest"
}
}
}
Is there a way to combine this query with a different one, e.g.
{
"query":{
"term" : {
"user_id" : "590c5bd2819c3e225c990b48"
}
}
}
Have a look at the context suggester, which is just a specialized completion suggester with filtering capabilities - however this is still not a regular query filter, just keep that in mind.
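A rough sketch of what that could look like, as Python dicts: it assumes a version that supports the `contexts` syntax (5.x or later; older releases used a different syntax), and it assumes a category context named `user_id` that is filled from the document's `user_id` field. The helper names are invented for the example.

```python
def build_context_mapping_properties():
    """Mapping properties for a completion field with a category context."""
    return {
        "user_id": {"type": "keyword"},
        "name_suggest": {
            "type": "completion",
            "contexts": [
                # The context value is read from the document's user_id field.
                {"name": "user_id", "type": "category", "path": "user_id"}
            ],
        },
    }


def build_context_suggest_body(prefix, user_id):
    """Completion suggester request restricted to one user's context."""
    return {
        "suggest": {
            "name": {
                "prefix": prefix,
                "completion": {
                    "field": "name_suggest",
                    "contexts": {"user_id": [user_id]},
                },
            }
        }
    }


if __name__ == "__main__":
    print(build_context_suggest_body("Peter", "590c5bd2819c3e225c990b48"))
```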
You can specify both the query and the suggester in your query, like this:
{
"query":{
"term" : {
"user_id" : "590c5bd2819c3e225c990b48"
}
},
"suggest": {
"name" : {
"text" : "Peter",
"completion" : {
"field" : "name_suggest"
}
}
}
}
I have a similar use case, and I've posted my question on elastic search forum, see here
From what I've read so far, I don't think you can limit the documents with the completion suggester. It essentially builds a finite state transducer (similar to a prefix tree) at index time; this makes it fast, but you lose the flexibility of filtering on additional fields. I don't think the context suggester would work in your case (let me know if I am wrong), because the cardinality of user_id is very high.
I think edge-ngrams partial matching is more flexible and might actually work in your use case.
Let me know what you end up implementing.
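In case it helps, here is a small sketch of the edge-ngram idea in Python. `edge_ngrams` just shows which prefixes get indexed for a token. The settings and query dicts assume a sub-field called `name.autocomplete` and an analyzer called `autocomplete`; those names and the gram sizes are made up for this example. Since it is a regular query, the `user_id` restriction is an ordinary filter.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of token that an edge n-gram filter would emit."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


# Index-time analysis settings (illustrative names and sizes).
AUTOCOMPLETE_SETTINGS = {
    "analysis": {
        "filter": {
            "autocomplete_filter": {"type": "edge_ngram", "min_gram": 1, "max_gram": 20}
        },
        "analyzer": {
            "autocomplete": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "autocomplete_filter"],
            }
        },
    }
}


def build_filtered_prefix_query(text, user_id):
    """Match the typed prefix on the n-gram field, restricted to one user."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"name.autocomplete": text}},
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }


if __name__ == "__main__":
    print(edge_ngrams("peter"))
    print(build_filtered_prefix_query("Pet", "590c5bd2819c3e225c990b48"))
```

With this approach you would normally keep a plain analyzer (for example `standard`) as the search analyzer on the sub-field, so that the typed text is not itself split into prefixes.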

How to convert filtered query with Multi_Match to filtered query with Common Terms

I am using ES 2.0. I have the following filtered query with multi_match:
{
"filtered" : {
"query": {
"multi_match" : {
"query" : "sleep",
"fields" : ["title.*^10","introduction.*"],
"cutoff_frequency" : 0.001,
"operator" : "or",
"analyzer" : "standard"
}
},
"filter" : {
...
}
}
}
Because of a stop-words issue, I would like to replace the multi_match with the Common Terms query explained here: https://www.elastic.co/blog/stop-stopping-stop-words-a-look-at-common-terms-query
How can I just replace the above multi_match with Common Terms? I cannot figure out how to handle the search on multiple fields based on Common Terms.
Thanks!
When specifying the cutoff_frequency in your multi_match query, you're already using common terms, as mentioned in the blog article you linked to:
"Common Terms has also been incorporated into the Match query and can
be enabled by setting cutoff_frequency to a value like 0.001"
The documentation for match and multi_match on cutoff_frequency also mentions this fact.

elastic search faceted query returns incorrect count

I need help with aggregate / faceted queries in Elasticsearch. I have used a faceted query to group the results, but I'm not getting grouped results with the correct counts.
Please suggest how to get grouped results from Elasticsearch.
{
"query" : {
"query_string" : {"query" : "pared_cat_id:1"} } ,
"facets" : {
"subcategory" : {
"terms" : {
"field": "sub_cat_id",
"size" : 50,
"order" : "term",
"all_terms" : true
}
}
},
"from" : 0,
"size": 50
}
Trying to get grouped results for sub category id for passed parent category id.
"query_string" : {"query" : "pared_cat_id:1"} } ,
This is applied to the overall data and not to the facet counts.
For this you need to use a facet query, in which you can specify the same condition that you are specifying in the main query string.
So the facet counts being shown to you now are computed without applying "query_string" : {"query" : "pared_cat_id:1"}, i.e. over the whole data set. In case you want the facet counts computed after applying that query_string clause, provide it in the facet query.
Elasticsearch faceting queries works very well in terms of accuracy, at least I have not seen any problem yet.
Just a few questions:
Is this field a string or numeric? Please give an example.
Have you applied any custom mapping, or are you using the default "standard" analyzer?
Please describe the kind of inaccuracy: for example, "aa" should have a count of 100 but shows 50, or is it some other kind of inaccuracy?
Elasticsearch facet queries can return incorrect counts if the number of shards is greater than 1. Facets are now deprecated and will be removed in a future release; you are encouraged to migrate to aggregations instead.
I suggest that you take a look at this blog post, in which Alex Brasetvik gives a good description along with some examples of how to use the aggregations feature properly.
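As a starting point for the migration, the facet request from the question could be expressed with a terms aggregation roughly as sketched below, using Python dicts. The helper names are made up for illustration. The `_key` ordering is an assumption about the version in use (older releases spelled it `_term`), and the sample response in the usage part is invented to show the shape of the result.

```python
def build_subcategory_aggregation(parent_cat_id, size=50):
    """Search body that groups the matching documents by sub_cat_id."""
    return {
        # Field name copied from the question as-is.
        "query": {"query_string": {"query": f"pared_cat_id:{parent_cat_id}"}},
        "aggs": {
            "subcategory": {
                "terms": {
                    "field": "sub_cat_id",
                    "size": size,
                    "order": {"_key": "asc"},  # "_term" on older versions
                }
            }
        },
        "from": 0,
        "size": size,
    }


def bucket_counts(response, name="subcategory"):
    """Turn the aggregation buckets of a parsed response into {key: doc_count}."""
    buckets = response["aggregations"][name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


if __name__ == "__main__":
    # Invented sample response, for illustration only.
    sample = {
        "aggregations": {
            "subcategory": {
                "buckets": [
                    {"key": 11, "doc_count": 4},
                    {"key": 12, "doc_count": 7},
                ]
            }
        }
    }
    print(build_subcategory_aggregation(1))
    print(bucket_counts(sample))
```

The aggregation is computed over the documents matched by the query, so the bucket counts reflect the parent category filter.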

Resources