Elasticsearch truncating hits.total via score

Is it possible to run a query and filter it so that only documents with a score > 1.0 are counted in hits.total?

I believe you can use min_score to achieve this (http://www.elasticsearch.org/guide/reference/api/search/min-score/). The ES docs example:
{
  "min_score": 0.5,
  "query" : {
    "term" : { "user" : "kimchy" }
  }
}
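Documents scoring below min_score are excluded from the hits entirely, so they are not counted in hits.total either. As the docs also say, this isn't usually practical, because scoring is a relative calculation and scores are not comparable across different queries. If you're heavily influencing the scores yourself, however, it might be what you need.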
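To make the counting behaviour concrete, here is a small pure-Python simulation (not Elasticsearch code) of what min_score does to a result set: documents whose score is below the threshold are dropped before the hits are returned, so hits.total only counts what survives. The document IDs and scores are made-up sample data.

```python
from typing import List, Tuple

# Hypothetical (doc_id, score) pairs standing in for scored search results.
Scored = List[Tuple[str, float]]


def apply_min_score(scored: Scored, min_score: float) -> dict:
    """Simulate min_score: keep documents with score >= min_score,
    then report the total and the hits sorted by descending score."""
    kept = [(doc_id, score) for doc_id, score in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return {"total": len(kept), "hits": kept}


if __name__ == "__main__":
    results = [("a", 1.7), ("b", 0.4), ("c", 1.0), ("d", 2.3)]
    print(apply_min_score(results, 1.0))
```

Note that min_score excludes only documents scoring less than the threshold, so a threshold of 1.0 keeps a document scoring exactly 1.0; for a strict "> 1.0" you would pick a value slightly above 1.0.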

Related

Filter on score after rescore in Elasticsearch

I have been on an internet manhunt for days for this and am getting ready to give up. I need to filter on _score in Elasticsearch after the rescore function has completed. So, given an example query like this:
POST /_search
{
  "query" : {
    "match" : {
      "message" : {
        "operator" : "or",
        "query" : "the quick brown"
      }
    }
  },
  "rescore" : {
    "window_size" : 50,
    "query" : {
      "rescore_query" : {
        "match_phrase" : {
          "message" : {
            "query" : "the quick brown",
            "slop" : 2
          }
        }
      },
      "query_weight" : 0.7,
      "rescore_query_weight" : 1.2
    }
  }
}
Say, for simplicity's sake, that the above returns 5 documents with scores ranging from 0.0 to 1.0. I want the final result set to contain only the documents with a score above 0.90. In other words, take the newly rescored documents and hand them off to a filter that drops every document scored below 0.90.
I have tried many, many different ways, but nothing works. post_filter apparently runs after the main query but before rescore, so that doesn't work. min_score does not work with rescore at all; it only applies to the original scores from the main query. Aggregations are the one feature I can get to work after rescore, but aggregating is not what I need here. At least it shows me that ES can keep operating on the data after a rescore query.
Any thoughts on how to get this seemingly simple task accomplished? I have also tried using function_score and script_score but really those are just ways to further modify the scores, whereas I need to filter on the scores generated by the rescore. The requirement here is to get it done in the query. We can't do it as a post-processing step.

Unique values - Terms aggregation or Wildcard query

What's the best way to get all the unique terms for a field?
Should I use a terms aggregation, or a wildcard query (and then reduce the results to unique terms on the application side)?
{
  "query": {
    "wildcard" : { "text" : "**" }
  }
}
or
{
  "aggs" : {
    "genres" : {
      "terms" : { "field" : "text" }
    }
  }
}
A terms aggregation lets Elasticsearch reduce the terms to unique values (in a distributed manner) and thereby shrink the response payload. But is it going to put too much load on Elasticsearch?
I'm aware of the shard size aspect of the terms aggregation. Other than that, is one more internally optimized than the other? What's the execution plan for each?
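"An aggregation can be seen as a unit-of-work that builds analytic information over a set of documents."
What you're trying to achieve falls under analytic information. Since the terms aggregation provides it out of the box, it is optimized for this. A wildcard query, by contrast, returns matching documents rather than terms, so you would have to fetch every document and deduplicate the values yourself.
Use "explain": true to get a description of the score calculation (not useful for aggregations, as they do not score documents).
Reference: Aggregations in ElasticSearch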
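As a rough sketch of the aggregation route, the Python below builds a terms aggregation request and pulls the unique values out of a response. It assumes the standard terms aggregation response layout (aggregations → genres → buckets, each bucket carrying key and doc_count); the sample response is hand-written, not real output. A terms aggregation returns only the top 10 buckets by default, so the sketch raises size; the value 1000 is an arbitrary assumption that should be at least the field's number of distinct values.

```python
import json


def unique_terms_request(field: str, size: int = 1000) -> dict:
    """Build a search body that returns no hits, only a terms aggregation."""
    return {
        "size": 0,  # we only want the aggregation, not the documents
        "aggs": {
            "genres": {
                "terms": {"field": field, "size": size}
            }
        },
    }


def unique_terms(response: dict, agg_name: str = "genres") -> list:
    """Extract bucket keys (the unique terms) from a terms aggregation response."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [bucket["key"] for bucket in buckets]


if __name__ == "__main__":
    print(json.dumps(unique_terms_request("text"), indent=2))
    # Hand-written sample response, shaped like a terms aggregation result.
    sample = {
        "aggregations": {
            "genres": {
                "buckets": [
                    {"key": "rock", "doc_count": 12},
                    {"key": "jazz", "doc_count": 5},
                ]
            }
        }
    }
    print(unique_terms(sample))
```

On an analyzed text field the buckets are the individual indexed tokens, not whole field values.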

How to use phrase suggester results as part of a query

Having spent ages reading the docs and various websites, I still don't understand how one is supposed to use the phrase suggester to influence the results of a query. My understanding was that if I ran the following query and suggester, the results from the suggester would be used for the query.
POST test/test/_search
{
  "query": {
    "multi_match": {
      "query": "anti-inefffective",
      "fields": ["*#value"]
    }
  },
  "highlight" : {
    "fields" : {
      "*#value" : {
        "pre_tags" : ["<mark>"],
        "post_tags" : ["</mark>"]
      }
    }
  },
  "suggest" : {
    "text" : "anti-inefffective",
    "simple_phrase" : {
      "phrase" : {
        "analyzer" : "default",
        "field" : "_all",
        "size" : 1,
        "real_word_error_likelihood" : 0.95,
        "max_errors" : 0.5,
        "gram_size" : 2,
        "direct_generator" : [ {
          "field" : "_all",
          "suggest_mode" : "always",
          "min_word_length" : 1
        } ],
        "highlight": {
          "pre_tag": "<em>",
          "post_tag": "</em>"
        }
      }
    }
  }
}
How can I get the suggester's result used as the query term, all within a single JSON request? All the examples I've seen run the phrase suggester after the query, which seems bizarre to me. The only way I can see to do this would be to run a phrase suggester query, extract the value, add it programmatically to a query, and then run that query with the suggested text.
In other words, I would like to do what Google does: if you type "cancerous tummour", Google automatically runs the query with the corrected phrase "cancerous tumour", while still giving you the option to search for the original, incorrect phrase.
You should take a look at the collate+query option of the Phrase Suggester when used together with the confidence parameter.
The phrase suggester workflow looks like this:
1. Suggests candidate terms for cancerous and tummour based on the parameters passed to the candidate generator section.
2. Generates a number of 'mad-lib' phrase suggestions using the term candidates, combining the word frequencies of the phrase terms to generate a score for each suggestion.
3. With the collate option, actually runs each candidate inside a query template (defined by you, the query author) so that candidates whose queries return zero results can be discarded.
To emulate the Google functionality you describe, when you run the user's query you would also:
Use the phrase suggester ("size": 1) to generate the single top-scoring, collated (non-zero-result) phrase suggestion for the user's original input.
With the default "confidence": 1.0, the phrase suggester only returns suggestions that score higher than the original user input phrase.
When a (higher-confidence) suggestion comes back alongside the original query result, your client can decide to take it and execute the suggested query in place of the original one, while keeping the original query text to display as a fallback search option.
Short answer: There's no option to automatically use the top suggestion within Elasticsearch as the query text. But you could build that in your search client using the functionality currently provided by the phrase suggester.
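A minimal client-side sketch of that workflow in Python: build a request that runs the user's query together with a collated phrase suggester, then decide which text to actually search with. The suggester name simple_phrase, the field title and the helper names are hypothetical. The collate template uses the "source" key of recent Elasticsearch versions (older versions spell this differently), and the parsing assumes the usual suggest → name → [entry] → options layout; with the default "prune": false, suggestions whose collate query matches nothing are simply left out. The actual search call is omitted, so plug in whatever client you use.

```python
def build_request(user_text: str, field: str = "title") -> dict:
    """Run the user's query and ask for one collated phrase suggestion."""
    return {
        "query": {"match": {field: user_text}},
        "suggest": {
            "text": user_text,
            "simple_phrase": {
                "phrase": {
                    "field": field,
                    "size": 1,  # only the top suggestion
                    "confidence": 1.0,  # must score higher than the input phrase
                    "collate": {
                        # Template query: suggestions that match no documents are dropped.
                        "query": {"source": {"match": {"{{field_name}}": "{{suggestion}}"}}},
                        "params": {"field_name": field},
                    },
                }
            },
        },
    }


def choose_query_text(user_text: str, response: dict, name: str = "simple_phrase"):
    """Return (text_to_search, fallback_text_or_None) from a search response."""
    for entry in response.get("suggest", {}).get(name, []):
        options = entry.get("options", [])
        if options and options[0]["text"] != user_text:
            # Search with the correction and offer the original as a fallback.
            return options[0]["text"], user_text
    return user_text, None
```

If choose_query_text returns a fallback, the client re-runs the query with the suggested text and shows the original phrase as a "search instead for" link; otherwise the first response is already the final one.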

How to disable scoring in Elasticsearch for one query?

Is it possible to disable score calculation on particular query (not for type or all index) in elasticsearch?
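As stated in the comments, you could wrap your particular query in a constant_score query (ConstantScoreQuery):
{
  "constant_score" : {
    "query" : { your_query_here },
    "boost" : 1.0
  }
}
To wrap a filter instead, use "filter" in place of "query"; in later Elasticsearch versions constant_score accepts only "filter", which can hold any query.
All matching documents get the same score, equal to boost (1.0 here). For more reference information, see http://www.elastic.co/guide/en/elasticsearch/reference/1.5/query-dsl-constant-score-query.html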
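If query bodies are built programmatically, a small helper can do the wrapping. This sketch assumes a recent Elasticsearch version, where constant_score takes its inner query under "filter"; with the 1.x version linked above you would use "query" for a query. The helper name without_scoring is hypothetical.

```python
import json


def without_scoring(query: dict, boost: float = 1.0) -> dict:
    """Wrap a query so every match gets the same score (equal to boost)."""
    return {"constant_score": {"filter": query, "boost": boost}}


if __name__ == "__main__":
    inner = {"term": {"user": "kimchy"}}
    print(json.dumps({"query": without_scoring(inner)}, indent=2))
```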

Elasticsearch faceted query returns incorrect count

I need help with aggregation/faceted queries in Elasticsearch. I have used a faceted query to group the results, but the grouped counts are not correct.
Please suggest how to get grouped results from Elasticsearch.
{
  "query" : {
    "query_string" : {"query" : "pared_cat_id:1"}
  },
  "facets" : {
    "subcategory" : {
      "terms" : {
        "field": "sub_cat_id",
        "size" : 50,
        "order" : "term",
        "all_terms" : true
      }
    }
  },
  "from" : 0,
  "size": 50
}
I'm trying to get results grouped by sub category id (sub_cat_id) for a given parent category id.
"query_string" : {"query" : "pared_cat_id:1"} } ,
This is applied to the overall data, not to the facet counts.
For that you need to use a facet query, in which you specify the same condition as in the main query string.
So the facet counts you are seeing now are computed without applying "query_string" : {"query" : "pared_cat_id:1"}, i.e. over the whole data set. If you want the facet counts after applying "query_string" : {"query" : "pared_cat_id:1"}, provide it in the facet query.
Elasticsearch faceting queries work very well in terms of accuracy; at least, I have not seen any problem yet.
Just a few questions:
What type is the field, string or numeric? Can you give an example?
Have you applied a custom mapping, or are you using the default "standard" analyzer?
Please describe the kind of inaccuracy: for example, "aa" should have a count of 100 but shows 50, or is it some other kind of inaccuracy?
A terms facet can return inaccurate counts when the index has more than one shard, because each shard contributes only its own top terms. In any case, facets are deprecated and will be removed in a future release; you are encouraged to migrate to aggregations instead.
I suggest that you take a look at this blog post, in which Alex Brasetvik gives a good description along with some examples of how to use the aggregations feature properly.
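As a sketch of that migration, here is the faceted request from the question rebuilt as a terms aggregation in Python. The mapping is an assumption based on the usual facet-to-aggregation equivalents: "all_terms": true becomes "min_doc_count": 0, and "order": "term" becomes ordering by key ("_term" in older versions, "_key" from 6.0 on). Aggregations run over the documents matched by the query, and shard_size is raised to reduce the per-shard approximation mentioned above; the value 200 is arbitrary. The helper names are hypothetical.

```python
def subcategory_counts_request(parent_id: int, order_key: str = "_key") -> dict:
    """Rebuild the terms-facet request as a terms aggregation.

    order_key: "_key" for Elasticsearch 6.0+, "_term" for older versions.
    """
    return {
        "query": {"query_string": {"query": f"pared_cat_id:{parent_id}"}},
        "aggs": {
            "subcategory": {
                "terms": {
                    "field": "sub_cat_id",
                    "size": 50,
                    "shard_size": 200,  # arbitrary; larger values give more accurate counts
                    "order": {order_key: "asc"},  # replaces "order": "term"
                    "min_doc_count": 0,  # replaces "all_terms": true
                }
            }
        },
        "from": 0,
        "size": 50,
    }


def bucket_counts(response: dict) -> dict:
    """Map each sub_cat_id bucket key to its document count."""
    buckets = response["aggregations"]["subcategory"]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}
```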
