Compare Elasticsearch query score across multiple queries

I'm trying to run two MLT (More Like This) queries and compare their scores, but I'm a bit confused by what I read here:
https://www.elastic.co/guide/en/elasticsearch/guide/current/practical-scoring-function.html
Even though the intent of the query norm is to make results from
different queries comparable, it doesn’t work very well. The only
purpose of the relevance _score is to sort the results of the current
query in the correct order. You should not try to compare the
relevance scores from different queries.
If I run an MLT query and find that document 'A' is similar to document 'B' with a score of 0.4, and conversely, running the MLT query in the other direction, document 'B' is similar to document 'A' with a score of 2.4.
I would expect the scores to be the same, based on the tokens matched in the MLT, but that's not the case.
Also, if I run an MLT query and find that document 'A' is similar to document 'B' with a score of 0.6, and another MLT query finds that document 'C' is similar to document 'A' with a score of 4.7.
So my questions are:
1. Does this imply that C is much more similar to A than B is?
2. What's the best way for me to compare multiple queries in Elasticsearch when the scores are different?
Thanks,
- Phil

1.
No, it doesn't. As you noted in your question, you should not compare the scores of different queries. If you want a meaningful answer to which documents are most similar to C, you should generate an MLT query for document C and search with that.
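As a sketch of that suggestion, the search body below asks Elasticsearch for documents similar to one stored document by passing its index and id in the `like` parameter of a `more_like_this` query. The index name `articles`, the fields `title`/`body`, and the `min_term_freq`/`max_query_terms` values are assumptions for illustration only.

```python
import json

def mlt_query_for(index: str, doc_id: str, fields: list) -> dict:
    """Build a search body that finds documents similar to one stored document."""
    return {
        "query": {
            "more_like_this": {
                "fields": fields,
                # "like" can reference an already indexed document by index and id.
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,    # hypothetical: keep terms that occur only once
                "max_query_terms": 25,  # upper bound on the "interesting" terms selected
            }
        }
    }

body = mlt_query_for("articles", "C", ["title", "body"])
print(json.dumps(body, indent=2))
```

The scores in the response are then only used to rank the hits of this one query.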
This is doubly true because of how MLT queries work. MLT generates a list of interesting terms from your document (based on the library of terms in the index) and searches for them. The set of terms generated from document A may be very different from the set generated from document B, hence the wildly different scores when finding A from B and vice versa, even though the overlap between the two documents is obviously the same.
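The asymmetry can be reproduced with a toy model loosely based on how MLT picks terms: take the top-k terms of the source document by tf * idf, then score the target only on those terms. This is a hypothetical simplification (plain log(N/df) idf, no length normalization, made-up corpus), not Lucene's MoreLikeThis code.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of already-analyzed tokens.
CORPUS = {
    "A": ["bm25", "bm25", "lucene", "lucene", "search"],
    "B": ["lucene", "search", "java"],
    "X": ["java", "search"],
    "Y": ["search"],
}

def idf(term: str) -> float:
    """Plain log(N / df); Lucene's real formula differs, this is only illustrative."""
    df = sum(1 for tokens in CORPUS.values() if term in tokens)
    return math.log(len(CORPUS) / df)

def interesting_terms(doc_id: str, k: int) -> list:
    """Pick the k terms of a document with the highest tf * idf (ties broken alphabetically)."""
    tf = Counter(CORPUS[doc_id])
    ranked = sorted(tf, key=lambda t: (-tf[t] * idf(t), t))
    return ranked[:k]

def mlt_score(source_id: str, target_id: str, k: int = 2) -> float:
    """Score the target against the terms selected from the source: sum of tf * idf."""
    target_tf = Counter(CORPUS[target_id])
    return sum(target_tf[t] * idf(t) for t in interesting_terms(source_id, k))

a_to_b = mlt_score("A", "B")
b_to_a = mlt_score("B", "A")
print(interesting_terms("A", 2), round(a_to_b, 3))  # ['bm25', 'lucene'] 0.693
print(interesting_terms("B", 2), round(b_to_a, 3))  # ['java', 'lucene'] 1.386
```

A and B share the same term `lucene`, but each direction selects a different term set and weighs the target's own term frequencies, so the two scores differ.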
2.
Don't. Listen to the docs. Scores are only designed to rank how well documents match the query that generated them. Using them outside that context is not meaningful. Rethink what you are trying to accomplish.

Related

How to boost most popular (top quartile) in elasticsearch query results (outliers)

I have an elasticsearch query that includes bool - must / should sections that I have refined to match search terms and boost for terms in priority fields, phrase match, etc.
I would like to boost documents that are the most popular. The documents include a field "popularity" that indicates the number of times the document was viewed.
Preferably, I would like to boost any documents in the result set that are outliers - meaning that the popularity score is perhaps 2 standard deviations from the average in the result set.
I see aggregations but I'm interested in boosting results in a query, not a report/dashboard.
I also noted the new rank_feature query in ES 7 (I am still on 6.8 but could upgrade). It looks like the rank_feature query looks across all documents, not the result set.
Is there a way to do this?
I think that you want to use a rank_feature or a range query in a "rescore query".
If your need is too specific for the built-in queries, you can use a "function_score" query in your rescore and use a script to write your own score calculation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.9/filter-search-results.html
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-rescore.html
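A minimal sketch of the rescore idea, assuming the mean and standard deviation of `popularity` are obtained beforehand (for example from a preliminary request, possibly with an `extended_stats` aggregation); that step, the boost of 5.0, the window size and the `title` query are assumptions, not part of the answer. A `constant_score` range query stands in for the rescore query here; a `function_score` with a script could replace it if you need a custom formula.

```python
import json
import statistics

def outlier_threshold(popularity_values: list, num_std: float = 2.0) -> float:
    """Mean plus num_std population standard deviations of the sampled values."""
    mean = statistics.mean(popularity_values)
    return mean + num_std * statistics.pstdev(popularity_values)

def search_body_with_rescore(main_query: dict, threshold: float, window_size: int = 100) -> dict:
    """Run main_query, then re-score its top window_size hits, adding a bonus to outliers."""
    return {
        "query": main_query,
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "constant_score": {
                        "filter": {"range": {"popularity": {"gte": threshold}}},
                        "boost": 5.0,  # hypothetical bonus; tune against real scores
                    }
                },
                "query_weight": 1.0,
                "rescore_query_weight": 1.0,
                # "total" adds the rescore query score to the original score.
                "score_mode": "total",
            },
        },
    }

main_query = {"bool": {"must": [{"match": {"title": "laptop"}}]}}  # placeholder query
threshold = outlier_threshold([2, 4, 4, 4, 5, 5, 7, 9])  # sampled popularity values
body = search_body_with_rescore(main_query, threshold)
print(json.dumps(body, indent=2))
```

Because the rescore only touches the top `window_size` hits, the bonus reorders the best matches without scanning the whole index.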

Does Elasticsearch score different length shingles with the same IDF?

In Elasticsearch 5.6.5 I'm searching against a field with the following filter applied:
"filter_shingle": {
    "max_shingle_size": "4",
    "min_shingle_size": "2",
    "output_unigrams": "true",
    "type": "shingle"
}
When I perform a search for depreciation tax against a document with that exact text, I see the following explanation of the score:
weight(Synonym(content:depreciation content:depreciation tax)) .... [7.65]
weight(content:tax) ... [6.02]
If I change the search to depreciation taffy against the exact same document with depreciation tax in the content I get this explanation:
weight(Synonym(content:depreciation content:depreciation taffy)) .... [7.64]
This is not what I expected. I thought a match on the bigram token depreciation tax would get a much higher score than a match on the unigram depreciation. However, this scoring seems to reflect a simple unigram match. There is an extremely small difference, and digging further, this is because termFreq=28 under the depreciation taffy match and termFreq=29 under the depreciation tax match. I'm also not sure how this relates, as I imagine that across the shard holding this document there are very different counts for depreciation, depreciation tax and depreciation taffy.
Is this expected behavior? Is ES treating all the different sized shingles, including unigrams, with the same IDF value? Do I need to split out each shingle size into different sub fields with different analyzers to get the behavior I expect?
TL;DR
Shingles and synonyms are broken in Elasticsearch/Lucene, and a lot of hacks need to be applied until a fix is released (accurate as of ES 6).
Put unigrams, bigrams and so on in individual subfields and search them separately, combining the scores for an overall match. Don't use a single shingle filter on a field that produces multiple n-gram sizes.
Don't combine a synonym filter and a shingle filter on the same field.
In my case, I do a must match with synonyms on a unigram field, then a series of should matches, without synonyms, to boost the score on shingles of each size.
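A sketch of the subfield layout from the TL;DR, assuming typeless ES 7-style mappings (5.x mappings also need a type name) and the standard tokenizer; the field, filter and analyzer names are hypothetical, and the synonym analyzer for the main `content` field is left out. Each subfield holds shingles of exactly one size with `output_unigrams` disabled, so its tokens never share positions with unigrams.

```python
import json

def shingle_index_body(sizes=(2, 3, 4)) -> dict:
    """Index body with one shingle-only subfield per shingle size (no unigrams, no synonyms)."""
    filters, analyzers, subfields = {}, {}, {}
    for n in sizes:
        filters[f"shingle_{n}"] = {
            "type": "shingle",
            "min_shingle_size": n,
            "max_shingle_size": n,
            # Unigrams stay in the main field only, so positions never overlap here.
            "output_unigrams": False,
        }
        analyzers[f"shingle_{n}_analyzer"] = {
            "type": "custom",
            "tokenizer": "standard",
            "filter": ["lowercase", f"shingle_{n}"],
        }
        subfields[f"shingle_{n}"] = {"type": "text", "analyzer": f"shingle_{n}_analyzer"}
    return {
        "settings": {"analysis": {"filter": filters, "analyzer": analyzers}},
        "mappings": {"properties": {"content": {"type": "text", "fields": subfields}}},
    }

def shingle_boost_query(text: str, sizes=(2, 3, 4)) -> dict:
    return {
        "bool": {
            # Required match on the unigram field (where a synonym analyzer would live).
            "must": [{"match": {"content": text}}],
            # Each should clause only adds score when a shingle of that size matches.
            "should": [{"match": {f"content.shingle_{n}": text}} for n in sizes],
        }
    }

idx = shingle_index_body()
q = shingle_boost_query("depreciation tax")
print(json.dumps(q, indent=2))
```

A two-word query should produce no tokens for the three- and four-word subfields, so those clauses are expected to contribute nothing rather than skew the score.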
Details
I got an answer on the elastic support forums:
https://discuss.elastic.co/t/does-elasticsearch-score-different-length-shingles-with-the-same-idf/126653/2
Yep, this is mostly expected.
It's not really the shingles causing the scoring oddness, but the fact
that SynonymQueries do the frequency blending behavior that you're
seeing. They use frequency of the original token for all the
subsequent 'synonym' tokens, as a way to help prevent skewing the
score results. Synonyms are often relatively rare, and would
drastically affect the scoring if they each used their individual
df's.
From the Lucene docs:
For scoring purposes, this query tries to score the terms as if you
had indexed them as one term: it will match any of the terms but only
invoke the similarity a single time, scoring the sum of all term
frequencies for the document.
The SynonymQuery also sets the docFrequency to the maximum
docFrequency of the terms in the document. So for example, if:
"depreciation" df == 5, "depreciation tax" df == 2, "depreciation taffy"
df == 1, it will use 5 as the docFrequency for scoring purposes.
The bigger issue is that Lucene doesn't have a way to differentiate
shingles from synonyms... they both use tokens that overlap the
position of other tokens in the token stream. So if unigrams are mixed
with bi-(or larger)-grams, Lucene is tricked into thinking it's
actually a synonym situation.
The fix is to keep your unigrams and bi-plus-grams in different
fields. That way Lucene won't attempt to use SynonymQueries in these
situations, because the positions won't be overlapping anymore.
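The frequency blending described above can be worked through numerically. This sketch assumes the df values from the quoted example, a hypothetical shard of 100 documents, and a document where the unigram `depreciation` occurs 28 times and the bigram `depreciation tax` once, chosen to reproduce termFreq=28 vs 29 from the question. The idf uses Lucene's BM25 idf formula; the term weight uses the textbook BM25 form with k1=1.2, b=0.75 and an average-length document, so absolute values will differ from a real explain output.

```python
import math

def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """Lucene's BM25 idf: log(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def blended_stats(doc_freqs: dict, term_freqs_in_doc: dict) -> tuple:
    """SynonymQuery-style blending: max docFreq over the terms, summed termFreq in the document."""
    df = max(doc_freqs.values())
    tf = sum(term_freqs_in_doc.get(term, 0) for term in doc_freqs)
    return df, tf

def bm25_weight(tf: int, idf: float, k1: float = 1.2, b: float = 0.75, dl_over_avgdl: float = 1.0) -> float:
    """Textbook BM25 term weight; tf saturates, so one extra occurrence barely changes it."""
    return idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * dl_over_avgdl))

DOC_COUNT = 100  # hypothetical number of documents in the shard
doc_tf = {"depreciation": 28, "depreciation tax": 1}  # hypothetical counts in the matched document

tax_df, tax_tf = blended_stats({"depreciation": 5, "depreciation tax": 2}, doc_tf)
taffy_df, taffy_tf = blended_stats({"depreciation": 5, "depreciation taffy": 1}, doc_tf)

# Both queries end up with df=5 (taken from "depreciation"), so their idf is identical.
score_tax = bm25_weight(tax_tf, bm25_idf(DOC_COUNT, tax_df))
score_taffy = bm25_weight(taffy_tf, bm25_idf(DOC_COUNT, taffy_df))
print(tax_df, tax_tf, round(score_tax, 3))
print(taffy_df, taffy_tf, round(score_taffy, 3))
```

The bigram match only adds one to the blended term frequency and does not change the idf, which is why the two explain outputs in the question are almost identical.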
Here's another related question I asked, about how actual synonyms also get mangled when combined with shingles: https://discuss.elastic.co/t/es-5-4-synonyms-and-shingles-dont-seem-to-work-together/127552
Elastic/Lucene expands the synonym set, injects the synonyms into the token stream, then creates shingles. E.g. query: econ supply and demand => econ, economics, supply, demand. Document: `... econ foo ...` => econ, foo. Now we get the shingle "econ economics" from the query, and somehow this matches the document. I have no idea why, since I only applied synonyms to the query, not the document, so I don't see the match. Also, the shingles created from the query are wrong too.
This is a known problem, and it is still not fully resolved. A number
of Lucene filters can't consume graphs as their inputs.
There is currently active work being done on developing a fixed
shingle filter, and also an idea to have a sub-field for indexing
shingles.
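The wrong shingles from the linked question can be illustrated with a toy model in which the synonym `economics` is stacked at the same position as `econ`. A shingle step that walks the stream token by token (ignoring positions) pairs the two synonyms with each other, while a position-aware one pairs each with the token at the next position. This is a hypothetical simplification, not Lucene's ShingleFilter; positions are simplified, and the gap left by the removed stopword `and` is ignored.

```python
# Each token is (position, text); a synonym sits at the same position as its source word.
QUERY_TOKENS = [(0, "econ"), (0, "economics"), (1, "supply"), (2, "demand")]

def naive_bigrams(tokens):
    """Pair each token with the next one in the stream, ignoring positions (the broken behavior)."""
    return [f"{a[1]} {b[1]}" for a, b in zip(tokens, tokens[1:])]

def graph_aware_bigrams(tokens):
    """Pair each token with every token at the following position."""
    return [
        f"{a_text} {b_text}"
        for a_pos, a_text in tokens
        for b_pos, b_text in tokens
        if b_pos == a_pos + 1
    ]

print(naive_bigrams(QUERY_TOKENS))
# ['econ economics', 'economics supply', 'supply demand']
print(graph_aware_bigrams(QUERY_TOKENS))
# ['econ supply', 'economics supply', 'supply demand']
```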

List items from some indices first in Elasticsearch search results

I'm scraping a few sites and relisting their products; each site has its own index in Elasticsearch. Some sites have affiliate programs, and I'd like to list those first in my search results.
Is there a way for me to "boost" results from a certain index?
Should I write a field hasAffiliate: true into ES when I'm scraping and then boost the query clauses that match that value? Or is there a better way?
With boosting, it could be difficult to guarantee that they appear first in the results. According to the official guide:
Practically, there is no simple formula for deciding on the “correct”
boost value for a particular query clause. It’s a matter of
try-it-and-see. Remember that boost is just one of the factors
involved in the relevance score
https://www.elastic.co/guide/en/elasticsearch/guide/current/query-time-boosting.html
It depends on the type of queries you are doing, but here are a couple of other options:
A score function with weights: could be a more predictable option.
Simply using a sort by hasAffiliate (the easiest one).
Note: I'm not sure whether sorting by a boolean field is possible; if not, you could map hasAffiliate as an integer type such as byte (the smallest one), setting it to 1 when true.
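Both options can be sketched as request bodies. This assumes `hasAffiliate` is indexed on every document and that the search runs across all the per-site indices at once; the `title` field, the text `running shoes` and the weight of 100 are placeholders.

```python
import json

user_query = {"match": {"title": "running shoes"}}  # hypothetical field and text

# Option 1: sort affiliates first, then by relevance within each group.
sort_body = {
    "query": user_query,
    "sort": [{"hasAffiliate": {"order": "desc"}}, "_score"],
}

# Option 2: function_score gives affiliate documents a large fixed weight.
function_score_body = {
    "query": {
        "function_score": {
            "query": user_query,
            "functions": [{"filter": {"term": {"hasAffiliate": True}}, "weight": 100}],
            # "sum" adds the function result to the query score instead of multiplying it.
            "boost_mode": "sum",
        }
    }
}

print(json.dumps(sort_body, indent=2))
print(json.dumps(function_score_body, indent=2))
```

The sort guarantees the ordering; the weight only makes it likely, and the right value still has to be found by try-it-and-see, as the guide says.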

Solr Boosting Logic Concepts

I'm trying to understand boosting and if boosting is the answer to my problem.
I have an index that has different types of data.
E.g. index Animals. One of the fields is animaltype. Its value can be Carnivorous, Herbivorous, etc.
Now, when we query, I want to show results of type Carnivorous at the top, followed by the Herbivorous type.
Also, would it be possible to show only, say, the top 3 results from one type and then the remaining results from other types?
Let's assume that for the Herbivorous type we have a field named vegetables. It will have values only when animaltype is Herbivorous.
Now, is it possible to have boosting rules specified as follows:
Boost Levels:
animaltype:Carnivorous
then animaltype:Herbivorous and vegetablesfield: spinach
then animaltype:Herbivorous and vegetablesfield: carrot
etc. Basically, boosting on various fields at various levels. I'm new to this concept, so some input or guidance would be really helpful.
Thanks,
Kasturi Chavan
Your example is closer to sorting than boosting, as you have a priority list for how important each document is, while boosting (in Solr) is usually applied more fluidly, meaning that there is no hard line between documents of type X and type Y.
However - boosting with appropriately large values will in effect give you the same result, putting the documents into different score "areas" which will then give you the sort order you're looking for. You can see the score contributed by each term by appending debugQuery=true to your query. Boosting says that 'a document with this value is z times more important than those with a different value', but if the document only contains low scoring tokens from the search (usually words that are very common), while other documents contain high scoring tokens (words that are infrequent), the latter document might still be considered more important.
Example: searching for "city paris", where most documents contain the word 'city', but only a few contain the word 'paris' (and those do not contain 'city'). Even if you boost all documents assigned to the country 'germany', the score contributed by 'city' might still be lower, even with the boost factor, than what 'paris' contributes alone. This might not occur in real life, but you should know what the boost actually changes.
Using the edismax handler, you can apply the boost in two different ways - one is to use boost=, which is multiplicative, or to use either bq= or bf=, which are additive. The difference is how the boost contributes to the end score.
For your example, the easiest way to get something similar to what you're asking, is to use bq (boost query):
bq=animaltype:Carnivorous^1000&
bq=animaltype:Herbivorous^10
These boosts will probably be large enough to move all documents matching these queries into their own buckets, without moving between groups. To create "different levels" as your example shows, you'll need to tweak these values (and remember, multiple boosts can be applied to the same document if something is both herbivorous and eats spinach).
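For reference, such a request can be assembled as below; `urlencode` with a list of pairs lets `bq` repeat. The core name `animals`, the query `lion`, the `qf` field, and the spinach/carrot tiers with their boost values are assumptions to be tuned, as the answer says.

```python
from urllib.parse import urlencode

# A list of pairs lets the same parameter name (bq) appear more than once.
params = [
    ("defType", "edismax"),
    ("q", "lion"),   # hypothetical user query
    ("qf", "name"),  # hypothetical query field
    ("bq", "animaltype:Carnivorous^1000"),
    ("bq", "animaltype:Herbivorous^10"),
    # Extra tiers from the question: both clauses must match for these boosts to apply.
    ("bq", "(+animaltype:Herbivorous +vegetablesfield:spinach)^5"),
    ("bq", "(+animaltype:Herbivorous +vegetablesfield:carrot)^2"),
    ("debugQuery", "true"),  # shows each term's score contribution
]
query_string = urlencode(params)
print(f"/solr/animals/select?{query_string}")
```

Since `bq` is additive, a herbivore that eats spinach collects both the Herbivorous and the spinach boost.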
A different approach would be to create a function query using query, if and similar functions to result in a single integer value that you can use as a sorting value. You can also calculate this value when indexing the document if it's static (which your example is), and then sort by that field instead. It will require you to reindex your documents if the sorting values change, but it might be an easy and effective solution.
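The index-time variant could look like this sketch, which computes a hypothetical `sort_priority` integer from the question's tiers before documents are sent to Solr. The field name, the case-insensitive matching, and treating `vegetablesfield` as multi-valued are assumptions.

```python
def sort_priority(doc: dict) -> int:
    """Lower value = shown earlier. Tiers follow the question's boost levels (hypothetical rules)."""
    animal = doc.get("animaltype", "").lower()
    vegetables = [v.lower() for v in doc.get("vegetablesfield", [])]
    if animal == "carnivorous":
        return 0
    if animal == "herbivorous" and "spinach" in vegetables:
        return 1
    if animal == "herbivorous" and "carrot" in vegetables:
        return 2
    return 3

docs = [
    {"id": "1", "animaltype": "Herbivorous", "vegetablesfield": ["carrot"]},
    {"id": "2", "animaltype": "Carnivorous"},
    {"id": "3", "animaltype": "Herbivorous", "vegetablesfield": ["spinach"]},
    {"id": "4", "animaltype": "Omnivorous"},
]
for d in docs:
    d["sort_priority"] = sort_priority(d)  # stored as an int field before indexing

ordered = sorted(docs, key=lambda d: d["sort_priority"])
print([d["id"] for d in ordered])
```

The query would then sort with something like `sort=sort_priority asc,score desc`, so relevance only orders documents within a tier.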
To achieve "top 3 results from a type", you'll probably want to look at Result Grouping, which makes it possible to get "x documents" for each value in a single field. As far as I know, there is no way to say "I want three of these at the top, then the rest from other values", except by doing multiple queries (and excluding the three you've already retrieved from the second query). Issuing multiple queries usually works just as well (or better) performance-wise.
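Both approaches from this paragraph are sketched as parameter sets below. Grouping assumes `animaltype` is a single-valued string field; the two-query version assumes a unique key field named `id` and uses made-up ids.

```python
from urllib.parse import urlencode

# Result grouping: up to 3 documents per animaltype value.
grouping_params = {
    "q": "*:*",
    "group": "true",
    "group.field": "animaltype",
    "group.limit": 3,
}

# Two-query approach, step 1: the three carnivores to show at the top.
first_query = {"q": "*:*", "fq": "animaltype:Carnivorous", "rows": 3}

def rest_query(shown_ids: list) -> dict:
    """Step 2: everything else, excluding the ids already shown (hypothetical 'id' field)."""
    excluded = " OR ".join(shown_ids)
    return {"q": "*:*", "fq": f"-id:({excluded})"}

second_query = rest_query(["7", "12", "31"])  # ids returned by the first query
print(urlencode(grouping_params))
print(urlencode(first_query))
print(urlencode(second_query))
```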

Difference between Elasticsearch Range Query and Range Filter

I want to query Elasticsearch documents within a date range. I have two options, and both work fine for me; I have tested both of them.
1. Range Query
2. Range Filter
Since I have a small data set for now, I am unable to test the performance of either. What is the difference between these two, and which one would result in faster retrieval of documents and a faster response?
The main difference between queries and filters has to do with scoring. Queries return documents with a relative ranked score for each document. Filters do not. This difference allows a filter to be faster for two reasons. First, it does not incur the cost of calculating the score for each document. Second, it can cache the results as it does not have to deal with possible changes in the score from moment to moment - it's just a boolean really, does the document match or not?
From the documentation:
Filters are usually faster than queries because:
1. they don’t have to calculate the relevance _score for each document: the answer is just a boolean “Yes, the document matches the filter” or “No, the document does not match the filter”.
2. the results from most filters can be cached in memory, making subsequent executions faster.
As a practical matter, the question is: do you use the relevance score in any way? If not, filters are the way to go. If you do, filters may still be useful but should be used where they make sense. For instance, if you had a language field in your documents (let's say language: "EN") and wanted to query by language along with a relevance score, you would combine a query for the text search with a filter for the language. The filter would cache the document IDs of all documents in English, and the query could then be applied to that subset.
I'm oversimplifying a bit, but those are the basics. Good places to read up on this:
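In current Elasticsearch versions the `filtered` query linked below has been removed (since 5.0) in favor of a `bool` query with a `filter` clause. A sketch of the language example combined with the date range from the question, with hypothetical field names (`body`, `language`, `published_at`) and assuming `language` is a keyword field:

```python
import json

body = {
    "query": {
        "bool": {
            # Scored: full-text relevance on a hypothetical "body" field.
            "must": [{"match": {"body": "quarterly report"}}],
            # Not scored and cacheable: the language and date-range restrictions.
            "filter": [
                {"term": {"language": "EN"}},
                {"range": {"published_at": {"gte": "2014-01-01", "lt": "2014-02-01"}}},
            ],
        }
    }
}
print(json.dumps(body, indent=2))
```

If no relevance ranking is needed at all, the `must` clause can be dropped and the `filter` clause used on its own.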
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-filtered-query.html
http://exploringelasticsearch.com/searching_data.html
http://elasticsearch-users.115913.n3.nabble.com/Filters-vs-Queries-td3219558.html
Filters are cached so they are faster!
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/filter-caching.html
