Different queryNorm values in the result for the same query - elasticsearch

The following query https://gist.github.com/anonymous/be27203a578494566a35 gives the following result set https://gist.github.com/anonymous/6935100dbf76b9a8f3e3. The documents have been indexed with these settings https://gist.github.com/anonymous/ca42a7f67c7281935950.
As you can see, the queryNorm value for the documents in the result set varies. But according to the documentation (taken from http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/practical-scoring-function.html#query-norm):
The same query normalization factor is applied to every document and you have no way of changing it. For all intents and purposes, it can be ignored.
Unfortunately, since this does not seem to be true (or maybe I have misunderstood something), I do not get the desired result set for the query above. More specifically, I would expect the second document to have a higher relevance than the first, since the boosting factor is higher when the query matches the "name" field than when it matches the "subtype" field. But because the queryNorm factor is lower for the second document, its total relevance score ends up lower.
Why does the queryNorm behave this way?
Is there really no way of disabling it (i.e. setting the factor to 1)?
I am running version 1.4.0 of Elasticsearch.
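For reference, the guide page linked above defines the query normalization factor as 1 / sqrt(sumOfSquaredWeights), where the sum runs over the squared IDF of each query term. The Python sketch below only illustrates that formula. The IDF values are made up, and the idea that they differ between shards is an assumption to check against the explain output, not a confirmed diagnosis.

```python
import math

def query_norm(term_idfs):
    """Return 1 / sqrt(sum of squared IDF values), the formula given in the guide."""
    sum_of_squared_weights = sum(idf ** 2 for idf in term_idfs)
    return 1.0 / math.sqrt(sum_of_squared_weights)

# Hypothetical IDF values for the same two query terms on two different shards.
shard_a = query_norm([1.0, 2.0])
shard_b = query_norm([1.5, 2.5])
print(shard_a, shard_b)  # the shard with larger IDF values gets the smaller factor
```

If the factor does turn out to vary by shard, the dfs_query_then_fetch search type, which gathers term statistics across shards before scoring, may be worth trying.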

Related

How to boost most popular (top quartile) in elasticsearch query results (outliers)

I have an elasticsearch query that includes bool - must / should sections that I have refined to match search terms and boost for terms in priority fields, phrase match, etc.
I would like to boost documents that are the most popular. The documents include a field "popularity" that indicates the number of times the document was viewed.
Preferably, I would like to boost any documents in the result set that are outliers - meaning that the popularity score is perhaps 2 standard deviations from the average in the result set.
I see aggregations but I'm interested in boosting results in a query, not a report/dashboard.
I also noted the new rank_feature query in ES 7 (I am still on 6.8 but could upgrade). It looks like the rank_feature query looks across all documents, not the result set.
Is there a way to do this?
I think that you want to use a rank or a range query in a "rescore query".
If your need is too specific for classical queries, you can use a "function_score" query in your rescore and use a script to write your own score calculation:
https://www.elastic.co/guide/en/elasticsearch/reference/7.9/filter-search-results.html
https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-request-rescore.html
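A minimal sketch of that suggestion, written as a Python dict that could be sent as the search body. It assumes the outlier threshold is computed on the client side (here from a hypothetical list of popularity values, for example collected from an earlier request), since the rescore phase itself does not know the statistics of the result set. The helper names, the boost value and the window size are illustrative, not part of any Elasticsearch API.

```python
import statistics

def outlier_threshold(popularities, num_std=2.0):
    """Mean plus num_std population standard deviations of the given values."""
    return statistics.mean(popularities) + num_std * statistics.pstdev(popularities)

def build_rescore_body(base_query, threshold, boost=2.0, window_size=100):
    """Wrap base_query with a rescore phase that multiplies the score of popular docs."""
    return {
        "query": base_query,
        "rescore": {
            "window_size": window_size,  # only the top hits per shard are rescored
            "query": {
                "score_mode": "multiply",  # final score = original score * rescore score
                "rescore_query": {
                    "function_score": {
                        "query": {"match_all": {}},
                        "script_score": {
                            "script": {
                                # Painless: return the boost for outliers, 1.0 otherwise.
                                "source": "doc['popularity'].value >= params.threshold ? params.boost : 1.0",
                                "params": {"threshold": threshold, "boost": boost},
                            }
                        },
                        "boost_mode": "replace",  # use the script result as the rescore score
                    }
                },
            },
        },
    }

# Hypothetical usage: "title" and the sample values are placeholders.
body = build_rescore_body(
    {"match": {"title": "example"}},
    outlier_threshold([10, 12, 9, 11, 200]),
)
```

Because the rescore only touches the top window_size hits on each shard, the boost applies to the result set and not to every document in the index.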

ElasticSearch threshold for search results

I have an elasticsearch query that returns me the correct results in sorted order (the highest relevancy is at the top and is accurate). However, the query also returns me a lot of results and beyond the top 4 or 5, the results seem less relevant.
My question is: how can I set a threshold so that only the most relevant results are returned by the query?
You can use the size param in your Elasticsearch query to return a configured number of results. So in your example, if you think only the top 5 results are relevant for you, you can set this size param to 5.
Note that Elasticsearch results are already sorted by score, so using size 5 means the 5 most relevant documents are returned to you.
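As a small sketch, the request body could be built like this in Python. The helper name is hypothetical. The optional min_score key is a separate top-level search parameter that drops hits below a given score. It is shown only as an alternative to try, because a useful cutoff value depends on the query and the index.

```python
def top_n_body(query, n=5, min_score=None):
    """Build a search body that returns at most n hits, optionally with a score cutoff."""
    body = {"size": n, "query": query}
    if min_score is not None:
        body["min_score"] = min_score  # hits scoring below this value are excluded
    return body

# Hypothetical usage: return only the 5 best matches.
body = top_n_body({"match_all": {}})
```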

ElasticSearch scoring / number of doc

I have small (max 50 char) keywords stored in text field in ElasticSearch index. I noticed that if I clear the index and add only 1 document, let's say "samsung galaxy", the score when I match the document is like 0.95.
But when I add 500k other docs and I make the same query, the score is like 20. I would like to set a min_score for this query because I need a certain level of relevancy.
But as the score depends on the doc count, I can't set a min_score, since the number of docs in the index will constantly evolve.
I already looked for solutions like constant_score but I need the power of Elastic to give me a score (and not 1 or 0).
1) Does this behavior come from the IDF method or not only from it?
2) Is there a way to keep the current search algorithm (or just without the term frequency) and always have the same score for a query, without the doc count dependency? This would allow me to set a min_score.
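To illustrate how the IDF part alone can move the score, here is a small Python sketch of the IDF term used by BM25. It assumes the index uses BM25 (the default similarity since Elasticsearch 5.0), which the question does not state. The document counts are the ones from the example above, and a document frequency of 1 is an assumption.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """IDF term of BM25: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# One matching document in an index of 1 document versus 500,001 documents.
small_index = bm25_idf(doc_count=1, doc_freq=1)
large_index = bm25_idf(doc_count=500_001, doc_freq=1)
print(small_index, large_index)  # the second value is much larger
```

The rarer a term is relative to the index size, the larger this factor becomes, so the same matching document scores higher as unrelated documents are added.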

Elasticsearch: Modifying Field Normalization at Query Time (omit_norms in queries)

Elasticsearch takes the length of a document into account when ranking (they call this field normalization). The default behavior is to rank shorter matching documents higher than longer matching documents.
Is there any way to turn off or modify field normalization at query time? I am aware of the index-time omit_norms option, but I would prefer not to reindex everything to try this out.
Also, instead of simply turning off field normalization, I wanted to try out a few things. I would like to take field length into account, but not as heavily as elasticsearch currently does. With the default behavior, a document will rank 2 times higher than a document which is two times longer. I wanted to try a non-linear relationship between ranking and length.
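For comparison, the Elasticsearch guide describes the classic field-length norm as 1 / sqrt(numTerms). The Python sketch below evaluates that formula. It assumes the classic TF/IDF similarity and ignores the lossy one-byte encoding of stored norms, so the numbers are only indicative.

```python
import math

def field_norm(num_terms):
    """Classic field-length norm: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)

# Ratio of the norm of a field twice as long to that of the shorter field.
ratio = field_norm(8) / field_norm(4)
print(ratio)  # equals 1 / sqrt(2), roughly 0.71
```

If this formula applies to the index in question, doubling the field length scales the norm by about 0.71 instead of 0.5, so the relationship is already non-linear.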

Difference between Elasticsearch Range Query and Range Filter

I want to query elasticsearch documents within a date range. I have two options now, both work fine for me. Have tested both of them.
1. Range Query
2. Range Filter
Since I have a small data set for now, I am unable to test the performance of both. What is the difference between the two, and which one would result in faster retrieval of documents and a faster response?
The main difference between queries and filters has to do with scoring. Queries return documents with a relative ranked score for each document. Filters do not. This difference allows a filter to be faster for two reasons. First, it does not incur the cost of calculating the score for each document. Second, it can cache the results as it does not have to deal with possible changes in the score from moment to moment - it's just a boolean really, does the document match or not?
From the documentation:
Filters are usually faster than queries because:
1. they don’t have to calculate the relevance _score for each document: the answer is just a boolean “Yes, the document matches the filter” or “No, the document does not match the filter”.
2. the results from most filters can be cached in memory, making subsequent executions faster.
As a practical matter, the question is whether you use the relevance score in any way. If not, filters are the way to go. If you do, filters may still be of use but should be used where they make sense. For instance, if you had a language field (let's say language: "EN" as an example) in your documents and wanted to query by language along with a relevance score, you would combine a query for the text search with a filter for language. The filter would cache the document ids for all documents in English and then the query could be applied to that subset.
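A minimal sketch of that combination, written as a Python dict for the request body. It uses the filtered query from the documentation versions linked in this answer. The field name "body" and the helper name are hypothetical, and the term filter assumes the language field is stored without analysis.

```python
def filtered_search(text, language):
    """Score documents by text match, restricted to one language by an unscored filter."""
    return {
        "query": {
            "filtered": {
                # Scored part: contributes to the relevance _score.
                "query": {"match": {"body": text}},
                # Unscored part: a yes/no match that can be cached.
                "filter": {"term": {"language": language}},
            }
        }
    }

body = filtered_search("date range", "EN")
```

Later Elasticsearch versions dropped the filtered query in favor of a bool query with a filter clause, so the outer shape depends on the version in use.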
I'm oversimplifying a bit, but those are the basics. Good places to read up on this:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-filtered-query.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/0.90/query-dsl-filtered-query.html
http://exploringelasticsearch.com/searching_data.html
http://elasticsearch-users.115913.n3.nabble.com/Filters-vs-Queries-td3219558.html
Filters are cached so they are faster!
http://www.elasticsearch.org/guide/en/elasticsearch/guide/current/filter-caching.html
