ElasticSearch 5.1 geo distance range query deprecation? - elasticsearch

According to the Elasticsearch 5 release notes listing breaking changes, the geo_distance_range query is no longer supported.
Even though the official documentation doesn't mention such a change, using this API prints an error:
[geo_distance_range] queries are no longer supported for geo_point field types. Use geo_distance sort or aggregations
The suggestion is to use bucket aggregations instead of the geo_distance_range query feature.
But aggregations are by nature quite different from applying a search filter (possibly in combination with other filters).
Is there any way to achieve what this feature provided in previous versions of Elasticsearch?

Related

How to add aggregations for KNN search in elasticsearch?

I want to use aggregations over the search result of the knn_search api in elasticsearch (because I need facet search on the user interface), but I cannot pass the agg parameter as in the search api. Any suggestions?
Tldr;
As per the documentation of the GET /<index>/_knn_search endpoint, you simply cannot pass the agg parameter.
In 8.4
In the latest version of Elasticsearch, however, you can use kNN search in standard search queries.
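As a rough illustration of that last point, here is a minimal sketch of a request body for the standard search API that combines a knn section with aggs. The field names embedding and category and the aggregation name by_category are hypothetical, and the sketch only builds and prints the body; it assumes a version where the search API accepts a knn option.

```python
import json


def knn_search_with_aggs(query_vector, k=10, num_candidates=100):
    """Build a body for the standard _search API that combines knn with aggs."""
    return {
        "knn": {
            "field": "embedding",  # hypothetical dense_vector field
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "aggs": {
            # hypothetical facet over a keyword field
            "by_category": {"terms": {"field": "category"}}
        },
        "size": k,
    }


body = knn_search_with_aggs([0.1, 0.2, 0.3])
print(json.dumps(body, indent=2))
```

The body would be sent to /<index>/_search rather than to /<index>/_knn_search.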

ElasticSearch: given a document and a query, what is the relevance score?

Once a query is executed on ElasticSearch, a relevance _score is calculated for each retrieved document.
Given a specific document (e.g. by doc ID) and a specific query, I would like to see what is its _score?
One way is perhaps to query ES, retrieve all the hit documents, and look up the desired document out of all the retrieved documents to see its score.
I assume there should be a more efficient way to do this. Given a query and a document ID, what is its _score?
I'm using ElasticSearch 7.x
PS: I need this for a learning-to-rank scenario (to create my judgment list). In fact, I have a complex query built from various should and must clauses over different fields. My main requirement is to get the score for each individual sub-query, for which there seems to be no solution. I want to understand which parts of this complex query are more useful and which are less. The only way I've come up with is to execute each sub-query separately to get the score, but I don't want to actually execute the query; I just want to ask what the score of a specific document is for that sub-query.
The score of a document depends not only on the document itself and all other documents in the index, but also on various factors such as:
_score is calculated on a per-shard basis, not on an index basis, by default, although you can change this behavior by using the DFS Query Then Fetch param in your query. More info on this official blog.
Whether any boost is applied at index or query time (index-time boosting is deprecated from 5.x).
Whether any custom scoring function is used in addition to the default ES scoring algorithm (TF/IDF in old versions, BM25 in the latest versions).
Edit: Based on the comments from the other respected community members, rephrasing the below statement:
To answer your question: using the _explain API, you can see how Elasticsearch computes the score for a query and a specific document. This can give useful feedback on whether a document matched or didn't match a specific query.
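A minimal sketch of how such an _explain request could be put together for one sub-query, assuming the typeless 7.x path /<index>/_explain/<id>. The index name, document ID, and the match clause are made up, and the helper build_explain_request is hypothetical; it only builds the request and does not contact a cluster.

```python
import json


def build_explain_request(index, doc_id, query):
    """Return (method, path, body) for an _explain call on a single document."""
    path = f"/{index}/_explain/{doc_id}"
    return "GET", path, {"query": query}


# One sub-query taken out of a larger bool query; field and value are made up.
sub_query = {"match": {"title": "elasticsearch"}}
method, path, body = build_explain_request("my-index", "42", sub_query)
print(method, path)
print(json.dumps(body))
```

Repeating this per sub-query and per document ID would give one explanation per (sub-query, document) pair for a judgment list, without fetching all hits and searching through them.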

ElasticSearch: post_filter or filter?

Let's say I have a situation similar to the one explained here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-post-filter.html
Before I stumbled upon this article, I had been using filter instead of post_filter for this kind of scenario, and it produced the same output as post_filter.
My question is: Are they the same thing? If not, which one is the recommended and more efficient method to use and why?
As far as search hits are concerned, they are the same thing, i.e. the hits you get will be correctly filtered according to either your filter in a filtered query or the filter in your post_filter.
However, as far as aggregations are concerned, the end result will not be the same. The difference between both boils down to what document set the aggregations will be computed on.
If your filter is in a filtered query, then your aggregations will be computed on the document set selected by the query(ies) and the filter(s) in your filtered query, i.e. the same set of documents that you will get in the response.
If your filter is in a post_filter, then your aggregations will be computed on the document set selected by your various query(ies). Once aggregations have been computed on that document set, the latter is further filtered by the filter(s) in your post_filter before returning the matching documents.
To sum it up,
a filtered query affects both search results and aggregations
while a post_filter only affects the search results but NOT the aggregations
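The difference can be illustrated with a toy model in plain Python. This is not Elasticsearch code: the documents, the matches helper, and the search function are invented for illustration, and the only thing modeled is the order in which the filter and the aggregation are applied.

```python
from collections import Counter

# Toy documents standing in for an index.
docs = [
    {"brand": "gucci", "color": "red"},
    {"brand": "gucci", "color": "blue"},
    {"brand": "gucci", "color": "red"},
    {"brand": "prada", "color": "red"},
]


def matches(doc, clause):
    """True if the document has every field/value pair in the clause."""
    return all(doc[k] == v for k, v in clause.items())


def search(query, filter_clause, use_post_filter):
    """Return (hits, color_counts), modeling where the filter is applied."""
    selected = [d for d in docs if matches(d, query)]
    if not use_post_filter:
        # Filter inside the query: it narrows the set the aggregation sees.
        selected = [d for d in selected if matches(d, filter_clause)]
    color_counts = dict(Counter(d["color"] for d in selected))
    # With post_filter, the filter is applied only now, after aggregating.
    hits = [d for d in selected if matches(d, filter_clause)]
    return hits, color_counts


query = {"brand": "gucci"}
color_filter = {"color": "red"}
hits_f, aggs_f = search(query, color_filter, use_post_filter=False)
hits_p, aggs_p = search(query, color_filter, use_post_filter=True)
print(len(hits_f), aggs_f)
print(len(hits_p), aggs_p)
```

Both calls return the same two hits, but only the post_filter variant still counts the blue document in the aggregation.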
Another important difference between filter and post_filter that wasn't mentioned in any of the answers: performance.
TL;DR
Don't use post_filter unless you actually need it for aggregations.
From The Definitive Guide:
WARNING: Performance consideration
Use a post_filter only if you need to differentially filter search
results and aggregations. Sometimes people will use post_filter for
regular searches.
Don’t do this! The nature of the post_filter means it runs after
the query, so any performance benefit of filtering (such as caches) is
lost completely.
The post_filter should be used only in combination with
aggregations, and only when you need differential filtering.
In my tests, I found that filter behaves exactly like post_filter: both affect only the hits section.

Getting ElasticSearch Percolator Queries

I'm trying to query ElasticSearch for all the percolator queries that are currently stored on the system. My first thought was to do a match_all with a type filter but from my testing they don't seem to be returned if I do a match_all query. I haven't for the life of me been able to find the proper way to query them or any documentation on it so any help is greatly appreciated.
Also any other information on how stored percolator queries are treated differently from other types is appreciated.
For versions 5.x and later
Percolator documents should be returned in a query as with any other document.
Documentation of this new behavior can be found here.
Please note that with the removal of mapping types in 6.x it is unclear what will happen with the percolator index type. The reader may assume that it will be removed and that percolators will/should be stored in separate indices. Separating percolators into isolated indices is usually suggested regardless. Also please note that this 6.x type removal should not affect the answer to this question.
For versions before 5.0
This will return all percolator documents stored in your elasticsearch cluster:
POST _all/.percolator/_search
This searches _all indexes (every index you have registered) for documents of the .percolator type.
It basically does what you describe above: "a match_all with a type filter". Yet it accomplishes it in a slightly different way.
I have not experimented with this much further, but I assume this would actually allow you to perform a query/filter on percolators if you are looking for a percolator of a particular type.
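A minimal sketch of that idea for versions before 5.0, assuming the stored percolator documents carry an extra metadata field next to their query. The field name category, its value, and the helper percolator_search_request are hypothetical; the sketch only builds the request and has not been run against a cluster.

```python
import json


def percolator_search_request(metadata_filter=None):
    """Return (method, path, body) for searching stored percolator documents."""
    path = "/_all/.percolator/_search"
    if metadata_filter is None:
        query = {"match_all": {}}  # list every stored percolator
    else:
        query = {"term": metadata_filter}  # narrow by a metadata field
    return "POST", path, {"query": query}


method, path, body = percolator_search_request({"category": "alerts"})
print(method, path)
print(json.dumps(body))
```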

does kibana support max in queries?

I am hoping to find some information on the syntax of Kibana queries. I want a query that returns the max value of a field. Is this possible? I have seen some material on facets, but I'm not sure whether it applies.
I know that max is an option for the histogram, but I would like to use it elsewhere.
Since Kibana queries use the Lucene query syntax or RegEx, currently its queries seem to return matched records only (no aggregation).
I believe that aggregation (Max, for example) is only possible in Kibana Panels such as the Histogram.
