Getting ElasticSearch Percolator Queries - elasticsearch

I'm trying to query ElasticSearch for all the percolator queries that are currently stored on the system. My first thought was to do a match_all with a type filter but from my testing they don't seem to be returned if I do a match_all query. I haven't for the life of me been able to find the proper way to query them or any documentation on it so any help is greatly appreciated.
Also any other information on how stored percolator queries are treated differently from other types is appreciated.

For versions 5.x and later
Percolator documents should be returned in a query as with any other document.
Documentation of this new behavior can be found here.
Please note that with the removal of mapping types in 6.x it is unclear what will happen with the percolator index type. The reader may assume that it will be removed and that percolators will/should be stored in separate indices. Separating percolators into isolated indices is usually suggested regardless. Also please note that this 6.x type removal should not affect the answer to this question.
For versions before 5.0
This will return all percolator documents stored in your elasticsearch cluster:
POST _all/.percolator/_search
This searches _all indexes (every index you have registered) for documents of the .percolator type.
It basically does what you describe above: "a match_all with a type filter". Yet it accomplishes it in a slightly different way.
I have not played around with this much more than this, but I assume this would actually allow you to perform a query/filter on percolators if you are looking for a percolator of a particular type.

Related

Using stored_fields for retrieving a subset of the fields in Elastic Search

The documentation and recommendation for using stored_fields feature in ElasticSearch has been changing. In the latest version (7.9), stored_fields is not recommended - https://www.elastic.co/guide/en/elasticsearch/reference/7.9/search-fields.html
Is there a reason for this?
Where as in version 7.4.0, there is no such negative comment - https://www.elastic.co/guide/en/elasticsearch/reference/7.4/mapping-store.html
What is the guidance in using this feature? Is using _source filtering a better option? I ask because in some other doc, _source filtering is supposed to kill performance - https://www.elastic.co/blog/found-optimizing-elasticsearch-searches
If you use _source or _fields you will quickly kill performance. They access the stored fields data structure, which is intended to be used when accessing the resulting hits, not when processing millions of documents.
What is the best way to filter fields and not kill performance with Elastic Search?
source filtering is the recommended way to fetch the fields and you are getting confused due to the blog, but you seem to miss the very important concept and use-case where it is applicable. Please read the below statement carefully.
_source is intended to be used when accessing the resulting hits, not when processing millions of documents.
By default, elasticsearch returns only 10 hits/search results which can be changed based on the size parameter and if in your search results, you want to fetch few fields value than using source_filter makes perfect sense as it's done on the final result set(not all the documents matching search results),
While if you use the script, and using source value try to read field-value and filter the search result, this will cause queries to scan all the index which is the second part of the above-mentioned statement(not when processing millions of documents.)
Apart from the above, as all the field values are already stored as part of _source field which is enabled by default, you need not allocate extra space if you explicitly mark few fields as stored(disabled by default to save the index size) to retrieve field-values.

ElasticSearch: given a document and a query, what is the relevance score?

Once a query is executed on ElasticSearch, a relevance _score is calculated for each retrieved document.
Given a specific document (e.g. by doc ID) and a specific query, I would like to see what is its _score?
One way is perhaps to query ES, retrieve all the hit documents, and look up the desired document out of all the retrieved documents to see its score.
I assume there should be a more efficient way to do this. Given a query and a document ID, what is its _score?
I'm using ElasticSearch 7.x
PS: I need this for a learning-to-rank scenario (to create my judgment list). I have in fact a complex query that was created from various should and must over different fields. My major requirement was to get the score value for each individual sub-query, which seems there is no solution for it. I want to understand which part of this complex query is more useful and which one is less. The only way I've come up with is to execute each sub-query separately to get the score but I do not want to actually execute that query just asking for what is the score of a specific document for that sub-query.
Scoring of the document is not only related to just the document and all other documents in the index, but it also depends on various factor like:
_score is calculated per shard basis not on an index basis by default, although you can change this behavior by using DFS Query Then Fetch param in your query. More info on this official blog.
Is there is any boost applied at index or query time(index time is deprecated from 5.X).
Any custom scoring function is used in addition to the default ES scoring algorithm(tf/idf in old versions) and BM25 in the latest versions.
Edit: Based on the comments from the other respected community members, rephrasing the below statement:
To answer your question, Using the _explain API, you can understand how Elasticsearch computes a score explanation for a query and a specific document. This can give useful feedback on whether a document matches or didn’t match a specific query.

FieldvalueCache not bein populated in solr 6.3 even though we are faceting on several fields?

We have large index of 200 GB. We have query requirements where we perform faceting on 5-6 fileds (whitespace tokenized). I have read solr documents which says Faceting tokenized field will populate fieldvalueCache. But for some reason all the facets are cached in FieldCache rather than fieldvaluecahe. Can someone explain as to why this is happening?
I guess this is due to Solr favors docValue to fieldValueCache.
https://issues.apache.org/jira/browse/LUCENE-5666
if you want to use fieldValueCache, you can do via json facet.
https://issues.apache.org/jira/browse/SOLR-8466
here is some more discussion regarding the changes
https://issues.apache.org/jira/browse/SOLR-7190
Here is some related discussion in stackoverflow,
lucene Fields vs. DocValues

how can I find related keywords with elasticsearch?

I am pretty new to elasticsearch and already love it.
Right know I am interested in understanding on how I can let elasticsearch make suggestions for similar keywords.
I have already read this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html.
The More Like This Query (MLT Query) finds documents that are "like" a given set of documents.
This is already more than I am looking for. I dont need similar documents but only related / similar keywords.
So lets say I have an index of documents about movies and I start a query about "godfather". Then elasticsearch should suggest related keywords - e.g. "al pacino" or "Marlon Brando" because they are likely to occur in the same documents.
any ideas how this can be done?
Unfortunately, there is no built-in way to do that in Elastic. What you could possibly do, is to write a program, that will query Elastic, return matched documents, then you will get the _source data, or just retrieve it from your original datasource (like DB or file), later you will need to calculate TF-IDF for each term in the retrieved ones and somehow combine everything all together to get top K terms out of all returned terms.

Set up Elasticsearch suggesters that can return suggestions from different data types

We're in the process of setting up Amazon Elasticsearch Service (running Elasticsearch version 2.3).
We have different types of data (that I'm currently thinking of as different document types within the same index).
We have a generic search in an app where we want an inline autocomplete function, that is, a completion suggester returning hits from all different data (document) types. How can that be set up?
When querying suggesters you have to specify an index, so that's why I wanted to keep all the data in the same index. According to the documentation, the completion suggester considers all documents in the index.
Setting up the completion suggester for the first document type was pretty straight forward and is working great. However, as far as I can see you to specify a suggest field when querying. That would be all good hadn't it been for the error message we get when setting up the mapping for the second document type:
Type: illegal_argument_exception Reason: "[suggest] is defined as an object in mapping [name_of_document_type] but this name is already used for a field in other types"
Writing this question I see that it's possible to specify more than one suggester in a single suggest query. Maybe that is what we have to solve it? (I.e. get X results from Y suggesters where we compare the scores to get the 1 suggestion we want to present to the user.)
One of the core principles of good data design for Elasticsearch (as with many data stores) is to optimise your data storage for ease of reading. Usually, this means embracing duplication.
With this in mind, I'd suggest having a separate autocomplete index with a mapping that's designed specifically for the suggester queries.
Whenever you insert or write one of your other documents, map it to your autocomplete type and add or update it in your autocomplete index at the same time (or, depending on how up-to-date it needs to be, create an offline process to update your autocomplete index e.g. every day).
Then, when you do your suggest query, you can just use your autocomplete index and not worry about dealing with different types of documents with different fields.

Resources