Is the count of search results from Elasticsearch accurate, or is it approximate? If it is accurate, will it always be accurate based on what is indexed? Assuming that all documents are indexed along with all the text in them, and no documents are being ingested, what is the behavior of the count of search results? Does it change based on the volume of the index and the volume of the search results? Thanks for your help!
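A hedged note on reading the reported count: in Elasticsearch 6.x, `hits.total` in a search response is a plain integer, while from 7.0 onward it is an object with `value` and `relation`. A `relation` of `gte` means the value is only a lower bound (by default totals are tracked exactly up to 10,000 hits unless `track_total_hits` is set to true). The helper below is a minimal sketch for telling the two cases apart. The name `describe_total` is hypothetical, and it only inspects a response dict, it does not talk to a cluster.

```python
def describe_total(search_response):
    """Return (count, is_exact) from a search response dict.

    Handles both the 6.x shape (bare integer) and the 7.x+ shape
    ({"value": ..., "relation": "eq" | "gte"}).
    """
    total = search_response["hits"]["total"]
    if isinstance(total, int):
        # 6.x-style response: the total is a bare integer
        return total, True
    # 7.x-style response: "gte" means the value is a lower bound
    return total["value"], total["relation"] == "eq"


# Hand-written sample responses, for illustration only
old_style = {"hits": {"total": 42}}
new_style = {"hits": {"total": {"value": 10000, "relation": "gte"}}}

print(describe_total(old_style))
print(describe_total(new_style))
```

If an exact figure matters more than the hits themselves, the `_count` API or `track_total_hits: true` on the search request are the usual options to look at.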
I understand that Elasticsearch has term vectors, which can give the word position and other statistics.
Can the percolator give the word position in the documents that are being searched?
I understand that the documents are not indexed and only the percolator queries are indexed. I see the following
If the requested information wasn’t stored in the index, it will be computed on the fly if possible. Additionally, term vectors could be computed for documents not even existing in the index, but instead provided by the user.
in - https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html
So I am interested to know whether Elasticsearch can calculate the word position on the fly.
Any leads are appreciated. Thanks for reading.
#Kaveh
Thanks for taking the time, but I'm sorry, I don't see how this (https://stackoverflow.com/a/67926555/4068218) is related, because using artificial documents I can already get the stats - https://www.elastic.co/guide/en/elasticsearch/reference/6.8/docs-termvectors.html
but what I have is a percolator - https://www.youtube.com/watch?v=G2Ru2KV0DZg
So even if I get the term vectors on the fly, using artificial documents or /_analyze, it does not help, as they will not give me the position of the terms (in the percolator).
E.g. percolator: I am trying to find the word Hello.
My document has the below field and value
"text": "Hello World"
If I use artificial documents or /_analyze, it will say 0 - Hello, 1 - World, but when I percolate I will only get the percolator query that found the word Hello. I want to combine both and have the percolator tell me
"I found Hello in position 0"
As you can see in the documentation for term vectors, if you store _source, Elasticsearch can calculate the term vectors on the fly. It analyzes your text based on the source and aggregates it with the existing term vectors of the index.
If you want the result for specific terms, you can always get your analyzed data for a list of terms; see here for more information.
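One possible client-side way to combine the two steps, sketched under assumptions: percolate the document first, then request term vectors for the same text as an artificial document and look up the positions of the terms you care about. The sketch below only shows the second half. `artificial_doc_request` is the kind of body the term vectors API accepts for a user-supplied document, `sample_response` is a hand-written response in the documented shape (not real cluster output), and `term_positions` is a hypothetical helper name. Note that with the standard analyzer the term would be lowercased to `hello`.

```python
# Body for a term vectors request on an artificial (non-indexed) document
artificial_doc_request = {
    "doc": {"text": "Hello World"},
    "fields": ["text"],
    "positions": True,
}

# Hand-written example of the response shape, for illustration only
sample_response = {
    "term_vectors": {
        "text": {
            "terms": {
                "hello": {
                    "term_freq": 1,
                    "tokens": [{"position": 0, "start_offset": 0, "end_offset": 5}],
                },
                "world": {
                    "term_freq": 1,
                    "tokens": [{"position": 1, "start_offset": 6, "end_offset": 11}],
                },
            }
        }
    }
}


def term_positions(termvectors_response, field, term):
    """Return the list of positions of `term` in `field`, or [] if absent."""
    terms = (
        termvectors_response.get("term_vectors", {})
        .get(field, {})
        .get("terms", {})
    )
    tokens = terms.get(term, {}).get("tokens", [])
    return [token["position"] for token in tokens]


positions = term_positions(sample_response, "text", "hello")
print(f"I found hello in position(s) {positions}")
```

Which terms to look up would have to come from the matched percolator query itself; that mapping is left out here because it depends on how the queries are stored.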
Elasticsearch uses an inverted index, which is totally understandable because it returns all the documents containing the word we searched for.
But I do not understand where we would use a forward index. We don't usually search for a document and expect to get the words contained in that particular document.
Is there any practical use case for a forward index? Is any company using it in its product?
As mentioned in this SO answer, there is no technical difference between the forward index and the inverted index. A forward index is a list of the terms contained within a particular document. An inverted index is a list of the documents containing a given term.
Please go through this blog, where it is clearly explained that a forward index is fast to build but less efficient to query,
whereas an inverted index is slower to build but fast to query. For a detailed explanation of the inverted index, you can refer to this article and this blog.
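To make the two definitions concrete, here is a toy sketch in plain Python. It uses lowercasing and whitespace splitting as a stand-in for a real analyzer, and the function names are made up for illustration; it is not how Lucene stores either structure.

```python
from collections import defaultdict


def build_forward_index(docs):
    """Map each document ID to the list of terms it contains."""
    return {doc_id: text.lower().split() for doc_id, text in docs.items()}


def build_inverted_index(forward_index):
    """Map each term to the set of document IDs that contain it."""
    inverted = defaultdict(set)
    for doc_id, terms in forward_index.items():
        for term in terms:
            inverted[term].add(doc_id)
    return dict(inverted)


docs = {1: "Hello World", 2: "Hello Elasticsearch"}

forward = build_forward_index(docs)
inverted = build_inverted_index(forward)

# Forward index: cheap to build, answers "which terms are in document 1?"
print(forward[1])
# Inverted index: needs an extra pass, answers "which documents contain 'hello'?"
print(sorted(inverted["hello"]))
```

Answering "which documents contain a term?" from the forward index alone would mean scanning every document, which is the query-time cost the answer refers to.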
I am using Elasticsearch 6.8 and I get a list of documents in the response. Some of the documents have the same score, but they consistently appear in the same order in the response list. I wonder what algorithm ES uses to sort documents with the same score?
ES uses the index order to break ties when sorting on score.
The index order is defined by the _doc field.
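The tie-breaking rule can be pictured with a toy model in plain Python: sort by score descending, then by document number ascending. The `doc` values below are stand-ins for the internal index order and the data is made up; this is an illustration of the ordering, not Elasticsearch's actual implementation.

```python
def rank(hits):
    """Order hits by score (highest first), breaking ties by doc number."""
    return sorted(hits, key=lambda hit: (-hit["score"], hit["doc"]))


# Made-up hits: docs 3 and 7 tie on score
hits = [
    {"doc": 7, "score": 1.2},
    {"doc": 3, "score": 1.2},
    {"doc": 5, "score": 2.0},
]

for hit in rank(hits):
    print(hit["doc"], hit["score"])
```

Because the internal order can differ between shard copies and may change as segments are merged, adding an explicit secondary sort on a field of your own is the safer choice when a stable order matters.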
If I have a set of documents as a response from Elasticsearch,
how can I aggregate the results based on the score? The results should have two buckets, where the first bucket has the documents whose score is greater than 1 and the other has the documents whose score is less than 1.
I am new to Elasticsearch and have seen that I can use a script for this, but could not get that working.
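One approach that may work here is a range aggregation whose value comes from a script returning `_score`. The sketch below only builds the request body as a Python dict. The function name `score_buckets_body`, the aggregation name `score_ranges`, and the bucket keys are all made up, and it is worth verifying against your Elasticsearch version that `_score` is available to aggregation scripts.

```python
def score_buckets_body(query, threshold=1.0):
    """Build a search body that buckets matching documents by score."""
    return {
        "query": query,
        "aggs": {
            "score_ranges": {
                "range": {
                    # Use the document's relevance score as the bucketed value
                    "script": {"source": "_score"},
                    "ranges": [
                        # "to" is exclusive, "from" is inclusive
                        {"key": "below", "to": threshold},
                        {"key": "at_or_above", "from": threshold},
                    ],
                }
            }
        },
    }


body = score_buckets_body({"match": {"text": "hello"}})
print(body["aggs"]["score_ranges"]["range"]["ranges"])
```

Note that a document scoring exactly 1 lands in the upper bucket, since range aggregations include the `from` value and exclude the `to` value.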
I have a corpus of documents indexed. I also stored the term vectors when indexing. Now I want to retrieve the term vectors of all documents satisfying some filtering options.
I was able to get term vector for a single document or for a set of documents by providing the document IDs. But is there a way to get term vectors for all the documents without providing document IDs?
Eventually what I want to do is to get the frequency counts of all the terms in a field, for all documents in an index (i.e., a bag of words matrix).
I am using elasticsearch-py as a client.
Appreciate any pointers. Thanks!
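A possible direction, sketched under assumptions: as far as I know the term vectors APIs need document IDs (or artificial documents), so one option is to collect the IDs of the matching documents first (for example with `elasticsearch.helpers.scan` and the filter query), send them to `mtermvectors` in batches, and fold the responses into per-document term counts. The code below covers only the batching and folding steps in plain Python. `chunked` and `bag_of_words` are hypothetical helper names, and `sample_response` is hand-written in the multi term vectors response shape, not real output.

```python
from collections import Counter


def chunked(ids, size):
    """Yield successive batches of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def bag_of_words(mtermvectors_response, field):
    """Map each document ID to a Counter of term frequencies for `field`."""
    counts = {}
    for doc in mtermvectors_response.get("docs", []):
        terms = doc.get("term_vectors", {}).get(field, {}).get("terms", {})
        counts[doc["_id"]] = Counter(
            {term: info["term_freq"] for term, info in terms.items()}
        )
    return counts


# Hand-written example of a multi term vectors response, for illustration only
sample_response = {
    "docs": [
        {"_id": "1", "term_vectors": {"text": {"terms": {
            "hello": {"term_freq": 2}, "world": {"term_freq": 1}}}}},
        {"_id": "2", "term_vectors": {"text": {"terms": {
            "hello": {"term_freq": 1}}}}},
    ]
}

matrix = bag_of_words(sample_response, "text")
print(matrix["1"]["hello"], matrix["2"]["world"])
```

The resulting dict of Counters can be turned into a dense or sparse matrix by fixing a vocabulary order; terms missing from a document simply count as 0.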