If elastic search is using inverted index, I want to know how elasticsearch is able to support range queries and phrase queries.
Note: I saw that inverted index supports them but i am not clear on how they do it internally.
Found the link ..
Reference : https://blog.parse.ly/post/1691/lucene/
Here’s a snippet from Lucene in Action on the topic: “If you indexed your field with NumericField, you can efficiently search a particular range for that field using NumericRangeQuery. Under the hood, Lucene translates the requested range into the equivalent set of brackets in the indexed trie structure.”
This blog actually has some nice information on lucene indexes.
Related
When I search say car engine(this is first time any user has searched for this keyword) in Elastic search/lucene , does search engine search the index for individual words in index table first and then find intersection. For example :- Say engine found the 10
documents for car and then it will search for engine say it got 5 documents. Now in 5 documents(minimal no of documents), it will search for car. It has found 2 documents.
Now search engine will rank it based on above results . Is this how multiple words are searched in index table at high level ?
For future searches against same keyword, does search engine make new entry for key car engine in index table ?
Yes, it does search for individual terms and takes the intersection or union of the results, according to your query. It uses something called an "inverted index" which it generates, as and when the documents to be searched are "indexed" into elasticsearch.
Indexing operations are different from searching. So, No, it wouldn't index user searches unless you tell it to (in your application).
The basic functioning of elasticsearch can be split into two parts:
Indexing. You create an index of documents by indexing all the documents that you want to search in. These documents could be anything from your MySQL store, or from Logstash etc, or could be made up of users' search queries that your application indexes into a relevant elastic index.
Searching. You search for the indexed documents using some keywords that could be user generated or application generated or a mixture, using ElasticSearch queries (DSL). If a result is found (according to your query) then elasticsearch returns the relevant records.
I'd encourage you to read this doc for a better understanding of how elastic searches docs:
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
I have a collection with thousands of documents each of which contains a string to be searched for. I would like to make an index for these strings like so:
index a "an apple"
index a "arbitrary value"
index s "something"
I think I will be able to improve the search performance if I create these indices so that when I search for 'something', I can only look up documents in the index 's'. I am new to database design and wonder if this is the right way to improve the performance of the queries with string values. Is there any better way to do this or does mongodb have a built in mechanism to achieve this kind of indexing? Please enlighten me.
You can create indexes based on the keys and not on the values.
Each document will have a default index created on the _id field.
You can also create compound Index, ie combining on or more fields
Creation of Index should be appropriate to your search, so that your search queries will be faster.
http://docs.mongodb.org/manual/indexes/
I'm trying to query ElasticSearch for all the percolator queries that are currently stored on the system. My first thought was to do a match_all with a type filter but from my testing they don't seem to be returned if I do a match_all query. I haven't for the life of me been able to find the proper way to query them or any documentation on it so any help is greatly appreciated.
Also any other information on how stored percolator queries are treated differently from other types is appreciated.
For versions 5.x and later
Percolator documents should be returned in a query as with any other document.
Documentation of this new behavior can be found here.
Please note that with the removal of mapping types in 6.x it is unclear what will happen with the percolator index type. The reader may assume that it will be removed and that percolators will/should be stored in separate indices. Separating percolators into isolated indices is usually suggested regardless. Also please note that this 6.x type removal should not affect the answer to this question.
For versions before 5.0
This will return all percolator documents stored in your elasticsearch cluster:
POST _all/.percolator/_search
This searches _all indexes (every index you have registered) for documents of the .percolator type.
It basically does what you describe above: "a match_all with a type filter". Yet it accomplishes it in a slightly different way.
I have not played around with this much more than this, but I assume this would actually allow you to perform a query/filter on percolators if you are looking for a percolator of a particular type.
I want to build elastisearch queries using JAVA API. I want to know how to can use Lucene analyzers in elasticsearch java programs. I have checked QueryBuilders and tried to use analyzers directly as below.
QueryBuilder builder = QueryBuilders.matchQuery(searchString, fields).analyzer("porterstem");
But, it turned out to be wrong. If any one tried it, could you please give me some information?
You should define your analyzer in mapping.
So the analyzer will be used at index time and at query time.
ANALYZERS are used to analyze the documents that your are indexed. Analysis means it Ll split,the text in to tokens, normalize it, and also Lower case your indexed doc text. This analysis process Ll b more helpful while you search and searching will be faster..
You can mention analyzer while you query . But analyze the stored documents during query time. Ll b expensive. So analyze the document during indexing time. ES will analysis the doc during indexed and query time will b less and faster result.
So mention analyzers in mapping and searching efficiently..
For more information about analyzer refer
https://lucene.apache.org/core/4_0_0/core/org/apache/lucene/analysis/Analyzer.html
Is there a way to do faceted searches using the elasticsearch Search API maintaining case (as opposed to having the results be converted to lowercase).
Thanks in advance, Chuck
Assuming you are using the "terms" facet, the facet entries are exactly the terms in the index. Briefly, analysis is the process of converting a field value into a sequence of terms, and lowercasing is a step in the default analyzer; that's why you're seeing lowercased terms. So you will want to change your analysis configuration (and perhaps introduce a multi_field if you want to run several different analyzers.)
There's a great explanation in Lucene in Action (2nd Ed.); it's applicable to ElasticSearch, too.