Elasticsearch/Lucene index on multiple words?

When I search for, say, car engine in Elasticsearch/Lucene (the first time any user has searched for this keyword), does the search engine look up each individual word in the index first and then find the intersection? For example, say the engine finds 10 documents for car, and then searches for engine and finds 5 documents. Within those 5 documents (the smaller set), it searches for car and finds 2 documents.
The search engine then ranks them based on these results. Is this, at a high level, how multiple words are searched in the index?
For future searches with the same keyword, does the search engine make a new entry for the key car engine in the index?

Yes, it searches for the individual terms and takes the intersection or union of the results, depending on your query. It uses a data structure called an "inverted index", which it builds as the documents to be searched are indexed into Elasticsearch.
Indexing operations are separate from searching. So no, it won't index user searches unless you tell it to (in your application).
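To make the inverted-index idea concrete, here is a toy Python sketch (not Lucene's actual implementation): each term maps to the set of document IDs that contain it, and a multi-word query is answered by combining those sets, intersection when all terms must match and union when any term may match. Starting the intersection from the shortest posting list, as the question guesses, is a common optimization. The sample documents are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase, whitespace-separated term to the set of doc IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query, operator="or"):
    """Look up each query term, then combine the posting lists.

    operator="and" -> intersection (every term must match)
    operator="or"  -> union (any term may match)
    """
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if operator == "and":
        # Start from the shortest posting list so the candidate set shrinks quickly.
        postings.sort(key=len)
        result = set(postings[0])
        for posting in postings[1:]:
            result &= posting
        return result
    return set().union(*postings)


# Hypothetical documents, keyed by ID.
docs = {
    1: "red car",
    2: "car engine repair",
    3: "jet engine",
    4: "car engine oil",
}
idx = build_inverted_index(docs)
print(sorted(search(idx, "car engine", "and")))  # [2, 4]
print(sorted(search(idx, "car engine", "or")))   # [1, 2, 3, 4]
```

Note that nothing here stores the query itself: searching only reads the index built from the documents.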
The basic functioning of Elasticsearch can be split into two parts:
Indexing. You create an index of documents by indexing all the documents that you want to search in. These documents could come from your MySQL store, from Logstash, etc., or could be made up of users' search queries that your application indexes into a relevant Elasticsearch index.
Searching. You search the indexed documents using keywords that could be user-generated, application-generated, or a mixture, using Elasticsearch queries (Query DSL). If a result is found (according to your query), Elasticsearch returns the relevant records.
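On the searching side, whether the per-term results are intersected or unioned is chosen in the query. A minimal sketch, assuming a hypothetical text field named title: the match query's operator parameter defaults to "or" (union) and can be set to "and" (intersection). The function only builds the request body you would send to GET /<index>/_search; it does not talk to a cluster.

```python
import json


def match_query(field, text, require_all_terms=False):
    """Build a `match` query body.

    `operator` decides how per-term matches are combined:
    "or" (union, the Elasticsearch default) or "and" (intersection).
    """
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "operator": "and" if require_all_terms else "or",
                }
            }
        }
    }


# "title" is a hypothetical field name.
body = match_query("title", "car engine", require_all_terms=True)
print(json.dumps(body, indent=2))
```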
I'd encourage you to read this article for a better understanding of how Elasticsearch searches documents:
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up

Related

How to get all the index patterns which never had any documents?

For Kibana server decommissioning purposes, I want to get a list of the index patterns that never had a single document, and of those that did have documents.
How can I achieve this using Kibana only?
I tried this, but it doesn't give the list based on the document count:
GET /_cat/indices
Also, checking the count for each index pattern individually is time-consuming:
GET index-pattern*/_count
You can try this. The output includes a docs.count column for each index; v is for verbose (it adds column headers) and s stands for sort:
GET /_cat/indices?v&s=store.size:desc
From the docs:
These metrics are retrieved directly from Lucene, which Elasticsearch uses internally to power indexing and search. As a result, all document counts include hidden nested documents.
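If you want the empty indices listed directly rather than reading them off the sorted table, the same _cat API can return JSON (format=json) restricted to chosen columns (h=index,docs.count), and s=docs.count:asc puts empty indices first. The Python sketch below classifies a response of that shape; the sample response is hypothetical. Two caveats: docs.count is the current count, so an index whose documents were all deleted also shows 0, and mapping Kibana index patterns to concrete indices is a separate step not covered here.

```python
import json

# Hypothetical response from: GET /_cat/indices?format=json&h=index,docs.count
# _cat APIs return numbers as strings in JSON; closed indices may report null.
sample_response = """[
  {"index": "logs-2019.01", "docs.count": "1520"},
  {"index": "logs-empty",   "docs.count": "0"},
  {"index": "metrics-old",  "docs.count": null}
]"""


def split_by_doc_count(cat_json):
    """Split index names into (empty, non_empty, unknown) by their docs.count."""
    rows = json.loads(cat_json)
    empty, non_empty, unknown = [], [], []
    for row in rows:
        count = row.get("docs.count")
        if count is None:
            unknown.append(row["index"])  # e.g. a closed index with no stats
        elif int(count) == 0:
            empty.append(row["index"])
        else:
            non_empty.append(row["index"])
    return empty, non_empty, unknown


empty, non_empty, unknown = split_by_doc_count(sample_response)
print("empty:", empty)
print("non-empty:", non_empty)
print("unknown:", unknown)
```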

Confusion about Elasticsearch

I'm confused about what an Elasticsearch index is.
In some places I read that it's the equivalent of an RDBMS database, and in others that an index is like the one at the end of a book: a list of words with the documents that contain each word.
Could someone clarify?
Thanks
An Elasticsearch cluster can contain multiple indices (databases). These indices hold multiple documents (rows), and each document has properties or fields (columns).
You can check the list of your available indices with http://localhost:9200/_cat/indices?v.
But in general (in computer science and databases), indexing means what you said:
list of words with corresponding documents that contain the word
This structure improves the speed of data retrieval operations on a database table, and the concept is used in many databases such as MySQL or Oracle. In Elasticsearch, every field of every document is indexed by default (you can change this setting so that some fields are not indexed).
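Whether a field goes into the inverted index is controlled per field in the mapping. A minimal sketch, assuming Elasticsearch 7.x or later (typeless mappings) and hypothetical field names: setting "index": false keeps the value in the stored document (_source) but makes the field unsearchable. The function only builds the body for a PUT /<index> request.

```python
import json


def mapping_with_unindexed(indexed_fields, unindexed_fields):
    """Build a typeless mapping body (7.x+ style, an assumption).

    Text fields are indexed (searchable) by default; "index": False keeps a
    field in _source but leaves it out of the inverted index.
    """
    properties = {name: {"type": "text"} for name in indexed_fields}
    for name in unindexed_fields:
        properties[name] = {"type": "text", "index": False}
    return {"mappings": {"properties": properties}}


# Hypothetical field names.
body = mapping_with_unindexed(["title", "description"], ["internal_notes"])
print(json.dumps(body, indent=2))
```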

Find doc ids before query phase?

When we run any search, Elasticsearch performs it in two phases, the query phase and the fetch phase, as explained in the section "Default search type: Query Then Fetch" at this resource.
Here are the points:
Send the query to each shard
Find all matching documents and calculate scores using local Term/Document Frequencies
Build a priority queue of results (sort, pagination with from/to, etc)
..
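The query-then-fetch flow described in these steps can be simulated in a few lines of Python. This is a toy model with made-up shards and a stand-in score (a count of query-term occurrences, not Elasticsearch's real BM25 scoring): each shard scores only its own documents and returns its local top hits as (score, id) pairs, the coordinating node merges them, and only the winning documents are fetched.

```python
import heapq

# Hypothetical data: two shards, each holding its own documents.
shards = {
    0: {"a": "car engine car", "b": "car seat"},
    1: {"c": "engine oil", "d": "car engine"},
}


def query_phase(shard_docs, terms, size):
    """Score the docs of ONE shard and return its top `size` (score, doc_id) pairs.

    The score is a stand-in: the number of query-term occurrences in the doc.
    """
    scored = []
    for doc_id, text in shard_docs.items():
        words = text.split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(size, scored)


def query_then_fetch(terms, size):
    # Query phase: the coordinating node asks every shard for its local top hits.
    candidates = []
    for shard_id, docs in shards.items():
        for score, doc_id in query_phase(docs, terms, size):
            candidates.append((score, doc_id, shard_id))
    # The coordinating node merges the per-shard results into a global top `size`.
    top = heapq.nlargest(size, candidates)
    # Fetch phase: only the winning documents are loaded from their shards.
    return [(doc_id, score, shards[shard_id][doc_id]) for score, doc_id, shard_id in top]


for hit in query_then_fetch(["car", "engine"], size=2):
    print(hit)
```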
I have a question about point 1 of the query phase. My understanding is that before the query phase itself, Elasticsearch finds the relevant document IDs from the inverted index based on the words in the search query.
Then the query goes only to specific shards instead of to every shard. Is that correct?
So in the query phase, does Elasticsearch fetch those documents from the shard based on the document IDs obtained from the inverted index, calculate scores for the fetched documents, and return the IDs along with the scores to the requesting node?
In the fetch phase, the requesting node gets all the scores, decides what needs to be sent to the client, and then actually fetches the documents.
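No. Elasticsearch does not pick shards by document ID for a search: routing by document ID applies when indexing or getting a single document, while a search query is sent to every shard unless you supply an explicit routing value. The inverted index does not come into the picture before the query phase. Once the query reaches a shard, Elasticsearch consults that shard's own inverted index to find which documents contain each term.
The rest is as described in the resource you mentioned.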
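A small sketch of why a search cannot be routed by document ID. It assumes a simplified routing formula, shard = hash(routing) % number_of_primary_shards; the real implementation hashes _routing (which defaults to _id) with Murmur3 and applies a routing factor, so crc32 here is only a deterministic stand-in. A get-by-ID can compute its single target shard, while a search without an explicit routing value has to go to every shard.

```python
import zlib

NUM_PRIMARY_SHARDS = 3  # hypothetical index with 3 primary shards


def shard_for(routing_value):
    """Simplified routing: hash(routing) % number_of_primary_shards.

    crc32 stands in for Elasticsearch's Murmur3-based routing hash.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % NUM_PRIMARY_SHARDS


def shards_for_get(doc_id):
    # A get-by-ID knows the ID up front, so it targets exactly one shard.
    return [shard_for(doc_id)]


def shards_for_search(routing=None):
    # A search does not know which IDs will match, so without an explicit
    # routing value it must ask every shard.
    if routing is not None:
        return [shard_for(routing)]
    return list(range(NUM_PRIMARY_SHARDS))


print(shards_for_get("doc-42"))
print(shards_for_search())
```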

Elasticsearch Searching over large number of fields in a large index

On Elasticsearch 5.6.
We have a requirement to implement a context-free search feature (a simple Google-like "search anything") that can operate over an index with 1000 fields. The index itself can be big (1 million docs per day).
I was looking at the query_string query with fields set to '*', and I came across this section:
https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-search-speed.html#_search_as_few_fields_as_possible
which says that searching over many fields slows down the search, and that a common pattern is to have an "all"-like field with all the values combined, and run the search on it.
While this is perfectly possible, my requirement is a bit more complex: these 1000 fields are protected by document level security using X-Pack security. Therefore, if I search only the "all"-like field, the top result might be a document for which the user doesn't actually have permission on any of the relevant fields. I foresee a gap here. Any thoughts or possible solutions?
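The "all"-like field from the linked tuning guide is usually built with copy_to. A minimal sketch that generates the properties fragment of a mapping, assuming hypothetical field names field_0 to field_999 and a hypothetical catch-all field named all_text (in 5.6 this fragment must sit under a mapping type name; in 7.x and later it goes directly under mappings):

```python
import json


def catch_all_properties(field_names, target="all_text"):
    """Build a mapping `properties` fragment in which every listed text field
    is also copied into one catch-all field, so a query can target one field
    instead of 1000."""
    properties = {
        name: {"type": "text", "copy_to": target} for name in field_names
    }
    properties[target] = {"type": "text"}
    return properties


fields = [f"field_{i}" for i in range(1000)]  # hypothetical field names
props = catch_all_properties(fields)
query = {"query": {"match": {"all_text": "some search text"}}}
print(json.dumps(props["field_0"]), json.dumps(query))
```

A query then targets only all_text. Note that all_text holds copies of every source field's values, which is why it sits awkwardly with per-field permissions, as the question points out.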

mongodb search indexing performance

I have a collection with thousands of documents, each of which contains a string to be searched. I would like to make an index for these strings like so:
index a "an apple"
index a "arbitrary value"
index s "something"
I think I can improve search performance by creating these indices, so that when I search for 'something', I only look up documents in index 's'. I am new to database design and wonder whether this is the right way to improve the performance of queries on string values. Is there a better way to do this, or does MongoDB have a built-in mechanism for this kind of indexing? Please enlighten me.
You create indexes on keys (fields), not on values.
Each collection has a default index on the _id field.
You can also create a compound index, i.e. one that combines multiple fields.
Create indexes that match your search queries, so that those queries run faster.
http://docs.mongodb.org/manual/indexes/
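To illustrate why per-letter indexes aren't needed, here is a toy Python model of a single-field index (not MongoDB's actual B-tree code): values are kept sorted, so a "starts with s" search is a range scan over one index. In MongoDB itself, an index on the field (for example db.items.createIndex({ name: 1 }), where items and name are hypothetical) can serve anchored, case-sensitive prefix regexes such as /^s/ in a similar way. The sample strings come from the question.

```python
import bisect

# Hypothetical documents: _id -> value of the field being searched.
docs = {1: "an apple", 2: "arbitrary value", 3: "something", 4: "some other thing"}

# A single-field index keeps (value, _id) pairs sorted by value, much like a B-tree.
index = sorted((value, doc_id) for doc_id, value in docs.items())


def prefix_lookup(index, prefix):
    """Return the _ids whose value starts with `prefix`, scanning only that slice."""
    # (prefix,) sorts before every (value, _id) pair whose value starts with prefix.
    start = bisect.bisect_left(index, (prefix,))
    ids = []
    for value, doc_id in index[start:]:
        if not value.startswith(prefix):
            break  # sorted order: no later value can match
        ids.append(doc_id)
    return ids


print(prefix_lookup(index, "s"))  # [4, 3]
print(prefix_lookup(index, "a"))  # [1, 2]
```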
