I have a collection with thousands of documents each of which contains a string to be searched for. I would like to make an index for these strings like so:
index a "an apple"
index a "arbitrary value"
index s "something"
I think I will be able to improve the search performance if I create these indices so that when I search for 'something', I can only look up documents in the index 's'. I am new to database design and wonder if this is the right way to improve the performance of the queries with string values. Is there any better way to do this or does mongodb have a built in mechanism to achieve this kind of indexing? Please enlighten me.
In MongoDB you create indexes on fields (keys), not on individual values.
Each collection has a default index on the _id field.
You can also create a compound index, i.e. one that combines two or more fields.
Choose the indexed fields to match the fields your queries filter on, so that those queries run faster.
http://docs.mongodb.org/manual/indexes/
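To illustrate why a single index on the field already covers the first-letter buckets from the question, here is a minimal sketch in plain Python. It is not MongoDB code: the sorted list only stands in for the ordered index a database keeps on a field, and the document shape and function names are made up for the example.

```python
import bisect

# Hypothetical documents; "text" is the field being searched.
docs = [
    {"_id": 1, "text": "something"},
    {"_id": 2, "text": "an apple"},
    {"_id": 3, "text": "arbitrary value"},
]

# A sorted list of (value, _id) pairs stands in for the ordered index
# that a database would maintain on the "text" field.
index = sorted((d["text"], d["_id"]) for d in docs)
keys = [value for value, _ in index]

def find_equal(value):
    """Return the _ids whose indexed field equals value, using binary search."""
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return [doc_id for _, doc_id in index[lo:hi]]

def find_prefix(prefix):
    """Return the _ids whose indexed field starts with prefix."""
    lo = bisect.bisect_left(keys, prefix)
    # Assumption: no value continues the prefix with U+10FFFF itself.
    hi = bisect.bisect_left(keys, prefix + "\U0010ffff")
    return [doc_id for _, doc_id in index[lo:hi]]

print(find_equal("something"))
print(find_prefix("a"))
```

Because the keys are ordered, both the exact lookup and the "everything starting with a" lookup only touch a small slice of the index, so separate per-letter indexes are not needed.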
We're using ElasticSearch and we have two different indexes with different data. Recently, we wanted to run a query that needs data from both indexes. ES allows searching across multiple indexes: /index1,index2/_search. The problem is that both indexes have properties with the same name, so there could be collisions because ES doesn't know which index a property belongs to.
How can we tell ES to look up a property in a specific index?
For example: index1.myProperty and index2.otherProperty
When I search for, say, car engine in Elasticsearch/Lucene (and this is the first time any user has searched for these keywords), does the search engine look up the individual words in the index table first and then find the intersection? For example, say the engine finds 10 documents for car, then searches for engine and gets 5 documents. It then searches for car within those 5 documents (the smaller set) and finds 2 documents.
Now the search engine ranks the documents based on the above results. Is this, at a high level, how multiple words are searched in the index table?
For future searches for the same keywords, does the search engine make a new entry for the key car engine in the index table?
Yes, it does search for individual terms and takes the intersection or union of the results, according to your query. It uses something called an "inverted index", which it builds as the documents to be searched are "indexed" into Elasticsearch.
Indexing operations are separate from searching. So no, it wouldn't index user searches unless you tell it to (in your application).
The basic functioning of elasticsearch can be split into two parts:
Indexing. You create an index of documents by indexing all the documents that you want to search in. These documents could be anything from your MySQL store, or from Logstash etc, or could be made up of users' search queries that your application indexes into a relevant elastic index.
Searching. You search for the indexed documents using some keywords that could be user generated or application generated or a mixture, using ElasticSearch queries (DSL). If a result is found (according to your query) then elasticsearch returns the relevant records.
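The two phases above can be sketched in a few lines of plain Python. This is only a toy model of an inverted index, not Elasticsearch's actual implementation; the function names, the whitespace tokenization, and the sample documents are assumptions made for the example.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Indexing phase: map each lowercased term to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query, mode="and"):
    """Search phase: look up each query term, then intersect or union the postings."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if mode == "and":
        return set.intersection(*postings)
    return set.union(*postings)

# Hypothetical documents keyed by id.
docs = {
    1: "car engine repair",
    2: "car wash",
    3: "jet engine",
}
index = build_inverted_index(docs)

print(search(index, "car engine"))             # documents containing both terms
print(search(index, "car engine", mode="or"))  # documents containing either term
```

Note that searching never writes to the index: after the queries above, the index still has entries only for the terms that came from the documents.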
I'd encourage you to read this post for a better understanding of how Elasticsearch searches documents:
https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up
I am pretty new to elasticsearch and already love it.
Right now I am interested in understanding how I can get elasticsearch to suggest similar keywords.
I have already read this article: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-mlt-query.html.
The More Like This Query (MLT Query) finds documents that are "like" a given set of documents.
This is already more than I am looking for. I don't need similar documents, only related / similar keywords.
So let's say I have an index of documents about movies and I start a query about "godfather". Then elasticsearch should suggest related keywords, e.g. "al pacino" or "Marlon Brando", because they are likely to occur in the same documents.
Any ideas how this can be done?
Unfortunately, there is no built-in way to do that in Elastic. What you could do is write a program that queries Elastic for the matching documents and reads their text, either from the returned _source or from your original data source (such as a database or a file). You would then calculate TF-IDF for each term in the retrieved documents and combine the scores to pick the top K terms across all of them.
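A rough sketch of that scoring step, in plain Python, might look like the following. It assumes the matched documents have already been fetched as strings and that they are a subset of the corpus used for document frequencies; the function name, the whitespace tokenization, and the choice to sum TF-IDF per term are all illustrative assumptions, not something Elastic provides.

```python
import math
from collections import Counter

def top_related_terms(matched_docs, all_docs, query_terms, k=3):
    """Rank terms in the matched documents by summed TF-IDF, skipping the query terms."""
    n = len(all_docs)
    # Document frequency of every term across the whole corpus.
    df = Counter()
    for text in all_docs:
        df.update(set(text.lower().split()))
    scores = Counter()
    for text in matched_docs:
        tf = Counter(text.lower().split())
        for term, count in tf.items():
            if term in query_terms:
                continue
            # matched_docs is assumed to be a subset of all_docs, so df[term] >= 1.
            scores[term] += count * math.log(n / df[term])
    return [term for term, _ in scores.most_common(k)]

# Hypothetical corpus; each string stands for one document's text.
all_docs = [
    "godfather pacino brando",
    "godfather pacino",
    "godfather pacino duvall",
    "alien weaver",
    "heat deniro",
    "alien scott",
]
matched_docs = [d for d in all_docs if "godfather" in d.split()]

print(top_related_terms(matched_docs, all_docs, {"godfather"}, k=3))
```

Summing the scores favors terms that recur across many of the matched documents; other ways of combining the per-document scores (for example taking the maximum) are equally plausible.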
As you probably know, in MySQL you can create indexes to improve the performance of your queries. Is there any such equivalent in Elastic? (I already know that an index is somewhat the equivalent of creating a database in Elastic)
I just need confirmation from black-belt Elastic users ;)
From the documentation:
Relational databases add an index, such as a B-tree index, to specific
columns in order to improve the speed of data retrieval. Elasticsearch
and Lucene use a structure called an inverted index for exactly the
same purpose.
By default, every field in a document is indexed (has an inverted index) and thus is searchable. A field without an inverted index is
not searchable. We discuss inverted indexes in more detail in Inverted
Index.
I am trying to implement an analyzer (uppercase) and then index some documents in elasticsearch. My question is, am I following the correct procedure?
Implement your analyzer (containing the index and type name), which would create the index if it doesn't exist
Then index the documents with the same index and type name as above, during which the stream of text passes through the analyzer and is then saved in the index.
Is this the correct way to go about it?
I indexed some documents with and without using analyzers, checked the contents of index before/after using Facets, and they were no different.
The content is not supposed to be different. How it's indexed is. You should see the difference because queries return different results: some documents are found that weren't found without the analyzer, and vice versa.
Try, for instance, a Match Query.
The _score may, and should, also change.
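As a toy illustration of that point, the sketch below indexes the same documents with two different analyzers in plain Python. It is not Elasticsearch code: the analyzer functions, the index structure, and the sample documents are assumptions made for the example, and scoring is left out.

```python
def whitespace_analyzer(text):
    """Split on whitespace only; case is preserved in the indexed terms."""
    return text.split()

def uppercase_analyzer(text):
    """Split on whitespace and uppercase each term before indexing."""
    return [token.upper() for token in text.split()]

def build_index(docs, analyzer):
    """Index the analyzed terms; the stored documents themselves are left untouched."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyzer(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def match(index, analyzer, query):
    """Analyze the query with the same analyzer, then return docs containing any term."""
    hits = set()
    for term in analyzer(query):
        hits |= index.get(term, set())
    return hits

# Hypothetical documents keyed by id.
docs = {1: "Quick Brown Fox", 2: "quick start guide"}
plain = build_index(docs, whitespace_analyzer)
upper = build_index(docs, uppercase_analyzer)

print(match(plain, whitespace_analyzer, "quick"))  # case-sensitive terms
print(match(upper, uppercase_analyzer, "quick"))   # case folded to uppercase
print(docs)                                        # stored content is unchanged
```

The stored documents are identical in both cases, which is why inspecting the content shows no difference; only the terms in the index, and therefore the query results, change.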