In ElasticSearch, should I use multiple indexes for separate but related entities? - elasticsearch

The overhead of adding indexes is well-documented, but I have not been able to find good information on when to use multiple indexes with regards to the various document types being indexed.
Here is a generic example to illustrate the question:
Say we have the following entities
Products (Name, ProductID, ProductCategoryID, List-of-Stores)
Product Categories (Name, ProductCategoryID)
Stores (Name, StoreID)
Should I dump these three different types of documents into a single index, each with the appropriate elasticsearch type?
I am having difficulty establishing where to the draw the line on one vs. multiple indexes.
What if we add an unrelated entity, "Webpages". Definitely a separate index?

A very interesting video explaining elasticsearch "Data Design Patterns" by Shay Banon:
http://vimeo.com/44716955
This exact question is answered at 13:40 where examining different data flows, by looking at the concepts of Type, Filter and Routing
Regards

I was recently modeling a ElasticSearch backend from scratch and from my point of view, the best option is putting all related documents types in the same index.
I read that some people had problems with too many concurrent indexes (1 index per type). It's better for performance and robustness to unify related types in the same index.
Besides, if the types are in the same index you can use "_parent" field to create hierarquical models that allow to you interesting features for search as "has_child" and "has_parent" and of course you have not to duplicate data in your model.

Related

ElasticSearch suggestions on blog name, authors and tags

I've been looking all over the place for a good answer to my question but I just can't find any...
I'm using ElasticSearch along with Laravel. I've used ElasticSearch on another project but never used suggestions. I'm following this tutorial as I think it provides a great starting point for using Laravel with ElasticSearch: https://blog.madewithlove.be/post/how-to-integrate-your-laravel-app-with-elasticsearch/
My question is about suggestions; I want my search to be a search-as-you-type just like the one you would find on Spotify. I want my users to type a few letters in the search box and have the results be organized into multiple categories: blogs, authors, tags.
If I index my data into one index, with authors and tags being blog's nested objects, I can easily get suggestions using the completion suggester for blog names, but not for nested objects. I could also split each model and index data separately into different indexes, but that would mean I would have to make 3 queries to get my results back.
Am I doing something wrong? Should I structure my data differently? Is making 3 queries the way to go or is there a way to have a single query output search results from different indexes?
Thanks!
Xavier
Something that I did when I built a search-as-you-type was I used a separate index for suggestions. In your situation, you'd index the name (title, author, whatever) in one field and the type in another. Then you could search on one field and display the grouped results.
The advantage here is speed. This will likely be a heck of a lot faster than trying to do a suggester on your nested data. (Which you can probably do, but I'm not sure how.) And speed is pretty important for this type of feature.

ElasticSearch multiple types with same mapping in single index

I am designing an e-Commerce site with multiple warehouse. All the warehouses have same set of products.
I am using ElasticSearch for my search engine.
There are 40 fields each ES document. 20 out of them will differ in value per warehouse, rest 20 fields will contain same values for all warehouses.
I want to use multiple types (1 type for each warehouse) in 1 index. All of the types will have same mappings. Please advise if my approach is correct for such scenario.
Few things not clear to me,
Will the inverted index be created only once for all types in same index?
If new type (new warehouse) is added in future how it will be merged with the previously stored data.
How it will impact the query time if I would have used only one type in one index.
Depending on all types being assigned to the same index, it will only created once and
If a new type is added, its information is added to the existing inverted index as well - adding new terms to the index, adding pointers to existing terms in the index, adding data to doc values per new inserted document.
I honestly can't answer that one, though it is simple to test this in a proof of concept.
In my previous project, I experienced the same setting implementing a search engine with Elasticsearch on a multishop-platform. In that case we had all shops in one type and when searching per shop relevant filters were applied. Though, the approach to separate shop-data by "_type" seems pretty clean to me. We applied it the other way, since my implementation was already able to cover it by filters at the moment of the feature request.
Cheers, Dominik

which is the best way to create types in terms of performance in elasticsearch

i have a RDBMS tables having multiple columns and its hetrogenous and need to create an index in elasticsearch from these tables. So which is the best practise intems of creation of types in elasticsearch. i was thinking about the multiple option
1) create types as same as rdbms tables and add documents as same as records in table
2) create a type with two fileds, in which one of the field for identification of that document and other field will be the concatenation of tables columns vales. So in this way only two fileds will be there across the all tables and search on the one field.
So could you let me know, which is the best way to create the types. please let me know, if need more info.
Index the data in the form that best facilitates your search requirements.
If it makes sense to combine everything into one big searchable everything field, do that (by default, elasticsearch already does this, in addition to the separate fields you index). If you are going to regret not being able to separate them when you need to search for data from one particular column, go the other route. If you need to know whether a document has been matched based on it's title or it's body, for instance, they should be indexed in different fields.
The best way to cripple your performance is trying to kludge together queries for things your index's structure doesn't support well.

ElasticSearch Wrapping Head on Index Types

I'm looking into elastic search right now and I am having a hard time grasping how index types fit into the data model, I've read examples and documentation but none really goes in depth or the examples seem to use a data model that is composed of several submodels.
I am currently using mongodb to store my data, let's take this example of an Article collection that I want to be indexed for search, my doc looks like this:
Article = {
title: String,
publisher: String,
subject: String,
description: String,
year: Integer,
}
Now I want each of those fields to be searchable, so I would make an elasticsearch index of 'Article'. I will need to define each field and how it should be analysed and whether it is stored or not, that I understand.
Now how does an index type come in here? As far as I am aware, Lucene does not have this concept, this is a layer added by Elasticsearch.
For example, maybe some of you may say that we can logically group the documents by subject or publisher and create index types on those but how is this different from searching by subject or publisher?
Is it more of a performance related aspect that we have index types?
Not a very easy question to answer, but I am going to give it a try. But be warned this just my opinion.
First of all, if you do not want to keep certain documents together in an index, just because it feels they should, create separate indices. There is not really a penalty for using more indices over more types. The only thing I can think of is that you could create analysers and mappings that you can reuse over the different types.
You can use types if you feel documents belong together, they have similar structure but not necessary the same structure. Be warned though, do not create separate mappings for fields with the same name in different types within the same index. Lucene does not like this.
Than there is the final scenario, in parent-child relationships, here you need types. This way the parent and it's children can be places in the same shard which is better for performance.
Hope that helps a bit.
If I'm not mistaken, the catch with using more than one data type in one index is almost identical to using different indices. Say, you can store (as I did) documents of types "simple_address", "delivery_address", "some_strange_but_official_address_info" in the same index "address" to make your code a bit more sane. But if you don't use parent-child links, it's equivalent to just having three indices.
Speaking of your example, you should wrap your head around what would you like to search. If, for instance, you add comments in equation, it's better to use some kind of separation - either as parent-child or different indices with manual mapping by keys. And, obviously, you should have different mappings for "Article" and "Comment" types.

Elastic search and "databases"

Sorry for the ambiguous title, couldn't thing of anything better fitting.
I 'm exploring Elastic Search and it looks very cool. My question is conceptual since I 'm used to sql.
In Sql, you have different databases and you store the data for each application there. Does the same concept exist in ES? Or is all data from all my application going to end up in the same place? In that case, what are the best practices to avoid unwanted results from unfitting data?
Schemaless doesn't mean structureless:
In elastic search you can organize your data into document collections
A top-level document collection is roughly equivalent to a database
You can also hierarchically create new document collections inside top-level collections, which is a very rough equivalent of a database table
When you search documents, you search for documents inside specific document collections (such as search for all posts inside blog1)
Individual documents can be viewed as equivalent to rows in a database table
Also please note that I say roughly equivalent -- data in SQL is often normalized into tables by relations, while documents (in ES) often hold large entities of data. For instance, it generally makes sense to embed all comments inside a blog post document, whereas in SQL you would normalize comments and blogposts into individual tables.
For a nice tutorial, I recommend taking look at "ElasticSearch in 5 minutes" tutorial.
Switching from SQL to a search engine can be challenging at times. Elasticsearch has a concept of index, that can be roughly mapped to a database and type that can, again very roughly, mapped to a table. Elasticsearch has very powerful mechanism of selecting records (rows) of a single type and combining results from different types and indices (union). However, there is no support for joins at the moment. The only relationship that elasticsearch supports is has_child, but it's not suitable for modeling many-to-many relationships. So, in most cases, you need to be prepared to denormalize your data, so it can be stored in a single table.

Resources