How to organise Elasticsearch indexes with too many dynamic fields?

I have a scenario with a large number of groups, where each group has numerous documents with a few standard fields and a few dynamic fields. I have created one alias per group, all pointing to a single index. As the groups grow, the number of dynamic fields grows too, and I have just hit the default limit of 1000 fields.
I plan to create an index for each group to solve the too-many-fields issue, but then I will have a too-many-indexes problem.
Please let me know if someone knows a better way to handle this.
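For reference, the alias-per-group setup described in the question can be sketched as a request body for the `_aliases` API. This is only an illustration, assuming the aliases are filtered aliases: the index name `groups`, the field `group_id`, the alias naming scheme, and the helper `group_alias_actions` are all hypothetical and do not come from the question.

```python
import json

def group_alias_actions(index, group_ids, field="group_id"):
    """Build an _aliases request body with one filtered alias per group.

    `field` is an assumed keyword field that stores the group identifier.
    """
    actions = []
    for gid in group_ids:
        actions.append({
            "add": {
                "index": index,
                "alias": f"group_{gid}",           # hypothetical naming scheme
                "filter": {"term": {field: gid}},  # alias only sees this group
            }
        })
    return {"actions": actions}

body = group_alias_actions("groups", ["a", "b"])
print(json.dumps(body, indent=2))  # would be sent as POST /_aliases
```

Note that all such aliases share the mapping of the underlying index, which is why the field limit is hit for the index as a whole and not per group.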

Related

What should I know or be concerned about when creating an index with 30-40 or more columns?

As mentioned in the subject, I want to create an index with 30-40 or even more columns (mostly keyword and number).
What should I be concerned about or aware of in this situation? Is it bad for performance? Is it bad for Elasticsearch cluster stability?
Elasticsearch places some limits on the number of fields in a document and on how they are organized.
You can check these limits in the documentation (they may differ between ES versions). The limits are configurable and include the total number of fields you can have (1000 by default) and the maximum depth of a field (20 by default).
According to the documentation, defining too many fields might not be a good idea, especially if you have many documents:
"Defining too many fields in an index is a condition that can lead to a mapping explosion, which can cause out of memory errors and difficult situations to recover from."
Also, be careful with the dynamic fields that you put into your documents. Every new field adds a new definition to the index mapping.
In your situation, considering the default maximum of 1000 fields, having 40 fields (columns) won't be a problem, unless you have so many inner objects that you exceed some other mapping limit such as index.mapping.nested_fields.limit or index.mapping.nested_objects.limit. Also, try to settle your document structure (mapping) before you start indexing documents.
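As a rough sketch of the advice above, the following builds a create-index request body that declares the fields up front and states the two limits explicitly. The helper name `build_index_body` and the field names are hypothetical. The layout assumes a typeless (7.x or later) index; older versions nest the mapping under a type name.

```python
import json

def build_index_body(fields, total_fields_limit=1000, depth_limit=20):
    """Build a create-index body with an explicit mapping and mapping limits.

    `fields` maps field name -> Elasticsearch field type.
    """
    return {
        "settings": {
            # Both values shown here are the documented defaults.
            "index.mapping.total_fields.limit": total_fields_limit,
            "index.mapping.depth.limit": depth_limit,
        },
        "mappings": {
            # "strict" rejects documents that contain unmapped fields,
            # so the mapping cannot grow by accident.
            "dynamic": "strict",
            "properties": {name: {"type": ftype} for name, ftype in fields.items()},
        },
    }

# Hypothetical columns, mostly keyword and number as in the question.
index_body = build_index_body({"sku": "keyword", "price": "double", "stock": "integer"})
print(json.dumps(index_body, indent=2))  # would be sent as PUT /<index-name>
```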

ElasticSearch multiple types with same mapping in single index

I am designing an e-commerce site with multiple warehouses. All the warehouses have the same set of products.
I am using ElasticSearch as my search engine.
Each ES document has 40 fields. 20 of them will differ in value per warehouse; the other 20 will contain the same values for all warehouses.
I want to use multiple types (one type per warehouse) in one index. All of the types will have the same mappings. Please advise whether my approach is correct for this scenario.
A few things are not clear to me:
1) Will the inverted index be created only once for all types in the same index?
2) If a new type (new warehouse) is added in the future, how will it be merged with the previously stored data?
3) How will it impact query time compared with using only one type in one index?
1) Provided all types are assigned to the same index, it will be created only once.
2) If a new type is added, its information is added to the existing inverted index as well: new terms are added to the index, pointers are added to existing terms, and data is added to doc values for each newly inserted document.
3) I honestly can't answer that one, though it is simple to test in a proof of concept.
In a previous project I had the same setup when implementing a search engine with Elasticsearch on a multi-shop platform. In that case we had all shops in one type and applied the relevant filters when searching per shop. That said, the approach of separating shop data by "_type" seems pretty clean to me. We did it the other way because my implementation could already cover it with filters when the feature was requested.
Cheers, Dominik
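The filter-based variant from the answer above (one type, with a filter per shop or warehouse) might look roughly like this. It is a sketch under assumptions: the field names `name` and `warehouse_id` and the helper `warehouse_search` are hypothetical, and the `bool` query with a `filter` clause assumes Elasticsearch 2.0 or later.

```python
import json

def warehouse_search(text, warehouse_id):
    """Build a search body that scores on the text and filters by warehouse."""
    return {
        "query": {
            "bool": {
                # Scored full-text part of the query.
                "must": [{"match": {"name": text}}],
                # Non-scoring restriction to a single warehouse.
                "filter": [{"term": {"warehouse_id": warehouse_id}}],
            }
        }
    }

search_body = warehouse_search("blue shirt", "wh-7")
print(json.dumps(search_body, indent=2))  # would be sent as POST /<index>/_search
```

Adding a new warehouse then requires no mapping change, only documents with a new `warehouse_id` value.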

Elasticsearch get multiple documents by uids over multiple indices

Previously, all documents of one type were in the same index. But because the types conceptually take different forms, and for backup purposes, I now need multiple indices for a single type.
They will all be in the form _feed. While this setting is great in some circumstances, for
client.prepareGet(index, typename, ids).execute().actionGet(); // works great if you know in which index to search
it is useless, since no wildcards may be used. What I can do is use multiple multigets and interleave the results. This gives me what I want, but increases the number of queries significantly.
Assuming I know for sure that only one document exists with a given _uid, is there a better way to query than calling a multiget on all _uids for each possible index?
The best way would be to develop a mechanism in your application that would allow you to deduce the index name from the id. But assuming that this is not possible or practical, you have pretty much only two choices. If you need realtime get, then your approach is the only way to do it. If realtime get is not a requirement, you can perform a search across all indices using an ids filter. If the id list is small, you can benefit from using routing on your search query. This way the search request will only be dispatched to the shards that might contain any of the ids listed in the query. However, if the list of ids is big enough to span most of the shards, routing will not provide any benefit.
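Both options from this answer can be sketched as request bodies. Everything named here is an assumption for illustration: the `<index>:<id>` id scheme, the index names `news_feed` and `blog_feed`, and the helper functions. The multi-get body uses the typeless `docs` form; versions with mapping types would also need a `_type` per entry.

```python
def mget_body_from_ids(prefixed_ids, sep=":"):
    """Deduce the index from each id and build one multi-index _mget body.

    Assumes a hypothetical application-level scheme '<index><sep><id>'.
    """
    docs = []
    for pid in prefixed_ids:
        index, _, doc_id = pid.partition(sep)
        docs.append({"_index": index, "_id": doc_id})
    return {"docs": docs}

def ids_search_body(ids):
    """Non-realtime alternative: an ids query for a multi-index _search."""
    ids = list(ids)
    # size is raised because a search returns only 10 hits by default.
    return {"size": len(ids), "query": {"ids": {"values": ids}}}

mget_body = mget_body_from_ids(["news_feed:1", "blog_feed:2"])  # POST /_mget
search_ids_body = ids_search_body(["1", "2"])  # POST /<index-pattern>/_search
print(mget_body)
print(search_ids_body)
```

The first form stays realtime but needs the id scheme; the second works with plain ids but only sees documents after a refresh.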

Which is the best way to create types in terms of performance in Elasticsearch?

I have RDBMS tables with multiple, heterogeneous columns and need to create an Elasticsearch index from these tables. What is the best practice in terms of creating types in Elasticsearch? I was considering the following options:
1) Create one type per RDBMS table and add one document per table record.
2) Create a single type with two fields: one field identifies the document, and the other is the concatenation of the table's column values. This way there are only two fields across all tables, and searches run against the one field.
Could you let me know which is the best way to create the types? Please let me know if you need more info.
Index the data in the form that best facilitates your search requirements.
If it makes sense to combine everything into one big searchable everything field, do that (by default, elasticsearch already does this, in addition to the separate fields you index). If you are going to regret not being able to separate them when you need to search for data from one particular column, go the other route. If you need to know whether a document has been matched based on its title or its body, for instance, they should be indexed in different fields.
The best way to cripple your performance is trying to kludge together queries for things your index's structure doesn't support well.
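One way to keep both options open is to index the columns as separate fields and also copy them into a catch-all field with `copy_to`. The sketch below only builds the `properties` part of a mapping. The column names, the catch-all field name `everything`, and the helper `build_properties` are hypothetical, and the `text` type assumes Elasticsearch 5.0 or later (older versions use `string`).

```python
import json

def build_properties(columns, catch_all="everything"):
    """Map each column to its own text field and copy it into a catch-all field."""
    props = {col: {"type": "text", "copy_to": catch_all} for col in columns}
    # The catch-all field receives the values of every column at index time.
    props[catch_all] = {"type": "text"}
    return props

properties = build_properties(["title", "body"])
print(json.dumps(properties, indent=2))
```

A query against `everything` then searches all columns at once, while a query against `title` or `body` still tells you which column matched.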

In ElasticSearch, should I use multiple indexes for separate but related entities?

The overhead of adding indexes is well-documented, but I have not been able to find good information on when to use multiple indexes with regards to the various document types being indexed.
Here is a generic example to illustrate the question:
Say we have the following entities
Products (Name, ProductID, ProductCategoryID, List-of-Stores)
Product Categories (Name, ProductCategoryID)
Stores (Name, StoreID)
Should I dump these three different types of documents into a single index, each with the appropriate elasticsearch type?
I am having difficulty establishing where to draw the line on one vs. multiple indexes.
What if we add an unrelated entity, "Webpages". Definitely a separate index?
A very interesting video explaining elasticsearch "Data Design Patterns" by Shay Banon:
http://vimeo.com/44716955
This exact question is answered at 13:40, which examines different data flows by looking at the concepts of Type, Filter and Routing.
Regards
I recently modeled an ElasticSearch backend from scratch, and from my point of view the best option is to put all related document types in the same index.
I read that some people had problems with too many concurrent indexes (one index per type). It's better for performance and robustness to unify related types in the same index.
Besides, if the types are in the same index you can use the "_parent" field to create hierarchical models, which enable interesting search features such as "has_child" and "has_parent", and of course you do not have to duplicate data in your model.
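As an illustration of the "has_child" feature mentioned above, this sketch builds a query that returns product categories having at least one product whose name matches. It assumes a parent/child relation between the two has already been set up in the mapping (via "_parent" in older versions, or a join field in newer ones). The child type name `product` and the helper name are hypothetical; `Name` is the product field from the question.

```python
import json

def categories_with_matching_products(product_name):
    """Build a has_child query: parents (categories) with a matching child (product)."""
    return {
        "query": {
            "has_child": {
                "type": "product",  # assumed name of the child type/relation
                "query": {"match": {"Name": product_name}},
            }
        }
    }

has_child_body = categories_with_matching_products("espresso machine")
print(json.dumps(has_child_body, indent=2))  # would be sent as POST /<index>/_search
```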
