Example: Some items belong to specific users. The User is the parent, the item is the child. Indexing those items and users can be done by routing the items to the shards of the users.
Problem: The majority of items do not belong to a specific user, since they have been posted anonymously. I could have those items routed to a parent-id:"anonymous", but that would lead to the majority of items being stored in one single shard.
Question: How can I introduce optional parent-child relations so that items belonging to a registered user are routed to that user's shard, while anonymous items get distributed randomly?
Store them in two different indexes and search both.
Here are a video and an article with more on sharding/index partitioning strategies:
Sizing Elasticsearch
ElasticSearch: Big Data, Search, and Analytics
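As a rough sketch of the two-index suggestion above: items with an owner go to one index and are routed by the owner's id, while anonymous items go to a second index with no routing value, so Elasticsearch spreads them by document id. The index names and the `id`/`user_id` item fields are assumptions made up for this example; the helper only builds the bulk action metadata and the multi-index search path, it does not talk to a cluster.

```python
# Hypothetical index names, chosen for this example only.
USER_ITEMS_INDEX = "user_items"
ANONYMOUS_ITEMS_INDEX = "anonymous_items"


def bulk_action(item):
    """Build the bulk-API action line for one item.

    Items with a user_id are routed to that user's shard; anonymous
    items carry no routing value and are distributed by their _id.
    """
    meta = {"_id": item["id"]}
    if item.get("user_id") is not None:
        meta["_index"] = USER_ITEMS_INDEX
        meta["routing"] = item["user_id"]
    else:
        meta["_index"] = ANONYMOUS_ITEMS_INDEX
    return {"index": meta}


def search_path():
    """Search both indexes at once with a comma-separated index list."""
    return f"/{USER_ITEMS_INDEX},{ANONYMOUS_ITEMS_INDEX}/_search"
```

A search sent to the path returned by `search_path()` covers both kinds of items in one request.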
I am building a search functionality for two types of related documents, let's call them "blogs" and "posts", respectively a blog website (with a bunch of posts) and the specific posts written in that blog. I'd like to be able to search against both of them. In a relational database (which ES is not), I would have two main tables which would be linked against a foreign key, and I could search the two tables separately or with a join. In Elasticsearch, I am considering a parent-child relationship where "blog" is the parent document, and there are potentially many "post" documents associated with it as the child.
EDIT: I should explain why I want to index them this way. Basically, I want people to be able to search for blogs (the overall series of posts written by the same author), and the search terms might not be in the blog's description alone, but rather in the posts; for instance, a blog about Python might have a general description that talks about python, but the blog posts might talk about django, so if someone searches for "django" I'd like the python blog to come up. Also, I want people to be able to search for specific posts. I also think (prove me wrong!) these need to be separate types of documents because they would have different fields, e.g. a post might have a date field, while a blog would not have that field.
In any case: Ideally, I would like to be able to offer a search function against "blog" which would also search against the "post" text (as the relevant text might be in the post); additionally, I'd like to allow users to search all posts regardless of what blog they are associated with.
What are the best practices for setting this up? From what I can tell, Elasticsearch has removed the ability to have two types of documents on the same index, and parent-child relationships need to be on the same index. With this constraint, it seems like parent-child relationships would only be for relationships between documents of the same type, e.g. if you are indexing people and you can indicate who is a parent and child (literally).
The other option would be to create two indexes, one for blogs (which would include the posts' texts) and a second index which would include only the posts. But my instinct is that this would duplicate a tremendous amount of data, and it would also be a lot more work to keep it updated and in sync with my main relational data store.
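For reference, here is a minimal sketch of what the single-index, parent-child variant could look like with a join field, where "blog" and "post" are relation names rather than separate mapping types. The field names (`relation`, `description`, `text`, `date`) are assumptions for illustration, and the functions only build request bodies.

```python
def blog_post_mapping():
    """Index mapping with a join field declaring blog as parent of post."""
    return {
        "mappings": {
            "properties": {
                "relation": {"type": "join", "relations": {"blog": "post"}},
                "description": {"type": "text"},  # assumed to be set on blogs only
                "text": {"type": "text"},         # assumed to be set on posts only
                "date": {"type": "date"},         # assumed to be set on posts only
            }
        }
    }


def blogs_matching(term):
    """Find blogs whose description, or any of whose posts, match the term."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"description": term}},
                    {"has_child": {"type": "post",
                                   "query": {"match": {"text": term}}}},
                ],
                "minimum_should_match": 1,
            }
        }
    }


def posts_matching(term):
    """Find posts matching the term, regardless of which blog owns them."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"text": term}}],
                "filter": [{"has_parent": {"parent_type": "blog",
                                           "query": {"match_all": {}}}}],
            }
        }
    }
```

With this layout a search for "django" via `blogs_matching` would return the Python blog through its posts, without copying post text into the blog document.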
I'm working on an e-commerce search page and need to free-text search products and offer multiple facet options and sorting capabilities. The issue I'm facing has to do with product prices:
One product has multiple prices - there are special discounts, B2B customer specific prices, and specific B2C prices. There could be a few hundred prices per product.
I need to be able to do a full-text search on products, but still be able to sort on one of the selected price groups.
My initial thought would be to put all of the prices into the product item, but that means I'll need to update the product objects in the index every time a price changes - which is often. This will also make the objects quite big.
I see that elasticsearch now has the capability of HasParent/HasChildren queries, but I am not sure if that is the right way to go, or if it even is possible.
Is it possible to keep prices as a separate type outside the product type and use the HasParent/HasChildren queries to sort the products on the price?
My initial thought would be to put all of the prices into the product item, but that means I'll need to update the product objects in the index every time a price changes - which is often. This will also make the objects quite big.
I would personally be inclined not to store complex pricing data within Elasticsearch, at least not prices calculated by business logic such as discounts and specific B2C prices.
A base price could be stored for querying and sorting, with pricing logic applied to it through scripting, using script queries and script sorting, respectively.
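A minimal sketch of the script-sorting half of that idea, assuming each product document has a numeric `base_price` field and a text `name` field (both names are made up here) and that the customer-specific logic reduces to a single multiplier passed in as a script parameter. Real pricing rules would likely need a more involved script.

```python
def search_sorted_by_adjusted_price(text, factor):
    """Build a search body: full-text match, sorted by base_price * factor."""
    return {
        "query": {"match": {"name": text}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": "asc",
                    "script": {
                        "lang": "painless",
                        # The multiplier is passed as a param so the script
                        # source stays identical across customers.
                        "source": "doc['base_price'].value * params.factor",
                        "params": {"factor": factor},
                    },
                }
            }
        ],
    }
```

For example, `search_sorted_by_adjusted_price("candy", 0.9)` would sort matching products by a price with a 10% discount applied at query time, without touching the indexed documents.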
I see that elasticsearch now has the capability of HasParent/HasChildren queries, but I am not sure if that is the right way to go, or if it even is possible. Is it possible to keep prices as a separate type outside the product type and use the HasParent/HasChildren queries to sort the products on the price?
Parent/Child relationships operate on documents within a single index, with a join datatype field on a document to indicate the relationship between a parent and a child, and child documents indexed on the same shard as the parent. If children are not evenly distributed across parents/shards e.g. one parent document has a million children and the others have only a few each, it's possible to end up with hot spots within shards that can affect performance. Product and pricing data doesn't feel like a good fit for Parent/Child; pricing sounds like it's too dynamic to be stored within documents.
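To illustrate the hot-spot point, here is a small simulation. It assumes children are routed by their parent's id and uses `zlib.crc32` purely as a stand-in for Elasticsearch's real routing hash, so the shard numbers will not match a real cluster; only the skew is the point.

```python
import zlib
from collections import Counter


def shard_for(routing, num_shards):
    """Pick a shard from a routing value (crc32 is a stand-in hash)."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


def docs_per_shard(children_per_parent, num_shards):
    """Count documents per shard when children live with their parent."""
    counts = Counter({shard: 0 for shard in range(num_shards)})
    for parent_id, n_children in children_per_parent.items():
        shard = shard_for(parent_id, num_shards)
        counts[shard] += 1 + n_children  # the parent plus all its children
    return counts


# One parent with 1000 children, twenty parents with 5 children each.
skewed = {"p0": 1000}
skewed.update({f"p{i}": 5 for i in range(1, 21)})
counts = docs_per_shard(skewed, num_shards=5)
```

Whichever shard `p0` lands on holds at least 1001 of the 1121 documents, no matter how evenly the other parents are spread.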
I have product type data loaded into Elasticsearch containing catalogue_number and name. I also have customer data loaded into Elasticsearch containing name and purchases (where purchases is an array of product numbers).
For example:
CATALOGUE_NUMBER, NAME
518, "Toilet Paper"
388, "Candy Bar"
263, "Carrots"
And, for customers:
NAME, PURCHASES
"Jack", [518, 388]
"John", [263]
"Bill", [263, 518]
Considering the relationship is many to one (i.e. customers purchase many items), am I able to use Kibana to view a graph linking purchases to specific customers, or is this out of scope?
My end goal is to have a graph showing products and customers as vertices, with edges showing which products each customer purchases. I am very confused as to whether Elasticsearch is capable of this, or if I should move to a pure graph database such as Neo4J and use Elasticsearch for searching only.
The Graph feature can draw out these connections if they share a common field name - the unique identity of a node is a field name and a term. Terms can be in different indices but as long as they share a common field name they are seen as the same node.
I'm not sure which business problem you are trying to solve (recommendations? Fraud?) but depending on what you are trying to achieve you may want to model things differently.
If you're interested in recommendations and people who-bought-X-also-bought-Y style suggestions then the people are unlikely to be interesting nodes to plot and you can just examine the "purchases" field which will draw out which products significantly co-occur.
For more detailed "forensic" type applications you may want to just have person->product links and not have product->product links in which case you would be forced to create more classical "edge-like" documents with only 2 nodes - a person ID and a product ID.
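A minimal sketch of that last option, flattening the customer documents from the question into one "edge-like" document per purchase. The field names `person_id` and `product_id` are assumptions, and the customer name stands in for a person ID since the sample data has no other identifier.

```python
customers = [
    {"name": "Jack", "purchases": [518, 388]},
    {"name": "John", "purchases": [263]},
    {"name": "Bill", "purchases": [263, 518]},
]


def to_edge_documents(customers):
    """Emit one two-node document (person, product) per purchase."""
    return [
        {"person_id": customer["name"], "product_id": product}
        for customer in customers
        for product in customer["purchases"]
    ]


edges = to_edge_documents(customers)
```

Each resulting document links exactly one person to one product, so indexing them gives only person->product connections and no product->product ones.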
I have two scenarios that I want to support, but I don’t know the best way to design relations in Elasticsearch. I read the entire Elasticsearch documentation but I couldn’t find the best way to design types for my scenarios.
Multiple one to many.
Let’s assume that I have the following tables in my relational database that I want to transfer to Elasticsearch:
Transaction table: Id, User1Id, User2Id, ….
User table: Id, Name
Transaction contains two references to User. As far as I know I cannot use the parent->child relation specifying two parents? I need to store transaction and user in separate types because they can be changed separately. I need to be able to search transactions through user details and return users connected with transactions. Any idea how to design such a structure in Elasticsearch?
Many to many
Let’s assume that we have the following tables:
Order table: Id, …
OrderLine table: OrderId, UserId, Amount, …
User table: Id, Name
An order line is always saved with its order, so I thought I could store the order with its order lines as nested objects, but the user must be in a separate table. Is there any way to connect the multiple users from the order lines with the user type? I assume I can use an application-side join, but I need to always retrieve the order and its order lines together and be able to search orders by user data.
I can use grandparent and grandchildren relations but then I need to make joins in the application. Any idea how to design it in the best way?
I need to make a system using ElasticSearch.
Each user has their own documents, and these documents are visible only within that user's scope. No user's document is accessible to any other user of the system.
The question is: what is the best approach, creating an index per user, or creating a single index containing the documents of all users?
Each user might have custom meta-information fields on their documents that other users do not have.
I know that in general it's proposed to use a single index with user aliases, however I don't understand how to add this custom user's document meta-information in this big index.
For example, imagine userA has two documents indexed and userB has three. My system has pre-defined meta-information such as filename and description; however, it also allows each user to define their own custom meta-information. For example, userA might have a color field on their documents, and userB might have a size field on each document.
I understand one possibility would be to add a new field to the single index for each custom attribute; however, the number of fields could grow out of bounds.
What would be the best approach?
Thanks for all.
One index per user sounds like you'd run into trouble at some point - there is an overhead per index that would become significant once you have a lot of users (say 10000 or so).
I don't think you need this though - you could allow custom attributes on a per user basis by using nested fields - each nested object would have name and value properties (possibly multiple value properties) and so you can have arbitrary searchable metadata for your documents without needing to change the mapping each time.
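A minimal sketch of that nested-field approach in a single shared index. The field names (`user_id`, `attributes`, `attributes.name`, `attributes.value`) are assumptions for illustration, values are stored as strings to keep one `value` property, and the functions only build the mapping, document, and query bodies.

```python
def documents_mapping():
    """One mapping for all users; custom metadata lives in a nested field."""
    return {
        "mappings": {
            "properties": {
                "user_id": {"type": "keyword"},
                "filename": {"type": "text"},
                "description": {"type": "text"},
                "attributes": {
                    "type": "nested",
                    "properties": {
                        "name": {"type": "keyword"},
                        "value": {"type": "keyword"},
                    },
                },
            }
        }
    }


def to_document(user_id, filename, description, custom):
    """Turn a user's custom metadata dict into name/value nested objects."""
    return {
        "user_id": user_id,
        "filename": filename,
        "description": description,
        "attributes": [{"name": k, "value": str(v)} for k, v in custom.items()],
    }


def attribute_query(user_id, name, value):
    """Find one user's documents where a custom attribute has a given value."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"user_id": user_id}},  # keeps users isolated
                    {
                        "nested": {
                            "path": "attributes",
                            "query": {
                                "bool": {
                                    "filter": [
                                        {"term": {"attributes.name": name}},
                                        {"term": {"attributes.value": value}},
                                    ]
                                }
                            },
                        }
                    },
                ]
            }
        }
    }
```

With this shape, userA's `color` and userB's `size` are both just entries in `attributes`, so adding a new kind of metadata never changes the mapping.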