Most efficient way to model data in Elasticsearch

I have an example of modelling an e-commerce site. Say the site has a few hundred shops and a few million products, with 1,000-100,000 products per shop. I need to be able to aggregate on both the product and the shop fields. All products and all shops share the same schema.
Product
{
"productName"
"price"
"category"
}
Shop
{
"shopName"
"rating"
}
1) Is it more efficient to have a) one index per shop, b) a single index with one type per shop, or c) a single index and type, with a field identifying each product's shop?
I read some related articles, and most of them favour a single index with one type per shop. But they also say that a single index with a very large number of documents might be even slower than multiple indices.
2) I also need to perform joins and aggregations between shops and products. For example, I need to retrieve all the products from shops rated higher than 8/10 and also get the number of products per category. Is it preferable to use a) an application-side join, b) parent-child relationships, c) the Siren plug-in, or d) something else?

I would definitely use a single denormalized index/type for the use cases you've mentioned. If you later need more fields for the shop, you can create another index for shops while still keeping the first, denormalized index. Note that you may need a unique shop ID alongside the shop name.
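A minimal sketch of what the single denormalized index could look like, assuming hypothetical field names `shop_id`, `shop_name` and `shop_rating` copied onto every product document, and `category` mapped as a `keyword` field so it can be aggregated. It only builds the search body (nothing is sent to a cluster) for question 2: products from shops rated above 8, plus a product count per category via a `terms` aggregation. No join is needed because each product already carries its shop's rating.

```python
# Hypothetical field names: shop_id, shop_name, shop_rating are copied onto
# every product (denormalization). "category" is assumed to be a keyword field.

def product_doc(product, shop):
    """Merge the shop fields into a product document."""
    return {
        "productName": product["productName"],
        "price": product["price"],
        "category": product["category"],
        "shop_id": shop["shop_id"],
        "shop_name": shop["shopName"],
        "shop_rating": shop["rating"],
    }

def high_rated_products_by_category(min_rating=8):
    """Search body: products from shops rated above min_rating (returned as
    normal, paginated hits) plus one bucket per category with its count."""
    return {
        "query": {"bool": {"filter": [
            {"range": {"shop_rating": {"gt": min_rating}}}
        ]}},
        "aggs": {"products_per_category": {"terms": {"field": "category"}}},
    }
```

The trade-off is that changing a shop's rating means updating every product of that shop; with shop ratings changing rarely, that is usually acceptable.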

Related

Elasticsearch - Count of associations between indexes?

Coming from a relational database background, I want to know if there is a way to retrieve the number of unique associations between two indexes.
Basic Example (Using relational databases)
I have 3 tables: Person, Cars, Person-Cars
Person-Cars has two columns (person_id, car_id) and holds the associations (ownership) between people and cars.
On Elasticsearch, I have created an index for Person and for Cars.
Main Point
Every time I fetch a Car document, I want to know how many people own that car (in other words, how many associations it has to unique people).
--
To achieve that, would I need another index for Person-Cars and then have to index all the association records? Is there a simpler way? What would be the best way to do this in ES?
I have looked into aggregations, but I think they only work on a single level (person or cars); I'm not sure.
Thanks!
> On Elasticsearch, I have created an index for Person and for Cars.
Most of the time it makes sense to store data in Elasticsearch in a denormalized fashion, i.e. modelling one-to-many relationships as nested objects, as parent-child relationships, or simply as multi-value fields.
> What would be the best way to do this in ES?
It depends on your use case (parent-child, nested, or multi-value fields). Having separate indexes for each type will definitely add overhead. The schema can only be modelled better once you share the other use cases and the types of queries you will need.
Considering only the use case you shared, the car document below solves it:
{
"id":1,
"brand":"Hyundai",
"owners":[21,31,51],
"owners_cnt":3
}
Here "owners" holds the IDs of the owners (IDs and names can both be stored if required). Alternatively, you can simply maintain the "owners_cnt" counter.
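Keeping the car document current when someone buys a car could be done with a scripted partial update. The sketch below only builds the request body for `POST cars/_update/<car_id>`; the index name `cars` is an assumption, and it assumes owner IDs are stored as plain integers. A plain-Python mirror of the script is included so the logic can be unit-tested without a cluster.

```python
# Hypothetical index "cars"; documents shaped like the car example
# (an "owners" list of person IDs plus an "owners_cnt" counter).

ADD_OWNER_SCRIPT = (
    "if (!ctx._source.owners.contains(params.owner_id)) {"
    " ctx._source.owners.add(params.owner_id);"
    " ctx._source.owners_cnt = ctx._source.owners.size();"
    " } else { ctx.op = 'noop'; }"  # already an owner: skip the reindex
)

def add_owner_body(owner_id):
    """Request body for POST cars/_update/<car_id>."""
    return {
        "script": {
            "lang": "painless",
            "source": ADD_OWNER_SCRIPT,
            "params": {"owner_id": owner_id},
        }
    }

def add_owner_locally(car, owner_id):
    """Plain-Python mirror of ADD_OWNER_SCRIPT, for unit tests."""
    if owner_id not in car["owners"]:
        car["owners"].append(owner_id)
        car["owners_cnt"] = len(car["owners"])
    return car
```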
Whenever a person buys or sells a car, the car document needs to be updated in this model. If cars are bought and sold frequently, and you would need to update both the car and the person documents whenever a person buys a car, this type of modelling makes less sense.
In that case it makes sense to keep the car IDs within the person document:
{
"id":1,
"name":"Raj",
"cars":[1,2,3]
}
In this case, we can use the query below to count the persons who bought the car with id=3:
GET person/_count
{
"query": {
"match": {
"cars": 3
}
}
}
Again, better modelling is possible if more context is shared.
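With the car IDs stored on the person documents, the aggregation worry from the question goes away: a single `terms` aggregation over the `cars` field returns one bucket per car ID, and each bucket's `doc_count` is the number of person documents listing that car. A minimal sketch, assuming one document per person and the `person` index from the answer; it only builds the body for `GET person/_search` and parses a response of the standard shape. On multi-shard indices, bucket counts can be approximate when there are more distinct car IDs than `size`.

```python
def owners_per_car_body(max_cars=100):
    """Body for GET person/_search. size=0 skips the hits; only the
    aggregation (one bucket per car ID in "cars") is returned."""
    return {
        "size": 0,
        "aggs": {"owners_per_car": {"terms": {"field": "cars", "size": max_cars}}},
    }

def owner_counts(response):
    """Map each car ID to the number of person documents that list it."""
    buckets = response["aggregations"]["owners_per_car"]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}
```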

Sorting Elasticsearch types based on a child type property

I'm working on an e-commerce search page and need full-text search over products, with multiple facet options and sorting capabilities. The issue I'm facing has to do with product prices:
One product has multiple prices - there are special discounts, B2B customer specific prices, and specific B2C prices. There could be a few hundred prices per product.
I need to be able to do a full-text search on products, but still sort on one of the selected price groups.
My initial thought was to put all of the prices into the product item, but that means I'd need to update the product objects in the index every time a price changes, which is often. This would also make the objects quite big.
I see that Elasticsearch now has HasParent/HasChild queries, but I am not sure whether that is the right way to go, or whether it is even possible.
Is it possible to keep prices as a separate type outside the product type and use HasParent/HasChild queries to sort the products by price?
> My initial thought was to put all of the prices into the product item, but that means I'd need to update the product objects in the index every time a price changes, which is often. This would also make the objects quite big.
I would personally be inclined not to store complex pricing data within Elasticsearch, at least not prices calculated by business logic such as discounts and specific B2C prices.
A base price could be stored for querying and sorting, with the pricing logic applied on top of it via scripting: script queries for filtering and script-based sorting for ordering.
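A minimal sketch of the script-sorting idea, assuming hypothetical fields `name` (text), `category` (keyword) and `base_price` (numeric), and per-user discounts expressed as fractions per category that the application passes in at query time. It only builds the search body. A single flat discount would not change the order, which is why the example varies the discount by category. Note that a script sort evaluates the script for every matching document, so its cost grows with the size of the result set.

```python
# Hypothetical fields: "name" (text), "category" (keyword), "base_price" (numeric).
# Missing categories get no discount (default 0).
DISCOUNTED_PRICE = (
    "def d = params.discounts.getOrDefault(doc['category'].value, 0);"
    " return doc['base_price'].value * (1 - d);"
)

def search_sorted_by_user_price(text, discounts_by_category):
    """Full-text query on "name", sorted ascending by the discounted price."""
    return {
        "query": {"match": {"name": text}},
        "sort": [{
            "_script": {
                "type": "number",
                "order": "asc",
                "script": {
                    "lang": "painless",
                    "source": DISCOUNTED_PRICE,
                    "params": {"discounts": discounts_by_category},
                },
            }
        }],
    }
```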
> I see that Elasticsearch now has HasParent/HasChild queries, but I am not sure whether that is the right way to go, or whether it is even possible. Is it possible to keep prices as a separate type outside the product type and use HasParent/HasChild queries to sort the products by price?
Parent/child relationships operate on documents within a single index: a join datatype field on each document indicates the relationship between a parent and a child, and child documents are indexed on the same shard as their parent. If children are not evenly distributed across parents/shards, e.g. one parent document has a million children and the others have only a few each, you can end up with hot spots on shards that affect performance. Product and pricing data doesn't feel like a good fit for parent/child; pricing sounds too dynamic to be stored within documents.
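To make the mechanics concrete, here is a minimal sketch of the join datatype described above, with hypothetical names (join field `product_price`, parent relation `product`, child relation `price`). It only builds the mapping and document bodies. The routing requirement is what places each price on its product's shard, and it is also the source of the hot-spot risk when one product has far more prices than the rest.

```python
# Hypothetical names: join field "product_price", parent relation
# "product", child relation "price".
PRODUCTS_MAPPING = {
    "mappings": {
        "properties": {
            "product_price": {"type": "join", "relations": {"product": "price"}},
            "name": {"type": "text"},
            "price_group": {"type": "keyword"},
            "amount": {"type": "double"},
        }
    }
}

def parent_product_doc(name):
    """Parent document: the join field just names the relation."""
    return {"name": name, "product_price": "product"}

def child_price_doc(parent_id, price_group, amount):
    """Child document: it names its parent, and it must also be indexed with
    ?routing=<parent_id> so it is stored on the parent's shard."""
    return {
        "price_group": price_group,
        "amount": amount,
        "product_price": {"name": "price", "parent": parent_id},
    }
```

Usage would be along the lines of `PUT products/_doc/p1` with `parent_product_doc("Shirt")`, then `PUT products/_doc/p1-b2b?routing=p1` with `child_price_doc("p1", "b2b", 9.5)`.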

Search strategy for online application

I have an application with around 700,000 active products, each with a current stock quantity.
Each product can have multiple attributes and categories.
Product name, description and attributes can be delivered to the user in a few languages.
What I need to achieve is fast search. By fast I mean that, for example, for a product group containing 250k products I would like to return the first page of sorted results within 100 ms.
My first thought was to denormalize the data and push it into a document store such as Elasticsearch. But there is one issue, the product price: it depends on the user who is logged in.
Initially there will be 30k users. Each user can have a different discount for each product category, or even for each individual product. When a discount or price changes, there is a business requirement to synchronize prices within a few minutes. The system could compute prices for search results on the fly, but sorting and pagination are a problem: when a group contains 250k products, it is hard to fetch the results, compute the prices, sort, and return the requested page.
Is there any way to return a user-dependent field in Elasticsearch? Or should I start looking into other solutions, such as graph databases?

How to model this MySQL table relationship in Elasticsearch

I have a large number of shop items imported into Elasticsearch, and I can query them.
I am wondering how best to model the following MySQL table relationship in Elasticsearch:
Shop items can have different offers. There are different offer types. And in some shops an item may be on offer, in other shops the item may not be on offer or have a different offer type. Items don't have to have offers. I model this below:
Items table: item_id
Offers table: shop_id, item_id, offer_type, user_id
Where user_id is the id of the user who created the offer.
For example, take item_id 1, shop_ids 1 and 2, and the offer types premium and featured.
Then the offers table could look like:
shop_id, item_id, offer_type, user_id
1,1,featured,45
2,1,premium,33
2,1,featured,45
But it's not the case that every item is on offer. And even if item_id 1 is on offer in shops 1 and 2, it might not be on offer in other shops.
I want to query my /items type, always for one shop at a time. For that shop I want to get all the items in, e.g., a certain price range and a certain category (which I can already do), but I also need to know, for each item in the results, which offer it has, if any (e.g. featured, premium, or whatever the offer_type is).
How can I best model this behaviour in elastic search?
One approach is a nested object relationship: each shop is a document, with your shop ID as its document ID, containing a nested set of items.
For your use case:
1) Get all items of a shop: GET http://host/your_index/shops_type/shopid
This will give you all the items in the shop along with their offer_type. You can then filter them in your application logic.
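A minimal sketch of that approach, with hypothetical field names: one document per shop whose nested `items` carry `price`, `category` and an `offer_types` list (empty or absent when the item has no offer in that shop). The function does the "filter in your program logic" step on the `_source` returned by the GET above. Keep in mind that updating a single item or offer reindexes the whole shop document.

```python
# Hypothetical mapping: one document per shop (document ID = shop ID),
# with a nested "items" array; "offer_types" lists the item's offers there.
SHOPS_MAPPING = {
    "mappings": {
        "properties": {
            "items": {
                "type": "nested",
                "properties": {
                    "item_id": {"type": "keyword"},
                    "price": {"type": "double"},
                    "category": {"type": "keyword"},
                    "offer_types": {"type": "keyword"},
                },
            }
        }
    }
}

def items_in_range(shop_source, min_price, max_price, category):
    """Filter a shop's items client-side and return (item_id, offer_types)
    pairs; offer_types is an empty list for items without an offer."""
    return [
        (item["item_id"], item.get("offer_types", []))
        for item in shop_source.get("items", [])
        if min_price <= item["price"] <= max_price and item["category"] == category
    ]
```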

CodeIgniter Cart ID + Options

I have a situation:
I have products that are in a CodeIgniter Cart custom store.
Each product has an ID associated with it, but also has options for it (sizes).
These sizes all have different prices. (We're talking about photos being sold at different print sizes).
Because CI Cart updates, adds and deletes based on the product ID inserted, I am not able to insert one product with 2 different sizes.
As of now, the only solution I can think of is to pass the ID to the cart as IMAGEID_OPTIONID so that it contains both IDs.
However, is there an easier, more uniform way of doing this?
Or a better solution than an ID that isn't, on its own, associated with anything specific unless I explode it?
I recently built a site that had these constraints. In short, you'll want to create a distinction between "products" and "product groups". Think of it as managing the most discrete data units. In reality, shirt X sized medium is actually a different thing from shirt X sized large, doubly so if you have prices that are built on these qualities (this becomes more realistic when you consider cloth patterns or colors).
So anyway, if you have a "groups" table, a "product_groups" table, and a "products" table, you can keep all of these ideas distinct. On your products table, you can have columns for "size" and "color" (and any other distinguishing property you can think of) and a column for "price". Alternatively, you can go even more hardcore and make separate pricing tables that match up prices to unique products (this would be especially useful if you want to keep track of historical prices and discounts).
Then in your cart you can simply attach product_ids to cart_ids and perform a couple of joins to determine what "group" this product is a part of, what pictures are in that group (or exist for that product), and so on. It's not a simple problem, but following this line of thought should help get you on the right path.
One last point: keeping track of unique products like this also makes inventory accounting much, much more straightforward.
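A minimal sketch of that layout, using Python's built-in `sqlite3` module rather than CodeIgniter/MySQL; the columns `size` and `price_cents` and the `cart_items` table are assumptions for illustration. Each photo/size combination is its own `products` row, so the cart only ever stores a plain `product_id`, and a join recovers the photo (group) it belongs to. `groups` is quoted because newer SQLite versions treat GROUPS as a keyword.

```python
import sqlite3

def build_schema(conn):
    conn.executescript("""
        CREATE TABLE "groups" (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        -- one row per sellable unit, e.g. one photo at one print size
        CREATE TABLE products (id INTEGER PRIMARY KEY, size TEXT NOT NULL,
                               price_cents INTEGER NOT NULL);
        CREATE TABLE product_groups (
            group_id INTEGER NOT NULL REFERENCES "groups"(id),
            product_id INTEGER NOT NULL REFERENCES products(id),
            PRIMARY KEY (group_id, product_id));
        CREATE TABLE cart_items (cart_id INTEGER NOT NULL,
                                 product_id INTEGER NOT NULL REFERENCES products(id),
                                 qty INTEGER NOT NULL DEFAULT 1);
    """)

def cart_lines(conn, cart_id):
    """(group name, size, price, qty) per cart row. A product linked to
    several groups would appear once per group."""
    return conn.execute("""
        SELECT g.name, p.size, p.price_cents, c.qty
        FROM cart_items c
        JOIN products p ON p.id = c.product_id
        JOIN product_groups pg ON pg.product_id = p.id
        JOIN "groups" g ON g.id = pg.group_id
        WHERE c.cart_id = ?
        ORDER BY p.id
    """, (cart_id,)).fetchall()

# Demo: one photo sold in two print sizes, both in the same cart.
conn = sqlite3.connect(":memory:")
build_schema(conn)
conn.execute('INSERT INTO "groups" VALUES (1, ?)', ("Sunset photo",))
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", [(10, "4x6", 500), (11, "8x10", 1500)])
conn.executemany("INSERT INTO product_groups VALUES (1, ?)", [(10,), (11,)])
conn.executemany("INSERT INTO cart_items VALUES (?, ?, ?)", [(99, 10, 2), (99, 11, 1)])
```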
