Search strategy for online application - elasticsearch

I have an application with around 700 000 active products with actual stock quantity.
Each product can have multiple attributes and categories.
Product name, description and attributes can be delivered to the user in a few languages.
What I need to achieve is fast search. By fast I mean that, for example, for a product group containing 250k products I would like to return the first page of sorted results within 100 ms.
My first thought was to denormalize the data and push it into a document store like Elasticsearch. But there is one issue, the product price: it depends on the user who is currently logged in.
Currently there will be 30k users. Each user can have a different discount for each product category or even for each particular product. When a discount or price changes, there is a business requirement to synchronize prices within a few minutes. Potentially the system could compute prices for search results on the fly, but there is an issue with sorting and pagination. When a group consists of 250k products, it will be hard to fetch the results, compute the prices, sort, and return a given page.
Is there any way to return a user-dependent field in Elasticsearch? Or should I rather start looking into other solutions like graph databases?

Related

elasticsearch custom score formula

I have a site that has a search using elasticsearch.
There is a rule that I need to implement:
the site is a search engine for women's products
but we have some stores that will pay to have their products appear on the site
we need to give priority to products from paying stores
We think the formula would be this:
store that is a customer (higher weight)
number of product views
Is there a way to use function_score to solve this?
and display the products randomly? (without blocking the list with only products from paying stores)

Sorting elasticsearch types based on child type property

I'm working on an e-commerce search page and need free-text search over products, with multiple facet options and sorting capabilities. The issue I'm facing has to do with product prices:
One product has multiple prices - there are special discounts, B2B customer specific prices, and specific B2C prices. There could be a few hundred prices per product.
I need to be able to do a full-text search on products, but still be able to sort on one of the selected price groups.
My initial thought would be to put all of the prices into the product item, but that means I'll need to update the product objects in the index every time a price changes, which is often. This will also make the objects quite big.
I see that Elasticsearch now has the capability of HasParent/HasChildren queries, but I am not sure if that is the right way to go, or if it is even possible.
Is it possible to keep prices as a separate type outside the product type and use the HasParent/HasChildren queries to sort the products on the price?
"My initial thought would be to put all of the prices into the product item, but that means I'll need to update the product objects in the index every time a price changes, which is often. This will also make the objects quite big."
I would personally be inclined not to store complex pricing data within Elasticsearch, at least not prices calculated by business logic such as discounts and specific B2C prices.
A base price could be stored for querying and sorting, with the pricing logic applied to it through scripting, using script queries and script sorting, respectively.
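To make the script-sorting idea concrete, here is a minimal sketch that builds a search request body sorting on a price computed at query time. The field names `name`, `base_price` and `category_id`, the `category_discounts` parameter, and both helper functions are hypothetical and not taken from the question; the Painless source has not been run against a cluster, and `effective_price` only mirrors its logic in Python for checking.

```python
from typing import Any, Dict

# Painless script: base price reduced by the caller's discount for the
# document's category. Field names are assumptions for illustration.
PAINLESS_EFFECTIVE_PRICE = (
    "String cat = doc['category_id'].value; "
    "double discount = params.category_discounts.containsKey(cat) "
    "? params.category_discounts.get(cat) : 0.0; "
    "return doc['base_price'].value * (1.0 - discount);"
)


def effective_price(base_price: float, category_id: str,
                    category_discounts: Dict[str, float]) -> float:
    """Python mirror of the Painless script, useful for testing the logic."""
    return base_price * (1.0 - category_discounts.get(category_id, 0.0))


def build_search_body(text: str, category_discounts: Dict[str, float],
                      page: int, page_size: int = 20) -> Dict[str, Any]:
    """Build a request body that sorts hits by the user-specific price."""
    return {
        "from": page * page_size,
        "size": page_size,
        "query": {"match": {"name": text}},
        "sort": [{
            "_script": {
                "type": "number",
                "order": "asc",
                "script": {
                    "lang": "painless",
                    "source": PAINLESS_EFFECTIVE_PRICE,
                    # Discounts of the logged-in user, passed per request.
                    "params": {"category_discounts": category_discounts},
                },
            }
        }],
    }
```

The discounts travel with each request, so nothing user-specific has to be indexed. The script runs for every matching document, so the cost on large result sets would need to be measured.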
"I see that Elasticsearch now has the capability of HasParent/HasChildren queries, but I am not sure if that is the right way to go, or if it is even possible. Is it possible to keep prices as a separate type outside the product type and use the HasParent/HasChildren queries to sort the products on the price?"
Parent/Child relationships operate on documents within a single index, with a join datatype field on a document to indicate the relationship between a parent and a child, and child documents indexed on the same shard as the parent. If children are not evenly distributed across parents/shards e.g. one parent document has a million children and the others have only a few each, it's possible to end up with hot spots within shards that can affect performance. Product and pricing data doesn't feel like a good fit for Parent/Child; pricing sounds like it's too dynamic to be stored within documents.
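For reference, a join field and a child document could look roughly like the sketch below. The field name `relation`, the relation names `product` and `price`, and the helper `child_price_doc` are hypothetical; the point is only that a child must be routed by its parent's id so that both land on the same shard.

```python
from typing import Any, Dict, Tuple

# Index mapping with a join field declaring that "product" is the parent
# of "price". All names here are assumptions for illustration.
JOIN_MAPPING: Dict[str, Any] = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"product": "price"}},
            "price_group": {"type": "keyword"},
            "amount": {"type": "double"},
        }
    }
}


def child_price_doc(product_id: str, price_group: str,
                    amount: float) -> Tuple[str, Dict[str, Any]]:
    """Return (routing, document) for one price attached to a product.

    The routing value is the parent id, which is what places the child
    on the same shard as its parent.
    """
    doc = {
        "relation": {"name": "price", "parent": product_id},
        "price_group": price_group,
        "amount": amount,
    }
    return product_id, doc
```

With a few hundred prices per product, every price change becomes a write to a child document on the parent's shard, which is the dynamic-data concern raised above.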

Need help in choosing right caching strategy

We are planning to store price data in Memcache. Prices depend on the car variant and the location (city). This is how the data is stored in the database:
variant, city, price
21, 48, 40000
Now the confusion is how to store this data in Memcache.
Possibility 1: We store each price in a separate cache object and do a multiget when the prices of all variants belonging to a model need to be displayed on a single page.
Possibility 2: We store prices at the model and city level. The prices of all variants of a model will be stored in a single object. This object will be slightly heavy, but a multiget wouldn't be required.
Need your help in making the right decision.
TLDR: It all depends on how you want to expose the feature to your end users, and what the query pattern looks like.
For example:
If your flow is that a user can see all the variant prices on a detail page for a city, then you could use <city_id>_<car_model_id> as the key, and store all data for variants against that key (Possibility 2).
If the flow is that a user can see prices of all variants across cities on a single page, then you would need the key as <car_model_id> and store all data as JSON against this key.
If the flow is that a user can see prices of one variant at a time only for every city, then you would use the key <city_id>_<car_variant_id> and store prices.
One thing to definitely keep in mind is how often you will have to refresh the cache or perform upserts, which in the case of cars should be infrequent (who changes the price of a car every day, let alone every second?). So, I would have gone with option 1 above (Possibility 2 as described by you).
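The two layouts can be compared with a small sketch. `DictCache` is an in-memory stand-in written for this example, not a real Memcache client, and the `v:` / `m:` key prefixes are an added assumption so that variant keys and model keys cannot collide; the key shapes otherwise follow the ones described above.

```python
import json
from typing import Dict, Iterable, Optional


class DictCache:
    """In-memory stand-in for a cache client (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def get_multi(self, keys: Iterable[str]) -> Dict[str, str]:
        return {k: self._data[k] for k in keys if k in self._data}


def variant_key(city_id: int, variant_id: int) -> str:
    return f"v:{city_id}_{variant_id}"


def model_key(city_id: int, model_id: int) -> str:
    return f"m:{city_id}_{model_id}"


# Possibility 1: one cache entry per (city, variant), read back with a multiget.
def store_per_variant(cache: DictCache, city_id: int, prices: Dict[int, int]) -> None:
    for variant_id, price in prices.items():
        cache.set(variant_key(city_id, variant_id), str(price))


def load_per_variant(cache: DictCache, city_id: int,
                     variant_ids: Iterable[int]) -> Dict[int, int]:
    ids = list(variant_ids)
    found = cache.get_multi(variant_key(city_id, v) for v in ids)
    return {v: int(found[variant_key(city_id, v)])
            for v in ids if variant_key(city_id, v) in found}


# Possibility 2: one JSON entry per (city, model) holding all variant prices.
def store_per_model(cache: DictCache, city_id: int, model_id: int,
                    prices: Dict[int, int]) -> None:
    cache.set(model_key(city_id, model_id), json.dumps(prices))


def load_per_model(cache: DictCache, city_id: int, model_id: int) -> Dict[int, int]:
    raw = cache.get(model_key(city_id, model_id))
    # JSON object keys come back as strings, so convert them to int again.
    return {int(k): v for k, v in json.loads(raw).items()} if raw else {}
```

Possibility 2 needs a single read per page but rewrites the whole object when one variant price changes; Possibility 1 updates one small entry at the cost of a multiget on read.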

Most efficient way to model data in Elasticsearch

I have an example of modelling an e-commerce site. Say the site has a few hundred shops and a few million products. The number of products per shop ranges from 1,000 to 100,000. I need to be able to aggregate on both the product fields and the shop fields. All the products and all the shops have the same schema.
Product
{
"productName"
"price"
"category"
}
Shop
{
"shopName"
"rating"
}
1) Is it more efficient to have a) 1 index/shop, b) same index and 1 type/shop or c) same index, same type and have a field to determine the shop of the product?
I read some related articles and most of them are in favour of same index and 1 type/shop. But then they say that if there is one single index which has a large number of docs it might be even slower than having multiple indices.
2) I also need to perform JOINS and aggregations between the shops and the products. For example I need to be able to retrieve all the products from the shops with rating higher than 8/10 and also get the number of products per category. Is it preferable to use a) application-side JOIN, b) parent-child relationships, c) Siren plug-in, d) something else?
I would definitely use a single denormalized index/type for the use cases you've mentioned. If you need more fields for the shop, you can create another index for shops while still keeping the first denormalized index. Mind that you may need a unique shop id alongside the shop name.
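As a rough sketch of that denormalized layout, each product document can carry a copy of its shop's fields, and the example from the question (products from shops rated above 8, plus product counts per category) becomes a single filtered query with a terms aggregation. The flattened field names `shopId`, `shopName` and `shopRating` and both helper functions are hypothetical, and `category` is assumed to be mapped as a keyword field.

```python
from typing import Any, Dict


def denormalize(product: Dict[str, Any], shop: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the shop fields onto the product document (names are assumptions)."""
    return {
        **product,
        "shopId": shop["shopId"],
        "shopName": shop["shopName"],
        "shopRating": shop["rating"],
    }


def high_rated_products_query(min_rating: float, size: int = 10) -> Dict[str, Any]:
    """Products from shops rated above min_rating, with counts per category."""
    return {
        "size": size,
        "query": {
            "bool": {"filter": [{"range": {"shopRating": {"gt": min_rating}}}]}
        },
        "aggs": {
            # Assumes "category" is a keyword field.
            "products_per_category": {"terms": {"field": "category"}}
        },
    }
```

The trade-off is that a change to a shop's rating has to be propagated to all of that shop's product documents.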

CodeIgniter Cart ID + Options

I have a situation:
I have products that are in a CodeIgniter Cart custom store.
Each product has an ID associated with it, but also has options for it (sizes).
These sizes all have different prices. (We're talking about photos being sold at different print sizes).
Because CI Cart updates, adds and deletes based on the product ID inserted, I am not able to insert one product with 2 different sizes.
As of now, the only solution I can think of is to pass the ID to the cart as IMAGEID_OPTIONID so that it contains both IDs.
However, I thought there might be an easier, more uniform way of doing this?
Or a better solution than an ID that isn't (on its own) associated with anything specific unless I explode it?
I recently built a site that had these constraints. In short, you'll want to create a distinction between "products" and "product groups". Think of it as managing the most discrete data units. In reality, shirt X sized medium is actually a different thing than shirt X sized large...doubly so if you have prices that are built on these qualities (this becomes more realistic when you consider cloth patterns or colors).
So anyway, if you have a "groups" table, a "product_groups" table, and a "products" table, you can keep all of these ideas distinct. On your products table, you can have columns for "size" and "color" (and any other distinguishing property you can think of) and a column for "price". Alternatively, you can go even more hardcore and make separate pricing tables that match up prices to unique products (this would be especially useful if you want to keep track of historical prices and discounts).
Then in your cart you can simply attach product_ids to cart_ids and perform a couple of joins to determine what "group" this product is a part of, what pictures are in that group (or exist for that product), and so on. It's not a simple problem, but following this line of thought should help get you on the right path.
One last point: keeping track of unique products like this also makes inventory accounting much, much more straightforward.
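The layout described above can be sketched with an in-memory SQLite database. This is a simplification under stated assumptions: it uses only a `product_groups` table, a `products` table and a `cart_items` table, and all table names, column names and sample rows are hypothetical. Each print size is its own product row with its own id and price, so the cart can hold two sizes of the same photo without a composite id.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create the sketch schema and one photo sold in two print sizes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product_groups (
            id INTEGER PRIMARY KEY,
            title TEXT NOT NULL
        );
        CREATE TABLE products (
            id INTEGER PRIMARY KEY,
            group_id INTEGER NOT NULL REFERENCES product_groups(id),
            size TEXT NOT NULL,
            price_cents INTEGER NOT NULL
        );
        CREATE TABLE cart_items (
            cart_id INTEGER NOT NULL,
            product_id INTEGER NOT NULL REFERENCES products(id),
            qty INTEGER NOT NULL,
            PRIMARY KEY (cart_id, product_id)
        );
    """)
    conn.execute("INSERT INTO product_groups VALUES (1, 'Sunset photo')")
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?, ?)",
        [(10, 1, "4x6", 500), (11, 1, "8x10", 1500)],
    )
    # The same photo in two sizes, as two distinct cart lines.
    conn.executemany(
        "INSERT INTO cart_items VALUES (?, ?, ?)",
        [(1, 10, 2), (1, 11, 1)],
    )
    conn.commit()
    return conn


def cart_lines(conn: sqlite3.Connection, cart_id: int) -> List[Tuple[str, str, int, int]]:
    """Join cart items back to their product and group."""
    return conn.execute(
        """
        SELECT g.title, p.size, p.price_cents, c.qty
        FROM cart_items c
        JOIN products p ON p.id = c.product_id
        JOIN product_groups g ON g.id = p.group_id
        WHERE c.cart_id = ?
        ORDER BY p.id
        """,
        (cart_id,),
    ).fetchall()
```

In the CodeIgniter cart, the id passed in would then be the `products.id` of the specific size rather than the image id.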
