Need help in choosing right caching strategy - caching

We car planning to store prices data to Memcache. prices are subject to car variant and location(city). This is how it is stored in the database.
variant, city, price
21, 48, 40000
Now the confusion is that how do we store this data into Memcache.
Possibility 1 : We store each price in separate cache object and do a multiget if the price of all variant belongs to a model need to be displayed on a single page.
Possibility 2 : We store prices at the model, city level. Prices of all variants of a model will be stored in a single object. This object will be slightly heavy but multiget wouldn't be required.
Need your help in taking the right decision.

TLDR: It all depends on how you want to expose the feature to your end users, and what the query pattern looks like.
For example:
If your flow is that a user can see all the variant prices on a detail page for a city, then you could use <city_id>_<car_model_id> as the key, and store all data for variants against that key (Possibility 2).
If the flow is that a user can see prices of all variants across cities on a single page, then you would need the key as <car_model_id> and store all data as Json against this key
If the flow is that a user can see prices of one variant at a time only for every city, then you would use the key <city_id>_<car_variant_id> and store prices.
One thing to definitely keep in mind is the frequency with which you may have to refresh the cache/ perform upserts, which in the case of cars should be infrequent (who changes the prices of a car every day/second). So, I would have gone with option 1 above (Possibility 2 as described by you).

Related

Sorting elasticsearch types based on child type property

I'm working on a e-commerce search page and need to free text search products and have multiple facet options and sorting capabilities. The issue I'm facing has to do with product prices:
One product has multiple prices - there are special discounts, B2B customer specific prices, and specific B2C prices. There could be a few hundred prices per product.
I need to be able to do to a full text search on products, but still be able to sort on one of the selected price groups.
My initial though would be to put all of the prices into the product item, but that means I'll need to update the product objects in the index every time a price changes - which is often. This will also make the objects quite big.
I see that elasticsearch now has the capability of HasParent/HasChildren queries, but I am not sure if that is the right way to go, or if it even is possible.
Is it possible to keep prices as a separate type outside the product type and use the HasParent/HasChilden queries to sort the procuts on the price?
My initial though would be to put all of the prices into the product item, but that means I'll need to update the product objects in the index every time a price changes - which is often. This will also make the objects quite big.
I would personally be inclined not to store complex pricing data within Elasticsearch, at least not prices calculated by business logic such as discounts and specific B2C prices.
A base price could be stored for querying and sorting, and apply pricing logic to this with scripting, using script queries and script sorting, respectively.
I see that elasticsearch now has the capability of HasParent/HasChildren queries, but I am not sure if that is the right way to go, or if it even is possible. Is it possible to keep prices as a separate type outside the product type and use the HasParent/HasChilden queries to sort the procuts on the price?
Parent/Child relationships operate on documents within a single index, with a join datatype field on a document to indicate the relationship between a parent and a child, and child documents indexed on the same shard as the parent. If children are not evenly distributed across parents/shards e.g. one parent document has a million children and the others have only a few each, it's possible to end up with hot spots within shards that can affect performance. Product and pricing data doesn't feel like a good fit for Parent/Child; pricing sounds like it's too dynamic to be stored within documents.

Modeling many-to-many relationship in data warehouse

I have to design data warehouse model and ETL process for class at my University. My data warehouse has to store opinions / comments about a product, each record should consist of:
comment text (String)
product score ({0, 0.5, … , 4.5, 5})
comment author (String)
comment date (Date)
product recommendation ({Yes, No})
comment up votes (Int)
comment down votes (Int)
product pros (many Strings, e.g {price, design, durability, … }) and its count
product cons (many Strings, e.g {too loud, too heavy, price, … }) and
its count
In addition data warehouse should store information about product:
product category
product brand
product model
I want to create data warehouse model first, but I have problem with storing product pros and cons as it is many-to-many relationship. In normal relational database I would simply create associative table, but here I am not sure how to proceed, after all I don’t want to normalize facts table.
I am considering 3 approaches, first, which I presented in diagram below. I used bridge table method (though, I don’t know if correctly) to get rid of many-to-many relationship. I don’t know how it will impact querying performance.
Second approach I may use is boolean column method. In PROS and CONS table I can create a column for each possible value, but there can be up to 100 different pros or cons. Also number of possible pros or cons is not constant in time. Authors in their comments can list new pros or cons (that’s how it works in data source), but I can’t add new columns (I shouldn’t change data in data warehouse).
Third approach I am considering, is to keep pros in PROS table but in 1 column, where values will be separated using commas or some other delimiter e.g. “price, design, color”. It keeps things simple but hard to analyze or slice & dice.
Which approach should I use in this situation? Which is better for loading data into data warehouse, because form data source I will get all the comments and I want to only load comments that are new since last loading?
What I think is, if we can get your first option little bit modified to than what you have said here, it would be the best as I understand.
in your image you have provided, having the Pros_Bridge_Detail table is fine. The rest need to be changed.
you can remove the pros_Bridge table that holds just the count. you can actually add that column to your COMMENT fact table you have up there. That would be more efficient and easy when it comes to queries rather than querying in many tables.
you said you have many areas to give pros like price, design, durability etc. Lets put those stuff into a separate dimension.
Add a new column to your Pros_Bridge_Detail table to hold the ID of the newly created Dimension that holds the product pro types (Design, durability etc).
Now, once you add a product Pro, the Pros_Bridge_Detail table will have the pros the user give and also hold the value of regarding what the pro is given via the ID of the new dimension.
Also don't forget to store the Comment ID as well in Pros_Bridge_Detail table as that will be your link (FK) to Comments fact table you have.
Same can be done to Cons as well.
Hope you understand what I just explained and hope it helps. let know if you have any issues.

Event Sourcing - Aggregate modeling

Two questions
1) How to model aggregate and reference between them
2) How to organise/store events so that they can be retrieved efficiently
Take this typical use case as example, we have Order and LineItem (they are an aggregate, Order is the aggregate root), and Product aggregate.
As LineItem needs to know which Product, so there are two options 1) LineItem has direct reference to Product aggregate (which seems not a best practice, as it violate the idea of aggregate being a consistence boundary because we can update Product aggregate directly from Order aggregate) 2) then LineItem only has ProductId.
It looks like 2nd option is the way to go...What do you think here?
However, another problem arises which is about building a Order read/view model. In this Order view model, it needs to know which Products are in Order (i.e. ProductId, Type, etc.). The typical use case is reporting, and CommandHandler also can use this Product object to perform logic such as whether there are too many particular products, etc. In order to do it, given the fact that those data are in two separate aggregate, then we need 1+ database roundtrips. As we are using events to build model, so the pseudo code looks like below
1) for a given order id (guid, order aggregate id), we load all the events for it; -- 1st database access
2) then build a Order aggregate, then we know which ProductId are referenced in Order;
3) for the list of ProductIds, we load all events for it; -- 2nd database access
If we build a really big graph of objects (a lot of different aggregates), then this may end up with a few more database access (each of which is slow)...What's your idea in here?
Thanks
Take this typical use case as example, we have Order and LineItem (they are an aggregate, Order is the aggregate root), and Product aggregate.
The Order aggregate makes sense the way you have described it. "Product aggregate" is more suspicious; do you ask the model if the product is allowed to change, or are you telling the model that the product has changed?
If Product can change without first consulting with the order, then the LineItem must not include the product. A reference to the product (aka the ProductId) is ok.
If we build a really big graph of objects (a lot of different aggregates), then this may end up with a few more database access (each of which is slow)...What's your idea in here?
For reads, reports, and the like -- where you aren't going to be adding new events to the history -- one possible answer is to do the slow work in advance. An asynchronous process listens for writes in the event store, and then publishes those events to a bus. Subscribers build new versions of the reports when new events are observed, and cache the results. (search keyword: cqrs)
When a client asks for a report, you give them one out of the cache. All the work is done, so it's very quick.
For command handlers, the answer is more complicated. Business rules are supposed to be in the domain model, so having the command handler try to validate the command (as opposed to the domain model) is a bit broken.
The command handler can load the products to see what the state might look like, and pass that information to the aggregate with the command data, but it's not clear that's a good idea -- if the client is going to send a command to be run, and you need to flesh out the Order command with Product data, why not instead have the command add the product data to the command directly, and skip the middle man.
CommandHandler also can use this Product object to perform logic such as whether there are too many particular products, etc.
This example is a bit vague, but taking a guess: you are thinking about cases where you prevent an order from being placed if the available inventory is insufficient to fulfill the order.
For real world inventory - a physical book in a warehouse - that's probably the wrong approach to take. First, the model itself is wrong; if you want to know how much product is in the warehouse, you should be querying the warehouse, not the product. Second, a physical warehouse is not constrained by your model -- calling the addProduct method on a warehouse aggregate doesn't cause the product to magically appear there.
Third, it probably doesn't match very well with what your domain experts want anyway. If the model says that the warehouse doesn't have enough product, do you think the stake holders want the system to
tell the shopper to buy the product somewhere else, or...
accept the order, and contact the supplier for a new delivery.
Hint: when in doubt, carefully review how amazon.com does it.

design of car booking application using elasticsearch

I need some help in designing car booking application.
There is a document with information about car (title, model, brand, info, etc.)
Problems I'm stuck with are:
How to store available booking days? (I suppose I could use nested
free date range objects in array)
How to store price per day (it's possible to have individual price
per day)?
Booking days and prices could change often. So the third question is: "how to update them cleverly (partially), so I shouldn't read the document, and then store it". I'm looking at script solution using
update api (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-update.html), but it looks ugly. Maybe there are other approaches?
Thanks,
Alex
with the introduction of the range datatypes, there is no need to use a real nested object, if you meant that.
That might also help you with storing the prices, but that could just be any object I suppose (it depends if you want to search for that as well).
Update API was made for exactly that use-case, that you do not need to get the whole document, so that shounds like a plan.

CodeIgniter Cart ID + Options

I have a situation:
I have products that are in a CodeIgniter Cart custom store.
Each product has an ID associated with it, but also has options for it (sizes).
These sizes all have different prices. (We're talking about photos being sold at different print sizes).
Because CI Cart updates, adds and deletes based on the product ID inserted, I am not able to insert one product with 2 different sizes.
As of now, the only solution I can think of is to pass the ID to the cart as IMAGEID_OPTIONID so that it contains both IDs.
However, I thought there might be an easier, more uniform way of doing this?
Or a better solution than an ID that isn't (on it's own) associated with anything specific unless i explode it..?
I recently built a site that had these constraints. In short, you'll want to create a distinction between "products" and "product groups". Think of it as managing the most discrete data units. In reality, shirt X sized medium is actually a different thing than shirt X sized large...doubly so if you have prices that are built on these qualities (this becomes more realistic when you consider cloth patterns or colors).
So anyway, if you have a "groups" table, a "product_groups" table, and a "products" table, you can keep all of these ideas distinct. On your products table, you can have columns for "size" and "color" (and any other distinguishing property you can think of) and a column for "price". Alternatively, you can go even more hardcore and make separate pricing tables that match up prices to unique products (this would be especially useful if you want to keep track of historical prices and discounts).
Then in your cart you can simply attach product_ids to cart_ids and perform a couple of joins to determine what "group" this product is a part of, what pictures are in that group (or exist for that product), and so on. It's not a simple problem, but following this line of thought should help get you on the right path.
One last point: keeping track of unique products like this also makes inventory accounting much, much more straightforward.

Resources