I work at a B2B e-commerce company and we want to improve our user experience with a feature called "Magic shopping cart".
Let me explain:
Our website is a marketplace with multiple sellers, each offering a range of products with limited stock per product. The point of the feature is to find, for all the products a customer wishes to buy, the cheapest possible cart.
At the moment customers need to search through the whole website to find the best prices and to group as many products as possible with the same seller to reduce shipping fees.
We are looking for an algorithm that does all that work for our customers, i.e. finds the best combination of sellers and products so that they buy at the lowest total price.
We have written a function that builds every possible shopping cart for the given products and quantities and then tests which one is the cheapest. This works flawlessly, except that it takes way too much time.
We need a quicker/more efficient way to find the cheapest cart. We have thought of machine learning (we are no experts) but we are open to all ideas.
Conventional algorithms offer better speed in most cases compared to machine learning algorithms. If the customer wants particular goods, and there is already a list of ALL offers for those goods, then you just need an efficient search algorithm.
Machine learning could help you identify, for example, which goods belong to which classes, but that does not appear to be the problem you are trying to solve.
Perhaps you are looking for a trade-off between the speed and the quality of the magic cart feature (not the optimum, but a good solution). In that case there might be room for some machine learning, but it takes a more specific formulation of the search task to come up with a specific algorithm!
You might also look into evolutionary algorithms and other optimization methods.
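As a rough illustration of what such an optimization approach could look like, here is a minimal local-search sketch. The structures `offers[product][seller]` (unit price), `stock[product][seller]`, `shipping[seller]` and `wanted[product]` (quantity) are assumptions about your data, not your actual schema, and the search itself is deliberately naive:

```python
import random

def cart_cost(sellers, wanted, offers, stock, shipping):
    """Cheapest feasible cart using only the given sellers, or None if infeasible."""
    total = sum(shipping[s] for s in sellers)
    for product, qty in wanted.items():
        # fill the quantity from the cheapest sellers in this subset, respecting stock
        remaining = qty
        for price, s in sorted((offers[product][s], s)
                               for s in sellers if s in offers.get(product, {})):
            take = min(remaining, stock[product][s])
            total += take * price
            remaining -= take
            if remaining == 0:
                break
        if remaining > 0:
            return None  # these sellers cannot supply enough of this product
    return total

def local_search(all_sellers, wanted, offers, stock, shipping, iters=1000):
    """Start from 'buy from everyone' and randomly toggle sellers in or out."""
    best = set(all_sellers)
    best_cost = cart_cost(best, wanted, offers, stock, shipping)
    for _ in range(iters):
        candidate = set(best) ^ {random.choice(list(all_sellers))}
        cost = cart_cost(candidate, wanted, offers, stock, shipping)
        if cost is not None and (best_cost is None or cost < best_cost):
            best, best_cost = candidate, cost
    return best, best_cost
```

An evolutionary algorithm would use the same cost function, just with a population of seller subsets instead of a single one.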
Hospitals are changing the way they sterilize their equipment. Previously, local surgeons kept all their own equipment and made up their own surgery trays. Now they have to conform to a country-wide standard. They want to know how many of the new trays they can make from their existing stock, and how much new equipment they need to buy.
The inventory of medical equipment looks like this:
http://pastebin.com/rstWSurU
Each hospital has codes for various medical equipment and a number for how many it has of the corresponding item.
Three surgery trays with their corresponding items are shown in this dictionary:
http://pastebin.com/bUAZhanK
There are a total of 144 different operation trays.
The hospitals will be told they need 25 of tray x, 30 of tray y, etc.
They would like to maximize the number of trays they can assemble with their current stock. They would also like to know what equipment they need to purchase in order to complete the remaining trays.
I have thought about two possible solutions. One is representing the problem as a linear programming problem. The other is solving the first 90% of the problem with a round-robin brute force, then solving the remaining 10% with a randomized algorithm run several times and picking the best of those tries.
I would love to hear if anyone knows a smart way of how to tackle this problem!
If I understand this correctly we can optimize for each hospital separately. My guess is that the following would be a good start for an MIP (Mixed Integer Programming) model:
I use the following indices: i for items and t for trays. x(t,i) indicates how many of item i we assign to trays of type t. y(t) counts the number of trays of each type that we can compose using the available items. From the solution we can calculate the shortages that we need to order.
Of course we are just maximizing the number of trays we can make. There is no consideration of balancing (many trays of one type and few or zero of another). I mitigate this a little by not allowing more trays to be created than required (if we have more items they need to go to other types of trays). This requirement is formulated as an upper bound on y(t).
For large problems we can restrict the (t,i) combinations to the ones that are actually possible; this will make the model smaller. In precise math notation the model looks roughly like this:
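(A sketch reconstructed from the description above, since the original notation is not reproduced here: q(t,i) is the number of item i that one tray of type t requires, stock(i) the inventory on hand, and demand(t) the number of trays of type t requested.)

$$
\begin{aligned}
\max\ & \sum_t y(t) \\
\text{s.t.}\ & x(t,i) = q(t,i)\cdot y(t) && \forall t,i \\
 & \sum_t x(t,i) \le \mathit{stock}(i) && \forall i \\
 & 0 \le y(t) \le \mathit{demand}(t),\quad y(t) \text{ integer}
\end{aligned}
$$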
A further optimization would be to substitute out the variables x(t,i).
Adding shipping of surplus items to other hospitals would make the model more difficult. In that case we could end up with a model that needs to look at all hospitals simultaneously. That might be an interesting case for some decomposition approach.
I'm working on a program to "optimally" buy Magic cards. On the site each user has a "mini-shop"; think eBay without the auctions.
The user enters a list of cards he wants to buy; I then fetch all offers from the site and print an "optimal" shopping list, optimal meaning cheapest. Prices differ between shops, and postage also changes depending on how many cards you buy.
I would like to implement an algorithm that creates that list for me. I have written one, which works (I think), but I have no idea how well it works.
So my question is this: can this problem be solved by some existing algorithm? It would need to deal with ~1,000 offers for EACH card (normally 40-60 cards, so around 50k different offers).
Can someone point me in the right direction on this?
The "partition" or "bin packing" problems (which are both mappable to what you want to do) is known to be NP-complete. Thus, the only way to make SURE that you have the optimal solution is to try all possible solutions and pick the best way.
If the user wants to buy 1,000 cards, trying all possible options is not computationally feasible, so you need to use heuristics.
I am planning to log all user actions such as pages viewed, tags, etc.
What would be a good, lean solution for data-mining this data to get recommendations?
For example:
- Figure out all of a user's interests from the URLs they viewed (assuming I know the associated tags).
- Find people who have similar interests, e.g. John & Jane both viewed URLs related to cars.
Edit:
It's really my lack of knowledge in this domain that's the limiting factor in getting started.
Let me rephrase.
Let's take a site like Stack Overflow or Quora. All my browsing history through different questions is recorded, and Quora does a data-mining job of looking through it and populating my stream with related questions. I go through questions related to parenting, and the next time I log in I see a stream of questions about parenting. Ditto with Amazon shopping: I browse watches and mixers, and two days later they send me a mail of related shopping items I might be interested in.
My question is: how do they efficiently store this data and then mine it to show the next relevant set of items?
Data mining is a method that needs really enormous amounts of storage space and enormous amounts of computing power.
Let me give you an example:
Imagine you are the boss of a big chain of supermarkets like Wal-Mart, and you want to find out how to place the products in your stores so that consumers spend lots of money when they enter your shops.
First of all, you need an idea. Your idea is to find products from different product groups that are often bought together. If you have such a pair of products, you should place them as far apart as possible. If a customer wants to buy both, he or she has to walk through your whole shop, and along that way you place other products that might go well with one of the pair but are not sold as often. Some customers will see such a product and buy it, and the revenue from these additional products is the payoff of your data-mining process.
So you need lots of data. You have to store data from every purchase by every customer in every one of your shops. When a person buys a bottle of milk, a sausage and some bread, you need to store which goods were sold, in what quantity, and at what price. Every purchase needs its own ID if you want to be able to tell that the milk and the sausage were bought together.
So you have a huge number of purchase records, and you have a lot of different products. Let's say you are selling 10,000 different products in your shops. Every product can be paired with every other one, which makes roughly 10,000 * 10,000 / 2 = 50,000,000 (50 million) pairs. For each of these possible pairs you have to find out whether it occurs in a purchase. But maybe you think you have different customers on a Saturday afternoon than on a Wednesday late morning, so you have to store the time of purchase too. Maybe you define 20 time slices per week; that makes 50M * 20 = 1 billion records. And because people in Memphis might buy different things than people in Beverly Hills, you need the location in your data too. Let's say you define 50 regions, so you end up with 50 billion records in your database.
Then you process all your data. If a customer buys 20 products in one purchase, that is 20 * 19 / 2 = 190 pairs. For each of these pairs you increase the counter for the time and place of this purchase in your database. But by how much should you increase the counter? Just by 1? Or by the quantity of the bought products? But you have a pair of two products: should you take the sum of both, or the maximum? Better to use more than one counter, so you can count it in all the ways you can think of.
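As a toy sketch of just this counting step (ignoring the time-slice and region dimensions and using a single counter per pair):

```python
from collections import Counter
from itertools import combinations

pair_counts = Counter()

def record_purchase(products):
    # count each unordered pair of distinct products once per purchase;
    # a real schema would also key on time slice and region, as described above
    for pair in combinations(sorted(set(products)), 2):
        pair_counts[pair] += 1

record_purchase(["milk", "bread", "sausage"])
record_purchase(["milk", "bread"])
print(pair_counts.most_common(3))   # ('bread', 'milk') has been seen twice
```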
And you have to do something else: customers buy much more milk and bread than champagne and caviar. So even if they chose products arbitrarily, the pair milk-bread would have a higher count than the pair champagne-caviar. When you analyze your data, you must take care of effects like this too.
Then, when you have done all this, you run your data-mining query. You select the pair with the highest ratio of actual count to expected count. You select it from a database table with many billions of records. This might take some hours to process, so think carefully about whether your query is really what you want to know before you submit it!
You might find out that in rural areas on a Saturday afternoon people buy much more beer together with diapers than you expected. So you place beer at one end of the shop and diapers at the other end, and that makes lots of people walk through your whole shop, where they see (and hopefully buy) many other things they wouldn't have seen (and bought) if beer and diapers were placed close together.
And remember: the costs of your data-mining process are covered only by the additional purchases of your customers!
In conclusion:
- You must store pairs, triples, or even bigger tuples of items, which will need a lot of space. Because you don't know what you will find in the end, you have to store every possible combination!
- You must count those tuples.
- You must compare the counted values with estimated values.
Store each transaction as a vector of tags (i.e. the visited pages containing these tags). Then run association analysis on this data (I can recommend Weka) to find associations using the available "Associate" algorithms. Effectiveness depends on a lot of different things, of course.
One thing a guy at my university told me is that you can often simply create a vector of all the products one person has bought and compare it with other people's vectors to get decent recommendations. That is, represent users by the products they buy or the pages they visit and do e.g. Jaccard similarity calculations. If the "people" are similar, then look at products they bought that this person didn't (probably those that are most common among the population of similar people).
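A minimal sketch of that idea (all the data and names below are made up for illustration):

```python
def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# users represented as the set of things they bought / pages they visited
users = {
    "john": {"cars", "tyres", "watches"},
    "jane": {"cars", "tyres", "mixers"},
    "bob":  {"parenting", "recipes"},
}

def recommend(target):
    # most similar other user, then the items they have that the target hasn't seen
    best = max((u for u in users if u != target),
               key=lambda u: jaccard(users[target], users[u]))
    return users[best] - users[target]

print(recommend("john"))   # -> {'mixers'}
```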
Storage is a whole different ballgame; there are many good indices for vector data, such as KD-trees, implemented in various RDBMSs.
Take a course in data mining :) or just read one of the excellent textbooks available (I have read Introduction to Data Mining by Pang-Ning Tan et al. and it's good).
And regarding storing all the pairs of products etc.: of course this is not actually done; more efficient algorithms based on support and confidence are used to prune the search space.
I should say that recommendation is a machine learning problem.
How to store the data depends on which algorithm you choose.
Or The Traveling Salesman plays Magic!
I think this is a rather interesting algorithmic challenge. Curious if anyone has any good suggestions for solving it, or if it is already solvable in a known way.
TCGPlayer.com sells collectible cards for a variety of games, including Magic: The Gathering. Instead of just selling cards from their own inventory, they actually resell from multiple vendors (50+). Each vendor has a different inventory of cards and a different price per card. Each vendor also charges a flat rate for shipping (usually). Given all of that, how would one find the best price for a deck of cards (say 40-100 cards)?
Just finding the best price for each card doesn't work because if you order 10 cards from 10 different vendors then you pay shipping 10 times, but if you order all 10 from one vendor you only pay shipping once.
The other night I wrote a simple HTML Scraper (using HTML Agility Pack) that grabs all the different prices for each card, and then finds all the vendors that carry all the cards in the deck, totals the price of the cards from each vendor and sorts by price. That was really easy. The total prices ended up being near the total median price for all the cards.
I did notice that some of the individual cards ended up being much higher than the median price. That raises the question of splitting an order over multiple vendors, but only if enough savings could be made by splitting the order up to cover the additional shipping (each added vendor adds another shipping charge).
Logically it seems that the best price will probably only involve a few different vendors, but if the cards are expensive enough (and some are) then in theory ordering each card from a different vendor could still result in enough savings to justify all the extra shipping.
If you were going to tackle this, how would you do it? Pure brute force, computing every possible card/vendor combination? A process more likely to finish in my lifetime would seem to involve a methodical series of estimates over a fixed number of iterations. I have a couple of ideas, but am curious what others might suggest.
I am looking more for the algorithm than actual code. I am currently using .NET though, if that makes any difference.
I would just be greedy.
Assume that you are going to eat the shipping cost and buy from all vendors. Work out the absolute lowest price you can get that way. Then for each vendor work out how much being able to buy some cards from them, rather than from someone else, saves you. Order the vendors by shipping cost minus incremental savings.
Starting with the vendor that provides the least value, axe that vendor, redistribute their cards to the other vendors, and recalculate the incremental savings. Wash, rinse, and repeat until your most marginal vendor is saving you money.
This should find a good solution but is not guaranteed to find the best solution. Finding the absolute best solution, though, seems likely to be NP-hard.
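A rough sketch of this greedy elimination, using hypothetical inputs `prices[card][vendor]` and `shipping[vendor]`; it simply drops whichever vendor's removal reduces the total the most, which is a slight simplification of the ordering described above:

```python
def order_cost(vendors, prices, shipping):
    """Cheapest total using only `vendors`, or None if some card is unavailable."""
    total = sum(shipping[v] for v in vendors)
    for offers in prices.values():
        available = [offers[v] for v in vendors if v in offers]
        if not available:
            return None
        total += min(available)
    return total

def greedy_elimination(prices, shipping):
    vendors = set(shipping)
    cost = order_cost(vendors, prices, shipping)
    while len(vendors) > 1:
        # try dropping each vendor and redistributing its cards
        options = [(c, v) for v in vendors
                   if (c := order_cost(vendors - {v}, prices, shipping)) is not None]
        if not options:
            break
        new_cost, victim = min(options)
        if new_cost >= cost:     # every remaining vendor is paying for its shipping
            break
        vendors.remove(victim)
        cost = new_cost
    return vendors, cost
```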
This is isomorphic to the uncapacitated facility location problem.
- card in the deck : client
- vendor : possible facility location
- vendor shipping rate : cost of opening a facility at a location
- cost of a card with a particular vendor : "distance" from a client to a facility
Facility location is a well-studied problem in the combinatorial optimization literature.
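Written out with that mapping (the standard formulation, where y(v) = 1 if we order from vendor v, x(c,v) = 1 if card c is bought from vendor v, f(v) is the shipping fee and p(c,v) the card's price there):

$$
\begin{aligned}
\min\ & \sum_v f(v)\, y(v) + \sum_c \sum_v p(c,v)\, x(c,v) \\
\text{s.t.}\ & \sum_v x(c,v) = 1 && \forall c \\
 & x(c,v) \le y(v) && \forall c,v \\
 & x(c,v),\, y(v) \in \{0,1\}
\end{aligned}
$$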
Interesting question! :)
So if we have n cards and m vendors, the brute force approach might have to check up to m^n combinations, right? (A bit less, since not every vendor has every card, but I guess that doesn't really matter in the grand scheme of things ;)
Let's for a second assume each vendor has every card, and then see later on how things change if they don't.
1. Find the cheapest one-vendor solution.
2. Order the cards by price and find the most expensive card that's cheaper at another vendor.
3. For all cards from vendor 1, move them to vendor 2 if they're cheaper there.
4. If having added vendor 2 doesn't make the order cheaper, undo and terminate; otherwise repeat from step 2.
So if no single vendor has all the cards, you have to start with a multi-vendor situation. For each vendor, you might start by buying all the cards it stocks, then apply the algorithm to the remaining cards.
Obviously, you may not be able to exploit all the subtleties in the pricing with this method. But if we assume that a large part of the price differences comes from individual high-priced cards, I think you can find a reasonable solution this way.
Ok, after writing all this I realized the m^n assumption is actually wrong.
Once you have chosen a set of vendors to buy from, you can simply choose the cheapest vendor for each card. This is a great advantage because the individual choices of where to buy each card don't interfere with each other.
What does this mean for our problem? At first glance, it means that the selection of vendors is the hard part (in terms of computational complexity), not the individual allocation of your buying choices. So instead of m^n, you get 2^m possible configurations in the worst case. What we need is a heuristic for choosing vendors rather than choosing individual cards, which might make the heuristic above even more justifiable.
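A toy illustration of that observation (hypothetical inputs again): the inner allocation is just a minimum per card, so only the subset of vendors has to be searched.

```python
from itertools import chain, combinations

def cheapest_cart(prices, shipping):
    """Exact but exponential: tries every non-empty vendor subset (2^m of them)."""
    def cost(subset):
        total = sum(shipping[v] for v in subset)
        for offers in prices.values():
            in_subset = [offers[v] for v in subset if v in offers]
            if not in_subset:
                return float("inf")   # this subset cannot supply some card
            total += min(in_subset)   # cheapest vendor in the subset for this card
        return total

    vendors = list(shipping)
    subsets = chain.from_iterable(combinations(vendors, k)
                                  for k in range(1, len(vendors) + 1))
    best = min(subsets, key=cost)
    return best, cost(best)
```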
I myself have pondered this. Consider the following:
If it takes you a week to figure out, code, and debug an algorithm that only provides a 1% discount, would you do it?
The answer is probably "No" (unless you're spending your entire life savings on cards, in which case you may be crazy). =)... or Amazon.com
Consequently, there is already an easy approximating algorithm:
Wait until you're buying lots of cards (reduce the shipping overhead).
Buy the cards from 3 vendors:
- the two with the cheapest-but-most-diverse inventories
- a third which isn't really cheap but definitely has every card you'd want.
Optimize accordingly (for each card, buy from the cheaper one).
Also consider local vendors you could just walk to, pre-constructed decks, and trading.
Based on firsthand and secondhand experience, I can say you will find that you can get roughly the median price on each card, with perhaps a few dollars more shipping than you would otherwise pay. You may have to pay a tiny bit more for understocked cards, but those will be few and far between, and the shipping savings will make up for it.
I recall the old programming adage: "Never optimize, until it's absolutely necessary; chances are you won't need to, or would have optimized the wrong thing." (e.g. your time is a resource too, and also has monetary value)
edit: Given that, this is an amazingly cool problem and one should solve it if one has time.
My algorithm goes like this:
For each card, calculate the average available price, i.e. the sum of the prices offered by each vendor divided by the number of vendors.
Now for that card, select the vendors that offer it at or below the average price.
Now for each card we have a list of vendors. Take the intersection; this way we end up with the set of vendors providing the maximum number of cards at or below the average price.
I'm still thinking over the next steps, but I'm putting the rough idea here.
Now we are left with cards whose selected vendor would supply only that single card. For such cards, we look at the price lists of the already short-listed vendors with the most cards, and if the price difference is less than the shipping cost, we add the card to that vendor's list.
I know this will require a lot of optimization, but this is what I have roughly figured out. Hope this helps.
How about this:
Calculate the average price per ordered card across all vendors.
For each vendor that has at least one of the cards, calculate the total savings for all cards in the order as the difference between each card's price at that vendor and the average price.
Start with the vendor with the highest total savings and select all of those cards from that vendor.
Continue to select vendors with the next highest total savings until you have all of the cards in the order selected. Skip vendors that don't have cards that you still need.
From the selected list of vendors, redistribute the card purchases to the vendors with the best price for that card.
From the remaining list of vendors, and if the list is small enough, you could then brute force any vendors with a low card count to see if you could move the cards to other vendors to eliminate the shipping cost.
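A quick sketch of these steps; `prices[card][vendor]` and `shipping[vendor]` are assumed inputs, and it also assumes every card is offered by at least one vendor:

```python
def savings_heuristic(prices, shipping):
    # 1. average price per card across the vendors that offer it
    avg = {c: sum(offers.values()) / len(offers) for c, offers in prices.items()}

    # 2. total savings per vendor versus those averages
    savings = {v: 0.0 for v in shipping}
    for c, offers in prices.items():
        for v, p in offers.items():
            savings[v] += avg[c] - p

    # 3./4. take vendors by descending savings until every card is covered,
    # skipping vendors that add nothing we still need
    chosen, uncovered = [], set(prices)
    for v in sorted(savings, key=savings.get, reverse=True):
        cards_here = {c for c in uncovered if v in prices[c]}
        if cards_here:
            chosen.append(v)
            uncovered -= cards_here
        if not uncovered:
            break

    # 5. redistribute: buy each card from the cheapest chosen vendor that offers it
    allocation = {c: min((v for v in chosen if v in prices[c]),
                         key=lambda v: prices[c][v])
                  for c in prices}
    total = sum(shipping[v] for v in set(allocation.values()))
    total += sum(prices[c][v] for c, v in allocation.items())
    return allocation, total
```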
I actually wrote this exact thing last year. The first thing I do after loading all the prices is weed out my card pool:
Each vendor can have multiple versions of each card, as there are reprints. Find the cheapest one.
Eliminate any offer where the card's price is greater than the cheapest card-plus-shipping combo. That is, if I can buy the card more cheaply as a one-off order from another vendor than by adding it to an existing order from your store, I will buy it from the other vendor.
Eliminate any vendor whose offering I can buy more cheaply (for every card) from another vendor. Basically, if another vendor out-prices you on every card, and on the total plus shipping, then you are gone.
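A rough sketch of that domination rule (hypothetical `prices[card][vendor]` / `shipping[vendor]` inputs, not the author's actual code):

```python
def drop_dominated_vendors(prices, shipping):
    """Remove vendor b if some vendor a offers everything b offers, at prices that are
    never higher, and a's total plus shipping on those cards is no higher either."""
    stocked = {v: {c for c, offers in prices.items() if v in offers} for v in shipping}
    dominated = set()
    for b in shipping:
        for a in shipping:
            if a == b or not stocked[b] <= stocked[a]:
                continue
            if all(prices[c][a] <= prices[c][b] for c in stocked[b]) and \
               shipping[a] + sum(prices[c][a] for c in stocked[b]) <= \
               shipping[b] + sum(prices[c][b] for c in stocked[b]):
                # note: two identical vendors would dominate each other;
                # a real implementation should tie-break to keep one of them
                dominated.add(b)
                break
    return {c: {v: p for v, p in offers.items() if v not in dominated}
            for c, offers in prices.items()}
```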
Unfortunately, this still leaves a huge pool.
Then I do some sorting and some brute-force-depth-first summing and some pruning and eventually end up with a result.
Anyway, I tuned it up to the point that I can do 70 cards and, within a minute, get within 5% of the optimal goal. And in an hour, less than 2%. And then, a couple of days later, the actual, final result.
I am going to read more about facility location. Thanks for that tip!
What about using a genetic algorithm? I think I'll try that one myself. You might seed the pool with both a chromosome using the lowest prices and another using the lowest shipping costs.
BTW, did you finally implement any of the solutions presented here? which one? why?
Cheers!