Find the best way to buy p products from at most x vendors - algorithm

I have to buy 100 products (or p products) from 20 vendors (or v vendors). Every vendor carries all of these products, but each sells them at a different price.
I want to find the cheapest way to get all 100 products. Assume there is no shipping cost.
There are v^p possible assignments, and I want the single one with the best price.
The problem would seem easy if there were no additional requirement: the number of vendors in the order must be LIMITED to x (because of delivery time, or other reasons).
So the problem is: find the best way to buy p products from at most x vendors (there are v vendors, x <= v).
I can generate every combination of vendors (there are C(v, x) combinations) and compare the total prices, but there are far too many combinations (with 20 vendors and x = 10 there are around 185k combinations).
I am stuck at this idea.
Has anyone had the same problem? Please help me. Thank you very much.

This problem is closely related to the non-metric k-median / facility location problem (cities = products, warehouses = vendors), which is NP-hard.
I would try mixed integer programming. Here's one formulation.
minimize sum over i, j of c(i, j) y(i, j) # total cost of all of the orders
subject to
for all i: sum over j of y(i, j) = 1 # buy each product exactly once
for all i, j: y(i, j) <= z(j) # buy products only from chosen vendors
sum over j of z(j) <= x # choose at most x vendors
for all i, j: 0 <= y(i, j) <= 1
for all j: z(j) in {0, 1}
The interpretation of the variables is that i is a product, j is a vendor, c(i, j) is the cost of product i from vendor j, y(i, j) is 1 if we buy product i from vendor j and 0 otherwise, and z(j) is 1 if we buy from vendor j at all and 0 otherwise.
There are many free mixed integer program solvers available.
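For illustration only, here is a minimal sketch of this formulation using PuLP (one such free solver); the cost matrix, function name and solver options are assumptions, not part of the original answer.

# Illustrative sketch of the MIP above using PuLP (pip install pulp); names are hypothetical.
import pulp

def cheapest_order(costs, x):
    """costs[i][j] = price of product i at vendor j; x = max number of vendors."""
    n_products, n_vendors = len(costs), len(costs[0])
    prob = pulp.LpProblem("limited_vendor_purchase", pulp.LpMinimize)
    # y[i][j] in [0, 1]: fraction of product i bought from vendor j (will be 0/1 at optimum)
    y = [[pulp.LpVariable(f"y_{i}_{j}", 0, 1) for j in range(n_vendors)]
         for i in range(n_products)]
    # z[j] = 1 if vendor j is used at all
    z = [pulp.LpVariable(f"z_{j}", cat="Binary") for j in range(n_vendors)]
    # objective: total cost of the order
    prob += pulp.lpSum(costs[i][j] * y[i][j]
                       for i in range(n_products) for j in range(n_vendors))
    for i in range(n_products):
        prob += pulp.lpSum(y[i][j] for j in range(n_vendors)) == 1   # buy each product once
        for j in range(n_vendors):
            prob += y[i][j] <= z[j]                                  # only from chosen vendors
    prob += pulp.lpSum(z) <= x                                       # at most x vendors
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return pulp.value(prob.objective)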

Not correct, as shown by @Per: the problem lacks optimal substructure.
My assumptions are as follows: from the master table you create a sub-list that keeps only x vendor columns, and the "best price" is the sum of the cheapest prices available in those columns.
Use a dynamic programming approach.
Define two functions, Picking(i, k) and NotPicking(i, k).
They mean the best total achievable when vendors 1..i have been considered with at most k vendors allowed, where Picking means vendor i is chosen and NotPicking means it is not.
Picking(1, _) = Sum(all prices from vendor 1)
NotPicking(1, _) = INF
Picking(_, 0) = INF
NotPicking(_, 0) = INF
Picking(i, k) = Min(Picking(i-1, k-1), NotPicking(i-1, k-1)) - D   (D is the saving you get from having this vendor available)
NotPicking(i, k) = Min(Picking(i-1, k), NotPicking(i-1, k))
You solve it for i from 1 to V and k from 1 to X.
The difference D is calculated by maintaining, for each Picking state, the whole per-product price list and computing how much vendor i improves it.

How about using a greedy approach? Since you have a limitation on the vendors (you must buy from at most x of the v vendors), you could aim to pick at least one product from each of the x chosen vendors. Here's an example solution:
For each vendor in v, sort its products by price; you then have v sets of sorted prices. Now take the minimum of each set and sort those again, producing a new set of v products containing only the cheapest ones.
Now, if p <= v, pick the first p items and you are done; otherwise pick all v items and repeat the same logic until you reach p.

I haven't fully worked this out or verified it, but I think it might work. Try this:
Add two more columns, called "Highest Price" and "Lowest Price", to the table and fill them in: they should hold the highest and lowest price for each product amongst all vendors.
Also add another column, called "Range", which should hold (highest price - lowest price).
Now do this 100 (p) times:
Pick the row with the highest range. Buy the product at the lowest price on
that row. Once bought, mark that cell as 'bought' (maybe set it to null).
Recalculate the lowest price and range for that row (ignoring cells marked as 'bought').

EDIT: the Hungarian algorithm is not the answer to your question, unless you did not want to put a limit on the vendors.
The algorithm you are looking for is the Hungarian algorithm.
There are many available implementations of it on the web.

Related

Understanding the concise DP solution for best time to buy and sell stocks IV

The problem is the famous LeetCode problem (it also appears in other contexts): Best Time to Buy and Sell Stock with at most k transactions.
I am trying to make sense of this DP solution. You can ignore the first part, the greedy shortcut for large k; it is the dp part that I don't understand why it works.
class Solution(object):
    def maxProfit(self, k, prices):
        """
        :type k: int
        :type prices: List[int]
        :rtype: int
        """
        # for large k, a greedy approach suffices (you can ignore this part)
        if k >= len(prices) / 2:
            profit = 0
            for i in range(1, len(prices)):
                profit += max(0, prices[i] - prices[i-1])
            return profit
        # Don't understand this part
        dp = [[0]*2 for i in range(k+1)]
        for i in reversed(range(len(prices))):
            for j in range(1, k+1):
                dp[j][True] = max(dp[j][True], -prices[i]+dp[j][False])
                dp[j][False] = max(dp[j][False], prices[i]+dp[j-1][True])
        return dp[k][True]
I was able to derive a similar solution, but it uses two rows (dp and dp2) instead of just one row (dp in the solution above). To me it looks like the solution is overwriting values on itself, which doesn't look right for this recurrence. However, the answer works and passes LeetCode.
Let's put it into words:
for i in reversed(range(len(prices))):
For each price, working backwards in time, so that all later prices have already been considered:
for j in range (1, k+1):
For each possibility of this price being part of one of the k two-price transactions:
dp[j][True] = max(dp[j][True], -prices[i]+dp[j][False])
If we consider this might be a purchase -- since we are going backwards in time, a purchase means a completed transaction -- we choose the best of (1) having considered the jth purchase already (dp[j][True]) or (2) subtract this price as a purchase and add the best result we have already that includes the jth sale (-prices[i] + dp[j][False]).
dp[j][False] = max(dp[j][False], prices[i]+dp[j-1][True])
Otherwise, we might consider this as a sale (the first half of a transaction since we're going backwards), so we choose the best of (1) the jth sale already considered (dp[j][False]), or (2) we add this price as a sale and add to that the best result we have so far for the first (j - 1) completed transactions (prices[i] + dp[j-1][True]).
Note that the first dp[j][False] is referring to the jth "half-transaction," the sale if you will, since we are going backwards in time, that would have been set in an earlier iteration on a later price. We then can possibly overwrite it with our consideration of this price as a jth sale.

Best algorithm for netting orders

I am building a marketplace, and I want to build a matching mechanism for market participants orders.
For instance I receive these orders:
A buys 50
B buys 100
C sells 50
D sells 20
These can be represented as a List<Order>, where Order is a class with Participant, BuySell, and Amount.
I want to create a Match function, that outputs 2 things:
A set of unmatched orders (List<Order>)
A set of matched orders (List<MatchedOrder>, where MatchedOrder has Buyer, Seller, and Amount)
The constraint is to minimize the number of orders (unmatched and matched) while leaving no possible match undone (i.e., in the end only buy orders or only sell orders may remain unmatched).
So in the example above the result would be:
A buys 50 from C
B buys 20 from D
B buys 80 (outstanding)
This seems like a fairly complex algorithm to write but very common in practice. Any pointers for where to look at?
You can model this as a flow problem in a bipartite graph: every selling order becomes a node on the left and every buying order a node on the right, with a source feeding the sell nodes and a sink fed by the buy nodes.
Then you must find the maximum amount of flow you can push from source to sink.
You can use any maximum-flow algorithm you want, e.g. Ford-Fulkerson. To minimize the number of orders, you can use a max-flow/min-cost algorithm. There are a number of techniques for that, including applying cycle canceling after finding a normal max-flow solution.
After running the algorithm, the residual network tells you which sell orders were matched to which buy orders and for what amounts.
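For illustration only, here is a minimal sketch of that flow model with the networkx library, using the order names and quantities from the question. Note that a plain max-flow matches all matchable quantity but does not by itself minimize the number of matched pairs; that needs the min-cost refinements mentioned above.

# Illustrative sketch of the bipartite flow model (uses networkx).
import networkx as nx

sells = {"C": 50, "D": 20}
buys = {"A": 50, "B": 100}

G = nx.DiGraph()
for s, qty in sells.items():
    G.add_edge("source", s, capacity=qty)   # a seller can sell at most qty
for b, qty in buys.items():
    G.add_edge(b, "sink", capacity=qty)     # a buyer can buy at most qty
for s in sells:
    for b in buys:
        G.add_edge(s, b)                    # seller -> buyer edges, unlimited capacity

flow_value, flow = nx.maximum_flow(G, "source", "sink")
matched = [(b, s, f) for s, targets in flow.items() if s in sells
           for b, f in targets.items() if f > 0]
print(flow_value)   # 70: all sell quantity gets matched
print(matched)      # e.g. [('A', 'C', 50), ('B', 'D', 20)] -- the exact split may vary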
Create a WithRemainingQuantity structure with two members: a pointer o to an order and an integer storing the unmatched quantity.
Consider two List<WithRemainingQuantity>, one for buys (Bq) and one for sells (Sq), both sorted by descending quantity of the contained order.
The algorithm matches the heads of the two queues until one of them is empty.
Algorithm (a mix of pseudocode and C++):
struct WithRemainingQuantity
{
    Order * o;
    int remainingQty; // initialised with o->getQty()
};
struct MatchedOrder
{
    Order * orderBuy;
    Order * orderSell;
    int matchedQty = 0;
};
List<WithRemainingQuantity> Bq;
List<WithRemainingQuantity> Sq;
/*
   populate Bq and Sq and sort by quantities descending,
   this is what guarantees the minimum of matched.
*/
List<MatchedOrder> l;
while (!Bq.empty() && !Sq.empty())
{
    int matchedQty = std::min(Bq.front().remainingQty, Sq.front().remainingQty);
    l.push_back(MatchedOrder(orderBuy=Bq.front().o, orderSell=Sq.front().o, matchedQty=matchedQty));
    Bq.front().remainingQty -= matchedQty;
    Sq.front().remainingQty -= matchedQty;
    if (Bq.front().remainingQty == 0)
        Bq.pop_front();
    if (Sq.front().remainingQty == 0)
        Sq.pop_front();
}
The unmatched orders are the orders remaining in Bq or Sq (one of them is necessarily empty, per the while condition).

Algorithm possible amounts (over)paid for a specific price, based on denominations

In a current project, people can order goods delivered to their door and choose 'pay on delivery' as a payment option. To make sure the delivery guy has enough change, customers are asked to input the amount they will pay (e.g. delivery costs 48.13, they will pay with 60.00, i.e. 3 × 20). Now, if it were up to me I'd make it a free-form field, but apparently higher-ups have decided it should be a selection based on the available denominations, without offering amounts whose set of denominations could be made smaller.
Example:
denominations = [1,2,5,10,20,50]
price = 78.12
possibilities:
79 (multitude of options),
80 (e.g. 4*20)
90 (e.g. 50+2*20)
100 (2*50)
It's international, so the denominations could change, and the algorithm should be based on that list.
The closest I have come to something that seems to work is this:
for all denominations in reversed order (large=>small)
add ceil(price/denomination) * denomination to possibles
baseprice = floor(price/denomination) * denomination;
for all smaller denominations as subdenomination in reversed order
add baseprice + (ceil((price - baseprice) / subdenomination) * subdenomination) to possibles
end for
end for
remove doubles
sort
It seems to work, but it emerged after wildly trying all kinds of compact algorithms and I cannot defend why it works, which could mean some edge case or a new country gets wrong options; it also generates a serious number of duplicates.
As this is probably not a new problem, and Google et al. could not provide me with an answer save for loads of pages calculating how to make exact change, I thought I'd ask SO: have you solved this problem before? Which algorithm? Any proof it will always work?
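For reference, here is a direct Python transcription of the pseudocode above (an illustrative sketch only, using floats for simplicity); on the example it yields [79, 80, 90, 100]:

# Transcription of the pseudocode above (illustrative; function name is made up).
import math

def overpay_options(price, denominations):
    possibles = set()
    denoms = sorted(denominations, reverse=True)          # large => small
    for i, d in enumerate(denoms):
        possibles.add(math.ceil(price / d) * d)
        baseprice = math.floor(price / d) * d
        for sub in denoms[i + 1:]:                        # all smaller denominations
            possibles.add(baseprice + math.ceil((price - baseprice) / sub) * sub)
    return sorted(possibles)                              # deduplicated and sorted

print(overpay_options(78.12, [1, 2, 5, 10, 20, 50]))      # [79, 80, 90, 100]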
It's an application of the greedy algorithm http://mathworld.wolfram.com/GreedyAlgorithm.html (an algorithm used to recursively construct a set of objects from the smallest possible constituent parts).
Pseudocode
list={1,2,5,10,20,50,100} (*ordered *)
while list not null
found_answer = false
p = ceil(price) (* assume integer denominations *)
while not found_answer
find_greedy (p, list) (*algorithm in the reference above*)
p++
remove(first(list))
EDIT: some of the iterations above are nonsense; corrected version:
list={1,2,5,10,20,50,100} (*ordered *)
p = ceil(price) (* assume integer denominations *)
while list not null
found_answer = false
while not found_answer
find_greedy (p, list) (*algorithm in the reference above*)
p++
remove(first(list))
EDIT:
I found an improvement on the greedy algorithm due to Pearson. It's O(N^3 log Z), where N is the number of denominations and Z is the largest bill in the set.
You can find it in http://library.wolfram.com/infocenter/MathSource/5187/
You can generate, in a database, all possible combinations of coins and banknotes that could be paid, with each row containing the sum of that combination.
Having this database, you can get all possible overpaid amounts with a single query:
WHERE sum >= cost and sum <= cost + epsilon
A word about epsilon: you could derive it from the cost value, maybe 10% of the cost plus 10:
WHERE sum >= cost and sum <= cost * 1.10 + 10
The table needs one column per coin/banknote type, and the value of each column is the number of occurrences of that item in the combination.
This is neither the optimal nor the fastest solution to the problem, but it is easy and simple to implement.
I am still thinking about a better solution.
Alternatively, you can loop from cost to cost + epsilon and, for each value, compute the smallest possible number of items needed to pay it exactly. I have an algorithm for that, but it is in C++:
// C[0..coins-1] : the available denominations; C[j].weight is the denomination's
//                 amount, C[j].value is its cost (e.g. 1, to count items)
// coins_weight  : the target amount to pay
// R[i]          : minimum total "value" needed to pay exactly i
int R[10000];
sort(C, C + coins, cmp);
R[0] = 0;
for (int i = 1; i <= coins_weight; i++)
{
    R[i] = 1000000;   // "infinity": amount i not reachable yet
    for (int j = 0; j < coins; j++)
    {
        if ((C[j].weight <= i) && ((C[j].value + R[i - C[j].weight]) < R[i]))
        {
            R[i] = C[j].value + R[i - C[j].weight];
        }
    }
}
return R[coins_weight];

What is a better way to sort by a 5 star rating?

I'm trying to sort a bunch of products by customer ratings using a 5-star system. The site I'm setting this up for does not have a lot of ratings and continues to add new products, so it will usually have a few products with a low number of ratings.
I tried using average star rating but that algorithm fails when there is a small number of ratings.
For example, a product that has 3x 5-star ratings would show up higher than a product that has 100x 5-star ratings and 2x 2-star ratings.
Shouldn't the second product show up higher because it is statistically more trustworthy because of the larger number of ratings?
Prior to 2015, the Internet Movie Database (IMDb) publicly listed the formula used to rank their Top 250 movies list. To quote:
The formula for calculating the Top Rated 250 Titles gives a true Bayesian estimate:
weighted rating (WR) = (v ÷ (v+m)) × R + (m ÷ (v+m)) × C
where:
R = average for the movie (mean)
v = number of votes for the movie
m = minimum votes required to be listed in the Top 250 (currently 25000)
C = the mean vote across the whole report (currently 7.0)
For the Top 250, only votes from regular voters are considered.
It's not so hard to understand. The formula is:
rating = (v / (v + m)) * R +
(m / (v + m)) * C;
Which can be mathematically simplified to:
rating = (R * v + C * m) / (v + m);
The variables are:
R – The item's own rating. R is the average of the item's votes. (For example, if an item has no votes, its R is 0. If someone gives it 5 stars, R becomes 5. If someone else gives it 1 star, R becomes 3, the average of [1, 5]. And so on.)
C – The average item's rating. Find the R of every single item in the database, including the current one, and take the average of them; that is C. (Suppose there are 4 items in the database, and their ratings are [2, 3, 5, 5]. C is 3.75, the average of those numbers.)
v – The number of votes for an item. (To give another example, if 5 people have cast votes on an item, v is 5.)
m – The tuneable parameter. The amount of "smoothing" applied to the rating is based on the number of votes (v) in relation to m. Adjust m until the results satisfy you. And don't misinterpret IMDb's description of m as "minimum votes required to be listed" – this system is perfectly capable of ranking items with less votes than m.
All the formula does is: add m imaginary votes, each with a value of C, before calculating the average. In the beginning, when there isn't enough data (i.e. the number of votes is dramatically less than m), this causes the blanks to be filled in with average data. However, as votes accumulate, eventually the imaginary votes will be drowned out by real ones.
In this system, votes don't cause the rating to fluctuate wildly. Instead, they merely perturb it a bit in some direction.
When there are zero votes, only imaginary votes exist, and all of them are C. Thus, each item begins with a rating of C.
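For concreteness, a small Python helper implementing the simplified formula above; the parameter names mirror the formula and the example numbers are made up.

def bayesian_rating(R, v, C, m):
    """IMDb-style weighted rating: R = item mean, v = item votes,
    C = mean across all items, m = tuneable vote threshold."""
    return (R * v + C * m) / (v + m)

# With no votes the result is C; with many votes it approaches R.
print(bayesian_rating(R=5.0, v=3, C=3.75, m=25))     # ~3.88, pulled toward C
print(bayesian_rating(R=4.9, v=1000, C=3.75, m=25))  # ~4.87, close to R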
See also:
A demo. Click "Solve".
Another explanation of IMDb's system.
An explanation of a similar Bayesian star-rating system.
Evan Miller shows a Bayesian approach to ranking 5-star ratings. His sort criterion (the same quantity the Python code below computes) is

sum_k s_k*(n_k + 1) / (N + K)  -  z_alpha/2 * sqrt( [ sum_k s_k^2*(n_k + 1) / (N + K)  -  ( sum_k s_k*(n_k + 1) / (N + K) )^2 ] / (N + K + 1) )

where
nk is the number of k-star ratings,
sk is the "worth" (in points) of k stars,
N is the total number of votes
K is the maximum number of stars (e.g. K=5, in a 5-star rating system)
z_alpha/2 is the 1 - alpha/2 quantile of a normal distribution. If you want 95% confidence (based on the Bayesian posterior distribution) that the actual sort criterion is at least as big as the computed sort criterion, choose z_alpha/2 = 1.65.
In Python, the sorting criterion can be calculated with
import math

def starsort(ns):
    """
    http://www.evanmiller.org/ranking-items-with-star-ratings.html
    """
    N = sum(ns)
    K = len(ns)
    s = list(range(K, 0, -1))
    s2 = [sk**2 for sk in s]
    z = 1.65
    def f(s, ns):
        N = sum(ns)
        K = len(ns)
        return sum(sk*(nk+1) for sk, nk in zip(s, ns)) / (N+K)
    fsns = f(s, ns)
    return fsns - z*math.sqrt((f(s2, ns) - fsns**2)/(N+K+1))
For example, if an item has 60 five-stars, 80 four-stars, 75 three-stars, 20 two-stars and 25 one-stars, then its overall star rating would be about 3.4:
x = (60, 80, 75, 20, 25)
starsort(x)
# 3.3686975120774694
and you can sort a list of 5-star ratings with
sorted([(60, 80, 75, 20, 25), (10,0,0,0,0), (5,0,0,0,0)], key=starsort, reverse=True)
# [(10, 0, 0, 0, 0), (60, 80, 75, 20, 25), (5, 0, 0, 0, 0)]
This shows the effect that more ratings can have upon the overall star value.
You'll find that this formula tends to give an overall rating which is a bit lower than the overall rating reported by sites such as Amazon, Ebay or Wal-mart, particularly when there are few votes (say, less than 300). This reflects the higher uncertainty that comes with fewer votes. As the number of votes increases (into the thousands), all of these rating formulas tend towards the (weighted) average rating.
Since the formula only depends on the frequency distribution of 5-star ratings for the item itself, it is easy to combine reviews from multiple sources (or to update the overall rating in light of new votes) by simply adding the frequency distributions together.
Unlike the IMDb formula, this formula does not depend on the average score across all items, nor on an artificial minimum-number-of-votes cutoff value.
Moreover, this formula makes use of the full frequency distribution, not just the average number of stars and the number of votes. And it makes sense that it should, since an item with ten 5-stars and ten 1-stars should be treated as having more uncertainty than (and therefore not rated as highly as) an item with twenty 3-star ratings:
In [78]: starsort((10,0,0,0,10))
Out[78]: 2.386028063783418
In [79]: starsort((0,0,20,0,0))
Out[79]: 2.795342687927806
The IMDb formula does not take this into account.
See this page for a good analysis of star-based rating systems, and this one for a good analysis of upvote-/downvote- based systems.
For up and down voting you want to estimate the probability that, given the ratings you have, the "real" score (if you had infinite ratings) is greater than some quantity (like, say, the similar number for some other item you're sorting against).
See the second article for the answer, but the conclusion is that you want to use the lower bound of the Wilson score confidence interval. The article gives the equation and sample Ruby code (easily translated to another language).
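For reference, here is a small Python sketch of that Wilson score lower bound (the standard formula; the example numbers are made up):

import math

def wilson_lower_bound(upvotes, downvotes, z=1.96):
    """Lower bound of the Wilson score interval (z=1.96 for ~95% confidence)."""
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    phat = upvotes / n
    return ((phat + z*z/(2*n)
             - z * math.sqrt((phat*(1 - phat) + z*z/(4*n)) / n))
            / (1 + z*z/n))

print(wilson_lower_bound(100, 2))   # ~0.93: many votes, mostly positive
print(wilson_lower_bound(3, 0))     # ~0.44: few votes, so low confidence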
Well, depending on how complex you want to make it, you could have ratings additionally be weighted based on how many ratings the person has made, and what those ratings are. If the person has only made one rating, it could be a shill rating, and might count for less. Or if the person has rated many things in category a, but few in category b, and has an average rating of 1.3 out of 5 stars, it sounds like category a may be artificially weighed down by the low average score of this user, and should be adjusted.
But enough of making it complex. Let’s make it simple.
Assuming we're working with just two values, ReviewCount and AverageRating, for a particular item, it would make sense to me to look at ReviewCount as essentially being the "reliability" value. But we don't just want to bring scores down for low-ReviewCount items: a single one-star rating is probably as unreliable as a single 5-star rating. So what we want to do is probably average towards the middle: 3.
So, basically, I’m thinking of an equation something like X * AverageRating + Y * 3 = the-rating-we-want. In order to make this value come out right we need X+Y to equal 1. Also we need X to increase in value as ReviewCount increases...with a review count of 0, x should be 0 (giving us an equation of “3”), and with an infinite review count X should be 1 (which makes the equation = AverageRating).
So what are the X and Y equations? For the X equation, we want the dependent variable to asymptotically approach 1 as the independent variable approaches infinity. A good pair of equations is something like:
Y = 1/(factor^RatingCount)
and (utilizing the fact that X must be equal to 1-Y)
X = 1 - (1/(factor^RatingCount))
Then we can adjust "factor" to fit the range that we're looking for.
I used this simple C# program to try a few factors:
// We can adjust this factor to adjust our curve.
double factor = 1.5;
// Here's some sample data
double RatingAverage1 = 5;
double RatingCount1 = 1;
double RatingAverage2 = 4.5;
double RatingCount2 = 5;
double RatingAverage3 = 3.5;
double RatingCount3 = 50000; // 50000 is not infinite, but it's probably plenty to closely simulate it.
// Do the calculations
double modfactor = Math.Pow(factor, RatingCount1);
double modRating1 = (3 / modfactor)
+ (RatingAverage1 * (1 - 1 / modfactor));
double modfactor2 = Math.Pow(factor, RatingCount2);
double modRating2 = (3 / modfactor2)
+ (RatingAverage2 * (1 - 1 / modfactor2));
double modfactor3 = Math.Pow(factor, RatingCount3);
double modRating3 = (3 / modfactor3)
+ (RatingAverage3 * (1 - 1 / modfactor3));
Console.WriteLine(String.Format("RatingAverage: {0}, RatingCount: {1}, Adjusted Rating: {2:0.00}",
RatingAverage1, RatingCount1, modRating1));
Console.WriteLine(String.Format("RatingAverage: {0}, RatingCount: {1}, Adjusted Rating: {2:0.00}",
RatingAverage2, RatingCount2, modRating2));
Console.WriteLine(String.Format("RatingAverage: {0}, RatingCount: {1}, Adjusted Rating: {2:0.00}",
RatingAverage3, RatingCount3, modRating3));
// Hold up for the user to read the data.
Console.ReadLine();
So you don’t bother copying it in, it gives this output:
RatingAverage: 5, RatingCount: 1, Adjusted Rating: 3.67
RatingAverage: 4.5, RatingCount: 5, Adjusted Rating: 4.30
RatingAverage: 3.5, RatingCount: 50000, Adjusted Rating: 3.50
Something like that? You could obviously adjust the "factor" value as needed to get the kind of weighting you want.
You could sort by median instead of arithmetic mean. In this case both examples have a median of 5, so both would have the same weight in a sorting algorithm.
You could use a mode to the same effect, but median is probably a better idea.
If you want to assign additional weight to the product with 100 5-star ratings, you'll probably want to go with some kind of weighted mode, assigning more weight to ratings with the same median, but with more overall votes.
If you just need a fast and cheap solution that will mostly work without using a lot of computation, here's one option (assuming a 1-5 rating scale):
SELECT Products.id, Products.title, avg(Ratings.score), etc
FROM
Products INNER JOIN Ratings ON Products.id=Ratings.product_id
GROUP BY
Products.id, Products.title
ORDER BY (SUM(Ratings.score)+25.0)/(COUNT(Ratings.id)+20.0) DESC, COUNT(Ratings.id) DESC
By adding 25 to the sum of scores and 20 to the count of ratings, you're effectively mixing in 20 phantom ratings averaging 1.25 before sorting.
This does have known issues. For example, it unfairly rewards low-scoring products with few ratings (as this graph demonstrates, products with an average score of 1 and just one rating score a 1.2 while products with an average score of 1 and 1k+ ratings score closer to 1.05). You could also argue it unfairly punishes high-quality products with few ratings.
This chart shows what happens for all 5 ratings over 1-1000 ratings:
http://www.wolframalpha.com/input/?i=Plot3D%5B%2825%2Bxy%29/%2820%2Bx%29%2C%7Bx%2C1%2C1000%7D%2C%7By%2C0%2C6%7D%5D
You can see the dip upwards at the very bottom ratings, but overall it's a fair ranking, I think. You can also look at it this way:
http://www.wolframalpha.com/input/?i=Plot3D%5B6-%28%2825%2Bxy%29/%2820%2Bx%29%29%2C%7Bx%2C1%2C1000%7D%2C%7By%2C0%2C6%7D%5D
If you drop a marble on most places in this graph, it will automatically roll towards products with both higher scores and higher ratings.
Obviously, the low number of ratings puts this problem at a statistical handicap. Nevertheless...
A key element to improving the quality of an aggregate rating is to "rate the rater", i.e. to keep tabs of the ratings each particular "rater" has supplied (relative to others). This allows weighing their votes during the aggregation process.
Another solution, more of a cop-out, is to supply the end users with a count (or a range indication thereof) of votes for the underlying item.
One option is something like Microsoft's TrueSkill system, where the score is given by mean - 3*stddev, where the constants can be tweaked.
After looking around for a while, I chose the Bayesian system.
If someone is using Ruby, here is a gem for it:
https://github.com/wbotelhos/rating
I'd highly recommend the book Programming Collective Intelligence by Toby Segaran (O'Reilly), ISBN 978-0-596-52932-1, which discusses how to extract meaningful data from crowd behaviour. The examples are in Python, but they are easy enough to convert.

What is the algorithm to determine the best way to distribute these coupons?

Here is my problem. Imagine I am buying 3 different items, and I have up to 5 coupons. The coupons are interchangeable, but worth different amounts when used on different items.
Here is the matrix which gives the result of spending different numbers of coupons on different items:
coupons:     1        2        3        4        5
item 1    $10 off  $15 off
item 2             $5 off   $15 off  $25 off  $35 off
item 3    $2 off
I have manually worked out the best actions for this example:
If I have 1 coupon, item 1 gets it for $10 off
If I have 2 coupons, item 1 gets them for $15 off
If I have 3 coupons, item 1 gets 2, and item 3 gets 1, for $17 off
If I have 4 coupons, then either:
Item 1 gets 1 and item 2 gets 3 for a total of $25 off, or
Item 2 gets all 4 for $25 off.
If I have 5 coupons, then item 2 gets all 5 for $35 off.
However, I need to develop a general algorithm which will handle different matrices and any number of items and coupons.
I suspect I will need to iterate through every possible combination to find the best price for n coupons. Does anyone here have any ideas?
This seems like a good candidate for dynamic programming:
// int[,] discountTable = new int[NumItems, NumCoupons+1];
// bestDiscount[i, c] means the best discount if you can spend c coupons on items 0..i
int[,] bestDiscount = new int[NumItems, NumCoupons+1];
// with a single item, the best discount is whatever spending c coupons on it gives
for (int c=1; c<=NumCoupons; c++)
    bestDiscount[0, c] = discountTable[0, c];
// the best discount for [i, c] is spending x coupons on items 0..i-1, and c-x coupons on item i
for (int i=1; i<NumItems; i++)
    for (int c=1; c<=NumCoupons; c++)
        for (int x=0; x<=c; x++)
            bestDiscount[i, c] = Math.Max(bestDiscount[i, c], bestDiscount[i-1, x] + discountTable[i, c-x]);
At the end of this, the best discount will be the highest value of bestDiscount[NumItems-1, x]. To rebuild the choices, follow the table backwards:
edit to add algorithm:
//int couponsLeft;
for (int i=NumItems-1; i>=0; i--)
{
int bestSpend = 0;
for (int c=1; c<=couponsLeft; c++)
if (bestDiscount[i, couponsLeft - c] > bestDiscount[i, couponsLeft - bestSpend])
bestSpend = c;
if (i == NumItems - 1)
Console.WriteLine("Had {0} coupons left over", bestSpend);
else
Console.WriteLine("Spent {0} coupons on item {1}", bestSpend, i+1);
couponsLeft -= bestSpend;
}
Console.WriteLine("Spent {0} coupons on item 0", couponsLeft);
Storing the graph in your data structure is also a good way, but that was the way I had thought of.
It's the Knapsack problem, or rather a variation of it. Doing some research on algorithms related to this problem will point you in the best direction.
I think dynamic programming should do this. Basically, you keep track of an array A[n, c] whose values mean the optimal discount while buying the first n items having spent c coupons. The values of A[n, 0] should be 0 for all n, so that is a good start. Also, A[0, c] is 0 for all c.
When you evaluate A[n,c], you loop over all discount offers for item n, and add the discount for that particular offer to A[n-1,c-p] where p is the price in coupons for this particular discount. A[n-1, c-p] must of course be calculated (in the same way) prior to this. Keep the best combination and store in the array.
A recursive implementation would probably be the cleanest. In that case, you should find the answer in A[N, C], where N is the total number of items and C is the total number of available coupons.
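As a small illustrative sketch (not part of this answer), here is the table A[n, c] in Python, using the coupon matrix from the question (reading item 2's discounts as starting at two coupons, as the worked examples imply):

# Illustrative DP sketch; offers[i] maps a coupon count to the dollars saved on item i.
def best_discount(offers, total_coupons):
    n = len(offers)
    # A[i][c] = best discount using the first i items and at most c coupons
    A = [[0] * (total_coupons + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for c in range(total_coupons + 1):
            A[i][c] = A[i - 1][c]                      # spend nothing on item i
            for spent, saved in offers[i - 1].items():
                if spent <= c:
                    A[i][c] = max(A[i][c], A[i - 1][c - spent] + saved)
    return A[n][total_coupons]

offers = [{1: 10, 2: 15},               # item 1
          {2: 5, 3: 15, 4: 25, 5: 35},  # item 2
          {1: 2}]                       # item 3
print([best_discount(offers, k) for k in range(1, 6)])  # [10, 15, 17, 25, 35]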
This can be written as a linear programming problem. For most 'typical' problems, the simplex method is a fast, relatively simple way to solve such problems, or there are open source LP solvers available.
For your example:
Let 0 <= xi <= 1
x1 = One if one coupon is spent on item 1, zero otherwise
x2 = One if two coupons are spent on item 1, zero otherwise
x3 = One if one coupon is spent on item 2, zero otherwise
x4 = One if two coupons are spent on item 2, zero otherwise
x5 = One if three coupons are spent on item 2, zero otherwise
...
Assume that if I spend two coupons on item 1, then both x1 and x2 are one. This implies the constraint
x1 >= x2
With similar constraints for the other items, e.g.,
x3 >= x4
x4 >= x5
The amount saved is
Saved = 10 x1 + 5 x2 + 0 x3 + 5 x4 + 10 x5 + ...
If you want to find the most money saved with a fixed number of coupons, then you want to maximize Saved subject to the constraints above and the additional constraint:
coupon count = x1 + x2 + x3 + ...
This works for any matrix and number of items. Changing notation (and feeling sad that I can't do subscripts), let 0 <= y_ij <= 1 be one if j coupons are spent on item number i. Then we have the constraints
y_i(j-1) >= y_ij
If the amount saved from spending j coupons on item i is M_ij, where we define M_i0 = 0, then maximize
Saved = Sum_ij (M_ij - M_i(j-1)) y_ij
subject to the above constraints and
coupon count = Sum_ij y_ij
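As an illustrative sketch of this formulation (not part of the original answer), here is a version using binary y_ij variables and the open-source PuLP solver, with the matrix from the question; using <= on the coupon count instead of strict equality is my own relaxation so that unused coupons are allowed.

# Sketch of the y_ij formulation with PuLP (pip install pulp); illustrative only.
import pulp

# M[i][j-1] = M_ij: total dollars saved when j coupons are spent on item i
M = [[10, 15],              # item 1
     [0, 5, 15, 25, 35],    # item 2
     [2]]                   # item 3
total_coupons = 4

prob = pulp.LpProblem("coupons", pulp.LpMaximize)
y = {(i, j): pulp.LpVariable(f"y_{i}_{j}", cat="Binary")
     for i, row in enumerate(M) for j in range(1, len(row) + 1)}

# objective: sum of the marginal savings (M_ij - M_i(j-1)) for each coupon assigned
prob += pulp.lpSum((M[i][j - 1] - (M[i][j - 2] if j > 1 else 0)) * y[i, j]
                   for (i, j) in y)
# ordering: the j-th coupon on an item can only be spent if the (j-1)-th is
for (i, j) in y:
    if j > 1:
        prob += y[i, j - 1] >= y[i, j]
# at most total_coupons coupons in all
prob += pulp.lpSum(y.values()) <= total_coupons

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.value(prob.objective))   # 25.0 for 4 coupons, matching the worked example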
I suspect some kind of sorted list for memoizing each coupon-count can help here.
For example, if you have 4 coupons, the optimum is one of:
using all 4 on one item (you have to check all of those new prices);
using 3 and 1: either the best 3-coupon item is part of the optimal solution, or it overlaps with the top choices for 1 coupon, in which case you need to find the best combination among the few best 1-coupon and 3-coupon items;
using 2 and 2: find the top three 2-coupon items; if #1 and #2 overlap, use #1 and #3, unless they also overlap, in which case use #2 and #3.
This answer is pretty vague; I'd need to put more thought into it.
This problem is similar in concept to the Traveling Salesman problem, where checking all O(n!) orderings is the straightforward way to find the optimal solution. There are several shortcuts that can be taken, but they require lots and lots of time to discover, which I doubt you have.
Checking each possible combination is going to be the best use of your time, assuming we are dealing with small numbers of coupons. Make the client wait a bit instead of you spending years on it.
