Algorithm to pick elements of collection to achieve average price

I need to solve the following problem:
I have an array of items with a price attached and I need to pick a given quantity to achieve a given weighted average price.
eg.
I can group quantities in bins for the same price for better understanding:
price   qty
$1.00   60
$1.04   80
$0.99   55
$1.10   130
Now I need to choose how many elements I need from each bin to achieve 100 elements at an average price = 1.035
Do you know an algorithm to search for an approximate solution without brute force combination? - I am time constrained too.
It looks like some sort of knapsack problem, although I'm not sure which constraints I must use.
Thank you

This is an integer linear programming problem.
Expressing this as a linear programming problem would mean having variables a, b, c, d with a+b+c+d=100, and a*1+b*1.04+c*0.99+d*1.10=1.035*100 and 0<=a<=60, 0<=b<=80, 0<=c<=55 and 0<=d<=130.
You don't explicitly say so, but you probably want integer solutions (that is, the quantities bought must be integers), hence it's an integer linear programming problem rather than a linear programming problem. Although ILP problems are NP-hard in general, you can find solvers and algorithms online that will easily solve this small problem. For example, try GLPK.
Note also that brute force will work well here: there are at most 176851 combinations of a, b, c, d to try (the number of non-negative integer solutions of a+b+c+d=100, before the upper bounds on each bin are applied).
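To illustrate the brute-force remark, here is a minimal Python sketch. The function name closest_splits and the choice to work in integer thousandths of a dollar are my own assumptions, not part of the question. It enumerates the first three bins, derives the fourth from the total, and keeps every split whose total cost is closest to the target.

```python
from itertools import product

def closest_splits(prices, stock, total_qty, target_avg, scale=1000):
    """Return (error, splits): the smallest achievable |total cost - target|
    in 1/scale currency units, and every split that attains it."""
    # Work in integer thousandths of a dollar to avoid floating-point drift.
    units = [round(p * scale) for p in prices]
    target = round(target_avg * scale) * total_qty
    best_err, best = None, []
    # Enumerate every bin except the last; the last quantity is then forced.
    for head in product(*(range(s + 1) for s in stock[:-1])):
        last = total_qty - sum(head)
        if not 0 <= last <= stock[-1]:
            continue
        split = head + (last,)
        err = abs(sum(q * u for q, u in zip(split, units)) - target)
        if best_err is None or err < best_err:
            best_err, best = err, [split]
        elif err == best_err:
            best.append(split)
    return best_err, best

err, splits = closest_splits([1.00, 1.04, 0.99, 1.10], [60, 80, 55, 130], 100, 1.035)
```

For the bins in the question this finds exact solutions (error 0), for instance 45, 15, 10 and 30 units from the four bins in the order listed. With many more bins the enumeration grows quickly, and an ILP solver is probably the better tool.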

Related

maximize profit with n products satisfying certain constraints

I am given a list of n products with associated profits and costs per unit. The aim is to maximize the profits while keeping the total cost below some threshold. For each product either one or zero are produced.
Now suppose we have three products, labelled 1, 2 and 3. Then all possible combinations of productions can be given as the binary numbers 111, 110, 101, 011, 100, 010, 001 and 000, where a 1 in the i-th position means that one unit of product i is produced and a 0 means that none is. We could then easily check which of these combinations has a production cost under the threshold and has the maximum profit. This algorithm would be O(2^n), because for n products we have to check 2^n binary numbers. We can probably make this a little faster by recognizing that if 100 is already above the threshold we need not check 110 and 111, and similar shortcuts, but the order will not change because of this. How can I make a smarter algorithm that has a better time complexity? n can be as large as 100, in which case checking 2^100 numbers is not possible. Thanks in advance.

If your costs are integers that are not too big, you can use the dynamic programming solution for the knapsack problem, which is listed in the link mentioned in David Eisenstat's comment. If your costs are either big integers or fractional, then your best bet is using one of the existing knapsack solvers that e.g. reduce to an integer linear programming problem and then do something like branch and bound in order to solve. At any rate, your problem IS the knapsack problem, with the only slight modification that you don't have to fill the knapsack completely, you can fill it partially as long as you don't overfill it. However this variant is also studied along with the original formulation, and there are solvers for it. Also it is easy to modify the dynamic programming solution to handle this, let me know if it's unclear how and I'll update my answer with an explanation.
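As a rough sketch of the dynamic programming approach mentioned above, assuming non-negative integer costs and an integer budget (the names max_profit, profits, costs and budget are mine): because the table entry means "cost at most c", partially filled knapsacks are handled without any extra work.

```python
def max_profit(profits, costs, budget):
    """0/1 knapsack: best total profit with total cost <= budget.
    Returns (profit, sorted list of chosen product indices)."""
    n = len(profits)
    # best[i][c] = best profit using the first i products with cost at most c.
    best = [[0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        p, w = profits[i - 1], costs[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]          # skip product i-1
            if w <= c and best[i - 1][c - w] + p > best[i][c]:
                best[i][c] = best[i - 1][c - w] + p  # produce product i-1
    # Walk back through the table to recover which products were produced.
    chosen, c = [], budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= costs[i - 1]
    return best[n][budget], sorted(chosen)

print(max_profit([60, 100, 120], [10, 20, 30], 50))
```

The running time is O(n * budget), so for n = 100 this is practical as long as the budget, expressed in integer cost units, is not huge.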

Suggestions for fragment proposal algorithm

I'm currently trying to solve the following problem, but am unsure which algorithm I should be using. It's in the area of mass identification.
I have a series of "weights", *w_i*, which can sum up to a total weight. The as-measured total weight has an error associated with it, so it is inexact.
I need to find, given the total weight T, the closest k possible combinations of weights that can sum up to the total, where k is an input from the user. Each weight can be used multiple times.
Now, this sounds suspiciously like the bounded-integer multiple knapsack problem, however
it is possible to go over the weight, and
I also want all of the ranked solutions in terms of error
I can probably solve it using multiple sweeps of the knapsack problem, from weight-error->weight+error, by stepping in small enough increments, however it is possible if the increment is too large to miss certain weight combinations that could be used.
The number of weights is usually small (4 ->10 weights) and the ratio of the total weight to the mean weight is usually around 2 or 3
Does anyone know the name of an algorithm that might be suitable here?

Your problem closely resembles the knapsack problem, which is an NP-complete problem.
For a really limited number of weights, you could run over every combination with repetition and then sort the results, which gives you quite a high number of operations; at best: (n + k - 1)! / ((n - 1)! · k!) for the combinations and n·log(n) for the sorting part.
Solving this kind of problem in a reasonable amount of time is best done by evolutionary algorithms nowadays.
If you take the following example from deap, an evolutionary algorithm framework in Python:
ga_knapsack.py, you realise that by modifying lines 58-59 that automatically discards an overweight solution for something smoother (a linear relation, for instance), it will give you solutions close to the optimal one in a shorter time than brute force. Solutions are already sorted for you at the end, as you requested.
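For small inputs, the brute-force enumeration of combinations with repetition described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name closest_combinations is made up, and max_items is a cap on how many weights a combination may contain, which you would have to pick from the ratio of the total weight to the smallest weight.

```python
from heapq import nsmallest
from itertools import combinations_with_replacement

def closest_combinations(weights, total, k, max_items):
    """Return the k multisets of weights whose sum is closest to total,
    as (error, combination) pairs sorted by increasing error."""
    candidates = (
        (abs(sum(combo) - total), combo)
        for size in range(1, max_items + 1)
        for combo in combinations_with_replacement(weights, size)
    )
    # nsmallest keeps only k candidates in memory while scanning them all.
    return nsmallest(k, candidates)

ranked = closest_combinations([3, 5], total=11, k=2, max_items=4)
```

Because every combination is scored against the measured total directly, nothing is missed by stepping through the error window in increments, and going over the weight is ranked just like going under it.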

As a first attempt I'd go for constraint programming (but then I almost always do, so take the suggestion with a pinch of salt):
Given W = w_1, ..., w_i for the weights, E = e_1, ..., e_i for the errors (you can also make the error asymmetric), and T.
Find all sets S (if the weights are unique, otherwise lists) such that the sum (w_1 + e_1) + ... + (w_k + e_k), where w_1, ..., w_k are elements of W and e_1, ..., e_k are elements of E, approximately equals T within some delta which you derive from k. Or just set delta to some reasonably large value and decrease it as you are solving the constraints.
I just realised that you also want to parametrise the expression w_n op e_m over op in {+, -} (any combination of weights and error terms), and off the top of my head I don't know which constraint solver would allow you to do that. In any case, you can always fall back to Prolog. It may not fly, especially if you have a lot of weights, but it will give you solutions quickly.

Group incoming and outgoing invoices to make their sum 0

I've faced an interesting problem today, and decided to write an algorithm in C# to solve it.
There are incoming invoices with negative totals and outgoing invoices with positive totals. The task is to make groups out of these invoices, where the total of the invoices adds up to exactly 0. Each group can contain any number of members, so if there are two positive members and one negative member and their total value is 0, it's okay.
We try to minimize the sum of the remaining invoices' totals, and there are no other constraints at all.
I'm wondering if this problem could be traced back to a known problem, and if not, which would be the most effective way to do this. The naive approach would be to separate incoming and outgoing invoices into two different groups, sort by total, then to try add invoices one by one until zero is reached or the sign has changed. However, this presumes that the invoices in a group should be approximately of the same magnitude, which is not true (one huge incoming invoice could be put against 10 smaller outgoing ones)
Any ideas?

The problem you are facing is a well known and studied one, and is called The Subset Sum Problem.
Unfortunately, the problem is NP-Complete, so there is no known polynomial solution for it (1).
In fact, there is no known polynomial solution to even determine if such a subset (even a single one) exists, let alone find it.
However, if your input consists of relatively small (absolute value) integers, there is a pretty efficient (pseudo polynomial) dynamic programming solution that can be utilized to solve the problem.
If this is not the case some other alternatives are:
Using an exponential solution such as brute force (you might be able to optimize it using the branch and bound technique)
Heuristic solutions, such as Steepest Ascent Hill Climbing or Genetic Algorithms.
Approximation algorithms
(1) And most computer science researchers believe one does not exist, this is basically the P VS NP Problem.
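A minimal sketch of the pseudo-polynomial dynamic programming idea, assuming the invoice totals are integers (for example amounts in cents). The function name zero_sum_group is hypothetical. It only finds one group that sums to zero; to build several groups you would remove the group found and run it again, which is a greedy heuristic and not guaranteed to minimize the leftover total.

```python
def zero_sum_group(totals):
    """Return indices of one non-empty group of invoices summing to zero,
    or None if no such group exists. Totals must be integers."""
    # reachable[s] = indices of one non-empty subset whose totals sum to s.
    reachable = {}
    for i, t in enumerate(totals):
        # Sums that become reachable by adding invoice i to earlier subsets.
        new = {t: [i]}
        for s, idx in reachable.items():
            new.setdefault(s + t, idx + [i])
        for s, idx in new.items():
            reachable.setdefault(s, idx)
        if 0 in reachable:
            return reachable[0]
    return None

group = zero_sum_group([-700, 300, 250, 400, 120])
```

The dictionary never holds more entries than there are distinct reachable sums, which is what makes this efficient when the totals are small integers and hopeless when they are large or fractional.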

LP modelling question... long time since school

Sure, this isn't a programming question, per se... but I couldn't think of a better place to ask it all the same.
I'm writing an application that ultimately will assist a shopper in determining how to achieve the greatest savings on a specific site. The site provides two prices for just about every product - a regular price and a discounted price. The discounted price is available to anyone, but only one discounted item can be added to any given order. With just that information, the incentive is to minimize your order size and instead place multiple orders. On the other hand, the total shipping costs are determined by the order size (by weight) and so the incentive there is to maximize the order size and place just one order.
I'm looking for a model to determine the most efficient way to balance the orders given the available discount for one item and the weight's influencing the shipping costs for the order(s).
I'm remembering back to school enough that I think this is a linear programming problem... but all I can remember about that class was how confusing it was.
Anyone have any tips on how to go about the math for this program?

This isn't regular linear programming, this is integer linear programming. The former is solvable in polynomial time, the latter is NP-hard.
Some variant of the branch-and-bound algorithm should be applicable to your problem. If you don't feel like implementing it yourself, available libraries include GLPK, COIN-OR and CPLEX.

Expanding on my comment above, this problem depends heavily on the precise structure of the shipping costs. Suppose that the shipping costs are linear with a (potentially) non-zero constant term. Namely, shipping cost = C + Rw, where C and R are constants and w is the weight of an order. Then, it turns out that the optimal solution is simple: group every item where the discount is less than C into one order and order each item where the discount is greater than C separately (left as an exercise for the reader). In the degenerate case where C = 0, you would simply place a separate order for each item.
On the other hand, if the shipping cost has more of a threshold structure -- e.g.: if the weight of a shipment is less than B then the cost is C1 but if it's greater than B then the cost is C2 -- the situation becomes a form of the NP-complete bin-packing problem. I should note here that just because a situation is shaped like an NP-complete problem doesn't mean you should immediately give up hope. For many real-world situations, good heuristics exist and it's entirely possible that the range of real-world inputs restricts the problem to manageable instances.
In real life, the odds are that the shipping costs are probably a combination of a bunch of different things (e.g.: maybe piece-wise linear with discontinuities) which makes modeling the problem that much harder. But, I hope I've demonstrated that it's crucial to have a clear idea of how these costs are structured in order to understand your problem.

What's the most insidious way to pose this problem?

My best shot so far:
A delivery vehicle needs to make a series of deliveries (d1,d2,...dn), and can do so in any order--in other words, all the possible permutations of the set D = {d1,d2,...dn} are valid solutions--but the particular solution needs to be determined before it leaves the base station at one end of the route (imagine that the packages need to be loaded in the vehicle LIFO, for example).
Further, the cost of the various permutations is not the same. It can be computed as the sum of the squares of the distance traveled between d(i-1) and d(i), where d0 is taken to be the base station, with the caveat that any segment that involves a change of direction costs 3 times as much (imagine this is going on on a railroad or a pneumatic tube, and backing up disrupts other traffic).
Given the set of deliveries D represented as their distance from the base station (so abs(di-dj) is the distance between two deliveries) and an iterator permutations(D) which will produce each permutation in succession, find a permutation which has a cost less than or equal to that of any other permutation.
Now, a direct implementation from this description might lead to code like this:
function Cost(D) ...
function Best_order(D)
    for D1 in permutations(D)
        Found = true
        for D2 in permutations(D)
            Found = false if Cost(D2) < Cost(D1)
        return D1 if Found
Which is O(n*n!^2), i.e. pretty awful--especially compared to the O(n log(n)) someone with insight would find, by simply sorting D.
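For comparison, here is a small runnable Python sketch of both approaches. It rests on two assumptions of mine that the description leaves open: all deliveries are at non-negative distances on the same side of the base station, and a "change of direction" means a segment whose direction differs from the previous non-zero segment. The function names are made up for the illustration.

```python
from itertools import permutations

def cost(order, base=0):
    """Sum of squared segment lengths; a segment that reverses the
    previous direction of travel costs three times as much."""
    total, prev_pos, prev_dir = 0, base, 0
    for pos in order:
        step = pos - prev_pos
        direction = (step > 0) - (step < 0)   # +1, -1, or 0 for no movement
        reversed_course = prev_dir != 0 and direction != 0 and direction != prev_dir
        total += (3 if reversed_course else 1) * step * step
        if direction:
            prev_dir = direction
        prev_pos = pos
    return total

def best_order_brute_force(deliveries):
    # One pass over all n! permutations, already far cheaper than the nested loop.
    return min(permutations(deliveries), key=cost)

def best_order_sorted(deliveries):
    # Visiting stops in increasing distance never skips or revisits a gap.
    return tuple(sorted(deliveries))

stops = [7, 2, 9, 4]
assert cost(best_order_sorted(stops)) == cost(best_order_brute_force(stops))
```

Sorting wins under these assumptions because every gap between neighbouring stops has to be crossed at least once, and a segment spanning several gaps costs at least the sum of their squares.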
My question: can you come up with a plausible problem description which would naturally lead the unwary into a worse (or differently awful) implementation of a sorting algorithm?

I assume you're using this question for an interview to see if the applicant can notice a simple solution in a seemingly complex question.
[This assumption is incorrect -- MarkusQ]
You give too much information.
The key to solving this is realizing that the points are in one dimension and that a sort is all that is required. To make this question more difficult hide this fact as much as possible.
The biggest clue is the distance formula. It introduces a penalty for changing directions. The first thing that comes to my mind is minimizing this penalty. To remove the penalty I have to order them in a certain direction, and this ordering is the natural sort order.
I would remove the penalty for changing directions; it's too much of a giveaway.
Another major clue is the input values to the algorithm: a list of integers. Give them a list of permutations, or even all permutations. That sets them up to thinking that a O(n!) algorithm might actually be expected.
I would phrase it as:
Given a list of all possible permutations of n delivery locations, where each permutation of deliveries (d1, d2, ..., dn) has a cost defined by:
Return permutation P such that the cost of P is less than or equal to any other permutation.
All that really needs to be done is read in the first permutation and sort it.
If they construct a single loop to compare the costs, ask them what the big-O runtime of their algorithm is, where n is the number of delivery locations (another trap).

This isn't a direct answer, but I think more clarification is needed.
Is di allowed to be negative? If so, sorting alone is not enough, as far as I can see.
For example:
d0 = 0
deliveries = (-1,1,1,2)
It seems the optimal path in this case would be 1 > 2 > 1 > -1.
Edit: This might not actually be the optimal path, but it illustrates the point.

You could rephrase it, having first found the optimal solution, as
"Give me a proof that the following combination is optimal for the following set of rules, where optimal means that the sum of all stage costs is the smallest possible, taking into account that all stages (A..Z) need to be present once and once only.
Combination:
A->C->D->Y->P->...->N
Stage costs:
A->B = 5,
B->A = 3,
A->C = 2,
C->A = 4,
...
...
...
Y->Z = 7,
Z->Y = 24."
That ought to keep someone busy for a while.

This reminds me of the Knapsack problem more than the Traveling Salesman. But the Knapsack is also an NP-Hard problem, so you might be able to fool people into thinking up an overly complex solution using dynamic programming if they associate your problem with the Knapsack. The basic problem is:
can a value of at least V be achieved without exceeding the weight W?
Now, the catch is that a fairly good solution can be found quickly when the values (here, your distances) are distinct:
The knapsack problem with each type of item j having a distinct value per unit of weight (vj = pj/wj) is considered one of the easiest NP-complete problems. Indeed empirical complexity is of the order of O((log n)^2) and very large problems can be solved very quickly, e.g. in 2003 the average time required to solve instances with n = 10,000 was below 14 milliseconds using commodity personal computers.
So you might want to state that several stops/packages might share the same vj, inviting people to think about the really hard solution to:
However in the degenerate case of multiple items sharing the same value vj it becomes much more difficult with the extreme case where vj = constant being the subset sum problem with a complexity of O(2^(N/2) N).
So if you replace weight per value with distance per value, and state that several distances might actually share the same value (the degenerate case), some folks might fall into this trap.

Isn't this just the (NP-Hard) Travelling Salesman Problem? It doesn't seem likely that you're going to make it much harder.
Maybe phrasing the problem so that the actual algorithm is unclear - e.g. by describing the paths as single-rail railway lines so the person would have to infer from domain knowledge that backtracking is more costly.
What about describing the question in such a way that someone is tempted to do recursive comparisons - e.g. "can you speed up the algorithm by using the optimum max subset of your best (so far) results"?
BTW, what's the purpose of this - it sounds like the intent is to torture interviewees.

You need to be clearer on whether the delivery truck has to return to base (making it a round trip), or not. If the truck does return, then a simple sort does not produce the shortest route, because the square of the return from the furthest point to base costs so much. Missing some hops on the way 'out' and using them on the way back turns out to be cheaper.
If you trick someone into a bad answer (for example, by not giving them all the information) then is it their foolishness or your deception that has caused it?
How great is the wisdom of the wise, if they heed not their ego's lies?
