Formal name for this optimization algorithm? - algorithm

I have the following problem in one of my coding project which I will simplify here:
I am ordering groceries online and want very specific things in very specific quantities. I would like to order the following:
8 Apples
1 Yam
2 Soups
3 Steaks
20 Orange Juices
There are many stores equidistant from me which I will have food delivered from. Not all stores have what I need. I want to obtain what I need with the fewest number of orders made. For example, ordering from Store #2 below is a wasted order, since I can complete my items in less orders by ordering from different stores. What is the name of the optimization algorithm that solves this?
Store #1 Supply
50 Apples
Store #2 Supply
1 Orange Juice
2 Steaks
1 Soup
Store #3 Supply
25 Soup
50 Orange Juices
Store #4 Supply
25 Steaks
10 Yams
The lowest possible orders is 3 in this case. 8 Apples from Store #1. 2 Soup and 20 Orange Juice from Store #3. 1 Yam and 3 Steaks from Store #4.

To me, this most likely sounds like a restricted case of the Integer Linear programming problem (ILP), namely, its 0-or-1 variant, where the integer variables are restricted to the set {0, 1}. This is known to be NP-hard (and the corresponding decision problem is NP-complete).
The problem is formulated as follows (following the conventions in the op. cit.):
Given the matrix A, the constraint vector b, and the weight vector c, find the vector x ∈ {0, 1}N such that all the constraints A⋅x ≥ b are satisfied, and the cost c⋅x is minimal.
I flipped the constraint inequality, but this is equivalent to changing the sign of both A and b.
The inequalities indicate satisfaction of your order: that you can buy at the least the amount of every item in the visited store. Note that b has the same length as the number of rows in A and the number of columns in both c and x. The dot-product c⋅x is, naturally, a scalar.
Since you are minimizing the number of trips, each trip costs the same, so that c = 1, and c⋅x is the total number of trips. The store inventory matrix A has a row per item, and a column per store, and the b is your shopping list.
Naturally, the exact best solution is found by trying all possible 2N values for the x.
Since there is no single approach to NP-hard problems, consider the problem size, and how close to the optimum you want to arrive. A greedy approach would work well (when your next store to visit has the most total number of items not yet satisfied) when the "inventories" are large. If you have the idea in advance about the expected minimum number of trips, you can trim the search beam at some value, exceeding the number of trips by some multiplication coefficient. This is the best approach when your search is time constrained (I routinely do beam searches, closely related to the branch-and-cut approach mentioned in the article, in graphs that take a few GB of memory slightly faster than the limit of 30ms per exploration step with a beam as wide as 10,000). Simulated annealing also works, if the search landscape is not excessively rough.
Also search on cs.SE; it may be even a better place for questions of this type.

Related

Most efficient seating arrangement

There are n (n < 1000) groups of friends, with the size of the group being characterized by an array A[] (2 <= A[i] < 1000). Tables are present such that they can accommodate r(r>2) people at a time. What is the minimum number of tables needed for seating everyone, subject to the constraint that for every person there should be another person from his/her group sitting at his/her table.
The approach I was thinking was to break every group into sizes of twos and threes and try to solve this problem, but there are many ways of dividing a number n into groups of twos and threes and not all of them may be optimal.
Does a Mixed Integer Programming model count?
Some notes on this formulation:
I used random data to form the groups.
x(i,j) is the number of people of group i sitting at table j.
x(i,j) is a semi-integer variable, that is: it is an integer variable with values zero or between LO and UP. Not all MIP solvers offer semi-continuous and semi-integer variables but it may come handy. Here I use it to enforce that at least 2 persons from the same group need to sit at a table. If a solver does not offer these type of variables, we can formulate this construct using additional binary variables as well.
y(j) is a binary variable (0 or 1) indicating if a table is used.
the capacity equation is somewhat smart: if a table is not used (y(j)=0) its capacity is reduced to zero.
the option optcr=0 indicates we want to solve to optimality. For large, difficult problems we may want to stop say at 5%.
the order equation makes sure we start filling tables from table 1. This also reduces the symmetry of the problem and may speed up solution times.
the above model (with 200 groups and 200 potentially used tables) generates a MIP problem with 600 equations (rows) and 40k variables (columns). There are 37k integer variables. With a good MIP solver we find the proven optimal solution (with 150 tables used) in less than a minute.
Notice this is certainly not a knapsack problem (as suggested in another answer -- a knapsack problem has just one constraint) but it resembles a bin-packing problem.
It is same problem as knapsack problem which is NP complete (see https://en.wikipedia.org/wiki/Bin_packing_problem ). So finding optimal solution is pretty hard.
A heuristic that works most of the time:
Sort the groups according decreasing size.
For each group put it in the table that has least amount of space, but still can accommodate this group.
Your approach is workable. If a solution exists for a given number of tables, then a solution exists where you've split every group into some number of twos and some number of threes. First, split a three off of every group of odd size. You're left with a bunch of groups of even size. Next, split twos off of every group whose size isn't divisible by six. And forget that it's one bigger group; split it into a bunch of groups of six.
At this point, you have split all of your groups into some number of twos, some number of threes, and some number of sixes. Give each table of odd size one three, splitting sixes as necessary; now all tables have even size. All remaining sixes can now be split into twos and seated arbitrarily.

How do I calculate the most profit-dense combination in the most efficient way?

I have a combinations problem that's bothering me. I'd like someone to give me their thoughts and point out if I'm missing some obvious solution that I may have overlooked.
Let's say that there is a shop that buys all of its supplies from one supplier. The supplier has a list of items for sale. Each item has the following attributes:
size, cost, quantity, m, b
m and b are constants in the following equation:
sales = m * (price) + b
This line slopes downward. The equation tells me how many of that item I will be able to sell if I charge that particular price. Each item has its own m and b values.
Let's say that the shop has limited storage space, and limited funds. The shop wants to fill its warehouse with the most profit-dense items possible.
(By the way, profit density = profit/size. I'm defining that profit density be only with regard to the items size. I could work with the density with regard to size and cost, but to do that I'd have to know the cost of warehouse space. That's not a number I know currently, so I'm just going to use size.)
The profit density of items drops the more you buy (see below.)
If I flip the line equation, I can see what price I'd have to charge to sell some given amount of the item in some given period of time.
price = (sales-b)/m
So if I buy n items and wanted to sell all of them, I'd have to charge
price = (n-b)/m
The revenue from this would be
price*n = n*(n-b)/m
The profit would be
price*n-n*cost = n*(n-b)/m - n*cost
and the profit-density would be
(n*(n-b)/m - n*cost)/(n*size)
or, equivalently
((n-b)/m - cost)/size
So let's say I have a table containing every available item, and each item's profit-density.
The question is, how many of each item do I buy in order to maximise the amount of money that the shop makes?
One possibility is to generate every possible combination of items within the bounds of cost and space, and choose the combo with the highest profitability. In a list of 1000 items, this takes too long. (I tried this and it took 17 seconds for a list of 1000. Horrible.)
Another option I tried (on paper) was to take the top two most profitable items on the list. Let's call the most profitable item A, the 2nd-most profitable item B, and the 3rd-most profitable item C. I buy as many of item A as I can until it's less profitable than item B. Then I repeat this process using B and C, for every item in the list.
It might be the case however, that after buying item B, item A is again the most profitable item, more so than C. So this would involve hopping from the current most profitable item to the next until the resources are exhausted. I could do this, but it seems like an ugly way to do it.
I considered dynamic programming, but since the profit-densities of the items change depending on the amount you buy, I couldn't come up with a resolution for this.
I've considered multiple-linear regression, and by 'consider' I mean I've said to myself "is multi-linear regression an option?" and then done nothing with it.
My spidey-sense tells me that there's a far more obvious method staring me in the face, but I'm not seeing it. Please help me kick myself and facepalm at the same time.
If you treat this as a simple exercise in multivariate optimization, where the controllable variables are the quantities bought, then you are optimizing a quadratic function subject to a linear constraint.
If you use a Lagrange multiplier and differentiate then you get a linear equation for each quantity variable involving itself and the Lagrange multiplier as the only unknowns, and the constraint gives you a single linear equation involving all of the quantities. So write each quantity as a linear function of the Lagrange multiplier and substitute into the constraint equation to get a linear equation in the Lagrange multiplier. Solve this and then plug the Lagrange multiplier into the simpler equations to get the quantities.
This gives you a solution if you are allowed to buy fractional and negative quantities of things if required. Clearly you are not, but you might hope that nothing is very negative and you can round the non-integer quantities to get a reasonable answer. If this isn't good enough for you, you could use it as a basis for branch and bound. If you make an assumption on the value of one of the quantities and solve for the others in this way, you get an upper bound on the possible best answer - the profit predicted neglecting real world constraints on non-negativity and integer values will always be at least the profit earned if you have to comply with these constraints.
You can treat this as a dynamic programming exercise, to make the best use of a limited resource.
As a simple example, consider just satisfying the constraint on space and ignoring that on cost. Then you want to find the items that generate the most profit for the available space. Choose units so that expressing the space used as an integer is reasonable, and then, for i = 1 to number of items, work out, for each integer value of space up to the limit, the selection of the first i items that gives the most return for that amount of space. As usual, you can work out the answers for i+1 from the answers for i: for each value from 0 up to the limit on space just consider all possible quantities of the i+1th item up to that amount of space, and work out the combined return from using that quantity of the item and then using the remaining space according to the answers you have already worked out for the first i items. When i reaches the total number of items you will be working out the best possible return for the problem you actually want to solve.
If you have constraints for both space and cost, then the state of the dynamic program is not the single variable (space) but a pair of variables (space, cost) but you can still solve it, although with more work. Consider all possible values of (space, cost) from (0, 0) up to the actual constraints - you have a 2-dimensional table of returns to compute instead of a single set of values from 0 to max-space. But you can still work from i=1 to N, computing the highest possible return for the first i items for each limit of (space, cost) and using the answers for i to compute the answers for i+1.

Finding subsets being used at most k times

Every now and then I read all those conspiracy theories about Lotto-based games being controlled and a computer browsing through the combinations chosen by the players and determining the non-used subset. It got me thinking - how would such algorithm have to work in order to determine such subsets really efficiently? Finding non-used numbers is definitely crossed out as is finding the least used because it's not necesserily providing us with a solution. Also, going deeper, how could an algorithm efficiently choose such a subset that it was used some k times by the players? Saying more formally:
We are given a set of 50 numbers 1 to 50. In the draw 6 numbers are picked.
INPUT: m subsets each consisting of 6 distinct numbers 1 to 50 each,
integer k (0<=k) being the maximum players having all of their 6
numbers correct.
OUTPUT: Subsets which make not more than k players win the jackpot ('winning' means all the numbers they chose were picked in the draw).
Is there any efficient algorithm which could calculate this without using a terrabyte HDD to store all the encounters of every possible 50!/(44!*6!) in the pessimistic case? Honestly, I can't think of any.
If I wanted to run such a conspirancy I would first of all acquire the list of submissions by players. Then I would generate random lottery selections and see how many winners would be produced by each such selection. Then just choose the random lottery selection most attractive to me. There is little point doing anything more sophisticated, because that is probably already powerful enough to be noticed by staticians.
If you want to corrupt the lottery it would probably be easier and safer to select a few competitors you favour and have them win the lottery. In (the book) "1984" I think the state simply announced imaginary lottery winners, with the announcement in each area announcing somebody outside the area. One of the ideas in "The Beckoning Lady" by Margery Allingham is of a gang who attempt to set up a racecourse so they can rig races to allow them to disguise bribes as winnings.
First of all, the total number of combinations (choosing 6 from 50) is not very large. It is about 16 million which can be easily handled.
For each combination keep a count of number of people who played it. While declaring a winner choose the combination that has less than k plays.
If the number within each subset are sorted, then you can treat your subsets as strings - sort them in lexicographical order, then it is easy to count how many players selected each subset (and which subsets were not selected at all). So the time is proportional to the number of players and not the number of numbers in the lottery.

Can I do better than binary search here?

I want to pick the top "range" of cards based upon a percentage. I have all my possible 2 card hands organized in an array in order of the strength of the hand, like so:
AA, KK, AKsuited, QQ, AKoff-suit ...
I had been picking the top 10% of hands by multiplying the length of the card array by the percentage which would give me the index of the last card in the array. Then I would just make a copy of the sub-array:
Arrays.copyOfRange(cardArray, 0, 16);
However, I realize now that this is incorrect because there are more possible combinations of, say, Ace King off-suit - 12 combinations (i.e. an ace of one suit and a king of another suit) than there are combinations of, say, a pair of aces - 6 combinations.
When I pick the top 10% of hands therefore I want it to be based on the top 10% of hands in proportion to the total number of 2 cards combinations - 52 choose 2 = 1326.
I thought I could have an array of integers where each index held the combined total of all the combinations up to that point (each index would correspond to a hand from the original array). So the first few indices of the array would be:
6, 12, 16, 22
because there are 6 combinations of AA, 6 combinations of KK, 4 combinations of AKsuited, 6 combinations of QQ.
Then I could do a binary search which runs in BigOh(log n) time. In other words I could multiply the total number of combinations (1326) by the percentage, search for the first index lower than or equal to this number, and that would be the index of the original array that I need.
I wonder if there a way that I could do this in constant time instead?
As Groo suggested, if precomputation and memory overhead permits, it would be more efficient to create 6 copies of AA, 6 copies of KK, etc and store them into a sorted array. Then you could run your original algorithm on this properly weighted list.
This is best if the number of queries is large.
Otherwise, I don't think you can achieve constant time for each query. This is because the queries depend on the entire frequency distribution. You can't look only at a constant number of elements to and determine if it's the correct percentile.
had a similar discussion here Algorithm for picking thumbed-up items As a comment to my answer (basically what you want to do with your list of cards), someone suggested a particular data structure, http://en.wikipedia.org/wiki/Fenwick_tree
Also, make sure your data structure will be able to provide efficient access to, say, the range between top 5% and 15% (not a coding-related tip though ;).

A packing algorithm ... kind of

Given an array of items, each of which has a value and cost, what's the best algorithm determine the items required to reach a minimum value at the minimum cost? eg:
Item: Value -> Cost
-------------------
A 20 -> 11
B 7 -> 5
C 1 -> 2
MinValue = 30
naive solution: A + B + C + C + C. Value: 30, Cost 22
best option: A + B + B. Value: 34, Cost 21
Note that the overall value:cost ratio at the end is irrelevant (A + A would give you the best value for money, but A + B + B is a cheaper option which hits the minimum value).
This is the knapsack problem. (That is, the decision version of this problem is the same as the decision version of the knapsack problem, although the optimization version of the knapsack problem is usually stated differently.) It is NP-hard (which means no algorithm is known that is polynomial in the "size" -- number of bits -- in the input). But if your numbers are small (the largest "value" in the input, say; the costs don't matter), then there is a simple dynamic programming solution.
Let best[v] be the minimum cost to get a value of (exactly) v. Then you can calculate the values best[] for all v, by (initializing all best[v] to infinity and):
best[0] = 0
best[v] = min_(items i){cost[i] + best[v-value[i]]}
Then look at best[v] for values upto the minimum you want plus the largest value; the smallest of those will give you the cost.
If you want the actual items (and not just the minimum cost), you can either maintain some extra data, or just look through the array of best[]s and infer from it.
This problem is known as integer linear programming. It's NP-hard.
However, for small problems like your example, it's trivial to make a quick few lines of code to simply brute force all the low combinations of purchase choices.
NP-harddoesn't mean impossible or even expensive, it means your problem becomes rapidly slower to solve with larger scale problems. In your case with just three items, you can solve this in mere microseconds.
For the exact question of what's the best algorithm in general.. there are entire textbooks on it. A good start is good old Wikipedia.
Edit This answer is redacted on account of being factually incorrect. Following the advice in this will only cause you harm.
This is not actually the knapsack problem, because it assumes that you cannot pack more items than there is space for in some container. In you case you want to find the cheapest combination that will fill up the space, allowing for the fact that overflow may occur.
My solution, which I don't know is the optimal but it should be pretty close, would be to compute for each item the cost benefit ratio, find the item with the highest cost benefit and fill the structure with this item until there isn't space for one more item. Then I would test to see if there was a combination with any of the other available items that could fill the available slot for less that the cost of one of the cheapest items and then if such a solution exist, use that combination otherwise use one more of the cheapest items.
Amenddum This may actually also be NP-complete, but I am not sure yet. Anyway for all practical purposes this variation should be much faster than the naive solution.

Resources