Partitioning floating-point array - algorithm

While writing code, I found the following problem; to state it in a simple way:
Partition an array of floats X into arrays A and B such that the difference between the sum of the values in A and the sum of the values in B is minimized.
This was part of an investigation I was doing, but I can't find a way to efficiently perform this operation.
Edit:
To answer those who believe this is from a math contest like PE or SPOJ, or homework: it is not. I was just curious about this when trying to partition an already factorized number p into two sets of factors a and b such that b = a + 1. If we take logs of both sides, we can show this problem is equivalent to minimizing a difference of sums, but that is where I got stuck.
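To spell out the equivalence: p = a*b, and every factor of p goes to either a or b, so log(a) and log(b) are each a sum over a subset of the logs of the factors. Making b as close to a as possible means minimizing |log(a) - log(b)|, i.e. minimizing the difference between the two subset sums.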

Just a first simple idea: use dynamic programming.
I assume this problem can be transformed into the knapsack problem: you pick items from X to form array A, maximizing their sum without exceeding sum(X) - sum(A), i.e. without exceeding half of sum(X); the remaining items form array B. For an algorithm that solves the knapsack problem via dynamic programming, see the wiki, e.g.
This solution may be wrong, btw... but even if it works, I'm more than sure that more efficient, elegant and shorter solutions exist.
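A rough sketch of that idea in Python (a sketch, not a definitive implementation: it assumes the floats are non-negative and can be scaled and rounded to integers, since subset-sum DP only works on integer sums; the scale factor trades precision for table size, and all names are mine):

def partition(xs, scale=1000):
    # Subset-sum DP over scaled integers; `scale` controls how much
    # float precision is kept (an assumption, not part of the answer).
    ints = [round(x * scale) for x in xs]
    half = sum(ints) // 2
    reach = {0: None}                    # sum -> (item index, previous sum)
    for i, v in enumerate(ints):
        for s in list(reach):            # snapshot so each item is used once
            if s + v <= half and s + v not in reach:
                reach[s + v] = (i, s)
    best = max(reach)                    # reachable sum closest to half
    a_idx, s = set(), best
    while reach[s] is not None:          # walk the predecessor chain
        i, s = reach[s]
        a_idx.add(i)
    a = [x for i, x in enumerate(xs) if i in a_idx]
    b = [x for i, x in enumerate(xs) if i not in a_idx]
    return a, b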

Related

Select N items such that their properties are balanced

Let's say I have N objects, and each of them has two associated values, A and B. This could be represented as a list of tuples like:
[(3,10), (8,4), (0,0), (20,7),...]
where each tuple is an object and the two values are A and B.
What I want to do is select M of these objects (where M < N) such that the sums of A and B in the selected subset are as balanced as possible. M here is a parameter of the problem; I don't want to find the optimal M. I want to be able to say "give me 100 objects, and make them as balanced as possible".
Any idea if there is an efficient algorithm which can solve this problem (not necessarily completely optimally)? I think this might be related to bin packing, but I'm not really sure.
This is a disguised variant of subset-sum. Replace each (A,B) by A-B, and then the absolute value of the sum of all selected A-B values is the "unbalancedness" of the sums. So you really have to select M of those scalars and try to have a sum as close to 0 as possible.
The "variant" bit is because you have to select exactly M items. (I think this is why your mind went to bin-packing rather than subset-sum.) If you have a black-box subset-sum solver you can account for this too: if the maximum single-pair absolute difference is D, replace each (A,B) by (A-B+D) and have the target sum be M*D. (Don't do that if you're doing a dynamic programming approach, of course, since it increases the magnitude of the numbers you're working with.)
Presuming that you're fine with an approximation (and if you're not, you're gonna have a real bad day) I would tend to use Simulated Annealing or Late Acceptance Hill Climbing as a basic approach, starting with a greedy initial solution (iteratively add whichever object results in the minimal difference), and then in each round, considering randomly replacing one object by one not-currently-selected object.
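A minimal sketch of that local search (a greedy start plus improve-only swap moves rather than full simulated annealing or LAHC; all names are illustrative, and M < N is assumed as in the question):

import random

def balance(pairs, m, iters=100_000):
    diffs = [a - b for a, b in pairs]
    # Greedy seed: keep adding whichever item leaves the running sum nearest 0.
    chosen, rest, s = set(), set(range(len(pairs))), 0
    for _ in range(m):
        i = min(rest, key=lambda j: abs(s + diffs[j]))
        chosen.add(i); rest.remove(i); s += diffs[i]
    # Local search: swap one selected object for one unselected object.
    for _ in range(iters):
        out_i = random.choice(tuple(chosen))
        in_i = random.choice(tuple(rest))
        new_s = s - diffs[out_i] + diffs[in_i]
        if abs(new_s) <= abs(s):                 # accept ties to keep moving
            chosen.remove(out_i); chosen.add(in_i)
            rest.remove(in_i); rest.add(out_i)
            s = new_s
    return [pairs[i] for i in chosen]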

Algorithm to find smallest set of objects with a total combined value between X and Y

I have a set of objects (around 100), and each one has a float value (realistically from -10000 to 10000, but let's assume there are no limits).
My objective is to find the smallest set of those objects (as few as possible) whose total combined value is between variables X and Y.
I believe I could tackle this task with some Evolutionary Algorithms, but I was wondering if there was a simpler mathematical solution to this?
I am programming in PHP, but I don't believe that's relevant and I could use any ideas on algorithm/pseudocode.
Thank you!
Your problem looks like a variant of the knapsack problem. There is no easy way of solving it at scale: brute force will only work for the smallest instances. For moderately large problems you can use dynamic programming. In general, you might use mixed integer programming, various metaheuristics or constraint satisfaction.
In my opinion, the last one should be the best for you; for example, consider MiniZinc. It is really easy to use and quite efficient in terms of runtime/memory consumption. See, for example, this example of solving the knapsack problem.
So you can just generate a textual representation of your problem, feed it to MiniZinc and read back the solutions.
Using Python, you can go the route of combinations.
itertools.combinations returns an iterator: each iteration yields a unique subset of the objects.
You can start with sets of very low sizes (2, 3, 4 objects...), then sum the values and make your check.
To sum the values I would recommend you take a look at numpy arrays; they will speed up the process.
Under the hood, the number of combinations is governed by factorials, which means the number of possible combinations grows nearly exponentially with the subset size.
More info:
Numpy Array
Itertools Combinations
This ends in a brute force, for which you can expect to check thousands of combinations per second. I don't know if that's the best algorithm, but it is simple and it works.
Example code:
from itertools import combinations
import numpy

def smallest_subset(values, lo, hi):
    # Try the smallest sizes first, so the first hit uses the fewest objects.
    for size in range(1, len(values) + 1):
        for combi in combinations(values, size):
            if lo <= numpy.array(combi).sum() <= hi:   # inclusive bounds
                return combi
    return None

values = [1, -2, 3, -4, 5, -6, 7, -8]  # any list of floats or integers
print('Result found for ' + str(smallest_subset(values, 5, 10)))

Lookup table and dynamic programming

In the old games era, we used to have look-up tables of pre-computed values of sin, cos, etc., because computing those values on the old CPUs was too slow.
Is that considered a dynamic programming technique? Or must dynamic programming involve a recursive function that is actually computed, or something along those lines?
Update:
In dynamic programming the key is to have a memoization table, which is exactly what the sin/cos look-up table is, so what is really the difference in the technique?
I'd say that from what I see in your question, no, it's not dynamic programming. Dynamic programming is more about solving a problem by solving smaller subproblems and having a way to build the solution of the whole problem from the solutions of the smaller subproblems.
Your situation looks more like memoization.
For me it could be considered DP if your problem were to compute cos N and you had a formula to calculate cos i from the array of cos 0, cos 1, ..., cos (i - 1): you would calculate cos 1 and sin 1 once and then run your calculation for i up to N.
Maybe somebody will correct me :)
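Such a formula does exist, for what it's worth: cos(i*t) = 2*cos(t)*cos((i-1)*t) - cos((i-2)*t). A sketch of the hypothetical table-filling version described above (function name is mine):

import math

def cos_table(t, n):
    # Bottom-up table of cos(0*t) .. cos(n*t): each entry is built from
    # the two smaller subproblems, not read from precomputed constants.
    table = [0.0] * (n + 1)
    table[0] = 1.0                      # cos(0)
    if n >= 1:
        table[1] = math.cos(t)          # computed once, like the "cos 1" above
    for i in range(2, n + 1):
        table[i] = 2.0 * table[1] * table[i - 1] - table[i - 2]
    return table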
There's also an interesting quote about how dynamic programming differs from the divide-and-conquer paradigm:
There are two key attributes that a problem must have in order for dynamic programming to be applicable: optimal substructure and overlapping subproblems. If a problem can be solved by combining optimal solutions to non-overlapping subproblems, the strategy is called "divide and conquer" instead. This is why mergesort and quicksort are not classified as dynamic programming problems.
Dynamic programming is the programming technique where you solve a difficult problem by splitting it into smaller problems, which are not independent (this is important!).
Even if you could compute cos i from cos (i - 1), this would still not be dynamic programming, just recursion.
Dynamic programming classic example is the knapsack problem: http://en.wikipedia.org/wiki/Knapsack_problem
You want to fill a knapsack of capacity W with N objects, each with its own size and value.
Since you don't know which combination of objects will be the best, you "try" them all.
The recurrence equation will be something like:
OPT(m, w) = MAX( OPT(m-1, w),                    // if I don't take object m
                 OPT(m-1, w - w(m)) + v(m) )     // if I take it
Adding the base case, this is how you solve the problem. Of course you should build the solution starting from m = 0, w = 0 and iterating up to m = N and w = W, so that you can reuse previously calculated values.
Using this technique, you can find the optimal combination of objects to put in the knapsack in just N*W time (which is not polynomial in the input size, of course, otherwise P = NP and no one wants that!), instead of an exponential number of computation steps.
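The recurrence translates almost line for line into a table; here is a minimal sketch (variable names follow the equation above, with values[m-1] and weights[m-1] playing the roles of v(m) and w(m)):

def knapsack(values, weights, W):
    n = len(values)
    OPT = [[0] * (W + 1) for _ in range(n + 1)]
    for m in range(1, n + 1):
        for w in range(W + 1):
            OPT[m][w] = OPT[m - 1][w]            # don't take object m
            if weights[m - 1] <= w:              # take it, if it fits
                OPT[m][w] = max(OPT[m][w],
                                OPT[m - 1][w - weights[m - 1]] + values[m - 1])
    return OPT[n][W]                             # N*W table entries, as claimed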
No, I don't think this is dynamic programming. Due to limited computing power, the values of sine and cosine were supplied as pre-computed values, just like any other numeric constants.
For a problem to be solvable with the dynamic programming technique, several conditions are essential. One important condition is that we should be able to break the problem into recursively solvable sub-problems, whose results can then be used as a look-up table in place of deeper recursion. So it is both recursion and memory.
For more info you can refer to the Wikipedia article:
http://en.wikipedia.org/wiki/Dynamic_programming
Also, lecture 19 of this course gives an overview of dynamic programming:
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-19-dynamic-programming-i-fibonacci-shortest-paths/

Find the priority function / alphabet order for extreme higher order elements relation

This question is an extension of the following one. The difference is that now the function to optimize involves higher-order relations between elements:
We have an array of elements a1, a2, ..., aN from an alphabet E, assuming N >> |E|.
For each symbol of the alphabet we define a unique integer priority = V(sym). Let's define V{i} := V(symbol(ai)) for simplicity.
The task is to find a priority function V for which:
Count(i)->MIN | V{i} > V{i+1} <= V{i+2}
In other words, I need to find the priorities / permutation of the alphabet for which the number of positions i satisfying the condition V{i} > V{i+1} <= V{i+2} is minimal.
Maximum required abstraction (low priority for me): I guess once the solution model for the initial question is extended to cover the first part of this one, extending it further (see below) will be easier.
Given a matrix of signs B of size MxK (basically B[i,j] is from the set {<,>,<=,>=}), find the priority function V for which:
Sum(for all j in range [1,M]) {Count(i)}->EXTREMUM | V{i} B[j,1] V{i+1} B[j,2] ... B[j,K] V{i+K}
As an example, find the priority function V, for which the number of i, satisfying V{i}<V{i+1}<V{i+2} or V{i}>V{i+1}>V{i+2}, is minimum.
My intuition is that all variations on this problem will prove to be NP-hard. So I'd begin looking for heuristics that produce reasonable answers. This may involve some trial and error.
A simplistic approach is to write down a possible permutation and then try possible swaps until you've arrived at a local minimum. Try this several times, and pick the best answer.
Simulated annealing provides a more sophisticated version of this approach, see http://en.wikipedia.org/wiki/Simulated_annealing for a description. It may take some experimentation to find a set of parameters that seems to converge relatively well.
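A sketch of that swap-based search for this particular objective, counting the positions where V{i} > V{i+1} <= V{i+2} (plain improve-only hill climbing; restarts and an annealing schedule are left out, and the alphabet is assumed to be given as a list):

import random

def count_bad(seq, prio):
    v = [prio[sym] for sym in seq]
    return sum(1 for i in range(len(v) - 2)
               if v[i] > v[i + 1] <= v[i + 2])

def local_search(seq, alphabet, iters=10_000):
    prio = {sym: i for i, sym in enumerate(alphabet)}   # initial permutation
    best = count_bad(seq, prio)
    for _ in range(iters):
        a, b = random.sample(alphabet, 2)
        prio[a], prio[b] = prio[b], prio[a]             # try one swap
        cost = count_bad(seq, prio)
        if cost <= best:
            best = cost                                  # keep the swap
        else:
            prio[a], prio[b] = prio[b], prio[a]          # undo it
    return prio, best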
Another idea is to look for a genetic algorithm. Based on a quick Google search it looks like the standard way to do this is to try to turn an NP-complete problem into a SAT problem, and then use a genetic algorithm on that problem. This approach would require turning this into a SAT problem in some reasonable way. Unfortunately it is not obvious to me how one would go about doing this reduction. Indeed in the first version that you had, your problem was closely connected to a classic NP-hard problem. The fact that it is labeled NP-hard rather than NP-complete is evidence that people haven't found a good way to transform it into a SAT problem. So if it isn't obvious how to turn the simple version into a SAT problem, then you are unlikely to convert the hard problem either.
But you could still try some variation on genetic algorithms. Mutation is pretty simple: just swap some elements around. One way to combine elements would be to take 3 permutations and use a quicksort-like scheme to merge them: take a random pivot, use "majority wins" to bucket the remaining elements into smaller and bigger, then sort each half the same way.
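One hedged reading of that combination operator in code (the recursion and the "majority wins" bucketing are as described above; the helper and all names are mine):

import random

def combine(parents):
    # parents: 3 permutations of the same elements.
    elems = parents[0]
    if len(elems) <= 1:
        return list(elems)
    pivot = random.choice(elems)
    ranks = [{x: i for i, x in enumerate(p)} for p in parents]
    smaller, bigger = [], []
    for x in elems:
        if x == pivot:
            continue
        before = sum(1 for r in ranks if r[x] < r[pivot])  # votes for "before pivot"
        (smaller if before >= 2 else bigger).append(x)
    def restrict(bucket):                # parents restricted to one bucket
        keep = set(bucket)
        return [[x for x in p if x in keep] for p in parents]
    return combine(restrict(smaller)) + [pivot] + combine(restrict(bigger))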
I'm sorry that I can't just give you an approach and say, "This should work." You've got what looks like an open-ended research project, and the best I can do is give you some ideas about things you can try that might work reasonably well.

What's the most insidious way to pose this problem?

My best shot so far:
A delivery vehicle needs to make a series of deliveries (d1,d2,...dn), and can do so in any order--in other words, all the possible permutations of the set D = {d1,d2,...dn} are valid solutions--but the particular solution needs to be determined before it leaves the base station at one end of the route (imagine that the packages need to be loaded in the vehicle LIFO, for example).
Further, the cost of the various permutations is not the same. It can be computed as the sum of the squares of the distances traveled between d(i-1) and d(i), where d0 is taken to be the base station, with the caveat that any segment involving a change of direction costs 3 times as much (imagine this is happening on a railroad or in a pneumatic tube, where backing up disrupts other traffic).
Given the set of deliveries D, represented by their distances from the base station (so abs(di - dj) is the distance between two deliveries), and an iterator permutations(D) which produces each permutation in succession, find a permutation whose cost is less than or equal to that of any other permutation.
Now, a direct implementation from this description might lead to code like this:
from itertools import permutations

def cost(route): ...   # as described above

def best_order(D):
    for D1 in permutations(D):
        found = True
        for D2 in permutations(D):
            if cost(D2) < cost(D1):    # some other order is strictly cheaper
                found = False
        if found:
            return D1
This is O(n * (n!)^2), i.e. pretty awful--especially compared to the O(n log n) that someone with insight would find by simply sorting D.
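For concreteness, here is a hedged implementation of the cost function from the statement above (my reading: a segment pays triple when its direction differs from the previous segment's; whether the very first segment can "change direction" is left open in the original):

def cost(route, base=0):
    total, prev, direction = 0, base, 0
    for d in route:
        step = d - prev
        seg = step * step                          # squared distance traveled
        new_dir = (step > 0) - (step < 0)
        if direction and new_dir and new_dir != direction:
            seg *= 3                               # change-of-direction penalty
        total += seg
        if new_dir:
            direction = new_dir
        prev = d
    return total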
My question: can you come up with a plausible problem description which would naturally lead the unwary into a worse (or differently awful) implementation of a sorting algorithm?
I assume you're using this question for an interview to see if the applicant can notice a simple solution in a seemingly complex question.
[This assumption is incorrect -- MarkusQ]
You give too much information.
The key to solving this is realizing that the points are in one dimension and that a sort is all that is required. To make this question more difficult, hide this fact as much as possible.
The biggest clue is the distance formula. It introduces a penalty for changing directions. The first thing that comes to mind is minimizing this penalty. To remove the penalty I have to order the deliveries in a single direction, and that ordering is the natural sort order.
I would remove the penalty for changing directions; it's too much of a giveaway.
Another major clue is the input to the algorithm: a list of integers. Give them a list of permutations, or even all permutations. That sets them up to think that an O(n!) algorithm might actually be expected.
I would phrase it as:
Given a list of all possible permutations of n delivery locations, where each permutation of deliveries (d1, d2, ..., dn) has a cost defined by [the cost formula], return the permutation P such that the cost of P is less than or equal to that of any other permutation.
All that really needs to be done is to read in the first permutation and sort it.
If they construct a single loop to compare the costs, ask them what the big-O runtime of their algorithm is, where n is the number of delivery locations (another trap).
This isn't a direct answer, but I think more clarification is needed.
Is di allowed to be negative? If so, sorting alone is not enough, as far as I can see.
For example:
d0 = 0
deliveries = (-1,1,1,2)
It seems the optimal path in this case would be 1 > 2 > 1 > -1.
Edit: This might not actually be the optimal path, but it illustrates the point.
You could rephrase it, having first found the optimal solution, as:
"Give me a proof that the following combination is optimal for the following set of rules, where optimal means the smallest number results from the sum of all stage costs, taking into account that all stages (A..Z) need to be present once and only once.
Combination:
A->C->D->Y->P->...->N
Stage costs:
A->B = 5,
B->A = 3,
A->C = 2,
C->A = 4,
...
...
...
Y->Z = 7,
Z->Y = 24."
That ought to keep someone busy for a while.
This reminds me of the knapsack problem more than the traveling salesman. But knapsack is also an NP-hard problem, so you might be able to fool people into thinking up an overly complex solution using dynamic programming if they associate your problem with the knapsack, where the basic problem is:
can a value of at least V be achieved without exceeding the weight W?
Now, the catch is that a fairly good solution can be found when each value vj (your distances) is unique, as such:
The knapsack problem with each type of item j having a distinct value per unit of weight (vj = pj/wj) is considered one of the easiest NP-complete problems. Indeed empirical complexity is of the order of O((log n)^2) and very large problems can be solved very quickly, e.g. in 2003 the average time required to solve instances with n = 10,000 was below 14 milliseconds using commodity personal computers.
So you might want to state that several stops/packages might share the same vj, inviting people to think about the really hard case:
However in the degenerate case of multiple items sharing the same value vj it becomes much more difficult, with the extreme case where vj = constant being the subset sum problem with a complexity of O(2^(N/2) * N).
So if you replace value per unit of weight with value per unit of distance, and state that several distances may actually share the same value (the degenerate case), some folks might fall into this trap.
Isn't this just the (NP-Hard) Travelling Salesman Problem? It doesn't seem likely that you're going to make it much harder.
Maybe phrase the problem so that the actual algorithm is unclear - e.g. by describing the paths as single-rail railway lines, so the person has to infer from domain knowledge that backtracking is more costly.
What about describing the question in such a way that someone is tempted to do recursive comparisons - e.g. "can you speed up the algorithm by using the optimal max subset of your best (so far) results"?
BTW, what's the purpose of this - it sounds like the intent is to torture interviewees.
You need to be clearer on whether the delivery truck has to return to base (making it a round trip) or not. If the truck does return, then a simple sort does not produce the shortest route, because the squared cost of the return leg from the furthest point to base is so large. Skipping some stops on the way out and making them on the way back turns out to be cheaper.
If you trick someone into a bad answer (for example, by not giving them all the information) then is it their foolishness or your deception that has caused it?
How great is the wisdom of the wise, if they heed not their ego's lies?
