Given a non-negative integer n and a positive real weight vector w of dimension m, partition n into a length-m vector v of non-negative integers that sums to n such that max w_i v_i is as small as possible; that is, we want the vector v for which the maximum of the element-wise product of w and v is smallest. There may be several such partitions, and we only want the smallest value of max w_i v_i over all possible v.
It seems this problem could be solved with a greedy algorithm: starting from the optimal vector v for n-1, add 1 to each entry in turn and take the best of those m vectors. But I don't think this is correct. My intuition is that it might step "over" the minimum; that is, there exists another partition, not produced by the add-1 procedure, whose value falls between the "minimum" for n-1 produced by this greedy algorithm and the one for n produced by this greedy algorithm. Can anyone prove whether this is correct or incorrect?
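To experiment with the greedy procedure on small cases, here is a literal sketch of it. The function name `greedy_min_max` is hypothetical, and breaking ties by the smaller new product is an assumption, since the question does not say how ties are resolved. It makes no claim about whether the greedy result is optimal.

```python
def greedy_min_max(n, w):
    """Build v one unit at a time, always making the increment whose result is best."""
    v = [0] * len(w)
    current = 0  # current value of max w_i * v_i
    for _ in range(n):
        # Score of "add 1 to entry k": the resulting maximum, then the new product itself.
        k = min(range(len(w)),
                key=lambda k: (max(current, w[k] * (v[k] + 1)), w[k] * (v[k] + 1)))
        v[k] += 1
        current = max(current, w[k] * v[k])
    return current, v
```

For example, `greedy_min_max(5, [1, 2])` builds `v = [4, 1]` with maximum product 4.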
If you already know the maximum element-wise product P, you can simply set v_i = floor(P/w_i) for each i until you run out of n; P is achievable exactly when these values sum to at least n.
Use binary search to find the smallest possible value of P.
The largest guess you need to try is n * min(w) (all of n on the smallest weight), so, searching over integer values of P, that means testing about log(n) + log(min(w)) candidates and spending O(m) time on each test, or O(m*(log n + log(min(w)))) altogether.
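A minimal sketch of this binary search follows. Because the weights are real, it assumes a bisection over real values of P with a fixed number of iterations rather than an integer search, so the result is accurate to floating-point precision rather than exact. The names `feasible`, `min_max_product` and `allocation` are hypothetical.

```python
import math


def feasible(P, w, n):
    # v_i = floor(P / w_i) is the largest count slot i can take with w_i * v_i <= P.
    return sum(math.floor(P / wi) for wi in w) >= n


def min_max_product(n, w, iters=100):
    """Bisect on P in [0, n * min(w)]; hi is always feasible."""
    lo, hi = 0.0, n * min(w)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(mid, w, n):
            hi = mid
        else:
            lo = mid
    return hi


def allocation(P, w, n):
    """Fill each slot up to floor(P / w_i), then remove the excess so the sum is n."""
    v = [math.floor(P / wi) for wi in w]
    excess = sum(v) - n  # >= 0 whenever P is feasible
    for i in range(len(v)):
        take = min(excess, v[i])
        v[i] -= take
        excess -= take
    return v
```

With `w = [1, 2]` and `n = 5`, the bisection converges to P = 4, and `allocation` then returns a vector such as `[3, 2]` whose largest product is 4.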
Given a non-negative integer $n$ and a positive real weight vector $w$ of dimension $m$, partition $n$ into a length-$m$ vector $v$ of non-negative integers that sums to $n$ such that $w\cdot v$ is as small as possible. There may be several such partitions, and we only want the value of $w\cdot v$.
It seems this problem could be solved with a greedy algorithm: starting from the optimal vector for $n-1$, add 1 to each entry in turn and take the best of those $m$ vectors. But I don't think this is correct. My intuition is that it might step "over" the minimum; that is, there exists another partition, not produced by the add-1 procedure, whose value falls between the "minimum" for $n-1$ produced by this greedy algorithm and the one for $n$ produced by this greedy algorithm. Can anyone prove whether this is correct or incorrect?
Without loss of generality, assume that the elements of w are non-decreasing. Let v be an m-vector whose values are non-negative integers that sum to n. Then the smallest inner product of v and w is achieved by setting v[0] = n and v[i] = 0 for i > 0.
This is easy to prove. Suppose v is any other such vector, with v[i] > 0 for some i > 0. Then we can increase v[0] by v[i] and reduce v[i] to zero. The elements of v still sum to n, and the inner product of v and w is reduced by v[i](w[i] - w[0]) >= 0. Repeating this for every i > 0 leaves v[0] = n without ever increasing the inner product.
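The claim is easy to check by brute force on small inputs. This sketch (function names hypothetical) compares the closed form n * min(w) with an exhaustive search over all non-negative integer vectors summing to n; the weights need not be sorted, since min(w) picks out the smallest one directly.

```python
import math
from itertools import product


def min_inner_product(n, w):
    # Closed form from the argument above: put all of n on the smallest weight.
    return n * min(w)


def brute_force_min_inner_product(n, w):
    # Enumerate every length-len(w) vector with entries in 0..n and keep those summing to n.
    best = math.inf
    for v in product(range(n + 1), repeat=len(w)):
        if sum(v) == n:
            best = min(best, sum(wi * vi for wi, vi in zip(w, v)))
    return best
```

The exhaustive search is exponential in m, so it is only useful as a sanity check.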
So, say you have a collection of value pairs of the form {x, y}, say {1, 2}, {1, 3} and {2, 5}.
Then you have to find a subset of k pairs (in this case, say k = 2) such that the sum of all x in the subset divided by the sum of all y in the subset is as high as possible.
Could you point me in the direction for relevant theory or algorithms?
It's kind of like maximum subset sum, but since the pairs are "bound" to each other, it introduces a restriction that makes it different from the problems I know.
Initially I thought that a simple greedy approach could work here, but commenters pointed out some counterexamples.
Instead I think a bisection approach should work.
Suppose we want to know whether it is possible to achieve a ratio of g.
We need to add a selection of k vectors to end up above a line of gradient g.
If we project each vector perpendicular to this line to get values p1, p2, p3, ..., then the final vector will be on or above the line if and only if the sum of the p values is non-negative.
Now, with the projected values, it does seem right that the optimal solution is to choose the k largest.
We can then use bisection to find the highest ratio that is achievable.
Mathematical justification
Suppose we want to have the ratio above g, i.e.
(x1+x2+x3)/(y1+y2+y3) >= g
=> (x1+x2+x3) >= g(y1+y2+y3)
=> (x1-g.y1) + (x2-g.y2) + (x3-g.y3) >= 0
=> p1 + p2 + p3 >= 0
where pi is defined to be xi-g.yi.
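A minimal sketch of this bisection, assuming every y is positive and using a fixed number of iterations; `max_ratio` is a hypothetical name. The search starts between the smallest and largest single-pair ratios, since any subset's ratio lies between them.

```python
import heapq


def max_ratio(pairs, k, iters=100):
    """Bisection on g: can some k pairs reach sum(x) / sum(y) >= g?  Assumes all y > 0."""
    def achievable(g):
        # p_i = x_i - g * y_i; for a fixed g the best k pairs are the k largest p_i.
        return sum(heapq.nlargest(k, (x - g * y for x, y in pairs))) >= 0

    lo = min(x / y for x, y in pairs)
    hi = max(x / y for x, y in pairs)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if achievable(mid):
            lo = mid
        else:
            hi = mid
    # Indices of the k pairs with the largest projected values at the final ratio.
    chosen = heapq.nlargest(k, range(len(pairs)),
                            key=lambda i: pairs[i][0] - lo * pairs[i][1])
    return lo, sorted(chosen)
```

On the example pairs with k = 2, this selects {1, 2} and {2, 5} for a ratio of 3/7, which beats 2/5 for {1, 2} and {1, 3}.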
As I am not very proficient in various optimization/tree algorithms, I am seeking help.
Problem Description:
Assume a large sequence of sorted nodes is given, with each node representing an integer value L. L strictly increases from node to node, so no two nodes have the same L.
The goal now is to find the best combination of nodes, where the difference between the L-values of subsequent nodes is closest to a given integer value M(L) that changes with L.
Example:
So, in the beginning I would have L = 50 and M = 100. The next nodes have L = 70,140,159,240,310.
First, the value 159 seems to be closest to L+M = 150, so it is chosen as the next node.
However, in the next step M = 100 still holds, and we notice that L+M = 259, which is far away from 240.
If we instead go back and choose the node with L = 140, followed by 240, the overall match between the M values and the L-differences is better. The algorithm should be able to recover the optimal path even if a mistake was made along the way.
Some additional information:
1) The start node is not necessarily part of the best combination/path, but if required, one could first develop an algorithm that chooses the best starting candidate.
2) the optimal combination of nodes is following the sorted sequence and not "jumping back" -> so 1,3,5,7 is possible but not 1,3,5,2,7.
3) In the end, the differences between the L values of the chosen nodes should be closest to the M values in the mean-squared sense.
Any help is much appreciated!
If I understand your question correctly, you could use Dijkstra's algorithm:
https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
http://www.mathworks.com/matlabcentral/fileexchange/20025-dijkstra-s-minimum-cost-path-algorithm
For that, you need to know the neighbours of every node and create an adjacency matrix. The implementation of Dijkstra's algorithm I posted above lets you specify edge weights. You could define each edge weight as the L of the node being accessed plus M, so that for every node combination you have the L of the new node + M. In that way the algorithm should find the optimal path between your nodes.
To get all edge combinations, you can use MATLAB's graph functions:
http://se.mathworks.com/help/matlab/ref/graph.html
If I understand your problem correctly you need an undirected graph.
You can access all edges with G.Edges after you have created the graph.
I know it's not the perfect answer, but I hope it helps!
P.S. Just watch out: Dijkstra's algorithm can only handle non-negative edge weights.
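The edge weight in this answer is only loosely specified, so the following sketch makes an explicit assumption: each forward edge i → j (j > i, which respects the "no jumping back" rule) gets weight (L[j] - L[i] - M)^2, matching the squared-error criterion in the question, with M held constant as in the example. These weights are non-negative, so Dijkstra's algorithm applies. `min_sse_path` is a hypothetical name; it minimises the total squared error between a fixed start and end node and does not enforce any minimum path length.

```python
import heapq
import math


def min_sse_path(L, M, source, target):
    """Dijkstra over forward edges i -> j (j > i) with weight (L[j] - L[i] - M) ** 2."""
    n = len(L)
    dist = [math.inf] * n
    prev = [None] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        if i == target:
            break
        for j in range(i + 1, n):  # forward edges only
            nd = d + (L[j] - L[i] - M) ** 2
            if nd < dist[j]:
                dist[j], prev[j] = nd, i
                heapq.heappush(heap, (nd, j))
    if dist[target] == math.inf:
        return math.inf, []
    path, node = [], target
    while node is not None:  # walk predecessors back to the source
        path.append(L[node])
        node = prev[node]
    return dist[target], path[::-1]
```

On the example nodes 50, 70, 140, 159, 240, 310 with M = 100, the best path from 50 to 240 goes through 140, with total squared error 100.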
Suppose we are given a number M and a list of n numbers, L[1], ..., L[n], and we want to find a subsequence of at least q of the latter numbers that minimises the sum of squared errors (SSE) with respect to M, where the SSE of a list of k positions x[1], ..., x[k] with respect to M is given by
SSE(M, x[1], ..., x[k]) = sum((L[x[i]]-L[x[i-1]]-M)^2) over all 2 <= i <= k,
with the SSE of a list of 0 or 1 positions defined to be 0.
(I'm introducing the parameter q and the associated constraint on the subsequence length here because without it, there always exists a subsequence of length at most 2 that achieves the minimum possible SSE, and I'm guessing that such a short sequence isn't helpful to you.)
This problem can be solved in O(qn^2) time and O(qn) space using dynamic programming.
Define f(i, j) to be the minimum sum of squared errors achievable under the following constraints:
The number at position i is selected, and is the rightmost selected position. (Here, i = 0 implies that no positions are selected.)
We require that at least j (instead of q) of these first i numbers are selected.
Also define g(i, j) to be the minimum of f(k, j) over all 0 <= k <= i. Thus g(n, q) will be the minimum sum of squared errors achievable on the entire original problem. For efficient (O(1)) calculation of g(i, j), note that
g(i>0, j>=0) = min(g(i-1, j), f(i, j))
g(0, 0) = 0
g(0, j>0) = infinity
To calculate f(i, j), note that if i > 0 then any solution must be formed by appending the ith position to some solution Y that selects at least j-1 positions and whose rightmost selected position is to the left of i, i.e. whose rightmost selected position is k, for some k < i. The total SSE of this solution to the (i, j) subproblem will be whatever the SSE of Y was, plus a fixed term of (L[i]-L[k]-M)^2, so to minimise this total SSE, it suffices to minimise the SSE of Y. But we can compute that minimum: it is f(k, j-1). (It cannot be g(k, j-1), because g also allows Y's rightmost selected position to lie left of k, in which case the added term would not be (L[i]-L[k]-M)^2.)
Since this holds for any 0 <= k < i, it suffices to try all such values of k, and take the one that gives the lowest total SSE:
f(i>=j, j>=2) = min of (f(k, j-1) + (L[i]-L[k]-M)^2) over all 0 <= k < i
f(i>=j, j<2) = 0 # If we only need 0 or 1 position, SSE is 0
f(i, j>i) = infinity # Can't choose > i positions if the rightmost chosen position is i
With the above recurrences and base cases, we can compute g(n, q), the minimum possible sum of squared errors for the entire problem. By memoising values of f(i, j) and g(i, j), the time to compute all needed values of f(i, j) is O(qn^2), since there are at most (n+1)*(q+1) possible distinct combinations of input parameters (i, j), and computing a particular value of f(i, j) requires at most (n+1) iterations of the loop that chooses values of k, each iteration of which takes O(1) time outside of recursive subcalls. Storing solution values of f(i, j) requires at most (n+1)*(q+1), or O(qn), space, and likewise for g(i, j). As established above, g(i, j) can be computed in O(1) time when all needed values of f(x, y) have been computed, so g(n, q) can be computed in the same time complexity.
To actually reconstruct a solution corresponding to this minimum SSE, you can trace back through the computed values of f(i, j) in reverse order, each time looking for a value of k that achieves the minimum in the recurrence (in general there may be many such values of k), setting i to this value of k and j to j-1, and continuing until you reach a base case (j < 2). This is a standard dynamic programming technique.
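A minimal sketch of this dynamic program with the traceback, assuming a constant M and using `f(k, j-1)` (not `g(k, j-1)`) inside the minimum, since the added squared-error term requires k to be exactly the rightmost selected position of Y. Positions are 1-based as in the recurrences, while the Python list `L` is 0-based. `min_sse_subsequence` is a hypothetical name.

```python
import math


def min_sse_subsequence(L, M, q):
    """Min SSE over subsequences of at least q positions; returns (sse, 1-based positions)."""
    n = len(L)
    INF = math.inf
    # f[i][j]: min SSE when position i is the rightmost selected position and at least
    # j positions are selected; i = 0 means nothing is selected.
    f = [[INF] * (q + 1) for _ in range(n + 1)]
    pred = [[0] * (q + 1) for _ in range(n + 1)]  # a k that achieves f[i][j]
    for i in range(n + 1):
        for j in range(min(i, q) + 1):  # f[i][j] stays INF when j > i
            if j < 2:
                f[i][j] = 0.0  # 0 or 1 selected positions: SSE is 0
                continue
            for k in range(1, i):  # k = 0 would use f[0][j-1] = INF anyway
                cand = f[k][j - 1] + (L[i - 1] - L[k - 1] - M) ** 2
                if cand < f[i][j]:
                    f[i][j], pred[i][j] = cand, k
    # g(n, q) is the minimum of f[i][q] over 0 <= i <= n.
    best_i = min(range(n + 1), key=lambda i: f[i][q])
    if f[best_i][q] == INF:
        return INF, []  # fewer than q positions exist
    positions, i, j = [], best_i, q
    while i > 0:  # trace back through the recorded predecessors
        positions.append(i)
        if j < 2:
            break
        i, j = pred[i][j], j - 1
    return f[best_i][q], positions[::-1]
```

With L = [50, 70, 140, 159, 240, 310], M = 100 and q = 3, this returns an SSE of 100 for positions 1, 3, 5 (the values 50, 140, 240). The triple loop gives the O(qn^2) time and O(qn) space stated above.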
I now answer my own post with my current implementation, in order to structure my post and upload images. Unfortunately, the code does not do what it should. Imagine L, M and q given as in the images below. With the calcf and calcg functions I calculated the F and G matrices, where F(i+1,j+1) stores the calculated f(i,j) and G(i+1,j+1) stores g(i,j). The SSE of the optimal combination should be G(N+1,q+1), but the result is wrong. If anyone can find the mistake, that would be much appreciated.
[Image: G and F matrices of the given problem in the workspace. G and F are created by calculating g(N,q) via calcg(L,N,q,M).]
[Image: calcf and calcg functions]
The collapsing knapsack problem is a generalization of the ordinary knapsack problem, where the knapsack capacity is a non-increasing function of the number of items included.
Does anyone know anything (name, literature, algorithms...) about a variant where the knapsack capacity changes depending which items you select (i.e., the domain is the powerset of the items) instead of the number of items?
For general values of the 'capacities', I believe you will need to do some kind of enumeration over the subsets. If I understand correctly, it corresponds more or less to an arbitrary boolean that says whether a subset is feasible (the sum of the weights of its elements is at most its capacity) or not.
A 'capacity' in the knapsack problem is something that appears on the right-hand side of the constraint, i.e.
sum p_i x_i <= C
in the classical knapsack and
sum p_i x_i <= C (sum x_i)
in the collapsing knapsack.
Because those are linear constraints, they behave in a somewhat predictable way, which avoids having to look at all the possible combinations (the elements of the power set) to solve the problem.
Now if you have an arbitrary capacity value C_J for each element of the power set, your capacity is not a predictable function of the vector x, so the only way to remove a subset J from the list that has to be examined is if its value (sum_J a_i x_i) is lower than the value of a subset that you have already found to be feasible (the capacity gives you no information whatsoever).
This means in particular that there is no practical way to model this as an integer program, because it would require at least one constraint for each C_J (just computing the cost of each feasible subset would be more efficient).
I would go with an enumeration algorithm and try to reduce the search tree as much as possible.
Let us order the items by non-increasing value a_0 >= a_1 >= ... >= a_{n-1}.
We can look at all the possible subsets by decreasing cardinality. This is because for any cardinality k, you know that the best possible subset with cardinality at most k has a value of at most M_k = sum_{i=0}^{k-1} a_i, so you will be able to stop your search before examining all the subsets (I can't think of another way to cut the search tree).
The algorithm would be:
Start with M := 0 and k := n.
Repeat:
find the best feasible subset with cardinality k; let A be its value if it is better than M
M := max(A, M): value of the best subset found so far
if M >= M_{k-1}, stop: we found the optimum
else k := k-1
To search for the best subset of cardinality k, you can use the order of the a_i:
start with {0, ..., k-1} and recursively examine the subsets {0} U J' with J' a subset of cardinality k-1 of {1, ..., n-1},
then examine all the subsets of the form {1} U J' with J' a subset of cardinality k-1 of {2, ..., n-1}, etc.
As soon as you find a feasible subset, update the bound M.
This is again because the subsets of cardinality k that contain none of a_0, ..., a_i have value at most a_{i+1} + ... + a_{i+k}, and you can stop as soon as this is no greater than the current bound M.
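A minimal sketch of this enumeration, under these assumptions: item values are non-negative, the empty set is feasible with value 0, and the subset-dependent capacity is supplied as an arbitrary Python function of the chosen subset. The names `best_feasible_subset` and `capacity` are hypothetical; `weights` plays the role of the p_i and `values` the a_i. After sorting by value, `prefix[k]` is the bound M_k.

```python
def best_feasible_subset(values, weights, capacity):
    """Maximise sum of values over subsets J with sum of weights <= capacity(J)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    a = [values[i] for i in order]  # a[0] >= a[1] >= ... >= a[n-1]
    prefix = [0.0]
    for x in a:
        prefix.append(prefix[-1] + x)  # prefix[k] = M_k = a[0] + ... + a[k-1]

    best_value, best_set = 0.0, frozenset()  # assumes the empty set is feasible

    def search(start, chosen, value, k):
        nonlocal best_value, best_set
        r = k - len(chosen)  # items still to pick
        if r == 0:
            subset = frozenset(order[j] for j in chosen)  # back to original indices
            if value > best_value and sum(weights[i] for i in subset) <= capacity(subset):
                best_value, best_set = value, subset
            return
        for j in range(start, n - r + 1):
            # Best completion uses a[j], ..., a[j+r-1]; later j only give smaller bounds.
            if value + prefix[j + r] - prefix[j] <= best_value:
                break
            chosen.append(j)
            search(j + 1, chosen, value + a[j], k)
            chosen.pop()

    for k in range(n, 0, -1):  # decreasing cardinality
        search(0, [], 0.0, k)
        if best_value >= prefix[k - 1]:  # no smaller subset can do better
            break
    return best_value, best_set
```

For example, with values [10, 7, 5], weights [4, 3, 2] and a capacity of 5 for the full set, 6 for pairs and 10 for singletons, the search returns value 15 for items {0, 2} and stops before examining the singletons.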
Note:
I assumed nothing about the capacities C_J. It would certainly be interesting to know whether the capacities are increasing in the set-theoretic sense, i.e. whether I included in J implies C_I <= C_J.