Formulating a bilinear optimization program as an integer linear program

In my work, I came across the following problem: given a similarity matrix D, where $d_{i,j} \in \mathbb{R}$ represents the similarity between objects $i$ and $j$, I would like to select $k$ objects, for $k \in \{1, \dots, n\}$, so as to minimize the total similarity between the selected objects. My first attempt to formally state this problem was the following integer program:
minimize $\sum_{i \neq j} d_{i,j} X_i X_j = d_{1,2}X_1X_2 + d_{1,3}X_1X_3 + \dots + d_{1,n}X_1X_n + d_{2,1}X_2X_1 + \dots + d_{n,n-1}X_nX_{n-1}$
such that $X_1 + X_2 + \dots + X_n = k$ and $X_y \in \{0,1\}$ for $y = 1,\dots,n$.
In the above program, $X_y$ indicates whether or not object $y$ was selected. Clearly, the above program is not linear. I tried to make the objective function linear by introducing variables such as $X_{1,2}$, which indicates whether or not both objects $1$ and $2$ were selected. However, I am struggling to formulate the constraint that exactly $k$ objects must be chosen, i.e., the previous constraint $X_1 + X_2 + \dots + X_n = k$.
Since I am not an expert in mathematical programming, I wonder if you could help me with this.
Thank you in advance!
All the best,
Arthur

You were on the right path, just missing one thing:
Let x_i be 1 if object i is chosen and 0 otherwise.
Let y_ij be 1 if objects i & j are both chosen and 0 otherwise
The IP goes as follows:
minimize
sum_{i<j} d_ij y_ij    (use d_ij + d_ji as the weight of each pair if D is not symmetric)
s.t.
sum x_i = k
x_i + x_j - 1 <= y_ij for all i < j
x and y binary variables
The strange-looking linking constraint forces y_ij = 1 whenever x_i + x_j = 2; because you are minimizing and the d_ij are nonnegative, the solver will set y_ij = 0 in every other case. (If some d_ij are negative, also add y_ij <= x_i and y_ij <= x_j.)
Only define one y variable for each pair!
Hope this helps
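As a sanity check, here is a small brute-force sketch (Python, with a made-up 4×4 similarity matrix; function names are my own) showing that minimising the linearised objective under the linking constraint gives the same optimum as the original bilinear objective, assuming nonnegative d_ij:

```python
from itertools import combinations, product

def bilinear_opt(D, k):
    """Direct minimisation of sum_{i<j} (d_ij + d_ji) x_i x_j over size-k subsets."""
    n = len(D)
    best = None
    for S in combinations(range(n), k):
        val = sum(D[i][j] + D[j][i] for i, j in combinations(S, 2))
        best = val if best is None else min(best, val)
    return best

def linearised_opt(D, k):
    """Same optimum via the linearisation y_ij >= x_i + x_j - 1.
    For nonnegative weights a minimising solver sets y_ij = max(0, x_i + x_j - 1),
    so we can evaluate each x directly."""
    n = len(D)
    best = None
    for x in product((0, 1), repeat=n):
        if sum(x) != k:
            continue
        val = sum((D[i][j] + D[j][i]) * max(0, x[i] + x[j] - 1)
                  for i, j in combinations(range(n), 2))
        best = val if best is None else min(best, val)
    return best

D = [[0, 5, 2, 7],
     [5, 0, 4, 1],
     [2, 4, 0, 3],
     [7, 1, 3, 0]]
for k in range(1, 5):
    assert bilinear_opt(D, k) == linearised_opt(D, k)
```

In a real model you would hand the linearised objective and constraints to an IP solver; the brute force here only demonstrates that the two formulations agree.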


Variation on 0/1 Knapsack Algorithm

I'm very new to programming and have been asked to solve a problem for work. Right now we are dealing with a typical 0/1 Knapsack problem, in which the benefit/value is maximized given mass and volume constraints.
My task is to basically reverse this and minimize either the volume or mass given a value constraint. In other words, I want my benefit score to be greater than or equal to a set value and then see how small I can get the knapsack given that threshold value.
I have tried researching this problem elsewhere and am sure it probably has a formal name, but I have been unable to find it. If anyone has any information I would greatly appreciate it. I am at a bit of a loss as to how to go about solving this type of problem, as you cannot use the same recursion formulas.
Let's call the weight of item i w(i), and its value v(i). Order the items arbitrarily, and define f(i, j) to be the minimum possible capacity of a knapsack that holds a subset of the first i items totalling at least a value of j.
To calculate f(i, j), we can either include the ith item or not in the knapsack, so
f(i>0, j>0) = min(g(i, j), h(i, j)) # Can include or exclude ith item; pick the best
f(_, 0) = 0 # Don't need any capacity to reach value of 0
f(i<=0, j>0) = infinity # Can't get a positive value with <= 0 items
g(i, j) = f(i-1, j) # Capacity needed if we exclude ith item
h(i, j) = f(i-1, max(0, j-v(i))) + w(i) # Capacity needed if we include ith item
In the last line, max(0, j-v(i)) just makes sure that the second argument in the recursive call to f() does not go negative in the case where v(i) > j.
Memoising this gives a pseudopolynomial O(nc)-time, O(nc)-space algorithm, where n is the number of items and c is the value threshold. You can save space (and possibly time, although not in the asymptotic sense) by calculating it in bottom-up fashion -- this would bring the space complexity down to O(c), since while calculating f(i, ...) you only ever need access to f(i-1, ...), so you only need to keep the previous and current "rows" of the DP matrix.
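A minimal memoised sketch of the recurrence above (Python; the 1-based item index i maps to list index i - 1):

```python
from functools import lru_cache

def min_capacity(w, v, threshold):
    """Minimum total weight of a subset of items whose total value is >= threshold.
    w[i], v[i] are the weight and value of item i; returns float('inf') if the
    threshold is unreachable."""
    n = len(w)

    @lru_cache(maxsize=None)
    def f(i, j):
        if j <= 0:
            return 0                 # no capacity needed to reach value 0
        if i == 0:
            return float('inf')      # no items left but value still required
        exclude = f(i - 1, j)                              # g(i, j)
        include = f(i - 1, max(0, j - v[i - 1])) + w[i - 1]  # h(i, j)
        return min(exclude, include)

    return f(n, threshold)

# e.g. items with weights (3, 2, 4) and values (5, 3, 6)
print(min_capacity((3, 2, 4), (5, 3, 6), 8))  # -> 5 (items 1 and 2)
```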
If I understand your question correctly, the problem you wish to solve is of the form:
let mass_i be the mass of item i, let vol_i the volume, and let val_i be its value.
Let x_i be a binary variable, where x_i is one if and only if the item is in the knapsack.
minimize (mass_1 * x_1 + ... + mass_n * x_n) //The case where you are minimizing mass
s.t. mass_1 * x_1 + ... + mass_n * x_n >= MinMass
vol_1 * x_1 + ... + vol_n * x_n >= MinVolume
val_1 * x_1 + ... + val_n * x_n >= MinValue
x_i in {0,1} for all i
A trick you can use to make it more "knapsacky" is to substitute x_i with 1 - y_i, where y_i is one if and only if item i is not in the knapsack. Then you get an equivalent problem of the form:
let mass_i be the mass of item i, let vol_i the volume, and let val_i be its value.
Let y_i be a binary variable, where y_i is one if and only if the item is NOT in the knapsack.
maximize mass_1 * y_1 + ... + mass_n * y_n //The case where you are minimizing mass
s.t. mass_1 * y_1 + ... + mass_n * y_n <= mass_1 + ... + mass_n - MinMass
vol_1 * y_1 + ... + vol_n * y_n <= vol_1 + ... + vol_n - MinVolume
val_1 * y_1 + ... + val_n * y_n <= val_1 + ... + val_n - MinValue
y_i in {0,1} for all i
which is a knapsack problem with 3 constraints. The solution y can easily be transformed into an equivalent solution for your original problem by setting x_i = 1 - y_i.
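A quick brute-force check (Python, toy data, keeping only the value constraint for brevity) that the substituted maximisation recovers the same optimum as the original minimisation:

```python
from itertools import product

def solve_min(mass, val, min_value):
    """Original form: minimise total mass s.t. total value >= min_value."""
    best = None
    for x in product((0, 1), repeat=len(mass)):
        if sum(v * xi for v, xi in zip(val, x)) >= min_value:
            m = sum(w * xi for w, xi in zip(mass, x))
            best = m if best is None else min(best, m)
    return best

def solve_knapsack_form(mass, val, min_value):
    """Substituted form: maximise sum mass_i*y_i s.t.
    sum val_i*y_i <= total_val - min_value, then recover the chosen
    mass as total_mass minus the objective (x_i = 1 - y_i)."""
    total_mass, total_val = sum(mass), sum(val)
    best = None
    for y in product((0, 1), repeat=len(mass)):
        if sum(v * yi for v, yi in zip(val, y)) <= total_val - min_value:
            m = sum(w * yi for w, yi in zip(mass, y))
            best = m if best is None else max(best, m)
    return None if best is None else total_mass - best

mass = [3, 2, 4, 1]
val = [5, 3, 6, 2]
for t in range(0, 17):
    assert solve_min(mass, val, t) == solve_knapsack_form(mass, val, t)
```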

Max subset of arrays whose mean is larger than threshold

I recently came across the following problem, and so far got no insight on how to solve it.
Let S = {v1, v2, v3, ..., vn} be a set of n arrays in ℝ^6. That is, each array has 6 entries.
For a given set of arrays, let the mean of a dimension be the average between the coordinates corresponding to that dimension for all elements in the set.
Also, let us define a certain property P of a set of arrays as the lowest value amongst all means of a set (there is a total of 6 means, one for each dimension). For instance, if a certain set has {10, 4, 1, 5, 6, 3} as means for its dimensions, then P for this set is 1.
Now to the definition of the problem: Return the maximum cardinality amongst all the subsets S' of S such that P(S') ≥ T, T a known threshold value, or 0 if such subset does not exist. Additionally, output any maximal S' (such that P(S') ≥ T).
Summarising: Inputs: the set S and the threshold value T. Output: a maximal subset S' (its cardinality |S'| is then immediate).
I first began trying to come up with a greedy solution, but got no success. Then, I moved on to a dynamic programming approach, but could not establish a recursion that solved the problem. I could expand a little more on my thoughts on the solution, but I don't think they would be of much use, given how far I got (or didn't get).
Any help would be greatly appreciated!
Bruteforce evaluation through recursion would have a time complexity of O(2^n) because each array can either be present in the subset or not.
One (still inefficient but slightly better) way to solve this problem is by taking the help of Integer Linear Programming.
Define X_i = { 1 if array V_i is present in the maximal subset, 0 otherwise }
Hence the cardinality k = X_1 + X_2 + ... + X_n.
Also, since the mean of every dimension must be >= T, multiplying each mean through by k gives:
d_11 X_1 + d_12 X_2 + ... + d_1n X_n >= T*k
d_21 X_1 + d_22 X_2 + ... + d_2n X_n >= T*k
d_31 X_1 + d_32 X_2 + ... + d_3n X_n >= T*k
d_41 X_1 + d_42 X_2 + ... + d_4n X_n >= T*k
d_51 X_1 + d_52 X_2 + ... + d_5n X_n >= T*k
d_61 X_1 + d_62 X_2 + ... + d_6n X_n >= T*k
where d_ij denotes the i-th coordinate of array V_j.
Objective function: Maximize( k )
Strictly speaking, you should eliminate k via the cardinality equation (T*k = T*(X_1 + ... + X_n), so every constraint stays linear), but I have kept k here for clarity.
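A brute-force sketch of the problem itself (Python; 2-entry arrays instead of 6 to keep the example small, and `max_subset` is my own name), using the equivalent per-dimension form of the constraints:

```python
from itertools import combinations

def max_subset(S, T):
    """Largest subset S' of S with every per-dimension mean >= T.
    mean_d >= T  <=>  sum over chosen v of (v[d] - T) >= 0, which is the
    linear constraint d_d1 X_1 + ... + d_dn X_n >= T*k from above."""
    dims = len(S[0])
    for size in range(len(S), 0, -1):          # try the largest cardinality first
        for subset in combinations(S, size):
            if all(sum(v[d] - T for v in subset) >= 0 for d in range(dims)):
                return list(subset)
    return []

S = [(5, 1), (1, 5), (3, 3), (0, 0)]
print(len(max_subset(S, 3)))  # -> 3: drop (0, 0) and both means reach 3
```

This is still O(2^n); the point is only to make the constraint structure concrete before handing it to an ILP solver.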

Dynamic programming approximation

I am trying to calculate a function F(x,y) using dynamic programming. Functionally:
F(X,Y) = a_1 F(X-1,Y) + a_2 F(X-2,Y) + ... + a_k F(X-k,Y) + b_1 F(X,Y-1) + b_2 F(X,Y-2) + ... + b_k F(X,Y-k)
where k is a small number (k=10).
The problem is, X=1,000,000 and Y=1,000,000. So it is infeasible to calculate F(x,y) for every value between x=1..1000000 and y=1..1000000. Is there an approximate version of DP where I can avoid calculating F(x,y) for a large number of inputs and still get accurate estimate of F(X,Y).
A similar example is string matching algorithms (Levenshtein's distance) for two very long and similar strings (eg. similar DNA sequences). In such cases only the diagonal scores are important and the far-from-diagonal entries do not contribute to the final distance. How do we avoid calculating off-the-diagonal entries?
PS: Ignore the border cases (i.e. when x < k and y < k).
I'm not sure precisely how to adapt the following technique to your problem, but if you were working in just one dimension there is an O(k^3 log n) algorithm for computing the nth term of the series. This is called a linear recurrence and can be solved using matrix math, of all things. The idea is to suppose that you have a recurrence defined as
F(1) = x_1
F(2) = x_2
...
F(k) = x_k
F(n + k) = c_1 F(n) + c_2 F(n + 1) + ... + c_k F(n + k - 1)   for n >= 1
For example, the Fibonacci sequence is defined as
F(0) = 0
F(1) = 1
F(n + 2) = 1 x F(n) + 1 x F(n + 1)
There is a way to view this computation as working on a matrix. Specifically, suppose that we have the vector x = (x_1, x_2, ..., x_k)^T. We want to find a matrix A such that
Ax = (x_2, x_3, ..., x_k, x_{k + 1})^T
That is, we begin with a vector of terms 1 ... k of the sequence, and then after multiplying by matrix A end up with a vector of terms 2 ... k + 1 of the sequence. If we then multiply that vector by A, we'd like to get
A(x_2, x_3, ..., x_k, x_{k + 1})^T = (x_3, x_4, ..., x_k, x_{k + 1}, x_{k + 2})
In short, given k consecutive terms of the series, multiplying that vector by A gives us the next term of the series.
The trick uses the fact that we can group the multiplications by A. For example, in the above case, we multiplied our original x by A to get x' (terms 2 ... k + 1), then multiplied x' by A to get x'' (terms 3 ... k + 2). However, we could have instead just multiplied x by A^2 to get x'' as well, rather than doing two different matrix multiplications. More generally, if we want to get term n of the sequence, we can compute A^(n-1) x, whose first element is term n.
Here, we can use the fact that matrix multiplication is associative to compute A^n efficiently. Specifically, we can use the method of repeated squaring to compute A^n in a total of O(log n) matrix multiplications. If the matrix is k x k, then each multiplication takes time O(k^3), for a total of O(k^3 log n) work to compute the nth term.
So all that remains is actually finding this matrix A. Well, we know that we want to map from (x_1, x_2, ..., x_k) to (x_2, x_3, ..., x_{k + 1}), and we know that x_{k + 1} = c_1 x_1 + c_2 x_2 + ... + c_k x_k, so we get this matrix:
    |  0    1    0    0   ...   0  |
    |  0    0    1    0   ...   0  |
A = |  0    0    0    1   ...   0  |
    |             ...              |
    | c_1  c_2  c_3  c_4  ...  c_k |
For more detail on this, see the Wikipedia entry on solving linear recurrences with linear algebra, or my own code that implements the above algorithm.
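The one-dimensional machinery above can be sketched compactly in Python (a companion matrix raised to a power by repeated squaring; function names are mine):

```python
def mat_mult(A, B):
    """Multiply two square matrices."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(A, p):
    """A**p by repeated squaring: O(log p) matrix multiplications."""
    n = len(A)
    R = [[int(i == j) for j in range(n)] for i in range(n)]  # identity
    while p:
        if p & 1:
            R = mat_mult(R, A)
        A = mat_mult(A, A)
        p >>= 1
    return R

def kth_term(coeffs, initial, n):
    """n-th term (1-indexed) of x_{m+k} = c_1 x_m + ... + c_k x_{m+k-1},
    given initial = [x_1, ..., x_k] and coeffs = [c_1, ..., c_k]."""
    k = len(coeffs)
    if n <= k:
        return initial[n - 1]
    # Companion matrix A: maps (x_m, ..., x_{m+k-1}) to (x_{m+1}, ..., x_{m+k}).
    A = [[int(j == i + 1) for j in range(k)] for i in range(k - 1)]
    A.append(list(coeffs))
    An = mat_pow(A, n - k)
    # Last entry of A^(n-k) applied to the initial vector is x_n.
    return sum(An[k - 1][j] * initial[j] for j in range(k))

# Fibonacci: x_1 = 0, x_2 = 1, x_{m+2} = x_m + x_{m+1}
print(kth_term([1, 1], [0, 1], 10))  # -> 34
```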
The only question now is how you adapt this to when you're working in multiple dimensions. It's certainly possible to do so by treating the computation of each row as its own linear recurrence, then going one row at a time. More specifically, you can compute the nth term of the first k rows each in O(k^3 log n) time, for a total of O(k^4 log n) time to compute the first k rows. From that point forward, you can compute each successive row in terms of the previous row by reusing the old values. If there are n rows to compute, this gives an O(k^4 n log n) algorithm for computing the final value that you care about. If this is small compared to the work you'd be doing before (O(n^2 k^2), I believe), then this may be an improvement. Since you're saying that n is on the order of one million and k is about ten, this does seem like it should be much faster than the naive approach.
That said, I wouldn't be surprised if there was a much faster way of solving this problem by not proceeding row by row and instead using a similar matrix trick in multiple dimensions.
Hope this helps!
Without knowing more about your specific problem, the general approach is to use a top-down dynamic programming algorithm and memoize the intermediate results. That way you will only calculate the values that will be actually used (while saving the result to avoid repeated calculations).
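For illustration, a top-down memoised sketch of the recurrence in the question (Python; the coefficients and border values are made up, plain recursion cannot reach X = Y = 10^6, and for this particular recurrence every state is reachable, so memoisation alone does not cut the total work -- it only shows the technique):

```python
from functools import lru_cache

K = 3                       # small k for the demo (the question has k = 10)
a = [0.5, 0.25, 0.125]      # made-up coefficients a_1 .. a_K
b = [0.5, 0.25, 0.125]      # made-up coefficients b_1 .. b_K

@lru_cache(maxsize=None)    # memoise: each (x, y) state is computed once
def F(x, y):
    if x < K or y < K:      # border cases, fixed arbitrarily for the demo
        return 1.0
    return (sum(a[i] * F(x - i - 1, y) for i in range(K)) +
            sum(b[i] * F(x, y - i - 1) for i in range(K)))

print(F(3, 3))  # -> 1.75 with these coefficients and borders
```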

Print a polynomial using minimum number of calls

I keep getting these hard interview questions. This one really baffles me.
You're given a function poly that takes and returns an int. It's actually a polynomial with nonnegative integer coefficients, but you don't know what the coefficients are.
You have to write a function that determines the coefficients using as few calls to poly as possible.
My idea is to use recursion, knowing that I can get the last coefficient from poly(0). So I want to replace poly with (poly - poly(0))/x, but I don't know how to do this in code, since I can only call poly. Anyone have an idea how to do this?
Here's a neat trick.
int N = poly(1)
Now we know that every coefficient in the polynomial is at most N.
int B = poly(N+1)
Now expand B in base N+1 and you have the coefficients.
Attempted explanation: Algebraically, the polynomial is
poly = p_0 + p_1 * x + p_2 * x^2 + ... + p_k * x^k
If you have a number b and expand it in base n, then you get
b = b_0 + b_1 * n + b_2 * n^2 + ...
where each b_i is uniquely determined and b_i < n.
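The trick in code (Python; the helper name and example polynomial are my own):

```python
def recover_coefficients(poly):
    """Recover nonnegative integer coefficients with two calls to poly.
    Returns [p_0, p_1, ..., p_k]."""
    N = poly(1)                       # sum of all coefficients, so each p_i <= N
    if N == 0:
        return [0]                    # the zero polynomial
    B = poly(N + 1)
    coeffs = []
    while B:
        coeffs.append(B % (N + 1))    # base-(N+1) digits are the coefficients
        B //= (N + 1)
    return coeffs

# e.g. p(x) = 3 + 2x + x^3
print(recover_coefficients(lambda x: 3 + 2 * x + x ** 3))  # -> [3, 2, 0, 1]
```

The digits cannot collide because every coefficient is strictly less than the base N + 1.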

How can a transform a polynomial to another coordinate system?

Using assorted matrix math, I've solved a system of equations resulting in coefficients for a polynomial of degree 'n'
Ax^(n-1) + Bx^(n-2) + ... + Z
I then evaluate the polynomial over a given x range; essentially I'm rendering the polynomial curve. Now here's the catch. I've done this work in one coordinate system we'll call "data space". Now I need to present the same curve in another coordinate space. It is easy to transform input/output to and from the coordinate spaces, but the end user is only interested in the coefficients [A,B,....,Z], since they can reconstruct the polynomial on their own. How can I present a second set of coefficients [A',B',....,Z'] which represent the same shaped curve in a different coordinate system?
If it helps, I'm working in 2D space. Plain old x's and y's. I also feel like this may involve multiplying the coefficients by a transformation matrix? Would it somehow incorporate the scale/translation factor between the coordinate systems? Would it be the inverse of this matrix? I feel like I'm headed in the right direction...
Update: Coordinate systems are linearly related. Would have been useful info eh?
The problem statement is slightly unclear, so first I will clarify my own interpretation of it:
You have a polynomial function
f(x) = C_n x^n + C_{n-1} x^(n-1) + ... + C_0
[I changed A, B, ..., Z into C_n, C_{n-1}, ..., C_0 to more easily work with the linear algebra below.]
Then you also have a transformation such as: z = ax + b that you want to use to find coefficients for the same polynomial, but in terms of z:
f(z) = D_n z^n + D_{n-1} z^(n-1) + ... + D_0
This can be done pretty easily with some linear algebra. In particular, you can define an (n+1)×(n+1) matrix T which allows us to do the matrix multiplication
d = T * c ,
where d is the column vector (D_0, D_1, ..., D_n)^T, c is the analogous column vector of the C_i coefficients, and the matrix T has (i,j)-th [i-th row, j-th column] entry t_ij given by
t_ij = (j choose i) a^i b^(j-i),
where (j choose i) is the binomial coefficient, taken to be 0 when i > j. Also, unlike standard matrices, the indices i and j each range from 0 to n (rather than starting at 1).
This is basically a nice way to write out the expansion and re-compression of the polynomial when you plug in z=ax+b by hand and use the binomial theorem.
If I understand your question correctly, there is no guarantee that the function will remain polynomial after you change coordinates. For example, let y=x^2, and the new coordinate system x'=y, y'=x. Now the equation becomes y' = sqrt(x'), which isn't polynomial.
Tyler's answer is the right answer if you have to compute this change of variable z = ax + b many times (I mean, for many different polynomials). On the other hand, if you only have to do it once, it is much faster to combine the computation of the matrix coefficients with the final evaluation. The best way to do this is to symbolically evaluate your polynomial at the point (ax + b) by Horner's method:
you store the polynomial coefficients in a vector V (at the beginning, all coefficients are zero), and for i = n down to 0, you multiply V by (ax + b) and add C_i.
adding C_i means adding it to the constant term
multiplying by (ax + b) means multiplying all coefficients by b into a vector K1, multiplying all coefficients by a and shifting them up one degree (away from the constant term) into a vector K2, and putting K1 + K2 back into V.
This will be easier to program, and faster to compute.
Note that changing y into w = cy+d is really easy. Finally, as mattiast points out, a general change of coordinates will not give you a polynomial.
Technical note: if you still want to compute the matrix T (as defined by Tyler), you should compute it using a weighted version of Pascal's rule (this is what the Horner computation does implicitly):
t_ij = b * t_{i,j-1} + a * t_{i-1,j-1}
This way, you compute it simply, column after column, from left to right.
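A sketch of that Horner procedure in Python (the function name and the lowest-degree-first accumulator are my own implementation choices; K1 and K2 are named as in the steps above):

```python
def substitute(coeffs, a, b):
    """Return the coefficients (highest degree first) of f(a*x + b),
    given coeffs of f highest degree first, via the Horner scheme:
    V <- (a*x + b) * V + C_i for i = n down to 0."""
    V = [0]  # accumulator polynomial, stored lowest degree first
    for c in coeffs:
        # multiply V by (a*x + b):
        K1 = [b * v for v in V] + [0]        # all coefficients times b
        K2 = [0] + [a * v for v in V]        # times a, shifted up one degree
        V = [k1 + k2 for k1, k2 in zip(K1, K2)]
        V[0] += c                            # add C_i to the constant term
    V = V[:len(coeffs)]  # the result has the same degree as f (for a != 0)
    return V[::-1]

# x^2 under x -> 2x + 1 gives (2x + 1)^2 = 4x^2 + 4x + 1
print(substitute([1, 0, 0], 2, 1))  # -> [4, 4, 1]
```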
You have the equation:
y = Ax^(n-1) + Bx^(n-2) + ... + Z
in xy space, and you want it in some x'y' space. What you need are transformation functions f(x) = x' and g(y) = y' (or h(x') = x and j(y') = y). In the first case you need to solve for x and solve for y. Once you have x and y, you can substitute those results into your original equation and solve for y'.
Whether or not this is trivial depends on the complexity of the functions used to transform from one space to another. For example, equations such as:
5x = x' and 10y = y'
are extremely easy to solve for the result
y' = (10/5^(n-1)) A x'^(n-1) + (10/5^(n-2)) B x'^(n-2) + ... + 10Z
If the input spaces are linearly related, then yes, a matrix should be able to transform one set of coefficients to another. For example, if you had your polynomial in your "original" x-space:
ax^3 + bx^2 + cx + d
and you wanted to transform into a different w-space where w = px+q
then you want to find a', b', c', and d' such that
ax^3 + bx^2 + cx + d = a'w^3 + b'w^2 + c'w + d'
and with some algebra,
a'w^3 + b'w^2 + c'w + d' = a'p^3 x^3 + (3a'p^2 q + b'p^2) x^2 + (3a'p q^2 + 2b'p q + c'p) x + (a'q^3 + b'q^2 + c'q + d')
therefore
a = a'p^3
b = 3a'p^2q + b'p^2
c = 3a'pq^2 + 2b'pq + c'p
d = a'q^3 + b'q^2 + c'q + d'
which can be rewritten as a matrix problem and solved.
