Optimizing Cartesian requests with affine costs - algorithm

I have a cost-optimization problem that I don't know whether there is literature on. It is a bit hard to explain, so I apologize in advance for the length of the question.
There is a server I am accessing that works this way:
a request is made on records (r1, ..., rn) and fields (f1, ..., fp)
you can only request the Cartesian product (r1, ..., rn) x (f1, ..., fp)
The cost (time and money) associated with such a request is affine in the size of the request:
T((r1, ..., rn) x (f1, ..., fp)) = a + b * n * p
Without loss of generality (just by normalizing), we can assume that b = 1, so the cost is:
T((r1, ..., rn) x (f1, ..., fp)) = a + n * p
I only need to request a subset of the pairs (r1, f(r1)), ..., (rk, f(rk)); these requests come from the users. My program acts as a middleman between the user and the server (which is external). I have a lot of requests like this coming in (tens of thousands a day).
Graphically, we can think of it as an n x p sparse matrix, for which I want to cover the nonzero values with rectangular submatrices:

    f1 f2 f3 ... fp
    -------       ---
r1 | x  x |      | x |
r2 | x    |       ---
    -------
r3
..            ---------
rn           | x     x |
              ---------
Having:
the number of submatrices being kept reasonable because of the constant cost
all the 'x' must lie within a submatrix
the total area covered must not be too large because of the linear cost
I will name g the sparseness coefficient of my problem (number of needed pairs over total possible pairs): g = k / (n * p). I know the coefficient a.
There are some obvious observations:
if a is small, the best solution is to request each (record, field) pair independently, and the total cost is: k * (a + 1) = g * n * p * (a + 1)
if a is large, the best solution is to request the whole Cartesian product, and the total cost is: a + n * p
the second solution is better as soon as g > g_min = (a + n * p) / ((a + 1) * n * p) = 1 / (a + 1) * (1 + a / (n * p)) (a numeric check is sketched below)
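As a quick sanity check of that break-even point, here is a tiny sketch with made-up numbers (a, n, p and k below are illustrative only, not values from the real system):

def per_pair_cost(a, k):
    # one request per needed (record, field) pair: k * (a + 1)
    return k * (a + 1)

def full_product_cost(a, n, p):
    # a single request for the whole Cartesian product: a + n * p
    return a + n * p

a, n, p, k = 20, 100, 50, 400                 # g = k / (n * p) = 0.08
g_min = (a + n * p) / ((a + 1) * n * p)       # about 0.048
print(per_pair_cost(a, k))                    # 8400
print(full_product_cost(a, n, p))             # 5020: cheaper, since g > g_min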
of course the order within the Cartesian products is unimportant, so I can permute the rows and the columns of my matrix to make it more easily coverable, for example:
   f1 f2 f3
r1  x     x
r2     x
r3  x     x

can be reordered as

   f1 f3 f2
r1  x  x
r3  x  x
r2        x
And there is an optimal solution which is to request (f1,f3) x (r1,r3) + (f2) x (r2)
Trying all the solutions and looking for the lowest cost is not an option, because the combinatorics explode:
for each permutation on rows: (n!)
for each permutation on columns: (p!)
for each possible covering of the n x p matrix: (time unknown, but large...)
compute cost of the covering
so I am looking for an approximate solution. I already have some kind of greedy algorithm that finds a covering given a matrix (it begins with unitary cells, then merges them if the proportion of empty cells in the merged request is below some threshold).
To put some numbers in mind: my n is somewhere between 1 and 1000, and my p somewhere between 1 and 200. The coverage pattern is really 'blocky', because the records come in classes for which the fields asked for are similar. Unfortunately I can't access the class of a record...
Question 1: Does someone have an idea, a clever simplification, or a reference to a paper that could be useful? As I have a lot of requests, an algorithm that works well on average is what I am looking for (but I can't afford it to work very poorly on some extreme case, for example requesting the whole matrix when n and p are large and the request is indeed quite sparse).
Question 2: In fact, the problem is even more complicated: the cost is in fact more like the form: a + n * (p^b) + c * n' * p', where b is a constant < 1 (once a record is asked for a field, it is not too costly to ask for other fields) and n' * p' = n * p * (1 - g) is the number of cells I don't want to request (because they are invalid, and there is an additional cost in requesting invalid things). I can't even dream of a rapid solution to this problem, but still... an idea anyone?

Selecting the submatrices to cover the requested values is a form of the set covering problem and hence NP-complete. Your problem adds to this already hard problem the fact that the costs of the sets differ.
That you are allowed to permute the rows and columns is not such a big problem, because you can just consider disconnected submatrices: row one, columns four to seven, and row five, columns four to seven, form a valid set, because you can just swap row two and row five and obtain the connected submatrix from row one, column four, to row two, column seven. Of course this will add some constraints - not all sets are valid under all permutations - but I don't think this is the biggest problem.
The Wikipedia article gives the inapproximability result that the problem cannot be solved in polynomial time better than within a factor 0.5 * log2(n), where n is the number of sets. In your case 2^(n * p) is a (quite pessimistic) upper bound for the number of sets and yields that you can only find a solution up to a factor of 0.5 * n * p in polynomial time (unless P = NP, and ignoring the varying costs).
An optimistic lower bound for the number of sets, ignoring permutations of rows and columns, is 0.5 * n^2 * p^2, yielding a much better factor of log2(n) + log2(p) - 0.5. In consequence you can only expect to find a solution in your worst case of n = 1000 and p = 200 up to a factor of about 17 in the optimistic case and up to a factor of about 100,000 in the pessimistic case (still ignoring the varying costs).
So the best you can do is to use a heuristic algorithm (the Wikipedia article mentions an almost-optimal greedy algorithm) and accept that there will be cases where the algorithm performs (very) badly. Or you go the other way and use an optimization algorithm and try to find a good solution by using more time. In this case I would suggest trying A* search.
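For what it's worth, here is a sketch of that greedy heuristic (lowest cost per newly covered required cell), assuming some pool of candidate rectangles has already been generated somehow; producing a good candidate pool is of course the hard part and is not shown:

def greedy_cover(needed_pairs, candidate_rects, a):
    # Greedy weighted set cover: repeatedly pick the candidate rectangle
    # (record_set, field_set) with the lowest cost per newly covered pair.
    uncovered = set(needed_pairs)
    chosen = []
    while uncovered:
        best, best_price = None, float("inf")
        for rows, cols in candidate_rects:
            newly = sum(1 for r, f in uncovered if r in rows and f in cols)
            if newly == 0:
                continue
            price = (a + len(rows) * len(cols)) / newly
            if price < best_price:
                best, best_price = (rows, cols), price
        if best is None:
            break  # nothing left in the pool covers the remaining pairs
        rows, cols = best
        uncovered = {(r, f) for r, f in uncovered if r not in rows or f not in cols}
        chosen.append(best)
    return chosen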

I'm sure there's a really good algorithm for this out there somewhere, but here are my own intuitive ideas:
Toss-some-rectangles approach:
Determine a "roughly optimal" rectangle size based on a.
Place these rectangles (perhaps randomly) over your required points, until all points are covered.
Now take each rectangle and shrink it as much as possible without "losing" any data points.
Find rectangles close to each other and decide whether combining them would be cheaper than keeping them separate.
Grow
Start off with each point in its own 1x1 rectangle.
Locate all rectangles within n rows/columns (where n may be based on a); see if you can combine them into one rectangle for no cost (or negative cost :D).
Repeat.
Shrink
Start off with one big rectangle, that covers ALL points.
Look for a sub-rectangle which shares a pair of sides with the big one, but contains very few points.
Cut it out of the big one, producing two smaller rectangles.
Repeat.
Quad
Divide the plane into 4 rectangles. For each of these, see if you get a better cost by recursing further, or by just including the whole rectangle.
Now take your rectangles and see if you can merge any of them with little/no cost.
Also: keep in mind that sometimes it will be better to have two overlapping rectangles than one large rectangle which is a superset of them. E.g. the case when two rectangles just overlap in one corner.
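A minimal sketch of the "Quad" idea above, under the question's a + area cost model (it only returns the cost, not the chosen rectangles, and it leaves out the merge pass and the overlap observation):

def quad_cost(cells, r0, r1, c0, c1, a):
    # Cheapest cost to cover the required cells inside the rectangle
    # [r0, r1) x [c0, c1): either request the whole rectangle in one go,
    # or split it into four quadrants and recurse, whichever is cheaper.
    inside = [(r, c) for r, c in cells if r0 <= r < r1 and c0 <= c < c1]
    if not inside:
        return 0
    whole = a + (r1 - r0) * (c1 - c0)
    if r1 - r0 <= 1 and c1 - c0 <= 1:
        return whole
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    split = (quad_cost(inside, r0, rm, c0, cm, a) +
             quad_cost(inside, r0, rm, cm, c1, a) +
             quad_cost(inside, rm, r1, c0, cm, a) +
             quad_cost(inside, rm, r1, cm, c1, a))
    return min(whole, split)

Called as quad_cost(needed_cells, 0, n, 0, p, a) for the whole matrix.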

Ok, my understanding of the question has changed. New ideas:
Store each row as a long bit-string. AND pairs of bit-strings together, trying to find pairs that maximise the number of 1 bits. Grow these pairs into larger groups (sort and try to match the really big ones with each other). Then construct a request that will hit the largest group and then forget about all those bits. Repeat until everything done. Maybe switch from rows to columns sometimes.
Look for all rows/columns with zero, or few, points in them. "Delete" them temporarily. Now you are looking at what would be covered by a request that leaves them out. Now perhaps apply one of the other techniques, and deal with the ignored rows/cols afterwards. Another way of thinking about this is: deal with denser points first, and then move on to sparser ones.
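Going back to the first idea (rows as bit-strings), here is a small sketch of just the pairing step; row_masks is a hypothetical dict mapping each record to the bitmask of the fields it needs, and the growing/merging of pairs into larger groups is left out:

def best_pair(row_masks):
    # Find the pair of rows whose AND has the most set bits,
    # i.e. the two records that share the most required fields.
    best, best_common = None, -1
    items = list(row_masks.items())
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            common = bin(items[i][1] & items[j][1]).count("1")
            if common > best_common:
                best_common = common
                best = (items[i][0], items[j][0])
    return best, best_common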

Since your values are sparse, could it be that many users are asking for similar values? Is caching within your application an option? The requests could be indexed by a hash that is a function of (x,y) position, so that you can easily identify cached sets that fall within the correct area of the grid. Storing the cached sets in a tree, for example, would allow you to find minimum cached subsets that cover the request range very quickly. You can then do a linear lookup on the subset, which is small.

I'd consider the n records (rows) and p fields (cols) mentioned in the user request set as n points in p-dimensional space ({0,1}^p), with the ith coordinate being 1 iff it has an X, and identify a hierarchy of clusters, with the coarsest cluster at the root including all the Xs. For each node in the clustering hierarchy, consider the product that covers all the columns needed (this is rows(any subnode) x cols(any subnode)). Then decide, from the bottom up, whether to merge the child coverings (paying for the whole covering) or to keep them as separate requests. (The coverings are not of contiguous columns, but exactly the ones needed; i.e. think of a bit vector.)
I agree with Artelius that overlapping product-requests could be cheaper; my hierarchical approach would need improvement to incorporate that.

I've worked a bit on it, and here is an obvious, O(n^3) greedy, symmetry-breaking algorithm (records and fields are treated separately), in Python-style code.
The idea is trivial: we start with one request per record, and we do the most worthwhile merge until there is nothing left worth merging. This algorithm has the obvious disadvantage that it does not allow overlapping requests, but I expect it to work quite well on real-life cases (with the a + n * (p^b) + c * n * p * (1 - g) cost function):
def greedy_merge(needed_pairs, cost):
    # given is a function cost: request -> positive real,
    # where a request is a pair of sets (records, fields)
    def merge(request1, request2):
        # merge two requests: ((r1 U r2), (f1 U f2))
        return (request1[0] | request2[0], request1[1] | request2[1])

    # initialize with a request per record
    by_record = {}
    for record, field in needed_pairs:
        by_record.setdefault(record, set()).add(field)
    requests = [({record}, fields) for record, fields in by_record.items()]

    finished = False
    while not finished:  # there might be something to gain
        maximum_gain = 0
        finished = True
        this_step_merge = None
        # loop over all distinct pairs of requests
        for i, request1 in enumerate(requests):
            for request2 in requests[i + 1:]:
                merged_request = merge(request1, request2)
                gain = cost(request1) + cost(request2) - cost(merged_request)
                if gain > maximum_gain:
                    maximum_gain = gain
                    this_step_merge = (request1, request2, merged_request)
        # if we found at least something to merge, we should continue
        if maximum_gain > 0:
            # so update the list of requests...
            requests.remove(this_step_merge[0])
            requests.remove(this_step_merge[1])
            # ... and we are not done yet
            requests.append(this_step_merge[2])
            finished = False
    return requests
This is O(n^3 * p) because:
after initialization we start with n requests
the while loop removes exactly one request from the pool at each iteration.
the inner for loop iterates on the (ni^2 - ni) / 2 distinct pairs of requests, with ni going from n to one in the worst case (when we merge everything into one big request).
Can someone help me point out the very bad cases of the algorithm? Does it sound reasonable to use this one?
It is O(n^3), which is way too costly for large inputs. Any idea how to optimize it?
Thanks in advance!

Related

Trying to optimize DP solutions from 2D Array to 1D Array

I've been trying to wrap my head around the fact that a lot of DP questions that involve bottom-up tabulation via a 2D matrix can be simplified into a 1D array to save on space, since you only rely on the previous two rows, but I don't really understand the why/how/intuition behind this.
Just wondering if anyone could offer the most dumb downed version of why this works...
Generally, we can apply DP when the optimal solution to a given problem can be determined from the optimal solutions of its subproblems. When coming up with a solution to some algorithm, it usually helps to come up with a recursive one first. From there, if we observe that recursive subproblems are re-calculated multiple times, we can just memoize the intermediate results for fast reference to them later.
In some special cases, we don't actually need to remember the solution to all subproblems at once; we just need to know a certain subset at a time.
The space optimization described above seems to best answer the question you're asking - how does one condense the total set of solutions as a 2D matrix into a single 1D array? Well, at a given time, we don't actually store all solutions (the 2D matrix) at any single point in time; we just store what is needed to calculate the next round of intermediate/final outputs in the algorithm.
Perhaps walking through an example application may help reinforce this description.
A nice example is the generalized stock trading problem. Basically, we have an input array prices of a stock on a given day and would like to calculate the maximum profit that can be earned if k buy-sell transactions are made, where one stock may be bought and held at any given time.
The trickiest part in my opinion is figuring out how to move from the one transaction case to the two transaction case. I'll assume we're proficient enough in dynamic programming to move immediately to the k transaction case. Notice that a nice formulation of the problem in terms of subproblems is the following:
prices = input array of prices, length is n
Define dp[k][i] = maximum profit earned by day i, after having made k transactions
dp[k][i] = max(dp[k][i - 1], prices[i] + effectivePrice)
effectivePrice = max(dp[k - 1][i] - prices[i], effectivePrice) (compute on the fly for each i)
Now in this particular case our "naive" dp solution has a 2D matrix with k rows and n columns. The space reduction here is that, in order to calculate the result for k transactions, we only need knowledge of the case for k - 1 transactions. Therefore, it is certainly possible to solve the problem using two 1D arrays of size n.
Let oldDp = solution for the k - 1 case
Let newDp = solution for the k case (computed on the fly)

for each transaction:
    for each day i:
        newDp[i] = max(newDp[i - 1], prices[i] + effectivePrice)
        effectivePrice = max(oldDp[i] - prices[i], effectivePrice)
    // Set up for next iteration
    oldDp = newDp
    newDp = blank array of size n
As we can see, we managed to save a lot of space - we went from having to use a 2D matrix with k rows and n columns to two 1D arrays of size n. An even better optimization is to just use a single 1D array; this is possible since the only indices that we examine in oldDp is the current one i when calculating effectivePrice. Because we only need to temporarily remember the old result for day i, we can just make use of a temporary variable. Thus, the optimized pseudocode (for our "naive" approach) appears as below:
Let dp = maximum profit, so that dp[i] = maximum profit after k transactions (built iteratively) on day i.
for each transaction:
    for each day i:
        // At this point, dp[i] is equivalent to dp[k - 1][i], yet
        // for all j < i, dp[j] is equivalent to dp[k][j]!
        temp = dp[i]
        dp[i] = max(dp[i - 1], prices[i] + effectivePrice)
        effectivePrice = max(temp - prices[i], effectivePrice)
And so, using the "naive" idea of determining the optimal solution after k transactions from k - 1 transactions, we optimize space by going from a 2D matrix of size kn to a 1D array of size n.
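For concreteness, here is a runnable sketch of that final single-array version (not from the original answer; prices and k are whatever the caller supplies):

def max_profit(prices, k):
    # Maximum profit from at most k buy-sell transactions, using one 1D
    # array that is reused across the k passes, exactly as described above.
    n = len(prices)
    if n == 0 or k == 0:
        return 0
    dp = [0] * n
    for _ in range(k):
        effective_price = -prices[0]   # best dp_prev[j] - prices[j] seen so far
        for i in range(1, n):
            temp = dp[i]               # still dp[k - 1][i] at this point
            dp[i] = max(dp[i - 1], prices[i] + effective_price)
            effective_price = max(effective_price, temp - prices[i])
    return dp[-1]

# max_profit([3, 2, 6, 5, 0, 3], 2) == 7 (buy at 2 / sell at 6, buy at 0 / sell at 3)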

Counting ways of placing coins on a grid

The problem requires us to find the number of ways of placing R coins on an N*M grid such that each row and column has at least one coin. The constraints given are N, M < 200 and R < N*M. I initially thought of backtracking, but I was made to realise that it would never finish in time. Can someone guide me to another solution? (DP, a closed-form formula...) Any pointers would be nice. Thanks.
Answer
According to OEIS sequence A055602 one possible solution to this is:
Let a(m, n, r) = Sum_{i=0..m} (-1)^i*binomial(m, i)*binomial((m-i)*n, r)
Answer = Sum_{i=0..N} (-1)^i*binomial(N, i)*a(M, N-i, R)
You will need to evaluate N+1 different values for a.
Assuming you have precomputed binomial coefficients, each evaluation of a is O(M) so the total complexity is O(NM).
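A direct transcription of this formula (a sketch; it relies on Python's math.comb, which returns 0 whenever the lower index exceeds the upper one):

from math import comb

def a(m, n, r):
    # ways to place r coins on an m*n grid so every one of the m columns
    # is occupied (rows unconstrained), by inclusion-exclusion over columns
    return sum((-1) ** i * comb(m, i) * comb((m - i) * n, r) for i in range(m + 1))

def count_placements(N, M, R):
    # second round of inclusion-exclusion, over empty rows
    return sum((-1) ** i * comb(N, i) * a(M, N - i, R) for i in range(N + 1))

print(count_placements(2, 2, 2))   # 2: the two 'diagonal' placements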
Interpretation
This formula can be derived using the inclusion-exclusion principle twice.
a(m,n,r) is the number of ways of putting r coins on a grid of size m*n such that every one of the m columns is occupied, but not all the rows are necessarily occupied.
Inclusion-Exclusion turns this into the correct answer. (The idea is that we get our first estimate from a(M,N,R). This overestimates the correct answer because not all rows are occupied so we subtract cases a(M,N-1,R) where we only occupy N-1 rows. This then underestimates so we need to correct again...)
Similarly we can compute a(m,n,r) by considering b(m,n,r) which is the number of ways of placing r coins on a grid where we don't care about rows or columns being occupied. This can be derived simply from the number of ways of choosing r places in a grid size m*n , i.e. binomial(m*n,r). We use IE to turn this into the function a(m,n,r) where we know that all columns are occupied.
If you want to allow different conditions on the number of coins on each square, then you can just change b(m,n,r) to the appropriate counting function.
This is tough, but you could begin by working out how many ways you can have at least one coin on each row and column (call these the reserve coins). The answer would then involve, for each arrangement of reserving one coin on each row/column, the binomial coefficient n! / (r! (n - r)!), where n = N*M - NUMBER_OF_RESERVE_COINS and r = R - NUMBER_OF_RESERVE_COINS.
Enumerating the arrangements of reserve coins is where the trickier stuff takes place. For N*M where N != M, abs(N-M) tells you how many reserve coins will have to share a single row/column. I'm having trouble identifying the correct way of proceeding to the next step, mainly due to lack of time (though I can return to this on the weekend), but I hope I have provided you with useful information, and if what I have said is correct you will be able to complete the process.

Programming problem - Game of Blocks

maybe you would have an idea on how to solve the following problem.
John decided to buy his son Johnny some mathematical toys. One of his favorite toys is blocks of different colors. John has decided to buy blocks of C different colors. For each color he will buy a googol (10^100) blocks. All blocks of the same color are of the same length, but blocks of different colors may vary in length.
Johnny has decided to use these blocks to make a large 1 x n block. He wonders how many ways he can do this. Two ways are considered different if there is a position where the color differs. The example shows a red block of size 5, a blue block of size 3 and a green block of size 3. It shows there are 12 ways of making a large block of length 11.
Each test case starts with an integer 1 ≤ C ≤ 100. The next line consists of C integers; the ith integer 1 ≤ leni ≤ 750 denotes the length of the ith color. The next line is a positive integer N ≤ 10^15.
This problem should be solved in 20 seconds for T <= 25 test cases. The answer should be calculated MOD 100000007 (prime number).
It can be reduced to a matrix exponentiation problem, which can be solved relatively efficiently in O(d^2.376 * log(N)), where d = max(leni), using the Coppersmith-Winograd algorithm and fast exponentiation. But it seems that a more efficient algorithm is required, as Coppersmith-Winograd implies a large constant factor. Do you have any other ideas? It could possibly be a Number Theory or Divide and Conquer problem.
Firstly note the number of blocks of each colour you have is a complete red herring, since 10^100 > N always. So the number of blocks of each colour is practically infinite.
Now notice that at each position p (if there is a valid configuration that leaves no spaces, etc.) there must be a block of some color c. There are len[c] ways for this block to lie so that it still covers position p.
My idea is to try all possible colors and positions at a fixed position (N/2, since it halves the range); then for each case there are b cells before this fixed coloured block and a cells after it. So define a function ways(i) that returns the number of ways to tile i cells (with ways(0)=1). The number of ways to tile the cells with a fixed colour block at a given position is then ways(b)*ways(a), and adding up all possible configurations yields the answer for ways(i).
Now I chose the fixed position to be N/2 since that halves the range, and you can halve a range at most ceil(log(N)) times. Since you are moving a block around N/2, you will have to calculate lengths from about N/2-750 to N/2+750, where 750 is the max length a block can have. So you will have to calculate about 750*ceil(log(N)) (a bit more because of the variance) lengths to get the final answer.
So in order to get good performance you have to throw in memoisation, since this is inherently a recursive algorithm.
So using Python(since I was lazy and didn't want to write a big number class):
T = int(raw_input())
for case in xrange(T):
    #read in the data
    C = int(raw_input())
    lengths = map(int, raw_input().split())
    minlength = min(lengths)
    n = int(raw_input())
    #setup memoisation, note all lengths less than the minimum length are
    #set to 0 as the algorithm needs this
    memoise = {}
    memoise[0] = 1
    for length in xrange(1, minlength):
        memoise[length] = 0
    def solve(n):
        global memoise
        if n in memoise:
            return memoise[n]
        ans = 0
        for i in xrange(C):
            if lengths[i] > n:
                continue
            if lengths[i] == n:
                ans += 1
                ans %= 100000007
                continue
            for j in xrange(0, lengths[i]):
                b = n/2-lengths[i]+j
                a = n-(n/2+j)
                if b < 0 or a < 0:
                    continue
                ans += solve(b)*solve(a)
                ans %= 100000007
        memoise[n] = ans
        return memoise[n]
    solve(n)
    print "Case %d: %d" % (case+1, memoise[n])
Note I haven't exhaustively tested this, but I'm quite sure it will meet the 20 second time limit, if you translated this algorithm to C++ or somesuch.
EDIT: Running a test with N = 10^15 and a block with length 750 I get that memoise contains about 60000 elements which means non-lookup bit of solve(n) is called about the same number of time.
A word of caution: In the case c=2, len1=1, len2=2, the answer will be the N'th Fibonacci number, and the Fibonacci numbers grow (approximately) exponentially with a growth factor of the golden ratio, phi ~ 1.61803399. For the huge value N=10^15, the answer will be about phi^(10^15), an enormous number. The answer will have storage requirements on the order of (ln(phi^(10^15))/ln(2)) / (8 * 2^40) ~ 79 terabytes. Since you can't even access 79 terabytes in 20 seconds, it's unlikely you can meet the speed requirements in this special case.
Your best hope occurs when C is not too large, and leni is large for all i. In such cases, the answer will still grow exponentially with N, but the growth factor may be much smaller.
I recommend that you first construct the integer matrix M which will compute the (i+1, ..., i+k) terms in your sequence based on the (i, ..., i+k-1) terms (only row k+1 of this matrix is interesting). Compute the first k entries "by hand", then calculate M^(10^15) based on the repeated squaring trick, and apply it to terms (0 ... k-1).
The (integer) entries of the matrix will grow exponentially, perhaps too fast to handle. If this is the case, do the very same calculation, but modulo p, for several moderate-sized prime numbers p. This will allow you to obtain your answer modulo p, for various p, without using a matrix of bigints. After using enough primes so that you know their product is larger than your answer, you can use the so-called "Chinese remainder theorem" to recover your answer from your mod-p answers.
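To make the matrix-power idea concrete, here is a minimal sketch (not from either answer): ways(n) = sum over colours c of ways(n - len[c]), computed via the companion matrix of the recurrence and repeated squaring, mod 100000007. With d = max(leni) up to 750, the naive O(d^3) multiplication below is exactly the expensive part the question worries about; the sketch is only meant to show the construction.

MOD = 100000007

def mat_mult(A, B):
    # naive matrix multiplication mod MOD
    n, m, k = len(A), len(B[0]), len(B)
    C = [[0] * m for _ in range(n)]
    for i in range(n):
        for t in range(k):
            if A[i][t]:
                a = A[i][t]
                for j in range(m):
                    C[i][j] = (C[i][j] + a * B[t][j]) % MOD
    return C

def mat_pow(M, e):
    # repeated squaring
    n = len(M)
    R = [[int(i == j) for j in range(n)] for i in range(n)]
    while e:
        if e & 1:
            R = mat_mult(R, M)
        M = mat_mult(M, M)
        e >>= 1
    return R

def count_tilings(lengths, N):
    d = max(lengths)
    coeff = [0] * d                      # coeff[j] = number of colours of length j + 1
    for L in lengths:
        coeff[L - 1] += 1
    ways = [0] * d                       # ways(0 .. d-1) computed directly
    ways[0] = 1
    for n in range(1, d):
        ways[n] = sum(coeff[j] * ways[n - 1 - j] for j in range(n)) % MOD
    if N < d:
        return ways[N]
    M = [[0] * d for _ in range(d)]      # companion matrix of the recurrence
    M[0] = coeff[:]
    for i in range(1, d):
        M[i][i - 1] = 1
    P = mat_pow(M, N - d + 1)
    state = ways[::-1]                   # (ways(d-1), ..., ways(0))
    return sum(P[0][j] * state[j] for j in range(d)) % MOD

# count_tilings([1, 2], 4) == 5: the Fibonacci case mentioned above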
I'd like to build on the earlier @JPvdMerwe solution with some improvements. In his answer, @JPvdMerwe uses a Dynamic Programming / memoisation approach, which I agree is the way to go on this problem. Dividing the problem recursively into two smaller problems and remembering previously computed results is quite efficient.
I'd like to suggest several improvements that would speed things up even further:
Instead of going over all the ways the block in the middle can be positioned, you only need to go over the first half, and multiply the solution by 2. This is because the second half of the cases are symmetrical. For odd-length blocks you would still need to take the centered position as a separate case.
In general, iterative implementations can be several magnitudes faster than recursive ones. This is because a recursive implementation incurs bookkeeping overhead for each function call. It can be a challenge to convert a solution to its iterative cousin, but it is usually possible. The @JPvdMerwe solution can be made iterative by using a stack to store intermediate values.
Modulo operations are expensive, as are multiplications to a lesser extent. The number of multiplications and modulos can be decreased by approximately a factor C=100 by switching the color-loop with the position-loop. This allows you to add the return values of several calls to solve() before doing a multiplication and modulo.
A good way to test the performance of a solution is with a pathological case. The following could be especially daunting: length 10^15, C=100, prime block sizes.
Hope this helps.
In the above answer
ans += 1
ans %= 100000007
could be much faster without general modulo :
ans += 1
if ans == 100000007 then ans = 0
Please see TopCoder thread for a solution. No one was close enough to find the answer in this thread.

Generate all subset sums within a range faster than O((k+N) * 2^(N/2))?

Is there a way to generate all of the subset sums s1, s2, ..., sk that fall in a range [A,B] faster than O((k+N)*2^(N/2)), where k is the number of sums there are in [A,B]? Note that k is only known after we have enumerated all subset sums within [A,B].
I'm currently using a modified Horowitz-Sahni algorithm. For example, I first call it for the smallest sum greater than or equal to A, giving me s1. Then I call it again for the next smallest sum greater than s1, giving me s2. Repeat this until we find a sum sk+1 greater than B. There is a lot of computation repeated between each iteration, even without rebuilding the initial two 2^(N/2)-element lists, so is there a way to do better?
In my problem, N is about 15, and the magnitude of the numbers is on the order of millions, so I haven't considered the dynamic programming route.
Check the subset sum on Wikipedia. As far as I know, it's the fastest known algorithm, which operates in O(2^(N/2)) time.
Edit:
If you're looking for multiple possible sums, instead of just 0, you can save the end arrays and just iterate through them again (which is roughly an O(2^(n/2)) operation) and save re-computing them. The set of all possible subset sums doesn't change with the target.
Edit again:
I'm not wholly sure what you want. Are we running K searches for one independent value each, or looking for any subset that has a value in a specific range that is K wide? Or are you trying to approximate the second by using the first?
Edit in response:
Yes, you do get a lot of duplicate work even without rebuilding the list. But if you don't rebuild the list, that's not O(k * N * 2^(N/2)). Building the list is O(N * 2^(N/2)).
If you know A and B right now, you could begin iteration, and then simply not stop when you find the right answer (the bottom bound), but keep going until it goes out of range. That should be roughly the same as solving subset sum for just one solution, involving only +k more ops, and when you're done, you can ditch the list.
More edit:
You have a range of sums, from A to B. First, you solve subset sum problem for A. Then, you just keep iterating and storing the results, until you find the solution for B, at which point you stop. Now you have every sum between A and B in a single run, and it will only cost you one subset sum problem solve plus K operations for K values in the range A to B, which is linear and nice and fast.
s = *i + *j; if s > B then ++i; else if s < A then ++j; else { print s; ... what_goes_here? ... }
No, no, no. I get the source of your confusion now (I misread something), but it's still not as complex as what you had originally. If you want to find ALL combinations within the range, instead of one, you will just have to iterate over all combinations of both lists, which isn't too bad.
Excuse my use of auto. C++0x compiler.
std::vector<int> sums;
std::vector<int> firstlist;
std::vector<int> secondlist;
// Fill in first/secondlist.
std::sort(firstlist.begin(), firstlist.end());
std::sort(secondlist.begin(), secondlist.end());
// Since we want all in a range, rather than just the first, we need to check all
// combinations. Horowitz/Sahni is only designed to find one.
for (auto firstit = firstlist.begin(); firstit != firstlist.end(); firstit++) {
    for (auto secondit = secondlist.begin(); secondit != secondlist.end(); secondit++) {
        int sum = *firstit + *secondit;
        if (sum > A && sum < B)
            sums.push_back(sum);
    }
}
It's still not great. But it could be optimized if you know in advance that N is very large, for example, mapping or hashmapping sums to iterators, so that any given firstit can find any suitable partners in secondit, reducing the running time.
It is possible to do this in O(N*2^(N/2)), using ideas similar to Horowitz Sahni, but we try and do some optimizations to reduce the constants in the BigOh.
We do the following
Step 1: Split into sets of N/2, and generate all possible 2^(N/2) sets for each split. Call them S1 and S2. This we can do in O(2^(N/2)) (note: the N factor is missing here, due to an optimization we can do).
Step 2: Next sort the larger of S1 and S2 (say S1) in O(N*2^(N/2)) time (we optimize here by not sorting both).
Step 3: Find Subset sums in range [A,B] in S1 using binary search (as it is sorted).
Step 4: Next, for each sum in S2, find using binary search the sets in S1 whose union with this gives sum in range [A,B]. This is O(N*2^(N/2)). At the same time, find if that corresponding set in S2 is in the range [A,B]. The optimization here is to combine loops. Note: This gives you a representation of the sets (in terms of two indexes in S2), not the sets themselves. If you want all the sets, this becomes O(K + N*2^(N/2)), where K is the number of sets.
Further optimizations might be possible, for instance when sum from S2, is negative, we don't consider sums < A etc.
Since Steps 2,3,4 should be pretty clear, I will elaborate further on how to get Step 1 done in O(2^(N/2)) time.
For this, we use the concept of Gray Codes. Gray codes are a sequence of binary bit patterns in which each pattern differs from the previous pattern in exactly one bit.
Example: 00 -> 01 -> 11 -> 10 is a gray code with 2 bits.
There are gray codes which go through all possible N/2 bit numbers and these can be generated iteratively (see the wiki page I linked to), in O(1) time for each step (total O(2^(N/2)) steps), given the previous bit pattern, i.e. given current bit pattern, we can generate the next bit pattern in O(1) time.
This enables us to form all the subset sums, by using the previous sum and changing that by just adding or subtracting one number (corresponding to the differing bit position) to get the next sum.
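A small sketch of Step 1 (enumerating one half's 2^(N/2) subset sums in Gray-code order, so each new sum is a single add or subtract); the split, sort and binary-search steps are left out:

def subset_sums_gray(nums):
    # All 2^len(nums) subset sums, generated incrementally: consecutive
    # Gray codes differ in exactly one bit, so each new sum is obtained
    # from the previous one by adding or subtracting one element.
    sums = [0]
    current = 0
    prev_gray = 0
    for i in range(1, 1 << len(nums)):
        gray = i ^ (i >> 1)
        bit = gray ^ prev_gray              # the single bit that changed
        idx = bit.bit_length() - 1
        current += nums[idx] if gray & bit else -nums[idx]
        sums.append(current)
        prev_gray = gray
    return sums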
If you modify the Horowitz-Sahni algorithm in the right way, then it's hardly slower than the original Horowitz-Sahni. Recall that Horowitz-Sahni works with two lists of subset sums: sums of subsets in the left half of the original list, and sums of subsets in the right half. Call these two lists of sums L and R. To obtain subsets that sum to some fixed value A, you can sort R, and then look up a number in R that matches each number in L using a binary search. However, the algorithm is asymmetric only to save a constant factor in space and time. It's a good idea for this problem to sort both L and R.
In my code below I also reverse L. Then you can keep two pointers into R, updated for each entry in L: A pointer to the last entry in R that's too low, and a pointer to the first entry in R that's too high. When you advance to the next entry in L, each pointer might either move forward or stay put, but they won't have to move backwards. Thus, the second stage of the Horowitz-Sahni algorithm only takes linear time in the data generated in the first stage, plus linear time in the length of the output. Up to a constant factor, you can't do better than that (once you have committed to this meet-in-the-middle algorithm).
Here is a Python code with example input:
# Input
terms = [29371, 108810, 124019, 267363, 298330, 368607,
         438140, 453243, 515250, 575143, 695146, 840979, 868052, 999760]
(A,B) = (500000,600000)

# Subset iterator stolen from Sage
def subsets(X):
    yield []; pairs = []
    for x in X:
        pairs.append((2**len(pairs),x))
        for w in xrange(2**(len(pairs)-1), 2**(len(pairs))):
            yield [x for m, x in pairs if m & w]

# Modified Horowitz-Sahni with toolow and toohigh indices
L = sorted([(sum(S),S) for S in subsets(terms[:len(terms)/2])])
R = sorted([(sum(S),S) for S in subsets(terms[len(terms)/2:])])
(toolow,toohigh) = (-1,0)
for (Lsum,S) in reversed(L):
    while toolow < len(R)-1 and R[toolow+1][0] < A-Lsum: toolow += 1
    while toohigh < len(R) and R[toohigh][0] <= B-Lsum: toohigh += 1
    for n in xrange(toolow+1,toohigh):
        print '+'.join(map(str,S+R[n][1])),'=',sum(S+R[n][1])
"Moron" (I think he should change his user name) raises the reasonable issue of optimizing the algorithm a little further by skipping one of the sorts. Actually, because each list L and R is a list of sizes of subsets, you can do a combined generate and sort of each one in linear time! (That is, linear in the lengths of the lists.) L is the union of two lists of sums, those that include the first term, term[0], and those that don't. So actually you should just make one of these halves in sorted form, add a constant, and then do a merge of the two sorted lists. If you apply this idea recursively, you save a logarithmic factor in the time to make a sorted L, i.e., a factor of N in the original variable of the problem. This gives a good reason to sort both lists as you generate them. If you only sort one list, you have some binary searches that could reintroduce that factor of N; at best you have to optimize them somehow.
At first glance, a factor of O(N) could still be there for a different reason: If you want not just the subset sum, but the subset that makes the sum, then it looks like O(N) time and space to store each subset in L and in R. However, there is a data-sharing trick that also gets rid of that factor of O(N). The first step of the trick is to store each subset of the left or right half as a linked list of bits (1 if a term is included, 0 if it is not included). Then, when the list L is doubled in size as in the previous paragraph, the two linked lists for a subset and its partner can be shared, except at the head:
0
|
v
1 -> 1 -> 0 -> ...
Actually, this linked list trick is an artifact of the cost model and never truly helpful. Because, in order to have pointers in a RAM architecture with O(1) cost, you have to define data words with O(log(memory)) bits. But if you have data words of this size, you might as well store each word as a single bit vector rather than with this pointer structure. I.e., if you need less than a gigaword of memory, then you can store each subset in a 32-bit word. If you need more than a gigaword, then you have a 64-bit architecture or an emulation of it (or maybe 48 bits), and you can still store each subset in one word. If you patch the RAM cost model to take account of word size, then this factor of N was never really there anyway.
So, interestingly, the time complexity for the original Horowitz-Sahni algorithm isn't O(N*2^(N/2)), it's O(2^(N/2)). Likewise the time complexity for this problem is O(K+2^(N/2)), where K is the length of the output.

From an interview: Removing rows and columns in an n×n matrix to maximize the sum of remaining values

Given an n×n matrix of real numbers. You are allowed to erase any number (from 0 to n) of rows and any number (from 0 to n) of columns, and after that the sum of the remaining entries is computed. Come up with an algorithm which finds out which rows and columns to erase in order to maximize that sum.
The problem is NP-hard. (So you should not expect a polynomial-time algorithm for solving this problem. There could still be (non-polynomial time) algorithms that are slightly better than brute-force, though.) The idea behind the proof of NP-hardness is that if we could solve this problem, then we could solve the clique problem in a general graph. (The maximum-clique problem is to find the largest set of pairwise connected vertices in a graph.)
Specifically, given any graph with n vertices, let's form the matrix A with entries a[i][j] as follows:
a[i][j] = 1 for i == j (the diagonal entries)
a[i][j] = 0 if the edge (i,j) is present in the graph (and i≠j)
a[i][j] = -n-1 if the edge (i,j) is not present in the graph.
Now suppose we solve the problem of removing some rows and columns (or equivalently, keeping some rows and columns) so that the sum of the entries in the matrix is maximized. Then the answer gives the maximum clique in the graph:
Claim: In any optimal solution, there is no row i and column j kept for which the edge (i,j) is not present in the graph. Proof: Since a[i][j] = -n-1 and the sum of all the positive entries is at most n, picking (i,j) would lead to a negative sum. (Note that deleting all rows and columns would give a better sum, of 0.)
Claim: In (some) optimal solution, the set of rows and columns kept is the same. This is because starting with any optimal solution, we can simply remove all rows i for which column i has not been kept, and vice-versa. Note that since the only positive entries are the diagonal ones, we do not decrease the sum (and by the previous claim, we do not increase it either).
All of which means that if the graph has a maximum clique of size k, then our matrix problem has a solution with sum k, and vice-versa. Therefore, if we could solve our initial problem in polynomial time, then the clique problem would also be solved in polynomial time. This proves that the initial problem is NP-hard. (Actually, it is easy to see that the decision version of the initial problem — is there a way of removing some rows and columns so that the sum is at least k — is in NP, so the (decision version of the) initial problem is actually NP-complete.)
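For concreteness, a small helper (not part of the original answer) that builds the reduction matrix described above from an n-vertex graph given as a list of edges:

def clique_matrix(n, edges):
    # a[i][j] = 1 on the diagonal, 0 where the edge (i, j) exists,
    # and -n - 1 where it does not.
    a = [[-n - 1] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1
    for i, j in edges:
        a[i][j] = 0
        a[j][i] = 0
    return a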
Well the brute force method goes something like this:
For n rows there are 2^n subsets.
For n columns there are 2^n subsets.
For an n x n matrix there are 2^(2n) subsets.
0 elements is a valid subset, but obviously if you have 0 rows or 0 columns the total is 0, so there are really 2^(2n) - 2^(n+1) + 1 subsets, but that's no different.
So you can work out each combination by brute force as an O(a^n) algorithm. Fast. :)
It would be quicker to work out what the maximum possible value is and you do that by adding up all the positive numbers in the grid. If those numbers happen to form a valid sub-matrix (meaning you can create that set by removing rows and/or columns) then there's your answer.
Implicit in this is that if none of the numbers are negative then the complete matrix is, by definition, the answer.
Also, knowing what the highest possible maximum is possibly allows you to shortcut the brute force evaluation since if you get any combination equal to that maximum then that is your answer and you can stop checking.
Also if all the numbers are non-positive, the answer is the maximum value as you can reduce the matrix to a 1 x 1 matrix with that 1 value in it, by definition.
Here's an idea: construct the 2^n - 1 n x m matrices, where 1 <= m <= n, obtained by keeping a non-empty subset of the columns. Process them one after the other. For each n x m matrix you can calculate:
The highest possible maximum sum (as per above); and
Whether no numbers are positive allowing you to shortcut the answer.
if (1) is below the currently calculated highest maximum sum then you can discard this n x m matrix. If (2) is true then you just need a simple comparison to the current highest maximum sum.
This is generally referred to as a pruning technique.
What's more you can start by saying that the highest number in the n x n matrix is the starting highest maximum sum since obviously it can be a 1 x 1 matrix.
I'm sure you could tweak this into a (slightly more) efficient recursive tree-based search algorithm with the above tests effectively allowing you to eliminate (hopefully many) unnecessary searches.
We can improve on Cletus's generalized brute-force solution by modelling this as a directed graph. The initial matrix is the start node of the graph; its leaves are all the matrices missing one row or column, and so forth. It's a graph rather than a tree, because the node for the matrix without both the first column and row will have two parents - the nodes with just the first column or row missing.
We can optimize our solution by turning the graph into a tree: There's never any point exploring a submatrix with a column or row deleted that comes before the one we deleted to get to the current node, as that submatrix will be arrived at anyway.
This is still a brute-force search, of course - but we've eliminated the duplicate cases where we remove the same rows in different orders.
Here's an example implementation in Python:
def maximize_sum(m):
    frontier = [(m, 0, False)]
    best = None
    best_score = 0

    while frontier:
        current, startidx, cols_done = frontier.pop()
        score = matrix_sum(current)
        if score > best_score or not best:
            best = current
            best_score = score
        w, h = matrix_size(current)
        if not cols_done:
            for x in range(startidx, w):
                frontier.append((delete_column(current, x), x, False))
            startidx = 0
        for y in range(startidx, h):
            frontier.append((delete_row(current, y), y, True))

    return best_score, best
And here's the output on 280Z28's example matrix:
>>> m = ((1, 1, 3), (1, -89, 101), (1, 102, -99))
>>> maximize_sum(m)
(106, [(1, 3), (1, 101)])
Since nobody asked for an efficient algorithm, use brute force: generate every possible matrix that can be created by removing rows and/or columns from the original matrix, and choose the best one. A slightly more efficient version, which most likely can be proved to still be correct, is to generate only those variants where the removed rows and columns contain at least one negative value.
To try it in a simple way:
We need the valid subset of the set of entries {A00, A01, A02, ..., A0n, A10, ..., Ann} with the maximum sum.
First compute all subsets (the power set).
A valid subset is a member of the power set that for each two contained entries Aij and A(i+x)(j+y), contains also the elements A(i+x)j and Ai(j+y) (which are the remaining corners of the rectangle spanned by Aij and A(i+x)(j+y)).
Aij       . . .
 .
 .
. . .       A(i+x)(j+y)
By that you can eliminate the invalid ones from the power set and find the one with the biggest sum in the remaining.
I'm sure it can be improved by improving an algorithm for power set generation in order to generate only valid subsets and by that avoiding step 2 (adjusting the power set).
I think there are some angles of attack that might improve upon brute force.
memoization, since there are many distinct sequences of edits that will arrive at the same submatrix.
dynamic programming. Because the search space of matrices is highly redundant, my intuition is that there would be a DP formulation that can save a lot of repeated work
I think there's a heuristic approach, but I can't quite nail it down:
if there's one negative number, you can either take the matrix as it is, remove the column of the negative number, or remove its row; I don't think any other "moves" result in a higher sum. For two negative numbers, your options are: remove neither, remove one, remove the other, or remove both (where the act of removal is either by axing the row or the column).
Now suppose the matrix has only one positive number and the rest are all <=0. You clearly want to remove everything but the positive entry. For a matrix with only 2 positive entries and the rest <= 0, the options are: do nothing, whittle down to one, whittle down to the other, or whittle down to both (resulting in a 1x2, 2x1, or 2x2 matrix).
In general this last option falls apart (imagine a matrix with 50 positives & 50 negatives), but depending on your data (few negatives or few positives) it could provide a shortcut.
Create an n-by-1 vector RowSums, and an n-by-1 vector ColumnSums. Initialize them to the row and column sums of the original matrix. O(n²)
If any row or column has a negative sum, remove the one with the minimum such sum, and update the sums in the other direction to reflect their new values. O(n)
Stop when no row or column has a sum less than zero.
This is an iterative variation improving on another answer. It operates in O(n²) time, which is the complexity limit for this problem (there are n² entries in the matrix, and even to find the minimum you have to examine each cell once), but it fails for some cases mentioned in other answers.
Edit: The following matrix has no negative rows or columns, but is also not maximized, and my algorithm doesn't catch it.
 1    1     3        goal     1    3
 1  -89   101        ===>     1  101
 1  102   -99
The following matrix does have negative rows and columns, but my algorithm selects the wrong ones for removal.
 -5    1   -5        goal      1
  1    1    1        ===>      1
-10    2  -10                  2

                     mine
                     ===>   1   1   1
Compute the sum of each row and column. This can be done in O(m) (where m = n^2)
While there are rows or columns that sum to negative remove the row or column that has the lowest sum that is less than zero. Then recompute the sum of each row/column.
The general idea is that as long as there is a row or a column that sums to negative, removing it will result in a greater overall value. You need to remove them one at a time and recompute, because in removing that one row/column you are affecting the sums of the other rows/columns, and they may or may not have negative sums any more.
This will produce an optimally maximum result. Runtime is O(mn) or O(n^3)
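A runnable sketch of this row/column-pruning heuristic (note that, as the counterexamples earlier in the thread show, it is not always optimal):

def greedy_prune(matrix):
    # Repeatedly delete the row or column with the most negative sum,
    # recomputing sums after each deletion; stop when none is negative.
    keep_rows = list(range(len(matrix)))
    keep_cols = list(range(len(matrix[0]))) if matrix else []
    while True:
        candidates = [(sum(matrix[i][j] for j in keep_cols), 'row', i) for i in keep_rows]
        candidates += [(sum(matrix[i][j] for i in keep_rows), 'col', j) for j in keep_cols]
        if not candidates:
            break
        worst = min(candidates)
        if worst[0] >= 0:
            break
        if worst[1] == 'row':
            keep_rows.remove(worst[2])
        else:
            keep_cols.remove(worst[2])
    return keep_rows, keep_cols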
I cannot really produce an algorithm off the top of my head, but to me it 'smells' like dynamic programming; perhaps that serves as a starting point.
Big Edit: I honestly don't think there's a way to assess a matrix and determine it is maximized, unless it is completely positive.
Maybe it needs to branch, and fathom all elimination paths. You never know when a costly elimination will enable a number of better eliminations later. We can short-circuit if we find the theoretical maximum, but other than that, any algorithm would have to be able to step forward and back. I've adapted my original solution to achieve this behaviour with recursion.
Double Secret Edit: It would also make great strides toward reducing the complexity if each iteration didn't need to find all negative elements. Considering that they don't change much between calls, it makes more sense to just pass their positions to the next iteration.
Takes a matrix, the list of current negative elements in the matrix, and the theoretical maximum of the initial matrix. Returns the matrix's maximum sum and the list of moves required to get there. In my mind the move list contains a list of moves denoting the row/column removed from the result of the previous operation.
I.e.: r1, r1
would translate to:
-1  1  0
-4  1 -4        1  1  1
 1  2  4  ===>  5  7  1
 5  7  1
Return if the sum of the matrix is the theoretical maximum.
Find the positions of all negative elements, unless an empty set was passed in.
Compute the sum of the matrix and store it alongside an empty move list.
For each negative element:
Calculate the sum of that element's row and column.
Clone the matrix and eliminate whichever collection (row or column) has the minimum sum from that clone; note that action in a move list.
Clone the list of negative elements and remove any that are affected by the action taken in the previous step.
Recursively call this algorithm providing the cloned matrix, the updated negative element list and the theoretical maximum. Append the moves list returned to the move list for the action that produced the matrix passed to the recursive call.
If the returned value of the recursive call is greater than the stored sum, replace it and store the returned move list.
Return the stored sum and move list.
I'm not sure if it's better or worse than the brute force method, but it handles all the test cases now. Even those where the maximum contains negative values.
This is an optimization problem and can be solved approximately by an iterative algorithm based on simulated annealing:
Notation: C is number of columns.
For J iterations:
Look at each column and compute the absolute benefit of toggling it (turn it off if it's currently on or turn it on if it's currently off). That gives you C values, e.g. -3, 1, 4. A greedy deterministic solution would just pick the last action (toggle the last column to get a benefit of 4) because it locally improves the objective. But that might lock us into a local optimum. Instead, we probabilistically pick one of the three actions, with probabilities proportional to the benefits. To do this, transform them into a probability distribution by putting them through a Sigmoid function and normalizing. (Or use exp() instead of sigmoid()?) So for -3, 1, 4 you get 0.05, 0.73, 0.98 from the sigmoid and 0.03, 0.42, 0.56 after normalizing. Now pick the action according to the probability distribution, e.g. toggle the last column with probability 0.56, toggle the second column with probability 0.42, or toggle the first column with the tiny probability 0.03.
Do the same procedure for the rows, resulting in toggling one of the rows.
Iterate for J iterations until convergence.
We may also, in early iterations, make each of these probability distributions more uniform, so that we don't get locked into bad decisions early on. So we'd raise the unnormalized probabilities to a power 1/T, where T is high in early iterations and is slowly decreased until it approaches 0. For example, 0.05, 0.73, 0.98 from above, raised to 1/10 results in 0.74, 0.97, 1.0, which after normalization is 0.27, 0.36, 0.37 (so it's much more uniform than the original 0.05, 0.73, 0.98).
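A minimal sketch of the column step described above, using exp() with a temperature parameter (the row step is analogous; matrix, row_on and col_on are hypothetical inputs representing the current on/off state):

import math, random

def toggle_benefits(matrix, row_on, col_on):
    # Benefit of flipping each column given the rows currently switched on
    # (positive means the objective improves).
    benefits = []
    for j, on in enumerate(col_on):
        delta = sum(matrix[i][j] for i, r_on in enumerate(row_on) if r_on)
        benefits.append(-delta if on else delta)
    return benefits

def pick_column(benefits, T):
    # Softmax selection with temperature T: high T is near-uniform,
    # low T is near-greedy, as in the annealing schedule above.
    m = max(benefits)
    weights = [math.exp((b - m) / T) for b in benefits]
    r = random.random() * sum(weights)
    for j, w in enumerate(weights):
        r -= w
        if r <= 0:
            return j
    return len(weights) - 1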
It's clearly NP-Complete (as outlined above). Given this, if I had to propose the best algorithm I could for the problem:
Try some iterations of quadratic integer programming, formulating the problem as: SUM_ij a_ij x_i y_j, with the x_i and y_j variables constrained to be either 0 or 1. For some matrices I think this will find a solution quickly, for the hardest cases it would be no better than brute force (and not much would be).
In parallel (and using most of the CPU), use an approximate search algorithm to generate increasingly better solutions. Simulated Annealing was suggested in another answer, but having done research on similar combinatorial optimisation problems, my experience is that tabu search would find good solutions faster. This is probably close to optimal in terms of wandering between distinct "potentially better" solutions in the shortest time, if you use the trick of incrementally updating the costs of single changes (see my paper "Graph domination, tabu search and the football pool problem").
Use the best solution so far from the second above to steer the first by avoiding searching possibilities that have lower bounds worse than it.
Obviously this isn't guaranteed to find the maximal solution. But, it generally would when this is feasible, and it would provide a very good locally maximal solution otherwise. If someone had a practical situation requiring such optimisation, this is the solution that I'd think would work best.
Stopping at identifying that a problem is likely to be NP-Complete will not look good in a job interview! (Unless the job is in complexity theory, but even then I wouldn't.) You need to suggest good approaches - that is the point of a question like this. To see what you can come up with under pressure, because the real world often requires tackling such things.
Yes, it's an NP-complete problem.
It's hard to find the best sub-matrix, but we can easily find some better sub-matrices. Assume that we drop m random points in the matrix as "feeds", then let them automatically extend by a rule like:
only add a new row or column to the feed-matrix if doing so increases the sum.
Then we can compare the m sub-matrices to find the best one.
Let's say n = 10.
Brute force (all possible sets of rows x all possible sets of columns) takes
2^10 * 2^10 =~ 1,000,000 nodes.
My first approach was to consider this a tree search, and use
the sum of positive entries is an upper bound for every node in the subtree
as a pruning method. Combined with a greedy algorithm to cheaply generate good initial bounds, this yielded answers in about 80,000 nodes on average.
But there is a better way! I later realised that:
Fix some choice of rows X.
Working out the optimal columns for this set of rows is now trivial (keep a column if its sum of its entries in the rows X is positive, otherwise discard it).
So we can just brute force over all possible choices of rows; this takes 2^10 = 1024 nodes.
Adding the pruning method brought this down to 600 nodes on average.
Keeping 'column-sums' and incrementally updating them when traversing the tree of row-sets should allow the calculations (sum of matrix etc) at each node to be O(n) instead of O(n^2). Giving a total complexity of O(n * 2^n)
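A small sketch of the "brute force over row subsets, columns are then trivial" idea (without the pruning bound or the incremental column sums):

from itertools import combinations

def best_submatrix(matrix):
    # For a fixed set of rows the optimal columns are simply those whose
    # sum restricted to these rows is positive.
    n = len(matrix)
    best, best_rows, best_cols = 0, (), ()   # deleting everything gives 0
    for k in range(1, n + 1):
        for rows in combinations(range(n), k):
            total, cols = 0, []
            for j in range(len(matrix[0])):
                col_sum = sum(matrix[i][j] for i in rows)
                if col_sum > 0:
                    total += col_sum
                    cols.append(j)
            if total > best:
                best, best_rows, best_cols = total, rows, tuple(cols)
    return best, best_rows, best_cols

# best_submatrix([[1, 1, 3], [1, -89, 101], [1, 102, -99]]) returns
# (106, (0, 1), (0, 2)), matching the 106 found earlier in the thread.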
For a slightly-less-than-optimal solution, I think this is a PTIME, PSPACE complexity issue.
The GREEDY algorithm could run as follows:
Load the matrix into memory and compute the row totals. After that, run the main loop:
1) Delete the row with the smallest total,
2) Subtract the newly omitted values from the old totals,
--> Break when there are no more negative rows.
Point two is a subtle detail: subtracting two rows/columns has time complexity O(n), while re-summing all but two columns has O(n^2) time complexity!
Take each row and each column and compute the sum. For a 2x2 matrix this will be:
2 1
3 -10
Row(0) = 3
Row(1) = -7
Col(0) = 5
Col(1) = -9
Compose a new matrix
Cost to take row    Cost to take column
        3                     5
       -7                    -9
Take out whatever you need to, then start again.
You just look for negative values in the new matrix. Those are values that actually subtract from the overall matrix value. It terminates when there are no more negative "SUM" values to take out (therefore all remaining columns and rows SUM to something positive in the final result).
For an n x n matrix that would be O(n^2 log n), I think.
function pruneMatrix(matrix) {
    max = -inf;
    bestRowBitField = null;
    bestColBitField = null;

    for (rowBitField = 0; rowBitField < 2^matrix.height; rowBitField++) {
        for (colBitField = 0; colBitField < 2^matrix.width; colBitField++) {
            sum = calcSumForCombination(matrix, rowBitField, colBitField);
            if (sum > max) {
                max = sum;
                bestRowBitField = rowBitField;
                bestColBitField = colBitField;
            }
        }
    }

    return removeFieldsFromMatrix(bestRowBitField, bestColBitField);
}

function calcSumForCombination(matrix, rowBitField, colBitField) {
    sum = 0;
    for (i = 0; i < matrix.height; i++) {
        for (j = 0; j < matrix.width; j++) {
            if (rowBitField & 1<<i && colBitField & 1<<j) {
                sum += matrix[i][j];
            }
        }
    }
    return sum;
}
