Greedy algorithm for a 0/1 matrix

I'm studying for an algorithms exam and I can't find a way to solve the following problem:
INPUT: Positive integers r1, r2, ..., rn and c1, c2, ..., cn.
OUTPUT: An n by n matrix A with 0/1 entries such that for all i the sum of the i-th row in
A is ri and the sum of the i-th column in A is ci, if such a matrix exists.
Consider the following greedy algorithm that constructs A row by row. Assume that the first i-1 rows have been constructed. Let aj be the number of 1's in the j-th column in the first i-1 rows. The ri columns with maximum cj - aj are assigned 1's in row i, and the rest of the columns are assigned 0's. That is, the columns that still need the most 1's are given 1's. Formally prove that this algorithm is correct using an exchange argument.
So, as with most greedy problems I have encountered, what I tried to do is wrap it up in an induction:
assume that the rows up to the i-th row in the greedy solution and in the optimal solution are the same, then try to change the (i+1)-th row of the optimal solution so that it matches the greedy one, and prove that this does not make the given optimal solution any worse. Then keep doing this, row by row, until the two solutions are identical.
After trying that unsuccessfully, I tried the same idea per coordinate: assume the (i, j) coordinate is the first one that does not match, and again failed.
Then I tried a different approach: treat the greedy algorithm as a sequence of steps and swap a whole step at a time (which is basically the same idea as the first one), and still I did not find a way to guarantee that I had not created a worse solution.
Thanks in advance for any assistance.

Consider some inputs r and c and a matrix S that satisfies them.
Throw away the contents of the last row in S to get a new matrix S(n-1). If we fed S(n-1) to the greedy algorithm and asked it to fill in this last row, it'd obviously recover a satisfying solution.
Well, now throw away the contents of the last two rows in S to get S(n-2). Since we know a satisfying solution exists, there is no j such that c(j) - a(j) > 2, and the number of j such that c(j) - a(j) == 2 is less than or equal to r(n-1). It follows that the greedy algorithm will set A[n-1, j] = 1 for every such j, along with some other j for which c(j) - a(j) == 1. Because the greedy step covers every column with deficit 2, after row n-1 is filled every c(j) - a(j) is at most 1; and since the total remaining deficit equals r(n), exactly r(n) columns have c(j) - a(j) == 1, so the last row can be filled as well.
Anyway, we can extend this downwards: in S(n-k-1) it must be the case that:
there aren't any j such that c(j) - a(j) > k+1
there can be at most r(n-k) many j such that c(j) - a(j) = k+1,
and Sum(c(j) - a(j)) == Sum(r(i), i >= n-k)
So after the greedy algorithm has processed the (n-k)-th row, it must be the case that
there aren't any j such that c(j) - a(j) > k
there can be at most r(n-k+1) many j such that c(j) - a(j) = k,
and Sum(c(j) - a(j)) == Sum(r(i), i >= n-k+1)
Hence when k = 0, there is no j such that c(j) - a(j) > 0; combined with the sum condition, every c(j) - a(j) is exactly 0, so the greedy algorithm has produced a satisfying matrix.
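For concreteness, here is a minimal Python sketch of the greedy construction described in the question (the function and variable names are mine); it only illustrates the algorithm whose correctness is being proved, and assumes ties between equally needy columns may be broken arbitrarily.

    def greedy_fill(rows, cols):
        # Fill row i by putting 1's in the rows[i] columns whose remaining
        # deficit cols[j] - a[j] is currently largest (the greedy from the question).
        n = len(rows)
        deficit = list(cols)                      # cols[j] - a[j], updated row by row
        A = [[0] * n for _ in range(n)]
        for i in range(n):
            # the rows[i] columns that still need the most 1's
            needy = sorted(range(n), key=lambda j: deficit[j], reverse=True)[:rows[i]]
            for j in needy:
                A[i][j] = 1
                deficit[j] -= 1
        return A if all(d == 0 for d in deficit) else None   # None: no valid matrix was produced

    # r = (2, 1, 1), c = (1, 2, 1) admits the solution [[1,1,0],[0,1,0],[0,0,1]]
    print(greedy_fill([2, 1, 1], [1, 2, 1]))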

Related

Adding two arrays into one

I have two arrays (a and b) of size n, containing positive whole numbers:
a = [a1, ..., an], b = [b1, ..., bn]
I want to store them in array c, also an array of size n:
c = [c1, ..., cn]
where each element of c is one element from a plus one element from b (each used exactly once). Let's say the first element in c combines a1 + b3.
Quick example:
n=4 a=[a1,a2,a3,a4] b=[b1,b2,b3,b4]
one way could be:
c=[a1+b2,b3+a4,a2+b1,a3+b4]
The problem is that I want to pair them up so that the elements in c become as evenly distributed as possible.
One ideal case would be that c came out as:
c=[5,5,5,5]
but the numbers in a and b might not match up so nicely, so I want c to come as close to even as possible.
I am trying to make the difference between the biggest number in c and the smallest number in c (after combining them as evenly as I can) as small as possible. In my ideal example above that would be 5 - 5 = 0, which is optimal since 0 is the smallest difference achievable. Some other case with other numbers might come out as 6 - 5 = 1, which might be the smallest difference possible in that situation.
My approach would be to sort array a in ascending order and array b in descending order, and then combine elements at the same index. I'm not sure if this is the best or the fastest way to do this; I want my code (written in Python) to be fast. I can't come up with a way to distribute them more evenly. Any clue if there are better ways to solve this problem? I really appreciate all the advice I can get! Thank you.
Even when solving it with one array ascending and the other descending, there might already exist an algorithm that solves it better that I have not thought of. Thank you for reading!
Your algorithm is both correct and fast. It is just proving that it is optimal which is tricky.
We can do this by proving the following two results.
Any other matching of a and b will lead to a maximum at least as big as yours.
Any other matching of a and b will lead to a minimum at least as small as yours.
And the conclusion is that any other matching must have a maximum minus minimum at least as big as yours, from which it follows that yours is optimal.
Now let's look at part 1. Sort a ascending, and b descending. Find the i such that c[i] = a[i] + b[i] is a maximum. Suppose that m is any other matching where we're matching up a[j] + b[m[j]]. Note that m[1], ..., m[n] is a permutation of 1, ..., n.
If a[i] + b[m[i]] >= a[i] + b[i] then part 1 is true.
If a[i] + b[m[i]] < a[i] + b[i] then b[m[i]] < b[i] and so we must have i < m[i]. Now there are n-i indices in the range i+1, ..., n, and m maps something from outside that range (namely i) into that range. Because m is a permutation, by the pigeonhole principle, m must also map something from inside that range to outside it.
In other words there must be a j > i such that m[j] <= i. But now a[i] <= a[j] and b[i] <= b[m[j]] and therefore a[i] + b[i] <= a[j] + b[m[j]]. And so part 1 is true again.
That concludes the proof of part 1.
The proof of part 2 is similar. Except now a[i] + b[i] is at a minimum, m[i] < i, there is a j < i with i <= m[j], a[j] <= a[i], b[m[j]] <= b[i], and a[j] + b[m[j]] <= a[i] + b[i].
And as noted, part 1 and part 2 together imply that you've minimized the difference between the maximum and the minimum.
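For reference, here is a minimal Python sketch of the pairing the question proposes (the function name is mine); the two parts above are what justify that this pairing minimises max(c) - min(c).

    def pair_evenly(a, b):
        # Sort a ascending and b descending, then add elementwise.
        a_sorted = sorted(a)
        b_sorted = sorted(b, reverse=True)
        return [x + y for x, y in zip(a_sorted, b_sorted)]

    a = [1, 4, 2, 3]
    b = [4, 1, 3, 2]
    c = pair_evenly(a, b)
    print(c, max(c) - min(c))   # [5, 5, 5, 5] 0

Sorting dominates the running time, so this is O(n log n) overall.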

The largest number of subsets for a set

Given a set of X numbers less than or equal to Y, which may contain repeated numbers:
which algorithm gives you the maximum number of subsets whose elements sum to a value greater than or equal to Y, where no element may appear in more than one subset and a subset cannot use the same element twice?
(Note: if two numbers are repeated in the initial set, each counts as a distinct element.)
Subsets can group elements into pairs, trios, quartets or any other number.
Doing two for loops to search for the best combination worked for pairs, but since trios and larger groups are possible, cases like "1 1 1 1 1 7 8" would come out suboptimal.
You could implement a 'brute force' method and go through every possible partitioning and check if it satisfies your requirements. This would be quite simple, but horribly inefficient except for trivial cases.
Suppose you have N elements e_i in your set S, with 0 <= e_i <= Y. Choose numparts as the number of partitions you are going to try to create, each with element sum >= Y. Assuming sum e_i >= Y, we can set numparts = 1 initially; otherwise, obviously, the answer is zero.
Then you can generate partitions by creating an array of N elements p_i where 0 <= p_i < numparts. There are no more than numparts^N possible such partitions. Now you have to try to find one in which, for all 0 <= j < numparts, sum {e_i : p_i = j} >= Y. If you find one, increment numparts; if you don't, then you have your answer, which is the largest numparts value for which you did find a qualifying partition.
You could improve the efficiency of this approach significantly by avoiding lots of partitions that don't have a sum >= Y. There are 'only' 2^N distinct subsets of S so the number of subsets with sums >=Y must be less than or equal to 2^N. For each such subset S_k, you can try to find the maximum number of partitions of S - S_k each with sums >= Y which is just a recursion of the same problem. This would give you the absolute maximum result you're looking for, but would still be a computational nightmare for non-trivial problems.
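Here is a rough Python sketch of that recursive exact search (the names and the example target Y = 8 are mine); it is exponential, as noted, and is only meant to show the recursion.

    from itertools import combinations

    def max_parts(elems, Y):
        # Try every subset with sum >= Y as one part, then recurse on the rest.
        best = 0
        n = len(elems)
        for r in range(1, n + 1):
            for idx in combinations(range(n), r):
                if sum(elems[i] for i in idx) >= Y:
                    rest = [elems[i] for i in range(n) if i not in idx]
                    best = max(best, 1 + max_parts(rest, Y))
        return best

    print(max_parts([1, 1, 1, 1, 1, 7, 8], 8))   # 2, e.g. {8} and {7, 1}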
A quick-but-suboptimal algorithm is simply to sort the array in ascending order, then partition according to the partition sum as you process the sorted elements sequentially. e.g.
Suppose s[i] are the elements in the sorted array,...
partitionno = 0;    // number of completed parts, each with sum >= Y
partitionsum = 0;   // running sum of the current part
for (i=0; i<N; i++) {
    partitionsum += s[i];
    if (partitionsum >= Y) {   // the current part has reached the target: close it
        partitionsum = 0;
        partitionno++;
    }
}
giving partitionno subsets, each having a sum of at least Y. Sorting can be done in O(N log N) time (or O(N) with a counting sort, since the elements are bounded by Y), and the loop above is O(N), so you could use this for N in the millions or more.
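For reference, a runnable Python version of that sketch (the function name is mine), together with a small case showing that this greedy really can be suboptimal:

    def greedy_partition_count(elems, Y):
        # Sort ascending, close a part whenever the running sum reaches Y.
        partitionno = 0
        partitionsum = 0
        for x in sorted(elems):
            partitionsum += x
            if partitionsum >= Y:
                partitionsum = 0
                partitionno += 1
        return partitionno

    print(greedy_partition_count([1, 1, 1, 1, 1, 7, 8], 8))   # 2 (optimal here)
    print(greedy_partition_count([9, 9, 2, 2], 10))           # 1, but {9, 2} and {9, 2} give 2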
This is strongly NP-hard, since it contains as a special case the 3-partition problem of dividing a set into triples that each have the same sum, where all numbers are in the range from that sum/4 to that sum/2. And that special case is known to be strongly NP-hard.
Therefore there is no known efficient algorithm to solve it exactly, and finding one would be a really big deal.

Game of choosing maximum amount after removing K coins optimally

I am given the following task to solve:
Two players play a game. In this game there are coins and each coin has a value. The players take turns and each chooses 1 coin per turn. The goal is to have the highest total value at the end. Each player is forced to play optimally (that means always choosing the highest value remaining in the pile). I must find the sums of the 2 players / the difference between their highest possible sums.
Constraints: all values are positive natural numbers.
The task above is a classic greedy problem. From what I've tried, it can be solved by sorting with quicksort and then just picking the elements in order for the 2 players. If you need a better time, on my tests radix sort performs better. OK, so this task is pretty easy.
Now I have the same task as above, BUT the first player must remove K coins OPTIMALLY, such that the difference between their scores is maximal. This sounds like DP but I can't come up with the solution. I must again find the maximal difference between their points (with both players playing optimally), or the points of the 2 players such that the difference between them is maximal.
Is there such an algorithm already implemented? Or can someone give me some tips on this issue?
Here is a DP approach. We consider n coins, sorted in descending order to simplify the notation (meaning coins[0] is the highest-value coin, while coins[n-1] has the lowest value), and we want to remove k coins in order to win the game with a margin as big as possible.
We will consider a matrix M with rows indexed by i = 0, ..., n-k and columns indexed by j = 0, ..., k (so (n-k+1) by (k+1) entries).
M stores the following: M(i, j) is the best possible score after playing i turns, when j coins have been removed out of the i+j best coins. It may sound a bit counter-intuitive at first, but it actually is what we are looking for.
Indeed, we have already a value to initialize our matrix: M(0, 0) = 0.
We also can see that M(n-k, k) is actually the solution to the problem we want to solve.
We now need recurrence equations to fill up our matrix. We consider that we want to maximize the score difference for the first player. To maximize the score difference for the second player, the approach is the same, just modify some signs.
if i = 0 then:
M(i, j) = 0 // score difference is always 0 after playing 0 turns
else if j = 0 and i % 2 = 1: // odd turn: player 1 plays
M(i, j) = M(i-1, j) + coins[i+j-1]
else if j = 0 and i % 2 = 0: // even turn: player 2 plays
M(i, j) = M(i-1, j) - coins[i+j-1]
else if i % 2 = 1:
M(i, j) = max(M(i, j-1), M(i-1, j) + coins[i+j-1])
else: // i % 2 = 0
M(i, j) = max(M(i, j-1), M(i-1, j) - coins[i+j-1])
(The coins are 0-indexed, so when i turns have been played and j coins removed, the coins consumed so far are coins[0], ..., coins[i+j-1]; the coin handled on the transition into M(i, j) is therefore coins[i+j-1].)
This recurrence simply means that the best choice, at any point, is between removing that coin (the case where the best value is M(i, j-1)), or letting it be played (the case where the best value is M(i-1, j) +/- coins[i+j-1]).
That will give you the final score difference, but not the set of coins to remove. To find it, you must keep the 'optimal path' that your program used to calculate the matrix values (did the best value come from M(i-1, j) or from M(i, j-1)?).
This path gives you the set you are looking for. By the way, you can see this makes sense: there are C(n, k) ("n choose k") possible ways to remove k coins out of n coins, and there are likewise C(n, k) paths from the top left to the bottom right of the matrix if you are only allowed to go right or down.
If this explanation is still unclear, do not hesitate to ask for clarification in the comments; I'll edit the answer for more clarity.
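Here is a rough Python sketch of this DP (the function name is mine). It assumes the k removals are all decided by player 1 up front, that coins is 0-indexed as above (so the coin consumed on the transition into M(i, j) is coins[i + j - 1]), and that player 1 plays the odd-numbered turns. It returns only the final margin, without the traceback.

    def best_removal_margin(coins, k):
        # M[i][j]: best score difference (player 1 minus player 2) after i turns,
        # when j coins have been removed out of the i + j best coins.
        coins = sorted(coins, reverse=True)
        n = len(coins)
        M = [[float("-inf")] * (k + 1) for _ in range(n - k + 1)]
        M[0] = [0] * (k + 1)                       # 0 turns played: difference is 0
        for i in range(1, n - k + 1):
            for j in range(k + 1):
                sign = 1 if i % 2 == 1 else -1     # player 1 adds, player 2 subtracts
                best = M[i - 1][j] + sign * coins[i + j - 1]   # play coin i+j-1
                if j > 0:
                    best = max(best, M[i][j - 1])              # remove coin i+j-1 instead
                M[i][j] = best
        return M[n - k][k]

    print(best_removal_margin([5, 3, 1], 1))   # 4: remove the 3, then 5 and 1 are played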

Algorithm to find best combination or path through nodes

As I am not very proficient in various optimization/tree algorithms, I am seeking help.
Problem Description:
Assume a large sequence of sorted nodes is given, with each node representing an integer value L. L gets bigger with each node and no two nodes have the same L.
The goal now is to find the best combination of nodes, where the difference between the L-values of subsequent nodes is closest to a given integer value M(L) that changes over L.
Example:
So, in the beginning I would have L = 50 and M = 100. The next nodes have L = 70,140,159,240,310.
First, the value of 159 seems to be closest to L+M = 150, so it is chosen as the right value.
However, in the next step, M=100 is still given and we notice that L+M = 259, which is far away from 240.
If we now go back and choose the node with L=140 instead, which is then followed by 240, the overall match between the M values and the L-differences is better. The algorithm should be able to find its way back to the optimal path, even if a mistake was made along the way.
Some additional information:
1) The start node is not necessarily part of the best combination/path, but if required, one could first develop an algorithm which chooses the best starting candidate.
2) The optimal combination of nodes follows the sorted sequence and does not "jump back" -> so 1,3,5,7 is possible but not 1,3,5,2,7.
3) In the end, the differences between the L-values of the chosen nodes should be closest to the M values in the mean-squared sense.
Any help is much appreciated!
If I understand your question correctly, you could use Dijkstra's algorithm:
https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
http://www.mathworks.com/matlabcentral/fileexchange/20025-dijkstra-s-minimum-cost-path-algorithm
For that you have to know the neighbours of every node and create an adjacency matrix. With the implementation of Dijkstra's algorithm which I posted above you can specify edge weights. You could define your edge weight so that it is the L of the node accessed + M. So for every node combination you have the L of the new node + M. In that way the algorithm should find the optimum path between your nodes.
To get all edge combinations you can use Matlabs graph functions:
http://se.mathworks.com/help/matlab/ref/graph.html
If I understand your problem correctly you need an undirected graph.
You can access all edges with the command
G.Edges after you have created the graph.
I know it's not the perfect answer but I hope it helps!
P.S. Just watch out: Dijkstra's algorithm can only handle non-negative edge weights.
Suppose we are given a number M and a list of n numbers, L[1], ..., L[n], and we want to find a subsequence of at least q of the latter numbers that minimises the sum of squared errors (SSE) with respect to M, where the SSE of a list of k positions x[1], ..., x[k] with respect to M is given by
SSE(M, x[1], ..., x[k]) = sum((L[x[i]]-L[x[i-1]]-M)^2) over all 2 <= i <= k,
with the SSE of a list of 0 or 1 positions defined to be 0.
(I'm introducing the parameter q and associated constraint on the subsequence length here because without it, there always exists a subsequence of length exactly 2 that achieves the minimum possible SSE -- and I'm guessing that such a short sequence isn't helpful to you.)
This problem can be solved in O(qn^2) time and O(qn) space using dynamic programming.
Define f(i, j) to be the minimum sum of squared errors achievable under the following constraints:
The number at position i is selected, and is the rightmost selected position. (Here, i = 0 implies that no positions are selected.)
We require that at least j (instead of q) of these first i numbers are selected.
Also define g(i, j) to be the minimum of f(k, j) over all 0 <= k <= i. Thus g(n, q) will be the minimum sum of squared errors achievable on the entire original problem. For efficient (O(1)) calculation of g(i, j), note that
g(i>0, j>0) = min(g(i-1, j), f(i, j))
g(0, 0) = 0
g(0, j>0) = infinity
To calculate f(i, j), note that if i > 0 then any solution must be formed by appending the i-th position to some solution Y that selects at least j-1 positions and whose rightmost selected position is to the left of i -- i.e. whose rightmost selected position is k, for some k < i. The total SSE of this solution to the (i, j) subproblem will be whatever the SSE of Y was, plus a fixed term of (L[i]-L[k]-M)^2 -- so to minimise this total SSE, it suffices to minimise the SSE of Y. But we can compute that minimum: it is f(k, j-1), since the rightmost selected position of Y is exactly k.
Since this holds for any 0 <= k < i, it suffices to try all such values of k, and take the one that gives the lowest total SSE:
f(i>=j, j>=2) = min of (f(k, j-1) + (L[i]-L[k]-M)^2) over all 0 <= k < i
f(i>=j, j<2) = 0 # If we only need 0 or 1 position, SSE is 0
f(i, j>i) = infinity # Can't choose > i positions if the rightmost chosen position is i
With the above recurrences and base cases, we can compute g(n, q), the minimum possible sum of squared errors for the entire problem. By memoising values of f(i, j) and g(i, j), the time to compute all needed values of f(i, j) is O(qn^2), since there are at most (n+1)*(q+1) possible distinct combinations of input parameters (i, j), and computing a particular value of f(i, j) requires at most (n+1) iterations of the loop that chooses values of k, each iteration of which takes O(1) time outside of recursive subcalls. Storing solution values of f(i, j) requires at most (n+1)*(q+1), or O(qn), space, and likewise for g(i, j). As established above, g(i, j) can be computed in O(1) time when all needed values of f(x, y) have been computed, so g(n, q) can be computed in the same time complexity.
To actually reconstruct a solution corresponding to this minimum SSE, you can trace back through the computed values of f(i, j) in reverse order, each time looking for a value of k that achieves a minimum value in the recurrence (there may in general be many such values of k), setting i to this value of k, and continuing on until i=0. This is a standard dynamic programming technique.
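Below is a rough bottom-up Python sketch of this DP (the names are mine), using f(k, j-1) in the recurrence as written above, with 1-based positions and i = 0 meaning that nothing is selected; it returns only the minimum SSE g(n, q), without the traceback.

    import math

    def min_sse_subsequence(L, M, q):
        # f[i][j]: min SSE with position i the rightmost selected position and
        # at least j of the first i positions selected; g[i][j] = min f[k][j], k <= i.
        n = len(L)
        INF = math.inf
        f = [[INF] * (q + 1) for _ in range(n + 1)]
        g = [[INF] * (q + 1) for _ in range(n + 1)]
        g[0][0] = 0
        for i in range(1, n + 1):
            for j in range(q + 1):
                if j <= 1:
                    f[i][j] = 0                    # 0 or 1 selected positions: SSE is 0
                elif j <= i:
                    f[i][j] = min(f[k][j - 1] + (L[i - 1] - L[k - 1] - M) ** 2
                                  for k in range(1, i))
                # j > i is impossible and stays infinite
                g[i][j] = min(g[i - 1][j], f[i][j])
        return g[n][q]

    # (140 - 50 - 100)^2 + (240 - 140 - 100)^2 = 100
    print(min_sse_subsequence([50, 140, 240], 100, 3))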
I now answer my own post with my current implementation, in order to structure my post and include images. Unfortunately, the code does not do what it should. Imagine L, M and q given as in the images below. With the calcf and calcg functions I calculated the F and G matrices, where F(i+1, j+1) stores the calculated f(i, j) and G(i+1, j+1) stores g(i, j). The SSE of the optimal combination should be G(N+1, q+1), but the result is wrong. If anyone can spot the mistake, that would be much appreciated.
[Image: the G and F matrices of the given problem in the workspace; G and F are created by calculating g(N, q) via calcg(L, N, q, M).]
[Image: the calcf and calcg functions.]

Given a set of n integers, list all possible subsets with sum>=k

Given an unsorted set of integers in the form of an array, find all possible subsets whose sum is greater than or equal to a constant integer k.
e.g. our set is {1,2,3} and k=2
Possible subsets:-
{2},
{3},
{1,2},
{1,3},
{2,3},
{1,2,3}
I can only think of a naive algorithm which lists all the subsets of the set and checks whether the sum of each subset is >= k or not, but it's an exponential algorithm and listing all subsets requires O(2^N) time. Can I use dynamic programming to solve it in polynomial time?
Listing all the subsets is still going to be O(2^N), because in the worst case you may have to list all subsets apart from the empty one.
Dynamic programming can help you count the number of sets that have sum >= K.
You go bottom-up, keeping track of how many subsets sum to each value in the range [1..K-1]. An approach like this will be O(N*K), which is only going to be feasible for small K.
The idea behind the dynamic programming solution is best illustrated with an example. Assume you know that, out of all the sets composed of the first i elements, t1 sum to 2 and t2 sum to 3. Let's say that the next element, element i+1, has value 4. Given all the existing sets, we can build all the new sets by either appending element i+1 or leaving it out. If we leave it out we get t1 subsets that sum to 2 and t2 subsets that sum to 3. If we append it then we obtain t1 subsets that sum to 6 (2 + 4) and t2 that sum to 7 (3 + 4), plus one subset which contains just element i+1 and sums to 4. That gives us the numbers of subsets that sum to (2, 3, 4, 6, 7) consisting of the first i+1 elements. We continue until N.
In pseudo-code this could look something like this:
int DP[N][K];   //DP[i][s] = number of subsets of the first i+1 elements with sum s (only sums 1..K-1 are tracked); initialise all entries to 0
int set[N];
//go through all elements in the set by index
for i in range[0..N-1]
    //count the one-element subset consisting only of set[i], if its sum is below K
    if (set[i] < K) DP[i][set[i]] = 1
    if (i == 0) continue;
    //case 1. build and count all subsets that don't contain element set[i]
    for k in range[1..K-1]
        DP[i][k] += DP[i-1][k]
    //case 2. build and count subsets that contain element set[i]
    for k in range[0..K-1]
        if k + set[i] >= K then break inner loop
        DP[i][k+set[i]] += DP[i-1][k]
//result is the number of all subsets - number of subsets with sum < K
//the -1 is for the empty subset
return 2^N - sum(DP[N-1][1..K-1]) - 1
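For reference, here is a runnable Python version of the same counting idea (the names are mine), collapsed to a one-dimensional table over sums below K; it assumes every element is positive.

    def count_subsets_with_sum_at_least(nums, K):
        # dp[s] = number of non-empty subsets seen so far with sum exactly s (s < K)
        N = len(nums)
        dp = [0] * K
        for x in nums:
            new_dp = dp[:]
            if x < K:
                new_dp[x] += 1                    # the singleton {x}
            for s in range(1, K):
                if s + x < K and dp[s]:
                    new_dp[s + x] += dp[s]        # extend existing subsets by x
            dp = new_dp
        below = sum(dp[1:])                       # non-empty subsets with sum < K
        return 2 ** N - 1 - below

    print(count_subsets_with_sum_at_least([1, 2, 3], 2))   # 6, matching the example above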
Can I use dynamic programming to solve it in polynomial time?
No. The problem is even harder than #amit (in the comments) mentions. Finding whether there exists a subset that sums to a specific k is the subset-sum problem, which is NP-hard. Counting how many subsets hit a specific target is in the much more difficult class #P. In addition, your exact problem is slightly harder still, since you want to not only count but also enumerate all the qualifying subsets.
If k is 0 and every element of the set is positive, then you have no choice but to output every possible (non-empty) subset, so the lower bound for this problem is O(2^N) -- the time taken to produce the output.
Unless you know something more about the value k that you haven't told us, there's no faster general solution than to just check every subset.
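For completeness, here is the naive enumeration the question describes (the function name is mine), which, as argued above, cannot be beaten in general if all qualifying subsets must actually be output:

    from itertools import combinations

    def subsets_with_sum_at_least(nums, k):
        # Yield every non-empty subset (as a tuple) whose sum is >= k.
        for r in range(1, len(nums) + 1):
            for combo in combinations(nums, r):
                if sum(combo) >= k:
                    yield combo

    print(list(subsets_with_sum_at_least([1, 2, 3], 2)))
    # [(2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]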
