Hungarian (Kuhn Munkres) algorithm oddity - algorithm

I've read every answer here, Wikipedia and WikiHow, the indian guy's lecture, and other sources, and I'm pretty sure I understand what they're saying and have implemented it that way. But I'm confused about a statement that all of these explanations make that is clearly false.
They all say to cover the zeros in the matrix with a minimum number of lines, and if that is equal to N (that is, there's a zero in every row and every column), then there's a zero solution and we're done. But then I found this:
a b c d e
A 0 7 0 0 0
B 0 8 0 0 6
C 5 0 7 3 4
D 5 0 5 9 3
E 0 4 0 0 9
There's a zero in every row and column, and no way to cover the zeros with fewer than five lines, but there's clearly no zero solution. Row C has only the zero in column b, but that leaves no zero for row D.
Do I misunderstand something here? Do I need a better test for whether or not there's a zero assignment possible? Are all these sources leaving out something essential?

You can cover the zeros in the matrix in your example with only four lines: column b, row A, row B, row E.
Here is a step-by-step walkthrough of the algorithm as it is presented in the Wikipedia article as of June 25 applied to your example:
a b c d e
A 0 7 0 0 0
B 0 8 0 0 6
C 5 0 7 3 4
D 5 0 5 9 3
E 0 4 0 0 9
Step 1: The minimum in each row is zero, so the subtraction has no effect. We try to assign tasks such that every task is performed at zero cost, but this turns out to be impossible. Proceed to next step.
Step 2: The minimum in each column is also zero, so this step also has no effect. Proceed to next step.
Step 3: We locate a minimal number of lines to cover up all the zeros. We find [b,A,B,E].
a b c d e
A ---|---------
B ---|---------
C 5 | 7 3 4
D 5 | 5 9 3
E ---|---------
Step 4: We locate the minimal uncovered element. This is 3, at (C,d) and (D,e). We subtract 3 from every unmarked element and add 3 to every element covered by two lines:
a b c d e
A 0 10 0 0 0
B 0 11 0 0 6
C 2 0 4 0 1
D 2 0 2 6 0
E 0 7 0 0 9
Immediately the minimum number of lines to cover up all the zeros becomes 5. This is easy to verify as there is a zero in every row and a zero in every column. The algorithm asserts that an assignment like the one we were looking for in step 1 should now be possible on the new matrix.
We try to assign tasks such that every task is performed at zero cost (according to the new matrix). This is now possible. We find the solution [(A,e),(B,c),(C,d),(D,b),(E,a)].
We can now go back and verify that the solution that we found actually is optimal. We see that every assigned job has zero cost, except (C,d), which has cost 3. Since 3 is actually the lowest nonzero element in the matrix, and we have seen that there is no zero-cost solution, it is clear that this is an optimal solution.

Related

Segment tree built on "light bulbs"

I have encountered following problem:
There are n numbers (0 or 1) and there are 2 operations. You can swich all numbers to 0 or 1 on a specific range(note that switching 001 to 0 is 000, not 110) and you can also ask about how many elements are turned on on a specific range.
Example:
->Our array is 0100101
We set elements from 1 to 3 to 1:
->Our array is 1110101 now
We set elements from 2 to 5 to 0:
->Our array is 1000001 now
We are asking about sum from 2nd to 7th element
-> The answer is 1
Brute force soltion is too slow(O(n*q), where q is number of quetions), so I assume that there has to be a faster one. Probably using segment tree, but I can not find it...
You could build subsampling binary tree in the fashion of mipmaps used in computer graphics.
Each node of the tree contains the sum of its children's values.
E.g.:
0100010011110100
1 0 1 0 2 2 1 0
1 1 4 1
2 5
7
This will bring down complexity for a single query to O(log₂n).
For an editing operation, you also get O(log₂n) by implementing a shortcut: Instead of applying changes recursively, you stop at a node that is fully covered by the input range; thus creating a sparse representation of the sequence. Each node representing M light bulbs either
has value 0 and no children, or
has value M and no children, or
has a value in the range 1..M-1 and 2 children.
The tree above would actually be stored like this:
7
2 5
1 1 4 1
1 0 1 0 1 0
01 01 01
You end up with O(q*log₂n).

Palindrome partitioning with interval scheduling

So I was looking at the various algorithms of solving Palindrome partitioning problem.
Like for a string "banana" minimum no of cuts so that each sub-string is a palindrome is 1 i.e. "b|anana"
Now I tried solving this problem using interval scheduling like:
Input: banana
Transformed string: # b # a # n # a # n # a #
P[] = lengths of palindromes considering each character as center of palindrome.
I[] = intervals
String: # b # a # n # a # n # a #
P[i]: 0 1 0 1 0 3 0 5 0 3 0 1 0
I[i]: 0 1 2 3 4 5 6 7 8 9 10 11 12
Example: Palindrome considering 'a' (index 7) as center is 5 "anana"
Now constructing intervals for each character based on P[i]:
b = (0,2)
a = (2,4)
n = (2,8)
a = (2,12)
n = (6,12)
a = (10,12)
So, now if I have to schedule these many intervals on time 0 to 12 such that minimum no of intervals should be scheduled and no time slot remain empty, I would choose (0,2) and (2,12) intervals and hence the answer for the solution would be 1 as I have broken down the given string in two palindromes.
Another test case:
String: # E # A # B # A # E # A # B #
P[i]: 0 1 0 1 0 5 0 1 0 5 0 1 0 1 0
I[i]: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Plotting on graph:
Now, the minimum no of intervals that can be scheduled are either:
1(0,2), 2(2,4), 5(4,14) OR
3(0,10), 6(10,12), 7(12,14)
Hence, we have 3 partitions so the no of cuts would be 2 either
E|A|BAEAB
EABAE|A|B
These are just examples. I would like to know if this algorithm will work for all cases or there are some cases where it would definitely fail.
Please help me achieve a proof that it will work in every scenario.
Note: Please don't discourage me if this post makes no sense as i have put enough time and effort on this problem, just state a reason or provide some link from where I can move forward with this solution. Thank you.
As long as you can get a partition of the string, your algorith will work.
Recall to mind that a partion P of a set S is a set of non empty subset A1, ..., An:
The union of every set A1, ... An gives the set S
The intersection between any Ai, Aj (with i != j) is empty
Even if the palindrome partitioning deals with strings (which are a bit different from sets), the properties of a partition are still true.
Hence, if you have a partition, you consequently have a set of time intervals without "holes" to schedule.
Choosing the partition with the minimum number of subsets, makes you have the minimum number of time intervals and therefore the minimum number of cuts.
Furthermore, you always have at least one palindrome partition of a string: in the worst case, you get a palindrome partition made of single characters.

Choose max number of non-overlapping 2x2 squares in binary bitmap

Given a rectangle consisting of 1's and 0's, how can I find the maximum number of non-overlapping 2x2 squares of 1's?
Example:
0110
1111
1111
The solution would be 2.
I know it can be solved with Bitmask DP; but I can't really grasp it - after playing with it for hours. How does it work and how can it be formalised?
I wanted to point out that the graph we get by putting vertices at the centers of squares and joining them when they overlap is not claw-free:
If we take (in the full plane) a 2x2 square and three of the four diagonally overlapping 2x2 squares, they form the induced subgraph
• •
\ /
•
/
•
This is a claw, meaning that any region containing those squares corresponds to a non-claw-free graph.
If you build a graph where every node represents a 2x2 square of 1's and there is an edge between two nodes if they overlap, then the problem is now: find the maximum independent set in this graph.
Here is a dynamic programming solution.
The state is (row number, mask of occupied cells, shift position). It looks like this:
..#..
.##..
.#...
.#...
In this case, the row number is 2(I use zero-bases indices), the mask depends on whether we take a cell with # or not, the shift position is 1. The number of states is O(n * m * 2 ^ n). The value of a state is the maximum number of picked 2x2 squares. The base case is f(1, 0, 0) = 0(it corresponds to the first row and no 2x2 squares picked so far).
The transitions are as follows:
..#.. ..#..
.00.. -> ..1..
.0... .11..
.#... .#...
This one can be used if and only if the square consists of ones in the original matrix and there were zeros in the mask(it means that we pick this square).
The other one is:
..#.. ..#..
.##.. -> ..#..
.#... .#0..
.#... .#...
This one is always applicable. It means that we do pick this 2x2 square.
When we are done with one row, we can proceed to the next one:
..#.. ..#0.
..#.. -> ..#..
..#.. ..#..
.##.. ..#..
There are at most two transitions from each state, so the total time complexity is O(n * m * 2 ^ n). The answer the maximum among all masks and shifts for the last row.
edit 20:18, there is a counterexample posted by #ILoveCoding
My intuition says, that this will work. I am unable to prove it since I'm not advanced enough. I can't think of any counterexample though.
I will try describe the solution and post the code, please correct me if my solution is wrong.
First we load input to an array.
Then we create second array of the same size and for every possible placing of 2x2 square we mark 1 in the second array. For the example posted by OP this would look like below.
0 1 0 0
1 1 1 0
0 0 0 0
Then for every 1 in second array we calculate the number of neighbours (including diagonals) + 1 (because if it doesn't have any neighbours we have to still see it as 1). We set those new numbers to the array. Now it should look like this.
0 4 0 0
3 4 3 0
0 0 0 0
Then for every non-zero value in the array we check whether it has any neighbour with bigger or equal value. If it has then move on since it can't be in the solution.
If we don't find any such value then it will be in the solution. We have to reset to zero every neighbour of the original value (because it can't be in the solution, it would overlap the solution.
So the first such number found should be 2 on the left. After processing it the array should look like following.
0 0 0 0
3 0 3 0
0 0 0 0
When we check the other 3 situation is the same.
We end up with non-zero values in the array indicating where top left corners of 2x2 squares are.
If you need the numer of them just traverse the array and count non-zero elements.
Another example
1111 -> 1 1 1 0 -> 4 6 4 0 -> 4 0 4 0
1111 -> 1 1 1 0 -> 6 9 6 0 -> 0 0 0 0
1111 -> 1 1 1 0 -> 6 9 6 0 -> 6 0 6 0
1111 -> 1 1 1 0 -> 6 9 6 0 -> 0 0 0 0
1111 -> 1 1 1 0 -> 4 6 4 0 -> 4 0 4 0
1111 -> 0 0 0 0 -> 0 0 0 0 -> 0 0 0 0
The code I write can be found here
http://ideone.com/Wq1xRo

Converting a number into a special base system

I want to convert a number in base 10 into a special base form like this:
A*2^2 + B*3^1 + C*2^0
A can take on values of [0,1]
B can take on values of [0,1,2]
C can take on values of [0,1]
For example, the number 8 would be
1*2^2 + 1*3 + 1.
It is guaranteed that the given number can be converted to this specialized base system.
I know how to convert from this base system back to base-10, but I do not know how to convert from base-10 to this specialized base system.
In short words, treat every base number (2^2, 3^1, 2^0 in your example) as weight of an item, and the whole number as the capacity of a bag. This problem wants us to find a combination of these items which they fill the bag exactly.
In the first place this problem is NP-complete. It is identical to the subset sum problem, which can also be seen as a derivative problem of the knapsack problem.
Despite this fact, this problem can however be solved by a pseudo-polynomial time algorithm using dynamic programming in O(nW) time, which n is the number of bases, and W is the number to decompose. The details can be find in this wikipedia page: http://en.wikipedia.org/wiki/Knapsack_problem#Dynamic_programming and this SO page: What's it called when I want to choose items to fill container as full as possible - and what algorithm should I use?.
Simplifying your "special base":
X = A * 4 + B * 3 + C
A E {0,1}
B E {0,1,2}
C E {0,1}
Obviously the largest number that can be represented is 4 + 2 * 3 + 1 = 11
To figure out how to get the values of A, B, C you can do one of two things:
There are only 12 possible inputs: create a lookup table. Ugly, but quick.
Use some algorithm. A bit trickier.
Let's look at (1) first:
A B C X
0 0 0 0
0 0 1 1
0 1 0 3
0 1 1 4
0 2 0 6
0 2 1 7
1 0 0 4
1 0 1 5
1 1 0 7
1 1 1 8
1 2 0 10
1 2 1 11
Notice that 2 and 9 cannot be expressed in this system, while 4 and 7 occur twice. The fact that you have multiple possible solutions for a given input is a hint that there isn't a really robust algorithm (other than a look up table) to achieve what you want. So your table might look like this:
int A[] = {0,0,-1,0,0,1,0,1,1,-1,1,1};
int B[] = {0,0,-1,1,1,0,2,1,1,-1,2,2};
int C[] = {0,1,-1,0,2,1,0,1,1,-1,0,1};
Then look up A, B, C. If A < 0, there is no solution.

Matrix, algorithm interview question

This was one of my interview questions.
We have a matrix containing integers (no range provided). The matrix is randomly populated with integers. We need to devise an algorithm which finds those rows which match exactly with a column(s). We need to return the row number and the column number for the match. The order of of the matching elements is the same. For example, If, i'th row matches with j'th column, and i'th row contains the elements - [1,4,5,6,3]. Then jth column would also contain the elements - [1,4,5,6,3]. Size is n x n.
My solution:
RCEQUAL(A,i1..12,j1..j2)// A is n*n matrix
if(i2-i1==2 && j2-j1==2 && b[n*i1+1..n*i2] has [j1..j2])
use brute force to check if the rows and columns are same.
if (any rows and columns are same)
store the row and column numbers in b[1..n^2].//b[1],b[n+2],b[2n+3].. store row no,
// b[2..n+1] stores columns that
//match with row 1, b[n+3..2n+2]
//those that match with row 2,etc..
else
RCEQUAL(A,1..n/2,1..n/2);
RCEQUAL(A,n/2..n,1..n/2);
RCEQUAL(A,1..n/2,n/2..n);
RCEQUAL(A,n/2..n,n/2..n);
Takes O(n^2). Is this correct? If correct, is there a faster algorithm?
you could build a trie from the data in the rows. then you can compare the columns with the trie.
this would allow to exit as soon as the beginning of a column do not match any row. also this would let you check a column against all rows in one pass.
of course the trie is most interesting when n is big (setting up a trie for a small n is not worth it) and when there are many rows and columns which are quite the same. but even in the worst case where all integers in the matrix are different, the structure allows for a clear algorithm...
You could speed up the average case by calculating the sum of each row/column and narrowing your brute-force comparison (which you have to do eventually) only on rows that match the sums of columns.
This doesn't increase the worst case (all having the same sum) but if your input is truly random that "won't happen" :-)
This might only work on non-singular matrices (not sure), but...
Let A be a square (and possibly non-singular) NxN matrix. Let A' be the transpose of A. If we create matrix B such that it is a horizontal concatenation of A and A' (in other words [A A']) and put it into RREF form, we will get a diagonal on all ones in the left half and some square matrix in the right half.
Example:
A = 1 2
3 4
A'= 1 3
2 4
B = 1 2 1 3
3 4 2 4
rref(B) = 1 0 0 -2
0 1 0.5 2.5
On the other hand, if a column of A were equal to a row of A then column of A would be equal to a column of A'. Then we would get another single 1 in of of the columns of the right half of rref(B).
Example
A=
1 2 3 4 5
2 6 -3 4 6
3 8 -7 6 9
4 1 7 -5 3
5 2 4 -1 -1
A'=
1 2 3 4 5
2 6 8 1 2
3 -3 -7 7 4
4 4 6 -5 -1
5 6 9 3 -1
B =
1 2 3 4 5 1 2 3 4 5
2 6 -3 4 6 2 6 8 1 2
3 8 -7 6 9 3 -3 -7 7 4
4 1 7 -5 3 4 4 6 -5 -1
5 2 4 -1 -1 5 6 9 3 -1
rref(B)=
1 0 0 0 0 1.000 -3.689 -5.921 3.080 0.495
0 1 0 0 0 0 6.054 9.394 -3.097 -1.024
0 0 1 0 0 0 2.378 3.842 -0.961 0.009
0 0 0 1 0 0 -0.565 -0.842 1.823 0.802
0 0 0 0 1 0 -2.258 -3.605 0.540 0.662
1.000 in the top row of the right half means that the first column of A matches on of its rows. The fact that the 1.000 is in the left-most column of the right half means that it is the first row.
Without looking at your algorithm or any of the approaches in the previous answers, but since the matrix has n^2 elements to begin with, I do not think there is a method which does better than that :)
IFF the matrix is truely random...
You could create a list of pointers to the columns sorted by the first element. Then create a similar list of the rows sorted by their first element. This takes O(n*logn).
Next create an index into each sorted list initialized to 0. If the first elements match, you must compare the whole row. If they do not match, increment the index of the one with the lowest starting element (either move to the next row or to the next column). Since each index cycles from 0 to n-1 only once, you have at most 2*n comparisons unless all the rows and columns start with the same number, but we said a matrix of random numbers.
The time for a row/column comparison is n in the worst case, but is expected to be O(1) on average with random data.
So 2 sorts of O(nlogn), and a scan of 2*n*1 gives you an expected run time of O(nlogn). This is of course assuming random data. Worst case is still going to be n**3 for a large matrix with most elements the same value.

Resources