Algorithm challenge to merge sets - algorithm

Given n sets of numbers, where each set contains some numbers from 1 to 100, how do I select sets to merge into the longest possible set under a special rule: two sets can merge only if they do not overlap. For example, [1,2,3] can merge with [4,5] but not with [3,4]. What would be an efficient algorithm to build the longest merged set?
My first attempt is to form an n by n matrix in which each row/column represents a set. Entry (i,j) equals 1 if sets i and j overlap, and entry (i,i) stores the length of set i. The question then becomes whether we can perform row and column operations at the same time to create a diagonal sub-matrix in the top left corner whose trace is as large as possible.
However, I got stuck on how to efficiently perform row and column operations to form such a diagonal sub-matrix in the top left corner.

As already pointed out in the comments (maximum coverage problem), you have an NP-hard problem. Luckily, MATLAB offers solvers for integer linear programming.
So we try to reduce the problem to the form:
min f*x subject to Ax<=b , 0<=x
There are n sets, we can encode a set as a vector of 0s and 1s. For example (1,1,1,0,0,...) would represent {1,2,3} and (0,0,1,1,0,0...) - {3,4}.
Every column of A represents a set. A(i,j)=1 means that the i-th element is in the j-th set, A(i,j)=0 means that the i-th element is not in the j-th set.
Now, x represents the sets we select: if x_j=1 then set j is selected, and if x_j=0 then it is not selected.
As every element must be in at most one selected set, we choose b=(1, 1, 1, ..., 1): if we took two sets which both contain the i-th element, then the i-th element of (Ax) would be at least 2.
The only question left is what is f? We try to maximize the number of elements in the union, so we choose f_j=-|set_j| (minus owing to min<->max conversion), with |set_j| - number of elements in the j-th set.
Putting it all together in MATLAB we get:
f=-sum(A)
xopt=intlinprog(f.',1:n,A,ones(m,1),[],[],zeros(n,1),ones(n,1))
f.' - cost function as column
1:n - all n elements of x are integers
A - encodes n sets
ones(m,1) - b=(1,1,1...), there are m=100 elements
[],[] - there are no constraints of the form Aeq*x=beq
zeros(n,1), 0<=x must hold
ones(n,1), x<=1 already follows from the other constraints, but it may help the solver a little
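To make the encoding concrete, here is a minimal Python sketch (not part of the original answer) that builds f, A and b as plain lists and checks a candidate selection x against Ax<=b. The helper names build_ilp, is_feasible and objective are hypothetical, and no solver is called; the sketch only illustrates the formulation.

```python
def build_ilp(sets, m=100):
    """Build f, A, b for: min f*x subject to A x <= b, x in {0, 1}.

    sets: list of Python sets with elements in 1..m (assumed input format).
    """
    n = len(sets)
    # A[i][j] = 1 if element i+1 belongs to set j, else 0.
    A = [[1 if (i + 1) in s else 0 for s in sets] for i in range(m)]
    # f_j = -|set_j| (the minus sign converts maximisation to minimisation).
    f = [-sum(A[i][j] for i in range(m)) for j in range(n)]
    # Each element may appear in at most one selected set.
    b = [1] * m
    return f, A, b


def is_feasible(A, b, x):
    """True if the 0/1 selection vector x satisfies A x <= b."""
    return all(sum(a * xj for a, xj in zip(row, x)) <= bi
               for row, bi in zip(A, b))


def objective(f, x):
    """Value of f*x; its negation is the size of the merged set."""
    return sum(fj * xj for fj, xj in zip(f, x))


f, A, b = build_ilp([{1, 2, 3}, {4, 5}, {3, 4}], m=5)
print(f)                          # [-3, -2, -2]
print(is_feasible(A, b, [1, 1, 0]))  # True: {1,2,3} and {4,5} are disjoint
print(is_feasible(A, b, [1, 0, 1]))  # False: element 3 is used twice
```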

You can represent sets as bit fields. A bitwise and operation yielding zero would indicate non-overlapping sets. Depending on the width of the underlying data type, you may need to perform multiple and operations. For example, with a 64 bit machine word size, I'd need two words to cover 1 to 100 as a bit field.
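As an illustration of the bit-field idea, the following Python sketch encodes each set as an integer mask (Python integers are arbitrary precision, so a single value covers 1 to 100) and runs a simple exhaustive search with pruning over disjoint selections. The search itself is an assumption added for illustration, not something the answer above prescribes, and it is exponential in the worst case; to_mask and best_disjoint_union are hypothetical names.

```python
def to_mask(s):
    """Encode a collection of small non-negative integers as a bit field."""
    mask = 0
    for x in s:
        mask |= 1 << x
    return mask


def best_disjoint_union(sets):
    """Return (total_size, chosen_indices) of the largest union of
    pairwise non-overlapping sets, found by branch and bound."""
    masks = [to_mask(s) for s in sets]
    sizes = [bin(m).count("1") for m in masks]
    n = len(masks)
    # suffix[i] = total size of sets i..n-1, an upper bound on future gain.
    suffix = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + sizes[i]

    best = [0, []]

    def search(i, used, total, chosen):
        if total > best[0]:
            best[0], best[1] = total, chosen[:]
        if i == n or total + suffix[i] <= best[0]:
            return  # nothing left, or this branch cannot beat the best
        if masks[i] & used == 0:  # bitwise AND of zero means no overlap
            chosen.append(i)
            search(i + 1, used | masks[i], total + sizes[i], chosen)
            chosen.pop()
        search(i + 1, used, total, chosen)

    search(0, 0, 0, [])
    return best[0], best[1]


print(best_disjoint_union([[1, 2, 3], [4, 5], [3, 4]]))  # (5, [0, 1])
```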

Related

Toggling bits pairs in an array to maximize its dot product with another array

Suppose two arrays are given A and B. A consists of integers and the second one consists of 0 and 1.
Now an operation is given - You can choose any adjacent bits in array B and you can toggle these two bits (for example - 00->11, 01->10, 10->01, 11->00) and you can perform this operation any number of times.
The output should be the sum of A[0]*B[0]+A[1]*B[1]+....+A[N-1]*B[N-1] such that the sum is maximum.
During the interview, my approach to this problem was to get the maximum number of 1's in array B in order to maximize the sum.
So to do that, I first calculated the total number of 1's in O(n) time in B. Let count = No. Of 1's=x.
Then I started traversing the array, toggling only if the count becomes greater than x or based on the elements of array A. For example, let B[i]=0 and B[i+1]=1, with A[i]=51 and A[i+1]=50. Then I will toggle B[i] and B[i+1] because A[i]>A[i+1].
But the interviewer was not quite satisfied with my approach and asked me to develop an algorithm with lower time complexity.
Can anyone suggest a better approach with lower time complexity?
You can create any B-vector with an even number of flipped bits just by repeatedly flipping the first bit that is in the wrong state.
So, pick all the positive numbers in A. If the count you picked has a different parity than the number of 1s in B, fix the parity in the cheapest way: either drop the smallest positive number you picked or add the non-positive number closest to 0, whichever costs less. For example, if B has an odd number of 1s and A is all negative, just pick the negative number closest to 0.
Then turn on all the bits corresponding to the numbers you chose, and turn off the other ones.
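The parity argument can be sketched in Python as follows. This is an illustrative sketch rather than the answerer's code: it assumes that toggling an adjacent pair never changes the parity of the number of 1s in B, and that for arrays of length at least 2 every bit pattern with the same parity is reachable. The function name max_dot_after_toggles is hypothetical.

```python
def max_dot_after_toggles(a, b):
    """Maximum of sum(a[i] * b[i]) reachable by toggling adjacent bit pairs.

    Runs in O(n): one pass for the positive sum, one for the cheapest fix.
    """
    if len(a) < 2:
        # No adjacent pair exists, so b cannot be changed at all.
        return sum(x * y for x, y in zip(a, b))
    best = sum(x for x in a if x > 0)        # unconstrained optimum
    picked = sum(1 for x in a if x > 0)      # how many bits that turns on
    if picked % 2 != sum(b) % 2:
        # Flip the membership of exactly one element. Dropping a positive x
        # or adding a non-positive x both cost |x|, so take the smallest.
        best -= min(abs(x) for x in a)
    return best


print(max_dot_after_toggles([10, 5, -1], [1, 0, 0]))  # 14
print(max_dot_after_toggles([-3, -2], [1, 0]))        # -2
```

In the first call, adding -1 (cost 1) is cheaper than dropping 5, which is why both ways of repairing the parity have to be compared.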

Find every possible permutation of bits in a 2D array that has a single group of contiguous 1s

Below I have represented 2 permutations of bits in a 2D bit array (1s are red). The matrix on the left has a single group of contiguous 1s but the right matrix has 2.
I would like to loop through every possible permutation of binary values in such an array that has a single group of contiguous 1s. I am aware that for a 10×7 grid like above there are 2^(10 × 7) permutations when you include non-contiguous permutations, but my hope is that by excluding non-contiguous permutations I will be able to go through them all in reasonable CPU time.
Speaking of reasonableness, I am also interested in an algorithm to determine how many permutations are contiguous.
My question is similar to, but different from, these:
2D Bit matrix with every possible combination
Finding Contiguous Areas of Bits in 2D Bit Array
Any help is appreciated. I'm a bit stuck.
So, I found that the OEIS (Online Encyclopedia of Integer Sequences) has a sequence from n = 0..7 for the "number of nonzero n X n binary arrays with all 1's connected" (A059525). They provide no formula though except for grids fixed at 1 cell wide (triangular numbers), or 2 or 3 cells wide. There's a sequence for 4 x n too but no formula.
Two approaches I can think of. One is to iterate through all possible sequences and devise a test for non-contiguous groups and some method for skipping over large regions guaranteed to be non-contiguous.
A second approach is to attempt to build all sets of contiguous groups so you don't need to test. This is the approach I would take:
Let n = width * height
Enumerate blocks left to right, top to bottom from 0 to n - 1
Fix a block at position 0.
Generate all contiguous solutions between 1 and n blocks extending from position 0
Omit position 0 and fix a block at position 1
Find all contiguous solutions between 1 and n - 1 blocks extending from position 1
Continue until you've reached position n
You can place your pieces according to the following rules, backtracking for the next placement at each depth:
To left of most recently placed piece if placed in row above prior piece provided that no other neighbors exist for that vacancy.
Above left-most available piece in row of most recently placed piece if no other neighbors exist for that vacancy.
To right of most recently placed piece (if adjacent piece exists)
On the next row, farthest left vacancy such that upper row has a piece above any contiguous right remaining neighbors
Next move for any backtracked position is first available move to the right of, or in the row below, the backtracked position (obeying prior 4 rules)
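For very small grids, the first approach (iterate over every bit pattern and test it) can be sketched directly, which is handy for checking a smarter generator against known counts. The Python sketch below is an assumption-laden illustration, not the placement scheme described above: it uses 4-neighbour connectivity, skips the empty grid, and takes time exponential in width * height. The name count_connected is hypothetical.

```python
def count_connected(width, height):
    """Count non-empty cell subsets of a width x height grid whose cells
    form a single 4-connected group (brute force over all bit patterns)."""
    n = width * height
    # neighbours[p] is a bit mask of the cells adjacent to cell p.
    neighbours = []
    for p in range(n):
        r, c = divmod(p, width)
        m = 0
        if r > 0:
            m |= 1 << (p - width)
        if r < height - 1:
            m |= 1 << (p + width)
        if c > 0:
            m |= 1 << (p - 1)
        if c < width - 1:
            m |= 1 << (p + 1)
        neighbours.append(m)

    count = 0
    for mask in range(1, 1 << n):
        seen = mask & -mask          # start the flood fill at the lowest 1
        frontier = seen
        while frontier:
            p = frontier.bit_length() - 1
            frontier &= ~(1 << p)
            new = neighbours[p] & mask & ~seen
            seen |= new
            frontier |= new
        if seen == mask:             # every 1 was reached from the start cell
            count += 1
    return count


print(count_connected(1, 3))  # 6, a triangular number as noted above
print(count_connected(2, 2))  # 13
```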

Covering N sets of contiguous integers with minimum nos

We are given N sets of contiguous integers. Each such set is defined by two numbers; for example, 2,5 represents the set containing 2,3,4,5. We have to print the minimum number of integers to select in order to cover all N sets. A number is said to cover a set if it is contained in the set.
Ex: Given sets [2,5] , [3,4] , [10,100]. We can choose for example {3,10} so we cover up all 3 sets. Hence answer is 2.
I can't find a proper algorithm for N<=5000.
Here is an O(nlogn) approach to solve the problem:
1. Sort the sets by the last element (for example, your example will be sorted as [3,4], [2,5], [10,100]).
2. Choose the end of the first interval.
3. Remove all sets that contain the chosen point.
4. If there is some uncovered set, return to 2.
Example (based on your example):
sort - your list of sets is sorted as l =[3,4], [2,5] , [10,100]
Choose 4
Remove the covered sets, you now have l=[10,100]
back to 2 - choose 100
Remove the last entry from the list l=[]
Stop clause is reached, you are done with two points: 4,100.
Correctness Proof (Guidelines) by #j_random_hacker:
Some element in that first (after sorting) range [i,j] must be
included in the answer, or that range would not be covered. Its
rightmost element j covers at least the same set of ranges as any
other element in [i,j]. Why? Suppose to the contrary that there was
some element k < j that covered a range that is not covered by j: then
that range must have an endpoint < j, which contradicts the fact that
[i,j] has the smallest endpoint (which we know because it's the first
in the sorted list)
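The sort-then-pick-the-right-endpoint procedure can be sketched in Python like this. It is a minimal sketch assuming each set is given as a (start, end) pair with start <= end; instead of physically removing covered sets it skips any set that already contains the last chosen point, which has the same effect. The name min_cover_points is hypothetical.

```python
def min_cover_points(intervals):
    """Greedy cover: returns a smallest list of points such that every
    (start, end) interval contains at least one of them. O(n log n)."""
    points = []
    last = None
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        # An interval is already covered if the last chosen point lies in it.
        # Because intervals are visited by increasing end, last <= end holds,
        # so it is enough to compare against start.
        if last is None or start > last:
            last = end
            points.append(end)
    return points


print(min_cover_points([(2, 5), (3, 4), (10, 100)]))  # [4, 100]
```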
Note the following is a greedy algorithm that doesn't work (see the comments). I am leaving it here, in case it helps someone else.
I would approach this using a recursive algorithm. First, note that if the sets are disjoint, then you need "n" numbers. Second, the set of "covering" points can be restricted to the ends of the sets, so this is a reduced number of options.
You can iterate/recurse your way through this. The following is a high-level sketch of the algorithm:
One iteration step is:
Extract the endpoints from all the sets
Count the number of sets that each endpoint covers
Find the maximum coverage over all endpoints
If the maximum coverage is 1, then choose an arbitrary point from each set.
Otherwise, choose the endpoint with the maximum coverage. If there are ties for the maximum, arbitrarily choose one. I don't believe it makes a difference when there are ties.
Remove all the sets covered by the endpoint, and add the endpoint to your "coverage points".
Repeat the process until either there are no sets left or the maximum coverage is 1.

Generating Random Matrix With Pairwise Distinct Rows and Columns

I need to randomly generate an NxN matrix of integers in the range 1 to K inclusive such that all rows and columns individually have the property that their elements are pairwise distinct.
For example for N=2 and K=3
This is ok:
1 2
2 1
This is not:
1 3
1 2
(Notice that if K < N this is impossible)
When K is sufficiently larger than N an efficient enough algorithm is just to generate a random matrix of 1..K integers, check that each row and each column is pairwise distinct, and if it isn't try again.
But what about the case where K is not much larger than N?
This is not a full answer, but a warning about an intuitive solution that does not work.
I am assuming that by "randomly generate" you mean with uniform probability on all existing such matrices.
For N=2 and K=3, here are the possible matrices, up to permutations of the set [1..K]:
1 2    1 2    1 2
2 1    2 3    3 1
(since we are ignoring permutations of the set [1..K], we can assume wlog that the first line is 1 2).
Now, an intuitive (but incorrect) strategy would be to draw the matrix entries one by one, ensuring for each entry that it is distinct from the other entries on the same line or column.
To see why it's incorrect, consider that we have drawn this:
1 2
x .
and we are now drawing x. x can be 2 or 3, but if we gave each possibility the probability 1/2, then the matrix
1 2
3 1
would get probability 1/2 of being drawn at the end, while it should have only probability 1/3.
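The bias can be checked exactly with a short enumeration. The Python sketch below is an illustration added here, not part of the answer: it follows every branch of the entry-by-entry strategy (filling in row-major order, choosing uniformly among the values still allowed) and accumulates the exact probability of each finished matrix using fractions. The name sequential_distribution is hypothetical.

```python
from fractions import Fraction


def sequential_distribution(n, k):
    """Exact probability of each n x n matrix produced by drawing entries
    one by one, uniformly among values not yet used in the row or column.
    Branches that hit a dead end (no allowed value) simply lose their mass."""
    dist = {}

    def fill(cells, prob):
        pos = len(cells)
        if pos == n * n:
            key = tuple(tuple(cells[r * n:(r + 1) * n]) for r in range(n))
            dist[key] = dist.get(key, Fraction(0)) + prob
            return
        r, c = divmod(pos, n)
        used = set(cells[r * n:pos]) | {cells[i * n + c] for i in range(r)}
        options = [v for v in range(1, k + 1) if v not in used]
        for v in options:
            fill(cells + [v], prob / len(options))

    fill([], Fraction(1))
    return dist


dist = sequential_distribution(2, 3)
print(len(dist))                   # 18 valid matrices in total
print(dist[((1, 2), (3, 1))])      # 1/12
print(dist[((1, 2), (2, 1))])      # 1/24
```

A uniform draw would give each of the 18 matrices probability 1/18, so the matrix with rows 1 2 and 3 1 is over-represented, in line with the 1/2 versus 1/3 argument above once the first row is fixed.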
Here is a (textual) solution. I don't think it provides good randomness, but nevertheless it could be OK for your application.
Let's generate a matrix in the range [0;K-1] (you will do +1 for all elements if you want to) with the following algorithm:
Generate the first line with any random method you want.
Each number will be the first element of a random sequence calculated in such a manner that you are guaranteed to have no duplicates in subsequent rows, that is, for any two distinct columns x and y, you will have x[i]!=y[i] for all i in [0;N-1].
Compute each row from the previous one.
The whole algorithm is based on a random generator with the property I mentioned. With a quick search, I found that the Inversive congruential generator meets this requirement. It seems to be easy to implement. It works if K is prime; if K is not prime, see 'Compound Inversive Generators' on the same page. Maybe it will be a little tricky to handle perfect squares or cubes (your problem sounds like sudoku :-) ), but I think it is possible by creating compound generators with the prime factors of K and different parametrization. For all generators, the first element of each column is the seed.
Whatever the value of K, the complexity is only depending on N and is O(N^2).
Deterministically generate a matrix having the desired property for rows and columns. Provided K > N, this can easily be done by starting the ith row with i, and filling in the rest of the row with i+1, i+2, etc., wrapping back to 1 after K. Other algorithms are possible.
Randomly permute columns, then randomly permute rows.
Let's show that permuting rows (i.e. picking up entire rows and assembling a new matrix from them in some order, with each row possibly in a different vertical position) leaves the desired properties intact for both rows and columns, assuming they were true before. The same reasoning then holds for column permutations, and for any sequence of permutations of either kind.
Trivially, permuting rows cannot change the property that, within each row, no element appears more than once.
The effect of permuting rows on a particular column is to reorder the elements within that column. This holds for any column, and since reordering elements cannot produce duplicate elements where there were none before, permuting rows cannot change the property that, within each column, no element appears more than once.
I'm not certain whether this algorithm is capable of generating all possible satisfying matrices, or if it does, whether it will generate all possible satisfying matrices with equal probability. Another interesting question that I don't have an answer for is: How many rounds of row-permutation-then-column-permutation are needed? More precisely, is any finite sequence of row-perm-then-column-perm rounds equivalent to a bounded number of (or in particular, one) row-perm-then-column-perm round? If so then nothing is gained by further permutations after the first row and column permutations. Perhaps someone with a stronger mathematics background can comment. But it may be good enough in any case.
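A minimal Python sketch of this construct-then-permute idea follows. It assumes the cyclic fill described above (row i starts at i and wraps back to 1 after K) and uses one column shuffle followed by one row shuffle; as the answer itself notes, nothing here shows that the result is uniform over all valid matrices. The function name random_distinct_matrix and the rng parameter are hypothetical, and the sketch accepts K equal to N as well, since the cyclic fill still yields distinct rows and columns in that case.

```python
import random


def random_distinct_matrix(n, k, rng=random):
    """Return an n x n matrix with entries in 1..k whose rows and columns
    each contain pairwise distinct values (not guaranteed to be uniform)."""
    if k < n:
        raise ValueError("impossible when K < N")
    # Deterministic start: entry (i, j) is i + j, wrapping back to 1 after k.
    base = [[(i + j) % k + 1 for j in range(n)] for i in range(n)]
    cols = list(range(n))
    rows = list(range(n))
    rng.shuffle(cols)   # random column permutation
    rng.shuffle(rows)   # random row permutation
    return [[base[r][c] for c in cols] for r in rows]


for row in random_distinct_matrix(4, 5):
    print(row)
```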

How to find minimal subrectangle with at least K 1's in a 0-1 matrix

I have encountered an unusual problem: given an NxM 0-1 matrix and a number K (<= NxM), I have to find a subrectangle of that 0-1 matrix that contains at least K 1's. Furthermore, its area (the product of both dimensions) should be minimized.
For example:
00000
01010
00100
01010
00000
K = 3
So I can find a subrectangle with minimal area 6 that contains 3 1's inside.
10
01
10
NOTE that the target subrectangle should consist of consecutive rows and consecutive columns of the original 0-1 matrix.
Compute cumulative sum of rows R[i,j] and columns C[i,j].
For top-left corner (i,j) of each possible sub-rectangle:
    Starting from a single-row sub-rectangle (n=i),
    Search the last possible column for this sub-rectangle (m).
    While m>=j:
        While there are at least K "ones" in this sub-rectangle:
            If this is the smallest sub-rectangle so far, remember it.
            Remove column (--m).
            This decreases the number of "ones" by C[m+1,n]-C[m+1,j-1].
        Add next row (++n).
        This increases the number of "ones" by R[m,n]-R[i-1,n].
Time complexity is O(NM(N+M)).
Two nested loops may be optimized by changing linear search to binary search (to process skinny sub-rectangles faster).
Also it is possible (after adding a row/a column to the sub-rectangle) to decrease in O(1) time the number of columns/rows in such a way that the area of this sub-rectangle is not larger than the area of the best-so-far sub-rectangle.
Both these optimizations require calculation of any sub-rectangle weight in O(1). To make it possible, pre-calculate cumulative sum of all elements for sub-rectangles [1..i,1..j] (X[i,j]). Then the weight of any sub-rectangle [i..m,j..n] is computed as X[m,n]-X[i-1,n]-X[m,j-1]+X[i-1,j-1].
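The O(1) weight query can be sketched in Python as below. The sketch keeps the 1-based convention of the formula above by padding X with a zero row and a zero column, so X[0][*] and X[*][0] are 0; build_prefix and weight are hypothetical helper names, and the matrix is assumed to be a non-empty list of equal-length rows of 0/1 integers.

```python
def build_prefix(matrix):
    """X[i][j] = number of ones in rows 1..i and columns 1..j (1-based)."""
    rows, cols = len(matrix), len(matrix[0])
    X = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            X[i][j] = (matrix[i - 1][j - 1]
                       + X[i - 1][j] + X[i][j - 1] - X[i - 1][j - 1])
    return X


def weight(X, i, j, m, n):
    """Ones in the sub-rectangle with rows i..m and columns j..n (1-based)."""
    return X[m][n] - X[i - 1][n] - X[m][j - 1] + X[i - 1][j - 1]


grid = ["00000", "01010", "00100", "01010", "00000"]
X = build_prefix([[int(ch) for ch in row] for row in grid])
print(weight(X, 2, 2, 4, 3))  # 3, the 3x2 sub-rectangle from the question
```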
Compute cumulative sum of columns C[i,j].
For any starting row (k) of possible sub-rectangle:
    For any ending row (l) of possible sub-rectangle:
        Starting column (m = 1).
        Ending column (n = 1).
        While n is not out-of-bounds:
            While there are fewer than K "ones" in sub-rectangle [k..l,m..n]:
                Add column (++n).
                This increases the number of "ones" by C[l,n]-C[k-1,n].
            If this is the smallest sub-rectangle so far, remember it.
            Remove column (++m).
            This decreases the number of "ones" by C[l,m-1]-C[k-1,m-1].
Time complexity is O(N^2 M).
Loop by 'l' may be terminated when all sub-rectangles, processed inside it, are single-column sub-rectangles (too many rows) or when no sub-rectangles, processed inside it, contain enough "ones" (not enough rows).
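The row-pair sweep with two column pointers can be sketched in Python as follows. It is a sketch under assumptions rather than a transcription of the pseudocode: indices are 0-based, the per-column counts for the current row band are accumulated incrementally instead of read from a cumulative table, K is assumed to be at least 1, and the early-termination ideas mentioned above are left out. The name min_area_with_k_ones is hypothetical.

```python
def min_area_with_k_ones(matrix, k):
    """Smallest area of a contiguous sub-rectangle containing at least k
    ones, or None if no such sub-rectangle exists. O(N^2 * M) time."""
    rows, cols = len(matrix), len(matrix[0])
    best = None
    for top in range(rows):
        col = [0] * cols              # ones per column within rows top..bottom
        for bottom in range(top, rows):
            for j in range(cols):
                col[j] += matrix[bottom][j]
            height = bottom - top + 1
            left = 0
            ones = 0                  # ones in columns left..right
            for right in range(cols):
                ones += col[right]
                # Shrink from the left while the window still has k ones.
                while left < right and ones - col[left] >= k:
                    ones -= col[left]
                    left += 1
                if ones >= k:
                    area = height * (right - left + 1)
                    if best is None or area < best:
                        best = area
    return best


grid = ["00000", "01010", "00100", "01010", "00000"]
print(min_area_with_k_ones([[int(ch) for ch in row] for row in grid], 3))  # 6
```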
If the rows and columns of the submatrix do not have to be consecutive, the problem is NP-hard because the clique decision problem can be reduced to it. In that setting there is no polynomial-time algorithm (unless P=NP), so you cannot expect to do fundamentally better than trying the possible submatrices.
The clique decision problem can be reduced to your problem in the following way:
Let the matrix be the adjacency matrix of the graph.
Set K=L^2, where L is the size of the clique we are looking for.
Solve your problem on this input. The graph contains an L-clique iff the solution to your problem is an LxL submatrix containing only ones (which can be checked in polynomial time).
Off the top of my head, you can make a list of the coordinate pairs(?) of all ones in the matrix, find the (smallest) containing sub-rectangles for each K-combination among them*, then pick the smallest of those.
* which is defined by the smallest and largest row and column indices in the K-combination.
