Find largest rectangle containing only zeros in an N×N binary matrix

Find largest rectangle containing only zeros in an N×N binary matrix - algorithm

Given an NxN binary matrix (containing only 0's or 1's), how can we go about finding largest rectangle containing all 0's?
Example:
I
0 0 0 0 1 0
0 0 1 0 0 1
II->0 0 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 1 <--IV
0 0 1 0 0 0
IV
For the above example, it is a 6×6 binary matrix. the return value in this case will be Cell 1:(2, 1) and Cell 2:(4, 4). The resulting sub-matrix can be square or rectangular. The return value can also be the size of the largest sub-matrix of all 0's, in this example 3 × 4.

Here's a solution based on the "Largest Rectangle in a Histogram" problem suggested by #j_random_hacker in the comments:
[Algorithm] works by iterating through
rows from top to bottom, for each row
solving this problem, where the
"bars" in the "histogram" consist of
all unbroken upward trails of zeros
that start at the current row (a
column has height 0 if it has a 1 in
the current row).
The input matrix mat may be an arbitrary iterable e.g., a file or a network stream. Only one row is required to be available at a time.
#!/usr/bin/env python
from collections import namedtuple
from operator import mul
Info = namedtuple('Info', 'start height')
def max_size(mat, value=0):
"""Find height, width of the largest rectangle containing all `value`'s."""
it = iter(mat)
hist = [(el==value) for el in next(it, [])]
max_size = max_rectangle_size(hist)
for row in it:
hist = [(1+h) if el == value else 0 for h, el in zip(hist, row)]
max_size = max(max_size, max_rectangle_size(hist), key=area)
return max_size
def max_rectangle_size(histogram):
"""Find height, width of the largest rectangle that fits entirely under
the histogram.
"""
stack = []
top = lambda: stack[-1]
max_size = (0, 0) # height, width of the largest rectangle
pos = 0 # current position in the histogram
for pos, height in enumerate(histogram):
start = pos # position where rectangle starts
while True:
if not stack or height > top().height:
stack.append(Info(start, height)) # push
elif stack and height < top().height:
max_size = max(max_size, (top().height, (pos - top().start)),
key=area)
start, _ = stack.pop()
continue
break # height == top().height goes here
pos += 1
for start, height in stack:
max_size = max(max_size, (height, (pos - start)), key=area)
return max_size
def area(size):
return reduce(mul, size)
The solution is O(N), where N is the number of elements in a matrix. It requires O(ncols) additional memory, where ncols is the number of columns in a matrix.
Latest version with tests is at https://gist.github.com/776423

Please take a look at Maximize the rectangular area under Histogram and then continue reading the solution below.
Traverse the matrix once and store the following;
For x=1 to N and y=1 to N
F[x][y] = 1 + F[x][y-1] if A[x][y] is 0 , else 0
Then for each row for x=N to 1
We have F[x] -> array with heights of the histograms with base at x.
Use O(N) algorithm to find the largest area of rectangle in this histogram = H[x]
From all areas computed, report the largest.
Time complexity is O(N*N) = O(N²) (for an NxN binary matrix)
Example:
Initial array F[x][y] array
0 0 0 0 1 0 1 1 1 1 0 1
0 0 1 0 0 1 2 2 0 2 1 0
0 0 0 0 0 0 3 3 1 3 2 1
1 0 0 0 0 0 0 4 2 4 3 2
0 0 0 0 0 1 1 5 3 5 4 0
0 0 1 0 0 0 2 6 0 6 5 1
For x = N to 1
H[6] = 2 6 0 6 5 1 -> 10 (5*2)
H[5] = 1 5 3 5 4 0 -> 12 (3*4)
H[4] = 0 4 2 4 3 2 -> 10 (2*5)
H[3] = 3 3 1 3 2 1 -> 6 (3*2)
H[2] = 2 2 0 2 1 0 -> 4 (2*2)
H[1] = 1 1 1 1 0 1 -> 4 (1*4)
The largest area is thus H[5] = 12

Here is a Python3 solution, which returns the position in addition to the area of the largest rectangle:
#!/usr/bin/env python3
import numpy
s = '''0 0 0 0 1 0
0 0 1 0 0 1
0 0 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 1
0 0 1 0 0 0'''
nrows = 6
ncols = 6
skip = 1
area_max = (0, [])
a = numpy.fromstring(s, dtype=int, sep=' ').reshape(nrows, ncols)
w = numpy.zeros(dtype=int, shape=a.shape)
h = numpy.zeros(dtype=int, shape=a.shape)
for r in range(nrows):
for c in range(ncols):
if a[r][c] == skip:
continue
if r == 0:
h[r][c] = 1
else:
h[r][c] = h[r-1][c]+1
if c == 0:
w[r][c] = 1
else:
w[r][c] = w[r][c-1]+1
minw = w[r][c]
for dh in range(h[r][c]):
minw = min(minw, w[r-dh][c])
area = (dh+1)*minw
if area > area_max[0]:
area_max = (area, [(r-dh, c-minw+1, r, c)])
print('area', area_max[0])
for t in area_max[1]:
print('Cell 1:({}, {}) and Cell 2:({}, {})'.format(*t))
Output:
area 12
Cell 1:(2, 1) and Cell 2:(4, 4)

Here is J.F. Sebastians method translated into C#:
private Vector2 MaxRectSize(int[] histogram) {
Vector2 maxSize = Vector2.zero;
int maxArea = 0;
Stack<Vector2> stack = new Stack<Vector2>();
int x = 0;
for (x = 0; x < histogram.Length; x++) {
int start = x;
int height = histogram[x];
while (true) {
if (stack.Count == 0 || height > stack.Peek().y) {
stack.Push(new Vector2(start, height));
} else if(height < stack.Peek().y) {
int tempArea = (int)(stack.Peek().y * (x - stack.Peek().x));
if(tempArea > maxArea) {
maxSize = new Vector2(stack.Peek().y, (x - stack.Peek().x));
maxArea = tempArea;
}
Vector2 popped = stack.Pop();
start = (int)popped.x;
continue;
}
break;
}
}
foreach (Vector2 data in stack) {
int tempArea = (int)(data.y * (x - data.x));
if(tempArea > maxArea) {
maxSize = new Vector2(data.y, (x - data.x));
maxArea = tempArea;
}
}
return maxSize;
}
public Vector2 GetMaximumFreeSpace() {
// STEP 1:
// build a seed histogram using the first row of grid points
// example: [true, true, false, true] = [1,1,0,1]
int[] hist = new int[gridSizeY];
for (int y = 0; y < gridSizeY; y++) {
if(!invalidPoints[0, y]) {
hist[y] = 1;
}
}
// STEP 2:
// get a starting max area from the seed histogram we created above.
// using the example from above, this value would be [1, 1], as the only valid area is a single point.
// another example for [0,0,0,1,0,0] would be [1, 3], because the largest area of contiguous free space is 3.
// Note that at this step, the heigh fo the found rectangle will always be 1 because we are operating on
// a single row of data.
Vector2 maxSize = MaxRectSize(hist);
int maxArea = (int)(maxSize.x * maxSize.y);
// STEP 3:
// build histograms for each additional row, re-testing for new possible max rectangluar areas
for (int x = 1; x < gridSizeX; x++) {
// build a new histogram for this row. the values of this row are
// 0 if the current grid point is occupied; otherwise, it is 1 + the value
// of the previously found historgram value for the previous position.
// What this does is effectly keep track of the height of continous avilable spaces.
// EXAMPLE:
// Given the following grid data (where 1 means occupied, and 0 means free; for clairty):
// INPUT: OUTPUT:
// 1.) [0,0,1,0] = [1,1,0,1]
// 2.) [0,0,1,0] = [2,2,0,2]
// 3.) [1,1,0,1] = [0,0,1,0]
//
// As such, you'll notice position 1,0 (row 1, column 0) is 2, because this is the height of contiguous
// free space.
for (int y = 0; y < gridSizeY; y++) {
if(!invalidPoints[x, y]) {
hist[y] = 1 + hist[y];
} else {
hist[y] = 0;
}
}
// find the maximum size of the current histogram. If it happens to be larger
// that the currently recorded max size, then it is the new max size.
Vector2 maxSizeTemp = MaxRectSize(hist);
int tempArea = (int)(maxSizeTemp.x * maxSizeTemp.y);
if (tempArea > maxArea) {
maxSize = maxSizeTemp;
maxArea = tempArea;
}
}
// at this point, we know the max size
return maxSize;
}
A few things to note about this:
This version is meant for use with the Unity API. You can easily make this more generic by replacing instances of Vector2 with KeyValuePair. Vector2 is only used for a convenient way to store two values.
invalidPoints[] is an array of bool, where true means the grid point is "in use", and false means it is not.

Solution with space complexity O(columns) [Can be modified to O(rows) also] and time complexity O(rows*columns)
public int maximalRectangle(char[][] matrix) {
int m = matrix.length;
if (m == 0)
return 0;
int n = matrix[0].length;
int maxArea = 0;
int[] aux = new int[n];
for (int i = 0; i < n; i++) {
aux[i] = 0;
}
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
aux[j] = matrix[i][j] - '0' + aux[j];
maxArea = Math.max(maxArea, maxAreaHist(aux));
}
}
return maxArea;
}
public int maxAreaHist(int[] heights) {
int n = heights.length;
Stack<Integer> stack = new Stack<Integer>();
stack.push(0);
int maxRect = heights[0];
int top = 0;
int leftSideArea = 0;
int rightSideArea = heights[0];
for (int i = 1; i < n; i++) {
if (stack.isEmpty() || heights[i] >= heights[stack.peek()]) {
stack.push(i);
} else {
while (!stack.isEmpty() && heights[stack.peek()] > heights[i]) {
top = stack.pop();
rightSideArea = heights[top] * (i - top);
leftSideArea = 0;
if (!stack.isEmpty()) {
leftSideArea = heights[top] * (top - stack.peek() - 1);
} else {
leftSideArea = heights[top] * top;
}
maxRect = Math.max(maxRect, leftSideArea + rightSideArea);
}
stack.push(i);
}
}
while (!stack.isEmpty()) {
top = stack.pop();
rightSideArea = heights[top] * (n - top);
leftSideArea = 0;
if (!stack.isEmpty()) {
leftSideArea = heights[top] * (top - stack.peek() - 1);
} else {
leftSideArea = heights[top] * top;
}
maxRect = Math.max(maxRect, leftSideArea + rightSideArea);
}
return maxRect;
}
But I get Time Limite exceeded excpetion when I try this on LeetCode. Is there any less complex solution?

I propose a O(nxn) method.
First, you can list all the maximum empty rectangles. Empty means that it covers only 0s. A maximum empty rectangle is such that it cannot be extended in a direction without covering (at least) one 1.
A paper presenting a O(nxn) algorithm to create such a list can be found at www.ulg.ac.be/telecom/rectangles as well as source code (not optimized). There is no need to store the list, it is sufficient to call a callback function each time a rectangle is found by the algorithm, and to store only the largest one (or choose another criterion if you want).
Note that a proof exists (see the paper) that the number of largest empty rectangles is bounded by the number of pixels of the image (nxn in this case).
Therefore, selecting the optimal rectangle can be done in O(nxn), and the overall method is also O(nxn).
In practice, this method is very fast, and is used for realtime video stream analysis.

Here is a version of jfs' solution, which also delivers the position of the largest rectangle:
from collections import namedtuple
from operator import mul
Info = namedtuple('Info', 'start height')
def max_rect(mat, value=0):
"""returns (height, width, left_column, bottom_row) of the largest rectangle
containing all `value`'s.
Example:
[[0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
[0, 4, 0, 2, 4, 0, 0, 1, 0, 0],
[1, 0, 1, 0, 0, 0, 3, 0, 0, 4],
[0, 0, 0, 0, 4, 2, 0, 0, 0, 0],
[0, 0, 0, 2, 0, 0, 0, 0, 0, 0],
[4, 3, 0, 0, 1, 2, 0, 0, 0, 0],
[3, 0, 0, 0, 2, 0, 0, 0, 0, 4],
[0, 0, 0, 1, 0, 3, 2, 4, 3, 2],
[0, 3, 0, 0, 0, 2, 0, 1, 0, 0]]
gives: (3, 4, 6, 5)
"""
it = iter(mat)
hist = [(el==value) for el in next(it, [])]
max_rect = max_rectangle_size(hist) + (0,)
for irow,row in enumerate(it):
hist = [(1+h) if el == value else 0 for h, el in zip(hist, row)]
max_rect = max(max_rect, max_rectangle_size(hist) + (irow+1,), key=area)
# irow+1, because we already used one row for initializing max_rect
return max_rect
def max_rectangle_size(histogram):
stack = []
top = lambda: stack[-1]
max_size = (0, 0, 0) # height, width and start position of the largest rectangle
pos = 0 # current position in the histogram
for pos, height in enumerate(histogram):
start = pos # position where rectangle starts
while True:
if not stack or height > top().height:
stack.append(Info(start, height)) # push
elif stack and height < top().height:
max_size = max(max_size, (top().height, (pos - top().start), top().start), key=area)
start, _ = stack.pop()
continue
break # height == top().height goes here
pos += 1
for start, height in stack:
max_size = max(max_size, (height, (pos - start), start), key=area)
return max_size
def area(size):
return size[0] * size[1]

To be complete, here's the C# version which outputs the rectangle coordinates.
It's based on dmarra's answer but without any other dependencies.
There's only the function bool GetPixel(int x, int y), which returns true when a pixel is set at the coordinates x,y.
public struct INTRECT
{
public int Left, Right, Top, Bottom;
public INTRECT(int aLeft, int aTop, int aRight, int aBottom)
{
Left = aLeft;
Top = aTop;
Right = aRight;
Bottom = aBottom;
}
public int Width { get { return (Right - Left + 1); } }
public int Height { get { return (Bottom - Top + 1); } }
public bool IsEmpty { get { return Left == 0 && Right == 0 && Top == 0 && Bottom == 0; } }
public static bool operator ==(INTRECT lhs, INTRECT rhs)
{
return lhs.Left == rhs.Left && lhs.Top == rhs.Top && lhs.Right == rhs.Right && lhs.Bottom == rhs.Bottom;
}
public static bool operator !=(INTRECT lhs, INTRECT rhs)
{
return !(lhs == rhs);
}
public override bool Equals(Object obj)
{
return obj is INTRECT && this == (INTRECT)obj;
}
public bool Equals(INTRECT obj)
{
return this == obj;
}
public override int GetHashCode()
{
return Left.GetHashCode() ^ Right.GetHashCode() ^ Top.GetHashCode() ^ Bottom.GetHashCode();
}
}
public INTRECT GetMaximumFreeRectangle()
{
int XEnd = 0;
int YStart = 0;
int MaxRectTop = 0;
INTRECT MaxRect = new INTRECT();
// STEP 1:
// build a seed histogram using the first row of grid points
// example: [true, true, false, true] = [1,1,0,1]
int[] hist = new int[Height];
for (int y = 0; y < Height; y++)
{
if (!GetPixel(0, y))
{
hist[y] = 1;
}
}
// STEP 2:
// get a starting max area from the seed histogram we created above.
// using the example from above, this value would be [1, 1], as the only valid area is a single point.
// another example for [0,0,0,1,0,0] would be [1, 3], because the largest area of contiguous free space is 3.
// Note that at this step, the heigh fo the found rectangle will always be 1 because we are operating on
// a single row of data.
Tuple<int, int> maxSize = MaxRectSize(hist, out YStart);
int maxArea = (int)(maxSize.Item1 * maxSize.Item2);
MaxRectTop = YStart;
// STEP 3:
// build histograms for each additional row, re-testing for new possible max rectangluar areas
for (int x = 1; x < Width; x++)
{
// build a new histogram for this row. the values of this row are
// 0 if the current grid point is occupied; otherwise, it is 1 + the value
// of the previously found historgram value for the previous position.
// What this does is effectly keep track of the height of continous avilable spaces.
// EXAMPLE:
// Given the following grid data (where 1 means occupied, and 0 means free; for clairty):
// INPUT: OUTPUT:
// 1.) [0,0,1,0] = [1,1,0,1]
// 2.) [0,0,1,0] = [2,2,0,2]
// 3.) [1,1,0,1] = [0,0,1,0]
//
// As such, you'll notice position 1,0 (row 1, column 0) is 2, because this is the height of contiguous
// free space.
for (int y = 0; y < Height; y++)
{
if (!GetPixel(x, y))
{
hist[y]++;
}
else
{
hist[y] = 0;
}
}
// find the maximum size of the current histogram. If it happens to be larger
// that the currently recorded max size, then it is the new max size.
Tuple<int, int> maxSizeTemp = MaxRectSize(hist, out YStart);
int tempArea = (int)(maxSizeTemp.Item1 * maxSizeTemp.Item2);
if (tempArea > maxArea)
{
maxSize = maxSizeTemp;
maxArea = tempArea;
MaxRectTop = YStart;
XEnd = x;
}
}
MaxRect.Left = XEnd - maxSize.Item1 + 1;
MaxRect.Top = MaxRectTop;
MaxRect.Right = XEnd;
MaxRect.Bottom = MaxRectTop + maxSize.Item2 - 1;
// at this point, we know the max size
return MaxRect;
}
private Tuple<int, int> MaxRectSize(int[] histogram, out int YStart)
{
Tuple<int, int> maxSize = new Tuple<int, int>(0, 0);
int maxArea = 0;
Stack<Tuple<int, int>> stack = new Stack<Tuple<int, int>>();
int x = 0;
YStart = 0;
for (x = 0; x < histogram.Length; x++)
{
int start = x;
int height = histogram[x];
while (true)
{
if (stack.Count == 0 || height > stack.Peek().Item2)
{
stack.Push(new Tuple<int, int>(start, height));
}
else if (height < stack.Peek().Item2)
{
int tempArea = (int)(stack.Peek().Item2 * (x - stack.Peek().Item1));
if (tempArea > maxArea)
{
YStart = stack.Peek().Item1;
maxSize = new Tuple<int, int>(stack.Peek().Item2, (x - stack.Peek().Item1));
maxArea = tempArea;
}
Tuple<int, int> popped = stack.Pop();
start = (int)popped.Item1;
continue;
}
break;
}
}
foreach (Tuple<int, int> data in stack)
{
int tempArea = (int)(data.Item2 * (x - data.Item1));
if (tempArea > maxArea)
{
YStart = data.Item1;
maxSize = new Tuple<int, int>(data.Item2, (x - data.Item1));
maxArea = tempArea;
}
}
return maxSize;
}

An appropriate algorithm can be found within Algorithm for finding the largest inscribed rectangle in polygon (2019).
I implemented it in python:
import largestinteriorrectangle as lir
import numpy as np
grid = np.array([[0, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0]],
"bool")
grid = ~grid
lir.lir(grid) # [1, 2, 4, 3]
the result comes as x, y, width, height

Related

Looking for a optimal solution

This was one of the interview questions in amazon.
Given a 2D array of 0's and 1's we need to find the pattern of maximum size.
Patters is as follows:
size = 1:
1
1 1 1
1
size = 2:
1
1
1 1 1
1
1
Naive Solution: Traverse each and every element of MxN matrix, and search for the index with value 1 and check if left & right entries as 1 and note the maximum length of 1's above and below the index.
Looking for a better solution. If anyone has a clue please do post.

I assume that any 1 values that surround such a pattern do not destroy it, so that also this would have size 1:
1 1 1 1
1 1 1 1
1 1 0 1
1 1 1 1
In that case I would suggest an algorithm where for each column you do the following:
Initialise size as 0
For each cell in this column:
Push the current size on a stack; it represents the number of 1 values in the upward direction starting from this cell.
If the value in this cell is a 1, then increase size, otherwise set it to 0
Initialise size as 0
For each cell in this column, but in reverse order:
Pop the last value from the stack
Call thisSize the least of the popped value and the value of size.
If thisSize is greater than the best pattern found so far and the values at both sides of the current cell are 1, then consider this the best pattern.
If the value in this cell is a 1, then increase size, otherwise set it to 0
As a further optimisation you could exit the second loop as soon as the distance between the current cell and the top of the grid becomes smaller than the size of the largest pattern we already found earlier.
Here is an implementation in JavaScript:
function findPattern(a) {
var rows = a.length,
cols = a[0].length,
maxSize = -1,
stack = [],
row, col, pos, thisSize;
for (col = 1; col < cols-1; col++) {
stack = [];
// Downward counting to store the number of 1s in upward direction
size = 0;
for (row = 0; row < rows; row++) {
stack.push(size);
size = a[row][col] == 1 ? size + 1 : 0;
}
// Reverse, but only as far as still useful given the size we already found
size = 0;
for (row = rows - 1; row > maxSize; row--) {
thisSize = Math.min(size, stack.pop());
if (thisSize >= maxSize && a[row][col-1] == 1 && a[row][col+1] == 1) {
maxSize = thisSize;
pos = [row, col];
}
size = a[row][col] == 1 ? size + 1 : 0;
}
}
return [maxSize, pos];
}
// Sample data:
var a = [
[0, 0, 1, 0, 0, 1, 0],
[0, 0, 1, 1, 0, 1, 0],
[1, 1, 1, 0, 0, 1, 1],
[1, 0, 1, 0, 1, 1, 0],
[1, 1, 1, 1, 0, 1, 0],
[0, 1, 1, 1, 1, 1, 1],
[0, 0, 1, 0, 0, 1, 0]];
var [size, pos] = findPattern(a);
console.log('Size ' + size + ' with center at row ' + (pos[0]+1)
+ ' and column ' + (pos[1]+1) + ' (first row/col is numbered 1)');

Here: gist.github.com/... is a generic Java implementation that finds the largest plus(+) pattern in a 2D matrix of any size.
The idea is to find for the biggest possible plus(+) pattern first around the central elements(initial window) of the matrix. For each element in the window find the max plus size centered at that element.
If largest is found return the largest size.
If largest possible '+' is not found, store the size of whatever smaller than that was found and repeat search from step #1 in the next outer window around the previous window for 1-size smaller '+' pattern; iteratively searching for '+' in an 'onion layer pattern' from inside towards outside.
The initial central window is chosen such that edges of matrix are equally far on all sides from this window.
Example 1 - For matrix of size {4x3}, smallest central window lies
from (1,1) to (2,1)
Example 2 - For matrix of size {3x9}, smallest
central window lies from (1,1) to (1,7)
int rows = arr.length;
int cols = arr[0].length;
int min = rows < cols ? rows : cols;
int diff = rows > cols ? rows - cols : cols - rows;
// Initializing initial window params. The center most smallest window possible
int first_r, first_c, last_r, last_c;
first_r = (min - 1) / 2;
first_c = (min - 1) / 2;
last_r = rows < cols ? (rows / 2) : (cols / 2) + diff;
last_c = rows > cols ? (cols / 2) : (rows / 2) + diff;
Full Java code:
public class PlusPattern {
/**
* Utility method to verify matrix dimensions
*
* #param a matrix to be verified
* #return true if matrix size is greater than 0;
*/
private static boolean isValid(int[][] a) {
return a.length > 0 && a[0].length > 0;
}
/**
* Finds the size of largest plus(+) pattern of given 'symbol' integer in an integer 2D matrix .
*
* The idea is to find for the biggest possible plus(+) pattern first around the central elements
* of the matrix. If largest is found return the largest size. If largest possible + is not
* found, store the size of whatever smaller than that was found and repeat search for 1 size
* smaller + in the next outer window around the previous window.
*
* #param arr matrix to be searched
* #param symbol whose + patter is sought
* #return the radius of largest + found in the matrix.
*/
static int findLargestPlusPattern(int[][] arr, int symbol) {
if (!isValid(arr)) {
throw new IllegalArgumentException("Cannot perform search on empty array");
}
int maxPlusRadius = 0;
int rows = arr.length;
int cols = arr[0].length;
int min = rows < cols ? rows : cols;
int diff = rows > cols ? rows - cols : cols - rows;
// Initializing initial window params. The center most smallest window possible
// Example - For matrix of size {4x3}, smallest central window lies from [1][1] to [2][1]
// Example - For matrix of size {3x9}, smallest central window lies from [1][1] to [1][7]
int first_r, first_c, last_r, last_c;
first_r = (min - 1) / 2;
first_c = (min - 1) / 2;
last_r = rows < cols ? (rows / 2) : (cols / 2) + diff;
last_c = rows > cols ? (cols / 2) : (rows / 2) + diff;
// Initializing with biggest possible search radius in the matrix
int searchRadius = (min - 1) / 2;
int r, c;
int found;
// Iteratively searching for + in an 'onion layer pattern' from inside to outside
while (searchRadius > maxPlusRadius) { // no need to find smaller + patterns than the one already found
// initializing r and c cursor for this window iterations.
r = first_r;
c = first_c;
// Search each of the 4 sides of the current window in a clockwise manner
// 1# Scan the top line of window
do { // do-while used to search inside initial window with width==1
found = findLargestPlusAt(r, c, arr, symbol, searchRadius);
if (found == searchRadius) {
return searchRadius; // cannot find a bigger plus(+) than this in remaining matrix
} else if (found > maxPlusRadius) {
maxPlusRadius = found;
}
c++;
} while (c < last_c);
if (c > last_c)
c--; // for initial window with width==1. Otherwise #3 condition will be true for invalid c-index
// 2# Scan the right line of window
do { // do-while used to search inside initial window with height==1
found = findLargestPlusAt(r, c, arr, symbol, searchRadius);
if (found == searchRadius) {
return searchRadius;
} else if (found > maxPlusRadius) {
maxPlusRadius = found;
}
r++;
} while (r < last_r);
if (r > last_r)
r--; // for initial window with height==1. Otherwise #4 condition will be true for invalid r-index
// 3# Scan the bottom line of window
while (c > first_c) {
found = findLargestPlusAt(r, c, arr, symbol, searchRadius);
if (found == searchRadius) {
return searchRadius;
} else if (found > maxPlusRadius) {
maxPlusRadius = found;
}
c--;
}
// 4# Scan the left line of window
while (r > first_r) {
found = findLargestPlusAt(r, c, arr, symbol, searchRadius);
if (found == searchRadius) {
return searchRadius;
} else if (found > maxPlusRadius) {
maxPlusRadius = found;
}
r--;
}
// r and c comes back at first_r and first_c.
// increasing window on all sides by 1.
first_r--;
first_c--;
last_r++;
last_c++;
// reducing search radius to avoid out of bounds error on next window.
searchRadius--;
}
return maxPlusRadius;
}
/**
* Finds, if exist, the size of largest plus around a given point a[r][c]. It grows radius
* greedily to maximise the search for + pattern returns 0 if is the point is the only symbol.
*
* #param r row coordinate of search center
* #param c column coordinate of search center
* #param a matrix
* #param symbol search symbol
* #param max_radius around the center to be searched for + pattern
* #return returns -1 if the point itself is not the symbol.
* returns n if all the next elements in E W N S directions within radius n are the symbol elements.
*/
static int findLargestPlusAt(int r, int c, int[][] a, int symbol, int max_radius) {
int largest = -1;
if (a[r][c] != symbol) { // If center coordinate itself is not the symbol
return largest;
} else {
largest = 0;
}
for (int rad = 1; rad <= max_radius; rad++) {
if (a[r + rad][c] == symbol && a[r][c + rad] == symbol && a[r - rad][c] == symbol && a[r][c - rad] == symbol) {
largest = rad; // At least a '+' of radius 'rad' is present.
} else {
break;
}
}
return largest;
}
public static void main(String[] args) {
int mat[][];
mat = new int[][]{ // max + = 3
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 1, 1, 1, 1, 1,},
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 0, 1, 1, 0, 1,},
};
int find = findLargestPlusPattern(mat, 1);
System.out.println("# Max + size radius is : " + find);
mat = new int[][]{ // max + = 2
{1, 1, 9, 1, 1, 9, 1,},
{1, 1, 9, 1, 1, 9, 1,},
{7, 1, 1, 1, 1, 1, 1,},
{1, 1, 9, 1, 1, 9, 1,},
{1, 1, 9, 1, 1, 9, 1,},
};
find = findLargestPlusPattern(mat, 1);
System.out.println("# Max + size radius is : " + find);
mat = new int[][]{ // max + = 1
{1, 1, 0, 1, 1},
{1, 1, 0, 1, 1},
{1, 1, 0, 1, 1},
{1, 1, 1, 6, 1},
{1, 1, 0, 1, 1},
{1, 1, 0, 1, 1},
};
find = findLargestPlusPattern(mat, 1);
System.out.println("# Max + size radius is : " + find);
}
}

The following uses basically the same logic as that provided by trincot in his answer, but does the reverse scan whenever there's a break in consecutive 1's. This eliminates the need to build an explicit stack.
The running time should be approximately the same. The only advantage to my method is that this algorithm uses O(1) extra space, whereas trincot's uses O(rowCount) extra space for the stack.
The extra stack makes for shorter and more readable code, though.
Code is in C#:
class FindPatternResult
{
public int Size { get; set; }
public int Col { get; set; }
public int Row { get; set; }
}
private FindPatternResult FindPattern(int[,] a)
{
var numCols = a.GetUpperBound(0)+1;
var numRows = a.GetUpperBound(1)+1;
var maxSize = -1;
var maxCol = -1;
var maxRow = -1;
// anonymous function for checking matches when there is
// a break in consecutive 1's.
var checkForMatch = new Action<int, int, int>((height, bottomRow, col) =>
{
var topRow = bottomRow - height + 1;
for (int row = bottomRow-1; row > topRow; --row)
{
// There's a potential optimization opportunity here.
// If we get beyond the midpoint and size is decreasing,
// then if size < maxSize, we can early-out.
// For example, if maxSize is 3 and tow-topRow < 3,
// then there can't possibly be a longer match in this column.
// I didn't add that optimization because it doesn't
// really change the algorithm. But if the number of rows
// is very large, it could be meaningful.
if (a[row, col - 1] == 1 && a[row, col + 1] == 1)
{
var size = Math.Min(bottomRow-row, row-topRow);
if (size > maxSize)
{
maxSize = size;
maxCol = col;
maxRow = row;
}
}
}
});
for (int col = 1; col < numCols - 1; ++col)
{
var size = 0;
for (int row = 0; row < numRows; ++row)
{
if (a[row,col] == 1)
{
++size;
}
else
{
// If size >= 3, then go back and check for a match
if (size >= 3)
{
checkForMatch(size, row, col);
}
size = 0;
}
}
// If we end the loop with size >= 3, then check from the bottom row.
if (size >= 3)
{
checkForMatch(size, numRows - 1, col);
}
}
Test with:
private void DoIt()
{
var rslt = FindPattern(_sampleData);
Console.WriteLine($"Result: {rslt.Size}, [{rslt.Row}, {rslt.Col}]");
}
private readonly int[,] _sampleData =
{
{0, 0, 1, 0, 0, 1, 0},
{0, 0, 1, 1, 0, 1, 0},
{1, 1, 1, 0, 0, 1, 1},
{1, 0, 1, 0, 1, 1, 0},
{1, 1, 1, 1, 0, 1, 0},
{0, 1, 1, 1, 1, 1, 1},
{0, 0, 1, 0, 0, 1, 0}
};

compute maximum number of black rectangles in a cutome given black and white board [duplicate]

What's the most efficient algorithm to find the rectangle with the largest area which will fit in the empty space?
Let's say the screen looks like this ('#' represents filled area):
....................
..............######
##..................
.................###
.................###
#####...............
#####...............
#####...............
A probable solution is:
....................
..............######
##...++++++++++++...
.....++++++++++++###
.....++++++++++++###
#####++++++++++++...
#####++++++++++++...
#####++++++++++++...
Normally I'd enjoy figuring out a solution. Although this time I'd like to avoid wasting time fumbling around on my own since this has a practical use for a project I'm working on. Is there a well-known solution?
Shog9 wrote:
Is your input an array (as implied by the other responses), or a list of occlusions in the form of arbitrarily sized, positioned rectangles (as might be the case in a windowing system when dealing with window positions)?
Yes, I have a structure which keeps track of a set of windows placed on the screen. I also have a grid which keeps track of all the areas between each edge, whether they are empty or filled, and the pixel position of their left or top edge. I think there is some modified form which would take advantage of this property. Do you know of any?

I'm the author of that Dr. Dobb's article and get occasionally asked about an implementation. Here is a simple one in C:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
typedef struct {
int one;
int two;
} Pair;
Pair best_ll = { 0, 0 };
Pair best_ur = { -1, -1 };
int best_area = 0;
int *c; /* Cache */
Pair *s; /* Stack */
int top = 0; /* Top of stack */
void push(int a, int b) {
s[top].one = a;
s[top].two = b;
++top;
}
void pop(int *a, int *b) {
--top;
*a = s[top].one;
*b = s[top].two;
}
int M, N; /* Dimension of input; M is length of a row. */
void update_cache() {
int m;
char b;
for (m = 0; m!=M; ++m) {
scanf(" %c", &b);
fprintf(stderr, " %c", b);
if (b=='0') {
c[m] = 0;
} else { ++c[m]; }
}
fprintf(stderr, "\n");
}
int main() {
int m, n;
scanf("%d %d", &M, &N);
fprintf(stderr, "Reading %dx%d array (1 row == %d elements)\n", M, N, M);
c = (int*)malloc((M+1)*sizeof(int));
s = (Pair*)malloc((M+1)*sizeof(Pair));
for (m = 0; m!=M+1; ++m) { c[m] = s[m].one = s[m].two = 0; }
/* Main algorithm: */
for (n = 0; n!=N; ++n) {
int open_width = 0;
update_cache();
for (m = 0; m!=M+1; ++m) {
if (c[m]>open_width) { /* Open new rectangle? */
push(m, open_width);
open_width = c[m];
} else /* "else" optional here */
if (c[m]<open_width) { /* Close rectangle(s)? */
int m0, w0, area;
do {
pop(&m0, &w0);
area = open_width*(m-m0);
if (area>best_area) {
best_area = area;
best_ll.one = m0; best_ll.two = n;
best_ur.one = m-1; best_ur.two = n-open_width+1;
}
open_width = w0;
} while (c[m]<open_width);
open_width = c[m];
if (open_width!=0) {
push(m0, w0);
}
}
}
}
fprintf(stderr, "The maximal rectangle has area %d.\n", best_area);
fprintf(stderr, "Location: [col=%d, row=%d] to [col=%d, row=%d]\n",
best_ll.one+1, best_ll.two+1, best_ur.one+1, best_ur.two+1);
return 0;
}
It takes its input from the console. You could e.g. pipe this file to it:
16 12
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 1 1 0 0 1 0 0 0 1 1 0 1 0
0 0 0 1 1 0 1 1 1 0 1 1 1 0 1 0
0 0 0 0 1 1 * * * * * * 0 0 1 0
0 0 0 0 0 0 * * * * * * 0 0 1 0
0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 0
0 0 1 0 0 0 0 1 0 0 1 1 1 0 1 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0
And after printing its input, it will output:
The maximal rectangle has area 12.
Location: [col=7, row=6] to [col=12, row=5]
The implementation above is nothing fancy of course, but it's very close to the explanation in the Dr. Dobb's article and should be easy to translate to whatever is needed.

#lassevk
I found the referenced article, from DDJ: The Maximal Rectangle Problem

I am the author of the Maximal Rectangle Solution on LeetCode, which is what this answer is based on.
Since the stack-based solution has already been discussed in the other answers, I would like to present an optimal O(NM) dynamic programming solution which originates from user morrischen2008.
Intuition
Imagine an algorithm where for each point we computed a rectangle by doing the following:
Finding the maximum height of the rectangle by iterating upwards until a filled area is reached
Finding the maximum width of the rectangle by iterating outwards left and right until a height that doesn't accommodate the maximum height of the rectangle
For example finding the rectangle defined by the yellow point:
We know that the maximal rectangle must be one of the rectangles constructed in this manner (the max rectangle must have a point on its base where the next filled square is height above that point).
For each point we define some variables:
h - the height of the rectangle defined by that point
l - the left bound of the rectangle defined by that point
r - the right bound of the rectangle defined by that point
These three variables uniquely define the rectangle at that point. We can compute the area of this rectangle with h * (r - l). The global maximum of all these areas is our result.
Using dynamic programming, we can use the h, l, and r of each point in the previous row to compute the h, l, and r for every point in the next row in linear time.
Algorithm
Given row matrix[i], we keep track of the h, l, and r of each point in the row by defining three arrays - height, left, and right.
height[j] will correspond to the height of matrix[i][j], and so on and so forth with the other arrays.
The question now becomes how to update each array.
height
h is defined as the number of continuous unfilled spaces in a line from our point. We increment if there is a new space, and set it to zero if the space is filled (we are using '1' to indicate an empty space and '0' as a filled one).
new_height[j] = old_height[j] + 1 if row[j] == '1' else 0
left:
Consider what causes changes to the left bound of our rectangle. Since all instances of filled spaces occurring in the row above the current one have already been factored into the current version of left, the only thing that affects our left is if we encounter a filled space in our current row.
As a result we can define:
new_left[j] = max(old_left[j], cur_left)
cur_left is one greater than rightmost filled space we have encountered. When we "expand" the rectangle to the left, we know it can't expand past that point, otherwise it'll run into the filled space.
right:
Here we can reuse our reasoning in left and define:
new_right[j] = min(old_right[j], cur_right)
cur_right is the leftmost occurrence of a filled space we have encountered.
Implementation
def maximalRectangle(matrix):
if not matrix: return 0
m = len(matrix)
n = len(matrix[0])
left = [0] * n # initialize left as the leftmost boundary possible
right = [n] * n # initialize right as the rightmost boundary possible
height = [0] * n
maxarea = 0
for i in range(m):
cur_left, cur_right = 0, n
# update height
for j in range(n):
if matrix[i][j] == '1': height[j] += 1
else: height[j] = 0
# update left
for j in range(n):
if matrix[i][j] == '1': left[j] = max(left[j], cur_left)
else:
left[j] = 0
cur_left = j + 1
# update right
for j in range(n-1, -1, -1):
if matrix[i][j] == '1': right[j] = min(right[j], cur_right)
else:
right[j] = n
cur_right = j
# update the area
for j in range(n):
maxarea = max(maxarea, height[j] * (right[j] - left[j]))
return maxarea

I implemented the solution of Dobbs in Java.
No warranty for anything.
package com.test;
import java.util.Stack;
public class Test {
public static void main(String[] args) {
boolean[][] test2 = new boolean[][] { new boolean[] { false, true, true, false },
new boolean[] { false, true, true, false }, new boolean[] { false, true, true, false },
new boolean[] { false, true, false, false } };
solution(test2);
}
private static class Point {
public Point(int x, int y) {
this.x = x;
this.y = y;
}
public int x;
public int y;
}
public static int[] updateCache(int[] cache, boolean[] matrixRow, int MaxX) {
for (int m = 0; m < MaxX; m++) {
if (!matrixRow[m]) {
cache[m] = 0;
} else {
cache[m]++;
}
}
return cache;
}
public static void solution(boolean[][] matrix) {
Point best_ll = new Point(0, 0);
Point best_ur = new Point(-1, -1);
int best_area = 0;
final int MaxX = matrix[0].length;
final int MaxY = matrix.length;
Stack<Point> stack = new Stack<Point>();
int[] cache = new int[MaxX + 1];
for (int m = 0; m != MaxX + 1; m++) {
cache[m] = 0;
}
for (int n = 0; n != MaxY; n++) {
int openWidth = 0;
cache = updateCache(cache, matrix[n], MaxX);
for (int m = 0; m != MaxX + 1; m++) {
if (cache[m] > openWidth) {
stack.push(new Point(m, openWidth));
openWidth = cache[m];
} else if (cache[m] < openWidth) {
int area;
Point p;
do {
p = stack.pop();
area = openWidth * (m - p.x);
if (area > best_area) {
best_area = area;
best_ll.x = p.x;
best_ll.y = n;
best_ur.x = m - 1;
best_ur.y = n - openWidth + 1;
}
openWidth = p.y;
} while (cache[m] < openWidth);
openWidth = cache[m];
if (openWidth != 0) {
stack.push(p);
}
}
}
}
System.out.printf("The maximal rectangle has area %d.\n", best_area);
System.out.printf("Location: [col=%d, row=%d] to [col=%d, row=%d]\n", best_ll.x + 1, best_ll.y + 1,
best_ur.x + 1, best_ur.y + 1);
}
}

#lassevk
// 4. Outer double-for-loop to consider all possible positions
// for topleft corner.
for (int i=0; i < M; i++) {
for (int j=0; j < N; j++) {
// 2.1 With (i,j) as topleft, consider all possible bottom-right corners.
for (int a=i; a < M; a++) {
for (int b=j; b < N; b++) {
HAHA... O(m2 n2).. That's probably what I would have come up with.
I see they go on to develop optmizations... looks good, I'll have a read.

Implementation of the stack-based algorithm in plain Javascript (with linear time complexity):
function maxRectangle(mask) {
var best = {area: 0}
const width = mask[0].length
const depth = Array(width).fill(0)
for (var y = 0; y < mask.length; y++) {
const ranges = Array()
for (var x = 0; x < width; x++) {
const d = depth[x] = mask[y][x] ? depth[x] + 1 : 0
if (!ranges.length || ranges[ranges.length - 1].height < d) {
ranges.push({left: x, height: d})
} else {
for (var j = ranges.length - 1; j >= 0 && ranges[j].height >= d; j--) {
const {left, height} = ranges[j]
const area = (x - left) * height
if (area > best.area) {
best = {area, left, top: y + 1 - height, right: x, bottom: y + 1}
}
}
ranges.splice(j+2)
ranges[j+1].height = d
}
}
}
return best;
}
var example = [
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0],
[0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0],
[0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]]
console.log(maxRectangle(example))

Hungarian Algorithm: finding minimum number of lines to cover zeroes?

I am trying to implement the Hungarian Algorithm but I am stuck on the step 5. Basically, given a n X n matrix of numbers, how can I find minimum number of vertical+horizontal lines such that the zeroes in the matrix are covered?
Before someone marks this question as a duplicate of this, the solution mentioned there is incorrect and someone else also ran into the bug in the code posted there.
I am not looking for code but rather the concept by which I can draw these lines...
EDIT:
Please do not post the simple (but wrong) greedy algorithm:
Given this input:
(0, 1, 0, 1, 1)
(1, 1, 0, 1, 1)
(1, 0, 0, 0, 1)
(1, 1, 0, 1, 1)
(1, 0, 0, 1, 0)
I select, column 2 obviously (0-indexed):
(0, 1, x, 1, 1)
(1, 1, x, 1, 1)
(1, 0, x, 0, 1)
(1, 1, x, 1, 1)
(1, 0, x, 1, 0)
Now I can either select row 2 or col 1 both of which have two "remaining" zeroes. If I select col2, I end up with incorrect solution down this path:
(0, x, x, 1, 1)
(1, x, x, 1, 1)
(1, x, x, 0, 1)
(1, x, x, 1, 1)
(1, x, x, 1, 0)
The correct solution is using 4 lines:
(x, x, x, x, x)
(1, 1, x, 1, 1)
(x, x, x, x, x)
(1, 1, x, 1, 1)
(x, x, x, x, x)

Update
I have implemented the Hungarian Algorithm in the same steps provided by the link you posted: Hungarian algorithm
Here's the files with comments:
Github
Algorithm (Improved greedy) for step 3: (This code is very detailed and good for understanding the concept of choosing line to draw: horizontal vs Vertical. But note that this step code is improved in my code in Github)
Calculate the max number of zeros vertically vs horizontally for each xy position in the input matrix and store the result in a separate array called m2.
While calculating, if horizontal zeros > vertical zeroes, then the calculated number is converted to negative. (just to distinguish which direction we chose for later use)
Loop through all elements in the m2 array. If the value is positive, draw a vertical line in array m3, if value is negative, draw an horizontal line in m3
Follow the below example + code to understand more the algorithm:
Create 3 arrays:
m1: First array, holds the input values
m2: Second array, holds maxZeroes(vertical,horizontal) at each x,y position
m3: Third array, holds the final lines (0 index uncovered, 1 index covered)
Create 2 functions:
hvMax(m1,row,col); returns maximum number of zeroes horizontal or vertical. (Positive number means vertical, negative number means horizontal)
clearNeighbours(m2, m3,row,col); void method, it will clear the horizontal neighbors if the value at row col indexes is negative, or clear vertical neighbors if positive. Moreover, it will set the line in the m3 array, by flipping the zero bit to 1.
Code
public class Hungarian {
public static void main(String[] args) {
// m1 input values
int[][] m1 = { { 0, 1, 0, 1, 1 }, { 1, 1, 0, 1, 1 }, { 1, 0, 0, 0, 1 },
{ 1, 1, 0, 1, 1 }, { 1, 0, 0, 1, 0 } };
// int[][] m1 = { {13,14,0,8},
// {40,0,12,40},
// {6,64,0,66},
// {0,1,90,0}};
// int[][] m1 = { {0,0,100},
// {50,100,0},
// {0,50,50}};
// m2 max(horizontal,vertical) values, with negative number for
// horizontal, positive for vertical
int[][] m2 = new int[m1.length][m1.length];
// m3 where the line are drawen
int[][] m3 = new int[m1.length][m1.length];
// loop on zeroes from the input array, and sotre the max num of zeroes
// in the m2 array
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
if (m1[row][col] == 0)
m2[row][col] = hvMax(m1, row, col);
}
}
// print m1 array (Given input array)
System.out.println("Given input array");
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
System.out.print(m1[row][col] + "\t");
}
System.out.println();
}
// print m2 array
System.out
.println("\nm2 array (max num of zeroes from horizontal vs vertical) (- for horizontal and + for vertical)");
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
System.out.print(m2[row][col] + "\t");
}
System.out.println();
}
// Loop on m2 elements, clear neighbours and draw the lines
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
if (Math.abs(m2[row][col]) > 0) {
clearNeighbours(m2, m3, row, col);
}
}
}
// prinit m3 array (Lines array)
System.out.println("\nLines array");
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
System.out.print(m3[row][col] + "\t");
}
System.out.println();
}
}
// max of vertical vs horizontal at index row col
public static int hvMax(int[][] m1, int row, int col) {
int vertical = 0;
int horizontal = 0;
// check horizontal
for (int i = 0; i < m1.length; i++) {
if (m1[row][i] == 0)
horizontal++;
}
// check vertical
for (int i = 0; i < m1.length; i++) {
if (m1[i][col] == 0)
vertical++;
}
// negative for horizontal, positive for vertical
return vertical > horizontal ? vertical : horizontal * -1;
}
// clear the neighbors of the picked largest value, the sign will let the
// app decide which direction to clear
public static void clearNeighbours(int[][] m2, int[][] m3, int row, int col) {
// if vertical
if (m2[row][col] > 0) {
for (int i = 0; i < m2.length; i++) {
if (m2[i][col] > 0)
m2[i][col] = 0; // clear neigbor
m3[i][col] = 1; // draw line
}
} else {
for (int i = 0; i < m2.length; i++) {
if (m2[row][i] < 0)
m2[row][i] = 0; // clear neigbor
m3[row][i] = 1; // draw line
}
}
m2[row][col] = 0;
m3[row][col] = 1;
}
}
Output
Given input array
0 1 0 1 1
1 1 0 1 1
1 0 0 0 1
1 1 0 1 1
1 0 0 1 0
m2 array (max num of zeroes from horizontal vs vertical) (- for horizontal and + for vertical)
-2 0 5 0 0
0 0 5 0 0
0 -3 5 -3 0
0 0 5 0 0
0 -3 5 0 -3
Lines array
1 1 1 1 1
0 0 1 0 0
1 1 1 1 1
0 0 1 0 0
1 1 1 1 1
PS: Your example that you pointed to, will never occur because as you can see the first loop do the calculations by taking the max(horizontal,vertical) and save them in m2. So col1 will not be selected because -3 means draw horizontal line, and -3 was calculated by taking the max between horizontal vs vertical zeros. So at the first iteration at the elements, the program has checked how to draw the lines, on the second iteration, the program draw the lines.

Greedy algorithms may not work for some cases.
Firstly, it is possible reformulate your problem as following: given a bipartite graph, find a minimum vertex cover. In this problem there are 2n nodes, n for rows and n for columns. There is an edge between two nodes if element at the intersection of corresponding column and row is zero. Vertex cover is a set of nodes (rows and columns) such that each edge is incident to some node from that set (each zero is covered by row or column).
This is a well known problem and can be solved in O(n^3) by finding a maximum matching. Check wikipedia for details

There are cases where Amir's code fails.
Consider the following m1:
0 0 1
0 1 1
1 0 1
The best solution is to draw vertical lines in the first two columns.
Amir's code would give the following m2:
-2 -2 0
2 0 0
0 2 0
And the result would draw the two vertical lines AS WELL AS a line in the first row.
It seems to me the problem is the tie-breaking case:
return vertical > horizontal ? vertical : horizontal * -1;
Because of the way the code is written, the very similar m1 will NOT fail:
0 1 1
1 0 1
0 0 1
Where the first row is moved to the bottom, because the clearing function will clear the -2 values from m2 before those cells are reached. In the first case, the -2 values are hit first, so a horizontal line is drawn through the first row.
I've been working a little through this, and this is what I have. In the case of a tie, do not set any value and do not draw a line through those cells. This covers the case of the matrix I mentioned above, we are done at this step.
Clearly, there are situations where there will remain 0s that are uncovered. Below is another example of a matrix that will fail in Amir's method (m1):
0 0 1 1 1
0 1 0 1 1
0 1 1 0 1
1 1 0 0 1
1 1 1 1 1
The optimal solution is four lines, e.g. the first four columns.
Amir's method gives m2:
3 -2 0 0 0
3 0 -2 0 0
3 0 0 -2 0
0 0 -2 -2 0
0 0 0 0 0
Which will draw lines at the first four rows and the first column (an incorrect solution, giving 5 lines). Again, the tie-breaker case is the issue. We solve this by not setting a value for the ties, and iterating the procedure.
If we ignore the ties we get an m2:
3 -2 0 0 0
3 0 0 0 0
3 0 0 0 0
0 0 0 0 0
0 0 0 0 0
This leads to covering only the first row and the first column. We then take out the 0s that are covered to give new m1:
1 1 1 1 1
1 1 0 1 1
1 1 1 0 1
1 1 0 0 1
1 1 1 1 1
Then we keep repeating the procedure (ignoring ties) until we reach a solution. Repeat for a new m2:
0 0 0 0 0
0 0 2 0 0
0 0 0 2 0
0 0 0 0 0
0 0 0 0 0
Which leads to two vertical lines through the second and third columns. All 0s are now covered, needing only four lines (this is an alternative to lining the first four columns). The above matrix only needs 2 iterations, and I imagine most of these cases will need only two iterations unless there are sets of ties nested within sets of ties. I tried to come up with one, but it became difficult to manage.
Sadly, this is not good enough, because there will be cases that will remain tied forever. Particularly, in cases where there is a 'disjoint set of tied cells'. Not sure how else to describe this except to draw the following two examples:
0 0 1 1
0 1 1 1
1 0 1 1
1 1 1 0
or
0 0 1 1 1
0 1 1 1 1
1 0 1 1 1
1 1 1 0 0
1 1 1 0 0
The upper-left 3x3 sub-matrices in these two examples are identical to my original example, I have added 1 or 2 rows/cols to that example at the bottom and right. The only newly added zeros are where the new rows and columns cross. Describing for clarity.
With the iterative method I described, these matrices will be caught in an infinite loop. The zeros will always remain tied (col-count vs row-count). At this point, it does make sense to just arbitrarily choose a direction in the case of a tie, at least from what I can imagine.
The only issue I'm running into is setting up the stopping criteria for the loop. I can't assume that 2 iterations is enough (or any n), but I also can't figure out how to detect if a matrix has only infinite loops left within it. I'm still not sure how to describe these disjoint-tied-sets computationally.
Here is the code to do what I have come up with so far (in MATLAB script):
function [Lines, AllRows, AllCols] = FindMinLines(InMat)
%The following code finds the minimum set of lines (rows and columns)
%required to cover all of the true-valued cells in a matrix. If using for
%the Hungarian problem where 'true-values' are equal to zero, make the
%necessary changes. This code is not complete, since it will be caught in
%an infinite loop in the case of disjoint-tied-sets
%If passing in a matrix where 0s are the cells of interest, uncomment the
%next line
%InMat = InMat == 0;
%Assume square matrix
Count = length(InMat);
Lines = zeros(Count);
%while there are any 'true' values not covered by lines
while any(any(~Lines & InMat))
%Calculate row-wise and col-wise totals of 'trues' not-already-covered
HorzCount = repmat(sum(~Lines & InMat, 2), 1, Count).*(~Lines & InMat);
VertCount = repmat(sum(~Lines & InMat, 1), Count, 1).*(~Lines & InMat);
%Calculate for each cell the difference between row-wise and col-wise
%counts. I.e. row-oriented cells will have a negative number, col-oriented
%cells will have a positive numbers, ties and 'non-trues' will be 0.
%Non-zero values indicate lines to be drawn where orientation is determined
%by sign.
DiffCounts = VertCount - HorzCount;
%find the row and col indices of the lines
HorzIdx = any(DiffCounts < 0, 2);
VertIdx = any(DiffCounts > 0, 1);
%Set the horizontal and vertical indices of the Lines matrix to true
Lines(HorzIdx, :) = true;
Lines(:, VertIdx) = true;
end
%compute index numbers to be returned.
AllRows = [find(HorzIdx); find(DisjTiedRows)];
AllCols = find(VertIdx);
end

Step 5:
The drawing of line in the matrix is evaluated diagonally with a maximum evaluations of the length of the matrix.
Based on http://www.wikihow.com/Use-the-Hungarian-Algorithm with Steps 1 - 8 only.
Run code snippet and see results in console
Console Output
horizontal line (row): {"0":0,"2":2,"4":4}
vertical line (column): {"2":2}
Step 5: Matrix
0 1 0 1 1
1 1 0 1 1
1 0 0 0 1
1 1 0 1 1
1 0 0 1 0
Smallest number in uncovered matrix: 1
Step 6: Matrix
x x x x x
1 1 x 1 1
x x x x x
1 1 x 1 1
x x x x x
JSFiddle: http://jsfiddle.net/jjcosare/6Lpz5gt9/2/
// http://www.wikihow.com/Use-the-Hungarian-Algorithm
var inputMatrix = [
[0, 1, 0, 1, 1],
[1, 1, 0, 1, 1],
[1, 0, 0, 0, 1],
[1, 1, 0, 1, 1],
[1, 0, 0, 1, 0]
];
//var inputMatrix = [
// [10, 19, 8, 15],
// [10, 18, 7, 17],
// [13, 16, 9, 14],
// [12, 19, 8, 18],
// [14, 17, 10, 19]
// ];
var matrix = inputMatrix;
var HungarianAlgorithm = {};
HungarianAlgorithm.step1 = function(stepNumber) {
console.log("Step " + stepNumber + ": Matrix");
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
var sb = "";
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[i][j];
sb += currentNumber + " ";
}
console.log(sb);
}
}
HungarianAlgorithm.step2 = function() {
var largestNumberInMatrix = getLargestNumberInMatrix(matrix);
var rowLength = matrix.length;
var columnLength = matrix[0].length;
var dummyMatrixToAdd = 0;
var isAddColumn = rowLength > columnLength;
var isAddRow = columnLength > rowLength;
if (isAddColumn) {
dummyMatrixToAdd = rowLength - columnLength;
for (var i = 0; i < rowLength; i++) {
for (var j = columnLength; j < (columnLength + dummyMatrixToAdd); j++) {
matrix[i][j] = largestNumberInMatrix;
}
}
} else if (isAddRow) {
dummyMatrixToAdd = columnLength - rowLength;
for (var i = rowLength; i < (rowLength + dummyMatrixToAdd); i++) {
matrix[i] = [];
for (var j = 0; j < columnLength; j++) {
matrix[i][j] = largestNumberInMatrix;
}
}
}
HungarianAlgorithm.step1(2);
console.log("Largest number in matrix: " + largestNumberInMatrix);
function getLargestNumberInMatrix(matrix) {
var largestNumberInMatrix = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[i][j];
largestNumberInMatrix = (largestNumberInMatrix > currentNumber) ?
largestNumberInMatrix : currentNumber;
}
}
return largestNumberInMatrix;
}
}
HungarianAlgorithm.step3 = function() {
var smallestNumberInRow = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
smallestNumberInRow = getSmallestNumberInRow(matrix, i);
console.log("Smallest number in row[" + i + "]: " + smallestNumberInRow);
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[i][j];
matrix[i][j] = currentNumber - smallestNumberInRow;
}
}
HungarianAlgorithm.step1(3);
function getSmallestNumberInRow(matrix, rowIndex) {
var smallestNumberInRow = matrix[rowIndex][0];
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[rowIndex][i];
smallestNumberInRow = (smallestNumberInRow < currentNumber) ?
smallestNumberInRow : currentNumber;
}
return smallestNumberInRow;
}
}
HungarianAlgorithm.step4 = function() {
var smallestNumberInColumn = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
smallestNumberInColumn = getSmallestNumberInColumn(matrix, i);
console.log("Smallest number in column[" + i + "]: " + smallestNumberInColumn);
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[j][i];
matrix[j][i] = currentNumber - smallestNumberInColumn;
}
}
HungarianAlgorithm.step1(4);
function getSmallestNumberInColumn(matrix, columnIndex) {
var smallestNumberInColumn = matrix[0][columnIndex];
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[i][columnIndex];
smallestNumberInColumn = (smallestNumberInColumn < currentNumber) ?
smallestNumberInColumn : currentNumber;
}
return smallestNumberInColumn;
}
}
var rowLine = {};
var columnLine = {};
HungarianAlgorithm.step5 = function() {
var zeroNumberCountRow = 0;
var zeroNumberCountColumn = 0;
rowLine = {};
columnLine = {};
for (var i = 0; i < matrix.length; i++) {
zeroNumberCountRow = getZeroNumberCountInRow(matrix, i);
zeroNumberCountColumn = getZeroNumberCountInColumn(matrix, i);
if (zeroNumberCountRow > zeroNumberCountColumn) {
rowLine[i] = i;
if (zeroNumberCountColumn > 1) {
columnLine[i] = i;
}
} else if (zeroNumberCountRow < zeroNumberCountColumn) {
columnLine[i] = i;
if (zeroNumberCountRow > 1) {
rowLine[i] = i;
}
} else {
if ((zeroNumberCountRow + zeroNumberCountColumn) > 2) {
rowLine[i] = i;
columnLine[i] = i;
}
}
}
var zeroCount = 0;
for (var i in columnLine) {
zeroCount = getZeroNumberCountInColumnLine(matrix, columnLine[i], rowLine);
if (zeroCount == 0) {
delete columnLine[i];
}
}
for (var i in rowLine) {
zeroCount = getZeroNumberCountInRowLine(matrix, rowLine[i], columnLine);
if (zeroCount == 0) {
delete rowLine[i];
}
}
console.log("horizontal line (row): " + JSON.stringify(rowLine));
console.log("vertical line (column): " + JSON.stringify(columnLine));
HungarianAlgorithm.step1(5);
//if ((Object.keys(rowLine).length + Object.keys(columnLine).length) == matrix.length) {
// TODO:
// HungarianAlgorithm.step9();
//} else {
// HungarianAlgorithm.step6();
// HungarianAlgorithm.step7();
// HungarianAlgorithm.step8();
//}
function getZeroNumberCountInColumnLine(matrix, columnIndex, rowLine) {
var zeroNumberCount = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[i][columnIndex];
if (currentNumber == 0 && !(rowLine[i] == i)) {
zeroNumberCount++
}
}
return zeroNumberCount;
}
function getZeroNumberCountInRowLine(matrix, rowIndex, columnLine) {
var zeroNumberCount = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[rowIndex][i];
if (currentNumber == 0 && !(columnLine[i] == i)) {
zeroNumberCount++
}
}
return zeroNumberCount;
}
function getZeroNumberCountInColumn(matrix, columnIndex) {
var zeroNumberCount = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[i][columnIndex];
if (currentNumber == 0) {
zeroNumberCount++
}
}
return zeroNumberCount;
}
function getZeroNumberCountInRow(matrix, rowIndex) {
var zeroNumberCount = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[rowIndex][i];
if (currentNumber == 0) {
zeroNumberCount++
}
}
return zeroNumberCount;
}
}
HungarianAlgorithm.step6 = function() {
var smallestNumberInUncoveredMatrix = getSmallestNumberInUncoveredMatrix(matrix, rowLine, columnLine);
console.log("Smallest number in uncovered matrix: " + smallestNumberInUncoveredMatrix);
var columnIndex = 0;
for (var i in columnLine) {
columnIndex = columnLine[i];
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[i][columnIndex];
//matrix[i][columnIndex] = currentNumber + smallestNumberInUncoveredMatrix;
matrix[i][columnIndex] = "x";
}
}
var rowIndex = 0;
for (var i in rowLine) {
rowIndex = rowLine[i];
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[rowIndex][i];
//matrix[rowIndex][i] = currentNumber + smallestNumberInUncoveredMatrix;
matrix[rowIndex][i] = "x";
}
}
HungarianAlgorithm.step1(6);
function getSmallestNumberInUncoveredMatrix(matrix, rowLine, columnLine) {
var smallestNumberInUncoveredMatrix = null;;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
if (rowLine[i]) {
continue;
}
for (var j = 0; j < matrix[i].length; j++) {
if (columnLine[j]) {
continue;
}
currentNumber = matrix[i][j];
if (!smallestNumberInUncoveredMatrix) {
smallestNumberInUncoveredMatrix = currentNumber;
}
smallestNumberInUncoveredMatrix =
(smallestNumberInUncoveredMatrix < currentNumber) ?
smallestNumberInUncoveredMatrix : currentNumber;
}
}
return smallestNumberInUncoveredMatrix;
}
}
HungarianAlgorithm.step7 = function() {
var smallestNumberInMatrix = getSmallestNumberInMatrix(matrix);
console.log("Smallest number in matrix: " + smallestNumberInMatrix);
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[j][i];
matrix[j][i] = currentNumber - smallestNumberInMatrix;
}
}
HungarianAlgorithm.step1(7);
function getSmallestNumberInMatrix(matrix) {
var smallestNumberInMatrix = matrix[0][0];
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[i][j];
smallestNumberInMatrix = (smallestNumberInMatrix < currentNumber) ?
smallestNumberInMatrix : currentNumber;
}
}
return smallestNumberInMatrix;
}
}
HungarianAlgorithm.step8 = function() {
console.log("Step 8: Covering zeroes with Step 5 - 8 until Step 9 is reached");
HungarianAlgorithm.step5();
}
HungarianAlgorithm.step9 = function(){
console.log("Step 9...");
}
HungarianAlgorithm.step1(1);
HungarianAlgorithm.step2();
HungarianAlgorithm.step3();
HungarianAlgorithm.step4();
HungarianAlgorithm.step5();
HungarianAlgorithm.step6();

Do the assignment using the steps mentioned below:
assign a row if it has only one 0, else skip the row temporarily
cross out the 0's in the assigned column
Do the same for every column
After doing the assignment using the above steps, follow the steps below to get the minimum number of lines which cover all the 0's
step 1 - Tick an unassigned row
step 2 - If a ticked row has a 0, then tick the corresponding column
step 3 - If a ticked column has an assignment, then tick the corresponding row
step 4 - Repeat steps 2 and 3, till no more ticking is possible
step 5 - Draw lines through un-ticked rows and ticked columns
For your case: (0-indexing for rows and columns)
skip row 0, as it has two 0's
assign row 1, and cross out all the 0's in column 2
skip row 2, as it has two uncrossed 0's
skip row 3, as it has no uncrossed 0
skip row 4, as it has 2 uncrossed 0's
assign column 0
skip column 1 as it has two uncrossed 0's (in row-2 and row-4)
skip column 2, as it has an already assigned 0
assign column 3,and cross out the 0 in row 2
assign column 4, and cross out the 0 in row 4
assigned 0's are shown by '_' and 'x' shows crossed out 0's
( _ 1 x 1 1 ),
( 1 1 _ 1 1 ),
( 1 x x _ 1 ),
( 1 1 x 1 1 ),
( 1 x x 1 _ )
The matrix looks like the one shown above after doing the assignments
Now follow the 5 steps mentioned above to get the minimum number of lines that cover all the 0's
Tick row 3 as it is not assigned yet
Since row 3 has a 0 in column 2, tick column 2
Since column 2 has an assignment in row 1, tick row 1
Now draw lines through un-ticked rows (i.e. row 0,2,4) and ticked columns(i.e. column 2)
These 4 lines will cover all the 0's
Hope this helps:)
PS : For cases where no initial assignment is possible due to multiple 0's in each row and column, this could be handled by taking one arbitrary assignment (For the cases where multiple 0's are present in each row and column, it is very likely that more than one possible assignment would result in an optimal solution)

#CMPS answer fails on quite a few graphs. I think I have implemented a solution which solves the problem.
I followed the Wikipedia article on the Hungarian algorithm and I made an implementation that seems to work all the time.
From Wikipedia, here is a the method to draw the minimum number of lines:
First, assign as many tasks as possible.
Mark all rows having no assignments.
Mark all (unmarked) columns having zeros in newly marked row(s).
Mark all rows having assignments in newly marked columns.
Repeat for all non-assigned rows.
Here is my Ruby implementation:
def draw_lines grid
#copies the array
marking_grid = grid.map { |a| a.dup }
marked_rows = Array.new
marked_cols = Array.new
while there_is_zero(marking_grid) do
marking_grid = grid.map { |a| a.dup }
marked_cols.each do |col|
cross_out(marking_grid,nil, col)
end
marked = assignment(grid, marking_grid)
marked_rows = marked[0]
marked_cols.concat(marked[1]).uniq!
marking_grid = grid.map { |a| a.dup }
marking_grid.length.times do |row|
if !(marked_rows.include? row) then
cross_out(marking_grid,row, nil)
end
end
marked_cols.each do |col|
cross_out(marking_grid,nil, col)
end
end
lines = Array.new
marked_cols.each do |index|
lines.push(["column", index])
end
grid.each_index do |index|
if !(marked_rows.include? index) then
lines.push(["row", index])
end
end
return lines
end
def there_is_zero grid
grid.each_with_index do |row|
row.each_with_index do |value|
if value == 0 then
return true
end
end
end
return false
end
def assignment grid, marking_grid
marking_grid.each_index do |row_index|
first_zero = marking_grid[row_index].index(0)
#if there is no zero go to next row
if first_zero.nil? then
next
else
cross_out(marking_grid, row_index, first_zero)
marking_grid[row_index][first_zero] = "*"
end
end
return mark(grid, marking_grid)
end
def mark grid, marking_grid, marked_rows = Array.new, marked_cols = Array.new
marking_grid.each_with_index do |row, row_index|
selected_assignment = row.index("*")
if selected_assignment.nil? then
marked_rows.push(row_index)
end
end
marked_rows.each do |index|
grid[index].each_with_index do |cost, col_index|
if cost == 0 then
marked_cols.push(col_index)
end
end
end
marked_cols = marked_cols.uniq
marked_cols.each do |col_index|
marking_grid.each_with_index do |row, row_index|
if row[col_index] == "*" then
marked_rows.push(row_index)
end
end
end
return [marked_rows, marked_cols]
end
def cross_out(marking_grid, row, col)
if col != nil then
marking_grid.each_index do |i|
marking_grid[i][col] = "X"
end
end
if row != nil then
marking_grid[row].map! {|i| "X"}
end
end
grid = [
[0,0,1,0],
[0,0,1,0],
[0,1,1,1],
[0,1,1,1],
]
p draw_lines(grid)

Given a 2D array of numbers, find clusters

Given a 2D array, for example:
0 0 0 0 0
0 2 3 0 1
0 8 5 0 7
7 0 0 0 4
Output should be groups of clusters:
Cluster 1: <2,3,8,5,7>
Cluster 2: <1,7,4>

Your problem seems to be finding connected components. You should use a traverse method (either BFS or DFS will do the work). Iterate over all elements, for each non-zero element start a traverse and record all elements you see in that traverse and turn each visited element into zero. Something like the code below:
void DFS(int x, int y)
{
printf("%d ", g[x][y]);
g[x][y] = 0;
// iterate over neighbours
for(dx=-1; dx<=1; dx++)
for(dy=-1; dy<=1; dy++)
if (g[x+dx][y+dy]) DFS(x+dx, y+dy);
}
for(i=0; i<n; i++)
for(j=0; j<n; j++)
if (g[i][j])
{
DFS(i, j);
printf("\n");
}

You want to do Connected Component Labeling. This is usually considered an image processing algorithm, but it matches what you describe.
You will also get recommendations of K-means because you said 2D array of numbers and it is easy to interpret that as array of 2D numbers. K-means finds clusters of points in a plane, not connected groups in a 2D array like you request.

Code sample:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Practice
{
class Program
{
static void Main()
{
var matrix = new[] { new StringBuilder("00000"), new StringBuilder("02301"), new StringBuilder("08507"), new StringBuilder("70004") };
var clusters = 0;
for (var i = 0; i < matrix.Length; i++)
for (var j = 0; j < (matrix.Any() ? matrix[i].Length : 0); j++)
if (matrix[i][j] != '0')
{
Console.Write("Cluster {0}: <", ++clusters);
var cluster = new List<char>();
Traverse(matrix, i, j, ref cluster);
Console.WriteLine("{0}>", string.Join(",", cluster));
}
Console.ReadKey();
}
private static void Traverse(StringBuilder[] matrix, int row, int col, ref List<char> cluster)
{
cluster.Add(matrix[row][col]);
matrix[row][col] = '0';
for (var i = -1; i <= 1 && row + i >= 0 && row + i < matrix.Length; i++)
for (var j = -1; j <= 1 && col + j >= 0 && col + j < (matrix.Any() ? matrix[row + i].Length : 0); j++)
if(matrix[row + i][col + j] != '0')
Traverse(matrix, row + i, col + j, ref cluster);
}
}
}
Algorithm explanation:
For each row:
For each column in that row:
If the item is an unvisited non-zero:
Save the found cluster member;
Mark location at [row, column] as visited;
Check for any unvisited non-zero neighbors while staying in-bounds of the matrix:
[row - 1, column - 1];
[row - 1, column];
[row - 1, column + 1];
[row, column - 1];
[row, column + 1];
[row + 1, column - 1];
[row + 1, column];
[row + 1, column + 1].
If any neighbor is an unvisited non-zero repeat steps 1-4 recursively until all neighbors are visited zeros (all cluster members have been found).

One way to do it is with a graph. Traverse the matrix in some order (I'd go left to right, top to bottom). When you encounter a non-zero element, add it to the graph. Then check all of its neighbors (it looks like you want 8-connected neighbors), and for each one that is non-zero, add its node to the graph, and connect it to the current element. The elements in the graph will probably have to keep track of their coordinates so you can see if you're adding a duplicate or not. When you're done traversing the matrix, you have a graph which contains a set of clusters. Clusters should be sub-graphs of connected elements.

If you know the number of groups or want to fit your data to a static number of groups, you can do k-means.
http://en.wikipedia.org/wiki/K-means_clustering

Using DFS:
import math
import numpy as np
def findConnectivityDFS(matrix):
rows = matrix.shape[0]
cols = matrix.shape[1]
finalResult = {}
# For 4 - connectivity use the following:
dx = [+1, 0, -1, 0]
dy = [0, +1, 0, -1]
# For 8 - connectivity use the following:
# dx = [-1, 0, 1,-1 ,1 ,-1,0,1]
# dy = [-1,-1,-1,-0 ,0 ,1, 1,1]
label = np.zeros((rows, cols))
def dfs(x, y, current_label):
# Taking care of boundaries and visited elements and zero elements in original matrix
if x < 0 or x == rows or y < 0 or y == cols or label[x][y] != 0 or matrix[x][y] == 0:
return
label[x][y] = current_label
if current_label not in finalResult.keys():
finalResult[current_label] = []
finalResult[current_label].append(matrix[x][y])
for direction in range(len(dx)):
dfs(x + dx[direction], y + dy[direction], current_label)
component = 0
for i in range(rows):
for j in range(cols):
if (label[i][j] == 0 and matrix[i][j] != 0):
component = component + 1
dfs(i, j, component)
return finalResult
BFS solution:
def findConnectivityBFS(matrix):
dx = [+1, 0, -1, 0]
dy = [0, +1, 0, -1]
h, w = matrix.shape
label = np.zeros((h, w))
clusters = {}
component = 1
for i in range(h):
for j in range(w):
if matrix[i][j] != 0 and label[i][j] == 0:
cluster = []
queue = [(i, j, component)]
while queue:
x, y, component = queue.pop(0)
if 0 <= x < h and 0 <= y < w and label[x][y] == 0 and matrix[x][y] != 0:
label[x][y] = component
cluster.append(matrix[x][y])
for direction in range(len(dx)):
queue.append([x+dx[direction], y+dy[direction], component])
clusters[component] = cluster
component += 1
return clusters
Result:
matrix = np.array([
[0 ,0, 0, 0,0],
[0, 2, 3, 0, 1],
[0, 8, 5, 0, 7],
[7, 0, 0, 0, 4],
])
print(matrix)
print("============findConnectivityDFS============")
print(findConnectivityDFS(matrix))
print("============findConnectivityBFS============")
print(findConnectivityBFS(matrix))

Puzzle: Find largest rectangle (maximal rectangle problem)

What's the most efficient algorithm to find the rectangle with the largest area which will fit in the empty space?
Let's say the screen looks like this ('#' represents filled area):
....................
..............######
##..................
.................###
.................###
#####...............
#####...............
#####...............
A probable solution is:
....................
..............######
##...++++++++++++...
.....++++++++++++###
.....++++++++++++###
#####++++++++++++...
#####++++++++++++...
#####++++++++++++...
Normally I'd enjoy figuring out a solution. Although this time I'd like to avoid wasting time fumbling around on my own since this has a practical use for a project I'm working on. Is there a well-known solution?
Shog9 wrote:
Is your input an array (as implied by the other responses), or a list of occlusions in the form of arbitrarily sized, positioned rectangles (as might be the case in a windowing system when dealing with window positions)?
Yes, I have a structure which keeps track of a set of windows placed on the screen. I also have a grid which keeps track of all the areas between each edge, whether they are empty or filled, and the pixel position of their left or top edge. I think there is some modified form which would take advantage of this property. Do you know of any?

I'm the author of that Dr. Dobb's article and get occasionally asked about an implementation. Here is a simple one in C:
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
typedef struct {
int one;
int two;
} Pair;
Pair best_ll = { 0, 0 };
Pair best_ur = { -1, -1 };
int best_area = 0;
int *c; /* Cache */
Pair *s; /* Stack */
int top = 0; /* Top of stack */
void push(int a, int b) {
s[top].one = a;
s[top].two = b;
++top;
}
void pop(int *a, int *b) {
--top;
*a = s[top].one;
*b = s[top].two;
}
int M, N; /* Dimension of input; M is length of a row. */
void update_cache() {
int m;
char b;
for (m = 0; m!=M; ++m) {
scanf(" %c", &b);
fprintf(stderr, " %c", b);
if (b=='0') {
c[m] = 0;
} else { ++c[m]; }
}
fprintf(stderr, "\n");
}
int main() {
int m, n;
scanf("%d %d", &M, &N);
fprintf(stderr, "Reading %dx%d array (1 row == %d elements)\n", M, N, M);
c = (int*)malloc((M+1)*sizeof(int));
s = (Pair*)malloc((M+1)*sizeof(Pair));
for (m = 0; m!=M+1; ++m) { c[m] = s[m].one = s[m].two = 0; }
/* Main algorithm: */
for (n = 0; n!=N; ++n) {
int open_width = 0;
update_cache();
for (m = 0; m!=M+1; ++m) {
if (c[m]>open_width) { /* Open new rectangle? */
push(m, open_width);
open_width = c[m];
} else /* "else" optional here */
if (c[m]<open_width) { /* Close rectangle(s)? */
int m0, w0, area;
do {
pop(&m0, &w0);
area = open_width*(m-m0);
if (area>best_area) {
best_area = area;
best_ll.one = m0; best_ll.two = n;
best_ur.one = m-1; best_ur.two = n-open_width+1;
}
open_width = w0;
} while (c[m]<open_width);
open_width = c[m];
if (open_width!=0) {
push(m0, w0);
}
}
}
}
fprintf(stderr, "The maximal rectangle has area %d.\n", best_area);
fprintf(stderr, "Location: [col=%d, row=%d] to [col=%d, row=%d]\n",
best_ll.one+1, best_ll.two+1, best_ur.one+1, best_ur.two+1);
return 0;
}
It takes its input from the console. You could e.g. pipe this file to it:
16 12
0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0
0 0 0 1 1 0 0 1 0 0 0 1 1 0 1 0
0 0 0 1 1 0 1 1 1 0 1 1 1 0 1 0
0 0 0 0 1 1 * * * * * * 0 0 1 0
0 0 0 0 0 0 * * * * * * 0 0 1 0
0 0 0 0 0 0 1 1 0 1 1 1 1 1 1 0
0 0 1 0 0 0 0 1 0 0 1 1 1 0 1 0
0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0
And after printing its input, it will output:
The maximal rectangle has area 12.
Location: [col=7, row=6] to [col=12, row=5]
The implementation above is nothing fancy of course, but it's very close to the explanation in the Dr. Dobb's article and should be easy to translate to whatever is needed.

#lassevk
I found the referenced article, from DDJ: The Maximal Rectangle Problem

I am the author of the Maximal Rectangle Solution on LeetCode, which is what this answer is based on.
Since the stack-based solution has already been discussed in the other answers, I would like to present an optimal O(NM) dynamic programming solution which originates from user morrischen2008.
Intuition
Imagine an algorithm where for each point we computed a rectangle by doing the following:
Finding the maximum height of the rectangle by iterating upwards until a filled area is reached
Finding the maximum width of the rectangle by iterating outwards left and right until a height that doesn't accommodate the maximum height of the rectangle
For example finding the rectangle defined by the yellow point:
We know that the maximal rectangle must be one of the rectangles constructed in this manner (the max rectangle must have a point on its base where the next filled square is height above that point).
For each point we define some variables:
h - the height of the rectangle defined by that point
l - the left bound of the rectangle defined by that point
r - the right bound of the rectangle defined by that point
These three variables uniquely define the rectangle at that point. We can compute the area of this rectangle with h * (r - l). The global maximum of all these areas is our result.
Using dynamic programming, we can use the h, l, and r of each point in the previous row to compute the h, l, and r for every point in the next row in linear time.
Algorithm
Given row matrix[i], we keep track of the h, l, and r of each point in the row by defining three arrays - height, left, and right.
height[j] will correspond to the height of matrix[i][j], and so on and so forth with the other arrays.
The question now becomes how to update each array.
height
h is defined as the number of continuous unfilled spaces in a line from our point. We increment if there is a new space, and set it to zero if the space is filled (we are using '1' to indicate an empty space and '0' as a filled one).
new_height[j] = old_height[j] + 1 if row[j] == '1' else 0
left:
Consider what causes changes to the left bound of our rectangle. Since all instances of filled spaces occurring in the row above the current one have already been factored into the current version of left, the only thing that affects our left is if we encounter a filled space in our current row.
As a result we can define:
new_left[j] = max(old_left[j], cur_left)
cur_left is one greater than rightmost filled space we have encountered. When we "expand" the rectangle to the left, we know it can't expand past that point, otherwise it'll run into the filled space.
right:
Here we can reuse our reasoning in left and define:
new_right[j] = min(old_right[j], cur_right)
cur_right is the leftmost occurrence of a filled space we have encountered.
Implementation
def maximalRectangle(matrix):
if not matrix: return 0
m = len(matrix)
n = len(matrix[0])
left = [0] * n # initialize left as the leftmost boundary possible
right = [n] * n # initialize right as the rightmost boundary possible
height = [0] * n
maxarea = 0
for i in range(m):
cur_left, cur_right = 0, n
# update height
for j in range(n):
if matrix[i][j] == '1': height[j] += 1
else: height[j] = 0
# update left
for j in range(n):
if matrix[i][j] == '1': left[j] = max(left[j], cur_left)
else:
left[j] = 0
cur_left = j + 1
# update right
for j in range(n-1, -1, -1):
if matrix[i][j] == '1': right[j] = min(right[j], cur_right)
else:
right[j] = n
cur_right = j
# update the area
for j in range(n):
maxarea = max(maxarea, height[j] * (right[j] - left[j]))
return maxarea

I implemented the solution of Dobbs in Java.
No warranty for anything.
package com.test;
import java.util.Stack;
public class Test {
public static void main(String[] args) {
boolean[][] test2 = new boolean[][] { new boolean[] { false, true, true, false },
new boolean[] { false, true, true, false }, new boolean[] { false, true, true, false },
new boolean[] { false, true, false, false } };
solution(test2);
}
private static class Point {
public Point(int x, int y) {
this.x = x;
this.y = y;
}
public int x;
public int y;
}
public static int[] updateCache(int[] cache, boolean[] matrixRow, int MaxX) {
for (int m = 0; m < MaxX; m++) {
if (!matrixRow[m]) {
cache[m] = 0;
} else {
cache[m]++;
}
}
return cache;
}
public static void solution(boolean[][] matrix) {
Point best_ll = new Point(0, 0);
Point best_ur = new Point(-1, -1);
int best_area = 0;
final int MaxX = matrix[0].length;
final int MaxY = matrix.length;
Stack<Point> stack = new Stack<Point>();
int[] cache = new int[MaxX + 1];
for (int m = 0; m != MaxX + 1; m++) {
cache[m] = 0;
}
for (int n = 0; n != MaxY; n++) {
int openWidth = 0;
cache = updateCache(cache, matrix[n], MaxX);
for (int m = 0; m != MaxX + 1; m++) {
if (cache[m] > openWidth) {
stack.push(new Point(m, openWidth));
openWidth = cache[m];
} else if (cache[m] < openWidth) {
int area;
Point p;
do {
p = stack.pop();
area = openWidth * (m - p.x);
if (area > best_area) {
best_area = area;
best_ll.x = p.x;
best_ll.y = n;
best_ur.x = m - 1;
best_ur.y = n - openWidth + 1;
}
openWidth = p.y;
} while (cache[m] < openWidth);
openWidth = cache[m];
if (openWidth != 0) {
stack.push(p);
}
}
}
}
System.out.printf("The maximal rectangle has area %d.\n", best_area);
System.out.printf("Location: [col=%d, row=%d] to [col=%d, row=%d]\n", best_ll.x + 1, best_ll.y + 1,
best_ur.x + 1, best_ur.y + 1);
}
}

#lassevk
// 4. Outer double-for-loop to consider all possible positions
// for topleft corner.
for (int i=0; i < M; i++) {
for (int j=0; j < N; j++) {
// 2.1 With (i,j) as topleft, consider all possible bottom-right corners.
for (int a=i; a < M; a++) {
for (int b=j; b < N; b++) {
HAHA... O(m2 n2).. That's probably what I would have come up with.
I see they go on to develop optmizations... looks good, I'll have a read.

Implementation of the stack-based algorithm in plain Javascript (with linear time complexity):
function maxRectangle(mask) {
var best = {area: 0}
const width = mask[0].length
const depth = Array(width).fill(0)
for (var y = 0; y < mask.length; y++) {
const ranges = Array()
for (var x = 0; x < width; x++) {
const d = depth[x] = mask[y][x] ? depth[x] + 1 : 0
if (!ranges.length || ranges[ranges.length - 1].height < d) {
ranges.push({left: x, height: d})
} else {
for (var j = ranges.length - 1; j >= 0 && ranges[j].height >= d; j--) {
const {left, height} = ranges[j]
const area = (x - left) * height
if (area > best.area) {
best = {area, left, top: y + 1 - height, right: x, bottom: y + 1}
}
}
ranges.splice(j+2)
ranges[j+1].height = d
}
}
}
return best;
}
var example = [
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0],
[0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0],
[1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0],
[0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],
[0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]]
console.log(maxRectangle(example))

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Find largest rectangle containing only zeros in an N×N binary matrix - algorithm

Related

Looking for a optimal solution

compute maximum number of black rectangles in a cutome given black and white board [duplicate]

Hungarian Algorithm: finding minimum number of lines to cover zeroes?

Given a 2D array of numbers, find clusters

Puzzle: Find largest rectangle (maximal rectangle problem)

Categories

Resources