Given a 2D array of numbers, find clusters

Given a 2D array, for example:
0 0 0 0 0
0 2 3 0 1
0 8 5 0 7
7 0 0 0 4
Output should be groups of clusters:
Cluster 1: <2,3,8,5,7>
Cluster 2: <1,7,4>

Your problem is finding connected components. Use a graph traversal (either BFS or DFS will do). Iterate over all elements; for each non-zero element, start a traversal, record every element it reaches, and set each visited element to zero so it is not visited again. Something like the code below:
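/* g is the global grid; rows and cols are its dimensions (assumed globals) */
void DFS(int x, int y)
{
    int dx, dy;
    printf("%d ", g[x][y]);
    g[x][y] = 0; /* mark as visited */
    /* iterate over the 8 neighbours, skipping cells outside the grid */
    for (dx = -1; dx <= 1; dx++)
        for (dy = -1; dy <= 1; dy++) {
            int nx = x + dx, ny = y + dy;
            if (nx >= 0 && nx < rows && ny >= 0 && ny < cols && g[nx][ny])
                DFS(nx, ny);
        }
}
/* in the caller, with int i, j declared: */
for (i = 0; i < rows; i++)
    for (j = 0; j < cols; j++)
        if (g[i][j])
        {
            DFS(i, j);
            printf("\n");
        }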

You want to do Connected Component Labeling. This is usually considered an image processing algorithm, but it matches what you describe.
You will also get recommendations for K-means, because you said "2D array of numbers" and that is easy to misread as "array of 2D points". K-means finds clusters of points in a plane, not connected groups in a 2D array as you request.

Code sample:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace Practice
{
class Program
{
static void Main()
{
var matrix = new[] { new StringBuilder("00000"), new StringBuilder("02301"), new StringBuilder("08507"), new StringBuilder("70004") };
var clusters = 0;
for (var i = 0; i < matrix.Length; i++)
for (var j = 0; j < (matrix.Any() ? matrix[i].Length : 0); j++)
if (matrix[i][j] != '0')
{
Console.Write("Cluster {0}: <", ++clusters);
var cluster = new List<char>();
Traverse(matrix, i, j, ref cluster);
Console.WriteLine("{0}>", string.Join(",", cluster));
}
Console.ReadKey();
}
private static void Traverse(StringBuilder[] matrix, int row, int col, ref List<char> cluster)
{
cluster.Add(matrix[row][col]);
matrix[row][col] = '0';
for (var i = -1; i <= 1; i++)
{
// bounds checks go in the body: in the loop condition they would end the loop at the first out-of-range offset
if (row + i < 0 || row + i >= matrix.Length) continue;
for (var j = -1; j <= 1; j++)
{
if (col + j < 0 || col + j >= matrix[row + i].Length) continue;
if (matrix[row + i][col + j] != '0')
Traverse(matrix, row + i, col + j, ref cluster);
}
}
}
}
}
Algorithm explanation:
For each row:
  For each column in that row:
    If the item is an unvisited non-zero:
      1) Save the found cluster member;
      2) Mark location at [row, column] as visited;
      3) Check for any unvisited non-zero neighbors while staying in-bounds of the matrix:
         [row - 1, column - 1];
         [row - 1, column];
         [row - 1, column + 1];
         [row, column - 1];
         [row, column + 1];
         [row + 1, column - 1];
         [row + 1, column];
         [row + 1, column + 1].
      4) If any neighbor is an unvisited non-zero, repeat steps 1-4 recursively until all neighbors are visited zeros (all cluster members have been found).

One way to do it is with a graph. Traverse the matrix in some order (I'd go left to right, top to bottom). When you encounter a non-zero element, add it to the graph. Then check all of its neighbors (it looks like you want 8-connected neighbors), and for each one that is non-zero, add its node to the graph, and connect it to the current element. The elements in the graph will probably have to keep track of their coordinates so you can see if you're adding a duplicate or not. When you're done traversing the matrix, you have a graph which contains a set of clusters. Clusters should be sub-graphs of connected elements.
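The graph idea above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name clusters_via_graph and the choice of a dict keyed by (row, col) coordinates are assumptions made here, and 8-connected neighbors are assumed as the answer suggests.

```python
from collections import defaultdict


def clusters_via_graph(matrix):
    """Build a graph of non-zero cells, then collect its connected sub-graphs."""
    rows, cols = len(matrix), len(matrix[0])
    # (row, col) -> set of neighbouring (row, col) nodes; keying nodes by
    # coordinates avoids adding the same cell twice
    graph = defaultdict(set)
    for r in range(rows):
        for c in range(cols):
            if matrix[r][c] == 0:
                continue
            graph[(r, c)]  # touching the key registers an isolated node
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if ((dr or dc) and 0 <= nr < rows and 0 <= nc < cols
                            and matrix[nr][nc] != 0):
                        graph[(r, c)].add((nr, nc))
                        graph[(nr, nc)].add((r, c))

    seen, result = set(), []
    for start in sorted(graph):  # left to right, top to bottom
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # iterative DFS over one sub-graph
            node = stack.pop()
            members.append(matrix[node[0]][node[1]])
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(sorted(members))
    return result


grid = [
    [0, 0, 0, 0, 0],
    [0, 2, 3, 0, 1],
    [0, 8, 5, 0, 7],
    [7, 0, 0, 0, 4],
]
print(clusters_via_graph(grid))  # [[2, 3, 5, 7, 8], [1, 4, 7]]
```

Each cluster's values are sorted only to make the output deterministic; drop the sort if traversal order matters.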

If you know the number of groups or want to fit your data to a static number of groups, you can do k-means.
http://en.wikipedia.org/wiki/K-means_clustering

Using DFS:
import numpy as np

def findConnectivityDFS(matrix):
    rows = matrix.shape[0]
    cols = matrix.shape[1]
    finalResult = {}
    # For 4-connectivity use the following:
    dx = [+1, 0, -1, 0]
    dy = [0, +1, 0, -1]
    # For 8-connectivity use the following:
    # dx = [-1, 0, 1, -1, 1, -1, 0, 1]
    # dy = [-1, -1, -1, 0, 0, 1, 1, 1]
    label = np.zeros((rows, cols))

    def dfs(x, y, current_label):
        # Taking care of boundaries and visited elements and zero elements in original matrix
        if x < 0 or x == rows or y < 0 or y == cols or label[x][y] != 0 or matrix[x][y] == 0:
            return
        label[x][y] = current_label
        if current_label not in finalResult:
            finalResult[current_label] = []
        finalResult[current_label].append(matrix[x][y])
        for direction in range(len(dx)):
            dfs(x + dx[direction], y + dy[direction], current_label)

    component = 0
    for i in range(rows):
        for j in range(cols):
            if label[i][j] == 0 and matrix[i][j] != 0:
                component = component + 1
                dfs(i, j, component)
    return finalResult
BFS solution:
from collections import deque

def findConnectivityBFS(matrix):
    dx = [+1, 0, -1, 0]
    dy = [0, +1, 0, -1]
    h, w = matrix.shape
    label = np.zeros((h, w))
    clusters = {}
    component = 1
    for i in range(h):
        for j in range(w):
            if matrix[i][j] != 0 and label[i][j] == 0:
                cluster = []
                queue = deque([(i, j)])
                while queue:
                    x, y = queue.popleft()  # O(1), unlike list.pop(0)
                    if 0 <= x < h and 0 <= y < w and label[x][y] == 0 and matrix[x][y] != 0:
                        label[x][y] = component
                        cluster.append(matrix[x][y])
                        for direction in range(len(dx)):
                            queue.append((x + dx[direction], y + dy[direction]))
                clusters[component] = cluster
                component += 1
    return clusters
Result:
matrix = np.array([
    [0, 0, 0, 0, 0],
    [0, 2, 3, 0, 1],
    [0, 8, 5, 0, 7],
    [7, 0, 0, 0, 4],
])
print(matrix)
print("============findConnectivityDFS============")
print(findConnectivityDFS(matrix))
print("============findConnectivityBFS============")
print(findConnectivityBFS(matrix))

Related

Replace operators of equation, so that the sum is equal to zero

I'm given the equation like this one:
n = 7
1 + 1 - 4 - 4 - 4 - 2 - 2
How can I replace the minimum number of operators so that the expression evaluates to zero, or print -1 if that is impossible? I have thought of one algorithm, but it is not optimal: brute-forcing all cases has complexity O(n*2^n), which is too slow for n < 300.
Here is the link of the problem: http://codeforces.com/gym/100989/problem/M.

You can solve this with dynamic programming. Keep a map of all possible partial sums (mapping each sum to the minimum number of changes needed to reach it), and then update it one number at a time.
Here's a concise Python solution:
def signs(nums):
    xs = {nums[0]: 0}  # partial sum -> minimum number of sign changes
    for num in nums[1:]:
        ys = dict()
        for d, k in xs.items():
            # cost 0 keeps the sign of num, cost 1 flips it
            for cost, n in enumerate([num, -num]):
                ys[d+n] = min(ys.get(d+n, float('inf')), k+cost)
        xs = ys
    return xs.get(0, -1)

print(signs([1, 1, -4, -4, -4, -2, -2]))
In theory this has exponential complexity in the worst case (since the number of partial sums can double at each step). However, if (as here) the given numbers are always (bounded) small ints, then the number of partial sums grows linearly, and the program works in O(n^2) time.
A somewhat more optimised version uses a sorted array of (subtotal, cost) instead of a dict. One can discard partial sums that are too large or too small (making it impossible to end up at 0 assuming all of the remaining elements are between -300 and +300). This runs approximately twice as fast, and is a more natural implementation to port to a lower-level language than Python for maximum speed.
def merge(xs, num):
    # xs is sorted by subtotal; yield the sorted union of xs+|num| and xs-|num|
    i = j = 0
    ci = 0 if num >= 0 else 1  # cost of adding |num|
    cj = 0 if num < 0 else 1   # cost of subtracting |num|
    num = abs(num)
    while j < len(xs):
        if xs[i][0] + num < xs[j][0] - num:
            yield (xs[i][0] + num, xs[i][1] + ci)
            i += 1
        elif xs[i][0] + num > xs[j][0] - num:
            yield (xs[j][0] - num, xs[j][1] + cj)
            j += 1
        else:
            yield (xs[i][0] + num, min(xs[i][1] + ci, xs[j][1] + cj))
            i += 1
            j += 1
    while i < len(xs):
        yield (xs[i][0] + num, xs[i][1] + ci)
        i += 1

def signs2(nums):
    xs = [(nums[0], 0)]
    for i in range(1, len(nums)):
        # subtotals beyond this limit can no longer get back to 0
        limit = (len(nums) - 1 - i) * 300
        xs = [x for x in merge(xs, nums[i]) if -limit <= x[0] <= limit]
    for x, c in xs:
        if x == 0: return c
    return -1

print(signs2([1, 1, -4, -4, -4, -2, -2]))

Here is the implementation in C++:
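#include <algorithm>
#include <unordered_map>
#include <vector>
using namespace std;

// Returns the minimum number of sign changes that makes the sum zero, or -1.
int solve(const vector<int>& a) {
    unordered_map<int, int> M; // partial sum -> minimum number of changes
    M[a[0]] = 0;               // the first number keeps its sign
    for (size_t i = 1; i < a.size(); ++i) {
        unordered_map<int, int> U; // sums reachable after using a[i]
        for (const auto& kv : M) {
            int k = kv.first, d = kv.second;
            // keep the sign of a[i] (no cost) or flip it (one change)
            auto keep = U.emplace(k + a[i], d);
            if (!keep.second) keep.first->second = min(keep.first->second, d);
            auto flip = U.emplace(k - a[i], d + 1);
            if (!flip.second) flip.first->second = min(flip.first->second, d + 1);
        }
        M.swap(U); // drop sums that only belong to the previous prefix
    }
    auto it = M.find(0);
    return it == M.end() ? -1 : it->second;
}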

What I can think of:
Evaluate the original equation. This results in -14.
Now sort the numbers (taking their + or - sign into account).
When the equation evaluates to a negative number, look for the largest numbers to fix it. When a number is too large, skip it.
orig_eq = -14
After sorting:
-4, -4, -4, -2, -2, 1, 1
Loop over this list and select a number if orig_eq minus that number is closer to zero.
This way you select the numbers whose signs should be changed.

Maximum Sum of Product

I have the following problem.
Given N numbers in the range -100..100.
The elements must be rearranged so that the sum of products is maximal.
The sum of products in this task is defined as A1*A2 + A2*A3 + ... + A(N-1)*AN.
For example, given the numbers 10 20 50 40 30,
we can rearrange them the following way:
10, 30, 50, 40, 20, which gives the maximum 10×30+30×50+50×40+40×20=4600
The idea is to sort the sequence, put the max number in the middle of the new sequence, then put the next max number to the right, then to the left, and so on.
But this does not work when there are negative numbers.
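For reference, the positive-only arrangement just described can be sketched like this. The helper names arrange_center_out and sum_of_products are made up for this illustration, and the sketch deliberately does nothing special for negative numbers.

```python
from collections import deque


def arrange_center_out(nums):
    """Largest value in the middle, then alternate right, left, right, ..."""
    ordered = sorted(nums, reverse=True)
    result = deque()
    for i, x in enumerate(ordered):
        if i % 2 == 1:
            result.append(x)      # 2nd, 4th, ... largest go to the right
        else:
            result.appendleft(x)  # 1st, 3rd, ... largest go to the left
    return list(result)


def sum_of_products(seq):
    """A1*A2 + A2*A3 + ... + A(N-1)*AN."""
    return sum(a * b for a, b in zip(seq, seq[1:]))


arranged = arrange_center_out([10, 20, 50, 40, 30])
print(arranged)                   # [10, 30, 50, 40, 20]
print(sum_of_products(arranged))  # 4600
```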
I have tried the following algorithm:
1) sort the initial sequence
2) process the positive numbers and zeros as described above
3) process the negative numbers as described above
4) find the minimum number in the positive sequence (it is either the leftmost or the rightmost element) and attach the processed negative sequence before or after it.
For example, given sequence:
1,-2,3,-4,5,-6,7,-8,9,10,11,12,13,14,15,-16
Expected maximum sum of product is 1342.
My algorithm gives the following rearrangement:
3,7,10,12,14,15,13,11,9,5,1,-4,-8,-16,-6,-2
Sum of product is 1340.
This seems to work, but it does not.
Could you please advise?

Your approach is sound, but you have to separate the positive and negative numbers.
Sort the array and split it into left and right parts, one containing all the negative numbers, and one containing all the non-negative numbers. Rearrange them as you were doing before, with the largest (absolute) values in the middle and decreasing values placed alternately on either side, but make sure that the smallest values in each part are at opposite ends.
Specifically, the negative number with the smallest absolute value should be the last element of the left part, and the non-negative value with the smallest value should be the first element of the right part.
Then concatenate the two parts and calculate the sum of adjacent products.
Here's a worked example:
arr = [2, 3, 5, -6, -2, -5]
arr.sort() = [-6, -5, -2, 2, 3, 5]
left, right = [-5, -6, -2], [2, 5, 3]
max_sum_of_product = -5*-6 + -6*-2 + -2*2 + 2*5 + 5*3 = 63
I don't have a formal proof of correctness, but this method gives the same results as a brute force search over all permutations of the input array:
def max_sum_of_products(arr):
    from itertools import permutations
    n = len(arr)
    ###### brute force method
    max1 = max(sum(a[x-1]*a[x] for x in range(1, n)) for a in permutations(arr))
    ###### split method
    lo, hi = [x for x in arr if x < 0], [x for x in arr if x >= 0]
    lo.sort()
    hi.sort()
    lo_ordered, hi_ordered = [], []
    # negatives: the smallest absolute value must end up last (rightmost)
    t = (len(lo) % 2 == 1)
    for x in lo:
        if t:
            lo_ordered = lo_ordered + [x]
        else:
            lo_ordered = [x] + lo_ordered
        t = not t
    # non-negatives: the smallest value must end up first (leftmost)
    t = (len(hi) % 2 == 0)
    for x in hi[::-1]:
        if t:
            hi_ordered = hi_ordered + [x]
        else:
            hi_ordered = [x] + hi_ordered
        t = not t
    arr = lo_ordered + hi_ordered
    max2 = sum(arr[x-1]*arr[x] for x in range(1, n))
    return (max1, max2)

def test():
    from random import randint
    for i in range(10):
        a = []
        for j in range(randint(4, 9)):
            a = a + [randint(-10, 10)]
        print(a, end=' ')
        (max1, max2) = max_sum_of_products(a)
        if max2 != max1:
            print("bad result :-(")
        else:
            print(max1)

test()

I have written a method in Java that takes the array as input and returns the maximum sum of product pairs as output.
First I compute the negative part, then the positive part, and then return their sum.
While computing the negative part, if the number of negative elements is odd, the remaining element should be left out when possible (it can be multiplied by 0 and nullified), so that it does not lower the sum.
All other negative items are multiplied in pairs and summed.
Then, for the positive part, when we see a 1 we add it if the number of elements is odd; otherwise we simply multiply in pairs and move on.
public static long sum(int arr[]) {
Arrays.sort(arr);
long ans = 0;
long ans1 = 0;
boolean flag = false;
boolean flag2 = false;
int[] arr1 = new int[arr.length];
int[] arr2 = new int[arr.length];
int i = 0;
while (i < arr.length && arr[i] < 0) { // stop at the end if every number is negative
arr1[i] = arr[i];
i++;
}
if (i < arr.length && arr[i] == 0) flag = true;
if (i % 2 == 0) { //even -6,-5,-3,-2,-1
for (int j = 0; j < i - 1; j += 2) {
ans += arr1[j] * arr1[j + 1]; // accumulate each pair instead of overwriting
}
} else {
if (flag) {
for (int j = 0; j < i - 2; j += 2) {
ans += arr1[j] * arr1[j + 1];
}
}
}
int j = 0;
while (i<arr.length) {
arr2[j] = arr[i];
i++;
j++;
}
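if (j < arr2.length && arr2[j] == 1) flag2 = true; // guard: j equals arr.length when there are no negative numbers
if (i % 2 == 0) {
for (int k=i-1; k>0; k-=2) {
ans1 += arr2[k] * arr2[k-1]; // accumulate each pair instead of overwriting
}
if (flag2) ans1 = ans1 + 1;
} else {
for (int k=arr2.length-1; k>1; k-=2) {
ans1 += arr2[k] * arr2[k-1];
}
ans1 = ans1 + arr2[0];
}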
return ans + ans1;
}

Hungarian Algorithm: finding minimum number of lines to cover zeroes?

I am trying to implement the Hungarian Algorithm but I am stuck on step 5. Basically, given an n x n matrix of numbers, how can I find the minimum number of vertical and horizontal lines such that all the zeroes in the matrix are covered?
Before someone marks this question as a duplicate of this, the solution mentioned there is incorrect and someone else also ran into the bug in the code posted there.
I am not looking for code but rather the concept by which I can draw these lines...
EDIT:
Please do not post the simple (but wrong) greedy algorithm:
Given this input:
(0, 1, 0, 1, 1)
(1, 1, 0, 1, 1)
(1, 0, 0, 0, 1)
(1, 1, 0, 1, 1)
(1, 0, 0, 1, 0)
I select column 2, obviously (0-indexed):
(0, 1, x, 1, 1)
(1, 1, x, 1, 1)
(1, 0, x, 0, 1)
(1, 1, x, 1, 1)
(1, 0, x, 1, 0)
Now I can select either row 2 or col 1, both of which have two "remaining" zeroes. If I select col 1, I end up with an incorrect solution down this path:
(0, x, x, 1, 1)
(1, x, x, 1, 1)
(1, x, x, 0, 1)
(1, x, x, 1, 1)
(1, x, x, 1, 0)
The correct solution is using 4 lines:
(x, x, x, x, x)
(1, 1, x, 1, 1)
(x, x, x, x, x)
(1, 1, x, 1, 1)
(x, x, x, x, x)

Update
I have implemented the Hungarian Algorithm following the steps in the link you posted: Hungarian algorithm
The files, with comments, are here:
Github
Algorithm (improved greedy) for step 3: (The code below is very detailed and is good for understanding how to choose which line to draw, horizontal or vertical. Note that this step is further improved in my code on Github.)
1) For each (row, col) position in the input matrix, count the zeros vertically and horizontally, take the maximum, and store it in a separate array called m2.
2) While calculating, if horizontal zeros >= vertical zeros, the stored number is made negative (just to record which direction was chosen, for later use).
3) Loop through all elements of the m2 array. If the value is positive, draw a vertical line in array m3; if the value is negative, draw a horizontal line in m3.
Follow the example and code below to understand the algorithm better:
Create 3 arrays:
m1: First array, holds the input values
m2: Second array, holds maxZeroes(vertical,horizontal) at each x,y position
m3: Third array, holds the final lines (0 index uncovered, 1 index covered)
Create 2 functions:
hvMax(m1,row,col); returns maximum number of zeroes horizontal or vertical. (Positive number means vertical, negative number means horizontal)
clearNeighbours(m2, m3,row,col); void method, it will clear the horizontal neighbors if the value at row col indexes is negative, or clear vertical neighbors if positive. Moreover, it will set the line in the m3 array, by flipping the zero bit to 1.
Code
public class Hungarian {
public static void main(String[] args) {
// m1 input values
int[][] m1 = { { 0, 1, 0, 1, 1 }, { 1, 1, 0, 1, 1 }, { 1, 0, 0, 0, 1 },
{ 1, 1, 0, 1, 1 }, { 1, 0, 0, 1, 0 } };
// int[][] m1 = { {13,14,0,8},
// {40,0,12,40},
// {6,64,0,66},
// {0,1,90,0}};
// int[][] m1 = { {0,0,100},
// {50,100,0},
// {0,50,50}};
// m2 max(horizontal,vertical) values, with negative number for
// horizontal, positive for vertical
int[][] m2 = new int[m1.length][m1.length];
// m3 where the lines are drawn
int[][] m3 = new int[m1.length][m1.length];
// loop over the zeroes of the input array, and store the max num of zeroes
// in the m2 array
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
if (m1[row][col] == 0)
m2[row][col] = hvMax(m1, row, col);
}
}
// print m1 array (Given input array)
System.out.println("Given input array");
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
System.out.print(m1[row][col] + "\t");
}
System.out.println();
}
// print m2 array
System.out
.println("\nm2 array (max num of zeroes from horizontal vs vertical) (- for horizontal and + for vertical)");
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
System.out.print(m2[row][col] + "\t");
}
System.out.println();
}
// Loop on m2 elements, clear neighbours and draw the lines
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
if (Math.abs(m2[row][col]) > 0) {
clearNeighbours(m2, m3, row, col);
}
}
}
// print m3 array (Lines array)
System.out.println("\nLines array");
for (int row = 0; row < m1.length; row++) {
for (int col = 0; col < m1.length; col++) {
System.out.print(m3[row][col] + "\t");
}
System.out.println();
}
}
// max of vertical vs horizontal at index row col
public static int hvMax(int[][] m1, int row, int col) {
int vertical = 0;
int horizontal = 0;
// check horizontal
for (int i = 0; i < m1.length; i++) {
if (m1[row][i] == 0)
horizontal++;
}
// check vertical
for (int i = 0; i < m1.length; i++) {
if (m1[i][col] == 0)
vertical++;
}
// negative for horizontal, positive for vertical
return vertical > horizontal ? vertical : horizontal * -1;
}
// clear the neighbors of the picked largest value, the sign will let the
// app decide which direction to clear
public static void clearNeighbours(int[][] m2, int[][] m3, int row, int col) {
// if vertical
if (m2[row][col] > 0) {
for (int i = 0; i < m2.length; i++) {
if (m2[i][col] > 0)
m2[i][col] = 0; // clear neighbor
m3[i][col] = 1; // draw line (always, for every cell in the column)
}
} else {
for (int i = 0; i < m2.length; i++) {
if (m2[row][i] < 0)
m2[row][i] = 0; // clear neighbor
m3[row][i] = 1; // draw line (always, for every cell in the row)
}
}
m2[row][col] = 0;
m3[row][col] = 1;
}
}
Output
Given input array
0 1 0 1 1
1 1 0 1 1
1 0 0 0 1
1 1 0 1 1
1 0 0 1 0
m2 array (max num of zeroes from horizontal vs vertical) (- for horizontal and + for vertical)
-2 0 5 0 0
0 0 5 0 0
0 -3 5 -3 0
0 0 5 0 0
0 -3 5 0 -3
Lines array
1 1 1 1 1
0 0 1 0 0
1 1 1 1 1
0 0 1 0 0
1 1 1 1 1
PS: The example you pointed to will never occur, because the first loop computes max(horizontal, vertical) for each zero and saves it in m2. Column 1 will not be selected, because -3 means "draw a horizontal line", and -3 was obtained by taking the max of the horizontal and vertical zero counts. So the first pass over the elements decides how the lines should be drawn, and the second pass draws them.

Greedy algorithms may not work for some cases.
First, the problem can be reformulated as follows: given a bipartite graph, find a minimum vertex cover. The graph has 2n nodes, n for rows and n for columns. There is an edge between a row node and a column node if the element at their intersection is zero. A vertex cover is a set of nodes (rows and columns) such that every edge is incident to some node in the set (every zero is covered by a row or a column).
This is a well-known problem and can be solved in O(n^3) by finding a maximum matching (by König's theorem, in a bipartite graph the minimum vertex cover has the same size as the maximum matching). Check Wikipedia for details.
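A minimal sketch of that reformulation, assuming a simple augmenting-path matching (Kuhn's algorithm) followed by the standard König construction of the cover. The function name min_line_cover and its return shape are choices made for this example, not part of the answer above.

```python
def min_line_cover(matrix):
    """Return (rows, cols): a minimum set of lines covering every zero."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    # adj[r] lists the columns that hold a zero in row r (the graph's edges)
    adj = [[c for c in range(n_cols) if matrix[r][c] == 0]
           for r in range(n_rows)]
    match_col = [-1] * n_cols  # match_col[c] = row matched to column c

    def try_augment(r, seen):
        # look for an augmenting path starting at row r
        for c in adj[r]:
            if c not in seen:
                seen.add(c)
                if match_col[c] == -1 or try_augment(match_col[c], seen):
                    match_col[c] = r
                    return True
        return False

    for r in range(n_rows):
        try_augment(r, set())

    # Koenig construction: follow alternating paths from unmatched rows
    matched_rows = {r for r in match_col if r != -1}
    visited_rows, visited_cols = set(), set()
    stack = [r for r in range(n_rows) if r not in matched_rows]
    while stack:
        r = stack.pop()
        if r in visited_rows:
            continue
        visited_rows.add(r)
        for c in adj[r]:
            if c not in visited_cols:
                visited_cols.add(c)
                if match_col[c] != -1:
                    stack.append(match_col[c])

    # cover = rows NOT reached + columns reached
    cover_rows = [r for r in range(n_rows) if r not in visited_rows]
    cover_cols = sorted(visited_cols)
    return cover_rows, cover_cols


m = [
    [0, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
]
print(min_line_cover(m))  # ([0, 2, 4], [2])
```

On the matrix from the question this yields rows 0, 2, 4 and column 2, i.e. the four-line solution given there. The recursive matching is fine for small matrices; for large ones an iterative version or Hopcroft-Karp would be a better fit.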

There are cases where Amir's code fails.
Consider the following m1:
0 0 1
0 1 1
1 0 1
The best solution is to draw vertical lines in the first two columns.
Amir's code would give the following m2:
-2 -2 0
2 0 0
0 2 0
And the result would draw the two vertical lines AS WELL AS a line in the first row.
It seems to me the problem is the tie-breaking case:
return vertical > horizontal ? vertical : horizontal * -1;
Because of the way the code is written, the very similar m1 will NOT fail:
0 1 1
1 0 1
0 0 1
Where the first row is moved to the bottom, because the clearing function will clear the -2 values from m2 before those cells are reached. In the first case, the -2 values are hit first, so a horizontal line is drawn through the first row.
I've been working a little through this, and this is what I have. In the case of a tie, do not set any value and do not draw a line through those cells. This covers the case of the matrix I mentioned above, we are done at this step.
Clearly, there are situations where there will remain 0s that are uncovered. Below is another example of a matrix that will fail in Amir's method (m1):
0 0 1 1 1
0 1 0 1 1
0 1 1 0 1
1 1 0 0 1
1 1 1 1 1
The optimal solution is four lines, e.g. the first four columns.
Amir's method gives m2:
3 -2 0 0 0
3 0 -2 0 0
3 0 0 -2 0
0 0 -2 -2 0
0 0 0 0 0
Which will draw lines at the first four rows and the first column (an incorrect solution, giving 5 lines). Again, the tie-breaker case is the issue. We solve this by not setting a value for the ties, and iterating the procedure.
If we ignore the ties we get an m2:
3 -2 0 0 0
3 0 0 0 0
3 0 0 0 0
0 0 0 0 0
0 0 0 0 0
This leads to covering only the first row and the first column. We then take out the 0s that are covered to give new m1:
1 1 1 1 1
1 1 0 1 1
1 1 1 0 1
1 1 0 0 1
1 1 1 1 1
Then we keep repeating the procedure (ignoring ties) until we reach a solution. Repeat for a new m2:
0 0 0 0 0
0 0 2 0 0
0 0 0 2 0
0 0 0 0 0
0 0 0 0 0
Which leads to two vertical lines through the second and third columns. All 0s are now covered, needing only four lines (this is an alternative to lining the first four columns). The above matrix only needs 2 iterations, and I imagine most of these cases will need only two iterations unless there are sets of ties nested within sets of ties. I tried to come up with one, but it became difficult to manage.
Sadly, this is not good enough, because there will be cases that will remain tied forever. Particularly, in cases where there is a 'disjoint set of tied cells'. Not sure how else to describe this except to draw the following two examples:
0 0 1 1
0 1 1 1
1 0 1 1
1 1 1 0
or
0 0 1 1 1
0 1 1 1 1
1 0 1 1 1
1 1 1 0 0
1 1 1 0 0
The upper-left 3x3 sub-matrices in these two examples are identical to my original example, I have added 1 or 2 rows/cols to that example at the bottom and right. The only newly added zeros are where the new rows and columns cross. Describing for clarity.
With the iterative method I described, these matrices will be caught in an infinite loop. The zeros will always remain tied (col-count vs row-count). At this point, it does make sense to just arbitrarily choose a direction in the case of a tie, at least from what I can imagine.
The only issue I'm running into is setting up the stopping criteria for the loop. I can't assume that 2 iterations is enough (or any n), but I also can't figure out how to detect if a matrix has only infinite loops left within it. I'm still not sure how to describe these disjoint-tied-sets computationally.
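One possible stopping rule, sketched here purely as an assumption and not as something I have proven correct: if an iteration draws no new line, only ties are left, so break one tie arbitrarily and continue. The name cover_zeros_heuristic and the choice of "draw the column through the first uncovered zero" are invented for this sketch, and the result is not guaranteed to be minimal.

```python
from collections import Counter


def cover_zeros_heuristic(matrix):
    """Iteratively draw lines where row and column counts differ; on a
    stalemate (only ties left) break one tie arbitrarily."""
    n = len(matrix)
    rows, cols = set(), set()

    def uncovered():
        return [(r, c) for r in range(n) for c in range(n)
                if matrix[r][c] == 0 and r not in rows and c not in cols]

    cells = uncovered()
    while cells:
        row_count = Counter(r for r, _ in cells)
        col_count = Counter(c for _, c in cells)
        # a zero votes for its row or column, whichever holds more zeros
        new_rows = {r for r, c in cells if row_count[r] > col_count[c]}
        new_cols = {c for r, c in cells if col_count[c] > row_count[r]}
        if not new_rows and not new_cols:
            # stalemate: every remaining zero is tied, so pick one column
            new_cols = {cells[0][1]}
        rows |= new_rows
        cols |= new_cols
        cells = uncovered()
    return sorted(rows), sorted(cols)


tied = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
]
print(cover_zeros_heuristic(tied))  # ([], [0, 1, 3])
```

Because every pass either draws at least one line or forces one, the loop always terminates; whether the forced choice can ever cost an extra line is exactly the open question above.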
Here is the code to do what I have come up with so far (in MATLAB script):
function [Lines, AllRows, AllCols] = FindMinLines(InMat)
%The following code finds the minimum set of lines (rows and columns)
%required to cover all of the true-valued cells in a matrix. If using for
%the Hungarian problem where 'true-values' are equal to zero, make the
%necessary changes. This code is not complete, since it will be caught in
%an infinite loop in the case of disjoint-tied-sets
%If passing in a matrix where 0s are the cells of interest, uncomment the
%next line
%InMat = InMat == 0;
%Assume square matrix
Count = length(InMat);
Lines = zeros(Count);
%Accumulate the drawn lines across iterations (HorzIdx and VertIdx only
%describe the most recent pass)
RowLines = false(Count, 1);
ColLines = false(1, Count);
%while there are any 'true' values not covered by lines
while any(any(~Lines & InMat))
%Calculate row-wise and col-wise totals of 'trues' not-already-covered
HorzCount = repmat(sum(~Lines & InMat, 2), 1, Count).*(~Lines & InMat);
VertCount = repmat(sum(~Lines & InMat, 1), Count, 1).*(~Lines & InMat);
%Calculate for each cell the difference between row-wise and col-wise
%counts. I.e. row-oriented cells will have a negative number, col-oriented
%cells will have a positive number, ties and 'non-trues' will be 0.
%Non-zero values indicate lines to be drawn where orientation is determined
%by sign.
DiffCounts = VertCount - HorzCount;
%find the row and col indices of the lines
HorzIdx = any(DiffCounts < 0, 2);
VertIdx = any(DiffCounts > 0, 1);
%Set the horizontal and vertical indices of the Lines matrix to true
Lines(HorzIdx, :) = true;
Lines(:, VertIdx) = true;
RowLines = RowLines | HorzIdx;
ColLines = ColLines | VertIdx;
end
%compute index numbers to be returned.
AllRows = find(RowLines);
AllCols = find(ColLines);
end

Step 5:
The lines are chosen by walking along the diagonal of the matrix: for each index i, the zero counts of row i and column i are compared, so the number of evaluations is at most the length of the matrix.
Based on http://www.wikihow.com/Use-the-Hungarian-Algorithm with Steps 1 - 8 only.
Console Output
horizontal line (row): {"0":0,"2":2,"4":4}
vertical line (column): {"2":2}
Step 5: Matrix
0 1 0 1 1
1 1 0 1 1
1 0 0 0 1
1 1 0 1 1
1 0 0 1 0
Smallest number in uncovered matrix: 1
Step 6: Matrix
x x x x x
1 1 x 1 1
x x x x x
1 1 x 1 1
x x x x x
JSFiddle: http://jsfiddle.net/jjcosare/6Lpz5gt9/2/
// http://www.wikihow.com/Use-the-Hungarian-Algorithm
var inputMatrix = [
[0, 1, 0, 1, 1],
[1, 1, 0, 1, 1],
[1, 0, 0, 0, 1],
[1, 1, 0, 1, 1],
[1, 0, 0, 1, 0]
];
//var inputMatrix = [
// [10, 19, 8, 15],
// [10, 18, 7, 17],
// [13, 16, 9, 14],
// [12, 19, 8, 18],
// [14, 17, 10, 19]
// ];
var matrix = inputMatrix;
var HungarianAlgorithm = {};
HungarianAlgorithm.step1 = function(stepNumber) {
console.log("Step " + stepNumber + ": Matrix");
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
var sb = "";
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[i][j];
sb += currentNumber + " ";
}
console.log(sb);
}
}
HungarianAlgorithm.step2 = function() {
var largestNumberInMatrix = getLargestNumberInMatrix(matrix);
var rowLength = matrix.length;
var columnLength = matrix[0].length;
var dummyMatrixToAdd = 0;
var isAddColumn = rowLength > columnLength;
var isAddRow = columnLength > rowLength;
if (isAddColumn) {
dummyMatrixToAdd = rowLength - columnLength;
for (var i = 0; i < rowLength; i++) {
for (var j = columnLength; j < (columnLength + dummyMatrixToAdd); j++) {
matrix[i][j] = largestNumberInMatrix;
}
}
} else if (isAddRow) {
dummyMatrixToAdd = columnLength - rowLength;
for (var i = rowLength; i < (rowLength + dummyMatrixToAdd); i++) {
matrix[i] = [];
for (var j = 0; j < columnLength; j++) {
matrix[i][j] = largestNumberInMatrix;
}
}
}
HungarianAlgorithm.step1(2);
console.log("Largest number in matrix: " + largestNumberInMatrix);
function getLargestNumberInMatrix(matrix) {
var largestNumberInMatrix = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[i][j];
largestNumberInMatrix = (largestNumberInMatrix > currentNumber) ?
largestNumberInMatrix : currentNumber;
}
}
return largestNumberInMatrix;
}
}
HungarianAlgorithm.step3 = function() {
var smallestNumberInRow = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
smallestNumberInRow = getSmallestNumberInRow(matrix, i);
console.log("Smallest number in row[" + i + "]: " + smallestNumberInRow);
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[i][j];
matrix[i][j] = currentNumber - smallestNumberInRow;
}
}
HungarianAlgorithm.step1(3);
function getSmallestNumberInRow(matrix, rowIndex) {
var smallestNumberInRow = matrix[rowIndex][0];
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[rowIndex][i];
smallestNumberInRow = (smallestNumberInRow < currentNumber) ?
smallestNumberInRow : currentNumber;
}
return smallestNumberInRow;
}
}
HungarianAlgorithm.step4 = function() {
var smallestNumberInColumn = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
smallestNumberInColumn = getSmallestNumberInColumn(matrix, i);
console.log("Smallest number in column[" + i + "]: " + smallestNumberInColumn);
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[j][i];
matrix[j][i] = currentNumber - smallestNumberInColumn;
}
}
HungarianAlgorithm.step1(4);
function getSmallestNumberInColumn(matrix, columnIndex) {
var smallestNumberInColumn = matrix[0][columnIndex];
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[i][columnIndex];
smallestNumberInColumn = (smallestNumberInColumn < currentNumber) ?
smallestNumberInColumn : currentNumber;
}
return smallestNumberInColumn;
}
}
var rowLine = {};
var columnLine = {};
HungarianAlgorithm.step5 = function() {
var zeroNumberCountRow = 0;
var zeroNumberCountColumn = 0;
rowLine = {};
columnLine = {};
for (var i = 0; i < matrix.length; i++) {
zeroNumberCountRow = getZeroNumberCountInRow(matrix, i);
zeroNumberCountColumn = getZeroNumberCountInColumn(matrix, i);
if (zeroNumberCountRow > zeroNumberCountColumn) {
rowLine[i] = i;
if (zeroNumberCountColumn > 1) {
columnLine[i] = i;
}
} else if (zeroNumberCountRow < zeroNumberCountColumn) {
columnLine[i] = i;
if (zeroNumberCountRow > 1) {
rowLine[i] = i;
}
} else {
if ((zeroNumberCountRow + zeroNumberCountColumn) > 2) {
rowLine[i] = i;
columnLine[i] = i;
}
}
}
var zeroCount = 0;
for (var i in columnLine) {
zeroCount = getZeroNumberCountInColumnLine(matrix, columnLine[i], rowLine);
if (zeroCount == 0) {
delete columnLine[i];
}
}
for (var i in rowLine) {
zeroCount = getZeroNumberCountInRowLine(matrix, rowLine[i], columnLine);
if (zeroCount == 0) {
delete rowLine[i];
}
}
console.log("horizontal line (row): " + JSON.stringify(rowLine));
console.log("vertical line (column): " + JSON.stringify(columnLine));
HungarianAlgorithm.step1(5);
//if ((Object.keys(rowLine).length + Object.keys(columnLine).length) == matrix.length) {
// TODO:
// HungarianAlgorithm.step9();
//} else {
// HungarianAlgorithm.step6();
// HungarianAlgorithm.step7();
// HungarianAlgorithm.step8();
//}
function getZeroNumberCountInColumnLine(matrix, columnIndex, rowLine) {
var zeroNumberCount = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[i][columnIndex];
if (currentNumber == 0 && !(rowLine[i] == i)) {
zeroNumberCount++
}
}
return zeroNumberCount;
}
function getZeroNumberCountInRowLine(matrix, rowIndex, columnLine) {
var zeroNumberCount = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[rowIndex][i];
if (currentNumber == 0 && !(columnLine[i] == i)) {
zeroNumberCount++
}
}
return zeroNumberCount;
}
function getZeroNumberCountInColumn(matrix, columnIndex) {
var zeroNumberCount = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[i][columnIndex];
if (currentNumber == 0) {
zeroNumberCount++
}
}
return zeroNumberCount;
}
function getZeroNumberCountInRow(matrix, rowIndex) {
var zeroNumberCount = 0;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[rowIndex][i];
if (currentNumber == 0) {
zeroNumberCount++
}
}
return zeroNumberCount;
}
}
HungarianAlgorithm.step6 = function() {
var smallestNumberInUncoveredMatrix = getSmallestNumberInUncoveredMatrix(matrix, rowLine, columnLine);
console.log("Smallest number in uncovered matrix: " + smallestNumberInUncoveredMatrix);
var columnIndex = 0;
for (var i in columnLine) {
columnIndex = columnLine[i];
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[i][columnIndex];
//matrix[i][columnIndex] = currentNumber + smallestNumberInUncoveredMatrix;
matrix[i][columnIndex] = "x";
}
}
var rowIndex = 0;
for (var i in rowLine) {
rowIndex = rowLine[i];
for (var i = 0; i < matrix.length; i++) {
currentNumber = matrix[rowIndex][i];
//matrix[rowIndex][i] = currentNumber + smallestNumberInUncoveredMatrix;
matrix[rowIndex][i] = "x";
}
}
HungarianAlgorithm.step1(6);
function getSmallestNumberInUncoveredMatrix(matrix, rowLine, columnLine) {
var smallestNumberInUncoveredMatrix = null;
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
// rowLine[i] == i marks a covered row; a plain truthiness test would miss row 0
if (rowLine[i] == i) {
continue;
}
for (var j = 0; j < matrix[i].length; j++) {
if (columnLine[j] == j) {
continue;
}
currentNumber = matrix[i][j];
// compare against null explicitly so that a value of 0 is not treated as "unset"
if (smallestNumberInUncoveredMatrix === null) {
smallestNumberInUncoveredMatrix = currentNumber;
}
smallestNumberInUncoveredMatrix =
(smallestNumberInUncoveredMatrix < currentNumber) ?
smallestNumberInUncoveredMatrix : currentNumber;
}
}
return smallestNumberInUncoveredMatrix;
}
}
HungarianAlgorithm.step7 = function() {
var smallestNumberInMatrix = getSmallestNumberInMatrix(matrix);
console.log("Smallest number in matrix: " + smallestNumberInMatrix);
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[j][i];
matrix[j][i] = currentNumber - smallestNumberInMatrix;
}
}
HungarianAlgorithm.step1(7);
function getSmallestNumberInMatrix(matrix) {
var smallestNumberInMatrix = matrix[0][0];
var currentNumber = 0;
for (var i = 0; i < matrix.length; i++) {
for (var j = 0; j < matrix[i].length; j++) {
currentNumber = matrix[i][j];
smallestNumberInMatrix = (smallestNumberInMatrix < currentNumber) ?
smallestNumberInMatrix : currentNumber;
}
}
return smallestNumberInMatrix;
}
}
HungarianAlgorithm.step8 = function() {
console.log("Step 8: Covering zeroes with Step 5 - 8 until Step 9 is reached");
HungarianAlgorithm.step5();
}
HungarianAlgorithm.step9 = function(){
console.log("Step 9...");
}
HungarianAlgorithm.step1(1);
HungarianAlgorithm.step2();
HungarianAlgorithm.step3();
HungarianAlgorithm.step4();
HungarianAlgorithm.step5();
HungarianAlgorithm.step6();

Do the assignment using the steps mentioned below:
assign a row if it has only one 0, else skip the row temporarily
cross out the 0's in the assigned column
Do the same for every column
After doing the assignment using the above steps, follow the steps below to get the minimum number of lines which cover all the 0's
step 1 - Tick an unassigned row
step 2 - If a ticked row has a 0, then tick the corresponding column
step 3 - If a ticked column has an assignment, then tick the corresponding row
step 4 - Repeat steps 2 and 3, till no more ticking is possible
step 5 - Draw lines through un-ticked rows and ticked columns
For your case: (0-indexing for rows and columns)
skip row 0, as it has two 0's
assign row 1, and cross out all the 0's in column 2
skip row 2, as it has two uncrossed 0's
skip row 3, as it has no uncrossed 0
skip row 4, as it has 2 uncrossed 0's
assign column 0
skip column 1 as it has two uncrossed 0's (in row-2 and row-4)
skip column 2, as it has an already assigned 0
assign column 3, and cross out the 0 in row 2
assign column 4, and cross out the 0 in row 4
assigned 0's are shown by '_' and 'x' shows crossed out 0's
( _ 1 x 1 1 ),
( 1 1 _ 1 1 ),
( 1 x x _ 1 ),
( 1 1 x 1 1 ),
( 1 x x 1 _ )
The matrix looks like the one shown above after doing the assignments
Now follow the 5 steps mentioned above to get the minimum number of lines that cover all the 0's
Tick row 3 as it is not assigned yet
Since row 3 has a 0 in column 2, tick column 2
Since column 2 has an assignment in row 1, tick row 1
Now draw lines through un-ticked rows (i.e. rows 0, 2, 4) and ticked columns (i.e. column 2)
These 4 lines will cover all the 0's
Hope this helps:)
PS : For cases where no initial assignment is possible due to multiple 0's in each row and column, this could be handled by taking one arbitrary assignment (For the cases where multiple 0's are present in each row and column, it is very likely that more than one possible assignment would result in an optimal solution)
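The five ticking steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `min_cover_lines` and the `assignment` dictionary (row index -> column index of the assigned 0) are hypothetical, and the assignment is assumed to have been found already, as in the worked example.

```python
def min_cover_lines(matrix, assignment):
    """Return (rows, cols) to draw lines through, given an assignment of zeros.

    `assignment` maps a row index to the column index of its assigned zero
    (a hypothetical representation chosen for this sketch).
    """
    n_rows = len(matrix)
    col_to_row = {c: r for r, c in assignment.items()}

    # step 1: tick every unassigned row
    ticked_rows = {r for r in range(n_rows) if r not in assignment}
    ticked_cols = set()
    frontier = list(ticked_rows)

    # steps 2-4: repeat until no more ticking is possible
    while frontier:
        r = frontier.pop()
        for c, value in enumerate(matrix[r]):
            if value == 0 and c not in ticked_cols:
                ticked_cols.add(c)                 # step 2
                owner = col_to_row.get(c)
                if owner is not None and owner not in ticked_rows:
                    ticked_rows.add(owner)         # step 3
                    frontier.append(owner)

    # step 5: lines through un-ticked rows and ticked columns
    rows = sorted(set(range(n_rows)) - ticked_rows)
    return rows, sorted(ticked_cols)


# The matrix from the worked example (0 = zero entry, 1 = any non-zero entry)
example = [
    [0, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
]
example_assignment = {0: 0, 1: 2, 2: 3, 4: 4}
print(min_cover_lines(example, example_assignment))  # ([0, 2, 4], [2])
```

The result matches the hand-worked answer: lines through rows 0, 2, 4 and column 2.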

@CMPS's answer fails on quite a few graphs. I think I have implemented a solution which solves the problem.
I followed the Wikipedia article on the Hungarian algorithm and made an implementation that seems to work all the time.
From Wikipedia, here is the method to draw the minimum number of lines:
First, assign as many tasks as possible.
Mark all rows having no assignments.
Mark all (unmarked) columns having zeros in newly marked row(s).
Mark all rows having assignments in newly marked columns.
Repeat for all non-assigned rows.
Here is my Ruby implementation:
def draw_lines grid
#copies the array
marking_grid = grid.map { |a| a.dup }
marked_rows = Array.new
marked_cols = Array.new
while there_is_zero(marking_grid) do
marking_grid = grid.map { |a| a.dup }
marked_cols.each do |col|
cross_out(marking_grid,nil, col)
end
marked = assignment(grid, marking_grid)
marked_rows = marked[0]
marked_cols.concat(marked[1]).uniq!
marking_grid = grid.map { |a| a.dup }
marking_grid.length.times do |row|
if !(marked_rows.include? row) then
cross_out(marking_grid,row, nil)
end
end
marked_cols.each do |col|
cross_out(marking_grid,nil, col)
end
end
lines = Array.new
marked_cols.each do |index|
lines.push(["column", index])
end
grid.each_index do |index|
if !(marked_rows.include? index) then
lines.push(["row", index])
end
end
return lines
end
def there_is_zero grid
grid.each_with_index do |row|
row.each_with_index do |value|
if value == 0 then
return true
end
end
end
return false
end
def assignment grid, marking_grid
marking_grid.each_index do |row_index|
first_zero = marking_grid[row_index].index(0)
#if there is no zero go to next row
if first_zero.nil? then
next
else
cross_out(marking_grid, row_index, first_zero)
marking_grid[row_index][first_zero] = "*"
end
end
return mark(grid, marking_grid)
end
def mark grid, marking_grid, marked_rows = Array.new, marked_cols = Array.new
marking_grid.each_with_index do |row, row_index|
selected_assignment = row.index("*")
if selected_assignment.nil? then
marked_rows.push(row_index)
end
end
marked_rows.each do |index|
grid[index].each_with_index do |cost, col_index|
if cost == 0 then
marked_cols.push(col_index)
end
end
end
marked_cols = marked_cols.uniq
marked_cols.each do |col_index|
marking_grid.each_with_index do |row, row_index|
if row[col_index] == "*" then
marked_rows.push(row_index)
end
end
end
return [marked_rows, marked_cols]
end
def cross_out(marking_grid, row, col)
if col != nil then
marking_grid.each_index do |i|
marking_grid[i][col] = "X"
end
end
if row != nil then
marking_grid[row].map! {|i| "X"}
end
end
grid = [
[0,0,1,0],
[0,0,1,0],
[0,1,1,1],
[0,1,1,1],
]
p draw_lines(grid)

What is the more efficient algorithm to equalize a vector?

Given a vector of n integers, what is the most efficient algorithm that produces the minimum number of transformation steps needed to make all its elements equal, knowing that:
in a single step, an element can transfer at most one point to one of its neighbours ([0, 3, 0] -> [1, 2, 0] is ok but not [0, 3, 0] -> [1, 1, 1]).
in a single step, an element can receive 2 points: one from its left neighbour and one from its right neighbour ([3, 0, 3] -> [2, 2, 2]).
the first and last elements have only one neighbour each: the 2nd element and the (n-1)th element, respectively.
an element cannot be negative at any step.
Examples :
Given :
0, 3, 0
Then 2 steps are required :
1, 2, 0
1, 1, 1
Given :
3, 0, 3
Then 1 step is required :
2, 2, 2
Given :
4, 0, 0, 0, 4, 0, 0, 0
Then 3 steps are required :
3, 1, 0, 0, 3, 1, 0, 0
2, 1, 1, 0, 2, 1, 1, 0
1, 1, 1, 1, 1, 1, 1, 1
My current algorithm is based on the sums of the integers on each side of an element. But I'm not sure if it produces the minimum number of steps.
FYI the problem is part of a code contest (created by Criteo http://codeofduty.criteo.com) that is over.

Here is one way. You know the sum of the array, so you know the target number in each cell.
Thus you also know the target sum for each subarray.
Then iterate through the array and, for each element, make a decision:
1. Move 1 to the left: if the sum up to the previous element is less than desired.
2. Move 1 to the right: if the sum up to the current element is more than desired.
3. Don't do anything: if both of the above are false.
Repeat this until no more changes are made (i.e. only rule 3 applied to every element).
public static int F(int[] ar)
{
int iter = -1;
bool finished = false;
int total = ar.Sum();
if (ar.Length == 0 || total % ar.Length != 0) return 0; //can't do it
int target = total / ar.Length;
int sum = 0;
while (!finished)
{
iter++;
finished = true;
bool canMoveNext = true;
//first element
if (ar[0] > target)
{
finished = false;
ar[0]--;
ar[1]++;
canMoveNext = ar[1] != 1;
}
sum = ar[0];
for (int i = 1; i < ar.Length; i++)
{
if (!canMoveNext)
{
canMoveNext = true;
sum += ar[i];
continue;
}
if (sum < i * target && ar[i] > 0)
{
finished = false;
ar[i]--;
ar[i - 1]++;
sum++;
}
else if (sum + ar[i] > (i + 1) * target && ar[i] > 0) //this can't happen for the last element so we are safe
{
finished = false;
ar[i]--;
ar[i + 1]++;
canMoveNext = ar[i + 1] != 1;
}
sum += ar[i];
}
}
return iter;
}

I've got an idea. I'm not sure it produces the optimal result, but it feels like it can.
Suppose the initial vector is the N-sized vector V. You need two additional N-sized vectors:
In the L vector, you sum elements starting from the left: L[n] = sum(i=0;i<=n) V[i]
In the R vector, you sum elements starting from the right: R[n] = sum(i=n;i<N) V[i]
You finally need one last specific value: the sum of all the elements of V is supposed to be equal to k*N with k an integer. And you have L[N-1] == R[0] == k*N
Let's take the L vector. The idea is that, for any n, the V vector is divided into two parts: one from 0 to n, and the other containing the rest. If L[n]<(n+1)*k, then you've got to "fill" the first part with values from the second part, and vice versa if L[n]>(n+1)*k. If L[n]==(n+1)*k, then congratulations, the problem can be subdivided into two subproblems! There is no reason for any value from the second part to be transferred to the first part, or vice versa.
Then, the algorithm is simple: for every value of n, check the values of L[n]-(n+1)*k and R[n]-(N-n)*k and act accordingly. There is just one special case: if L[n]-(n+1)*k>0 and R[n]-(N-n)*k>0 (there is a high value at V[n]), you must empty it in both directions. Just choose a direction to transfer at random.
Of course, don't forget to update L and R accordingly.
Edit: In fact, it seems that you only need the L vector. Here is a simplified algorithm.
If L[n]==(n+1)*k, don't do anything
If L[n]<(n+1)*k, then transfer one value from V[n+1] to V[n] (if V[n+1]>0 of course)
If L[n]>(n+1)*k, then transfer one value from V[n] to V[n+1] (if V[n]>0 of course)
And (the special case) if you're asked to transfer from V[n] to both V[n-1] and V[n+1], just transfer once in a randomly chosen direction; it won't change the final result.
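As a small illustration of the L-vector idea, the sketch below computes, for every n, how far the prefix V[0..n] is from its target. The helper name `prefix_surplus` is hypothetical. Since L[n] here holds the sum of the first n+1 elements, it is compared with (n+1)*k. A positive entry means points have to flow to the right across the boundary after position n, a negative entry means they have to flow to the left, and a zero means the problem splits there.

```python
from itertools import accumulate


def prefix_surplus(v):
    """Return L[n] - (n + 1) * k for every n, where k is the target value."""
    total = sum(v)
    if not v or total % len(v) != 0:
        raise ValueError("the vector cannot be equalized")
    k = total // len(v)
    # accumulate(v) yields the running sums L[0], L[1], ...
    return [prefix - (n + 1) * k for n, prefix in enumerate(accumulate(v))]


print(prefix_surplus([4, 0, 0, 0, 4, 0, 0, 0]))  # [3, 2, 1, 0, 3, 2, 1, 0]
print(prefix_surplus([0, 3, 0]))                 # [-1, 1, 0]
```

In the first example the zero at index 3 shows that the two halves can be solved independently, which agrees with the three-step solution given in the question.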

Thanks to Sam Hocevar for the following alternative to fiver's implementation:
public static int F(int[] ar)
{
int total = ar.Sum();
if (ar.Length == 0 || total % ar.Length != 0) return 0; //can't do it
int target = total / ar.Length;
int[] left = new int[ar.Length];
int[] right = new int[ar.Length];
int maxshifts = 0;
int delta = 0;
for (int i = 0; i < ar.Length; i++)
{
left[i] = delta < 0 ? -delta : 0;
delta += ar[i] - target;
right[i] = delta > 0 ? delta : 0;
if (left[i] + right[i] > maxshifts) {
maxshifts = left[i] + right[i];
}
}
for (int iter = 0; iter < maxshifts; iter++)
{
int lastleftadd = -1;
for (int i = 0; i < ar.Length; i++)
{
if (left[i] != 0 && ar[i] != 0)
{
ar[i]--;
ar[i - 1]++;
left[i]--;
}
else if (right[i] != 0 && ar[i] != 0
&& (ar[i] != 1 || lastleftadd != i))
{
ar[i]--;
ar[i + 1]++;
lastleftadd = i + 1;
right[i]--;
}
}
}
return maxshifts;
}
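The counting part of the code above (the first loop, which computes `maxshifts`) can be transcribed to Python as a sketch. The function name `max_shifts` is hypothetical, and whether this value is always the true minimum is the claim of the answer above, not something verified here; the sketch only reproduces the computation and checks it against the three examples from the question.

```python
def max_shifts(values):
    """Largest number of points any single element has to pass on."""
    total = sum(values)
    if not values or total % len(values) != 0:
        return 0  # same convention as the C# code: cannot be equalized
    target = total // len(values)
    delta = 0  # surplus (or deficit, if negative) of the prefix seen so far
    best = 0
    for value in values:
        left = -delta if delta < 0 else 0   # points this element must send left
        delta += value - target
        right = delta if delta > 0 else 0   # points this element must send right
        best = max(best, left + right)
    return best


print(max_shifts([0, 3, 0]))                  # 2
print(max_shifts([3, 0, 3]))                  # 1
print(max_shifts([4, 0, 0, 0, 4, 0, 0, 0]))   # 3
```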

Find largest rectangle containing only zeros in an N×N binary matrix

Given an NxN binary matrix (containing only 0's or 1's), how can we go about finding the largest rectangle containing only 0's?
Example:
I
0 0 0 0 1 0
0 0 1 0 0 1
II->0 0 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 1 <--IV
0 0 1 0 0 0
IV
The above example is a 6×6 binary matrix. The return value in this case will be Cell 1:(2, 1) and Cell 2:(4, 4). The resulting sub-matrix can be square or rectangular. The return value can also be the size of the largest sub-matrix of all 0's, in this example 3 × 4.

Here's a solution based on the "Largest Rectangle in a Histogram" problem suggested by @j_random_hacker in the comments:
[Algorithm] works by iterating through rows from top to bottom, for each row solving this problem, where the "bars" in the "histogram" consist of all unbroken upward trails of zeros that start at the current row (a column has height 0 if it has a 1 in the current row).
The input matrix mat may be an arbitrary iterable e.g., a file or a network stream. Only one row is required to be available at a time.
#!/usr/bin/env python
from collections import namedtuple
from functools import reduce  # reduce is no longer a builtin in Python 3
from operator import mul

Info = namedtuple('Info', 'start height')

def max_size(mat, value=0):
    """Find height, width of the largest rectangle containing all `value`'s."""
    it = iter(mat)
    hist = [(el==value) for el in next(it, [])]
    max_size = max_rectangle_size(hist)
    for row in it:
        hist = [(1+h) if el == value else 0 for h, el in zip(hist, row)]
        max_size = max(max_size, max_rectangle_size(hist), key=area)
    return max_size

def max_rectangle_size(histogram):
    """Find height, width of the largest rectangle that fits entirely under
    the histogram.
    """
    stack = []
    top = lambda: stack[-1]
    max_size = (0, 0) # height, width of the largest rectangle
    pos = 0 # current position in the histogram
    for pos, height in enumerate(histogram):
        start = pos # position where rectangle starts
        while True:
            if not stack or height > top().height:
                stack.append(Info(start, height)) # push
            elif stack and height < top().height:
                max_size = max(max_size, (top().height, (pos - top().start)),
                               key=area)
                start, _ = stack.pop()
                continue
            break # height == top().height goes here

    pos += 1
    for start, height in stack:
        max_size = max(max_size, (height, (pos - start)), key=area)
    return max_size

def area(size):
    return reduce(mul, size)
The solution is O(N), where N is the number of elements in a matrix. It requires O(ncols) additional memory, where ncols is the number of columns in a matrix.
Latest version with tests is at https://gist.github.com/776423

Please take a look at Maximize the rectangular area under Histogram and then continue reading the solution below.
Traverse the matrix once and store the following:
For x=1 to N and y=1 to N
F[x][y] = 1 + F[x-1][y] if A[x][y] is 0, else 0 (with F[0][y] = 0)
Then, for each row x=N to 1:
F[x] is the array of histogram heights with base at row x.
Use the O(N) algorithm to find the largest rectangle area in this histogram; call it H[x].
From all the areas computed, report the largest.
Time complexity is O(N*N) = O(N²) (for an NxN binary matrix)
Example:
Initial array      F[x][y] array
0 0 0 0 1 0        1 1 1 1 0 1
0 0 1 0 0 1        2 2 0 2 1 0
0 0 0 0 0 0        3 3 1 3 2 1
1 0 0 0 0 0        0 4 2 4 3 2
0 0 0 0 0 1        1 5 3 5 4 0
0 0 1 0 0 0        2 6 0 6 5 1
For x = N to 1
H[6] = 2 6 0 6 5 1 -> 10 (5*2)
H[5] = 1 5 3 5 4 0 -> 12 (3*4)
H[4] = 0 4 2 4 3 2 -> 10 (2*5)
H[3] = 3 3 1 3 2 1 -> 6 (3*2)
H[2] = 2 2 0 2 1 0 -> 4 (2*2)
H[1] = 1 1 1 1 0 1 -> 4 (1*4)
The largest area is thus H[5] = 12
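The O(N) histogram step used above can be sketched in Python with a stack of bar indices. This is a minimal sketch, not the code of the linked answer; the function names `largest_histogram_area` and `largest_zero_rectangle_area` are hypothetical. The row-by-row height update corresponds to the F array in the example.

```python
def largest_histogram_area(heights):
    """Largest rectangle area under a histogram, in one pass with a stack."""
    stack = []  # indices of bars whose heights are strictly increasing
    best = 0
    # a trailing bar of height 0 forces every remaining bar off the stack
    for i, h in enumerate(list(heights) + [0]):
        while stack and heights[stack[-1]] >= h:
            top = stack.pop()
            left = stack[-1] + 1 if stack else 0  # leftmost column the bar spans
            best = max(best, heights[top] * (i - left))
        stack.append(i)
    return best


def largest_zero_rectangle_area(matrix):
    """Area of the largest all-zero rectangle in a binary matrix."""
    heights = [0] * len(matrix[0])
    best = 0
    for row in matrix:
        # height of the unbroken run of zeros ending at this row, per column
        heights = [h + 1 if cell == 0 else 0 for h, cell in zip(heights, row)]
        best = max(best, largest_histogram_area(heights))
    return best


grid = [
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 1, 0, 0, 0],
]
print(largest_zero_rectangle_area(grid))  # 12
```

The histogram routine is called once per row, so the whole computation stays at O(N²) for an N×N matrix.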

Here is a Python3 solution, which returns the position in addition to the area of the largest rectangle:
#!/usr/bin/env python3
import numpy

s = '''0 0 0 0 1 0
0 0 1 0 0 1
0 0 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 1
0 0 1 0 0 0'''
nrows = 6
ncols = 6
skip = 1
area_max = (0, [])

a = numpy.fromstring(s, dtype=int, sep=' ').reshape(nrows, ncols)
w = numpy.zeros(dtype=int, shape=a.shape)
h = numpy.zeros(dtype=int, shape=a.shape)
for r in range(nrows):
    for c in range(ncols):
        if a[r][c] == skip:
            continue
        if r == 0:
            h[r][c] = 1
        else:
            h[r][c] = h[r-1][c]+1
        if c == 0:
            w[r][c] = 1
        else:
            w[r][c] = w[r][c-1]+1
        minw = w[r][c]
        for dh in range(h[r][c]):
            minw = min(minw, w[r-dh][c])
            area = (dh+1)*minw
            if area > area_max[0]:
                area_max = (area, [(r-dh, c-minw+1, r, c)])

print('area', area_max[0])
for t in area_max[1]:
    print('Cell 1:({}, {}) and Cell 2:({}, {})'.format(*t))
Output:
area 12
Cell 1:(2, 1) and Cell 2:(4, 4)

Here is J.F. Sebastian's method translated into C#:
private Vector2 MaxRectSize(int[] histogram) {
Vector2 maxSize = Vector2.zero;
int maxArea = 0;
Stack<Vector2> stack = new Stack<Vector2>();
int x = 0;
for (x = 0; x < histogram.Length; x++) {
int start = x;
int height = histogram[x];
while (true) {
if (stack.Count == 0 || height > stack.Peek().y) {
stack.Push(new Vector2(start, height));
} else if(height < stack.Peek().y) {
int tempArea = (int)(stack.Peek().y * (x - stack.Peek().x));
if(tempArea > maxArea) {
maxSize = new Vector2(stack.Peek().y, (x - stack.Peek().x));
maxArea = tempArea;
}
Vector2 popped = stack.Pop();
start = (int)popped.x;
continue;
}
break;
}
}
foreach (Vector2 data in stack) {
int tempArea = (int)(data.y * (x - data.x));
if(tempArea > maxArea) {
maxSize = new Vector2(data.y, (x - data.x));
maxArea = tempArea;
}
}
return maxSize;
}
public Vector2 GetMaximumFreeSpace() {
// STEP 1:
// build a seed histogram using the first row of grid points
// example: [true, true, false, true] = [1,1,0,1]
int[] hist = new int[gridSizeY];
for (int y = 0; y < gridSizeY; y++) {
if(!invalidPoints[0, y]) {
hist[y] = 1;
}
}
// STEP 2:
// get a starting max area from the seed histogram we created above.
// using the example from above, this value would be [1, 1], as the only valid area is a single point.
// another example for [0,0,0,1,0,0] would be [1, 3], because the largest area of contiguous free space is 3.
// Note that at this step, the height of the found rectangle will always be 1 because we are operating on
// a single row of data.
Vector2 maxSize = MaxRectSize(hist);
int maxArea = (int)(maxSize.x * maxSize.y);
// STEP 3:
// build histograms for each additional row, re-testing for new possible max rectangular areas
for (int x = 1; x < gridSizeX; x++) {
// build a new histogram for this row. the values of this row are
// 0 if the current grid point is occupied; otherwise, it is 1 + the value
// of the previously found histogram value for the previous position.
// What this does is effectively keep track of the height of contiguous available spaces.
// EXAMPLE:
// Given the following grid data (where 1 means occupied, and 0 means free; for clarity):
// INPUT: OUTPUT:
// 1.) [0,0,1,0] = [1,1,0,1]
// 2.) [0,0,1,0] = [2,2,0,2]
// 3.) [1,1,0,1] = [0,0,1,0]
//
// As such, you'll notice position 1,0 (row 1, column 0) is 2, because this is the height of contiguous
// free space.
for (int y = 0; y < gridSizeY; y++) {
if(!invalidPoints[x, y]) {
hist[y] = 1 + hist[y];
} else {
hist[y] = 0;
}
}
// find the maximum size of the current histogram. If it happens to be larger
// than the currently recorded max size, then it is the new max size.
Vector2 maxSizeTemp = MaxRectSize(hist);
int tempArea = (int)(maxSizeTemp.x * maxSizeTemp.y);
if (tempArea > maxArea) {
maxSize = maxSizeTemp;
maxArea = tempArea;
}
}
// at this point, we know the max size
return maxSize;
}
A few things to note about this:
This version is meant for use with the Unity API. You can easily make this more generic by replacing instances of Vector2 with KeyValuePair. Vector2 is only used for a convenient way to store two values.
invalidPoints[] is an array of bool, where true means the grid point is "in use", and false means it is not.

Solution with space complexity O(columns) [Can be modified to O(rows) also] and time complexity O(rows*columns)
public int maximalRectangle(char[][] matrix) {
int m = matrix.length;
if (m == 0)
return 0;
int n = matrix[0].length;
int maxArea = 0;
int[] aux = new int[n];
for (int i = 0; i < n; i++) {
aux[i] = 0;
}
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
aux[j] = matrix[i][j] - '0' + aux[j];
maxArea = Math.max(maxArea, maxAreaHist(aux));
}
}
return maxArea;
}
public int maxAreaHist(int[] heights) {
int n = heights.length;
Stack<Integer> stack = new Stack<Integer>();
stack.push(0);
int maxRect = heights[0];
int top = 0;
int leftSideArea = 0;
int rightSideArea = heights[0];
for (int i = 1; i < n; i++) {
if (stack.isEmpty() || heights[i] >= heights[stack.peek()]) {
stack.push(i);
} else {
while (!stack.isEmpty() && heights[stack.peek()] > heights[i]) {
top = stack.pop();
rightSideArea = heights[top] * (i - top);
leftSideArea = 0;
if (!stack.isEmpty()) {
leftSideArea = heights[top] * (top - stack.peek() - 1);
} else {
leftSideArea = heights[top] * top;
}
maxRect = Math.max(maxRect, leftSideArea + rightSideArea);
}
stack.push(i);
}
}
while (!stack.isEmpty()) {
top = stack.pop();
rightSideArea = heights[top] * (n - top);
leftSideArea = 0;
if (!stack.isEmpty()) {
leftSideArea = heights[top] * (top - stack.peek() - 1);
} else {
leftSideArea = heights[top] * top;
}
maxRect = Math.max(maxRect, leftSideArea + rightSideArea);
}
return maxRect;
}
But I get a Time Limit Exceeded error when I try this on LeetCode. Is there a less complex solution?

I propose a O(nxn) method.
First, you can list all the maximum empty rectangles. Empty means that it covers only 0s. A maximum empty rectangle is such that it cannot be extended in a direction without covering (at least) one 1.
A paper presenting a O(nxn) algorithm to create such a list can be found at www.ulg.ac.be/telecom/rectangles as well as source code (not optimized). There is no need to store the list, it is sufficient to call a callback function each time a rectangle is found by the algorithm, and to store only the largest one (or choose another criterion if you want).
Note that a proof exists (see the paper) that the number of maximum empty rectangles is bounded by the number of pixels of the image (nxn in this case).
Therefore, selecting the optimal rectangle can be done in O(nxn), and the overall method is also O(nxn).
In practice, this method is very fast, and is used for realtime video stream analysis.

Here is a version of jfs' solution, which also delivers the position of the largest rectangle:
from collections import namedtuple
from operator import mul

Info = namedtuple('Info', 'start height')

def max_rect(mat, value=0):
    """returns (height, width, left_column, bottom_row) of the largest rectangle
    containing all `value`'s.

    Example:
    [[0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
     [0, 4, 0, 2, 4, 0, 0, 1, 0, 0],
     [1, 0, 1, 0, 0, 0, 3, 0, 0, 4],
     [0, 0, 0, 0, 4, 2, 0, 0, 0, 0],
     [0, 0, 0, 2, 0, 0, 0, 0, 0, 0],
     [4, 3, 0, 0, 1, 2, 0, 0, 0, 0],
     [3, 0, 0, 0, 2, 0, 0, 0, 0, 4],
     [0, 0, 0, 1, 0, 3, 2, 4, 3, 2],
     [0, 3, 0, 0, 0, 2, 0, 1, 0, 0]]
    gives: (3, 4, 6, 5)
    """
    it = iter(mat)
    hist = [(el==value) for el in next(it, [])]
    max_rect = max_rectangle_size(hist) + (0,)
    for irow,row in enumerate(it):
        hist = [(1+h) if el == value else 0 for h, el in zip(hist, row)]
        max_rect = max(max_rect, max_rectangle_size(hist) + (irow+1,), key=area)
        # irow+1, because we already used one row for initializing max_rect
    return max_rect

def max_rectangle_size(histogram):
    stack = []
    top = lambda: stack[-1]
    max_size = (0, 0, 0) # height, width and start position of the largest rectangle
    pos = 0 # current position in the histogram
    for pos, height in enumerate(histogram):
        start = pos # position where rectangle starts
        while True:
            if not stack or height > top().height:
                stack.append(Info(start, height)) # push
            elif stack and height < top().height:
                max_size = max(max_size, (top().height, (pos - top().start), top().start), key=area)
                start, _ = stack.pop()
                continue
            break # height == top().height goes here

    pos += 1
    for start, height in stack:
        max_size = max(max_size, (height, (pos - start), start), key=area)
    return max_size

def area(size):
    return size[0] * size[1]

To be complete, here's the C# version which outputs the rectangle coordinates.
It's based on dmarra's answer but without any other dependencies.
There's only the function bool GetPixel(int x, int y), which returns true when a pixel is set at the coordinates x,y.
public struct INTRECT
{
public int Left, Right, Top, Bottom;
public INTRECT(int aLeft, int aTop, int aRight, int aBottom)
{
Left = aLeft;
Top = aTop;
Right = aRight;
Bottom = aBottom;
}
public int Width { get { return (Right - Left + 1); } }
public int Height { get { return (Bottom - Top + 1); } }
public bool IsEmpty { get { return Left == 0 && Right == 0 && Top == 0 && Bottom == 0; } }
public static bool operator ==(INTRECT lhs, INTRECT rhs)
{
return lhs.Left == rhs.Left && lhs.Top == rhs.Top && lhs.Right == rhs.Right && lhs.Bottom == rhs.Bottom;
}
public static bool operator !=(INTRECT lhs, INTRECT rhs)
{
return !(lhs == rhs);
}
public override bool Equals(Object obj)
{
return obj is INTRECT && this == (INTRECT)obj;
}
public bool Equals(INTRECT obj)
{
return this == obj;
}
public override int GetHashCode()
{
return Left.GetHashCode() ^ Right.GetHashCode() ^ Top.GetHashCode() ^ Bottom.GetHashCode();
}
}
public INTRECT GetMaximumFreeRectangle()
{
int XEnd = 0;
int YStart = 0;
int MaxRectTop = 0;
INTRECT MaxRect = new INTRECT();
// STEP 1:
// build a seed histogram using the first row of grid points
// example: [true, true, false, true] = [1,1,0,1]
int[] hist = new int[Height];
for (int y = 0; y < Height; y++)
{
if (!GetPixel(0, y))
{
hist[y] = 1;
}
}
// STEP 2:
// get a starting max area from the seed histogram we created above.
// using the example from above, this value would be [1, 1], as the only valid area is a single point.
// another example for [0,0,0,1,0,0] would be [1, 3], because the largest area of contiguous free space is 3.
// Note that at this step, the height of the found rectangle will always be 1 because we are operating on
// a single row of data.
Tuple<int, int> maxSize = MaxRectSize(hist, out YStart);
int maxArea = (int)(maxSize.Item1 * maxSize.Item2);
MaxRectTop = YStart;
// STEP 3:
// build histograms for each additional row, re-testing for new possible max rectangular areas
for (int x = 1; x < Width; x++)
{
// build a new histogram for this row. the values of this row are
// 0 if the current grid point is occupied; otherwise, it is 1 + the value
// of the previously found histogram value for the previous position.
// What this does is effectively keep track of the height of contiguous available spaces.
// EXAMPLE:
// Given the following grid data (where 1 means occupied, and 0 means free; for clarity):
// INPUT: OUTPUT:
// 1.) [0,0,1,0] = [1,1,0,1]
// 2.) [0,0,1,0] = [2,2,0,2]
// 3.) [1,1,0,1] = [0,0,1,0]
//
// As such, you'll notice position 1,0 (row 1, column 0) is 2, because this is the height of contiguous
// free space.
for (int y = 0; y < Height; y++)
{
if (!GetPixel(x, y))
{
hist[y]++;
}
else
{
hist[y] = 0;
}
}
// find the maximum size of the current histogram. If it happens to be larger
// than the currently recorded max size, then it is the new max size.
Tuple<int, int> maxSizeTemp = MaxRectSize(hist, out YStart);
int tempArea = (int)(maxSizeTemp.Item1 * maxSizeTemp.Item2);
if (tempArea > maxArea)
{
maxSize = maxSizeTemp;
maxArea = tempArea;
MaxRectTop = YStart;
XEnd = x;
}
}
MaxRect.Left = XEnd - maxSize.Item1 + 1;
MaxRect.Top = MaxRectTop;
MaxRect.Right = XEnd;
MaxRect.Bottom = MaxRectTop + maxSize.Item2 - 1;
// at this point, we know the max size
return MaxRect;
}
private Tuple<int, int> MaxRectSize(int[] histogram, out int YStart)
{
Tuple<int, int> maxSize = new Tuple<int, int>(0, 0);
int maxArea = 0;
Stack<Tuple<int, int>> stack = new Stack<Tuple<int, int>>();
int x = 0;
YStart = 0;
for (x = 0; x < histogram.Length; x++)
{
int start = x;
int height = histogram[x];
while (true)
{
if (stack.Count == 0 || height > stack.Peek().Item2)
{
stack.Push(new Tuple<int, int>(start, height));
}
else if (height < stack.Peek().Item2)
{
int tempArea = (int)(stack.Peek().Item2 * (x - stack.Peek().Item1));
if (tempArea > maxArea)
{
YStart = stack.Peek().Item1;
maxSize = new Tuple<int, int>(stack.Peek().Item2, (x - stack.Peek().Item1));
maxArea = tempArea;
}
Tuple<int, int> popped = stack.Pop();
start = (int)popped.Item1;
continue;
}
break;
}
}
foreach (Tuple<int, int> data in stack)
{
int tempArea = (int)(data.Item2 * (x - data.Item1));
if (tempArea > maxArea)
{
YStart = data.Item1;
maxSize = new Tuple<int, int>(data.Item2, (x - data.Item1));
maxArea = tempArea;
}
}
return maxSize;
}

An appropriate algorithm can be found within Algorithm for finding the largest inscribed rectangle in polygon (2019).
I implemented it in Python:
import largestinteriorrectangle as lir
import numpy as np
grid = np.array([[0, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0]],
"bool")
grid = ~grid
lir.lir(grid) # [1, 2, 4, 3]
The result is returned as x, y, width, height.
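To relate that result to the "Cell 1 / Cell 2" form asked for in the question, here is a small sketch that converts an x, y, width, height rectangle into its top-left and bottom-right cells. The helper `to_corner_cells` is hypothetical and not part of the library; it assumes x is the column index and y is the row index, which is consistent with the expected answer in the question.

```python
def to_corner_cells(rect):
    """Convert (x, y, width, height) into ((row, col), (row, col)) corners."""
    x, y, width, height = rect
    top_left = (y, x)                               # (row, column)
    bottom_right = (y + height - 1, x + width - 1)  # inclusive corner
    return top_left, bottom_right


print(to_corner_cells([1, 2, 4, 3]))  # ((2, 1), (4, 4))
```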


Given a 2D array of numbers, find clusters - algorithm

Given a 2D array, for example: 0 0 0 0 0 0 2 3 0 1 0 8 5 0 7 7 0 0 0 4 Output should be groups of clusters: Cluster 1: <2,3,8,5,7> Cluster 2: <1,7,4>

If you know the number of groups or want to fit your data to a static number of groups, you can do k-means. http://en.wikipedia.org/wiki/K-means_clustering
