Looking for an optimal solution - algorithm

This was one of the interview questions at Amazon.
Given a 2D array of 0's and 1's, we need to find the pattern of maximum size.
The pattern is as follows:
size = 1:
  1
1 1 1
  1
size = 2:
  1
  1
1 1 1
  1
  1
Naive solution: Traverse every element of the MxN matrix. For each index with value 1, check whether its left and right entries are 1, and note the maximum length of consecutive 1's above and below the index.
I am looking for a better solution. If anyone has a clue, please do post.
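For reference, the naive approach described above could be sketched in Python roughly as follows. This is only an illustration: the function name `find_pattern_naive` and the choice to return a `(size, (row, col))` tuple are assumptions, and the size is taken to be the shorter of the two vertical arms.

```python
def find_pattern_naive(grid):
    """Return (size, (row, col)) of the largest pattern, or (0, None) if there is none."""
    rows, cols = len(grid), len(grid[0])
    best_size, best_pos = 0, None
    for r in range(rows):
        for c in range(1, cols - 1):
            # The centre and its left and right neighbours must all be 1.
            if not (grid[r][c] == 1 and grid[r][c - 1] == 1 and grid[r][c + 1] == 1):
                continue
            # Count consecutive 1's directly above the centre.
            up = 0
            while r - up - 1 >= 0 and grid[r - up - 1][c] == 1:
                up += 1
            # Count consecutive 1's directly below the centre.
            down = 0
            while r + down + 1 < rows and grid[r + down + 1][c] == 1:
                down += 1
            size = min(up, down)
            if size > best_size:
                best_size, best_pos = size, (r, c)
    return best_size, best_pos
```

Each candidate centre costs up to O(M) for the vertical scans, so the whole search is O(M^2 * N) in the worst case.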

I assume that any 1 values that surround such a pattern do not destroy it, so that this would also have size 1:
1 1 1 1
1 1 1 1
1 1 0 1
1 1 1 1
In that case I would suggest an algorithm where for each column you do the following:
Initialise size as 0
For each cell in this column:
Push the current size on a stack; it represents the number of 1 values in the upward direction starting from this cell.
If the value in this cell is a 1, then increase size, otherwise set it to 0
Initialise size as 0
For each cell in this column, but in reverse order:
Pop the last value from the stack
Call thisSize the least of the popped value and the value of size.
If thisSize is greater than the best pattern found so far and the values at both sides of the current cell are 1, then consider this the best pattern.
If the value in this cell is a 1, then increase size, otherwise set it to 0
As a further optimisation you could exit the second loop as soon as the distance between the current cell and the top of the grid becomes smaller than the size of the largest pattern we already found earlier.
Here is an implementation in JavaScript:
function findPattern(a) {
    var rows = a.length,
        cols = a[0].length,
        maxSize = -1,
        stack = [],
        row, col, pos, size, thisSize; // size is declared here to avoid an implicit global
    for (col = 1; col < cols-1; col++) {
        stack = [];
        // Downward counting to store the number of 1s in upward direction
        size = 0;
        for (row = 0; row < rows; row++) {
            stack.push(size);
            size = a[row][col] == 1 ? size + 1 : 0;
        }
        // Reverse, but only as far as still useful given the size we already found
        size = 0;
        for (row = rows - 1; row > maxSize; row--) {
            thisSize = Math.min(size, stack.pop());
            // The center cell itself must be 1, as must the cells at both sides of it
            if (thisSize >= maxSize && a[row][col] == 1
                    && a[row][col-1] == 1 && a[row][col+1] == 1) {
                maxSize = thisSize;
                pos = [row, col];
            }
            size = a[row][col] == 1 ? size + 1 : 0;
        }
    }
    return [maxSize, pos];
}
// Sample data:
var a = [
[0, 0, 1, 0, 0, 1, 0],
[0, 0, 1, 1, 0, 1, 0],
[1, 1, 1, 0, 0, 1, 1],
[1, 0, 1, 0, 1, 1, 0],
[1, 1, 1, 1, 0, 1, 0],
[0, 1, 1, 1, 1, 1, 1],
[0, 0, 1, 0, 0, 1, 0]];
var [size, pos] = findPattern(a);
console.log('Size ' + size + ' with center at row ' + (pos[0]+1)
+ ' and column ' + (pos[1]+1) + ' (first row/col is numbered 1)');

Here: gist.github.com/... is a generic Java implementation that finds the largest plus(+) pattern in a 2D matrix of any size.
The idea is to look for the biggest possible plus(+) pattern first around the central elements (the initial window) of the matrix. For each element in the window, find the max plus size centered at that element.
If the largest possible size is found, return it.
If largest possible '+' is not found, store the size of whatever smaller than that was found and repeat search from step #1 in the next outer window around the previous window for 1-size smaller '+' pattern; iteratively searching for '+' in an 'onion layer pattern' from inside towards outside.
The initial central window is chosen such that edges of matrix are equally far on all sides from this window.
Example 1 - For matrix of size {4x3}, smallest central window lies from (1,1) to (2,1)
Example 2 - For matrix of size {3x9}, smallest central window lies from (1,1) to (1,7)
int rows = arr.length;
int cols = arr[0].length;
int min = rows < cols ? rows : cols;
int diff = rows > cols ? rows - cols : cols - rows;
// Initializing initial window params. The center most smallest window possible
int first_r, first_c, last_r, last_c;
first_r = (min - 1) / 2;
first_c = (min - 1) / 2;
last_r = rows < cols ? (rows / 2) : (cols / 2) + diff;
last_c = rows > cols ? (cols / 2) : (rows / 2) + diff;
Full Java code:
public class PlusPattern {
/**
* Utility method to verify matrix dimensions
*
* @param a matrix to be verified
* @return true if matrix size is greater than 0;
*/
private static boolean isValid(int[][] a) {
return a.length > 0 && a[0].length > 0;
}
/**
* Finds the size of the largest plus(+) pattern of the given 'symbol' integer in an integer 2D matrix.
*
* The idea is to look for the biggest possible plus(+) pattern first around the central elements
* of the matrix. If the largest is found, return the largest size. If the largest possible + is not
* found, store the size of whatever smaller one was found and repeat the search for a 1-size
* smaller + in the next outer window around the previous window.
*
* @param arr matrix to be searched
* @param symbol whose + pattern is sought
* @return the radius of the largest + found in the matrix.
*/
static int findLargestPlusPattern(int[][] arr, int symbol) {
if (!isValid(arr)) {
throw new IllegalArgumentException("Cannot perform search on empty array");
}
int maxPlusRadius = 0;
int rows = arr.length;
int cols = arr[0].length;
int min = rows < cols ? rows : cols;
int diff = rows > cols ? rows - cols : cols - rows;
// Initializing initial window params. The center most smallest window possible
// Example - For matrix of size {4x3}, smallest central window lies from [1][1] to [2][1]
// Example - For matrix of size {3x9}, smallest central window lies from [1][1] to [1][7]
int first_r, first_c, last_r, last_c;
first_r = (min - 1) / 2;
first_c = (min - 1) / 2;
last_r = rows < cols ? (rows / 2) : (cols / 2) + diff;
last_c = rows > cols ? (cols / 2) : (rows / 2) + diff;
// Initializing with biggest possible search radius in the matrix
int searchRadius = (min - 1) / 2;
int r, c;
int found;
// Iteratively searching for + in an 'onion layer pattern' from inside to outside
while (searchRadius > maxPlusRadius) { // no need to find smaller + patterns than the one already found
// initializing r and c cursor for this window iterations.
r = first_r;
c = first_c;
// Search each of the 4 sides of the current window in a clockwise manner
// 1# Scan the top line of window
do { // do-while used to search inside initial window with width==1
found = findLargestPlusAt(r, c, arr, symbol, searchRadius);
if (found == searchRadius) {
return searchRadius; // cannot find a bigger plus(+) than this in remaining matrix
} else if (found > maxPlusRadius) {
maxPlusRadius = found;
}
c++;
} while (c < last_c);
if (c > last_c)
c--; // for initial window with width==1. Otherwise #3 condition will be true for invalid c-index
// 2# Scan the right line of window
do { // do-while used to search inside initial window with height==1
found = findLargestPlusAt(r, c, arr, symbol, searchRadius);
if (found == searchRadius) {
return searchRadius;
} else if (found > maxPlusRadius) {
maxPlusRadius = found;
}
r++;
} while (r < last_r);
if (r > last_r)
r--; // for initial window with height==1. Otherwise #4 condition will be true for invalid r-index
// 3# Scan the bottom line of window
while (c > first_c) {
found = findLargestPlusAt(r, c, arr, symbol, searchRadius);
if (found == searchRadius) {
return searchRadius;
} else if (found > maxPlusRadius) {
maxPlusRadius = found;
}
c--;
}
// 4# Scan the left line of window
while (r > first_r) {
found = findLargestPlusAt(r, c, arr, symbol, searchRadius);
if (found == searchRadius) {
return searchRadius;
} else if (found > maxPlusRadius) {
maxPlusRadius = found;
}
r--;
}
// r and c comes back at first_r and first_c.
// increasing window on all sides by 1.
first_r--;
first_c--;
last_r++;
last_c++;
// reducing search radius to avoid out of bounds error on next window.
searchRadius--;
}
return maxPlusRadius;
}
/**
* Finds the size of the largest plus around a given point a[r][c], if one exists. It grows the
* radius greedily to maximise the + pattern, and returns 0 if the point is the only symbol.
*
* @param r row coordinate of search center
* @param c column coordinate of search center
* @param a matrix
* @param symbol search symbol
* @param max_radius around the center to be searched for + pattern
* @return -1 if the point itself is not the symbol;
* n if all the next elements in E W N S directions within radius n are the symbol elements.
*/
static int findLargestPlusAt(int r, int c, int[][] a, int symbol, int max_radius) {
int largest = -1;
if (a[r][c] != symbol) { // If center coordinate itself is not the symbol
return largest;
} else {
largest = 0;
}
for (int rad = 1; rad <= max_radius; rad++) {
if (a[r + rad][c] == symbol && a[r][c + rad] == symbol && a[r - rad][c] == symbol && a[r][c - rad] == symbol) {
largest = rad; // At least a '+' of radius 'rad' is present.
} else {
break;
}
}
return largest;
}
public static void main(String[] args) {
int mat[][];
mat = new int[][]{ // max + = 3
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 1, 1, 1, 1, 1,},
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 0, 1, 1, 0, 1,},
{1, 1, 0, 1, 1, 0, 1,},
};
int find = findLargestPlusPattern(mat, 1);
System.out.println("# Max + size radius is : " + find);
mat = new int[][]{ // max + = 2
{1, 1, 9, 1, 1, 9, 1,},
{1, 1, 9, 1, 1, 9, 1,},
{7, 1, 1, 1, 1, 1, 1,},
{1, 1, 9, 1, 1, 9, 1,},
{1, 1, 9, 1, 1, 9, 1,},
};
find = findLargestPlusPattern(mat, 1);
System.out.println("# Max + size radius is : " + find);
mat = new int[][]{ // max + = 1
{1, 1, 0, 1, 1},
{1, 1, 0, 1, 1},
{1, 1, 0, 1, 1},
{1, 1, 1, 6, 1},
{1, 1, 0, 1, 1},
{1, 1, 0, 1, 1},
};
find = findLargestPlusPattern(mat, 1);
System.out.println("# Max + size radius is : " + find);
}
}

The following uses basically the same logic as that provided by trincot in his answer, but does the reverse scan whenever there's a break in consecutive 1's. This eliminates the need to build an explicit stack.
The running time should be approximately the same. The only advantage to my method is that this algorithm uses O(1) extra space, whereas trincot's uses O(rowCount) extra space for the stack.
The extra stack makes for shorter and more readable code, though.
Code is in C#:
class FindPatternResult
{
public int Size { get; set; }
public int Col { get; set; }
public int Row { get; set; }
}
private FindPatternResult FindPattern(int[,] a)
{
// The array is indexed as a[row, col], so dimension 0 is rows and dimension 1 is columns.
var numRows = a.GetUpperBound(0)+1;
var numCols = a.GetUpperBound(1)+1;
var maxSize = -1;
var maxCol = -1;
var maxRow = -1;
// anonymous function for checking matches when there is
// a break in consecutive 1's.
var checkForMatch = new Action<int, int, int>((height, bottomRow, col) =>
{
var topRow = bottomRow - height + 1;
for (int row = bottomRow-1; row > topRow; --row)
{
// There's a potential optimization opportunity here.
// If we get beyond the midpoint and size is decreasing,
// then if size < maxSize, we can early-out.
// For example, if maxSize is 3 and row-topRow < 3,
// then there can't possibly be a longer match in this column.
// I didn't add that optimization because it doesn't
// really change the algorithm. But if the number of rows
// is very large, it could be meaningful.
if (a[row, col - 1] == 1 && a[row, col + 1] == 1)
{
var size = Math.Min(bottomRow-row, row-topRow);
if (size > maxSize)
{
maxSize = size;
maxCol = col;
maxRow = row;
}
}
}
});
for (int col = 1; col < numCols - 1; ++col)
{
var size = 0;
for (int row = 0; row < numRows; ++row)
{
if (a[row,col] == 1)
{
++size;
}
else
{
// If size >= 3, then go back and check for a match.
// The run of 1's ended on the previous row, so that row is the bottom row.
if (size >= 3)
{
checkForMatch(size, row - 1, col);
}
size = 0;
}
}
// If we end the loop with size >= 3, then check from the bottom row.
if (size >= 3)
{
checkForMatch(size, numRows - 1, col);
}
}
return new FindPatternResult { Size = maxSize, Col = maxCol, Row = maxRow };
}
Test with:
private void DoIt()
{
var rslt = FindPattern(_sampleData);
Console.WriteLine($"Result: {rslt.Size}, [{rslt.Row}, {rslt.Col}]");
}
private readonly int[,] _sampleData =
{
{0, 0, 1, 0, 0, 1, 0},
{0, 0, 1, 1, 0, 1, 0},
{1, 1, 1, 0, 0, 1, 1},
{1, 0, 1, 0, 1, 1, 0},
{1, 1, 1, 1, 0, 1, 0},
{0, 1, 1, 1, 1, 1, 1},
{0, 0, 1, 0, 0, 1, 0}
};

Related

Algorithm puzzle: minimum cost to allow all persons standing on a line to communicate with each other

I have an algorithm design puzzle that I could not solve.
The puzzle is formulated like this: There are N persons standing on a number line, each of whom may be standing on any integer on that line. Multiple persons may stand on the same number. For any two persons to be able to communicate with each other, the distance between them should be less than K. The goal is to move them so that each pair of persons can communicate with each other (possibly via other people). In other words, we need to move them so that the distance between any two neighboring persons is smaller than K.
Question: What is the minimum number of total moves? It feels like this falls into the greedy algorithm or dynamic programming family. Any hints are appreciated!
We can do the following in O(n):
Calculate the cost of moving all people to the right of person i towards person i at an acceptable distance:
costRight(A[i]) = costRight(A[i+1]) + (A[i+1] - A[i] - k + 1) * count of people to the right
K = 3; A = { 0, 3, 11, 17, 21}
costRight = {32, 28, 10, 2, 0}
Calculate the cost of moving all people to the left of person i towards person i at an acceptable distance:
costLeft(A[i]) = costLeft(A[i-1]) + (A[i] - A[i-1] - k + 1) * count of people to the left
K = 3; A = { 0, 3, 11, 17, 21}
costLeft = { 0, 1, 13, 25, 33}
costRight = {32, 28, 10, 2, 0}
Now that we have cost from both directions we can do this in O(n):
minCost = min(costRight + costLeft) for all A[i]
minCost = min(32 + 0, 28 + 1, 13 + 10, 25 + 2, 33 + 0) = 23
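The two recurrences above can be sketched in Python as follows. This is a minimal illustration, assuming sorted positions and that every neighbouring gap is at least K - 1 (as in the example), so no term goes negative; the function name `move_costs` is made up for this sketch.

```python
def move_costs(positions, k):
    """Return (cost_left, cost_right) as lists, following the two recurrences."""
    n = len(positions)
    # Shift needed to bring neighbours i and i+1 to a distance of k - 1.
    gap = [positions[i + 1] - positions[i] - k + 1 for i in range(n - 1)]

    # cost_right[i]: cost of pulling everyone to the right of i towards i.
    cost_right = [0] * n
    for i in range(n - 2, -1, -1):
        people_to_right = n - 1 - i
        cost_right[i] = cost_right[i + 1] + gap[i] * people_to_right

    # cost_left[i]: cost of pulling everyone to the left of i towards i.
    cost_left = [0] * n
    for i in range(1, n):
        people_to_left = i
        cost_left[i] = cost_left[i - 1] + gap[i - 1] * people_to_left

    return cost_left, cost_right


cost_left, cost_right = move_costs([0, 3, 11, 17, 21], 3)
min_cost = min(l + r for l, r in zip(cost_left, cost_right))
```

For the example this reproduces the two cost rows and the minimum of 23.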
But sometimes that's not enough:
K = 3; A = { 0, 0, 1, 8, 8}
carry: -2 -4 3
costLeft = { 0, 0, 0, 11, 11}
carry: -3 5 -2
costRight = { 8, 8, 8, 0, 0}
The optimum is neither 11 nor 8. Test the current best by moving towards the greatest saving:
move 1 to 2, cost = 1
K = 3; A = { 0, 0, 2, 8, 8}
carry: -2 -2 -10
costLeft = { 0, 0, 0, 10, 10}
carry: -2 -2
costRight = { 6, 6, 6, 0, 0}
minCost = 1 + min(0 + 6, 0 + 6, 0 + 6, 10 + 0, 10 + 0) = 1 + 6 = 7
I'm not quite sure how to formalize this efficiently.
Here is a greedy algorithm written in Java, but I don't know if it gives the optimal solution in every case. Also it is more a proof of concept, there is some room for optimizations.
It is based on the fact that two neighbouring persons must not be more than K apart, the next neighbour must not be more than 2K away and so on. In each step we move the person that "violates these constraints most". The details of this calculation are in method calcForce.
package so;
import java.util.Arrays;
public class Main {
public static void main(String args[]) {
int[] position = new int[] {0, 0, 5, 11, 17, 23};
int k = 5;
solve(position, k);
}
private static void solve(int[] position, int k) {
if (!sorted(position)) {
throw new IllegalArgumentException("positions must be sorted");
}
int[] force = new int[position.length];
int steps = 0;
while (calcForce(position, k, force)) {
int mp = -1;
int mv = -1;
for (int i = 0; i < force.length; i++) {
if (mv < Math.abs(force[i])) {
mv = Math.abs(force[i]);
mp = i;
}
}
System.out.printf("move %d to the %s%n", mp, force[mp] > 0 ? "right" : "left");
if (force[mp] > 0) {
position[mp]++;
} else {
position[mp]--;
}
steps++;
}
System.out.printf("total: %d steps%n", steps);
}
private static boolean calcForce(int[] position, int k, int[] force) {
boolean commProblem = false;
Arrays.fill(force, 0);
for (int i = 0; i < position.length - 1; i++) {
for (int j = i + 1; j < position.length; j++) {
int f = position[j] - position[i] - (j - i) * k;
if (f > 0) {
force[i] += f;
force[j] -= f;
commProblem = true;
}
}
}
return commProblem;
}
private static boolean sorted(int[] position) {
for (int i = 0; i < position.length - 1; i++) {
if (position[i] > position[i+1]) {
return false;
}
}
return true;
}
}

Find the first "missing" number in a sorted list

Let's say I have the continuous range of integers [0, 1, 2, 4, 6], in which the 3 is the first "missing" number. I need an algorithm to find this first "hole". Since the range is very large (containing perhaps 2^32 entries), efficiency is important. The range of numbers is stored on disk; space efficiency is also a main concern.
What's the best time and space efficient algorithm?
Use binary search. If a range of numbers has no hole, then the difference between the end and start of the range will also be the number of entries in the range.
You can therefore begin with the entire list of numbers, and chop off either the first or second half based on whether the first half has a gap. Eventually you will come to a range with two entries with a hole in the middle.
The time complexity of this is O(log N). Contrast to a linear scan, whose worst case is O(N).
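A minimal Python sketch of this idea, assuming the input is a strictly increasing list of integers held in memory; the name `first_missing` and the convention of returning `None` when there is no hole are choices made for this sketch only.

```python
def first_missing(sorted_numbers):
    """Return the first missing value, or None if the list has no hole."""
    if not sorted_numbers:
        return None
    base = sorted_numbers[0]
    lo, hi = 0, len(sorted_numbers) - 1
    # No hole at all: the value range matches the index range.
    if sorted_numbers[hi] - base == hi:
        return None
    # Invariant: there is no hole in [0..lo], and there is one in [lo..hi].
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if sorted_numbers[mid] - base == mid:
            lo = mid  # first half is gap-free, so the hole is to the right
        else:
            hi = mid  # the hole is at or before mid
    return sorted_numbers[lo] + 1
```

For data stored on disk, the same loop applies as long as an element can be read by index, since only O(log N) elements are ever inspected.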
Based on the approach suggested by @phs above, here is the C code to do that:
#include <stdio.h>
int find_missing_number(int arr[], int len) {
int first, middle, last;
first = 0;
last = len - 1;
middle = (first + last)/2;
while (first < last) {
if ((arr[middle] - arr[first]) != (middle - first)) {
/* there is a hole in the first half */
if ((middle - first) == 1 && (arr[middle] - arr[first] > 1)) {
return (arr[first] + 1); /* first missing value after arr[first] */
}
last = middle;
} else if ((arr[last] - arr[middle]) != (last - middle)) {
/* there is a hole in the second half */
if ((last - middle) == 1 && (arr[last] - arr[middle] > 1)) {
return (arr[middle] + 1);
}
first = middle;
} else {
/* there is no hole */
return -1;
}
middle = (first + last)/2;
}
/* there is no hole */
return -1;
}
int main() {
int arr[] = {3, 5, 6}; /* the input must be sorted */
printf("%d", find_missing_number(arr, sizeof arr/(sizeof arr[0]))); /* prints 4 */
return 0;
}
Since the numbers from 0 to n - 1 are sorted in an array, the first numbers should be the same as their indexes. That is to say, the number 0 is located at the cell with index 0, the number 1 is located at the cell with index 1, and so on. If the missing number is denoted as m, the numbers less than m are located at cells whose indexes are the same as their values.
The number m + 1 is located at the cell with index m, the number m + 2 is located at the cell with index m + 1, and so on. We can see that the missing number m is the index of the first cell whose value is not identical to its index.
Therefore, we need to search the array for the first cell whose value is not identical to its index. Since the array is sorted, we can find it in O(lg n) time with the binary search algorithm, as implemented below:
int getOnceNumber_sorted(int[] numbers)
{
int length = numbers.length;
int left = 0;
int right = length - 1;
while(left <= right)
{
int middle = (right + left) >> 1;
if(numbers[middle] != middle)
{
if(middle == 0 || numbers[middle - 1] == middle - 1)
return middle;
right = middle - 1;
}
else
left = middle + 1;
}
return -1;
}
This solution is borrowed from my blog: http://codercareer.blogspot.com/2013/02/no-37-missing-number-in-array.html.
Based on the algorithm provided by @phs:
int findFirstMissing(int array[], int start , int end){
if(end<=start+1){
return start+1;
}
else{
int mid = start + (end-start)/2;
if((array[mid] - array[start]) != (mid-start))
return findFirstMissing(array, start, mid);
else
return findFirstMissing(array, mid+1, end);
}
}
Below is my solution, which I believe is simple and avoids an excess number of confusing if-statements. It also works when you don't start at 0 or have negative numbers involved! The complexity is O(lg(n)) time with O(1) space, assuming the client owns the array of numbers (otherwise it's O(n)).
The Algorithm in C Code
int missingNumber(int a[], int size) {
int lo = 0;
int hi = size - 1;
// TODO: Use this if we need to ensure we start at 0!
//if(a[0] != 0) { return 0; }
// All elements present? If so, return next largest number.
if((hi-lo) == (a[hi]-a[lo])) { return a[hi]+1; }
// While 2 or more elements are left to consider...
while((hi-lo) >= 2) {
int mid = (lo + hi) / 2;
if((mid-lo) != (a[mid]-a[lo])) { // Explore left-hand side
hi = mid;
} else { // Explore right hand side
lo = mid + 1;
}
}
// Return missing value from the two candidates remaining...
return (lo == (a[lo]-a[0])) ? hi + a[0] : lo + a[0];
}
Test Outputs
int a[] = {0}; // Returns: 1
int a[] = {1}; // Returns: 2
int a[] = {0, 1}; // Returns: 2
int a[] = {1, 2}; // Returns: 3
int a[] = {0, 2}; // Returns: 1
int a[] = {0, 2, 3, 4}; // Returns: 1
int a[] = {0, 1, 2, 4}; // Returns: 3
int a[] = {0, 1, 2, 4, 5, 6, 7, 8, 9}; // Returns: 3
int a[] = {2, 3, 5, 6, 7, 8, 9}; // Returns: 4
int a[] = {2, 3, 4, 5, 6, 8, 9}; // Returns: 7
int a[] = {-3, -2, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; // Returns: -1
int a[] = {-3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; // Returns: 10
The general procedure is:
(Optional) Check if the array starts at 0. If it doesn't, return 0 as missing.
Check if the array of integers is complete with no missing integer. If it is not missing an integer, return the next largest integer.
In a binary search fashion, check for a mismatch between the difference in the indices and array values. A mismatch tells us which half a missing element is in. If there is a mismatch in the first half, move left, otherwise move right. Do this until you have two candidate elements left to consider.
Return the number that is missing based on incorrect candidate.
Note, the algorithm's assumptions are:
First and last elements are considered to never be missing. These elements establish a range.
Only one integer is ever missing in the array. This will not find more than one missing integer!
Integers in the array are expected to increase in steps of 1, not at any other rate.
Have you considered a run-length encoding? That is, you encode the first number as well as the count of numbers that follow it consecutively. Not only can you represent the numbers used very efficiently this way, the first hole will be at the end of the first run-length encoded segment.
To illustrate with your example:
[0, 1, 2, 4, 6]
Would be encoded as:
[0:3, 4:1, 6:1]
Where x:y means there is a set of numbers consecutively starting at x for y numbers in a row. This tells us immediately that the first gap is at location 3. Note, however, that this will be much more efficient when the assigned addresses are clustered together, not randomly dispersed throughout the range.
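As a rough Python illustration of this encoding (the helper names `run_length_encode` and `first_gap` are made up for this sketch, and the input is assumed to be sorted with no duplicates):

```python
def run_length_encode(sorted_numbers):
    """Encode a sorted list of distinct integers as (start, count) runs."""
    runs = []
    for n in sorted_numbers:
        # Extend the current run if n directly follows its last element.
        if runs and n == runs[-1][0] + runs[-1][1]:
            runs[-1][1] += 1
        else:
            runs.append([n, 1])
    return [tuple(run) for run in runs]


def first_gap(runs):
    """The first hole sits right after the end of the first run."""
    start, count = runs[0]
    return start + count
```

With the example list, `run_length_encode([0, 1, 2, 4, 6])` gives `[(0, 3), (4, 1), (6, 1)]`, and `first_gap` on that result gives 3 without scanning the numbers again.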
If the list is sorted, I'd iterate over the list and do something like this Python code:
missing = []
check = 0
for n in numbers:
    if n > check:
        # all the numbers in [check, n) were not present
        missing += range(check, n)
    check = n + 1
# now we account for any missing numbers after the last element of numbers
# (MAX is the largest value that may appear)
if check <= MAX:
    missing += range(check, MAX + 1)
If lots of numbers are missing, you might want to use @Nathan's run-length encoding suggestion for the missing list.
Missing number = (1/2)(n)(n+1) - (sum of all elements in the array)
Here n is the size of the array + 1. This assumes the array holds the numbers 1 to n with exactly one of them missing.
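A short Python sketch of this formula, assuming the array contains the integers 1 to n with exactly one value missing (the function name `missing_by_sum` is illustrative only):

```python
def missing_by_sum(numbers):
    """Return the single value missing from the integers 1..n."""
    n = len(numbers) + 1
    expected_total = n * (n + 1) // 2  # sum of 1..n
    return expected_total - sum(numbers)
```

Note that this reads every element, so it takes O(n) time, and it cannot report the first hole when more than one number is missing.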
Array: [1,2,3,4,5,6,8,9]
Index: [0,1,2,3,4,5,6,7]
int findMissingElementIndex(int a[], int start, int end)
{
    int mid = (start + end)/2;
    // Gap somewhere in [start, mid]
    if( Math.abs(a[mid] - a[start]) != Math.abs(mid - start) ){
        if( Math.abs(mid - start) == 1 && Math.abs(a[mid] - a[start])!=1 ){
            return start +1;
        }
        else{
            return findMissingElementIndex(a,start,mid);
        }
    }
    // Gap somewhere in [mid, end]
    else if( Math.abs(a[end] - a[mid]) != Math.abs(end - mid) ){
        if( Math.abs(end - mid) ==1 && Math.abs(a[end] - a[mid])!=1 ){
            return mid +1;
        }
        else{
            return findMissingElementIndex(a,mid,end);
        }
    }
    else{
        // No_Problem is a sentinel constant the caller must define (for example -1)
        return No_Problem;
    }
}
This is an interview question. The array may have more than one missing number, and we put all of those missing numbers in an ArrayList.
import java.util.ArrayList;
import java.util.List;
public class Test4 {
public static void main(String[] args) {
int[] a = { 1, 3, 5, 7, 10 };
List<Integer> list = new ArrayList<>();
int start = 0;
for (int i = 0; i < a.length; i++) {
int ch = a[i];
if (start == ch) {
start++;
} else {
list.add(start);
start++;
i--; // a must do.
} // else
} // for
System.out.println(list);
}
}
Functional Programming solution (Scala)
Nice and elegant
Lazy evaluation
def gapFinder(sortedList: List[Int], start: Int = 0): Int = {
def withGuards: Stream[Int] =
(start - 1) +: sortedList.toStream :+ (sortedList.last + 2)
if (sortedList.isEmpty) start
else withGuards.sliding(2)
.dropWhile { p => p.head + 1 >= p.last }.next()
.headOption.getOrElse(start) + 1
} // 8-line solution
// Tests
assert(gapFinder(List()) == 0)
assert(gapFinder(List[Int](0)) == 1)
assert(gapFinder(List[Int](1)) == 0)
assert(gapFinder(List[Int](2)) == 0)
assert(gapFinder(List[Int](0, 1, 2)) == 3)
assert(gapFinder(List[Int](0, 2, 4)) == 1)
assert(gapFinder(List[Int](0, 1, 2, 4)) == 3)
assert(gapFinder(List[Int](0, 1, 2, 4, 5)) == 3)
import java.util.Scanner;
class MissingNumber {
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
int n = scan.nextInt();
int[] arr =new int[n];
for (int i=0;i<n;i++){
arr[i]=scan.nextInt();
}
for (int i=0;i<n-1;i++){ // stop at n-1 so that arr[i+1] stays in bounds
if(arr[i+1]==arr[i]+1){
}
else{
System.out.println(arr[i]+1);
break;
}
}
}
}
I was looking for a super simple way to find the first missing number in a sorted array with a max potential value in JavaScript, and didn't have to worry about efficiency too much as I didn't plan on using a list longer than 10-20 items at the most. This is the recursive function I came up with:
function findFirstMissingNumber(sortedList, index, x, maxAllowedValue){
if(sortedList[index] == x && x < maxAllowedValue){
return findFirstMissingNumber(sortedList, (index+1), (x+1), maxAllowedValue);
}else{ return x; }
}
findFirstMissingNumber([3, 4, 5, 7, 8, 9], 0, 3, 10);
//expected output: 6
Give it your array, the index you wish to start at, the value you expect it to be and the maximum value you'd like to check up to.
I have an algorithm for finding the missing number in a sorted list. Its complexity is O(log N).
public int execute2(int[] array) {
int diff = Math.min(array[1]-array[0], array[2]-array[1]);
int min = 0, max = array.length-1;
boolean missingNum = true;
while(min<max) {
int mid = (min + max) >>> 1;
int leftDiff = array[mid] - array[min];
if(leftDiff > diff * (mid - min)) {
if(mid-min == 1)
return (array[mid] + array[min])/2;
max = mid;
missingNum = false;
continue;
}
int rightDiff = array[max] - array[mid];
if(rightDiff > diff * (max - mid)) {
if(max-mid == 1)
return (array[max] + array[mid])/2;
min = mid;
missingNum = false;
continue;
}
if(missingNum)
break;
}
return -1;
}
Based on the algorithm provided by @phs:
public class Solution {
public int missing(int[] array) {
// write your solution here
if(array == null){
return -1;
}
if (array.length == 0) {
return 1;
}
int left = 0;
int right = array.length -1;
while (left < right - 1) {
int mid = left + (right - left) / 2;
if (array[mid] - array[left] != mid - left) { //there is gap in [left, mid]
right = mid;
}else if (array[right] - array[mid] != right - mid) { //there is gap in [mid, right]
left = mid;
}else{ //there is no gap in [left, right], which means the missing number is at either end: 1 or N
return array[0] == 1 ? array.length + 1 : 1 ;
}
}
if (array[right] - array[left] == 2){ //missing number is between array[left] and array[right]
return left + 2;
}else{
return array[0] == 1 ? -1 : 1; //when there is only one element in array
}
}
}
public static int findFirst(int[] arr) {
int l = -1;
int r = arr.length;
while (r - l > 1) {
int middle = (r + l) / 2;
if (arr[middle] > middle) {
r = middle;
} else {
l = middle; // only advance l when the prefix up to middle has no gap
}
}
return r;
}

What is the most efficient algorithm to equalize a vector?

Given a vector of n integer elements, what is the most efficient algorithm that produces the minimum number of transformation steps resulting in a vector whose elements are all equal, knowing that:
in a single step, an element can transfer at most one point in total to its neighbours ([0, 3, 0] -> [1, 2, 0] is ok but not [0, 3, 0] -> [1, 1, 1]).
in a single step, an element can receive 2 points: one from its left neighbour and one from its right ([3, 0, 3] -> [2, 2, 2]).
the first and last elements have only one neighbour: the 2nd element and the (n-1)th element, respectively.
an element cannot be negative at any step.
Examples :
Given :
0, 3, 0
Then 2 steps are required :
1, 2, 0
1, 1, 1
Given :
3, 0, 3
Then 1 step is required :
2, 2, 2
Given :
4, 0, 0, 0, 4, 0, 0, 0
Then 3 steps are required :
3, 1, 0, 0, 3, 1, 0, 0
2, 1, 1, 0, 2, 1, 1, 0
1, 1, 1, 1, 1, 1, 1, 1
My current algorithm is based on the sums of the integers on each side of an element, but I'm not sure it produces the minimum number of steps.
FYI the problem is part of a code contest (created by Criteo http://codeofduty.criteo.com) that is over.
Here is a way. You know the sum of the array, so you know the target number in each cell.
Thus you also know the target sum for each subarray.
Then iterate through the array and at each element make a decision:
Move 1 to the left: if the sum up to the previous element is less than desired.
Move 1 to the right: if the sum up to the current element is more than desired.
Don't do anything: if both of the above are false.
Repeat this until no more changes are made (i.e. only the third option was applied for every element).
public static int F(int[] ar)
{
int iter = -1;
bool finished = false;
int total = ar.Sum();
if (ar.Length == 0 || total % ar.Length != 0) return 0; //can't do it
int target = total / ar.Length;
int sum = 0;
while (!finished)
{
iter++;
finished = true;
bool canMoveNext = true;
//first element
if (ar[0] > target)
{
finished = false;
ar[0]--;
ar[1]++;
canMoveNext = ar[1] != 1;
}
sum = ar[0];
for (int i = 1; i < ar.Length; i++)
{
if (!canMoveNext)
{
canMoveNext = true;
sum += ar[i];
continue;
}
if (sum < i * target && ar[i] > 0)
{
finished = false;
ar[i]--;
ar[i - 1]++;
sum++;
}
else if (sum + ar[i] > (i + 1) * target && ar[i] > 0) //this can't happen for the last element so we are safe
{
finished = false;
ar[i]--;
ar[i + 1]++;
canMoveNext = ar[i + 1] != 1;
}
sum += ar[i];
}
}
return iter;
}
I've got an idea. I'm not sure it produces the optimal result, but it feels like it can.
Suppose the initial vector is the N-sized vector V. You need two additional N-sized vectors:
In the L vector, you sum elements starting from the left: L[n] = sum(i=0;i<=n) V[i]
In the R vector, you sum elements starting from the right: R[n] = sum(i=n;i<N) V[i]
You finally need one last specific value : the sum of all the elements of V is supposed to be equal to k*N with k an integer. And you have L[N-1] == R[0] == k*N
Let's take the L vector. The idea is that for any n, consider the V vector divided in two parts, one from 0 to n, and the other contains the rest. If L[n]<n*k, then you've got to "fill" the first part with values from the second part. And vice versa if L[n]>n*k. If L[i]==i*k, then congratulations, the problem can be subdivided in two subproblems! There is no reason for any value from the second vector to be transferred to the first vector, and vice-versa.
Then, the algorithm is simple: for every value of n, check the value of L[n]-n*k and R[n]-(N-n)*k and act accordingly. There is just one special case: if L[n]-n*k>0 and R[n]-(N-n)*k>0 (there is a high value at V[n]), you must empty it in both directions. Just choose at random a direction to transfer.
Of course, don't forget to update L and R accordingly.
Edit : In fact, it seems that you only need the L vector. Here is a simplified algorithm.
If L[n]==n*k, don't do anything
If L[n]<n*k, then transfer one value from V[n+1] to V[n] (if V[n+1]>0 of course)
If L[n]>n*k, then transfer one value from V[n] to V[n+1] (if V[n]>0 of course)
And (the special case) if you're asked to transfer from V[n] to both V[n-1] and V[n+1], just transfer in one randomly chosen direction; it won't change the final result.
Thanks to Sam Hocevar for the following alternative implementation to fiver's:
public static int F(int[] ar)
{
int total = ar.Sum();
if (ar.Length == 0 || total % ar.Length != 0) return 0; //can't do it
int target = total / ar.Length;
int[] left = new int[ar.Length];
int[] right = new int[ar.Length];
int maxshifts = 0;
int delta = 0;
for (int i = 0; i < ar.Length; i++)
{
left[i] = delta < 0 ? -delta : 0;
delta += ar[i] - target;
right[i] = delta > 0 ? delta : 0;
if (left[i] + right[i] > maxshifts) {
maxshifts = left[i] + right[i];
}
}
for (int iter = 0; iter < maxshifts; iter++)
{
int lastleftadd = -1;
for (int i = 0; i < ar.Length; i++)
{
if (left[i] != 0 && ar[i] != 0)
{
ar[i]--;
ar[i - 1]++;
left[i]--;
}
else if (right[i] != 0 && ar[i] != 0
&& (ar[i] != 1 || lastleftadd != i))
{
ar[i]--;
ar[i + 1]++;
lastleftadd = i + 1;
right[i]--;
}
}
}
return maxshifts;
}

Find largest rectangle containing only zeros in an N×N binary matrix

Given an NxN binary matrix (containing only 0's or 1's), how can we go about finding largest rectangle containing all 0's?
Example:
I
0 0 0 0 1 0
0 0 1 0 0 1
II->0 0 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 1 <--IV
0 0 1 0 0 0
IV
The above example is a 6×6 binary matrix. The return value in this case will be Cell 1:(2, 1) and Cell 2:(4, 4). The resulting sub-matrix can be square or rectangular. The return value can also be the size of the largest sub-matrix of all 0's, in this example 3 × 4.
Here's a solution based on the "Largest Rectangle in a Histogram" problem suggested by @j_random_hacker in the comments:
"[Algorithm] works by iterating through rows from top to bottom, for each row solving this problem, where the "bars" in the "histogram" consist of all unbroken upward trails of zeros that start at the current row (a column has height 0 if it has a 1 in the current row)."
The input matrix mat may be an arbitrary iterable e.g., a file or a network stream. Only one row is required to be available at a time.
#!/usr/bin/env python
from collections import namedtuple
from functools import reduce  # builtin in Python 2, must be imported in Python 3
from operator import mul

Info = namedtuple('Info', 'start height')

def max_size(mat, value=0):
    """Find height, width of the largest rectangle containing all `value`'s."""
    it = iter(mat)
    hist = [(el==value) for el in next(it, [])]
    max_size = max_rectangle_size(hist)
    for row in it:
        # extend each column's run of `value`, or reset it to 0
        hist = [(1+h) if el == value else 0 for h, el in zip(hist, row)]
        max_size = max(max_size, max_rectangle_size(hist), key=area)
    return max_size

def max_rectangle_size(histogram):
    """Find height, width of the largest rectangle that fits entirely under
    the histogram.
    """
    stack = []
    top = lambda: stack[-1]
    max_size = (0, 0) # height, width of the largest rectangle
    pos = 0 # current position in the histogram
    for pos, height in enumerate(histogram):
        start = pos # position where rectangle starts
        while True:
            if not stack or height > top().height:
                stack.append(Info(start, height)) # push
            elif stack and height < top().height:
                max_size = max(max_size, (top().height, (pos - top().start)),
                               key=area)
                start, _ = stack.pop()
                continue
            break # height == top().height goes here
    pos += 1
    for start, height in stack:
        max_size = max(max_size, (height, (pos - start)), key=area)
    return max_size

def area(size):
    return reduce(mul, size)
The solution is O(N), where N is the number of elements in a matrix. It requires O(ncols) additional memory, where ncols is the number of columns in a matrix.
Latest version with tests is at https://gist.github.com/776423
Please take a look at Maximize the rectangular area under Histogram and then continue reading the solution below.
Traverse the matrix once and store the following:
For x=1 to N and y=1 to N
F[x][y] = 1 + F[x-1][y] if A[x][y] is 0, else 0 (taking F[0][y] = 0)
Then, for each row x = N to 1:
F[x] is the array of histogram heights with its base at row x.
Use the O(N) algorithm to find the largest rectangle area in this histogram, H[x].
From all areas computed, report the largest.
Time complexity is O(N*N) = O(N²) (for an NxN binary matrix)
Example:
Initial array    F[x][y] array
0 0 0 0 1 0      1 1 1 1 0 1
0 0 1 0 0 1      2 2 0 2 1 0
0 0 0 0 0 0      3 3 1 3 2 1
1 0 0 0 0 0      0 4 2 4 3 2
0 0 0 0 0 1      1 5 3 5 4 0
0 0 1 0 0 0      2 6 0 6 5 1
For x = N to 1
H[6] = 2 6 0 6 5 1 -> 10 (5*2)
H[5] = 1 5 3 5 4 0 -> 12 (3*4)
H[4] = 0 4 2 4 3 2 -> 10 (2*5)
H[3] = 3 3 1 3 2 1 -> 6 (3*2)
H[2] = 2 2 0 2 1 0 -> 4 (2*2)
H[1] = 1 1 1 1 0 1 -> 4 (1*4)
The largest area is thus H[5] = 12
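The steps above can be sketched in Python as follows. This is an illustrative sketch rather than the answerer's own code: the names `largest_rectangle_in_histogram` and `row_areas` are made up here, rows are 0-indexed, and only one row of F is kept at a time instead of the full F array.

```python
def largest_rectangle_in_histogram(heights):
    """Largest rectangle area under a histogram, in O(len(heights))."""
    stack = []  # (start position, height), heights increasing
    best = 0
    # The trailing 0 is a sentinel that flushes every bar still on the stack.
    for pos, h in enumerate(list(heights) + [0]):
        start = pos
        while stack and stack[-1][1] >= h:
            start, bar = stack.pop()
            best = max(best, bar * (pos - start))
        stack.append((start, h))
    return best


def row_areas(matrix):
    """For each row x, return H[x]: the best area with its base on that row."""
    heights = [0] * len(matrix[0])
    areas = []
    for row in matrix:
        # F[x][y] = 1 + F[x-1][y] if the cell is 0, else 0
        heights = [h + 1 if cell == 0 else 0 for h, cell in zip(heights, row)]
        areas.append(largest_rectangle_in_histogram(heights))
    return areas


A = [[0, 0, 0, 0, 1, 0],
     [0, 0, 1, 0, 0, 1],
     [0, 0, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 1],
     [0, 0, 1, 0, 0, 0]]
print(row_areas(A))       # H[1] .. H[6]
print(max(row_areas(A)))  # largest area overall
```

Each bar is pushed and popped at most once per row, which is where the O(N) per-row bound comes from.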
Here is a Python3 solution, which returns the position in addition to the area of the largest rectangle:
#!/usr/bin/env python3
import numpy

s = '''0 0 0 0 1 0
0 0 1 0 0 1
0 0 0 0 0 0
1 0 0 0 0 0
0 0 0 0 0 1
0 0 1 0 0 0'''

nrows = 6
ncols = 6
skip = 1
area_max = (0, [])

a = numpy.fromstring(s, dtype=int, sep=' ').reshape(nrows, ncols)
w = numpy.zeros(dtype=int, shape=a.shape)
h = numpy.zeros(dtype=int, shape=a.shape)
for r in range(nrows):
    for c in range(ncols):
        if a[r][c] == skip:
            continue
        # h: run of free cells ending here going up; w: same going left
        if r == 0:
            h[r][c] = 1
        else:
            h[r][c] = h[r-1][c]+1
        if c == 0:
            w[r][c] = 1
        else:
            w[r][c] = w[r][c-1]+1
        minw = w[r][c]
        for dh in range(h[r][c]):
            minw = min(minw, w[r-dh][c])
            area = (dh+1)*minw
            if area > area_max[0]:
                area_max = (area, [(r-dh, c-minw+1, r, c)])

print('area', area_max[0])
for t in area_max[1]:
    print('Cell 1:({}, {}) and Cell 2:({}, {})'.format(*t))
Output:
area 12
Cell 1:(2, 1) and Cell 2:(4, 4)
Here is J.F. Sebastian's method translated into C#:
private Vector2 MaxRectSize(int[] histogram) {
Vector2 maxSize = Vector2.zero;
int maxArea = 0;
Stack<Vector2> stack = new Stack<Vector2>();
int x = 0;
for (x = 0; x < histogram.Length; x++) {
int start = x;
int height = histogram[x];
while (true) {
if (stack.Count == 0 || height > stack.Peek().y) {
stack.Push(new Vector2(start, height));
} else if(height < stack.Peek().y) {
int tempArea = (int)(stack.Peek().y * (x - stack.Peek().x));
if(tempArea > maxArea) {
maxSize = new Vector2(stack.Peek().y, (x - stack.Peek().x));
maxArea = tempArea;
}
Vector2 popped = stack.Pop();
start = (int)popped.x;
continue;
}
break;
}
}
foreach (Vector2 data in stack) {
int tempArea = (int)(data.y * (x - data.x));
if(tempArea > maxArea) {
maxSize = new Vector2(data.y, (x - data.x));
maxArea = tempArea;
}
}
return maxSize;
}
public Vector2 GetMaximumFreeSpace() {
// STEP 1:
// build a seed histogram using the first row of grid points
// example: [true, true, false, true] = [1,1,0,1]
int[] hist = new int[gridSizeY];
for (int y = 0; y < gridSizeY; y++) {
if(!invalidPoints[0, y]) {
hist[y] = 1;
}
}
// STEP 2:
// get a starting max area from the seed histogram we created above.
// using the example from above, this value would be [1, 1], as the only valid area is a single point.
// another example for [0,0,0,1,0,0] would be [1, 3], because the largest area of contiguous free space is 3.
// Note that at this step, the height of the found rectangle will always be 1 because we are operating on
// a single row of data.
Vector2 maxSize = MaxRectSize(hist);
int maxArea = (int)(maxSize.x * maxSize.y);
// STEP 3:
// build histograms for each additional row, re-testing for new possible max rectangular areas
for (int x = 1; x < gridSizeX; x++) {
// build a new histogram for this row. the values of this row are
// 0 if the current grid point is occupied; otherwise, it is 1 + the value
// of the previously found histogram value for the previous position.
// What this does is effectively keep track of the height of continuous available spaces.
// EXAMPLE:
// Given the following grid data (where 1 means occupied, and 0 means free; for clarity):
// INPUT: OUTPUT:
// 1.) [0,0,1,0] = [1,1,0,1]
// 2.) [0,0,1,0] = [2,2,0,2]
// 3.) [1,1,0,1] = [0,0,1,0]
//
// As such, you'll notice position 1,0 (row 1, column 0) is 2, because this is the height of contiguous
// free space.
for (int y = 0; y < gridSizeY; y++) {
if(!invalidPoints[x, y]) {
hist[y] = 1 + hist[y];
} else {
hist[y] = 0;
}
}
// find the maximum size of the current histogram. If it happens to be larger
// than the currently recorded max size, then it is the new max size.
Vector2 maxSizeTemp = MaxRectSize(hist);
int tempArea = (int)(maxSizeTemp.x * maxSizeTemp.y);
if (tempArea > maxArea) {
maxSize = maxSizeTemp;
maxArea = tempArea;
}
}
// at this point, we know the max size
return maxSize;
}
A few things to note about this:
This version is meant for use with the Unity API. You can easily make this more generic by replacing instances of Vector2 with KeyValuePair. Vector2 is only used for a convenient way to store two values.
invalidPoints[] is an array of bool, where true means the grid point is "in use", and false means it is not.
Solution with space complexity O(columns) [Can be modified to O(rows) also] and time complexity O(rows*columns)
public int maximalRectangle(char[][] matrix) {
int m = matrix.length;
if (m == 0)
return 0;
int n = matrix[0].length;
int maxArea = 0;
int[] aux = new int[n];
for (int i = 0; i < n; i++) {
aux[i] = 0;
}
for (int i = 0; i < m; i++) {
for (int j = 0; j < n; j++) {
aux[j] = matrix[i][j] - '0' + aux[j];
maxArea = Math.max(maxArea, maxAreaHist(aux));
}
}
return maxArea;
}
public int maxAreaHist(int[] heights) {
int n = heights.length;
Stack<Integer> stack = new Stack<Integer>();
stack.push(0);
int maxRect = heights[0];
int top = 0;
int leftSideArea = 0;
int rightSideArea = heights[0];
for (int i = 1; i < n; i++) {
if (stack.isEmpty() || heights[i] >= heights[stack.peek()]) {
stack.push(i);
} else {
while (!stack.isEmpty() && heights[stack.peek()] > heights[i]) {
top = stack.pop();
rightSideArea = heights[top] * (i - top);
leftSideArea = 0;
if (!stack.isEmpty()) {
leftSideArea = heights[top] * (top - stack.peek() - 1);
} else {
leftSideArea = heights[top] * top;
}
maxRect = Math.max(maxRect, leftSideArea + rightSideArea);
}
stack.push(i);
}
}
while (!stack.isEmpty()) {
top = stack.pop();
rightSideArea = heights[top] * (n - top);
leftSideArea = 0;
if (!stack.isEmpty()) {
leftSideArea = heights[top] * (top - stack.peek() - 1);
} else {
leftSideArea = heights[top] * top;
}
maxRect = Math.max(maxRect, leftSideArea + rightSideArea);
}
return maxRect;
}
But I get a Time Limit Exceeded error when I try this on LeetCode. Is there a less complex solution?
I propose an O(n×n) method.
First, you can list all the maximum empty rectangles. Empty means that it covers only 0s. A maximum empty rectangle is one that cannot be extended in any direction without covering (at least) one 1.
A paper presenting an O(n×n) algorithm to create such a list can be found at www.ulg.ac.be/telecom/rectangles, along with source code (not optimized). There is no need to store the list: it is sufficient to call a callback function each time the algorithm finds a rectangle, and to store only the largest one (or choose another criterion if you want).
Note that a proof exists (see the paper) that the number of maximum empty rectangles is bounded by the number of pixels of the image (n×n in this case).
Therefore, selecting the optimal rectangle can be done in O(n×n), and the overall method is also O(n×n).
In practice, this method is very fast, and is used for realtime video stream analysis.
Here is a version of jfs' solution, which also delivers the position of the largest rectangle:
from collections import namedtuple
from operator import mul

Info = namedtuple('Info', 'start height')

def max_rect(mat, value=0):
    """returns (height, width, left_column, bottom_row) of the largest rectangle
    containing all `value`'s.

    Example:
    [[0, 0, 0, 0, 0, 0, 0, 0, 3, 2],
     [0, 4, 0, 2, 4, 0, 0, 1, 0, 0],
     [1, 0, 1, 0, 0, 0, 3, 0, 0, 4],
     [0, 0, 0, 0, 4, 2, 0, 0, 0, 0],
     [0, 0, 0, 2, 0, 0, 0, 0, 0, 0],
     [4, 3, 0, 0, 1, 2, 0, 0, 0, 0],
     [3, 0, 0, 0, 2, 0, 0, 0, 0, 4],
     [0, 0, 0, 1, 0, 3, 2, 4, 3, 2],
     [0, 3, 0, 0, 0, 2, 0, 1, 0, 0]]
    gives: (3, 4, 6, 5)
    """
    it = iter(mat)
    hist = [(el==value) for el in next(it, [])]
    max_rect = max_rectangle_size(hist) + (0,)
    for irow,row in enumerate(it):
        hist = [(1+h) if el == value else 0 for h, el in zip(hist, row)]
        max_rect = max(max_rect, max_rectangle_size(hist) + (irow+1,), key=area)
        # irow+1, because we already used one row for initializing max_rect
    return max_rect

def max_rectangle_size(histogram):
    stack = []
    top = lambda: stack[-1]
    max_size = (0, 0, 0) # height, width and start position of the largest rectangle
    pos = 0 # current position in the histogram
    for pos, height in enumerate(histogram):
        start = pos # position where rectangle starts
        while True:
            if not stack or height > top().height:
                stack.append(Info(start, height)) # push
            elif stack and height < top().height:
                max_size = max(max_size, (top().height, (pos - top().start), top().start), key=area)
                start, _ = stack.pop()
                continue
            break # height == top().height goes here
    pos += 1
    for start, height in stack:
        max_size = max(max_size, (height, (pos - start), start), key=area)
    return max_size

def area(size):
    return size[0] * size[1]
To be complete, here's the C# version which outputs the rectangle coordinates.
It's based on dmarra's answer but without any other dependencies.
There's only the function bool GetPixel(int x, int y), which returns true when a pixel is set at the coordinates x,y.
public struct INTRECT
{
public int Left, Right, Top, Bottom;
public INTRECT(int aLeft, int aTop, int aRight, int aBottom)
{
Left = aLeft;
Top = aTop;
Right = aRight;
Bottom = aBottom;
}
public int Width { get { return (Right - Left + 1); } }
public int Height { get { return (Bottom - Top + 1); } }
public bool IsEmpty { get { return Left == 0 && Right == 0 && Top == 0 && Bottom == 0; } }
public static bool operator ==(INTRECT lhs, INTRECT rhs)
{
return lhs.Left == rhs.Left && lhs.Top == rhs.Top && lhs.Right == rhs.Right && lhs.Bottom == rhs.Bottom;
}
public static bool operator !=(INTRECT lhs, INTRECT rhs)
{
return !(lhs == rhs);
}
public override bool Equals(Object obj)
{
return obj is INTRECT && this == (INTRECT)obj;
}
public bool Equals(INTRECT obj)
{
return this == obj;
}
public override int GetHashCode()
{
return Left.GetHashCode() ^ Right.GetHashCode() ^ Top.GetHashCode() ^ Bottom.GetHashCode();
}
}
public INTRECT GetMaximumFreeRectangle()
{
int XEnd = 0;
int YStart = 0;
int MaxRectTop = 0;
INTRECT MaxRect = new INTRECT();
// STEP 1:
// build a seed histogram using the first row of grid points
// example: [true, true, false, true] = [1,1,0,1]
int[] hist = new int[Height];
for (int y = 0; y < Height; y++)
{
if (!GetPixel(0, y))
{
hist[y] = 1;
}
}
// STEP 2:
// get a starting max area from the seed histogram we created above.
// using the example from above, this value would be [1, 1], as the only valid area is a single point.
// another example for [0,0,0,1,0,0] would be [1, 3], because the largest area of contiguous free space is 3.
// Note that at this step, the height of the found rectangle will always be 1 because we are operating on
// a single row of data.
Tuple<int, int> maxSize = MaxRectSize(hist, out YStart);
int maxArea = (int)(maxSize.Item1 * maxSize.Item2);
MaxRectTop = YStart;
// STEP 3:
// build histograms for each additional row, re-testing for new possible max rectangular areas
for (int x = 1; x < Width; x++)
{
// build a new histogram for this row. the values of this row are
// 0 if the current grid point is occupied; otherwise, it is 1 + the value
// of the previously found histogram value for the previous position.
// What this does is effectively keep track of the height of continuous available spaces.
// EXAMPLE:
// Given the following grid data (where 1 means occupied, and 0 means free; for clarity):
// INPUT: OUTPUT:
// 1.) [0,0,1,0] = [1,1,0,1]
// 2.) [0,0,1,0] = [2,2,0,2]
// 3.) [1,1,0,1] = [0,0,1,0]
//
// As such, you'll notice position 1,0 (row 1, column 0) is 2, because this is the height of contiguous
// free space.
for (int y = 0; y < Height; y++)
{
if (!GetPixel(x, y))
{
hist[y]++;
}
else
{
hist[y] = 0;
}
}
// find the maximum size of the current histogram. If it happens to be larger
// than the currently recorded max size, then it is the new max size.
Tuple<int, int> maxSizeTemp = MaxRectSize(hist, out YStart);
int tempArea = (int)(maxSizeTemp.Item1 * maxSizeTemp.Item2);
if (tempArea > maxArea)
{
maxSize = maxSizeTemp;
maxArea = tempArea;
MaxRectTop = YStart;
XEnd = x;
}
}
MaxRect.Left = XEnd - maxSize.Item1 + 1;
MaxRect.Top = MaxRectTop;
MaxRect.Right = XEnd;
MaxRect.Bottom = MaxRectTop + maxSize.Item2 - 1;
// at this point, we know the max size
return MaxRect;
}
private Tuple<int, int> MaxRectSize(int[] histogram, out int YStart)
{
Tuple<int, int> maxSize = new Tuple<int, int>(0, 0);
int maxArea = 0;
Stack<Tuple<int, int>> stack = new Stack<Tuple<int, int>>();
int x = 0;
YStart = 0;
for (x = 0; x < histogram.Length; x++)
{
int start = x;
int height = histogram[x];
while (true)
{
if (stack.Count == 0 || height > stack.Peek().Item2)
{
stack.Push(new Tuple<int, int>(start, height));
}
else if (height < stack.Peek().Item2)
{
int tempArea = (int)(stack.Peek().Item2 * (x - stack.Peek().Item1));
if (tempArea > maxArea)
{
YStart = stack.Peek().Item1;
maxSize = new Tuple<int, int>(stack.Peek().Item2, (x - stack.Peek().Item1));
maxArea = tempArea;
}
Tuple<int, int> popped = stack.Pop();
start = (int)popped.Item1;
continue;
}
break;
}
}
foreach (Tuple<int, int> data in stack)
{
int tempArea = (int)(data.Item2 * (x - data.Item1));
if (tempArea > maxArea)
{
YStart = data.Item1;
maxSize = new Tuple<int, int>(data.Item2, (x - data.Item1));
maxArea = tempArea;
}
}
return maxSize;
}
An appropriate algorithm can be found within Algorithm for finding the largest inscribed rectangle in polygon (2019).
I implemented it in Python:
import largestinteriorrectangle as lir
import numpy as np
grid = np.array([[0, 0, 0, 0, 1, 0],
[0, 0, 1, 0, 0, 1],
[0, 0, 0, 0, 0, 0],
[1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1],
[0, 0, 1, 0, 0, 0]],
"bool")
grid = ~grid
lir.lir(grid) # [1, 2, 4, 3]
The result comes as x, y, width, height.
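To relate this to the "Cell 1 / Cell 2" form used in the question, the x, y, width, height result can be converted to (row, column) corner cells. The helper below is a hypothetical illustration in plain Python and does not call the library; it only assumes the result layout described above.

```python
def to_corner_cells(rect):
    """Convert [x, y, width, height] to ((top_row, left_col), (bottom_row, right_col))."""
    x, y, width, height = rect
    top_left = (y, x)
    bottom_right = (y + height - 1, x + width - 1)
    return top_left, bottom_right


cell1, cell2 = to_corner_cells([1, 2, 4, 3])
print('Cell 1:{} and Cell 2:{}'.format(cell1, cell2))
```

For the result shown above this gives Cell 1:(2, 1) and Cell 2:(4, 4), matching the answer expected in the question.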

find largest submatrix algorithm

I have an N*N matrix (N=2 to 10000) of numbers that may range from 0 to 1000.
How can I find the largest (rectangular) submatrix that consists of the same number?
Example:
      1   2   3   4   5
     --  --  --  --  --
1 |  10   9   9   9  80
2 |   5   9   9   9  10
3 |  85  86  54  45  45
4 |  15  21   5   1   0
5 |   5   6  88  11  10
The output should be the area of the submatrix, followed by 1-based coordinates of its top left element. For the example, it would be (6, 2, 1) because there are six 9s situated at column 2, row 1.
This is a work in progress.
I thought about this problem and I think I may have an O(w*h) algorithm.
The idea goes like this:
for any (i,j), compute the largest number of cells with the same value in column j starting from (i,j). Store these values as heights[i][j].
create an empty vector of submatrices (a LIFO)
for every row i
for every column j
pop all submatrices whose height > heights[i][j], because a submatrix with height > heights[i][j] cannot continue on this cell
push a submatrix defined by (i,j,heights[i][j]), where j is the farthest coordinate at which a submatrix of height heights[i][j] still fits
update the current max submatrix
The tricky part is in the inner loop. I use something similar to the max subwindow algorithm to ensure it is O(1) on average for each cell.
I will try to formulate a proof, but in the meantime here is the code.
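For readers who prefer a compact reference, the same idea can be sketched in Python. This is an independent illustration, not a translation of the C++ code in this answer: it walks the rows top to bottom, keeps only one row of heights, and treats every change of value along a row as a boundary that closes all open rectangles. The name `largest_uniform_submatrix` is made up for this sketch.

```python
def largest_uniform_submatrix(m):
    """Return (area, column, row) of the largest same-valued rectangle,
    with 1-based coordinates of its top-left cell."""
    if not m or not m[0]:
        return (0, 0, 0)
    width = len(m[0])
    heights = [0] * width
    best = (0, 0, 0)
    for i, row in enumerate(m):
        # heights[j]: number of equal values ending at (i, j), counted upwards
        for j in range(width):
            same_above = i > 0 and m[i - 1][j] == row[j]
            heights[j] = heights[j] + 1 if same_above else 1
        stack = []  # (start column, height) of the rectangles still open
        for j in range(width + 1):
            # A change of value, or the end of the row, closes everything.
            boundary = j == width or (j > 0 and row[j] != row[j - 1])
            limit = 0 if boundary else heights[j]
            start = j
            while stack and stack[-1][1] >= limit:
                start, h = stack.pop()
                area = h * (j - start)
                if area > best[0]:
                    best = (area, start + 1, i - h + 2)
            if j < width:
                stack.append((j if boundary else start, heights[j]))
    return best


M1 = [[10, 9, 9, 9, 80],
      [5, 9, 9, 9, 10],
      [85, 86, 54, 45, 45],
      [15, 21, 5, 1, 0],
      [5, 6, 88, 11, 10]]
print(largest_uniform_submatrix(M1))
```

Every cell is pushed and popped at most once per row, so the running time is proportional to the number of cells.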
#include <algorithm>
#include <iterator>
#include <iostream>
#include <ostream>
#include <vector>
typedef std::vector<int> row_t;
typedef std::vector<row_t> matrix_t;
std::size_t height(matrix_t const& M) { return M.size(); }
std::size_t width (matrix_t const& M) { return M.size() ? M[0].size() : 0u; }
std::ostream& operator<<(std::ostream& out, matrix_t const& M) {
for(unsigned i=0; i<height(M); ++i) {
std::copy(begin(M[i]), end(M[i]),
std::ostream_iterator<int>(out, ", "));
out << std::endl;
}
return out;
}
struct sub_matrix_t {
int i, j, h, w;
sub_matrix_t(): i(0),j(0),h(0),w(1) {}
sub_matrix_t(int i_,int j_,int h_,int w_):i(i_),j(j_),h(h_),w(w_) {}
bool operator<(sub_matrix_t const& rhs) const { return (w*h)<(rhs.w*rhs.h); }
};
// Pop all sub-matrices from the vector, keeping only those with a height
// lower than the passed height.
// Compute the max sub-matrix while removing sub-matrices with height >= h
void pop_sub_m(std::vector<sub_matrix_t>& subs,
int i, int j, int h, sub_matrix_t& max_m) {
sub_matrix_t sub_m(i, j, h, 1);
while(subs.size() && subs.back().h >= h) {
sub_m = subs.back();
subs.pop_back();
sub_m.w = j-sub_m.j;
max_m = std::max(max_m, sub_m);
}
// Now sub_m.{i,j} is updated to the farthest coordinates where there is a
// submatrix with heights >= h
// If we don't cut the current height (because we changed value) update
// the current max submatrix
if(h > 0) {
sub_m.h = h;
sub_m.w = j-sub_m.j+1;
max_m = std::max(max_m, sub_m);
subs.push_back(sub_m);
}
}
void push_sub_m(std::vector<sub_matrix_t>& subs,
int i, int j, int h, sub_matrix_t& max_m) {
if(subs.empty() || subs.back().h < h)
subs.emplace_back(i, j, h, 1);
}
void solve(matrix_t const& M, sub_matrix_t& max_m) {
// Initialize answer suitable for an empty matrix
max_m = sub_matrix_t();
if(height(M) == 0 || width(M) == 0) return;
// 1) Compute the heights of columns of the same values
matrix_t heights(height(M), row_t(width(M), 1));
for(unsigned i=height(M)-1; i>0; --i)
for(unsigned j=0; j<width(M); ++j)
if(M[i-1][j]==M[i][j])
heights[i-1][j] = heights[i][j]+1;
// 2) Run through all columns heights to compute local sub matrices
std::vector<sub_matrix_t> subs;
for(int i=height(M)-1; i>=0; --i) {
push_sub_m(subs, i, 0, heights[i][0], max_m);
for(unsigned j=1; j<width(M); ++j) {
bool same_val = (M[i][j]==M[i][j-1]);
int pop_height = (same_val) ? heights[i][j] : 0;
int pop_j = (same_val) ? j : j-1;
pop_sub_m (subs, i, pop_j, pop_height, max_m);
push_sub_m(subs, i, j, heights[i][j], max_m);
}
pop_sub_m(subs, i, width(M)-1, 0, max_m);
}
}
matrix_t M1{
{10, 9, 9, 9, 80},
{ 5, 9, 9, 9, 10},
{85, 86, 54, 45, 45},
{15, 21, 5, 1, 0},
{ 5, 6, 88, 11, 10},
};
matrix_t M2{
{10, 19, 9, 29, 80},
{ 5, 9, 9, 9, 10},
{ 9, 9, 54, 45, 45},
{ 9, 9, 5, 1, 0},
{ 5, 6, 88, 11, 10},
};
int main() {
sub_matrix_t answer;
std::cout << M1 << std::endl;
solve(M1, answer);
std::cout << '(' << (answer.w*answer.h)
<< ',' << (answer.j+1) << ',' << (answer.i+1) << ')'
<< std::endl;
answer = sub_matrix_t();
std::cout << M2 << std::endl;
solve(M2, answer);
std::cout << '(' << (answer.w*answer.h)
<< ',' << (answer.j+1) << ',' << (answer.i+1) << ')'
<< std::endl;
}
This is an order Rows*Columns Solution
It works by
starting at the bottom of the array, and determining how many items below each number match it in a column. This is done in O(MN) time (very trivially)
Then it goes top to bottom & left to right and sees if any given number matches the number to the left. If so, it keeps track of how the heights relate to each other to track the possible rectangle shapes
Here is a working python implementation. Apologies since I'm not sure how to get the syntax highlighting working
# this program finds the largest area in an array where all the elements have the same value
# It solves in O(rows * columns) time using O(rows*columns) space using dynamic programming
def max_area_subarray(array):
rows = len(array)
if (rows == 0):
return [[]]
columns = len(array[0])
# initialize a blank new array
# this will hold max elements of the same value in a column
new_array = []
for i in range(0,rows-1):
new_array.append([0] * columns)
# start with the bottom row, these all of 1 element of the same type
# below them, including themselves
new_array.append([1] * columns)
# go from the second to bottom row up, finding how many contiguous
# elements of the same type there are
for i in range(rows-2,-1,-1):
for j in range(columns-1,-1,-1):
if ( array[i][j] == array[i+1][j]):
new_array[i][j] = new_array[i+1][j]+1
else:
new_array[i][j] = 1
# go left to right and match up the max areas
max_area = 0
top = 0
bottom = 0
left = 0
right = 0
for i in range(0,rows):
running_height =[[0,0,0]]
for j in range(0,columns):
matched = False
if (j > 0): # if this isn't the leftmost column
if (array[i][j] == array[i][j-1]):
# this matches the array to the left
# keep track of if this is a longer column, shorter column, or same as
# the one on the left
matched = True
while( new_array[i][j] < running_height[-1][0]):
# this is less than the one on the left, pop that running
# height from the list, and add its columns to the smaller
# running height below it
if (running_height[-1][1] > max_area):
max_area = running_height[-1][1]
top = i
right = j-1
bottom = i + running_height[-1][0]-1
left = j - running_height[-1][2]
previous_column = running_height.pop()
num_columns = previous_column[2]
if (len(running_height) > 0):
running_height[-1][1] += running_height[-1][0] * (num_columns)
running_height[-1][2] += num_columns
else:
# for instance, if we have heights 2,2,1
# this will trigger on the 1 after we pop the 2 out, and save the current
# height of 1, the running area of 3, and running columns of 3
running_height.append([new_array[i][j],new_array[i][j]*(num_columns),num_columns])
if (new_array[i][j] > running_height[-1][0]):
# longer then the one on the left
# append this height and area
running_height.append([new_array[i][j],new_array[i][j],1])
elif (new_array[i][j] == running_height[-1][0]):
# same as the one on the left, add this area to the one on the left
running_height[-1][1] += new_array[i][j]
running_height[-1][2] += 1
if (matched == False or j == columns -1):
while(running_height):
# unwind the maximums & see if this is the new max area
if (running_height[-1][1] > max_area):
max_area = running_height[-1][1]
top = i
right = j
bottom = i + running_height[-1][0]-1
left = j - running_height[-1][2]+1
# this wasn't a match, so move everything one bay to the left
if (matched== False):
right = right-1
left = left-1
previous_column = running_height.pop()
num_columns = previous_column[2]
if (len(running_height) > 0):
running_height[-1][1] += running_height[-1][0] * num_columns
running_height[-1][2] += num_columns
if (matched == False):
# this is either the left column, or we don't match to the column to the left, so reset
running_height = [[new_array[i][j],new_array[i][j],1]]
if (running_height[-1][1] > max_area):
max_area = running_height[-1][1]
top = i
right = j
bottom = i + running_height[-1][0]-1
left = j - running_height[-1][2]+1
max_array = []
for i in range(top,bottom+1):
max_array.append(array[i][left:right+1])
return max_array
numbers = [[6,4,1,9],[5,2,2,7],[2,2,2,1],[2,3,1,5]]
for row in numbers:
print row
print
print
max_array = max_area_subarray(numbers)
max_area = len(max_array) * len(max_array[0])
print 'max area is ',max_area
print
for row in max_array:
print row

Resources