Rotating a two dimensional array by 90 degrees - algorithm

I am studying this piece of code on rotating an NxN matrix; I have traced the program countless times, and I sort of understand how the actual rotation happens. It basically rotates the corners first and the elements after the corners in a clockwise direction. I just do not understand a couple of lines, and the code is still not "driven home" in my brain, so to speak. Please help. I am rotating it 90 degrees, given a 4x4 matrix as my tracing example.
[1][2][3][4]
[5][6][7][8]
[9][0][1][2]
[3][4][5][6]
becomes
[3][9][5][1]
[4][0][6][2]
[5][1][7][3]
[6][2][8][4]
public static void rotate(int[][] matrix, int n){
for(int layer=0; layer < n/2; ++layer) {
int first=layer; //It moves from the outside in.
int last=n-1-layer; //<--This I do not understand
for(int i=first; i<last;++i){
int offset=i-first; //<--A bit confusing for me
//save the top left of the matrix
int top = matrix[first][i];
//shift left to top;
matrix[first][i]=matrix[last-offset][first];
/*I understand that it needs
last-offset so that it will go up the column in the matrix,
and first signifies it's in the first column*/
//shift bottom to left
matrix[last-offset][first]=matrix[last][last-offset];
/*I understand that it needs
last-offset so that the number decreases and it may go up the column (first
last-offset) and left (latter). */
//shift right to bottom
matrix[last][last-offset]=matrix[i][last];
/*I understand that it i so that in the next iteration, it moves down
the column*/
//rightmost top corner
matrix[i][last]=top;
}
}
}

It's easier to understand an algorithm like this if you draw a diagram, so I made a quick pic in Paint to demonstrate for a 5x5 matrix :D
The outer for(int layer=0; layer < n/2; ++layer) loop iterates over the layers from outside to inside. The outer layer (layer 0) is depicted by coloured elements. Each layer is effectively a square of elements requiring rotation. For n = 5, layer will take on values from 0 to 1 as there are 2 layers since we can ignore the centre element/layer which is unaffected by rotation. first and last refer to the first and last rows/columns of elements for a layer; e.g. layer 0 has elements from Row/Column first = 0 to last = 4 and layer 1 from Row/Column 1 to 3.
Then for each layer/square, the inner for(int i=first; i<last;++i) loop rotates it by rotating 4 elements in each iteration. Offset represents how far along the sides of the square we are. For our 5x5 below, we first rotate the red elements (offset = 0), then yellow (offset = 1), then green and blue. Arrows 1-5 demonstrate the 4-element rotation for the red elements, and 6+ for the rest which are performed in the same fashion. Note how the 4-element rotation is essentially a 5-assignment circular swap with the first assignment temporarily putting aside an element. The //save the top left of the matrix comment for this assignment is misleading since matrix[first][i] isn't necessarily the top left of the matrix or even the layer for that matter. Also, note that the row/column indexes of elements being rotated are sometimes proportional to offset and sometimes proportional to its inverse, last - offset.
We move along the sides of the outer layer (delineated by first=0 and last=4) in this manner, then move onto the inner layer (first = 1 and last = 3) and do the same thing there. Eventually, we hit the centre and we're done.
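As a cross-check on the walkthrough above, here is a minimal Python sketch of the same layer-by-layer rotation. The function name `rotate_layers` is made up for illustration; the index arithmetic mirrors the Java code in the question.

```python
def rotate_layers(matrix):
    """Rotate a square matrix 90 degrees clockwise, in place, layer by layer."""
    n = len(matrix)
    for layer in range(n // 2):
        first, last = layer, n - 1 - layer   # bounds of the current square
        for i in range(first, last):
            offset = i - first               # how far along the side we are
            top = matrix[first][i]           # put the top element aside
            matrix[first][i] = matrix[last - offset][first]          # left -> top
            matrix[last - offset][first] = matrix[last][last - offset]  # bottom -> left
            matrix[last][last - offset] = matrix[i][last]            # right -> bottom
            matrix[i][last] = top                                    # top -> right
    return matrix
```

Calling it on the 4x4 example from the question reproduces the rotated matrix shown there.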

This triggers a WTF. The easiest way to rotate a matrix in place is by
first transposing the matrix (swap M[i,j] with M[j,i])
then swapping M[i,j] with M[i, nColumns - 1 - j] (0-based indices), i.e. reversing the order of the columns
When matrices are column-major, the second operation is swapping columns, and hence has good data locality properties. If the matrix is row major, then first permute rows, and then transpose.
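A short Python sketch of the transpose-then-mirror idea, assuming a square list of lists and 0-based indices (the name `rotate_transpose` is hypothetical):

```python
def rotate_transpose(matrix):
    """Rotate a square matrix 90 degrees clockwise, in place."""
    n = len(matrix)
    # Step 1: transpose, swapping M[i][j] with M[j][i] above the diagonal.
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j], matrix[j][i] = matrix[j][i], matrix[i][j]
    # Step 2: swap M[i][j] with M[i][n - 1 - j], i.e. reverse every row.
    for row in matrix:
        row.reverse()
    return matrix
```

Doing the two steps in the opposite order (mirror the columns, then transpose) rotates counter-clockwise instead.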

Here is a recursive way of solving this:
// rotating a square 2-D array (n x n) by 90 degrees, in place
public void rotateArray(int[][] inputArray) {
System.out.println("Input Array: ");
print2D(inputArray);
rotateArray(inputArray, 0, 0, inputArray.length - 1,
inputArray[0].length - 1);
System.out.println("\n\nOutput Array: ");
print2D(inputArray);
}
public void rotateArray(int[][] inputArray, int currentRow,
int currentColumn, int lastRow, int lastColumn) {
// condition to come out of recursion.
// if all rows are covered or all columns are covered (all layers
// covered)
if (currentRow >= lastRow || currentColumn >= lastColumn)
return;
// rotating the corner elements first
int top = inputArray[currentRow][currentColumn];
inputArray[currentRow][currentColumn] = inputArray[lastRow][currentColumn];
inputArray[lastRow][currentColumn] = inputArray[lastRow][lastColumn];
inputArray[lastRow][lastColumn] = inputArray[currentRow][lastColumn];
inputArray[currentRow][lastColumn] = top;
// clockwise rotation of remaining elements in the current layer
for (int i = currentColumn + 1; i < lastColumn; i++) {
// distance from the layer's own corner; using i directly breaks inner layers
int offset = i - currentColumn;
int temp = inputArray[currentRow][i];
inputArray[currentRow][i] = inputArray[lastRow - offset][currentColumn];
inputArray[lastRow - offset][currentColumn] = inputArray[lastRow][lastColumn - offset];
inputArray[lastRow][lastColumn - offset] = inputArray[currentRow + offset][lastColumn];
inputArray[currentRow + offset][lastColumn] = temp;
}
// call recursion on remaining layers
rotateArray(inputArray, ++currentRow, ++currentColumn, --lastRow,
--lastColumn);
}

Related

Calculate area given list of directions

Let's say you're given a list of directions:
up, up, right, down, right, down, left, left
If you follow the directions, you will always return to the starting location. Calculate the area of the shape that you just created.
The shape formed by the directions above would look something like:
 ___
|   |___
|_______|
Clearly, from the picture, you can see that the area is 3.
I tried to use a 2d matrix to trace the directions, but unsure how to get the area from that...
For example, in my 2d array:
O O
O O O
O O O
This is probably not a good way of handling this, any ideas?
Since the polygon you create has axis-aligned edges only, you can calculate the total area from vertical slabs.
Let's say we are given a list of vertices V. I assume we have wrapping in this list, so we can query V.next(v) for every vertex v in V. For the last one, the result is the first.
First, try to find the leftmost and rightmost point, and the vertex where the leftmost point is reached (in linear time).
x = 0                      // current x-position
xMin = inf, xMax = -inf    // leftmost and rightmost point
leftVertex = null          // leftmost vertex
foreach v in V
    x = x + (v is left ? -1 : v is right ? 1 : 0)
    xMax = max(x, xMax)
    if x < xMin
        xMin = x
        leftVertex = V.next(v)
Now we create a simple data structure: for every vertical slab we keep a max heap (a sorted list is fine as well, but we only need to repetitively fetch the maximum element in the end).
width = xMax - xMin
heaps = new MaxHeap[width]
We start tracing the shape from vertex leftVertex now (the leftmost vertex we found in the first step). We now choose that this vertex has x/y-position (0, 0), just because it is convenient.
x = 0, y = 0
v = leftVertex
do
    if v is left
        x = x-1             // use left endpoint for index
        heaps[x].Add(y)     // first dec, then store
    if v is right
        heaps[x].Add(y)     // use left endpoint for index
        x = x+1             // first store, then inc
    if v is up
        y = y+1
    if v is down
        y = y-1
    v = V.next(v)
until v = leftVertex
You can build this structure in O(n log n) time, because adding to a heap costs logarithmic time.
Finally, we need to compute the area from the heap. For a well-formed input, we need to get two contiguous y-values from the heap and subtract them.
area = 0
foreach heap in heaps
    while heap not empty
        area += heap.PopMax() - heap.PopMax() // height of one covered piece of this slab
return area
Again, this takes O(n log n) time.
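For readers who want to run the slab idea, here is a compact Python sketch. It is not the Java port mentioned below; it assumes moves are encoded as unit `(dx, dy)` tuples (the constants `UP`, `DOWN`, `LEFT`, `RIGHT` and the name `slab_area` are made up here), and instead of restarting the trace at the leftmost vertex it simply shifts x by the minimum found in the first pass.

```python
import heapq

# Hypothetical encoding: unit steps as (dx, dy), with y pointing up.
UP, DOWN, LEFT, RIGHT = (0, 1), (0, -1), (-1, 0), (1, 0)

def slab_area(moves):
    """Area enclosed by a closed, axis-aligned path of unit moves."""
    # Pass 1: find the horizontal extent of the path.
    x = x_min = x_max = 0
    for dx, _ in moves:
        x += dx
        x_min = min(x_min, x)
        x_max = max(x_max, x)
    # One max-heap per unit-wide slab; heapq is a min-heap, so store -y.
    heaps = [[] for _ in range(x_max - x_min)]
    # Pass 2: file every horizontal unit edge under its left endpoint.
    x, y = -x_min, 0
    for dx, dy in moves:
        if dx == -1:
            x -= 1                       # first decrement, then store
            heapq.heappush(heaps[x], -y)
        elif dx == 1:
            heapq.heappush(heaps[x], -y)  # first store, then increment
            x += 1
        y += dy
    # Pass 3: in each slab, pair up edges from the top down.
    area = 0
    for heap in heaps:
        while heap:
            upper = -heapq.heappop(heap)
            lower = -heapq.heappop(heap)
            area += upper - lower
    return area
```

Like the pseudocode, this relies on well-formed input: every slab must contain an even number of horizontal edges.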
I ported the algorithm to a java implementation (see Ideone). Two sample runs:
public static void main (String[] args) {
// _
// | |_
// |_ _ |
Direction[] input = { Direction.Up, Direction.Up,
Direction.Right, Direction.Down,
Direction.Right, Direction.Down,
Direction.Left, Direction.Left };
System.out.println(computeArea(input));
// _
// |_|_
// |_|
Direction[] input2 = { Direction.Up, Direction.Right,
Direction.Down, Direction.Down,
Direction.Right, Direction.Up,
Direction.Left, Direction.Left };
System.out.println(computeArea(input2));
}
Returns (as expected):
3
2
Assuming some starting point (say, (0,0)) and the y direction is positive upwards:
left adds (-1,0) to the last point.
right adds (+1,0) to the last point.
up adds (0,+1) to the last point.
down adds (0,-1) to the last point.
A sequence of directions would then produce a list of (x,y) vertex co-ordinates from which the area of the resulting (implied closed) polygon can be found from How do I calculate the surface area of a 2d polygon?
EDIT
Here's an implementation and test in Python. The first two functions are from the answer linked above:
def segments(p):
    return zip(p, p[1:] + [p[0]])
def area(p):
    return 0.5 * abs(sum(x0*y1 - x1*y0
                         for ((x0, y0), (x1, y1)) in segments(p)))
def mkvertices(pth):
    vert = [(0,0)]
    for (dx,dy) in pth:
        vert.append((vert[-1][0]+dx,vert[-1][1]+dy))
    return vert
left = (-1,0)
right = (+1,0)
up = (0,+1)
down = (0,-1)
#  _
# | |_
# |_ _|
print (area(mkvertices([up, up, right, down, right, down, left, left])))
#  _
# |_|_
#   |_|
print (area(mkvertices([up, right, down, down, right, up, left, left])))
Output:
3.0
0.0
Note that this approach fails for polygons that contain intersecting lines as in the second example.
This can be implemented in place using Shoelace formula for simple polygons.
For each segment (a, b) we have to calculate (b.x - a.x)*(a.y + b.y)/2. The sum over all segments is the signed area of a polygon.
What's more, here we're dealing only with axis aligned segments of length 1. Vertical segments can be ignored because b.x - a.x = 0.
Horizontal segments have (a.y + b.y) / 2 = a.y = b.y and b.x - a.x = +-1.
So in the end we only have to keep track of y, and the area added is always +-y.
Here is a sample C++ code:
#include <iostream>
#include <vector>
enum struct Direction
{
Up, Down, Left, Right
};
int area(const std::vector<Direction>& moves)
{
int area = 0;
int y = 0;
for (auto move : moves)
{
switch(move)
{
case Direction::Left:
area += y;
break;
case Direction::Right:
area -= y;
break;
case Direction::Up:
y -= 1;
break;
case Direction::Down:
y += 1;
break;
}
}
return area < 0 ? -area : area;
}
int main()
{
std::vector<Direction> moves{{
Direction::Up,
Direction::Up,
Direction::Right,
Direction::Down,
Direction::Right,
Direction::Down,
Direction::Left,
Direction::Left
}};
std::cout << area(moves);
return 0;
}
I assume there should be some restrictions on the shapes you are drawing (axis-aligned, polygonal, closed, non-intersecting lines) to be able to calculate the area.
Represent the shape using segments; each segment consists of two points, each with two coordinates: x and y.
Taking these assumptions into consideration, we can say that any horizontal segment has one parallel segment that has the same x coordinates for its two points but different y coordinates.
Since every segment has length 1, the surface area between these two segments equals the height difference between them. Summing the area over all such pairs of horizontal segments gives you the total surface area of the shape.

Determine if a point is in the union of rectangles

I have a set of axis parallel 2d rectangles defined by their top left and bottom right hand corners(all in integer coordinates). Given a point query, how can you efficiently determine if it is in one of the rectangles? I just need a yes/no answer and don't need to worry about which rectangle it is in.
I can check if (x,y) is in ((x1, y1), (x2, y2)) by seeing if x is between x1 and x2 and y is between y1 and y2. I can do this separately for each rectangle which runs in linear time in the number of rectangles. But as I have a lot of rectangles and I will do a lot of point queries I would like something faster.
The answer depends a little bit on how many rectangles you have. The brute force method checks your coordinates against each rectangular pair in turn:
found = false
for each r in rectangles:
    if point.x > r.x1 && point.x < r.x2:
        if point.y > r.y1 && point.y < r.y2:
            found = true
            break
You can get more efficient by sorting the rectangles into regions, and looking at "bounding rectangles". You then do a binary search through a tree of ever-decreasing bounding rectangles. This takes a bit more work up front, but it makes the lookup O(log n) rather than O(n) - for large collections of rectangles and many lookups, the performance improvement will be significant. You can see a version of this (which looks at intersection of a rectangle with a set of rectangles - but you can easily adapt it to "point within") in this earlier answer. More generally, look at the topic of quad trees, which are exactly the kind of data structure you would need for a 2D problem like this.
A slightly less efficient (but quicker to implement) method would sort the rectangles by lower left corner (for example) - you then need to search only a subset of the rectangles.
If the coordinates are integer type, you could make a binary mask - then the lookup is a single operation (in your case this would require a 512MB lookup table). If your space is relatively sparsely populated (i.e. the probability of a "miss" is quite large) then you could consider using an undersampled bit map (e.g. using coordinates/8) - then map size drops to 8M, and if you have "no hit" you save yourself the expense of looking more closely. Of course you have to round down the left/bottom, and round up the top/right coordinates to make this work right.
Expanding a little bit with an example:
Imagine coordinates can be just 0 - 15 in x, and 0 - 7 in y. There are three rectangles (all [x1 y1 x2 y2]): [2 3 4 5], [3 4 6 7] and [7 1 10 5]. We can draw these in a matrix (I mark the bottom left hand corner with the number of the rectangle - note that 1 and 2 overlap):
...xxxx.........
...xxxx.........
..xxxxx.........
..x2xxxxxxx.....
..1xx..xxxx.....
.......xxxx.....
.......3xxx.....
................
You can turn this into an array of zeros and ones - so that "is there a rectangle at this point" is the same as "is this bit set". A single lookup will give you the answer. To save space you could downsample the array - if there is still no hit, you have your answer, but if there is a hit you would need to check "is this real" - so it saves less time, and savings depend on sparseness of your matrix (sparser = faster). Subsampled array would look like this (2x downsampling):
.oxx....
.xxooo..
.oooxo..
...ooo..
I use x to mark "if you hit this point, you are sure to be in a rectangle", and o to say "some of these are a rectangle". Many of the points are now "maybe", and less time is saved. If you did more severe downsampling you might consider having a two-bit mask: this would allow you to say "this entire block is filled with rectangles" (i.e. - no further processing needed: the x above) or "further processing needed" (like the o above). This soon starts to be more complicated than the Q-tree approach...
Bottom line: the more sorting / organizing of the rectangles you do up front, the faster you can do the lookup.
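To make the bitmap idea concrete, here is a small Python sketch. It assumes integer coordinates with inclusive rectangle bounds `(x1, y1, x2, y2)` and a known grid size; `build_masks` and `in_union` are illustrative names, not part of any library. The full-resolution mask answers a query with a single lookup, while the downsampled mask only rules out definite misses and falls back to the brute-force test otherwise.

```python
def build_masks(rects, width, height, factor):
    """Return (fine, coarse) boolean grids for rectangles (x1, y1, x2, y2)."""
    fine = [[False] * width for _ in range(height)]
    coarse_w = (width + factor - 1) // factor
    coarse_h = (height + factor - 1) // factor
    coarse = [[False] * coarse_w for _ in range(coarse_h)]
    for x1, y1, x2, y2 in rects:
        for y in range(y1, y2 + 1):
            for x in range(x1, x2 + 1):
                fine[y][x] = True                        # exact answer
                coarse[y // factor][x // factor] = True  # "maybe" block
    return fine, coarse

def in_union(rects, coarse, factor, x, y):
    """Point query using only the downsampled mask plus a fallback."""
    if not coarse[y // factor][x // factor]:
        return False  # definite miss, no rectangle touches this block
    # Possible hit: confirm against the rectangles themselves.
    return any(x1 <= x <= x2 and y1 <= y <= y2 for x1, y1, x2, y2 in rects)
```

With the three rectangles from the example, a 16x8 grid and `factor = 2`, the query `fine[4][3]` is a hit, and `in_union(rects, coarse, 2, 15, 7)` returns early without looking at any rectangle.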
My favourite for a variety of 2D geometry queries is the Sweep Line Algorithm. It's widely used in CAD software, which would be my wild guess for the purpose of your program.
Basically, you order all points and all polygon vertices (all 4 rectangle corners in your case) along X-axis, and advance along X-axis from one point to the next. In case of non-Manhattan geometries you would also introduce intermediate points, the segment intersections.
The data structure is a balanced tree of the points and polygon (rectangle) edge intersections with the vertical line at the current X-position, ordered in Y-direction. If the structure is properly maintained it's very easy to tell whether a point at the current X-position is contained in a rectangle or not: just examine Y-orientation of the vertically adjacent to the point edge intersections. If rectangles are allowed to overlap or have rectangle holes it's just a bit more complicated, but still very fast.
The overall complexity for N points and M rectangles is O((N+M)*log(N+M)). One can actually prove that this is asymptotically optimal.
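A rough Python sketch of an offline sweep for this particular problem, under some assumptions of mine: all queries are known in advance, coordinates are integers, and rectangle bounds `(x1, y1, x2, y2)` are inclusive. Instead of the balanced tree of edge intersections described above, it swaps in a Fenwick tree over compressed y-coordinates that counts how many open rectangles cover each y; the name `points_in_union` is hypothetical.

```python
def points_in_union(rects, points):
    """For each (x, y) in points, report whether some rectangle contains it."""
    # Compress every y that can carry an update or a query.
    ys = sorted({v for _, y1, _, y2 in rects for v in (y1, y2 + 1)}
                | {y for _, y in points})
    index = {y: i for i, y in enumerate(ys)}
    size = len(ys)
    tree = [0] * (size + 1)

    def add(i, delta):          # add delta at compressed position i
        i += 1
        while i <= size:
            tree[i] += delta
            i += i & -i

    def coverage(i):            # number of open rectangles covering position i
        i += 1
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    # Event kinds at equal x: 0 = open rectangle, 1 = query, 2 = close rectangle.
    events = []
    for x1, y1, x2, y2 in rects:
        events.append((x1, 0, y1, y2, 1))
        events.append((x2, 2, y1, y2, -1))
    for k, (x, y) in enumerate(points):
        events.append((x, 1, y, k, 0))
    events.sort()

    result = [False] * len(points)
    for _, kind, a, b, delta in events:
        if kind == 1:
            result[b] = coverage(index[a]) > 0
        else:                   # range update on [a, b] as two point updates
            add(index[a], delta)
            add(index[b + 1], -delta)
    return result
```

Sorting dominates the cost, so this stays within the O((N+M)*log(N+M)) bound quoted above.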
Store the coordinate parts of your rectangles to a tree structure. For any left value make an entry that points to corresponding right values pointing to corresponding top values pointing to corresponding bottom values.
To search you have to check the x value of your point against the left values. If no left value matches, meaning they are all greater than your x value, you know the point is outside every rectangle. Otherwise you check the x value against the right values of the corresponding left value. Again, if no right value matches, you're outside. Otherwise do the same with the top and bottom values. Once you find a matching bottom value, you know you are inside some rectangle and you are finished checking.
As I stated in my comment below, there is much room for optimization, for example keeping minimum left and top values and maximum right and bottom values to quickly check whether you are outside.
The following approach is in C# and needs adaption to your preferred language:
public class RectangleUnion
{
private readonly Dictionary<int, Dictionary<int, Dictionary<int, HashSet<int>>>> coordinates =
new Dictionary<int, Dictionary<int, Dictionary<int, HashSet<int>>>>();
public void Add(Rectangle rect)
{
Dictionary<int, Dictionary<int, HashSet<int>>> verticalMap;
if (coordinates.TryGetValue(rect.Left, out verticalMap))
AddVertical(rect, verticalMap);
else
coordinates.Add(rect.Left, CreateVerticalMap(rect));
}
public bool IsInUnion(Point point)
{
foreach (var left in coordinates)
{
if (point.X < left.Key) continue;
foreach (var right in left.Value)
{
if (right.Key < point.X) continue;
foreach (var top in right.Value)
{
if (point.Y < top.Key) continue;
foreach (var bottom in top.Value)
{
if (point.Y > bottom) continue;
return true;
}
}
}
}
return false;
}
private static void AddVertical(Rectangle rect,
IDictionary<int, Dictionary<int, HashSet<int>>> verticalMap)
{
Dictionary<int, HashSet<int>> bottomMap;
if (verticalMap.TryGetValue(rect.Right, out bottomMap))
AddBottom(rect, bottomMap);
else
verticalMap.Add(rect.Right, CreateBottomMap(rect));
}
private static void AddBottom(
Rectangle rect,
IDictionary<int, HashSet<int>> bottomMap)
{
HashSet<int> bottomList;
if (bottomMap.TryGetValue(rect.Top, out bottomList))
bottomList.Add(rect.Bottom);
else
bottomMap.Add(rect.Top, new HashSet<int> { rect.Bottom });
}
private static Dictionary<int, Dictionary<int, HashSet<int>>> CreateVerticalMap(
Rectangle rect)
{
var bottomMap = CreateBottomMap(rect);
return new Dictionary<int, Dictionary<int, HashSet<int>>>
{
{ rect.Right, bottomMap }
};
}
private static Dictionary<int, HashSet<int>> CreateBottomMap(Rectangle rect)
{
var bottomList = new HashSet<int> { rect.Bottom };
return new Dictionary<int, HashSet<int>>
{
{ rect.Top, bottomList }
};
}
}
It's not beautiful, but should point you in the right direction.

Largest rectangular sub matrix with the same number

I am trying to come up with a dynamic programming algorithm that finds the largest sub matrix within a matrix that consists of the same number:
example:
{5 5 8}
{5 5 7}
{3 4 1}
Answer : 4 elements due to the matrix
5 5
5 5
This is a question I already answered here (and here, modified version). In both cases the algorithm was applied to the binary case (zeros and ones), but the modification for arbitrary numbers is quite easy (but sorry, I keep the images for the binary version of the problem). You can do this very efficiently with a two-pass linear O(n) time algorithm - n being the number of elements. However, this is not dynamic programming - I think using dynamic programming here would be clumsy and inefficient in the end, because of the difficulties with problem decomposition, as the OP mentioned - unless it's homework - but in that case you can try to impress with this algorithm :-) as there's obviously no faster solution than O(n).
Algorithm (pictures depict binary case):
Say you want to find largest rectangle of free (white) elements.
Here follows the two-pass linear O(n) time algorithm (n being the number of elements):
1) in a first pass, go by columns, from bottom to top, and for each element, denote the number of consecutive elements available up to this one:
repeat, until:
Pictures depict the binary case. In case of arbitrary numbers you hold 2 matrices - first with the original numbers and second with the auxiliary numbers that are filled in the image above. You have to check the original matrix and if you find a number different from the previous one, you just start the numbering (in the auxiliary matrix) again from 1.
2) in a second pass you go by rows, holding data structure of potential rectangles, i.e. the rectangles containing current position somewhere at the top edge. See the following picture (current position is red, 3 potential rectangles - purple - height 1, green - height 2 and yellow - height 3):
For each rectangle we keep its height k and its left edge. In other words we keep track of the sums of consecutive numbers that were >= k (i.e. potential rectangles of height k). This data structure can be represented by an array with double linked list linking occupied items, and the array size would be limited by the matrix height.
Pseudocode of 2nd pass (non-binary version with arbitrary numbers):
var m[]    // original matrix
var aux[]  // auxiliary matrix filled in the 1st pass
var rect[] // array of potential rectangles, indexed by their height
           // the occupied items are also linked in double linked list,
           // ordered by height
foreach row = 1..N // go by rows
    foreach col = 1..M
        if (col > 1 AND m[row, col] != m[row, col - 1]) // new number
            close_potential_rectangles_higher_than(0); // close all rectangles
        height = aux[row, col] // maximal height possible at current position
        if (!rect[height]) { // rectangle with height does not exist
            create rect[height] // open new rectangle
            if (rect[height].next) // rectangle with nearest higher height
                // if it exists, start from its left edge
                rect[height].left_col = rect[height].next.left_col
            else
                rect[height].left_col = col;
        }
        close_potential_rectangles_higher_than(height)
    end for // end row
    close_potential_rectangles_higher_than(0);
    // end of row -> close all rect., supposing col is M+1 now!
end for // end matrix
The function for closing rectangles:
function close_potential_rectangles_higher_than(height)
    close_r = rectangle with highest height (last item in dll)
    while (close_r exists AND close_r.height > height) { // higher? close it
        area = close_r.height * (col - close_r.left_col)
        if (area > max_area) { // we have maximal rectangle!
            max_area = area
            max_topleft = [row, close_r.left_col]
            max_bottomright = [row + close_r.height - 1, col - 1]
        }
        closed = close_r
        close_r = close_r.prev
        // remove the rectangle closed from the double linked list
    }
end function
This way you can also get all maximum rectangles. So in the end you get:
And what will the complexity be? You see that the function close_potential_rectangles_higher_than is O(1) per closed rectangle. Because for each field we create at most 1 potential rectangle, the total number of potential rectangles ever present in a particular row is never higher than the length of the row. Therefore, the complexity of this function is O(1) amortized!
So the whole complexity is O(n) where n is number of matrix elements.
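Below is a hedged Python sketch of the same two-pass idea for arbitrary numbers. It deviates from the pseudocode in a few ways that are my own choices: heights are counted from the top instead of the bottom, the first pass is folded into the row loop, and a plain stack of `(left_col, height)` pairs replaces the height-indexed array with a linked list. The function name is made up, and it returns only the area.

```python
def largest_uniform_rectangle(m):
    """Area of the largest rectangle whose cells all hold the same value."""
    if not m or not m[0]:
        return 0
    rows, cols = len(m), len(m[0])
    heights = [0] * cols      # run length of equal values ending at this row
    best = 0
    for r in range(rows):
        for c in range(cols):
            same_as_above = r > 0 and m[r][c] == m[r - 1][c]
            heights[c] = heights[c] + 1 if same_as_above else 1
        stack = []            # open rectangles as (left_col, height), heights increasing
        for c in range(cols + 1):
            at_end = c == cols
            new_number = at_end or (c > 0 and m[r][c] != m[r][c - 1])
            h = 0 if at_end else heights[c]
            if new_number:    # value changed: close every open rectangle
                while stack:
                    left, sh = stack.pop()
                    best = max(best, sh * (c - left))
            left = c
            # Close rectangles that are too tall to continue through column c.
            while stack and stack[-1][1] >= h:
                left, sh = stack.pop()
                best = max(best, sh * (c - left))
            if not at_end:
                stack.append((left, h))   # reuse the left edge of what was closed
    return best
```

Each column is pushed and popped at most once per row, which is where the amortized O(1) cost per cell comes from.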
A dynamic solution:
Define a new matrix A which will store in A[i,j] two values: the width and the height of the largest submatrix with the left upper corner at i,j. Fill this matrix starting from the bottom right corner, by rows bottom to top. You'll find four cases:
case 1: none of the right or bottom neighbour elements in the original matrix are equal to the current one, i.e: M[i,j] != M[i+1,j] and M[i,j] != M[i,j+1] being M the original matrix, in this case, the value of A[i,j] is 1x1
case 2: the neighbour element to the right is equal to the current one but the bottom one is different, the value of A[i,j].width is A[i+1,j].width+1 and A[i,j].height=1
case 3: the neighbour element to the bottom is equal but the right one is different, A[i,j].width=1, A[i,j].height=A[i,j+1].height+1
case 4: both neighbours are equal: A[i,j].width = min(A[i+1,j].width+1,A[i,j+1].width) and A[i,j].height = min(A[i,j+1].height+1,A[i+1,j].height)
the size of the largest matrix that has the upper left corner at i,j is A[i,j].width*A[i,j].height so you can update the max value found while calculating the A[i,j]
the bottom row and the rightmost column elements are treated as if their neighbours to the bottom and to the right respectively are different
in your example, the resulting matrix A would be:
{2:2 1:2 1:1}
{2:1 1:1 1:1}
{1:1 1:1 1:1}
being w:h width:height
Modification to the above answer:
Define a new matrix A which will store in A[i,j] two values: the width and the height of the largest submatrix with the left upper corner at i,j. Fill this matrix starting from the bottom right corner, by rows bottom to top. You'll find four cases:
case 1: none of the right or bottom neighbour elements in the original matrix are equal to the current one, i.e: M[i,j] != M[i+1,j] and M[i,j] != M[i,j+1] being M the original matrix, in this case, the value of A[i,j] is 1x1
case 2: the neighbour element to the right is equal to the current one but the bottom one is different, the value of A[i,j].width is A[i+1,j].width+1 and A[i,j].height=1
case 3: the neighbour element to the bottom is equal but the right one is different, A[i,j].width=1, A[i,j].height=A[i,j+1].height+1
case 4: both neighbours are equal:
Three rectangles are considered:
1. A[i,j].width=A[i+1,j].width+1; A[i,j].height=1;
2. A[i,j].height=A[i,j+1].height+1; A[i,j].width=1;
3. A[i,j].width = min(A[i+1,j].width+1,A[i,j+1].width) and A[i,j].height = min(A[i,j+1].height+1,A[i+1,j].height)
The one with the max area among the above three rectangles will be considered to represent the rectangle at this position.
The size of the largest matrix that has the upper left corner at i,j is A[i,j].width*A[i,j].height so you can update the max value found while calculating the A[i,j]
the bottom row and the rightmost column elements are treated as if their neighbours to the bottom and to the right respectively are different.
This question is a duplicate. I have tried to flag it as a duplicate. Here is a Python solution, which also returns the position and shape of the largest rectangular submatrix:
#!/usr/bin/env python3
import numpy
s = '''5 5 8
5 5 7
3 4 1'''
nrows = 3
ncols = 3
skip_not = 5
area_max = (0, [])
a = numpy.fromstring(s, dtype=int, sep=' ').reshape(nrows, ncols)
w = numpy.zeros(dtype=int, shape=a.shape)
h = numpy.zeros(dtype=int, shape=a.shape)
for r in range(nrows):
    for c in range(ncols):
        if not a[r][c] == skip_not:
            continue
        if r == 0:
            h[r][c] = 1
        else:
            h[r][c] = h[r-1][c]+1
        if c == 0:
            w[r][c] = 1
        else:
            w[r][c] = w[r][c-1]+1
        minw = w[r][c]
        for dh in range(h[r][c]):
            minw = min(minw, w[r-dh][c])
            area = (dh+1)*minw
            if area > area_max[0]:
                area_max = (area, [(r, c, dh+1, minw)])
print('area', area_max[0])
for t in area_max[1]:
    print('coord and shape', t)
Output:
area 4
coord and shape (1, 1, 2, 2)

Binary image shearing algorithm

I'm looking for a simple shearing algorithm. The image to be sheared is binary (0 - background pixels, 1 - foreground pixels), represented by a 2D array. It's going to be used for handwritten digit slant correction so the shearing needs to be done on the x axis only.
I found some mathematical explanations, but not sure how to implement it correctly.
Thanks!
Just loop through the rows, starting with the bottom row, and keep track of the current pixelshift along the x-axis (as a floating- or fixed-point number). After every row you increase the shift by the desired constant slope. For drawing purposes you take the nearest integer of the corresponding pixelshift at every row.
In pseudocode this would be:
slope = 0.2; // one pixel shift every five rows
shift = 0.0; // current pixelshift along x-axis
for (row = rows-1; row>=0; row--) {
    integershift = round(shift) // round to nearest integer
    for (column = columns-1; column>=0; column--) {
        sourcecolumn = column + integershift; // get the pixel from this column
        if (sourcecolumn >= 0 && sourcecolumn < columns)
            outputImage[row][column] = inputImage[row][sourcecolumn];
        else // draw background if we're outside the inputImage
            outputImage[row][column] = 0;
    }
    shift += slope;
}
This is basically the Bresenham line drawing algorithm, so you should find plenty of implementation details for that.
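A runnable Python version of the pseudocode, as a sketch only. It assumes the image is a list of rows of 0/1 values; the name `shear_x` is made up, and rounding uses `floor(shift + 0.5)` because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math

def shear_x(image, slope):
    """Shear a binary image along x; the bottom row stays fixed."""
    rows, cols = len(image), len(image[0])
    output = [[0] * cols for _ in range(rows)]   # 0 = background
    shift = 0.0                                  # current shift along x
    for row in range(rows - 1, -1, -1):          # start with the bottom row
        integer_shift = int(math.floor(shift + 0.5))
        for col in range(cols):
            source_col = col + integer_shift     # pixel to copy from
            if 0 <= source_col < cols:
                output[row][col] = image[row][source_col]
        shift += slope                           # constant slope per row
    return output
```

With `slope = 1.0`, a stroke that leans one pixel to the right per row is pulled back into a vertical line; a negative slope shears the other way.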

Calculating length of objects in binary image - algorithm

I need to calculate length of the object in a binary image (maximum distance between the pixels inside the object). As it is a binary image, so we might consider it a 2D array with values 0 (white) and 1 (black). The thing I need is a clever (and preferably simple) algorithm to perform this operation. Keep in mind there are many objects in the image.
The image to clarify:
Sample input image:
I think the problem is simple if the boundary of an object is convex and no three vertices are on a line (i.e. no vertex can be removed without changing the polygon): Then you can just pick two points at random and use a simple gradient-descent type search to find the longest line:
Start with random vertices A, B
See if the line A' - B is longer than A - B where A' is the point left of A; if so, replace A with A'
See if the line A' - B is longer than A - B where A' is the point right of A; if so, replace A with A'
Do the same for B
repeat until convergence
So I'd suggest finding the convex hull for each seed blob, removing all "superfluous" vertices (to ensure convergence) and running the algorithm above.
Constructing a convex hull is an O(n log n) operation IIRC, where n is the number of boundary pixels. Should be pretty efficient for small objects like these. EDIT: I just remembered that the O(n log n) for the convex hull algorithm was needed to sort the points. If the boundary points are the result of a connected component analysis, they are already sorted. So the whole algorithm should run in O(n) time, where n is the number of boundary points. (It's a lot of work, though, because you might have to write your own convex-hull algorithm or modify one to skip the sort.)
Add: Response to comment
If you don't need 100% accuracy, you could simply fit an ellipse to each blob and calculate the length of the major axis: This can be computed from central moments (IIRC it's simply proportional to the square root of the largest eigenvalue of the covariance matrix), so it's an O(n) operation and can efficiently be calculated in a single sweep over the image. It has the additional advantage that it takes all pixels of a blob into account, not just two extremal points, i.e. it is far less affected by noise.
Find the major-axis length of the ellipse that has the same normalized second central moments as the region. In MATLAB you can use regionprops.
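A minimal Python sketch of the moment-based estimate, assuming each blob is given as a list of `(row, col)` pixel coordinates. The function name is hypothetical, and the factor 4 comes from the fact that a uniformly filled ellipse with semi-axis a has variance a²/4 along that axis; library implementations may apply additional small corrections that are not reproduced here.

```python
from math import sqrt

def major_axis_length(pixels):
    """Major-axis length of the ellipse matching the blob's second moments."""
    n = len(pixels)
    mean_r = sum(r for r, _ in pixels) / n
    mean_c = sum(c for _, c in pixels) / n
    # Normalized second central moments (the 2x2 covariance matrix).
    var_r = sum((r - mean_r) ** 2 for r, _ in pixels) / n
    var_c = sum((c - mean_c) ** 2 for _, c in pixels) / n
    cov = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / n
    # Largest eigenvalue of [[var_r, cov], [cov, var_c]] in closed form.
    largest = (var_r + var_c) / 2 + sqrt(((var_r - var_c) / 2) ** 2 + cov ** 2)
    return 4 * sqrt(largest)
```

For thin or very small blobs the estimate differs noticeably from the true pixel-to-pixel maximum distance, so treat it as an approximation.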
A very crude, brute-force approach would be to first identify all the edge pixels (any black pixel in the object adjacent to a non-black pixel) and calculate the distances between all possible pairs of edge pixels. The longest of these distances will give you the length of the object.
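The brute-force variant is short enough to sketch in Python. It assumes one object is given as a collection of `(row, col)` pixels and treats a pixel as an edge pixel when at least one of its four direct neighbours is not part of the object; `object_length` is an illustrative name.

```python
from itertools import combinations
from math import hypot

def object_length(pixels):
    """Largest centre-to-centre distance between two edge pixels of one object."""
    pixels = set(pixels)

    def is_edge(r, c):
        # Edge pixel: some 4-neighbour lies outside the object.
        return any((r + dr, c + dc) not in pixels
                   for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))

    edge = [p for p in pixels if is_edge(*p)]
    # Quadratic in the number of edge pixels, fine for small blobs.
    return max((hypot(r1 - r2, c1 - c2)
                for (r1, c1), (r2, c2) in combinations(edge, 2)),
               default=0.0)
```

Restricting the pair search to edge pixels is safe because the two farthest pixels of an object always lie on its boundary.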
If the objects are always shaped like the ones in your sample, you could speed this up by only evaluating the pixels with the highest and lowest x and y values within the object.
I would suggest trying a "reverse" distance transform. In the magical world of mathematical morphology (sorry couldn't resist the alliteration) the distance transform gives you the closest distance of each pixel to its nearest boundary pixel. In your case, you are interested in the farthest distance to a boundary pixel, hence I have cleverly applied a "reverse" prefix.
You can find information on the distance transform here and here. I believe that matlab implements the distance transform as per here. That would lead me to believe that you can find an open source implementation of the distance transform in octave. Furthermore, it would not surprise me in the least if opencv implemented it.
I haven't given it much thought but it's intuitive to me that you should be able to reverse the distance transform and calculate it in roughly the same amount of time as the original distance transform.
I think you could consider using a breadth first search algorithm.
The basic idea is that you loop over each row and column in the image, and if you haven't visited the node (a node is a row and column with a colored pixel) yet, then you would run the breadth first search. You would visit each node you possibly could, and keep track of the max and min points for the object.
Here's some C++ sample code (untested):
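#include <algorithm> // max
#include <cmath>     // sqrt
#include <cstring>   // memset
#include <queue>
#include <utility>   // pair, make_pair
#include <vector>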
using namespace std;
// used to transition from given row, col to each of the
// 8 different directions
int dr[] = { -1, 0, 1, -1, 1, -1, 0, 1 };
int dc[] = { -1, -1, -1, 0, 0, 1, 1, 1 };
// WHITE or COLORED cells
const int WHITE = 0;
const int COLORED = 1;
// number of rows and columns
int nrows = 2000;
int ncols = 2000;
// assume G is the image
int G[2000][2000];
// the "visited array"
bool vis[2000][2000];
// get distance between 2 points
inline double getdist(double x1, double y1, double x2, double y2) {
double d1 = x1 - x2;
double d2 = y1 - y2;
return sqrt(d1*d1+d2*d2);
}
// this function performs the breadth first search
double bfs(int startRow, int startCol) {
queue< int > q;
q.push(startRow);
q.push(startCol);
vector< pair< int, int > > points;
while(!q.empty()) {
int r = q.front();
q.pop();
int c = q.front();
q.pop();
// already visited?
if (vis[r][c])
continue;
points.push_back(make_pair(r,c));
vis[r][c] = true;
// try all eight directions
for(int i = 0; i < 8; ++i) {
int nr = r + dr[i];
int nc = c + dc[i];
if (nr < 0 || nr >= nrows || nc < 0 || nc >= ncols)
continue; // out of bounds
if (G[nr][nc] == WHITE || vis[nr][nc])
continue; // stay inside the object and skip pixels already handled
// push next node on queue
q.push(nr);
q.push(nc);
}
}
// the distance is maximum difference between any 2 points encountered in the BFS
double diff = 0;
for(int i = 0; i < (int)points.size(); ++i) {
for(int j = i+1; j < (int)points.size(); ++j) {
diff = max(diff,getdist(points[i].first,points[i].second,points[j].first,points[j].second));
}
}
return diff;
}
int main() {
vector< double > lengths;
memset(vis,false,sizeof vis);
for(int r = 0; r < nrows; ++r) {
for(int c = 0; c < ncols; ++c) {
if (G[r][c] == WHITE)
continue; // we don't care about cells without objects
if (vis[r][c])
continue; // we've already processed this object
// find the length of this object
double len = bfs(r,c);
lengths.push_back(len);
}
}
return 0;
}

Resources