How can I find holes in a 2D matrix? - algorithm

I know the title seems kind of ambiguous, and for this reason I've attached an image which will help to understand the problem clearly. I need to find holes inside the white region. A hole is defined as one or many cells with value '0' inside the white region, i.e. it has to be fully enclosed by cells with value '1' (e.g. here we can see three holes marked as 1, 2 and 3). I've come up with a pretty naive solution:
1. Search the whole matrix for cells with value '0'
2. Run a DFS (flood fill) when such a cell (a black one) is encountered and check whether we can touch the boundary of the main rectangular region
3. If we can touch the boundary during the DFS then it's not a hole; if we can't reach the boundary, it is considered a hole
Now, this solution works but I was wondering if there's any other efficient/fast solution for this problem.
Please let me know your thoughts. Thanks.

With floodfill, which you already have: run along the BORDER of your matrix and floodfill it, i.e.,
change all zeroes (black) to 2 (filled black) and ones to 3 (filled white); ignore 2 and 3's that come from an earlier floodfill.
For example with your matrix, you start from the upper left, and floodfill black a zone with area 11. Then you move right, and find a black cell that you just filled. Move right again and find a white area, very large (actually all the white in your matrix). Floodfill it. Then you move right again, another fresh black area that runs along the whole upper and right borders. Moving around, you now find two white cells that you filled earlier and skip them. And finally you find the black area along the bottom border.
Counting the number of colours you found and set might already supply the information on whether there are holes in the matrix.
Otherwise, or to find where they are, scan the matrix: all areas you find that are still of color 0 are holes in the black. You might also have holes in the white.
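To make the idea concrete, here is a minimal Python sketch of this border flood fill (naming is mine; it assumes a mutable list-of-lists of 0/1 ints and 4-connectivity):

from collections import deque

def count_holes_floodfill(matrix):
    # Sketch of the border flood fill described above (4-connectivity assumed).
    # Mutates `matrix`: 0-regions reachable from the border become 2 ("filled
    # black"), 1-regions become 3 ("filled white"). Whatever is still 0
    # afterwards is a hole; each hole is counted once and given its own colour.
    h, w = len(matrix), len(matrix[0])

    def flood(sy, sx, target=None):
        colour = matrix[sy][sx]
        if colour not in (0, 1):
            return                      # already recoloured by an earlier fill
        if target is None:
            target = 2 if colour == 0 else 3
        matrix[sy][sx] = target
        queue = deque([(sy, sx)])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and matrix[ny][nx] == colour:
                    matrix[ny][nx] = target
                    queue.append((ny, nx))

    for x in range(w):                  # walk the top and bottom borders
        flood(0, x)
        flood(h - 1, x)
    for y in range(h):                  # walk the left and right borders
        flood(y, 0)
        flood(y, w - 1)

    holes = 0
    for y in range(h):                  # anything still 0 is a hole
        for x in range(w):
            if matrix[y][x] == 0:
                holes += 1
                flood(y, x, target=3 + holes)   # give each hole its own colour
    return holes

For example, count_holes_floodfill([[1,1,1],[1,0,1],[1,1,1]]) returns 1 and leaves the centre cell recoloured.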
Another method, sort of "arrested flood fill"
Run all around the border of the first matrix. Where you find "0", you set it to "2". Where you find "1", you set it to "3".
Now run around the new inner border (those cells that touch the border you have just scanned).
Zero cells touching 2's become 2, 1 cells touching 3 become 3.
You will have to scan twice, once clockwise, once counterclockwise, checking the cells "outwards" and "before" the current cell. That is because you might find something like this:
22222222222333333
2AB11111111C
31
Cell A is actually 1. You examine its neighbours and you find 1 (but it's useless to check that since you haven't processed it yet, so you can't know if it's a 1 or should be a 3 - which is the case, by the way), 2 and 2. A 2 can't change a 1, so cell A remains 1. The same goes with cell B which is again a 1, and so on. When you arrive at cell C, you discover that it is a 1, and has a 3 neighbour, so it toggles to 3... but all the cells from A to C should now toggle.
The simplest, albeit not most efficient, way to deal with this is to scan the cells clockwise, which gives you the wrong answer (C and D are 1's, by the way)
22222222222333333
211111111DC333333
33
and then scan them again counterclockwise. Now when you arrive to cell C, it has a 3-neighbour and toggles to 3. Next you inspect cell D, whose previous-neighbour is C, which is now 3, so D toggles to 3 again. In the end you get the correct answer
22222222222333333
23333333333333333
33
and for each cell you examined two neighbours going clockwise, one going counterclockwise. Moreover, one of the neighbours is actually the cell you checked just before, so you can keep it in a ready variable and save one matrix access.
If you find that you scanned a whole border without even once toggling a single cell, you can halt the procedure. Checking this will cost you 2(W*H) operations, so it is only really worthwhile if there are lots of holes.
In at most W*H*2 steps, you should be done.
You might also want to check the Percolation Algorithm and try to adapt that one.

Make some sort of a "LinkedCells" class that will store cells that are linked with each other. Then check cells one by one in left-to-right, top-to-bottom order, making the following check for each cell: if its neighbouring cell is black, add this cell to that cell's group; else create a new group for this cell. You only need to check the top and left neighbours.
UPD: Sorry, I forgot about merging groups: if both neighbouring cells are black and are from different groups, you should merge the groups into one.
Your "LinkedCells" class should have a flag indicating whether it is connected to the edge. It is false by default and is changed to true if you add an edge cell to the group. When merging two groups you should set the new flag as the OR (||) of the previous flags.
In the end you will have a set of groups, and each group whose connection flag is false will be a hole.
This algorithm will be O(x*y).
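As a sketch of how this might look (my own minimal Python rendering of the "LinkedCells" idea; a disjoint-set with an edge flag stands in for the cell groups):

class LinkedCellGroups:
    # Disjoint-set of black cells; each group carries a touches-edge flag.
    def __init__(self):
        self.parent = {}
        self.edge = {}

    def add(self, cell, on_edge):
        self.parent[cell] = cell
        self.edge[cell] = on_edge

    def find(self, c):
        while self.parent[c] != c:
            self.parent[c] = self.parent[self.parent[c]]   # path compression
            c = self.parent[c]
        return c

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra
            self.edge[ra] = self.edge[ra] or self.edge[rb]  # OR of the flags

def count_holes_groups(matrix):
    # One left-to-right, top-to-bottom scan, checking only the top and left
    # neighbours, as described above. Groups whose edge flag stayed False
    # are the holes.
    h, w = len(matrix), len(matrix[0])
    groups = LinkedCellGroups()
    for y in range(h):
        for x in range(w):
            if matrix[y][x] != 0:
                continue
            groups.add((y, x), y in (0, h - 1) or x in (0, w - 1))
            if y > 0 and matrix[y - 1][x] == 0:
                groups.union((y - 1, x), (y, x))
            if x > 0 and matrix[y][x - 1] == 0:
                groups.union((y, x - 1), (y, x))
    roots = {groups.find(c) for c in groups.parent}
    return sum(1 for r in roots if not groups.edge[r])

With path compression the single scan stays close to the promised O(x*y).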

You can represent the grid as a graph with individual cells as vertices and edges between adjacent cells. Then you can use Breadth First Search or Depth First Search starting from each of the cells on the sides. As you will only find the components connected to the sides, the black cells which have not been visited are the holes. You can use the search algorithm again to divide the holes into distinct components.
EDIT: Worst case complexity must be linear in the number of cells. Otherwise, give some input to the algorithm, check which cells the algorithm hasn't looked at (as it is sublinear, there will be big unvisited spots) and put a hole there. Now you've got an input for which the algorithm doesn't find one of the holes.

Your algorithm is basically OK. It's just a matter of optimizing it by merging the flood fill exploration with the cell scanning, which minimizes the number of tests.
The general idea is to perform the flood fill exploration line by line while scanning the table. So you'll have multiple parallel flood fill that you have to keep track of.
The table is then processed row by row from top to bottom, and each row is processed from right to left. The order is arbitrary; it could be reversed if you prefer.
Let a segment denote a sequence of consecutive cells with value 0 in a row. You only need the indexes of the first and last cell with value 0 to define a segment.
As you may guess a segment is also a flood fill in progress. So we'll add an identification number to the segments to distinguish between the different flood fills.
The nice thing about this algorithm is that you only need to keep track of segments and their identification numbers in rows i and i-1. So when you process row i, you have the list of segments found in row i-1 and their associated identification numbers.
You then have to process segment connection in row i and row i-1. I'll explain below how this can be made efficient.
For now you have to consider three cases:
1. You found a segment in row i not connected to any segment in row i-1. Assign it a new hole identification number (an incremented integer). If it's connected to the border of the table, make this number negative.
2. You found a segment in row i-1 not connected to any segment in row i. You found the lowest segment of a hole. If it has a negative identification number it is connected to the border and you can ignore it. Otherwise, congratulations, you found a hole.
3. You found a segment in row i connected to one or more segments in row i-1. Set the identification number of all these connected segments to the smallest identification number. See the following possible use case.
row i-1: 2 333 444 111
row i : **** *** ***
The segments in row i should all get the value 1 identifying the same flood fill.
Matching segments in row i and row i-1 can be done efficiently by keeping them ordered from left to right and comparing segment indexes.
Process segments by lowest start index first. Then check whether it's connected to the segment with the lowest start index of the other row. If not, process case 1 or 2. Otherwise continue identifying connected segments, keeping track of the smallest identification number. When no more connected segments are found, set the identification number of all connected segments found in row i to that smallest identification value.
The index comparison for the connectivity test can be optimized by storing (first-1, last) as the segment definition, since segments may be connected by their corners. You can then directly compare the bare index values to detect overlapping segments.
The rule of picking the smallest identification number ensures that the negative number automatically wins for a group of connected segments when at least one of them is connected to the border; it propagates to the other segments and flood fills.
This is a nice exercise to program. You didn't specify the exact output you need, so this is also left as an exercise.
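Below is a rough Python sketch of the row-by-row segment idea (naming is mine). Instead of the "smallest identification number" rule it merges flood-fill ids with a small union-find and marks ids that touch the border, which amounts to the same bookkeeping; runs are widened by one cell so that corner contact also connects, as suggested above:

def count_holes_segments(matrix):
    # Zero-runs ("segments") of each row get a flood-fill id; runs that
    # overlap a run of the previous row are merged. Ids that ever touch the
    # border are flagged; the remaining ids are the holes.
    if not matrix:
        return 0
    h, w = len(matrix), len(matrix[0])
    parent, touches_border = [], []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            touches_border[ra] = touches_border[ra] or touches_border[rb]

    prev = []                                  # runs of the row above: (first, last, id)
    for y in range(h):
        cur, x = [], 0
        while x < w:                           # extract the zero-runs of row y
            if matrix[y][x] == 0:
                first = x
                while x < w and matrix[y][x] == 0:
                    x += 1
                last = x - 1
                seg = len(parent)
                parent.append(seg)
                touches_border.append(y in (0, h - 1) or first == 0 or last == w - 1)
                cur.append((first, last, seg))
            else:
                x += 1
        for f, l, s in cur:                    # connect with overlapping runs above
            for pf, pl, p in prev:
                if pf <= l + 1 and pl >= f - 1:    # the (first-1, last) widening
                    union(s, p)
        prev = cur

    roots = {find(i) for i in range(len(parent))}
    return sum(1 for r in roots if not touches_border[r])

Only the runs of rows i and i-1 are kept around, as in the description; the union-find arrays are the only extra storage.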

The brute force algorithm as described here is as follows.
We now assume we can write in cells a value different from 0 or 1.
You need a flood fill function that receives the coordinates of a cell to start from and an integer value to write into all connected cells holding the value 0.
Since you need to consider only holes (cells with value 0 surrounded by cells with value 1), you have to use two passes.
The first pass visits only cells touching the border. For every such cell containing the value 0, you do a flood fill with the value -1. This tells you that the cell has a value different from 1 and has a connection to the border. After this scan, all cells still holding the value 0 belong to holes.
To distinguish between the different holes, you need a second scan. You then scan the remaining cells in the rectangle (1,1)x(n-2,n-2) that you haven't visited yet. Whenever the scan hits a cell with value 0, you have discovered a new hole. You then flood fill this hole with an integer of your choice to distinguish it from the others. After that you proceed with the scan until all cells have been visited.
When done, you may replace the -1 values with 0, because there shouldn't be any original 0s left.
This algorithm works, but it is not as efficient as the other algorithm I propose. Its advantage is that it's simple and doesn't need extra data storage to hold the segments, hole identifications and eventual segment chaining references.

Related

Number of paths in a rectangular grid with obstacles, after blocking one cell

Given a square grid of size NxN with each cell being empty or having an obstacle, block only one cell so as to minimize the number of paths from the top left to the bottom right corner. You are only allowed to move one step down or right. After blocking one cell, count the number of paths from the top left to the bottom right cell. There are always at least 3 empty cells. Two of them are always the start and the finish cell, and the other one can be any of the remaining cells.
The part of counting the number of paths from top left to bottom right is fairly easy and can be solved easily using dynamic programming.
The part I'm stuck at is the one cell to be blocked to minimize number of paths. Intuition says to search the grid horizontally and block the first cell with max number of incoming and outgoing paths. For example for the grid
..## -> Row 1
..##
....
.... -> Row 4
I would block (3,2) because that would block most of the paths and the number of paths remaining would be just one. But I am not entirely confident this is the correct approach. Any insights?
The part of counting the number of paths from top left to bottom right is fairly easy and can be solved easily using dynamic programming
This algorithm is an excellent starting point. Consider an implementation that uses an array pathsFromStart[N][N] to store the number of paths from the starting point to the point at (row, col). Run the algorithm again, but now start at the end. This gives you a second 2D array, pathsFromFinish[N][N].
With these two arrays in hand you are ready to find the answer to your original problem: if a point at (row, col) has X paths leading to it from the start, and Y paths leading to it from the finish, then the total number of paths that you would cut by removing that point is X*Y. Go through all points on the grid excluding the start, the finish, and the points that are already blocked, and block the point with
MAX(pathsFromStart[row][col]*pathsFromFinish[row][col])
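A small Python sketch of this, assuming the grid is given as a list of strings with '#' for blocked cells (names are mine):

def count_paths(grid):
    # dp[r][c] = number of monotone (down/right) paths from the top-left
    # corner to (r, c), walking around obstacles.
    n, m = len(grid), len(grid[0])
    dp = [[0] * m for _ in range(n)]
    for r in range(n):
        for c in range(m):
            if grid[r][c] == '#':
                continue
            if r == 0 and c == 0:
                dp[r][c] = 1
            else:
                if r > 0:
                    dp[r][c] += dp[r - 1][c]
                if c > 0:
                    dp[r][c] += dp[r][c - 1]
    return dp

def best_cell_to_block(grid):
    n, m = len(grid), len(grid[0])
    from_start = count_paths(grid)
    # Paths from the finish are paths from the start on the reversed grid.
    rev = count_paths([row[::-1] for row in grid[::-1]])
    from_finish = [[rev[n - 1 - r][m - 1 - c] for c in range(m)] for r in range(n)]
    best, best_cell = -1, None
    for r in range(n):
        for c in range(m):
            if (r, c) in ((0, 0), (n - 1, m - 1)) or grid[r][c] == '#':
                continue
            cut = from_start[r][c] * from_finish[r][c]   # paths through (r, c)
            if cut > best:
                best, best_cell = cut, (r, c)
    return best_cell

On the example grid, best_cell_to_block(["..##", "..##", "....", "...."]) returns (2, 1), i.e. cell (3,2) in the question's 1-based numbering.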

Nearest neighbor search in 2D using a grid partitioning

I have a fairly large set of 2D points (~20000), and for each point in the x-y plane I want to determine which point from the set is closest. (Actually, the points are of different types, and I just want to know which type is closest. And the x-y plane is a bitmap, say 640x480.)
From this answer to the question "All k nearest neighbors in 2D, C++" I got the idea to make a grid. I created n*m C++ vectors and put each point into the vector of the bin it falls into. The idea is that you only have to check the distances to the points in the bin, instead of to all points. If there is no point in the bin, you continue with the adjacent bins in a spiralling manner.
Unfortunately, I only read Oli Charlesworth's comment afterwards:
Not just adjacent, unfortunately (consider that points in the cell two to the east may be closer than points in the cell directly north-east, for instance; this problem gets much worse in higher dimensions). Also, what if the neighbouring cells happen to have less than 10 points in them? In practice, you will need to "spiral out".
Fortunately, I already had the spiraling code figured out (a nice C++ version here, and there are other versions in the same question). But I'm still left with the problem:
If I find a hit in a cell, there could be a closer hit in an adjacent cell (yellow is my probe, red is the wrong choice, green the actual closest point):
If I find a hit in an adjacent cell, there could be a hit in a cell 2 steps away, as Oli Charlesworth remarked:
But even worse, if I find a hit in a cell two steps away, there could still be a closer hit in a cell three steps away! That means I'd have to consider all cells with dx,dy = -3...3, or 49 cells!
Now, in practice this won't happen often, because I can choose my bin size so the cells are filled enough. Still, I'd like to have a correct result, without iterating over all points.
So how do I find out when to stop "spiralling" or searching? I heard there is an approach with multiple overlapping grids, but I didn't quite understand it. Is it possible to salvage this grid technique?
Since the dimensions of your bitmap are not large and you want to calculate the closest point for every (x,y), you can use dynamic programming.
Let V[i][j] be the distance from (i,j) to the closest point in the set, but considering only the points in the set that are in the "rectangle" [(1, 1), (i, j)].
Then V[i][j] = 0 if there is a point in (i, j), or V[i][j] = min(V[i'][j'] + dist((i, j), (i', j'))) where (i', j') is one of the three neighbours of (i,j):
i.e.
(i - 1, j)
(i, j - 1)
(i - 1, j - 1)
This gives you the minimum distance, but only for the "upper left" rectangle. We do the same for the "upper right", "lower left", and "lower right" orientations, and then take the minimum.
The complexity is O(size of the plane), which is optimal.
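A Python sketch of this propagation is below, written as four sequential sweeps over a shared array (equivalent to computing the four orientations and taking the minimum). One caveat worth flagging: because values only propagate through axis and diagonal neighbours, the result is a chamfer-style approximation that can slightly overestimate the true Euclidean distance; for picking the nearest type on a bitmap this is often good enough.

import math

def approx_distance_field(width, height, points):
    # dist[y][x] ~ distance from pixel (x, y) to the closest point in `points`
    # (a list of integer (x, y) pixel coordinates inside the bitmap).
    INF = float("inf")
    dist = [[INF] * width for _ in range(height)]
    for (x, y) in points:
        dist[y][x] = 0.0

    def sweep(xs, ys, dx, dy):
        # Propagate from the horizontal, vertical and diagonal neighbours
        # that lie "behind" the scan direction, as in the recurrence above.
        for y in ys:
            for x in xs:
                best = dist[y][x]
                if 0 <= x - dx < width:
                    best = min(best, dist[y][x - dx] + 1.0)
                if 0 <= y - dy < height:
                    best = min(best, dist[y - dy][x] + 1.0)
                if 0 <= x - dx < width and 0 <= y - dy < height:
                    best = min(best, dist[y - dy][x - dx] + math.sqrt(2))
                dist[y][x] = best

    sweep(range(width), range(height), 1, 1)                              # upper-left
    sweep(range(width - 1, -1, -1), range(height), -1, 1)                 # upper-right
    sweep(range(width), range(height - 1, -1, -1), 1, -1)                 # lower-left
    sweep(range(width - 1, -1, -1), range(height - 1, -1, -1), -1, -1)    # lower-right
    return dist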
For your task, usually a point quadtree is used, especially when the points are not evenly distributed.
To save main memory you can also use a PM or PMR quadtree, which uses buckets.
You search in your cell and, in the worst case, in all quad cells surrounding that cell.
You can also use a k-d tree.
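If a ready-made structure is acceptable, an exact answer for all 640x480 pixels is also easy to get from SciPy's cKDTree; a minimal sketch (the point data and type labels here are placeholders):

import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(20000, 2) * [640, 480]        # placeholder point set
kinds = np.random.randint(0, 5, size=len(points))     # placeholder point types

tree = cKDTree(points)

ys, xs = np.mgrid[0:480, 0:640]                        # every pixel of the bitmap
queries = np.column_stack([xs.ravel(), ys.ravel()])
dist, idx = tree.query(queries)                        # nearest set point per pixel
nearest_kind = kinds[idx].reshape(480, 640)            # type of that nearest point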
A solution I'm trying
First make a grid such that you have an average of, say, 1 point per box (more if you want a larger scan).
Select the box containing the probe point. Continue selecting neighbouring boxes in concentric layers until you find at least one point. At this point you may have 1, 9, 25, and so on boxes selected.
Select one more layer of adjacent boxes
Now you have a fairly small list of points, usually not more than 10 which you can punch into the distance formula to find the nearest neighbor.
Since you have on average 1 point per box, you will mostly be selecting 9 boxes and comparing 9 distances. You can adjust the grid size according to your dataset's properties to achieve better results.
Also, if your data has a lot of variance, you can try 2 levels of grids (or even more), so that if a selection returns more than 50 points in a single query, you start the next search with a grid 1/10th the size ...
One solution would be to construct multiple partitionings with different grid sizes.
Assume you create partitions at levels 1,2,4,8,..
Now, search for a point in grid size 1 (you are basically searching in 9 squares). If there is a point in the search area and if distance to that point is less than 1, stop. Otherwise move on to the next grid size.
The amount of grid data you need to construct is only about twice that of creating just one level of partitioning.
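A rough Python sketch of this multi-level idea (the names and the power-of-two level sizes are my own choice). The early stop is safe because any point within one cell size of the query must lie in the 3x3 neighbourhood that was just examined; for the final fallback to be exact, the coarsest level should span the whole plane:

import math
from collections import defaultdict

def build_levels(points, num_levels):
    # Level k partitions the plane into square cells of side 2**k.
    levels = []
    for k in range(num_levels):
        size = 2 ** k
        cells = defaultdict(list)
        for p in points:
            cells[(int(p[0] // size), int(p[1] // size))].append(p)
        levels.append((size, cells))
    return levels

def nearest(levels, q):
    # Walk the levels from fine to coarse; at each level scan the 3x3 cell
    # neighbourhood of q and stop as soon as the best distance found is
    # within one cell size.
    best, best_d = None, float("inf")
    for size, cells in levels:
        cx, cy = int(q[0] // size), int(q[1] // size)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for p in cells.get((cx + dx, cy + dy), []):
                    d = math.hypot(p[0] - q[0], p[1] - q[1])
                    if d < best_d:
                        best, best_d = p, d
        if best_d <= size:
            return best
    return best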

Scanline fill algorithm using odd/parity rule

The question is this:
Number the rows and columns in the following figure (outside the figure). Use these row column numbers to show how the scanline stack region filling algorithm would fill in this figure, starting at the pixel indicated. Show the contents of the stack at each phase of the algorithm and show the location in the figure of the pixels on the stack.
Since row 0 is already filled, moving on to:
row 1: it's fairly simple, the parity turns odd at (0,1) and it fills until the parity turns even again at (12,1).
row 2: (0,2) triggers odd parity, so it fills the next 2 pixels. At (3,2) I'm confused between the rule "vertices on a horizontal line do not count" vs. "count a vertex if it is the Ymin of its edge". How do I proceed at this part? And how should the rest of the pixels be treated? All the examples I could find concerning those two rules involve polygons with pointy vertices, not like the one I uploaded.

Way to determine if a group of touching nodes assigned values in a graph can add up to a specific value?

I have abstracted my actual problem into a more general graph problem for the sake of asking.
I have an undirected graph where each node is assigned an integer. How can I determine whether or not there exists in the graph a group of touching nodes that add to a specific value?
Edit: More specifics: I have a grid of tiles that are labeled with either a positive or negative integer. A single "move" would be to combine any one tile with an adjacent tile (not diagonal), leaving a gap in the grid behind. If the tile being combined into has a value of zero now, it also is removed. I basically need to do a test after every time a tile that has a value of 0 is removed in order to see if any more possible "moves" that result in values of 0 exist in the grid, even if it takes multiple combinations to get to them.
The grid is only 6x6.
I suppose since I only need to test if at least one "zeroing" still exists in the grid, most of the time it wouldn't be hitting the worst case. It can stop searching as soon as it finds one.
A straightforward approach is to consider each group of touching nodes and solve the subset sum problem on them.
Whether this is affordable obviously depends on what sort of graph you have and what the exact definition of touching nodes is. Is it just a particular node and its neighbours? In that case, your diagram makes this look tractable, as no node has more than four neighbours. If the size of a group grows with the size of the graph, you are probably in trouble.
Edit - add 6x6 answer
On a 6x6 grid a modification of one of the subset sum algorithms looks expensive but not impossible.
With 18 nodes in the top half of the grid, there are 2^18 patterns of nodes, including the null pattern, which you can ignore. Some of these can't form contiguous patterns even if, e.g., they are matched with a bottom half made up of all 18 nodes, so you can generate and store only the possible ones, e.g. by backtrack search. An 18-bit bitmap and a sum don't take up much space per pattern. You can finish early if any pattern sums to zero. If not, discard all the patterns with no nodes on the lowest line, and keep the rest.
Repeat with the bottom half, swapping lowest for highest, and then, for each pattern in the top half, look at all the bottom-half patterns whose sum is the negative of the top-half sum. If the lowest top-half line meets up with the highest bottom-half line and the result is contiguous, you have a solution. If you didn't get any early zero sums, and none of the a + (-a) pattern pairs match up, then there is no solution.

Algorithm for labeling edges of a triangular mesh

Introduction
As part of a larger program (related to rendering of volumetric graphics), I have a small but tricky subproblem where an arbitrary (but finite) triangular 2D mesh needs to be labeled in a specific way. Already a while ago I wrote a solution (see below) which was good enough for the test meshes I had at the time, even though I realized that the approach will probably not work very well for every possible mesh that one could think of. Now I have finally encountered a mesh for which the present solution does not perform that well at all -- and it looks like I should come up with a totally different kind of an approach. Unfortunately, it seems that I am not really able to reset my lines of thinking, which is why I thought I'd ask here.
The problem
Consider the picture below. (The colors are not part of the problem; I just added them to improve (?) the visualization. Also the varying edge width is a totally irrelevant artifact.)
For every triangle (e.g., the orange ABC and the green ABD), each of the three edges needs to be given one of two labels, say "0" or "1". There are just two requirements:
Not all the edges of a triangle can have the same label. In other words, for every triangle there must be two "0"s and one "1", or two "1"s and one "0".
If an edge is shared by two triangles, it must have the same label for both. In other words, if the edge AB in the picture is labeled "0" for the triangle ABC, it must be labeled "0" for ABD, too.
The mesh is a genuine 2D one, and it is finite: i.e., it does not wrap, and it has a well-defined outer border. Obviously, on the border it is quite easy to satisfy the requirements -- but it gets more difficult inside.
Intuitively, it looks like at least one solution should always exist, even though I cannot prove it. (Usually there are several -- any one of them is enough.)
Current solution
My current solution is a really brute-force one (provided here just for completeness -- feel free to skip this section):
Maintain four sets of triangles -- one for each possible count (0..3) of edges remaining to be labeled. In the beginning, every triangle is in the set where three edges remain to be labeled.
For as long as there are triangles with non-labeled edges: find the smallest non-zero number of unallocated edges for which there are still triangles left. In other words, at any given time we try to minimize the number of triangles for which the labeling has been partially completed. The number of edges remaining will be anything between 1 and 3. Then just pick one such triangle with this specific number of edges remaining to be allocated, and for this triangle do the following:
See if the labeling of any remaining edge is already imposed by the labeling of some other triangle. If so, assign the labels as implied by requirement #2 above.
If this results in a dead end (i.e., requirement #1 can no longer be satisfied for the present triangle), then start the whole process over from the very beginning.
Allocate any remaining edges as follows:
If no edges have been labeled so far, assign the first one randomly.
When one edge already allocated, assign the second one so that it will have the opposite label.
When two edges allocated: if they have the same label, assign the third one to have the opposite label (obviously); if the two have different labels, assign the third one randomly.
Update the sets of triangles for the different counts of unallocated edges.
If we ever get here, then we have a solution -- hooray!
Usually this approach finds a solution within just a couple of iterations, but recently I encountered a mesh for which the algorithm tends to terminate only after one or two thousand retries... which obviously suggests that there may be meshes for which it never terminates.
Now, I would love to have a deterministic algorithm that is guaranteed to always find a solution. Computational complexity is not that big an issue, because the meshes are not very large and the labeling basically only has to be done when a new mesh is loaded, which does not happen all the time -- so an algorithm with (for example) exponential complexity ought to be fine, as long as it works. (But of course: the more efficient, the better.)
Thank you for reading this far. Now, any help would be greatly appreciated!
Edit: Results based on suggested solutions
Unfortunately, I cannot get the approach suggested by Dialecticus to work. Maybe I did not get it right... Anyway, consider the following mesh, with the start point indicated by a green dot:
Let's zoom in a little bit...
Now, let's start the algorithm. After the first step, the labeling looks like this (red = "starred paths", blue = "ringed paths"):
So far so good. After the second step:
And the third:
... fourth:
But now we have a problem! Let's do one more round - but please pay attention to the triangle plotted in magenta:
According to my current implementation, all the edges of the magenta triangle are on a ring path, so they should be blue -- which effectively makes this a counterexample. Now maybe I got it wrong somehow... But in any case the two edges that are nearest to the start node obviously cannot be red; and if the third one is labeled red, then it seems that the solution does not really fit the idea anymore.
Btw, here is the data used. Each row represents one edge, and the columns are to be interpreted as follows:
Index of first node
Index of second node
x coordinate of first node
y coordinate of first node
x coordinate of second node
y coordinate of second node
The start node is the one having index 1.
I guess that next I should try the method suggested by Rafał Dowgird... But perhaps I ought to do something completely different for a while :)
If you order the triangles so that for every triangle at most 2 of its neighbors precede it in the order, then you are set: just color them in this order. The condition guarantees that for each triangle being colored you will always have at least one uncolored edge whose color you can choose so that the condition is satisfied.
Such an order exists and can be constructed the following way:
Sort all of the vertices left-to-right, breaking ties by top-to-bottom order.
Sort the triangles by their last vertex in this order.
When several triangles share the same last vertex, break ties by sorting them clockwise.
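For the greedy colouring step this ordering enables, here is a minimal Python sketch (triangles given as vertex triples in the required order; edges are vertex pairs; names are mine):

from itertools import combinations

def label_edges(ordered_triangles):
    # Assign 0/1 to edges, visiting triangles in an order where each one has
    # at most two already-labelled edges. Returns {frozenset({u, v}): 0 or 1}
    # with no triangle getting three equal labels.
    labels = {}
    for tri in ordered_triangles:
        edges = [frozenset(e) for e in combinations(tri, 2)]
        fixed = [e for e in edges if e in labels]
        free = [e for e in edges if e not in labels]
        if not free:                                  # all three already labelled
            assert len({labels[e] for e in edges}) == 2, "ordering violated"
            continue
        if len(fixed) == 2 and labels[fixed[0]] == labels[fixed[1]]:
            first = 1 - labels[fixed[0]]              # the third edge must differ
        else:
            first = 0                                 # any choice still allows a mix
        for e, value in zip(free, [first] + [1 - first] * (len(free) - 1)):
            labels[e] = value
    return labels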
Given any node in the mesh, the mesh can be viewed as a set of concentric rings around this node (like a spider's web). Give all edges that are not in a ring (starred paths) a value of 0, and give all edges that are in a ring (ringed paths) a value of 1. I can't prove it, but I'm certain you will get the correct labeling. Every triangle will have exactly one edge that is part of some ring.
