Let's say I have a list of points (x,y) that correspond to the black dots in the image below, which form a rectangular grid. Here, there are four curved "rows" and eight "columns".
How would I group each row of points together? In other words, in the image below, how do I group together the first row of points circled in blue (let's call this Group 1), the second row of points circled in blue (let's call this Group 2), and so on?
My initial intuition says to start with the top-left point, and search for the closest point using a distance metric that would penalize the y-distance between two points. However, the problem I run into is that when I reach the last point in the first row, how do I know that the row is "complete", and I shouldn't add the right-most point of the 2nd row to my group of points?
Is there a better approach to this type of problem?
That depends heavily on how the points are distributed.
For this special case a simple solution would be:
sort points by x
split point list into groups of 4 consecutive points (that's your columns)
sort columns by y
pick the first element of each column and put into row 1
pick the second element of each column and put it into row 2
...
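The steps above can be sketched in Python as follows. This is only an illustration under the same assumptions as the special case: every column holds the same number of points, and the columns do not overlap in x. The function name `group_rows` and the parameter `rows_per_column` are made up for this example.

```python
def group_rows(points, rows_per_column):
    """Group the points of a curved grid into rows.

    Assumes every column holds exactly rows_per_column points and that
    columns do not overlap in x (the special case described above).
    """
    by_x = sorted(points)  # sort by x (ties broken by y)
    if len(by_x) % rows_per_column:
        raise ValueError("point count is not a multiple of rows_per_column")
    # Each run of rows_per_column consecutive points is one column,
    # which is then sorted by y.
    columns = [
        sorted(by_x[i:i + rows_per_column], key=lambda p: p[1])
        for i in range(0, len(by_x), rows_per_column)
    ]
    # Row k is the k-th point (by y) of every column.
    return [[col[k] for col in columns] for k in range(rows_per_column)]
```

For the grid in the question this would be called as `group_rows(points, 4)`. Whether row 1 is the top or the bottom row depends on whether y grows downwards (image coordinates) or upwards.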
Related
Suppose we have a grid and we want to paint rectangular regions on it using the smallest number of colors possible, one for each region.
There are some cells that are already painted black and cannot be painted over:
Is there a polynomial algorithm to solve this problem?
After testing, I found out that the solution for this case is 9 (because we need 9 different colors to paint the minimum number of regions to fill the whole grid):
The greedy approach seems to work well: just search for the rectangle with biggest (white) area and paint it, repeating this until there's nothing else to be painted, but I didn't measure the complexity or the correctness.
Here are a few observations that can simplify this problem in specific cases. First of all, adjacent identical rows and columns can be reduced to one row or column without changing the required number of regions, to form a simplified grid:
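As a rough sketch of this first simplification, the function below merges runs of identical adjacent rows and then identical adjacent columns. It assumes the grid is a list of lists in which 1 marks a black cell and 0 an uncoloured cell; that encoding and the name `collapse` are assumptions made for the example.

```python
def collapse(grid):
    """Merge runs of identical adjacent rows, then identical adjacent columns."""
    def dedupe(lines):
        out = []
        for line in lines:
            # Keep a line only if it differs from the previously kept one.
            if not out or out[-1] != line:
                out.append(line)
        return out

    rows = dedupe([tuple(r) for r in grid])
    cols = dedupe(list(zip(*rows)))        # transpose, then dedupe columns
    return [list(r) for r in zip(*cols)]   # transpose back
```

Only the global version of the rule is covered here; the local collapsing of isolated parts of the grid would need extra bookkeeping.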
A simplified grid where no row or column is divided into more than two uncoloured parts (i.e. has two or more separate black cells), has an optimal solution which can be found by using the rows or columns as regions (depending on whether the width or height of the grid is greater):
The number of regions is then minimum(width, height) + number of black cells.
If a border row or column in a simplified grid contains no black cells, then using it as a region is always the optimal solution; adding some parts of it to other regions would require at least one additional region to be made in the border row or column (depending on the number of black cells in the adjacent row or column):
This means that the grid can be further simplified by removing border rows and columns with no black cells, and adding the number of removed regions to the region count:
Similarly, if one or more border cells are isolated by a black cell in the adjacent row or column, all the connected uncoloured neighbouring cells can be regarded as one region:
At each point you can go back to previous rules; e.g. after the right- and left-most columns have been turned into regions in the example above, we are left with the grid below, which can be simplified with the first rule, because the bottom two rows are identical:
Collapsing identical adjacent rows or columns can also be applied locally to isolated parts of the grid. The example below has no identical adjacent rows, but the center part is isolated, so there rows 3 to 6 can be collapsed:
And on the left row 3 and 4 can be collapsed locally, and on the right rows 5 and 6, so we end up with the situation in the third image above. These collapsed cells then act as one.
Once you can't find any further simplifications using the rules above, and you want to check every possible division of (part of) a grid, a first step could be to list the maximum rectangle sizes that can be made with the corresponding cell as their top left corner; for the simplified 6x7 grid in the first example above that would be:
COL.1 COL.2 COL.3 COL.4 COL.5 COL.6
ROW 1 [6x1, 3x3, 1x7] [5x1, 2x3] [4x1, 1x7] [3x1] [2x5] [1x7]
ROW 2 [3x2, 1x6] [2x2] [1x6] [] [2x4] [1x6]
ROW 3 [6x1, 1x5] [5x1] [4x3, 2x5] [3x3, 1x5] [2x3] [1x5]
ROW 4 [1x4] [] [4x2, 2x4] [3x2, 1x4] [2x2] [1x4]
ROW 5 [6x1, 4x3] [5x1, 3x3] [4x1, 2x3] [3x1, 1x3] [2x1] [1x3]
ROW 6 [4x2] [3x2] [2x2] [1x2] [] [1x2]
ROW 7 [6x1] [5x1] [4x1] [3x1] [2x1] [1x1]
You can then use these maximum sizes to generate every option for each cell; e.g. for cell (1,1) they would be:
6x1, 5x1, 4x1, 3x3, 3x2, 3x1, 2x3, 2x2, 2x1, 1x7, 1x6, 1x5, 1x4, 1x3, 1x2, 1x1
(Some rectangle sizes in the list can be skipped; e.g. it never makes sense to use the 3x1-sized region without adding the fourth isolated cell to get 4x1.)
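Generating the options from the maximum sizes could look like the sketch below. The sizes are treated as (width, height) pairs, and the helper name `rectangle_options` is made up for this example. It does not apply the skipping rule from the previous paragraph.

```python
def rectangle_options(max_sizes):
    """All (width, height) pairs that fit inside at least one maximum size."""
    options = set()
    for max_w, max_h in max_sizes:
        for w in range(1, max_w + 1):
            for h in range(1, max_h + 1):
                options.add((w, h))
    # Widest first, then tallest, matching the ordering used in the text.
    return sorted(options, reverse=True)


# Cell (1,1) of the example has the maximum sizes 6x1, 3x3 and 1x7.
options = rectangle_options([(6, 1), (3, 3), (1, 7)])
print(len(options))  # 16
```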
After choosing an option, you would skip the cells which are covered by the rectangle you've chosen and try each option for the next cell, and so on...
Running this on large grids will lead to huge numbers of options. However, at each point you can go back to checking whether the simplification rules can help.
To see that a greedy algorithm, which selects the largest rectangles first, cannot guarantee an optimal solution, consider the example below. Selecting the 2x2 square in the middle would lead to a solution with 5 regions, while several solutions with only 4 regions exist.
I created a 10 x 10 matrix which was originally filled up with empty spaces. Then I placed 25 characters 'a' in random cells on the grid. Now I need to move each one of them to one of the two to four adjacent cells (consider the 'a' next to the edge or in the corner). If the chosen destination was occupied by another 'a', it should stay where it is.
Here is my approach to the problem (the code itself looks lengthy now)
break the 10✕10 grid into 9 possible situations (in the middle 8✕8, four corners and four edges of the grid)
for each of the cases above, I check each one of the two to four adjacent cells. If it contains an empty space, I swap 'a' with it.
However, this results in a huge ugly piece of code. So, I wanted to ask: (1) is there a way that I can do the boundary check without breaking the grid into 9 different situations? (2) How can I choose the destination for each of the 25 'a' RANDOMLY?
Thank you so much in advance!!!
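One possible way to avoid the case split, sketched in Python: generate all four candidate neighbours and keep only those whose indices fall inside the grid, then pick one of the remaining candidates at random. The function name `step` and the `rng` parameter are made up for this example, and it assumes the grid is a list of lists holding 'a' or ' ' with at least two cells.

```python
import random


def step(grid, rng=random):
    """Try to move every 'a' once to a random in-bounds orthogonal neighbour."""
    n_rows, n_cols = len(grid), len(grid[0])
    # Snapshot the positions first so that each 'a' is handled exactly once.
    cells = [(r, c) for r in range(n_rows) for c in range(n_cols)
             if grid[r][c] == 'a']
    for r, c in cells:
        # Keep only neighbours inside the grid: corners get 2, edges 3, others 4.
        neighbours = [(r + dr, c + dc)
                      for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1))
                      if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols]
        nr, nc = rng.choice(neighbours)
        if grid[nr][nc] == ' ':  # an occupied destination means the 'a' stays
            grid[r][c], grid[nr][nc] = ' ', 'a'
    return grid
```

The bounds test in the list comprehension covers all nine situations at once, and `rng.choice` answers the second question. Passing a seeded `random.Random` instance makes a run reproducible.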
I know the title seems kind of ambiguous, and for this reason I've attached an image which will be helpful to understand the problem clearly. I need to find holes inside the white region. A hole is defined as one or many cells with value '0' inside the white region, i.e. it has to be fully enclosed by cells with value '1' (e.g. here we can see three holes marked as 1, 2 and 3). I've come up with a pretty naive solution:
1. Search the whole matrix for cells with value '0'
2. Run a DFS(Flood-Fill) when such a cell (black one) is encountered and check whether we can touch the boundary of the main rectangular region
3. If we can touch boundary during DFS then it's not a hole and if we can't reach boundary then it'll be considered as a hole
Now, this solution works but I was wondering if there's any other efficient/fast solution for this problem.
Please let me know your thoughts. Thanks.
With floodfill, which you already have: run along the BORDER of your matrix and floodfill it, i.e.,
change all zeroes (black) to 2 (filled black) and ones to 3 (filled white); ignore 2 and 3's that come from an earlier floodfill.
For example with your matrix, you start from the upper left, and floodfill black a zone with area 11. Then you move right, and find a black cell that you just filled. Move right again and find a white area, very large (actually all the white in your matrix). Floodfill it. Then you move right again, another fresh black area that runs along the whole upper and right borders. Moving around, you now find two white cells that you filled earlier and skip them. And finally you find the black area along the bottom border.
Counting the number of colours you found and set might already supply the information on whether there are holes in the matrix.
Otherwise, or to find where they are, scan the matrix: all areas you find that are still of color 0 are holes in the black. You might also have holes in the white.
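A minimal sketch of this border flood fill in Python, assuming 4-connectivity (cells touching only at a corner are not connected) and a matrix given as a list of lists of 0 and 1. The function name `find_hole_cells` is made up for this example, and the fill uses a queue instead of recursion to avoid deep call stacks.

```python
from collections import deque


def find_hole_cells(matrix):
    """Return the set of (row, col) zero cells not reachable from the border."""
    h, w = len(matrix), len(matrix[0])
    grid = [row[:] for row in matrix]  # work on a copy
    border = [(r, c) for r in range(h) for c in range(w)
              if r in (0, h - 1) or c in (0, w - 1)]
    for r0, c0 in border:
        if grid[r0][c0] not in (0, 1):
            continue  # 2 or 3: already filled by an earlier flood fill
        old = grid[r0][c0]
        new = old + 2  # 0 -> 2 (filled black), 1 -> 3 (filled white)
        grid[r0][c0] = new
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and grid[nr][nc] == old:
                    grid[nr][nc] = new
                    queue.append((nr, nc))
    # Everything still 0 was never reached from the border: those are holes.
    return {(r, c) for r in range(h) for c in range(w) if grid[r][c] == 0}
```

Cells that are still 1 after the scan would correspond to the holes in the white mentioned above.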
Another method, sort of "arrested flood fill"
Run all around the border of the first matrix. Where you find "0", you set
to "2". Where you find "1", you set to "3".
Now run around the new inner border (those cells that touch the border you have just scanned).
Zero cells touching 2's become 2, 1 cells touching 3 become 3.
You will have to scan twice, once clockwise, once counterclockwise, checking the cells "outwards" and "before" the current cell. That is because you might find something like this:
22222222222333333
2AB11111111C
31
Cell A is actually 1. You examine its neighbours and you find 1 (but it's useless to check that since you haven't processed it yet, so you can't know if it's a 1 or should be a 3 - which is the case, by the way), 2 and 2. A 2 can't change a 1, so cell A remains 1. The same goes with cell B which is again a 1, and so on. When you arrive at cell C, you discover that it is a 1, and has a 3 neighbour, so it toggles to 3... but all the cells from A to C should now toggle.
The simplest, albeit not most efficient, way to deal with this is to scan the cells clockwise, which gives you the wrong answer (C and D are 1's, by the way)
22222222222333333
211111111DC333333
33
and then scan them again counterclockwise. Now when you arrive at cell C, it has a 3-neighbour and toggles to 3. Next you inspect cell D, whose previous-neighbour is C, which is now 3, so D toggles to 3 as well. In the end you get the correct answer
22222222222333333
23333333333333333
33
and for each cell you examined two neighbours going clockwise, one going counterclockwise. Moreover, one of the neighbours is actually the cell you checked just before, so you can keep it in a ready variable and save one matrix access.
If you find that you scanned a whole border without even once toggling a single cell, you can halt the procedure. Checking this will cost you 2(W*H) operations, so it is only really worthwhile if there are lots of holes.
In at most W*H*2 steps, you should be done.
You might also want to check the Percolation Algorithm and try to adapt that one.
Make some sort of a "LinkedCells" class that will store cells that are linked with each other. Then check cells one by one in left-to-right, top-to-bottom order, making the following check for each cell: if its neighbouring cell is black, add this cell to that cell's group. Otherwise, create a new group for this cell. You only need to check the top and left neighbours.
UPD: Sorry, I forgot about merging groups: if both neighbouring cells are black and are from different groups, you should merge the groups into one.
Your "LinkedCells" class should have a flag if it is connected to the edge. It is false by default and can be changed to true if you add edge cell to this group. In case of merging two groups you should set new flag as a || of previous flags.
In the end you will have a set of groups and each group having false connection flag will be "hole".
This algorithm will be O(x*y).
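One way to sketch this scan is with a union-find structure standing in for the "LinkedCells" class. This is an interpretation, not the answer's own code: black cells are assumed to be the ones with value 0, connectivity is 4-way, and the name `count_holes` is made up. With union-find the running time is near-linear in the number of cells rather than strictly O(x*y).

```python
def count_holes(matrix):
    """Count groups of 0-cells (4-connected) that never touch the matrix edge."""
    h, w = len(matrix), len(matrix[0])
    parent, on_edge = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            on_edge[ra] = on_edge[ra] or on_edge[rb]  # OR of the two flags

    for r in range(h):
        for c in range(w):
            if matrix[r][c] != 0:
                continue
            cell = (r, c)
            parent[cell] = cell  # start as a new group
            on_edge[cell] = r in (0, h - 1) or c in (0, w - 1)
            # Only the top and left neighbours have been visited so far.
            if r > 0 and matrix[r - 1][c] == 0:
                union((r - 1, c), cell)
            if c > 0 and matrix[r][c - 1] == 0:
                union((r, c - 1), cell)

    roots = {find(cell) for cell in parent}
    return sum(1 for root in roots if not on_edge[root])
```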
You can represent the grid as a graph with individual cells as vertexes and edges occurring between adjacent vertexes. Then you can use Breadth First Search or Depth First Search to start at each of the cells, on the sides. As you will only find the components connected to the sides, the black cells which have not been visited are the holes. You can use the search algorithm again to divide the holes into distinct components.
EDIT: The worst-case complexity must be at least linear in the number of cells. Otherwise, give some input to the algorithm, check which cells the algorithm hasn't looked into (a sublinear algorithm must leave big unvisited spots) and put a hole in there. Now you've got an input for which the algorithm doesn't find one of the holes.
Your algorithm is globally Ok. It's just a matter of optimizing it by merging the flood fill exploration with the cell scanning. This will just minimize tests.
The general idea is to perform the flood fill exploration line by line while scanning the table. So you'll have multiple parallel flood fill that you have to keep track of.
The table is then processed row by row from top to bottom, and each row processed from right to left. The order is arbitrary, could be reverse if you prefer.
Let segments identify a sequence of consecutive cells with value 0 in a row. You only need the index of the first and last cell with value 0 to define a segment.
As you may guess a segment is also a flood fill in progress. So we'll add an identification number to the segments to distinguish between the different flood fills.
The nice thing of this algorithm is that you only need to keep track of segments and their identification number in row i and i-1. So that when you process row i, you have the list of segments found in the row i-1 and their associated identification number.
You then have to process segment connection in row i and row i-1. I'll explain below how this can be made efficient.
For now you have to consider three cases:
found a segment in row i not connected to a segment in row i-1. Assign it a new hole identification (incremented integer). If it's connected to the border of the table, make this number negative.
found a segment in row i-1 not connected to a segment in row i. You found the lowest segment of a hole. If it has a negative identification number it is connected to the border and you can ignore it. Otherwise, congratulations, you found a hole.
found a segment in row i connected to one or more segments in row i-1. Set the identification number of all these connected segments to the smallest identification number. See the following possible use case.
row i-1: 2 333 444 111
row i : **** *** ***
The segments in row i should all get the value 1 identifying the same flood fill.
Matching segments in rows i and row i-1 can be done efficiently by keeping them in order from left to right and comparing segments indexes.
Process segments by lowest start index first. Then check if it's connected to the segment with the lowest start index of the other row. If not, process case 1 or 2. Otherwise continue identifying connected segments, keeping track of the smallest identification number. When no more connected segments are found, set the identification number of all connected segments found in row i to the smallest identification value.
The index comparison for the connectivity test can be optimized by storing (first-1,last) as the segment definition, since segments may be connected by their corners. You can then directly compare the bare index values to detect overlapping segments.
The rule to pick the smallest identification number ensures that you automatically get the negative number for connected segments and at least one connected to the border. It propagates to other segments and flood fills.
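The two building blocks of this approach, extracting the segments of a row and testing whether two segments on adjacent rows are connected, might be sketched as below. The names `zero_segments` and `connected` are made up for this example. Segments are stored here as plain (first, last) pairs and the corner allowance is applied inside the test, rather than storing (first-1, last) as suggested above.

```python
def zero_segments(row):
    """Return (first, last) index pairs of maximal runs of 0 in a row."""
    segments, start = [], None
    for i, value in enumerate(row):
        if value == 0 and start is None:
            start = i  # a new run begins
        elif value != 0 and start is not None:
            segments.append((start, i - 1))  # the run ended just before i
            start = None
    if start is not None:
        segments.append((start, len(row) - 1))  # run reaches the row's end
    return segments


def connected(seg_a, seg_b, diagonal=True):
    """True if two segments on adjacent rows touch (optionally via corners)."""
    reach = 1 if diagonal else 0
    return seg_a[0] <= seg_b[1] + reach and seg_b[0] <= seg_a[1] + reach
```

The identification numbers and the merging of connected segments are not shown. Pass `diagonal=False` if cells touching only at a corner should not count as connected.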
This is a nice exercise to program. You didn't specify the exact output you need. So this is also left as exercise.
The brute force algorithm as described here is as follows.
We now assume we can write in cells a value different from 0 or 1.
You need a flood fill function receiving the coordinates of a cell to start from and an integer value to write into all connected cells holding the value 0.
Since you only need to consider holes (cells with value 0 surrounded by cells with value 1), you have to use two passes.
The first pass visits only cells touching the border. For every cell containing the value 0, you do a flood fill with the value -1. This tells you that this cell has a value different from 1 and has a connection to the border. After this scan, all cells with a value 0 belong to one or more holes.
To distinguish between different holes, you need the second scan. You then scan the remaining cells in the rectangle (1,1)x(n-2,n-2) you didn't scan yet. Whenever your scan hit a cell with value 0, you discovered a new hole. You then flood fill this hole with the integer of your choice to distinguish it from the others. After that you proceed with the scan until all cells have been visited.
When done, you may replace the values -1 with 0 because there shouldn't be any 0 left.
This algorithm works, but is not as efficient as the other algorithm I propose. Its advantage is that it's simple and doesn't need an extra data storage to hold the segments, hole identification and eventual segment chaining reference.
I have many horizontal and vertical lines which make up rectangle such as in this example.
Is there an algorithm or code which can locate every rectangle which does not contain another rectangle? I mean, the largest rectangle in this image is not a rectangle I am looking for because it contains other rectangles inside of it.
The rectangles I am looking for must be empty. I have a list of the starting points and end points of each line like (a,b) to (c,d). I want as a result a list of rectangles (x,y,w,h) or equivalent.
Note that some lines have lines intersecting them at right angles, for example the top line of the widest rectangle in this image is a single line it has an intersecting vertical line going downwards.
These kind of questions are largely answered by some standard Computational Geometry algorithms. I can think of a vertical sweep line algorithm for this particular problem.
Assume a rectangle is represented by a pair of points (p1, p2), where p1 is the upper left corner and p2 is the bottom right corner, and that a point has two attributes (which can be accessed as p.x and p.y).
Here is the algorithm.
Sort all the pairs of points - O(n log n)
Initialize a list called sweep line status. This will hold all the rectangles encountered so far that are still alive. Also initialize another list called event queue that holds upcoming events. This event queue initially holds the starting points of all the rectangles.
Process the events, start from the first element in the event queue.
If the event is a start point, then add that rectangle to sweep line status (in sorted order by y-coordinate) (in O(log n) time) and add its bottom-right point to the event queue at the appropriate position (sorted by the points) (again in O(log n) time). When you add it to sweep line status, you just need to check if this point lies in the rectangle alive just above it in the sweep line status. If it does lie inside, this is not your rectangle, otherwise, add this to your list of required rectangles.
If the event is an end point, just remove the corresponding rectangle from the sweep line status.
Running time (for n rectangles):
Sorting takes O(n log n).
Number of events = 2*n = O(n)
Each event takes O(log n) time (for insertions in the event queue as well as the sweep line status). So the total is O(n log n).
Therefore, O(n log n).
For more details, please refer to the Bentley–Ottmann algorithm. The above is just a simple modification of it.
EDIT:
Just realized that input is in terms of line segments, but since they always form rectangles (according to question), a linear traversal for a pre-process can convert them into the rectangle (pair of points) form.
I think a different representation will help you solve your problem. As an example, consider the large rectangle (without the block on the end). There are four unique x and y coordinates, sort and index them. Pictorially it would look like this:
If there is a corner of a rectangle on the coordinate (x_i, y_j) put it in a matrix like so:
__|_1__2__3__4_
1 | x x 0 x
2 | x x 0 0
3 | 0 x x x
4 | x x x x
Now by definition, a rectangle in real space is a rectangle on the matrix coordinates. For example there is a rectangle at (3,2) (3,4) (4,4), (4,3) but it is not a "base" rectangle since it contains a sub-rectangle (3,3) (3,4), (4,4), (4,3). A recursive algorithm is easily seen here and for added speed use memoization to prevent repetitive calculations.
A sweep-line algorithm...
Structures required:
V = A set of the vertical lines, sorted by x-coordinate.
H = A set of all start and end points of the horizontal lines (and have each point keep a reference to the line) and sorted by x-coordinate.
CH = An (initially empty) sorted (by y-coordinate) set of current horizontal lines.
CR = A sorted (by y-coordinate) set of current rectangles. These rectangles will have left, top and bottom coordinates, but not yet a right coordinate. Note that there will be no overlap in this set.
Algorithm:
Simultaneously process V and H from left to right.
Whenever a start of horizontal line is encountered, add the line to CH.
Whenever an end of horizontal line is encountered, remove this from CH.
Whenever a vertical line is encountered:
Remove from CR all rectangles that overlap with the line. For all removed rectangle, if it is completely contained within the line, compare its size with the best rectangle thus far and store it if better.
Process each element in CH iteratively between the bottom point and the top point of the line as follows:
Add a rectangle to CR with the last processed point as bottom, the current point as top and the vertical line's x-coordinate as left.
Done.
Note:
When the x-coordinate of horizontal start points or end points or vertical lines are equal the following order must be maintained:
x of horizontal start < x of vertical line < x of horizontal finish
Otherwise you'll miss rectangles.
Are all your lines parallel to either x or y axis? Or, all your lines either parallel or perpendicular?
From the example you gave I am assuming all your lines are parallel to x or y axis. In such case your lines are going to be [(a,b), (a,d)] or [(a,b), (c,b)].
In any case, the first task is to find the corners, that is, the set of points where two perpendicular lines meet.
The second task is to detect rectangles. For every pair of corners you can check if they do form rectangles.
The third task is to find if a rectangle has any rectangles within itself.
For the first task, you need to separate lines into two sets: vertical and horizontal. After that sort one of the sets. Ex. Sort vertical lines according to their x axis coordinates. Then you can take all the horizontal lines and do a binary search to find all the intersecting points.
For the second task, consider every pair of corners and see if the other two corners exist. If yes, then see if there are lines to join all these four corners. If yes, you have a rectangle.
For the third task, put all the rectangles in an interval tree. After that you can check if two rectangles overlap.
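The first task could be sketched like this in Python, using the standard `bisect` module for the binary search. The tuple layouts for the segments and the name `find_corners` are assumptions made for the example. Note that it reports every point where a horizontal and a vertical line meet, including T-junctions and crossings, not only L-shaped corners.

```python
from bisect import bisect_left, bisect_right


def find_corners(horizontal, vertical):
    """Return sorted points where a horizontal and a vertical segment meet.

    horizontal: list of (x1, x2, y) with x1 <= x2
    vertical:   list of (x, y1, y2) with y1 <= y2
    """
    vertical = sorted(vertical)  # sorted by x
    xs = [v[0] for v in vertical]
    corners = set()
    for x1, x2, y in horizontal:
        # Binary search for the vertical lines whose x lies in [x1, x2].
        lo, hi = bisect_left(xs, x1), bisect_right(xs, x2)
        for x, y1, y2 in vertical[lo:hi]:
            if y1 <= y <= y2:
                corners.add((x, y))
    return sorted(corners)
```

The second and third tasks would then work on the returned points, for example by storing them in a set so that the "do the other two corners exist" check is a constant-time lookup.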
The question is this:
Number the rows and columns in the following figure (outside the figure). Use these row column numbers to show how the scanline stack region filling algorithm would fill in this figure, starting at the pixel indicated. Show the contents of the stack at each phase of the algorithm and show the location in the figure of the pixels on the stack.
Since row 0 is already filled moving to
row 1, it's fairly simple: parity turns odd at (0,1) and it fills until parity turns even again at (12,1)
row 2. (0,2) turns parity odd, so it fills the next 2 pixels. At (3,2), I'm confused between the rule "vertices on a horizontal line do not count" and the rule "count a vertex if it's the Ymin of that". How do I proceed at this point, and how should the rest of the pixels be treated? All the examples I could find concerning those two rules involve polygons with pointy vertices, not like the one I uploaded.