Connected component labeling with diagonal connections using union-find - algorithm

I'm trying to develop a modification of the connected component algorithm I found as an answer to this question: Connected Component Labelling.
Basically, I have 2d- and 3d- matrices consisting of 0s and 1s. My problem is to find connected regions of 1s, labeling each region separately. The matrix sizes can be very large (consisting of 5e4-by-5e4 elements in 2-d and 1000^3 elements in 3d). So I need something which doesn't strain the stack memory, and which is fast enough to repeat several times over the course of a simulation.
The most upvoted answer to that question, using depth-first search, gives a stack overflow error (as noted in a comment). I have been trying to use the union-find algorithm suggested by another user.
The original code (by user Dukeling) works very well for large 2-d matrices, but I want to have diagonal connections between elements. Here's my code, with the example input I am trying to use:
#include <iostream>
#include <stdio.h>
#include <stdlib.h>
const int w = 8, h = 8;
int input[w][h] = {{1,0,0,0,1,0,0,1},
{1,1,0,1,1,1,1,0},
{0,1,0,0,0,0,0,1},
{1,1,1,1,0,1,0,1},
{0,0,0,0,0,0,1,0},
{0,0,1,0,0,1,0,0},
{0,1,0,0,1,1,1,0},
{1,0,1,1,0,1,0,1}};
int component[w*h];
void doUnion(int a, int b)
{
// get the root component of a and b, and set the one's parent to the other
while (component[a] != a)
a = component[a];
while (component[b] != b)
b = component[b];
component[b] = a;
}
void unionCoords(int x, int y, int x2, int y2)
{
if (y2 < h && x2 < w && input[x][y] && input[x2][y2] && y2 > 0 && x2 > 0)
doUnion(x*h + y, x2*h + y2);
}
int main()
{
int i, j;
for (i = 0; i < w*h; i++)
component[i] = i;
for (int x = 0; x < w; x++)
for (int y = 0; y < h; y++)
{
unionCoords(x, y, x+1, y);
unionCoords(x, y, x, y+1);
unionCoords(x, y, x+1, y+1);
unionCoords(x, y, x-1, y+1);
unionCoords(x, y, x+1, y-1);
unionCoords(x, y, x-1, y-1);
}
// print the array
for (int x = 0; x < w; x++)
{
for (int y = 0; y < h; y++)
{
if (input[x][y] == 0)
{
printf("%4d ",input[x][y]);
continue;
}
int c = x*h + y;
while (component[c] != c) c = component[c];
printf("%4d ", component[c]);
}
printf("\n");
}
}
As you can see, I added 4 commands for doing diagonal connectivity between elements. Is this a valid modification of the union-find algorithm? I searched Google and stackoverflow in particular, but I can't find any example of diagonal connectivity. In addition, I want to extend this to 3 dimensions - so I would need to add 26 commands for checking. Will this way scale well? I mean the code seems to work for my case, but sometimes I randomly get an unlabeled isolated element. I don't want to integrate it with my code only to discover a bug months later.
Thanks.

There is nothing wrong with your approach using the union find algorithm. Union find runs on any graph. For each node it examines, it checks its connected nodes to determine whether they are in the same subset. Your approach appears to be doing just that, checking the 8 adjacent nodes of any observed node. The union find algorithm has nothing to do with the dimensions of your graph. You can extend that approach to 3d or any dimension, as long as your graph corresponds correctly to that dimension. If you are experiencing errors with this, you can post an example of that error, or check out code review: https://codereview.stackexchange.com/.
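One thing worth checking in the question's code is the bounds test in unionCoords: it reads input[x2][y2] before testing that x2 and y2 are non-negative, and it uses > 0 rather than >= 0, so neighbours in row 0 or column 0 are never merged. That may explain the occasional stray label. Below is a minimal sketch of 8-connected labeling with the bounds checked first. The names DisjointSet and labelComponents are made up for this illustration, and path halving and union by size are standard union-find optimizations that are not in the original code.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Union-find with path halving and union by size (illustrative helper).
struct DisjointSet {
    std::vector<int> parent, size;
    explicit DisjointSet(int n) : parent(n), size(n, 1) {
        std::iota(parent.begin(), parent.end(), 0);  // every cell starts as its own root
    }
    int find(int a) {
        while (parent[a] != a) {
            parent[a] = parent[parent[a]];  // path halving keeps trees shallow
            a = parent[a];
        }
        return a;
    }
    void unite(int a, int b) {
        a = find(a);
        b = find(b);
        if (a == b) return;
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a;
        size[a] += size[b];
    }
};

// Returns one root id per cell of a rectangular grid (-1 for cells holding 0).
std::vector<int> labelComponents(const std::vector<std::vector<int>>& grid) {
    const int rows = static_cast<int>(grid.size());
    const int cols = rows ? static_cast<int>(grid[0].size()) : 0;
    for (const auto& row : grid) assert(static_cast<int>(row.size()) == cols);
    DisjointSet ds(rows * cols);
    // Four of the eight neighbours are enough, because union is symmetric.
    const int dr[4] = {0, 1, 1, 1};
    const int dc[4] = {1, -1, 0, 1};
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) {
            if (!grid[r][c]) continue;
            for (int k = 0; k < 4; ++k) {
                const int r2 = r + dr[k], c2 = c + dc[k];
                // Check the bounds before reading the neighbour.
                if (r2 < 0 || r2 >= rows || c2 < 0 || c2 >= cols) continue;
                if (grid[r2][c2]) ds.unite(r * cols + c, r2 * cols + c2);
            }
        }
    std::vector<int> label(rows * cols, -1);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (grid[r][c]) label[r * cols + c] = ds.find(r * cols + c);
    return label;
}
```

For 3-D the same loop would run over 13 of the 26 neighbour offsets, which can be generated with three nested loops over -1, 0, 1 instead of being written out by hand.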

Related

minimum cost to reach destination through tunnels

Recently I faced a problem in an interview and was not able to answer it. Any help will be appreciated.
Given a two-dimensional grid (1 <= row <= 10^9 && 1 <= col <= 10^9) and starting and ending coordinates, we can only move to the four adjacent cells, and each move costs 1 unit. We also have N tunnels (1 <= N <= 200) whose starting and ending coordinates are given, and going through a tunnel costs k units (1 <= k <= 10^9).
Note: It is not necessary to take tunnels, but each tunnel taken costs k units of energy.
We have to find the minimum cost to reach the destination from the starting coordinate.
starting coordinate (1 <= sx, sy <= 10^9)
destination coordinate (1 <= fx, fy <= 10^9)
The problem needs to be transposed into a graph with a weight given to each edge. Once we have done that, we can use Dijkstra's algorithm to find the shortest path.
Solving the problem thus boils down to transposing it into a graph with weighted edges.
We can go from any cell to any other cell without going through a tunnel. The cost is then the Manhattan distance. When the coordinate of a cell c1 is (x1,y1) and that of another cell c2 is (x2,y2), the Manhattan distance between c1 and c2 is d=abs(x2-x1)+abs(y2-y1).
The nodes of the graph correspond to the starting cell, the final cell, and every tunnel exit cell. The number of nodes in the graph is 2 + n, where n is the number of tunnels.
There is an edge between every pair of nodes. The weight of an edge to the final node is simply the Manhattan distance. The weight of an edge to a tunnel exit node is the Manhattan distance to the tunnel entry cell plus the weight k associated with the tunnel.
This yields a graph on which we can run Dijkstra's algorithm to find the shortest path.
As chmike mentioned, the question can first be transformed into a graph. Then Dijkstra's algorithm for finding shortest paths can be used. Here is my code:
#include<bits/stdc++.h>
using namespace std;
#define int long long int
const int N = 402;
int dp[N][N];
pair<int,int> g[N];
int dist[N];
bool vis[N];
int32_t main(){
int k,a,b,c,d,n,p,q,r,s,index,nodes,val;
cin>>k>>a>>b>>c>>d>>n;
index = 2;
nodes = 2*n+1;
for(int i=1;i<=nodes;i++)
dist[i] = INT_MAX;
memset(vis,false,sizeof(vis));
memset(dp,-1,sizeof(dp));
for(int i=0;i<=nodes;i++)
dp[i][i] = 0;
g[0] = {a,b};
g[1] = {c,d};
dp[0][1] = dp[1][0] = abs(a-c)+abs(b-d);
for(int i=0;i<n;i++){
cin>>p>>q>>r>>s;
// the tunnel is only worth taking when it beats walking between its two ends
dp[index][index+1] = dp[index+1][index] = min(k, abs(p-r)+abs(q-s));
g[index] = {p,q};
g[index+1] = {r,s};
for(int j=0;j<index;j++){
val = abs(p-g[j].first)+abs(q-g[j].second);
dp[j][index] = dp[index][j] = val;
val = abs(r-g[j].first)+abs(s-g[j].second);
dp[j][index+1] = dp[index+1][j] = val;
}
index += 2;
}
for(int i=0;i<=nodes;i++){
int v = -1;
for(int j=0;j<=nodes;j++){
if(!vis[j] && (v == -1 || dist[j] < dist[v]))
v = j;
}
if(dist[v] == INT_MAX)
break;
vis[v] = true;
for(int j=0;j<=nodes;j++)
dist[j] = min(dist[j], dist[v]+dp[v][j]);
}
cout<<dist[1];
return 0;
}
You can use dynamic programming:
#include <bits/stdc++.h>
using namespace std;
#define type long long
int main()
{ //k is the cost of travelling through a tunnel
//sx and sy are starting coordinates
//fx and fy are ending coordinates
//n are number of tunnels
int k, sx, sy, fx ,fy,n;
cin>>k>>sx>>sy>>fx>>fy>>n;
vector<vector<int>> arr(n, vector<int>(4,0));
map<pair<int, int> , pair<int,int>> mp;
//taking input of tunnel elements and storing it in a map
for(int i=0; i<n; i++)
{
for(int j=0; j<4; j++)
cin>>arr[i][j];
pair<int,int> a,b;
a= pair<int,int> (arr[i][0], arr[i][1]);
b= pair<int, int> (arr[i][2], arr[i][3]);
mp[a] = b;
mp[b] =a;
}//cin the elements
//function
vector<vector<type>> dp (fx+1, vector<type>(fy+1,LONG_LONG_MAX));
dp[fx][fy] =0; //end
for(int i= fx; i>=0; i--)
{
for(int j= fy; j>=0; j--)
{
//go down
if(j+1<= fy && dp[i][j+1]!= LONG_LONG_MAX)
{
dp[i][j] = min(dp[i][j] , dp[i][j+1]+1 );
}
//go right
if(i+1<= fx && dp[i+1][j]!= LONG_LONG_MAX)
{
dp[i][j] = min(dp[i][j] , dp[i+1][j]+1 );
}
//tunnel
if(mp.find(pair<int, int> (i,j))!= mp.end())
{
pair<int, int> temp= mp[pair<int, int> (i,j)];
int a= temp.first, b= temp.second;
if(a<=fx && b<=fy && dp[a][b]!= LONG_LONG_MAX)
{
dp[i][j] = min(dp[i][j] , dp[a][b]+ k );
}
}
}
}//
cout<<dp[sx][sy]<<'\n';
}
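Here I have used DP.
The array dp is a 2-D matrix that stores the cost to reach (fx, fy) from each cell.
We work bottom-up: at each cell we find the minimum cost to reach the end.
We check the cost of stepping one cell downward, i.e. from dp[i][j] to dp[i][j+1].
Then we check the cell to the right, dp[i+1][j].
Finally we check whether a tunnel is present at the cell.
Note that the table has (fx+1) * (fy+1) entries and only moves toward larger coordinates are considered, so this only works when the grid is small and the destination lies down and to the right of the start.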

Solving a simple linear equation

Suppose I needed to solve the following equation,
ax + by = c
Where a, b, and c are known values and x, y are natural numbers between 0 and 10 (inclusively).
Other than the trivial solution of,
for (x = 0; x <= 10; x++)
for (y = 0; y <= 10; y++)
if (a * x + b * y == c)
printf("%d %d", x, y);
... is there any way to find all solutions for this independent system efficiently?
In your case, since x and y only take values between 0 and 10, the brute-force algorithm may be the best option, as it takes less time to implement.
However, if you have to find all pairs of integral solutions (x, y) in a larger range, you really should apply the right mathematical tool for tackling this problem.
You are trying to solve a linear Diophantine equation, and it is well known that an integral solution exists if and only if the greatest common divisor d of a and b divides c.
If no solution exists, you are done. Otherwise, first apply the Extended Euclidean Algorithm to find a particular solution (x0, y0) of the equation ax + by = d. Bézout's identity guarantees that such a pair exists.
Since we are interested in ax + by = c, scale that pair by a factor of c / d to get a particular solution (x0 * c / d, y0 * c / d) of the original equation.
All other integral solutions are then of the form:
x = x0 * c / d + k * (b / d), y = y0 * c / d - k * (a / d)
where k is an arbitrary integer.
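As a rough sketch of that recipe, the code below finds one solution with the Extended Euclidean Algorithm, scales it by c / d, and then steps x by b / d and y by a / d to list every solution inside a range. The function names extendedGcd, floorDiv and solveInRange are invented for this example, and it assumes a > 0, b > 0 and values small enough that the products do not overflow long long.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns g = gcd(a, b) and sets x, y so that a*x + b*y == g.
long long extendedGcd(long long a, long long b, long long& x, long long& y) {
    if (b == 0) {
        x = 1;
        y = 0;
        return a;
    }
    long long x1 = 0, y1 = 0;
    const long long g = extendedGcd(b, a % b, x1, y1);
    x = y1;
    y = x1 - (a / b) * y1;
    return g;
}

// Floor division for a positive divisor (C++ integer division truncates toward zero).
long long floorDiv(long long p, long long q) {
    long long r = p / q;
    if (p % q != 0 && p < 0) --r;
    return r;
}

// All (x, y) with a*x + b*y == c and lo <= x, y <= hi, in increasing x.
std::vector<std::pair<long long, long long>> solveInRange(
        long long a, long long b, long long c, long long lo, long long hi) {
    assert(a > 0 && b > 0);  // simplifying assumption of this sketch
    std::vector<std::pair<long long, long long>> out;
    long long x0 = 0, y0 = 0;
    const long long d = extendedGcd(a, b, x0, y0);
    if (c % d != 0) return out;  // no integral solution at all
    x0 *= c / d;                 // scale the particular solution of ax + by = d
    y0 *= c / d;
    const long long stepX = b / d, stepY = a / d;
    // Smallest k with x0 + k*stepX >= lo, i.e. k = ceil((lo - x0) / stepX).
    const long long k = -floorDiv(x0 - lo, stepX);
    long long x = x0 + k * stepX;
    long long y = y0 - k * stepY;
    for (; x <= hi; x += stepX, y -= stepY)
        if (lo <= y && y <= hi) out.push_back({x, y});
    return out;
}
```

For example, solveInRange(2, 3, 12, 0, 10) visits only x = 0, 3, 6, 9 instead of all eleven values of x.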
You only need to loop through x, then calculate y. (x, y) is a solution if y is an integer between 0 and 10.
In C:
for (int x = 0; x <= 10; ++x) {
    double y = (double)(c - a * x) / b; // assumes b != 0
    // If y is an integer, and it's between 0 and 10, then (x, y) is a solution
    int isInteger = fabs(floor(y) - y) < 0.001; // fabs and floor need <math.h>
    if (isInteger && 0 <= y && y <= 10) {
        printf("%d %d\n", x, (int)y);
    }
}
You could avoid the second for loop by checking directly if (c-a*x)/b is an integer.
EDIT: My code is less clean than I had hoped, due to some careless oversights on my part pointed out in the comments, but it is still faster than nested for loops.
int by;
for (x = 0; x <= 10; x++) {
by = c-a*x; // this is b*y
if(b==0) { // check for special case of b==0
if (by==0) {
printf("%d and any value for y", x);
}
} else { // b!=0 case
y = by/b;
if (by%b==0 && 0<=y && y<=10) { // is y an integer between 0 and 10?
printf("%d %d", x, by/b);
}
}
}

Number of Triangles Containing The Point (0,0)

First off, credits to Topcoder, as this problem was used in one of their SRMs (but they have no editorial for it..)
In this problem, I am given n points (where n is between 1 and 1000). For every three points, there is obviously a triangle that connects them. The question is, how many of these triangles contain the point (0,0).
I have tried looking at this thread on stack:
triangle points around a point
But I am unable to understand what data structures are used/how to use them to solve this problem.
An obvious naive solution to this problem is to use an inefficient O(n^3) algorithm and search all points. However, could someone please help me make this more efficient, and do this in O(n^2) time?
Below is Petr's solution to this problem... it is very short, but has a large idea I cannot understand.
/**
* Built using CHelper plug-in
* Actual solution is at the top
*/
public class TrianglesContainOrigin {
public long count(int[] x, int[] y) {
int n = x.length;
long res = (long) n * (n - 1) * (n - 2) / 6;
for (int i = 0; i < n; ++i) {
int x0 = x[i];
int y0 = y[i];
long cnt = 0;
for (int j = 0; j < n; ++j) {
int x1 = x[j];
int y1 = y[j];
if (x0 * y1 - y0 * x1 < 0) {
++cnt;
}
}
res -= cnt * (cnt - 1) / 2;
}
return res;
}
}
Let there be a triangle with three points p1=(x_1, y_1), p2=(x_2, y_2) and p3=(x_3, y_3), and treat p1, p2, p3 as position vectors from the origin. If the origin lies inside the triangle, then the cross products of any one position vector with the other two differ in sign (one negative, one positive). If the origin lies outside, there is exactly one point whose cross product with both of the other points is negative. So for each point i, count the points whose cross product with i is less than 0 (cnt in the code). If you select any two of these points and make a triangle along with point i, the origin will be outside this triangle, and there are cnt * (cnt - 1) / 2 such selections. That is what gets subtracted from res, which starts at the total number of triangles, n * (n - 1) * (n - 2) / 6. This was by far the best of the solutions people implemented, since it uses only integer arithmetic and so has no precision problems with double.
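To convince yourself that the counting argument works, it can help to compare it against a direct O(n^3) check on small inputs. The sketch below is a C++ transcription of the idea, not Petr's code. The names countFast and countBrute are made up here, and it assumes that no point is the origin and that no two points are collinear with the origin. The cross products are computed in long long, which is a precaution added in this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// z-component of the cross product of the position vectors (x0, y0) and (x1, y1).
long long cross(long long x0, long long y0, long long x1, long long y1) {
    return x0 * y1 - y0 * x1;
}

// O(n^2): start from all triangles and subtract those that miss the origin.
long long countFast(const std::vector<int>& x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    const long long n = static_cast<long long>(x.size());
    long long res = n * (n - 1) * (n - 2) / 6;
    for (std::size_t i = 0; i < x.size(); ++i) {
        long long cnt = 0;  // points lying clockwise of point i
        for (std::size_t j = 0; j < x.size(); ++j)
            if (cross(x[i], y[i], x[j], y[j]) < 0) ++cnt;
        res -= cnt * (cnt - 1) / 2;  // pairs that form an "outside" triangle with i
    }
    return res;
}

// O(n^3) reference: the origin is inside when all three cross products share a sign.
long long countBrute(const std::vector<int>& x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    long long res = 0;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = i + 1; j < x.size(); ++j)
            for (std::size_t k = j + 1; k < x.size(); ++k) {
                const long long a = cross(x[i], y[i], x[j], y[j]);
                const long long b = cross(x[j], y[j], x[k], y[k]);
                const long long c = cross(x[k], y[k], x[i], y[i]);
                if ((a > 0 && b > 0 && c > 0) || (a < 0 && b < 0 && c < 0)) ++res;
            }
    return res;
}
```

With the points (1,1), (-1,1), (0,-1) and (2,3), both functions report 2 triangles containing the origin.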

squarepie solution

The following problem was asked in a programming contest, which is over now.
Squarepie program
I tried the best solution I could, but always got time limit exceeded error. My solution was as follows.
First add all the edges in a structure which is first sorted by length and then by their position. I was having two different structures for x and y edges. Find the outside rectangle, and add it to the stack. Now for each rectangle in the stack find if there is any intersecting edge. If yes divide this rectangle in two by this edge and add both to the stack. If failed to find any bisecting edge, add the area of the rectangle in priority queue. At the end print elements from priority queue.
I now wonder whether there is a faster solution.
Edit :-
Attaching my solution.
My Final solution
Everything looks good except for getLargestRect(), which you really overcomplicate. Just return rectangle(minX, minY, maxX, maxY). You can find the mins and maxs in linear time. The current implementation is O(n^2) when all the lines have the same length.
I also coded up my own algorithm, if you want to have a look at a different approach. My idea was to store all the vertical lines in a fancy data structure, then scan through the horizontal lines and find the rectangles they close off.
When looking at the horizontal line h, all the vertical lines with y1 < h.y && y2 >= h.y are stored in a map by their x value. The current horizontal line forms rectangles with all the vertical lines from map[h.x1] to map[h.x2]. The outer two lines extend past h.y, but all the middle ones must end at h.y and are therefore removed from the map after the area of their rectangles has been calculated. The vertical lines that need to be added to the map for each horizontal line are found efficiently by sorting the verticals according to their y1 value.
Here is the code:
#include <iostream>
#include <map>
#include <vector>
#include <algorithm>
#define min(a, b) ((a) < (b) ? (a) : (b))
#define max(a, b) ((a) < (b) ? (b) : (a))
using namespace std;
class Horizontal
{
public:
int x1, x2, y;
Horizontal(int x1, int x2, int y) : x1(x1), x2(x2), y(y) {}
static bool comp(const Horizontal & a, const Horizontal & b)
{
return a.y < b.y;
}
};
class Vertical
{
public:
int x, y1; // no need to store y2
Vertical(int x, int y1) : x(x), y1(y1) {}
static bool comp(const Vertical & a, const Vertical & b)
{
return a.y1 < b.y1;
}
};
long long total = 0;
int vertI = 0; // index of next vertical to add to currentVerts
map<int, int> currentVerts; // currentVerts[5] = y1 of the vert line with x=5
vector<Vertical> verticals;
vector<Horizontal> horizontals;
vector<int> solutions;
void readInput();
void processHorizontal(Horizontal & line);
int main()
{
cout.precision(10);
readInput();
sort(verticals.begin(), verticals.end(), Vertical::comp);
sort(horizontals.begin(), horizontals.end(), Horizontal::comp);
// process the lines (start at i = 1 to ignore the top one)
for (int i = 1; i < horizontals.size(); i++)
{
processHorizontal(horizontals[i]);
}
sort(solutions.begin(), solutions.end());
for (int i = solutions.size() - 1; i >= 0; i--)
{
cout << (double) solutions[i] / total << "\n";
}
}
void readInput()
{
int n;
cin >> n;
int x1, x2, y1, y2;
for (int i = 0; i < n; i++)
{
cin >> x1 >> y1 >> x2 >> y2;
if (x2 < x1) swap(x1, x2);
if (y2 < y1) swap(y1, y2);
if (x1 == x2) verticals.push_back(Vertical(x1, y1));
else horizontals.push_back(Horizontal(x1, x2, y1));
}
}
void processHorizontal(Horizontal & horiz)
{
// add all vert lines which start above horiz to currentVert
for ( ; vertI < verticals.size() && verticals[vertI].y1 < horiz.y;
vertI++)
{
int x = verticals[vertI].x;
currentVerts[x] = verticals[vertI].y1;
}
map<int, int>::iterator left = currentVerts.find(horiz.x1);
map<int, int>::iterator right = currentVerts.find(horiz.x2);
map<int, int>::iterator i;
map<int, int>::iterator next;
for (i = next = left; i != right; i = next)
{
next++;
int width = (*next).first - (*i).first; // difference in x
int height = horiz.y - (*i).second; // difference y
int area = width * height;
total += area;
solutions.push_back(area);
if (i != left)
{
// if i is not the start it must be a short
// line which ends here, so delete it
currentVerts.erase(i);
}
else
{
// if it is left, cut the rectangle at horiz.y
// by modifying the start of the line
(*i).second = horiz.y;
}
}
}

How many moves to reach a destination? Efficient flood filling

I want to compute the distance of cells from a destination cell, measured as the number of four-way moves needed to reach it. So the four cells immediately adjacent to the destination have a distance of 1, those in the four cardinal directions of each of them have a distance of 2, and so on. There is a maximum distance that might be around 16 or 20, and there are cells that are occupied by barriers; the distance can flow around them but not through them.
I want to store the output into a 2D array, and I want to be able to compute this 'distance map' for any destination on a bigger maze map very quickly.
I am successfully doing it with a variation on a flood fill where I place the incremental distance of the adjacent unfilled cells in a priority queue (using C++ STL).
I am happy with the functionality and now want to focus on optimizing the code, as it is very performance sensitive.
What cunning and fast approaches might there be?
I think you have done everything right. If you coded it correctly, it takes O(n) time and O(n) memory to compute the flood fill, where n is the number of cells, and it can be proven that it is impossible to do better in the general case. After the fill is complete you can return the distance for any cell in O(1), and it is easy to see that this cannot be improved either.
So if you want to optimize performance, you can only focus on local code optimization, which will not affect the asymptotic complexity but can significantly improve your real execution time. It is hard to give advice on code optimization without actually seeing the source, though.
If you really want to see optimized code, see the following (pure C):
#include <stdlib.h>
int* BFS()
{
int N, M; // Assume we have NxM grid.
int X, Y; // Start position. X, Y are unit based.
int i, j;
int movex[4] = {0, 0, 1, -1}; // Move on x dimension.
int movey[4] = {1, -1, 0, 0}; // Move on y dimension.
// TO DO: Read N, M, X, Y
// To reduce redundant functions calls and memory reallocation
// allocate all needed memory once and use a simple arrays.
int* map = (int*)malloc((N + 2) * (M + 2) * sizeof(int));
int leadDim = M + 2;
// Our map. We use one dimension array. map[x][y] = map[leadDim * x + y];
// If (x,y) is occupied then map[leadDim*x + y] = -1;
// If (x,y) is not visited map[leadDim*x + y] = -2;
int* queue = (int*)malloc(N * M * sizeof(int));
int first = 0, last =1;
// Fill the borders to simplify the code and reduce conditions
for (i = 0; i < N+2; ++i)
{
map[i * leadDim + 0] = -1;
map[i * leadDim + M + 1] = -1;
}
for (j = 0; j < M+2; ++j)
{
map[j] = -1;
map[(N + 1) * leadDim + j] = -1;
}
// TO DO: Read the map.
queue[first] = X * leadDim + Y;
map[X * leadDim + Y] = 0;
// Very simple optimized process loop.
while (first < last)
{
int current = queue[first];
int step = map[current];
for (i = 0; i < 4; ++i)
{
int temp = current + movex[i] * leadDim + movey[i];
if (map[temp] == -2) // only one condition in internal loop.
{
map[temp] = step + 1;
queue[last++] = temp;
}
}
++first;
}
free(queue);
return map;
}
The code may seem tricky, and of course it doesn't look like OOP (I actually think that OOP fans will hate it), but if you want something really fast, that's what you need.
This is a common task for BFS. The complexity is O(cellsCount).
My C++ implementation:
vector<vector<int> > GetDistance(int x, int y, vector<vector<int> > cells)
{
const int INF = 0x7FFFFF;
vector<vector<int> > distance(cells.size());
for(int i = 0; i < distance.size(); i++)
distance[i].assign(cells[i].size(), INF);
queue<pair<int, int> > q;
q.push(make_pair(x, y));
distance[x][y] = 0;
while(!q.empty())
{
pair<int, int> curPoint = q.front();
q.pop();
int curDistance = distance[curPoint.first][curPoint.second];
for(int i = -1; i <= 1; i++)
for(int j = -1; j <= 1; j++)
{
if( (i + j) % 2 == 0 ) continue;
pair<int, int> nextPoint(curPoint.first + i, curPoint.second + j);
if(nextPoint.first >= 0 && nextPoint.first < cells.size()
&& nextPoint.second >= 0 && nextPoint.second < cells[nextPoint.first].size()
&& cells[nextPoint.first][nextPoint.second] != BARRIER
&& distance[nextPoint.first][nextPoint.second] > curDistance + 1)
{
distance[nextPoint.first][nextPoint.second] = curDistance + 1;
q.push(nextPoint);
}
}
}
return distance;
}
Start with a recursive implementation: (untested code)
int visit( int xy, int dist) {
int ret =1;
if (array[xy] <= dist) return 0;
array[xy] = dist;
if (dist == maxdist) return ret;
ret += visit ( RIGHT(xy) , dist+1);
...
same for left, up, down
...
return ret;
}
You'll need to handle the initialisation and the edge cases. And you have to decide whether you want a two-dimensional array or a one-dimensional array.
A next step could be to use a todo list and remove the recursion, and a third step could be to add some bitmasking.
8-bit computers in the 1970s did this with an optimization that has the same algorithmic complexity, but in the typical case is much faster on actual hardware.
Starting from the initial square, scan to the left and right until "walls" are found. Now you have a "span" that is one square tall and N squares wide. Mark the span as "filled," in this case each square with the distance to the initial square.
For each square above and below the current span, if it's not a "wall" or already filled, pick it as the new origin of a span.
Repeat until no new spans are found.
Since horizontal rows tend to be stored contiguously in memory, this algorithm tends to thrash the cache far less than one that has no bias for horizontal searches.
Also, since in the most common cases far fewer items are pushed and popped from a stack (spans instead of individual blocks) there is less time spent maintaining the stack.
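The span idea can be sketched roughly as follows. This is a simplified illustration, not the historical implementation: the name spanFill and the 0 = open, 1 = wall encoding are assumptions, and the sketch only marks which cells are reachable. Recording a distance for each square would need extra bookkeeping per span that is not shown here.

```cpp
#include <cassert>
#include <stack>
#include <utility>
#include <vector>

// wall[r][c] != 0 marks a barrier. Returns a mask with 1 for every reachable cell.
std::vector<std::vector<char>> spanFill(const std::vector<std::vector<int>>& wall,
                                        int startRow, int startCol) {
    const int rows = static_cast<int>(wall.size());
    const int cols = rows ? static_cast<int>(wall[0].size()) : 0;
    assert(startRow >= 0 && startRow < rows && startCol >= 0 && startCol < cols);
    std::vector<std::vector<char>> filled(rows, std::vector<char>(cols, 0));
    if (wall[startRow][startCol]) return filled;

    std::stack<std::pair<int, int>> seeds;  // one entry per span still to fill
    seeds.push({startRow, startCol});
    while (!seeds.empty()) {
        const std::pair<int, int> seed = seeds.top();
        seeds.pop();
        const int r = seed.first, c = seed.second;
        if (filled[r][c]) continue;
        // Scan left and right until a wall (or an already filled cell) is found.
        int left = c, right = c;
        while (left > 0 && !wall[r][left - 1] && !filled[r][left - 1]) --left;
        while (right + 1 < cols && !wall[r][right + 1] && !filled[r][right + 1]) ++right;
        for (int i = left; i <= right; ++i) filled[r][i] = 1;
        // Push one seed for each run of open, unfilled cells above and below the span.
        const int adjacent[2] = {r - 1, r + 1};
        for (int nr : adjacent) {
            if (nr < 0 || nr >= rows) continue;
            bool inRun = false;
            for (int i = left; i <= right; ++i) {
                const bool open = !wall[nr][i] && !filled[nr][i];
                if (open && !inRun) seeds.push({nr, i});
                inRun = open;
            }
        }
    }
    return filled;
}
```

Only one stack entry is pushed per run of cells rather than one per cell, which is where the saving in stack maintenance comes from.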

Resources