Assume I have an MxN matrix filled with values between 0 and 5. I now want to determine the largest connected tree in that matrix, where the values of the matrix are considered to be the nodes. Two nodes are said to be connected if they are adjacent to each other either horizontally or vertically and the value of both nodes is the same. The size of a tree is equal to the number of nodes in the tree.
An example:
1 0 3 0 0     2 2 0 0 0 0 0
1 1 2 2 2     0 2 0 0 0 0 0
0 1 0 3 0     0 2 0 0 0 0 2
3 1 0 3 0     0 2 0 2 2 2 2
              0 0 0 0 0 0 0
              3 0 0 3 3 0 0
              3 3 3 3 0 0 0
On the left side, the 1-nodes form the largest tree. On the right side, the 3-nodes form the largest tree, while there are two other trees consisting of 2-nodes.
I know I could probably do a simple depth-first search, but I'm wondering if there is something well-known that I'm missing, maybe in the realm of graph theory (like Kruskal's minimum spanning tree algorithm, but for this problem).
You are looking for disjoint sets, so I would suggest a disjoint-set data structure and a union-find algorithm:
see http://en.wikipedia.org/wiki/Disjoint-set_data_structure#Disjoint-set_forests
The union operation is symmetric, so you really only need to compare each element of the matrix with its neighbor to the right and its neighbor below, applying the union operation when the compared elements have the same value.
Sweep through the elements again, using the find operation to count the size of each set and keeping track of the largest. You will need extra storage for the counts.
The computational complexity will be O(MN α(MN, MN)), where α is the inverse Ackermann function, which one can consider a small constant (< 5) for any useful value of MN. The extra storage complexity will be O(MN).
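A minimal sketch of this approach (the struct and function names are illustrative, not a specific library):

// Sketch: union-find over an MxN grid, unioning horizontally/vertically
// adjacent cells with equal values, then reporting the largest set size.
#include <algorithm>
#include <utility>
#include <vector>
using namespace std;

struct DSU {
    vector<int> parent, size;
    DSU(int n) : parent(n), size(n, 1) {
        for (int i = 0; i < n; ++i) parent[i] = i;
    }
    int find(int x) {                       // find with path compression
        return parent[x] == x ? x : parent[x] = find(parent[x]);
    }
    void unite(int a, int b) {              // union by size
        a = find(a); b = find(b);
        if (a == b) return;
        if (size[a] < size[b]) swap(a, b);
        parent[b] = a;
        size[a] += size[b];
    }
};

int largestRegion(const vector<vector<int>>& grid) {
    int M = grid.size(), N = grid[0].size();
    DSU dsu(M * N);
    for (int r = 0; r < M; ++r)
        for (int c = 0; c < N; ++c) {
            // compare only with the right and lower neighbor (union is symmetric)
            if (c + 1 < N && grid[r][c] == grid[r][c + 1])
                dsu.unite(r * N + c, r * N + c + 1);
            if (r + 1 < M && grid[r][c] == grid[r + 1][c])
                dsu.unite(r * N + c, (r + 1) * N + c);
        }
    int best = 0;
    for (int i = 0; i < M * N; ++i)         // each set's size is stored at its root
        if (dsu.find(i) == i) best = max(best, dsu.size[i]);
    return best;
}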
Effectively, what you're looking for are connected components. A connected component is a set of nodes within which you can travel from any node to any other node.
Connected components apply to graphs in general. They can be found using BFS/DFS, and from an algorithmic-complexity perspective, given an adjacency-matrix input, there is no better way to do it. The running time of the algorithm is O(N^2), where N is the number of nodes in the graph.
In your case the graph has more constraints, such as each node being adjacent to at most 4 other nodes. With BFS/DFS, this gives you a running time of O(4N) = O(N), where N is the number of nodes. There can't possibly be an algorithm with better complexity, as you need to consider each node at least once in the worst case.
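For example, a BFS-based flood fill over the grid (a sketch; the function name is illustrative) that returns the size of the largest same-value component:

// Sketch: BFS flood fill; each cell is visited exactly once, so the running
// time is linear in the number of cells.
#include <algorithm>
#include <queue>
#include <utility>
#include <vector>
using namespace std;

int largestComponent(const vector<vector<int>>& grid) {
    int M = grid.size(), N = grid[0].size(), best = 0;
    vector<vector<bool>> seen(M, vector<bool>(N, false));
    const int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};
    for (int r = 0; r < M; ++r)
        for (int c = 0; c < N; ++c) {
            if (seen[r][c]) continue;
            int size = 0;
            queue<pair<int,int>> q;
            q.push({r, c});
            seen[r][c] = true;
            while (!q.empty()) {
                auto [cr, cc] = q.front(); q.pop();
                ++size;
                for (int k = 0; k < 4; ++k) {      // at most 4 neighbors per cell
                    int nr = cr + dr[k], nc = cc + dc[k];
                    if (nr >= 0 && nr < M && nc >= 0 && nc < N &&
                        !seen[nr][nc] && grid[nr][nc] == grid[cr][cc]) {
                        seen[nr][nc] = true;
                        q.push({nr, nc});
                    }
                }
            }
            best = max(best, size);
        }
    return best;
}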
Related
Question:
Given an undirected graph of N nodes and M edges, you need to solve 2 problems:
Problem 1: For each edge j (j : 1 -> M), if you delete that edge, count the number of pairs of nodes that can't reach each other (there is no path between those 2 nodes).
Problem 2: For each node i (i : 1 -> N), if you delete that node (which also deletes all of the edges connected to it), count the number of pairs of nodes that can't reach each other.
Example:
N = 6, M = 7
1 2
2 3
3 1
3 4
4 5
5 6
6 4
(Edges are described as u - v)
Result:
For each edge j (j : 1 -> M): 0 0 0 9 0 0 0
For each node i (i : 1 -> N): 0 0 6 6 0 0
P.S.: I have been thinking about this for many days but can't find a proper answer to this problem.
If the initial graph is connected, then the first problem is a search for bridges, and the second one is a search for cut vertices / articulation points.
After finding a bridge, get the sizes of the two connected components its removal creates; the required result is the product of the sizes (for example, components of size 2 and size 3 give 6 pairs).
After finding a cut vertex, the number of components might be larger, and the result is the sum of pairwise products of the sizes (for components with sizes 1, 2, 3 the result is 1*2 + 1*3 + 2*3 = 11 pairs).
C++ code for solving both problems using DFS can be found here.
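Since that code isn't reproduced here, below is a minimal sketch for Problem 1 only, assuming a connected graph with nodes numbered 1..N and no duplicate edges: a single DFS computes entry times, low-links, and subtree sizes; a tree edge (u, v) is a bridge exactly when low[v] > tin[u], and removing it disconnects sub[v] * (N - sub[v]) pairs, while every non-bridge edge contributes 0.

// Sketch (Problem 1): for each edge, the number of node pairs disconnected
// by deleting it. Assumes a connected, 1-indexed graph without multi-edges.
#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>
using namespace std;

int N, M, timer_ = 0;
vector<vector<pair<int,int>>> adj;   // adj[u] = {(neighbor, edge index)}
vector<int> tin, low, sub;
vector<long long> ans;               // ans[j] = pairs disconnected by deleting edge j

void dfs(int u, int parentEdge) {
    tin[u] = low[u] = ++timer_;
    sub[u] = 1;
    for (auto [v, id] : adj[u]) {
        if (id == parentEdge) continue;
        if (tin[v]) {                          // back edge
            low[u] = min(low[u], tin[v]);
        } else {                               // tree edge
            dfs(v, id);
            sub[u] += sub[v];
            low[u] = min(low[u], low[v]);
            if (low[v] > tin[u])               // (u, v) is a bridge
                ans[id] = 1LL * sub[v] * (N - sub[v]);
        }
    }
}

int main() {
    cin >> N >> M;
    adj.assign(N + 1, {});
    tin.assign(N + 1, 0); low.assign(N + 1, 0); sub.assign(N + 1, 0);
    ans.assign(M, 0);
    for (int j = 0; j < M; ++j) {
        int u, v; cin >> u >> v;
        adj[u].push_back({v, j});
        adj[v].push_back({u, j});
    }
    dfs(1, -1);                                // assumes the graph is connected
    for (int j = 0; j < M; ++j) cout << ans[j] << " \n"[j + 1 == M];
    return 0;
}

Problem 2 follows the same DFS: at a cut vertex u, the components created by removing u are the child subtrees v with low[v] >= tin[u] plus one component containing everything else, and the answer for u is the sum of pairwise products of those component sizes.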
Given an n x n matrix of 1s and 0s, where 1 represents land and 0 represents water.
How can I find the median of the areas of the islands in the most efficient way?
For Example:
1 1 0 0 0
1 0 0 1 1
1 0 1 0 0
There are three islands with areas [1, 2, 4], and the median is 2.
An island consists of contiguous, non-diagonally adjacent cells that contain 1:
For example:
1 0 1
0 1 0
This matrix contains three islands with areas [1, 1, 1].
My solution is to find the areas recursively and then sort them to find the median, which takes O(n^2 log(n^2)). Is there a more efficient way to do that?
First step: run DFS recursively on the grid to discover all the islands and calculate their areas in O(n^2) time.
Second step: use the median-of-medians algorithm to find the median of the unsorted array of island areas in O(m) time, where m is the number of islands.
Overall time complexity O(n^2).
If you need further help, I can provide my implementation.
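One possible sketch of both steps (not the implementation referred to above; it uses std::nth_element, which is average-case linear, in place of a hand-rolled median-of-medians, and assumes the grid contains at least one island):

// Sketch: step 1 collects island areas with a recursive DFS flood fill,
// step 2 selects the middle element(s) with nth_element.
#include <algorithm>
#include <vector>
using namespace std;

int floodFill(vector<vector<int>>& g, int r, int c) {
    if (r < 0 || r >= (int)g.size() || c < 0 || c >= (int)g[0].size() || g[r][c] != 1)
        return 0;
    g[r][c] = 0;                                   // mark visited by sinking the cell
    return 1 + floodFill(g, r + 1, c) + floodFill(g, r - 1, c)
             + floodFill(g, r, c + 1) + floodFill(g, r, c - 1);
}

double medianIslandArea(vector<vector<int>> g) {   // grid taken by value so it can be modified
    vector<int> areas;
    for (int r = 0; r < (int)g.size(); ++r)
        for (int c = 0; c < (int)g[0].size(); ++c)
            if (g[r][c] == 1) areas.push_back(floodFill(g, r, c));
    int m = areas.size();
    nth_element(areas.begin(), areas.begin() + m / 2, areas.end());
    if (m % 2 == 1) return areas[m / 2];
    // for an even count, average the two middle values; after nth_element,
    // everything before index m/2 is <= areas[m/2]
    int lower = *max_element(areas.begin(), areas.begin() + m / 2);
    return (lower + areas[m / 2]) / 2.0;
}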
Using a disjoint set lets you find the islands in near-linear time (each union/find operation costs O(α(N)), where α is the inverse Ackermann function); then use nth_element (aka introselect) to find the N/2-th element in O(N) and obtain the median.
sets = DisjointSet(matrix)
median = nth_element(sets, N/2)
For a total of O(N α(N)), where N is the number of cells, which is effectively linear and avoids the log factor of sorting.
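A short sketch of the selection step, assuming the island areas have already been collected (for example, as the sizes stored at the disjoint-set roots); the function name is illustrative:

// Sketch: pick the element of rank m/2 (the median for odd m) via introselect.
#include <algorithm>
#include <vector>
using namespace std;

int middleArea(vector<int> areas) {                // assumes at least one island
    size_t mid = areas.size() / 2;
    nth_element(areas.begin(), areas.begin() + mid, areas.end());
    return areas[mid];                             // for even m, also average with
                                                   // the largest element before mid
}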
I'm practising with graphs and adjacency matrices, but I couldn't find a good example that differentiates a symmetric from an asymmetric matrix. Can anyone tell me how to distinguish between a symmetric and an asymmetric matrix?
An adjacency matrix is symmetric if it is derived from an undirected graph.
That means the edge from node A -> B has the same cost/weight/length as the edge from node B -> A.
If you create the adjacency matrix M, it will be symmetric, meaning that for any i and j, M[i][j] == M[j][i]. More mathematically, the matrix is identical to its transpose. So if you transpose your matrix, it will look exactly the same. Graphically, such a matrix looks like this:
0 2 3 4
2 0 5 6
3 5 0 7
4 6 7 0
Due to the symmetry, you can often represent it using less memory. For algorithms like the Floyd-Warshall algorithm on undirected graphs, you can reduce the amount of computation by 50%, since you only need to compute half of the matrix:
0 2 3 4
  0 5 6
    0 7
      0
For comparison, an asymmetric matrix:
0 2 3 9 <--
2 0 5 6
3 5 0 7
4 6 7 0
Note that it is almost identical to the previous example, but in the upper-right corner there is a 9. So it is no longer possible to mirror the matrix along its diagonal axis.
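A quick way to test this property (a sketch; the function name is illustrative) is to compare the matrix with its transpose entry by entry, scanning only the part above the diagonal:

// Sketch: an adjacency matrix is symmetric iff M[i][j] == M[j][i] for all i, j;
// it is enough to check the entries above the diagonal.
#include <vector>
using namespace std;

bool isSymmetric(const vector<vector<int>>& m) {
    int n = m.size();
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (m[i][j] != m[j][i]) return false;
    return true;
}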
You can check the example of a symmetric graph.
Permutation of any two rows or columns in an incidence matrix simply corresponds to relabelling the vertices and edges of the same graph. Conversely, two graphs X and Y are isomorphic if and only if their incidence matrices A(X) and A(Y) differ only by permutations of rows and columns.
Can someone explain to me what this means, with an example? What exactly does "permutation of any two rows or columns" mean here?
"Permutation" here means "exchange". Consider the following node-node incidence matrix:
0 1 0
0 0 1
1 0 0
It defines a directed graph with vertices 0, 1, 2 whose edges form the cycle 0-1-2-0. Relabelling vertices 1 and 2 corresponds to exchanging rows 1 and 2 together with columns 1 and 2, since in a node-node matrix both the rows and the columns are indexed by the vertices. Doing so, we obtain
0 0 1
1 0 0
0 1 0
where the cycle is 0-2-1-0. This graph is obtained from the initial graph by relabelling 1 to 2 and vice versa. This means that both graphs are "identical up to renaming of vertices", i.e. they are isomorphic. In the vertex-edge incidence matrix from the quote, rows are indexed by vertices and columns by edges, so there exchanging two rows alone relabels two vertices and exchanging two columns alone relabels two edges.
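To make the relabelling concrete, here is a small sketch (illustrative code, not taken from the quoted text) that applies a vertex relabelling to an adjacency matrix by permuting rows and columns together:

// Sketch: relabel the vertices of a graph given by an adjacency matrix.
// perm[v] is the new label of vertex v; entry (i, j) of the original matrix
// moves to (perm[i], perm[j]) in the relabelled matrix.
#include <vector>
using namespace std;

vector<vector<int>> relabel(const vector<vector<int>>& m, const vector<int>& perm) {
    int n = m.size();
    vector<vector<int>> out(n, vector<int>(n, 0));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            out[perm[i]][perm[j]] = m[i][j];
    return out;
}
// Example: relabel({{0,1,0},{0,0,1},{1,0,0}}, {0,2,1}) swaps vertices 1 and 2
// and yields {{0,0,1},{1,0,0},{0,1,0}}, the cycle 0 -> 2 -> 1 -> 0.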
I am looking for an algorithm to select a subset of nodes from a chain. For example, given a node set with N nodes in a temporal chain, I would like to select K nodes, based on some criteria, such that K < N. For example, what if I have to select a set of days {D1, D2, ..., DK} with K = 3 days out of the set {D1, D2, D3, ..., DN} of N = 7 days in a week, such that I maximize the following cost given by:
I need to select the best "K" items from the set {D1,....,DN}. One possibility is that I can enumerate all possible choices and choose the best combination:
...
1 0 0 0 0 1 1
0 0 0 0 1 1 1
0 1 0 0 1 1 0
...
Is there a well-known algorithm in Computer Science to solve this problem? If so, any pointer to appropriate resources/code might help.
PS: I am not sure whether this is the right forum, please comment below, I will repost it.
Since the objective is linear, this problem has optimal substructure and thus is amenable to dynamic programming. For each i from 0 to K, for each j from 0 to N, determine the best way to choose i nodes from the first j. There's only one way to choose i = 0 nodes. The best way to choose i nodes from the first j > 0 is either the best way to choose i from the first j - 1, or item j preceded by the best way to choose i - 1 nodes from the first j - 1. By avoiding recomputing the optima for subproblems, the running time is polynomial.
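A sketch of that recurrence, assuming (since the original cost expression isn't shown above) a simple additive objective in which choosing item j contributes a value v[j]:

// Sketch of the DP described above: dp[i][j] = best total value when choosing
// i items from the first j, assuming an additive per-item value v[j] (this
// stands in for the original, unspecified cost). Runs in O(K * N) time.
#include <algorithm>
#include <vector>
using namespace std;

double bestChoice(const vector<double>& v, int K) {
    int N = v.size();
    const double NEG = -1e18;                        // "impossible" marker
    vector<vector<double>> dp(K + 1, vector<double>(N + 1, NEG));
    for (int j = 0; j <= N; ++j) dp[0][j] = 0;       // only one way to choose 0 items
    for (int i = 1; i <= K; ++i)
        for (int j = 1; j <= N; ++j) {
            dp[i][j] = dp[i][j - 1];                 // skip item j
            if (dp[i - 1][j - 1] > NEG / 2)          // take item j
                dp[i][j] = max(dp[i][j], dp[i - 1][j - 1] + v[j - 1]);
        }
    return dp[K][N];                                 // assumes K <= N
}

With a purely additive objective this in fact reduces to taking the K largest values; the same table structure becomes genuinely useful once the value of taking item j also depends on how many items have been taken so far, in which case the "take item j" term is adjusted accordingly.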