Broken edges union-find algorithm

I need some help figuring out a union-find problem.
Here is the question:
There's an undirected connected graph with n nodes labeled 1..n, but
some of the edges have been broken, disconnecting the graph. Find the
minimum cost to repair the edges so that all the nodes are once again
accessible from each other.
Input:
n, an int representing the total number of nodes.
edges, a list of
integer pairs representing the nodes connected by an edge.
edgesToRepair, a list where each element is a triplet representing the
pair of nodes between which an edge is currently broken and the cost
of repairing that edge, respectively
(e.g. [1, 2, 12] means that repairing
the edge between nodes 1 and 2 costs 12).
Example 1:
Input: n = 5, edges = [[1, 2], [2, 3], [3, 4], [4, 5], [1, 5]],
edgesToRepair = [[1, 2, 12], [3, 4, 30], [1, 5, 8]]
Output: 20
There are 3 connected components due to broken edges:
[1], [2, 3] and [4, 5]. We can connect these components into a single
component by repairing the edges between nodes 1 and 2, and nodes 1
and 5, at a minimum cost of 12 + 8 = 20.
import java.util.Arrays;

public int minCostRepairEdges(int N, int[][] edges, int[][] edgesToRepair){
    int[] unionFind = new int[N+1]; // unionFind[x] == 0 means x is a root
    int totalEdges=0;
    for(int[] edge : edges){
        int ua = edge[0]; //node1
        int ub = edge[1]; //node2
        unionFind[ua] = ub;
        totalEdges++;
    }
    //change unionFind for some broken edge
    for(int[] broken : edgesToRepair){
        int ua = find(unionFind, broken[0]);
        int ub = find(unionFind, broken[1]);
        if(ua == ub){
            unionFind[ua] = 0;
        }
    }
    // cheapest repairs first
    Arrays.sort(edgesToRepair, (a,b)->(a[2]-b[2]));
    int cost=0;
    for(int[] i : edgesToRepair){
        int ua = find(unionFind, i[0]);
        int ub = find(unionFind, i[1]);
        if(ua != ub){
            unionFind[ua] = ub;
            cost += i[2];
            totalEdges++;
        }
    }
    return totalEdges==N-1 ? cost : -1;
}
// Returns the root of `find`, compressing the path on the way back.
public int find(int[] uf, int find){
    if(uf[find]==0) return find;
    uf[find] = find(uf, uf[find]);
    return uf[find];
}
Above is my code so far.
My idea is:
First, add all edges (given in edges[][]) to the union-find, then update it based on the given edgesToRepair info (if an edge was broken, update the union-find accordingly).
Then just run a basic union-find algorithm to connect the nodes at minimum cost.
Is anything wrong with this approach?
First, I have trouble updating unionFind when an edge is broken.
I can't figure out how to handle the unbroken edges between two nodes.
Any advice would be helpful.

You're supposed to use Kruskal's algorithm to find a minimum cost spanning tree consisting of the existing edges and broken (repaired) edges:
https://en.wikipedia.org/wiki/Kruskal%27s_algorithm
Just treat the unbroken edges (those in edges that are not in edgesToRepair) as having 0 cost, while the broken edges cost their repair price.
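A minimal Python sketch of that suggestion, assuming nodes are labeled 1..n as in the question and that any edge listed in edgesToRepair is broken (so it is only usable at its repair cost). The function name min_cost_repair_edges is illustrative; it returns -1 when the graph cannot be connected even after repairing every broken edge.

```python
def min_cost_repair_edges(n, edges, edges_to_repair):
    """Kruskal's algorithm where unbroken edges cost 0 and broken edges cost
    their repair price. Returns -1 if the graph cannot be connected."""
    parent = list(range(n + 1))  # nodes are labeled 1..n; index 0 is unused

    def find(x):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    broken = {frozenset((a, b)) for a, b, _ in edges_to_repair}
    # Edges that are not broken are free; broken ones carry their repair cost.
    weighted = [(0, a, b) for a, b in edges if frozenset((a, b)) not in broken]
    weighted += [(cost, a, b) for a, b, cost in edges_to_repair]
    weighted.sort()

    total, unions = 0, 0
    for cost, a, b in weighted:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # edge joins two components: keep it
            parent[root_a] = root_b
            total += cost
            unions += 1
    # A spanning tree on n nodes needs exactly n - 1 unions.
    return total if unions == n - 1 else -1


print(min_cost_repair_edges(5, [[1, 2], [2, 3], [3, 4], [4, 5], [1, 5]],
                            [[1, 2, 12], [3, 4, 30], [1, 5, 8]]))  # 20
```

Sorting puts all free edges first, so a repair is only paid for when the existing edges cannot already join the two components it connects.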

Related

Find guaranteed ancestors in directed graph

I'm trying to implement an algorithm to find what I call 'guaranteed ancestors' in a directed graph. I have a list of nodes which each can point to zero, one or multiple child nodes.
Below you see an example of a simple graph. I've marked all circles with a unique number.
Let's imagine we're trying to determine which nodes I'm guaranteed to have visited before reaching node 13 starting at node 0.
My approach when solving this simple example by hand is to start at node 13 and work my way back, asking which nodes I am guaranteed to visit no matter which direction I go. The first node I notice with this property is node 10: from node 10, whether I choose to visit node 11 or node 12, I am guaranteed to eventually reach node 13. Similarly, I can conclude that I have to visit node 9 if I want to reach node 13. Working all the way up the graph, I conclude that node 13 has nodes 0, 1, 9 and 10 as its guaranteed ancestors.
I'm not sure what such an algorithm is called, but I'm sure there is a name for this specific search.
Here are the constraints you can assume about my graph:
There is a single defined "head/root" node, which is the only node without any other nodes pointing to it.
The graph is acyclic (Ideally the algorithm would be able to handle cycles too, but I have a different check, verifying that the graph is acyclic, so this is not a must.)
There are no "dead" nodes, i.e. nodes which can't be reached from the head/root node.
This has to run on more complicated graphs with up to 500 nodes and many nodes with multiple "parents", which could be connected back and forth. Runtime is a priority as well - I assume we should be able to solve this problem in linear time complexity.
I've tried simplifying the problem to the point where I tried making an algorithm which could determine if a single node was a guaranteed ancestor of another node, which I believe is pretty simple to determine in O(n). However, if I want a complete list of all guaranteed ancestors, I assume I'd have to run this algorithm for every node, leaving me with O(n^2).
Does anyone know the correct name of the algorithm I'm describing?
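The per-node check mentioned in the question (does every path from the root to the target pass through v?) can be sketched in Python as follows: remove v and test whether the target is still reachable. This is the O(n^2) baseline the asker wants to beat, not a linear-time algorithm. The example edges are copied from the graph in the JavaScript answer below, and reading each pair as a (parent, child) direction is an assumption; the function names are illustrative.

```python
from collections import defaultdict


def reachable_without(adj, root, target, removed):
    """Iterative DFS from root that never enters `removed`; True if target is reached."""
    if root == removed:
        return False
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        if node == target:
            return True
        for child in adj[node]:
            if child != removed and child not in seen:
                seen.add(child)
                stack.append(child)
    return False


def guaranteed_ancestors(adj, nodes, root, target):
    """Nodes that lie on every root->target path: one O(V + E) reachability
    check per candidate, so O(V * (V + E)) in total."""
    return [v for v in nodes
            if v != target and not reachable_without(adj, root, target, v)]


edges = [(0, 1), (1, 2), (1, 3), (2, 4), (2, 5), (4, 5), (4, 6), (3, 7),
         (7, 8), (6, 9), (8, 9), (9, 10), (10, 11), (10, 12), (11, 13), (12, 13)]
adj = defaultdict(list)
for parent, child in edges:
    adj[parent].append(child)
print(guaranteed_ancestors(adj, range(14), 0, 13))  # [0, 1, 9, 10]
```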
1. Assign a weight of 1 to every edge.
2. Run Dijkstra to find the shortest path between the head (root) and the target node.
3. Assign a weight of 2 * (edge count of graph) to every edge in that path.
4. Run Dijkstra again to find the cheapest path.
5. Identify edges that are present in both paths (they could not be avoided, although they were very expensive).
6. The nodes at both ends of every edge identified in step 5 will be critical, i.e. they must ALL be visited by any route between the head and the target.
Consider an example:
The first Dijkstra run would return a path containing node 1 or 2 (they both belong on 5-hop paths). The second run would return a path containing the other of those two nodes.
This is almost the definition of an articulation point (cut vertex) in an undirected graph. See Biconnected component:
a cut vertex is any vertex whose removal increases the number of connected components.
The difference is that your graph is directed, and that you consider the root also as such a vertex.
So my suggestion is to temporarily consider the graph to be undirected, and to apply a depth-first algorithm to identify such cut vertices, and include the root.
The algorithm is given as pseudo code in the same Wikipedia article. I have rewritten it in JavaScript, so it can be run here for the graph that you have given as example:
function buildAdjacencyList(n, edges) {
  // Indexes in adj represent node identifiers.
  // Values in adj are lists of neighbors: start out with empty lists
  let adj = [];
  for (let i = 0; i < n; i++) adj.push([]);
  for (let [start, end] of edges) {
    adj[start].push(end);
    adj[end].push(start); // make edge bidirectional
  }
  return adj;
}

function markArticulationPoints(nodes, node, depth) {
  node.visited = true;
  node.depth = depth;
  node.low = depth;
  for (let neighborId of node.neighbors) {
    let neighbor = nodes[neighborId];
    if (!neighbor.visited) {
      neighbor.parent = node;
      markArticulationPoints(nodes, neighbor, depth + 1);
      if (neighbor.low >= node.depth) node.isArticulation = true;
      if (neighbor.low < node.low) node.low = neighbor.low;
    } else if (neighbor != node.parent && neighbor.depth < node.low) {
      node.low = neighbor.depth;
    }
  }
}

function getArticulationPoints(adj, root) {
  // Create object for each node, having meta data for algorithm
  let nodes = [];
  for (let i = 0; i < adj.length; i++) {
    nodes.push({
      neighbors: adj[i],
      visited: false,
      depth: Infinity,
      low: Infinity,
      parent: -1,
      isArticulation: i == root // root is considered articulation point
    });
  }
  markArticulationPoints(nodes, nodes[root], 0); // start DFS algorithm
  // Collect articulation points from meta data
  let result = [];
  for (let i = 0; i < adj.length; i++) {
    if (nodes[i].isArticulation) result.push(i);
  }
  return result;
}

// Build adjacency list for example graph, but with undirected edges
let adj = buildAdjacencyList(14, [
  [0, 1],
  [1, 2],
  [1, 3],
  [2, 4],
  [2, 5],
  [4, 5],
  [4, 6],
  [3, 7],
  [7, 8],
  [6, 9],
  [8, 9],
  [9, 10],
  [10, 11],
  [10, 12],
  [11, 13],
  [12, 13]
]);
let result = getArticulationPoints(adj, 0);
console.log("Articulation points:", ...result);

Back edges in a graph

I'm having a hard time understanding Tarjan's algorithm for articulation points. I'm currently following this tutorial here: https://www.hackerearth.com/practice/algorithms/graphs/articulation-points-and-bridges/tutorial/. What I really can't see, and couldn't see in any other tutorial, is what exactly a "back edge" means. Considering the graph given there, I know 3-1 and 4-2 are back edges, but are 2-1, 3-2, and 4-3 back edges too? Thank you.
...a Back Edge is an edge that connects a vertex to a vertex that is discovered before its parent.
(from your source)
Think about it like this: when you apply a DFS on a graph, you fix some path that the algorithm chooses. In the given case it is 0->1->2->3->4. As the article mentions, the source graph also contains the edges 4-2 and 3-1. When the DFS reaches 3, it could choose 1, but 1 is already on your path, so 3-1 is a back edge and therefore, as mentioned in the source, a possible alternative path.
Addressing your second question: Are 2-1, 3-2, and 4-3 back edges too? For a different path they can be. Suppose your DFS chooses 0->1->3->2->4 then 2-1 and 4-3 are back edges.
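To check both traversal orders concretely, here is a small Python sketch that runs a DFS on an undirected graph and labels each edge as a tree edge or a back edge. It assumes the tutorial's graph is the path 0-1-2-3-4 plus the edges 3-1 and 4-2, as described above; the function name classify_edges is illustrative, and the neighbor order in each adjacency list is what decides which path the DFS takes.

```python
def classify_edges(adj, root):
    """DFS over an undirected graph. Returns (tree_edges, back_edges), where each
    back edge is reported once as (descendant, ancestor)."""
    depth = {root: 0}
    tree_edges, back_edges = [], []

    def dfs(u, parent):
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                tree_edges.append((u, v))
                dfs(v, u)
            elif v != parent and depth[v] < depth[u]:
                # v was discovered earlier and is not u's DFS parent: back edge
                back_edges.append((u, v))

    dfs(root, None)
    return tree_edges, back_edges


# Path 0-1-2-3-4 plus the extra edges 3-1 and 4-2, with two neighbor orders.
order_a = {0: [1], 1: [0, 2, 3], 2: [1, 3, 4], 3: [2, 4, 1], 4: [3, 2]}
order_b = {0: [1], 1: [0, 3, 2], 2: [3, 4, 1], 3: [1, 2, 4], 4: [2, 3]}
print(classify_edges(order_a, 0))  # DFS 0->1->2->3->4: back edges (4, 2), (3, 1)
print(classify_edges(order_b, 0))  # DFS 0->1->3->2->4: back edges (4, 3), (2, 1)
```

The `depth[v] < depth[u]` test records each back edge only from its lower end, so the same edge is not reported a second time when the DFS returns to the ancestor.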
Consider the following (directed) graph traversal with DFS. Here the colors of the nodes represent the following:
The floral-white nodes are the ones that are yet to be visited
The gray nodes are the nodes that are visited and on stack
The black nodes are the ones that are popped from the stack.
Notice that when the node 13 discovers the node 0 through the edge 13->0 the node 0 is still on the stack. Here, 13->0 is a back edge and it denotes the existence of a cycle (the triangle 0->1->13).
In essence, when you do a DFS on a graph with a cycle through nodes A, B and C, you might first discover the edge A-B and later the edge B-C. Having reached node C, you will then discover the edge C-A, but you need to ignore this path in your search to avoid an infinite loop. So in your search A-B and B-C are not back edges, but C-A is a back edge, since it leads back to an ancestor that is still on the DFS stack and thereby closes a cycle.
From the article mentioned:
Given a DFS tree of a graph, a Back Edge is an edge that connects a
vertex to a vertex that is discovered before its parent.
2-1, 3-2, 4-3 are not "Back edge" because they link the vertices with their parents in DFS tree.
Here is some code for a better understanding:
#include <iostream>
#include <vector>
using namespace std;

struct vertex {
    int node;
    int start;   // discovery time
    int finish;  // finishing time
    int color;
    int parent;
};

const int WHITE = 0, BLACK = 1, GREY = 2;
const int num_of_verts = 8;
vector<int> adjList[num_of_verts];
vertex vertices[num_of_verts];
int t = 0;

// Visits u and everything reachable from it that is still WHITE.
// Returns true if a back edge (and therefore a cycle) was found.
bool DFS_visit(int u) {
    bool cycleExists = false;
    vertices[u].color = GREY;  // u is now on the DFS stack
    vertices[u].start = ++t;
    for (int v : adjList[u]) {
        if (vertices[v].color == WHITE) {
            vertices[v].parent = u;  // u->v is a tree edge
            if (DFS_visit(v)) cycleExists = true;
        } else if (vertices[v].color == GREY) {
            // v is still on the stack, so u->v is a back edge
            cout << "Cycle detected at edge - (" << u << ", " << v << ")" << endl;
            cycleExists = true;
        }
        // BLACK: forward or cross edge, which does not close a cycle
    }
    vertices[u].color = BLACK;  // u is popped from the stack
    vertices[u].finish = ++t;
    return cycleExists;
}

void DFS() {
    for (int i = 0; i < num_of_verts; i++) {
        vertices[i].node = i;
        vertices[i].color = WHITE;
        vertices[i].parent = -1;  // no parent yet
    }
    t = 0;
    for (int i = 0; i < num_of_verts; i++) {
        if (vertices[i].color == WHITE) {
            cout << "Traversing component " << i << "-" << endl;
            bool cycle = DFS_visit(i);
            cout << (cycle ? "Cycle Exists\n\n" : "Cycle does not exist\n\n");
        }
    }
}

int main() {
    adjList[0] = {4};
    adjList[1] = {0, 5};
    adjList[2] = {1, 5};
    adjList[3] = {6, 7};
    adjList[4] = {1};
    // adjList[5] stays empty: node 5 has no outgoing edges
    adjList[6] = {2, 5};
    adjList[7] = {3, 6};
    DFS();
    return 0;
}

Determine if a graph contains a triangle?

This problem has an easy solution if our target time complexity is O(|V| * |E|) or O(V^3) and the like. However, my professor recently gave us an assignment with the problem statement being:
Let G = (V, E) be a connected undirected graph. Write an algorithm that determines if G contains a triangle in O(|V| + |E|).
At this point, I'm stumped. Wikipedia says:
It is possible to test whether a graph with m edges is triangle-free in time O(m^1.41).
There was no mention of the possibility for a faster algorithm besides one that runs on a Quantum computer. I started resorting to better sources afterwards. A question on Math.SE linked me to this paper that says:
The fastest algorithm known for finding and counting triangles relies on fast matrix product and has an O(n^ω) time complexity, where ω < 2.376 is the fast matrix product exponent.
And that's where I started to realize that maybe, we're being tricked into working on an unsolved problem! That dastardly professor!
However, I'm still a bit skeptical. The paper says "finding and counting". Is that equivalent to the problem I'm trying to solve?
TL;DR: Am I being fooled, or am I overlooking something so trivial?
Well, it turns out, this really isn't doable in O(|V| + |E|). Or at least, we don't know. I read 4 papers to reach this result. I stopped half-way into one of them, because I realized it was more focused on distributed computing than graph theory. One of them even gave probabilistic algorithms to determine triangle-freeness in "almost linear" time. The three relevant papers are:
Finding and counting given length cycles by Alon, Yuster & Zwick.
Testing Triangle-Freeness in General Graphs by Alon, Kaufman, Krivelevich & Ron.
Main-memory Triangle Computations for Very Large (Sparse (Power-Law)) Graphs by Latapy
I wrote about 2 pages of LaTeX for the assignment, quoting the papers with proper citations. The relevant statements in the papers are boxed:
In the end, I spoke to my professor and it turns out, it was in fact an unintended dire mistake. He then changed the required complexity to O(|V| * |E|). I don't blame him, he got me to learn more graph theory!
Here's the code for the O(|E|*|V|) version.
When you constrain |V|, the bit mask intersect-any operation is effectively O(1), which gets you O(|E|), but that's cheating.
Realistically the complexity is O(|E| * (|V| / C)), where C is some architecture-specific constant (e.g. 32, 64, 128).
function hasTriangle(v, e) {
  if (v.length > 32) throw Error("|V| too big, we can't pretend bit mask intersection is O(1) if |V| is too big!");
  // Assumes each edge is stored as [smaller, larger] and e is sorted by first
  // endpoint, so every vertex's edges are contiguous.
  // search[x] collects, as bits, the first endpoints u whose edge list contains x.
  var search = new Uint32Array(v.length);
  // loop through edges O(|E|)
  var lastEdge = [-1, -1];
  for (var i = 0, l = e.length; i < l; i++) {
    var edge = e[i];
    if (edge[0] == lastEdge[0]) {
      // Same first endpoint as the group's first edge: mark both targets
      search[lastEdge[1]] = search[lastEdge[1]] | (1 << edge[0]);
      search[edge[1]] = search[edge[1]] | (1 << edge[0]);
    } else {
      lastEdge = edge;
    }
    // bit mask intersection-any O(1), but unfortunately considered O(|V|).
    // Compare with 0 rather than > 0: with bit 31 set, & yields a negative number.
    if ((search[edge[0]] & search[edge[1]]) !== 0) {
      return true;
    }
  }
  return false;
}
var V = [0, 1, 2, 3, 4, 5];
var E_no_triangle = [[0, 4], [0, 5], [1, 2], [1, 3], [2, 5]];
var E_triangle = [[0, 1], [0, 2], [0, 3], [1, 2], [1, 4], [2, 3], [4, 5]]; // triangles (0, 1, 2) and (0, 2, 3)
console.log(hasTriangle(V, E_no_triangle)); // false
console.log(hasTriangle(V, E_triangle)); // true
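The same bit-mask idea can be written without the 32-vertex cap by using Python's arbitrary-precision integers as neighbor sets. This is a sketch of the straightforward O(|V| * |E|) method (an edge u-v lies on a triangle exactly when N(u) and N(v) intersect), not a linear-time algorithm; it does not need the edges sorted, and it assumes a simple graph with no self-loops. The function name has_triangle is illustrative.

```python
def has_triangle(n, edges):
    """True if the simple undirected graph on vertices 0..n-1 contains a triangle."""
    # nbr[u] is a bitset: bit w is set when u and w are adjacent.
    nbr = [0] * n
    for u, v in edges:
        nbr[u] |= 1 << v
        nbr[v] |= 1 << u
    # Edge u-v closes a triangle iff u and v share a neighbor. Each AND costs
    # O(|V| / word size), so the scan is O(|E| * |V|) in the worst case.
    return any(nbr[u] & nbr[v] for u, v in edges)


print(has_triangle(6, [[0, 4], [0, 5], [1, 2], [1, 3], [2, 5]]))  # False
print(has_triangle(6, [[0, 1], [0, 2], [0, 3], [1, 4], [2, 1], [2, 3], [4, 5]]))  # True
```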

Subgraph enumeration

What is an efficient algorithm for the enumeration of all subgraphs of a parent graph? In my particular case, the parent graph is a molecular graph, and so it will be connected and typically contain fewer than 100 vertices.
Edit: I am only interested in the connected subgraphs.
This question has a better answer in the accepted answer to this question. It avoids the computationally complex step marked "you fill in above function" in @ninjagecko's answer. It can deal efficiently with compounds where there are several rings.
See the linked question for the full details, but here's the summary. (N(v) denotes the set of neighbors of vertex v. In the "choose a vertex" step, you can choose any arbitrary vertex.)
GenerateConnectedSubgraphs(verticesNotYetConsidered, subsetSoFar, neighbors):
    if subsetSoFar is empty:
        let candidates = verticesNotYetConsidered
    else:
        let candidates = verticesNotYetConsidered intersect neighbors
    if candidates is empty:
        yield subsetSoFar
    else:
        choose a vertex v from candidates
        GenerateConnectedSubgraphs(verticesNotYetConsidered - {v},
                                   subsetSoFar,
                                   neighbors)
        GenerateConnectedSubgraphs(verticesNotYetConsidered - {v},
                                   subsetSoFar union {v},
                                   neighbors union N(v))
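A direct Python transcription of that pseudocode, as a sketch: adj maps each vertex to the frozenset of its neighbors (N(v)), min() stands in for the arbitrary "choose a vertex" step so the output order is reproducible, and the empty set is also yielded once (it comes out of the branch that excludes every vertex). The function name is illustrative.

```python
def generate_connected_subgraphs(not_yet_considered, subset_so_far, neighbors, adj):
    """Yield each connected vertex subset (as a frozenset) exactly once."""
    if not subset_so_far:
        candidates = not_yet_considered
    else:
        candidates = not_yet_considered & neighbors
    if not candidates:
        yield subset_so_far
        return
    v = min(candidates)  # any vertex works; min() keeps the output deterministic
    rest = not_yet_considered - {v}
    # Branch 1: v is never added in this part of the search tree.
    yield from generate_connected_subgraphs(rest, subset_so_far, neighbors, adj)
    # Branch 2: v is added, and its neighbors become candidates.
    yield from generate_connected_subgraphs(rest, subset_so_far | {v},
                                            neighbors | adj[v], adj)


# Path graph 1 - 2 - 3
adj = {1: frozenset({2}), 2: frozenset({1, 3}), 3: frozenset({2})}
for s in generate_connected_subgraphs(frozenset(adj), frozenset(), frozenset(), adj):
    print(sorted(s))
```

Because a vertex is removed from verticesNotYetConsidered in both branches, no subset can be produced twice, which is what saves the cache used in the answer below.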
Comparison with mathematical subgraphs:
You could give each element a number from 0 to N-1, then enumerate each subgraph as any binary number of length N. You wouldn't need to scan the graph at all.
If what you really want is subgraphs with a certain property (fully connected, etc.), that is different, and you'd need to update your question. As a commenter noted, 2^100 is very large, so you definitely don't want to (like above) enumerate the mathematically-correct-but-physically-boring disconnected subgraphs. Assuming a billion enumerations per second, it would literally take you at least 40 trillion years to enumerate them all.
Connected-subgraph-generator:
If you want some kind of enumeration that retains the DAG property of subgraphs under some metric, e.g. (1,2,3)->(2,3)->(2), (1,2,3)->(1,2)->(2), you'd just want an algorithm that could generate all CONNECTED subgraphs as an iterator (yielding each element). This can be accomplished by recursively removing a single element at a time (optionally from the "boundary"), checking if the remaining set of elements is in a cache (else adding it), yielding it, and recursing. This works fine if your molecule is very chain-like with very few cycles. For example, if your molecule were a 5-pointed star of N = 100 elements, it would only have about (100/5)^5 = 3.2 million results (less than a second). But if you start adding in more than a single ring, e.g. aromatic compounds and others, you might be in for a rough ride.
e.g. in python
class Graph(object):
    def __init__(self, vertices):
        self.vertices = frozenset(vertices)
        # add edge logic here and to methods, etc. etc.

    def disconnected_without(self, element):
        # Hypothetical helper standing in for the original
        # {{REMOVING ELEMENT WOULD DISCONNECT GRAPH}} placeholder; you fill
        # this in. Easy if there is 0 or 1 ring in the molecule (keep track
        # if the molecule has rings, e.g. self.numRings, maybe even more
        # data); if you know there are 0 rings the operation takes O(1) time.
        # No edges are modelled here, so it returns False, which reproduces
        # the demonstration output below.
        return False

    def subgraphs(self):
        cache = set()

        def helper(graph):
            yield graph
            for element in graph:
                if graph.disconnected_without(element):
                    continue
                subgraph = Graph(graph.vertices - {element})
                if subgraph not in cache:
                    cache.add(subgraph)
                    yield from helper(subgraph)

        yield from helper(self)

    def __eq__(self, other):
        return self.vertices == other.vertices

    def __hash__(self):
        return hash(self.vertices)

    def __iter__(self):
        return iter(self.vertices)

    def __repr__(self):
        return 'Graph(%s)' % repr(set(self.vertices))
Demonstration:
G = Graph({1,2,3,4,5})
for subgraph in G.subgraphs():
print(subgraph)
Result:
Graph({1, 2, 3, 4, 5})
Graph({2, 3, 4, 5})
Graph({3, 4, 5})
Graph({4, 5})
Graph({5})
Graph(set())
Graph({4})
Graph({3, 5})
Graph({3})
Graph({3, 4})
Graph({2, 4, 5})
Graph({2, 5})
Graph({2})
Graph({2, 4})
Graph({2, 3, 5})
Graph({2, 3})
Graph({2, 3, 4})
Graph({1, 3, 4, 5})
Graph({1, 4, 5})
Graph({1, 5})
Graph({1})
Graph({1, 4})
Graph({1, 3, 5})
Graph({1, 3})
Graph({1, 3, 4})
Graph({1, 2, 4, 5})
Graph({1, 2, 5})
Graph({1, 2})
Graph({1, 2, 4})
Graph({1, 2, 3, 5})
Graph({1, 2, 3})
Graph({1, 2, 3, 4})
There is an algorithm called gSpan [1] that has been used to find frequent subgraphs; it can also be used to enumerate all subgraphs. You can find an implementation of it here [2].
The idea is the following: graphs are represented by so-called DFS codes. A DFS code corresponds to a depth-first search on a graph G and has an entry of the form
(i, j, l(v_i), l(v_i, v_j), l(v_j)) for each edge (v_i, v_j) of the graph, where l(·) denotes a vertex or edge label and the vertex subscripts correspond to the order in which the vertices are discovered by the DFS. It is possible to define a total order on the set of all DFS codes (as is done in [1]) and, as a consequence, to obtain a canonical label for a given graph by taking the minimum over all DFS codes representing it: two graphs are isomorphic if and only if they have the same minimum DFS code. Starting from all possible DFS codes of length 1 (one per edge), all subgraphs of a graph can then be enumerated by adding one edge at a time to the codes, which gives rise to an enumeration tree where each node corresponds to a graph. If the enumeration is done carefully (i.e., compatibly with the order on the DFS codes), minimal DFS codes are encountered first. Therefore, whenever a DFS code is encountered that is not minimal, its corresponding subtree can be pruned. Please consult [1] for further details.
[1] https://sites.cs.ucsb.edu/~xyan/papers/gSpan.pdf
[2] http://www.nowozin.net/sebastian/gboost/

For every vertex in a graph, find all vertices within a distance d

In my particular case, the graph is represented as an adjacency list and is undirected and sparse, n can be in the millions, and d is 3. Calculating A^d (where A is the adjacency matrix) and picking out the non-zero entries works, but I'd like something that doesn't involve matrix multiplication. A breadth-first search on every vertex is also an option, but it is slow.
def find_d(graph, start, st, d=0):
    # Add start, then recurse into every neighbor with one step less to go.
    st.add(start)
    if d > 0:
        for neighbor in graph[start]:
            find_d(graph, neighbor, st, d - 1)
    return st

graph = {1: [2, 3],
         2: [1, 4, 5, 6],
         3: [1, 4],
         4: [2, 3, 5],
         5: [2, 4, 6],
         6: [2, 5]
         }
print(find_d(graph, 1, set(), 2))
Let's say that we have a function verticesWithin(d,x) that finds all vertices within distance d of vertex x.
One good strategy for a problem such as this, to expose caching/memoisation opportunities, is to ask the question: How are the subproblems of this problem related to each other?
In this case, we can see that, if d >= 1, verticesWithin(d,x) is the union of verticesWithin(d-1,y[i]) for all i within range, where y = verticesWithin(1,x). If d == 0, then it's simply {x}. (I'm assuming that a vertex is deemed to be of distance 0 from itself.)
In practice you'll want to look at the adjacency list for the case d == 1, rather than using that relation, to avoid an infinite loop. You'll also want to avoid the redundancy of considering x itself as a member of y.
Also, if the return type of verticesWithin(d,x) is changed from a simple list or set, to a list of d sets representing increasing distance from x, then
verticesWithin(d,x) = init(verticesWithin(d+1,x))
where init is the function that yields all elements of a list except the last one. Obviously this would be a non-terminating recursive relation if transcribed literally into code, so you have to be a little bit clever about how you implement it.
Equipped with these relations between the subproblems, we can now cache the results of verticesWithin, and use these cached results to avoid performing redundant traversals (albeit at the cost of performing some set operations - I'm not entirely sure that this is a win). I'll leave it as an exercise to fill in the implementation details.
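One way to fill in those details, sketched in Python under the answer's assumption that a vertex is at distance 0 from itself: vertices_within(d, x) is {x} plus the union of vertices_within(d - 1, y) over the neighbors y of x, cached with functools.lru_cache so that overlapping subproblems are computed once. The names are illustrative, and whether the cache (one frozenset per (d, x) pair) pays off on a graph with millions of vertices is exactly the open question raised above.

```python
from functools import lru_cache


def make_vertices_within(graph):
    """Return a memoised vertices_within(d, x) for an adjacency-list dict."""

    @lru_cache(maxsize=None)
    def vertices_within(d, x):
        if d == 0:
            return frozenset([x])
        result = {x}
        # Use the adjacency list directly instead of vertices_within(1, x),
        # so the recursion always moves to a smaller d.
        for y in graph[x]:
            result |= vertices_within(d - 1, y)
        return frozenset(result)

    return vertices_within


graph = {1: [2, 3], 2: [1, 4, 5, 6], 3: [1, 4], 4: [2, 3, 5], 5: [2, 4, 6], 6: [2, 5]}
within = make_vertices_within(graph)
# Results for different start vertices share the cached (d - 1, y) entries.
all_within_3 = {x: within(3, x) for x in graph}
```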
You already mention the option of calculating A^d, but this is much, much more than you need (as you already remark).
There is, however, a much cheaper way of using this idea. Suppose you have a (column) vector v of zeros and ones, representing a set of vertices. If v is the unit vector of the starting node, the vector w := A v has a one at every node that can be reached from the starting node in exactly one step. Iterating, u := A w has a nonzero entry (the number of two-step walks) for every node you can reach from the starting node in exactly two steps, and so on.
For d=3, you could do the following (MATLAB pseudo-code):
v = j'th unit vector
w = v
for i = (1:d)
    v = A*v
    w = w + v
end
The vector w now has a positive entry for each node that can be accessed from the jth node in at most d steps.
Breadth-first search starting with the given vertex is an optimal solution in this case. You will find all the vertices within distance d, and you will never even visit any vertices with distance >= d + 2.
Here is recursive code, although recursion can be easily done away with if so desired by using a queue.
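// Returns a Set of every node within distance d of x
// (Node and adjList(x), which returns the neighbors of x, are assumed to exist)
Set<Node> getNodesWithinDist(Node x, int d)
{
    Set<Node> s = new HashSet<Node>(); // our return value
    s.add(x); // x is at distance 0 from itself
    if (d > 0) {
        for (Node y : adjList(x)) {
            s.addAll(getNodesWithinDist(y, d - 1));
        }
    }
    return s;
}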
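The queue-based version mentioned above is a depth-limited breadth-first search. A minimal Python sketch, using the adjacency-list dict from the earlier example and an illustrative function name; unlike the recursive version, it records each vertex once, at its shortest distance, and never expands a vertex at depth d. To cover every vertex, call it once per start vertex.

```python
from collections import deque


def nodes_within_dist(graph, start, d):
    """Depth-limited BFS: every vertex at distance <= d from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if dist[u] == d:
            continue  # do not expand beyond depth d
        for v in graph[u]:
            if v not in dist:  # first visit is at the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


graph = {1: [2, 3], 2: [1, 4, 5, 6], 3: [1, 4], 4: [2, 3, 5], 5: [2, 4, 6], 6: [2, 5]}
print(sorted(nodes_within_dist(graph, 1, 1)))  # [1, 2, 3]
```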
