How to represent a dependency graph with alternative paths - algorithm

I'm having some trouble trying to represent and manipulate dependency graphs in this scenario:
a node has some dependencies that have to be solved
no path may contain a dependency loop (the graph must be a DAG)
every dependency could be solved by more than one other node
I start from the target node and recursively look for its dependencies, but I have to maintain the above properties, in particular the third one.
Just a little example here:
I would like to have a graph like the following one
          (A)
         /   \
        /     \
       /       \
[(B),(C),(D)]  (E)
  / \      \
 /   \     (H)
(F)  (G)
which means:
F, G, C, H, E have no dependencies
D depends on H
B depends on F and G
A depends on E, and on (B or C or D)
So, if I write down all the possible topologically-sorted paths to A, I should have:
E -> F -> G -> B -> A
E -> C -> A
E -> H -> D -> A
How can I model a graph with these properties? Which kind of data structure is the most suitable for this?

You should use a normal adjacency list, with an additional property: a node knows the other nodes that would also satisfy the same dependency. This means that B, C, D should all know that they belong to the same equivalence class. You can achieve this by inserting them all into a shared set.
Node:
    List<Node> adjacencyList
    Set<Node> equivalentDependencies
To use this data structure in a topological sort, whenever you remove a source and remove all its outgoing edges, also remove the nodes in its equivalence class, their outgoing edges, and recursively remove the nodes that point to them.
From Wikipedia:
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edges
while S is non-empty do
    remove a node n from S
    add n to tail of L
    for each node o in the equivalency class of n do    <=== removing equivalent dependencies
        remove o from S
        for each node k with an edge e from o to k do
            remove edge e from the graph
            if k has no other incoming edges then
                insert k into S
    for each node m with an edge e from n to m do
        remove edge e from the graph
        if m has no other incoming edges then
            insert m into S
if graph has edges then
    return error (graph has at least one cycle)
else
    return L (a topologically sorted order)
This algorithm will give you one of the modified topologically-sorted paths.
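For concreteness, here is a minimal Python sketch of this structure and the modified Kahn's algorithm, assuming edges point from a dependency to its dependents and that discarded alternatives simply drop out of the result (all names are illustrative):

class Node:
    def __init__(self, name):
        self.name = name
        self.adjacency = []        # outgoing edges: nodes that depend on us
        self.equivalents = set()   # alternative nodes for the same dependency

def topo_sort_with_alternatives(nodes):
    # Kahn's algorithm, modified: when a node is emitted, the other members
    # of its equivalence class are discarded along with their outgoing edges.
    indeg = {n: 0 for n in nodes}
    for n in nodes:
        for k in n.adjacency:
            indeg[k] += 1
    S = {n for n in nodes if indeg[n] == 0}    # current sources
    L, removed = [], set()

    def drop_edges(v):
        # Remove v's outgoing edges; enqueue targets with no remaining deps.
        for k in v.adjacency:
            indeg[k] -= 1
            if indeg[k] == 0 and k not in removed:
                S.add(k)

    while S:
        n = S.pop()
        L.append(n)
        removed.add(n)
        for o in n.equivalents:    # discard the alternatives of n
            if o not in removed:
                S.discard(o)
                removed.add(o)
                drop_edges(o)
        drop_edges(n)

    if len(removed) != len(nodes):
        raise ValueError("graph has at least one cycle")
    return L

On the example above (with B.equivalents = {C, D}, etc.), popping C first discards B and D, so A only waits on E, and the result contains the path E, C, A, matching one of the listed orders.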

Algorithm for "balanced" breadth-first search

I'm looking for references for an algorithm that conducts a breadth-first tree search in a balanced manner, one that is resilient in a situation where
we expect most nodes to have few children, but
a few nodes may have many (possibly infinitely many) children.
Consider this simple tree (modified from this post):
        A
       / \
      B   C
     /   / \
    D   E   F
    |  /|\   \
    G HIJ...  K
Depth-first visits nodes in this order:
A B D G C E H I J ... F K
Breadth-first visits nodes in this order:
A B C D E F G H I J ... K
The balanced breadth-first algorithm that I have in mind would visit nodes in this order:
A B C D E G F H K I J ...
Note how
we visit G before F, and
we visit K after H but before I.
G is deeper than F, but it is an only child of B whereas F is a second child of C and must share its search priority with E. Similarly between K and the many children H, I, J, ... of E.
I call this "balanced" because a node with lots of children cannot choke the algorithm. Concretely, if E has 𝜔 (infinitely) many nodes then a pure breadth-first strategy would never reach K, whereas the "balanced" algorithm would still reach K after H but before the other children of E.
(The reader who does not like 𝜔 can attain a similar effect with a large but still finite number such as "the greatest number of steps any practical search algorithm will ever make, plus 1".)
I can only imagine that this style of search or something like it must have been the subject of much research and practical application. I would be grateful to be pointed in the right direction. Thank you.
Transform your tree to a different kind of representation. In this new representation, each node has at most two links: one to its leftmost child, and one to its right sibling.
        A
       / \
      B   C
     /   / \
    D   E   F
    |  /|\   \
    G HIJ...  K

      ⇓

A
|
B --> C
|     |
D     E --> F
|    /       \
G   /         K
   /
  H --> I --> J --> ...
Then treat this representation as a normal binary tree, and traverse it breadth-first. It may have an infinite height, but like with any binary tree, the width at any particular level is finite.
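Here is a minimal Python sketch of this idea, assuming each node exposes an iterable of children (possibly a lazy, even infinite, generator). It runs a plain BFS over the implicit left-child/right-sibling binary view without materializing the transformed tree. On the question's example it yields A B D C G E H F I K J ..., the breadth-first order of the binary form (close to, though not identical to, the order sketched in the question):

from collections import deque

class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = children

def balanced_bfs(root):
    # The two binary links of a node are (first child, next sibling).
    # Queue entries: (node, iterator over its remaining right siblings).
    queue = deque([(root, iter(()))])
    while queue:
        node, siblings = queue.popleft()
        yield node.label
        kids = iter(node.children)
        first = next(kids, None)
        if first is not None:      # left link: first child
            queue.append((first, kids))
        nxt = next(siblings, None)
        if nxt is not None:        # right link: next sibling
            queue.append((nxt, siblings))

# The example tree from the question (E's children could be infinite):
tree = Node("A", [
    Node("B", [Node("D", [Node("G")])]),
    Node("C", [Node("E", [Node("H"), Node("I"), Node("J")]),
               Node("F", [Node("K")])]),
])
print(" ".join(balanced_bfs(tree)))   # A B D C G E H F I K J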
Depth-first search, breadth-first search, A* search (and others) differ only in how you handle the list of "nodes still to visit".
(I assume you always process the node at the head of the list next.)
depth-first-search: append new nodes at the front of the list
breadth-first-search: add new nodes to the end of the list
A*-search: insert new nodes in the list according to costs+heuristic
So you need to formalize how to insert new nodes into the list to fulfill your requirements.
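As a sketch of that formalization (illustrative code, not a library API): one generic routine, parameterized by the insertion strategy.

def generic_search(start, children, insert):
    # `children` returns a node's children; `insert` encodes the strategy.
    frontier = [start]
    while frontier:
        node = frontier.pop(0)     # always process the head of the list
        yield node
        for child in children(node):
            insert(frontier, child)

# Depth-first: push new nodes at the front (children end up in reverse
# order, which is still depth-first).  Breadth-first: append at the end.
# A*: insert according to cost + heuristic, e.g. keep the frontier as a
# priority queue instead of a list.
def dfs(start, children):
    return generic_search(start, children, lambda f, n: f.insert(0, n))

def bfs(start, children):
    return generic_search(start, children, lambda f, n: f.append(n))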

Why, when we change the cost of every edge in G to c' = log_17(c), is every MST in G still an MST in G' (and vice versa)?

Remarks: c' is log c with base 17; MST means minimum spanning tree.
It's easy to prove the conclusion is correct when we use a linear function to transform the cost of every edge. But the log function is not linear, and I could not understand why this conclusion is correct.
Supplementary notes:
I did not consider specific algorithms, such as the greedy algorithm. I simply considered the relationship between the sums of the weights of the two trees after transformation.
Numerically, if (a + b) > (c + d), then (log a + log b) may not be > (log c + log d).
If one spanning tree of G has two edges with weights a and b, another spanning tree of G has edges with weights c and d, with a + b < c + d, then the first tree is an MST; but in the transformed graph G', the sum of the weights of the second tree's edges may be smaller.
Because of this, I wanted to construct a counterexample based on "if (a + b) > (c + d), (log a + log b) may not be > (log c + log d)", but I failed.
One way to characterize when a spanning tree T is a minimum spanning tree is that, for every edge e not in T, the cycle formed by e and edges of T (the fundamental cycle of e with respect to T) has no edge more expensive than e. Using this characterization, I hope you see how to prove that transforming the costs with any increasing function preserves minimum spanning trees.
There's a one-line proof that this condition is necessary: if the fundamental cycle contained a more expensive edge, we could replace it with e and get a spanning tree that costs less than T.
It's less obvious that this condition is sufficient, since at first glance it looks like we're trying to prove global optimality from a local optimality condition. To prove this statement, let T be a spanning tree that satisfies the condition, let T' be a minimum spanning tree, and let G' be the graph whose edges are the union of the edges of T and T'. Run Kruskal's algorithm on G', breaking ties by favoring edges in T over edges not in T. Let T'' be the resulting minimum spanning tree in G'. Since T' is a spanning tree in G', the cost of T'' is not greater than T', hence T'' is a minimum spanning tree in G as well as G'.
Suppose to the contrary that T'' ≠ T. Then there exists an edge in T but not in T''. Let e be the first such edge considered by Kruskal's algorithm. At the time that e was considered, it formed a cycle C in the edges that had been selected from T''. Since T is acyclic, C \ T is nonempty. By the tie breaking criterion, we know that every edge in C \ T costs less than e. Observing that some edge e' in C \ T must have one endpoint in each of the two connected components of T \ {e}, we infer that the fundamental cycle of e' with respect to T contains e, which violates the local optimality condition. In conclusion, T = T'', hence is a minimum spanning tree in G.
If you want a deeper dive, this logic gets abstracted out in the theory of matroids.
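Not a proof, but a quick empirical check of this in Python: Kruskal's algorithm only ever compares edge costs, and since log_17 is strictly increasing it preserves the sorted order of the edges, so both runs select the same tree (random weights, assumed distinct so there are no ties):

import itertools
import math
import random

def kruskal(n, edges):
    # edges: iterable of (cost, u, v); returns the frozenset of chosen (u, v)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x
    chosen = set()
    for cost, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            chosen.add((u, v))
    return frozenset(chosen)

random.seed(1)
n = 8
edges = [(random.uniform(1.0, 100.0), u, v)
         for u, v in itertools.combinations(range(n), 2)]
logged = [(math.log(c, 17), u, v) for c, u, v in edges]
assert kruskal(n, edges) == kruskal(n, logged)   # same MST edges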
Well, it's pretty easy to understand... let's see if I can break it down for you:
c' = log_17(c) // here 17 is the base
log is not a linear function... but we can say that:
log_b(x) > log_b(y) if x > y and b > 1 (and of course x > 0 and y > 0)
I hope you get the equation I've written... In words it means: consider a base b such that b > 1; then log_b(x) is greater than log_b(y) if x > y.
So, if we apply this rule to the edge costs, we see that the edges that were selected for the MST of G would still be selected when constructing the MST of G' with c' = log_17(c).
UPDATE: As I can see you have trouble understanding the proof, I'm elaborating a bit:
I guess you know that MST construction is greedy. We're going to use Kruskal's algorithm to show why the claim is correct. (In case you don't know how Kruskal's algorithm works, you can read about it anywhere; you'll find millions of resources.) Now, let me write some steps of Kruskal's edge selection for the MST of G:
// the following edges are sorted by cost, i.e. c_0 <= c_1 <= c_2 ...
c_0: A, F // edge c_0 connects A and F; we take it into the MST
c_1: A, B // also taken into the MST
c_2: B, R // also taken into the MST
c_3: A, R // not taken, because A and R are already connected through A -> B -> R
c_4: F, X // also taken into the MST
...
and so on...
Now, when constructing the MST of G', we have to select edges using the transformed costs c' = log_17(c) // where 17 is the base
If we convert the costs using log base 17, then c_0 becomes c_0', c_1 becomes c_1', and so on...
But we know that:
log_b(x) > log_b(y) if x > y and b > 1 (and of course x > 0 and y > 0)
So we may say that:
log_17(c_0) <= log_17(c_1), because c_0 <= c_1
and in general:
log_17(c_i) <= log_17(c_j), where i <= j
And now we may say:
c_0' <= c_1' <= c_2' <= c_3' <= ...
So, the edge selection process to construct the MST of G' would be:
// the following edges are sorted by cost, i.e. c_0' <= c_1' <= c_2' ...
c_0': A, F // edge c_0' connects A and F; we take it into the MST
c_1': A, B // also taken into the MST
c_2': B, R // also taken into the MST
c_3': A, R // not taken, because A and R are already connected through A -> B -> R
c_4': F, X // also taken into the MST
...
and so on...
Which is the same as the MST of G...
That ultimately proves the theorem...
I hope you get it... if not, ask me in the comments about what is not clear to you...

Check if all given vertices are on a path

The problem is
Given a graph and N sets of vertices, how to check whether the vertices
in each set lie on a path (which need not be simple) in the
graph.
For example, if a set of vertices is {A, B, C}:
1) On the graph
   A ---> B ---> C
the vertices A, B and C are on a path.
2) On the graph
   A ---> B ---> ...
          ^
   C -----+
the vertices A, B and C are not on a path.
3) On the graph
   A <--- B <--- ...
          |
   C <----+
the vertices A, B and C are not on a path.
4) On the graph
   A ---> X -----> B ---> C --> ...
   |               ^
   +---------------+
the vertices A, B and C are on a path.
Here is a simple algorithm with complexity O(N * (V + E)).
for each set S of vertices with more than 1 element {
    initialize a map M from a vertex to another vertex it reaches;
    mark all vertices of G unvisited;
    pick and remove a vertex v from S;
    while S is not empty {
        among all unvisited nodes, find one vertex v' in S that v can reach;
        if v' does not exist {
            pick and remove a vertex v from S;
            continue;
        }
        M[v] = v';
        remove v' from S;
        v = v';
    }
    // Check that the relations in M form a single path
    {
        if size(M) != (size of the original set) - 1 {
            return false;
        }
        take the vertex v of the original set that is not a key in M;
        while M is not empty {
            if there is no v' such that M[v'] = v {
                return false;
            }
            remove the entry for v' from M;
            v = v';
        }
        return true;
    }
}
The for-loop takes N steps; the while-loop visits every node and edge in the worst case, at cost O(V + E).
Is there any known algorithm to solve the problem?
If the graph is a DAG, could we have a better solution?
Thank you
Acyclicity here is not a meaningful assumption, since we can merge each strong component into one vertex. The paths are a red herring too; with a bunch of two-node sets, we're essentially making a bunch of reachability queries in a DAG. Doing this faster than O(n²) is thought to be a hard problem: https://cstheory.stackexchange.com/questions/25298/dag-reachability-with-on-log-n-space-and-olog-n-time-queries .
You should check out the Floyd-Warshall algorithm. It will give you the lengths of the shortest paths between all pairs of vertices in the graph in O(V³). Once you have those results, you can do a brute-force depth-first traversal over orderings of the set to determine whether you can go from one node to the next and the next, etc. That should happen in O(nⁿ) where n is the number of vertices in your current set. Total complexity, then, should be O(V³ + N·nⁿ) (or something like that).
The nⁿ seems daunting, but if n is small compared to V it won't be a big deal in practice.
I can guess that one could improve on that given certain restrictions on the graph.
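A sketch of that approach in Python, with one simplification: since only reachability matters here, it uses the boolean (transitive-closure) variant of Floyd-Warshall instead of shortest-path lengths, and brute-forces the orderings of each set:

import itertools

def transitive_closure(adj):
    # adj[i][j] is True iff there is an edge i -> j.
    V = len(adj)
    reach = [row[:] for row in adj]
    for k in range(V):
        for i in range(V):
            if reach[i][k]:
                for j in range(V):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

def on_some_path(reach, vertices):
    # The vertices lie on a common (not necessarily simple) path iff some
    # ordering of them is chained by reachability.
    return any(all(reach[a][b] for a, b in zip(p, p[1:]))
               for p in itertools.permutations(vertices))

# Example 2 from the question: A -> B, C -> B (vertices 0, 1, 2).
adj = [[False, True, False],
       [False, False, False],
       [False, True, False]]
print(on_some_path(transitive_closure(adj), [0, 1, 2]))   # False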

Shortest path that traverses a list of required edges

I have a directed graph that looks like this: [figure not included]
I want to find the cheapest path from Start to End where the orange dotted lines are all required for the path to be valid.
The natural shortest path would be: Start -> A -> B -> End with the resultant cost = 5, but we have not met all required edge visits.
The path I want to find (via a general solution) is Start -> A -> B -> C -> D -> B -> End where the cost = 7 and we have met all required edge visits.
Does anyone have any thoughts on how to require such edge traversals?
Let R be the set of required edges and F = |R|. Let G be the input graph, t (resp. s) the starting (resp. ending) point of the requested path.
Preprocessing: A bunch of Dijkstra's algorithm runs...
The first step is to create another graph. This graph will have exactly F+2 vertices:
One for each edge in R
One for the starting point t of the path you want to compute
One for the ending point s of the path you want to compute
To create this graph, you will have to do the following:
Remove every edge in R from G.
For each edge E = (b,e) in R:
    Compute the shortest path from t to b and the shortest path from e to s. If they exist, add an edge linking t to E in the "new graph", weighted by the length of the t→b path, and an edge linking E to s, weighted by the length of the e→s path.
    For each edge E' = (b',e') in R \ {E}:
        Compute the shortest path from e to b'. If it exists, add an edge from E to E' in the new graph, weighted by the length of that shortest path.
    Attach the computed paths as payload to the relevant edges.
The complexity of building this graph is O((F+2)² · (E+V) · log V), where E (resp. V) is the number of edges (resp. vertices) in the original graph.
Exhaustive search for the best possible path
From this point, we have to find the shortest Hamiltonian Path in the newly created graph. Unfortunately, this task is a hard problem. We have no better way than exploring every possible path. But that doesn't mean we can't do it cleverly.
We will perform the search using backtracking. We can achieve this by maintaining two sets:
The list of currently explored vertices: K (K for Known)
The list of currently unexplored vertices: U (U for Unknown)
Before digging into the algorithm definition, here are the main ideas. We cannot do anything other than explore the whole space of possible paths in the new graph. At each step, we have to make a decision: which edge do we take next? This leads to a sequence of decisions until we cannot move anymore or we have reached s. But then we need to go back and cancel decisions to see if we can do better by changing a direction. To cancel decisions we proceed like this:
Every time we are stuck (or have found a path), we cancel the last decision we made.
Each time we make a decision at some point, we keep track of which decision it was, so that when we get back to this point we know not to take that very same decision and instead explore the others that are available.
We can be stuck because:
We found a path.
We cannot move further (there is no edge we can explore or the only one we could take increases the current partial path too much -- its length becomes higher than the length of the current best path found).
The final algorithm can be summed up in this fashion (I give an iterative implementation; a recursive implementation is a tad easier and clearer):
Let K ← [], L[0..F] ← [] and U ← V (where V is the set of every vertex in the working graph minus the starting and ending vertices t and s). Finally let l ← 0, i ← 0, best_path_length ← ∞, best_path ← [] and cur_path ← [].
While i ≥ 0:
    While U ≠ []:
        c ← U.popFront()   (we take the head of U)
        L[i].pushBack(c)
        If there is an edge e between K.tail() and c, and weight(e) + l < best_path_length:
            (if K is empty, replace K.tail() with t in the statement above)
            K.pushBack(c)
            i ← i + 1
            l ← l + weight(e)
            cur_path.pushBack(c)
            If i == F and there is an edge from c to s and l + weight(c, s) < best_path_length:
                best_path_length ← l + weight(c, s)
                best_path ← cur_path
    (backtrack one level)
    Concatenate L[i] at the end of U
    L[i] ← []
    i ← i - 1
    If i ≥ 0:
        l ← l - weight of the edge that was used to reach K.tail()
        K.popBack()
        cur_path.popBack()
At the end of the outer loop, best_path will hold the best path (in the new graph). From there you just have to read the edges' payloads to rebuild the path in the original graph.
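For illustration, here is a compact Python sketch of the preprocessing idea, with one simplification: instead of the backtracking search above, it tries every ordering of the required edges outright (the same exhaustive search, minus the pruning), which is fine when F is small. All names are illustrative:

import heapq
import itertools

def dijkstra(graph, src):
    # graph: dict mapping vertex -> list of (neighbor, weight)
    dist = {src: 0}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def cheapest_with_required(graph, start, end, required):
    req = set(required)
    # Remove the required edges before computing the connecting shortest
    # paths, as the answer prescribes.
    stripped = {u: [(v, w) for v, w in nbrs if (u, v) not in req]
                for u, nbrs in graph.items()}
    sources = {start} | {v for _, v in required}
    dist = {u: dijkstra(stripped, u) for u in sources}
    weight = {(u, v): w for u, nbrs in graph.items() for v, w in nbrs}
    best = float("inf")
    # Try every ordering of the required edges (F! orderings).
    for order in itertools.permutations(required):
        cost, cur = 0, start
        for u, v in order:
            cost += dist[cur].get(u, float("inf")) + weight[(u, v)]
            cur = v
        cost += dist[cur].get(end, float("inf"))
        best = min(best, cost)
    return best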

Transitive set merging

Is there a well-known algorithm, that given a collection of sets, would merge together every two sets that have at least one common element? So for example, given the following input:
A B C
B D
A E
F G H
I J
K F
L M N
E O
It would produce:
A B C D E O
F G H K
I J
L M N
I already have a working implementation, but it seems to be common enough that there has to be a name for what I am doing.
You can model this as a simple graph problem: Introduce a node for every distinct element. Introduce a node for every set. Connect every set to the elements it contains. You get an (undirected) bipartite graph, in which the connected components are the solution of your problem. You can use depth-first search to find the CCs.
The runtime should be linear (with hash tables, so only expected runtime unless your numbers are bounded).
I don't think it deserves a special name, it's just an application of well-known concepts.
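A minimal Python sketch of that construction, using an element-to-sets index in place of explicit bipartite edges:

from collections import defaultdict

def merge_sets(sets):
    # One node per input set and one per element; each connected component,
    # restricted to the element side, is one merged set.
    element_to_sets = defaultdict(list)
    for i, s in enumerate(sets):
        for x in s:
            element_to_sets[x].append(i)

    seen_sets, seen_elems, merged = set(), set(), []
    for i in range(len(sets)):
        if i in seen_sets:
            continue
        seen_sets.add(i)
        component, stack = set(), [i]
        # Depth-first search alternating between set nodes and element nodes.
        while stack:
            j = stack.pop()
            for x in sets[j]:
                if x not in seen_elems:
                    seen_elems.add(x)
                    component.add(x)
                    for k in element_to_sets[x]:
                        if k not in seen_sets:
                            seen_sets.add(k)
                            stack.append(k)
        merged.append(component)
    return merged

example = [{"A", "B", "C"}, {"B", "D"}, {"A", "E"}, {"F", "G", "H"},
           {"I", "J"}, {"K", "F"}, {"L", "M", "N"}, {"E", "O"}]
for group in merge_sets(example):
    print(sorted(group))   # ABCDEO / FGHK / IJ / LMN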
