Finding the optimal directed graph of affected nodes with dependencies - algorithm

I have a set of nodes that represent values that affect each other. I want to find the logical best path of cascading calculation so that each node is calculated only a single time after all of the ancestors that affect it are calculated.
In the graph example, this would be something like:
[A, B, F]
[A, C, H, G, D]
[A, E]
Here, paths like [A, H] and [A, C, G] are skipped, since those nodes have a better route through other required dependencies that need to be calculated first.
I have tried a depth-first-search, but am having trouble thinking of a good data structure to represent this algorithm.
Any help would be greatly appreciated.
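A minimal sketch of one possible approach, assuming the problem can be treated as a topological sort (Kahn's algorithm). The edge list is only inferred from the paths listed above, since the original picture is not reproduced here, and the names `dependents` and `calculation_order` are made up for illustration.

```python
from collections import deque


def calculation_order(dependents):
    """Return the nodes so that each appears after everything that affects it.

    `dependents` maps a node to the nodes it directly affects.
    """
    # Count, for every node, how many nodes directly affect it.
    indegree = {node: 0 for node in dependents}
    for targets in dependents.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    # Nodes that nothing affects can be calculated immediately.
    ready = deque(node for node, count in indegree.items() if count == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in dependents.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:  # all ancestors are done
                ready.append(target)

    if len(order) != len(indegree):
        raise ValueError("graph contains a cycle")
    return order


# Assumed edges, reconstructed from the example paths in the question.
graph = {
    "A": ["B", "C", "E", "H"],
    "B": ["F"],
    "C": ["H", "G"],
    "H": ["G"],
    "G": ["D"],
    "D": [], "E": [], "F": [],
}
order = calculation_order(graph)
print(order)
```

A node enters the `ready` queue only after every node that affects it has been emitted, so each value is calculated exactly once.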

Related

Find all paths on undirected Graph

I have an undirected graph and I want to list all possible paths from a starting node.
Each connection between 2 nodes must be unique within a listed path. For example, given this graph representation:
{A: [B, C, D],
B: [A, C, D],
C: [A, B, D],
D: [A, B, C]}
One listed path starting from A is:
A, B, C, D, A, C
In this path we have a connection between A and B, so we can't also have a connection between B and A.
I can't accomplish this using the existing algorithms that I know, like DFS.
Any help would be much appreciated.
The simplest way would be to recursively try each neighbor and combine all the results.
This assumes there are no loops - if you allow loops (as in your example) there will be infinitely-many paths. In this case, you can make a path-generator by limiting the path-length to check for, then looping over all possible path-lengths.
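As a rough sketch of the length-limited recursive idea (the function name `paths_up_to` and the `max_edges` parameter are assumptions for illustration), a generator can try each neighbour in turn and stop extending once the bound is reached:

```python
def paths_up_to(graph, start, max_edges):
    """Yield every path from `start` that uses at most `max_edges` edges.

    Edges and vertices may repeat, so the bound is what keeps the
    enumeration finite when the graph has loops.
    """
    def extend(path):
        yield list(path)  # hand out a copy of the current path
        if len(path) - 1 < max_edges:
            for neighbour in graph[path[-1]]:
                path.append(neighbour)
                yield from extend(path)
                path.pop()  # backtrack before trying the next neighbour

    yield from extend([start])


graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
for path in paths_up_to(graph, "A", 2):
    print(path)
```

Raising `max_edges` step by step gives the "loop over all possible path-lengths" behaviour described above.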
Probably the most intuitive way would be to, as previously suggested, iterate over all neighbours of each potential starting point until all paths are exhausted.
However, the risk with this method is that it loops forever if the graph has cycles. This can be mitigated by keeping a list of visited vertices (or, probably preferable in your case, visited edges).
In pseudocode, that might give something like this:
    paths = []
    for node in graph
        visited = []
        path = [node]
        add node and path to stack-or-queue
        while node on stack-or-queue
            pop node and path
            for edges of node
                if edge is not visited
                    add edge to visited
                    add neighbour to path
                    add neighbour and path to stack-or-queue
            add path to paths
It produces an algorithm of relatively high complexity, so be sure to test it well to avoid subtle bugs.
Writing it recursively might be easier, although it removes the possibility of easily changing between DFS and BFS.
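One way the pseudocode might look in Python, as a sketch rather than a definitive implementation. It deviates from the pseudocode in one respect, which is an assumption on my part: the set of used edges is carried along with each path instead of being shared, so that different paths may reuse the same edge. The name `edge_unique_paths` is made up for illustration.

```python
def edge_unique_paths(graph, start):
    """List every path from `start` that uses each undirected edge at most once."""
    paths = []
    # Each stack entry holds the current node, the path so far,
    # and the edges that this particular path has already used.
    stack = [(start, [start], frozenset())]
    while stack:
        node, path, used = stack.pop()  # pop from the front instead for BFS
        if len(path) > 1:
            paths.append(path)
        for neighbour in graph[node]:
            edge = frozenset((node, neighbour))  # A-B equals B-A
            if edge not in used:
                stack.append((neighbour, path + [neighbour], used | {edge}))
    return paths


graph = {
    "A": ["B", "C", "D"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["A", "B", "C"],
}
paths = edge_unique_paths(graph, "A")
print(len(paths))
```

Because a finite graph has finitely many edges and no path may reuse one, the loop terminates even when the graph has cycles.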

Minimum Spanning Tree with Additional Vertex

Does a minimum spanning tree work for a situation like this? I want to go from A to B and I do not have to go through E, but the direct distance between A and B is larger than distance_AE + distance_EB, so I can go to E first and then to B. I'm not sure whether the normal implementation of MST also works for this kind of graph. If I want to find the MST of ABCD, and E is not included in this graph, how can I solve this?
I believe that you're confused about the basic problem: what you've posted is a contradiction. If E is not in the graph, then by definition dist(A, E) is undefined; for algorithmic purposes, it's Inf (infinity).
Yes, the MST algorithms work fine for this: the entire universe consists of (A, B, C, D).

How to find Strongly Connected Components in a Graph?

I am trying to self-study graph theory, and am now trying to understand how to find SCCs in a graph. I have read several different questions/answers on SO (e.g., 1,2,3,4,5,6,7,8), but I can't find one with a complete step-by-step example I could follow.
According to CORMEN (Introduction to Algorithms), one method is:
1. Call DFS(G) to compute finishing times f[u] for each vertex u
2. Compute Transpose(G)
3. Call DFS(Transpose(G)), but in the main loop of DFS, consider the vertices in order of decreasing f[u] (as computed in step 1)
4. Output the vertices of each tree in the depth-first forest of step 3 as a separate strongly connected component
Observe the following graph (question is 3.4 from here. I have found several solutions here and here, but I am trying to break this down and understand it myself.)
Step 1: Call DFS(G) to compute finishing times f[u] for each vertex u
Running DFS starting on vertex A:
Please notice the RED text, formatted as [Pre-Visit, Post-Visit]
Step 2: Compute Transpose(G)
Step 3. Call DFS(Transpose(G)), but in the main loop of DFS, consider the vertices in order of decreasing f[u] (as computed in step 1)
Okay, so vertices in order of decreasing post-visit(finishing times) values:
{E, B, A, H, G, I, C, D, F, J}
So at this step, we run DFS on G^T but start with each vertex from above list:
DFS(E): {E}
DFS(B): {B}
DFS(A): {A}
DFS(H): {H, I, G}
DFS(G): remove from list since it is already visited
DFS(I): remove from list since it is already visited
DFS(C): {C, J, F, D}
DFS(J): remove from list since it is already visited
DFS(F): remove from list since it is already visited
DFS(D): remove from list since it is already visited
Step 4: Output the vertices of each tree in the depth-first forest of step 3 as a separate strongly connected component.
So we have five strongly connected components: {E}, {B}, {A}, {H, I, G}, {C, J, F, D}
This is what I believe is correct. However, solutions I found here and here say SCCs are {C,J,F,H,I,G,D}, and {A,E,B}. Where are my mistakes?
Your steps are correct and so is your answer. Examining the other answers you linked shows that they used a different algorithm: first run DFS on G transposed, then run an undirected connected-components algorithm on G, processing the vertices in decreasing order of their post numbers from the previous step.
The problem is that they ran this last step on G transposed instead of on G, and thus got an incorrect answer. If you read Dasgupta from page 98 onwards, you will see a detailed explanation of the algorithm they tried to use.
Your answer is correct. As per CLRS, "A strongly connected component of a directed graph G = (V,E) is a maximal set of vertices C, such that for every pair of vertices u and v, we have both u ~> v and v ~> u, i.e. vertices v and u are reachable from each other."
If you assume {C, J, F, H, I, G, D} is correct, there is no way to reach G from D (among many other problems); likewise for the other set, there is no way to reach E from A.
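For reference, the four steps from the question can be sketched in Python roughly as follows. The graph used here is a small hypothetical one, not the graph from the exercise (whose picture is not reproduced here), and `kosaraju_scc` is just an illustrative name.

```python
from collections import defaultdict


def kosaraju_scc(graph):
    """Return the strongly connected components of a directed graph.

    `graph` maps each vertex to a list of its successors.
    The DFS is recursive, so this is only suitable for small graphs.
    """
    vertices = set(graph)
    for successors in graph.values():
        vertices.update(successors)

    visited = set()

    def dfs(u, adjacency, out):
        visited.add(u)
        for v in adjacency.get(u, ()):
            if v not in visited:
                dfs(v, adjacency, out)
        out.append(u)  # u is appended when it finishes

    # Step 1: DFS on G, recording vertices by increasing finishing time.
    finish_order = []
    for u in sorted(vertices):
        if u not in visited:
            dfs(u, graph, finish_order)

    # Step 2: compute the transpose of G.
    transpose = defaultdict(list)
    for u, successors in graph.items():
        for v in successors:
            transpose[v].append(u)

    # Step 3: DFS on the transpose in order of decreasing finishing time.
    visited.clear()
    components = []
    for u in reversed(finish_order):
        if u not in visited:
            component = []
            dfs(u, transpose, component)
            # Step 4: each DFS tree is one strongly connected component.
            components.append(component)
    return components


# Hypothetical example: a cycle A -> B -> C -> A feeding a cycle D <-> E.
example = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": ["E"], "E": ["D"]}
components = kosaraju_scc(example)
print(components)
```

Within each component every vertex can reach every other, which is the property the CLRS definition above requires.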

Most efficient algorithm to check if leaf c is in the same subtree as leaves a and b

Currently, I am working on a program, one of whose steps is to check if a leaf c is in the same subtree as two other leaves a and b, in a binary tree T. My current approach is as follows: first, find the LCA of each pair of leaves in T, and store it in a dictionary. Then, for each node in the tree, find all of the leaves that are descendants of it, and store them in a dictionary as well. Then when I need to determine if c is in the same subtree as a and b, I find the LCA of a and b, and check if c is a descendant of it.
I will need to run this step for many different pairs a and b, and do it on binary trees that have as many as 600 leaves, so is there a faster algorithm, or perhaps one that uses less memory, that does this same task? Thanks.
One useful observation that might help you here is the following: the smallest subtree containing leaves a and b is the subtree rooted at LCA(a, b). This means that you can test whether c is in the subtree by checking whether c is a descendant of LCA(a, b). One way to do this is the following: compute LCA(LCA(a, b), c). If c is in this subtree, then LCA(LCA(a, b), c) = LCA(a, b). Otherwise, it will be some other node. This gives a nice algorithm:
Return whether LCA(LCA(a, b), c) = LCA(a, b).
It might also help to use a fast LCA data structure. You mentioned precomputing the LCA of all pairs of nodes in the tree, but there are faster options. In particular, there are some nice algorithms that with O(n) preprocessing time can return the LCA of two nodes in a tree in time O(1) each. If you know the pairs in advance, check out Tarjan's offline LCA algorithm; if you don't, look up the Fischer-Heun LCA data structure.
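Here is a small sketch of the LCA(LCA(a, b), c) test. It uses a naive parent-pointer LCA rather than the O(1)-query structures mentioned above, and the `parent` dictionary (each node mapped to its parent, the root mapped to `None`) is an assumed representation, not something from the question.

```python
def depth(node, parent):
    """Number of edges between `node` and the root."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d


def lca(u, v, parent):
    """Lowest common ancestor of u and v, found by walking parent pointers."""
    du, dv = depth(u, parent), depth(v, parent)
    # Bring the deeper node up to the same depth as the other.
    while du > dv:
        u, du = parent[u], du - 1
    while dv > du:
        v, dv = parent[v], dv - 1
    # Climb in lockstep until the two walks meet.
    while u != v:
        u, v = parent[u], parent[v]
    return u


def in_same_subtree(a, b, c, parent):
    """True if c lies in the smallest subtree containing a and b."""
    ab = lca(a, b, parent)
    return lca(ab, c, parent) == ab


# Hypothetical tree: r has children x and y; x has leaves a, b; y has leaves c, d.
parent = {"r": None, "x": "r", "y": "r", "a": "x", "b": "x", "c": "y", "d": "y"}
print(in_same_subtree("a", "b", "c", parent))
print(in_same_subtree("a", "c", "d", parent))
```

Each query costs time proportional to the height of the tree, which may already be enough for trees with a few hundred leaves.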
Hope this helps!

Algorithm for laying out a directed acyclic graph in memory to maximise data locality

Say I have the edges
A -> C
A -> D
A -> E
B -> D
B -> E
To maximise data locality I'd arrange for the DAG to be stored in memory (as an array) in the following order, to minimise the distance between each node and its dependencies.
C, A, D, E, B
so that A has a distance 1 to C, 1 to D, 2 to E.
And that B has a distance of 1 to E, and 2 to D.
Is there a name for an algorithm that does this? If not, how would one implement this?
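To make the objective concrete, here is a sketch that scores a layout as the sum of array distances over all edges and, for very small graphs only, finds a best layout by trying every permutation. The function names are made up for illustration, and summing the distances is an assumed cost measure. The objective resembles what is usually called the minimum linear arrangement problem, which is NP-hard in general, so brute force will not scale.

```python
from itertools import permutations


def layout_cost(order, edges):
    """Sum of array distances between the endpoints of every edge."""
    position = {node: index for index, node in enumerate(order)}
    return sum(abs(position[u] - position[v]) for u, v in edges)


def best_layout(nodes, edges):
    """Exhaustive search over all orderings; feasible only for tiny graphs."""
    return min(permutations(nodes), key=lambda order: layout_cost(order, edges))


edges = [("A", "C"), ("A", "D"), ("A", "E"), ("B", "D"), ("B", "E")]

# The layout from the question: distances 1 + 1 + 2 for A and 2 + 1 for B.
print(layout_cost(["C", "A", "D", "E", "B"], edges))  # 7
print(best_layout("ABCDE", edges))
```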
It looks like you want to linearize the DAG. I don't know whether you are using it for dependency resolution. Topological sorting looks similar to your question; the program tsort also does a very similar thing.
However, that is dependency linearization.
neel#gentoo:~$ tsort
C A
D A
E A
D B
E B
C
D
E
B
A
This prints the order in which the tasks have to be performed. It may not work if there is a cycle, but it is applicable here since you mentioned the graph is acyclic.
I don't know whether there is any such algorithm for a data-locality string or anything similar. However, it looks like your data-locality string has a problem.
What if C is close (1) to A and also close (1) to B, while B is too far (4) from A? How will you represent that in your data-locality string?
I don't know exactly what you want to do. If you want to linearize dependencies to perform tasks in the proper order, then do a topological sort.
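For comparison with the tsort session, the same ordering can be sketched with Python's standard `graphlib` module (Python 3.9 or later). The `predecessors` mapping mirrors the tsort input pairs, where "C A" means C comes before A.

```python
from graphlib import TopologicalSorter

# Each key maps to the set of nodes that must come before it.
predecessors = {
    "A": {"C", "D", "E"},
    "B": {"D", "E"},
}

# static_order() raises graphlib.CycleError if the graph has a cycle.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

The exact order among independent nodes is not fixed; only the constraint that every node follows its predecessors is guaranteed.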
Here is slightly different approach to improve locality:
http://ceur-ws.org/Vol-733/paper_pacher.pdf
The described algorithm seems to be closer to force-directed graph drawing algorithm than to topological sorting.
You should also read papers on in-memory graph databases such as imGraph.

Resources