Control Flow Graph Traversal - BFS, but make sure predecessors are visited first

I have a situation where the predecessors of a node must be visited before the node itself. Here is my code for that:
var visited = new HashSet<Node>();
nodeQ.Enqueue(rootNode);
while (!nodeQ.Empty())
{
    node = nodeQ.Dequeue();
    // check that every predecessor has already been visited
    bool skip = false;
    foreach (var predecessor in node.Predecessors)
    {
        if (!visited.Contains(predecessor))
        {
            // put the node back into the queue and try it again later
            nodeQ.Enqueue(node);
            skip = true;
            break;
        }
    }
    if (skip) continue;
    Visit(node);
    visited.Add(node);
    foreach (var successor in node.Successors)
    {
        if (!visited.Contains(successor))
        {
            nodeQ.Enqueue(successor);
        }
    }
}
The above algorithm is fine for control flow graphs without cycles (read: loops). A normal BFS traversal doesn't ensure that the predecessors of a node are visited before the node itself.
Example CFG:
The normal BFS traversal will be:
0, 1, 2, 3, 12, 4, 5, 9, 10, 11, 8, 6, 7
However, I want the order to be 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, which can be achieved by the small modification in my code shown at the beginning.
However, this modification will cause endless skipping of blocks when there are loops involved.
Example CFG where this can fail:
In this scenario, my code will endlessly postpone visiting nodes 1, 2, and 3.
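(As an aside: for CFGs without loops, the endless re-enqueueing can be avoided entirely by counting unvisited predecessors instead of rescanning them, which is just Kahn's topological sort. A minimal Python sketch, assuming a dict-of-successors graph format of my own choosing; note that it stalls on loops in exactly the same way, which is why the dominator-tree solution below is still needed.)
from collections import deque
def visit_predecessors_first(succ, root):
    # count each node's predecessors; a node is enqueued only once
    # all of its predecessors have been visited
    pending = {n: 0 for n in succ}
    for n in succ:
        for s in succ[n]:
            pending[s] += 1
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)          # Visit(node)
        for s in succ[node]:
            pending[s] -= 1
            if pending[s] == 0:     # all predecessors of s are visited
                queue.append(s)
    return order
print(visit_predecessors_first({0: [1, 2], 1: [3], 2: [3], 3: []}, 0))  # [0, 1, 2, 3]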
So I was looking for a traversal that visits the nodes of a CFG (with or without loops) in such a way that the predecessors of a node are visited before the node itself.
I was thinking of identifying back edges, i.e., if a node N is a dominator of its predecessor P, then P->N is a back edge and there is no need to consider P as a predecessor of node N. However, this doesn't seem to work, as node N doesn't always have to dominate node P.

I solved this problem by first computing dominators, building the dominator tree, and then traversing that tree in DFS pre-order.
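For reference, here is a rough Python sketch of that approach (the dict-of-successors graph format and names are my own, and every node is assumed reachable from the root). It uses the simple O(n^2) iterative dominator computation; production compilers typically use Lengauer-Tarjan or the Cooper-Harvey-Kennedy iteration instead:
from collections import defaultdict
def dominator_tree_preorder(succ, root):
    nodes = list(succ)
    pred = defaultdict(list)
    for n in nodes:
        for s in succ[n]:
            pred[s].append(n)
    # dom[n] = set of nodes that dominate n; iterate to a fixed point
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == root:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True
    # the immediate dominator of n is the strict dominator of n that is
    # itself dominated by all of n's other strict dominators
    children = defaultdict(list)
    for n in nodes:
        if n == root:
            continue
        strict = dom[n] - {n}
        idom = next(d for d in strict if all(e in dom[d] for e in strict))
        children[idom].append(n)
    # DFS pre-order over the dominator tree: every node comes after
    # all of its dominators, even when the CFG has loops
    order, stack = [], [root]
    while stack:
        n = stack.pop()
        order.append(n)
        stack.extend(reversed(children[n]))
    return order
# a CFG with the loop 1 -> 2 -> 3 -> 1 no longer causes endless postponing:
print(dominator_tree_preorder({0: [1], 1: [2], 2: [3], 3: [1, 4], 4: []}, 0))  # [0, 1, 2, 3, 4]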

For others who come to this question seeking guidance on CFG and dominator computation, the IBM WALA tools may be worth a look; I found a lot of useful information there for my own work along these lines.

Related

Find guaranteed ancestors in directed graph

I'm trying to implement an algorithm to find what I call 'guaranteed ancestors' in a directed graph. I have a list of nodes, each of which can point to zero, one, or multiple child nodes.
Below you see an example of a simple graph. I've marked all circles with a unique number.
Let's imagine we're trying to determine which nodes I'm guaranteed to have visited before reaching node 13 starting at node 0.
My thought when solving this simple example by hand is to start at node 13 and work my way back: which nodes am I guaranteed to visit no matter which direction I go? The first node I notice obeying this property is node 10, since no matter if I choose to visit node 11 or node 12, I'm guaranteed to eventually reach node 13. Similarly, I can conclude I have to visit node 9 if I want to reach node 13. Working all the way up the graph, I conclude that node 13 has nodes 0, 1, 9, 10 as its guaranteed ancestors.
I'm not sure what such an algorithm is called, but I'm sure there is a name for this specific search.
Here are the constraints you can assume about my graph.
There is a single defined "head/root" node, which is the only node without any other nodes pointing to it.
The graph is acyclic (Ideally the algorithm would be able to handle cycles too, but I have a different check, verifying that the graph is acyclic, so this is not a must.)
There is no "dead" nodes, eg. nodes which can't be reached from the head/root node.
This has to run on more complicated graphs with up to 500 nodes and many nodes with multiple "parents", which could be connected back and forth. Runtime is a priority as well - I assume we should be able to solve this problem in linear time complexity.
I've tried simplifying the problem to the point where I made an algorithm that determines whether a single node is a guaranteed ancestor of another node, which I believe is pretty simple to do in O(n). However, if I want a complete list of all guaranteed ancestors, I assume I'd have to run this algorithm for every node, leaving me with O(n^2).
Does anyone know the correct name of the algorithm I'm describing?
1. Assign a weight of 1 to every edge.
2. Run Dijkstra to find the shortest path between the root and the target node.
3. Assign a weight of 2 * (edge count of the graph) to every edge on that path.
4. Run Dijkstra again to find the cheapest path.
5. Identify the edges that are present in both paths (they could not be avoided, although now very expensive).
6. The nodes at both ends of every edge identified in step 5 are critical, i.e., they must ALL be visited by any route between the root and the target.
Consider an example:
The first Dijkstra run would return a path containing node 1 or node 2 (they both lie on 5-hop paths). The second run would return a path containing the other of those two nodes.
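In Python, a sketch of those steps (the dict-of-successors graph format and helper names are my own assumptions), applied to the question's example graph:
import heapq
def shortest_path(adj, weight, start, goal):
    # standard lazy Dijkstra; returns the node list of one cheapest path
    dist, prev, heap = {start: 0}, {}, [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float('inf')):
            continue  # stale heap entry
        for v in adj[u]:
            nd = d + weight[(u, v)]
            if nd < dist.get(v, float('inf')):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
def critical_nodes(adj, start, goal):
    edges = [(u, v) for u in adj for v in adj[u]]
    weight = dict.fromkeys(edges, 1)                     # step 1
    first = shortest_path(adj, weight, start, goal)      # step 2
    first_edges = set(zip(first, first[1:]))
    for e in first_edges:                                # step 3
        weight[e] = 2 * len(edges)
    second = shortest_path(adj, weight, start, goal)     # step 4
    common = first_edges & set(zip(second, second[1:]))  # step 5
    return {n for e in common for n in e}                # step 6
# the question's example graph (directed); should print [0, 1, 9, 10]
adj = {0: [1], 1: [2, 3], 2: [4, 5], 3: [7], 4: [5, 6], 5: [], 6: [9],
       7: [8], 8: [9], 9: [10], 10: [11, 12], 11: [13], 12: [13], 13: []}
print(sorted(critical_nodes(adj, 0, 13)))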
This is almost what the definition of an articulation or cut vertex is in an undirected graph. See Biconnected component:
a cut vertex is any vertex whose removal increases the number of connected components.
The difference is that your graph is directed, and that you consider the root also as such a vertex.
So my suggestion is to temporarily consider the graph to be undirected, and to apply a depth-first algorithm to identify such cut vertices, and include the root.
The algorithm is given as pseudo code in the same Wikipedia article. I have rewritten it in JavaScript, so it can be run here for the graph that you have given as example:
function buildAdjacencyList(n, edges) {
    // Indexes in adj represent node identifiers.
    // Values in adj are lists of neighbors: start out with empty lists
    let adj = [];
    for (let i = 0; i < n; i++) adj.push([]);
    for (let [start, end] of edges) {
        adj[start].push(end);
        adj[end].push(start); // make edge bidirectional
    }
    return adj;
}
function markArticulationPoints(nodes, node, depth) {
    node.visited = true;
    node.depth = depth;
    node.low = depth;
    for (let neighborId of node.neighbors) {
        let neighbor = nodes[neighborId];
        if (!neighbor.visited) {
            neighbor.parent = node;
            markArticulationPoints(nodes, neighbor, depth + 1);
            if (neighbor.low >= node.depth) node.isArticulation = true;
            if (neighbor.low < node.low) node.low = neighbor.low;
        } else if (neighbor != node.parent && neighbor.depth < node.low) {
            node.low = neighbor.depth;
        }
    }
}
function getArticulationPoints(adj, root) {
    // Create an object for each node, holding metadata for the algorithm
    let nodes = [];
    for (let i = 0; i < adj.length; i++) {
        nodes.push({
            neighbors: adj[i],
            visited: false,
            depth: Infinity,
            low: Infinity,
            parent: -1,
            isArticulation: i == root // root is considered an articulation point
        });
    }
    markArticulationPoints(nodes, nodes[root], 0); // start DFS algorithm
    // Collect articulation points from the metadata
    let result = [];
    for (let i = 0; i < adj.length; i++) {
        if (nodes[i].isArticulation) result.push(i);
    }
    return result;
}
// Build adjacency list for example graph, but with undirected edges
let adj = buildAdjacencyList(14, [
    [0, 1], [1, 2], [1, 3], [2, 4],
    [2, 5], [4, 5], [4, 6], [3, 7],
    [7, 8], [6, 9], [8, 9], [9, 10],
    [10, 11], [10, 12], [11, 13], [12, 13]
]);
let result = getArticulationPoints(adj, 0);
console.log("Articulation points:", ...result);

Augmenting red-black tree for minDiff

So I have the following question:
You have a set of numbers, S, that you are storing in a red-black tree. You are trying to add an operation minDiff to the red-black tree, which gives you the absolute difference between the two closest numbers in S. For example, if S = {1, 18, 23, 62, 79, 100}, minDiff would return 5 (|23 - 18|).
A) Show how to augment a red-black tree to support this operation efficiently while maintaining the O(lg n) running time for Insert, Search and Delete.
B) Show how to output the values of the two numbers that produced minDiff. For the example above you would output 23 and 18.
My confusion:
I am stuck on the very beginning parts of the question, namely what to augment. I can think of simple and inefficient solutions such as having each node hold the absolute difference between itself and its parent. However, it seems like there should be some elegant solution that doesn't require you looking at every value of the tree to determine the solution.
I wish I could show more of my work, but I am completely stumped and don't know where to start!
The information you add to the tree has to meet 2 requirements:
It has to let you calculate minDiff quickly; and
You have to be able to recalculate the parent information from the information in its two children. This lets you quickly fix up the information in any nodes affected by inserts, deletes, and rebalancing operations.
The answer that immediately comes to mind is to augment each node in the tree with the minDiff among nodes in its subtree and the minimum and maximum values in its subtree.
node.minVal = node.left ? node.left.minVal : node.val
node.maxVal = node.right ? node.right.maxVal : node.val
node.minDiff = min(
    node.left ? node.left.minDiff : infinity,
    node.right ? node.right.minDiff : infinity,
    node.left ? node.val - node.left.maxVal : infinity,
    node.right ? node.right.minVal - node.val : infinity
)

Why does Dijkstra's algorithm need a priority queue when this regular queue version is also correct?

I have read the following but please take a look at my code below.
Why Dijkstra's Algorithm uses heap (priority queue)?
I have two versions of Dijkstra: a good version with a PriorityQueue, and a bad version with a regular linked-list queue.
public static void computeDijkstra(Vertex source) {
    source.minDistance = 0.;
    Queue<Vertex> vertexQueue = new PriorityQueue<Vertex>();
    // Queue<Vertex> vertexQueue = new LinkedList<Vertex>();
    vertexQueue.add(source);
    while (!vertexQueue.isEmpty()) {
        Vertex fromVertex = vertexQueue.poll();
        if (fromVertex.neighbors != null) {
            for (Edge currentEdge : fromVertex.neighbors) {
                Vertex toVertex = currentEdge.target;
                if (currentEdge.weight + fromVertex.minDistance < toVertex.minDistance) {
                    toVertex.minDistance = currentEdge.weight + fromVertex.minDistance;
                    toVertex.previous = fromVertex;
                    vertexQueue.add(toVertex);
                }
            }
        }
    }
}
public static void computeDijkstraBad(Vertex source) {
    source.minDistance = 0.;
    // Queue<Vertex> vertexQueue = new PriorityQueue<Vertex>();
    Queue<Vertex> vertexQueue = new LinkedList<Vertex>();
    vertexQueue.add(source);
    while (!vertexQueue.isEmpty()) {
        Vertex fromVertex = vertexQueue.poll();
        if (fromVertex.neighbors != null) {
            for (Edge currentEdge : fromVertex.neighbors) {
                Vertex toVertex = currentEdge.target;
                if (currentEdge.weight + fromVertex.minDistance < toVertex.minDistance) {
                    toVertex.minDistance = currentEdge.weight + fromVertex.minDistance;
                    toVertex.previous = fromVertex;
                    vertexQueue.add(toVertex);
                }
            }
        }
    }
}
I also create the graph from a text file like the one below:
0, 1, 2, 3, 4, 5, 6 // vertices
0, 6 // from and to vertex
1, (2-5, 0-4, 4-6) // from vertex 1 it will have edge to 2 with weight 5 ...
0, (4-3, 3-7)
4, (2-11, 3-8)
3, (2-2, 5-5)
2, (6-2, 5-10)
5, (6-3)
Both implementations render [0, 3, 2, 6], which is indeed the shortest path from 0 to 6!
Now we know that if a simple BFS is used to find shortest distances with positive integer weights, there will be cases where it does not find the minimum path. So, can someone give me a counterexample for which my bad implementation will fail to print the right path for a graph? Feel free to give the answer in the graph format (the sample text-file format) that I used.
So far, for all the graphs I have tried, both implementations rendered the right result. This shouldn't happen, because the bad implementation runs in O(E+V), and we know we can't find shortest paths in less than O(E log V).
Another example,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
0, 10
0, (1-9, 2-10, 3-11)
1, (4-1, 5-7)
2, (4-4, 5-3, 6-5)
3, (5-1, 6-4)
4, (7-9, 8-14, 5-3)
5, (7-4, 8-5, 9-9, 6-2)
6, (8-2, 9-2)
7, (10-3)
8, (10-2)
9, (10-5)
Both implementations render [0, 3, 5, 6, 8, 10], which is the correct shortest path from 0 to 10.
I believe that the algorithm you've given is correct, but that it isn't as efficient as Dijkstra's algorithm.
At a high level, your algorithm works by finding an "active" node (one whose distance has been lowered), then scanning the outgoing edges to activate all adjacent nodes that need their distance updated. Notice that the same node can be made active multiple times - in fact, it's possible that a node will be activated once per time its candidate distance drops, which can happen potentially many times in a run of the algorithm. Additionally, the algorithm you have implemented might put the same node into the queue multiple times if the candidate distance drops multiple times, so it's possible that all dequeues except the first will be unnecessary. Overall, I'd expect this would result in a pretty big runtime hit for large graphs.
In a sense, the algorithm you've implemented is a shortest paths algorithm, but it's not Dijkstra's algorithm. The main difference is that Dijkstra's algorithm uses a priority queue to ensure that every node is dequeued and processed once and exactly once, leading to much higher efficiency.
So I guess the best answer I can give is "your algorithm isn't an implementation of Dijkstra's algorithm, and the reason Dijkstra's algorithm uses a priority queue is to avoid recomputing distances multiple times in the way that your algorithm does."
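To get a feel for the size of that effect, here is a rough Python experiment (not the poster's Java; the random graph and all names are my own) that runs both strategies and counts dequeues. Both compute the same distances, but the plain queue dequeues nodes many more times:
import heapq
import random
from collections import defaultdict, deque
def relax_all(adj, source, use_heap):
    # returns (distances, number of dequeues)
    dist = defaultdict(lambda: float('inf'))
    dist[source] = 0
    heap, queue = [(0, source)], deque([source])
    done, pops = set(), 0
    while heap if use_heap else queue:
        if use_heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue        # stale entry, node already settled
            done.add(u)
        else:
            u = queue.popleft()
        pops += 1
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                if use_heap:
                    heapq.heappush(heap, (dist[v], v))
                else:
                    queue.append(v)
    return dict(dist), pops
random.seed(1)
adj = defaultdict(list)
for _ in range(1000):
    u, v = random.sample(range(200), 2)
    adj[u].append((v, random.randint(1, 100)))
d_heap, pops_heap = relax_all(adj, 0, use_heap=True)
d_queue, pops_queue = relax_all(adj, 0, use_heap=False)
assert d_heap == d_queue   # same shortest distances either way
print("heap dequeues:", pops_heap, "plain-queue dequeues:", pops_queue)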
Your algorithm will find the right result, but your approach kills off the efficiency of Dijkstra's.
Example:
Consider 3 nodes named A, B, C.
A->C: 7
A->B: 2
B->C: 3
In your bad approach, you'll first set the shortest path from A to C as 7, and then, as you traverse, you will revise it to 5 (A->B->C).
In Dijkstra's approach, A->C will not be traversed at all because, when using a min-heap, A->B will be traversed first, B will be marked as "traversed", then B->C will be traversed, and C will be marked as "traversed".
Now, since C is already marked as "traversed", the path A->C (of length 7) will never be checked.
Therefore, as you can see, in your bad approach, you will be reaching C 2 times (A->C & A->B->C), while using Dijkstra's approach, you will go to C only once.
This example should prove that you will have fewer iterations with Dijkstra's algorithm.
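For reference, that three-node example can be written in the poster's text-file format (assuming A=0, B=1, C=2) as:
0, 1, 2
0, 2
0, (2-7, 1-2)
1, (2-3)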
Read the other answers; they are right. In summary, the bad implementation is correct, but the complexity the poster claims, O(E+V), is not right. One other thing that no one else has mentioned, I'll add here.
This poor implementation also corresponds to a known algorithm, actually a derivative of BFS, formally known as SPFA (Shortest Path Faster Algorithm). When the author published this algorithm back in 1994, he claimed it performs better than Dijkstra with O(E) complexity, which is wrong. It became very popular among ACM students due to its simplicity and ease of implementation, but for performance, Dijkstra is preferred.
All of the answers are very well written, so I won't be adding much explanation.
I am adding an example that I came across in a comment made by Striver in one of his videos of the graph playlist. Hope it will help to see the issue clearly.
If we don't use a priority queue while implementing Dijkstra's algorithm for an undirected graph with arbitrary edge weights, then we might have to visit the same node again and again; a plain queue cannot account for arbitrary edge weights.
Take the example and dry-run it with a priority queue and with a simple queue: you will see that you have to visit node 6 twice in the normal-queue dry run. So, in the case of much larger graphs, this will result in very poor performance.
When we use a plain queue, unnecessary repeat visits of nodes arise, while with a priority queue we avoid them.

Function of this graph pseudocode

procedure explore(G, v)
Input:  G = (V, E) is a graph; v ∈ V
Output: visited(u) is set to true for all nodes u reachable from v

visited(v) = true
previsit(v)
for each edge (v, u) ∈ E:
    if not visited(u): explore(u)
postvisit(v)
All this pseudocode does is find one path right? It does nothing while backtracking if I'm not wrong?
It just explores the graph (it doesn't return a path) - everything that's reachable from the starting vertex will be explored and have the corresponding value in visited set (not just the vertices corresponding to one of the paths).
It moves on to the next edge while backtracking ... and it does postvisit.
So if we're at a, which has edges to b, c and d, we'll start by going to b; then, when we eventually return to a, we'll go to c (if it hasn't been visited already), and we will similarly go to d after returning to a for the second time.
It's called depth-first search, in case you were wondering. Wikipedia also gives an example of the order in which the vertices of a tree get explored (the numbers correspond to the visit order, starting at 1): you don't just explore the vertices going down the left side (1-4); after 4 you go back to 3 to visit 5, then back to 2 to visit 6, and so on, until all 12 are visited.
With regard to previsit and postvisit: previsit happens when we first get to a vertex; postvisit happens after we've explored all of its children (and their descendants in the corresponding DFS tree). So, in the above example, for 1, previsit happens right at the start, but postvisit happens only at the very end, because all the vertices are children of 1 or descendants of those children. The order will go something like:
pre 1, pre 2, pre 3, pre 4, post 4, pre 5, post 5, post 3, pre 6, post 6, post 2, ...
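Here is a small runnable sketch of explore (the dict-of-lists graph and the print statements are my own) that makes the previsit/postvisit ordering visible:
def explore(G, v, visited=None):
    if visited is None:
        visited = set()
    visited.add(v)                    # visited(v) = true
    print("pre", v)                   # previsit(v)
    for u in G.get(v, []):            # for each edge (v, u) in E
        if u not in visited:
            explore(G, u, visited)
    print("post", v)                  # postvisit(v)
    return visited
# a has edges to b, c, d, as in the example above; b also reaches c
G = {"a": ["b", "c", "d"], "b": ["c"]}
explore(G, "a")
# prints: pre a, pre b, pre c, post c, post b, pre d, post d, post a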

Finding size of max independent set in binary tree - why faulty "solution" doesn't work?

Here is a link to a similar question with a good answer: Java Algorithm for finding the largest set of independent nodes in a binary tree.
I came up with a different answer, but my professor says it won't work and I'd like to know why (he doesn't answer email).
The question:
Given an array A with n integers, its indexes start with 0 (i.e., A[0], A[1], ..., A[n-1]). We can interpret A as a binary tree in which the two children of A[i] are A[2i+1] and A[2i+2], and the value of each element is the node weight of the tree. In this tree, we say that a set of vertices is "independent" if it does not contain any parent-child pair. The weight of an independent set is just the summation of all weights of its elements. Develop an algorithm to calculate the maximum weight of any independent set.
The answer I came up with used the following two assumptions about independent sets in a binary tree:
All nodes on the same level are independent from each other.
All nodes on alternating levels are independent from each other (there are no parent/child relations)
Warning: I came up with this during my exam, and it isn't pretty, but I just want to see if I can argue for at least partial credit.
So, why can't you just build two independent sets (one for odd levels, one for even levels)?
If any of the weights in each set are non-negative, sum them (discarding the negative elements, because they won't contribute to a largest-weight set) to find the independent set with the largest weight.
If the weights in the set are all negative (or equal to 0), sort it and return the negative number closest to 0 for the weight.
Compare the weights of the largest independent set from each of the two sets and return it as the final solution.
My professor claims it won't work, but I don't see why. Why won't it work?
Interjay has noted why your answer is incorrect. The problem can be solved with a recursive algorithm find-max-independent which, given a binary tree, considers two cases:
What is the max independent set given that the root node is included?
What is the max independent set given that the root node is not included?
In case 1, since the root node is included, neither of its children can be. Thus we sum the values of find-max-independent over the grandchildren of the root, add the value of the root (which must be included), and return that.
In case 2, we return the sum of find-max-independent over the children nodes, if any (the two child subtrees are independent of each other, so both can contribute).
The algorithm may look something like this (in python):
def find_max_independent(A):
    N = len(A)
    def children(i):
        for n in (2*i + 1, 2*i + 2):
            if n < N:
                yield n
    def gchildren(i):
        for child in children(i):
            for gchild in children(child):
                yield gchild
    memo = [None] * N
    def rec(root):
        "finds max independent set weight in the subtree rooted at root; memoizes results"
        assert root < N
        if memo[root] is not None:
            return memo[root]
        # option 'root not included': sum the children's best values
        # (the two child subtrees are independent of each other)
        without_root = sum(rec(child) for child in children(root))
        # option 'root included': possibly pick the root,
        # plus the sum of the best values for the grandchildren
        with_root = max(0, A[root]) + sum(rec(gchild) for gchild in gchildren(root))
        val = max(with_root, without_root)
        assert val >= 0
        memo[root] = val
        return val
    return rec(0) if N > 0 else 0
Some test cases:
tests = [
    [[1, 2, 3, 4, 5, 6], 16],     #1
    [[-100, 2, 3, 4, 5, 6], 15],  #2
    [[1, 200, 3, 4, 5, 6], 206],  #3
    [[1, 2, 3, -4, 5, -6], 8],    #4
    [[], 0],
    [[-1], 0],
]
for A, expected in tests:
    actual = find_max_independent(A)
    print("test: {}, expected: {}, actual: {} ({})".format(A, expected, actual, expected == actual))
Sample output:
test: [1, 2, 3, 4, 5, 6], expected: 16, actual: 16 (True)
test: [-100, 2, 3, 4, 5, 6], expected: 15, actual: 15 (True)
test: [1, 200, 3, 4, 5, 6], expected: 206, actual: 206 (True)
test: [1, 2, 3, -4, 5, -6], expected: 8, actual: 8 (True)
test: [], expected: 0, actual: 0 (True)
test: [-1], expected: 0, actual: 0 (True)
The complexity of the memoized algorithm is O(n), since rec is evaluated once per node. This is a top-down dynamic programming solution using depth-first search.
Your algorithm doesn't work because the set of nodes it returns will be either all from odd levels, or all from even levels. But the optimal solution can have nodes from both.
For example, consider a tree where all weights are 0 except for two nodes which have weight 1. One of these nodes is at level 1 and the other is at level 4. The optimal solution will contain both these nodes and have weight 2. But your algorithm will only give one of these nodes and have weight 1.
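For instance, a concrete version of that counterexample in the question's array layout (indices chosen by me) can be checked against find_max_independent from above:
# index 1 sits on level 1 (odd) and index 15 on level 4 (even); 15 is a
# descendant of 1 but not a child, so both weight-1 nodes can be taken together
A = [0] * 31
A[1] = A[15] = 1
print(find_max_independent(A))  # 2, whereas the level-parity approach yields 1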
