Efficient way to recursively calculate dominator tree? - algorithm

I'm using the Lengauer and Tarjan algorithm with path compression to calculate the dominator tree for a graph with millions of nodes. The algorithm is quite complex and, I have to admit, I haven't taken the time to fully understand it; I'm just using it. Now I need to calculate the dominator trees of the direct children of the root node, and possibly recurse down the graph to a certain depth, repeating this operation. That is, when I calculate the dominator tree for a child of the root node, I want to pretend that the root node has been removed from the graph.
My question is whether there is an efficient solution to this that makes use of the immediate dominator information already calculated in the initial dominator tree for the root node. In other words, I don't want to start from scratch for each of the children, because the whole process is quite time consuming.
Naively it seems it must be possible since there will be plenty of nodes deep down in the graph that have idoms just a little way above them and are unaffected by changes at the top of the graph.
BTW, just as an aside: it's bizarre that the subject of dominator trees is "owned" by compiler people and there is no mention of it in books on classic graph theory. The application I'm using it for - my FindRoots java heap analyzer - is not related to compiler theory.
Clarification: I'm talking about directed graphs here. The "root" I refer to is actually the node with the greatest reachability. I've updated the text above replacing references to "tree" with "graph". I tend to think of them as trees because the shape is mainly tree-like. The graph is actually of the objects in a java heap and as you can imagine is reasonably hierarchical. I have found the dominator tree useful when doing OOM leak analysis because what you are interested in is "what keeps this object alive?" and the answer ultimately is its dominator. Dominator trees allow you to <ahem> see the wood rather than the trees. But sometimes lots of junk floats to the top of the tree so you have a root with thousands of children directly below it. For such cases I would like to experiment with calculating the dominator trees rooted at each of the direct children (in the original graph) of the root and then maybe go to the next level down and so on. (I'm trying not to worry about the possibility of back links for the time being :)
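For readers who (like the asker) use dominator computations without digging into Lengauer-Tarjan, here is a hedged sketch of the simpler iterative algorithm of Cooper, Harvey and Kennedy, which also produces immediate dominators. It is worst-case quadratic rather than near-linear, so it's illustrative only and not a substitute at the asker's scale; all names below are invented for the example.

```python
def immediate_dominators(succ, root):
    """Cooper-Harvey-Kennedy iterative idom computation for a rooted digraph.

    succ: dict mapping node -> list of successor nodes.
    Returns a dict idom, with idom[root] == root.
    """
    # Reverse postorder of the reachable subgraph.
    order, visited = [], {root}
    def dfs(u):
        for v in succ.get(u, []):
            if v not in visited:
                visited.add(v)
                dfs(v)
        order.append(u)
    dfs(root)
    order.reverse()
    rpo = {u: i for i, u in enumerate(order)}

    # Predecessor lists restricted to reachable nodes.
    pred = {u: [] for u in order}
    for u in order:
        for v in succ.get(u, []):
            if v in rpo:
                pred[v].append(u)

    idom = {root: root}

    def intersect(a, b):
        # Walk both "fingers" up the current dominator tree until they meet.
        while a != b:
            while rpo[a] > rpo[b]:
                a = idom[a]
            while rpo[b] > rpo[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for u in order[1:]:
            new = None
            for p in pred[u]:
                if p in idom:
                    new = p if new is None else intersect(p, new)
            if new is not None and idom.get(u) != new:
                idom[u] = new
                changed = True
    return idom
```

For a diamond 0 -> {1, 2} -> 3, every node's idom is the root, which matches the intuition that neither branch alone dominates the join point.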

boost::lengauer_tarjan_dominator_tree_without_dfs might help.

Judging by the lack of comments, I guess there aren't many people on Stackoverflow with the relevant experience to help you. I'm one of those people, but I don't want such an interesting question to go down with a dull thud, so I'll try and lend a hand.
My first thought is that, if this kind of graph is generated by compilers, would it be worth taking a look at an open-source compiler, like GCC, to see how it solves this problem?
My second thought is that the main point of your question appears to be avoiding recomputing the result for the root of the tree.
What I would do is create a wrapper around each node that contains the node itself and any pre-computed data associated with it. A new tree would then be reconstructed from the old tree recursively using these wrapper classes. As you construct this tree, you start at the root and work your way out to the leaf nodes. For each node, you store the result of the computation for all the ancestry so far. That way, you should only ever have to look at the parent node and the current node's data to compute the value for your new node.
I hope that helps!

Could you elaborate on what sort of graph you're starting with? I don't see how there is any difference between a graph which is a tree, and the dominator tree of that graph. Every node's parent should be its idom, and it would of course be dominated by everything above it in the tree.

I do not fully understand your question, but it seems to me you want some kind of incremental update feature. I researched a while ago what algorithms are out there, but it seemed to me there is no known way to do this quickly for large graphs (at least from a theoretical standpoint).
You may just search for "incremental updates dominator tree" to find some references.
I guess you are aware the Eclipse Memory Analyzer does use dominator trees, so this topic is not completely "owned" by the compiler community anymore :)

Related

Which data structure/algorithm can find the smallest node on a path of a dynamic MST (or a rooted tree)

On a given MST (or a rooted tree), we can perform these tasks:
To all nodes in a given subtree rooted at x add value A.
[Answered] Using an Euler tour and the first and last appearance of x.
Report maximum value on the path from i to j.
Which data structure/algorithm will take the smallest time for both queries?
I do not need code for this. I only want to know the idea behind the solution.
I implemented a generalized Sleator-Tarjan tree that can provide both operations in logarithmic time, and several others besides (http://www.davideisenstat.com/dtree/). The linked implementation is amortized, but there's no reason a worst-case version couldn't be done, other than practical inefficiency. Please talk to me if you ever consider writing your own; there's a lot of complexity there that may be unnecessary for your use case.
To describe the idea at a very high level requires a thumbnail sketch of how S-T trees are organized. We root the tree somewhere and decompose it into disjoint paths. Each path is stored as a splay tree where each data-structural node stores the maximum of its data-structural subtree. The S-T operations allow the path collection to be manipulated to include the query path (Expose and Evert one end, then Expose the other). The splay tree also allows a value to be added to all of its nodes (i.e., a whole path). The trick, relative to the original S-T paper, is that the paths other than the one currently exposed can be updated lazily from their parent path, allowing subtree updates.
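A full Sleator-Tarjan implementation is substantial. As a rough illustration of the same two operations on a static tree, here is a hedged sketch using heavy-light decomposition plus a lazy segment tree: subtree positions are contiguous in HLD order, so subtree-add is one range update, and any path decomposes into O(log n) chains for the max query (O(log^2 n) per query overall). This is not the linked dtree library, just an illustrative alternative with invented names.

```python
class HLD:
    """Heavy-light decomposition of a rooted tree given as an edge list."""
    def __init__(self, n, edges, root=0):
        adj = [[] for _ in range(n)]
        for u, v in edges:
            adj[u].append(v)
            adj[v].append(u)
        self.par = [-1] * n
        self.depth = [0] * n
        self.size = [1] * n
        order, seen, stack = [], [False] * n, [root]
        seen[root] = True
        while stack:                       # iterative DFS: parents and depths
            u = stack.pop()
            order.append(u)
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    self.par[v] = u
                    self.depth[v] = self.depth[u] + 1
                    stack.append(v)
        for u in reversed(order):          # subtree sizes, children first
            if self.par[u] != -1:
                self.size[self.par[u]] += self.size[u]
        kids = [[] for _ in range(n)]
        for u in order:
            for v in adj[u]:
                if v != self.par[u]:
                    kids[u].append(v)
        self.head = [root] * n             # top node of each heavy chain
        self.pos = [0] * n                 # position in segment-tree order
        timer, stack = 0, [root]
        while stack:                       # heavy child visited first, so each
            u = stack.pop()                # subtree and chain is contiguous
            self.pos[u] = timer
            timer += 1
            cs = sorted(kids[u], key=lambda v: self.size[v])
            stack.extend(cs)               # heaviest pushed last, popped first
            for v in cs[:-1]:
                self.head[v] = v           # light children start new chains
            if cs:
                self.head[cs[-1]] = self.head[u]

class LazySegTree:
    """Range add, range max over positions 0..n-1 (values start at 0)."""
    def __init__(self, n):
        self.n = n
        self.mx = [0] * (4 * n)
        self.lazy = [0] * (4 * n)          # pending adds, kept at the node

    def update(self, l, r, val, node=1, nl=0, nr=None):
        if nr is None:
            nr = self.n - 1
        if r < nl or nr < l:
            return
        if l <= nl and nr <= r:
            self.mx[node] += val
            self.lazy[node] += val
            return
        mid = (nl + nr) // 2
        self.update(l, r, val, 2 * node, nl, mid)
        self.update(l, r, val, 2 * node + 1, mid + 1, nr)
        self.mx[node] = self.lazy[node] + max(self.mx[2 * node],
                                              self.mx[2 * node + 1])

    def query(self, l, r, node=1, nl=0, nr=None):
        if nr is None:
            nr = self.n - 1
        if r < nl or nr < l:
            return float("-inf")
        if l <= nl and nr <= r:
            return self.mx[node]
        mid = (nl + nr) // 2
        return self.lazy[node] + max(self.query(l, r, 2 * node, nl, mid),
                                     self.query(l, r, 2 * node + 1, mid + 1, nr))

def subtree_add(t, st, x, val):
    # A subtree occupies the contiguous range [pos[x], pos[x] + size[x] - 1].
    st.update(t.pos[x], t.pos[x] + t.size[x] - 1, val)

def path_max(t, st, u, v):
    best = float("-inf")
    while t.head[u] != t.head[v]:          # climb chain by chain
        if t.depth[t.head[u]] < t.depth[t.head[v]]:
            u, v = v, u
        best = max(best, st.query(t.pos[t.head[u]], t.pos[u]))
        u = t.par[t.head[u]]
    lo, hi = (u, v) if t.depth[u] <= t.depth[v] else (v, u)
    return max(best, st.query(t.pos[lo], t.pos[hi]))
```

Unlike S-T trees, this version cannot relink the tree, but it shows why "lazy adds on decomposed paths" makes both query types cheap.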
You could also look at a top tree implementation, but I personally find the top tree interface to be difficult to reason about, and the existing implementation of which I'm aware is significantly less efficient in practice.

Difference between BFS and DFS

I am reading about DFS in Introduction to Algorithms by Cormen. Following is text
snippet.
Unlike BFS, whose predecessor subgraph forms a tree, the predecessor
subgraph produced by DFS may be composed of several trees, because the
search may be repeated from multiple sources.
In addition to above notes, following is mentioned.
It may seem arbitrary that BFS is limited to only one source whereas
DFS may search from multiple sources. Although conceptually, BFS
could proceed from multiple sources and DFS could be limited to one
source, our approach reflects how the results of these searches are
typically used.
My question is
Can anyone give an example of how BFS is used with multiple sources and DFS with a single source?
When it says multiple sources it is referring to the start node of the search. You'll notice that the parameters of the algorithms are BFS(G, s) and DFS(G). That should already be a hint that BFS is single-source and DFS isn't, since DFS doesn't take any initial node as an argument.
The major difference between these two, as the authors point out, is that the result of BFS is always a tree, whereas DFS can be a forest (a collection of trees). Meaning that if BFS is run from a node s, it will construct the tree only of those nodes reachable from s; if there are other nodes in the graph, it will not touch them. DFS, however, will continue its search through the entire graph and construct the forest of all of the connected components. This is, as they explain, the desired result of each algorithm in most use-cases.
As the authors mentioned there is nothing stopping slight modifications to make DFS single source. In fact this change is easy. We simply accept another parameter s, and in the routine DFS (not DFS_VISIT) instead of lines 5-7 iterating through all nodes in the graph, we simply execute DFS_VISIT(s).
Similarly, it is possible to change BFS to make it run with multiple sources. I found an implementation online: http://algs4.cs.princeton.edu/41undirected/BreadthFirstPaths.java.html although that is slightly different from another possible implementation, which creates the separate trees automatically. Meaning, that algorithm looks like BFS(G, S) (where S is a collection of nodes), whereas you could implement BFS(G) and have it create separate trees automatically. It's a slight modification to the queueing, and I'll leave it as an exercise.
As the authors point out, the reason these modifications aren't made is that the main use of each algorithm lends itself to the algorithm being useful as it is. Still, well done for thinking about this; it is an important point that should be understood.
Did you understand the definition? Did you look at the pictures in the holy book?
When it says that DFS may be composed of several trees, it is because DFS goes deeper until it reaches a leaf and then backtracks. So essentially, imagine a tree: first you search the left subtree and then the right subtree, and the left subtree may itself contain several subtrees. That's why.
BFS, by contrast, is based on levels - a level-first search, in other words. You have a single source node, and then you search all the nodes at each level below it.
DFS runs with a single source if there is only one child node, so you have only one source; I think it would be clearer if you take the source as the parent node.

Implementing Kruskal's algorithm in Ada, not sure where to start

With reference to Kruskal's algorithm in Ada, I'm not sure where to start.
I'm trying to think through everything before I actually write the program, but am pretty lost as to what data structures I should be using and how to represent everything.
My original thought was to represent the full tree as an adjacency list, but reading Wikipedia, the algorithm says to create a forest F (a set of trees) where each vertex in the graph is a separate tree, and I'm not sure how to implement this without it getting really messy quickly.
The next thing it says to do is create a set S containing all the edges in the graph, but once again I'm not sure what the best way to do this would be. I was thinking of an array of records, with a to, from and weight, but I'm lost on the forest.
Lastly, I'm trying to figure out how I would know if an edge connects two trees, but again am not sure what the best way to do all of this is.
I can see where their algorithm description would leave you confused as how to start. It left me the same way.
I'd suggest reading over the later Example section instead. That makes it pretty clear how to proceed, and you can probably come up with the data structures you would need to do it just from that.
It looks like the basic idea is the following:
Take the graph, find the cheapest edge that connects two different trees in the forest (equivalently, the cheapest edge that doesn't create a cycle), and add it to your "spanning tree".
Repeat the step above until every vertex is connected.
The "create a forest" part really means: implement the pseudocode from the page Disjoint-set data structure. If you can read C++, then I have a pretty straightforward implementation here. (That implementation works; I've used it to implement Kruskal's algorithm myself :)
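To make that concrete, here is a hedged Python sketch (illustrative, not the linked C++ code, and Ada would follow the same structure) of Kruskal's algorithm with the disjoint-set forest reduced to a small find/union pair:

```python
def kruskal(n, edges):
    """Kruskal's MST over vertices 0..n-1.

    edges: list of (weight, u, v) tuples.
    Returns the chosen (u, v, weight) edges.
    """
    parent = list(range(n))     # disjoint-set forest: each vertex its own tree

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):           # consider edges cheapest first
        ru, rv = find(u), find(v)
        if ru != rv:                        # edge joins two different trees
            parent[ru] = rv                 # union the trees
            mst.append((u, v, w))
    return mst
```

The `find(u) != find(v)` test is exactly the "does this edge connect two trees?" question from the original post: two vertices are in the same tree of the forest precisely when their roots coincide.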

What is a fast algorithm for finding critical nodes?

I'm looking for a quick method/algorithm for finding which nodes in a graph are critical.
For example, in this graph:
Node number 2 and 5 are critical.
My current method is to try removing one non-endpoint node from the graph at a time and then check whether the entire network can be reached from all other nodes. This method is obviously not very efficient.
What is a better way?
See biconnected components. Calling them articulation points instead of critical nodes seems to yield better search results.
In any case, the algorithm consists of a simple depth first search where you maintain certain information for each node.
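Concretely, that single DFS maintains a discovery time per node plus a "low-link" value (the earliest discovery time reachable from the node's subtree via back edges). A hedged sketch for undirected graphs, with invented names and a recursive DFS that assumes the graph fits within Python's recursion limit:

```python
def articulation_points(adj):
    """Return the set of articulation points (critical nodes) of an
    undirected graph given as a list of adjacency lists."""
    n = len(adj)
    disc = [-1] * n      # discovery time, -1 = unvisited
    low = [0] * n        # earliest discovery time reachable via back edges
    cut = set()
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        children = 0
        for v in adj[u]:
            if disc[v] == -1:
                children += 1
                dfs(v, u)
                low[u] = min(low[u], low[v])
                # No back edge from v's subtree climbs above u:
                # removing u disconnects that subtree.
                if parent != -1 and low[v] >= disc[u]:
                    cut.add(u)
            elif v != parent:
                low[u] = min(low[u], disc[v])
        # The DFS root is a cut vertex iff it has more than one DFS child.
        if parent == -1 and children > 1:
            cut.add(u)

    for u in range(n):
        if disc[u] == -1:
            dfs(u, -1)
    return cut
```

On a simple path 0-1-2-3-4, every interior node is critical and both endpoints are not, which matches the intuition of the remove-and-recheck method at a fraction of the cost (one DFS, O(V + E)).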
There are several better ways; research is always helpful.
But since this is homework, the point of the exercise is likely for you to figure it out yourself.
Hint: how could you decorate the graph to tell you which nodes depend on which other nodes, and would this information perhaps be useful for spotting the critical nodes?

Algorithm to compute the optimal layout of an n-ary tree?

I am looking for an algorithm that will automatically arrange all the nodes in an n-ary tree so that no nodes overlap and not too much space is wasted. The user will be able to add nodes at runtime, and the tree must auto-arrange itself. Also note that the trees could get fairly large (a few thousand nodes).
The algorithm has to work in real time, meaning the user cannot notice any pausing.
I have tried Google but I haven't found any substantial resources, any help is appreciated!
I took a look at this problem a while back and ultimately decided to change my goal from a directed acyclic graph (DAG) to a general graph, due to the complexities I encountered.
That being said, have you looked at the Sugiyama algorithm for graph layout?
If you're not looking to roll your own, I came across yFiles, which did the job quite nicely (a bit on the pricey side though, so I did end up doing exactly that - rolling my own).
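If rolling your own does turn out to be the path, even a very naive layered layout guarantees non-overlapping nodes: give each leaf the next free x slot, center each internal node over its children, and use depth as y. This is a hedged sketch only (it wastes more width than the compact Reingold-Tilford or Sugiyama layouts, but it is a few lines and fast enough to rerun on every node insertion):

```python
def layout(children, root):
    """Naive tidy layout for an n-ary tree.

    children: dict mapping node -> list of child nodes (leaves absent).
    Returns dict node -> (x, y), where y is the depth. Leaves get distinct
    integer x slots, so no two nodes ever share a position.
    """
    pos = {}
    next_x = [0]

    def place(u, depth):
        kids = children.get(u)
        if not kids:
            pos[u] = (next_x[0], depth)     # leaf: take the next x slot
            next_x[0] += 1
        else:
            for c in kids:
                place(c, depth + 1)
            xs = [pos[c][0] for c in kids]
            pos[u] = ((min(xs) + max(xs)) / 2, depth)  # center over children

    place(root, 0)
    return pos
```

For a few thousand nodes this runs in well under a frame, so the "no noticeable pause" requirement is easy to meet; the cost is horizontal spread, which the fancier algorithms reduce.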
