How to find the lowest superordinate of two concepts in an efficient way?
The lowest superordinate of two concepts in a taxonomy is their most specific common ancestor. For example, in the following picture of a taxonomy, how do we find the most specific common ancestor of sense 1 and sense 2?
BTW, I found this question in Roberto Navigli's survey of word sense disambiguation. He didn't mention how to compute the superordinate.
You could go up the hierarchy from the Sense 1 side and mark all of those nodes as ancestors of Sense 1. Then go up the Sense 2 side and check each ancestor to see if you marked it as an ancestor of Sense 1. The first one you find would be the lowest superordinate or most specific common ancestor.
In your picture, it would be the root node no matter which senses you start from.
http://en.wikipedia.org/wiki/Lowest_common_ancestor contains pointers to more efficient algorithms for the least common ancestor problem in the (usual) case where we have a tree, so every node has one parent, except for the root node, which has no parent. These algorithms can answer queries in constant time, after linear time pre-processing of the tree.
I do not know if any of these algorithms would generalise to the case where you can have more than one parent. It looks like the obvious generalisations would amount to deleting links in your tree to make sure that, while still connected, every node had at most one parent.
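For the general many-parents case, the marking approach from the first answer can be sketched in Python. This is a hypothetical sketch (not tested against WordNet or any real taxonomy); it assumes the taxonomy is stored as a dict mapping each concept to the list of its parents.

```python
from collections import deque

def lowest_superordinate(parents, a, b):
    # `parents` maps each concept to a list of its parents; a
    # concept may have several parents, so the taxonomy need not
    # be a strict tree.
    # Step 1: mark every ancestor of a (including a itself).
    marked, queue = set(), deque([a])
    while queue:
        node = queue.popleft()
        if node not in marked:
            marked.add(node)
            queue.extend(parents.get(node, []))
    # Step 2: walk upward from b breadth-first; the first marked
    # node encountered is the most specific common ancestor.
    seen, queue = set(), deque([b])
    while queue:
        node = queue.popleft()
        if node in marked:
            return node
        if node not in seen:
            seen.add(node)
            queue.extend(parents.get(node, []))
    return None  # no common ancestor exists
```

Breadth-first order in step 2 means the first hit is the closest marked ancestor by edge count, which is one reasonable reading of "lowest".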
Related
Is there a data structure for a sorted set that allows quick lookup of the n-th least item? That is, something like a hybrid between a rope and a red-black tree.
Seems like it should be possible to either keep track of the size of the left subtree and update it through rotations or do something else clever and I'm hoping someone smart has already worked this out.
Seems like it should be possible to either keep track of the size of the left subtree and update it through rotations […]
Yes, this is quite possible; but instead of keeping track of the size of the left subtree, it's a bit simpler to keep track of the size of the complete subtree rooted at a given node. (You can then get the size of its left subtree by examining its left-child's size.) It's not as tricky as you might think, because you can always re-calculate a node's size as long as its children are up-to-date, so you don't need any extra bookkeeping beyond making sure that you recalculate sizes by working your way up the tree.
Note that, in most mutable red-black tree implementations, 'put' and 'delete' stop walking back up the tree once they've restored the invariants, whereas with this approach you need to walk all the way back up the tree in all cases. That'll be a small performance hit, but at least it's not hard to implement. (In purely functional red-black tree implementations, even that isn't a problem, because those always have to walk the full path back up to create the new parent nodes. So you can just put the size-calculation in the constructor — very simple.)
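A minimal Python sketch of the idea (a plain, unbalanced BST for brevity; a real red-black tree would also rebalance, but the size bookkeeping is exactly the recalculate-on-the-way-up described above):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1  # number of nodes in the subtree rooted here

def size(node):
    return node.size if node else 0

def insert(node, key):
    # Plain BST insert; sizes are recalculated on the way back up,
    # which is the only extra bookkeeping required.
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    node.size = 1 + size(node.left) + size(node.right)
    return node

def select(node, n):
    # Return the n-th smallest key (0-based) in the subtree.
    left = size(node.left)
    if n < left:
        return select(node.left, n)
    if n == left:
        return node.key
    return select(node.right, n - left - 1)
```

Lookup by rank is then a single root-to-leaf walk, O(height) per query.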
Edited in response to your comment:
I was vaguely hoping this data structure already had a name so I could just find some implementations out there and that there was something clever one could do to minimize the updating but (while I can find plenty of papers on data structures that are variations of balanced binary trees) I can't figure out a good search term to look for papers that let one lookup the nth least element.
The fancy term for the nth smallest value in a collection is order statistic; so a tree structure that enables fast lookup by order statistic is called an order statistic tree. That second link includes some references that may help you — not sure, I haven't looked at them — but regardless, that should give you some good search terms. :-)
Yes, this is fully possible. Self-balancing tree algorithms do not actually need to be search trees, that is simply the typical presentation. The actual requirement is that nodes be ordered in some fashion (which a rope provides).
What is required is to update the subtree weight on insert and erase. Rotations do not require a full recomputation; a local update is enough. For example, in a left rotation the new parent takes over the old parent's weight (the set of nodes beneath it is unchanged, so there is no need to walk its subtrees), while the old parent's weight is recomputed from its two new children. A right rotation is symmetric.
I suppose it would be possible to create an insert that updates the weight as it does rotations then adds the weight up any remaining ancestors but I didn't bother when I was solving this problem. I simply added the new node's weight all the way up the tree then did rotations as needed. Similarly for erase, I did the fix-up rotations then subtracted the weight of the node being removed before finally unhooking the node from the tree.
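The local rotation update described above can be sketched like this (hypothetical `Node` class; `size` plays the role of the weight):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right
        self.size = 1 + size(left) + size(right)

def size(node):
    return node.size if node else 0

def rotate_left(x):
    # Rotate left around x; only two weights need fixing. The new
    # parent y roots exactly the set of nodes x used to root, so
    # it inherits x's old weight; x is then recomputed from its
    # (already correct) children.
    y = x.right
    x.right = y.left
    y.left = x
    y.size = x.size
    x.size = 1 + size(x.left) + size(x.right)
    return y
```

`rotate_right` is the mirror image, with the same two-node weight fix.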
We all know what a tree is: on the first level of a tree we have a root, and from the root come branches that are trees as well. But how do I name the "opposite" structure: on the i-th level we have a set of "leaf" nodes, and those nodes form groups of 1+ nodes, and a group points to a "trunk" node on i+1th level. If you want a visual example, imagine raindrops flowing down a window and combining as they collide.
A lot of tree data structures are actually constructed from leaf to root, and can be stored so that you can traverse them in one direction or both.
I don't think it really has a special name; it's a convention rather than a requirement that trees go from root to leaf rather than the other way (or both ways). There are also a number of tree data structures that allow traversal in both directions.
Every tree is a DAG (directed acyclic graph), and so is the data structure you describe. What you describe is also a multitree, a subset of DAGs. Possibly there is a more specific subclass of multitrees that matches your graph, but I am not aware of one. Hope this helps.
So I have a huge list of chemicals within an organism, with the data for both their precursor chemicals, and the ones they created.
I was thinking that some sort of tree structure would be appropriate; each chemical is a node, each parent is a precursor, each child is a product.
Each node could have more than one parent or more than one child, hence my confusion!
However, the main function of this structure will be to find ALL the chemical pathways that produce a given chemical, and I'm not sure a tree would be the most efficient for this sort of search.
My question is: is there a more appropriate data structure for this type of data and operation?
I think your data structure is a directed graph.
The brute-force approach for finding all the pathways from A to B would be to do a breadth-first search starting at A, covering as much of the graph as you are allowed to.
This guarantees that the paths you'll find will be ordered in length from shortest to longest.
Whenever you hit B, you should mark all of the nodes in that path as 'leading to B'. This way you can account for convergent pathways without having to walk the graph more than once.
Keep in mind that, unless you constrain it, the graph can contain loops. A loop in a pathway leading from A to B presents you with infinitely many pathways, so it's up to you how you'd like to handle these cases.
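A breadth-first sketch of this approach in Python, with hypothetical names (`graph` maps each chemical to its direct products). Paths come out shortest-first, and nodes already on the current path are skipped, which sidesteps the infinite pathways that loops would otherwise cause; the "mark nodes leading to B" optimisation is left out for brevity:

```python
from collections import deque

def all_paths(graph, start, goal):
    # Breadth-first search over partial paths: each queue entry is
    # a whole path, so completed paths emerge ordered by length.
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip loops back onto this path
                queue.append(path + [nxt])
    return paths
```

For very dense graphs the number of paths can still explode combinatorially, so in practice you may want to cap the path length.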
I've been implementing my own version of a red-black tree, mostly basing my algorithms on Wikipedia (http://en.wikipedia.org/wiki/Red-black_tree). It's fairly concise for the most part, but there's one part that I would like clarification on.
When erasing a node from the tree that has 2 non-leaf (non-NULL) children, it says to copy the value of either the in-order predecessor or the in-order successor into the node being deleted, and then remove that neighbouring node instead.
I'm a little confused as to which side to remove from, based on that. Do I pick the side randomly, do I alternate between sides, or do I stick to the same side for every future deletion?
If you have no prior knowledge about your input data, you cannot know whether taking from the left or the right side would be more beneficial.
You can therefore just apply whichever rule suits you best (whichever is easiest to write and compute; probably "always take the left one"). Employing a random scheme typically just introduces unneeded extra computation.
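In code, a fixed rule such as "always take the successor side" looks like this sketch (hypothetical `Node` class; the returned node has at most one child, so the simpler deletion case handles it):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def replace_with_successor(node):
    # `node` has two non-leaf children: copy the in-order
    # successor's key into it, then hand the successor back so the
    # caller can delete that node instead. The successor is the
    # leftmost node of the right subtree, so it never has a left
    # child, and the at-most-one-child deletion case applies.
    succ = node.right
    while succ.left is not None:
        succ = succ.left
    node.key = succ.key
    return succ
```

Taking the predecessor side instead is the mirror image (rightmost node of the left subtree).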
I'm using the Lengauer and Tarjan algorithm with path compression to calculate the dominator tree for a graph with millions of nodes. The algorithm is quite complex, and I have to admit I haven't taken the time to fully understand it; I'm just using it. Now I need to calculate the dominator trees of the direct children of the root node, and possibly recurse down the graph to a certain depth, repeating this operation. That is, when I calculate the dominator tree for a child of the root node, I want to pretend that the root node has been removed from the graph.
My question is whether there is an efficient solution to this that makes use of immediate dominator information already calculated in the initial dominator tree for the root node? In other words I don't want to start from scratch for each of the children because the whole process is quite time consuming.
Naively it seems it must be possible since there will be plenty of nodes deep down in the graph that have idoms just a little way above them and are unaffected by changes at the top of the graph.
BTW, just as an aside: it's bizarre that the subject of dominator trees is "owned" by compiler people and there is no mention of it in books on classic graph theory. The application I'm using it for - my FindRoots java heap analyzer - is not related to compiler theory.
Clarification: I'm talking about directed graphs here. The "root" I refer to is actually the node with the greatest reachability. I've updated the text above replacing references to "tree" with "graph". I tend to think of them as trees because the shape is mainly tree-like. The graph is actually of the objects in a java heap and as you can imagine is reasonably hierarchical. I have found the dominator tree useful when doing OOM leak analysis because what you are interested in is "what keeps this object alive?" and the answer ultimately is its dominator. Dominator trees allow you to <ahem> see the wood rather than the trees. But sometimes lots of junk floats to the top of the tree so you have a root with thousands of children directly below it. For such cases I would like to experiment with calculating the dominator trees rooted at each of the direct children (in the original graph) of the root and then maybe go to the next level down and so on. (I'm trying not to worry about the possibility of back links for the time being :)
boost::lengauer_tarjan_dominator_tree_without_dfs might help.
Judging by the lack of comments, I guess there aren't many people on Stack Overflow with the relevant experience to help you. I'm one of those people, but I don't want such an interesting question to go down with a dull thud, so I'll try and lend a hand.
My first thought is: if this kind of graph is generated by compilers, would it be worth taking a look at an open-source compiler, like GCC, to see how it solves this problem?
My second thought is that the main point of your question appears to be avoiding recomputing the result for the root of the tree.
What I would do is create a wrapper around each node that contains the node itself and any pre-computed data associated with it. A new tree would then be reconstructed from the old tree recursively using these wrapper classes. As you construct this tree, you'd start at the root and work your way out to the leaf nodes. For each node, you'd store the result of the computation for all the ancestry so far. That way, you should only ever have to look at the parent's stored data and the current node you're processing to compute the value for your new node.
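One way to read that suggestion as code, with hypothetical names (`combine` stands in for whatever per-node bookkeeping you want to cache; this is a sketch of the wrapper idea, not of Lengauer-Tarjan itself):

```python
class Wrapped:
    # Pairs a node with data accumulated from the root down to it.
    # `combine(inherited, node)` is the per-node computation being
    # cached: only the parent's cached value and the current node
    # are needed to compute each new value.
    def __init__(self, node, inherited, combine):
        self.node = node
        self.data = combine(inherited, node)
        self.children = [Wrapped(child, self.data, combine)
                         for child in node.children]

class N:
    # Minimal stand-in for whatever node type the real graph uses.
    def __init__(self, val, children=()):
        self.val = val
        self.children = list(children)
```

For example, with `combine` summing values along the path, each wrapper ends up caching its root-to-node total without ever re-walking its ancestors.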
I hope that helps!
Could you elaborate on what sort of graph you're starting with? I don't see how there is any difference between a graph which is a tree, and the dominator tree of that graph. Every node's parent should be its idom, and it would of course be dominated by everything above it in the tree.
I do not fully understand your question, but it seems to me you want some kind of incremental update feature. I researched a while ago what algorithms are out there, but it seemed that there is no known way to do this quickly for large graphs (at least from a theoretical standpoint).
You may just search for "incremental updates dominator tree" to find some references.
I guess you are aware the Eclipse Memory Analyzer does use dominator trees, so this topic is not completely "owned" by the compiler community anymore :)