I'm looking for a good algorithm to work out the lowest common ancestor for two people, without an initial overall tree.
If I have a child, I can make a call to get the parent. So if I have two children, I would need to make calls on each until they reach a common ancestor. At that point I should have two lists, which I can build into a tree.
What kind of algorithm would be good for this?
If there were some basic information that determined whether a specific node can "in principle" be the parent of another specific node, it would allow you to determine the common ancestor (if any) quickly and efficiently. That's what my question about "age" (see comment) was about.
An age property would obviously give you that type of information.
Assuming, from your answer, that there's no such information available, there are two obvious approaches:
A. scan both trees upwards (getParent, etcetera), building for each initial child the ancestor list, meanwhile checking with every new ancestor whether it occurs in the other child's ancestor list. Generally speaking, there is no reason to assume that the common ancestor is at ("almost") equal distance from the two children, so any order in which you build the respective ancestor lists is equally likely to give you the common ancestor in the fewest possible steps.
A drawback of this method is that you run the risk of having to search the other child's list a large number of times for the occurrence of each new candidate. This is potentially slow.
B. an alternative is to simply build each of the two ancestor lists independently until exhaustion. This is probably relatively fast, as you are not interrupted at each step by having to check the contents of "the other" list.
Then, when you've finished the two lists, you can tell whether you have a common ancestor at all simply by comparing the top items (the roots) against each other.
If they differ, you're done (no result). If they match, you walk back down the lists synchronously and check where they separate, again avoiding any list search.
From a general point of view I'd prefer method B. Occasionally method A will of course produce the result faster, but in bad cases it may become tedious/slow, which will not happen with method B. I assume, of course, that no ancestor list can be circular, i.e., that a node cannot be an ancestor of itself.
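Here's a minimal sketch of method B in C, assuming a get_parent() call that returns NULL at a root; the Person type and the fixed path bound are purely illustrative:

#include <stddef.h>

/* Illustrative node type; get_parent() stands in for whatever call
   fetches a person's parent in the question. */
typedef struct Person {
    struct Person *parent;              /* NULL for a root */
} Person;

static Person *get_parent(Person *p) { return p->parent; }

/* Build the ancestor list child..root into out[]; returns its length. */
static size_t ancestor_path(Person *p, Person **out, size_t max)
{
    size_t n = 0;
    for (; p != NULL && n < max; p = get_parent(p))
        out[n++] = p;
    return n;
}

/* Method B: build both lists to exhaustion, compare the roots, then
   walk back down synchronously to where the two lists separate. */
Person *lowest_common_ancestor(Person *a, Person *b)
{
    Person *pa[1024], *pb[1024];        /* fixed bound keeps the sketch short */
    size_t na = ancestor_path(a, pa, 1024);
    size_t nb = ancestor_path(b, pb, 1024);

    size_t i = na - 1, j = nb - 1;      /* indexes of the two roots */
    if (pa[i] != pb[j])
        return NULL;                    /* different roots: no common ancestor */

    while (i > 0 && j > 0 && pa[i - 1] == pb[j - 1]) {
        i--;
        j--;
    }
    return pa[i];                       /* deepest node the two paths share */
}

Note that no list is ever searched: the only comparisons are between entries at matching offsets from the roots.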
As a final note: IF, after all, you do have a property that "orders" nodes in the sense of "age", then you should build the two lists simultaneously, always advancing the list whose current top node cannot possibly be the parent of the other list's top node. This allows you to check only the current top nodes against each other (and not any other members of the lists).
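A sketch of that variant, reusing the illustrative type from above with an added age field that strictly increases toward the root (i.e. every parent is older than its children — an assumption this version depends on):

#include <stddef.h>

typedef struct Person {
    struct Person *parent;   /* NULL for a root */
    int age;                 /* strictly greater than any descendant's */
} Person;

static Person *get_parent(Person *p) { return p->parent; }

/* Always step the younger of the two current nodes upward: being
   younger, it cannot possibly be an ancestor of the other one, so
   only the two current nodes ever need to be compared. */
Person *lca_by_age(Person *a, Person *b)
{
    while (a != b) {
        if (a == NULL || b == NULL)
            return NULL;              /* ran past a root: no common ancestor */
        if (a->age < b->age)
            a = get_parent(a);
        else
            b = get_parent(b);
    }
    return a;
}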
I intend to implement a linked list that remembers the node previously accessed, but I'm not sure whether it is theoretically beneficial to do so.
For example, suppose the user first calls an API which retrieves the 300th element and then makes another call which retrieves the 310th element. A normal linked list would do two potentially expensive linear lookups, each starting from the head or the tail of the list. But if an implementation somehow remembered the 300th node during the first call and began the second lookup from it, the second call would be much cheaper.
I failed to find related information about this topic; could anyone advise whether my thinking is correct?
I don't see the point in using a linked list for lookups, as there are better alternatives: priority queue, binary search tree, hash table, ...
But to answer the question:
If you keep a reference to the last accessed node in the list, and first start searching from that node, there is a possibility that you will not find the target node in that first scan, and still have to do a second scan, starting this time from the head of the list (making sure not to go beyond the node from where you started the first scan).
If the node being searched for has an equal probability to be at any index in the list, then there is no gain.
To see this clearly, imagine for a moment that the list is circular, with the tail linked to the head. This actually represents what I described above: that a failing first scan must continue from the head. As assumed, all nodes in this circular list have an equal probability of being the node searched for, and so there really is no preferential node: all nodes play an equal role in the "circle". It is easy to see that any node would serve just as well as a starting point for the search as any other.
It is only when you have more information, and there is a higher probability that the node being searched for is closer to the end of the list than the beginning, that you may find benefit in starting the search from any node you may still have access to.
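That said, for the index-based access in the question (the 300th element, then the 310th), the remembered node does help whenever successive indexes are non-decreasing. A minimal sketch in C, with illustrative names:

#include <stddef.h>

/* Singly linked list that remembers the last node returned by
   finger_get(), so a later call with a higher index resumes from it
   instead of rescanning from the head. */
typedef struct LNode {
    int value;
    struct LNode *next;
} LNode;

typedef struct {
    LNode *head;
    LNode *finger;        /* last node returned by finger_get() */
    size_t finger_index;  /* its index; valid only if finger != NULL */
} FingerList;

LNode *finger_get(FingerList *l, size_t index)
{
    LNode *cur;
    size_t i;

    if (l->finger != NULL && l->finger_index <= index) {
        cur = l->finger;  /* resume from the remembered node */
        i = l->finger_index;
    } else {
        cur = l->head;    /* otherwise scan from the head as usual */
        i = 0;
    }
    while (cur != NULL && i < index) {
        cur = cur->next;
        i++;
    }
    if (cur != NULL) {    /* remember this position for the next call */
        l->finger = cur;
        l->finger_index = index;
    }
    return cur;           /* NULL if index is out of range */
}

With this, fetching element 300 and then element 310 walks 300 links and then only 10, instead of 300 and then 310.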
I have implemented a data structure in C, based upon a series of linked lists, that appears to be similar to a tree - but not enough to be referred to as such, because in theory it allows the existence of cycles. Here's a basic outline of the nodes (a code sketch follows the list):
There is a single, identifiable root that doesn't have a parent node or brothers;
Each node contains a pointer to its "father", its nearest "brother" and the first of its "children";
There are "outer" nodes without children and brothers.
How can I name such a data structure? It cannot be a tree, because even if the pointers are clearly labelled and used differently, cycles like father->child->brother->father may very well exist. My question is: can terms such as "father", "children" and "brother" be used in the context of a graph, or are they reserved for trees? After quite a bit of research I'm still unable to clarify this matter.
Thanks in advance!
I'd say you can still call it a tree, because the foundation is a tree data structure. There is precedent for my claim: "Patricia Tries" are referred to as trees even though their leaf nodes may point up the tree (creating cycles). I'm sure there are other examples as well.
It sounds like the additional links you have are essentially just for convenience, and could be determined implicitly (rather than stored explicitly). By storing them explicitly you impose additional invariants on your tree operations (insert, delete, etc), but do not affect the underlying organization, which is a tree.
Precisely because you are naming and treating those additional links separately, they can be thought of as an "overlay" on top of your tree.
In the end it doesn't really matter what you call it or what category it falls into if it works for you. Reading a book like Sedgewick's "Algorithms in C" you realize that there are tons of data structures out there, and nothing wrong with inventing your own!
One more point: Trees are a special case of graphs, so there is nothing wrong with referring to it as a graph (or using graph algorithms on it) as well.
I'm implementing a level-order succinct trie and I want to be able, for a given node, to jump back to its parent.
I tried several combinations of rank/level but I can't wrap my head around this one...
I'm using this article as base documentation:
http://stevehanov.ca/blog/index.php?id=120
It explains how to traverse children, but not how to go up.
Thanks to this MIT lecture (http://www.youtube.com/watch?v=1MVVvNRMXoU) I know this is possible (in constant time, as stated at 15:50), but the speaker only explains it for a binary trie (e.g. using the formula select1(floor(i/2))).
How can I do that on a k-ary trie?
Well, I don't know what select1() is, but the other part (floor(i/2)) looks like the trick you would use in an array-embedded binary tree, like those described here. You would divide by 2 because every parent has exactly 2 children, so every level uses twice the space of the parent level.
If you don't have the same number of children in every node (except leaves, and perhaps one node with fewer children), you can't use this trick.
If you want to know the parent of any given node, you will need to add a pointer to the parent in every node.
That said, since trees are generally traversed starting at the root and going down, the usual thing to do is to store the pointers to the nodes of the path in an array. At any given point, the parent of the current node is the previous element in the array. This way you don't need to add a pointer to the parent in every node.
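A sketch of that idea, with an illustrative k-ary node type: the path is recorded while descending, so the parent of the current node is always the previous array element.

#include <stddef.h>

enum { MAX_DEPTH = 64 };          /* illustrative bound */

typedef struct KNode {
    struct KNode *first_child;    /* NULL for a leaf */
    struct KNode *next_sibling;   /* NULL for the last child */
} KNode;

/* Descend from the root (here simply along first children), recording
   the path. Returns the depth reached; for any d > 0, path[d - 1] is
   the parent of path[d], so no per-node parent pointer is needed. */
static int descend(KNode *root, KNode *path[], int max_depth)
{
    int depth = 0;
    for (KNode *cur = root; cur != NULL && depth < max_depth;
         cur = cur->first_child)
        path[depth++] = cur;
    return depth;
}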
I think I've found my answer. This paper by Guy Jacobson explains it in section 3.2, "Level-order unary degree sequence".
parent(x){ select1(rank0(x)) }
Space-efficient Static Trees and Graphs
http://www.cs.cmu.edu/afs/cs/project/aladdin/wwwlocal/compression/00063533.pdf
This works pretty well, as long as you don't mess up your node numbering like I did.
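To pin down the conventions (which is exactly where the numbering goes wrong), here is a deliberately naive sketch, assuming the common LOUDS variant in which positions are 1-based and each node is identified by the position of the '1' bit that represents it in its parent's unary degree code. A real succinct implementation answers rank and select in O(1), but the formula is the same:

/* number of '0' bits in positions 1..x (1-based, inclusive) */
static int rank0(const char *bits, int x)
{
    int count = 0;
    for (int i = 1; i <= x; i++)
        if (bits[i - 1] == '0')
            count++;
    return count;
}

/* position (1-based) of the k-th '1' bit, or -1 if there is none */
static int select1(const char *bits, int k)
{
    for (int i = 0; bits[i] != '\0'; i++)
        if (bits[i] == '1' && --k == 0)
            return i + 1;
    return -1;
}

static int parent(const char *bits, int x)
{
    return select1(bits, rank0(bits, x));
}

/* Example: a root with two children is "1011000" (including the "10"
   super-root prefix). The root is the '1' at position 1 and its
   children are the '1's at positions 3 and 4; parent("1011000", 3)
   and parent("1011000", 4) both return 1. */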
I was just looking at Eric Lippert's simple implementation of an immutable binary tree, and I have a question about it. After showing the implementation, Eric states that
Note that another nice feature of immutable data structures is that it is impossible to accidentally (or deliberately!) create a tree which contains a cycle.
It seems that this feature of Eric's implementation does not come from the immutability alone, but also from the fact that the tree is built up from the leaves. This naturally prevents a node from having any of its ancestors as children. It seems that if you built the tree in the other direction, you'd introduce the possibility of cycles.
Am I right in my thinking, or does the impossibility of cycles in this case come from the immutability alone? Considering the source, I wonder whether I'm missing something.
EDIT: After thinking it over a bit more, it seems that building up from the leaves might be the only way to create an immutable tree. Am I right?
If you're using an immutable data structure in a strict (as opposed to lazy) language, it's impossible to create a cycle: you must create the elements in some order, and once an element is created, you cannot mutate it to point at an element created later. So if you created node n, and then created node m which pointed at n (perhaps indirectly), you could never complete the cycle by causing n to point at m, as you are not allowed to mutate n, nor anything that n already points to.
Yes, you are correct that you can only ever create an immutable tree by building up from the leaves; if you started from the root, you would have to modify the root to point at its children as you create them. Only by starting from the leaves, and creating each node to point to its children, can you construct a tree from immutable nodes.
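A sketch of that construction in C. The language can't enforce immutability the way a functional language does, but the shape of the API makes the point: the constructor takes children and never a parent, so closing a cycle would require handing make_tree a node that doesn't exist yet.

#include <stdlib.h>

/* A node whose children are fixed at creation time and never reassigned. */
typedef struct Tree {
    const struct Tree *left;
    const struct Tree *right;
    int value;
} Tree;

const Tree *make_tree(const Tree *left, const Tree *right, int value)
{
    Tree *t = malloc(sizeof *t);
    t->left = left;
    t->right = right;
    t->value = value;
    return t;                 /* from here on, only const access */
}

void example(void)
{
    /* Leaves first, then their parents: a node can only ever point at
       nodes that already exist, so no cycle can be closed. */
    const Tree *l    = make_tree(NULL, NULL, 1);
    const Tree *r    = make_tree(NULL, NULL, 2);
    const Tree *root = make_tree(l, r, 3);
    (void)root;
}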
If you really want to try hard at it you could create a tree with cycles in it that is immutable. For example, you could define an immutable graph class and then say:
Graph g = Graph.Empty
.AddNode("A")
.AddNode("B")
.AddNode("C")
.AddEdge("A", "B")
.AddEdge("B", "C")
.AddEdge("C", "A");
And hey, you've got a "tree" with "cycles" in it - because of course you haven't got a tree in the first place, you've got a directed graph.
But with a data type that actually uses a traditional "left and right sub trees" implementation of a binary tree then there is no way to make a cyclic tree (modulo of course sneaky tricks like using reflection or unsafe code.)
When you say "built up from the leaves", I guess you're including the fact that the constructor takes children but never takes a parent.
It seems that if you built the tree in the other direction, you'd introduce the possibility of cycles.
No, because then you'd have the opposite constraint: the constructor would have to take a parent but never a child. Therefore you can never create a descendant until all its ancestors are created. Therefore no cycles are possible.
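A sketch of that opposite constraint, in the same style as before: the constructor takes a parent and never a child, so a node can only ever point at an already-existing ancestor.

#include <stdlib.h>

/* A node that is given its parent at creation time and never changes it. */
typedef struct UpNode {
    const struct UpNode *parent;   /* NULL for the root */
    int value;
} UpNode;

const UpNode *make_child(const UpNode *parent, int value)
{
    UpNode *n = malloc(sizeof *n);
    n->parent = parent;
    n->value = value;
    return n;
}

void example_up(void)
{
    /* Ancestors first: a descendant cannot exist before its ancestors,
       so no cycle is possible here either. */
    const UpNode *root  = make_child(NULL, 1);
    const UpNode *child = make_child(root, 2);
    (void)child;
}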
After thinking it over a bit more, it seems that building up from the leaves might be the only way to create an immutable tree. Am I right?
No... see my comments to Brian and ergosys.
For many applications, a tree whose child nodes point to their parents is not very useful. I grant that. If you need to traverse the tree in an order determined by its hierarchy, an upward-pointing tree makes that hard.
However for other applications, that sort of tree is exactly the sort we want. For example, we have a database of articles. Each article can have one or more translations. Each translation can have translations. We create this data structure as a relational database table, where each record has a "foreign key" (pointer) to its parent. None of these records need ever change its pointer to its parent. When a new article or translation is added, the record is created with a pointer to the appropriate parent.
A common use case is to query the table of translations, looking for translations for a particular article, or translations in a particular language. Ah, you say, the table of translations is a mutable data structure.
Sure it is. But it's separate from the tree. We use the (immutable) tree to record the hierarchical relationships, and the mutable table for iteration over the items. In a non-database situation, you could have a hash table pointing to the nodes of the tree. Either way, the tree itself (i.e. the nodes) never gets modified.
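A sketch of that split, with illustrative names: the records form the immutable upward-pointing tree, while a separate, mutable table is what you query and iterate over.

#include <stddef.h>

/* The immutable part: each record points only at its parent and is
   never modified after creation. */
typedef struct Article {
    const struct Article *translated_from;  /* NULL for an original article */
    const char *language;
    const char *title;
} Article;

/* The mutable part lives outside the tree: a plain table of pointers,
   standing in for the database table (or hash table) used for queries. */
typedef struct {
    const Article **rows;
    size_t count;
    size_t capacity;
} ArticleTable;

/* Adding a translation touches only the table; the existing records,
   and their parent pointers, stay untouched. */
void table_add(ArticleTable *t, const Article *a)
{
    if (t->count < t->capacity)
        t->rows[t->count++] = a;
}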
Here's another example of this data structure, including how to usefully access the nodes.
My point is that the answer to the OP's question is "yes": I agree with the rest of you that the prevention of cycles does come from immutability alone. While you can build a tree in the other direction (top-down), if you do, and it's immutable, it still cannot have cycles.
When you're talking about powerful theoretical guarantees like
another nice feature of immutable data structures is that it is impossible to accidentally (or deliberately!) create a tree which contains a cycle [emphasis in original]
"such a tree wouldn't be very useful" pales in comparison -- even if it were true.
People create un-useful data structures by accident all the time, let alone creating supposedly-useless ones on purpose. The putative uselessness doesn't protect the program from the pitfalls of cycles in your data structures. A theoretical guarantee does (assuming you really meet the criteria it states).
P.S. one nice feature of upward-pointing trees is that you can guarantee one aspect of the definition of trees that downward-pointing tree data structures (like Eric Lippert's) don't: that every node has at most one parent. (See David's comment and my response.)
You can't build it from the root; that would require you to mutate nodes you have already added.
I'm looking for a good data structure to build equivalence classes on nodes of a tree. In an ideal structure, the following operations should be fast (O(1)/O(n) as appropriate) and easy (no paragraphs of mystery code):
(A) Walk the tree from the root; on each node --> child transition enumerate all the equivalent versions of the child node
(B) Merge two equivalence classes
(C) Create new nodes from a list of existing nodes (the children) and other data
(D) Find any nodes structurally equivalent to node (i.e. they have the same number of children, corresponding children belong to the same equivalence class, and their "other data" is equal) so that new (or newly modified) nodes may be put in the right equivalence class (via a merge)
So far I've considered (some of these could be used in combination):
A parfait, where the children are references to collections of nodes instead of to nodes. (A) is fast, (B) requires walking the tree and updating nodes to point to the merged collection, (C) requires finding the collection containing each child of the new node, (D) requires walking the tree
Maintaining a hash of nodes by their characteristics. This makes (D) much faster but (B) slower (since the hash would have to be updated when equivalence classes were merged)
String the nodes together into a circular linked list. (A) is fast; (B) would be fast but for the fact that "merging" part of a circular list with itself actually splits the list; (C) would be fast; (D) would require walking the tree
Like above, but with an additional "up" pointer in each node, which could be used to find a canonical member of the circular list.
Am I missing a sweet alternative?
You seem to have two forms of equivalence to deal with: plain equivalence (A), tracked as equivalence classes which are kept up to date, and structural equivalence (D), for which you occasionally build a single equivalence class and then throw it away.
It sounds to me like the problem would be conceptually simpler if you maintain equivalence classes for both plain and structural equivalence. If that introduces too much churn for the structural equivalence, you could maintain equivalence classes for some aspects of structural equivalence. Then you could find a balance where you can afford the maintenance of those equivalence classes but still greatly reduce the number of nodes to examine when building a list of structurally equivalent nodes.
I don't think any one structure is going to solve your problems, but you might take a look at the Disjoint-set data structure. An equivalence class, after all, is the same thing as a partitioning of a set. It should be able to handle some of those operations speedily.
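A minimal union-find sketch over integer node ids; find() answers "which class is this node in?" and union() is exactly the merge in (B), both in near-constant amortized time:

#include <stdlib.h>

/* Disjoint-set forest over node ids 0..n-1, with path halving and
   union by rank. */
typedef struct {
    int *parent;
    int *rank;
} DisjointSet;

DisjointSet *ds_create(int n)
{
    DisjointSet *ds = malloc(sizeof *ds);
    ds->parent = malloc((size_t)n * sizeof *ds->parent);
    ds->rank   = calloc((size_t)n, sizeof *ds->rank);
    for (int i = 0; i < n; i++)
        ds->parent[i] = i;        /* every node starts in its own class */
    return ds;
}

int ds_find(DisjointSet *ds, int x)      /* representative of x's class */
{
    while (ds->parent[x] != x) {
        ds->parent[x] = ds->parent[ds->parent[x]];   /* path halving */
        x = ds->parent[x];
    }
    return x;
}

void ds_union(DisjointSet *ds, int a, int b)         /* the merge in (B) */
{
    int ra = ds_find(ds, a), rb = ds_find(ds, b);
    if (ra == rb)
        return;
    if (ds->rank[ra] < ds->rank[rb]) { int t = ra; ra = rb; rb = t; }
    ds->parent[rb] = ra;
    if (ds->rank[ra] == ds->rank[rb])
        ds->rank[ra]++;
}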
Stepping back for a moment, I'd suggest against using a tree at all. Last time I confronted a similar problem, I began with a tree, but later moved to an array.
There were multiple reasons, but the number one reason was performance: my classes with up to 100 or so children would actually perform better when manipulated as an array than through the nodes of a tree, mostly because of hardware locality, CPU prefetch logic, and CPU pipelining.
So although algorithmically an array structure requires a larger number of operations than a tree, performing those dozens of operations is likely faster than chasing pointers across memory.