Benefit of a sentinel node in a red black tree? - algorithm

I created a doubly-linked list, and the benefits of a sentinel node were clear - no null checks or special cases at list boundaries.
Now I'm writing a red black tree, and trying to figure out if there is any benefit to such a concept.
My implementation is based on the last two functions in this article
(top down insertion/deletion). The author uses a "dummy tree root" or "head" to avoid special cases at the root for his insertion/deletion algorithms. The author's head node is scoped to the functions themselves - seemingly due to its limited usefulness.
One other article I came across mentioned using a permanent root above the head as an "end" for iteration. This seems interesting, but I tried it and couldn't see any real benefit over using NULL as an end iterator. I also found several articles that used a shared sentinel to represent all empty leaf nodes, but this seems even more pointless than the first case.
Can anyone elaborate on how a sentinel node would be useful in a red black tree?
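For reference, the "shared sentinel for all empty leaves" idea mentioned above can be sketched as follows. This is only a minimal illustration, not a full red black tree: the names `Node`, `NIL`, `leaf`, `is_red` and `has_red_grandchild` are hypothetical, and the sketch assumes the sentinel is a real black node whose children point back to itself. The one thing it shows is that colour and child lookups then need no null checks.

```python
RED, BLACK = "red", "black"


class Node:
    def __init__(self, key, color, left=None, right=None):
        self.key = key
        self.color = color
        self.left = left
        self.right = right


# One shared sentinel stands in for every empty leaf (assumption of this
# sketch). It is black, and its children point back to itself, so walking
# "past" a leaf never raises an error.
NIL = Node(None, BLACK)
NIL.left = NIL.right = NIL


def leaf(key, color):
    """Create a node whose two children are the shared sentinel."""
    return Node(key, color, NIL, NIL)


def is_red_with_none(node):
    """Version needed when empty leaves are None: guard, then test."""
    return node is not None and node.color == RED


def is_red(node):
    """Version with the sentinel: every node, even an empty leaf, has a color."""
    return node.color == RED


def has_red_grandchild(node):
    """Look two levels down without checking for missing children."""
    return any(
        is_red(grandchild)
        for child in (node.left, node.right)
        for grandchild in (child.left, child.right)
    )
```

With `None` leaves, `has_red_grandchild` would need a guard before every `child.left` and `child.right` access; with the sentinel those guards disappear, which is the same kind of benefit as in the doubly-linked list.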

Related

Meaning of merge, phi, effectphi and dead in v8 terminology

I’m trying to read the v8 source code (in particular the compiler part of it) to understand better the optimisation and reduction procedures (in order to look for bugs).
I ran into a few terms that are used in the comments but seem to be unexplained. The comment is this:
// Check if this is a merge that belongs to an unused diamond, which means
// that:
//
// a) the {Merge} has no {Phi} or {EffectPhi} uses, and
// b) the {Merge} has two inputs, one {IfTrue} and one {IfFalse}, which are
// both owned by the Merge, and
// c) and the {IfTrue} and {IfFalse} nodes point to the same {Branch}.
What do the terms Merge, Phi and EffectPhi mean? Also, does marking a node as “dead” mean that it will be treated as redundant?
Thanks in advance.
The link to the above code is this:
https://chromium.googlesource.com/v8/v8.git/+/refs/heads/master/src/compiler/common-operator-reducer.cc
V8 developer here. As background knowledge, it helps to know that V8's "Turbofan" compiler uses the "SSA" (static single assignment) and "sea of nodes" concepts. There are various compiler textbooks and research papers that explain these in great detail. To answer your questions in short:
A "Merge" node merges two control nodes, i.e. two branches of control flow. You can think of it as the "opposite" of a Branch, or the equivalent of a Phi for control nodes. Control nodes are the mechanism that Turbofan's sea-of-nodes design uses to make sure nodes aren't reordered across certain control flow boundaries.
A "Phi" node merges the two (or more) possibilities for a value that have been computed by different branches. See https://en.wikipedia.org/wiki/Static_single_assignment_form for more.
An "EffectPhi" is a special version of a Phi node that's used for nodes on the "effect chain". The effect chain is the mechanism Turbofan uses to make sure that nodes' external effects (like memory loads and stores) aren't visibly reordered.
A "dead" node is one that's unreachable and can be eliminated. So it's "redundant" in the sense of "superfluous/unnecessary", but not in the sense of "the same as another node".
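To make conditions a) to c) from the quoted comment concrete, here is a toy model in Python. It is not V8's code or API: the `Node` class and the functions `owned_by` and `is_unused_diamond` are hypothetical names, and "owned by the Merge" is assumed to mean that the Merge is the node's only use.

```python
class Node:
    """Toy graph node: an operator name, its inputs, and back-links to uses."""

    def __init__(self, op, *inputs):
        self.op = op
        self.inputs = list(inputs)
        self.uses = []
        for node in inputs:
            node.uses.append(self)


def owned_by(node, owner):
    # Assumption of this sketch: "owned" means owner is the only use.
    return len(node.uses) == 1 and node.uses[0] is owner


def is_unused_diamond(merge):
    if merge.op != "Merge" or len(merge.inputs) != 2:
        return False
    # a) the Merge has no Phi or EffectPhi uses
    if any(use.op in ("Phi", "EffectPhi") for use in merge.uses):
        return False
    # b) its two inputs are one IfTrue and one IfFalse, both owned by it
    if sorted(node.op for node in merge.inputs) != ["IfFalse", "IfTrue"]:
        return False
    if not all(owned_by(node, merge) for node in merge.inputs):
        return False
    # c) the IfTrue and IfFalse point to the same Branch
    first, second = merge.inputs
    branch = first.inputs[0]
    return branch.op == "Branch" and second.inputs[0] is branch


# Build "if (cond) {} else {}": Branch -> IfTrue/IfFalse -> Merge.
cond = Node("Parameter")
start = Node("Start")
branch = Node("Branch", cond, start)
if_true = Node("IfTrue", branch)
if_false = Node("IfFalse", branch)
merge = Node("Merge", if_true, if_false)
```

In this toy graph both arms of the branch are empty and nothing merges a value at the join, so `is_unused_diamond(merge)` holds; attaching a `Phi` node that takes `merge` as an input makes the diamond "used".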

Creating an Intersecting Tree in Fortran

I want to create a tree in Fortran (90) like the one in this picture:
The idea is that I would then be able to follow a path through the tree, starting from the root, in the following way. At each node, perform a check with the value stored there: if the check passes, move to the left-most child; if it fails, or if a leaf node has been reached, move to the highest node that the traversal hasn't visited yet. Here is an example of a possible traversal (green indicating passing the test and red not passing):
Importantly, not every node is reached (the ones in black) which is the point of the procedure actually.
So, I think what I need is a subroutine that would insert nodes in the tree, in order to build it, and another that would allow me to follow paths of the kind described above.
My question is, is this possible? Does this data structure have a name?
Granted that I have virtually no experience with building data structures of this kind, Google has not been much help. An example code would be great but I would be happy to just be referred to some reading where I could learn this.

Barnes-Hut tree creating

I am currently trying to create a Barnes-Hut octree, however, I still do not fully understand how to do this properly. I have read threads here, this article and some others. I believe I do understand how to make a tree if every node contains the information about the indices of particles inside, and if you keep storing the empty nodes. But what if you do not want to? How do you make a tree such that at the end you only have the necessary information: say, monopoles and quadrupoles for all non-empty nodes? I made so many different attempts that now I am completely confused, to be honest. What should I store in each node? What would be the pseudocode for such a thing?
P.S. By the way, is it different for monopoles and quadrupoles? I mean I can imagine that you do not need the exact information about the particles inside the node to calculate a monopole (it is just the full mass of the node), but for a quadrupole?
Thank you in advance!
P.S. By the way, I use the Julia language, if that is somehow relevant.
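One possible shape for such a tree, sketched in Python rather than Julia, is shown below. It is a sketch under stated assumptions, not a reference Barnes-Hut implementation: the `Cell` class, its field names and the `max_depth` cut-off are all hypothetical, particle masses are assumed to be positive, and only the monopole (total mass and centre of mass) is accumulated. Child cells are created lazily, so empty octants are never stored, and no cell keeps a list of particle indices.

```python
class Cell:
    def __init__(self, center, half):
        self.center = center          # (x, y, z) of the cube's centre
        self.half = half              # half of the cube's edge length
        self.mass = 0.0               # monopole: total mass in the cell
        self.com = (0.0, 0.0, 0.0)    # centre of mass of the cell
        self.children = {}            # octant index -> Cell, non-empty only
        self.body = None              # (pos, m) while exactly one particle

    def _child(self, pos):
        """Return the child cell containing pos, creating it on demand."""
        k = sum((pos[i] >= self.center[i]) << i for i in range(3))
        if k not in self.children:
            h = self.half / 2
            c = tuple(
                self.center[i] + (h if (k >> i) & 1 else -h) for i in range(3)
            )
            self.children[k] = Cell(c, h)
        return self.children[k]

    def insert(self, pos, m, depth=0, max_depth=32):
        if self.mass == 0.0:
            # Empty cell: it becomes a leaf holding this single particle.
            self.body = (pos, m)
        elif depth < max_depth:
            if self.body is not None:
                # Leaf with one particle: push that particle down first.
                old_pos, old_m = self.body
                self.body = None
                self._child(old_pos).insert(old_pos, old_m, depth + 1, max_depth)
            self._child(pos).insert(pos, m, depth + 1, max_depth)
        else:
            # Depth limit (e.g. coincident particles): keep only the aggregate.
            self.body = None
        # Update the monopole of this cell with the new particle.
        total = self.mass + m
        self.com = tuple(
            (self.com[i] * self.mass + pos[i] * m) / total for i in range(3)
        )
        self.mass = total
```

For example, inserting two particles into opposite corners of a root cell creates exactly two child cells, and the root ends up storing only their combined mass and centre of mass. The quadrupole is deliberately left out of this sketch.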

Suitable tree data structure

What is the most suitable tree data structure for modelling hierarchical content (a containment relationship)? My wording is a bit informal, as I don't have much theoretical background in this area. The requirements are:
1. A parent node can have multiple children.
2. Each node has a unique parent.
3. The tree structure is rarely changed, so it is OK to recreate it rather than add/rearrange nodes.
4. Two-way traversal.
5. I am mainly interested in: find parent, find children, find a node by its unique id.
6. Every node has a unique id.
7. There might be only hundreds of nodes in total, so performance may not be a big influence.
8. Persistence would be good to have, but is not necessary, as I plan to use it in memory after reading the data from a DB.
My language of choice is go (golang), so available libraries are limited. Please give a recommendation without considering the language which best fit the above requirement.
http://godashboard.appspot.com/ lists some of the available tree libraries. I am not sure about their quality or how active they are. I have read good things about
https://github.com/petar/GoLLRB
http://www.stathat.com/src/treap
Please let me know if any additional information is required.
Since there are only hundreds of nodes, just make the structure the same as you have described.
Each node has a unique reference to its parent node.
Each node has a list of child nodes.
Each node has an id.
There is an (external) map from id --> node. It may not even be necessary.
Two-way traversal is possible, since the parent and child nodes are known. The same goes for find parent and find children.
Finding a node by id can be done by traversing the whole tree if there is no map. Or you can use the map to find the node quickly.
Adding a node is easy, since there is a list in each node. Rearranging is also easy, since you can freely add/remove entries in the list of child nodes and reassign the parent node.
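The structure described above can be sketched in a few lines. The sketch is in Python to stay language-neutral, and the names `Node`, `Tree`, `add`, `move` and `by_id` are illustrative assumptions rather than any library's API; the same layout carries over directly to Go structs with a pointer to the parent, a slice of children and a map from id to node.

```python
class Node:
    def __init__(self, node_id, parent=None):
        self.id = node_id
        self.parent = parent      # unique parent, None for the root
        self.children = []        # list of child nodes


class Tree:
    def __init__(self, root_id):
        self.root = Node(root_id)
        self.by_id = {root_id: self.root}   # external map: id -> node

    def add(self, node_id, parent_id):
        """Create a node under an existing parent and register its id."""
        if node_id in self.by_id:
            raise ValueError(f"duplicate id: {node_id!r}")
        parent = self.by_id[parent_id]
        node = Node(node_id, parent)
        parent.children.append(node)
        self.by_id[node_id] = node
        return node

    def move(self, node_id, new_parent_id):
        """Rearrange: detach a node from its parent and attach it elsewhere."""
        node = self.by_id[node_id]
        new_parent = self.by_id[new_parent_id]
        # Refuse to move a node underneath itself or one of its descendants.
        ancestor = new_parent
        while ancestor is not None:
            if ancestor is node:
                raise ValueError("move would create a cycle")
            ancestor = ancestor.parent
        node.parent.children.remove(node)
        new_parent.children.append(node)
        node.parent = new_parent


tree = Tree("root")
tree.add("a", "root")
tree.add("b", "root")
tree.add("a1", "a")
```

Finding a parent is `tree.by_id["a1"].parent`, finding children is `tree.by_id["a"].children`, and finding a node by id is a single map lookup.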
I'm answering this question from a language-agnostic perspective. This is a tree structure without any limit, so implementations are not that popular.
I think a B-Tree is the way to go for your requirements. http://en.wikipedia.org/wiki/B-tree
Points 1, 2, 3: a B-Tree inherently supports this (multiple children, unique parent, allows insertion/deletion of elements).
Points 4, 5: each node will have pointers to its children in the default implementation. Additionally, you can maintain a pointer to the parent for each node. You can implement your search/traversal operations with BFS/DFS with the help of these pointers.
Point 6: depends on the implementation of your insert method, i.e. whether you allow duplicate records.
Points 7, 8: not an issue, as you have mentioned that you have only hundreds of records. That said, B-Trees are also quite a good data structure for external disk storage.

Store hierarchies in a way that is resistant to corruption

I was thinking today about the best way to store a hierarchical set of nodes, e.g.
(source: www2002.org)
The most obvious way to represent this (to me at least) would be for each node to have nextSibling and childNode pointers, either of which could be null.
This has the following properties:
Requires a small number of changes if you want to add in or remove a node somewhere
Is highly susceptible to corruption. If one node was lost, you could potentially lose a large number of other nodes that were dependent on being found through that node's pointers.
Another method you might use is to come up with a system of coordinates, e.g. 1.1, 1.2, 1.2.1, 1.2.2. 1.2.3 would be the 3rd node at the 3rd level, with the 2nd node at the prior level as its parent. Unanticipated loss of a node would not affect the ability to resolve any other nodes. However, adding in a node somewhere has the potential effect of changing the coordinates for a large number of other nodes.
What are ways that you could store a hierarchy of nodes that requires few changes to add or delete a node and is resilient to corruption of a few nodes? (not implementation-specific)
When you refer to corruption, are you talking about RAM or some other storage? Perhaps during transmission over some medium?
In any case, when you are dealing with data corruption you are talking about an entire field of computer science that deals with error detection and correction.
When you talk about losing a node, the first thing you have to figure out is 'how do I know I've lost a node?', that's error detection.
As for the problem of protecting data from corruption, pretty much the only way to do this is with redundancy. The amount of redundancy is determined by what limit you want to put on the degree of corruption you would like to be able to recover from. You can't really protect yourself from this with a clever structure design as you are just as likely to suffer corruption to the critical 'clever' part of your structure :)
The ever-wise Wikipedia is a good place to start: Error detection and correction
I was thinking today about the best way to store a hierarchical set of nodes
So you are writing a filesystem? ;-)
The most obvious way to represent this (to me at least) would be for each node to have nextSibling and childNode pointers
Why? The sibling information is present at the parent node, so all you need is a pointer back to the parent. A doubly linked-list, so to speak.
What are ways that you could store a hierarchy of nodes that requires few changes to add or delete a node and is resilient to corruption of a few nodes?
There are actually two different questions involved here.
Is the data corrupt?
How do I fix corrupt data (aka self healing systems)?
Answers to these two questions will determine the exact nature of the solution.
Data Corruption
If your only aim is to know if your data is good or not, store a hash digest of child node information with the parent.
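A minimal sketch of that idea, assuming SHA-256 from Python's standard `hashlib` and hypothetical names (`Node`, `node_digest`, `digest_children`, `find_corruption`): each parent stores one digest computed over its children at construction time, and a later check recomputes the digest and compares.

```python
import hashlib


def node_digest(node):
    """Digest of one node: its name, its payload and its stored child digest."""
    h = hashlib.sha256()
    h.update(node.name.encode() + b"\0" + node.payload.encode() + b"\0")
    h.update(node.children_digest)
    return h.digest()


def digest_children(children):
    """Digest over all children, in order."""
    h = hashlib.sha256()
    for child in children:
        h.update(node_digest(child))
    return h.digest()


class Node:
    def __init__(self, name, payload, children=()):
        self.name = name
        self.payload = payload
        self.children = list(children)
        # Stored with the parent when the (assumed good) tree is built.
        self.children_digest = digest_children(self.children)


def find_corruption(node, path=()):
    """Yield the path of every node whose children no longer match its digest."""
    here = path + (node.name,)
    if digest_children(node.children) != node.children_digest:
        yield here
    for child in node.children:
        yield from find_corruption(child, here)


a = Node("a", "x")
b = Node("b", "y")
mid = Node("m", "z", [a, b])
root = Node("r", "", [mid])
```

If the payload of `b` is later altered, or `b` is lost from `mid.children`, the check reports the path to `mid`, the parent whose stored digest no longer matches. This only detects and locates the damage; it does not repair it.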
Self Healing Structures
Any self healing structure will need the following information:
Is there a corruption? (See above)
Where is the corruption?
Can it be healed?
Different algorithms exist to fix data with varying degrees of effectiveness. The root idea is to introduce redundancy. Reconstruction depends on your degree of redundancy. Since the best guarantees are given by the most redundant systems -- you'll have to choose.
I believe there is some scope for narrowing down your question to a point where we can start discussing individual bits and pieces of the puzzle.
A simple option is to store a reference to the root node in every node - this way it is easy to detect orphan nodes.
Another interesting option is to store the hierarchy information as a descendants (transitive closure) table. So for the 1.2.3 node you'd have the following relations:
1., 1.2.3. - the root node is an ancestor of 1.2.3.
1.2., 1.2.3. - the 1.2. node is an ancestor of 1.2.3.
1., 1.2. - the root node is an ancestor of 1.2.
etc...
This table can be more resistant to errors as it holds some redundant info.
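As a rough illustration of where the redundancy comes from, the sketch below builds such a table from a child-to-parent map and then recovers a node's parent from the table alone. The function names `closure_from_parents` and `recover_parent` are hypothetical, and the input map is assumed to describe a valid tree with no cycles.

```python
def closure_from_parents(parent_of):
    """Build the set of (ancestor, descendant) pairs from a child -> parent map."""
    pairs = set()
    for node in parent_of:
        ancestor = parent_of[node]
        while ancestor is not None:
            pairs.add((ancestor, node))
            ancestor = parent_of.get(ancestor)
    return pairs


def recover_parent(node, pairs):
    """Recover a node's parent from the table alone.

    The parent is the ancestor that itself has the most ancestors,
    i.e. the deepest one.
    """
    ancestors = [a for a, d in pairs if d == node]
    if not ancestors:
        return None

    def depth(n):
        return sum(1 for _, d in pairs if d == n)

    return max(ancestors, key=depth)


parent_of = {"1": None, "1.2": "1", "1.2.3": "1.2"}
pairs = closure_from_parents(parent_of)
```

Here `pairs` contains the three relations listed above. If the direct parent link of 1.2.3 is lost, `recover_parent("1.2.3", pairs)` still identifies 1.2, and even if the row (1.2, 1.2.3) is lost as well, the row (1, 1.2.3) still shows that the node belongs under the root.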
Goran
The typical method to store a hierarchy is to have a ParentNode property/field in every node. For the root the ParentNode is null; for all other nodes it has a value. This does mean that the tree may lose entire branches, but in memory that seems unlikely, and in a DB you can guard against that using constraints.
This approach doesn't directly support finding all siblings. If that is a requirement, I'd add another property/field for depth: the root has depth 0, all nodes directly below the root have depth 1, and so on. Nodes with the same depth and the same ParentNode are siblings.

Resources