Fortune's Algorithm - Beach Line Data Structure - data-structures

I have to implement Fortune's algorithm for constructing Voronoi diagrams.
An important part of the algorithm is a data structure called the "beach line data structure".
It is a balanced binary tree, similar to an AVL tree, but different in that data is stored only in the leaves (there are other differences, but they are unimportant for this question).
I am not sure how to implement it. Obviously, using an AVL tree "as is" will not work, because when balancing an AVL tree a leaf node can become an inner node and vice versa.
I also looked at some other well-known data structures on Wikipedia, but none suits my needs.
I have seen some implementations that do this with a linked list, but this is not good because searching a linked list is O(n), and the search needs to be O(log n) for the algorithm to be efficient.

The leaves indeed store (single) points, and the inner nodes of the event structure (the "beach line tree") store ordered tuples of points whose parabolas/arcs lie next to each other. If the parabola that point Pa forms lies to the left of the parabola formed by Pb (and these two parabolas intersect), the inner node stores the ordered tuple (Pa, Pb).
Regarding your concern that using AVL "as is" will not work because, when balancing an AVL tree, a leaf node can become an inner node and vice versa:
If you're worried about storing different types of objects in the AVL tree, a simple scheme would be to store the leaves as tuples too. So don't store point Pj as a leaf, but store the tuple (Pj, Pj) instead. If Pj as a leaf disappears from the event tree (beach line), and its parent is (Pi, Pj), simply change the parent into (Pj, Pj), and of course its parent will also need to be changed from (Pj, P?) to (Pi, P?) etc. Just as with a regular AVL tree: you walk up the tree and modify the inner nodes that need to be changed and/or re-balanced.
Note that a good implementation of the algorithm can't be easily written down in a SO answer (at least, not by me!). For a proper explanation of the entire algorithm, including a description of the data structures used by it, see Computational Geometry: Algorithms and Applications by Mark de Berg et al. Chapter 7 is devoted solely to Voronoi diagrams.
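Here is a minimal Python sketch of that tuple scheme. The names `Node`, `breakpoint_x`, and `find_arc` are made up for illustration, and the tree is built by hand and is not balanced. It only shows how a search can descend through (Pa, Pb) tuples and stop at a (Pj, Pj) leaf. It assumes a horizontal sweep line at y = sweep_y that moves downward and lies strictly below both sites, and that the two sites are distinct.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    sites: Tuple[Point, Point]          # inner node: (Pa, Pb); leaf: (Pj, Pj)
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.sites[0] == self.sites[1]


def breakpoint_x(pa: Point, pb: Point, sweep_y: float) -> float:
    """x where the arc of pa (on the left) meets the arc of pb (on the right)."""
    (ax, ay), (bx, by) = pa, pb
    ka = 1.0 / (2.0 * (ay - sweep_y))
    kb = 1.0 / (2.0 * (by - sweep_y))
    # parabola(pa) - parabola(pb), written as A*x^2 + B*x + C
    A = ka - kb
    B = -2.0 * ax * ka + 2.0 * bx * kb
    C = (ax * ax + ay * ay - sweep_y ** 2) * ka - (bx * bx + by * by - sweep_y ** 2) * kb
    if A == 0.0:                        # both sites at the same height
        return -C / B
    # Pick the root where the difference goes from negative to positive,
    # i.e. pa's parabola is the lower one just to the left of the breakpoint.
    return (-B + math.sqrt(B * B - 4.0 * A * C)) / (2.0 * A)


def find_arc(node: Node, x: float, sweep_y: float) -> Point:
    """Return the site whose arc lies above x, descending by breakpoints."""
    while not node.is_leaf():
        node = node.left if x < breakpoint_x(*node.sites, sweep_y) else node.right
    return node.sites[0]


# Hand-built example with one breakpoint between two arcs.
p, q = (0.0, 2.0), (4.0, 1.0)
root = Node((p, q), Node((p, p)), Node((q, q)))
```

Because every node holds the same kind of value, a rotation never has to convert between "leaf data" and "inner data"; only the tuples along the affected path need to be rewritten.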

Related

Dynamically building a balanced BST with values "in the leaves"?

In their book Computational Geometry (2008), de Berg, et al., describe the data structure underlying their range search algorithm as a balanced BST where "leaves of T store the points of P and the internal nodes of T store splitting values to guide the search."
The Wikipedia page on range trees, which cites de Berg, says: "A 1-dimensional range tree on a set of n points is a binary search tree" such that "each node which is not a leaf stores the largest value of its left subtree."
Examples online construct such trees statically, by first sorting the set of points and then recursively pairing up nodes.
Does there exist an algorithm to build a BST of this nature dynamically (i.e., with the ability to insert additional values into the tree)? Where is it described?
It's possible to adapt just about any tree balancing procedure to work with these two examples, just by treating the leaves separately -- make a balanced tree of the internal nodes, and then take care to keep the leaves in order. Each operation, including balancing, will require you to recalculate the "summary statistics" on at most O(log N) nodes. Those are all the nodes that were updated and their ancestors.
This can be a little complicated, though, and doesn't work for the multi-dimensional range tree, because every level is treated differently from the ones above and below, and that makes tree rotations (which most balancing operations require) invalid.
For these kinds of trees, therefore, where different levels are handled differently, it is usually best to just avoid tree rotations by using a low-order B+tree variant like a 2-3 tree. In a tree like this, nodes can be split and merged, but they never have to change height -- you can implement them so that leaves are always leaves and internal nodes are always internal. The height of the tree is only ever changed by adding or removing the root.
Of course, if you use a tree that can have more than 2 children per node, then your search algorithms will need to change, but the changes are typically trivial.
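For reference, the static construction mentioned in the question (sort, then build recursively, with each internal node storing the largest value of its left subtree) might look like the Python sketch below. This is only the static case, not the dynamic 2-3 tree variant; `Node`, `build`, and `range_query` are illustrative names, and distinct values and a non-empty input are assumed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    key: float                       # leaf: stored value; inner: max of left subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def build(sorted_values: List[float]) -> Node:
    """Build a balanced leaf-oriented tree from a non-empty sorted list."""
    if len(sorted_values) == 1:
        return Node(sorted_values[0])
    mid = (len(sorted_values) + 1) // 2
    return Node(sorted_values[mid - 1],
                build(sorted_values[:mid]),
                build(sorted_values[mid:]))


def range_query(node: Node, lo: float, hi: float, out: List[float]) -> List[float]:
    """Append every stored value v with lo <= v <= hi to out, in sorted order."""
    if node.is_leaf():
        if lo <= node.key <= hi:
            out.append(node.key)
        return out
    if lo <= node.key:               # left subtree holds values <= node.key
        range_query(node.left, lo, hi, out)
    if hi > node.key:                # right subtree holds values > node.key
        range_query(node.right, lo, hi, out)
    return out


tree = build([1, 3, 5, 7, 9])
print(range_query(tree, 3, 7, []))
```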

What is a zip tree, and how does it work?

I've heard of a new balanced BST data structure called a zip tree. What is the zip tree? How does it work?
At a high level, a zip tree is a
randomized balanced binary search tree,
that is a way of encoding a skiplist as a BST, and
that uses a pair of operations called zipping and unzipping rather than tree rotations.
The first bullet point - that zip trees are randomized, balanced BSTs - gives a feel for what a zip tree achieves at a high level. It's a type of balanced binary search tree that, like treaps and unlike red/black trees, uses randomization to balance the tree. In that sense, a zip tree isn't guaranteed to be a balanced tree, but rather has a very high probability of being balanced.
The second bullet point - that zip trees are encodings of skiplists - shows where zip trees come from and why, intuitively, they're balanced. You can think of a zip tree as a way of taking the randomized skiplist data structure, which supports all major operations in expected time O(log n), and representing it as a binary search tree. This provides the intuition for where zip trees come from and why we'd expect them to be so fast.
The third bullet point - zip trees use zipping and unzipping rather than tree rotations - accounts for the name of the zip tree and what it feels like to code one up. Zip trees differ from other types of balanced trees (say, red/black trees or AVL trees) in that nodes are moved around the tree not through rotations, but through a pair of operations that convert a larger chain of nodes into two smaller chains or vice-versa.
The rest of this answer dives deeper into where zip trees come from, how they work, and how they're structured.
Review: Skip Lists
To understand where zip trees come from, let's begin with a review of another data structure, the skiplist. A skiplist is a data structure that, like a binary search tree, stores a collection of elements in sorted order. Skiplists, however, aren't tree structures. Rather, a skiplist works by storing elements in sorted order through several layers of linked lists. A sample skiplist is shown here:
As you can see, the elements are represented in sorted order. Each element has an associated height, and is part of a number of linked lists equal to its height. All of the elements of the skiplist participate in the bottom layer. Ideally, roughly half of the nodes will be in the layer above that, roughly a quarter of the nodes will be in the layer above that, roughly an eighth of the nodes will be in the layer above that, etc. (More on how this works later on.)
To do a lookup in a skiplist, we begin in the topmost layer. We walk forward in the skiplist until either (1) we find the element we're looking for, (2) we find an element bigger than the one we're looking for, or (3) we hit the end of the list. In the first case, we uncork the champagne and celebrate because we discovered the item we were searching for and there's nothing more to do. In the second or third case, we've "overshot" the element that we're looking for. But that's nothing to worry about - in fact, that's helpful because it means that what we're looking for must be between the node we hit that "overshot" and the node that comes before it. So we'll go to the previous node, drop down one layer, and pick up our search from there.
For example, here's how we'd do a search for 47:
Here, the blue edges indicate the links followed where we moved forward, and the red edges indicate where we overshot and decided to descend down a layer.
A powerful intuition for how skiplists work - which we'll need later on as we transition to zip trees - is that the topmost layer of the skiplist partitions the remaining elements of the skiplists into different ranges. You can see this here:
Intuitively, a skiplist search will be "fast" if we're able to skip looking at most of the elements. Imagine, for example, that the second-to-last layer of the skiplist only stores every other element of the skiplist. In that case, traversing the second-to-last layer is twice as fast as traversing the bottom layer, so we'd expect a lookup starting in the second-to-last layer to take half as much time as a lookup starting in the bottom layer. Similarly, imagine that the layer above that one only stores every other element from the layer below it. Then searching in that layer will take roughly half as much time as searching the layer below it. More generally, if each layer only stores roughly half the elements of the layer below it, then we could skip past huge amounts of the elements in the skiplist during a search, giving us good performance.
The skiplist accomplishes this by using the following rule: whenever we insert an element into the skiplist, we flip a coin until we get heads. We then set the height of the newly-inserted node to be the number of coins that we ended up tossing. This means it has a 50% chance to stay in its current layer and a 50% chance to move to the layer above it, which means, in aggregate, that roughly half the nodes will only be in the bottom layer, roughly half of what's left will be one layer above that, roughly half of what's left will be one layer above that, etc.
(For those of you with a math background, you could also say that the height of each node in the skiplist is a Geom(1/2) random variable.)
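The coin-flipping rule is short enough to write down directly. This is a small Python sketch; `random_height` is an illustrative name, and "tails" is modeled as a random draw below 0.5.

```python
import random


def random_height(rng: random.Random) -> int:
    """Flip coins until heads; the height is the total number of flips."""
    height = 1
    while rng.random() < 0.5:   # tails: flip again
        height += 1
    return height


rng = random.Random(0)
heights = [random_height(rng) for _ in range(10000)]
```

About half of the sampled heights should be 1, about a quarter should be 2, and so on, which is exactly the layer structure described above.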
Here's an example of inserting 42 into the skiplist shown above, using a height of 1:
Deletion from a skiplist is also a fairly simple operation: we simply splice the element out of whatever linked lists it happens to be in. That means that if we were to delete the 42 we just inserted from the above list, we'd end up with the same skiplist that we started with.
It can be shown that the expected cost of an insertion, deletion, or lookup in a skiplist is O(log n), based on the fact that the number of items in each list is roughly half the number of items in the one below it. (That means we'd expect to see O(log n) layers, and only take a constant number of steps in each layer.)
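Putting lookup, insertion, and deletion together, a compact skiplist could be sketched in Python as follows. This is one possible layout, not a reference implementation: the `SkipList` and `SkipNode` names, the sentinel head node, and the height cap `MAX_HEIGHT` are choices made for this sketch. For simplicity the lookup always descends to the bottom layer instead of stopping early when it sees the key higher up.

```python
import random


class SkipNode:
    def __init__(self, key, height):
        self.key = key
        self.next = [None] * height      # next[i] is the successor in layer i


class SkipList:
    MAX_HEIGHT = 32                      # arbitrary cap chosen for this sketch

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.head = SkipNode(None, self.MAX_HEIGHT)   # sentinel, holds no key

    def _random_height(self):
        height = 1
        while height < self.MAX_HEIGHT and self.rng.random() < 0.5:
            height += 1
        return height

    def _predecessors(self, key):
        """For every layer, the last node whose key is smaller than key."""
        preds = [self.head] * self.MAX_HEIGHT
        node = self.head
        for level in range(self.MAX_HEIGHT - 1, -1, -1):
            while node.next[level] is not None and node.next[level].key < key:
                node = node.next[level]  # move forward within this layer
            preds[level] = node          # overshot or hit the end: drop a layer
        return preds

    def contains(self, key):
        candidate = self._predecessors(key)[0].next[0]
        return candidate is not None and candidate.key == key

    def insert(self, key):
        preds = self._predecessors(key)
        found = preds[0].next[0]
        if found is not None and found.key == key:
            return                       # already present
        node = SkipNode(key, self._random_height())
        for level in range(len(node.next)):
            node.next[level] = preds[level].next[level]
            preds[level].next[level] = node

    def delete(self, key):
        preds = self._predecessors(key)
        target = preds[0].next[0]
        if target is None or target.key != key:
            return                       # not present
        for level in range(len(target.next)):
            preds[level].next[level] = target.next[level]   # splice out

    def keys(self):
        out, node = [], self.head.next[0]
        while node is not None:
            out.append(node.key)
            node = node.next[0]
        return out
```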
From Skiplists to Zip Trees
Now that we've reviewed skiplists, let's talk about where the zip tree comes from.
Let's imagine that you're looking at the skiplist data structure. You really like the expected O(log n) performance of each operation, and you like how conceptually simple it is. There's just one problem - you really don't like linked lists, and the idea of building something with layers upon layers of linked lists doesn't excite you. On the other hand, you really love binary search trees. They've got a really simple structure - each node has just two pointers leaving it, and there's a simple rule about where everything gets placed. This question then naturally arises: could you get all the benefits of a skiplist, except in BST form?
It turns out that there's a really nice way to do this. Let's imagine that you have the skiplist shown here:
Now, imagine you perform a lookup in this skiplist. How would that search work? Well, you'd always begin by scanning across the top layer of the skiplist, moving forward until you found a key that was bigger than the one you were looking for, or until you hit the end of the list and found that there were no more nodes at the top level. From there, you'd then "descend" one level into a sub-skiplist containing only the keys between the last node you visited and the one that overshot.
It's possible to model this exact same search as a BST traversal. Specifically, here's how we might represent the top layer of that skiplist as a BST:
Notice that all these nodes chain to the right, with the idea being that "scanning forward in the skiplist" corresponds to "visiting larger and larger keys." In a BST, moving from one node to a larger node corresponds to moving right, hence the chain of nodes to the right.
Now, each node in a BST can have up to two children, and in the picture shown above each node has either zero children or one child. If we fill in the missing children by marking what ranges they correspond to, we get this.
And hey, wait a minute! It sure looks like the BST is partitioning the space of keys the same way that the skiplist is. That's promising, since it suggests that we're on to something here. Plus, it gives us a way to fill in the rest of the tree: we can recursively convert the subranges of the skiplist into their own BSTs and glue the whole thing together. If we do that, we get this tree encoding the skiplist:
We now have a way of representing a skiplist as a binary search tree. Very cool!
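The recursive conversion can be sketched in a few lines of Python. The input format (a key-sorted list of `(key, height)` pairs) and the names `Node` and `skiplist_to_bst` are assumptions made for this sketch. The root of each range is the leftmost element of maximum height, everything before it becomes the left subtree, and everything after it becomes the right subtree.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    key: int
    height: int                      # height of the skiplist node it came from
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def skiplist_to_bst(items: List[Tuple[int, int]]) -> Optional[Node]:
    """items: (key, height) pairs sorted by key, as read off the bottom layer."""
    if not items:
        return None
    top = max(height for _, height in items)
    # The first node met when scanning the top layer of this range.
    i = next(idx for idx, (_, height) in enumerate(items) if height == top)
    key, height = items[i]
    return Node(key, height,
                skiplist_to_bst(items[:i]),       # subrange before the node
                skiplist_to_bst(items[i + 1:]))   # rest of the layer and below


# Hypothetical skiplist contents, not the one pictured in the answer.
tree = skiplist_to_bst([(10, 1), (20, 3), (30, 1), (40, 2), (50, 3), (60, 1)])
```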
Now, could we go the other way around? That is, could we go from a BST to a skiplist? In general, there's no one unique way to do this. After all, when we converted the skiplist to a BST, we did lose some information. Specifically, each node in the skiplist has an associated height, and while each node in our BST has a height as well it's not closely connected to the skiplist node heights. To address this, let's tag each BST node with the height of the skiplist node that it came from. This is shown here:
Now, some nice patterns emerge. For starters, notice that each node's associated number is bigger than its left child's number. That makes sense, since each step to the left corresponds to descending into a subrange of the skiplist, where nodes will have lower heights. Similarly, each node's associated number is greater than or equal to the number of its right child. And that again makes sense - moving to the right either means
continuing forward at the same level that we were already on, in which case the height remains the same, or
hitting the end of a range and descending into a subrange, in which case the height decreases.
Can we say more about the shape of the tree? Sure we can! For example, in a skiplist, each node's height is picked by flipping coins until we get heads, then counting how many total coins we flipped. (Or, as before, it's geometrically distributed with probability 1/2). So if we were to imagine building a BST that corresponded to a skiplist, we'd want the numbers assigned to the nodes to work out the same way.
Putting these three rules together, we get the following, which defines the shape of our tree, the zip tree!
A zip tree is a binary search tree where
Each node has an associated number called its rank. Ranks are assigned randomly to each node by flipping coins until heads is flipped, then counting how many total coins were tossed.
Each node's rank is strictly greater than its left child's rank.
Each node's rank is greater than or equal to its right child's rank.
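The two structural rules are easy to check mechanically. Below is a small Python sketch of such a checker; `Node` and `is_zip_tree` are illustrative names, keys are assumed distinct, and the randomness of the ranks (the first rule) is of course not something a checker can verify.

```python
class Node:
    def __init__(self, key, rank, left=None, right=None):
        self.key = key
        self.rank = rank
        self.left = left
        self.right = right


def is_zip_tree(node, lo=None, hi=None):
    """True if the subtree is a BST within (lo, hi) and obeys the rank rules."""
    if node is None:
        return True
    if lo is not None and node.key <= lo:
        return False
    if hi is not None and node.key >= hi:
        return False
    if node.left is not None and node.left.rank >= node.rank:
        return False                 # left child must have a strictly lower rank
    if node.right is not None and node.right.rank > node.rank:
        return False                 # right child may tie but not exceed
    return (is_zip_tree(node.left, lo, node.key)
            and is_zip_tree(node.right, node.key, hi))
```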
It's amazing how something like a skiplist can be represented as a BST by writing out such simple rules!
Inserting Elements: Unzipping
Let's suppose you have a zip tree. How would you insert a new element into it?
We could in principle answer this question by looking purely at the rules given above, but I think it's a lot easier to figure this out by remembering that zip trees are skiplists in disguise. For example, here's the above zip tree, with its associated skiplist:
Now, suppose we want to insert 18 into this zip tree. To see how this might play out, imagine that we decide to give 18 a rank of 2. Rather than looking at the zip tree, let's look at what would happen if we did the insertion into the skiplist. That would give rise to this skiplist:
If we were to take this skiplist and encode it as a zip tree, we'd get the following result:
What's interesting about this is that we can see what the tree needs to look like after the insertion, even if we don't know how to perform the insertion. We can then try to figure out what the insertion logic needs to look like by reverse-engineering it from these "before" and "after" pictures.
Let's think about what change this insertion made to our zip tree. To begin with, let's think back to our intuition for how we encode skiplists as zip trees. Specifically, chains of nodes at the same level in a skiplist with no intervening "higher" elements map to chains of nodes in the zip tree that lean to the right. Inserting an element into the skiplist corresponds to adding some new element into one of the levels, which has the effect of (1) adding in something new into some level of the skiplist, and (2) taking chains of elements in the skiplist that previously were adjacent at some level, then breaking those connections.
For example, when we inserted 18 into the skiplist shown here, we added something new into the blue chain highlighted here, and we broke all of the red chains shown here:
What is that going to translate into in our zip tree? Well, we can highlight the blue link where our item was inserted here, as well as the red links that were cut:
Let's see if we can work out what's going on here. The blue link here is, fortunately, pretty easy to find. Imagine we do a regular BST insertion to add 18 into our tree. As we're doing so, we'll pause when we reach this point:
Notice that we've hit a key with the same rank as us. That means that, if we were to keep moving to the right, we'd trace out this region of the skiplist:
To find the blue edge - the place where we go - we just need to walk down through this chain of nodes until we find one bigger than us. The blue edge - our insertion point - is then given by the edge between that node and the one above it.
We can identify this location in a different way: we've found the blue edge - our insertion point - when we've reached a point where the node to insert (1) has a bigger rank than the node to the left, (2) has a rank that's greater than or equal to the node on the right, and (3) if the node to the right has the same rank, our new item to insert is less than the item to the right. The first two rules ensure that we're inserting into the right level of the skiplist, and the last rule ensures that we insert into the right place in that level of the skiplist.
Now, where are our red edges? Intuitively, these are the edges that were "cut" because 18 has been added into the skiplist. Those would be items that previously were between the two nodes on opposite ends of the blue edge, but which now need to get partitioned into the new ranges defined by the split version of that blue edge.
Fortunately, those edges appear in really nice places. Here's where they map to:
(In this picture, I've placed the new node 18 in the middle of the blue edge that we identified in the skiplist. This causes the result not to remain a BST, but we'll fix that in a minute.)
Notice that these are the exact same edges that we'd encounter if we were to finish doing our regular BST insertion - it's the path traced out by looking for 18! And something really nice happens here. Notice that
each time we move to the right, the node, when cut, goes to the right of 18, and
each time we move to the left, the node, when cut, goes to the left of 18.
In other words, once we find the blue edge where we get inserted, we keep walking as though we were doing our insertion as usual, keeping track of the nodes where we went left and the nodes where we went right. We can then chain together all the nodes where we went left and chain together all the nodes where we went right, gluing the results together under our new node. That's shown here:
This operation is called unzipping, and it's where we get the name "zip tree" from. The name kinda makes sense - we're taking two interleaved structures (the left and right chains) and splitting them apart into two simpler linear chains.
To summarize:
Inserting x into a zip tree works as follows:
1. Assign a random rank to x by flipping coins and counting how many flips were needed to get heads.
2. Do a search for x. Stop the search once you reach a node where
   a. the node's left child has a lower rank than x,
   b. the node's right child has a rank less than or equal to x's, and
   c. the node's right child, if it has the same rank as x, has a larger key than x.
3. Perform an unzip. Specifically:
   a. Continue the search for x as before, recording when we move left and when we move right.
   b. Chain all the nodes together where we went left by making each the left child of the previously-visited left-moving node.
   c. Chain all the nodes together where we went right by making each the right child of the previously-visited right-moving node.
   d. Make those two chains the children of the node x.
You might notice that this "unzipping" procedure is equivalent to what you'd get if you performed a different operation. You could achieve the same result by inserting x as usual, then using tree rotations to pull x higher and higher in the tree until it came to rest in the right place. This is a perfectly valid alternative strategy for doing insertions, though it's a bit slower because two passes over the tree are required (a top-down pass to insert at a leaf, then a bottom-up pass to do the rotations).
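The insertion steps summarized above could be sketched in Python roughly like this. It is a recursive sketch rather than the pointer-chasing loop described in the text: `unzip` splits the subtree along the search path for the new key, which produces the same two chains. The names `Node`, `random_rank`, `unzip`, and `insert` are illustrative, and keys are assumed to be distinct.

```python
import random


class Node:
    def __init__(self, key, rank):
        self.key = key
        self.rank = rank
        self.left = None
        self.right = None


def random_rank(rng=random):
    """Flip coins until heads; the rank is the total number of flips."""
    rank = 1
    while rng.random() < 0.5:
        rank += 1
    return rank


def unzip(node, key):
    """Split a subtree into (nodes with smaller keys, nodes with larger keys)."""
    if node is None:
        return None, None
    if node.key < key:
        # The search moves right here, so this node joins the "smaller" chain.
        smaller, larger = unzip(node.right, key)
        node.right = smaller
        return node, larger
    # The search moves left here, so this node joins the "larger" chain.
    smaller, larger = unzip(node.left, key)
    node.left = larger
    return smaller, node


def insert(root, key, rank=None):
    """Insert key and return the new root of the subtree."""
    if rank is None:
        rank = random_rank()
    if root is None:
        return Node(key, rank)
    # The new node goes above root if it outranks it; on a rank tie the
    # smaller key goes on top.
    if rank > root.rank or (rank == root.rank and key < root.key):
        node = Node(key, rank)
        node.left, node.right = unzip(root, key)
        return node
    if key < root.key:
        root.left = insert(root.left, key, rank)
    else:
        root.right = insert(root.right, key, rank)
    return root
```

Passing `rank` explicitly is only there to make the behavior reproducible in experiments; normal use would leave it as `None`.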
Removing Elements: Zipping
Now that we've seen how to insert elements, how do we remove them?
Let's begin with a helpful observation: if we insert an item into a zip tree and then remove it, we should end up with the exact same tree that we started with. To see why this is, we can point back to a skiplist. If you add and then remove something from a skiplist, then you end up with the same skiplist that you would have had before. So that means that the zip tree needs to end up looking identical to how it started after we add and then remove an element.
To see how to do this, we'd need to perform two steps:
Undo the unzip operation, converting the two chains of nodes formed back into a linear chain of nodes.
Undo the break of the blue edge, restoring the insertion point of x.
Let's begin with how to undo an unzip operation. This, fortunately, isn't too bad. We can identify the chains of nodes that we made with the unzip operation when we inserted x into the zip tree fairly easily - we simply look at the left and right children of x, then move, respectively, purely to the left and purely to the right.
Now, we know that these nodes used to be linked together in a chain. What order do we reassemble them into? As an example, take a look at this part of a zip tree, where we want to remove 53. The chains to the left and right of 53 are highlighted:
If we look at the nodes making up the left and right chains, we can see that there's only one way to reassemble them. The topmost node of the reassembled chain must be 67, since it has rank 3 and will outrank all other items. After that, the next node must be 41, because it's the smaller of the rank-2 elements and elements with the same rank have smaller items on top. By repeating this process, we can reconstruct the chain of nodes, as shown here, simply by using the rules for how zip trees have to be structured:
This operation, which interleaves two chains together into one, is called zipping.
To summarize, here's how a deletion works:
Deleting a node x from a zip tree works as follows:
1. Find the node x in the tree.
2. Perform a zip of its left and right subtrees. Specifically:
   a. Maintain "lhs" and "rhs" pointers, initially to the left and right subtrees.
   b. While both those pointers aren't null:
      - If lhs's rank is greater than or equal to rhs's rank, make lhs's right child rhs, then advance lhs to what used to be lhs's right child.
      - Otherwise, make rhs's left child lhs, then advance rhs to point to what used to be rhs's left child.
3. Rewire x's parent to point to the result of the zip operation rather than x.
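A recursive Python sketch of the deletion steps above follows. The names `Node`, `zip_trees`, and `delete` are illustrative. On a rank tie the left-hand node (the one with the smaller key) is placed on top, since a node's rank has to be strictly greater than its left child's rank.

```python
class Node:
    def __init__(self, key, rank, left=None, right=None):
        self.key = key
        self.rank = rank
        self.left = left
        self.right = right


def zip_trees(lhs, rhs):
    """Merge two subtrees where every key in lhs is smaller than every key in rhs."""
    if lhs is None:
        return rhs
    if rhs is None:
        return lhs
    if lhs.rank >= rhs.rank:
        # lhs stays on top; rhs is interleaved into its right spine.
        lhs.right = zip_trees(lhs.right, rhs)
        return lhs
    # rhs stays on top; lhs is interleaved into its left spine.
    rhs.left = zip_trees(lhs, rhs.left)
    return rhs


def delete(root, key):
    """Remove key if present and return the new root of the subtree."""
    if root is None:
        return None
    if key == root.key:
        return zip_trees(root.left, root.right)
    if key < root.key:
        root.left = delete(root.left, key)
    else:
        root.right = delete(root.right, key)
    return root
```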
More to Explore
To recap our main points: we saw how to represent a skiplist as a BST by using the idea of ranks. That gave rise to the zip tree, which uses ranking rules to determine parent/child relationships. Those rules are maintained using the zip and unzip operations, hence the name.
Doing a full analysis of a zip tree is basically done by reasoning by analogy to a skiplist. We can show, for example, that the expected runtime of an insertion or deletion is O(log n) by pointing at the equivalent skiplist and noting that the time complexity of the equivalent operations there is O(log n). And we can similarly show that these aren't just expected time bounds, but bounds that hold with high probability.
There's a question of how to actually store the information needed to maintain a zip tree. One option would be to simply write the rank of each item down in the nodes themselves. That works, though since ranks are very unlikely to exceed O(log n) due to the nature of geometric random variables, that would waste a lot of space. Another alternative would be to use a hash function on node addresses to generate a random, uniformly-distributed integer in some range, then find the position of the least-significant 1 bit to simulate our coin tosses. That increases the costs of insertions and deletions due to the overhead of computing the hash codes, but also decreases the space usage.
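As a rough illustration of the hashing idea, here is a Python sketch that derives a rank from the position of the lowest set bit of a 64-bit hash. Python has no stable node addresses to hash, so this sketch hashes the key's `repr` with BLAKE2b instead; that substitution, and the names `lowest_one_position` and `rank_from_key`, are assumptions of this example.

```python
import hashlib


def lowest_one_position(bits: int) -> int:
    """1-based position of the least-significant 1 bit (bits must be > 0)."""
    return (bits & -bits).bit_length()


def rank_from_key(key) -> int:
    """Simulate coin flips with hash bits: each 0 bit is tails, the first 1 is heads."""
    digest = hashlib.blake2b(repr(key).encode("utf-8"), digest_size=8).digest()
    bits = int.from_bytes(digest, "big")
    if bits == 0:
        return 65        # all 64 "flips" came up tails; vanishingly unlikely
    return lowest_one_position(bits)
```

Since the rank can be recomputed from the key at any time, nothing extra has to be stored in the node.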
Zip trees aren't the first data structure to map skiplists and BSTs together. Dean and Jones developed an alternative presentation of this idea in 2007. There's also another way to exploit this connection. Here, we started with a randomized skiplist, and used it to derive a randomized BST. But we can run this in reverse as well - we can start with a deterministic balanced BST and use that to derive a deterministic skiplist. Munro, Papadakis, and Sedgewick found a way to do this by connecting 2-3-4 trees and skiplists.
And zip trees aren't the only randomized balanced BST. The treap was the first structure to do this, and with a little math you can show that treaps tend to have slightly lower expected heights than zip trees. The tradeoff, though, is that you need more random bits per node than in a zip tree.
Hope this helps!

Is it always possible to turn one BST into another using at most O(n) tree rotations?

This earlier question asks whether it's always possible to turn one BST for a set of values into another BST for the same set of values purely using tree rotations (the answer is yes). However, is it always possible to do this using at most O(n) total tree rotations?
Yes, it is always possible to turn one BST into another using at most O(n) tree rotations. This answer follows the same general approach as the other answer by picking some canonical tree shape T* and bounding the number of rotations needed to turn an arbitrary tree into our canonical tree. Then you can turn an arbitrary tree T₁ into another tree T₂ by transforming T₁ into T* and then transforming T* into T₂.
As suggested in comments, you can choose your canonical tree to be a degenerate linked list. For trees of n nodes, this upper bounds the number of rotations needed at 2n−2.
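To make the linked-list bound concrete, here is a Python sketch that flattens any BST into a right-leaning chain using only right rotations and counts them. It follows the tree-to-vine phase of the Day-Stout-Warren algorithm; `Node` and `tree_to_vine` are illustrative names. Every rotation moves one more node onto the right spine, so at most n−1 rotations are needed per tree, which gives 2n−2 for going from T₁ to the chain and then backward to T₂.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def tree_to_vine(root):
    """Return (new_root, rotation_count) after flattening into a right chain."""
    pseudo_root = Node(None, right=root)   # dummy parent of the real root
    tail = pseudo_root                     # last node known to be in the chain
    rest = root                            # next node still to be processed
    rotations = 0
    while rest is not None:
        if rest.left is None:
            tail = rest                    # already on the chain, move down
            rest = rest.right
        else:
            child = rest.left              # right rotation at rest
            rest.left = child.right
            child.right = rest
            rest = child
            tail.right = child
            rotations += 1
    return pseudo_root.right, rotations
```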
In the paper Rotation Distance, Triangulation, and Hyperbolic Geometry, Daniel Sleator, Robert Tarjan, and William Thurston proved that the rotation distance between any two binary trees of n nodes is at most 2n−6 for n ≥ 11 (better than the bound we get when transforming into a linked list).
At a high level, they did this by introducing a way to represent any binary tree as a polygon triangulation, where a tree rotation has a corresponding triangulation operation. Then, instead of reasoning about binary trees in their usual representation, the paper picks a canonical triangulation and shows how to transform an arbitrary triangulation into their desired one.
The canonical triangulation they chose is one where all diagonals emanate from a single vertex in a fan-like shape, which ends up corresponding to a somewhat unintuitive binary tree shape (a generalization of linked lists that also includes diamond shaped trees consisting of a root, a left child whose right child is a linked list, and a right child whose left child is a linked list).
It's a very cool technique that illustrates the power of isometries in data structures, showing how changing our representation can give us a new way of approaching a problem. Some friends and I recently put together a writeup walking through Sleator, Tarjan, and Thurston's proof if you would like to explore this in more detail.
Yes, this is always possible. I fear that the best I can do right now is give you a silly algorithm that proves it's possible, though I suspect that there must be a much better way to do this.
The Day-Stout-Warren algorithm is an algorithm that, starting with any BST, uses tree rotations to convert it to a perfectly balanced BST. It runs in time O(n) and does O(n) total rotations.
So suppose that you want to turn one tree T1 into another tree T2 using tree rotations. Run Day-Stout-Warren on both trees to convert them to the same balanced tree T*, and record the rotations that you needed to make in both cases. Then you can turn T1 into T2 by first running all the rotations needed to perfectly balance T1, then running the reverse of the rotations needed to turn T2 into a balanced tree. This turns T1 into T* and then turns T* into T2. Since the Day-Stout-Warren algorithm makes only O(n) total rotations, this too makes only O(n) total rotations.
I feel like there has to be a better way to do this, but I'm not sure off the top of my head how to achieve this. If I think of anything, I'll let you know!

kd-tree stores points in inner nodes? If yes, how to search for NN?

The Wikipedia article on kd-trees stores points in the inner nodes. I have to perform NN queries and I think (newbie here) I understand the concept.
However, I was told to study kd-trees from Computational Geometry: Algorithms and Applications (De Berg, Cheong, Van Kreveld and Overmars), section 5.2, page 99. The main difference I can see is that Overmars places the splitting data in the inner nodes and the actual points of the dataset in the leaves. For example, in 2D, an inner node will hold the splitting line.
Wikipedia, on the other hand, seems to store points in both inner nodes and leaves (while Overmars stores them only in leaves).
In this case, how do we perform nearest neighbour search? Moreover, why is there this difference?
Default k-d-trees should split the data set at a point. This point is then stored in the inner node, and checked as a candidate neighbor when you walk down the tree at search time.
Of course you can have various variants of k-d-trees where the split may be at a different place, and when there is no element exactly at the splitting position, you can't have one in the inner node anymore.
Also, as k-d-trees are not dynamic, when simulating deletions via tombstones, the inner node may only contain a tombstone (representing a deleted object).
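For the variant that keeps points in inner nodes, a nearest-neighbour search could be sketched in Python as below. Each node is a plain tuple `(point, axis, left, right)`, and `build`, `nearest`, and `dist2` are illustrative names. The point stored at every visited node is checked as a candidate, and the far side of a split is explored only if the splitting plane is closer than the best match found so far.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def build(points, depth=0):
    """Split at the median point along the current axis; the point stays in the node."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    point, axis, left, right = node
    if best is None or dist2(point, target) < dist2(best, target):
        best = point                           # the inner node's own point counts
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, best)
    if diff * diff < dist2(best, target):      # could the far side hold something closer?
        best = nearest(far, target, best)
    return best
```

With points only in the leaves, the search has the same shape; the only change is that the candidate check happens at leaves and inner nodes just supply the splitting coordinate.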

Why storing data only in the leaf nodes of a balanced binary-search tree?

I have bought a nice little book about computational geometry. While reading it here and there, I often stumbled over the use of this special kind of binary search tree. These trees are balanced and should store the data only in the leaf nodes, whereas inner nodes should only store values to guide the search down to the leaves.
The following image shows an example of such a tree (where the leaves are rectangles and the inner nodes are circles).
I have two questions:
What is the advantage of not storing data in the inner nodes?
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
Any kind of helpful resource is very welcome.
What is the advantage of not storing data in the inner nodes?
There are some tree data structures that, by design, require that no data is stored in the inner nodes, such as Huffman code trees and B+ trees. In the case of Huffman trees, the requirement is that no symbol's code is a prefix of another symbol's code (for example, if the path to node 'A' were 101 and the path to node 'B' were 10, then B would sit on the path to A and decoding would be ambiguous). In the case of B+ trees, it comes from the fact that it is optimized for block-search (this also means that every internal node has a lot of children, and that the tree is usually only a few levels deep).
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
Sure! An AVL tree is not extremely complicated, so it's a good candidate for learning.
It is common to have other kinds of binary trees with data at the leaves instead of the interior nodes, but fairly uncommon for binary SEARCH trees.
One reason you might WANT to do this is educational -- it's often EASIER to implement a binary search tree this way than the traditional way. Why? Almost entirely because of deletions. Deleting a leaf is usually very easy, whereas deleting an interior node is harder/messier. If your data is only at the leaves, then you are always in the easy case!
It's worth thinking about where the keys on interior nodes come from. Often they are duplicates of keys that are also at the leaves (with data). Later, if the key at the leaf is deleted, the key at the interior nodes might still hang around.
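To illustrate both points, here is an unbalanced Python sketch of a search tree with data only at the leaves. `Leaf`, `Inner`, `insert`, `delete`, and `lookup` are illustrative names, and no rebalancing is attempted. Deletion only ever removes a leaf and replaces its parent by the sibling, and an interior key can outlive the leaf it was copied from.

```python
class Leaf:
    def __init__(self, key, value):
        self.key = key
        self.value = value


class Inner:
    def __init__(self, key, left, right):
        self.key = key          # router: keys <= key go left, larger keys go right
        self.left = left
        self.right = right


def insert(node, key, value):
    if node is None:
        return Leaf(key, value)
    if isinstance(node, Leaf):
        if key == node.key:
            node.value = value
            return node
        new = Leaf(key, value)
        lo, hi = (new, node) if key < node.key else (node, new)
        return Inner(lo.key, lo, hi)     # router is a copy of the smaller key
    if key <= node.key:
        node.left = insert(node.left, key, value)
    else:
        node.right = insert(node.right, key, value)
    return node


def delete(node, key):
    if node is None:
        return None
    if isinstance(node, Leaf):
        return None if node.key == key else node
    if key <= node.key:
        node.left = delete(node.left, key)
        return node.right if node.left is None else node    # sibling replaces parent
    node.right = delete(node.right, key)
    return node.left if node.right is None else node


def lookup(node, key):
    while isinstance(node, Inner):
        node = node.left if key <= node.key else node.right
    return node.value if node is not None and node.key == key else None
```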
What is the advantage of not storing data in the inner nodes?
In general, there is no advantage in not storing data in the inner nodes. For example, a red-black tree is a balanced tree and it stores its data in both the inner and the leaf nodes.
For the purpose of learning, I would like to implement such a tree. Therefore, I thought it might be a good idea to use an AVL tree as the basis, but is it a good idea?
In my opinion, it is.
One benefit to only keeping the data in leaf nodes (e.g., B+ tree) is that scanning/reading the data is exceedingly simple. The leaf nodes are linked together. So to read the next item when you are at the "end" (right or left) of the data within a given leaf node, you just read the link/pointer to the next (or previous) node and jump to the next leaf page.
With a B tree where data is in every node, you have to traverse the tree to read the data in order. That is certainly a well-defined process but is arguably more complex and typically requires more state information.
I am reading the same book and they say it could be done either way, data storage at external or at internal nodes.
The trees they use are Red-Black.
In any case, here is an article that stores data at internal nodes of a Red Black Tree and then links these data nodes together as a list.
Balanced binary search tree with a doubly linked list in C++
by Arjan van den Boogaard
http://archive.gamedev.net/archive/reference/programming/features/TStorage/default.html
