How to delete in a B-Tree when all nodes are singletons - algorithm

I'm working through the B-Tree example given on the ever-wonderful Wikipedia on this page. (I'm using Wikipedia because Stack Overflow tells me to: How do you remove an element from a b-tree?)
I'm happy with the construction of this tree.
...and I find the algorithm elegant.
My issue is that the descriptions on Wikipedia for deleting a node appear to be missing a case. The three cases given for 're-balancing after deletion' are:
If the deficient node's right sibling exists and has more than the minimum number of elements, then rotate left
Otherwise, if the deficient node's left sibling exists and has more than the minimum number of elements, then rotate right
Otherwise, if both immediate siblings have only the minimum number of elements, then merge with a sibling sandwiching their separator taken off from their parent.
None of these turns out to be helpful if the deficient node has no siblings (for example, in the tree above, delete '1': '2' is now deficient and has no siblings).
My question is, what is the case/cases that are missing (presuming I've understood correctly), and what should the Wikipedia page say?

for example in the tree above, delete '1', '2' is now deficient and has no siblings
Yes, it has a sibling: The node (6,_). If you have no siblings, you are the root.
So in this case, we apply option 3 and end up with a two-level tree.
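To see why the three Wikipedia cases are enough, here is a minimal sketch of deletion with rebalancing. It assumes an order-3 B-tree (every non-root node holds 1 or 2 keys) with distinct keys; `Node`, `fix_child`, `delete` and `delete_from_tree` are hypothetical names, not taken from the Wikipedia article. The cases are tried in the order the question lists them. The only extra step is at the root: when a merge takes the root's last key, the root is dropped and its single child becomes the new root, which is how the tree loses a level.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MIN_KEYS = 1  # assumption: order-3 B-tree, so a non-root node needs at least 1 key


@dataclass
class Node:
    keys: List[int]
    children: List["Node"] = field(default_factory=list)  # empty list means leaf


def fix_child(parent: Node, i: int) -> None:
    """Rebalance parent.children[i] if deletion left it with too few keys."""
    child = parent.children[i]
    if len(child.keys) >= MIN_KEYS:
        return
    left: Optional[Node] = parent.children[i - 1] if i > 0 else None
    right: Optional[Node] = parent.children[i + 1] if i + 1 < len(parent.children) else None
    if right is not None and len(right.keys) > MIN_KEYS:
        # Case 1, rotate left: separator moves down, right sibling's first key moves up.
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    elif left is not None and len(left.keys) > MIN_KEYS:
        # Case 2, rotate right: mirror image of case 1.
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    else:
        # Case 3, merge with a sibling, sandwiching the separator taken from the parent.
        j = i if right is not None else i - 1  # index of the left node of the pair
        a, b = parent.children[j], parent.children[j + 1]
        a.keys = a.keys + [parent.keys.pop(j)] + b.keys
        a.children = a.children + b.children
        del parent.children[j + 1]
        # The parent may now be deficient; its own parent fixes it on the way back up.


def delete(node: Node, key: int) -> None:
    """Remove key (assumed present) from the subtree rooted at node."""
    if not node.children:
        node.keys.remove(key)
        return
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if i < len(node.keys) and node.keys[i] == key:
        # Internal key: copy in the in-order predecessor, then delete it from the left subtree.
        pred = node.children[i]
        while pred.children:
            pred = pred.children[-1]
        node.keys[i] = pred.keys[-1]
        key = node.keys[i]
    delete(node.children[i], key)
    fix_child(node, i)


def delete_from_tree(root: Node, key: int) -> Node:
    """Delete key and return the (possibly new) root."""
    delete(root, key)
    if not root.keys and root.children:
        return root.children[0]  # root emptied by a merge: the tree loses a level
    return root


# The 1..7 example tree: root (4), internal nodes (2) and (6), leaves 1, 3, 5, 7.
leaves = [Node([1]), Node([3]), Node([5]), Node([7])]
root = Node([4], [Node([2], leaves[:2]), Node([6], leaves[2:])])
root = delete_from_tree(root, 1)
print(root.keys, [c.keys for c in root.children])  # [4, 6] [[2, 3], [5], [7]]
```

Deleting 1 triggers case 3 twice: first at the leaf level, then for the node that held 2 (whose sibling is (6)), and the emptied root is discarded.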

kd-tree: duplicate key and deletion

In these slides (13), the deletion of a point in a kd-tree is described. They state that if the deleted node has only a left subtree, that subtree can be swapped to become the right subtree. Then the minimum can be found and recursively deleted (just as with a right subtree).
This is because, in a kd-tree, points whose key equals the node's key in the current dimension should be placed in the right subtree.
My question: Why does a point with an equal key have to be a right child of the parent point? Also, what happens if my kd-tree algorithm returns a tree with an equal-key point on the left?
For example:
Assume the dataset (7,2), (7,4), (9,6)
The resulting kd-tree would be (sorted with respect to one axis):
    (7,2)
    /   \
(7,4)   (9,6)
Another source that states the same theory is this one (paragraph above Example 15.4.3)
Note that we can replace the node to be deleted with the least-valued node from the right subtree only if the right subtree exists. If it does not, then a suitable replacement must be found in the left subtree. Unfortunately, it is not satisfactory to replace N's record with the record having the greatest value for the discriminator in the left subtree, because this new value might be duplicated. If so, then we would have equal values for the discriminator in N's left subtree, which violates the ordering rules for the kd tree. Fortunately, there is a simple solution to the problem. We first move the left subtree of node N to become the right subtree (i.e., we simply swap the values of N's left and right child pointers). At this point, we proceed with the normal deletion process, replacing the record of N to be deleted with the record containing the least value of the discriminator from what is now N's right subtree.
Both refer to nodes that have only a left subtree, but why would this case be any different?
Thanks!
There is no hard-and-fast rule that equal keys must go on the right only. You can switch the convention to the left as well.
But if you do, you would also need to update your search and delete algorithms.
Have a look at these links:
https://www.geeksforgeeks.org/k-dimensional-tree/
https://www.geeksforgeeks.org/k-dimensional-tree-set-3-delete/
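The equal-keys-go-right convention and the swap trick quoted in the question can be sketched as follows. This is a minimal sketch assuming 2-D points (k=2), splitting on x at even depths and on y at odd depths; `KDNode`, `insert`, `find_min` and `delete` are hypothetical names. If you switch to equal keys going left, the tie branch in `insert`, the descent in `delete` and the reasoning in `find_min` all have to change together, which is what updating search and delete amounts to.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KDNode:
    point: Point
    left: Optional[KDNode] = None
    right: Optional[KDNode] = None


def insert(node: Optional[KDNode], point: Point, depth: int = 0, k: int = 2) -> KDNode:
    if node is None:
        return KDNode(point)
    axis = depth % k
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1, k)
    else:  # equal keys on the splitting axis go right
        node.right = insert(node.right, point, depth + 1, k)
    return node


def find_min(node: Optional[KDNode], axis: int, depth: int = 0, k: int = 2) -> Optional[Point]:
    """Point with the smallest coordinate on `axis` in this subtree."""
    if node is None:
        return None
    if depth % k == axis:
        # Left subtree is strictly smaller on this axis; the right subtree is never smaller.
        return node.point if node.left is None else find_min(node.left, axis, depth + 1, k)
    candidates = [node.point,
                  find_min(node.left, axis, depth + 1, k),
                  find_min(node.right, axis, depth + 1, k)]
    return min((p for p in candidates if p is not None), key=lambda p: p[axis])


def delete(node: Optional[KDNode], point: Point, depth: int = 0, k: int = 2) -> Optional[KDNode]:
    if node is None:
        raise KeyError(point)
    axis = depth % k
    if node.point == point:
        if node.left is None and node.right is None:
            return None
        if node.right is None:
            # Swap trick: move the left subtree to the right, then proceed as usual.
            node.right, node.left = node.left, None
        # Minimum on this axis from the right subtree keeps "left < node <= right" valid.
        replacement = find_min(node.right, axis, depth + 1, k)
        node.point = replacement
        node.right = delete(node.right, replacement, depth + 1, k)
        return node
    if point[axis] < node.point[axis]:
        node.left = delete(node.left, point, depth + 1, k)
    else:
        node.right = delete(node.right, point, depth + 1, k)
    return node


root = None
for p in [(7, 2), (7, 4), (9, 6)]:
    root = insert(root, p)
print(root.left, root.right.point)  # None (7, 4)
```

Inserted one at a time under this convention, the question's points put (7,4) to the right of (7,2), unlike the median-built tree in the question. Taking the maximum from the left subtree instead could leave a tie on the left, which is exactly what the swap avoids.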

Something wrong with this heap diagram joke from xkcd?

I came across this picture, and someone had commented that there's a problem with the diagram, but I am not sure what it is.
Here's the picture: (original link)
The tree looks all right to me, but the heap raises some doubt.
I know that in a binary heap, if the root has two children, the left child must have its two children before we can proceed to the right child. Is that also the case with an n-ary heap? That is, since the root has four children, should the first child have had its four children before we move on to the next child?
In general, a structure is a heap if it satisfies the heap condition; this one does, so it is OK as a heap.
If we're looking for a concrete kind of heap, I guess a pairing heap would fit.
The problem is that there is a second condition that is generally required: every row of the tree must be full except possibly the last, and the last row must be left-filled. In other words, if any nodes are missing on the last row, they must all be towards the right. In the diagram, the second node in the fourth row has no children, and the fourth and fifth each have just a right child. Even worse, the first node in the second row doesn't have a right child. There is one more problem, but I'll leave it to you to find it.
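The shape condition described in the last answer can be checked mechanically. A minimal sketch, assuming each node stores its children in a list of up to `d` slots where `None` marks an empty slot (so "just a right child" in a binary tree is `[None, child]`); `Node`, `is_complete` and `is_max_heap_ordered` are hypothetical names. Walking the child slots in breadth-first order, the tree is complete exactly when no real node appears after the first empty slot, and that holds for any `d`, including the 4-ary case in the question.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    value: int
    children: List[Optional["Node"]] = field(default_factory=list)  # up to d slots, None = empty


def is_complete(root: Optional[Node], d: int) -> bool:
    """Rows full except possibly the last, and the last row filled from the left."""
    queue = deque([root])
    seen_gap = False
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
            continue
        if seen_gap or len(node.children) > d:
            return False  # a node after an empty slot, or too many children
        queue.extend(node.children + [None] * (d - len(node.children)))
    return True


def is_max_heap_ordered(node: Optional[Node]) -> bool:
    """Heap condition: no child is larger than its parent."""
    if node is None:
        return True
    return all(c is None or (c.value <= node.value and is_max_heap_ordered(c))
               for c in node.children)
```

A tree like `Node(9, [None, Node(7)])` passes the heap-order check but fails `is_complete`, which is the distinction the answers draw. A heap stored in an array gets the complete shape for free.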

What defines left and right in graphs and trees?

I'm learning about tree traversals and I can't seem to find any clear rules for how DFS or BFS algorithms decide which path to take first. I've seen variations of left first or least first.
Is left taken as being first child in the list?
Does this mean that (for a given node) the depth of a vertex in a graph that is part of a cycle is taken using the leftward path?
Also doesn't using a 'least first' rule make for a slower algorithm?
Thanks
Left is only meaningful for trees whose child nodes are ordered. Otherwise, the author usually means first in the list of child nodes. Depth of a vertex is also not well defined in a graph that is not a tree, but if you mean depth with respect to a given node, that would usually be the shortest distance from the starting node.
I am not sure what least first means, but if it refers to the key values of the nodes and the child nodes are not ordered, finding the least one will of course take more time.
Hope this helps.
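A small sketch of the point about ordering, assuming the graph is an adjacency list (a dict of lists, a hypothetical representation) and that "left first" simply means "first in the list". With `least_first=True` each neighbour list is sorted before use, which costs O(k log k) per visited vertex with k neighbours; that is the extra time mentioned above. `bfs` also records depth as the shortest number of edges from the start vertex, so for a vertex on a cycle the depth does not depend on which branch is explored first. `neighbours`, `dfs` and `bfs` are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]


def neighbours(graph: Graph, node: str, least_first: bool) -> List[str]:
    # "Left first" = the list's own order; "least first" = sort by key (costs O(k log k)).
    return sorted(graph[node]) if least_first else graph[node]


def dfs(graph: Graph, start: str, least_first: bool = False) -> List[str]:
    order, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push in reverse so the first ("leftmost") neighbour is popped next.
        stack.extend(reversed(neighbours(graph, node, least_first)))
    return order


def bfs(graph: Graph, start: str, least_first: bool = False) -> Tuple[List[str], Dict[str, int]]:
    order, depth, queue = [], {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nb in neighbours(graph, node, least_first):
            if nb not in depth:
                depth[nb] = depth[node] + 1  # first discovery = shortest distance
                queue.append(nb)
    return order, depth


g = {"a": ["c", "b"], "b": ["d"], "c": ["d"], "d": ["a"]}  # a->c->d->a and a->b->d->a are cycles
print(dfs(g, "a"))                    # ['a', 'c', 'd', 'b']
print(dfs(g, "a", least_first=True))  # ['a', 'b', 'd', 'c']
print(bfs(g, "a")[1])                 # {'a': 0, 'c': 1, 'b': 1, 'd': 2}
```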

The rope data structure, redundancy on wikipedia or am I missing something?

Why are there duplicate nodes like 9, 1 and 6 in the Wikipedia article on ropes?
Am I missing something, or are those nodes fully redundant?
They (the non-leaf nodes with single children) seem completely pointless. There doesn't seem to be anything equivalent in the linked paper from Boehm et al. (there they use "normal" balanced trees).
They make no sense to me.
From the article:
Each node has a "weight" equal to the length of its string plus the sum of all the weights in its left subtree.
Those numbers appear to represent the weight of a node, based on the size of its children. So two nodes showing 6 are not necessarily duplicates: there is a Hello_ of weight 6 and a _Simon of weight 6.
Edit
For non-leaf values, the duplicates appear to be there to make the leaves at the same depth.
Those nodes could come about after a delete. Eventually you would want to rebalance so that every node has two children (or is a leaf) and every branch has the same depth.
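Assuming the usual reading of the weight definition (a leaf's weight is the length of its string; an internal node's weight is the total length of the text in its left subtree), here is a minimal sketch showing why a single-child node repeats a number from nearby levels. `Rope`, `weight` and `index` are hypothetical names, and the small tree is made up for illustration; it is not the tree in the Wikipedia figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rope:
    text: str = ""                  # only leaves carry text
    left: Optional["Rope"] = None
    right: Optional["Rope"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def length(self) -> int:
        if self.is_leaf():
            return len(self.text)
        return sum(c.length() for c in (self.left, self.right) if c is not None)

    def weight(self) -> int:
        # Leaf: length of its own string. Internal node: total length of its left subtree.
        if self.is_leaf():
            return len(self.text)
        return self.left.length() if self.left is not None else 0

    def index(self, i: int) -> str:
        """Character at position i (assumes 0 <= i < self.length())."""
        if self.is_leaf():
            return self.text[i]
        w = self.weight()
        if i < w:
            return self.left.index(i)
        return self.right.index(i - w)


x = Rope(left=Rope("Hello_"), right=Rope("my_"))
p = Rope(left=x)                         # a non-leaf node with a single child
root = Rope(left=p, right=Rope("_Simon"))
print(root.weight(), p.weight(), x.weight())  # 9 9 6
print(root.index(6), root.index(9))           # m _
```

The single-child node `p` carries no information of its own: its weight equals its parent's, and indexing simply passes through it, which matches the view that such nodes are an artifact (for example of a delete) rather than something the structure needs.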

Find a loop in a binary tree

How do you find a loop in a binary tree? I am looking for a solution other than marking the visited nodes as visited or hashing addresses. Any ideas?
Suppose you have a binary tree, but you don't trust it and think it might be a general graph. The general case dictates remembering the visited nodes. It is somewhat the same algorithm as constructing a minimum spanning tree from a graph, which means space and time complexity will be an issue.
Another approach would be to consider the data you store in the tree. Suppose you store numbers or hashes, so that values can be compared.
Pseudocode would test for these conditions:
1. Every node has at most 2 children and 1 parent (at most 3 connections). More than 3 connections => not a binary tree.
2. The parent must not be a child.
3. If a node has two children, then the left child has a smaller value than the parent and the right child has a bigger value. So if a leaf or inner node has as a child some node on a higher level (like its parent's parent), you can detect a loop from the values. If a child is a right child, its value must be bigger than its parent's; if that child forms a loop, it must come from either the left part or the right part of the parent.
3.a. If it comes from the left part, its value is smaller than its sibling's, so => not a binary tree. The idea is much the same for the other part.
Testing aside, in what form is the tree that you want to test? Remember that if every node has a pointer to its parent, that pointer points to a single parent. So depending on the format your tree is in, you can take advantage of this.
As mentioned already: A tree does not (by definition) contain cycles (loops).
To test whether your directed graph contains cycles (references to nodes already added to the tree), you can iterate through the tree, add each node (or its hash, if you prefer) to a visited list, and check whether each new node is already in the list.
Plenty of algorithms for cycle-detection in graphs are just a google-search away.
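Both suggestions can be sketched in a few lines, assuming a plain binary node with `left`/`right` pointers; `Node`, `is_valid_bst` and `is_tree` are hypothetical names. `is_valid_bst` follows the value-based idea and only applies to a binary search tree with distinct keys: every node must lie strictly between bounds inherited from its ancestors, so a pointer back to an ancestor always falls outside them and the check stops instead of looping. `is_tree` is the general visited-set approach, keyed on object identity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)  # compare by identity; a cyclic structure must not be compared field by field
class Node:
    value: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_valid_bst(root: Optional[Node]) -> bool:
    """Value-based check; assumes a BST with distinct keys. Terminates even on a loop."""
    stack = [(root, float("-inf"), float("inf"))]
    while stack:
        node, lo, hi = stack.pop()
        if node is None:
            continue
        if not lo < node.value < hi:
            return False  # out of range: not a BST, or a pointer back up the tree
        stack.append((node.left, lo, node.value))
        stack.append((node.right, node.value, hi))
    return True


def is_tree(root: Optional[Node]) -> bool:
    """General check for any values: reject a node reached twice (a loop or a shared child)."""
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        if id(node) in seen:
            return False
        seen.add(id(node))
        stack.extend((node.left, node.right))
    return True


t = Node(5, Node(3), Node(8))
print(is_valid_bst(t), is_tree(t))  # True True
t.left.right = t                     # back pointer to the root creates a loop
print(is_valid_bst(t), is_tree(t))  # False False
```

The value-based check needs no extra memory per node, but it only detects loops when the tree is a BST with distinct keys; for arbitrary values, some form of visited set is still needed.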
