Binary tree, deleting item and reconnecting node - binary-tree

I'm learning data structures and found out that, for binary search trees, there are two ways to reconnect the nodes when you delete an item. Are those two ways (below) correct?

Yes, they are. Note that you could also do the "mirror image" version of each way, so there are actually 4 ways in total.
In fact, there are quite a few ways that would produce a valid binary search tree. All you need to ensure is that every key in a node's left subtree is less than the node's key, and every key in its right subtree is greater. However, the ways you have listed are the simplest ones and are the ones typically used (unless it's a balanced tree and you need to rebalance it).
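As a rough illustration (not necessarily either of the two ways in the question's picture), here is a minimal sketch of one commonly used way to reconnect nodes in an unbalanced BST: a node with at most one child is replaced by that child, and a node with two children takes the key of its in-order successor. The names `Node`, `insert`, `delete` and `inorder` are made up for this example. The "mirror image" version would use the in-order predecessor (the largest key on the left) instead.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the subtree rooted at root and return the new root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def delete(root, key):
    """Delete key from the subtree rooted at root and return the new root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        # Zero or one child: reconnect the parent directly to the other subtree.
        if root.left is None:
            return root.right
        if root.right is None:
            return root.left
        # Two children: copy the in-order successor (smallest key on the
        # right) into this node, then delete that successor from the right.
        succ = root.right
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key
        root.right = delete(root.right, succ.key)
    return root


def inorder(root):
    """Return the keys in sorted order; used here to check the BST property."""
    if root is None:
        return []
    return inorder(root.left) + [root.key] + inorder(root.right)
```

After any deletion, `inorder` should still return the remaining keys in sorted order, which is a quick way to check that a reconnection scheme is valid.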

The two methods look correct. The first method also rebalances the tree, while the second simply reconnects the nodes.

Related

Data structure: a graph that's similar to a tree - but not a tree

I have implemented a data structure in C, based upon a series of linked lists, that appears to be similar to a tree, but not enough to be referred to as such, because in theory it allows the existence of cycles. Here's a basic outline of the nodes:
There is a single, identifiable root that doesn't have a parent node or brothers;
Each node contains a pointer to its "father", its nearest "brother" and the first of its "children";
There are "outer" nodes without children and brothers.
What should I call such a data structure? It cannot be a tree, because even if the pointers are clearly labelled and used differently, cycles like father->child->brother->father may very well exist. My question is: can terms such as "father", "children" and "brother" be used in the context of a graph, or are they reserved for trees? After quite a bit of research I'm still unable to clarify this matter.
Thanks in advance!
I'd say you can still call it a tree, because the foundation is a tree data structure. There is precedent for my claim: "Patricia Tries" are referred to as trees even though their leaf nodes may point up the tree (creating cycles). I'm sure there are other examples as well.
It sounds like the additional links you have are essentially just for convenience, and could be determined implicitly (rather than stored explicitly). By storing them explicitly you impose additional invariants on your tree operations (insert, delete, etc), but do not affect the underlying organization, which is a tree.
Precisely because you are naming and treating those additional links separately, they can be thought of as an "overlay" on top of your tree.
In the end it doesn't really matter what you call it or what category it falls into if it works for you. Reading a book like Sedgewick's "Algorithms in C" you realize that there are tons of data structures out there, and nothing wrong with inventing your own!
One more point: Trees are a special case of graphs, so there is nothing wrong with referring to it as a graph (or using graph algorithms on it) as well.
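To make the "overlay" idea concrete, here is a minimal sketch of the node layout described in the question. It is written in Python for brevity even though the original is in C, and all names are hypothetical. Traversal uses only the child and sibling links, which form the tree; the parent link is stored on top of that and is never followed while walking down.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.parent = None        # "father" link (the overlay: points back up)
        self.first_child = None   # first of the "children"
        self.next_sibling = None  # nearest "brother"


def add_child(parent, child):
    """Append child to the end of parent's sibling chain."""
    child.parent = parent
    if parent.first_child is None:
        parent.first_child = child
        return
    last = parent.first_child
    while last.next_sibling is not None:
        last = last.next_sibling
    last.next_sibling = child


def children(node):
    """Yield the children of node by walking the sibling chain."""
    child = node.first_child
    while child is not None:
        yield child
        child = child.next_sibling


def preorder(node):
    """Visit the logical tree using only the downward links."""
    names = [node.name]
    for child in children(node):
        names.extend(preorder(child))
    return names
```

Keeping `parent` consistent in `add_child` is an example of the extra invariant that explicit back-links impose on every insert and delete.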

How to get the parent in a k-ary level-order succinct trie?

I'm implementing a level-order succinct trie and, for a given node, I want to be able to jump back to its parent.
I tried several combinations of rank/level but I can't wrap my head around this one...
I'm using this article as base documentation:
http://stevehanov.ca/blog/index.php?id=120
It explains how to traverse children, but not how to go up.
Thanks to this MIT lecture (http://www.youtube.com/watch?v=1MVVvNRMXoU) I know this is possible (in constant time, as stated at 15:50), but the speaker only explains it for a binary trie (e.g. using the formula select1(floor(i/2))).
How can I do that on a k-ary trie?
Well, I don't know what select1() is, but the other part (floor(i/2)) looks like the trick you would use in an array-embedded binary tree, like those described here. You would divide by 2 because every parent has exactly 2 children --> every level uses twice the space of the parent level.
If you don't have the same number of children in every node (except for the leaves, and perhaps one node with fewer children), you can't use this trick.
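For the case where the trick does apply, a minimal sketch of the index arithmetic for a complete k-ary tree stored in an array. It assumes 0-based indices and level-order storage; the function names are made up for illustration. With 1-based indices and k = 2 the parent formula reduces to floor(i/2).

```python
def parent(i, k):
    """Index of the parent of node i in a 0-based array-embedded complete k-ary tree."""
    if i == 0:
        return None  # the root has no parent
    return (i - 1) // k


def children(i, k, n):
    """Indices of the children of node i that exist in an array of n nodes."""
    first = k * i + 1
    return list(range(first, min(first + k, n)))
```

This only works because every node's position is determined by its index alone; as soon as nodes have varying numbers of children, the arithmetic no longer holds.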
If you want to know the parent of any given node, you will need to add a pointer to the parent in every node.
Though, since trees are generally traversed starting at the root and going down, the usual thing to do is to store, in an array, the pointers to the nodes of the path. At any given point, the parent of the current node is the previous element in the array. This way you don't need to add a pointer to the parent in every node.
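A minimal sketch of that path-array idea, using a plain binary search tree as the example (the `Node` class and `find_with_path` are hypothetical names). The list returned by the search holds every node visited, so the parent of the node found is simply the second-to-last entry.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def find_with_path(root, key):
    """Return the nodes on the path from root to key (inclusive), or None if absent."""
    path = []
    node = root
    while node is not None:
        path.append(node)
        if key == node.key:
            return path
        node = node.left if key < node.key else node.right
    return None
```

For a found node, `path[-2]` is its parent (when the path has more than one entry), and no per-node parent pointer is stored.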
I think I've found my answer. This paper by Guy Jacobson explains it in section 3.2, "Level-order unary degree sequence".
parent(x){ select1(rank0(x)) }
Space-efficient Static Trees and Graphs
http://www.cs.cmu.edu/afs/cs/project/aladdin/wwwlocal/compression/00063533.pdf
This works pretty well, as long as you don't mess up your node numbering like I did.
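Since the numbering is where things go wrong, here is a small sketch of the formula under one specific set of assumptions: bit positions are 1-based, rank counts inclusively up to position x, the bit string starts with a "10" super-root, and a node is identified by the position of its 1-bit in its parent's degree sequence. The `rank` and `select` helpers below are naive linear scans written only for illustration; a real succinct structure would answer them in constant time with auxiliary tables. `first_child` is not from the answer above and is included only to cross-check the numbering.

```python
def rank(bits, bit, x):
    """Number of occurrences of bit in positions 1..x (1-based, inclusive)."""
    return sum(1 for b in bits[:x] if b == bit)


def select(bits, bit, n):
    """1-based position of the n-th occurrence of bit, or None if there is none."""
    if n <= 0:
        return None
    seen = 0
    for pos, b in enumerate(bits, start=1):
        if b == bit:
            seen += 1
            if seen == n:
                return pos
    return None


def parent(bits, x):
    """Parent of the node at position x, or None for the root."""
    return select(bits, 1, rank(bits, 0, x))


def first_child(bits, x):
    """First child of the node at position x, or None if it is a leaf."""
    pos = select(bits, 0, rank(bits, 1, x)) + 1
    if pos > len(bits) or bits[pos - 1] == 0:
        return None
    return pos


# Example tree: A has children B, C, D; B has one child E.
# Encoding: super-root "10", A "1110", B "10", C "0", D "0", E "0".
# Node positions: A=1, B=3, C=4, D=5, E=7.
LOUDS = [1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0]
```

Nothing in the formula depends on the node degree, so it applies to a k-ary trie as long as the same position convention is used everywhere.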

How do I balance a BK-Tree and is it necessary?

I am looking into using an Edit Distance algorithm to implement a fuzzy search in a name database.
I've found a data structure that will supposedly help speed this up through a divide and conquer approach - Burkhard-Keller Trees. The problem is that I can't find very much information on this particular type of tree.
If I populate my BK-tree with arbitrary nodes, how likely am I to have a balance problem?
If it is possible or likely for me to have a balance problem with BK-Trees, is there any way to balance such a tree after it has been constructed?
What would the algorithm look like to properly balance a BK-tree?
My thinking so far:
It seems that child nodes are distinct on distance, so I can't simply rotate a given node in the tree without re-calibrating the entire tree under it. However, if I can find an optimal new root node this might be precisely what I should do. I'm not sure how I'd go about finding an optimal new root node though.
I'm also going to try a few methods to see if I can get a fairly balanced tree by starting with an empty tree, and inserting pre-distributed data.
Start with an alphabetically sorted list, then queue from the middle. (I'm not sure this is a great idea because alphabetizing is not the same as sorting on edit distance).
Completely shuffled data. (This relies heavily on luck to pick a "not so terrible" root by chance. It might fail badly and might be probabilistically guaranteed to be sub-optimal).
Start with an arbitrary word in the list and sort the rest of the items by their edit distance from that item. Then queue from the middle. (I feel this is going to be expensive, and still do poorly as it won't calculate metric space connectivity between all words - just each word and a single reference word).
Build an initial tree with any method, flatten it (basically like a pre-order traversal), and queue from the middle for a new tree. (This is also going to be expensive, and I think it may still do poorly as it won't calculate metric space connectivity between all words ahead of time, and will simply get a different and still uneven distribution).
Order by name frequency, insert the most popular first, and ditch the concept of a balanced tree. (This might make the most sense, as my data is not evenly distributed and I won't have pure random words coming in).
FYI, I am not currently worrying about the name-synonym problem (Bill vs William). I'll handle that separately, and I think completely different strategies would apply.
There is a Lisp example in this article: http://cliki.net/bk-tree. As for balancing the tree: the data structure and the method seem complicated enough already, and the author doesn't say anything about unbalanced trees. If you do run into an unbalanced tree, maybe this structure is not the right fit for you?
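For readers who have not seen the structure, here is a minimal BK-tree sketch (not taken from the linked article; all names are hypothetical). Each node keeps at most one child per distance value, which is why a node cannot simply be rotated: every subtree is keyed by its distance to that particular node's word. The shape of the tree depends on insertion order, which is what the strategies in the question try to influence.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


class BKNode:
    def __init__(self, word):
        self.word = word
        self.children = {}  # distance from this word -> child subtree


def bk_insert(root, word):
    """Insert word below root; at most one child per distance value."""
    node = root
    while True:
        d = edit_distance(word, node.word)
        if d == 0:
            return  # already present
        if d not in node.children:
            node.children[d] = BKNode(word)
            return
        node = node.children[d]


def bk_search(root, word, max_dist):
    """Return all stored words within max_dist of word."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        d = edit_distance(word, node.word)
        if d <= max_dist:
            found.append(node.word)
        # Triangle inequality: only subtrees keyed d-max_dist..d+max_dist can match.
        for k, child in node.children.items():
            if d - max_dist <= k <= d + max_dist:
                stack.append(child)
    return found
```

The search is exact regardless of how lopsided the tree is; balance only affects how many nodes get visited.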

Suitable tree data structure

I have been reading about tree data structures to model a problem. I need to construct an in-memory representation of data which is very similar to the folder/file representation in a file system (I don't mean the actual files stored on disk, but the Explorer-like structure). The tree may be at most 10 levels deep. The intermediate nodes may only have a moderate number of children (say 10), but there could be thousands of leaf nodes (that is, like thousands of files in a folder, where a file is a leaf node).
Some thoughts
A binary tree cannot work, as one node can have at most 2 children (say we can have 3 subfolders).
A very generic tree implementation may be inefficient, as my data can be ordered: the left sibling is smaller than the right ones. I hope this allows for efficient traversal.
A B-tree sounds very close, but does it insist on balancing requirements? In my case, the depth won't be more than 10, but not every branch is necessarily that deep (say C:/windows vs. C:/MyDoc../A/B/C).
Please help with your experience. Should I custom-make a tree, or is there a suitable data structure available (I don't mean one specific to a programming language)?
You have two different kinds of nodes: files and folders.
A folder node contains a set (or map) of children, where the children may themselves be files or folders.
Alternatively, you might prefer for a folder node to contain a set of files and a set of folders.
For the sets, just use your favorite representation of ordered sets (probably the one that comes with whatever language you are using). Depending on the exact details of your situation, you might prefer to use a map instead.
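A minimal sketch of the map variant, with hypothetical `File` and `Folder` classes. It assumes names are unique within a folder. Python's built-in `dict` is a hash map rather than an ordered map, so this sketch sorts the names when a listing is requested; a language with a built-in ordered map would keep them sorted all the time.

```python
class File:
    def __init__(self, name):
        self.name = name


class Folder:
    def __init__(self, name):
        self.name = name
        self.children = {}  # child name -> File or Folder

    def add(self, node):
        """Add a File or Folder as a direct child and return it."""
        self.children[node.name] = node
        return node

    def lookup(self, path):
        """Resolve a path such as 'a/b/c.txt' relative to this folder, or return None."""
        node = self
        for part in path.split("/"):
            if not isinstance(node, Folder) or part not in node.children:
                return None
            node = node.children[part]
        return node

    def listing(self):
        """Names of the direct children in sorted order."""
        return sorted(self.children)
```

With a depth of at most 10, a lookup is at most 10 map accesses, no matter how many thousands of files a single folder holds.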
Use two separate data structures:
A binary search tree for search
And a general binary tree for representation
and link these two together.
Note:
In the general tree, put the folders first, in order, and put all the files in a BST attached as one last node.
Or Use:
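struct Node {
    struct Node *Left_Most_Child_Folder;  /* first sub-folder of this folder */
    struct Node *Right_Sibling_Folder;    /* next folder at the same level */
    struct BST_Node *Files_Root;          /* BST holding this folder's files */
};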
In a typical file system, the "directory-tree" and the search tree are not the same thing, and are usually maintained separately. The "directory-tree", which tells you what files/sub-folders a folder has, or the path to a particular file, simply reflects how the user organizes the files and is only useful to the user. The search tree on the other hand maintains the global index of all files, so as to facilitate a fast search.
For example, you can implement a Linux like file system, where a folder is a file that records the pointers of the other files/folders it contains. At the same time you maintain a B+ tree, which has every file pointer as a leaf. The balance condition of the B+ tree has nothing to do with how the user organizes the folders.
One way to do this would be to use a binary tree of binary trees. For example:
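struct Node {
    struct Node *Children;  /* root of the binary tree holding this node's children */
    struct Node *Left;      /* left link within the current level's binary tree */
    struct Node *Right;     /* right link within the current level's binary tree */
};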
And the root of your tree is a Node*.
This makes for easy traversal and quick insertion and removal of a node. Provided, of course, you know the path to the level where you want to insert the node, or the path to the node that you want to delete. But since you indicate that you want a model similar to Explorer, I assume that finding a particular level doesn't pose a problem.
Searching for a node at a particular level is as simple as searching a binary tree.
Without a little bit more information about what you're trying to model, that's the best I can do.
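A minimal sketch of the "binary tree of binary trees" idea from this answer, assuming the nodes at each level are ordered by name and the per-level trees are plain, unbalanced BSTs. The helper names are made up for illustration.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.children = None  # root of the BST holding this node's children
        self.left = None      # sibling with a smaller name
        self.right = None     # sibling with a larger name


def bst_find(root, name):
    """Ordinary BST search among the siblings rooted at root."""
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root


def bst_insert(root, node):
    """Insert node among the siblings rooted at root; return the (possibly new) root."""
    if root is None:
        return node
    if node.name < root.name:
        root.left = bst_insert(root.left, node)
    elif node.name > root.name:
        root.right = bst_insert(root.right, node)
    return root


def add_child(parent, name):
    """Create a child called name under parent (or return the existing one)."""
    existing = bst_find(parent.children, name)
    if existing is not None:
        return existing
    child = Node(name)
    parent.children = bst_insert(parent.children, child)
    return child


def find_path(root, parts):
    """Follow a list of names down from root, one BST search per level."""
    node = root
    for name in parts:
        node = bst_find(node.children, name)
        if node is None:
            return None
    return node
```

Because the per-level trees here are unbalanced, inserting thousands of files in sorted order would degrade each level to a linked list; a balanced tree per level would avoid that.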

How do you remove an element from a b-tree?

I'm trying to learn about B-trees, and every source I can find seems to omit the discussion of how to remove an element from the tree while preserving the B-tree properties.
Can someone explain the algorithm or point me to a resource that does explain how it's done?
There's an explanation of it on the Wikipedia page. B-tree - Deletion
If you haven't got it yet, I strongly recommend Cormen et al., Introduction to Algorithms, 3rd Edition.
It is not described because the operations naturally stem from the B-Tree properties.
Since you have a lower bound on the number of elements in a node, if removing an element violates this invariant, you need to restore it, which generally involves merging with a neighbour (or stealing some of its elements).
If you merge with a neighbour, you then need to remove an element from the parent node, which triggers the same algorithm. You apply this recursively until you reach the top.
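A minimal sketch of just that restore step, not a full B-tree deletion. It assumes a node is a sorted key list plus a child list (empty for leaves), that `min_keys` is the lower bound for non-root nodes, and that the parent has at least one key; `BNode` and `fix_underflow` are hypothetical names. A complete implementation would also locate the key, handle removal from internal nodes, and shrink the root when it becomes empty.

```python
class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys                # sorted keys
        self.children = children or []  # empty for a leaf


def fix_underflow(parent, i, min_keys):
    """Restore the minimum key count of parent.children[i] after a removal."""
    child = parent.children[i]
    if len(child.keys) >= min_keys:
        return
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None

    if left is not None and len(left.keys) > min_keys:
        # Steal from the left neighbour: rotate through the separator key.
        child.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
        if left.children:
            child.children.insert(0, left.children.pop())
    elif right is not None and len(right.keys) > min_keys:
        # Steal from the right neighbour, again via the separator key.
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
        if right.children:
            child.children.append(right.children.pop(0))
    else:
        # Merge with a neighbour; the separator moves down, so the parent
        # loses one key and may itself underflow (the caller must check).
        if left is not None:
            i, left_node, right_node = i - 1, left, child
        else:
            left_node, right_node = child, right
        left_node.keys += [parent.keys.pop(i)] + right_node.keys
        left_node.children += right_node.children
        parent.children.pop(i + 1)
```

After a merge, the caller would run the same check on the parent within its own parent, which is the recursion toward the root described above.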
B-trees don't have rebalancing (at least not the ones I have seen), so this is far less complicated than maintaining a red-black tree or an AVL tree, which is probably why people didn't feel compelled to write about removal.
Which B-trees are you talking about? With linked leaves or not? Also, there are different ways of removing an item (top-down, bottom-up, etc.). This paper might help: B-trees, Shadowing, and Clones (even though it contains a lot of file-system-specific material).
The deletion example from CLRS (2nd edition) is available here: http://ysangkok.github.io/js-clrs-btree/btree.html
Press "Init book" and then push the deletion buttons in order. That will cover all cases. Try and predict the new tree state before pushing each button, and try to recognize how the cases are all unique.

Resources