Why do ropes store data only at the leaves? - data-structures

If you look at the definition of a rope on Wikipedia:
A rope is a binary tree where each leaf (end node) holds a string and a length (also known as a "weight"), and each node further up the tree holds the sum of the lengths of all the leaves in its left subtree.
Why did a rope for strings end up with data only at the leaf nodes?
Is there a reason why you wouldn't, for example, use an AVL tree and store strings at every node in the tree?
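For reference, here is roughly what the quoted definition looks like in code. This is only a sketch to make the question concrete; the class and function names are mine, not from the Wikipedia article.

    # Internal nodes store only a weight (the total length of the left subtree);
    # the actual characters live only in the leaves.
    class Leaf:
        def __init__(self, text):
            self.text = text

    class Node:
        def __init__(self, left, right):
            self.left = left
            self.right = right
            self.weight = total_length(left)   # sum of leaf lengths on the left

    def total_length(node):
        if isinstance(node, Leaf):
            return len(node.text)
        return total_length(node.left) + total_length(node.right)

    def char_at(node, i):
        """Return the i-th character by walking down using the weights."""
        if isinstance(node, Leaf):
            return node.text[i]
        if i < node.weight:                    # index falls in the left subtree
            return char_at(node.left, i)
        return char_at(node.right, i - node.weight)

    # Example: "Hello, " + "world" indexed without ever concatenating the strings.
    rope = Node(Leaf("Hello, "), Leaf("world"))
    assert char_at(rope, 8) == "o"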

Related

difference between m-way tree and m-way search tree

I tried finding the difference between an m-way tree and an m-way search tree. Most resources only talk about m-way search trees and end up discussing B-trees or B+ trees.
My doubts are:
Is it analogous to the binary tree and binary search tree?
I read somewhere that m-way trees don't have any particular order and every node has to be filled fully before moving to a new node (complete tree).
Is it analogous to the binary tree and binary search tree?
Yes
m-way trees don't have any particular order
This is true.
and every node has to be filled fully before moving to a new node (complete tree)
Something like this describes a step in an algorithm, and has little to do with the data structure itself: nothing is "moving" in a data structure.
Definitions
In short: an m-way tree puts no conditions on the values stored in the nodes, while an m-way search tree does.
Reva Freedman, associate professor at Northern Illinois University, has notes on Multiway Trees in which four terms are defined in succession, each time indicating which additional requirements apply to the next term:
multiway tree
m-way tree
m-way search tree
B-tree of order m
Multiway Trees
A multiway tree is a tree in which a node can have more than two
children. A multiway tree of order m (or an m-way tree) is one in which
each node can have up to m children.
As with the other trees that have been studied, the nodes in an m-way
tree will be made up of key fields, in this case m-1 key fields, and
pointers to children.
To make the processing of m-way trees easier, some type of order will
be imposed on the keys within each node, resulting in a multiway
search tree of order m (or an m-way search tree). By definition, an
m-way search tree is an m-way tree in which:
Each node has m children and m-1 key fields
The keys in each node are in ascending order
The keys in the first i children are smaller than the ith key
The keys in the last m-i children are larger than the ith key
M-way search trees give the same advantages to m-way trees that binary
search trees gave to binary trees - they provide fast information
retrieval and update. However, they also have the same problems that
binary search trees had - they can become unbalanced, which means that
the construction of the tree becomes of vital importance.
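To make the definition above concrete, here is a small sketch of how lookup in an m-way search tree could work. It is only an illustration of the node layout and the search rule; the class and field names are my own, and a well-formed internal node is assumed to have one more child than it has keys.

    from bisect import bisect_left

    class MWayNode:
        def __init__(self, keys, children=None):
            self.keys = keys                    # ascending, at most m-1 of them
            self.children = children or []      # at most m subtrees (len(keys) + 1 if internal)

    def search(node, key):
        if node is None:
            return False
        i = bisect_left(node.keys, key)         # first key >= the search key
        if i < len(node.keys) and node.keys[i] == key:
            return True
        if not node.children:                   # leaf: the key is not present
            return False
        return search(node.children[i], key)    # child i holds the keys smaller than keys[i]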
B-Trees
An extension of a multiway search tree of order m is a B-tree of
order m. This type of tree will be used when the data to be
accessed/stored is located on secondary storage devices because they
allow for large amounts of data to be stored in a node.
A B-tree of order m is a multiway search tree in which:
The root has at least two subtrees unless it is the only node in the tree.
Each nonroot and each nonleaf node has at most m nonempty children and at least m/2 nonempty children.
The number of keys in each nonroot and each nonleaf node is one less than the number of its nonempty children.
All leaves are on the same level.
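A rough per-node check of the first three properties in this list might look as follows (reusing the MWayNode sketch above; CEIL(m/2) is the usual reading of "at least m/2"). The fourth property, all leaves on the same level, needs a whole-tree pass and is omitted here.

    import math

    def check_internal_node(node, m, is_root):
        c = len(node.children)
        if is_root:
            children_ok = c >= 2                       # root has at least two subtrees
        else:
            children_ok = math.ceil(m / 2) <= c <= m   # nonroot, nonleaf bounds
        keys_ok = len(node.keys) == c - 1              # one fewer key than children
        return children_ok and keys_ok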

Count nodes bigger than root in each subtree of a given binary tree in O(n log n)

We are given a tree with n nodes in the form of a pointer to its root node, where each node contains a pointer to its parent, left child and right child, and also a key which is an integer. For each node v I want to add an additional field v.bigger which should contain the number of nodes with a key bigger than v.key in the subtree rooted at v. Adding such a field to all nodes of the tree should take O(n log n) time in total.
I'm looking for any hints that would allow me to solve this problem. I tried several heuristics - for example, when thinking about doing this problem in a bottom-up manner, for a fixed node v, v.left and v.right could provide v with some kind of set (balanced BST?) with an operation bigger(x), which for a given x returns the number of elements bigger than x in that set in logarithmic time. The problem is, we would need to merge such sets in O(log n), so this seems like a no-go, as I don't know any ordered-set-like data structure which supports quick merging.
I also thought about a top-down approach - a node v adds one to u.bigger for some node u if and only if u lies on the simple path from v to the root and u.key < v.key. So v could update all such u's somehow, but I couldn't come up with any reasonable way of doing that...
So, what is the right way of thinking about this problem?
Perform a depth-first search in the given tree (starting from the root node).
When any node is visited for the first time (coming from its parent node), add its key to some order-statistics data structure (OSDS). At the same time, query the OSDS for the number of keys larger than the current key and initialize v.bigger with the negated result of this query.
When any node is visited for the last time (coming from its right child), query the OSDS for the number of keys larger than the current key and add the result to v.bigger.
You could apply this algorithm to any rooted tree (not necessarily a binary tree). And it does not necessarily need parent pointers (you could use the DFS stack instead).
For the OSDS you could use either an augmented BST or a Fenwick tree. In the case of a Fenwick tree you need to preprocess the given tree so that the key values are compressed: just copy all the keys to an array, sort it, remove duplicates, then substitute each key by its index in this array.
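A sketch of this idea with a Fenwick tree as the OSDS. It assumes nodes expose key, left and right, and that the caller passes in the list of all keys for the compression step; those names are assumptions, not part of the original question.

    from bisect import bisect_right

    class Fenwick:
        """Counts inserted keys by compressed (1-based) index."""
        def __init__(self, n):
            self.n = n
            self.t = [0] * (n + 1)
        def add(self, i):
            while i <= self.n:
                self.t[i] += 1
                i += i & -i
        def prefix(self, i):                  # number of inserted keys with index <= i
            s = 0
            while i > 0:
                s += self.t[i]
                i -= i & -i
            return s

    def annotate_bigger(root, all_keys):
        order = sorted(set(all_keys))         # coordinate compression
        def rank(key):
            return bisect_right(order, key)   # 1-based rank of key
        fen = Fenwick(len(order))
        total = 0

        def larger_than(key):                 # keys inserted so far that are > key
            return total - fen.prefix(rank(key))

        def dfs(v):
            nonlocal total
            v.bigger = -larger_than(v.key)    # snapshot before v's subtree is inserted
            fen.add(rank(v.key))
            total += 1
            for child in (v.left, v.right):
                if child is not None:
                    dfs(child)
            v.bigger += larger_than(v.key)    # keys inserted in between = v's subtree

        if root is not None:
            dfs(root)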
Basic idea:
Using the bottom-up approach, each node gets an ordered list of the values in the subtree of each of its children, counts how many of them are bigger than its own key, and then passes the combined ordered list (including its own value) upwards.
Details:
Leaves:
Leaves obviously have v.bigger = 0. The node above two leaves merges their values into a two-item list, updates itself and adds its own value to the list.
All other nodes:
Get both lists from the children and merge them in an ordered way. Since they are already sorted, this is O(number of nodes in the subtree). During the merge you can also count how many keys satisfy the condition and obtain the value of v.bigger for the node.
Why is this O(n log n)?
Every node in the tree does work proportional to the number of nodes in its subtree. This means the root processes all the nodes in the tree, the children of the root together process the number of nodes in the tree (yes, yes, -1 for the root), and so on: all nodes at the same depth together process at most the number of nodes at or below that depth. This gives us that the total work is bounded by (number of nodes) * (height of the tree), which is O(n log n) when the tree is balanced.
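For illustration, a compact version of this merge-and-count pass (the helper name is mine; heapq.merge is used only because the two child lists are already sorted):

    from bisect import bisect_right
    from heapq import merge

    def annotate(v):
        """Returns the sorted list of keys in v's subtree and sets v.bigger."""
        if v is None:
            return []
        left = annotate(v.left)
        right = annotate(v.right)
        merged = list(merge(left, right))                      # linear in the subtree size
        v.bigger = len(merged) - bisect_right(merged, v.key)   # keys below v that are > v.key
        merged.insert(bisect_right(merged, v.key), v.key)      # pass v's own key upwards too
        return merged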
What if for each node we keep a separate binary search tree (BST) consisting of the nodes of the subtree rooted at that node?
For a node v at level k, merging the two subtrees v.left and v.right, which both have O(n/2^(k+1)) elements, takes O(n/2^k). After forming the BST for this node, we can find v.bigger in O(n/2^(k+1)) time by just counting the elements in the (traditional) right subtree of the BST. Summing up, we have O(3*n/2^(k+1)) operations for a single node at level k. There are 2^k nodes at level k, therefore we have O(2^k * 3*n/2^(k+1)) operations at level k, which simplifies to O(n) (dropping the 3/2 constant). There are log(n) levels, hence we have O(n*log(n)) operations in total.
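Written out level by level (assuming a balanced tree, so there are about log n levels), the sum behind this argument is:

    \sum_{k=0}^{\log_2 n} 2^k \cdot O\!\left(\frac{n}{2^k}\right) \;=\; \sum_{k=0}^{\log_2 n} O(n) \;=\; O(n \log n)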

Binary prefix code in Huffman algorithm

In the Huffman coding algorithm, there's a lemma that says:
The binary tree corresponding to an optimal binary prefix code is full
But I can't figure out why. How can you prove this lemma?
Any binary prefix code can be represented as a binary tree. Each codeword is given by the path from the root to a leaf, with a left edge representing a 0 in the prefix and a right one representing a 1 (or vice versa).
Keep in mind that for each symbol there is one leaf node.
To prove that an optimal code is represented by a full binary tree, let's recall what a full binary tree is:
It is a tree where each node is either a leaf or has two children.
Let's assume that a certain code is optimal and is represented by a non-full tree.
Then there is a certain vertex u with only a single child v. The edge between u and v adds the bit x to the prefix code of the symbols (at the leaves) in the subtree rooted at v.
From this tree I can remove that edge and replace u with v, thus decreasing the length of the prefix code of every symbol in the subtree rooted at v by one. This reduces the number of bits in the representation of at least one symbol, since the subtree rooted at v contains at least one leaf (v itself, if v is a leaf).
This shows that the tree did not actually represent an optimal code, and my premise was wrong, thus proving the lemma.
From wikipedia,
A full binary tree (sometimes 2-tree or strictly binary tree) is a tree in which every node other than the leaves has two children.
The way in which the tree for a Huffman code is produced will definitely yield a full binary tree, because at each step of the algorithm we remove the two nodes of highest priority (lowest probability) from the queue and create a new internal node with these two nodes as children.
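For illustration, a minimal sketch of that construction step using Python's heapq (the symbol/frequency inputs are just an example). Since every internal node is created with exactly two children, the resulting tree is necessarily full.

    import heapq
    from itertools import count

    def huffman_tree(freqs):
        """freqs: dict mapping symbol -> frequency; returns the root of the code tree."""
        tiebreak = count()                      # keeps heap comparisons well-defined
        heap = [(w, next(tiebreak), sym) for sym, w in freqs.items()]
        heapq.heapify(heap)
        while len(heap) > 1:
            w1, _, a = heapq.heappop(heap)      # the two lowest-frequency nodes ...
            w2, _, b = heapq.heappop(heap)
            heapq.heappush(heap, (w1 + w2, next(tiebreak), (a, b)))   # ... become siblings
        return heap[0][2]

    # Every non-leaf in the result is a pair, i.e. it has exactly two children.
    print(huffman_tree({"a": 5, "b": 2, "c": 1, "d": 1}))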

Traversing an overflowing binary tree

Given a very large binary tree (e.g. with millions of nodes), how do you determine the number of nodes in the tree? In other words, given the root node of this tree, a function should return the number of nodes in the tree.
Or, let's say, how do you check if the binary tree is a BST if the tree has a very large number of nodes?
Walk all nodes and check whatever conditions/metric you need. There is nothing else you can do without additional knowledge about the tree.
You can enforce particular conditions at the time the tree is created (e.g. it must be balanced/sorted/whatever) or collect information about the tree at creation time (e.g. store and constantly update the number of children).
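For example, counting the nodes with an explicit stack, so that a tree with millions of nodes does not overflow the call stack (the left/right field names are assumptions):

    def count_nodes(root):
        count, stack = 0, ([root] if root else [])
        while stack:
            node = stack.pop()
            count += 1
            if node.left:
                stack.append(node.left)
            if node.right:
                stack.append(node.right)
        return count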
To check whether it's a VALID BST you have to visit every node with a depth-first, in-order traversal and ensure the keys come out in ascending order, i.e. each visited key is larger than the previous one.
If you want to evaluate how long that will take for a balanced BST, you could get a quick approximation of the size by counting the length of one leg: if that length is n, I believe the total size will be between 2^(n-1) and 2^n - 1 inclusive.
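A sketch of that in-order check with an explicit stack (again assuming key/left/right fields), so it also works on very deep trees:

    def is_bst(root):
        stack, node, prev = [], root, None
        while stack or node:
            while node:                       # go as far left as possible
                stack.append(node)
                node = node.left
            node = stack.pop()
            if prev is not None and node.key < prev:
                return False                  # keys out of order: not a valid BST
            prev = node.key
            node = node.right
        return True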

Construction of B+ trees

Suppose I am asked to construct a B+ tree with:
i) n = x.
ii) order = x.
iii) degree = x.
iv) p = x.
What should be the number of keys and pointers that each node can contain in each of the above cases?
In a B+ tree, the order denotes the maximum number of child pointers for each internal node, i.e. if the order of a B+ tree is m, then each internal node can have at most m children (and consequently at most m-1 keys) and at least CEIL(m/2) child pointers (except the root).
For the degree of a B+ tree, from this I got the information that if d is the degree of a B-tree, then each node can contain up to 2d items (keys). Now, both B-trees and B+ trees are multiway trees, and hence I suppose the definition of degree will not change. Check the $LINK given as a comment as well, which indicates the same fact.
For n, as JustinDanielson mentioned, it might be the total number of keys stored in a node, in which case the number of child pointers would be n+1 (= x+1 for your question).
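Putting the "order = m" reading into a tiny helper for the internal-node bounds (this is just an illustration of the numbers above, not a B+ tree implementation):

    import math

    def bplus_internal_bounds(m):
        max_children = m
        min_children = math.ceil(m / 2)        # except for the root
        return {
            "max keys": max_children - 1,
            "min keys": min_children - 1,
            "max child pointers": max_children,
            "min child pointers": min_children,
        }

    print(bplus_internal_bounds(5))   # order 5: 2..4 keys, 3..5 child pointers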
