length of binary tree

What do we mean by the length of a binary tree - the number of nodes, or the height of the tree?
Thank you

It is not a term I have seen used to describe the properties of a binary tree. I would guess someone using it would be referring to the depth.

I would personally think of 'length' as the height (depth), not the size (# of nodes) of the tree, but this is quite a contextual question.

Typically, 'length' refers to the number of items in the underlying data structure.
The height of the tree would be its 'depth'.

I am going to argue that n, the number of nodes, is the "best" answer.
Almost any recursively consistent measure could be argued as a potential answer, e.g. the height. However, the size of the tree, n, is the largest numerical answer: the height is log n for a balanced tree (at most n in the degenerate case), and the other candidates will all be the same or smaller numbers. So I conclude that the node count "should" be the length of a tree; it carries the most bits of information of the arguable possibilities.

Related

size for sub-trees after split in AVL tree

So I am trying to implement a split method in an AVL tree (given a node with key X, split the tree), and since I have a size field only on the AVL tree class, I can't find a way to get the size of each sub-tree after the split. I was thinking about giving each node a size field, but that solution is too complicated for now because I would have to edit much of the code I have already written.
I would be glad for a solution (if one exists under those conditions) that finds the size of each sub-tree after the split without time complexity above O(log n). Thank you!
It is not possible to determine the size of each side of the split in sub-linear time without attaching additional data to the node.
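For anyone who does decide to augment the nodes, the bookkeeping is smaller than it sounds. Here is a minimal sketch (hypothetical names, assuming a standard pointer-based AVL node; not the asker's code): every operation that changes a node's children already visits that node, so recomputing size alongside height stays within the same O(log n) work, and after a split each subtree's size is just size(root) in O(1).

```java
// Sketch: AVL node augmented with a subtree-size field.
// size is recomputed bottom-up wherever height already is, so the
// usual O(log n) bounds are unchanged.
final class AvlNode {
    int key;
    int height = 1;
    int size = 1;                    // number of nodes in this subtree
    AvlNode left, right;

    AvlNode(int key) { this.key = key; }

    static int size(AvlNode n)   { return n == null ? 0 : n.size; }
    static int height(AvlNode n) { return n == null ? 0 : n.height; }

    /** Recompute cached fields from children; call wherever height is updated. */
    static AvlNode update(AvlNode n) {
        n.size   = 1 + size(n.left) + size(n.right);
        n.height = 1 + Math.max(height(n.left), height(n.right));
        return n;
    }

    static AvlNode rotateRight(AvlNode y) {   // rotateLeft is symmetric
        AvlNode x = y.left;
        y.left = x.right;
        x.right = y;
        update(y);                            // lower node first,
        return update(x);                     // then its new parent
    }
}
```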

Extending trie to higher number of leaves

I have to make a dictionary using tries. The number of letters in the alphabet will increase from 26 to 120, and hence the number of leaf nodes will increase exponentially. What optimisations can I use so that my lookup, insertion and deletion time doesn't increase exponentially?
EDIT
Making the question clearer, sorry for the lack of details
I am using a multiway trie, like a radix tree, and making some modifications to it. My question is: if I know that the alphabet size will increase (for sure) from 26 to 120, which will increase the depth of the tree, is it possible to reduce that increase in depth by making the key longer than 64 bits (a register can hold at most 64 bits)?
It's often better to use binary tries with path compression (Patricia tries) based on the binary representation of your keys. That way you get the benefits of the smallest possible alphabet, and you still only have 2 nodes (one leaf and one internal) per key.
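To make that concrete, here is a minimal crit-bit (Patricia-style) sketch over the bytes of the key (my own illustration with hypothetical names, not a production implementation). Each stored key contributes exactly one leaf and one internal node, and a lookup does one bit test per internal node regardless of whether the alphabet has 26 or 120 symbols:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Minimal crit-bit (Patricia) trie sketch over byte-string keys. */
final class CritBitTrie {
    private static abstract class Node {}
    private static final class Leaf extends Node { final byte[] key; Leaf(byte[] k) { key = k; } }
    private static final class Inner extends Node {
        final int critBit;            // index of the first bit on which the subtrees differ
        Node left, right;             // left = bit 0, right = bit 1
        Inner(int b, Node l, Node r) { critBit = b; left = l; right = r; }
    }
    private Node root;

    private static int bitAt(byte[] key, int bit) {
        int byteIdx = bit >>> 3;
        if (byteIdx >= key.length) return 0;          // treat missing bytes as 0
        return (key[byteIdx] >>> (7 - (bit & 7))) & 1;
    }

    public boolean contains(String s) {
        byte[] key = s.getBytes(StandardCharsets.UTF_8);
        Node n = root;
        if (n == null) return false;
        while (n instanceof Inner) {
            Inner in = (Inner) n;
            n = bitAt(key, in.critBit) == 0 ? in.left : in.right;
        }
        return Arrays.equals(((Leaf) n).key, key);
    }

    public void insert(String s) {
        byte[] key = s.getBytes(StandardCharsets.UTF_8);
        if (root == null) { root = new Leaf(key); return; }
        // 1. Walk to the nearest leaf to find the first differing bit.
        Node n = root;
        while (n instanceof Inner) {
            Inner in = (Inner) n;
            n = bitAt(key, in.critBit) == 0 ? in.left : in.right;
        }
        int diff = firstDifferingBit(key, ((Leaf) n).key);
        if (diff < 0) return;                         // key already present
        // 2. Re-descend and splice in a new inner node at the right depth
        //    (crit bits strictly increase along any path).
        Node cur = root;
        Inner parent = null;
        while (cur instanceof Inner && ((Inner) cur).critBit < diff) {
            parent = (Inner) cur;
            cur = bitAt(key, parent.critBit) == 0 ? parent.left : parent.right;
        }
        Leaf leaf = new Leaf(key);
        Inner split = bitAt(key, diff) == 0
            ? new Inner(diff, leaf, cur) : new Inner(diff, cur, leaf);
        if (parent == null) root = split;
        else if (bitAt(key, parent.critBit) == 0) parent.left = split;
        else parent.right = split;
    }

    private static int firstDifferingBit(byte[] a, byte[] b) {
        int max = Math.max(a.length, b.length) * 8;
        for (int i = 0; i < max; i++) if (bitAt(a, i) != bitAt(b, i)) return i;
        return -1;                                    // keys are identical
    }
}
```

A 120-symbol alphabet still fits in 7 bits per letter, so the depth stays proportional to the key length in bits, and path compression keeps the node count at two per stored word rather than one node per character.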
There may be other optimizations, but your lookup, insertion and deletion time will not increase exponentially either way. Increasing your alphabet only means that each trie node will be bigger. The path to each word will still have the same length, which is equal to the number of letters in the word.

Does Huffman Encoding necessarily result in a balanced binary tree?

Huffman encoding uses the probability of occurrence of each value to construct a tree in which the values are the leaves. The path length from the root to a leaf is shortest for the most frequently occurring values.
Is the tree constructed (ignoring the assignment of 0s and 1s to left and right) always balanced? (Balanced: the depths of the left and right subtrees of every node differ by at most 1.)
Support with mathematical proof would be much appreciated :)
No. Consider the frequencies 1, 1, 2, 4. Huffman first merges the two 1s into a node of weight 2, then merges that with the 2 to get weight 4, then merges that with the 4: the result is a completely skewed tree with leaf depths 3, 3, 2, 1.
For a more mathematical answer, you could say that if all the frequencies are distinct powers of two, the Huffman tree will be maximally unbalanced: at every step the accumulated subtree and the next power of two are the two smallest weights, so each merge extends a single spine.
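If you want to check such cases mechanically, here is a small sketch (my own illustration, not from the question) that builds the Huffman tree with a priority queue and prints each leaf's depth; for the frequencies 1, 1, 2, 4 it reports depths 3, 3, 2, 1, i.e. a skewed tree:

```java
import java.util.PriorityQueue;

/** Sketch: build a Huffman tree and report leaf depths (= code lengths). */
final class HuffmanDepths {
    private static final class Node implements Comparable<Node> {
        final long weight;
        final Node left, right;
        final int symbol;                              // -1 for internal nodes
        Node(long w, int s) { weight = w; symbol = s; left = right = null; }
        Node(Node l, Node r) { weight = l.weight + r.weight; left = l; right = r; symbol = -1; }
        public int compareTo(Node o) { return Long.compare(weight, o.weight); }
    }

    private static void printDepths(Node n, int depth) {
        if (n.symbol >= 0) { System.out.println("symbol " + n.symbol + ": depth " + depth); return; }
        printDepths(n.left, depth + 1);
        printDepths(n.right, depth + 1);
    }

    public static void main(String[] args) {
        long[] freqs = {1, 1, 2, 4};                   // the frequencies from the answer
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (int i = 0; i < freqs.length; i++) pq.add(new Node(freqs[i], i));
        while (pq.size() > 1) pq.add(new Node(pq.poll(), pq.poll())); // merge two lightest
        printDepths(pq.poll(), 0);                     // prints depths 3, 3, 2, 1
    }
}
```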
No. Even the basic examples shown, for example, on the Wikipedia page, have encodings with three different lengths in a single tree.

Is there only one correct answer in heapsort?

If starting with an empty heap representing a priority queue, where numbers have to be inserted one at a time and the heap is then drawn as a binary tree, is there only one correct answer? I have tried different Java heap generators etc. and they all give me different answers.
If you mean the sorted output, then it's obviously unique.
If you mean the binary tree representation, then yes, it's unique too: a heap is a complete tree, a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible.
After any sequence of heap operations, be they insert, delete-max, build-heap, sift-up or sift-down, the heap is left in a predictable state; we can know what its binary tree representation will look like.
Can you give more details about how you got different answers?
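To see the determinism concretely, here is a minimal sketch of textbook sift-up insertion (my own illustration, not any particular library's code). For a fixed insertion order, the level-order array, and therefore the binary tree it encodes, always comes out the same:

```java
import java.util.Arrays;

/** Sketch: textbook max-heap insertion with sift-up. The resulting
    level-order array is fully determined by the insertion order. */
final class MaxHeapDemo {
    private int[] a = new int[16];
    private int size = 0;

    void insert(int x) {
        if (size == a.length) a = Arrays.copyOf(a, size * 2);
        a[size] = x;
        int i = size++;
        while (i > 0 && a[(i - 1) / 2] < a[i]) {       // sift up while parent is smaller
            int p = (i - 1) / 2;
            int t = a[p]; a[p] = a[i]; a[i] = t;
            i = p;
        }
    }

    public static void main(String[] args) {
        MaxHeapDemo h = new MaxHeapDemo();
        for (int x : new int[]{5, 3, 8, 1, 9}) h.insert(x);
        // Always the same for this insertion order:
        System.out.println(Arrays.toString(Arrays.copyOf(h.a, h.size))); // [9, 8, 5, 1, 3]
    }
}
```

One hedged guess about the differing outputs: a tool that builds the heap with bottom-up heapify (Floyd's build-heap) over the whole array can legitimately produce a different, but still valid, heap than repeated insertion does, so comparing outputs across implementations only makes sense if they perform the same operations in the same order.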

What invariant do RRB-trees maintain?

Relaxed Radix Balanced Trees (RRB-trees) are a generalization of immutable vectors (used in Clojure and Scala) that have 'effectively constant' indexing and update times. RRB-trees maintain efficient indexing and update but also allow efficient concatenation (log n).
The authors present the data structure in a way that I find hard to follow. I am not quite sure what the invariant is that each node maintains.
In section 2.5, they describe their algorithm. I think they are ensuring that indexing into a node will only ever require e extra steps of linear search after radix searching. I do not understand how they derived their formula for the extra steps, and I am not sure what each of the variables means (in particular "a total of p sub-tree branches").
Also, how does the RRB-tree concatenation algorithm work?
They do describe an invariant in section 2.4: "However, as mentioned earlier B-Trees nodes do not facilitate radix searching. Instead we chose the initial invariant of allowing the node sizes to range between m and m - 1. This defines a family of balanced trees starting with well known 2-3 trees, 3-4 trees and (for m=32) 31-32 trees. This invariant ensures balancing and achieves radix branch search in the majority of cases. Occasionally a few step linear search is needed after the radix search to find the correct branch. The extra steps required increase at the higher levels."
Looking at their formula, it looks like they have worked out the maximum and minimum possible number of values stored in a subtree. The difference between the two is the maximum possible difference between the maximum and minimum number of values underneath a point. If you divide this by the number of values underneath a slot, you have the maximum number of slots you could be off by when you work out which slot to look at to see if it contains the index you are searching for.
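As a rough worked instance of that reading (my own back-of-the-envelope interpretation, not a formula quoted from the paper): if every node has between m - 1 and m branches, a subtree of height h holds between (m-1)^h and m^h values, and each slot at that level covers at least (m-1)^(h-1) values, giving

$$e \;\le\; \frac{m^{h} - (m-1)^{h}}{(m-1)^{h-1}}$$

For m = 32 and h = 2 this is (1024 - 961) / 31, about 2 extra steps, and the bound grows with h, which matches "The extra steps required increase at the higher levels."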
@mcdowella is correct; that is what they say about relaxed nodes. But if you're splitting and joining nodes, a range from m to m - 1 means you will sometimes have to adjust up to m - 1 (m - 2?) nodes in order to add or remove a single element from a node. This seems horribly inefficient. I think they meant between m and 2m - 1, because this allows a node to be split into 2 when it gets too big, or 2 nodes to be joined into one when they get too small, without ever needing to change a third node (e.g. with m = 32, a node that grows to 64 splits into two nodes of 32, and two adjacent nodes whose combined size drops to 63 or fewer can be joined into one). So it's a typo that the "2" is missing from "2m" in the paper. Jean Niklas L'orange's master's thesis backs me up on this.
Furthermore, all strict nodes have the same length, which must be a power of 2. The reason for this is an optimization in Rich Hickey's Clojure PersistentVector. Well, I think the important thing is to pack all strict nodes left (more on this later) so you don't have to guess which branch of the tree to descend, but being able to bit-shift and bit-mask instead of divide is a nice bonus. I didn't time the get() operation on a relaxed Scala Vector, but the relaxed Paguro vector is about 10x slower than the strict one. So it makes every effort to be as strict as possible, even producing 2 strict levels if you repeatedly insert at 0.
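Here is what that strict descent looks like in code (a minimal sketch in the style of the well-known PersistentVector indexing trick; the names are mine, not from any of these libraries). With every node exactly 32 wide, the child slot at each level is just a 5-bit field of the index:

```java
// Sketch: strict radix indexing into a 32-way tree, PersistentVector-style.
// Because every strict node is exactly 2^5 = 32 wide, the child slot at
// each level is a bit-field of the index: no size tables, no searching.
final class StrictRadix {
    static final int BITS = 5;
    static final int WIDTH = 1 << BITS;        // 32
    static final int MASK = WIDTH - 1;         // 0b11111

    /** shift = BITS * (levels below the root); leaves are Object[] of values. */
    static Object get(Object[] root, int index, int shift) {
        Object[] node = root;
        for (int s = shift; s > 0; s -= BITS) {
            node = (Object[]) node[(index >>> s) & MASK];   // descend one level per 5 bits
        }
        return node[index & MASK];
    }
}
```

A relaxed node can't be indexed this way: it carries an array of cumulative child sizes and needs a short linear (or binary) search to pick the branch, which is exactly the cost the invariant tries to keep small, and why the strict path is so much faster.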
Their tree also has a uniform height: all leaf nodes are the same distance from the root. I think it would still work if relaxed subtrees only had to be within, say, one level of one another, though I'm not sure what that would buy you.
Relaxed nodes can have strict children, but not vice-versa.
Strict nodes must be filled from the left (low index) without gaps. Any non-full strict nodes must be on the right-hand (high-index) edge of the tree. All strict leaf nodes can always be kept full if you do appends in a focus or tail (more on that below).
You can see most of the invariants by searching for the debugValidate() methods in the Paguro implementation. That's not their paper, but it's mostly based on it. Actually, the "display" variables in the Scala implementation aren't mentioned in the paper either. If you're going to study this stuff, you probably want to start by taking a good look at the Clojure PersistentVector, because the RRB tree has one inside it. The two differences between that and the RRB tree are: 1. the RRB tree allows "relaxed" nodes, and 2. the RRB tree may have a "focus" instead of a "tail". Both focus and tail are small buffers (maybe the same size as a strict leaf node); the difference is that the focus will probably be localized to whatever area of the vector was last inserted/appended to, while the tail is always at the end (PersistentVector can only be appended to, never inserted into). These 2 differences are what allow O(log n) arbitrary inserts and removals, plus O(log n) split() and join() operations.