What are nearly complete binary trees?

I have read many definitions of a "heap" online, and I have also read the definition in CLRS. Most of the definitions online seem to say that heaps are complete binary trees; however, CLRS starts the heap chapter with the following sentence:
The (binary) heap data structure is an array object that we can view as a nearly complete binary tree...
I'm not sure why, but it really bothers me that CLRS calls heaps "nearly complete," whereas almost every other definition of "heap" I've read calls heaps "complete."
This leads me to the following question: Is it possible to have a heap that isn't a complete binary tree?

You are absolutely right to be bothered by the expression "nearly complete". A heap is a complete binary tree, according to the most common terminology:
complete binary tree: all levels except the last are fully occupied, and the leaves in the last level appear at the left side of that level.
perfect binary tree: a complete binary tree where also the last level is completely occupied.
full binary tree: a binary tree where none of the nodes has just one child. Sometimes this term is used to denote a perfect binary tree, adding to the confusion.
A perfect binary tree is also a complete and a full binary tree, but a complete binary tree may or may not be a full binary tree.
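To make the three definitions concrete, here is a minimal Python sketch that checks each property on a simple linked node type. The `Node` class and the function names are hypothetical, chosen only for illustration. Completeness is checked with a level-order scan (once a missing child has been seen, no further nodes may appear), and "perfect" is tested as "has exactly 2**h - 1 nodes for height h", which is equivalent to every level being full.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def is_full(node):
    """Full: no node has exactly one child."""
    if node is None:
        return True
    if (node.left is None) != (node.right is None):
        return False
    return is_full(node.left) and is_full(node.right)


def is_complete(root):
    """Complete: in level order, no node may appear after a missing child."""
    seen_gap = False
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node is None:
            seen_gap = True
        elif seen_gap:
            return False
        else:
            queue.append(node.left)
            queue.append(node.right)
    return True


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


def size(node):
    return 0 if node is None else 1 + size(node.left) + size(node.right)


def is_perfect(root):
    """Perfect: every level is full, i.e. size == 2**height - 1."""
    return size(root) == 2 ** height(root) - 1


four = Node(Node(Node()), Node())                     # complete, not full, not perfect
seven = Node(Node(Node(), Node()), Node(Node(), Node()))  # perfect, hence complete and full
full_only = Node(Node(), Node(Node(), Node()))        # full, but not complete
print(is_complete(four), is_full(four), is_perfect(four))
print(is_complete(full_only), is_full(full_only))
```

The last example is full but not complete, which is why "full" and "complete" should not be used interchangeably.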
But the Wikipedia article on Binary tree warns:
Some authors use the term complete to refer instead to a perfect binary tree [...] in which case they call this type of tree (with a possibly not filled last level) an almost complete binary tree or nearly complete binary tree.
So apparently the authors of the text you refer to fall into that category.

What exactly does "complete" mean? People have different opinions. In the context of heaps, a complete binary tree should be one whose last level has the maximum number of nodes.
Any heap whose last level does not have the maximum number of leaves is then not complete, i.e., it is nearly complete.
For example, a heap with 7 elements would be a complete binary tree. But a heap with 4, 5, or 6 elements would not have its last level completely filled, i.e., it would be nearly complete.
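A heap that is a nearly complete binary tree of depth three (taking the depth of the root node to be 1) therefore has levels 1 and 2 completely filled, while level 3 has only some of its four positions occupied, filled from the left.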
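Since a heap with n elements always fills its levels from the top, and each level from the left, its shape depends only on n. The Python sketch below (hypothetical helper names) counts the nodes on each level and reports whether the last level is full, i.e. whether n + 1 is a power of two, which is the case this answer calls complete.

```python
def level_counts(n):
    """Nodes per level of an n-element binary heap, root level first."""
    counts = []
    capacity = 1          # a level can hold 1, 2, 4, 8, ... nodes
    remaining = n
    while remaining > 0:
        counts.append(min(capacity, remaining))
        remaining -= capacity
        capacity *= 2
    return counts


def last_level_full(n):
    """True exactly when n == 2**d - 1 for some depth d."""
    return n > 0 and ((n + 1) & n) == 0


for n in range(4, 8):
    print(n, level_counts(n), last_level_full(n))
# 4 [1, 2, 1] False
# 5 [1, 2, 2] False
# 6 [1, 2, 3] False
# 7 [1, 2, 4] True
```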

Related

Is there a balanced BST in which each node maintains its subtree size?

Is there a balanced BST structure that also keeps track of subtree size in each node?
In Java, TreeMap is a red-black tree, but doesn't provide subtree size in each node.
Previously, I wrote a BST that kept track of the subtree size of each node, but it was not balanced.
The questions are:
Is it possible to implement such a tree while keeping O(lg(n)) efficiency for the basic operations?
If yes, are there any third-party libraries that provide such an implementation?
A Java implementation would be great, but other languages (e.g., C, Go) would also be helpful.
BTW:
The subtree size should be tracked in each node, so that the size can be obtained without traversing the subtree.
Possible application:
Keeping track of the rank of items whose values (on which the rank depends) might change on the fly.
The Weight Balanced Tree (also called the Adams Tree, or Bounded Balance tree) keeps the subtree size in each node.
This also makes it possible to find the Nth element, from the start or end, in log(n) time.
My implementation in Nim is on GitHub. It has these properties:
Generic (parameterized) key,value map
Insert (add), lookup (get), and delete (del) in O(log(N)) time
Key-ordered iterators (inorder and revorder)
Lookup by relative position from beginning or end (getNth) in O(log(N)) time
Get the position (rank) by key in O(log(N)) time
Efficient set operations using tree keys
Map extensions to set operations with optional value merge control for duplicates
There are also implementations in Scheme and Haskell available.
That's called an "order statistic tree": https://en.wikipedia.org/wiki/Order_statistic_tree
It's pretty easy to add the size to any kind of balanced binary tree (red-black, AVL, B-tree, etc.), or you can use a balancing algorithm that works with the size directly, like weight-balanced trees (@DougCurrie's answer) or (better) size-balanced trees: https://cs.wmich.edu/gupta/teaching/cs4310/lectureNotes_cs4310/Size%20Balanced%20Tree%20-%20PEGWiki%20sourceMayNotBeFullyAuthentic%20but%20description%20ok.pdf
Unfortunately, I don't think there are any standard-library implementations, but you can find open source if you look for it. You may want to roll your own.
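As an illustration of "adding the size to a balanced tree", here is a minimal order statistic tree in Python. For brevity it swaps in a treap (randomized priorities keep it balanced in expectation) rather than the weight-balanced or size-balanced trees mentioned above; all names are hypothetical and deletion is omitted. The key point is that every rotation recomputes `size`, so `select` (Nth element) and `rank` run in O(height) without traversing subtrees.

```python
import random
from typing import Optional


class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()   # heap-ordered priority: keeps the treap balanced in expectation
        self.size = 1                     # number of nodes in this subtree
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def size(node):
    return node.size if node else 0


def update(node):
    node.size = 1 + size(node.left) + size(node.right)


def rotate_right(node):
    child = node.left
    node.left = child.right
    child.right = node
    update(node)      # node is now below child, so fix it first
    update(child)
    return child


def rotate_left(node):
    child = node.right
    node.right = child.left
    child.left = node
    update(node)
    update(child)
    return child


def insert(node, key):
    """Insert key (duplicates go right) and return the new subtree root."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    else:
        node.right = insert(node.right, key)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    update(node)
    return node


def select(node, k):
    """Return the k-th smallest key (0-based), steering by left-subtree sizes."""
    while node:
        left_size = size(node.left)
        if k < left_size:
            node = node.left
        elif k == left_size:
            return node.key
        else:
            k -= left_size + 1
            node = node.right
    raise IndexError("k out of range")


def rank(node, key):
    """Number of keys strictly less than key."""
    r = 0
    while node:
        if key <= node.key:
            node = node.left
        else:
            r += size(node.left) + 1
            node = node.right
    return r


root = None
for key in [50, 20, 70, 10, 30, 60, 80]:
    root = insert(root, key)
print(select(root, 3))  # 50 (the 4th smallest key)
print(rank(root, 60))   # 4 (10, 20, 30 and 50 are smaller)
```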

When can we use Simple Binary Tree over Binary Search Tree?

Lots of tutorials focus on implementing binary search trees, which make search operations easy. Are there applications or circumstances where implementing a simple binary tree is better than a BST? Or is it just taught as an introductory concept for trees?
You use a binary tree (rather than a binary search tree) when you have a structure that requires a parent and up to two children. For example, consider a tree to represent mathematical expressions. The expression (a+b)*c becomes:
      *
     / \
    +   c
   / \
  a   b
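A minimal Python sketch of that expression tree: leaves hold variable names, internal nodes hold an operator and exactly two children, and there is no search ordering at all. The class names and the `OPS` table are illustrative assumptions; evaluation is a post-order walk (both operands first, then the operator).

```python
import operator
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    name: str


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, BinOp]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate(expr, env):
    """Post-order: evaluate both children, then apply the operator at this node."""
    if isinstance(expr, Leaf):
        return env[expr.name]
    return OPS[expr.op](evaluate(expr.left, env), evaluate(expr.right, env))


def to_infix(expr):
    """In-order walk that rebuilds a fully parenthesized expression."""
    if isinstance(expr, Leaf):
        return expr.name
    return f"({to_infix(expr.left)}{expr.op}{to_infix(expr.right)})"


tree = BinOp("*", BinOp("+", Leaf("a"), Leaf("b")), Leaf("c"))  # (a+b)*c
print(to_infix(tree))                             # ((a+b)*c)
print(evaluate(tree, {"a": 2, "b": 3, "c": 4}))   # 20
```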
The pairing heap is a data structure that is logically a general tree (i.e., no restriction on the number of children a node can have), but it is often implemented using a left-child right-sibling (LCRS) binary tree. The LCRS binary tree is often more efficient and easier to work with than a general tree.
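Here is a minimal Python sketch of a pairing heap stored in LCRS form: each node keeps only two pointers, `child` (its first child, the "left" pointer) and `sibling` (its next sibling, the "right" pointer), yet logically it can have any number of children. The names are hypothetical, it uses the standard two-pass merge on pop, and decrease-key is omitted.

```python
from typing import Optional


class PNode:
    """Pairing-heap node in left-child right-sibling (LCRS) form."""
    def __init__(self, key):
        self.key = key
        self.child: Optional["PNode"] = None    # "left" pointer: first child
        self.sibling: Optional["PNode"] = None  # "right" pointer: next sibling


def meld(a, b):
    """Link two heap-ordered roots: the larger root becomes the first child of the smaller."""
    if a is None:
        return b
    if b is None:
        return a
    if b.key < a.key:
        a, b = b, a
    b.sibling = a.child
    a.child = b
    return a


def merge_pairs(first):
    """Two-pass merge of a sibling list: pair up left to right, then meld right to left."""
    pairs = []
    node = first
    while node is not None:
        a, b = node, node.sibling
        node = b.sibling if b is not None else None
        a.sibling = None
        if b is not None:
            b.sibling = None
        pairs.append(meld(a, b))
    result = None
    for tree in reversed(pairs):
        result = meld(tree, result)
    return result


class PairingHeap:
    def __init__(self):
        self.root: Optional[PNode] = None

    def push(self, key):
        self.root = meld(self.root, PNode(key))

    def pop(self):
        if self.root is None:
            raise IndexError("pop from empty heap")
        top = self.root
        self.root = merge_pairs(top.child)  # the root's children form a sibling-linked list
        return top.key


h = PairingHeap()
for x in [7, 3, 9, 1, 4, 8, 2, 6, 5, 0]:
    h.push(x)
print([h.pop() for _ in range(10)])  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```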
The binary heap also is a binary tree, but not a binary search tree.
The old guessing game, where the player answers a series of yes/no questions in order to arrive at an answer, is another example of a binary tree. In the tree below, the left child is the "No" answer, and the right child is the "Yes" answer:
              Is it an animal?
             /                \
     Is it a plant?     Is it a mammal?
                          /          \
                    A reptile?      A dog?
You can imagine an arbitrarily deep tree with questions at each level.
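The guessing-game tree above translates directly into Python. This is a minimal sketch with a hypothetical `Question` type: `no` is the left child and `yes` is the right child, and `follow` walks down from the root applying a sequence of answers (`True` for yes). There is no ordering between keys here, which is exactly why a plain binary tree rather than a BST fits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    text: str
    no: Optional["Question"] = None   # left child
    yes: Optional["Question"] = None  # right child


def follow(root, answers):
    """Apply yes/no answers from the root and return the question reached."""
    node = root
    for yes in answers:
        node = node.yes if yes else node.no
    return node.text


game = Question(
    "Is it an animal?",
    no=Question("Is it a plant?"),
    yes=Question("Is it a mammal?", no=Question("A reptile?"), yes=Question("A dog?")),
)
print(follow(game, [True, True]))   # A dog?
print(follow(game, [True, False]))  # A reptile?
```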
Those are just a few examples. I've found binary trees useful in lots of different situations.

heap and tree data structure implementation difference

So I see that trees are usually implemented as linked structures, where each node is dynamically allocated and contains pointers to its two children.
But a heap is almost always implemented (or so textbooks recommend) using an array. Why is that? Is there some underlying assumption about the uses of these two data structures? For example, if you are implementing a priority queue using a min-heap, then the number of nodes in the queue is constant, so it can be implemented using an array of fixed size. But when you are talking or teaching about heaps in general, why recommend implementing them using an array? Or, to flip the question a bit, why not recommend learning about trees with an implementation using arrays?
(I assume by heap you mean binary heap; other heaps are almost always linked nodes.)
A binary heap is always a complete tree, and no operation on it moves whole subtrees around or otherwise alters the topology of the tree in any nontrivial way. This is not an assumption: the first is part of the definition of a heap, and the second is immediately obvious from the definition of the operations.
First, since the Ahnentafel layout (root at index 1, and the children of the node at index i at indices 2i and 2i+1) requires reserving space for every internal node (and all leaf nodes except the rightmost ones), an incomplete tree implemented this way would waste space for nodes that don't exist. Conversely, for a complete tree it's the most efficient layout possible, since all space is actually used for node data, and no space is needed for pointers.
Second, moving a subtree in the array would require copying all child elements to their new positions (since the left child's index is always twice the parent's index, the former changes when the latter changes, recursively down to the leaves). When nodes are linked via pointers, you only need to move a few pointers around, regardless of how large the subtrees below those pointers are. Moving subtrees is a core component of many tree algorithms, including all kinds of binary search trees, and it needs to be lightning fast for those algorithms to be efficient. Binary heap operations, however, never need to do this, so it's a non-issue.
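A minimal Python sketch of an array-backed binary min-heap in the Ahnentafel layout (1-based, slot 0 unused) shows both points: the tree stays complete because elements are only ever appended at or removed from the end of the array, and sift-up/sift-down only swap individual values along one root-to-leaf path, never whole subtrees. The class name is hypothetical; in practice Python's standard `heapq` module provides the same structure (0-based).

```python
class ArrayMinHeap:
    """Binary min-heap in an array: parent(i) = i // 2, children 2i and 2i + 1."""

    def __init__(self):
        self.a = [None]  # slot 0 is a placeholder so the root sits at index 1

    def __len__(self):
        return len(self.a) - 1

    def push(self, key):
        self.a.append(key)                 # new leaf at the end keeps the tree complete
        i = len(self.a) - 1
        while i > 1 and self.a[i // 2] > self.a[i]:  # sift up
            self.a[i // 2], self.a[i] = self.a[i], self.a[i // 2]
            i //= 2

    def pop(self):
        if len(self) == 0:
            raise IndexError("pop from empty heap")
        top = self.a[1]
        last = self.a.pop()                # removing the last leaf keeps the tree complete
        if len(self) > 0:
            self.a[1] = last
            i, n = 1, len(self)
            while True:                    # sift down
                smallest = i
                for c in (2 * i, 2 * i + 1):
                    if c <= n and self.a[c] < self.a[smallest]:
                        smallest = c
                if smallest == i:
                    break
                self.a[i], self.a[smallest] = self.a[smallest], self.a[i]
                i = smallest
        return top


h = ArrayMinHeap()
for x in [5, 1, 4, 1, 3, 9, 2, 6]:
    h.push(x)
print([h.pop() for _ in range(len(h))])  # [1, 1, 2, 3, 4, 5, 6, 9]
```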

Threaded Binary Search Trees Advantage

An explanation about Threaded Binary Search Trees (skip it if you know them):
We know that in a binary search tree with n nodes there are 2n left and right pointers, of which only n-1 point to children, so n+1 of them contain null. To make use of the memory those null pointers occupy, we change the binary tree as follows:
for every node z in the tree:
if left[z] = NULL, we put in left[z] the value of tree-predecessor(z) (i.e., a pointer to the node which contains the predecessor key),
if right[z] = NULL, we put in right[z] the value of tree-successor(z) (again, a pointer to the node which contains the successor key).
A tree like that is called a threaded binary search tree, and the new links are called threads.
And my question is:
What is the main advantage of threaded binary search trees (in comparison to "regular" binary search trees)?
A quick search on the web told me that it helps to implement in-order traversal iteratively, not recursively.
Is that the only difference? Is there another way we can use the threads?
Is that a meaningful advantage? And if so, why?
Recursive traversal costs O(n) time too, so...
Thank you very much.
Non-recursive in-order scan is a huge advantage. Imagine that somebody asks you to find the value "5" and the four values that follow it. That's difficult using recursion. But if you have a threaded tree, then it's easy: do a regular binary search tree lookup to find the value "5", and then follow the threaded links to get the next four values.
Similarly, what if you want the four values that precede a particular value? That's difficult with a recursive traversal, but trivial if you find the item and then walk the threaded links backwards.
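The "find 5, then walk the threads" idea can be sketched in Python. This is a minimal, hypothetical implementation of the doubly threaded variant described in the question: boolean `lthread`/`rthread` flags mark which pointers are threads (a thread is `None` at the two ends of the order), duplicates go right, and deletion is omitted. Only `find` descends the tree; the scans afterwards use threads alone, with no recursion and no stack.

```python
from typing import Optional


class TNode:
    """When lthread/rthread is True, left/right is a thread to the in-order
    predecessor/successor (None at the ends) instead of a child pointer."""
    def __init__(self, key):
        self.key = key
        self.left: Optional["TNode"] = None
        self.right: Optional["TNode"] = None
        self.lthread = True
        self.rthread = True


def insert(root, key):
    """Insert key and return the root; duplicates go to the right."""
    new = TNode(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.lthread:                        # no left child: new becomes it
                new.left, new.right = cur.left, cur  # inherit cur's predecessor; cur is the successor
                cur.left, cur.lthread = new, False
                return root
            cur = cur.left
        else:
            if cur.rthread:                        # no right child: new becomes it
                new.right, new.left = cur.right, cur  # inherit cur's successor; cur is the predecessor
                cur.right, cur.rthread = new, False
                return root
            cur = cur.right


def find(root, key):
    """Ordinary BST lookup; reaching a thread means the key is absent."""
    cur = root
    while cur is not None:
        if key == cur.key:
            return cur
        if key < cur.key:
            if cur.lthread:
                return None
            cur = cur.left
        else:
            if cur.rthread:
                return None
            cur = cur.right
    return None


def successor(node):
    if node.rthread:
        return node.right              # the thread points straight at the successor
    node = node.right
    while not node.lthread:            # leftmost node of the right subtree
        node = node.left
    return node


def predecessor(node):
    if node.lthread:
        return node.left
    node = node.left
    while not node.rthread:            # rightmost node of the left subtree
        node = node.right
    return node


def neighbours(root, key, k, forward=True):
    """Find key, then collect up to k following (or preceding) keys via threads."""
    node = find(root, key)
    step = successor if forward else predecessor
    out = []
    while node is not None and len(out) < k:
        node = step(node)
        if node is not None:
            out.append(node.key)
    return out


root = None
for key in [5, 3, 8, 1, 4, 7, 9, 2, 6]:
    root = insert(root, key)
print(neighbours(root, 5, 4))                 # [6, 7, 8, 9]
print(neighbours(root, 5, 4, forward=False))  # [4, 3, 2, 1]
```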
The main advantage of threaded binary search trees over regular ones is in traversal, which is more efficient in the former than in the latter.
Traversing a threaded tree means you don't need to implement it with recursion, a stack, or a queue. Each node has pointers that give its in-order successor and predecessor efficiently, whereas traversing a normal BST needs a stack (explicit, or the call stack when done recursively), which costs extra memory proportional to the height of the tree.

Binary Tree's usage

Can someone give me a real-life example (in programming, C#) of needing to use a binary tree, or even just an ordinary tree?
I understand the principle of a binary tree and how it works, but I'm trying to find some real-life examples of their usage.
Tony
In C#, Java, Python, C++ (using the STL) and other high-level languages, most of the time you will use one of the built-in/library-included types to store your data, at least the data you work on at the moment, so most of the time you won't be using a binary tree or another kind of tree explicitly.
That being said, some of these built-in types are implemented as trees of one kind or another behind the scenes, and in some situations you will have to implement one yourself.
Also, a related thing you HAVE to know is binary search. It is mostly done with binary trees (binary search trees :P), but the idea can be applied to a lot of problems, even without trees involved, so try to understand it well.
Edit: a classic real-life example:
Imagine that you want to find the phone number of a particular person in the phone book of a big city. All things being equal, you will open it roughly in the middle, look at the names on that page, and see whether your "target" comes before or after them, thus cutting the data in half. Then you repeat the operation in the half where you know your "target" is, again and again, until you find your "target". Since each time you look at half the data you had before, you need about log2(n) operations to reach your "target", where n is the total size of the data.
So in a phone book with 1 million entries, you find your target in about log2(1,000,000) ≈ 20 comparisons, instead of comparing one by one as in a linear search (1 million comparisons in the worst case).
Note that this only works on data that is already sorted.
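The phone-book procedure is ordinary binary search on a sorted list. In this minimal Python sketch, a list of one million integers stands in for the sorted phone book (an assumption for illustration), and the function also counts how many "pages" it opens, which never exceeds 20 for a million entries.

```python
def binary_search(items, target):
    """Return (index or -1, number of probes) for a sorted list `items`."""
    lo, hi = 0, len(items) - 1
    probes = 0
    while lo <= hi:
        mid = (lo + hi) // 2      # open the "book" in the middle of the remaining range
        probes += 1
        if items[mid] == target:
            return mid, probes
        if items[mid] < target:   # target must be in the right half
            lo = mid + 1
        else:                     # target must be in the left half
            hi = mid - 1
    return -1, probes


book = list(range(1_000_000))     # stand-in for 1 million sorted entries
print(binary_search(book, 765_432))
```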
Balanced binary trees, storing data maintained in sorted order, are used to achieve O(log(n)) lookup, delete, and insert times. "Balanced" just means there is a bounded difference between the depths of the shallowest and deepest leaves, counting empty left/right nodes as leaves. (Optimally, the depths of the left and right subtrees differ by at most one; some implementations relax this to make the algorithms simpler.)
You can use an array, rather than a tree, in sorted order with binary search to achieve O(log(n)) lookup time, but then the insert/delete times are O(n).
Some trees (notably B-trees for databases) use more than 2 branches per node, to widen the tree and reduce the maximum depth (which determines search times).
I can't think of a reason to use binary trees that are not maintained in sorted order (a point that has not been mentioned in most of the answers here), but maybe there's some application for this. Besides the sorted binary balanced tree, anything with hierarchy (as other answerers have mentioned, XML or directory structures) is a good application for trees, whether binary or not.
edit: re: unsorted binary trees: I just remembered that LISP and Scheme both make heavy use of unbalanced binary trees. The cons function takes two arguments (e.g. (define c (cons a b))) and returns a tree node whose branches are the two arguments. The car function takes such a tree node and returns the first argument given to cons. The cdr function is similar but returns the second argument to cons. Finally, nil represents a null object. These are the primitives used to make all data structures in LISP and Scheme. Lists are implemented as extremely unbalanced binary trees. The list containing the literal elements 'Alabama, 'Alaska, 'Arizona, and 'Arkansas can be constructed explicitly as
(cons 'Alabama (cons 'Alaska (cons 'Arizona (cons 'Arkansas nil))))
and can be traversed using car and cdr (where car is used to get the head of the list and cdr is used to get the sublist excluding the list head). This is how Scheme works, I think LISP is the same or very similar. More complicated data structures, like binary trees (which need 3 members per node: two to hold the left and right nodes, and a third to hold the node value) or trees containing more than two branches per node can be constructed using a list to implement each node.
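To mirror this in Python (the document's Scheme examples are not runnable here), the sketch below models a cons cell as a 2-tuple and nil as `None`; these representations are assumptions for illustration, not how any Lisp actually stores cells. It builds the states list from above, walks it with car/cdr, and then builds a binary tree node as the three-element list (value left right), as the answer describes.

```python
NIL = None  # stands in for nil / the empty list


def cons(a, b):
    """A cons cell: a node with exactly two branches (car, cdr)."""
    return (a, b)


def car(cell):
    return cell[0]


def cdr(cell):
    return cell[1]


def to_python_list(cell):
    """Walk a cons list: car gives the head, cdr gives the rest."""
    out = []
    while cell is not NIL:
        out.append(car(cell))
        cell = cdr(cell)
    return out


states = cons("Alabama", cons("Alaska", cons("Arizona", cons("Arkansas", NIL))))
print(to_python_list(states))  # ['Alabama', 'Alaska', 'Arizona', 'Arkansas']


def tree_node(value, left=NIL, right=NIL):
    """A binary tree node as the three-element list (value left right)."""
    return cons(value, cons(left, cons(right, NIL)))


def node_value(n):
    return car(n)


def node_left(n):
    return car(cdr(n))


def node_right(n):
    return car(cdr(cdr(n)))


def inorder(n):
    if n is NIL:
        return []
    return inorder(node_left(n)) + [node_value(n)] + inorder(node_right(n))


print(inorder(tree_node(2, tree_node(1), tree_node(3))))  # [1, 2, 3]
```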
How about the directory structure in Unix? For instance, the du (disk usage) command does a post-order traversal (children first, then the parent node) of a tree representing the directory structure in order to compute the disk space used by each directory.
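The post-order idea behind du can be sketched in Python on an in-memory directory tree. The nested dict below is made-up example data (files map to sizes in bytes, subdirectories map to nested dicts); real du works on the file system and reports disk blocks, so this only illustrates the traversal order: each directory's total is reported after all of its children.

```python
def disk_usage(tree, path="."):
    """Return (path, total) pairs in post-order: every child directory before its parent."""
    report = []

    def visit(node, path):
        total = 0
        for name, entry in node.items():
            if isinstance(entry, dict):            # subdirectory: recurse first
                total += visit(entry, f"{path}/{name}")
            else:                                  # file: add its size
                total += entry
        report.append((path, total))               # the parent is reported last
        return total

    visit(tree, path)
    return report


fs = {
    "notes.txt": 100,
    "src": {"a.c": 300, "b.c": 200},
    "docs": {"img": {"x.png": 1000}},
}
print(disk_usage(fs))
# [('./src', 500), ('./docs/img', 1000), ('./docs', 1000), ('.', 1600)]
```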
The following slides should help.
http://www.cse.unt.edu/~rada/CSCE3110/Lectures/Trees.ppt
cheers
In Java, trees are used to implement certain sorted data structures, such as the TreeSet:
http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeSet.html
They are used for data structures where you want the order to be based on some property of the elements, rather than on insertion order.
Here are some examples:
The in-memory representation of a parsed program or expression is a tree. In the case of expressions (excluding ternary operators) the tree will be binary.
The components of a GUI are organized as a tree.
Any "containment" hierarchy can be represented as a tree. (HTML, XML and SGML are examples.)
And of course, binary (and n-ary) trees can be used to represent indexes, maps, sets and other "generic" data structures.
An easy example is searching. If you store your list data in a balanced binary search tree, for example, you get O(log(n)) lookup times. A standard unsorted array implementation of a list would achieve O(n) lookup time.
XML, HTML (and SGML) documents are trees.
