binary prefix code in huffman algorithm - algorithm

In the huffman coding algorithm, there's a lemma that says:
The binary tree corresponding to an optimal binary prefix code is full
But I can't figure out why. How can you prove this lemma?

Any binary code for data can be represented as a binary tree. The code is represented by the path from the root to the leaf, with a left edge representing a 0 in the prefix and a right one representing 1 (or vice versa.)
Keep in mind that for each symbol there is one leaf node.
To prove that an optimal code will be represented by a full binary tree, let's recall what a full binary tree is,
It is a tree where each node is either a leaf or has two chilren.
Let's assume that a certain code is optimal and is represented by a non-full tree.
So there is a certain vertex u with only a single child v. The edge between u and v adds the bit x to the prefix code of the symbols (at the leaves) in the subtree rooted at v.
From this tree I can remove the edge x and replace u with v, thus decreasing the length of the prefix code of all symbols in the subtree rooted at v by one. This reduces the number of bits in the representation of at least one symbol (when v is a singleton node.)
This shows that the tree didnt actually represent an optimal code, and my premise was wrong. Thus proving the lemma.

From wikipedia,
A full binary tree (sometimes 2-tree or strictly binary tree) is a tree in which every node other than the leaves has two children.
The way in which the tree for huffman code is produced will definitely produce a full binary tree. Because at each step of the algorithm, we remove the two nodes of highest priority (lowest probability) from the queue and create a new internal node with these two nodes as children.

Related

How to make Full Binary Tree with 6 nodes?

I know well about Full Binary Tree and Complete Binary Tree. But unable to make Full binary tree with only 6 nodes.
The answer is No. You can't make a Full binary tree with just 6 nodes. As the definition in the Wikipedia says:
A full binary tree (sometimes referred to as a proper or plane
binary tree) is a tree in which every node has either 0 or 2
children. Another way of defining a full binary tree is a recursive
definition. A full binary tree is either:
A single vertex.
A tree whose root node has two subtrees, both of which are full binary trees.
Another interesting property I noticed is that, the number of nodes required to make a full binary tree will always be odd.
Another way to see that a full binary tree has an odd number of nodes:
Starting with the definition of a full binary tree (Wikipedia):
a tree in which every node has either 0 or 2 children.
This means that the total number of child nodes is even (0+2+2+0+...+2 is always even). There is only one node that is not a child of another, which is the root. So considering that node as well, the total becomes odd.
By consequence there is no full binary tree with 6 nodes.
Elaborating on #vivek_23's answer, this is, unfortunately, not possible. There's a beautiful theorem that says the following:
Theorem: Any full binary tree has 2L - 1 nodes, where L is the number of leaf nodes in the tree.
The intuition behind this theorem is actually pretty simple. Imagine you take a complete binary tree and delete all the internal nodes from it. You now have a forest of L single-node full binary trees, one for each leaf. Now, add the internal nodes back one at a time. Each time you do, you'll be taking two different trees in the forest and combining them into a single tree, which decreases the number of trees in the forest by one. This means that you have to have exactly L - 1 internal nodes, since if you had any fewer you wouldn't be able to join together all the trees in the forest, and if you had any more you'd run out of trees to combine.
The fact that there are 2L - 1 total nodes in a full binary tree means that the number of nodes in a full binary tree is always odd, so you can't create a full binary tree with 6 nodes. However, you can create a full binary tree with any number of odd nodes - can you figure out how to prove that?
Hope this helps!

Why the tree is not a binary tree?

I am a beginner in the field of data structures, I am studying binary trees and in my textbook there's a tree which is not a binary tree but I am not able to make out why the tree is not a binary tree because every node in the tree has atmost two children.
According to Wikipedia definition of binary tree is "In computer science, a binary tree is a treedata structure in which each node has at most two children, which are referred to as the left child and the right child."
The tree in the picture seems to satisfy the condition as mentioned in the definition of binary tree.
I want an explanation for why the tree is not a binary tree?
This is not even a tree, let alone binary tree. Node I has two parents which violates the tree property.
I got the answer, This not even a tree because a tree is connected acyclic graph also a binary tree is a finite set of elements that is either empty or is partitioned into three disjoint subsets. The first subset contains a single element called the root of the tree. The other two subsets are themselves binary trees called the left and right subtrees of the original tree.
Here the word disjoint answers the problem.
It's not a binary tree because of node I
This can be ABEI or ACFI
This would mean the node can be represented by 2 binary numbers which is incorrect
Each node has either 0 or 1 parents. 0 in the case of the root node. 1 otherwise. I has 2 parents E and F

Joining of binary trees

Suppose we have a set of binary trees with their inorder and preorder traversals given,and where no tree is a subtree of another tree in the given set. Now another binary tree Q is given.find whether it can be formed by joining the binary trees from the given set.(while joining each tree in the set should be considered atmost once) joining operation means:
Pick the root of any tree in the set and hook it to any vertex of another tree such that the resulting tree is also a binary tree.
Can we do this using LCA (least common ancestor)?or does it needs any special datastructure to solve?
I think a Binary tree structure should be enough. I don't think you NEED any other special data structure.
And I don't understand how you would use LCA for this. As far as my knowledge goes, LCA is used for knowing the lowest common Ancestor for two NODES in the same tree. It would not help in comparing two trees. (which is what I would do to check if Q can be formed)
My solution in words.
The tree Q that has to be checked if it can be made from the set of trees, So I would take a top-down approach. Basically comparing Q with the possible trees formed from the set.
Logic:
if Q.root does not match with any of the roots of the trees in the set (A,B,C....Z...), No solution possible.
if Q.root matches a Tree root (say A) check corresponding children and mark A as used/visited. (Which is what I understand from the question: a tree can be used only once)
We should continue with A in our solution only if all of Q's children match the corresponding children of A. (I would do Depth First traversal, Breadth First would work as well).
We can add append a new tree from the set (i.e. append a new root (tree B) only at leaf nodes of A as we have to maintain binary tree). Keep track of where the B was appended.
Repeat same check with corresponding children comparison as done for A. If no match, remove B and try to add C tree at the place where B was Added.
We continue to do this till we run out of nodes in Q. (unless we want IDENTICAL MATCH, in which case we would try other tree combinations other than the ones that we have, which match Q but have more nodes).
Apologies for the lengthy verbose answer. (I feel my pseudo code would be difficult to write and be riddled with comments to explain).
Hope this helps.
An alternate solution: Will be much less efficient (try only if there are relatively less number of trees) : forming all possible set of trees ( first in 2s then 3s ....N) and and Checking the formed trees if they are identical to Q.
the comparing part can be referred here:
http://www.geeksforgeeks.org/write-c-code-to-determine-if-two-trees-are-identical/

Binary Tree Definition

I see this definition of a binary tree in Wikipedia:
Another way of defining binary trees is a recursive definition on directed graphs. A binary tree is either:
A single vertex.
A graph formed by taking two binary trees, adding a vertex, and adding an edge directed from the new vertex to the root of each binary tree.
How then is it possible to have a binary tree with one root and one left son, like this:
O
/
O
This is a binary tree, right? What am I missing here?
And please don't just say "Wikipedia can be wrong", I've seen this definition in a few other places as well.
Correct. A tree can be empty (nil)
Let's assume you have two trees: one, that has one vertex, and one which is empty (nil). They look like this:
O .
Notice that I used a dot for the (nil) tree.
Then I add a new vertex, and edges from the new vertex to the existing two trees (notice that we do not take edges from the existing trees and connect them to the new vertes - it would be impossible.). So it looks like it now:
O
/ \
O .
Since edges leading to (nil) are not drawn, here it is what is at the end:
O
/
O
I hope it clarifies.
It depends on the algorithm you use for binary-tree: as for icecream, there are many flavors :)
One example is when you have a mix of node pointers and leaf pointers on a node, and a balancing system that decide to create a second node (wether it's the root or the other) when you are inserting new values on a full node: instead of creating a root and 2 leafs, by splitting it, it's much more economical to create just another node.
Wikipedia can be wrong. Binary trees are finite data structures, a subtree must be allowed to be empty otherwise binary trees would be infinite. The base case for the recursive definition of a binary tree must allow either a single node or the empty tree.
Section 14.4 of Touch of Class: An Introduction to Programming Well Using Objects
and Contracts, by Bertrand Meyer, Springer Verlag, 2009. © Bertrand Meyer, 2009. has a better recursive definition of a binary tree
Definition: binary tree
A binary tree over G, for an arbitrary data type G, is a finite set of items called
nodes, each containing a value of type G, such that the nodes, if any, are
divided into three disjoint parts:
• A single node, called the root of the binary tree.
• (Recursively) two binary trees over G, called the left subtree and right subtree.
The definition explicitly allows a binary tree to be empty (“the nodes, if any”).
Without this, of course, the recursive definition would lead to an infinite
structure, whereas our binary trees are, as the definition also prescribes, finite.
If not empty, a binary tree always has a root, and may have: no subtree; a
left subtree only; a right subtree only; or both.

Complete binary tree definitions

I have some questions on binary trees:
Wikipedia states that a binary tree is complete when "A complete binary tree is a binary tree in which every level, except possibly the last, is completely filled, and all nodes are as far left as possible." What does the last "as far left as possible" passage mean?
A well-formed binary tree is said to be "height-balanced" if (1) it is empty, or (2) its left and right children are height-balanced and the height of the left tree is within 1 of the height of the right tree, taken from How to determine if binary tree is balanced?, is this correct or there's "jitter" on the 1-value? I read on the answer I linked that there could be also a difference factor of 4 between the height of the right and the left tree
Do the complete and height-balanced definitions just apply to binary tree or just any other tree?
Following the reference of the definition in wikipedia, I got to
this page. The definition was taken from there but modified:
Definition: A binary tree in which every level, except possibly the deepest, is completely filled. At depth n, the height of the
tree, all nodes must be as far left as possible.
It continues with a note below though,
A complete binary tree has 2k nodes at every depth k < n and between 2n and 2^(n+1) - 1 nodes altogether.
Sometimes, definitions vary according to convenience (be useful for something). That passage might be a variation which, as I understand, requires leaf nodes to fill first the left side of the deepest level (that is, fill from left to right). The definition that I usually found is exactly as described above but without that
passage.
Usually the definition taken for height-balanced tree is the one you
described. In other words:
A tree is balanced if and only if for every node the heights of its two subtrees differ by at most 1.
That definition was taken from here. Again, sometimes definitions are made more flexible to serve specific purposes. For example, the definition of an AVL tree says that
In an AVL tree, the heights of the two child subtrees of any node
differ by at most one
Still, I remember once I had to rewrite an algorithm so that the tree
would be considered height-balanced if the two child subtrees of any
node differed by at most 2. Note that the definition you gave is recursive, this is very common for binary trees.
In a tree whose number of children is variable, you wouldn't be able to say that it is complete (any parent could have the number of children that you want). Still, it can apply to n-ary trees (with a fixed amount of n children).
Do the complete and height-balanced definitions just apply to binary
tree or just any other tree?
Short answer: Yes, it can be extended to any n-ary tree.

Resources