Is null a binary tree? - algorithm

So I have seen a few examples such as
How to validate a Binary Search Tree?
http://www.geeksforgeeks.org/check-if-a-binary-tree-is-subtree-of-another-binary-tree/
They return 1, or true is a tree is null.
Expanding questions a bit - assuming I had to find if TreeSmall is subtree of TreeBig, and my TreeSmall is null, should the return value of checkSubtree(smallTree) true or false ? A true indicates TreeSmall was a tree with value of null. This does not make sense to me.

In pure computer science, null is a valid binary tree. It is called an empty binary tree. Just like an empty set is still a valid set. Furthermore, a binary tree with only a single root node and no children is also valid (but not empty). See this Stack Overflow answer for more information.
In practical implementation, there are two ways to go about it though.
Assume that a valid binary tree must have at least one node and do not allow empty trees. Each node does not have to have children. All recursive methods on this tree do not descend to the level of null. Rather, they stop when they see that the left child or right child of a node is null. This implementation works as long as you don't pass null to any place where a tree is expected.
Assume that null is a valid binary tree (formally, just the empty tree). In this implementation, you first check if the pointer is null before doing any operations on it (like checking for left/right children, etc.) This implementation works for any pointer to a tree. You can freely pass null pointers to methods that are expecting a tree.
Both ways work. The second implementation has the advantage of flexibility. You can pass null to anything that expects a tree and it will not raise an exception. The first implementation has the advantage of not wasting time descending to "child nodes" that are null and you don't have to use null checks at the beginning of every function/method which operates on a node. You simply have to do null checks for the children instead.

That depends on the application and is a question of definition.
Edit:
For example Wikipedia "defines" a BST as follows:
In computer science, a binary search tree (BST), sometimes also called an ordered or sorted binary tree, is a node-based binary tree data structure which has the following properties:
The left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
The left and right subtree must each also be a binary search tree.
There must be no duplicate nodes
Let's test those for null:
left subtree doesn't exist, so there are no nodes violating this rule -> check
similiar to first -> check
if all this tests are passed this passes too, since both subtrees are null -> check
of course there aren't duplicates -> check
So by this definition null is a valid BST. You could inverse this by also requiring "there must be one root node". Which doesn't affect any of the practical properties of a BST but might in an explicit application.

Ex falso sequitur quodlibet - since "null" is nothing at all, it can be interpreted to be anything. It is really a matter of design. Some people may claim that checkSubTree() should thrown something like an IllegalArgumentException in this case. Another approach would be to introduce a special kind of typed object or instance which represents an empty tree (cf. NullObjectPattern). Such a null-object would be a tree by all accounts, e.g. EmptyTree instanceof Tree, while null instanceof Tree would always be false.

Related

Find number of leaves under each node of a tree

I have a tree which is represented in the following format:
nodes is a list of nodes in the tree in the order of their height from top. Node at height 0 is the first element of nodes. Nodes at height 1 (read from left to right) are the next elements of nodes and so on.
n_children is a list of integers such that n_children[i] = num children of nodes[i]
For example given a tree like {1: {2, 3:{4,5,2}}}, nodes=[1,2,3,4,5,2], n_children = [2,0,3,0,0,0].
Given a Tree, is it possible to generate nodes and n_children and the number of leaves corresponding to each node in nodes by traversing the tree only once?
Is such a representation unique? Or is it possible for two different trees to have the same representation?
For the first question - creating the representation given a tree:
I am assuming by "a given tree" we mean a tree that is given in the form of node-objects, each holding its value and a list of references to its children-node-objects.
I propose this algorithm:
Start at node=root.
if node.children is empty return {values_list:[[node.value]], children_list:[[0]]}
otherwise:
3.1. construct two lists. One will be called values_list and each element there shall be a list of values. The other will be called children_list and each element there shall be a list of integers. Each element in these two lists will represent a level in the sub-tree beginning with node, including node itself (will be added at step 3.3).
So values_list[1] will become the list of values of the children-nodes of node, and values_list[2] will become the list of values of the grandchildren-nodes of node. values_list[1][0] will be the value of the leftmost child-node of node. And values_list[0] will be a list with one element alone, values_list[0][0], which will be the value of node.
3.2. for each child-node of node (for which we have references through node.children):
3.2.1. start over at (2.) with the child-node set to node, and the returned results will be assigned back (when the function returns) to child_values_list and child_children_list accordingly.
3.2.2. for each index i in the lists (they are of same length) if there is a list already in values_list[i] - concatenate child_values_list[i] to values_list[i] and concatenate child_children_list[i] to children_list[i]. Otherwise assign values_list[i]=child_values_list[i] and children_list[i]=child.children.list[i] (that would be a push - adding to the end of the list).
3.3. Make node.value the sole element of a new list and add that list to the beginning of values_list. Make node.children.length the sole element of a new list and add that list to the beginning of children_list.
3.4. return values_list and children_list
when the above returns with values_list and children_list for node=root (from step (1)), all we need to do is concatenate the elements of the lists (because they are lists, each for one specific level of the tree). After concatenating the list-elements, the resulting values_list_concatenated and children_list_concatenated will be the wanted representation.
In the algorithm above we visit a node only by starting step (2) with it set as node and we do that only once for each child of a node we visit. We start at the root-node and each node has only one parent => every node is visited exactly once.
For the number of leaves associated with each node: (if I understand correctly - the number of leaves in the sub-tree a node is its root), we can add another list that will be generated and returned: leaves_list.
In the stop-case (no children to node - step (2)) we will return leaves_list:[[1]]. In step (3.2.2) we will concatenate the list-elements like the other two lists' list-elements. And in step (3.3) we will sum the first list-element leaves_list[0] and will make that sum the sole element in a new list that we will add to the beginning of leaves_list. (something like leaves_list.add_to_eginning([leaves_list[0].sum()]))
For the second question - is this representation unique:
To prove uniqueness we actually want to show that the function (let's call it rep for "representation") preserves distinctiveness over the space of trees. i.e. that it is an injection. As you can see in the wiki linked, for that it suffices to show that there exists a function (let's call it tre for "tree") that given a representation gives a tree back, and that for every tree t it holds that tre(rep(t))=t. In simple words - that we can make a method that takes a representation and builds a tree out of it, and for every tree if we make its representation and passes that representation through that methos we'll get the exact same tree back.
So let's get cracking!
Actually the first job - creating that method (the function tre) is already done by you - by the way you explained what the representation is. But let's make it explicit:
if the lists are empty return the empty tree. Otherwise continue
make the root node with values[0] as its value and n_children[0] as its number of children (without making the children nodes yet).
initiate a list-index i=1 and a level index li=1 and level-elements index lei=root.children.length and a next-level-elements accumulator nle_acc=0
while lei>0:
4.1. for lei times:
4.1.1. make a node with values[i] as value and n_children[i] as the number of children.
4.1.2. add the new node as the leftmost child in level li that has not been filled yet (traverse the tree to the li level from the leftmost in right direction and assign the new node to the first reference that is not assigned yet. We know the previous level is done, so each node in the li-1 level has a children.length property we can check and see if each has filled the number of children they should have)
4.1.3. add nle_acc+=n_children[i]
4.1.4. increment ++i
4.2. assign lei=nle_acc (level-elements can take what the accumulator gathered for it)
4.3. clear nle_acc=0 (next-level-elements accumulator needs to accumulate from the start for the next round)
Now we need to prove that an arbitrary tree that is passed through the first algorithm and then through the second algorithm (this one here) will get out of all of that the same as it was originally.
As I'm not trying to prove the corectness of the algorithms (although I should), let's assume they do what I intended them to do. i.e. the first one writes the representation as you described it, and the second one makes a tree level-by-level, left-to-right, assigning a value and the number of children from the representation and fills the children references according to those numbers when it comes to the next level.
So each node has the right amount of children according to the representation (that's how the children were filled), and that number was written from the tree (when generating the representation). And the same is true for the values and thus it is the same tree as the original.
The proof actually should be much more elaborate and detailed - but I think I'll leave it at that now. If there will be a demand for elaboration maybe I'll make it an actual proof.

Uniqueness of B-tree

Say that I have a sequence of key values to be inserted into a B-tree of any given order. After insertion of all the elements, I am performing a deletion operation on some of those elements. Does it always give an unique result (in the form of a B-tree) or it can it differ according to the deletion operation?
Quoted from wiki :
link:https://en.wikipedia.org/wiki/B-tree
Deletion from an internal node
Each element in an internal node acts as a separation value for two
subtrees, therefore we need to find a replacement for separation. Note
that the largest element in the left subtree is still less than the
separator. Likewise, the smallest element in the right subtree is
still greater than the separator. Both of those elements are in leaf
nodes, and either one can be the new separator for the two subtrees.
Algorithmically described below:
Choose a new separator (either the largest element in the left subtree or the smallest element in the right subtree), remove it from
the leaf node it is in, and replace the element to be deleted with the
new separator.
The previous step deleted an element (the new separator) from a leaf
node. If that leaf node is now deficient (has fewer than the required
number of nodes), then rebalance the tree starting from the leaf node.
I think according to the deletion operation it may vary because of the above lines quoted in bold letters. Am I right? help :)
If your question is whether two B-trees that contain the exact same collection of key values will always have identical nodes, then the answer is No.
Note that this is also true for e.g. simple binary trees.
However, in the case of B-trees this can be more pronounced because B-trees are optimized for minimizing page changes and thus the need to write back to slow secondary storage.

how to check that if a binary tree isComplete() in data structure?

how to check if a binary tree isComplete() in data structure?
the leaves nodes have to be filled from left to right without gap
You could have a recursive method that queries the left and right child to see if they have two children each and if they do, returns a true. Then you just call that method and pass it your root and it'll return either true or false after recursing through each of its children and their children.

Do binary search trees have null children that are not shown?

In every diagram of binary search trees that I have found it appears there are nodes that only have 1 child. Do these nodes actually have only one child or is there another null child that is not shown. For example, does 14 have a right null child that is not shown? This would make tree traversals make more sense.
Yes, whenever there aren't exactly 2 children, the references / pointers to the remaining children will contain null values (or whatever the equivalent of that is in whichever language it's implemented in).
So 14 will have a null right child, and 1, 4, 7 and 13 will have null left and right children.
I can only speak for a fairly small subset of languages, but you'll definitely need to have some sort of "points to nothing" concept, which these will contain.
The above assumes you have a structure similar to:
node
node left
node right
type value
As an alternative to this representation (although I can't say I've ever seen this used for a binary tree - just presenting the possibility), you could also have an array of children, for example - a size 2 array means both left and right children, a size 1 array means only 1 child - you could perhaps have a flag indicating whether it's a left or right child (or, since it's a BST, you could just compare it to determine which it is), a size 0 array means no children. Note that this array needn't contain any null values.

Sort list to ease construction of binary tree

I have a set of items that are supposed to for a balanced binary tree. Each item is of the form (data,parent), data being the useful information and parent being the index of the parent node in the binary tree.
Nodes in the tree are numbered left-to-right, row-by-row, like this:
1
___/ \___
/ \
2 3
_/\_ _/\_
4 5 6 7
These elements come stored in a linked list. How should I order this list such that it's easier for me to build the tree? Each parent node will be referenced (by index) by exactly two child nodes; if I sort these by parent index, the sorting must be stable.
You can sort the list in any stable sort, according to the parent field, in increasing order.
The result will be a list like that:
[(d_1,nil), (d_2,1), (d_3,1) , (d_4,2), (d_5,2), ...(d_i,x), (d_i+1,x) ]
^
the root has no parent...
Note that in this list, since we used a stable sort - for each two pairs (d_i,x), (d_i+1,x) in the sorted list, d_i is the left leaf!
Now, you can populate the tree in breadth-first traversal,
Since it is homework - I still want you to make sure you understand everything by your own. So I do not want to "feed answer". If you have any specific question, please comment - and I will try to edit and explain the relevant parts with more details.
Bonus: The result of this organization is very common way to implement a binary heap structure, which is a complete binary tree, but for performance, we usually store it as an array, which is very similar to the output generated by this approach.
I don't think I understand what exactly are you trying to achieve. You have to write the function that inserts items in the tree. The red-black tree, for example, has the same complexity for insertions, O(log n), no matter how the input data is sorted. Is there a specific implementation that you have to use or a specific speed target that you must reach for inserts?
PS: Sounds like a homework to me :)
It sounds like you want a binary tree that allows you to go from a leaf node to its ancestors, using an array.
Usually sorting a list before putting it into a binary tree causes an unbalanced binary tree, unless you use a treap or other O(logn) datastructure.
The usual way of stashing a (complete) binary tree in an array, is to make node i have two children 2i and 2i+1.
Given this organization (not sorting but organization), you can go to a parent node from a leaf node by dividing the array index by 2 using integer arithmetic which will truncate fractions.
if your binary trees are not always complete, you'll probably be better served by forgetting about using an array, and instead using a more traditional tree structure with pointers/references.

Resources