Time Complexity for Finding the Minimum Value of a Binary Tree - algorithm

I wrote a recursive function for finding the min value of a binary tree (assume that it is not ordered).
The code is as below.
// assume node values are positive ints
int minValue(Node n) {
    if (n == null) return 0;
    int leftmin = minValue(n.left);
    int rightmin = minValue(n.right);
    return min(n.data, leftmin, rightmin);
}
int min(int a, int b, int c) {
    if (b != 0 && c != 0) {
        int min;
        if (a <= b) min = a;
        else min = b;
        if (min <= c) return min;
        else return c;
    }
    if (b == 0 && c == 0) return a; // leaf node: both subtrees were empty
    if (b == 0) {
        if (a <= c) return a;
        else return c;
    }
    // c == 0
    if (a <= b) return a;
    else return b;
}
I guess the time complexity of the minValue function is O(n) by intuition.
Is this correct? Can someone show the formal proof of the time complexity of minValue function?

Assuming your binary tree is not ordered, your algorithm will have O(N) running time, so your intuition is correct. The reason is that an unordered tree gives you no information about where the minimum is, so the function has to visit every node exactly once, doing a constant amount of work per node. Formally, the recurrence is T(n) = T(n_left) + T(n_right) + O(1), which sums to Θ(N) over all N nodes.
For a sorted and balanced binary tree, searching takes O(log N). The reason for this is that the search only ever has to traverse one single path down the tree. A balanced tree with N nodes has a height of about log N, and this explains the complexity of searching. Consider the following tree for example:
    5
   / \
  3   7
 / \ / \
1  4 6  8
There are 7 nodes in the tree, but the height is only ⌊log₂ 7⌋ = 2. You can convince yourself that you only ever have to traverse one path down this tree to find a value or to fail doing so.
Note that for a binary tree which is not balanced these complexities may not apply.
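To make the O(log N) claim concrete, here is a minimal sketch of searching a sorted (BST-ordered) tree; the Node class and method names here are illustrative, not from the question:

```java
// Searching a BST: at every node the ordering tells us which single
// child to descend into, so we follow one root-to-leaf path.
class Node {
    int data;
    Node left, right;
    Node(int data) { this.data = data; }
}

class BstSearch {
    static boolean contains(Node root, int x) {
        Node cur = root;
        while (cur != null) {
            if (x == cur.data) return true;              // found it
            cur = (x < cur.data) ? cur.left : cur.right; // one step down
        }
        return false;                                    // ran off the tree
    }
}
```

On a balanced tree the loop runs at most height + 1 times, which is O(log N).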

The number of comparisons is n-1. The proof is an old chestnut, usually applied to the problem of counting how many matches are needed in a single-elimination tennis tournament. Each comparison removes exactly one number from consideration, and so if there are initially n numbers in the tree, you need n-1 comparisons to reduce that to 1.
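As an aside, the zero-sentinel special cases in the question's min helper disappear if an empty subtree reports Integer.MAX_VALUE, the identity element for min. A sketch (the class and method names are mine, and unlike the original this also handles non-positive values):

```java
// Minimum of an unordered binary tree, with Integer.MAX_VALUE standing
// in for "empty subtree" so no special-case branches are needed.
class Node {
    int data;
    Node left, right;
    Node(int data) { this.data = data; }
}

class MinFinder {
    static int minValue(Node n) {
        if (n == null) return Integer.MAX_VALUE; // identity element for min
        int leftMin = minValue(n.left);
        int rightMin = minValue(n.right);
        return Math.min(n.data, Math.min(leftMin, rightMin));
    }
}
```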

You can look up and remove the min/max of a BST in constant O(1) time if you implement it yourself and store a reference to the head/tail. Most implementations don't do that, storing only the root node. But if you analyze how a BST works, then given a reference to the min/max (aliased as head/tail), you can find the next min/max in constant time.
See this for more info:
https://stackoverflow.com/a/74905762/1223975

Related

I am a bit confused about the complexity comparison between two binary tree algorithms; below is the code for both.

The code below, which checks whether one binary tree is identical to another, has linear complexity, i.e. O(n), where n is the number of nodes in the tree with the fewer nodes.
boolean identical(Node a, Node b)
{
    if (a == null && b == null)
        return true;
    if (a != null && b != null)
        return (a.data == b.data
                && identical(a.left, b.left)
                && identical(a.right, b.right));
    /* one empty, one not -> false */
    return false;
}
(Computing the Fibonacci series using naive recursion gives exponential complexity.)
The complexity of the code below is O(2^n).
class Fibonacci {
    static int fib(int n)
    {
        if (n <= 1)
            return n;
        return fib(n - 1) + fib(n - 2);
    }

    public static void main(String[] args)
    {
        int n = 9;
        System.out.println(fib(n));
    }
}
My question is: both look similar, yet one has linear complexity and the other exponential. Could anyone clarify the difference between the two algorithms?
Fibonacci Series
If you build a tree for the recursive code that generates the fibonacci series, it will look like:
                  fib(n)
            /              \
       fib(n-1)         fib(n-2)
       /      \         /      \
  fib(n-2) fib(n-3) fib(n-3) fib(n-4)
At what level will you encounter fib(1), so that the tree can "stop"? At the (n-1)th level you will encounter fib(1), and there the recursion stops. Since there are (n-1) levels and the tree roughly doubles at each level, the number of nodes is on the order of 2^n.
Binary Tree Comparison
Let's consider your binary tree comparison, and let's assume both are complete binary trees. Your algorithm visits every node once, and if h is the height of the tree, the number of nodes is on the order of 2^h. So you could state the complexity in that case as O(2^h); the O(n) here is equivalent to O(2^h).
The difference originates in a different definition of n. While the naive recursive algorithm for Fibonacci numbers also performs a kind of traversal in a graph, the value of n is not defined by the number of nodes in that graph, but by the input number.
The binary tree comparison however, has n defined as a number of nodes.
So n has a completely different meaning in these two algorithms, and it explains why the time complexity in terms of n comes out so differently.
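To see that the exponential blow-up comes from recomputing the same subproblems (fib(n-2) already appears twice on the second level of the tree above), here is a sketch of the standard memoization fix, which makes the recursion O(n) in the input number. The class and field names are mine, not from the question:

```java
import java.util.HashMap;
import java.util.Map;

// Memoized Fibonacci: each fib(k) is computed once; every later call
// is a map lookup, so only O(n) non-trivial recursive calls remain.
class Fib {
    private static final Map<Integer, Long> memo = new HashMap<>();

    static long fib(int n) {
        if (n <= 1) return n;
        Long cached = memo.get(n);
        if (cached != null) return cached;     // already computed
        long result = fib(n - 1) + fib(n - 2);
        memo.put(n, result);
        return result;
    }
}
```

The return type is long because values past fib(46) overflow int.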

What would be the complexity of this Sorting Algorithm? What are the demerits of using the same?

The sorting algorithm can be described as follows:
1. Create a Binary Search Tree from the array data.
(For multiple occurrences, increment the occurence variable of the current Node.)
2. Traverse the BST in inorder fashion.
(Inorder traversal will return the elements of the array in sorted order.)
3. At each node in the inorder traversal, overwrite the array element at the current index (index beginning at 0) with the current node's value.
Here's a Java implementation for the same:
Structure of Node Class
class Node {
    Node left;
    int data;
    int occurence;
    Node right;
}
inorder function
(the return type is int just to propagate the correct index through every call; it serves no other purpose)
public int inorder(Node root, int[] arr, int index) {
    if (root == null) return index;
    index = inorder(root.left, arr, index);
    for (int i = 0; i < root.getOccurence(); i++)
        arr[index++] = root.getData();
    index = inorder(root.right, arr, index);
    return index;
}
main()
public static void main(String[] args) {
    int[] arr = new int[]{100, 100, 1, 1, 1, 7, 98, 47, 13, 56};
    BinarySearchTree bst = new BinarySearchTree(new Node(arr[0]));
    for (int i = 1; i < arr.length; i++)
        bst.insert(bst.getRoot(), arr[i]);
    int dummy = bst.inorder(bst.getRoot(), arr, 0);
    System.out.println(Arrays.toString(arr));
}
The space complexity is terrible, I know, but it should not be such a big issue unless the sort is used for an extremely huge dataset. However, as I see it, isn't the time complexity O(n)? (Insertion and retrieval in a BST are O(log n), and each element is touched once, making it O(n).) Correct me if I am wrong, as I haven't yet studied Big-O well.
Assuming that the amortized (average) complexity of an insertion is O(log n), N inserts (the construction of the tree) will take O(log(1) + log(2) + ... + log(N)) = O(log(N!)) = O(N log N) (by Stirling's approximation). To read back the sorted array, perform an in-order depth-first traversal, which visits each node once and is hence O(N). Combining the two, you get O(N log N).
However this requires that the tree is always balanced! This will not be the case in general for the most basic binary tree, as insertions do not check the relative depths of each child tree. There are many variants which are self-balancing - the two most famous being Red-Black trees and AVL trees. However the implementation of balancing is quite complicated and often leads to a higher constant factor in real-life performance.
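For illustration, java.util.TreeMap is one of those red-black tree implementations, so it gives the O(log n) insert bound in the worst case. A sketch of the same counted-duplicates sort built on it (my own names, not the question's code):

```java
import java.util.Map;
import java.util.TreeMap;

// BST sort via a red-black tree: O(n log n) worst case.
// The map value counts duplicates, mirroring the 'occurence' field above.
class TreeSortSketch {
    static void sort(int[] arr) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (int x : arr)
            counts.merge(x, 1, Integer::sum);   // insert, or bump the count
        int index = 0;
        for (Map.Entry<Integer, Integer> e : counts.entrySet())
            for (int i = 0; i < e.getValue(); i++)
                arr[index++] = e.getKey();      // write back in sorted order
    }
}
```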
the goal was to implement an O(n) algorithm to sort an Array of n elements with each element in the range [1, n^2]
In that case radix sort (the counting variation) would be O(n), taking a fixed number of passes, log_b(n^2), where b is the "base" used for a digit and is chosen as a function of n. With b == n it takes two passes; with b == sqrt(n), four passes; and if n is small enough, b == n^2 allows a single pass of plain counting sort. b can be rounded up to the next power of 2 so that division and modulo can be replaced with binary shift and binary and. Radix sort needs O(n) extra space, but so do the links of a binary tree.
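A sketch of the two-pass, base-n variant for keys in [1, n^2] (class and method names are mine): shifting each key to v-1 puts both base-n digits in [0, n-1], and two stable counting-sort passes then sort the whole array in O(n).

```java
// LSD radix sort with base b = n for keys in [1, n*n]: two counting
// passes, each O(n), so the total is O(n).
class RadixSortSketch {
    static void sort(int[] arr) {
        int n = arr.length;
        if (n < 2) return;
        int[] tmp = new int[n];
        countingPass(arr, tmp, n, 1); // pass 1: low digit, (v-1) % n
        countingPass(tmp, arr, n, n); // pass 2: high digit, (v-1) / n
    }

    // Stable counting sort of src into dst by digit ((v-1) / divisor) % n.
    private static void countingPass(int[] src, int[] dst, int n, int divisor) {
        int[] count = new int[n];
        for (int v : src) count[((v - 1) / divisor) % n]++;
        for (int i = 1; i < n; i++) count[i] += count[i - 1]; // prefix sums
        for (int i = src.length - 1; i >= 0; i--) {           // backwards keeps it stable
            int digit = ((src[i] - 1) / divisor) % n;
            dst[--count[digit]] = src[i];
        }
    }
}
```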

How to build a binary tree in O(N ) time?

Following on from a previous question here I'm keen to know how to build a binary tree from an array of N unsorted large integers in order N time?
Unless you have some pre-conditions on the list that allow you to calculate the position in the tree for each item in constant time, it is not possible to 'build' (that is, sequentially insert) items into a tree in O(N) time. Each insertion has to make up to log M comparisons, where M is the number of items already in the tree.
OK, just for completeness... The binary tree in question is built from an array and has a leaf for every array element. It keeps them in their original index order, not value order, so it doesn't magically let you sort a list in linear time. It also needs to be balanced.
To build such a tree in linear time, you can use a simple recursive algorithm like this (using 0-based indexes):
// build a tree of elements [start, end) in array
// precondition: end > start
Node buildTree(int[] array, int start, int end)
{
    if (end - start > 1)
    {
        int mid = (start + end) >> 1;
        Node left = buildTree(array, start, mid);
        Node right = buildTree(array, mid, end);
        return new InternalNode(left, right);
    }
    else
    {
        return new LeafNode(array[start]);
    }
}
I agree that this seems impossible in general (assuming we have a general, totally ordered set S of N items.) Below is an informal argument where I essentially reduce the building of a BST on S to the problem of sorting S.
Informal argument. Let S be a set of N elements, and suppose you could construct a binary search tree T storing the items of S in O(N) time.
Now do an inorder walk of the tree and print the values as you visit them: you have essentially sorted the elements of S. The walk takes O(|T|) steps, where |T| is the size of the tree (i.e. the number of nodes), and |T| = O(N) since the tree stores one node per element.
So the whole procedure would sort S in O(N) = o(N log N) time, contradicting the Ω(N log N) lower bound for comparison-based sorting.
I have an idea of how it is possible.
Sort the array with radix sort, which is O(N). Thereafter, use a recursive procedure to build the tree from the middle of the array outward, like:
node *insert(int *array, int size) {
    if (size <= 0)
        return NULL;
    node *rc = new node;
    int midpoint = size / 2;
    rc->data = array[midpoint];
    rc->left = insert(array, midpoint);
    rc->right = insert(array + midpoint + 1, size - midpoint - 1);
    return rc;
}
Since we never traverse the tree from the top down, but always attach each node directly where it belongs, each attachment is O(1), and the whole construction is O(N).

Amortized worst case complexity of binary search

For a binary search of a sorted array of 2^n-1 elements in which the element we are looking for appears, what is the amortized worst-case time complexity?
Found this on my review sheet for my final exam. I can't even figure out why we would want amortized time complexity for binary search, because its worst case is O(log n). According to my notes, the amortized cost calculates the upper bound of an algorithm and then divides it by the number of items, so wouldn't that be as simple as the worst-case time complexity divided by n, meaning O(log n)/(2^n - 1)?
For reference, here is the binary search I've been using:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0;                  // start
    int e = sorted.length - 1;  // end
    while (s <= e) {
        int mid = s + (e - s) / 2;
        if (sorted[mid] == x)
            return true;
        else if (sorted[mid] < x)
            s = mid + 1;
        else
            e = mid - 1;
    }
    return false;
}
I'm honestly not sure what this means - I don't see how amortization interacts with binary search.
Perhaps the question is asking what the average cost of a successful binary search would be. You could imagine binary searching for all n elements of the array and looking at the average cost of such an operation. In that case, there's one element for which the search makes one probe, two for which the search makes two probes, four for which it makes three probes, etc. This averages out to O(log n).
Hope this helps!
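That average can be checked empirically. The sketch below (my own code, not from the question) runs a standard binary search over every element of an array of 2^10 - 1 items and averages the probe counts:

```java
// Measure the average number of probes for a successful binary search
// over every element of a sorted array of 2^n - 1 items.
class ProbeCount {
    static int probes(int[] sorted, int x) {
        int s = 0, e = sorted.length - 1, count = 0;
        while (s <= e) {
            int mid = s + (e - s) / 2;
            count++;                        // one probe of sorted[mid]
            if (sorted[mid] == x) return count;
            else if (sorted[mid] < x) s = mid + 1;
            else e = mid - 1;
        }
        return count;
    }

    static double averageProbes(int n) {
        int size = (1 << n) - 1;
        int[] sorted = new int[size];
        for (int i = 0; i < size; i++) sorted[i] = i;
        long total = 0;
        for (int i = 0; i < size; i++) total += probes(sorted, i);
        return (double) total / size;       // approaches log2(size)
    }
}
```

For size 2^n - 1 the exact average is ((n-1)·2^n + 1)/(2^n - 1), which is about 9.01 for n = 10.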
Amortized cost is the total cost over all possible queries divided by the number of possible queries. You will get slightly different results depending on how you count queries that fail to find the item. (Either don't count them at all, or count one for each gap where a missing item could be.)
So for a search of 2^n - 1 items (just as an example to keep the math simple), there is one item you would find on your first probe, 2 items would be found on the second probe, 4 on the third probe, ... 2^(n-1) on the nth probe. There are 2^n "gaps" for missing items (remembering to count both ends as gaps).
With your algorithm, finding an item on probe k costs 2k-1 comparisons. (That's 2 compares for each of the k-1 probes before the kth, plus one where the test for == returns true.) Searching for an item not in the table costs 2n comparisons.
I'll leave it to you to do the math, but I can't leave the topic without expressing how irked I am when I see binary search coded this way. Consider:
public static boolean binarySearch(int x, int[] sorted) {
    int s = 0;              // start
    int e = sorted.length;  // end
    // Loop invariant: if x is at sorted[k] then s <= k < e
    int mid = (s + e) / 2;
    while (mid != s) {
        if (sorted[mid] > x) e = mid; else s = mid;
        mid = (s + e) / 2;
    }
    return (mid < e) && (sorted[mid] == x); // mid == e means the array was empty
}
You don't short-circuit the loop when you hit the item you're looking for, which seems like a defect, but on the other hand you do only one comparison on every item you look at, instead of two comparisons on each item that doesn't match. Since half of all items are found at leaves of the search tree, what seems like a defect turns out to be a major gain. Indeed, the number of elements where short-circuiting the loop is beneficial is only about the square root of the number of elements in the array.
Grind through the arithmetic, computing the amortized search cost (counting "cost" as the number of comparisons to sorted[mid]), and you'll see that this version is approximately twice as fast. It also has constant cost (within ±1 comparison), depending only on the number of items in the array and not on where or even if the item is found. Not that that's important.

Generating uniformly random curious binary trees

A binary tree of N nodes is 'curious' if it is a binary tree whose node values are 1, 2, ..., N and which satisfies these properties:
1. Each internal node of the tree has exactly one descendant which is greater than it.
2. Every number in 1, 2, ..., N appears in the tree exactly once.
Example of a curious binary tree
    4
   / \
  5   2
     / \
    1   3
Can you give an algorithm to generate a uniformly random curious binary tree of n nodes, which runs in O(n) guaranteed time?
Assume you only have access to a random number generator which can give you a (uniformly distributed) random number in the range [1, k] for any 1 <= k <= n. Assume the generator runs in O(1).
I would like to see an O(n log n) time solution too.
Please follow the usual definition of labelled binary trees being distinct, to consider distinct curious binary trees.
There is a bijection between "curious" binary trees and standard heaps. Namely, given a heap, recursively (starting from the top) swap each internal node with its largest child. And, as I learned in StackOverflow not long ago, a heap is equivalent to a permutation of 1,2,...,N. So you should make a random permutation and turn it into a heap; or recursively make the heap in the same way that you would have made a random permutation. After that you can convert the heap to a "curious tree".
Aha, I think I've got how to create a random heap in O(N) time. (after which, use approach in Greg Kuperberg's answer to transform into "curious" binary tree.)
edit 2: Rough pseudocode for making a random min-heap directly. Max-heap is identical except the values inserted into the heap are in reverse numerical order.
struct Node {
    Node left, right;
    Object key;
    constructor newNode() {
        N = new Node;
        N.left = N.right = null;
        N.key = null;
    }
}
function create-random-heap(RandomNumberGenerator rng, int N)
{
    Node heap = Node.newNode();
    // Creates a heap with an "incomplete" node containing a null, and having
    // both child nodes as null.
    List incompleteHeapNodes = [heap];
    // use a vector/array type list to keep track of incomplete heap nodes.
    for k = 1:N
    {
        // loop invariant: incompleteHeapNodes has k members. Order is unimportant.
        int m = rng.getRandomNumber(k);
        // create a random number between 0 and k-1
        Node node = incompleteHeapNodes.get(m);
        // pick a random node from the incomplete list,
        // make it a complete node with key k.
        // It is ok to do so since all of its parent nodes
        // have values less than k.
        node.left = Node.newNode();
        node.right = Node.newNode();
        node.key = k;
        // Now remove this node from incompleteHeapNodes
        // and add its children. (replace node with node.left,
        // append node.right)
        incompleteHeapNodes.set(m, node.left);
        incompleteHeapNodes.append(node.right);
        // All operations in this loop take O(1) time.
    }
    return prune-null-nodes(heap);
}
// get rid of all the incomplete nodes.
function prune-null-nodes(heap)
{
    if (heap == null || heap.key == null)
        return null;
    heap.left = prune-null-nodes(heap.left);
    heap.right = prune-null-nodes(heap.right);
    return heap;
}
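For completeness, a direct Java rendering of this pseudocode (class and variable names are mine). Every loop iteration does O(1) work: one random index, one list read, one list overwrite, one append.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Build a uniformly random min-heap on keys 1..N in O(N): keep a list of
// "incomplete" slots, fill a random slot with the next key, and replace it
// in the list with its two freshly created child slots.
class RandomHeap {
    static final class Node {
        Node left, right;
        Integer key;   // null marks an incomplete node
    }

    static Node createRandomMinHeap(Random rng, int n) {
        Node heap = new Node();
        List<Node> incomplete = new ArrayList<>();
        incomplete.add(heap);
        for (int k = 1; k <= n; k++) {
            int m = rng.nextInt(incomplete.size()); // random incomplete slot
            Node node = incomplete.get(m);
            node.key = k;                           // all ancestors have smaller keys
            node.left = new Node();
            node.right = new Node();
            incomplete.set(m, node.left);           // O(1) replace
            incomplete.add(node.right);             // O(1) append
        }
        return prune(heap);
    }

    // Drop the remaining incomplete (null-key) nodes.
    static Node prune(Node h) {
        if (h == null || h.key == null) return null;
        h.left = prune(h.left);
        h.right = prune(h.right);
        return h;
    }
}
```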
