Complexity of inserting n numbers into a binary search tree

I have a question that says "calculate the tight time complexity for the process of inserting n numbers into a binary search tree". It does not say whether the tree is balanced or not. So, what answer can be given to such a question? If the tree is balanced, the height is log n, and inserting n numbers takes O(n log n) time. But if it is unbalanced, it may take even O(n^2) time in the worst case. What does it mean to find the tight time complexity of inserting n numbers into a BST? Am I missing something? Thanks

It could be O(n^2) even if the tree starts out balanced, because plain BST insertion does no rebalancing.
Suppose you're adding a sorted list of numbers, all larger than the largest number already in the tree. In that case, every number will be added as the right child of the current rightmost node, hence O(n^2).
For example, suppose that you add the numbers 15..115, in order, to a tree whose largest key is smaller than 15.
The numbers will be added as a long chain, each node having a single right-hand child. For the i-th element of the list, you'll have to traverse ~i nodes, which yields O(n^2) overall.
In general, if you'd like to keep each insertion and retrieval at O(log n) (and hence n insertions at O(n log n)), you need to use a self-balancing tree.
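
To see this concretely, here is a minimal sketch (not code from the original answer) that inserts sorted keys into a plain, non-self-balancing BST and counts comparisons; the totals grow roughly like n^2/2 rather than n log n:

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def insert(root, key):
        """Standard BST insertion; returns (root, comparisons made)."""
        if root is None:
            return Node(key), 0
        node, comparisons = root, 0
        while True:
            comparisons += 1
            if key < node.key:
                if node.left is None:
                    node.left = Node(key)
                    break
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key)
                    break
                node = node.right
        return root, comparisons

    for n in (100, 200, 400):
        root, total = None, 0
        for key in range(n):        # sorted input: worst case for a plain BST
            root, c = insert(root, key)
            total += c
        print(n, total)             # 100 -> 4950, 200 -> 19900, 400 -> 79800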

What the wiki says is correct.
Since the given tree is a BST, one need not search through the entire tree: comparing the element to be inserted with the root of each subtree leads you down to the appropriate insertion point. This takes O(h), where h is the height of the tree, and h is O(log2 n) only when the tree is balanced.
Once we reach that point, the key is simply attached as a new leaf; unlike a sorted array, nothing has to be shifted, so the attachment itself is O(1). In the worst case, however, the tree is degenerate (a chain), h is n-1, and the walk down takes O(n).
So the overall worst-case complexity of inserting an element into a BST is O(n).
Thanks!

Related

Confusion related to the time complexity of an algorithm

I was going through this algorithm https://codereview.stackexchange.com/questions/63921/print-all-nodes-from-root-to-leaves
In one of the comments it is mentioned that printing the paths from the root to the leaves itself has an average time complexity of O(n log n). I am not quite sure how they came up with that. Any clarification will be much appreciated.
I think this is what they mean:
In the best case, the tree is perfectly balanced: it contains N nodes spread over log(N)+1 levels, and about N/2 of those nodes are leaves.
Every time we move down a level, we duplicate the currently accumulated path. If you count copying an array of length k as an O(k) operation, then moving from the second-to-last level to a leaf costs O(log(N)). As there are N/2 leaves, and each incurs an O(log(N)) copy, you get O(N*log(N)).
Instead of duplicating arrays, the function could recursively pass the same array together with the current level number, making sure that the path is printed only up to the level of the leaf.
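
Here is a hedged sketch of both variants (the node shape is an assumption, not the code from the linked review): the first copies the path at every level, the second shares one mutable list and backtracks.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    # Variant 1: copy the accumulated path at each step; the copying is
    # what makes the balanced-tree case Theta(n log n).
    def print_paths_copying(node, path=()):
        if node is None:
            return
        path = path + (node.key,)      # O(len(path)) copy
        if node.left is None and node.right is None:
            print(path)
        print_paths_copying(node.left, path)
        print_paths_copying(node.right, path)

    # Variant 2: share a single list and backtrack; each node is appended
    # and popped exactly once, so no per-level copying occurs.
    def print_paths_shared(node, path=None):
        if node is None:
            return
        path = [] if path is None else path
        path.append(node.key)
        if node.left is None and node.right is None:
            print(tuple(path))
        else:
            print_paths_shared(node.left, path)
            print_paths_shared(node.right, path)
        path.pop()

    root = Node(2, Node(1), Node(3))
    print_paths_copying(root)   # (2, 1) then (2, 3)
    print_paths_shared(root)    # same output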

Worst case running time of constructing a BST?

Could someone explain to me how the worst-case running time of constructing a BST is O(n^2)? I asked my professor and the only feedback I received is:
"Because the tree is linear to the size of the input. The cost is 1+2+3+4+...+(n-1)."
Can someone explain this in a different way? Her explanation makes me think it's O(n)...
I think the worst case happens when the input is already sorted:
A,B,C,D,E,F,G,H.
That's why you might want to randomly permute the input sequence if applicable.
The worst-case running time is proportional to the square of the input size because the BST is unbalanced. An unbalanced BST can exhibit a degenerate structure: in the worst case, a singly linked list. Constructing this list requires each insertion to march down the full length of the growing chain before a new leaf can be attached at the end.
For instance, try running the algorithm on data that is in precisely reverse-sorted order, so that each new node must become the new leftmost node of the tree.
A BST (even a balanced one!) can be constructed in linear time only if the input data is already sorted. Moreover, this is done using a special algorithm which takes advantage of the order; not by performing N insertions.
I'm guessing the 1+2+3+4+...+(n-1) insertion steps are clear (for a reverse-ordered list).
You should get comfortable with the idea that this number of steps is quadratic. Think about running the algorithm twice and count the number of steps:
[1+2+3+4+...+(n-1)] + [1+2+3+4+...+(n-1)] = [1+2+3+4+...+(n-1)] + [(n-1)+...+4+3+2+1] = n+n+...+n ((n-1) terms) = n(n-1)
Therefore, one run takes n(n-1)/2 ≈ 0.5*n^2 steps.
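
As an aside, here is a minimal sketch of the linear-time construction from already-sorted input mentioned above: recurse on the middle element so the result is perfectly balanced. The node shape is an assumption for illustration, not code from the original thread.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def build_balanced(keys, lo=0, hi=None):
        """Build a balanced BST over sorted keys[lo:hi]; O(n) total."""
        if hi is None:
            hi = len(keys)
        if lo >= hi:
            return None
        mid = (lo + hi) // 2            # middle element becomes the root
        return Node(keys[mid],
                    build_balanced(keys, lo, mid),
                    build_balanced(keys, mid + 1, hi))

    root = build_balanced(list(range(1, 16)))   # 15 sorted keys, height 4

Each element is touched exactly once, hence O(n), versus the quadratic cost of naive repeated insertion on the same input.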

Find k-th smallest element data structure

I have a problem that requires designing a data structure with O(lg n) worst-case time for the following three operations:
a) Insertion: insert the key into the data structure only if it is not already there.
b) Deletion: delete the key if it is there!
c) Find k-th smallest: find the k-th smallest key in the data structure.
I am wondering if I should use a heap, but I still don't have a clear idea about it.
I can easily get the first two parts in O(lg n), even faster, but I'm not sure how to deal with part (c).
If anyone has any ideas, please share.
Two solutions come to mind:
Use a balanced binary search tree (red-black, AVL, splay... any would do). You're already familiar with operations (1) and (2). For operation (3), just store an extra value at each node: the total number of nodes in that subtree. You can easily use this value to find the k-th smallest element in O(log(n)); a sketch follows after this list.
For example, say the root A has 10 nodes in its subtree, its left child B has 3, and its right child C has 6 (3 + 6 + 1 = 10). If you want to find the 8th smallest element, the 3 nodes under B plus A itself account for the 4 smallest keys, so you know you should go to the right side and look for the 4th smallest key there.
Use a skip list. It also supports all of your operations (1), (2), (3) in O(log n) on average, but it may take a bit longer to implement.
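
Here is a minimal sketch of the select step from the first solution, assuming every node already stores its subtree size (insert/delete must keep it up to date). The node layout and the 1-based rank convention are illustrative assumptions.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right
            self.size = (1 + (left.size if left else 0)
                           + (right.size if right else 0))

    def select(node, k):
        """Return the k-th smallest key (1-based); O(height) time."""
        while node is not None:
            left_size = node.left.size if node.left else 0
            if k == left_size + 1:
                return node.key        # this node is the k-th smallest
            if k <= left_size:
                node = node.left       # answer is in the left subtree
            else:
                k -= left_size + 1     # skip left subtree and this node
                node = node.right
        raise IndexError("k is out of range")

    # The 10-node tree from the example: left subtree of 3, right of 6.
    tree = Node(4,
                Node(2, Node(1), Node(3)),
                Node(8, Node(6, Node(5), Node(7)), Node(9, None, Node(10))))
    print(select(tree, 8))   # -> 8, found by descending into the right side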
Well, if your data structure keeps the elements sorted, then it's easy to find the kth lowest element.
The worst-case cost of a Binary Search Tree for search and insertion is O(N) while the average-case cost is O(lgN).
Thus, I would recommend using a Red-Black Binary Search Tree which guarantees a worst-case complexity of O(lgN) for both search and insertion.
You can read more about red-black trees here and see an implementation of a Red-Black BST in Java here.
So in terms of finding the k-th smallest element using the above Red-Black BST implementation, you just need to call the select method, passing in the value of k. The select method also guarantees worst-case O(lgN).
One solution could be to use the partitioning strategy of quicksort (quickselect):
Step 1: pick the first element as the pivot and move it to its correct place (at most n comparisons).
Step 2: when the pivot reaches its correct location, compare that location with k:
Step 2.1: if location > k, your element resides in the first sublist, so you are not interested in the second sublist.
Step 2.2: if location < k, your element resides in the second sublist, so you are not interested in the first one.
Step 2.3: if location == k, you have got the element; break the loop/recursion.
Step 3: repeat steps 1 to 2.3 on the appropriate sublist.
The expected complexity of this solution is O(n), but the worst case is O(n^2) unless the pivot is chosen carefully.
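
Here is a minimal sketch of that partition-based idea (quickselect). For clarity it builds sublists rather than partitioning in place, so it follows the steps above but is not the original poster's code.

    def quickselect(items, k):
        """Return the k-th smallest element (1-based) of items."""
        pivot = items[0]                        # step 1: first element as pivot
        smaller = [x for x in items[1:] if x < pivot]
        larger = [x for x in items[1:] if x >= pivot]
        location = len(smaller) + 1             # pivot's 1-based rank
        if location == k:                       # step 2.3: found it
            return pivot
        if location > k:                        # step 2.1: first sublist
            return quickselect(smaller, k)
        return quickselect(larger, k - location)  # step 2.2: second sublist

    print(quickselect([7, 2, 9, 4, 1, 8], 3))   # -> 4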
A heap is not the right structure for finding the k-th smallest element of an array, simply because you would have to remove k-1 elements from the heap in order to get to the k-th element.
There is a much better approach to finding the k-th smallest element, which relies on the median-of-medians algorithm. Basically, any partition algorithm would be good enough on average, but median-of-medians comes with a proof of worst-case O(N) time for finding the median. In general, this algorithm can be used to find any specific order statistic, not only the median.
Here is the analysis and implementation of this algorithm in C#: Finding Kth Smallest Element in an Unsorted Array
P.S. On a related note, there are many, many things that you can do in place with arrays. The array is a wonderful data structure, and if you know how to organize its elements in a particular situation, you can get results extremely fast without additional memory use.
The heap structure is a very good example, and so is the quicksort algorithm. And here is one really fun example of using arrays efficiently (this problem comes from a programming olympiad): Finding a Majority Element in an Array

Sorting an n element array with O(logn) distinct elements in O(nloglogn) worst case time

The problem at hand is what's in the title itself: give an algorithm which sorts an n-element array with O(log n) distinct elements in O(n log log n) worst-case time. Any ideas?
Also, how do you generally handle arrays with multiple non-distinct elements?
O(log(log(n))) time is enough for you to do a primitive operation in a search tree with O(log(n)) elements.
Thus, maintain a balanced search tree of all the distinct elements you have seen so far. Each node in the tree additionally contains a list of all elements you have seen with that key.
Walk through the input elements one by one. For each element, try to insert it into the tree (which takes O(log log n) time). If you find you've already seen an equal element, just insert it into the auxiliary list in the already-existing node.
After traversing the entire input, walk through the tree in order, concatenating the auxiliary lists. (If you take care to append to the auxiliary lists at their right ends, this is even a stable sort.)
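
Here is a hedged sketch of the idea. For brevity it uses a plain Python dict in place of the balanced search tree; the O(n log log n) bound above assumes each lookup/insert is an O(log log n) operation on a balanced tree over the O(log n) distinct keys. The function name is made up for illustration.

    def sort_few_distinct(items):
        buckets = {}                     # distinct key -> list of equal items
        for x in items:                  # one tree operation per element
            buckets.setdefault(x, []).append(x)   # append at the right end
        result = []
        for key in sorted(buckets):      # only O(log n) distinct keys to sort
            result.extend(buckets[key])  # concatenate the auxiliary lists
        return result

    print(sort_few_distinct([3, 1, 3, 2, 1, 3, 2, 1]))  # [1, 1, 1, 2, 2, 3, 3, 3]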
A simple O(log(N))-space solution would be:
find the distinct elements using a balanced tree (O(log n) space, O(n log log n) time);
then you can use this tree to always pick the correct pivot (the median of the distinct values) for quicksort.
I wonder if there is an O(log(log(N)))-space solution.
Some details about using a tree:
You should be able to use a red-black tree (or another type of balanced search tree) with nodes that hold both a value and a counter: say a pair (value, count).
When you insert a new value, you either create a new node or increment the count of the node that already holds that value. Either way the cost is O(H), where H is the height of the tree: O(H) to find the node, and, when a new node has to be created, O(H) more (with bigger constants) to position and rebalance it.
This ensures that the tree has no more than O(log n) nodes (because there are only O(log n) distinct values), so its height is H = O(log log n). Each insertion therefore takes O(log log n), and with n insertions you get O(n log log n).

Median of BST in O(logn) time complexity

I came across the solution given at http://discuss.joelonsoftware.com/default.asp?interview.11.780597.8, which uses Morris in-order traversal to find the median in O(n) time.
But is it possible to achieve the same in O(log n) time? The same has been asked here: http://www.careercup.com/question?id=192816
If you also maintain, at each node, the count of its left and right descendants, you can do it in O(log N) time by searching for the median position. In fact, you can find the k-th largest element in O(log n) time.
Of course, this assumes that the tree is balanced. Maintaining the count does not change the insert/delete complexity.
If the tree is not balanced, then you have Omega(n) worst case complexity.
See: Order Statistic Tree.
By the way, big-O and little-o are very different (your title says little-o).
Unless you guarantee some sort of balanced tree, it's not possible.
Consider a tree that's completely degenerate -- e.g., every left pointer is NULL (nil, whatever), so each node only has a right child (i.e., for all practical purposes the "tree" is really a singly linked list).
In this case, just accessing the median node (at all) takes linear time -- even if you started out knowing that node N was the median, it would still take N steps to get to that node.
We can find the median by using rabbit and turtle pointers: the rabbit moves twice as fast as the turtle through the in-order traversal of the BST. This way, when the rabbit reaches the end of the traversal, the turtle is at the median of the BST.
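
Here is a minimal sketch of the rabbit-and-turtle idea, using two independent in-order generators rather than the Morris traversal from the linked thread. Note that this is O(n) time, not O(log n), since both pointers still walk the traversal.

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def inorder(node):
        """Yield the BST's keys in sorted order."""
        if node is not None:
            yield from inorder(node.left)
            yield node.key
            yield from inorder(node.right)

    def median(root):
        """Return the median key (the upper median for an even count)."""
        turtle, rabbit = inorder(root), inorder(root)
        mid = next(turtle, None)
        if mid is None:
            return None                  # empty tree
        while True:
            # the rabbit takes two steps for each turtle step
            if next(rabbit, None) is None or next(rabbit, None) is None:
                return mid
            mid = next(turtle)

    root = Node(2, Node(1), Node(3))
    print(median(root))   # -> 2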