Could someone explain to me how the worst-case running time of constructing a BST is n^2? I asked my professor, and the only feedback I received is
"Because the tree is linear to the size of the input. The cost is 1+2+3+4+...+(n-1)."
Can someone explain this in a different way? Her explanation makes me think it's O(n)...
I think the worst case happens when the input is already sorted:
A,B,C,D,E,F,G,H.
That's why you might want to randomly permute the input sequence if applicable.
The worst-case running time is proportional to the square of the input size because the BST is unbalanced. An unbalanced BST can exhibit a degenerate structure: in the worst case, a singly linked list. Constructing this list requires each insertion to march down the full length of the growing list to reach the leaf where the new node is attached.
For instance, try running the algorithm on data that is in precisely reverse order, so that each new node must become the new leftmost node of the tree.
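To make the degenerate case concrete, here is a minimal Java sketch of naive one-at-a-time insertion (the Node class and method names are illustrative, not from any particular library); feeding it reverse-sorted keys produces exactly the left-leaning chain described above.

    // Minimal naive BST; assumes distinct integer keys.
    class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    class NaiveBst {
        // Each insert walks from the root down to a leaf position.
        // On reverse-sorted input every step goes left, so the i-th
        // insert visits i nodes: 1 + 2 + ... + (n-1) steps in total.
        static Node insert(Node root, int key) {
            if (root == null) return new Node(key);
            if (key < root.key) root.left = insert(root.left, key);
            else root.right = insert(root.right, key);
            return root;
        }

        public static void main(String[] args) {
            Node root = null;
            for (int key = 8; key >= 1; key--) {  // reverse-sorted input
                root = insert(root, key);
            }
            // The resulting "tree" is a chain of left children: 8 -> 7 -> ... -> 1.
        }
    }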
A BST (even a balanced one!) can be constructed in linear time only if the input data is already sorted. Moreover, this is done with a special algorithm that takes advantage of the order, not by performing n insertions.
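The special linear-time construction alluded to here is usually the "sorted array to balanced BST" recursion; a sketch, reusing the Node class from the snippet above (the method name is my own), might look like this:

    // Build a balanced BST from a sorted array in O(n) by always
    // making the middle element the root of its subtree.
    static Node fromSortedArray(int[] sorted, int lo, int hi) {
        if (lo > hi) return null;
        int mid = lo + (hi - lo) / 2;
        Node root = new Node(sorted[mid]);
        root.left = fromSortedArray(sorted, lo, mid - 1);
        root.right = fromSortedArray(sorted, mid + 1, hi);
        return root;  // each element is visited exactly once: O(n)
    }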
I'm guessing the 1+2+3+...+(n-1) insertion steps are clear (for a reverse-ordered list).
You should get comfortable with the idea that this number of steps is quadratic. Think about running the algorithm twice and count the number of steps:
[1+2+3+...+(n-1)] + [1+2+3+...+(n-1)] = [1+2+3+...+(n-1)] + [(n-1)+...+3+2+1] = n + n + ... + n = n(n-1)
Pairing the terms this way gives (n-1) pairs that each sum to n. Therefore, one run takes n(n-1)/2, i.e. roughly 0.5*n^2 steps, which is quadratic.
Related
"Every comparison-based algorithm to sort n elements must take Ω(nlogn) comparisons in the worst case. With this fact, what would be the complexity of constructing a n-node binary search tree and why?"
Based on this question, I am thinking that the construction complexity must be at least Ω(n log n). That said, I can't seem to figure out how to find the total complexity of construction.
The title of the question and the text you quote are asking different things. I am going to address what the quote is saying because finding how expensive BST construction is can be done just by looking at an algorithm.
Assume for a second that it were possible to construct a BST in better than Ω(n log n). With a binary search tree you can read out the sorted list in Θ(n) time. This means I could create a sorting algorithm as follows.
Algorithm sort(L)
    B <- buildBST(L)
    Sorted <- inOrderTraversal(B)
    return Sorted
With this algorithm I would be able to sort a list in better than Ω(n log n). But, as you stated, this is not possible because Ω(n log n) is a lower bound. Therefore it is not possible to create a binary search tree in better than Ω(n log n) time.
Furthermore, since an algorithm exists to create a BST in O(n log n) time, you can actually say that such an algorithm is optimal under the comparison-based model.
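If you want to see the reduction concretely, here is a sketch using java.util.TreeMap (a red-black BST) as the buildBST step; it assumes distinct elements, since a plain BST sort has to decide separately how to handle duplicates.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;

    class BstSort {
        // Sort by building a BST (TreeMap is a red-black tree) and
        // reading it back in order: O(n log n) build + O(n) traversal.
        static List<Integer> sort(List<Integer> input) {
            TreeMap<Integer, Boolean> bst = new TreeMap<>();  // B <- buildBST(L)
            for (int x : input) bst.put(x, true);
            return new ArrayList<>(bst.keySet());             // inOrderTraversal(B)
        }
    }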
The construction of the BST will be O(n log n).
You will need to insert each and every one of the n nodes.
Each insertion into a balanced tree requires at least O(log n) comparisons.
Hence the minimum will be O(n log n).
Only in the best case, where the array is already sorted (and a dedicated linear construction algorithm is used), will the time complexity be O(n).
I know that a linked list is not an appropriate data structure for building heaps.
One of the answers here (https://stackoverflow.com/a/14584517/5841727) says that heap sort can be done in O(nlogn) using linked list which is same as with arrays.
I think that the heapify operation would cost O(n) time on a linked list, and we would need (n/2) heapify operations, leading to a time complexity of O(n^2).
Can someone please tell me how to achieve O(n log n) complexity (for heap sort) using a linked list?
The Stack Overflow URL you mentioned is merely someone's claim (at least as I'm writing this), so my answer is based on that assumption. Mostly, when people mention "time complexity", they mean asymptotic analysis: finding the proportion by which the time taken by an algorithm grows with increasing input size, ignoring all the constants.
To prove the time complexity with a linked list, let's assume there is a function which returns the value at a given index (I know linked lists don't support access by index). For the efficiency of this function you'd also need to pass in the level, but we can ignore that for now since it doesn't affect the time complexity.
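For what it's worth, a minimal sketch of such an index function (the names are illustrative, and the level parameter is ignored as stated above) makes the O(n) access cost visible:

    // Singly linked node.
    class ListNode {
        int value;
        ListNode next;
    }

    // Returns the node at position i by walking from the head,
    // which costs O(i) steps -- the crux of the whole analysis.
    static ListNode nodeAt(ListNode head, int i) {
        ListNode cur = head;
        for (int step = 0; step < i && cur != null; step++) cur = cur.next;
        return cur;
    }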
So now it comes down to analyzing how the time taken by this function grows with increasing input size. You can imagine that to fix up (heapify) one node you may have to traverse the list at most 3 times: (1) one traversal to compare the two possible children and find which one to swap with, (2) going back to the parent for the swap, and (3) coming back down to the node you just swapped. Even though it may seem that you are doing at most n/2 traversals 3 times over, for asymptotic analysis this still just equals n. You then have to do this for log n levels, exactly the same way you would with an array. Therefore the time complexity is O(n log n). See the time-complexity table for heaps on Wikipedia: https://en.wikipedia.org/wiki/Binary_heap#Summary_of_running_times
I have got a question that says "calculate the tight time complexity for the process of inserting n numbers into a binary search tree". It does not say whether the tree is balanced or not. So, what answer can be given to such a question? If the tree is balanced, then the height is log n, and inserting n numbers takes O(n log n) time. But if it is unbalanced, it may take even O(n^2) time in the worst case. What does it mean to find the tight time complexity of inserting n numbers into a BST? Am I missing something? Thanks
It could be O(n^2) even if the tree starts out balanced.
Suppose you're adding a sorted list of numbers, all larger than the largest number already in the tree. In that case, every number will be added as the right child of the current rightmost node, hence O(n^2).
For example, suppose that you add the numbers [15..115] to the following tree:
The numbers will be added as a long chain, each node having a single right-hand child. For the i-th element of the list, you'll have to traverse ~i nodes, which yields O(n^2) overall.
In general, if you'd like to keep insertion and retrieval at O(log n) each (and hence n insertions at O(n log n) total), you need to use self-balancing trees.
What the wiki says is correct.
Since the given tree is a BST, one need not search through the entire tree: just comparing the element to be inserted with the roots of the tree/subtrees will find the appropriate node for the element. This takes O(log2 n).
Once we have such a node, we can insert the key there, but after that we are required to push all the elements in the right sub-tree to the right so that the BST's searching property is not violated. If the place to be inserted happens to be the very last one, we need not worry about this second procedure; if not, this procedure may take O(n) in the worst case.
So the overall worst-case complexity of inserting an element into a BST would be O(n).
Thanks!
I have a problem here that requires designing a data structure that takes O(lg n) worst case for the following three operations:
a) Insertion: insert the key into the data structure only if it is not already there.
b) Deletion: delete the key if it is there!
c) Find kth smallest: find the k-th smallest key in the data structure
I am wondering if I should use a heap, but I still don't have a clear idea about it.
I can easily get the first two parts in O(lg n), even faster, but I am not sure how to handle part c).
If anyone has any ideas, please share.
Two solutions come to mind:
Use a balanced binary search tree (red-black, AVL, splay... any would do). You're already familiar with operations (1) and (2). For operation (3), just store an extra value at each node: the total number of nodes in that subtree. You can easily use this value to find the kth smallest element in O(log n); a sketch follows after this list.
For example, say your tree looks as follows: the root A has 10 nodes, its left child B has 3 nodes, and its right child C has 6 nodes (3 + 6 + 1 = 10). If you want to find the 8th smallest element, you know you should go to the right side and look there for the (8 - 3 - 1) = 4th smallest element.
Use a skip list. It also supports all of your operations (1), (2), (3) in O(log n) on average, but it may take a bit longer to implement.
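Here is a sketch of option 1's select operation in Java (assuming every node already maintains a correct size field; the names are illustrative):

    // Node of a balanced BST augmented with subtree sizes.
    class Node {
        int key;
        int size;          // number of nodes in this subtree
        Node left, right;
    }

    static int sizeOf(Node n) { return n == null ? 0 : n.size; }

    // Returns the k-th smallest key (1-indexed); assumes 1 <= k <= root.size.
    // Each step descends one level, so this is O(log n) in a balanced tree.
    static int select(Node root, int k) {
        int leftSize = sizeOf(root.left);
        if (k <= leftSize) return select(root.left, k);
        if (k == leftSize + 1) return root.key;
        return select(root.right, k - leftSize - 1);
    }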
Well, if your data structure keeps the elements sorted, then it's easy to find the kth lowest element.
The worst-case cost of a Binary Search Tree for search and insertion is O(N) while the average-case cost is O(lgN).
Thus, I would recommend using a Red-Black Binary Search Tree which guarantees a worst-case complexity of O(lgN) for both search and insertion.
You can read more about red-black trees here and see an implementation of a Red-Black BST in Java here.
So in terms of finding the k-th smallest element using the above Red-Black BST implementation, you just need to call the select method, passing in the value of k. The select method also guarantees worst-case O(lgN).
One solution could be to use the strategy of quicksort.
Step 1: Pick the first element as the pivot and move it to its correct place (at most n comparisons).
When you reach the correct location for this element, you do a check:
Step 2.1: if location > k, your element resides in the first sublist, so you are not interested in the second sublist.
Step 2.2: if location < k, your element resides in the second sublist, so you are not interested in the first sublist.
Step 2.3: if location == k, you have got the element; break the loop/recursion.
Step 3: Repeat steps 1 to 2.3 using the appropriate sublist.
This strategy is known as quickselect; its average complexity is O(n), since the expected work shrinks geometrically (n + n/2 + n/4 + ... ≈ 2n), though with consistently bad pivots the worst case is O(n^2).
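A Java sketch of this quickselect strategy (first element as pivot, k 1-indexed, distinct values assumed for simplicity):

    // Returns the k-th smallest element; assumes 1 <= k <= a.length.
    static int quickSelect(int[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (true) {
            int location = partition(a, lo, hi);           // pivot's final index
            if (location == k - 1) return a[location];     // step 2.3
            else if (location > k - 1) hi = location - 1;  // step 2.1: first sublist
            else lo = location + 1;                        // step 2.2: second sublist
        }
    }

    // Lomuto-style partition using the first element as the pivot.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo];
        int i = lo;
        for (int j = lo + 1; j <= hi; j++) {
            if (a[j] < pivot) {
                i++;
                int t = a[i]; a[i] = a[j]; a[j] = t;
            }
        }
        int t = a[lo]; a[lo] = a[i]; a[i] = t;  // move pivot into place
        return i;
    }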
A heap is not the right structure for finding the kth smallest element of an array, simply because you would have to remove k-1 elements from the heap in order to get to the kth element.
There is a much better approach to finding the kth smallest element, which relies on the median-of-medians algorithm. Basically, any partition algorithm would be good enough on average, but median-of-medians comes with a proof of worst-case O(n) time for finding the median. In general, this algorithm can be used to find any specific order statistic, not only the median.
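For reference, the pivot rule itself can be sketched as follows (illustrative Java; it returns a pivot value rather than partitioning, and sorting the little groups in place is fine because the array gets partitioned afterwards anyway):

    // Median-of-medians pivot selection: split [lo..hi] into groups
    // of 5, take each group's median, then recurse on those medians.
    static int medianOfMedians(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        if (n <= 5) {
            java.util.Arrays.sort(a, lo, hi + 1);
            return a[lo + n / 2];
        }
        int numGroups = (n + 4) / 5;
        int[] medians = new int[numGroups];
        for (int g = 0; g < numGroups; g++) {
            int gLo = lo + g * 5;
            int gHi = Math.min(gLo + 4, hi);
            java.util.Arrays.sort(a, gLo, gHi + 1);
            medians[g] = a[gLo + (gHi - gLo) / 2];
        }
        return medianOfMedians(medians, 0, numGroups - 1);
    }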
Here is the analysis and implementation of this algorithm in C#: Finding Kth Smallest Element in an Unsorted Array
P.S. On a related note, there are many, many things you can do in place with arrays. An array is a wonderful data structure, and if you know how to organize its elements for a particular situation, you can get results extremely fast and without additional memory use.
The heap structure is a very good example, and the QuickSort algorithm as well. And here is one really fun example of using arrays efficiently (this problem comes from a programming Olympiad): Finding a Majority Element in an Array
We always see that operations on a (binary search) tree have O(log n) worst-case running time because the tree height is log n. I wonder: if we are told that an algorithm has a running time that is a function of log n, e.g. m + n log n, can we conclude it must involve an (augmented) tree?
EDIT:
Thanks to your comments, I now realize divide-and-conquer and binary trees are very similar visually/conceptually. I had never made a connection between the two. But I can think of a case where an O(log n) term does not come from a divide-and-conquer algorithm, yet involves a tree that has none of the properties of a BST/AVL/red-black tree.
That's the disjoint-set data structure with Find/Union operations, whose running time is O(N + M log N), with N being the number of elements and M the number of Find operations.
Please let me know if I'm missing something, but I cannot see how divide-and-conquer comes into play here. I just see that in this (disjoint-set) case there is a tree with no BST property and a running time that is a function of log N. So my question is about why/why not I can generalize from this case.
What you have is exactly backwards. O(lg N) generally means some sort of divide-and-conquer algorithm, and one common way of implementing divide and conquer is a binary tree. While binary trees are a substantial subset of all divide-and-conquer algorithms, they are still only a subset.
In some cases, you can transform other divide-and-conquer algorithms fairly directly into binary trees (e.g. comments on another answer have already made an attempt at claiming a binary search is similar). Just for another obvious example, however, a multiway tree (e.g. a B-tree, B+ tree or B* tree), while clearly a tree, is just as clearly not a binary tree.
Again, if you want to badly enough, you can stretch the point that a multiway tree can be represented as a sort of warped version of a binary tree. If you want to, you can probably stretch all the exceptions to the point of saying that all of them are (at least something like) binary trees. At least to me, however, all that does is make "binary tree" synonymous with "divide and conquer". In other words, all you accomplish is warping the vocabulary and essentially obliterating a term that's both distinct and useful.
No, you can also binary search a sorted array (for instance). But don't take my word for it: http://en.wikipedia.org/wiki/Binary_search_algorithm
As a counterexample:
given array 'a' with length 'n'
y = 0
for x = 0 to log(length(a))
    y = y + 1
return y
The run time is O(log(n)), but no tree here!
The answer is no. Binary search of a sorted array is O(log(n)).
Algorithms taking logarithmic time are commonly found in operations on binary trees.
Examples of O(logn):
Finding an item in a sorted array with a binary search or a balanced search tree.
Look up a value in a sorted input array by bisection.
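For instance, a standard iterative binary search involves no tree at all:

    // Returns the index of target in the sorted array, or -1 if absent.
    // Each iteration halves the search range: O(log n), no tree involved.
    static int binarySearch(int[] sorted, int target) {
        int lo = 0, hi = sorted.length - 1;
        while (lo <= hi) {
            int mid = lo + (hi - lo) / 2;
            if (sorted[mid] == target) return mid;
            else if (sorted[mid] < target) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1;
    }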
Since O(log(n)) is only an upper bound, all O(1) algorithms, like function(a, b) { return a + b; }, also satisfy the condition.
But I have to agree that all Theta(log(n)) algorithms kinda look like tree algorithms, or at least can be abstracted to a tree.
Short Answer:
Just because an algorithm has log(n) as part of its analysis does not mean that a tree is involved. For example, the following is a very simple algorithm that is O(log(n)):
for (int i = 1; i < n; i = i * 2)
    System.out.println("hello");
As you can see, no tree was involved. John also provides a good example of how binary search can be done on a sorted array. These both take O(log(n)) time, and there are plenty of other code examples that could be created or referenced. So don't make assumptions based on the asymptotic time complexity; look at the code to know for sure.
More On Trees:
Just because an algorithm involves "trees" doesn't imply O(logn) either. You need to know the tree type and how the operation affects the tree.
Some Examples:
Example 1)
Inserting or searching the following unbalanced tree would be O(n).
Example 2)
Inserting into or searching the following balanced trees would both be O(log(n)).
Balanced Binary Tree:
Balanced Tree of Degree 3:
Additional Comments
If the trees you are using don't have a way to "balance", then there is a good chance that your operations will be O(n), not O(log n). If you use trees that are self-balancing, then inserts normally take more time, as the balancing of the tree normally occurs during the insert phase.
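As a taste of what that balancing work looks like, here is the classic right-rotation primitive used by AVL and red-black trees (a sketch only; a full self-balancing insert also tracks heights or colors):

    class Node { int key; Node left, right; }

    // Rotate the subtree rooted at y to the right:
    //        y              x
    //       / \            / \
    //      x   C    ->    A   y
    //     / \                / \
    //    A   B              B   C
    static Node rotateRight(Node y) {
        Node x = y.left;
        y.left = x.right;  // subtree B moves under y
        x.right = y;       // y becomes x's right child
        return x;          // x is the new root of this subtree
    }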