BinaryTree inOrder Traversal Sorting complexity - data-structures

I am confused as to why quicksort, shellsort, mergesort and the other O(n log n) algorithms are repeatedly mentioned as popular sorting algorithms. Doesn't an in-order traversal of a binary search tree sort it with O(n) complexity? What am I missing?

No. Building the tree has O(N log N) complexity (i.e., you're inserting N items into the tree, and each insertion has logarithmic complexity).
Once you have the tree built, you can traverse it with linear complexity (especially if it's a threaded tree), but that's not equivalent to sorting; it's equivalent to traversing an array after you've sorted it.
Although they have the same asymptotic complexity, building a tree will usually be slower by a substantial factor, because you have to allocate nodes for the tree and then follow those non-contiguously allocated nodes to walk it.
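For illustration, here is a minimal tree-sort sketch in Python (all names are illustrative; it uses a plain unbalanced BST, so already-sorted input degrades to O(n^2), while a self-balancing tree would guarantee O(n log n)):

    # Tree sort: n insertions at O(log n) average each -> O(n log n) to build,
    # then a single O(n) in-order traversal to read the sorted order back out.
    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def insert(root, key):
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def inorder(root, out):
        if root is not None:
            inorder(root.left, out)
            out.append(root.key)
            inorder(root.right, out)

    def tree_sort(items):
        root = None
        for x in items:        # building the tree is the actual sorting work
            root = insert(root, x)
        out = []
        inorder(root, out)     # traversal only reads out what is already sorted
        return out

    print(tree_sort([5, 2, 8, 1, 9]))  # [1, 2, 5, 8, 9]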

Basically, the complexity classes you meet most often are:
O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3).
For the first part of your question: the popular sorting algorithms have O(n log n) complexity simply because a comparison-based sort cannot do better.
For O(n) complexity you would have to sort the array in a single pass, and that is not possible: one pass over an arbitrary array cannot put every element in its place.
So the next achievable complexity is O(n log n), reached by divide-and-conquer methods such as merge sort:
we find the middle element and sort each side recursively. Since the recursion halves the size each time, it contributes O(log n) levels, and the O(n) merging work at each level brings the total to O(n log n).
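As a concrete illustration of that divide-and-conquer structure, here is a minimal merge sort sketch in Python (a textbook version, not tuned for performance):

    # Merge sort: the recursion halves the input, giving O(log n) levels,
    # and each level does O(n) merge work, for O(n log n) in total.
    def merge_sort(a):
        if len(a) <= 1:
            return a
        mid = len(a) // 2
        left = merge_sort(a[:mid])    # sort each half recursively
        right = merge_sort(a[mid:])
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):   # O(n) merge of sorted halves
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        out.extend(left[i:])
        out.extend(right[j:])
        return out

    print(merge_sort([4, 1, 3, 2]))  # [1, 2, 3, 4]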
For the next part of your question, remember that the basic operations on a BST, such as insertion and deletion, have O(log n) complexity, where log n is the height of a balanced tree.
So although traversing the tree is O(n), you first have to build it with n insertions of O(log n) each, and that makes it O(n log n) in total.
Comment if you didn't get my point. This is the simplest way to answer it, I think.

Any traversal of a binary tree is O(n), but that is not sorting.
Having a BST means the tree is already sorted; you are just traversing it. Building the BST is the sorting process, which is asymptotically O(n log n).
Please note that O(n) + O(n log n) is the same as O(n log n).

Related

Tree Sort Performance

I have an AVL tree implementation where the insertion method runs in O(log n) time and the method that returns an in-order list representation runs in O(n^2) time. If I have a list that needs to be sorted, I can use a for-loop to iterate through it and insert each element into the AVL tree, which runs in O(n log n) time combined. So what is the performance of this entire sorting algorithm (i.e. iterate through the list, insert each element, then use an in-order traversal to return a sorted list)?
You correctly say that adding n elements to the tree takes O(n log n) time. A simple in-order traversal of a BST can be performed in O(n) time, so it is possible to get a sorted list of the elements in O(n log n + n) = O(n log n) time. If, however, your algorithm for generating the sorted list from the tree is genuinely quadratic (i.e. in O(n^2) but not in O(n)), the worst-case time complexity of the procedure you describe is O(n log n + n^2) = O(n^2), which is not optimal.
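One common way the in-order step ends up quadratic is building the result with list concatenation at every node; threading one shared accumulator through the recursion keeps it linear. A hedged sketch, assuming a simple node object with left, key and right attributes:

    # Quadratic pitfall: '+' copies both operand lists at every node, so on a
    # degenerate (linked-list-shaped) tree the total copying cost is O(n^2).
    def inorder_slow(node):
        if node is None:
            return []
        return inorder_slow(node.left) + [node.key] + inorder_slow(node.right)

    # Linear version: every node is appended to one shared list exactly once.
    def inorder_fast(node, out=None):
        if out is None:
            out = []
        if node is not None:
            inorder_fast(node.left, out)
            out.append(node.key)
            inorder_fast(node.right, out)
        return out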

What is the time complexity of constructing a binary search tree?

"Every comparison-based algorithm to sort n elements must take Ω(nlogn) comparisons in the worst case. With this fact, what would be the complexity of constructing a n-node binary search tree and why?"
Based on this question, I am thinking that the construction complexity must be at least O(nlogn). That said, I can't seem to figure out how to find the total complexity of construction.
The title of the question and the text you quote are asking different things. I am going to address what the quote is saying, because you can find out how expensive BST construction is just by looking at an algorithm.
Assume for a second that it was possible to construct a BST in better than Ω(n log n). With a binary search tree you can read out the sorted list in Θ(n) time. This means I could create a sorting algorithm as follows.
    Algorithm sort(L)
        B <- buildBST(L)
        Sorted <- inOrderTraversal(B)
        return Sorted
With this algorithm I would be able to sort a list in better than Ω(n log n). But as you stated, that is not possible, because Ω(n log n) is a lower bound. Therefore it is not possible to create a binary search tree in better than Ω(n log n) time.
Furthermore, since an algorithm exists to create a BST in O(n log n) time, you can actually say that such an algorithm is optimal under the comparison-based model.
The construction of the BST will be O(n log n).
You will need to insert each of the n nodes, which means n insert operations.
Each of those insertions requires at least O(log n) comparisons in a balanced tree.
Hence the minimum total will be O(n log n).
Only in the best case, where the array is already sorted, can construction be done in O(n): you can then build a balanced BST directly by recursively taking the middle element as the root, as sketched below.
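To illustrate that best case: a sorted array lets you take the middle element as the root and recurse on the two halves, touching each element exactly once and making no comparisons. A minimal sketch (the Node class is illustrative):

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left
            self.right = right

    # Build a balanced BST from a sorted array in O(n): each element becomes
    # a node exactly once, and no comparisons are needed.
    def from_sorted(a, lo=0, hi=None):
        if hi is None:
            hi = len(a)
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        return Node(a[mid],
                    from_sorted(a, lo, mid),
                    from_sorted(a, mid + 1, hi))

    root = from_sorted([1, 2, 3, 4, 5])
    print(root.key)  # 3, the middle element, becomes the root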

What is the appropriate data structure for insertion sort?

I revisited the insertion sort algorithm and noticed something funny.
One obviously shouldn't use an array with this sort, as upon insertion one will have to shift all subsequent elements, O(n^2 log(n)). However a linked list is also not good here, since we would preferably find the right placement using binary search, which isn't possible in a simple linked list (so we end up with O(n^2)).
Which makes me wonder: what is a data structure on which this sorting algorithm provides its premise of O(nlog(n)) complexity?
From where did you get the premise of O(n log n)? Wikipedia disagrees, as does my own experience. The premises of insertion sort include components that are O(n) for each of the n elements.
Also, I believe your claim of O(n^2 log n) is incorrect. The binary search is log n, and the ensuing "move sideways" is n, but these two steps happen in succession, not nested. The cost per element is therefore n + log n, not a product, and the overall result is the expected O(n^2).
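A sketch of that "binary search, then shift" variant in Python, using the standard bisect module, makes the two costs visible: the search is O(log n), but the shift inside list.insert is O(n) and dominates, so the total stays O(n^2):

    import bisect

    def binary_insertion_sort(a):
        out = []
        for x in a:
            pos = bisect.bisect_right(out, x)  # O(log n) search for the slot
            out.insert(pos, x)                 # O(n) shift: this step dominates
        return out

    print(binary_insertion_sort([3, 1, 2]))  # [1, 2, 3]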
If you use a gapped array and a binary search to figure out where to insert things, then with high probability your sort will be O(n log(n)). See https://en.wikipedia.org/wiki/Library_sort for details.
However this is not as efficient as a wide variety of other sorts that are widely implemented. So this knowledge is only of theoretical interest.
Insertion sort is defined over an array or a list; if you use some other data structure, it becomes a different algorithm.
Of course, if you use a BST, insertion and search are O(log n) and your overall complexity is O(n log n) on average (note that it is O(n^2) in the worst case), but then it is no longer an insertion sort, it is a tree sort. If you use an AVL tree, you get the O(n log n) worst-case complexity.
In insertion sort the best-case scenario is when the sequence is already sorted, and that takes linear time; the worst case takes O(n^2) time. I do not know how you got the logarithmic part in the complexity.

Best way to join two red-black trees

The easiest way is to store the two trees in two arrays, merge them, and build a new red-black tree from the sorted array, which takes O(m + n) time.
Is there an algorithm with less time complexity?
You can merge two red-black trees in time O(m log(n/m + 1)) where n and m are the input sizes and, WLOG, m ≤ n. Notice that this bound is tighter than O(m+n). Here's some intuition:
When the two trees are similar in size (m ≈ n), the bound is approximately O(m) = O(n) = O(n + m).
When one tree is significantly larger than the other (m ≪ n), the bound is approximately O(log n).
You can find a brief description of the algorithm here. A more in-depth description which generalizes to other balancing schemes (AVL, BB[α], Treap, ...) can be found in a recent paper.
I think that if you have generic sets (so a generic red-black tree) you can't use the solution suggested by @Sam Westrick, because he assumes that all elements in the first set are less than all elements in the second set. Cormen (an excellent book for learning algorithms and data structures) also specifies this condition for joining two red-black trees.
Since you need to compare each element in both red-black trees, you will have to deal with a minimum of O(m+n) time complexity. (There is a way to do it with O(1) space complexity, but that is something else which has nothing to do with your question.) If you do not iterate over and check each element in each red-black tree, you cannot guarantee that your new red-black tree will be sorted. I can think of another way of merging two red-black trees, called "in-place merge using a doubly linked list", but it too ends up O(m+n); see the sketch after the steps below.
1. Convert the two given red-black trees into doubly linked lists, which is O(m+n).
2. Merge the two sorted linked lists, which is O(m+n).
3. Build a balanced red-black tree from the merged list created in step 2, which is O(m+n).
The time complexity of this method is therefore also O(m+n).
So, because you have to compare the elements of each tree with the elements of the other tree, you will end up with at least O(m+n).
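A hedged sketch of that idea in Python, simplified to plain lists instead of a doubly linked list and to an ordinary balanced BST instead of a full red-black implementation; each of the three steps is O(m + n):

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key = key
            self.left = left
            self.right = right

    # Step 1: an in-order traversal flattens a BST into a sorted list, O(n).
    def to_sorted_list(root, out):
        if root is not None:
            to_sorted_list(root.left, out)
            out.append(root.key)
            to_sorted_list(root.right, out)

    # Step 2: standard two-pointer merge of sorted lists, O(m + n).
    def merge_sorted(a, b):
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] <= b[j]:
                out.append(a[i]); i += 1
            else:
                out.append(b[j]); j += 1
        out.extend(a[i:]); out.extend(b[j:])
        return out

    # Step 3: midpoint recursion rebuilds a balanced tree, O(m + n).
    def build_balanced(a, lo=0, hi=None):
        if hi is None:
            hi = len(a)
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        return Node(a[mid], build_balanced(a, lo, mid),
                    build_balanced(a, mid + 1, hi))

    def merge_trees(t1, t2):
        xs, ys = [], []
        to_sorted_list(t1, xs)
        to_sorted_list(t2, ys)
        return build_balanced(merge_sorted(xs, ys))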

Why can we not construct a binary search tree from an unsorted array in O(n)?

Even if space complexity does not matter, why is this impossible?
Is there a way out?
All I can come up with is O(n log n) time on average and O(n^2) in the worst case.
You can traverse the tree in O(n).
If you could also build the tree in O(n), you would have a sorting algorithm that runs in O(n): build the tree, then traverse it.
But that's impossible: a comparison sort cannot be done in less than Ω(n log n).
Note: if you use a balanced binary tree, you can construct it in O(n log n); with a plain unbalanced BST the worst case is O(n^2), which matches what you came up with.
