Can we reduce the time complexity to construct a binary tree?

In my interview yesterday, I was asked the time complexity of constructing a binary tree from given inorder and preorder/postorder traversals.
I came up with the skewed-tree case, which needs O(N^2), and noted that if we can guarantee a balanced binary tree, we can do it in O(N log N).
But, in order to give a unique reply, I suggested that it can be done in O(N) time. The reasoning I gave was:
Put the nodes of the inorder traversal into a hash table one by one, in O(N) total.
Searching the hash table for a particular node then takes amortized O(1).
So the total time complexity can theoretically be reduced to O(N). (Actually, I haven't implemented it yet.)
So, was I correct in my reply? The results haven't been announced yet.

It can be done in O(N) time and O(N) space (the recursion depth can reach O(N) for a skewed binary tree). Instead of storing the elements of the inorder traversal in the hash table, store each element's index. Then the following algorithm works:
Function buildTree (in_index1, in_index2, pre_index1, pre_index2)
    if in_index1 > in_index2
        return null
    root_index = hashEntry[pre_list[pre_index1]]
    root = createNode(pre_list[pre_index1])
    left_size = root_index - in_index1
    root->leftChild = buildTree(in_index1, root_index - 1, pre_index1 + 1, pre_index1 + left_size)
    root->rightChild = buildTree(root_index + 1, in_index2, pre_index1 + left_size + 1, pre_index2)
    return root
Note: the above is the basic idea; it assumes 0-based arrays, inclusive index ranges, and distinct element values.
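For concreteness, here is a minimal runnable Python sketch of the same idea (the names build_tree, Node and index_of are mine, not from the answer; it assumes the values are distinct):

# Build a binary tree from preorder and inorder traversals in O(N)
# by precomputing a value -> inorder-index hash map.
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def build_tree(preorder, inorder):
    index_of = {value: i for i, value in enumerate(inorder)}  # O(N) preprocessing
    def build(in_lo, in_hi, pre_lo):
        if in_lo > in_hi:
            return None
        root = Node(preorder[pre_lo])
        root_index = index_of[root.value]   # O(1) lookup instead of an O(N) scan
        left_size = root_index - in_lo
        root.left = build(in_lo, root_index - 1, pre_lo + 1)
        root.right = build(root_index + 1, in_hi, pre_lo + left_size + 1)
        return root
    return build(0, len(inorder) - 1, 0)

# Example: preorder [3, 9, 20, 15, 7] and inorder [9, 3, 15, 20, 7]
root = build_tree([3, 9, 20, 15, 7], [9, 3, 15, 20, 7])
assert root.value == 3 and root.left.value == 9 and root.right.value == 20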

Related

What is the time complexity of constructing a binary search tree?

"Every comparison-based algorithm to sort n elements must take Ω(nlogn) comparisons in the worst case. With this fact, what would be the complexity of constructing a n-node binary search tree and why?"
Based on this question, I am thinking that the construction complexity must be at least Ω(nlogn). That said, I can't seem to figure out how to find the total complexity of construction.
The title of the question and the text you quote are asking different things. I am going to address what the quote is saying, because finding how expensive BST construction is can be done just by looking at an algorithm.
Assume for a second that it were possible to construct a BST in better than Ω(n log n) time. From a binary search tree you can read out the sorted list in Θ(n) time. This means I could create a sorting algorithm as follows.
Algorithm sort(L)
    B <- buildBST(L)
    Sorted <- inOrderTraversal(B)
    return Sorted
With this algorithm I would be able to sort a list in better than Ω(n log n). But, as you stated, this is not possible, because Ω(n log n) is a lower bound on comparison-based sorting. Therefore it is not possible to create a binary search tree in better than Ω(n log n) time.
Furthermore, since an algorithm exists to create a BST in O(n log n) time, you can say that such an algorithm is optimal under the comparison-based model.
The construction of the BST will be O(n log n).
You will need to insert each of the n nodes.
Each insertion needs at least log(n) comparisons, descending a balanced tree from root to leaf.
Hence the minimum total will be O(n log n).
Only in the special case where the input array is already sorted can the tree be built in O(n) time, using an algorithm that exploits the existing order instead of performing n separate insertions.
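To make the n-insertions argument concrete, here is a minimal unbalanced insertion-based construction in Python (a sketch with my own names, not from the answer). The total cost is the sum of the insertion depths: O(n log n) if the tree stays balanced, O(n^2) if it degenerates.

# Naive BST construction by repeated insertion, with no rebalancing.
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)   # descend one level per step
    else:
        root.right = insert(root.right, value)
    return root

def build_bst(values):
    root = None
    for v in values:       # n insertions, each costing its insertion depth
        root = insert(root, v)
    return root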

Worst case running time of constructing a BST?

Could someone explain to me how the worst-case running time of constructing a BST is n^2? I asked my professor and the only feedback I received is
"Because the tree is linear to the size of the input. The cost is 1+2+3+4+...+(n-1)."
Can someone explain this in a different way? Her explanation makes me think it's O(n)...
I think the worst case happens when the input is already sorted:
A,B,C,D,E,F,G,H.
That's why you might want to randomly permute the input sequence if applicable.
The worst-case running time is proportional to the square of the input size because the BST is unbalanced. An unbalanced BST can exhibit a degenerate structure: in the worst case, a singly linked list. Constructing this list requires each insertion to march down the full length of the growing list before it can attach its new leaf.
For instance, try running the algorithm on data which is precisely in the reverse order, so that each new node must become the new leftmost node of the tree.
A BST (even a balanced one!) can be constructed in linear time only if the input data is already sorted. Moreover, this is done using a special algorithm which takes advantage of the order; not by performing N insertions.
I'm guessing the 1+2+3+4+...+(n-1) insertion steps are clear (for a reverse-ordered list).
You should get comfortable with the idea that this number of steps is quadratic. Think about running the algorithm twice and counting the total number of steps:
[1+2+3+4+...+(n-1)] + [1+2+3+4+...+(n-1)] = [1+2+3+4+...+(n-1)] + [(n-1)+...+4+3+2+1] = n+n+...+n = n(n-1)
Therefore, one run takes n(n-1)/2 steps, which is Θ(n^2).
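A quick numeric check of that pairing argument (a throwaway sketch, not part of the original answer):

# 1 + 2 + ... + (n-1), paired with its own reversal, gives (n-1) copies of n,
# so a single run costs n*(n-1)/2 steps: quadratic in n.
n = 8
steps = sum(range(1, n))            # 1 + 2 + ... + (n-1)
assert steps == n * (n - 1) // 2    # 28 when n == 8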

LSM Tree lookup time

What's the worst case time complexity in a log-structured merge tree for a simple search query (like querying a single WHERE clause)?
Is it O(log N)? O(N*Log N)? Something else?
How about for a multiple query, like searching for multiple WHERE clauses in a key-value database?
The Wikipedia page on LSM trees is currently lacking this information.
And I'm trying to make sense of the original paper.
I have been wondering the same.
If you have a series of trees, getting smaller by a constant factor each time, and you need to search them all for a single key, the cost seems to be O(log(N)^2).
Say the first (binary) tree takes log_2(N) branches to reach a node. The second might be half the size, and take (log_2(N) - 1) branches to find a node. The smallest tree will be some O(1) constant in size and there are roughly log_2(N) trees total. Summing the series gives O(log_2(N)^2).
However, I'm wondering if there is some more clever scheme where arbitrary single-key lookups, insertions or deletions have amortized cost O(log(N)), but haven't been able to find an answer (yet).
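Summing that series in code, under the same assumption that the component trees shrink by a factor of two (a quick illustrative sketch; lookup_branches is my name):

# Branches followed if every component tree of the LSM tree must be searched:
# log2(N) + (log2(N) - 1) + ... + 1, which is Theta(log(N)^2).
import math

def lookup_branches(n):
    levels = int(math.log2(n))           # roughly log2(N) component trees
    return sum(range(1, levels + 1))

print(lookup_branches(2 ** 20))          # 210 == 20 * 21 / 2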
For a simple search indexed by an LSM tree, it is O(log n). This is because the biggest tree in the LSM tree is a B-tree, which is O(log n), and the other trees are subsets of B-trees (or, in the case of in-memory trees, more efficient trees), which are no worse than O(log n). The number of trees is a constant, so it doesn't affect the order of the search time.

Find median in O(1) in binary tree

Suppose I have a balanced BST (binary search tree). Each tree node contains a special field count, which counts all descendants of that node plus the node itself. This data structure is called an order statistics tree.
It supports two O(log N) operations:
rank(x) -- the number of elements that are less than x
findByRank(k) -- find the node with rank == k
Now I would like to add a new operation median() to find the median. Can I assume this operation is O(1) if the tree is balanced?
Unless the tree is complete, the median might be a leaf node, so in the general case the cost will be O(log N). I suspect there is a data structure with the requested properties and an O(1) findMedian operation (perhaps a skip list plus a pointer to the median node; I'm not sure about the findByRank and rank operations, though), but a balanced BST is not one of them.
If the tree is complete (i.e. all levels completely filled), yes you can.
In a balanced order statistics tree, finding the median is O(log N). If it is important to find the median in O(1) time, you can augment the data structure by maintaining a pointer to the median. The catch, of course, is that you would need to update this pointer during each Insert or Delete operation. Updating the pointer would take O(log N) time, but since those operations already take O(log N) time, the extra work of updating the median pointer does not change their big-O cost.
As a practical matter, this only makes sense if you do a lot of "find median" operations compared to the number of insertions/deletions.
If desired, you could reduce the cost of updating the median pointer during Insert/Delete to O(1) by using a (doubly) threaded binary tree, but Insert/Delete would still be O(log N).
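For reference, here is a minimal Python sketch of findByRank and a median() built on top of it (my names and a 1-based rank convention; it assumes the count fields are already maintained by Insert/Delete):

# Each node stores count, the size of the subtree rooted at it; find_by_rank
# descends by comparing k against the left subtree's size.
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.count = 1 + (left.count if left else 0) + (right.count if right else 0)

def find_by_rank(node, k):               # k is 1-based: k == 1 is the minimum
    left_count = node.left.count if node.left else 0
    if k == left_count + 1:
        return node
    if k <= left_count:
        return find_by_rank(node.left, k)
    return find_by_rank(node.right, k - left_count - 1)

def median(root):                        # one root-to-node descent: O(log N)
    return find_by_rank(root, (root.count + 1) // 2)

# Example: a balanced tree over 1..7; the median is 4.
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
assert median(root).value == 4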

Is O(logn) always a tree?

We always see that operations on a (binary search) tree have O(log n) worst-case running time because the tree height is log n. I wonder: if we are told that an algorithm has a running time that is a function of log n, e.g. m + n log n, can we conclude that it must involve an (augmented) tree?
EDIT:
Thanks to your comments, I now realize divide-and-conquer and binary trees are similar visually/conceptually; I had never made a connection between the two. But I can think of a case where the O(log n) bound does not come from a divide-and-conquer algorithm, and the tree involved has none of the properties of a BST/AVL/red-black tree.
That's the disjoint-set data structure with Find/Union operations, whose running time is O(N + M log N), with N being the number of elements and M the number of Find operations.
Please let me know if I'm missing something, but I cannot see how divide-and-conquer comes into play here. I just see that this (disjoint-set) case has a tree with no BST property and a running time that is a function of log N. So my question is why I can or cannot generalize from this case.
What you have is exactly backwards. O(lg N) generally means some sort of divide-and-conquer algorithm, and one common way of implementing divide and conquer is a binary tree. While binary trees are a substantial subset of all divide-and-conquer algorithms, they are a subset nonetheless.
In some cases, you can transform other divide-and-conquer algorithms fairly directly into binary trees (e.g. comments on another answer have already made an attempt at claiming a binary search is similar). Just for another obvious example, however, a multiway tree (e.g. a B-tree, B+ tree or B* tree), while clearly a tree, is just as clearly not a binary tree.
Again, if you want to badly enough, you can stretch the point that a multiway tree can be represented as sort of a warped version of a binary tree. If you want to, you can probably stretch all the exceptions to the point of saying that all of them are (at least something like) binary trees. At least to me, however, all that does is make "binary tree" synonymous with "divide and conquer". In other words, all you accomplish is warping the vocabulary and essentially obliterating a term that's both distinct and useful.
No, you can also binary search a sorted array (for instance). But don't take my word for it: http://en.wikipedia.org/wiki/Binary_search_algorithm
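For instance, here is a minimal iterative binary search over a plain sorted array (my sketch, not from the answer): O(log n) with no tree in sight.

def binary_search(a, target):
    # Halve the search range each iteration: at most log2(n) + 1 steps.
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

assert binary_search([1, 3, 5, 7, 9], 7) == 3
assert binary_search([1, 3, 5, 7, 9], 4) == -1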
As a counterexample:
given array 'a' with length 'n'
y = 0
for x = 0 to log(length(a))
    y = y + 1
return y
The run time is O(log(n)), but no tree here!
The answer is no. Binary search of a sorted array is O(log(n)).
Algorithms taking logarithmic time are commonly found in operations on binary trees.
Examples of O(log(n)):
Finding an item in a sorted array with a binary search or in a balanced search tree.
Looking up a value in a sorted input array by bisection.
Since O(log(n)) is only an upper bound, all O(1) algorithms, like function (a, b) { return a + b; }, also satisfy the condition.
But I have to agree that all Theta(log(n)) algorithms kind of look like tree algorithms, or can at least be abstracted to a tree.
Short Answer:
Just because an algorithm has log(n) as part of its analysis does not mean that a tree is involved. For example, the following is a very simple loop that is O(log(n)):
for (int i = 1; i < n; i = i * 2)
    printf("hello\n");
As you can see, no tree was involved. John also provides a good example of how binary search can be done on a sorted array. These both take O(log(n)) time, and there are plenty of other code examples that could be created or referenced. So don't make assumptions based on the asymptotic time complexity; look at the code to know for sure.
More On Trees:
Just because an algorithm involves "trees" doesn't imply O(logn) either. You need to know the tree type and how the operation affects the tree.
Some Examples:
Example 1)
Inserting into or searching an unbalanced tree would be O(n). [figure of an unbalanced, degenerate tree omitted]
Example 2)
Inserting into or searching a balanced tree would be O(log(n)). [figures of a balanced binary tree and a balanced tree of degree 3 omitted]
Additional Comments
If the trees you are using don't have a way to "balance", then there is a good chance that your operations will take O(n) time, not O(log n). If you use trees that are self-balancing, then inserts normally take more time, since the rebalancing of the tree normally occurs during the insert phase.
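A quick empirical check of that balance effect (a throwaway Python sketch; Node, insert, build and height are my names): a plain BST fed sorted keys grows to height n-1, while a random insertion order keeps the height near log n on average.

import random

class Node:
    def __init__(self, value):
        self.value, self.left, self.right = value, None, None

def insert(root, value):                 # plain BST insert, no rebalancing
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

def height(node):
    return -1 if node is None else 1 + max(height(node.left), height(node.right))

keys = list(range(500))
print(height(build(keys)))               # 499: sorted input degenerates into a list
random.shuffle(keys)
print(height(build(keys)))               # typically around 2 * log2(500), i.e. ~18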
