Is O(logn) always a tree? - algorithm

We always see operations on a (binary search) tree has O(logn) worst case running time because of the tree height is logn. I wonder if we are told that an algorithm has running time as a function of logn, e.g m + nlogn, can we conclude it must involve an (augmented) tree?
EDIT:
Thanks to your comments, I now realize divide-conquer and binary tree are so similar visually/conceptually. I had never made a connection between the two. But I think of a case where O(logn) is not a divide-conquer algo which involves a tree which has no property of a BST/AVL/red-black tree.
That's the disjoint set data structure with Find/Union operations, whose running time is O(N + MlogN), with N being the # of elements and M the number of Find operations.
Please let me know if I'm missing sth, but I cannot see how divide-conquer comes into play here. I just see in this (disjoint set) case that it has a tree with no BST property and a running time being a function of logN. So my question is about why/why not I can make a generalization from this case.

What you have is exactly backwards. O(lg N) generally means some sort of divide and conquer algorithm, and one common way of implementing divide and conquer is a binary tree. While binary trees are a substantial subset of all divide-and-conquer algorithms, the are a subset anyway.
In some cases, you can transform other divide and conquer algorithms fairly directly into binary trees (e.g. comments on another answer have already made an attempt at claiming a binary search is similar). Just for another obvious example, however, a multiway tree (e.g. a B-tree, B+ tree or B* tree), while clearly a tree is just as clearly not a binary tree.
Again, if you want to badly enough, you can stretch the point that a multiway tree can be represented as sort of a warped version of a binary tree. If you want to, you can probably stretch all the exceptions to the point of saying that all of them are (at least something like) binary trees. At least to me, however, all that does is make "binary tree" synonymous with "divide and conquer". In other words, all you accomplish is warping the vocabulary and essentially obliterating a term that's both distinct and useful.

No, you can also binary search a sorted array (for instance). But don't take my word for it http://en.wikipedia.org/wiki/Binary_search_algorithm

As a counter example:
given array 'a' with length 'n'
y = 0
for x = 0 to log(length(a))
y = y + 1
return y
The run time is O(log(n)), but no tree here!

Answer is no. Binary search of a sorted array is O(log(n)).

Algorithms taking logarithmic time are commonly found in operations on binary trees.
Examples of O(logn):
Finding an item in a sorted array with a binary search or a balanced search tree.
Look up a value in a sorted input array by bisection.

As O(log(n)) is only an upper bound also all O(1) algorithms like function (a, b) return a+b; satisfy the condition.
But I have to agree all Theta(log(n)) algorithms kinda look like tree algorithms or at least can be abstracted to a tree.

Short Answer:
Just because an algorithm has log(n) as part of its analysis does not mean that a tree is involved. For example, the following is a very simple algorithm that is O(log(n)
for(int i = 1; i < n; i = i * 2)
print "hello";
As you can see, no tree was involved. John, also provides a good example on how binary search can be done on a sorted array. These both take O(log(n)) time, and there are of other code examples that could be created or referenced. So don't make assumptions based on the asymptotic time complexity, look at the code to know for sure.
More On Trees:
Just because an algorithm involves "trees" doesn't imply O(logn) either. You need to know the tree type and how the operation affects the tree.
Some Examples:
Example 1)
Inserting or searching the following unbalanced tree would be O(n).
Example 2)
Inserting or search the following balanced trees would both by O(log(n)).
Balanced Binary Tree:
Balanced Tree of Degree 3:
Additional Comments
If the trees you are using don't have a way to "balance" than there is a good chance that your operations will be O(n) time not O(logn). If you use trees that are self balancing, then inserts normally take more time, as the balancing of the trees normally occur during the insert phase.

Related

What is the time complexity of constructing a binary search tree?

"Every comparison-based algorithm to sort n elements must take Ω(nlogn) comparisons in the worst case. With this fact, what would be the complexity of constructing a n-node binary search tree and why?"
Based on this question, I am thinking that the construction complexity must be at least O(nlogn). That said, I can't seem to figure out how to find the total complexity of construction.
The title of the question and the text you quote are asking different things. I am going to address what the quote is saying because finding how expensive BST construction is can be done just by looking at an algorithm.
Assume that for a second it was possible to construct a BST in better than Ω(nlogn). With a binary search tree you can read out the sorted list in Θ(n) time. This means I could create a sorting algorithm as follows.
Algorithm sort(L)
B <- buildBST(L)
Sorted <- inOrderTraversal(B)
return Sorted
With this algorithm I would be able to sort a list in better than Ω(nlogn). But as you stated this is not possible because Ω(nlogn) is a lower bound. Therefor it is not possible to create a binary search tree in better than Ω(nlogn) time.
Furthermore since an algorithm exits to create a BST in O(nlogn) time you can actually say that the algorithm is optimal under the comparison based model
The construction of the BST will be O(n(log(n))).
You will need to insert each and every node which is an O(n) operation.
To insert that n nodes you will need to make at least O(log(n)) comparisons.
Hence the minimum will be O(n(log(n))).
Only in the best case where the array is already sorted the time complexity will be O(n)

Time complexity of binary search in a slightly unbalanced binary tree

The best case running time for binary search is O(log(n)), if the binary tree is balanced. The worst case would be, if the binary tree is so unbalanced, that it basically represents a linked list. In that case the running time of a binary search would be O(n).
However, what if the tree is only slightly unbalanced, as is teh case for this tree:
Best case would still be O(log n) if I am not mistaken. But what would be the worst case?
Typically, when we say something like "the cost of looking up an element in a balanced binary search tree is O(log n)," what we mean is "in the worst case, we have to do O(log n) work in the course of performing a search on a balanced binary search tree." And since we're talking about big-O notation here, the previous statement is meant to be taken about balanced trees in general rather than a specific concrete tree.
If you have a specific BST in mind, you can work out the maximum number of comparisons required to find any element. Just find the deepest node in the tree, then imagine searching for a value that's bigger than that value but smaller than the next value in the tree. That will cause you to walk all the way down the tree as deeply as possible, making the maximum number of comparisons possible (specifically, h + 1 of them, where h is the height of the tree).
To be able to talk about the big-O cost of performing lookups in a tree, you'd need to talk about a family of trees of different numbers of nodes. You could imagine "kinda balanced" trees whose depth is Θ(√n), for example, where lookups would take time O(√n), for example. However, it's uncommon to encounter trees like that in practice, since generally you'd either (1) have a totally imbalanced tree or (2) use some sort of balanced tree that would prevent the height from getting that high.
In a sorted array of n values, the run-time of binary search for a value, is
O(log n), in the worst case. In the best case, the element you are searching for, is in the exact middle, and it can finish up in constant-time. In the average case too, the run-time is O(log n).

Worst case running time of constructing a BST?

Could someone explain to me how the Worst case running time of constructing a BST is n^2? I asked my professor and the only feedback i received is
"Because the tree is linear to the size of the input. The cost is 1+2+3+4+...+(n-1)."
Can someone explain this in a different way? Her explanation makes me think its O(n)....
I think the worst case happens when the input is already sorted:
A,B,C,D,E,F,G,H.
That's why you might want to randomly permute the input sequence if applicable.
The worst-case running time is proportional to the square of the input because the BST is unbalanced. An ubalanced BST can exhibit a degenerate structure: in the worst case, a singly linked list. Constructing this list will require that each insertion marches down the full length of the growing list to get to the leaf node to add a new leaf.
For instance, try running the algorithm on data which is precisely in the reverse order, so that each new node must become the new leftmost node of the tree.
A BST (even a balanced one!) can be constructed in linear time only if the input data is already sorted. Moreover, this is done using a special algorithm which takes advantage of the order; not by performing N insertions.
I'm guessing the 1+2+3+4+...+(n-1) insertion steps are clear, (for a reversed ordered list).
You should get comfortable with the idea that this number of steps is quadratic. Think about running the algorithm twice and count the number of steps:
[1+2+3+4+...+(n-1)] + [1+2+3+4+...+(n-1)] = [1+2+3+4+...+(n-1)] + [(n-1) + ... + 4+3+2+1] = n+n+...n = n^2
Therefore, one run take 0.5*n^2 steps.

LSM Tree lookup time

What's the worst case time complexity in a log-structured merge tree for a simple search query (like querying a single WHERE clause)?
Is it O(log N)? O(N*Log N)? Something else?
How about for a multiple query, like searching for multiple WHERE clauses in a key-value database?
The wikipedia page on LSM trees is currently lacking this info.
And I'm trying to make sense of the original paper.
I have been wondering the same.
If you have a series of trees, getting smaller by a constant factor each time, and you need to search them all for a single key, the cost seems to be O(log(N)^2).
Say the first (binary) tree takes log_2(N) branches to reach a node. The second might be half the size, and take (log_2(N) - 1) branches to find a node. The smallest tree will be some O(1) constant in size and there are roughly log_2(N) trees total. Summing the series gives O(log_2(N)^2).
However, I'm wondering if there is some more clever scheme where arbitrary single-key lookups, insertions or deletions have amortized cost O(log(N)), but haven't been able to find an answer (yet).
For a simple search indexed by a LSM tree, it is O(log n). This is because the biggest tree in the LSM tree is a B tree, which is O(log n), and the other trees are subsets of B trees or in the case of in memory trees, more efficient trees, which are no worse than O(log n). The number of trees is a constant, so it doesn't affect the order of the search time.

Create a balanced binary search tree from a stream of integers

I have just finished a job interview and I was struggling with this question, which seems to me as a very hard question for giving on a 15 minutes interview.
The question was:
Write a function, which given a stream of integers (unordered), builds a balanced search tree.
Now, you can't wait for the input to end (it's a stream), so you need to balance the tree on the fly.
My first answer was to use a Red-Black tree, which of course does the job, but i have to assume they didn't expect me to implement a red black tree in 15 minutes.
So, is there any simple solution for this problem i'm not aware of?
Thanks,
Dave
I personally think that the best way to do this would be to go for a randomized binary search tree like a treap. This doesn't absolutely guarantee that the tree will be balanced, but with high probability the tree will have a good balance factor. A treap works by augmenting each element of the tree with a uniformly random number, then ensuring that the tree is a binary search tree with respect to the keys and a heap with respect to the uniform random values. Insertion into a treap is extremely easy:
Pick a random number to assign to the newly-added element.
Insert the element into the BST using standard BST insertion.
While the newly-inserted element's key is greater than the key of its parent, perform a tree rotation to bring the new element above its parent.
That last step is the only really hard one, but if you had some time to work it out on a whiteboard I'm pretty sure that you could implement this on-the-fly in an interview.
Another option that might work would be to use a splay tree. It's another type of fast BST that can be implemented assuming you have a standard BST insert function and the ability to do tree rotations. Importantly, splay trees are extremely fast in practice, and it's known that they are (to within a constant factor) at least as good as any other static binary search tree.
Depending on what's meant by "search tree," you could also consider storing the integers in some structure optimized for lookup of integers. For example, you could use a bitwise trie to store the integers, which supports lookup in time proportional to the number of bits in a machine word. This can be implemented quite nicely using a recursive function to look over the bits, and doesn't require any sort of rotations. If you needed to blast out an implementation in fifteen minutes, and if the interviewer allows you to deviate from the standard binary search trees, then this might be a great solution.
Hope this helps!
AA Trees are a bit simpler than Red-Black trees, but I couldn't implement one off the top of my head.
One of the simplest balanced binary search tree is BB(α)-tree. You pick the constant α, which says how much unbalanced can the tree get. At all times, #descendants(child) <= (1-α) × #descendants(node) must hold. You treat it as normal binary search tree, but when the formula doesn't apply to some node anymore, you just rebuild that part of the tree from scratch, so that it is perfectly balanced.
The amortized time complexity for insertion or deletion is still O(log N), just as with other balanced binary trees.

Resources