Recurrence Relation/Time Complexity for finding average height of a BST - algorithm

Say we have an initially empty BST into which I perform n arbitrary inserts; how would I find the average height of this BST? The expression/pseudocode for this would be (if I'm not mistaken):
H(T) = 1 + max(H(T.left), H(T.right))
My guess at a recurrence relation for this would be T(n) = 1 + 2*T(n/2), but I'm not sure if this is correct.
Now here's my dilemma: if my recurrence relation is correct, how do I calculate the average complexity of my average-height algorithm?

In general, average-case analysis is more complicated and you can't really use the same big-O techniques you would use in a normal worst-case proof. While your definition of height is correct, translating it into a recurrence will probably be more complicated than that. First off, you probably meant T(n) = 1 + T(n/2) (this would give an O(log n) height, while your version gives O(n)), and then, nothing guarantees that the values are evenly split 50-50 between left and right.
If you search a bit you will see that there is plenty of material out there on the average height of BSTs. For example, one of the results I found says that the expected height of a BST built by random insertions tends to about 4.3 * log(n) as n grows, but the proof goes through lots of complicated math to get there.
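If you want a feel for this empirically, one option is a small simulation: perform n random inserts into a plain unbalanced BST, compute the height with exactly the recursive definition from the question, and average over several trials. A minimal sketch in Python (the node class, key range, and trial count are my own illustrative choices, not from the question or the cited result):

    import math
    import random

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None

    def insert(root, key):
        # Standard unbalanced BST insert.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def height(node):
        # H(T) = 1 + max(H(T.left), H(T.right)), with H(empty) = -1.
        if node is None:
            return -1
        return 1 + max(height(node.left), height(node.right))

    def average_height(n, trials=50):
        total = 0
        for _ in range(trials):
            root = None
            for key in random.sample(range(10 * n), n):
                root = insert(root, key)
            total += height(root)
        return total / trials

    if __name__ == "__main__":
        for n in (100, 1000, 10000):
            avg = average_height(n)
            # The ratio should settle towards a constant as n grows.
            print(n, avg, avg / math.log(n))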

T(n/2) + c
where c is some constant.
We divide the array into two parts but only search in one of them: if our target is larger than the middle value, we search only in (mid+1 ... j), and if it is smaller than the middle value, we search only in (i ... mid).
So at each step we work with only a single sub-array.
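For reference, a minimal sketch of the iterative binary search this recurrence describes (the index names i, j, mid follow the answer above; everything else is my own):

    def binary_search(arr, target):
        # Each iteration does constant work (the +c) and halves the
        # remaining range (the T(n/2)), so the total work is O(log n).
        i, j = 0, len(arr) - 1
        while i <= j:
            mid = (i + j) // 2
            if arr[mid] == target:
                return mid
            elif target > arr[mid]:
                i = mid + 1   # target is larger: keep searching in (mid+1 ... j)
            else:
                j = mid - 1   # target is smaller: keep searching in (i ... mid-1)
        return -1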

Related

Big O for Height of Balanced Binary Tree

Perhaps a dumb question. In a balanced binary tree where n is the total number of nodes, I understand why the height is equal to log(n). What I don't understand is what people mean when they refer to the height as being O(log(n)). I've only seen Big O used in the context of algorithms, where if an algorithm runs in O(n) and if the input doubles then the running time doubles. But height isn't an algorithm. How does this apply to the height of a tree? What does it mean for the height to be O(log(n))?
This is because a complete binary tree of n nodes does not have height log(n).
Consider a complete binary tree of height k. Such a tree has 2^k leaf nodes. How many nodes does it have in total? If you look at each level, you will find that it has 1 + 2 + 4 + 8 + ... + 2^k nodes, or 2^0 + 2^1 + 2^2 + 2^3 + ... + 2^k.
After some math, you will find that this series equals 2^(k+1) - 1.
So, if your tree has n nodes, what is its height? If you solve the equation n = 2^(k+1) - 1 with respect to k, you obtain k = log2(n+1) - 1.
This expression is slightly less nice than log2(n), and it is certainly not the same number. However, by the properties of big-O notation,
log2(n+1) - 1 = O(log(n)).
In the source you are reading, the emphasis is on the fact that the height grows as fast as log(n): the two belong to the same complexity class. This information can be useful when designing algorithms, since you know that doubling the input will increase the tree height only by a constant. This property gives tree structures immense power. So even if the expression for the exact height of the tree is more complicated (and if your binary tree is not complete, it will look more complicated still), it is of logarithmic complexity with respect to n.
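If it helps to see the relationship numerically, here is a tiny, purely illustrative check that a complete tree of height k has 2^(k+1) - 1 nodes and that solving for the height gives log2(n+1) - 1:

    import math

    for k in range(6):
        # 1 + 2 + 4 + ... + 2^k nodes, one term per level.
        n = sum(2 ** level for level in range(k + 1))
        assert n == 2 ** (k + 1) - 1            # closed form of the series
        assert math.log2(n + 1) - 1 == k        # solving n = 2^(k+1) - 1 for k
        print(f"height {k}: {n} nodes")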
To add to Berthur's excellent answer, Big-Oh notation is not specific to the analysis of algorithms; it applies to any function. In the analysis of algorithms we care about the function T(n), which gives the (typically worst-case) runtime for input size n, and we want an upper bound (Big-Oh) on that function's rate of growth. Here, there is a function that gives the true height of a tree with whatever property, and we want an upper bound on that function's rate of growth. We could find upper bounds on arbitrary functions devoid of any context at all, like f(n) = n!^(1/2^n) or whatever.
I think they mean it takes O(log(n)) to traverse the tree

Quicksort time complexity when it always selects the 2nd smallest element as pivot in a sublist

Time complexity of Quicksort when the pivot is always the 2nd smallest element in a sublist.
Is it still O(NlogN)?
If I solve the recurrence equation
F(N) = F(N-2) + N
= F(N-2(2)) + 2N -2
= F(N-3(2)) + 3N - (2+1)(2)
= F(N-4(2)) + 4N - (3+2+1)(2)
which is O(N^2), but I somehow doubt my answer. Can someone help me clarify this, please?
To start with, the quicksort algorithm has an average time complexity of O(NlogN), but its worst-case time complexity is actually O(N^2).
The generic complexity analysis of quicksort depends not just on devising the recurrence relations, but also on the value of the variable K in the F(N-K) term of your recurrence relation. Depending on whether you're calculating the best, average, or worst case complexity, that value is usually estimated by the probability distribution of having the best, average, or worst element as the pivot, respectively.
If, for instance, you want to compute the best case, then you may assume that your pivot always divides the array in two (i.e. K=N/2). If computing the worst case, you may assume that your pivot is either the largest or the smallest element (i.e. K=1). For the average case, based on the probability distribution of the indices of the elements, K=N/4 is used (there is plenty of material on this if you want to dig deeper). Basically, for the average case, your recurrence relation becomes F(N) = F(N / 4) + F(3 * N / 4) + N, which yields O(NlogN).
Now, the value you assumed for K, namely 2, is just one shy of the worst-case scenario. That is why you cannot observe the average-case performance of O(NlogN) here, and instead get O(N^2).
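You can also see the quadratic growth directly by iterating the recurrence from the question, F(N) = F(N-2) + N, and comparing it with N^2/4. This is just a numerical sanity check (with F(0) = F(1) = 0 assumed as the base case), not a proof:

    def F(n):
        # F(N) = F(N-2) + N, with F(0) = F(1) = 0 assumed.
        total = 0
        while n > 1:
            total += n
            n -= 2
        return total

    for n in (100, 1000, 10000):
        # The ratio approaches 1/4, i.e. F(N) grows as Theta(N^2).
        print(n, F(n), F(n) / (n * n))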

Time complexity for modified binary search which calculates the mid as high - 2

I have to find the time complexity of a binary search that calculates the dividing point as mid = high - 2 (instead of mid = (low + high)/2),
so as to know how much slower or faster the modified algorithm would be.
The worst-case scenario is that the searched item is the very first one. In this case, since you always subtract 2 from n, you will need roughly n/2 steps, which is linear complexity. The best case is that the searched item is exactly at n-2, which takes constant time. The average complexity, as n -> infinity, will be linear as well.
Hint: You can derive the answer based on the recurrence formula for binary search.
We have T(n) = T(floor(n/2)) + O(1)
Since we divide into two equal halves, we have floor(n/2). You should rewrite the given formula to describe the modified version. Furthermore, you should use the Akra-Bazzi method to solve the recurrence for the modified version, since you are dividing into two unbalanced parts.
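A sketch of what the modified search might look like, assuming the question means that mid is always taken two positions below high (the variable names and the step counter are my own):

    def modified_search(arr, target):
        steps = 0
        low, high = 0, len(arr) - 1
        while low <= high:
            steps += 1
            mid = max(high - 2, low)    # instead of (low + high) // 2
            if arr[mid] == target:
                return mid, steps
            elif target > arr[mid]:
                low = mid + 1           # at most two elements remain on the right
            else:
                high = mid - 1          # only a constant number of elements is
                                        # discarded, so the worst case (target near
                                        # the front) takes Theta(n) steps
        return -1, steps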

Randomized Quick Sort Pivot selection with 25%-75% split

I came to know that in the case of randomized quicksort, if we choose the pivot in such a way that it always gives a split in at least the ratio 25%-75%, then the running time is O(n log n).
Now I also came to know that we can prove this with Master Theorem.
But my problem is that if we split the array 25%-75% in each step, then how will I define my T(n), and how can I prove that the running time is O(n log n)?
You can use the Master theorem to find the complexity of this kind of algorithm. In this particular case, assume that when you divide the array into two parts, each of these parts is not greater than 3/4 of the initial array. Then T(n) < 2 * T(3/4 * n) + O(n), or T(n) = 2 * T(3/4 * n) + O(n) if you are looking for an upper bound. The Master theorem gives you the solution for this equation.
Update: though the Master theorem may solve such recurrence equations, in this case it gives us a result that is worse than the expected O(n*log n). Nevertheless, it can be solved in another way. If we assume that a pivot always splits the array so that the smaller part is >= 1/4 of the size, then we can bound the recursion depth by log_{4/3}(n) (because on each level the size of the array decreases by a factor of at least 4/3). The time complexity on each recursion level is O(n) in total, thus we have O(n) * log_{4/3}(n) = O(n*log n) overall complexity.
Furthermore, if you want a more rigorous analysis, you may consult the relevant Wikipedia article, which has some good proofs.
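To make the depth argument from the update concrete: if the larger part after every split holds at most 3/4 of the elements, then after d levels the subproblem size is at most (3/4)^d * n, so the depth is at most log_{4/3}(n). A toy check (not a proof), taking the worst allowed split at every level:

    import math

    def depth_with_worst_split(n):
        # Each level keeps at most 3/4 of the elements in the larger subproblem.
        depth = 0
        while n > 1:
            n = (3 * n) // 4
            depth += 1
        return depth

    for n in (10 ** 3, 10 ** 6, 10 ** 9):
        # The measured depth stays close to (and below) log_{4/3}(n).
        print(n, depth_with_worst_split(n), math.log(n, 4 / 3))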

Can someone explain the recurrence relations for quad and binary partitions for searching in a sorted table?

If you have a square region that holds various numbers, what is the result of each of these recurrence relations?
T(n) = 3T(n/2) + c and T(n) = 2T(n/2) + cn
I know the first is supposed to result in a quad partition and the second in a binary partition, but I can't intuitively wrap my head around why this is the case. Why are we making 3 recursive calls in the first case and 2 in the second? Why does the +c or +cn affect what we're doing with the problem?
I think this is what you are looking for
http://leetcode.com/2010/10/searching-2d-sorted-matrix-part-ii.html
If your question is just about the recursion explanation, I recommend reading up on solving recurrences using the recursion-tree and master methods.
http://courses.csail.mit.edu/6.006/spring11/rec/rec08.pdf
This explains the second recurrence and the method. Basically you will have a recursion tree with height lg(n) and a cost at each level equal to n.
In the first one, the running time is on the order of the number of nodes in the recursion tree. The height will still be lg(n), but the cost at level h is 3^h * c. Summing over the levels will give you the complexity.
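For the first recurrence, here is a hedged sketch of the quad-partition search the linked article describes: compare the target with the centre of the current square, rule out one quadrant, and recurse into the remaining three, which is where T(n) = 3T(n/2) + c comes from (n is the side length; the function names and boundary handling are my own):

    def search_quad(matrix, target):
        # matrix is assumed to be n x n with rows and columns sorted ascending.
        def rec(r1, c1, r2, c2):
            if r1 > r2 or c1 > c2:
                return False
            if r1 == r2 and c1 == c2:
                return matrix[r1][c1] == target
            rm, cm = (r1 + r2) // 2, (c1 + c2) // 2
            pivot = matrix[rm][cm]
            if pivot == target:
                return True
            if target < pivot:
                # The bottom-right quadrant is entirely >= pivot > target,
                # so only three quadrants remain: T(n) = 3T(n/2) + c.
                return (rec(r1, c1, rm, cm)          # top-left
                        or rec(r1, cm + 1, rm, c2)   # top-right
                        or rec(rm + 1, c1, r2, cm))  # bottom-left
            # Otherwise the top-left quadrant is entirely <= pivot < target.
            return (rec(r1, cm + 1, rm, c2)              # top-right
                    or rec(rm + 1, c1, r2, cm)           # bottom-left
                    or rec(rm + 1, cm + 1, r2, c2))      # bottom-right

        n = len(matrix)
        return rec(0, 0, n - 1, n - 1)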
