Complexity of Perfectly Balanced Binary Tree - binary-tree

My scenario is a perfectly balanced binary tree containing integers.
I've searched and found many explanations of best/worst case scenarios for binary trees: the best case is O(1) (target found at the root), and the worst is O(log(n)) (the height of the tree).
I have found little to no information on calculating the average complexity. The best answer I could find was O(log(n)) - 1, but I don't quite understand (if it is correct) how this average case is calculated.
Also, would searching for an integer not in the tree yield the same complexity? I think it would, but any insight is appreciated.

Let's say we have a perfectly balanced binary tree containing n = 2^k integers, so the depth is log₂(n) = k.
The best and worst case is, as you say, O(1) and O(log(n)).
Short way
Let's pick a random integer X (uniformly distributed) from the binary tree. The last row of the tree contains as many integers as the first k-1 rows together. With probability 1/2, X is in the first k-1 rows, so we need at most O(k-1) = O(log(n)-1) steps to find it. Also with probability 1/2, X is in the last row, where we need O(k) = O(log(n)) steps.
In total we get
E[X] ≤ P(row of X ≤ k-1)⋅O(log(n)-1) + P(row of X = k)⋅O(log(n))
= 1/2⋅O(log(n)-1) + 1/2⋅O(log(n))
= 1/2⋅O(log(n)-1) + 1/2⋅O(log(n)-1)
= O(log(n)-1)
Notice: This is a little ugly, but in O-notation O(x) and O(x ± c) are the same for any constant value c.
Long way
Now let's try to calculate the average case for a random (uniformly distributed) integer X contained in the tree, and let's name the set of integers on the i-th "row" of the tree Ti. Ti contains 2^i elements; T0 denotes the root.
The probability of picking an integer in the i-th row is P(X ∈ Ti) = 2^i/n = 2^(i-k).
To find an integer on row i takes O(2i) = O(i) steps (a constant amount of work per level on the path from the root).
So the expected number of steps is
E[X] = Σ_{i=0,...,k-1} O(i)·2^(i-k).
To simplify this we use
O(i)·2^(i-k) + O(i+1)·2^(i+1-k) ≤ O(i)·2^(i+1-k) + O(i+1)·2^(i+1-k) ≤ O(i+1)·2^(i+2-k)
This leads us to
E[X] = Σ_{i=0,...,k-1} O(i)·2^(i-k) ≤ O(k-1)·2^0 = O(k-1)
Since k = log(n), we see that the average case is in O(log(n)-1) = O(log(n)).
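As a quick sanity check of this average (a sketch, not part of the original answer; it counts one comparison per node on the search path, so a key on row i costs i + 1 comparisons):

import math

# Average number of comparisons to find a uniformly random key in a perfect
# tree with rows 0..k-1 (2^i keys on row i, 2^k - 1 keys in total),
# compared against log2(n) - 1.
for k in (5, 10, 20):
    n = 2**k - 1
    avg = sum((i + 1) * 2**i for i in range(k)) / n
    print(k, round(avg, 3), round(math.log2(n) - 1, 3))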
Values not in the tree
If the value is not in the tree, you have to walk down a whole root-to-leaf path. After log(n) steps you have reached a leaf. If its value equals your input, you have found what you searched for. If not, you know that the value you searched for is not contained in the tree. So searching for a value that is not in the tree also takes O(log(n)).
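For reference, a minimal sketch of the search being discussed (assuming the tree is a binary search tree, which is what the O(log n) bound requires; the Node class is illustrative):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def contains(root, target):
    # Each iteration moves one level down, so a balanced tree needs at most
    # height + 1 iterations: O(1) if the root matches, O(log n) in the worst
    # case, and also O(log n) when the target is absent (we fall off a leaf).
    node = root
    while node is not None:
        if target == node.key:
            return True
        node = node.left if target < node.key else node.right
    return False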

Related

Quick sort - partition split ratio [duplicate]

This was a problem from CLRS (Introduction to Algorithms). The question goes as follows:
Suppose that the splits at every level of quicksort are in the proportion 1 - α to α, where 0 < α ≤ 1/2 is a constant. Show that the minimum depth of a leaf in the recursion tree is approximately -lg n / lg α and the maximum depth is approximately -lg n / lg(1 - α). (Don't worry about integer round-off.)
http://integrator-crimea.com/ddu0043.html
I'm not getting how to reach this solution. As per the link, they show that for a ratio of 1:9 the max depth is log n / log(10/9) and the minimum is log n / log(10). How can the above formula be proved? Please help me figure out where I am going wrong, as I'm new to the Algorithms and Data Structures course.
First, let us consider this simple problem. Assume you have a number n and a fraction p (between 0 and 1). How many times do you need to multiply n by p so that the resulting number is less than or equal to 1?
n*p^k <= 1
log(n)+k*log(p) <= 0
log(n) <= -k*log(p)
k >= -log(n)/log(p)   (dividing by -log(p), which is positive since 0 < p < 1)
Now, let us consider your problem. Assume you send the shorter of the two segments to the left child and the longer to the right child. For the left-most chain, the length is given by substituting α for p in the above equation. For the right-most chain, the length is calculated by substituting 1-α for p. That is why you have those numbers as answers.
General question and the answer
Suppose that the splits at every level of quicksort are in proportion 1−α to α, where 0 < α ≤ 1/2 is a constant. Show that the minimum depth of a leaf in the recursion tree is approximately −lg n / lg α and the maximum depth is approximately −lg n / lg(1−α). (Don't worry about integer round-off.)
Answer:
The minimum depth follows a path that always takes the smaller part of the partition, i.e., that multiplies the number of elements by α. One iteration reduces the number of elements from n to αn, and i iterations reduce the number of elements to (α^i)n. At a leaf there is just one remaining element, so at a minimum-depth leaf of depth m we have (α^m)n = 1. Thus α^m = 1/n. Taking logs, we get m·lg α = −lg n, or m = −lg n / lg α.
Similarly, the maximum depth corresponds to always taking the larger part of the partition, i.e., keeping a fraction 1−α of the elements each time. The maximum depth M is reached when there is one element left, that is, when ((1−α)^M)n = 1. Thus M = −lg n / lg(1−α).
All these equations are approximate because we are ignoring floors and ceilings.
source
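A quick numeric check of these formulas (a sketch; it keeps fractional subproblem sizes, matching the "don't worry about round-off" assumption, so the simulated depth can exceed the formula by the fraction that rounds up to one extra level):

import math

def chain_length(n, fraction):
    # Repeatedly keep a `fraction` of the elements until at most one remains.
    depth, size = 0, float(n)
    while size > 1:
        size *= fraction
        depth += 1
    return depth

n, alpha = 10**6, 0.3
print(chain_length(n, alpha), -math.log2(n) / math.log2(alpha))          # minimum depth
print(chain_length(n, 1 - alpha), -math.log2(n) / math.log2(1 - alpha))  # maximum depth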

Running time of algorithm with arbitrary sized recursive calls

I have written the following algorithm that, given a node x in a Binary Search Tree T, sets the field s for all nodes in the subtree rooted at x, such that for each node, s is the sum of all odd keys in the subtree rooted at that node.
OddNodeSetter(T, x):
    if (x == NIL):
        return 0
    if (x.key mod 2 == 1):
        x.s = x.key + OddNodeSetter(T, x.left) + OddNodeSetter(T, x.right)
    else:
        x.s = OddNodeSetter(T, x.left) + OddNodeSetter(T, x.right)
    return x.s
I've thought of using the master theorem for this, with the recurrence
T(n) = T(k) + T(n-k-1) + 1 for 1 <= k < n
However, since the sizes of the two recursive calls can vary depending on k and n-k-1 (i.e. the number of nodes in the left and right subtrees of x), I can't quite figure out how to solve this recurrence. For example, in the case where the left and right subtrees of x contain equally many nodes, we can express the recurrence in the form
T(n) = 2T(n/2) + 1
which can be solved easily, but that doesn't prove the running time in all cases.
Is it possible to prove this algorithm runs in O(n) with the master theorem, and if not what other way is there to do this?
The algorithm visits every node in the tree exactly once, hence O(N).
Update:
And obviously, a visit takes constant time (not counting the recursive calls).
There is no need to use the Master theorem here.
Think of the problem this way: what is the maximum number of operations you have to do for each node in the tree? It is bounded by a constant. And what is the number of nodes in the tree? It is n.
Multiplying that constant by n still gives O(n).
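If you still want to settle the recurrence directly, the substitution method works without the master theorem (a sketch; here c stands for the constant per-call work and T(0) = c covers a call on a NIL subtree): guess T(n) = c(2n + 1). Then for any split k,
T(n) = T(k) + T(n-k-1) + c = c(2k + 1) + c(2(n-k-1) + 1) + c = c(2n + 1),
so the guess is consistent for every k, and T(n) = O(n) regardless of the tree's shape.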

Big O and Big Omega Notation Algorithms

There is a comparison-based sorting algorithm that runs in O(n*log(sqrt(n))).
Given the existence of an Omega(n*log(n)) lower bound for sorting, how can this be possible?
Basically, this problem is asking you to show that O(n*log(n)) = O(n*log(√n)), i.e. that the two bounds differ only by a constant factor c > 0. Remember that √n = n^(1/2) and that log(n^(1/2)) = (1/2)*log(n). So n*log(√n) = (1/2)*n*log(n). Since asymptotic notation ignores constant multipliers, O(n*log(√n)) = O(n*log(n)). Voila, proof positive that it is possible.
For a sorting algorithm based on comparisons you can draw a decision tree. It is a binary tree representing the comparisons done by the algorithm, and every leaf of this tree is a permutation of the elements of the given set.
There are n! possible permutations, where n is the size of the set, and only one of them represents the sorted set. The path leading to each leaf represents the comparisons necessary to produce the permutation represented by that leaf.
Now let h be the height of our decision tree and l the number of leaves. Every possible permutation of the input set must appear in one of the leaves, so n! <= l. A binary tree of height h can have at most 2^h leaves. Therefore we get n! <= l <= 2^h. Apply a logarithm to both sides and you get h >= log(n!), and log(n!) is Omega(n*log(n)).
Because the height of the decision tree represents the number of comparisons necessary to reach a leaf, this proves that the lower bound for comparison-based sorting algorithms is n*log(n); it can't be done faster. So the only way the claim in the task can be correct is if Omega(n*log(n)) is also Omega(n*log(sqrt(n))). Indeed, log(sqrt(n)) = log(n^(1/2)) = (1/2)*log(n), so n*log(sqrt(n)) = (1/2)*n*log(n). Ignore the constant 1/2 (as we are interested in asymptotic complexity) and you get that n*log(sqrt(n)) and n*log(n) are the same in terms of complexity.
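A small numeric illustration of both points (a sketch; lgamma(n + 1) / log(2) is used as log2(n!) to avoid computing huge factorials):

import math

for n in (10, 100, 1000):
    print(n,
          n * math.log2(math.sqrt(n)),       # n*log(sqrt(n))
          0.5 * n * math.log2(n),            # (1/2)*n*log(n), identical value
          math.lgamma(n + 1) / math.log(2))  # log2(n!), the decision-tree bound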

Find the pair of numbers in an unsorted array with a sum closest to an arbitrary target [duplicate]

This is a generalization of the 2Sum problem
Given an unsorted array of numbers, how do you find the pair of numbers with a sum closest to an arbitrary target? Note that an exact match may not exist, so the O(n) hashtable solution doesn't apply here.
I can solve the problem in O(n*log(n)) time for a target of 0 as so:
Sort the numbers by absolute value.
Iterate across the sorted array, keeping the minimum of the sums of adjacent values.
This works because the three cases of pairs (+/+, +/-, -/-) are all handled by the logic of absolute value. That is, the sum of pairs of the same sign is minimized when they are closest to 0, and the sum of pairs of different sign is minimized when the components of the pair are closest to each other. Both of these cases are represented when sorting by absolute value.
How can I generalize this to an arbitrary target sum?
Step 1: Sort the array in non-decreasing order. Complexity: O( n lg n )
Step 2: Scan inwards from both ends. Complexity: O( n )
On the sorted array A, let l point to the left-most (i.e. minimum) element and r point to the right-most (i.e. maximum) element.
while l < r:
    currSum = A[l] + A[r]
    diff = currSum - target
    if (diff > 0):
        r = r - 1
    else:
        l = l + 1
If ever diff == 0, then you got a perfect match for 2Sum.
Otherwise, look for change of sign in diff, i.e. transition from positive to negative, or negative to positive. Whenever the transition happens, either just before or just after the change, currSum is closest to target.
Overall Complexity: O( n log n ) because of step 1.
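Putting the whole scan together (a sketch; instead of watching for the sign change explicitly, it simply remembers the closest sum seen so far, which amounts to the same thing):

def closest_pair_sum(nums, target):
    A = sorted(nums)              # step 1: O(n log n)
    l, r = 0, len(A) - 1
    best = A[l] + A[r]
    while l < r:                  # step 2: O(n)
        curr = A[l] + A[r]
        if abs(curr - target) < abs(best - target):
            best = curr
        if curr > target:
            r -= 1
        else:
            l += 1
    return best

print(closest_pair_sum([2, 7, 11, -5, 3], 12))   # 13 (2 + 11)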
If you are going to sort the numbers anyway, your algorithm already has a complexity of at least O(n*log(n)). So here is what you can do: for each number v, perform a binary search to find the least number u in the array that makes the sum u + v greater than the target. Now check u + v and t + v, where t is the predecessor of u in the sorted array.
The complexity of this is n times the complexity of a binary search, i.e. O(n*log(n)), so your overall complexity remains O(n*log(n)). This solution is also easier to implement than what you suggest.
As pointed out by amit in a comment on the question, you can do the second phase with linear complexity, improving its speed. Still, the overall computational complexity remains the same, and the solution is a little harder to implement.
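A sketch of that binary-search variant (the function name, the bisect-based search, and the example values are illustrative, not from the answer):

import bisect

def closest_pair_sum_bisect(nums, target):
    A = sorted(nums)
    best = None
    for i, v in enumerate(A):
        # Least index j > i with A[j] + v >= target; j and j - 1 are the only
        # candidates (for this v) for the sum closest to the target.
        j = bisect.bisect_left(A, target - v, i + 1)
        for k in (j, j - 1):
            if i < k < len(A):
                s = v + A[k]
                if best is None or abs(s - target) < abs(best - target):
                    best = s
    return best

print(closest_pair_sum_bisect([2, 7, 11, -5, 3], 12))   # 13 (2 + 11)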

Selection i'th smallest number algorithm

I'm reading Introduction to Algorithms book, second edition, the chapter about Medians and Order statistics. And I have a few questions about randomized and non-randomized selection algorithms.
The problem:
Given an unordered array of integers, find i'th smallest element in the array
a. The Randomized_Select algorithm is simple, but I cannot understand the math that explains its running time. Is it possible to explain it without deep math, in a more intuitive way? To me it seems it should run in O(n*log n), and in the worst case in O(n^2), just like quicksort. On average randomizedPartition returns a point near the middle of the array, the array is divided in two on each call, and the next recursive call processes only half of the array. RandomizedPartition costs (p-r+1) <= n, so we get O(n*log n). In the worst case it would choose the maximum element of the array every time, dividing the array into parts of size (n-1) and 0 at each step. That's O(n^2).
The next one (the Select algorithm) is harder to grasp than the previous one:
b. What is its difference compared to the previous one? Is it faster on average?
c. The algorithm consists of five steps. In the first one we divide the array into n/5 parts, each with 5 elements (except possibly the last one). Then each part is sorted using insertion sort and we select the 3rd element (the median) of each. Because we have sorted these elements, we can be sure that the first two are <= this pivot element and the last two are >= it. Then we need to select the median among these medians. The book states that we recursively call the Select algorithm on these medians. How can we do that? In the Select algorithm we use insertion sort, and if we swap two medians, we need to swap all four (or even more, at a deeper step) elements that are "children" of each median. Or do we create a new array that contains only the previously selected medians and search for the median among them? If so, how can we put them back into the original array, since we changed their order previously?
The other steps are pretty simple and look like those in the randomized_partition algorithm.
Randomized select runs in expected O(n); look at this analysis.
Algorithm:
Randomly choose an element xj
Split the set into a "lower than" set L and a "bigger than" set B
If the size of L is i-1, we found it (xj is the i-th smallest)
If L is bigger than that, look up in L
Otherwise, look up in B
The total cost is the sum of:
the cost of splitting the array of size n
the cost of the lookup in L or the cost of the lookup in B
You can notice that:
we always go next into the set with the greater number of elements
the number of elements in this set is n - rank(xj)
1 <= rank(xj) <= n, so 0 <= n - rank(xj) <= n - 1
the randomness of the element xj directly affects the randomness of the number of elements that are greater than xj (and of those that are smaller than xj)
If xj is the element chosen, then you know that the cost is O(n) + T(n - rank(xj)). Let's call rank(xj) = rj.
To give a good estimate we need to take the expected value of the total cost, which is
T(n) = E(cost) = sum {each possible xj} p(xj)*(O(n) + T(n - rank(xj)))
xj is random. After this it is pure math.
We obtain:
T(n) = 1/n * ( O(n) + sum {all possible values of rj when we continue} (O(n) + T(n - rj)) )
T(n) = 1/n * ( O(n) + sum {1 <= rj <= n, rj != i} (O(n) + T(n - rj)) )
Here you can change variables, vj = n - rj:
T(n) = 1/n * ( O(n) + sum {0 <= vj <= n-1, vj != n-i} (O(n) + T(vj)) )
We pull the O(n) terms out of the sum; together they contribute O(n^2):
T(n) = 1/n * ( O(n) + O(n^2) + sum {0 <= vj <= n-1, vj != n-i} T(vj) )
We move O(n) and O(n^2) outside the parentheses, which multiplies them by 1/n:
T(n) = O(1) + O(n) + 1/n * ( sum {0 <= vj <= n-1, vj != n-i} T(vj) )
Check the link on how this is computed.
For the non-randomized version:
You say yourself:
On average randomizedPartition returns a point near the middle of the array.
That is exactly why the randomized algorithm works, and that is exactly what is used to construct the deterministic algorithm. Ideally you would pick the pivot deterministically so that it produces a good split, but the best value for a good split is already the solution! So at each step they settle for a value that is good enough: at least 3/10 of the array below the pivot and at least 3/10 of the array above. To achieve this they split the original array into groups of 5 at each step, and again this is a mathematical choice.
I once created an explanation for this (with diagram) on the Wikipedia page for it... http://en.wikipedia.org/wiki/Selection_algorithm#Linear_general_selection_algorithm_-_Median_of_Medians_algorithm
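For part (a), a sketch of the randomized version described above (illustrative code, not from the book; it copies the partitions for clarity rather than partitioning in place):

import random

def randomized_select(arr, i):
    # Return the i-th smallest element (1-indexed) of arr.
    pivot = random.choice(arr)
    lower = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    bigger = [x for x in arr if x > pivot]
    if i <= len(lower):
        return randomized_select(lower, i)          # answer lies in the "lower than" set
    if i <= len(lower) + len(equal):
        return pivot                                # the pivot itself has rank i
    return randomized_select(bigger, i - len(lower) - len(equal))

print(randomized_select([7, 1, 5, 3, 9], 3))   # 5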
