Why is the running time complexity of binary search considered to be log2(n)?

Can someone explain why, for binary search, we say the running time complexity is O(log n)? I searched on Google and found the following:
"The number of times that you can halve the search space is the same as log2 n".
I know we keep halving until we find the search key in the data structure, but why do we have to treat that as log2 n? I understand that e^x is exponential growth, so log2 n would be the corresponding decay for repeated halving, but I am unable to interpret binary search in terms of my understanding of the logarithm definition.

Think of it like this:
If you can afford to halve something m times (i.e., you can afford to spend time proportional to m), then how large an array can you afford to search?
Obviously arrays of size 2^m, right?
So if you can search an array of size n = 2^m, then the time it takes is proportional to m, and solving for m in terms of n looks like this:
n = 2^m
log2(n) = log2(2^m)
log2(n) = m
Put another way: performing a binary search on an array of size n = 2^m takes time proportional to m, or equivalently, proportional to log2(n).
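As an illustration (my own sketch, not part of the original answer), here is an iterative binary search that counts how many times it halves the search space; for an array of size n it performs at most about log2(n) + 1 iterations:

    import math

    def binary_search(arr, target):
        """Return (index or -1, number of halving steps)."""
        lo, hi = 0, len(arr) - 1
        steps = 0
        while lo <= hi:
            steps += 1
            mid = (lo + hi) // 2
            if arr[mid] == target:
                return mid, steps
            elif arr[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1, steps

    arr = list(range(2 ** 20))                     # n = 2^m with m = 20
    idx, steps = binary_search(arr, 0)             # leftmost element: near-worst case
    print(steps, math.ceil(math.log2(len(arr))))   # steps is close to m = 20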

Binary search:
Let's take an example to understand this.
Suppose we have n apples, and every day half of the apples rot. After how many days will the apple count become 1?
First day, n apples: a a a a .... (total n)
Second day: a a a a .. a (total n/2)
Third day: a a a .. a (total n/(2^2))
and so on.
Let's suppose that after k days only one apple is left,
i.e. n/(2^k) should finally become 1:
n/(2^k) = 1
2^k = n
Applying log to base 2 on both sides:
k = log2 n
Binary search works in the same manner:
first we are left with n elements,
then n/2,
then n/4,
then n/8,
and so on, until finally we are left with one element.
So the time complexity is O(log n).
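The apple analogy translates directly into a few lines of Python (a sketch of mine, not from the answer): repeatedly halve n and count the days until one apple is left.

    import math

    def days_until_one(n):
        """Count how many times n can be halved before reaching 1."""
        days = 0
        while n > 1:
            n //= 2          # half of the apples rot each day
            days += 1
        return days

    print(days_until_one(1024), math.log2(1024))   # prints: 10 10.0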

These are all good answers; however, I wish to clarify something that I did not consider before. We are asking how many operations it takes to get from an array of size n down to an array of size 1. The reason is that when the array size is 1, the only element left in the array is the element to be found, and the search can terminate. In other words, when the array size becomes 1, the element being searched for has been found.
Binary search works by halving the search space of the array and gradually closing in on the matching element. Say the size of the array is n. Then after m halvings of the search space, the search space has size n/2^m. When it becomes 1, we have found our element, so equate it to 1 and solve for m.
To summarize, m = log2(n) is the number of operations it takes the binary search algorithm to reduce the search space from n to 1 and hence find the element being searched for.

Related

Time Complexity of Binary Search?

If I divide the array size by 3, what will the running time of binary search be?
With binary search you typically search a sorted random-access data structure, such as an array, by discarding half of the remaining range with each comparison. Hence, in k steps you effectively cover 2^k entries, which yields a complexity of at most log2(n) for n elements.
With Landau symbols, the base of the logarithm disappears because it is a constant factor: O(log2(n)) = O(log(n) / log(2)) = O(log(n)).
Now, if for some reason you can discard not just half of the values but two thirds, by always knowing which third the needle will end up in, you cover 3^k entries in k steps.
Hence, you get log3(n). But this again reduces to the same time complexity, since log(3) is a constant: O(log3(n)) = O(log(n)/log(3)) = O(log(n)).
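To make the two-thirds case concrete, here is a sketch of mine (not from the answer): a search over a sorted array that compares against two pivots per step and keeps only the third that can contain the needle, so k steps cover roughly 3^k entries.

    import math

    def ternary_search(arr, target):
        """Search a sorted array, keeping one third of the range per step."""
        lo, hi = 0, len(arr) - 1
        steps = 0
        while lo <= hi:
            steps += 1
            m1 = lo + (hi - lo) // 3
            m2 = hi - (hi - lo) // 3
            if arr[m1] == target:
                return m1, steps
            if arr[m2] == target:
                return m2, steps
            if target < arr[m1]:
                hi = m1 - 1                  # needle is in the first third
            elif target > arr[m2]:
                lo = m2 + 1                  # needle is in the last third
            else:
                lo, hi = m1 + 1, m2 - 1      # needle is in the middle third
        return -1, steps

    arr = list(range(3 ** 12))
    _, steps = ternary_search(arr, 1)
    print(steps, math.ceil(math.log(len(arr), 3)))   # roughly log3(n) steps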
It would still be O(log n), assuming your array is sorted.

Find the time complexity of selecting an element which is neither the kth maximum nor the kth minimum?

There are N distinct numbers, given in no particular order. How much time will it take to select a number that is neither the k-th minimum nor the k-th maximum?
I tried it like this:
Take the initial k+1 numbers and sort them in O(k log k). Then pick the kth number in that sorted list; it will be neither the kth minimum nor the kth maximum.
Hence, time complexity = O(k log k).
Example:
Select a number which is neither the 2nd minimum nor the 2nd maximum.
array[] = {3,9,1,2,6,5,7,8,4}
Take the initial 3 numbers as a subarray = 3,9,1; the sorted subarray is 1,3,9.
Now pick the 2nd element, 3. Now, 3 is neither the 2nd minimum nor the 2nd maximum.
Now, time complexity = O(k lg k) = O(2 lg 2) = O(1).
The problem is trivial if N < k: then there is no kth largest or smallest element in the array, so one can pick any element (for example the first) in O(1) time.
If N is large enough you can take any subset of size 2k+1 and choose the median. Then you have found a number that is guaranteed not to be the kth largest or smallest number in the overall array. In fact you get something stronger -- it's guaranteed that it will not be in the first k or last k numbers in the sorted array.
Finding a median of M things can be done in O(M) time, so this algorithm runs in O(k) time.
I believe this is asymptotically optimal for large N -- any algorithm that considers fewer than k items cannot guarantee that it chooses a number that's not the kth min or max in the overall array.
If N isn't large enough (specifically N < 2k+1), you can find the minimum (or second minimum value if k=1) in O(N) time. Since k <= N < 2k+1, this is also O(k).
There are three cases where no solution exists: (k=1, N=1), (k=1, N=2), (k=2, N=2).
If you only consider cases where k <= N, then the complexity of the overall algorithm is O(k). If you want to include the trivial cases too then it's somewhat messy. If I(k<=N) is the function that's 1 when k<=N and 0 otherwise, a tighter bound is O(1 + k*I(k<=N)).
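Here is a sketch of the median-of-(2k+1) idea (mine, not the answerer's code), using a simple random-pivot quickselect as a stand-in for a worst-case linear-time selection; it runs in expected O(k) time:

    import random

    def quickselect(items, i):
        """Return the i-th smallest item (0-based) in expected linear time."""
        pivot = random.choice(items)
        lower = [x for x in items if x < pivot]
        upper = [x for x in items if x > pivot]
        if i < len(lower):
            return quickselect(lower, i)
        if i >= len(items) - len(upper):
            return quickselect(upper, i - (len(items) - len(upper)))
        return pivot

    def safe_element(arr, k):
        """For len(arr) >= 2k+1 distinct values: the median of any 2k+1 of
        them has k smaller and k larger sample elements, so it can be
        neither the kth min nor the kth max of the whole array."""
        assert len(arr) >= 2 * k + 1
        return quickselect(arr[:2 * k + 1], k)   # median of 2k+1 items, O(k)

    print(safe_element([3, 9, 1, 2, 6, 5, 7, 8, 4], 2))   # prints 3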
I think there are several points that must be noted about your solution:
First, it would require taking 2k+1 elements instead of the k+1 in your solution. More specifically, you take:
array[] = {3,9,1,2,6,5,7,8,4}
the initial 3 numbers as a subarray = 3,9,1, whose sorted form is 1,3,9,
and pick the 2nd element, 3, claiming that 3 is neither the 2nd minimum nor the 2nd maximum.
But you cannot verify that claim from your k+1 elements (subarray = 3,9,1) alone; you would have to check the whole array to see what the 2nd max and 2nd min actually are and validate your answer.
On the other hand, by taking 2k+1 elements and sorting them, since your elements are distinct, you know that the (k+1)-th element is greater than the first k elements and smaller than the last k elements of your sorted subarray.
In your example you would have:
array[] = {3,9,1,2,6,5,7,8,4}
subarray[] = {3,9,1,2,6}; sorting the subarray gives {1,2,3,6,9}, and the answer is the number 3.
An example where your solution would not be right:
array[] = {9,8,2,6,5,3,7,1,4}, where your algorithm sorts {9,8,2} to {2,8,9} and returns 8, which is the 2nd maximum.
In terms of complexity, taking 2k+1 elements does not change the complexity you found, because O((2k+1)log(2k+1)) is O(k log k).
Clearly, if n < 2k+1 the above algorithm won't work, so you will have to sort the entire array, which takes O(n log n); but in this case n < 2k+1, so that is O(k log k) as well.
Finally, the algorithm based on the above is O(k log k). A thing that might be confusing is that the problem has two parameters, k and n. If k is much smaller than n, this is an efficient algorithm, since you don't need to examine and sort the whole n-size array; but when k and n are very close, it is the same as sorting the n-size array.
One more thing you should understand is that big-O notation measures time complexity as a function of the input given to the algorithm and shows the asymptotic behavior of the algorithm for large inputs. O(1) denotes that the algorithm ALWAYS runs in constant time. So when you write:
Now, time complexity = O(k lg k) = O(2 lg 2) = O(1).
this is not right: you have to measure the complexity with k as the input variable, not as a constant, which shows the behavior of the algorithm for arbitrary input k. Clearly the above algorithm does not take O(1) (i.e., constant) time; it takes O(k log k).
Finally, after searching for a better approach to the problem: if you want a more efficient way, you can find the kth min and the kth max in O(n) (where n is the size of the array), and then with one O(n) loop simply select the first element that differs from both the kth min and the kth max. I think O(n) is the lowest time complexity you can get, since finding the kth min and max already takes O(n).
For how to find the kth min/max in O(n), see here:
How to find the kth largest element in an unsorted array of length n in O(n)?
This solution is O(n), while the previous solution was O(k log k). For k close to n, as explained above, the previous one degenerates to O(n log n), so in that case the O(n) solution is better. But if k is usually much smaller than n, then O(k log k) may be better. The good thing about the O(n) solution (the second solution) is that it takes O(n) in all cases, regardless of k, so it is more stable; but as mentioned, for small k the first solution may be better (though in the worst case it can reach O(n log n)).
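A sketch of that O(n) approach (my own; numpy's np.partition uses introselect, which gives average-case linear selection and stands in here for a worst-case linear algorithm such as median-of-medians):

    import numpy as np

    def pick_other(arr, k):
        """Select the kth min and kth max (average O(n)), then one O(n)
        scan returns the first element equal to neither."""
        a = np.asarray(arr)
        n = len(a)
        kth_min = np.partition(a, k - 1)[k - 1]    # kth smallest
        kth_max = np.partition(a, n - k)[n - k]    # kth largest
        for x in arr:
            if x != kth_min and x != kth_max:
                return x
        return None    # no such element (only possible for tiny arrays)

    print(pick_other([3, 9, 1, 2, 6, 5, 7, 8, 4], 2))   # prints 3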
You can sort the entire list in pseudo-linear time using radix sort and then select the k-th largest element in constant time.
Overall it is a worst-case O(n) algorithm, assuming the size of the radix is much smaller than n, or alternatively you can use a selection algorithm.
O(n) is the absolute lower bound here. There is no way to do better than linear, because if the list is unsorted you need to examine every element at least once, or you might miss the element you're looking for.

Repetition detection in O(log n) in a sorted array

Given a sorted integer array A of size n, where n is a multiple of 4, could someone help me find an algorithm that decides, in O(log n) time, whether or not there exists an element that repeats at least n/4 times in the array?
If there is an element that repeats at least n/4 times, it must occupy at least one of the following (1-indexed) positions: n/4, 2n/4, 3n/4, n.
For each of these (at most four) candidate values, do two binary searches to find the first and last index it occupies.
This totals 4*2 = 8 binary searches, each taking O(log n) time, which gives a total running time of O(8 log n) = O(log n).
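A sketch of this (mine, using Python's bisect module for the two binary searches per candidate):

    from bisect import bisect_left, bisect_right

    def has_quarter_repeat(a):
        """True iff some value occurs at least n/4 times in sorted array a
        (n is assumed to be a multiple of 4)."""
        n = len(a)
        # 1-indexed positions n/4, 2n/4, 3n/4, n -> 0-indexed candidates:
        candidates = {a[n // 4 - 1], a[n // 2 - 1], a[3 * n // 4 - 1], a[-1]}
        for v in candidates:                # at most 4 values, O(log n) each
            if bisect_right(a, v) - bisect_left(a, v) >= n // 4:
                return True
        return False

    print(has_quarter_repeat([1, 2, 2, 3, 5, 5, 5, 9]))   # True: 5 occurs 3 >= 8/4 times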

What is meant by complexity of O(log n)?

Consider this example of a binary search tree.
With n = 10 and base 2, log n = log2(10) = 3.321928.
I am assuming this means at most 3.321 steps (accesses) will be required to search for an element. I also assume that the BST is a balanced binary tree.
Now, to access the node with value 25, I have to visit the following nodes:
50
40
30
25
So I had to access 4 nodes, and 3.321 is nearly equal to 4.
Is this understanding right or erroneous?
I'd call your understanding not quite correct.
Big-O notation does not say anything about an exact number of steps. The notation O(log n) means that something is approximately proportional to log n, but not necessarily equal to it.
If you say that the number of steps to search for a value in a BST is O(log n), this means that it is approximately C*log n for some constant C not depending on n, but it says nothing about the value of C. So for n = 10 this never says that the number of steps is 4 or anything else; it can be 1, or it can be 1000000, depending on what C is.
What this notation does say is that if you consider two examples with different and large enough sizes, say n1 and n2, then the ratio of the numbers of steps in these two examples will be approximately log(n1)/log(n2).
So if for n = 10 it took you, say, 4 steps, then for n = 100 it should take approximately twice as many, that is, 8 steps, because log(100)/log(10) = 2, and for n = 10000 it should take 16 steps.
And if for n = 10 it took you 1000000 steps, then for n = 100 it should take 2000000, and for n = 10000 it should take 4000000.
This is all for "large enough" n; for small n the number of steps can deviate from this proportionality. For most practical algorithms, "large enough" usually starts from 5-10, if not from 1, but strictly speaking big-O notation does not set any requirement on where the proportionality should start.
In fact, O(log n) notation does not even require that the number of steps grow proportionally to log n, only that it grow no faster than proportionally to log n; that is, the ratio of the numbers of steps should not equal log(n1)/log(n2), but be <= log(n1)/log(n2).
Note also another situation that may make the background of O-notation clearer. Consider not the number of steps, but the time spent searching in a BST. You clearly cannot predict this time, because it depends on the machine you run on, on the particular implementation of the algorithm, and even on the units you use for time (seconds or nanoseconds, etc.). So the time can be 0.0001 or 100000 or whatever. However, all these effects (the speed of your machine, etc.) roughly change all the measurement results by some constant factor. Therefore you can say that the time is O(log n); just in different cases the constant C will be different.
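To illustrate the proportionality argument, here is a small sketch of mine that counts the halving steps binary search makes on arrays of different sizes; the step counts grow roughly like log2(n), so their ratios track log(n1)/log(n2):

    import math

    def search_steps(n, target):
        """Count halving steps binary search makes on range(n)."""
        lo, hi, steps = 0, n - 1, 0
        while lo <= hi:
            steps += 1
            mid = (lo + hi) // 2
            if mid == target:
                break
            elif mid < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return steps

    for n in (10, 100, 10_000):
        # searching for the leftmost element is close to the worst case
        print(n, search_steps(n, 0), round(math.log2(n), 1))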
Your thinking is not totally correct. The steps/accesses being considered are comparisons, but O(log n) is merely a measure of asymptotic complexity, not an exact step count. As Petr answered, you should go through the points mentioned in his answer.
Also, BST stands for binary search tree, also sometimes called an ordered or sorted binary tree.
The exact running time/number of comparisons can't be derived from an asymptotic complexity measurement. For that, you have to return to an exact analysis of searching for an element in a BST.
Assume that we have a "balanced" tree with n nodes. If the maximum number of comparisons to find an entry is k+1, where k is the height, we have
2^(k+1) - 1 = n
from which we obtain
k = log2(n+1) - 1 = O(log2 n).
As you can see, the other constant factors are removed when measuring asymptotic complexity in a worst-case analysis, so the number of comparisons reduces to O(log2 n).
Next, a demonstration of how an element is searched for in a BST, counting comparisons:
1. Select 50, the root element // compare, then move down to the left or right child
2. Move down from 50 to 40, the left child
3. Move down from 40 to 30, the left child
4. Move down from 30 to 25; found, hence no further movement.
// Had there been more elements to traverse deeper, this would have counted as a 5th step.
Hence, it found the item 25 after 3 downward traversals. So there are 4 comparisons and 3 downward traversals (because the height is 3).
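The same count can be reproduced in code; this is a sketch of mine that builds just the path from the example (50 -> 40 -> 30 -> 25, other children omitted) and counts comparisons while searching:

    class Node:
        def __init__(self, key, left=None, right=None):
            self.key, self.left, self.right = key, left, right

    def search(root, key):
        """Return (found, comparisons), walking down from the root."""
        comparisons, node = 0, root
        while node is not None:
            comparisons += 1
            if key == node.key:
                return True, comparisons
            node = node.left if key < node.key else node.right
        return False, comparisons

    root = Node(50, Node(40, Node(30, Node(25))))
    print(search(root, 25))   # (True, 4): 4 comparisons, 3 downward traversals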
Usually you say something like this:
Given a balanced binary search tree with n elements, you need O(log n) operations to search.
or
Search in a balanced binary search tree of n elements is in O(log n).
I like the second phrasing more, because it emphasizes that O is a function returning a set of functions, given x (short: O(x)). Here x: N → N is a function; the input of x is the size of the input of a function, and the output of x can be interpreted as the number of operations needed.
A function g is in O(x) when g is bounded above by x multiplied by an arbitrary positive constant, from some starting point n_0 on, for all following n.
In computer science, g is often identified with an algorithm, which is wrong; it might be the number of operations of an algorithm given the input size, but note that this is something different.
More formally: g is in O(x) if and only if there exist a constant C > 0 and an n_0 such that g(n) <= C*x(n) for all n >= n_0.
So, regarding your question: you have to define what n is (conceptually, not as a number). In your example, it is either the number of nodes or the number of nodes on the longest path to a leaf.
Usually, when you use big-O notation, you are not interested in an "average" case (and especially not in some given case); you want to say something about the worst case.

Prepare array in linear time to find k smallest elements in O(k)

This is an interesting question I found on the web. Given an array containing n numbers (with no other information about them), we should pre-process the array in linear time so that we can return the k smallest elements in O(k) time whenever we are given a number 1 <= k <= n.
I have been discussing this problem with some friends, but no one could find a solution; any help would be appreciated!
For the pre-processing step, we use partition-based selection several times on the same data set.
Find the (n/2)-th smallest number with the selection algorithm; now the data set is partitioned into two halves, lower and upper. On the lower half, find the midpoint again, and on its lower partition do the same thing, and so on. Overall this is O(n) + O(n/2) + O(n/4) + ... = O(n).
Now, when you have to return the k smallest elements, search for the nearest partition boundary x <= k. Everything below it can be returned directly, and from the next partition you have to return k - x numbers. Since the next partition's size is O(k), running another selection algorithm there for the (k - x)-th number returns the rest.
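A sketch of this scheme (my own code; np.partition's introselect stands in for a worst-case linear-time selection, so the preprocessing is expected O(n)):

    import numpy as np

    def preprocess(arr):
        """Partition at boundaries n/2, n/4, ..., 1 so that for each
        boundary b the prefix a[:b] holds the b smallest elements.
        Total work: O(n + n/2 + n/4 + ...) = O(n)."""
        a = np.asarray(arr).copy()
        boundaries, limit = [], len(a)
        while limit // 2 >= 1:
            b = limit // 2
            a[:limit] = np.partition(a[:limit], b - 1)   # select within prefix
            boundaries.append(b)
            limit = b
        return a, boundaries

    def k_smallest(a, boundaries, k):
        """O(k) query: take the largest boundary x <= k, then select the
        remaining k - x elements from the next segment (of size O(k))."""
        x = max((b for b in boundaries if b <= k), default=0)
        if x == k:
            return a[:k]
        upper = min((b for b in boundaries if b > k), default=len(a))
        extra = np.partition(a[x:upper], k - x - 1)[: k - x]
        return np.concatenate([a[:x], extra])

    a, bs = preprocess([7, 1, 9, 3, 8, 2, 6, 5, 4, 0])
    print(np.sort(k_smallest(a, bs, 3)))   # [0 1 2]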
We can find the median of a list and partition around it in linear time.
Then we can use the following algorithm: maintain a buffer of size 2k.
Every time the buffer gets full, we find its median and partition around it, keeping only the lowest k elements.
This requires n/k find-median-and-partition steps, each of which takes O(k) time with a traditional quickselect, so this approach requires only O(n) time.
Additionally, if you need the output sorted, that adds O(k log k) time. In total, this approach requires O(n + k log k) time and O(k) space.
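A sketch of the buffer idea (mine; again np.partition supplies the O(k) selection, and I assume the input has at least k items):

    import numpy as np

    def k_smallest_stream(items, k):
        """O(n) time, O(k) extra space: keep a buffer of up to 2k candidates;
        whenever it fills, keep only its k smallest (an O(k) selection)."""
        buf = []
        for x in items:
            buf.append(x)
            if len(buf) == 2 * k:
                buf = list(np.partition(buf, k - 1)[:k])   # discard the top half
        smallest = np.partition(buf, k - 1)[:k]            # final selection
        return sorted(int(v) for v in smallest)            # optional O(k log k) sort

    print(k_smallest_stream([9, 1, 8, 2, 7, 3, 6, 4, 5, 0], 3))   # [0, 1, 2]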
