repetition detection in O(log n) in sorted array - algorithm

Given a sorted integer array A of size n, where n is a multiple of 4, could someone help me find an algorithm that decides whether or not there exists an element that repeats at least n/4 times in the array, in O(log n) time?

If there is an element that repeats at least n/4 times, it must occupy at least one of the following indices (1-based): n/4, 2n/4, 3n/4, n.
For each of these four candidate elements, do two binary searches: one to find the first index it occupies and one to find the last. If those two indices span at least n/4 positions, the element repeats at least n/4 times.
This amounts to 4*2 binary searches, each taking O(log n) time, which gives you a total running time of O(8*log n) = O(log n).
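A minimal Python sketch of this idea, using bisect for the two binary searches (the function name and the test array are my own illustration, not part of the original answer):

from bisect import bisect_left, bisect_right

def has_element_repeating_quarter(a):
    # a is sorted and len(a) is assumed to be a multiple of 4
    n = len(a)
    q = n // 4
    # 1-based candidate positions n/4, 2n/4, 3n/4, n -> 0-based indices
    for pos in (q - 1, 2 * q - 1, 3 * q - 1, n - 1):
        x = a[pos]
        first = bisect_left(a, x)     # first index holding x, O(log n)
        last = bisect_right(a, x)     # one past the last index holding x, O(log n)
        if last - first >= q:         # x occupies at least n/4 slots
            return True
    return False

print(has_element_repeating_quarter([1, 2, 2, 2, 3, 4, 5, 6]))   # True: 2 appears at least n/4 = 2 times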

Related

Difference between O(logn) and O(nlogn)

I am preparing for software development interviews, and I have always had trouble distinguishing between O(log n) and O(n log n). Can anyone explain this to me with some examples or share some resources with me? I don't have any code to show. I understand O(log n), but I haven't understood O(n log n).
Think of it as O(n*log(n)), i.e. "doing log(n) work n times". For example, searching for an element in a sorted list of length n is O(log(n)). Searching for the element in n different sorted lists, each of length n is O(n*log(n)).
Remember that O(n) is defined relative to some real quantity n. This might be the size of a list, or the number of different elements in a collection. Therefore, every variable that appears inside O(...) represents something interacting to increase the runtime. O(n*m) could be written O(n_1 + n_2 + ... + n_m) and represent the same thing: "doing n, m times".
Let's take a concrete example of this, mergesort. For n input elements: On the very last iteration of our sort, we have two halves of the input, each half size n/2, and each half is sorted. All we have to do is merge them together, which takes n operations. On the next-to-last iteration, we have twice as many pieces (4) each of size n/4. For each of our two pairs of size n/4, we merge the pair together, which takes n/2 operations for a pair (one for each element in the pair, just like before), i.e. n operations for the two pairs.
From here, we can extrapolate that every level of our mergesort takes n operations to merge. The big-O complexity is therefore n times the number of levels. On the last level, the size of the chunks we're merging is n/2. Before that, it's n/4, before that n/8, etc. all the way to size 1. How many times must you divide n by 2 to get 1? log(n). So we have log(n) levels. Therefore, our total runtime is O(n (work per level) * log(n) (number of levels)), n work log(n) times.
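To make the level structure concrete, here is a standard textbook merge sort in Python (my own sketch, not code from the answer): the recursion is log(n) levels deep, and each level does O(n) total merging work, which is where O(n log n) comes from.

def merge_sort(a):
    # Base case: a list of 0 or 1 elements is already sorted.
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left = merge_sort(a[:mid])     # the recursion is log(n) levels deep
    right = merge_sort(a[mid:])
    # Merge the two sorted halves: O(n) work per level of recursion.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 7, 3]))   # [1, 2, 3, 5, 7, 9]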

Find the time complexity of selecting an element which is neither the kth maximum nor the kth minimum?

There are N distinct numbers which are given, not in sorted order. How much time will it take to select a number that is neither the k-th minimum nor the k-th maximum?
I tried it like this:
Take the initial k + 1 numbers and sort them in O(k log k). Then pick the kth number in that sorted list; that will be neither the kth minimum nor the kth maximum.
Hence, time complexity = O(k log k)
Example:
Select a number which is neither the 2nd minimum nor the 2nd maximum.
array[] = {3,9,1,2,6,5,7,8,4}
Take the initial 3 numbers as the subarray = 3,9,1; the sorted subarray will be = 1,3,9
Now pick the 2nd element, 3. Now, 3 is neither the 2nd minimum nor the 2nd maximum.
Now, time complexity = O(k lg k) = O(2 lg 2) = O(1).
The problem is trivial if N < k: in that case there's no k'th largest or smallest element in the array, so one can pick any element (for example the first) in O(1) time.
If N is large enough you can take any subset of size 2k+1 and choose the median. Then you have found a number that is guaranteed not to be the kth largest or smallest number in the overall array. In fact you get something stronger -- it's guaranteed that it will not be in the first k or last k numbers in the sorted array.
Finding a median of M things can be done in O(M) time, so this algorithm runs in O(k) time.
I believe this is asymptotically optimal for large N -- any algorithm that considers fewer than k items cannot guarantee that it chooses a number that's not the kth min or max in the overall array.
If N isn't large enough (specifically N < 2k+1), you can find the minimum (or the second minimum if k = 1 or k = N) in O(N) time. Since k <= N < 2k+1, this is also O(k).
There are three cases where no solution exists: (k=1, N=1), (k=1, N=2), (k=2, N=2).
If you only consider cases where k <= N, then the complexity of the overall algorithm is O(k). If you want to include the trivial cases too then it's somewhat messy. If I(k<=N) is the function that's 1 when k<=N and 0 otherwise, a tighter bound is O(1 + k*I(k<=N)).
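A rough Python sketch of this answer (my own illustration; the function name is made up, and np.partition is used as a stand-in for a linear-time selection/median step, which it approximates with expected-linear introselect):

import numpy as np

def pick_not_kth_min_or_max(a, k):
    # a: list of N distinct numbers, unsorted; k: the forbidden rank.
    n = len(a)
    if n < k:
        return a[0]                        # no kth min/max exists at all
    if n >= 2 * k + 1:
        sub = np.array(a[:2 * k + 1])
        # Median of 2k+1 elements: it has k elements below it and k above it
        # within the subset, so it cannot be the kth min or kth max overall.
        return np.partition(sub, k)[k]
    # Small case (k <= n < 2k+1): the minimum works unless k == 1 or k == n,
    # in which case take the second minimum (assuming a solution exists).
    smallest_two = np.partition(np.array(a), 1)[:2]
    return smallest_two[1] if k in (1, n) else smallest_two[0]

print(pick_not_kth_min_or_max([3, 9, 1, 2, 6, 5, 7, 8, 4], 2))   # 3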
I think there are several points that must be noticed in your solution:
Firstly, it would require taking 2k+1 elements instead of k+1 in your solution. More specifically, you take:
array[] = {3,9,1,2,6,5,7,8,4}
Take the initial 3 numbers as the subarray = 3,9,1; the sorted subarray will be = 1,3,9
Now pick the 2nd element, 3. Now, 3 is neither the 2nd minimum nor the 2nd maximum.
but to verify that 3 is neither the 2nd minimum nor the 2nd maximum, you can't do it with your k+1 elements (subarray = 3,9,1); you would have to scan the whole array to see what the 2nd max and 2nd min are and then check your answer.
On the other hand, by taking 2k+1 elements and sorting them, since your elements are distinct you would know that the (k+1)-th element is greater than the first k elements and smaller than the last k elements of your sorted subarray.
In your example you would have:
array[] = {3,9,1,2,6,5,7,8,4}
subarray[] = {3,9,1,2,6}; then sort the subarray: {1,2,3,6,9}, and give the number 3 as the answer.
An example where your solution would not be right:
array[] = {9,8,2,6,5,3,7,1,4}, where your algorithm would return the number 8, which is the 2nd maximum.
In terms of complexity: taking 2k+1 elements would not change the complexity you found, because it would be O((2k+1)log(2k+1)), which is O(k log k).
Clearly, if n < 2k+1 the above algorithm won't work, so you will have to sort the entire array, which takes O(n log n); but in that case n < 2k+1, so it is still O(k log k).
Finally, the algorithm based on the above will be O(k log k). A thing that might be confusing is that the problem has two parameters, k and n. If k is much smaller than n, this is an efficient algorithm, since you don't need to look at and sort the whole n-size array; but when k and n are very close, it is the same as sorting the n-size array.
One more thing you should understand is that big-O notation is a way of measuring the time complexity when an input n is given to the algorithm, and it describes the asymptotic behavior of the algorithm for large input n. O(1) denotes that the algorithm ALWAYS runs in constant time. So in the end, when you write:
Now, time complexity = O(k lg k) = O(2 lg 2) = O(1).
This is not right: you have to measure the complexity with k being the input variable, not a constant, and this shows the behavior of the algorithm for an arbitrary input k. Clearly, the above algorithm does not take O(1) (i.e., constant time); it takes O(k log k).
Finally, after searching for a better approach to the problem: if you want a more efficient way, you could find the kth min and kth max in O(n) (n is the size of the array). Then, with one O(n) loop, you could simply select the first element that differs from both the kth min and the kth max. I think O(n) is the lowest time complexity you can get here, since finding the kth min and max takes at least O(n).
For how to find the kth min/max in O(n), you could see here:
How to find the kth largest element in an unsorted array of length n in O(n)?
This solution is O(n), while the previous solution was O(k log k). For k close to n, as explained above, the previous one is the same as O(n log n), so in that case the O(n) solution is better. But if, most of the time, k is much smaller than n, then O(k log k) may be better. The good thing about the O(n) solution (the second solution) is that it takes O(n) in all cases, regardless of k, so it is more stable; but as mentioned, for small k the first solution may be better (though in the worst case it can reach O(n log n)).
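A brief Python sketch of this O(n) idea (illustrative only; np.partition does the two selection steps in expected linear time, and the function name is mine):

import numpy as np

def pick_neither_kth_min_nor_max(a, k):
    # a: list of n distinct numbers; k: forbidden rank. Assumes a valid answer exists.
    arr = np.array(a)
    n = len(arr)
    kth_min = np.partition(arr, k - 1)[k - 1]   # kth smallest, O(n) expected
    kth_max = np.partition(arr, n - k)[n - k]   # kth largest, O(n) expected
    for x in a:                                 # one O(n) scan
        if x != kth_min and x != kth_max:
            return x
    return None                                 # only reached if no valid answer exists

print(pick_neither_kth_min_nor_max([3, 9, 1, 2, 6, 5, 7, 8, 4], 2))   # 3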
You can sort the entire list in pseudo-linear time using radix-sort and select the k-th largest element in constant time.
Overall it would be a worst-case O(n) algorithm assuming the size of the radix is much smaller than n or you're using a Selection algorithm.
O(n) is the absolute lower bound here. There's no way to get anything better than linear because if the list is unsorted you need to at least examine everything or you might miss the element you're looking for.

Prepare array in linear time to find k smallest elements in O(k)

This is an interesting question I have found on the web. Given an array containing n numbers (with no information about them), we should pre-process the array in linear time so that we can return the k smallest elements in O(k) time, when we are given a number 1 <= k <= n
I have been discussing this problem with some friends but no one could find a solution; any help would be appreciated!
For the pre-processing step, we will use the partition-based selection several times on the same data set.
Find the n/2-th smallest number with that algorithm; now the data set is partitioned into two halves, lower and upper. On the lower half, find the midpoint again. On its lower partition do the same thing, and so on. Overall this is O(n) + O(n/2) + O(n/4) + ... = O(n).
Now, when you have to return the k smallest elements, search for the nearest partition boundary x <= k. Everything below it can be returned, and from the next partition you have to return k - x more numbers. Since the next partition's size is O(k), running another selection algorithm for the (k - x)-th number within it will return the rest.
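A rough Python/numpy sketch of this scheme (my own; np.partition, which uses introselect, stands in for the linear-time selection, and the class and method names are made up):

import numpy as np

class KSmallestOracle:
    def __init__(self, data):
        # Preprocessing: partition at boundaries n/2, n/4, n/8, ..., 1 so that
        # each prefix ending at a boundary holds exactly that many smallest elements.
        self.a = np.array(data)
        hi = len(self.a)
        while hi > 1:
            mid = hi // 2
            self.a[:hi].partition(mid)   # in-place selection on a prefix view
            hi = mid
        # Total work: O(n) + O(n/2) + O(n/4) + ... = O(n).

    def k_smallest(self, k):
        # Query: find the largest boundary x <= k; a[:x] are already the x smallest.
        n = len(self.a)
        x, upper = n, n
        while x > k:
            upper = x
            x //= 2
        # a[x:upper] holds the elements ranked x..upper-1 and has size O(k),
        # so selecting the remaining k - x elements from it costs O(k).
        block = np.array(self.a[x:upper])
        if k > x:
            block.partition(k - x - 1)
        return np.concatenate([self.a[:x], block[:k - x]])

oracle = KSmallestOracle([7, 3, 9, 1, 8, 2, 6, 5])
print(np.sort(oracle.k_smallest(3)))   # [1 2 3]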
We can find the median of a list and partition around it in linear time.
Then we can use the following algorithm: maintain a buffer of size 2k.
Every time the buffer gets full, we find the median and partition around it, keeping only the lowest k elements.
This requires n/k find-median-and-partition steps, each of which takes O(k) time with a traditional quickselect, so this approach requires only O(n) time.
Additionally, if you need the output sorted, that adds an extra O(k log k). In total, this approach requires only O(n + k log k) time and O(k) space.
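A small Python sketch of this buffered variant (assuming k is known up front; the median/partition step is done with np.partition, and the function name is my own):

import numpy as np

def k_smallest_buffered(data, k):
    buf = []                              # holds at most 2k candidate elements
    for x in data:
        buf.append(x)
        if len(buf) == 2 * k:
            arr = np.array(buf)
            arr.partition(k - 1)          # O(k): the smallest k move to the front
            buf = arr[:k].tolist()        # discard the upper half of the buffer
    buf.sort()                            # optional O(k log k) if sorted output is wanted
    return buf[:k]

print(k_smallest_buffered([7, 3, 9, 1, 8, 2, 6, 5], 3))   # [1, 2, 3]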

A data structure for counting integers within some range?

Question:
Given n integers in the range [1, k], preprocess the input and then
answer any query about how many of the n integers have values between a and b, where 1 ≤ a, b ≤ k
are two given parameters. Your algorithm should use O(n + k) preprocessing time.
Your algorithm is reasonably good, but it can be made much faster. Specifically, your algorithm has O(1) preprocessing time, but then spends O(n) time per query because of the linear cost of the partitioning step.
Let's consider an alternative approach. Suppose that all of your values were in sorted order. In this case, you could find the number of elements in a range very quickly by just doing two binary searches - a first binary search to find the index of the lower bound, and a second search to find the upper bound - and could just subtract the indices. This would take time O(log n). If you can preprocess the input array to sort it in time O(n + k), then this approach will result in exponentially faster lookup times.
To do this sorting, as @minitech has pointed out, you can use the counting sort algorithm, which sorts in time O(n + k) for integers between 1 and k. Consequently, using both counting sort and the binary search together gives O(n + k) setup time and O(log n) query time.
If you are willing to trade memory for efficiency, though, you can speed this up even further. Let's suppose that k is a reasonably small number (say, not more than 100). Then if you are okay using O(k) space, you can answer these queries in O(1) time. The idea is as follows: build up a table of k elements that stores, for each value v between 1 and k, how many elements of the original array are less than or equal to v. If you have this table, you can find the total number of elements in some subrange by looking up how many elements are less than or equal to b and how many elements are strictly less than a (each in O(1) time), then subtracting the two.
Of course, to do this, you have to actually build up this table in time O(n + k). This can be done as follows. First, create an array of k elements, then iterate across the original n-element array and, for each element, increment the spot in the table corresponding to that number. When you're done (in time O(n + k)), you will have filled this table with the number of times each of the values in the range 1 - k appears in the original array (this is, incidentally, how counting sort works). Next, create a second table of k elements that will hold the cumulative frequencies. Then, iterate across the histogram you built in the first step, and fill in the cumulative frequency table with the cumulative total number of elements encountered so far as you walk across the histogram. This last step takes time O(k), for a grand total of O(n + k) setup time. You can now answer queries in O(1) time.
Hope this helps!
Here is another simple algorithm:
First allocate an array A of size k, then iterate over the n elements and, for each integer x, increment A[x] by one. This will take O(n) time.
Then compute the prefix sums of array A and store them as array B. This will take O(k).
Now, for any query (a, b), you can simply return: B[b] - B[a] + A[a]
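A compact Python sketch of this counting + prefix-sum idea (names are illustrative; indices are kept 1-based to match the formula above):

def build_tables(nums, k):
    # A[v] = number of occurrences of value v; index 0 is unused so values stay 1-based.
    A = [0] * (k + 1)
    for x in nums:                 # O(n)
        A[x] += 1
    # B[v] = number of elements with value <= v (prefix sums), computed in O(k).
    B = [0] * (k + 1)
    for v in range(1, k + 1):
        B[v] = B[v - 1] + A[v]
    return A, B

def count_in_range(A, B, a, b):
    # Elements with value in [a, b]: everything <= b, minus everything <= a,
    # plus A[a] to add back the occurrences of a itself.
    return B[b] - B[a] + A[a]

A, B = build_tables([1, 3, 3, 5, 2, 3], k=5)
print(count_in_range(A, B, 2, 3))   # 4 (one 2 and three 3s)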

Why is the running time complexity of binary search considered to be log2 N?

Can someone explain to me why, when it comes to binary search, we say the running time complexity is O(log n)? I searched Google and got the below:
"The number of times that you can halve the search space is the same as log2 n".
I know we keep halving until we find the search key in the data structure, but why do we have to consider it as log2 n? I understand that e^x is exponential growth and so log2 n is the binary decay. But I am unable to interpret binary search in terms of my understanding of the logarithm definition.
Think of it like this:
If you can afford to halve something m times (i.e., you can afford to spend time proportional to m), then how large an array can you afford to search?
Obviously arrays of size 2^m, right?
So if you can search an array of size n = 2^m, then the time it takes is proportional to m, and solving for m in terms of n looks like this:
n = 2^m
log2(n) = log2(2^m)
log2(n) = m
Put another way: performing a binary search on an array of size n = 2^m takes time proportional to m, or equivalently, proportional to log2(n).
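A small Python illustration of this (mine, not from the answer): a standard iterative binary search that also counts how many halving steps it performs, which stays close to log2(n).

import math

def binary_search(a, target):
    lo, hi, steps = 0, len(a) - 1, 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2           # halve the remaining search space
        if a[mid] == target:
            return mid, steps
        elif a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, steps

a = list(range(1024))                   # n = 1024, so log2(n) = 10
print(binary_search(a, 0))              # (0, 10): about 10 halving steps
print(math.log2(len(a)))                # 10.0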
Binary search:
Let's take an example to understand it.
Suppose we have 'n' apples and every day half of the apples get rotten. Then after how many days will the apple count be '1'?
First day, n apples: a a a a .... (total n)
Second day: a a a a .. a (total n/2)
Third day: a a a .. a (total n/(2^2))
and so on...
Let's suppose that after k days the apples left will be 1,
i.e. n/(2^k) should finally become 1:
n/(2^k) = 1
2^k = n
Applying log to base 2 on both sides:
k = log2 n
In the same manner, in binary search,
first we are left with n elements,
then n/2,
then n/4,
then n/8,
and so on,
until finally we are left with one element,
so the time complexity is log n.
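A tiny Python check of this halving argument (my own illustration): repeatedly halving n until it reaches 1 takes about log2(n) steps.

import math

def halvings_until_one(n):
    days = 0
    while n > 1:
        n //= 2          # half the apples rot each day
        days += 1
    return days

print(halvings_until_one(1024), math.log2(1024))   # 10 10.0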
These are all good answers; however, I wish to clarify something that I did not consider before. We are asking how many operations it takes to get from an array of size n to an array of size 1. The reason for this is that when the array size is 1, the only element in the array is the element to be found, and the search operation can be terminated. In other words, when the array size becomes 1, the element that was searched for has been found.
The way binary search works is by halving the search space of the array and gradually focusing on the matching element. Let's say the size of array is n. Then, in m operations of halving the search space, the size of the array search space becomes n/2^m. When it becomes 1, we have found our element. So equate it to 1 and solve for m.
To summarize, m = log2(n) is the number of operations it would take for the binary search algorithm to reduce the search space from n to 1 and hence, find the element that is searched.

Resources