I have an array {a1, a2, ..., an} of natural numbers. I need to build a greedy algorithm that finds a permutation (i1, ..., in) of 1..n that minimizes the sum: 1·a_{i1} + 2·a_{i2} + ... + (n−1)·a_{i(n−1)} + n·a_{in}.
I could of course just try all permutations and select the one that gives the smallest sum (this gives the correct result in O(n!)).
The greedy choice that I thought of is to take the numbers in decreasing order, but I don't know how to prove that this works.
P.S.: this is just for study and training; I'm struggling to think "greedily".
Choosing the numbers in decreasing order is optimal.
Proof is by induction on n: suppose there's a permutation that is optimal and in which the smallest number s is not in the last place but at some position p < n. Let t > s be the element in the last place. Swapping s and t changes the sum by (p·t + n·s) − (p·s + n·t) = (p − n)(t − s) < 0, so the swap decreases the total sum. That contradicts the assumption of optimality, so the smallest element must be in the last place. By the induction hypothesis, the other elements are in decreasing order in the first (n-1) places.
The base case of n=1 is trivial.
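As a quick sanity check of the claim above, here is a small Python sketch that compares the decreasing-order choice against brute force over all permutations. The function names are made up for illustration, and the brute force is only practical for small inputs.

```python
from itertools import permutations


def weighted_sum(order):
    """Sum of position * value, with positions starting at 1."""
    return sum(pos * value for pos, value in enumerate(order, start=1))


def greedy_order(values):
    """Greedy choice from the answer: largest values first."""
    return sorted(values, reverse=True)


def brute_force_min(values):
    """Smallest weighted sum over all n! orderings (small n only)."""
    return min(weighted_sum(p) for p in permutations(values))


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5]
    print(greedy_order(data), weighted_sum(greedy_order(data)))
    print(brute_force_min(data))
```

Both printed sums should agree if the greedy choice is optimal for this input.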
There are N distinct numbers, given in unsorted order. How much time does it take to select a number that is neither the k-th minimum nor the k-th maximum?
I tried it like this =>
Take the first k + 1 numbers and sort them in O(k log k). Then pick the kth number in that sorted list; that will be neither the kth minimum nor the kth maximum.
Hence, time complexity = O(k log k)
Example =>
Select a number which is neither the 2nd minimum nor 2nd maximum.
array[] = {3,9,1,2,6,5,7,8,4}
Take initial 3 numbers or subarray = 3,9,1 and sorted subarray will be = 1,3,9
Now pick up 2nd element 3. Now, 3 is not the 2nd minimum nor 2nd maximum .
Now, time complexity = O(k lg k) = O(2 lg 2) = O(1).
The problem is trivial if N < k: in that case there's no k'th largest or smallest element in the array, so one can pick any element (for example the first) in O(1) time.
If N is large enough you can take any subset of size 2k+1 and choose the median. Then you have found a number that is guaranteed not to be the kth largest or smallest number in the overall array. In fact you get something stronger -- it's guaranteed that it will not be in the first k or last k numbers in the sorted array.
Finding a median of M things can be done in O(M) time, so this algorithm runs in O(k) time.
I believe this is asymptotically optimal for large N -- any algorithm that considers fewer than k items cannot guarantee that it chooses a number that's not the kth min or max in the overall array.
If N isn't large enough (specifically k <= N < 2k+1), you can find the minimum (or the second minimum if k=1 or k=N, since in those cases the minimum is itself the kth minimum or the kth maximum) in O(N) time. Since k <= N < 2k+1, this is also O(k).
There are three cases where no solution exists: (k=1, N=1), (k=1, N=2), (k=2, N=2).
If you only consider cases where k <= N, then the complexity of the overall algorithm is O(k). If you want to include the trivial cases too then it's somewhat messy. If I(k<=N) is the function that's 1 when k<=N and 0 otherwise, a tighter bound is O(1 + k*I(k<=N)).
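The case analysis above can be sketched in Python roughly as follows. This is only an illustration under some assumptions: the function name is hypothetical, k is assumed to be at least 1, and the sample is sorted for brevity, so the sketch runs in O(k log k) rather than O(k) (a linear-time selection routine would be needed to match the stated bound). For k <= N < 2k+1 it sorts everything and skips the two forbidden ranks instead of searching for the minimum.

```python
def pick_neither_kth(values, k):
    """Return an element that is neither the k-th min nor the k-th max.

    Assumes distinct values and k >= 1. Returns None when no such
    element exists (including the empty list).
    """
    n = len(values)
    if n == 0:
        return None
    if n < k:
        # No k-th min or max exists, so any element will do.
        return values[0]
    # Only the first 2k+1 elements are ever examined.
    sample = sorted(values[:2 * k + 1])
    if n >= 2 * k + 1:
        # Median of 2k+1 elements: k smaller and k larger in the sample.
        return sample[k]
    # k <= n < 2k+1: the sample is the whole array, now sorted.
    for idx, value in enumerate(sample):
        if idx != k - 1 and idx != n - k:
            return value
    return None


if __name__ == "__main__":
    print(pick_neither_kth([3, 9, 1, 2, 6, 5, 7, 8, 4], 2))
```

On the example array from the question with k = 2, the sample is {3,9,1,2,6} and the sketch returns its median.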
I think there are several points to note about your solution.
First, it needs to take 2k+1 elements instead of k+1. Specifically, you take:
array[] = {3,9,1,2,6,5,7,8,4}
Take initial 3 numbers or subarray = 3,9,1 and sorted subarray will be = 1,3,9
Now pick up 2nd element 3. Now, 3 is not the 2nd minimum nor 2nd maximum .
but you can't confirm that 3 is neither the 2nd minimum nor the 2nd maximum from your k+1 elements (subarray = 3,9,1) alone; you have to scan the whole array to find the 2nd minimum and 2nd maximum and check your answer against them.
On the other hand, if you take 2k+1 elements and sort them then, since your elements are distinct, you know that the (k+1)-th element is greater than the first k elements and smaller than the last k elements of the sorted subarray, so it cannot be the k-th minimum or k-th maximum of the whole array.
In your example you could see:
array[] = {3,9,1,2,6,5,7,8,4}
subarray[] = {3,9,1,2,6}; sort the subarray to get {1,2,3,6,9} and return the number 3.
An example where your solution would not be right:
array[] = {9,8,2,6,5,3,7,1,4}, where your algorithm sorts {9,8,2} into {2,8,9} and returns the 2nd element, 8, which is the 2nd maximum of the whole array.
In terms of complexity, taking 2k+1 elements does not change the bound you found: it is O((2k+1) log(2k+1)), which is O(k log k).
Clearly, if n < 2k+1 the above algorithm won't work, so you have to sort the entire array, which takes O(n log n); but in this case n < 2k+1, so it is still O(k log k).
So the algorithm based on the above is O(k log k). One thing that might be confusing is that the problem has two parameters, k and n. If k is much smaller than n, this is an efficient algorithm, since you don't need to look at and sort the whole n-element array; but when k and n are very close, it is the same as sorting the n-element array.
One more thing to understand is that big O notation measures how the running time grows with the size of the input, and describes the asymptotic behavior of the algorithm for large inputs. O(1) means that the algorithm ALWAYS runs in constant time. So when you write:
Now, time complexity = O(k lg k) = O(2 lg 2) = O(1).
This is not right: you have to measure the complexity with k as an input variable, not a constant, so that it describes the behavior of the algorithm for an arbitrary input k. Clearly the above algorithm doesn't take O(1) (that is, constant) time; it takes O(k log k).
Finally, after searching for a better approach to the problem: if you want a more efficient way, you could find the kth min and kth max in O(n) (n is the size of the array), and then with one O(n) loop simply select the first element that differs from both. I think O(n) is the lowest time complexity you can get, since finding the kth min and max takes at least O(n).
For how to find kth min,max in O(n) you could see here:
How to find the kth largest element in an unsorted array of length n in O(n)?
This solution is O(n), while the previous solution was O(k log k). For k close to n, as explained above, the previous solution is effectively O(n log n), so in that case the O(n) solution is better. But if k is usually much smaller than n, then O(k log k) may be better. The good thing about the O(n) solution (the second one) is that it takes O(n) in all cases regardless of k, so it is more stable; but as mentioned, for small k the first solution may be better (though in the worst case it can reach O(n log n)).
You can sort the entire list in pseudo-linear time using radix-sort and select the k-th largest element in constant time.
Overall it would be a worst-case O(n) algorithm assuming the size of the radix is much smaller than n or you're using a Selection algorithm.
O(n) is the absolute lower bound here. There's no way to get anything better than linear because if the list is unsorted you need to at least examine everything or you might miss the element you're looking for.
I encountered this problem in a coding contest. I could only think of an O(n^2 log(n)) solution; I guess the expected one was O(n log n).
I am given n numbers, and I have to find 3 numbers that satisfy the triangle inequality and have the smallest sum.
I hope this is quite easy to understand.
Eg.
10,2,5,1,8,20
Answer is 23 = (5+8+10)
The longest side should be the successor of the second longest; otherwise, we could shrink the longest and thus the perimeter. Now you can use your binary search to find the third side over O(n) possibilities instead of O(n^2) (and actually, you don't even need to search if you iterate from small to large, though the sort will still cost you).
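The idea in this answer can be sketched in Python as below. This is an illustrative sketch, not a contest-tested solution: the function name is hypothetical, the strict inequality a + b > c is assumed, and `bisect_right` from the standard library stands in for the binary search. For each adjacent pair in sorted order (second longest, longest), it looks up the smallest third side that still forms a triangle.

```python
from bisect import bisect_right


def min_triangle_perimeter(values):
    """Smallest a + b + c with a + b > c, or None if no triangle exists."""
    xs = sorted(values)
    best = None
    # xs[i] is the second-longest side, xs[i + 1] its successor (longest).
    for i in range(1, len(xs) - 1):
        b, c = xs[i], xs[i + 1]
        # First index j in xs[0:i] with xs[j] > c - b, i.e. xs[j] + b > c.
        j = bisect_right(xs, c - b, 0, i)
        if j < i:
            total = xs[j] + b + c
            if best is None or total < best:
                best = total
    return best


if __name__ == "__main__":
    print(min_triangle_perimeter([10, 2, 5, 1, 8, 20]))
```

The sort dominates at O(n log n); the loop does one binary search per adjacent pair.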
I think the answer is something like this, assuming no duplicates in the numbers.
Sort the numbers. Then scan the numbers and take the first number that is smaller than the sum of the preceding two numbers. Call that x(n) . . . the nth position of the sorted series.
x(n) is one of the numbers, and so far we are O(n log(n)).
Then there are a limited number of previous choices. Then x(n-1) has to be one of the numbers, because x(n-2) + x(n-3) < x(n-1) < x(n). Then it is a simple scan up to x(n-1) to find the smallest number that matches. This could be at the beginning of the series, as in 2, 3, 8, 15, 16.
I think the analysis is essentially the same with duplicates.
I know that Binary Search has time complexity of O(logn) to search for an element in a sorted array. But let's say if instead of selecting the middle element, we select a random element, how would it impact the time complexity. Will it still be O(logn) or will it be something else?
For example :
A traditional binary search in an array of size 18 , will go down like 18 -> 9 -> 4 ...
My modified binary search pings a random element and decides to remove the right part or left part based on the value.
My attempt:
let C(N) be the average number of comparisons required by a search among N elements. For simplicity, we assume that the algorithm only terminates when there is a single element left (no early termination on strict equality with the key).
As the pivot value is chosen at random, the probabilities of the remaining sizes are uniform and we can write the recurrence
C(N) = 1 + 1/N.Sum(1<=i<=N:C(i))
Then
N.C(N) - (N-1).C(N-1) = 1 + C(N)
and
C(N) - C(N-1) = 1 / (N-1)
The solution of this recurrence is the Harmonic series, hence the behavior is indeed logarithmic.
C(N) ~ Ln(N-1) + Gamma
Note that this is the natural logarithm, which is better than the base 2 logarithm by a factor 1.44 !
My bet is that adding the early termination test would further improve the log basis (and keep the log behavior), but at the same time double the number of comparisons, so that globally it would be worse in terms of comparisons.
Let us assume we have an array of size 18. The number I am looking for is in the 1st spot. In the worst case, I always randomly pick the highest remaining number (18->17->16...), effectively eliminating only one element in every iteration. So it becomes a linear search: O(n) time.
The recurrence in the answer of @Yves Daoust relies on the assumption that the target element is located either at the beginning or the end of the array. In general, where the element lies in the array changes after each recursive call, making it difficult to write and solve the recurrence. Here is another solution that proves an O(log n) bound on the expected number of recursive calls.
Let T be the (random) number of elements checked by the randomized version of binary search. We can write T=sum I{element i is checked} where we sum over i from 1 to n and I{element i is checked} is an indicator variable. Our goal is to asymptotically bound E[T]=sum Pr{element i is checked}. For the algorithm to check element i it must be the case that this element is selected uniformly at random from the array of size at least |j-i|+1 where j is the index of the element that we are searching for. This is because arrays of smaller size simply won't contain the element under index i while the element under index j is always contained in the array during each recursive call. Thus, the probability that the algorithm checks the element at index i is at most 1/(|j-i|+1). In fact, with a bit more effort one can show that this probability is exactly equal to 1/(|j-i|+1). Thus, we have
E[T]=sum Pr{element i is checked} <= sum_i 1/(|j-i|+1)=O(log n),
where the last equality follows from the logarithmic growth of the harmonic series.
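To make the bound above concrete, here is a small Python sketch that simulates the randomized search on indices 0..n-1 and compares the average number of probes with the sum of 1/(|j-i|+1). The function names and the use of a seeded `random.Random` are assumptions for illustration, and the search stops as soon as the target is probed.

```python
import random


def random_search_probes(n, target, rng):
    """Count probes made by binary search with a uniformly random pivot."""
    lo, hi = 0, n - 1
    probes = 0
    while lo <= hi:
        pivot = rng.randint(lo, hi)  # inclusive on both ends
        probes += 1
        if pivot == target:
            return probes
        if pivot < target:
            lo = pivot + 1
        else:
            hi = pivot - 1
    raise ValueError("target must be in range(n)")


def expected_probes(n, target):
    """Sum over i of 1 / (|target - i| + 1), as in the analysis above."""
    return sum(1.0 / (abs(target - i) + 1) for i in range(n))


if __name__ == "__main__":
    rng = random.Random(0)
    n, target, trials = 1000, 250, 20000
    avg = sum(random_search_probes(n, target, rng) for _ in range(trials)) / trials
    print(avg, expected_probes(n, target))
```

The two printed numbers should be close, and both grow like a constant times log n.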
I need to write an algorithm that finds the minimum of n elements with:
n-1 comparisons.
Performing only log(n) comparisons per element.
I thought of the selection search algorithm, but I think it compares each element more than log(n) times.
Any ideas?
Thanks!
You can think of a selection process as a tournament:
The first element is compared to the second, the third to the fourth and so on.
The winner of a comparison is the smaller element.
All the winners participate in the next round, in the same manner, until one element remains. The remaining element is the smallest of all.
Pseudocode
I'll give the recursive solution, but you can implement it iteratively also.
smallestElement(A[1...n]):
    if size(A) == 1:
        return A[1]
    else:
        return min(smallestElement(A[1...n/2]), smallestElement(A[n/2 + 1...n]))
The recursion has depth log n because at every level we divide the size of the input by 2, so the winner of the tournament participates in log n comparisons, and no element participates in more. Each comparison eliminates exactly one element, so there are n-1 comparisons in total.
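An iterative version of the tournament might look like the following Python sketch. It also records how many comparisons each element takes part in, so both requirements can be checked. The function name and the per-element counter are additions for illustration only.

```python
def tournament_min(values):
    """Return (index of the minimum, comparisons made by each element)."""
    if not values:
        raise ValueError("values must be non-empty")
    counts = [0] * len(values)
    contenders = list(range(len(values)))
    while len(contenders) > 1:
        winners = []
        # Pair up neighbours; the smaller element of each pair advances.
        for k in range(0, len(contenders) - 1, 2):
            a, b = contenders[k], contenders[k + 1]
            counts[a] += 1
            counts[b] += 1
            winners.append(a if values[a] <= values[b] else b)
        if len(contenders) % 2 == 1:
            # Odd one out advances without a comparison.
            winners.append(contenders[-1])
        contenders = winners
    return contenders[0], counts


if __name__ == "__main__":
    data = [7, 3, 9, 1, 4, 8, 2]
    idx, counts = tournament_min(data)
    # Each comparison is counted once for each of its two participants.
    print(data[idx], sum(counts) // 2, max(counts))
```

Every element plays at most once per round and there are ceil(log2 n) rounds, which gives the per-element limit.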
We know that the easy way to find the smallest number of a list would simply be n comparisons, and if we wanted the 2nd smallest number we could go through it again or just keep track of another variable during the first iteration. Either way, this would take 2n comparisons to find both numbers.
So suppose that I had a list of n distinct elements, and I wanted to find the smallest and the 2nd smallest. Yes, the optimal algorithm takes at most n + ceiling(lg n) - 2 comparisons. (Not interested in the optimal way though)
But suppose then that you're forced to use the easy algorithm, the one that takes 2n comparisons. In the worst case, it'd take 2n comparisons. But what about the average? What would be the average number of comparisons it'd take to find the smallest and the 2nd smallest using the easy brute force algorithm?
EDIT: It'd have to be smaller than 2n -- (copied and pasted from my comment below) I compare the index I am at to the tmp2 variable keeping track of 2nd smallest. I don't need to make another comparison to tmp1 variable keeping track of smallest unless the value at my current index is smaller than tmp2. So you can reduce the number of comparisons from 2n. It'd still take more than n though. Yes in worst case this would still take 2n comparisons. But on average if everything is randomly put in...
I'd guess that it'd be n + something comparisons, but I can't figure out the 2nd part. I'd imagine that there would be some way to involve log n somehow, but any ideas on how to prove that?
(Coworker asked me this at lunch, and I got stumped. Sorry) Once again, I'm not interested in the optimal algorithm since that one is kinda common knowledge.
As you pointed out in the comment, there is no need for a second comparison if the current element in the iteration is larger than the second smallest found so far. What is the probability for a second comparison if we look at the k-th element ?
I think this can be rephrased as follows "What is the probability that the k-th element is in the subset containing the 2 smallest elements of the first k elements?"
This should be 2/k for uniformly distributed elements, because if we think of the first k elements as an ordered list, every position has equal probability 1/k for the k-th element, but only two of them, the smallest and second smallest positions, cause a second comparison. So the expected number of second comparisons should be about sum_{k=1}^{n} (2/k) = 2 H_n, where H_n is the n-th harmonic number. This is an expected-value calculation: for each k, take an indicator random variable that is 1 if a second comparison has to be done at the k-th element and 0 if just one comparison is needed, and add up the expectations.
If this is correct, the overall number of comparisons in the average case is C(n) = n + 2 H_n, and since H_n = theta(log(n)), C(n) = theta(n + log(n)) = theta(n).
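The estimate can be checked exactly for small n with a Python sketch like the one below. It assumes one particular version of the "easy" algorithm (this is an assumption, not something fixed by the question): the first two elements are ordered with a single comparison, and every later element is compared with the current second smallest first, and with the smallest only if needed. Under that assumption the 2/k argument applies from k = 3 onward, so the exact average is (n - 1) + sum_{k=3}^{n} 2/k, which differs from n + 2 H_n only by a constant.

```python
from fractions import Fraction
from itertools import permutations


def two_smallest_with_count(values):
    """Return (smallest, second smallest, comparisons) for >= 2 distinct values."""
    comparisons = 1  # ordering the first two elements
    if values[0] < values[1]:
        smallest, second = values[0], values[1]
    else:
        smallest, second = values[1], values[0]
    for x in values[2:]:
        comparisons += 1  # compare with the current second smallest
        if x < second:
            comparisons += 1  # second comparison, with the smallest
            if x < smallest:
                smallest, second = x, smallest
            else:
                second = x
    return smallest, second, comparisons


def average_comparisons(n):
    """Exact average over all n! input orders (small n only)."""
    perms = list(permutations(range(n)))
    total = sum(two_smallest_with_count(p)[2] for p in perms)
    return Fraction(total, len(perms))


def predicted_average(n):
    """(n - 1) first comparisons plus expected 2/k second comparisons."""
    return (n - 1) + sum(Fraction(2, k) for k in range(3, n + 1))


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, average_comparisons(n), predicted_average(n))
```

The two columns should match exactly, since both are computed with exact fractions.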