average case running time of linear search algorithm - algorithm

I am trying to derive the average case running time for deterministic linear search algorithm. The algorithm searches an element x in an unsorted array A in the order A[1], A[2], A[3]...A[n]. It stops when it finds the element x or proceeds until it reaches the end of the array. I searched on wikipedia and the answer given was (n+1)/(k+1) where k is the number of times x is present in the array. I approached in another way and am getting a different answer. Can anyone please give me the correct proof and also let me know whats wrong with my method?
E(T)= 1*P(1) + 2*P(2) + 3*P(3) ....+ n*P(n) where P(i) is the probability that
the algorithm runs for 'i' time (i.e. compares 'i' elements).
P(i)= (n-i)C(k-1) * (n-k)! / n!
Here, (n-i)C(k-1) is (n-i) Choose (k-1). As the algorithm has reached the ith
step, the rest of k-1 x's must be in the last n-i elements. Hence (n-i)C(k-i).
(n-k)! is the total number of ways of arranging the rest non x numbers, and n!
is the total number of ways of arranging the n elements in the array.
I am not getting (n+1)/(k+1) on simplifying.

You've forgotten to account for the permutations of the k copies of x. The correct definition of P(i) is
P(i) = (n-i)C(k-1) * k! * (n-k)! / n! = (n-i)C(k-1) / nCk.
^^
I'll turn things over to Mathematica:
In[1]:= FullSimplify[Sum[i Binomial[n-i, k-1]/Binomial[n, k], {i, 1, n}], 0 <= k <= n]
1 + n
Out[1]= -----
1 + k
To elaborate on my comment below: assume that all elements are distinct, let X be the set of matches, and let Y be the set of non-matches. By assumption, |X| = k and |Y| = n-k. The expected number of reads is equal to the sum over elements e of the probability that e is read.
Given a set of elements S, each element in S comes before all of the others with probability 1/|S|.
An element x in X is read if and only if it comes before every other element of X, which is probability 1/k. The total contribution of elements in X is |X| (1/k) = 1.
An element y in Y is read if and only if it comes before every element of X, which is probability 1/(k+1). The total contribution of elements in Y is |Y| (1/(k+1)) = (n-k)/(k+1).
We have 1 + (n-k)/(k+1) = (n+1)/(k+1).

Here is a solution that uses Cormen terms:
Let S be the set of the other n-k elements.
Let the indicator random variable Xi=1, if we encounter the i'th element
of the set S in the course of our execution.
Pr{Xi=1}=1/(k+1) and therefore E[Xi]=1/(k+1).
Let the indicator random variable Y=1, if we encounter any of the k elements that we are searching for in the course of our execution.
Pr{Y=1}=1 and therefore E[Y]=1.
Let the random variable X=Y+X1+X2+...X(n-k) be the sum of the elements that we
encounter in the course of our execution.
E[X]=E[Y+X1+X2+...X(n-k)]=E[Y]+E[X1]+E[X2]+...E[X(n-k)]=1+(n-k)/(k+1)=(n+1)/(k+1).

Related

Given a number N and an array A. Check if N can be expressed as a product of one or more array elements

Given a number N (where N <= 10^18) and an array A(consisting of at most 20 elements). I have to tell if it is possible to form N by multiplying some elements of the array. Note that, I can use any element multiple times.
Example: N = 8 and A = {2, 3}. Here, 8 = 2 * 2 * 2. So the answer is YES. But if N = 15, then I can't form 15 as a product of one or more elements using them any number of times. So in this case the answer is NO.
How can I approach this problem?
Simple pseudocode:
A_divisors = set()
for x in A:
if num % x == 0:
A_divisors.add(x)
candidates = A_divisors.clone()
seen = set()
while(candidates.size()):
size = divisors.size()
new_candidates = set()
for y in candidates:
for x in A_divisors:
if num % (x * y) == 0 and (x * y) not in seen:
new_candidates.add(x * y)
seen.add(x * y)
if x * y == num:
return true
candidates = new_candidates
return false
Complexity: O(|A| * k * log k), with k being amount of divisors. The log k would be the cost of adding and checking if element is present in the set. With a hash based approach it would be O(1) and can be removed. I am also assuming %, * operations to be O(1).
Since you show no code or algorithm, I'll just give one idea. If you want more help, please show more of your own work on the problem.
Note that N can be at most 60 bits long. This is small enough that N could be decomposed into its prime factors pretty quickly. So first work up a good factoring algorithm for numbers of that size.
Your algorithm would factor N and each of the elements in your array A. If there is any prime factor of N that does not divide into any element of A then your answer is NO. This is the case in your example of N = 15.
Now you work with the prime factors and their exponents in N and in the elements of A. Now you want to find a subset (or, more properly, a sub-multiset) of A where the exponents for each prime add up to that in N. This greatly reduces the sizes of your numbers thus makes the problem easier.
That last part is not trivial. Work more on this problem and show us some of your work, then we can continue helping you.
You can follow below approach:
Form 2 queues: Q2 and Q3.
Add 2 in Q2 and 3 in Q3.
Get the minimum of the head of both queues, lets say h. Remove h from the corresponding queue. Check if it is equal to the number N. If yes, return true. If it is greater than N, return false.
If it is less than N, then add 2*h in Q2 and 3*h in Q3. Repeat steps 3 to 4.
Please note that when the minimum h comes from Q3, you need not to add 2*h into Q2. That is because you already have added that element in Q3 before. (I will leave it for you to deduce}. Keep on doing this procedure until your h is greater than N.
If you have more such numbers, you can form there queues as well. I think this is an optimal solution in case you have more numbers to process.
Can you guess the time and space complexity of this?

How many times variable m is updated

Given the following pseudo-code, the question is how many times on average is the variable m being updated.
A[1...n]: array with n random elements
m = a[1]
for I = 2 to n do
if a[I] < m then m = a[I]
end for
One might answer that since all elements are random, then the variable will be updated on average on half the number of iterations of the for loop plus one for the initialization.
However, I suspect that there must be a better (and possibly the only correct) way to prove it using binomial distribution with p = 1/2. This way, the average number of updates on m would be
M = 1 + Σi=1 to n-1[k.Cn,k.pk.(1-p)(n-k)]
where Cn,k is the binomial coefficient. I have tried to solve this but I have stuck some steps after since I do not know how to continue.
Could someone explain me which of the two answers is correct and if it is the second one, show me how to calculate M?
Thank you for your time
Assuming the elements of the array are distinct, the expected number of updates of m is the nth harmonic number, Hn, which is the sum of 1/k for k ranging from 1 to n.
The summation formula can also be represented by the recursion:
H1 &equals; 1
Hn &equals; Hn−1&plus;1/n (n > 1)
It's easy to see that the recursion corresponds to the problem.
Consider all permutations of n−1 numbers, and assume that the expected number of assignments is Hn−1. Now, every permutation of n numbers consists of a permutation of n−1 numbers, with a new smallest number inserted in one of n possible insertion points: either at the beginning, or after one of the n−1 existing values. Since it is smaller than every number in the existing series, it will only be assigned to m in the case that it was inserted at the beginning. That has a probability of 1/n, and so the expected number of assignments of a permutation of n numbers is Hn−1 + 1/n.
Since the expected number of assignments for a vector of length one is obviously 1, which is H1, we have an inductive proof of the recursion.
Hn is asymptotically equal to ln n &plus; γ where γ is the Euler-Mascheroni constant, approximately 0.577. So it increases without limit, but quite slowly.
The values for which m is updated are called left-to-right maxima, and you'll probably find more information about them by searching for that term.
I liked #rici answer so I decided to elaborate its central argument a little bit more so to make it clearer to me.
Let H[k] be the expected number of assignments needed to compute the min m of an array of length k, as indicated in the algorithm under consideration. We know that
H[1] = 1.
Now assume we have an array of length n > 1. The min can be in the last position of the array or not. It is in the last position with probability 1/n. It is not with probability 1 - 1/n. In the first case the expected number of assignments is H[n-1] + 1. In the second, H[n-1].
If we multiply the expected number of assignments of each case by their probabilities and sum, we get
H[n] = (H[n-1] + 1)*1/n + H[n-1]*(1 - 1/n)
= H[n-1]*1/n + 1/n + H[n-1] - H[n-1]*1/n
= 1/n + H[n-1]
which shows the recursion.
Note that the argument is valid if the min is either in the last position or in any the first n-1, not in both places. Thus we are using that all the elements of the array are different.

Sample number with equal probability which is not part of a set

I have a number n and a set of numbers S ∈ [1..n]* with size s (which is substantially smaller than n). I want to sample a number k ∈ [1..n] with equal probability, but the number is not allowed to be in the set S.
I am trying to solve the problem in at worst O(log n + s). I am not sure whether it's possible.
A naive approach is creating an array of numbers from 1 to n excluding all numbers in S and then pick one array element. This will run in O(n) and is not an option.
Another approach may be just generating random numbers ∈[1..n] and rejecting them if they are contained in S. This has no theoretical bound as any number could be sampled multiple times even if it is in the set. But on average this might be a practical solution if s is substantially smaller than n.
Say s is sorted. Generate a random number between 1 and n-s, call it k. We've chosen the k'th element of {1,...,n} - s. Now we need to find it.
Use binary search on s to find the count of the elements of s <= k. This takes O(log |s|). Add this to k. In doing so, we may have passed or arrived at additional elements of s. We can adjust for this by incrementing our answer for each such element that we pass, which we find by checking the next larger element of s from the point we found in our binary search.
E.g., n = 100, s = {1,4,5,22}, and our random number is 3. So our approach should return the third element of [2,3,6,7,...,21,23,24,...,100] which is 6. Binary search finds that 1 element is at most 3, so we increment to 4. Now we compare to the next larger element of s which is 4 so increment to 5. Repeating this finds 5 in so we increment to 6. We check s once more, see that 6 isn't in it, so we stop.
E.g., n = 100, s = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100] which is 7. Binary search finds that 2 elements are at most 4, so we increment to 6. Now we compare to the next larger element of s which is 5 so increment to 7. We check s once more, see that the next number is > 7, so we stop.
If we assume that "s is substantially smaller than n" means |s| <= log(n), then we will increment at most log(n) times, and in any case at most s times.
If s is not sorted then we can do the following. Create an array of bits of size s. Generate k. Parse s and do two things: 1) count the number of elements < k, call this r. At the same time, set the i'th bit to 1 if k+i is in s (0 indexed so if k is in s then the first bit is set).
Now, increment k a number of times equal to r plus the number of set bits is the array with an index <= the number of times incremented.
E.g., n = 100, s = {1,4,5,22}, and our random number is 4. So our approach should return the fourth element of [2,3,6,7,...,21,23,24,...,100] which is 7. We parse s and 1) note that 1 element is below 4 (r=1), and 2) set our array to [1, 1, 0, 0]. We increment once for r=1 and an additional two times for the two set bits, ending up at 7.
This is O(s) time, O(s) space.
This is an O(1) solution with O(s) initial setup that works by mapping each non-allowed number > s to an allowed number <= s.
Let S be the set of non-allowed values, S(i), where i = [1 .. s] and s = |S|.
Here's a two part algorithm. The first part constructs a hash table based only on S in O(s) time, the second part finds the random value k ∈ {1..n}, k ∉ S in O(1) time, assuming we can generate a uniform random number in a contiguous range in constant time. The hash table can be reused for new random values and also for new n (assuming S ⊂ { 1 .. n } still holds of course).
To construct the hash, H. First set j = 1. Then iterate over S(i), the elements of S. They do not need to be sorted. If S(i) > s, add the key-value pair (S(i), j) to the hash table, unless j ∈ S, in which case increment j until it is not. Finally, increment j.
To find a random value k, first generate a uniform random value in the range s + 1 to n, inclusive. If k is a key in H, then k = H(k). I.e., we do at most one hash lookup to insure k is not in S.
Python code to generate the hash:
def substitute(S):
H = dict()
j = 1
for s in S:
if s > len(S):
while j in S: j += 1
H[s] = j
j += 1
return H
For the actual implementation to be O(s), one might need to convert S into something like a frozenset to insure the test for membership is O(1) and also move the len(S) loop invariant out of the loop. Assuming the j in S test and the insertion into the hash (H[s] = j) are constant time, this should have complexity O(s).
The generation of a random value is simply:
def myrand(n, s, H):
k = random.randint(s + 1, n)
return (H[k] if k in H else k)
If one is only interested in a single random value per S, then the algorithm can be optimized to improve the common case, while the worst case remains the same. This still requires S be in a hash table that allows for a constant time "element of" test.
def rand_not_in(n, S):
k = random.randint(len(S) + 1, n);
if k not in S: return k
j = 1
for s in S:
if s > len(S):
while j in S: j += 1
if s == k: return j
j += 1
Optimizations are: Only generate the mapping if the random value is in S. Don't save the mapping to a hash table. Short-circuit the mapping generation when the random value is found.
Actually, the rejection method seems like the practical approach.
Generate a number in 1...n and check whether it is forbidden; regenerate until the generated number is not forbidden.
The probability of a single rejection is p = s/n.
Thus the expected number of random number generations is 1 + p + p^2 + p^3 + ... which is 1/(1-p), which in turn is equal to n/(n-s).
Now, if s is much less than n, or even more up to s = n/2, this expected number is at most 2.
It would take s almost equal to n to make it infeasible in practice.
Multiply the expected time by log s if you use a tree-set to check whether the number is in the set, or by just 1 (expected value again) if it is a hash-set. So the average time is O(1) or O(log s) depending on the set implementation. There is also O(s) memory for storing the set, but unless the set is given in some special way, implicitly and concisely, I don't see how it can be avoided.
(Edit: As per comments, you do this only once for a given set.
If, additionally, we are out of luck, and the set is given as a plain array or list, not some fancier data structure, we get O(s) expected time with this approach, which still fits into the O(log n + s) requirement.)
If attacks against the unbounded algorithm are a concern (and only if they truly are), the method can include a fall-back algorithm for the cases when a certain fixed number of iterations didn't provide the answer.
Similarly to how IntroSort is QuickSort but falls back to HeapSort if the recursion depth gets too high (which is almost certainly a result of an attack resulting in quadratic QuickSort behavior).
Find all numbers that are in a forbidden set and less or equal then n-s. Call it array A.
Find all numbers that are not in a forbidden set and greater then n-s. Call it array B. It may be done in O(s) if set is sorted.
Note that lengths of A and B are equal, and create mapping map[A[i]] = B[i]
Generate number t up to n-s. If there is map[t] return it, otherwise return t
It will work in O(s) insertions to a map + 1 lookup which is either O(s) in average or O(s log s)

Find The number of sides of triangle

I am giving a Array A consists of integer, I have to choose any two integer from array A and any third integer from range [L,R] , such that all three integer forms a valid triangle.
I have to find the number of integer in range [L,R] which can be used to form a valid triangle, by choosing any two values from array A.
I know if i know two sides then third side must be range a-b<x<a+b
where a and b are any two integer from A.
How to find the number of valid integer in [L,R] in O(N) N= Size of A time.
L and R and be very large upto 10^20
To my understanding, complete enumeration will yield an efficient solution by the following argument. The maximum number of possible solutions occurs in the case where any selection of elements of A yields a consistent choice of side lengths for a triangle. Such an input, for instance, would be an input of N times the value 1. However, the number of triples chosen from A can be bounded by
(n choose 3) = n!/(3!(n-3))
= n!/(6(n-3)!)
= (1/6)*(n-2)*(n-1)*n
<= n^3
(where choose is meant to denote the binomial coefficient) which is a polynomial number of choices. Any choice can be checked for validity in constant time, as only 3 values are involved.
Now the contest has ended, so here is my way to solve it in the contest.
The problem is asking how many number x in [L,R]can form triangle with any pair (a_i, a_j) in A.
Naive method is to brute all pairs which is O(N^2 * (R-L+1)) = O(N^2 * (R-L))
But indeed, we do not need to test all O(N^2) pairs, we only need to test O(N) pairs IF A is sorted, namely, all adjacent pair (a_i-1, a_i) for i > 0
Why? Because in sorted A:
If there is some pair (a_j, a_i) where a_j < a_i and j != i-1, which can form triangle with x
Then (a_i-1, a_i) must form a triangle with x too:
a_i-1 + a_i > a_j + a_i > x
x + a_i-1 > x + a_j > a_i
x + a_i > x + a_i-1 > a_j
Therefore checking all (a_i-1, a_i) is sufficient, this is the first core idea to solve this problem.
So now we have a O(NlgN + N*(R-L+1)) = O(N*(R-L)) algorithm.
For the original contest problem, it is still too slow as R-L+1 is as large as 10^18, so we need another tricks.
Note that actually by the triangle inequality, for a pair (a_i-1, a_i), indeed we can find a range of x which can form triangle with this pair:
a_i-1 + a_i > x > a_i - a_i-1 (By above 1 & 2)
For example, (4,5) can form triangle with all 1 < x < 9
So instead of iterating all x in [L,R], we try to find range of x for each pair, which can be done in O(N) as we know the range for x of a pair in O(1). Beware that x must fall in the range of [L,R].
After that we have O(N) ranges / segments of x which then we can union them and the size of the result set is the desired answer.
Union O(N) segments can be done easily in O(N) as well, I leave you as a homework :)
Combine both tricks, the algorithm is O(N lg N + N + N) = O(N lg N)

Generate a random integer from 0 to N-1 which is not in the list

You are given N and an int K[].
The task at hand is to generate a equal probabilistic random number between 0 to N-1 which doesn't exist in K.
N is strictly a integer >= 0.
And K.length is < N-1. And 0 <= K[i] <= N-1. Also assume K is sorted and each element of K is unique.
You are given a function uniformRand(int M) which generates uniform random number in the range 0 to M-1 And assume this functions's complexity is O(1).
Example:
N = 7
K = {0, 1, 5}
the function should return any random number { 2, 3, 4, 6 } with equal
probability.
I could get a O(N) solution for this : First generate a random number between 0 to N - K.length. And map the thus generated random number to a number not in K. The second step will take the complexity to O(N). Can it be done better in may be O(log N) ?
You can use the fact that all the numbers in K[] are between 0 and N-1 and they are distinct.
For your example case, you generate a random number from 0 to 3. Say you get a random number r. Now you conduct binary search on the array K[].
Initialize i = K.length/2.
Find K[i] - i. This will give you the number of numbers missing from the array in the range 0 to i.
For example K[2] = 5. So 3 elements are missing from K[0] to K[2] (2,3,4)
Hence you can decide whether you have to conduct the remaining search in the first part of array K or the next part. This is because you know r.
This search will give you a complexity of log(K.length)
EDIT: For example,
N = 7
K = {0, 1, 4} // modified the array to clarify the algorithm steps.
the function should return any random number { 2, 3, 5, 6 } with equal probability.
Random number generated between 0 and N-K.length = random{0-3}. Say we get 3. Hence we require the 4th missing number in array K.
Conduct binary search on array K[].
Initial i = K.length/2 = 1.
Now we see K[1] - 1 = 0. Hence no number is missing upto i = 1. Hence we search on the latter part of the array.
Now i = 2. K[2] - 2 = 4 - 2 = 2. Hence there are 2 missing numbers up to index i = 2. But we need the 4th missing element. So we again have to search in the latter part of the array.
Now we reach an empty array. What should we do now? If we reach an empty array between say K[j] & K[j+1] then it simply means that all elements between K[j] and K[j+1] are missing from the array K.
Hence all elements above K[2] are missing from the array, namely 5 and 6. We need the 4th element out of which we have already discarded 2 elements. Hence we will choose the second element which is 6.
Binary search.
The basic algorithm:
(not quite the same as the other answer - the number is only generated at the end)
Start in the middle of K.
By looking at the current value and it's index, we can determine the number of pickable numbers (numbers not in K) to the left.
Similarly, by including N, we can determine the number of pickable numbers to the right.
Now randomly go either left or right, weighted based on the count of pickable numbers on each side.
Repeat in the chosen subarray until the subarray is empty.
Then generate a random number in the range consisting of the numbers before and after the subarray in the array.
The running time would be O(log |K|), and, since |K| < N-1, O(log N).
The exact mathematics for number counts and weights can be derived from the example below.
Extension with K containing a bigger range:
Now let's say (for enrichment purposes) K can also contain values N or larger.
Then, instead of starting with the entire K, we start with a subarray up to position min(N, |K|), and start in the middle of that.
It's easy to see that the N-th position in K (if one exists) will be >= N, so this chosen range includes any possible number we can generate.
From here, we need to do a binary search for N (which would give us a point where all values to the left are < N, even if N could not be found) (the above algorithm doesn't deal with K containing values greater than N).
Then we just run the algorithm as above with the subarray ending at the last value < N.
The running time would be O(log N), or, more specifically, O(log min(N, |K|)).
Example:
N = 10
K = {0, 1, 4, 5, 8}
So we start in the middle - 4.
Given that we're at index 2, we know there are 2 elements to the left, and the value is 4, so there are 4 - 2 = 2 pickable values to the left.
Similarly, there are 10 - (4+1) - 2 = 3 pickable values to the right.
So now we go left with probability 2/(2+3) and right with probability 3/(2+3).
Let's say we went right, and our next middle value is 5.
We are at the first position in this subarray, and the previous value is 4, so we have 5 - (4+1) = 0 pickable values to the left.
And there are 10 - (5+1) - 1 = 3 pickable values to the right.
We can't go left (0 probability). If we go right, our next middle value would be 8.
There would be 2 pickable values to the left, and 1 to the right.
If we go left, we'd have an empty subarray.
So then we'd generate a number between 5 and 8, which would be 6 or 7 with equal probability.
This can be solved by basically solving this:
Find the rth smallest number not in the given array, K, subject to
conditions in the question.
For that consider the implicit array D, defined by
D[i] = K[i] - i for 0 <= i < L, where L is length of K
We also set D[-1] = 0 and D[L] = N
We also define K[-1] = 0.
Note, we don't actually need to construct D. Also note that D is sorted (and all elements non-negative), as the numbers in K[] are unique and increasing.
Now we make the following claim:
CLAIM: To find the rth smallest number not in K[], we need to find right most occurrence of r' in D (which occurs at position defined by j), where r' is the largest number in D, which is < r. Such an r' exists, because D[-1] = 0. Once we find such an r' (and j), the number we are looking for is r-r' + K[j].
Proof: Basically the definition of r' and j tells us that there are exactlyr' numbers missing from 0 to K[j], and more than r numbers missing from 0 to K[j+1]. Thus all the numbers from K[j]+1 to K[j+1]-1 are missing (and these missing are at least r-r' in number), and the number we seek is among them, given by K[j] + r-r'.
Algorithm:
In order to find (r',j) all we need to do is a (modified) binary search for r in D, where we keep moving to the left even if we find r in the array.
This is an O(log K) algorithm.
If you are running this many times, it probably pays to speed up your generation operation: O(log N) time just isn't acceptable.
Make an empty array G. Starting at zero, count upwards while progressing through the values of K. If a value isn't in K add it to G. If it is in K don't add it and progress your K pointer. (This relies on K being sorted.)
Now you have an array G which has only acceptable numbers.
Use your random number generator to choose a value from G.
This requires O(N) preparatory work and each generation happens in O(1) time. After N look-ups the amortized time of all operations is O(1).
A Python mock-up:
import random
class PRNG:
def __init__(self, K,N):
self.G = []
kptr = 0
for i in range(N):
if kptr<len(K) and K[kptr]==i:
kptr+=1
else:
self.G.append(i)
def getRand(self):
rn = random.randint(0,len(self.G)-1)
return self.G[rn]
prng=PRNG( [0,1,5], 7)
for i in range(20):
print prng.getRand()

Resources