average number of swaps in dutch national flag - algorithm

I want to know how to compute the average number of swaps performed by the two-color Dutch national flag algorithm, used here to separate negative and positive numbers instead of colors. I'm assuming that the array contains as many negative numbers as positive ones and that the numbers are arranged in a random order; I'm not sure whether this assumption is correct.
Algorithm(A[0…n-1]):
    i ← 0
    j ← n - 1
    while i ≤ j:
        if A[i] < 0:
            i ← i + 1
        else:
            swap(A[i], A[j])
            j ← j - 1
Thank you.

If each element is independently positive or negative with equal probability, the first element is positive with probability 1/2. After the first iteration, the unprocessed subarray is one element shorter and its elements are still uniformly distributed (moving an element does not change its distribution).
There are exactly n iterations before the subarray is empty, so the average number of swaps is n/2. More precisely, the number of swaps follows a binomial distribution with parameters n and 1/2 (the iterations form a sequence of Bernoulli trials).
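The claim can be checked empirically. The sketch below is only an illustration: the names `count_swaps` and `average_swaps` are made up here, and each sign is drawn independently with probability 1/2. Every element is examined exactly once and is swapped exactly when it is non-negative, so the swap count equals the number of non-negative elements. If the array is instead known to hold exactly n/2 positive values, as the question assumes, the count is exactly n/2 rather than random.

```python
import random


def count_swaps(values):
    """Run the two-color partition from the question; return (swaps, result)."""
    a = list(values)
    i, j = 0, len(a) - 1
    swaps = 0
    while i <= j:
        if a[i] < 0:
            i += 1
        else:
            # A non-negative element is moved to the right end (possibly onto itself).
            a[i], a[j] = a[j], a[i]
            j -= 1
            swaps += 1
    return swaps, a


def average_swaps(n, trials, seed=0):
    """Estimate the mean swap count for arrays of n independent random signs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        values = [rng.choice((-1, 1)) for _ in range(n)]
        total += count_swaps(values)[0]
    return total / trials


if __name__ == "__main__":
    print(count_swaps([-1, 2, -3, 4]))
    print(average_swaps(100, 2000))  # should be close to 100 / 2
```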

Related

How many times variable m is updated

Given the following pseudo-code, the question is how many times on average is the variable m being updated.
A[1...n]: array with n random elements
m = A[1]
for i = 2 to n do
    if A[i] < m then m = A[i]
end for
One might answer that, since all elements are random, the variable is updated on average in half of the iterations of the for loop, plus one for the initialization.
However, I suspect that there must be a better (and possibly the only correct) way to prove it, using the binomial distribution with p = 1/2. This way, the average number of updates of m would be
$M = 1 + \sum_{k=1}^{n-1} k \, C_{n,k} \, p^{k} (1-p)^{n-k}$
where $C_{n,k}$ is the binomial coefficient. I tried to evaluate this sum but got stuck after a few steps.
Could someone explain which of the two answers is correct and, if it is the second one, show me how to calculate M?
Thank you for your time
Assuming the elements of the array are distinct, the expected number of updates of m is the nth harmonic number, Hn, which is the sum of 1/k for k ranging from 1 to n.
The summation formula can also be represented by the recursion:
H1 = 1
Hn = Hn−1 + 1/n (n > 1)
It's easy to see that the recursion corresponds to the problem.
Consider all permutations of n−1 numbers, and assume that the expected number of assignments is Hn−1. Now, every permutation of n numbers consists of a permutation of n−1 numbers with a new largest number inserted at one of n possible insertion points: either at the beginning, or after one of the n−1 existing values. Since it is larger than every number in the existing series, it is assigned to m only if it was inserted at the beginning, and it never changes which of the other values are assigned. That has a probability of 1/n, and so the expected number of assignments for a permutation of n numbers is Hn−1 + 1/n.
Since the expected number of assignments for a vector of length one is obviously 1, which is H1, we have an inductive proof of the recursion.
Hn is asymptotically equal to ln n + γ, where γ is the Euler-Mascheroni constant, approximately 0.577. So it increases without limit, but quite slowly.
The values for which m is updated are called left-to-right minima (the mirror image of left-to-right maxima, also known as records), and you'll probably find more information about them by searching for those terms.
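To get a feel for how slowly Hn grows, here is a small numerical sketch comparing it with ln n + γ. The helper name `harmonic` and the hard-coded constant `EULER_GAMMA` are introduced here purely for illustration.

```python
import math

# Euler-Mascheroni constant, hard-coded to double precision.
EULER_GAMMA = 0.5772156649015329


def harmonic(n):
    """Return H_n = 1 + 1/2 + ... + 1/n as a float."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10**6):
        approx = math.log(n) + EULER_GAMMA
        # The gap between the two columns shrinks as n grows.
        print(n, harmonic(n), approx)
```

Even for an array of a million distinct random elements, the expected number of assignments stays below 15.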
I liked @rici's answer, so I decided to elaborate on its central argument to make it clearer to me.
Let H[k] be the expected number of assignments needed to compute the min m of an array of length k, as in the algorithm under consideration. We know that
H[1] = 1.
Now assume we have an array of length n > 1. The min is either in the last position of the array or not. It is in the last position with probability 1/n, and elsewhere with probability 1 - 1/n. In the first case the expected number of assignments is H[n-1] + 1. In the second, H[n-1].
If we multiply the expected number of assignments in each case by its probability and sum, we get
H[n] = (H[n-1] + 1)*1/n + H[n-1]*(1 - 1/n)
     = H[n-1]*1/n + 1/n + H[n-1] - H[n-1]*1/n
     = 1/n + H[n-1]
which establishes the recursion.
Note that the argument requires the min to be either in the last position or among the first n-1 positions, but not both. This is where we use the assumption that all the elements of the array are different.
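For small n the result can be verified exactly by enumerating every ordering of n distinct values. The sketch below uses illustrative helper names (`count_updates`, `expected_updates`, `harmonic`) and counts the initial assignment as one update, as in the answers above.

```python
from fractions import Fraction
from itertools import permutations


def count_updates(a):
    """Count assignments to m, including the initial one, for the min-finding loop."""
    m = a[0]
    updates = 1
    for x in a[1:]:
        if x < m:
            m = x
            updates += 1
    return updates


def expected_updates(n):
    """Exact average of count_updates over all n! orderings of n distinct values."""
    perms = list(permutations(range(n)))
    return Fraction(sum(count_updates(p) for p in perms), len(perms))


def harmonic(n):
    """Exact n-th harmonic number as a Fraction."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, expected_updates(n), harmonic(n))
```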

Get random values without replacement given a probability distribution

Given a probability distribution (a mapping of objects to their probabilities), I want an algorithm that selects random objects from the map without replacement (the probability distribution is updated after each selection). The algorithm must also use O(1) space and produce high-quality randomness. I searched for implementations, but none of them seemed to have both of these properties.
EDIT:
Probability without replacement:
You have a bag of objects, and each object has a probability of being selected. Once you select an object, you remove it from the bag. All remaining objects now have a different probability of being selected.
With O(1) space complexity, we are not storing a list with objects repeated according to their probability of being selected. Instead, we are only storing a probability distribution and iterating over a permutation (but not storing that permutation).
I would try a variation of the Fisher-Yates-Knuth shuffle (Durstenfeld's implementation uses O(1) extra space).
Original:
    for i from 0 to n − 1 do
        j ← random integer such that 0 ≤ j ≤ i
        if j ≠ i
            a[i] ← a[j]
        a[j] ← source[i]
Modified to fulfill requirements:
    for i from 0 to n − 1 do
        p ← probabilities(n-i)
        j ← random integer via probabilities(n-i) such that 0 ≤ j ≤ i
        if j ≠ i
            a[i] ← a[j]
        a[j] ← source[i]
So at each step you would update probabilities and use them to sample index. After that it's just FYK shuffle.
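One possible reading of this idea, sketched in Python under stated assumptions: the weights are non-negative numbers stored in a list parallel to the items, both lists may be reordered in place, and the function name `weighted_shuffle_in_place` is hypothetical. It is a selection-style in-place variant rather than the inside-out form quoted above: at step i it draws one of the remaining items with probability proportional to its weight and swaps it into position i. It uses O(1) extra space but O(n) time per draw.

```python
import random


def weighted_shuffle_in_place(items, weights, rng=None):
    """Reorder items so that items[0], items[1], ... is a weighted draw without replacement."""
    if rng is None:
        rng = random.Random()
    n = len(items)
    if len(weights) != n:
        raise ValueError("items and weights must have the same length")
    for i in range(n):
        # Total weight of the items not yet drawn (positions i .. n-1).
        total = 0.0
        for t in range(i, n):
            total += weights[t]
        if total <= 0:
            break  # only zero-weight items remain; leave them in their current order
        r = rng.random() * total
        acc = 0.0
        j = n - 1  # fallback in case floating-point rounding prevents a match
        for t in range(i, n):
            acc += weights[t]
            if r < acc:
                j = t
                break
        # Move the drawn item (and its weight) to the boundary of the drawn prefix.
        items[i], items[j] = items[j], items[i]
        weights[i], weights[j] = weights[j], weights[i]
    return items


if __name__ == "__main__":
    print(weighted_shuffle_in_place(["a", "b", "c"], [1.0, 5.0, 2.0], random.Random(0)))
```

The quality of the randomness depends entirely on the generator passed as `rng`; Python's default Mersenne Twister is not suitable for cryptographic use.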

algorithm proof - building least number after deleting k digits from an n-digit number

Problem: given an n-digit number, which k (k < n) digits should be deleted so that the remaining number is the smallest possible (the relative order of the remaining digits must not change)? E.g. deleting 2 digits from '24635', the smallest remaining number is '235'.
A solution: delete the first digit (from left to right) that is strictly larger than its right neighbor, or the last digit if there is no such digit. Repeat this procedure k times. (See codecareer for reference. There are other solutions, such as those on geeksforgeeks and stackoverflow, but I find the one described here more intuitive, so I prefer it.)
The problem now is how to prove that the solution above is correct, i.e. how making the number smallest after each single deletion guarantees that the final number is the smallest.
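Before looking at proofs, the greedy rule can be checked against brute force on small inputs. In the sketch below the function names are illustrative, numbers are handled as digit strings, and the comparison with the right neighbor is strict. If equal neighbors also triggered a deletion, the rule would fail on an input such as '1123', where it would produce '123' instead of '112'.

```python
from itertools import combinations


def delete_one_digit(num: str) -> str:
    """Delete the first digit strictly larger than its right neighbor, else the last digit."""
    for i in range(len(num) - 1):
        if num[i] > num[i + 1]:
            return num[:i] + num[i + 1:]
    return num[:-1]


def smallest_after_deleting(num: str, k: int) -> str:
    """Apply the single-digit greedy rule k times."""
    for _ in range(k):
        num = delete_one_digit(num)
    return num


def brute_force(num: str, k: int) -> str:
    """Try every way of keeping len(num) - k digits in their original order."""
    keep = len(num) - k
    # All candidates have the same length, so string order equals numeric order.
    return min("".join(c) for c in combinations(num, keep))


if __name__ == "__main__":
    print(smallest_after_deleting("24635", 2))
    print(brute_force("24635", 2))
```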
Suppose k = 1.
Let $m = \sum_{i=0}^{n} a_i b^i$ be an $(n+1)$-digit number $a_n a_{n-1} \dots a_1 a_0$ in base $b$, i.e. $0 \le a_i < b$ for all $0 \le i \le n$ (e.g. $b = 10$).
Proof
Suppose there exists $j > 0$ with $a_j > a_{j-1}$, and let $j$ be maximal.
This means $a_j$ is the last digit of a (not necessarily strictly) increasing sequence of consecutive digits $a_n, a_{n-1}, \dots, a_j$.
The digit $a_j$ is now removed from the number, and the resulting number $m'$ has the value
$m' = \sum_{i=0}^{j-1} a_i b^i + \sum_{i=j+1}^{n} a_i b^{i-1}$
The aim of this reduction is to maximize the difference $m - m'$. So let's take a look:
$m - m' = \sum_{i=0}^{n} a_i b^i - \left( \sum_{i=0}^{j-1} a_i b^i + \sum_{i=j+1}^{n} a_i b^{i-1} \right)$
$= a_j b^j + \sum_{i=j+1}^{n} \left( a_i b^i - a_i b^{i-1} \right)$
$= a_n b^n + \sum_{i=j}^{n-1} (a_i - a_{i+1}) b^i$
Can there be a better choice of $j$ that gives a bigger difference?
Since $a_n \dots a_j$ is an increasing subsequence, $a_i - a_{i+1} \ge 0$ holds for $j \le i < n$. So by choosing $j' > j$ instead of $j$, you get zeros where you now have non-negative terms, i.e. the difference does not get bigger, and it gets smaller if there exists an $i$ with $j \le i < j'$ and $a_{i+1} < a_i$ (strictly smaller).
$j$ is supposed to be maximal, and $a_{j-1} - a_j < 0$. We know
$b^{j-1} > \sum_{i=0}^{j-2} (b-1) b^i = b^{j-1} - 1$
This means that if we choose $j' < j$, we get a negative contribution to the difference, so it does not get bigger either.
If there is no $j > 0$ with $a_j > a_{j-1}$, the above proof works for $j = 0$.
What is left to do?
This is only the proof that your algorithm works for k = 1.
It is possible to extend the above proof to multiple subsequences of (not necessarily strictly) increasing digits. It's exactly the same proof but much less readable, due to the number of indices you need.
Maybe you can also use induction, since there are no interactions between the digits (such as one choice blocking later ones).
Here is a simple argument that your algorithm works for any k. Suppose there is a digit in the mth place that is less than or equal to its right neighbor, the (m+1)th digit, and you delete the mth digit but not the (m+1)th. Then you can delete the (m+1)th digit instead of the mth, and you will get an answer less than or equal to your original answer.
Notice: this proof is for building the maximum number after removing k digits, but the reasoning is similar.
Key lemma: the maximum (m + 1)-digit number contains the maximum m-digit number, for all m = 0, 1, ..., n - 1.
Proof:
Greedy solution for deleting one digit from a number to get the maximum result: delete the first digit whose next digit is greater than it, or the last digit if the digits are in non-ascending order. This is very easy to prove.
We prove the lemma by contradiction.
Write S(m) for the maximum m-digit number. Suppose the lemma first fails at m = k, so S(k) ⊄ S(k + 1). Notice that S(k) ⊂ S(n), as the initial number contains all the optimal sub-numbers, so there must exist an x such that S(k) ⊂ S(x) and S(k) ⊄ S(x - 1), with k + 2 <= x <= n.
We use the greedy solution above to delete a single digit S[X][y] from S(x) to get S(x - 1), so S[X][y] ∈ S(x) and S[X][y] ∉ S(x - 1), and S(k) must contain it. We now show by contradiction that S(k) does not need to contain this digit.
According to our greedy solution, all digits from the beginning up to S[X][y] are in non-ascending order.
If S[X][y] is at the tail, then S(k) can be the first k digits of S(x) ---> contradiction!
Otherwise, we first know that all digits in S[X][1, 2, ..., y] are in S(k): if there were an S[X][z] not in S(k), with 1 <= z <= y - 1, then we could shift the digits of S(k) that lie in the range S[X][z + 1, y] one position to the left to get a greater or equal S(k). Therefore, there are at least 2 digits after S[X][y] that are not in S(k), as x >= k + 2. Then we can follow the prefix of S(k) up to S[X][y], but instead of using S[X][y] we continue from S[X][y + 1]. As S[X][y + 1] > S[X][y], we can build a greater S(k) -------> contradiction!
So the lemma is proved. If we have S(m + 1), and we know that S(m + 1) contains S(m), then S(m) must be the maximum number obtained by removing one digit from S(m + 1).

Finding sub-array sum in an integer array

Given an array of n positive integers. It has n*(n+1)/2 sub-arrays, including single-element sub-arrays. Each sub-array has a sum S. Finding the sums of all sub-arrays is obviously O(n^2), as the number of sub-arrays is O(n^2). Many of the sums may be repeated. Is there any way to find the count of all distinct sums (not the exact values of the sums, only the count) in O(n log n)?
I tried an approach but got stuck along the way. I iterated over the array from index 1 to n.
Say a[i] is the given array. For each index i, a[i] is added to all the sums that end at a[i-1], and it also counts by itself as a single-element sum. But a duplicate emerges if, among the sums that end at a[i-1], the difference of two sums is a[i]. That is, say sums Sp and Sq end at a[i-1] and their difference is a[i]. Then Sp + a[i] equals Sq, giving Sq as a duplicate.
Say C[i] is the count of the distinct sums that end at a[i].
So C[i] = C[i-1] + 1 - (number of pairs of sums ending at a[i-1] whose difference is a[i]).
But the problem is to find the number of such pairs in O(log n). Please give me a hint about this, or, if I am on the wrong track and a completely different approach is required, point that out.
When S, the sum of all the array elements, is not too large, we can count the distinct sums with one (fast) polynomial multiplication. When S is larger, n is hopefully small enough to use a quadratic algorithm.
Let x_1, x_2, ..., x_n be the array elements. Let y_0 = 0 and y_i = x_1 + x_2 + ... + x_i. Let P(z) = z^{y_0} + z^{y_1} + ... + z^{y_n}. Compute the product of polynomials P(z) * P(z^{-1}); the coefficient of z^k with k > 0 is nonzero if and only if k is a sub-array sum, so we just have to read off the number of nonzero coefficients of positive powers. The powers of z, moreover, range from -S to S, so the multiplication takes time on the order of S log S.
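A dependency-free sketch of this idea follows. It makes one substitution that should be stated plainly: instead of an FFT, the polynomial product is computed by packing the coefficients into Python big integers (Kronecker substitution). Python's built-in integer multiplication is sub-quadratic but not S log S, and the readout loop here is not optimized, so this illustrates correctness rather than the stated running time. The function names are made up for the example, and P(z^{-1}) is shifted by z^S so that all exponents are non-negative.

```python
def count_distinct_subarray_sums(xs):
    """Count distinct sums of contiguous sub-arrays of positive integers."""
    if any(x <= 0 for x in xs):
        raise ValueError("all elements must be positive integers")
    n = len(xs)
    prefix = [0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    total = prefix[-1]
    # Each coefficient of the product is at most n + 1, so this many bits
    # per coefficient guarantees that no carry spills into a neighbor.
    width = (n + 1).bit_length()
    p = sum(1 << (width * y) for y in prefix)            # P(z)
    q = sum(1 << (width * (total - y)) for y in prefix)  # z^total * P(1/z)
    product = p * q
    mask = (1 << width) - 1
    count = 0
    for k in range(1, total + 1):
        # Coefficient of z^(total + k) is the number of pairs with y_i - y_j = k.
        if (product >> (width * (total + k))) & mask:
            count += 1
    return count


def count_distinct_subarray_sums_quadratic(xs):
    """Reference implementation: collect all O(n^2) sums in a set."""
    prefix = [0]
    for x in xs:
        prefix.append(prefix[-1] + x)
    sums = {prefix[j] - prefix[i]
            for i in range(len(prefix)) for j in range(i + 1, len(prefix))}
    return len(sums)


if __name__ == "__main__":
    print(count_distinct_subarray_sums([1, 2, 3]))
    print(count_distinct_subarray_sums_quadratic([1, 2, 3]))
```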
You can look at the sub-arrays as a kind of tree, in the sense that sub-array [0,3] can be divided into [0,1] and [2,3].
So build up a tree whose nodes are identified by the length of the sub-array and its starting offset in the original array, and whenever you compute a sub-array sum, store the result in this tree.
When computing a sub-array sum, you can check this tree for existing pre-computed values.
Also, when dividing, parts of the array can be computed on different CPU cores, if that matters.
This solution assumes that you don't need all values at once, but rather ad hoc.
If you do need them all at once, there could be some smarter solution.
Also, I assume that we're talking about element counts in the tens of thousands and more. Otherwise, such work is a nice exercise but doesn't have much practical value.

Sum of last k digits same as sum of first k digits

I want to find whether the sum of the first k digits of the numbers in a given range is equal to the sum of their last k digits. The range is very large and k is less than 20.
One way to do this is by brute force. Can someone suggest a more efficient algorithm?
If it is a range, the first digits will not change often and the last digits will change in a simple way. Let S be the sum of the first 20 digits. As long as the second-to-last digit doesn't change, the sum of the last digits increases by one when you go to the next number. So if all your digits except the last one are fixed, and the sum of the last digits when the last digit equals i is Si, the only valid last digit is n = S - Si + i. You then have to check that n is between 0 and 9, and that the resulting number is in the interval. This reduces the number of lookups by a factor of ten.
You can then do the same check for the next lower digit, the second-to-last.
If the first n is lower than 0, you need to decrease the second-to-last digit by -n. Call n2 the resulting second-to-last digit. If n2 >= 0, the valid numbers will end in (n2,0), (n2-1,1), ..., (0,n2). This reduces the complexity by a factor of 100.
If n is greater than 9, you increase the second-to-last digit by n-9. Call n2 the resulting second-to-last digit. If n2 <= 9, the valid numbers end in (n2,9), (n2-1,8), ..., (0,something).
This also reduces the complexity by a factor of 100.
You can do the same for the third digit, then the fourth, up to the 20th. This results in just one sum, and a complexity of O(number of solutions), so it is minimal. When coding, be careful that your first digits can change. Do one computation per group of 20 first digits.
One theoretical improvement to the brute force method:
1) sum up the first k digits, store in sumFirst
2) sum up the last k digits, but stop if the sum exceeds sumFirst.
Point 2 could save summing up some of the last few digits.
But you have to measure whether the additional logic costs more than simply adding all k digits.
Optimization N-k
One way to improve the algorithm applies when the number, having N digits, satisfies N < 2k.
For instance, if N = 5 and k = 3 (5 < 2×3), the digits being
abcde
you only have to compare ab against de (i.e. there is no need to check all k (3) digits, since the 3rd is shared by the k first and the k last digits). In other words, the number of digits to be summed on each side is only
min(k, N-k), given N >= k
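A minimal sketch of this overlap trick, assuming the number is given as a string of decimal digits; the function names are illustrative. Digits that belong to both the first k and the last k positions contribute equally to both sums and can be skipped.

```python
def first_last_k_equal(digits: str, k: int) -> bool:
    """Compare the sum of the first k digits with the sum of the last k digits."""
    n = len(digits)
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= number of digits")
    # Digits shared by both windows cancel, so only min(k, n - k) per side matter.
    m = min(k, n - k)
    return sum(map(int, digits[:m])) == sum(map(int, digits[n - m:]))


def first_last_k_equal_naive(digits: str, k: int) -> bool:
    """Reference version that sums all k digits on each side."""
    n = len(digits)
    return sum(map(int, digits[:k])) == sum(map(int, digits[n - k:]))


if __name__ == "__main__":
    print(first_last_k_equal("12321", 3))
    print(first_last_k_equal("12345", 3))
```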
If you are going to run that check multiple times on the same array, you can replace each element with the sum of itself and all previous elements, which is O(n) where n is the size of the array, i.e.
for (int i = 1; i < n; i++)
    arr[i] = arr[i] + arr[i-1];
This converts your array from a probability density function to a cumulative distribution function, so to speak (for discrete numbers). Each query is then O(1), i.e.
// arr[k-1] is the sum of the first k elements;
// arr[n-1] - arr[n-k-1] is the sum of the last k elements (for k < n)
if (arr[k-1] == arr[n-1] - (k < n ? arr[n-k-1] : 0))
    return true;
return false;
Another improvement over the brute force:
i = 0, T = 0
while i < k and |T| ≤ 9 * (k - i)
    T = T + last[i] - first[i]
    i = i + 1
return T == 0
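The early-exit loop can be sketched in Python as follows, assuming `first` and `last` are equal-length lists of digits (0 to 9); the function name is made up for the example. The running difference T can change by at most 9 per remaining digit, so the loop may stop only when |T| is strictly greater than 9 times the number of digits left; when |T| equals that bound, the sums can still come out equal.

```python
from itertools import product


def digit_sums_match(first, last):
    """Return True if sum(first) == sum(last), stopping early when equality is impossible."""
    k = len(first)
    if len(last) != k:
        raise ValueError("first and last must have the same length")
    t = 0  # running value of sum(last[:i]) - sum(first[:i])
    for i in range(k):
        if abs(t) > 9 * (k - i):
            return False  # the k - i remaining digit pairs cannot bring t back to 0
        t += last[i] - first[i]
    return t == 0


if __name__ == "__main__":
    print(digit_sums_match([9, 0], [0, 9]))
    print(digit_sums_match([1, 2, 3], [3, 3, 3]))
    # Exhaustive agreement with the plain comparison for k = 2.
    print(all(digit_sums_match([a, b], [c, d]) == (a + b == c + d)
              for a, b, c, d in product(range(10), repeat=4)))
```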
