Probability based on quicksort partition - algorithm

I have come across this question:
Let 0<α<.5 be some constant (independent of the input array length n). Recall the Partition subroutine employed by the QuickSort algorithm, as explained in lecture. What is the probability that, with a randomly chosen pivot element, the Partition subroutine produces a split in which the size of the smaller of the two subarrays is ≥α times the size of the original array?
The answer given is 1 - 2α.
Can anyone explain how this answer is derived? Please help.

The choice of the pivot element is random, with uniform distribution.
There are N elements in the array, and we will assume that N is large (or we won't get the answer we want).
If 0 ≤ α ≤ 1, the probability that the number of elements smaller than the pivot is less than αN is α. The probability that the number of elements greater than the pivot is less than αN is the same. If α ≤ 1/2, then these two possibilities are mutually exclusive.
To say that the smaller subarray is of length ≥ αN is to say that neither of these conditions holds; therefore the probability is 1 - 2α.
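If the algebra feels abstract, a quick Monte Carlo check makes the 1 - 2α answer easy to believe. Below is a small sketch in Java (the array length, α and trial count are arbitrary illustrative choices, and the class and variable names are mine):

import java.util.Random;

public class PartitionSplitProbability {
    public static void main(String[] args) {
        int n = 1000;            // array length (illustrative choice)
        double alpha = 0.25;     // any constant with 0 < alpha < 0.5
        int trials = 1_000_000;
        Random rng = new Random();

        int good = 0;
        for (int t = 0; t < trials; t++) {
            // Choosing the pivot uniformly at random is the same as choosing its rank
            // (the number of elements smaller than it) uniformly from 0 .. n - 1.
            int rank = rng.nextInt(n);
            int smaller = Math.min(rank, n - 1 - rank);  // size of the smaller subarray
            if (smaller >= alpha * n) good++;
        }

        System.out.printf("empirical: %.4f, predicted 1 - 2a: %.4f%n",
                (double) good / trials, 1 - 2 * alpha);
    }
}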

The other answers didn't quite click with me so here's another take:
If the smaller of the two subarrays must have size ≥ αn, you can deduce that the pivot must land at position ≥ αn (counting positions in sorted order). This is obvious by contradiction: if the pivot lands at a position < αn, then there is a subarray smaller than αn. By the same reasoning the pivot must also land at position ≤ (1 - α)n; any larger value for the pivot will yield a subarray smaller than αn on the "right hand side".
This means that αn ≤ pivot position ≤ (1 - α)n.
What we want to calculate then is the probability of that event (call it A), i.e. Pr(αn ≤ pivot position ≤ (1 - α)n).
The way we calculate the probability of an event is to sum the probabilities of its constituent outcomes, i.e. of the pivot landing at position αn, αn + 1, ..., (1 - α)n.
That sum is expressed as:
Pr(A) = Σ 1/n, summed over the positions from αn to (1 - α)n
Which easily simplifies to:
Pr(A) = ((1 - α)n - αn) * 1/n
With some cancellation we get:
Pr(A) = 1 - 2α

Just one more approach to solving the problem (for those who have a hard time understanding it, like I did).
First.
Since we are talking about "the smaller of the two subarrays", its length is at most 1/2 * n (n being the number of elements in the original array).
Second.
If 0 < a < 0.5, it means that a * n is less than 1/2 * n as well.
And thus from now on we are talking about two integers bounded by 0 at lowest and 1/2 * n at highest: the fixed threshold a * n and the randomly chosen size of the smaller subarray.
Third.
Let's imagine a die with numbers from 1 to 6 on its sides. Let's choose a number from 1 to 6, for example 4. Now roll the die. Each number has a probability 1/6 of being the outcome of this roll. Thus for the event "outcome is less than or equal to 4" we have a probability equal to the sum of the probabilities of each of these outcomes. And we have the numbers 1, 2, 3 and 4. Altogether p(x <= 4) = 4 * 1/6 = 4/6 = 2/3. So the probability of the event "outcome is bigger than 4" is p(x > 4) = 1 - p(x <= 4) = 1 - 2/3 = 1/3.
Fourth.
Let's go back to our problem. The "chosen number" is now a * n. And we are going to roll a die with the numbers from 0 to (1/2 * n) on it to get k, the number of elements in the smaller of the two subarrays. The probability that the outcome is bounded by (a * n) at highest is equal to the sum of the probabilities of all outcomes from 0 to (a * n). And the probability of any particular outcome k is p(k) = 1 / (1/2 * n).
Therefore p(k <= a * n) = (a * n) * (1 / (1/2 * n)) = 2 * a.
From this we can easily conclude that p(k > a * n) = 1 - p(k <= a * n) = 1 - 2 * a.

Array length is n.
For the smaller subarray to have length >= αn, the pivot should be greater than at least αn elements. At the same time the pivot should be smaller than at least αn elements (else the smaller subarray will be smaller than required).
So out of n elements we have to select one among n - 2αn = n(1 - 2α) elements.
The required probability is n(1 - 2α)/n.
Hence 1 - 2α

The probability is the number of desired elements divided by the total number of elements.
In this case, ((1 - α)n - αn)/n.
Since α lies between 0 and 0.5, (1 - α) must be bigger than α. Hence the number of elements contained between them is
(1 - α - α)n = (1 - 2α)n
and so the probability is
(1 - 2α)n/n = 1 - 2α

Another approach:
List the "more balanced" options:
αn + 1 to (1 - α)n - 1
αn + 2 to (1 - α)n - 2
...
αn + k to (1 - α)n - k
So k in total. We know that the most balanced is n / 2 to n / 2, so:
αn + k = n / 2 => k = n(1/2 - α)
Similarly, list the "less balanced" options:
αn - 1 to (1 - α)n + 1
αn - 2 to (1 - α)n + 2
...
αn - m to (1 - α)n + m
So m in total. We know that the least balanced is 0 to n so:
αn - m = 0 => m = αn
Since all these options happen with equal probability we can use the frequency definition of probability so:
Pr{More balanced} = (total # of more balanced) / (total # of options) =>
Pr{More balanced} = k / (k + m) = n(1/2 - α) / (n(1/2 - α) + αn) = 1 - 2α

Related

Dividing N items in p groups

You are given N items in total and P groups into which you have to divide them.
The condition is that the product of the numbers of items held by the groups should be maximal.
For example, with N=10 and P=3 you can divide the 10 items into {3,4,3}, since 3x3x4=36 is the maximum possible product.
You will want to form P groups of roughly N / P elements. However, this will not always be possible, as N might not be divisible by P, as is the case for your example.
So form groups of floor(N / P) elements initially. For your example, you'd form:
floor(10 / 3) = 3
=> groups = {3, 3, 3}
Now, take the remainder of the division of N by P:
10 mod 3 = 1
This means you have to distribute 1 more item to your groups (you can have up to P - 1 items left to distribute in general):
for i = 0 up to (N mod P) - 1:
    groups[i]++
=> groups = {4, 3, 3} for your example
Which is also a valid solution.
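In code, the whole procedure is only a few lines; here is a rough Java sketch (method and variable names are my own):

// Splits n items into p groups whose sizes differ by at most one.
static int[] splitIntoGroups(int n, int p) {
    int[] groups = new int[p];
    java.util.Arrays.fill(groups, n / p);   // start every group at floor(N / P)
    for (int i = 0; i < n % p; i++) {       // hand out the N mod P leftover items
        groups[i]++;
    }
    return groups;
}

// splitIntoGroups(10, 3) -> {4, 3, 3}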
For fun I worked out a proof of the fact that in an optimal solution either all numbers = N/P or the numbers are some combination of floor(N/P) and ceiling(N/P). The proof is somewhat long, but proving optimality in a discrete context is seldom trivial. I would be interested if anybody can shorten the proof.
Lemma: For P = 2 the optimal way to divide N is into {N/2, N/2} if N is even and {floor(N/2), ceiling(N/2)} if N is odd.
This follows since the constraint that the two numbers sum to N means that the two numbers are of the form x, N-x.
The resulting product is (N-x)x = Nx - x^2. This is a parabola that opens down. Its max is at its vertex at x = N/2. If N is even this max is an integer. If N is odd, then x = N/2 is a fraction, but such parabolas are strictly unimodal, so the closer x gets to N/2 the larger the product. x = floor(N/2) (or ceiling, it doesn't matter by symmetry) is the closest an integer can get to N/2, hence {floor(N/2),ceiling(N/2)} is optimal for integers.
General case: First of all, a global max exists since there are only finitely many integer partitions and a finite list of numbers always has a max. Suppose that {x_1, x_2, ..., x_P} is globally optimal. Claim: for any i, j we have
|x_i - x_j| <= 1
In other words: any two numbers in an optimal solution differ by at most 1. This follows immediately from the P = 2 lemma (applied to N = x_i + x_j).
From this claim it follows that there are at most two distinct numbers among the x_i. If there is only 1 number, that number is clearly N/P. If there are two numbers, they are of the form a and a+1. Let k = the number of x_i which equal a+1, hence P-k of the x_i = a. Hence
(P-k)a + k(a+1) = N, where k is an integer with 1 <= k < P
But simple algebra yields that a = (N-k)/P = N/P - k/P.
Hence a is an integer less than N/P which differs from N/P by less than 1 (since k/P < 1).
Thus a = floor(N/P) and a+1 = ceiling(N/P).
QED
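If you'd rather sanity-check the claim than read the proof, a small brute-force sketch works for small N and P (all names below are mine; it enumerates every composition of N into P positive parts and compares the best product against the floor/ceiling split):

public class MaxProductPartitionCheck {
    // Best product over all ways to write n as an ordered sum of p positive parts.
    static long bestProduct(int n, int p) {
        if (p == 1) return n;
        long best = 0;
        for (int first = 1; first <= n - (p - 1); first++) {
            best = Math.max(best, first * bestProduct(n - first, p - 1));
        }
        return best;
    }

    // Product of the floor(N/P) / ceiling(N/P) split described above.
    static long floorCeilProduct(int n, int p) {
        long product = 1;
        for (int i = 0; i < p; i++) {
            product *= n / p + (i < n % p ? 1 : 0);
        }
        return product;
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 15; n++) {
            for (int p = 1; p <= n; p++) {
                if (bestProduct(n, p) != floorCeilProduct(n, p)) {
                    System.out.println("mismatch at n=" + n + ", p=" + p);
                }
            }
        }
        System.out.println("check finished");
    }
}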

Reversed Huffman coding

Suppose I have a collection of words with a predefined binary prefix code. Given a very large random binary chunk of data, I can parse this chunk into words using the prefix code.
I want to determine, at least approximately (for random chunks of very large lengths) the expectation values of number of hits for each word (how many times it is mentioned in the decoded text).
At first glance, the problem appears trivial - the probability of each word being scanned from the random pool of bits is completely determined by its length (since each bit can be either 0 or 1). But I suspect this to be an incorrect answer to the problem above since words have different lengths and thus this probability is not the same as the expected number of hits (divided by the length of the data chunk).
UPD: I was asked (in comments below) to state this problem mathematically, so here it goes.
Let w be a list of words written with only zeros and ones (our alphabet consists of only two letters). Furthermore, no word in w is a prefix of any other word. Thus w forms a legitimate binary prefix code. I want to know (at least approximately) the mean value of hits, for each word in w, averaged over all possible binary chunks of data with fixed size n. n can be taken very large, much much larger than any of the lengths of our words. However, words have different lengths and this can not be neglected.
I would appreciate any references to attempts to solve this.
My brief answer: the expected number of hits (or rather the expected proportion of hits) can be calculated for every given list of words.
I will not describe the full algorithm, but just do the following example in detail for illustration: let us fix the following very simple list of three words: 0, 10, 11.
For every n, there are 2^n different data chunks of length n (I mean n bits), each occurring with the same probability 2^(-n).
The first observation is that not all the data chunks can be decoded exactly - e.g. for the data 0101, when you decode, there will remain a single 1 at the end.
Let us write U(n) for the number of length n data chunks that CAN be decoded exactly, and write V(n) for the others (i.e. those with an extra 1 in the end). The following recurrence relations are clear:
U(n) + V(n) = 2^n
V(n) = U(n - 1)
with the initial values U(0) = 1 and V(0) = 0.
A simple calculation then yields:
U(n) = (2^(n + 1) + (- 1)^n) / 3.
Now let A(n) (resp. B(n), C(n)) be the sum of the number of hits on the word 0 (resp. 10, 11) for all the U(n) exact data chunks, and let a(n) (resp. b(n), c(n)) be the same sum for all the V(n) inexact data chunks (the last 1 does not count in this case).
Then we have the following relations:
a(n) = A(n - 1), b(n) = B(n - 1), c(n) = C(n - 1)
A(n) = A(n - 1) + U(n - 1) + A(n - 2) + A(n - 2)
B(n) = B(n - 1) + B(n - 2) + U(n - 2) + B(n - 2)
C(n) = C(n - 1) + C(n - 2) + C(n - 2) + U(n - 2)
Explanation for relations 2, 3 and 4:
If D is an exact data chunk of length n, then there are three possibilities:
D ends with 0, and deleting this 0 yields an exact data chunk of length n - 1;
D ends with 10, and deleting this 10 yields an exact data chunk of length n - 2;
D ends with 11, and deleting this 11 yields an exact data chunk of length n - 2.
Thus, for example, when we sum up all the hit numbers for 0 in all exact data chunks of length n, the contributions of the three cases are respectively A(n - 1) + U(n - 1), A(n - 2), A(n - 2). Similarly for the other two equalities.
Now, solving these recurrence relations, we get:
A(n) = 2/9 * n * 2^n + (smaller terms)
B(n) = C(n) = 1/9 * n * 2^n + (smaller terms)
Since U(n) = 2/3 * 2^n + (smaller terms), our conclusion is that there are approximately n/3 hits on 0, n/6 hits on 10, n/6 hits on 11.
Note that the same proportions hold if we take also the V(n) inexact data chunks into account, because of the relations between A(n), B(n), C(n), U(n) and a(n), b(n), c(n), V(n).
This method generalizes to any list of words. It's the same idea as if you were to solve this problem using dynamic programming: create states, find the recurrence relation, and establish the transition matrix.
To go further
I think the following might also be true, which will simplify the answer further.
Let w_1, ..., w_k be the words in the list, and let l_1, ..., l_k be their lengths.
For every i = 1, ..., k, let a_i be the proportion of hits of w_i, i.e. for length n data chunks the expected number of hits for w_i is a_i * n + (smaller terms).
Then, my feeling (conjecture) is that a_i * 2^(l_i) is the same for all i, i.e. if one word is one bit longer than another, then its hit number is a half of that of the other.
This conjecture, if correct, is probably not very difficult to prove. But I'm too lazy to think now...
If this is true, then we can calculate those a_i very easily, because we have the identity:
sum (a_i * l_i) = 1.
Let me illustrate this with the above example.
We have w_1 = 0, w_2 = 10, w_3 = 11, hence l_1 = 1, l_2 = l_3 = 2.
According to the conjecture, we should have a_1 = 2 * a_2 = 2 * a_3. Thus a_2 = a_3 = x and a_1 = 2x. The above equality becomes:
2x * 1 + x * 2 + x * 2 = 1
Hence x = 1 / 6, and we have a_1 = 1 / 3, a_2 = a_3 = 1 / 6, as can be verified by the above calculation.
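As a numerical sanity check of the recurrences above, the following small sketch (names are mine) evaluates U, A, B and C directly and prints the hit proportions, which tend to 1/3, 1/6 and 1/6 as n grows:

public class PrefixCodeHitCounts {
    public static void main(String[] args) {
        int n = 100;
        // U[k]: number of exactly decodable chunks of length k for the code {0, 10, 11}.
        // A[k], B[k], C[k]: total hits of 0, 10, 11 summed over all those chunks.
        double[] U = new double[n + 1], A = new double[n + 1],
                 B = new double[n + 1], C = new double[n + 1];
        U[0] = 1; U[1] = 1;   // "" and "0" decode exactly
        A[1] = 1;             // "0" contains one hit of the word 0
        for (int k = 2; k <= n; k++) {
            U[k] = Math.pow(2, k) - U[k - 1];            // from U(k) + V(k) = 2^k and V(k) = U(k - 1)
            A[k] = A[k - 1] + U[k - 1] + 2 * A[k - 2];
            B[k] = B[k - 1] + 2 * B[k - 2] + U[k - 2];
            C[k] = C[k - 1] + 2 * C[k - 2] + U[k - 2];
        }
        // Average hits per chunk divided by n: approaches 1/3, 1/6, 1/6.
        System.out.printf("0: %.4f, 10: %.4f, 11: %.4f%n",
                A[n] / U[n] / n, B[n] / U[n] / n, C[n] / U[n] / n);
    }
}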
Let's make a simple machine that can recognize words: a DFA with an accepting state for each word. To construct this DFA, start with a binary tree with each left-child-edge labeled 0 and each right-child-edge labeled 1. Each leaf is either a word-accepter (if the path to that leaf down the tree is the word's spelling) or is garbage (a string of letters that isn't a prefix for any valid word). We wire up "restart" edges from the leaves back to the root of the tree*.
Let's find out what the frequency of matching each word would be, if we had a string of infinite length. To do this, treat the graph of the DFA as a Markov state transition diagram, initialize the starting state to be at the root with probability 1 and all other states 0, and find the steady state distribution (by finding the dominant eigenvector of the transition diagram's corresponding matrix).
Our string is not of infinite length. But since n is large, I expect "edge effects" to not matter so much. We can approximate the matching frequency by word by taking the matching rate by word and multiplying by n. If we want to be more precise, instead of taking the eigenvector we could just take the transition matrix to the nth power and multiply that with the starting distribution to get the resulting distribution after n letters.
*This isn't quite precise, because this Markov system would spend some nonzero amount of time at the root, when after recognizing a word or skipping garbage it should immediately go to the 0-child or 1-child depending. So we don't actually wire up our "restart" edges to a root: from a word-accepting node we wire up two restart edges (one to the 0-child and one to the 1-child of the root); we replace garbage nodes that are left-children with an edge to the 0-child; and we replace garbage nodes that are right-children with an edge to the 1-child. In fact, if we set our initial state to 0 with probability 0.5 and 1 with probability 0.5, we don't even need the root.
EDIT: To use @WhatsUp's example, we start with a DFA built from the binary tree over the words 0, 10 and 11: a root, a word-accepting node for 0, an internal node for 1, and word-accepting nodes for 10 and 11.
We rewire it a little bit to restart after a word is accepted and get rid of the root node, leaving the four states 0, 1, 10 and 11.
The corresponding Markov transition matrix is:
0.5 0 0.5 0.5
0.5 0 0.5 0.5
0 0.5 0 0
0 0.5 0 0
whose first eigenvector is:
0.333
0.333
0.167
0.167
Which is to say that it spends 1/3 of its time in the 0 node, 1/3 in 1, 1/6 in 10, and 1/6 in 11. This is in agreement with @WhatsUp's results for that example.
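Here is a minimal sketch of that computation using plain power iteration instead of an eigensolver (state order 0, 1, 10, 11, matching the matrix above; all names are mine):

public class PrefixCodeMarkov {
    public static void main(String[] args) {
        // Column-stochastic transition matrix: m[i][j] is the probability of
        // moving from state j to state i, states ordered as {0, 1, 10, 11}.
        double[][] m = {
                {0.5, 0.0, 0.5, 0.5},
                {0.5, 0.0, 0.5, 0.5},
                {0.0, 0.5, 0.0, 0.0},
                {0.0, 0.5, 0.0, 0.0}
        };
        double[] dist = {0.5, 0.5, 0.0, 0.0};  // first bit is 0 or 1 with probability 0.5 each

        // Repeatedly applying the transition matrix drives the distribution
        // towards the dominant eigenvector, i.e. the steady state.
        for (int step = 0; step < 1000; step++) {
            double[] next = new double[4];
            for (int i = 0; i < 4; i++)
                for (int j = 0; j < 4; j++)
                    next[i] += m[i][j] * dist[j];
            dist = next;
        }

        System.out.println(java.util.Arrays.toString(dist));
        // prints approximately [0.333, 0.333, 0.167, 0.167]
    }
}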

Counting the strictly increasing sequences

I aligned N candles from left to right. The ith candle from the left has height Hi and color Ci, an integer ranging from 1 to a given K, the number of colors.
Problem: how many strictly increasing (in height) colorful subsequences are there? A subsequence is considered colorful if each of the K colors appears at least once in it.
For example: N = 4, K = 3
H C
1 1
3 2
2 2
4 3
The only two valid subsequences are (1, 2, 4) and (1, 3, 4).
I think this is a Fenwick tree problem. Please suggest an approach for how to proceed with this type of problem.
For a moment, let's forget about the colors. So the problem is simpler: count the number of increasing subsequences. This problem has a standard solution:
1. Map each value to [0...n - 1] range.
2. Let f[value] be the number of increasing subsequences that have value as their last element.
3. Initially, f is filled with 0.
4. After that, you iterate over all array elements from left to right and perform the following operation: f[value] += 1 + get_sum(0, value - 1) (this counts the element on its own, plus the extension of every increasing subsequence that ends in a smaller value), where value is the current element of the array and get_sum(a, b) returns the sum of f[a] + f[a + 1] + ... + f[b].
5. The answer is f[0] + f[1] + ... + f[n - 1].
Using a binary indexed tree (aka Fenwick tree), it is possible to do the get_sum operation in O(log n), for O(n log n) total time complexity.
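A compact Java sketch of this simpler version (class and method names are mine; it assumes the values have already been mapped to the range 0 .. n - 1 as in step 1):

class FenwickTree {
    private final long[] tree;

    FenwickTree(int n) { tree = new long[n + 1]; }

    // add delta at position i (0-based)
    void add(int i, long delta) {
        for (i++; i < tree.length; i += i & -i) tree[i] += delta;
    }

    // sum of positions 0 .. i inclusive (0-based); returns 0 when i < 0
    long prefixSum(int i) {
        long s = 0;
        for (i++; i > 0; i -= i & -i) s += tree[i];
        return s;
    }
}

class IncreasingSubsequences {
    // Counts strictly increasing subsequences of an array whose values are in 0 .. n - 1.
    static long count(int[] values) {
        FenwickTree f = new FenwickTree(values.length);
        long total = 0;
        for (int v : values) {
            // subsequences ending here: this element alone, plus every increasing
            // subsequence ending in a smaller value, extended by this element
            long endingHere = 1 + f.prefixSum(v - 1);
            f.add(v, endingHere);
            total += endingHere;
        }
        return total;
    }
}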
Now let's come back to the original problem. To take into account the colors, we can compute f[value, mask] instead of f[value]: the number of increasing subsequences that have value as their last element and whose set of colors is exactly mask (a bitmask recording which colors are present). Then the update for each element looks like this:
for mask in [0 ... 2^K - 1]:
    f[value, mask or 2^(color[i] - 1)] += get_sum(0, value - 1, mask)
f[value, 2^(color[i] - 1)] += 1   # the subsequence consisting of this element alone
The answer is f[0, 2^K - 1] + f[1, 2^K - 1] + ... + f[n - 1, 2^K - 1].
You can maintain 2^K binary indexed trees to achieve O(n * log n * 2^K) time complexity, using the same idea as in the simpler problem.

Maximum of sums of unsorted array and each of a number of sorted arrays

Given an unsorted array
A = a_1 ... a_n
And a set of sorted Arrays
B_i = b_i_1 ... b_i_n # for i from 1 to $large_number
I would like to find the maximums from the (not yet calculated) sum arrays
C_i = (a_1 + b_i_1) ... (a_n + b_i_n)
for each i.
Is there a trick to do better than just calculating all the C_i and finding their maximums in O($large_number * n)?
Can we do better when we know that the B arrays are just shifts from an endless sequence,
e.g.
S = 0 1 4 9 16 ...
B_i = S[i:i+n]
(The above sequence has the possibly advantageous property that S_i - S_(i-1) > S_(i-1) - S_(i-2).)
There are $large_number * n data in your first problem, so there can't be any such trick.
You can prove this with an adversary argument. Suppose you have an algorithm that solves your problem without looking at all n * $large_number entries of b. I'm going to pick a fixed a, namely (-10, -20, -30, ..., -10n). For the first $large_number * n - 1 times the algorithm looks at an entry b_(i,j), I'll answer that it's 10j, for a sum of zero. The last time it looks at an entry, I'll answer that it's 10j + 1, for a sum of 1.
If $large_number is Omega(n), your second problem requires you to look at n * $large_number entries of S, so it also can't have any such trick.
However, if you specify S, there may be something. And if $large_number <= n/2 (or whatever it is), then all of the entries of S must be sorted, so you only have to look at the last B.
If we don't know anything more, I don't think it's possible to do better than O($large_number * n).
However, if the B arrays are just shifts of an endless sequence, we can do it in O($large_number + n):
We calculate the sum of B_0 in O(n).
Then B_1 = (B_0 - S[0]) + S[n]
And in general: B_i = (B_(i-1) - S[i-1]) + S[i-1+n].
So we can calculate each of the remaining sums in O(1), and all of them together with the max in O($large_number).
This is for a general sequence - if we have some info about it, it might be possible to do better.
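A rough sketch of that sliding-window computation (following the other answers in reading "the maximum of C_i" as the maximum over i of sum(A) + sum(B_i); sum(A) is a common constant, so only the window sums of S matter; names are mine):

// Returns the index i whose window sum S[i] + ... + S[i + n - 1] is largest.
// s must contain at least largeNumber + n - 1 terms of the sequence.
static int bestWindow(long[] s, int n, int largeNumber) {
    long windowSum = 0;
    for (int j = 0; j < n; j++) windowSum += s[j];   // sum of B_0, in O(n)

    long best = windowSum;
    int bestIndex = 0;
    for (int i = 1; i < largeNumber; i++) {
        windowSum += s[i + n - 1] - s[i - 1];        // slide the window in O(1)
        if (windowSum > best) { best = windowSum; bestIndex = i; }
    }
    return bestIndex;
}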
we know that the B arrays are just shifts from an endless sequence,
e.g.
S = 0 1 4 9 16 ...
B_i = S[i:i+n]
You can easily calculate the sum of S[i:i+n] as (sum of squares from 1 to i+n) - (sum of squares from 1 to i-1)
See https://math.stackexchange.com/questions/183316/how-to-get-to-the-formula-for-the-sum-of-squares-of-first-n-numbers
With the provided example, S1 = 0, S2 = 1, S3 = 4...
Let f(n) = SUM of Si for i=1 to n = (n-1)(n)(2n-1)/6
B_i = f(i+n) - f(i-1)
You then add SUM(A) to each sum.
Another approach is to calculate the difference between B_i and B_(i-1):
That would be: S[i:i+n] - S[i-1:i+n-1] = S(i+n) - S(i-1)
That way, you can just calculate the difference of the sums of each array with the previous one. In my understanding, since SUM(Ci) = SUM(Bi) + SUM(A), SUM(A) becomes a constant that is irrelevant in finding the maximum.

Algorithm to compute k fractions of form 1/r summing up to 1

Given k, we need to write 1 as a sum of k fractions of the form 1/r.
For example,
For k=2, 1 can uniquely be written as 1/2 + 1/2.
For k=3, 1 can be written as 1/3 + 1/3 + 1/3 or 1/2 + 1/4 + 1/4 or 1/6 + 1/3 + 1/2
Now, we need to consider all such sets of k fractions that sum up to 1 and return the highest denominator among all such sets; for instance, for the second sample case (k = 3), our algorithm should return 6.
I came across this problem in a coding competition and couldn't come up with an algorithm for it. A bit of Google searching later revealed that such fractions are called Egyptian fractions, but those are probably sets of distinct fractions summing up to a particular value (not like 1/2 + 1/2). Also, I couldn't find an algorithm to compute Egyptian fractions (if they are at all helpful for this problem) when their number is restricted by k.
If all you want to do is find the largest denominator, there's no reason to find all the possibilities. You can do this very simply:
public long largestDenominator(int k){
    long denominator = 1;
    for(int i = 1; i < k; i++){
        denominator *= denominator + 1;
    }
    return denominator;
}
For you recursive types:
public long largestDenominator(int k){
    if(k == 1)
        return 1;
    long last = largestDenominator(k - 1);
    return last * (last + 1); // or (last * last) + last
}
Why is it that simple?
To create the set, you need to insert the largest fraction that will keep it under 1 at each step (except the last). By "largest fraction", I mean by value, meaning the smallest denominator.
For the simple case k=3, that means you start with 1/2. You can't fit another half, so you go with 1/3. Then 1/6 is left over, giving you three terms.
For the next case k=4, you take that 1/6 off the end, since it won't fit under one, and we need room for another term. Replace it with 1/7, since that's the biggest value that fits. The remainder is 1/42.
Repeat as needed.
For example:
2 : [2,2]
3 : [2,3,6]
4 : [2,3,7,42]
5 : [2,3,7,43,1806]
6 : [2,3,7,43,1807,3263442]
As you can see, it rapidly becomes very large. Rapidly enough that you'll overflow a long if k>7. If you need to do so, you'll need to find an appropriate container (ie. BigInteger in Java/C#).
It maps perfectly to this sequence:
a(n) = a(n-1)^2 + a(n-1), a(0)=1.
You can also see the relationship to Sylvester's sequence:
a(n+1) = a(n)^2 - a(n) + 1, a(0) = 2
Wikipedia has a very nice article explaining the relationship between the two, as pointed out by Peter in the comments.
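If you want the full list of denominators rather than only the largest one, the same greedy construction is easy to write with BigInteger (avoiding the overflow mentioned above); this is just a sketch, and the names are mine:

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class EgyptianKTerms {
    // Greedy construction described above: k - 1 terms of Sylvester's sequence
    // (2, 3, 7, 43, 1807, ...), then the exact remainder as the final term.
    static List<BigInteger> denominators(int k) {
        List<BigInteger> result = new ArrayList<>();
        BigInteger remainderDenom = BigInteger.ONE;               // current remainder is 1/remainderDenom
        for (int i = 1; i < k; i++) {
            BigInteger next = remainderDenom.add(BigInteger.ONE); // largest fraction that still fits under the remainder
            result.add(next);
            remainderDenom = remainderDenom.multiply(next);       // 1/d - 1/(d + 1) = 1/(d * (d + 1))
        }
        result.add(remainderDenom);                               // the last term uses up the remainder exactly
        return result;
    }

    public static void main(String[] args) {
        for (int k = 2; k <= 6; k++) {
            System.out.println(k + " : " + denominators(k));
        }
    }
}

Running it reproduces the lists shown earlier, e.g. 4 : [2, 3, 7, 42] and 6 : [2, 3, 7, 43, 1807, 3263442].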
I never heard of Egyptian fractions before but here are some thoughts:
Idea
You can think of them geometrically:
Start with a unit square (1x1)
Draw either vertical or horizontal lines dividing the square into equal parts.
Repeat optionally the drawing of lines inside any of the sub-boxes evenly.
Stop any time you want.
The rectangles present will form a set of fractions of the form 1/n that add to 1.
You can count them and they might equal your 'k'.
Depending on how many equal sections you divided a rectangle into, it tells you whether you have 1/2 or 1/3 or whatever. 1/6 is 1/2 of 1/3 or 1/3 of 1/2. (i.e. you divided by 2 and then one of the sub-boxes by 3, OR the other way around.)
Idea 2
You start with 1 box. This is the fraction 1/1 with k=1.
When you sub-divide by n you add n to the count of boxes (i.e. to k, the number of fractions summed) and subtract 1.
When you sub-divide any of those boxes, again, subtract 1 and add n, the number of divisions. Note that n-1 is the number of lines you drew to divide them.
More
You are going to start searching for the answer with k. Obviously k * 1/k = 1 so you have one solution right there.
How about k-1?
There's a solution there: (k-2) * 1/(k-1) + 2 * (1/((k-1)*2))
How did I get that? I made k-1 equal sections (with k-2 vertical lines) and then divided the last one in half horizontally.
Each solution is going to consist of:
taking a prior solution
using j fewer lines at some stage and dividing one of the boxes or sub-boxes into j+1 equal sections.
I don't know if all solutions can be formed by repeating this rule starting from k * 1/k
I do know you can get effective duplicates this way. For example: k * 1/k with j = 1 => (k-2) * 1/(k-1) + 2 * (1/((k-1)*2)) [from above] but k * 1/k with j = (k-2) => 2 * (1/((k-1)*2)) + (k-2) * 1/(k-1) [which just reverses the order of the parts]
Interesting
k = 7 can be represented by 1/2 + 1/4 + 1/8 + ... + 1/(2^6) + 1/(2^6) and the general case is 1/2 + ... + 1/(2^(k-1)) + 1/(2^(k-1)).
Similarly, any odd k can be represented using powers of 3: two copies each of 1/3, 1/9, ..., 1/(3^((k-3)/2)), plus three copies of 1/(3^((k-1)/2)).
I suspect there are similar patterns for all integers up to k.
