Correctness proof of Algorithm - algorithm

Our teacher asked us a question which goes like this.
Given a list of n distinct integers and a sequence of n boxes with preset
inequality signs inserted between them, design an algorithm that places
the numbers into the boxes to satisfy those inequalities. For example, the
numbers 2, 5, 1, and 0 can be placed in the four boxes as shown below
The from looking at the question we can say the numbers be sorted and if less than symbol appears next we have to insert the least number, if greater than symbol appears we have to insert max number and proceed as so.
How can I say this algorithm works for every case?
Traversing in
one direction i found one solution and in reverse order i found
another. What is the efficient method to find all solutions to this
problem?

To answer question 1, I'd say that should be done by induction over the number of distinct numbers involved.
Say n is the number of numbers.
for n = 1 there's nothing left to prove.
for n = 2, you have either a greater than or a less than operator. Since the numbers are distinct and the set of natural (or real) numbers is well ordered, your algorithm will trivially yield a solution.
n -> n+1:
case 1: the first operator is a less than sign. According to your algorithm you pick the smallest number and put it into the first box. Then you solve the problem for the last n boxes. This is possible by induction. Since the number in the first box is the smallest, it is also smaller than the number in the second box. Therefor you have a solution.
Case 2: the first operator is a greater than sign. This also works analogue to case 1.
QED
Now for the second part of the question. My thoughts came up with the algorithm described below. Happy with the fact I solved the question (of getting all solutions) in the first place, I can't guarantee that it's the fastest solution.
As already noted in a comment, if there are no operator changes, there will be only one solution. So we assume there are operator changes (but the algorithm will produce this solution too).
for all operator changes of the form >< or nil < (first operator is a <):
place the smallest element between them
now divide the remaining set of numbers into two.
if there are n numbers and there are k operators to the left of
the placed number, that will give you k over (n - 1) possibilities.
do this recursively with the two remaining parts.
If no operator changes of the form >< or nil < are left,
do the same with the mirrored operator changes <> and nil >,
but choose the highest element of the set.
If no operator changes are left, fill in the elements according to the
remaining operator in ascending or descending order.
OK, this is not program code, but I think that will be comparatively easy (the choose k out of n - 1 is the hard part). I hope my explanation of the algorithm is understandable. And beware, the number of solutions grows fast (exponentially, probably worse).

Answering part two of the question in part, finding one single solution:
First, the input can be sorted, which can be done in O( n log n ) time. Then, the algorithm described can be applied; the minimum and maximum element are at the beginning and the end of the list, which can be accessed in constant time. This means that, once the input is sorted, the output can be generated in O( n ) time. This yields a runtime bound of O( n log n ) in total.

Here's a proof that there is an assignment of numbers to boxes:
The diagram defines a partial order on the empty boxes (box A <= box B
if either A=B, or A is the left of B and there's only < between them,
or A is to the right of B and there's only > between them). It's easy
to check that <= satisfies the properties of a partial order.
Any finite partially ordered set can be totally ordered (for example
by using a topological sort).
So sort the boxes with respect to the total order, and assign numbers
based on the position in the resulting list.
This yields an answer to part (2) of your question, which is to enumerate all possible solutions. A solution is exactly a total order that's compatible with the partial order defined in the proof, and you can generate all possible total orders by choosing which minimal element you pick at each stage in the topological sort algorithm.
That is, you can generate all solutions like this:
topo_sort(boxes):
if boxes = [] then return [[]]
let result = []
for each x in minimal(boxes)
for each y in topo_sort(boxes - x)
append ([x] + y) to result
return result
minimal(boxes):
return [all x in boxes such that there's no y != x in boxes with y lesseq x]
Here "lesseq" is constructed as in the proof.
If you squint hard enough, you can see that your algorithm can also be viewed as a topological sorting algorithm, using the observation that if the next symbol is a > then the first element is maximal in the remaining list, and if the next symbol is a < then the first element is minimal in the remaining list. That observation gives you a proof that your algorithm is also correct.
Here's an inefficient Python 2.7 implementation:
def less(syms, i, j):
if i == j: return False
s = '<' if i < j else '>'
return all(c == s for c in syms[min(i,j):max(i,j)])
def order(boxes, syms):
if not boxes:
yield []
return
for x in [b for b in boxes if not any(less(syms, a, b) for a in boxes)]:
for y in order(boxes - set([x]), syms):
yield [x] + y
def solutions(syms):
for idxes in order(set(range(len(syms)+1)), syms):
yield [idxes.index(i) for i in xrange(len(syms)+1)]
print list(solutions('<><'))
Which gives as output all 5 solutions:
[[0, 2, 1, 3], [0, 3, 1, 2], [1, 2, 0, 3], [1, 3, 0, 2], [2, 3, 0, 1]]

Related

Shuffle a deck of card with equal likely to be 52 permutations [duplicate]

The famous Fisher-Yates shuffle algorithm can be used to randomly permute an array A of length N:
For k = 1 to N
Pick a random integer j from k to N
Swap A[k] and A[j]
A common mistake that I've been told over and over again not to make is this:
For k = 1 to N
Pick a random integer j from 1 to N
Swap A[k] and A[j]
That is, instead of picking a random integer from k to N, you pick a random integer from 1 to N.
What happens if you make this mistake? I know that the resulting permutation isn't uniformly distributed, but I don't know what guarantees there are on what the resulting distribution will be. In particular, does anyone have an expression for the probability distributions over the final positions of the elements?
An Empirical Approach.
Let's implement the erroneous algorithm in Mathematica:
p = 10; (* Range *)
s = {}
For[l = 1, l <= 30000, l++, (*Iterations*)
a = Range[p];
For[k = 1, k <= p, k++,
i = RandomInteger[{1, p}];
temp = a[[k]];
a[[k]] = a[[i]];
a[[i]] = temp
];
AppendTo[s, a];
]
Now get the number of times each integer is in each position:
r = SortBy[#, #[[1]] &] & /# Tally /# Transpose[s]
Let's take three positions in the resulting arrays and plot the frequency distribution for each integer in that position:
For position 1 the freq distribution is:
For position 5 (middle)
And for position 10 (last):
and here you have the distribution for all positions plotted together:
Here you have a better statistics over 8 positions:
Some observations:
For all positions the probability of
"1" is the same (1/n).
The probability matrix is symmetrical
with respect to the big anti-diagonal
So, the probability for any number in the last
position is also uniform (1/n)
You may visualize those properties looking at the starting of all lines from the same point (first property) and the last horizontal line (third property).
The second property can be seen from the following matrix representation example, where the rows are the positions, the columns are the occupant number, and the color represents the experimental probability:
For a 100x100 matrix:
Edit
Just for fun, I calculated the exact formula for the second diagonal element (the first is 1/n). The rest can be done, but it's a lot of work.
h[n_] := (n-1)/n^2 + (n-1)^(n-2) n^(-n)
Values verified from n=3 to 6 ( {8/27, 57/256, 564/3125, 7105/46656} )
Edit
Working out a little the general explicit calculation in #wnoise answer, we can get a little more info.
Replacing 1/n by p[n], so the calculations are hold unevaluated, we get for example for the first part of the matrix with n=7 (click to see a bigger image):
Which, after comparing with results for other values of n, let us identify some known integer sequences in the matrix:
{{ 1/n, 1/n , ...},
{... .., A007318, ....},
{... .., ... ..., ..},
... ....,
{A129687, ... ... ... ... ... ... ..},
{A131084, A028326 ... ... ... ... ..},
{A028326, A131084 , A129687 ... ....}}
You may find those sequences (in some cases with different signs) in the wonderful http://oeis.org/
Solving the general problem is more difficult, but I hope this is a start
The "common mistake" you mention is shuffling by random transpositions. This problem was studied in full detail by Diaconis and Shahshahani in Generating a random permutation with random transpositions (1981). They do a complete analysis of stopping times and convergence to uniformity. If you cannot get a link to the paper, then please send me an e-mail and I can forward you a copy. It's actually a fun read (as are most of Persi Diaconis's papers).
If the array has repeated entries, then the problem is slightly different. As a shameless plug, this more general problem is addressed by myself, Diaconis and Soundararajan in Appendix B of A Rule of Thumb for Riffle Shuffling (2011).
Let's say
a = 1/N
b = 1-a
Bi(k) is the probability matrix after i swaps for the kth element. i.e the answer to the question "where is k after i swaps?". For example B0(3) = (0 0 1 0 ... 0) and B1(3) = (a 0 b 0 ... 0). What you want is BN(k) for every k.
Ki is an NxN matrix with 1s in the i-th column and i-th row, zeroes everywhere else, e.g:
Ii is the identity matrix but with the element x=y=i zeroed. E.g for i=2:
Ai is
Then,
But because BN(k=1..N) forms the identity matrix, the probability that any given element i will at the end be at position j is given by the matrix element (i,j) of the matrix:
For example, for N=4:
As a diagram for N = 500 (color levels are 100*probability):
The pattern is the same for all N>2:
The most probable ending position for k-th element is k-1.
The least probable ending position is k for k < N*ln(2), position 1 otherwise
I knew I had seen this question before...
" why does this simple shuffle algorithm produce biased results? what is a simple reason? " has a lot of good stuff in the answers, especially a link to a blog by Jeff Atwood on Coding Horror.
As you may have already guessed, based on the answer by #belisarius, the exact distribution is highly dependent on the number of elements to be shuffled. Here's Atwood's plot for a 6-element deck:
What a lovely question! I wish I had a full answer.
Fisher-Yates is nice to analyze because once it decides on the first element, it leaves it alone. The biased one can repeatedly swap an element in and out of any place.
We can analyze this the same way we would a Markov chain, by describing the actions as stochastic transition matrices acting linearly on probability distributions. Most elements get left alone, the diagonal is usually (n-1)/n. On pass k, when they don't get left alone, they get swapped with element k, (or a random element if they are element k). This is 1/(n-1) in either row or column k. The element in both row and column k is also 1/(n-1). It's easy enough to multiply these matrices together for k going from 1 to n.
We do know that the element in last place will be equally likely to have originally been anywhere because the last pass swaps the last place equally likely with any other. Similarly, the first element will be equally likely to be placed anywhere. This symmetry is because the transpose reverses the order of matrix multiplication. In fact, the matrix is symmetric in the sense that row i is the same as column (n+1 - i). Beyond that, the numbers don't show much apparent pattern. These exact solutions do show agreement with the simulations run by belisarius: In slot i, The probability of getting j decreases as j raises to i, reaching its lowest value at i-1, and then jumping up to its highest value at i, and decreasing until j reaches n.
In Mathematica I generated each step with
step[k_, n_] := Normal[SparseArray[{{k, i_} -> 1/n,
{j_, k} -> 1/n, {i_, i_} -> (n - 1)/n} , {n, n}]]
(I haven't found it documented anywhere, but the first matching rule is used.)
The final transition matrix can be calculated with:
Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]]
ListDensityPlot is a useful visualization tool.
Edit (by belisarius)
Just a confirmation. The following code gives the same matrix as in #Eelvex's answer:
step[k_, n_] := Normal[SparseArray[{{k, i_} -> (1/n),
{j_, k} -> (1/n), {i_, i_} -> ((n - 1)/n)}, {n, n}]];
r[n_, s_] := Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]];
Last#Table[r[4, i], {i, 1, 4}] // MatrixForm
Wikipedia's page on the Fisher-Yates shuffle has a description and example of exactly what will happen in that case.
You can compute the distribution using stochastic matrices. Let the matrix A(i,j) describe the probability of the card originally at position i ending up in position j. Then the kth swap has a matrix Ak given by Ak(i,j) = 1/N if i == k or j == k, (the card in position k can end up anywhere and any card can end up at position k with equal probability), Ak(i,i) = (N - 1)/N for all i != k (every other card will stay in the same place with probability (N-1)/N) and all other elements zero.
The result of the complete shuffle is then given by the product of the matrices AN ... A1.
I expect you're looking for an algebraic description of the probabilities; you can get one by expanding out the above matrix product, but it I imagine it will be fairly complex!
UPDATE: I just spotted wnoise's equivalent answer above! oops...
I've looked into this further, and it turns out that this distribution has been studied at length. The reason it's of interest is because this "broken" algorithm is (or was) used in the RSA chip system.
In Shuffling by semi-random transpositions, Elchanan Mossel, Yuval Peres, and Alistair Sinclair study this and a more general class of shuffles. The upshot of that paper appears to be that it takes log(n) broken shuffles to achieve near random distribution.
In The bias of three pseudorandom shuffles (Aequationes Mathematicae, 22, 1981, 268-292), Ethan Bolker and David Robbins analyze this shuffle and determine that the total variation distance to uniformity after a single pass is 1, indicating that it is not very random at all. They give asympotic analyses as well.
Finally, Laurent Saloff-Coste and Jessica Zuniga found a nice upper bound in their study of inhomogeneous Markov chains.
This question is begging for an interactive visual matrix diagram analysis of the broken shuffle mentioned. Such a tool is on the page Will It Shuffle? - Why random comparators are bad by Mike Bostock.
Bostock has put together an excellent tool that analyzes random comparators. In the dropdown on that page, choose naïve swap (random ↦ random) to see the broken algorithm and the pattern it produces.
His page is informative as it allows one to see the immediate effects a change in logic has on the shuffled data. For example:
This matrix diagram using a non-uniform and very-biased shuffle is produced using a naïve swap (we pick from "1 to N") with code like this:
function shuffle(array) {
var n = array.length, i = -1, j;
while (++i < n) {
j = Math.floor(Math.random() * n);
t = array[j];
array[j] = array[i];
array[i] = t;
}
}
But if we implement a non-biased shuffle, where we pick from "k to N" we should see a diagram like this:
where the distribution is uniform, and is produced from code such as:
function FisherYatesDurstenfeldKnuthshuffle( array ) {
var pickIndex, arrayPosition = array.length;
while( --arrayPosition ) {
pickIndex = Math.floor( Math.random() * ( arrayPosition + 1 ) );
array[ pickIndex ] = [ array[ arrayPosition ], array[ arrayPosition ] = array[ pickIndex ] ][ 0 ];
}
}
The excellent answers given so far are concentrating on the distribution, but you have asked also "What happens if you make this mistake?" - which is what I haven't seen answered yet, so I'll give an explanation on this:
The Knuth-Fisher-Yates shuffle algorithm picks 1 out of n elements, then 1 out of n-1 remaining elements and so forth.
You can implement it with two arrays a1 and a2 where you remove one element from a1 and insert it into a2, but the algorithm does it in place (which means, that it needs only one array), as is explained here (Google: "Shuffling Algorithms Fisher-Yates DataGenetics") very well.
If you don't remove the elements, they can be randomly chosen again which produces the biased randomness. This is exactly what the 2nd example your are describing does. The first example, the Knuth-Fisher-Yates algorithm, uses a cursor variable running from k to N, which remembers which elements have already been taken, hence avoiding to pick elements more than once.

Place "sum" and "multiply" operators between the elements of a given list of integers so that the expression results in a specified value

I was given a tricky question.
Given:
A = [a1,a2,...an] (list of positive integers with length "n")
r (positive integer)
Find a list of { *, + } operators
O = [o1,o2,...on-1]
so that if we placed those operators between the elements of "A", the resulting expression would evaluate to "r". Only one solution is required.
So for example if
A = [1,2,3,4]
r = 14
then
O = [*, +, *]
I've implemented a simple recursive solution with some optimisation, but of course it's exponential O(2^n) time, so for an input with length 40, it works for ages.
I wanted to ask if any of you know a sub-exponential solution for this?
Update
Elements of A are between 0-10000,
r can be arbitrarily big
Let A and B be positive integers. Then A + B ≤ A × B + 1.
This little fact can be used to construct a very efficient algorithm.
Let's define a graph. The graph nodes correspond to operations lists, for example, [+, ×, +, +, ×]. There is an edge from graph node X to graph node Y if the Y can be obtained by changing a single + to a × in X. The graph has a source at the node corresponding to [+, +, ..., +].
Now perform a breadth-first search from the source node, constructing the graph as you go. When expanding a node [+, ×, +, +, ×], for example, you (optionally construct then) connect to the nodes [×, ×, +, +, ×], [+, ×, ×, +, ×], and [+, ×, +, ×, ×]. Do not expand to a node if the result of evaluating it is greater than r + k(O), where k(O) is the number of +'s in the operation list O. This is because of the "+ 1" in the fact at the beginning of the answer - consider the case of a = [1, 1, 1, 1, 1], r = 1.
This approach uses O(n 2n) time and O(2n) space (where both are potentially very-loose worst case bounds). This is still an exponential algorithm, however I think you will find it performs very reasonably for non-sinister inputs. (I suspect this problem is NP-complete, which is why I am happy with this "non-sinister inputs" escape clause.)
Here's an O(rn^2)-time, O(rn)-space DP approach. If r << 2^n then this will have better worst-case behaviour than exponential-time branch-and-bound approaches, though even then the latter may still be faster on many instances. This is pseudo-polynomial time, because it takes time proportional to the value of part of its input (r), not its size (which would be log2(r)). Specifically it needs rn bits of memory, so it should give answers in a few seconds for up to around rn < 1,000,000,000 and n < 1000 (e.g. n = 100, r = 10,000,000).
The key observation is that any formula involving all n numbers has a final term that consists of some number i of factors, where 1 <= i <= n. That is, any formula must be in one of the following n cases:
(a formula on the first n-1 terms) + a[n]
(a formula on the first n-2 terms) + a[n-1] * a[n]
(a formula on the first n-3 terms) + a[n-2] * a[n-1] * a[n]
...
a[1] * a[2] * ... * a[n]
Let's call the "prefix" of a[] consisting of the first i numbers P[i]. If we record, for each 0 <= i <= n-1, the complete set of values <= r that can be reached by some formula on P[i], then based on the above, we can quite easily compute the complete set of values <= r that can be reached by P[n]. Specifically, let X[i][j] be a true or false value that indicates whether the prefix P[i] can achieve the value j. (X[][] could be stored as an array of n size-(r+1) bitmaps.) Then what we want to do is compute X[n][r], which will be true if r can be reached by some formula on a[], and false otherwise. (X[n][r] isn't quite the full answer yet, but it can be used to get the answer.)
X[1][a[1]] = true. X[1][j] = false for all other j. For any 2 <= i <= n and 0 <= j <= r, we can compute X[i][j] using
X[i][j] = X[i - 1][j - a[i]] ||
X[i - 2][j - a[i-1]*a[i]] ||
X[i - 3][j - a[i-2]*a[i-1]*a[i]] ||
... ||
X[1][j - a[2]*a[3]*...*a[i]] ||
(a[1]*a[2]*...*a[i] == j)
Note that the last line is an equality test that compares the product of all i numbers in P[i] to j, and returns true or false. There are i <= n "terms" (rows) in the expression for X[i][j], each of which can be computed in constant time (note in particular that the multiplications can be built up in constant time per row), so computing a single value X[i][j] can be done in O(n) time. To find X[n][r], we need to calculate X[i][j] for every 1 <= i <= n and every 0 <= j <= r, so there is O(rn^2) overall work to do. (Strictly speaking we may not need to compute all of these table entries if we use memoization instead of a bottom-up approach, but many inputs will require us to compute a large fraction of them anyway, so it's likely that the latter is faster by a small constant factor. Also a memoization approach requires keeping an "already processed" flag for each DP cell -- which doubles the memory usage when each cell is just 1 bit!)
Reconstructing a solution
If X[n][r] is true, then the problem has a solution (satisfying formula), and we can reconstruct one in O(n^2) time by tracing back through the DP table, starting from X[n][r], at each location looking for any term that enabled the current location to assume the value "true" -- that is, any true term. (We could do this reconstruction step faster by storing more than a single bit per (i, j) combination -- but since r is allowed to be "arbitrarily big", and this faster reconstruction won't improve the overall time complexity, it probably makes more sense to go with the approach that uses the fewest bits per DP table entry.) All satisfying solutions can be reconstructed this way, by backtracking through all true terms instead of just picking any one -- but there may be an exponential number of them.
Speedups
There are two ways that calculation of an individual X[i][j] value can be sped up. First, because all the terms are combined with ||, we can stop as soon as the result becomes true, since no later term can make it false again. Second, if there is no zero anywhere to the left of i, we can stop as soon as the product of the final numbers becomes larger than r, since there's no way for that product to be decreased again.
When there are no zeroes in a[], that second optimisation is likely to be very important in practice: it has the potential to make the inner loop much smaller than the full i-1 iterations. In fact if a[] contains no zeroes, and its average value is v, then after k terms have been computed for a particular X[i][j] value the product will be around v^k -- so on average, the number of inner loop iterations (terms) needed drops from n to log_v(r) = log(r)/log(v). That might be much smaller than n, in which case the average time complexity for this model drops to O(rn*log(r)/log(v)).
[EDIT: We actually can save multiplications with the following optimisation :)]
8/32/64 X[i][j]s at a time: X[i][j] is independent of X[i][k] for k != j, so if we are using bitsets to store these values, we can calculate 8, 32 or 64 of them (or maybe more, with SSE2 etc.) in parallel using simple bitwise OR operations. That is, we can calculate the first term of X[i][j], X[i][j+1], ..., X[i][j+31] in parallel, OR them into the results, then calculate their second terms in parallel and OR them in, etc. We still need to perform the same number of subtractions this way, but the products are all the same, so we can reduce the number of multiplications by a factor of 8/32/64 -- as well as, of course, the number of memory accesses. OTOH, this makes the first optimisation from the previous paragraph harder to accomplish -- you have to wait until an entire block of 8/32/64 bits have become true before you can stop iterating.
Zeroes: Zeroes in a[] may allow us to stop early. Specifically, if we have just computed X[i][r] for some i < n and found it to be true, and there is a zero anywhere to the right of position i in a[], then we can stop: we already have a formula on the first i numbers that evaluates to r, and we can use that zero to "kill off" all numbers to the right of position i by creating one big product term that includes all of them.
Ones: An interesting property of any a[] entry containing the value 1 is that it can be moved to any other position in a[] without affecting whether or not there is a solution. This is because every satisfying formula either has a * on at least one side of this 1, in which case it multiplies some other term and has no effect there, and would likewise have no effect anywhere else; or it has a + on both sides (imagine extra + signs before the first position and after the last), in which case it might as well be added in anywhere.
So, we can safely shunt all 1 values to the end of a[] before doing anything else. The point of doing this is that now we don't have to evaluate these rows of X[][] at all, because they only influence the outcome in a very simple way. Suppose there are m < n ones in a[], which we have moved to the end. Then after computing the m+1 values X[n-m][r-m], X[n-m][r-m+1], X[n-m][r-m+2], ..., X[n-m][r], we already know what X[n][r] must be: if any of them are true, then X[n][r] must be true, otherwise (if they are all false) it must be false. This is because the final m ones can add anywhere from 0 up to m to a formula on the first n-m values. (But if a[] consists entirely of 1s, then at least 1 must be "added" -- they can't all multiply some other term.)
Here is another approach that might be helpful. It is sometimes known as a "meet-in-the-middle" algorithm and runs in O(n * 2^(n/2)). The basic idea is this. Suppose n = 40 and you know that the middle slot is a +. Then, you can brute force all N := 2^20 possibilities for each side. Let A be a length N array storing the possible values of the left side, and similarly let B be a length N array storing the values for the right side.
Then, after sorting A and B, it is not hard to efficiently check for whether any two of them sum to r (e.g. for each value in A, do a binary search on B, or you can even do it in linear time if both arrays are sorted). This part takes O(N * log N) = O(n * 2^(n/2)) time.
Now, this was all assuming the middle slot is a +. If not, then it has to be a *, and you can combine the middle two elements into one (their product), reducing the problem to n = 39. Then you try the same thing, and so on. If you analyze it carefully, you should get O(n * 2^(n/2)) as the asymptotic complexity, since actually the largest term dominates.
You need to do some bookkeeping to actually recover the +'s and *'s, which I have left out to simplify the explanation.

Correctness of algorithm for generating a random list of numbers from 1 to n

Given a function Random(x, y) which returns a random number between x and y (inclusive). Design an algorithm to print a list of unique_random_numbers from 1 to n.
There must be n numbers in the list and every number must appear only once.
e.g.
PrintRandomList(1, 5) can print -> 2, 5, 1, 4, 3
PrintRandomList(1, 6) can print -> 4, 1, 6, 3, 2, 5
I've been able to come-up with an algorithm but couldn't prove that it will generate a truly random list (Assume that Random(x, y) will generate true random numbers).
void PrintRandomList(int a, int b) {
if(a<=b) {
int pivot = Random(a, b);
printf("%d ", pivot);
PrintRandomList(a, pivot-1);
PrintRandomList(pivot+1, b);
}
}
My question: Is the algorithm correct? If yes then, can we prove the correctness of the algorithm?
If this algorithm is correct than we can also use it to shuffle an array instead of using Knuth Shuffling Algorithm.
The trick to the root question of proving if this algorithm works is that you must prove that every outcome is equally possible. As other have pointed out, that's not the case here.
If you can find any sequence that is unreachable or even less likely to be reached then you know the algorithm is not "fair". The converse of this argument proves that it is.
Your algo isn't correct, most of the times the numbers from 1 to n/2 will be in the first half of the returned list (imagine that pivot returns roughly n/2). (not easy to prove that it is not correct either ^^)
You need to think a bit more, but if you ask we can give tips

Finding even numbers in an array without using feedback

I saw this post: Finding even numbers in an array and I was thinking about how you could do it without feedback. Here's what I mean.
Given an array of length n containing at most e even numbers and a
function isEven that returns true if the input is even and false
otherwise, write a function that prints all the even numbers in the
array using the fewest number of calls to isEven.
The answer on the post was to use a binary search, which is neat since it doesn't mean the array has to be in order. The number of times you have to check if a number is even is e log n instead if n because you do a binary search (log n) to find one even number each time (e times).
But that idea means that you divide the array in half, test for evenness, then decide which half to keep based on the result.
My question is whether or not you can beat n calls on a fixed testing scheme where you check all the numbers you want for evenness without knowing the outcome, and then figure out where the even numbers are after you've done all the tests based on the results. So I guess it's no-feedback or blind or some term like that.
I was thinking about this for a while and couldn't come up with anything. The binary search idea doesn't work at all with this constraint, but maybe something else does? Even getting down to n/2 calls instead of n (yes, I know they are the same big-O) would be good.
The technical term for "no-feedback or blind" is "non-adaptive". O(e log n) calls still suffice, but the algorithm is rather more involved.
Instead of testing the evenness of products, we're going to test the evenness of sums. Let E ≠ F be distinct subsets of {1, …, n}. If we have one array x1, …, xn with even numbers at positions E and another array y1, …, yn with even numbers at positions F, how many subsets J of {1, …, n} satisfy
(∑i in J xi) mod 2 ≠ (∑i in J yi) mod 2?
The answer is 2n-1. Let i be an index such that xi mod 2 ≠ yi mod 2. Let S be a subset of {1, …, i - 1, i + 1, … n}. Either J = S is a solution or J = S union {i} is a solution, but not both.
For every possible outcome E, we need to make calls that eliminate every other possible outcome F. Suppose we make 2e log n calls at random. For each pair E ≠ F, the probability that we still cannot distinguish E from F is (2n-1/2n)2e log n = n-2e, because there are 2n possible calls and only 2n-1 fail to distinguish. There are at most ne + 1 choices of E and thus at most (ne + 1)ne/2 pairs. By a union bound, the probability that there exists some indistinguishable pair is at most n-2e(ne + 1)ne/2 < 1 (assuming we're looking at an interesting case where e ≥ 1 and n ≥ 2), so there exists a sequence of 2e log n calls that does the job.
Note that, while I've used randomness to show that a good sequence of calls exists, the resulting algorithm is deterministic (and, of course, non-adaptive, because we chose that sequence without knowledge of the outcomes).
You can use the Chinese Remainder Theorem to do this. I'm going to change your notation a bit.
Suppose you have N numbers of which at most E are even. Choose a sequence of distinct prime powers q1,q2,...,qk such that their product is at least N^E, i.e.
qi = pi^ei
where pi is prime and ei > 0 is an integer and
q1 * q2 * ... * qk >= N^E
Now make a bunch of 0-1 matrices. Let Mi be the qi x N matrix where the entry in row r and column c has a 1 if c = r mod qi and a 0 otherwise. For example, if qi = 3^2, then row 2 has ones in columns 2, 11, 20, ... 2 + 9j and 0 elsewhere.
Now stack these matrices vertically to get a Q x N matrix M, where Q = q1 + q2 + ... + qk. The rows of M tell you which numbers to multiply together (the nonzero positions). This gives a total of Q products that you need to test for evenness. Call each row a "trial", and say that a "trial involves j" if the jth column of that row is nonempty. The theorem you need is the following:
THEOREM: The number in position j is even if and only if all trials involving j are even.
So you do a total of Q trials and then look at the results. If you choose the prime powers intelligently, then Q should be significantly smaller than N. There are asymptotic results that show you can always get Q on the order of
(2E log N)^2 / 2log(2E log N)
This theorem is actually a corollary of the Chinese Remainder Theorem. The only place that I've seen this used is in Combinatorial Group Testing. Apparently the problem originally arose when testing soldiers coming back from WWII for syphilis.
The problem you are facing is a form of group testing, type of a problem with the objective of reducing the cost of identifying certain elements of a set (up to d elements of a set of N elements).
As you've already stated, there are two basic principles via which the testing may be carried out:
Non-adaptive Group Testing, where all the tests to be performed are decided a priori.
Adaptive Group Testing, where we perform several tests, basing each test on the outcome of previous tests. Obviously, adaptive testing has a potential to reduce the cost, compared to non-adaptive testing.
Theoretical bounds for both principles have been studied, and are available in this Wiki article, or this paper.
For adaptive testing, the upper bound is O(d*log(N)) (as already described in this answer).
For non-adaptive testing, it can be shown that the upper bound is O(d*d/log(d)*log(N)), which is obviously larger than the upper bound for adaptive testing by a factor of d/log(d).
This upper bound for non-adaptive testing comes from an algorithm which uses disjunct matrices: matrices of dimension T x N ("number of tests" x "number of elements"), where each item can be either true (if an element was included in a test), or false (if it wasn't), with a property that any subset of d columns must differ from all other columns by at least a single row (test inclusion). This allows linear time of decoding (there are also "d-separable" matrices where fewer test are needed, but the time complexity for their decoding is exponential and not computationaly feasible).
Conclusion:
My question is whether or not you can beat n calls on a fixed testing scheme [...]
For such a scheme and a sufficiently large value of N, a disjunct matrix can be constructed which would have less than K * [d*d/log(d)*log(N)] rows. So, for large values of N, yes, you can beat it.
The underlying question (challenge) is kind of silly. If the binary search answer is acceptable (where it sums sub arrays and sends them to IsEven) then I can think of a way to do it with E or less calls to IsEven (assuming the numbers are integers of course).
JavaScript to demonstrate
// sort the array by only the first bit of the number
A.sort(function(x,y) { return (x & 1) - (y & 1); });
// all of the evens will be at the beginning
for(var i=0; i < E && i < A.length; i++) {
if(IsEven(A[i]))
Print(A[i]);
else
break;
}
Not exactly a solution, but just few thoughts.
It is easy to see that if a solution exists for array length n that takes less than n tests, then for any array length m > n it is easy to see that there is always a solution with less than m tests. So, if you have a solution for n = 2 or 3 or 4, then the problem is solved.
You can split the array into pairs of numbers and for each pair: if the sum is odd, then exactly one of them is even, otherwise if one of the numbers is even, then both of them are even. This way for each pair it takes either one or two tests. Best case:n/2 tests, worse case:n tests, if even and odd numbers are chosen with equal probability, then: 3n/4 tests.
My hunch is there is no solution with less than n tests. Not sure how to prove it.
UPDATE: The second solution can be extended in the following way.
Check if the sum of two numbers is even. If odd, then exactly one of them is even. Otherwise label the set as "homogeneous set of size 2". Take two "homogenous set"s of same size n. Pick one number from each set and check if their sum is even. If it is even, combine these two sets to a "homogeneous set of size 2n". Otherwise, it implies that one of those sets purely consists of even numbers and the other one purely odd numbers.
Best case:n/2 tests. Average case: 3*n/2. Worst case is still n. Worst case exists only when all the numbers are even or all the numbers are odd.
If we can add and multiply array elements, then we can compute every Boolean function (up to complementation) on the low-order bits. Simulate a circuit that encodes the positions of the even numbers as a number from 0 to nC0 + nC1 + ... + nCe - 1 represented in binary and use calls to isEven to read off the bits.
Number of calls used: within 1 of the information-theoretic optimum.
See also fully homomorphic encryption.

What distribution do you get from this broken random shuffle?

The famous Fisher-Yates shuffle algorithm can be used to randomly permute an array A of length N:
For k = 1 to N
Pick a random integer j from k to N
Swap A[k] and A[j]
A common mistake that I've been told over and over again not to make is this:
For k = 1 to N
Pick a random integer j from 1 to N
Swap A[k] and A[j]
That is, instead of picking a random integer from k to N, you pick a random integer from 1 to N.
What happens if you make this mistake? I know that the resulting permutation isn't uniformly distributed, but I don't know what guarantees there are on what the resulting distribution will be. In particular, does anyone have an expression for the probability distributions over the final positions of the elements?
An Empirical Approach.
Let's implement the erroneous algorithm in Mathematica:
p = 10; (* Range *)
s = {}
For[l = 1, l <= 30000, l++, (*Iterations*)
a = Range[p];
For[k = 1, k <= p, k++,
i = RandomInteger[{1, p}];
temp = a[[k]];
a[[k]] = a[[i]];
a[[i]] = temp
];
AppendTo[s, a];
]
Now get the number of times each integer is in each position:
r = SortBy[#, #[[1]] &] & /# Tally /# Transpose[s]
Let's take three positions in the resulting arrays and plot the frequency distribution for each integer in that position:
For position 1 the freq distribution is:
For position 5 (middle)
And for position 10 (last):
and here you have the distribution for all positions plotted together:
Here you have a better statistics over 8 positions:
Some observations:
For all positions the probability of
"1" is the same (1/n).
The probability matrix is symmetrical
with respect to the big anti-diagonal
So, the probability for any number in the last
position is also uniform (1/n)
You may visualize those properties looking at the starting of all lines from the same point (first property) and the last horizontal line (third property).
The second property can be seen from the following matrix representation example, where the rows are the positions, the columns are the occupant number, and the color represents the experimental probability:
For a 100x100 matrix:
Edit
Just for fun, I calculated the exact formula for the second diagonal element (the first is 1/n). The rest can be done, but it's a lot of work.
h[n_] := (n-1)/n^2 + (n-1)^(n-2) n^(-n)
Values verified from n=3 to 6 ( {8/27, 57/256, 564/3125, 7105/46656} )
Edit
Working out a little the general explicit calculation in #wnoise answer, we can get a little more info.
Replacing 1/n by p[n], so the calculations are hold unevaluated, we get for example for the first part of the matrix with n=7 (click to see a bigger image):
Which, after comparing with results for other values of n, let us identify some known integer sequences in the matrix:
{{ 1/n, 1/n , ...},
{... .., A007318, ....},
{... .., ... ..., ..},
... ....,
{A129687, ... ... ... ... ... ... ..},
{A131084, A028326 ... ... ... ... ..},
{A028326, A131084 , A129687 ... ....}}
You may find those sequences (in some cases with different signs) in the wonderful http://oeis.org/
Solving the general problem is more difficult, but I hope this is a start
The "common mistake" you mention is shuffling by random transpositions. This problem was studied in full detail by Diaconis and Shahshahani in Generating a random permutation with random transpositions (1981). They do a complete analysis of stopping times and convergence to uniformity. If you cannot get a link to the paper, then please send me an e-mail and I can forward you a copy. It's actually a fun read (as are most of Persi Diaconis's papers).
If the array has repeated entries, then the problem is slightly different. As a shameless plug, this more general problem is addressed by myself, Diaconis and Soundararajan in Appendix B of A Rule of Thumb for Riffle Shuffling (2011).
Let's say
a = 1/N
b = 1-a
Bi(k) is the probability matrix after i swaps for the kth element. i.e the answer to the question "where is k after i swaps?". For example B0(3) = (0 0 1 0 ... 0) and B1(3) = (a 0 b 0 ... 0). What you want is BN(k) for every k.
Ki is an NxN matrix with 1s in the i-th column and i-th row, zeroes everywhere else, e.g:
Ii is the identity matrix but with the element x=y=i zeroed. E.g for i=2:
Ai is
Then,
But because BN(k=1..N) forms the identity matrix, the probability that any given element i will at the end be at position j is given by the matrix element (i,j) of the matrix:
For example, for N=4:
As a diagram for N = 500 (color levels are 100*probability):
The pattern is the same for all N>2:
The most probable ending position for k-th element is k-1.
The least probable ending position is k for k < N*ln(2), position 1 otherwise
I knew I had seen this question before...
" why does this simple shuffle algorithm produce biased results? what is a simple reason? " has a lot of good stuff in the answers, especially a link to a blog by Jeff Atwood on Coding Horror.
As you may have already guessed, based on the answer by #belisarius, the exact distribution is highly dependent on the number of elements to be shuffled. Here's Atwood's plot for a 6-element deck:
What a lovely question! I wish I had a full answer.
Fisher-Yates is nice to analyze because once it decides on the first element, it leaves it alone. The biased one can repeatedly swap an element in and out of any place.
We can analyze this the same way we would a Markov chain, by describing the actions as stochastic transition matrices acting linearly on probability distributions. Most elements get left alone, the diagonal is usually (n-1)/n. On pass k, when they don't get left alone, they get swapped with element k, (or a random element if they are element k). This is 1/(n-1) in either row or column k. The element in both row and column k is also 1/(n-1). It's easy enough to multiply these matrices together for k going from 1 to n.
We do know that the element in last place will be equally likely to have originally been anywhere because the last pass swaps the last place equally likely with any other. Similarly, the first element will be equally likely to be placed anywhere. This symmetry is because the transpose reverses the order of matrix multiplication. In fact, the matrix is symmetric in the sense that row i is the same as column (n+1 - i). Beyond that, the numbers don't show much apparent pattern. These exact solutions do show agreement with the simulations run by belisarius: In slot i, The probability of getting j decreases as j raises to i, reaching its lowest value at i-1, and then jumping up to its highest value at i, and decreasing until j reaches n.
In Mathematica I generated each step with
step[k_, n_] := Normal[SparseArray[{{k, i_} -> 1/n,
{j_, k} -> 1/n, {i_, i_} -> (n - 1)/n} , {n, n}]]
(I haven't found it documented anywhere, but the first matching rule is used.)
The final transition matrix can be calculated with:
Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]]
ListDensityPlot is a useful visualization tool.
Edit (by belisarius)
Just a confirmation. The following code gives the same matrix as in #Eelvex's answer:
step[k_, n_] := Normal[SparseArray[{{k, i_} -> (1/n),
{j_, k} -> (1/n), {i_, i_} -> ((n - 1)/n)}, {n, n}]];
r[n_, s_] := Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]];
Last#Table[r[4, i], {i, 1, 4}] // MatrixForm
Wikipedia's page on the Fisher-Yates shuffle has a description and example of exactly what will happen in that case.
You can compute the distribution using stochastic matrices. Let the matrix A(i,j) describe the probability of the card originally at position i ending up in position j. Then the kth swap has a matrix Ak given by Ak(i,j) = 1/N if i == k or j == k, (the card in position k can end up anywhere and any card can end up at position k with equal probability), Ak(i,i) = (N - 1)/N for all i != k (every other card will stay in the same place with probability (N-1)/N) and all other elements zero.
The result of the complete shuffle is then given by the product of the matrices AN ... A1.
I expect you're looking for an algebraic description of the probabilities; you can get one by expanding out the above matrix product, but it I imagine it will be fairly complex!
UPDATE: I just spotted wnoise's equivalent answer above! oops...
I've looked into this further, and it turns out that this distribution has been studied at length. The reason it's of interest is because this "broken" algorithm is (or was) used in the RSA chip system.
In Shuffling by semi-random transpositions, Elchanan Mossel, Yuval Peres, and Alistair Sinclair study this and a more general class of shuffles. The upshot of that paper appears to be that it takes log(n) broken shuffles to achieve near random distribution.
In The bias of three pseudorandom shuffles (Aequationes Mathematicae, 22, 1981, 268-292), Ethan Bolker and David Robbins analyze this shuffle and determine that the total variation distance to uniformity after a single pass is 1, indicating that it is not very random at all. They give asympotic analyses as well.
Finally, Laurent Saloff-Coste and Jessica Zuniga found a nice upper bound in their study of inhomogeneous Markov chains.
This question is begging for an interactive visual matrix diagram analysis of the broken shuffle mentioned. Such a tool is on the page Will It Shuffle? - Why random comparators are bad by Mike Bostock.
Bostock has put together an excellent tool that analyzes random comparators. In the dropdown on that page, choose naïve swap (random ↦ random) to see the broken algorithm and the pattern it produces.
His page is informative as it allows one to see the immediate effects a change in logic has on the shuffled data. For example:
This matrix diagram using a non-uniform and very-biased shuffle is produced using a naïve swap (we pick from "1 to N") with code like this:
function shuffle(array) {
var n = array.length, i = -1, j;
while (++i < n) {
j = Math.floor(Math.random() * n);
t = array[j];
array[j] = array[i];
array[i] = t;
}
}
But if we implement a non-biased shuffle, where we pick from "k to N" we should see a diagram like this:
where the distribution is uniform, and is produced from code such as:
function FisherYatesDurstenfeldKnuthshuffle( array ) {
var pickIndex, arrayPosition = array.length;
while( --arrayPosition ) {
pickIndex = Math.floor( Math.random() * ( arrayPosition + 1 ) );
array[ pickIndex ] = [ array[ arrayPosition ], array[ arrayPosition ] = array[ pickIndex ] ][ 0 ];
}
}
The excellent answers given so far are concentrating on the distribution, but you have asked also "What happens if you make this mistake?" - which is what I haven't seen answered yet, so I'll give an explanation on this:
The Knuth-Fisher-Yates shuffle algorithm picks 1 out of n elements, then 1 out of n-1 remaining elements and so forth.
You can implement it with two arrays a1 and a2 where you remove one element from a1 and insert it into a2, but the algorithm does it in place (which means, that it needs only one array), as is explained here (Google: "Shuffling Algorithms Fisher-Yates DataGenetics") very well.
If you don't remove the elements, they can be randomly chosen again which produces the biased randomness. This is exactly what the 2nd example your are describing does. The first example, the Knuth-Fisher-Yates algorithm, uses a cursor variable running from k to N, which remembers which elements have already been taken, hence avoiding to pick elements more than once.

Resources