Random char stream choose uniform distribution - algorithm

Ok another ridiculous daily code problem with an absurd answer
Q:
Given a stream of elements too large to store in memory, pick a random element from the stream with uniform probability
A:
For the base case where i = 0, let's say the random element is the first one. Then we know it works because:
For i = 0, we would've picked uniformly from [0, 0].
For i > 0, before the iteration began, any element K in [0, i - 1] had a 1 / i chance of being chosen as the random element. We want K to have a 1 / (i + 1) chance of being chosen after the iteration. This is the case, since the chance of K having been chosen already but not getting swapped with the i-th element is 1 / i * (1 - 1 / (i + 1)), which is 1 / i * i / (i + 1), or 1 / (i + 1).
The code:
import random

def pick(big_stream):
    random_element = None
    for i, e in enumerate(big_stream):
        if i == 0:
            random_element = e
        elif random.randint(1, i + 1) == 1:
            random_element = e
    return random_element
So, about the 1/(k+1): if you're still inside the loop, isn't it just 1/k? This elif with 1/(k+1), i.e. the i + 1, seems artificial, and it wouldn't affect the O(n) time; at most it makes it O(n + 1), which is the same as O(n).
What really is meant by this question? This algorithm seems really superficial; are there any suggestions that can really beat O(n)? The programming language looks Perl-esque, but it's not Perl; what language would be close to it? Is there a more suitable (more specific) language for this?

The programming language (once the indentation is fixed) is most definitely Python.
What really is meant by this question? This algorithm seems really superficial; are there any suggestions that can really beat O(n)?
This question is mostly about space, not runtime. The algorithm they provide runs in constant space, which was the point. The naive answer would use an enormous amount of space.
There is a simpler (not necessarily faster) algorithm however, which can also be used to sample k elements from n elements in O(n log k) time and O(k) space. It works like this:
Assign a uniformly random real value in [0, 1] to each element from the stream as you receive it. Using a max-heap (so the largest of the k kept keys can be evicted in O(log k)), keep track of the k smallest random values and their associated elements. Once the stream is fully processed, return the elements that remain in the heap.
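For general k, a minimal sketch of that idea might look like the following (my own code, assuming Python's heapq; keys are negated so the min-heap acts as a max-heap over the k smallest keys, and a running index breaks ties so the elements themselves are never compared):

import heapq
import random

def sample_k(big_stream, k):
    # Keep the k elements whose random keys are smallest.
    heap = []  # entries are (-key, tiebreak, element)
    for i, e in enumerate(big_stream):
        key = random.random()
        if len(heap) < k:
            heapq.heappush(heap, (-key, i, e))
        elif key < -heap[0][0]:
            # evict the largest of the k smallest keys seen so far
            heapq.heapreplace(heap, (-key, i, e))
    return [e for _, _, e in heap]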
Which for k = 1 simply becomes:
Assign a uniformly random real value in [0, 1] to each element and return the element with the smallest random value.

This is an implementation of the algorithm described by @orlp for k=1:
import random

def pick(big_stream):
    choice = None
    minp = 1.0
    for e in big_stream:
        p = random.random()
        if p >= minp:
            continue
        minp = p
        choice = e
    return choice
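A quick sanity check (my own harness, not part of either answer): sampling repeatedly from a small stream should give each element roughly equal frequency:

from collections import Counter

counts = Counter(pick(range(5)) for _ in range(100000))
print(counts)  # each of 0..4 should show up close to 20000 times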

Related

Fixing this faulty Bingo Sort implementation

While studying Selection Sort, I came across a variation known as Bingo Sort. According to this dictionary entry here, Bingo Sort is:
A variant of selection sort that orders items by first finding the least value, then repeatedly moving all items with that value to their final location and find the least value for the next pass.
Based on the definition above, I came up with the following implementation in Python:
def bingo_sort(array, ascending=True):
    from operator import lt, gt

    def comp(x, y, func):
        return func(x, y)

    i = 0
    while i < len(array):
        min_value = array[i]
        j = i + 1
        for k in range(i + 1, len(array), 1):
            if comp(array[k], min_value, (lt if ascending else gt)):
                min_value = array[k]
                array[i], array[k] = array[k], array[i]
            elif array[k] == min_value:
                array[j], array[k] = array[k], array[j]
                j += 1
        i = j
    return array
I know that this implementation is problematic. When I run the algorithm on an extremely small array, I get a correctly sorted array. However, running the algorithm with a larger array results in an array that is mostly sorted with incorrect placements here and there. To replicate the issue in Python, the algorithm can be run on the following input:
from random import randint, uniform

test_data = [[randint(0, 101) for i in range(0, 101)],
             [uniform(0, 101) for i in range(0, 101)],
             ["a", "aa", "aaaaaa", "aa", "aaa"],
             [5, 5.6],
             [3, 2, 4, 1, 5, 6, 7, 8, 9]]

for dataset in test_data:
    print(dataset)
    print(bingo_sort(dataset, ascending=True, mutation=True))
    print("\n")
I cannot for the life of me figure out where the fault is, since I've been looking at this algorithm for too long and I am not really proficient at these things. I could not find an implementation of Bingo Sort online except an undergraduate graduation project written in 2020. Any help that can point me in the right direction would be greatly appreciated.
I think your main problem is that you're trying to set min_value in your first conditional statement and then to swap based on that same min_value you've just set in your second conditional statement. These processes are supposed to be staggered: the way bingo sort should work is you find the min_value in one iteration, and in the next iteration you swap all instances of that min_value to the front while also finding the next min_value for the following iteration. In this way, min_value should only get changed at the end of every iteration, not during it. When you change the value you're swapping to the front over the course of a given iteration, you can end up unintentionally shuffling things a bit.
I have an implementation of this below if you want to refer to something, with a few notes: since you're allowing a custom comparator, I renamed min_value to swap_value as we're not always grabbing the min, and I modified how the comparator is defined/passed into the function to make the algorithm more flexible. Also, you don't really need three indexes (I think there were even a couple bugs here), so I collapsed i and j into swap_idx, and renamed k to cur_idx. Finally, because of how swapping a given swap_val and finding the next_swap_val is to be staggered, you need to find the initial swap_val up front. I'm using a reduce statement for that, but you could just use another loop over the whole array there; they're equivalent. Here's the code:
from operator import lt, gt
from functools import reduce

def bingo_sort(array, comp=lt):
    if len(array) <= 1:
        return array

    # get the initial swap value as determined by comp
    swap_val = reduce(lambda val, cur: cur if comp(cur, val) else val, array)
    swap_idx = 0  # set the initial swap_idx to 0
    while swap_idx < len(array):
        cur_idx = swap_idx
        next_swap_val = array[cur_idx]
        while cur_idx < len(array):
            if comp(array[cur_idx], next_swap_val):  # find next swap value
                next_swap_val = array[cur_idx]
            if array[cur_idx] == swap_val:  # swap swap_vals to front of the array
                array[swap_idx], array[cur_idx] = array[cur_idx], array[swap_idx]
                swap_idx += 1
            cur_idx += 1
        swap_val = next_swap_val
    return array
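A quick usage sketch (my own, reusing the question's test cases with the function defined above; the expected outputs are my own check):

from operator import gt

print(bingo_sort([3, 2, 4, 1, 5, 6, 7, 8, 9]))         # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(bingo_sort([5, 5.6], comp=gt))                    # [5.6, 5] (descending)
print(bingo_sort(["a", "aa", "aaaaaa", "aa", "aaa"]))   # ['a', 'aa', 'aa', 'aaa', 'aaaaaa']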
In general, the complexity of this algorithm depends on how many duplicate values get processed, and when they get processed. This is because every time k duplicate values get processed during a given iteration, the length of the inner loop is decreased by k for all subsequent iterations. Performance is therefore optimized when large clusters of duplicate values are processed early on (as when the smallest values of the array contain many duplicates). From this, there are basically two ways you could analyze the complexity of the algorithm: You could analyze it in terms of where the duplicate values tend to appear in the final sorted array (Type 1), or you could assume the clusters of duplicate values are randomly distributed through the sorted array and analyze complexity in terms of the average size of duplicate clusters (that is, in terms of the magnitude of m relative to n: Type 2).
The definition you linked uses the first type of analysis (based on where duplicates tend to appear) to derive best = Theta(n+m^2), average = Theta(nm), worst = Theta(nm). The second type of analysis produces best = Theta(n), average = Theta(nm), worst = Theta(n^2) as m ranges from Theta(1), through intermediate values, to Theta(n).
In the best Type 1 case, all duplicates will be among the smallest elements of the array, such that the run-time of the inner loop quickly decreases to O(m), and the final iterations of the algorithm proceed as an O(m^2) selection sort. However, there is still the up-front O(n) pass to select the initial swap value, so the overall complexity is O(n + m^2).
In the worst Type 1 case, all duplicates will be among the largest elements of the array. The length of the inner loop isn't substantially shortened until the last iterations of the algorithm, such that we achieve a run-time looking something like n + n-1 + n-2 .... + n-m. This is a sum of m O(n) values, giving us O(nm) total run-time.
In the average Type 1 case (and for all Type 2 cases), we don't assume that the clusters of duplicate values are biased towards the front or back of the sorted array. We take it that the m clusters of duplicate values are randomly distributed through the array in terms of their position and their size. Under this analysis, we expect that after the initial O(n) pass to find the first swap value, each of the m iterations of the outer loop reduces the length of the inner loop by approximately n/m. This leads to an expression of the overall run-time for unknown m and randomly distributed data as roughly:
n + sum_{i=0..m-1} (n - i*n/m) ≈ n + nm/2 + n/2 = Theta(nm)
We can use this expression for the average case run-time with randomly distributed data and unknown m, Theta(nm), as the average Type 2 run-time, and it also directly gives us the best and worst case run-times based on how we might vary the magnitude of m.
In the best Type 2 case, m might just be some constant value independent of n. If we have m = Theta(1) randomly distributed duplicate clusters, the best case run time is then Theta(n * Theta(1)) = Theta(n). For example, you would see O(2n) = O(n) performance from bingo sort with just one unique value (one pass to find the swap value, one pass to swap every single value to the front), and this O(n) asymptotic complexity still holds if m is bounded by any constant.
However, in the worst Type 2 case we could have m = Theta(n), and bingo sort essentially devolves into O(n^2) selection sort. This is clearly the case for m = n, but if the amount the inner loop's run-time is expected to decrease by with each iteration, n/m, is any constant value, which is the case for any m in Theta(n), we still see O(n^2) complexity.

Shuffle a deck of card with equal likely to be 52 permutations [duplicate]

The famous Fisher-Yates shuffle algorithm can be used to randomly permute an array A of length N:
For k = 1 to N
Pick a random integer j from k to N
Swap A[k] and A[j]
A common mistake that I've been told over and over again not to make is this:
For k = 1 to N
Pick a random integer j from 1 to N
Swap A[k] and A[j]
That is, instead of picking a random integer from k to N, you pick a random integer from 1 to N.
What happens if you make this mistake? I know that the resulting permutation isn't uniformly distributed, but I don't know what guarantees there are on what the resulting distribution will be. In particular, does anyone have an expression for the probability distributions over the final positions of the elements?
An Empirical Approach.
Let's implement the erroneous algorithm in Mathematica:
p = 10; (* Range *)
s = {}
For[l = 1, l <= 30000, l++, (* Iterations *)
  a = Range[p];
  For[k = 1, k <= p, k++,
    i = RandomInteger[{1, p}];
    temp = a[[k]];
    a[[k]] = a[[i]];
    a[[i]] = temp
  ];
  AppendTo[s, a];
]
Now get the number of times each integer is in each position:
r = SortBy[#, #[[1]] &] & /@ Tally /@ Transpose[s]
Let's take three positions in the resulting arrays and plot the frequency distribution for each integer in that position:
For position 1 the freq distribution is:
For position 5 (middle)
And for position 10 (last):
And here you have the distribution for all positions plotted together:
Here you have better statistics, over 8 positions:
Some observations:
For all positions the probability of "1" is the same (1/n).
The probability matrix is symmetrical with respect to the big anti-diagonal.
So, the probability for any number in the last position is also uniform (1/n).
You can visualize those properties by looking at all the lines starting from the same point (first property) and at the last horizontal line (third property).
The second property can be seen from the following matrix representation example, where the rows are the positions, the columns are the occupant number, and the color represents the experimental probability:
For a 100x100 matrix:
Edit
Just for fun, I calculated the exact formula for the second diagonal element (the first is 1/n). The rest can be done, but it's a lot of work.
h[n_] := (n-1)/n^2 + (n-1)^(n-2) n^(-n)
Values verified from n=3 to 6 ( {8/27, 57/256, 564/3125, 7105/46656} )
Edit
Working out a little the general explicit calculation in @wnoise's answer, we can get a little more info.
Replacing 1/n by p[n], so the calculations are held unevaluated, we get, for example, the first part of the matrix for n=7:
Which, after comparing with results for other values of n, lets us identify some known integer sequences in the matrix:
{{ 1/n, 1/n , ...},
{... .., A007318, ....},
{... .., ... ..., ..},
... ....,
{A129687, ... ... ... ... ... ... ..},
{A131084, A028326 ... ... ... ... ..},
{A028326, A131084 , A129687 ... ....}}
You may find those sequences (in some cases with different signs) in the wonderful http://oeis.org/
Solving the general problem is more difficult, but I hope this is a start
The "common mistake" you mention is shuffling by random transpositions. This problem was studied in full detail by Diaconis and Shahshahani in Generating a random permutation with random transpositions (1981). They do a complete analysis of stopping times and convergence to uniformity. If you cannot get a link to the paper, then please send me an e-mail and I can forward you a copy. It's actually a fun read (as are most of Persi Diaconis's papers).
If the array has repeated entries, then the problem is slightly different. As a shameless plug, this more general problem is addressed by myself, Diaconis and Soundararajan in Appendix B of A Rule of Thumb for Riffle Shuffling (2011).
Let's say
a = 1/N
b = 1-a
Bi(k) is the probability (row) vector after i swaps for the k-th element, i.e. the answer to the question "where is k after i swaps?". For example B0(3) = (0 0 1 0 ... 0) and B1(3) = (a 0 b 0 ... 0). What you want is BN(k) for every k.
Ki is an NxN matrix with 1s in the i-th column and i-th row, zeroes everywhere else; e.g. for i = 2 and N = 4:

Ki =
0 1 0 0
1 1 1 1
0 1 0 0
0 1 0 0

Ii is the identity matrix but with the element x=y=i zeroed. E.g. for i = 2:

Ii =
1 0 0 0
0 0 0 0
0 0 1 0
0 0 0 1

Ai is

Ai = a Ki + b Ii

Then,

Bi(k) = B(i-1)(k) Ai

But because B0(k), for k = 1..N, stacked as rows forms the identity matrix, the probability that any given element i will at the end be at position j is given by the matrix element (i,j) of the matrix:

A1 A2 ... AN

For example, for N=4:
As a diagram for N = 500 (color levels are 100*probability):
The pattern is the same for all N>2:
The most probable ending position for k-th element is k-1.
The least probable ending position is k for k < N*ln(2), position 1 otherwise
I knew I had seen this question before...
" why does this simple shuffle algorithm produce biased results? what is a simple reason? " has a lot of good stuff in the answers, especially a link to a blog by Jeff Atwood on Coding Horror.
As you may have already guessed, based on the answer by @belisarius, the exact distribution is highly dependent on the number of elements to be shuffled. Here's Atwood's plot for a 6-element deck:
What a lovely question! I wish I had a full answer.
Fisher-Yates is nice to analyze because once it decides on the first element, it leaves it alone. The biased one can repeatedly swap an element in and out of any place.
We can analyze this the same way we would a Markov chain, by describing the actions as stochastic transition matrices acting linearly on probability distributions. Most elements get left alone; the diagonal is usually (n-1)/n. On pass k, when they don't get left alone, they get swapped with element k (or with a random element, if they are element k). This is 1/n in either row or column k. The element in both row and column k is also 1/n. It's easy enough to multiply these matrices together for k going from 1 to n.
We do know that the element in last place will be equally likely to have originally been anywhere because the last pass swaps the last place equally likely with any other. Similarly, the first element will be equally likely to be placed anywhere. This symmetry is because the transpose reverses the order of matrix multiplication. In fact, the matrix is symmetric in the sense that row i is the same as column (n+1 - i). Beyond that, the numbers don't show much apparent pattern. These exact solutions do show agreement with the simulations run by belisarius: in slot i, the probability of getting j decreases as j rises to i, reaching its lowest value at i-1, then jumping up to its highest value at i, and decreasing until j reaches n.
In Mathematica I generated each step with
step[k_, n_] := Normal[SparseArray[{{k, i_} -> 1/n,
{j_, k} -> 1/n, {i_, i_} -> (n - 1)/n} , {n, n}]]
(I haven't found it documented anywhere, but the first matching rule is used.)
The final transition matrix can be calculated with:
Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]]
ListDensityPlot is a useful visualization tool.
Edit (by belisarius)
Just a confirmation. The following code gives the same matrix as in @Eelvex's answer:
step[k_, n_] := Normal[SparseArray[{{k, i_} -> (1/n),
{j_, k} -> (1/n), {i_, i_} -> ((n - 1)/n)}, {n, n}]];
r[n_, s_] := Fold[Dot, IdentityMatrix[n], Table[step[m, n], {m, s}]];
Last@Table[r[4, i], {i, 1, 4}] // MatrixForm
Wikipedia's page on the Fisher-Yates shuffle has a description and example of exactly what will happen in that case.
You can compute the distribution using stochastic matrices. Let the matrix A(i,j) describe the probability of the card originally at position i ending up in position j. Then the kth swap has a matrix Ak given by Ak(i,j) = 1/N if i == k or j == k, (the card in position k can end up anywhere and any card can end up at position k with equal probability), Ak(i,i) = (N - 1)/N for all i != k (every other card will stay in the same place with probability (N-1)/N) and all other elements zero.
The result of the complete shuffle is then given by the product of the matrices AN ... A1.
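A small numerical sketch of that product (my own, assuming numpy; the function name is made up), building each Ak explicitly and multiplying them:

import numpy as np

def broken_shuffle_matrix(n):
    # P[i, j] = probability that the card starting at position i ends at position j
    result = np.eye(n)
    for k in range(n):
        a = np.zeros((n, n))
        a[k, :] = 1.0 / n                 # the card at position k can end up anywhere
        a[:, k] = 1.0 / n                 # any card can end up at position k
        others = np.arange(n) != k
        a[others, others] = (n - 1) / n   # every other card stays put
        result = result @ a
    return result

print(broken_shuffle_matrix(4).round(4))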
I expect you're looking for an algebraic description of the probabilities; you can get one by expanding out the above matrix product, but I imagine it will be fairly complex!
UPDATE: I just spotted wnoise's equivalent answer above! oops...
I've looked into this further, and it turns out that this distribution has been studied at length. The reason it's of interest is because this "broken" algorithm is (or was) used in the RSA chip system.
In Shuffling by semi-random transpositions, Elchanan Mossel, Yuval Peres, and Alistair Sinclair study this and a more general class of shuffles. The upshot of that paper appears to be that it takes log(n) broken shuffles to achieve near random distribution.
In The bias of three pseudorandom shuffles (Aequationes Mathematicae, 22, 1981, 268-292), Ethan Bolker and David Robbins analyze this shuffle and determine that the total variation distance to uniformity after a single pass is 1, indicating that it is not very random at all. They give asymptotic analyses as well.
Finally, Laurent Saloff-Coste and Jessica Zuniga found a nice upper bound in their study of inhomogeneous Markov chains.
This question is begging for an interactive visual matrix diagram analysis of the broken shuffle mentioned. Such a tool is on the page Will It Shuffle? - Why random comparators are bad by Mike Bostock.
Bostock has put together an excellent tool that analyzes random comparators. In the dropdown on that page, choose naïve swap (random ↦ random) to see the broken algorithm and the pattern it produces.
His page is informative as it allows one to see the immediate effects a change in logic has on the shuffled data. For example:
This matrix diagram using a non-uniform and very-biased shuffle is produced using a naïve swap (we pick from "1 to N") with code like this:
function shuffle(array) {
    var n = array.length, i = -1, j, t;
    while (++i < n) {
        j = Math.floor(Math.random() * n);
        t = array[j];
        array[j] = array[i];
        array[i] = t;
    }
}
But if we implement a non-biased shuffle, where we pick from "k to N" we should see a diagram like this:
where the distribution is uniform, and is produced from code such as:
function FisherYatesDurstenfeldKnuthshuffle( array ) {
    var pickIndex, arrayPosition = array.length;
    while( --arrayPosition ) {
        pickIndex = Math.floor( Math.random() * ( arrayPosition + 1 ) );
        array[ pickIndex ] = [ array[ arrayPosition ], array[ arrayPosition ] = array[ pickIndex ] ][ 0 ];
    }
}
The excellent answers given so far are concentrating on the distribution, but you have also asked "What happens if you make this mistake?", which I haven't seen answered yet, so I'll give an explanation of this:
The Knuth-Fisher-Yates shuffle algorithm picks 1 out of n elements, then 1 out of n-1 remaining elements and so forth.
You can implement it with two arrays a1 and a2, where you remove one element from a1 and insert it into a2, but the algorithm does it in place (which means that it needs only one array), as is explained very well here (Google: "Shuffling Algorithms Fisher-Yates DataGenetics").
If you don't remove the elements, they can be randomly chosen again, which produces the biased randomness. This is exactly what the 2nd example you are describing does. The first example, the Knuth-Fisher-Yates algorithm, uses a cursor variable running from k to N, which remembers which elements have already been taken, hence avoiding picking elements more than once.

Find subset with elements that are furthest apart from each other

I have an interview question that I can't seem to figure out. Given an array of size N, find the subset of size k such that the elements in the subset are the furthest apart from each other. In other words, maximize the minimum pairwise distance between the elements.
Example:
Array = [1,2,6,10]
k = 3
answer = [1,6,10]
The bruteforce way requires finding all subsets of size k which is exponential in runtime.
One idea I had was to take values evenly spaced from the array. What I mean by this is
Take the 1st and last element
find the difference between them (in this case 10-1) and divide that by k ((10-1)/3=3)
move 2 pointers inward from both ends, picking out elements that are +/- 3 from your previous pick. So in this case, you start from 1 and 10 and find the closest elements to 4 and 7. That would be 6.
This is based on the intuition that the elements should be as evenly spread as possible. I have no idea how to prove it works/doesn't work. If anyone knows how or has a better algorithm please do share. Thanks!
This can be solved in polynomial time using DP.
The first step is, as you mentioned, to sort the list A. Let X[i, j] be the solution for selecting j elements from the first i elements of A.
Now, X[i+1, j+1] = max( min( X[k, j], A[i+1] - A[k] ) ) over k <= i.
I will leave the initialization step and the recording of the chosen subset for you to work on.
In your example (1, 2, 6, 10) it works the following way (rows are j, the number of selected elements; columns are i, indexed by the sorted values A[i]):

 j \ A[i]   1   2   6   10
   1        -   -   -   -
   2        -   1   5   9
   3        -   -   1   4
   4        -   -   -   1
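A minimal sketch of that DP (my own fill-in of the initialization, with reconstruction of the subset itself omitted); here X[i][j] is the best achievable minimum gap when choosing j elements from the first i elements of the sorted array, with element i forced to be the last one chosen:

def max_min_gap(a, k):
    a = sorted(a)
    n = len(a)
    INF = float("inf")
    X = [[None] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        X[i][1] = INF  # a single chosen element has no pair, so its gap is unbounded
    for j in range(2, k + 1):
        for i in range(j, n + 1):
            X[i][j] = max(
                min(X[p][j - 1], a[i - 1] - a[p - 1])
                for p in range(j - 1, i)
            )
    return max(X[i][k] for i in range(k, n + 1))

print(max_min_gap([1, 2, 6, 10], 3))  # 4, achieved e.g. by the subset [1, 6, 10]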
The basic idea is right, I think. You should start by sorting the array, then take the first and the last elements, then determine the rest.
I cannot think of a polynomial algorithm to solve this, so I would suggest one of the two options.
One is to use a search algorithm, branch-and-bound style, since you have a nice heuristic at hand: the upper bound for any solution is the minimum size of the gap between the elements picked so far, so the first guess (evenly spaced cells, as you suggested) can give you a good baseline, which will help prune most of the branches right away. This will work fine for smaller values of k, although the worst case performance is O(N^k).
The other option is to start with the same baseline, calculate the minimum pairwise distance for it and then try to improve it. Say you have a subset with minimum distance of 10, now try to get one with 11. This can be easily done by a greedy algorithm -- pick the first item in the sorted sequence such that the distance between it and the previous item is bigger-or-equal to the distance you want. If you succeed, try increasing further, if you fail -- there is no such subset.
The latter solution can be faster when the array is large and k is relatively large as well, but the elements in the array are relatively small. If they are bound by some value M, this algorithm will take O(N*M) time, or, with a small improvement, O(N*log(M)), where N is the size of the array.
As Evgeny Kluev suggests in his answer, there is also a good upper bound on the maximum pairwise distance, which can be used in either one of these algorithms. So the complexity of the latter is actually O(N*log(M/k)).
You can do this in O(n*(log n) + n*log(M)), where M is max(A) - min(A).
The idea is to use binary search to find the maximum separation possible.
First, sort the array. Then, we just need a helper function that takes in a distance d, and greedily builds the longest subarray possible with consecutive elements separated by at least d. We can do this in O(n) time.
If the generated array has length at least k, then the maximum separation possible is >=d. Otherwise, it's strictly less than d. This means we can use binary search to find the maximum value. With some cleverness, you can shrink the 'low' and 'high' bounds of the binary search, but it's already so fast that sorting would become the bottleneck.
Python code:
from typing import List

def maximize_distance(nums: List[int], k: int) -> List[int]:
    """Given an array of numbers and size k, uses binary search
    to find a subset of size k with maximum min-pairwise-distance"""
    assert len(nums) >= k
    if k == 1:
        return [nums[0]]
    nums.sort()

    def longest_separated_array(desired_distance: int) -> List[int]:
        """Given a distance, returns a subarray of nums
        of length k with pairwise differences at least that distance (if
        one exists)."""
        answer = [nums[0]]
        for x in nums[1:]:
            if x - answer[-1] >= desired_distance:
                answer.append(x)
                if len(answer) == k:
                    break
        return answer

    low, high = 0, (nums[-1] - nums[0])
    while low < high:
        mid = (low + high + 1) // 2
        if len(longest_separated_array(mid)) == k:
            low = mid
        else:
            high = mid - 1
    return longest_separated_array(low)
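Using the example from the question with the function above (the expected output is my own check):

print(maximize_distance([1, 2, 6, 10], 3))  # [1, 6, 10], minimum pairwise distance 4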
I suppose your set is ordered. If not, my answer will be changed slightly.
Let's suppose you have an array X = (X1, X2, ..., Xn)

Energy(Xi) = min(|X(i-1) - Xi|, |X(i+1) - Xi|), 1 < i < n

j <- 1
while j < n - k do
    X.Exclude(min(Energy(Xi)), 1 < i < n)
    j <- j + 1
    n <- n - 1
end while
$length = length($array);
sort($array);                                  // sorts the list in ascending order
$differences = ($array << 1) - $array;         // gets the difference between each value and the next largest value
sort($differences);                            // sorts the list in ascending order
$max = ($array[$length-1] - $array[0]) / $M;   // this is the theoretical max of how large the result can be
$result = array();
for ($i = 0; $i < $length-1; $i++){
    $count += $differences[$i];
    if ($length - $i == $M - 1 || $count >= $max){  // if there are either no more elements that can be taken or we have gone above or equal to the theoretical max, add a point
        $result.push_back($count);
        $count = 0;
        $M--;
    }
}
return min($result);
For the non-code people: sort the list, find the differences between each two sequential elements, sort that list (in ascending order), then loop through it, summing up sequential values until you either pass the theoretical max or there aren't enough elements remaining; then add that value to a new array and continue until you hit the end of the array. Then return the minimum of the newly created array.
This is just a quick draft though. At a quick glance any operation here can be done in linear time (radix sort for the sorts).
For example, with 1, 4, 7, 100, and 200 and M=3, we get:
$differences = 3, 3, 93, 100
$max = (200-1)/3 ~ 67
then we loop:
$count = 3, 3+3=6, 6+93=99 > 67 so we push 99
$count = 100 > 67 so we push 100
min(99,100) = 99
It is a simple exercise to convert this to the set solution that I leave to the reader (P.S. after all the times reading that in a book, I've always wanted to say it :P)


How to generate a permutation?

My question is: given a list L of length n, and an integer i such that 0 <= i < n!, how can you write a function perm(L, i) to produce the ith permutation of L in O(n) time? What I mean by the ith permutation is just the ith permutation in some implementation-defined ordering that must have the properties:
For any i and any 2 lists A and B, perm(A, i) and perm(B, i) must both map the jth element of A and B to an element in the same position for both A and B.
For any inputs (A, i), (A, j) perm(A, i)==perm(A, j) if and only if i==j.
NOTE: this is not homework. In fact, I solved this 2 years ago, but I've completely forgotten how, and it's killing me. Also, here is a broken attempt I made at a solution:
def perm(s, i):
    n = len(s)
    perm = [0]*n
    itCount = 0
    for elem in s:
        perm[i%n + itCount] = elem
        i = i / n
        n -= 1
        itCount += 1
    return perm
ALSO NOTE: the O(n) requirement is very important. Otherwise you could just generate the n! sized list of all permutations and just return its ith element.
def perm(sequence, index):
    sequence = list(sequence)
    result = []
    for x in xrange(len(sequence)):
        idx = index % len(sequence)
        index /= len(sequence)
        result.append(sequence[idx])
        # constant time non-order preserving removal
        sequence[idx] = sequence[-1]
        del sequence[-1]
    return result
Based on the algorithm for shuffling, but we take the least significant part of the number each time to decide which element to take instead of a random number. Alternatively, consider it like the problem of converting to some arbitrary base, except that the base shrinks for each additional digit.
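For example (my own check, assuming Python 2 semantics to match the xrange and integer division above), running the index over 0 .. n!-1 yields every permutation exactly once:

from math import factorial

items = ['a', 'b', 'c']
results = [tuple(perm(items, i)) for i in range(factorial(len(items)))]
print(len(set(results)))  # 6 -- all 3! permutations are distinct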
Could you use factoradics? You can find an illustration via this MSDN article.
Update: I wrote an extension of the MSDN algorithm that finds i'th permutation of n things taken r at a time, even if n != r.
A computational minimalistic approach (written in C-style pseudocode):
function perm(list, i){
    for(a = list.length; a; a--){
        list.switch(a-1, i mod a);
        i = i / a;
    }
    return list;
}
Note that implementations relying on removing elements from the original list tend to run in O(n^2) time, at best O(n*log(n)) given a special tree style list implementation designed for quickly inserting and removing list elements.
The above code rather than shrinking the original list and keeping it in order just moves an element from the end to the vacant location, still makes a perfect 1:1 mapping between index and permutation, just a slightly more scrambled one, but in pure O(n) time.
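Here is a Python rendering of that sketch (my own; the swap-based mapping differs from the removal-based one above, but it is still a one-to-one mapping for 0 <= i < n!):

def perm_swap(lst, i):
    lst = list(lst)  # work on a copy
    for a in range(len(lst), 0, -1):
        # move the chosen element into the slot that is about to become fixed
        lst[a - 1], lst[i % a] = lst[i % a], lst[a - 1]
        i //= a
    return lst

print(sorted(tuple(perm_swap(['a', 'b', 'c'], i)) for i in range(6)))  # all 6 permutations, each once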
So, I think I finally solved it. Before I read any answers, I'll post my own here.
def perm(L, i):
    n = len(L)
    if (n == 1):
        return L
    else:
        split = i % n
        return [L[split]] + perm(L[:split] + L[split+1:], i / n)
There are n! permutations. The first character can be chosen from L in n ways. Each of those choices leaves (n-1)! permutations among them. So this idea is enough for establishing an order. In general, you will figure out what part you are in, pick the appropriate element and then recurse / loop on the smaller L.
The argument that this works correctly is by induction on the length of the sequence. (sketch) For a length of 1, it is trivial. For a length of n, you use the above observation to split the problem into n parts, each with a question on an L' with length (n-1). By induction, all the L's are constructed correctly (and in linear time). Then it is clear we can use the IH to construct a solution for length n.
