Can we reduce the number of calls made to rand(n) in the Fisher–Yates shuffle algorithm? If not, how do we know that the number of calls we are making is reasonable?
Can we reduce the number of calls made to rand(n) in the Fisher–Yates shuffle algorithm?
No
You want to select k points from the array - there are k calls to rand(), up to N-1 of them.
If not, how do we know that the number of calls we are making is reasonable?
Well, you're supposed to know how many points you want to produce/permute, so that is how many calls to rand() you have to make.
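As a minimal sketch of that point (my own illustration, assuming the k selected elements are taken from the tail as in a partial Fisher-Yates), the loop makes exactly one rand() call per selected element:

    import random

    def partial_fisher_yates(a, k):
        """Permute/select k elements with exactly k calls to the RNG.
        (A full shuffle of n elements needs n - 1 calls.)"""
        n = len(a)
        for t in range(k):
            i = n - 1 - t
            j = random.randint(0, i)   # the single rand() call for this element
            a[j], a[i] = a[i], a[j]
        return a[n - k:]               # the k selected/permuted elements

    print(partial_fisher_yates(list(range(10)), 3))   # e.g. [4, 9, 1]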
I want to sort a list of n items from pairwise comparisons. Each round, I receive k comparisons, one each from k different "arbiters".
The arbiters cannot coordinate, and must choose their comparisons independently from myself and each other. How should they choose their comparisons so that I can sort the list of items in as few rounds as possible?
A naive solution is that each arbiter independently runs quicksort, sending over the corresponding comparisons they make. Ultimately, I'd just be waiting for one arbiter to finish sorting, so this would take O(n*log(n)) rounds for me to sort the list, and I literally receive no benefit from having k arbiters over just a single arbiter.
Another naive solution is that each arbiter independently sends over random comparisons. This would result in a coupon collector problem, and would take on average O(n^2*log(n)/k) rounds for me to get the right comparisons to sort the list. But unless k is in ω(n), this run-time isn't better than O(n*log(n)).
Is there a better solution? Maybe one that uses O(n*log(n)/k) rounds? (i.e. double the arbiters = half the rounds needed)
To be more concrete about the independence of arbiters: ideally, the arbiters would use symmetric randomized strategies. If that's not possible, though, then I'll allow the arbiters to have a strategy meeting one time only at the start.
Also, arbiters have to send exactly the comparisons they make. E.g. an arbiter cannot just sort the entire list themselves, and then send only the comparisons (arr[i] < arr[i+1]) for i=0 to n-2. They have to send each comparison they make as they make it.
I think I have a solution. It's totally symmetric with no coordination.
Each arbiter runs the following algorithm to choose their comparisons.
First, they consider evenly dividing n into k "sections" (e.g. section 1 is from 1 to n/k, section 2 is from n/k+1 to 2n/k, etc.). Then, they randomly choose one of k sections to focus on. Next, they use quickselect to get all the elements that will end up in their section (which takes O(n) comparisons). Finally, they fully sort their section (which takes O(n*log(n)/k) comparisons). After that, they randomly choose another section and repeat.
Getting all k sections sorted is a coupon collector problem: it takes about k*log(k) section picks in total which, spread over k arbiters, is only about log(k) picks per arbiter.
So the final number of rounds of this approach will be O(n*log(k) + n*log(n)*log(k)/k) = O(n*log(n)*log(k)/k), which gives you an improvement factor of log(k)/k. This can be upgraded to 1/k if the arbiters are allowed to choose at the start which sections they're in charge of.
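For concreteness, here is a rough Python sketch of one arbiter's step in the algorithm above (my own illustration; it assumes distinct elements and that k divides n, and the quickselect/section bookkeeping is just one way to realize it):

    import random

    def quickselect(items, rank):
        """Return the element with 0-indexed rank `rank` (rank-th smallest),
        using O(len(items)) comparisons on average. Assumes distinct values."""
        pivot = random.choice(items)
        lows = [x for x in items if x < pivot]
        highs = [x for x in items if x > pivot]
        if rank < len(lows):
            return quickselect(lows, rank)
        if rank == len(lows):
            return pivot
        return quickselect(highs, rank - len(lows) - 1)

    def arbiter_step(arr, k, done_sections):
        """One arbiter's step: pick a random section, isolate it with quickselect
        (O(n) comparisons), then sort it (O((n/k) log(n/k)) comparisons)."""
        n = len(arr)                             # assumes k divides n
        s = random.randrange(k)                  # randomly chosen section
        lo, hi = s * n // k, (s + 1) * n // k    # ranks covered by section s
        lo_val = quickselect(arr, lo)            # smallest element of the section
        hi_val = quickselect(arr, hi - 1)        # largest element of the section
        done_sections[s] = sorted(x for x in arr if lo_val <= x <= hi_val)

    # Coupon-collector loop: repeat until every section has been sorted by someone.
    done, arr = {}, random.sample(range(1000), 100)
    while len(done) < 10:
        arbiter_step(arr, 10, done)
    print([done[s][:3] for s in sorted(done)])   # first few elements of each section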
If they can coordinate their own numbers.
Here's an idea: partition the array into k parts as in quicksort. First, have everybody partition the entire array in two synchronously. Then let the first half of the arbiters partition the first half of the array while the second half of the arbiters partition the second half, and so on. This takes no more than 2n rounds. Next, let arbiter number i sort partition number i. This takes (n/k) log(n/k) rounds.
If they cannot coordinate their own numbers.
The same idea, but it takes longer. You still have them partition the array into k parts, only now they have to do it all the way down synchronously, so you don't benefit from having k arbiters at this stage. This takes n log k rounds. Then let each arbiter sort the partitions in a randomly chosen order. I'm too lazy to calculate the expected number of rounds here; off the top of my head, you will need about e (n/k) log(n/k) rounds, but don't quote me.
Another way if they cannot coordinate their own numbers.
Almost the same as above, but have each arbiter partition in a random order (first partition in half, then randomly choose which half to partition further, etc.). Sort as soon as there is a partition of size n/k, then randomly select what to partition next. This is harder to analyse than the above, but should be faster.
I have a range of numbers from 1-10 and I want to pick out 3 randomly, but never the same one twice. In Lua I used a Fisher-Yates shuffle, which is O(n); I know Python has a built-in random.sample(), also O(n). Can it be done faster with an arbitrary range and number of picks?
It is impossible to do better than O(n) when you have to touch each of the n numbers, because those n read operations alone already make it linear.
P.S.: This assumes that n is the number of picks. If you have an array, n is the size of that array, and the number of picks m is constant, then you can generate a random index with any method m times and achieve O(1), assuming array indexing takes constant time. I hope that answers your question; please clarify if it didn't.
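A minimal sketch of that P.S. (the answer doesn't say how duplicate picks are avoided, so the rejection loop is my own addition); for m picks from a range of size n with m much smaller than n the expected work is O(m), with no O(n) shuffle:

    import random

    def sample_distinct(n, m):
        """Pick m distinct values from 1..n by redrawing on collisions.
        Expected O(m) time when m is much smaller than n."""
        picked = set()
        while len(picked) < m:
            picked.add(random.randint(1, n))   # duplicates are simply re-drawn
        return list(picked)

    print(sample_distinct(10, 3))   # e.g. [2, 7, 9]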
Coin Change is a pretty popular problem which asks how many ways you can reach a sum of N cents using coins (C[0], C[1], ..., C[K-1]). The DP solution is to use the recurrence
s(N, K) = s(N, K-1) + s(N - C[K-1], K), where s(N, K) is the number of ways to arrive at a sum of N using the first K coins (sorted in ascending order).
It means the number of ways to make N cents using the first K coins is the number of ways to arrive at the same sum without using the Kth coin, plus the number of ways to arrive at N minus the Kth coin's value (still allowing the Kth coin). I really don't understand how you come across this solution, or how it makes sense logically. Could someone explain?
The most important thing when solving a DP problem is to reduce it to a set of simpler subproblems. Most of the time "simpler" means with smaller argument(s), and in this case a problem is simpler if either the sum is smaller or there are fewer remaining coin values.
My way of thinking about the problem is: okay, I have a set of coins and I need to count the number of ways I can form a given sum. This sounds complicated, but if I had one less coin type it would be a bit easier.
It also helps to think of the base case. Here, you know in how many ways you can form a given sum if all you had was a single coin. This suggests that the reduction to simpler problems will probably reduce the number of different coins.
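For reference, a short sketch of the recurrence in code; the rolling one-dimensional array is the usual space optimization of s(N, K) = s(N, K-1) + s(N - C[K-1], K):

    def count_ways(total, coins):
        """Number of ways to make `total` cents from the given coin values."""
        ways = [0] * (total + 1)
        ways[0] = 1                            # one way to make 0: use no coins
        for c in coins:                        # "adding" the K-th coin
            for amount in range(c, total + 1):
                # s(amount, K) = s(amount, K-1) + s(amount - c, K)
                ways[amount] += ways[amount - c]
        return ways[total]

    print(count_ways(10, [1, 2, 5]))           # 10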
Let's say we have some discrete distribution with a finite number of possible results. Is it possible to generate a random number from this distribution faster than in O(log n), where n is the number of possible results?
How to do it in O(log n) (a code sketch follows the list):
- Build an array of cumulative probabilities (Array[i] = probability that the result is less than or equal to i).
- Generate a random number from the uniform distribution (let's denote it by k).
- Find the smallest i such that k < Array[i]; this can be done with binary search.
- i is our random number.
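A small sketch of that method, assuming the input is a plain array of probabilities:

    import bisect, random
    from itertools import accumulate

    def make_sampler(probs):
        """Inverse-CDF sampling: O(n) setup, then O(log n) per draw via binary search."""
        cdf = list(accumulate(probs))            # cdf[i] = P(result <= i)
        def draw():
            k = random.random() * cdf[-1]        # uniform in [0, total)
            return bisect.bisect_right(cdf, k)   # smallest i with k < cdf[i]
        return draw

    draw = make_sampler([0.1, 0.5, 0.4])
    print([draw() for _ in range(5)])            # e.g. [1, 2, 1, 1, 2]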
Walker's alias method can draw a sample in constant worst-case time, using some auxiliary arrays of size n which need to be precomputed. This method is described in Chapter 3 of Devroye's book on sampling and is implemented in the R sample() function. You can get code from R's source code or this thread. A 1991 paper by Vose claims to reduce the initialization cost.
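For illustration, here is a compact sketch of the alias method in the Vose style (not R's actual code; numerical tie-breaking is simplified):

    import random

    def build_alias(probs):
        """Build alias tables in O(n); afterwards each draw is O(1)."""
        n = len(probs)
        scaled = [p * n for p in probs]
        prob, alias = [0.0] * n, [0] * n
        small = [i for i, s in enumerate(scaled) if s < 1.0]
        large = [i for i, s in enumerate(scaled) if s >= 1.0]
        while small and large:
            sm, lg = small.pop(), large.pop()
            prob[sm], alias[sm] = scaled[sm], lg   # keep sm w.p. scaled[sm], else alias to lg
            scaled[lg] -= 1.0 - scaled[sm]         # the remainder of column sm is taken from lg
            (small if scaled[lg] < 1.0 else large).append(lg)
        for i in small + large:                    # leftovers are ~1 up to rounding
            prob[i] = 1.0
        return prob, alias

    def alias_draw(prob, alias):
        i = random.randrange(len(prob))            # pick a column uniformly
        return i if random.random() < prob[i] else alias[i]

    prob, alias = build_alias([0.1, 0.5, 0.4])
    print([alias_draw(prob, alias) for _ in range(5)])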
Note that your question isn't well-defined unless you specify the exact form of the input and how many random numbers you want to draw. For example, if the input is an array giving the probability of each result, then your algorithm is not O(log n) because it requires first computing the cumulative probabilities which takes O(n) time from the input array.
If you intend to draw many samples, then the cost of generating a single sample is not so important; what matters is the total cost to generate m results and the peak memory required. In this regard, the alias method is very good. If you want to generate the samples all at once, use the O(n+m) algorithm posted here and then shuffle the results.
I'd like to produce fast random shuffles repeatedly with minimal bias.
It's known that the Fisher-Yates shuffle is unbiased as long as the underlying random number generator (RNG) is unbiased.
To shuffle an array a of n elements:
for i from n − 1 downto 1 do
    j ← random integer with 0 ≤ j ≤ i
    exchange a[j] and a[i]
But what if the RNG is biased (but fast)?
Suppose I want to produce many random permutations of an array of 25 elements. If I use the Fisher-Yates algorithm with a biased RNG, then my permutations will be biased, but I believe this assumes that the 25-element array starts from the same state before each application of the shuffle. One problem, for example, is that if the RNG only has a period of 2^32 ~ 10^9, we cannot produce every possible permutation of the 25 elements, because there are 25! ~ 10^25 permutations.
My general question is, if I leave the shuffled elements shuffled before starting each new application of the Fisher-Yates shuffle, would this reduce the bias and/or allow the algorithm to produce every permutation?
My guess is that it would generally produce better results, but it seems that if the number of elements in the repeatedly shuffled array were related to the period of the underlying RNG, the permutations could actually repeat more often than expected.
Does anyone know of any research that addresses this?
As a sub-question, what if I only want repeated permutations of 5 of the 25 elements in the array, so I use the Fisher-Yates algorithm to select 5 elements and stop before doing a full shuffle? (I use the 5 elements at the end of the array that got swapped.) Then I start over, using the previously (partially) shuffled 25-element array, to select another permutation of 5. Again, it seems like this would be better than starting from the original 25-element array if the underlying RNG had a bias. Any thoughts on this?
I think it would be easier to test the partial-shuffle case, since there are only 6,375,600 possible permutations of 5 out of 25 elements. Are there any simple tests I can use to check for biases?
if the RNG only has a period of 2^32 ~ 10^9 we cannot produce every possible permutation of the 25 elements because there are 25! ~ 10^25 permutations
This is only true as long as the seed determines every successive selection. If your RNG can be expected to deliver a precisely even distribution over the range specified for each selection, then it can produce every permutation. If your RNG cannot do that, having a larger seed base will not help.
As for your side question, you might as well reseed for every draw. However, reseeding the generator is only useful if the new seed contains enough entropy. Timestamps don't contain much entropy, and neither do algorithmic calculations.
I'm not sure what this solution is part of because you have not listed it, but if you are trying to calculate something from a larger domain using random input, there are probably better methods.
A couple of points:
1) Anyone using the Fisher Yates shuffle should read this and make doubly sure their implementation is correct.
2) Doesn't repeating the shuffle defeat the purpose of using a faster random number generator? Surely if you're going to have to repeat every shuffle 5 times to get the desired entropy, you're better off using a low-bias generator.
3) Do you have a setup where you can test this? If so, start trying things - Jeff's graphs make it clear that you can easily detect quite a lot of errors by using small decks and visually portraying the results, as in the sketch below.
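As a minimal sketch of the kind of check point 3 is getting at (my own example: count where one card of a small deck ends up over many shuffles):

    import random
    from collections import Counter

    def fisher_yates(a, rng=random):
        for i in range(len(a) - 1, 0, -1):
            j = rng.randrange(i + 1)             # 0 <= j <= i
            a[j], a[i] = a[i], a[j]
        return a

    # Shuffle a tiny deck many times and tabulate where card 0 lands.
    # An unbiased shuffle puts it in each position about trials/4 times;
    # a skewed table is easy to spot by eye (or to plot).
    counts, trials = Counter(), 100_000
    for _ in range(trials):
        counts[fisher_yates(list(range(4))).index(0)] += 1
    print(sorted(counts.items()))                # e.g. [(0, 25012), (1, 24890), ...]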
My feeling is that, with a biased RNG, repeated runs of the Knuth shuffle would produce all the permutations, but I'm not able to prove it (it depends on the period of the RNG and on how biased it is).
So let's reverse the question: given an algorithm that requires a random input and a biased RNG, is it easier to de-skew the algorithm's output or to de-skew the RNG's output?
Unsurprisingly, the latter is much easier to do (and is of broader interest): there are several standard techniques for it. A simple technique, due to von Neumann, is: given a bitstream from a biased RNG, take the bits in pairs, throw away every (0,0) and (1,1) pair, return a 1 for every (1,0) pair and a 0 for every (0,1) pair. This technique assumes that each bit in the stream has the same probability of being a 0 or 1 as any other bit, and that the bits are not correlated. Elias generalized von Neumann's technique to a more efficient scheme (one where fewer bits are discarded).
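A small sketch of von Neumann's technique, using a synthetic 80/20 biased bit source of my own for the demonstration:

    import random

    def von_neumann_debias(bits):
        """Consume bits in pairs, drop (0,0) and (1,1),
        output 1 for (1,0) and 0 for (0,1)."""
        return [b1 for b1, b2 in zip(bits[0::2], bits[1::2]) if b1 != b2]

    biased = [1 if random.random() < 0.8 else 0 for _ in range(100_000)]
    unbiased = von_neumann_debias(biased)
    print(sum(unbiased) / len(unbiased))   # close to 0.5, kept from ~32% of the pairs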
But even strongly biased or correlated bits may contain useful amounts of randomness that can be extracted, for example using a technique based on the Fast Fourier Transform.
Another option is to feed the biased RNG output to a cryptographically strong function, for example a message digest algorithm, and use its output.
For further references on how to de-skew random number generators, I suggest you read the Randomness Recommendations for Security RFC.
My point is that the quality of the output of a random-based algorithm is upper bounded by the entropy provided by the RNG: if the RNG is extremely biased, the output will be extremely biased, no matter what you do. The algorithm can't squeeze out more entropy than is contained in the biased random bitstream; worse, it will probably lose some random bits. Even assuming the algorithm works with a biased RNG, to obtain a good result you'll have to put in a computational effort at least as great as the effort it would take to de-skew the RNG (and it will probably require more, since you'll have to both run the algorithm and "defeat" the bias at the same time).
If your question is just theoretical, then please disregard this answer. If it is practical, then please seriously think about de-skewing your RNG instead of making assumptions about the output of the algorithm.
I can't completely answer your question, but this observation seemed too long for a comment.
What happens if you ensure that the number of random numbers pulled from your RNG for each iteration of Fisher-Yates has a high least common multiple with the RNG period? That may mean "wasting" a random integer at the end of the algorithm. When shuffling 25 elements, you need 24 random numbers. If you pull one more random number at the end, making 25 random numbers, you're not guaranteed to have a repetition until much longer than the RNG period. Of course, by chance you could still see the same 25 numbers occur in succession before reaching the period. But since 25 has no common factors other than 1 with 2^32, you wouldn't hit a guaranteed repetition until 25*(2^32) draws. That isn't a huge improvement, but you said this RNG is fast. What if the number of "wasted" values was much larger? It may still not be practical to reach every permutation, but you could at least increase the number you can reach.
It depends entirely on the bias. In general I would say "don't count on it".
Biased algorithm that converges to non-biased:
Do nothing half of the time, and do a correct shuffle the other half. This converges towards unbiased exponentially: after n shuffles there is a 1-1/2^n chance that the result is uniformly random and a 1/2^n chance that the output is just the input sequence.
Biased algorithm that stays biased:
Shuffle all elements except the last one. Permanently biased towards not moving the last element.
More General Example:
Think of a shuffle algorithm as a weighted directed graph of permutations, where the weights out of a node correspond to the probability of transitioning from one permutation to another when shuffled. A biased shuffle algorithm will have non-uniform weights.
Now suppose you filled one node in that graph with water, and water flowed from one node to the next based on the weights. The algorithm will converge to non-biased if the distribution of water converges to uniform no matter the starting node.
So in what cases will the water not spread out uniformly? Well, if you have a cycle of above-average weights, nodes in the cycle will tend to feed each other and stay above the average amount of water. They won't take all of it, since as they get more water the amount coming in decreases and the amount going out increases, but it will be above average.
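To make the water analogy concrete, here is a small experiment of my own construction with a hypothetical bias model: it builds the transition distribution of a biased Fisher-Yates pass over the 6 permutations of 3 elements and iterates it. In this particular model every swap still has positive probability, so the water flattens out towards uniform; a bias like the fixed-last-element example above never leaves its subset of permutations.

    import itertools

    def biased_choice_probs(i, bias=0.3):
        """Hypothetical biased RNG model: P(j) for j in 0..i, skewed towards
        small j (an unbiased RNG would give 1/(i+1) for each j)."""
        w = [(1 - bias) ** j for j in range(i + 1)]
        total = sum(w)
        return [x / total for x in w]

    def one_shuffle(perm):
        """Distribution over permutations after one biased Fisher-Yates pass
        starting from `perm` (the weighted edges out of one graph node)."""
        dist = {perm: 1.0}
        for i in range(len(perm) - 1, 0, -1):
            probs, nxt = biased_choice_probs(i), {}
            for p, w in dist.items():
                for j, pj in enumerate(probs):
                    q = list(p)
                    q[i], q[j] = q[j], q[i]
                    key = tuple(q)
                    nxt[key] = nxt.get(key, 0.0) + w * pj
            dist = nxt
        return dist

    # Pour all the "water" into one node, then let it flow repeatedly.
    perms = list(itertools.permutations(range(3)))
    water = {p: (1.0 if p == perms[0] else 0.0) for p in perms}
    for _ in range(25):
        nxt = {p: 0.0 for p in perms}
        for p, w in water.items():
            for q, t in one_shuffle(p).items():
                nxt[q] += w * t
        water = nxt
    print({p: round(w, 4) for p, w in water.items()})   # ~0.1667 each: it converges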