Algorithm to spread selection over a fixed size array - ruby

It's not specifically a ruby problem: more of a general question about algorithms. But there might be some ruby-specific array methods which are helpful.
I have an array with 30 items. I ask for a number of items between 15 and 30, and I want to select a given number of items from the whole array as evenly distributed as possible. The selection needs to be non-random, returning the same result every time.
Let's say someone asks for 16 items. If I return the first 16, that would be a massive fail. Instead, I could return all the odd-numbered ones plus the last one. If I had the numbers 1 to 30 stored in the array, I could give back
myArr.spread(16)
=> [1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,30]
If someone asks for 20 items, it's a bit trickier: I can't immediately think of a nice programmatic way of doing this. I feel like it must have been solved already by someone. Any suggestions?

I ended up doing this, inspired by Alex D: I step through n-1 times and then always add the last element to the end.
class Array
  def spread(n)
    step = self.length.to_f / (n - 1)
    (0..(n - 2)).to_a.collect { |i| self[i * step] } + [self.last]
  end
end
> (1..30).to_a.spread(3)
=> [1, 16, 30]
> (1..30).to_a.spread(4)
=> [1, 11, 21, 30]
> (1..30).to_a.spread(5)
=> [1, 8, 16, 23, 30]
> (1..30).to_a.spread(15)
=> [1, 3, 5, 7, 9, 11, 13, 16, 18, 20, 22, 24, 26, 28, 30]

Having recently implemented this method—although I called it keep—for use in a backup retention application, I thought I'd share my solution. It's similar to Alex D's answer with two major differences in the algorithm:
The "stride" is calculated using (length + (length / n) - 1).to_f / n where n is the number of items desired. Calculating an offset in terms of the number of times n goes into length ensures that the last item is always included.
It uses a modulo operation instead of incrementing: If the element's index divided by the "stride" has a remainder between 0 and 1 (inclusive of 0, exclusive of 1), the element is included in the result. The fact that 0 % x is always 0 ensures that the first element is always returned.
Edge cases, such as when the number of elements is less than the number desired, are accounted for.
class Array
  def keep(n)
    if n < 1
      []
    elsif length <= n
      self.clone
    else
      stride = (length + (length / n) - 1).to_f / n
      select.with_index do |_, i|
        remainder = i % stride
        (0 <= remainder && remainder < 1)
      end
    end
  end
end

Divide the size of the array by the number of items you want to select (DON'T use truncating division) -- this will be your "stride" as you walk over the array, selecting items. Keep adding the "stride" to a running total until it equals or exceeds the size of the array. Each time you add the "stride", take the integral part and use it as an index into the array to select an item.
Say you have 100 items and you want to select 30. Then your "stride" will be 3.3333... so you start with a "running total" of 3.3333, and select item 3. Then 6.66666 -- so you select item 6. Next is 10.0 -- so you select item 10. And so on...
Test to make sure you don't get "off by one" errors, and also that you don't divide by zero if the array size or number of items to select is zero. Also use a guard clause to ensure that the number of items to select is not greater than the number in the array.
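Here is a rough Ruby sketch of the stride walk described above; the method name and the guard clause are mine, not part of the original answer. Multiplying the stride by the step number, rather than accumulating a running total, avoids floating-point drift.
# Sketch of the stride walk, treating the integral part of the running
# total as a 1-based position so the last index stays in bounds.
def stride_select(arr, count)
  return arr.dup if count >= arr.length      # guard clause, as suggested above
  stride = arr.length.to_f / count           # NOT truncating division
  (1..count).map { |i| arr[(stride * i).to_i - 1] }
end

stride_select((1..100).to_a, 30).first(3) #=> [3, 6, 10]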

There was a similar question here, but the solution is in Python.
In Ruby, it would be:
class Array
  def spread(count)
    length = self.length
    result = Array.new
    0.upto(count - 1) do |i|
      result << self[(i * length.to_f / count).ceil]
    end
    return result
  end
end
arr = Array(1..30)
puts arr.spread(20)
#=> [1, 3, 4, 6, 7, 9, 10, 12, 13, 15, 16, 18, 19, 21, 22, 24, 25, 27, 28, 30]

You could try using a Random (doc) with a fixed seed:
with the Random object, you can pick the elements of the array randomly
the fixed seed ensures that every call to the function generates the same list of random numbers.
For example with Array#sample
def spread(arr, count)
  arr.sample(count, random: Random.new(0))
end

Related

Unique random string with alphanumeric required in Ruby

I'm using the following code to generate a unique 10-character random string of [A-Z a-z 0-9] in Ruby:
random_code = [*('a'..'z'),*('0'..'9'),*('A'..'Z')].shuffle[0, 10].join
However, sometimes this random string does not contain a number or an uppercase character. Could you help me have a method that generates a unique random string that requires at least one number, one uppercase and one lowercase character?
down = ('a'..'z').to_a
up = ('A'..'Z').to_a
digits = ('0'..'9').to_a
all = down + up + digits
[down.sample, up.sample, digits.sample].
concat(7.times.map { all.sample }).
shuffle.
join
#=> "TioS8TYw0F"
Edit: The above reflects a misunderstanding of the question. I'll leave it, however. To have no characters appear more than once:
def rnd_str
  down = ('a'..'z').to_a
  up = ('A'..'Z').to_a
  digits = ('0'..'9').to_a
  [extract1(down), extract1(up), extract1(digits)].
    concat((down + up + digits).sample(7)).shuffle.join
end

def extract1(arr)
  i = arr.size.times.to_a.sample
  c = arr[i]
  arr.delete_at(i)
  c
end
rnd_str #=> "YTLe0WGoa1"
rnd_str #=> "NrBmAnE9bT"
down.shuffle.shift (etc.) would have been more compact than extract1, but the inefficiency was just too much to bear.
If you do not want to repeat random strings, simply keep a list of the ones you generate. If you generate another that is in the list, discard it and generate another. It's pretty unlikely you'll have to generate any extra ones, however. If, for example, you generate 100 random strings (satisfying the requirement of at least one lowercase letter, uppercase letter and digit), the chances that there will be one or more duplicate strings is about one in 700,000:
t = 107_518_933_731
n = t+1
t = t.to_f
(1.0 - 100.times.reduce(1.0) { |prod,_| prod * (n -= 1)/t }).round(10)
#=> 1.39e-07
where t = C(62,10) and C(62,10) is defined below.
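For reference, here is one way the binomial-coefficient helper used in this answer (C above, c in the snippet further down) could be written; the helper itself is my addition, not part of the original answer:
# Hypothetical helper: C(n, k) = n! / (k! * (n - k)!)
def c(n, k)
  (1..k).reduce(1) { |acc, i| acc * (n - k + i) / i }
end

c(62, 10) #=> 107_518_933_731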
An alternative
There is a really simple way to do this that turns out to be pretty efficient: just sample without replacement until a sample is found that meets the requirement of at least one lowercase letter, one uppercase letter and one digit. We can do that as follows:
DOWN = ('a'..'z').to_a
UP = ('A'..'Z').to_a
DIGITS = ('0'..'9').to_a
ALL = DOWN + UP + DIGITS

def rnd_str
  loop do
    arr = ALL.sample(10)
    break arr.shuffle.join unless (DOWN & arr).empty? || (UP & arr).empty? ||
                                  (DIGITS & arr).empty?
  end
end

rnd_str #=> "3jRkHcP7Ge"
rnd_str #=> "B0s81x4Jto"
How many samples must we reject, on average, before finding a "good" one? It turns out (see below if you are really, really interested) that the probability of getting a "bad" string (i.e., a selection of 10 characters drawn at random from the 62 elements of all, without replacement, that has no lowercase letters, no uppercase letters or no digits) is only about 0.15 (15%). That means that 85% of the time no bad samples will be rejected before a good one is found.
It turns out that the expected number of bad strings that will be sampled, before a good string is sampled, is:
0.15/0.85 =~ 0.17
The following shows how the above probability was derived, should anyone be interested.
Let n_down be the number of ways a sample of 10 can be drawn that has no lowercase letters:
n_down = C(36,10) = 36!/(10!*(36-10)!)
where (the binomial coefficient) C(36,10) equals the number of combinations of 36 "things" that can be "taken" 10 at a time, and equals:
C(36,10) = 36!/(10!*(36-10)!) #=> 254_186_856
Similarly,
n_up = n_down #=> 254_186_856
and
n_digits = C(52,10) #=> 15_820_024_220
We can add these three numbers together to obtain:
n_down + n_up + n_digits #=> 16_328_397_932
This is almost, but not quite, the number of ways to draw 10 characters, without replacement, that contain no lowercase letters, no uppercase letters or no digits. "Not quite" because there is a bit of double-counting going on. The necessary adjustment is as follows:
n_down + n_up + n_digits - 2*C(26,10) - 3
#=> 16_317_774_459
To obtain the probability of drawing a sample of 10 from a population of 62, without replacement, that has no lowercase letter, no uppercase letter or no digit, we divide this number by the total number of ways 10 characters can be drawn from 62 without replacement:
(16_317_774_459.0/c(62,10)).round(2)
#=> 0.15
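Putting the pieces together, the derivation above can be checked numerically (again assuming the hypothetical c helper defined earlier):
# Reproduces the numbers quoted in this answer.
n_down   = c(36, 10)  #=> 254_186_856
n_up     = c(36, 10)  #=> 254_186_856
n_digits = c(52, 10)  #=> 15_820_024_220
bad = n_down + n_up + n_digits - 2 * c(26, 10) - 3
#=> 16_317_774_459
(bad.to_f / c(62, 10)).round(2)
#=> 0.15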
If you want a script to generate just a small number of tokens (like 2, 5, 10, 100, 1000, 10 000, etc.), then the best way would be to simply keep the already generated tokens in memory and retry until a new one is generated (statistically speaking, this won't take long). If this is not the case - keep reading.
After thinking about it, this problem turned out to be in fact very interesting. For brevity, I will not mention the requirement to have at least one number, capital and lower case letter, but it will be included in the final solution. Also let all = [*'1'..'9', *'a'..'z', *'A'..'Z'].
To sum it up, we want to generate k-permutations of n elements with repetition randomly with uniqueness constraint.
k = 10, n = 61 (all.size)
Ruby just so happens to have such a method, it's Array#repeated_permutation. So everything is great, we can just use:
all.repeated_permutation(10).to_a.map(&:join).shuffle
and pop the resulting strings one by one, right? Wrong! The problem is that the number of possibilities happens to be:
n^k = 713_342_911_662_882_601 (61**10).
Even if you had an infinitely fast processor, you still couldn't hold that amount of data, no matter whether it is a count of complex objects or of simple bits.
The opposite approach would be to generate random permutations, keep the already generated ones in a set and check for inclusion before returning the next element. This just delays the inevitable - not only would you still have to hold the same amount of information at some point, but as the number of generated permutations grows, the number of tries required to generate a new permutation diverges to infinity.
As you might have guessed, the root of the problem is that randomness and uniqueness hardly go hand in hand.
Firstly, we would have to define what we would consider as random. Judging by the amount of nerdy comics on the subject, you could deduce that this isn't that black and white either.
An intuitive definition for a random program would be one that doesn't generate the tokens in the same order with each execution. Great, so now we can just take the first n permutations (where n = rand(100)), put them at the end and enumerate everything in order? You can sense where this is going. In order for a random generation to be considered good, the generated outputs of consecutive runs should be equally distributed. In simpler terms, the probability of getting any possible output should be equal to 1 / #__all_possible_outputs__.
Now lets explore the boundaries of our problem a little:
The number of possible k-permutations of n elements without repetition is:
n!/(n-k)! = 327_234_915_316_108_800 ((61 - 10 + 1).upto(61).reduce(:*))
Still out of reach. Same goes for
The number of possible full permutations of n elements without repetition:
n! = 507_580_213_877_224_798_800_856_812_176_625_227_226_004_528_988_036_003_099_405_939_480_985_600_000_000_000_000 (1.upto(61).reduce(:*))
The number of possible k-combinations of n elements without repetition:
n!/k!(n-k)! = 90_177_170_226 ((61 - 10 + 1).upto(61).reduce(:*)/1.upto(10).reduce(:*))
Finally, where we might have a break through with full permutation of k elements without repetition:
k! = 3_628_800 (1.upto(10).reduce(:*))
~3.6m isn't nothing, but at least it's reasonably computable. On my personal laptop k_permutations = 0.upto(9).to_a.permutation.to_a took 2.008337 seconds on average. Generally, as computing time goes, this is a lot. However, assuming that you will be running this on an actual server and only once per application startup, this is nothing. In fact, it would even be reasonable to create some seeds. A single k_permutations.shuffle took 0.154134 seconds, therefore in about a minute we can acquire 61 random permutations: k_seeds = 61.times.map { k_permutations.shuffle }.to_a.
Now lets try to convert the problem of k-permutations of n elements without repetition to solving multiple times full k-permutations without repetitions.
A cool trick for generating permutations is using numbers and bitmaps. The idea is to generate all numbers from 0 to 2^61 - 1 and look at the bits. If there is a 1 on position i, we will use the all[i] element, otherwise we will skip it. We still didn't escape the issue as 2^61 = 2305843009213693952 (2**61) which we can't keep in memory.
Fortunately, another cool trick comes to the rescue, this time from number theory.
For a prime modulus m, raising m consecutive numbers to a power coprime to m - 1 and reducing modulo m gives back the numbers from 0 to m - 1 (i.e. a permutation of them)
In other words:
5.upto(65).map { |number| number**17 % 61 }.sort # => [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
5.upto(65).map { |number| number**17 % 61 } # => [36, 31, 51, 28, 20, 59, 11, 22, 47, 48, 42, 12, 54, 26, 5, 34, 29, 57, 24, 53, 15, 55, 3, 38, 21, 18, 43, 40, 23, 58, 6, 46, 8, 37, 4, 32, 27, 56, 35, 7, 49, 19, 13, 14, 39, 50, 2, 41, 33, 10, 30, 25, 16, 9, 17, 60, 0, 1, 44, 52, 45]
Now actually, how random is that? As it turns out - the more common divisors shared by the modulus and the chosen numbers, the less evenly distributed the sequence is. But we are in luck here - 2^61 - 1 is a prime number (a Mersenne prime, in fact). Therefore, the only divisors it can share are 1 and 2^61 - 1. This means that no matter what power we choose, the positions of the numbers 0 and 1 will be fixed. That is not perfect, but the other 2^61 - 3 numbers can be found at any position. And guess what - we don't care about 0 and 1 anyway, because they don't have ten 1s in their binary representation!
Unfortunately, a bottleneck for our randomness is the fact that the bigger the prime number we want to generate, the harder it gets. This is the best I can come up with when it comes to generating all the numbers in a range in a shuffled order, without keeping them in memory simultaneously.
So to put everything in use:
We generate seeds of full permutations of 10 elements.
We generate a random prime number.
We randomly choose if we want to generate permutations for the next number in the sequence or a number that we already started (up to a finite number of started numbers).
We use bitmaps of the generated numbers to get said permutations.
Note that this will solve only the problem of k-permutations of n elements without repetition. I still haven't thought of a way to add repetition.
DISCLAIMER: The following code comes with no guarantees of any kind, explicit or implied. Its point is to further express the author's ideas, rather than be a production ready solution:
require 'prime'

class TokenGenerator
  NUMBERS_UPPER_BOUND = 2**61 - 1
  HAS_NUMBER_MASK = ('1' * 9 + '0' * (61 - 9)).reverse.to_i(2)
  HAS_LOWER_CASE_MASK = ('0' * 9 + '1' * 26 + '0' * 26).reverse.to_i(2)
  HAS_UPPER_CASE_MASK = ('0' * (9 + 26) + '1' * 26).reverse.to_i(2)
  ALL_CHARACTERS = [*'1'..'9', *'a'..'z', *'A'..'Z']
  K_PERMUTATIONS = 0.upto(9).to_a.permutation.to_a # give it a couple of seconds

  def initialize
    random_prime = Prime.take(10_000).drop(100).sample
    @all_numbers_generator = 1.upto(NUMBERS_UPPER_BOUND).lazy.map do |number|
      number**random_prime % NUMBERS_UPPER_BOUND
    end.select do |number|
      !(number & HAS_NUMBER_MASK).zero? and
        !(number & HAS_LOWER_CASE_MASK).zero? and
        !(number & HAS_UPPER_CASE_MASK).zero? and
        number.to_s(2).chars.count('1') == 10
    end
    @k_permutation_seeds = 61.times.map { K_PERMUTATIONS.shuffle }.to_a # this will take a minute
    @numbers_in_iteration = { go_fish: nil }
  end

  def next
    raise StopIteration if @numbers_in_iteration.empty?
    number_generator = @numbers_in_iteration.keys.sample
    if number_generator == :go_fish
      add_next_number if @numbers_in_iteration.size < 1_000_000
      self.next
    else
      next_permutation(number_generator)
    end
  end

  private

  def add_next_number
    @numbers_in_iteration[@all_numbers_generator.next] = @k_permutation_seeds.sample.to_enum
  rescue StopIteration # lol, you actually managed to traverse all 2^61 numbers!
    @numbers_in_iteration.delete(:go_fish)
  end

  def next_permutation(number)
    fetch_permutation(number, @numbers_in_iteration[number].next)
  rescue StopIteration # all k permutations for this number were already generated
    @numbers_in_iteration.delete(number)
    self.next
  end

  def fetch_permutation(number_mask, k_permutation)
    k_from_n_indices = number_mask.to_s(2).chars.reverse.map.with_index { |bit, index| index if bit == '1' }.compact
    k_permutation.each_with_object([]) { |order_index, k_from_n_values| k_from_n_values << ALL_CHARACTERS[k_from_n_indices[order_index]] }
  end
end
EDIT: it turned out that our constraints eliminate too many possibilities. This causes @all_numbers_generator to take too much time testing and skipping numbers. I will try to think of a better generator, but everything else remains valid.
The old version that generates tokens with uniqueness constraint on the containing characters:
numbers = ('0'..'9').to_a
downcase_letters = ('a'..'z').to_a
upcase_letters = downcase_letters.map(&:upcase)
all = [numbers, downcase_letters, upcase_letters]
one_of_each_set = all.map(&:sample)
random_code = (one_of_each_set + (all.flatten - one_of_each_set).sample(7)).shuffle.join
Use the 'SafeRandom' gem (GitHub link).
It provides the easiest way to generate random values and is compatible with Rails 2, Rails 3, Rails 4 and Rails 5.
Here you can use the strong_string method to generate a strong string (i.e. a combination of uppercase and lowercase letters, numbers, and symbols).
# => Strong string: the minimum length should be greater than 5, otherwise an 8-character string is returned by default.
require 'safe_random'
puts SafeRandom.strong_string # => 4skgSy93zaCUZZCoF9WiJF4z3IDCGk%Y
puts SafeRandom.strong_string(3) # => P4eUbcK%
puts SafeRandom.strong_string(5) # => 5$Rkdo

Allocate an array of integers proportionally compensating for rounding errors

I have an array of non-negative values. I want to build an array of values whose sum is 20 so that they are proportional to the first array.
This would be an easy problem, except that I want the proportional array to sum to exactly 20, compensating for any rounding error.
For example, the array
input = [400, 400, 0, 0, 100, 50, 50]
would yield
output = [8, 8, 0, 0, 2, 1, 1]
sum(output) = 20
However, most cases are going to have a lot of rounding errors, like
input = [3, 3, 3, 3, 3, 3, 18]
naively yields
output = [1, 1, 1, 1, 1, 1, 10]
sum(output) = 16 (ouch)
Is there a good way to apportion the output array so that it adds up to 20 every time?
There's a very simple answer to this question: I've done it many times. After each assignment into the new array, you reduce the values you're working with as follows:
Call the first array A, and the new, proportional array B (which starts out empty).
Call the sum of A elements T
Call the desired sum S.
For each element of the array (i) do the following:
a. B[i] = round(A[i] / T * S). (rounding to nearest integer, penny or whatever is required)
b. T = T - A[i]
c. S = S - B[i]
That's it! Easy to implement in any programming language or in a spreadsheet.
The solution is optimal in that the resulting array's elements will never be more than 1 away from their ideal, non-rounded values. Let's demonstrate with your example:
T = 36, S = 20. B[1] = round(A[1] / T * S) = 2. (ideally, 1.666....)
T = 33, S = 18. B[2] = round(A[2] / T * S) = 2. (ideally, 1.666....)
T = 30, S = 16. B[3] = round(A[3] / T * S) = 2. (ideally, 1.666....)
T = 27, S = 14. B[4] = round(A[4] / T * S) = 2. (ideally, 1.666....)
T = 24, S = 12. B[5] = round(A[5] / T * S) = 2. (ideally, 1.666....)
T = 21, S = 10. B[6] = round(A[6] / T * S) = 1. (ideally, 1.666....)
T = 18, S = 9. B[7] = round(A[7] / T * S) = 9. (ideally, 10)
Notice that comparing every value in B with its ideal value in parentheses, the difference is never more than 1.
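For reference, a minimal Ruby sketch of the procedure above (the method name apportion is made up here); it reproduces the worked example just shown:
# a: input array, target_sum: desired total (20 in the question)
def apportion(a, target_sum)
  t = a.sum.to_f   # remaining total of the input (trailing zeros would need a small guard)
  s = target_sum   # remaining amount still to hand out
  a.map do |v|
    b = (v / t * s).round
    t -= v
    s -= b
    b
  end
end

apportion([3, 3, 3, 3, 3, 3, 18], 20) #=> [2, 2, 2, 2, 2, 1, 9]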
It's also interesting to note that rearranging the elements in the array can result in different corresponding values in the resulting array. I've found that arranging the elements in ascending order is best, because it results in the smallest average percentage difference between actual and ideal.
Your problem is similar to proportional representation, where you want to share N seats (in your case 20) among parties proportionally to the votes they obtain, in your case [3, 3, 3, 3, 3, 3, 18].
There are several methods used in different countries to handle the rounding problem. My code below uses the Hagenbach-Bischoff quota method used in Switzerland, which basically allocates the seats remaining after an integer division by (N+1) to parties which have the highest remainder:
def proportional(nseats, votes):
    """assign n seats proportionally to votes using Hagenbach-Bischoff quota
    :param nseats: int number of seats to assign
    :param votes: iterable of int or float weighting each party
    :result: list of ints seats allocated to each party
    """
    quota = sum(votes) / (1. + nseats)  # force float
    frac = [vote / quota for vote in votes]
    res = [int(f) for f in frac]
    n = nseats - sum(res)  # number of seats remaining to allocate
    if n == 0: return res  # done
    if n < 0: return [min(x, nseats) for x in res]  # see siamii's comment
    # give the remaining seats to the n parties with the largest remainder
    remainders = [ai - bi for ai, bi in zip(frac, res)]
    limit = sorted(remainders, reverse=True)[n - 1]
    # n parties with remainder larger than limit get an extra seat
    for i, r in enumerate(remainders):
        if r >= limit:
            res[i] += 1
            n -= 1  # attempt to handle perfect equality
            if n == 0: return res  # done
    raise  # should never happen
However this method doesn't always give the same number of seats to parties with perfect equality as in your case:
proportional(20,[3, 3, 3, 3, 3, 3, 18])
[2,2,2,2,1,1,10]
You have set 3 incompatible requirements. An integer-valued array proportional to [1,1,1] cannot be made to sum to exactly 20. You must choose to break one of the "sum to exactly 20", "proportional to input", and "integer values" requirements.
If you choose to break the requirement for integer values, then use floating point or rational numbers. If you choose to break the exact sum requirement, then you've already solved the problem. Choosing to break proportionality is a little trickier. One approach you might take is to figure out how far off your sum is, and then distribute corrections randomly through the output array. For example, if your input is:
[1, 1, 1]
then you could first make it sum as well as possible while still being proportional:
[7, 7, 7]
and since 20 - (7+7+7) = -1, choose one element to decrement at random:
[7, 6, 7]
If the error was 4, you would choose four elements to increment.
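A rough Ruby sketch of that "distribute corrections randomly" idea (the method name and details are assumptions, not from the answer):
def scale_with_random_fixups(input, target = 20)
  total = input.sum.to_f
  out = input.map { |v| (v / total * target).round }
  error = target - out.sum
  until error.zero?
    i = rand(out.length)
    next if error < 0 && out[i].zero?      # don't push a value below zero
    out[i] += error.positive? ? 1 : -1
    error  += error.positive? ? -1 : 1
  end
  out
end

scale_with_random_fixups([1, 1, 1]) #=> e.g. [7, 6, 7]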
A naïve solution that doesn't perform well, but will provide the right result...
Write an iterator that, given an array with eight integers (candidate) and the input array, outputs the index of the element that is farthest away from being proportional to the others (pseudocode):
function next_index(candidate, input)
    // Calculate weights
    for i in 1 .. 8
        w[i] = candidate[i] / input[i]
    end for
    // find the smallest weight
    min = infinity
    min_index = 1
    for i in 1 .. 8
        if w[i] < min then
            min = w[i]
            min_index = i
        end if
    end for
    return min_index
end function
Then just do this
result = [0, 0, 0, 0, 0, 0, 0, 0]
result[next_index(result, input)]++ for 1 .. 20
If there is no optimal solution, it'll skew towards the beginning of the array.
Using the approach above, you can reduce the number of iterations by rounding down (as you did in your example) and then just use the approach above to add what has been left out due to rounding errors:
result = <<approach using rounding down>>
while sum(result) < 20
result[next_index(result, input)]++
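The pseudocode above translates to Ruby fairly directly; this sketch keeps the next_index name from the pseudocode, while apportion_by_topup is my own name, and it combines the rounding-down step with the top-up loop:
def next_index(candidate, input)
  # index whose current value is furthest below its proportional share
  weights = candidate.zip(input).map { |c, v| v.zero? ? Float::INFINITY : c.to_f / v }
  weights.index(weights.min)
end

def apportion_by_topup(input, target = 20)
  total = input.sum.to_f
  result = input.map { |v| (v / total * target).floor }   # round down first
  result[next_index(result, input)] += 1 while result.sum < target
  result
end

apportion_by_topup([3, 3, 3, 3, 3, 3, 18]) #=> [2, 2, 2, 2, 1, 1, 10]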
So the answers and comments above were helpful... particularly the decreasing-sum comment from @Frederik.
The solution I came up with takes advantage of the fact that for an input array v, sum(v_i * 20) is divisible by sum(v). So for each value in v, I multiply by 20 and divide by the sum. I keep the quotient, and accumulate the remainder. Whenever the accumulator is greater than sum(v), I add one to the value. That way I'm guaranteed that all the remainders get rolled into the results.
Is that legible? Here's the implementation in Python:
def proportion(values, total):
    # set up by getting the sum of the values and starting
    # with an empty result list and accumulator
    sum_values = sum(values)
    new_values = []
    acc = 0
    for v in values:
        # for each value, find quotient and remainder
        q, r = divmod(v * total, sum_values)
        if acc + r < sum_values:
            # if the accumulator plus remainder is too small, just add and move on
            acc += r
        else:
            # we've accumulated enough to go over sum(values), so add 1 to result
            if acc > r:
                # add to previous
                new_values[-1] += 1
            else:
                # add to current
                q += 1
            acc -= sum_values - r
        # save the new value
        new_values.append(q)
    # accumulator is guaranteed to be zero at the end
    print(new_values, sum_values, acc)
    return new_values
(I added an enhancement that if the accumulator > remainder, I increment the previous value instead of the current value)

Code to write a random array of x numbers with no duplicates [duplicate]

This is what I have so far:
myArray.map!{ rand(max) }
Obviously, however, sometimes the numbers in the list are not unique. How can I make sure my list only contains unique numbers without having to create a bigger list from which I then just pick the n unique numbers?
Edit:
I'd really like to see this done w/o loop - if at all possible.
(0..50).to_a.sort{ rand() - 0.5 }[0..x]
(0..50).to_a can be replaced with any array.
0 is "minvalue", 50 is "max value"
x is "how many values i want out"
of course, x can't be allowed to be greater than max - min :)
In expansion of how this works
(0..5).to_a ==> [0,1,2,3,4,5]
[0,1,2,3,4,5].sort{ -1 } ==> [0, 1, 2, 4, 3, 5] # constant
[0,1,2,3,4,5].sort{ 1 } ==> [5, 3, 0, 4, 2, 1] # constant
[0,1,2,3,4,5].sort{ rand() - 0.5 } ==> [1, 5, 0, 3, 4, 2 ] # random
[1, 5, 0, 3, 4, 2 ][ 0..2 ] ==> [1, 5, 0 ]
Footnotes:
It is worth mentioning that at the time this question was originally answered, September 2008, Array#shuffle was either not available or not already known to me, hence the approximation with Array#sort.
And there has been a barrage of suggested edits to this as a result.
So:
.sort{ rand() - 0.5 }
Can be better, and shorter expressed on modern ruby implementations using
.shuffle
Additionally,
[0..x]
Can be more obviously written with Array#take as:
.take(x)
Thus, the easiest way to produce a sequence of random numbers on a modern ruby is:
(0..50).to_a.shuffle.take(x)
This uses Set:
require 'set'
def rand_n(n, max)
  randoms = Set.new
  loop do
    randoms << rand(max)
    return randoms.to_a if randoms.size >= n
  end
end
Ruby 1.9 offers the Array#sample method which returns an element, or elements randomly selected from an Array. The results of #sample won't include the same Array element twice.
(1..999).to_a.sample 5 # => [389, 30, 326, 946, 746]
When compared to the to_a.sort_by approach, the sample method appears to be significantly faster. In a simple scenario I compared sort_by to sample, and got the following results.
require 'benchmark'
range = 0...1000000
how_many = 5
Benchmark.realtime do
  range.to_a.sample(how_many)
end
=> 0.081083
Benchmark.realtime do
  (range).sort_by{rand}[0...how_many]
end
=> 2.907445
Just to give you an idea about speed, I ran four versions of this:
Using Sets, like Ryan's suggestion.
Using an Array slightly larger than necessary, then doing uniq! at the end.
Using a Hash, like Kyle suggested.
Creating an Array of the required size, then sorting it randomly, like Kent's suggestion (but without the extraneous "- 0.5", which does nothing).
They're all fast at small scales, so I had them each create a list of 1,000,000 numbers. Here are the times, in seconds:
Sets: 628
Array + uniq: 629
Hash: 645
fixed Array + sort: 8
And no, that last one is not a typo. So if you care about speed, and it's OK for the numbers to be integers from 0 to whatever, then my exact code was:
a = (0...1000000).sort_by{rand}
Yes, it's possible to do this without a loop and without keeping track of which numbers have been chosen. It's called a Linear Feedback Shift Register: Create Random Number Sequence with No Repeats
[*1..99].sample(4) #=> [64, 99, 29, 49]
According to Array#sample docs,
The elements are chosen by using random and unique indices
If you need SecureRandom (which uses computer noise instead of pseudorandom numbers):
require 'securerandom'
[*1..99].sample(4, random: SecureRandom) #=> [2, 75, 95, 37]
How about a play on this? Unique random numbers without needing to use Set or Hash.
x = 0
(1..100).map { x += 1 + rand(100) }.shuffle # the + 1 keeps each step strictly positive, so values stay unique
You could use a hash to track the random numbers you've used so far:
seen = {}
max = 100
(1..10).map {
  x = rand(max)
  x = rand(max) while seen[x]
  seen[x] = true # remember it so it can't be returned again
  x
}
Rather than add the items to a list/array, add them to a Set.
If you have a finite list of possible random numbers (i.e. 1 to 100), then Kent's solution is good.
Otherwise there is no other good way to do it without looping. The problem is you MUST do a loop if you get a duplicate. My solution should be efficient and the looping should not be too much more than the size of your array (i.e. if you want 20 unique random numbers, it might take 25 iterations on average.) Though the number of iterations gets worse the more numbers you need and the smaller max is. Here is my above code modified to show how many iterations are needed for the given input:
require 'set'

def rand_n(n, max)
  randoms = Set.new
  i = 0
  loop do
    randoms << rand(max)
    break if randoms.size >= n
    i += 1
  end
  puts "Took #{i} iterations for #{n} random numbers to a max of #{max}"
  return randoms.to_a
end
I could write this code to LOOK more like Array.map if you want :)
Based on Kent Fredric's solution above, this is what I ended up using:
def n_unique_rand(number_to_generate, rand_upper_limit)
  return (0..rand_upper_limit - 1).sort_by { rand }[0..number_to_generate - 1]
end
Thanks Kent.
No loops with this method
Array.new(size) { rand(max) }
require 'benchmark'
max = 1000000
size = 5
Benchmark.realtime do
  Array.new(size) { rand(max) }
end
=> 1.9114e-05
Here is one solution:
Suppose you want these random numbers to be between r_min and r_max. For each element in your list, generate a random number r, and make list[i] = list[i-1] + r. This would give you random numbers which are monotonically increasing, guaranteeing uniqueness provided that
r + list[i-1] does not overflow
r > 0
For the first element, you would use r_min instead of list[i-1]. Once you are done, you can shuffle the list so the elements are not so obviously in order.
The only problem with this method is when you go over r_max and still have more elements to generate. In this case, you can reset r_min and r_max to two adjacent elements you have already computed, and simply repeat the process. This effectively runs the same algorithm over an interval where there are no numbers already used. You can keep doing this until you have the list populated.
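A rough sketch of this monotonic-increments idea (the helper name and the increment range are my own choices):
def unique_rands(n, r_min, r_max)
  values = [r_min + rand(10)]
  (n - 1).times { values << values.last + 1 + rand(10) } # r > 0 guarantees uniqueness
  raise 'range exhausted' if values.last > r_max          # the case discussed above
  values.shuffle
end

unique_rands(5, 0, 100) #=> e.g. [34, 7, 18, 55, 41]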
As long as you know the maximum value in advance, you can do it this way:
class NoLoopRand
  def initialize(max)
    @deck = (0..max).to_a
  end

  def getrnd
    return @deck.delete_at(rand(@deck.length))
  end
end
and you can obtain random data in this way:
aRndNum = NoLoopRand.new(10)
puts aRndNum.getrnd
you'll obtain nil once all the values have been exhausted from the deck.
Method 1
Using Kent's approach, it is possible to generate an array of arbitrary length keeping all values in a limited range:
# Generates a random array of length n.
#
# @param n length of the desired array
# @param lower minimum number in the array
# @param upper maximum number in the array
def ary_rand(n, lower, upper)
  values_set = (lower..upper).to_a
  repetition = n / (upper - lower + 1) + 1
  (values_set * repetition).sample n
end
Method 2
Another, possibly more efficient, method, modified from another of Kent's answers:
def ary_rand2(n, lower, upper)
  v = (lower..upper).to_a
  (0...n).map { v[rand(v.length)] }
end
Output
puts (ary_rand 5, 0, 9).to_s # [0, 8, 2, 5, 6] expected
puts (ary_rand 5, 0, 9).to_s # [7, 8, 2, 4, 3] different result for same params
puts (ary_rand 5, 0, 1).to_s # [0, 0, 1, 0, 1] repeated values from limited range
puts (ary_rand 5, 9, 0).to_s # [] no such range :)

Non-repeating pseudo random number stream with 'clumping'

I'm looking for a method to generate a pseudorandom stream with a somewhat odd property - I want clumps of nearby numbers.
The tricky part is, I can only keep a limited amount of state no matter how large the range is. There are algorithms that give a sequence of results with minimal state (linear congruence?)
Clumping means that there's a higher probability that the next number will be close rather than far.
Example of a desirable sequence (mod 10): 1 3 9 8 2 7 5 6 4
I suspect this would be more obvious with a larger stream, but difficult to enter by hand.
Update:
I don't understand why it's impossible, but yes, I am looking for, as Welbog summarized:
Non-repeating
Non-Tracking
"Clumped"
Cascade a few LFSRs with periods smaller than you need, combining them to get a result such that the fastest-changing register controls the least significant values. So if you have L1 with period 3, L2 with period 15 and L3 with some larger period, N = L1(n) + 3 * L2(n/3) + 45 * L3(n/45). This will obviously generate 3 clumped values, then jump and generate another 3 clumped values. Use something other than multiplication (such as mixing some of the bits of the higher-period registers) or different periods to make the clump spread wider than the period of the first register. It won't be particularly smoothly random, but it will be clumpy and non-repeating.
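A toy illustration of that cascade, with simple permuted counters standing in for real LFSRs (the periods 3, 15 and 7 are arbitrary choices of mine):
L1 = (0...3).to_a.shuffle
L2 = (0...15).to_a.shuffle
L3 = (0...7).to_a.shuffle

def clumpy(n)
  L1[n % 3] + 3 * L2[(n / 3) % 15] + 45 * L3[(n / 45) % 7]
end

puts((0...12).map { |n| clumpy(n) }.join(' '))
# prints groups of three nearby values separated by jumps; no value repeats
# within a full cycle of 3 * 15 * 7 = 315 steps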
For the record, I'm in the "non-repeating, non-random, non-tracking is a lethal combination" camp, and I hope some simple thought experiments will shed some light. This is not a formal proof by any means. Perhaps someone will shore it up.
So, I can generate a sequence that has some randomness easily:
Given x_i, x_(i+1) ~ U(x_i, r), where r > x_i.
For example:
if x_i = 6, x_(i+1) is random choice from (6+epsilon, some_other_real>6). This guarantees non-repeating, but at the cost that the distribution is monotonically increasing.
Without some condition (like monotonicity), inherent to the sequence of generated numbers themselves, how else can you guarantee uniqueness without carrying state?
Edit: So after researching RBarryYoung's claim of "Linear Congruential Generators" (not differentiators - is this what RBY meant?), clearly, I was wrong! These sequences exist, and by necessity, any PRNG whose next number depends only on the current number and some global, non-changing state can't have repeats within a cycle (after some initial burn-in period).
By defining the "clumping features" in terms of the probability distribution of its size, and the probability distribution of its range, you can then use simple random generators with the underlying distribution and produce the sequences.
One way to get "clumpy" numbers would be to use a normal distribution.
You start the random list with your "initial" random value, then you generate a random number with the mean of the previous random value and a constant variance, and repeat as necessary. The overall variance of your entire list of random numbers should be approximately constant, but the "running average" of your numbers will drift randomly with no particular bias.
>>> import random
>>> r = [1]
>>> for x in range(20):
...     r.append(random.normalvariate(r[-1], 1))
...
>>> r
[1, 0.84583267252801408, 0.18585962715584259, 0.063850022580489857, 1.2892164299497422,
0.019381814281494991, 0.16043424295472472, 0.78446377124854461, 0.064401889591144235,
0.91845494342245126, 0.20196939102054179, -1.6521524237203531, -1.5373703928440983,
-2.1442902977248215, 0.27655425357702956, 0.44417440706703393, 1.3128647361934616,
2.7402744740729705, 5.1420432435119352, 5.9326297626477125, 5.1547981880261782]
I know it's hard to tell by looking at the numbers, but you can sort of see that the numbers clump together a little bit - the 5.X's at the end, and the 0.X's on the second row.
If you need only integers, you can just use a very large mean and variance, and truncate/divide to obtain integer output. Normal distributions by definition are a continuous distribution, meaning all real numbers are potential output - it is not restricted to integers.
Here's a quick scatter plot in Excel of 200 numbers generated this way (starting at 0, constant variance of 1):
(Scatter plot of the 200 values: http://img178.imageshack.us/img178/8677/48855312.png)
Ah, I just read that you want non-repeating numbers. No guarantee of that in a normal distribution, so you might have to take into account some of the other approaches others have mentioned.
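Ruby has no built-in normal generator, so a quick way to try the same idea there is a Box-Muller transform (everything below is my own sketch, not from the answer):
def gauss(mean, stddev)
  mean + stddev * Math.sqrt(-2 * Math.log(1 - rand)) * Math.cos(2 * Math::PI * rand)
end

x = 0.0
clumpy_ints = 20.times.map { (x = gauss(x, 50)).round }
p clumpy_ints # nearby values tend to follow each other; duplicates are still possible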
I don't know of an existing algorithm that would do this, but it doesn't seem difficult to roll your own (depending on how stringent the "limited amount of state" requirement is). For example:
RANGE = (1..1000)
CLUMP_ODDS = 0.5
CLUMP_DIST = 10

last = rand(RANGE)
while still_want_numbers
  if rand < CLUMP_ODDS # clump!
    value = last + rand(CLUMP_DIST) - (CLUMP_DIST / 2) # do some boundary checking here
  else # don't clump!
    value = rand(RANGE)
  end
  print value # (value used instead of next, which is a Ruby keyword)
  last = value
end
It's a little rudimentary, but would something like that suit your needs?
In the range [0, 10] the following should give a uniform distribution. random() yields a (pseudo) random number r with 0 <= r < 1.
x(n + 1) = (x(n) + 5 * (2 * random() - 1)) mod 10
You can get your desired behavior by delinearizing random() - for example random()^k will be skewed towards small numbers for k > 1. An possible function could be the following, but you will have to try some exponents to find your desired distribution. And keep the exponent odd, if you use the following function... ;)
x(n + 1) = (x(n) + 5 * (2 * random() - 1)^3) mod 10
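A quick Ruby sketch of that skewed walk (x is the only state carried along):
x = 10 * rand
20.times do
  x = (x + 5 * (2 * rand - 1)**3) % 10
  printf('%.2f ', x)
end
# small steps dominate, so successive values tend to stay close together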
How about (pseudocode)
// clumpiness is static in that its value is retained between calls
static float clumpiness = 0.0f; // from 0 to 1.0

method getNextvalue(int lastValue)
    float r = rand(); // float from 0 to 1
    int change = MAXCHANGE * (r - 0.5) * (1 - clumpiness);
    clumpiness += 0.1 * rand();
    if (clumpiness >= 1.0) clumpiness -= 1.0;
    // -----------------------------------------
    return Round(lastValue + change);
Perhaps you could generate a random sequence, and then do some strategic element swapping to get the desired property.
For example, if you find 3 values a,b,c in the sequence such that a>b and a>c, then with some probability you could swap elements a and b or elements a and c.
EDIT in response to comment:
Yes, you could have a buffer on the stream that is whatever size you are comfortable with. Your swapping rules could be deterministic, or based on another known, reproducible pseudo-random sequence.
Does a sequence like 0, 94, 5, 1, 3, 4, 14, 8, 10, 9, 11, 6, 12, 7, 16, 15, 17, 19, 22, 21, 20, 13, 18, 25, 24, 26, 29, 28, 31, 23, 36, 27, 42, 41, 30, 33, 34, 37, 35, 32, 39, 47, 44, 46, 40, 38, 50, 43, 45, 48, 52, 49, 55, 54, 57, 56, 64, 51, 60, 53, 59, 62, 61, 69, 68, 63, 58, 65, 71, 70, 66, 73, 67, 72, 79, 74, 81, 77, 76, 75, 78, 83, 82, 85, 80, 87, 84, 90, 89, 86, 96, 93, 98, 88, 92, 99, 95, 97, 2, 91 (mod 100) look good to you?
This is the output of a small ruby program (explanations below):
#!/usr/bin/env ruby
require 'digest/md5'

$seed = 'Kind of a password'
$n = 100 # size of sequence
$k = 10  # mixing factor (higher means less clumping)

def pseudo_random_bit(k, n)
  # [-1].ord gives the last character's byte value (Ruby 1.8 returned it directly)
  Digest::MD5.hexdigest($seed + "#{k}|#{n}")[-1].ord & 1
end

def sequence(x)
  h = $n / 2
  $k.times do |k|
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    x ^= pseudo_random_bit(k, x >> 1) if x < 2 * h
    # maybe exchange 1st with last
    if [0, $n - 1].include? x
      x ^= ($n - 1) * pseudo_random_bit(k, 2 * h)
    end
    # move 1st to end
    x = (x - 1) % $n
    # maybe exchange 1st with 2nd, 3rd with 4th, etc
    # (corresponds to 2nd with 3rd, 4th with 5th, etc)
    x ^= pseudo_random_bit(k, h + (x >> 1)) if x < 2 * (($n - 1) / 2)
    # move 1st to front
    x = (x + 1) % $n
  end
  x
end

puts (0..99).map { |x| sequence(x) }.join(', ')
The idea is basically to start with the sequence 0..n-1 and disturb the order by passing k times over the sequence (more passes means less clumping). In each pass one first looks at the pairs of numbers at positions 0 and 1, 2 and 3, 4 and 5 etc (general: 2i and 2i+1) and flips a coin for each pair. Heads (=1) means exchange the numbers in the pair, tails (=0) means don't exchange them. Then one does the same for the pairs at positions 1 and 2, 3 and 4, etc (general: 2i+1 and 2i+2). As you mentioned that your sequence is mod 10, I additionally exchanged positions 0 and n-1 if the coin for this pair dictates it.
A single number x can be mapped modulo n after k passes to any number of the interval [x-k, x+k] and is approximately binomial distributed around x. Pairs (x, x+1) of numbers are not independently modified.
As pseudo-random generator I used only the last of the 128 output bits of the hash function MD5, choose whatever function you want instead. Thanks to the clumping one won't get a "secure" (= unpredictable) random sequence.
Maybe you can chain together 2 or more LCGs in a manner similar to that described for the LFSRs here. Increment the least-significant LCG with its seed; on a full cycle, increment the next LCG. You only need to store a seed for each LCG. You could then weight each part and sum the parts together. To avoid repetitions in the 'clumped' least-significant part you can randomly reseed that LCG on each full cycle.

Resources