This question here does not seem to help: Calculating Percentiles (Ruby)
I would like to calculate 95th percentile (or, indeed, any other desired percentile) from an array of numbers. Ultimately, this will be applied in Rails to calculate distribution against a large number of records.
But, if I can determine how to accurately determine a given percentile from an array of numbers, I can take it from there.
Frankly, I am surprised that I haven't been able to find some sort of gem that would have such functions--I haven't found one yet.
Help is greatly appreciated.
If you want to replicate Excel's PERCENTILE function then try the following:
def percentile(values, percentile)
  values_sorted = values.sort
  rank = percentile * (values_sorted.length - 1) + 1
  k = rank.floor - 1
  f = rank.modulo(1)
  # When percentile is 1.0 there is no element after k, so fall back to k itself
  upper = values_sorted[k + 1] || values_sorted[k]
  return values_sorted[k] + (f * (upper - values_sorted[k]))
end
values = [1, 2, 3, 4]
p = 0.95
puts percentile(values, p)
#=> 3.85
The formula is based on the QUARTILE method, which is really just a specific percentile - https://support.microsoft.com/en-us/office/quartile-inc-function-1bbacc80-5075-42f1-aed6-47d735c4819d.
If you are interested in an existing gem, the descriptive_statistics gem is the best I have found so far for a percentile function.
IRB Session
> require 'descriptive_statistics'
=> true
irb(main):009:0> data = [1, 2, 3, 4]
=> [1, 2, 3, 4]
irb(main):010:0> data.percentile(95)
=> 3.8499999999999996
irb(main):011:0> data.percentile(95).round(2)
=> 3.85
The nice part of the gem is its elegant way of expressing "I want the 95th percentile of data".
Percentile based on count of items
a = [1,2,3,4,5,6,10,11,12,13,14,15,20,30,40,50,60,61,91,99,120]
def percentile_by_count(array, percentile)
  count = (array.length * (1.0 - percentile)).floor
  # last(count) returns [] when count is 0, whereas [-0..-1] would return everything
  array.sort.last(count)
end
# 80th percentile (21 items*80% == 16.8 items are below; pick the top 4)
p percentile_by_count(a,0.8) #=> [61, 91, 99, 120]
Percentile based on range of values
def percentile_by_value(array, percentile)
  min, max = array.minmax
  range = max - min
  min_value = range * percentile + min
  array.select { |v| v >= min_value }
end
# 80th percentile (min + 119 * 80% = 96.2; pick values at or above this)
p percentile_by_value(a,0.8) #=> [99, 120]
Interestingly, Excel's PERCENTILE function returns 60 as the first value for the 80th percentile. If you want this result (that is, if you want an item falling on the cusp of the limit to be included), then change the .floor above to .ceil.
This is the method I developed in my own statistical library:
def quantiles(data, probs=[0.25, 0.50, 0.75])
  values = data.sort
  probs.map do |prob|
    h = 1 + (values.count - 1) * prob
    mod = h % 1
    (1 - mod) * values[h.floor - 1] + (mod) * values[h.ceil - 1]
  end
end
If you just want one quantile, then do quantiles(data, [0.95]).
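As a quick check, the method can be run against the same four-element array used in the Excel-style answer. This is only a sketch: the definition is repeated so the snippet runs on its own, and the variable names `data`, `p95` and `quartiles` are illustrative.

```ruby
# Same definition as above, repeated so the snippet is self-contained.
def quantiles(data, probs = [0.25, 0.50, 0.75])
  values = data.sort
  probs.map do |prob|
    h = 1 + (values.count - 1) * prob
    mod = h % 1
    (1 - mod) * values[h.floor - 1] + mod * values[h.ceil - 1]
  end
end

data = [1, 2, 3, 4]                   # sample data from the Excel-style answer
p95 = quantiles(data, [0.95]).first   # a single quantile comes back in an array
quartiles = quantiles(data)           # default probabilities: 25%, 50%, 75%

puts p95.round(2)
puts quartiles.inspect
```

Rounded to two decimals, the 95th percentile should agree with the Excel-style method and the gem, since all three interpolate linearly between the neighbouring sorted values.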
I've got an array of hashes (sorted), something like this:
testArray = [{price: 540, volume: 12},
{price: 590, volume: 18},
{price: 630, volume: 50}]
Now I want to calculate the mean value up to a certain total volume. Let's say someone wants to buy 40 pieces and wants to do it the cheapest way. That would mean an average price of (540*12+590*18+630*10)/40 money units.
My first attempt is following:
@priceArray = []
@volumeArray = []
testArray.each do |priceHash|
  @priceArray << priceHash.fetch(:price)
  @volumeArray << priceHash.fetch(:volume)
end
def calculateMiddlePrice(priceArray, volumeArray, totalAmount)
  result = 0
  # Here some crazy wild magic happens
  (0...volumeArray.count).inject(0) do |r, i|
    if (volumeArray[0..i].inject(:+)) < totalAmount
      r += volumeArray[i]*priceArray[i]
    elsif volumeArray[0..i-1].inject(:+) < totalAmount && volumeArray[0..i].inject(:+) >= totalAmount
      theRest = volumeArray[i] - (volumeArray[0..i].inject(:+) - totalAmount)
      r += theRest * priceArray[i]
    elsif volumeArray[0] > totalAmount
      r = totalAmount * priceArray[0]
    end
    result = r
  end
  result
end
Right now I'm not even sure why it works, but it does. However, this is absolutely ridiculous code in my eyes.
My second thought was to cut my testArray when the total amount is reached. The code looks better:
testAmount = 31
def returnIndexForSlice(array, amount)
  sum = 0
  array.each_index do |index|
    p sum += array[index][:volume]
    if sum >= amount
      return index+1
    end
  end
end
testArray.slice(0,returnIndexForSlice(testArray, testAmount))
Still, this just doesn't feel that right, "rubyish" if you could say so. I checked almost every method for array class, played around with bsearch, however I can't figure out a really elegant way of solving my problem.
What's crossing my mind is something like that:
amountToCheck = 31
array.some_method.with_index {|sum, index| return index if sum >= amountToCheck}
But is there such method or any other way?
Given your prices array of hashes:
prices = [ {price: 540, volume: 12},
{price: 590, volume: 18},
{price: 630, volume: 50}]
You can calculate your result in 2 steps.
def calc_price(prices, amount)
  order = prices.flat_map{|item| [item[:price]] * item[:volume] } #step 1
  order.first(amount).reduce(:+)/amount #step 2
end
Step 1: Create an array with each individual item in it (if the prices aren't sorted, you have to add a sort_by clause). In other words, expand the prices into a numeric array containing twelve 540's, 18 590's, etc. This uses Ruby's array repetition method: [n] * 3 = [n, n, n].
Step 2: Average the first n elements
Result:
calc_price(prices, 40)
=> 585
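If the volumes are large, expanding every unit into an array may be wasteful. A possible alternative, sketched below, walks the price levels and takes only as many units as are still needed from each. The method name `average_price` and the variable `levels` are made up for this illustration, and it returns a float rather than using integer division.

```ruby
# Hypothetical helper (not part of the answer above): average price for
# `amount` units, filling from the cheapest level upwards without expanding
# each unit into its own array element.
def average_price(levels, amount)
  remaining = amount
  total = 0
  levels.sort_by { |level| level[:price] }.each do |level|
    break if remaining <= 0
    take = [level[:volume], remaining].min   # units bought at this level
    total += take * level[:price]
    remaining -= take
  end
  raise ArgumentError, "not enough volume for #{amount} units" if remaining > 0
  total.fdiv(amount)
end

levels = [{ price: 540, volume: 12 },
          { price: 590, volume: 18 },
          { price: 630, volume: 50 }]

puts average_price(levels, 40)
```

For 40 units this takes 12 at 540, 18 at 590 and 10 at 630, which matches the result of calc_price above.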
This is what I have so far:
myArray.map!{ rand(max) }
Obviously, however, sometimes the numbers in the list are not unique. How can I make sure my list only contains unique numbers without having to create a bigger list from which I then just pick the n unique numbers?
Edit:
I'd really like to see this done w/o loop - if at all possible.
(0..50).to_a.sort{ rand() - 0.5 }[0..x]
(0..50).to_a can be replaced with any array.
0 is "minvalue", 50 is "max value"
x is "how many values i want out"
Of course, x cannot be greater than max - min :)
To expand on how this works:
(0..5).to_a ==> [0,1,2,3,4,5]
[0,1,2,3,4,5].sort{ -1 } ==> [0, 1, 2, 4, 3, 5] # constant
[0,1,2,3,4,5].sort{ 1 } ==> [5, 3, 0, 4, 2, 1] # constant
[0,1,2,3,4,5].sort{ rand() - 0.5 } ==> [1, 5, 0, 3, 4, 2 ] # random
[1, 5, 0, 3, 4, 2 ][ 0..2 ] ==> [1, 5, 0 ]
Footnotes:
It is worth mentioning that at the time this question was originally answered, in September 2008, Array#shuffle was either not available or not yet known to me, hence the approximation with Array#sort.
And there's a barrage of suggested edits to this as a result.
So:
.sort{ rand() - 0.5 }
Can be expressed better, and more briefly, on modern Ruby implementations using
.shuffle
Additionally,
[0..x]
Can be written more clearly with Array#take (which returns exactly x elements, whereas [0..x] returns x + 1) as:
.take(x)
Thus, the easiest way to produce a sequence of random numbers on a modern ruby is:
(0..50).to_a.shuffle.take(x)
This uses Set:
require 'set'
def rand_n(n, max)
  randoms = Set.new
  loop do
    randoms << rand(max)
    return randoms.to_a if randoms.size >= n
  end
end
Ruby 1.9 offers the Array#sample method which returns an element, or elements randomly selected from an Array. The results of #sample won't include the same Array element twice.
(1..999).to_a.sample 5 # => [389, 30, 326, 946, 746]
When compared to the to_a.sort_by approach, the sample method appears to be significantly faster. In a simple scenario I compared sort_by to sample, and got the following results.
require 'benchmark'
range = 0...1000000
how_many = 5
Benchmark.realtime do
  range.to_a.sample(how_many)
end
=> 0.081083
Benchmark.realtime do
  (range).sort_by{rand}[0...how_many]
end
=> 2.907445
Just to give you an idea about speed, I ran four versions of this:
Using Sets, like Ryan's suggestion.
Using an Array slightly larger than necessary, then doing uniq! at the end.
Using a Hash, like Kyle suggested.
Creating an Array of the required size, then sorting it randomly, like Kent's suggestion (but without the extraneous "- 0.5", which does nothing).
They're all fast at small scales, so I had them each create a list of 1,000,000 numbers. Here are the times, in seconds:
Sets: 628
Array + uniq: 629
Hash: 645
fixed Array + sort: 8
And no, that last one is not a typo. So if you care about speed, and it's OK for the numbers to be integers from 0 to whatever, then my exact code was:
a = (0...1000000).sort_by{rand}
Yes, it's possible to do this without a loop and without keeping track of which numbers have been chosen. It's called a Linear Feedback Shift Register: Create Random Number Sequence with No Repeats
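To make the idea concrete, here is a minimal sketch of a 16-bit Fibonacci LFSR. The tap positions (bits 16, 14, 13 and 11) are a commonly cited maximal-length choice and are an assumption of this example, as is the method name `lfsr_sequence`. It yields values in 1..65535 without repeats, but the sequence is fully determined by the seed and consecutive values are visibly related, so it is not a substitute for a shuffle when statistical quality matters.

```ruby
# 16-bit Fibonacci LFSR with taps at bits 16, 14, 13 and 11.
# Assumption: this tap set is maximal-length, so every non-zero 16-bit state
# is visited exactly once before the sequence repeats.
def lfsr_sequence(seed, count)
  raise ArgumentError, "seed must be in 1..65535" unless (1..0xFFFF).cover?(seed)
  raise ArgumentError, "count must be at most 65535" if count > 0xFFFF

  state = seed
  Array.new(count) do
    # XOR the tapped bits to get the bit shifted in at the top
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    state = (state >> 1) | (bit << 15)
    state
  end
end

numbers = lfsr_sequence(0xACE1, 10)
puts numbers.inspect
```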
[*1..99].sample(4) #=> [64, 99, 29, 49]
According to Array#sample docs,
The elements are chosen by using random and unique indices
If you need SecureRandom (which uses computer noise instead of pseudorandom numbers):
require 'securerandom'
[*1..99].sample(4, random: SecureRandom) #=> [2, 75, 95, 37]
How about a play on this? Unique random numbers without needing to use Set or Hash.
x = 0
(1..100).map{|iter| x += rand(100) + 1}.shuffle # + 1 so a step of 0 cannot produce a duplicate
You could use a hash to track the random numbers you've used so far:
seen = {}
max = 100
(1..10).map { |n|
  x = rand(max)
  while (seen[x])
    x = rand(max)
  end
  seen[x] = true # remember this number so it is not picked again
  x
}
Rather than add the items to a list/array, add them to a Set.
If you have a finite list of possible random numbers (e.g. 1 to 100), then Kent's solution is good.
Otherwise there is no other good way to do it without looping. The problem is that you MUST loop if you get a duplicate. My solution should be efficient, and the number of iterations should not be much more than the size of your array (e.g. if you want 20 unique random numbers, it might take 25 iterations on average). However, the number of iterations gets worse the more numbers you need and the smaller max is. Here is my code above, modified to show how many iterations are needed for the given input:
require 'set'
def rand_n(n, max)
  randoms = Set.new
  i = 0
  loop do
    i += 1 # count every draw, including the final one
    randoms << rand(max)
    break if randoms.size >= n
  end
  puts "Took #{i} iterations for #{n} random numbers to a max of #{max}"
  return randoms.to_a
end
I could write this code to LOOK more like Array.map if you want :)
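The claim that the iteration count grows as n approaches max can be checked with a small calculation. This is a sketch under the assumption that rand(max) is uniform; the helper name `expected_draws` is made up for this example.

```ruby
# Hypothetical helper: expected number of rand(max) calls needed to collect
# n distinct values. With k distinct values already seen, a draw is new with
# probability (max - k) / max, so that step takes max / (max - k) draws on
# average; summing over k gives the total.
def expected_draws(n, max)
  raise ArgumentError, "n cannot exceed max" if n > max
  (0...n).sum { |k| max.fdiv(max - k) }
end

puts expected_draws(20, 100).round(1)   # plenty of headroom
puts expected_draws(20, 25).round(1)    # n close to max needs more draws
```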
Based on Kent Fredric's solution above, this is what I ended up using:
def n_unique_rand(number_to_generate, rand_upper_limit)
  return (0..rand_upper_limit - 1).sort_by{rand}[0..number_to_generate - 1]
end
Thanks Kent.
No loops with this method
Array.new(size) { rand(max) }
require 'benchmark'
max = 1000000
size = 5
Benchmark.realtime do
  Array.new(size) { rand(max) }
end
=> 1.9114e-05
Here is one solution:
Suppose you want these random numbers to be between r_min and r_max. For each element in your list, generate a random number r, and make list[i]=list[i-1]+r. This would give you random numbers which are monotonically increasing, guaranteeing uniqueness provided that
r+list[i-1] does not overflow
r > 0
For the first element, you would use r_min instead of list[i-1]. Once you are done, you can shuffle the list so the elements are not so obviously in order.
The only problem with this method arises when you go over r_max and still have more elements to generate. In this case, you can reset r_min and r_max to two adjacent elements you have already computed, and simply repeat the process. This effectively runs the same algorithm over an interval where no numbers have been used yet. You can keep doing this until the list is populated.
Provided you know the maximum value in advance, you can do it this way:
class NoLoopRand
  def initialize(max)
    @deck = (0..max).to_a
  end

  def getrnd
    # rand(@deck.length) rather than length - 1, so the last card can be drawn too
    return @deck.delete_at(rand(@deck.length))
  end
end
and you can obtain random data in this way:
aRndNum = NoLoopRand.new(10)
puts aRndNum.getrnd
You'll get nil once all the values in the deck have been exhausted.
Method 1
Using Kent's approach, it is possible to generate an array of arbitrary length keeping all values in a limited range:
# Generates a random array of length n.
#
# @param n length of the desired array
# @param lower minimum number in the array
# @param upper maximum number in the array
def ary_rand(n, lower, upper)
  values_set = (lower..upper).to_a
  repetition = n/(upper-lower+1) + 1
  (values_set*repetition).sample n
end
Method 2
Another, possibly more efficient, method, adapted from another of Kent's answers:
def ary_rand2(n, lower, upper)
  v = (lower..upper).to_a
  (0...n).map{ v[rand(v.length)] }
end
Output
puts (ary_rand 5, 0, 9).to_s # [0, 8, 2, 5, 6] expected
puts (ary_rand 5, 0, 9).to_s # [7, 8, 2, 4, 3] different result for same params
puts (ary_rand 5, 0, 1).to_s # [0, 0, 1, 0, 1] repeated values from limited range
puts (ary_rand 5, 9, 0).to_s # [] no such range :)
Given an array like [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], I want to get a random value that takes into consideration the position.
I want the likelihood of 1 popping up to be way bigger than 10.
Is something like this possible?
For the sake of simplicity let's assume an array arr = [x, y, z] from which we will be sampling values. We'd like to see the following relative frequencies of x, y and z:
frequencies = [5, 2, 1]
Preprocess these frequencies to calculate margins for our subsequent dice roll:
thresholds = frequencies.clone
1.upto(frequencies.count - 1).each { |i| thresholds[i] += thresholds[i - 1] }
Let's sum them up.
max = frequencies.reduce :+
Now choose a random number
roll = 1 + rand(max)
index = thresholds.find_index { |x| roll <= x }
Return arr[index] as a result. To sum up:
def sample arr, frequencies
  # assert arr.count == frequencies.count
  thresholds = frequencies.clone
  1.upto(frequencies.count - 1).each { |i| thresholds[i] += thresholds[i - 1] }
  max = frequencies.reduce :+
  roll = 1 + rand(max)
  index = thresholds.find_index { |x| roll <= x }
  arr[index]
end
Let's see how it works.
data = 80_000.times.map { sample [:x, :y, :z], [5, 2, 1] }
A histogram for data shows that sample works as we've intended.
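One way to produce such a histogram is to count how often each value comes up and print its observed share. The sketch below uses a renamed copy of the method (`weighted_sample`, a name chosen here only so the snippet runs on its own); the observed shares should land near 5/8, 2/8 and 1/8.

```ruby
# Renamed copy of the sampling method above, so the snippet is self-contained.
def weighted_sample(arr, frequencies)
  thresholds = frequencies.clone
  1.upto(frequencies.count - 1) { |i| thresholds[i] += thresholds[i - 1] }
  roll = 1 + rand(frequencies.sum)
  arr[thresholds.find_index { |x| roll <= x }]
end

draws = 80_000
counts = Hash.new(0)
draws.times { counts[weighted_sample([:x, :y, :z], [5, 2, 1])] += 1 }

# Print each value with its count and observed share, most frequent first.
counts.sort_by { |_, c| -c }.each do |value, c|
  puts format("%-2s %6d  %.3f", value, c, c.fdiv(draws))
end
```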
def coin_toss( arr )
  arr.detect{ rand(2) == 0 } || arr.last
end
a = (1..10).to_a
10.times{ print coin_toss( a ), ' ' } #=> 1 1 1 9 1 5 4 1 1 3
This takes the first element of the array, flips a coin, returns the element and stops if the coinflip is 'tails'; the same with the next element otherwise. If it is 'heads' all the way, return the last element.
A simple way to implement this, with the probability of being selected falling off geometrically, is to simulate coin flips. Generate random integers that are 0 or 1; the index into the array is the number of consecutive 1s you get. With this method, 2 is 1/2 as likely to be selected as 1, 3 is 1/4 as likely, and so on. You can vary the probability, say by generating random numbers between 0 and 5 and counting the number of consecutive rounds above 1, which makes each number in the array 4/5 as likely to appear as the one before.
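The coin-flip idea with an adjustable probability might look like the sketch below. The method name `geometric_pick` and the parameter `keep_going` are invented for this example. One caveat of this particular sketch: the last element absorbs all the remaining probability, so it can come up more often than the element just before it.

```ruby
# Hypothetical helper: walk the array, moving on to the next element with
# probability `keep_going`; stop at the first failed round or at the last
# element, whichever comes first.
def geometric_pick(arr, keep_going = 0.5)
  index = 0
  index += 1 while index < arr.length - 1 && rand < keep_going
  arr[index]
end

a = (1..10).to_a
counts = Hash.new(0)
10_000.times { counts[geometric_pick(a, 0.8)] += 1 }
puts counts.sort.to_h.inspect
```

With keep_going at 0.8, each element except the last is about 4/5 as likely as the one before it.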
A better and more general way to solve this problem is to use the alias method. See the answer to this question for more information:
Data structure for loaded dice?
I know that I can generate random floats with rand(max). I tried to generate a float in a range, which shouldn't be hard. But e.g. rand(1.4512) returns 0, so rand isn't calculating with floats. Now I tried a little trick: converting the thing to an integer and, after randomizing a fitting number in my desired range, calculating it back to a float, which is not working.
My question is how to do this in a better way. If there is no better way, why is this one not working? (Maybe it's too late for me, I should've started sleeping 2 hours ago..). The whole thing aims to be a method for calculating a "position" field for database records so users can order them manually. I've never done something like this before, maybe someone can hint me with a better solution.
Here's the code so far:
def calculate_position(elements, index)
  min = elements[index].position
  if elements[index + 1].nil?
    pos = min + 1
  else
    pos = min + (rand(elements[index + 1].position * 10000000000) / 10000000000)
  end
  return pos
end
Pass a range of floats to rand
If you want to "create a random float in a range between two floats", just pass a range of floats to rand.
rand(11.2...76.9)
(Tested with Ruby 2.1)
Edit
According to the documentation: https://ruby-doc.org/core-2.4.0/Random.html
There are two different ways to write the random function: inclusive and exclusive for the last value
rand(5..9) # => one of [5, 6, 7, 8, 9]
rand(5...9) # => one of [5, 6, 7, 8]
rand(5.0..9.0) # => between 5.0 and 9.0, including 9.0
rand(5.0...9.0) # => between 5.0 and 9.0, excluding 9.0
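Applied to the position problem from the question, a float range could be used roughly as follows. This is only a sketch: `position_between` is a hypothetical helper, and it assumes both neighbouring positions are known and that the lower one is strictly smaller.

```ruby
# Hypothetical helper: pick a position at or above `lower` and strictly
# below `upper` for a record placed between two neighbours.
def position_between(lower, upper)
  raise ArgumentError, "lower must be less than upper" unless lower < upper
  rand(lower.to_f...upper.to_f)   # exclusive range, so `upper` itself is never returned
end

pos = position_between(1.0, 2.0)
puts pos
```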
Let's recap:
rand() will generate a (pseudo-)random float between 0 and 1.
rand(int) will generate a (pseudo-)random integer between 0 and int.
So something like:
def range(min, max)
  rand * (max-min) + min
end
Should do nicely.
Update:
Just tested with a little unit test:
def testRange
  min = 1
  max = 100
  1_000_000.times {
    result = range min, max
    print "ERROR" if result < min || result > max
  }
end
Looks fine.
In 1.9 and 2.0 you can give a range argument to rand:
irb(main):001:0> 10.times { puts rand Math::E..Math::PI }
3.0656267148715446
2.7813979580609587
2.7661725184200563
2.9745784681934655
2.852157154320737
2.741063222095785
2.992638029938756
3.0713152547478866
2.879739743508003
2.7836491029737407
=> 10
I think your best bet is to use rand() to generate a random float between 0 and 1, and then multiply to set the range and add to set the offset:
def float_rand(start_num, end_num=0)
  width = end_num-start_num
  return (rand*width)+start_num
end
Note: since the order of the terms doesn't matter, making end_num default to 0 allows you to get a random float between 0 and x with float_rand(x).