Calculating the expected probability that an expression resolves to True - ruby

Suppose I have a simple program that simulates a coin toss, with a given probability specified by an expression. It might look something like this:
# This is the probability that you will get heads.
$expr = "rand < 0.5"

def get_result(expr)
  eval(expr)
end

def toss_coin
  if get_result($expr)
    return "Head"
  else
    return "Tail"
  end
end
Now, I also want to tell the user what the probability of getting Head is.
For the given expression
"rand < 0.5"
we can eyeball it and say the probability is 50%, because rand returns a number between 0 and 1, and therefore the expression evaluates to true 50% of the time on average.
However, if I decided to provide a rigged coin toss where the expression used to determine the outcome is
"rand < 0.3"
then I have a 30% chance of getting Head.
Is it possible to write a method that will take an arbitrary expression (that evaluates to a boolean!) and return the probability that it resolves to true?
def get_expected_probability(expr)
  # Returns the probability that `expr` evaluates to true
  # `rand < 0.5` would return 0.5
  # `rand < 0.3` would return 0.3
  # `true` would return 1
  # `false` would return 0
end

My guess would be that it would be theoretically possible to write such a method, assuming you restricted yourself to rand and deterministic mathematical functions and had complete knowledge of the system's floating-point implementation, etc.
It would be much more straightforward, however, to approximate the probability by executing the expression a large number of times and keeping track of the percentage of times it succeeded.

For simple comparisons to a uniform random number, yes, but in general, no. It depends on the distribution of the expression you're using to determine your boolean, and you could write arbitrarily complex expressions with bizarre distributions. However, it's pretty straightforward to estimate the probability empirically.
Create a Bernoulli (0/1) outcome based on the expression, yielding 1 when the expression is true and 0 when it is false. Generate a large number (n) of them. The long run average of the Bernoulli outcomes will converge to the probability of getting a true. If you call that p-hat and the true value is p, then p-hat should fall within the range p +/- (1.96 * sqrt(p*(1-p)/n)) 95% of the time. You can see from this that the larger the sample size n is, the more precise your estimate is.
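As a rough illustration, here is a minimal Ruby sketch of that estimate together with its 95% half-width (the method name and defaults are mine, not part of the answer):

def estimate_probability(expr, n = 1_000_000)
  hits = n.times.count { eval(expr) }    # Bernoulli outcomes: count the trues
  p_hat = hits / n.to_f                  # long-run average ~ true probability
  half_width = 1.96 * Math.sqrt(p_hat * (1 - p_hat) / n)
  [p_hat, half_width]
end

p_hat, hw = estimate_probability("rand < 0.3")
puts "p ≈ #{p_hat} ± #{hw}"              # e.g. p ≈ 0.3001 ± 0.0009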

An incredibly slow way of approximating this would be to evaluate the expression a very large number of times and track the fraction of evaluations that come up true. The Law of Large Numbers guarantees that as n approaches infinity, that fraction converges to the true probability.
$expr = "rand < 0.5"
def get_result(expr)
eval(expr)
end
n = 1000000
a = Array.new(n)
n.times do |i|
a[i] = eval($expr)
end
puts a.count(true)/n.to_f
Returned 0.499899 for me.
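If you want that estimate packaged into the method signature from the question, a sketch might look like this (the trial count is an arbitrary default):

def get_expected_probability(expr, trials = 1_000_000)
  trials.times.count { eval(expr) } / trials.to_f
end

get_expected_probability("rand < 0.3") # => ~0.3
get_expected_probability("true")       # => 1.0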

Related

How to get the number e (2.718) using a random number generator?

Is it possible to calculate the number e (2.718) using random numbers?
I'm assuming that when you say "using random numbers" you mean "using some sort of random sampling scheme." If you want the exact answer to an infinite number of decimals, the answer is "no, not unless you have an infinite amount of time." However, we can generate random sequences whose expected value is e, and we can assess the sampling error using basic statistics. By increasing the sample size, we can decrease the sampling error to any precision you want as long as you specify your desired confidence level.
It turns out that if you sum a bunch of random uniform(0,1)'s until the sum exceeds 1, the quantity of uniforms required has an expected value of e. We can turn that into a sampling problem by writing a method/function to return the count, and taking the average of the values obtained by calling that method multiple times.
You didn't specify any particular language, so here it is in Ruby (which is practically like pseudocode):
require 'quickstats' # install from rubygems w/ 'gem install quickstats'

def trial # generate results of one trial
  count = 0
  sum = 0.0
  while sum < 1.0
    count += 1
    sum += rand # Ruby's rand produces U(0,1) values by default
  end
  return count # added "return" keyword for non-rubyists' readability
end

stats = QuickStats.new
10_000_000.times { stats.new_obs trial } # more precision? bump up sample size

puts "Average = #{stats.avg}"
half_width = 1.96 * stats.std_err
puts "CI half-width = #{half_width}"
deviation = (stats.avg - Math::E).abs
puts " |E - avg| = #{deviation} (should be ≤ half-width 95% of the time)"
This runs in under 4 seconds on my laptop and produces outputs such as:
Average = 2.7179918000002234
CI half-width = 0.0005421324752620413
|E - avg| = 0.0002900284588216451 (should be ≤ half-width 95% of the time)
Here’s another option. Consider the following probability question: you have a biased coin that comes up heads with probability 1/n. You then flip the coin n times. What is the probability that you never flip heads? Well, that’s the probability that you flip tails n times, which is (1 - 1/n)^n, which as n tends towards infinity rapidly approaches 1/e. You could therefore estimate e by picking some modest value of n, simulating n tosses of a coin that comes up heads with probability 1/n, and seeing whether you never flip heads. The proportion of trials that don’t yield heads will approach 1/e, and from there you can estimate e.
For example, here's Python code to flip a coin with heads probability 1/n a total of n times (done by sampling a uniformly random number between 0 and 1) and see if all of them are tails:
from random import random

def one_trial(n):
    for i in range(n):
        if random() < 1 / n:
            return False
    return True
We can then run a large number of trials and see which fraction of them are all tails. That fraction will be approximately 1/e, so we just take the reciprocal:
def estimate_e(n, num_trials):
    successes = 0
    for i in range(num_trials):
        if one_trial(n):
            successes += 1
    return num_trials / successes
Doing this with n = 2^10 and num_trials = 2^20 gave me the estimate
e ≈ 2.7198016257969466,
which isn't too bad.

Ruby function to calculate average search time for a skip list

I'm trying to write a ruby function to determine the average expected search time for a skip list. I don't have a strong math background and I believe the results I'm getting from this function are not correct.
n = number of elements in the list
base = denominator of the promotion probability. i.e. if 1 of 4 nodes are promoted base = 4
def lookup_efficiency(n, base)
  return Math.log(n, base) * (base / 2.0)
end
How do I express an equation in Ruby which will take the number of elements in a skip-list and a base and return the average search time?
Since the complexity of a skip list lookup is O(log_base(n / base)), how about this?
def lookup_efficiency(n, base)
  Math.log(n / base) / Math.log(base)
end
Make sure your base is a float so you don't end up with integer division!
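For example (assuming one million elements and a 1-in-4 promotion probability):

lookup_efficiency(1_000_000, 4.0) # => ~8.97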

Choosing individuals from a population, by a fitness function

I've been working on an algorithm, where I need to choose n individuals from a population of size k, where k is much bigger than n. All individuals have a fitness value, therefore the selection should favor higher fitness values. However, I don't want to simply choose best n individuals, the worse ones should have a chance also. (Natural selection)
So I decided to find the min and max fitness values within the population. Any individual would then have
p = (current - min) / (max - min)
probability of being chosen, but I cannot just iterate over all of them, roll the dice, and pick one if the probability holds, because then I'd end up with more than n individuals. I could shuffle the list and iterate from the front until I obtain up to n individuals, but that might miss great ones toward the end of the list.
I could also perform more than one pass until the remaining population size reaches n. But this might heavily favor the better ones and converge to the naive selection method I mentioned.
Any suggestions, or references to such a selection process? I could do some reading on relevant statistical methods if you can recommend any.
Thanks.
Use roulette-wheel selection. The basic idea is that you assign each individual an area of the roulette wheel proportional to its selection probability, then spin the wheel n times to select the individuals you want.
Sample implementation in ruby:
def roulette(population, n)
  probs = population.map { |gene| gene.probability } # TODO: Implement this
  selected = []
  n.times do
    # pick a random number in [0, total) and select the individual
    # corresponding to that roulette-wheel area
    r, inc = rand * probs.inject(:+), 0
    population.each_index do |i|
      if r < (inc += probs[i])
        selected << population[i]
        # make selection not pick the same individual twice
        population.delete_at i
        probs.delete_at i
        break
      end
    end
  end
  return selected
end
Note: if you are a Ruby hacker, you see that the code could be much shorter with more Rubyisms, however I wanted the algorithm to be as clear as possible.

How to compute the "15% of the time" randomness?

I'm looking for a decent, elegant method of calculating this simple logic.
Right now I can't think of one, it's spinning my head.
I am required to do some action only 15% of the time.
I'm used to "50% of the time" where I just mod the milliseconds of the current time and see if it's odd or even, but I don't think that's elegant.
How would I elegantly calculate "15% of the time"? Random number generator maybe?
Pseudo-code or any language are welcome.
Hope this is not subjective, since I'm looking for the "smartest" short-hand method of doing that.
Thanks.
Solution 1 (double)
get a random double between 0 and 1 (whatever language you use, there must be such a function)
do the action only if it is smaller than 0.15
Solution 2 (int)
You can also achieve this by generating a random int and checking whether it is divisible by 6 or 7. UPDATE --> This is not optimal.
You can produce a random number between 0 and 99, and check if it's less than 15:
if (rnd.Next(100) < 15) ...
You can also reduce the numbers, as 15/100 is the same as 3/20:
if (rnd.Next(20) < 3) ...
A random number generator would give you the best randomness. Generate a random number between 0 and 1 and test for < 0.15.
Using the time like that isn't true random, as it's influenced by processing time. If a task takes less than 1 millisecond to run, then the next random choice will be the same one.
That said, if you do want to use the millisecond-based method, do milliseconds % 20 < 3.
Just use a PRNG. As always, it's a performance vs. accuracy trade-off. I think rolling your own based directly off the time is a waste of time (pun intended). You'll probably get biasing effects even worse than a run-of-the-mill linear congruential generator.
In Java, I would use nextInt:
myRNG.nextInt(100) < 15
Or (mostly) equivalently:
myRNG.nextInt(20) < 3
There are ways to get a random integer in other languages too (multiple ways, actually, depending on how accurate it has to be).
Using modulo arithmetic you can easily do something every Xth run, like so (% 6 will give you roughly 15%, since 1/6 ≈ 16.7%):
if (microtime() % 6 === 0) do it
Or, with a random float between 0 and 1:
if (rand(0, 1) < 0.15) do it
boolean array[100] = {true: first 15, false: rest};
shuffle(array);
while (array.size > 0)
{
  // pop the first element of the array
  element = array.pop();
  if (element == true)
    do_action();
  else
    do_something_else();
}
// redo the whole thing again when no elements are left
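In Ruby, the same shuffled-bucket idea might look like this (a sketch; do_action and do_something_else stand in for your real work):

bucket = []
loop do
  # refill with exactly 15 trues per 100 draws, in random order
  bucket = ([true] * 15 + [false] * 85).shuffle if bucket.empty?
  if bucket.pop
    do_action
  else
    do_something_else
  end
end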
Here's one approach that combines randomness and a guarantee that eventually you get a positive outcome in a predictable range:
Have a target (15 in your case), a counter (initialized to 0), and a flag (initialized to false).
Accept a request.
If the counter is 15, reset the counter and the flag.
If the flag is true, return negative outcome.
Get a random true or false based on one of the methods described in other answers, but use a probability of 1/(15-counter).
Increment counter
If result is true, set flag to true and return a positive outcome. Else return a negative outcome.
Accept next request
This means that the first request has a probability of 1/15 of returning a positive, but by the 15th request, if no positive result has been returned yet, a positive result is returned with probability 1/1, i.e. with certainty.
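A minimal Ruby sketch of that scheme (class and method names are mine; it yields exactly one positive outcome in every cycle of 15 requests, at a uniformly random position within the cycle):

class GuaranteedRate
  def initialize(target = 15)
    @target, @counter, @fired = target, 0, false
  end

  def request
    if @counter == @target                    # cycle complete: reset counter and flag
      @counter = 0
      @fired = false
    end
    if @fired                                 # already fired this cycle
      @counter += 1
      return false
    end
    hit = rand < 1.0 / (@target - @counter)   # 1/15, then 1/14, ..., finally 1/1
    @counter += 1
    @fired = true if hit
    hit
  end
end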
This quote is from a great article about how to use a random number generator:
Note: Do NOT use
y = rand() % M;
as this focuses on the lower bits of rand(). For linear congruential random number generators, which rand() often is, the lower bytes are much less random than the higher bytes. In fact the lowest bit cycles between 0 and 1. Thus rand() may cycle between even and odd (try it out). Note rand() does not have to be a linear congruential random number generator. It's perfectly permissible for it to be something better which does not have this problem.
and it contains formulas and pseudo-code for
r = [0,1) = {r: 0 <= r < 1} real
x = [0,M) = {x: 0 <= x < M} real
y = [0,M) = {y: 0 <= y < M} integer
z = [1,M] = {z: 1 <= z <= M} integer
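In Ruby terms, those mappings from a uniform r in [0, 1) look roughly like this (M = 100 here is an arbitrary bound chosen for illustration):

m = 100                 # M
r = rand                # r in [0, 1)
x = r * m               # real in [0, M)
y = (r * m).floor       # integer in [0, M) -- uses the high bits, unlike rand() % M
z = (r * m).floor + 1   # integer in [1, M]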

How can I randomly iterate through a large Range?

I would like to randomly iterate through a range. Each value will be visited only once and all values will eventually be visited. For example:
class Array
  def shuffle
    ret = dup
    j = length
    i = 0
    while j > 1
      r = i + rand(j)
      ret[i], ret[r] = ret[r], ret[i]
      i += 1
      j -= 1
    end
    ret
  end
end
(0..9).to_a.shuffle.each{|x| f(x)}
where f(x) is some function that operates on each value. A Fisher-Yates shuffle is used to efficiently provide random ordering.
My problem is that shuffle needs to operate on an array, which is not cool because I am working with astronomically large numbers. Ruby will quickly consume a large amount of RAM trying to create a monstrous array. Imagine replacing (0..9) with (0..99**99). This is also why the following code will not work:
tried = {} # store previous attempts
bigint = 99**99
bigint.times {
  x = rand(bigint)
  redo if tried[x]
  tried[x] = true
  f(x) # some function
}
This code is very naive and quickly runs out of memory as tried obtains more entries.
What sort of algorithm can accomplish what I am trying to do?
[Edit1]: Why do I want to do this? I'm trying to exhaust the search space of a hash algorithm for a N-length input string looking for partial collisions. Each number I generate is equivalent to a unique input string, entropy and all. Basically, I'm "counting" using a custom alphabet.
[Edit2]: This means that f(x) in the above examples is a method that generates a hash and compares it to a constant, target hash for partial collisions. I do not need to store the value of x after I call f(x) so memory should remain constant over time.
[Edit3/4/5/6]: Further clarification/fixes.
[Solution]: The following code is based on #bta's solution. For the sake of conciseness, next_prime is not shown. It produces acceptable randomness and only visits each number once. See the actual post for more details.
N = size_of_range
Q = ( 2 * N / (1 + Math.sqrt(5)) ).to_i.next_prime
START = rand(N)
x = START
nil until f( x = (x + Q) % N ) == START # assuming f(x) returns x
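For completeness, here is one naive way the omitted next_prime helper could be written (a sketch using Ruby's stdlib prime library; any routine that returns a prime no smaller than its receiver would do):

require 'prime'

class Integer
  # smallest prime >= self (slow but simple)
  def next_prime
    n = self
    n += 1 until Prime.prime?(n)
    n
  end
end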
I just remembered a similar problem from a class I took years ago; that is, iterating (relatively) randomly through a set (completely exhausting it) given extremely tight memory constraints. If I'm remembering this correctly, our solution algorithm was something like this:
Define the range to be from 0 to some number N.
Generate a random starting point x[0] inside N.
Generate an iterator Q less than N.
Generate successive points x[n] by adding Q to the previous point and wrapping around if needed. That is, x[n+1] = (x[n] + Q) % N.
Repeat until you generate a new point equal to the starting point.
The trick is to find an iterator that will let you traverse the entire range without generating the same value twice. If I'm remembering correctly, any N and Q that are relatively prime will work (the closer Q is to the bounds of the range, the less 'random' the sequence looks). In that case, a prime number that is not a factor of N should work. You can also swap bytes/nibbles in the resulting number to change the pattern with which the generated points "jump around" in N.
This algorithm only requires the starting point (x[0]), the current point (x[n]), the iterator value (Q), and the range limit (N) to be stored.
Perhaps someone else remembers this algorithm and can verify if I'm remembering it correctly?
As #Turtle answered, your problem doesn't have a solution. #KandadaBoggu's and #bta's solutions give you random numbers in some ranges which may or may not be random. You get clusters of numbers.
But I don't know why you care about a double occurrence of the same number. If (0..99**99) is your range, then even if you could generate 10^10 random numbers per second (say, a 3 GHz processor with about 4 cores generating one random number per CPU cycle - which is impossible, and Ruby would slow it down a lot anyway), it would take about 10^180 years to exhaust all the numbers. You also have a probability of about 10^-180 that two identical numbers will be generated during a whole year. Our universe is about 10^10 years old, so if your computer could have started calculating when time began, you would have a probability of about 10^-170 that two identical numbers were ever generated. In other words - practically it is impossible, and you don't have to care about it.
Even if you used Jaguar (number 1 on the www.top500.org supercomputer list) for only this one task, you would still need 10^174 years to get all the numbers.
If you don't believe me, try:
tried = {} # store previous attempts
bigint = 99**99
bigint.times {
  x = rand(bigint)
  puts "Oh, no!" if tried[x]
  tried[x] = true
}
I'll buy you a beer if you will even once see "Oh, no!" on your screen during your life time :)
I could be wrong, but I don't think this is doable without storing at least some state.
Even if you only use one bit per value (has this value been tried, yes or no), you will need X/8 bytes of memory to store the result (where X is the largest number). Assuming that you have 2 GB of free memory, that covers a range of more than 16 billion numbers.
Break the range in to manageable batches as shown below:
def range_walker(range, batch_size = 100)
  size = (range.end - range.begin) + 1
  n = size / batch_size
  n.times do |i|
    x = i * batch_size + range.begin
    y = x + batch_size
    (x...y).sort_by { rand }.each { |z| p z }
  end
  d = range.end - size % batch_size + 1
  (d..range.end).sort_by { rand }.each { |z| p z }
end
You can further randomize the solution by randomly choosing the batch for processing.
PS: This is a good problem for map-reduce. Each batch can be worked by independent nodes.
Reference:
Map-reduce in Ruby
You can randomly iterate over an array with the shuffle method:
a = [1,2,3,4,5,6,7,8,9]
a.shuffle!
=> [5, 2, 8, 7, 3, 1, 6, 4, 9]
You want what's called a "full cycle iterator"...
Here is pseudocode for the simplest version, which is perfect for most uses...
function fullCycleStep(sample_size, last_value, random_seed = 31337, prime_number = 32452843) {
  if last_value == null then last_value = random_seed % sample_size
  return (last_value + prime_number) % sample_size
}
If you call this like so:
sample = 10
For i = 1 to sample
  last_value = fullCycleStep(sample, last_value)
  print last_value
next
It would generate random numbers, looping through all 10, never repeating. If you change random_seed, which can be anything, or prime_number, which must be greater than and not evenly divisible by sample_size, you will get a new random order, but you will still never get a duplicate.
Database systems and other large-scale systems do this by writing the intermediate results of recursive sorts to a temp database file. That way, they can sort massive numbers of records while only keeping limited numbers of records in memory at any one time. This tends to be complicated in practice.
How "random" does your order have to be? If you don't need a specific input distribution, you could try a recursive scheme like this to minimize memory usage:
def gen_random_indices
  # Assume your input range is (0..(10**3))
  (0..9).sort_by { rand }.each do |a|
    (0..9).sort_by { rand }.each do |b|
      (0..9).sort_by { rand }.each do |c|
        yield "#{a}#{b}#{c}".to_i
      end
    end
  end
end

gen_random_indices do |idx|
  run_test_with_index(idx)
end
gen_random_indices do |idx|
run_test_with_index(idx)
end
Essentially, you are constructing the index by randomly generating one digit at a time. In the worst-case scenario, this will require enough memory to store 10 * (number of digits). You will encounter every number in the range (0..(10**3)) exactly once, but the order is only pseudo-random. That is, if the first loop sets a=1, then you will encounter all three-digit numbers of the form 1xx before you see the hundreds digit change.
The other downside is the need to manually construct the function to a specified depth. In your (0..(99**99)) case, this would likely be a problem (although I suppose you could write a script to generate the code for you). I'm sure there's probably a way to rewrite this in a stateful, recursive manner, but I can't think of it off the top of my head (ideas, anyone?); one attempt is sketched below.
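For what it's worth, here is one possible recursive generalization to arbitrary depth (my sketch, not from the original answer; digits is the number of positions and base is 10 for decimal):

def gen_random_indices(digits, base = 10, prefix = 0, &block)
  if digits.zero?
    yield prefix
  else
    (0...base).to_a.shuffle.each do |d|
      # recurse with this digit appended to the prefix
      gen_random_indices(digits - 1, base, prefix * base + d, &block)
    end
  end
end

gen_random_indices(3) { |idx| run_test_with_index(idx) } # visits 0..999 exactly once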
[Edit]: Taking into account #klew and #Turtle's answers, the best I can hope for is batches of random (or close to random) numbers.
This is a recursive implementation of something similar to KandadaBoggu's solution. Basically, the search space (as a range) is partitioned into an array containing N equal-sized ranges. Each range is fed back in a random order as a new search space. This continues until the size of the range hits a lower bound. At this point the range is small enough to be converted into an array, shuffled, and checked.
Even though it is recursive, I haven't blown the stack yet. Instead, it errors out when attempting to partition a search space larger than about 10^19 keys. This has to do with the numbers being too large to convert to a long. It can probably be fixed:
# partition a range into an array of N equal-sized ranges
def partition(range, n)
  ranges = []
  first = range.first
  last = range.last
  length = last - first + 1
  step = length / n # integer division
  ((first + step - 1)..last).step(step) { |i|
    ranges << (first..i)
    first = i + 1
  }
  # append any extra onto the last element
  ranges[-1] = (ranges[-1].first)..last if last > step * ranges.length
  ranges
end
I hope the code comments help shed some light on my original question.
pastebin: full source
Note: PW_LEN under # options can be changed to a lower number in order to get quicker results.
For a prohibitively large space, like
space = -10..1000000000000000000000
You can add this method to Range.
class Range
  M127 = 170_141_183_460_469_231_731_687_303_715_884_105_727

  def each_random(seed = 0)
    return to_enum(__method__) { size } unless block_given?
    unless first.kind_of? Integer
      raise TypeError, "can't randomly iterate from #{first.class}"
    end
    sample_size = self.end - first + 1
    sample_size -= 1 if exclude_end?
    j = coprime sample_size
    v = seed % sample_size
    each do
      v = (v + j) % sample_size
      yield first + v
    end
  end

  protected

  def gcd(a, b)
    b == 0 ? a : gcd(b, a % b)
  end

  def coprime(a, z = M127)
    gcd(a, z) == 1 ? z : coprime(a, z + 1)
  end
end
You could then
space.each_random { |i| puts i }
729815750697818944176
459631501395637888351
189447252093456832526
919263002791275776712
649078753489094720887
378894504186913665062
108710254884732609237
838526005582551553423
568341756280370497598
298157506978189441773
27973257676008385948
757789008373827330134
487604759071646274309
217420509769465218484
947236260467284162670
677052011165103106845
406867761862922051020
136683512560740995195
866499263258559939381
596315013956378883556
326130764654197827731
55946515352016771906
785762266049835716092
515578016747654660267
...
With a good amount of randomness so long as your space is a few orders of magnitude smaller than M127.
Credit to #nick-steele and #bta for the approach.
This isn't really a Ruby-specific answer but I hope it's permitted. Andrew Kensler gives a C++ "permute()" function that does exactly this in his "Correlated Multi-Jittered Sampling" report.
As I understand it, the exact function he provides really only works if your "array" is up to size 2^27, but the general idea could be used for arrays of any size.
I'll do my best to sort of explain it. The first part is you need a hash that is reversible "for any power-of-two sized domain". Consider x = i + 1. No matter what x is, even if your integer overflows, you can determine what i was. More specifically, you can always determine the bottom n bits of i from the bottom n bits of x. Addition is a reversible hash operation, as is multiplication by an odd number, as is doing a bitwise xor by a constant. If you know a specific power-of-two domain, you can scramble bits in that domain. E.g. x ^= (x & 0xFF) >> 5 is valid for the 16-bit domain. You can specify that domain with a mask, e.g. mask = 0xFF, and your hash function becomes x = hash(i, mask). Of course you can add a "seed" value into that hash function to get different randomizations. Kensler lays out more valid operations in the paper.
So you have a reversible function x = hash(i, mask, seed). The problem is that if you hash your index, you might end up with a value that is larger than your array size, i.e. your "domain". You can't just modulo this or you'll get collisions.
The reversible hash is the key to using a technique called "cycle walking", introduced in "Ciphers with Arbitrary Finite Domains". Because the hash is reversible (i.e. 1-to-1), you can just repeatedly apply the same hash until your hashed value is smaller than your array! Because you're applying the same hash, and the mapping is one-to-one, whatever value you end up on will map back to exactly one index, so you don't have collisions. So your function could look something like this for 32-bit integers (pseudocode):
fun permute(i, length, seed) {
  i = hash(i, 0xFFFF, seed)
  while (i >= length): i = hash(i, 0xFFFF, seed)
  return i
}
It could take a lot of hashes to get to your domain, so Kensler does a simple trick: he keeps the hash within the domain of the next power of two, which makes it require very few iterations (~2 on average), by masking out the unnecessary bits. The final algorithm looks like this:
fun next_pow_2(length) {
  # This implementation is for clarity.
  # See Kensler's paper for one way to do it fast.
  p = 1
  while (p < length): p *= 2
  return p
}

fun permute(i, length, seed) {
  mask = next_pow_2(length) - 1
  i = hash(i, mask, seed) & mask
  while (i >= length): i = hash(i, mask, seed) & mask
  return i
}
And that's it! Obviously the important thing here is choosing a good hash function, which Kensler provides in the paper but I wanted to break down the explanation. If you want to have different random permutations each time, you can add a "seed" value to the permute function which then gets passed to the hash function.
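To make that concrete, here is a Ruby sketch with a toy reversible hash built only from operations the text lists as reversible (xor with a constant, multiplication by an odd number, an xor-shift). This is my illustration, not Kensler's actual hash:

def rev_hash(i, mask, seed)
  i &= mask
  i ^= seed & mask                 # xor by a constant: reversible
  i = (i * 0x9e3779b1) & mask      # multiply by an odd number: reversible mod 2^k
  i ^= i >> 5                      # xor-shift: reversible within the domain
  i & mask
end

def next_pow_2(length)
  p = 1
  p *= 2 while p < length
  p
end

def permute(i, length, seed)
  mask = next_pow_2(length) - 1
  i = rev_hash(i, mask, seed)
  i = rev_hash(i, mask, seed) while i >= length  # cycle-walk back into range
  i
end

(0...10).map { |i| permute(i, 10, 42) } # => some permutation of 0..9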
