Shouldn't the same srand value produce the same random numbers? - random

When I repeatedly run this code,
srand 1;
my @x = (1..1000).pick: 100;
say sum @x;
I get different answers each time. If I'm resetting the generator with srand, shouldn't it produce the same random numbers each time?
The problem occurs in the REPL.
It also occurs in this file:
use v6.d;
srand 1;
my $x = rand;
say $x; # OUTPUT: 0.5511548437617427
srand 1;
$x = rand;
say $x; # OUTPUT: 0.308302962221659
say $*KERNEL; # OUTPUT: darwin
I'm using:
Welcome to Rakudo™ v2022.07.
Implementing the Raku® Programming Language v6.d.
Built on MoarVM version 2022.07.

It should produce the same numbers for a given piece of code all of the time. And I haven't been able to reproduce your observation in any way.
There may be something spooky going on under the hood, though:
$ raku -e 'srand 1; (my $x = (1..1000).pick(1)).say'
(761)
$ raku -e 'srand 1; (my @x = (1..1000).pick(1)).say'
[471]
On the surface, you'd say that these values should be the same, as each only generates a single value. But apparently a different number of random values is actually calculated under the hood, causing the visibly different values. Is that perhaps what is going on in your case?

(This answer is a paraphrase of jnthn's comment in the GitHub issue opened based on this question).
Setting srand 1 will cause the same sequence of random numbers to be generated -- that is, the nth random number will be the same. However, since Raku (really, Rakudo and/or MoarVM, assuming you're using those backends) uses random numbers internally, you won't always be in the same position in that sequence (i.e., your n might be different) and thus you might not get the same random number.
This is further complicated by Rakudo's optimizer. Naively, repeating the same code later in the program should consume the same number of random numbers from the sequence. However, the optimizer may well remove some of those random number uses from subsequent calls, which can result in different random numbers.
I'm unclear to what degree the current behavior is intended versus a bug in Rakudo/MoarVM's implementation; please see the previously linked issue for additional details.
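As a language-neutral illustration of the point about position in the sequence, here is a minimal Python sketch. It uses Python's random module, not Rakudo's generator, and the helper name nth_value and the "skipped" count are made up for this example. Seeding fixes the sequence, but the value you observe depends on how many numbers were consumed before your call.

```python
import random

def nth_value(seed, skipped):
    """Seed a private generator, discard `skipped` values, return the next one.

    `skipped` stands in for random numbers consumed internally (a hypothetical
    count, not anything measured from Rakudo or MoarVM).
    """
    rng = random.Random(seed)
    for _ in range(skipped):
        rng.random()
    return rng.random()

# Same seed, same position in the sequence.
print(nth_value(1, 0) == nth_value(1, 0))
# Same seed, different position in the sequence.
print(nth_value(1, 0) == nth_value(1, 3))
```

The second comparison is the situation described above: the seed is identical, but the code is reading a different element of the same sequence.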

Related

Lua: What is typical approach for using calculated values in a for loop?

What is the typical approach in Lua (before the introduction of integers in 5.3) for dealing with calculated range values in for loops? Mathematical calculations on the start and end values in a numerical for loop put the code at risk of bugs, possibly nasty latent ones, as this will only occur on certain values and/or with changes to calculation ordering. Here's a concocted example of a loop not producing the desired output:
a={"a","b","c","d","e"}
maybethree = 3
maybethree = maybethree / 94
maybethree = maybethree * 94
for i = 1,maybethree do print(a[i]) end
This produces the unfortunate output of two items rather than the desired three (tested on 5.1.4 on 64-bit x86):
a
b
Programmers unfamiliar with this territory might be further confused by print() output as that prints 3!
The application of a rounding function to the nearest whole number could work here. I understand the approximation with FP and why this fails; I'm interested in what the typical style/solution is for this in Lua.
Related questions:
Lua for loop does not do all iterations
Lua: converting from float to int
The solution is to avoid this reliance on floating-point math where floating-point precision may become an issue. Or, more realistically, just be aware of when you are using FP and be mindful of the precision issue. This isn’t a Lua problem that requires a Lua-specific solution.
maybethree is a misnomer: it is never three. Your code above is deterministic. It will always print just a and b. Since the maybethree variable is less than three, of course the for loop would not execute 3 times.
The print function is also behaving as defined/expected. Use string.format to show the FP number in all its glory:
print(string.format("%1.16f", maybethree)) -- 2.9999999999999996
Still need to use calculated values to control your for loop? Then you already mentioned the answer: implement a rounding function.
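The rounding idea is not Lua-specific, so here is a minimal sketch of it in Python, which uses the same double-precision floats. The helper name loop_count is an assumption introduced for this example. It rounds the computed bound to the nearest whole number before the loop uses it.

```python
import math

def loop_count(value):
    """Round a computed loop bound to the nearest whole number."""
    return math.floor(value + 0.5)

maybethree = 3
maybethree = maybethree / 94
maybethree = maybethree * 94

print(repr(maybethree))  # full-precision value, which may sit slightly below 3
for i in range(1, loop_count(maybethree) + 1):
    print(i)
```

The same one-line helper translates directly to Lua as math.floor(value + 0.5).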

Random number generation from 1 to 7

I was going through Google interview questions, one of which asks you to implement random number generation from 1 to 7.
I wrote some simple code. If this question were asked in an interview and I wrote the code below, would it be acceptable?
import time
def generate_rand():
    ret = str(time.time()) # time in second like, 12345.1234
    ret = int(ret[-1])
    if ret == 0 or ret == 1:
        return 1
    elif ret > 7:
        ret = ret - 7
        return ret
    return ret
while 1:
    print(generate_rand())
    time.sleep(1) # Just to see the output in the STDOUT
(Since the question seems to ask for analysis of issues in the code and not a solution, I am not providing one.)
The answer is unacceptable because:
(1) You need to wait for a second for each random number. Many applications need a few hundred at a time. (If the sleep is just for convenience, note that even a microsecond granularity will not yield true random numbers as the last microsecond will be monotonically increasing until 10us are reached. You may get more than a few calls done in a span of 10us and there will be a set of monotonically increasing pseudo-random numbers).
(2) Random numbers should have a uniform distribution: each value should have the same probability in theory. In this case, the code maps the digits 0, 1, and 8 to 1 and the digits 2 and 9 to 2, so 1 is three times as likely and 2 is twice as likely as each of the values 3-7.
Typically answers to this sort of a question will try to get a large range of numbers and distribute the ranges evenly from 1-7. For example, the above method would have worked fine if you had wanted randomness from 1-5, as 10 is evenly divisible by 5. Note that this will only solve (2) above.
For (1), there are other sources of randomness, such as /dev/random on a Linux OS.
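To see the skew concretely, one can tally how the question's branching maps each possible final digit 0-9 to a result. This is only a sketch: it assumes every digit is equally likely, which the time-based source does not guarantee, and map_digit is a name introduced here.

```python
from collections import Counter

def map_digit(digit):
    """Same branching as the question's generate_rand(), applied to one digit."""
    if digit == 0 or digit == 1:
        return 1
    elif digit > 7:
        return digit - 7
    return digit

# How many of the ten digits land on each result.
counts = Counter(map_digit(d) for d in range(10))
for value in sorted(counts):
    print(value, counts[value])
```

A uniform generator would need every result to be reachable from the same number of digits, which is impossible with ten digits and seven results unless some digits are discarded.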
You haven't really specified the constraints of the problem you're trying to solve, but if it's from a collection of interview questions it seems likely that it might be something like this.
In any case, the answer shown would not be acceptable for the following reasons:
The distribution of the results is not uniform, even if the samples you read from time.time() are uniform.
The results from time.time() will probably not be uniform. The result depends on the time at which you make the call, and if your calls are not uniformly distributed in time then the results will probably not be uniformly distributed either. In the worst case, if you're trying to randomise an array on a very fast processor then you might complete the entire operation before the time changes, so the whole array would be filled with the same value. Or at least large chunks of it would be.
The changes to the random value are highly predictable and can be inferred from the speed at which your program runs. In the very-fast-computer case you'll get a bunch of x followed by a bunch of x+1, but even if the computer is much slower or the clock is more precise, you're likely to get aliasing patterns which behave in a similarly predictable way.
Since you take the time value in decimal, it's likely that the least significant digit doesn't visit all possible values uniformly. It's most likely a conversion from binary to some arbitrary number of decimal digits, and the distribution of the least significant digit can be quite uneven when that happens.
The code should be much simpler. It's a complicated solution with many special cases, which reflects a piecemeal approach to the problem rather than an understanding of the relevant principles. An ideal solution would make the behaviour self-evident without having to consider each case individually.
The last one would probably end the interview, I'm afraid. Perhaps not if you could tell a good story about how you got there.
You need to understand the pigeonhole principle to begin to develop a solution. It looks like you're reducing the time to its least significant decimal digit for possible values 0 to 9. Legal results are 1 to 7. If you have seven pigeonholes and ten pigeons then you can start by putting your first seven pigeons into one hole each, but then you have three pigeons left. There's nowhere that you can put the remaining three pigeons (provided you only use whole pigeons) such that every hole has the same number of pigeons.
The problem is that if you pick a pigeon at random and ask what hole it's in, the answer is more likely to be a hole with two pigeons than a hole with one. This is what's called "non-uniform", and it causes all sorts of problems, depending on what you need your random numbers for.
You would either need to figure out how to ensure that all holes are filled equally, or you would have to come up with an explanation for why it doesn't matter.
Typically the "doesn't matter" answer is that each hole has either a million or a million and one pigeons in it, and for the scale of problem you're working with the bias would be undetectable.
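One way to make sure all holes are filled equally is to throw away the leftover pigeons and draw again (rejection sampling). The sketch below is an illustration under assumptions, not the answer's own solution: rand7 and next_digit are hypothetical names, and the digit source is Python's random module rather than the clock.

```python
import random

def rand7(next_digit):
    """Return a value in 1..7 from a source of digits 0..9 by rejection.

    Digits 7, 8 and 9 are discarded and a fresh digit is drawn, so each of
    the seven kept digits maps to exactly one result.
    """
    while True:
        digit = next_digit()
        if digit < 7:
            return digit + 1

rng = random.Random(42)  # arbitrary seed, only to make the demo repeatable
print([rand7(lambda: rng.randrange(10)) for _ in range(10)])
```

The result is only as uniform as the digit source itself, so this addresses the mapping problem, not the weaknesses of using the clock as a source.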
Using the same general architecture you've created, I would do something like this:
import time
def generate_rand():
    ret = int(str(time.time())[-1]) # last digit of the time, 0-9
    ret = ret % 8 # will return pseudorandom numbers 0-7
    if ret == 0:
        return 1 # or you could also return the result of another call to generate_rand()
    return ret
while 1:
    print(generate_rand())
    time.sleep(1)

YAP Prolog random's lack of randomness

When executing the following Prolog program with YAP, the output is always the same, namely the integer 233.
:- use_module(library(random)).
x:- random(1,1000,X), writeln(X).
For instance, if I execute the following bash script, the output is always the same integer (233).
for k in `seq 0`
do
yap -l test.pl << %
x.
%
done
If I repeat this procedure using swipl, then the output is different each time, i.e., random.
Can anyone explain this?
First things first!
For many good reasons (reproducibility of past results being the most important one) computer programs do not work with actual random numbers, but with pseudo random numbers.
PRNGs are fully deterministic functions that, given the same internal state (a.k.a. "seed") as initialization, produce exactly the same sequence of numbers (from now until eternity).
Quick fix: find yourself a suitable seed (date, time, phase of the moon, ...) and explicitly initialize the PRNG with that seed. Record the seed so you can later deterministically re-run past experiments.
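The "pick a seed, record it, replay later" workflow can be sketched as follows. This is a Python illustration rather than YAP code, and run_experiment is a hypothetical name; the point is only that a recorded seed lets you reproduce a run that otherwise varies.

```python
import random
import time

def run_experiment(seed, count=5):
    """Draw `count` integers in 1..999 from a generator initialised with `seed`."""
    rng = random.Random(seed)
    return [rng.randint(1, 999) for _ in range(count)]

seed = time.time_ns()          # a seed that varies between runs
print("seed:", seed)           # record it so the run can be repeated later
first = run_experiment(seed)
replay = run_experiment(seed)  # re-running with the recorded seed
print(first == replay)
```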
Usually, random generators require a call like set_seed(SomeReallyRandomValue); in C, srand(time(0)) was often used.
So, I guess
datime(datime(_Year, _Month, _DayOfTheMonth, _Hour, Minute, Second)),
X is Minute * Second,Y=X,Z=X,
setrand(rand(X,Y,Z)),
could work

How do I interpret the results from dieharder for great justice

This is a question about an SO question; I don't think it belongs in meta despite being sp by definition, but if someone feels it should go to math, cross-validated, etc., please let me know.
Background:
@ForceBru asked this question about how to generate a 64-bit random number using rand(). @nwellnhof provided an answer that was accepted that basically takes the low 15 bits of 5 random numbers (because RAND_MAX is apparently only guaranteed to be 15 bits on at least some compilers), glues them together, and then drops the first 11 bits (15*5-64=11). @NikBougalis made a comment that while this seems reasonable, it won't pass many statistical tests of randomness. @Foon (me) asked for a citation or an example of a test that it would fail. @NikBougalis replied with an answer that didn't enlighten me; @DavidSwartz suggested running it against dieharder.
So, I ran dieharder. I ran it against the algorithm in question
unsigned long long llrand() {
    unsigned long long r = 0;
    for (int i = 0; i < 5; ++i) {
        r = (r << 15) | (rand() & 0x7FFF);
    }
    return r & 0xFFFFFFFFFFFFFFFFULL;
}
For comparison, I also ran it against just rand() and just 8 bits of rand() at a time.
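#include <stdio.h>
#include <stdlib.h>
void rand_test()
{
    int x;
    srand(1);
    while(1)
    {
        x = rand();
        fwrite(&x,sizeof(x),1,stdout);
    }
}
void rand_byte_test()
{
    int x;
    unsigned char c; /* assumed type: one byte per write */
    srand(1);
    while(1)
    {
        x = rand();
        c = x % 256; /* keep only the low 8 bits */
        fwrite(&c,sizeof(c),1,stdout);
    }
}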
The algorithm in question came back with two tests showing weaknesses: rgb_lagged_sum for ntuple=28 and one of the sts_serials for ntuple=8.
Using just rand() failed horribly on many tests, presumably because I'm taking a number that has 15 bits of randomness and passing it off as 32 bits of randomness.
Using the low 8 bits of rand() at a time came back as weak for rgb_lagged_sum with ntuple 2, and (edit) failed dab_monobit with tuple 12.
My questions are:
1. Am I interpreting the results for 8 bits of rand() correctly, namely that, given that one of the tests (which was marked as "good"; for the record, it also came back as weak for one of the dieharder tests marked "suspect") came back as weak and one as failed, rand()'s randomness should be suspected?
2. Am I interpreting the results for the algorithm under test correctly (namely that this should also be marginally suspected)?
3. Given the description of what the tests that came back as weak do (e.g., sts_serial looks at whether the distribution of bit patterns of a certain size is valid), should I be able to determine what the bias likely is?
4. If 3, since I'm not, can someone point out what I should be seeing?
Edit: understood that rand() isn't guaranteed to be great. Also, I tried to think what values would be less likely, and surmised zero, maxvalue, or repeated numbers might be... but doing a test of 1000000000 tries, the ratio is very near the expected value of 1 out of every 2^15 times (e.g., in 1000000000 runs, we saw 30512 zeros, 30444 max, and 30301 repeats, and bc says that 30512 * 2^15 is 999817216; other runs had similar ratios including cases where max and/or repeat was larger than zeros).
When you run dieharder the column you really need to watch is the p-value column.
The p-value column essentially says: "This is the probability that real random numbers could have produced this result." You want it to be uniformly distributed between 0 and 1.
You'll also want to run it multiple times on suspect cases. For instance, if a test shows a p-value of .03 and a re-run still gives about .03 (rather than some higher value), then you can have high confidence that your random number generator performs poorly on that test and it's not just a 3% fluke. However, if the re-run gives a high value, then you're probably looking at a statistical fluke. But it cuts both ways.
Ultimately, knowing facts about random or pseudorandom processes is difficult. But armed with dieharder you have approximate knowledge of many things.
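To make the p-value idea concrete, here is a minimal sketch of one simple test, the frequency (monobit) test, using the textbook formula p = erfc(|S| / sqrt(2n)), where S is the number of ones minus the number of zeros. This is not dieharder's code, and its own monobit implementation may differ in detail; monobit_p_value is a name chosen for this example.

```python
import math

def monobit_p_value(bits):
    """P-value of the frequency (monobit) test for a sequence of 0/1 bits."""
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)  # +1 per one, -1 per zero
    return math.erfc(abs(s) / math.sqrt(2 * n))

balanced = [0, 1] * 500            # equal numbers of ones and zeros
lopsided = [1] * 900 + [0] * 100   # far too many ones
print(monobit_p_value(balanced))
print(monobit_p_value(lopsided))
```

Note that the strictly alternating sequence is obviously not random, yet this particular test cannot see that, which is one reason a battery of different tests is used.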

sas treatment of seed when generating random distributions

I needed to generate a Poisson distribution in Excel and found a method (the Inverse Transform Method).
I did it in Excel and then in SAS (just for fun, so I do not need a quick answer) to compare with the ranpoi SAS function.
Here is my code (which works):
data Poisson(keep=mean Poisson PoissonSas);
mean=0.2;
confronta=exp(-mean);
do obs=1 to 100;
found=0;
Poisson=0;
ranuni=1;
do until(found=1);
ranuni=ranuni*ranuni(12547);
if ranuni<confronta then found=1;
else Poisson=Poisson+1;
end;
PoissonSas=ranpoi(012584,mean);
output;
end;
run;
proc means data=Poisson(drop=mean);run;
So I initialized the seed in both random functions to replicate results.
The strange thing is that I get different results depending on whether I submit the data step with both methods or only one of them (commenting the other), but the same results over and over for each type of submission.
I expected the same results always! Why is this not so?
(I am using sas 9.3)
Thanks!
It looks like SAS is interleaving the calls to the PRNGs as a single stream. Pseudo-random numbers are a sequence of values that are actually deterministic. If you seed and use the sequence in one algorithm, you'll get the same results every time for that algorithm. If you use the sequence alternating between two or more algorithms, the set of algorithms will always yield the same set of results (which seems to be the case for you), but the results for a given algorithm will be different because some of the underlying PRNs it was drawing before are now being used by the other algorithms. This is at the core of the synchronization requirement when using so-called variance reduction techniques based on common random numbers. In general, if you want identical results the solution is to have multiple instances of your PRNG, one for each "source" of randomness in your program, and to seed the multiple sources independently of each other but identically across runs. It looks like you tried to do this, but SAS doesn't behave the way you think it does. According to their documentation, it appears that they produce a single PRN stream based on the first seed entry in your code! This is a subset of one of their examples:
/* This DATA step calls the RANUNI and the RANNOR functions */
/* and produces a single stream of random numbers based on */
/* a seed value of 7. */
data d;
d = ranuni (7); f = ' '; output;
d = ranuni (8); f = ' '; output;
d = rannor (9); f = 'n'; output;
/* they actually have more... */
run;
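The effect of sharing one stream can be sketched outside SAS. The Python below is an illustration under assumptions (draw_a and the seeds 7 and 8 are just example choices, and Python's generator is unrelated to SAS's): an algorithm that draws from a stream alone sees different values once a second consumer draws from the same stream in between, while giving each consumer its own generator keeps the results stable.

```python
import random

def draw_a(rng, n=3):
    """'Algorithm A': take n uniforms from whatever stream it is handed."""
    return [rng.random() for _ in range(n)]

# A alone on a stream seeded with 7.
alone = draw_a(random.Random(7))

# A sharing one stream with another consumer that draws in between.
shared = random.Random(7)
interleaved = []
for _ in range(3):
    interleaved.append(shared.random())  # A's draw
    shared.random()                      # the other algorithm's draw

# A with its own stream, the other algorithm with a separate one.
own, other = random.Random(7), random.Random(8)
separate = []
for _ in range(3):
    separate.append(own.random())
    other.random()

print(alone == interleaved)
print(alone == separate)
```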
By the way, your Poisson algorithm is not generally regarded as an inverse transform algorithm. Inversion is 1-to-1, i.e., a single input uniform produces a single random variable. The loop you're performing is actually doing acceptance/rejection, and you use a variable number of uniforms to come up with each Poisson value.
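The difference between the two approaches can be sketched as follows. This is a Python illustration with hypothetical function names, not SAS code: poisson_inverse consumes exactly one uniform per Poisson value by walking the cumulative distribution, while poisson_product mirrors the question's loop and reports how many uniforms it used.

```python
import math

def poisson_inverse(u, mean):
    """Map one uniform u in [0, 1) to a Poisson value by walking the CDF."""
    k = 0
    p = math.exp(-mean)   # P(X = 0)
    cdf = p
    while u > cdf:
        k += 1
        p *= mean / k     # P(X = k) from P(X = k - 1)
        cdf += p
    return k

def poisson_product(uniforms, mean):
    """The question's loop: multiply uniforms until the product drops below exp(-mean).

    Returns the Poisson value and how many uniforms were consumed.
    """
    limit = math.exp(-mean)
    product, k, used = 1.0, 0, 0
    for u in uniforms:
        product *= u
        used += 1
        if product < limit:
            return k, used
        k += 1
    raise ValueError("ran out of uniforms")

print(poisson_inverse(0.9, 0.2))
print(poisson_product([0.95, 0.9, 0.5], 0.2))
```

Because the second method uses a variable number of uniforms per value, it shifts every later draw on a shared stream, which makes synchronization across runs harder.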
PJS's answer is essentially correct, but a few clarifications.
SAS does indeed use a single seed when you do it the way you did; all of what I'd call 'primitive' random functions work off of one PRNG stream, and only the first seed matters (and only matters the first time it's encountered).
However, RANPOI is a little different - probably because of how SAS creates poissons. It's not made clear in the documentation, but it appears that it uses up two random numbers (not sure if it's always two, or just coincidence). See the following test:
data test;
U=ranuni(7);
P=ranpoi(8,100);
put u= p=;
run;
data test2;
p=ranpoi(8,100);
u=ranuni(7);
put u= p=;
run;
data test3;
u=ranuni(8);
p=ranuni(7);
put u= p=;
run;
data test4;
u=ranuni(7);
p=ranuni(8);
put u= p=;
run;
data test5;
do _t = 1 to 5;
u=ranuni(8);
put u=;
end;
run;
Now, in test4, we see the first two ranuni's when starting with seed 7, and indeed the first one matches the first one from test. However, test3 has the first two starting with seed 8, and the second one does not match the one from test2! test5 shows that in fact the third matches, meaning ranpoi in test2 used up 2 numbers from the stream.
In any event, if you want to change the seed midstream, you have two options.
One is to use CALL RANPOI (and CALL RANUNI), which allow you to store the seed in a variable. Two is to use the RAND function, which works with CALL STREAMINIT to set seeds whenever you want to. The RAND function is considered 'better' than the more primitive RANPOI and such, since it uses a better PRNG algorithm.

Resources