Randomness of Shuffle in Python - random

I am referring to the shuffle function in Python's random module.
from random import shuffle
items = [1, 2, 3, 4]  # named `items` so the built-in `list` is not shadowed
shuffle(items)  # shuffles the list in place
I would guess that the above function uses a random seed. I know that in C, rand function iterates over a few random seed numbers in computer. Therefore, when the function is called in a loop, its output stops being random.
Does shuffle function work similar to rand function in C? If so, how can I add my own seed that is random? (I am thinking of using time in millisecond to come up with unique value).
I previously posted this as a comment on the accepted answer to another question (Shuffling a list of objects in python) but did not get a response.
EDIT:
I want to make sure that random shuffle does not repeat its shuffling method in a loop. For example,
I want to shuffle [1, 2, 3, 4, 5, 6]
I loop over 10000 times.
It produces results below:
[1, 3, 2, 4, 5, 6]
[2, 1, 4, 5, 3, 6]
... (large number of different combinations of shuffling)
[1, 3, 2, 4, 5, 6]
[2, 1, 4, 5, 3, 6]
... (repeats the pattern).
I want to avoid the above behaviour because I am looping a large number of times. Would this behaviour happen in the first place? If so, do I have to change the seed after a certain number of iterations?

Almost all module functions depend on the basic function random(), which
generates a random float uniformly in the semi-open range [0.0, 1.0).
Python uses the Mersenne Twister as the core generator. It produces
53-bit precision floats and has a period of 2**19937-1. The underlying
implementation in C is both fast and threadsafe. The Mersenne Twister
is one of the most extensively tested random number generators in
existence. However, being completely deterministic, it is not suitable
for all purposes, and is completely unsuitable for cryptographic
purposes.
The random module also provides the SystemRandom class which uses the
system function os.urandom() to generate random numbers from sources
provided by the operating system.
Warning: The pseudo-random generators of this module should not be used
for security purposes. Use os.urandom() or SystemRandom if you require
a cryptographically secure pseudo-random number generator.
https://docs.python.org/2/library/random.html
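To check the worry from the question directly, here is a minimal sketch. It assumes the default Mersenne Twister generator; the seed 12345 and the name shuffle_many are illustrative only, chosen so the run is repeatable. It shuffles a six-element list 10,000 times with a single generator and compares the first half of the results with the second half.

```python
import random

def shuffle_many(n_loops, seed=12345):
    """Shuffle a fresh copy of [1..6] n_loops times using ONE generator."""
    rng = random.Random(seed)  # seed is an arbitrary illustrative value
    results = []
    for _ in range(n_loops):
        items = [1, 2, 3, 4, 5, 6]
        rng.shuffle(items)
        results.append(tuple(items))
    return results

results = shuffle_many(10000)
first_half, second_half = results[:5000], results[5000:]

# Individual permutations recur (only 720 exist for six items), but the
# *sequence* of permutations does not start over.
print(len(set(results)))          # number of distinct permutations seen
print(first_half == second_half)  # whether the pattern repeated
```

Seeing the same permutation more than once is expected and is not a sign of the generator cycling; with a period of 2**19937-1, reseeding inside the loop should not be needed for this purpose.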

I know that in C, rand function iterates over a few random seed numbers in computer.
No it doesn't. It uses whatever seed you set with srand, or 1 if you didn't call srand.
Does shuffle function work similar to rand function in C? If so, how can I add my own seed that is random? (I am thinking of using time in millisecond to come up with unique value).
Python's random module uses either system time or OS-provided randomness sources to seed the RNG:
random.seed([x])
Initialize the basic random number generator. Optional argument x can
be any hashable object. If x is omitted or None, current system time
is used; current system time is also used to initialize the generator
when the module is first imported. If randomness sources are provided
by the operating system, they are used instead of the system time (see
the os.urandom() function for details on availability).
Changed in version 2.4: formerly, operating system resources were not
used.
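As a rough illustration of the seeding behaviour quoted above, the sketch below uses a separate random.Random instance per call. The helper name shuffled and the seed value 2024 are assumptions made for the example, not anything from the question.

```python
import random

def shuffled(seed=None):
    """Return a shuffled copy of [1, 2, 3, 4] from a generator seeded with `seed`."""
    # seed=None lets Python pick the seed itself (OS randomness if available,
    # otherwise the system time), as the documentation above describes.
    rng = random.Random(seed)
    items = [1, 2, 3, 4]
    rng.shuffle(items)
    return items

# The same explicit seed gives the same shuffle every time.
print(shuffled(2024) == shuffled(2024))  # True

# Without a seed, each generator is seeded independently.
print(shuffled(), shuffled())
```

An explicit seed is mainly useful when you want a run to be reproducible; for ordinary shuffling the default seeding is usually enough.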


Understanding seed of a ByteTensor in PyTorch

I understand that a seed is a number used to initialize a pseudo-random number generator. In PyTorch, the torch.get_rng_state documentation states: "Returns the random number generator state as a torch.ByteTensor." When I print it, I get a 1-D tensor of size 5048 whose values are shown below:
tensor([ 80, 78, 248, ..., 0, 0, 0], dtype=torch.uint8)
Why does a seed have 5048 values, and how is this different from the usual seed, which we can get using torch.initial_seed?
It sounds like you're thinking of the seed and the state as equivalent. For older pseudo-random number generators (PRNGs) that was true, but more modern PRNGs tend to work as described here. (The answer in the link was written with respect to Mersenne Twister, but the concepts apply equally to other generators.)
Why is it a good idea to not have a 32- or 64-bit state space and report the state as the generator's output? Because if you do that, as soon as you see any value repeat, the entire sequence will repeat. Such PRNGs were designed to be "full cycle," i.e., to iterate through the maximum number of values possible before repeating. This paper showed that the birthday problem could quickly (in O(sqrt(cycle-length)) samples) identify such PRNGs as non-random: a truly random source would have produced duplicates long before the cycle was exhausted. This meant, for instance, that with 32-bit integers you shouldn't use more than ~50000 values before a statistician could call you out with a better than 99% level of confidence. The solution, used by many modern PRNGs, is to have a larger state space and collapse it down to output a 32- or 64-bit result. Since multiple states can produce the same output, duplicates will occur in the output stream without the entire stream being replicated. It looks like that's what PyTorch is doing.
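The effect can be seen on a toy scale. The sketch below is only an illustration: it uses a hypothetical 16-bit linear congruential generator whose constants (25173, 13849) were picked because they satisfy the usual full-period conditions, not because any library uses them here. When the state is reported directly, no value repeats until the whole cycle is exhausted; when only part of the state is reported, duplicates show up early, as they would for a truly random source.

```python
def lcg16(seed):
    """Toy full-period 16-bit LCG: the 16-bit state IS the output."""
    x = seed & 0xFFFF
    while True:
        x = (25173 * x + 13849) & 0xFFFF
        yield x

def first_repeat(values):
    """Return how many draws were needed before a value was seen twice."""
    seen = set()
    for count, v in enumerate(values, start=1):
        if v in seen:
            return count
        seen.add(v)
    return None

# State reported directly: the first repeat comes only after all
# 2**16 values have been used once.
print(first_repeat(lcg16(1)))                   # 65537

# Same generator, but only the top 8 bits of the state are reported:
# duplicates appear early without the stream itself restarting.
print(first_repeat(x >> 8 for x in lcg16(1)))   # at most 257
```

A genuinely random 16-bit source would typically show its first duplicate after a few hundred draws, which is why the first result (no duplicate in 65,536 draws) is statistically suspicious.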
Given the larger state space, why allow seeding with a single integer? Convenience. For instance, Mersenne Twister has a 19,937 bit state space, but most people don't want to enter that much info to kick-start it. You can if you want to, but most people use the front-end which populates the full state space from a single integer input.
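The same seed-versus-state distinction is visible in Python's random module, which also uses Mersenne Twister. This is offered as an analogy only; it says nothing about how PyTorch lays out its 5048 bytes. The seed value 42 is arbitrary.

```python
import random

rng = random.Random(42)  # the seed: one small integer
version, internal, gauss_next = rng.getstate()

# The state is much larger than the seed: CPython stores 624 32-bit
# words plus a position index.
print(len(internal))  # 625

# Restoring a saved state replays the stream from that exact point.
saved = rng.getstate()
a = [rng.random() for _ in range(3)]
rng.setstate(saved)
b = [rng.random() for _ in range(3)]
print(a == b)  # True
```

So the seed is the convenient input used to fill in the state, while the state is everything the generator needs to continue from where it is now.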

Case study of streams of digits

I'm doing a case study of a random number portal. The portal displays a sequence of numbers (1 to 49) that changes every 4:25 (about 4 1/2 minutes) to a new sequence of numbers.
Examples:
previous stream:
36, 1, 37, 6, 17, 48
Current stream:
45, 4, 49, 30, 41, 16
What will the next stream be?
Can we reverse engineer the current output of streams of numbers to get the next stream?
No. First of all, you specified a random portal -- which, by definition of "random" cannot be predicted from any preceding sequence of output.
If you mean a pseudo-random sequence, then reverse-engineering is theoretically possible, but you must have enough knowledge of the RNG (random-number generator) to reduce the possible outputs to 1 from the 49^6 possible sequences. (You didn't specify that numbers are unique within the stream of 6; if that's another oversight, then it's 49!/(49-6)!, and if order is unimportant, divide again by 6!.)
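For scale, the three counts can be computed directly: six positions that can each hold any of 49 values, six distinct values in order, and six distinct values with order ignored. This is a small sketch using only the standard library (math.perm and math.comb need Python 3.8 or later).

```python
import math

ordered_with_repeats = 49 ** 6       # each of 6 positions is any of 49 values
ordered_unique = math.perm(49, 6)    # 49! / (49 - 6)!
unordered_unique = math.comb(49, 6)  # the above divided by 6!

print(ordered_with_repeats)  # 13841287201
print(ordered_unique)        # 10068347520
print(unordered_unique)      # 13983816
```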
Look at the value of information you've presented here: 12 numbers in a particular sequence. Divide the quantity of possible continuations by that value ... the result is far more than 1.
If you can provide the characteristics of the RNG, and those characteristics are sufficiently restrictive, then perhaps it's possible to determine the future sequence. Until then, the answer remains a resounding NO.
UPDATE per OP's comment
If the application is, indeed, a TRNG, then there's your answer: see my first paragraph.
If you're trying to implement a linear congruential RNG (e.g. the equation you posted), then simply check the myriad available search hits and pick one that looks good to you. Getting a set of six numbers is simply a matter of calling the generator six times.
Either way, there is still insufficient information to definitively obtain the parameters of even a generic linear congruential RNG. Do you have bounds on the values of a and c? Do you know the range of the X values and how they're converted to the range [1,49]?
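To make those unknowns concrete, here is a minimal sketch of a generic linear congruential generator. Everything specific in it is an assumption for illustration: the class and method names, the default constants (the widely published a = 1664525, c = 1013904223, m = 2**32) and the mapping into [1, 49]. The portal's real parameters and mapping are exactly what is not known.

```python
class LCG:
    """Generic linear congruential generator: X_{n+1} = (a * X_n + c) mod m."""

    def __init__(self, seed, a=1664525, c=1013904223, m=2**32):
        # Default constants are illustrative, not the portal's.
        self.a, self.c, self.m = a, c, m
        self.x = seed % m

    def next_raw(self):
        self.x = (self.a * self.x + self.c) % self.m
        return self.x

    def next_ball(self):
        # Assumed mapping to [1, 49]. The high bits are used because the
        # low bits of a power-of-two-modulus LCG have short periods.
        return 1 + (self.next_raw() >> 16) % 49

def stream(gen, n=6):
    """Six numbers are just six consecutive calls to the generator."""
    return [gen.next_ball() for _ in range(n)]

gen = LCG(seed=2024)
print(stream(gen))
print(stream(gen))
```

Each choice here (a, c, m, the seed, which bits are kept, how they are reduced to 1..49) changes the output, which is why twelve observed numbers alone do not pin the generator down.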

Rust GSL library always returns the same number for a random number generator

I am using the rgsl library in Rust that wraps functions from the C GSL math libraries. I was using a random number generator function, but I am always getting the same exact value whenever I generate a new random number. I imagine that the number should vary upon each run of the function. Is there something that I am missing? Do I need to set a new random seed each time or such?
extern crate rgsl;
use rgsl::Rng;

fn main() {
    rgsl::RngType::env_setup();
    let t = rgsl::rng::default();
    let r = Rng::new(&t).unwrap();
    let val = rgsl::randist::binomial::binomial(&r, 0.01f64, 1u32);
    print!("{}", val);
}
The value I keep getting is 1, which seems really high considering the probability of obtaining a 1 is 0.01.
The documentation for env_setup explains everything you need to know:
This function reads the environment variables GSL_RNG_TYPE and GSL_RNG_SEED and uses their values to set the corresponding library variables gsl_rng_default and gsl_rng_default_seed
If you don’t specify a generator for GSL_RNG_TYPE then gsl_rng_mt19937 is used as the default. The initial value of gsl_rng_default_seed is zero.
(Emphasis mine)
Like all software random number generators, this is really an algorithm that produces pseudo-random numbers. The algorithm and the initial seed uniquely identify a sequence of these numbers. Since the seed is always the same, the first (and second, third, ...) number in the sequence will always be the same.
So if I want to generate a new series of random numbers, then I need to change the seed each time. However, if I use the rng to generate a set of random seeds, then I will get the same seeds each time.
That's correct.
Other languages don't seem to have this constraint, meaning that the seed can be manually set if desired, but is otherwise random.
A classical way to do this is to seed your RNG with the current time. This produces an "acceptable" seed for many cases. You can also get access to true random data from the operating system and use that as a seed or mix it in to produce more random data.
Is there no way to do this in Rust?
This is a very different question. If you just want a random number generator in Rust, use the rand crate. This uses techniques like those I described above.
You could even do something crazy like using random values from the rand crate to seed your other random number generator. I just assumed that there is some important reason you are using that crate instead of rand.
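The seeding options mentioned above look roughly like this when sketched in Python rather than Rust (Python is used here only to keep the illustration short; the function names are made up for the example). The three variants are: seed from the clock, seed from OS-provided randomness, and seed one generator from another.

```python
import os
import random
import time

def time_seeded():
    """Classic approach: seed from the current time (nanoseconds here)."""
    return random.Random(time.time_ns())

def os_seeded():
    """Seed from randomness provided by the operating system."""
    seed = int.from_bytes(os.urandom(16), "big")
    return random.Random(seed)

def seeded_from(parent):
    """Seed a child generator with bits drawn from another generator."""
    return random.Random(parent.getrandbits(64))

print(time_seeded().random())
print(os_seeded().random())
print(seeded_from(os_seeded()).random())
```

Note that a child generator is only as unpredictable as its parent: seeding it from a generator that itself has a fixed seed reproduces the same child every run.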

Making a custom Distribution of Random Numbers

I am trying to make sense of the different distribution objects in C++11 and I am finding it overwhelming. I hope some of you can and will help.
This is why I am looking into all this:
I need a random number generator that I can adjust every time it is used so that it is more likely to produce the same number again. The second requirement I need to fill is that I need the random numbers generated to only be these numbers:
{1, 2, 4, 8, 16, ..., 128}
Third and last requirement is that on certain occasions I need to skip one or more numbers from the above set.
My problem is that I don't understand the descriptions of various distribution objects. I, thus, cannot determine what tools I need to use to meet my above needs.
Can somebody tell me what tools I need and how I need to use them? The more clear, concise and detailed the response the better.
Your range can be generated with a random number j in the range [0, 7], then you compute:
1 << j
to get your number. std::uniform_int_distribution<> would be handy for generating the value in [0, 7].
Additionally you could use a std::bernoulli_distribution (which returns a random bool) to decide if the next number is going to be the same as the last one, or if you should generate a new number. The std::bernoulli_distribution defaults to a 50/50 chance of true/false, but you can customize that distribution in the bernoulli_distribution constructor to anything you like (e.g. 80/20 or whatever).
If this isn't clear enough, just jump in with some code. Try coding it up, and if it isn't working, post what you have, and I'm sure somebody will help.
Oh, forgot about your 3rd requirement: For that just put your [0, 7] generation in a loop, and if you come up with a number you're supposed to skip, then iterate the loop, else break out of it.
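The three pieces described above (uniform index in [0, 7], a Bernoulli trial for "same as last time", and a retry loop for skipped values) fit together roughly as follows. This is a sketch in Python rather than C++, purely to show the logic; the class name StickyPicker and its parameters are invented for the example, and rng.random() < p stands in for std::bernoulli_distribution.

```python
import random

VALUES = [1 << j for j in range(8)]  # 1, 2, 4, ..., 128

class StickyPicker:
    """Picks powers of two, repeating the previous pick with some probability."""

    def __init__(self, repeat_probability=0.5, seed=None):
        self.rng = random.Random(seed)
        self.p = repeat_probability
        self.last = None

    def pick(self, skip=()):
        skip = set(skip)
        if skip.issuperset(VALUES):
            raise ValueError("every value is skipped")
        # Bernoulli trial: reuse the last value with probability p.
        if (self.last is not None and self.last not in skip
                and self.rng.random() < self.p):
            return self.last
        # Otherwise draw j uniformly from [0, 7] until 1 << j is allowed.
        while True:
            value = 1 << self.rng.randint(0, 7)
            if value not in skip:
                self.last = value
                return value

picker = StickyPicker(repeat_probability=0.8, seed=1)
print([picker.pick(skip={4, 16}) for _ in range(12)])
```

Raising repeat_probability makes runs of the same value longer; passing a different skip set on each call covers the "on certain occasions" requirement.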
For skipping numbers I completely agree with Howard that manual checking is probably the way to go, but there might be a better way of altering the probability of a given number being generated.
Another way to do this would be to use a discrete_distribution object, which allows you to specify the probability of generating any given value, so for your example it would be something like:
#include <array>
#include <random>

std::default_random_engine entropy;
std::array<double, 8> probs;  // one weight per value: 1 << 0 through 1 << 7
probs.fill(1.0);
std::discrete_distribution<int> choose(probs.begin(), probs.end());
Then, when you're in your loop, in addition to deciding whether or not to skip, you can increment one of those weights by some amount to increase the odds of it coming up again, making sure to reinitialize the discrete distribution, like this:
int x;
double myValue = 0.2;  // or whatever increment you want
for (something; something else; something else else)
{
    x = choose(entropy);  // index in [0, 7]; the generated value is 1 << x
    if (skip(1 << x))
        continue;  // alternatively, set probs.at(x) = 0,
                   // but only if you never want to generate it again
    probs.at(x) += myValue;
    // rebuild the distribution so the updated weights take effect
    choose = std::discrete_distribution<int>(probs.begin(), probs.end());
    output(1 << x);
}
where skip and output are your functions that decide whether a value should be skipped and do whatever you want with the generated value, respectively.
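For comparison, the same weighted idea can be sketched in Python, where random.choices plays the role of discrete_distribution. The function name weighted_draws and its parameters are made up for the example; the weights are indexed by position, and the value handed back is the corresponding power of two.

```python
import random

VALUES = [1 << j for j in range(8)]  # 1, 2, 4, ..., 128

def weighted_draws(n, bump=0.2, skip=(), seed=None):
    """Draw n values; each time a value comes up, its weight grows by `bump`."""
    skip = set(skip)
    if skip.issuperset(VALUES):
        raise ValueError("every value is skipped")
    rng = random.Random(seed)
    weights = [1.0] * len(VALUES)
    out = []
    while len(out) < n:
        # choices() reads the current weights on every call, so there is
        # nothing to rebuild after a weight changes.
        (i,) = rng.choices(range(len(VALUES)), weights=weights)
        if VALUES[i] in skip:
            continue
        weights[i] += bump
        out.append(VALUES[i])
    return out

print(weighted_draws(12, bump=0.5, skip={8}, seed=5))
```

Unlike the Bernoulli approach, this makes every previously drawn value more likely, not only the most recent one, so the two approaches suit slightly different readings of the requirement.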

How to correctly seed the random function in Scheme?

I was under the impression, upon starting up Scheme, the randomize procedure was called with the current time as its seed. However, if I have a Scheme script consisting solely of (print (random 10)), the only output I receive is 7; no other number. So, what am I doing wrong? For the record, I am using Chicken Scheme.
What random library are you using, exactly? According to the documentation, your assumption about random's seeding is correct:
(randomize [SEED]) : Set random-number seed. If SEED (an exact integer) is not supplied, the current time is used. On startup (when Unit extras is initialized), the random number generator is initialized with the current time.
(random N) : Returns a pseudo-random integer in [0, N-1]. N is an integer.
Also notice the warnings, in particular the second one that seems to explain the behaviour you're witnessing:
Warning: This procedure uses rand(3) internally and exhibits its deficiencies, including low quality pseudo-randomness:
On Windows and Solaris, only 32768 unique random values can be generated in the range [0, N-1]. If N >= 32768, there will be gaps in the result set.
On Mac OS X, Windows and some other platforms, little variance in output is seen with nearby seeds. Since the random generator is seeded with current-seconds at startup, new processes may see similar or identical random sequences for up to a minute.
On Linux, rand(3) is an alias to random(3), which provides output of reasonable quality.
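The first warning is simple arithmetic and can be illustrated without Scheme. The sketch below assumes a raw generator limited to 32768 distinct values (0 through 32767) and, as a further assumption, a plain modulo mapping into [0, N-1]; how Chicken actually maps the raw value is not stated in the quoted documentation.

```python
def reachable_values(n, rand_max=32767):
    """Distinct results when every possible raw value is mapped into [0, n-1]."""
    # Assumed mapping: raw % n. Any mapping of 32768 raw values can reach
    # at most 32768 distinct results.
    return {raw % n for raw in range(rand_max + 1)}

print(len(reachable_values(10)))      # 10: every value in [0, 9] is reachable
print(len(reachable_values(100000)))  # 32768: most of [0, 99999] is unreachable
```

The second warning (nearby seeds giving similar output) is the one that matches the always-7 symptom, and it depends on the platform's rand(3), so it is not reproduced here.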
