Random number that prefers a range - random

I'm trying to write a program that requires me to generate a random number.
I also need to make it so there's a variable chance to pick a set range.
In this case, I would be generating between 1-10 and the range with the percent chance is 7-10.
How would I do this? Could I be supplied with a formula or something like that?

So if I'm understanding your question, you want two number ranges, and a variable-defined probability that the one range will be selected. This can be described mathematically as a probability density function (PDF), which in this case would also take your "chance" variable as an argument. If, for example, your 7-10 range is more likely than the rest of the 1-10 range, your PDF would be a step function that sits higher over 7-10 than over 1-7.
One PDF such as a flat distribution can be transformed into another via a transformation function, which would allow you to generate a uniformly random number and transform it to your own density function. See here for the rigorous mathematics:
http://www.stat.cmu.edu/~shyun/probclass16/transformations.pdf
But since you're writing a program and not a mathematics thesis, I suggest that the easiest way is to just generate two random numbers. The first decides which range to use, and the second generates the number within your chosen range. Keep in mind that if your ranges overlap (1-10 and 7-10 obviously do) then the overlapping region will be even more likely, so you will probably want to make your ranges exclusive so you can more easily control the probabilities. You haven't said what language you're using, but here's a simple example in Python:
import random

range_chance = 30  # 30 percent chance of the 7-10 range
if random.uniform(0, 100) < range_chance:
    print(random.uniform(7, 10))
else:
    print(random.uniform(1, 7))  # 1-7 so that there is no overlapping region
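The transformation approach mentioned above can also be sketched with a single uniform draw. This is only an illustration, assuming the value is uniform within each range; the function name `sample_preferring_range` and its parameters are made up for this example.

```python
import random

def sample_preferring_range(chance, u=None):
    """Map one uniform draw u in [0, 1) onto 1-10 via the inverse CDF.

    chance is the probability (0..1) of landing in 7-10; the remaining
    probability is spread evenly over 1-7. (Hypothetical helper.)
    """
    if u is None:
        u = random.random()
    if u < 1 - chance:
        # First (1 - chance) of the probability mass covers 1-7.
        return 1 + 6 * u / (1 - chance)
    # Remaining mass covers 7-10.
    return 7 + 3 * (u - (1 - chance)) / chance

print(sample_preferring_range(0.3))
```

Passing `u` explicitly makes the mapping easy to check by hand; for integers, the result could be rounded or the ranges replaced by discrete choices.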


Storing a probability distribution without saving single values

I am calculating many (~ 100 million) floating point values during an operation. I do not want to store them all in memory, but I want to save a rough distribution of the collection.
My idea was to determine the exponents of all values and count them in a histogram. But this, of course, works only if the values have different exponents.
Does anybody have an idea how I can do this without knowing what the distribution looks like?
I would suggest randomly saving some, then making a histogram after the fact from that. For example if you randomly save 0.1% of the numbers then you'd only need to save 100,000, from which you can calculate a highly accurate distribution.
You can reduce the number of calls to rand() by calling it every time you save a number to find a random number in the range 1..2000, then wait that many numbers before saving the next.
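A rough sketch of that skip-ahead idea follows. The name `sample_with_random_gaps` is hypothetical; a gap drawn uniformly from 1..2000 averages about 1000, so roughly 0.1% of the values are kept.

```python
import random

def sample_with_random_gaps(values, max_gap=2000, rng=random):
    """Keep a sparse sample of values, drawing one random number per kept value."""
    kept = []
    wait = rng.randint(1, max_gap)  # how many values to let pass before saving one
    for v in values:
        wait -= 1
        if wait == 0:
            kept.append(v)
            wait = rng.randint(1, max_gap)  # draw the next gap only now
    return kept

# Works on any iterable, so the full data set never has to sit in memory.
sample = sample_with_random_gaps(x * 0.5 for x in range(1_000_000))
print(len(sample))
```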
If you approximately know the min and max values, I'd think a binning strategy would be a good choice. Here is an outline for what I mean:
Figure out how many bins you need
For each number:
    Find the bin that this number goes in
    Increment that bin
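The outline above might look like this in Python. It is a minimal sketch that assumes equal-width bins and clamps out-of-range values into the first or last bin; `make_histogram` and its parameters are illustrative names.

```python
def make_histogram(values, lo, hi, num_bins):
    """Count values into num_bins equal-width bins spanning [lo, hi]."""
    counts = [0] * num_bins
    width = (hi - lo) / num_bins
    for v in values:
        idx = int((v - lo) / width)           # find the bin this number goes in
        idx = min(max(idx, 0), num_bins - 1)  # clamp values outside [lo, hi)
        counts[idx] += 1                      # increment that bin
    return counts

print(make_histogram([0.0, 0.5, 1.0, 2.5, 9.99], 0, 10, 10))
```

Only the `num_bins` counters are stored, no matter how many values are processed.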
Another useful alternative would be to compute the moments of the distribution on the fly, and then reconstruct the PDF from the moments:
https://en.wikipedia.org/wiki/Method_of_moments_(statistics)
https://www.wias-berlin.de/people/john/ELECTRONIC_PAPERS/JAOT07.CES.pdf
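As a sketch of the on-the-fly part only, the first two moments can be accumulated in constant memory with Welford's update. The class name `RunningMoments` is made up here, and reconstructing a PDF from the moments is not shown.

```python
class RunningMoments:
    """Accumulate count, mean and variance one value at a time (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        # Population variance; returns 0.0 before any value has been added.
        return self.m2 / self.n if self.n else 0.0

moments = RunningMoments()
for value in (1.0, 2.0, 3.0, 4.0):
    moments.add(value)
print(moments.mean, moments.variance())
```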

Need an Algorithm to generate Serialnumber

I want to generate 16-digit hexadecimal serial numbers like: F204-8BE2-17A2-CFF3.
(This pattern gives me 16^16 distinct serial numbers, but I don't need all of them.)
I need an algorithm that generates these serial numbers randomly with one special property:
any two serial numbers differ in at least 6 digits
(That is, even the two most similar serial numbers should still differ in 6 positions.)
I know that a good algorithm with this property needs to remember previously generated serial numbers, and I don't want to store that much.
In fact, I need an algorithm that does this with a low probability that a chosen pair collides (less than 0.001 seems sufficient).
PS:
I've just tried creating 10K strings randomly using MD5 hashes, and it gave similar strings (similar = more than 3 identical digits) with probability 0.00018.
It is possible to construct a correct generator without having to remember all previously generated codes. You can generate serial numbers that differ in at least 6 positions by using an error-correcting code in the spirit of a Hamming code (the classic Hamming codes only guarantee a distance of 3, so a code with a larger minimum distance is needed). Such a code can be designed to space out any two distinct generated values by a chosen distance. Obviously, the greater the distance, the more redundancy you will have to use, resulting in a more complex code and longer numbers.
First you design a code to your liking that encodes a number into a sequence of hexadecimal digits; then you can take any sequence of numbers and use it as a seed, such as prime numbers. You just need to remember which number was used last and use the next one.
That being said, if you don't need to strictly guarantee the minimal distance between two serials, and would settle for a small error, I would suggest that any half-decent hash function or cipher should produce decently spaced-out outputs. Therefore the first thing I would try is to take MD5 or SHA hashes and test-drive them on the numbers 1 - 1000. My hope is that the results will be quite satisfactory.
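Such a test drive could be sketched as follows. The choice of SHA-256, the truncation to the first 16 hex digits, and all function names are assumptions made for illustration, not part of the original suggestion.

```python
import hashlib
from itertools import combinations

def serial_from_counter(n):
    """Hash a counter and format the first 16 hex digits as XXXX-XXXX-XXXX-XXXX."""
    digest = hashlib.sha256(str(n).encode()).hexdigest().upper()[:16]
    return "-".join(digest[i:i + 4] for i in range(0, 16, 4))

def differing_digits(a, b):
    """Number of positions at which two equally formatted serials differ."""
    return sum(x != y for x, y in zip(a, b))

def fraction_too_similar(count=1000, min_distance=6):
    """Fraction of serial pairs that differ in fewer than min_distance digits."""
    serials = [serial_from_counter(n) for n in range(1, count + 1)]
    total = close = 0
    for a, b in combinations(serials, 2):
        total += 1
        close += differing_digits(a, b) < min_distance
    return close / total

print(serial_from_counter(1))
print(fraction_too_similar())
```

Only the last counter value has to be remembered between runs; the measured fraction gives an empirical estimate, not a guarantee.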
I suggest you look into the ANSI X9.17 pseudorandom bit generator. An algorithmic sketch is given in these slides. ANSI X9.17 generates 64-bit pseudorandom strings which is what you want.
A revised and enhanced version of this generator was approved by NIST. Please have a look at this page.
Now whether you use ANSI X9.17 generator, another generator, or develop your own, it's a good idea to have the generator pass some statistical tests in order to ensure the quality of its pseudorandom bits.
Example tests include the ENT battery, the DIEHARD battery, and the NIST battery.

Make CURAND generate positive different random numbers less than a specific number

I am trying to use the CURAND library to generate random numbers which are completely independent of each other. Hence I want to give different seeds to each thread.
So, question 1: how do I give different seeds to each thread? (Is there some time function in CUDA which I can use?)
I also want to generate these random numbers within a range, i.e. 0 to 10000. How do I accomplish that?
Currently I am using curand_normal (as I want to have numbers from a normal distribution), but it's giving me negative and repeated numbers, which I do not want.
Setting different seeds is not a statistically sound way to get independent (non-correlated) random numbers (with any single random number generator). You would be better off selecting different sub-sequences of a single sequence, and most random number libraries will allow you to do that, including cuRAND.
Check out the examples in the CUDA SDK, for example the EstimatePiP or EstimatePiInlineP examples use cuRAND to generate pseudo-random numbers.
For the second part of your question, as mentioned in the cuRAND manual, the curand_normal() routines return Normally distributed numbers with mean 0.0 and standard deviation 1.0 (i.e. the Standard Normal Distribution). Clearly that means that you will have ~50% negative numbers.
It doesn't make sense to specify a fixed range along with the Normal distribution. You either want some other distribution (e.g. Uniform) with the fixed range or else you want the Normal distribution with a specific mean and standard deviation. To get from the Standard Normal to your target mean/std.dev. you simply multiply the random draw by the target standard deviation and add the target mean.
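The scaling step is independent of cuRAND, so it is sketched here on the host in Python rather than in CUDA. The function name `normal_with` and the example mean and standard deviation are illustrative assumptions.

```python
import random

def normal_with(mean, std_dev, rng=random):
    """Turn a standard normal draw into one with the given mean and std. dev."""
    z = rng.gauss(0.0, 1.0)    # standard normal: mean 0, standard deviation 1
    return mean + std_dev * z  # multiply by the target std. dev., add the mean

print(normal_with(5000.0, 1000.0))
# If a hard 0..10000 range is what is needed, a uniform draw fits better:
print(random.randint(0, 10000))
```

The same arithmetic applies to a value returned by curand_normal() inside a kernel.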

how to generate longer random number from a short random number?

I have a short random number input, let's say int 0-999.
I don't know the distribution of the input. Now I want to generate a random number in range 0-99999 based on the input without changing the distribution shape.
I know there is a way to map the input to [0,1] by dividing it by 999 and then multiplying by 99999 to get the result. However, this method doesn't cover all the possible values: many outputs in 0-99999 can never be hit.
Assuming your input is some kind of source of randomness...
You can take two consecutive inputs and combine them:
input() + 1000*(input()%100)
Be careful though. This relies on the source having plenty of entropy, so that a given input number isn't always followed by the same subsequent input number. If your source is a PRNG designed to cycle between the numbers 0–999 in some fashion, this technique won't work.
With most production entropy sources (e.g., /dev/urandom), this should work fine. OTOH, with a production entropy source, you could fetch a random number between 0–99999 fairly directly.
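The combining expression can be wrapped as a small function to make the range explicit. `widen` is a hypothetical name, and the two arguments stand for two consecutive readings of the 0-999 source.

```python
def widen(first, second):
    """Combine two draws from 0..999 into one value in 0..99999."""
    # first supplies the low three digits, second % 100 the high two digits.
    return first + 1000 * (second % 100)

print(widen(123, 456))  # 123 + 1000 * 56
```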
You can try something like the following:
(input * 100) + random
where random is a random number between 0 and 99.
The problem is that input only specifies which block of 100 to use. For instance, 50 just says you will have a number between 5000 and 5099 (to keep a similarly shaped distribution). Which number between 5000 and 5099 to pick is up to you.
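A minimal sketch of this scaling approach, assuming the extra digits come from Python's `random` module; `spread` is a made-up name.

```python
import random

def spread(value, rng=random):
    """Scale a 0..999 input to 0..99999, filling the low two digits randomly."""
    return value * 100 + rng.randint(0, 99)  # randint includes both endpoints

print(spread(50))  # some number in the block that starts at 5000
```

Because every block of 100 gets the same uniform fill, the coarse shape of the input distribution is preserved.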

Generate Array of Numbers that fit to a Probability Distribution in Ruby?

Say I have 100 records, and I want to mock out the created_at date so it fits on some curve. Is there a library to do that, or what formula could I use? I think this is along the same track:
Generate Random Numbers with Probabilistic Distribution
I don't know much about how they are classified in mathematics, but I'm looking at things like:
bell curve
logarithmic (typical biology/evolution) curve?
...
Just looking for some formulas in code so I can say this:
Given 100 records, a timespan of 1.week, and an interval of 12.hours
set created_at for each record such that it fits, roughly, to curve
Thanks so much!
Update
I found this forum post about ruby algorithms, which led me to rsruby, an R/Ruby bridge, but that seems like too much.
Update 2
I wrote this little snippet trying out the gsl library, getting there...
Generate test data in Rails where created_at falls along a Statistical Distribution
I recently came across croupier, a ruby gem that aims to generate numbers according to a variety of statistical distributions.
I have yet to try it but it sounds quite promising.
You can generate UNIX timestamps which are really just integers. First figure out when you want to start, for example now:
start = DateTime::now().to_time.to_i
Find out when the end of your interval should be (say 1 week later):
finish = (DateTime::now()+1.week).to_time.to_i
Ruby uses this algorithm to generate random numbers. It is almost uniform. Then generate random numbers between the two:
r = Random.new.rand(start..finish)
Then convert that back to a date:
d = Time.at(r)
This looks promising as well:
http://rb-gsl.rubyforge.org/files/rdoc/randist_rdoc.html
And this too:
http://rb-gsl.rubyforge.org/files/rdoc/rng_rdoc.html
From wiki:
There are a couple of methods to generate a random number based on a probability density function. These methods involve transforming a uniform random number in some way. Because of this, these methods work equally well in generating both pseudo-random and true random numbers.
One method, called the inversion method, involves integrating up to an area greater than or equal to the random number (which should be generated between 0 and 1 for proper distributions).
A second method, called the acceptance-rejection method, involves choosing an x and y value and testing whether the function of x is greater than the y value. If it is, the x value is accepted. Otherwise, the x value is rejected and the algorithm tries again.
The first method is the one used in the accepted answer in your SO linked question: Generate Random Numbers with Probabilistic Distribution
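Both methods from the quote can be sketched briefly. The example is in Python rather than Ruby, and the chosen distributions (an exponential for inversion, the density 2x on [0, 1] for rejection) as well as the function names are assumptions picked for illustration.

```python
import math
import random

def sample_by_inversion(rate, rng=random):
    """Inversion: push a uniform draw through the inverse CDF of an exponential."""
    u = rng.random()                    # u is in [0, 1), so 1 - u is never 0
    return -math.log(1.0 - u) / rate

def sample_by_rejection(pdf, lo, hi, pdf_max, rng=random):
    """Acceptance-rejection: accept x only if a uniform y falls under pdf(x)."""
    while True:
        x = rng.uniform(lo, hi)
        y = rng.uniform(0.0, pdf_max)   # pdf_max must be >= pdf(x) on [lo, hi]
        if y < pdf(x):
            return x

print(sample_by_inversion(2.0))
print(sample_by_rejection(lambda x: 2.0 * x, 0.0, 1.0, 2.0))
```

Inversion needs a CDF that can be inverted in closed form or numerically; rejection only needs the density and an upper bound on it, at the cost of some discarded draws.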
Another option is the Distribution gem under SciRuby. You can generate normal numbers by:
require 'distribution'
rng = Distribution::Normal.rng
random_numbers = Array.new(100).map { rng.call }
There are RNGs for various other distributions as well.
