How can I acquire a certain "random range" in a higher frequency? - random

My question is basically, "how can I obtain certain random values within a specific range more than random values outside the range?"
Allow me to demonstrate what I mean:
If I were to, on a good amount of trials, start picking a variety of
random numbers from 1-10, I should be seeing more numbers in the 7-10
range than in the 1-6 range.
I tried a couple of ways, but I am not getting desirable results.
First Function:
function getAverage(i)
math.randomseed(os.time())
local sum = 0;
for j = 1,i do
sum = sum + (1-math.random()^3)*10
end
print(sum/i)
end
getAverage(500)
I was constantly getting numbers only around 7.5, such as 7.48, and 7.52. Although this does indeed get me a number within my range, I don't want such strict consistancy.
Second Function:
function getAverage(i)
math.randomseed(os.time())
local sum = 0;
for j = 1,i do
sum = sum + (math.random() > .3 and math.random(7,10) or math.random(1,6))
end
print(sum/i)
end
getAverage(500)
This function didn't work as I wanted it to either. I primarily getting numbers such as 6.8 and 7.2 but nothing even close to 8.
Third Function:
function getAverage(i)
math.randomseed(os.time())
local sum = 0;
for j = 1,i do
sum = sum + (((math.random(10) * 2)/1.2)^1.05) - math.random(1,3)
end
print(sum/i)
end
getAverage(500)
This function was giving me slightly more favorable results, with the function consistently returning 8, but that is the issue - consistency.
What type of paradigms or practical solutions can I use to generate more random numbers within a specific range over another range?
I have labeled this as Lua, but a solution in any language that is understandable is acceptable.

I don't want such strict consistancy.
What does that mean?
If you average a very large number of values in a given range from any RNG, you should expect that to produce the same number. That means each of the numbers in the range was equally likely to appear.
This function didn't work as I wanted it to either. I primarily getting numbers such as 6.8 and 7.2 but nothing even close to 8.
You have to clarify what "didn't work" means. Why would you expect it to give you 8? You can see it won't just by looking at the formula you used.
For instance, if you'd used math.random(1,10), assuming all numbers in the range have an equal chance of appearing, you should expect the average to be 5.5, dead in the middle of 1 and 10 (because (1+2+3+4+5+6+7+8+9+10)/10 = 5.5).
You used math.random() > .3 and math.random(7,10) or math.random(1,6) which is saying 70% of the time to give 7, 8, 9, or 10 (average = 8.5) and 30% of the time to give you 1, 2, 3, 4, 5, or 6 (average = 3.5). That should give you an overall average of 7 (because 3.5 * .3 + 8.5 * .7 = 7). If you bump up your sample size, that's exactly what you'll see. You're seeing values on either size because you sample size is so small (try bumping it up to 100000).

I've made skewed random values before by simply generating two random numbers in the range, and then picking the largest (or smallest). This skews the probability towards the high (or low) endpoint.
Picking the smallest of two gives you a linear probability distribution.
Picking the smallest of three gives you a parabolic distribution (more selectivity, less probability at "the other end"). For my needs, a linear distribution was fine.
Not exactly what you wanted, but maybe it's good enough.
Have fun!

Related

Randomly select N unique elements from a list, given a probability for each

I've run into a problem: I have a list or array (IList) of elements that have a field (float Fitness). I need to efficiently choose N random unique elements depending on this variable: the bigger - the more likely it is to be chosen.
I searched on the internet, but the algorithms I found were rather unreliable.
The answer stated here seems to have a bigger probability at the beginning which I need to make sure to avoid.
-Edit-
For example I need to choose from objects with the values [-5, -3, 0, 1, 2.5] (negative values included).
The basic algorithm is to sum the values, and then draw a point from 0-sum(values) and an order for the items, and see which one it "intersects".
For the values [0.1, 0.2, 0.3] the "windows" [0-0.1, 0.1-0.3, 0.3-0.6] will look like this:
1 23 456
|-|--|---|
|-*--*---|
And you draw a point [0-0.6] and see what window it hit on the axis.
Pseudo-python for this:
original_values = {val1, val2, ... valn}
# list is to order them, order doesn't matter outside this context.
values = list(original_values)
# limit
limit = sum(values)
draw = random() * limit
while true:
candidate = values.pop()
if candidate > draw:
return candidate
draw -= candidate
So what shall those numbers represent?
Does 2.5 mean, that the probability to be chosen is twice as high than 1.25? Well - the negative values don't fit into that scheme.
I guess fitness means something like -5: very ill, 2.5: very fit. We have a range of 7.5 and could randomly pick an element, if we know how many candidates there are and if we have access by index.
Then, take a random number between -5 and 2.5 and see, if our number is lower than or equal to the candidates fitness. If so, the candidate is picked, else we repeat with step 1. I would say, that we then generate a new threshold to survive, because if we got an 2.5, but no candidate with that fitness remains, we would search infinitely.
The range of fitnesses has to be known for this, too.
fitnesses [-5, -3, 0, 1, 2.5]
rand -5 x x x x x
-2.5 - - x x x
0 - - x x x
2.5 - - - - x
If every candidate shall be testet every round, and the -5 guy shall have a chance to survive, you have to stretch the interval of random numbers a bit, to give him a chance, for instance, from -6 to 3.

Random Numbers based on the ANU Quantum Random Numbers Server

I have been asked to use the ANU Quantum Random Numbers Service to create random numbers and use Random.rand only as a fallback.
module QRandom
def next
RestClient.get('http://qrng.anu.edu.au/API/jsonI.php?type=uint16&length=1'){ |response, request, result, &block|
case response.code
when 200
_json=JSON.parse(response)
if _json["success"]==true && _json["data"]
_json["data"].first || Random.rand(65535)
else
Random.rand(65535) #fallback
end
else
puts response #log problem
Random.rand(65535) #fallback
end
}
end
end
Their API service gives me a number between 0-65535. In order to create a random for a bigger set, like a random number between 0-99999, I have to do the following:
(QRandom.next.to_f*(99999.to_f/65535)).round
This strikes me as the wrong way of doing, since if I were to use a service (quantum or not) that creates numbers from 0-3 and transpose them into space of 0-9999 I have a choice of 4 numbers that I always get. How can I use the service that produces numbers between 0-65535 to create random numbers for a larger number set?
Since 65535 is 1111111111111111 in binary, you can just think of the random number server as a source of random bits. The fact that it gives the bits to you in chunks of 16 is not important, since you can make multiple requests and you can also ignore certain bits from the response.
So after performing that abstraction, what we have now is a service that gives you a random bit (0 or 1) whenever you want it.
Figure out how many bits of randomness you need. Since you want a number between 0 and 99999, you just need to find a binary number that is all ones and is greater than or equal to 99999. Decimal 99999 is equal to binary 11000011010011111, which is 17 bits long, so you will need 17 bits of randomness.
Now get 17 bits of randomness from the service and assemble them into a binary number. The number will be between 0 and 2**17-1 (131071), and it will be evenly distributed. If the random number happens to be greater than 99999, then throw away the bits you have and try again. (The probability of needing to retry should be less than 50%.)
Eventually you will get a number between 0 and 99999, and this algorithm should give you a totally uniform distribution.
How about asking for more numbers? Using the length parameter of that API you can just ask for extra numbers and sum them so you get bigger numbers like you want.
http://qrng.anu.edu.au/API/jsonI.php?type=uint16&length=2
You can use inject for the sum and the modulo operation to make sure the number is not bigger than you want.
json["data"].inject(:+) % MAX_NUMBER
I made some other changes to your code like using SecureRandom instead of the regular Random. You can find the code here:
https://gist.github.com/matugm/bee45bfe637f0abf8f29#file-qrandom-rb
Think of the individual numbers you are getting as 16 bits of randomness. To make larger random numbers, you just need more bits. The tricky bit is figuring out how many bits is enough. For example, if you wanted to generate numbers from an absolutely fair distribution from 0 to 65000, then it should be pretty obvious that 16 bits are not enough; even though you have the range covered, some numbers will have twice the probability of being selected than others.
There are a couple of ways around this problem. Using Ruby's Bignum (technically that happens behind the scenes, it works well in Ruby because you won't overflow your Integer type) it is possible to use a method that simply collects more bits until the result of a division could never be ambiguous - i.e. the difference when adding more significant bits to the division you are doing could never change the result.
This what it might look like, using your QRandom.next method to fetch bits in batches of 16:
def QRandom.rand max
max = max.to_i # This approach requires integers
power = 1
sum = 0
loop do
sum = 2**16 * sum + QRandom.next
power *= 2**16
lower_bound = sum * max / power
break lower_bound if lower_bound == ( (sum + 1) * max ) / power
end
end
Because it costs you quite a bit to fetch random bits from your chosen source, you may benefit from taking this to the most efficient form possible, which is similar in principle to Arithmetic Coding and squeezes out the maximum possible entropy from your source whilst generating unbiased numbers in 0...max. You would need to implement a method QRandom.next_bits( num ) that returned an integer constructed from a bitstream buffer originating with your 16-bit numbers:
def QRandom.rand max
max = max.to_i # This approach requires integers
# I prefer this: start_bits = Math.log2( max ).floor
# But this also works (and avoids suggestions the algo uses FP):
start_bits = max.to_s(2).length
sum = QRandom.next_bits( start_bits )
power = 2 ** start_bits
# No need for fractional bits if max is power of 2
return sum if power == max
# Draw 1 bit at a time to resolve fractional powers of 2
loop do
lower_bound = (sum * max) / power
break lower_bound if lower_bound == ((sum + 1) * max)/ power
sum = 2 * sum + QRandom.next_bits(1) # 0 or 1
power *= 2
end
end
This is the most efficient use of bits from your source possible. It is always as efficient or better than re-try schemes. The expected number of bits used per call to QRandom.rand( max ) is 1 + Math.log2( max ) - i.e. on average this allows you to draw just over the fractional number of bits needed to represent your range.

Get X random points in a fixed grid without repetition

I'm looking for a way of getting X points in a fixed sized grid of let's say M by N, where the points are not returned multiple times and all points have a similar chance of getting chosen and the amount of points returned is always X.
I had the idea of looping over all the grid points and giving each point a random chance of X/(N*M) yet I felt like that it would give more priority to the first points in the grid. Also this didn't meet the requirement of always returning X amount of points.
Also I could go with a way of using increments with a prime number to get kind of a shuffle without repeat functionality, but I'd rather have it behave more random than that.
Essentially, you need to keep track of the points you already chose, and make use of a random number generator to get a pseudo-uniformly distributed answer. Each "choice" should be independent of the previous one.
With your first idea, you're right, the first ones would have more chance of getting picked. Consider a one-dimensional array with two elements. With the strategy you mention, the chance of getting the first one is:
P[x=0] = 1/2 = 0.5
The chance of getting the second one is the chance of NOT getting the first one 0.5, times 1/2:
P[x=1] = 1/2 * 1/2 = 0.25
You don't mention which programming language you're using, so I'll assume you have at your disposal random number generator rand() which results in a random float in the range [0, 1), a Hashmap (or similar) data structure, and a Point data structure. I'll further assume that a point in the grid can be any floating point x,y, where 0 <= x < M and 0 <= y < N. (If this is a NxM array, then the same applies, but in integers, and up to (M-1,N-1)).
Hashmap points = new Hashmap();
Point p;
while (items.size() < X) {
p = new Point(rand()*M, rand()*N);
if (!points.containsKey(p)) {
items.add(p, 1);
}
}
Note: Two Point objects of equal x and y should be themselves considered equal and generate equal hash codes, etc.

Efficient (fastest) way to sum elements of matrix in matlab

Lets have matrix A say A = magic(100);. I have seen 2 ways of computing sum of all elements of matrix A.
sumOfA = sum(sum(A));
Or
sumOfA = sum(A(:));
Is one of them faster (or better practise) then other? If so which one is it? Or are they both equally fast?
It seems that you can't make up your mind about whether performance or floating point accuracy is more important.
If floating point accuracy were of paramount accuracy, then you would segregate the positive and negative elements, sorting each segment. Then sum in order of increasing absolute value. Yeah, I know, its more work than anyone would do, and it probably will be a waste of time.
Instead, use adequate precision such that any errors made will be irrelevant. Use good numerical practices about tests, etc, such that there are no problems generated.
As far as the time goes, for an NxM array,
sum(A(:)) will require N*M-1 additions.
sum(sum(A)) will require (N-1)*M + M-1 = N*M-1 additions.
Either method requires the same number of adds, so for a large array, even if the interpreter is not smart enough to recognize that they are both the same op, who cares?
It is simply not an issue. Don't make a mountain out of a mole hill to worry about this.
Edit: in response to Amro's comment about the errors for one method over the other, there is little you can control. The additions will be done in a different order, but there is no assurance about which sequence will be better.
A = randn(1000);
format long g
The two solutions are quite close. In fact, compared to eps, the difference is barely significant.
sum(A(:))
ans =
945.760668102446
sum(sum(A))
ans =
945.760668102449
sum(sum(A)) - sum(A(:))
ans =
2.72848410531878e-12
eps(sum(A(:)))
ans =
1.13686837721616e-13
Suppose you choose the segregate and sort trick I mentioned. See that the negative and positive parts will be large enough that there will be a loss of precision.
sum(sort(A(A<0),'descend'))
ans =
-398276.24754782
sum(sort(A(A<0),'descend')) + sum(sort(A(A>=0),'ascend'))
ans =
945.7606681037
So you really would need to accumulate the pieces in a higher precision array anyway. We might try this:
[~,tags] = sort(abs(A(:)));
sum(A(tags))
ans =
945.760668102446
An interesting problem arises even in these tests. Will there be an issue because the tests are done on a random (normal) array? Essentially, we can view sum(A(:)) as a random walk, a drunkard's walk. But consider sum(sum(A)). Each element of sum(A) (i.e., the internal sum) is itself a sum of 1000 normal deviates. Look at a few of them:
sum(A)
ans =
Columns 1 through 6
-32.6319600960983 36.8984589766173 38.2749084367497 27.3297721091922 30.5600109446534 -59.039228262402
Columns 7 through 12
3.82231962760523 4.11017616179294 -68.1497901792032 35.4196443983385 7.05786623564426 -27.1215387236418
Columns 13 through 18
When we add them up, there will be a loss of precision. So potentially, the operation as sum(A(:)) might be slightly more accurate. Is it so? What if we use a higher precision for the accumulation? So first, I'll form the sum down the columns using doubles, then convert to 25 digits of decimal precision, and sum the rows. (I've displayed only 20 digits here, leaving 5 digits hidden as guard digits.)
sum(hpf(sum(A)))
ans =
945.76066810244807408
Or, instead, convert immediately to 25 digits of precision, then summing the result.
sum(hpf(A(:))
945.76066810244749807
So both forms in double precision were equally wrong here, in opposite directions. In the end, this is all moot, since any of the alternatives I've shown are far more time consuming compared to the simple variations sum(A(:)) or sum(sum(A)). Just pick one of them and don't worry.
Performance-wise, I'd say both are very similar (assuming a recent MATLAB version). Here is quick test using the TIMEIT function:
function sumTest()
M = randn(5000);
timeit( #() func1(M) )
timeit( #() func2(M) )
end
function v = func1(A)
v = sum(A(:));
end
function v = func2(A)
v = sum(sum(A));
end
the results were:
>> sumTest
ans =
0.0020917
ans =
0.0017159
What I would worry about is floating-point issues. Example:
>> M = randn(1000);
>> abs( sum(M(:)) - sum(sum(M)) )
ans =
3.9108e-11
Error magnitude increases for larger matrices
i think a simple way to understand is apply " tic_ toc "function in first and last of your code.
tic
A = randn(5000);
format long g
sum(A(:));
toc
but when you used randn function ,elements of it are random and time of calculation can
different in each cycle CPU calculation .
This better you used a unique matrix whit so large elements to compare time of calculation.

Random function returning number from interval

How would you implement a function that is returning a random number from interval 1..1000
in the case there is a number N determining the chance of reaching higher numbers or lower numbers?
It should behave as follows:
e.g.
if N = 0 and we will generate many times the random number we will get a certain equilibrium (every number from interval 1..1000 has equal chance).
if N = 2321 (I call it positive factor) it will be very hard to achieve small number (often will be generated numbers > 900, sometimes numbers near 500 and rarely numbers < 100). The highest the positive factor the highest probability for high numbers
if N = -2321 (negative factor) this will be the opposite of positive factor
It's clear that the generated numbers will create for given N certain characteristic curve. Could you advise me how to achieve this goal and what curves can I create? What possibilities do I have here? How would you limit positive and negative factors etc.
thank you for help
If you generate a uniform random number, and then raise it to a power > 1, it will get smaller, but stay in the range [0, 1]. If you raise it to a power greater than 0 but less than 1, it will get larger, but stay in the range [0, 1].
So you can use the exponent to pick a power when generating your random numbers.
def biased_random(scale, bias):
return random.random() ** bias * scale
sum(biased_random(1000, 2.5) for x in range(100)) / 100
291.59652962214676 # average less than 500
max(biased_random(1000, 2.5) for x in range(100))
963.81166161355998 # but still occasionally generates large numbers
sum(biased_random(1000, .3) for x in range(100)) / 100
813.90199860117821 # average > 500
min(biased_random(1000, .3) for x in range(100))
265.25040459294883 # but still occasionally generates small numbers
This problem is severely underspecified. There are a million ways to solve it as it is mentioned.
Instead of arbitrary positive and negative values, try to think what is the meaning behind them. IMHO, beta distribution is the one you should consider. By selecting the parameters \alpha and \beta you should be appropriately modulate the behavior of your distribution.
See what shapes you can get with certain \alpha and \beta http://en.wikipedia.org/wiki/Beta_distribution#Shapes
http://en.wikipedia.org/wiki/File:Beta_distribution_pdf.svg
Lets for beginning decide that we will pick numbers from [0,1] because it makes stuff simpler.
n is number that represents distribution (0,2321 or -2321) as in example
We need solution only for n > 0, because if n < 0. You can take positive version of n and subtract from 1.
One simple idea for PDF in interval [0,1] is x^n. (or at least this kind of shape)
Calculating CDF is then integrating x^n and is x^(n+1)/(n+1)
Because CDF must be 1 at the end (in our case at 1) our final CDF is than x^(n+1) and is properly weighted
In order to generate this kind distribution from this, we must calaculate quantile function
Quantile function is just inverse of CDF and is in our case. x^(1/(n+1))
And that is it. Your QF is x^(1/(n+1))
To generate numbers from [0,1] you have to pick uniformly distributetd random from [0,1] (most common random function in programming languages)
and than power this whit (1/(n+1))
Only problem I see is that it can be problem to calculate 1-x^(1/(-n+1)) correctly, where n < 0 but i think that you can use log1p,
so it becomes exp(log1p(-x^(1/(-n+1))) if n<0
conclusion whit normalizations
if n>=0: (x^(1/(n/1000+1)))*1000
if n<0: exp(log1p(-(x^(1/(-(n/1000)+1)))))*1000
where x is uniformly distributed random value in interval [0,1]

Resources