Checking probabilities and random numbers (NetLogo)

I'm writing to ask for an explanation of how to check probabilities in a model built in NetLogo.
I have an event that can happen with some probability, for example 60%...
So I generate a number with
let trial random 100
It's not clear to me whether I have to verify that trial is greater than 60, or less than or equal to 60, for the probability to be satisfied.
Which is the correct way?
Thank you

You need to check if the result is less than the probability (not less than or equal). For example:
if random 100 < 60 [
  do-something
]
Using random 100 will give you a number between 0 and 99 (inclusive). In the example, numbers 0 to 59 will meet the condition, i.e., 60 numbers out of the 100 possible numbers: a 60% probability.
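If you want to convince yourself of the counting, here is a minimal Python sketch (not NetLogo; the trial count of 100,000 is arbitrary) that simulates the same check and measures the success rate:
import random

# Simulate "random 100 < 60" many times and measure how often it succeeds.
trials = 100_000
hits = sum(1 for _ in range(trials) if random.randrange(100) < 60)
print(hits / trials)  # prints something close to 0.60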

Related

Implement a function that generates a random number in a range, given a biased random function

Given:
I was given a function that randomly generates 0 or 1. It generates 0 with probability p and 1 with probability 1-p.
Requirement:
I need to create a function that generates a number between 0 and 6 with uniform probability by utilizing the given function.
Note: can't use built-in random functions.
Can someone help me with this?
Thanks in advance
You can turn a biased random function into an unbiased one by generating bits in pairs and keeping only the sequences 01 and 10, ignoring all other results. This gives you a fair coin, because the two accepted sequences are equally likely ((1-p)*p == p*(1-p)).
With this fair coin you can then roll 3 bits and output the resulting number; if you roll a 7 (binary 111), just repeat the process.
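A minimal Python sketch of that scheme (the von Neumann trick plus rejection of 7); biased_bit is just a stand-in for the given generator, with an arbitrary bias p:
import random

def biased_bit(p=0.7):
    # Stand-in for the given function: 0 with probability p, 1 with probability 1-p.
    return 0 if random.random() < p else 1

def fair_bit():
    # Von Neumann extractor: emit 0 for the pair 01, 1 for 10; discard 00 and 11.
    while True:
        a, b = biased_bit(), biased_bit()
        if a != b:
            return a  # P(01) == P(10) == p*(1-p), so this bit is fair

def uniform_0_to_6():
    # Roll 3 fair bits to get 0..7; reject 7 (binary 111) and retry.
    while True:
        n = (fair_bit() << 2) | (fair_bit() << 1) | fair_bit()
        if n < 7:
            return n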

Specific knapsack-like

I am looking for ideas on how to deal with this specific knapsack problem (I believe it is a knapsack-like problem, although I might be mistaken).
As input we get a set of numbers, and each of them can end up positive or negative - we don't know that in advance.
We have to find the minimum possible absolute value of a sum of some of these numbers.
We don't have to use all the numbers. We have to do the additions (or subtractions) in the same order in which the numbers are given, and we have to start with the first number (and add or subtract the following ones).
An example would be:
4 11 5 5 5 => 0
because 4+11-5-5-5 = 0
10 3 9 4 100 => 2
because 10-3-9 = -2
In the second example we skipped the last two numbers, because including them wouldn't give us a smaller absolute value.
There can be up to 5,000 numbers, their sum won't exceed 10,000, and they are integers.
If you were to explore all combinations of addition and subtraction of 5000 numbers, you would have to go through 2^5000 − 1 ≈ 1.4·10^1505 alternatives. That's obviously not reasonable. However, since the sum of the numbers is at most 10000, we know that all partial sums (including subtraction) must lie between -10000 and 10000, so there can be at most about 20000 different sums. If you only consider distinct sums as you work through the 5000 positions, you have less than 100 million sums to consider, which is not that much work for a computer.
Example: suppose the first three numbers are 5,1,1. The possible sums that include exactly three numbers are
5+1+1=7
5+1-1=5
5-1+1=5
5-1-1=3
Before adding the fourth number it is important to recognize that you have only three unique results from the four computations.
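A minimal Python sketch of that idea (the function name is mine): keep only the set of distinct partial sums at each position, and remember the best prefix result seen so far:
def min_abs_sum(numbers):
    # DP over distinct partial sums: O(n * range) instead of O(2^n).
    best = abs(numbers[0])  # we must start with the first number
    sums = {numbers[0]}
    for x in numbers[1:]:
        # Each reachable sum branches into +x and -x; duplicates collapse.
        sums = {s + x for s in sums} | {s - x for s in sums}
        best = min(best, min(abs(s) for s in sums))  # we may also stop here
    return best

print(min_abs_sum([4, 11, 5, 5, 5]))  # 0
Since all partial sums stay within -10,000..10,000, the set never holds more than about 20,000 entries.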

Increase random number set size fairly?

Math/programming question that has arisen while I'm trying to deal with using a set of random data as an entropy source, in a situation where I'm using something like Random.org's pregenerated random files. Raw data like this is random zeroes and ones, and can be read off as random bytes (0-255) or as larger power-of-two ranges. I'm trying to be as efficient as possible in using this random source, since it's finite in length, so I don't want to consume more of it than I need.
Taking random bytes is fair if you want a number from a range whose size divides 256 evenly (e.g. 100 to 355, 0 to 15, etc.). However, what if I want a number from 1 to 100? That doesn't fit nicely into 256. I could assign 0-199 to the 1-100 range twice over, leaving 200-255 as extras that would have to be discarded if drawn; otherwise 56 numbers in the range would be unfairly weighted to come up more often.
Is throwing out the out-of-range numbers the only fair option? Or is there a mathematical way to fairly "blur" those 56 extras over the 1-100 range?
The only other option I've come up with, so that I know I can always use the number and never throw out results, is to absorb a larger number of bytes, so that the degree of bias is smaller (with one byte, 0-255, some numbers in 1-100 get two "draws" and some get three: 3:2 odds, i.e. 50% more likely. With a 0-2,550 range the odds are 26:25, i.e. 4% more likely. Etc.) That uses up more data, but is more predictable.
Is there a term for what I'm trying to do (I can't Google what I can't name)? Is it possible, or do I have to concede that I'll have to throw out data that doesn't fairly match the range I want?
If you use 7 bits per number, you get 0-127. Whenever you get a number of 100 or more, you have to discard it. You lose the use of those data points, but the rest is still random. You lose 28 of every 128, or about 22% of the random information.
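A sketch of that rejection step in Python, where read_bits is a hypothetical helper that returns the next k bits of the entropy file as an integer:
def random_1_to_100(read_bits):
    # Draw 7 bits (0-127); reject 100-127 and retry.
    while True:
        n = read_bits(7)
        if n < 100:
            return n + 1  # shift 0-99 into 1-100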
If you use 20 bits at a whack, you get a number between 0 and 1,048,575. This can be broken into 3 random values between 0 and 99 (or 1-100 if you add 1 to it). You have to use integer arithmetic or throw away any fractional part when dividing.
if (number >= 1000000) discard it; // keep only 0 - 999,999
a = number % 100;
b = (number / 100) % 100;
c = (number / 10000) % 100;
You only waste 48,576 values out of 1,048,576, or about 5% of the random information.
You can think of this process this way: take the number you get by converting 20 bits to a decimal integer. Break out the 10's and 1's digits, the 1000's and 100's digits, and the 100,000's and 10,000's digits, and use those as three random numbers. They are truly random, since those digits could be any value at all in the original number; further, we discarded any values that would bias particular values of the three.
So there's a way to make more efficient use of the random bits. But you have to do some computing.
Note: The next interesting bit combination is 27 bits, which wastes about 25%. 14 bits would waste about 39%.
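And a sketch of the 20-bit scheme in Python, using the same hypothetical read_bits helper as above:
def three_values_1_to_100(read_bits):
    # Spend 20 bits to produce three independent values in 1..100.
    while True:
        n = read_bits(20)       # 0 .. 1,048,575
        if n < 1000000:         # reject the ~5% of values that would bias the digits
            break
    a = n % 100                 # the 10's and 1's digits
    b = (n // 100) % 100        # the 1000's and 100's digits
    c = (n // 10000) % 100      # the 100,000's and 10,000's digits
    return a + 1, b + 1, c + 1  # shift each from 0-99 into 1-100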

random number generator test

How would you test whether a random number generator is generating actually random numbers?
My approach: first build a hash table of size M, where M is a prime number. Then take each number
produced by the random number generator and take it mod M,
and see whether the table fills up evenly or only in some parts.
That's my approach. Can we prove it with visualization?
Since I have very little knowledge about testing, can you suggest a thorough approach to this question? Thanks in advance
You should be aware that you cannot guarantee the random number generator is working properly. Note that even with a perfect uniform distribution on [1,10], there is a 10^-10 chance of getting the number 10 ten times in a row in a random sample of 10 numbers.
Is it likely? Of course not.
So - what can we do?
We can statistically argue that the combination (10,10,...,10) is unlikely if the random number generator is indeed uniformly distributed. This concept is called hypothesis testing. With this approach we can say "with certainty level of x% - we can reject the hypothesis that the data is taken from a uniform distribution".
A common way to do it is Pearson's chi-squared test. The idea is similar to yours: you fill in a table, check the observed (generated) number of values in each cell, and compare it with the expected number of values in each cell under the null hypothesis (in your case, the expected count is k/M, where M is the range's size and k is the total number of values drawn).
You then do some manipulation on the data (see the wikipedia article for more info on what this manipulation is exactly) and get a number (the test statistic). You then check if this statistic is likely to come from a chi-square distribution. If it is, you cannot reject the null hypothesis; if it is not, you can say with x% certainty that the data is not taken from a uniform random generator.
EDIT: example:
You have a die, and you want to check whether it is "fair" (uniformly distributed on [1,6]). Throw it 200 times (for example) and create the following table:
number:                 1     2     3     4     5     6
empirical occurrences:  37    41    30    27    32    33
expected occurrences:   33.3  33.3  33.3  33.3  33.3  33.3
Now, according to Pearson's test, the statistic is:
X = ((37-33.3)^2)/33.3 + ((41-33.3)^2)/33.3 + ... + ((33-33.3)^2)/33.3
X = (13.69 + 59.29 + 10.89 + 39.69 + 1.69 + 0.09) / 33.3
X ≈ 3.76
For a random C~ChiSquare(5), the probability of being higher than 3.76 is ~0.58 (which is not improbable)(1).
So we cannot reject the null hypothesis, and we can conclude that the data is probably uniformly distributed on [1,6].
(1) We usually reject the null hypothesis if this probability (the p-value) is smaller than 0.05, but this is very case dependent.
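For reference, the same die example can be run in a couple of lines of Python, assuming SciPy is available (scipy.stats.chisquare compares against a uniform expected distribution by default):
from scipy.stats import chisquare

observed = [37, 41, 30, 27, 32, 33]       # the 200 throws from the table above
statistic, p_value = chisquare(observed)  # statistic ~3.76, p-value ~0.58
print(statistic, p_value)                 # p-value well above 0.05: cannot reject fairness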
My naive idea:
The generator should follow a distribution (at least it should). Do a reasonable number of runs, then plot the values on a graph and fit a regression curve to the points. If it correlates with the shape of the expected distribution, you're good. (This is also possible in 1D with projections and histograms, and it is fully automatable with the right tool, e.g. MATLAB.)
You can also use the diehard tests, as was mentioned before; that is surely better but involves much less intuition, at least on your side.
Let's say you want to generate a uniform distribution on the interval [0, 1].
Then one possible test is
def fraction_in_range(a, b, sample_size):
    counter = 0
    for _ in range(sample_size):
        if a < random_being_tested() < b:  # the generator under test
            counter += 1
    return counter / sample_size
And see if the result is close to b-a (b minus a).
Of course you should define a function taking a, b between 0 and 1 as inputs, and returning the difference between counter/sample-size and b-a. Loop through possible a, b, say the multiples of 0.01 with a < b, and print out a, b whenever the difference is larger than a preset epsilon, say 0.001.
Those are the a, b for which there are too many (or too few) hits.
If you let sample-size be 5000, your random-being-tested will be called about 5000 * 5050 times in total, hopefully not too bad. A sketch of the full loop follows.
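Here is that loop in Python (random_being_tested is a stand-in for the generator under test; the 0.01 grid, the sample size of 5000, and the epsilon of 0.001 are the values suggested above):
import random

random_being_tested = random.random  # replace with the generator under test

def scan(sample_size=5000, step=0.01, epsilon=0.001):
    grid = [round(i * step, 2) for i in range(int(1 / step) + 1)]
    for i, a in enumerate(grid):
        for b in grid[i + 1:]:
            hits = sum(1 for _ in range(sample_size)
                       if a < random_being_tested() < b)
            diff = abs(hits / sample_size - (b - a))
            if diff > epsilon:
                print(a, b, diff)  # too many (or too few) hits in (a, b)

scan()
One caveat: with 5000 samples, the natural statistical noise of a measured proportion is on the order of 0.007, so an epsilon of 0.001 will flag many intervals even for a perfect generator; in practice epsilon should shrink roughly like 1/sqrt(sample-size).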
I had the same problem.
When I finished writing my code (using an external RNG engine), I looked at the results and found that all of them failed the chi-square test whenever I had too many results.
My code generated random numbers and kept buckets counting how many results fell into each range.
I don't know why the chi-square test fails when I have a lot of results.
During my research I saw that the C# Random.Next() fails for any random range, and that some numbers have better odds than others. Furthermore, I saw that the RNGCryptoServiceProvider random provider does not handle big numbers well.
When trying to get numbers in the range 0-1,000,000,000, the numbers in the lower range 0-300M had better odds of appearing...
As a result I'm using RNGCryptoServiceProvider, and if my range is higher than 100M I combine the number myself (RandomHigh*100M + RandomLow), where the ranges of both randoms are smaller than 100M, and that works well.
Good luck!
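For what it's worth, the combining trick at the end can be sketched in Python (secrets.randbelow stands in for the cryptographic provider); it works because the two draws act as independent digits in base 100,000,000:
import secrets

def random_below_billion():
    # Two small uniform draws combined into one uniform draw in [0, 1e9).
    high = secrets.randbelow(10)        # uniform 0..9
    low = secrets.randbelow(100000000)  # uniform 0..99,999,999
    return high * 100000000 + low      # uniform 0..999,999,999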

How to program a function to return values on some sort of probability?

This question arose while I was playing FIFA.
Presumably, the developers programmed a complex function that includes all the factors, like shooting skill, distance, shot power etc., to calculate the probability that a shot hits the target. How would they have programmed it so that a goal happens according to that probability?
In other words, say a function X() should return 1 with probability 89% and 0 with probability 11%. How would I program it so that it returns 1 (approximately) 89 times in 100 trials?
Generate a uniformly-distributed random number between 0 and 1, and return true if the number is less than the desired probability (0.89).
For example, in IPython:
In [13]: from random import random
In [14]: vals = [random() < 0.89 for i in range(10000)]
In [15]: sum(vals)
Out[15]: 8956
In this realisation, 8956 out of the 10000 boolean outcomes are true. If we repeat the experiment, the number will vary around 8900.
That is not how goals are determined in FIFA or other video games. They don't have a function that says that, with some probability, the shot makes it or doesn't.
Rather, they simulate a ball actually being kicked at the goal.
The ball will have some speed (based on the "shot power") and some trajectory angle (based on where the player aimed, with some variability based on the character's "shot skill"). Then they let physics - and the AI of the goalkeeper, if there is one - take over, and count it as a goal only when the ball physically enters the net.
There is of course still randomness involved, but there is no single variable that decides whether or not a shot will make it.
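As a toy illustration of that design, here is a Python sketch in which the success rate emerges from simulated (and heavily simplified, drag-free) physics rather than from a hard-coded probability; every constant in it is made up:
import math
import random

def shot_is_goal(power, skill, distance):
    aim = math.radians(12)                         # intended launch angle
    angle = random.gauss(aim, 0.15 * (1 - skill))  # less skill -> wider spread
    v = 18.0 + 12.0 * power                        # launch speed, m/s
    t = distance / (v * math.cos(angle))           # time to reach the goal line
    height = v * math.sin(angle) * t - 0.5 * 9.81 * t * t
    return 0.0 < height < 2.44                     # under the crossbar (2.44 m)

goals = sum(shot_is_goal(0.8, 0.7, 20.0) for _ in range(10000))
print(goals / 10000)  # an emergent, not hard-coded, success rate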
I'm not 100% sure, but one way I would achieve it:
Generate a random integer between 0 and 99. If the number is less than 89, return 1; otherwise return 0.
If you have a random number generator that returns a double between 0 and 1, then you would do something like:
#include <stdbool.h>
#include <stdlib.h>

bool return_true_89_out_of_100(void) {
    double random_n = rand() / (double)RAND_MAX; // uniform in [0, 1]
    return random_n < 0.89;                      // true with probability ~0.89
}
You can get a crudely random number by, for example, sampling the lower bits of the CPU clock, or with various mathematical tricks.
Your question is tagged language-agnostic, but the answer depends on what random number functions are available to you. Furthermore, the accuracy may depend on how close your generator is to being truly random (generally they're not that close).
As to random number functions, there tend to be two kinds: those that generate a number between 0 and 1, and those that generate a number between m and n. Each can easily be used to derive a percentage.
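A quick Python sketch of both kinds of conversion (0.89 is the probability from the question above):
import random

# Kind 1: a generator of floats in [0, 1)
def hit_from_unit():
    return random.random() < 0.89        # true with probability 0.89

# Kind 2: a generator of integers in [m, n], here [1, 100]
def hit_from_range():
    return random.randint(1, 100) <= 89  # 89 of the 100 equally likely outcomes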
