Julia : random normal distribution - random

l want to generate random normal distribution between 0.1 and 0.3 using
randn() how can l use it ?
l tried this one but it's not working
randn(0.1:0.3,(3,1)) # (3,1) three lines and one column

Try doing
randn(3) * (0.3 - 0.1) + 0.1
random number in a range [a,b] is
rand() * (b - a) + a

Related

Clamping to next lowest value in a series

I'm trying to clamp a number to the lower value of a series of numbers. For instance, if I have a series (sorry for the bad notation)
[pq] where p is any integer and q is any positive number.
Say if q is 50 my series will be ...-150, -100, -50, 0, 50, 100, 150...
Now what I'd like is to have a function f(y) which'll clamp any number to the next lowest number in the series.
For example, if I had the number 37 I'd be expecting the f(37) = 0 and I'd be expecting f(-37) = -50.
I've tried many algorithms involving modulus and integer division but I can't seem to figure it out. The latest I've tried is for example
(37 / q) * q which works great for positive numbers but doesn't work for any number between -50 and 0.
I've also tried ((37 - q) / q) * q but this won't work for negative cases which land exactly in the series.
EDIT
Assume that I do not have the entire series but only the multiplier p of the series.
You simply need to divide y by q using integer Euclidean division and then multiply the result by q again.
f(y) = (y / q) * q
where / represents Euclidean division.
In programming languages that do not immediately support Euclidean division you will have to either implement it manually or adjust the result of whatever division the language supports.
For example, in C and C++ Euclidean division for positive divisor q can be implemented through the native "Fortran-style" division as
(y >= 0 ? y : y - q + 1) / q
so in C or C++ the whole expression will look as
f(y) = (y >= 0 ? y : y - q + 1) / q * q
For 37 you get
f(37) = 37 / 50 * 50 = 0
For -37 you get
f(-37) = (-37 - 50 + 1) / 50 * 50 = -86 / 50 * 50 = -50
If you want a purely mathematical way, with no consideration given to computational efficiency, you could shift the input p into the positive integer range by adding a positive integer that is greater than or equal to |p| and is a multiple of q, and then shift it back by subtracting afterwards. p^2*q satisfies this.
This gives:
((p + p^2*q) / q) * q - p^2*q
You can subtract the modulus result once you've ensured it's positive. In some languages it will always be positive, but if not:
mod = p % q
positive_mod = (mod + q) % q
answer = p - positive_mod
Result in C++: https://ideone.com/kIuit8
Result in Python: https://ideone.com/w6wUgZ

Number of ways to reach N from 0 using only 2 or 3?

I am solving this problem where we need to reach from X=0 to X=N.We can only take a step of 2 or 3 at a time.
For each step of 2 we have a probability of 0.2 and for each step of 3 we have a probability of 0.8.How can we find the total probability to reach N.
e.g. for reaching 5,
2+3 with probability =0.2 * 0.8=0.16
3+2 with probability =0.8 * 0.2=0.16 total = 0.32.
My initial thoughts:
Number of ways can be found out by simple Fibonacci sequence.
f(n)=f(n-3)+f(n-2);
But how do we remember the numbers so that we can multiply them to find the probability?
This can be solved using Dynamic programming.
Lets call the function F(N) = probability to reach 0 using only 2 and 3 when the starting number is N
F(N) = 0.2*F(N-2) + 0.3*F(N-3)
Base case:
F(0) = 1 and F(k)= 0 where k< 0
So the DP code would be somthing like that:
F[0] = 1;
for(int i = 1;i<=N;i++){
if(i>=3)
F[i] = 0.2*F[i-2] + 0.8*F[i-3];
else if(i>=2)
F[i] = 0.2*F[i-2];
else
F[i] = 0;
}
return F[N];
This algorithm would run in O(N)
Some clarifications about this solution: I assume the only allowed operation for generating the number from 2s and 3s is addition (your definition would allow substraction aswell) and the input-numbers are always valid (2 <= input). Definition: a unique row of numbers means: no other row with the same number of 3s and 2s in another order is in scope.
We can reduce the problem into multiple smaller problems:
Problem A: finding all sequences of numbers that can sum up to the given number. (Unique rows of numbers only)
Start by finding the minimum-number of 3s required to build the given number, which is simply input % 2. The maximum-number of 3s that can be used to build the input can be calculated this way:
int max_3 = (int) (input / 3);
if(input - max_3 == 1)
--max_3;
Now all sequences of numbers that sum up to input must hold between input % 2 and max_3 3s. The 2s can be easily calculated from a given number of 3s.
Problem B: calculating the probability for a given list and it's permutations to be the result
For each unique row of numbers, we can easily derive all permutations. Since these consist of the same number, they have the same likeliness to appear and produce the same sum. The likeliness can be calculated easily from the row: 0.8 ^ number_of_3s * 0.2 ^ number_of_2s. Next step would be to calculate the number of different permuatations. Calculating all distinct sets with a specific number of 2s and 3s can be done this way: Calculate all possible distributions of 2s in the set: (number_of_2s + number_of_3s)! / (number_of_3s! * numer_of_2s!). Basically just the number of possible distinct permutations.
Now from theory to praxis
Since the math is given, the rest is pretty straight forward:
define prob:
input: int num
output: double
double result = 0.0
int min_3s = (num % 2)
int max_3s = (int) (num / 3)
if(num - max_3 == 1)
--max_3
for int c3s in [min_3s , max_3s]
int c2s = (num - (c3s * 3)) / 2
double p = 0.8 ^ c3s * 0.2 * c2s
p *= (c3s + c2s)! / (c3s! * c2s!)
result += p
return result
Instead of jumping into the programming, you can use math.
Let p(n) be the probability that you reach the location that is n steps away.
Base cases:
p(0)=1
p(1)=0
p(2)=0.2
Linear recurrence relation
p(n+3)=0.2 p(n+1) + 0.8 p(n)
You can solve this in closed form by finding the exponential solutions to the linear recurrent relation.
c^3 = 0.2 c + 0.8
c = 1, (-5 +- sqrt(55)i)/10
Although this was cubic, c=1 will always be a solution in this type of problem since there is a constant nonzero solution.
Because the roots are distinct, all solutions are of the form a1(1)^n + a2((-5+sqrt(55)i) / 10)^n + a3((-5-sqrt(55)i)/10)^n. You can solve for a1, a2, and a3 using the initial conditions:
a1=5/14
a2=(99-sqrt(55)i)/308
a3=(99+sqrt(55)i)/308
This gives you a nonrecursive formula for p(n):
p(n)=5/14+(99-sqrt(55)i)/308((-5+sqrt(55)i)/10)^n+(99+sqrt(55)i)/308((-5-sqrt(55)i)/10)^n
One nice property of the non-recursive formula is that you can read off the asymptotic value of 5/14, but that's also clear because the average value of a jump is 2(1/5)+ 3(4/5) = 14/5, and you almost surely hit a set with density 1/(14/5) of the integers. You can use the magnitudes of the other roots, 2/sqrt(5)~0.894, to see how rapidly the probabilities approach the asymptotics.
5/14 - (|a2|+|a3|) 0.894^n < p(n) < 5/14 + (|a2|+|a3|) 0.894^n
|5/14 - p(n)| < (|a2|+|a3|) 0.894^n
f(n, p) = f(n-3, p*.8) + f(n -2, p*.2)
Start p at 1.
If n=0 return p, if n <0 return 0.
Instead of using the (terribly inefficient) recursive algorithm, start from the start and calculate in how many ways you can reach subsequent steps, i.e. using 'dynamic programming'. This way, you can easily calculate the probabilities and also have a complexity of only O(n) to calculate everything up to step n.
For each step, memorize the possible ways of reaching that step, if any (no matter how), and the probability of reaching that step. For the zeroth step (the start) this is (1, 1.0).
steps = [(1, 1.0)]
Now, for each consecutive step n, get the previously computed possible ways poss and probability prob to reach steps n-2 and n-3 (or (0, 0.0) in case of n < 2 or n < 3 respectively), add those to the combined possibilities and probability to reach that new step, and add them to the list.
for n in range(1, 10):
poss2, prob2 = steps[n-2] if n >= 2 else (0, 0.0)
poss3, prob3 = steps[n-3] if n >= 3 else (0, 0.0)
steps.append( (poss2 + poss3, prob2 * 0.2 + prob3 * 0.8) )
Now you can just get the numbers from that list:
>>> for n, (poss, prob) in enumerate(steps):
... print "%s\t%s\t%s" % (n, poss, prob)
0 1 1.0
1 0 0.0
2 1 0.2
3 1 0.8
4 1 0.04
5 2 0.32 <-- 2 ways to get to 5 with combined prob. of 0.32
6 2 0.648
7 3 0.096
8 4 0.3856
9 5 0.5376
(Code is in Python)
Note that this will get you both the number of possible ways of reaching a certain step (e.g. "first 2, then 3" or "first 3, then 2" for 5), and the probability to reach that step in one go. Of course, if you need only the probability, you can just use single numbers instead of tuples.

random increasing sequence with O(1) access to any element?

I have an interesting math/CS problem. I need to sample from a possibly infinite random sequence of increasing values, X, with X(i) > X(i-1), with some distribution between them. You could think of this as the sum of a different sequence D of uniform random numbers in [0,d). This is easy to do if you start from the first one and go from there; you just add a random amount to the sum each time. But the catch is, I want to be able to get any element of the sequence in faster than O(n) time, ideally O(1), without storing the whole list. To be concrete, let's say I pick d=1, so one possibility for D (given a particular seed) and its associated X is:
D={.1, .5, .2, .9, .3, .3, .6 ...} // standard random sequence, elements in [0,1)
X={.1, .6, .8, 1.7, 2.0, 2.3, 2.9, ...} // increasing random values; partial sum of D
(I don't really care about D, I'm just showing one conceptual way to construct X, my sequence of interest.) Now I want to be able to compute the value of X[1] or X[1000] or X[1000000] equally fast, without storing all the values of X or D. Can anyone point me to some clever algorithm or a way to think about this?
(Yes, what I'm looking for is random access into a random sequence -- with two different meanings of random. Makes it hard to google for!)
Since D is pseudorandom, there’s a space-time tradeoff possible:
O(sqrt(n))-time retrievals using O(sqrt(n)) storage locations (or,
in general, O(n**alpha)-time retrievals using O(n**(1-alpha))
storage locations). Assume zero-based indexing and that
X[n] = D[0] + D[1] + ... + D[n-1]. Compute and store
Y[s] = X[s**2]
for all s**2 <= n in the range of interest. To look up X[n], let
s = floor(sqrt(n)) and return
Y[s] + D[s**2] + D[s**2+1] + ... + D[n-1].
EDIT: here's the start of an approach based on the following idea.
Let Dist(1) be the uniform distribution on [0, d) and let Dist(k) for k > 1 be the distribution of the sum of k independent samples from Dist(1). We need fast, deterministic methods to (i) pseudorandomly sample Dist(2**p) and (ii) given that X and Y are distributed as Dist(2**p), pseudorandomly sample X conditioned on the outcome of X + Y.
Now imagine that the D array constitutes the leaves of a complete binary tree of size 2**q. The values at interior nodes are the sums of the values at their two children. The naive way is to fill the D array directly, but then it takes a long time to compute the root entry. The way I'm proposing is to sample the root from Dist(2**q). Then, sample one child according to Dist(2**(q-1)) given the root's value. This determines the value of the other, since the sum is fixed. Work recursively down the tree. In this way, we look up tree values in time O(q).
Here's an implementation for Gaussian D. I'm not sure it's working properly.
import hashlib, math
def random_oracle(seed):
h = hashlib.sha512()
h.update(str(seed).encode())
x = 0.0
for b in h.digest():
x = ((x + b) / 256.0)
return x
def sample_gaussian(variance, seed):
u0 = random_oracle((2 * seed))
u1 = random_oracle(((2 * seed) + 1))
return (math.sqrt((((- 2.0) * variance) * math.log((1.0 - u0)))) * math.cos(((2.0 * math.pi) * u1)))
def sample_children(sum_outcome, sum_variance, seed):
difference_outcome = sample_gaussian(sum_variance, seed)
return (((sum_outcome + difference_outcome) / 2.0), ((sum_outcome - difference_outcome) / 2.0))
def sample_X(height, i):
assert (0 <= i <= (2 ** height))
total = 0.0
z = sample_gaussian((2 ** height), 0)
seed = 1
for j in range(height, 0, (- 1)):
(x, y) = sample_children(z, (2 ** j), seed)
assert (abs(((x + y) - z)) <= 1e-09)
seed *= 2
if (i >= (2 ** (j - 1))):
i -= (2 ** (j - 1))
total += x
z = y
seed += 1
else:
z = x
return total
def test(height):
X = [sample_X(height, i) for i in range(((2 ** height) + 1))]
D = [(X[(i + 1)] - X[i]) for i in range((2 ** height))]
mean = (sum(D) / len(D))
variance = (sum((((d - mean) ** 2) for d in D)) / (len(D) - 1))
print(mean, math.sqrt(variance))
D.sort()
with open('data', 'w') as f:
for d in D:
print(d, file=f)
if (__name__ == '__main__'):
test(10)
If you do not record the values in X, and if you do not remember the values in X you have previously generate, there is no way to guarantee that the elements in X you generate (on the fly) will be in increasing order. It furthermore seems like there is no way to avoid O(n) time worst-case per query, if you don't know how to quickly generate the CDF for the sum of the first m random variables in D for any choice of m.
If you want the ith value X(i) from a particular realization, I can't see how you could do this without generating the sequence up to i. Perhaps somebody else can come up with something clever.
Would you be willing to accept a value which is plausible in the sense that it has the same distribution as the X(i)'s you would observe across multiple realizations of the X process? If so, it should be pretty easy. X(i) will be asymptotically normally distributed with mean i/2 (since it's the sum of the Dk's for k=1,...,i, the D's are Uniform(0,1), and the expected value of a D is 1/2) and variance i/12 (since the variance of a D is 1/12 and the variance of a sum of independent random variables is the sum of their variances).
Because of the asymptotic aspect, I'd pick some threshold value for i to switch over from direct summing to using the normal. For example, if you use i = 12 as your threshold you would use actual summing of uniforms for values of i from 1 to 11, and generate a Normal(i/2, sqrt(i/12)) value for i >. That's an O(1) algorithm since the total work is bounded by your threshold, and the results produced will be distributionally representative of what you would see if you actually went through the summing.

Generating a random float with a controlled precision level

This generates a random float with a certain precision level in Ruby:
def generate(min,max,precision)
number = rand * (max - min) + min
factor = 10.0 ** precision
return (number * factor).to_i / factor
end
Someone recently suggested me that it may be simpler to do this:
var = rand(100) * 1.0
var /= 10
Which generates a random float from 0.0 to 10.0.
This sounds good, but I'm not sure about how to control the precision level with that method. How may I make the equivalent to my first method, but using this suggestion?
If you have a range x1 to x2, and you want a minimum increment ("precision") of delta, then you need
n = (x2 - x1)/delta +1
Different integers, which you scale with
rand(n) * delta + x1
To give random numbers between x1 and x2 inclusive, with an increment of delta
How do you define "precision"? The relationship between delta and precision is given ( according to you formula) by
Delta = 10**(-precision)
Or
delta = 1.0 / 10**precision

Possible ways to calculate X = A - inv(B) * Y * inv(B) and X = Y + A' * inv(B) * A

I have two problems. I have to calculate two equations:
X = A - inv(B) * Y * inv(B)
and
X = Y + A' * inv(B) * A
where, A, B and Y are known p*p matrices (p can be small or large, depends the situation). Matrices are quite dense, without any structure (except B being non-singular of course).
Is it possible to solve X in those equations without inverting the matrix B? I have to calculate these equations n times, n being hundreds or thousands, and all the matrices change over time.
Thank you very much.
If you can express your updates to your matrix B in the following terms:
Bnew = B + u*s*v
then you can express an update to inv(B) explicitly using the Sherman-Morrison-Woodbury formula:
inv(B + u*s*v) = inv(B) - inv(B)*u*inv(s + v*inv(B)*u)*v*inv(B)
If u and v are vectors (column and row, respectively) and s is scalar, then this expression simplifies:
inv(B + u*s*v) = inv(B) - inv(B)*u*v*inv(B)/(s + v*inv(B)*u)
You would only have to calculate inv(B) once and then update it when it changes with no additional inversions.
It may be preferable not to calculate the full inverse, just simple "matrix divisions" on y and (ynew - y) or a and (anew - a) depending on the size of "n" with respect to "p" in your problem.
Memo-ize inv(B), i.e. only invert B when it changes, and keep the inverse around.
If changes to B are small, possibly you could use a delta-approximation.

Resources