PySpark randn with a given mean and variance (std)

I would like to generate numbers from Normal distribution. I used this line to do so:
from pyspark.sql import functions as func
from pyspark.sql.functions import randn
df1 = sqlContext.range(0, 1000000)\
.withColumn('normal',func.round(randn(seed=23),2))
So I assume the normal column is drawn with mean 0 and std 1.
Is there a way to generate these numbers with mean=10 and std=5 for example?
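If randn() draws from the standard normal N(0, 1), then scaling by the standard deviation and shifting by the mean gives N(mean, std²). With std=5 that means variance 25. In PySpark this should only need Column arithmetic, for example func.round(randn(seed=23) * 5 + 10, 2). The stdlib-only Python sketch below checks the same affine transform on plain random.gauss draws. scaled_normal is a hypothetical helper, not a Spark API.

```python
import random
import statistics


def scaled_normal(n, mean, std, seed=23):
    """Draw n samples from N(mean, std**2) by scaling and shifting standard normals.

    Hypothetical helper; rng.gauss(0, 1) stands in for Spark's randn().
    """
    rng = random.Random(seed)
    # z ~ N(0, 1)  =>  z * std + mean ~ N(mean, std**2)
    return [rng.gauss(0.0, 1.0) * std + mean for _ in range(n)]


samples = scaled_normal(100_000, mean=10, std=5)
print(round(statistics.mean(samples), 2), round(statistics.stdev(samples), 2))
```

The sample mean and standard deviation should come out close to 10 and 5.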

Is there a way to get a random sample from a particular decile of the normal distribution using Stata's rnormal() function?

I'm working with a dataset where the values of my variable of interest are hidden. I have the range (min, max), mean, and sd of this variable, and for each observation I have information on which decile the value for that observation lies in. Is there any way I can impute some values for this variable using the random number generator or the rnormal() suite of commands in Stata? Something along the lines of:
set seed 1
gen imputed_var=rnormal(mean,sd,decile) if decile==1
Appreciate any help on this, thanks!
I am not familiar with Stata, but the following may point you in the right direction.
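In general, to generate a random number in a certain decile:
1. Generate a uniform random number in [(decile-1)/10, decile/10], where decile is the desired decile, from 1 through 10.
2. Find the quantile of the normal distribution at the random number just generated (that is, apply the inverse CDF to it), then scale and shift it to the desired mean and standard deviation.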
Thus, in pseudocode, the following will achieve what you want (I'm not sure about the exact names of the corresponding functions in Stata, though, which is why it's pseudocode):
decile = 4 # 4th decile
# Generate a random number in the decile (here, [0.3, 0.4]).
v = runiform((decile-1)/10, decile/10)
# Convert the number to a normal random number
q = qnormal(v) # Quantile of the standard normal distribution
# Scale and shift the number to the desired mean
# and standard deviation
q = q * sd + mean
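A runnable Python version of the pseudocode, as a sketch: statistics.NormalDist.inv_cdf plays the role of qnormal, and sample_in_decile is a hypothetical name. It assumes the hidden variable is roughly normal with the given mean and sd. The known min and max are not used.

```python
import random
from statistics import NormalDist


def sample_in_decile(decile, mean, sd, rng):
    """Draw one value from N(mean, sd**2) that falls in the given decile class (1..10).

    Hypothetical helper: mean and sd are the known summary statistics.
    """
    if not 1 <= decile <= 10:
        raise ValueError("decile must be between 1 and 10")
    # Step 1: uniform draw on the decile's band of cumulative probability.
    v = rng.uniform((decile - 1) / 10, decile / 10)
    # inv_cdf rejects exactly 0 or 1, which the first and last bands can touch.
    v = min(max(v, 1e-12), 1 - 1e-12)
    # Step 2: quantile (inverse CDF) of N(mean, sd) at v, i.e. qnormal(v) * sd + mean.
    return NormalDist(mean, sd).inv_cdf(v)


rng = random.Random(1506)
imputed = [sample_in_decile(4, mean=10, sd=5, rng=rng) for _ in range(5)]
print(imputed)
```

Every imputed value should lie between the 30th and 40th percentiles of N(10, 5²).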
This is precisely the suggestion just made by @Peter O. I make the same assumption he did: that, by a common abuse of terminology, "decile" is your shorthand for decile class, bin or interval. Historically, deciles are the values corresponding to cumulative probabilities 0.1(0.1)0.9, not the bins those values delimit.
. clear
. set obs 100
number of observations (_N) was 0, now 100
. set seed 1506
. gen foo = invnormal(runiform(0, 0.1))
. su foo
    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         foo |        100   -1.739382    .3795648  -3.073447  -1.285071
and (closer to your variable names)
gen wanted = invnormal(runiform(0.1 * (decile - 1), 0.1 * decile))

ES6: rolling and testing my own PRNG hash hex key generator?

[EDIT: post was originally too long... trying to shorten it!]
For a VNode-based app I'm building, I need a function that can generate pseudo-random numbers, used as unique IDs, with a size of 3 or 4 bytes.
Think of random RGB or RGBA CSS-like color strings such as 'bada55' or '900dcafe'. The user could seed the generator by picking a random pixel in a PNG image, for example.
Due to the functional design of my app, the generator must be a pure function, avoiding side effects such as calling Math.random.
I'm not familiar with the arithmetic theory (my courses are so far in the past...), but I decided to roll a custom PRNG using the Multiply-With-Carry (MWC) paradigm, and to test it empirically with some prime coefficients and random color seeds.
My idea is to test it with 1-byte, 2-byte, 3-byte, and then 4-byte outputs. My plan is to:
identify "good" primes and potential "bad" seeds while the number of bytes is small
then test it against an increasing number of bytes
[EDITED FROM HERE]
MWC usually works as follows:
# each turn, compute the i-th byte:
Xi[i] = A * Zi[i] + Ci[i]
Ci[i] = Math.floor((A * Zi[i] + Ci[i]) / M)
Zi[i] = Xi[i] % M
where A is the multiplier, C the carry and M the modulus. For bytes the modulus is 256.
It's possible to determine mathematically A and C (prime numbers) to get a full cycle generator.
The trouble is that when one byte cell starts to loop back to its seed value, ALL cells do the same... so the period is 256 even for 4 bytes!
I need to build something like an odometer that shifts the values of the other cells and guarantees a period of Math.pow(256, 4).
How can I achieve that in a simple way, if possible?
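The MWC recurrence above can be written as a pure function of the state (z, c), which also makes empirical period testing straightforward. The Python sketch below counts how many steps a seed takes to come back to itself. mwc_step and mwc_period are hypothetical names. In the standard (lag-1) MWC analysis, C is the carry, and for carries 0 <= c < A the period depends on A and M: it is the order of M modulo A*M - 1, except for two degenerate fixed-point seeds. The toy parameters A=6, M=10 give A*M - 1 = 59, a prime modulo which 10 has order 58, so non-degenerate seeds should recur after 58 steps.

```python
def mwc_step(z, c, a, m):
    """One multiply-with-carry step on state (z, c); returns the new (z, c).

    Same recurrence as above: X = A*Z + C, then Z = X % M and C = floor(X / M).
    It reads no globals and calls no RNG, so it is a pure function.
    """
    x = a * z + c
    return x % m, x // m


def mwc_period(z0, c0, a, m, limit=10**7):
    """Steps until state (z0, c0) recurs, or None if it has not recurred within `limit`."""
    z, c = mwc_step(z0, c0, a, m)
    steps = 1
    while (z, c) != (z0, c0):
        if steps >= limit:
            return None
        z, c = mwc_step(z, c, a, m)
        steps += 1
    return steps


# Toy parameters: A*M - 1 = 59 is prime and 10 has order 58 mod 59.
print(mwc_period(3, 1, a=6, m=10))
```

With m=256 the same harness can scan candidate multipliers for single-byte cells. It only measures periods. It does not fix the problem of all cells looping together.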

Julia - Random number in different intervals

Hello
I would like to create an array with numbers from different intervals.
For example, with the following code:
using Distributions
A = rand(Uniform(1,10),1,20)
"A" contains 20 numbers between 1 and 10.
I would like to create "B", where "B" contains 20 numbers between 1 and 4 or between 6 and 10, but not between 4 and 6.
Is it possible?
Thank you
I think that, for the general use case, you want to make sure that the new distribution you're sampling from is still uniform, albeit spread across disconnected ranges.
I hacked together a function that produces a new uniform distribution from multiple disconnected uniform distributions:
using Distributions
function general_uniform(distributions...)
all_dists = [distributions...]
sort!(all_dists, by = D -> minimum(D))
# make sure ranges are non overlapping
@assert all(map(maximum, all_dists)[1:end-1] .<= map(minimum, all_dists)[2:end])
dist_lengths = map(D -> maximum(D) - minimum(D), all_dists)
ratios = dist_lengths ./ sum(dist_lengths)
return MixtureModel(all_dists, Categorical(ratios))
end
Then you can sample from this like this:
B = rand(general_uniform(Uniform(1,4), Uniform(6,10)),1,20)
This will give you a uniform distribution even if your ranges don't have the same length. For example:
general_uniform(Uniform(0,1), Uniform(1,10))
This samples from the range 0-1 with probability 0.1 and from the range 1-10 with probability 0.9.
For example, the following gives a number around 5:
mean(rand(general_uniform(Uniform(0,9), Uniform(9,10)),1000))
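The same length-weighted idea, sketched in Python for readers without Julia. Each interval is chosen with probability proportional to its length, which keeps the density flat across the union. general_uniform_sample is a hypothetical helper, and intervals are given as (lo, hi) pairs.

```python
import random


def general_uniform_sample(intervals, n, rng=None):
    """Draw n values uniformly from the union of non-overlapping (lo, hi) intervals.

    Hypothetical helper mirroring general_uniform above: weights are interval lengths.
    """
    rng = rng or random.Random()
    ordered = sorted(intervals)
    # Reject overlapping ranges, like the overlap check in the Julia version.
    for (_, prev_hi), (next_lo, _) in zip(ordered, ordered[1:]):
        if prev_hi > next_lo:
            raise ValueError("intervals must not overlap")
    lengths = [hi - lo for lo, hi in ordered]
    # Pick an interval per draw (length-weighted), then sample uniformly inside it.
    picks = rng.choices(ordered, weights=lengths, k=n)
    return [rng.uniform(lo, hi) for lo, hi in picks]


B = general_uniform_sample([(1, 4), (6, 10)], 20, random.Random(42))
print(B)
```

With (1, 4) and (6, 10), about 3/7 of the draws should land in the first interval.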
Sure:
using Distributions
numbers = Float64[]
for i in 1:20
    if rand() < 0.5
        push!(numbers, rand(Uniform(1,4)))
    else
        push!(numbers, rand(Uniform(6,10)))
    end
end
You can also do a mixture:
D = MixtureModel([Uniform(1,4), Uniform(6,10)], Categorical([0.5,0.5]))
rand(D, 1, 20)
Here you have to specify a probability distribution over which uniform distribution to select from, hence the Categorical. The code above samples from each uniform range with equal probability. You can adjust the weighting by changing the Categorical as you see fit.
Using a mixture model of two uniform distributions
rand(MixtureModel(Uniform[Uniform(1,4),Uniform(6,10)]),1,20)
Edit: this sampling is only uniform over the union if the intervals have equal length!
hth!

Combining two unique numbers (order doesn't matter) to create a unique number

I'm creating a website in which 2 items out of a possible 117 are chosen and compared in different ways. I need a way to assign each of these matchups a unique number so they can be easily stored in a database and what not. I've seen pairing functions, but I cannot find one in which order doesn't matter. For example, I want the unique number for 2 and 17 to be the same as 17 and 2. Is there an equation that will satisfy this?
It depends on what programming language you are using.
In Java, for example, it would be quite easy, because the same seed produces the same random number sequence. So you could simply use the sum of both numbers as the seed:
import java.util.Random;
long seed = 2L + 17L;
long seed2 = 17L + 2L;
Random random = new Random(seed);
Random random2 = new Random(seed2);
boolean b = (random.nextLong() == random2.nextLong()); // true
However, this would also return the same value for 1+18, 0+19 and so on - whatever sums up to 19.
So, to get truly unique numbers "per pair", you would need to shift one of them. For example, with 117 entries, you could multiply the SMALLER (or larger) number by 1000:
Long seed = 2L * 1000 + 17L;
....
Then you have a unique random number for 2,17 and 17,2 - but 19,0 or 0,19 would produce a DIFFERENT random number.
P.S.: if it should ALWAYS return the same value for 2,19, the result is not really a random number, is it?
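The useful part of this answer is the order-independent key itself. The Random seeding step is not needed for uniqueness. Here is a minimal Python sketch, assuming item numbers below 1000, which is fine for 117 items. pair_key and unpair_key are hypothetical names.

```python
def pair_key(a, b, base=1000):
    """Order-independent key for a pair of item numbers in [0, base).

    The smaller number goes in the 'thousands' position, so (2, 17) and (17, 2)
    both map to 2017. Hypothetical helper.
    """
    if not (0 <= a < base and 0 <= b < base):
        raise ValueError("item numbers must be in [0, base)")
    lo, hi = min(a, b), max(a, b)
    return lo * base + hi


def unpair_key(key, base=1000):
    """Recover the (smaller, larger) pair from a key made by pair_key."""
    return divmod(key, base)


print(pair_key(2, 17), pair_key(17, 2))
```

Because decoding is a single divmod, this key is also easy to turn back into the pair.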
I know that the question dates from 2014, but I still wanted to add the following answer.
You could use the product of prime numbers to do this. For example, if your pair is (2,4), then you could use the product of the 2nd prime number (=3) and the 4th prime number (=7) as your id (= 3*7 = 21).
To do this with your 117 possible items, though, you would need to pre-calculate the first 117 prime numbers and store them, for example in an array or a hash table, and then do something like this (in JavaScript):
var primes = [2,3,5,7,...];
var a = 2;
var b = 17;
var id = primes[a-1]*primes[b-1];
Note that if you want to decode them as well, things are going to be more difficult since you would need to calculate the prime factorization of your id.
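Here is a sketch of both directions in Python. The names first_primes, prime_pair_id and decode_prime_pair_id are hypothetical. Decoding reduces to a short trial division because the ids are built from a known list of 117 primes: the first stored prime that divides an id is its smaller factor. For example, the 17th prime is 59, so the pair (2, 17) gets id 3 * 59 = 177. The largest id for two different items (116 and 117) is 641 * 643 = 412163.

```python
def first_primes(count):
    """Return the first `count` primes by trial division (fine for count=117)."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


PRIMES = first_primes(117)


def prime_pair_id(a, b):
    """Order-independent id for items a, b in 1..117: product of the a-th and b-th primes."""
    return PRIMES[a - 1] * PRIMES[b - 1]


def decode_prime_pair_id(pair_id):
    """Recover (a, b) with a <= b by trial division over the stored primes."""
    for index, p in enumerate(PRIMES, start=1):
        if pair_id % p == 0:
            # list.index raises ValueError if the cofactor is not a stored prime.
            return index, PRIMES.index(pair_id // p) + 1
    raise ValueError("not a product of two stored primes")


print(prime_pair_id(2, 4), prime_pair_id(17, 2))
```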

Increase the performance of generating random numbers in a range with a step size

To make sure that this is not a duplicate, I have already checked this and this out.
I want to generate random numbers in a specific range including step size (not continuous distribution).
For example, I want to generate random numbers between -2 and 3 where the step between two consecutive numbers is 0.02 (e.g. [-2 -1.98 -1.96 ... 2.96 2.98 3]), so a generated number could be 2.96 but never 2.95.
I have tried this:
a=-2*100;
b=3*100;
r = (b-a).*rand(5,1) + a;
for i=1:length(r)
if r(i) >= 0
if mod(fix(r(i)),2)
r(i)=ceil(r(i))/100;
else
r(i)=floor(r(i))/100;
end
else
if mod(fix(r(i)),2)
r(i)=floor(r(i))/100;
else
r(i)=ceil(r(i))/100;
end
end
end
and it works.
There is an alternative way to do this in MATLAB:
y = datasample(-2:0.02:3,5,'Replace',false)
I want to know:
How can I make my own implementation faster (improve the performance)?
If the second method is faster (it looks faster to me), how can I use a similar implementation in C++?
Those previous answers do cover your case if you read carefully. For example, this one produces random numbers between limits with a step size of one. But let's generalize this to an arbitrary step size in case you can't figure out how to get there. There are several different ways. Here's one using randi, where we use the default step size of one and the range from one to the number of possible values as indices:
lo = 2;
hi = 3;
step = 0.02;
v = lo:step:hi;
r = v(randi(length(v),[5 1]))
If you look inside datasample (type edit datasample in your command window to view the code) you'll see that it's doing something very similar to this answer. In the case of the 'Replace' option being true see around line 135 (in R2013a at least).
If the 'Replace' option is false, as in your use of datasample above, then randperm actually needs to be used instead (see around line 159):
lo = 2;
hi = 3;
step = 0.02;
v = lo:step:hi;
r = v(randperm(length(v),51))
Because there is no replacement in this case, 51 is the maximum number of values that can be requested in a call and all values of r will be unique.
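The same index-into-a-grid approach, sketched in Python. random.choices plays the role of v(randi(...)) with replacement, and random.sample plays the role of v(randperm(...)) without replacement. step_grid and sample_grid are hypothetical names. The grid is built from integer step counts rather than by repeated addition, so the endpoints come out exactly.

```python
import random


def step_grid(lo, hi, step):
    """Grid lo, lo+step, ..., hi built from integer counts to avoid float drift."""
    count = round((hi - lo) / step)  # number of steps, e.g. 50 for 2:0.02:3
    return [round(lo + k * step, 10) for k in range(count + 1)]


def sample_grid(lo, hi, step, n, replace=True, rng=None):
    """Like v(randi(length(v), ...)) if replace, or v(randperm(length(v), n)) otherwise."""
    rng = rng or random.Random()
    v = step_grid(lo, hi, step)
    if replace:
        return rng.choices(v, k=n)
    # Without replacement, n cannot exceed len(v); random.sample raises ValueError then.
    return rng.sample(v, n)


print(sample_grid(2, 3, 0.02, 5, replace=False, rng=random.Random(1)))
```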
In C++ you should not use rand() if you're doing scientific computing and generating large numbers of random variates. Instead you should use a large-period random number generator such as the Mersenne Twister (the default in MATLAB). C++11 includes a version of this generator, std::mt19937, as part of <random>. There is more on this in the documentation for rand(). If you want something fast, you should try the Double precision SIMD-oriented Fast Mersenne Twister (dSFMT). You'll have to ask another question if you want to implement your code in C++.
The distribution you want is a simple transform of integers, so how about:
step = 0.02
r = randi([-2 3] / step, [5, 1]) * step;
In C++, rand() generates integers too, so it should be pretty obvious how to take a similar approach there.
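A Python sketch of the integer-transform idea: draw integers between lo/step and hi/step inclusive, then multiply by step, without building the grid. random_on_grid is a hypothetical name. The products are floating point, so a value such as 149 * 0.02 may differ from 2.98 in the last bit. Round for display if exact decimals matter.

```python
import random


def random_on_grid(lo, hi, step, n, rng=None):
    """Mirror of randi([lo hi] / step, [n 1]) * step: draw integers, then scale."""
    rng = rng or random.Random()
    lo_i, hi_i = round(lo / step), round(hi / step)  # -100 and 150 for the example
    # randint is inclusive at both ends, like MATLAB's randi([imin imax]).
    return [rng.randint(lo_i, hi_i) * step for _ in range(n)]


print(random_on_grid(-2, 3, 0.02, 5, random.Random(7)))
```

In C++11 the same approach maps onto std::uniform_int_distribution<int>(lo_i, hi_i) driven by a std::mt19937 engine from <random>.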
