How do I generate a multivariate normal distribution in J, given the mean vector and the covariance matrix?
For example, in Python,
np.random.multivariate_normal([0,0],[[1,.75],[.75,1]],1000)
generates a multivariate normal sample of size 1000 with [0,0] as the mean vector and [[1,.75],[.75,1]] as the variance-covariance matrix.
Wikipedia describes a standard method for creating multivariate normal distributions.
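In symbols: if $A$ is the Cholesky factor of the covariance matrix, so that $\Sigma = A A^{\mathsf{T}}$, and $Z$ is a vector of independent standard normal variables, then $X = \mu + A Z$ has mean $\mu$ and covariance $\Sigma$. The J session below follows this recipe: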
Mu=: 0 0 NB. vector of means
]Sigma=: 1 0.75 ,: 0.75 1 NB. covariance matrix
1 0.75
0.75 1
We can get the Cholesky decomposition of the covariance matrix using code from the matfacto.ijs script in the math/misc addon (or use the LAPACK addon):
load 'math/misc/matfacto'
A=: choleski Sigma NB. Cholesky decomp
Create 2 independent univariate normal variables using the stats/distribs addon.
load 'stats/distribs'
z=: rnorm 2 1000 NB. 2 standard normal variables sampled 1000 times
Now generate the desired multivariate normal sample:
X=: Mu + A mp z
Now check that the distributions are as specified:
load 'stats/base'
mean"1 X
0.0264368 0.00887907 NB. mean close to 0 (Mu)
stddev"1 X
0.987214 0.991614 NB. stddev close to 1 (sqrt of diagonal of Sigma)
corr/ X
0.746917 NB. correlation close to 0.75 (off-diagonal of Sigma)
We can code this as a single verb:
multivar_norm=: dyad define
'Mu Sigma'=. x
A=. choleski Sigma
z=. rnorm y ,~ #Sigma
Mu + A mp z
)
X=: (Mu;Sigma) multivar_norm 1000
((mean , stddev)"1 ; corr/) X
┌──────────────────┬────────┐
│0.0199138  1.01788│0.749184│
│ 0.035176 0.987191│ │
└──────────────────┴────────┘
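For comparison, here is a rough numpy sketch of the same construction, building the sample from the explicit Cholesky factor rather than calling np.random.multivariate_normal:
import numpy as np

rng = np.random.default_rng()
mu = np.array([0.0, 0.0])
sigma = np.array([[1.0, 0.75], [0.75, 1.0]])
A = np.linalg.cholesky(sigma)         # sigma = A @ A.T
z = rng.standard_normal((2, 1000))    # 2 standard normals, 1000 samples each
X = mu[:, None] + A @ z               # same construction as the J verb above
print(X.mean(axis=1))                 # close to Mu
print(np.corrcoef(X)[0, 1])           # close to 0.75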
I am working on a Maxima program that involves a number of iterations (the Souriau-Frame Drazin inverse algorithm, to be specific), each of which yields a polynomial. I need to check and stop my iterations when the polynomial goes to zero (i.e., all coefficients go to zero).
Maxima seems never to truncate small numbers to zero until they reach something absurd like $10^{-323}$.
The following code snippet gives an idea of what I need:
(%i3) rat(1e-300);
rat: replaced 1.0E-300 by 1/999999999999999903803069407426113968898218766118103141789833949572356552411722264192305659040010509526872994217248819197070144216063125530186267630296136203765329090687113225440746189048800695790727969805197112921161540803823920273299782054992133678869364753954248541633605124057805104488924519071744 = 1.0E-300
(%o3)/R/ 1/9999999999999999038030694074261139688982187661181031417898339495723\
565524117222641923056590400105095268729942172488191970701442160631255301862676\
302961362037653290906871132254407461890488006957907279698051971129211615408038\
23920273299782054992133678869364753954248541633605124057805104488924519071744
(%i4) rat(1e-323);
rat: replaced 9.0E-324 by^C
Maxima encountered a Lisp error:
SIMPLE-ERROR: Console interrupt.
Automatically continuing.
To enable the Lisp debugger set *debugger-hook* to nil.
(%i5) rat(1e-325);
rat: replaced 0.0 by 0/1 = 0.0
(%o5)/R/ 0
(%i6)
As one can see, it does not truncate $10^{-300}$ to zero, it hangs for $10^{-323}$ (I had to SIGKILL it), and $10^{-325}$ and everything smaller is set to zero.
I don't know where this 324 is coming from, and I'd like to know if it's possible to adjust this threshold for my code.
Edit 1: Here's the output if I used rationalize instead of rat:
(%i3) rationalize(1e-300);
(%o3) 6032057205060441/6032057205060440848842124543157735677050252251748505781\
796615064961622344493727293370973578138265743708225425014400837164813540499979\
063179105919597766951022193355091707896034850684039059079180396788349106095584\
290087446076413771468940477241550670753145517602931224392424029547429993824129\
889235158145614364972941312
(%i4) rationalize(1e-323);
(%o4) 1/1012011266536553091762476733594586535247783248820710591784506790137151\
697839976734459801918507185622475935389321584059556949043686928967384335066999\
703692549607587121382831806822334538710466081706198838392363725342810037417123\
463493090516778245797781704050282561793847761667073076152512660931637543230031\
31653853870546747392
(%i5) rationalize(1e-324);
(%o5) 0
Edit 2: Here's the output to "build_info();":
(%i6) build_info();
(%o6)
Maxima version: "5.43.2"
Maxima build date: "2020-02-21 05:22:38"
Host type: "x86_64-pc-linux-gnu"
Lisp implementation type: "GNU Common Lisp (GCL)"
Lisp implementation version: "GCL 2.6.12"
User dir: "/home/nidish/.maxima"
Temp dir: "/tmp"
Object dir: "/home/nidish/.maxima/binary/5_43_2/gcl/GCL_2_6_12"
Frontend: false
I gather that the goal is to replace small (in absolute value) floats with zero. There doesn't appear to be a built-in function for that. Here's an attempt at an implementation via the pattern-matching machinery.
First define a rule to replace small floats, and define a function which applies the rule to an expression.
(%i4) matchdeclare(xx,floatnump) $
(%i5) defrule(squashing_rule,xx, if abs(xx) <= squashing_tolerance then 0 else xx);
(%o5) squashing_rule : xx -> (if abs(xx) <= squashing_tolerance then 0 else xx)
(%i6) squashing_tolerance:0.01 $
(%i7) squash_floats(expr):=applyb1(expr,squashing_rule) $
Now create a random polynomial.
(%i8) e:makelist(float((((2*random(2)-1)*(1+random(8)))/8) *10^-random(4)) *x^k,k,1,6);
2 3 4 5 6
(%o8) [- 3.75e-4 x, - 0.00625 x , - 0.05 x , 0.00625 x , 0.005 x , 0.5 x ]
(%i9) e1:apply("+",e);
6 5 4 3 2
(%o9) 0.5 x + 0.005 x + 0.00625 x - 0.05 x - 0.00625 x - 3.75e-4 x
Apply squash_floats to the generated polynomial.
(%i10) squash_floats(e1);
6 3
(%o10) 0.5 x - 0.05 x
Change the squashing tolerance.
(%i11) squashing_tolerance:0.001;
(%i12) squash_floats(e1);
6 5 4 3 2
(%o12) 0.5 x + 0.005 x + 0.00625 x - 0.05 x - 0.00625 x
Verify the replacement happens in nested expressions.
(%i13) squash_floats(sin(1+1/e1));
1
(%o13) sin(----------------------------------------------------- + 1)
6 5 4 3 2
0.5 x + 0.005 x + 0.00625 x - 0.05 x - 0.00625 x
First let's step back a moment. What is the behavior you are hoping to find? If you need to convert very small floats to rational numbers accurately, try rationalize instead of rat. Does that work correctly for 1e-323?
If you want floats smaller than a tolerance to be converted to zero, we'll need to take a different approach. I'll hold off on that for the moment.
About the specific behavior you have observed, it appears to be implementation-dependent; I get a different (still buggy) behavior with Maxima + SBCL, which reports a floating point overflow. What does build_info(); report?
I don't know if it matters, but 1e-323 is a so-called denormalized float -- it is smaller than the smallest normalized (full-precision) float, which is about 1e-308. The smallest positive denormalized double is about 4.9e-324, which is presumably where the 324 in your output comes from: anything smaller underflows to zero when the literal is read.
First you say "I want to know when a polynomial is exactly going to zero." And then you say "if a coefficient in a polynomial drops below a threshold, I want that term to be completely thrown out of the polynomial". So you don't want the polynomial to be exactly zero, you want it to be zero within some threshold (relative? absolute?).
I'm afraid I'm not familiar with the Souriau-Frame Drazin algorithm, but looking at the Greville paper about it, it seems that all the calculations are rational (no square roots etc.), so I wonder if it's feasible to perform your calculations with completely exact rational numbers instead of using floating-point numbers. Then presumably exact means exact, and you don't need to worry about thresholds at all.
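For example, here is a rough Python (not Maxima) sketch of the core Souriau-Frame (Faddeev-LeVerrier) recursion for the characteristic polynomial over exact rationals; with Fraction arithmetic a coefficient is zero exactly when it is mathematically zero, so no threshold is needed:
from fractions import Fraction

def charpoly_coeffs(A):
    # Returns [1, c1, ..., cn] with det(lambda*I - A) = lambda^n + c1*lambda^(n-1) + ... + cn
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    B = [[Fraction(i == j) for j in range(n)] for i in range(n)]  # identity matrix
    coeffs = [Fraction(1)]
    for k in range(1, n + 1):
        B = [[sum(A[i][m] * B[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]                     # B <- A*B
        c = -sum(B[i][i] for i in range(n)) / k     # c_k = -trace(B)/k
        coeffs.append(c)
        for i in range(n):
            B[i][i] += c                            # B <- B + c_k*I
    return coeffs

charpoly_coeffs([[2, 1], [1, 2]])   # [Fraction(1, 1), Fraction(-4, 1), Fraction(3, 1)]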
I am trying to generate a skewed trapezoidal distribution using inverse transform sampling.
The inputs are the values where the ramps start and end (a, b, c, d) and the sample size.
a=-3;b=-1;c=1;d=8;
SampleSize=10e4;
h=2/(d+c-a-b);
Then I calculate the ratios of the lengths of the ramps and the flat component to get the sample size for each:
firstramp=round(((b-a)/(d-a)),3);
flat=round((c-b)/(d-a),3);
secondramp=round((d-c)/(d-a),3);
n1=firstramp*SampleSize; %sample size for first ramp
n3=secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
And then finally I get the histogram from the following code:
quartile1=h/2*(b-a);
quartile2=1-h/2*(d-c);
y1=linspace(0,quartile1,n1);
y2=linspace(quartile1,quartile2,n2);
y3=linspace(quartile2,1,n3);
%inverse cumulative distribution functions
invcdf1=a+sqrt(2*(b-a)/h)*sqrt(y1);
invcdf2=(a+b)/2+y2/h;
invcdf3=d-sqrt(2*(d-c)/h)*sqrt(1-y3);
distr=[invcdf1 invcdf2 invcdf3];
histogram(distr,100)
However, the ramps and the flat component are not sampled in the right proportions; the histogram looks like this:
I fixed this by trial and error, by reducing the sample size of the ramps by half:
n1=0.5*firstramp*SampleSize; %sample size for first ramp
n3=0.5*secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
This made the distribution look like this:
However, this makes the output sample smaller than the requested sample size.
I've also tried different combinations of changing the sample sizes of ramps and flat.
This also works:
n1=0.75*firstramp*SampleSize; %sample size for first ramp
n3=0.75*secondramp*SampleSize; %sample size for second ramp
n2=1.5*flat*SampleSize;
It increases the number of output samples, but the total is still not close to the requested size.
Any help will be appreciated.
Full code:
a=-3;b=-1;c=1;d=8;
SampleSize=10e4;%*1.33333333333333;
h=2/(d+c-a-b);
firstramp=round(((b-a)/(d-a)),3);
flat=round((c-b)/(d-a),3);
secondramp=round((d-c)/(d-a),3);
n1=firstramp*SampleSize; %sample size for first ramp
n3=secondramp*SampleSize; %sample size for second ramp
n2=flat*SampleSize;
quartile1=h/2*(b-a);
quartile2=1-h/2*(d-c);
y1=linspace(0,quartile1,.75*n1);
y2=linspace(quartile1,quartile2,1.5*n2);
y3=linspace(quartile2,1,.75*n3);
%inverse cumulative distribution functions
invcdf1=a+sqrt(2*(b-a)/h)*sqrt(y1);
invcdf2=(a+b)/2+y2/h;
invcdf3=d-sqrt(2*(d-c)/h)*sqrt(1-y3);
distr=[invcdf1 invcdf2 invcdf3];
histogram(distr,100)
%end
I don't know MATLAB, so I was hoping somebody else would jump in on this, but since nobody did, here goes.
If I'm reading your code correctly, what you did is not an inversion. Inversion is 1-1, i.e., one uniform input produces one outcome. You seem to be using a technique known as the "composition method". In composition the overall distribution is composed of component pieces, each of which is straightforward to generate. You choose which component to generate from based on their proportions/probabilities relative to the whole.
For density functions, probability is found as the area under the density curve, so your first mistake was in sampling the components relative to the width of each component rather than using their areas. The correct sampling proportions are 2/13, 4/13, and 7/13 for what you designated the firstramp, flat, and secondramp components, respectively.
A second (relatively minor) mistake was to assign exact sample sizes to each of the components. Having probability 2/13 does not mean that exactly 2*SampleSize/13 of your samples will be from the firstramp; it means that's the expected sample size for that component. The expected value of a random variate is not necessarily (or even likely to be) the outcome you will actually get.
In pseudocode, the composition approach would be
generate U ~ Uniform(0,1)
if U <= 2/13:
generate and return a value from firstramp
else if U <= 6/13:
generate and return a value from flat
else:
generate and return a value from secondramp
Note that since each of the generate options will use one or more uniforms, and choosing between the options requires a uniform U, this is not an inversion.
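To make the composition concrete, here is a rough Python sketch for your a, b, c, d (the 2/13, 4/13, 7/13 proportions come from the component areas; each ramp is generated by inverting a triangular density):
import random
from math import sqrt

def trapezoid_composition(a=-3, b=-1, c=1, d=8):
    u = random.random()                            # choose a component by its area
    if u <= 2/13:                                  # rising ramp on [a, b]
        return a + (b - a) * sqrt(random.random())
    elif u <= 6/13:                                # flat part on [b, c]
        return random.uniform(b, c)
    else:                                          # falling ramp on [c, d]
        return d - (d - c) * sqrt(random.random())

sample = [trapezoid_composition() for _ in range(100000)]
Note that each draw consumes two uniforms, one to pick the component and one within it, which is exactly why this is not an inversion.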
If you want an actual inversion, you need to write down your density, integrate it to get the cumulative distribution function (CDF), then apply the inversion technique by setting F(X) = U and solving for X. Since your distribution is made of distinct components, both the density and the CDF will be piecewise functions.
After deriving the height based on the requirement that the areas of the two triangles and the flat section must add up to 1, I came up with the following for your density:
| (x + 3) / 13 -3 <= x <= -1
|
f(x) = | 2 / 13 -1 <= x <= 1
|
| 2 * (8 - x) / 91 1 <= x <= 8
Integrating this and collecting terms produces the CDF:
| (x + 3)**2 / 26 -3 <= x <= -1
|
F(x) = | (2 + x) * 2 / 13 -1 <= x <= 1
|
| 6 / 13 + [49 - (x - 8)**2] / 91 1 <= x <= 8
Finally, determining the values of F(x) at the break points between the segments and applying inversion yields the following pseudocode algorithm:
generate U ~ Uniform(0,1)
if U <= 2 / 13:
return 2 * sqrt( (13 * U) / 2 ) - 3
else if U <= 6 / 13:
return (13 * U) / 2 - 2
else:
return 8 - sqrt( 91 * (1 - U) )
Note that this is a true inversion. The outcome is determined by generating a single U, and transforming it in different ways depending on which range it falls in.
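A direct Python transcription of that pseudocode, as a sketch:
import random
from math import sqrt

def trapezoid_inverse():
    u = random.random()           # a single uniform determines the outcome
    if u <= 2/13:
        return 2 * sqrt(13 * u / 2) - 3
    elif u <= 6/13:
        return 13 * u / 2 - 2
    else:
        return 8 - sqrt(91 * (1 - u))

sample = [trapezoid_inverse() for _ in range(100000)]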
(This question is related to how to generate a dataset of correlated variables with different distributions?)
In Stata, say that I create a random variable following a Uniform[0,1] distribution:
set seed 100
gen random1 = runiform()
I now want to create a second random variable that is correlated with the first (the correlation should be .75, say), but is bounded by 0 and 1. I would like this second variable to also be more-or-less Uniform[0,1]. How can I do this?
This won't be exact, but the NORTA/copula method should be pretty close and easy to implement.
The relevant citation is:
Cario, Marne C., and Barry L. Nelson. Modeling and generating random vectors with arbitrary marginal distributions and correlation matrix. Technical Report, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois, 1997.
The paper can be found here.
The general recipe to generate correlated random variables from any distribution is:
1. Draw two (or more) correlated variables from a joint standard normal distribution using corr2data.
2. Calculate the univariate normal CDF of each of these variables using normal().
3. Apply the inverse CDF of any distribution to simulate draws from that distribution.
The third step is pretty easy with the [0,1] uniform: you don't even need it, since the normal CDF values from step 2 are already distributed Uniform[0,1]. Typically, the magnitude of the correlations you get will be less than the magnitudes of the original (normal) correlations, so it might be useful to bump those up a bit.
Stata Code for 2 uniformish variables that have a correlation of 0.75:
clear
// Step 1
matrix C = (1, .75 \ .75, 1)
corr2data x y, n(10000) corr(C) double
corr x y, means
// Steps 2-3
replace x = normal(x)
replace y = normal(y)
// Make sure things worked
corr x y, means
stack x y, into(z) clear
lab define vars 1 "x" 2 "y"
lab val _stack vars
capture ssc install bihist
bihist z, by(_stack) density tw1(yline(-1 0 1))
If you want to improve the approximation for the uniform case, you can transform the correlations like this (see section 5 of the linked paper):
matrix C = (1,2*sin(.75*_pi/6)\2*sin(.75*_pi/6),1)
This uses 0.76536686 instead of 0.75 as the normal correlation.
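The same recipe in a rough Python/scipy sketch, for comparison (variable names here are mine):
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(100)
rho_u = 0.75                            # target correlation of the uniforms
rho_z = 2 * np.sin(np.pi * rho_u / 6)   # adjusted normal correlation, ~0.7654
cov = [[1.0, rho_z], [rho_z, 1.0]]
z = rng.multivariate_normal([0.0, 0.0], cov, size=10000)
u = norm.cdf(z)                         # steps 2-3: Phi maps each marginal to Uniform[0,1]
print(np.corrcoef(u, rowvar=False))     # off-diagonal should be near 0.75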
Code for the question in the comments
Here the correlation matrix C is written more compactly (lower triangle only), and I am applying the transformation:
clear
matrix C = ( 1, ///
2*sin(-.46*_pi/6), 1, ///
2*sin(.53*_pi/6), 2*sin(-.80*_pi/6), 1, ///
2*sin(0*_pi/6), 2*sin(-.41*_pi/6), 2*sin(.48*_pi/6), 1 )
corr2data v1 v2 v3 v4, n(10000) corr(C) cstorage(lower)
forvalues i=1/4 {
replace v`i' = normal(v`i')
}
I have a set of points like:
(x , y , z , t)
(1 , 3 , 6 , 0.5)
(1.5 , 4 , 6.5 , 1)
(3.5 , 7 , 8 , 1.5)
(4 , 7.25 , 9 , 2)
I am looking to find the best linear fit to these points, let's say a function like:
t = f(x, y, z) = a*x + b*y + c*z
This is a linear regression problem. The "best fit" depends on the metric you define for what counts as better.
One simple example is the least-squares metric, which aims to minimize the sum of squares (f(x_i, y_i, z_i) - w_i)^2, where w_i is the measured value for each sample.
So, in least squares you are trying to minimize SUM{ (a*x_i + b*y_i + c*z_i - w_i)^2 : for each i }. This function has a single global minimum at:
(a,b,c) = (X^T * X)^-1 * X^T * w
Where:
X is the m×3 matrix whose rows are the samples (x_i, y_i, z_i) (m is the number of samples you have)
X^T is the transpose of this matrix
w is the vector of measured results: (w_1, w_2, ..., w_m)
The * operator represents matrix multiplication
There are other, more complex methods that use other distance metrics; one example is the well-known SVR with a linear kernel.
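A minimal numpy sketch of the least-squares formula above, using the four points from the question:
import numpy as np

# Points (x, y, z, w) from the question; fit w ~ a*x + b*y + c*z
pts = np.array([[1.0, 3.0,  6.0, 0.5],
                [1.5, 4.0,  6.5, 1.0],
                [3.5, 7.0,  8.0, 1.5],
                [4.0, 7.25, 9.0, 2.0]])
X, w = pts[:, :3], pts[:, 3]
a, b, c = np.linalg.solve(X.T @ X, X.T @ w)  # (X^T X)^{-1} X^T w without forming the inverse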
It seems that you are looking for the major axis of a point cloud.
You can work this out by finding the eigenvector associated with the largest eigenvalue of the covariance matrix. This could be an opportunity to use the power method (starting the iterations with the point farthest from the centroid, for example).
Can also be addressed by Singular Value Decomposition, preferably using methods that compute the largest values only.
If your data set contains outliers, then RANSAC could be a better choice: take two points at random and compute the sum of distances to the line they define. Repeat a number of times and keep the best fit.
Using the squared distances will answer your request for least-squares, but non-squared distances will be more robust.
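As a rough numpy sketch of the SVD route (assuming the major axis of the full 4-D point cloud is what's wanted, using the question's data):
import numpy as np

pts = np.array([[1.0, 3.0,  6.0, 0.5],
                [1.5, 4.0,  6.5, 1.0],
                [3.5, 7.0,  8.0, 1.5],
                [4.0, 7.25, 9.0, 2.0]])
centroid = pts.mean(axis=0)
_, _, vt = np.linalg.svd(pts - centroid)  # SVD of the centered cloud
direction = vt[0]                         # unit vector along the major axis
# The best-fit (total least squares) line is: centroid + t * direction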
You have a linear problem.
For example, suppose your equation is Y = a*x1 + b*x2 + c*x3.
In MATLAB:
B = [x1(:) x2(:) x3(:)] \ Y;
Y_fit = [x1(:) x2(:) x3(:)] * B;
In Python:
import numpy as np
X = np.column_stack([np.ravel(x1), np.ravel(x2), np.ravel(x3)])  # like MATLAB's x1(:)
B, _, _, _ = np.linalg.lstsq(X, np.ravel(Y), rcond=None)
Y_fit = X @ B
I want to generate random numbers according some distributions. How can I do this?
The standard random number generator you've got (rand() in C after a simple transformation, equivalents in many languages) is a fairly good approximation to a uniform distribution over the range [0,1]. If that's what you need, you're done. It's also trivial to convert that to a random number generated over a somewhat larger integer range.
Conversion of a Uniform distribution to a Normal distribution has already been covered on SO, as has going to the Exponential distribution.
[EDIT]: For the triangular distribution, converting a uniform variable is relatively simple (in something C-like):
/* Triangular deviate with minimum a, maximum b, and mode c. */
double triangular(double a, double b, double c) {
    double U = rand() / (double) RAND_MAX;  /* uniform on [0,1] */
    double F = (c - a) / (b - a);           /* CDF value at the mode */
    if (U <= F)
        return a + sqrt(U * (b - a) * (c - a));
    else
        return b - sqrt((1 - U) * (b - a) * (b - c));
}
That's just converting the formula given on the Wikipedia page. If you want others, that's the place to start looking; in general, you use the uniform variable to pick a point on the vertical axis of the cumulative distribution function (CDF) of the distribution you want (assuming it's continuous), and invert the CDF to get the random value with the desired distribution.
The right way to do this for a discrete distribution is to decompose it into n-1 binary distributions. That is, if you have a distribution like this:
A: 0.05
B: 0.10
C: 0.10
D: 0.20
E: 0.55
You transform it into 4 binary distributions:
1. A/E: 0.20/0.80
2. B/E: 0.40/0.60
3. C/E: 0.40/0.60
4. D/E: 0.80/0.20
Select uniformly from the n-1 distributions, and then select the first or second symbol based on the probability of each in the chosen binary distribution.
Code for this is here
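A minimal Python sketch of this scheme, hardcoding the four binary distributions from the example above:
import random

# (symbol, alias, P(symbol)): the n-1 = 4 binary distributions worked out above
columns = [("A", "E", 0.20), ("B", "E", 0.40), ("C", "E", 0.40), ("D", "E", 0.80)]

def draw():
    first, second, p_first = random.choice(columns)  # select a column uniformly
    return first if random.random() < p_first else second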
It actually depends on the distribution. The most general way is the following. Let P(X) be the probability that a random number generated according to your distribution is less than X.
You start by generating a uniform random X between zero and one. After that you find Y such that P(Y) = X and output Y. You can find such a Y using binary search, since P(X) is an increasing function of X.
This is not very efficient, but it works for any distribution where P(X) can be computed efficiently.
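A rough Python sketch of that binary search, using the exponential CDF P(x) = 1 - exp(-x) as a stand-in example:
import math
import random

def invert_by_bisection(P, u, lo, hi, tol=1e-12):
    # Find Y with P(Y) ~= u; P must be nondecreasing on [lo, hi]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if P(mid) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

P = lambda x: 1 - math.exp(-x)  # example CDF
y = invert_by_bisection(P, random.random(), 0.0, 50.0)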
You can look up inverse transform sampling and rejection sampling, as well as the book by Devroye, Non-Uniform Random Variate Generation, Springer-Verlag, 1986.
You can convert from discrete bins to float/double with interpolation. Simple linear interpolation works well. If your table memory is constrained, other interpolation methods can be used.
It's a standard textbook matter. See here for some code, or here at Section 3.2 for some reference mathematical background (actually very quick and simple to read).