how to define the probability distribution - probability

I have a small question and I will be very happy if you can give me a solution or any idea for how to find the probability distribution in the following setting:
I have a random variable x which follows an exponential distribution with parameter lambda1, and another variable y which follows an exponential distribution with parameter lambda2. z is a discrete value. How can I define the probability distribution of k in the following formula?
k=z-x-y
Thank you so much

OK, let's start by rewriting the formula a bit:
k = z - x - y = -(x + y) + z = -(x + y - z)
The part in the parentheses looks manageable. Let's start with x + y. For random variables x and y, the PDF of their sum is the convolution of their PDFs.
q = x + y
PDF_q(q) = ∫ PDF_x(q − t) PDF_y(t) dt
For x and y exponential, this convolution integral is known: when the lambdas differ it gives the hypoexponential density PDF_q(q) = λ1 λ2 / (λ2 − λ1) · (e^(−λ1 q) − e^(−λ2 q)) for q ≥ 0, and when the lambdas are equal it gives Gamma(2, λ), the Gamma distribution with density λ² q e^(−λ q).
If z is some constant discrete value, then we can express the term −z as a continuous RV with PDF
PDF(t) = 𝛿(t + z)
where 𝛿 is the Dirac delta function, and the peak is at t = −z as expected. It is normalized, so its integral over t is equal to 1. This extends easily to a discrete RV: a sum of 𝛿-functions at the possible values, each multiplied by its probability, with the probabilities summing to 1.
Again we have a sum of two RVs with known PDFs, and the solution is a convolution, which is easy to compute thanks to the sifting property of the 𝛿-function. So the PDF of x + y + (−z) is
PDF_q(q + z)
where PDF_q is the sum density above (the hypoexponential expression, or the Gamma distribution in the equal-lambda case). Finally, since k = −(x + y − z), you just negate the argument:
PDF_k(k) = PDF_q(z − k)
and that's it.
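As a quick sanity check, here is a small Python sketch (my own; the parameter values are arbitrary choices, not from the question) comparing a histogram of simulated k = z − x − y against the closed-form PDF_k(k) = PDF_q(z − k):

import numpy as np

# arbitrary example parameters (assumptions for illustration)
lam1, lam2, z = 1.0, 2.0, 5.0
rng = np.random.default_rng(0)

# simulate k = z - x - y (numpy's exponential takes the scale 1/lambda)
x = rng.exponential(1 / lam1, 1_000_000)
y = rng.exponential(1 / lam2, 1_000_000)
k = z - x - y

def pdf_k(kv):
    # PDF_k(k) = PDF_q(z - k); hypoexponential, since lam1 != lam2 here
    q = np.maximum(z - kv, 0.0)
    return np.where(z - kv >= 0,
                    lam1 * lam2 / (lam2 - lam1)
                    * (np.exp(-lam1 * q) - np.exp(-lam2 * q)),
                    0.0)

hist, edges = np.histogram(k, bins=100, density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - pdf_k(centers))))  # should be close to 0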

Related

Implement density function

I am going through my book; it states: "Write a sampling algorithm for this density function"
y = x^2 + (2/3)*x + 1/3, 0 < x < 1
Or can I use Monte Carlo?
Any help would be appreciated!
I'm assuming you mean you want to generate random x values that have the distribution specified by density y(x).
It's often desirable to derive the cumulative distribution function by integrating the density, and to use inverse transform sampling to generate x values. In your case the CDF is a third-order polynomial that doesn't factor to yield a simple cube-root solution, so you would have to use a numerical solver to find the inverse. Time to consider alternatives.
Another option is to use the acceptance/rejection method. After checking the derivative, it's clear that your density is convex, so it's easy to create a bounding function b(x) by drawing a straight line from f(0) to f(1). This yields b(x) = 1/3 + 5x/3. This bounding function has area 7/6, while your f(x) has an area of 1, since it is a valid density. Consequently, 6/7 of points generated uniformly under b(x) will also fall under f(x), and only 1 out of 7 attempts will fail in the rejection scheme. [Plot of f(x) and b(x) omitted.]
Since b(x) is linear, it is easy to generate x values using it as a distribution after scaling by 6/7 to make it a valid distribution function. The algorithm, expressed in pseudocode, then becomes:
function generate():
    while TRUE:
        x <- (sqrt(1 + 35 * U(0,1)) - 1) / 5  # inverse CDF transform of b(x)
        if U(0, b(x)) <= f(x):
            return x
    end while
end function
where U(a,b) means generate a value uniformly distributed between a and b, f(x) is your density, and b(x) is the bounding function described above.
I implemented the algorithm described above to generate 100,000 candidate values, of which 14,199 (~1/7) were rejected, as expected. [Histogram of the accepted values omitted; compare it to f(x) above.]
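A minimal Python version of the pseudocode above (my own transcription, not the original implementation):

import math
import random

def f(x):
    # target density on (0, 1)
    return x * x + (2.0 / 3.0) * x + 1.0 / 3.0

def b(x):
    # linear bounding function through f(0) = 1/3 and f(1) = 2
    return 1.0 / 3.0 + 5.0 * x / 3.0

def generate():
    while True:
        # inverse-CDF draw from b(x) rescaled to a valid density
        x = (math.sqrt(1.0 + 35.0 * random.random()) - 1.0) / 5.0
        # accept with probability f(x) / b(x)
        if random.uniform(0.0, b(x)) <= f(x):
            return x

samples = [generate() for _ in range(10_000)]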
I'm assuming that you have a function y(x) which takes a value in [0, 1] and returns the value of y. You just need to provide a random value of x and return the corresponding value of y.

import numpy

def getSample():
    # get a uniform random number in [0, 1)
    x = numpy.random.random()
    # sample my custom function
    return y(x)

Probability normal distribution with an equation P(|x-3| > 5) for X~N(2,6)

I'm confused about how to go about solving this problem. I don't quite understand what |x-3| represents in this case, and how it impacts the outcome when the variable is normally distributed. What would be the steps required to solve this?
It is the absolute value: P(|X−3|>5) means that, out of the whole (−∞, +∞) range, the interval centred at x = 3 with half-width 5 is excluded.
So you have X in the ranges (−∞, −2] and [8, +∞).
Given the N(x; 2, 6) distribution, the probability is a sum of two integrals:
P(|X−3|>5) = ∫[−∞, −2] N(x; 2, 6) dx + ∫[8, +∞] N(x; 2, 6) dx
or, equivalently,
P(|X−3|>5) = 1 − ∫[−2, 8] N(x; 2, 6) dx
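A quick numeric check in Python (a sketch; it assumes the 6 in N(2, 6) is the variance; if it is the standard deviation, use scale=6 instead):

from scipy.stats import norm

mu, var = 2.0, 6.0
sigma = var ** 0.5

# P(|X - 3| > 5) = P(X <= -2) + P(X >= 8)
p = norm.cdf(-2, loc=mu, scale=sigma) + norm.sf(8, loc=mu, scale=sigma)
print(p)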

Sliding weighted randomization from number range?

When picking a random number from a range, I'm doing rand(0..100). That works all well and good, but I'd like it to favor the lower end of the range.
So, there's the highest probability of picking 0 and the lowest probability of picking 100 (and then everything in between), based on some weighted scale.
How would I implement that in Ruby?
You could try taking the lower of two random numbers. That would favour smaller numbers.
[rand(0..100), rand(0..100)].min
If your first number is 5, the chance of the second number being lower (and thus being the one returned) is only 5 in 101.
If your first number is 95, the chance of the second number being lower is 95 in 101, so it is very likely to be replaced by the lower number.
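To quantify the bias: for two independent draws from 0..100, P(min ≥ k) = ((101 − k)/101)², so P(min = k) = ((101 − k)² − (100 − k)²)/101² = (201 − 2k)/10201. This decreases linearly from about 0.0197 at k = 0 down to about 0.0001 at k = 100, which is exactly the "favor the lower end" behaviour asked for.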
My answer concerns the generation of random variates from underlying probability distributions generally, not just those distributions that give greater weight to smaller random variates.
You need to identify a (probability) density function f that has the desired shape. Then construct its (cumulative) distribution function F and the latter's inverse function G (the quantile function), meaning that G(F(x)) = x for all x in the sample space. f can be continuous or discrete.
For example, f and F could be the (negative) exponential density and distribution functions, which give higher weight to smaller values. [Exponential PDF and CDF plots omitted; see the Wikipedia article on the exponential distribution.]
These functions are given by f(x) = λe^(−λx) and F(x) = 1 − e^(−λx), respectively, where e is the base of natural logarithms and λ is a rate parameter.
To generate random variates for this distribution we would draw a (pseudo-) random number between 0 and 1, mark that on the vertical axis of the CDF graph and draw a horizontal line from that point. The random variate is the point on the horizontal axis where the CDF intersects the horizontal line. If y is the random number between 0 and 1, we have
y = 1 − e**(−λx)
Solving for x,
x = -log(1 - y)/λ
so the inverse CDF is seen to be
g(y) = -log(1 - y)/λ
Here are some random variates for λ = 1.
def g(y)
  -Math.log(1 - y)
end
5.times { y = rand; puts "y = #{y.round(2)}, x = #{g(y).round(2)}" }
y = 0.09, x = 0.10
y = 0.67, x = 1.09
y = 0.35, x = 0.43
y = 0.55, x = 0.79
y = 0.19, x = 0.21
Most CDFs do not have closed-form inverse functions, but if the CDF is continuous, a binary search can be performed to compute an arbitrarily-close approximation to the random variate (x on the graph) for a given y = rand.
The Weibull Distribution is one of the few other continuous distributions (besides uniform and triangular) that has a closed-form inverse function. Having two parameters, it offers greater scope than the single-parameter exponential distribution for modelling a desired shape.
For discrete CDFs, one can use if statements (or, better, a case statement) to compute the random variate for a given y = rand.
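For example, a minimal sketch of the discrete case (in Python for brevity; a Ruby case statement works the same way, and the probabilities here are made up):

import random

def discrete_variate():
    y = random.random()
    # cumulative thresholds for made-up probabilities P(A)=0.2, P(B)=0.5, P(C)=0.3
    if y < 0.2:
        return "A"
    elif y < 0.7:
        return "B"
    else:
        return "C"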
I'd do something like this:
low_end_of_range = 1
high_end_of_range = 100
weighted_range = []

(low_end_of_range..high_end_of_range).each do |num|
  weight = (high_end_of_range - num) + 1
  weight.times do
    weighted_range << num
  end
end

weighted_range.sample
This will give:
1 the highest probability of being picked, as it would appear 100 times in the weighted_range array,
2 the second highest probability of being picked, as it would appear 99 times in the weighted_range array,
100 the lowest probability of being picked, as it would appear only once in the weighted_range array, and
99 the second lowest probability of being picked, as it would appear twice in the weighted_range array,
etc.
And if you don't need any flexibility in the size of your sampling (i.e. low_end_of_range / high_end_of_range), you can do it in a nice one-liner:
(1..100).map { |i| (101 - i).times.map { i } }.flatten.sample
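As an aside, the same weighting can be drawn without materialising the 5050-element array; for example, a Python sketch using the standard library:

import random

values = list(range(1, 101))
weights = [101 - v for v in values]  # value 1 gets weight 100, value 100 gets weight 1
print(random.choices(values, weights=weights, k=1)[0])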

least square line fitting in 4D space

I have a set of points like:
(x , y , z , t)
(1 , 3 , 6 , 0.5)
(1.5 , 4 , 6.5 , 1)
(3.5 , 7 , 8 , 1.5)
(4 , 7.25 , 9 , 2)
I am looking to find the best linear fit on these points, let say a function like:
f(t) = a*x + b*y + c*z
This is a linear regression problem. The "best fit" depends on the metric you define for "better".
One simple example is the least-squares metric, which aims to minimize the sum of squared errors (f(x_i, y_i, z_i) − w_i)^2, where w_i is the measured value for sample i.
So, in least squares you are trying to minimize SUM over i of (a*x_i + b*y_i + c*z_i − w_i)^2. This function has a single global minimum at:
(a,b,c) = (X^T * X)^-1 * X^T * w
Where:
X is an m×3 matrix whose rows are the samples (m is the number of samples you have)
X^T - is the transposed of this matrix
w - is the measured results: `(w_1,w_2,...,w_m)`
The * operator represents matrix multiplication
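As a concrete sketch, here is the computation in numpy using the sample points from the question (taking w as the t column):

import numpy as np

# rows are the samples (x_i, y_i, z_i); w holds the measured values
X = np.array([[1.0, 3.0,  6.0],
              [1.5, 4.0,  6.5],
              [3.5, 7.0,  8.0],
              [4.0, 7.25, 9.0]])
w = np.array([0.5, 1.0, 1.5, 2.0])

# (a, b, c) = (X^T X)^-1 X^T w, computed stably via lstsq
abc, *_ = np.linalg.lstsq(X, w, rcond=None)
print(abc)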
There are other, more complex methods that use different distance metrics; one example is the well-known SVR with a linear kernel.
It seems that you are looking for the major axis of a point cloud.
You can work this out by finding the eigenvector associated with the largest eigenvalue of the covariance matrix. This could be an opportunity to use the power method (starting the iterations with the point farthest from the centroid, for example).
It can also be addressed by singular value decomposition, preferably using methods that compute only the largest singular values.
If your data set contains outliers, then RANSAC could be a better choice: take two points at random and compute the sum of distances to the line they define. Repeat a number of times and keep the best fit.
Using the squared distances will answer your request for least-squares, but non-squared distances will be more robust.
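A compact numpy sketch of the SVD approach, fitting a line through the 4-D points from the question:

import numpy as np

# the (x, y, z, t) points from the question
P = np.array([[1.0, 3.0,  6.0, 0.5],
              [1.5, 4.0,  6.5, 1.0],
              [3.5, 7.0,  8.0, 1.5],
              [4.0, 7.25, 9.0, 2.0]])

centroid = P.mean(axis=0)
# the first right-singular vector of the centred data is the major axis
_, _, Vt = np.linalg.svd(P - centroid)
direction = Vt[0]
print(centroid, direction)  # best-fit line: centroid + s * direction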
You have a linear problem.
For example, suppose the equation is Y = a*x1 + b*x2 + c*x3.
In MATLAB:
B = [x1(:) x2(:) x3(:)] \ Y;
Y_fit = [x1(:) x2(:) x3(:)] * B;
In Python:
import numpy as np

A = np.column_stack([x1.ravel(), x2.ravel(), x3.ravel()])
B, *_ = np.linalg.lstsq(A, Y, rcond=None)
Y_fit = A @ B

Generate random numbers according to distributions

I want to generate random numbers according to some distributions. How can I do this?
The standard random number generator you've got (rand() in C after a simple transformation, equivalents in many languages) is a fairly good approximation to a uniform distribution over the range [0,1]. If that's what you need, you're done. It's also trivial to convert that to a random number generated over a somewhat larger integer range.
Conversion of a Uniform distribution to a Normal distribution has already been covered on SO, as has going to the Exponential distribution.
[EDIT]: For the triangular distribution, converting a uniform variable is relatively simple (in something C-like):
double triangular(double a, double b, double c) {
    /* a = min, b = max, c = mode; inverse-CDF of the triangular distribution */
    double U = rand() / (double) RAND_MAX;
    double F = (c - a) / (b - a);
    if (U <= F)
        return a + sqrt(U * (b - a) * (c - a));
    else
        return b - sqrt((1 - U) * (b - a) * (b - c));
}
That's just converting the formula given on the Wikipedia page. If you want others, that's the place to start looking; in general, you use the uniform variable to pick a point on the vertical axis of the cumulative distribution function of the distribution you want (assuming it's continuous), and invert the CDF to get the random value with the desired distribution.
The right way to do this is to decompose the distribution into n-1 binary distributions. That is if you have a distribution like this:
A: 0.05
B: 0.10
C: 0.10
D: 0.20
E: 0.55
You transform it into 4 binary distributions:
1. A/E: 0.20/0.80
2. B/E: 0.40/0.60
3. C/E: 0.40/0.60
4. D/E: 0.80/0.20
Select uniformly from the n-1 distributions, and then select the first or second symbol based on the probability of each in the chosen binary distribution.
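A minimal Python sketch of this scheme (hard-coding the example decomposition above):

import random

# each table is (symbol_1, symbol_2, P(symbol_1 within the table))
tables = [("A", "E", 0.20),
          ("B", "E", 0.40),
          ("C", "E", 0.40),
          ("D", "E", 0.80)]

def sample():
    first, second, p_first = random.choice(tables)  # uniform over the n-1 tables
    return first if random.random() < p_first else second

Each table carries total mass 1/(n-1) = 1/4, so e.g. P(A) = 0.25 · 0.20 = 0.05 and P(E) = 0.25 · (0.80 + 0.60 + 0.60 + 0.20) = 0.55, matching the original distribution.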
It actually depends on the distribution. The most general way is the following. Let P(X) be the probability that a random number generated according to your distribution is less than X.
You start by generating a uniform random X between zero and one. After that you find Y such that P(Y) = X and output Y. You can find such a Y using binary search, since P(X) is an increasing function of X.
This is not very efficient, but it works for any distribution where P(X) can be computed efficiently.
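For example, a minimal bisection sketch in Python (assuming P is available as a function and [lo, hi] brackets the answer):

def inverse_cdf(P, x, lo, hi, tol=1e-9):
    # find Y with P(Y) ~= x by binary search; P must be nondecreasing
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if P(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# example: an exponential variate via its CDF
# import math, random
# y = inverse_cdf(lambda t: 1 - math.exp(-t), random.random(), 0.0, 100.0)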
You can look up inverse transform sampling and rejection sampling, as well as the book by Devroye, "Non-Uniform Random Variate Generation" (Springer-Verlag, 1986).
You can convert from discrete bins to float/double with interpolation. Simple linear interpolation works well. If your table memory is constrained, other interpolation methods can be used.
It's a standard textbook matter. See here for some code, or here at Section 3.2 for some reference mathematical background (actually very quick and simple to read).
