Calculate integral of product of normal distributions efficiently [closed] - algorithm

Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I've got two normal PDFs, given by μ1, μ2, σ1 and σ2. What I need is the integral over the product of these functions - the solution to the problem that if X occurred at μ1 with a certain probability expressed in σ1 and Y occurred at μ2 with a certain probability, what's the probability P(X=Y)?
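x = linspace(-500, 500, 1000);
e1 = normpdf(x, mu1, sigma1);
e2 = normpdf(x, mu2, sigma2);
% element-wise product, scaled by the grid spacing to approximate the integral
solution = sum(e1 .* e2) * (x(2) - x(1));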
To visualise, e1 is blue, e2 green, and e1*e2 is red (magnified by factor 100 for visualisation):
Is there however a more direct way of computing solution given mu1, mu2, sigma1 and sigma2?
Thanks!

You should be able to do the integral easily enough, but it does not mean what you think it means.
A mathematical normal distribution yields a randomly chosen real, which you could think of as containing an infinite number of random digits after the decimal point. The chance of any two numbers from such distributions being the same (even if they are from the same distribution) is zero.
A continuous probability density function p(x) like the normal distribution does not give, at p(x), the probability of the random number being x. Roughly speaking, it says that if you have a small interval of width delta-x at x then the probability of a random number being inside that interval is delta-x times p(x). For exact equality, you have to set delta-x to zero, so again you come out with probability zero.
To compute the integral (whatever it means), note that N(x; u, o) = exp(-(x-u)^2 / (2o^2)), neglecting the normalising factor 1/(o*sqrt(2*pi)) (see http://en.wikipedia.org/wiki/Normal_distribution). If you multiply two of these together, you can add the exponents. With enough algebra you end up with another exponential with a quadratic inside, which is another normal distribution up to some factors that you can pull outside the integral sign.
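Carrying that algebra through gives a known closed form: the integral of the product equals a normal density with mean 0 and variance σ1² + σ2², evaluated at μ1 - μ2. The sketch below is only an illustration of that identity; the function names are made up for this example, and the Riemann sum is there purely as a sanity check on a grid like the one in the question.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def product_integral(mu1, sigma1, mu2, sigma2):
    """Closed form of the integral over x of N(x; mu1, sigma1) * N(x; mu2, sigma2)."""
    # hypot(sigma1, sigma2) is sqrt(sigma1^2 + sigma2^2)
    return normal_pdf(mu1 - mu2, 0.0, math.hypot(sigma1, sigma2))

def product_integral_numeric(mu1, sigma1, mu2, sigma2, lo=-500.0, hi=500.0, n=20001):
    """Riemann-sum approximation on a uniform grid, used only as a check."""
    dx = (hi - lo) / (n - 1)
    total = 0.0
    for i in range(n):
        x = lo + i * dx
        total += normal_pdf(x, mu1, sigma1) * normal_pdf(x, mu2, sigma2)
    return total * dx
```

Note that the result is a density value, not a probability, which is consistent with the point above that P(X=Y) itself is zero.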
A better way of approaching this problem is to note that the difference of two independent normally distributed variables with means M1 and M2 and variances V1 and V2 is itself normally distributed, with mean M1 - M2 and variance V1 + V2. You could work with this distribution: it is easy to compute the probability that the difference of your two numbers falls within any range you choose, for example between -0.0001 and +0.0001.
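As a rough sketch of that idea, assuming X and Y are independent, the probability that X - Y lands in a chosen range follows from the normal CDF. The helper names here are hypothetical and only the standard library is used.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2) at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_difference_within(mu1, sigma1, mu2, sigma2, lo, hi):
    """P(lo <= X - Y <= hi) for independent X ~ N(mu1, sigma1^2), Y ~ N(mu2, sigma2^2)."""
    mean = mu1 - mu2                                 # mean of the difference
    std = math.sqrt(sigma1 ** 2 + sigma2 ** 2)       # variances add
    return normal_cdf(hi, mean, std) - normal_cdf(lo, mean, std)
```

For example, `prob_difference_within(mu1, sigma1, mu2, sigma2, -0.0001, 0.0001)` gives the chance that the two values agree to within 0.0001.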

Fast hill climbing algorithm that can stabilize when near optimal [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I have a floating point number x in [1, 500] that generates a binary y which is 1 with some probability p. I'm trying to find the x that generates the most 1s, i.e. the x with the highest p. I'm assuming there's only one maximum.
Is there an algorithm that converges quickly to the x with the highest p while making sure it doesn't jump around too much once it gets there, e.g. once it is within 0.1% of the optimal x? Specifically, it would be great if it stabilizes when it is within 0.1% of the optimal x.
I know we can do this with simulated annealing but I don't think I should hard code temperature because I need to use the same algorithm when x could be from [1, 3000] or the p distribution is different.
This paper provides a smart hill-climbing algorithm. The idea is basically to take n samples as starting points. The algorithm is as follows (simplified to one dimension for your problem):
1. Take n sample points in the search space. The paper uses Latin Hypercube Sampling, since the data in the paper is assumed to have many dimensions. In your case, since the problem is one-dimensional, you can just use ordinary random sampling.
2. For each sample point, gather points from its "local neighborhood" and find a best-fit quadratic curve. Find the new maximum candidate from the quadratic curve. If the objective function at the new maximum candidate is actually higher than at the previous point, update the sample point to the new maximum candidate. Repeat this step with a smaller "local neighborhood" size on each iteration.
3. Use the best point from the sample points.
4. Restart: repeat steps 2 and 3, and then compare the maxima. If there is no improvement, stop. If there is improvement, repeat again.
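The steps above can be sketched in one dimension as follows. This is a simplified illustration, not the paper's implementation: it fits the parabola through just three points instead of a least-squares fit over a neighborhood, it leaves out the restart step, and it assumes `f` is a reasonably smooth estimate of p (for instance, the fraction of 1s over many trials at x). All names and default parameters are hypothetical.

```python
import random

def quadratic_vertex(f, x, h):
    """Vertex of the parabola through x-h, x, x+h, or None if it has no maximum."""
    fl, fc, fr = f(x - h), f(x), f(x + h)
    curvature = fl - 2.0 * fc + fr
    if curvature >= 0.0:  # flat or convex fit: no maximum to jump to
        return None
    return x + 0.5 * h * (fl - fr) / curvature

def smart_hill_climb(f, lo, hi, n_samples=5, iters=40, shrink=0.7, seed=0):
    rng = random.Random(seed)
    best_x = None
    for _ in range(n_samples):                 # random starting points
        x = rng.uniform(lo, hi)
        h = 0.25 * (hi - lo)                   # initial neighborhood half-width
        for _ in range(iters):                 # refine with a shrinking neighborhood
            candidates = [max(lo, x - h), min(hi, x + h)]
            step = min(h, x - lo, hi - x)      # keep the three fit points inside [lo, hi]
            if step > 0.0:
                vertex = quadratic_vertex(f, x, step)
                if vertex is not None:
                    candidates.append(min(max(vertex, lo), hi))
            challenger = max(candidates, key=f)
            if f(challenger) > f(x):           # only move when the objective improves
                x = challenger
            h *= shrink
        if best_x is None or f(x) > f(best_x): # keep the best refined point
            best_x = x
    return best_x
```

Because the neighborhood shrinks geometrically and a move is accepted only when it improves the objective, the estimate settles down instead of jumping around; the scale is taken from `hi - lo`, so the same code applies to [1, 3000] without a hard-coded temperature.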

Closest point to another point on a hypersphere [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I have n (about 10^5) points on a hypersphere of dimension m (between 10^4 and 10^6).
I am going to make a bunch of queries of the form "given a point p, find the closest of the n points to p". I'll make about n of these queries.
(Not sure if the hypersphere fact helps at all.)
The simple naive algorithm to solve this is, for each query, to compare p to all other n points. Doing this n times ends up with a runtime of O(n^2 m), which is far too big for me to be able to compute.
Is there a more efficient algorithm I can use? If I could get it to O(nm) with some log factors that'd be great.
Probably not. Having many dimensions makes efficient indexing extremely hard. That is why people look for opportunities to reduce the number of dimensions to something manageable.
See https://en.wikipedia.org/wiki/Curse_of_dimensionality and https://en.wikipedia.org/wiki/Dimensionality_reduction for more.
Divide your space up into hypercubes -- call these cells -- with edge size chosen so that on average you'll have one point per cube. You'll want a map from hypercells to the set of points they contain.
Then, given a point, check its hypercell for other points. If it is empty, look at the adjacent hypercells (I'd recommend a literal hypercube of hypercells for simplicity rather than some approximation to a hypersphere built out of hypercells). Check that for other points. Keep repeating until you get a point. Assuming your points are randomly distributed, odds are high that you'll find a second point within 1-2 expansions.
Once you find a point, check all hypercells that could possibly contain a closer point. This is necessary because the point you found may be in a corner of its cell, so a closer point may still lie outside the hypercube of hypercells you've inspected so far.
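A minimal sketch of the cell-based search, with hypothetical function names, might look like this. It is only practical in low dimensions: a shell of adjacent cells contains on the order of 3^m cells, which is exactly the dimensionality problem mentioned above, so treat it as an illustration of the idea rather than a solution for m around 10^4.

```python
import itertools
import math
from collections import defaultdict

def build_grid(points, cell):
    """Map each hypercell (tuple of integer coordinates) to the points inside it."""
    grid = defaultdict(list)
    for p in points:
        grid[tuple(math.floor(c / cell) for c in p)].append(p)
    return grid

def nearest(grid, cell, q, max_radius=8):
    """Closest stored point to q, searching shells of cells around q's cell."""
    home = tuple(math.floor(c / cell) for c in q)
    best, best_d = None, math.inf
    for r in range(max_radius + 1):
        # visit only the cells whose largest coordinate offset is exactly r
        for off in itertools.product(range(-r, r + 1), repeat=len(q)):
            if max(abs(o) for o in off) != r:
                continue
            key = tuple(h + o for h, o in zip(home, off))
            for p in grid.get(key, ()):
                d = math.dist(p, q)
                if d < best_d:
                    best, best_d = p, d
        # every point in a shell beyond r is at least r * cell away from q
        if best is not None and best_d <= r * cell:
            return best
    return best  # not verified (or None) if nothing was confirmed within max_radius
```

The early return is the "check all hypercells that could contain a closer point" step: the search keeps expanding until the best distance found so far cannot be beaten by any unvisited cell.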

How to select the number of cluster centroid in K means [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I am going through a list of algorithms that I found and trying to implement them for learning purposes. Right now I am coding k-means and am confused about the following.
How do you know how many clusters there are in the original data set?
Is there any particular format that I have to follow in choosing the initial cluster centroids, besides all centroids having to be different? For example, does the algorithm converge if I choose cluster centroids that are different but close together?
Any advice would be appreciated
Thanks
With k-means you are minimizing a sum of squared distances. One approach is to try all plausible values of k. As k increases the sum of squared distances should decrease, but if you plot the result you may see that the sum of squared distances decreases quite sharply up to some value of k, and then much more slowly after that. The last value that gave you a sharp decrease is then the most plausible value of k.
k-means isn't guaranteed to find the best possible answer on each run, and it is sensitive to the starting values you give it. One way to reduce problems from this is to start it many times, with different starting values, and pick the best answer. It looks a bit odd if the sum of squared distances found for a larger k is actually larger than the one found for a smaller k. One way to avoid this is to use the best answer found for k clusters as the basis (with slight modifications) for one of the starting points for k+1 clusters.
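The restart idea, and the sum of squared distances you would plot against k, can be sketched as below. This is a plain Lloyd's-algorithm illustration with hypothetical names; it assumes the points are tuples of numbers held in a list.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=100):
    """One run of Lloyd's algorithm from k distinct random data points.

    Returns (sum of squared distances, centroids).
    """
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # mean of each cluster; an empty cluster keeps its old centroid
        new_centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return sse, centroids

def best_of_restarts(points, k, restarts=10, seed=0):
    """Run k-means several times and keep the run with the smallest SSE."""
    rng = random.Random(seed)
    return min((kmeans(points, k, rng) for _ in range(restarts)), key=lambda r: r[0])
```

Calling `best_of_restarts(points, k)[0]` for each plausible k gives the values to plot when looking for the point where the decrease flattens out.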
In standard k-means the value of K is chosen by you, sometimes based on the problem itself (when you know how many classes exist or how many classes you want to exist), other times as a "more or less" random value. Typically the first iteration consists of randomly selecting K points from the dataset to serve as centroids. In the following iterations the centroids are adjusted.
After looking at the k-means algorithm, I suggest you also look at k-means++, which improves on the first version by choosing the initial centroids more carefully (spread out across the data), avoiding the sometimes poor clusterings found by the standard k-means algorithm. Note that it does not choose K for you.
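For reference, the k-means++ seeding step can be sketched as follows: the first centroid is picked uniformly at random, and each later one is picked with probability proportional to its squared distance from the nearest centroid chosen so far. The function name is hypothetical, and the sketch assumes the data contains at least k distinct points.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_init(points, k, rng):
    """Choose k initial centroids from points using k-means++ seeding."""
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        # weight each point by its squared distance to the closest chosen centroid;
        # already-chosen points get weight 0 and cannot be picked again
        weights = [min(sq_dist(p, c) for c in centroids) for p in points]
        # assumption: some weight is positive, i.e. there are still distinct points left
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids
```

The returned list can be used in place of the purely random starting centroids; the number of clusters k still has to be supplied.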
If you need more specific details on implementation of some machine learning algorithm, please let me know.

Polynomial Regression - results accuracy between two algorithms [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 9 years ago.
I know that I can find a polynomial regression's coefficients doing (X'X)^-1 * X'y (where X' is the transpose, see Wikipedia for details).
This is a way of finding the coefficients; now, there is (as far as I know) at least one other way, which is by minimizing a cost function using gradient descent. The former method seems to be the easiest to implement ( I did it in C++, I have the latter in Matlab ).
What I wanted to know is the advantage of one of these methods over the other.
On a particular dataset with very few points, I couldn't find a satisfactory solution using (X'X)^-1 * X'y, but gradient descent worked fine and gave me an estimation function that made sense.
So what's wrong with the matrix solution compared to gradient descent? And how would one test a regression's results when all the details are hidden from the user?
Both methods minimize the same least-squares cost, so in exact arithmetic they give the same coefficients. For large problems the iterative method can be much more computationally efficient, thanks to lower storage and the avoidance of a matrix inverse calculation. It outperforms the closed-form (matrix equation) method especially when X is huge and sparse.
Make sure X has at least as many rows as columns to avoid an underdetermined problem. Also check the condition number of X'X to see whether the problem is ill-conditioned. If that is the case, you may add a small regularization term to the closed form, (X'X + lambda * I)^(-1) * X'y, where lambda is a small value and I is the identity matrix.
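A small pure-Python sketch of the regularized closed form is given below, under the assumption that the polynomial basis is 1, x, x^2, and so on. The function names are hypothetical, and the linear system is solved by Gaussian elimination rather than by forming an explicit inverse.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the matrix is singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def polyfit_ridge(xs, ys, degree, lam=0.0):
    """Coefficients (constant term first) solving (X'X + lam * I) w = X'y."""
    n = degree + 1
    X = [[x ** p for p in range(n)] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(n)]
    return solve_linear(xtx, xty)
```

With `lam=0.0` this is the plain normal-equation solution; a small positive `lam` trades a little bias for a better-conditioned system, which may help on datasets with very few points.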

generating random variable having an exponential density function [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I would like to generate a random variable having an exponential density function:
f(x) = e^x / (e - 1), 0 <= x <= 1
I know I can use a uniform random number generator with the inversion method for a simple function like e^-x, but I am not sure how to apply it to the function given above.
Any suggestions?
Per Wolfram Alpha, the integral of that density function from 0 to a is (e^a-1)/(e-1), which inverts to y=log((e-1)*x+1). So the inverse transform method should work fine.
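As a minimal sketch of that inverse transform (the function name is hypothetical): draw u uniformly from [0, 1) and map it through x = log((e-1)*u + 1).

```python
import math
import random

def sample_exp_density(rng):
    """Draw one sample from f(x) = e^x / (e - 1) on [0, 1] by inverse transform."""
    u = rng.random()                          # uniform in [0, 1)
    # log1p(t) computes log(1 + t); this inverts the CDF (e^x - 1) / (e - 1)
    return math.log1p((math.e - 1.0) * u)
```

Usage is simply `sample_exp_density(random.Random())`, called once per sample; the sample mean should approach 1/(e-1), the mean of this density.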
In the more general case where the integral doesn't pan out or the inversion doesn't pan out, stochastic sampling methods are the most widely applicable methods for sampling a random variable given its probability density. The easiest to understand and implement is Rejection Sampling. After that, you're looking at Metropolis-Hastings, which is immensely powerful but not necessarily the simplest to get your head around.
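For comparison, here is a rough rejection-sampling sketch for the same density, using a uniform proposal on [0, 1]. Since f(x) is at most e/(e-1) on that interval, a proposed x is accepted with probability f(x) / (e/(e-1)) = e^(x-1). The function name is hypothetical.

```python
import math
import random

def sample_by_rejection(rng):
    """Draw one sample from f(x) = e^x / (e - 1) on [0, 1] by rejection sampling."""
    while True:
        x = rng.random()                      # proposal: uniform on [0, 1)
        # accept with probability f(x) / max f = e^(x - 1)
        if rng.random() < math.exp(x - 1.0):
            return x
```

This needs no integral or inverse, only an upper bound on the density, at the cost of discarding some proposals.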
The first step is to integrate f(x) from 0 to x to determine the cumulative distribution function; call this function U(x). Then draw a (pseudo-)random number u uniformly from [0, 1] and find the x that satisfies U(x) = u.
Your function appears to be simple enough that direct inversion will work. If you have a more complicated function, you could use a numerical method such as Newton-Raphson to solve U(x) = u for x.
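To illustrate the numerical route, the sketch below applies Newton-Raphson to the cumulative distribution function of the density from the question, even though direct inversion works here; that makes it easy to compare against the closed form. The function names, starting point and tolerances are assumptions made for this example.

```python
import math

def cdf(x):
    """Cumulative distribution function of f(x) = e^x / (e - 1) on [0, 1]."""
    return (math.exp(x) - 1.0) / (math.e - 1.0)

def pdf(x):
    """The density itself, which is the derivative of the CDF."""
    return math.exp(x) / (math.e - 1.0)

def invert_cdf_newton(u, x0=0.5, tol=1e-12, max_iter=50):
    """Solve cdf(x) = u for x in [0, 1] by Newton-Raphson iteration."""
    x = x0
    for _ in range(max_iter):
        step = (cdf(x) - u) / pdf(x)          # Newton step for cdf(x) - u = 0
        x = min(max(x - step, 0.0), 1.0)      # keep the iterate inside the support
        if abs(step) < tol:
            break
    return x
```

Feeding a uniform random u into `invert_cdf_newton` then yields a sample; for a different density only `cdf` and `pdf` would need to change.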
