Efficient method for convolution-like sum evaluation - algorithm

Problem: Given n 3-dimensional points $\{p_1, p_2, \ldots, p_n\}$ where $p_i = (x_i, y_i, z_i)$, I have to find the value of the formula
for some given constant integers P, Q, R, S.
All numbers are between 1 and M (= 100).
I need an efficient method for evaluating this formula.
Please give any idea about how to reduce the complexity below $O(n^2)$.

Assuming that all coordinates are integers between 1 and 100, you could do this as follows:
1. Compute a 3D histogram of all points (O(100*100*100) operations).
2. Use the FFT to convolve the histogram with itself along each of the 3 axes.
This results in a 3D histogram of pairwise difference vectors. You can then iterate over this histogram to compute your desired value.
The main point is that convolving a histogram of values with its mirrored copy yields the histogram of pairwise differences of those values. Convolving it with itself unmirrored yields the histogram of pairwise sums in the same way.
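The histogram idea above can be sketched in one dimension. This is only an illustration under stated assumptions: the values are integers in 1..m, the function names are made up for this example, and `g` is a hypothetical stand-in for whatever per-pair term the formula uses. The plain double loop over histogram bins is where an FFT-based convolution would go for large m.

```python
def difference_histogram(values, m):
    """Count ordered pairs (i, j) by their difference values[i] - values[j].

    Assumes every value is an integer in 1..m. Entry [d + m - 1] of the
    result holds the number of ordered pairs whose difference is d.
    """
    hist = [0] * (m + 1)                  # hist[v] = how many inputs equal v
    for v in values:
        hist[v] += 1
    diffs = [0] * (2 * m - 1)
    # Correlation of hist with itself. An FFT would replace this
    # O(m^2) double loop when m is large.
    for a in range(1, m + 1):
        if hist[a] == 0:
            continue
        for b in range(1, m + 1):
            diffs[a - b + m - 1] += hist[a] * hist[b]
    return diffs


def pairwise_sum(values, m, g):
    """Evaluate the sum of g(values[i] - values[j]) over all ordered pairs."""
    diffs = difference_histogram(values, m)
    return sum(count * g(index - (m - 1))
               for index, count in enumerate(diffs) if count)


xs = [1, 3, 3, 7, 10]
fast = pairwise_sum(xs, 10, abs)                   # via the histogram
slow = sum(abs(a - b) for a in xs for b in xs)     # direct O(n^2) check
```

The cost of `pairwise_sum` depends on m rather than on the number of points once the histogram is built, which is the source of the speed-up when n is much larger than m.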

Your problem looks like a particle potential problem (the kind you have in electrodynamics, for instance), where you have to find some "potential" at the location $(x_j, y_j, z_j)$ by summing the elementary contributions from all particles i.
The fast algorithm specific to this class of problems is the Fast Multipole Method. Look up this keyword, but I must warn you it is by no means simple to understand or implement; a strong math background is needed.


Divide 2D array into continuous regions of as-equal-as-possible sums?

I have a 2D array of floating-point numbers, and I'd like to divide this array into an arbitrary number of regions such that the sums of the regions' elements are more or less equal. The regions must be continuous. By as-equal-as-possible, I mean that the standard deviation of the region sums should be reduced as much as possible.
I'm doing this because I have a map of values corresponding to the "population" in an area, and I want to divide this area into groups of relatively equal population.
I would do it like this:
1. Compute the whole sum.
2. Compute local centers of mass (coordinates).
3. Compute the target region sum, for example:
   region sum = whole sum / number of centers of mass
4. For each center of mass:
   start a region
   and incrementally increase its size until its sum matches the region sum
   avoid intersection of regions (use some map of usage for that)
   if the region has the desired sum or has nowhere to grow, stop
You will have to tweak this algorithm a little to suit your needs and input data.
Hope it helps a little ...
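A rough sketch of the growing step follows. It makes several assumptions: the seed cells are supplied by the caller as stand-ins for the local centers of mass, regions grow breadth-first one cell per turn, and cells are 4-connected. The function name `grow_regions` is hypothetical.

```python
from collections import deque


def grow_regions(grid, seeds):
    """Grow one contiguous region per seed towards an equal share of the total.

    grid is a list of rows of non-negative numbers; seeds is a list of
    distinct (row, col) start cells. Returns (labels, sums), where
    labels[r][c] is a region index, or -1 if the cell was never claimed.
    """
    rows, cols = len(grid), len(grid[0])
    target = sum(map(sum, grid)) / len(seeds)      # desired sum per region
    labels = [[-1] * cols for _ in range(rows)]    # the "map of usage"
    sums = [0.0] * len(seeds)
    frontiers = [deque() for _ in seeds]

    def claim(region, r, c):
        labels[r][c] = region
        sums[region] += grid[r][c]
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1:
                frontiers[region].append((nr, nc))

    for region, (r, c) in enumerate(seeds):
        claim(region, r, c)

    grew = True
    while grew:                                    # stop when nobody can grow
        grew = False
        for region, frontier in enumerate(frontiers):
            # Drop queued cells that some region has claimed in the meantime.
            while frontier and labels[frontier[0][0]][frontier[0][1]] != -1:
                frontier.popleft()
            if sums[region] < target and frontier:
                claim(region, *frontier.popleft())
                grew = True
    return labels, sums


labels, sums = grow_regions([[1, 1, 1, 1] for _ in range(4)], [(0, 0), (3, 3)])
```

Because a region stops as soon as it reaches the target, the last cell it claims can overshoot on uneven data, and cells can remain unclaimed; that is the part that needs tuning for real input.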
Standard deviation is a way to measure whether the divisions are close to equal: the lower the standard deviation, the closer the sums are.
As the problem seems to be NP-hard, like clustering problems, genetic algorithms can be used to get good solutions:
Standard deviation can be used as the fitness measure for chromosomes.
Consider k contiguous regions; each gene (element) then takes one of the k values, subject to the regions remaining contiguous.
Apply a genetic algorithm to the chromosomes and take the best chromosome for that value of k after a fixed number of generations.
Vary k from 2 to n and take the best chromosome found by the genetic algorithm for each k.

Math - 3d positioning/multilateration

I have a problem involving 3d positioning - sort of like GPS. Given a set of known 3d coordinates (x,y,z) and their distances d from an unknown point, I want to find the unknown point. There can be any number of reference points, however there will be at least four.
So, for example, points are in the format (x,y,z,d). I might have:
And here the unknown point would be (0,0,0,0).
What would be the best way to go about this? Is there an existing library that supports 3d multilateration? (I have been unable to find one). Since it's unlikely that my data will have an exact solution (all of the 4+ spheres probably won't have a single perfect intersect point), the algorithm would need to be capable of approximating it.
So far, I was thinking of taking each subset of three points, triangulating the unknown based on those three, and then averaging all of the results. Is there a better way to do this?
You could take a non-linear optimisation approach, by defining a "cost" function that incorporates the distance error from each of your observation points.
Setting the unknown point at (x,y,z), and considering a set of N observation points (xi,yi,zi,di) the following function could be used to characterise the total distance error:
C(x,y,z) = sum( ((x-xi)^2 + (y-yi)^2 + (z-zi)^2 - di^2)^2 )
^^^ for all observation points i = 1 to N
This is the sum of the squared distance errors for all points in the set. (It's actually based on the error in the squared distance, so that there are no square roots!)
When this function is at a minimum the target point (x,y,z) would be at an optimal position. If the solution gives C(x,y,z) = 0 all observations would be exactly satisfied.
One approach to minimising this type of function would be Newton's method. You'd have to provide an initial starting point for the iteration: possibly a mean value of the observation points (if they encircle (x,y,z)) or possibly an initial triangulated value from any three observations.
Edit: Newton's method is an iterative algorithm that can be used for optimisation. A simple version would work along these lines:
H(X(k)) * dX = G(X(k)); // solve a system of linear equations for the
// increment dX in the solution vector X
X(k+1) = X(k) - dX; // update the solution vector by dX
The G(X(k)) denotes the gradient vector evaluated at X(k), in this case:
G(X(k)) = [dC/dx
dC/dy
dC/dz]
The H(X(k)) denotes the Hessian matrix evaluated at X(k), in this case the symmetric 3x3 matrix:
H(X(k)) = [d^2C/dx^2 d^2C/dxdy d^2C/dxdz
d^2C/dydx d^2C/dy^2 d^2C/dydz
d^2C/dzdx d^2C/dzdy d^2C/dz^2]
You should be able to differentiate the cost function analytically, and therefore end up with analytical expressions for G,H.
Another approach - if you don't like derivatives - is to approximate G,H numerically using finite differences.
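As a sketch of the Newton iteration described above, the following differentiates C analytically: with v_i = (x-xi, y-yi, z-zi) and f_i = |v_i|^2 - di^2, the gradient is the sum of 4 f_i v_i and the Hessian is the sum of 8 v_i v_i^T + 4 f_i I. The helper names are made up for this example, there is no step-size control or safeguard against a singular Hessian, and convergence depends on the starting point.

```python
import math


def solve3(a, b):
    """Solve the 3x3 linear system a x = b (Gaussian elimination, pivoting)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - tail) / m[r][r]
    return x


def gradient_and_hessian(p, obs):
    """Analytic G and H of C = sum_i (|p - p_i|^2 - d_i^2)^2 at the point p."""
    g = [0.0] * 3
    h = [[0.0] * 3 for _ in range(3)]
    for xi, yi, zi, di in obs:
        v = [p[0] - xi, p[1] - yi, p[2] - zi]
        f = sum(c * c for c in v) - di * di        # error in squared distance
        for a in range(3):
            g[a] += 4.0 * f * v[a]
            for b in range(3):
                h[a][b] += 8.0 * v[a] * v[b] + (4.0 * f if a == b else 0.0)
    return g, h


def newton_locate(obs, start, tol=1e-12, max_iter=50):
    """Iterate X(k+1) = X(k) - dX, where H(X(k)) * dX = G(X(k))."""
    p = list(start)
    for _ in range(max_iter):
        g, h = gradient_and_hessian(p, obs)
        dx = solve3(h, g)
        p = [pc - dc for pc, dc in zip(p, dx)]
        if max(abs(d) for d in dx) < tol:
            break
    return p


# Illustrative data: six reference points surrounding the target (1, 2, 3).
sensors = [(10, 0, 0), (-10, 0, 0), (0, 10, 0),
           (0, -10, 0), (0, 0, 10), (0, 0, -10)]
truth = (1.0, 2.0, 3.0)
obs = [(x, y, z, math.dist((x, y, z), truth)) for x, y, z in sensors]
estimate = newton_locate(obs, (0.0, 0.0, 0.0))     # start at the sensor mean
```

With noisy distances the minimum of C is no longer zero, and a damped or line-searched step would be a sensible addition.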
Hope this helps.
Non-linear solution procedures are not required. You can easily linearise the system. If you take pair-wise differences of the sphere equations $(x-x_i)^2+(y-y_i)^2+(z-z_i)^2=d_i^2$,
then a bit of algebra yields the linear equations
$(x_i-x_j) x +(y_i-y_j) y +(z_i-z_j) z=-\frac{1}{2}(d_i^2-d_j^2-ds_i^2+ds_j^2)$,
where $ds_i$ is the distance from the $i^{th}$ sensor to the origin. These are the equations of the planes defined by intersecting the $i^{th}$ and the $j^{th}$ spheres.
For four sensors you obtain an over-determined linear system with $\binom{4}{2} = 6$ equations. If $A$ is the resulting matrix and $b$ the corresponding vector of RHS, then you can solve the normal equations
$A^T A r = A^T b$
for the position vector $r$. This will work as long as your sensors are not coplanar.
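A minimal sketch of this linearised approach follows. The right-hand side is obtained by subtracting the sphere equation of sensor j from that of sensor i, which gives $(p_i - p_j)\cdot r = -\frac{1}{2}(d_i^2 - d_j^2 - ds_i^2 + ds_j^2)$. The function names are illustrative, and the normal equations are accumulated directly rather than storing $A$.

```python
import math
from itertools import combinations


def solve3(a, b):
    """Solve the 3x3 linear system a x = b (Gaussian elimination, pivoting)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - tail) / m[r][r]
    return x


def locate_linear(obs):
    """Least-squares position from observations (x, y, z, d)."""
    ata = [[0.0] * 3 for _ in range(3)]        # accumulates A^T A
    atb = [0.0] * 3                            # accumulates A^T b
    for (xi, yi, zi, di), (xj, yj, zj, dj) in combinations(obs, 2):
        row = [xi - xj, yi - yj, zi - zj]
        dsi2 = xi * xi + yi * yi + zi * zi     # squared distance of sensor i from the origin
        dsj2 = xj * xj + yj * yj + zj * zj
        # Difference of the two sphere equations (a plane equation).
        rhs = -0.5 * (di * di - dj * dj - dsi2 + dsj2)
        for a in range(3):
            atb[a] += row[a] * rhs
            for b in range(3):
                ata[a][b] += row[a] * row[b]
    return solve3(ata, atb)                    # solves A^T A r = A^T b


# Illustrative data: four non-coplanar sensors and exact distances.
sensors = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
truth = (0.3, 0.4, 0.5)
obs = [(x, y, z, math.dist((x, y, z), truth)) for x, y, z in sensors]
position = locate_linear(obs)
```

With exact distances the plane equations are consistent and the least-squares solution reproduces the true point; with noisy distances it returns the best fit to the linearised equations, which is not necessarily the minimiser of the original distance errors.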
If you can spend the time, an iterative solution should approach the correct solution pretty quickly. So pick any point the correct distance from site A, then go round the set working out the distance to the point then adjusting the point so that it's in the same direction from the site but the correct distance. Continue until your required precision is met (or until the point is no longer moving far enough in each iteration that it can meet your precision, as per the possible effects of approximate input data).
For an analytic approach, I can't think of anything better than what you already propose.

Random projection algorithm pseudo code

I am trying to apply the Random Projections method to a very sparse dataset. I found papers and tutorials about the Johnson-Lindenstrauss method, but every one of them is full of equations that give me no meaningful explanation. For example, this document on Johnson-Lindenstrauss
Unfortunately, from this document, I can get no idea about the implementation steps of the algorithm. It's a long shot, but is there anyone who can tell me the plain English version or very simple pseudo code of the algorithm? Or where can I start to dig into these equations? Any suggestions?
For example, what I understand from the algorithm by reading this paper concerning Johnson-Lindenstrauss is that:
Assume we have a AxB matrix where A is number of samples and B is the number of dimensions, e.g. 100x5000. And I want to reduce the dimension of it to 500, which will produce a 100x500 matrix.
As far as I understand: first, I need to construct a 100x500 matrix and fill the entries randomly with +1 and -1 (with a 50% probability).
Okay, I think I started to get it. So we have a matrix A which is mxn. We want to reduce it to E which is mxk.
What we need to do is construct a matrix R of dimension nxk and fill it with 0, -1 or +1, with probabilities 2/3, 1/6 and 1/6 respectively.
After constructing this R, we simply do the matrix multiplication AxR to find our reduced matrix E. But we don't need to do a full matrix multiplication, because if an element of R is 0, we don't need to do the calculation; simply skip it. If we encounter a 1, we just add the column, and if it's -1, we subtract it. So we simply use summation rather than multiplication to find E, and that is what makes this method very fast.
It turned out to be a very neat algorithm, although I feel too stupid to get the idea.
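The procedure described above might look like the following sketch. The function names are made up for illustration, and the final scaling by sqrt(3/k) is an assumption taken from the usual Achlioptas-style construction rather than from the text above; drop or change it if your reference uses a different normalisation.

```python
import math
import random


def sparse_projection_matrix(n, k, rng):
    """n x k matrix with entries +1, 0, -1 drawn with probability 1/6, 2/3, 1/6."""
    return [[rng.choices((1, 0, -1), weights=(1, 4, 1))[0] for _ in range(k)]
            for _ in range(n)]


def project(a, r):
    """Compute E = sqrt(3/k) * A R using additions and subtractions only."""
    k = len(r[0])
    scale = math.sqrt(3.0 / k)                 # assumed normalisation
    e = []
    for row in a:
        out = [0.0] * k
        for i, value in enumerate(row):
            if value == 0:
                continue                       # sparse input: nothing to add
            for j, sign in enumerate(r[i]):
                if sign == 1:
                    out[j] += value
                elif sign == -1:
                    out[j] -= value
                # sign == 0: skip, no work needed
        e.append([scale * x for x in out])
    return e


rng = random.Random(0)
a = [[0, 2, 0, 1, 0, 0],                       # m = 2 samples, n = 6 dimensions
     [3, 0, 0, 0, 0, 4]]
r = sparse_projection_matrix(6, 3, rng)        # reduce to k = 3 dimensions
e = project(a, r)
```

In a real implementation R would be stored sparsely (only the positions of the +1 and -1 entries), so the zero entries cost nothing at all.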
You have the idea right. However, as I understand random projection, the rows of your matrix R should have unit length. I believe that's approximately what the normalizing by 1/sqrt(k) is for, to normalize away the fact that they're not unit vectors.
It isn't a projection, but, it's nearly a projection; R's rows aren't orthonormal, but within a much higher-dimensional space, they quite nearly are. In fact the dot product of any two of those vectors you choose will be pretty close to 0. This is why it is a generally good approximation of actually finding a proper basis for projection.
The mapping from high-dimensional data A to low-dimensional data E is given in the statement of theorem 1.1 in the latter paper - it is simply a scalar multiplication followed by a matrix multiplication. The data vectors are the rows of the matrices A and E. As the author points out in section 7.1, you don't need to use a full matrix multiplication algorithm.
If your dataset is sparse, then sparse random projections will not work well.
You have a few options here:
Option A:
Step 1. Apply a structured dense random projection (the so-called fast Hadamard transform is typically used). This is a special projection which is very fast to compute but otherwise has the properties of a normal dense random projection.
Step 2. Apply a sparse projection to the "densified" data (sparse random projections are useful for dense data only).
Option B:
Apply SVD on the sparse data. If the data is sparse but has some structure SVD is better. Random projection preserves the distances between all points. SVD preserves better the distances between dense regions - in practice this is more meaningful. Also people use random projections to compute the SVD on huge datasets. Random Projections gives you efficiency, but not necessarily the best quality of embedding in a low dimension.
If your data has no structure, then use random projections.
Option C:
For data points for which SVD has little error, use SVD; for the rest of the points use Random Projection
Option D:
Use a random projection based on the data points themselves.
This makes it very easy to understand what is going on. It looks something like this:
scores = n by k matrix (n = number of data points, k = new dimension)
for j from 0 to k-1 do  # generate k random projection vectors
    randomized_combination = feature vector of zeros (number of zeros = number of features)
    sample_point_ids = select a sample of point ids
    for each point_id in sample_point_ids do:
        random_sign = +1/-1 with prob. 1/2
        randomized_combination += random_sign * feature_vector[point_id]  # this is a vector operation
    normalize randomized_combination
    # note that the normal random projection is:
    # randomized_combination = [+/-1, +/-1, ...] (one +/-1 per feature; if you want it sparse, randomly set a fraction to 0; also good to normalize by length)
    # to project the data points on this random feature just do
    for each point_id in dataset:
        scores[point_id, j] = dot_product(feature_vector[point_id], randomized_combination)
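A runnable version of this pseudocode could look like the sketch below. The function name, the `sample_size` parameter and the use of uniform sampling without replacement are assumptions made for the example; the pseudocode does not say how the sample of point ids is chosen.

```python
import math
import random


def data_based_projection(features, k, sample_size, rng):
    """Project each row of `features` onto k random signed sums of data points."""
    n, dim = len(features), len(features[0])
    scores = [[0.0] * k for _ in range(n)]
    for j in range(k):                         # one random projection vector per j
        combo = [0.0] * dim
        for point_id in rng.sample(range(n), sample_size):
            sign = rng.choice((1.0, -1.0))     # +1 or -1 with probability 1/2
            for f, value in enumerate(features[point_id]):
                combo[f] += sign * value
        norm = math.sqrt(sum(c * c for c in combo))
        if norm > 0.0:                         # leave an all-zero vector as is
            combo = [c / norm for c in combo]
        for point_id, vector in enumerate(features):
            scores[point_id][j] = sum(a * b for a, b in zip(vector, combo))
    return scores


data = [[1.0, 0.0, 0.0, 2.0],
        [0.0, 3.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [2.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 4.0, 0.0]]
scores = data_based_projection(data, k=2, sample_size=3, rng=random.Random(1))
```

Since every projection vector has unit length (or is zero), each score is bounded in magnitude by the length of the data point it belongs to.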
If you are still looking to solve this problem, write a message here, I can give you more pseudocode.
The way to think about it is that a random projection is just a random pattern and the dot product (i.e. projecting the data point) between the data point and the pattern gives you the overlap between them. So if two data points overlap with many random patterns, those points are similar. Therefore, random projections preserve similarity while using less space, but they also add random fluctuations in the pairwise similarities. What JLT tells you is that to make the fluctuations about 0.1 (eps) you need about 100*log(n) dimensions.
Good Luck!
An R package to perform Random Projection using the Johnson-Lindenstrauss Lemma

How to calculate a covariance matrix from each cluster, like from k-means?

I've been searching everywhere and I've only found how to compute a covariance between one vector and another, like cov(xi, xj). One thing I'm confused about is how to get a covariance matrix from a cluster. Each cluster has many vectors. How do I get them into one covariance matrix? Any suggestions?
info :
input : vectors in a cluster, Xi = (x0,x1,...,xt), x0 = { 5 1 2 3 4} --> a column vector
(actually it's an MFCC feature vector which has 12 coefficients per vector; after clustering them with k-means into 8 clusters, I now want to get the covariance matrix for each cluster to use as the covariance matrix in a Gaussian Mixture Model)
output : covariance matrix n x n
The question you are asking is: Given a set of N points of dimension D (e.g. the points you initially clustered as "speaker1"), fit a D-dimensional gaussian to those points (which we will call "the gaussian which represents speaker1"). To do so, merely calculate the sample mean and sample covariance: http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Estimation_of_parameters or http://en.wikipedia.org/wiki/Sample_mean_and_covariance
Repeat for the other k=8 speakers. I believe you may be able to use a "non-parametric" stochastic process, or modify the algorithm (e.g. run it a few times on many speakers), to remove your assumption of k=8 speakers. Note that the standard k-means clustering algorithms (and other common algorithms like EM) are very fickle in that they will give you different answers depending on how you initialize, so you may wish to perform appropriate regularization to penalize "bad" solutions as you discover them.
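The per-cluster sample mean and sample covariance mentioned above can be sketched as follows. The function names are illustrative, `labels` is assumed to hold the k-means cluster index of each vector, and the covariance uses the unbiased 1/(N-1) normalisation, so each cluster needs at least two vectors.

```python
def mean_and_covariance(points):
    """Sample mean (length D) and sample covariance (D x D) of N points."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[a] for p in points) / n for a in range(dim)]
    cov = [[sum((p[a] - mean[a]) * (p[b] - mean[b]) for p in points) / (n - 1)
            for b in range(dim)]
           for a in range(dim)]
    return mean, cov


def per_cluster_statistics(vectors, labels):
    """Group vectors by cluster label and fit one (mean, covariance) per cluster."""
    clusters = {}
    for vector, label in zip(vectors, labels):
        clusters.setdefault(label, []).append(vector)
    return {label: mean_and_covariance(points)
            for label, points in clusters.items()}


vectors = [[1, 2], [3, 4], [5, 6], [0, 0], [2, 0]]
labels = [0, 0, 0, 1, 1]                      # e.g. the output of k-means
stats = per_cluster_statistics(vectors, labels)
```

For 12-coefficient MFCC vectors each covariance matrix is 12 x 12; entry (a, b) is the covariance between coefficient a and coefficient b across all vectors in that cluster.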
(below is my answer before you clarified your question)
Covariance is a property of two random variables, which is a rough measure of how much changing one affects the other.
A covariance matrix is merely a representation of the NxM separate covariances cov(x_i,y_j), one for each pair of elements from the sets X=(x1,x2,...,xN) and Y=(y1,y2,...,yM).
So the question boils down to, what you are actually trying to do with this "covariance matrix" you are searching for? Mel-Frequency Cepstral Coefficients... does each coefficient correspond to each note of an octave? You have chosen k=12 as the number of clusters you'd like? Are you basically trying to pick out notes in music?
I'm not sure how covariance generalizes to vectors, but I would guess that the covariance between two vectors x and y is just E[x dot y] - (E[x] dot E[y]) (basically replace multiplication with dot product) which would give you a scalar, one scalar per element of your covariance matrix. Then you would just stick this process inside two for-loops.
Or perhaps you could find the covariance matrix for each dimension separately. Without knowing exactly what you're doing though, one cannot give further advice than that.

Algorithm that takes 2 'similar' matrices and 'aligns' one to another

First of all, the title is very bad, due to my lack of a concise vocabulary. I'll try to describe what I'm doing and then ask my question again.
Background Info
Let's say I have 2 matrices of size n x m, where n is the number of experimental observation vectors, each of length m (the time series over which the observations were collected). One of these matrices is the original matrix, called S; the other is a reconstructed version of S, called Y.
Let's assume that Y properly reconstructs S. However due to the limitations of the reconstruction algorithm, Y can't determine the true amplitude of the vectors in S, nor is it guaranteed to provide the proper sign for those vectors (the vectors might be flipped). Also, the order of the observation vectors in Y might not match the original ordering of the corresponding vectors in S.
My Question
Is there an algorithm or technique to generate a new matrix which is a 'realignment' of Y to S, so that when Y and S are normalized, the algorithm can (1) find the vectors in Y that match the vectors in S and restore the original ordering of the vectors and (2) likewise match the signs of the vectors?
As always, I really appreciate all help given. Thanks!
How about simply calculating the normalized form of each vector in both matrices and comparing? That should give you an exact one-to-one match (up to sign) for each vector in each matrix.
The normal form of a vector is one that conforms to:
v_norm = v / ||v||
where ||v|| is the euclidean norm for the vector. For v=(v1, v2, ..., vn) we have ||v|| = sqrt(v1^2 + ... + vn^2).
From there you can reconstruct their order, and return each vector its original length and direction (the vector or its opposite).
The algorithm should be fairly simple from here on; just decide on your implementation. This method should be of quadratic complexity. Per the comment, you can indeed achieve O(n log n) complexity with this algorithm. If you need something better than that, specifically linear complexity, you're going to need a much more complicated algorithm, which I can't think of right now.
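One way to sketch the normalize-and-compare idea is shown below. Matching each row of S to the row of Y with the largest absolute cosine similarity is an assumption of this example (the answer only says "comparing"), and the function names are made up. This is the quadratic version, and it returns unit-length rows.

```python
import math


def normalize(v):
    """Return v / ||v|| using the Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def align(y, s):
    """Reorder and sign-flip the rows of y so that row i lines up with s[i]."""
    y_unit = [normalize(row) for row in y]
    aligned = []
    for s_row in s:
        s_unit = normalize(s_row)
        # Cosine similarity with every reconstructed row; abs() ignores flips.
        sims = [sum(a * b for a, b in zip(s_unit, u)) for u in y_unit]
        best = max(range(len(y_unit)), key=lambda i: abs(sims[i]))
        sign = 1.0 if sims[best] >= 0 else -1.0
        aligned.append([sign * c for c in y_unit[best]])
    return aligned


s = [[1, 0, 0], [0, 2, 0], [1, 1, 1]]
y = [[-3, -3, -3], [0, -5, 0], [0.5, 0, 0]]   # shuffled, rescaled, some flipped
aligned = align(y, s)
```

Each row of S is matched independently here, so two rows of S could pick the same row of Y when the reconstruction is poor; if that matters, an assignment step that enforces a one-to-one match would be needed.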
