How to quickly compute the distance between high-dimensional vectors - algorithm

Assume there are three groups of high-dimensional vectors:
{a_1, a_2, ..., a_N},
{b_1, b_2, ..., b_N},
{c_1, c_2, ..., c_N}.
Each of my vectors can be represented as x = a_i + b_j + c_k, where 1 <= i, j, k <= N. The vector is then encoded as (i, j, k), which can be decoded as x = a_i + b_j + c_k.
My question is: given two vectors x = (i_1, j_1, k_1) and y = (i_2, j_2, k_2), is there a method to compute the Euclidean distance between them without decoding x and y?

Square root of the sum of squares of the differences between components. There's no other way to do it.
You should scale the values to guard against overflow/underflow issues. Search for the max difference and divide all the components by it before squaring, summing, and taking the square root.

Let's assume you have only two groups. You are trying to compute the scalar product
(a_i1 + b_j1, a_i2 + b_j2)
= (a_i1,a_i2) + (b_j1,b_j2) + (a_i1,b_j2) + (a_i2,b_j1) # <- elementary scalar products
So if you know the necessary elementary scalar products between the elements of your vectors a_i, b_j, c_k, then you do not need to "decode" x and y and can compute the scalar product directly. The squared distance follows from |x - y|^2 = (x,x) - 2 (x,y) + (y,y).
Note that this is exactly what happens when you compute an ordinary Euclidean distance in a non-orthogonal basis.
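A minimal Python sketch of this idea, assuming the 3N basis vectors are stacked in the order a, b, c and that all their pairwise scalar products are precomputed once. The names gram_matrix and encoded_distance, and the use of 0-based indices, are illustrative choices and not part of the answer above.

```python
import math
import random

def gram_matrix(vectors):
    """Pairwise scalar products: G[p][q] = <vectors[p], vectors[q]>."""
    return [[sum(u * v for u, v in zip(vp, vq)) for vq in vectors] for vp in vectors]

def encoded_distance(G, N, x, y):
    """Euclidean distance between the codes x = (i1, j1, k1) and y = (i2, j2, k2).

    Assumes 0-based indices and that G was built from the list a + b + c,
    so group g (0 = a, 1 = b, 2 = c) starts at row g * N.
    """
    # x - y is a signed sum of six basis vectors: +1 for x's parts, -1 for y's.
    terms = [(g * N + idx, 1) for g, idx in enumerate(x)]
    terms += [(g * N + idx, -1) for g, idx in enumerate(y)]
    # |x - y|^2 expands into 36 signed lookups in the Gram matrix.
    total = sum(sp * sq * G[p][q] for p, sp in terms for q, sq in terms)
    return math.sqrt(max(total, 0.0))  # clamp tiny negative rounding errors
```

Building G costs O(N^2 * dim) once; after that each distance needs a fixed number of table lookups, independent of the dimension.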

If you are happy with an approximate result, you could project your high-dimensional basis vectors into a low-dimensional space using a random projection. The Johnson-Lindenstrauss lemma says that you can reduce the dimension to O(log N) so that distances remain approximately the same with high probability.
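A rough Python sketch of such a projection, assuming a dense Gaussian matrix scaled by 1/sqrt(k). The target dimension k, the seed and the function names are assumptions made for illustration. Because the projection is linear, the projected x is simply the sum of the projected a_i, b_j and c_k, so only the 3N basis vectors need to be projected.

```python
import math
import random

def random_projection_matrix(dim, k, seed=0):
    """k x dim matrix with i.i.d. N(0, 1/k) entries (one common JL construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(dim)] for _ in range(k)]

def project(R, v):
    """Map a dim-dimensional vector v to k dimensions."""
    return [sum(r * x for r, x in zip(row, v)) for row in R]

def approx_distance(pa, pb):
    """Distance in the projected space; approximates the original distance."""
    return math.dist(pa, pb)
```

The approximation quality depends on k; a larger k gives a smaller distortion at a higher cost per distance.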

Related

Algorithm to transform one set of numbers to another with optimization

We need to transform a set of integers A to another set of integers B such that the sum of squares of the elements of B is equal to a certain given value M.
Since there can be multiple such transformations, we need to find the one in which the sum of the square of the difference between the corresponding elements of A and B is minimum.
Input:
A set of non-negative integers A = {a1, a2, a3 ... an}
A non-negative integer M
Output:
A set of numbers B = {b1, b2, b3 ... bn}, such that:
sum_{i=1..n} [ bi ^ 2 ] = M
sum_{i=1..n} [ (ai - bi) ^ 2 ] = S is minimized.
The minimum sum S.
A bit of math.
Sum (ai - bi)^2 = Sum (ai^2 - 2 ai bi + bi^2) = Sum ai^2 - 2 Sum ai bi + Sum bi^2
The first term is constant; the last one is M (also constant), so you are seeking to maximize
Sum ai bi
with the restriction Sum bi^2 = M.
In other words, you need a hyperplane normal to the vector A = { ai } and tangent to a hypersphere of radius sqrt(M). Such a hyperplane passes through the point where the normal line intersects the sphere. This point is fA with |fA| = sqrt(M):
f = sqrt(M)/sqrt(Sum ai^2)
The solution to your problem is
bi = ai * sqrt(M)/sqrt(Sum ai^2)
EDIT: The answers so far, including the one below, map A to a set of real numbers instead of integers. As far as I can tell there is no general fix for this because there are many combinations of n and M for which no integer vector satisfies the constraint. Ex: n = 2, M = 3. No two integers have squares that sum to 3. Even if M is a sum of squares, it is a sum of a certain number of squares, so even M = 4 has no solution with exactly 3 non-zero components. As such, there is no general mapping that satisfies the problem as stated.
Here is the version that allows B to be a vector of reals:
The answer by @user58697 is quite elegant. Here is a restatement that is, perhaps, more intuitive for those of us less used to thinking in hypergeometry:
Treat A and B as vectors. Then start the same way: sum((ai - bi)^2) = sum(ai^2) - 2 sum(ai bi) + sum(bi^2)
The first term is the squared magnitude of the vector A, just as the last term is the squared magnitude of vector B. Both are constant, so only the middle term can change. That means we want to maximize sum(ai bi), which is exactly the dot product of A and B (https://en.wikipedia.org/wiki/Dot_product). For fixed magnitudes, the dot product of two vectors is maximized when the angle between them is 0, which is to say when they are co-directional (that is, they point in the same direction).
This means that the unit vector forms of A and B must be the same. That is:
ai/|A| = bi/|B|. Solve this for bi: bi = ai * |B| / |A|
But |B| is just sqrt(M) and |A| is sqrt(sum(ai^2)). So, just like in user58697's version:
bi = ai * sqrt(M) / sqrt(sum(ai^2))
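As a small Python sketch of the real-valued solution (the function name closest_on_sphere is hypothetical, and A is assumed not to be the zero vector):

```python
import math

def closest_on_sphere(a, M):
    """Return (b, S): b = a * sqrt(M) / |a| and S = sum((a_i - b_i)^2)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    if norm_a == 0:
        # Every b with |b|^2 = M is equally far from the zero vector.
        raise ValueError("A must have at least one non-zero component")
    f = math.sqrt(M) / norm_a          # the scale factor f from the derivation
    b = [f * x for x in a]
    s = sum((x - y) ** 2 for x, y in zip(a, b))
    return b, s
```

For A = (3, 4) and M = 100 this scales A by 2, giving B = (6, 8) and S = 25, which matches (|A| - sqrt(M))^2.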

Algorithm to find if triangles formed by set of points contains origin or not and give total count as well?

Input: S = {p1, . . . , pn}, n points on 2D plane each point is given by its x and y-coordinate.
For simplicity, we assume:
The origin (0, 0) is NOT in S.
Any line L passing through (0, 0) contains at most one point in S.
No three points in S lie on the same line.
If we pick any three points from S, we can form a triangle. So the total number of triangles that can be formed this way is Θ(n^3).
Some of these triangles contain (0, 0), some do not.
Problem: Calculate the number of triangles that contain (0, 0).
You may assume we have an O(1) time function Test(pi, pj , pk) that, given three points pi, pj , pk in S, returns 1, if the triangle formed by {pi, pj , pk} contains (0, 0), and returns 0 otherwise. It’s trivial to solve the problem in Θ(n^3) time (just enumerate and test all triangles).
Describe an algorithm for solving this problem with O(n log n) run time.
My analysis of the above problem leads to the following conclusion
There are 4 quadrants: ( + ,+ ) , ( + ,- ) , ( -, - ), ( -, + ) { x and y coordinate > 0 or not }.
Let
s1 = coordinate x < 0 and y > 0
s2 = x > 0 , y > 0
s3 = x < 0 , y < 0
s4 = x > 0 , y < 0
Now we need to do the testing of points in between sets of the following combinations only
S1 S2 S3
S1 S1 S4
S2 S2 S3
S3 S3 S2
S1 S4 S4
S1 S3 S4
S1 S2 S4
S2 S3 S4
I now need to test the points in the above combinations of sets only ( e.g. one point from s1, one point from s2 and one point from s3 < first combination > ) and see whether the triangle contains (0,0) by calling the Test function ( which is assumed to be a constant-time function here ).
Can someone guide me on this ?
Image added below for clarification on why only some subsets (s1,s2 , s4 ) can contain (0,0) and some ( s1,s1,s3) cannot.
I'm guessing we're in the same class (based on the strange wording of the question), so now that the due date is past, I feel alright giving out my solution. I managed to find the n log n algorithm, which, as the question stated, is more a matter of cleverly transforming the problem, and less of a Dynamic Programming / DaC solution.
Note: This is not an exhaustive proof, I leave that to you.
First, some visual observations. Take some triangle that obviously contains the origin.
Then, convert the points to vectors.
Convince yourself that any selection of three points, one from each vector, describes a triangle that also contains the origin.
It also follows that, if you perform the above steps on a triangle that doesn't enclose the origin, any combination of points along those vectors will also not contain the origin.
The main point to get from this is, the magnitude of the vector does not matter, only the direction. Additionally, a hint to the question says that "any line crossing (0,0) only contains one point in S", from which we can extrapolate that the direction of each vector is unique.
So, if only the angle matters, it would follow that there is some logic that determines what range of points, given two points, could possibly form a triangle that encloses the origin. For simplicity, we'll assume we've taken all the points in S and converted them to vectors, then normalized them, effectively making all points lie on the unit circle.
So, take two points along this circle.
Then, draw a line from each point through the origin and to the opposite side of the circle.
It follows that, given the two points, any point that lies along the red arc can form a triangle.
So our algorithm should do the following:
Take each point in S. Make a secondary array A, and for each point, add its angle along the unit circle (atan2(y, x), shifted so that 0 ≤ Ai < 2π) to A. Let's assume this is O(n).
Sort A in increasing order. O(n log n), assuming we use Merge Sort.
Count the number of triangles possible for each pair (Ai, Aj). This means that we count the number of Ak with Ai + π ≤ Ak ≤ Aj + π. Since the array is sorted, we can use a Binary Search to find the indices of Ai + π and Aj + π, which is O(2 log n) = O(log n).
However, we run into a problem: there are n^2 pairs, and if we have to do an O(log n) search for each, we have O(n^2 log n). So, we need to make one more observation.
Given some Ai < Aj, we'll say Tij describes the number of triangles possible, as calculated by the above method. Then, given a third Ak > Aj, we know that Tij ≤ Tik, as the number of points between Ai + π and Ak + π must be at least as many as there are between Ai + π and Aj + π. In fact, it is exactly the count between Ai + π and Aj + π, plus the count between Aj + π and Ak + π. Since we already know the count between Ai + π and Aj + π, we don't need to recalculate it - we only need to calculate the number between Aj + π and Ak + π, then add the previous count. It follows that:
A(n) = count(A(n),A(n-1)) + count(A(n-1),A(n-2)) + ... + count(A(1),A(0))
And this means we don't need to check all n^2 pairs, we only need to check consecutive pairs - so, only n-1.
So, all the above can give us the following pseudocode solution.
int triangleCount(point P[], int n) {
    double A[n];
    int C[n], totalCount = 0;
    for (i = 0 ... n-1) {
        A[i] = atan2(P[i].y, P[i].x);   // angle of P[i]; note the (y, x) argument order
        if (A[i] < 0) A[i] += 2π;        // shift into [0, 2π)
    }
    mergeSort(A);
    int midPoint = binarySearch(A, π);
    for (i = 0 ... midPoint-1) {
        double left = A[i] + π, right = A[i+1] + π;
        C[i] = binarySearch(A, right) - binarySearch(A, left);
        for (j = 0 ... i)
            totalCount += C[j];
    }
    return totalCount;
}
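For comparison, here is a runnable Python sketch of an O(n log n) count. Note that it is not a transcription of the pseudocode above: it swaps in complementary counting, which relies on the same observation that only the angles matter. A triangle misses the origin exactly when its three points fit in an open half-plane through the origin, and each such triangle has a unique clockwise-most point. So for each point we count the k points less than π counter-clockwise of it, subtract C(k, 2), and start from C(n, 3). The function name and the input format (a list of (x, y) tuples) are assumptions.

```python
import math
from bisect import bisect_left

def count_triangles_containing_origin(points):
    """Count triangles over `points` that contain (0, 0).

    Assumes, as in the question, that the origin is not a point and that
    no line through the origin contains two of the points.
    """
    n = len(points)
    angles = sorted(math.atan2(y, x) for x, y in points)
    # Append a copy shifted by 2*pi so that angular ranges never wrap around.
    doubled = angles + [a + 2 * math.pi for a in angles]
    not_containing = 0
    for i, a in enumerate(angles):
        # k = number of points strictly within pi counter-clockwise of point i.
        k = bisect_left(doubled, a + math.pi, i + 1) - (i + 1)
        not_containing += k * (k - 1) // 2
    return math.comb(n, 3) - not_containing
```

The sort dominates at O(n log n); the loop does one binary search per point. For the vertices of a regular pentagon centred at the origin, each point has k = 2, so the count is 10 - 5 = 5.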
It seems that in the worst case there are Θ(n^3) triangles containing the origin, and since you need them all, the answer is no, there is no better algorithm.
For a worst case consider a regular polygon of an odd degree n, centered at the origin.
Here is an outline of the calculations. A chord connecting two vertices which are k < n/2 vertices apart is a base for Θ(k) triangles. Fix a vertex; its contribution is a sum over all chords coming from it, yielding Θ(n^2), and a total (a contribution of all n vertices) is Θ(n^3) (each triangle is counted 3 times, which doesn't affect the asymptotic).

Algorithm to create a vector based puzzle

I am working on a little puzzle-game project. The basic idea is built around projecting multi-dimensional data down to 2D. My only problem is how to generate the randomized scenario data. Here is the problem:
I have multiple randomized vectors v_i and a target vector t, all 2D. Now I want to randomize scalar values c_i such that:
t = sum c_i v_i
Because there are more than two v_i, this is an underdetermined system. I also took care that a linear combination of the v_i is actually able to reach t.
How can I create (randomized) values for my c_i?
Edit: After finding this Question I can additionally state that it is also possible for me to (slightly) change the v_i.
All values are based on double.
Let's say your v_i form a matrix V with 2 rows and n columns, each vector is a column. The coefficients c_i form a column vector c. Then the equation can be written in matrix form as
V×c = t
Now apply a Singular Value Decomposition to matrix V:
V = A×D×B
with A being an orthogonal 2×2 matrix, D is a 2×n matrix and B an orthogonal n×n matrix. The original equation now becomes
A×D×B×c = t
Multiply this equation by the inverse of A; the inverse is the same as the transposed matrix A^T:
D×B×c = A^T×t
Let's introduce new symbols c' = B×c and t' = A^T×t:
D×c' = t'
The solution of this equation is simple, because Matrix D looks like this:
u 0 0 0 ... // n columns
0 v 0 0 ...
The solution is
c1' = t1' / u
c2' = t2' / v
And because all the other columns of D are zero, the remaining components c3'...cn' can be chosen freely. This is the place where you can create random numbers for c3'...cn'. Having vector c', you can calculate c as
c = B^T×c'
with B^T being the inverse/transpose of B.
Since the v_i are linearly dependent, there are non-trivial solutions to 0 = sum l_i v_i.
If you have n vectors you can find n-2 independent such solutions.
If you now have one solution to t = sum c_i v_i, you can add any multiple of l_i to c_i and you will still have a solution: c_i' = p l_i + c_i.
For each independent solution of the homogeneous problem, determine a random p_j and calculate
c_i'' = c_i + sum p_j l_i_j.
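One way to sample from the same solution set without computing the homogeneous solutions explicitly is sketched below in Python. It draws n-2 coefficients at random and solves the remaining 2×2 system for the last two. The function name, the spread parameter and the choice of the best-conditioned pivot pair are all assumptions for illustration, not something the answers above prescribe.

```python
import random

def random_coefficients(vs, t, spread=1.0, rng=random):
    """Return c with sum(c[i] * vs[i]) == t (up to rounding) for 2D vectors vs."""
    n = len(vs)
    # Pick the pair of vectors with the largest |determinant| as the pivot pair.
    det, p, q = max(
        ((vs[i][0] * vs[j][1] - vs[i][1] * vs[j][0], i, j)
         for i in range(n) for j in range(i + 1, n)),
        key=lambda item: abs(item[0]),
    )
    if det == 0:
        raise ValueError("all v_i are parallel; t may be unreachable")
    # Every other coefficient is free, so draw it at random.
    c = [rng.uniform(-spread, spread) for _ in range(n)]
    rx = t[0] - sum(c[i] * vs[i][0] for i in range(n) if i not in (p, q))
    ry = t[1] - sum(c[i] * vs[i][1] for i in range(n) if i not in (p, q))
    # Solve c_p * v_p + c_q * v_q = (rx, ry) with Cramer's rule.
    c[p] = (rx * vs[q][1] - ry * vs[q][0]) / det
    c[q] = (vs[p][0] * ry - vs[p][1] * rx) / det
    return c
```

The two pivot coefficients are not uniformly distributed; if that matters for the puzzle, the SVD-based construction gives more control over the distribution.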

find the values greater than x in N dimensional Matrix, where x is sum of index

We are given an N-dimensional matrix of order [m][m][m]... (n times), where each position contains the sum of its indices.
For example, in a 6x6 matrix A, the value at position A[3][4] will be 7.
We have to find the total count of elements greater than x. For a 2-dimensional matrix we have the following approach:
If we know one index pair [i][j] with i+j = x, then we can create a diagonal by repeatedly doing [i++][j--] or [i--][j++], with the constraint that i and j always stay in the range 0 to m.
For example, in the two-dimensional matrix A[6][6], for value A[3][4] (x = 7), the diagonal can be created via:
A[1][6] -> A[2][5] -> A[3][4] -> A[4][3] -> A[5][2] -> A[6][1]
Here we have converted our problem into another problem, which is to count the elements below the diagonal, including the diagonal.
We can easily count these in O(m) instead of spending O(m^2), where 2 is the order of the matrix.
But how do we do this for an N-dimensional matrix? In an N-dimensional matrix, suppose we know a location
A[i1][i2][i3][i4]...[in] where the sum of the indices is x.
Then there may be multiple diagonals which satisfy that condition: after doing i1-- we can increment any of {i2, i3, i4, ..., in}.
So the approach used above for the 2-dimensional matrix becomes useless here, because it relies on there being only two variable quantities, i1 and i2.
Please help me find a solution.
For 2D: count of the elements below diagonal is triangular number.
For 3D: count of the elements below diagonal plane is tetrahedral number
Note that Kth tetrahedral number is the sum of the first K triangular numbers.
For nD: n-simplexial (I don't know exact english term) number (is sum of first (n-1)-simplexial numbers).
The value of Kth n-simplexial is
S(k, n) = k * (k+1) * (k+2).. (k + n - 1) / n! = BinomialCoefficient(k+n-1, n)
Edit: this method works "as is" for limited values of X below main anti-diagonal (hyper)plane.
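A short Python sketch of the closed form, under the assumption that indices run from 1 to m as in the question's 6x6 example. In that case the number of cells with index sum at most x is S(x - n + 1, n) = C(x, n), which only holds while x <= m + n - 1, i.e. while no index can reach the upper limit m. The function names are illustrative.

```python
from math import comb

def simplex_number(k, n):
    """K-th n-simplex number: k (k+1) ... (k+n-1) / n! = C(k+n-1, n)."""
    return comb(k + n - 1, n)

def count_at_most(m, n, x):
    """Cells of an n-dimensional m^n matrix (1-based indices) with index sum <= x.

    Only valid for x <= m + n - 1, where the upper index limit m is not binding.
    """
    if x > m + n - 1:
        raise ValueError("closed form only valid for x <= m + n - 1")
    if x < n:
        return 0  # the smallest possible index sum is n
    return simplex_number(x - n + 1, n)

def count_greater_than(m, n, x):
    """Cells with index sum > x, under the same restriction on x."""
    return m ** n - count_at_most(m, n, x)
```

For the 6x6 example with x = 7 this gives 21 cells on or below the diagonal and 36 - 21 = 15 cells above it.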
Generating function approach:
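Suppose we have the polynomial
A(s)=1+s+s^2+s^3+..+s^m
then its nth power
B(s) = A(s)^n has an important property: the coefficient of the kth power of s is the number of ways to compose k from n summands, each between 0 and m. So the sum of the coefficients up to the kth power gives us the count of the elements on or below the kth diagonal (for 1-based indices, use A(s) = s+s^2+..+s^m instead, whose nth power has its first non-zero coefficient at s^n).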
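The generating-function approach can be sketched in Python as follows. This version assumes 1-based indices (1..m, as in the question's example), so it raises s + s^2 + ... + s^m to the nth power rather than the polynomial with a constant term. The function names are hypothetical.

```python
def index_sum_counts(m, n):
    """coeffs[s] = number of n-tuples with entries in 1..m whose sum is s."""
    coeffs = [1]  # the polynomial "1": one empty tuple with sum 0
    for _ in range(n):
        # Multiply by s + s^2 + ... + s^m.
        new = [0] * (len(coeffs) + m)
        for i, c in enumerate(coeffs):
            if c:
                for j in range(1, m + 1):
                    new[i + j] += c
        coeffs = new
    return coeffs

def count_greater_than_x(m, n, x):
    """Number of cells whose index sum is strictly greater than x."""
    coeffs = index_sum_counts(m, n)
    return sum(c for s, c in enumerate(coeffs) if s > x)
```

Unlike the closed form, this works for any x. The plain multiplication costs roughly O(n^2 m^2) operations, which is far less than enumerating all m^n cells.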
For a 2-dimensional matrix, you converted the problem into another problem, which is count the elements below the diagonal including the diagonal.
Try and visualize it for a 3-d matrix. In case of a 3-dimensional matrix, the problem will be reduced to another problem, which is to count the elements below the diagonal plane including the diagonal

Computational Geometry set of points algorithm

I have to design an algorithm with running time O(nlogn) for the following problem:
Given a set P of n points, determine a value A > 0 such that the shear transformation (x,y) -> (x+Ay,y) does not change the order (in x direction) of points with unequal x-coordinates.
I am having a lot of difficulty even figuring out where to begin.
Any help with this would be greatly appreciated!
Thank you!
I think y = 0.
When x = 0, A > 0
(x,y) -> (x+Ay,y)
-> (0+(A*0),0) = (0,0)
When x = 1, A > 0
(x,y) -> (x+Ay,y)
-> (1+(A*0),0) = (1,0)
with unequal x-coordinates, (2,0), (3,0), (4,0)...
So, I think that the begin point may be (0,0), x=0.
Suppose all x,y coordinates are positive numbers. (Without loss of generality, one can add offsets.) In time O(n log n), sort a list L of the points, primarily in ascending order by x coordinates and secondarily in ascending order by y coordinates. In time O(n), process point pairs (in L order) as follows. Let p, q be any two consecutive points in L, and let px, qx, py, qy denote their x and y coordinate values. From there you just need to consider several cases and it should be obvious what to do: If px=qx, do nothing. Else, if py<=qy, do nothing. Else (px<qx, py>qy) require that px + A*py < qx + A*qy, i.e. (qx-px)/(py-qy) > A.
So: Go through L in order, and find the largest A' that satisfies this for all point pairs where px<qx and py>qy, i.e. the minimum of (qx-px)/(py-qy) over those pairs. Then choose a value of A that's a little less than A', for example, A'/2. (Or, if the object of the problem is to find the largest such A, just report the A' value.)
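A minimal Python sketch of this procedure. With L sorted ascending by x and then by y, the only consecutive pairs that constrain A are those with px < qx and py > qy, each of which requires A < (qx - px)/(py - qy). The function name safe_shear and the fallback value 1.0 when no pair constrains A are assumptions.

```python
def safe_shear(points):
    """Return some A > 0 such that (x, y) -> (x + A*y, y) keeps the x-order
    of all points with unequal x-coordinates."""
    pts = sorted(points)  # ascending by x, ties broken by ascending y
    bound = float("inf")
    for (px, py), (qx, qy) in zip(pts, pts[1:]):
        if px < qx and py > qy:
            # This pair swaps once A reaches (qx - px) / (py - qy).
            bound = min(bound, (qx - px) / (py - qy))
    if bound == float("inf"):
        return 1.0  # no pair constrains A; any positive value works
    return bound / 2  # stay strictly below the smallest critical value
```

Sorting dominates at O(n log n), and the scan over consecutive pairs is O(n). For the points (0,5), (1,0), (1,3), (2,10), (3,1) the critical values are 1/5 and 1/9, so the function returns 1/18.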
Ok, here's a rough stab at a method.
Sort the list of points by x order. (This gives the O(nlogn)--all the following steps are O(n).)
Generate a new list of dx_i = x_(i+1) - x_i, the differences between the x coordinates. As the x_i are ordered, all of these dx_i >= 0.
Now for some A, the transformed dx_i(A) will be x_(i+1) - x_i + A * ( y_(i+1) - y_i ). There will be an order change if this is negative or zero (x_(i+1)(A) <= x_i(A)).
So for each dx_i, find the value of A that would make dx_i(A) zero, namely
A_i = - (x_(i+1) - x_i)/(y_(i+1) - y_i). You now have a list of coefficients that would 'cause' an order swap between a consecutive (in x-order) pair of points. Watch for division by zero, but that's the case where two points have the same y, these points will not change order. Some of the A_i will be negative, discard these as you want A>0. (Negative A_i will also induce an order swap, so the A>0 requirement is a little arbitrary.)
Find the smallest A_i > 0 in the list. So any A with 0 < A < A_i(min) will be a shear that does not change the order of your points. Pick A_i(min) as that will bring two points to the same x, but not past each other.

Resources