Matching points in 2D space - algorithm

I have two matrices A and B, of size m x 2 and n x 2 respectively, whose rows are points in the Euclidean plane.
I want to match the maximum number of points between A and B (assuming A has fewer points than B), subject to two conditions: the distance within each pair is less than a threshold d, and each point is used in at most one pair.
I have seen this nearest point pairs approach, but it won't work for my problem because for every point in A it selects the nearest remaining point in B. It may happen that the first pair I picked from A and B was a poor choice, leading to fewer matched pairs overall.
I am looking for a fast solution since A and B each consist of about 1000 points. Again, some points will be left unmatched, and I am aware that this could lead to an exhaustive search.
I am looking for a solution that uses built-in MATLAB functions, or data structures whose MATLAB code is available, such as kd-trees. As mentioned, I have to find unique nearest matching points from B to A.

You can use pdist2 to compute the pairwise distances between two sets of observations (of different sizes). The resulting distance matrix is m x n (one row per point in A, one column per point in B), and you can probe it for all values below the desired threshold.
A = randn(1000, 2);
B = randn(500, 2);
D = pdist2(A, B, 'euclidean'); % 1000 x 500 matrix of Euclidean distances
d = 0.5; % threshold
indexD = D < d; % true where a pair is closer than the threshold
pointsA = any(indexD, 2); % points in A with at least one candidate in B
pointsB = any(indexD, 1); % points in B with at least one candidate in A
The two vectors are logical indexes to the points in A and B that have at least one candidate match, that is, at least one point of the other matrix closer than d. This step only identifies candidates; it does not by itself enforce that each pair is unique.
You can also generalize to more than 2 dimensions or different distance metrics.
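The thresholded distance matrix above defines a bipartite graph, and the "maximum number of unique pairs" the question asks for is a maximum bipartite matching on that graph. The following is a minimal sketch in Python rather than MATLAB, using the classic augmenting-path method; the function name max_threshold_matching is hypothetical, and the brute-force candidate search stands in for pdist2.

```python
import math

def max_threshold_matching(A, B, d):
    """Return {index in A: index in B} for a maximum-size set of unique
    pairs whose Euclidean distance is below d (augmenting-path matching)."""
    # Candidate partners in B for each point of A (brute force, O(m*n)).
    adj = [[j for j, b in enumerate(B) if math.dist(a, b) < d] for a in A]
    match_b = [-1] * len(B)  # match_b[j] = index in A paired with B[j], or -1

    def try_augment(i, seen):
        # Try to give A[i] a partner, re-routing earlier pairs if needed.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_b[j] == -1 or try_augment(match_b[j], seen):
                match_b[j] = i
                return True
        return False

    for i in range(len(A)):
        try_augment(i, set())
    return {i: j for j, i in enumerate(match_b) if i != -1}

# A[0] is closest to B[0], but pairing them would leave A[1] unmatched.
A = [(0.0, 0.0), (0.5, 0.0)]
B = [(0.1, 0.0), (-0.4, 0.0)]
pairs = max_threshold_matching(A, B, 0.5)  # both points of A get a partner
```

The recursion depth can grow with the number of points, so for inputs near 1000 points an iterative variant or the Hopcroft-Karp algorithm would probably be a better fit.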

Related

Algorithm to transform one set of numbers to another with optimization

We need to transform a set of integers A to another set of integers B such that the sum of squares of the elements of B is equal to a certain given value M.
Since there can be multiple such transformations, we need to find the one in which the sum of the square of the difference between the corresponding elements of A and B is minimum.
Input:
A set of non-negative integers A = {a1, a2, a3 ... an}
A non-negative integer M
Output:
A set of numbers B = {b1, b2, b3 ... bn}, such that:
sum_{i=1..n} b_i^2 = M
sum_{i=1..n} (a_i - b_i)^2 = S is minimized.
The minimum sum S.
A bit of math.
Sum (a_i - b_i)^2 = Sum (a_i^2 - 2 a_i b_i + b_i^2) = Sum a_i^2 - 2 Sum a_i b_i + Sum b_i^2
The first term is constant; the last one is M (also constant), so you are seeking to maximize
Sum a_i b_i
with the restriction Sum b_i^2 = M.
In other words, you need a hyperplane normal to the vector A = { a_i }, tangent to a hypersphere of radius sqrt(M). Such a hyperplane passes through the point where the normal line intersects the sphere. This point is fA with |fA| = sqrt(M):
f = sqrt(M)/sqrt(Sum a_i^2)
The solution to your problem is
b_i = a_i * sqrt(M)/sqrt(Sum a_i^2)
EDIT: The answers so far, including the one below, map A to a set of real numbers instead of integers. As far as I can tell there is no general fix for this, because there are values of M and n for which no integer vector satisfies the constraint. Ex: M = 3 with n = 2. There is no pair of integers whose squares sum to 3. Even if M is a sum of squares, it is a sum of a certain number of squares, so even M = 7 = 4 + 1 + 1 + 1 has no solution if A has 3 or fewer components. As such, there is no general mapping that satisfies the problem as stated.
Here is the version that allows B to be a vector of reals:
The answer by @user58697 is quite elegant. Here is a restatement that is, perhaps, more intuitive for those of us less used to thinking in higher-dimensional geometry:
Treat A and B as vectors. Then start the same way: sum((a_i - b_i)^2) = sum(a_i^2) - 2 sum(a_i b_i) + sum(b_i^2)
The first term is the squared magnitude of the vector A, just as the last term is the squared magnitude of the vector B. Both are constant, so only the middle term can change. That means we want to maximize sum(a_i b_i), which is exactly the dot product of A and B (https://en.wikipedia.org/wiki/Dot_product). For fixed magnitudes, the dot product of two vectors is maximized when the angle between them is 0, which is to say when they are co-directional (that is, they point in the same direction).
This means that the unit vector forms of A and B must be the same. That is:
a_i/|A| = b_i/|B|. Solve this for b_i: b_i = a_i * |B| / |A|
But |B| is just sqrt(M) and |A| is sqrt(sum(a_i^2)). So, just like in user58697's version:
b_i = a_i * sqrt(M) / sqrt(sum(a_i^2))
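As a quick numerical check of the real-valued closed form, here is a small Python sketch. The function name closest_on_sphere is hypothetical, and the code assumes A is not the zero vector (otherwise every direction is equally good and the scaling factor is undefined).

```python
import math

def closest_on_sphere(a, M):
    """Scale a so that the sum of squares equals M; return (b, S),
    where S is the resulting sum of squared differences."""
    norm_a = math.sqrt(sum(x * x for x in a))
    if norm_a == 0:
        raise ValueError("A must have at least one non-zero component")
    f = math.sqrt(M) / norm_a          # scaling factor f = sqrt(M) / |A|
    b = [f * x for x in a]
    S = sum((x - y) ** 2 for x, y in zip(a, b))
    return b, S

b, S = closest_on_sphere([3, 4], 100)
# b == [6.0, 8.0] and S == 25.0, which equals (|A| - sqrt(M))^2 = (5 - 10)^2
```

Substituting the solution back into the expansion gives S = |A|^2 - 2 sqrt(M) |A| + M = (|A| - sqrt(M))^2, which is what the example reproduces.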

How to find independent points in a unit square in O(n log n)?

Consider a unit square containing n 2D points. We say that two points p and q are independent in a square, if the Euclidean distance between them is greater than 1. A unit square can contain at most 3 mutually independent points. I would like to find those 3 mutually independent points in the given unit square in O(n log n). Is it possible? Please help me.
Can this problem be solved in O(n^2) without using any spatial data structures such as Quadtree, kd-tree, etc?
Use a spatial data structure such as a Quadtree to store your points. Each node in the quadtree has a bounding box and a set of 4 child nodes, and a list of points (empty except for the leaf nodes). The points are stored in the leaf nodes.
The point quadtree is an adaptation of a binary tree used to represent two-dimensional point data. It shares the features of all quadtrees but is a true tree as the center of a subdivision is always on a point. The tree shape depends on the order in which data is processed. It is often very efficient in comparing two-dimensional, ordered data points, usually operating in O(log n) time.
For each point, maintain a set of all points that are independent of that point.
Insert all your points into the quadtree, then iterate through the points and use the quadtree to find the points that are independent of each:
main()
{
    for each point p
        insert p into quadtree
        set p's set to empty
    for each point p
        findIndependentPoints(p, root node of quadtree)
}
findIndependentPoints(Point p, QuadTreeNode n)
{
    Point f = farthest corner of bounding box of n
    if distance between f and p <= 1
        return // none of the points in this node or
               // its children are independent of p
    for each point q in n
        if distance between p and q > 1
            find intersection r of q's set and p's set
            if r is non-empty then
                p, q, and any point of r are the 3 points -> ***SOLVED***
            add p to q's set of independent points
            add q to p's set of independent points
    for each subnode m of n (up to 4 of them)
        findIndependentPoints(p, m)
}
You could speed up this:
find intersection r of q's set and p's set
by storing each set as a quadtree. Then you could find the intersection by searching in q's quadtree for a point independent of p using the same early-out technique:
// find intersection r of q's set and p's set:
// r = findMutuallyIndependentPoint(p, q's quadtree root)
Point findMutuallyIndependentPoint(Point p, QuadTreeNode n)
{
    Point f = farthest corner of bounding box of n
    if distance between f and p <= 1
        return null // none of the points in this node or
                    // its children are independent of p
    for each point r in n
        if distance between p and r > 1
            return r
    for each subnode m of n (up to 4 of them)
        Point r = findMutuallyIndependentPoint(p, m)
        if r is not null
            return r
    return null
}
An alternative to using quadtrees is using k-d trees, which produce more balanced trees where each leaf node is at a similar depth from the root. The algorithm for finding independent points in that case would be the same, except that there would only be up to 2 and not 4 child nodes for each node in the data structure, and the bounding boxes at each level would be of variable size.
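The early-out test used in the pseudocode ("farthest corner of bounding box") is the same for quadtree and k-d tree nodes. A minimal Python sketch of that test follows; the box representation (xmin, ymin, xmax, ymax) and both function names are assumptions made for illustration.

```python
import math

def farthest_corner_distance(p, box):
    """Distance from point p to the corner of an axis-aligned box
    (xmin, ymin, xmax, ymax) that lies farthest from p."""
    px, py = p
    xmin, ymin, xmax, ymax = box
    dx = max(abs(px - xmin), abs(px - xmax))
    dy = max(abs(py - ymin), abs(py - ymax))
    return math.hypot(dx, dy)

def can_prune(p, box, threshold=1.0):
    """True if no point inside box can be farther than threshold from p,
    so the whole subtree can be skipped."""
    return farthest_corner_distance(p, box) <= threshold

near = can_prune((0.0, 0.0), (0.0, 0.0, 0.5, 0.5))   # True: farthest corner is ~0.707 away
far = can_prune((0.0, 0.0), (0.5, 0.5, 1.0, 1.0))    # False: corner (1, 1) is ~1.414 away
```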
You might want to try this out.
Pick the top left point (Y) with coordinate (0,1). Calculate distance from each point from the List to point Y.
Sort the result in increasing order into SortedPointList (L)
If the first point (A) and the last point (B) in list L are independent:
Foreach point P in list L:
if P is independent to both A and B:
Return A, B, P
Pick the top right point (X) with coordinate (1,1). Calculate distance from each point from the List to point X.
Sort the result in increasing order into SortedPointList (S)
If the first point (C) and the last point (D) in list L are independent:
Foreach point O in list S:
if P is independent to both C and D:
Return C, D, O
Return null
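The steps above come without a proof, so it may help to compare their output against an exhaustive search on small inputs. Below is a brute-force reference in Python that simply tests every triple; it runs in O(n^3) and is only meant as a correctness baseline, not as an answer to the O(n log n) question. The function names are hypothetical.

```python
import math
from itertools import combinations

def independent(p, q):
    """Two points are independent if they are more than 1 apart."""
    return math.dist(p, q) > 1.0

def find_independent_triple(points):
    """Return the first triple of mutually independent points, or None."""
    for p, q, r in combinations(points, 3):
        if independent(p, q) and independent(p, r) and independent(q, r):
            return p, q, r
    return None

pts = [(0.0, 0.0), (0.5, 0.5), (1.0, 0.27), (0.27, 1.0)]
triple = find_independent_triple(pts)   # ((0.0, 0.0), (1.0, 0.27), (0.27, 1.0))
```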
This is a wrong solution. Kept it just for comments. If one finds another solution based on smallest enclosing circle, please put a link as a comment.
Solve the Smallest-circle problem.
If the diameter of the circle is <= 1, return null.
If the circle is determined by 3 points, check which of them are "mutually independent". If only two of them are, try to find the third by iteration.
If the circle is determined by 2 points, they are "mutually independent". Try to find the third one by iteration.
The smallest-circle problem can be solved in O(N), thus the whole problem complexity is also O(N).

Algorithm to create a vector based puzzle

I am working on a little puzzle game project. The basic idea is built around projecting multi-dimensional data down to 2D. My only problem is how to generate the randomized scenario data. Here is the problem:
I have multiple randomized vectors v_i and a target vector t, all 2D. Now I want to randomize scalar values c_i such that:
t = sum c_i v_i
Because there are more than two v_i, this is an underdetermined system. I also took care that a linear combination of the v_i is actually able to reach t.
How can I create (randomized) values for my c_i?
Edit: After finding this Question I can additionally state that it is also possible for me to (slightly) change the v_i.
All values are doubles.
Let's say your v_i form a matrix V with 2 rows and n columns, each vector is a column. The coefficients c_i form a column vector c. Then the equation can be written in matrix form as
V×c = t
Now apply a Singular Value Decomposition to matrix V:
V = A×D×B
with A being an orthogonal 2×2 matrix, D is a 2×n matrix and B an orthogonal n×n matrix. The original equation now becomes
A×D×B×c = t
Multiply this equation by the inverse of A; the inverse is the same as the transposed matrix A^T:
D×B×c = A^T×t
Let's introduce new symbols c' = B×c and t' = A^T×t:
D×c' = t'
The solution of this equation is simple, because Matrix D looks like this:
u 0 0 0 ... // n columns
0 v 0 0 ...
The solution is (assuming V has rank 2, so that u and v are non-zero)
c1' = t1' / u
c2' = t2' / v
And because all the other columns of D are zero, the remaining components c3'...cn' can be chosen freely. This is the place where you can create random numbers for c3'...cn'. Having vector c' you can calculate c as
c = B^T×c'
with B^T being the inverse/transpose of B.
Since the v_i are linearly dependent there are non-trivial solutions to 0 = sum l_i v_i.
If you have n vectors (and they span the plane) you can find n-2 independent such solutions.
If you now have one solution to t = sum c_i v_i, you can add any multiple of l_i to c_i and you will still have a solution: c_i' = p l_i + c_i.
For each independent solution of the homogeneous problem determine a random p_j and calculate
c_i'' = c_i + sum p_j l_i_j.
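Both answers rely on the same fact: n - 2 of the coefficients are free and the remaining two are then determined. A simpler variant of that idea, which is neither the SVD nor the explicit null-space construction, is to draw c_3..c_n at random and solve the leftover 2x2 system for c_1 and c_2 with Cramer's rule. The Python sketch below assumes v_1 and v_2 are linearly independent; the function name and the sampling range [-1, 1] are arbitrary choices for illustration.

```python
import random

def random_coefficients(vs, t, rng=random):
    """Return coefficients c with sum(c[i] * vs[i]) == t (up to rounding).
    c[2:] are random; c[0] and c[1] are solved for. Assumes vs[0] and
    vs[1] are linearly independent 2D vectors."""
    (a, b), (c, d) = vs[0], vs[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("vs[0] and vs[1] must be linearly independent")
    free = [rng.uniform(-1.0, 1.0) for _ in vs[2:]]
    # Remainder of t that the first two vectors still have to produce.
    rx = t[0] - sum(ci * v[0] for ci, v in zip(free, vs[2:]))
    ry = t[1] - sum(ci * v[1] for ci, v in zip(free, vs[2:]))
    # Cramer's rule for c1 * vs[0] + c2 * vs[1] = (rx, ry).
    c1 = (rx * d - c * ry) / det
    c2 = (a * ry - b * rx) / det
    return [c1, c2] + free

vs = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]
coeffs = random_coefficients(vs, (3.0, 4.0), random.Random(0))
```

Note that this does not give the same distribution of coefficients as randomizing c' in the SVD approach, because the first two coefficients absorb all of the correction.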

Finding a square of side length R in the 2D plane

I was at a high-frequency trading firm interview, and they asked me:
Find a square of side length R, given n points in the 2D plane
conditions:
-- its sides are parallel to the axes
-- it contains at least 5 of the n points
-- the running time must not depend on R
They told me to give them an O(n) algorithm.
Interesting problem, thanks for posting! Here's my solution. It feels a bit inelegant but I think it meets the problem definition:
Inputs: R, P = {(x_0, y_0), (x_1, y_1), ..., (x_N-1, y_N-1)}
Output: (u,v) such that the square with corners (u,v) and (u+R, v+R) contains at least 5 points from P, or NULL if no such (u,v) exists
Constraint: asymptotic run time should be O(n)
Consider tiling the plane with RxR squares. Construct a sparse matrix, B defined as
B[i][j] = {(x,y) in P | floor(x/R) = i and floor(y/R) = j}
As you are constructing B, if you find an entry that contains at least five elements stop and output (u,v) = (i*R, j*R) for i,j of the matrix entry containing five points.
If the construction of B did not yield a solution then either there is no solution or else the square with side length R does not line up with our tiling. To test for this second case we will consider points from four adjacent tiles.
Iterate the non-empty entries in B. For each non-empty entry B[i][j], consider each of the four 2x2 blocks of tiles that contain it. For the block with B[i][j] in its lower-left corner, these are the entries B[i][j], B[i+1][j], B[i][j+1], B[i+1][j+1]; the other three blocks are needed because a qualifying square may straddle a block whose lower-left tile is empty. There can be no more than 16 points in each block, since each entry must have fewer than 5. Examine the points of the block and test whether 5 of them satisfy the problem criteria; if so stop and output the solution. (I could specify this algorithm in more detail, but since (a) such an algorithm clearly exists, and (b) its asymptotic runtime is O(1), I won't go into that detail).
If after iterating the entries in B no solution is found then output NULL.
The construction of B involves just a single pass over P and hence is O(N). B has no more than N elements, so iterating it is O(N). The algorithm for each element in B considers no more than 16 points and hence does not depend on N and is O(1), so the overall solution meets the O(N) target.
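The tiling approach can be sketched in Python as follows. A dictionary keyed by tile index plays the role of the sparse matrix B, so the O(N) bound holds in the expected sense. The constant-time block test is left unspecified in the answer; here it is filled in with one possible choice, trying every candidate square whose left edge passes through the x-coordinate of a point in the block and whose bottom edge passes through the y-coordinate of a point in the block. The sketch checks every 2x2 block that contains a non-empty tile. The function name find_square and the parameter k are assumptions.

```python
from collections import defaultdict
from math import floor

def find_square(points, R, k=5):
    """Return (u, v) such that the square [u, u+R] x [v, v+R] contains at
    least k of the points, or None if no such square exists."""
    grid = defaultdict(list)
    for x, y in points:
        cell = (floor(x / R), floor(y / R))
        grid[cell].append((x, y))
        if len(grid[cell]) >= k:           # a single tile already suffices
            return (cell[0] * R, cell[1] * R)
    checked = set()
    for i, j in list(grid):
        for bi in (i - 1, i):              # every 2x2 block containing tile (i, j)
            for bj in (j - 1, j):
                if (bi, bj) in checked:
                    continue
                checked.add((bi, bj))
                cand = [p for di in (0, 1) for dj in (0, 1)
                        for p in grid.get((bi + di, bj + dj), ())]
                # At most 4 * (k - 1) points per block: constant work.
                for u, _ in cand:
                    for _, v in cand:
                        inside = sum(1 for x, y in cand
                                     if u <= x <= u + R and v <= y <= v + R)
                        if inside >= k:
                            return (u, v)
    return None

pts = [(1.1, 0.9), (1.2, 0.8), (1.05, 0.95), (0.9, 1.1), (0.95, 1.05)]
corner = find_square(pts, 1.0)   # the five points straddle two diagonal tiles
```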
Run through set once, keeping the 5 largest x values in a (sorted) local array. Maintaining the sorted local array is O(N) (constant time performed N times at most).
Define xMax and xMin as the x-coordinates of the points with the largest and 5th largest x values respectively (i.e. a[0] and a[4]).
Sort a[] again on Y value, and set yMin and yMax as above, again in constant time.
Define deltaX = xMax- xMin, and deltaY as yMax - yMin, and R = largest of deltaX and deltaY.
The square of side length R located with upper-right at (xMax,yMax) meets the criteria.
Observation if R is fixed in advance:
O(N) complexity means no sort is allowed except on a fixed number of points, as only a Radix sort would meet the criteria and it requires a constraint on the values of xMax-xMin and of yMax-yMin, which was not provided.
Perhaps the trick is to start with the point furthest down and left, and move up and right. The lower-left-most point can be determined in a single pass of the input.
Moving up and right in steps and counting points in the square requires sorting the points on X and Y in advance, which, to be done in O(N) time, requires that the Radix sort constraint be met.

Distance measure between two sets of possibly different size

I have 2 sets of integers, A and B, not necessarily of the same size. For my needs, I take the distance between each 2 elements a and b (integers) to be just abs(a-b).
I am defining the distance between the two sets as follows:
If the sets are of the same size, minimize the sum of distances of all pairs [a,b] (a from A and b from B), minimization over all possible 'pairs partitions' (there are n! possible partitions).
If the sets are not of the same size, let's say A of size m and B of size n, with m < n, then minimize the distance from (1) over all subsets of B which are of size m.
My question is: does the following algorithm (just an intuitive guess) give the right answer, according to the definition written above?
Construct a matrix D of size m X n, with D(i,j) = abs(A(i)-B(j))
Find the smallest element of D, accumulate it, and delete the row and the column of that element. Accumulate the next smallest entry, and keep accumulating until all rows and columns are deleted.
For example, if A={0,1,4} and B={3,4}, then D is (with the elements of B above and the elements of A to the left):
     3  4
0    3  4
1    2  3
4    1  0
And the distance is 0 + 2 = 2, coming from pairing 4 with 4 and 3 with 1.
Note that this problem is referred to sometimes as the skis and skiers problem, where you have n skis and m skiers of varying lengths and heights. The goal is to match skis with skiers so that the sum of the differences between heights and ski lengths is minimized.
To solve the problem you could use minimum weight bipartite matching, which requires O(n^3) time.
Even better, you can achieve O(n^2) time with O(n) extra memory using the simple dynamic programming algorithm below.
Optimally, you can solve the problem in linear time if the points are already sorted using the algorithm described in this paper.
O(n^2) dynamic programming algorithm:
if (size(A) > size(B))
    swap(A, B);    // make A the smaller set
sort(A);
sort(B);
opt = array(size(B));
nopt = array(size(B));
// best cost of matching A[0] with one of B[0..j]
opt[0] = abs(A[0] - B[0]);
for (j = 1; j < size(B); j++)
    opt[j] = min(opt[j - 1], abs(A[0] - B[j]));
for (i = 1; i < size(A); i++) {
    fill(nopt, infinity);
    for (j = 1; j < size(B); j++) {
        // either skip B[j], or match A[i] with B[j]
        nopt[j] = min(nopt[j - 1], opt[j - 1] + abs(A[i] - B[j]));
    }
    swap(opt, nopt);
}
return opt[size(B) - 1];
After each iteration i of the outer for loop above, opt[j] contains the optimal solution matching {A[0],..., A[i]} using the elements {B[0],..., B[j]}.
The correctness of this algorithm relies on the fact that in any optimal matching if a1 is matched with b1, a2 is matched with b2, and a1 < a2, then b1 <= b2.
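For reference, here is a runnable Python sketch of the same dynamic program. It assumes the smaller set is the one that must be fully matched and keeps only two rows of the table; the function name min_matching_cost is hypothetical.

```python
def min_matching_cost(A, B):
    """Minimum total |a - b| when every element of the smaller set is
    paired with a distinct element of the larger set."""
    a, b = sorted(A), sorted(B)
    if len(a) > len(b):
        a, b = b, a                    # make a the smaller set
    if not a:
        return 0
    INF = float("inf")
    # opt[j]: best cost of matching a[0..i] using only b[0..j]
    opt, best = [], INF
    for y in b:
        best = min(best, abs(a[0] - y))
        opt.append(best)
    for i in range(1, len(a)):
        nopt = [INF] * len(b)
        for j in range(1, len(b)):
            # either skip b[j], or pair a[i] with b[j]
            nopt[j] = min(nopt[j - 1], opt[j - 1] + abs(a[i] - b[j]))
        opt = nopt
    return opt[-1]

cost_1 = min_matching_cost([3, 7], [0, 4])      # 6
cost_2 = min_matching_cost([0, 1, 4], [3, 4])   # 2
```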
In order to get the optimum, solve the assignment problem on D.
The assignment problem finds a perfect matching in a bipartite graph such that the total edge weight is minimized, which maps perfectly to your problem. It is also in P.
EDIT to explain how OP's problem maps onto assignment.
For simplicity of explanation, extend the smaller set with special elements e_k.
Let A be the set of workers, and B be the set of tasks (the contents are just labels).
Let the cost be the distance between an element in A and B (i.e. an entry of D). The distance between e_k and anything is 0.
Then, we want to find a perfect matching of A and B (i.e. every worker is matched with a task), such that the cost is minimized. This is the assignment problem.
No, it's not the best answer. For example:
with A = {3,7} and B = {0,4} you will choose {(3,4),(7,0)} and the distance is 8, but you should choose {(3,0),(7,4)}, in which case the distance is 6.
Your algorithm gives a good approximation to the minimum, but not necessarily the true minimum. You are following a "greedy" approach, which is generally much easier and gives good results, but cannot guarantee the best answer.
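The counterexample can be reproduced with a direct Python transcription of the greedy procedure from the question (repeatedly take the smallest remaining entry of D and delete its row and column). The function name greedy_distance is hypothetical, and ties are broken by index order, which the question does not specify.

```python
def greedy_distance(A, B):
    """Greedy pairing: repeatedly take the smallest remaining |a - b|
    and remove both elements from further consideration."""
    entries = sorted((abs(a - b), i, j)
                     for i, a in enumerate(A) for j, b in enumerate(B))
    used_a, used_b, total = set(), set(), 0
    for dist, i, j in entries:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += dist
    return total

greedy_cost = greedy_distance([3, 7], [0, 4])   # 8, while the optimum is 6
```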
