I have two sets of points A and B, whereas the points can be 2D or 3D. Both sets have the same size n, which is rather low (5 - 20).
I would like to know how well these sets agree. That is, ideally I would find pairings between the points such that the sum of all Euclidean pair distances d(A,B) is minimal. So
d(A,B) = \sum_{i=1}^n ||A_i - B_i||_2
The final outcome is used to compare with other point sets. So, for example:
A = (1,1), (1,2), (1,3)
B = (1,1), (2,2), (1,3)
would give me d(A,B) = 1.
C = (1,1), (2,1), (3,1)
D = (2,1), (2,2), (3,1)
would give me d(C,D) = 1.414.
Any good ideas?
You can for example model your problem as an assignment problem (Wikipedia link), where you define the cost C_ij of assigning point A_i (from set A) to point B_j (from set B) to be equal to the distance between them. This assignment problem can then be solved using the Hungarian algorithm (Wikipedia link).
Related
Given a simple polygon P, consisting of n vertices, and Set S Of k points, determine if each of the polygon vertices are covered by some point from S.
My best solution was to check for every P vertex if there exist such point in S - total complexity of O(n*k). I belive there should be a more efficient solution. any hints?
Whether P is a polygon or not seems to be irrelevant. So the generalized question becomes: Given 2 sets of points A (with a points) and B (with b points), find out whether A is a subset of B or not?
The simple solution is O(a * b) but you can also get O(a + b) by doing some preprocessing.
Put all the points of B in a hash map with the x-coordinate as key and a hash set with the y-coordinates as values (Map<Number,Set<Number>>). This lets you query whether a point (x, y) is in B in O(1): map.containsKey(x) && map.get(x).contains(y).
Go through all the points of A and check whether the point is in B using the datastructure created above.
Step 1 is O(b) and Step 2 is O(a) which gives O(a + b).
From what I understood, the classical KNN algorithm works like this (for discrete data):
Let x be the point you want to classify
Let dist(a,b) be the Euclidean distance between points a and b
Iterate through the training set points pᵢ, taking the distances dist(pᵢ,x)
Classify x as the most frequent class between the K points closest (according to dist) to x.
How would I introduce weights on this classic KNN? I read that more importance should be given to nearer points, and I read this, but couldn't understand how this would apply to discrete data.
For me, first of all, using argmax doesn't make any sense, and if the weight acts increasing the distance, than it would make the distance worse. Sorry if I'm talking nonsense.
Consider a simple example with three classifications (red green blue) and the six nearest neighbors denoted by R, G, B. I'll make this linear to simplify visualization and arithmetic
R B G x G R R
The points listed with distance are
class dist
R 3
B 2
G 1
G 1
R 2
R 3
Thus, if we're using unweighted nearest neighbours, the simple "voting" algorithm is 3-2-1 in favor of Red. However, with the weighted influences, we have ...
red_total = 1/3^2 + 1/2^2 + 1/3^2 = 1/4 + 2/9 ~= .47
blue_total = 1/2^2 ..............................= .25
green_total = 1/1^2 + 1/1^2 ......................= 2.00
... and x winds up as Green due to proximity.
That lower-delta function is merely the classification function; in this simple example, it returns red | green | blue. In a more complex example, ... well, I'll leave that to later tutorials.
Okay, off the bat let me say I am not the fan of the link you provided, it has image equations and follows a different notation in the images and the text.
So leaving that off let's look at the regular k-NN algorithm. regular k-NN is actually just a special case of weighted k-NN. You assign a weight of 1 to k neighbors and 0 to the rest.
Let Wqj denote the weight associated with a point j relative to a point q
Let yj be the class label associated with the data point j. For simplicity let us assume we are classifying birds as either crows, hens or turkeys => discrete classes. So for all j, yj <- {crow, turkey, hen}
A good weight metric is the inverse of the distance , whatever distance be it Euclidean, Mahalanobis etc.
Given all this, the class label yq you would associate with the point q you are trying to predict would be the the sum of the wqj . yj terms diviided by the sum of all weights. You do not have to the division if you normalize the weights first.
You would end up with an equation as follows somevalue1 . crow + somevalue2 . hen + somevalue3 . turkey
One of these classes will have a higher somevalue. The class witht he highest value is what you will predict for point q
For the purpose of training you can factor in the error anyway you want. Since the classes are discrete there are a limited number of simple ways you can adjust the weight to improve accuracy
There are two sets of points, said set A and B, and for each point a in set A, try to get a subset sub_B of B, that the distance between b in sub_B and a is smaller than the distance between b and any other A's point. And the point is 2-dimensional. For example, A = { (0,0), (3,0) }, B = { (1,1), (2,1) }, then for (0,0) in A, its set is {(1,1)}, for (3,0) in A, its set is {(2,1)}. Obviously, a brute force method could get the result in O(mn), where m is A's size and n is B's size. My question is whether there's a better solution?
All the points in set A can be arranged into a Space Paritioned Tree and every point in B can be used as query to find the nearest neighbor in set A and taking all the set of points with smallest distance. This gives a O(N*log(N)+M*log(N)) solution using kd-trees.
I have to design an algorithm with running time O(nlogn) for the following problem:
Given a set P of n points, determine a value A > 0 such that the shear transformation (x,y) -> (x+Ay,y) does not change the order (in x direction) of points with unequal x-coordinates.
I am having a lot of difficulty even figuring out where to begin.
Any help with this would be greatly appreciated!
Thank you!
I think y = 0.
When x = 0, A > 0
(x,y) -> (x+Ay,y)
-> (0+(A*0),0) = (0,0)
When x = 1, A > 0
(x,y) -> (x+Ay,y)
-> (1+(A*0),0) = (1,0)
with unequal x-coordinates, (2,0), (3,0), (4,0)...
So, I think that the begin point may be (0,0), x=0.
Suppose all x,y coordinates are positive numbers. (Without loss of generality, one can add offsets.) In time O(n log n), sort a list L of the points, primarily in ascending order by x coordinates and secondarily in ascending order by y coordinates. In time O(n), process point pairs (in L order) as follows. Let p, q be any two consecutive points in L, and let px, qx, py, qy denote their x and y coordinate values. From there you just need to consider several cases and it should be obvious what to do: If px=qx, do nothing. Else, if py<=qy, do nothing. Else (px>qx, py>qy) require that px + A*py < qx + A*qy, i.e. (px-qx)/(py-qy) > A.
So: Go through L in order, and find the largest A' that is satisfied for all point pairs where px>qx and py>qy. Then choose a value of A that's a little less than A', for example, A'/2. (Or, if the object of the problem is to find the largest such A, just report the A' value.)
Ok, here's a rough stab at a method.
Sort the list of points by x order. (This gives the O(nlogn)--all the following steps are O(n).)
Generate a new list of dx_i = x_(i+1) - x_i, the differences between the x coordinates. As the x_i are ordered, all of these dx_i >= 0.
Now for some A, the transformed dx_i(A) will be x_(i+1) -x_i + A * ( y_(i+1) - y_i). There will be an order change if this is negative or zero (x_(i+1)(A) < x_i(A).
So for each dx_i, find the value of A that would make dx_i(A) zero, namely
A_i = - (x_(i+1) - x_i)/(y_(i+1) - y_i). You now have a list of coefficients that would 'cause' an order swap between a consecutive (in x-order) pair of points. Watch for division by zero, but that's the case where two points have the same y, these points will not change order. Some of the A_i will be negative, discard these as you want A>0. (Negative A_i will also induce an order swap, so the A>0 requirement is a little arbitrary.)
Find the smallest A_i > 0 in the list. So any A with 0 < A < A_i(min) will be a shear that does not change the order of your points. Pick A_i(min) as that will bring two points to the same x, but not past each other.
Picture a canvas that has a bunch of points randomly dispersed around it. Now pick one of those points. How would you find the closest 3 points to it such that if you drew a triangle connecting those points it would cover the chosen point?
Clarification: By "closest", I mean minimum sum of distances to the point.
This is mostly out of curiosity. I thought it would be a good way to estimate the "value" of a point if it is unknown, but the surrounding points are known. With 3 surrounding points you could extrapolate the value. I haven't heard of a problem like this before, doesn't seem very trivial so I thought it might be a fun exercise, even if it's not the best way to estimate something.
Your problem description is ambiguous. Which triangle are you after in this figure, the red one or the blue one?
The blue triangle is closer based on lexicographic comparison of the distances of the points, while the red triangle is closer based on the sum of the distances of the points.
Edit: you clarified it to make it clear that you want the sum of distances to be minimized (the red triangle).
So, how about this sketch algorithm?
Assume that the chosen point is at the origin (makes description of algorithm easy).
Sort the points by distance from the origin: P(1) is closest, P(n) is farthest.
Start with i = 3, s = ∞.
For each triple of points P(a), P(b), P(i) with a < b < i, if the triangle contains the origin, let s = min(s, |P(a)| + |P(b)| + |P(i)|).
If s ≤ |P(1)| + |P(2)| + |P(i)|, stop.
If i = n, stop.
Otherwise, increment i and go back to step 4.
Obviously this is O(n³) in the worst case.
Here's a sketch of another algorithm. Consider all pairs of points (A, B). For a third point to make a triangle containing the origin, it must lie in the grey shaded region in this figure:
By representing the points in polar coordinates (r, θ) and sorting them according to θ, it is straightforward to examine all these points and pick the closest one to the origin.
This is also O(n³) in the worst case, but a sensible order of visiting pairs (A, B) should yield an early exit in many problem instances.
Just a warning on the iterative method. You may find a triangle with 3 "near points" whose "length" is greater than another resulting by adding a more distant point to the set. Sorry, can't post this as a comment.
See Graph.
Red triangle has perimeter near 4 R while the black one has 3 Sqrt[3] -> 5.2 R
Like #thejh suggests, sort your points by distance from the chosen point.
Starting with the first 3 points, look for a triangle covering the chosen point.
If no triangle is found, expand you range to include the next closest point, and try all combinations.
Once a triangle is found, you don't necessarily have the final answer. However, you have now limited the final set of points to check. The furthest possible point to check would be at a distance equal to the sum of the distances of the first triangle found. Any further than this, and the sum of the distances is guaranteed to exceed the first triangle that was found.
Increase your range of points to include the last point whose distance <= the sum of the distances of the first triangle found.
Now check all combinations, and the answer is the triangle found from this set with the minimal sum of distances.
second shot
subsolution: (analytic geometry basics, skip if you are familiar with this) finding point of the opposite half-plane
Example: Let's have two points: A=[a,b]=[2,3] and B=[c,d]=[4,1]. Find vector u = A-B = (2-4,3-1) = (-2,2). This vector is parallel to AB line, so is the vector (-1,1). The equation for this line is defined by vector u and point in AB (i.e. A):
X = 2 -1*t
Y = 3 +1*t
Where t is any real number. Get rid of t:
t = 2 - X
Y = 3 + t = 3 + (2 - X) = 5 - X
X + Y - 5 = 0
Any point that fits in this equation is in the line.
Now let's have another point to define the half-plane, i.e. C=[1,1], we get:
X + Y - 5 = 1 + 1 - 5 < 0
Any point with opposite non-equation sign is in another half-plane, which are these points:
X + Y - 5 > 0
solution: finding the minimum triangle that fits the point S
Find the closest point P as min(sqrt( (Xp - Xs)^2 + (Yp - Ys)^2 ))
Find perpendicular vector to SP as u = (-Yp+Ys,Xp-Xs)
Find two closest points A, B from the opposite half-plane to sigma = pP where p = Su (see subsolution), such as A is on the different site of line q = SP (see final part of the subsolution)
Now we have triangle ABP that covers S: calculate sum of distances |SP|+|SA|+|SB|
Find the second closest point to S and continue from 1. If the sum of distances is smaller than that in previous steps, remember it. Stop if |SP| is greater than the smallest sum of distances or no more points are available.
I hope this diagram makes it clear.
This is my first shot:
split the space into quadrants
with picked point at the [0,0]
coords
find the closest point
from each quadrant (so you have 4
points)
any triangle from these
points should be small enough (but not necesarilly the smallest)
Take the closest N=3 points. Check whether the triange fits. If not, increment N by one and try out all combinations. Do that until something fits or nothing does.