I'm trying to implement the Closest Pair algorithm with Manhattan distance. With Euclidean distance it works fine, but with Manhattan distance it gives the wrong result. CLRS Exercise 33.4-3 asks us to replace Euclidean distance with Manhattan distance, and suggests only one line needs to change, but it isn't clear what modification is needed in the code below.
lst = [(2,2),(4,2),(5,3)]
min_dist = float("inf")
for i in range(len(lst)):
    for j in range(i + 1, len(lst)):
        dist = abs(lst[i][0] - lst[j][0]) + abs(lst[i][1] - lst[j][1])
        if dist < min_dist:
            min_dist = dist
            minp1 = lst[i]
            minp2 = lst[j]
I suspect the issue is that the outcomes of the program under the two distances differ.
Indeed, with the Euclidean distance the closest pair is (4,2)-(5,3), while with the Manhattan distance both (2,2)-(4,2) and (4,2)-(5,3) are closest pairs. Your program only picks the first one in order of appearance, so the outcome is (2,2)-(4,2). If your program returned all closest pairs, you would have seen (4,2)-(5,3) as well.
But generally speaking, there is no reason for the outcomes of the two programs to be the same. For example, change (5,3) to (5,3.1) in your example. To get a concrete idea of how different the two distances are, it may be useful to plot the "unit circle" for each norm; you will see that the Manhattan circle is more square than round.
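To see the tie concretely, here is a small brute-force check (a sketch of my own, not part of the original code) that prints every closest pair under each metric:

from itertools import combinations

def closest_pairs(points, dist):
    # Return the minimum distance and every pair attaining it.
    pairs = list(combinations(points, 2))
    d_min = min(dist(p, q) for p, q in pairs)
    return d_min, [(p, q) for p, q in pairs if dist(p, q) == d_min]

euclidean = lambda p, q: ((p[0] - q[0])**2 + (p[1] - q[1])**2) ** 0.5
manhattan = lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])

pts = [(2, 2), (4, 2), (5, 3)]
print(closest_pairs(pts, euclidean))  # one pair: (4,2)-(5,3)
print(closest_pairs(pts, manhattan))  # two pairs tied at distance 2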
For 3 points in 2D:
P1(x1,y1),
P2(x2,y2),
P3(x3,y3)
I need to find a point P(x,y) such that the maximum of the Manhattan distances
max(dist(P,P1),
dist(P,P2),
dist(P,P3))
will be minimal.
Any ideas about the algorithm?
I would really prefer an exact algorithm.
There is an exact, noniterative algorithm for the problem; as Knoothe pointed out, the Manhattan distance is rotationally equivalent to the Chebyshev distance, and P is trivially computable for the Chebyshev distance as the mean of the extreme coordinates.
The points reachable from P within a given Manhattan distance form a diamond around P. Therefore, we need to find the minimum diamond that encloses all points; its center will be P.
If we rotate the coordinate system by 45 degrees, the diamond is a square. Therefore, the problem can be reduced to finding the smallest enclosing square of the points.
The center of a smallest enclosing square can be found as the center of the smallest enclosing rectangle (which is trivially computed as the max and min of the coordinates). There is an infinite number of smallest enclosing squares, since you can shift the center along the shorter edge of the minimum rectangle and still have a minimal enclosing square. For our purposes, we can simply use the one whose center coincides with the center of the enclosing rectangle.
So, in algorithmic form:
1. Rotate and scale the coordinate system by assigning x' = x/sqrt(2) - y/sqrt(2), y' = x/sqrt(2) + y/sqrt(2).
2. Compute x'_c = (max(x'_i) + min(x'_i))/2, y'_c = (max(y'_i) + min(y'_i))/2.
3. Rotate back with x_c = x'_c/sqrt(2) + y'_c/sqrt(2), y_c = -x'_c/sqrt(2) + y'_c/sqrt(2).
Then x_c and y_c give the coordinates of P.
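A direct transcription of these three steps in Python (a sketch; the function name is mine):

import math

def minimax_manhattan_center(points):
    # Center minimizing the maximum Manhattan distance to the points:
    # rotate into the 45-degree (Chebyshev) frame, take the midrange of
    # each coordinate, then rotate back.
    s = math.sqrt(2)
    xs = [x / s - y / s for x, y in points]   # step 1
    ys = [x / s + y / s for x, y in points]
    xc = (max(xs) + min(xs)) / 2              # step 2
    yc = (max(ys) + min(ys)) / 2
    return xc / s + yc / s, -xc / s + yc / s  # step 3

print(minimax_manhattan_center([(0, 0), (0, 20), (10, 10)]))  # approximately (0.0, 10.0)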
If an approximate solution is okay, you could try a simple stochastic optimization algorithm. Here's an example in Python:

import random

def opt(*points):
    best, dist = (0, 0), float("inf")
    for i in range(10000):
        # Perturb the current best point with Gaussian noise.
        new = best[0] + random.gauss(0, .5), best[1] + random.gauss(0, .5)
        # Maximum Manhattan distance from the candidate to the given points.
        dist_new = max(abs(new[0] - qx) + abs(new[1] - qy) for qx, qy in points)
        if dist_new < dist:
            best, dist = new, dist_new
            print(new, dist_new)
    return best, dist
Explanation: We start with the point (0, 0), or any other random point, and modify it a few thousand times, each time keeping the better of the new and the previously best point. Gradually, this will approximate the optimum.
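For example, on the first counter-example below the optimum is (0, 10) with maximum distance 10; the output is stochastic, but it should get close:

best, dist = opt((0, 0), (0, 20), (10, 10))
print(best, dist)  # approximately (0.0, 10.0) and 10.0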
Note that simply picking the mean or median of the three points, or solving for x and y independently, does not work when minimizing the maximum Manhattan distance. Counter-examples: consider the points (0,0), (0,20) and (10,10), or (0,0), (0,1) and (0,100). Picking the mean of the most separated points yields (10,5) for the first example, and taking the median yields (0,1) for the second example; both have a higher maximum Manhattan distance than the optimum.
Update: Looks like solving for x and y independently and taking the mean of the most distant points does in fact work, provided that one does some pre- and postprocessing, as pointed out by thiton.
I'm having a hard time following the ray-plane intersection described on the following page.
SIGGRAPH Ray-Plane Intersection
Here is my understanding.
The plane is described as Ax + By + Cz + D = 0
or
The vector (A, B, C, D), where A, B, C define the plane's normal. If A, B, and C define a unit normal, then the distance from the origin [0, 0, 0] to the plane is D.
My question is: shouldn't D be a vector, since it represents the distance between two points? I guess I just don't understand how you can represent the distance between two points as a non-vector.
Any help is much appreciated.
The distance between two points is ALWAYS a scalar, a single number. Think of the vectors as points in space: when you speak of the distance between two vectors, you are finding the distance between those two points, which is a number. That distance is the magnitude of the difference vector: subtract the two vectors and find the magnitude of the result. That is your distance, which is a SCALAR and NOT a vector.
Distance is a scalar value, not a vector. It is, in fact, the length of a vector.
You can think of a vector as a set of values describing a point in space in relation to the origin. In R3, you need a minimum of 3 pieces of information to describe the location of that point. These pieces of information give you a direction and a distance.
If you were to tell me that a city is 50 miles away, you would be describing a distance; of course, you would not have told me in which direction that city lies. When you give me both pieces of information, you have given me a vector, as opposed to a scalar value.
Also recall the formula for distance:
D = sqrt(x^2 + y^2 + z^2)
Scalar value ;)
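To make this concrete in code (plain Python; the sample points are mine):

import math

p = (1.0, 2.0, 2.0)
q = (0.0, 0.0, 0.0)
diff = tuple(a - b for a, b in zip(p, q))   # difference vector: (1.0, 2.0, 2.0)
dist = math.sqrt(sum(c * c for c in diff))  # its magnitude: 3.0, a scalar
print(diff, dist)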
What is the simplest way to test if a point P is inside a convex hull formed by a set of points X?
I'd like an algorithm that works in a high-dimensional space (say, up to 40 dimensions) that doesn't explicitly compute the convex hull itself. Any ideas?
The problem can be solved by finding a feasible point of a Linear Program. If you're interested in the full details, as opposed to just plugging an LP into an existing solver, I'd recommend reading Chapter 11.4 in Boyd and Vandenberghe's excellent book on convex optimization.
Set A = (X[1] X[2] ... X[n]), that is, the first column is X[1], the second is X[2], etc.
Solve the following LP problem,
minimize (over x): 1
s.t. Ax = P
x^T * [1] = 1
x[i] >= 0 \forall i
where
x^T is the transpose of x
[1] is the all-1 vector.
The problem has a solution iff the point is in the convex hull.
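A sketch of this feasibility test using scipy.optimize.linprog (the wrapper name in_hull is mine; the equality constraints stack Ax = P with the sum-to-one row, and linprog's default bounds already enforce x[i] >= 0):

import numpy as np
from scipy.optimize import linprog

def in_hull(P, X):
    # Feasibility LP: is P a convex combination of the rows of X?
    n, d = X.shape
    c = np.zeros(n)                      # constant objective: feasibility only
    A_eq = np.vstack([X.T, np.ones(n)])  # d rows for Ax = P, one row for sum(x) = 1
    b_eq = np.append(P, 1.0)
    res = linprog(c, A_eq=A_eq, b_eq=b_eq)  # default bounds give x[i] >= 0
    return res.success

X = np.random.rand(50, 4)
print(in_hull(X.mean(axis=0), X))   # centroid of the points: True
print(in_hull(np.full(4, 2.0), X))  # far outside the unit cube: False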
The point lies outside the convex hull of the other points if and only if the directions of all the vectors from it to those other points lie within less than one half of a circle/sphere/hypersphere around it.
Here is a sketch for the situation of two points, a blue one inside the convex hull (green) and the red one outside:
For the red one, there exist bisections of the circle, such that the vectors from the point to the points on the convex hull intersect only one half of the circle.
For the blue point, it is not possible to find such a bisection.
You don't have to compute convex hull itself, as it seems quite troublesome in multidimensional spaces. There's a well-known property of convex hulls:
Any vector (point) v inside the convex hull of points [v1, v2, .., vn] can be represented as sum(ki*vi), where 0 <= ki <= 1 and sum(ki) = 1. Correspondingly, no point outside the convex hull has such a representation.
In m-dimensional space, this gives a set of m linear equations with n unknowns (plus the normalization constraint sum(ki) = 1).
Edit: I'm not sure about the complexity of this new problem in the general case, but for m = 2 it seems linear. Perhaps somebody with more experience in this area will correct me.
I had the same problem with 16 dimensions. Since even qhull didn't work properly because too many facets had to be generated, I developed my own approach of testing whether a separating hyperplane can be found between the new point and the reference data (I call this "HyperHull" ;) ).
The problem of finding a separating hyperplane can be transformed into a convex quadratic programming problem (see: SVM). I did this in Python using cvxopt, in fewer than 170 lines of code (including I/O). The algorithm works without modification in any dimension, even though the higher the dimension, the larger the number of points on the hull (see: On the convex hull of random points in a polytope). Since the hull isn't explicitly constructed, only membership is tested, the algorithm has very big advantages in higher dimensions compared to e.g. quickhull.
This algorithm can naturally be parallelized, and the speedup should scale with the number of processors.
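The cvxopt QP itself isn't shown in the answer; as a simplified illustration of the same separating-hyperplane idea (not the author's code; the function name and the -1 scaling are mine), here is a linear-programming variant: P lies strictly outside the hull iff some w satisfies w.(x_i - P) <= -1 for every reference point x_i.

import numpy as np
from scipy.optimize import linprog

def separable(P, X):
    # True iff a hyperplane strictly separates P from the points in X,
    # i.e. iff the feasibility LP below has a solution.
    A_ub = X - P                  # one row per reference point
    b_ub = -np.ones(len(X))
    c = np.zeros(X.shape[1])      # feasibility only
    bounds = [(None, None)] * X.shape[1]  # w may have negative components
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.success

X = np.random.rand(200, 16)            # 16-dimensional reference data
print(separable(np.full(16, 2.0), X))  # far outside: True
print(separable(X.mean(axis=0), X))    # centroid: False, i.e. inside the hull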
Though the original post was three years ago, perhaps this answer will still be of help. The Gilbert-Johnson-Keerthi (GJK) algorithm finds the shortest distance between two convex polytopes, each of which is defined as the convex hull of a set of generators; notably, the convex hull itself does not have to be calculated. In the special case being asked about here, one of the polytopes is just a point. So why not use the GJK algorithm to calculate the distance between P and the convex hull of the points X? If that distance is 0, then P is inside X (or at least on its boundary).
A GJK implementation in Octave/Matlab, called ClosestPointInConvexPolytopeGJK.m, along with supporting code, is available at http://www.99main.com/~centore/MunsellAndKubelkaMunkToolbox/MunsellAndKubelkaMunkToolbox.html . A simple description of the GJK algorithm is in Sect. 2 of the paper at http://www.99main.com/~centore/ColourSciencePapers/GJKinConstrainedLeastSquares.pdf .
I've used the GJK algorithm on some very small sets X in 31-dimensional space and had good results. How the performance of GJK compares to the linear programming methods others are recommending is uncertain (although any comparisons would be interesting). The GJK method does avoid computing the convex hull, or expressing the hull in terms of linear inequalities, both of which might be time-consuming. Hope this answer helps.
Are you willing to accept a heuristic answer that should usually work but is not guaranteed to? If you are, then you could try this randomized idea.
Let f(x) be the cube of the distance to P times the number of points in X, minus the sum of the cubes of the distances to all of the points in X. Start somewhere random, and use a hill-climbing algorithm to maximize f(x) for x on a sphere that is very far away from P. Except in degenerate cases, if P is not in the convex hull, this should have a very good probability of finding the normal of a hyperplane with P on one side and everything in X on the other.
A write-up to test if a point is in a hull space, using scipy.optimize.minimize.
Based on user1071136's answer.
It goes a lot faster if you compute the convex hull first, so I added a couple of lines for people who want to do that. I also switched from Graham scan (2D only) to scipy's qhull algorithm.
scipy.optimize.minimize documentation:
https://docs.scipy.org/doc/scipy/reference/optimize.nonlin.html
import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
from scipy.spatial import ConvexHull
def hull_test(P, X, use_hull=True, verbose=True, hull_tolerance=1e-5, return_hull=True):
    if use_hull:
        hull = ConvexHull(X)
        X = X[hull.vertices]
    n_points = len(X)

    def F(x, X, P):
        return np.linalg.norm(np.dot(x.T, X) - P)

    bnds = [[0, None]] * n_points  # each coefficient must be >= 0
    cons = ({'type': 'eq', 'fun': lambda x: np.sum(x) - 1})  # coefficients must sum to 1
    x0 = np.ones((n_points, 1)) / n_points  # start from uniform coefficients
    result = scipy.optimize.minimize(F, x0, args=(X, P), bounds=bnds, constraints=cons)
    if result.fun < hull_tolerance:
        hull_result = True
    else:
        hull_result = False
    if verbose:
        print('# boundary points:', n_points)
        print('x.T * X - P:', F(result.x, X, P))
        if hull_result:
            print('Point P is in the hull space of X')
        else:
            print('Point P is NOT in the hull space of X')
    if return_hull:
        return hull_result, X
    else:
        return hull_result
Test on some sample data:
n_dim = 3
n_points = 20
np.random.seed(0)
P = np.random.random(size=(1,n_dim))
X = np.random.random(size=(n_points,n_dim))
_, X_hull = hull_test(P, X, use_hull=True, hull_tolerance=1e-5, return_hull=True)
Output:
# boundary points: 14
x.T * X - P: 2.13984259782e-06
Point P is in the hull space of X
Visualize it:
rows = max(1, n_dim - 1)
cols = rows
plt.figure(figsize=(rows * 3, cols * 3))
for row in range(rows):
    for col in range(row, cols):
        col += 1
        plt.subplot(cols, rows, row * rows + col)
        plt.scatter(P[:, row], P[:, col], label='P', s=300)
        plt.scatter(X[:, row], X[:, col], label='X', alpha=0.5)
        plt.scatter(X_hull[:, row], X_hull[:, col], label='X_hull')
        plt.xlabel('x{}'.format(row))
        plt.ylabel('x{}'.format(col))
plt.tight_layout()
Picture a canvas that has a bunch of points randomly dispersed around it. Now pick one of those points. How would you find the closest 3 points to it such that if you drew a triangle connecting those points it would cover the chosen point?
Clarification: By "closest", I mean minimum sum of distances to the point.
This is mostly out of curiosity. I thought it would be a good way to estimate the "value" of a point if it is unknown but the surrounding points are known: with 3 surrounding points you could interpolate the value. I haven't heard of a problem like this before; it doesn't seem trivial, so I thought it might be a fun exercise, even if it's not the best way to estimate something.
Your problem description is ambiguous. Which triangle are you after in this figure, the red one or the blue one?
The blue triangle is closer based on lexicographic comparison of the distances of the points, while the red triangle is closer based on the sum of the distances of the points.
Edit: you clarified it to make it clear that you want the sum of distances to be minimized (the red triangle).
So, how about this sketch algorithm?
1. Assume that the chosen point is at the origin (this simplifies the description).
2. Sort the points by distance from the origin: P(1) is closest, P(n) is farthest.
3. Start with i = 3, s = ∞.
4. For each triple of points P(a), P(b), P(i) with a < b < i, if the triangle contains the origin, let s = min(s, |P(a)| + |P(b)| + |P(i)|).
5. If s ≤ |P(1)| + |P(2)| + |P(i)|, stop.
6. If i = n, stop.
7. Otherwise, increment i and go back to step 4.
Obviously this is O(n³) in the worst case.
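A Python sketch of this search (my code, not the answerer's; the origin-in-triangle test uses cross-product signs):

from itertools import combinations
from math import hypot, inf

def cross(p, q):
    return p[0] * q[1] - p[1] * q[0]

def contains_origin(a, b, c):
    # The origin is inside triangle abc iff cross(a,b), cross(b,c), cross(c,a)
    # all have the same sign (zeros allowed for boundary cases).
    s1, s2, s3 = cross(a, b), cross(b, c), cross(c, a)
    return (s1 >= 0 and s2 >= 0 and s3 >= 0) or (s1 <= 0 and s2 <= 0 and s3 <= 0)

def min_sum_triangle(points):
    # Points are given relative to the chosen point, which sits at the origin.
    pts = sorted(points, key=lambda p: hypot(p[0], p[1]))
    r = [hypot(p[0], p[1]) for p in pts]
    best, best_tri = inf, None
    for i in range(2, len(pts)):
        if best <= r[0] + r[1] + r[i]:  # early exit, step 5 above
            break
        for a, b in combinations(range(i), 2):
            if contains_origin(pts[a], pts[b], pts[i]):
                s = r[a] + r[b] + r[i]
                if s < best:
                    best, best_tri = s, (pts[a], pts[b], pts[i])
    return best_tri, best

print(min_sum_triangle([(1, 1), (-2, 1), (0, -3), (5, 5)]))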
Here's a sketch of another algorithm. Consider all pairs of points (A, B). For a third point to make a triangle containing the origin, it must lie in the grey shaded region in this figure:
By representing the points in polar coordinates (r, θ) and sorting them according to θ, it is straightforward to examine all these points and pick the closest one to the origin.
This is also O(n³) in the worst case, but a sensible order of visiting pairs (A, B) should yield an early exit in many problem instances.
Just a warning on the iterative method: you may find a triangle of 3 "near" points whose total distance is greater than that of a triangle obtained by adding a more distant point to the set. Sorry, can't post this as a comment.
See the graph: the red triangle has perimeter near 4 R, while the black one has 3*Sqrt[3]*R ≈ 5.2 R.
Like #thejh suggests, sort your points by distance from the chosen point.
Starting with the first 3 points, look for a triangle covering the chosen point.
If no triangle is found, expand your range to include the next closest point, and try all combinations.
Once a triangle is found, you don't necessarily have the final answer. However, you have now limited the final set of points to check. The furthest possible point to check would be at a distance equal to the sum of the distances of the first triangle found. Any further than this, and the sum of the distances is guaranteed to exceed the first triangle that was found.
Increase your range of points to include the last point whose distance <= the sum of the distances of the first triangle found.
Now check all combinations, and the answer is the triangle found from this set with the minimal sum of distances.
second shot
Subsolution (analytic geometry basics; skip this if you are familiar with it): finding points in the opposite half-plane.
Example: take two points A=[a,b]=[2,3] and B=[c,d]=[4,1]. Find the vector u = A-B = (2-4, 3-1) = (-2,2). This vector is parallel to the line AB, and so is the vector (-1,1). The line is defined parametrically by the vector u and a point on AB (e.g. A):
X = 2 -1*t
Y = 3 +1*t
where t is any real number. Eliminate t:
t = 2 - X
Y = 3 + t = 3 + (2 - X) = 5 - X
X + Y - 5 = 0
Any point that satisfies this equation lies on the line.
Now take another point to define the half-plane, e.g. C=[1,1]; substituting gives:
X + Y - 5 = 1 + 1 - 5 < 0
Any point giving the opposite inequality sign lies in the other half-plane, i.e. the points with:
X + Y - 5 > 0
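A small sketch of this half-plane test in Python (the helper name side_of_line is mine), using the worked example above:

def side_of_line(a, b, c):
    # Sign of point c relative to the line through a and b:
    # > 0 on one side, < 0 on the other, 0 on the line itself.
    # This is the cross product of b-a and c-a, equivalent (up to a
    # positive factor) to plugging c into the line equation derived above.
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

A, B, C = (2, 3), (4, 1), (1, 1)
print(side_of_line(A, B, C))       # negative: C is in one half-plane
print(side_of_line(A, B, (5, 5)))  # positive: the opposite half-plane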
Solution: finding the minimum triangle that covers the point S.
1. Find the closest point P, minimizing sqrt((Xp - Xs)^2 + (Yp - Ys)^2).
2. Find a vector perpendicular to SP: u = (-Yp+Ys, Xp-Xs).
3. Find the two closest points A and B in the half-plane on the opposite side of P with respect to the line through S with direction u (see the subsolution), such that A and B lie on different sides of the line q = SP (see the final part of the subsolution).
4. Now triangle ABP covers S: calculate the sum of distances |SP| + |SA| + |SB|.
5. Take the next closest point to S as P and continue from step 1. If the new sum of distances is smaller than the previous one, remember it. Stop when |SP| is greater than the smallest sum of distances found or no more points are available.
I hope this diagram makes it clear.
This is my first shot:
1. Split the space into quadrants, with the picked point at the [0,0] coordinates.
2. Find the closest point in each quadrant (so you have 4 points).
3. Any triangle from these points should be small enough (but not necessarily the smallest).
Take the closest N=3 points. Check whether the triangle fits. If not, increment N by one and try out all combinations. Do that until something fits or nothing does.