Given n points on a 2-D plane, what is the point such that the distance from all the points is minimized? This point need not be from the set of points given. Is it centroid or something else?
How to find all such points(if more than one) with an algorithm?
This is known as the geometric median (also called the Fermat-Weber point, or "Center of Distance") and is different from the centroid.
First you have to define which measure of distance you are using. Assuming the standard Euclidean metric d = sqrt((x1-x2)^2 + (y1-y2)^2), the problem is to minimise the sum of these distances to all n points, and the minimiser is not necessarily unique.
The easiest example showing that the answer is not unique is two points on a straight line: any point on the segment between them has the same total distance to both.
In 1D, the correct answer will be any answer that has the same number of points to the right and the left. As long as this is true, then any move to the left and right will increase and decrease the left and right sides by the same amount, and so leave the distance the same. This also proves the centroid is not necessarily the right answer.
If we extend to 2D this is no longer the case, as the sqrt makes the problem weighted. Surprisingly to me there is no closed-form solution in general; the usual iterative method is Weiszfeld's algorithm. The page here seems to use a brute-force method. I never knew that!
If I wanted to use an algorithm, then I would find the median point in X and Y as a start point, then use a gradient descent algorithm - this would get the answer pretty quickly. The objective is convex, so descent cannot get stuck in a local minimum, but because of the square roots it is not a quadratic, so an exact formula should not be expected.
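A minimal sketch of the iterative approach described above, starting from the median in X and Y. Instead of plain gradient descent it uses Weiszfeld's fixed-point iteration (a swapped-in technique, which can be read as gradient descent with a particular step size). The function name `geometric_median` and the parameters `tol` and `max_iter` are hypothetical, and skipping a coinciding input point is a simplification that may behave poorly when the optimum is itself an input point.

```python
import math
from statistics import median


def geometric_median(points, tol=1e-9, max_iter=1000):
    """Approximate the point minimising the sum of Euclidean distances."""
    # Start from the coordinate-wise median, as suggested in the text.
    x = median(p[0] for p in points)
    y = median(p[1] for p in points)
    for _ in range(max_iter):
        wsum = xsum = ysum = 0.0
        for px, py in points:
            d = math.hypot(x - px, y - py)
            if d < 1e-12:
                # Assumption: ignore an input point we are sitting on,
                # to avoid dividing by zero.
                continue
            w = 1.0 / d
            wsum += w
            xsum += w * px
            ysum += w * py
        if wsum == 0.0:
            break  # every input point coincides with the current estimate
        nx, ny = xsum / wsum, ysum / wsum
        moved = math.hypot(nx - x, ny - y)
        x, y = nx, ny
        if moved < tol:
            break
    return x, y
```

For four points in convex position the result should approach the intersection of the diagonals, which is a convenient sanity check.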
There may be more than one point. Consider a plane with only two points on it. Those points describe a line-segment. Any point on that line-segment will have the same total distance from the two end-points.
This is discussed in detail here http://www.ddj.com/architect/184405252?pgno=1
A brute-force algorithm might give you the best results. First, locate a rectangle (or any quadrilateral) bounding the input points. Then, for each candidate point inside the rectangle, sum its distances to all the points of the input set. Call this the 'cost' of the candidate. Repeat for every candidate and select the one with the minimum cost.
Intelligence can also be added to the algorithm: it can eliminate areas based on average cost, etc.
That's how I would approach the problem at least... hope it helps.
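The brute-force idea can be sketched as a grid search over the bounding rectangle. The name `brute_force_min_point` and the `steps` resolution parameter are assumptions for illustration; the answer is only as accurate as the grid spacing.

```python
import math


def brute_force_min_point(points, steps=200):
    """Scan a (steps+1) x (steps+1) grid over the bounding box of the input."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, max_x = min(xs), max(xs)
    min_y, max_y = min(ys), max(ys)
    best, best_cost = None, math.inf
    for i in range(steps + 1):
        x = min_x + (max_x - min_x) * i / steps
        for j in range(steps + 1):
            y = min_y + (max_y - min_y) * j / steps
            # The 'cost' of a candidate is its total distance to the input set.
            cost = sum(math.hypot(x - px, y - py) for px, py in points)
            if cost < best_cost:
                best, best_cost = (x, y), cost
    return best, best_cost
```

The running time is proportional to steps^2 times the number of input points, so this is only practical for coarse grids or small inputs.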
Related
I'm looking for a general algorithm for creating an evenly spaced grid, and I've been surprised how difficult it is to find!
Is this a well solved problem whose name I don't know?
Or is this an unsolved problem that is best done by self organising map?
More specifically, I'm attempting to make a grid on a 2D Cartesian plane in which the Euclidean distance between each point and 4 bounding lines (or "walls" to make a bounding box) are equal or nearly equal.
For a square number, this is as simple as making a grid with sqrt(n) rows and sqrt(n) columns with equal spacing positioned in the center of the bounding box. For 5 points, the pattern would presumably either be circular or 4 points with a point in the middle.
I didn't find a very good solution, so I've sadly left the problem alone and settled with a quick function that produces the following grid:
There is no simple general solution to this problem. A self-organizing map is probably one of the best choices.
Another way to approach this problem is to imagine the points as particles that repel each other and that are also repelled by the walls. As an initial arrangement, you could evenly distribute as many points as the next smaller square number allows - for this you already have a solution. Then randomly add the remaining points.
Iteratively modify the locations to minimize the energy function based on the total force between the particles and walls. The result will of course depend on the force law, i.e. how the force depends on the distance.
To solve this, you can use numerical methods like the finite element method (FEM).
A simplified and less efficient method that is based on the same principle is to first set up an estimated minimal distance, based on the square number case which you can calculate. Then iterate through all points a number of times and for each one calculate the distance to its closest neighbor. If this is smaller than the estimated distance, move your point into the opposite direction by a certain fraction of the difference.
This method will generally not lead to a stable minimum but should find an acceptable solution after a number of iterations. You will have to experiment with the step size and the number of iterations.
To summarize, you have three options:
FEM method: Efficient but difficult to implement
Self organizing map: Slightly less efficient, medium complexity of implementation.
Iteration described in last section: Less efficient but easy to implement.
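A minimal sketch of the third option, the simple nearest-neighbour iteration. The function name `relax`, its parameters, and the choice to clamp points to the bounding box are assumptions for illustration; `target` is the estimated minimal distance you would derive from the square-number case.

```python
import math


def relax(points, target, step=0.5, iterations=100, box=(0.0, 0.0, 1.0, 1.0)):
    """Push each point away from its closest neighbour if it is nearer than target."""
    pts = [list(p) for p in points]
    x0, y0, x1, y1 = box
    for _ in range(iterations):
        for i, p in enumerate(pts):
            # Find the closest neighbour of point i.
            nearest, d = None, math.inf
            for j, q in enumerate(pts):
                if j != i:
                    dj = math.hypot(p[0] - q[0], p[1] - q[1])
                    if dj < d:
                        nearest, d = q, dj
            if nearest is None or d >= target:
                continue
            if d == 0.0:
                ux, uy = 1.0, 0.0  # assumption: arbitrary direction for coincident points
            else:
                ux, uy = (p[0] - nearest[0]) / d, (p[1] - nearest[1]) / d
            # Move away by a fraction of the shortfall, then stay inside the walls.
            shift = step * (target - d)
            p[0] = min(max(p[0] + ux * shift, x0), x1)
            p[1] = min(max(p[1] + uy * shift, y0), y1)
    return [tuple(p) for p in pts]
```

Points are updated in place within each pass, so the result depends on the order of the points as well as on `step` and `iterations`.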
Unfortunately your problem is still not very clearly specified. You say you want the points to be "equidistant", yet in your example some pairs of points are far apart (e.g. top left and bottom right) and the points are all different distances from the walls.
Perhaps you want the points to have equal minimum distance? In which case a simple solution is to draw a cross shape, with one point in the centre and the remainder forming a vertical and horizontal crossed line. The gap between the walls and the points, and the points in the lines can all be equal and this can work with any number of points.
I'm looking for an efficient solution to the following problem: for a given set of points in n-dimensional Euclidean space, find the member of this set that minimizes the total distance to the other points in the set.
The obvious naïve approach is quadratic, so I'm looking for something less than quadratic.
My first thought was that all I need is to find the center of the bounding sphere and then find the point in the set closest to this center. But this is actually not true: imagine a right triangle - all its vertices are equidistant from such a center, yet exactly one vertex meets our requirements.
It would be nice if someone could shed some light on this issue.
What minimizes the distance to all of the points is their average. Only a guess, but once you have found the average you could find the set member closest to it. As correctly pointed out in the comments, the median rather than the average minimizes the sum of distances (the average minimizes the sum of squared distances). In more than one dimension the coordinate-wise median minimizes the sum of Manhattan distances; the minimizer of the sum of Euclidean distances is the geometric median. The coordinate-wise median can be calculated in O(n) per dimension, so for high-dimensional datasets this solution is O(n*m), where m is the number of dimensions.
Also some links:
See accepted answer here: Algorithm to find point of minimum total distance from locations
And link provided by mcdowella: http://en.wikipedia.org/wiki/Geometric_median
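A small sketch of the heuristic from this answer: take the coordinate-wise median and return the set member closest to it. The name `closest_member_to_median` is hypothetical. This is only a heuristic and is not guaranteed to return the member with the smallest total distance. Note also that `statistics.median` sorts its input, so this version costs O(n log n) per dimension rather than the O(n) a selection algorithm would give.

```python
import math
from statistics import median


def closest_member_to_median(points):
    """Return the input point nearest to the coordinate-wise median."""
    dim = len(points[0])
    center = [median(p[k] for p in points) for k in range(dim)]
    return min(points, key=lambda p: math.dist(p, center))
```

Comparing its output against the naive quadratic search on a few samples is a quick way to see how often the heuristic agrees with the exact answer for your data.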
I am making this up as I go along, but there appears to be a close connection between "best point of a set" and "best point" in convex optimization.
Your score function is a sum of distances. Each distance is convex (V-shaped in one dimension, cone-shaped in general), so their sum is convex. In particular it has a perfectly good derivative everywhere except at points in the set, and this derivative is optimistic: if you take the value at a point and its derivative there, neglecting the contribution of any set point that coincides with the point you are looking at, then predictions based on these never exceed the true sum of distances. The plane formed using the derivative lies on or beneath the true function and touches it at the chosen point.
This leads to the following algorithm:
Repeatedly
Pick a point at random and look to see if it is the best point so far. If so, take note of it. Take the derivative of the sum of distances at this point. Use this, and the value at that point, to work out the predicted sum of distances at every other point, and discard as possible answers the points where this prediction is worse than the best answer so far (although you still need to take them into account when working out distances and derivatives). These will be the points on the far side of a plane drawn through the chosen point normal to the derivative.
Now discard the chosen point as a contender as well and repeat if there are any points left to consider.
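The loop above can be sketched as follows. This is not the 200+ lines of Java mentioned later in the answer; the name `best_member` and the `seed` parameter are assumptions for illustration. The derivative used at a chosen point ignores set points that coincide with it, which keeps the linear prediction a valid lower bound.

```python
import math
import random


def best_member(points, seed=0):
    """Return (index, cost) of the set member with the smallest total distance."""
    rng = random.Random(seed)
    dim = len(points[0])
    candidates = list(range(len(points)))
    best_idx, best_cost = None, math.inf
    while candidates:
        # Pick a remaining contender at random and remove it from the pool.
        i = candidates.pop(rng.randrange(len(candidates)))
        p = points[i]
        cost = 0.0
        grad = [0.0] * dim
        for q in points:  # discarded points still count towards cost and gradient
            d = math.dist(p, q)
            if d > 0.0:
                cost += d
                for k in range(dim):
                    grad[k] += (p[k] - q[k]) / d
        if cost < best_cost:
            best_idx, best_cost = i, cost
        # Keep only contenders whose optimistic prediction can still match the best.
        candidates = [
            j for j in candidates
            if cost + sum(g * (a - b) for g, a, b in zip(grad, points[j], p)) <= best_cost
        ]
    return best_idx, best_cost
```

Each pass costs O(n) per dimension, so the total work depends on how many contenders each pass eliminates.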
I would expect this to be something like n log n on randomly chosen points. However, if the set of points form the vertices of a regular polygon in n dimensions then it will cost N^2, discarding only the chosen point each time - any of the N points is in fact a correct answer and they all have the same sum of distances from each other.
I will of course up-vote anybody who can confirm or deny this general principle for finding the best of a set of given points under a convex objective function.
OK - I was interested enough in this to program this up - so I have 200+ lines of Java to dump in here if anybody cares. In 2 dimensions it's very fast, but at 20 dimensions you gain only a factor of two or so - this is reasonably understandable - each iteration cuts off points by projecting the problem down to a line and chopping off a fraction of the points outside the line. A randomly chosen point will be about half as far away from the centre as the other points - and very roughly you can expect the projection to cut off all but some multiple of the d-th root of 1/2 so as d increases the fraction of points you can discard in each iteration reduces.
The Problem:
I have an image that I downloaded from Google's static map API. I use this image to create a "magic wand" type feature that selects a shape where the user clicks. For those interested, I am using the graph cut algorithm to find the shape that the user clicked. I then find all the points that represent the border of this shape (borderPoints) using contour tracing.
My Goal:
Straighten out the lines (if possible) and minimize the number of borderPoints (as much as possible). My current use case is the roofs of houses, so in the majority of cases I would hope that I could find the corners and just use those as the borderPoints instead of all the varying points in between. But I am having trouble figuring out how to find those corners because of the bumpy pixel lines.
My Attempts at a Solution:
One simple technique is to loop over the points checking the point before, the current point, and the point after. If the point before and the point after have the same x or the same y then the current point can be removed. This trims the number of points down a little but not as much as I would like.
I also tried looking at the points before and after to see whether the current point could be removed if it wasn't within a certain slope range, but had little success: occasionally a key corner point was removed because the image was fuzzy and the corner was slightly rounded.
My Question:
Are there any algorithms for doing this type of thing? If so, what are they called? If not, any thoughts on how to programmatically approach this problem?
This sounds similar to the Ramer–Douglas–Peucker algorithm. You may be able to do better by exploiting the fact that all your points lie on a grid.
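For reference, a compact sketch of the Ramer–Douglas–Peucker algorithm on an open polyline. The names `rdp` and `perpendicular_distance` and the tolerance parameter `epsilon` are chosen here for illustration, and the sketch does not exploit the grid structure mentioned above.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        # a and b coincide (e.g. a closed contour): fall back to point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / length


def rdp(points, epsilon):
    """Drop points that lie within epsilon of the simplified polyline."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax > epsilon:
        # Keep the farthest point and simplify both halves recursively.
        left = rdp(points[:index + 1], epsilon)
        right = rdp(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
```

With a bumpy L-shaped border and an `epsilon` larger than the bumps, only the two end points and the corner should survive. The recursion depth grows with the number of retained points, so very long contours may need an iterative version.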
Seems to me like you are looking for a polynomial approximation of degree 1.
For a quick answer to your question, you may want to read this: http://en.wikipedia.org/wiki/Simple_regression. The Numerical example section shows you concretely how the equation for your line can be computed.
Polynomial approximations allow you to approximate a function, curve, or group of points (whatever you want to call it) with a polynomial function of the form a_n x^n + ... + a_1 x + a_0.
In your case, you want a line, so you want a function a_1 x + a_0 where a_1 and a_0 are calculated to minimize the error with respect to the set of points you have.
There are various ways of computing your error (called a norm) and minimizing it. You may be interested for example in finding the line that minimizes the distance to any of the points you have (minimizing the max), or in minimizing the distance to the set of points as a whole (minimizing the sum of absolute differences, or the sum of squares of differences, etc.)
In terms of algorithms, you may want to look at Chebyshev approximation and the Remez algorithm specifically. These solve the approximation of a function with a polynomial of any degree, but in your case you will only care about degree 1.
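As a concrete instance of the degree-1 case, here is a sketch of an ordinary least-squares line fit (minimizing the sum of squared vertical differences, one of the error measures listed above). The name `fit_line` is hypothetical; the returned pair corresponds to the slope and intercept of the line.

```python
def fit_line(points):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0.0:
        # All x values are equal: the best line is vertical and has no slope.
        raise ValueError("cannot fit y = a1*x + a0 to a vertical set of points")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Because this measures vertical error only, near-vertical roof edges would need the roles of x and y swapped, or a fit based on perpendicular distance.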
I'm trying to come up with an algorithm that will do the following:
If a set of points is given, find for a query point the largest circle (with the query point as its center) that does not contain any points from the set.
So far I've thought of using a Voronoi diagram to find the areas (cells) that contain the points closest to a site point of the set, and then using the edge list from the Voronoi diagram to construct a trapezoidal decomposition. From the decomposition I will be able to find which cell the query point lies in, and then the radius of the circle will be the distance from the query point to the point (site) of that cell. I think that the storage needed to create something like this is linear, since the Voronoi diagram needs O(n) storage, and creating the trapezoidal decomposition from it can also be done with O(n) storage.
*Edit: Query time must be O(log n), which means I can't iterate through all of the points of the set one at a time.
Does this sound right, or am I missing something here?
Also, if anyone has some references that I could look at regarding this algorithm please let me know. Thanks :)
This question seems to be asking for the distance from the query point to the closest point to it in the set, so one way to answer it would be to find that closest point. One reasonably standard way of doing this would be with a http://en.wikipedia.org/wiki/K-d_tree, and this question in general is covered in http://en.wikipedia.org/wiki/Nearest_neighbour_search
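A minimal 2-D k-d tree sketch for this nearest-neighbour query. The names `build_kdtree` and `nearest_distance` and the tuple-based node layout are assumptions for illustration. Query time is logarithmic on average for well-distributed points, but it is not a worst-case guarantee.

```python
import math


def build_kdtree(points, depth=0):
    """Build a tree of (point, left, right) tuples, splitting on x and y in turn."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid],
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def nearest_distance(node, query, depth=0, best=math.inf):
    """Distance from query to its nearest stored point (the circle radius)."""
    if node is None:
        return best
    point, left, right = node
    best = min(best, math.dist(point, query))
    axis = depth % 2
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_distance(near, query, depth + 1, best)
    # Only cross the splitting line if a closer point could lie beyond it.
    if abs(diff) < best:
        best = nearest_distance(far, query, depth + 1, best)
    return best
```

The tree is built once from the point set and then reused for every query point.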
That sounds overly complex. I don't even know what a Voronoi diagram is, but assuming your points are all in a 2D plane (which seems to be the case since you mention a circle, not a sphere) this is quite trivial:
Iterate through all the points and find the one closest to the query point. The distance is given by the Pythagorean theorem: sqrt((point_x - query_x)^2 + (point_y - query_y)^2). The smallest distance is the radius of the circle.
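The linear scan described in this answer fits in a few lines. The name `largest_empty_radius` is hypothetical. Each query takes O(n) time, so it does not meet the O(log n) requirement stated in the question's edit.

```python
import math


def largest_empty_radius(points, query):
    """Radius of the largest circle centred on query that contains no input point."""
    qx, qy = query
    return min(math.hypot(px - qx, py - qy) for px, py in points)
```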
I have a list of rectangles that don't have to be parallel to the axes. I also have a master rectangle that is parallel to the axes.
I need an algorithm that can tell which rectangle a given point is closest to (the point must be in the master rectangle). The list of rectangles and the master rectangle won't change, and the algorithm will be called with many points, so some data structure should be created to make the lookup faster.
To be clear: the distance from a rectangle to a point is the distance from the point to the closest point in the rectangle.
What algorithm/data structure can be used for this? Memory is the higher priority here; n log n is OK but n^2 is not.
You should be able to do this with a Voronoi diagram with O(n log n) preprocessing time with O(log n) time queries. Because the objects are rectangles, not points, the cells may be curved. Nevertheless, a Voronoi diagram should work fine for your purposes. (See http://en.wikipedia.org/wiki/Voronoi_diagram)
For a quick and dirty solution that you could actually get working within a day, you could do something inspired by locality sensitive hashing. For example, if the rectangles are somewhat well-spaced, you could hash them into square buckets with a few different offsets, and then for each query examine each rectangle that falls in one of the handful of buckets that contain the query point.
You should be able to do this in O(n) time and O(n) memory.
Calculate the closest point on each edge of each rectangle to the point in question. To do this, see my detailed answer in this question. Even though that question deals with a point inside the polygon (rather than outside of it), the algorithm can still be applied here.
For each rectangle, calculate the distance from the point in question to each of these closest points on its edges; the smallest is the distance to that rectangle. See the link above for more details.
Find the minimum of these distances over all of the rectangles. The rectangle corresponding to the minimum distance is the winner.
If memory is more valuable than speed, use brute force: for a given point S, compute the distance from S to each edge. Choose the rectangle with the shortest distance.
This solution requires no additional memory, while its execution time is in O(n).
Depending on your exact problem specification, you may have to adjust this solution if the rectangles are allowed to overlap with the master rectangle.
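A sketch of the brute-force lookup, assuming each rectangle is given as four corner points listed in order around its boundary (an assumption, since the question does not fix a representation). The function names are hypothetical. A point inside a rectangle is treated as being at distance zero, following the definition of distance in the question.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from p to the closest point on segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    len2 = dx * dx + dy * dy
    # Parameter of the perpendicular foot, clamped so it stays on the segment.
    t = 0.0 if len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def inside_convex(p, corners):
    """True if p lies inside or on the convex polygon given by corners."""
    crosses = []
    for i in range(len(corners)):
        ax, ay = corners[i]
        bx, by = corners[(i + 1) % len(corners)]
        crosses.append((bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax))
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)


def rectangle_distance(p, corners):
    if inside_convex(p, corners):
        return 0.0
    return min(point_segment_distance(p, corners[i], corners[(i + 1) % 4])
               for i in range(4))


def closest_rectangle(p, rectangles):
    """Index of the rectangle nearest to p, by scanning all of them."""
    return min(range(len(rectangles)),
               key=lambda i: rectangle_distance(p, rectangles[i]))
```

No extra memory is used beyond the rectangle list itself, at the cost of a full scan per query.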
As you described, the distance between a point and a rectangle is the minimum length over two kinds of segments: the perpendiculars dropped from the point onto each of the four edges (counting only those whose foot lies on the edge), and the segments connecting the point to each of the four vertices.
(My English is not good at describing a math solution, so I think you should think more deeply for understanding my explanation).
For each rectangle, you should store its four vertices and its four edge equations, so that the distance to a specific point can be calculated quickly.