The problem here is to find set of all integer points which gives minimum sum over all Manhattan distances from given set of points!
For example:
lets have a given set of points { P1, P2, P3...Pn }
Basic problem is to find a point say X which would have minimum sum over all distances from points { P1, P2, P3... Pn }.
i.e. |P1-X| + |P2-X| + .... + |Pn-X| = D, where D will be minimum over all X.
Moving a step further, there can be more than one value of X satisfying above condition. i.e. more than one X can be possible which would give the same value D. So, we need to find all such X.
One basic approach that anyone can think of will be to find the median of inputs and then brute force the co-ordinates which is mentioned in this post
But the problem with such approach is: if the median gives two values which are very apart, then we end up brute forcing all points which will never run in given time.
So, is there any other approach which would give the result even when the points are very far apart (where median gives a range which is of the order of 10^9).
You can consider X and Y separately, since they add to the distance independently of each other. This reduces the question to finding, given n points on a line, a point with the minimum sum-of-distances to the other points. This is simple: any point between the two medians (inclusive) will satisfy this.
Proof: If we have an even number of points, there will be two medians. A point between the two medians will have n/2 points to the left and n/2 points to the right, and a total sum-of-distances to those points of S.
If we move it one point to the left, S will go up by n/2 (since we're moving away from the right-most points) and down by n/2 (since we're moving towards the left-most points), so overall S remains the same. This holds true until we hit the left-most median point. When we move one left of the left-most median point, we now have (n/2 + 1) points to the right, and (n/2 - 1) points to the left, so S goes up by two. Continuing to the left will only increase S further.
By the same logic, all points to the right of the right-most median also have a higher S.
If we have an odd number of points, there is only one median. Using the same logic as above, we can show that it has the lowest value of S.
If the median gives you an interval of the order of 10^9 then each point in that interval is as good as any other.
So depending on what you want to do with those points later on you can either return the range or enumerate points in that range. No way around it..
Obviously in two dimensions you'll get a bouding rectangle, in 3 dimensions a bounding cuboid etc..
The result will always be a cartesian product of ranges obtained for each dimension, so you can return a list of those ranges as a result.
Since in manhattan distance each component contributes separately, you can consider them separately too. The optimal answer is ( median(x),median(y) ). You need to look around this point for integer solutions.
NOTE: I did not read your question properly while answering. My answer still holds, but probably you knew about this solution already.
Yes i also think that for odd number of N points on a grid , there will be only a Single point(i.e the MEDIAN) which will be at minimum sum of Manhattan distance from all other points.
For Even value of N, the scenario will be a little different.
According to me if two Sets X = {1,2} and Y= {3,4} their Cartesian product will be always 4.
i.e X × Y = {1,2} × {3,4} = {(1,3), (1,4), (2,3), (2,4)}. This is what i have understood so far.
As for EVEN number of values we always take "MIDDLE TWO" values as MEDIAN. Taking 2 from X and 2 from Y will always return a Cartesian product of 4 points.
Correct me if i am wrong.
Related
I'm new into coding and today I completed the trivial solution for the Closest-Pair problem in a 2-D space. (2 for loops)
However I gave up finding any solution which could do it in O(n log n). Even after researching it, I still don't understand how this can be faster than the trivial method.
What I understand:
-> At first we split the array in 2 halfs and sort everything only considering the X coordinates. This can be done in n log n.
Next there are recursive calls which "find the two points with the lowest distance" in each half. But how is this done exactly below O(n^2)?
In my understanding it is impossible to find the lowest distance between N/2 points without checking every single one of them.
There is a solution in 1-D which absolutely makes sense to me. After sorting we know, that the distance between two non-adjacent points can't be lower than the distance of at least 2 adjacent ones. However this is not true for 2-D space, since we have an additional Y coordinate which could lead to the lowest distance between two points which are not adjacent on the X axis.
First of all, heed the advice of user #Evg - this answer cannot substitute the comprehensive description and mathematically rigorous analysis of the algorithm.
However, here are some ideas to get the intuition started:
(Recursion structure)
The question states:
Next there are recursive calls which "find the two points with the lowest distance" in each half. But how is this done exactly below O(n^2)? In my understanding it is impossible to find the lowest distance between N/2 points without checking every single one of them.
The recursion, however, does not stop at level 1 - assume for the sake of the argument that some O(n log n) algorithm works. Finding closest pairs among N/2 points applying that very algorithm takes O(N/2 log N/2) - not O((N/2)^2).
(Consequences of finding a closest pair in one half)
If you have found a closest pair (p, q) in the 'left' half of the point set, this pair's distance sets an upper bound to the width of a corridor around the halving line from which a closer pair (r, s) with r from the left, s from the right half can be drawn. If the closest distance found so far is 'small', it significantly reduces the size of the candidate set. As the points have been ordered by their x coordinate, the algorithm can exploit the information efficiently.
Said corridor may still cover up to the whole set of N points, but if it does, it provides information of the geometry of the point set: the points of each half will basically be aligned along a vertical line. This information can be exploited algorithmically - the most naive way would be to execute the algorithm once again but sorting along y coordinates and halving the point set by a horizontal line. Note that executing any algorithm a constant number of times does not change asymptotic run time expressed by the O(.) notation.
(Finding a close pair with one point from each half)
Consider checking a pair of points (r, s), one point from each half. It is known that the difference in their x and y coordinates, resp., mustn't exceed the minimal distance d found so far. It is known from the recursion that there can be no points r', s' (r' from the left, s' from the right half) closer to r, s, resp., than d. So given some r there cannot be 'many' candidates from the other half.
Imagine a circle of radius d drawn around r. Any point s from the other half being closer than d must be located within that circle. Let there be a few of them - however, the minimum distance among each pair still be at least d. The maximum number of points that can be distributed within a circle of radius d such that the distance between each pair of them is at least d is 7 - think of a regular hexagon with side length d and its center coinciding with the circle's center.
So after the recursion, at most every r from the left half needs to be checked against at max a constant number of points from the other half which makes the part of the algorithm after the recursion run in O(N).
Note that finding the pairing candidates for a given r is an efficient operation - the points from both halves have been sorted by the same criterion.
My problem is: we have N points in a 2D space, each point has a positive weight. Given a query consisting of two real numbers a,b and one integer k, find the position of a rectangle of size a x b, with edges are parallel to axes, so that the sum of weights of top-k points, i.e. k points with highest weights, covered by the rectangle is maximized?
Any suggestion is appreciated.
P.S.:
There are two related problems, which are already well-studied:
Maximum region sum: find the rectangle with the highest total weight sum. Complexity: NlogN.
top-K query for orthogonal ranges: find top-k points in a given rectangle. Complexity: O(log(N)^2+k).
You can reduce this problem into finding two points in the rectangle: rightmost and topmost. So effectively you can select every pair of points and calculate the top-k weight (which according to you is O(log(N)^2+k)). Complexity: O(N^2*(log(N)^2+k)).
Now, given two points, they might not form a valid pair: they might be too far or one point may be right and top of the other point. So, in reality, this will be much faster.
My guess is the optimal solution will be a variation of maximum region sum problem. Could you point to a link describing that algorithm?
An non-optimal answer is the following:
Generate all the possible k-plets of points (they are N × N-1 × … × N-k+1, so this is O(Nk) and can be done via recursion).
Filter this list down by eliminating all k-plets which are not enclosed in a a×b rectangle: this is a O(k Nk) at worst.
Find the k-plet which has the maximum weight: this is a O(k Nk-1) at worst.
Thus, this algorithm is O(k Nk).
Improving the algorithm
Step 2 can be integrated in step 1 by stopping the branch recursion when a set of points is already too large. This does not change the need to scan the element at least once, but it can reduce the number significantly: think of cases where there are no solutions because all points are separated more than the size of the rectangle, that can be found in O(N2).
Also, the permutation generator in step 1 can be made to return the points in order by x or y coordinate, by pre-sorting the point array correspondingly. This is useful because it lets us discard a bunch of more possibilities up front. Suppose the array is sorted by y coordinate, so the k-plets returned will be ordered by y coordinate. Now, supposing we are discarding a branch because it contains a point whose y coordinate is outside the max rectangle, we can also discard all the next sibling branches because their y coordinate will be more than of equal to the current one which is already out of bounds.
This adds O(n log n) for the sort, but the improvement can be quite significant in many cases -- again, when there are many outliers. The coordinate should be chosen corresponding to the minimum rectangle side, divided by the corresponding side of the 2D field -- by which I mean the maximum coordinate minus the minimum coordinate of all points.
Finally, if all the points lie within an a×b rectangle, then the algorithm performs as O(k Nk) anyways. If this is a concrete possibility, it should be checked, an easy O(N) loop, and if so then it's enough to return the points with the top N weights, which is also O(N).
I read a post which talks about pretty much the same problem. But here I simplify the problem hoping that a concrete proof can be offered.
There is a set A which contains some discrete points (1-dimension), like {1, 3, 37, 59}. And I want to pick one point from A which can minimize the sum of distances between this point and others.
There might be a lot of posts out there, and my problem is just the 1-d version of those, and I know how to prove it if A is not discrete, but I fail when A is discrete like above.
Plz offer me the answer with concrete proof, thanks.
Not a strictly (or even loosely) mathematical proof, if that is what you need, you're most likely asking in the wrong place :)
Let's call the 1D coordinate x and let's say raising x is moving to the right. The point you're looking for is mx.
If mx has more points to the right than to the left, moving mx to the right will raise the distance to the fewer points to the left and lower it to the more points to the right which will lower the average distance.
In other words, mx cannot have fewer points to the left than to the right, if there are, there is a way to lower the average.
The converse is also true.
If the number of elements is odd, the only point fulfilling the above requirement is the median.
If the number of elements is even, any point between the two mid elements will fulfill the condition.
Since between points the total-distance function is linear and the distance-function is continuous everywhere, the minimum must be attained at one of the points in the set. So you can restrict A to be in the given set.
Here's a graph for the total-distance function from the points {-3, 1, 2, 5} as an example:
Say there's L points to the left, and R to the right of A.
Moving one point to the right changes the distance by d * (L + 1 - R) (where d is the distance to the next point). Similarly, moving one point to the left changes the distance by d * (R + 1 - L).
The points that minimize |L-R| are where this distance is minimized. That's either the median (if it's in the set), or either of the two adjacent points to the median if it's not.
Considering Fastest way to find minimum distance between points question try to take a look to closest pair of points problem.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 3 years ago.
Improve this question
Given a 2 dimensional plane in which there are n points. I need to generate the equation of a line that divides the plane such that there are n/2 points on one side and n/2 points on the other.
I have assumed the points are distinct, otherwise there might not even be such a line.
If points are distinct, then such a line always exists and is possible to find using a deterministic O(nlogn) time algorithm.
Say the points are P1, P2, ..., P2n. Assume they are not all on the same line. If they were, then we can easily form the splitting line.
First translate the points so that all the co-ordinates (x and y) are positive.
Now suppose we magically had a point Q on the y-axis such that no line formed by those points (i.e. any infinite line Pi-Pj) passes through Q.
Now since Q does not lie within the convex hull of the points, we can easily see that we can order the points by a rotating line passing through Q. For some angle of rotation, half the points will lie on one side and the other half will lie on the other of this rotating line, or, in other words, if we consider the points being sorted by the slope of the line Pi-Q, we could pick a slope between the (median)th and (median+1)th points. This selection can be done in O(n) time by any linear time selection algorithm without any need for actually sorting the points.
Now to pick the point Q.
Say Q was (0,b).
Suppose Q was collinear with P1 (x1,y1) and P2 (x2,y2).
Then we have that
(y1-b)/x1 = (y2-b)/x2 (note we translated the points so that xi > 0).
Solving for b gives
b = (x1y2 - y1x2)/(x1-x2)
(Note, if x1 = x2, then P1 and P2 cannot be collinear with a point on the Y axis).
Consider |b|.
|b| = |x1y2 - y1x2| / |x1 -x2|
Now let the xmax be the x-coordinate of the rightmost point and ymax the co-ordinate of the topmost.
Also let D be the smallest non-zero x-coordinate difference between two points (this exists, as not all xis are same, as not all points are collinear).
Then we have that |b| <= xmax*ymax/D.
Thus, pick our point Q (0,b) to be such that |b| > b_0 = xmax*ymax/D
D can be found in O(nlogn) time.
The magnitude of b_0 can get quite large and we might have to deal with precision issues.
Of course, a better option is to pick Q randomly! With probability 1, you will find the point you need, thus making the expected running time O(n).
If we could find a way to pick Q in O(n) time (by finding some other criterion), then we can make this algorithm run in O(n) time.
Create an arbitrary line in that plane. Project each point onto that line a.k.a for each point, get the closest point on that line to that point.
Order those points along the line in either direction, and choose a point on that line such that there is an equal number of points on the line in either direction.
Get the line perpendicular to the first line which passes through that point. This line will have half the original points on either side.
There are some cases to avoid when doing this. Most importantly, if all the point are themselves on a single line, don't choose a perpendicular line which passes through it. In fact, choose that line itself so you don't have to worry about projecting the points. In terms of the actual mathematics behind this, vector projections will be very useful.
This is a modification of Dividing a plane of points into two equal halves which allows for the case with overlapping points (in which case, it will say whether or not the answer exists).
If number of points is odd, return "impossible".
Pick a random line (two random points)
Project all points onto this line (`O(N)` operation)
(i.e. we pretend this line is our new X'-axis, and write down the
X'-coordinate of each point)
Perform any median-finding algorithm on the X'-coordinates
(`O(N)` or faster-if-desired operation)
(returns 2 medians if no overlapping points)
Return the line perpendicular to original random line that splits the medians
In rare case of overlapping points, repeat a few times (it would take
a pathological case to prevent a solution from existing).
This is O(N) unlike other proposed solutions.
Assuming a solution exists, the above method will probably terminate, though I don't have a proof.
Try the above algorithm a few times unless you detect overlapping points. If you detect a high number of overlapping points, you may be in for a rough ride, but there is a terribly inefficient brute-force solution that involves checking all possible angles:
For every "critical slope range", perform the above algorithm
by choosing a line with a slope within the range.
If all critical slope ranges fail, the solution is impossible.
A critical angle is defined as the angle which could possibly change the result (imagine the solution to a previous answer, rotate the entire set of points until one or more points swaps position with one or more other points, crossing the dividing line. There are only finitely many of these, and I think they are bounded by the number of points, so you're probably looking at something in the range O(N^2)-O(N^2 log(N)) if you have overlapping points, for a brute-force approach.
I'd guess that a good way is to sort/sequence/order the points (e.g. from left to right), and then choose a line which passes through (or between) the middle point[s] in the sequence.
There are obvious cases where no solution is possible. E.g. when you have three heaps of points. One point at location A, Two points at location B, and five points at location C.
If you expect some decent distribution, you can probably get a good result with tlayton's algorithm. To select the initial line slant, you could determine the extent of the whole point set, and choose the angle of the largest diagonal.
The median equally divides a set of numbers in the manner similar to what you're trying to accomplish, and it can be computed in O(n) time using a selection algorithm (the writeup in Cormen et al is better, so you may want to look there instead). So, find the median of your x values Mx (or your y values My if you prefer) and set x = Mx (or y = My) and that line will be axially aligned and split your points equally.
If the nature of your problem requires that no more than one point lies on the line (if you have an odd number of points in your set, at least one of them will be on the line) and you discover that's what's happened (or you just want to guard against the possibility), rotate all of your points by some random angle, θ, and compute the median of the rotated points. You then rotate the median line you computed by -θ and it will evenly divide points.
The likelihood of randomly choosing θ such that the problem manifests itself again is very small with a finite number of points, but if it does, try again with a different θ.
Here is how I approach this problem (with the assumption that n is even and NO three points are collinear):
1) Pick up the point with smallest Y value. Let's call it point P.
2) Take this point as the new origin point, so that all other points will have positive Y values after this transformation.
3) For every other point (there are n - 1 points remaining), think it under the polar coordinate system. Each other point can be represented with a radius and angle. You could ignore the radius and just focus on the angle.
4) How can you find a line that split the points evenly? Find the median of (n - 1) angles. The line from point P to the point with that median angle will split the points evenly.
Time complexity for this algorithm is O(n).
I dont know how useful this is I have seen a similar problem...
If you already have the directional vector (aka the coefficients of the dimensions of your plane).
You can then find two points inside that plane, and by simply using the midpoint formula you can find the midpoint of that plane.
Then using the coefficients of that plane and the midpoint you can find a plane that is from equal distance from both points, using the general equation of a plane.
A line then would constitute in expressing one variable in terms of the other
so you would find a line with equal distance between both planes.
There are different methods of doing this such as projection using the distance equation from a plane but I believe that would complicate your math a lot.
To add to M's answer: a method to generate a Q (that's not so far away) in O(n log n).
To begin with, let Q be any point on the y-axis ie. Q = (0,b) - some good choices might be (0,0) or (0, (ymax-ymin)/2).
Now check if there are two points (x1, y1), (x2, y2) collinear with Q. The line between any point and Q is y = mx + b; since b is constant, this means two points are collinear with Q if their slopes m are equal. So determine the slopes mi for all points and check if there are any duplicates: (amoritized O(n) using a hash-table)
If all the m's are distinct, we're done; we found Q, and M's algorithm above generates the line in O(n) steps.
If two points are collinear with Q, we'll move Q up just a tiny amount ε, Qnew = (0, b + ε), and show that Qnew will not be collinear with two other points.
The criterion for ε, derived below, is:
ε < mminΔ*xmin
To begin with, our m's look like this:
mi = yi/xi - b/xi
Let's find the minimum difference between any two distinct mi and call it mminΔ (O(n log n) by, for instance, sorting then comparing differences between mi and i+1 for all i)
If we fudge b up by ε, the new equation for m becomes:
mi,new = yi/xi - b/xi - ε/xi
= mi,old - ε/xi
Since ε > 0 and xi > 0, all m's are reduced, and all are reduced by a maximum of ε/xmin. Thus, if
ε/xmin < mminΔ, ie.
ε < mminΔ*xmin
is true, then two mi which were previously unequal will be guaranteed to remain unequal.
All that's left is to show that if m1,old = m2,old, then m1,new =/= m2,new. Since both mi were reduced by an amount ε/xi, this is equivalent to showing x1 =/= x2. If they were equal, then:
y1 = m1,oldx1 + b = m2,oldx2 + b = y2
Contradicting our assumption that all points are distinct. Thus, m1, new =/= m2, new, and no two points are collinear with Q.
I picked up the idea from Moron and andand and
continued to form a deterministic O(n) algorithm.
I also assumed that the points are distinct and
n is even (thought the algorithm can be
changed so that uneven n with one point
on the dividing line are also supported).
The algorithm tries to divide the points with a vertical line between them. This only fails if the points in the middle have the same x value. In that case the algorithm determines how many points with the same x value have to be on the left and lower site and and accordingly rotates the line.
I'll try to explain with an example.
Let's asume we have 16 points on a plane.
First we need to get the point with the 8th greatest x-value
and the point with the 9th greatest x-value.
With a selection algorithm this is possible in O(n),
as pointed out in another answer.
If the x-value of that points is different, we are done.
We create a vertical line between that two points and
that splits the points equal.
Problematically now is if the x-values are equal. So we have 3 sets of points.
That on the left side (x < xa), in the middle (x = xa)
and that on the right side (x > xa).
The idea now is to count the points on the left side and
calculate how many points from the middle needs to go there,
so that half of the points are on that side. We can ignore the right side here
because if we have half of the points on the left side, the over half must be on the right side.
So let's asume we have we have 3 points (c=3) on the left side,
6 in the middle and 7 on the right side
(the algorithm doesn't know the count from the middle or right side,
because it doesn't need it, but we could also determine it in O(n)).
We need 8-3=5 points from the middle to go on the left side.
The points we already got in the first step are useless now,
because they are only determined by the x-value
and can be any of the points in the middle.
We want the 5 points from the middle with the lowest y-value on the left side and
the point with the highest y-value on the right side.
Again using the selection algorithm, we get the point with the 5th greatest y-value
and the point with the 6th greatest y-value.
Both points will have the x-value equal to xa,
else we wouldn't get to this step,
because there would be a vertical line.
Now we can create the point Q in the middle of that two points.
Thats one point from the resulting line.
Another point is needed, so that no points from the left or right side are divided.
To get that point we need the point from the left side,
that has the lowest angle (bh) between the the vertical line at xa
and the line determined by that point and Q.
We also need that point from the right side (with angle ag).
The new point R is between the point with the lower angle
and a point on the vertical line
(if the lower angle is on the left side a point above Q
and if the lower angle is on the right side a point below Q).
The line determined by Q and R divides the points in the middle
so that there are a even number of points on both sides.
It doesn't divide any points on the left or right side,
because if it would that point would have a lower angle and
would have been choosen to calculate R.
From the view of a mathematican that should work well in O(n).
For computer programs it is fairly easy to find a case
where precision becomes a problem. An example with 4 points would be
A(0, 100000000), B(0, 100000001), C(0, 0), D(0.0000001, 0).
In this example Q would be (0, 100000000.5) and R (0.00000005, 0).
Which gives B and C on the left side and A and D on the right side.
But it is possible that A and B are both on the dividing line,
because of rounding errors. Or maybe only one of them.
So it belongs to the input values if this algorithm suits to the requirements.
get that two points Pa(xa, ya) and Pb(xb, yb)
which are the medians based on the x values > O(n)
if xa != xb you can stop here
because a y-axis parallel line between that two points is the result > O(1)
get all points where the x value equals xa > O(n)
count points with x value less than xa as c > O(n)
get the lowest point Pc based on the y values from the points from 3. > O(n)
get the greatest point Pd based on the y values from the points from 3. > O(n)
get the (n/2-c)th greatest point Pe based on the y values from the points from 3. > O(n)
also get the next greatest point Pf based on the y values from the points from 3. > O(n)
create a new point Q (xa, (ye+yf)/2)
between Pe and Pf > O(1)
for all points Pi calculate
the angle ai between Pc, Q and Pi and
the angle bi between Pd, Q and Pi > O(n)
get the point Pg with the lowest angle ag (with ag>0° and ag<180°) > O(n)
get the point Ph with the lowest angle bh (with bh>0° and bh<180°) > O(n)
if there aren't any Pg or Ph (all points have same x value)
create a new point R (xa+1, 0) anywhere but with a different x value than xa
else if ag is lower than bh
create a new point R ((xc+xg)/2, (yc+yg)/2) between Pc and Pg
else
create a new point R ((xd+xh)/2, (yd+yh)/2) between Pd and Ph > O(1)
the line determined by Q and R divides the points > O(1)
Given a collection of random points within a grid, how do you check efficiently that they are all lie within a fixed range of other points. ie: Pick any one random point you can then navigate to any other point in the grid.
To clarify further: If you have a 1000 x 1000 grid and randomly placed 100 points in it how can you prove that any one point is within 100 units of a neighbour and all points are accessible by walking from one point to another?
I've been writing some code and came up with an interesting problem: Very occasionally (just once so far) it creates an island of points which exceeds the maximum range from the rest of the points. I need to fix this problem but brute force doesn't appear to be the answer.
It's being written in Java, but I am good with either pseudo-code or C++.
I like #joel.neely 's construction approach but if you want to ensure a more uniform density this is more likely to work (though it would probably produce more of a cluster rather than an overall uniform density):
Randomly place an initial point P_0 by picking x,y from a uniform distribution within the valid grid
For i = 1:N-1
Choose random j = uniformly distributed from 0 to i-1, identify point P_j which has been previously placed
Choose random point P_i where distance(P_i,P_j) < 100, by repeating the following until a valid P_i is chosen in substep 4 below:
Choose (dx,dy) each uniformly distributed from -100 to +100
If dx^2+dy^2 > 100^2, the distance is too large (fails 21.5% of the time), go back to previous step.
Calculate candidate coords(P_i) = coords(P_j) + (dx,dy).
P_i is valid if it is inside the overall valid grid.
Just a quick thought: If you divide the grid into 50x50 patches and when you place the initial points, you also record which patch they belong to. Now, when you want to check if a new point is within 100 pixels of the others, you could simply check the patch plus the 8 surrounding it and see if the point counts match up.
E.g., you know you have 100 random points, and each patch contains the number of points they contain, you can simply sum up and see if it is indeed 100 — which means all points are reachable.
I'm sure there are other ways, tough.
EDIT: The distance from the upper left point to the lower right of a 50x50 patch is sqrt(50^2 + 50^2) = 70 points, so you'd probably have to choose smaller patch size. Maybe 35 or 36 will do (50^2 = sqrt(x^2 + x^2) => x=35.355...).
Find the convex hull of the point set, and then use the rotating calipers method. The two most distant points on the convex hull are the two most distant points in the set. Since all other points are contained in the convex hull, they are guaranteed to be closer than the two extremal points.
As far as evaluating existing sets of points, this looks like a type of Euclidean minimum spanning tree problem. The wikipedia page states that this is a subgraph of the Delaunay triangulation; so I would think it would be sufficient to compute the Delaunay triangulation (see prev. reference or google "computational geometry") and then the minimum spanning tree and verify that all edges have length less than 100.
From reading the references it appears that this is O(N log N), maybe there is a quicker way but this is sufficient.
A simpler (but probably less efficient) algorithm would be something like the following:
Given: the points are in an array from index 0 to N-1.
Sort the points in x-coordinate order, which is O(N log N) for an efficient sort.
Initialize i = 0.
Increment i. If i == N, stop with success. (All points can be reached from another with radius R)
Initialize j = i.
Decrement j.
If j<0 or P[i].x - P[j].x > R, Stop with failure. (there is a gap and all points cannot be reached from each other with radius R)
Otherwise, we get here if P[i].x and P[j].x are within R of each other. Check if point P[j] is sufficiently close to P[i]: if (P[i].x-P[j].x)^2 + (P[i].y-P[j].y)^2 < R^2`, then point P[i] is reachable by one of the previous points within radius R, and go back to step 4.
Keep trying: go back to step 6.
Edit: this could be modified to something that should be O(N log N) but I'm not sure:
Given: the points are in an array from index 0 to N-1.
Sort the points in x-coordinate order, which is O(N log N) for an efficient sort.
Maintain a sorted set YLIST of points in y-coordinate order, initializing YLIST to the set {P[0]}. We'll be sweeping the x-coordinate from left to right, adding points one by one to YLIST and removing points that have an x-coordinate that is too far away from the newly-added point.
Initialize i = 0, j = 0.
Loop invariant always true at this point: All points P[k] where k <= i form a network where they can be reached from each other with radius R. All points within YLIST have x-coordinates that are between P[i].x-R and P[i].x
Increment i. If i == N, stop with success.
If P[i].x-P[j].x <= R, go to step 10. (this is automatically true if i == j)
Point P[j] is not reachable from point P[i] with radius R. Remove P[j] from YLIST (this is O(log N)).
Increment j, go to step 6.
At this point, all points P[j] with j<i and x-coordinates between P[i].x-R and P[i].x are in the set YLIST.
Add P[i] to YLIST (this is O(log N)), and remember the index k within YLIST where YLIST[k]==P[i].
Points YLIST[k-1] and YLIST[k+1] (if they exist; P[i] may be the only element within YLIST or it may be at an extreme end) are the closest points in YLIST to P[i].
If point YLIST[k-1] exists and is within radius R of P[i], then P[i] is reachable with radius R from at least one of the previous points. Go to step 5.
If point YLIST[k+1] exists and is within radius R of P[i], then P[i] is reachable with radius R from at least one of the previous points. Go to step 5.
P[i] is not reachable from any of the previous points. Stop with failure.
New and Improved ;-)
Thanks to Guillaume and Jason S for comments that made me think a bit more. That has produced a second proposal whose statistics show a significant improvement.
Guillaume remarked that the earlier strategy I posted would lose uniform density. Of course, he is right, because it's essentially a "drunkard's walk" which tends to orbit the original point. However, uniform random placement of the points yields a significant probability of failing the "path" requirement (all points being connectible by a path with no step greater than 100). Testing for that condition is expensive; generating purely random solutions until one passes is even more so.
Jason S offered a variation, but statistical testing over a large number of simulations leads me to conclude that his variation produces patterns that are just as clustered as those from my first proposal (based on examining mean and std. dev. of coordinate values).
The revised algorithm below produces point sets whose stats are very similar to those of purely (uniform) random placement, but which are guaranteed by construction to satisfy the path requirement. Unfortunately, it's a bit easier to visualize than to explain verbally. In effect, it requires the points to stagger randomly in a vaguely consistant direction (NE, SE, SW, NW), only changing directions when "bouncing off a wall".
Here's the high-level overview:
Pick an initial point at random, set horizontal travel to RIGHT and vertical travel to DOWN.
Repeat for the remaining number of points (e.g. 99 in the original spec):
2.1. Randomly choose dx and dy whose distance is between 50 and 100. (I assumed Euclidean distance -- square root of sums of squares -- in my trial implementation, but "taxicab" distance -- sum of absolute values -- would be even easier to code.)
2.2. Apply dx and dy to the previous point, based on horizontal and vertical travel (RIGHT/DOWN -> add, LEFT/UP -> subtract).
2.3. If either coordinate goes out of bounds (less than 0 or at least 1000), reflect that coordinate around the boundary violated, and replace its travel with the opposite direction. This means four cases (2 coordinates x 2 boundaries):
2.3.1. if x < 0, then x = -x and reverse LEFT/RIGHT horizontal travel.
2.3.2. if 1000 <= x, then x = 1999 - x and reverse LEFT/RIGHT horizontal travel.
2.3.3. if y < 0, then y = -y and reverse UP/DOWN vertical travel.
2.3.4. if 1000 <= y, then y = 1999 - y and reverse UP/DOWN vertical travel.
Note that the reflections under step 2.3 are guaranteed to leave the new point within 100 units of the previous point, so the path requirement is preserved. However, the horizontal and vertical travel constraints force the generation of points to "sweep" randomly across the entire space, producing more total dispersion than the original pure "drunkard's walk" algorithm.
If I understand your problem correctly, given a set of sites, you want to test whether the nearest neighbor (for the L1 distance, i.e. the grid distance) of each site is at distance less than a value K.
This is easily obtained for the Euclidean distance by computing the Delaunay triangulation of the set of points: the nearest neighbor of a site is one of its neighbor in the Delaunay triangulation. Interestingly, the L1 distance is greater than the Euclidean distance (within a factor sqrt(2)).
It follows that a way of testing your condition is the following:
compute the Delaunay triangulation of the sites
for each site s, start a breadth-first search from s in the triangulation, so that you discover all the vertices at Euclidean distance less than K from s (the Delaunay triangulation has the property that the set of vertices at distance less than K from a given site is connected in the triangulation)
for each site s, among these vertices at distance less than K from s, check if any of them is at L1 distance less than K from s. If not, the property is not satisfied.
This algorithm can be improved in several ways:
the breadth-first search at step 2 should of course be stopped as soon as a site at L1 distance less than K is found.
during the search for a valid neighbor of s, if a site s' is found to be at L1 distance less than K from s, there is no need to look for a valid neighbor for s': s is obviously one of them.
a complete breadth-first search is not needed: after visiting all triangles incident to s, if none of the neighbors of s in the triangulation is a valid neighbor (i.e. a site at L1 distance less than K), denote by (v1,...,vn) the neighbors. There are at most four edges (vi, vi+1) which intersect the horizontal and vertical axis. The search should only be continued through these four (or less) edges. [This follows from the shape of the L1 sphere]
Force the desired condition by construction. Instead of placing all points solely by drawing random numbers, constrain the coordinates as follows:
Randomly place an initial point.
Repeat for the remaining number of points (e.g. 99):
2.1. Randomly select an x-coordinate within some range (e.g. 90) of the previous point.
2.2. Compute the legal range for the y-coordinate that will make it within 100 units of the previous point.
2.3. Randomly select a y-coordinate within that range.
If you want to completely obscure the origin, sort the points by their coordinate pair.
This will not require much overhead vs. pure randomness, but will guarantee that each point is within 100 units of at least one other point (actually, except for the first and last, each point will be within 100 units of two other points).
As a variation on the above, in step 2, randomly choose any already-generated point and use it as the reference instead of the previous point.