Connected points within a grid

Connected points within a grid - algorithm

Given a collection of random points within a grid, how do you efficiently check that they all lie within a fixed range of other points? That is, starting from any one point, you can navigate to any other point in the grid.
To clarify further: if you have a 1000 x 1000 grid and randomly place 100 points in it, how can you prove that every point is within 100 units of a neighbour and that all points are accessible by walking from one point to another?
I've been writing some code and came up with an interesting problem: Very occasionally (just once so far) it creates an island of points which exceeds the maximum range from the rest of the points. I need to fix this problem but brute force doesn't appear to be the answer.
It's being written in Java, but I am good with either pseudo-code or C++.

I like @joel.neely's construction approach, but if you want to ensure a more uniform density this is more likely to work (though it would probably produce more of a cluster rather than an overall uniform density):
1. Randomly place an initial point P_0 by picking x,y from a uniform distribution within the valid grid.
2. For i = 1:N-1
2.1. Choose random j = uniformly distributed from 0 to i-1, and identify the point P_j which has been previously placed.
2.2. Choose a random point P_i where distance(P_i,P_j) < 100, by repeating the following until a valid P_i is chosen in substep 2.2.4 below:
2.2.1. Choose (dx,dy), each uniformly distributed from -100 to +100.
2.2.2. If dx^2+dy^2 > 100^2, the distance is too large (this fails 21.5% of the time); go back to the previous step.
2.2.3. Calculate candidate coords(P_i) = coords(P_j) + (dx,dy).
2.2.4. P_i is valid if it is inside the overall valid grid.
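The steps above can be sketched in Python as follows. This is only an illustration: the function name place_connected_points and its parameters (size, radius, rng) are made up for the example, and the coordinates are treated as floating-point values.

```python
import math
import random

def place_connected_points(n, size=1000.0, radius=100.0, rng=None):
    """Place n points so that each new point is within `radius` of an earlier one."""
    rng = rng or random.Random()
    # Step 1: the first point is uniform over the whole grid.
    points = [(rng.uniform(0, size), rng.uniform(0, size))]
    while len(points) < n:
        # Step 2.1: pick a previously placed point uniformly at random.
        px, py = points[rng.randrange(len(points))]
        while True:
            # Steps 2.2.1-2.2.2: rejection-sample an offset inside the disc.
            dx = rng.uniform(-radius, radius)
            dy = rng.uniform(-radius, radius)
            if dx * dx + dy * dy >= radius * radius:
                continue
            # Steps 2.2.3-2.2.4: accept only candidates inside the grid.
            x, y = px + dx, py + dy
            if 0 <= x <= size and 0 <= y <= size:
                points.append((x, y))
                break
    return points
```

Because every point after the first is attached to an already placed point, the whole set is connected by construction and needs no separate check.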

Just a quick thought: If you divide the grid into 50x50 patches and when you place the initial points, you also record which patch they belong to. Now, when you want to check if a new point is within 100 pixels of the others, you could simply check the patch plus the 8 surrounding it and see if the point counts match up.
E.g., you know you have 100 random points, and each patch contains the number of points they contain, you can simply sum up and see if it is indeed 100 — which means all points are reachable.
I'm sure there are other ways, though.
EDIT: The distance from the upper-left corner to the lower-right corner of a 50x50 patch is sqrt(50^2 + 50^2) ≈ 70.7, so you'd probably have to choose a smaller patch size. Maybe 35 will do (50 = sqrt(x^2 + x^2) => x = 35.355...).

Find the convex hull of the point set, and then use the rotating calipers method. The two most distant points on the convex hull are the two most distant points in the set. Since all other points are contained in the convex hull, they are guaranteed to be closer than the two extremal points.

As far as evaluating existing sets of points, this looks like a type of Euclidean minimum spanning tree problem. The wikipedia page states that this is a subgraph of the Delaunay triangulation; so I would think it would be sufficient to compute the Delaunay triangulation (see prev. reference or google "computational geometry") and then the minimum spanning tree and verify that all edges have length less than 100.
From reading the references it appears that this is O(N log N), maybe there is a quicker way but this is sufficient.
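For a set as small as 100 points, the same criterion (no minimum-spanning-tree edge of length 100 or more) can be checked without a Delaunay triangulation. The sketch below swaps in the plain O(N^2) version of Prim's algorithm instead of the O(N log N) approach described above; the function name connected_within is hypothetical.

```python
import math

def connected_within(points, radius):
    """Return True if the minimum spanning tree has no edge of length >= radius."""
    n = len(points)
    if n <= 1:
        return True
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each point to the tree
    best[0] = 0.0
    for _ in range(n):
        # Pick the point outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        if best[u] >= radius:
            return False    # every remaining point is too far from the tree
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return True
```

The early exit is safe because Prim's algorithm always adds the closest outside point next: if that point is too far away, so is every other remaining point.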
A simpler (but probably less efficient) algorithm would be something like the following:
1. Given: the points are in an array from index 0 to N-1.
2. Sort the points in x-coordinate order, which is O(N log N) for an efficient sort.
3. Initialize i = 0.
4. Increment i. If i == N, stop with success. (All points can be reached from another with radius R.)
5. Initialize j = i.
6. Decrement j.
7. If j < 0 or P[i].x - P[j].x > R, stop with failure. (There is a gap and all points cannot be reached from each other with radius R.)
8. Otherwise, we get here if P[i].x and P[j].x are within R of each other. Check if point P[j] is sufficiently close to P[i]: if (P[i].x-P[j].x)^2 + (P[i].y-P[j].y)^2 < R^2, then point P[i] is reachable by one of the previous points within radius R; go back to step 4.
9. Keep trying: go back to step 6.
Edit: this could be modified to something that should be O(N log N) but I'm not sure:
1. Given: the points are in an array from index 0 to N-1.
2. Sort the points in x-coordinate order, which is O(N log N) for an efficient sort.
3. Maintain a sorted set YLIST of points in y-coordinate order, initializing YLIST to the set {P[0]}. We'll be sweeping the x-coordinate from left to right, adding points one by one to YLIST and removing points that have an x-coordinate that is too far away from the newly-added point.
4. Initialize i = 0, j = 0.
5. Loop invariant always true at this point: all points P[k] where k <= i form a network where they can be reached from each other with radius R. All points within YLIST have x-coordinates that are between P[i].x-R and P[i].x.
6. Increment i. If i == N, stop with success.
7. If P[i].x-P[j].x <= R, go to step 10. (This is automatically true if i == j.)
8. Point P[j] is not reachable from point P[i] with radius R. Remove P[j] from YLIST (this is O(log N)).
9. Increment j, go to step 7.
10. At this point, all points P[j] with j<i and x-coordinates between P[i].x-R and P[i].x are in the set YLIST.
11. Add P[i] to YLIST (this is O(log N)), and remember the index k within YLIST where YLIST[k]==P[i].
12. Points YLIST[k-1] and YLIST[k+1] (if they exist; P[i] may be the only element within YLIST or it may be at an extreme end) are the closest points in YLIST to P[i] in y-coordinate.
13. If point YLIST[k-1] exists and is within radius R of P[i], then P[i] is reachable with radius R from at least one of the previous points. Go to step 5.
14. If point YLIST[k+1] exists and is within radius R of P[i], then P[i] is reachable with radius R from at least one of the previous points. Go to step 5.
15. P[i] is not reachable from any of the previous points. Stop with failure.

New and Improved ;-)
Thanks to Guillaume and Jason S for comments that made me think a bit more. That has produced a second proposal whose statistics show a significant improvement.
Guillaume remarked that the earlier strategy I posted would lose uniform density. Of course, he is right, because it's essentially a "drunkard's walk" which tends to orbit the original point. However, uniform random placement of the points yields a significant probability of failing the "path" requirement (all points being connectible by a path with no step greater than 100). Testing for that condition is expensive; generating purely random solutions until one passes is even more so.
Jason S offered a variation, but statistical testing over a large number of simulations leads me to conclude that his variation produces patterns that are just as clustered as those from my first proposal (based on examining mean and std. dev. of coordinate values).
The revised algorithm below produces point sets whose stats are very similar to those of purely (uniform) random placement, but which are guaranteed by construction to satisfy the path requirement. Unfortunately, it's a bit easier to visualize than to explain verbally. In effect, it requires the points to stagger randomly in a vaguely consistent direction (NE, SE, SW, NW), only changing directions when "bouncing off a wall".
Here's the high-level overview:
1. Pick an initial point at random, set horizontal travel to RIGHT and vertical travel to DOWN.
2. Repeat for the remaining number of points (e.g. 99 in the original spec):
2.1. Randomly choose dx and dy whose distance is between 50 and 100. (I assumed Euclidean distance -- square root of sums of squares -- in my trial implementation, but "taxicab" distance -- sum of absolute values -- would be even easier to code.)
2.2. Apply dx and dy to the previous point, based on horizontal and vertical travel (RIGHT/DOWN -> add, LEFT/UP -> subtract).
2.3. If either coordinate goes out of bounds (less than 0 or at least 1000), reflect that coordinate around the boundary violated, and replace its travel with the opposite direction. This means four cases (2 coordinates x 2 boundaries):
2.3.1. if x < 0, then x = -x and reverse LEFT/RIGHT horizontal travel.
2.3.2. if 1000 <= x, then x = 1999 - x and reverse LEFT/RIGHT horizontal travel.
2.3.3. if y < 0, then y = -y and reverse UP/DOWN vertical travel.
2.3.4. if 1000 <= y, then y = 1999 - y and reverse UP/DOWN vertical travel.
Note that the reflections under step 2.3 are guaranteed to leave the new point within 100 units of the previous point, so the path requirement is preserved. However, the horizontal and vertical travel constraints force the generation of points to "sweep" randomly across the entire space, producing more total dispersion than the original pure "drunkard's walk" algorithm.
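A minimal Python sketch of this "bouncing" walk, assuming integer coordinates in 0..999 and Euclidean step lengths. The name bouncing_walk and its parameters are invented for the example and are not taken from the trial implementation mentioned above.

```python
import math
import random

def bouncing_walk(n, size=1000, min_step=50, max_step=100, rng=None):
    """Generate n points; consecutive points are at most max_step apart."""
    rng = rng or random.Random()
    x, y = rng.randrange(size), rng.randrange(size)
    sx, sy = 1, 1   # +1 means RIGHT / DOWN (add), -1 means LEFT / UP (subtract)
    points = [(x, y)]
    for _ in range(n - 1):
        # Step 2.1: rejection-sample non-negative offsets of suitable length.
        while True:
            dx, dy = rng.randint(0, max_step), rng.randint(0, max_step)
            if min_step <= math.hypot(dx, dy) <= max_step:
                break
        # Step 2.2: apply the offsets in the current travel directions.
        x += sx * dx
        y += sy * dy
        # Step 2.3: reflect at the violated boundary and reverse travel.
        if x < 0:
            x, sx = -x, -sx
        elif x >= size:
            x, sx = 2 * size - 1 - x, -sx
        if y < 0:
            y, sy = -y, -sy
        elif y >= size:
            y, sy = 2 * size - 1 - y, -sy
        points.append((x, y))
    return points
```

With size = 1000 the reflection 2 * size - 1 - x is exactly the 1999 - x rule from step 2.3.2.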

If I understand your problem correctly, given a set of sites, you want to test whether the nearest neighbor (for the L1 distance, i.e. the grid distance) of each site is at distance less than a value K.
This is easily obtained for the Euclidean distance by computing the Delaunay triangulation of the set of points: the nearest neighbor of a site is one of its neighbors in the Delaunay triangulation. Interestingly, the L1 distance is never smaller than the Euclidean distance and is at most sqrt(2) times larger.
It follows that a way of testing your condition is the following:
1. compute the Delaunay triangulation of the sites
2. for each site s, start a breadth-first search from s in the triangulation, so that you discover all the vertices at Euclidean distance less than K from s (the Delaunay triangulation has the property that the set of vertices at distance less than K from a given site is connected in the triangulation)
3. for each site s, among these vertices at distance less than K from s, check if any of them is at L1 distance less than K from s. If not, the property is not satisfied.
This algorithm can be improved in several ways:
the breadth-first search at step 2 should of course be stopped as soon as a site at L1 distance less than K is found.
during the search for a valid neighbor of s, if a site s' is found to be at L1 distance less than K from s, there is no need to look for a valid neighbor for s': s is obviously one of them.
a complete breadth-first search is not needed: after visiting all triangles incident to s, if none of the neighbors of s in the triangulation is a valid neighbor (i.e. a site at L1 distance less than K), denote by (v1,...,vn) the neighbors. There are at most four edges (vi, vi+1) which intersect the horizontal and vertical axes through s. The search should only be continued through these four (or fewer) edges. [This follows from the shape of the L1 sphere]

Force the desired condition by construction. Instead of placing all points solely by drawing random numbers, constrain the coordinates as follows:
1. Randomly place an initial point.
2. Repeat for the remaining number of points (e.g. 99):
2.1. Randomly select an x-coordinate within some range (e.g. 90) of the previous point.
2.2. Compute the legal range for the y-coordinate that will make it within 100 units of the previous point.
2.3. Randomly select a y-coordinate within that range.
3. If you want to completely obscure the origin, sort the points by their coordinate pair.
This will not require much overhead vs. pure randomness, but will guarantee that each point is within 100 units of at least one other point (actually, except for the first and last, each point will be within 100 units of two other points).
As a variation on the above, in step 2, randomly choose any already-generated point and use it as the reference instead of the previous point.

Related

Find two rectangles with minimum areas that cover all points

You're given n points, unsorted in an array. You're supposed to find two rectangles that cover all points and do not overlap. The edges of the rectangles should be parallel to the x or y axis.
The program should return the minimum area covered by all these dots: the area of the first rectangle + the area of the second rectangle.
I tried to solve this problem. I sorted the points by X coordinate, and the first one is the leftmost point of the first rectangle. When we go through the points we find the highest and lowest one. I was thinking that where the difference in x between two adjacent points is the biggest, the first point is the rightmost one of the first rectangle, and the second point is the leftmost one of the second rectangle.
It should work when the points are given as in first example, however, if the example is the second one it doesn't work. As it would return something like this and that's wrong:
This should be correct:
Then I was thinking of sorting twice, the second time by Y coordinate, and then comparing the two total areas: the area when points are sorted by x and the area when points are sorted by y. The smaller area would be the correct answer.

The two rectangles cannot overlap, so one must be either completely to the right or on top of the other. Your idea to sort the points by x-value and find the biggest gap is good, but you should do that for both directions, as you suggested. That would find the correct rectangles in your example.
The biggest gap isn't necessarily the ideal splitting point, however. Depending on the extent of the bounding boxes in the perpendicular direction, the split may be somewhere else. Consider a rectangular area with four quadrants, where two diagonally opposite quadrants are populated with points:
Here, the ideal split isn't where the largest gap is.
You can find the ideal location by considering all possible splits between points with adjacent x- and y-coordinates.
Sort the points by x-coordinate.
Scan the sorted array in ascending order. Keep track of the minimum rectangle to the left of the current point by storing the minimum and maximum y-coordinates. Store these running top and bottom borders for each point.
Now do the same in descending order, where you keep running top and bottom borders for the right rectangle.
Finally, loop through the points again and calculate the areas of the left and right minimal rectangles for a split between two adjacent nodes. Keep track of the minimum area sum.
Then do the same for minimum top and bottom rectangles. The last two steps can be combined, which will save arrays for the minimum bounds for the right rectangle.
This should be O(n · log n) in time: sorting is O(n · log n) and the individual passes are O(n). You need O(n) additional memory for the running bounding boxes for the first rectangle.
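The passes described above can be sketched as follows. The sketch assumes a non-empty list of (x, y) tuples, only splits between points whose sort coordinate actually differs, and handles the top/bottom case by swapping coordinates; the names min_two_rect_area and sweep are hypothetical.

```python
def min_two_rect_area(points):
    """Minimum total area of two axis-parallel rectangles separated by one line."""
    def sweep(pts):
        pts = sorted(pts)
        n = len(pts)
        # left[i] = (lowest, highest) second coordinate among pts[0..i]
        left = []
        lo = hi = pts[0][1]
        for _, v in pts:
            lo, hi = min(lo, v), max(hi, v)
            left.append((lo, hi))
        # Fallback: one bounding box around everything.
        best = (pts[-1][0] - pts[0][0]) * (hi - lo)
        lo = hi = pts[-1][1]
        for i in range(n - 1, 0, -1):
            # The right rectangle covers pts[i..n-1], the left one pts[0..i-1].
            lo, hi = min(lo, pts[i][1]), max(hi, pts[i][1])
            if pts[i - 1][0] < pts[i][0]:   # only split across a real gap
                l_lo, l_hi = left[i - 1]
                area = ((pts[i - 1][0] - pts[0][0]) * (l_hi - l_lo)
                        + (pts[-1][0] - pts[i][0]) * (hi - lo))
                best = min(best, area)
        return best

    by_x = sweep(points)                          # left/right split
    by_y = sweep([(y, x) for x, y in points])     # top/bottom split
    return min(by_x, by_y)
```

Combining the descending pass with the area computation, as suggested above, means only the bounds for the left rectangle need to be stored.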

The first observation is that any edge of a rectangle must touch one of the points. Edges that didn't touch a point could be pulled back, resulting in less area.
Given n points, there are thus n choices for each of left1, right1, bottom1, top1, left2, right2, bottom2 and top2. This gives a simple O(n^8) algorithm already: try all possible assignments and remember the one giving the least total area (right1 - left1)(top1 - bottom1) + (right2 - left2)(top2 - bottom2). Indeed, you can skip any combinations with right < left or top < bottom. This gives a speedup, though it does not change the O(n^8) bound.
Another observation is that the edges should stay within the minimum and maximum X and Y bounds of all points. Find the minimum and maximum X and Y values of any points. Call these minX, maxX, minY and maxY. At least one of your rectangles will need to have its left, right, bottom and top edges, respectively, at those levels.
Because minX, maxX, minY and maxY must each be assigned to one of the two rectangles, and there are exactly 2^4 = 16 ways to do this, you can try each of the 16 possible assignments with the remaining coordinates assigned as above. This gives an O(n^4) algorithm: O(n) to get minX, maxX, minY and maxY, and then O(n^4) to fill in the four unassigned variables for each of 16 assignments of minX, maxX, minY and maxY to the eight edge coordinates.
We have so far ignored the requirement that rectangles not overlap. To accommodate that, we must ensure at least one of the following four conditions holds true:
there exists a horizontal line at Y coordinate h with top1 <= h <= bottom2
there exists a horizontal line at Y coordinate h with top2 <= h <= bottom1
there exists a vertical line at X coordinate w with right1 <= w <= left2
there exists a vertical line at X coordinate w with right2 <= w <= left1
The two rectangles overlap if and only if all four of these conditions are simultaneously false. This allows us to skip over candidate solutions, giving a speedup but not changing the asymptotic bound O(n^4). Note that we need to check this condition specifically since, otherwise, optimal solutions might have overlap (exercise: show such an example).
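As a small illustration, the four conditions collapse into a single boolean test, since a suitable h or w exists exactly when the corresponding pair of edges is ordered correctly. The Rect type and the name separable are hypothetical; Y is assumed to increase upwards, and rectangles that merely touch count as non-overlapping, as in the conditions above.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    left: float
    bottom: float
    right: float
    top: float

def separable(a, b):
    """True if some horizontal or vertical line separates the two rectangles."""
    return (a.top <= b.bottom      # a entirely below b
            or b.top <= a.bottom   # b entirely below a
            or a.right <= b.left   # a entirely left of b
            or b.right <= a.left)  # b entirely left of a
```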
Let's try to shave some more time off of this. Assume we have non-overlapping rectangles by condition #1 above. Then there are n choices for h; we can try each of these n choices and then determine the area of the resulting selections by finding the minimum and maximum coordinates of points in the resulting halves. By trying all n selections for h, we can determine the "best case" vertical split. We need not try condition #2, since the only difference is in the ordering of the rectangles which is arbitrary. We must also try condition #3 with a horizontal split. This suggests an O(n^2) algorithm:
For each point, choose h = point.y
Separate the points into groups with point.y <= h and point.y > h.
Find the minimum and maximum X and Y coordinates of both subsets of points.
Compute the sum of the areas of the two rectangles.
Remember the minimum area obtained from the above and the corresponding h.
Repeat, but using w and X coordinates.
Determine whether minimum area was obtained for a vertical or horizontal split
Return the corresponding rectangles as the answer
Can we do even better? This is O(n^2) and not O(n) because for each choice of h and w we need to then find the minimum and maximum coordinates of each subgroup. This assumes a linear scan through both subgroups. We don't actually need to do this for the min and max X/Y coordinates when scanning horizontally/vertically, since those will be known. What we need is a solution to this problem:
Given n points and a value h, find the maximum X coordinate of any point whose Y coordinate is no greater than h.
The obvious solution I give above is O(n^2), but you might be able to find an O(n log n) solution by clever application of sorting or maybe even an O(n) solution by some even more clever method. I will not attempt this.
Our solution is O(n^2); the theoretically optimal solution is Omega(n) since you must at least look at all the points. So we're pretty close but there is room for improvement.

Mesh with minimal area between two polylines

I have two polylines v and u with n and m vertices respectively in 3D. I want to connect v[0] to u[0], v[n-1] to u[m-1] and also the inner vertices somehow to obtain a triangle mesh strip with minimal surface area.
My naïve solution is to get the near-optimal initial mesh by subsequent addition of the smallest diagonal and then switch diagonal in every quadrilateral if it produces smaller area until this is no longer possible.
But I am afraid I can end in local minimum and not global. What are the better options to achieve a minimal mesh?

This can be solved with a Dynamic Program.
Let's visualize the problem as a table, where the columns represent the vertices of the first polyline and the rows represent the vertices of the second polyline:
0 1 2 3 ... n-1 -> v
0
1
2
...
m-1
Every cell represents an edge between the polylines. You start at (0, 0) and want to find a path to (n-1, m-1) by taking either (+1, 0) or (0, +1) steps. Every step that you make has a cost (the area of the resulting triangle) and you want to find the path that results in the minimum cost.
So you can iteratively (just in the style of dynamic programming) calculate the cost that is necessary to reach any cell (by comparing the resulting cost of the two possible incoming directions). Remember the direction that you chose and you will have a complete path of minimum cost in the end. The overall runtime will be O(n * m).
If you know that your vertices are more or less nicely distributed, you can restrict the calculation of the table to a few entries near the diagonal. This could get the runtime down to O(k * max(n, m)), where k is the variable radius around the diagonal. But you may miss the optimal solution if the assumption of a nice vertex distribution does not hold.
You could also employ an A*-like strategy where you calculate a cell only when you think it could belong to the minimum path (with the help of some heuristic).
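The basic O(n * m) table can be sketched like this. Vertices are assumed to be (x, y, z) tuples, cell (i, j) stands for the edge v[i]-u[j], and the names tri_area and min_area_strip are made up for the example.

```python
import math

def tri_area(a, b, c):
    """Area of the 3D triangle abc: half the length of the cross product."""
    ab = [b[k] - a[k] for k in range(3)]
    ac = [c[k] - a[k] for k in range(3)]
    cx = ab[1] * ac[2] - ab[2] * ac[1]
    cy = ab[2] * ac[0] - ab[0] * ac[2]
    cz = ab[0] * ac[1] - ab[1] * ac[0]
    return 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)

def min_area_strip(v, u):
    """Return (total_area, triangles) of the cheapest strip between v and u."""
    n, m = len(v), len(u)
    cost = [[math.inf] * m for _ in range(n)]
    came_from = [[None] * m for _ in range(n)]
    cost[0][0] = 0.0
    for i in range(n):
        for j in range(m):
            if i > 0:   # step along v: adds triangle v[i-1], v[i], u[j]
                c = cost[i - 1][j] + tri_area(v[i - 1], v[i], u[j])
                if c < cost[i][j]:
                    cost[i][j], came_from[i][j] = c, (i - 1, j)
            if j > 0:   # step along u: adds triangle v[i], u[j-1], u[j]
                c = cost[i][j - 1] + tri_area(v[i], u[j - 1], u[j])
                if c < cost[i][j]:
                    cost[i][j], came_from[i][j] = c, (i, j - 1)
    # Walk the remembered directions back from the last cell.
    triangles = []
    i, j = n - 1, m - 1
    while (i, j) != (0, 0):
        pi, pj = came_from[i][j]
        if pi < i:
            triangles.append((v[pi], v[i], u[j]))
        else:
            triangles.append((v[i], u[pj], u[j]))
        i, j = pi, pj
    triangles.reverse()
    return cost[n - 1][m - 1], triangles
```

Every path from (0, 0) to (n-1, m-1) takes n + m - 2 steps, so the strip always consists of n + m - 2 triangles.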

All points with minimum Manhattan distance from all other given points [Optimized]

The problem here is to find set of all integer points which gives minimum sum over all Manhattan distances from given set of points!
For example:
Let's have a given set of points { P1, P2, P3...Pn }
Basic problem is to find a point say X which would have minimum sum over all distances from points { P1, P2, P3... Pn }.
i.e. |P1-X| + |P2-X| + .... + |Pn-X| = D, where D will be minimum over all X.
Moving a step further, there can be more than one value of X satisfying above condition. i.e. more than one X can be possible which would give the same value D. So, we need to find all such X.
One basic approach that anyone can think of is to find the median of the inputs and then brute force the coordinates, as mentioned in this post.
But the problem with this approach is: if the median gives two values which are very far apart, then we end up brute forcing all points in between, which will never run in the given time.
So, is there any other approach which would give the result even when the points are very far apart (where the median gives a range of the order of 10^9)?

You can consider X and Y separately, since they add to the distance independently of each other. This reduces the question to finding, given n points on a line, a point with the minimum sum-of-distances to the other points. This is simple: any point between the two medians (inclusive) will satisfy this.
Proof: If we have an even number of points, there will be two medians. A point between the two medians will have n/2 points to the left and n/2 points to the right, and a total sum-of-distances to those points of S.
If we move it one point to the left, S will go up by n/2 (since we're moving away from the right-most points) and down by n/2 (since we're moving towards the left-most points), so overall S remains the same. This holds true until we hit the left-most median point. When we move one left of the left-most median point, we now have (n/2 + 1) points to the right, and (n/2 - 1) points to the left, so S goes up by two. Continuing to the left will only increase S further.
By the same logic, all points to the right of the right-most median also have a higher S.
If we have an odd number of points, there is only one median. Using the same logic as above, we can show that it has the lowest value of S.
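For one dimension, the argument above boils down to picking the two middle elements of the sorted values. A minimal sketch (the name median_interval is hypothetical):

```python
def median_interval(values):
    """Return (lo, hi): every x with lo <= x <= hi minimizes sum(|v - x|)."""
    s = sorted(values)
    n = len(s)
    # For odd n both indices coincide; for even n they are the two medians.
    return s[(n - 1) // 2], s[n // 2]
```

For the values 1, 3, 7, 9 this gives the interval (3, 7): every x from 3 to 7 has a total distance of 12, while x = 2 and x = 8 both give 14.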

If the median gives you an interval of the order of 10^9 then each point in that interval is as good as any other.
So depending on what you want to do with those points later on, you can either return the range or enumerate the points in that range. There is no way around it.
Obviously in two dimensions you'll get a bounding rectangle, in three dimensions a bounding cuboid, etc.
The result will always be a cartesian product of ranges obtained for each dimension, so you can return a list of those ranges as a result.
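A short sketch of returning the ranges instead of enumerating the points, assuming integer coordinates; the name optimal_region is made up for the example. The number of optimal integer points is simply the product of the range lengths, so it can be reported even when it is far too large to list.

```python
def optimal_region(points):
    """Return (ranges, count): one (lo, hi) range per dimension, and the
    number of integer points in the Cartesian product of those ranges."""
    ranges = []
    for coords in zip(*points):        # handle each dimension independently
        s = sorted(coords)
        n = len(s)
        ranges.append((s[(n - 1) // 2], s[n // 2]))
    count = 1
    for lo, hi in ranges:
        count *= hi - lo + 1
    return ranges, count
```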

Since in manhattan distance each component contributes separately, you can consider them separately too. The optimal answer is ( median(x),median(y) ). You need to look around this point for integer solutions.
NOTE: I did not read your question properly while answering. My answer still holds, but probably you knew about this solution already.

Yes, I also think that for an odd number N of points on a grid, there will be only a single point (i.e. the median) with the minimum sum of Manhattan distances to all other points.
For an even value of N, the scenario is a little different.
As I see it, if we have two sets X = {1,2} and Y = {3,4}, their Cartesian product will always contain 4 elements,
i.e. X × Y = {1,2} × {3,4} = {(1,3), (1,4), (2,3), (2,4)}. This is what I have understood so far.
For an even number of values we always take the "middle two" values as the median. Taking 2 from X and 2 from Y will always give a Cartesian product of 4 points.
Correct me if I am wrong.

Find the largest convex black area in an image

I have an image of which this is a small cut-out:
As you can see, it consists of white pixels on a black background. We can draw imaginary lines between these pixels (or better, points). With these lines we can enclose areas.
How can I find the largest convex black area in this image that doesn't contain a white pixel in it?
Here is a small hand-drawn example of what I mean by the largest convex black area:
P.S.: The image is not noise, it represents the primes below 10000000 ordered horizontally.

Trying to find maximum convex area is a difficult task to do. Wouldn't you just be fine with finding rectangles with maximum area? This problem is much easier and can be solved in O(n) - linear time in number of pixels. The algorithm follows.
Say you want to find largest rectangle of free (white) pixels (Sorry, I have images with different colors - white is equivalent to your black, grey is equivalent to your white).
You can do this very efficiently by two pass linear O(n) time algorithm (n being number of pixels):
1) in a first pass, go by columns, from bottom to top, and for each pixel, denote the number of consecutive pixels available up to this one:
repeat, until:
2) in a second pass, go by rows, read current_number. For each number k keep track of the sums of consecutive numbers that were >= k (i.e. potential rectangles of height k). Close the sums (potential rectangles) for k > current_number and look if the sum (~ rectangle area) is greater than the current maximum - if yes, update the maximum. At the end of each line, close all opened potential rectangles (for all k).
This way you will obtain all maximum rectangles. It is not the same as maximum convex area of course, but probably would give you some hints (some heuristics) on where to look for maximum convex areas.
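A sketch of the two passes in Python. Note that the second pass here uses the common stack-based "largest rectangle in a histogram" routine rather than the per-height running sums described above, and the heights are accumulated while moving down the rows; both are substitutions made for brevity. The grid is assumed to be a list of rows in which truthy cells are free, and the name largest_free_rectangle is hypothetical.

```python
def largest_free_rectangle(grid):
    """Area of the largest axis-parallel rectangle made only of free cells."""
    if not grid or not grid[0]:
        return 0
    cols = len(grid[0])
    heights = [0] * cols   # consecutive free cells ending at the current row
    best = 0
    for row in grid:
        # Pass 1 (incremental): update the run length of each column.
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        # Pass 2: largest rectangle in the histogram given by `heights`.
        stack = []         # column indices with increasing heights
        for c in range(cols + 1):
            h = heights[c] if c < cols else 0   # sentinel closes everything
            while stack and heights[stack[-1]] >= h:
                top = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                best = max(best, top * (c - left))
            stack.append(c)
    return best
```

Each column index is pushed and popped at most once per row, so the total work stays linear in the number of pixels.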

I'll sketch a correct, poly-time algorithm. Undoubtedly there are data-structural improvements to be made, but I believe that a better understanding of this problem in particular will be required to search very large datasets (or, perhaps, an ad-hoc upper bound on the dimensions of the box containing the polygon).
The main loop consists of guessing the lowest point p in the largest convex polygon (breaking ties in favor of the leftmost point) and then computing the largest convex polygon that can be formed with p and points q such that (q.y > p.y) || (q.y == p.y && q.x > p.x).
The dynamic program relies on the same geometric facts as Graham's scan. Assume without loss of generality that p = (0, 0) and sort the points q in order of the counterclockwise angle they make with the x-axis (compare two points by considering the sign of their cross product). Let the points in sorted order be q1, …, qn. Let q0 = p. For each 0 ≤ i < j ≤ n, we're going to compute the largest convex polygon on points q0, a subset of q1, …, qi - 1, qi, and qj.
The base cases where i = 0 are easy, since the only “polygon” is the zero-area segment q0qj. Inductively, to compute the (i, j) entry, we're going to try, for all 0 ≤ k ≤ i, extending the (k, i) polygon with (i, j). When can we do this? In the first place, the triangle q0qiqj must not contain other points. The other condition is that the angle qkqiqj had better not be a right turn (once again, check the sign of the appropriate cross product).
At the end, return the largest polygon found. Why does this work? It's not hard to prove that convex polygons have the optimal substructure required by the dynamic program and that the program considers exactly those polygons satisfying Graham's characterization of convexity.

You could try treating the pixels as vertices and performing Delaunay triangulation of the pointset. Then you would need to find the largest set of connected triangles that does not create a concave shape and does not have any internal vertices.

If I understand your problem correctly, it's an instance of Connected Component Labeling. You can start for example at: http://en.wikipedia.org/wiki/Connected-component_labeling

I thought of an approach to solve this problem:
Out of the set of all points generate all possible 3-point-subsets. This is a set of all the triangles in your space. From this set remove all triangles that contain another point and you obtain the set of all empty triangles.
For each of the empty triangles you would then grow it to its maximum size. That is, for every point outside the polygon you would insert it between the two closest points of the polygon and check if there are points within this new triangle. If not, you will remember that point and the area it adds. Of all the candidate points, you add the one that maximizes the added area. When no more points can be added, the maximum convex polygon has been constructed. Record the area for each polygon and remember the one with the largest area.
Crucial to the performance of this algorithm is your ability to determine a) whether a point lies within a triangle and b) whether the polygon remains convex after adding a certain point.
I think you can reduce b) to be a problem of a) and then you only need to find the most efficient method to determine whether a point is within a triangle. The reduction of the search space can be achieved as follows: Take a triangle and extend all edges to infinite length in both directions. This separates the area outside the triangle into 6 subregions. Good for us is that only 3 of those subregions can contain points that would adhere to the convexity constraint. Thus for each point that you test you need to determine if it's in a convex-expanding subregion, which again is the question of whether it's in a certain triangle.
The whole polygon as it evolves and approaches the shape of a circle will have smaller and smaller regions that still allow convex expansion. A point once in a concave region will not become part of the convex-expanding region again so you can quickly reduce the number of points you'll have to consider for expansion. Additionally while testing points for expansion you can further cut down the list of possible points. If a point is tested false, then it is in the concave subregion of another point and thus all other points in the concave subregion of the tested points need not be considered as they're also in the concave subregion of the inner point. You should be able to cut down to a list of possible points very quickly.
Still you need to do this for every empty triangle of course.
Unfortunately I can't guarantee that by adding always the maximum new region your polygon becomes the maximum polygon possible.
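The first stage (collecting all empty triangles) and the point-in-triangle test it depends on might look like the sketch below. It is a brute-force O(n^4) illustration with hypothetical names (cross, in_triangle, empty_triangles); points on a triangle's boundary are treated as being inside it.

```python
from itertools import combinations

def cross(o, a, b):
    """Twice the signed area of triangle o, a, b (positive if counterclockwise)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p, a, b, c):
    """True if p lies inside or on the boundary of triangle abc."""
    d1, d2, d3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # p is outside exactly when it is left of one edge and right of another.
    return not (has_neg and has_pos)

def empty_triangles(points):
    """All non-degenerate triangles on `points` that contain no other point."""
    result = []
    for a, b, c in combinations(points, 3):
        if cross(a, b, c) == 0:
            continue   # three collinear points do not form a triangle
        others = (p for p in points if p not in (a, b, c))
        if not any(in_triangle(p, a, b, c) for p in others):
            result.append((a, b, c))
    return result
```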

Dividing a plane of points into two equal halves [closed]

Given a 2-dimensional plane containing n points, I need to generate the equation of a line that divides the plane such that there are n/2 points on one side and n/2 points on the other.

I have assumed the points are distinct, otherwise there might not even be such a line.
If the points are distinct, then such a line always exists and can be found using a deterministic O(n log n) time algorithm.
Say the points are P1, P2, ..., P2n. Assume they are not all on the same line. If they were, then we can easily form the splitting line.
First translate the points so that all the co-ordinates (x and y) are positive.
Now suppose we magically had a point Q on the y-axis such that no line formed by those points (i.e. any infinite line Pi-Pj) passes through Q.
Now since Q lies on the y-axis and every point has x > 0, Q does not lie within the convex hull of the points, so we can order the points by a rotating line passing through Q. For some angle of rotation, half the points will lie on one side and the other half will lie on the other side of this rotating line. In other words, if we consider the points sorted by the slope of the line Pi-Q, we can pick a slope between the (median)th and (median+1)th points. This selection can be done in O(n) time by any linear time selection algorithm, without any need to actually sort the points.
Now to pick the point Q.
Say Q was (0,b).
Suppose Q was collinear with P1 (x1,y1) and P2 (x2,y2).
Then we have that
(y1-b)/x1 = (y2-b)/x2 (note we translated the points so that xi > 0).
Solving for b gives
b = (x1y2 - y1x2)/(x1-x2)
(Note, if x1 = x2, then P1 and P2 cannot be collinear with a point on the Y axis).
Consider |b|.
|b| = |x1y2 - y1x2| / |x1 -x2|
Now let the xmax be the x-coordinate of the rightmost point and ymax the co-ordinate of the topmost.
Also let D be the smallest non-zero x-coordinate difference between two points (this exists because not all the xi are the same, since the points are not all collinear).
Then we have that |b| <= xmax*ymax/D.
Thus, pick our point Q (0,b) to be such that |b| > b_0 = xmax*ymax/D
D can be found in O(nlogn) time.
The magnitude of b_0 can get quite large and we might have to deal with precision issues.
Of course, a better option is to pick Q randomly! With probability 1, you will find the point you need, thus making the expected running time O(n).
If we could find a way to pick Q in O(n) time (by finding some other criterion), then we can make this algorithm run in O(n) time.
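For what it's worth, here is a minimal Python sketch of the deterministic variant described above. The function name `halving_line` is made up for illustration, exact `Fraction` arithmetic is my own choice to sidestep the precision issue with large b_0, and a plain sort stands in for the linear-time selection, so this sketch runs in O(n log n). It assumes an even number of distinct points with integer (or rational) coordinates.

```python
from fractions import Fraction

def halving_line(points):
    """Return (m, c) such that y = m*x + c has half of the points strictly
    below it and half strictly above it (hypothetical helper)."""
    n = len(points)
    if n == 0 or n % 2 or len(set(points)) != n:
        raise ValueError("need an even number of distinct points")
    # Translate so that every coordinate is positive.
    dx = 1 - min(x for x, _ in points)
    dy = 1 - min(y for _, y in points)
    pts = [(Fraction(x) + dx, Fraction(y) + dy) for x, y in points]
    xs = sorted({x for x, _ in pts})
    if len(xs) == 1:
        # All points share one x: a horizontal line between the middle y's works.
        ys = sorted(y for _, y in pts)
        return Fraction(0), (ys[n // 2 - 1] + ys[n // 2]) / 2 - dy
    d = min(b - a for a, b in zip(xs, xs[1:]))       # smallest non-zero x gap (D)
    b = xs[-1] * max(y for _, y in pts) / d + 1      # any b > b_0 = xmax*ymax/D
    slopes = sorted((y - b) / x for x, y in pts)     # slopes of the lines Pi-Q
    m = (slopes[n // 2 - 1] + slopes[n // 2]) / 2    # between the two middle slopes
    # Undo the translation: Y = m*X + b with X = x + dx, Y = y + dy.
    return m, m * dx + b - dy
```

Usage: `m, c = halving_line([(0, 0), (1, 1), (2, 2), (3, 3), (0, 3), (3, 0)])` gives a line with three points on each side.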

Create an arbitrary line in that plane. Project each point onto that line, i.e. for each point, get the closest point on that line to that point.
Order those points along the line in either direction, and choose a point on that line such that there is an equal number of points on the line in either direction.
Get the line perpendicular to the first line which passes through that point. This line will have half the original points on either side.
There are some cases to avoid when doing this. Most importantly, if all the points are themselves on a single line, don't choose a perpendicular line which passes through it. In fact, choose that line itself so you don't have to worry about projecting the points. In terms of the actual mathematics behind this, vector projections will be very useful.
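A rough Python sketch of the projection idea, assuming the reference line passes through the origin with a non-zero direction vector (both are my simplifications, and `split_by_projection` is a made-up name). It returns a point on the dividing line plus the perpendicular direction, or `None` when the two middle projections coincide and another direction should be tried.

```python
def split_by_projection(points, direction=(1.0, 0.0)):
    """Project points onto the line through the origin along `direction` and
    return (anchor, perp): the dividing line is anchor + s * perp."""
    dx, dy = direction
    n = len(points)
    # Scalar projection of p onto the direction (scaled by |direction|).
    def proj(p):
        return p[0] * dx + p[1] * dy
    order = sorted(points, key=proj)
    ta, tb = proj(order[n // 2 - 1]), proj(order[n // 2])
    if ta == tb:
        return None  # the middle points project to the same spot
    t = (ta + tb) / 2
    norm2 = dx * dx + dy * dy
    anchor = (t * dx / norm2, t * dy / norm2)  # point on the reference line
    perp = (-dy, dx)                           # direction of the dividing line
    return anchor, perp
```

A point p lies on one side or the other depending on the sign of (p - anchor) · direction.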

This is a modification of Dividing a plane of points into two equal halves which allows for the case with overlapping points (in which case, it will say whether or not the answer exists).
1) If the number of points is odd, return "impossible".
2) Pick a random line (two random points).
3) Project all points onto this line (`O(N)` operation), i.e. we pretend this line is our new X'-axis and write down the X'-coordinate of each point.
4) Perform any median-finding algorithm on the X'-coordinates (`O(N)` or faster-if-desired operation); this returns 2 medians if there are no overlapping points.
5) Return the line perpendicular to the original random line that splits the medians.
In the rare case of overlapping points, repeat a few times (it would take a pathological case to prevent a solution from existing).
This is O(N) unlike other proposed solutions.
Assuming a solution exists, the above method will probably terminate, though I don't have a proof.
Try the above algorithm a few times unless you detect overlapping points. If you detect a high number of overlapping points, you may be in for a rough ride, but there is a terribly inefficient brute-force solution that involves checking all possible angles:
For every "critical slope range", perform the above algorithm
by choosing a line with a slope within the range.
If all critical slope ranges fail, the solution is impossible.
A critical angle is defined as an angle at which the result could possibly change (imagine the solution to a previous answer, and rotate the entire set of points until one or more points swap position with one or more other points, crossing the dividing line). There are only finitely many of these, and I think they are bounded by the number of pairs of points, so you're probably looking at something in the range O(N^2)-O(N^2 log(N)) if you have overlapping points, for a brute-force approach.
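One possible reading of the brute-force idea, sketched in Python. I'm assuming a "critical" direction is one perpendicular to the segment between two distinct points (where their projections tie), and that trying one direction inside each gap between consecutive critical angles is enough. The names, the floating-point angles and the `eps` tolerance are my own choices, and this simple version costs more than the bound guessed above because every candidate direction re-sorts the projections.

```python
import math

def try_direction(points, theta, eps=1e-9):
    """Project onto the unit vector at angle theta. Return (ux, uy, c) if the
    line ux*x + uy*y = c has n/2 points strictly on each side, else None."""
    n = len(points)
    ux, uy = math.cos(theta), math.sin(theta)
    proj = sorted(x * ux + y * uy for x, y in points)
    lo, hi = proj[n // 2 - 1], proj[n // 2]
    return (ux, uy, (lo + hi) / 2) if hi - lo > eps else None

def brute_force_split(points):
    """Try one direction per critical range; None means no split was found."""
    if len(points) % 2:
        return None
    distinct = list(set(points))
    critical = set()
    for i, (x1, y1) in enumerate(distinct):
        for x2, y2 in distinct[i + 1:]:
            # Direction perpendicular to the segment, folded into [0, pi).
            critical.add((math.atan2(y2 - y1, x2 - x1) + math.pi / 2) % math.pi)
    if not critical:
        return None  # every point sits at the same location
    crit = sorted(critical)
    candidates = [(a + b) / 2 for a, b in zip(crit, crit[1:])]
    candidates.append((crit[-1] + crit[0] + math.pi) / 2)  # wrap-around range
    for theta in candidates:
        found = try_direction(points, theta)
        if found:
            return found
    return None
```

With heavily overlapping input such as three points at one location and one elsewhere, this returns `None`, matching the "impossible" outcome.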

I'd guess that a good way is to sort/sequence/order the points (e.g. from left to right), and then choose a line which passes through (or between) the middle point[s] in the sequence.

There are obvious cases where no solution is possible, e.g. when you have three heaps of points: one point at location A, two points at location B, and five points at location C.
If you expect some decent distribution, you can probably get a good result with tlayton's algorithm. To select the initial line slant, you could determine the extent of the whole point set, and choose the angle of the largest diagonal.

The median equally divides a set of numbers in the manner similar to what you're trying to accomplish, and it can be computed in O(n) time using a selection algorithm (the writeup in Cormen et al is better, so you may want to look there instead). So, find the median of your x values Mx (or your y values My if you prefer) and set x = Mx (or y = My) and that line will be axially aligned and split your points equally.
If the nature of your problem requires that no more than one point lies on the line (if you have an odd number of points in your set, at least one of them will be on the line) and you discover that's what's happened (or you just want to guard against the possibility), rotate all of your points by some random angle, θ, and compute the median of the rotated points. You then rotate the median line you computed by -θ and it will evenly divide points.
The likelihood of randomly choosing θ such that the problem manifests itself again is very small with a finite number of points, but if it does, try again with a different θ.
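A small sketch of this in Python, under the assumption that n is even and that no point may lie on the line. `quickselect` is a textbook randomized selection (expected O(n)), `median_split` is a made-up name, and rather than rotating the points and then rotating the line back, the sketch reports the line directly as x*cos(θ) + y*sin(θ) = c, which should be equivalent.

```python
import math
import random

def quickselect(values, k):
    """Return the k-th smallest value (0-based) in expected O(n) time."""
    values = list(values)
    while True:
        pivot = random.choice(values)
        lows = [v for v in values if v < pivot]
        highs = [v for v in values if v > pivot]
        n_eq = len(values) - len(lows) - len(highs)
        if k < len(lows):
            values = lows
        elif k < len(lows) + n_eq:
            return pivot
        else:
            k -= len(lows) + n_eq
            values = highs

def median_split(points, max_tries=20, seed=0):
    """Return (theta, c) for the line x*cos(theta) + y*sin(theta) = c,
    or None if every attempted angle left the middle coordinates tied."""
    rng = random.Random(seed)
    n = len(points)
    theta = 0.0  # first try the axis-aligned line x = Mx
    for _ in range(max_tries):
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        xs = [x * cos_t + y * sin_t for x, y in points]  # rotated x-coordinates
        lo = quickselect(xs, n // 2 - 1)
        hi = quickselect(xs, n // 2)
        if lo < hi:
            return theta, (lo + hi) / 2
        theta = rng.uniform(0.0, math.pi)  # tie on the line: pick another angle
    return None
```

For example, `median_split([(0, 0), (1, 0), (2, 0), (3, 0)])` succeeds on the first try with θ = 0 and the line x = 1.5.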

Here is how I approach this problem (with the assumption that n is even and NO three points are collinear):
1) Pick the point with the smallest Y value. Let's call it point P.
2) Take this point as the new origin point, so that all other points will have positive Y values after this transformation.
3) Consider every other point (there are n - 1 points remaining) in polar coordinates. Each of them can be represented by a radius and an angle; you can ignore the radius and just focus on the angle.
4) How can you find a line that splits the points evenly? Find the median of the (n - 1) angles. The line from point P to the point with that median angle will split the points evenly.
Time complexity for this algorithm is O(n).
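The four steps could look roughly like this in Python. The sketch keeps the stated assumptions (n even, no three points collinear), breaks a tie for the lowest Y by taking the smaller X, and sorts the angles for brevity, so it is O(n log n) rather than O(n); replacing the sort with a selection algorithm would restore the linear bound. The function name is hypothetical.

```python
import math

def split_through_lowest(points):
    """Return (P, M): the line through P and M has the remaining n - 2 points
    split evenly, with P and M themselves lying on the line."""
    # Step 1: the lowest point (ties broken by smaller x).
    p = min(points, key=lambda q: (q[1], q[0]))
    others = [q for q in points if q != p]
    # Steps 2-3: angle of every other point as seen from P; the radius is ignored.
    others.sort(key=lambda q: math.atan2(q[1] - p[1], q[0] - p[0]))
    # Step 4: n - 1 is odd, so there is a single median angle.
    return p, others[len(others) // 2]
```

Note that with this construction two of the points end up on the dividing line itself.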

I don't know how useful this is, but I have seen a similar problem...
If you already have the directional vector (aka the coefficients of the dimensions of your plane).
You can then find two points inside that plane, and by simply using the midpoint formula you can find the midpoint of that plane.
Then, using the coefficients of that plane and the midpoint, you can find a plane that is equidistant from both points, using the general equation of a plane.
A line then would constitute in expressing one variable in terms of the other
so you would find a line with equal distance between both planes.
There are different methods of doing this such as projection using the distance equation from a plane but I believe that would complicate your math a lot.

To add to M's answer: a method to generate a Q (that's not so far away) in O(n log n).
To begin with, let Q be any point on the y-axis, i.e. Q = (0,b); some good choices might be (0,0) or (0, (ymax+ymin)/2).
Now check if there are two points (x1, y1), (x2, y2) collinear with Q. The line between any point and Q is y = mx + b; since b is constant, this means two points are collinear with Q if their slopes m are equal. So determine the slopes mi for all points and check if there are any duplicates (amortized O(n) using a hash table).
If all the m's are distinct, we're done; we found Q, and M's algorithm above generates the line in O(n) steps.
If two points are collinear with Q, we'll move Q up just a tiny amount ε, Qnew = (0, b + ε), and show that Qnew will not be collinear with two other points.
The criterion for ε, derived below, is:
ε < mminΔ*xmin
To begin with, our m's look like this:
mi = yi/xi - b/xi
Let's find the minimum difference between any two distinct mi and call it mminΔ (O(n log n) by, for instance, sorting and then comparing the differences between consecutive slopes mi and mi+1 for all i).
If we fudge b up by ε, the new equation for m becomes:
mi,new = yi/xi - b/xi - ε/xi
= mi,old - ε/xi
Since ε > 0 and xi > 0, all m's are reduced, and all are reduced by a maximum of ε/xmin. Thus, if
ε/xmin < mminΔ, ie.
ε < mminΔ*xmin
is true, then two mi which were previously unequal will be guaranteed to remain unequal.
All that's left is to show that if m1,old = m2,old, then m1,new =/= m2,new. Since both mi were reduced by an amount ε/xi, this is equivalent to showing x1 =/= x2. If they were equal, then:
y1 = m1,old*x1 + b = m2,old*x2 + b = y2
Contradicting our assumption that all points are distinct. Thus, m1, new =/= m2, new, and no two points are collinear with Q.
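As a sanity check, the ε criterion can be sketched in Python with exact fractions. `nudge_b` is a made-up name; I take ε as half of mminΔ*xmin so the strict inequality holds, and I sort once instead of doing the hash-table check first. The sketch assumes distinct points that have already been translated so that every x is positive.

```python
from fractions import Fraction

def nudge_b(points, b=Fraction(0)):
    """Return b' >= b such that the slopes (y - b') / x are all distinct,
    i.e. Q = (0, b') is not collinear with any two of the points."""
    slopes = sorted((Fraction(y) - b) / x for x, y in points)
    gaps = [t - s for s, t in zip(slopes, slopes[1:])]
    if all(g > 0 for g in gaps):
        return b  # no duplicate slopes, Q = (0, b) already works
    positive = [g for g in gaps if g > 0]
    x_min = min(x for x, _ in points)
    # eps < m_min_delta * x_min; if every slope is equal, any eps > 0 will do.
    eps = min(positive) * x_min / 2 if positive else Fraction(1)
    return b + eps
```

For the points (1,1), (2,2), (1,3), (3,5) and b = 0, the first two share slope 1; the function moves Q up to b = 1/3, after which all four slopes differ.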

I picked up the idea from Moron and andand and
continued to form a deterministic O(n) algorithm.
I also assumed that the points are distinct and
n is even (though the algorithm can be
changed so that odd n with one point
on the dividing line is also supported).
The algorithm tries to divide the points with a vertical line between them. This only fails if the points in the middle have the same x value. In that case the algorithm determines how many points with the same x value have to be on the left and lower side and rotates the line accordingly.
I'll try to explain with an example.
Let's assume we have 16 points on a plane.
First we need to get the point with the 8th greatest x-value
and the point with the 9th greatest x-value.
With a selection algorithm this is possible in O(n),
as pointed out in another answer.
If the x-values of those points are different, we are done.
We create a vertical line between those two points and
that splits the points equally.
The problem arises if the x-values are equal. Then we have 3 sets of points:
those on the left side (x < xa), those in the middle (x = xa)
and those on the right side (x > xa).
The idea now is to count the points on the left side and
calculate how many points from the middle need to go there,
so that half of the points are on that side. We can ignore the right side here
because if we have half of the points on the left side, the other half must be on the right side.
So let's assume we have 3 points (c=3) on the left side,
6 in the middle and 7 on the right side
(the algorithm doesn't know the count from the middle or right side,
because it doesn't need it, but we could also determine it in O(n)).
We need 8-3=5 points from the middle to go on the left side.
The points we already got in the first step are useless now,
because they are only determined by the x-value
and can be any of the points in the middle.
We want the 5 points from the middle with the lowest y-values on the left side and
the point with the highest y-value on the right side.
Again using the selection algorithm, we get the point with the 5th lowest y-value
and the point with the 6th lowest y-value.
Both points will have the x-value equal to xa,
else we wouldn't get to this step,
because there would be a vertical line.
Now we can create the point Q in the middle of those two points.
That's one point of the resulting line.
Another point is needed, so that no points from the left or right side are divided.
To get that point we need the point from the left side
that has the lowest angle (bh) between the vertical line at xa
and the line determined by that point and Q.
We also need that point from the right side (with angle ag).
The new point R is between the point with the lower angle
and a point on the vertical line
(if the lower angle is on the left side a point above Q
and if the lower angle is on the right side a point below Q).
The line determined by Q and R divides the points in the middle
so that there is an equal number of points on both sides.
It doesn't divide any points on the left or right side,
because if it did, that point would have a lower angle and
would have been chosen to calculate R.
From the view of a mathematician that should work well in O(n).
For computer programs it is fairly easy to find a case
where precision becomes a problem. An example with 4 points would be
A(0, 100000000), B(0, 100000001), C(0, 0), D(0.0000001, 0).
In this example Q would be (0, 100000000.5) and R (0.00000005, 0).
Which gives A and C on the left side and B and D on the right side.
But it is possible that A and B are both on the dividing line,
because of rounding errors. Or maybe only one of them.
So whether this algorithm suits your requirements depends on the input values.
1) Get the two points Pa(xa, ya) and Pb(xb, yb) which are the medians based on the x values > O(n)
2) If xa != xb you can stop here, because a y-axis parallel line between those two points is the result > O(1)
3) Get all points where the x value equals xa > O(n)
4) Count the points with an x value less than xa as c > O(n)
5) Get the lowest point Pc based on the y values from the points from step 3 > O(n)
6) Get the greatest point Pd based on the y values from the points from step 3 > O(n)
7) Get the (n/2-c)th lowest point Pe based on the y values from the points from step 3 > O(n)
8) Also get the next higher point Pf based on the y values from the points from step 3 > O(n)
9) Create a new point Q (xa, (ye+yf)/2) between Pe and Pf > O(1)
10) For all points Pi calculate the angle ai between Pc, Q and Pi and the angle bi between Pd, Q and Pi > O(n)
11) Get the point Pg with the lowest angle ag (with ag>0° and ag<180°) > O(n)
12) Get the point Ph with the lowest angle bh (with bh>0° and bh<180°) > O(n)
13) If there aren't any Pg or Ph (all points have the same x value), create a new point R (xa+1, 0) anywhere but with a different x value than xa; else if ag is lower than bh, create a new point R ((xc+xg)/2, (yc+yg)/2) between Pc and Pg; else create a new point R ((xd+xh)/2, (yd+yh)/2) between Pd and Ph > O(1)
14) The line determined by Q and R divides the points > O(1)
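Below is a simplified Python sketch of the same idea, not a literal transcription of the steps above. Instead of comparing angles and building R, it tilts the line through Q by a small ratio t, chosen as half of the tightest bound imposed by any left or right point, which I believe captures the "lowest angle" rule. The line is represented as x = x0 + t*(yq - y); the names are hypothetical, exact `Fraction` arithmetic is my own choice to avoid the rounding issue, and sorting replaces selection, so this version is O(n log n). It assumes an even number of distinct points.

```python
from fractions import Fraction

def halving_line_tilted(points):
    """Return (x0, yq, t) describing the line x = x0 + t * (yq - y)."""
    n = len(points)
    h = n // 2
    xs = sorted(x for x, _ in points)
    if xs[h - 1] < xs[h]:
        # The two middle x values differ: a vertical line between them works.
        return Fraction(xs[h - 1] + xs[h]) / 2, Fraction(0), Fraction(0)
    xa = xs[h]
    c = sum(1 for x, _ in points if x < xa)          # points left of x = xa
    mid_ys = sorted(y for x, y in points if x == xa)
    k = h - c                                        # middle points needed on the left
    yq = Fraction(mid_ys[k - 1] + mid_ys[k]) / 2     # Q = (xa, yq)
    bounds = []
    for x, y in points:
        if x < xa and y > yq:      # left point that a large tilt would cross
            bounds.append(Fraction(xa - x) / (y - yq))
        elif x > xa and y < yq:    # right point that a large tilt would cross
            bounds.append(Fraction(x - xa) / (yq - y))
    t = min(bounds) / 2 if bounds else Fraction(1)
    return Fraction(xa), yq, t

def is_left(line, p):
    """True if point p lies strictly left of the line returned above."""
    x0, yq, t = line
    return p[0] < x0 + t * (yq - p[1])
```

Middle points below Q end up on the left and those above Q on the right, so exactly n/2 points satisfy `is_left`.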
