Algorithm to create bounding rectangles for 2D points

The input is a series of point coordinates (x0,y0), (x1,y1), ..., (xn,yn), where n is not very large (say ~1000). We need to create rectangles that serve as bounding boxes for these points. There is no need to find the globally optimal solution. The only requirement is that if the Euclidean distance between two points is less than R, they should be in the same bounding rectangle. I've searched for some time, and it seems to be a clustering problem for which K-means might be useful.
However, the input points don't follow any specific pattern from one run to the next, so it may not be possible to fix a specific K for K-means. Is there an algorithm or method that solves this problem?

The only requirement is if the euclidean distance between two point is less than R, they should be in the same bounding rectangle
This is the definition of single-linkage hierarchical clustering cut at a height of R.
Note that this may yield overlapping rectangles.
For much faster and highly efficient methods, have a look at bulk loading strategies for R*-trees, such as sort-tile-recursive. It won't satisfy your "only" requirement above, but it will yield well balanced, non-overlapping rectangles.
K-means is obviously not appropriate for your requirements.

With only 1000 points I would do the following:
1) Work out the distance between all pairs of points. If the distance between a pair is less than R, they need to go in the same bounding rectangle, so use a disjoint-set data structure (http://en.wikipedia.org/wiki/Disjoint-set_data_structure) to record this.
2) For each subset that comes out of your Disjoint set data structure, work out the min and max co-ordinates of the points in it and use this to create a bounding box for the points in this subset.
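The two stages above can be sketched in Python as follows. This is only a minimal illustration of the brute-force version: the function name bounding_rectangles and the (xmin, ymin, xmax, ymax) box layout are assumptions made for the example, not part of the original answer.

```python
from itertools import combinations
from math import dist


def bounding_rectangles(points, r):
    """Group points closer than r (transitively) and return one box per group."""
    parent = list(range(len(points)))

    def find(i):
        # Path halving keeps the disjoint-set trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Stage 1: union every pair of points closer than r (O(n^2) pairs).
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) < r:
            parent[find(i)] = find(j)

    # Stage 2: track min/max coordinates per subset as (xmin, ymin, xmax, ymax).
    boxes = {}
    for i, (x, y) in enumerate(points):
        root = find(i)
        if root in boxes:
            x0, y0, x1, y1 = boxes[root]
            boxes[root] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
        else:
            boxes[root] = (x, y, x, y)
    return sorted(boxes.values())
```

For example, bounding_rectangles([(0, 0), (1, 0), (2, 0), (10, 10)], 1.5) chains the first three points into one box and leaves the last point in a degenerate box of its own.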
If you have more points or are worried about efficiency, you will want to make stage (1) more efficient. One easy way would be to go through the points in order of x co-ordinate, keeping only points at most R to the left of the most recent point seen, and using a balanced tree structure to find among these the points at most R above or below the most recent point seen, before calculating the distance to the most recent point seen. One step up from this would be to create a spatial data structure to get yet more efficiency in finding pairs within distance R of each other.
Note that for some inputs you will get just one huge bounding box because you have long chains of points, and for some other inputs you will get bounding boxes inside bounding boxes, for instance if your points are in concentric circles.

Related

What is an efficient way of determining the number of intersections between several lines defined in Cartesian [x,y,z] space?

I'm trying to efficiently calculate whether any of several curved lines, defined in Cartesian [x,y,z] space, intersect. I have a working algorithm that calculates the intersection of 2 lines - the simplest case.
However, I need to scale up my algorithm to calculate whether any of 100,000 lines intersect. I am expecting few or no intersections. I was wondering if anyone had any advice as to how to scale up my intersection algorithm (i.e. what is the minimum number of computations I need to run). I'm using MATLAB, but I am also interested in general logic answers.
Each individual line is organised in vector format as follows:
V1 = [x1 y1 z1; x2 y2 z2; ...; xn yn zn]
In the past I have used a kd-tree to solve problems like this. Try to access the paper: K-d Trees for Semidynamic Point Sets by Jon Louis Bentley.
In its simplest form it can find the nearest neighbor to each point in massive point sets very fast.
To summarize, all the points are placed into buckets as you build the kd-tree. Then as you search with each point, you descend down into the tree, eliminating half the remaining points at each step with a fast test to see which side of a wall you are on.
With adjustments, you can do point-in-sphere tests, n-nearest-neighbors etc.
For tests that are not conveniently spherical, like triangle intersections and in your case curve-curve intersections, you can create an axis-aligned bounding box (AABB) around each curve. As you build the tree, you use the center of each AABB to partition the space (same as with points), but then you make another pass where you attach a list of the overlapping AABBs to each bucket. Then, when it comes to searching the tree for each curve, you end up testing its AABB against the buckets it overlaps and against the other curves in those buckets' overlap lists. These AABB intersection tests eliminate most of the set very fast (benchmarks in the paper), and then you are left to do your actual curve-curve intersection test on a handful of objects.
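To illustrate the AABB filtering idea, here is a small Python sketch. Note that it swaps the kd-tree for a simpler sort-and-sweep along x, which is an assumption made to keep the example short. The helper names (aabb, boxes_overlap, candidate_pairs) are hypothetical. Only the pairs it returns would need the exact curve-curve test.

```python
def aabb(curve):
    """Axis-aligned bounding box of a curve given as a list of (x, y, z)."""
    xs, ys, zs = zip(*curve)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_overlap(a, b):
    """True if two AABBs, each ((min corner), (max corner)), overlap on every axis."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[k] <= bmax[k] and bmin[k] <= amax[k] for k in range(3))


def candidate_pairs(curves):
    """Index pairs of curves whose AABBs overlap."""
    boxes = [aabb(c) for c in curves]
    # Visit boxes in order of their minimum x so finished boxes can be dropped.
    order = sorted(range(len(curves)), key=lambda i: boxes[i][0][0])
    active, pairs = [], []
    for i in order:
        xmin = boxes[i][0][0]
        # Keep only boxes that still extend past the start of this one along x.
        active = [j for j in active if boxes[j][1][0] >= xmin]
        pairs.extend(
            (min(i, j), max(i, j))
            for j in active
            if boxes_overlap(boxes[i], boxes[j])
        )
        active.append(i)
    return sorted(pairs)
```

With few real intersections expected, most curves never appear in any candidate pair, so the expensive test runs on only a handful of objects.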
You'll get plenty of Google results for kdtree implementations in matlab. Just make sure they can find intersections of objects in bounding boxes.

Optimize bruteforce solution of searching nearest point

I have a non-empty Set of points scattered on a plane, given by their coordinates.
The problem is to quickly answer queries of this form:
Give me the point from your set which is nearest to the point A(x, y)
My current solution pseudocode
query( given_point )
{
    nearest_point = any point from Set
    for each point in Set
        if dist(point, given_point) < dist(nearest_point, given_point)
            nearest_point = point
    return nearest_point
}
But this algorithm is very slow: its complexity is O(N) per query.
The question is: is there any data structure or clever algorithm with precalculation that will dramatically reduce the time complexity? I need O(log N) or better.
Update
By distance I mean Euclidean distance
You can get O(log N) time using a kd-tree. This is like a binary search tree, except that it splits points first on the x-dimension, then the y-dimension, then the x-dimension again, and so on.
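A minimal 2D kd-tree might look like the sketch below. It is an illustration under simple assumptions (points as (x, y) tuples, the tree stored as nested tuples, a median split at each level), and the function names build and nearest are made up for the example. The O(log N) query time holds on average for reasonably distributed points, not in the worst case.

```python
from math import dist


def build(points, depth=0):
    """Build a 2D kd-tree as nested tuples (point, left, right)."""
    if not points:
        return None
    axis = depth % 2  # alternate the splitting dimension: x, y, x, ...
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (
        points[mid],
        build(points[:mid], depth + 1),
        build(points[mid + 1:], depth + 1),
    )


def nearest(node, target, depth=0, best=None):
    """Return the stored point closest to target (None for an empty tree)."""
    if node is None:
        return best
    point, left, right = node
    if best is None or dist(point, target) < dist(best, target):
        best = point
    axis = depth % 2
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Cross the splitting line only if the far side could hold a closer point.
    if abs(diff) < dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best
```

The tree is built once with build(points), and each query is then answered with nearest(tree, (x, y)).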
If your points are homogeneously distributed, you can achieve O(1) look-up by binning the points into evenly-sized boxes and then searching the box in which the query point falls and its eight neighbouring boxes.
It would be difficult to make an efficient solution from Voronoi diagrams since this requires that you solve the problem of figuring out which Voronoi cell the query point falls in. Much of the time this involves building an R*-tree to query the bounding boxes of the Voronoi cells (in O(log N) time) and then performing point-in-polygon checks (O(p) in the number of points in the polygon's perimeter).
You can divide your grid in subsections:
Depending on the number of points and grid size, you choose a useful division. Let's assume a screen of 1000x1000 pixels, filled with random points, evenly distributed over the surface.
You may divide the screen into 10x10 sections and make a map (roughX, roughY) -> List((x, y), ...). For a given point, you look up all points in the same cell and, since the point may be closer to points in a neighboring cell than to an extreme point in the same cell, in the surrounding cells as well, maybe even 2 cells away. This would reduce the searching scope to 16 cells.
If you don't find a point in the same cell/layer, expand the search to next layer.
If you happen to find the nearest neighbor in one of the next layers, you have to expand the searching scope by an additional layer for each layer. If there are too many points, choose a finer grid. If there are too few points, choose a coarser grid. Note that the two green circles, connected to the red one with a line, have the same distance to the red one, but one is in layer 0 (same cell) and the other is in layer 2 (next of next cell).
Without preprocessing you definitely need to spend O(N), as you must look at every point before returning the closest.
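The layered grid search can be sketched like this in Python. The names build_grid and nearest and the cell parameter are assumptions for the example. The stopping rule used here is one conservative choice: a point found in layer k is at least (k - 1) * cell away from the query, so the search stops once the best distance found so far cannot be beaten by the next layer.

```python
from collections import defaultdict
from math import dist, floor, inf


def build_grid(points, cell):
    """Bin points into square cells of side length cell."""
    grid = defaultdict(list)
    for x, y in points:
        grid[(floor(x / cell), floor(y / cell))].append((x, y))
    return grid


def nearest(grid, cell, q):
    """Nearest stored point to q, searching layer 0, 1, 2, ... around q's cell."""
    if not grid:
        raise ValueError("the point set must not be empty")
    cx, cy = floor(q[0] / cell), floor(q[1] / cell)
    best, best_d, k = None, inf, 0
    # Layer k cannot contain anything closer than (k - 1) * cell.
    while best_d > (k - 1) * cell:
        for ix in range(cx - k, cx + k + 1):
            on_side = abs(ix - cx) == k
            # Left/right columns take the full range; inner columns only top/bottom.
            ys = range(cy - k, cy + k + 1) if on_side else (cy - k, cy + k)
            for iy in ys:
                for p in grid.get((ix, iy), ()):
                    d = dist(p, q)
                    if d < best_d:
                        best, best_d = p, d
        k += 1
    return best
```

For a 1000x1000 area split into 10x10 sections, cell would be 100. A finer or coarser cell trades memory for fewer distance computations per query.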
You can look here Nearest neighbor search for how to approach this problem.

Algorithm to Produce an Evenly Spaced Grid

I'm looking for a general algorithm for creating an evenly spaced grid, and I've been surprised how difficult it is to find!
Is this a well solved problem whose name I don't know?
Or is this an unsolved problem that is best done by self organising map?
More specifically, I'm attempting to make a grid on a 2D Cartesian plane in which the Euclidean distance between each point and 4 bounding lines (or "walls" to make a bounding box) are equal or nearly equal.
For a square number, this is as simple as making a grid with sqrt(n) rows and sqrt(n) columns with equal spacing positioned in the center of the bounding box. For 5 points, the pattern would presumably either be circular or 4 points with a point in the middle.
I didn't find a very good solution, so I've sadly left the problem alone and settled with a quick function that produces the following grid:
There is no simple general solution to this problem. A self-organizing map is probably one of the best choices.
Another way to approach this problem is to imagine the points as particles that repel each other and that are also repelled by the walls. As an initial arrangement, you could already evenly distribute the points up to the next smaller square number; for this you already have a solution. Then randomly add the remaining points.
Iteratively modify the locations to minimize the energy function based on the total force between the particles and walls. The result will of course depend on the force law, i.e. how the force depends on the distance.
To solve this, you can use numerical methods like FEM.
A simplified and less efficient method that is based on the same principle is to first set up an estimated minimal distance, based on the square number case which you can calculate. Then iterate through all points a number of times and for each one calculate the distance to its closest neighbor. If this is smaller than the estimated distance, move your point into the opposite direction by a certain fraction of the difference.
This method will generally not lead to a stable minimum but should find an acceptable solution after a number of iterations. You will have to experiment with the step size and the number of iterations.
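A rough Python sketch of this simplified iteration is given below. Several details are assumptions made for the example and not specified in the answer: the points live in the unit square, the walls are handled only by clamping coordinates to [0, 1], and the function name relax and its default step and iterations are arbitrary. It needs at least two points.

```python
from math import dist


def relax(points, target, step=0.5, iterations=200):
    """Push each point away from its nearest neighbour when closer than target."""
    pts = [list(p) for p in points]
    for _ in range(iterations):
        for i, p in enumerate(pts):
            # Nearest neighbour of p among all other points (brute force).
            q = min((o for j, o in enumerate(pts) if j != i),
                    key=lambda o: dist(p, o))
            d = dist(p, q)
            if 0 < d < target:
                # Move away from q by a fraction of the missing distance.
                shift = step * (target - d) / d
                for k in range(2):
                    moved = p[k] + shift * (p[k] - q[k])
                    # Clamp to the unit square (assumed stand-in for the walls).
                    p[k] = min(1.0, max(0.0, moved))
    return [tuple(p) for p in pts]
```

One possible choice for target, again an assumption, is 1 / sqrt(n) for n points in the unit square, which matches the spacing scale of the square-number case. As noted above, the step size and iteration count need experimentation.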
To summarize, you have three options:
FEM method: Efficient but difficult to implement
Self organizing map: Slightly less efficient, medium complexity of implementation.
Iteration described in last section: Less efficient but easy to implement.
Unfortunately your problem is still not very clearly specified. You say you want the points to be "equidistant" yet in your example, some pairs of points are far apart (eg top left and bottom right) and the points are all different distances from the walls.
Perhaps you want the points to have equal minimum distance? In which case a simple solution is to draw a cross shape, with one point in the centre and the remainder forming a vertical and horizontal crossed line. The gap between the walls and the points, and the points in the lines can all be equal and this can work with any number of points.

How to compute the union polygon of two (or more) rectangles

For example we have two rectangles and they overlap. I want to get the exact range of the union of them. What is a good way to compute this?
These are the two overlapping rectangles. Suppose the coordinates of the vertices are all known:
How can I compute the coordinates of the vertices of their union polygon? And what if I have more than two rectangles?
There exists a Line Sweep Algorithm to calculate the area of the union of n rectangles. Refer to the link for details of the algorithm.
As said in the article, there exists a boolean array implementation that runs in O(N^2) time. Using the right data structure (a balanced binary search tree), it can be reduced to O(N log N) time.
Above algorithm can be extended to determine vertices as well.
Details:
Modify the event handling as follows:
When you add/remove the edge to the active set, note the starting point and ending point of the edge. If any point lies inside the already existing active set, then it doesn't constitute a vertex, otherwise it does.
This way you are able to find all the vertices of resultant polygon.
Note that above method can be extended to general polygon but it is more involved.
For a relatively simple and reliable way, you can work as follows:
sort all abscissas (of the vertical sides) and ordinates (of the horizontal sides) independently, and discard any duplicate.
this establishes mappings between the coordinates and integer indexes.
create a binary image of size NxN, filled with black.
for every rectangle, fill the image in white between the corresponding indexes.
then scan the image to find the corners, by contour tracing, and revert to the original coordinates.
This process isn't efficient as it takes time proportional to N² plus the sum of the (logical) areas of the rectangles, but it can be useful for a moderate amount of rectangles. It easily deals with coincidences.
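The compressed-grid idea can be sketched in Python as follows. This is a simplified illustration: rectangles are assumed to be axis-aligned tuples (x0, y0, x1, y1) with x0 < x1 and y0 < y1, the function name union_corners is made up, and instead of tracing contours it only detects which grid vertices are corners of the union, so the result is an unordered list.

```python
def union_corners(rects):
    """Corner vertices of the union of axis-aligned rectangles (x0, y0, x1, y1)."""
    # Sort distinct abscissas and ordinates; map each one to an integer index.
    xs = sorted({x for x0, _, x1, _ in rects for x in (x0, x1)})
    ys = sorted({y for _, y0, _, y1 in rects for y in (y0, y1)})
    xi = {x: i for i, x in enumerate(xs)}
    yi = {y: i for i, y in enumerate(ys)}
    # filled[i][j] is True when cell [xs[i], xs[i+1]] x [ys[j], ys[j+1]] is covered.
    filled = [[False] * (len(ys) - 1) for _ in range(len(xs) - 1)]
    for x0, y0, x1, y1 in rects:
        for i in range(xi[x0], xi[x1]):
            for j in range(yi[y0], yi[y1]):
                filled[i][j] = True

    def cell(i, j):
        # Cells outside the image count as empty.
        return 0 <= i < len(xs) - 1 and 0 <= j < len(ys) - 1 and filled[i][j]

    corners = []
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            # The four cells meeting at grid vertex (x, y): SW, SE, NW, NE.
            around = [cell(i - 1, j - 1), cell(i, j - 1), cell(i - 1, j), cell(i, j)]
            count = sum(around)
            diagonal = count == 2 and around[0] == around[3]
            # Odd counts are convex or reflex corners; diagonals are touching corners.
            if count in (1, 3) or diagonal:
                corners.append((x, y))
    return corners
```

For two overlapping squares (0, 0, 2, 2) and (1, 1, 3, 3), this yields the eight vertices of the resulting octagon. Ordering them into a polygon would still require the contour-tracing step.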
In the case of two rectangles, there aren't so many different configurations possible and you can precompute all vertex sequences for the possible configuration (a small subset of the 2^9 possible images).
There is no need to explicitly create the image, just associate vertex sequences to the possible permutations of the input X and Y.
Look into binary space partitioning (BSP).
https://en.wikipedia.org/wiki/Binary_space_partitioning
If you had just two rectangles then a bit of hacking could yield some result, but for finding intersections and unions of multiple polygons you'll want to implement BSP.
Chapter 13 of Geometric Tools for Computer Graphics by Schneider and Eberly covers BSP. Be sure to download the errata for the book!
Eberly, one of the co-authors, has a wonderful website with PDFs and code samples for individual topics:
https://www.geometrictools.com/
http://www.geometrictools.com/Books/Books.html
Personally I believe this problem should be solved the way other geometry problems are solved in engineering programs/languages: by meshing.
So first convert your vertices into rectangular grids of fixed size, using for example:
MatLab meshgrid
Then go through all of your grid elements and remove any with duplicate edge elements. Now count the remaining mesh elements and multiply that number by the area of the mesh element you have chosen.

Shortest distance to rectangle caching

I have a list of rectangles that don't have to be parallel to the axes. I also have a master rectangle that is parallel to the axes.
I need an algorithm that can tell which rectangle a given point is closest to (the point must be inside the master rectangle). The list of rectangles and the master rectangle won't change during the algorithm, and it will be called with many points, so some data structure should be created to make the lookup faster.
To be clear: the distance from a rectangle to a point is the distance between the point and the closest point in the rectangle.
What algorithm/data structure can be used for this? Memory has the higher priority here; n log n is OK but n^2 is not.
You should be able to do this with a Voronoi diagram with O(n log n) preprocessing time with O(log n) time queries. Because the objects are rectangles, not points, the cells may be curved. Nevertheless, a Voronoi diagram should work fine for your purposes. (See http://en.wikipedia.org/wiki/Voronoi_diagram)
For a quick and dirty solution that you could actually get working within a day, you could do something inspired by locality sensitive hashing. For example, if the rectangles are somewhat well-spaced, you could hash them into square buckets with a few different offsets, and then for each query examine each rectangle that falls in one of the handful of buckets that contain the query point.
You should be able to do this in O(n) time and O(n) memory.
Calculate the closest point on each edge of each rectangle to the point in question. To do this, see my detailed answer in this question. Even though that question deals with a point inside the polygon (rather than outside of it), the algorithm can still be applied here.
Calculate the distance from the point in question to each of these closest points on the edges, and find the closest point on the entire rectangle (for each rectangle). See the link above for more details.
Find the minimum of these distances over all of the rectangles. The rectangle corresponding to the minimum distance is the winner.
If memory is more valuable than speed, use brute force: for a given point S, compute the distance from S to each edge. Choose the rectangle with the shortest distance.
This solution requires no additional memory, while its execution time is in O(n).
Depending on your exact problem specification, you may have to adjust this solution if the rectangles are allowed to overlap with the master rectangle.
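The brute-force approach can be sketched in Python as below. The representation is an assumption for the example: each rectangle is a list of its four corners in order (clockwise or counter-clockwise), which also covers rectangles that are not parallel to the axes. The function names are hypothetical. A point inside a rectangle is given distance 0, matching the definition in the question.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        t = 0.0
    else:
        # Projection of p onto the segment's line, clamped to the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def edges(corners):
    """Consecutive corner pairs, closing the loop back to the first corner."""
    return zip(corners, corners[1:] + corners[:1])


def rectangle_distance(p, corners):
    """Distance from p to a (possibly rotated) rectangle; 0 if p is inside."""
    crosses = [
        (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
        for a, b in edges(corners)
    ]
    # Inside a convex polygon, p is on the same side of every edge.
    if all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses):
        return 0.0
    return min(point_segment_distance(p, a, b) for a, b in edges(corners))


def closest_rectangle(p, rectangles):
    """Index of the rectangle closest to p, in O(n) time and O(1) extra memory."""
    return min(range(len(rectangles)), key=lambda i: rectangle_distance(p, rectangles[i]))
```

Ties are resolved in favour of the rectangle that appears first in the list.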
As you described, the distance from a point to a rectangle is the minimum over two kinds of candidates: the lengths of the perpendiculars dropped from the point onto each of the four edges (where the foot of the perpendicular lies on the edge), and the lengths of the segments connecting the point to each of the four vertices.
(My English is not good at describing a math solution, so you may need to think this through more carefully to follow my explanation.)
For each rectangle, you should store its four vertices and the equations of its four edges so that the distance to a specific point can be calculated quickly.
