Say you have a 2D grid with each spot on the grid having x number of objects (with x >=0). I am having trouble thinking of a clean algorithm so that when a user specifies a coordinate, the algorithm finds the closest coordinate (including the one specified) with an object on it.
For simplicity's sake, we'll assume that if 2 coordinates are the same distance away the first one will be returned (or if your algorithm doesn't work this way then the last one, doesn't matter).
Edit: A coordinate that is 1 away must be either 1 up, down, left or right. Coordinates that are away diagonally are 2.
As a side note, what is a great, free, online reference for algorithms?
Update
With new information:
Assuming that a coordinate diagonally is 2 away, this algorithm would work. It searches outwards in a spiral-like way, testing each point in each 'ring', starting from the inside.
Note that it does not handle out of bounds situations. So you should change this to fit your needs.
int xs, ys; // Start coordinates
// Check point (xs, ys)
for (int d = 1; d < maxDistance; d++)
{
    // Lower-left and upper-right edges of the ring at distance d
    for (int i = 0; i < d + 1; i++)
    {
        int x1 = xs - d + i;
        int y1 = ys - i;
        // Check point (x1, y1)
        int x2 = xs + d - i;
        int y2 = ys + i;
        // Check point (x2, y2)
    }
    // Remaining two edges, without the corners already visited above
    for (int i = 1; i < d; i++)
    {
        int x1 = xs - i;
        int y1 = ys + d - i;
        // Check point (x1, y1)
        int x2 = xs + i;
        int y2 = ys - d + i;
        // Check point (x2, y2)
    }
}
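For reference, here is a minimal Python sketch of the same ring walk with the out-of-bounds handling added. The grid layout (`grid[y][x]` holding an object count) and the name `nearest_occupied` are assumptions for illustration, not part of the original code, and the start point is assumed to lie inside the grid.

```python
def nearest_occupied(grid, xs, ys):
    """Return (x, y) of the closest occupied cell by Manhattan distance, or None."""
    height = len(grid)
    width = len(grid[0]) if height else 0

    def occupied(x, y):
        # Cells outside the grid are simply treated as empty.
        return 0 <= x < width and 0 <= y < height and grid[y][x] > 0

    if occupied(xs, ys):
        return (xs, ys)
    # No cell inside the grid is farther away than width + height - 2.
    for d in range(1, width + height - 1):
        # Two opposite edges of the diamond, corners included.
        for i in range(d + 1):
            for x, y in ((xs - d + i, ys - i), (xs + d - i, ys + i)):
                if occupied(x, y):
                    return (x, y)
        # The other two edges, corners excluded.
        for i in range(1, d):
            for x, y in ((xs - i, ys + d - i), (xs + i, ys - d + i)):
                if occupied(x, y):
                    return (x, y)
    return None
```

Ties are broken by visiting order within a ring, which matches the "first one found" rule from the question.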
Old version
Assuming that in your 2D grid the distance between (0, 0) and (1, 0) is the same as between (0, 0) and (1, 1), this algorithm would work. It searches outwards in a spiral-like way, testing each point in each 'ring', starting from the inside.
Note that it does not handle out of bounds situations. So you should change this to fit your needs.
int xs, ys; // Start coordinates
if (CheckPoint(xs, ys) == true)
{
    return (xs, ys);
}
// The start point was checked above, so the rings begin at d = 1
for (int d = 1; d < maxDistance; d++)
{
    // Top and bottom rows of the ring, corners included
    for (int x = xs - d; x < xs + d + 1; x++)
    {
        // Points to check: (x, ys - d) and (x, ys + d)
        if (CheckPoint(x, ys - d) == true)
        {
            return (x, ys - d);
        }
        if (CheckPoint(x, ys + d) == true)
        {
            return (x, ys + d);
        }
    }
    // Left and right columns of the ring, corners excluded
    for (int y = ys - d + 1; y < ys + d; y++)
    {
        // Points to check: (xs - d, y) and (xs + d, y)
        if (CheckPoint(xs - d, y) == true)
        {
            return (xs - d, y);
        }
        if (CheckPoint(xs + d, y) == true)
        {
            return (xs + d, y);
        }
    }
}
If you have a list of objects
If you had all the positions of all the objects in a list, this would be a lot easier as you wouldn't need to search all the empty squares and could perform 2D distance calculations to determine the one closest to you. Loop through your list of objects and calculate the distance as follows:
Define your two points. Point 1 at (x1, y1) and Point 2 at (x2, y2).
xd = x2-x1
yd = y2-y1
Distance = SquareRoot(xd*xd + yd*yd)
Then simply pick the one with the shortest distance.
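As a rough sketch in Python: note that the formula above is the Euclidean distance, while the question's edit says a diagonal neighbour counts as 2 away, which is the Manhattan distance. The sketch below assumes the Manhattan rule; the name `nearest_in_list` and the `(x, y)` tuple layout are made up for illustration.

```python
def nearest_in_list(points, xs, ys):
    """Return the point closest to (xs, ys) by Manhattan distance, or None if empty."""
    # min() keeps the first of several equally close points.
    return min(points,
               key=lambda p: abs(p[0] - xs) + abs(p[1] - ys),
               default=None)
```

To use the Euclidean rule instead, the key could be `(p[0] - xs) ** 2 + (p[1] - ys) ** 2`; the square root is not needed just to compare distances.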
If you only have a 2D array
If however the problem as described assumes a 2D array where the locations of the objects cannot be listed without first searching for all of them, then you are going to have to do a spiral loop.
Searching for 'Spiral Search Method' comes up with a few interesting links. Here is some code that does a spiral loop around an array, however this doesn't work from an arbitrary point and spiral outwards, but should give you some good ideas about how to achieve what you want.
Here is a similar question about filling values in spiral order in a 2D array.
Anyway, here is how I would tackle the problem:
Given point P, create a vector pair that specifies an area around P.
So if P = 4,4
Then your vector pair would be 3,3|5,5
Loop each value in those bounds.
for x = 3 to 5
for y = 3 to 5
check(x,y)
next
next
If a value is found, exit. If not, increase the bounds by one again. So in this case we would go to 2,2|6,6
When looping to check the values, ensure we haven't gone into any negative indexes, and also ensure we haven't exceeded the size of the array.
Also if you extend the bounds n times, you only need to loop the outer boundary values, you do not need to recheck inner values.
Which method is faster?
It all depends on:
The density of your array
Distribution technique
Number of objects
Density of Array
If you have a 500x500 array with 2 objects in it, then looping the list will always outperform doing a spiral
Distribution technique
If there are patterns of distribution (IE the objects tend to cluster around one another) then a spiral may perform faster.
Number of objects
A spiral will probably perform faster if there are a million objects, as the list technique requires you to check and calculate every distance.
You should be able to calculate the fastest solution by working out the probability of a space being filled, compared to the fact that the list solution has to check every object every time.
However, with the list technique, you may be able to do some smart sorting to improve performance. It's probably worth looking into.
If your objects are dense, then just searching the nearby points will probably be the best way to find the nearest object, spiraling out from the center. If your objects are sparse, then a quadtree or related data structure (R-tree, etc.) is probably better. Here is a writeup with examples.
I do not know of a good online algorithm reference, but I can say that if you are going to write more than the occasional line of code, saving your pennies to buy CLRS will be worth the money. There are lectures based on this book online that have been painstakingly annotated by Peteris Krumins, but they only cover part of the book. This is one of the few books that you need to own.
The following simple solution assumes that you can afford storing extra information per grid cell, and that the time cost of adding new objects to the grid is allowed to be relatively high.
The idea is that every cell holds a reference to the closest occupied cell, thus allowing O(1) query time.
Whenever an object is added to position (i,j), perform a scan of the surrounding cells, covering rings of increasing size. For each cell being scanned, evaluate its current closest occupied cell reference, and replace it if necessary. The process ends when the last ring being scanned isn't modified at all. In the worst case the process scans all grid cells, but eventually it becomes better when the grid becomes dense enough.
This solution is simple to implement, may have a significant space overhead (depending on how your grid is organized in memory), but provides optimal query time.
A simple BFS from starting coordinate in 4 directions is sufficient to find the closest point on the grid with the object.
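A minimal sketch of that BFS in Python, assuming the grid is a list of rows with `grid[y][x]` holding the object count (the function name and layout are illustrative, not from the question):

```python
from collections import deque

def bfs_nearest(grid, xs, ys):
    """Breadth-first search in 4 directions; returns the first occupied cell or None."""
    height, width = len(grid), len(grid[0])
    seen = {(xs, ys)}
    queue = deque([(xs, ys)])
    while queue:
        x, y = queue.popleft()
        if grid[y][x] > 0:
            return (x, y)
        # Cells are dequeued in order of increasing Manhattan distance.
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return None
```

This handles the grid bounds naturally, at the cost of a visited set that can grow to the size of the grid.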
Related
I have a list of coordinates that form axis-aligned 2D boxes (axis-oriented/iso-oriented rectangles, rects).
I want to see how many boxes intersect each other without repeating it. For example, I have boxes A, B, and C. If A intersects with B, B intersecting with A will still count as one intersection instead of two separate ones.
The way I'm thinking is to grab one box, compare it to all the rest and then throw it out since the comparison is done and I want no repeats. For example going back to A, B and C. If I'm on A and it intersects B and not C I have made its round and hence will not need to keep it in the loop. Hence once I'm on B it will not recheck A.
I can't think of a faster method. This is akin to finding duplicates using linear search and I think the worst-case complexity will be O(n^2). I don't see how to use binary search as it is possible that there are multiple intersections. I've been also looking at the sorting algorithm but I need one that won't go back and check the older values again.
You can solve this in O(n log n). Since you're trying to solve a static intersection problem in two dimensions, a common trick is to transform the problem into a dynamic intersection problem in one dimension (i.e., one space dimension becomes the time dimension).
Suppose you have a closed rectangle with lower left corner (x1, y1) and upper right corner (x2, y2). This rectangle becomes two "interval events":
Interval [y1, y2] inserted at time x1
Interval [y1, y2] removed after time x2
Transform all your rectangles into events in this way, and then sort by time (breaking ties with insertions coming before removals).
Now, you need a data structure that lets you add and remove intervals [A, B], and also query the number of intervals in the data structure intersecting [A, B]. Then you process the "interval events" in the sorted order, but keep a running sum of how many current intervals intersect [A, B] before inserting each interval [A, B].
One data structure to do this in O(log n) time per operation is with two balanced binary search trees: one holding beginning-points of intervals, and the other holding ending-points of intervals. You'll also need to augment each BST to be an Order Statistic Tree, to quickly query the number of points less than or equal to a certain value that are in the tree.
Then, finding how many intervals currently in your data structure intersect an arbitrary interval [A, B] is simple; that count is:
#(Intervals intersecting [A, B]) =
#(values in the beginning-points tree that are <= B)
- #(values in the ending-points tree that are < A)
which you can check is correct from the definition of two intervals intersecting: neither interval starts after the other one has ended.
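To make the counting formula concrete, here is a small Python sketch that uses plain sorted lists and `bisect` in place of the balanced trees. It only illustrates the query; it does not give O(log n) insertion and removal, and the function name is an assumption.

```python
from bisect import bisect_left, bisect_right

def count_intersecting(starts, ends, a, b):
    """starts / ends: sorted begin and end points of the stored closed intervals."""
    # (# begin points <= b) - (# end points < a)
    return bisect_right(starts, b) - bisect_left(ends, a)

intervals = [(1, 3), (2, 6), (7, 9)]
starts = sorted(s for s, _ in intervals)
ends = sorted(e for _, e in intervals)
```

For example, querying `[4, 5]` against these three intervals counts only `(2, 6)`, while `[3, 7]` touches all three because the intervals are closed.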
You could also replace the order statistic tree with a data structure for prefix-sums, like a Fenwick tree, which requires much less code for the same complexity.
Sample implementation of kcsquared's algorithm. You didn't specify a language, so I chose Python since I'm familiar with it and it's short readable code. Rectangles have lower left coordinate (x,y) and upper right coordinate (X,Y).
def intersections(rects):
    events = []
    for A in rects:
        events.append((A.x, 'insert', A.y, A.Y))
        events.append((A.X, 'remove', A.y, A.Y))
    intersections = 0
    ys = SortedList()
    Ys = SortedList()
    for x, op, y, Y in sorted(events):
        if op == 'insert':
            intersections += ys.bisect_right(Y) - Ys.bisect_left(y)
            ys.add(y)
            Ys.add(Y)
        else:
            ys.remove(y)
            Ys.remove(Y)
    return intersections
Tests on lists of 5,000 random rectangles against an O(n^2) reference implementation, showing the numbers of intersections and runtimes:
124257 5465 ms reference
124257 127 ms intersections
121166 5444 ms reference
121166 124 ms intersections
118980 5435 ms reference
118980 124 ms intersections
With 10,000 rectangles:
489617 22342 ms reference
489617 292 ms intersections
489346 22491 ms reference
489346 296 ms intersections
489990 22302 ms reference
489990 290 ms intersections
Full code:
from random import randint, randrange
from itertools import combinations
from timeit import default_timer as timer
from sortedcontainers import SortedList
from collections import namedtuple

def reference(rects):
    intersections = 0
    for A, B in combinations(rects, 2):
        if A.X >= B.x and A.x <= B.X and A.Y >= B.y and A.y <= B.Y:
            intersections += 1
    return intersections

def intersections(rects):
    events = []
    for A in rects:
        events.append((A.x, 'insert', A.y, A.Y))
        events.append((A.X, 'remove', A.y, A.Y))
    intersections = 0
    ys = SortedList()
    Ys = SortedList()
    for x, op, y, Y in sorted(events):
        if op == 'insert':
            intersections += ys.bisect_right(Y) - Ys.bisect_left(y)
            ys.add(y)
            Ys.add(Y)
        else:
            ys.remove(y)
            Ys.remove(Y)
    return intersections

Rect = namedtuple('Rect', 'x X y Y')

for _ in range(3):
    rects = [Rect(x, x + randint(1, 100), y, y + randint(1, 100))
             for _ in range(5000)
             for x, y in [(randrange(1000), randrange(1000))]]
    for func in reference, intersections:
        t0 = timer()
        result = func(rects)
        t1 = timer()
        print(result, '%4d ms' % ((t1 - t0) * 1e3), func.__name__)
    print()
You can somewhat reduce the average computation time, but the worst case will always be O(n²), because inherently you have to test the N boxes two by two.
One way to reduce the computation time is to attach to each box its circumscribed sphere. It is simple to compute: its center is the barycenter of the box's 8 vertices (or of the rectangle's 4 vertices), and its radius is the distance from this barycenter to any vertex.
The trick is: given two boxes, if their circumscribed spheres do not intersect (the distance between the two centers is greater than the sum of the two radii), then the boxes cannot intersect.
If the two spheres intersect, then the boxes may intersect.
This trick doesn't change the complexity at all, but it reduces the computation time a lot, since it is much faster to test spheres than boxes (especially if the boxes aren't parallel to the axes). It also shrinks the list of potential intersections for each box, possibly down to an empty list.
You can gain some more computation time by keeping a list of box pairs to test finely, using squared distances for comparisons, and so on: it's not much, but in the end it still saves some time.
Then you can compute the real box intersections with far fewer box pairs.
This would be a significant boost with many random 3D boxes that are not aligned with the axes, and it won't be with a few axis-aligned 2D rectangles. In both cases, the worst-case complexity remains O(n²).
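A minimal 2D sketch of this pre-test in Python, using circumscribed circles and squared distances. The `(x1, y1, x2, y2)` tuple layout and all function names are assumptions for illustration.

```python
from itertools import combinations
from math import hypot

def bounding_circle(rect):
    """Center and radius of the circle through the corners of (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = rect
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    return cx, cy, hypot(x2 - cx, y2 - cy)

def rects_intersect(a, b):
    # Exact test for closed, axis-aligned rectangles.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def count_intersections(rects):
    circles = [bounding_circle(r) for r in rects]
    count = 0
    for i, j in combinations(range(len(rects)), 2):
        (ax, ay, ar), (bx, by, br) = circles[i], circles[j]
        # Cheap rejection: compare squared distances, no square root per pair.
        if (ax - bx) ** 2 + (ay - by) ** 2 > (ar + br) ** 2:
            continue
        if rects_intersect(rects[i], rects[j]):
            count += 1
    return count
```

For axis-aligned rectangles the exact test is already only four comparisons, so the pre-test buys little here, as the answer itself notes.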
The goal is to find coordinates in a figure with an unknown shape. What IS known is a list of coordinates of the boundary of that figure, for example:
boundary = [(0,0),(1,0),(2,0),(3,0),(3,1),(3,2),(3,3),(2,3),(2,2),(1,2),(1,3),(0,3),(0,2),(0,1)]
which would look something like this:
Square with a gap
This is a very basic example, and I'd like to do it with very large lists describing very different kinds of figures.
The question is how to get a random coordinate that lies within the figure WITHOUT hardcoding anything about the shape of the figure, because the shape will be unknown at the beginning. Is there a way to know for certain, or is making an estimate the best option? How would I implement an estimate like that?
Here is a tentative answer. You sample in two steps.
First, do some preparation work: split your figure into simple elementary objects. In your case you can split it into rectangles; people often triangulate instead and split it into triangles.
So you have N simple objects, each with area A_i, and total area A = Sum(A_i).
First sampling step: select which rectangle to pick the point from.
In pseudocode:
r = randomU01(); // random value in [0...1) range
for(i in N) {
    r = r - A_i/A;
    if (r <= 0) {
        k = i;
        break;
    }
}
So you have picked one rectangle with index k; then just sample a point uniformly in that rectangle:
x = A_k.dim.x * randomU01();
y = A_k.dim.y * randomU01();
return (x + A_k.lower_left_corner.x, y + A_k.lower_left_corner.y);
And that is it. The technique is very similar for a triangulated figure.
Rectangle selection could be optimized by doing a binary search, or with the more complicated alias method.
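The two steps for rectangles could look roughly like this in Python, using a binary search over cumulative areas for the selection. The `(x, y, width, height)` layout and the function name are assumptions, and the rectangles are assumed not to overlap.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def sample_in_rectangles(rects, rng=random):
    """rects: non-overlapping (x, y, width, height) tuples covering the figure."""
    # Step 1: pick a rectangle with probability proportional to its area.
    cumulative = list(accumulate(w * h for _, _, w, h in rects))
    r = rng.random() * cumulative[-1]
    k = min(bisect_right(cumulative, r), len(rects) - 1)  # guard against rounding
    # Step 2: sample uniformly inside the chosen rectangle.
    x, y, w, h = rects[k]
    return (x + w * rng.random(), y + h * rng.random())

# One possible decomposition of the example figure (3x3 square, notch at the top).
example_rects = [(0, 0, 3, 2), (0, 2, 1, 1), (2, 2, 1, 1)]
```

Calling `sample_in_rectangles(example_rects)` repeatedly should never return a point inside the notch between x = 1 and x = 2 above y = 2.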
UPDATE
If your boundary is generic, then the only good way to go is to triangulate your polygon using any good library out there (e.g. Triangle), then select one of the triangles based on area (step 1), then sample a point uniformly in the triangle with vertices A, B and C using two random U01 numbers r1 and r2:
P = (1 - sqrt(r1)) * A + (sqrt(r1)*(1 - r2)) * B + (r2*sqrt(r1)) * C
i.e., in pseudocode
r1 = randomU01();
s1 = sqrt(r1);
r2 = randomU01();
x = (1.0-s1)*A.x + s1*(1.0-r2)*B.x + r2*s1*C.x;
y = (1.0-s1)*A.y + s1*(1.0-r2)*B.y + r2*s1*C.y;
return (x,y);
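The same triangle formula as a small runnable Python sketch; the function name and the tuple representation of the vertices are assumptions for illustration.

```python
import random
from math import sqrt

def sample_in_triangle(a, b, c, rng=random):
    """Uniform random point in the triangle with vertices a, b, c given as (x, y)."""
    s1 = sqrt(rng.random())
    r2 = rng.random()
    # The three weights are non-negative and sum to 1.
    wa, wb, wc = 1.0 - s1, s1 * (1.0 - r2), s1 * r2
    x = wa * a[0] + wb * b[0] + wc * c[0]
    y = wa * a[1] + wb * b[1] + wc * c[1]
    return (x, y)
```

The square root on `r1` is what keeps the density uniform; without it, points would cluster near vertex A.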
Is there any algorithm / method to find the smallest regular hexagon around a set of points (x, y).
And by smallest I mean smallest area.
My current idea was to find the smallest circle enclosing the points, and then create a hexagon from there and check if all the points are inside, but that is starting to sound like a never ending problem.
Requirements
First of all, let's define a hexagon as a quadruple [x0, y0, t0, s], where (x0, y0), t0 and s are its center, rotation and side length respectively.
Next, we need to find whether an arbitrary point is inside the hexagon. The following functions do this:
function getHexAlpha(t, hex)
    t = t - hex.t0;
    t = t - 2*pi * floor(t / (2*pi));
    return pi/2 - abs(rem(t, pi/3) - (pi/6));
end
function getHexRadious( P, hex )
    x = P.x - hex.x0;
    y = P.y - hex.y0;
    t = atan2(y, x);
    return hex.s * cos(pi/6) / sin(getHexAlpha(t, hex));
end
function isInHex(P, hex)
    r = getHexRadious(P, hex);
    d = sqrt((P.x - hex.x0)^2 + (P.y - hex.y0)^2);
    return r >= d;
end
Long story short, the getHexRadious function formulates the hexagon in polar form and returns the distance from the center of the hexagon to its boundary at each angle. Read this post for more details about the getHexAlpha and getHexRadious functions. This is how these work for a set of random points and an arbitrary hexagon:
The Algorithm
I suggest a two-stepped algorithm:
1- Guess an initial hexagon that covers most of points :)
2- Tune s to cover all points
Chapter 1: (2) Following Tarantino in Kill Bill Vol.1
For now, let's assume that our arbitrary hexagon is a good guess. The following functions keep x0, y0 and t0 fixed and tune s to cover all points:
function getHexSide( P, hex )
    x = P.x - hex.x0;
    y = P.y - hex.y0;
    r = sqrt(x^2 + y^2);
    t = atan2(y, x);
    return r / (cos(pi/6) / sin(getHexAlpha(t, hex)));
end
function findMinSide( P[], hex )
    for all P[i] in P
        // side length needed to cover the i-th point
        S[i] = getHexSide(P[i], hex);
    end
    return max(S[]);
end
The getHexSide function is the inverse of getHexRadious. It returns the minimum side length required for a hexagon with the given x0, y0 and t0 to cover point P. This is the outcome for the previous test case:
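For anyone who wants to try the tuning step, here is a rough Python port of getHexAlpha, getHexSide and findMinSide. It passes the center and rotation as plain arguments instead of a hex record, which is a choice made for this sketch, and the snake_case names are likewise assumptions.

```python
from math import atan2, cos, sin, pi, hypot

def hex_alpha(t, t0):
    # Angle relative to the hexagon's rotation, folded into one 60-degree sector.
    t = (t - t0) % (2 * pi)
    return pi / 2 - abs(t % (pi / 3) - pi / 6)

def hex_side_for_point(px, py, x0, y0, t0):
    """Smallest side length for a hexagon at (x0, y0), rotation t0, covering (px, py)."""
    dx, dy = px - x0, py - y0
    r = hypot(dx, dy)
    if r == 0:
        return 0.0  # the center is covered by any hexagon
    return r * sin(hex_alpha(atan2(dy, dx), t0)) / cos(pi / 6)

def find_min_side(points, x0, y0, t0):
    # The hexagon must cover the most demanding point.
    return max(hex_side_for_point(px, py, x0, y0, t0) for px, py in points)
```

As a sanity check, a point at distance 2 in the direction `t0` lies on a vertex, so it needs side length 2.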
Chapter 2: (1)
As a guess, we can find the two points furthest away from each other and fit one of the hexagon's diameters on them:
function guessHex( P[] )
    D[,] = pairwiseDistance(P[]);
    // i and j index the two points that are furthest apart
    [i, j] = indexOf(max(max(D[,])));
    [~, j] = max(D(i, :));
    // center at the midpoint, half the distance as side length
    hex.x0 = (P[i].x + P[j].x) / 2;
    hex.y0 = (P[i].y + P[j].y) / 2;
    hex.s = D[i, j]/2;
    // rotate so that a vertex points at P[i]
    hex.t0 = atan2(P[i].y-hex.y0, P[i].x-hex.x0);
    return hex;
end
Although this method can find a relatively small hexagon, it is a greedy approach and never guarantees finding the optimal solution.
Chapter 3: A Better Guess
Well, this problem is definitely an optimization problem whose objective is to minimize the area of the hexagon (or the s variable). I don't know if it has an analytical solution, and SO is not the right place to discuss it. But any optimization algorithm can be used to provide a better initial guess. I used a genetic algorithm (GA) to solve this, with findMinSide as its cost function. In effect the GA generates many guesses for x0, y0 and t0, and the best one is selected. It finds better results but is more time consuming. Still no guarantee of finding the optimum!
Optimization of Optimization
When it comes to optimization algorithms, performance is always an issue. Keep in mind that the hexagon only needs to enclose the convex hull of the points. If you are dealing with large sets of points, it's better to find the convex hull and get rid of the rest of the points.
Edit: I've worked a solution. Feel free to contact me if you come across this in the future and need something similar.
--
Instead of generating random points on a plane, how would you check if a given coordinate is equal to a random point? Or inside a random bounding box?
For example you have a plane with integer coordinates. That plane is somehow populated with random bounding boxes (generated using a formula, not data). The goal is to check if a given (x, y) is within one of those boxes.
I can find many references on how to generate random points but not much for doing it in this more backwards way (I guess you'd call it 'functional'?).
I have managed to make an algorithm that splits the plane into 100x100 squares, and within each square is a bounding box that is randomly placed. But is it possible with an algorithm that places the boxes more organically?
Edit: Here's an example algorithm I used for a simple "random point within a 100x100 grid" (from memory, might be missing something):
// check if equal to the random point within the point's grid square
boolean isRandomCenter(x, y) {
    // offset relative to origin of grid square
    int offsetX = x mod 100
    int offsetY = y mod 100
    // seed from the grid square, not from the point itself, so that every
    // point in the same square is compared against the same random position
    int randomSeed = (x div 100) * 1000003 + (y div 100)
    // random position of point for this square (separate seeds so X and Y differ)
    int randomOffsetX = random(50, randomSeed)
    int randomOffsetY = random(50, randomSeed + 1)
    if (offsetX == randomOffsetX && offsetY == randomOffsetY)
        return true
    return false
}
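Here is how the same idea might look as runnable Python. It is only a sketch: the random point of each square is derived from the square's indices, so it can be recomputed on demand without storing anything. The names `cell_point` and `is_random_point` and the `seed` parameter are made up for illustration.

```python
import random

CELL = 100  # side length of a grid square, as in the question

def cell_point(cell_x, cell_y, seed=0):
    """The random offset inside grid square (cell_x, cell_y); same result every call."""
    # A string seed keeps the result reproducible across runs.
    rng = random.Random(f"{seed}:{cell_x}:{cell_y}")
    return rng.randrange(CELL), rng.randrange(CELL)

def is_random_point(x, y, seed=0):
    # divmod floors, so negative coordinates map to the correct square too.
    cell_x, offset_x = divmod(x, CELL)
    cell_y, offset_y = divmod(y, CELL)
    return (offset_x, offset_y) == cell_point(cell_x, cell_y, seed)
```

A more organic placement could use the same per-square generator to decide how many boxes a square gets and where, and then test the query point against the boxes of its own and the neighbouring squares.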
Well, I don't know if I understand your problem exactly, but the condition for a given point M(x, y) in a 2-dimensional Euclidean space with axes x and y to be inside a box defined by two opposite corners A(xa, ya) and B(xb, yb) is pretty simple.
Let's define a function isInsideTheBox(x, y, xa, ya, xb, yb) returning true if M is inside the box and false otherwise:
bool isInsideTheBox(int x, int y, int xa, int ya, int xb, int yb)
{
// We assume xa < xb and ya < yb
return (x >= xa && x <= xb && y >= ya && y <= yb);
}
I am answering the question: check if a point is over a random point.
If the coordinates are real, the probability of an overlap is zero and the question is virtually useless. So I assume discrete coordinates.
If the question regards random points that have already been drawn, the only way is to remember the random points in some container as you draw them (array, sorted, list, search tree, hash table).
If the question regards points that might be drawn at that location, the answer is "true" in the whole domain (where the distribution is nonzero). You need to model the domain geometrically to perform point-in-... queries.
If the question is about pseudo-random or quasi-random points, I don't think there is any shortcut and you should proceed as for the truly random case (unless the generator is really poor).
I searched a lot, but I didn't find a good answer that works for this case.
We have some rectangles that are horizontal or vertical. They can be placed on the page randomly. They can overlap or have a common edge or be separate from each other.
I want to find an algorithm with O(nlogn) that can find perimeter and area of these rectangles.
These pictures may make the problem clear.
I think that interval trees might help, but I'm not sure how.
It can be done by a sweep-line algorithm.
We'll sweep an imaginary line from left to right.
We'll notice the way the intersection between the line and the set of rectangles represents a set of intervals, and that it changes when we encounter a left or right edge of a rectangle.
Let's say that the intersection doesn't change between x coordinates x1 and x2.
Then, if the length of the intersection after x1 was L, the line would have swept an area equal to (x2 - x1) * L, by sweeping from x1 to x2.
For example, you can look at x1 as the left blue line, and x2 as the right blue line on the following picture (that I stole from you and modified a bit :)):
It should be clear that the intersection of our sweep-line doesn't change between those points. However, the blue intersection is quite different from the red one.
We'll need a data structure with these operations:
insert_interval(y1, y2);
remove_interval(y1, y2);
get_total_length();
Those are easily implemented with a segment tree, so I won't go into details now.
Now, the algorithm would go like this:
Take all the vertical segments and sort them by their x coordinates.
For each relevant x coordinate (only the ones appearing as edges of rectangles are important):
Let x1 be the previous relevant x coordinate.
Let x2 be the current relevant x coordinate.
Let L be the length given by our data structure.
Add (x2 - x1) * L to the total area sum.
Remove all the right edges with x = x2 segments from the data structure.
Add all the left edges with x = x2 segments to the data structure.
By left and right I mean the sides of a rectangle.
This idea was given only for computing the area, however, you may modify it to compute the perimeter. Basically you'll want to know the difference between the lengths of the intersection before and after it changes at some x coordinate.
The complexity of the algorithm is O(N log N) (although it depends on the range of values you might get as input, this is easily dealt with).
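A compact Python sketch of the area sweep, with the segment tree built over the compressed y coordinates. This is one possible way to fill in the details rather than the linked implementation; the `(x1, y1, x2, y2)` layout and all names are assumptions.

```python
def union_area(rects):
    """Area of the union of axis-aligned rectangles given as (x1, y1, x2, y2)."""
    ys = sorted({y for _, y1, _, y2 in rects for y in (y1, y2)})
    if len(ys) < 2:
        return 0
    index = {y: i for i, y in enumerate(ys)}
    n = len(ys) - 1            # number of elementary y-intervals
    count = [0] * (4 * n)      # rectangles covering a node's whole range
    covered = [0] * (4 * n)    # covered length inside a node's range

    def update(node, lo, hi, a, b, delta):
        # Add delta to the cover count of the elementary intervals in [a, b).
        if a >= hi or b <= lo:
            return
        if a <= lo and hi <= b:
            count[node] += delta
        else:
            mid = (lo + hi) // 2
            update(2 * node, lo, mid, a, b, delta)
            update(2 * node + 1, mid, hi, a, b, delta)
        if count[node] > 0:
            covered[node] = ys[hi] - ys[lo]
        elif hi - lo == 1:
            covered[node] = 0
        else:
            covered[node] = covered[2 * node] + covered[2 * node + 1]

    events = []
    for x1, y1, x2, y2 in rects:
        events.append((x1, 1, index[y1], index[y2]))    # left edge: insert
        events.append((x2, -1, index[y1], index[y2]))   # right edge: remove
    events.sort()  # at equal x, removals (-1) sort before insertions (+1)

    area = 0
    prev_x = events[0][0]
    for x, delta, a, b in events:
        area += (x - prev_x) * covered[1]   # strip swept since the previous event
        update(1, 0, n, a, b, delta)
        prev_x = x
    return area
```

Because an interval is only ever removed after the same interval was inserted, the tree needs no lazy propagation: a per-node cover count is enough.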
You can find more information on the broad topic of sweep-line algorithms on TopCoder.
You can read about various ways to use the segment tree on the PEG judge wiki.
Here's my (really old) implementation of the algorithm as a solution to the SPOJ problem NKMARS: implementation.
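The following is an O(N^2) solution. Note that subtracting pairwise overlaps is exact only when no point is covered by more than two rectangles.
int area = 0;
for (int i = 0; i < N; i++)
{
    // add the area of rectangle i
    area += areaOf(rectangles[i]);
    // subtract its overlap with every later rectangle
    for (int j = i + 1; j < N; j++)
    {
        area -= inter(rectangles[i], rectangles[j]);
    }
}
return area;
// Area of the overlap of two axis-aligned rectangles, or 0 if they do not overlap
int inter(rect a, rect b)
{
    if ( ( min(a.highY, b.highY) > max(a.lowerY, b.lowerY) ) && ( min(a.highX, b.highX) > max(a.lowerX, b.lowerX) ) )
        return ( min(a.highY, b.highY) - max(a.lowerY, b.lowerY) ) * ( min(a.highX, b.highX) - max(a.lowerX, b.lowerX) );
    else return 0;
}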