Is there a better way than brute-force comparison to get the maximum- and minimum-length diagonals of a polygon? To be more specific, I would like to find the ratio of the two, so I can sort polygons by their "skinniness."
The polygons aren't too large (usually 4-8 faces per polygon), but there's a lot of them. I thought I'd just check with SO to see if there was a better way of doing this.
Thanks in advance
The polygons aren't too large (usually 4-8 faces per polygon), but there's a lot of them.
I don't know if there's a faster solution than O(n^2), but for n <= 8 it isn't going to matter. With n = 8 you only have to check 20 diagonals (8 * 5 / 2). That isn't a big multiplier by itself, and any more sophisticated algorithm is likely to carry a lot of computational overhead (data structures, more elaborate loops and checks).
One thing you can do to speed it up, though, is to drop the square root from the distance formula. First find the min/max of (xi-xj)*(xi-xj) + (yi-yj)*(yi-yj), then apply the square root to just those two values. The square root is a fairly expensive operation, and doing it 2 times instead of 20 can make a difference.
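For illustration, here is a minimal Python sketch along these lines (it assumes each polygon is simply a list of (x, y) vertex tuples with at least 4 vertices; the function name and representation are mine):

```python
from math import sqrt

def skinniness(poly):
    """Ratio of longest to shortest diagonal of a polygon.

    poly is a list of (x, y) vertex tuples in order (n >= 4).  Square roots
    are deferred until the min/max squared lengths are known.
    """
    n = len(poly)
    max_sq = min_sq = None
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # (0, n-1) is an edge, not a diagonal
            dx = poly[i][0] - poly[j][0]
            dy = poly[i][1] - poly[j][1]
            d2 = dx * dx + dy * dy
            if max_sq is None or d2 > max_sq:
                max_sq = d2
            if min_sq is None or d2 < min_sq:
                min_sq = d2
    return sqrt(max_sq) / sqrt(min_sq)
```

Sorting the polygons by skinniness is then just polys.sort(key=skinniness).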
Related
Suppose we have n points in a bounded region of the plane. The problem is to divide the region into 4 sub-regions (with one horizontal and one vertical line) such that the sum of a metric over the regions is minimized.
The metric can be, for example, the sum of the distances between the points in each region, or any other measure of how spread out the points in a region are.
I don't know whether a clustering algorithm might help me tackle this problem, or whether it can be formulated as a simple optimization problem in which the decision variables are the positions of the two "axes".
I believe this can be formulated as a MIP (Mixed Integer Programming) problem.
Let's introduce 4 quadrants A, B, C, D: A is upper-right, B is lower-right, etc. Then define a binary variable
delta(i,k) = 1 if point i is in quadrant k
             0 otherwise
and continuous variables
Lx, Ly : coordinates of the lines
Obviously we have:
sum(k, delta(i,k)) = 1
xlo <= Lx <= xup
ylo <= Ly <= yup
where xlo, xup are the minimum and maximum x-coordinates and ylo, yup the minimum and maximum y-coordinates. Next we need to implement implications like:
delta(i,'A') = 1 ==> x(i)>=Lx and y(i)>=Ly
delta(i,'B') = 1 ==> x(i)>=Lx and y(i)<=Ly
delta(i,'C') = 1 ==> x(i)<=Lx and y(i)<=Ly
delta(i,'D') = 1 ==> x(i)<=Lx and y(i)>=Ly
These can be handled by so-called indicator constraints or written as linear inequalities, e.g.
x(i) <= Lx + (delta(i,'A')+delta(i,'B'))*(xup-xlo)
Similarly for the others. Finally, the objective is
min sum((i,j,k), delta(i,k)*delta(j,k)*d(i,j))
where d(i,j) is the distance between points i and j. This objective can be linearized as well.
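For concreteness, here is a rough sketch of this formulation in Python using PuLP with its default CBC solver (the original experiment used Cplex; the function, variable names and big-M constants below are mine, and the z variables implement the linearization of delta(i,k)*delta(j,k) mentioned above). This is only practical for small point sets:

```python
import itertools, math
import pulp

def split_into_quadrants(points):
    """Sketch: place one vertical and one horizontal line so that the sum of
    pairwise distances within each quadrant is minimized (small n only)."""
    n = len(points)
    xs = [p[0] for p in points]; ys = [p[1] for p in points]
    xlo, xup, ylo, yup = min(xs), max(xs), min(ys), max(ys)
    quads = ['A', 'B', 'C', 'D']  # A upper-right, B lower-right, C lower-left, D upper-left
    d = {(i, j): math.dist(points[i], points[j])
         for i, j in itertools.combinations(range(n), 2)}

    m = pulp.LpProblem("quadrant_split", pulp.LpMinimize)
    delta = pulp.LpVariable.dicts("delta", (range(n), quads), cat="Binary")
    Lx = pulp.LpVariable("Lx", xlo, xup)
    Ly = pulp.LpVariable("Ly", ylo, yup)
    z = pulp.LpVariable.dicts("z", (range(n), range(n), quads), 0, 1)  # z ~ delta_i * delta_j

    Mx, My = xup - xlo, yup - ylo
    for i in range(n):
        xi, yi = points[i]
        m += pulp.lpSum(delta[i][k] for k in quads) == 1
        # big-M versions of the indicator constraints above
        m += xi >= Lx - (delta[i]['C'] + delta[i]['D']) * Mx   # in A or B => right of Lx
        m += xi <= Lx + (delta[i]['A'] + delta[i]['B']) * Mx   # in C or D => left of Lx
        m += yi >= Ly - (delta[i]['B'] + delta[i]['C']) * My   # in A or D => above Ly
        m += yi <= Ly + (delta[i]['A'] + delta[i]['D']) * My   # in B or C => below Ly

    for (i, j), dist in d.items():
        for k in quads:
            m += z[i][j][k] >= delta[i][k] + delta[j][k] - 1   # forces z = 1 when both are in k
    m += pulp.lpSum(z[i][j][k] * dist for (i, j), dist in d.items() for k in quads)

    m.solve()
    return Lx.value(), Ly.value()
```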
After applying a few tricks, I could prove global optimality for 100 random points in about 40 seconds using Cplex. This approach is not really suited for large datasets (the computation time quickly increases when the number of points becomes large).
I suspect this cannot be shoe-horned into a convex problem. Also, I am not sure this objective is really what you want. It will try to make all clusters about the same size (adding a point to a large cluster introduces lots of distances to be added to the objective; adding a point to a small cluster is cheap). Maybe an average distance for each cluster is a better measure (but that makes the linearization more difficult).
Note - probably incorrect. I will try and add another answer
The one dimensional version of minimising sums of squares of differences is convex. If you start with the line at the far left and move it to the right, each point crossed by the line stops accumulating differences with the points to its right and starts accumulating differences to the points to its left. As you follow this the differences to the left increase and the differences to the right decrease, so you get a monotonic decrease, possibly a single point that can be on either side of the line, and then a monotonic increase.
I believe that the one dimensional problem of clustering points on a line is convex, but I no longer believe that the problem of drawing a single vertical line in the best position is convex. I worry about sets of points that vary in y co-ordinate so that the left hand points are mostly high up, the right hand points are mostly low down, and the intermediate points alternate between high up and low down. If this is not convex, the part of the answer that tries to extend to two dimensions fails.
So for the one dimensional version of the problem you can pick any point and work out in time O(n) whether that point should be to the left or right of the best dividing line. So by binary chop you can find the best line in time O(n log n).
I don't know whether the two dimensional version is convex or not but you can try all possible positions for the horizontal line and, for each position, solve for the position of the vertical line using a similar approach as for the one dimensional problem (now you have the sum of two convex functions to worry about, but this is still convex, so that's OK). Therefore you solve at most O(n) one-dimensional problems, giving cost O(n^2 log n).
If the points aren't very strangely distributed, I would expect that you could save a lot of time by using the solution of the one dimensional problem at the previous iteration as a first estimate of the position of solution for the next iteration. Given a starting point x, you find out if this is to the left or right of the solution. If it is to the left of the solution, go 1, 2, 4, 8... steps away to find a point to the right of the solution and then run binary chop. Hopefully this two-stage chop is faster than starting a binary chop of the whole array from scratch.
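Here is a Python sketch of the one-dimensional routine (assuming, as argued above, that the cost is unimodal in the split position). Instead of the explicit left/right test it uses a ternary chop over the split index, and each candidate split is scored in O(1) via prefix sums, using the identity sum_{i<j} (x_i - x_j)^2 = m*sum(x^2) - (sum x)^2 for a group of m points; all names are mine:

```python
def best_vertical_split(xs):
    """1D version: split the sorted coordinates into a left and a right group
    so that the sum over both groups of pairwise squared differences is
    minimized.  Assumes (as argued above) the cost is unimodal in the split
    index, so a ternary chop suffices."""
    xs = sorted(xs)
    n = len(xs)
    # prefix sums of x and x^2 so each group's cost is O(1):
    # sum_{i<j in group} (x_i - x_j)^2 = m*sum(x^2) - (sum x)^2
    ps, ps2 = [0.0], [0.0]
    for x in xs:
        ps.append(ps[-1] + x)
        ps2.append(ps2[-1] + x * x)

    def cost(k):  # first k points go left, the rest go right
        s1, q1, m1 = ps[k], ps2[k], k
        s2, q2, m2 = ps[n] - s1, ps2[n] - q1, n - k
        return (m1 * q1 - s1 * s1) + (m2 * q2 - s2 * s2)

    lo, hi = 0, n
    while hi - lo > 2:                 # ternary chop on a unimodal function
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if cost(m1) < cost(m2):
            hi = m2
        else:
            lo = m1
    k = min(range(lo, hi + 1), key=cost)
    return k, cost(k)                  # line separates the first k points from the rest
```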
Here's another attempt. Lay out a grid so that, except in the case of ties, each point is the only point in its column and the only point in its row. Assuming no ties in any direction, this grid has N rows, N columns, and N^2 cells. If there are ties the grid is smaller, which makes life easier.
Separating the cells with a horizontal and vertical line is pretty much picking out a cell of the grid and saying that cell is the cell just above and just to the right of where the lines cross, so there are roughly O(N^2) possible such divisions, and we can calculate the metric for each such division. I claim that when the metric is the sum of the squares of distances between points in a cluster the cost of this is pretty much a constant factor in an O(N^2) problem, so the whole cost of checking every possibility is O(N^2).
The metric within a rectangle formed by the dividing lines is SUM_i,j[ (X_i - X_j)^2 + (Y_i-Y_j)^2]. We can calculate the X contributions and the Y contributions separately. If you do some algebra (which is easier if you first subtract a constant so that everything sums to zero) you will find that the metric contribution from a co-ordinate is linear in the variance of that co-ordinate. So we want to calculate the variances of the X and Y co-ordinates within the rectangles formed by each division. https://en.wikipedia.org/wiki/Algebraic_formula_for_the_variance gives us an identity which tells us that we can work out the variance given SUM_i Xi and SUM_i Xi^2 for each rectangle (and the corresponding information for the y co-ordinate). This calculation can be inaccurate due to floating point rounding error, but I am going to ignore that here.
Given a value associated with each cell of a grid, we want to make it easy to work out the sum of those values within rectangles. We can create partial sums along each row, transforming 0 1 2 3 4 5 into 0 1 3 6 10 15, so that each cell in a row contains the sum of all the cells to its left and itself. If we take these values and do partial sums up each column, we have just worked out, for each cell, the sum of the rectangle whose top right corner lies in that cell and which extends to the bottom and left sides of the grid.
These calculated values at the far right column give us the sum for all the cells on the same level as that cell and below it. If we subtract off the rectangles we know how to calculate we can find the value of a rectangle which lies at the right hand side of the grid and the bottom of the grid. Similar subtractions allow us to work out first the value of the rectangles to the left and right of any vertical line we choose, and then to complete our set of four rectangles formed by two lines crossing by any cell in the grid.
The expensive part of this is working out the partial sums, but we only have to do that once, and it costs only O(N^2). The subtractions and lookups used to work out any particular metric have only a constant cost. We have to do one for each of O(N^2) cells, but that is still only O(N^2).
(So we can find the best clustering in O(N^2) time by working out the metrics associated with all possible clusterings in O(N^2) time and choosing the best).
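Here is a Python/NumPy sketch of this scheme under the no-ties assumption: per-cell sums of 1, x, x^2, y, y^2 are turned into partial ("summed-area") tables, after which the metric of each of the four rectangles formed by any candidate pair of lines is available in O(1). Function and variable names are mine:

```python
import numpy as np

def best_two_line_split(points):
    """O(N^2) search sketch for the two-line clustering above."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # rank coordinates to lay out the grid described above
    xr = np.argsort(np.argsort(pts[:, 0]))
    yr = np.argsort(np.argsort(pts[:, 1]))
    layers = np.zeros((5, n, n))                 # per-cell sums of 1, x, x^2, y, y^2
    for (x, y), i, j in zip(pts, xr, yr):
        layers[:, j, i] += [1, x, x * x, y, y * y]
    sat = layers.cumsum(axis=1).cumsum(axis=2)   # summed-area tables
    sat = np.pad(sat, ((0, 0), (1, 0), (1, 0)))  # leading zero row/col for easy slicing

    def block(j0, j1, i0, i1):                   # sums over rows [j0,j1), cols [i0,i1)
        return sat[:, j1, i1] - sat[:, j0, i1] - sat[:, j1, i0] + sat[:, j0, i0]

    def metric(s):                               # sum of pairwise squared distances
        m, sx, sxx, sy, syy = s
        return m * sxx - sx * sx + m * syy - sy * sy

    best = None
    for i in range(n + 1):                       # vertical line after column i
        for j in range(n + 1):                   # horizontal line after row j
            total = sum(metric(block(a, b, c, d))
                        for a, b in ((0, j), (j, n))
                        for c, d in ((0, i), (i, n)))
            if best is None or total < best[0]:
                best = (total, i, j)
    return best                                  # (metric, column split, row split)
```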
I have a rectangular area where there are circles with equal radius. I want to find which circles overlap with other circles (the output is a list of 2-element sets of overlapping circles).
I know how to check if two of the circles overlap (the distance between their centers is less than the diameter). I can perform this check for every pair of circles, but I was wondering if there is a better algorithm (faster than O(n^2)).
EDIT
The number of circles is usually about 100, and overlaps won't happen very often.
Here is some context:
The rectangle is a battlefield in a game. The movement of the units is done on small steps and I'm trying to detect collisions between units.
Given the new explanation of the problem statement, I would recommend a different approach.
Overlay a square grid over the battlefield, with a grid step equal to one circle diameter. Every circle can overlap at most four cells. In each cell, keep a list of the overlapping circles (and update it on every move).
Detecting potential collisions will now take about four cell/circle tests per circle, i.e. close to linear time.
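A minimal Python sketch of this grid scheme, with one small variation: each circle is bucketed only by the cell containing its centre, and the 3x3 neighbourhood of that cell is scanned (with a cell side of 2R this still catches every possible overlap). Names are mine:

```python
from collections import defaultdict

def overlapping_pairs(centers, r):
    """Grid sketch: bucket circle centers into cells of side 2r; two circles
    of radius r can only overlap if their cells are identical or adjacent."""
    cell = 2.0 * r
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(centers):
        grid[(int(x // cell), int(y // cell))].append(idx)

    pairs = []
    for idx, (x, y) in enumerate(centers):
        cx, cy = int(x // cell), int(y // cell)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for other in grid.get((cx + dx, cy + dy), ()):
                    if other <= idx:
                        continue                 # report each pair once
                    ox, oy = centers[other]
                    if (x - ox) ** 2 + (y - oy) ** 2 < (2 * r) ** 2:
                        pairs.append((idx, other))
    return pairs
```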
For a simple solution, insert the centers in a 2d-tree and perform circular range queries around every center with a query radius 2R. In good conditions, this can be O(N Log(N)).
Alternatively, just sort the centers on X and try all circles in turn: by dichotomic search, locate the abscissa Xc and scan to Xc-2R and to Xc+2R, then check the 2D distance.
The cost of the dichotomic searches will be O(N Log(N)). If the circles are uniformly spread out in a square of side S, a stripe of width 4R contains 4RN/S circles, hence a total comparison cost of 4RN²/S. This is a good performance if S is large (think that for N tightly packed circles in a square, S~2R√N, hence 2N√N comparisons).
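A Python sketch of the sort-on-X variant (it scans only to the right of each circle, which still reports every pair exactly once; names are mine):

```python
import bisect

def overlapping_pairs_sorted(centers, r):
    """Sort the centers by x; for each circle only examine circles whose x
    lies within 2r to the right, and confirm with the full 2D distance test."""
    order = sorted(range(len(centers)), key=lambda i: centers[i][0])
    xs = [centers[i][0] for i in order]
    pairs = []
    for pos, i in enumerate(order):
        x, y = centers[i]
        # circles further right than x + 2r can be skipped entirely
        hi = bisect.bisect_right(xs, x + 2 * r, lo=pos + 1)
        for j in order[pos + 1:hi]:
            ox, oy = centers[j]
            if (x - ox) ** 2 + (y - oy) ** 2 < (2 * r) ** 2:
                pairs.append((min(i, j), max(i, j)))
    return pairs
```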
Direct answer: You cannot get better than O(n^2) in general since the circles could potentially all overlap, so you have to generate n^2 answers.
If you get more specific, you might get better answers. For example, if what you are really trying to do is find bounding spheres in a 2D simulation, you can profit from the fact that entities only move so far between frames, if the circles are sparse it's different from when they are tightly packed, etc. So let us know more about what it's all about.
EDIT: You edited your question - you indeed are looking for collision detection in a 2D simulation. If you check out https://en.wikipedia.org/wiki/Collision_detection , they point to several algorithms for exactly your case.
I like the one detailed right on that page where you keep one list of bounding intervals per axis (2 in "2D") and only need to "work hard" when those bounding intervals (which are themselves by definition one-dimensional) change (i.e., their "overlap state"). This removes the O(n²) for well-behaved cases. They don't give an estimate for the complexity of that, but as it basically comes down to sorting, it looks more or less O(n log n) to me, and less when there are only minimal changes between frames.
In my WIP game I have to implement circle-circle collisions. To implement this I simply calculate the squared distance between their centers, (x1-x2)² + (y1-y2)². If this is smaller than the squared sum of their radii, (r1+r2)², a collision happened. But today I saw this link:
Circle-Circle collision
Here they first use an AABB collision test to notice if the circles are near each other. But why should I do this? The circle-circle collision is a simple and not really expensive calculation. When I use the AABB test first, I do at least the same number of calculations, and even more if the circles are near each other.
Let me explain:
I do an AABB collision check for every circle against every other.
So I have to do n! / (n-2)! = n·(n-1) checks, where n is the number of circles to test.
For every AABB-colliding circle pair I then have to do another calculation to see if they really collide.
Without the AABB check I only do n! / (n-2)! calculations, and I don't think these calculations are that costly.
What do you think?
I think there is a way you can do this in O(N log N) in the average case, but O(N^2) in the worst case:
Consider each circle as a 2R*2R rectangle centered at the circle's center.
Use a sweep line algorithm for rectangle intersection, which is O(N log N + K), where K is the number of intersecting rectangle pairs.
The intersecting pairs of rectangles can then be checked as circles for intersection using your algorithm, in constant time per pair.
Note: if K is small then this is O(N log N), but if almost all rectangles intersect each other, K = O(N^2) and it degrades to O(N^2).
This comment is the right answer. It depends on your hardware and compiler. If you do the AABB detection first, you have 8 add + 4 compare operations. If you compare the squared distance with the squared radius sum, you do 6 add/subtract + 3 multiply + 1 compare. Maybe your hardware and compiler are faster with compares than with multiplications, but it shouldn't make a big difference in performance. If, instead of comparing the squares, you compare the square roots (for whatever reason), you should do the AABB check first, because square root calculations take some time.
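For reference, here are the two tests being compared, written out as a small Python sketch (function and variable names are mine):

```python
def circles_collide(x1, y1, r1, x2, y2, r2):
    """Exact test: squared centre distance vs squared radius sum
    (6 add/subtract, 3 multiply, 1 compare)."""
    dx, dy, rs = x1 - x2, y1 - y2, r1 + r2
    return dx * dx + dy * dy <= rs * rs

def aabbs_collide(x1, y1, r1, x2, y2, r2):
    """Cheap pre-test on the circles' bounding boxes
    (8 add/subtract, 4 compares, no multiplications)."""
    return (x1 - r1 <= x2 + r2 and x2 - r2 <= x1 + r1 and
            y1 - r1 <= y2 + r2 and y2 - r2 <= y1 + r1)
```

Whether the cheaper pre-test pays off depends, as said, on how often it lets the multiplication-based test be skipped.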
There are of course also better solutions, like in this answer. If you have to do many detections between many objects, you can think about one of these solutions.
In this problem r is a fixed positive integer. You are given N rectangles, all the same size, in the plane. The sides are either vertical or horizontal. We assume the area of the intersection of all N rectangles has non-zero area. The problem is how to find N-r of these rectangles, so as to maximize the area of the intersection. This problem arises in practical microscopy when one repeatedly images a given biological specimen, and alignment changes slightly during this process, due to physical reasons (e.g. differential expansion of parts of the microscope and camera). I have expressed the problem for dimension d=2. There is a similar problem for each d>0. For d=1, an O(N log(N)) solution is obtained by sorting the lefthand endpoints of the intervals. But let's stick with d=2. If r=1, one can again solve the problem in time O(N log(N)) by sorting coordinates of the corners.
So, is the original problem solved by solving first the case (N,1) obtaining N-1 rectangles, then solving the case (N-1,1), getting N-2 rectangles, and so on, until we reduce to N-r rectangles? I would be interested to see an explicit counter-example to this optimistic attempted procedure. It would be even more interesting if the procedure works (proof please!), but that seems over-optimistic.
If r is fixed at some value r>1, and N is large, is this problem in one of the NP classes?
Thanks for any thoughts about this.
David
Since the intersection of axis-aligned rectangles is an axis-aligned rectangle, there are O(N^4) possible intersections (O(N) lefts, O(N) rights, O(N) tops, O(N) bottoms). The obvious O(N^5) algorithm is to try all of these, checking for each whether it's contained in at least N - r rectangles.
An improvement to O(N^3) is to try all O(N^2) intervals in the X dimension and run the 1D algorithm in the Y dimension on those rectangles that contain the given X-interval. (The rectangles need to be sorted only once.)
How large is N? I expect that fancy data structures might lead to an O(N^2 log N) algorithm, but it wouldn't be worth your time if a cubic algorithm suffices.
I think I have a counter-example. Let's say you have r := N-2, i.e. you want to find two rectangles with maximum overlap. Let's say you have two rectangles covering the same area (= maximum overlap). Those two will be the optimal result in the end.
Now we need to construct some more rectangles, such that at least one of those two get removed in a reduction step.
Let's say we have three rectangles which overlap a lot, but they are not optimal. They have a very small overlapping area with the other two rectangles.
Now if you want to optimize the area for four rectangles, you will remove one of the two optimal rectangles, right? Or maybe you don't HAVE to, but you're not sure which decision is optimal.
So, I think your reduction algorithm is not quite correct. At the moment I'm not sure if there is a good algorithm for this, or which complexity class it belongs to, though. If I have time I'll think about it :)
Postscript. This is pretty defective, but may spark some ideas. It's especially defective where there are outliers in a quadrant that are near the X and Y axes - they will tend to reinforce each other, as if they were both at 45 degrees, pushing the solution away from that quadrant in a way that may not make sense.
-
If r is a lot smaller than N, and N is fairly large, consider this:
Find the average center.
Sort the rectangles into 2 sequences by (X - center.x) + (Y - center.y) and (X - center.x) - (Y - center.y), where X and Y are the center of each rectangle.
For any solution, all of the rejected rectangles will be members of up to 4 subsequences, each of which is a head or tail of one of the 2 sequences. Assuming N is a lot bigger than r, most of the time will be in sorting the sequences - O(n log n).
To find the solution, first find the intersection given by removing the r rectangles at the head and tail of each sequence. Use this base intersection to eliminate consideration of the "core" set of rectangles that you know will be in the solution. This will reduce the intersection computations to just working with up to 4*r + 1 rectangles.
Each of the 4 sequence heads and tails should be associated with an array of r rectangles, each entry representing the intersection given by intersecting the "core" with the i innermost rectangles from the head or tail. This precomputation reduces the complexity of finding the solution from O(r^4) to O(r^3).
This is not perfect, but it should be close.
Defects with a small r will come from should-be-rejects that are at off angles, with alternatives that are slightly better but on one of the 2 axes. The maximum error is probably computable. If this is a concern, use a real area-of-non-intersection computation instead of the simple "X+Y" difference formula I used.
Here is an explicit counter-example (with N=4 and r=2) to the greedy algorithm proposed by the asker.
The maximum intersection between three of these rectangles is between the black, blue, and green rectangles. But, it's clear that the maximum intersection between any two of these three is smaller than intersection between the black and the red rectangles.
I now have an algorithm, pretty similar to Ed Staub's above, with the same time estimates. It's a bit different from Ed's, since it is valid for all r.
The counter-example by mhum to the greedy algorithm is neat. Take a look.
I'm still trying to get used to this site. Somehow an earlier answer by me was truncated to two sentences. Thanks to everyone for their contributions, particularly to mhum whose counter-example to the greedy algorithm is satisfying. I now have an answer to my own question. I believe it is as good as possible, but lower bounds on complexity are too difficult for me. My solution is similar to Ed Staub's above and gives the same complexity estimates, but works for any value of r>0.
One of my rectangles is determined by its lower left corner. Let S be the set of lower left corners. In time O(N log(N)) we sort S into Sx according to the sizes of the x-coordinates. We don't care about the order within Sx between two lower left corners with the same x-coord. Similarly the sorted sequence Sy is defined by using the sizes of the y-coords.
Now let u1, u2, u3 and u4 be non-negative integers with u1+u2+u3+u4=r. We compute what happens to the area when we remove various rectangles that we now name explicitly. We first remove the u1-sized head of Sx and the u2-sized tail of Sx. Let Syx be the result of removing these u1+u2 entries from Sy. We remove the u3-sized head of Syx and the u4-sized tail of Syx. One can now prove that one of these possible choices of (u1,u2,u3,u4) gives the desired maximal area of intersection. (Email me if you want a pdf of the proof details.)
The number of such choices is equal to the number of integer points in the regular tetrahedron in 4-d euclidean space with vertices at the 4 points whose coordinate sum is r and for which 3 of the 4 coordinates are equal to 0. This is bounded by the volume of the tetrahedron, giving a complexity estimate of O(r^3).
So my algorithm has time complexity O(N log(N)) + O(r^3).
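Here is a small Python sketch of this enumeration (names are mine; for clarity it rebuilds the kept set for every tuple (u1,u2,u3,u4), so it runs in roughly O(N r^3) rather than the O(N log(N)) + O(r^3) claimed above, but the choices it tries are exactly the ones described; it assumes r < N):

```python
def max_intersection_after_removing_r(corners, w, h, r):
    """corners: lower-left corners of N identical w-by-h rectangles.
    For every (u1,u2,u3,u4) with u1+u2+u3+u4 = r, drop the u1/u2 extreme
    entries of the x-ordering and then the u3/u4 extreme entries (by y) of
    what is left, and measure the intersection of the survivors."""
    n = len(corners)
    sx = sorted(range(n), key=lambda i: corners[i][0])   # indices by x
    best = (-1.0, None)
    for u1 in range(r + 1):
        for u2 in range(r + 1 - u1):
            for u3 in range(r + 1 - u1 - u2):
                u4 = r - u1 - u2 - u3
                removed = set(sx[:u1]) | set(sx[n - u2:])
                syx = sorted((i for i in range(n) if i not in removed),
                             key=lambda i: corners[i][1])
                removed |= set(syx[:u3]) | set(syx[len(syx) - u4:])
                kept = [corners[i] for i in range(n) if i not in removed]
                xs = [p[0] for p in kept]
                ys = [p[1] for p in kept]
                # identical rectangles: intersection width is w - (max x - min x)
                area = max(0.0, w - (max(xs) - min(xs))) * \
                       max(0.0, h - (max(ys) - min(ys)))
                if area > best[0]:
                    best = (area, removed)
    return best   # (intersection area, set of removed rectangle indices)
```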
I believe this produces a perfect solution.
David's solution is easier to implement, and should be faster in most cases.
This relies on the assumption that for any solution, at least one of the rejects must be a member of the convex hull. Applying this recursively leads to:
Compute a convex hull.
Gather the set of all candidate solutions produced by:
{Remove a hull member, repair the hull} r times
(The hull doesn't really need to be repaired the last time.)
If h is the number of initial hull members, then the complexity is less than h^r, plus the cost of computing the initial hull. I am assuming that a hull algorithm is chosen such that the sorted data can be kept and reused in the hull repairs.
This is just a thought, but if N is very large, I would probably try a Monte-Carlo algorithm.
The idea would be to generate random points (say, uniformly in the convex hull of all rectangles), and score how each random point performs. If the random point is in N-r or more rectangles, then update the number of hits of each subset of N-r rectangles.
In the end, the N-r rectangle subset with the most random points in it is your answer.
This algorithm has many downsides, the most obvious one being that the result is random and thus not guaranteed to be correct. But as most Monte-Carlo algorithms it scales well, and you should be able to use it with higher dimensions as well.
Problem: I have two overlapping 2D shapes, A and B, each shape having the same number of pixels, but differing in shape. Some portion of the shapes are overlapping, and there are some pieces of each that are not overlapping. My goal is to move all the non-overlapping pixels in shape A to the non-overlapping pixels in shape B. Since the number of pixels in each shape is the same, I should be able to find a 1-to-1 mapping of pixels. The restriction is that I want to find the mapping that minimizes the total distance traveled by all the pixels that moved.
Brute Force: The brute force approach to solving this problem is obviously out of the question, since I would have to compute the total distance of all possible mappings of which I think there are n! (where n is the number of non-overlapping pixels in one shape) times the computation of calculating a distance for each pair of points in the mapping, n, giving a total of O( n * n! ) or something similar.
Backtracking: The only "better" solution I could think of was to use backtracking, where I would keep track of the current minimum so far and at any point when I'm evaluating a certain mapping, if I reach or exceed that minimum, I move on to the next mapping. Even this won't do any better than O( n! ).
Is there any way to solve this problem with a reasonable complexity?
Also note that the "obvious" approach of simply mapping a point to its closest matching neighbour does not always yield the optimum solution.
Simpler Approach?: As a secondary question, if a feasible solution doesn't exist, one possibility might be to partition each non-overlapping section into small regions, and map these regions, greatly reducing the number of mappings. To calculate the distance between two regions I would use the center of mass (average of the pixel locations in the region). However, this presents the problem of how I should go about doing the partitioning in order to get a near-optimal answer.
Any ideas are appreciated!!
This is the Minimum Matching problem, and you are correct that it is a hard problem in general. However for the 2D Euclidean Bipartite Minimum Matching case it is solvable in close to O(n²) (see link).
For fast approximations, FryGuy is on the right track with Simulated Annealing. This is one approach.
Also take a look at Approximation algorithms for bipartite and non-bipartite matching in the plane for an O((n/ε)^1.5 * log^5(n)) (1+ε)-randomized approximation scheme.
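If an exact answer at O(n^3) is acceptable, the matching can also be computed directly with an off-the-shelf Hungarian-style solver; here is a sketch using scipy.optimize.linear_sum_assignment (the array layout of the pixel lists is my assumption):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_pixel_mapping(src, dst):
    """Exact minimum-cost 1-to-1 mapping between two equal-sized pixel sets.
    src, dst: arrays of shape (n, 2) holding the non-overlapping pixel
    coordinates of shapes A and B."""
    src = np.asarray(src, float)
    dst = np.asarray(dst, float)
    # cost[i, j] = Euclidean distance from src pixel i to dst pixel j
    cost = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)   # Hungarian-style solver, O(n^3)
    return list(zip(rows.tolist(), cols.tolist())), cost[rows, cols].sum()
```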
You might consider simulated annealing for this. Start off by assigning A[x] -> B[y] for each pixel, randomly, and calculate the sum of squared distances. Then swap a pair of x<->y mappings, randomly. Then choose to accept this with probability Q, where Q is higher if the new mapping is better, and tends towards zero over time. See the wikipedia article for a better explanation.
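A minimal Python sketch of that annealing scheme (the cooling schedule, iteration count and all names are my own choices, not tuned values):

```python
import math, random

def anneal_mapping(src, dst, iters=100_000, t0=1.0, cooling=0.9999):
    """Simulated annealing: the state is a permutation mapping
    src[i] -> dst[perm[i]]; moves swap two assignments."""
    n = len(src)
    perm = list(range(n))
    random.shuffle(perm)

    def d2(i, j):                       # squared distance of one assignment
        dx = src[i][0] - dst[j][0]
        dy = src[i][1] - dst[j][1]
        return dx * dx + dy * dy

    cost = sum(d2(i, perm[i]) for i in range(n))
    t = t0
    for _ in range(iters):
        a, b = random.randrange(n), random.randrange(n)
        if a == b:
            continue
        delta = (d2(a, perm[b]) + d2(b, perm[a])
                 - d2(a, perm[a]) - d2(b, perm[b]))
        # accept improvements always, worsenings with probability exp(-delta/t)
        if delta < 0 or random.random() < math.exp(-delta / t):
            perm[a], perm[b] = perm[b], perm[a]
            cost += delta
        t *= cooling
    return perm, cost
```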
Sort pixels in shape A: in increasing order of 'x' and then 'y' ordinates
Sort pixels in shape B: in decreasing order of 'x' and then increasing 'y'
Map pixels at the same index: in the sorted list the first pixel in A will map to first pixel in B. Is this not the mapping you are looking for?