Choose rectangles with maximal intersection area - algorithm

In this problem r is a fixed positive integer. You are given N rectangles, all the same size, in the plane, with sides either vertical or horizontal. We assume the intersection of all N rectangles has non-zero area. The problem is to find N-r of these rectangles so as to maximize the area of their intersection. This problem arises in practical microscopy, when one repeatedly images a given biological specimen and the alignment changes slightly during the process for physical reasons (e.g. differential expansion of parts of the microscope and camera). I have expressed the problem for dimension d=2. There is a similar problem for each d>0. For d=1, an O(N log(N)) solution is obtained by sorting the left-hand endpoints of the intervals. But let's stick with d=2. If r=1, one can again solve the problem in time O(N log(N)) by sorting the coordinates of the corners.
So, is the original problem solved by first solving the case (N,1), obtaining N-1 rectangles, then solving the case (N-1,1), getting N-2 rectangles, and so on, until we are down to N-r rectangles? I would be interested to see an explicit counter-example to this optimistic procedure. It would be even more interesting if the procedure works (proof please!), but that seems over-optimistic.
If r is fixed at some value r>1, and N is large, is this problem in one of the NP classes?
Thanks for any thoughts about this.
David

Since the intersection of axis-aligned rectangles is an axis-aligned rectangle, there are O(N^4) possible intersections (O(N) lefts, O(N) rights, O(N) tops, O(N) bottoms). The obvious O(N^5) algorithm is to try all of these, checking for each whether it's contained in at least N - r rectangles.
An improvement to O(N^3) is to try all O(N^2) intervals in the X dimension and run the 1D algorithm in the Y dimension on those rectangles that contain the given X-interval. (The rectangles need to be sorted only once.)
How large is N? I expect that fancy data structures might lead to an O(N^2 log N) algorithm, but it wouldn't be worth your time if a cubic algorithm suffices.
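For concreteness, a minimal Python sketch of the obvious brute force (my own naming, assuming rectangles are given as (x1, y1, x2, y2) tuples):

```python
from itertools import product

def max_intersection_area(rects, r):
    """rects: list of (x1, y1, x2, y2) axis-aligned rectangles.
    Try every candidate box formed by one left, one right, one bottom
    and one top edge (O(N^4) candidates), and keep the largest one
    contained in at least N - r rectangles (an O(N) check each)."""
    n = len(rects)
    lefts = [a for a, _, _, _ in rects]
    bottoms = [b for _, b, _, _ in rects]
    rights = [c for _, _, c, _ in rects]
    tops = [d for _, _, _, d in rects]
    best = 0.0
    for x1, x2, y1, y2 in product(lefts, rights, bottoms, tops):
        if x2 <= x1 or y2 <= y1:
            continue  # empty candidate box
        covered_by = sum(1 for (a, b, c, d) in rects
                         if a <= x1 and b <= y1 and c >= x2 and d >= y2)
        if covered_by >= n - r:
            best = max(best, (x2 - x1) * (y2 - y1))
    return best
```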

I think I have a counter-example. Let's say you have r := N-2, i.e. you want to find the two rectangles with maximum overlap. Say you have two rectangles covering the same area (= maximum overlap). Those two will be the optimal result in the end.
Now we need to construct some more rectangles, such that at least one of those two gets removed in a reduction step.
Say we have three rectangles which overlap a lot, but are not optimal, and which have only a very small overlap with the other two rectangles.
Now if you want to optimize the area for four rectangles, you will remove one of the two optimal rectangles, right? Or maybe you don't HAVE to, but you can't be sure which decision is optimal.
So I think your reduction algorithm is not quite correct. At the moment I'm not sure whether there is a good algorithm for this, or which complexity class it belongs to. If I have time I'll think about it :)

Postscript. This is pretty defective, but may spark some ideas. It's especially defective where there are outliers in a quadrant that are near the X and Y axes - they will tend to reinforce each other, as if they were both at 45 degrees, pushing the solution away from that quadrant in a way that may not make sense.
-
If r is a lot smaller than N, and N is fairly large, consider this:
Find the average center.
Sort the rectangles into 2 sequences by (X - center.x) + (Y - center.y) and (X - center.x) - (Y - center.y), where (X, Y) is the center of each rectangle.
For any solution, all of the reject rectangles will be members of up to 4 subsequences, each of which is a head or tail of one of the 2 sequences (see the sketch at the end of this answer). Assuming N is a lot bigger than r, most of the time will be spent sorting the sequences - O(n log n).
To find the solution, first find the intersection given by removing the r rectangles at the head and tail of each sequence. Use this base intersection to eliminate consideration of the "core" set of rectangles that you know will be in the solution. This will reduce the intersection computations to just working with up to 4*r + 1 rectangles.
Each of the 4 sequence heads and tails should be associated with an array of r rectangles, each entry representing the intersection given by intersecting the "core" with the i innermost rectangles from the head or tail. This precomputation reduces the complexity of finding the solution from O(r^4) to O(r^3).
This is not perfect, but it should be close.
With a small r, defects will come from should-be-rejects that sit at off angles, with alternatives that are slightly better but lie on one of the 2 axes. The maximum error is probably computable. If this is a concern, use a real area-of-non-intersection computation instead of the simple "X+Y" difference formula I used.
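A minimal sketch of the candidate-generation step above (the two diagonal sorts and the up-to-4r head/tail rejects); the NumPy representation and names are mine, not Ed's code:

```python
import numpy as np

def candidate_rejects(centers, r):
    """centers: (N, 2) array of rectangle centers. Sort along the two
    diagonal directions and return the indices in the r-sized heads
    and tails -- the only rectangles considered for rejection; the
    remaining rectangles form the "core" assumed to stay in the
    solution."""
    c = np.asarray(centers, dtype=float)
    mean = c.mean(axis=0)                                  # average center
    s1 = np.argsort((c[:, 0] - mean[0]) + (c[:, 1] - mean[1]))
    s2 = np.argsort((c[:, 0] - mean[0]) - (c[:, 1] - mean[1]))
    n = len(c)
    heads_tails = [s1[:r], s1[n - r:], s2[:r], s2[n - r:]]
    return set(np.concatenate(heads_tails).tolist())       # up to 4r indices
```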

Here is an explicit counter-example (with N=4 and r=2) to the greedy algorithm proposed by the asker.
The maximum intersection among three of these rectangles is the one between the black, blue, and green rectangles. But it's clear that the maximum intersection between any two of these three is smaller than the intersection between the black and the red rectangles.

I now have an algorithm, pretty similar to Ed Staub's above, with the same time estimates. It's a bit different from Ed's, since it is valid for all r.
The counter-example by mhum to the greedy algorithm is neat. Take a look.

I'm still trying to get used to this site. Somehow an earlier answer by me was truncated to two sentences. Thanks to everyone for their contributions, particularly to mhum whose counter-example to the greedy algorithm is satisfying. I now have an answer to my own question. I believe it is as good as possible, but lower bounds on complexity are too difficult for me. My solution is similar to Ed Staub's above and gives the same complexity estimates, but works for any value of r>0.
Each of my rectangles is determined by its lower left corner. Let S be the set of lower left corners. In time O(N log(N)) we sort S into Sx according to the sizes of the x-coordinates. We don't care about the order within Sx between two lower left corners with the same x-coordinate. Similarly the sorted sequence Sy is defined using the sizes of the y-coordinates. Now let u1, u2, u3 and u4 be non-negative integers with u1+u2+u3+u4=r. We compute what happens to the area when we remove various rectangles that we now name explicitly. We first remove the u1-sized head of Sx and the u2-sized tail of Sx. Let Syx be the result of removing these u1+u2 entries from Sy. We then remove the u3-sized head of Syx and the u4-sized tail of Syx. One can now prove that one of these possible choices of (u1,u2,u3,u4) gives the desired maximal area of intersection. (Email me if you want a pdf of the proof details.) The number of such choices equals the number of integer points in the regular tetrahedron in 4-d Euclidean space with vertices at the 4 points whose coordinates sum to r and for which 3 of the 4 coordinates are equal to 0. This is bounded by the volume of the tetrahedron, giving a complexity estimate of O(r^3).
So my algorithm has time complexity O(N log(N)) + O(r^3).
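For concreteness, a sketch of this algorithm in Python (my own naming, assuming the rectangles are given by their lower left corners plus a common width w and height h). Each candidate is evaluated naively in O(N) here, so the sketch runs in O(N r^3) rather than O(N log(N)) + O(r^3); the stated bound needs incremental bookkeeping of the four heads and tails:

```python
def max_area_after_removals(corners, w, h, r):
    """corners: lower left corners of N congruent w-by-h rectangles.
    Enumerate every (u1, u2, u3, u4) with u1+u2+u3+u4 = r: drop the
    u1-sized head and u2-sized tail of Sx, then the u3-sized head and
    u4-sized tail of what remains of Sy, and measure the intersection
    of the survivors."""
    n = len(corners)
    sx = sorted(range(n), key=lambda i: corners[i][0])  # Sx
    sy = sorted(range(n), key=lambda i: corners[i][1])  # Sy
    best = 0.0
    for u1 in range(r + 1):
        for u2 in range(r - u1 + 1):
            gone = set(sx[:u1]) | set(sx[n - u2:])      # head/tail of Sx
            syx = [i for i in sy if i not in gone]      # Syx
            for u3 in range(r - u1 - u2 + 1):
                u4 = r - u1 - u2 - u3
                keep = syx[u3:len(syx) - u4]            # drop head/tail of Syx
                if not keep:
                    continue
                width = (min(corners[i][0] for i in keep) + w
                         - max(corners[i][0] for i in keep))
                height = (min(corners[i][1] for i in keep) + h
                          - max(corners[i][1] for i in keep))
                best = max(best, max(0.0, width) * max(0.0, height))
    return best
```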

I believe this produces a perfect solution.
David's solution is easier to implement, and should be faster in most cases.
This relies on the assumption that for any solution, at least one of the rejects must be a member of the convex hull. Applying this recursively leads to:
Compute a convex hull.
Gather the set of all candidate solutions produced by:
{Remove a hull member, repair the hull} r times
(The hull doesn't really need to be repaired the last time.)
If h is the number of initial hull members, then the complexity is less than h^r, plus the cost of computing the initial hull. I am assuming that a hull algorithm is chosen such that the sorted data can be kept and reused in the hull repairs.
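A sketch of this search, under the answer's (unverified) assumption that some reject always lies on the convex hull. Since the answer leaves the hull's point set implicit, this hypothetical version takes the hull over the rectangles' lower left corners, and it recomputes each hull rather than repairing it, so it is a plain h^r-style enumeration:

```python
import numpy as np
from scipy.spatial import ConvexHull

def best_by_hull_pruning(corners, w, h, r):
    """corners: lower left corners of N congruent w-by-h rectangles.
    Recursively try rejecting each vertex of the current hull, r times,
    and return the best intersection area found. Needs at least three
    non-collinear corners at every level (ConvexHull's requirement)."""
    pts = np.asarray(corners, dtype=float)

    def area(kept):
        sub = pts[sorted(kept)]
        x_lo, y_lo = sub.max(axis=0)            # binding lower left edges
        x_hi, y_hi = sub.min(axis=0) + (w, h)   # binding upper right edges
        return max(0.0, x_hi - x_lo) * max(0.0, y_hi - y_lo)

    def search(kept, k):
        if k == 0:
            return area(kept)
        idx = sorted(kept)
        hull = ConvexHull(pts[idx])
        # try rejecting each hull vertex and recurse
        return max(search(kept - {idx[v]}, k - 1) for v in hull.vertices)

    return search(frozenset(range(len(pts))), r)
```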

This is just a thought, but if N is very large, I would probably try a Monte-Carlo algorithm.
The idea would be to generate random points (say, uniformly in the convex hull of all rectangles) and score how each random point performs. If a random point is in N-r or more rectangles, update the hit count of each (N-r)-subset of the rectangles containing it.
In the end, the (N-r)-subset with the most random points in it is your answer.
This algorithm has many downsides, the most obvious being that the result is random and thus not guaranteed to be optimal. But like most Monte Carlo algorithms it scales well, and you should be able to use it in higher dimensions as well.
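A minimal sketch of this idea, with two labeled simplifications: points are sampled from the common bounding box rather than the convex hull of all rectangles, and hits are tallied per exact containing set rather than per (N-r)-subset:

```python
import random
from collections import Counter

def monte_carlo_subset(rects, r, samples=100_000, rng=random):
    """rects: list of (x1, y1, x2, y2) rectangles. Sample points from
    the common bounding box; whenever a point lies in at least N - r
    rectangles, credit the exact set of rectangles containing it.
    Any (N - r)-subset of the most-hit set contains all its points."""
    n = len(rects)
    x_lo = min(a for a, _, _, _ in rects)
    y_lo = min(b for _, b, _, _ in rects)
    x_hi = max(c for _, _, c, _ in rects)
    y_hi = max(d for _, _, _, d in rects)
    hits = Counter()
    for _ in range(samples):
        x, y = rng.uniform(x_lo, x_hi), rng.uniform(y_lo, y_hi)
        inside = frozenset(i for i, (a, b, c, d) in enumerate(rects)
                           if a <= x <= c and b <= y <= d)
        if len(inside) >= n - r:
            hits[inside] += 1
    return max(hits, key=hits.get) if hits else None
```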

Related

List all sets of points that are enclosed by a circle with given radius

My problem is: given N points in a plane and a number R, list/enumerate all subsets of points where the points in each subset are enclosed by a circle of radius R. The subsets should be distinct, with no subset containing another.
Efficiency may not be important, but the algorithm should not be too slow.
As a special case, can we find the K subsets with the most points? An approximation algorithm is acceptable.
Thanks,
Edit: It seems the statement was unclear. My bad!
So I restate my question as follows: given N points and a circle of fixed radius R, use the circle to scan the whole space. At any time, the circle will cover some subset of the points. The goal is to list all possible subsets of points that can be covered by such an R-radius circle. One subset cannot be a superset of another.
I am not sure I get what you mean by 'not covered'. If you drop this, what you are looking for is exactly a Čech complex, whose complexity is high; you won't have an efficient algorithm if you don't have conditions on the sampling (the sampling should be sparse enough and R not too big, otherwise you could have 2^n subsets, with n your number of points). You have to enumerate all subsets and check whether their minimal enclosing ball radius is lower than R. You can reduce the search to all subsets whose diameter is lower than R (e.g. pairwise distance lower than R), which may be sufficient in your case.
If 'not covered' for two subsets means that one is not included in the other, you can have many different decompositions. One of interest is the alpha-complex, as it can be computed efficiently in O(n log n) in dimensions 2-3 (I would suggest using CGAL to compute it; you can also see what it means with pictures). If your points are high-dimensional, then you will probably end up computing a Čech complex.
Without loss of generality, we can assume that the enclosing circles considered pass through at least two points (ignoring the trivial cases of no points or one point, and assuming that your motivation is maximizing density, so that you don't care if non-maximal subsets are omitted). Build a proximity structure (kd-tree, cover tree, etc.) on the input points. For each input point p, use the structure to find all points q such that d(p, q) ≤ 2R. For each such q, there are one or two radius-R circles that contain p and q on their boundary. Find their centers by solving quadratic equations, then look among the other choices of q to determine the covered subset.
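A sketch of this procedure, with a brute-force pair scan standing in for the kd-tree/cover tree and with an added final filter enforcing the asker's no-subset-of-another requirement:

```python
import math
from itertools import combinations

def covered_subsets(points, R):
    """points: list of (x, y) tuples. For each pair p, q at distance
    at most 2R, construct the one or two radius-R circles through
    both, collect the subset of points each circle covers, and keep
    only the maximal subsets (none contained in another)."""
    covers = set()
    for (px, py), (qx, qy) in combinations(points, 2):
        dx, dy = qx - px, qy - py
        d = math.hypot(dx, dy)
        if d == 0 or d > 2 * R:
            continue
        mx, my = (px + qx) / 2, (py + qy) / 2  # midpoint of p and q
        hh = math.sqrt(R * R - (d / 2) ** 2)   # center offset from midpoint
        ux, uy = -dy / d, dx / d               # unit perpendicular to pq
        for s in (+1, -1):                     # the (up to) two centers
            cx, cy = mx + s * hh * ux, my + s * hh * uy
            covers.add(frozenset(
                pt for pt in points
                if math.hypot(pt[0] - cx, pt[1] - cy) <= R + 1e-9))
    return [s for s in covers if not any(s < t for t in covers)]
```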

Intersection of line and convex set

Let X be a collection of n points in some moderate-dimensional space - for now, say R^5. Let S be the convex hull of X, let p be a point in S, and let v be any direction. Finally, let L = {p + lambda v : lambda a real number} be the line passing through p in direction v.
I am interested in finding a reasonably efficient algorithm for computing the intersection of S with L. I'd also be interested in hearing if it is known that no such algorithm exists! Note that this intersection can be represented by the (extreme) two points of intersection of L with the boundary of S. I'm particularly interested in finding an algorithm that behaves well when n is large.
I should say that it is easy to do this very efficiently in two dimensions. In that case, one can order the points of X in 'clockwise order' as seen from p, and then do binary search. So, the initial ordering takes O(n log(n)) steps and then further lookups take O(log(n)) steps. I don't see what the analogous algorithm should be in higher dimensions. Part of the problem is that a convex body in two dimensions has n vertices and n faces, while a convex body in 3 or higher dimensions can have n vertices but many, many more than n faces.
You can write a simple linear program for this. You want to minimise/maximise lambda subject to the constraint that p + lambda v lies in the convex hull of your input points. "Lies in the convex hull of" means coordinatewise equality between two points, one of which is a nonnegative weighted average of your input points with weights summing to 1.
As a practical matter, it may be useful to start with a handful of randomly chosen points, get a convex combination or a certificate of infeasibility, then interpret the certificate of infeasibility as a linear inequality and find the input point that most violates it. If you're using a practical solver, this means you want to formulate the dual, switch a bunch of presolve things off, and run the above essentially as a cutting plane method using certificates of unboundedness instead. It is likely that, unless you have pathological data, you will only need to tell the LP solver about a small handful of your input points.
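A minimal sketch of the primal LP (without the cutting-plane refinement), using scipy.optimize.linprog; the variable layout and names are my own:

```python
import numpy as np
from scipy.optimize import linprog

def line_hull_intersection(X, p, v):
    """X: (n, d) array of points, p: a point in conv(X), v: direction.
    Variables are [lambda, w_1, ..., w_n]; the constraints say
    p + lambda*v = sum_i w_i X_i with w >= 0 and sum_i w_i = 1.
    Returns (lambda_min, lambda_max)."""
    n, d = X.shape
    A_eq = np.zeros((d + 1, n + 1))
    A_eq[:d, 0] = -v            # move lambda*v to the left-hand side
    A_eq[:d, 1:] = X.T          # convex combination of the input points
    A_eq[d, 1:] = 1.0           # weights sum to one
    b_eq = np.concatenate([p, [1.0]])
    bounds = [(None, None)] + [(0, None)] * n  # lambda free, w >= 0
    out = []
    for sign in (+1.0, -1.0):   # minimise lambda, then maximise it
        c = np.zeros(n + 1)
        c[0] = sign
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        out.append(res.x[0])
    return out[0], out[1]
```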

algorithm to find a point among n points in plane to minimize the sum of distances

I have an algorithm problem here. It is different from the normal Fermat Point problem.
Given a set of n points in the plane, I need to find which one can minimize the sum of distances to the rest of n-1 points.
Is there any algorithm you know of that runs in less than O(n^2) time?
Thank you.
One solution is to assume the median is close to the mean, and to exhaustively calculate the sum of distances for a subset of points close to the mean. You can choose the k log(n) points closest to the mean, where k is an arbitrarily chosen constant (complexity O(n log n)).
Another possible solution is Delaunay triangulation, which can be computed in O(n log n) time. The triangulation gives a graph with one vertex for each point and the Delaunay edges between them.
Once you have the triangulation, you can start at any point, compare its sum-of-distances to those of its neighbors, and keep moving iteratively. You can stop when the current point has the minimum sum-of-distances among its neighbors. Intuitively, this will halt at the global optimal point.
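A sketch of this walk using scipy.spatial.Delaunay; the starting vertex and the O(n) cost evaluation per step are naive choices made for clarity:

```python
import numpy as np
from scipy.spatial import Delaunay

def min_sum_of_distances_point(points):
    """points: (n, 2) array. Walk the Delaunay graph downhill in
    sum-of-distances and return the index of the point where the
    walk stops (a local minimum over the triangulation graph)."""
    pts = np.asarray(points, dtype=float)
    tri = Delaunay(pts)
    indptr, indices = tri.vertex_neighbor_vertices

    def cost(i):  # sum of distances from point i to all others
        return np.linalg.norm(pts - pts[i], axis=1).sum()

    cur = 0  # arbitrary starting vertex
    while True:
        nbrs = indices[indptr[cur]:indptr[cur + 1]]
        best = min(nbrs, key=cost)
        if cost(best) >= cost(cur):
            return cur
        cur = best
```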
I think the underlying assumption here is that you have a dataset of points which you can easily bound, as many algorithms that would be "good enough" in practice may not be rigorous enough for theory and/or may not scale well to arbitrarily large inputs.
A very simple solution which is probably "good enough" is to sort the points by Y coordinate, then do a stable sort by X coordinate.
Take the rectangle defined by the min(X,Y) and max(X,Y) values; this is O(1), as the values will be at known locations in the sorted dataset.
Now, working from the center of your sorted dataset, find the coordinate values as close as possible to {Xctr = Xmin + (Xmax - Xmin) / 2, Yctr = Ymin + (Ymax - Ymin) / 2} -- complexity O(N), bounded by your minimization criterion, distance being the familiar radius from {Xctr, Yctr}.
The worst case would be comparing your candidate point to every other point, but once you get away from the middle points you will no longer be improving on the best found so far and should terminate the search.

Fast algorithm for calculating union of 'local convex hulls'

I have a set of 2D points from which I want to generate a polygon (or collection of polygons) outlining the 'shape' of those points, using the following concept:
For each point in the set, calculate the convex hull of all points within radius R of that point. After doing this for each point, take the union of these convex hulls to produce the final shape.
A brute force approach of actually constructing all these convex hulls is something like O(N^2 + R^2 log R). Is there a known, more efficient algorithm to produce the same result? Or perhaps a different way of expressing the problem?
Note: I am aware of alpha shapes; they are different. I am looking for an algorithm to perform what is described above.
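For reference, a sketch of the brute-force construction described above (not a faster algorithm), using scipy for the hulls and shapely for the union; neighborhoods of fewer than three points, or collinear ones, contribute no area and are skipped:

```python
import numpy as np
from scipy.spatial import ConvexHull
from shapely.geometry import Polygon
from shapely.ops import unary_union

def local_hull_union(points, R):
    """points: (n, 2) array. For each point take the convex hull of
    all points within radius R of it, then return the union of the
    hulls as a shapely Polygon/MultiPolygon."""
    pts = np.asarray(points, dtype=float)
    pieces = []
    for p in pts:
        nbrs = pts[np.linalg.norm(pts - p, axis=1) <= R]
        if len(nbrs) < 3:
            continue  # a point or a segment contributes no area
        try:
            hull = ConvexHull(nbrs)
        except Exception:
            continue  # collinear neighborhood, also zero area
        pieces.append(Polygon(nbrs[hull.vertices]))
    return unary_union(pieces)
```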
Update: I had a proposed solution, but it does not work - disproved experimentally in MATLAB. It may still spark ideas, so here it is:
Proposition: take the Delaunay triangulation of the set of points, remove all triangles having circumradius greater than R, then take the union of the remaining triangles.
A sweep line algorithm can improve the search for the R-neighbors. Alternatively, you can consider only pairs of points that lie in neighboring squares of a square grid of width R. Both of these ideas can get rid of the N^2 - of course, only if the points are relatively sparse.
I believe that a clever combination of sweeping and convex hull finding can get rid of the N^2 even if the points are not sparse (as in Olexiy's example), but I cannot come up with a concrete algorithm.
Yes, using rotating calipers. My prof wrote some stuff on this; it starts on page 19.
Please let me know if I misunderstood the problem.
I don't see how you get N^2 time for brute-forcing all convex hulls in the worst case (1). What if almost every pair of points is closer than R - in that case you need at least N^2 log N just to construct the convex hulls, let alone compute their union.
Also, where does the R^2 log R in your estimate come from?
(1) The worst case (as I see it) for a huge N: take a circle of radius R/2 and randomly place points on its border and just outside it.

Need Better Algorithm for Finding Mapping Between 2 Sets of Points with Minimum Distance

Problem: I have two overlapping 2D shapes, A and B, each shape having the same number of pixels, but differing in shape. Some portion of the shapes are overlapping, and there are some pieces of each that are not overlapping. My goal is to move all the non-overlapping pixels in shape A to the non-overlapping pixels in shape B. Since the number of pixels in each shape is the same, I should be able to find a 1-to-1 mapping of pixels. The restriction is that I want to find the mapping that minimizes the total distance traveled by all the pixels that moved.
Brute Force: The brute force approach to solving this problem is obviously out of the question, since I would have to compute the total distance of all possible mappings, of which I think there are n! (where n is the number of non-overlapping pixels in one shape), each mapping costing n distance computations, giving a total of O(n * n!) or something similar.
Backtracking: The only "better" solution I could think of was to use backtracking, where I would keep track of the current minimum so far and at any point when I'm evaluating a certain mapping, if I reach or exceed that minimum, I move on to the next mapping. Even this won't do any better than O( n! ).
Is there any way to solve this problem with a reasonable complexity?
Also note that the "obvious" approach of simply mapping a point to its closest matching neighbour does not always yield the optimum solution.
Simpler Approach?: As a secondary question, if a feasible solution doesn't exist, one possibility might be to partition each non-overlapping section into small regions, and map these regions, greatly reducing the number of mappings. To calculate the distance between two regions I would use the center of mass (average of the pixel locations in the region). However, this presents the problem of how I should go about doing the partitioning in order to get a near-optimal answer.
Any ideas are appreciated!!
This is the Minimum Matching problem, and you are correct that it is a hard problem in general. However, for the 2D Euclidean bipartite minimum matching case, it is solvable in close to O(n²) time (see link).
For fast approximations, FryGuy is on the right track with Simulated Annealing. This is one approach.
Also take a look at "Approximation algorithms for bipartite and non-bipartite matching in the plane" for an O((n/ε)^1.5 log^5 n)-time randomized (1+ε)-approximation scheme.
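As an exact baseline (scipy's assignment solver, roughly O(n^3); simpler but slower than the near-O(n²) geometric algorithms referenced above), a sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def min_distance_mapping(A, B):
    """A, B: (n, 2) arrays of the non-overlapping pixel coordinates of
    each shape. Returns the optimal 1-to-1 mapping as index pairs and
    its total Euclidean distance."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    # cost[i, j] = Euclidean distance from pixel A[i] to pixel B[j]
    cost = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # optimal assignment
    return list(zip(rows, cols)), cost[rows, cols].sum()
```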
You might consider simulated annealing for this. Start by assigning A[x] -> B[y] for each pixel randomly, and calculate the sum of squared distances. Then randomly swap a pair of mappings, and accept the swap with probability Q, where Q is higher if the new mapping is better and tends towards zero over time. See the Wikipedia article for a better explanation.
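A minimal sketch of that annealing loop; the linear cooling schedule and the iteration count are arbitrary choices, not part of the suggestion:

```python
import math
import random

def anneal_mapping(A, B, iters=200_000, t0=1.0, rng=random):
    """A, B: equal-length lists of (x, y) pixels. Start from a random
    bijection A[i] -> B[perm[i]] and anneal over random pair swaps,
    minimising the sum of squared distances."""
    n = len(A)
    perm = list(range(n))
    rng.shuffle(perm)

    def d2(i, j):  # squared distance of assigning A[i] to B[j]
        return (A[i][0] - B[j][0]) ** 2 + (A[i][1] - B[j][1]) ** 2

    cost = sum(d2(i, perm[i]) for i in range(n))
    for step in range(iters):
        t = t0 * (1.0 - step / iters) + 1e-9  # cool towards zero
        i, j = rng.randrange(n), rng.randrange(n)
        delta = (d2(i, perm[j]) + d2(j, perm[i])
                 - d2(i, perm[i]) - d2(j, perm[j]))
        # Metropolis acceptance: always take improvements, sometimes not
        if delta < 0 or rng.random() < math.exp(-delta / t):
            perm[i], perm[j] = perm[j], perm[i]
            cost += delta
    return perm, cost
```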
Sort pixels in shape A: in increasing order of 'x' and then 'y' ordinates
Sort pixels in shape B: in decreasing order of 'x' and then increasing 'y'
Map pixels at the same index: in the sorted lists, the first pixel in A will map to the first pixel in B. Is this not the mapping you are looking for?
