Calculating all bitonic paths - algorithm

I'm trying to calculate all bitonic paths for a given set of points.
Given n points, my guess is that there are O(n!) possible paths.
Reasoning
You have n points to choose from as your starting location. From there you have n-1 choices, then n-2, and so on, which seems to give n!.
Is this reasoning correct?

You can solve it with good old dynamic programming.
Let Count(top, bottom) be the number of incomplete tours such that top is the rightmost point on the top chain, bottom is the rightmost point on the bottom chain, and all the points to the left of top and bottom are already in the trail.
Now, with the points sorted by x-coordinate and i > j, Count(i,j) = sum over k of Count(k,j), where k ranges over {i-1} when i-1 > j, and over {l : l < j} when i-1 = j (removing i from its chain must leave i-1 as the rightmost remaining point, so i's predecessor is either i-1 itself or, when i-1 = j, any point to the left of j).
With O(n^2) states and at most n candidate predecessors per state, this is O(n^3) complexity.
If you want to enumerate all the bitonic trails, then along with Count also keep track of the paths themselves, appending to them appropriately in the update step. This would require a lot of memory, though. If you don't want to use a lot of memory, use recursion (same idea: sort the points; at every recursion step put the new point on either the top fork or the bottom fork and check that it introduces no crossings). A rough sketch of that recursion follows.
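This is only one way to flesh out the recursive idea, not a reference implementation: the names (Pt, properlyCross, countBitonicPaths) are illustrative, both forks are assumed to start at the leftmost point, and degenerate collinear overlaps are ignored.

#include <algorithm>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

static double orient(const Pt& o, const Pt& a, const Pt& b) {
    return (a.x - o.x) * (b.y - o.y) - (a.y - o.y) * (b.x - o.x);
}

// True if segments ab and cd cross at a point interior to both
// (segments that merely share an endpoint do not count as crossing).
static bool properlyCross(const Pt& a, const Pt& b, const Pt& c, const Pt& d) {
    double d1 = orient(c, d, a), d2 = orient(c, d, b);
    double d3 = orient(a, b, c), d4 = orient(a, b, d);
    return d1 * d2 < 0 && d3 * d4 < 0;
}

// Would the segment from `from` to `to` cross any segment already placed?
static bool crossesAny(const Pt& from, const Pt& to,
                       const std::vector<Pt>& top, const std::vector<Pt>& bot) {
    for (const auto* chain : {&top, &bot})
        for (size_t i = 0; i + 1 < chain->size(); ++i)
            if (properlyCross(from, to, (*chain)[i], (*chain)[i + 1])) return true;
    return false;
}

static void rec(const std::vector<Pt>& p, size_t i,
                std::vector<Pt>& top, std::vector<Pt>& bot, long long& count) {
    if (i == p.size()) { ++count; return; }        // every point placed: one labelled path
    for (auto* chain : {&top, &bot}) {             // try the new point on either fork
        if (!crossesAny(chain->back(), p[i], top, bot)) {
            chain->push_back(p[i]);
            rec(p, i + 1, top, bot, count);
            chain->pop_back();
        }
    }
}

long long countBitonicPaths(std::vector<Pt> p) {
    std::sort(p.begin(), p.end(), [](const Pt& a, const Pt& b) { return a.x < b.x; });
    std::vector<Pt> top = {p[0]}, bot = {p[0]};    // both forks start at the leftmost point
    long long count = 0;
    rec(p, 1, top, bot, count);
    return count / 2;                              // swapping the fork labels counts each path twice
}

int main() {
    std::printf("%lld\n", countBitonicPaths({{0, 0}, {1, 2}, {2, -1}, {3, 1}}));
}

The same skeleton can collect the paths themselves instead of only counting them, by recording the two forks whenever the recursion bottoms out.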

Most efficient way to select point with the most surrounding points

N.B: there's a major edit at the bottom of the question - check it out
Question
Say I have a set of points:
I want to find the point with the most points surrounding it, within radius R of the point (i.e. a circle) or within distance R of it along each axis (i.e. a square), for 2 dimensions. I'll refer to it as the densest point function.
For the diagrams in this question, I'll represent the surrounding region as circles. In the image above, the middle point's surrounding region is shown in green. This middle point has the most surrounding points of all the points within radius R and would be returned by the densest point function.
What I've tried
A viable way to solve this problem would be to use a range searching solution; this answer explains further, and states that it has "O(log n + k) worst-case time". Using this, I could get the number of points surrounding each point and choose the point with the largest surrounding-point count.
However, if the points were extremely densely packed (in the order of a million), as such:
then each of these million points would need to have a range search performed on it. The worst-case time of O(log n + k), where k is the number of points returned in the range, holds for the following point tree types:
kd-trees of two dimensions (which are actually slightly worse, at O(sqrt(n) + k)),
2d-range trees,
Quadtrees, which have a worst-case time of O(n).
So, for a group of 10^6 points all within radius R of one another, each query returns k of about 10^6 points, giving O(log n + 10^6) work for each point. This yields over a trillion operations!
Any ideas on a more efficient, precise way of achieving this, so that I could find the point with the most surrounding points for a group of points, and in a reasonable time (preferably O(n log n) or less)?
EDIT
Turns out that the method above is correct! I just need help implementing it.
(Semi-)Solution
If I use a 2d-range tree:
A range reporting query costs O(log^d n + k), for k returned points,
For a range tree with fractional cascading (also known as a layered range tree) the complexity is O(log^(d-1) n + k),
For 2 dimensions, that is O(log n + k),
Furthermore, if I perform a range counting query (i.e., I do not report each point), then it costs O(log n).
I'd perform this on every point - yielding the O(n log n) complexity I desired!
Problem
However, I cannot figure out how to write the code for a counting query for a 2d layered range tree.
I've found a great resource (from page 113 onwards) about range trees, including 2d-range tree pseudocode. But I can't figure out how to introduce fractional cascading, nor how to correctly implement the counting query so that it is of O(log n) complexity.
I've also found two range tree implementations here and here in Java, and one in C++ here, although I'm not sure this uses fractional cascading as it states above the countInRange method that
It returns the number of such points in worst case O(log(n)^d) time. It can also return the points that are in the rectangle in worst case O(log(n)^d + k) time where k is the number of points that lie in the rectangle.
which suggests to me it does not apply fractional cascading.
Refined question
To answer the question above, therefore, all I need to know is whether there are any libraries with 2d-range trees with fractional cascading that have a range counting query of O(log n) complexity, so I don't go reinventing any wheels - or can you help me write/modify the resources above to perform a query of that complexity?
Also not complaining if you can provide me with any other method to achieve a range counting query of 2d points in O(log n) time!
I suggest using a plane sweep algorithm. This allows one-dimensional range queries instead of 2-d queries (which is more efficient, simpler, and in the case of a square neighborhood does not require fractional cascading):
Sort points by Y-coordinate to array S.
Advance 3 pointers into array S: one (C) for the currently inspected (center) point; another one, A (a little bit ahead), for the nearest point at distance > R above C; and the last one, B (a little bit behind), for the farthest point at distance < R below it.
Insert points pointed to by A into an order statistic tree (ordered by coordinate X) and remove points pointed to by B from this tree. Use this tree to find the points at distance R to the left/right of C, and use the difference of these points' positions in the tree to get the number of points in the square area around C.
Use results of previous step to select "most surrounded" point.
This algorithm could be optimized if you rotate points (or just exchange X-Y coordinates) so that width of the occupied area is not larger than its height. Also you could cut points into vertical slices (with R-sized overlap) and process slices separately - if there are too many elements in the tree so that it does not fit in CPU cache (which is unlikely for only 1 million points). This algorithm (optimized or not) has time complexity O(n log n).
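As a concrete illustration of the square-neighborhood sweep, here is a rough sketch under a couple of assumptions: it substitutes a Fenwick (binary indexed) tree over compressed x-coordinates for the order statistic tree, and all names (Pt, Fenwick, mostSurrounded) are made up for the example.

#include <algorithm>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

struct Fenwick {                                    // point counts per compressed x index
    std::vector<int> t;
    explicit Fenwick(int n) : t(n + 1, 0) {}
    void add(int i, int v) { for (++i; i < (int)t.size(); i += i & -i) t[i] += v; }
    int prefix(int i) const { int s = 0; for (++i; i > 0; i -= i & -i) s += t[i]; return s; }
    int range(int lo, int hi) const { return lo > hi ? 0 : prefix(hi) - (lo ? prefix(lo - 1) : 0); }
};

// Returns the index (in y-sorted order) of the point with the most neighbours
// in the 2R x 2R square around it.
int mostSurrounded(std::vector<Pt> pts, double R) {
    std::sort(pts.begin(), pts.end(), [](const Pt& a, const Pt& b) { return a.y < b.y; });
    std::vector<double> xs(pts.size());
    for (size_t i = 0; i < pts.size(); ++i) xs[i] = pts[i].x;
    std::sort(xs.begin(), xs.end());                // compressed x-coordinates
    auto rank = [&](double x) { return (int)(std::lower_bound(xs.begin(), xs.end(), x) - xs.begin()); };

    Fenwick tree((int)xs.size());
    size_t a = 0, b = 0;                            // A: next point to insert, B: next point to evict
    int best = -1, bestCount = -1;
    for (size_t c = 0; c < pts.size(); ++c) {       // C: current center point
        while (a < pts.size() && pts[a].y <= pts[c].y + R) { tree.add(rank(pts[a].x), +1); ++a; }
        while (pts[b].y < pts[c].y - R) { tree.add(rank(pts[b].x), -1); ++b; }
        int lo = (int)(std::lower_bound(xs.begin(), xs.end(), pts[c].x - R) - xs.begin());
        int hi = (int)(std::upper_bound(xs.begin(), xs.end(), pts[c].x + R) - xs.begin()) - 1;
        int count = tree.range(lo, hi) - 1;         // points in the square, excluding C itself
        if (count > bestCount) { bestCount = count; best = (int)c; }
    }
    return best;
}

int main() {
    std::vector<Pt> pts = {{0, 0}, {0.5, 0.5}, {3, 3}, {0.2, -0.4}};
    std::printf("densest point (y-sorted index): %d\n", mostSurrounded(pts, 1.0));
}

The Fenwick tree plays the same role as the order statistic tree here: the difference of two prefix counts gives the number of window points whose x lies in [x_C - R, x_C + R].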
For a circular neighborhood (if R is not too large and points are evenly distributed) you could approximate the circle with several rectangles:
In this case step 2 of the algorithm should use more pointers to allow insertion/removal to/from several trees. And on step 3 you should do a linear search near points at proper distance (<=R) to distinguish points inside the circle from the points outside it.
Another way to deal with a circular neighborhood is to approximate the circle with rectangles of equal height (but here the circle should be split into more pieces). This results in a much simpler algorithm (where sorted arrays are used instead of order statistic trees):
Cut area occupied by points into horizontal slices, sort slices by Y, then sort points inside slices by X.
For each point in each slice, assume it to be a "center" point and do step 3.
For each nearby slice use binary search to find points with Euclidean distance close to R, then use linear search to tell "inside" points from "outside" ones. Stop linear search where the slice is completely inside the circle, and count remaining points by difference of positions in the array.
Use results of previous step to select "most surrounded" point.
This algorithm allows optimizations mentioned earlier as well as fractional cascading.
I would start by creating something like a k-d tree (https://en.wikipedia.org/wiki/K-d_tree): a tree with points at the leaves, where each node holds information about its descendants. At each node I would keep a count of the number of descendants and a bounding box enclosing those descendants.
Now for each point I would recursively search the tree. At each node I visit, either all of the bounding box is within R of the current point, all of the bounding box is more than R away from the current point, or some of it is inside R and some outside R. In the first case I can use the count of the number of descendants of the current node to increase the count of points within R of the current point and return up one level of the recursion. In the second case I can simply return up one level of the recursion without incrementing anything. It is only in the intermediate case that I need to continue recursing down the tree.
So I can work out for each point the number of neighbours within R without checking every other point, and pick the point with the highest count.
If the points are spread out evenly then I think you will end up constructing a k-d tree whose lower levels are close to a regular grid. If that grid is of size A x A, then in the worst case R is large enough that its boundary is a circle intersecting O(A) low-level cells, so for O(n) points I would expect this to cost about O(n * sqrt(n)).
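To make the idea concrete, here is a minimal sketch (illustrative names, not a tuned k-d tree): each node stores its subtree size and bounding box, and the counting query prunes whole subtrees that lie entirely inside or entirely outside the query circle. Calling countWithin once per point and keeping the maximum gives the "most surrounded" point.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

struct Node {
    double minx, miny, maxx, maxy;   // bounding box of the points below this node
    int count;                       // number of points below this node
    int left = -1, right = -1;       // child indices, -1 for a leaf
    Pt pt{};                         // the point itself when this is a leaf
};

struct CountTree {
    std::vector<Node> nodes;

    // Builds the subtree for pts[lo, hi) (reorders that range) and returns its node index.
    int build(std::vector<Pt>& pts, int lo, int hi, bool splitX) {
        Node n;
        n.count = hi - lo;
        n.minx = n.maxx = pts[lo].x; n.miny = n.maxy = pts[lo].y;
        for (int i = lo; i < hi; ++i) {
            n.minx = std::min(n.minx, pts[i].x); n.maxx = std::max(n.maxx, pts[i].x);
            n.miny = std::min(n.miny, pts[i].y); n.maxy = std::max(n.maxy, pts[i].y);
        }
        int id = (int)nodes.size();
        nodes.push_back(n);
        if (hi - lo == 1) { nodes[id].pt = pts[lo]; return id; }
        int mid = (lo + hi) / 2;
        auto cmp = [splitX](const Pt& a, const Pt& b) { return splitX ? a.x < b.x : a.y < b.y; };
        std::nth_element(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi, cmp);
        int l = build(pts, lo, mid, !splitX);
        int r = build(pts, mid, hi, !splitX);
        nodes[id].left = l; nodes[id].right = r;
        return id;
    }

    // Number of stored points within distance R of q.
    int countWithin(int id, const Pt& q, double R) const {
        const Node& n = nodes[id];
        double dx = std::max({n.minx - q.x, q.x - n.maxx, 0.0});
        double dy = std::max({n.miny - q.y, q.y - n.maxy, 0.0});
        double nearest = std::hypot(dx, dy);         // distance from q to the closest box point
        double fx = std::max(std::fabs(n.minx - q.x), std::fabs(n.maxx - q.x));
        double fy = std::max(std::fabs(n.miny - q.y), std::fabs(n.maxy - q.y));
        double farthest = std::hypot(fx, fy);        // distance from q to the farthest box corner
        if (nearest > R) return 0;                   // box entirely outside the circle
        if (farthest <= R) return n.count;           // box entirely inside the circle
        if (n.left < 0)                              // leaf: test the single point
            return std::hypot(n.pt.x - q.x, n.pt.y - q.y) <= R ? 1 : 0;
        return countWithin(n.left, q, R) + countWithin(n.right, q, R);
    }
};

int main() {
    std::vector<Pt> pts = {{0, 0}, {1, 0}, {0, 1}, {5, 5}};
    CountTree t;
    int root = t.build(pts, 0, (int)pts.size(), true);
    std::printf("%d points within R=2 of (0,0)\n", t.countWithin(root, {0, 0}, 2.0));
}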
You can speed up whatever algorithm you use by preprocessing your data in O(n) time to estimate the number of neighbouring points.
For a circle of radius R, create a grid whose cells have dimension R in both the x- and y-directions. For each point, determine to which cell it belongs. For a given cell c this test is easy:
c.x<=p.x && p.x<=c.x+R && c.y<=p.y && p.y<=c.y+R
(You may want to think deeply about whether a closed or half-open interval is correct.)
If you have relatively dense/homogeneous coverage, then you can use an array to store the values. If coverage is sparse/heterogeneous, you may wish to use a hashmap.
Now, consider a point on the grid. The extremal locations of a point within a cell are as indicated:
Points at the corners of the cell can only be neighbours with points in four cells. Points along an edge can be neighbours with points in six cells. Points not on an edge are neighbours with points in 7-9 cells. Since it's rare for a point to fall exactly on a corner or edge, we assume that any point in the focal cell is neighbours with the points in all 8 surrounding cells.
So, if a point p is in a cell (x,y), N[p] identifies the number of neighbours of p within radius R, and Np[y][x] denotes the number of points in cell (x,y), then N[p] is given by:
N[p] = Np[y][x]+
Np[y][x-1]+
Np[y-1][x-1]+
Np[y-1][x]+
Np[y-1][x+1]+
Np[y][x+1]+
Np[y+1][x+1]+
Np[y+1][x]+
Np[y+1][x-1]
Once we have the number of neighbours estimated for each point, we can heapify that data structure into a maxheap in O(n) time (with, e.g. make_heap). The structure is now a priority-queue and we can pull points off in O(log n) time per query ordered by their estimated number of neighbours.
Do this for the first point and use an O(log n + k) circle search (or some more clever algorithm) to determine the actual number of neighbours the point has. Make a note of this point in a variable best_found and update its N[p] value.
Peek at the top of the heap. If the estimated number of neighbours is less than N[best_found] then we are done. Otherwise, repeat the above operation.
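Here is a rough sketch of the grid estimate plus max-heap refinement (a hashmap of R-sized cells; cellKey, densestPoint and the other names are illustrative). One simplification compared to the text above: because the cells have side R, every true neighbour of a point lies in its 3x3 block of cells, so the "expensive check" here is just an exact distance scan over that block rather than a separate circle-search structure.

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Pt { double x, y; };

// Packs a cell coordinate pair into one hashable key.
static std::uint64_t cellKey(std::int64_t cx, std::int64_t cy) {
    return (static_cast<std::uint64_t>(cx) << 32) ^ static_cast<std::uint32_t>(cy);
}

int densestPoint(const std::vector<Pt>& pts, double R) {
    // 1. Bucket points into R x R cells.
    std::unordered_map<std::uint64_t, std::vector<int>> cells;
    auto cx = [&](const Pt& p) { return (std::int64_t)std::floor(p.x / R); };
    auto cy = [&](const Pt& p) { return (std::int64_t)std::floor(p.y / R); };
    for (int i = 0; i < (int)pts.size(); ++i) cells[cellKey(cx(pts[i]), cy(pts[i]))].push_back(i);

    // Exact neighbour count: every point within distance R lies in the 3x3 block of cells.
    auto exactCount = [&](int i) {
        int n = 0;
        for (std::int64_t dx = -1; dx <= 1; ++dx)
            for (std::int64_t dy = -1; dy <= 1; ++dy) {
                auto it = cells.find(cellKey(cx(pts[i]) + dx, cy(pts[i]) + dy));
                if (it == cells.end()) continue;
                for (int j : it->second)
                    if (j != i && std::hypot(pts[i].x - pts[j].x, pts[i].y - pts[j].y) <= R) ++n;
            }
        return n;
    };

    // 2. Optimistic estimate N[p]: total points in the 3x3 block (never smaller than the truth).
    std::vector<std::pair<int, int>> est;            // (estimate, point index)
    for (int i = 0; i < (int)pts.size(); ++i) {
        int e = 0;
        for (std::int64_t dx = -1; dx <= 1; ++dx)
            for (std::int64_t dy = -1; dy <= 1; ++dy) {
                auto it = cells.find(cellKey(cx(pts[i]) + dx, cy(pts[i]) + dy));
                if (it != cells.end()) e += (int)it->second.size();
            }
        est.push_back({e, i});
    }
    std::make_heap(est.begin(), est.end());          // O(n) heapify into a max-heap

    // 3. Pop candidates until the best exact count beats every remaining estimate.
    int best = -1, bestCount = -1;
    while (!est.empty() && est.front().first > bestCount) {
        std::pop_heap(est.begin(), est.end());
        int i = est.back().second; est.pop_back();
        int c = exactCount(i);
        if (c > bestCount) { bestCount = c; best = i; }
    }
    return best;
}

int main() {
    std::vector<Pt> pts = {{0, 0}, {0.2, 0.1}, {0.3, -0.1}, {4, 4}};
    std::printf("index of densest point: %d\n", densestPoint(pts, 1.0));
}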
To improve estimates you could use a finer grid, like so:
along with some clever sliding window techniques to reduce the amount of processing required (see, for instance, this answer for rectangular cases - for circular windows you should probably use a collection of FIFO queues). To increase security you can randomize the origin of the grid.
Considering again the example you posed:
It's clear that this heuristic has the potential to save considerable time: with the above grid, only a single expensive check would need to be performed in order to prove that the middle point has the most neighbours. Again, a higher-resolution grid will improve the estimates and decrease the number of expensive checks which need to be made.
You could, and should, use a similar bounding technique in conjunction with mcdowella's answer; however, his answer does not provide a good place to start looking, so it is possible to spend a lot of time exploring low-value points.

Connect points from set in the line segments

I have been given a task where I have to connect all the points in the 2D plane.
There are four conditions to be met:
The total length of all the segments joined together has to be minimal.
One point can be a part of only one line segment.
Line segments cannot intersect.
All points have to be used (a point may be left unpaired only if that cannot be avoided).
Image to visualize the problem:
The "wrong" image connects the points correctly, although its total length is bigger than the one on the left.
At first I thought about sorting the points and doing it with a sweeping line, building a tree of all possibilities, although that seems like an overly complicated solution with huge complexity. Therefore I am looking for better approaches. I would appreciate some hints on what to do, or how I could approach the problem.
I would start with a Delaunay triangulation of the point set. This should already give you the nearest neighbor connections of each point without any intersections. In the next step I'd look at the triangles that result from the triangulation - the convenient property here is that based on your ruleset you can pick exactly one side from each triangle and remove the remaining two from the selection.
The problem that remains now is to pick those edges that give you the smallest total sum which of course will not always be the smallest side since that one might already have been blocked by a neighboring triangle. I'd start with a greedy approach, always picking the smallest remaining edge that has not been blocked by neighboring triangles yet.
Edit: In the next step you retrieve a list of all the edges in that triangulation and sort them by length. You also make another list in which you count the amount of connections each point has. Now you iterate through the edge list going from the longest edge to the shortest one and check the two points it connects in the connection count list: if each of the points has still more than 1 connection left, you can discard the edge and decrement the connection count for the two points involved. If at least one of the points has only one connection left, you have got yourself one of the edges you are looking for. You repeat the process until there are no edges left and this should hopefully give you the smallest possible edge sum.
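A short sketch of that elimination pass, taken literally (the edges are assumed to come, already computed, from whatever Delaunay triangulation library you use; Edge and pickSegments are illustrative names):

#include <algorithm>
#include <cstdio>
#include <vector>

struct Edge { int a, b; double length; };

// Longest-first elimination as described above: drop an edge while both of its
// endpoints still have other connections left, keep it otherwise.
std::vector<Edge> pickSegments(std::vector<Edge> edges, int numPoints) {
    std::vector<int> degree(numPoints, 0);           // connection count per point
    for (const Edge& e : edges) { ++degree[e.a]; ++degree[e.b]; }

    std::sort(edges.begin(), edges.end(),
              [](const Edge& x, const Edge& y) { return x.length > y.length; });

    std::vector<Edge> kept;
    for (const Edge& e : edges) {
        if (degree[e.a] > 1 && degree[e.b] > 1) {
            --degree[e.a]; --degree[e.b];            // both points still have options: drop it
        } else {
            kept.push_back(e);                       // last chance for one endpoint: keep it
        }
    }
    return kept;
}

int main() {
    // Dummy edge list standing in for the output of a triangulation library.
    std::vector<Edge> edges = {{0, 1, 1.0}, {1, 2, 1.5}, {0, 2, 2.0}, {2, 3, 1.2}, {1, 3, 2.2}};
    for (const Edge& e : pickSegments(edges, 4))
        std::printf("keep %d - %d (length %.1f)\n", e.a, e.b, e.length);
}

As written this is a greedy heuristic in the spirit of the answer; it does not by itself guarantee the one-segment-per-point rule or a globally optimal total length.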
If I am not mistaken this problem is loosely related to the knapsack problem which is NP-Hard so I am not sure if this solution really gives you the best possible one.
I'd say this is an extension to the well-known travelling salesman problem.
A good technique (if a little old-fashioned) is to use a simulated annealing optimisation technique.
You'll need to make adjustments to the cost (a.k.a. objective) function to miss out sections of the path. But given a candidate continuous path, it's reasonably trivial to decide which sections to miss out to minimise its length. (You'd first remove the longer of any intersecting lines).
Wow, that's a tricky one. That's a lot of conditions to meet.
I think from a programming standpoint, the "simplest" solution might actually be to just loop through, find all the possibilities that satisfy the last 3 conditions, and record the total length as you loop through, and just choose the one with the shortest length in the end - brute force, guess-and-check. I think this is what you were referring to in your OP when you mentioned a "sweeping line and building a tree of all possibilities". This approach is very computationally expensive, but if the code is written right, it should always work in the end.
If you want the "best" solution, where you solve for the single final answer right away, I'm afraid my math skills aren't strong enough for that - I'm not even sure whether there is any single analytical solution to that problem for an arbitrary collection of points. Maybe try checking with the people over at MathOverflow. If someone over there can explain the math behind that calculation to you, and you then still need help converting that math into code in a particular programming language, update your question here (maybe with a link to the answer they provide you) and I'm sure someone will be able to help you out from that point.
One of the possible solutions is to use graph theory.
Construct a bipartite graph G such that each point has a copy in both parts. Now put an edge between points i and j with weight = (i == j ? infinity : distance[i][j]). The minimal weight maximum matching in this graph will be your desired configuration.
Notice that since this is on the Euclidean 2D plane, the resulting "edges" of the matching will not intersect. Say that edges AB and XY intersect for points A, B, X, Y. Then the matching is not of minimum weight, because either AX, BY or AY, BX would produce a smaller total weight without an intersection (this follows from the triangle inequality a + b > c).
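To illustrate the construction only (not an efficient solver): the sketch below builds the cost matrix exactly as described and brute-forces the minimum weight perfect matching with std::next_permutation, which is only feasible for a handful of points; a real implementation would run a polynomial-time algorithm such as the Hungarian method on the same matrix.

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <limits>
#include <numeric>
#include <vector>

struct Pt { double x, y; };

int main() {
    std::vector<Pt> p = {{0, 0}, {1, 0}, {0, 3}, {1, 3}};
    const int n = (int)p.size();
    const double INF = std::numeric_limits<double>::infinity();

    // cost[i][j]: weight of the edge between copy i (left part) and copy j (right part).
    std::vector<std::vector<double>> cost(n, std::vector<double>(n));
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            cost[i][j] = (i == j) ? INF : std::hypot(p[i].x - p[j].x, p[i].y - p[j].y);

    // Brute-force over all perfect matchings (permutations) of the bipartite graph.
    std::vector<int> perm(n), best;
    std::iota(perm.begin(), perm.end(), 0);
    double bestCost = INF;
    do {
        double c = 0;
        for (int i = 0; i < n; ++i) c += cost[i][perm[i]];
        if (c < bestCost) { bestCost = c; best = perm; }
    } while (std::next_permutation(perm.begin(), perm.end()));

    // Point i is matched with best[i]; a clean pairing corresponds to best[best[i]] == i.
    for (int i = 0; i < n; ++i)
        std::printf("%d -> %d (w = %.3f)\n", i, best[i], cost[i][best[i]]);
}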

In the closest pair of points divide and conquer algorithm, what is the significance of sorting the "strip" by the points' y values?

I believe I understand the algorithm quite clearly, except for the step where you look across the division for points that might be close to each other, creating a strip in which the points are candidates.
But then the algorithm states to sort the points by their y coordinates and then check each other point in the strip to find if there is a smaller distance than the one previously found. It basically sounds like you brute force within the strip.
For example, here's what Introduction to Algorithms states:
So it seems you just take each point and compare it against all the others to find the closest points? Why is it necessary to sort by y value then? You already have them sorted by x, why not brute force with that?
You don't brute force compare against all points in Y' but only against the one next to p. If that one is already too far away you can just stop, because all other points will be even further away. You only continue evaluating the next closest neighbor if the last one was still within your search distance.
The text explains it in the As we will see shortly section.
Sorting is an optimization here that allows you to iterate nearest neighbors in O(1) after paying the sorting costs of O(n log n) once.
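A small sketch of that scan, assuming strip already holds the candidate points sorted by y and d is the best distance found so far in the two halves (scanStrip is an illustrative name):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

struct Pt { double x, y; };

double scanStrip(const std::vector<Pt>& strip, double d) {
    for (size_t i = 0; i < strip.size(); ++i)
        // The strip is sorted by y, so stop as soon as the vertical gap alone
        // reaches d: every later point is even further away.
        for (size_t j = i + 1; j < strip.size() && strip[j].y - strip[i].y < d; ++j)
            d = std::min(d, std::hypot(strip[i].x - strip[j].x, strip[i].y - strip[j].y));
    return d;
}

int main() {
    std::vector<Pt> strip = {{0.1, 0.0}, {-0.2, 0.5}, {0.0, 3.0}};   // already sorted by y
    std::printf("%.3f\n", scanStrip(strip, 1.0));
}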

Closest Pair of Points Algorithm

I am currently working on implementing the closest pair of points algorithm in C++. That is, given a list of points (x, y) find the pair of points that has the smallest Euclidean distance. I have done research into this and my understanding of the algorithm is the following (please correct me if I'm wrong):
Split the array of points down the middle
Recursively find the pair of points with the minimum distance for the left and right halves.
Sort the left and right halves by y-coordinate, and compare each point on the left to its 6 closest neighbors (by y-coordinate) on the right. (There is some theoretical stuff behind this, but this is my understanding of what needs to be done.)
I've gotten the recursion part of the algorithm to work, but am struggling to find an efficient way to find the 6 closest neighbors on the right for each point on the left. In other words, given two sorted arrays, I need to find the 6 closest numbers in Array B for each point in array A. I assume something similar to merge sort is required here, but haven't been able to figure it out. Any help would be much appreciated.
Sounds like you want a quad tree.
Let dist = min(dist_L, dist_R) where dist_L, dist_R are the minimum distances found in the left and right sets, respectively.
Now to find the minimum distance where one point is on the left half and the other on the right half, you only need to consider points whose x-coordinates are in the interval [x_m - dist, x_m+dist].
The idea now is to consider, for each point, only the 6 closest points in this interval. So sort the points in the interval by y-coordinate and, for each point, look forward at the next 6. This will result in an O(n log^2 n) running time.
You can further improve upon this to O(n log n) by speeding up the sorting process. To do this, have each recursive call also return a sorted list of the points. Then, to sort the entire list, you just have to merge the two sorted lists. An observant reader will notice that this is precisely merge sort.
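Putting the pieces together, here is a condensed sketch of the merge-sort-style recursion described above (closestPair and the other names are illustrative, and the code favours clarity over tuning):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <limits>
#include <vector>

struct Pt { double x, y; };

static double dist(const Pt& a, const Pt& b) { return std::hypot(a.x - b.x, a.y - b.y); }

// pts[lo, hi) is sorted by x on entry and sorted by y on exit (merge-sort style).
// Returns the smallest pairwise distance in that range.
static double solve(std::vector<Pt>& pts, int lo, int hi) {
    int n = hi - lo;
    if (n <= 3) {                                     // base case: brute force, then sort by y
        double best = std::numeric_limits<double>::infinity();
        for (int i = lo; i < hi; ++i)
            for (int j = i + 1; j < hi; ++j) best = std::min(best, dist(pts[i], pts[j]));
        std::sort(pts.begin() + lo, pts.begin() + hi,
                  [](const Pt& a, const Pt& b) { return a.y < b.y; });
        return best;
    }

    int mid = lo + n / 2;
    double xmid = pts[mid].x;                         // dividing vertical line
    double d = std::min(solve(pts, lo, mid), solve(pts, mid, hi));

    // Merge the two y-sorted halves so the whole range is sorted by y.
    std::inplace_merge(pts.begin() + lo, pts.begin() + mid, pts.begin() + hi,
                       [](const Pt& a, const Pt& b) { return a.y < b.y; });

    // Collect the strip of points within d of the dividing line, already y-sorted.
    std::vector<Pt> strip;
    for (int i = lo; i < hi; ++i)
        if (std::fabs(pts[i].x - xmid) < d) strip.push_back(pts[i]);

    // Each point only needs to be checked against the few following points in y.
    for (size_t i = 0; i < strip.size(); ++i)
        for (size_t j = i + 1; j < strip.size() && strip[j].y - strip[i].y < d; ++j)
            d = std::min(d, dist(strip[i], strip[j]));
    return d;
}

double closestPair(std::vector<Pt> pts) {
    std::sort(pts.begin(), pts.end(),
              [](const Pt& a, const Pt& b) { return a.x < b.x; });
    return solve(pts, 0, (int)pts.size());
}

int main() {
    std::printf("%.6f\n", closestPair({{0, 0}, {3, 4}, {1, 1}, {7, 7}, {1.2, 0.9}}));
}

Each level of the recursion does O(n) work for the merge and the strip scan, giving the recurrence T(n) = 2T(n/2) + O(n) and hence O(n log n) overall.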

Closest pair of points Planar case

I am looking at the wikipedia entry for how to solve this. It lists five steps
1. Sort points along the x-coordinate.
2. Split the set of points into two equal-sized subsets by a vertical line x = xmid.
3. Solve the problem recursively in the left and right subsets. This will give the left-side and right-side minimal distances dLmin and dRmin respectively.
4. Find the minimal distance dLRmin among the pairs of points in which one point lies to the left of the dividing vertical and the other lies to the right.
5. The final answer is the minimum among dLmin, dRmin, and dLRmin.
The fourth step is the one I am having trouble understanding. How do I choose which points to the left of the line to compare with which points to the right of the line? I know I am not supposed to compare all points, but I am unclear about how to choose the points to compare. Please do not send me a link; I have searched, gone to numerous links, and have not found an explanation that helps me understand step 4.
Thanks
Aaron
The answer to your question was in the next paragraph of the wikipedia article:
It turns out that step 4 may be accomplished in linear time. Again, a naive approach would require the calculation of distances for all left-right pairs, i.e., in quadratic time. The key observation is based on the following sparsity property of the point set. We already know that the closest pair of points is no further apart than dist = min(dLmin, dRmin). Therefore for each point p to the left of the dividing line we have to compare the distances to the points that lie in the rectangle of dimensions (dist, 2 * dist) to the right of the dividing line, as shown in the figure. And what is more, this rectangle can contain at most 6 points with pairwise distances at least dRmin. Therefore it is sufficient to compute at most 6n left-right distances in step 4. The recurrence relation for the number of steps can be written as T(n) = 2T(n / 2) + O(n), which we can solve using the master theorem to get O(n log n).
I don't think I can put it much clearer than they already have, but do you have any specific questions about this step of the algorithm?
