Two salesmen - one always visits the nearest neighbour, the other the farthest - algorithm

Consider this question related to graph theory:
Let G be a complete undirected graph on N vertices (every vertex is connected to all the other vertices). Two "salesmen" travel it this way: the first always visits the nearest unvisited vertex, the second the farthest, until both have visited all the vertices. We must generate a matrix of distances and a starting point for each of the two salesmen (the starting points may differ) such that:
All the distances are unique (edit: they are positive integers).
The distance from a vertex to itself is always 0.
The difference between the total distances covered by the two salesmen must be a specific number, D.
The distance from A to B is equal to the distance from B to A.
What efficient algorithms can help me here? I can only think of backtracking, but I don't see any way to reduce the work the program has to do.

Geometry is helpful.
Using the distances between points on a circle seems like it would work. Seems like you could adjust D by making the circle radius larger or smaller.
Alternatively, really any 2D shape where the distances are all different could probably be used as well. In this case you would scale the shape up or down to obtain the correct D.
Edit: Now that I think about it, the simplest solution may be to simply pick N random 2D points, say with 32-bit integer coordinates to lower the chances of any two distances being too close to equal. If two distances are too close, just pick a different point for one of them until it's valid.
Ideally, you'd then just need to work out a formula for the relationship between D and the scaling factor, which I'm not sure of offhand. If nothing else, you could use binary search or interpolation search to find the scaling factor that gives the required D, but that's a slower method.
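In fact, one relationship is easy to see: scaling every distance by the same factor leaves both salesmen's greedy choices unchanged, so the difference between the two tours scales linearly with the scale factor. A rough Python sketch along these lines (the helper names are mine; squared distances between random integer points keep every entry a positive integer, and uniqueness is handled by retrying):

import random

def greedy_tour(dist, start, pick):
    # pick=min gives the nearest-neighbour salesman, pick=max the farthest.
    n, cur, total = len(dist), start, 0
    unvisited = set(range(n)) - {start}
    while unvisited:
        nxt = pick(unvisited, key=lambda j: dist[cur][j])
        total += dist[cur][nxt]
        unvisited.discard(nxt)
        cur = nxt
    return total

def make_instance(n, D, tries=10000):
    for _ in range(tries):
        pts = [(random.randrange(2**31), random.randrange(2**31))
               for _ in range(n)]
        # squared Euclidean distances: integral, symmetric, zero diagonal
        d = [[(ax - bx)**2 + (ay - by)**2 for bx, by in pts]
             for ax, ay in pts]
        offdiag = [d[i][j] for i in range(n) for j in range(i + 1, n)]
        if len(set(offdiag)) != len(offdiag):
            continue                     # duplicate distance: retry
        # both salesmen start at vertex 0 here; varying the two starting
        # points independently gives more base differences to try
        base = greedy_tour(d, 0, max) - greedy_tour(d, 0, min)
        if base > 0 and D % base == 0:   # integer scale keeps entries integral
            s = D // base
            return [[e * s for e in row] for row in d]
    return None

The divisibility test is crude (a random base rarely divides D), so in practice you would iterate over starting points and fresh point sets until some base divides D.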

Related

Finding the closest, "farthest" point from a set of points in 3-dimensions

Suppose I have some n points (in my case, 4 points) in 3 dimensions. I want to determine both the point a that minimizes the sum of squared distances to these n points, and the largest difference that can exist between the distances from an arbitrary point b to any two of these n points (i.e., the two "farthest" points).
How can this be accomplished most efficiently? I know that, in 2 dimensions with 3 points, the point minimizing the summed squared distance is the centroid of the triangle formed by the 3 points, and the largest difference can be found by taking a point located precisely at one (any?) of the 3 points. It seems that the same should hold in 3 dimensions, although I am unsure.
I want to determine both the point that minimizes distance from each of these n points
The centroid minimizes the sum of the squared distances to every point in the set, but it will not minimize the maximum distance (the farthest distance) to the points.
I suspect that you are interested in computing the center and radius of the minimal sphere containing every point in the set. This is a classic problem in computational geometry that can be solved in linear time quite easily in an approximate way, or exactly if you implement the algorithm proposed by Emmerich Welzl.
If the number of points is as small as 4, an approximate solution is to search for the pair of points with maximum distance (there are only 6 possible pairs), and take the midpoint as the center and half the distance as the radius. Then ensure that the other two points are also inside the sphere, and make it grow if necessary.
See more information at
https://en.wikipedia.org/wiki/Bounding_sphere
https://en.wikipedia.org/wiki/Smallest-circle_problem
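A minimal Python version of the approximate method above (midpoint of the farthest pair, then grow toward any point left outside; this is essentially Ritter's bounding-sphere heuristic, and points are assumed to be coordinate tuples):

import math

def approx_bounding_sphere(points):
    # start from the farthest pair: midpoint as center, half-distance as radius
    p, q = max(((a, b) for a in points for b in points),
               key=lambda pair: math.dist(*pair))
    center = [(u + v) / 2 for u, v in zip(p, q)]
    radius = math.dist(p, q) / 2
    for pt in points:
        d = math.dist(center, pt)
        if d > radius:
            # grow: the new sphere touches pt and the far side of the old one
            radius = (radius + d) / 2
            t = (d - radius) / d
            center = [c + t * (x - c) for c, x in zip(center, pt)]
    return center, radius

For an exact answer you would implement Welzl's algorithm instead.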
The largest difference between the distances from a point to two given points is achieved when the three points are aligned and the unknown point is "outside" the two given ones (there are infinitely many such solutions). In this configuration, the difference is exactly the distance between the two given points.
If you mean to maximize all differences simultaneously (or rather the sum of differences), you must go to infinity in some direction. That direction maximizes the sum of the lengths of the projections of all edges.

List all sets of points that are enclosed by a circle with given radius

My problem is: given N points in a plane and a number R, list/enumerate all subsets of the points such that the points in each subset are enclosed by a circle with radius R. Two subsets should be distinct, and neither should contain the other.
Efficiency may not be important, but the algorithm should not be too slow.
As a special case, can we find the K subsets with the most points? An approximation algorithm is acceptable.
Thanks,
Edit: It seems that the statement was not clear. My bad!
So I restate my question as follows: given N points and a circle with fixed radius R, use the circle to scan the whole space. At any given time, the circle will cover a subset of the points. The goal is to list all the possible subsets of points that can be covered by such an R-radius circle. One subset cannot be a superset of another.
I am not sure I get what you mean by 'not covered'. If you drop this condition, what you are looking for is exactly a Čech complex, whose complexity is high; you won't have an efficient algorithm unless you have conditions on the sampling (the sampling should be sparse enough and R not too big, otherwise you could have 2^n subsets, with n your number of points). You have to enumerate all subsets and check whether their minimal enclosing ball has radius lower than R. You can reduce the search to all subsets whose diameter is lower than R (i.e., pairwise distances lower than R), which may be sufficient in your case.
If 'not covered' for two subsets means that one is not included in the other, you can have many different decompositions. One of interest is the alpha complex, as it can be computed efficiently in O(n log n) in dimensions 2-3 (I would suggest using CGAL to compute it; you can also see what it means with pictures there). If your points are high-dimensional, then you will probably end up computing a Čech complex.
Without loss of generality, we can assume that the enclosing circles considered pass through at least two points (ignoring the trivial cases of zero points or one point, and assuming that your motivation is maximizing density, so that you don't care if non-maximal subsets are omitted). Build a proximity structure (kd-tree, cover tree, etc.) on the input points. For each input point p, use the structure to find all points q such that d(p, q) ≤ 2R. For each such q, there are one or two circles of radius R that have both p and q on their boundary. Find their centers by solving some quadratic equations, and then look among the other nearby points to determine the subset each circle covers.
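A brute-force sketch of that procedure in Python, without the proximity structure (all pairs are scanned, the one or two centers per pair come from the perpendicular bisector of the chord, and only maximal subsets are kept):

import math
from itertools import combinations

def covered_subsets(points, R):
    subsets = set()
    for (ax, ay), (bx, by) in combinations(points, 2):
        d = math.hypot(bx - ax, by - ay)
        if d == 0 or d > 2 * R:
            continue
        mx, my = (ax + bx) / 2, (ay + by) / 2
        h = math.sqrt(R * R - (d / 2) ** 2)      # center offset along bisector
        ux, uy = -(by - ay) / d, (bx - ax) / d   # unit normal to the chord
        for s in (1, -1):                        # the one or two circles
            cx, cy = mx + s * h * ux, my + s * h * uy
            subsets.add(frozenset(
                p for p in points
                if math.hypot(p[0] - cx, p[1] - cy) <= R + 1e-9))
    # 'not covered': drop subsets strictly contained in another one
    return [s for s in subsets if not any(s < t for t in subsets)]

Replacing the all-pairs scan with a kd-tree range query (all q within 2R of p) gives the structure-assisted version described above.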

How can I pick a set of vertices to subtract from a polygon such that the distortion is minimum?

I'm working with a really slow renderer, and I need to approximate polygons so that they look almost the same when confined to a screen area containing very few pixels. That is, I need an algorithm to go through a polygon and remove/move vertices until the resulting polygon has a good combination of shape preservation and economy of vertex usage.
I don't know if there's a formal name for this kind of problem, but if anyone knows what it is, it would help me get started with my research.
My untested plan is to remove the vertices that change the polygon's area the least, while protecting the vertices that touch the bounding box from removal, until the difference in area between the original polygon and the proposed approximation exceeds a tolerance I specify.
This would all be done only once, not in real time.
Any other ideas?
Thanks!
You're thinking about the problem in a slightly off way. If your goal is to reduce the number of vertices with minimal distortion, you should define the distortion in terms of those same vertices, since they define the shape. There's a very simple approach here, which I believe would solve your problem:
1. Calculate the distance between adjacent vertices.
2. Choose a tolerance between vertices, below which the vertices are resolved into a single vertex.
3. Replace all pairs of vertices with distances lower than your cutoff with a single vertex halfway between the two.
4. Repeat until no vertices are removed.
Since your area is ultimately decided by the vertex placement, this method preserves shape and minimizes shape distortion. The one drawback is that distance between vertices might be slightly less intuitive than polygon area, but the two are proportional. If you really wish, you could run through the change in area that would result from vertex removal, but that's a lot more work for questionable benefit imo.
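A minimal sketch of that merge loop in Python, where tol is the cutoff from step 2 (the wrap-around edge between the last and first vertex is ignored for brevity):

def merge_close_vertices(poly, tol):
    # poly: list of (x, y) tuples; run passes until nothing merges
    changed = True
    while changed and len(poly) > 3:
        changed, out, i = False, [], 0
        while i < len(poly) - 1:
            (ax, ay), (bx, by) = poly[i], poly[i + 1]
            if (ax - bx) ** 2 + (ay - by) ** 2 < tol * tol:
                out.append(((ax + bx) / 2, (ay + by) / 2))
                i += 2                   # both endpoints consumed
                changed = True
            else:
                out.append(poly[i])
                i += 1
        if i == len(poly) - 1:
            out.append(poly[-1])         # last vertex had no partner left
        poly = out
    return poly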
As mentioned by Angus, if you want a direct solution for the change in area, it's not actually super difficult. I was originally going to leave this as an exercise for the reader, but it's totally possible to solve this exactly, though you need to include the vertices on either side.
1. Assume you're looking at a window of vertices [A, B, C, D] that are connected in that order. In this example we're determining the "cost" of combining B and C.
2. Calculate the angle offset from collinearity from A toward C. Basically you just want to see how far from collinear the two points are. This is |sin(|arctan(B - A)| - |arctan(C - A)|)|, where the pipes are absolute value, arctan of a vector means the angle it makes with the x axis, and the differences are the sensible notion of difference.
3. Calculate the total distance over which the angle change will effectively be applied; this is just the Euclidean distance from A to B times the Euclidean distance from B to C.
4. Multiply the terms from steps 2 and 3 to get your first term.
5. To get your second term, repeat steps 2-4, replacing A with D, B with C, and C with B (just going in the opposite direction).
6. Calculate the geometric mean of the two terms obtained.
The number that results from step 6 gives the full picture, minus a couple of constants.
I tried my own plan first: protect the vertices touching the bounding box, then remove the rest in the order that changes the resulting area the least, until no vertex can be removed while keeping the new polygon's area within X% of the original. This is the result with X = 5%:
When the user zooms out really far these shapes fit the bill well enough for me. I haven't tried any of the other suggestions. The savings are quite astonishing, sometimes from 80-100 vertices down to 4 or 5.

Remove points to maximize shortest nearest neighbor distance

If I have a set of N points in 2D space, defined by vectors X and Y of their locations, what is an efficient algorithm that will:
Select a fixed number (M) of points to remove so as to maximize the shortest nearest-neighbor distance among the remaining points.
Remove a minimum number of points so that the shortest nearest-neighbor distance among the remaining points is greater than a fixed distance (D).
Sorting the points by their shortest nearest-neighbor distance and removing those with the smallest values does not give the correct answer, as you would then remove both points of each close pair, while you may only need to remove one point of such a pair.
For my case, I am usually dealing with 1,000-10,000 points, and I may remove 50-90% of points.
You shouldn't need to store (or compute) the entire distance matrix: a Delaunay triangulation should efficiently (O(n log n) worst case) give you the closest neighbors of your point set. You should also be able to update it efficiently as you delete points.
For most cases of close pairs, you should be able to check which of the pair would be farthest from its neighbors if the other were removed. This is not an exact solution; especially if you remove a large proportion of points, removing a locally optimal point may exclude the globally optimal solution. You should also be able to deal with clusters of 3 or more locally close points. However, if you are only removing a small proportion of points from a randomly distributed set, both of these cases may be relatively rare.
There may or may not be a better way (i.e., an exact and efficient algorithm) to solve your problem, but the above suggestions should lead to an approximate and/or combinatorial approach that works best when the points needing deletion are sparsely distributed.
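A simple greedy rendition of that pairwise rule, using scipy's cKDTree (rebuilt after every deletion for clarity, rather than incrementally updating a Delaunay triangulation):

import numpy as np
from scipy.spatial import cKDTree

def greedy_thin(pts, m):
    # remove m points; each round deletes one endpoint of the closest pair
    pts = [tuple(p) for p in pts]
    for _ in range(m):
        tree = cKDTree(pts)
        d, idx = tree.query(pts, k=3)  # columns: self, nearest, 2nd-nearest
        i = int(np.argmin(d[:, 1]))    # one endpoint of the closest pair
        j = int(idx[i, 1])             # the other endpoint
        # drop the endpoint whose removal leaves its partner farther
        # from its remaining neighbours
        drop = i if d[i, 2] <= d[j, 2] else j
        pts.pop(drop)
    return pts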
Noam
One method is to break your 2D space into N partitions. Within each partition, determine an average X,Y position. Then perform the nearest-neighbor algorithm on the averaged points, and repeat the nearest-neighbor test on the full point sets of the partitions that matched.
Here's the catch: the larger the partitions, the fewer averaged points you will have, but the less accurate the result. The smaller the partitions, the more accurate, but the more points to process.
I can't think of anything other than a brute force approach. But you can probably shorten the data set you are looking at significantly before any analysis.
So here is what I would do. First, work out the nearest-neighbour distance for each point; call it P_in. Then work out the maximum distance from each point to its M nearest neighbours; call it P_iM. If P_in is greater than P_iM for any point, then it can be excluded from the analysis. Basically, if you have one point that is at distance 10 from every other point, and another point that is within distance 9 of its M nearest points, then you can exclude the first point from the analysis.
Depending on the level of clustering or how big M is, this might reduce your data set quite a bit.
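One literal reading of that pruning rule in Python; the intended comparison is slightly ambiguous, so treat this as a sketch rather than a faithful implementation:

import numpy as np
from scipy.spatial import cKDTree

def prune_candidates(pts, m):
    tree = cKDTree(pts)
    d, _ = tree.query(pts, k=m + 1)  # column 0 is the point itself
    p_n = d[:, 1]                    # nearest-neighbour distance per point
    p_m = d[:, m]                    # distance to the m-th nearest neighbour
    # a point already farther from everything than some other point's m-th
    # neighbour will never be in the binding pair, so set it aside
    return [i for i in range(len(pts)) if p_n[i] <= p_m.min()]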

How to select points at a regular density

How do I select a subset of points at a regular density? More formally,
Given
a set A of irregularly spaced points,
a metric of distance dist (e.g., Euclidean distance),
and a target density d,
how can I select a smallest subset B that satisfies the following?
for every point x in A,
there exists a point y in B
which satisfies dist(x,y) <= d
My current best shot is to
start with A itself
pick out the closest (or just a particularly close) pair of points
randomly exclude one of them
repeat as long as the condition holds
and repeat the whole procedure several times, keeping the best result. But are there better ways?
I'm trying to do this with 280,000 18-D points, but my question is about the general strategy, so I would also like to know how to do it with 2-D points. And I don't really need a guarantee of a smallest subset; any useful method is welcome. Thank you.
bottom-up method
select a random point
select, among the unselected points, the y for which min(d(x,y) for x in selected) is largest
keep going!
I'll call this one bottom-up and the one I originally posted top-down. Bottom-up is much faster in the beginning, so for sparse sampling it should be better?
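For reference, a compact NumPy version of the bottom-up method (greedy farthest-point sampling), keeping the distance-to-nearest-selected array updated incrementally so each step costs O(N):

import numpy as np

def bottom_up(points, k):
    pts = np.asarray(points, dtype=float)
    sel = [np.random.randint(len(pts))]           # a random seed point
    dmin = np.linalg.norm(pts - pts[sel[0]], axis=1)
    while len(sel) < k:
        nxt = int(dmin.argmax())                  # farthest from selected
        sel.append(nxt)
        dmin = np.minimum(dmin, np.linalg.norm(pts - pts[nxt], axis=1))
    return sel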
performance measure
If guarantee of optimality is not required, I think these two indicators could be useful:
radius of coverage: max {y in unselected} min(d(x,y) for x in selected)
radius of economy: min {x in selected} min(d(x,y) for y in selected, y != x)
RC is the minimum allowed d (the selection is valid precisely for d >= RC), and there is no absolute inequality between the two, but RC <= RE is more desirable.
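Both indicators are straightforward to compute directly; a small sketch, with dist any metric callable and selected a set of indices:

def coverage_and_economy(pts, selected, dist):
    sel = set(selected)
    # RC: worst distance from an unselected point to its nearest selected one
    rc = max(min(dist(pts[y], pts[x]) for x in sel)
             for y in range(len(pts)) if y not in sel)
    # RE: closest pair among the selected points
    re = min(dist(pts[a], pts[b]) for a in sel for b in sel if a < b)
    return rc, re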
my little methods
For a little demonstration of that "performance measure", I generated 256 2-D points, distributed either uniformly or by a standard normal distribution, and tried my top-down and bottom-up methods on them. This is what I got:
RC is red, RE is blue, and the x axis is the number of selected points. Did you think bottom-up could be as good? I thought so watching the animation, but it seems top-down is significantly better (look at the sparse region). Still, bottom-up is not too horrible given that it's much faster.
Here I packed everything.
http://www.filehosting.org/file/details/352267/density_sampling.tar.gz
You can model your problem with a graph: treat the points as nodes and connect two nodes with an edge if their distance is smaller than d. Now you need to find a minimum set of vertices such that they, together with their adjacent vertices, cover all the nodes of the graph. This is the minimum vertex cover problem (which is NP-hard in general), but you can use the fast 2-approximation: repeatedly take both endpoints of a remaining edge into the cover, then remove them from the graph.
P.S.: of course, you should also select the nodes that are fully disconnected from the graph. After removing these nodes (i.e., selecting them), your problem is vertex cover.
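A sketch of that heuristic on the threshold graph (edges are built by a quadratic scan for clarity; isolated nodes are selected directly, as the P.S. says):

import math
from itertools import combinations

def matching_cover(points, d):
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if math.dist(points[i], points[j]) < d:
            adj[i].add(j)
            adj[j].add(i)
    chosen = {i for i in range(n) if not adj[i]}   # isolated nodes
    for i in range(n):
        if i not in chosen and adj[i]:
            j = next((k for k in adj[i] if k not in chosen), None)
            if j is not None:
                chosen.update((i, j))   # take both endpoints of an edge
    return chosen

Every node left unchosen then has all of its neighbours chosen, so every point of A ends up within d of a selected point.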
A genetic algorithm would probably produce good results here.
update:
I have been playing a little with this problem and these are my findings:
A simple method (call it random-selection) to obtain a set of points fulfilling the stated condition is as follows:
1. start with B empty
2. select a random point x from A and place it in B
3. remove from A every point y such that dist(x, y) < d
4. while A is not empty, go to 2
A kd-tree can be used to perform the lookups in step 3 relatively fast.
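A sketch of the random-selection loop using scipy's cKDTree for the step 3 lookup; iterating over a pre-shuffled order is equivalent to repeatedly picking a random remaining point:

import numpy as np
from scipy.spatial import cKDTree

def random_selection(pts, d):
    pts = np.asarray(pts)
    tree = cKDTree(pts)
    alive = np.ones(len(pts), dtype=bool)
    B = []
    for i in np.random.permutation(len(pts)):
        if alive[i]:
            B.append(i)
            # step 3: drop every point within d of the chosen one
            alive[tree.query_ball_point(pts[i], d)] = False
    return pts[B]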
The experiments I have run in 2D show that the subsets generated are approximately half the size of the ones generated by your top-down approach.
Then I have used this random-selection algorithm to seed a genetic algorithm that resulted in a further 25% reduction on the size of the subsets.
For mutation, given a chromosome representing a subset B, I randomly choose a hyperball inside the minimal axis-aligned hyperbox that covers all the points in A. Then I remove from B all the points that are also in the hyperball and use random-selection to complete it again.
For crossover I employ a similar approach, using a random hyperball to divide the mother and father chromosomes.
I have implemented everything in Perl using my wrapper for the GAUL library (GAUL can be obtained from here).
The script is here: https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_density.pl
It accepts a list of n-dimensional points from stdin and generates a collection of pictures showing the best solution for every iteration of the genetic algorithm. The companion script https://github.com/salva/p5-AI-GAUL/blob/master/examples/point_gen.pl can be used to generate the random points with a uniform distribution.
Here is a proposal that assumes the Manhattan distance metric:
1. Divide up the entire space into a grid of granularity d. Formally: partition A so that points (x1,...,xn) and (y1,...,yn) are in the same partition exactly when (floor(x1/d),...,floor(xn/d)) = (floor(y1/d),...,floor(yn/d)).
2. Pick one point (arbitrarily) from each grid space; that is, choose a representative from each set in the partition created in step 1. Don't worry if some grid spaces are empty! Simply don't choose a representative for that space.
In practice, the implementation won't have to do any real work for step 1, and step 2 can be done in one pass through the points, using a hash of the partition identifier (the tuple (floor(x1/d),...,floor(xn/d))) to check whether we have already chosen a representative for a particular grid space, so this can be very, very fast.
Some other distance metrics may admit an adapted approach. For example, the Euclidean metric could use grids of size d/sqrt(n). In this case, you might want to add a post-processing step that tries to reduce the cover a bit (since the grid cells described above are no longer exactly radius-d balls; the balls overlap neighboring cells a bit), but I'm not sure what that part would look like.
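The hashing in step 2 takes only a few lines in Python; side is the grid granularity (d for the Manhattan construction above, d/sqrt(n) for the Euclidean adaptation):

def grid_cover(points, side):
    reps = {}
    for p in points:
        cell = tuple(int(c // side) for c in p)  # the partition identifier
        reps.setdefault(cell, p)                 # first point seen wins
    return list(reps.values())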
To be lazy, this can be cast as a set cover problem, which can be handled by a mixed-integer solver/optimizer. Here is a GNU MathProg model for the GLPK LP/MIP solver, where C[i] denotes which points can "satisfy" point i.
param N, integer, > 0;
set C{1..N};
var x{i in 1..N}, binary;
s.t. cover{i in 1..N}: sum{j in C[i]} x[j] >= 1;
minimize goal: sum{i in 1..N} x[i];
With 1000 normally distributed points, it didn't find the optimal subset within 4 minutes, but it did report the true minimum, and the subset it had selected contained only one more point than that.
