I have to find the best algorithm to pair the items of two lists, as in the figure. A pair is valid only if the value of the node in list A is lower than the value of the node in list B, and links must not cross. The quality of the matching algorithm is determined by the total number of links.
I first tried a very simple algorithm: take a node in list A and look for the first node in list B that is higher than it. The second figure shows a test case where this algorithm does not give the best result.
Simple backtracking can work (it may not be the most efficient approach, but it will certainly work).
For each legal pairing A[i], B[j], there are two choices:
take it, which makes it illegal to pair any A[x], B[y] that would cross it (x>i and y<j, or x<i and y>j)
do not take it, and look at other possible pairs
By incrementally adding legal pairs to a bunch of pairs, you will eventually exhaust all legal pairings down a path. The number of valid pairings in a path is what you seek to maximize, and this algorithm will look at all possible answers and is guaranteed to work.
Pseudocode:
function search(currentPairs):
    bestPairing = currentPairs
    for each currently legal pair:
        nextPairing = search(copyOf(currentPairs) + this pair)
        if length of nextPairing > length of bestPairing:
            bestPairing = nextPairing
    return bestPairing
Initially, you will pass an empty currentPairs. Searching for legal pairs is the tricky part. You can use 3 nested loops that look at all A[x], B[y] and, if A[x] < B[y], check against all currentPairs to see if there is a crossing line (the cost of this is roughly O(n^3)); or you can use a boolean matrix of valid pairings, which you update at each level (less computation time, down to O(n^2), but more expensive in terms of memory).
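As a rough illustration of the search described above, here is a minimal Python sketch. It is not the exact pseudocode: instead of testing each candidate against every current pair, it assumes pairs are always added in increasing order of both indices, which rules out crossings and reused nodes by construction. The name `search` mirrors the pseudocode; the argument layout (`a`, `b`, `current`) is an assumption made for this example.

```python
def search(a, b, current=()):
    """Return the longest tuple of non-crossing pairs (i, j) with a[i] < b[j]."""
    # Pairs are kept in increasing order, so a new pair only has to start
    # after the last one on both sides to stay legal.
    last_i, last_j = current[-1] if current else (-1, -1)
    best = current
    for i in range(last_i + 1, len(a)):
        for j in range(last_j + 1, len(b)):
            if a[i] < b[j]:
                candidate = search(a, b, current + ((i, j),))
                if len(candidate) > len(best):
                    best = candidate
    return best


if __name__ == "__main__":
    print(search([10, 1, 1], [2, 2, 101]))
```

The search is still exponential in the worst case, so this is only meant for small inputs or as a reference to check faster solutions against.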
Here is a Java implementation.
For convenience, I first build a map with the valid choices in b for each entry of list (array) a.
Then I loop through the list, trying both no choice and each of the valid choices for a connection to b.
Since you can't go back without crossing the existing connections, I keep track of the maximum index assigned in b.
Works at least for the two examples...
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class ListMatcher {
private int[] a ;
private int[] b ;
private Map<Integer,List<Integer>> choicesMap;
public ListMatcher(int[] a, int[] b) {
this.a = a;
this.b = b;
choicesMap = makeMap(a,b);
}
public Map<Integer,Integer> solve() {
Map<Integer,Integer> solution = new HashMap<>();
return solve(solution, 0, -1);
}
private Map<Integer,Integer> solve(Map<Integer,Integer> soFar, int current, int max) {
// done
if (current >= a.length) {
return soFar;
}
// make no choice from this entry
Map<Integer, Integer> solution = solve(new HashMap<>(soFar),current+1, max);
for (Integer choice : choicesMap.get(current)) {
if (choice > max) // can't go back
{
Map<Integer,Integer> next = new HashMap<>(soFar);
next.put(current, choice);
next = solve(next, current+1, choice);
if (next.size() > solution.size()) {
solution = next;
}
}
}
return solution;
}
// init possible choices
private Map<Integer, List<Integer>> makeMap(int[] a, int[] b) {
Map<Integer,List<Integer>> possibleMap = new HashMap<>();
for(int i = 0; i < a.length; i++) {
List<Integer> possible = new ArrayList<>();
for(int j = 0; j < b.length; j++) {
if (a[i] < b[j]) {
possible.add(j);
}
}
possibleMap.put(i, possible);
}
return possibleMap;
}
public static void main(String[] args) {
ListMatcher matcher = new ListMatcher(new int[]{3,7,2,1,5,9,2,2},new int[]{4,5,10,1,12,3,6,7});
System.out.println(matcher.solve());
matcher = new ListMatcher(new int[]{10,1,1,1,1,1,1,1},new int[]{2,2,2,2,2,2,2,101});
System.out.println(matcher.solve());
}
}
Output
(format: zero-based index_in_a=index_in_b)
{2=0, 3=1, 4=2, 5=4, 6=5, 7=6}
{1=0, 2=1, 3=2, 4=3, 5=4, 6=5, 7=6}
Your solution isn't picked because the solutions making no choice are picked first.
You can change this by processing the loop first...
Thanks to David's suggestion, I finally found the algorithm. It is an LCS approach, replacing the '=' with an '>'.
Recursive approach
The recursive approach is very straightforward. G and V are the two vectors with size n and m (adding a 0 at the beginning of both). Starting from the end, if last from G is larger than last from V, then return 1 + the function evaluated without the last item, otherwise return max of the function removing last from G or last from V.
int evaluateMaxRecursive(vector<int> V, vector<int> G, int n, int m) {
if ((n == 0) || (m == 0)) {
return 0;
}
else {
if (V[n] < G[m]) {
return 1 + evaluateMaxRecursive(V, G, n - 1, m - 1);
} else {
return max(evaluateMaxRecursive(V, G, n - 1, m), evaluateMaxRecursive(V, G, n, m - 1));
}
}
}
The recursive approach is only practical for a small number of items, because the same sublists are re-evaluated many times during the recursion.
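One way to keep the recursive formulation while avoiding the repeated work is to cache the result for each (n, m) pair. The following Python sketch shows the idea under a few assumptions: the names `max_links` and `best` are made up for this example, and the lists are plain 0-based sequences without the leading 0 used in the C++ version above.

```python
from functools import lru_cache


def max_links(a, b):
    """Maximum number of non-crossing links with a[i] < b[j]."""
    a, b = tuple(a), tuple(b)

    @lru_cache(maxsize=None)
    def best(n, m):
        # best(n, m) considers only the first n items of a and first m of b.
        if n == 0 or m == 0:
            return 0
        if a[n - 1] < b[m - 1]:
            return 1 + best(n - 1, m - 1)
        return max(best(n - 1, m), best(n, m - 1))

    return best(len(a), len(b))


if __name__ == "__main__":
    print(max_links([10, 1, 1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2, 2, 2, 101]))
```

With the cache, each (n, m) state is computed once, so the work is bounded by the size of the table rather than growing exponentially.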
Non recursive approach
The non-recursive approach goes in the opposite direction and works with a table that is filled in after the first row and first column have been cleared to 0. The maximum is the value in the bottom-right corner of the table, table[n][m].
int evaluateMax(vector<int> V, vector<int> G, int n, int m) {
// (n + 1) x (m + 1) table, zero-initialised so that row 0 and column 0 stay 0
vector<vector<int>> table(n + 1, vector<int>(m + 1, 0));
for (int i = 1; i < m + 1; i++)
for (int t = 1; t < n + 1; t++) {
if (G[i - 1] > V[t - 1]) {
table[t] [i] = 1 + table[t - 1][i - 1];
}
else {
table[t][i] = max(table[t][i - 1], table[t - 1][i]);
}
}
return table[n][m];
}
More details can be found in the Wikipedia article on the longest common subsequence (LCS) problem.
Starting from the green square, I want an efficient algorithm to find the nearest 3 x 3 window with no red squares, only blue squares. The algorithm doesn't need to find exactly the closest 3 x 3 window, but it should find a 3 x 3 all-blue window close to the green square (assuming one exists). I thought about implementing this as a recursive breadth-first search, but that solution would involve re-checking the same square many times. Posting this to see if someone knows of a more efficient solution. Cost to check a given square is constant and cheap, but I want to minimize the execution time of the algorithm as much as possible (practical application of this will involve finding a 3x3 "clear" / all-blue window within a much larger 2D search area).
Here's an example solution, but I don't think it's optimal. It's actually a depth-first search that I will have to restructure to convert to a breadth-first, but I need to think a bit more about how to do that (one way would be to make each point an object that expands to neighboring points, then iterate multiple times across those points to children, visit those children before allowing those children to generate more children). Point is that I think there's a more efficient and common way to do this so I'm trying to avoid reinventing the wheel.
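import java.awt.Point;
import java.util.HashSet;
import java.util.Set;
public class Search2D {
// HashSet rather than TreeSet, assuming java.awt.Point, which is not Comparable
private final Set<Point> centerpointscheckedsofar = new HashSet<>();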
private Point Search(Point centerpoint) {
if(centerpointscheckedsofar.contains(centerpoint)) {
return null;
}
if(isWithinBounds(centerpoint)) {
// remember every visited center so it is never expanded twice
centerpointscheckedsofar.add(centerpoint);
if(checkCenterPoint(centerpoint)) {
return centerpoint;
}
Point result = Search(getPoint(-1, -1, centerpoint));
if(result != null) return result;
result = Search(getPoint(-1, 0, centerpoint));
if(result != null) return result;
result = Search(getPoint(-1, 1, centerpoint));
if(result != null) return result;
result = Search(getPoint(0, -1, centerpoint));
if(result != null) return result;
result = Search(getPoint(0, 1, centerpoint));
if(result != null) return result;
result = Search(getPoint(1, -1, centerpoint));
if(result != null) return result;
result = Search(getPoint(1, 0, centerpoint));
if(result != null) return result;
result = Search(getPoint(1, 1, centerpoint));
if(result != null) return result;
}
return null;
}
private Point getPoint(int x, int y, Point centerpoint) {
return new Point(centerpoint.x + x, centerpoint.y + y);
}
private boolean checkCenterPoint(Point centerpoint) {
//check here to see if point is valid
return false;
}
private boolean isWithinBounds(Point startPoint) {
//check here to see if point and all neighboring points of 3 x 3 window falls within bounds
return false;
}
}
UPDATE:
Distance measure is not that important, but for simplicity, let's minimize Manhattan distance.
Here's a better algorithm that does not use recursion and will be guaranteed to find the closest solution (or one of the closest solutions if there is a tie). It needs a grid greater than 5 x 5 to work properly, but if you want to search a grid smaller than that, there's probably a more efficient algorithm that can be used. Assumes lowest x-index is 0 and lowest y-index is also 0.
import java.awt.Point;
public class Search2D_v2 {
private boolean[][] bitgrid;
public Search2D_v2() {
bitgrid = new boolean[20][20];
}
public Point search(int centerx, int centery, int maxx, int maxy, int maxsearchsteps) {
//check starting point first, if it works, we're done
if(checkPoint(centerx, centery)) {
return new Point(centerx, centery);
}
int westbound = centerx-1;
boolean keepgoingwest = true;
int eastbound = centerx+1;
boolean keepgoingeast = true;
int southbound = centery-1;
boolean keepgoingsouth = true;
int northbound = centery+1;
boolean keepgoingnorth = true;
//stay within bounds, may move initial search square by 1 east and 1 west
if(westbound <= 0) {
eastbound = 3;
westbound = 1;
}
if(eastbound >= maxx) {
eastbound = maxx - 1;
westbound = maxx - 3;
}
if(southbound == 0) {
northbound = 3;
southbound = 1;
}
if(northbound == maxy) {
northbound = maxy - 1;
southbound = maxy - 3;
}
//always search boundary, we've already searched inside the boundary on previous iterations, expand boundary by 1 step / square for each iteration
for(int i = 0; i < maxsearchsteps && (keepgoingwest || keepgoingeast || keepgoingsouth || keepgoingnorth); i++) {
//search top row
if(keepgoingnorth) { //if we have already hit the north bound, stop searching the top row
for(int x = westbound; x <= eastbound; x++) {
if(checkPoint(x, northbound)) {
return new Point(x, northbound);
}
}
}
//search bottom row
if(keepgoingsouth) {
for(int x = westbound; x <= eastbound; x++) {
if(checkPoint(x, southbound)) {
return new Point(x, southbound);
}
}
}
//search westbound
if(keepgoingwest) {
for(int y = southbound; y <= northbound; y++) {
if(checkPoint(westbound, y)) {
return new Point(westbound, y);
}
}
}
//search eastbound
if(keepgoingeast) {
for(int y = southbound; y <= northbound; y++) {
if(checkPoint(eastbound, y)) {
return new Point(eastbound, y);
}
}
}
//expand search area by one square on each side
if(westbound - 2 >= 0) {
westbound--;
}
else {
keepgoingwest = false;
}
if(eastbound + 2 <= maxx) {
eastbound++;
}
else {
keepgoingeast = false;
}
if(southbound - 2 >= 0) {
southbound--;
}
else {
keepgoingsouth = false;
}
if(northbound + 2 <= maxy) {
northbound++;
}
else {
keepgoingnorth = false;
}
}
return null; //failed to find a point
}
private boolean checkPoint(int centerx, int centery) {
return !bitgrid[centerx][centery] && //center
!bitgrid[centerx-1][centery-1] && //left lower
!bitgrid[centerx-1][centery] && //left middle
!bitgrid[centerx-1][centery+1] && //left upper
!bitgrid[centerx][centery-1] && //middle lower
!bitgrid[centerx][centery+1] && //middle upper
!bitgrid[centerx+1][centery-1] && //right lower
!bitgrid[centerx+1][centery] && //right middle
!bitgrid[centerx+1][centery+1]; //right upper
}
}
A simple piece of advice would be to mark all the cells you have checked. That way you won't have to check the same cells multiple times.
Recursion will usually take more time than an iterative approach, since each call pushes a new stack frame. If you are trying to find the closest window, prefer BFS over DFS.
I would also suggest a quick internet search for "Flood Fill Algorithm".
You could spiral outwards from your starting pixel. Whenever you encounter a pixel p that has not been checked, examine the 3x3 environment around p.
For each red pixel r in the environment set the 3x3 environment of r to checked.
If there was no red pixel in the environment you found a solution.
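The marking idea above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `find_clear_window` and the `red[r][c]` boolean layout are made up for the example, and candidate centres are visited ring by ring (sorted by Chebyshev distance from the start) rather than along a true spiral path.

```python
def find_clear_window(red, start_row, start_col):
    """Return the centre (row, col) of a nearby all-blue 3x3 window, or None.

    red[r][c] is True for a red (blocked) cell and False for a blue one.
    """
    rows, cols = len(red), len(red[0])
    checked = [[False] * cols for _ in range(rows)]

    # Candidate centres must leave room for the full 3x3 window.
    centres = [(r, c) for r in range(1, rows - 1) for c in range(1, cols - 1)]
    # Visiting centres ring by ring stands in for a true spiral walk.
    centres.sort(key=lambda p: max(abs(p[0] - start_row), abs(p[1] - start_col)))

    for r, c in centres:
        if checked[r][c]:
            continue
        checked[r][c] = True
        reds = [(rr, cc)
                for rr in range(r - 1, r + 2)
                for cc in range(c - 1, c + 2)
                if red[rr][cc]]
        if not reds:
            return (r, c)
        # Any window centred within one cell of a red cell contains it,
        # so those centres can be skipped without looking at them.
        for rr, cc in reds:
            for nr in range(max(rr - 1, 0), min(rr + 2, rows)):
                for nc in range(max(cc - 1, 0), min(cc + 2, cols)):
                    checked[nr][nc] = True
    return None
```

Sorting all centres up front costs O(n log n); walking the rings directly would avoid that, at the price of more index bookkeeping.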
What you're trying to find in a more general sense is a kind of morphological filter of your array.
We can define the filter as a 3x3 sliding window which sets the center of the window to the sum of the array elements within the window. Let blue squares be represented by 1 and red squares be represented by 0.
In this situation, you're trying to find the closest element with a sum value of 9.
Note that one way of solving this problem is to slide a 3x3 window across your array so that it covers all possible locations. In this case, you would look at 9*width*height elements. You could then find the nearest sum value of 9 using a breadth-first search in, at most, width*height checks. So the naive time of your algorithm is proportional to 10*width*height.
You can reduce this by ensuring that your filter only has to look at one value per focal cell, rather than 9. To do so, generate a summed-area table. Now your time is proportional to 2*width*height.
An example of a summed-area table
You might be able to make this faster. Each time you find a value of 9, compare its location against the location of your green cell at that moment. If most cells are not 9s, this reduces your time to something proportional to width*height.
Hensley et al. (2005)'s paper Fast Summed-Area Table Generation and its Applications explains how to use graphics hardware to generate the summed-area table in O(log n) time. So it's possible to really reduce run-times on this. Nehab et al. (2011)'s paper GPU-efficient recursive filtering and summed-area tables might also be useful (source code): their work suggests that for small windows, such as yours, the direct approach may be most efficient.
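The summed-area-table idea can be sketched on the CPU in a few lines of Python. This is a minimal illustration, not the GPU approach from the papers: the function names are hypothetical, blue cells are 1 and red cells are 0 as defined above, and the nearest window is chosen by Manhattan distance between its centre and the start cell.

```python
def summed_area_table(grid):
    """sat[r][c] holds the sum of grid[0..r-1][0..c-1] (one extra row/column)."""
    rows, cols = len(grid), len(grid[0])
    sat = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (grid[r][c] + sat[r][c + 1]
                                 + sat[r + 1][c] - sat[r][c])
    return sat


def nearest_clear_centre(grid, start_row, start_col, size=3):
    """Centre of the all-ones size x size window nearest to the start, or None."""
    sat = summed_area_table(grid)
    rows, cols = len(grid), len(grid[0])
    half = size // 2
    best = None
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            # Four table lookups replace summing size * size cells.
            total = (sat[top + size][left + size] - sat[top][left + size]
                     - sat[top + size][left] + sat[top][left])
            if total == size * size:
                centre = (top + half, left + half)
                dist = abs(centre[0] - start_row) + abs(centre[1] - start_col)
                if best is None or dist < best[0]:
                    best = (dist, centre)
    return None if best is None else best[1]
```

Because the table is independent of the start cell, it can be built once and reused for many queries on the same grid.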
I think the easiest way is to use a slightly modified breadth-first search.
If we talk about Manhattan distance, then each square will have maximum 4 neighbors. On each step we check if the number of neighbors is equal to 3 (the fourth neighbor is a square we came from). If so, we check diagonals. Else - continue search.
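import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
public class Field3x3 {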
private static class Point {
int x, y, distance;
Point previous;
public Point(int x, int y) {
this.x = x;
this.y = y;
this.distance = 0;
this.previous = this;
}
public Point(int x, int y, Point previous) {
this.x = x;
this.y = y;
this.previous = previous;
this.distance = previous.distance + 1;
}
@Override
public String toString() {
return "{x: " + x +", y: " + y + ", distance:" + distance +'}';
}
}
private static Point traverse(int[][] field, int x, int y) {
int i = 0;
Queue<Point> q = new LinkedList<>();
q.add(new Point(x, y));
while (!q.isEmpty()) {
Point p = q.remove();
System.out.print(i++ + ". current: " + p);
if (field[p.y][p.x] == 1) {
field[p.y][p.x] = 2;
List<Point> neighbors = getNeighbors(p, field);
System.out.println(", neighbors: " + neighbors);
if (neighbors.size() == 3 && checkDiagonals(p, field)) return p;
for (Point neighbor : neighbors) {
if (field[neighbor.y][neighbor.x] == 1) {
q.add(neighbor);
}
}
} else System.out.println(", already visited");
}
return null;
}
private static boolean checkDiagonals(Point p, int[][] field) {
return field[p.y - 1][p.x - 1] > 0 && field[p.y + 1][p.x - 1] > 0
&& field[p.y - 1][p.x + 1] > 0 && field[p.y + 1][p.x + 1] > 0;
}
private static List<Point> getNeighbors(Point p, int[][] field) {
List<Point> neighbors = new ArrayList<>();
if (p.y > 0 && field[p.y - 1][p.x] > 0 && p.y <= p.previous.y)
neighbors.add(new Point(p.x, p.y - 1, p));
if (p.y < field.length - 1 && field[p.y + 1][p.x] > 0 && p.y >= p.previous.y)
neighbors.add(new Point(p.x, p.y + 1, p));
if (p.x > 0 && field[p.y][p.x - 1] > 0 && p.x <= p.previous.x)
neighbors.add(new Point(p.x - 1, p.y, p));
if (p.x < field[p.y].length - 1 && field[p.y][p.x + 1] > 0 && p.x >= p.previous.x)
neighbors.add(new Point(p.x + 1, p.y, p));
return neighbors;
}
public static void main(String[] args){
int[][] field = {{1,0,0,1,1,0,1,1,1},
{1,1,1,1,1,1,1,0,1},
{1,1,1,0,1,0,1,1,1},
{0,1,1,1,1,1,1,1,0},
{1,1,1,0,0,1,1,1,0},
{1,0,1,1,1,1,0,1,0},
{1,1,1,1,0,1,1,1,0},
{1,1,1,0,1,1,1,1,0},
{1,1,1,1,0,1,1,1,0}};
System.out.println("Answer: " + traverse(field, 1, 2));
}
}
I am looking at the following interview question:
Given 2D coordinates, find the k points which are closest to the origin. Propose a data structure for storing the points and the method to get the k points. Also point out the complexity of the code.
The solution that I have figured out is to save the 2D points in an array. For the first k points, find the distance of each point from the origin and build a max heap. For each remaining point, calculate its distance from the origin, say dist. If dist is less than the topmost (largest) element of the heap, replace the topmost element with dist and run the heapify() procedure.
This would take O(k) to build the heap and O((n-k) log k) for the heapify() calls, thus the total complexity is O(n log k).
Can anyone suggest a better data structure and/or method, possibly with better efficiency too?
EDIT
Would some other data structure be beneficial here ?
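For reference, the heap-based approach described in the question could be sketched in Python like this. It is only an illustration: the function name `k_closest` is made up, squared distances are used in place of distances (the ordering is the same), and since `heapq` is a min-heap the distances are negated to get max-heap behaviour.

```python
import heapq


def k_closest(points, k):
    """Return the k points closest to the origin, in no particular order."""
    if k <= 0:
        return []
    heap = []  # entries are (-dist_sq, point); heap[0] is the farthest kept point
    for x, y in points:
        dist_sq = x * x + y * y
        if len(heap) < k:
            heapq.heappush(heap, (-dist_sq, (x, y)))
        elif dist_sq < -heap[0][0]:
            # Closer than the farthest kept point: replace it in O(log k).
            heapq.heapreplace(heap, (-dist_sq, (x, y)))
    return [point for _, point in heap]
```

The heap never holds more than k entries, so the extra memory is O(k) and each point costs at most O(log k).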
What you're looking for is partial sorting.
I think the best way is to put everything into an unsorted array and then use a modified in-place quicksort which ignores partitions whose indices are entirely above or entirely below k, using distance from the origin as your comparison.
Pseudocode from the wikipedia article above:
function quickfindFirstK(list, left, right, k)
    if right > left
        select pivotIndex between left and right
        pivotNewIndex := partition(list, left, right, pivotIndex)
        if pivotNewIndex > left + k // new condition
            quickfindFirstK(list, left, pivotNewIndex-1, k)
        if pivotNewIndex < left + k
            quickfindFirstK(list, pivotNewIndex+1, right, k+left-pivotNewIndex-1)
After execution, this will leave the smallest k items in the first k positions, but not in order.
I would use order statistics for this one.
Note that we use a modified SELECT that uses distance from the origin as the comparison function.
Store the elements in an array A, first element is A[1] and last is A[n].
Run SELECT(A,1,n,k) to find the kth closest element to the origin.
Return the elements A[1..k].
One of the benefits of SELECT is that it partitions the input, so that the smallest k-1 elements are left of A[k].
So storing the elements in an array is O(n).
Running SELECT is O(n).
Returning the requested elements is O(1).
I wrote a simple version for you using so-called 'partial sorting': http://tzutalin.blogspot.sg/2017/02/interview-type-questions-minqueue.html
public static void main(String[] args) {
Point[] points = new Point[7];
points[0] = new Point(0, 0);
points[1] = new Point(1, 7);
points[2] = new Point(2, 2);
points[3] = new Point(2, 2);
points[4] = new Point(3, 2);
points[5] = new Point(1, 4);
points[6] = new Point(1, 1);
int k = 3;
qSelect(points, k - 1);
for (int i = 0; i < k; i++) {
System.out.println("" + points[i].x + "," + points[i].y);
}
// Output will be
// 0,0
// 1,1
// 2,2
}
// in-place qselect and zero-based
static void qSelect(Point[] points, int k) {
int l = 0;
int h = points.length - 1;
while (l <= h) {
int partionInd = partition(l, h, points);
if (partionInd == k) {
return;
} else if (partionInd < k) {
l = partionInd + 1;
} else {
h = partionInd - 1;
}
}
}
static int partition(int l, int h, Point[] points) {
// Random can be better
// int p = l + new Random().nextInt(h - l + 1);
int p = l + (h - l) / 2;
int ind = l;
swap(p, h, points);
Point comparePoint = points[h];
for (int i = l; i < h; i++) {
if (points[i].getDistFromCenter() < comparePoint.getDistFromCenter()) {
swap(i, ind, points);
ind++;
}
}
swap(ind, h, points);
return ind;
}
static void swap(int i, int j, Point[] points) {
Point temp = points[i];
points[i] = points[j];
points[j] = temp;
}
I want to compute the distance of cells from a destination cell, measured as the number of four-way movements needed to reach it. So the four cells immediately adjacent to the destination have a distance of 1, and those on the four cardinal directions of each of them have a distance of 2 and so on. There is a maximum distance that might be around 16 or 20, and there are cells that are occupied by barriers; the distance can flow around them but not through them.
I want to store the output into a 2D array, and I want to be able to compute this 'distance map' for any destination on a bigger maze map very quickly.
I am successfully doing it with a variation on a flood fill where I place the incremental distance of the adjacent unfilled cells in a priority queue (using C++ STL).
I am happy with the functionality and now want to focus on optimizing the code, as it is very performance sensitive.
What cunning and fast approaches might there be?
I think you have done everything right. If you coded it correctly, it takes O(n) time and O(n) memory to compute the flood fill, where n is the number of cells, and it can be proven that it's impossible to do better (in the general case). After the fill is complete you can return the distance for any cell in O(1), and it is easy to see that this cannot be improved either.
So if you want to optimize performance, you can only focus on local code optimization, which will not affect the asymptotic complexity but can significantly improve your real execution time. But it's hard to give you any advice on code optimization without actually seeing the source.
So if you really want to see optimized code see the following (Pure C):
#include <stdlib.h>
int* BFS()
{
int N, M; // Assume we have NxM grid.
int X, Y; // Start position. X, Y are 1-based.
int i, j;
int movex[4] = {0, 0, 1, -1}; // Move on x dimension.
int movey[4] = {1, -1, 0, 0}; // Move on y dimension.
// TO DO: Read N, M, X, Y
// To reduce redundant functions calls and memory reallocation
// allocate all needed memory once and use a simple arrays.
int* map = (int*)malloc((N + 2) * (M + 2) * sizeof(int));
int leadDim = M + 2;
// Our map. We use one dimension array. map[x][y] = map[leadDim * x + y];
// If (x,y) is occupied then map[leadDim*x + y] = -1;
// If (x,y) is not visited map[leadDim*x + y] = -2;
int* queue = (int*)malloc(N * M * sizeof(int));
int first = 0, last =1;
// Fill the borders to simplify the code and reduce conditions
for (i = 0; i < N+2; ++i)
{
map[i * leadDim + 0] = -1;
map[i * leadDim + M + 1] = -1;
}
for (j = 0; j < M+2; ++j)
{
map[j] = -1;
map[(N + 1) * leadDim + j] = -1;
}
// TO DO: Read the map.
queue[first] = X * leadDim + Y;
map[X * leadDim + Y] = 0;
// Very simple optimized process loop.
while (first < last)
{
int current = queue[first];
int step = map[current];
for (i = 0; i < 4; ++i)
{
int temp = current + movex[i] * leadDim + movey[i];
if (map[temp] == -2) // only one condition in internal loop.
{
map[temp] = step + 1;
queue[last++] = temp;
}
}
++first;
}
free(queue);
return map;
}
The code may seem tricky. And of course, it doesn't look like OOP (I actually think that OOP fans will hate it), but if you want something really fast, that's what you need.
This is a common task for BFS. The complexity is O(cellsCount).
My C++ implementation:
vector<vector<int> > GetDistance(int x, int y, vector<vector<int> > cells)
{
const int INF = 0x7FFFFF;
vector<vector<int> > distance(cells.size());
for(int i = 0; i < distance.size(); i++)
distance[i].assign(cells[i].size(), INF);
queue<pair<int, int> > q;
q.push(make_pair(x, y));
distance[x][y] = 0;
while(!q.empty())
{
pair<int, int> curPoint = q.front();
q.pop();
int curDistance = distance[curPoint.first][curPoint.second];
for(int i = -1; i <= 1; i++)
for(int j = -1; j <= 1; j++)
{
if( (i + j) % 2 == 0 ) continue;
pair<int, int> nextPoint(curPoint.first + i, curPoint.second + j);
if(nextPoint.first >= 0 && nextPoint.first < cells.size()
&& nextPoint.second >= 0 && nextPoint.second < cells[nextPoint.first].size()
&& cells[nextPoint.first][nextPoint.second] != BARRIER
&& distance[nextPoint.first][nextPoint.second] > curDistance + 1)
{
distance[nextPoint.first][nextPoint.second] = curDistance + 1;
q.push(nextPoint);
}
}
}
return distance;
}
Start with a recursive implementation: (untested code)
int visit( int xy, int dist) {
int ret =1;
if (array[xy] <= dist) return 0;
array[xy] = dist;
if (dist == maxdist) return ret;
ret += visit ( RIGHT(xy) , dist+1);
...
same for left, up, down
...
return ret;
}
You'll need to handle the initialisation and the edge cases. And you have to decide whether you want a two-dimensional array or a one-dimensional array.
A next step could be to use a todo list and remove the recursion, and a third step could be to add some bitmasking.
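The "todo list" step might look roughly like the Python sketch below, which replaces the recursion with a FIFO queue and stops expanding at the maximum distance. The names (`distance_map`, `blocked`, `max_dist`) and the use of `None` for unreached cells are assumptions made for this example, not part of the code above.

```python
from collections import deque


def distance_map(blocked, dest_row, dest_col, max_dist):
    """Four-way step counts from the destination; None where unreached."""
    rows, cols = len(blocked), len(blocked[0])
    dist = [[None] * cols for _ in range(rows)]
    dist[dest_row][dest_col] = 0
    todo = deque([(dest_row, dest_col)])
    while todo:
        r, c = todo.popleft()
        d = dist[r][c]
        if d == max_dist:
            continue  # do not spread beyond the maximum distance
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and not blocked[nr][nc] and dist[nr][nc] is None):
                dist[nr][nc] = d + 1
                todo.append((nr, nc))
    return dist
```

Because every step costs the same, a plain FIFO queue already visits cells in order of increasing distance, so each cell is written exactly once.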
8-bit computers in the 1970s did this with an optimization that has the same algorithmic complexity, but in the typical case is much faster on actual hardware.
Starting from the initial square, scan to the left and right until "walls" are found. Now you have a "span" that is one square tall and N squares wide. Mark the span as "filled," in this case each square with the distance to the initial square.
For each square above and below the current span, if it's not a "wall" or already filled, pick it as the new origin of a span.
Repeat until no new spans are found.
Since horizontal rows tend to be stored contiguously in memory, this algorithm tends to thrash the cache far less than one that has no bias for horizontal searches.
Also, since in the most common cases far fewer items are pushed and popped from a stack (spans instead of individual blocks) there is less time spent maintaining the stack.
Given a set of points on a plane, find the shortest line segment formed by any two of these points.
How can I do that? The trivial way is obviously to calculate each distance, but I need another algorithm to compare.
http://en.wikipedia.org/wiki/Closest_pair_of_points
The problem can be solved in O(n log n) time using the recursive divide and conquer approach, e.g., as follows:
Sort points along the x-coordinate
Split the set of points into two equal-sized subsets by a vertical line x = xmid
Solve the problem recursively in the left and right subsets. This will give the left-side and right-side minimal distances dLmin and dRmin respectively.
Find the minimal distance dLRmin among the pair of points in which one point lies on the left of the dividing vertical and the second point lies to the right.
The final answer is the minimum among dLmin, dRmin, and dLRmin.
I can't immediately think of a quicker alternative to the brute-force technique (although there must be plenty), but whatever algorithm you choose, don't calculate the actual distance between each pair of points. If you only need to compare distances, compare the squares of the distances to avoid the expensive and entirely redundant square root.
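As a small illustration of comparing squared distances, here is a brute-force sketch in Python. The helper names `dist_sq` and `closest_pair_brute_force` are made up for this example, and points are assumed to be (x, y) tuples.

```python
from itertools import combinations


def dist_sq(p, q):
    """Squared Euclidean distance; enough for comparisons, no sqrt needed."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    return dx * dx + dy * dy


def closest_pair_brute_force(points):
    """Return the two closest points; expects at least two points."""
    return min(combinations(points, 2), key=lambda pair: dist_sq(*pair))
```

With integer coordinates this also keeps the whole comparison in exact integer arithmetic.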
One possibility would be to sort the points by their X coordinates (or the Y -- doesn't really matter which, just be consistent). You can then use that to eliminate comparisons to many of the other points. When you're looking at the distance between point[i] and point[j], if the X distance alone is greater than your current shortest distance, then point[j+1]...point[N] can be eliminated as well (assuming i<j -- if j<i, then it's point[0]...point[i] that are eliminated).
If your points start out as polar coordinates, you can use a variation of the same thing -- sort by distance from the origin, and if the difference in distance from the origin is greater than your current shortest distance, you can eliminate that point, and all the others that are farther from (or closer to) the origin than the one you're currently considering.
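The X-sorted pruning could be sketched in Python as below. This is a hedged illustration of the idea rather than the answerer's code: the function name is hypothetical, points are assumed to be (x, y) tuples, and squared distances are compared so that no square root is needed.

```python
def closest_pair_sorted(points):
    """Return the closest pair of (x, y) points, or None for fewer than two."""
    pts = sorted(points)  # by x, ties broken by y
    best_sq = float("inf")
    best = None
    for i in range(len(pts)):
        px, py = pts[i]
        for j in range(i + 1, len(pts)):
            qx, qy = pts[j]
            dx = qx - px
            if dx * dx >= best_sq:
                break  # all later points are at least this far away in x
            d_sq = dx * dx + (qy - py) * (qy - py)
            if d_sq < best_sq:
                best_sq, best = d_sq, (pts[i], pts[j])
    return best
```

The worst case is still quadratic (for example, when all points share the same x), but on spread-out data the inner loop usually stops after a few steps.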
You can extract the closest pair in linear time from the Delaunay triangulation, and conversely from the Voronoi diagram.
There is a standard algorithm for this problem, here you can find it:
http://www.cs.mcgill.ca/~cs251/ClosestPair/ClosestPairPS.html
And here is my implementation of this algo, sorry it's without comments:
static long distSq(Point a, Point b) {
return ((long) (a.x - b.x) * (long) (a.x - b.x) + (long) (a.y - b.y) * (long) (a.y - b.y));
}
static long ccw(Point p1, Point p2, Point p3) {
return (long) (p2.x - p1.x) * (long) (p3.y - p1.y) - (long) (p2.y - p1.y) * (long) (p3.x - p1.x);
}
static List<Point> convexHull(List<Point> P) {
if (P.size() < 3) {
// fewer than 3 points: no hull to build
return null;
}
int k = 0;
for (int i = 0; i < P.size(); i++) {
if (P.get(i).y < P.get(k).y || (P.get(i).y == P.get(k).y && P.get(i).x < P.get(k).x)) {
k = i;
}
}
Collections.swap(P, k, P.size() - 1);
final Point o = P.get(P.size() - 1);
P.remove(P.size() - 1);
Collections.sort(P, new Comparator() {
public int compare(Object o1, Object o2) {
Point a = (Point) o1;
Point b = (Point) o2;
long t1 = (long) (a.y - o.y) * (long) (b.x - o.x) - (long) (a.x - o.x) * (long) (b.y - o.y);
if (t1 == 0) {
long tt = distSq(o, a);
tt -= distSq(o, b);
if (tt > 0) {
return 1;
} else if (tt < 0) {
return -1;
}
return 0;
}
if (t1 < 0) {
return -1;
}
return 1;
}
});
List<Point> hull = new ArrayList<Point>();
hull.add(o);
hull.add(P.get(0));
for (int i = 1; i < P.size(); i++) {
while (hull.size() >= 2 &&
ccw(hull.get(hull.size() - 2), hull.get(hull.size() - 1), P.get(i)) <= 0) {
hull.remove(hull.size() - 1);
}
hull.add(P.get(i));
}
return hull;
}
static long nearestPoints(List<Point> P, int l, int r) {
if (r - l == P.size()) {
Collections.sort(P, new Comparator() {
public int compare(Object o1, Object o2) {
int t = ((Point) o1).x - ((Point) o2).x;
if (t == 0) {
return ((Point) o1).y - ((Point) o2).y;
}
return t;
}
});
}
if (r - l <= 100) {
long ret = distSq(P.get(l), P.get(l + 1));
for (int i = l; i < r; i++) {
for (int j = i + 1; j < r; j++) {
ret = Math.min(ret, distSq(P.get(i), P.get(j)));
}
}
return ret;
}
int c = (l + r) / 2;
long lD = nearestPoints(P, l, c);
long lR = nearestPoints(P, c + 1, r);
long ret = Math.min(lD, lR);
Set<Point> set = new TreeSet<Point>(new Comparator<Point>() {
public int compare(Point o1, Point o2) {
int t = o1.y - o2.y;
if (t == 0) {
return o1.x - o2.x;
}
return t;
}
});
for (int i = l; i < r; i++) {
set.add(P.get(i));
}
int x = P.get(c).x;
double theta = Math.sqrt(ret);
Point[] Q = set.toArray(new Point[0]);
Point[] T = new Point[Q.length];
int pos = 0;
for (int i = 0; i < Q.length; i++) {
if (Math.abs(Q[i].x - x) > theta) {
continue;
}
T[pos++] = Q[i];
}
for (int i = 0; i < pos; i++) {
for (int j = 1; j < 7 && i + j < pos; j++) {
ret = Math.min(ret, distSq(T[i], T[j + i]));
}
}
return ret;
}
From your question it is not clear whether you are looking for the length of the segment or the segment itself. Assuming you are looking for the distance (getting the segment is then a simple modification, once you know which two points are closest), given 5 points, numbered from 1 to 5, you need to
compare 1 with 2,3,4,5, then
compare 2, with 3,4,5, then
compare 3 with 4,5, then
compare 4 with 5.
If I am not wrong, given the symmetry of the distance you do not need to perform any other comparisons.
In Python, it may look something like this:
import numpy as np

def find_min_distance_of_a_cloud(cloud):
    """
    Given a cloud of points in the n-dim space, provides the minimal distance.
    :param cloud: list of nX1-d vectors, as ndarray.
    :return:
    """
    dist_min = None
    for i, p_i in enumerate(cloud[:-1]):
        new_dist_min = np.min([np.linalg.norm(p_i - p_j) for p_j in cloud[(i + 1):]])
        if dist_min is None or dist_min > new_dist_min:
            dist_min = new_dist_min
    return dist_min
That can be tested with something like the following code:
from nose.tools import assert_equal

def test_find_min_distance_of_a_cloud_1pt():
    cloud = [np.array((1, 1, 1)), np.array((0, 0, 0))]
    min_out = find_min_distance_of_a_cloud(cloud)
    assert_equal(min_out, np.sqrt(3))

def test_find_min_distance_of_a_cloud_5pt():
    cloud = [np.array((0, 0, 0)),
             np.array((1, 1, 0)),
             np.array((2, 1, 4)),
             np.array((3, 4, 4)),
             np.array((5, 3, 4))]
    min_out = find_min_distance_of_a_cloud(cloud)
    assert_equal(min_out, np.sqrt(2))
If more than two points can have the same minimal distance, and you are looking for the segments, you need again to modify the proposed code, and the output will be the list of points whose distance is minimal (or couple of points). Hope it helps!
Here is a code example demonstrating how to implement the divide and conquer algorithm. For the algorithm to work, the points' x-values must be unique. The non-obvious part of the algorithm is that you must sort along both the x-axis and the y-axis. Otherwise you can't find minimum distances across the split seam in linear time.
from collections import namedtuple
from itertools import combinations
from math import sqrt

IxPoint = namedtuple('IxPoint', ['x', 'y', 'i'])
ClosestPair = namedtuple('ClosestPair', ['distance', 'i', 'j'])

def check_distance(cp, p1, p2):
    xd = p1.x - p2.x
    yd = p1.y - p2.y
    dist = sqrt(xd * xd + yd * yd)
    if dist < cp.distance:
        return ClosestPair(dist, p1.i, p2.i)
    return cp

def closest_helper(cp, xs, ys):
    n = len(xs)
    if n <= 3:
        for p1, p2 in combinations(xs, 2):
            cp = check_distance(cp, p1, p2)
        return cp
    # Divide
    mid = n // 2
    mid_x = xs[mid].x
    xs_left = xs[:mid]
    xs_right = xs[mid:]
    ys_left = [p for p in ys if p.x < mid_x]
    ys_right = [p for p in ys if p.x >= mid_x]
    # Conquer
    cp_left = closest_helper(cp, xs_left, ys_left)
    cp_right = closest_helper(cp, xs_right, ys_right)
    if cp_left.distance < cp_right.distance:
        cp = cp_left
    else:
        cp = cp_right
    ys_strip = [p for p in ys if abs(p.x - mid_x) < cp.distance]
    n_strip = len(ys_strip)
    for i in range(n_strip):
        for j in range(i + 1, n_strip):
            p1, p2 = ys_strip[j], ys_strip[i]
            if not p1.y - p2.y < cp.distance:
                break
            cp = check_distance(cp, p1, p2)
    return cp

def closest_pair(points):
    points = [IxPoint(p[0], p[1], i)
              for (i, p) in enumerate(points)]
    xs = sorted(points, key = lambda p: p.x)
    xs = [IxPoint(p.x + i * 1e-8, p.y, p.i)
          for (i, p) in enumerate(xs)]
    ys = sorted(xs, key = lambda p: p.y)
    cp = ClosestPair(float('inf'), -1, -1)
    return closest_helper(cp, xs, ys)