How to find a unique set of closest pairs of points?

A and B are sets of m and n points respectively, where m<=n. I want to find a set of m unique points from B, named C, such that the sum of distances over all [A(i), C(i)] pairs is minimal.
To solve this without uniqueness constraint I can just find closest points from B to each point in A:
m = 5; n = 8; dim = 2;
A = rand(m, dim);
B = rand(n, dim);
D = pdist2(A, B);
[~, I] = min(D, [], 2);
C2 = B(I, :);
Where there may be repeated elements of B present in C. Now the first solution is brute-force search:
minSumD = inf;
allCombs = nchoosek(1:n, m);
for i = 1:size(allCombs, 1)
    allPerms = perms(allCombs(i, :));
    for j = 1:size(allPerms, 1)
        ind = sub2ind([m n], 1:m, allPerms(j, :));
        sumD = sum(D(ind));
        if sumD<minSumD
            minSumD = sumD;
            I = allPerms(j, :);
        end
    end
end
C = B(I, :);
I expect C2 (the set of closest points to each A(i)) to be much like C, apart from its repeated points. So how can I decrease the computation time?

Use a variant of the Hungarian algorithm, which computes a minimum/maximum weight perfect matching. Create n-m dummy points for the unused B points to match with (or, if you're willing to put in more effort, adapt the Hungarian algorithm machinery to non-square matrices).
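A minimal sketch of that suggestion in Python, assuming Euclidean distances as in the question's pdist2 call. The function name `min_cost_assignment` is made up for illustration. It uses the potentials-based form of the Hungarian algorithm, which accepts a rectangular m-by-n cost matrix directly (m <= n) and runs in O(m^2 n), so no dummy points are needed in this variant.

```python
import math
import random


def min_cost_assignment(cost):
    """Return cols such that row i is matched to column cols[i].

    Needs len(cost) <= len(cost[0]); every row gets a distinct column.
    """
    m, n = len(cost), len(cost[0])
    INF = float("inf")
    u = [0.0] * (m + 1)   # row potentials (1-based)
    v = [0.0] * (n + 1)   # column potentials (1-based)
    p = [0] * (n + 1)     # p[j] = row matched to column j (0 means free)
    way = [0] * (n + 1)   # previous column on the augmenting path
    for i in range(1, m + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free column: augment
                break
        while j0:               # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    cols = [0] * m
    for j in range(1, n + 1):
        if p[j]:
            cols[p[j] - 1] = j - 1
    return cols


# Usage mirroring the question: D is the m-by-n distance matrix.
random.seed(0)
m, n = 4, 6
A = [(random.random(), random.random()) for _ in range(m)]
B = [(random.random(), random.random()) for _ in range(n)]
D = [[math.dist(a, b) for b in B] for a in A]
I = min_cost_assignment(D)
C = [B[j] for j in I]   # unique points of B, one per point of A
```

For small inputs the result can be checked against the brute-force search from the question; for large inputs only the assignment routine is practical.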

Related

How to vectorize these two nested loops?

Vectorize the loop without use of any for loop to get v:
v = zeros(10, 1)
for i = 1:10
    for j = 1:10
        v(i) = v(i) + A(i, j) * x(j)
    end
end
A is a 10x10 matrix and x a 10x1 vector
I have been trying but was not able to figure out the right answer:
v = A * x;
v = Ax;
v = x' * A;
v = sum (A * x);

Proceed step by step, starting with the inner loop. The inner loop computes a dot product between A(i,:) and x(:). In Octave notation it can be expressed by a simple multiplication: v(i) = A(i,:)*x(:). So we are left with only one loop:
v = zeros(10, 1)
for i = 1:10
    v(i) = A(i,:)*x(:)
end
Each iteration computes the ith element of v as the dot product of the ith row of A with x. We recognize here the classical matrix-vector multiplication:
v(:) = A(:,:)*x(:)
And since there are no more explicit indices, all the : can be omitted (but as mentioned by @ChrisLuengo in the comments, it can be wise to keep it for x, as x(:) is always a column vector, even if x has been defined as a "row vector", i.e. as a 1x10 matrix):
v = A*x
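The same three steps can be replayed in plain Python to confirm that each rewrite computes the same v. This is only an illustrative sketch: the helper names `dot` and `matvec` are made up here, and nested lists stand in for the Octave matrix.

```python
import random

random.seed(1)
n = 10
A = [[random.random() for _ in range(n)] for _ in range(n)]
x = [random.random() for _ in range(n)]

# Step 0: the original double loop.
v_loops = [0.0] * n
for i in range(n):
    for j in range(n):
        v_loops[i] += A[i][j] * x[j]


# Step 1: the inner loop is a dot product of row i with x.
def dot(row, vec):
    return sum(a * b for a, b in zip(row, vec))


v_rows = [0.0] * n
for i in range(n):
    v_rows[i] = dot(A[i], x)


# Step 2: one dot product per row is the matrix-vector product.
def matvec(mat, vec):
    return [dot(row, vec) for row in mat]


v = matvec(A, x)
```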

Triple loop into tensor matlab

I am aware of similar questions, but couldn't find a useful one for my problem
Is there a way to speed up my code by transforming this triple loop into matrix/tensor operations?
% preallocate
X_tensor = zeros(N, N, N);
% loop
for i = 1:N
    for k = 1:N
        for j = 1:N
            X_tensor(i,k,j) = X(i,j) * B(i,k) * Btilde(k,j) / B(i,j);
        end
    end
end
EDIT
I am sorry, I forgot an important piece of information:
X, B and Btilde are all NxN matrices

You can vectorize the operation by permuting the matrices and using element-wise multiplication/division.
i, k, and j are dimensions [1,2,3]. Looking at the first term in the equation, that means we want to take the [1,2,3] order of X (where the third dimension is a singleton) and rearrange it to [1,3,2]. This corresponds to an index of [i,j,k]. B in the second term is already in the required [1,2,3] order.
So the loops can be replaced by:
X_tensor2 = permute(X,[1,3,2]) .* B .* permute(Btilde,[3,1,2]) ./ permute(B,[1,3,2])
Here's a test program to verify correctness:
N = 5;
P = primes(400);
X = reshape(P(1:25),N,N);
B = reshape(P(26:50),N,N);
Btilde = reshape(P(51:75),N,N);
% preallocate
X_tensor = zeros(N, N, N);
X_tensor2 = zeros(N, N, N);
% loop
for i = 1:N
    for k = 1:N
        for j = 1:N
            X_tensor(i,k,j) = X(i,j) * B(i,k) * Btilde(k,j) / B(i,j);
        end
    end
end
X_tensor2 = permute(X,[1,3,2]) .* B .* permute(Btilde,[3,1,2]) ./ permute(B,[1,3,2]);
isequal(X_tensor, X_tensor2)

Algorithm Problem: Finding all cells that has distance of K from some specific cells in a 2D grid [duplicate]

I am attempting to solve a coding challenge, but my solution is not very performant. I'm looking for advice or suggestions on how I can improve my algorithm.
The puzzle is as follows:
You are given a grid of cells that represents an orchard, each cell can be either an empty spot (0) or a fruit tree (1). A farmer wishes to know how many empty spots there are within the orchard that are within k distance from all fruit trees.
Distance is counted using taxicab geometry, for example:
k = 1
[1, 0]
[0, 0]
the answer is 2 as only the bottom right spot is >k distance from all trees.
My solution goes something like this:
1. Loop over the grid and store all tree positions
2. BFS from the first tree position and store all empty spots until we reach a neighbour that is beyond k distance
3. BFS from the next tree position and store the intersection of empty spots
4. Repeat step 3 until we have iterated over all tree positions
5. Return the number of empty spots remaining after all intersections
I have found that for large grids with large values of k, my algorithm becomes very slow as I end up checking every spot in the grid multiple times. After doing some research, I found some solutions for similar problems that suggest taking the two most extreme target nodes and then only comparing distance to them:
https://www.codingninjas.com/codestudio/problem-details/count-nodes-within-k-distance_992849
https://www.geeksforgeeks.org/count-nodes-within-k-distance-from-all-nodes-in-a-set/
However this does not work for my challenge given certain inputs like below:
k = 4
[0, 0, 0, 1]
[0, 1, 0, 0]
[0, 0, 0, 0]
[1, 0, 0, 0]
[0, 0, 0, 0]
Using the extreme nodes approach, the bottom right empty spot is counted even though it is 5 distance away from the middle tree.
Could anyone point me towards a more efficient approach? I am still very new to these types of problems so I am finding it hard to see the next step I should take.

There is a simple, linear time solution to this problem because of the grid and distance structure. Given a fruit tree with coordinates (a, b), consider the 4 diagonal lines bounding the box of distance k around it. The diagonals going down and to the right have a constant value of x + y, while the diagonals going down and to the left have a constant value of x - y.
A point (x, y) is inside the box (and therefore, within distance k of (a, b)) if and only if:
a + b - k <= x + y <= a + b + k, and
a - b - k <= x - y <= a - b + k
So we can iterate over our fruit trees (a, b) to find four numbers:
first_max = max(a + b - k); first_min = min(a + b + k);
second_max = max(a - b - k); second_min = min(a - b + k);
where min and max are taken over all fruit trees. Then, iterate over empty cells (or do some math and subtract fruit tree counts, if your grid is enormous), counting how many empty spots (x,y) satisfy
first_max <= x + y <= first_min, and
second_max <= x - y <= second_min.
This Python code (written in a procedural style) illustrates this idea. Each diagonal of each bounding box cuts off exactly half of the plane, so this is equivalent to intersection of parallel half planes:
fruit_trees = [(a, b) for a in range(len(grid))
                      for b in range(len(grid[0]))
                      if grid[a][b] == 1]
# grid and k are the inputs; the bounds start at +/- infinity
northwest_half_plane = float('-inf')
southeast_half_plane = float('inf')
southwest_half_plane = float('-inf')
northeast_half_plane = float('inf')
for a, b in fruit_trees:
    northwest_half_plane = max(northwest_half_plane, a - b - k)
    southeast_half_plane = min(southeast_half_plane, a - b + k)
    southwest_half_plane = max(southwest_half_plane, a + b - k)
    northeast_half_plane = min(northeast_half_plane, a + b + k)
count = 0
for x in range(len(grid)):
    for y in range(len(grid[0])):
        if grid[x][y] == 0:
            if (northwest_half_plane <= x - y <= southeast_half_plane
                    and southwest_half_plane <= x + y <= northeast_half_plane):
                count += 1
print(count)
Some notes on the code: Technically the array coordinates are a quarter-turn rotated from the Cartesian coordinates of the picture, but that is immaterial here. The code is left deliberately bereft of certain 'optimizations' which may seem obvious, for two reasons: 1. The best optimization depends on the input format of fruit trees and the grid, and 2. The solution, while being simple in concept and simple to read, is not simple to get right while writing, and it's important that the code be 'obviously correct'. Things like 'exit early and return 0 if a lower bound exceeds an upper bound' can be added later if the performance is necessary.
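As a hedged sketch, the same idea can be wrapped in a function with the early exit mentioned in the notes and run on the two grids from the question. The names `count_spots_within_k` and `count_spots_brute_force` are hypothetical, and the brute-force version exists only to cross-check the result.

```python
def count_spots_within_k(grid, k):
    """Count empty cells (0) within taxicab distance k of every tree (1)."""
    rows, cols = len(grid), len(grid[0])
    lo_diff = lo_sum = float("-inf")  # max over trees of a - b - k, a + b - k
    hi_diff = hi_sum = float("inf")   # min over trees of a - b + k, a + b + k
    for a in range(rows):
        for b in range(cols):
            if grid[a][b] == 1:
                lo_diff = max(lo_diff, a - b - k)
                hi_diff = min(hi_diff, a - b + k)
                lo_sum = max(lo_sum, a + b - k)
                hi_sum = min(hi_sum, a + b + k)
    # Early exit: a lower bound above an upper bound means no cell qualifies.
    if lo_diff > hi_diff or lo_sum > hi_sum:
        return 0
    return sum(
        1
        for x in range(rows)
        for y in range(cols)
        if grid[x][y] == 0
        and lo_diff <= x - y <= hi_diff
        and lo_sum <= x + y <= hi_sum
    )


def count_spots_brute_force(grid, k):
    """Reference answer: test every empty cell against every tree."""
    cells = [(r, c) for r in range(len(grid)) for c in range(len(grid[0]))]
    trees = [(r, c) for r, c in cells if grid[r][c] == 1]
    return sum(
        1
        for r, c in cells
        if grid[r][c] == 0
        and all(abs(r - a) + abs(c - b) <= k for a, b in trees)
    )


small = [[1, 0],
         [0, 0]]
tricky = [[0, 0, 0, 1],
          [0, 1, 0, 0],
          [0, 0, 0, 0],
          [1, 0, 0, 0],
          [0, 0, 0, 0]]
print(count_spots_within_k(small, 1))   # 2, as stated in the question
print(count_spots_within_k(tricky, 4))  # 9; the bottom-right cell is excluded
```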

As answered by @kcsquared; here is an implementation in Java:
// Requires: import java.util.HashSet; import java.util.Set;
public int solutionGrid(int K, int[][] A) {
    int m = A.length;
    int n = A[0].length;
    int k = K;
    // to store the house coordinates
    Set<String> houses = new HashSet<>();
    // find the houses and store their coordinates
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            if (A[i][j] == 1) {
                houses.add(i + "&" + j);
            }
        }
    }
    int northwest_half_plane = Integer.MIN_VALUE;
    int southeast_half_plane = Integer.MAX_VALUE;
    int southwest_half_plane = Integer.MIN_VALUE;
    int northeast_half_plane = Integer.MAX_VALUE;
    for (String ele : houses) {
        String[] arr = ele.split("&");
        int a = Integer.valueOf(arr[0]);
        int b = Integer.valueOf(arr[1]);
        northwest_half_plane = Math.max(northwest_half_plane, a - b - k);
        southeast_half_plane = Math.min(southeast_half_plane, a - b + k);
        southwest_half_plane = Math.max(southwest_half_plane, a + b - k);
        northeast_half_plane = Math.min(northeast_half_plane, a + b + k);
    }
    int count = 0;
    for (int x = 0; x < m; x++) {
        for (int y = 0; y < n; y++) {
            if (A[x][y] == 0) {
                if ((northwest_half_plane <= x - y && x - y <= southeast_half_plane)
                        && southwest_half_plane <= x + y && x + y <= northeast_half_plane) {
                    count += 1;
                }
            }
        }
    }
    return count;
}

This wouldn't be easy to implement but could be sublinear for many cases, and at most linear. Consider representing the perimeter of each tree as four corners (they mark a square rotated 45 degrees). For each tree, compute the intersection of its perimeter with the current intersection. The difficulty comes with managing the corners of the intersection, which could include more than one point because of the diagonal alignments. Run inside the final intersection to count how many empty spots are within it.

Since you are using taxicab distance, BFS is unnecessary. You can compute the distance between an empty spot and a tree directly.
This algorithm is based on a suggestion by https://stackoverflow.com/users/3080723/stef
// select tree near top left corner
SET flag false
LOOP r over rows
    LOOP c over columns
        IF tree at c, r
            SET t to tree at c,r
            SET flag true
            BREAK
    IF flag
        BREAK
LOOP s over empty spots
    Calculate distance between s and t
    IF distance <= k
        ADD s to spotlist
LOOP s over spotlist
    LOOP t over trees, starting at bottom right corner
        Calculate distance between s and t
        IF distance > k
            REMOVE s from spotlist
            BREAK
RETURN spotlist
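The pseudocode above could be written in Python roughly as follows. This is a sketch under assumptions: the function name `spots_within_k` is made up, cells are (row, column) tuples, and a grid with no trees is taken to accept every empty spot. Unlike the half-plane approach, the worst case is still proportional to the number of empty spots times the number of trees.

```python
def spots_within_k(grid, k):
    """Return the empty cells within taxicab distance k of every tree."""
    rows, cols = len(grid), len(grid[0])
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    trees = [p for p in cells if grid[p[0]][p[1]] == 1]
    empties = [p for p in cells if grid[p[0]][p[1]] == 0]
    if not trees:
        return empties  # assumption: no trees means no constraint

    def dist(p, q):
        return abs(p[0] - q[0]) + abs(p[1] - q[1])

    # Pre-filter with the first tree in row-major order (nearest top left).
    first = trees[0]
    candidates = [s for s in empties if dist(s, first) <= k]

    # Check the other trees starting from the bottom right; all() stops at
    # the first tree that is too far away, like the BREAK in the pseudocode.
    return [s for s in candidates
            if all(dist(s, t) <= k for t in reversed(trees))]


orchard = [[0, 0, 0, 1],
           [0, 1, 0, 0],
           [0, 0, 0, 0],
           [1, 0, 0, 0],
           [0, 0, 0, 0]]
print(len(spots_within_k(orchard, 4)))  # 9
```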

Efficient way to find min(max(A[L], A[L+1],...,A[R]), min(B[L], B[L+1],…, B[R]))

Given two arrays A[N] and B[N] with N <= 5e5, for each 0 <= L < N find the maximum value of
min(max(A[L], A[L+1],...,A[R]), min(B[L], B[L+1],…, B[R]))
over all L <= R < N.
ans[L] is the answer for L.
For example,
N = 3
A[3] = {3, 2, 1}
B[3] = {3, 2, 3}
So, the answer is
ans[0] = 3
ans[1] = 2
ans[2] = 1
It is clear that brute force cannot run fast enough.
Then I tried using a sparse table, segment tree or binary indexed tree (we don't need to update anything, so I chose a sparse table). But for each L we don't know R, so I need to run to the end of the array, which is no different from brute force.
Is there an efficient algorithm or data structure for this problem?
P/s: Sorry for my bad English.

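Use a sparse table. For a fixed L, max(A[L..R]) is non-decreasing in R and min(B[L..R]) is non-increasing in R, so the maximum of their min lies at the crossing point, which a binary search can find.
Python-style pseudocode (SparseTable is assumed to answer inclusive range max/min queries):
stA = SparseTable(A)
stB = SparseTable(B)
n = len(A)
ans = [0] * n
for i in range(n):
    lo, hi = i, n - 1
    # binary search for the first R where max(A[i..R]) >= min(B[i..R])
    while lo < hi:
        mid = lo + (hi - lo) // 2  # integer division
        if stA.max(i, mid) < stB.min(i, mid):
            lo = mid + 1
        else:
            hi = mid
    best = min(stA.max(i, lo), stB.min(i, lo))
    # the position just before the crossing can be the better one
    if lo > i:
        best = max(best, min(stA.max(i, lo - 1), stB.min(i, lo - 1)))
    ans[i] = best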
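To make the binary-search idea concrete, here is a self-contained Python sketch. The `SparseTable` class and the `solve` function are illustrative names, not a library API. Queries always start at the fixed left end i, and both the crossing position and the one just before it are checked, for O(N log N) overall.

```python
class SparseTable:
    """Static range queries for an idempotent function such as max or min."""

    def __init__(self, values, func):
        self.func = func
        self.table = [list(values)]
        n = len(values)
        j = 1
        while (1 << j) <= n:
            prev = self.table[-1]
            half = 1 << (j - 1)
            self.table.append(
                [func(prev[i], prev[i + half]) for i in range(n - (1 << j) + 1)]
            )
            j += 1

    def query(self, l, r):
        """Aggregate of values[l..r], both ends inclusive."""
        j = (r - l + 1).bit_length() - 1
        return self.func(self.table[j][l], self.table[j][r - (1 << j) + 1])


def solve(A, B):
    n = len(A)
    st_max = SparseTable(A, max)
    st_min = SparseTable(B, min)
    ans = []
    for i in range(n):
        lo, hi = i, n - 1
        # First R with max(A[i..R]) >= min(B[i..R]), or n - 1 if none exists.
        while lo < hi:
            mid = (lo + hi) // 2
            if st_max.query(i, mid) < st_min.query(i, mid):
                lo = mid + 1
            else:
                hi = mid
        best = min(st_max.query(i, lo), st_min.query(i, lo))
        if lo > i:  # the position just before the crossing may be better
            best = max(best, min(st_max.query(i, lo - 1), st_min.query(i, lo - 1)))
        ans.append(best)
    return ans


print(solve([3, 2, 1], [3, 2, 3]))  # [3, 2, 1], the example from the question
```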

max(A[L], A[L+1], ..., A[R]) is non-increasing in L and non-decreasing in R. Conversely, min(B[L], B[L+1], ..., B[R]) is non-decreasing in L and non-increasing in R. It follows that the function from L to the argmax in R is non-decreasing. The last ingredient is two queues, one that can report max, one that can report min, to quickly compute the sliding window aggregates.
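One way to turn this observation into code is a two-pointer sweep with two monotonic deques, sketched below under the assumption of 0-based arrays of equal length. The function name `solve_two_pointers` is hypothetical. The window [l, r) only ever holds positions where the running max of A is still below the running min of B, so the answer for l is either that running max or the min of B at the first crossing position r.

```python
from collections import deque


def solve_two_pointers(A, B):
    n = len(A)
    ans = [0] * n
    max_q = deque()  # indices with decreasing A values; front is the window max
    min_q = deque()  # indices with increasing B values; front is the window min
    r = 0            # window is [l, r); r never moves backwards
    for l in range(n):
        if r < l:
            r = l
        # Drop indices that fell out of the window on the left.
        while max_q and max_q[0] < l:
            max_q.popleft()
        while min_q and min_q[0] < l:
            min_q.popleft()
        # Extend the window while max(A) stays strictly below min(B).
        while r < n:
            cur_max = A[max_q[0]] if max_q else float("-inf")
            cur_min = B[min_q[0]] if min_q else float("inf")
            if max(cur_max, A[r]) >= min(cur_min, B[r]):
                break  # r is the first crossing position for this l
            while max_q and A[max_q[-1]] <= A[r]:
                max_q.pop()
            max_q.append(r)
            while min_q and B[min_q[-1]] >= B[r]:
                min_q.pop()
            min_q.append(r)
            r += 1
        # Before the crossing the min of the pair is the max of A.
        best = A[max_q[0]] if max_q else float("-inf")
        if r < n:
            # At the crossing the min of the pair is the min of B.
            cur_min = B[min_q[0]] if min_q else float("inf")
            best = max(best, min(cur_min, B[r]))
        ans[l] = best
    return ans


print(solve_two_pointers([3, 2, 1], [3, 2, 3]))  # [3, 2, 1]
```

Each index enters and leaves each deque at most once, so the sweep is linear in N.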
