How to partition 2D-points into intervals (using only vertical lines)? - algorithm

So I have a 2D scatter filled with points (x,y). I want to draw k vertical lines (x_1 = a, x_2 = b, ..., x_k = k), so as to partition the points into k groups.
The optimal solution would minimize the average variance of each group's y_value.
What is the appropriate algorithm? It sounded like k-means but I have the constraint that the lines must be vertical.

Here is an idea based on dynamic programming.
With the following notations:
(x_1, y_1), ..., (x_n, y_n) the points, with x_1 <= x_2 <= ... <= x_n to cut in K groups.
Var(i, j) the variance of the y's: y_i, ..., y_j.
F_K((x_1,y_1), ..., (x_n, y_n)) = F_k(1,n) the value of the best solution for the problem.
Then we have the following:
F_k(i,j) = min for l in i...j-k+1 of (Var(i,l) + F_(k-1)(l+1, j) and
F_1(i,j) = Var(i,j).
The property above simply means that the best way to split your points in 'k' groups is to select the leftmost cut (the choice of l), and the best choice of k-1 cuts for the remaining points.
From there you can go for a dynamic program. You'll need a 3D array A of dimensions n*n*K to store the value of F_k(i,j) for all i,j,k.
The program would look like:
function get_value(P: points, A: 3D array, i, j, k){
if A[i][j][k] is defined{
result = A[i][j][k]
} else if k == 1 {
A[i][j][k] = get_var(P, i, j)
result = A[i][j][k]
} else {
result = +INF
for l in i ... j-k+1 {
tmp = get_value(P, A, i, l, 1) + get_value(P, A, l+1, j, k-1)
if tmp < result {
result = tmp
}
}
}
return result
}
NB: I was a bit quick about the range to iterate on for l, that might be something to look into.

Related

Algorithm Problem: Finding all cells that has distance of K from some specific cells in a 2D grid [duplicate]

I am attempting to solve a coding challenge however my solution is not very performant, I'm looking for advice or suggestions on how I can improve my algorithm.
The puzzle is as follows:
You are given a grid of cells that represents an orchard, each cell can be either an empty spot (0) or a fruit tree (1). A farmer wishes to know how many empty spots there are within the orchard that are within k distance from all fruit trees.
Distance is counted using taxicab geometry, for example:
k = 1
[1, 0]
[0, 0]
the answer is 2 as only the bottom right spot is >k distance from all trees.
My solution goes something like this:
loop over grid and store all tree positions
BFS from the first tree position and store all empty spots until we reach a neighbour that is beyond k distance
BFS from the next tree position and store the intersection of empty spots
Repeat step 3 until we have iterated over all tree positions
Return the number of empty spots remaining after all intersections
I have found that for large grids with large values of k, my algorithm becomes very slow as I end up checking every spot in the grid multiple times. After doing some research, I found some solutions for similar problems that suggest taking the two most extreme target nodes and then only comparing distance to them:
https://www.codingninjas.com/codestudio/problem-details/count-nodes-within-k-distance_992849
https://www.geeksforgeeks.org/count-nodes-within-k-distance-from-all-nodes-in-a-set/
However this does not work for my challenge given certain inputs like below:
k = 4
[0, 0, 0, 1]
[0, 1, 0, 0]
[0, 0, 0, 0]
[1, 0, 0, 0]
[0, 0, 0, 0]
Using the extreme nodes approach, the bottom right empty spot is counted even though it is 5 distance away from the middle tree.
Could anyone point me towards a more efficient approach? I am still very new to these types of problems so I am finding it hard to see the next step I should take.
There is a simple, linear time solution to this problem because of the grid and distance structure. Given a fruit tree with coordinates (a, b), consider the 4 diagonal lines bounding the box of distance k around it. The diagonals going down and to the right have a constant value of x + y, while the diagonals going down and to the left have a constant value of x - y.
A point (x, y) is inside the box (and therefore, within distance k of (a, b)) if and only if:
a + b - k <= x + y <= a + b + k, and
a - b - k <= x - y <= a - b + k
So we can iterate over our fruit trees (a, b) to find four numbers:
first_max = max(a + b - k); first_min = min(a + b + k);
second_max = max(a - b - k); second_min = min(a - b + k);
where min and max are taken over all fruit trees. Then, iterate over empty cells (or do some math and subtract fruit tree counts, if your grid is enormous), counting how many empty spots (x,y) satisfy
first_max <= x + y <= first_min, and
second_max <= x - y <= second_min.
This Python code (written in a procedural style) illustrates this idea. Each diagonal of each bounding box cuts off exactly half of the plane, so this is equivalent to intersection of parallel half planes:
fruit_trees = [(a, b) for a in range(len(grid))
for b in range(len(grid[0]))
if grid[a][b] == 1]
northwest_half_plane = -infinity
southeast_half_plane = infinity
southwest_half_plane = -infinity
northeast_half_plane = infinity
for a, b in fruit_trees:
northwest_half_plane = max(northwest_half_plane, a - b - k)
southeast_half_plane = min(southeast_half_plane, a - b + k)
southwest_half_plane = max(southwest_half_plane, a + b - k)
northeast_half_plane = min(northeast_half_plane, a + b + k)
count = 0
for x in range(len(grid)):
for y in range(len(grid[0])):
if grid[x][y] == 0:
if (northwest_half_plane <= x - y <= southeast_half_plane
and southwest_half_plane <= x + y <= northeast_half_plane):
count += 1
print(count)
Some notes on the code: Technically the array coordinates are a quarter-turn rotated from the Cartesian coordinates of the picture, but that is immaterial here. The code is left deliberately bereft of certain 'optimizations' which may seem obvious, for two reasons: 1. The best optimization depends on the input format of fruit trees and the grid, and 2. The solution, while being simple in concept and simple to read, is not simple to get right while writing, and it's important that the code be 'obviously correct'. Things like 'exit early and return 0 if a lower bound exceeds an upper bound' can be added later if the performance is necessary.
As Answered by #kcsquared ,Providing an implementation in JAVA
public int solutionGrid(int K, int [][]A){
int m=A.length;
int n=A[0].length;
int k=K;
//to store the house coordinates
Set<String> houses=new HashSet<>();
//Find the house and store the coordinates
for(int i=0;i<m;i++) {
for (int j = 0; j < n; j++) {
if (A[i][j] == 1) {
houses.add(i + "&" + j);
}
}
}
int northwest_half_plane = Integer.MIN_VALUE;
int southeast_half_plane = Integer.MAX_VALUE;
int southwest_half_plane = Integer.MIN_VALUE;
int northeast_half_plane = Integer.MAX_VALUE;
for(String ele:houses){
String arr[]=ele.split("&");
int a=Integer.valueOf(arr[0]);
int b=Integer.valueOf(arr[1]);
northwest_half_plane = Math.max(northwest_half_plane, a - b - k);
southeast_half_plane = Math.min(southeast_half_plane, a - b + k);
southwest_half_plane = Math.max(southwest_half_plane, a + b - k);
northeast_half_plane = Math.min(northeast_half_plane, a + b + k);
}
int count = 0;
for(int x=0;x<m;x++) {
for (int y = 0; y < n; y++) {
if (A[x][y] == 0){
if ((northwest_half_plane <= x - y && x - y <= southeast_half_plane)
&& southwest_half_plane <= x + y && x + y <= northeast_half_plane){
count += 1;
}
}
}
}
return count;
}
This wouldn't be easy to implement but could be sublinear for many cases, and at most linear. Consider representing the perimeter of each tree as four corners (they mark a square rotated 45 degrees). For each tree compute it's perimeter intersection with the current intersection. The difficulty comes with managing the corners of the intersection, which could include more than one point because of the diagonal alignments. Run inside the final intersection to count how many empty spots are within it.
Since you are using taxicab distance, BFS is unneccesary. You can compute the distance between an empty spot and a tree directly.
This algorithm is based on a suggestion by https://stackoverflow.com/users/3080723/stef
// select tree near top left corner
SET flag false
LOOP r over rows
LOOP c over columns
IF tree at c, r
SET t to tree at c,r
SET flag true
BREAK
IF flag
BREAK
LOOP s over empty spots
Calculate distance between s and t
IF distance <= k
ADD s to spotlist
LOOP s over spotlist
LOOP t over trees, starting at bottom right corner
Calculate distance between s and t
IF distance > k
REMOVE s from spotlist
BREAK
RETURN spotlist

Count nodes within k distance of marked nodes in grid

I am attempting to solve a coding challenge however my solution is not very performant, I'm looking for advice or suggestions on how I can improve my algorithm.
The puzzle is as follows:
You are given a grid of cells that represents an orchard, each cell can be either an empty spot (0) or a fruit tree (1). A farmer wishes to know how many empty spots there are within the orchard that are within k distance from all fruit trees.
Distance is counted using taxicab geometry, for example:
k = 1
[1, 0]
[0, 0]
the answer is 2 as only the bottom right spot is >k distance from all trees.
My solution goes something like this:
loop over grid and store all tree positions
BFS from the first tree position and store all empty spots until we reach a neighbour that is beyond k distance
BFS from the next tree position and store the intersection of empty spots
Repeat step 3 until we have iterated over all tree positions
Return the number of empty spots remaining after all intersections
I have found that for large grids with large values of k, my algorithm becomes very slow as I end up checking every spot in the grid multiple times. After doing some research, I found some solutions for similar problems that suggest taking the two most extreme target nodes and then only comparing distance to them:
https://www.codingninjas.com/codestudio/problem-details/count-nodes-within-k-distance_992849
https://www.geeksforgeeks.org/count-nodes-within-k-distance-from-all-nodes-in-a-set/
However this does not work for my challenge given certain inputs like below:
k = 4
[0, 0, 0, 1]
[0, 1, 0, 0]
[0, 0, 0, 0]
[1, 0, 0, 0]
[0, 0, 0, 0]
Using the extreme nodes approach, the bottom right empty spot is counted even though it is 5 distance away from the middle tree.
Could anyone point me towards a more efficient approach? I am still very new to these types of problems so I am finding it hard to see the next step I should take.
There is a simple, linear time solution to this problem because of the grid and distance structure. Given a fruit tree with coordinates (a, b), consider the 4 diagonal lines bounding the box of distance k around it. The diagonals going down and to the right have a constant value of x + y, while the diagonals going down and to the left have a constant value of x - y.
A point (x, y) is inside the box (and therefore, within distance k of (a, b)) if and only if:
a + b - k <= x + y <= a + b + k, and
a - b - k <= x - y <= a - b + k
So we can iterate over our fruit trees (a, b) to find four numbers:
first_max = max(a + b - k); first_min = min(a + b + k);
second_max = max(a - b - k); second_min = min(a - b + k);
where min and max are taken over all fruit trees. Then, iterate over empty cells (or do some math and subtract fruit tree counts, if your grid is enormous), counting how many empty spots (x,y) satisfy
first_max <= x + y <= first_min, and
second_max <= x - y <= second_min.
This Python code (written in a procedural style) illustrates this idea. Each diagonal of each bounding box cuts off exactly half of the plane, so this is equivalent to intersection of parallel half planes:
fruit_trees = [(a, b) for a in range(len(grid))
for b in range(len(grid[0]))
if grid[a][b] == 1]
northwest_half_plane = -infinity
southeast_half_plane = infinity
southwest_half_plane = -infinity
northeast_half_plane = infinity
for a, b in fruit_trees:
northwest_half_plane = max(northwest_half_plane, a - b - k)
southeast_half_plane = min(southeast_half_plane, a - b + k)
southwest_half_plane = max(southwest_half_plane, a + b - k)
northeast_half_plane = min(northeast_half_plane, a + b + k)
count = 0
for x in range(len(grid)):
for y in range(len(grid[0])):
if grid[x][y] == 0:
if (northwest_half_plane <= x - y <= southeast_half_plane
and southwest_half_plane <= x + y <= northeast_half_plane):
count += 1
print(count)
Some notes on the code: Technically the array coordinates are a quarter-turn rotated from the Cartesian coordinates of the picture, but that is immaterial here. The code is left deliberately bereft of certain 'optimizations' which may seem obvious, for two reasons: 1. The best optimization depends on the input format of fruit trees and the grid, and 2. The solution, while being simple in concept and simple to read, is not simple to get right while writing, and it's important that the code be 'obviously correct'. Things like 'exit early and return 0 if a lower bound exceeds an upper bound' can be added later if the performance is necessary.
As Answered by #kcsquared ,Providing an implementation in JAVA
public int solutionGrid(int K, int [][]A){
int m=A.length;
int n=A[0].length;
int k=K;
//to store the house coordinates
Set<String> houses=new HashSet<>();
//Find the house and store the coordinates
for(int i=0;i<m;i++) {
for (int j = 0; j < n; j++) {
if (A[i][j] == 1) {
houses.add(i + "&" + j);
}
}
}
int northwest_half_plane = Integer.MIN_VALUE;
int southeast_half_plane = Integer.MAX_VALUE;
int southwest_half_plane = Integer.MIN_VALUE;
int northeast_half_plane = Integer.MAX_VALUE;
for(String ele:houses){
String arr[]=ele.split("&");
int a=Integer.valueOf(arr[0]);
int b=Integer.valueOf(arr[1]);
northwest_half_plane = Math.max(northwest_half_plane, a - b - k);
southeast_half_plane = Math.min(southeast_half_plane, a - b + k);
southwest_half_plane = Math.max(southwest_half_plane, a + b - k);
northeast_half_plane = Math.min(northeast_half_plane, a + b + k);
}
int count = 0;
for(int x=0;x<m;x++) {
for (int y = 0; y < n; y++) {
if (A[x][y] == 0){
if ((northwest_half_plane <= x - y && x - y <= southeast_half_plane)
&& southwest_half_plane <= x + y && x + y <= northeast_half_plane){
count += 1;
}
}
}
}
return count;
}
This wouldn't be easy to implement but could be sublinear for many cases, and at most linear. Consider representing the perimeter of each tree as four corners (they mark a square rotated 45 degrees). For each tree compute it's perimeter intersection with the current intersection. The difficulty comes with managing the corners of the intersection, which could include more than one point because of the diagonal alignments. Run inside the final intersection to count how many empty spots are within it.
Since you are using taxicab distance, BFS is unneccesary. You can compute the distance between an empty spot and a tree directly.
This algorithm is based on a suggestion by https://stackoverflow.com/users/3080723/stef
// select tree near top left corner
SET flag false
LOOP r over rows
LOOP c over columns
IF tree at c, r
SET t to tree at c,r
SET flag true
BREAK
IF flag
BREAK
LOOP s over empty spots
Calculate distance between s and t
IF distance <= k
ADD s to spotlist
LOOP s over spotlist
LOOP t over trees, starting at bottom right corner
Calculate distance between s and t
IF distance > k
REMOVE s from spotlist
BREAK
RETURN spotlist

Finding median in merged array of two sorted arrays

Assume we have 2 sorted arrays of integers with sizes of n and m. What is the best way to find median of all m + n numbers?
It's easy to do this with log(n) * log(m) complexity. But i want to solve this problem in log(n) + log(m) time. So is there any suggestion to solve this problem?
Explanation
The key point of this problem is to ignore half part of A and B each step recursively by comparing the median of remaining A and B:
if (aMid < bMid) Keep [aMid +1 ... n] and [bLeft ... m]
else Keep [bMid + 1 ... m] and [aLeft ... n]
// where n and m are the length of array A and B
As the following: time complexity is O(log(m + n))
public double findMedianSortedArrays(int[] A, int[] B) {
int m = A.length, n = B.length;
int l = (m + n + 1) / 2;
int r = (m + n + 2) / 2;
return (getkth(A, 0, B, 0, l) + getkth(A, 0, B, 0, r)) / 2.0;
}
public double getkth(int[] A, int aStart, int[] B, int bStart, int k) {
if (aStart > A.length - 1) return B[bStart + k - 1];
if (bStart > B.length - 1) return A[aStart + k - 1];
if (k == 1) return Math.min(A[aStart], B[bStart]);
int aMid = Integer.MAX_VALUE, bMid = Integer.MAX_VALUE;
if (aStart + k/2 - 1 < A.length) aMid = A[aStart + k/2 - 1];
if (bStart + k/2 - 1 < B.length) bMid = B[bStart + k/2 - 1];
if (aMid < bMid)
return getkth(A, aStart + k / 2, B, bStart, k - k / 2); // Check: aRight + bLeft
else
return getkth(A, aStart, B, bStart + k / 2, k - k / 2); // Check: bRight + aLeft
}
Hope it helps! Let me know if you need more explanation on any part.
Here's a very good solution I found in Java on Stack Overflow. It's a method of finding the K and K+1 smallest items in the two arrays where K is the center of the merged array.
If you have a function for finding the Kth item of two arrays then finding the median of the two is easy;
Calculate the weighted average of the Kth and Kth+1 items of X and Y
But then you'll need a way to find the Kth item of two lists; (remember we're one indexing now)
If X contains zero items then the Kth smallest item of X and Y is the Kth smallest item of Y
Otherwise if K == 2 then the second smallest item of X and Y is the smallest of the smallest items of X and Y (min(X[0], Y[0]))
Otherwise;
i. Let A be min(length(X), K / 2)
ii. Let B be min(length(Y), K / 2)
iii. If the X[A] > Y[B] then recurse from step 1. with X, Y' with all elements of Y from B to the end of Y and K' = K - B, otherwise recurse with X' with all elements of X from A to the end of X, Y and K' = K - A
If I find the time tomorrow I will verify that this algorithm works in Python as stated and provide the example source code, it may have some off-by-one errors as-is.
Take the median element in list A and call it a. Compare a to the center elements in list B. Lets call them b1 and b2 (if B has odd length then exactly where you split b depends on your definition of the median of an even length list, but the procedure is almost identical regardless). if b1&leq;a&leq;b2 then a is the median of the merged array. This can be done in constant time since it requires exactly two comparisons.
If a is greater than b2 then we add the top half of A to the top of B and repeat. B will no longer be sorted, but it doesn't matter. If a is less than b1 then we add the bottom half of A to the bottom of B and repeat. These will iterate log(n) times at most (if the median is found sooner then stop, of course).
It is possible that this will not find the median. If this is the case then the median is in B. If so, perform the same algorithm with A and B reversed. This will require log(m) iterations. In total you will have performed at most 2*(log(n)+log(m)) iterations of a constant time operation, so you have solved the problem in order log(n)+log(m) time.
This is essentially the same answer as was given by iehrlich, but written out more explicitly.
Yes, this can be done. Given two arrays, A and B, in the worst-case scenario you have to first perform a binary search in A, and then, if it fails, binary search in B looking for the median. On each step of a binary search, you check if the current element is actually a median of a merged A+B array. Such check takes constant time.
Let's see why such check is constant. For simplicity, let's assume that |A| + |B| is an odd number, and that all numbers in both arrays are different. You can remove these restrictions later by applying the usual median definition approach (i.e., how to calculate the median of an array containing duplicates, or of an array with even length). Anyway, given that, we know for sure, that in the merged array there will be (|A| + |B| - 1) / 2 elements to the right and to the left of an actual median. In the process of a binary search in A, we know the index of current element x in array A (let it be i). Now, if x satisfies the condition B[j] < x < B[j+1], where i + j == (|A| + |B| - 1) / 2, then x is your median.
The overall complexity is O(log(max(|A|, |B|)) time and O(1) memory.

Efficient Algorithm to Compute The Summation of the Function

We are given N points of the form (x,y) and we need to compute the following function:
F(i,j) = ( | X[i] - X[j] | ) * ( | Y[i] - Y[j] | )
Compute Summation of F(i,j) for all ordered pairs (i,j)
N <= 300000
I am looking for a O(N log N) solution.
My initial thought was to sort the points by X and then use a BIT but I'm not being able to formulate a clear solution.
I have a solution using O(N log(M)) time and O(M) memory, where M is the size of range of Y. It's similar to what you are thinking.
First sort the points so that the X coordinates are increasing.
Let's write A for the sum of (X[i] - X[j]) * (Y[i] - Y[j]) for all pairs i > j such that Y[i] > Y[j], and B for the sum of the same expression for all pairs i > j such that Y[i] < Y[j].
The sum A + B can be calculated easily in O(N) time, and the final answer can be calculated from A - B. Thus it suffices to calculate A.
Now create a binary indexed tree, whose nodes are indexed by intevals of the form [a, b) with b = a + 2^k for some k. (Not a good sentance, but you know what I mean, right?) The root node should cover the inteval [Y_min, Y_max] of possible values of Y.
For any node indexed by [a, b) and for any i, let f(a, b, i) be the following polynomial:
f(a, b, i)(X, Y) = sum of (X - X[j]) * (Y - Y[j]) for all j such that j < i and Y[j] < Y
It is of the form P * XY + Q * X + R * Y + S, thus such a polynomial can be represented by the four numbers P, Q, R, S.
Now beginning with i = 0, you may calculate f(a, b, i)(X[i], Y[i]). To go from i to i + 1, you only need to update those intevals [a, b) containing Y[i]. When you reach i = N, the value of A is calculated.
If you can afford O(M) memory, then this should work fine.

Path of Length N in graph with constraints

I want to find number of path of length N in a graph where the vertex can be any natural number. However two vertex are connected only if the product of the two vertices is less than some natural number P. If the product of two vertexes are greater than P than those are not connected and can't be reached from one other.
I can obviously run two nested loops (<= P) and create an adjacency matrix, but P can be extremely large and this approach would be extremely slow. Can anyone think of some optimal approach to solve the problem? Can we solve it using Dynamic Programming?
I agree with Ante's recurrence, although I used a slightly simplified version. Note that I'm using the letter P to name the maximum product, as it is used in the original problem statement:
f(1,x) = 1
f(i,x) = sum(f(i-1, y) for y in {1, ..., floor(P/x)})
f(i,x) is the number of sequences of length i that end with x. The answer to the question is then f(n+1, 1).
Of course since P can be up to 10^9 in this task, a straightforward implementation with a DP table is out of the question. However, there are only up to m < 70000 possible different values of floor(P/i). So let's find the maximal segments aj ... bj, where floor(P/aj) = floor(P/bj). We can find those segments in O(number of segments * log P) using binary search.
Imagine the full DP table for f. Since there are only m different values for floor(P/x), every row of f consists of m contiguous ranges that have the same value.
So let's compute the compressed DP table, where we represent the rows as list of (length, value) pairs. We start with f(1) = [(P, 1)] and we can compute f(i+1) from f(i) by processing the segments in increasing order and computing prefix sums of the lengths stored in f(i).
The total runtime of my implementation of this approach is O(m (log P + n)). This is the code I used:
using ll=long long;
const int mod = 1000000007;
void add(int& x, ll y) { x = (x+y)%mod; }
int main() {
int n, P;
cin >> n >> P;
int x = 1;
vector<pair<int,int>> segments;
while(x <= P) {
int y = x+1, hi = P+1;
while(y<hi) {
int mid = (y+hi)/2;
if (P/mid < P/x) hi=mid;
else y=mid+1;
}
segments.push_back(make_pair(P/x, y-x));
x = y;
}
reverse(begin(segments), end(segments));
vector<pair<int,int>> dp;
dp.push_back(make_pair(P,1));
for (int i = 1; i <= n; ++i) {
int j = 0;
int sum_smaller = 0, cnt_smaller = 0;
vector<pair<int,int>> dp2;
for (auto it : segments) {
int value = it.first, cnt = it.second;
while (cnt_smaller + dp[j].first <= value) {
cnt_smaller += dp[j].first;
add(sum_smaller,(ll)dp[j].first*dp[j].second);
j++;
}
int pref_sum = sum_smaller;
if (value > cnt_smaller)
add(pref_sum, (ll)(value - cnt_smaller)*dp[j].second);
dp2.push_back(make_pair(cnt, pref_sum));
}
dp = dp2;
reverse(begin(dp),end(dp));
}
cout << dp[0].second << endl;
}
I needed to do some micro-optimizations with the handling of the arrays to get AC, but those aren't really relevant, so I left them away.
If number of vertices is small than adjacency matrix (A) can help. Since sum of elements in A^N is number of distinct paths, if paths are oriented. If not than number of paths i sum of elements / 2. That is due an element (i,j) represents number of paths from vertex i to vertex j.
In this case, same approach can be done by DP, using reasoning that number of paths of length n from vertex v is sum of numbers of paths of length n-1 of all it's neighbours. Neigbours of vertex i are vertices from 1 to floor(Q/i). With that we can construct function N(vertex, length) which represent number of paths from given vertex with given length:
N(i, 1) = floor(Q/i),
N(i, n) = sum( N(j, n-1) for j in {1, ..., floor(Q/i)}.
Number of all oriented paths of length is sum( N(i,N) ).

Resources