Fast way to simulate operations on a matrix - algorithm

This is from an old Olympiad practice problem:
Imagine you have a 1000x1000 grid, in which the cell (i,j) contains the number i*j. (Rows and columns are numbered starting at 1.)
At each step, we build a new grid from the old one, in which each cell (i,j) contains the "neighborhood average" of (i,j) in the last grid. The "neighborhood average" is defined as the floor of the average of the values of the cell and its up to 8 neighbors. So, for example, if the 4 numbers in the corner of the grid were 1, 2, 5, and 7, in the next step the corner would be calculated as floor((1+2+5+7)/4) = 3.
Eventually we'll reach a point where all the numbers are the same and the grid doesn't change anymore. The goal is to figure out how many steps it takes to reach this point.
I tried simply simulating it but that doesn't work, because it seems that the answer is O(n^2) steps and each simulation step takes O(n^2) to process, resulting in O(n^4) which is too slow for n=1000.
Is there a faster way to do it?
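For reference, here is a minimal sketch (mine, not the original poster's code) of the straightforward simulation the question describes; this is the O(n^2)-per-step baseline, not the faster approach being asked for:

#include <stdbool.h>
#include <string.h>

#define N 1000

long long grid[N][N], next_grid[N][N];

int simulate(void)
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            grid[i][j] = (long long)(i + 1) * (j + 1);   /* cell (i,j) starts at i*j, 1-based */

    int steps = 0;
    for (;;) {
        bool changed = false;
        for (int i = 0; i < N; ++i)
            for (int j = 0; j < N; ++j) {
                long long sum = 0, cnt = 0;
                /* sum the cell and its existing neighbours */
                for (int di = -1; di <= 1; ++di)
                    for (int dj = -1; dj <= 1; ++dj) {
                        int ni = i + di, nj = j + dj;
                        if (ni < 0 || ni >= N || nj < 0 || nj >= N)
                            continue;
                        sum += grid[ni][nj];
                        ++cnt;
                    }
                next_grid[i][j] = sum / cnt;   /* values are positive, so this is the floor */
                if (next_grid[i][j] != grid[i][j])
                    changed = true;
            }
        if (!changed)
            break;
        memcpy(grid, next_grid, sizeof grid);
        ++steps;
    }
    return steps;
}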

A slightly faster way can be as follows:
Notice that for any cell (x,y) that is not on the matrix border, its original value is x*y.
Also, the value of the cell after the 1st iteration is:
v1 = ( xy + x(y+1) + x(y-1)
     + (x+1)y + (x+1)(y+1) + (x+1)(y-1)
     + (x-1)y + (x-1)(y+1) + (x-1)(y-1) ) / 9
   = xy
For the elements on the left vertical edge (not on the corners)
v2 = ( xy + (x-1)y + (x+1)y + x(y+1) + (x-1)(y+1) + (x+1)(y+1) ) / 6
= xy + x/2.
For the elements on the right vertical edge (not on the corners)
v3 = ( xy + (x-1)y + (x+1)y + x(y-1) + (x-1)(y-1) + (x+1)(y-1) ) / 6
= xy - x/2.
Similarly for top and bottom horizontal edges and corners.
Hence after the 1st iteration, only the border elements change their value; the non-border elements remain the same.
In subsequent iterations, this change propagates from the border inwards into the matrix.
So one obvious way to reduce your computations a little is to update only those elements that can actually have changed by the current iteration (a sketch of this is given after this answer). Note: by doing this the complexity will not change IMO, but the constant factor will be reduced.
Another possible way that you can consider is as follows:
You know that the centre-most element remains unchanged until about N/2 iterations.
So you may think of a way to jump-start your iterations by starting from the centre-most element and working outwards.
That is, if you can find an incremental mathematical formula for the change in the elements after N/2 iterations, you may reduce the complexity of your algorithm by a factor of N.
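Here is a sketch of the first suggestion above (my illustration, not this answer's own code). It assumes an N x N grid of long long values and an assumed helper neighborhood_avg() that returns the floored average of a cell and its existing neighbours:

#include <string.h>

#define N 1000

long long neighborhood_avg(long long g[N][N], int i, int j);   /* assumed helper */

/* At step t, every cell whose distance to the border is at least t still holds
   its original value, so only a frame of width t needs the 9-cell averaging. */
void step_frame_only(long long grid[N][N], long long next_grid[N][N], int t)
{
    if (t > N / 2)
        t = N / 2;                       /* beyond that, every cell may change */
    for (int i = 0; i < N; ++i) {
        if (i < t || i >= N - t) {
            /* row lies entirely within the frame: recompute it completely */
            for (int j = 0; j < N; ++j)
                next_grid[i][j] = neighborhood_avg(grid, i, j);
        } else {
            /* only the left and right ends of this row can have changed */
            for (int j = 0; j < t; ++j)
                next_grid[i][j] = neighborhood_avg(grid, i, j);
            for (int j = N - t; j < N; ++j)
                next_grid[i][j] = neighborhood_avg(grid, i, j);
            /* the untouched middle is just carried over (a double-buffered
               implementation could skip even this copy) */
            memcpy(&next_grid[i][t], &grid[i][t], (size_t)(N - 2 * t) * sizeof grid[i][t]);
        }
    }
}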

The "floor" step makes me suspect an analytical solution is unlikely, and that this actually a micro-optimization exercise. Here is my idea.
Let's ignore the corners and edges for a moment. There are only 3996 of them and they will need special treatment anyway.
For an interior cell, you need to add 9 elements to get its next state. But turn that around, and say: Each interior cell has to be part of 8 additions.
Or does it? Start with three consecutive rows A[i], B[i], and C[i], and compute three new rows:
A'[i] = A[i-1] + A[i] + A[i+1]
B'[i] = B[i-1] + B[i] + B[i+1]
C'[i] = C[i-1] + C[i] + C[i+1]
(Note that you can compute each of these slightly faster with a "sliding window", since A'[i+1] = A'[i] - A[i-1] + A[i+2]. Same number of arithmetic operations but fewer loads.)
Now, to get the new value at location B[j], you just compute A'[j] + B'[j] + C'[j].
So far, we have not saved any work; we have just reordered the additions.
But now, having computed the updated row B, you can throw away A' and compute the next row:
D'[i] = D[i-1] + D[i] + D[i+1]
...which you can use with arrays B' and C' to compute the new values for row C without recomputing B' or C'. (You would implement this by shifting row B' and C' to become A' and B', of course... But this way was easier to explain. Maybe. I think.)
For each row, say B, we scan it once to produce B' doing 2n arithmetic operations, and a second time to compute the updated B which also takes 2n operations, so in total we do four additions/subtractions per element instead of eight.
Of course, in practice, you would compute C' while updating B for the same number of operations but better locality.
That's the only structural idea I have. A SIMD optimization expert might have other suggestions...
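For concreteness, here is a sketch (mine, not this answer's code) of one update step of the interior cells using the rolling row-sum idea; it assumes an N x N grid of long long values and leaves the edges and corners to their special-case handling:

#define N 1000

void step_interior(long long grid[N][N], long long next[N][N])
{
    static long long sums[3][N];   /* horizontal 3-sums of three consecutive rows */

    /* horizontal sums of rows 0 and 1 (slots 0 and 1) */
    for (int r = 0; r < 2; ++r)
        for (int j = 1; j < N - 1; ++j)
            sums[r][j] = grid[r][j-1] + grid[r][j] + grid[r][j+1];

    for (int r = 1; r < N - 1; ++r) {
        /* horizontal sums of the row below the one being updated (reusing the free slot) */
        for (int j = 1; j < N - 1; ++j)
            sums[(r + 1) % 3][j] = grid[r+1][j-1] + grid[r+1][j] + grid[r+1][j+1];

        /* the 3x3 neighbourhood sum is just three of these row sums added together */
        for (int j = 1; j < N - 1; ++j)
            next[r][j] = (sums[(r - 1) % 3][j] + sums[r % 3][j] + sums[(r + 1) % 3][j]) / 9;
    }
}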

If you look at the initial matrix you'll notice that it's symmetric i.e. m[i][j] = m[j][i]. Therefore the neighbors of m[i][j] will have the same values as the neighbors of m[j][i], so you only need to calculate values for a little more than half the matrix for each step.
This optimization reduces the # of calculations per grid from N^2 to ((N^2)+N)/2.
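A sketch of that (mine; neighborhood_avg() is again an assumed helper returning the floored average of a cell and its existing neighbours):

#define N 1000

long long neighborhood_avg(long long g[N][N], int i, int j);   /* assumed helper */

/* The initial grid is symmetric and the update rule preserves that symmetry,
   so computing the upper triangle and mirroring it is enough. */
void step_symmetric(long long grid[N][N], long long next_grid[N][N])
{
    for (int i = 0; i < N; ++i)
        for (int j = i; j < N; ++j) {
            next_grid[i][j] = neighborhood_avg(grid, i, j);
            next_grid[j][i] = next_grid[i][j];   /* mirror across the diagonal */
        }
}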

Related

How many rectangles contain exactly k ones on a grid N*M

You are given a grid of 0's and 1's with dimensions 1 ≤ N,M ≤ 2500, and a number 0 ≤ K ≤ 6. The task is to count the number of rectangles in the grid that contain exactly K ones.
It has to be quicker than O(N^2*M); something like O(NM log(N+M)) will work. My best approach was a DP with complexity O(N^2*M), but this is too slow. I've been told that the answer is divide and conquer, but I can't get it. Any idea?
One way to get the log factor is to divide horizontally: add the number of rectangles with K 1s entirely above the dividing line, the number of rectangles with K 1s entirely below the dividing line (both computed recursively), and, for each horizontal width (there are O(|columns|^2) of them), the number of rectangles of that fixed width that extend partly above and partly below the dividing line. We compute the last quantity by splitting K into two-part partitions (at most 7 of them, since K ≤ 6 and one part may be zero). For a fixed width and a fixed top or bottom line, we can binary search up or down on matrix prefix sums, which we can precalculate in O(M * N) and query in O(1) (a sketch of the prefix sums is given below).
f(vertical_range) =
    f(top_part) +
    f(bottom_part) +
    sum of NumTopRecs(w, i) * NumBottomRecs(w, k - i)
        over i <- [0...k],
             w <- all O(M^2) widths
The trick is that in each recursive call, we rotate the half that's passed to f such that the horizontal divider line becomes vertical, which means we've cut the M for our call (from which we draw O(M^2) widths) in 2.
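The 2D prefix sums that the binary searches rely on can be precomputed like this (a sketch; the array names and fixed bounds are my choices):

#define MAXN 2505

int grid[MAXN][MAXN];   /* the input, 1-based; row 0 and column 0 stay zero */
int pre[MAXN][MAXN];    /* pre[i][j] = number of 1s in the sub-grid (1,1)..(i,j) */

void build_prefix(int n, int m)
{
    for (int i = 1; i <= n; ++i)
        for (int j = 1; j <= m; ++j)
            pre[i][j] = grid[i][j] + pre[i-1][j] + pre[i][j-1] - pre[i-1][j-1];
}

/* number of 1s in the rectangle with corners (r1,c1) and (r2,c2), 1-based, in O(1) */
int count_ones(int r1, int c1, int r2, int c2)
{
    return pre[r2][c2] - pre[r1-1][c2] - pre[r2][c1-1] + pre[r1-1][c1-1];
}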

Naive way to find largest block in a rectangle of 1's and 0's

I'm trying to come up with the brute-force (naive) solution to find the largest block of 1s or 0s in a rectangle of 1s and 0s. I know optimal ways which can do it in O(n) time, where n is the total size of the rectangle.
1 1 0 1 0 1
1 0 0 0 1 1
1 0 0 0 1 1
1 1 0 1 1 0
In the above rectangle, it is the block starting at (Row 2, Col 2) of size 6. I was thinking of this:
Go through each element and then find the size it makes by iterating
in all directions from it.
Is it brute force? What will the complexity be? I'm going through all elements, which is n, but then I'm iterating in all directions; how much will that be?
I'm aware that this question has been asked 100 times, but those talk about optimal solutions. What I'm looking for is a brute-force solution and its complexity.
The algorithm you described looks somewhat like this C code:
//for each entry
for(int x = 0; x < width; ++x)
    for(int y = 0; y < height; ++y)
    {
        char lookFor = entries[x][y];
        int area = 0;
        int maxWidth = width;                  //rows below a mismatch cannot be wider than this
        for(int row = y; row < height; ++row)
        {
            if(entries[x][row] != lookFor)
                break;
            for(int col = x; col < x + maxWidth && col < width; ++col)
            {
                if(entries[col][row] != lookFor)
                {
                    maxWidth = col - x;        //cap the width for the following rows
                    break;
                }
                int currentArea = (col - x + 1) * (row - y + 1);
                if(currentArea > area)
                {
                    area = currentArea;
                    //save the current rect
                }
            }
        }
    }
There are four nested loops. The two outer loops together iterate exactly n times (with n being the number of entries). The inner loops iterate width * f1 and height * f2 times on average (with f1 and f2 being some constant fractions). The rest of the algorithm takes constant time and does not depend on the problem size.
Therefore, the complexity is O(n * f1 * width * f2 * height) = O(n^2), which essentially means "go to each entry and from there, visit each entry again", regardless of whether all entries really need to be checked or only a constant fraction of them (which still grows with the problem size).
Edit
The above explanations assume that the entries are not distributed randomly and that for larger fields it is more likely to find larger homogeneous subregions. If this is not the case and the average iteration count of the inner loops does not depend on the field size at all (e.g. for randomly distributed entries), then the resulting time complexity is O(n).
Brute force is generally split into two (sometimes sequential) parts. The first part is generating all possible candidates for solutions to the problem. The second part is testing them to see if they actually are solutions.
Brute force: Assume your rectangle is m x n. Generate all sub-rectangles of size a x b, where a is in {1..m} and b is in {1..n}. Set a maximum variable to 0. Test each sub-rectangle to see if it is all 0s or all 1s. If it is, let maximum = max(maximum, size(sub-rectangle)). Alternatively, simply start by testing the larger sub-rectangles and move towards testing smaller sub-rectangles; as soon as you find a sub-rectangle that is all 0s or all 1s, stop. The time complexity will be the same, because in the worst case for both methods you will still have to iterate through all sub-rectangles.
Time complexity:
Let's count the number of sub-rectangles generated at each step.
There are m*n subrectangles of size 1 x 1.
There are (m-1)*n subrectangles of size 2 x 1.
There are m*(n-1) subrectangles of size 1 x 2.
There are (m-1)*(n-1) subrectangles of size 2 x 2.
... < and so forth >
There are (m-(m-1))*(n-(n-1)) subrectangles of size m x n.
Thus the formula for counting the number of subrectangles of size a x b from a larger rectangle of size m x n is simply:
number_of_subrectangles_of_size_a_b = (m - a + 1) * (n - b + 1)
If we imagine these numbers laid out in an arithmetic series we get
1*1 + 1*2 + ... + 1*n + 2*1 + ... + m*n
This can be factored to:
(1 + 2 + ... + m) * (1 + 2 + ... + n).
These two arithmetic sums are on the order of O(m^2) and O(n^2) respectively. Thus generating all sub-rectangles of an m x n rectangle is O(m^2 * n^2). Now we look at the testing phase.
After generating all sub-rectangles, testing if each sub-rectangle of size a x b is all 0s or all 1s is O(a * b). Unlike the previous step of generating sub-rectangles of size a x b which scales upwards as a x b decreases, this step increases proportionally with the size of a x b.
e.g.: There is 1 sub-rectangle of size m x n. But testing to see if that rectangle is all 0s or all 1s takes O(m*n) time. Conversely there are m*n sub-rectangles of size 1. But testing to see if those rectangles are all 0s or all 1s takes only O(1) time per rectangle.
What you finally end up with for the time complexity of the testing phase is a series like this:
O( 1*1*(m*n) + 1*2*(m*(n-1)) + ... + (m*n)*(1*1) )
where each term is the number of sub-rectangles of a given size multiplied by the cost of testing one sub-rectangle of that size. Note 2 things here.
The largest term in the series is going to be somewhere close to (m/2)(n/2) * (m/2)(n/2), which is O(m^2 * n^2).
There are m * n terms in total in the series.
Thus the testing phase of the brute-force solution will be O(mn * m^2n^2) = O(m^3n^3).
Total time complexity is:
O(generating) + O(testing)
= O(m^2n^2 + m^3n^3)
= O(m^3n^3)
If the area of the given rectangle is, say, N, we will have O(N^3) time complexity.
Look into "connected components" algorithms for additional ideas. What you've presented as a two-dimensional array of binary values looks just like a binary black & white image. An important exception is that in image processing we typically allow a connected component (a blob of 0s or 1s) to have non-rectangular shapes. Some tweaks to the existing multi-pass and single-pass algorithms should be easy to implement.
http://en.wikipedia.org/wiki/Connected-component_labeling
Although it's a more general solution than you need, you could also run a connected components algorithm to find all connected regions (0s or 1s, background or foreground) and then filter the resulting components (a.k.a. blobs). I'll also mention that for foreground components it's preferable to select for "4-connectivity" rather than "8-connectivity," where the former means connectivity is allowed only at pixels above, below, left, and right of a center pixel, and the latter means connectivity is allowed for any of the eight pixels surrounding a center pixel.
A bit farther afield, for very large 2D arrays it may (just may) help to first reduce the search space by creating what we'd call an "image pyramid," meaning a stack of arrays of progressively smaller size: 1/2 each dimension (filled, as needed), 1/4, 1/8, and so on. A rectangle detectable in a reduced resolution image is a good candidate for being the largest rectangle in a very large image (or 2D array of bits). Although that may not be the best solution for whatever cases you're considering, it's scalable. Some effort would be required, naturally, to determine the cost of scaling the array/image versus the cost of maintaining relatively larger lists of candidate rectangles in the original, large image.
Run-length encoding may help you, too, especially since you're dealing with rectangles instead of connected components of arbitrary shape. Run-length encoding would express each row as stretches or "run lengths" of 0s or 1s. This technique was used to speed up connected component algorithms a decade or two ago, and it's still a reasonable approach.
Anyway, that's not the most direct answer to your question, but I hope it helps somewhat.

Algorithm: Distance transform - any faster algorithm?

I'm trying to solve the distance transform problem (using Manhattan distance). Basically, given a matrix of 0's and 1's, the program must assign to every position its distance to the nearest 1. For example, for this one
0000
0100
0000
0000
distance transform matrix is
2123
1012
2123
3234
Possible solutions from my head are:
Slowest ones (slowest because I have tried to implement them - they were lagging on very big matrices):
Brute force - for every 1 the program reads, update the distances accordingly from beginning to end.
Breadth-first search from 0's - for every 0, the program looks for the nearest 1, working outwards.
Same as 2, but starting from the 1's, mark every distance working outwards.
Much faster (read from other people's code)
Breadth-first search from 1's
1. Assign all values in the distance matrix to -1 or a very big value.
2. While reading the matrix, put the positions of all 1's into a queue.
3. While the queue is not empty
   a. Dequeue a position - let it be x
   b. For each position around x (that has distance 1 from it)
      if the position is valid (does not exceed the matrix dimensions) then
         if its distance is not initialized or is greater than (distance of x) + 1 then
            I. distance = (distance of x) + 1
            II. enqueue the position into the queue
I wanted to ask if there is a faster solution to this problem. I tried to search for distance transform algorithms, but most of them deal with Euclidean distance.
Thanks in advance.
The breadth first search would perform Θ(n*m) operations where n and m are the width and height of your matrix.
You need to output Θ(n*m) numbers, so you can't get any faster than that from a theoretical point of view.
I'm assuming you are not interested in going towards discussions involving cache and such optimizations.
Note that this solution works in more interesting cases. For example, imagine the same question, but there could be different "sources":
00000
01000
00000
00000
00010
Using BFS, you will get the following distance-to-closest-source in the same time complexity:
21234
10123
21223
32212
32101
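A sketch of that multi-source BFS in C (my code, not the poster's; it assumes fixed maximum dimensions and uses a plain array as the queue, since each cell is enqueued at most once):

#include <string.h>

#define MAXR 1000
#define MAXC 1000

int dist[MAXR][MAXC];
int qr[MAXR * MAXC], qc[MAXR * MAXC];   /* array-based queue of cell coordinates */

void distance_transform(int g[MAXR][MAXC], int rows, int cols)
{
    int head = 0, tail = 0;
    int dr[] = {1, -1, 0, 0}, dc[] = {0, 0, 1, -1};

    memset(dist, -1, sizeof dist);        /* -1 means "not reached yet" */
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (g[r][c] == 1) {           /* every 1 is a source at distance 0 */
                dist[r][c] = 0;
                qr[tail] = r; qc[tail] = c; ++tail;
            }

    while (head < tail) {
        int r = qr[head], c = qc[head]; ++head;
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            if (dist[nr][nc] != -1) continue;     /* already has its final distance */
            dist[nr][nc] = dist[r][c] + 1;
            qr[tail] = nr; qc[tail] = nc; ++tail;
        }
    }
}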
However, with a single source, there is another solution that might have a slightly better performance in practice (even though the complexity is still the same).
Before that, let's observe the following property.
Property: If source is at (a, b), then a point (x, y) has the following manhattan distance:
d(x, y) = abs(x - a) + abs(y - b)
This should be quite easy to prove. So another algorithm would be:
for r in rows
for c in cols
d(r, c) = abs(r - a) + abs(c - b)
which is very short and easy.
Unless you write and test it, there is no easy way of comparing the two algorithms. Assuming an efficient bounded queue implementation (with an array), you have the following major operations per cell:
BFS: queue insertion/deletion, visit of each node 5 times (four times by neighbors, and one time out of the queue)
Direct formula: two subtractions, two ifs (for the absolute values), and an addition
It would really depend on the compiler and its optimizations as well as the specific CPU and memory architecture to say which would perform better.
That said, I'd advise for going with whichever seems simpler to you. Note however that with multiple sources, in the second solution you would need multiple passes on the array (or multiple distance calculations in one pass) and that would definitely have a worse performance than BFS for a large enough number of sources.
You don't need a queue or anything like that at all. Notice that if (i,j) is at distance d from (k,l), one way to realise that distance is to go left or right |i-k| times and then up or down |j-l| times.
So, initialise your matrix with big numbers and stick a zero everywhere you have a 1 in your input. Now do something like this:
for (i = 0; i < sx-1; i++) {
    for (j = 0; j < sy-1; j++) {
        dist[i+1][j] = min(dist[i+1][j], dist[i][j]+1);
        dist[i][j+1] = min(dist[i][j+1], dist[i][j]+1);
    }
    /* also push downwards within the last column */
    dist[i+1][sy-1] = min(dist[i+1][sy-1], dist[i][sy-1]+1);
}
/* also push rightwards within the last row */
for (j = 0; j < sy-1; j++) {
    dist[sx-1][j+1] = min(dist[sx-1][j+1], dist[sx-1][j]+1);
}
At this point, you've found all of the shortest paths that involve only going down or right. If you do a similar thing for going up and left, dist[i][j] will give you the distance from (i, j) to the nearest 1 in your input matrix.
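For completeness, the matching up-and-left pass could look like this (same conventions as the snippet above, i.e. the same dist array and min helper):

for (i = sx-1; i > 0; i--) {
    for (j = sy-1; j > 0; j--) {
        dist[i-1][j] = min(dist[i-1][j], dist[i][j]+1);
        dist[i][j-1] = min(dist[i][j-1], dist[i][j]+1);
    }
    /* also push upwards within the first column */
    dist[i-1][0] = min(dist[i-1][0], dist[i][0]+1);
}
/* also push leftwards within the first row */
for (j = sy-1; j > 0; j--) {
    dist[0][j-1] = min(dist[0][j-1], dist[0][j]+1);
}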

Finding a square whose side length is R in the 2D plane

I was at a high-frequency trading firm interview, and they asked me:
Find a square whose side length is R, given n points in the 2D plane.
Conditions:
--its sides are parallel to the axes
--it contains at least 5 of the n points
--the running time must not depend on R
They told me to give them an O(n) algorithm.
Interesting problem, thanks for posting! Here's my solution. It feels a bit inelegant but I think it meets the problem definition:
Inputs: R, P = {(x_0, y_0), (x_1, y_1), ..., (x_N-1, y_N-1)}
Output: (u,v) such that the square with corners (u,v) and (u+R, v+R) contains at least 5 points from P, or NULL if no such (u,v) exist
Constraint: asymptotic run time should be O(n)
Consider tiling the plane with RxR squares. Construct a sparse matrix, B defined as
B[i][j] = {(x,y) in P | floor(x/R) = i and floor(y/R) = j}
As you are constructing B, if you find an entry that contains at least five elements stop and output (u,v) = (i*R, j*R) for i,j of the matrix entry containing five points.
If the construction of B did not yield a solution then either there is no solution or else the square with side length R does not line up with our tiling. To test for this second case we will consider points from four adjacent tiles.
Iterate the non-empty entries in B. For each non-empty entry B[i][j], consider the collection of points contained in the tile represented by the entry itself and in the tiles above and to the right. These are the points in entries: B[i][j], B[i+1][j], B[i][j+1], B[i+1][j+1]. There can be no more than 16 points in this collection, since each entry must have fewer than 5. Examine this collection and test if there are 5 points among the points in this collection satisfying the problem criteria; if so stop and output the solution. (I could specify this algorithm in more detail, but since (a) such an algorithm clearly exists, and (b) its asymptotic runtime is O(1), I won't go into that detail).
If after iterating the entries in B no solution is found then output NULL.
The construction of B involves just a single pass over P and hence is O(N). B has no more than N elements, so iterating it is O(N). The algorithm for each element in B considers no more than 16 points and hence does not depend on N and is O(1), so the overall solution meets the O(N) target.
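For concreteness, the O(1) per-entry test that this answer leaves unspecified could be done as follows (a sketch of one possible way, not necessarily what the answer had in mind): any qualifying square can be slid left and down until its left and bottom edges touch coordinates of candidate points, so it suffices to try every such corner.

/* Does some axis-parallel R x R square contain at least 5 of the (at most 16)
   candidate points?  Tries every pair of candidate coordinates as the
   bottom-left corner, which is O(1) since cnt <= 16. */
int contains_five(const double *px, const double *py, int cnt, double R)
{
    for (int a = 0; a < cnt; ++a)
        for (int b = 0; b < cnt; ++b) {
            double u = px[a], v = py[b];          /* candidate bottom-left corner */
            int inside = 0;
            for (int k = 0; k < cnt; ++k)
                if (px[k] >= u && px[k] <= u + R &&
                    py[k] >= v && py[k] <= v + R)
                    ++inside;
            if (inside >= 5)
                return 1;
        }
    return 0;
}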
Run through the set once, keeping the 5 largest x values in a (sorted) local array. Maintaining the sorted local array is O(N) (constant time performed at most N times).
Define xMax and xMin as the x-coordinates of the points with the largest and 5th largest x values respectively (i.e. a[0] and a[4]).
Sort a[] again on Y value, and set yMax and yMin as above, again in constant time.
Define deltaX = xMax - xMin, deltaY = yMax - yMin, and R = the larger of deltaX and deltaY.
The square of side length R located with its upper-right corner at (xMax, yMax) meets the criteria.
Observation if R is fixed in advance:
O(N) complexity means no sort is allowed except on a fixed number of points, as only a Radix sort would meet the criteria and it requires a constraint on the values of xMax-xMin and of yMax-yMin, which was not provided.
Perhaps the trick is to start with the point furthest down and left, and move up and right. The lower-left-most point can be determined in a single pass of the input.
Moving up and right in steps and counting points in the square requires sorting the points on X and Y in advance, which, to be done in O(N) time, requires that the Radix sort constraint be met.

Optimizing a DP on Intervals/Points

Well the problem is quite easy to solve naively in O(n^3) time. The problem is something like:
There are N unique points on a number line. You want to cover every
single one of these points with some set of intervals. You can
place an interval anywhere, and it costs B + MX to create an
interval, where B is the initial cost of creating an interval,
X is half the length of the interval, and M is the cost per unit
length of the interval. You want to find the minimum cost to cover
every single point.
Sample data:
Points = {0, 7, 100}
B = 20
M = 5
So the optimal solution would be 57.50, because you can build an interval [0,7] at cost 20 + 3.5×5 and build an interval [100,100] at cost 20 + 0×5, which adds up to 57.50.
I have an O(n^3) solution, where the DP state is the minimum cost to cover the points in [left, right]. So the answer would be in DP[1][N]. For every pair (i,j) I just iterate over k = {i...j-1} and compute DP[i][k] + DP[k + 1][j].
However, this solution is O(n^3) (kind of like matrix multiplication, I think), so it's too slow for N > 2000. Any way to optimize this?
Here's a quadratic solution:
Sort all the points by coordinate. Call the points p.
We'll keep an array A such that A[k] is the minimum cost to cover the first k points. Set A[0] to zero and all other elements to infinity.
For each k from 0 to n-1 and for each l from k+1 to n, set A[l] = min(A[l], A[k] + B + M*(p[l-1] - p[k])/2);
You should be able to convince yourself that, at the end, A[n] is the minimum cost to cover all n points. (We considered all possible minimal covering intervals and we did so from "left to right" in a certain sense.)
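A sketch of this quadratic solution (assuming the points are already sorted into p[0..n-1]; the function name and the use of double are my choices):

#include <float.h>
#include <stdlib.h>

double min_cover_cost_quadratic(const double *p, int n, double B, double M)
{
    double *A = malloc((size_t)(n + 1) * sizeof *A);
    A[0] = 0.0;                                /* zero points cost nothing */
    for (int k = 1; k <= n; ++k)
        A[k] = DBL_MAX;

    for (int k = 0; k < n; ++k)                /* first k points already covered */
        for (int l = k + 1; l <= n; ++l) {     /* one interval covers points k+1..l */
            double cost = A[k] + B + M * (p[l - 1] - p[k]) / 2.0;
            if (cost < A[l])
                A[l] = cost;
        }

    double ans = A[n];
    free(A);
    return ans;
}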
You can speed this up so that it runs in O(n log n) time; replace step 3 with the following:
Set A[1] = B. For each k from 2 to n, set A[k] = A[k-1] + min(M/2 * (p[k-1] - p[k-2]), B).
The idea here is that we either extend the previous interval to cover the next point or we end the previous interval at p[k-2] and begin a new one at p[k-1]. And the only thing we need to know to make that decision is the distance between the two points.
Notice also that, when computing A[k], I only needed the value of A[k-1]. In particular, you don't need to store the whole array A; only its most recent element.
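And a sketch of the streamlined O(n) scan (after sorting), which indeed keeps only a running total rather than the whole array:

/* Either extend the current interval across the next gap or start a new
   interval, whichever is cheaper.  Assumes p[] is sorted and n >= 1. */
double min_cover_cost_linear(const double *p, int n, double B, double M)
{
    double cost = B;                                  /* first interval, degenerate at p[0] */
    for (int k = 1; k < n; ++k) {
        double extend = M * (p[k] - p[k - 1]) / 2.0;  /* cost of stretching to p[k] */
        cost += (extend < B) ? extend : B;            /* vs. opening a new interval */
    }
    return cost;
}

On the sample data (points {0, 7, 100}, B = 20, M = 5) this returns 20 + 17.5 + 20 = 57.50, matching the expected answer.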
