Lights-out puzzle on a large grid - algorithm

I am trying to solve this algorithmic problem. For convenience, I have replicated it below:
We are given a grid G of size N x M (1 <= N,M <= 1000) containing only 1s and 0s. Choosing a cell toggles the values in that cell and all adjacent cells. Two cells are adjacent if they differ by exactly 1 in the row or the column index (i.e. diagonal cells are not adjacent). Our goal is to find a grid G' containing a 1 at every cell position that we need to choose in order to turn all cells of G to 0 (cells that we don't have to choose are marked with 0). For any G in this problem, G' is guaranteed to exist.
Note: There is no wraparound in the grid.
For example, if G is given as the following:
000
100
000
If we choose the center cell, G will become:
010
011
010
For this example, G' is:
001
011
001
It looks very similar to the lights-out puzzle. I am only able to solve this for small instances (N,M <= 20) using brute force. A Google search also yields a solution (for the lights-out puzzle) that uses Gaussian elimination, but that only works on small grids (N,M <= 100), since the method has cubic time complexity.
Could someone please advise me on how I could solve this problem?

Gaussian elimination doesn't have cubic time complexity in the input size. It takes O(N^3) time on an N x N matrix, but that matrix has N^2 entries, so in terms of the input size s this is only O(s^1.5).
To solve the lights out puzzle on an NxM grid with Gaussian elimination takes O(N^2*M) time and O(N*M) space, where N can be whichever dimension is smaller.
On an average PC or Mac you should be able to do 1000x1000 in a few seconds in C++ or Java if you make sure the inner loops are fast.
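Here is a minimal Python sketch of that approach (all names are mine; a fast C++/Java version would pack the rows of the system into machine words the same way this uses Python ints as bitsets). "Light chasing" makes every row after the first forced, so only the first-row presses are unknown; the lights left in the last row are an affine function of those unknowns over GF(2), which Gaussian elimination then solves. It assumes the grid is oriented so that the row length M is the smaller dimension (transpose first otherwise), matching the complexity above with M in the role of the smaller dimension:

    def solve_lights_out(G):
        # G: list of 0/1 lists, N rows of length M, with M the smaller side.
        N, M = len(G), len(G[0])

        def chase(first_row):
            # Press the given first-row bitmask, then clear each row by
            # pressing beneath every lit cell; return the press grid and
            # a bitmask of what is still lit in the last row.
            grid = [row[:] for row in G]
            press = [[0] * M for _ in range(N)]

            def toggle(r, c):
                press[r][c] ^= 1
                for rr, cc in ((r, c), (r+1, c), (r-1, c), (r, c+1), (r, c-1)):
                    if 0 <= rr < N and 0 <= cc < M:
                        grid[rr][cc] ^= 1

            for c in range(M):
                if (first_row >> c) & 1:
                    toggle(0, c)
            for r in range(1, N):
                for c in range(M):
                    if grid[r - 1][c]:
                        toggle(r, c)
            return press, sum(grid[N - 1][c] << c for c in range(M))

        # The last-row residual is affine in the first-row press mask x:
        # residual(x) = A*x ^ b over GF(2). Probe b and the columns of A.
        b = chase(0)[1]
        col = [chase(1 << c)[1] ^ b for c in range(M)]

        # Row j of the system A*x = b, packed as coefficient bits
        # plus the right-hand side at bit M.
        rows = [sum((((col[c] >> j) & 1) << c) for c in range(M))
                | (((b >> j) & 1) << M) for j in range(M)]

        # Gauss-Jordan elimination over GF(2); free variables stay 0.
        x, r, where = 0, 0, [-1] * M
        for c in range(M):
            piv = next((i for i in range(r, M) if (rows[i] >> c) & 1), None)
            if piv is None:
                continue
            rows[r], rows[piv] = rows[piv], rows[r]
            for i in range(M):
                if i != r and (rows[i] >> c) & 1:
                    rows[i] ^= rows[r]
            where[c], r = r, r + 1
        for c in range(M):
            if where[c] >= 0 and (rows[where[c]] >> M) & 1:
                x |= 1 << c
        return chase(x)[0]

On the 3x3 example in the question this returns the press grid 001 / 011 / 001, matching the stated G'.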

Related

Optimal solution for clustering of rectangles

I am looking for some approach (algorithm to be very specific) here.
Problem: There are N rectangles (r1, r2, ..., rn) scattered in the X-Y plane. I need to find an optimal way to cluster these rectangles into bigger bounding polygons.
Conditions for clustering:
The polygons should cover as many rectangles as possible.
The total count of bounding polygons should be the minimum possible, and at most K.
Each bounding polygon must have at least 70% of its area filled by the given rectangles.
Not all rectangles need to be covered.
Constraints:
1 million <= n <= billions
K = 50000
The problem can be thought of as identifying islands (at most 50k islands) that have a high density of rectangles (70% in each island). We can of course exclude certain rectangles, but the idea is to find a good solution; there is no single best solution.
I was trying to use K-means clustering, but it doesn't fit my case, since the number of clusters in a solution can be anywhere from 1 to K instead of exactly K. Maybe it requires an altogether different way of thinking. I hope I am clear!

Algorithm for placing rooks on an $n \times n$ chessboard such that they attack exactly $m$ squares

Suppose there is a chessboard with dimensions n x n and you put rooks on it such that they collectively attack m squares of the board. Given n and m, how can you determine how many rooks must be placed on the chessboard, and where to put them?
For example, let's say that the board dimensions are 3x3 and you have to cover 9 squares by placing rooks. To do this, i.e. to cover the board with rooks so that there are no safe squares, you can put 3 rooks at the coordinates (1,1); (1,2); (1,3) (the first number in a coordinate is the column, the second is the row). Since a rook attacks all squares in the same row and column as the one it stands on, all 9 squares are attacked.
But how can you find the optimal coordinates for any n and m with an algorithm?
If you cover x rows and y columns, then you cover N(x+y) - xy squares, and it takes max(x,y) rooks.
A simple algorithm to solve your problem would try to calculate the matching value of y for each possible value of x, and remember the solution that requires the fewest rooks.
For a more sophisticated solution that works for bigger problems, notice that -(N-x)(N-y) = N(x+y) - xy - N^2
If there is a solution that covers x rows and y columns to attack m squares, then:
m = -(N-x)(N-y) + N^2
so
(N-x)(N-y) = N^2 - m
Each solution therefore corresponds to a factoring of N^2 - m into a product, and the solution that takes the fewest rooks is the one with the factors closest together. You can start at sqrt(N^2 - m) and count down until you find a divisor, or use Fermat's factorization method: https://en.wikipedia.org/wiki/Fermat%27s_factorization_method
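A sketch of that recipe in Python (the function name and the coordinate layout are my own choices; coordinates are (column, row), 1-indexed as in the question, and to cover x rows and y columns rook i goes on row i, column min(i, y)):

    import math

    def place_rooks(n, m):
        # Fewest rooks on an n x n board attacking exactly m squares,
        # using (n - x)*(n - y) = n^2 - m from the answer above.
        if m == 0:
            return []                      # no rooks attack no squares
        target = n * n - m
        if target < 0:
            return None
        if target == 0:
            # Cover every row: n rooks on the main diagonal attack all n^2.
            return [(i, i) for i in range(1, n + 1)]
        # Factor target = a*b with the factors as close as possible.
        for a in range(math.isqrt(target), 0, -1):
            if target % a == 0:
                b = target // a
                if b > n - 1:
                    break   # any other factorization is even more skewed
                x, y = n - a, n - b        # rows/columns covered, x >= y >= 1
                # max(x, y) = x rooks: rook i on row i, column min(i, y),
                # so rows 1..x and columns 1..y are exactly the covered lines.
                return [(min(i, y), i) for i in range(1, x + 1)]
        return None

For n = 3, m = 9 this returns the diagonal (1,1); (2,2); (3,3), an equally valid 3-rook cover to the one in the question, and for n = 3, m = 8 it returns (1,1); (2,2).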

Algorithm to divide region such that sum of distance is minimized

Suppose we have n points in a bounded region of the plane. The problem is to divide it into 4 regions (with a horizontal and a vertical line) such that the sum of a metric over the regions is minimized.
The metric can be, for example, the sum of the distances between the points in each region, or any other measure of the spread of the points.
I don't know if any clustering algorithm might help me tackle this problem, or if, for instance, it can be formulated as a simple optimization problem where the decision variables are the "axes".
I believe this can be formulated as a MIP (Mixed Integer Programming) problem.
Let's introduce 4 quadrants A, B, C, D: A is upper-right, B is lower-right, etc. Then define a binary variable
delta(i,k) = 1 if point i is in quadrant k
0 otherwise
and continuous variables
Lx, Ly : coordinates of the lines
Obviously we have:
sum(k, delta(i,k)) = 1
xlo <= Lx <= xup
ylo <= Ly <= yup
where xlo, xup are the minimum and maximum x-coordinates (and ylo, yup the analogous bounds for y). Next we need to implement implications like:
delta(i,'A') = 1 ==> x(i)>=Lx and y(i)>=Ly
delta(i,'B') = 1 ==> x(i)>=Lx and y(i)<=Ly
delta(i,'C') = 1 ==> x(i)<=Lx and y(i)<=Ly
delta(i,'D') = 1 ==> x(i)<=Lx and y(i)>=Ly
These can be handled by so-called indicator constraints or written as linear inequalities, e.g.
x(i) <= Lx + (delta(i,'A')+delta(i,'B'))*(xup-xlo)
Similarly for the others. Finally the objective is
min sum((i,j,k), delta(i,k)*delta(j,k)*d(i,j))
where d(i,j) is the distance between points i and j. This objective can be linearized as well.
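To make the formulation concrete, here is a sketch in Python with PuLP (any MIP modeling layer works similarly; I used CPLEX via its own interface). The products delta(i,k)*delta(j,k) are linearized with pair variables w(p,k) >= delta(i,k) + delta(j,k) - 1, which suffices because minimization pushes each w down to 0 whenever the pair is not together in quadrant k:

    import itertools
    import pulp  # pip install pulp; ships with the CBC solver

    def quadrant_split(points):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        xlo, xup, ylo, yup = min(xs), max(xs), min(ys), max(ys)
        n = len(points)
        K = ['A', 'B', 'C', 'D']   # A upper-right, B lower-right, C lower-left, D upper-left
        pairs = list(itertools.combinations(range(n), 2))

        prob = pulp.LpProblem('quadrant_split', pulp.LpMinimize)
        d = pulp.LpVariable.dicts('delta', (range(n), K), cat='Binary')
        w = pulp.LpVariable.dicts('w', (range(len(pairs)), K), 0, 1)
        Lx = pulp.LpVariable('Lx', xlo, xup)
        Ly = pulp.LpVariable('Ly', ylo, yup)

        dist = lambda i, j: ((xs[i] - xs[j])**2 + (ys[i] - ys[j])**2) ** 0.5
        prob += pulp.lpSum(dist(i, j) * w[p][k]
                           for p, (i, j) in enumerate(pairs) for k in K)

        for i in range(n):
            prob += pulp.lpSum(d[i][k] for k in K) == 1
            # Big-M forms of the implications, e.g. "in A or B => x(i) >= Lx".
            prob += xs[i] >= Lx - (d[i]['C'] + d[i]['D']) * (xup - xlo)
            prob += xs[i] <= Lx + (d[i]['A'] + d[i]['B']) * (xup - xlo)
            prob += ys[i] >= Ly - (d[i]['B'] + d[i]['C']) * (yup - ylo)
            prob += ys[i] <= Ly + (d[i]['A'] + d[i]['D']) * (yup - ylo)
        for p, (i, j) in enumerate(pairs):
            for k in K:
                prob += w[p][k] >= d[i][k] + d[j][k] - 1

        prob.solve()
        return pulp.value(Lx), pulp.value(Ly)

Note this objective counts each unordered pair once rather than the ordered-pair sum above; that only scales the objective by 2 and does not change the optimum.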
After applying a few tricks, I could prove global optimality for 100 random points in about 40 seconds using CPLEX. This approach is not really suited to large datasets (the computation time increases quickly as the number of points grows).
I suspect this cannot be shoe-horned into a convex problem. Also, I am not sure this objective is really what you want: it will try to make all clusters about the same size (adding a point to a large cluster introduces lots of distances to be added to the objective, while adding a point to a small cluster is cheap). Maybe an average distance for each cluster is a better measure (but that makes the linearization more difficult).
Note: the following is probably incorrect. I will try to add another answer.
The one dimensional version of minimising sums of squares of differences is convex. If you start with the line at the far left and move it to the right, each point crossed by the line stops accumulating differences with the points to its right and starts accumulating differences with the points to its left. As you continue, the differences to the left increase and the differences to the right decrease, so you get a monotonic decrease, possibly a single point that can be on either side of the line, and then a monotonic increase.
I believe that the one dimensional problem of clustering points on a line is convex, but I no longer believe that the problem of drawing a single vertical line in the best position is convex. I worry about sets of points that vary in y co-ordinate so that the left hand points are mostly high up, the right hand points are mostly low down, and the intermediate points alternate between high up and low down. If this is not convex, the part of the answer that tries to extend to two dimensions fails.
So for the one dimensional version of the problem you can pick any point and work out in time O(n) whether that point should be to the left or right of the best dividing line, and by binary chop you can find the best line in time O(n log n).
I don't know whether the two dimensional version is convex or not, but you can try all possible positions for the horizontal line and, for each position, solve for the position of the vertical line using the same approach as in the one dimensional problem (now you have the sum of two convex functions to worry about, but the sum is still convex, so that's OK). You therefore solve at most O(n) one-dimensional problems, giving cost O(n^2 log n).
If the points aren't very strangely distributed, I would expect that you could save a lot of time by using the solution of the one dimensional problem at the previous iteration as a first estimate of the position of solution for the next iteration. Given a starting point x, you find out if this is to the left or right of the solution. If it is to the left of the solution, go 1, 2, 4, 8... steps away to find a point to the right of the solution and then run binary chop. Hopefully this two-stage chop is faster than starting a binary chop of the whole array from scratch.
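Here is a sketch of the one dimensional split in Python, assuming the unimodality argued above (names are mine). With sorted points and prefix sums of x and x^2, the pair cost of any split is O(1) via SUM over pairs of (x_i - x_j)^2 = k*SUM(x^2) - (SUM x)^2, so a ternary chop over the split index finds the best line after the O(n log n) sort:

    def best_1d_split(xs):
        # Split sorted xs into a left cluster xs[:k] and right cluster xs[k:],
        # minimizing the within-cluster sums of squared differences.
        xs = sorted(xs)
        n = len(xs)
        S1 = [0.0] * (n + 1)   # prefix sums of x
        S2 = [0.0] * (n + 1)   # prefix sums of x^2
        for i, v in enumerate(xs):
            S1[i + 1] = S1[i] + v
            S2[i + 1] = S2[i] + v * v

        def pair_cost(i, j):
            # Sum of squared differences over all pairs within xs[i:j].
            k, s1, s2 = j - i, S1[j] - S1[i], S2[j] - S2[i]
            return k * s2 - s1 * s1

        def cost(k):
            return pair_cost(0, k) + pair_cost(k, n)

        lo, hi = 0, n
        while hi - lo > 2:     # ternary chop; relies on cost(k) being unimodal
            m1 = lo + (hi - lo) // 3
            m2 = hi - (hi - lo) // 3
            if cost(m1) < cost(m2):
                hi = m2
            else:
                lo = m1
        k = min(range(lo, hi + 1), key=cost)
        return cost(k), k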
Here's another attempt. Lay out a grid so that, except in the case of ties, each point is the only point in its column and the only point in its row. Assuming no ties in any direction, this grid has N rows, N columns, and N^2 cells. If there are ties the grid is smaller, which makes life easier.
Separating the cells with a horizontal and vertical line is pretty much picking out a cell of the grid and saying that cell is the cell just above and just to the right of where the lines cross, so there are roughly O(N^2) possible such divisions, and we can calculate the metric for each such division. I claim that when the metric is the sum of the squares of distances between points in a cluster the cost of this is pretty much a constant factor in an O(N^2) problem, so the whole cost of checking every possibility is O(N^2).
The metric within a rectangle formed by the dividing lines is SUM_{i,j} [(X_i - X_j)^2 + (Y_i - Y_j)^2]. We can calculate the X contributions and the Y contributions separately. If you do some algebra (which is easier if you first subtract a constant so that everything sums to zero) you will find that the metric contribution from a co-ordinate is linear in the variance of that co-ordinate. So we want to calculate the variances of the X and Y co-ordinates within the rectangles formed by each division. https://en.wikipedia.org/wiki/Algebraic_formula_for_the_variance gives us an identity which tells us that we can work out the variance given SUM_i X_i and SUM_i X_i^2 for each rectangle (and the corresponding sums for the Y co-ordinate). This calculation can be inaccurate due to floating point rounding error, but I am going to ignore that here.
Given a value associated with each cell of a grid, we want to make it easy to work out the sum of those values within rectangles. We can create partial sums along each row, transforming 0 1 2 3 4 5 into 0 1 3 6 10 15, so that each cell in a row contains the sum of all the cells to its left plus itself. If we take these values and do partial sums up each column, we have just worked out, for each cell, the sum of the rectangle whose top right corner lies in that cell and which extends to the bottom and left sides of the grid.

The calculated values in the far right column give us the sum of all the cells on the same level as a given cell and below it. By subtracting off the rectangles we know how to calculate, we can find the value of a rectangle which lies at the right hand side and the bottom of the grid. Similar subtractions allow us to work out first the values of the rectangles to the left and right of any vertical line we choose, and then to complete our set of four rectangles formed by two lines crossing at any cell in the grid.

The expensive part of this is working out the partial sums, but we only have to do that once, and it costs only O(N^2). The subtractions and lookups used to work out any particular metric have only a constant cost; we have to do one for each of O(N^2) cells, but that is still only O(N^2).
(So we can find the best clustering in O(N^2) time by working out the metrics associated with all possible clusterings in O(N^2) time and choosing the best).
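A sketch of the whole O(N^2) scan in Python with NumPy (names are mine; the metric counts ordered pairs, which just doubles every quadrant's cost and does not change the best split). Points are reduced to x and y ranks, per-cell tables of count, sum x, sum x^2, sum y, sum y^2 are turned into 2D partial sums, and each candidate pair of lines is then costed in O(1) by the subtractions described above:

    import numpy as np

    def best_axis_split(points):
        pts = np.asarray(points, dtype=float)
        n = len(pts)
        rx = np.searchsorted(np.unique(pts[:, 0]), pts[:, 0])   # x ranks
        ry = np.searchsorted(np.unique(pts[:, 1]), pts[:, 1])   # y ranks
        nx, ny = rx.max() + 1, ry.max() + 1

        # One grid per statistic: count, sum x, sum x^2, sum y, sum y^2.
        stats = np.zeros((5, nx, ny))
        vals = [np.ones(n), pts[:, 0], pts[:, 0]**2, pts[:, 1], pts[:, 1]**2]
        for s, v in zip(stats, vals):
            np.add.at(s, (rx, ry), v)

        # 2D partial sums: P[:, i, j] aggregates ranks (< i, < j).
        P = np.zeros((5, nx + 1, ny + 1))
        P[:, 1:, 1:] = stats.cumsum(axis=1).cumsum(axis=2)

        def metric(cnt, sx, sxx, sy, syy):
            # Sum over ordered pairs of squared distance, via the variance
            # identity: SUM_{i,j} (x_i - x_j)^2 = 2*(k*Sxx - Sx^2).
            return 2 * (cnt * sxx - sx**2) + 2 * (cnt * syy - sy**2)

        best, total = (np.inf, 0, 0), P[:, nx, ny]
        for i in range(nx + 1):          # vertical line after x-rank i - 1
            for j in range(ny + 1):      # horizontal line after y-rank j - 1
                ll = P[:, i, j]          # lower-left quadrant's statistics
                lr = P[:, nx, j] - ll
                ul = P[:, i, ny] - ll
                ur = total - ll - lr - ul
                cost = sum(metric(*q) for q in (ll, lr, ul, ur))
                if cost < best[0]:
                    best = (cost, i, j)
        return best                      # (metric, x-rank cut, y-rank cut)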

Flipping rectangle bits [duplicate]

This question already has an answer here:
Algorithm for toggling values in a matrix [closed]
"Flip the World" is a game. In this game a matrix of size N*M is given, which consists of numbers. Each number can be 1 or 0 only. The rows are numbered from 1 to N, and the columns are numbered from 1 to M.
The following steps together count as a single move:
Select two integers x, y (1 <= x <= N and 1 <= y <= M), i.e. one square of the matrix.
All the integers in the rectangle denoted by (1,1) and (x,y), i.e. the rectangle having (1,1) and (x,y) as its top-left and bottom-right corners, are toggled (1 is made 0 and 0 is made 1).
For example, in this matrix (N=4 and M=3)
101
110
101
000
if we choose x=3 and y=2, the new state of matrix would be
011
000
011
000
For a given state of the matrix, the aim of the game is to reach a state where all numbers are 1. What is the minimum number of moves required?
How can I solve this problem?
This is not a homework problem. I'm pretty confused by it; I've been fighting with it for the past two days, maintaining a 2-D array of the counts of ones and zeros.
I tried balancing the number of ones against the number of zeros, but that didn't work out. Any hints or solutions?
Source: Hackerearth
Hint #1: Use a greedy approach from bottom to top. A move at (x, y) toggles exactly the cells (i, j) with i <= x and j <= y, so a given cell is affected only by moves at positions below and to the right of it (inclusive). That is: if, after the moves made so far, the current cell is 0, then you must toggle the rectangle (1,1)-(x,y) anchored at it, because no later move can fix that cell. So traverse all cells from bottom to top and from right to left, and whenever the current cell is 0, perform a move on it.
This yields an O(n^4) solution if every move is simulated directly.
To get an O(n^2) solution, use, for example, accumulated sums over the suffix rectangles, as in the sketch below.
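Here is a sketch of that greedy in Python (names mine). Going bottom-right to top-left, a padded table flips[i][j] stores the parity of the moves already made at positions (x, y) with x >= i and y >= j, so each cell's current value, and hence whether its move is forced, is an O(1) lookup:

    def min_moves_to_all_ones(grid):
        # grid: list of rows of 0/1 values (or strings of '0'/'1').
        n, m = len(grid), len(grid[0])
        flips = [[0] * (m + 1) for _ in range(n + 1)]   # suffix move parity
        moves = 0
        for i in range(n - 1, -1, -1):
            for j in range(m - 1, -1, -1):
                # Parity of earlier moves whose rectangle covers (i, j),
                # by inclusion-exclusion over the padded suffix table.
                covered = (flips[i + 1][j] ^ flips[i][j + 1]
                           ^ flips[i + 1][j + 1])
                press = 0
                if (int(grid[i][j]) ^ covered) == 0:    # cell would end as 0
                    press = 1                           # forced move here
                    moves += 1
                flips[i][j] = press ^ covered
        return moves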

Best parallel method for calculating the integral of a 2D function

In a number-crunching program, I have a function in three dimensions which can be just 1 or 0. I do not know the function in advance, but I need to know the total "surface" on which the function is equal to zero. As a similar problem, I could draw a rectangle over the 2D representation of the map of the United Kingdom. The function is equal to 0 at sea and 1 on land. I need to know the total water surface. What is the best parallel algorithm or method for doing this?
My first thought was the following approach: a) divide the 2D map area into a rectangular grid. For each point at the center of a cell, check whether it is land or water; this can be done in parallel. At the end of the procedure I will have a matrix of ones and zeros, which gives the area to some precision. Now I want to increase this precision, so b) choose the cells that are in the border regions between zeros and ones (what is the best criterion for doing this?), divide those cells again into smaller cells, and repeat the process until the desired accuracy is reached. I guess the critical parameters in this process are the grid size at each new stage, and how to store and check the cells that belong to the border area. Finally, the most efficient method, from the computational point of view, is the one that performs the fewest checks needed to get the total surface to the desired accuracy.
First of all, it looks like you are talking about a 3D function: for two coordinates x and y you have f(x, y) = 0 if (x, y) belongs to the sea, and f(x, y) = 1 otherwise.
Having said that, you can use the following simple approach.
1. Split your rectangle into N subrectangles, where N is the number of your processors (or processor cores, or nodes in a cluster, etc.).
2. For each subrectangle, use the Monte Carlo method to calculate the surface of the water.
3. Add the N values to calculate the total surface of the water.
Of course, you can use any other method to calculate the surface; Monte Carlo was just an example. But the idea is the same: subdivide your problem into N subproblems, solve them in parallel, then combine the results.
Update: For the Monte Carlo method the error estimate decreases as 1/sqrt(S), where S is the number of samples; for instance, reducing the error by a factor of 2 requires a 4-fold increase in the number of sample points.
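A minimal sketch of steps 1-3 in Python (the land/sea indicator f is a made-up stand-in for the real, unknown-in-advance function, and the domain is assumed to be the unit square):

    import random
    from concurrent.futures import ProcessPoolExecutor

    def f(x, y):
        # Hypothetical indicator: 1 on land, 0 at sea. Stands in for the
        # real function from the question.
        return 1 if (x - 0.5) ** 2 + (y - 0.5) ** 2 < 0.1 else 0

    def water_in_strip(args):
        # Monte Carlo estimate of the water area inside one subrectangle.
        x0, x1, y0, y1, samples = args
        hits = sum(1 - f(random.uniform(x0, x1), random.uniform(y0, y1))
                   for _ in range(samples))
        return (x1 - x0) * (y1 - y0) * hits / samples

    def water_surface(n_workers=4, samples=100_000):
        # One vertical strip of the unit square per worker, in parallel.
        strips = [(i / n_workers, (i + 1) / n_workers, 0.0, 1.0, samples)
                  for i in range(n_workers)]
        with ProcessPoolExecutor(max_workers=n_workers) as ex:
            return sum(ex.map(water_in_strip, strips))

    if __name__ == '__main__':
        print(water_surface())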
I believe that your approach is reasonable.
Choose the cells that are in the border regions between zeroes and ones (what is the best criterion for doing this?)
Each cell has 8 surrounding cells (3x3), or 24 surrounding cells (5x5). If at least one of the 9 or 25 cells contains land, and at least one of them contains water, increase the accuracy for the whole block of cells (3x3 or 5x5) and query again.
When the accuracy is good enough, instead of splitting further, just add the land area to the sum.
Efficiency
Use a producer-consumer queue. Create n threads, where n equals the number of cores on your machine. All threads should do the same job:
Dequeue a geo-cell from the queue.
If the area of the cell is still large, divide it into 3x3 or 5x5 subcells and check each subcell for land/sea. If there is a mix, enqueue all of these subcells; if it is only land, just add its area; if it is only sea, do nothing.
To start, just divide the whole area into reasonably sized cells and enqueue all of them.
You can also optimize by not enqueueing all 9 or 25 subcells when there is a mix, but examining the pattern instead (e.g. only the top/bottom/left/right cells).
Edit:
There is a tradeoff between accuracy and performance: if the initial cell size is too large, you may miss small lakes or small islands. Therefore the optimization criterion should be: start with the largest cells that still assure enough accuracy.
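Here is a single-threaded sketch of this work-queue scheme in Python (names are mine; f is again a stand-in land/sea indicator, and the deque would become a thread-safe queue shared by the n worker threads):

    from collections import deque

    def adaptive_land_area(f, x0, y0, size, min_size):
        # Classify cells by sampling the centers of their 3x3 subcells:
        # all land -> count the area, all sea -> drop, mixed -> subdivide.
        area = 0.0
        work = deque([(x0, y0, size)])
        while work:
            cx, cy, s = work.popleft()
            sub = s / 3.0
            cells = [(cx + i * sub, cy + j * sub)
                     for i in range(3) for j in range(3)]
            land = [f(x + sub / 2, y + sub / 2) for x, y in cells]
            if all(land):
                area += s * s
            elif any(land):
                if sub <= min_size:
                    # Accuracy floor: approximate by the sampled fraction.
                    area += s * s * sum(land) / 9.0
                else:
                    work.extend((x, y, sub) for x, y in cells)
        return area

The water surface is then the total area minus the returned land area. As noted above, an all-sea or all-land verdict at a coarse level can miss small islands or lakes, hence the tradeoff in the initial cell size.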
