Programming Puzzle: How to paint a board?

There is an N x M board we should paint. We can paint either an entire row or an entire column at once. Given an N x M matrix of the colours of all board cells, find the minimal number of painting operations needed to paint the board.
For example: we should paint a 3 x 3 board as follows (R - red, B - blue, G - green):
B, B, B
B, R, R
B, G, G
The minimal number of painting operations is 4:
Paint row 0 with Blue
Paint row 1 with Red
Paint row 2 with Green
Paint column 0 with Blue
How would you solve it?

This looks like a fun problem. Let me take a shot at it with some pseudocode.
Function MinPaints(Matrix) Returns Integer
    If the matrix is empty, return 0
    Find all rows and columns which have a single color
    If there are none, return infinity, since there is no solution
    Set the current minimum to infinity
    For each single-color row or column:
        Remove the row/column from the matrix
        Call MinPaints with the new matrix
        If the result is less than the current minimum, set the current minimum to the result
    End loop
    Return the current minimum + 1
End Function
I think that will solve your problem, but I haven't tried any optimization, so it may not be fast enough. I doubt this problem is solvable in sub-exponential time.
Here is how this algorithm would solve the example:
BBB
BRR
BGG
|
+---BRR
| BGG
| |
| +---RR
| | GG
| | |
| | +---GG
| | | |
| | | +---[]
| | | | |
| | | | Solvable in 0
| | | |
| | | Solvable in 1
| | |
| | +---RR
| | | |
| | | +---[]
| | | | |
| | | | Solvable in 0
| | | |
| | | Solvable in 1
| | |
| | Solvable in 2
| |
| Solvable in 3
| BB
+---Another branch with RR ...
| GG
Solvable in 4
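The pseudocode above translates fairly directly into Python. This is a sketch, not the answerer's code: it peels off single-colour rows/columns in reverse order and memoises on the set of remaining rows and columns, but is still exponential in the worst case.

```python
from functools import lru_cache

def min_paints(matrix):
    """Minimal number of row/column paint operations, found by
    peeling off single-colour rows/columns in reverse order."""
    if not matrix or not matrix[0]:
        return 0

    @lru_cache(maxsize=None)
    def solve(rows, cols):
        # rows/cols: indices of the rows/columns still to account for
        if not rows or not cols:
            return 0
        best = float('inf')
        for r in rows:  # a single-colour row can be the last paint
            if len({matrix[r][c] for c in cols}) == 1:
                rest = tuple(x for x in rows if x != r)
                best = min(best, 1 + solve(rest, cols))
        for c in cols:  # likewise for a single-colour column
            if len({matrix[r][c] for r in rows}) == 1:
                rest = tuple(x for x in cols if x != c)
                best = min(best, 1 + solve(rows, rest))
        return best  # stays infinite if no single-colour line exists

    return solve(tuple(range(len(matrix))), tuple(range(len(matrix[0]))))
```

On the example board this returns 4, matching the trace above.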

For starters, you can try an informed exhaustive search.
Let your states graph be: G=(V,E) where V = {all possible boards} and E = {(u,v) | you can move from board u to v within a single operation}.
Note that you do not need to generate the graph in advance - you can generate it on the fly, using a successors(board) function that returns all the successors of a given board.
You will also need h:V->R - an admissible heuristic function that evaluates a board (1).
Now you can run A*, or bi-directional BFS [or a combination of both]; your source is a blank board and your target is the requested board. Because the heuristic is admissible, A* is both complete (it always finds a solution if one exists) and optimal (it finds the shortest solution). [The same goes for bi-directional BFS.]
Drawbacks:
Though the algorithm is informed, it will still have exponential behavior. But if this is an interview question, I believe a non-efficient solution is better than no solution.
Though complete and optimal, if there is no solution the algorithm may run for a very long time before it realizes it has exhausted all possibilities.
(1) An example of an admissible heuristic is h(board) = #(miscolored_squares)/max{m,n}, since one operation recolors at most max{m,n} squares.
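For concreteness, here is one way the successor function and the footnoted heuristic might look in Python (a sketch with hypothetical names; boards are lists of lists of colour labels):

```python
def successors(board, colours):
    """All boards reachable in one operation: paint a full row
    or a full column with one of the available colours."""
    n, m = len(board), len(board[0])
    for colour in colours:
        for r in range(n):                 # paint row r
            nb = [row[:] for row in board]
            nb[r] = [colour] * m
            yield nb
        for c in range(m):                 # paint column c
            nb = [row[:] for row in board]
            for r in range(n):
                nb[r][c] = colour
            yield nb

def h(board, target):
    """Admissible heuristic from footnote (1): one operation
    recolours at most max(m, n) squares."""
    n, m = len(board), len(board[0])
    wrong = sum(board[r][c] != target[r][c]
                for r in range(n) for c in range(m))
    return wrong / max(n, m)
```

On the 3x3 example, a blank board has 9 miscoloured squares, so h = 9/3 = 3, which never overestimates the true answer of 4.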


Any algorithm to fill a space with smallest number of boxes

Consider a 3D grid, just like a checkerboard but with an extra dimension. Say I have a certain number of cubes in that grid, each cube occupying a 1x1x1 cell. Each of these cubes is an item.
What I would like to do is replace/combine these cubes into larger boxes occupying any number of cells on the X, Y and Z axes, so that the resulting number of boxes is as small as possible while preserving the overall "appearance".
It's probably unclear, so I'll give a 2D example. Say I have a 2D grid containing several squares occupying 1x1 cells. A letter represents the cells occupied by a given item, each item having a different letter from the others. In the first example we have 10 different items, each of them occupying a 1x1 cell.
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
| | A | B | C | D | |
+---+---+---+---+---+---+
| | E | F | G | H | |
+---+---+---+---+---+---+
| | | K | L | | |
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
That's my input data. I could now optimize it, i.e. reduce the number of items while still occupying the same cells, in multiple possible ways, one of which could be:
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
| | A | B | B | C | |
+---+---+---+---+---+---+
| | A | B | B | C | |
+---+---+---+---+---+---+
| | | B | B | | |
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
Here, instead of 10 items, I only have 3 (i.e. A, B and C). However it can be optimized even more:
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
| | A | A | A | A | |
+---+---+---+---+---+---+
| | A | A | A | A | |
+---+---+---+---+---+---+
| | | B | B | | |
+---+---+---+---+---+---+
| | | | | | |
+---+---+---+---+---+---+
Here I only have two items, A and B. This is as optimized as this can be.
What I am looking for is an algorithm capable of finding the best item sizes and arrangement, or at least a reasonably good one, so that I have as few items as possible while occupying the same cells - and in 3D!
Is there such an algorithm? I'm sure there are domains where this kind of algorithm would be useful, and I need it for a video game. Thanks!
Perhaps a simpler algorithm is possible, but a set partition should suffice.
Min x1 + x2 + x3 + ...  // where xk is 1 if the k-th candidate box is chosen, 0 otherwise
such that x1 + x3 = 1   // if the 1st and 3rd boxes are the ones containing the 1st item
          x2 + x3 = 1   // if the 2nd and 3rd boxes are the ones containing the 2nd item, and so on
x1, x2, x3, ... are binary
You have one constraint for each item; each constraint stipulates that the item is part of exactly one chosen box. The objective minimizes the total number of boxes.
This is an NP-hard integer program, however.
The number of variables in this problem could be exponential. You need to have an efficient way of enumerating them -- that is figuring out when a contiguous box can be found that is capable of including all points in it. It is here that you have to take into account information such as whether the grid is 2d or 3d, how you define a contiguous "box", etc.
Such problems are usually solved by column-generation, where these columns of the integer program are dynamically generated on the fly.
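As a toy illustration of the set-partition view (this is not column generation - it enumerates every candidate box up front, so it only works for tiny 2D grids):

```python
def min_boxes(cells):
    """Minimum number of axis-aligned rectangles that exactly
    partition `cells`, a set of (row, col) pairs.  Brute-force
    branch and bound over all candidate boxes."""
    cells = frozenset(cells)
    if not cells:
        return 0
    rs = [r for r, _ in cells]
    cs = [c for _, c in cells]
    # enumerate every rectangle fully contained in the occupied
    # cells -- these are the "columns" of the integer program
    boxes = []
    for r1 in range(min(rs), max(rs) + 1):
        for r2 in range(r1, max(rs) + 1):
            for c1 in range(min(cs), max(cs) + 1):
                for c2 in range(c1, max(cs) + 1):
                    box = frozenset((r, c)
                                    for r in range(r1, r2 + 1)
                                    for c in range(c1, c2 + 1))
                    if box <= cells:
                        boxes.append(box)
    best = len(cells)  # one 1x1 box per cell always works

    def solve(remaining, used):
        nonlocal best
        if not remaining:
            best = min(best, used)
            return
        if used + 1 >= best:
            return  # cannot beat the incumbent solution
        cell = min(remaining)  # branch on one fixed uncovered cell
        for box in boxes:
            if cell in box and box <= remaining:
                solve(remaining - box, used + 1)

    solve(cells, 0)
    return best
```

On the 2D example above (a 2x4 block with a 1x2 block hanging below it) this finds the optimum of 2 boxes.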
If I understand David Eppstein's explanation (1) (see section 3), then a solution can be found via a maximum independent set in the bipartite intersection graph of axis-aligned diagonals connecting one concave vertex to another. (This would be 2D. I'm not sure about 3D, although perhaps it involves evaluating hyperplanes instead of lines?)
In your example, there is only one such diagonal:
________
| |
|_x....x_|
|____|
The two xs represent connected concave vertices. The maximal independent set of edges here contains only one edge, splitting the polygon in two.
Here's another with only one axis-parallel edge connecting two concave vertices, x and x. This polygon, though, also has two concave vertices, a and b, that do not have an opposite, axis-parallel match. In that case, it seems to me, each concave vertex without a partner would split the polygon it's on in two (either vertically or horizontally):
____________
| |
| |x
| . |
| . |a
|___ . |
b| . |
| .___|
|________|x
results in 4 rectangles:
____________
| |
| |x
| . |
| ..|a
|___.......... |
b| . |
| .___|
|________|x
Here's one with two intersecting axis-parallel diagonals, each connecting two concave vertices, (x,x) and (y,y):
____________
| |
| |x_
| . |
| . |
|___ . . . .z. .|y
y| . |
| .____|
|________|x
In this case, as I understand, the intersection graph again contains only one independent set:
(y,z) (z,y) (x,z) (z,x)
yielding 4 rectangles as a solution.
Since I'm not completely sure how the "intersection graph" in the paper is defined, I would welcome any clarifying comments.
(1) David Eppstein, "Graph-Theoretic Solutions to Computational Geometry Problems" (submitted 26 Aug 2009)

Feature Tracking by using Lucas Kanade algorithm

I am implementing the Lucas Kanade feature tracker in C++ (refer to page 6 of the Lucas Kanade Feature Tracker paper).
One thing is unclear in implementing equation 23 of the attached paper. I think the calculation of the matrix G should happen inside the K loop, not outside it. When patch B lies at the border of frame j, it is not useful to use the full spatial gradient matrix G computed before the K loop (as described in the paper). For frame j, G should be calculated over the visible portion of patch B only.
Patch A Patch B
| |
| |
-----|--- -|-------
| |---| | | | |
| | | | |--| |
| |---| | | | |
| | |--| |
--------- ---------
Frame i Frame j

In a graph, find longest path with a certain property?

I have a directed graph (more specifically, a control flow graph), and each of the vertices has a set of properties.
I'd like to find (or write) an algorithm that, given a vertex V with property P, finds the longest path to some vertex E such that all vertices along all possible paths from V to E contain the property P.
Example 1
Say I had the following graph. (Please excuse the bad ascii drawing.)
+----+
+--------+P +--------+
| +----+ |
| V1 |
| |
| |
+--v--+ |
+----+P ++ |
| +-----++ +--v--+
| | +----+P |
| | | +-----+
+--v--+ +--v--+ |
|P +-+ +-+P | |
+-----+ | | +-----+ |
| | |
| | |
+v-v--+ |
V6 |P +---------+ |
+-----+ | |
| |
| |
| |
| |
+-v--v-+
V7 |P |
+---+--+
|
|
+---v--+
V8 |!P |
+------+
Starting at V1, the longest path where P always holds on all possible paths is V1 -> V7. Note that the other paths, say V1 -> V6, are "valid" in that P always holds, but V1 -> V7 is the longest.
Example 2
This example is the same as above, except now the P doesn't hold in V3:
+----+
+--------+P +--------+
| +----+ |
| V1 |
| |
| |
+--v--+ |
+----+P ++ |
| +-----++ +--v--+
| | +----+!P | V3
| | | +-----+
+--v--+ +--v--+ |
|P +-+ +-+P | |
+-----+ | | +-----+ |
| | |
| | |
+v-v--+ |
V6 |P +---------+ |
+-----+ | |
| |
| |
| |
| |
+-v--v-+
V7 |P |
+---+--+
|
|
+---v--+
V8 |!P |
+------+
In this case, starting at V1, the longest path where P always holds in all possible paths is V1 -> V6. The path V1 -> V7 is not valid, because there is a path between V1 and V7 in which P does not hold.
Further notes about my situation
The graph could be cyclic.
The graph will be of a "small to medium" size, with maybe 1000 vertices or less.
The graph does not necessarily always have one root and one leaf, like my examples above.
Question
Is there a standard algorithm for computing such paths?
The problem has no known efficient solution, since the Hamiltonian Path problem - given a graph, is there a path that visits every vertex exactly once? - easily reduces to it.
The reduction is simple: given a Hamiltonian Path instance, label all vertices with P and ask for the longest path. Since Hamiltonian Path is NP-complete, this problem is NP-hard, and no polynomial-time solution is known.
An alternative is brute-force search (in its simplest form, generate all permutations and choose the best valid one) - but that becomes infeasible for large graphs. You might also consider a heuristic approach (one that finds a "good" solution, but not necessarily the optimal one), such as genetic algorithms.
Another possible solution is to reduce the problem to a Traveling Salesman Problem, and use some existing TSP solver. Note that while this problem is also NP-hard, since it is well-studied, there are some pretty efficient solutions for medium size graphs.
Also, if your graph happens to be somehow 'special' (a DAG for example), you might utilize some smart techniques to achieve significant speed up to polynomial time, like Dynamic Programming.
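To illustrate the DAG remark: when the graph is acyclic, the longest path through P-vertices falls out of a simple memoised DFS. This is a sketch under assumptions - `adj` and `has_p` are hypothetical dict representations, and it answers the simpler "longest path staying on P-vertices" question rather than the asker's stricter "P holds on all possible paths" condition.

```python
def longest_p_path(adj, has_p, start):
    """Length (in edges) of the longest path from `start` that stays
    on vertices with property P.  Valid only for DAGs -- a cycle of
    P-vertices would make the recursion loop forever."""
    memo = {}

    def dfs(v):
        if v in memo:
            return memo[v]
        best = 0
        for w in adj.get(v, []):
            if has_p.get(w, False):       # only extend through P-vertices
                best = max(best, 1 + dfs(w))
        memo[v] = best
        return best

    return dfs(start) if has_p.get(start, False) else 0
```

Each vertex is solved once, so this runs in O(V + E) on a DAG - the "significant speed up to polynomial time" mentioned above.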

MapReduce matrix multiplication complexity

Assume, that we have large file, which contains descriptions of the cells of two matrices (A and B):
+---------------------------------+
| i | j | value | matrix |
+---------------------------------+
| 1 | 1 | 10 | A |
| 1 | 2 | 20 | A |
| | | | |
| ... | ... | ... | ... |
| | | | |
| 1 | 1 | 5 | B |
| 1 | 2 | 7 | B |
| | | | |
| ... | ... | ... | ... |
| | | | |
+---------------------------------+
And we want to calculate the product of these matrices: C = A x B
By definition: C_i_j = sum over k ( A_i_k * B_k_j )
And here is a two-step MapReduce algorithm, for calculation of this product (I will provide a pseudocode):
First step:
function Map (input is a single row of the file from above):
i = row[0]
j = row[1]
value = row[2]
matrix = row[3]
if(matrix == 'A')
emit(j, {i, value, 'A'}) // key on A's column index, which plays the role of k
else
emit(i, {j, value, 'B'}) // key on B's row index, which plays the role of k
Complexity of this Map function is O(1)
function Reduce(Key, List of tuples from the Map function):
Matrix_A_tuples =
filter( List of tuples from the Map function, where matrix == 'A' )
Matrix_B_tuples =
filter( List of tuples from the Map function, where matrix == 'B' )
for each tuple_A from Matrix_A_tuples
i = tuple_A[0]
value_A = tuple_A[1]
for each tuple_B from Matrix_B_tuples
j = tuple_B[0]
value_B = tuple_B[1]
emit({i, j}, {value_A * value_B, 'C'})
Complexity of this Reduce function is O(N^2)
After the first step we will get something like the following file (which contains O(N^3) lines):
+---------------------------------+
| i | j | value | matrix |
+---------------------------------+
| 1 | 1 | 50 | C |
| 1 | 1 | 45 | C |
| | | | |
| ... | ... | ... | ... |
| | | | |
| 2 | 2 | 70 | C |
| 2 | 2 | 17 | C |
| | | | |
| ... | ... | ... | ... |
| | | | |
+---------------------------------+
So all we have to do is sum the values from the lines that contain the same i and j.
Second step:
function Map (input is a single row of the file produced in the first step):
i = row[0]
j = row[1]
value = row[2]
emit({i, j}, value)
function Reduce(Key, List of values from the Map function)
i = Key[0]
j = Key[1]
result = 0;
for each Value from List of values from the Map function
result += Value
emit({i, j}, result)
After the second step we will get the file, which contains cells of the matrix C.
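The two jobs can be checked locally with a tiny single-process MapReduce skeleton. This is an illustration, not a real framework - `map_reduce`, `map1`, etc. are made-up names, records use the file's layout with 0-based indices, and the shuffle is just a dict:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal in-memory MapReduce: map, group by key, reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out

# --- first job: join A's columns with B's rows on the index k ---
def map1(row):
    i, j, value, matrix = row
    if matrix == 'A':
        yield j, (i, value, 'A')   # key = A's column index k
    else:
        yield i, (j, value, 'B')   # key = B's row index k

def reduce1(k, tuples):
    a_tuples = [t for t in tuples if t[2] == 'A']
    b_tuples = [t for t in tuples if t[2] == 'B']
    for i, value_a, _ in a_tuples:
        for j, value_b, _ in b_tuples:
            yield (i, j), value_a * value_b   # a partial product of C_i_j

# --- second job: sum the partial products for each cell of C ---
def map2(row):
    yield row          # rows are already keyed by (i, j)

def reduce2(key, values):
    yield key, sum(values)

def multiply(records):
    return dict(map_reduce(map_reduce(records, map1, reduce1),
                           map2, reduce2))
```

For 2x2 matrices A = [[1,2],[3,4]] and B = [[5,6],[7,8]], `multiply` reproduces C = [[19,22],[43,50]].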
So the question is:
Taking into account that there are multiple instances in the MapReduce cluster, what is the most correct way to estimate the complexity of the provided algorithm?
The first approach that comes to mind is this:
Assume that the number of instances in the MapReduce cluster is K.
Because the file produced after the first step contains O(N^3) lines, the overall complexity can be estimated as O((N^3)/K).
But this estimate doesn't take many details into account: the network bandwidth between instances of the MapReduce cluster, the ability to distribute data between instances and perform most of the calculations locally, etc.
So I would like to know the best approach for estimating the efficiency of the provided MapReduce algorithm, and whether it makes sense to use Big-O notation to estimate the efficiency of MapReduce algorithms at all.
As you said, Big-O estimates computational complexity and does not take networking issues (bandwidth, congestion, delay, ...) into consideration.
If you want to measure how efficient the communication between instances is, you need other, networking-oriented metrics.
However, note that if your file is not big enough you will not see an improvement in execution speed, because MapReduce only works efficiently with big data. Moreover, your algorithm has two steps, which means two jobs; between one job and the next, MapReduce takes time to write out the intermediate file and start the new job, which can slightly affect performance.
You can reason about efficiency in terms of speed and time, since the MapReduce approach is generally faster on big data when compared to a sequential algorithm.
Efficiency can also be considered with regard to fault tolerance: MapReduce handles failures by itself, so programmers do not need to handle instance or network failures explicitly.

The "Waiting lists problem"

A number of students want to get into sections of a class. Some are already signed up for one section but want to change, so they all get on the wait lists. A student can get into a new section only if someone drops from that section. No student is willing to drop a section they are already in unless they can be sure to get into a section they are waiting for. The wait list for each section is first come, first served.
Get as many students into their desired sections as you can.
The stated problem can quickly devolve into a gridlock scenario. My question is: are there known solutions to this problem?
One trivial solution would be to take each section in turn, force the first student from the waiting list into the section, and then check whether someone ends up dropping out once things are resolved (O(n) or more in the number of sections). This would work for some cases, but I think there might be better options involving forcing more than one student into a section (O(n) or more in the student count) and/or operating on more than one section at a time (O(bad) :-)
Well, this just comes down to finding cycles in the directed graph of sections, right? Each edge is a student who wants to move from one node to another, and any time you find a cycle, you delete it, because those students can resolve their needs with each other. You're finished when you're out of cycles.
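A sketch of that cycle idea (simplified: it considers one outgoing request per section at a time and ignores multi-seat capacities and the exact FCFS tie-breaking of the original problem):

```python
def resolve_swaps(requests):
    """requests: list of (student, current_section, wanted_section),
    earliest queue positions first.  Repeatedly find a cycle of
    requests and let those students swap with each other."""
    satisfied = []
    pending = list(requests)
    while True:
        # earliest pending request out of each section
        first = {}
        for student, cur, want in pending:
            if cur != want and cur not in first:
                first[cur] = (student, want)
        # follow section -> wanted-section edges looking for a cycle
        cycle = None
        for start in first:
            seen, node = [], start
            while node in first and node not in seen:
                seen.append(node)
                node = first[node][1]
            if node in seen:
                cycle = seen[seen.index(node):]
                break
        if cycle is None:
            return satisfied      # no cycles left: gridlock for the rest
        movers = {first[s][0] for s in cycle}
        satisfied.extend(sorted(movers))
        pending = [r for r in pending if r[0] not in movers]
```

For example, two students who each want the other's section form a cycle and both get in, while a third student waiting on a full section stays stuck.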
OK, let's try. We have 8 students (1..8) and 4 sections. Each student is in a section, and each section has room for 2 students. Most students want to switch, but not all.
In the table below, we see the students their current section, their required section and the position on the queue (if any).
+------+-----+-----+-----+
| stud | now | req | que |
+------+-----+-----+-----+
| 1 | A | D | 2 |
| 2 | A | D | 1 |
| 3 | B | B | - |
| 4 | B | A | 2 |
| 5 | C | A | 1 |
| 6 | C | C | - |
| 7 | D | C | 1 |
| 8 | D | B | 1 |
+------+-----+-----+-----+
We can present this information in a graph:
+-----+ +-----+ +-----+
| C |---[5]--->1| A |2<---[4]---| B |
+-----+ +-----+ +-----+
1 | | 1
^ | | ^
| [1] [2] |
| | | |
[7] | | [8]
| V V |
| 2 1 |
| +-----+ |
\--------------| D |--------------/
+-----+
We try to find a section with a vacancy, but there is none. Since all sections are full, we need a dirty trick: take a random section with a non-empty queue - section A in this case - and assume it has an extra position. Student 5 can then enter section A, leaving a vacancy at section C, which is taken by student 7. This leaves a vacancy in section D, which is taken by student 2. We now have a vacancy at section A again, so we can drop the assumed extra position and are left with a simpler graph.
If the path never returns to section A, undo the moves and mark A as an invalid starting point. Retry with another section.
If there are no valid sections left, we are finished.
Right now we have the following situation:
+-----+ +-----+ +-----+
| C | | A |1<---[4]---| B |
+-----+ +-----+ +-----+
| 1
| ^
[1] |
| |
| [8]
V |
1 |
+-----+ |
| D |--------------/
+-----+
We repeat the trick with another random section, and this solves the graph.
If you start with several students currently not assigned, you add an extra dummy section as their starting point. Of course, this means there must be vacancies in some sections, or the problem is not solvable.
Note that, because of the order in the queue, it is possible that no solution exists.
This is actually a graph problem. You can think of each of these waiting-list dependencies as an edge in a directed graph. If this graph has a cycle, then you have one of the situations you described. Once you have identified a cycle, you can choose any point on it to "break" the cycle by "over-filling" one of the sections, and you will know that things will settle correctly because there was a cycle in the graph.
