2D Matrix Problem - how many people can get a color that they want? - algorithm

Given a bitarray such as the following:
      C0 C1 C2 C3 C4 C5
    *********************
P0  *  0  0  1  0  1  0 *
P1  *  0  1  0  0  1  0 *
P2  *  0  0  0  1  1  0 *
P3  *  1  0  0  0  0  1 *
P4  *  0  0  0  0  0  0 *
P5  *  0  0  0  0  0  0 *
P6  *  1  0  0  0  0  0 *
    *********************
Each row represents a different person P_i, while each column represents a different color C_j. If a given cell A[i][j] is 1, it means that person i would like color j. A person can only get one color, and a color can only be given to one person.
In general, the number of people P > 0, and the number of colors C >= 0.
How can I, time-efficiently, compute the maximal amount of people who can get a color that they want?
The correct answer to the example above would be 5.
Person 6 (P6) only has one wish, so he gets color 0 (C0)
Since C0 is now taken, P3 only has one wish left, so he gets C5.
P0 gets C2, P1 gets C1 and P2 gets C3.
My first idea was a greedy algorithm that simply favored the person (i.e. row) with the fewest wanted colors. This works for the most part, but is simply too slow for my liking, as it runs in O(P*(P*C)) time, which is O(n^3) when n = P = C. Any ideas for an algorithm (or another data structure) that can solve the problem faster?
This might be a duplicate of another similar question, but I had trouble finding the correct name for this type of problem, so bear with me if that is the case.

This is a classical problem known as maximum cardinality bipartite matching. Here, you have a bipartite graph where on one side you have the vertices corresponding to the people and on the other side the vertices corresponding to the colors. An edge between a person and a color exists if there is a 1 in the corresponding entry of the matrix.
In the general case, the best known algorithms have worst-case performance O(E*sqrt(V)), where E is the number of edges in the graph and V is the number of vertices. One such algorithm is Hopcroft-Karp. I suggest you read the Wikipedia explanation that I linked.
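If you want something concrete to start from, here is a rough Python sketch of the simpler augmenting-path approach to bipartite matching (Kuhn's algorithm). It is not Hopcroft-Karp, but it runs in O(V*E), which is already a big improvement over the greedy; Hopcroft-Karp refines the same augmenting-path idea with BFS phases. The matrix and function names below are just for illustration.
def max_matching(A):
    # A[i][j] == 1 means person i would accept color j
    n_people = len(A)
    n_colors = len(A[0]) if A else 0
    owner = [-1] * n_colors          # owner[j] = person currently holding color j

    def try_assign(person, visited):
        for j in range(n_colors):
            if A[person][j] and not visited[j]:
                visited[j] = True
                # take color j if it is free, or if its current owner can be re-routed
                if owner[j] == -1 or try_assign(owner[j], visited):
                    owner[j] = person
                    return True
        return False

    return sum(try_assign(i, [False] * n_colors) for i in range(n_people))

A = [[0, 0, 1, 0, 1, 0],
     [0, 1, 0, 0, 1, 0],
     [0, 0, 0, 1, 1, 0],
     [1, 0, 0, 0, 0, 1],
     [0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 0, 0],
     [1, 0, 0, 0, 0, 0]]
print(max_matching(A))   # -> 5, matching the example above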

Related

Finding probability that chess Knight will stay on chessboard after k moves with dynamic programming

I was trying out "Knight Probability in Chessboard" problem from leetcode:
Given n, k, row and column, we have to find the probability that a knight initially placed at the cell indexed by [row, column] will still be on the n x n chessboard after k moves.
I wanted to do it by addition, that is, maintain the number of ways we can reach the cell at index [x,y] in the kth step at dynamic programming memory location [x,y,k], then sum the counts in all cells at the kth index and divide by 8^k. That is, if I start at index [0,0] with n=4, the values at successive k-th indices will be:
After step 1:
0 0 0 0
0 0 1 0
0 1 0 0
0 0 0 0
After step 2:
4 0 2 0
0 0 0 2
2 0 0 0
0 2 0 4
After step 3:
0 6 0 0
6 0 11 0
0 11 0 6
0 0 6 0
Only the first step's output seems to be correct. After the second step, the sum is 2+2+2+2+4+4=16 and the probability is 16/8^2 = 0.25. However, the actual answer is 0.125. After the third step, the sum becomes 6+6+6+6+11+11=46 and the probability is 46/8^3 = 0.0898. But the actual answer is 0.039. Where does this dynamic programming approach go wrong?
Sample calculation for step 2
Bottom up approach:
Start by filling P(x_start, y_start, 0) = 1 and setting (x_start, y_start) in a map (from positions to booleans) previous_layer_map. Also, set the counter current_layer to 1.
Iterate through each of the n^2 positions of the board. For each of them, check in O(1) whether it reaches a square in previous_layer_map. If so:
If (x, y) was never seen before in the current layer (current_layer_map[x][y] == false), fill
P(x, y, current_layer) = P(x_reached, y_reached, current_layer-1)/8
and set (x, y) in current_layer_map.
Else, set
P(x, y, current_layer) += P(x_reached, y_reached, current_layer-1)/8
After you finish iterating through each of the n^2 positions of the board, empty previous_layer_map, fill it with the elements of current_layer_map, and empty current_layer_map. Also, increase the counter current_layer. Then start a new iteration. Go like this until you reach the k-th layer.
Total time complexity: O(k * n^2).
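For concreteness, here is a small Python sketch of the layered DP described above; it stores probabilities directly (rather than counts), uses two full n x n layers instead of maps, and the function name is just illustrative.
MOVES = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]

def knight_probability(n, k, row, col):
    prev = [[0.0] * n for _ in range(n)]
    prev[row][col] = 1.0                      # layer 0: the knight starts here
    for _ in range(k):
        cur = [[0.0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                if prev[x][y]:
                    for dx, dy in MOVES:
                        nx, ny = x + dx, y + dy
                        if 0 <= nx < n and 0 <= ny < n:
                            cur[nx][ny] += prev[x][y] / 8.0   # moves off the board are simply lost
        prev = cur
    return sum(map(sum, prev))

print(knight_probability(4, 3, 0, 0))         # -> 0.0390625, i.e. 20 / 8**3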
Top down approach:
Let P(x, y, k) be the probability that the knight is at the square (x, y) at the k-th step. Look at all squares that the knight could have come from (you can get them in O(1); just look at the board with pen and paper and derive the formulas for the different cases: knight in the corner, knight on the border, knight in a central region, etc.). Let them be (x1, y1), ..., (xj, yj). For each of these squares, what is the probability that the knight jumps to (x, y)? Considering that it can go off the board, it's always 1/8. So:
P(x, y, k) = (P(x1, y1, k-1) + ... + P(xj, yj, k-1))/8
The base case is k = 0.
P(x, y, 0) = 1 if (x, y) = (x_start, y_start) and P(x, y, 0) = 0 otherwise.
That is your recurrence formula. You can use dynamic programming to calculate it.
Open question: how do we analyze the time complexity of this solution? Is it equivalent to the bottom-up approach described in my other answer?
I was incorrectly incrementing the counts. For example, in the diagram shown at the end of the original question, the red arrows increment from 1 to 2. That shouldn't be the case, since going from one cell to the next represents the same single path to the next cell; it does not create two different paths. The same holds for the blue arrow. So the corrected steps are:
After step 1
0 0 0 0
0 0 1 0
0 1 0 0
0 0 0 0
After step 2
2 0 1 0
0 0 0 1
1 0 0 0
0 1 0 2
After step 3
0 2 0 0
2 0 6 0
0 6 0 2
0 0 2 0
and (2+2+2+2+6+6)/8^3 = 20/8^3 = 0.039
which is the correct answer!

algorithm for dividing x amount of people into n rooms of different sizes

For a project I have to design an algorithm that will fit a group of people into hotel rooms given their preference. I have created a dictionary in Python that has a person as key, and as a value a list of all people they would like to be in a room with.
There are different types of rooms that can hold between 2-10 people. How many rooms of what type there are is specified by the user of the program.
I have tried to brute force this problem by trying all room combinations and then giving each room a score based on the preference of the residents and looking for the maximum score. This works fine for small group sizes but having a group of 200 will give 200! combinations which my poor computer will not be able to compute within my lifetime.
I was wondering if there is an algorithm that I have not been able to find with the solution to my problem.
Thanks in advance!
Thijs
What you can do is think of your dictionary as a graph. Then you can create an adjacency matrix.
For example, let's say you have a group of 4 people: A, B, C and D.
A: wants to be with B and C
B: wants to be with A
C: wants to be with D
D: wants to be with A and C
Your matrix would look like this:
// A B C D
// A 0 1 1 0
// B 1 0 0 0
// C 0 0 0 1
// D 1 0 1 0
Let's call this matrix M. You can then calculate the transpose (let's call it MT) and add M to MT. You will get something like this.
// A B C D
// A 0 2 1 1
// B 2 0 0 0
// C 1 0 0 2
// D 1 0 2 0
Then order the rows (or the columns, it doesn't matter because the matrix is symmetric) based on the sum of their values.
// A B C D
// A 0 2 1 1
// C 1 0 0 2
// D 1 0 2 0
// B 2 0 0 0
Do the same with the columns
// A C D B
// A 0 1 1 2
// C 1 0 2 0
// D 1 2 0 0
// B 2 0 0 0
Fill your rooms starting from the first row, based on the greatest value in that row, and reduce the matrix by removing people that were assigned a room. You should select the biggest room first.
For example if we have a room that can have 2 people you'd assign person B and A to it since the biggest value in the first line is 2 and it corresponds to person B.
The reduced matrix would then be:
// C D
// C 0 2
// D 2 0
And you loop till all is done.
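If it helps, here is a rough Python sketch of this kind of greedy: it builds the mutual-interest matrix M + MT, takes the biggest room first, seeds it with the person who has the most mutual interest left, and then pulls in the people most wanted by those already placed. It is a variant of the procedure above rather than a literal transcription, and all names are just illustrative.
def assign_rooms(prefs, room_sizes):
    people = list(prefs)
    idx = {p: i for i, p in enumerate(people)}
    n = len(people)
    score = [[0] * n for _ in range(n)]            # M + M^T: mutual interest
    for p, wanted in prefs.items():
        for q in wanted:
            score[idx[p]][idx[q]] += 1
            score[idx[q]][idx[p]] += 1
    unassigned = set(range(n))
    rooms = []
    for size in sorted(room_sizes, reverse=True):  # biggest room first
        if not unassigned:
            break
        # seed with the person who has the most mutual interest left
        seed = max(unassigned, key=lambda i: sum(score[i][j] for j in unassigned))
        room = [seed]
        unassigned.remove(seed)
        while len(room) < size and unassigned:
            # add the person most wanted by those already in the room
            best = max(unassigned, key=lambda j: sum(score[i][j] for i in room))
            room.append(best)
            unassigned.remove(best)
        rooms.append([people[i] for i in room])
    return rooms

prefs = {'A': ['B', 'C'], 'B': ['A'], 'C': ['D'], 'D': ['A', 'C']}
print(assign_rooms(prefs, [2, 2]))   # e.g. [['A', 'B'], ['C', 'D']]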
You already had a greedy solution described. So instead I'll suggest a simulated annealing solution.
For this you first assign everyone to rooms randomly. And now you start considering swapping people at random. You always accept swaps that improve your score, but have a chance of accepting a bad swap. The chance of accepting a bad swap goes down if the swap is really bad, and also goes down with time. After you've experimented enough, whatever you have is probably pretty good.
It is called "simulated annealing" because it is a simulation of the process by which a slowly cooling substance forms a well-organized crystal structure. So the parameter that you usually use is called T for temperature. And a standard function is:
import math
import random

def maybe_swap(assignment, x, y, T):
    # score() and swap() are problem-specific: score() rates an assignment,
    # swap() returns a copy of the assignment with persons x and y exchanged.
    score_now = score(assignment)
    swapped = swap(assignment, x, y)
    score_swapped = score(swapped)
    # improvements are always accepted (exp(...) > 1); worse swaps are accepted
    # with probability exp(delta / T), which shrinks as T cools
    if random.random() < math.exp((score_swapped - score_now) / T):
        return swapped
    else:
        return assignment
And then you just have to play around with how much work to do. Something like this:
for count_down in range(400, 0, -1):   # stop above zero so the temperature never hits 0
    for i in range(n * n):
        x = random.randrange(n)
        y = random.randrange(n)
        if x != y:
            assignment = maybe_swap(assignment, x, y, count_down / 100.0)
(You should play around with the parameters.)

Which algorithms are there to find the Smallest Set of Smallest Rings?

I have an unweighted, undirected, connected graph. Typically it's a chemical compound with lots of cycles side by side. The problem is common in this field and is called what the title says. A good algorithm is Horton's, but I can't seem to find any exact, step-by-step information about it.
Put plainly, my problem is this: Algorithm for finding minimal cycles in a graph, but unfortunately the link to the site is dead.
I only found Python code for Figueras' algorithm, but Figueras does not work in every case; sometimes it doesn't find all rings.
The problem is similar to this: Find all chordless cycles in an undirected graph. I tried it, but it didn't work for more complex graphs like mine.
I found 4-5 sources of the needed information, but the algorithm is never fully explained.
I don't seem to find any algorithm for SSSR although it seems a common problem, mainly in the chemistry field.
Horton's algorithm is pretty simple. I'll describe it for your use case.
For each vertex v, compute a breadth-first search tree rooted at v. For each edge wx such that v, w, x are pairwise distinct and such that the least common ancestor of w and x is v, add a cycle consisting of the path from v to w, the edge wx, and the path from x back to v.
Sort these cycles by size nondecreasing and consider them in order. If the current cycle can be expressed as the "exclusive OR" of cycles considered before it, then it is not part of the basis.
The test in Step 2 is the most complicated part of this algorithm. What you need to do, basically, is write out the accepted cycle and the candidate cycle as a 0-1 incidence matrix whose rows are indexed by cycle and whose columns are indexed by edge, then run Gaussian elimination on this matrix to see whether it makes an all-zero row (if so, discard the candidate cycle).
With some effort, it's possible to save the cost of re-eliminating the accepted cycles every time, but that's an optimization.
For example, if we have a graph
a---b
| /|
| / |
|/ |
c---d
then we have a matrix like
       ab ac bc bd cd
abca    1  1  1  0  0
bcdb    0  0  1  1  1
abdca   1  1  0  1  1
where I'm cheating a bit because abdca is not actually one of the cycles generated in Step 1.
Elimination proceeds as follows:
ab ac bc bd cd
1 1 1 0 0
0 0 1 1 1
1 1 0 1 1
row[2] ^= row[0];
ab ac bc bd cd
1 1 1 0 0
0 0 1 1 1
0 0 1 1 1
row[2] ^= row[1];
ab ac bc bd cd
1 1 1 0 0
0 0 1 1 1
0 0 0 0 0
so that set of cycles is dependent (don't keep the last row).
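The same elimination is easy to code if you pack each cycle's edge set into an integer bitmask and keep a reduced basis over GF(2). Here is a minimal sketch of just the independence test from Step 2 (not Horton's full algorithm), run on the example above:
def edge_mask(cycle_edges, edge_index):
    m = 0
    for e in cycle_edges:
        m |= 1 << edge_index[e]          # set one bit per edge of the cycle
    return m

def try_add_to_basis(mask, basis):
    # basis maps "position of the leading 1-bit" -> reduced cycle mask
    while mask:
        top = mask.bit_length() - 1
        if top not in basis:
            basis[top] = mask
            return True                  # independent: keep this cycle
        mask ^= basis[top]               # eliminate the leading bit (XOR = addition mod 2)
    return False                         # reduced to all zeros: dependent, discard

edges = ['ab', 'ac', 'bc', 'bd', 'cd']
idx = {e: i for i, e in enumerate(edges)}
basis = {}
print(try_add_to_basis(edge_mask(['ab', 'ac', 'bc'], idx), basis))        # abca  -> True
print(try_add_to_basis(edge_mask(['bc', 'bd', 'cd'], idx), basis))        # bcdb  -> True
print(try_add_to_basis(edge_mask(['ab', 'ac', 'bd', 'cd'], idx), basis))  # abdca -> False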

Special scheduling Algorithm (pattern expansion)

Question
Do you think genetic algorithms are worth trying for the problem below, or will I hit local-minima issues?
I think some aspects of the problem are a great fit for a generator / fitness-function style setup. (If you've botched a similar project, I would love to hear from you, so I don't do something similar.)
Thank you for any tips on how to structure things and nail this right.
The problem
I'm searching a good scheduling algorithm to use for the following real-world problem.
I have a sequence with 15 slots like this (the digits may vary from 0 to 20):
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
(And there are in total 10 different sequences of this type)
Each sequence needs to be expanded into a matrix, where each slot corresponds to one column.
1 1 0 0 1 1 1 0 0 0 1 1 1 0 0
1 1 0 0 1 1 1 0 0 0 1 1 1 0 0
0 0 1 1 0 0 0 1 1 1 0 0 0 1 1
0 0 1 1 0 0 0 1 1 1 0 0 0 1 1
The constraints on the matrix are:
[row-wise, i.e. horizontally] The 1s must be placed in runs of either 11 or 111
[row-wise] Two runs of 1s must be separated by a gap of at least 00 (two zeros)
The sum of each column should match the original array.
The number of rows in the matrix should be optimized.
The array then needs to be allocated to one of 4 different matrices, which may have different numbers of rows:
A, B, C, D
A, B, C and D are real-world departments. The load needs to be spread reasonably fairly over the course of a 10-day period, so as not to interfere with other department goals.
Each of the matrices is compared with the expansion of the 10 different original sequences, so you have:
A1, A2, A3, A4, A5, A6, A7, A8, A9, A10
B1, B2, B3, B4, B5, B6, B7, B8, B9, B10
C1, C2, C3, C4, C5, C6, C7, C8, C9, C10
D1, D2, D3, D4, D5, D6, D7, D8, D9, D10
Certain spots on these may be reserved (not sure if I should make it just reserved/not reserved or function-based). The reserved spots might be meetings and other events.
The sum of each row (for instance all the A's) should be approximately the same, within 2%; i.e. sum(A1 through A10) should be approximately the same as sum(B1 through B10), etc.
The number of rows can vary, so you have for instance:
A1: 5 rows
A2: 5 rows
A3: 1 row, where that single row could for instance be:
0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
etc..
Sub-problem
I'd be very happy to solve only part of the problem. For instance, being able to input:
1 1 2 3 4 2 2 3 4 2 2 3 3 2 3
and get an appropriate array of sequences of 1s and 0s, minimizing the number of rows, following the constraints above.
Sub-problem solution attempt
Well, here's an idea. This solution is not based on a genetic algorithm, but some of these ideas could be used to move in that direction.
Basis vectors
First of all, you should generate what I think of as the basis vectors. For instance, if your sequence were 3 numbers long rather than 15, the basis vectors would be:
v1 = [1 1 0]
v2 = [0 1 1]
v3 = [1 1 1]
Any solution for sequence length 3 would be a linear combination of these three vectors using only non-negative integers. In other words, the general solution would be
a*v1 + b*v2 + c*v3
where a, b and c are non-negative integers. For the sequence [1 2 1], the solution is a = 1, b = 1, c = 0 (one copy of v1 plus one copy of v2). What you first want to do is find all of the possible basis vectors of length 15. From my rough calculations I think that there are somewhere between 300-400 basis vectors of length 15. I can give you some tips towards generating them if you want.
Finding solutions
Now, what you want to do is sort these basis vectors by their sums/magnitudes. Then, in searching for your solution, you start with the basis vectors that have the largest sums; we start with those because they lead to fewer total rows. We also have an array, veccoefs, which contains an entry for the linear coefficient of each basis vector. At the beginning of the search, all the veccoefs are 0.
So we take the first basis vector (the one with the largest sum/magnitude) and subtract it from the sequence until we either create an unsolvable result (having a 0 1 0 in it, for instance) or any of the numbers in the result would go negative. We store the number of times we subtract the vector in veccoefs. We use the result after subtracting the basis vector from the sequence as the sequence for the next basis vector. If there are only zeros left in the result, we stop the loop.
I'm not sure of the efficiency/accuracy of this method, but it might at least give you some ideas.
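To make this concrete, here is one possible Python sketch of the generation step and the greedy subtraction, assuming a valid row is any pattern of runs of two or three 1s separated by at least two 0s; it does not implement the "unsolvable remainder" check mentioned above, and the names are just placeholders.
def valid_rows(length):
    # every 0/1 row whose 1s form runs of length 2 or 3, separated by >= two 0s
    rows = []
    def build(row, pos, placed_any):
        if pos >= length:
            if placed_any:
                rows.append(row)
            return
        build(row + [0], pos + 1, placed_any)           # leave this slot empty
        for run in (2, 3):                              # or start a run of 2 or 3 here
            if pos + run <= length:
                gap = min(2, length - (pos + run))      # mandatory gap, unless the row ends
                build(row + [1] * run + [0] * gap, pos + run + gap, True)
    build([], 0, False)
    return rows

def greedy_expand(sequence):
    # subtract the "heaviest" valid rows first; a nonzero leftover means the greedy failed
    remaining = list(sequence)
    chosen = []
    for r in sorted(valid_rows(len(sequence)), key=sum, reverse=True):
        while all(x >= y for x, y in zip(remaining, r)) and any(remaining):
            remaining = [x - y for x, y in zip(remaining, r)]
            chosen.append(r)
    return chosen, remaining

rows, leftover = greedy_expand([2] * 15)
print(len(rows), leftover)   # for this input the greedy finds a 4-row expansion with an all-zero leftover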
Other possible solutions
Another idea for solving this is to use the basis vectors and form the problem as an optimization/least squares problem. You form a matrix of the basis vectors such that the basic problem will be minimizing Sum[(Ax - b)^2] where A is the matrix of basis vectors, b is the input sequence, and x are the basis vector coefficients. However, you also want to minimize the number of rows, so you can add a term like x^T*x to the minimization function where x^T is the transpose of x. The hard part in my opinion is finding differentiable terms to add that will encourage integer vector coefficients. If you can think of a way to do that, then optimization could very well be a good way to do this.
Also, you might consider a Metropolis-type Monte Carlo solution. You would choose randomly whether to add a vector, remove a vector, or substitute a vector at each step. The vector to be added/removed/substituted would be chosen randomly. The probability of accepting the change would be a ratio of the suitabilities of the solutions before and after the change. The suitability could be equal to the difference between the current solution and the sequence, squared and summed, minus the number of rows/basis vectors involved in the solution. You would need to put in appropriate constants for the various terms to try to get the acceptance rate around 50%. I kind of doubt that this will work very well, but I thought that you should still consider it when looking for possible solutions.
A GA can be applied to this problem, but it won't be a 5-minute task. You need to put several things together, without knowing which implementation of each of them is best.
So:
Solution representation - how will you represent a possible solution? Using a matrix seems the most straightforward; a collection of one-dimensional arrays is also possible.
But you have some constraints, so maybe the SuperGene concept is worth considering?
You must use proper mutation/crossover operators for the chosen gene representation.
How will you enforce the constraints on solutions? Destroy those that are not valid? What if they contain valuable information? Maybe let them stay in the population but add a penalty to their fitness, so they still contribute to offspring but won't make it into later generations?
Anyway, I think a GA can be applied to this problem. Is it worth it? Usually GAs are not the best algorithm, but they are a decent choice when others fail. I would go with a GA, just because it would be the most fun, but I would also look for an alternative solution (just in case).
P.S. Personal insight: I was solving the N-Queens Problem for 70 < N < 100 (an NxN board, N queens). The algorithm worked fine for lower N (maybe it was just trying all combinations?), but with N in this range I couldn't find a proper solution. Fitness quickly jumped to about 90% of the maximum, but in the end there were always two queens in conflict. It was a very naive implementation, though.

Sorting a binary 2D matrix?

I'm looking for some pointers here as I don't quite know where to start researching this one.
I have a 2D matrix with 0 or 1 in each cell, such as:
1 2 3 4
A 0 1 1 0
B 1 1 1 0
C 0 1 0 0
D 1 1 0 0
And I'd like to sort it so it is as "upper triangular" as possible, like so:
4 3 1 2
B 0 1 1 1
A 0 1 0 1
D 0 0 1 1
C 0 0 0 1
The rows and columns must remain intact, i.e. elements can't be moved individually and can only be swapped "whole".
I understand that there'll probably be pathological cases where a matrix has multiple possible sorted results (i.e. same shape, but differ in the identity of the "original" rows/columns.)
So, can anyone suggest where I might find some starting points for this? An existing library/algorithm would be great, but I'll settle for knowing the name of the problem I'm trying to solve!
I doubt it's a linear algebra problem as such, and maybe there's some kind of image processing technique that's applicable.
Any other ideas aside, my initial guess is just to write a simple insertion sort on the rows, then the columns and iterate that until it stabilises (and hope that detecting the pathological cases isn't too hard.)
More details: Some more information on what I'm trying to do may help clarify. Each row represents a competitor, each column represents a challenge. Each 1 or 0 represents "success" for the competitor on a particular challenge.
By sorting the matrix so all 1s are in the top-right, I hope to then provide a ranking of the intrinsic difficulty of each challenge and a ranking of the competitors (which will take into account the difficulty of the challenges they succeeded at, not just the number of successes.)
Note on accepted answer: I've accepted Simulated Annealing as "the answer" with the caveat that this question doesn't have a right answer. It seems like a good approach, though I haven't actually managed to come up with a scoring function that works for my problem.
An algorithm based upon simulated annealing can handle this sort of thing without too much trouble. It's not great if you have small matrices, which most likely have a fixed solution, but it's great if your matrices get larger and the problem becomes more difficult.
(However, it also fails your desire that insertions can be done incrementally.)
Preliminaries
Devise a performance function that "scores" a matrix - matrices that are closer to your triangleness should get a better score than those that are less triangle-y.
Devise a set of operations that are allowed on the matrix. Your description was a little ambiguous, but if you can swap rows then one op would be SwapRows(a, b). Another could be SwapCols(a, b).
The Annealing loop
I won't give a full exposition here, but the idea is simple. You perform random transformations on the matrix using your operations. You measure how much "better" the matrix is after the operation (using the performance function before and after the operation). Then you decide whether to commit that transformation. You repeat this process a lot.
Deciding whether to commit the transform is the fun part: you need to decide whether to perform that operation or not. Toward the end of the annealing process, you only accept transformations that improved the score of the matrix. But earlier on, in a more chaotic time, you allow transformations that don't improve the score. In the beginning, the algorithm is "hot" and anything goes. Eventually, the algorithm cools and only good transforms are allowed. If you linearly cool the algorithm, then the choice of whether to accept a transformation is:
public bool ShouldAccept(double cost, double temperature, Random random) {
    return Math.Exp(-cost / temperature) > random.NextDouble();
}
You should read the excellent information contained in Numerical Recipes for more information on this algorithm.
Long story short, you should learn some of these general purpose algorithms. Doing so will allow you to solve large classes of problems that are hard to solve analytically.
Scoring algorithm
This is probably the trickiest part. You will want to devise a scorer that guides the annealing process toward your goal. The scorer should be a continuous function that results in larger numbers as the matrix approaches the ideal solution.
How do you measure the "ideal solution" - triangleness? Here is a naive and easy scorer: For every point, you know whether it should be 1 or 0. Add +1 to the score if the matrix is right, -1 if it's wrong. Here's some code so I can be explicit (not tested! please review!)
int Score(Matrix m) {
    var score = 0;
    for (var r = 0; r < m.NumRows; r++) {
        for (var c = 0; c < m.NumCols; c++) {
            var val = m.At(r, c);
            var shouldBe = (c >= r) ? 1 : 0;
            if (val == shouldBe) {
                score++;
            } else {
                score--;
            }
        }
    }
    return score;
}
With this scoring algorithm, a random field of 1s and 0s will give a score of 0. An "opposite" triangle will give the most negative score, and the correct solution will give the most positive score. Diffing two scores will give you the cost.
If this scorer doesn't work for you, then you will need to "tune" it until it produces the matrices you want.
This algorithm is based on the premise that tuning this scorer is much simpler than devising the optimal algorithm for sorting the matrix.
I came up with the below algorithm, and it seems to work correctly.
Phase 1: move rows with most 1s up and columns with most 1s right.
First the rows. Sort the rows by counting their 1s. We don't care if 2 rows have the same number of 1s.
Now the columns. Sort the cols by counting their 1s. We don't care if 2 cols have the same number of 1s.
Phase 2: repeat phase 1 but with extra criteria, so that we satisfy the triangular matrix shape.
Criterion for rows: if 2 rows have the same number of 1s, we move up the row that begins with fewer 0s.
Criterion for cols: if 2 cols have the same number of 1s, we move right the col that has fewer 0s at the bottom.
Example:
Phase 1
   1 2 3 4                 1 2 3 4                 4 1 3 2
A  0 1 1 0              B  1 1 1 0              B  0 1 1 1
B  1 1 1 0 -sort rows-> A  0 1 1 0 -sort cols-> A  0 0 1 1
C  0 1 0 0              D  1 1 0 0              D  0 1 0 1
D  1 1 0 0              C  0 1 0 0              C  0 0 0 1
Phase 2
   4 1 3 2                 4 1 3 2
B  0 1 1 1              B  0 1 1 1
A  0 0 1 1 -sort rows-> D  0 1 0 1 -sort cols-> "completed"
D  0 1 0 1              A  0 0 1 1
C  0 0 0 1              C  0 0 0 1
Edit: it turns out that my algorithm doesn't always give proper triangular matrices.
For example:
Phase 1
   1 2 3 4                 1 2 3 4
A  1 0 0 0              B  0 1 1 1
B  0 1 1 1 -sort rows-> C  0 0 1 1 -sort cols-> "completed"
C  0 0 1 1              A  1 0 0 0
D  0 0 0 1              D  0 0 0 1
Phase 2
   1 2 3 4                 1 2 3 4                 2 1 3 4
B  0 1 1 1              B  0 1 1 1              B  1 0 1 1
C  0 0 1 1 -sort rows-> C  0 0 1 1 -sort cols-> C  0 0 1 1
A  1 0 0 0 (no change)  A  1 0 0 0              A  0 1 0 0
D  0 0 0 1              D  0 0 0 1              D  0 0 0 1
(*) Perhaps a phase 3 would improve the results. In that phase we would place the rows that start with fewer 0s at the top.
Look for a 1987 paper by Anna Lubiw on "Doubly Lexical Orderings of Matrices".
There is a citation below. The ordering is not identical to what you are looking for, but is pretty close. If nothing else, you should be able to get a pretty good idea from there.
http://dl.acm.org/citation.cfm?id=33385
Here's a starting point:
Convert each row from binary bits into a number
Sort the numbers in descending order.
Then convert each row back to binary.
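A tiny Python sketch of that starting point (column order is left untouched here):
rows = [[0, 1, 1, 0], [1, 1, 1, 0], [0, 1, 0, 0], [1, 1, 0, 0]]
# treat each row as a binary number (leftmost bit most significant) and sort, highest first
rows.sort(key=lambda r: int(''.join(map(str, r)), 2), reverse=True)
print(rows)   # [[1, 1, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0]]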
Basic algorithm:
Determine the row sums and store the values. Determine the column sums and store the values.
Sort the row sums in ascending order. Sort the column sums in ascending order.
Hopefully, you should have a matrix with as close to an upper-right triangular region as possible.
Treat rows as binary numbers, with the leftmost column as the most significant bit, and sort them in descending order, top to bottom
Treat the columns as binary numbers with the bottommost row as the most significant bit and sort them in ascending order, left to right.
Repeat until you reach a fixed point. Proof that the algorithm terminates is left as an exercise for the reader.
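A small Python sketch of this fixed-point iteration (termination not proven here either), using the matrix from the question:
def triangularize(m):
    while True:
        before = [row[:] for row in m]
        # rows as binary numbers, leftmost column most significant, descending
        m = sorted(m, reverse=True)
        # columns as binary numbers, bottom row most significant, ascending
        order = sorted(range(len(m[0])),
                       key=lambda c: [m[r][c] for r in range(len(m) - 1, -1, -1)])
        m = [[row[c] for c in order] for row in m]
        if m == before:
            return m

m = [[0, 1, 1, 0],   # A
     [1, 1, 1, 0],   # B
     [0, 1, 0, 0],   # C
     [1, 1, 0, 0]]   # D
for row in triangularize(m):
    print(row)       # one possible "as upper-triangular as possible" ordering (labels not tracked here)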
