Hungarian algorithm - assign systematically

I'm implementing the Hungarian algorithm in a project. I managed to get it working up to what Wikipedia calls step 4. I can make the computer create enough zeroes so that the minimal number of covering lines equals the number of rows/columns, but I'm stuck when it comes to actually assigning the right agent to the right job. I can see how to do the assignment by hand, but that is trial and error; I do not see the systematic method that the computer needs in order to do it.
Say we have this matrix in the end:
     a   b   c   d
0   30   0   0   0
1    0  35   5   0
2   60   5   0   0
3    0  50  35  40
The zeroes we have to take so that each agent is assigned to a job are (a, 3), (b, 0), (c, 2) and (d, 1). What is the system behind choosing these? My code currently picks (b, 0) first and ignores row 0 and column b from then on. However, it then picks (a, 1), and with that value picked there is no assignment possible for row 3 anymore.
Any hints are appreciated.

Well, I did manage to solve it in the end. The method I used was to check whether any column or row contains only one zero. In that case, that agent must take that job, and that row and column are ignored from then on. Then repeat until every agent has a job.
In my example, (b, 0) would be the first choice. After that we have:
     a   b   c   d
0    x   x   x   x
1    0   x   5   0
2   60   x   0   0
3    0   x  35  40
Using the method again, we can do (a, 3), etc. I'm not sure whether it has been proven that this is always correct, but it seems it is.
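A caveat on the single-zero rule: it can stall when every remaining row and column holds two or more zeros, so it is not a complete method on its own. A general way to pick the zeros is maximum bipartite matching between rows and columns, with an edge wherever the reduced matrix holds a zero. Below is a minimal sketch using augmenting paths; the function name zero_assignment and the list-of-lists layout are assumptions made for illustration, not part of the original code.

```python
def zero_assignment(matrix):
    """Match every row to a distinct column holding a zero, or return None.

    Augmenting-path matching on the bipartite graph whose edges are the
    zero entries of the reduced cost matrix.
    """
    n = len(matrix)
    row_of_col = [None] * n  # row_of_col[c] = row currently assigned to column c

    def try_row(r, seen):
        for c in range(n):
            if matrix[r][c] == 0 and c not in seen:
                seen.add(c)
                # Take a free column, or evict its owner if the owner can move.
                if row_of_col[c] is None or try_row(row_of_col[c], seen):
                    row_of_col[c] = r
                    return True
        return False

    for r in range(n):
        if not try_row(r, set()):
            return None  # fewer than n independent zeros exist
    return {row_of_col[c]: c for c in range(n)}


# The reduced matrix from the question: rows 0..3, columns a..d.
reduced = [
    [30, 0, 0, 0],
    [0, 35, 5, 0],
    [60, 5, 0, 0],
    [0, 50, 35, 40],
]
assignment = zero_assignment(reduced)
print(sorted(("abcd"[col], row) for row, col in assignment.items()))
# [('a', 3), ('b', 0), ('c', 2), ('d', 1)]
```

When row 3 finds column a already taken by row 1, the search moves row 1 to column d instead of giving up, which is exactly the step the naive "pick and ignore" approach was missing.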

algorithm for dividing x amount of people into n rooms of different sizes

For a project I have to design an algorithm that will fit a group of people into hotel rooms given their preference. I have created a dictionary in Python that has a person as key, and as a value a list of all people they would like to be in a room with.
There are different types of rooms that can hold between 2-10 people. How many rooms of what type there are is specified by the user of the program.
I have tried to brute force this problem by trying all room combinations and then giving each room a score based on the preference of the residents and looking for the maximum score. This works fine for small group sizes but having a group of 200 will give 200! combinations which my poor computer will not be able to compute within my lifetime.
I was wondering if there is an algorithm that I have not been able to find with the solution to my problem.
Thanks in advance!
Thijs
What you can do is think of your dictionary as a graph. Then you can create an adjacency matrix.
For example, let's say you have a group of 4 people: A, B, C and D.
A: wants to be with B and C
B: wants to be with A
C: wants to be with D
D: wants to be with A and C
Your matrix would look like this:
//   A B C D
// A 0 1 1 0
// B 1 0 0 0
// C 0 0 0 1
// D 1 0 1 0
Let's call this matrix M. You can then calculate the transpose (let's call it MT) and add M to MT. You will get something like this.
//   A B C D
// A 0 2 1 1
// B 2 0 0 0
// C 1 0 0 2
// D 1 0 2 0
Then order the rows by the sum of their values, largest first (you could equally order the columns, since the matrix is symmetric).
//   A B C D
// A 0 2 1 1
// C 1 0 0 2
// D 1 0 2 0
// B 2 0 0 0
Do the same with the columns:
//   A C D B
// A 0 1 1 2
// C 1 0 2 0
// D 1 2 0 0
// B 2 0 0 0
Start filling your rooms starting from the first line based on the greatest value in that line and reduce the matrix by removing people that were assigned a room. You should start by selecting the biggest room first.
For example if we have a room that can have 2 people you'd assign person B and A to it since the biggest value in the first line is 2 and it corresponds to person B.
The reduced matrix would then be:
//   C D
// C 0 2
// D 2 0
And you loop till all is done.
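The steps above can be sketched roughly as follows. This is only one reading of the procedure: the names mutual_scores and fill_rooms are made up for illustration, every person is assumed to appear as a key in the dictionary, and rooms larger than two are filled using the scores of the first person placed in the room, which is an assumption the description does not spell out.

```python
def mutual_scores(prefs):
    """Return M + M^T as nested dicts: 2 = mutual wish, 1 = one-sided, 0 = none."""
    people = list(prefs)
    return {
        p: {q: (q in prefs[p]) + (p in prefs[q]) for q in people if q != p}
        for p in people
    }


def fill_rooms(prefs, capacities):
    """Greedy filling: biggest room first, most 'connected' person first."""
    scores = mutual_scores(prefs)
    # Order people by row sum, largest first (the reordered matrix).
    unassigned = sorted(prefs, key=lambda p: -sum(scores[p].values()))
    rooms = []
    for capacity in sorted(capacities, reverse=True):
        if not unassigned:
            break
        seed = unassigned.pop(0)
        room = [seed]
        while len(room) < capacity and unassigned:
            # Pick the remaining person with the greatest value in the seed's row.
            best = max(unassigned, key=lambda q: scores[seed][q])
            unassigned.remove(best)
            room.append(best)
        rooms.append(room)
    return rooms


prefs = {"A": ["B", "C"], "B": ["A"], "C": ["D"], "D": ["A", "C"]}
print(fill_rooms(prefs, [2, 2]))  # [['A', 'B'], ['C', 'D']]
```

Being greedy, this gives no guarantee of the best total score; it only reproduces the pairing from the worked example.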
You already had a greedy solution described. So instead I'll suggest a simulated annealing solution.
For this you first assign everyone to rooms randomly. And now you start considering swapping people at random. You always accept swaps that improve your score, but have a chance of accepting a bad swap. The chance of accepting a bad swap goes down if the swap is really bad, and also goes down with time. After you've experimented enough, whatever you have is probably pretty good.
It is called "simulated annealing" because it is a simulation of the process by which a slowly cooling substance forms a well-organized crystal structure. So the parameter that you usually use is called T for temperature. And a standard function is:
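    import math
    import random

    def maybe_swap(assignment, x, y, T):
        score_now = score(assignment)
        swapped = swap(assignment, x, y)
        score_swapped = score(swapped)
        delta = score_swapped - score_now
        # Always accept an improvement; accept a worse swap with probability exp(delta / T).
        if delta >= 0 or random.random() < math.exp(delta / T):
            return swapped
        else:
            return assignment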
And then you just have to play around with how much work to do. Something like this:
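    # Cool from T = 4.00 down to T = 0.01; stopping before 0 avoids dividing by zero.
    for count_down in range(400, 0, -1):
        for i in range(n ** 2):  # n is the number of people
            x = random.randrange(n)
            y = random.randrange(n)
            if x != y:
                assignment = maybe_swap(assignment, x, y, count_down / 100.0)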
(You should play around with the parameters.)
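The snippets above call score and swap without defining them. Here is a minimal sketch of what they might look like, assuming (this layout is not given in the answer) that an assignment is a list holding the room index of each person and that preferences are stored by person index. PREFS and random_assignment are hypothetical names introduced for the example.

```python
import random

# Hypothetical data: PREFS[i] lists the people that person i would like to room with.
PREFS = {0: [1, 2], 1: [0], 2: [3], 3: [0, 2]}


def random_assignment(capacities, n):
    """Return a list mapping each person index to a randomly chosen room index."""
    slots = [room for room, cap in enumerate(capacities) for _ in range(cap)]
    if len(slots) < n:
        raise ValueError("not enough beds for everyone")
    random.shuffle(slots)
    # Assumption: spare beds are simply dropped, so they never take part in swaps.
    return slots[:n]


def score(assignment, prefs=PREFS):
    """Count satisfied wishes: person p shares a room with someone on p's list."""
    return sum(
        1
        for p, wished in prefs.items()
        for q in wished
        if assignment[p] == assignment[q]
    )


def swap(assignment, x, y):
    """Return a copy in which persons x and y have exchanged rooms."""
    swapped = list(assignment)
    swapped[x], swapped[y] = swapped[y], swapped[x]
    return swapped


print(score([0, 0, 1, 1]))       # 4
print(swap([0, 0, 1, 1], 1, 2))  # [0, 1, 0, 1]
```

Returning a copy from swap keeps the old assignment intact, so a rejected swap costs nothing to undo. Recomputing the full score on every step is wasteful for 200 people; updating only the two affected rooms would be the obvious refinement.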

Converting a number into a special base system

I want to convert a number in base 10 into a special base form like this:
A*2^2 + B*3^1 + C*2^0
A can take on values of [0,1]
B can take on values of [0,1,2]
C can take on values of [0,1]
For example, the number 8 would be
1*2^2 + 1*3 + 1.
It is guaranteed that the given number can be converted to this specialized base system.
I know how to convert from this base system back to base-10, but I do not know how to convert from base-10 to this specialized base system.
In short, treat every base number (2^2, 3^1, 2^0 in your example) as the weight of an item, and the number to convert as the capacity of a bag. The problem then asks for a combination of these items that fills the bag exactly.
In general this problem is NP-complete. It is a form of the subset sum problem, which is itself a special case of the knapsack problem.
Despite this, it can be solved by a pseudo-polynomial time algorithm using dynamic programming in O(nW) time, where n is the number of bases and W is the number to decompose. The details can be found on this Wikipedia page: http://en.wikipedia.org/wiki/Knapsack_problem#Dynamic_programming and on this SO page: What's it called when I want to choose items to fill container as full as possible - and what algorithm should I use?.
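As a rough illustration of the dynamic programming idea, the sketch below processes one position at a time and records every partial sum that can be reached. The function name decompose and its weights/limits parameters are assumptions for this example; limits[i] is the largest digit allowed at position i.

```python
def decompose(target, weights, limits):
    """Find digits d[i] in 0..limits[i] with sum(d[i] * weights[i]) == target.

    Returns one list of digits, or None when the target is not representable.
    """
    # reachable maps a partial sum to one list of digits that produces it.
    reachable = {0: []}
    for weight, limit in zip(weights, limits):
        nxt = {}
        for partial, digits in reachable.items():
            for d in range(limit + 1):
                total = partial + d * weight
                # Sums above the target can never come back down, so drop them.
                if total <= target and total not in nxt:
                    nxt[total] = digits + [d]
        reachable = nxt
    return reachable.get(target)


# A*2^2 + B*3^1 + C*2^0 with A in 0..1, B in 0..2, C in 0..1
print(decompose(8, [4, 3, 1], [1, 2, 1]))  # [1, 1, 1]
print(decompose(9, [4, 3, 1], [1, 2, 1]))  # None
```

Each position keeps at most target + 1 partial sums, which is where the pseudo-polynomial bound comes from (with an extra factor for the digit range).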
Simplifying your "special base":
X = A * 4 + B * 3 + C
A ∈ {0,1}
B ∈ {0,1,2}
C ∈ {0,1}
Obviously the largest number that can be represented is 4 + 2 * 3 + 1 = 11
To figure out how to get the values of A, B, C you can do one of two things:
(1) There are only 12 possible inputs: create a lookup table. Ugly, but quick.
(2) Use some algorithm. A bit trickier.
Let's look at (1) first:
A B C X
0 0 0 0
0 0 1 1
0 1 0 3
0 1 1 4
0 2 0 6
0 2 1 7
1 0 0 4
1 0 1 5
1 1 0 7
1 1 1 8
1 2 0 10
1 2 1 11
Notice that 2 and 9 cannot be expressed in this system, while 4 and 7 occur twice. The fact that you have multiple possible solutions for a given input is a hint that there isn't a really robust algorithm (other than a look up table) to achieve what you want. So your table might look like this:
int A[] = {0,0,-1,0,0,1,0,1,1,-1,1,1};
int B[] = {0,0,-1,1,1,0,2,1,1,-1,2,2};
int C[] = {0,1,-1,0,1,1,0,0,1,-1,0,1};
Then look up A, B, C. If A < 0, there is no solution.
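Rather than typing the table by hand, it can be generated by enumerating all twelve digit combinations. A small sketch in Python (build_table is a made-up name; for 4 and 7, which have two representations, it simply keeps the first one found):

```python
from itertools import product


def build_table():
    """Map every representable X to one (A, B, C) with X = 4*A + 3*B + C."""
    table = {}
    for a, b, c in product(range(2), range(3), range(2)):
        # setdefault keeps the first representation when a value occurs twice.
        table.setdefault(4 * a + 3 * b + c, (a, b, c))
    return table


TABLE = build_table()
print(TABLE.get(8))  # (1, 1, 1)
print(TABLE.get(2))  # None, because 2 has no representation
```

Generating the table also guards against transcription slips, since every stored entry satisfies the formula by construction.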

Hungarian (Kuhn Munkres) algorithm oddity

I've read every answer here, Wikipedia and WikiHow, the indian guy's lecture, and other sources, and I'm pretty sure I understand what they're saying and have implemented it that way. But I'm confused about a statement that all of these explanations make that is clearly false.
They all say to cover the zeros in the matrix with a minimum number of lines, and if that is equal to N (that is, there's a zero in every row and every column), then there's a zero solution and we're done. But then I found this:
  a b c d e
A 0 7 0 0 0
B 0 8 0 0 6
C 5 0 7 3 4
D 5 0 5 9 3
E 0 4 0 0 9
There's a zero in every row and column, and no way to cover the zeros with fewer than five lines, but there's clearly no zero solution. Row C has only the zero in column b, but that leaves no zero for row D.
Do I misunderstand something here? Do I need a better test for whether or not there's a zero assignment possible? Are all these sources leaving out something essential?
You can cover the zeros in the matrix in your example with only four lines: column b, row A, row B, row E.
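This is easy to check mechanically. A minimal sketch, with rows A..E and columns a..e mapped to indices 0..4 (covers_all_zeros is a name chosen for the example):

```python
matrix = [
    [0, 7, 0, 0, 0],  # A
    [0, 8, 0, 0, 6],  # B
    [5, 0, 7, 3, 4],  # C
    [5, 0, 5, 9, 3],  # D
    [0, 4, 0, 0, 9],  # E
]


def covers_all_zeros(matrix, rows, cols):
    """True if every zero lies in one of the given rows or columns."""
    return all(
        r in rows or c in cols
        for r, line in enumerate(matrix)
        for c, value in enumerate(line)
        if value == 0
    )


# Rows A, B, E are indices 0, 1, 4; column b is index 1.
print(covers_all_zeros(matrix, rows={0, 1, 4}, cols={1}))  # True
```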
Here is a step-by-step walkthrough of the algorithm as it is presented in the Wikipedia article as of June 25 applied to your example:
  a b c d e
A 0 7 0 0 0
B 0 8 0 0 6
C 5 0 7 3 4
D 5 0 5 9 3
E 0 4 0 0 9
Step 1: The minimum in each row is zero, so the subtraction has no effect. We try to assign tasks such that every task is performed at zero cost, but this turns out to be impossible. Proceed to next step.
Step 2: The minimum in each column is also zero, so this step also has no effect. Proceed to next step.
Step 3: We locate a minimal number of lines to cover up all the zeros. We find [b,A,B,E].
  a b c d e
A --|------
B --|------
C 5 | 7 3 4
D 5 | 5 9 3
E --|------
Step 4: We locate the minimal uncovered element. This is 3, at (C,d) and (D,e). We subtract 3 from every unmarked element and add 3 to every element covered by two lines:
   a  b  c  d  e
A  0 10  0  0  0
B  0 11  0  0  6
C  2  0  4  0  1
D  2  0  2  6  0
E  0  7  0  0  9
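Now the minimum number of lines needed to cover all the zeros is 5. Note that having a zero in every row and every column is not enough to conclude this (the original matrix had that property too). What shows it is that five zeros can be found with no two of them sharing a row or a column, so no four lines can cover them all. The algorithm asserts that an assignment like the one we were looking for in step 1 should now be possible on the new matrix.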
We try to assign tasks such that every task is performed at zero cost (according to the new matrix). This is now possible. We find the solution [(A,e),(B,c),(C,d),(D,b),(E,a)].
We can now go back and verify that the solution that we found actually is optimal. We see that every assigned job has zero cost, except (C,d), which has cost 3. Since 3 is actually the lowest nonzero element in the matrix, and we have seen that there is no zero-cost solution, it is clear that this is an optimal solution.
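For anyone who wants to reproduce the walkthrough, here is a small sketch of the step 4 adjustment together with a brute-force check of the optimum. The function names are made up for this example, and the brute force is only sensible for tiny matrices like this one.

```python
from itertools import permutations


def adjust(matrix, covered_rows, covered_cols):
    """Step 4: subtract the smallest uncovered value from every uncovered
    entry and add it to every entry covered by two lines."""
    n = len(matrix)
    delta = min(
        matrix[r][c]
        for r in range(n)
        for c in range(n)
        if r not in covered_rows and c not in covered_cols
    )
    return [
        [
            matrix[r][c]
            - (delta if r not in covered_rows and c not in covered_cols else 0)
            + (delta if r in covered_rows and c in covered_cols else 0)
            for c in range(n)
        ]
        for r in range(n)
    ]


def brute_force_optimum(matrix):
    """Cheapest total cost over all n! assignments."""
    n = len(matrix)
    return min(
        sum(matrix[r][p[r]] for r in range(n)) for p in permutations(range(n))
    )


costs = [
    [0, 7, 0, 0, 0],  # A
    [0, 8, 0, 0, 6],  # B
    [5, 0, 7, 3, 4],  # C
    [5, 0, 5, 9, 3],  # D
    [0, 4, 0, 0, 9],  # E
]
# Lines from step 3: rows A, B, E (indices 0, 1, 4) and column b (index 1).
for row in adjust(costs, {0, 1, 4}, {1}):
    print(row)
print(brute_force_optimum(costs))  # 3
```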

Resource Sharing/Trading algorithm

Let's say we have 3 people: Alice, Bob, and Charlie.
Let's say each of them has one kind of resource: Apples, Bananas, and Coconuts, respectively.
Each of them has 3 units of their resource.
The goal of the algorithm is to make 1-1 trades such that each of them end up with 1 of each of our 3 resources. A list of those trades is what I want to obtain.
Ideally I would like to know how to solve this. But I'm willing to settle for the name of this kind of problem, or a problem similar to it that I can research and get ideas from.
The problem I'm working on will have around 600 objects and ~1000 people, each with a random amount and type of starting resources (assuming there are enough resources to reach the end result), so ideally any solution would be feasible at that scale. But I'll take whatever I can get; I just need some kind of starting point.
The answers of ElKamina and Tyler Durden are decent, but they don't seem to take into account that Kuriso would like to perform 1-1 trades, that people may have multiple commodities, and multiple units of commodities. I have a naive solution that does.
I think the original example was a bit oversimplified, so let's take another one:
   c1 c2 c3 c4
A   5  0  1  0
B   0  1  0  1
C   0  6  2  0
Where A,B,C are people and c1,c2,c3,c4 are the commodities.
First, let's calculate the ideal distribution, which is easily done: for each commodity, divide the sum of stuff by the number of people, rounded down, and everybody gets that:
   c1 c2 c3 c4
A   1  2  1  0
B   1  2  1  0
C   1  2  1  0
Now let's define a WANT function denoting how much of commodity c person X would need in order to reach the ideal position: WANT(X,c) = IDEAL(c) - Xc, where Xc is the amount of c that X currently holds. A negative value is a surplus.
   c1 c2 c3 c4 sum
A  -4  2  0  0  -2
B   1  1  1 -1   2
C   1 -4 -1  0  -4
Let's make a list of people ordered by the sum of their wants. Take the richest person, the one with the lowest sum (in this case C), and try to satisfy his wants by matching him with the person who has the most to offer of the commodity he wants most. If they can make a trade, great; if not, continue until we find a match (a match is guaranteed, eventually). In this example, C needs c1, and the one offering the most c1 is A. Iterating over the commodities, we find that A needs c2 and C has surplus c2, so they exchange one unit of each. Update their positions in the list, or remove them if they no longer have any needs. Repeat until nobody has any wants. This won't produce a perfectly equal distribution, but one as equal as 1-for-1 trading allows.
This is indeed a naive solution, built on the heuristic that the richest person has the best chance of being able to offer something in return for the commodity he needs. The complexity is high, but with ordered lists it should be manageable for the numbers you specified.
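The bookkeeping part of this (IDEAL and WANT) is simple to compute. A minimal sketch, assuming holdings are stored as a dict of per-commodity counts (the name ideal_and_wants and the data layout are illustrative only); the trading loop itself is left out:

```python
def ideal_and_wants(holdings):
    """holdings[person] is a list of unit counts, one entry per commodity."""
    people = list(holdings)
    n_goods = len(next(iter(holdings.values())))
    totals = [sum(holdings[p][c] for p in people) for c in range(n_goods)]
    ideal = [total // len(people) for total in totals]  # rounded down
    # WANT(X, c) = IDEAL(c) - Xc; negative numbers are surpluses.
    wants = {p: [ideal[c] - holdings[p][c] for c in range(n_goods)] for p in people}
    return ideal, wants


holdings = {"A": [5, 0, 1, 0], "B": [0, 1, 0, 1], "C": [0, 6, 2, 0]}
ideal, wants = ideal_and_wants(holdings)
print(ideal)        # [1, 2, 1, 0]
print(wants["B"])   # [1, 1, 1, -1]: B holds one unit of c4 that nobody needs
# The "richest" person is the one with the lowest sum of wants.
print(min(wants, key=lambda p: sum(wants[p])))  # C
```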
Assume you have a total number of x1 resources of kind 1,..., xn resources of kind n.
Assume you have k people and each of them has (or needs to end up with) y1, y2,..., yk resources respectively.
Now, pick a person i and assign him the resources that are most prevalent. Once the assignment is done, decrement the corresponding xj (i.e. if resource j is assigned to i, decrement xj).
Keep repeating until all resources are assigned.
This is the way to assign stuff most evenly. It assumes that you don't care about the sequence of trades, only about the end result.
To restate this, let's say you have set of lists like this:
{ 1, 1, 1 }
{ 2, 2, 2 }
{ 3, 3, 3 }
and you want to swap elements from different sets until you have the sets like this:
{ 1, 2, 3 }
{ 1, 2, 3 }
{ 1, 2, 3 }
Now, you might notice that if we regard these lists as the rows of a single matrix, then one matrix is the transpose of the other. You can perform this transposition by swapping across the 1-2-3 diagonal.
So item 2 in list 1 is swapped with item 1 in list 2, item 3 in list 1 is swapped with item 1 in list 3, and finally item 3 in list 2 is swapped with item 2 in list 3.
To sum up: transpose the matrix by swapping across the diagonal.
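A short sketch of the swap-across-the-diagonal idea, logging each 1-1 trade as it happens. It only covers the square case from this example, where owner i starts with n copies of item i; the function name is made up.

```python
def trades_by_swapping(lists):
    """Swap entry (i, j) with entry (j, i) for every i < j and log each trade.

    Assumes a square layout in which list i starts with n copies of item i.
    """
    n = len(lists)
    result = [list(row) for row in lists]
    trades = []
    for i in range(n):
        for j in range(i + 1, n):
            result[i][j], result[j][i] = result[j][i], result[i][j]
            trades.append((i, j))  # owner i and owner j exchange one item
    return result, trades


result, trades = trades_by_swapping([[1, 1, 1], [2, 2, 2], [3, 3, 3]])
print(result)  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(trades)  # [(0, 1), (0, 2), (1, 2)]
```

With n owners this takes n(n-1)/2 trades, three in the Alice/Bob/Charlie case. It does not extend directly to uneven starting amounts.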

Special scheduling Algorithm (pattern expansion)

Question
Do you think genetic algorithms are worth trying out for the problem below, or will I hit local-minima issues?
I think some aspects of the problem suit a generator / fitness-function style setup. (If you've botched a similar project I would love to hear from you, so that I don't do something similar.)
Thank you for any tips on how to structure things and get this right.
The problem
I'm searching for a good scheduling algorithm to use for the following real-world problem.
I have a sequence with 15 slots like this (The digits may vary from 0 to 20) :
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
(And there are in total 10 different sequences of this type)
Each sequence needs to expand into an array, where each slot can take 1 position.
1 1 0 0 1 1 1 0 0 0 1 1 1 0 0
1 1 0 0 1 1 1 0 0 0 1 1 1 0 0
0 0 1 1 0 0 0 1 1 1 0 0 0 1 1
0 0 1 1 0 0 0 1 1 1 0 0 0 1 1
The constraints on the matrix are:
[row-wise, i.e. horizontally] Each run of consecutive ones must be either 11 or 111
[row-wise] Two runs of ones must be separated by at least 00
The sum of each column should match the original array.
The number of rows in the matrix should be minimized.
The array then needs to be allocated to one of 4 different matrices, which may have different numbers of rows:
A, B, C, D
A, B, C and D are real-world departments. The load needs to be placed reasonably fair during the course of a 10-day period, not to interfere with other department goals.
Each of the matrices is compared with the expansion of 10 different original sequences, so you have:
A1, A2, A3, A4, A5, A6, A7, A8, A9, A10
B1, B2, B3, B4, B5, B6, B7, B8, B9, B10
C1, C2, C3, C4, C5, C6, C7, C8, C9, C10
D1, D2, D3, D4, D5, D6, D7, D8, D9, D10
Certain spots on these may be reserved (Not sure if I should make it just reserved/not reserved or function-based). The reserved spots might be meetings and other events
The sum of each row (for instance all the A's) should be approximately the same within 2%. i.e. sum(A1 through A10) should be approximately the same as (B1 through B10) etc.
The number of rows can vary, so you have for instance:
A1: 5 rows
A2: 5 rows
A3: 1 row, where that single row could for instance be:
0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
etc..
Sub-problem
I'd be very happy to solve only part of the problem. For instance, being able to input:
1 1 2 3 4 2 2 3 4 2 2 3 3 2 3
And get an appropriate array of sequences of 1's and 0's, with a minimal number of rows, following the constraints above.
Sub-problem solution attempt
Well, here's an idea. This solution is not based on using a genetic algorithm, but some ideas could be used in going in that direction.
Basis vectors
First of all, you should generate what I think of as the basis vectors. For instance, if your sequence were 3 numbers long rather than 15, the basis vectors would be:
v1 = [1 1 0]
v2 = [0 1 1]
v3 = [1 1 1]
Any solution for sequence length 3 would be a linear combination of these three vectors using only non-negative integer coefficients. In other words, the general solution would be
a*v1 + b*v2 + c*v3
where a, b and c are non-negative integers. For the sequence [1 2 1], the solution is a = 1, b = 1, c = 0. What you first want to do is find all of the possible basis vectors of length 15. From my rough calculations I think that there are somewhere between 300-400 basis vectors of length 15. I can give you some tips towards generating them if you want.
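One way to generate the basis vectors is a small recursive enumeration. The sketch below assumes that a basis vector is any non-empty row in which every run of 1s has length 2 or 3 and runs are separated by at least two 0s; valid_rows is a name invented for the example. Under that reading, length 15 yields 448 rows, a little above the rough estimate.

```python
def valid_rows(length):
    """All non-empty 0/1 rows whose runs of 1s have length 2 or 3 and are
    separated by at least two 0s."""
    rows = []

    def extend(prefix):
        if len(prefix) == length:
            if any(prefix):  # skip the all-zero row
                rows.append(prefix)
            return
        extend(prefix + [0])  # leave the next slot empty
        for run in (2, 3):
            if len(prefix) + run <= length:
                # Follow each run with the mandatory two-zero gap,
                # clipped if it would stick out past the right edge.
                block = [1] * run + [0, 0]
                extend((prefix + block)[:length])

    extend([])
    return rows


print(valid_rows(3))        # [[0, 1, 1], [1, 1, 0], [1, 1, 1]]
print(len(valid_rows(15)))  # 448
```

Because every run carries its own trailing gap, each row is produced exactly once and no validity check is needed afterwards.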
Finding solutions
Now, what you want to do is sort these basis vectors by their sums/magnitudes. Then, in searching for your solution, you start with the basis vectors which have the largest sums, because they lead to fewer total rows. We also keep an array, veccoefs, which contains the linear coefficient for each basis vector. At the beginning of the search, all the veccoefs are 0.
So we take the first basis vector (the one with the largest sum/magnitude) and subtract this vector from the sequence until we either create an unsolvable result ( having a 0 1 0 in it for instance) or any of the numbers in the result is negative. We store the number of times we subtract the vector in veccoefs. We use the result after subtracting the basis vector from the sequence as the sequence for the next basis vector. If there are only zeros left in the result, then we stop the loop.
I'm not sure of the efficiency/accuracy of this method, but it might at least give you some ideas.
Other possible solutions
Another idea for solving this is to use the basis vectors and form the problem as an optimization/least squares problem. You form a matrix of the basis vectors such that the basic problem will be minimizing Sum[(Ax - b)^2] where A is the matrix of basis vectors, b is the input sequence, and x are the basis vector coefficients. However, you also want to minimize the number of rows, so you can add a term like x^T*x to the minimization function where x^T is the transpose of x. The hard part in my opinion is finding differentiable terms to add that will encourage integer vector coefficients. If you can think of a way to do that, then optimization could very well be a good way to do this.
Also, you might consider a Metropolis-type Monte Carlo solution. At each step you would choose randomly whether to add a vector, remove a vector, or substitute a vector. The vector to be added/removed/substituted would also be chosen randomly. The probability of accepting the change would be a ratio of the suitabilities of the solutions before and after the change. The suitability could be equal to the difference between the current solution and the sequence, squared and summed, minus the number of rows/basis vectors involved in the solution. You would need to put in appropriate constants for the various terms to try to get the acceptance rate around 50%. I kind of doubt that this will work very well, but I thought that you should still consider it when looking for possible solutions.
GA can be applied to this problem, but it won't be a 5-minute task. You need to put several things together, without knowing which implementation of each of them is best.
So:
Solution representation - how will you represent a possible solution? Using a matrix seems to be the most straightforward. Using a collection of one-dimensional arrays is also possible.
But you have some constraints, so maybe the SuperGene concept is worth considering?
You must use proper mutation/crossover operators for the given gene representation.
How will you enforce constraints on solutions? Destroying those that are not proper? What if they contain valuable information? Maybe let them stay in the population but add some penalty to their fitness, so they will contribute to offspring but won't go into the next generations?
Anyway, I think that GA can be applied to this problem. Is it worth it? Usually GAs are not the best algorithm, but they are a decent choice if others fail. I would go with GA, just because it would be the most fun, but I would look for an alternative solution (just in case).
P.S. Personal insight: I was solving the N Queens Problem for 70 < N < 100 (board NxN, N queens). The algorithm was working fine for lower N (maybe it was trying all combinations?), but with N in this range, I couldn't find a proper solution. Fitness quickly jumped to about 90% of max, but in the end there were always two queens conflicting. But it was a very naive implementation.
