Weighted bipartite matching with constraints on degrees of vertices - algorithm

I have a problem that I was able to conceptualize as follows:
We have a set of n people and m subsets representing their ethnicities, such as White, Hispanic, Asian, etc.
Given any combination of these people, I want to check whether it is a diverse group.
A diverse group is a group that satisfies several requirements, each of the form "at least K_i people in the group belong to subset S_i". Here is the tricky part: one person can be used to satisfy only one requirement. That is, you can't count him/her toward multiple requirements.
An example:
Given:
At least two people from Hispanic = {a,b,c}
At least two people from Asian = {a,d,e}
Is the group {a,c,d} a diverse group?
The group {a,c,d} is not diverse because you can't count a as both Hispanic and Asian. But
the group {a,c,d,e,f} is diverse because we have two Hispanics, a and c, and two Asians, d and e.
Attempt:
This is an instance of the Assignment problem. The jobs are the ethnicities, and we create as many jobs per ethnicity as the requirement dictates. For example, if we need two Hispanics, then we create two Hispanic jobs. However, only some people are able to do a particular job.
This is my attempt so far:
I will construct a bipartite graph with the set of people P on one side and the set of ethnicities S on the other. We put an edge between a person p_i and an ethnicity S_j if he/she belongs to that ethnicity.
Now we modify the graph: for every ethnicity S_i, duplicate it k_i times (S_{i,1}, S_{i,2}, ... , S_{i,k_i}) and add new edges accordingly. Find a maximum matching M of this graph.
Now merge the S_{i,j}s back into one S_i, and there you have a diverse group. However, a maximum matching is only one possible solution to the problem. My problem is a decision problem: I want to check whether a given group is a solution or not.

I think this is an instance of the assignment problem (http://en.wikipedia.org/wiki/Assignment_problem), usually described in terms of assigning people to jobs, so in your case the job is "sit there and look white" or "sit there and look hispanic". Only some people are qualified to do any particular job, and they can only do one job at a time.
Normally the assignment algorithm minimizes a cost, but you can just use cost 0 for "is in the right ethnic group" and cost 1 otherwise. Restricted to the people in the given group, the group then passes if every job can be filled at cost 0.
One means of solving this is the Hungarian algorithm (http://en.wikipedia.org/wiki/Hungarian_algorithm). It is often presented for the case in which there are exactly as many workers as jobs, but you can always invent dummy jobs or dummy workers. Give every assignment involving a dummy the same cost; then the dummies do not change the relative order of the costs of the real assignments, so the optimum with dummies, after ignoring the dummies, is the same choice as the optimum without them.
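For the decision version, a full cost-minimizing solver is not strictly needed. Below is a minimal sketch of the "duplicate each requirement k_i times and look for a matching" idea, using plain augmenting paths instead of the Hungarian algorithm. The function name is_diverse and the (subset, k) input format are assumptions made for illustration.

```python
def is_diverse(group, requirements):
    """Return True if `group` can satisfy every (subset, k) requirement
    with each person counted toward at most one requirement."""
    # One "slot" per required person: a requirement with count k yields k slots.
    slots = [i for i, (_, k) in enumerate(requirements) for _ in range(k)]
    people = list(group)
    match = {}  # person -> index of the slot that currently uses that person

    def try_fill(slot, seen):
        subset = requirements[slots[slot]][0]
        for p in people:
            if p in subset and p not in seen:
                seen.add(p)
                # Use p if free, or if the slot holding p can be filled by someone else.
                if p not in match or try_fill(match[p], seen):
                    match[p] = slot
                    return True
        return False

    # The group is diverse exactly when every slot gets its own person.
    return all(try_fill(s, set()) for s in range(len(slots)))


requirements = [({"a", "b", "c"}, 2), ({"a", "d", "e"}, 2)]
print(is_diverse({"a", "c", "d"}, requirements))
print(is_diverse({"a", "c", "d", "e", "f"}, requirements))
```

On the example from the question, the first call should report that {a,c,d} is not diverse and the second that {a,c,d,e,f} is.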


Is there a way to modify topological sort to handle concurrent prerequisites?

I've been trying to build a schedule generator for my school using topological sort, but am stuck dealing with classes that have prerequisites that can be taken concurrently. I was wondering if there was any clever way to modify topological sort to deal with these concurrent classes? For example, an intro to CS course can either be taken before a Data Structures course or at the same time as a Data Structures course. I'm trying to include the case where they are taken together.
You could create a dummy node that combines the two courses. (This assumes each course has only a small number of concurrent courses, since you will likely need a node for every combination of them. It should work just fine if you have only one or two concurrent courses.)
The prerequisites of the combined node will be the combined prerequisites of both courses, and every course that has either of the two courses as a prerequisite will have the dummy node as a prerequisite as well.
As postprocessing, once the topological sort has ended, you can clean up the redundancies and split dummy nodes back into the original courses.
That said, note that topological sort doesn't guarantee that the dummy node is emitted before the original nodes, even when that is possible. So there is no guarantee it will actually be used, unless you break ties in favor of dummy nodes when possible.
I can't mathematically guarantee its correctness, but this slight modification should work.
Use the normal topological sort with one difference. Assign all possible beginning nodes a value of 0. For each node that is queued, assign it the largest value among its parent nodes + 1. That way, all nodes with a given value would ideally be parallel and can be picked together.
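Here is a rough sketch of that level-numbering idea. As an assumption that goes beyond the answer above, each prerequisite edge carries a flag saying whether the two courses may share a level (gap 0) or the prerequisite must come strictly earlier (gap 1), which is one way to model the "can be taken concurrently" case from the question. The names earliest_levels and concurrent_ok are made up for the example.

```python
from collections import deque


def earliest_levels(courses, prereqs):
    """prereqs: list of (before, after, concurrent_ok) tuples.
    Returns a dict mapping each course to its earliest level (0-based)."""
    children = {c: [] for c in courses}
    indegree = {c: 0 for c in courses}
    for before, after, concurrent_ok in prereqs:
        # Gap 0 lets both courses share a level; gap 1 forces a later level.
        children[before].append((after, 0 if concurrent_ok else 1))
        indegree[after] += 1

    level = {c: 0 for c in courses}
    queue = deque(c for c in courses if indegree[c] == 0)
    done = 0
    while queue:
        c = queue.popleft()
        done += 1
        for child, gap in children[c]:
            # A course sits at the largest level demanded by any of its parents.
            level[child] = max(level[child], level[c] + gap)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if done != len(courses):
        raise ValueError("prerequisites contain a cycle")
    return level


courses = ["Intro CS", "Data Structures", "Algorithms"]
prereqs = [
    ("Intro CS", "Data Structures", True),     # may be taken together
    ("Data Structures", "Algorithms", False),  # must come strictly before
]
print(earliest_levels(courses, prereqs))
```

With the sample data, Intro CS and Data Structures land on level 0 and Algorithms on level 1.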
Kahn's algorithm for topological sorting naturally produces a minimum-length schedule with concurrency:
1. Make a dependency graph of all your courses.
2. Select all courses with no dependencies. These can be taken concurrently.
3. Remove the selected courses from the graph.
4. If the graph is not empty, go back to (2).
Of course, students are limited in the number of courses they can take simultaneously, and the problem gets tricky when you also impose a limit on maximum concurrency. Deciding the best courses to take first, when too many courses are available, is an NP-hard problem. There are some heuristics you can try, though, like deferring the courses with the shortest dependent depth.
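The unlimited-concurrency version of the Kahn-style procedure above can be sketched in a few lines. This is only an illustration: the name semester_layers and the dict-of-sets input are assumptions, and every dependency is assumed to appear as a key of the dict as well.

```python
def semester_layers(deps):
    """deps maps each course to the set of courses it depends on.
    Returns a list of layers; courses in one layer can be taken together."""
    remaining = {course: set(d) for course, d in deps.items()}
    layers = []
    while remaining:
        # Select every course whose dependencies have all been scheduled.
        ready = sorted(c for c, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle (or unknown course) detected")
        layers.append(ready)
        # Remove the selected courses from the graph.
        for c in ready:
            del remaining[c]
        for d in remaining.values():
            d.difference_update(ready)
    return layers


deps = {
    "Intro": set(),
    "Calc": set(),
    "DS": {"Intro"},
    "Algo": {"DS", "Calc"},
}
print(semester_layers(deps))
```

Capping the number of courses per layer is where the hard part starts; this sketch does not attempt it.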
If you think about exactly what you want as output, the approach may become clearer. For instance, if your desired output is a potential list of which courses to take in which semester, then each vertex involved in the topological sort could be “course X in semester Y” rather than just “course X”. Then you'd get these edges, among many others:
intro to CS in semester 1 → data structures in semester 1
intro to CS in semester 1 → data structures in semester 2
This graph would of course be larger than if the vertices were just courses: the number of vertices is now the number of courses times the maximum number of semesters in your education. But in a realistic setting, it appears to me that it wouldn't be too much to handle.

Algorithm for matching people together based on likes and dislikes

I have a group of about 75 people. Each user has liked or disliked each of the other 74 users. These people need to be divided into about 15 groups of various sizes (4 to 8 people). They need to be grouped so that the groups consist only of people who all liked each other, or at least as much as possible.
I'm not sure what the best algorithm is to tackle this problem. Any pointers or pseudo code much appreciated!
This isn't formed quite well enough to suggest a particular algorithm. I suggest clustering and "clique" algorithms, but you'll still need to define your "best grouping" metric. "as much as possible", in the face of trade-offs and undefined desires, is meaningless. Your clustering algorithm will need this metric to form your groups.
Data representation is simple: you need a directed graph. An edge from A to B means that A likes B; lack of an edge means A doesn't like B. That will encode the "likes" information in a form tractable to your algorithm. You have 75 nodes and one edge for every "like".
Start by researching clique algorithms; a "clique" is a set in which every member likes every other member. These will likely form the basis of your clustering.
Note, however, that you have to define your trade-offs. For instance, consider the case of 13 nodes consisting of two distinct cliques of 4 and 8 people, plus one person who likes one member of the 8-clique. There are no other "likes" in the graph.
How do you place that 13th person? Do you split the 8-clique and add them to the group with the person they like? If so, do you split off 3 or 4 people from the 8? Is it fair to break 15 or 16 "likes" to put that person with the one person they like -- who doesn't like them? Is it better to add the 13th person to the mutually antagonistic clique of 4?
Your eval function must return a well-ordered metric for all of these situations. It will need to support adding to a group, splitting a large group, etc.
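To make the idea of an eval function concrete, here is one possible metric, offered purely as an assumption: count the directed "likes" that end up inside a group, and treat any grouping that breaks the 4-to-8 size limits as infinitely bad. The name grouping_score and the set-of-pairs encoding of likes are hypothetical choices.

```python
def grouping_score(groups, likes, min_size=4, max_size=8):
    """groups: list of lists of people; likes: set of (a, b) pairs meaning a likes b.
    Returns the number of likes kept inside groups, or -inf for invalid sizes."""
    score = 0
    for group in groups:
        if not min_size <= len(group) <= max_size:
            return float("-inf")
        for a in group:
            for b in group:
                if a != b and (a, b) in likes:
                    score += 1
    return score


likes = {(1, 2), (2, 1), (1, 3), (5, 6), (4, 5)}
print(grouping_score([[1, 2, 3, 4], [5, 6, 7, 8]], likes))
```

Any clustering or local-search method can then compare two candidate groupings by comparing their scores. A different trade-off (for example, rewarding only mutual likes) means a different function.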
It sounds like a clustering problem.
Each user is a node. If two users liked each other, there is an edge between the nodes.
If users disliked each other, or one likes the other but not the other way around, then there is no edge between those nodes.
Once you process the like information into a graph, you will get an undirected graph (possibly with several components, and with some isolated nodes if no one likes that user). Now the question becomes how to cut that graph into clusters of 4-8 connected nodes, which is a well-studied problem with a lot of possible algorithms:
https://www.google.com/search?q=divide+connected+graph+into+clusters
If you want to differentiate between the case where two people dislike each other and the case where one person likes another who dislikes them back, then you can also introduce weights on the edges: each like is +1, and each dislike is -1. Then the question becomes that of partitioning a weighted graph.

Matching algorithm with liking function

I have an algorithmic problem.
I want to match two equally sized groups of people. There's a liking function which assigns every pair (consisting of one person of group A and one person of group B) a liking score.
I now want to match every person of group A with exactly one person of group B and I want the sum of the scores of all matches to be maximal.
I designed a naive algorithm which tries out all possibilities and then chooses the best one, but its runtime is O(n!) (where n is the number of people in each group).
Is there a faster algorithm? Or at least a fast approximation algorithm?
Thanks in advance!
Assuming that each person is to be matched only once (in both directions), this sounds like a simple assignment problem (or: minimum-weight perfect matching in a bipartite graph), which can be solved in polynomial time (and quite efficiently in practice). There is also a lot of software available in many programming languages.
As opposed to the classic worker <-> job view, your view would be: group A <-> group B.
Most libraries assume non-negative costs and minimization, so you would need to translate your maximization problem:
x = max(original_likings)
transformed_liking_i_j = x - original_liking_i_j
... solve minimization problem (with transformed likings)
This is often called opportunity loss.
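As an illustration of the transformation above, here is a self-contained sketch that converts likings into opportunity losses and then runs a textbook O(n^3) Hungarian algorithm. In practice you would more likely call an existing library; the function names min_cost_assignment and max_liking_matching are invented for this example.

```python
def min_cost_assignment(cost):
    """Hungarian algorithm on a square cost matrix.
    Returns a list where result[i] is the column assigned to row i."""
    n = len(cost)
    INF = float("inf")
    u = [0] * (n + 1)      # row potentials
    v = [0] * (n + 1)      # column potentials
    p = [0] * (n + 1)      # p[j] = row matched to column j (0 = none), 1-based
    way = [0] * (n + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        # Flip the matching along the augmenting path.
        while j0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    result = [0] * n
    for j in range(1, n + 1):
        result[p[j] - 1] = j - 1
    return result


def max_liking_matching(liking):
    """Maximize total liking by minimizing the opportunity loss x - liking."""
    x = max(max(row) for row in liking)
    loss = [[x - value for value in row] for row in liking]
    return min_cost_assignment(loss)


liking = [[3, 1, 2], [2, 4, 6], [5, 2, 1]]
match = max_liking_matching(liking)
print(match, sum(liking[i][j] for i, j in enumerate(match)))
```

Here match[i] is the index of the group B person paired with person i of group A.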

How to match students to their preferred sections efficiently?

Say you have a class with 5 sections: A,B,C,D,E. Each section meets at different times, thus students registering for the course will have preference for which section they will take (they can only take one section). When students register for the course, they list 3 sections they would prefer to take, in order of preference.
Each section has n students. Let's say for simplicity that exactly n*5 students have registered for the course.
So, the question is: How do you efficiently match students to their preferred section?
I've seen some questions with similar matching scenario questions, but none quite fit and I'm afraid I don't know enough about algorithms to make up my own. BTW, this is a real problem and I know the department in question takes a few days to do it by hand.
To determine whether each student can be assigned to a preferred section, construct an integer-valued maximum flow in the following network, where the three Xs stand for capacity-1 arcs from students to the sections they prefer (polynomial-time via, e.g., the push-relabel algorithm). There's a solution if and only if the maximum flow moves m = n*5 units; then the assignments are determined by which arc from each student is saturated.
   capacity-1 arcs                     capacity-n arcs
         |                                   |
         v                                   v
            student 1
        /   student 2        section1
       /        .       X    section2   \
source <        .       X    section3    > sink
       \        .       X    section4   /
        \   student m-1      section5
            student m
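For a network of this shape, the feasibility question can also be answered with simple augmenting paths rather than push-relabel. The sketch below takes that simpler route (it is not the push-relabel algorithm), and the names assign_sections, preferences and capacity are assumptions for the example.

```python
def assign_sections(preferences, capacity):
    """preferences: dict student -> list of acceptable sections.
    capacity: dict section -> number of seats.
    Returns dict student -> section, or None if someone cannot be placed."""
    members = {sec: [] for sec in capacity}  # section -> students seated there
    assigned = {}

    def place(student, visited):
        for sec in preferences[student]:
            if sec in visited:
                continue
            visited.add(sec)
            if len(members[sec]) < capacity[sec]:
                members[sec].append(student)
                assigned[student] = sec
                return True
            # Section is full: try to move one occupant to another of their choices.
            for other in list(members[sec]):
                if place(other, visited):
                    members[sec].remove(other)
                    members[sec].append(student)
                    assigned[student] = sec
                    return True
        return False

    for student in preferences:
        if not place(student, set()):
            return None  # the maximum flow is smaller than the number of students
    return assigned


capacity = {"A": 1, "B": 1}
print(assign_sections({"s1": ["A", "B"], "s2": ["A"]}, capacity))
print(assign_sections({"s1": ["A"], "s2": ["A"]}, capacity))
```

This ignores the order of the preferences; it only decides whether everyone can get one of their listed sections.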
To take the order of preference into account, switch to solving a min-cost flow problem, which is still poly-time solvable (though you may find the network simplex mode of a general-purpose LP solver easier to use) and which allows a cost to be specified for each arc. Choose a cost for each preference level depending on what you think is fair.
I'm positive that this has been asked before, but scheduling problems are like snowflakes, and I can't find the old question by keywords alone.
Maybe you could randomly distribute them into sections. Next, you select random pairs of students and consider whether swapping them improves the distribution (does it increase the match with their preferences?). You can iterate until no improvement has been found for X iterations.
This is obviously a very naive approach, but if your sample is small it might converge quickly. You cannot guarantee that you have the optimal solution; for that you'd need a brute-force approach, which is probably not feasible.
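A minimal sketch of this random-swap idea follows. The cost model (0 for a first choice, 1 for a second, and so on, with unlisted sections costing the length of the list) and the names swap_search and max_stale are assumptions made for the example; the number of students is assumed to equal the total number of seats.

```python
import random


def swap_search(prefs, sections, seats, max_stale=1000, seed=0):
    """prefs: dict student -> ordered list of preferred sections.
    Returns (assignment, total_cost); lower cost is better."""
    rng = random.Random(seed)
    students = list(prefs)
    if len(students) != len(sections) * seats or len(students) < 2:
        raise ValueError("expected exactly sections * seats students (at least 2)")

    def cost_of(student, section):
        ranked = prefs[student]
        return ranked.index(section) if section in ranked else len(ranked)

    # Random start: shuffle the students and fill the sections seat by seat.
    rng.shuffle(students)
    slots = [sec for sec in sections for _ in range(seats)]
    assignment = dict(zip(students, slots))

    stale = 0  # consecutive random pairs that brought no improvement
    while stale < max_stale:
        a, b = rng.sample(students, 2)
        before = cost_of(a, assignment[a]) + cost_of(b, assignment[b])
        after = cost_of(a, assignment[b]) + cost_of(b, assignment[a])
        if after < before:
            assignment[a], assignment[b] = assignment[b], assignment[a]
            stale = 0
        else:
            stale += 1
    total = sum(cost_of(s, assignment[s]) for s in students)
    return assignment, total


prefs = {"s1": ["A"], "s2": ["A"], "s3": ["B"], "s4": ["B"]}
print(swap_search(prefs, ["A", "B"], seats=2))
```

Because swaps never change how many students sit in a section, the capacities stay satisfied throughout.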
Is there a scoring system in which, if student 1 is in section A, the score is 20 (and, on the other hand, if student 2 is in section A, the score is 15)?
I'm asking because if there's only one spot left in section A, and both student 1 and student 2 have section A as their most preferred, then whoever gets registered first gets the spot, instead of whoever is the best fit (higher score).
If there is no scoring, you can just loop through the students and put them in the section they prefer. If the first one is full, try their second preference, then the next. If all three sections the student prefers are filled, just enroll them to one that isn't filled.
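The first-come, first-served loop described above might look like this. The name greedy_enroll is hypothetical, and registration order is assumed to be the iteration order of the preferences dict.

```python
def greedy_enroll(preferences, capacity):
    """preferences: dict student -> ordered list of preferred sections.
    capacity: dict section -> number of seats.
    Returns dict student -> section (students who fit nowhere are omitted)."""
    free = dict(capacity)  # seats still available in each section
    placed = {}
    for student, ranked in preferences.items():
        # Try the listed sections in order, then any other section as a fallback.
        fallback = [sec for sec in free if sec not in ranked]
        for sec in list(ranked) + fallback:
            if free.get(sec, 0) > 0:
                free[sec] -= 1
                placed[student] = sec
                break
    return placed


capacity = {"A": 1, "B": 1, "C": 1}
prefs = {"s1": ["A", "B"], "s2": ["A", "B"], "s3": ["A", "B"]}
print(greedy_enroll(prefs, capacity))
```

In the sample, the third student ends up in a section they did not list, which shows why the result depends heavily on registration order.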
(It'd be different if there is scoring since you have to go with a priority queue for each section and maximize that.)

Algorithm for sorting people into rooms based on age and nationality

I’m working on a program for the English language school I work for. I’m not being paid; it’s just a kind of hobby to improve / automate my workflow.
It’s a residential school, and one aspect I’m looking at automating is the way we allocate rooms to students. Although I don’t want a full-blown solution, I was hoping someone could point me in the right direction, with suggestions of how you might approach this or algorithms to look at.
Basically, at the school we have a whole bunch of different rooms, ranging from singles to dormitories for 8 people. We get lots of different nationalities from all over the world, and we always try to make sure each room has a mix of nationalities. Where there is more than one nationality, we try to balance them. Age is also important: we always put students of a similar age together, while still trying to mix nationalities, and it’s unusual for us to have students sharing with more than two years between them.
I suppose, more generically speaking, I am interested in how to sort a given set of students based on two parameters to an optimal result with a few rules attached.
I hope I’ve explained clearly what I am trying to achieve. In a way it sounds really simple, but I’ve been trying to think how to do it in a simple way, i.e. by sorting by nationality and then by age, but it just doesn’t cut it, and I know there must be a better way of approaching this. When I do it “by hand” on an Excel sheet it does feel quite intuitive.
Thank you to anyone who offers help / advice.
This is an interesting question, but it's not easy to answer. It is related to subdivision and bin packing, or the cutting-stock problem. You may want to look at topological sort too. You can also look at Drools, a business logic platform that lets you define such rules.
First of all you might find this interesting: Stable Room-mates Problem (wikipedia). Unfortunately it does not answer your question.
Try a genetic algorithm.
There are three main criteria for using a genetic algorithm:
The ability to represent a solution as a mutable array. We can have an array of integers such that a[i] is the room for the i-th student.
Mutation of the state should produce predictable results. In our case this is true: mutating the array will predictably shuffle students between the rooms.
It should be easy to write a fast fitness function. It shouldn't be too hard to write an O(n) fitness function.
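To give a feel for the array representation and a linear-time fitness function, here is a small sketch. The scoring weights (a penalty of 10 per excess student and per excess year of age gap) are arbitrary assumptions, as are the names fitness and mutate; a full genetic algorithm would add selection and crossover on top.

```python
import random


def fitness(rooms_of, students, capacity, max_age_gap=2):
    """rooms_of[i] is the room index of student i; students[i] = (age, nationality).
    capacity[r] is the number of beds in room r. Higher is better."""
    stats = {}  # room -> [count, youngest, oldest, set of nationalities]
    for room, (age, nat) in zip(rooms_of, students):
        s = stats.setdefault(room, [0, age, age, set()])
        s[0] += 1
        s[1] = min(s[1], age)
        s[2] = max(s[2], age)
        s[3].add(nat)
    score = 0
    for room, (count, youngest, oldest, nats) in stats.items():
        score += len(nats)                                       # reward mixed rooms
        score -= 10 * max(0, count - capacity[room])             # overfull room
        score -= 10 * max(0, oldest - youngest - max_age_gap)    # age gap too wide
    return score


def mutate(rooms_of, num_rooms, rng):
    """Move one randomly chosen student to a randomly chosen room."""
    child = list(rooms_of)
    child[rng.randrange(len(child))] = rng.randrange(num_rooms)
    return child


students = [(14, "FR"), (15, "IT"), (14, "FR"), (18, "ES")]
capacity = [2, 2]
current = [0, 0, 1, 1]
print(fitness(current, students, capacity))
print(mutate(current, num_rooms=2, rng=random.Random(1)))
```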
This is an interesting problem. I'll try writing some code with this approach and we'll see what happens.
How about you think of a room as something that repels students of a nationality it already has, and attracts students close in age to those it already has? The closer the student's age is to the room's average age, the more the room attracts them, and the more students of nationality X are in the room, the more it repels students of nationality X.
Then you would, for every new student to be added, iterate through each room and see which one attracts them the most. I guess if the room is empty you can set all forces to 0. Also, you would have a couple of constants that multiply each of the two "forces", so you can calibrate it depending on how important it is to have the same age versus how important it is to have different nationalities.
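A rough sketch of this attract/repel placement is shown below. The exact force formulas (age distance to the room's mean age, and a count of same-nationality occupants) and the names place_student, k_age and k_nat are assumptions; the answer above leaves them open.

```python
def place_student(student, rooms, capacity, k_age=1.0, k_nat=1.0):
    """student = (age, nationality); rooms is a list of lists of such tuples.
    Appends the student to the most attractive non-full room, returns its index."""
    age, nat = student
    best, best_pull = None, None
    for i, room in enumerate(rooms):
        if len(room) >= capacity[i]:
            continue  # no free bed in this room
        if room:
            mean_age = sum(a for a, _ in room) / len(room)
            attraction = -k_age * abs(age - mean_age)   # closer age attracts more
            repulsion = k_nat * sum(1 for _, n in room if n == nat)
            pull = attraction - repulsion
        else:
            pull = 0.0  # empty room: all forces are zero
        if best is None or pull > best_pull:
            best, best_pull = i, pull
    if best is None:
        raise ValueError("no free bed left")
    rooms[best].append(student)
    return best


rooms = [[(14, "FR")], [(17, "IT")]]
print(place_student((15, "FR"), rooms, capacity=[2, 2], k_nat=5.0))
```

With these particular formulas the pull is never positive, so an empty room always wins while one exists. Whether that is desirable is a calibration question.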
I'd analyze each student and create a 'personality' vector based on his/her age & nationality. Then I'd sort the vectors, and maybe scramble the results a bit after sorting to encourage diversity.
The general theme of "assign x to y with respect to constraints while optimizing some quantity" falls within operations research or, more specifically, mathematical optimization (http://en.wikipedia.org/wiki/Mathematical_optimization). The usual approach is to formally specify the problem and use a generic optimization solver such as one of those listed in http://en.wikipedia.org/wiki/List_of_optimization_software.
Give it a try: the formal specification languages for using the existing solvers are rather easy to learn, and you might get an optimal solution without having to debug a complicated algorithm.
Formulation as a General Optimization Problem
It will be useful to formalize constraints and parameters. Let us assume that for 1 <= i <= 8, we have n_i rooms available of size i. Now let us impose the hard constraint that for every two students a, b \in S placed in a particular room S, we have that:
|Grade(a) - Grade(b)| <= 2 (1)
Now we are interested in optimizing the "diversity" function which intuitively represents the idea that we want rooms to be as mixed as possible. So we can represent this goal as:
max over all arrangements {{ Sum over all rooms S of DiversityScore(S) }}
where we have DiversityScore(S) = # of Different Nationalities in the Room
Formulation as a Graph Problem
This is the most general setting, but clearly taking the max over all arrangements is not computationally feasible. Now let us pose this as a sort of graph problem with the hard constraint (1). Represent each student as a vertex in a graph G. Connect two vertices if the students satisfy constraint (1). Now a clique in this graph represents a group of students that can all be placed in the same room. Now proceed in a greedy manner. Choose a clique of the room size (say, 4) which has the largest Diversity Score. Then place them in a room and continue until all rooms are filled. This clique search method can also incorporate gender constraints, which is useful; however, note that clique finding is an NP-hard problem.
Now, before trying to come up with something that may be faster, let us think about how to weaken the hard constraint (1). We can massage our graph formulation by including edge weights in the picture. If the hard constraint is satisfied, denote the edge weight from i to j as 1. If two students i and j differ in age by more than 2, denote the edge weight as 1 / (Age Difference)^2 or something similar. Then the score of a clique should be a product of the clique's edge weights with some diversity score. However, it becomes clear that the problem is now on a complete graph, which is just the general optimization we hoped to avoid, so we need to impose some hard restrictions to reduce the connectivity of our graph.
A Basic Sorting Approximation Algorithm
Sort all students by their age, so we have a sorted array where all students in a[i] have the same age, and all students in a[i] are older than all students in a[j] for all j < i.
Now consider each pair i, j, of which there are O(n^2), where we also have that |Age[i] - Age[j]| <= 2. Find the largest group of students with different nationalities and place them in a room together. We successively iterate over O(n^2) index pairs which satisfy the hard constraint and take any students with nationality difference (which we can find by preprocessing and hashing on the index pairs). Doing this carefully (like looking at indices i j which are spread apart before close together) improves running time further. It feels like it should be polytime, but I think there are certain subtleties to address first before saying so.
