How to match students to their preferred sections efficiently? - algorithm

Say you have a class with 5 sections: A,B,C,D,E. Each section meets at different times, thus students registering for the course will have preference for which section they will take (they can only take one section). When students register for the course, they list 3 sections they would prefer to take, in order of preference.
Each section has n students. Let's say for simplicity that exactly n*5 students have registered for the course.
So, the question is: How do you efficiently match students to their preferred section?
I've seen some questions with similar matching scenario questions, but none quite fit and I'm afraid I don't know enough about algorithms to make up my own. BTW, this is a real problem and I know the department in question takes a few days to do it by hand.

To determine whether each student can be assigned to a preferred section, construct an integer-valued maximum flow in the following network, where the three Xs stand for capacity-1 arcs from students to the sections they prefer (polynomial-time via, e.g., the push-relabel algorithm). There's a solution if and only if the maximum flow moves m = n*5 units; then the assignments are determined by which arcs from each student is saturated.
capacity-1 arcs capacity-n arcs
| |
v v
student 1
/ student 2 section1
/ . X section2 \
source < . X section3 > sink
\ . X section4 /
\ student m-1 section5
student m
To take the order of preference into account, switch to solving a min-cost flow problem, still poly-time solvable (though you may find the network simplex mode of a general-purpose LP solver easier to use) which allows a cost to specified for each arc. Choose a score for each preference level depending on what you think is fair.
I'm positive that this has been asked before, but scheduling problems are like snowflakes, and I can't find the old question by keywords alone.

Maybe you could randomly distribute them into sections. Next you select random pairs of student and consider if swapping them improves the distribution (does it increase the match with their preferences?). You can iterate until there is no improvement possible for X iterations.
This is obviously a very naive approach but if your sample is small it might converge quickly. You cannot guarantee you have the optimal solution, but therefore you'd need a brute force approach which is probably not possible.

Is there a scoring system in which if student 1 is in section A the score is 20? (on the other hand if student 2 is in section A, score is 15?
I'm asking since if there's only one spot left for section A, and both student 1 and 2 has section A most preferred, then who ever gets registered first gets the spot. Instead of who ever is best fit (higher score).
If there is no scoring, you can just loop through the students and put them in the section they prefer. If the first one is full, try their second preference, then the next. If all three sections the student prefers are filled, just enroll them to one that isn't filled.
(It'd be different if there is scoring since you have to go with a priority queue for each section and maximize that.)

Related

Maximizing expected gain in a social network with probability

I am required to solve a specific problem.
I'm given a representation of a social network.
Each node is a person, each edge is a connection between two persons. The graph is undirected (as you would expect).
Each person has a personal "affinity" for buying a product (to simplify things, let's say there's only one product involved in this whole problem).
In each "step" in time, each person, independently, chooses whether to buy the product or not.
There's probability invovled here. A few parameters are taken into account:
His personal affinity for the product,
The percentage of his friends that already bought the product
The gain for a person buying the product is 1 dollar.
The problem is to point out X persons (let's say, 5 persons) that will receive the product in step 0, and will maximize the total expected value of the gain after Y steps (let's say, 10 steps)
The network is very large. It's not possible to simulate all the options in a naive way.
What tool / library / algorithm should I be using?
Thank you.
P.S.
When investigating this matter in google and wikipedia, a few terms kept popping up:
Dynamic network analysis
Epidemic model
but it didn't help me to find an answer
Generally, people who have the most neighbours have the most influence when they buy something.
So my heuristic would be to order people first by the number of neighbours they have (in decreasing order), then by the number of neighbours that each of those neighbours has (in order from highest to lowest), and so on. You will need at most Y levels of neighbour counts, though fewer may suffice in practice. Then simply take the first X people on this list.
This is only a heuristic, because e.g. if a person has many neighbours but most or all of them are likely to have already bought the product through other connections, then it may give a higher expectation to select a different person having fewer neighbours, but whose neighbours are less likely to already own the product.
You do not need to construct the entire list and then sort it; you can construct the list and then insert each item into a heap, and then just extract the highest-scoring X people. This will be much faster if X is small.
If X and Y are as low as you suggest then this calculation will be pretty fast, so it would be worth doing repeated runs in which instead of starting with the first X people owning the product, for each run you randomly select the initial X owners according to a probability that depends on their position in the list (the further down the list, the lower the probability).
Check out the concept of submodularity, a pretty powerful mathematical concept. In particular, check out slide 19, where submodularity is used to answer the question "Given a social graph, who should get free cell phones?". If you have access, also read the corresponding paper. That should get you started.

algorithm for solving resource allocation problems

Hi I am building a program wherein students are signing up for an exam which is conducted at several cities through out the country. While signing up students provide a list of three cities where they would like to give the exam in order of their preference. So a student may say his first preference for an exam centre is New York followed by Chicago followed by Boston.
Now keeping in mind that as the exam centres have limited capacity they cannot accomodate each students first choice .We would however try and provide as many students either their first or second choice of centres and as far as possible avoid students having to give the third choice centre to a student
Now any ideas of a sorting algorithm that would mke this process more efficent.The simple way to do this would be to first go through the list of first choice of students allot as many as possible then go through the list of second choices and allot. However this may lead to the students who are first in the list getting their first centre and the last students getting their third choice or worse none of their choices. Anything that could make this more efficient
Sounds like a variant of the classic stable marriages problem or the college admission problem. The Wikipedia lists a linear-time (in the number of preferences, O(n²) in the number of persons) algorithm for the former; the NRMP describes an efficient algorithm for the latter.
I suspect that if you randomly generate preferences of exam places for students (one Fisher–Yates shuffle per exam place) and then apply the stable marriages algorithm, you'll get a pretty fair and efficient solution.
This problem could be formulated as an instance of minimum cost flow. Let N be the number of students. Let each student be a source vertex with capacity 1. Let each exam center be a sink vertex with capacity, well, its capacity. Make an arc from each student to his first, second, and third choices. Set the cost of first choice arcs to 0; the cost of second choice arcs to 1; and the cost of third choice arcs to N + 1.
Find a minimum-cost flow that moves N units of flow. Assuming that your solver returns an integral solution (it should; flow LPs are totally unimodular), each student flows one unit to his assigned center. The costs minimize the number of third-choice assignments, breaking ties by the number of second-choice assignments.
There are a class of algorithms that address this allocating of limited resources called auctions. Basically in this case each student would get a certain amount of money (a number they can spend), then your software would make bids between those students. You might use a formula based on preferences.
An example would be for tutorial times. If you put down your preferences, then you would effectively bid more for those times and less for the times you don't want. So if you don't get your preferences you have more "money" to bid with for other tutorials.

Algorithm for sorting people into rooms based on age and nationality

I’m working on program for the English Language school I work for. I’m not being paid, its just a kind of a hobby to improve / automate my work flow.
It’s a residential school and one aspects I’m looking at automating is the way we allocate room to students, and although I don’t want a full blown solution I was hoping someone could point me in the right direction… Suggestions of the way you might approach this or by suggesting algorithms to look at etc.
Basically at the school we have a whole bunch of different rooms ranging from singles to dormitories for 8 people. We get lots of different nationalities from all over the world, and we always try to maker sure each room has a mix of nationalities. Where there is more than one nationality we try to balance them. Age is also important, we always put students of a similar age together, while still trying to mix nationalities, and its unusual for us to have students sharing with more than two years between them.
I suppose more generically speaking, I am in interested in how to sort a given set of students based on two parameters to an optimal result with a few rules attached.
I hope I’ve explain clearly what I am trying to achieve… in a way it sounds really simple, but I’ve trying to think how to do it in a simple way, i.e. by sorting by nationality and then by age but it just doesn’t cut it and I know there must be a better way of approaching this. When I do it “by hand” on an excel sheet it does feel quite intuitive.
Thank you to anyone who offers help / advice.
This is an interesting question but it's not easy to answer. Somehow it's connected with subdivsion and bin packing or the cutting-stock problem. You may want to look for a topological sort too. You can look for Drools a business logic platform that let you define such rules.
First of all you might find this interesting: Stable Room-mates Problem (wikipedia). Unfortunately it does not answer your question.
Try a genetic algorithm.
There are three main criteria for using a genetic algorithm:
ability to represent a solution as a mutable array. We can have an array of integers such that a[i] is the room for the ith student.
mutation of the state should produce predictable results. In our case this is true. Mutating the array will predictably shuffle students between the rooms.
easy to write a fast fitness function. Shouldn't be too hard to write a O(n) fitness function.
This is an interesting problem. I'll try writing some code with this approach and we'll see what happens.
How about, you think of a room as something that repels students of a nationality it already has, and attracts students of a close age to what it already has. The closer the age to the average age, the more it attracts it, and the more guys of X nationality are in the room, the more if repels guys of X nationality.
Then you would, for every new student to be added, iterate through each room and see which is the one that attracts it more. I guess if the room is empty you can set all forces to 0. Also, you would have a couple of constants that multiply each of both "forces" so you can calibrate it depending on how important is to have the same age against how important is to have different nationalities.
I'd analyze each student and create a 'personality' vector based on his/her age & nationality. Then I'd sort the vectors, and maybe scramble the results a bit after sorting to encourage diversity.
The general theme of "assign x to y with respect to constraints while optimizing some quantity" falls within operations research or more specifically http://en.wikipedia.org/wiki/Mathematical_optimization. The usual approach is to formally specify the problem and use a generic optimization solver such as one of those listed in http://en.wikipedia.org/wiki/List_of_optimization_software.
Give it a try, the formal specification languages for using the existing solvers are rather easy to learn and you might get an optimal solution without having to debug a complicated algorithm.
Formulation as a General Optimization Problem
It will be useful to formalize constraints and parameters. Let us assume that for 1 <= i <= 8, we have n_i rooms available of size i. Now let us impose the hard constraint that in a particular room S, every two students a, b \in S, we have that:
|Grade(a) - Grade(b)| <= 2 (1)
Now we are interested in optimizing the "diversity" function which intuitively represents the idea that we want rooms to be as mixed as possible. So we can represent this goal as:
max over all arrangements {{ Sum over all rooms S of DiversityScore(S) }}
where we have DiversityScore(S) = # of Different Nationalities in the Room
Formulation as a Graph Problem
This is the most general setting, but clearly max over all arrangements is not computationally feasible. Now let us pose this as a sort of graph problem with the hard grade constraints. Denote all students as a vertex in a Graph G. Connect two vertices if students satisfy constraint (1). Now a clique in this graph represents a group of students that can all be placed in the same room. Now proceed in a greedy manner. Choose the largest clique of size 4 which has the largest Diversity Score. Then place them in a room and continue until all rooms are filled. This clique search method can also incorporate gender constraints which is useful, however not that Clique finding is NP Hard Problem.
Now before trying to come up with something that may be faster, let us think about how to weaken the hard constraint (1). We can massage our graph formulation by including edge weights into the picture. So if the hard constraint is satisfied denote the edge weight from i to j as 1. If two students i and j deviate by age more than 2 denote the edge weight as 1 / (Age Difference)^2 or something. Then the score of a clique should be a product of the cliques edge weights with some diversity score. However it becomes clear that now the problem is on a complete graph, which is just the general optimization we hoped to avoid, so we need to impose some hard restrictions to reduce the connectivity of our graph.
A Basic Sorting Approximation Algorithm
Sort all students by their age, so we have a sorted array where all students in a[i] have the same age, and all students in a[i] are older than all students in a[j] for all j < i.
Now consider each pair i, j, of which there are O(n^2), where we also have that |Age[i] - Age[j]| <= 2. Find the largest group of students with different nationalities and place them in a room together. We successively iterate over O(n^2) index pairs which satisfy the hard constraint and take any students with nationality difference (which we can find by preprocessing and hashing on the index pairs). Doing this carefully (like looking at indices i j which are spread apart before close together) improves running time further. It feels like it should be polytime, but I think there are certain subtleties to address first before saying so.

What are some examples of problems well suited for Integer Linear Programming?

I've always been writing software to solve business problems. I came across about LIP while I was going through one of the SO posts. I googled it but I am unable to relate how I can use it to solve business problems. Appreciate if some one can help me understand in layman terms.
ILP can be used to solve essentially any problem involving making a bunch of decisions, each of which only has several possible outcomes, all known ahead of time, and in which the overall "quality" of any combination of choices can be described using a function that doesn't depend on "interactions" between choices. To see how it works, it's easiest to restrict further to variables that can only be 0 or 1 (the smallest useful range of integers). Now:
Each decision requiring a yes/no answer becomes a variable
The objective function should describe the thing we want to maximise (or minimise) as a weighted combination of these variables
You need to find a way to express each constraint (combination of choices that cannot be made at the same time) using one or more linear equality or inequality constraints
Example
For example, suppose you have 3 workers, Anne, Bill and Carl, and 3 jobs, Dusting, Typing and Packing. All of the people can do all of the jobs, but they each have different efficiency/ability levels at each job, so we want to find the best task for each of them to do to maximise overall efficiency. We want each person to perform exactly 1 job.
Variables
One way to set this problem up is with 9 variables, one for each combination of worker and job. The variable x_ad will get the value 1 if Anne should Dust in the optimal solution, and 0 otherwise; x_bp will get the value 1 if Bill should Pack in the optimal solution, and 0 otherwise; and so on.
Objective Function
The next thing to do is to formulate an objective function that we want to maximise or minimise. Suppose that based on Anne, Bill and Carl's most recent performance evaluations, we have a table of 9 numbers telling us how many minutes it takes each of them to perform each of the 3 jobs. In this case it makes sense to take the sum of all 9 variables, each multiplied by the time needed for that particular worker to perform that particular job, and to look to minimise this sum -- that is, to minimise the total time taken to get all the work done.
Constraints
The final step is to give constraints that enforce that (a) everyone does exactly 1 job and (b) every job is done by exactly 1 person. (Note that actually these steps can be done in any order.)
To make sure that Anne does exactly 1 job, we can add the constraint that x_ad + x_at + x_ap = 1. Similar constraints can be added for Bill and Carl.
To make sure that exactly 1 person Dusts, we can add the constraint that x_ad + x_bd + x_cd = 1. Similar constraints can be added for Typing and Packing.
Altogether there are 6 constraints. You can now supply this 9-variable, 6-constraint problem to an ILP solver and it will spit back out the values for the variables in one of the optimal solutions -- exactly 3 of them will be 1 and the rest will be 0. The 3 that are 1 tell you which people should be doing which job!
ILP is General
As it happens, this particular problem has a special structure that allows it to be solved more efficiently using a different algorithm. The advantage of using ILP is that variations on the problem can be easily incorporated: for example if there were actually 4 people and only 3 jobs, then we would need to relax the constraints so that each person does at most 1 job, instead of exactly 1 job. This can be expressed simply by changing the equals sign in each of the 1st 3 constraints into a less-than-or-equals sign.
First, read a linear programming example from Wikipedia
Now imagine the farmer producing pigs and chickens, or a factory producing toasters and vacuums - now the outputs (and possibly constraints) are integers, so those pretty graphs are going to go all crookedly step-wise. That's a business application that is easily represented as a linear programming problem.
I've used integer linear programming before to determine how to tile n identically proportioned images to maximize screen space used to display these images, and the formalism can represent covering problems like scheduling, but business applications of integer linear programming seem like the more natural applications of it.
SO user flolo says:
Use cases where I often met it: In digital circuit design you have objects to be placed/mapped onto certain parts of a chip (FPGA-Placing) - this can be done with ILP. Also in HW-SW codesign there often arise the partition problem: Which part of a program should still run on a CPU and which part should be accelerated on hardware. This can be also solved via ILP.
A sample ILP problem will looks something like:
maximize 37∙x1 + 45∙x2
where
x1,x2,... should be positive integers
...but, there is a set of constrains in the form
a1∙x1+b1∙x2 < k1
a2∙x1+b2∙x2 < k2
a3∙x1+b3∙x2 < k3
...
Now, a simpler articulation of Wikipedia's example:
A farmer has L m² land to be planted with either wheat or barley or a combination of the two.
The farmer has F grams of fertilizer, and P grams of insecticide.
Every m² of wheat requires F1 grams of fertilizer, and P1 grams of insecticide
Every m² of barley requires F2 grams of fertilizer, and P2 grams of insecticide
Now,
Let a1 denote the selling price of wheat per 1 m²
Let a2 denote the selling price of barley per 1 m²
Let x1 denote the area of land to be planted with wheat
Let x2 denote the area of land to be planted with barley
x1,x2 are positive integers (Assume we can plant in 1 m² resolution)
So,
the profit is a1∙x1 + a2∙x2 - we want to maximize it
Because the farmer has a limited area of land: x1+x2<=L
Because the farmer has a limited amount of fertilizer: F1∙x1+F2∙x2 < F
Because the farmer has a limited amount of insecticide: P1∙x1+P2∙x2 < P
a1,a2,L,F1,F2,F,P1,P2,P - are all constants (in our example: positive)
We are looking for positive integers x1,x2 that will maximize the expression stated, given the constrains stated.
Hope it's clear...
ILP "by itself" can directly model lots of stuff. If you search for LP examples you will probably find lots of famous textbook cases, such as the diet problem
Given a set of pills, each with a vitamin content and a daily vitamin
quota, find the cheapest cocktail that matches the quota.
Many such problems naturally have instances that require varialbe to be integers (perhaps you can't split pills in half)
The really interesting stuff though is that actually a big deal of combinatorial problems reduce to LP. One of my favourites is the assignment problem
Given a set of N workers, N tasks and an N by N matirx describing how
much each worker charges for the each task, determine what task to
give to each worker in order to minimize cost.
Most solution that naturally come up have exponential complexity but there is a polynomial solution using linear programming.
When it comes to ILP, ILP has the added benefit/difficulty of being NP-complete. This means that it can be used to model a very wide range of problems (boolean satisfiability is also very popular in this regard). Since there are many good and optimized ILP solvers out there it is often viable to translate an NP-complete problem into ILP instead of devising a custom algorithm of your own.
You can apply linear program easily everywhere you want to optimize and the target function is linear. You can make schedules (I mean big, like train companies, who need to optimize the utilization of the vehicles and tracks), productions (optimize win), almost everything. Sometimes it is tricky to formulate your problem as IP and/or sometimes you meet the problem that your solution is, that you have to produce e.g. 0.345 cars for optimum win. That is of course not possible, and so you constraint even more: Your variable for the number of cars must be integer. Even when it now sounds simpler (because you have infinite less choices for your variable), its actually harder. In this moment it gets NP-hard. Which actually means you can solve ANY problem from your computer with ILP, you just have to transform it.
For you I would recommend an intro into reading some basic (I)LP stuff. From my mind I dont know any good online site (but if you goolge you will find some), as book I can recommend Linear Programming from Chvatal. It has very good examples, and describes also real use cases.
The other answers here have excellent examples. Two of the gold standards in business of using integer programming and more generally operations research are
the journal Interfaces published by INFORMS (The Institute for Operations Research and the Management Sciences)
winners of the the Franz Edelman Award for Achievement in Operations Research and the Management Sciences
Interfaces publishes research that uses operations research applied to real-world problems, and the Edelman award is a highly competitive award for business use of operations research techniques.

Ticket drafting algorithm

I'm trying to write a program to automate a ticket draft.
We have a certain number of season ticket passes and want to split up the tickets among a group of people. There are X number of games, Y number of season passes, and Z number of people. Each of Z people has ranked the X games.
My code basically goes through the draft order and back picking out the tickets from their ranking if available, otherwise, picking the next highest ranking. For the most part it works. The problem is, there's a point where most of the tickets are taken and the remaining tickets left are ones you already have so you just don't pick them. People therefore have different numbers of tickets. Is there a good way to get around this?
If you have X games and Y season passes, presumably there are X*Y tickets available to give to the Z people, right?
This sounds like it could be treated as an optimization problem, but to do so you have to identify your main goals? I'm guessing you want each person to receive X*Y / Z tickets (split them evenly), but maybe not. I'm guessing you also want to maximize the aggregate satisfaction (defined in some way according to the rankings) in tickets. You would probably want to give a large penalty in satisfaction for a person if he receives more than 1 ticket for the same game. I believe this last aspect might be why the straight draft approach is not the best, but I could be mistaken.
Once you are clear on what you are trying to optimize (if this is indeed an optimization problem), then you can consider the best approach to the problem. This could be your own custom-built solution, or you could try an existing technique (genetic algorithm, etc.). Before doing so though it is important that you frame the problem properly.
If there were no preferences involved, this would be a straight min-cut max flow problem. http://en.wikipedia.org/wiki/Maximum_flow_problem, as follows:
Create a source vertex A. From A, create Z vertices, one for each person. The capacity can be infinite (or very, very large). Create a sink B, and create X vertices, one for each game, linked to B; the capacity should be Y (you have Y tickets per game). From each person, link to each game they've ranked, with capacity 1.
If you look at the wiki link above, there are about 10 algorithms to solve this basic problem. Find one you understand and can implement yourself, because you'll need to modify it slightly. I'm not familiar with all of them, but the ones I know about have a step 'pick an edge' or 'pick a path.' You should modify the 'how you pick an edge' logic to take the priority ordering of the games into account. I'm not sure exactly what the ordering should be (you'll probably need to experiment), but if you say the lowest ranked game is 1, the next is 2, up to X, then a score like 'ranking of the edge - number of games the person is already signed up for' might work.
I think this is a variant of the Stable Marriage Problem or the Stable Roommates Problem for which there are known algorithms for solving.

Resources