algorithm for solving resource allocation problems - algorithm

Hi I am building a program wherein students are signing up for an exam which is conducted at several cities through out the country. While signing up students provide a list of three cities where they would like to give the exam in order of their preference. So a student may say his first preference for an exam centre is New York followed by Chicago followed by Boston.
Now keeping in mind that as the exam centres have limited capacity they cannot accomodate each students first choice .We would however try and provide as many students either their first or second choice of centres and as far as possible avoid students having to give the third choice centre to a student
Now any ideas of a sorting algorithm that would mke this process more efficent.The simple way to do this would be to first go through the list of first choice of students allot as many as possible then go through the list of second choices and allot. However this may lead to the students who are first in the list getting their first centre and the last students getting their third choice or worse none of their choices. Anything that could make this more efficient

Sounds like a variant of the classic stable marriages problem or the college admission problem. The Wikipedia lists a linear-time (in the number of preferences, O(n²) in the number of persons) algorithm for the former; the NRMP describes an efficient algorithm for the latter.
I suspect that if you randomly generate preferences of exam places for students (one Fisher–Yates shuffle per exam place) and then apply the stable marriages algorithm, you'll get a pretty fair and efficient solution.

This problem could be formulated as an instance of minimum cost flow. Let N be the number of students. Let each student be a source vertex with capacity 1. Let each exam center be a sink vertex with capacity, well, its capacity. Make an arc from each student to his first, second, and third choices. Set the cost of first choice arcs to 0; the cost of second choice arcs to 1; and the cost of third choice arcs to N + 1.
Find a minimum-cost flow that moves N units of flow. Assuming that your solver returns an integral solution (it should; flow LPs are totally unimodular), each student flows one unit to his assigned center. The costs minimize the number of third-choice assignments, breaking ties by the number of second-choice assignments.

There are a class of algorithms that address this allocating of limited resources called auctions. Basically in this case each student would get a certain amount of money (a number they can spend), then your software would make bids between those students. You might use a formula based on preferences.
An example would be for tutorial times. If you put down your preferences, then you would effectively bid more for those times and less for the times you don't want. So if you don't get your preferences you have more "money" to bid with for other tutorials.

Related

How to match students to their preferred sections efficiently?

Say you have a class with 5 sections: A,B,C,D,E. Each section meets at different times, thus students registering for the course will have preference for which section they will take (they can only take one section). When students register for the course, they list 3 sections they would prefer to take, in order of preference.
Each section has n students. Let's say for simplicity that exactly n*5 students have registered for the course.
So, the question is: How do you efficiently match students to their preferred section?
I've seen some questions with similar matching scenario questions, but none quite fit and I'm afraid I don't know enough about algorithms to make up my own. BTW, this is a real problem and I know the department in question takes a few days to do it by hand.
To determine whether each student can be assigned to a preferred section, construct an integer-valued maximum flow in the following network, where the three Xs stand for capacity-1 arcs from students to the sections they prefer (polynomial-time via, e.g., the push-relabel algorithm). There's a solution if and only if the maximum flow moves m = n*5 units; then the assignments are determined by which arcs from each student is saturated.
capacity-1 arcs capacity-n arcs
| |
v v
student 1
/ student 2 section1
/ . X section2 \
source < . X section3 > sink
\ . X section4 /
\ student m-1 section5
student m
To take the order of preference into account, switch to solving a min-cost flow problem, still poly-time solvable (though you may find the network simplex mode of a general-purpose LP solver easier to use) which allows a cost to specified for each arc. Choose a score for each preference level depending on what you think is fair.
I'm positive that this has been asked before, but scheduling problems are like snowflakes, and I can't find the old question by keywords alone.
Maybe you could randomly distribute them into sections. Next you select random pairs of student and consider if swapping them improves the distribution (does it increase the match with their preferences?). You can iterate until there is no improvement possible for X iterations.
This is obviously a very naive approach but if your sample is small it might converge quickly. You cannot guarantee you have the optimal solution, but therefore you'd need a brute force approach which is probably not possible.
Is there a scoring system in which if student 1 is in section A the score is 20? (on the other hand if student 2 is in section A, score is 15?
I'm asking since if there's only one spot left for section A, and both student 1 and 2 has section A most preferred, then who ever gets registered first gets the spot. Instead of who ever is best fit (higher score).
If there is no scoring, you can just loop through the students and put them in the section they prefer. If the first one is full, try their second preference, then the next. If all three sections the student prefers are filled, just enroll them to one that isn't filled.
(It'd be different if there is scoring since you have to go with a priority queue for each section and maximize that.)

Maximizing expected gain in a social network with probability

I am required to solve a specific problem.
I'm given a representation of a social network.
Each node is a person, each edge is a connection between two persons. The graph is undirected (as you would expect).
Each person has a personal "affinity" for buying a product (to simplify things, let's say there's only one product involved in this whole problem).
In each "step" in time, each person, independently, chooses whether to buy the product or not.
There's probability invovled here. A few parameters are taken into account:
His personal affinity for the product,
The percentage of his friends that already bought the product
The gain for a person buying the product is 1 dollar.
The problem is to point out X persons (let's say, 5 persons) that will receive the product in step 0, and will maximize the total expected value of the gain after Y steps (let's say, 10 steps)
The network is very large. It's not possible to simulate all the options in a naive way.
What tool / library / algorithm should I be using?
Thank you.
P.S.
When investigating this matter in google and wikipedia, a few terms kept popping up:
Dynamic network analysis
Epidemic model
but it didn't help me to find an answer
Generally, people who have the most neighbours have the most influence when they buy something.
So my heuristic would be to order people first by the number of neighbours they have (in decreasing order), then by the number of neighbours that each of those neighbours has (in order from highest to lowest), and so on. You will need at most Y levels of neighbour counts, though fewer may suffice in practice. Then simply take the first X people on this list.
This is only a heuristic, because e.g. if a person has many neighbours but most or all of them are likely to have already bought the product through other connections, then it may give a higher expectation to select a different person having fewer neighbours, but whose neighbours are less likely to already own the product.
You do not need to construct the entire list and then sort it; you can construct the list and then insert each item into a heap, and then just extract the highest-scoring X people. This will be much faster if X is small.
If X and Y are as low as you suggest then this calculation will be pretty fast, so it would be worth doing repeated runs in which instead of starting with the first X people owning the product, for each run you randomly select the initial X owners according to a probability that depends on their position in the list (the further down the list, the lower the probability).
Check out the concept of submodularity, a pretty powerful mathematical concept. In particular, check out slide 19, where submodularity is used to answer the question "Given a social graph, who should get free cell phones?". If you have access, also read the corresponding paper. That should get you started.

Algorithm for sorting people into rooms based on age and nationality

I’m working on program for the English Language school I work for. I’m not being paid, its just a kind of a hobby to improve / automate my work flow.
It’s a residential school and one aspects I’m looking at automating is the way we allocate room to students, and although I don’t want a full blown solution I was hoping someone could point me in the right direction… Suggestions of the way you might approach this or by suggesting algorithms to look at etc.
Basically at the school we have a whole bunch of different rooms ranging from singles to dormitories for 8 people. We get lots of different nationalities from all over the world, and we always try to maker sure each room has a mix of nationalities. Where there is more than one nationality we try to balance them. Age is also important, we always put students of a similar age together, while still trying to mix nationalities, and its unusual for us to have students sharing with more than two years between them.
I suppose more generically speaking, I am in interested in how to sort a given set of students based on two parameters to an optimal result with a few rules attached.
I hope I’ve explain clearly what I am trying to achieve… in a way it sounds really simple, but I’ve trying to think how to do it in a simple way, i.e. by sorting by nationality and then by age but it just doesn’t cut it and I know there must be a better way of approaching this. When I do it “by hand” on an excel sheet it does feel quite intuitive.
Thank you to anyone who offers help / advice.
This is an interesting question but it's not easy to answer. Somehow it's connected with subdivsion and bin packing or the cutting-stock problem. You may want to look for a topological sort too. You can look for Drools a business logic platform that let you define such rules.
First of all you might find this interesting: Stable Room-mates Problem (wikipedia). Unfortunately it does not answer your question.
Try a genetic algorithm.
There are three main criteria for using a genetic algorithm:
ability to represent a solution as a mutable array. We can have an array of integers such that a[i] is the room for the ith student.
mutation of the state should produce predictable results. In our case this is true. Mutating the array will predictably shuffle students between the rooms.
easy to write a fast fitness function. Shouldn't be too hard to write a O(n) fitness function.
This is an interesting problem. I'll try writing some code with this approach and we'll see what happens.
How about, you think of a room as something that repels students of a nationality it already has, and attracts students of a close age to what it already has. The closer the age to the average age, the more it attracts it, and the more guys of X nationality are in the room, the more if repels guys of X nationality.
Then you would, for every new student to be added, iterate through each room and see which is the one that attracts it more. I guess if the room is empty you can set all forces to 0. Also, you would have a couple of constants that multiply each of both "forces" so you can calibrate it depending on how important is to have the same age against how important is to have different nationalities.
I'd analyze each student and create a 'personality' vector based on his/her age & nationality. Then I'd sort the vectors, and maybe scramble the results a bit after sorting to encourage diversity.
The general theme of "assign x to y with respect to constraints while optimizing some quantity" falls within operations research or more specifically http://en.wikipedia.org/wiki/Mathematical_optimization. The usual approach is to formally specify the problem and use a generic optimization solver such as one of those listed in http://en.wikipedia.org/wiki/List_of_optimization_software.
Give it a try, the formal specification languages for using the existing solvers are rather easy to learn and you might get an optimal solution without having to debug a complicated algorithm.
Formulation as a General Optimization Problem
It will be useful to formalize constraints and parameters. Let us assume that for 1 <= i <= 8, we have n_i rooms available of size i. Now let us impose the hard constraint that in a particular room S, every two students a, b \in S, we have that:
|Grade(a) - Grade(b)| <= 2 (1)
Now we are interested in optimizing the "diversity" function which intuitively represents the idea that we want rooms to be as mixed as possible. So we can represent this goal as:
max over all arrangements {{ Sum over all rooms S of DiversityScore(S) }}
where we have DiversityScore(S) = # of Different Nationalities in the Room
Formulation as a Graph Problem
This is the most general setting, but clearly max over all arrangements is not computationally feasible. Now let us pose this as a sort of graph problem with the hard grade constraints. Denote all students as a vertex in a Graph G. Connect two vertices if students satisfy constraint (1). Now a clique in this graph represents a group of students that can all be placed in the same room. Now proceed in a greedy manner. Choose the largest clique of size 4 which has the largest Diversity Score. Then place them in a room and continue until all rooms are filled. This clique search method can also incorporate gender constraints which is useful, however not that Clique finding is NP Hard Problem.
Now before trying to come up with something that may be faster, let us think about how to weaken the hard constraint (1). We can massage our graph formulation by including edge weights into the picture. So if the hard constraint is satisfied denote the edge weight from i to j as 1. If two students i and j deviate by age more than 2 denote the edge weight as 1 / (Age Difference)^2 or something. Then the score of a clique should be a product of the cliques edge weights with some diversity score. However it becomes clear that now the problem is on a complete graph, which is just the general optimization we hoped to avoid, so we need to impose some hard restrictions to reduce the connectivity of our graph.
A Basic Sorting Approximation Algorithm
Sort all students by their age, so we have a sorted array where all students in a[i] have the same age, and all students in a[i] are older than all students in a[j] for all j < i.
Now consider each pair i, j, of which there are O(n^2), where we also have that |Age[i] - Age[j]| <= 2. Find the largest group of students with different nationalities and place them in a room together. We successively iterate over O(n^2) index pairs which satisfy the hard constraint and take any students with nationality difference (which we can find by preprocessing and hashing on the index pairs). Doing this carefully (like looking at indices i j which are spread apart before close together) improves running time further. It feels like it should be polytime, but I think there are certain subtleties to address first before saying so.

Ticket drafting algorithm

I'm trying to write a program to automate a ticket draft.
We have a certain number of season ticket passes and want to split up the tickets among a group of people. There are X number of games, Y number of season passes, and Z number of people. Each of Z people has ranked the X games.
My code basically goes through the draft order and back picking out the tickets from their ranking if available, otherwise, picking the next highest ranking. For the most part it works. The problem is, there's a point where most of the tickets are taken and the remaining tickets left are ones you already have so you just don't pick them. People therefore have different numbers of tickets. Is there a good way to get around this?
If you have X games and Y season passes, presumably there are X*Y tickets available to give to the Z people, right?
This sounds like it could be treated as an optimization problem, but to do so you have to identify your main goals? I'm guessing you want each person to receive X*Y / Z tickets (split them evenly), but maybe not. I'm guessing you also want to maximize the aggregate satisfaction (defined in some way according to the rankings) in tickets. You would probably want to give a large penalty in satisfaction for a person if he receives more than 1 ticket for the same game. I believe this last aspect might be why the straight draft approach is not the best, but I could be mistaken.
Once you are clear on what you are trying to optimize (if this is indeed an optimization problem), then you can consider the best approach to the problem. This could be your own custom-built solution, or you could try an existing technique (genetic algorithm, etc.). Before doing so though it is important that you frame the problem properly.
If there were no preferences involved, this would be a straight min-cut max flow problem. http://en.wikipedia.org/wiki/Maximum_flow_problem, as follows:
Create a source vertex A. From A, create Z vertices, one for each person. The capacity can be infinite (or very, very large). Create a sink B, and create X vertices, one for each game, linked to B; the capacity should be Y (you have Y tickets per game). From each person, link to each game they've ranked, with capacity 1.
If you look at the wiki link above, there are about 10 algorithms to solve this basic problem. Find one you understand and can implement yourself, because you'll need to modify it slightly. I'm not familiar with all of them, but the ones I know about have a step 'pick an edge' or 'pick a path.' You should modify the 'how you pick an edge' logic to take the priority ordering of the games into account. I'm not sure exactly what the ordering should be (you'll probably need to experiment), but if you say the lowest ranked game is 1, the next is 2, up to X, then a score like 'ranking of the edge - number of games the person is already signed up for' might work.
I think this is a variant of the Stable Marriage Problem or the Stable Roommates Problem for which there are known algorithms for solving.

how to tackle this combinatorial algorithm problem

I have N people who must each take T exams. Each exam takes "some" time, e.g. 30 min (no such thing as finishing early). Exams must be performed in front of an examiner.
I need to schedule each person to take each exam in front of an examiner within an overall time period but avoiding a lunch break, using the minimum number of examiners for the minimum amount of time (i.e. no/minimum examiners idle)
There are the following restrictions:
No person can be in 2 places at once
each person must take each exam once
noone should be examined by the same examiner twice
I realise that an optimal solution is probably NP-Complete, and that I'm probably best off using a genetic algorithm to obtain a best estimate (similar to this? Seating plan software recommendations (does such a beast even exist?)).
I'm comfortable with how genetic algorithms work, what i'm struggling with is how to model the problem programatically such that i CAN manipulate the parameters genetically..
If each exam took the same amount of time, then i'd divide the time period up into these lengths, and simply create a matrix of time slots vs examiners and drop the candidates in. However because the times of each test are not necessarily the same, i'm a bit lost on how to approach this.
currently im doing this:
make a list of all "tests" which need to take place, between every candidate and exam
start with as many examiners as there are tests
repeatedly loop over all examiners, for each one: find an unscheduled test which is eligible for the examiner (based on the restrictions)
continue until all tests that can be scheduled, are
if there are any unscheduled tests, increment the number of examiners and start again.
i'm looking for better suggestions on how to approach this, as it feels rather crude currently.
As julienaubert proposed, a solution (which I will call schedule) is a sequence of tuples (date, student, examiner, test) that covers all relevant student-test combinations (do all N students take all T tests?). I understand that a single examiner may test several students at once. Otherwise, a huge number of examiners will be required (at least one per student), which I really doubt.
Two tuples A and B conflict if
the student is the same, the test is different, and the time-period overlaps
the examiner is the same, the test is different, and the time-period overlaps
the student has already worked with the examiner on another test
Notice that tuple conflicts are different from schedule conflicts (which must additionally check for the repeated examiner problem).
Lower bounds:
the number E of examiners must be >= the total number of the tests of the most overworked student
total time must be greater than the total length of the tests of the most overworked student.
A simple, greedy schedule can be constructed by the following method:
Take the most overworked student and
assigning tests in random order,
each with a different examiner. Some
bin-packing can be used to reorder
the tests so that the lunch hour is
kept free. This will be a happy
student: he will finish in the
minimum possible time.
For each other student, if the student must take any test already scheduled, share time, place and examiner with the previously-scheduled test.
Take the most overworked student (as in: highest number of unscheduled tests), and assign tuples so that no constraints are violated, adding more time and examiners if necessary
If any students have unscheduled tests, goto 2.
Improving the choices made in step 2 above is critical to improvement; this choice can form the basis for heuristic search. The above algorithm tries to minimize the number of examiners required, at the expense of student time (students may end up with one exam early on and another last thing in the day, nothing in between). However, it is guaranteed to yield legal solutions. Running it with different students can be used to generate "starting" solutions to a GA that sticks to legal answers.
In general, I believe there is no "perfect answer" to a problem like this one, because there are so many factors that must be taken into account: students would like to minimize their total time spent examining themselves, as would examiners; we would like to minimize the number of examiners, but there are also practical limitations to how many students we can stack in a room with a single examiner. Also, we would like to make the scheduling "fair", so nobody is clearly getting much worse than others. So the best you can hope to do is to allow these knobs to be fiddled with, the results to be known (total time, per-student happiness, per-examiner happiness, exam sizes, perceived fairness) and allow the user to explore the parameter space and make an informed choice.
I'd recommend using a SAT solver for this. Although the problem is probably NP-hard, good SAT solvers can often handle hundreds of thousands of variables. Check out Chaff or MiniSat for two examples.
Don't limit yourself to genetic algorithms prematurely, there are many other approaches.
To be more specific, genetic algorithms are only really useful if you can combine parts of two solutions into a new one. This looks rather hard for this problem, at least if there are a similar number of people and exams so that most of them interact directly.
Here is a take on how you could model it with GA.
Using your notation:
N (nr exam-takers)
T (nr exams)
Let the gene of an individual express a complete schedule of bookings.
i.e. an individual is a list of specific bookings: (i,j,t,d)
i is the i'th exam-taker [1,N]
j is the j'th examiner [1,?]
t is the t'th test the exam-taker must take [1,T]
d is the start of the exam (date+time)
evaluate using a fitness function which has the property to:
penalize (severly) for all double booked examiners
penalize for examiners idle-time
penalize for exam-takers who were not allocated within their time period
reward for each exam-taker's test which was booked within period
this function will have all the logic to determine double bookings etc.. you have the complete proposed schedule in the individual, you now run the logic knowing the time for each test at each booking to determine the fitness and you increase/decrease the score of the booking accordingly.
to make this work well, consider:
initiation - use as much info you have to make "sane" bookings if it is computationally cheap.
define proper GA operators
initiating in a sane way:
random select d within the time period
randomly permute (1,2,..,N) and then pick the i from this (avoids duplicates), same for j and t
proper GA operators:
say you have booking a and b:
(a_i,a_j,a_t,a_d)
(b_i,b_j,b_t,b_d)
you can swap a_i and b_i and you can swap a_j and b_j and a_d and b_d, but likely no point in swapping a_t and b_t.
you can also have cycling, best illustrated with an example, if N*T = 4 a complete booking is 4 tuples and you would then cycle along i or j or d, for example cycle along i:
a_i = b_i
b_i = c_i
c_i = d_i
d_i = a_i
You might also consider constraint programming. Check out Prolog or, for a more modern expression of logic programming, PyKE

Resources