Algorithm to match people with available appointments

I need some help making a program that finds the best solution for everyone (more on that later).
6 7
0 0 0 0 0 0 0
1 0 0 1 1 0 0
2 2 2 1 2 2 2
2 1 1 1 2 1 2
0 1 2 2 1 0 0
1 2 1 2 0 1 1
The example above is an instance of the problem the algorithm is supposed to solve.
The first number in the first row is the number of people (6).
The second number in the first row is the number of appointments (7).
0 = the person has no problem with the date
1 = the person could take this date if no other is available
2 = the person cannot attend this appointment
Row = person
Column = available appointment
The program needs to find the best possible solution for everyone, i.e. decide which column (appointment) each person gets, based on the preferences they stated.
Example:
In the 3rd row, the person can only attend the appointment in the 4th column, since they cannot attend any of the others (2). That also makes column 4 taken and unavailable to everyone else.
I need help because I have no idea how to approach this. This may be a simple example, but since it is an algorithm, it is supposed to work with dozens of people and appointments.

The exercise is somewhat ambiguous, probably on purpose. My wild guess would be to sort the meetings by:
first, the highest number of possible participants, i.e., the lowest number of 2s in a matrix column;
then, the lowest “badness”, i.e., the lowest number of 1s in a matrix column.
Why badness does not count the 2s: because at this sorting stage we don’t care about those who cannot participate; they are already covered by the first criterion.
Why badness does not count the 0s: because we want to minimize the number of people inconvenienced by the meeting time, not (necessarily) maximize the number of people pleased with the meeting time.
#!/usr/bin/env python
import sys

n_people, n_appointments = (int(i)
                            for i in sys.stdin.readline().split())
people_appointments = tuple(tuple(int(i)
                                  for i in line.split())
                            for line in sys.stdin)
assert len(people_appointments) == n_people
for appointments in people_appointments:
    assert len(appointments) == n_appointments
appointment_metric = {}
for appointment in range(n_appointments):
    n_missing = sum(people_appointments[i][appointment] == 2
                    for i in range(n_people))
    badness = sum(people_appointments[i][appointment] == 1
                  for i in range(n_people))
    appointment_metric.setdefault(
        (n_missing, badness), []).append(str(appointment + 1))
for metric in sorted(appointment_metric):
    print(f'Appointment Nr. {" / ".join(appointment_metric[metric])} '
          f'(absence {metric[0]}, badness {metric[1]})')
Possible output (best appointment (by the metric described above) to worst appointment):
Appointment Nr. 6 (absence 1, badness 2)
Appointment Nr. 7 (absence 2, badness 1)
Appointment Nr. 1 / 2 / 3 / 5 (absence 2, badness 2)
Appointment Nr. 4 (absence 2, badness 3)
There are (of course) many other ways to evaluate meetings. Picking and defining a metric is quite likely an implicit part of the exercise.
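If the exercise is read the way the question's own example suggests (every person gets their own appointment, and a taken appointment is no longer available to anyone else), it becomes an assignment problem instead. The following is only a rough sketch under that assumption: it treats 0 and 1 as costs, treats 2 as forbidden, and searches for the assignment with the lowest total cost by backtracking. The function name `assign_appointments` and the "total cost" metric are my own choices, not part of the exercise.

```python
def assign_appointments(prefs):
    """Return (total_cost, slots), where slots[p] is the 0-based appointment of person p.

    prefs[p][a] is 0 (fine), 1 (only if necessary) or 2 (impossible).
    Returns (None, None) if no valid assignment exists.
    """
    n_people, n_slots = len(prefs), len(prefs[0])
    best = {"cost": None, "slots": None}
    used = set()      # appointments already taken on the current branch
    chosen = []       # chosen[p] = appointment of person p on the current branch

    def search(person, cost):
        # Prune branches that cannot beat the best complete assignment so far.
        if best["cost"] is not None and cost >= best["cost"]:
            return
        if person == n_people:
            best["cost"], best["slots"] = cost, list(chosen)
            return
        for slot in range(n_slots):
            if slot in used or prefs[person][slot] == 2:
                continue
            used.add(slot)
            chosen.append(slot)
            search(person + 1, cost + prefs[person][slot])
            chosen.pop()
            used.remove(slot)

    search(0, 0)
    return best["cost"], best["slots"]


# The matrix from the question (6 people, 7 appointments).
PREFS = [
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0],
    [2, 2, 2, 1, 2, 2, 2],
    [2, 1, 1, 1, 2, 1, 2],
    [0, 1, 2, 2, 1, 0, 0],
    [1, 2, 1, 2, 0, 1, 1],
]

cost, slots = assign_appointments(PREFS)
print("total cost:", cost)
print("appointments (1-based):", [s + 1 for s in slots])
```

For the example, the best total cost is 2: person 3 is forced onto appointment 4 (a 1), and person 4 only has 1s left. Backtracking is exponential in the worst case, so for dozens of people a polynomial assignment algorithm such as the Hungarian method is the more usual tool.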

Related

Algorithms for optimal student seating arrangements

Say I need to place n=30 students into groups of between 2 and 6, and I collect the following preference data from each student:
Student Name: Tom
Likes to sit with: Jimi, Eric
Doesn't like to sit with: John, Paul, Ringo, George
It's implied that they're neutral about any other student in the overall class that they haven't mentioned.
How might I best run a large number of simulations of many different/random grouping arrangements, to be able to determine a score for each arrangement, through which I could then pick the "most optimal" score/arrangement?
Alternatively, are there any other methods by which I might be able to calculate a solution that satisfies all of the supplied constraints?
I'd like a generic method that can be reused on different class sizes each year, but within each simulation run, the following constants and variables apply:
Constants: Total number of students, Student preferences
Variables: Group sizes, Student Groupings, Number of different group arrangements/iterations to test
Thanks in advance for any help/advice/pointers provided.
I believe you can state this as an explicit mathematical optimization problem.
Define the binary decision variables:
x(p,g) = 1 if person p is assigned to group g
0 otherwise
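The rest of the model is not spelled out here, so the following is only a hedged sketch of how such a model is commonly written, not necessarily the exact formulation that was solved. It assumes w_{p,q} is the combined preference score of the pair (p,q) and s_g is the chosen size of group g; both symbols are my notation.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{g}\sum_{p<q} w_{p,q}\, x_{p,g}\, x_{q,g} \\
\text{s.t.}\quad & \sum_{g} x_{p,g} = 1 && \text{for every person } p \\
& \sum_{p} x_{p,g} = s_g && \text{for every group } g \\
& x_{p,g} \in \{0,1\}
\end{aligned}
```

The product x_{p,g} x_{q,g} equals 1 exactly when p and q share group g, which is what makes the objective quadratic (hence MIQP).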
I used your data set with 28 persons and your preference matrix (with -1, +1, 0 elements). For the groups, I used 4 groups of 6 and 1 group of 4. A solution can look like:
---- 80 PARAMETER solution using MIQP model
group1 group2 group3 group4 group5
aimee 1
amber-la 1
amber-le 1
andrina 1
catelyn-t 1
charlie 1
charlotte 1
cory 1
daniel 1
ellie 1
ellis 1
eve 1
grace-c 1
grace-g 1
holly 1
jack 1
jade 1
james 1
kadie 1
kieran 1
kristiana 1
lily 1
luke 1
naz 1
nibah 1
niko 1
wiki 1
zeina 1
COUNT 6 6 6 6 4
Notes:
This model can be linearized, so it can be fed into a standard MIP solver
I solved this directly as a MIQP model (actually the solver reformulated the model into a MIP). The model solved in a few seconds.
Probably we need to add extra logic to make sure one person is not getting a really bad assignment. We optimize here only the total sum. This overall sum may allow an individual to get a bad deal. It is an interesting exercise to take this into account in the model. There are some interesting trade-offs.
A first approach would be to create an n x n matrix, where n is the total number of students. Rows and columns are both indexed by student: row i holds student i's preferences, and column j is the student the preference is about. Fill the cells with 1 = likes to sit with, -1 = the opposite, 0 = neutral. The main diagonal (i,i) is filled with zeroes too.
------Mark Maria John Peter
Mark 0 1 -1 1
Maria 0 0 -1 1
John -1 1 0 1
Peter 0
Scores are sums of these values. For example, John likes to sit with Maria (1), but Maria doesn't like to sit with John (-1), so the pair scores 0. The best result for a pair is when both like each other, for a sum of 2.
Then, for the chosen group sizes, calculate the score of each possible combination. The higher the score, the better the arrangement. Combinations skip the values on the main diagonal, i.e. John grouped with John is not a valid combination/group.
In a group of size 2, the best score is 2.
In a group of size 3, the best score is 6.
In a group of size 4, the best score is 12.
In a group of size n, the best score is (n-1)*n.
Finally, sort the list of combinations/groups by score and take the highest-scoring groups first, skipping any group that contains a student who has already been placed.
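The scoring and the "take the best groups first" step could be sketched roughly as follows. This is a minimal illustration, assuming a single fixed group size and a small made-up preference matrix (`HYPOTHETICAL_PREFS` is not the asker's data); the function names are my own. Enumerating every combination grows quickly with class size, and a greedy pick is not guaranteed to give the best overall arrangement.

```python
from itertools import combinations, permutations


def group_score(prefs, group):
    """Sum prefs[a][b] over all ordered pairs of distinct members of the group."""
    return sum(prefs[a][b] for a, b in permutations(group, 2))


def greedy_groups(prefs, group_size):
    """Pick the highest-scoring groups first, never reusing a student."""
    candidates = sorted(
        combinations(range(len(prefs)), group_size),
        key=lambda g: group_score(prefs, g),
        reverse=True,
    )
    placed, groups = set(), []
    for group in candidates:
        if placed.isdisjoint(group):   # no member has been placed yet
            groups.append(group)
            placed.update(group)
    return groups


# Hypothetical 4-student matrix: row i = student i's preferences (1, -1 or 0).
HYPOTHETICAL_PREFS = [
    [0, 1, -1, 0],
    [1, 0, 0, -1],
    [-1, 0, 0, 1],
    [0, -1, 1, 0],
]

for group in greedy_groups(HYPOTHETICAL_PREFS, 2):
    print(group, group_score(HYPOTHETICAL_PREFS, group))
```

With this matrix, students 0 and 1 like each other, and so do 2 and 3, so those two pairs are picked with a score of 2 each.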
In recent research, particle swarm optimization (PSO) was used to sort students into an unknown number of groups of 4 to 6. PSO showed improved capabilities compared to a genetic algorithm (GA). I think that specific paper is all you need.
The paper is: Forming automatic groups of learners using particle swarm optimization for applications of differentiated instruction
You can find the paper here: https://doi.org/10.1002/cae.22191
Perhaps the researchers could guide you through researchgate: https://www.researchgate.net/publication/338078753
As for the optimal seating, you need to specify an objective function based on your specific data.

Dropping people in Stata from a panel based on their situation in multiple years

I have an unbalanced panel of 7 years with every person interviewed 4 times and I want to drop all the people that reported that they were unemployed/inactive in all 4 periods. However, I do not want to drop the observations of the people that may have been out of the labour market for 1, 2 or 3 out of the 4 periods they were interviewed. How do I tell Stata to drop people based on their situation in multiple years (t to t-3)? When I do drop if ecostatus>3, for example, Stata drops observations that I need, i.e. the people that were inactive for less than the full period of the survey.
// create some example data
clear
input id t unemp
1 1 1
1 2 1
1 3 1
1 4 1
2 1 1
2 2 0
2 3 1
2 4 1
end
// create the total number of unemployment spells
bys id : egen totunemp = total(unemp)
// display the data
sort id t
list, sepby(id)
// keep those observations with at least one
// employment spell
keep if totunemp < 4
// display the data
list

How I can get the 'n' possible matrices from two vectors?

I've been searching for an algorithm for the solution of all possible matrices of dimension 'n' that can be obtained with two arrays, one of the sum of the rows, and another, of the sum of the columns of a matrix. For example, if I have the following matrix of dimension 7:
matriz= [ 1 0 0 1 1 1 0
1 0 1 0 1 0 0
0 0 1 0 1 0 0
1 0 0 1 1 0 1
0 1 1 0 1 0 1
1 1 1 0 0 0 1
0 0 1 0 1 0 1 ]
The sum of the columns are:
col= [4 2 5 2 6 1 4]
The sum of the rows are:
row = [4 3 2 4 4 4 3]
Now, I want to obtain all possible matrices of "ones and zeros" where the sum of the columns and the rows fulfil the condition of "col" and "row" respectively.
I would appreciate ideas that can help solve this problem.
One obvious way is to brute-force a solution: for the first row, generate all the possibilities that have the right sum, then for each of these, generate all the possibilities for the 2nd row, and so on. Once you have generated all the rows, you check if the sum of the columns is right. But this will take a lot of time. My math might be rusty at this time of the day, but I believe the number of distinct possibilities for a row of length n of which k bits are 1 is given by the binomial coefficient or nchoosek(n,k) in Matlab. To determine the total number of possibilities, you have to multiply this number for every row:
>> n = 7;
>> row= [4 3 2 4 4 4 3];
>> prod(arrayfun(@(k) nchoosek(n, k), row))
ans =
3.8604e+10
This is a lot of possibilities to check! Doing the same for the columns gives
>> col= [4 2 5 2 6 1 4];
>> prod(arrayfun(@(k) nchoosek(n, k), col))
ans =
555891525
Still a large number, but 'only' a factor 70 smaller.
It might be possible to improve this brute-force method a little bit by seeing if the later rows are already constrained by the previous rows. If in your example, for a particular combination of the first two rows, both rows have a 1 in the second column, the rest of this column should all be 0, since the sum must be 2. This reduces the number of possibilities for the remaining rows a bit. Implementing such checks might complicate things a bit, but they might make the difference between a calculation that takes 2 days or one that takes just 1 hour.
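As a rough illustration of that pruning idea, here is a minimal Python sketch (Python rather than Matlab, and `binary_matrices` is a name I made up). It fills the matrix row by row, only places ones in columns that still have capacity, and abandons a branch as soon as some column needs more ones than there are rows left.

```python
from itertools import combinations


def binary_matrices(row_sums, col_sums):
    """Yield every 0/1 matrix with the given row sums and column sums."""
    if sum(row_sums) != sum(col_sums):
        return                      # totals disagree, so nothing can match
    n_rows, n_cols = len(row_sums), len(col_sums)
    remaining = list(col_sums)      # ones still needed in each column
    rows = []

    def place(r):
        if r == n_rows:
            if all(c == 0 for c in remaining):
                yield [row[:] for row in rows]
            return
        # Prune: a column cannot need more ones than there are rows left.
        if any(c > n_rows - r for c in remaining):
            return
        open_cols = [j for j in range(n_cols) if remaining[j] > 0]
        for ones in combinations(open_cols, row_sums[r]):
            row = [0] * n_cols
            for j in ones:
                row[j] = 1
                remaining[j] -= 1
            rows.append(row)
            yield from place(r + 1)
            rows.pop()
            for j in ones:          # undo before trying the next choice
                remaining[j] += 1

    yield from place(0)


for matrix in binary_matrices([1, 1], [1, 1]):
    print(matrix)
```

Because it is a generator, the matrices can be consumed one at a time instead of being stored all at once, which matters when the number of solutions is large.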
An optimized version of this might alternate between generating rows and columns, starting with those for which the number of possibilities is the lowest. I don't know if there is a more elegant solution than this brute-force method; I would be interested to hear one.

How to Shuffle an Array with Fixed Row/Column Sum?

I need to assign random papers to students of a class, but I have the constraints that:
Each student should have two papers assigned.
Each paper should be assigned to (approximately) the same number of students.
Is there an elegant way to generate a matrix that has this property? i.e. it is shuffled but the row and column sums are constant? As an illustration:
Student A 1 0 0 1 1 0 | 3
Student B 1 0 1 0 0 1 | 3
Student C 0 1 1 0 1 0 | 3
Student D 0 1 0 1 0 1 | 3
----------------
2 2 2 2 2 2
I thought of first building an "initial matrix" with the right row/column sums, then randomly permuting first the rows, then the columns, but how do I generate this initial matrix? The problem here is that I'd be choosing between (e.g.) the following alternatives, and the fact that there are two students with the same set of papers assigned (in the left setup) won't change through row/column shuffling:
INITIAL (MA): OR (MB):
A 1 1 1 0 0 0 || 1 1 1 0 0 0
B 1 1 1 0 0 0 || 0 1 1 1 0 0
C 0 0 0 1 1 1 || 0 0 0 1 1 1
D 0 0 0 1 1 1 || 1 0 0 0 1 1
I know I could come up with something quick/dirty and just tweak where necessary but it seemed like a fun exercise.
If you want to work with permutations, what about:
Choose a student at random, say student 1.
For this student, choose a random paper they have, say paper A.
Choose another student at random, say student 2.
For this student, choose a random paper they have, say paper B (different from A).
Give paper B to student 1 and paper A to student 2.
That way, you preserve both the number of copies of each paper and the number of papers per student. Indeed, both students give one paper and receive one back. Moreover, no paper is created or deleted.
In terms of the table, this means finding two pairs of indices (i1,i2) and (j1,j2) such that A(i1,j1) = 1, A(i2,j2) = 1, A(i1,j2) = 0 and A(i2,j1) = 0, then changing the 0s to 1s and the 1s to 0s. The sums of the rows and columns do not change.
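The swap described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the matrix is a list of 0/1 lists; the function name `shuffle_preserving_sums` and the `n_swaps` parameter are my own, and how many swaps are needed for a well-mixed result is not something this sketch tries to answer.

```python
import random


def shuffle_preserving_sums(matrix, n_swaps, rng=None):
    """Return a shuffled copy of a 0/1 matrix with unchanged row and column sums."""
    rng = rng or random.Random()
    m = [row[:] for row in matrix]          # work on a copy
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        i1, i2 = rng.sample(range(n_rows), 2)   # two distinct students
        j1, j2 = rng.sample(range(n_cols), 2)   # two distinct papers
        # Swap only if student i1 has paper j1 but not j2, and i2 has j2 but not j1.
        if m[i1][j1] == 1 and m[i2][j2] == 1 and m[i1][j2] == 0 and m[i2][j1] == 0:
            m[i1][j1] = m[i2][j2] = 0
            m[i1][j2] = m[i2][j1] = 1
    return m


# The "MA" initial matrix from the question.
MA = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

for row in shuffle_preserving_sums(MA, 200, random.Random(0)):
    print(row)
```

Attempts that do not satisfy the four conditions are simply skipped, so every student keeps the same number of papers and every paper keeps the same number of readers, whatever the random draws are.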
Remark 1: If you do not want to proceed by permutations, you can simply put all the papers in a vector (paper A two times, paper B two times, ...). Then shuffle the vector randomly and give the first k papers to the first student, the next k to student 2, and so on. However, a student can end up with the same paper several times. In that case, make some permutations starting with the surplus papers.
You can generate the initial matrix as follows (pseudo-Python syntax):
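column_sum = [0] * n_papers   # one counter per paper (column)
for i in range(n_papers):
    for j in range(i + 1, n_papers):
        # re-check both columns each time, since column_sum[i] grows inside this loop
        if column_sum[i] < max_allowed and column_sum[j] < max_allowed:
            generate_row_with_ones_at(i, j)   # adds one row and increments n_rows
            column_sum[i] += 1
            column_sum[j] += 1
            if n_rows == n_wanted:
                return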
This is a straightforward iteration over all n choose 2 distinct rows, but with the constraint on column sums enforced as early as possible.

User submitted rankings

I was looking to have members submit their top-10 list of something, or their top 10 rankings, then have some algorithm combine the results. Is there something out there like that?
Thanks!
Ahhhh, that's open-ended alright. Let's consider a simple case where only two people vote:
Voter 1:
1 ALPHA
2 BRAVO
3 CHARLIE
Voter 2:
1 ALPHA
2 DELTA
3 BRAVO
We can't go purely by count... ALPHA should obviously win, though it has the same votes as BRAVO. Yet, we must avoid a case where just a few first place votes dominate a massive amount of 10th place votes. To do this, I suggest the following:
$score = log10($num_of_answers - $rank + 2)
First place would then be worth just a bit over one point, and tenth place would get .3 points. That logarithmic scaling prevents ridiculous dominance, yet still gives weight to rankings. From those example votes (and assuming they were the top 3 of a list of 10), you would get:
ALPHA: 2.08
BRAVO: 1.95
DELTA: 1.00
CHARLIE: 0.95
Why? Well, that's subjective. I feel out of a very long list that 4,000 10th place votes is worth more than 1,000 1st place votes. You may scale it differently by changing the base of your log (natural, 2, etc.), or choose a different system.
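A small Python sketch of this logarithmic scoring, assuming base-10 logs (which is what matches the point values quoted above) and ballots given as ordered lists; `log_scores` and `list_length` are names I picked for illustration.

```python
import math
from collections import defaultdict


def log_scores(ballots, list_length=10):
    """Score each item as the sum of log10(list_length - rank + 2) over all ballots."""
    totals = defaultdict(float)
    for ballot in ballots:
        for rank, item in enumerate(ballot, start=1):
            totals[item] += math.log10(list_length - rank + 2)
    # Highest total first.
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


BALLOTS = [
    ["ALPHA", "BRAVO", "CHARLIE"],
    ["ALPHA", "DELTA", "BRAVO"],
]

for item, score in log_scores(BALLOTS):
    print(f"{item}: {score:.2f}")
```

Switching `math.log10` for `math.log` or `math.log2` changes how strongly high ranks are favored, which is the tuning knob mentioned above.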
You could just add up, for each item, the rank each user gave it, and then sort the items by that total (lowest total first).
i.e.:
A = (a,b,c)
B = (a,c,b)
C = (b,a,c)
D = (c,b,a)
E = (a,c,b)
F = (c,a,b)
a = 1 + 1 + 2 + 3 + 1 + 2 = 10
b = 2 + 3 + 1 + 2 + 3 + 3 = 14
c = 3 + 2 + 3 + 1 + 2 + 1 = 12
Thus,
a
c
b
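The same rank-sum idea as a short Python sketch. It assumes every user ranks every item (otherwise an item missing from a ballot would unfairly get a low total), and `rank_sum` is just an illustrative name.

```python
def rank_sum(ballots):
    """Return (items ordered by total rank, lowest first; dict of totals)."""
    totals = {}
    for ballot in ballots:
        for position, item in enumerate(ballot, start=1):
            totals[item] = totals.get(item, 0) + position
    return sorted(totals, key=totals.get), totals


# The six rankings A..F from the example above.
RANKINGS = [
    ("a", "b", "c"),
    ("a", "c", "b"),
    ("b", "a", "c"),
    ("c", "b", "a"),
    ("a", "c", "b"),
    ("c", "a", "b"),
]

order, totals = rank_sum(RANKINGS)
print(order, totals)
```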
I think you could solve this problem by using a max flow algorithm to create an aggregate ranking, assuming the following:
Each unique item from the list of items is a node in a graph. E.g. if there are 10 things to vote on, there are 10 nodes.
An edge goes from node a to node b if a is immediately before b in a single user-submitted ranking.
The last node created from a single user-submitted ranking has an edge pointing at the sink.
The first node created from a single user-submitted ranking has an incoming edge from the source.
This should get you an aggregated top-10 list.
