What would you use for `n to n` relations in python? - data-structures

after fiddling around with dictionaries, I came to the conclusion, that I would need a data structure that would allow me an n to n lookup. One example would be: A course can be visited by several students and each student can visit several courses.
What would be the most pythonic way to achieve this? It wont be more than 500 Students and 100 courses, to stay with the example. So I would like to avoid using a real database software.
Thanks!

Since your working set is small, I don't think it is a problem to just store the student IDs as lists in the Course class. Finding students in a class would be as simple as doing
course.studentIDs
To find courses a student is in, just iterate over the courses and find the ID:
studentIDToGet = "johnsmith001"
studentsCourses = list()
for course in courses:
if studentIDToGet in course.studentIDs:
studentsCourses.append(course.id)
There's other ways you could do it. You could have a dictionary of studentIDs mapped to courseIDs or two dictionaries that - one mapped studentIDs:courseIDs and another courseIDs:studentIDs - when updated, update each other.
The implementation I wrote out the code for would probably be the slowest, which is why I mentioned that your working set is small enough that it would not be a problem. The other implentations I mentioned but did not show the code for would require some more code to make them work that just aren't worth the effort.

It depends completely on what operations you want the structure to be able to carry out quickly.
If you want to be able to quickly look up properties related to both a course and a student, for example how many hours a student has spent on studies for a specific course, or what grade the student has in the course if he has finished it, and if he has finished it etc. a vector containing n*m elements is probably what you need, where n is the number of students and m is the number of courses.
If on the other hand the average number of courses a student has taken is much less than the total number of courses (which it probably is for a real case scenario), and you want to be able to quickly look up all the courses a student has taken, you probably want to use an array consisting of n lists, either linked lists, resizable vectors or similar – depending on if you want to be able to with the lists; maybe that is to quickly remove elements in the middle of the lists, or quickly access an element at a random location. If you both want to be able to quickly remove elements in the middle of the lists and have quick random access to list elements, then maybe some kind of tree structure would be the most suitable for you.
Most tree data structures carry out all basic operations in logarithmic time to the number of elements in the tree. Beware that some tree data structures have an amortized time on these operators that is linear to the number of elements in the tree, even though the average time for a randomly constructed tree would be logarithmic. A typical example of when this happens is if you use a binary search tree and build it up with increasingly large elements. Don't do that; scramble the elements before you use them to build up the tree in that case, or use a divide-and-conquer method and split the list in two parts and one pivot element and create the tree root with the pivot element, then recursively create trees from both the left part of the list and the right part of the list, these also using the divide-and-conquer method, and attach them to the root as the left child and the right child respectively.
I'm sorry, I don't know python so I don't know what data structures that are part of the language and which you have to create yourself.

I assume you want to index both the Students and Courses. Otherwise you can easily make a list of tuples to store all Student,Course combinations: [ (St1, Crs1), (St1, Crs2) .. (St2, Crs1) ... (Sti, Crsi) ... ] and then do a linear lookup everytime you need to. For upto 500 students this ain't bad either.
However if you'd like to have a quick lookup either way, there is no builtin data structure. You can simple use two dictionaries:
courses = { crs1: [ st1, st2, st3 ], crs2: [ st_i, st_j, st_k] ... }
students = { st1: [ crs1, crs2, crs3 ], st2: [ crs_i, crs_j, crs_k] ... }
For a given student s, looking up courses is now students[s]; and for a given course c, looking up students is courses[c].

For something simple like what you want to do, you could create a simple class with data members and methods to maintain them and keep them consistent with each other. For this problem two dictionaries would be needed. One keyed by student name (or id) that keeps track of the courses each is taking, and another that keeps track of which students are in each class.
defaultdicts from the 'collections' module could be used instead of plain dicts to make things more convenient. Here's what I mean:
from collections import defaultdict
class Enrollment(object):
def __init__(self):
self.students = defaultdict(set)
self.courses = defaultdict(set)
def clear(self):
self.students.clear()
self.courses.clear()
def enroll(self, student, course):
if student not in self.courses[course]:
self.students[student].add(course)
self.courses[course].add(student)
def drop(self, course, student):
if student in self.courses[course]:
self.students[student].remove(course)
self.courses[course].remove(student)
# remove student if they are not taking any other courses
if len(self.students[student]) == 0:
del self.students[student]
def display_course_enrollments(self):
print "Class Enrollments:"
for course in self.courses:
print ' course:', course,
print ' ', [student for student in self.courses[course]]
def display_student_enrollments(self):
print "Student Enrollments:"
for student in self.students:
print ' student', student,
print ' ', [course for course in self.students[student]]
if __name__=='__main__':
school = Enrollment()
school.enroll('john smith', 'biology 101')
school.enroll('mary brown', 'biology 101')
school.enroll('bob jones', 'calculus 202')
school.display_course_enrollments()
print
school.display_student_enrollments()
school.drop('biology 101', 'mary brown')
print
print 'After mary brown drops biology 101:'
print
school.display_course_enrollments()
print
school.display_student_enrollments()
Which when run produces the following output:
Class Enrollments:
course: calculus 202 ['bob jones']
course: biology 101 ['mary brown', 'john smith']
Student Enrollments:
student bob jones ['calculus 202']
student mary brown ['biology 101']
student john smith ['biology 101']
After mary brown drops biology 101:
Class Enrollments:
course: calculus 202 ['bob jones']
course: biology 101 ['john smith']
Student Enrollments:
student bob jones ['calculus 202']
student john smith ['biology 101']

Related

Prolog Recursion | using multiple predicates to calculate sum

could someone tell me how do I solve the below question:
people_in_capitals(N): N is the total number of people living in capital cities of the world.
"Useful predicates:
country(Name, ID, Capital, CapitalProvince, Size, Population)
city(Name, Country ID, Province, Population, Lat, Lon, Elevation)."
I can use "findall" function to get and store the capitals in List, however, how do I use items in the list to find the population from the predicate city?
Show your code so far!
Also, write a predicate that iterates over the list (as described here: https://www.doc.gold.ac.uk/~mas02gw/prolog_tutorial/prologpages/lists.html) and sums the city population into the target value.

Which data structure to use to implement family tree in ruby?

I am trying to create a simple family tree in ruby where I could add children through the mother nodes. Also when I give a name and a relation as input I should be able to get the output as names of people related to the given person name.
For example, I should be able to do operations like
add_child('Tina', 'bob') // which will add bob as a child node to Tina
get_relation(bob, maternal_uncles) // which should output all the siblings of Tina in this case.
Which data structure is best to implement this and how to implement it in ruby? In my research I found graph is good approach and I was researching on its implementation since 2 days but could not find any solution.
I tried the following libraries
RubyTree https://github.com/evolve75/RubyTree - This helped me to get parents, siblings, grandparents relations but I could not think of how I can use this to get relations like father's brothers(paternal uncle), wife's sisters(sister in law) etc
weighted graph https://github.com/msayson/weighted_graph - I used 0 to represent spouse and 1 to represent children. I could not go anywhere from here. I got confused on how to even get parents and children of a given person.
I explored a little bit about ruby prefix trees and rgl gem but I could not apply them to my application.
Please help. Thanks in advance!
I could figure out a way to get minimum relations using RubyTree itself. RubyTree has inbuilt methods like parent, siblings, children etc and also we can pass content to the nodes.
So I used these to get what I want. For example for creating a spouse I created a child node to the root node and passed a hash like {relation: spouse} in the content. In this way I am able to apply logic by putting conditions on this hash and get the relations that I want
example:
tina = Tree::TreeNode.new('Tina', {gender: 'female', relation: 'root'})
mike = tina << Tree::TreeNode.new('Mike', {gender: 'male', relation: 'spouse'})
sofi = tina << Tree::TreeNode.new('sofi', {gender: 'female', relation: 'child'})
...... #add all children and their children like this
puts "--------siblings of sofi--------"
siblings_of_sofi.each do |sib|
if sib.content[:relation] == 'child'
puts sib.name
end
end
# assume tina has 4 sons and one of them is bob and alice is daughter of bob.
puts "--------alice paternal uncles--------"
puts alice.parent.name
puts alice.parent.content
if alice.parent.content[:relation] == 'spouse'
father = alice.parent.parent #as per the question child should be added through mother only, therefore alice is added as a child to bob's wife and bob's wife is added as a child to bob as {relation: spouse}
uncles = father.siblings
uncles.each do |uncle|
puts uncle.name if uncle.content[:gender] == 'male' && uncle.content[:relation] == 'child'
end
end

Algorithm to create unique random concatenation of items

I'm thinking about an algorithm that will create X most unique concatenations of Y parts, where each part can be one of several items. For example 3 parts:
part #1: 0,1,2
part #2: a,b,c
part #3: x,y,z
And the (random, one case of some possibilities) result of 5 concatenations:
0ax
1by
2cz
0bz (note that '0by' would be "less unique " than '0bz' because 'by' already was)
2ay (note that 'a' didn't after '2' jet, and 'y' didn't after 'a' jet)
Simple BAD results for next concatenation:
1cy ('c' wasn't after 1, 'y' wasn't after 'c', BUT '1'-'y' already was as first-last
Simple GOOD next result would be:
0cy ('c' wasn't after '0', 'y' wasn't after 'c', and '0'-'y' wasn't as first-last part)
1az
1cx
I know that this solution limit possible results, but when all full unique possibilities will gone, algorithm should continue and try to keep most avaible uniqueness (repeating as few as possible).
Consider real example:
Boy/Girl/Martin
bought/stole/get
bottle/milk/water
And I want results like:
Boy get milk
Martin stole bottle
Girl bought water
Boy bought bottle (not water, because of 'bought+water' and not milk, because of 'Boy+milk')
Maybe start with a tree of all combinations, but how to select most unique trees first?
Edit: According to this sample data, we can see, that creation of fully unique results for 4 words * 3 possibilities, provide us only 3 results:
Martin stole a bootle
Boy bought an milk
He get hard water
But, there can be more results requested. So, 4. result should be most-available-uniqueness like Martin bought hard milk, not Martin stole a water
Edit: Some start for a solution ?
Imagine each part as a barrel, wich can be rotated, and last item goes as first when rotates down, first goes as last when rotating up. Now, set barells like this:
Martin|stole |a |bootle
Boy |bought|an |milk
He |get |hard|water
Now, write sentences as We see, and rotate first barell UP once, second twice, third three and so on. We get sentences (note that third barell did one full rotation):
Boy |get |a |milk
He |stole |an |water
Martin|bought|hard|bootle
And we get next solutions. We can do process one more time to get more solutions:
He |bought|a |water
Martin|get |an |bootle
Boy |stole |hard|milk
The problem is that first barrel will be connected with last, because rotating parallel.
I'm wondering if that will be more uniqe if i rotate last barrel one more time in last solution (but the i provide other connections like an-water - but this will be repeated only 2 times, not 3 times like now). Don't know that "barrels" are good way ofthinking here.
I think that we should first found a definition for uniqueness
For example, what is changing uniqueness to drop ? If we use word that was already used ? Do repeating 2 words close to each other is less uniqe that repeating a word in some gap of other words ? So, this problem can be subjective.
But I think that in lot of sequences, each word should be used similar times (like selecting word randomly and removing from a set, and after getting all words refresh all options that they can be obtained next time) - this is easy to do.
But, even if we get each words similar number od times, we should do something to do-not-repeat-connections between words. I think, that more uniqe is repeating words far from each other, not next to each other.
Anytime you need a new concatenation, just generate a completely random one, calculate it's fitness, and then either accept that concatenation or reject it (probabilistically, that is).
const C = 1.0
function CreateGoodConcatenation()
{
for (rejectionCount = 0; ; rejectionCount++)
{
candidate = CreateRandomConcatination()
fitness = CalculateFitness(candidate) // returns 0 < fitness <= 1
r = GetRand(zero to one)
adjusted_r = Math.pow(r, C * rejectionCount + 1) // bias toward acceptability as rejectionCount increases
if (adjusted_r < fitness)
{
return candidate
}
}
}
CalculateFitness should never return zero. If it does, you might find yourself in an infinite loop.
As you increase C, less ideal concatenations are accepted more readily.
As you decrease C, you face increased iterations for each call to CreateGoodConcatenation (plus less entropy in the result)

Speed dating algorithm

I work in a consulting organization and am most of the time at customer locations. Because of that I rarely meet my colleagues. To get to know each other better we are going to arrange a dinner party. There will be many small tables so people can have a chat. In order to talk to as many different people as possible during the party, everybody has to switch tables at some interval, say every hour.
How do I write a program that creates the table switching schedule? Just to give you some numbers; in this case there will be around 40 people and there can be at most 8 people at each table. But, the algorithm needs to be generic of course
heres an idea
first work from the perspective of the first person .. lets call him X
X has to meet all the other people in the room, so we should divide the remaining people into n groups ( where n = #_of_people/capacity_per_table ) and make him sit with one of these groups per iteration
Now that X has been taken care of, we will consider the next person Y
WLOG Y be a person X had to sit with in the first iteration itself.. so we already know Y's table group for that time-frame.. we should then divide the remaining people into groups such that each group sits with Y for every consecutive iteration.. and for each iteration X's group and Y's group have no person in common
.. I guess, if you keep doing something like this, you will get an optimal solution (if one exists)
Alternatively you could crowd source the problem by giving each person a card where they could write down the names of all the people they got dine with.. and at the end of event, present some kind of prize to the person with the most names in their card
This sounds like an application for genetic algorithm:
Select a random permutation of the 40 guests - this is one seating arrangement
Repeat the random permutation N time (n is how many times you are to switch seats in the night)
Combine the permutations together - this is the chromosome for one organism
Repeat for how ever many organisms you want to breed in one generation
The fitness score is the number of people each person got to see in one night (or alternatively - the inverse of the number of people they did not see)
Breed, mutate and introduce new organisms using the normal method and repeat until you get a satisfactory answer
You can add in any other factors you like into the fitness, such as male/female ratio and so on without greatly changing the underlying method.
Why not imitate real world?
class Person {
void doPeriodically() {
do {
newTable = random (numberOfTables);
} while (tableBusy(newTable))
switchTable (newTable)
}
}
Oh, and note that there is a similar algorithm for finding a mating partner and it's rumored to be effective for those 99% of people who don't spend all of their free time answering programming questions...
Perfect Table Plan
You might want to have a look at combinatorial design theory.
Intuitively I don't think you can do better than a perfect shuffle, but it's beyond my pre-coffee cognition to prove it.
This one was very funny! :D
I tried different method but the logic suggested by adi92 (card + prize) is the one that works better than any other I tried.
It works like this:
a guy arrives and examines all the tables
for each table with free seats he counts how many people he has to meet yet, then choose the one with more unknown people
if two tables have an equal number of unknown people then the guy will choose the one with more free seats, so that there is more probability to meet more new people
at each turn the order of the people taking seats is random (this avoid possible infinite loops), this is a "demo" of the working algorithm in python:
import random
class Person(object):
def __init__(self, name):
self.name = name
self.known_people = dict()
def meets(self, a_guy, propagation = True):
"self meets a_guy, and a_guy meets self"
if a_guy not in self.known_people:
self.known_people[a_guy] = 1
else:
self.known_people[a_guy] += 1
if propagation: a_guy.meets(self, False)
def points(self, table):
"Calculates how many new guys self will meet at table"
return len([p for p in table if p not in self.known_people])
def chooses(self, tables, n_seats):
"Calculate what is the best table to sit at, and return it"
points = 0
free_seats = 0
ret = random.choice([t for t in tables if len(t)<n_seats])
for table in tables:
tmp_p = self.points(table)
tmp_s = n_seats - len(table)
if tmp_s == 0: continue
if tmp_p > points or (tmp_p == points and tmp_s > free_seats):
ret = table
points = tmp_p
free_seats = tmp_s
return ret
def __str__(self):
return self.name
def __repr__(self):
return self.name
def Switcher(n_seats, people):
"""calculate how many tables and what switches you need
assuming each table has n_seats seats"""
n_people = len(people)
n_tables = n_people/n_seats
switches = []
while not all(len(g.known_people) == n_people-1 for g in people):
tables = [[] for t in xrange(n_tables)]
random.shuffle(people) # need to change "starter"
for the_guy in people:
table = the_guy.chooses(tables, n_seats)
tables.remove(table)
for guy in table:
the_guy.meets(guy)
table += [the_guy]
tables += [table]
switches += [tables]
return switches
lst_people = [Person('Hallis'),
Person('adi92'),
Person('ilya n.'),
Person('m_oLogin'),
Person('Andrea'),
Person('1800 INFORMATION'),
Person('starblue'),
Person('regularfry')]
s = Switcher(4, lst_people)
print "You need %d tables and %d turns" % (len(s[0]), len(s))
turn = 1
for tables in s:
print 'Turn #%d' % turn
turn += 1
tbl = 1
for table in tables:
print ' Table #%d - '%tbl, table
tbl += 1
print '\n'
This will output something like:
You need 2 tables and 3 turns
Turn #1
Table #1 - [1800 INFORMATION, Hallis, m_oLogin, Andrea]
Table #2 - [adi92, starblue, ilya n., regularfry]
Turn #2
Table #1 - [regularfry, starblue, Hallis, m_oLogin]
Table #2 - [adi92, 1800 INFORMATION, Andrea, ilya n.]
Turn #3
Table #1 - [m_oLogin, Hallis, adi92, ilya n.]
Table #2 - [Andrea, regularfry, starblue, 1800 INFORMATION]
Because of the random it won't always come with the minimum number of switch, especially with larger sets of people. You should then run it a couple of times and get the result with less turns (so you do not stress all the people at the party :P ), and it is an easy thing to code :P
PS:
Yes, you can save the prize money :P
You can also take look at stable matching problem. The solution to this problem involves using max-flow algorithm. http://en.wikipedia.org/wiki/Stable_marriage_problem
I wouldn't bother with genetic algorithms. Instead, I would do the following, which is a slight refinement on repeated perfect shuffles.
While (there are two people who haven't met):
Consider the graph where each node is a guest and edge (A, B) exists if A and B have NOT sat at the same table. Find all the connected components of this graph. If there are any connected components of size < tablesize, schedule those connected components at tables. Note that even this is actually an instance of a hard problem known as Bin packing, but first fit decreasing will probably be fine, which can be accomplished by sorting the connected components in order of biggest to smallest, and then putting them each of them in turn at the first table where they fit.
Perform a random permutation of the remaining elements. (In other words, seat the remaining people randomly, which at first will be everyone.)
Increment counter indicating number of rounds.
Repeat the above for a while until the number of rounds seems to converge.

Secret santa algorithm

Every Christmas we draw names for gift exchanges in my family. This usually involves mulitple redraws until no one has pulled their spouse. So this year I coded up my own name drawing app that takes in a bunch of names, a bunch of disallowed pairings, and sends off an email to everyone with their chosen giftee.
Right now, the algorithm works like this (in pseudocode):
function DrawNames(list allPeople, map disallowedPairs) returns map
// Make a list of potential candidates
foreach person in allPeople
person.potentialGiftees = People
person.potentialGiftees.Remove(person)
foreach pair in disallowedPairs
if pair.first = person
person.Remove(pair.second)
// Loop through everyone and draw names
while allPeople.count > 0
currentPerson = allPeople.findPersonWithLeastPotentialGiftees
giftee = pickRandomPersonFrom(currentPerson.potentialGiftees)
matches[currentPerson] = giftee
allPeople.Remove(currentPerson)
foreach person in allPeople
person.RemoveIfExists(giftee)
return matches
Does anyone who knows more about graph theory know some kind of algorithm that would work better here? For my purposes, this works, but I'm curious.
EDIT: Since the emails went out a while ago, and I'm just hoping to learn something I'll rephrase this as a graph theory question. I'm not so interested in the special cases where the exclusions are all pairs (as in spouses not getting each other). I'm more interested in the cases where there are enough exclusions that finding any solution becomes the hard part. My algorithm above is just a simple greedy algorithm that I'm not sure would succeed in all cases.
Starting with a complete directed graph and a list of vertex pairs. For each vertex pair, remove the edge from the first vertex to the second.
The goal is to get a graph where each vertex has one edge coming in, and one edge leaving.
Just make a graph with edges connecting two people if they are allowed to share gifts and then use a perfect matching algorithm. (Look for "Paths, Trees, and Flowers" for the (clever) algorithm)
I was just doing this myself, in the end the algorithm I used doesn't exactly model drawing names out of a hat, but it's pretty damn close. Basically shuffle the list, and then pair each person with the next person in the list. The only difference with drawing names out of a hat is that you get one cycle instead of potentially getting mini subgroups of people who only exchange gifts with each other. If anything that might be a feature.
Implementation in Python:
import random
from collections import deque
def pairup(people):
""" Given a list of people, assign each one a secret santa partner
from the list and return the pairings as a dict. Implemented to always
create a perfect cycle"""
random.shuffle(people)
partners = deque(people)
partners.rotate()
return dict(zip(people,partners))
I wouldn't use disallowed pairings, since that greatly increases the complexity of the problem. Just enter everyone's name and address into a list. Create a copy of the list and keep shuffling it until the addresses in each position of the two lists don't match. This will ensure that no one gets themselves, or their spouse.
As a bonus, if you want to do this secret-ballot-style, print envelopes from the first list and names from the second list. Don't peek while stuffing the envelopes. (Or you could just automate emailing everyone thier pick.)
There are even more solutions to this problem on this thread.
Hmmm. I took a course in graph theory, but simpler is to just randomly permute your list, pair each consecutive group, then swap any element that is disallowed with another. Since there's no disallowed person in any given pair, the swap will always succeed if you don't allow swaps with the group selected. Your algorithm is too complex.
Create a graph where each edge is "giftability" Vertices that represent Spouses will NOT be adjacent. Select an edge at random (that is a gift assignment). Delete all edges coming from the gifter and all edges going to the receiver and repeat.
There is a concept in graph theory called a Hamiltonian Circuit that describes the "goal" you describe. One tip for anybody who finds this is to tell users which "seed" was used to generate the graph. This way if you have to re-generate the graph you can. The "seed" is also useful if you have to add or remove a person. In that case simply choose a new "seed" and generate a new graph, making sure to tell participants which "seed" is the current/latest one.
I just created a web app that will do exactly this - http://www.secretsantaswap.com/
My algorithm allows for subgroups. It's not pretty, but it works.
Operates as follows:
1. assign a unique identifier to all participants, remember which subgroup they're in
2. duplicate and shuffle that list (the targets)
3. create an array of the number of participants in each subgroup
4. duplicate array from [3] for targets
5. create a new array to hold the final matches
6. iterate through participants assigning the first target that doesn't match any of the following criteria:
A. participant == target
B. participant.Subgroup == target.Subgroup
C. choosing the target will cause a subgroup to fail in the future (e.g. subgroup 1 must always have at least as many non-subgroup 1 targets remaining as participants subgroup 1 participants remaining)
D. participant(n+1) == target (n +1)
If we assign the target we also decrement the arrays from 3 and 4
So, not pretty (at all) but it works. Hope it helps,
Dan Carlson
Here a simple implementation in java for the secret santa problem.
public static void main(String[] args) {
ArrayList<String> donor = new ArrayList<String>();
donor.add("Micha");
donor.add("Christoph");
donor.add("Benj");
donor.add("Andi");
donor.add("Test");
ArrayList<String> receiver = (ArrayList<String>) donor.clone();
Collections.shuffle(donor);
for (int i = 0; i < donor.size(); i++) {
Collections.shuffle(receiver);
int target = 0;
if(receiver.get(target).equals(donor.get(i))){
target++;
}
System.out.println(donor.get(i) + " => " + receiver.get(target));
receiver.remove(receiver.get(target));
}
}
Python solution here.
Given a sequence of (person, tags), where tags is itself a (possibly empty) sequence of strings, my algorithm suggests a chain of persons where each person gives a present to the next in the chain (the last person obviously is paired with the first one).
The tags exist so that the persons can be grouped and every time the next person is chosen from the group most dis-joined to the last person chosen. The initial person is chosen by an empty set of tags, so it will be picked from the longest group.
So, given an input sequence of:
example_sequence= [
("person1", ("male", "company1")),
("person2", ("female", "company2")),
("person3", ("male", "company1")),
("husband1", ("male", "company2", "marriage1")),
("wife1", ("female", "company1", "marriage1")),
("husband2", ("male", "company3", "marriage2")),
("wife2", ("female", "company2", "marriage2")),
]
a suggestion is:
['person1 [male,company1]',
'person2 [female,company2]',
'person3 [male,company1]',
'wife2 [female,marriage2,company2]',
'husband1 [male,marriage1,company2]',
'husband2 [male,marriage2,company3]',
'wife1 [female,marriage1,company1]']
Of course, if all persons have no tags (e.g. an empty tuple) then there is only one group to choose from.
There isn't always an optimal solution (think an input sequence of 10 women and 2 men, their genre being the only tag given), but it does a good work as much as it can.
Py2/3 compatible.
import random, collections
class Statistics(object):
def __init__(self):
self.tags = collections.defaultdict(int)
def account(self, tags):
for tag in tags:
self.tags[tag] += 1
def tags_value(self, tags):
return sum(1./self.tags[tag] for tag in tags)
def most_disjoined(self, tags, groups):
return max(
groups.items(),
key=lambda kv: (
-self.tags_value(kv[0] & tags),
len(kv[1]),
self.tags_value(tags - kv[0]) - self.tags_value(kv[0] - tags),
)
)
def secret_santa(people_and_their_tags):
"""Secret santa algorithm.
The lottery function expects a sequence of:
(name, tags)
For example:
[
("person1", ("male", "company1")),
("person2", ("female", "company2")),
("person3", ("male", "company1")),
("husband1", ("male", "company2", "marriage1")),
("wife1", ("female", "company1", "marriage1")),
("husband2", ("male", "company3", "marriage2")),
("wife2", ("female", "company2", "marriage2")),
]
husband1 is married to wife1 as seen by the common marriage1 tag
person1, person3 and wife1 work at the same company.
…
The algorithm will try to match people with the least common characteristics
between them, to maximize entrop— ehm, mingling!
Have fun."""
# let's split the persons into groups
groups = collections.defaultdict(list)
stats = Statistics()
for person, tags in people_and_their_tags:
tags = frozenset(tag.lower() for tag in tags)
stats.account(tags)
person= "%s [%s]" % (person, ",".join(tags))
groups[tags].append(person)
# shuffle all lists
for group in groups.values():
random.shuffle(group)
output_chain = []
prev_tags = frozenset()
while 1:
next_tags, next_group = stats.most_disjoined(prev_tags, groups)
output_chain.append(next_group.pop())
if not next_group: # it just got empty
del groups[next_tags]
if not groups: break
prev_tags = next_tags
return output_chain
if __name__ == "__main__":
example_sequence = [
("person1", ("male", "company1")),
("person2", ("female", "company2")),
("person3", ("male", "company1")),
("husband1", ("male", "company2", "marriage1")),
("wife1", ("female", "company1", "marriage1")),
("husband2", ("male", "company3", "marriage2")),
("wife2", ("female", "company2", "marriage2")),
]
print("suggested chain (each person gives present to next person)")
import pprint
pprint.pprint(secret_santa(example_sequence))

Resources