Algorithm to assign Exams to Rooms? - algorithm

I have a problem that I have no idea what it is or how to solve it. I know there is a name for the problem (after it is known, the title could be changed to reflect it).
It is roughly a matter of finding the best fit for a list of items based on a supplied formula. For example:
I have 2 lists of objects: one list of rooms and one list of exams. For each exam, I loop through all available rooms, execute a formula (which returns a value from 0 to 1, where 1 means a good fit), and assign the highest-scoring room to the exam. I repeat the loop over and over to find the best fit (which may lead to an infinite loop).
I am trying to avoid using a genetic algorithm to solve this. Anyone got any idea what the name of the problem is and also a possible solution?
ps. Can an admin rename the title if I do not get the chance to?

This is the Assignment problem. Wikipedia will tell you more about how to solve it.
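To make the problem concrete, here is a minimal sketch that brute-forces a tiny instance. The `scores` matrix and the function name `best_assignment` are made up for illustration; the scores stand in for the output of the asker's 0-to-1 formula. Trying every permutation is only viable for a handful of exams. For larger inputs, the Hungarian algorithm (for example SciPy's `scipy.optimize.linear_sum_assignment`) solves the same problem in polynomial time.

```python
from itertools import permutations

def best_assignment(scores):
    """scores[e][r] is the fit (0..1) of exam e in room r.

    Assumes there are at least as many rooms as exams.
    """
    n_exams = len(scores)
    n_rooms = len(scores[0])
    best_total, best_rooms = float("-inf"), None
    # Try every way of giving each exam a distinct room.
    for rooms in permutations(range(n_rooms), n_exams):
        total = sum(scores[e][r] for e, r in enumerate(rooms))
        if total > best_total:
            best_total, best_rooms = total, rooms
    return best_total, list(best_rooms)

# Hypothetical fit scores for 3 exams (rows) and 3 rooms (columns).
scores = [
    [0.9, 0.8, 0.1],
    [0.9, 0.2, 0.3],
    [0.4, 0.7, 0.6],
]
total, rooms = best_assignment(scores)
print(rooms, round(total, 2))  # exam i goes to room rooms[i]
```

Note that picking the best room exam by exam would give exam 0 its top room (room 0) and leave exam 1 with a poor fit; optimizing the total over all exams at once avoids that, and it always terminates.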

Related

Confusion about Rank Selection for Genetic Algorithms

I have seen other SO questions asked about rank selection for genetic algorithms, but I am still confused. I haven't really seen an answer to this, or maybe I just didn't understand it: When using the rank selection, what is the population being ranked on? I had seen some answers say it's fitness, others say it's not. If it is possible to get a snippet of code so I can better understand this would be greatly appreciated. If there are any other questions, I can answer them to provide clarity. Thank you
EDIT: The case I am trying to solve is that I have a string I need the program to get right (I know what it is and have hard-coded it)
That code snippet, the fitness function, is entirely dependent on the application. It really defines the selection process. Imagine a simple program for playing five-card draw (poker). Each candidate is an algorithm that decides which cards to replace.
The fitness function might work like this: (1) remove the specified cards. (2) repeat 100 trials: replace the cards and compute the strength of the resulting hand. (3) return the average of the 100 trials.
That average stands as the fitness measure by which the algorithms are ranked.
Does that clear things up a little?
FOLLOW-UP
This means that you have to choose a similarity metric. You'll want something that is distinct for an exact match and degrades gracefully as you get farther from the right answer. A simple search will find the popular ones.
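As a rough illustration for the string-matching case, the sketch below ranks candidates by a simple similarity metric (number of matching positions) and then selects a parent with probability proportional to its rank. The target string, the population, and the linear rank weights are all assumptions chosen for the example, not part of the question.

```python
import random

TARGET = "hello"  # hypothetical hard-coded target string

def fitness(candidate):
    # Similarity metric: number of positions that match the target.
    return sum(a == b for a, b in zip(candidate, TARGET))

def rank_select(population, rng):
    # Sort worst to best; the worst gets rank 1, the best gets rank N.
    ranked = sorted(population, key=fitness)
    weights = list(range(1, len(ranked) + 1))
    # Selection probability depends on the rank only, not on the raw fitness.
    return rng.choices(ranked, weights=weights, k=1)[0]

population = ["hxllo", "zzzzz", "hello", "help!"]
ranked = sorted(population, key=fitness)
parent = rank_select(population, random.Random(0))
```

So the population is ordered by fitness, but the chance of being picked is computed from the position in that ordering, which is why some answers say ranking "is" fitness and others say it is not.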

Algorithm for assigning people based on multiple criteria

I have a list of users which need to be sorted into committees. The users can rank committees based on their particular preference, but must choose at least one to join. When they have all made their selections, the algorithm should sort them as evenly as possible taking into account their committee preference, gender, age, time zone and country (for now). I have looked at this question and its answer would seem like a good choice, but it is unclear to me how to add the various constraints to the algorithm for it to work.
Would anyone point me in the right direction on how to do this, please?
Looking for "clustering" will get you nowhere, because this is not a clustering type of task.
Instead, this is an assignment problem.
For further information, see:
Knapsack Problem
Generalized Assignment Problem
Usually, these are NP-hard to solve. Thus, one will usually choose a greedy optimization heuristic to find a reasonably good solution faster.
Think about how to best assign one person at a time.
Then, process the data as follows:
1. assign everybody that can only be assigned in a single way
2. find an unassigned person that is hard to assign; stop if everybody is assigned
3. assign that person in the best possible way
4. remove preferences that are no longer admissible, and go to 1 again (there may be a new person with only a single choice left)
For bonus points, add a source of randomness and an overall quality measure. Then run the algorithm 10 times, and keep only the best result.
For a further bonus, add a postprocessing optimization: when can you transfer one person to another group, or swap two persons, to improve the overall quality? Iterate over all persons to find such small improvements until you cannot find any.
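The greedy loop with random restarts can be sketched as follows. This is only an illustration under simplifying assumptions: each committee has a fixed number of seats, "hard to assign" means "fewest admissible committees left", and the quality measure only looks at preference rank (gender, age, time zone and country would have to be folded into `quality`). All names are hypothetical, and the swap-based postprocessing is left out.

```python
import random

def greedy_assign(prefs, capacity, rng):
    """prefs: person -> committees in preference order; capacity: committee -> seats."""
    seats = dict(capacity)
    options = {p: list(c) for p, c in prefs.items()}
    assignment = {}
    while options:
        # Drop committees that are already full (no longer admissible).
        for p in options:
            options[p] = [c for c in options[p] if seats[c] > 0]
        # Pick the hardest person: fewest options left, ties broken at random.
        # People with exactly one option are therefore handled first.
        person = min(options, key=lambda p: (len(options[p]), rng.random()))
        choices = options.pop(person)
        if not choices:
            assignment[person] = None  # none of this person's choices has room
            continue
        committee = choices[0]  # best remaining preference
        seats[committee] -= 1
        assignment[person] = committee
    return assignment

def quality(assignment, prefs):
    # Lower is better: sum of preference ranks, heavy penalty when unassigned.
    return sum(
        prefs[p].index(c) if c is not None else len(prefs[p]) + 10
        for p, c in assignment.items()
    )

def best_of(prefs, capacity, runs=10, seed=0):
    rng = random.Random(seed)
    results = (greedy_assign(prefs, capacity, rng) for _ in range(runs))
    return min(results, key=lambda a: quality(a, prefs))

prefs = {
    "ann": ["budget", "events"],
    "bob": ["budget"],
    "cy": ["events", "budget"],
    "dee": ["budget", "events"],
}
capacity = {"budget": 2, "events": 2}
result = best_of(prefs, capacity)
```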

Algorithm to Calculate "Best" Option Based on Score

I have an essentially infinite set of "challenges", and each one can be solved (or not) using a formula that has a finite set of inputs, each with a finite set of possible values (so that a possible solution to one challenge might be X=10, Y=22, Z=6, and another might be X=3, Y=14, Z=5). Whether or not a challenge is solved by a particular solution is not known until after the formula is applied to the given challenge. At that point, I will know if the challenge was solved or not.
There are some extra factors to consider:
1) Each iteration of the formula takes time and costs money, so I can't use "brute force" and just try every combination.
2) Over time, a solution that used to work may stop working, and vice versa.
3) There is usually more than one solution to a given challenge.
4) Different solutions have varying time and cost factors, so in the absence of any prior attempts at solving a particular challenge, there is a "known order" of solutions, sorted by these factors.
Each time a challenge is presented, I have access to a history of the prior attempts at solving prior challenges with identical attributes. So after I see a particular challenge a few times, I know what solution has and hasn't worked in the past.
The objective is to take the results of the past challenges (with identical attributes) and prioritize the available solutions in the order that they are most likely to succeed. In a highly simplified version of this: If solution 10/22/6 worked 3 of the last 5 times (and no other previously-attempted solutions worked more than 2 times), then solution 10/22/6 is probably the best one to try first; etc.
PS - I'm not expecting anyone to "write this" for me; just hoping for some ideas as to what to explore and experiment with. I can't imagine something like this hasn't been done before.
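As one possible starting point for experiments, the sketch below orders solutions by a smoothed success rate computed from the history, and falls back to the known time/cost order on ties. The function name, the `(solution, solved)` history format, and the prior of one success and one failure are assumptions made for the example. To account for solutions that stop working over time, the history passed in could be limited to recent attempts; multi-armed bandit methods are a more principled direction to explore.

```python
def prioritize(known_order, history, prior_success=1, prior_failure=1):
    """known_order: solutions sorted by time/cost (cheapest first).
    history: list of (solution, solved) pairs for challenges with identical
    attributes; every solution in it must also appear in known_order.
    """
    wins = {s: 0 for s in known_order}
    tries = {s: 0 for s in known_order}
    for solution, solved in history:
        tries[solution] += 1
        wins[solution] += int(solved)

    def estimate(s):
        # Smoothed success rate; untried solutions get the prior (0.5 by default).
        return (wins[s] + prior_success) / (tries[s] + prior_success + prior_failure)

    # sorted() is stable, so ties keep the cheaper-first known order.
    return sorted(known_order, key=estimate, reverse=True)

known_order = [(3, 14, 5), (10, 22, 6), (7, 7, 7)]
history = [
    ((10, 22, 6), True), ((10, 22, 6), True), ((10, 22, 6), True),
    ((10, 22, 6), False), ((10, 22, 6), False),
    ((3, 14, 5), True), ((3, 14, 5), False), ((3, 14, 5), False),
]
order = prioritize(known_order, history)
```

Here 10/22/6 (3 successes in 5 tries) comes first, the untried 7/7/7 second, and 3/14/5 (1 success in 3 tries) last.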

Designing a twenty questions algorithm

I am interested in writing a twenty questions algorithm similar to what akinator and, to a lesser extent, 20q.net uses. The latter seems to focus more on objects, explicitly telling you not to think of persons or places. One could say that akinator is more general, allowing you to think of literally anything, including abstractions such as "my brother".
The problem with this is that I don't know what algorithm these sites use, but from what I read they seem to be using a probabilistic approach in which questions are given a certain fitness based on how many times they have led to correct guesses. This SO question presents several techniques, but rather vaguely, and I would be interested in more details.
So, what could be an accurate and efficient algorithm for playing twenty questions?
I am interested in details regarding:
1. What question to ask next.
2. How to make the best guess at the end of the 20 questions.
3. How to insert a new object and a new question into the database.
4. How to query (1, 2) and update (3) the database efficiently.
I realize this may not be easy and I'm not asking for code or a 2000 words presentation. Just a few sentences about each operation and the underlying data structures should be enough to get me started.
Update, 10+ years later
I'm now hosting a (WIP, but functional) implementation here: https://twentyq.evobyte.org/ with the code here: https://github.com/evobyte-apps/open-20-questions. It's based on the same rough idea listed below.
Well, over three years later, I did it (although I didn't work full time on it). I hosted a crude implementation at http://twentyquestions.azurewebsites.net/ if anyone is interested (please don't teach it too much wrong stuff yet!).
It wasn't that hard, but I would say it's the non-intuitive kind of not hard that you don't immediately think of. My methods include some trivial fitness-based ranking, ideas from reinforcement learning and a round-robin method of scheduling new questions to be asked. All of this is implemented on a normalized relational database.
My basic ideas follow. If anyone is interested, I will share code as well, just contact me. I plan on making it open source eventually, but once I have done a bit more testing and reworking. So, my ideas:
an Entities table that holds the characters and objects played;
a Questions table that holds the questions, which are also submitted by users;
an EntityQuestions table holds entity-question relations. This holds the number of times each answer was given for each question in relation to each entity (well, those for which the question was asked for anyway). It also has a Fitness field, used for ranking questions from "more general" down to "more specific";
a GameEntities table is used for ranking the entities according to the answers given so far for each on-going game. An answer of A to a question Q pushes up all the entities for which the majority answer to question Q is A;
The first question asked is picked from those with the highest sum of fitnesses across the EntityQuestions table;
Each next question is picked from those with the highest fitness associated with the currently top entries in the GameEntities table. Questions for which the expected answer is Yes are favored even before the fitness, because these have more chances of consolidating the current top ranked entity;
If the system is quite sure of the answer even before all 20 questions have been asked, it will start asking questions not associated with its answer, so as to learn more about that entity. This is done in a round-robin fashion from the global questions pool right now. Discussion: is round-robin fine, or should it be fully random?
Premature answers are also given under certain conditions and probabilities;
Guesses are given based on the rankings in GameEntities. This allows the system to account for lies as well, because it never eliminates any possibility; it just decreases its likelihood of being the answer;
After each game, the fitness and answers statistics are updated accordingly: fitness values for entity-question associations decrease if the game was lost, and increase otherwise.
I can provide more details if anyone is interested. I am also open to collaborating on improving the algorithms and implementation.
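The GameEntities ranking idea can be illustrated with a small in-memory sketch. This is not the author's implementation: the dictionary below merely stands in for the EntityQuestions table, the data is invented, and a flat +1 bonus is an assumed scoring rule.

```python
from collections import Counter

# Hypothetical stand-in for the EntityQuestions table:
# (entity, question) -> counts of each answer given in past games.
entity_questions = {
    ("cat", "Is it alive?"): Counter(yes=9, no=1),
    ("cat", "Can it fly?"): Counter(yes=1, no=8),
    ("eagle", "Is it alive?"): Counter(yes=7, no=0),
    ("eagle", "Can it fly?"): Counter(yes=6, no=1),
    ("kite", "Is it alive?"): Counter(yes=0, no=5),
    ("kite", "Can it fly?"): Counter(yes=4, no=1),
}

def apply_answer(scores, question, answer):
    """Push up every entity whose majority answer to `question` matches `answer`."""
    for (entity, q), counts in entity_questions.items():
        if q != question:
            continue
        majority = counts.most_common(1)[0][0]
        # Nothing is ever eliminated: mismatches just miss out on the bonus.
        scores[entity] = scores.get(entity, 0) + (1 if majority == answer else 0)
    return scores

scores = {}  # plays the role of GameEntities for one on-going game
apply_answer(scores, "Is it alive?", "yes")
apply_answer(scores, "Can it fly?", "yes")
guess = max(scores, key=scores.get)
```

Because entities are only ranked and never removed, a single wrong or dishonest answer lowers an entity's position without making it unreachable.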
This is a very interesting question. Unfortunately I don't have a full answer, let me just write down the ideas I could come up with in 10 minutes:
If you are able to halve the set of available answers on each question, you can distinguish between 2^20 ~ 1 million "objects". Your set is probably going to be larger, so it's right to assume that sometimes you have to make a guess.
You want to maximize utility. Some objects are chosen more often than others. If you want to make good guesses you have to take into consideration the weight of each object (= the probability of that object being picked) when creating the tree.
If you trust your users a little, you can gain knowledge from their answers. This also means that you cannot use a static tree to ask questions, because then you'll get answers to the same questions and learn nothing new when you encounter the same object again.
If a simple question is not able to divide the set into two halves, you could combine questions to get better results, e.g. "is the object green or blue?" or "green or has a round shape?"
I am trying to write a Python implementation using a naïve Bayesian network for learning, with minimizing the expected entropy after the question has been answered as the criterion for selecting a question (and an epsilon chance of selecting a random question in order to learn more about that question), following the ideas in http://lists.canonical.org/pipermail/kragen-tol/2010-March/000912.html. I have put what I got so far on github.
1. Preferably choose questions with low remaining entropy expectation. (For putting together something quickly, I stole from ε-greedy multi-armed bandit learning and use: With probability 1–ε: Ask the question with the lowest remaining entropy expectation. With probability ε: Ask any random question. However, this approach seems far from optimal.)
2. Since my approach is a Bayesian network, I obtain the probabilities of the objects and can ask for the most probable object.
3. A new object is added as a new column to the probabilities matrix, with low a priori probability and the answers to the questions as given if given, or as guessed by the Bayes network if not given. (I expect that this second part would work much better if I added Bayes network structure learning instead of just using naive Bayes.)
Similarly, a new question is a new row in the matrix. If it comes from user input, probably only very few answer probabilities are known; the rest need to be guessed. (In general, if you can get objects by asking for properties, you can obtain properties by asking if given objects have them or not, and the transformation between these is essentially Bayes' theorem and breaks down to transposition in the easiest case. The guessing quality should improve again once the network has an appropriate structure.)
4. (This is a problem, since I calculate lots of probabilities. My goal is to do it using database-oriented sparse tensor calculations optimized for working with weighted directed acyclic graphs.)
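The question-selection rule from this answer (lowest expected remaining entropy, with an ε chance of a random question) can be sketched in a few lines. This is a simplified illustration under a naive Bayes assumption, not the code from the linked repository; the function names and the toy probabilities are made up.

```python
import math
import random

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_entropy(prior, p_yes):
    """prior[o] = P(object o); p_yes[o] = P(answer is yes | object o) for one question."""
    total = 0.0
    for answer_is_yes in (True, False):
        # Joint probability of each object together with this answer.
        joint = [p * (y if answer_is_yes else 1 - y) for p, y in zip(prior, p_yes)]
        p_answer = sum(joint)
        if p_answer > 0:
            posterior = [j / p_answer for j in joint]
            total += p_answer * entropy(posterior)
    return total

def pick_question(prior, questions, epsilon, rng):
    """questions: name -> p_yes list. Epsilon-greedy over expected remaining entropy."""
    if rng.random() < epsilon:
        return rng.choice(list(questions))  # explore: learn about a random question
    return min(questions, key=lambda q: expected_entropy(prior, questions[q]))

prior = [0.25, 0.25, 0.25, 0.25]  # four equally likely objects
questions = {
    "splits in half": [1.0, 1.0, 0.0, 0.0],
    "singles one out": [1.0, 0.0, 0.0, 0.0],
}
choice = pick_question(prior, questions, epsilon=0.0, rng=random.Random(0))
```

With ε set to 0 the rule prefers the question that splits the four objects in half (1 bit of entropy left on average) over the one that singles out one object (about 1.19 bits left).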
It would be interesting to see how well a decision-tree-based algorithm would serve you. The trick here is purely in the learning/sorting of the tree. I'd like to note that this is stuff I remember from AI class and student work in the AI working group and should be taken with a semi-large grain (or nugget) of salt.
To answer the questions:
1. You just walk the tree :)
2. This is a big downside of decision trees. You'd only have one guess that can be attached to the end nodes of the tree at depth 20 (or earlier, if the tree is still sparse).
3. There are whole books dedicated to this topic. As far as I remember from AI class, you try to minimize entropy at all times, so you want to ask questions that ideally divide the set of remaining objects into two sets of equal size. I'm afraid you'd have to look this up in AI books.
4. Decision trees are highly efficient during the query phase, as you literally walk the tree and follow the 'yes' or 'no' branch at each node. Update efficiency depends on the learning algorithm applied. You might be able to do this offline, as in a nightly batched update or something like that.

Algorithm for a planning tool

I'm writing a small software application that needs to serve as a simple planning tool for a local school. The 'problem' it needs to solve is fairly basic. Namely, the teachers need to talk with the parents of all children. However, some children have, of course, brothers and sisters in different groups, so these talks need to be scheduled next to each other, to avoid situations where parents have a talk at 6 pm and another one at 10 pm. Thus in short, given a collection of n children, where some children have 1 or more brothers or sisters, generate a schedule where all the talks of these children are planned next to each other.
Now, maybe the problem can be solved extremely easily, but on the other hand I have a feeling this can be a pretty complicated problem that needs, and can be solved with, some sort of algorithm. Elegantly. But am I right? Is there one? I've looked at the Hungarian algorithm but it doesn't quite apply to this particular problem.
Edit: I forgot to mention, that all talks take the same amount of time.
Thanks!
I think it is quite easy.
First group the kids who belong together because they share parents. Schedule the children inside a group consecutively, and schedule the rest at random.
Another way to abstract it and make the problem easier is to look at it from the parents' perspective: treat brothers and sisters as one "child" and give them more time. Then you can just schedule the parents at random, but some need more time (because they have multiple children).
One approach would be to define the problem in a declarative constraint language and then let it solve the problem for you. The last time I did this, I used ECLiPSe, which is a nifty little language where you define your problem space by constraints, and then let it find allowable values that satisfy those constraints.
For example, I believe you have two classes of constraints:
A teacher may only have one conference at a time
All students in the same family must have consecutive slots
Once you define these in ECLiPSe, it will calculate values for each student that satisfy the requirements. If you go this way, you can also easily add constraints as you need to. For example, it's easy to say that a teacher is unavailable for slot Y, or that teachers must take turns doing administrative work, etc.
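To show what these two constraints look like when checked explicitly, here is a small backtracking sketch in Python rather than ECLiPSe. It is a plain search, not a real constraint solver, and it assumes all talks have equal length, that each child has exactly one teacher, and that a family's talks must fill a block of consecutive slots. The function and variable names are hypothetical.

```python
from itertools import permutations

def schedule(families, teacher_of, n_slots):
    """families: list of lists of children; teacher_of: child -> teacher.
    Returns child -> slot, or None if no schedule fits in n_slots."""
    busy = set()   # (teacher, slot) pairs already taken
    result = {}

    def place(i):
        if i == len(families):
            return True
        kids = families[i]
        for start in range(n_slots - len(kids) + 1):
            slots = range(start, start + len(kids))
            # Constraint 2: siblings occupy consecutive slots, in any order.
            for order in permutations(kids):
                taken = [(teacher_of[k], s) for k, s in zip(order, slots)]
                # Constraint 1: a teacher has at most one conference per slot.
                if any(t in busy for t in taken):
                    continue
                busy.update(taken)
                result.update(zip(order, slots))
                if place(i + 1):
                    return True
                busy.difference_update(taken)  # undo and try the next option
        return False

    return dict(result) if place(0) else None

families = [["amy", "ben"], ["cal"], ["dan", "eve"]]
teacher_of = {"amy": "T1", "ben": "T2", "cal": "T1", "dan": "T1", "eve": "T2"}
plan = schedule(families, teacher_of, n_slots=3)
```

Extra rules such as "teacher unavailable in slot Y" could be added by pre-filling `busy` with the blocked `(teacher, slot)` pairs.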
This sort of feels like a "backpack algorithm" type of problem (usually called the knapsack problem). You need to group the family members together and then fill slots appropriately.
If you google "backpack algorithm", you'll see enough write-ups to make your head spin and also some good coded solutions.
I think if each talk could be reduced to an "activity", where each activity has a start time and an end time, you could use the activity-selection algorithm studied in computer science. It is based on a greedy approach and runs in O(n) once the activities are sorted by finish time (O(n log n) including the sort), where n is the number of activities. You could find more information here. I am sure you will need to do some pre-processing to reduce the brother/sister issue to activities of the same type.
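For reference, the greedy activity-selection rule itself is short. The sketch below uses invented `(start, end)` pairs and does not include the sibling pre-processing mentioned above.

```python
def select_activities(activities):
    """activities: list of (start, end). Returns a largest set of non-overlapping ones."""
    chosen = []
    last_end = float("-inf")
    # Greedy rule: always take the compatible activity that finishes earliest.
    for start, end in sorted(activities, key=lambda a: a[1]):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen

talks = [(1, 4), (3, 5), (0, 6), (5, 7), (8, 9), (5, 9)]
picked = select_activities(talks)
```

Note that this selects a maximum-size subset of compatible activities; it does not by itself place every talk, so it would only be one building block of a full schedule.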

Resources