Grouping Algorithm with Requirements - algorithm

So I am working on writing an algorithm that when given a group of people from different places, it will organize them into groups of three based off a few parameters:
No two people in a group are from the same place
No two people in a group have met before
Everyone in the group is available to meet on the same day
Not more than one person is below the age of 18
I have in my data structures a variable for all the required prereqs above. I was wondering if there was a good way to go about solving this problem? Currently I am using a variation of the Gale-Shapely Algorithm (solution to Stable Marriage Problem). This solution works relatively well, but more often that not, it requires me to go in and make some minor tweaks to the final groups.
Any one have any ideas/suggestions?
I appreciate the help in advance.

This is a graph partitioning problem, and as such it's almost certainly NP-Hard (when you've only got one constraint then the problem can be solved in polynomial time - this is similar to the stable marriage problem - but when you've got multiple constraints then the complexity shoots way up). A good way to solve these problems is to apply a heuristic (which you're doing with the Gale-Shapely algorithm), and to then resolve any conflicts with a local backtracking search (which it sounds like you're currently doing by hand). My suggestion is to keep your current heuristic if it seems to be working well for you, and to add an automated local backtracking algorithm to resolve any conflicts that arise from the heuristic (e.g. if you've got a single group that has two people who are under 18, then swap one of these people with somebody who is over 18 and who doesn't violate the other three constraints; if this isn't possible, then choose somebody who is over 18 who violates the fewest constraints, and then swap out somebody else from the group in order to satisfy the now violated constraint; after N failed iterations the algorithm throws up its hand and asks for human intervention)

Related

Genetic algorithm. A population contains chromosomes, that don't perform conditions. What should I do with?

I'm working on Traveling Salesman Problem solution with a Genetic Algorithm.
Some chromosomes are contain the shortest way, but they still aren't appropriate.
For example, a salesman must get to the A city at 6.00 pm, but using the solution of a chromosome he'll get there at 7.00 pm. Thus, this solution is not correct.
What should I do with this issue?
Firstly, I can change these chromosomes. But how can I do it?
Secondly, I can keep them. How should I do the selection then?
Thirdly, I can replace them, but I have no idea what should I use instead.
Could you please help me or recommend me some useful information?
English is not my native language, so sorry if I said something wrong.
The easiest solution in my opinion to make the samples carrying these chromosomes unvital.
This means, that at each iteration of your Genetic Algorithm, possible solution that carries that chromosome "dies". This will make sure that the population carrying this chromosome will remain extremely small, and will not be able to "reproduce" for next generations, and will not be an issue - because samples having this chromosome cannot become dominant in the population.
Do not kill the chromosomes if it's not necessary.
Kill the chromosomes only if the population became too large.
Simply Use the crossover and mutation operators, maybe with an elitist sampling strategy.
PS:
Have you read the book of Zbigniew Michalewicz about Genetic Algorithms?
I'm pretty sure it contains an example with the Salesman Problem
You are dealing with a problem with constraints. Using GAs to solve such problems can be tricky, but there are generally four possibilites of how to deal with constraints:
Use such representation that makes the solutions always valid.
Don't use any constraint-preserving representation but introduce some correction operator which makes a valid solution out of an invalid one.
Penalize (or kill) the invalid solutions.
Use a multi-objective algorithm (e.g. NSGA-II, my favourite one) and turn the constraints into objectives.
The last option is very effective but you need to be able to measure how much is a constraint violated. If this is possible, which seems to me that it is in your case - you can just sum the differences between the desired and actual time of the visit, then this measure of the constraint violation just becomes another objective and you optimize the original objective and the new one(s) at the same time. This approach enables the algorithm to exploit useful information in all individuals even if they are invalid.

Defining a special case of a subset-sum with complications

I have a problem that I have a number of questions about. First, I'm mostly looking for help describing and understanding the problem at hand. Solutions are always welcome, but most importantly I could use some advice from someone more experienced than I. Now, to the problem at hand:
I have a set of orders that each require some number of items. I also have several groupings of items that each contain some number of some items (call them groups). The goal is to find a subset of the orders that can be fulfilled using as few groups as possible and where the total number of items contained within the orders is between n and N.
Edit: The constraints on the number of items contained in the orders (n and N) are chosen independently.
To me at least, that's a really complicated way of saying the problem so I've been trying to re-phrase it as a knapsack problem (I suspect this might reduce to a subset-sum). To help my conceptual understanding of this I've started using the following definitions:
First, lets say that a dimension exists for each possible item, and somethings 'length' in that dimension is the number of that particular type of item it either has or requires.
From this, an order becomes an 'n-dimensional object' where its value in each dimension corresponds to the number of that item that it requires.
In addition, a group can be seen as an 'n-dimensional box' that has space in each dimension corresponding to the number of items it provides.
An objects value is equal to the sum of its length in all dimensions.
Boxes can be combined.
Given the above I've rephrased the problem to this:
What is the smallest combination of boxes that can hold a combination of items with value between n and N.
Question #1: Is this a correct/useful way to express the problem? Does it seem like I've missed anything obvious?
As I see it, since there are two combinations that I'm looking for I need to break the problem into two parts. So far I think breaking the problem up like this is a good step:
How many objects can box (or combination of boxes) X hold?
Check all (or preferably some small subset of) the possible combinations of boxes and pick the 'best'.
That makes it a little more manageable, but I'm still struggling with the details.
Question #2: Solved To solve the first part I think it's appropriate to say that the cost of an object is equal to the sum of its length in all dimensions, so is it's value. That places me into a subset-sum problem, right? Obviously it's a special case, but does this problem have a name?
Question #3: Solved I've been looking into subset-sum solutions a lot, but I don't understand how to apply them to something like this in multiple dimensions. I assume it's been done before, but I'm unsure where to start my research. Could someone either describe the principles at work or point me in a research direction?
Edit: After looking at everyone's feedback and digging into the terms I think I've found a good algorithm I can implement to solve part 1. Since I will have a very large number of dimensions compared to the number of items it looks like using a 'primal effective capacity heuristic (PECH)' will be a good fit. I'd be interested in hearing someones thoughts about it if they have experience with such an algorithm.
Question #4: For the second part, performance is a concern and I doubt it will be realistic to brute force it. So I intend to treat all combinations of boxes as a really big tree of solutions. The idea is to compute part 1 for all combinations of M-1 boxes where M is the total number of boxes. Somehow determine the 'best' couple box combinations from that set and do the same to their child nodes on the tree. Does this sound like it would help me arrive at something close to optimal? How would I choose the 'best' box combinations?
Thanks for reading! Suggestions for edits and clarifications are welcome.

Algorithm for Trading Resources

I'm trying to find the best algorithmic solution to the following problem. It's a real-world problem, but I'm going to present it in an abstract manner.
There is a community of 1000 people. Each user is provided with a set number of tickets. There are four types of tickets (each one corresponds to a different event). However, some people are willing to make trades (for example, I want one A-ticket and am willing to give up two B-tickets). Moreover, some people have extra tickets that they are willing to give away for nothing (for example, I'll give away two C-tickets to whoever wants them). Assuming that I know what each person is willing to give away / trade, how do I satisfy the most number of people?
I tried Googling, but I don't know how to word this problem to avoid getting results related to algorithmic trading of financial instruments.
Thanks.
Given that it has multiple dimensions, it is likely an NP-complete problem. It has parallels to a multi-dimensional knapsack problem.
Therefore, I recommend trying a backtracking approach.
Start with everybody involved in the trade.
Sort the people who cause the most deficit descending (here you can weight the deficit caused by each ticket type by the shortfall in each ticket).
Then in a backtracking manner, kick the person causing the next highest deficit person out of the trade.
Repeat until you either have no more deficit in any ticket (record as possible answer), or you have kicked everybody out.
When that happens, backtrack 1 step and continue (if you already tried kicking the highest deficit, kick the next highest deficit causing person).
Repeat until end or you run out of time. Get the optimal answer out of the possible answers you found.
If the problem is too hard, it will probably run out of time. Otherwise, this algorithm should give you a reasonable answer (perhaps near to optimal).
How well this method works depends on how generous/greedy the people are, how many people there are, and how fast your computer is.
Look for a bipartite minimum weigth matching problem. The idea is to find the shortest distance from i to j using vertices 1 .. k only.

dynamic fitness function for genetic algorithm

I'm not sure if I'm completely understanding genetic algorithms and how they work, I'm trying to learn via ai4r http://ai4r.rubyforge.org/geneticAlgorithms.html
If in Job Shop Scheduling, which I believe can be solved by GA(?), isn't cost of any single job is based on how it related to it's predecessors? I was thinking I would calculate a cost based on the placement of the chromosome with a dynamic score of how well it is placed rather than a binary value, but I'm not sure this works.
Anybody have any experience with this? or does a GA only work when the difference between any two genomes is static?
I hope I have the right terminology here, as I mentioned, I'm just learning.
-----------------------update-----------------------------------
I think I'm using a bit of the wrong terminology here. I referred to 'fitness' when I think what I actually wanted to use was cost matrix.
The example I'm going from describes this
Each chromosome must represent a posible solution for the problem. This class conatins an array with the list of visited nodes (cities of the tour). The size of the tour is obtained automatically from the traveling costs matrix. You have to assign the costs matrix BEFORE you run the genetic search. The following costs matrix could be used to solve the problem with only 3 cities:
data_set = [ [ 0, 10, 5],
[ 6, 0, 4],
[25, 4, 0]
]
Ai4r::GeneticAlgorithm::Chromosome.set_cost_matrix(data_set)
so in my instance, I'm thinking the 'cost' of each chromosome is dynamic based on it's neighbours.
Since you asked in a comment to make this an answer, I took the liberty of summarizing my earlier responses as well so it's all in one place. The answer to the specific question of "what is a penalty term" is in item #3 below.
The way a standard genetic algorithm works is that each "chromosome" is a complete solution to the problem. In your case, an ordering for the jobs to be submitted. The confusion, I think, centers around the notion that because the individual contributions to fitness made by a particular job in that schedule varies according to the rest of the schedule, you must need something "dynamic". That's not really true. From the point of view of the GA, the only thing that has a fitness is the entire solution. So a dynamic problem is one in which the fitness of a whole schedule can change over time. Going back to the TSP, a dynamic problem would be one in which touring cities in order of A, B, C, D, and then E actually had a different distance each time you tried it. Even though the cost of a tour through B depends on which cities come before and after B in the tour, once you decide that, the costs are static, and because the GA only ever receives costs for entire tours, all it knows is that [A,B,C,D,E] has a constant fitness. No dynamic trickery needed.
Now, your second question was how to handle constraints like, for the TSP example, what if you need to ensure that the salesman gets to Paris by a certain time? Typically, there are three ways to try to handle this.
Never allow a solution to be generated in which he doesn't get there before 2:00. Sometimes this is easy, other times it's very hard. For instance, if the constraint was "he cannot start at city X", it's fairly easy to just not generate solutions that don't start with X. Often though, simply finding valid solutions can be hard, and so this approach doesn't really work.
Allow constraints to be violated, but fix them afterward. In the TSP example, you let crossover and mutation produce any possible tour, but then scan through it to see if he gets to Paris too late. If so, swap the position of Paris with some earlier city in the tour. Again though, sometimes it can be hard to figure out a good way to repair violations.
Penalize the fitness of an infeasible solution. Here, the idea is that even if I can't prevent him from getting to Paris too late and I can't fix it if he does, I can at least make the fitness arbitrarily worse. For TSP, the fitness is the length of the tour. So you might say that if a tour gets him to Paris too late, the fitness is the length of the tour + 100. That let's the solution stay in the population (it might be very good otherwise, so you want it to have a chance to pass on some of its genes), but you make it less likely to be selected, because your selection and replacement methods pick individuals with better fitness values.
For your JSP problem, typically you're looking to minimize the makespan. The same three options are available to you if you do have some constraints. But from what I can tell, you don't really have such constraints. I think you're trying to inject too much knowledge into the process rather than letting the evolutionary algorithm come up with it on its own. That is, you don't necessarily worry about telling the GA that some arrangements of jobs are better than others. You just assign higher fitness to the better ones and let the process converge.
That said, injecting information like this is often a really good thing to do, but you want to have a good understanding of the basic algorithm first. Let's say that we know that for TSP, it's more likely that a good solution will connect cities that are close to one another. The way I would use that information inside a GA would be to generate random solutions non-uniformly (perhaps with a greedy heuristic). I might also replace the standard crossover and mutation algorithms with something customized. Mutation is typically easier to do this with than crossover. To mutate a TSP solution, I might pick two connected cities, break the connection, and then look for a way to reconnect them that was "closer". That is, if a tour is [A,B,C,D,E,F,G,H], I might pick the edge [B,C] at random, and then look for another edge, maybe [F,G], such that when I connected them crossways to get [A,B,G,D,E,F,C,H], the total tour length was lower. I could even extend that mutation beyond one step -- create a loop that keeps trying to break and reconnect edges until it can't find a shorter tour. This leads to what is usually called a hybrid GA because it's a GA hybridized with a local search; sometimes also called a Memetic Algorithm. These sorts of algorithms usually outperform a black-box GA because you're giving the algorithm "hints" to bias it towards trying things you expect to be good.
I think this idea of a memetic algorithm is pretty close to what you were hitting on in your original question of wondering how to deal with the fact that the contribution to fitness from a particular job depends on where the other jobs are in the schedule. The only stumbling block there is that you were a bit unlucky in that the somewhat reasonable idea of thinking of this as "dynamic" leads you a bit astray, as "dynamic" actually means something entirely different here.
So to wrap up, there's nothing "dynamic" about your problem, so the things people do with GAs for dynamic problems will be entirely unhelpful. A standard GA will work with no fancy tricks. However, the idea of using information you have about what schedules work better can be introduced into the genetic operators, and will probably result in a significantly better overall algorithm.
You'd use GA to find say the best order to do a number of jobs in, or those jobs which made say best use of a day's resources. So yes they'd be related to each other.
So your fitness measure would be for seq 1,3,4,5,6,2.
Look at say find shortest path algorithm, starts to make sense then

Divide people into teams for most satisfaction

Just a curiosity question. Remember when in class groupwork the professor would divide people up into groups of a certain number (n)?
Some of my professors would take a list of n people one wants to work with and n people one doesn't want to work with from each student, and then magically turn out groups of n where students would be matched up with people they prefer and avoid working with people they don't prefer.
To me this algorithm sounds a lot like a Knapsack problem, but I thought I would ask around about what your approach to this sort of problem would be.
EDIT: Found an ACM article describing something exactly like my question. Read the second paragraph for deja vu.
To me it sounds more like some sort of clique problem.
The way I see the problem, I'd set up the following graph:
Vertices would be the students
Two students would be connected by an edge if both of these following things hold:
At least one of the two students wants to work with the other one.
None of the two students doesn't want to work with the other one.
It is then a matter of partitioning the graph into cliques of size n. (Assuming the number of students is divisible by n)
If this was not possible, I'd probably let the first constraint on the edges slip, and have edges between two people as long as neither of them explicitly says that they don't want to work with the other one.
As for an approach to solving this efficiently, I have no idea, but this should hopefully get you closer to some insight into the problem.
You could model this pretty easily as a clustering problem and you wouldn't even really need to define a space, you could actually just define the distances:
Make two people very close if they both want to work together.
Close if one of them wants to work with the other.
Medium distance if there's just apathy.
Far away if either one doesn't want to work with the other.
Then you could just find clusters, yay. Then split up any clusters of overly large size, with confidence that the people in the clusters would all be fine working together.
This problem can be brute-forced, hence my approach would be first to brute force it, then fix it when I get a better idea.
There are a couple of algorithms you could use. A great example is the so called "stable marriage problem", which has a perfect solution. You can read more about it here:
http://en.wikipedia.org/wiki/Stable_marriage_problem
The stable marriage problem only works with two groups of people (men/women in the marriage case). If you want to form pair you can use a variation, the stable roommate problem. In this case you create pairs but everybody comes from a single pool.
But you asked for a team (which I translate into >2 people per team). In this case you could let everybody fill in their best to worst match and then run the

Resources