Algorithm for Trading Resources

I'm trying to find the best algorithmic solution to the following problem. It's a real-world problem, but I'm going to present it in an abstract manner.
There is a community of 1000 people. Each user is provided with a set number of tickets. There are four types of tickets (each one corresponds to a different event). However, some people are willing to make trades (for example, I want one A-ticket and am willing to give up two B-tickets). Moreover, some people have extra tickets that they are willing to give away for nothing (for example, I'll give away two C-tickets to whoever wants them). Assuming that I know what each person is willing to give away / trade, how do I satisfy the most number of people?
I tried Googling, but I don't know how to word this problem to avoid getting results related to algorithmic trading of financial instruments.
Thanks.

Given that it has multiple dimensions, it is likely NP-hard. It has parallels to a multi-dimensional knapsack problem.
Therefore, I recommend trying a backtracking approach.
Start with everybody involved in the trade.
Sort the people by the deficit they cause, in descending order (here you can weight the deficit each person causes in a ticket type by the overall shortfall in that type).
Then, in a backtracking manner, kick the person causing the highest deficit out of the trade.
Repeat until you either have no deficit in any ticket (record this as a possible answer), or you have kicked everybody out.
When that happens, backtrack one step and continue (if you have already tried kicking the highest-deficit person, kick the next-highest).
Repeat until end or you run out of time. Get the optimal answer out of the possible answers you found.
If the problem is too hard, it will probably run out of time. Otherwise, this algorithm should give you a reasonable answer (perhaps near to optimal).
How well this method works depends on how generous/greedy the people are, how many people there are, and how fast your computer is.
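The kick-out loop above can be sketched in Python. This is a minimal illustration with a hypothetical representation: each person is a dict of `wants` and `gives` ticket counts, and the objective is simply to keep as many people in the trade as possible.

```python
def net_deficit(people):
    """Per-ticket shortfall: total wanted minus total offered, keeping
    only the ticket types that are actually short."""
    deficit = {}
    for p in people:
        for t, n in p["wants"].items():
            deficit[t] = deficit.get(t, 0) + n
        for t, n in p["gives"].items():
            deficit[t] = deficit.get(t, 0) - n
    return {t: d for t, d in deficit.items() if d > 0}

def best_trade(people, best=None):
    """Backtracking: while some ticket is in deficit, try kicking out the
    highest-deficit-causing people in turn and recurse on the remainder."""
    if not net_deficit(people):
        if best is None or len(people) > len(best):
            best = list(people)          # feasible: record as possible answer
        return best
    if best is not None and len(people) - 1 <= len(best):
        return best                      # pruning: cannot beat best found so far
    deficit = net_deficit(people)
    def contribution(p):
        # deficit-weighted demand minus deficit-weighted supply of this person
        return (sum(p["wants"].get(t, 0) * deficit[t] for t in deficit)
                - sum(p["gives"].get(t, 0) * deficit[t] for t in deficit))
    for p in sorted(people, key=contribution, reverse=True):
        best = best_trade([q for q in people if q is not p], best)
    return best
```

In the worst case this still explores exponentially many subsets, which is why the answer suggests cutting it off after a time limit and keeping the best feasible answer found so far.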

Look into the bipartite minimum weight matching problem. The idea is to find the shortest distance from i to j using vertices 1 .. k only.

Related

Grouping Algorithm with Requirements

So I am working on writing an algorithm that, given a group of people from different places, organizes them into groups of three based on a few parameters:
No two people in a group are from the same place
No two people in a group have met before
Everyone in the group is available to meet on the same day
Not more than one person is below the age of 18
I have in my data structures a variable for all the required prereqs above. I was wondering if there was a good way to go about solving this problem? Currently I am using a variation of the Gale-Shapley algorithm (the solution to the Stable Marriage Problem). This solution works relatively well, but more often than not, it requires me to go in and make some minor tweaks to the final groups.
Any one have any ideas/suggestions?
I appreciate the help in advance.
This is a graph partitioning problem, and as such it's almost certainly NP-hard (when you've only got one constraint the problem can be solved in polynomial time - this is similar to the stable marriage problem - but when you've got multiple constraints the complexity shoots way up).

A good way to solve these problems is to apply a heuristic (which you're doing with the Gale-Shapley algorithm), and then to resolve any conflicts with a local backtracking search (which it sounds like you're currently doing by hand). My suggestion is to keep your current heuristic if it seems to be working well for you, and to add an automated local backtracking algorithm to resolve any conflicts that arise from the heuristic. For example, if you've got a single group that has two people who are under 18, then swap one of them with somebody who is over 18 and who doesn't violate the other three constraints; if this isn't possible, then choose somebody who is over 18 who violates the fewest constraints, and then swap somebody else out of the group in order to satisfy the newly violated constraint. After N failed iterations the algorithm throws up its hands and asks for human intervention.
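The automated conflict-resolution step could look something like this sketch. The person representation and the concrete constraint checks are hypothetical, and the acceptance rule is a simple "keep the swap if it reduces violations" hill-climb rather than full backtracking:

```python
import random

def violations(group):
    """Count constraint violations in one group (hypothetical fields:
    place, a set of names already met, a set of available days, age)."""
    v = 0
    places = [p["place"] for p in group]
    v += len(places) - len(set(places))                 # duplicate places
    v += sum(1 for a in group for b in group
             if a is not b and b["name"] in a["met"]) // 2  # pairs who met
    if not set.intersection(*(p["days"] for p in group)):
        v += 1                                          # no common meeting day
    if sum(p["age"] < 18 for p in group) > 1:
        v += 1                                          # more than one minor
    return v

def repair(groups, max_iters=1000, seed=0):
    """Local search over groups of three: repeatedly try swapping two people
    between two random groups, keeping the swap only if it lowers the
    total violation count; give up after max_iters failed attempts."""
    rng = random.Random(seed)
    total = sum(violations(g) for g in groups)
    for _ in range(max_iters):
        if total == 0:
            break
        g1, g2 = rng.sample(range(len(groups)), 2)
        i, j = rng.randrange(3), rng.randrange(3)
        before = violations(groups[g1]) + violations(groups[g2])
        groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
        after = violations(groups[g1]) + violations(groups[g2])
        if after < before:
            total += after - before
        else:                                           # undo the swap
            groups[g1][i], groups[g2][j] = groups[g2][j], groups[g1][i]
    return groups, total
```

If `total` is still nonzero when the loop ends, that is the point where the algorithm would ask for human intervention.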

How genetic algorithm is different from random selection and evaluation for fittest?

I have been learning about genetic algorithms for two months. I know about the process of initial population creation, selection, crossover, mutation, etc. But I could not understand how we are able to get better results in each generation, and how it is different from a random search for the best solution. Below I use an example to explain my problem.
Let's take the example of the travelling salesman problem. Say we have several cities X1, X2, ..., X18 and we have to find the shortest path to travel them. When we do the crossover after selecting the fittest individuals, how do we know that after crossover we will get a better chromosome? The same applies to mutation.
I feel like it's just: take one arrangement of cities, calculate the total distance to travel them, then store the distance and arrangement. Then choose another arrangement/combination. If it is better than the previous arrangement, save the current arrangement and distance; otherwise discard it. By doing this too, we will get some solution.
I just want to know where the point is that makes the difference between random selection and a genetic algorithm. In a genetic algorithm, is there any criterion that prevents us from selecting an arrangement/combination of cities which we have already evaluated?
I am not sure if my question is clear, but I am open to explaining more. Please let me know if it is not.
A random algorithm starts with a completely blank sheet every time. A new random solution is generated each iteration, with no memory of what happened before during the previous iterations.
A genetic algorithm has a history, so it does not start with a blank sheet, except at the very beginning. Each generation the best of the solution population are selected, mutated in some way, and advanced to the next generation. The least good members of the population are dropped.
Genetic algorithms build on previous success, so they are able to advance faster than random algorithms. A classic example of a very simple genetic algorithm is the Weasel program. It finds its target far more quickly than random chance because each generation it starts with a partial solution, and over time those partial solutions get closer to the required solution.
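A minimal Python version of the Weasel program illustrates this. The target string and alphabet are the classic ones from Dawkins; keeping the parent in the selection (elitism) is one common variant and guarantees fitness never decreases:

```python
import random

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def fitness(s):
    """Number of characters matching the target in the same position."""
    return sum(a == b for a, b in zip(s, TARGET))

def mutate(s, rate=0.05, rng=random):
    """Copy the string, replacing each character with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else c
                   for c in s)

def weasel(pop_size=100, seed=42):
    rng = random.Random(seed)
    parent = "".join(rng.choice(ALPHABET) for _ in TARGET)
    generations = 0
    while parent != TARGET:
        # each generation starts from the best string so far -- that
        # accumulated partial solution is the "memory" a pure random
        # search lacks
        children = [mutate(parent, rng=rng) for _ in range(pop_size)]
        parent = max(children + [parent], key=fitness)
        generations += 1
    return generations
```

A pure random search over 27^28 strings would essentially never find the target; the weasel loop typically finds it in a few hundred generations because each generation keeps the matched characters it already has.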
I think there are two things you are asking about: a mathematical proof that GAs work, and an empirical one that would allay your concerns.
Although I am not aware of a general proof, I am quite sure at least a good sketch of one was given by John Holland in his book Adaptation in Natural and Artificial Systems for optimization problems using binary coding. There is something called Holland's schema theorem. But, you know, it's a heuristic, so technically it does not have to hold. It basically says that short schemata in the genotype that raise the average fitness spread exponentially through successive generations; cross-over then combines them together. I think the proof was given only for binary coding and has drawn some criticism as well.
Regarding your concerns: of course you have no guarantee that a cross-over will produce a better result, just as two intelligent or beautiful parents might have ugly, stupid children. The premise of a GA is that this is less likely to happen. (As I understand it) the proof for binary coding hinges on the theorem that says good partial patterns will start emerging, and, given that the genotype is long enough, such patterns residing in different specimens have a chance to be combined into one specimen, improving its fitness in general.
I think this is fairly easy to understand in terms of the TSP: crossing over helps accumulate good sub-paths in one specimen. Of course it all depends on the choice of crossover method.
Also, a GA's path towards the solution is not purely random. It moves in a certain direction, with stochastic mechanisms to escape traps. You can lose the best solutions if you allow it. It works because it wants to move towards the current best solutions, but you have a population of specimens that, in a sense, share knowledge. They are all similar, but provided you preserve diversity, new and better partial patterns can be introduced to the whole population and get incorporated into the best solutions. This is why diversity in the population is regarded as very important.
As a final note, please remember that GAs are a very broad topic and you can modify the basics in nearly every way you want. You can introduce elitism, taboos, niches, etc. There is no one-and-only approach/implementation.

Is there a name for the algorithm solutions for "booking rooms with the least room switches"?

I was discussing with a co-worker a problem we were having with a piece of software we deploy, and he mentioned how it was similar to the conceptual problem of booking rooms over a course of time and the algorithm should output the room bookings that requires the least switches (so for example, an optimal solution may be staying in one room for 3 days, and the rest in another room, only requiring two switches).
Is there a name for such a problem in algorithms?
Originally I posted something regarding the minimum set cover problem. Although you can describe your problem as a minimum set cover problem, if we assume "room bookings" are over consecutive days, your problem can be more succinctly described with a different problem.
The interval cover problem[1] consists of one big interval (call it (a, b)) and a bunch of subintervals (call them (a_i, b_i)). Our goal is to cover the one big interval with as few subintervals as possible.
Finding the minimal coverage of an interval with subintervals is a question posted about 5 years ago which asks for an efficient solution, and the accepted answer shows that the greedy solution is optimal. Within the context of room bookings, the "greedy solution" is basically to start from the beginning of the period and, among the bookings that cover the current day, always pick the one with the latest end date.
The idea of course with this problem is that the each "subinterval" is a booking, so the fewer subintervals we need, the fewer bookings, and hence the fewer "switches" we need.
[1] I'm not actually 100% sure that this is the correct name, but if you were to say "interval cover problem", the listener would probably think of the same thing.
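The greedy solution is short enough to sketch in Python. This assumes each booking is a half-open day range `(start, end)`; the loop picks, from the bookings starting at or before the first uncovered day, the one that reaches furthest:

```python
def min_interval_cover(start, end, intervals):
    """Cover [start, end) with the fewest intervals, greedily extending
    from the current uncovered point; returns None if there is a gap."""
    chosen = []
    point = start
    remaining = sorted(intervals)            # sorted by start day
    while point < end:
        best = None
        # consider every booking that begins at or before the uncovered point
        while remaining and remaining[0][0] <= point:
            iv = remaining.pop(0)
            if best is None or iv[1] > best[1]:
                best = iv                    # keep the latest end date
        if best is None or best[1] <= point:
            return None                      # no booking covers this day
        chosen.append(best)
        point = best[1]
    return chosen
```

Each chosen interval is a booking, so `len(chosen)` bookings means `len(chosen) - 1` room switches.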

dynamic fitness function for genetic algorithm

I'm not sure if I'm completely understanding genetic algorithms and how they work, I'm trying to learn via ai4r http://ai4r.rubyforge.org/geneticAlgorithms.html
In Job Shop Scheduling, which I believe can be solved by a GA(?), isn't the cost of any single job based on how it relates to its predecessors? I was thinking I would calculate a cost based on the placement of the chromosome, with a dynamic score for how well it is placed rather than a binary value, but I'm not sure this works.
Does anybody have any experience with this? Or does a GA only work when the difference between any two genomes is static?
I hope I have the right terminology here, as I mentioned, I'm just learning.
-----------------------update-----------------------------------
I think I'm using a bit of the wrong terminology here. I referred to 'fitness' when I think what I actually wanted was a cost matrix.
The example I'm going from describes this
Each chromosome must represent a possible solution for the problem. This class contains an array with the list of visited nodes (cities of the tour). The size of the tour is obtained automatically from the traveling costs matrix. You have to assign the costs matrix BEFORE you run the genetic search. The following costs matrix could be used to solve the problem with only 3 cities:
data_set = [ [ 0, 10, 5],
[ 6, 0, 4],
[25, 4, 0]
]
Ai4r::GeneticAlgorithm::Chromosome.set_cost_matrix(data_set)
so in my instance, I'm thinking the 'cost' of each chromosome is dynamic based on its neighbours.
Since you asked in a comment to make this an answer, I took the liberty of summarizing my earlier responses as well so it's all in one place. The answer to the specific question of "what is a penalty term" is in item #3 below.
The way a standard genetic algorithm works is that each "chromosome" is a complete solution to the problem. In your case, an ordering for the jobs to be submitted. The confusion, I think, centers around the notion that because the individual contributions to fitness made by a particular job in that schedule varies according to the rest of the schedule, you must need something "dynamic". That's not really true. From the point of view of the GA, the only thing that has a fitness is the entire solution. So a dynamic problem is one in which the fitness of a whole schedule can change over time. Going back to the TSP, a dynamic problem would be one in which touring cities in order of A, B, C, D, and then E actually had a different distance each time you tried it. Even though the cost of a tour through B depends on which cities come before and after B in the tour, once you decide that, the costs are static, and because the GA only ever receives costs for entire tours, all it knows is that [A,B,C,D,E] has a constant fitness. No dynamic trickery needed.
Now, your second question was how to handle constraints like, for the TSP example, what if you need to ensure that the salesman gets to Paris by a certain time? Typically, there are three ways to try to handle this.
Never allow a solution to be generated in which he doesn't get there before 2:00. Sometimes this is easy, other times it's very hard. For instance, if the constraint was "he cannot start at city X", it's fairly easy to just not generate solutions that don't start with X. Often though, simply finding valid solutions can be hard, and so this approach doesn't really work.
Allow constraints to be violated, but fix them afterward. In the TSP example, you let crossover and mutation produce any possible tour, but then scan through it to see if he gets to Paris too late. If so, swap the position of Paris with some earlier city in the tour. Again though, sometimes it can be hard to figure out a good way to repair violations.
Penalize the fitness of an infeasible solution. Here, the idea is that even if I can't prevent him from getting to Paris too late and I can't fix it if he does, I can at least make the fitness arbitrarily worse. For TSP, the fitness is the length of the tour. So you might say that if a tour gets him to Paris too late, the fitness is the length of the tour + 100. That lets the solution stay in the population (it might be very good otherwise, so you want it to have a chance to pass on some of its genes), but you make it less likely to be selected, because your selection and replacement methods pick individuals with better fitness values.
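A tiny sketch of option 3 for the TSP case. Purely for illustration, the "gets to Paris too late" constraint is modeled as "the constrained city must appear by position `deadline_pos` in the tour"; the names and the flat penalty of 100 are assumptions, not a standard API:

```python
def tour_length(tour, dist):
    """Total length of a closed tour under a distance matrix."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def fitness(tour, dist, deadline_city, deadline_pos, penalty=100):
    """Tour length, plus a flat penalty if the constrained city appears
    too late. The infeasible tour stays in the population; it is just
    less likely to be selected."""
    cost = tour_length(tour, dist)
    if tour.index(deadline_city) > deadline_pos:
        cost += penalty
    return cost
```

With a minimization GA, the penalized tour can still pass on its genes, but loses most selection contests to feasible tours of similar length.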
For your JSP problem, typically you're looking to minimize the makespan. The same three options are available to you if you do have some constraints. But from what I can tell, you don't really have such constraints. I think you're trying to inject too much knowledge into the process rather than letting the evolutionary algorithm come up with it on its own. That is, you don't necessarily worry about telling the GA that some arrangements of jobs are better than others. You just assign higher fitness to the better ones and let the process converge.
That said, injecting information like this is often a really good thing to do, but you want to have a good understanding of the basic algorithm first. Let's say that we know that for TSP, it's more likely that a good solution will connect cities that are close to one another. The way I would use that information inside a GA would be to generate random solutions non-uniformly (perhaps with a greedy heuristic). I might also replace the standard crossover and mutation algorithms with something customized. Mutation is typically easier to do this with than crossover. To mutate a TSP solution, I might pick two connected cities, break the connection, and then look for a way to reconnect them that was "closer". That is, if a tour is [A,B,C,D,E,F,G,H], I might pick the edge [B,C] at random, and then look for another edge, maybe [F,G], such that when I connected them crossways to get [A,B,G,D,E,F,C,H], the total tour length was lower. I could even extend that mutation beyond one step -- create a loop that keeps trying to break and reconnect edges until it can't find a shorter tour. This leads to what is usually called a hybrid GA because it's a GA hybridized with a local search; sometimes also called a Memetic Algorithm. These sorts of algorithms usually outperform a black-box GA because you're giving the algorithm "hints" to bias it towards trying things you expect to be good.
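The break-and-reconnect mutation described above is essentially one step of 2-opt. A sketch, assuming `dist` is a distance matrix and `tour` a list of city indices:

```python
import random

def tour_length(tour, dist):
    """Total length of a closed tour under a distance matrix."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def two_opt_mutate(tour, dist, rng=random):
    """Pick one edge at random, then scan for a second edge such that
    reversing the segment between them yields a shorter tour; return the
    first improvement found, or the unchanged tour if there is none."""
    n = len(tour)
    i = rng.randrange(n - 1)                 # break the edge after position i
    for j in range(i + 2, n):
        candidate = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
        if tour_length(candidate, dist) < tour_length(tour, dist):
            return candidate
    return tour
```

Looping this until no improving reconnection exists is exactly the local search that turns the GA into the hybrid/memetic algorithm mentioned above.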
I think this idea of a memetic algorithm is pretty close to what you were hitting on in your original question of wondering how to deal with the fact that the contribution to fitness from a particular job depends on where the other jobs are in the schedule. The only stumbling block there is that you were a bit unlucky in that the somewhat reasonable idea of thinking of this as "dynamic" leads you a bit astray, as "dynamic" actually means something entirely different here.
So to wrap up, there's nothing "dynamic" about your problem, so the things people do with GAs for dynamic problems will be entirely unhelpful. A standard GA will work with no fancy tricks. However, the idea of using information you have about what schedules work better can be introduced into the genetic operators, and will probably result in a significantly better overall algorithm.
You'd use a GA to find, say, the best order to do a number of jobs in, or the set of jobs that makes the best use of a day's resources. So yes, they'd be related to each other.
So your fitness measure would be over a whole sequence, say 1, 3, 4, 5, 6, 2.
Look at, say, a shortest-path algorithm; it starts to make sense then.

Divide people into teams for most satisfaction

Just a curiosity question. Remember how, for in-class group work, the professor would divide people up into groups of a certain size (n)?
Some of my professors would take a list of n people one wants to work with and n people one doesn't want to work with from each student, and then magically turn out groups of n where students would be matched up with people they prefer and avoid working with people they don't prefer.
To me this algorithm sounds a lot like a Knapsack problem, but I thought I would ask around about what your approach to this sort of problem would be.
EDIT: Found an ACM article describing something exactly like my question. Read the second paragraph for déjà vu.
To me it sounds more like some sort of clique problem.
The way I see the problem, I'd set up the following graph:
Vertices would be the students
Two students would be connected by an edge if both of the following hold:
At least one of the two students wants to work with the other.
Neither of the two students has said they don't want to work with the other.
It is then a matter of partitioning the graph into cliques of size n. (Assuming the number of students is divisible by n)
If this was not possible, I'd probably let the first constraint on the edges slip, and have edges between two people as long as neither of them explicitly says that they don't want to work with the other one.
As for an approach to solving this efficiently, I have no idea, but this should hopefully get you closer to some insight into the problem.
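The edge rule above can be written down directly. A sketch, where `wants` and `rejects` are hypothetical adjacency maps from each student to the sets of students they named:

```python
from itertools import combinations

def build_graph(students, wants, rejects):
    """Edge between two students iff at least one wants the other and
    neither has rejected the other."""
    edges = set()
    for a, b in combinations(students, 2):
        no_rejection = (b not in rejects.get(a, set())
                        and a not in rejects.get(b, set()))
        some_want = (b in wants.get(a, set())
                     or a in wants.get(b, set()))
        if no_rejection and some_want:
            edges.add(frozenset((a, b)))
    return edges

def is_valid_group(group, edges):
    """A valid group of size n is a clique: every pair is connected."""
    return all(frozenset(pair) in edges for pair in combinations(group, 2))
```

The relaxed variant (dropping the first constraint) is just `build_graph` with the `some_want` test removed.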
You could model this pretty easily as a clustering problem and you wouldn't even really need to define a space, you could actually just define the distances:
Make two people very close if they both want to work together.
Close if one of them wants to work with the other.
Medium distance if there's just apathy.
Far away if either one doesn't want to work with the other.
Then you could just find clusters, yay. Then split up any clusters of overly large size, with confidence that the people in the clusters would all be fine working together.
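A sketch of that distance function plus a toy agglomerative grouping. The `pref` encoding (+1 wants to work with, -1 doesn't, absent = apathy) and the specific distance values are assumptions for illustration:

```python
def distance(a, b, pref):
    """Pairwise distance per the scheme above."""
    pa = pref.get(a, {}).get(b, 0)
    pb = pref.get(b, {}).get(a, 0)
    if pa < 0 or pb < 0:
        return 4.0   # far: at least one of them objects
    if pa > 0 and pb > 0:
        return 1.0   # very close: mutual want
    if pa > 0 or pb > 0:
        return 2.0   # close: one-sided want
    return 3.0       # medium: apathy

def greedy_clusters(people, pref, size):
    """Toy agglomeration: seed a group with an unassigned person, then
    repeatedly add the unassigned person closest (on average) to it."""
    unassigned, groups = list(people), []
    while unassigned:
        group = [unassigned.pop(0)]
        while len(group) < size and unassigned:
            best = min(unassigned,
                       key=lambda p: sum(distance(p, q, pref) for q in group))
            unassigned.remove(best)
            group.append(best)
        groups.append(group)
    return groups
```

A real implementation would use a proper clustering algorithm over this distance matrix, but the precomputed distances are the part that encodes the preferences.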
This problem can be brute-forced, hence my approach would be to brute-force it first, then refine it when I get a better idea.
There are a couple of algorithms you could use. A great example is the so called "stable marriage problem", which has a perfect solution. You can read more about it here:
http://en.wikipedia.org/wiki/Stable_marriage_problem
The stable marriage problem only works with two groups of people (men/women in the marriage case). If you want to form pairs you can use a variation, the stable roommates problem. In this case you create pairs, but everybody comes from a single pool.
But you asked for a team (which I translate into >2 people per team). In this case you could let everybody fill in their best to worst match and then run the