Let's say I have a group of N people who is going to travel by train. I need to organize them in a line to the ticket office in a way that minimizes the total tickets cost. The cost can be minimized if families buy family tickets and people travelling to the same destination buy group tickets.
I do not know who of these people are families and where are they travelling.
All I can do is to send any M (1 <= M <= N) of them to the ticket office and get the answer, how much it will cost for these M people.
I also have a limited time, as a train is leaving in some minutes, so the near best solution is good enough for me.
The brute force solution is O(N!) and so is obviously unacceptable.
EDIT
The answer from the ticket office is always the total sum for M people, no details.
Group and/or family may start at 2 people and may include all N of them.
The cushier in the tickets office will always know who is a family and who is not.
EDIT
If I am sending to the tickets office a family and some more people, the family will not get a family ticket, they all will get their regular tickets.
Since you mentioned you can settle for 'good enough', and you are time limited - here is a greedy any-time approach:
p <- Create a random permutation
estimate the cost of this permutation. (let that be cost)
Find a new permutation p' such that cost(p') < cost(p) that is achieved from p using a single swap of two people (there are n(n-1)/2 such possible swaps)1
If such p' exist: p <- p' and return to 2.
Else, store p as a local minimum, and return to 1.
When time is up - choose the best local minimum found.
This is basically a variation of steepest ascent hill climbing with random restarts.
Note that if your time->infinity, you will find optimal solution,
because the probability of checking any possible permutation is
getting closer and closer to 1 as time passes.
(1) getting the price can be done by first checking who is a family member/going to the same destination and is adjacent to each other in the permutation at O(n)using the fact that d(passenger1,passenger2) < d(passenger1) + d(passenger2), and then checking each group separately.
Related
I have to design an algorithm to solve a problem:
We have two groups of people (group A and group B, the number of people in group A is always less or equal to the number of people in group B), all standing in a one-dimensional line, each people have a corresponding number indicating its location. When the timer starts, each people in group A must find a partner in group B, but people in group B cannot move at all and each people in group B can only have at most 1 partner.
Suppose that people in group A move 1 unit/sec, how can I find the minimum time for everyone in group A to find a partner?
for example, if there are three people in group A with location {5,7,8}, and four people in group B with location {2,3,4,9}, the optimal solution would be 3 sec because max(5-3,7-4,9-8)=3
I could just use brute-force to solve it, but is there a better way of solving this problem?
This problem is a special case of the edit distance problem, and so a similar Dynamic Programming solution can be used to solve it. It's possible that a faster solution exists for this special case.
Let A = [a_0, a_1...,a_(m-1)] be the (sorted) positions of our m moving people, and B = [b_0, b_1...,b_(n-1)] be the n (sorted) destination spots, with m <= n. For the edit distance analogy, the allowed operations are:
Insert a number into A (free), or
Substitute an element a -> a' in A with cost |a-a'|.
We can solve this in O(n*m) time (plus sorting time of both A and B, if necessary).
We can define the dynamic programming via a cost function C(i, j) which is the minimum cost to move the first i people a_0, ... a_(i-1) using only the first j spots b_0, ... b_(j-1). You want C(m,n). Define C as follows:
Find the most appropriate team compositions for days in which it is possible. A set of n participants, k days, a team has m slots. A participant specifies how many days he wants to be a part of and which days he is available.
Result constraints:
Participants must not be participating in more days than they want
Participants must not be scheduled in days they are not available in.
Algorithm should do its best to include as many unique participants as possible.
A day will not be scheduled if less than m participants are available for that day.
I find myself solving this problem manually every week at work for my football team scheduling and I'm sure there is a smart programmatic approach to solve it. Currently, we consider only 2 days per week and colleagues write down their name for which day they wanna participate, and it ends up having big lists for each day and impossible to please everyone.
I considered a new approach in which each colleague writes down his name, desired times per week to play and which days he is available, an example below:
Kane 3 1 2 3 4 5
The above line means that Kane wants to play 3 times this week and he is available Monday through Friday. First number represents days to play, next numbers represent available days(1 to 7, MOnday to Sunday).
Days with less than m (in my case, m = 12) participants are not gonna be scheduled. What would be the best way to approach this problem in order to find a solution that does its best to include each participant at least once and also considers their desires(when to play, how much to play).
I can do programming, I just need to know what kind of algorithm to implement and maybe have a brief logical explanation for the choice.
Result constraints:
Participants must not play more than they want
Participants must not be scheduled in days they don't want to play
Algorithm should do its best to include as many participants as possible.
A day will not be scheduled if less than m participants are available for that day.
Scheduling problems can get pretty gnarly, but yours isn't too bad actually. (Well, at least until you put out the first automated schedule and people complain about it and you start adding side constraints.)
The fact that a day can have a match or not creates the kind of non-convexity that makes these problems hard, but if k is small (e.g., k = 7), it's easy enough to brute force through all of the 2k possibilities for which days have a match. For the rest of this answer, assume we know.
Figuring out how to assign people to specific matches can be formulated as a min-cost circulation problem. I'm going to write it as an integer program because it's easier to understand in my opinion, and once you add side constraints you'll likely be reaching for an integer program solver anyway.
Let P be the set of people and M be the set of matches. For p in P and m in M let p ~ m if p is willing to play in m. Let U(p) be the upper bound on the number of matches for p. Let D be the number of people demanded by each match.
For each p ~ m, let x(p, m) be a 0-1 variable that is 1 if p plays in m and 0 if p does not play in m. For all p in P, let y(p) be a 0-1 variable (intuitively 1 if p plays in at least one match and 0 if p plays in no matches, but hold on a sec). We have constraints
# player doesn't play in too many matches
for all p in P, sum_{m in M | p ~ m} x(p, m) ≤ U(p)
# match has the right number of players
for all m in M, sum_{p in P | p ~ m} x(p, m) = D
# y(p) = 1 only if p plays in at least one match
for all p in P, y(p) ≤ sum_{m in M | p ~ m} x(p, m)
The objective is to maximize
sum_{p in P} y(p)
Note that we never actually force y(p) to be 1 if player p plays in at least one match. The maximization objective takes care of that for us.
You can write code to programmatically formulate and solve a given instance as a mixed-integer program (MIP) like this. With a MIP formulation, the sky's the limit for side constraints, e.g., avoid playing certain people on consecutive days, biasing the result to award at least two matches to as many people as possible given that as many people as possible got their first, etc., etc.
I have an idea if you need a basic solution that you can optimize and refine by small steps. I am talking about Flow Networks. Most of those that already know what they are are probably turning their nose because flow network are usually used to solve maximization problem, not optimization problem. And they are right in a sense, but I think it can be initially seen as maximizing the amount of player for each day that play. No need to say it is a kind of greedy approach if we stop here.
No more introduction, the purpose is to find the maximum flow inside this graph:
Each player has a number of days in which he wants to play, represented as the capacity of each edge from the Source to node player x. Each player node has as many edges from player x to day_of_week as the capacity previously found. Each of this 2nd level edges has a capacity of 1. The third level is filled by the edges that link day_of_week to the sink node. Quick example: player 2 is available 2 days: monday and tuesday, both have a limit of player, which is 12.
Until now 1st, 2nd and 4th constraints are satisfied (well, it was the easy part too): after you found the maximum flow of the entire graph you only select those path that does not have any residual capacity both on 2nd level (from players to day_of_weeks) and 3rd level (from day_of_weeks to the sink). It is easy to prove that with this level of "optimization" and under certain conditions, it is possible that it will not find any acceptable path even though it would have found one if it had made different choices while visiting the graph.
This part is the optimization problem that i meant before. I came up with at least two heuristic improvements:
While you visit the graph, store day_of_weeks in a priority queue where days with more players assigned have a higher priority too. In this way the amount of residual capacity of the entire graph is certainly less evenly distributed.
randomness is your friend. You are not obliged to run this algorithm only once, and every time you run it you should pick a random edge from a node in the player's level. At the end you average the results and choose the most common outcome. This is an situation where the majority rule perfectly applies.
Better to specify that everything above is just a starting point: the purpose of heuristic is to find the best approximated solution possible. With this type of problem and given your probably small input, this is not the right way but it is the easiest one when you do not know where to start.
I have an engagement between two fleets of n and m ships, each ship in the friendly fleet with its own with salvo damage, and each ship in the enemy fleet with its own hp amount. The goal of this algorithm is to find the optimal solution (if such solution exists) to how to assign targets to your ships, (ex: ship 1 in my fleet targets ship 3 in your fleet) in such a way that the salvo will maximize the amount of damage done to the enemy fleet.
Important. By damage, I mean the amount of damage/hp value of an enemy ship destroyed. If an enemy ship has 100hp and deals 20dmg, its "value" is 100/20 = 5. So destroying that ship incurs a score of 5. And lastly, only the score of destroyed ships is taken into account. If it is impossible to destroy any ships with a single salvo, the score will then include the damaged ships.
I have attempted a greedy method, an iterative improvement method, and a hill ascent method too, but none of them are capable of reaching an optimal solution. I have also tried another method, where a large amount of randomized target choice sets are made, and evaluated, and the best one is picked out of al of them. This is the one that has produced the best results, but it is incredibly innefficient and almost never produces the optimal result.
I believe there has to be a way of calculating an optimal solution that does not require checking every single possible targeting choice, but I cannot find a way of doing so. It also seems like this problem is like a weird form of the multiple knapsack problem. With the knapsacks being the enemy hp pools, and the items the dmg values of the shots. Except this time the last item placed into a knapsack can exceed the size limit of the knapsac but only the fraction of the item's value that fits into the kanpsack is useful.
Even if it is not a solution to the problem, any thoughts or help are very much appreciated!
Linear programming will do the job perfectly here. In this case, the decision variables are integers, so we are dealing with ILP.
Here is a small description on how to model your problem as a linear program.
Data:
F_dmg[n] // an array containing the damage of friendly ships
E_hp[m] // an array containing the hp points of the ennemy ships
M // constant, the highest hp among all ships
V[m] // the 'value' of ennemy ships
Decision variables:
X[n][m] // a matrix of booleans (0 or 1)
// X[i][j] = 1 if the ship i attacks the ship j, 0 otherwise
Dmg[m] // an array of integer, representing the total damage taken by each ennemy ship
IsAlive[m] // an array of booleans, representing the fact that the ship is destroyed or not (0 if dead, 1 if alive)
Constraints:
// a friend ship can attack at most one ennemy ship
for all i in 1..n, sum(j in 1..m) X[i][j] <= 1
// the damage sustained by a ship cannot exceed its hp
for all j in 1..m, sum(i in 1..n) Dmg[m] <= E_hp[j]
// the damage sustained by a ship has to be coherent with the attacks it receives
for all j in 1..m, sum(i in 1..n) Dmg[j] <= X[i][j] * F_dmg[i]
// a ship is destroyed if the damage sustained is equal to its hp
for all j in 1..m, M * IsAlive[j] >= E_hp[j] - Dmg[j]
Objective function
maximize sum(j in 1..m) (1 - IsAlive[j]) * V[j]
Write that in OPL, feed it to an ILP solver and you'll get an optimal answer real fast if your input is not absolutely gigantic.
This either is, or is very similar to, the Weapon Target Assignment Problem.
Unfortunately that problem is NP-hard, and according to the 2003 paper "Exact and Heuristic Algorithms for the Weapon Target Assignment Problem" (Ahuja, Kumar et al.), even instances as small as 20 weapons and 20 targets can't be solved to provable optimality. (I only read the abstract.)
Must me quite similar to a depth first algorithm that will try and find the optimal route it will then return the optimal route for each ship and return. An array of targets that each ship should target
I'm developing a tournament model for a virtual city commerce game (Urbien.com) and would love to get some algorithm suggestions. Here's the scenario and current "basic" implementation:
Scenario
Entries are paired up duel-style, like on the original Facemash or Pixoto.com.
The "player" is a judge, who gets a stream of dueling pairs and must choose a winner for each pair.
Tournaments never end, people can submit new entries at any time and winners of the day/week/month/millenium are chosen based on the data at that date.
Problems to be solved
Rating algorithm - how to rate tournament entries and how to adjust their ratings after each match?
Pairing algorithm - how to choose the next pair to feed the player?
Current solution
Rating algorithm - the Elo rating system currently used in chess and other tournaments.
Pairing algorithm - our current algorithm recognizes two imperatives:
Give more duels to entries that have had less duels so far
Match people with similar ratings with higher probability
Given:
N = total number of entries in the tournament
D = total number of duels played in the tournament so far by all players
Dx = how many duels player x has had so far
To choose players x and y to duel, we first choose player x with probability:
p(x) = (1 - (Dx / D)) / N
Then choose player y the following way:
Sort the players by rating
Let the probability of choosing player j at index jIdx in the sorted list be:
p(j) = ...
0, if (j == x)
n*r^abs(jIdx - xIdx) otherwise
where 0 < r < 1 is a coefficient to be chosen, and n is a normalization factor.
Basically the probabilities in either direction from x form a geometic series, normalized so they sum to 1.
Concerns
Maximize informational value of a duel - pairing the lowest rated entry against the highest rated entry is very unlikely to give you any useful information.
Speed - we don't want to do massive amounts of calculations just to choose one pair. One alternative is to use something like the Swiss pairing system and pair up all entries at once, instead of choosing new duels one at a time. This has the drawback (?) that all entries submitted in a given timeframe will experience roughly the same amount of duels, which may or may not be desirable.
Equilibrium - Pixoto's ImageDuel algorithm detects when entries are unlikely to further improve their rating and gives them less duels from then on. The benefits of such detection are debatable. On the one hand, you can save on computation if you "pause" half the entries. On the other hand, entries with established ratings may be the perfect matches for new entries, to establish the newbies' ratings.
Number of entries - if there are just a few entries, say 10, perhaps a simpler algorithm should be used.
Wins/Losses - how does the player's win/loss ratio affect the next pairing, if at all?
Storage - what to store about each entry and about the tournament itself? Currently stored:
Tournament Entry: # duels so far, # wins, # losses, rating
Tournament: # duels so far, # entries
instead of throwing in ELO and ad-hoc probability formulae, you could use a standard approach based on the maximum likelihood method.
The maximum likelihood method is a method for parameter estimation and it works like this (example). Every contestant (player) is assigned a parameter s[i] (1 <= i <= N where N is total number of contestants) that measures the strength or skill of that player. You pick a formula that maps the strengths of two players into a probability that the first player wins. For example,
P(i, j) = 1/(1 + exp(s[j] - s[i]))
which is the logistic curve (see http://en.wikipedia.org/wiki/Sigmoid_function). When you have then a table that shows the actual results between the users, you use global optimization (e.g. gradient descent) to find those strength parameters s[1] .. s[N] that maximize the probability of the actually observed match result. E.g. if you have three contestants and have observed two results:
Player 1 won over Player 2
Player 2 won over Player 3
then you find parameters s[1], s[2], s[3] that maximize the value of the product
P(1, 2) * P(2, 3)
Incidentally, it can be easier to maximize
log P(1, 2) + log P(2, 3)
Note that if you use something like the logistics curve, it is only the difference of the strength parameters that matters so you need to anchor the values somewhere, e.g. choose arbitrarily
s[1] = 0
In order to have more recent matches "weigh" more, you can adjust the importance of the match results based on their age. If t measures the time since a match took place (in some time units), you can maximize the value of the sum (using the example)
e^-t log P(1, 2) + e^-t' log P(2, 3)
where t and t' are the ages of the matches 1-2 and 2-3, so that those games that occurred more recently weigh more.
The interesting thing in this approach is that when the strength parameters have values, the P(...) formula can be used immediately to calculate the win/lose probability for any future match. To pair contestants, you can pair those where the P(...) value is close to 0.5, and then prefer those contestants whose time-adjusted number of matches (sum of e^-t1 + e^-t2 + ...) for match ages t1, t2, ... is low. The best thing would be to calculate the total impact of a win or loss between two players globally and then prefer those matches that have the largest expected impact on the ratings, but that could require lots of calculations.
You don't need to run the maximum likelihood estimation / global optimization algorithm all the time; you can run it e.g. once a day as a batch run and use the results for the next day for matching people together. The time-adjusted match masses can be updated real time anyway.
On algorithm side, you can sort the players after the maximum likelihood run base on their s parameter, so it's very easy to find equal-strength players quickly.
I have a complex problem and I want to know if an existing and well understood solution model exists or applies, like the Traveling Salesman problem.
Input:
A calendar of N time events, defined by starting and finishing time, and place.
The capacity of each meeting place (maximum amount of people it can simultaneously hold)
A set of pairs (Ai,Aj) which indicates that attendant Ai wishes to meet with attendat Aj, and Aj accepted that invitation.
Output:
For each assistant A, a cronogram of all the events he will attend. The main criteria is that each attendants should meet as many of the attendants who accepted his invites as possible, satisfying the space constraints.
So far, we thought of solving with backtracking (trying out all possible solutions), and using linear programming (i.e. defining a model and solving with the simplex algorithm)
Update: If Ai already met Aj in some event, they don't need to meet anymore (they have already met).
Your problem is as hard as minimum maximal matching problem in interval graphs, w.l.o.g Assume capacity of rooms is 2 means they can handle only one meeting in time. You can model your problem with Interval graphs, each interval (for each people) is one node. Also edges are if A_i & A_j has common time and also they want to see each other, set weight of edges to the amount of time they should see each other, . If you find the minimum maximal matching in this graph, you can find the solution for your restricted case. But notice that this graph is n-partite and also each part is interval graph.
P.S: note that if the amount of time that people should be with each other is fixed this will be more easier than weighted one.
If you have access to a good MIP solver (cplex/gurobi via acedamic initiative, but coin OR and LP_solve are open-source, and not bad either), I would definitely give simplex a try. I took a look at formulating your problem as a mixed integer program, and my feeling is that it will have pretty strong relaxations, so branch and cut and price will go a long way for you. These solvers give remarkably scalable solutions nowadays, especially the commercial ones. Advantage is they also provide an upper bound, so you get an idea of solution quality, which is not the case for heuristics.
Formulation:
Define z(i,j) (binary) as a variable indicating that i and j are together in at least one event n in {1,2,...,N}.
Define z(i,j,n) (binary) to indicate they are together in event n.
Define z(i,n) to indicate that i is attending n.
Z(i,j) and z(i,j,m) only exist if i and j are supposed to meet.
For each t, M^t is a subset of time events that are held simulteneously.
So if event 1 is from 9 to 11, event 2 is from 10 to 12 and event 3 is from 11 to 13, then
M^1 = {event 1, event 2) and M^2 = {event 2, event 3}. I.e. no person can attend both 1 and 2, or 2 and 3, but 1 and 3 is fine.
Max sum Z(i,j)
z(i,j)<= sum_m z(i,j,m)
(every i,j)(i and j can meet if they are in the same location m at least once)
z(i,j,m)<= z(i,m) (for every i,j,m)
(if i and j attend m, then i attends m)
z(i,j,m)<= z(j,m) (for every i,j,m)
(if i and j attend m, then j attends m)
sum_i z(i,m) <= C(m) (for every m)
(only C(m) persons can visit event m)
sum_(m in M^t) z(i,m) <= 1 (for every t and i)
(if m and m' are both overlapping time t, then no person can visit them both. )
As pointed out by #SaeedAmiri, this looks like a complex problem.
My guess would be that the backtracking and linear programming options you are considering will explode as soon as the number of assistants grows a bit (maybe in the order of tens of assistants).
Maybe you should consider a (meta)heuristic approach if optimality is not a requirement, or constraint programming to build an initial model and see how it scales.
To give you a more precise answer, why do you need to solve this problem? what would be the typical number of attendees? number of rooms?