Related
An agent is works between n producer and m consumers. ith producer, generates s_i candy and jth consumer consumes b_j candy in this year. For each candy that sales, agent get 1 dollar payoff. For some problems, one strict rule was defined that a producer should be sales candy to any producer that the distance between them is not greater than 100 KM (kilometers). if we have list of all pairs of producer-consumer that the distance between them is lower than 100 KM, which of the following algorithm is goof for finding maximum payoffs? (suppose s_i and b_j may be becomes very large).
1) Maximal Matching
2) Dynamic Programming
3) Maximum Flow
4) 1 , 3
this is a last question on 2013-Final Exam on DS course. Any hint or idea?
To my understanding, the problem can be solved as a Maximum Bipartite Matching, however the modelling requires the graph to be relatively large compared to the original input; for every feasible edge (s_i,b_j) introduce s_i nodes for the producer partition and b_j for the consumer partition and connect each producer node of the to each consumer node of them.
The problem can be modelled as Maximum Flow Problem on a bipartite graph, however; the flow on each feasible edge (s_i,b_j) is constrained by 0 from below and min{s_i,b_j} from above, as the flow must be nonnegative, but may exceed neither the producer's nor the consumer's capacity.
Concerning a solution via Dynamic Prgramming, consider the following sketch of a recurrence relation for Maximum Bipartite Matching. Let n and m denote the number of nodes in the first and second partition, respectively. Fix a node a in the first partition; either a is not matched or matched to one of its neighbors; in either case, each possibility is evaluated, recursively evaluating an instance where the partitions have n-1 and m or n-1 and m-1 nodes, respectively; the best of these choices is taken.
To put it all in a nutshell, apparently all of the proposed approaches yield valid solutions; however, the modelling as network flow seems to be most natural.
I'm working on a problem from "Algorithm Design" by Kleinberg, specifically problem 4.15. I'm not currently enrolled in the class that this relates to -- I'm taking a crack at the problem set before the new quarter starts to see if I'd be able to do it. The question is as follows:
The manager of a large student union on campus comes to you with the
following problem. She’s in charge of a group of n students, each of whom
is scheduled to work one shift during the week. There are different jobs
associated with these shifts (tending the main desk, helping with package
delivery, rebooting cranky information kiosks, etc.), but.we can view each
shift as a single contiguous interval of time. There can be multiple shifts
going on at once.
She’s trying to choose a subset of these n students to form a super-
vising committee that she can meet with once a week. She considers such
a committee to be complete if, for every student not on the committee,
that student’s shift overlaps (at least partially) the shift of some student
who is on the committee. In this way, each student’s performance can be
observed by at least one person who’s serving on the committee.
Give an efficient algorithm that takes the schedule of n shifts and
produces a complete supervising committee containing as few students
as possible.
Example. Suppose n = 3, and the shifts are
Monday 4 p.M.-Monday 8 P.M.,
Monday 6 p.M.-Monday 10 P.M.,
Monday 9 P.M.-Monday 1I P.M..
Then the smallest complete supervising committee would consist of just
the second student, since the second shift overlaps both the first and the
third.
My attempt (I can't find this problem in my solution manual, so I'm asking here):
Construct a graph G with vertices S1, S2, ..., Sn for each student.
Let there be an edge between Si and Sj iff students i and j have an overlapping
shift. Let C represent the set of students in the supervising committee.
[O(n + 2m) to build an adjacency list, where m is the number of shifts?
Since we have to add at least each student to the adjacency list, and add an
additional m entries for each shift, with two entries added per shift since
our graph is undirected.]
Sort the vertices by degree into a list S [O(n log n)].
While S[0] has degree > 0:
(1) Add Si to C. [O(1)]
(2) Delete Si and all of the nodes that it was connected to, update the
adjacency list.
(3) Update S so that it is once again sorted.
Add any remaining vertices of degree 0 to C.
I'm not sure how to quantify the runtime of (2) and (3). Since the degree of any node is bounded by n, it seems that (2) is bounded by O(n). But the degree of the node removed in (1) also affects the number of iterations performed inside of the while loop, so I suspect that it's possible to say something about the upper bound of the whole while loop -- something to the effect of "Any sequence of deletions will involve deleting at most n nodes in linear time and resorting at most n nodes in linear time, resulting in an upper bound of O(n log n) for the while loop, and therefore of the algorithm as a whole."
You don't want to convert this to a general graph problem, as then it's simply the NP-hard vertex cover problem. However, on interval graphs in particular, there is in fact a linear-time greedy algorithm, as described in this paper (which is actually for a more general problem, but works fine here). From a quick read of it, here's how it applies to your problem:
Sort the students by the time at which their shift ends, from earliest to latest. Number them 1 through n.
Initialize a counter k = 1 which represents the earliest student in the ordering not in the committee.
Starting from k, find the first student in the order whose shift does not intersect student k's shift. Suppose this is student i. Add student i-1 to the committee, and update k to be the new earliest student not covered by the committee.
Repeat the previous step until all students are covered.
(This feels correct, but like I said I only had a quick read, so please say if I missed something)
I have a complex problem and I want to know if an existing and well understood solution model exists or applies, like the Traveling Salesman problem.
Input:
A calendar of N time events, defined by starting and finishing time, and place.
The capacity of each meeting place (maximum amount of people it can simultaneously hold)
A set of pairs (Ai,Aj) which indicates that attendant Ai wishes to meet with attendat Aj, and Aj accepted that invitation.
Output:
For each assistant A, a cronogram of all the events he will attend. The main criteria is that each attendants should meet as many of the attendants who accepted his invites as possible, satisfying the space constraints.
So far, we thought of solving with backtracking (trying out all possible solutions), and using linear programming (i.e. defining a model and solving with the simplex algorithm)
Update: If Ai already met Aj in some event, they don't need to meet anymore (they have already met).
Your problem is as hard as minimum maximal matching problem in interval graphs, w.l.o.g Assume capacity of rooms is 2 means they can handle only one meeting in time. You can model your problem with Interval graphs, each interval (for each people) is one node. Also edges are if A_i & A_j has common time and also they want to see each other, set weight of edges to the amount of time they should see each other, . If you find the minimum maximal matching in this graph, you can find the solution for your restricted case. But notice that this graph is n-partite and also each part is interval graph.
P.S: note that if the amount of time that people should be with each other is fixed this will be more easier than weighted one.
If you have access to a good MIP solver (cplex/gurobi via acedamic initiative, but coin OR and LP_solve are open-source, and not bad either), I would definitely give simplex a try. I took a look at formulating your problem as a mixed integer program, and my feeling is that it will have pretty strong relaxations, so branch and cut and price will go a long way for you. These solvers give remarkably scalable solutions nowadays, especially the commercial ones. Advantage is they also provide an upper bound, so you get an idea of solution quality, which is not the case for heuristics.
Formulation:
Define z(i,j) (binary) as a variable indicating that i and j are together in at least one event n in {1,2,...,N}.
Define z(i,j,n) (binary) to indicate they are together in event n.
Define z(i,n) to indicate that i is attending n.
Z(i,j) and z(i,j,m) only exist if i and j are supposed to meet.
For each t, M^t is a subset of time events that are held simulteneously.
So if event 1 is from 9 to 11, event 2 is from 10 to 12 and event 3 is from 11 to 13, then
M^1 = {event 1, event 2) and M^2 = {event 2, event 3}. I.e. no person can attend both 1 and 2, or 2 and 3, but 1 and 3 is fine.
Max sum Z(i,j)
z(i,j)<= sum_m z(i,j,m)
(every i,j)(i and j can meet if they are in the same location m at least once)
z(i,j,m)<= z(i,m) (for every i,j,m)
(if i and j attend m, then i attends m)
z(i,j,m)<= z(j,m) (for every i,j,m)
(if i and j attend m, then j attends m)
sum_i z(i,m) <= C(m) (for every m)
(only C(m) persons can visit event m)
sum_(m in M^t) z(i,m) <= 1 (for every t and i)
(if m and m' are both overlapping time t, then no person can visit them both. )
As pointed out by #SaeedAmiri, this looks like a complex problem.
My guess would be that the backtracking and linear programming options you are considering will explode as soon as the number of assistants grows a bit (maybe in the order of tens of assistants).
Maybe you should consider a (meta)heuristic approach if optimality is not a requirement, or constraint programming to build an initial model and see how it scales.
To give you a more precise answer, why do you need to solve this problem? what would be the typical number of attendees? number of rooms?
(Before anyone asks, this is not homework.)
I have a set of workers with interests, i.e.:
Bob: Java, XML, Ruby
Susan: Java, HTML, Python
Fred: Python, Ruby
Sam: Java, Ruby
etc.
(There are actually somewhere in the range of 10-25 "interests" for each worker, and I have around 40-50 workers)
At the same time, I have a very large set of tasks that need to be distributed among the workers. Each task has to be assigned to at least 3 workers, and the workers must match at least one of the tasks' interests:
Task 1: Ruby, XML
Task 2: XHTML, Python
and so on. So Bob, Fred, or Sam could get Task 1; Susan or Fred could get Task 2.
This is all stored in a database thusly:
Task
id integer primary key
name varchar
TaskInterests
task_id integer
interest_id integer
Workers
id integer primary key
name varchar
max_assignments integer
WorkerInterests
worker_id
interest_id
Assignments
task_id
worker_id
date_assigned
Each worker has a maximum number of assignments they will do, around 10. Some interests are more rare than others (i.e. only 1 or 2 workers have listed them as a interest), some interests are more common (i.e. half of the workers list them).
The algorithm must:
Assign every task to 3 workers (it is
assumed that at least 3 of the
workers are interested in one of the
interests of the task).
Assign every worker 1 or more tasks
Ideally, the algorithm will:
Assign each worker a number of tasks proportional to their maximum assignments and the total number of tasks. For example, if Susan says she will do 20 tasks and most people will only do 10 tasks and there are 50 workers and 300 tasks, she should be assigned 12 tasks (20/10*(300/50)).
Assign a variety of tasks to each worker, so if Susan lists 4 interests she gets tasks that include 4 interests (rather than getting 10 tasks all with the same interest)
The most difficult aspect so far has been dealing with theses issues:
tasks having interests with few corresponding workers
workers who have few interests, especially
workers who have a few interests, for which there are relatively few tasks
This problem can be modeled as a
Maximum Flow Problem.
In a max-flow problem, you have a directed graph with two special nodes, the source and the sink. The edges in the graph have capacities, and your goal is to assign a flow through the graph from the source to the sink without exceeding any of the edge capacities.
With a (very) carefully crafted graph, we can find an assignment meeting your requirements from the maximum flow.
Let me number the requirements.
Required:
1. Workers are assigned no more than their maximum assignments.
2. Tasks can only be assigned to workers that match one of the task's interests.
3. Every task must be assigned to 3 workers.
4. Every worker must be assigned to at least 1 task.
Optional:
5. Each worker should be assigned a number of tasks proportional to that worker's maximum assignments
6. Each worker should be assigned a variety of tasks.
I will assume that the maximum flow is found using the
Edmonds-Karp Algorithm.
Let's first find a graph that meets requirements 1-3.
Picture the graph as 4 columns of nodes, where edges only go from nodes in a column to nodes in the neighboring column to the right.
In the first column we have the source node. In the next column we will have nodes for each of the workers. From the source, there is an edge to each worker with capacity equal to that worker's maximum assignments. This will enforce requirement 1.
In the third column, there is a node for each task. From each worker in the second column there is an edge to each task that that worker is interested in with a capacity of 1 (a worker is interested in a task if the intersection of their interests is non-empty). This will enforce requirement 2. The capacity of 1 will ensure that each worker takes only 1 of the 3 slots for each task.
In the fourth column we have the sink. There is an edge from each task to the sink with capacity 3. This will enforce requirement 3.
Now, we find a maximum flow in this graph using the Edmonds-Karp Algorithm. If this maximum flow is less than 3 * (# of tasks) then there is no assignment meeting requirements 1-3. If not, there is such an assignment and we can find it by examining the final augmented graph. In the augmented graph, if there is an edge from a task to a worker with capacity 1, then that worker is assigned to that task.
Now, we will modify our graph and algorithm to meet the rest of the requirements.
First, let's meet requirement 4. This will require a small change to the algorithm. Initially, set all the capacities from the source to the workers to 1. Find the max-flow in this graph. If the flow is not equal to the number of workers, then there is no assignment meeting requirement 4. Now, in your final residual graph, for each worker the edge from the source to that worker has capacity 0 and the reverse edge has capacity 1. Change these to that worker's maximum assignments - 1 and 0, respectively. Now continue Edmonds-Karp algorithm as before. Basically what we have done is first find an assignment such that each worker is assigned to exactly one task. Then delete the reverse edge from that task so that the worker will always be assigned to at least one task(though it may not be the one assigned to in the first pass).
Now let's meet requirement 5. Strictly speaking, this requirement just means that we divide each worker's maximum assignments by sum of all worker's maximum assignments / number of tasks. This will quite possibly not have a satisfying assignment. But that's ok. Initialize our graph with these new maximum assignments. Run Edmonds-Karp. If it finds a flow that saturates the edges from tasks to sink, we are done. Otherwise we can increment the capacities from sink to workers in the residual graph and continue running Edmonds-Karp. Repeat until we saturate the edges into the sink. Don't increment the capacities so much that a worker is assigned too many tasks. Also, technically, the increment for each worker should be proportional to that worker's maximum assignments. These are both easy to do.
Finally let's meet requirement 6. This one is a bit tricky. First, add a column between workers and tasks and remove all edges from workers to tasks. In this new column, for each worker add a node for each of that workers interests. From each of these new nodes, add an edge to each task with a matching interest with capacity 1. Add an edge from each worker to each of its interest nodes with capacity 1. Now, a flow in this graph would enforce that if a worker is assigned to n tasks, then the intersection of the union of those task's interests with that worker's interests has size at least n. Again, it is possible that there is a satisfying assignment without this assignment, but there is not one with it. We can handle this the same as requirement 5: run Edmonds-Karp to completion, if no satisfying assignment, increment the capacities from workers to their interest nodes and repeat.
Note that in this modified graph we no longer satisfy requirement 3, as a single worker may be assigned to multiple/all slots of a task if the intersection of their interests has size greater than 1. We can fix that. Add a new column of nodes between the interest nodes and the task nodes and delete the edges between those nodes. For each employee, in the new column insert a node for each task (so each employee has its own node for each task). From these new nodes, to their corresponding task to the right, add an edge with capacity 1. From each worker's interests node to that worker's task nodes, add an edge with capacity 1 from each interest to each task that matches.
-
EDIT: Let me try to clarify this a little. Let -(n)-> be an edge with n capacity.
Previously we had worker-(1)->task for each worker-task pair with a matching interest. Now we have worker-(k)->local interest-(1)->local task-(1)->global task. Now, you can think of a task being matched to a worker-interest pair. The first edge says that for a worker, each of its interests can be matched to k tasks. The second edge says that each of a worker's interests can only be matched once to each job. The third edge says that each task can only be assigned once to each worker. Note that you could push multiple flow from the worker to a local task (equal to the size of the intersection of their interests) but only 1 flow from the worker to the global task node due to the third edge.
-
Also note that we can't really mix this incrementing with the one for requirement 5 correctly. However, we can run the whole algorithm once for each capacity {1,2,...,r} for worker->interest edges. We then need a way to rank the assignments. That is, as we relax requirement 5 we can better meet requirement 6 and vice versa. However, there is another approach that I prefer for relaxing these constraints.
A better approach to requirement relaxation (inspired-by/taken-from templatetypedef)
When we want to be able to relax multiple requirements (e.g. 5 and 6), we can model it as a min-cost max-flow problem. This may be simpler than the incremental search that I described above.
For example, for requirement 5, set all the edge costs to 0. We have the initial edge from the source to the worker with the capacity equal to worker's maximum assignments / (sum of all worker's maximum assignments / number of tasks) and with cost 0. Then you can add another edge with the remaining capacity for that worker and cost 1. Another possibility would be to use some sort of progressive cost such that as you add tasks to a worker the cost to add another task to that user goes up. E.g. you could instead split a worker's remaining capacity up into individual edges with costs 1,2,3,4,....
A similar thing could then be done between the worker nodes and the local-interest nodes for requirement 6. The weighting would need to be balanced to reflect the relative importance of the different requirements.
This method is also sufficient to enforce requirement 4. Also, the costs for requirement 5 should probably be made such that they are proportional to a worker's maximum assignments. Then assigning 1 extra task to a worker with max 100 would not cost as much as assigning an extra to a worker with max 2.
Complexity Analysis
Let n = # of employees, m = # of tasks, k = max interests for a single task/worker, l = # of interests, j = maximum of maximum assignments.
Requirement 3 implies that n = O(m). Let's also assume that l = O(m) and j = O(m).
In the smaller graph (before the change for req. 6), the graph has n + m + 2 = O(m) vertices and at most n + m + k*min(n, m) = O(km) edges.
After the change it has 2 + n + n * l + n * m + m = O(nm) vertices and n + k * n + k * m * n + n * m + m = O(kmn) edges (technically we may need j * n + j * l more nodes and edges so that there are not multiple edges from one node to another, but this wouldn't change the asymptotic bound). Also note that no edge need have capacity > j.
Using the min-cost max-flow formulation, we can find a solution in O(VEBlogV) where V = # vertices, E = # edges, and B = max capacity on a single edge. In our case this gives O(kjn^2m^2log(nm)).
For problems where finding a direct solution is difficult it can be a good idea to use an approximation algorithm, an evaulation function and a method to improve the solution. There are a variety of approaches, such as genetic algorithms and simulated annealing.
The basic idea is to use some sort of simple algorithm (such as a greedy algorithm) to get something that is vaguely usable and make random modifications, keeping those modifications that improve the evaluation score and discarding those that make it worse.
With genetic algorithms a group of for example 100 random solutions is generated and scored and the best are kept and "bred" to produce a new generation of solutions with characteristics similar to the previous generations, but with some random mutations.
For simulated annealing the probablility of a slightly worse solutions being accepted is high initially, but decreases over time. This reduces the risk of getting stuck at a local optimium early on.
Try mapping your task to the stable marriage problem. Tasks become prospective wives `, and your staff become suitors.
You might want to add some extra algorithm for assigning preferences of each task to the staff, and vice-versa - you could assign some ideal proficiency neccessary for the components of each task, and then allow your staff to rank each task. You could assign a proficiency for each component that each staff member posses and use that to get each tasks preference in staff members.
Once you have the preferences then run the algorithm, post the results, then allow people to apply in pairs to you to swap assignments - after all this is a people problem and people work better when they have a degree of control.
So I gave this problem some thought and I think that you can get a good solution (for some definition of "good") by reducing it to an instance of min-cost max-flow (see this, for example). The idea is as follows. Suppose you are given as input a set of jobs J, each of which has a set of skills necessary, along with a set of workers W, each of whom has a set of talents. You are also given for each worker a constant k_i saying how many jobs you'd like them to do, as well as a constant m_i saying the maximum number of jobs you can allocate to them. Your goal is to assign the jobs to the workers in such a way that each job is done by a worker who has the skills, no worker does more than m_i jobs, and the number of the "excess" jobs done by the workers is minimized. For example, if the re are five workers who each want to do four tasks and the load is balanced so that two workers do four jobs, one does three, and one does five, the total excess is one, since one worker did one more job than was expected.
The reduction is as follows. For now, we'll ignore the balancing requirement and just see how tom reduce this to max-flow; we'll add load balancing at the end. Construct a graph G with a designated start node s and sink node t. Add to this graph a node for each job j and each worker w. There will be an edge from s to each of these j nodes of cost zero and capacity one. There will also be an edge from each w node to t with cost zero and capacity m_i. Finally, for each job j and worker w, if worker w has the talents necessary to complete job j, there is an edge from j to w with cost zero and capacity one.
The idea is that we want to push flow from s to t through the j and w nodes such that each flow path going through some j node to a w node means that job j should be given to worker w. The capacity restrictions on the edges from s to j nodes ensures that at most one unit of flow enters the j node, so the job is only assigned at most once. The capacity restriction on the edges from the w nodes to the node t prevent each worker from being assigned too many times. Since all capacities are integral, an integral max flow exists from s to t, and so a max-flow in this graph corresponds to an assignments of jobs to workers that is legal and doesn't exceed any worker's maximum load. You can check whether all jobs are assigned by looking at the total flow in the graph; if it's equal to the number of jobs, they've all been assigned.
This above construction, however, does nothing to balance worker loads. To fix this, we'll modify the construction a bit. Rather than having an edge from each w node to t, instead, for each w node, add two nodes to the graph, c and e, and connect them as follows. There is an edge from w_i to c_i with capacity k_i and cost zero, and an identical edge from c_i to t. There is also an edge from w_i to e_i with cost 1 and capacity m_i - k_i. There is also an edge from e_i to t with equal capacity and zero cost.
Intuitively, we haven't changed the amount of flow that leaves any w node, but we have changed how much that flow costs. Flow shunted to t via the c node is free, and so the worker can take on k_i jobs without incurring cost. Any jobs after that have to be routed through e, which costs one for each unit of flow crossing it. Finding a max-flow in this new graph still determines an assignment, but finding the min-cost max-flow in the graph finds the assignment that minimizes the excess jobs divvied up to workers.
Min-cost max flows can be solved in polynomial time with a few somewhat-well-known algorithms, so hopefully this is a useful answer!
I'd like to understand the algorithm that solves Google Code Jam, Tutorial, Problem C. So far I wrote my own basic implementation that solves the small problem. I find that it's unable to deal with the large problem (complexity O(min(n, 2*k)! which is 30! in the larger data set).
I found this solution page, but the solutions are of course not documented (there's a time limit to the context). I saw that at least one of the solutions used the Union Find data structure, but I don't understand how it's applied here.
Does anyone know of a page with the algorithms that solve these problems, not just code?
Not sure if there's a better way to deal with this near duplicate of GCJ - Hamiltonian Cycles, but here's my answer from there:
The O(2k)-based solution uses the inclusion-exclusion principle. Given that there are k forbidden edges, there are 2k subsets of those edges, including the set itself and the empty set. For instance, if there were 3 forbidden edges: {A, B, C}, there would be 23=8 subsets: {}, {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, {A,B,C}.
For each subset, you calculate the number of cycles that include at least all the edges in that subset . If the number of cycles containing edges s is f(s) and S is the set of all forbidden edges, then by the inclusion-exclusion principle, the number of cycles without any forbidden edges is:
sum, for each subset s of S: f(s) * (-1)^|s|
where |s| is the number of elements in s. Put another way, the sum of the number of cycles with any edges minus the number of cycles with at least 1 forbidden edge plus the number with at least 2 forbidden edges, ...
Calculating f(s) is not trivial -- at least I didn't find an easy way to do it. You might stop and ponder it before reading on.
To calculate f(s), start with the number of permutations of the nodes not involved with any s nodes. If there are m such nodes, there are m! permutations, as you know. Call the number of permutations c.
Now examine the edges in s for chains. If there are any impossible combinations, such as a node involved with 3 edges or a subcycle within s, then f(s) is 0.
Otherwise, for each chain increment m by 1 and multiply c by 2m. (There are m places to put the chain in the existing permutations and the factor of 2 is because the chain can be forwards or backwards.) Finally, f(s) is c/(2m). The last division converts permutations to cycles.
The important limit imposed on the input data is that the number of forbidden edges k<=15.
You can proceed by inclusion and exclusion:
calculate the number of all cycles ((n-1)!),
for each forbidden edge e, substract the number of cycles that contains it ((n-2)!/2, unless n is very small),
for each pair of forbidden edges e,f, add the number of cycles that contain both of them (this will depend on whether e and f touch),
for each triple ..., substract ...,
etc.
Since there are only 2^k <= 32768 subsets of F (the set of all forbidden edges), you will get a reasonable bound on the running time.
The analysis of the google code jam problem Endless Knight uses a similar idea.
Hamiltonian cycle problem is a special case of travelling salesman problem (obtained by setting the distance between two cities to a finite constant if they are adjacent and infinity otherwise.)
These are NP Complete problems which in simple words means no fast solution to them is known.
For solving questions on Google Code Jam you should know Algorimths Analysis and Design though they can be solved in exponential time and don't worry Google knows that very well. ;)
The following sources should be enough to get you started:
MIT Lecture Videos: "Introduction to Algorithims"
TopCoder Tutorials
http://dustycodes.wordpress.com
Book: Introduction to Algorithms [Cormen, Leiserson, Rivest & Stein]
The Algorithm Design Manual [Steven S. Skiena]