How to solve this task using Topological sort?

There are N modules in the project. Each module has a completion time, denoted in hours (Hi), and may depend on other modules. If module x depends on module y, then y must be completed before x. As project manager, you are asked to deliver the project as early as possible. Provide an estimate of the amount of time required to complete the project.
Input Format:
First line contains T, number of test cases.
For each test case: the first line contains N, the number of modules. Each of the next N lines contains: the module ID (i), the number of hours Hi it takes to complete the module, and the set of module IDs that i depends on, as space-delimited integers.
Output Format:
Output the minimum number of hours required to deliver the project.
Input: 1
5
1 5
2 6 1
3 3 2
4 2 3
5 1 3
Output: 16
I know the problem is related to topological sorting, but I can't figure out how to find the total hours.

You are looking for the length of the critical path. This is the longest path through the network from start to finish in the digraph where the nodes are the tasks, an arrow from node A to node B represents a prerequisite relationship (A must be done before B begins), and the weight of an arrow is the time it takes to complete the source node's task. If there isn't a well-defined start and end node, it is common to create dummy nodes for that purpose: create a 0-cost arrow from the start node to every task with no prerequisites, and a 0-cost arrow to the end node from every node that isn't a prerequisite of anything else. The start and end nodes themselves are just bookkeeping devices; they shouldn't correspond to tasks that take any time to complete.
Topological sorting doesn't find the critical path for you; rather, it is a form of pre-processing that allows you to find it in a single pass. You use it to sort the nodes in such a way that the first node listed has no prerequisites and, when you come to a node in the sorted list, you are guaranteed that all of its prerequisite nodes have already been processed. You process the nodes by assigning a minimum start time to each task. The first node (the start node) in the sorted list has start time 0. When you get to a node whose prerequisite nodes have all been processed, the minimum start time of that node is
max({m_i + t_i})
where i ranges over all prerequisite nodes, m_i is the minimum start time for node i, and t_i is the time it takes to do the task at node i. The point is that m_i + t_i is the minimum finish time for node i, and you take the max of such quantities because all prerequisite tasks must be finished before a given task can begin. The minimum start time of the end node is the length of the critical path.
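To make this concrete, here is a minimal sketch of the whole procedure in Python (Kahn's algorithm supplies the topological order; the input encoding is my own):

from collections import deque

def project_hours(modules):
    # modules: dict mapping module id -> (hours, list of prerequisite ids).
    # Returns the minimum number of hours to finish the project (critical path).
    children = {m: [] for m in modules}
    indegree = {m: 0 for m in modules}
    for m, (_, deps) in modules.items():
        for d in deps:
            children[d].append(m)
            indegree[m] += 1
    # Kahn's algorithm: a module is processed only after all its prerequisites.
    queue = deque(m for m in modules if indegree[m] == 0)
    finish = {}   # earliest finish time of each module
    while queue:
        m = queue.popleft()
        hours, deps = modules[m]
        start = max((finish[d] for d in deps), default=0)  # max over prerequisites
        finish[m] = start + hours
        for c in children[m]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return max(finish.values())

# The sample from the question: module 2 depends on 1, modules 4 and 5 on 3.
sample = {1: (5, []), 2: (6, [1]), 3: (3, [2]), 4: (2, [3]), 5: (1, [3])}
print(project_hours(sample))   # 16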

Create a directed graph G: if a depends on b, add a directed edge in G from b to a. Apply topological sort on G and store the result in an array called TOPO[]. Initialize time = H(TOPO[0]).
Now run a loop over the TOPO array starting from the second element.
Check whether TOPO[i] depends on TOPO[i-1]. If it does, we have to perform them one after the other, so add their task times:
time = time + H(TOPO[i])
If TOPO[i] does not depend on TOPO[i-1], then we can perform them together, so take the maximum of their task times:
time = max(time, H(TOPO[i]))
After the end of the loop, the variable time will hold your answer.
Do this for every component separately and take the maximum over all of them.


Problems that involve time intervals and their overlaps

I have recently come across a lot of questions that involve time intervals as input. Some of the intervals overlap, and depending on that you have to perform an optimization, maximization, or minimization operation on the input. I am not able to solve such problems; in fact, I am not able to even start thinking about them.
Here is an example:
Let us say you are a resource holder, and there can be an infinite supply of such a resource.
There are people who want that resource for a particular time interval, for example 4 pm to 8 pm.
There can be overlapping intervals, e.g. 5 pm to 7 pm, 3 pm to 6 pm, etc.
Depending upon these intervals and their overlapping nature, you have to figure out how many distinct instances of the resource are required.
Ex. Input:
8:00 am to 9:00 am
8:30 am to 9:15 am
9:30 am to 10:40 am
In this case, the first two intervals overlap. So two instances of resources will be required. The third interval is not overlapping, so the person with that interval can reuse the resource returned by any of the earlier ones.
Hence, in this case, the minimum number of resources required is 2.
I don't need a solution; I need some pointers on how to approach such problems. Are there any algorithms that address these questions? What should I read/study? Are there any data structures that might help?
The number of intervals overlapping any time instant T is the number of interval start times less than T, minus the number of interval end times less than or equal to T.
Many of these problems, like the specific one above, can be solved by putting the start and end times separately into a sorted list or tree so you can figure out stuff about how these counts change over time.
To solve this problem, for example, put the start and end times into a single list:
800S, 900E, 830S, 915E, 930S, 1040E
then sort them:
800S, 830S, 900E, 915E, 930S, 1040E
Then run through the list and count, adding 1 for each start time and subtracting 1 for each end time:
1 2 1 0 1 0
The highest number of overlapping intervals is 2.
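A minimal sketch of this sweep in Python (times are encoded here as minutes since midnight, which is my own convention):

def max_overlap(intervals):
    # intervals: list of (start, end) pairs.
    # Returns the maximum number of intervals overlapping at any instant.
    events = []
    for start, end in intervals:
        events.append((start, 1))    # interval opens: +1
        events.append((end, -1))     # interval closes: -1
    # Sorting tuples puts ends (-1) before starts (+1) at equal times, so an
    # interval that begins exactly when another ends does not count as overlap.
    events.sort()
    best = count = 0
    for _, delta in events:
        count += delta
        best = max(best, count)
    return best

# The example: 8:00-9:00, 8:30-9:15, 9:30-10:40
print(max_overlap([(480, 540), (510, 555), (570, 640)]))   # 2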
The data structure you can use to solve this type of problem is the interval graph. An interval graph has a vertex for every interval and an edge between every pair of vertices whose intervals intersect.
For the three intervals in your example, A and B intersect (so they are joined by an edge), while C is isolated:
A: 8:00-9:00
B: 8:30-9:15
C: 9:30-10:40
This data structure captures the relevant aspects of most problems involving intervals and thus helps to solve them efficiently. Also, given the set of intervals (represented by a list of 2-tuples), you can construct the interval graph in polynomial time.
Many problems that are NP-hard in general graphs, such as finding the Maximum Weight Independent Set or finding the Optimal Coloring, can be efficiently solved for interval graphs.
To solve the particular problem you've specified, first construct the interval graph G, while storing for each vertex the start time of its corresponding interval. Also initialize a set of resources R = {1} that at first contains only a single resource, resource number 1. Consider each vertex v of G in sorted order according to its start time. Assign to v resource number i, where i is the smallest resource in R not used by the neighbors of v. If no such resource exists (because the neighbors of v use all the resources in R), insert a new resource i = max{R} + 1 into R and assign it to v. The optimal number of resources (i.e., the solution to your problem) is the size of the set R.
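A small sketch of this greedy assignment in Python (a brute-force scan over previously assigned intervals stands in for an explicit interval-graph adjacency structure):

def min_resources(intervals):
    # intervals: list of (start, end) pairs.
    # Returns (resource_count, assignment), where assignment[i] is the
    # resource number given to intervals[i].
    order = sorted(range(len(intervals)), key=lambda i: intervals[i][0])
    assignment = [None] * len(intervals)
    count = 0
    for i in order:
        s, e = intervals[i]
        # Resources already taken by neighbors, i.e. overlapping intervals.
        busy = {assignment[j] for j in range(len(intervals))
                if assignment[j] is not None
                and intervals[j][0] < e and s < intervals[j][1]}
        r = 1
        while r in busy:
            r += 1
        assignment[i] = r
        count = max(count, r)
    return count, assignment

print(min_resources([(480, 540), (510, 555), (570, 640)])[0])   # 2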

Parts Moving Through Tanks. Shortest Path Algorithm

Suppose there is a production line with 8 tanks, each filled with a different substance for parts to be dipped in. The parts are dropped into the tanks by a crane alongside the tanks. Each part moving through the tanks has a recipe associated with it: for example, a part with recipe #1 must be in tank 1 for 10 seconds, in tank 2 for 5 seconds, and so on. Each part must also be dipped in the tanks in the order of the tank numbers 1,2,3,4,5,6,7,8.
Further suppose that a part cannot sit in a tank for longer than the time specified in its recipe, and that the travel time of the crane is instantaneous. For example, if a part is in tank 2 for 10 seconds and the next part scheduled to enter tank 1 is only supposed to be in tank 1 for 5 seconds, then that part will not be put in tank 1, because it would then have to wait in tank 1 for 5 seconds longer than the recipe specifies. Instead the crane must wait to put a new part in tank 1 until it is guaranteed to have no wait time moving between tanks.
Now if you have, say, 50 parts with recipe 1, 50 with recipe 2, and 50 with recipe 3, then what is the optimal way to add parts to the tanks? 1,1,1,2,3,2,1,3...? Or maybe all parts with recipe 1 first, then a mix of parts 2 and 3? My most promising thought on solving this problem is to use a shortest path algorithm (which I don't have much experience with), and Dijkstra's algorithm looked promising. I would build a tree where the root node is the first part put on the line and each child represents the next part to be put in the tanks. If you start with a part using recipe 1, then you can think of it as the root node with three children 1,2,3 (one for each type of recipe). Similarly, each of those child nodes would have three children 1,2,3, and so on down the line until you've run out of parts to add to the tree. The 'distance' between a parent and its child would then be how long, based on the parent's recipe, the child has to wait outside the tanks before it can enter and move safely through the tanks with no delays.
The problem with this method, however, is that there are 150!/(50!)^3 ≈ 2*10^69 distinct orders of parts, which would make the tree quite difficult to store in any kind of data structure or to process in a reasonable way. What other approaches could I take to solve this problem? Is a definitive optimal order of parts even obtainable, or would I have to settle for an approximation?
You can turn this into a minimum cost flow problem.
Use a starting node s with supply 150 (the total number of parts).
Add a node for each of your recipes.
Connect each of these nodes to s with an edge of capacity 50 (however many parts of that recipe you need).
Add another node for each of your recipes, plus 1 extra.
Connect each of your first recipe nodes to each of the nodes you've just created.
Give these edges (i,j) infinite capacity, and a cost equal to how long you have to wait after a part of recipe j before you can start a part of recipe i. (For our special extra node 0, any edge (i,0) should have cost 0, as though we started with this recipe.)
Connect each of these last nodes to a sink t with demand 150 (the same as the supply of the source), using an edge with capacity 50 (or the number of parts of that recipe needed), or capacity 1 in the case of our special node 0, since only 1 part can go first.
You can solve this problem with linear programming. What's nice about this approach is that you'll only have 2*n + 3 nodes and n*(n+3) + 1 edges, regardless of how many parts you have to produce.
EDIT
The Linear Programming formulation is actually way easier than the network flow (to explain):
min sum(i in Recipes) sum(j in Recipes ∪ {0}) t_(i,j) * n_(i,j)
s.t. sum(j in Recipes ∪ {0}) n_(i,j) = d_i  for all i in Recipes
     sum(i in Recipes) n_(i,j) <= d_j       for all j in Recipes
     sum(i in Recipes) n_(i,0) = 1
     n_(i,j) >= 0  for all i in Recipes and all j in Recipes ∪ {0}
where t_(i,j) is the time we wait after a part of recipe j before starting a part of recipe i (with t_(i,0) = 0), n_(i,j) is the number of parts of recipe i that directly follow a part of recipe j, n_(i,0) is the number of parts of recipe i that don't follow anything (that go first), and d_i is the number of parts of recipe i that should be made.
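As a sketch of how this LP can be handed to a solver (here scipy.optimize.linprog; the recipe count, the demands d, and the wait-time matrix t are made-up illustration data):

import numpy as np
from scipy.optimize import linprog

R = 3                       # number of recipes
d = [50, 50, 50]            # parts needed per recipe
# t[i][j]: wait before a part of recipe i can follow a part of recipe j;
# the last column is the dummy "go first" option with zero cost.
t = np.array([[4., 2., 7., 0.],
              [3., 5., 1., 0.],
              [6., 2., 3., 0.]])

nvar = R * (R + 1)          # variables n[i][j], flattened row-major
c = t.flatten()             # objective: total waiting time

def idx(i, j):
    return i * (R + 1) + j

# Equalities: each recipe i follows something (or the dummy) exactly d[i]
# times, and exactly one part goes first.
A_eq = np.zeros((R + 1, nvar))
b_eq = np.zeros(R + 1)
for i in range(R):
    for j in range(R + 1):
        A_eq[i, idx(i, j)] = 1.0
    b_eq[i] = d[i]
for i in range(R):
    A_eq[R, idx(i, R)] = 1.0
b_eq[R] = 1.0

# Inequalities: recipe j can be directly followed at most d[j] times.
A_ub = np.zeros((R, nvar))
for j in range(R):
    for i in range(R):
        A_ub[j, idx(i, j)] = 1.0
b_ub = np.array(d, dtype=float)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * nvar)
print(res.status, res.fun)  # status 0 means an optimum was found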

What does the beam size represent in the beam search algorithm?

I have a question about the beam search algorithm.
Let's say that n = 2 (the number of nodes we are going to expand from every node). So, at the beginning, we only have the root, with 2 nodes that we expand from it. Now, from those two nodes, we expand two more each. So, at the moment, we have 4 leaves. We will continue like this till we find the answer.
Is this how beam search works? Does it expand only n = 2 children of every node, or does it keep 2 leaf nodes at all times?
I used to think that n = 2 means that we should have at most 2 active nodes from each node, not two for the whole tree.
In the "standard" beam search algorithm, at every step, the total number of the nodes you currently "know about" is limited - and NOT the number of nodes you will follow from each node.
Concretely, if n = 2, it means that the "beam" will be of size at most 2, at all times. So, initially, you start from one node, then you discover all nodes that are reachable from it, but discard all of them but two, and finish step 1 with 2 nodes. At step 2, you have two nodes, and you will expand both, and discard all nodes again, except exactly 2 nodes (total, not from each!). In the next steps, similarly, you will keep 2 nodes after each step.
Choosing which node to keep is usually done by some heuristic function that evaluates which node is closest to the target.
Note that the beam search algorithm is neither complete (i.e., it may not find a solution even if one exists) nor optimal (i.e., it may not find the best solution). The best way to see this is to observe that when n = 1, it basically reduces to greedy best-first search.
In beam search, instead of choosing the best token to generate at each timestep, we keep k possible tokens at each step. This fixed-size memory footprint k is called the beam width, on the metaphor of a flashlight beam that can be parameterized to be wider or narrower.
Thus at the first step of decoding, we compute a softmax over the entire vocabulary, assigning a probability to each word. We then select the k best options from this softmax output. These initial k outputs are the search frontier, and these k initial words are called hypotheses. A hypothesis is an output sequence, a translation-so-far, together with its probability.
At subsequent steps, each of the k best hypotheses is extended incrementally by being passed to distinct decoders, which each generate a softmax over the entire vocabulary to extend the hypothesis to every possible next token. Each of these k*V hypotheses is scored by P(y_i | x, y_<i): the probability of the current word choice multiplied by the probability of the path that led to it. We then prune the k*V hypotheses down to the k best hypotheses, so there are never more than k hypotheses at the frontier of the search, and never more than k decoders.
The beam size (or beam width) is the k mentioned above.
Source: https://web.stanford.edu/~jurafsky/slp3/ed3book.pdf
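To make the "k total, not k per node" point concrete, here is a generic beam search sketch in Python; successors, score, and is_goal are placeholder callables you would supply:

import heapq

def beam_search(start, successors, score, is_goal, k=2, max_steps=50):
    # successors(state) yields candidate next states, score(state) is a
    # heuristic (higher = better), and k is the beam width: the TOTAL
    # number of states kept per step, not the branching per node.
    beam = [start]
    for _ in range(max_steps):
        candidates = []
        for state in beam:
            for nxt in successors(state):
                if is_goal(nxt):
                    return nxt
                candidates.append(nxt)
        if not candidates:
            return None   # beam died out without reaching a goal
        # Keep only the k best candidates across the WHOLE frontier.
        beam = heapq.nlargest(k, candidates, key=score)
    return None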

Algorithm for fairly assigning tasks to workers based on skills

(Before anyone asks, this is not homework.)
I have a set of workers with interests, i.e.:
Bob: Java, XML, Ruby
Susan: Java, HTML, Python
Fred: Python, Ruby
Sam: Java, Ruby
etc.
(There are actually somewhere in the range of 10-25 "interests" for each worker, and I have around 40-50 workers)
At the same time, I have a very large set of tasks that need to be distributed among the workers. Each task has to be assigned to at least 3 workers, and each of those workers must match at least one of the task's interests:
Task 1: Ruby, XML
Task 2: XHTML, Python
and so on. So Bob, Fred, or Sam could get Task 1; Susan or Fred could get Task 2.
This is all stored in a database as follows:
Task
id integer primary key
name varchar
TaskInterests
task_id integer
interest_id integer
Workers
id integer primary key
name varchar
max_assignments integer
WorkerInterests
worker_id
interest_id
Assignments
task_id
worker_id
date_assigned
Each worker has a maximum number of assignments they will do, around 10. Some interests are rarer than others (i.e. only 1 or 2 workers have listed them as an interest), some interests are more common (i.e. half of the workers list them).
The algorithm must:
Assign every task to 3 workers (it is assumed that at least 3 of the workers are interested in one of the interests of the task).
Assign every worker 1 or more tasks.
Ideally, the algorithm will:
Assign each worker a number of tasks proportional to their maximum assignments and the total number of tasks. For example, if Susan says she will do 20 tasks and most people will only do 10 tasks and there are 50 workers and 300 tasks, she should be assigned 12 tasks (20/10*(300/50)).
Assign a variety of tasks to each worker, so if Susan lists 4 interests she gets tasks that include 4 interests (rather than getting 10 tasks all with the same interest)
The most difficult aspect so far has been dealing with these issues:
tasks having interests with few corresponding workers
workers who have few interests, especially workers whose few interests match relatively few tasks
This problem can be modeled as a Maximum Flow Problem.
In a max-flow problem, you have a directed graph with two special nodes, the source and the sink. The edges in the graph have capacities, and your goal is to assign a flow through the graph from the source to the sink without exceeding any of the edge capacities.
With a (very) carefully crafted graph, we can find an assignment meeting your requirements from the maximum flow.
Let me number the requirements.
Required:
1. Workers are assigned no more than their maximum assignments.
2. Tasks can only be assigned to workers that match one of the task's interests.
3. Every task must be assigned to 3 workers.
4. Every worker must be assigned to at least 1 task.
Optional:
5. Each worker should be assigned a number of tasks proportional to that worker's maximum assignments
6. Each worker should be assigned a variety of tasks.
I will assume that the maximum flow is found using the Edmonds-Karp Algorithm.
Let's first find a graph that meets requirements 1-3.
Picture the graph as 4 columns of nodes, where edges only go from nodes in a column to nodes in the neighboring column to the right.
In the first column we have the source node. In the next column we will have nodes for each of the workers. From the source, there is an edge to each worker with capacity equal to that worker's maximum assignments. This will enforce requirement 1.
In the third column, there is a node for each task. From each worker in the second column there is an edge to each task that that worker is interested in with a capacity of 1 (a worker is interested in a task if the intersection of their interests is non-empty). This will enforce requirement 2. The capacity of 1 will ensure that each worker takes only 1 of the 3 slots for each task.
In the fourth column we have the sink. There is an edge from each task to the sink with capacity 3. This will enforce requirement 3.
Now, we find a maximum flow in this graph using the Edmonds-Karp Algorithm. If this maximum flow is less than 3 * (# of tasks), then there is no assignment meeting requirements 1-3. If not, there is such an assignment, and we can find it by examining the final residual graph. In the residual graph, if there is an edge from a task to a worker with capacity 1, then that worker is assigned to that task.
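A sketch of this four-column construction using networkx (the worker/task data below is illustrative):

import networkx as nx

# Workers: name -> (max_assignments, interests); tasks: name -> interests.
workers = {"Bob":   (10, {"Java", "XML", "Ruby"}),
           "Susan": (10, {"Java", "HTML", "Python"}),
           "Fred":  (10, {"Python", "Ruby"}),
           "Sam":   (10, {"Java", "Ruby"})}
tasks = {"T1": {"Ruby", "XML"}, "T2": {"Python", "Ruby"}}

G = nx.DiGraph()
for w, (cap, _) in workers.items():
    G.add_edge("s", w, capacity=cap)          # requirement 1
for t, t_int in tasks.items():
    G.add_edge(t, "t", capacity=3)            # requirement 3
    for w, (_, w_int) in workers.items():
        if w_int & t_int:                     # requirement 2
            G.add_edge(w, t, capacity=1)

flow_value, flow = nx.maximum_flow(G, "s", "t")
if flow_value < 3 * len(tasks):
    print("no assignment meets requirements 1-3")
else:
    for w in workers:
        print(w, "->", [t for t, f in flow[w].items() if f == 1])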
Now, we will modify our graph and algorithm to meet the rest of the requirements.
First, let's meet requirement 4. This requires a small change to the algorithm. Initially, set all the capacities from the source to the workers to 1. Find the max-flow in this graph. If the flow is not equal to the number of workers, then there is no assignment meeting requirement 4. Now, in your final residual graph, for each worker, the edge from the source to that worker has capacity 0 and the reverse edge has capacity 1. Change these to that worker's maximum assignments - 1 and 0, respectively. Now continue the Edmonds-Karp algorithm as before. Basically, what we have done is first find an assignment such that each worker is assigned exactly one task, and then delete the reverse edge back to the source so that the worker will always be assigned at least one task (though it may not be the one assigned in the first pass).
Now let's meet requirement 5. Strictly speaking, this requirement just means that we divide each worker's maximum assignments by (the sum of all workers' maximum assignments / the number of tasks). This will quite possibly not have a satisfying assignment. But that's ok. Initialize our graph with these new maximum assignments. Run Edmonds-Karp. If it finds a flow that saturates the edges from tasks to sink, we are done. Otherwise we can increment the capacities from the source to the workers in the residual graph and continue running Edmonds-Karp. Repeat until we saturate the edges into the sink. Don't increment the capacities so much that a worker is assigned too many tasks. Also, technically, the increment for each worker should be proportional to that worker's maximum assignments. Both of these are easy to do.
Finally, let's meet requirement 6. This one is a bit tricky. First, add a column between workers and tasks and remove all edges from workers to tasks. In this new column, for each worker, add a node for each of that worker's interests. From each of these new nodes, add an edge with capacity 1 to each task with a matching interest. Add an edge with capacity 1 from each worker to each of its interest nodes. Now, a flow in this graph enforces that if a worker is assigned to n tasks, then the intersection of that worker's interests with the union of those tasks' interests has size at least n. It is possible that there is an assignment satisfying the other requirements but not this one. We can handle this the same way as requirement 5: run Edmonds-Karp to completion; if there is no satisfying assignment, increment the capacities from workers to their interest nodes and repeat.
Note that in this modified graph we no longer satisfy requirement 3, as a single worker may be assigned to multiple/all slots of a task if the intersection of their interests has size greater than 1. We can fix that. Add a new column of nodes between the interest nodes and the task nodes and delete the edges between those nodes. For each worker, in the new column, insert a node for each task (so each worker has its own node for each task). From these new nodes, add an edge with capacity 1 to the corresponding task on the right. From each of a worker's interest nodes, add an edge with capacity 1 to each of that worker's task nodes whose task matches the interest.
-
EDIT: Let me try to clarify this a little. Let -(n)-> be an edge with n capacity.
Previously we had worker-(1)->task for each worker-task pair with a matching interest. Now we have worker-(k)->local interest-(1)->local task-(1)->global task. You can now think of a task being matched to a worker-interest pair. The first edge says that, for a worker, each of its interests can be matched to k tasks. The second edge says that each of a worker's interests can only be matched once to each job. The third edge says that each task can only be assigned once to each worker. Note that you could push multiple units of flow from the worker to a local task (equal to the size of the intersection of their interests) but only 1 unit from the worker to the global task node, due to the third edge.
-
Also note that we can't really mix this incrementing with the one for requirement 5 correctly. However, we can run the whole algorithm once for each capacity {1,2,...,r} for worker->interest edges. We then need a way to rank the assignments. That is, as we relax requirement 5 we can better meet requirement 6 and vice versa. However, there is another approach that I prefer for relaxing these constraints.
A better approach to requirement relaxation (inspired-by/taken-from templatetypedef)
When we want to be able to relax multiple requirements (e.g. 5 and 6), we can model it as a min-cost max-flow problem. This may be simpler than the incremental search that I described above.
For example, for requirement 5, set all the edge costs to 0. We have the initial edge from the source to each worker with capacity equal to that worker's maximum assignments / (sum of all workers' maximum assignments / number of tasks) and with cost 0. Then you can add another edge with the remaining capacity for that worker and cost 1. Another possibility would be to use some sort of progressive cost, such that as you add tasks to a worker the cost to add another task to that worker goes up. E.g. you could instead split a worker's remaining capacity up into individual edges with costs 1,2,3,4,....
A similar thing could then be done between the worker nodes and the local-interest nodes for requirement 6. The weighting would need to be balanced to reflect the relative importance of the different requirements.
This method is also sufficient to enforce requirement 4. Also, the costs for requirement 5 should probably be made such that they are proportional to a worker's maximum assignments. Then assigning 1 extra task to a worker with max 100 would not cost as much as assigning an extra to a worker with max 2.
Complexity Analysis
Let n = # of employees, m = # of tasks, k = max interests for a single task/worker, l = # of interests, j = maximum of maximum assignments.
Requirement 3 implies that n = O(m). Let's also assume that l = O(m) and j = O(m).
In the smaller graph (before the change for req. 6), the graph has n + m + 2 = O(m) vertices and at most n + m + k*min(n, m) = O(km) edges.
After the change it has 2 + n + n * l + n * m + m = O(nm) vertices and n + k * n + k * m * n + n * m + m = O(kmn) edges (technically we may need j * n + j * l more nodes and edges so that there are not multiple edges from one node to another, but this wouldn't change the asymptotic bound). Also note that no edge need have capacity > j.
Using the min-cost max-flow formulation, we can find a solution in O(V*E*B*log V), where V = # vertices, E = # edges, and B = the maximum capacity on a single edge. In our case this gives O(k*j*n^2*m^2*log(nm)).
For problems where finding a direct solution is difficult, it can be a good idea to use an approximation algorithm, an evaluation function, and a method to improve the solution. There are a variety of approaches, such as genetic algorithms and simulated annealing.
The basic idea is to use some sort of simple algorithm (such as a greedy algorithm) to get something that is vaguely usable and make random modifications, keeping those modifications that improve the evaluation score and discarding those that make it worse.
With genetic algorithms, a group of, for example, 100 random solutions is generated and scored, and the best are kept and "bred" to produce a new generation of solutions with characteristics similar to the previous generations, but with some random mutations.
For simulated annealing, the probability of a slightly worse solution being accepted is high initially, but decreases over time. This reduces the risk of getting stuck at a local optimum early on.
Try mapping your task to the stable marriage problem. Tasks become prospective wives, and your staff become suitors.
You might want to add some extra algorithm for assigning preferences of each task to the staff, and vice versa: you could assign some ideal proficiency necessary for the components of each task, and then allow your staff to rank each task. You could assign a proficiency for each component that each staff member possesses and use that to get each task's preference in staff members.
Once you have the preferences, run the algorithm and post the results, then allow people to apply to you in pairs to swap assignments; after all, this is a people problem, and people work better when they have a degree of control.
So I gave this problem some thought and I think that you can get a good solution (for some definition of "good") by reducing it to an instance of min-cost max-flow (see this, for example). The idea is as follows. Suppose you are given as input a set of jobs J, each of which has a set of skills necessary, along with a set of workers W, each of whom has a set of talents. You are also given for each worker a constant k_i saying how many jobs you'd like them to do, as well as a constant m_i saying the maximum number of jobs you can allocate to them. Your goal is to assign the jobs to the workers in such a way that each job is done by a worker who has the skills, no worker does more than m_i jobs, and the number of "excess" jobs done by the workers is minimized. For example, if there are five workers who each want to do four tasks, and the load is balanced so that two workers do four jobs, one does three, and one does five, the total excess is one, since one worker did one more job than was expected.
The reduction is as follows. For now, we'll ignore the balancing requirement and just see how to reduce this to max-flow; we'll add load balancing at the end. Construct a graph G with a designated start node s and sink node t. Add to this graph a node for each job j and each worker w. There will be an edge from s to each of these j nodes of cost zero and capacity one. There will also be an edge from each w node to t with cost zero and capacity m_i. Finally, for each job j and worker w, if worker w has the talents necessary to complete job j, there is an edge from j to w with cost zero and capacity one.
The idea is that we want to push flow from s to t through the j and w nodes such that each flow path going through some j node to a w node means that job j should be given to worker w. The capacity restrictions on the edges from s to the j nodes ensure that at most one unit of flow enters each j node, so each job is assigned at most once. The capacity restriction on the edges from the w nodes to the node t prevents each worker from being assigned too many jobs. Since all capacities are integral, an integral max flow exists from s to t, and so a max-flow in this graph corresponds to an assignment of jobs to workers that is legal and doesn't exceed any worker's maximum load. You can check whether all jobs are assigned by looking at the total flow in the graph; if it's equal to the number of jobs, they've all been assigned.
This above construction, however, does nothing to balance worker loads. To fix this, we'll modify the construction a bit. Rather than having an edge from each w node to t, instead, for each w node, add two nodes to the graph, c and e, and connect them as follows. There is an edge from w_i to c_i with capacity k_i and cost zero, and an identical edge from c_i to t. There is also an edge from w_i to e_i with cost 1 and capacity m_i - k_i. There is also an edge from e_i to t with equal capacity and zero cost.
Intuitively, we haven't changed the amount of flow that leaves any w node, but we have changed how much that flow costs. Flow shunted to t via the c node is free, and so the worker can take on k_i jobs without incurring cost. Any jobs after that have to be routed through e, which costs one for each unit of flow crossing it. Finding a max-flow in this new graph still determines an assignment, but finding the min-cost max-flow in the graph finds the assignment that minimizes the excess jobs divvied up to workers.
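A sketch of this construction with the c and e nodes, using networkx's max_flow_min_cost (the job/worker data below is illustrative):

import networkx as nx

# jobs: job -> required skills; workers: worker -> (skills, k desired, m max).
jobs = {"J1": {"Ruby"}, "J2": {"Python"}, "J3": {"Java"}}
workers = {"Bob":   ({"Java", "Ruby"},   1, 2),
           "Susan": ({"Java", "Python"}, 1, 2)}

G = nx.DiGraph()
for j, req in jobs.items():
    G.add_edge("s", j, capacity=1, weight=0)
    for w, (skills, _, _) in workers.items():
        if req & skills:
            G.add_edge(j, w, capacity=1, weight=0)
for w, (_, k, m) in workers.items():
    # Up to k jobs are free (via the c node); each job beyond that costs 1
    # per unit of flow (via the e node), up to the hard cap m.
    G.add_edge(w, ("c", w), capacity=k, weight=0)
    G.add_edge(("c", w), "t", capacity=k, weight=0)
    G.add_edge(w, ("e", w), capacity=m - k, weight=1)
    G.add_edge(("e", w), "t", capacity=m - k, weight=0)

flow = nx.max_flow_min_cost(G, "s", "t")
for j in jobs:
    for w, f in flow[j].items():
        if f:
            print(j, "->", w)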
Min-cost max flows can be solved in polynomial time with a few somewhat-well-known algorithms, so hopefully this is a useful answer!

calendar scheduler algorithm

I'm looking for an algorithm that, given a set of items each containing a start time, end time, type, and id, returns a set of all sets of items that fit together (no overlapping times, and all types are represented in the set).
S = [("8:00AM", "9:00AM", "Breakfast With Mindy", 234),
("11:40AM", "12:40PM", "Go to Gym", 219),
("12:00PM", "1:00PM", "Lunch With Steve", 79),
("12:40PM", "1:20PM", "Lunch With Steve", 189)]
Algorithm(S) => [[("8:00AM", "9:00AM", "Breakfast With Mindy", 234),
("11:40AM", "12:40PM", "Go to Gym", 219),
("12:40PM", "1:20PM", "Lunch With Steve", 189)]]
Thanks!
This can be solved using graph theory. I would create an array containing the items sorted by start time, with ties broken by end time (I've added some more items to the example):
no.: id: [ start - end ] type
---------------------------------------------------------
0: 234: [08:00AM - 09:00AM] Breakfast With Mindy
1: 400: [09:00AM - 07:00PM] Check out stackoverflow.com
2: 219: [11:40AM - 12:40PM] Go to Gym
3: 79: [12:00PM - 01:00PM] Lunch With Steve
4: 189: [12:40PM - 01:20PM] Lunch With Steve
5: 270: [01:00PM - 05:00PM] Go to Tennis
6: 300: [06:40PM - 07:20PM] Dinner With Family
7: 250: [07:20PM - 08:00PM] Check out stackoverflow.com
After that, I would create a list containing, for each item, the array index of the first item that could possibly come next. If there is no next item, -1 is added:
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7
1 | 7 | 4 | 5 | 6 | 6 | 7 | -1
With that list it is possible to generate a directed acyclic graph. Every vertex has a connection to the vertices starting from its next item, but no edge is made between two vertices if there is already a vertex between them. I'll try to explain with the example. For vertex 0 the next item is 1, so an edge 0 -> 1 is made. The next item of 1 is 7, which means the range of vertices that can be connected from vertex 0 is now 1 to (7-1). Because vertex 2 is in the range 1 to 6, another edge 0 -> 2 is made, and the range updates to 1 to (4-1) (because 4 is the next item of 2). Because vertex 3 is in the range 1 to 3, one more edge 0 -> 3 is made. That was the last edge for vertex 0. This has to be continued for all vertices, leading to a directed acyclic graph over the whole item set.
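A sketch in Python of computing the next-item list with binary search and then generating the edges by the shrinking-range rule just described (items are assumed to be already sorted by start time, with times as comparable numbers):

from bisect import bisect_left

def build_dag(items):
    # items: list of (start, end, type, id) tuples, sorted by start time.
    # Returns nexts[i] (first index that can follow item i, or -1) and the edges.
    starts = [s for s, _, _, _ in items]
    n = len(items)
    nexts = []
    for _, end, _, _ in items:
        j = bisect_left(starts, end)    # first item starting at or after our end
        nexts.append(j if j < n else -1)

    edges = []
    for v in range(n):
        if nexts[v] == -1:
            continue
        # The candidate range shrinks: once we pass an item w, anything at or
        # after nexts[w] is reachable through w, so no direct edge is needed.
        bound = n - 1
        w = nexts[v]
        while w <= bound:
            edges.append((v, w))
            if nexts[w] != -1:
                bound = min(bound, nexts[w] - 1)
            w += 1
    return nexts, edges

# For the example above this yields nexts == [1, 7, 4, 5, 6, 6, 7, -1]
# and the edges 0->1, 0->2, 0->3 from vertex 0, as described.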
Until now we are at O(n^2). After that, all paths can be found using a depth-first-search-like algorithm, eliminating the duplicated types from each path.
For that example there are 4 solutions, but none of them has all types, because in this example it is not possible to do Go to Gym, Lunch With Steve, and Go to Tennis.
Also, this search for all paths has a worst-case complexity of O(2^n). For example, the following graph has 2^(n/2) possible paths from a start vertex to an end vertex:
(figure from the original answer not preserved)
Some more optimisations could be made, like merging some vertices before searching for all paths. But that is not always possible. In the first example, vertices 3 and 4 can't be merged even though they are of the same type. But in the last example, vertices 4 and 5 can be merged if they are of the same type: it doesn't matter which activity you choose, both are valid. This can speed up the calculation of all paths dramatically.
Maybe there is also a clever way to consider duplicate types earlier in order to eliminate them, but the worst case is still O(2^n) if you want all possible paths.
EDIT1:
It is possible to determine whether there are sets that contain all types, and to get at least one such solution, in polynomial time. I found an algorithm with a worst-case time of O(n^4) and O(n^2) space. I'll take a new example which has a solution with all types, but is more complex.
no.: id: [ start - end ] type
---------------------------------------------------------
0: 234: [08:00AM - 09:00AM] A
1: 400: [10:00AM - 11:00AM] B
2: 219: [10:20AM - 11:20AM] C
3: 79: [10:40AM - 11:40AM] D
4: 189: [11:30AM - 12:30PM] D
5: 270: [12:00PM - 06:00PM] B
6: 300: [02:00PM - 03:00PM] E
7: 250: [02:20PM - 03:20PM] B
8: 325: [02:40PM - 03:40PM] F
9: 150: [03:30PM - 04:30PM] F
10: 175: [05:40PM - 06:40PM] E
11: 275: [07:00PM - 08:00PM] G
1.) Count the different types in the item set. This is possible in O(n log n). It is 7 for this example.
2.) Create an n*n matrix that represents which nodes can reach the current node and which nodes can be reached from the current node. For example, if position (2,4) is set to 1, it means that there is a path from node 2 to node 4 in the graph, and (4,2) is set to 1 too, because node 4 can be reached from node 2. This is possible in O(n^2). For the example the matrix would look like this:
111111111111
110011111111
101011111111
100101111111
111010111111
111101000001
111110100111
111110010111
111110001011
111110110111
111110111111
111111111111
3.) Now every row tells us which nodes can be reached. In each row, we can additionally mark every node that is not yet marked if it is of the same type as a node that can be reached; we set those matrix positions from 0 to 2. This is possible in O(n^3). In the example there is no path from node 1 to node 3, but node 4 has the same type D as node 3 and there is a path from node 1 to node 4. So we get this matrix:
111111111111
110211111111
121211111111
120121111111
111212111111
111121020001
111112122111
111112212111
111112221211
111112112111
111112111111
111111111111
4.) The nodes whose rows still contain 0's can't be part of the solution, and we can remove them from the graph. If at least one node was removed, we start again at step 2.) with the smaller graph. Because we remove at least one node each time, we go back to step 2.) at most n times, but most often this will only happen a few times. If there are no 0's left in the matrix, we can continue with step 5.). This is possible in O(n^2). In the example it is not possible to build a path through node 1 that also contains a node of type C; therefore node 1's row contains a 0 and the node is removed, as are node 3 and node 5. In the next loop, on the smaller graph, node 6 and node 8 will be removed.
5.) Count the different types in the remaining set of items/nodes. If it is smaller than the first count, there is no solution that can represent all types, so we have to find another way to get a good solution. If it is the same as the first count, we now have a smaller graph which still holds all the possible solutions. O(n log n)
6.) To get one solution, we pick a start node (it doesn't matter which, because every node left in the graph is part of a solution). O(1)
7.) We remove every node that can't be reached from the chosen node. O(n)
8.) We create a matrix like in steps 2.) and 3.) for that graph and remove the nodes whose rows contain 0's, as in step 4.). O(n^3)
9.) We choose one of the successors of the node we chose before and continue with step 7.) until we reach an end node and the graph has only one path left.
That way it is also possible to get all paths, but there can still be exponentially many of them. After all, it should be faster than finding solutions in the original graph.
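A condensed sketch of the filtering idea in steps 2.) to 5.) in Python. Because a path between two items exists exactly when one ends at or before the other starts, the reachability matrix can stay implicit as a before/after comparison:

def all_types_filter(items):
    # items: list of (start, end, type) tuples.
    # Repeatedly drops items that can't lie on a chain touching every type;
    # an empty result means no schedule covers all types.
    types_needed = {t for _, _, t in items}
    alive = list(items)
    changed = True
    while changed:
        changed = False
        survivors = []
        for s1, e1, t1 in alive:
            # Types coverable together with this item: its own type plus the
            # type of every item lying entirely before or after it.
            coverable = {t1}
            for s2, e2, t2 in alive:
                if e1 <= s2 or e2 <= s1:
                    coverable.add(t2)
            if coverable == types_needed:
                survivors.append((s1, e1, t1))
            else:
                changed = True
        alive = survivors
    return alive

# On the example above, the first pass drops items 1, 3 and 5, and the
# second pass drops items 6 and 8, matching step 4.).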
Hmmm, this reminds me of a task at university; I'll describe what I can remember.
The run-time is O(n log n), which is pretty good.
This is a greedy approach.
I will refine your request a bit; tell me if I'm wrong.
The algorithm should return the MAX subset of non-colliding tasks (in terms of total length? or number of activities? I guess total length).
I would first order the list by finishing time (first = minimum finishing time, last = maximum): O(n log n)
Find_set(A):
    G <- empty set
    S <- A
    f <- 0
    while S != empty set do
        i <- index of the activity in S with the earliest finish time (O(1), since S is ordered)
        if S(i).start_time >= f
            G.insert(S(i))    \\ add this activity to the result set
            f = S(i).finish_time
        S.removeAt(i)         \\ remove the activity from the original set
    od
    return G
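A direct Python translation of this greedy (with the selection test on the start time, as above):

def find_set(activities):
    # activities: list of (start, finish) pairs.
    # Greedy interval scheduling: returns a maximum-size set of mutually
    # non-overlapping activities.
    result = []
    last_finish = float("-inf")
    # Always take the earliest-finishing activity that starts at or after
    # the finish of the last selected one.
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:
            result.append((start, finish))
            last_finish = finish
    return result

# 8:00-9:00, 11:40-12:40, 12:00-13:00, 12:40-13:20 as minutes since midnight
print(find_set([(480, 540), (700, 760), (720, 780), (760, 800)]))
# [(480, 540), (700, 760), (760, 800)]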
Run time analysis:
initial ordering: O(n log n)
each iteration O(1), n times = O(n)
Total: O(n log n) + O(n) ~ O(n log n) (well, given the O notation's weakness at representing real complexity for small inputs; but as the scale grows, this is a good algorithm)
Enjoy.
Update:
Ok, it seems I misread the post. You can alternatively use dynamic programming to reduce the running time; there is a solution in link text, pages 7-19.
You need to tweak the algorithm a bit: first build the table, then you can get all the variations from it fairly easily.
I would use an Interval Tree for this.
After you build the data structure, you can iterate each event and perform an intersection query. If no intersections are found, it is added to your schedule.
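A sketch of that approach in Python; a sorted list with bisect stands in for a real interval tree (enough here because the accepted events stay pairwise disjoint, whereas a true interval tree would also handle overlapping stored intervals):

from bisect import bisect_left, insort

def try_add(accepted, start, end, name):
    # accepted: list of (start, end, name) kept sorted by start time and
    # pairwise disjoint. Adds the event iff it intersects nothing accepted.
    i = bisect_left(accepted, (start,))
    if i > 0 and accepted[i - 1][1] > start:        # predecessor overlaps us
        return False
    if i < len(accepted) and accepted[i][0] < end:  # successor overlaps us
        return False
    insort(accepted, (start, end, name))
    return True

schedule = []
for ev in [(480, 540, "Breakfast"), (700, 760, "Gym"),
           (720, 780, "Lunch A"), (760, 800, "Lunch B")]:
    try_add(schedule, *ev)
print([name for _, _, name in schedule])   # ['Breakfast', 'Gym', 'Lunch B']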
Yes, exhaustive search might be an option:
initialise partial schedules with the earliest tasks that overlap (e.g. 9:00-9:30 and 9:15-9:45)
for each partial schedule generated so far, generate a list of new partial schedules by appending the earliest task that doesn't overlap (generate more than one in case of ties)
recur with the new partial schedules
In your case initialisation would produce only (8-9 breakfast).
After the first iteration: (8-9 brekkie, 11.40-12.40 gym) (no ties)
After the second iteration: (8-9 brekkie, 11.40-12.40 gym, 12.40-1.20 lunch) (no ties again)
This is a tree search, but it's greedy. It leaves out possibilities like skipping the gym and going to an early lunch.
Since you're looking for every possible schedule, I think the best solution you will find will be a simple exhaustive search.
The only thing I can say algorithmically is that your data structure of lists of strings is pretty terrible.
The implementation is hugely language dependent so I don't even think pseudo-code would make sense, but I'll try to give the steps for the basic algorithm.
Pop off the first n items of the same type and put them in a list.
For each item in the list, add that item to the schedule set.
Pop off the next n items of the same type.
For each item that starts after the first item ends, put it on the list. (If none, fail.)
Continue until done.
Hardest part is deciding exactly how to construct the lists/recursion so it's most elegant.
