Node-weighted DAGs and longest paths with updates - algorithm

Suppose you have a set of jobs which can possibly be done in parallel. Each job has a time requirement (the time requirement for the i-th job is t_i). There are also some dependencies, the i-th of them saying that you have to do job u_i before job v_i. You have to minimize the total time required.
This is easily done by converting these relations into a directed acyclic graph and then using it to determine which ones to do in parallel.
In case I'm not clear, here is an example. Suppose you have 5 jobs with time requirements 2, 9, 3, 12, 5, and you have to do 3 before 5, 4 before 5, 3 before 1, and 1 before 2. Then the best you can do is 17. This is your DAG:
+---> 1 (2) ---> 2 (9)
|
3 (3)
|
+----> 5 (5)
       ^
       |
4 (12)-+
You can do 3 and 4 in parallel, so that you spend MAX(3,12)=12 units of time before doing 5, which takes 5 units of time. So 5 is completed after 17 units of time. On the other hand 2 is completed after 14 units. So the answer is 17.
The question is: if exactly one time requirement is updated in each query (where every time you start from the original graph, not the graph resulting from the previous modifications),
how can you efficiently find the new minimum time requirement?
For those who want constraints: number of jobs <= 10^5, number of dependencies <= 10^6, number of queries <= 10^6.

The total time requirement is the maximum weight of a path in the directed acyclic graph. Two linear-time traversals yield, for each node, the maximum weight of a path ending at that node and the maximum weight of a path starting at that node. From these we can find the maximum weight of a path that includes a specified node. If a node's weight increases, then we take the maximum of the previous answer and the new maximum involving that node, which we can compute in constant time.
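To make the two traversals concrete, here is a minimal Python sketch; the 0-indexed edge-list input and the names end_at/start_at are my own assumptions, not from the original post.

    from collections import deque

    def longest_path_tables(n, t, edges):
        """t[v] = time for node v (0-indexed); edges = list of (u, v) arcs meaning
        u must precede v.  Returns (end_at, start_at): the maximum total weight of
        a path ending at v / starting at v, both including t[v] itself."""
        succ = [[] for _ in range(n)]
        pred = [[] for _ in range(n)]
        indeg = [0] * n
        for u, v in edges:
            succ[u].append(v)
            pred[v].append(u)
            indeg[v] += 1
        order, q = [], deque(v for v in range(n) if indeg[v] == 0)
        while q:                                   # Kahn's algorithm: topological order
            v = q.popleft()
            order.append(v)
            for w in succ[v]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    q.append(w)
        end_at = t[:]                              # best path ending at v
        for v in order:
            for u in pred[v]:
                end_at[v] = max(end_at[v], end_at[u] + t[v])
        start_at = t[:]                            # best path starting at v
        for v in reversed(order):
            for w in succ[v]:
                start_at[v] = max(start_at[v], start_at[w] + t[v])
        return end_at, start_at

    def answer_increase(v, new_t, t, end_at, start_at, base_answer):
        """base_answer is max(end_at) computed once on the original weights.
        t[v] is counted in both tables, so subtract it twice and add new_t once."""
        through_v = end_at[v] + start_at[v] - 2 * t[v] + new_t
        return max(base_answer, through_v)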
Decreases are trickier, since we need, for each node in the maximum weight path, the maximum weight of a path not including that node. The first way (maybe not the best) that I can think of to do this is the following. Add to the acyclic directed graph a source and a sink, both with weight zero and connections to (source) or from (sink) each of the other nodes. Number the nodes in topological order, where the source is 0 and the sink is n + 1, and initialize a segment tree mapping nodes to path weights, where the initial values are -infinity. This segment tree has the following logarithmic-time operations.
Weight(i) - returns the value for node i
Update(i, j, w) - updates the value for nodes i..j to the maximum of the current value and w
For each arc from i to j, call Update(i + 1, j - 1, w), where w is the maximum weight of a path that includes the arc from i to j. At the end, each weight in the segment tree is the maximum weight of a path excluding the corresponding node.
(Note on the running time: by treating nodes without dependencies separately, the running time of this approach can be made O(m log n + n + q), where n is the number of nodes, m is the number of dependencies, and q is the number of queries. My segment tree is, in effect, computing 3D maxima, a problem well studied by computational geometers. With presorted input (at least in two dimensions), the fastest known algorithm for n points is O(n log log n), via van Emde Boas trees. There also exist algorithms with output-sensitive time bounds that beat that worst-case bound in some cases.)
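Here is one way the segment tree part could look; this is a sketch under my own naming, assuming nodes 1..n are already numbered in topological order, that 0 is the added source and n+1 the added sink (both of weight 0), and that end_at/start_at from the previous sketch have been extended to cover them.

    NEG = float('-inf')

    class MaxSegTree:
        """Range 'raise to at least w' updates, point queries.  Since all updates
        happen before any query, no lazy propagation is needed: a point query just
        takes the maximum over the O(log n) tree nodes covering that position."""
        def __init__(self, size):
            self.n = size
            self.val = [NEG] * (2 * size)

        def update(self, lo, hi, w):               # positions lo..hi inclusive
            lo += self.n
            hi += self.n + 1
            while lo < hi:
                if lo & 1:
                    self.val[lo] = max(self.val[lo], w); lo += 1
                if hi & 1:
                    hi -= 1; self.val[hi] = max(self.val[hi], w)
                lo >>= 1
                hi >>= 1

        def weight(self, i):
            i += self.n
            best = NEG
            while i >= 1:
                best = max(best, self.val[i])
                i >>= 1
            return best

    def max_path_excluding(n, edges, end_at, start_at):
        """end_at/start_at are indexed 0..n+1 (source and sink included, weight 0);
        edges must also include the arcs from the source and to the sink.
        Returns excl[v] = maximum weight of a path that avoids node v."""
        seg = MaxSegTree(n + 2)
        for i, j in edges:
            w = end_at[i] + start_at[j]            # best path using the arc i -> j
            if j - i >= 2:                         # it skips every node strictly between i and j
                seg.update(i + 1, j - 1, w)
        return [seg.weight(v) for v in range(n + 2)]

    # Any update of node v to new_t (increase or decrease) is then answered by
    #   max(excl[v], end_at[v] + start_at[v] - 2 * t[v] + new_t)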

Related

Give minimum permutation weight for edges such that a given set of edges is the Minimum Spanning Tree

Question:
Given a graph of N nodes and M edges, the edges are indexed from 1 -> M. It is guaranteed that there's a path between any 2 nodes.
You need to assign weights for M edges. The weights are in the range of [1...M], and each number can only occur once.
In short, the answer should be a permutation array of [1...M], in which arr[i] = x means edge[i] has the weight x.
You are given a set R of n-1 edges. R is guaranteed to be a Spanning Tree of the graph.
Find a way to assign weights so that R is the Minimum Spanning Tree of the graph, if there are multiple answers, print the one with minimum lexicographical order.
Constraints:
N, M <= 10^6
Example:
Edges:
3 4
1 2
2 3
1 3
1 4
R = [2, 4, 5]
Answer: 3 4 5 1 2
Explanation:
If you assign weights to the graph as in the image above (not included here), the MST is the set R, and the assignment has the smallest lexicographical order.
My take with O(N^2):
Since it asks for the minimum lexicographical order, I traverse the list of edges, assigning the weights in increasing order. Initially, w = 1. There can be 3 situations:
If edge[i] is in R, assign weight[i] = w, increase w by 1
If edge[i] is not in R: say edge[i] connects nodes u and v. Assign a weight and increase w for each edge on the path from u to v in R (if that edge is not assigned yet). Then assign a weight and increase w for edge[i].
If edge[i] is assigned, skip it
Is there any way to improve my solution so that it can work in O(N log N) or less?
Yes, there's an O(m log m)-time algorithm.
The fundamental cycle of a non-tree edge e consists of e and the path in the tree between the endpoints of e. Given weights, the spanning tree is minimum if and only if, for every non-tree edge e, the heaviest edge in the fundamental cycle of e is e itself.
The lexicographic objective lends itself to a greedy algorithm, where we find the least valid assignment for edge 1, then edge 2 given edge 1, then edge 3 given the previous edges, etc. Here's the core idea: if the next unassigned edge is a tree edge, give it the next number; if it is a non-tree edge, first assign the next numbers to the unassigned tree edges in its fundamental cycle, then give the next number to the edge itself.
In the example, edge 3-4 is first, and edges 1-3 and 1-4 complete its fundamental cycle. Therefore we assign 1-3 → 1 and 1-4 → 2 and then 3-4 → 3. Next is 1-2, a tree edge, so 1-2 → 4. Finally, 2-3 → 5 (1-2 and 1-3 are already assigned).
To implement this efficiently, we need two ingredients: a way to enumerate the unassigned edges in a fundamental cycle, and a way to assign numbers. My proposal for the former would be to store the spanning tree with the assigned edges contracted. We don't need anything fancy; start by rooting the spanning tree somewhere and running depth-first search to record parent pointers and depths. The fundamental cycle of e will be given by the paths to the least common ancestor of the endpoints of e. To do the contraction, we add a Boolean field indicating whether the parent edge is contracted, then use the path compression trick from disjoint-set forests. The work will be O(m log m) worst case, but O(m) average case. I think there's a strong possibility that the offline least common ancestor algorithms can be plugged in here to get the worst case down to O(m).
As for number assignment, we can handle this in linear time. For each edge, record the index of the edge that caused it to be assigned. At the end, stably bucket sort by this index, breaking ties by putting tree edges before non-tree. This can be done in O(m) time.
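A Python sketch of the whole procedure follows; the input format, function names, and rooting at node 1 are my assumptions, and for simplicity it assigns the numbers with a running counter during the walk instead of doing the final bucket sort, which produces the same assignment.

    from collections import defaultdict

    def assign_weights(n, edges, R):
        """edges[i-1] = (u, v) is edge i (1-indexed); R is the set of indices of
        the spanning-tree edges; nodes are labelled 1..n.  Returns arr with
        arr[i-1] = weight assigned to edge i."""
        m = len(edges)
        tree_adj = defaultdict(list)               # node -> [(neighbour, edge index)]
        for i in R:
            u, v = edges[i - 1]
            tree_adj[u].append((v, i))
            tree_adj[v].append((u, i))

        # Root the tree at node 1 and record parent node, parent edge and depth.
        parent, parent_edge, depth = {1: 0}, {1: 0}, {1: 0}
        stack = [1]
        while stack:
            u = stack.pop()
            for v, i in tree_adj[u]:
                if v not in parent:
                    parent[v], parent_edge[v], depth[v] = u, i, depth[u] + 1
                    stack.append(v)

        # jump[v]: highest ancestor reachable from v through already-assigned
        # (i.e. contracted) parent edges -- disjoint-set style path compression.
        jump = {v: v for v in parent}
        def find(v):
            root = v
            while jump[root] != root:
                root = jump[root]
            while jump[v] != root:                 # path compression
                jump[v], v = root, jump[v]
            return root

        weight, next_w = [0] * (m + 1), 1
        def assign(i):
            nonlocal next_w
            weight[i] = next_w
            next_w += 1

        for i, (u, v) in enumerate(edges, start=1):
            if weight[i]:
                continue                           # already assigned, skip
            if i in R:                             # unassigned tree edge: next number
                assign(i)
                child = u if depth[u] > depth[v] else v
                jump[child] = parent[child]        # contract it
                continue
            # Non-tree edge: collect the unassigned tree edges on its fundamental
            # cycle by climbing both endpoints until they meet.
            cycle, a, b = [], find(u), find(v)
            while a != b:
                if depth[a] < depth[b]:
                    a, b = b, a                    # always climb the deeper side
                cycle.append(parent_edge[a])
                jump[a] = parent[a]                # contract the edge we just used
                a = find(a)
            for j in sorted(cycle):                # smaller index gets the smaller number
                assign(j)
            assign(i)                              # finally the non-tree edge itself
        return weight[1:]

On the example above, assign_weights(4, [(3, 4), (1, 2), (2, 3), (1, 3), (1, 4)], {2, 4, 5}) returns [3, 4, 5, 1, 2], matching the expected answer.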

Minimum Hop Count in Directed Graph based on Conditional Statement

A directed graph G is given with Vertices V and Edges E, representing train stations and unidirectional train routes respectively.
Trains of different train numbers move in between pairs of Vertices in a single direction.
Vertices of G are connected with one another through trains with allotted train numbers.
A hop is defined when a passenger needs to shift trains while moving through the graph. The passenger needs to shift trains only if the train-number changes.
Given two Vertices V1 and V2, how would one go about calculating the minimum number of hops needed to reach V2 starting from V1?
In the example (the accompanying image is not included here), the minimum number of hops between Vertices 0 and 3 is 1.
There are two paths from 0 to 3, these are
0 -> 1 -> 2 -> 7 -> 3
Hop Count 4
Hop Count is 4 as the passenger has to shift from Train A to B then C and B again.
and
0 -> 5 -> 6 -> 8 -> 7 -> 3
Hop Count 1
Hop Count is 1, as the passenger needs only one train route, B, to get from Vertex 0 to Vertex 3.
Thus the minimum hop count is 1.
Input and output examples were given as images (the input graph creation, the input to be solved, and the output solved with hop counts). A 0 in the Hop Count column implies that the destination can't be reached.
Assuming the number of different trainIDs is relatively small (like 4 in your example), I suggest using a layered graph approach.
Let number of vertices be N, number of edges M, and number of different trainIDs K.
Let's divide our graph into K graphs (graphA, graphB, ...).
graphA contains only edges labeled with A, and so on.
Weight of each edge in each of the graphs is 0.
Now create edges between these graphs.
An edge between graphs represents a 'hop':
graphA[i] connects to graphB[i], graphC[i], ...
Each of these edges has weight 1.
Now, for every graph, run Dijkstra's shortest path algorithm from V1 in that graph, and read the results at V2 in all graphs, taking the minimal value.
This way, the minimum of the results over all these Dijkstra runs will be your minimum number of hops.
Memory complexity is O(K*(N+M)),
and time complexity is O(K*((K^2*N + M)*log(K*N))).
The K^2*N term comes from the fact that for every 1 <= i <= N the vertices graphA[i], graphB[i], ... must be pairwise connected, which gives about K^2 connections for every i, and K^2*N connections in total.
For cases where K is relatively small, like 4 in your example, but N and M are quite big, this algorithm works like a charm. It isn't suitable for situations where K is big, though.
I'm not sure if that's clear. Tell me if you need more detailed explanation.
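A small Dijkstra sketch of this construction in Python; the (u, v, trainID) edge format is my assumption, and distances count train changes, so add 1 if the first boarding should count as a hop, as in the question's example.

    import heapq
    from collections import defaultdict

    def min_hops(n, edges, source, target):
        """edges: list of (u, v, train) directed routes over vertices 0..n-1;
        train IDs are assumed mutually comparable (e.g. all strings).
        Returns the minimum number of train changes from source to target,
        or None if the target is unreachable."""
        trains = sorted({t for _, _, t in edges})
        adj = defaultdict(list)                    # state (train, vertex) -> [(state, weight)]
        for u, v, t in edges:
            adj[(t, u)].append(((t, v), 0))        # stay on the same train: weight 0
        for u in range(n):                         # change trains at a station: weight 1
            for t1 in trains:
                for t2 in trains:
                    if t1 != t2:
                        adj[(t1, u)].append(((t2, u), 1))

        # Seeding every copy of the source at distance 0 is equivalent to running
        # Dijkstra once per layer and taking the minimum, as described above.
        dist = {}
        pq = [(0, (t, source)) for t in trains]
        heapq.heapify(pq)
        while pq:
            d, s = heapq.heappop(pq)
            if s in dist:
                continue
            dist[s] = d
            for nxt, w in adj[s]:
                if nxt not in dist:
                    heapq.heappush(pq, (d + w, nxt))

        answers = [dist[(t, target)] for t in trains if (t, target) in dist]
        return min(answers) if answers else None

Since all weights are 0 or 1, a deque-based 0-1 BFS would also work and avoid the log factor, and generating the switch transitions on the fly instead of storing them (as in the final edit) saves the roughly K^2 switch edges per vertex.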
EDIT:
Hope this makes my algorithm clearer. (The illustration is not included here.) Black edges have weight 0, and red edges have weight 1.
By using the layered graph approach, we translated our special graph into a plain weighted graph, so we can just run Dijkstra's algorithm on it.
EDIT:
Since max K = 10, we would like to remove the K^2 factor from our time complexity. I believe this can be done by making the edges that represent possible hops virtual, instead of physically storing them in the adjacency list.

How to calculate maximal parallelism in a DAG?

Given a DAG (directed acyclic graph), how does one calculate the maximal parallelism?
Instantaneous parallelism is the maximum number of processors that can be kept busy at each point in execution of algorithm; the maximal parallelism is the highest instantaneous parallelism.
Put another way, given a DAG representing a dependency graph of tasks, what is the minimum number of processors/threads such that no task is ever blocked?
The closest approach I found here is:
apply a topological sort on the DAG
traverse the nodes in topological order, calculating each node's minimum level:
no parents: 0
otherwise: minimum parent level + 1
return the max level width (max num of nodes assigned the same level)
This algorithm worked for me on several samples; however, it doesn't work on a tree. E.g.:
      o 0
     /   \
    o 1   o 1
   /   \
  o 2   o 2
 /   \
o 3   o 3
According to the algorithm above, the max width is 2, but clearly the max parallelism in a tree is the number of leaves, 4 in the example above.
A similar approach is partially described here (see slide titled Computing critical path etc., which describes how to calculate earliest start times of nodes and that "maximal...parallelism can easily be computed from this").
Edit 1:
@AliSoltani's solution of using BFS to find the length of the critical path and taking that as the maximal parallelism degree is incorrect, since it only applies to a subset of examples, mainly trees in which the number of leaves equals the length of the longest path. Here's an illustration of a case where this wouldn't work (image not included):
Edit 2:
@AliSultani's 2nd solution, using BFS to find the level with the maximum number of nodes and setting that as the maximal parallelism, is also incorrect, as it doesn't take into account cases where nodes from different levels may run concurrently. See this counterexample (image not included):
This problem is reducible to the Maximum Directed Cut problem.
Let's build an auxiliary DAG from the original one.
For every vertex u[i] of the original graph, add vertices v[i] and w[i] to the new graph, and connect them with an edge (v[i], w[i]) of cost 1.
For every edge (u[i], u[j]) of the original graph, add an edge (w[i], v[j]) of cost 0 to the new graph.
Now the problem is equivalent to finding the maximum directed cut in the auxiliary graph.
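A tiny Python sketch of just that construction (solving the maximum directed cut itself is a separate matter and is not shown; the node numbering is my own convention):

    def build_auxiliary_graph(n, edges):
        """Original DAG: nodes 0..n-1, edges = list of (i, j) arcs.  Node u[i] is
        split into v[i] = 2*i and w[i] = 2*i + 1 in the auxiliary graph, which is
        returned as a list of (tail, head, cost) arcs."""
        aux = [(2 * i, 2 * i + 1, 1) for i in range(n)]     # (v[i], w[i]) with cost 1
        aux += [(2 * i + 1, 2 * j, 0) for i, j in edges]    # (w[i], v[j]) with cost 0
        return aux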
Example: (the accompanying illustration is not included here)
You should find the critical path length in the DAG. A critical path is a directed path that has the maximum execution requirement among all paths in the DAG. If the critical path in the DAG has n nodes, then the maximal parallelism is n.
The critical path is the longest path from root to leaf (in the DAG), and to find it you can use BFS (Breadth-First Search).
Example 1
A BFS traversal of this tree takes O(|V|+|E|) time. This is an optimal solution for this problem.
Edit: Find maximum degree of concurrency by BFS
You can determine the maximum degree of concurrency by running the breadth-first search algorithm too:
The algorithm starts from the root node and proceeds towards the leaves level-wise.
Before inspecting nodes located on the next level, it explores all of the nodes belonging to the same level.
Count the number of nodes on each level and update a variable holding the maximum number of nodes per level.
Example 2 (Step by step)
So in this example maximum degree of concurrency is 4.
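For reference, a short Python sketch of this level-counting pass (the names are mine; note that, as Edit 2 of the question points out, counting per-level width can miss concurrency between nodes on different levels):

    from collections import deque, defaultdict

    def max_level_width(n, edges):
        """n nodes labelled 0..n-1, edges = list of (u, v) arcs of a DAG (a tree
        in the answer's examples).  Assigns each node the level of its deepest
        parent plus one and returns the largest number of nodes on one level."""
        succ = defaultdict(list)
        indeg = [0] * n
        for u, v in edges:
            succ[u].append(v)
            indeg[v] += 1
        level = [0] * n
        q = deque(v for v in range(n) if indeg[v] == 0)     # roots are level 0
        while q:
            u = q.popleft()
            for v in succ[u]:
                level[v] = max(level[v], level[u] + 1)
                indeg[v] -= 1
                if indeg[v] == 0:                           # all parents processed
                    q.append(v)
        width = defaultdict(int)
        for lvl in level:
            width[lvl] += 1
        return max(width.values())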
Final Edit
With the last explanations you gave, Maximal independent set of tasks is what you are looking for. To solve this problem see this article.
I have not tested the algorithm, but my proposal would be the following:
Start from the origin node.
Select each connected edge. Current concurrency is the number of selected edges. Remember that.
Sort the nodes reached by the selected edges by their number of outgoing edges. Ignore all nodes that have incoming edges which weren't selected yet.
Go down the edge to the node with the most outgoing edges.
If not at end node: Repeat from 2)
Get the maximum of current concurrency for all iterations.
Here is an implementation in Python using networkx (the code itself is not included here). The document you have linked does something different. It calculates the number of concurrent tasks when the graph is executed with the timings attached to the nodes (1 for each node in that case). This is an easy task and probably the one the author of the document refers to. My algorithm, however, calculates the theoretical maximum and does not take the running time of each task into account.

Given a node network, how to find the highest scoring loop with finite number of moves?

For a project of mine, I'm attempting to create a solver that, given a random set of weighted nodes with weighted paths, will find the highest scoring path with a finite number of moves. I've created a visual to help describe the problem.
This example has all the connection edges shown for completeness. The number on edges are traversal costs and numbers inside nodes are scores. A node is only counted when traversed to and cannot traverse to itself from itself.
As you can see from the description in the image, there is a start/finish node with randomly placed nodes that each have an arbitrary score. Every node is connected to all other nodes, and every connection has an arbitrary weight that subtracts from the total number of move units remaining. For simplicity, you could assume the weight of a connection is a function of distance. Nodes can be traveled to more than once, and their score is applied again. The goal is to find a loop path that has the highest score for the given move limit.
The solver will never be dealing with more than 30 nodes, usually dealing with 10-15 nodes. I still need to try and make it as fast as possible.
Any ideas on algorithms or methods that would help me solve this problem other than pure brute force methods?
Here's an O(m n^2)-time algorithm, where m is the number of moves and n is the number of nodes.
For every time t in {0, 1, ..., m} and every node v, compute the maximum score of a t-step walk that begins at the start node and ends at v as follows. If t = 0, then there's only one walk, namely doing nothing at the start node, so the maximum for (0, v) is 0 if v is the start node and -infinity (i.e., impossible) otherwise.
For t > 0, we use the entries for t - 1 to compute the entries for t. To compute the (t, v) entry, we add the score for v to the difference of the maximum over all nodes w of the (t - 1, w) entry minus the transition penalty from w to v. In other words, an optimal t-step walk to v consists of a step from some node w to v preceded by a (t - 1)-step walk to w, and this (t - 1)-step walk must be optimal because history does not influence future scoring.
At the end, we look at the (m, start node) entry. Recovering the actual walk involves working backward, repeatedly determining which w was the best node to have come from.
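A compact sketch of this dynamic program in Python; the scores/penalty names and matrix input format are my assumptions, and, following the answer, edge costs are charged against the score while m counts moves.

    NEG = float('-inf')

    def best_closed_walk(scores, penalty, start, m):
        """scores[v]: score collected on arriving at node v; penalty[w][v]: cost of
        moving from w to v (subtracted from the total); m: number of moves.
        Returns (best_score, walk) for an m-move walk that starts and ends at start."""
        n = len(scores)
        dp = [[NEG] * n for _ in range(m + 1)]
        prev = [[None] * n for _ in range(m + 1)]      # predecessor, for reconstruction
        dp[0][start] = 0                               # the only 0-step walk
        for t in range(1, m + 1):
            for v in range(n):
                for w in range(n):
                    if w == v or dp[t - 1][w] == NEG:  # no self-moves, skip unreachable states
                        continue
                    cand = dp[t - 1][w] - penalty[w][v] + scores[v]
                    if cand > dp[t][v]:
                        dp[t][v] = cand
                        prev[t][v] = w
        if dp[m][start] == NEG:
            return NEG, None                           # no m-move loop exists
        walk, v = [start], start                       # work backward from (m, start)
        for t in range(m, 0, -1):
            v = prev[t][v]
            walk.append(v)
        walk.reverse()
        return dp[m][start], walk

If returning to the start in fewer moves is acceptable, take the best dp[t][start] over all t <= m instead of only t = m.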

Minimum number of days required to solve a list of questions

There are N problems numbered 1..N which you need to complete. You've arranged the problems in increasing difficulty order, and the ith problem has estimated difficulty level i. You have also assigned a rating vi to each problem. Problems with similar vi values are similar in nature. On each day, you will choose a subset of the problems and solve them. You've decided that each subsequent problem solved on the day should be tougher than the previous problem you solved on that day. Also, to not make it boring, consecutive problems you solve should differ in their vi rating by at least K. What is the least number of days in which you can solve all problems?
Input:
The first line contains the number of test cases T. T test cases follow. Each case contains integers N and K on the first line, followed by integers v1,...,vn on the second line.
Output:
Output T lines, one for each test case, containing the minimum number of days in which all problems can be solved.
Constraints:
1 <= T <= 100
1 <= N <= 300
1 <= vi <= 1000
1 <= K <= 1000
Sample Input:
2
3 2
5 4 7
5 1
5 3 4 5 6
Sample Output:
2
1
This is one of the challenges from InterviewStreet.
Below is my approach
Start from the 1st question, find the maximum possible number of questions that can be solved on one day, and remove these questions from the question list. Then start again from the first element of the remaining list, and repeat until the question list is empty.
I am getting a wrong answer with this method, so I am looking for an algorithm to solve this challenge.
Construct a DAG of problems in the following way. Let pi and pj be two different problems. Then we will draw a directed edge from pi to pj if and only if pj can be solved directly after pi on the same day, consecutively. Namely, the following conditions have to be satisfied:
i < j, because you should solve the less difficult problem earlier.
|vi - vj| >= K (the rating requirement).
Now notice that each subset of problems chosen to be solved on some day corresponds to a directed path in that DAG. You choose your first problem and then follow the edges step by step; each edge in the path corresponds to a pair of problems solved consecutively on the same day. Also, each problem can be solved only once, so any node in our DAG may appear in exactly one path. And you have to solve all the problems, so these paths should cover the whole DAG.
Now we have the following problem: given a DAG of n nodes, find the minimal number of vertex-disjoint directed paths that cover this DAG completely. This is a well-known problem called path cover. Generally speaking, it is NP-hard; however, our directed graph is acyclic, and for acyclic graphs it can be solved in polynomial time using a reduction to the matching problem. The maximum matching problem, in its turn, can be solved using the Hopcroft-Karp algorithm, for example. The exact reduction method is easy and can be read, say, on Wikipedia. For each directed edge (u, v) of the original DAG one should add an undirected edge (au, bv) to the bipartite graph, where {ai} and {bi} are two parts of size n.
The number of nodes in each part of the resulting bipartite graph is equal to the number of nodes in the original DAG, n. We know that the Hopcroft-Karp algorithm runs in O(n^2.5) in the worst case, and 300^2.5 ≈ 1,558,845. For 100 tests this algorithm should take under 1 second in total.
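For illustration, here is a compact Python sketch of the whole reduction. It relies on the standard fact that the minimum path cover of a DAG with n nodes equals n minus the size of a maximum matching in this bipartite graph, and for brevity it uses the simpler Kuhn augmenting-path algorithm rather than Hopcroft-Karp.

    def min_days(v, K):
        """v: ratings in difficulty order; K: required rating gap between problems
        solved consecutively on a day.  Builds the DAG described above and returns
        the minimum path cover, i.e. n minus a maximum bipartite matching."""
        n = len(v)
        # arc i -> j iff problem j can directly follow problem i on the same day
        adj = [[j for j in range(i + 1, n) if abs(v[i] - v[j]) >= K] for i in range(n)]
        match = [-1] * n                           # match[j] = i means a_i is matched to b_j

        def try_augment(i, seen):
            for j in adj[i]:
                if not seen[j]:
                    seen[j] = True
                    if match[j] == -1 or try_augment(match[j], seen):
                        match[j] = i
                        return True
            return False

        matching = sum(try_augment(i, [False] * n) for i in range(n))
        return n - matching

On the two sample cases, min_days([5, 4, 7], 2) returns 2 and min_days([5, 3, 4, 5, 6], 1) returns 1.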
The algorithm is simple. First, sort the problems by v_i, then, for each problem, find the number of problems in the interval (v_i-K, v_i]. The maximum of those numbers is the result. The second phase can be done in O(n), so the most costly operation is sorting, making the whole algorithm O(n log n). Look here for a demonstration of the work of the algorithm on your data and K=35 in a spreadsheet.
Why does this work
Let's reformulate the problem to the problem of graph coloring. We create graph G as follows: vertices will be the problems and there will be an edge between two problems iff |v_i - v_j| < K.
In such a graph, independent sets exactly correspond to sets of problems doable on the same day. (<=) If the set can be done on a day, it is surely an independent set. (=>) If the set doesn't contain two problems violating the K-difference criterion, you can just sort them according to difficulty and solve them in this order. Both conditions will be satisfied this way.
Therefore, it easily follows that colorings of graph G exactly correspond to schedules of the problems on different days, with each color corresponding to one day.
So we want to find the chromatic number of graph G. This will be easy once we recognize that the graph is an interval graph, which is a perfect graph; those have chromatic number equal to clique number, and both can be found by a simple algorithm.
Interval graphs are graphs of intervals on the real line, with edges between intervals that intersect. Our graph is an interval graph: for each problem, assign the interval (v_i-K, v_i]; the edges of this interval graph are then exactly the edges of our graph.
Lemma 1: In an interval graph, there exists a vertex whose neighbors form a clique.
The proof is easy. Take the interval with the lowest upper bound (or highest lower bound) of all. Any interval intersecting it has a higher upper bound; therefore the upper bound of the first interval is contained in the intersection of them all. Therefore they intersect each other and form a clique. QED
Lemma 2: In a family of graphs closed under induced subgraphs and having the property from Lemma 1 (existence of a vertex whose neighbors form a clique), the following algorithm produces a minimal coloring:
Find the vertex x, whose neighbors form a clique.
Remove x from the graph, making its subgraph G'.
Color G' recursively
Color x by the least color not found on its neighbors
Proof: In (3), the algorithm produces an optimal coloring of the subgraph G' by the induction hypothesis and the closedness of our family under induced subgraphs. In (4), the algorithm only chooses a new color n if there is a clique of size n-1 among the neighbors of x. That means that, together with x, there is a clique of size n in G, so its chromatic number must be at least n. Therefore, the color given by the algorithm to a vertex is always <= the chromatic number of G, which means the coloring is optimal. (Obviously, the algorithm produces a valid coloring.) QED
Corollary: Interval graphs are perfect (perfect <=> chromatic number == clique number).
So we just have to find the clique number of G. That is easy for interval graphs: you just process the segments of the real line not containing interval boundaries and count the number of intervals intersecting each of them, which is even easier in your case, where the intervals have uniform length. This leads to the algorithm outlined at the beginning of this post.
Do we really need to go to path cover? Can't we just follow a strategy similar to LIS?
The input is in increasing order of complexity. We just have to maintain a bunch of queues for the tasks to be performed each day. Every element of the input is assigned to a day by comparing it with the last elements of all queues: wherever we find a difference of at least K, we append the task to that list.
For ex: 5 3 4 5 6
1) Input -> 5 (Empty lists so start a new one)
5
2) 3 (only list 5 & abs(5-3) is 2 (k) so append 3)
5--> 3
3) 4 (the only list ends with 3, and abs(3-4) < k, so start a new list)
5--> 3
4
4) 5 again (abs(3-5)=k append)
5-->3-->5
4
5) 6 again (abs(5-6) < k but abs(4-6) = k, so append to the second list)
5-->3-->5
4-->6
We just need to maintain an array with the last element of each list. Since the order of the days (when the tasks are to be done) is not important, we can keep the last tasks in a sorted list; then searching for a place to insert the new task v_i is just looking for a value differing from it by at least K, which can be done via binary search.
Complexity:
The loop runs over N elements. In the worst case, we may end up doing a binary search (log i) for many input[i].
Therefore, T(N) < O(log N!) = O(N log N). One can check that the lower bound matches as well, so the complexity is Theta(N log N).
The minimum number of days required will be the same as the length of the longest path in the complementary (undirected) graph of G (DAG). This can be solved using Dilworth's theorem.
