How to calculate maximal parallelism in a DAG? - algorithm

Given a DAG (directed acyclic graph), how does one calculate the maximal parallelism?
Instantaneous parallelism is the maximum number of processors that can be kept busy at each point in execution of algorithm; the maximal parallelism is the highest instantaneous parallelism.
Put another way, given a DAG representing a dependency graph of tasks, what is the minimum number of processors/threads such that no task is ever blocked?
The closest approach I found here is:
apply a topological sort on the DAG
traverse over the nodes by the topological order, calculate the minimum level:
no parents: 0
otherwise: minimum parent level + 1
return the max level width (max num of nodes assigned the same level)
This algorithm worked for me on several samples, however doesn't work on a tree. E.g.:
o 0
/ \
o 1 o 1
/ \
o 2 o 2
/ \
o 3 o 3
According to the algorithm above, max width is 2, but clearly max parallelism in a tree is the number of leafs, 4 in the example above.
A similar approach is partially described here (see slide titled Computing critical path etc., which describes how to calculate earliest start times of nodes and that "maximal...parallelism can easily be computed from this").
Edit 1:
#AliSoltani's solution to use BFS to find the length of the critical path and that is the max parallelism degree is incorrect, since it only applies to a subset of examples, mainly trees in which the number of leafs is equal to the longest path. Here's an illustration of a case where this wouldn't work:
Edit 2:
#AliSultani's 2nd solution using BFS to find the level with maximum number of nodes, and set that max as the max parallelism, is also incorrect, as it doesn't take into account cases where nodes from different levels may run concurrently. See this counterexample:

This problem is reducible to the Maximum Directed Cut problem.
Let's build an auxiliary DAG from the original one.
For every vertex u[i] of the original graph add vertexes v[i] and w[i] to the new graph, and connect them using an edge (v[i], w[i]) with a cost 1.
For every edge (u[i], u[j]) of the original graph add an edge (w[i], v[j]) with a cost 0 to the new graph.
Now the problem is equivalent to finding the maximum directed cut in the auxiliary graph.
Example:

You should find critical path length in DAG. A critical path is a directed
path that has the maximum execution requirement among all other paths in DAG. critical path length in DAG with n node has n node. So maximal parallelism is n.
Critical path is longest path from root to leaf (in DAG) and for find it you can use BFS algorithm (Breath First Search).
Example 1
BFS order in this tree is O(|V|+|E|). This is optimal solution for this problem.
Edit: Find maximum degree of concurrency by BFS
You can determine the maximum degree of concurrency by running the breadth-first search algorithm too:
The algorithm starts from the root node and proceeds towards the
leafs level-wise.
before inspecting nodes located on the next level it explores all of
the nodes belonging to the same level.
Count the number of nodes on each level and update a variable holding
the maximum number of nodes per level.
Example 2 (Step by step)
So in this example maximum degree of concurrency is 4.
Final Edit
With the last explanations you gave, Maximal independent set of tasks is what you are looking for. To solve this problem see this article.

I have not tested the algorithm, but my proposal would be the following:
Start from the origin node.
Select each connected edge. Current concurrency is the number of selected edges. Remember that.
Sort the selected nodes which are connected by the edges by the number of outgoing edges. Ignore all nodes, which have incoming edges which weren't yet selected.
Start going down the edge with the node with the most outgoing edges.
If not at end node: Repeat from 2)
Get the maximum of current concurrency for all iterations.
Here is an implementation in python using networkx. The document you have linked does something different. It calculates the number of concurrent tasks when the graph is executed with the attached timings to the nodes (1 for each node in that case). This is an easy tasks and probably the one the author of the document refers to. My algorithm however calculates the theoretical maximum and does not take the running time of each task into account.

Related

Good algorithm for finding shortest path for specific vertices

I'm solving the problem described below and can't think of a better algorithm than trying every permutation of every vertex of every group with every.
I'm given a graph of vertices, along with a list of groups of specific vertices, the goal is to find the shortest path from a specific starting vertex to a specific ending vertex, and the path must pass through at least one vertex from each specified group of vertices.
There are also vertices in the graph that are not part of any given group.
Re-visiting vertices and edges is possible.
The graph data is specified as follows:
Vertex list - each vertex is identified by a sequence number (0 to the number of vertices -1 )
Edge list - list of vertex pairs (by vertex number)
Vertex group list - list of lists of vector numbers
A specific starting and ending vertex.
I would be grateful for any ideas for a better solution, thank you.
Summary:
We can use bitmasks to efficiently check which groups we have visited so far, and combine this with a traditional BFS/ Dijkstra's shortest-path algorithm.
If we assume E edges, V vertices, and K vertex-groups that have to be included, the below algorithm has a time complexity of O((V + E) * 2^K) and a space complexity of O(V * 2^K). The exponential 2^K term means it will only work for a relatively small K, say up to 10 or 20.
Details:
First, are the edges weighted?
If yes then a "shortest path" algorithm will usually be a variation of Dijkstra's algorithm, in which we keep a (min) priority queue of the shortest paths. We only visit a node once it's at the top of the queue, meaning that this must be the shortest path to this node. Any other shorter path to this node would already have been added to the priority queue and would come before the current iteration. (Note: this doesn't work for negative paths).
If no, meaning all edges have the same weight, then there is no need to maintain a priority queue with the shortest edges. We can instead just run a regular Breadth-first search (BFS), in which we maintain a deque with all nodes at the current depth. At each step we iterate over all nodes at the current depth (popping them from the left of the deque), and for each node we add all it's not-yet-visited neighbors to the right side of the deque, forming the next level.
The below algorithm works for both BFS and Dijkstra's, but for simplicity's sake for the rest of the answer I'll pretend that the edges have positive weights and we will use Dijkstra's. What is important to take away though is that for either algorithm we will only "visit" or "explore" a node for a path that must be the shortest path to that node. This property is essential for the algorithm to be efficient, since we know that we will at most visit each of the V nodes and E edges only one time, giving us a time complexity of O(V + E). If we use Dijkstra's we have to multiply this with log(V) for the priority queue usage (this also applies to the time complexity mentioned in the summary).
Our Problem
In our case we have the additional complexity that we have K vertex-groups, for each of which our shortest path has to contain at least one the nodes in it. This is a big problem, since it destroys our ability to simple go along with the "shortest current path".
See for example this simple graph. Notation: -- means an edge, start is that start node, and end is the end node. A vertex with value 0 does not have a vertex-group, and a vertex with value >= 1 belongs to the vertex-group of that index.
end -- 0 -- 2 -- start -- 1 -- 2
It is clear that the optimal path will first move right to the node in group 1, and then move left until the end. But this is impossible to do for the BFS and Dijkstra's algorithm we introduced above! After we move from the start to the right to capture the node in group 1, we would never ever move back left to the start, since we have already been there with a shorter path.
The Trick
In the above example, if the right-hand side would have looked like start -- 0 -- 0, where 0 means the vertex does not not belonging to a group, then it would be of no use to go there and back to the start.
The decisive reason of why it makes sense to go there and come back, although the path will get longer, is that it includes a group that we have not seen before.
How can we keep track of whether or not at a current position a group is included or not? The most efficient solution is a bit mask. So if we for example have already visited a node of group 2 and 4, then the bitmask would have a bit set at the position 2 and 4, and it would have the value of 2 ^ 2 + 2 ^ 4 == 4 + 16 == 20
In the regular Dijkstra's we would just keep a one-dimensional array of size V to keep track of what the shortest path to each vertex is, initialized to a very high MAX value. array[start] begins with value 0.
We can modify this method to instead have a two-dimensional array of dimensions [2 ^ K][V], where K is the number of groups. Every value is initialized to MAX, only array[mask_value_of_start][start] begins with 0.
The value we store at array[mask][node] means Given the already visited groups with bit-mask value of mask, what is the length of the shortest path to reach this node?
Suddenly, Dijkstra's resurrected
Once we have this structure, we can suddenly use Dijkstra's again (it's the same for BFS). We simply change the rules a bit:
In regular Dijkstra's we never re-visit a node
--> in our modification we differentiate by mask and never re-visit a node if it's already been visited for that particular mask.
In regular Dijkstra's, when exploring a node, we look at all neighbors and only add them to the priority queue if we managed to decrease the shortest path to them.
--> in our modification we look at all neighbors, and update the mask we use to check for this neighbor like: neighbor_mask = mask | (1 << neighbor_group_id). We only add a {neighbor_mask, neighbor} pair to the priority queue, if for that particular array[neighbor_mask][neighbor] we managed to decrease the minimal path length.
In regular Dijkstra's we only visit unexplored nodes with the current shortest path to it, guaranteeing it to be the shortest path to this node
--> In our modification we only visit nodes that for their respective mask values are not explored yet. We also only visit the current shortest path among all masks, meaning that for any given mask it must be the shortest path.
In regular Dijkstra's we can return once we visit the end node, since we are sure we got the shortest path to it.
--> In our modification we can return once we visit the end node for the full mask, meaning the mask containing all groups, since it must be the shortest path for the full mask. This is the answer to our problem.
If this is too slow...
That's it! Because time and space complexity are exponentially dependent on the number of groups K, this will only work for very small K (of course depending on the number of nodes and edges).
If this is too slow for your requirements then there might be a more sophisticated algorithm for this that someone smarter can come up with, it will probably involve dynamic programming.
It is very possible that this is still too slow, in which case you will probably want to switch to some heuristic, that sacrifices accuracy in order to gain more speed.

How to find widest paths collection on a directed weighted graph

Consider the following graph:
nodes 1 to 6 are connected with a transition edge that have a direction and a volume property (red numbers). I'm looking for the right algorithm to find paths with a high volume. In the above example the output should be:
Path: [4,5,6] with a minimal volume of 17
Path: [1,2,3] with a
minimal volume of 15
I've looked at Floyd–Warshall algorithm but I'm not sure it's the right approach.
Any resources, comments or ideas would be appreciated.
Finding a beaten graph:
In the comments, you clarify that you are looking for "beaten" paths. I am assume this means that you are trying to contrast the paths with the average; for instance, looking for paths which can support weight at least e*w, where 0<e and w is the average edge weight. (You could have any number of contrast functions here, but the function you choose does not affect the algorithm.)
Then the algorithm to find all paths that meet this condition is incredibly simple and only takes O(m) time:
Loop over all edges to find the average weight. (Takes O(m) time.)
Calculate the threshold based on the average. (Takes O(1) time.)
Remove all edges which do not support the threshold weight. (Takes O(m) time.)
Any path in the resulting graph will be a member of the "widest path collection."
Example:
Consider that e=1.5. That is, you require that a beaten path support at least 1.5x the average edge weight. Then in graph you provided, you will loop over all the edges to find their average weight, and multiply this by e:
((20+4)+15+3+(2+20)+(1+1+17))/9 = 9.2
9.2*1.5 = 13.8
Then you loop over all edges, removing any that have weight less than 13.8. Any remaining paths in the graph are "beaten" paths.
Enumerating all beaten paths:
If you then want to find the set of beaten paths with maximal length (that is, they are not "parts" of paths), the modified graph is must be a DAG (because a cycle can be repeated infinite times). If it is a DAG, you can find the set of all maximal paths by:
In your modified graph, select the set of all source nodes (no incoming edges).
From each of these source nodes, perform a DFS (allowing repeated visits to the same node).
Every time you get to a sink node (no outgoing edges), write down the path that you took to get here.
This will take up to O(IncompleteGamma[n,1]) time (super exponential), depending on your graph. That is, it is not very feasible.
Finding the widest paths:
An actually much simpler task is to find the widest paths between every pair of nodes. To do this:
Start from the modified graph.
Run Floyd-Warshall's, using pathWeight(i,j,k+1) = max[pathWeight(i,j,k), min[pathWeight(i,k+1,k), pathWeight(k+1,j,k)]] (that is, instead of adding the weights of two paths, you take the minimum volume they can support).

how to partition the nodes of an undirected graph into k sets

I have an undirected graph G=(V,E) where each vertex represents a router in a large network. Each edge represents a network hop from one router to the other therefore, all edges have the same weight. I wish to partition this network of routers into 3 or k different sets clustered by Hop count.
Motivation:
The idea is to replicate some data in routers contained in each of these 3 sets. This is so that whenever a node( or client or whatever) in the network graph requests for a certain data item, I can search for it in the 3 sets and get a responsible node(one that has cached that particular data) from each set. Then I'd select the node which is at the minimum hop count away from the requesting node and continue with my algorithms and tests.
The cache distribution and request response are not in the scope of this question. I just need a way to partition the network into 3 sets so that I can perform the above operations on it.
Which clustering algorithm could be used in such a scenario. I have almost 9000 nodes in the graph and I wish to get 3 sets of ~3000 nodes each
In the graph case, a clustering method based on minimum spanning trees can be used.
The regular algorithm is the following:
Find the minimum spanning tree of the graph.
Remove the k - 1 longest edges in the spanning tree, where k is the desired number of clusters.
However, this works only if the edges differ in length (or weight). In the case of edges of equal length, every spanning tree is a minimum one so this would not work well. However, putting a little thinking into it, a different algorithm came to my mind which uses BFS.
The algorithm:
1. for i = 1..k do // for each cluster
2. choose the number of nodes N in cluster i
3. choose an arbitrary node n
4. run breadth-first search (BFS) from n until N
5. assign the first N nodes (incl. n) tapped by the BFS to the i-th cluster
6. remove these nodes (and the incident edges) from the graph
7. done
This algorithm (the results) hugely depends on how step 3, i.e. choosing the "root" node of a cluster, is implemented. For the sake of simplicity I choose an arbitrary node, but it could be more sophisticated. The best nodes are those that are the at the "end" of the graph. You could find a center of the graph (a node that has the lowest sum of lengths of paths to all other nodes) and then use the nodes that are the furthes from this center.
The real issue is that your edges are equal (if I understood your problem statement well) and you have no information about the nodes (i.e. their coordinates - then you could use e.g. k-means).

How to detect if the given graph has a cycle containing all of its nodes? Does the suggested algorithm have any flaws?

I have a connected, non-directed, graph with N nodes and 2N-3 edges. You can consider the graph as it is built onto an existing initial graph, which has 3 nodes and 3 edges. Every node added onto the graph and has 2 connections with the existing nodes in the graph. When all nodes are added to the graph (N-3 nodes added in total), the final graph is constructed.
Originally I'm asked, what is the maximum number of nodes in this graph that can be visited exactly once (except for the initial node), i.e., what is the maximum number of nodes contained in the largest Hamiltonian path of the given graph? (Okay, saying largest Hamiltonian path is not a valid phrase, but considering the question's nature, I need to find a max. number of nodes that are visited once and the trip ends at the initial node. I thought it can be considered as a sub-graph which is Hamiltonian, and consists max. number of nodes, thus largest possible Hamiltonian path).
Since i'm not asked to find a path, I should check if a hamiltonian path exists for given number of nodes first. I know that planar graphs and cycle graphs (Cn) are hamiltonian graphs (I also know Ore's theorem for Hamiltonian graphs, but the graph I will be working on will not be a dense graph with a great probability, thus making Ore's theorem pretty much useless in my case). Therefore I need to find an algorithm for checking if the graph is cycle graph, i.e. does there exist a cycle which contains all the nodes of the given graph.
Since DFS is used for detecting cycles, I thought some minor manipulation to the DFS can help me detect what I am looking for, as in keeping track of explored nodes, and finally checking if the last node visited has a connection to the initial node. Unfortunately
I could not succeed with that approach.
Another approach I tried was excluding a node, and then try to reach to its adjacent node starting from its other adjacent node. That algorithm may not give correct results according to the chosen adjacent nodes.
I'm pretty much stuck here. Can you help me think of another algorithm to tell me if the graph is a cycle graph?
Edit
I realized by the help of the comment (thank you for it n.m.):
A cycle graph consists of a single cycle and has N edges and N vertices. If there exist a cycle which contains all the nodes of the given graph, that's a Hamiltonian cycle. – n.m.
that I am actually searching for a Hamiltonian path, which I did not intend to do so:)
On a second thought, I think checking the Hamiltonian property of the graph while building it will be more efficient, which is I'm also looking for: time efficiency.
After some thinking, I thought whatever the number of nodes will be, the graph seems to be Hamiltonian due to node addition criteria. The problem is I can't be sure and I can't prove it. Does adding nodes in that fashion, i.e. adding new nodes with 2 edges which connect the added node to the existing nodes, alter the Hamiltonian property of the graph? If it doesn't alter the Hamiltonian property, how so? If it does alter, again, how so? Thanks.
EDIT #2
I, again, realized that building the graph the way I described might alter the Hamiltonian property. Consider an input given as follows:
1 3
2 3
1 5
1 3
these input says that 4th node is connected to node 1 and node 3, 5th to node 2 and node 3 . . .
4th and 7th node are connected to the same nodes, thus lowering the maximum number of nodes that can be visited exactly once, by 1. If i detect these collisions (NOT including an input such as 3 3, which is an example that you suggested since the problem states that the newly added edges are connected to 2 other nodes) and lower the maximum number of nodes, starting from N, I believe I can get the right result.
See, I do not choose the connections, they are given to me and I have to find the max. number of nodes.
I think counting the same connections while building the graph and subtracting the number of same connections from N will give the right result? Can you confirm this or is there a flaw with this algorithm?
What we have in this problem is a connected, non-directed graph with N nodes and 2N-3 edges. Consider the graph given below,
A
/ \
B _ C
( )
D
The Graph does not have a Hamiltonian Cycle. But the Graph is constructed conforming to your rules of adding nodes. So searching for a Hamiltonian Cycle may not give you the solution. More over even if it is possible Hamiltonian Cycle detection is an NP-Complete problem with O(2N) complexity. So the approach may not be ideal.
What I suggest is to use a modified version of Floyd's Cycle Finding algorithm (Also called the Tortoise and Hare Algorithm).
The modified algorithm is,
1. Initialize a List CYC_LIST to ∅.
2. Add the root node to the list CYC_LIST and set it as unvisited.
3. Call the function Floyd() twice with the unvisited node in the list CYC_LIST for each of the two edges. Mark the node as visited.
4. Add all the previously unvisited vertices traversed by the Tortoise pointer to the list CYC_LIST.
5. Repeat steps 3 and 4 until no more unvisited nodes remains in the list.
6. If the list CYC_LIST contains N nodes, then the Graph contains a Cycle involving all the nodes.
The algorithm calls Floyd's Cycle Finding Algorithm a maximum of 2N times. Floyd's Cycle Finding algorithm takes a linear time ( O(N) ). So the complexity of the modied algorithm is O(N2) which is much better than the exponential time taken by the Hamiltonian Cycle based approach.
One possible problem with this approach is that it will detect closed paths along with cycles unless stricter checking criteria are implemented.
Reply to Edit #2
Consider the Graph given below,
A------------\
/ \ \
B _ C \
|\ /| \
| D | F
\ / /
\ / /
E------------/
According to your algorithm this graph does not have a cycle containing all the nodes.
But there is a cycle in this graph containing all the nodes.
A-B-D-C-E-F-A
So I think there is some flaw with your approach. But suppose if your algorithm is correct, it is far better than my approach. Since mine takes O(n2) time, where as yours takes just O(n).
To add some clarification to this thread: finding a Hamiltonian Cycle is NP-complete, which implies that finding a longest cycle is also NP-complete because if we can find a longest cycle in any graph, we can find the Hamiltonian cycle of the subgraph induced by the vertices that lie on that cycle. (See also for example this paper regarding the longest cycle problem)
We can't use Dirac's criterion here: Dirac only tells us minimum degree >= n/2 -> Hamiltonian Cycle, that is the implication in the opposite direction of what we would need. The other way around is definitely wrong: take a cycle over n vertices, every vertex in it has exactly degree 2, no matter the size of the circle, but it has (is) an HC. What you can tell from Dirac is that no Hamiltonian Cycle -> minimum degree < n/2, which is of no use here since we don't know whether our graph has an HC or not, so we can't use the implication (nevertheless every graph we construct according to what OP described will have a vertex of degree 2, namely the last vertex added to the graph, so for arbitrary n, we have minimum degree 2).
The problem is that you can construct both graphs of arbitrary size that have an HC and graphs of arbitrary size that do not have an HC. For the first part: if the original triangle is A,B,C and the vertices added are numbered 1 to k, then connect the 1st added vertex to A and C and the k+1-th vertex to A and the k-th vertex for all k >= 1. The cycle is A,B,C,1,2,...,k,A. For the second part, connect both vertices 1 and 2 to A and B; that graph does not have an HC.
What is also important to note is that the property of having an HC can change from one vertex to the other during construction. You can both create and destroy the HC property when you add a vertex, so you would have to check for it every time you add a vertex. A simple example: take the graph after the 1st vertex was added, and add a second vertex along with edges to the same two vertices of the triangle that the 1st vertex was connected to. This constructs from a graph with an HC a graph without an HC. The other way around: add now a 3rd vertex and connect it to 1 and 2; this builds from a graph without an HC a graph with an HC.
Storing the last known HC during construction doesn't really help you because it may change completely. You could have an HC after the 20th vertex was added, then not have one for k in [21,2000], and have one again for the 2001st vertex added. Most likely the HC you had on 23 vertices will not help you a lot.
If you want to figure out how to solve this problem efficiently, you'll have to find criteria that work for all your graphs that can be checked for efficiently. Otherwise, your problem doesn't appear to me to be simpler than the Hamiltonian Cycle problem is in the general case, so you might be able to adjust one of the algorithms used for that problem to your variant of it.
Below I have added three extra nodes (3,4,5) in the original graph and it does seem like I can keep adding new nodes indefinitely while keeping the property of Hamiltonian cycle. For the below graph the cycle would be 0-1-3-5-4-2-0
1---3---5
/ \ / \ /
0---2---4
As there were no extra restrictions about how you can add a new node with two edges, I think by construction you can have a graph that holds the property of hamiltonian cycle.

Destroy weighted edges in a bidirected graph

On my recent interview I was asked the following question:
There is a bidirectional graph G with no cycles. Each edge has some weight. Also there is a set of nodes K which should be disconnected from each other (by removing some edges). There is only one path between any two nodes in K set. The goal is to minimize total weight of removed edges and disconnect all nodes (from set K) from each other.
My approach was to run BFS for each node from K set and determine all paths between all pairs of nodes from K. So then I'll have set of paths (each path is a set of edges).
My next step is to apply dynamic programming approach to find minimum total weight of removed edges. And here I stuck. Could you please help me (just direct me) of how DP should be done.
Thanks.
This sounds like the Multiway Cut problem in trees, assuming a "bidirectional" graph is just like an undirected one. It can be solved in polynomial time by a straightforward dynamic programming. See Chopra and Rao: "On the multiway cut polyhedron", Networks 21(1):51–89, 1991. See also this question.
This problem can be solved using disjoint Sets. In this problem you need to make set of each vertex like forest. Then, join the vertices which belong to two different sets through the edges which are in the graph such that -
1) if one of the set contains a node which is mentioned in the k set of nodes, then the node becomes the representative element.
2) if both the sets contain the unwanted node (node present in the k set of nodes), then find the minimum weight edge in both the sets and compare them, then compare the minimum one of them with the edge to be joined and find the minimum. Then delete the edge which we found so that no path exists b/w them.
3) if none of the set contain the unwanted node, then join one set to the other set.
In this way you find the minimum total weight of the destroyed edges in a very good time complexity of the order of O(nlogn).
Your approach will also work but it will prove to be costly in terms of time complexity.
Here is the complete code - (GameAlgorithm.cpp)
https://github.com/KARTHEEKCIC/RoboAttack

Resources