I am building a dependency based scheduler where my tasks/nodes form a directed acyclic graph. I have the following constraints and am trying to decide on the most appropriate algorithm:
New tasks can be added (with or without dependencies) at any time
Some tasks can run in parallel
Two algorithms are mentioned repeatedly with regards to topological sorting; depth first search and Kahn's algorithm.
What are the pro's and con's of these two algorithms?
Is one algorithm objectively better for my scenario?
Is there an alternate that better fits my scenario?
I have one further question about vocabulary. Given dependencies such as:
c->b
b->a
e->d
Is this considered to be a single directed acyclic graph, 2 acyclic graphs (since e and d are not dependent on the other tasks) or an acyclic graph with sub acyclic graphs?
I think you misunderstood these algorithms.
Depth First Search:
So the Depth First Search algorithm (I looked it up on wikipedia. There it is actually called an algorithm. I would rather call it a strategy) is an algorithm for traversing through a graph. So to visit all nodes. It will not return you the topological order of your graph!!
Kahn's algorithm: Kahn's algorithm on the other hand, is the right algorithm for your problem. It will return you the topoligical order if there is one. The algorithm has an asymptotic running time of O(m+n) where m is the amount of edges and n is the amount of vertices in your graph. I do not think, that you will find a better algorithm for that problem.
Vocabulary: In your example, you have one DAG (directed acyclic graph) with two weakly connected components.
EDIT:
After your mention of an algorithm based on Depth First Search:
For the asymptotic running time, it does not matter which algorithm you are using. Both are in O(m+n). Also concerning the storage, it actually should not matter.
Related
I want to know about the algorithms for finding block in graph. Which one works good for directed graph and which one for undirected? Also the performance, complexity of those algorithm.
I would like to know the difference between Boruvkas algorithm and Kruskals algorithm.
What they have in common:
both find the minimum spanning tree (MST) in a undirected graph
both add the shortest edge to the existing tree until the MST is found
both look at the graph in it`s entirety, unlike e.g. Prims algorithm, which adds one node after another to the MST
Both algorithmns are greedy
The only difference seems to be, that Boruvkas perspective is each individual node (from where it looks for the cheapest edge), instead of looking at the entire graph (like Kruskal does).
It therefore seems to be, that Boruvka should be relatively easy to do in parallel (unlike Kruskal). Is that true?
In case of Kruskal's algorithm, first of all we want to sort all edges from the cheapest to the most expensive ones. Then in each step we remove min-weight edge and if it doesn't create a cycle in our graph (which initially consits of |V|-1 separate vertices), then we add it to MST. Otherwise we just remove it.
Boruvka's algorithm looks for nearest neighbour of each component (initially vertex). It keeps selecting cheapest edge from each component and adds it to our MST. When we have only one connected component, it's done.
Finding cheapest outgoing edge from each node/component can be done easily in parallel. Then we can just merge new, obtained components and repeat finding phase till we find MST. That's why this algorithm is a good example for parallelism (in case of finding MST).
Regarding parallel processing using Kruskal's algorithm, we need to keep and check edges in strict order, that's why it's hard to achieve explicit parallelism. It's rather sequential and we can't do much about this (even if we still may consider e.g. parallel sorting). Although there were few approaches to implement this method in parallel way, those papers can be found easily to check their results.
Your description is accurate, but one detail can be clarified: Boruvka's algorithm's perspective is each connected component rather than each individual node.
Your intuition about parallelization is also right -- this paper has more details. Excerpt from the abstract:
In this paper we design and implement four parallel MST algorithms (three variations of Boruvka plus our new approach) for arbitrary sparse graphs that for the first time give speedup when compared with the best sequential algorithm.
The important difference between Boruvka's algorithm and Kruskal's or Prim's is that with Boruvka's you don't need to presort the edges or maintain a priority queue.
Boruvka's still incurs the extra log N factor in the cost, but it does it by requiring O(log N) passes over the edges.
You can parallelize Boruvka's algorithm, but you can also parallelize sorting, so I don't know if Boruvka's has any real advantages over Kruskal's in practice.
I was reading Introduction To Algorithms 3rd Edition. There are 3 methods given to solve the problem. My inquiry is about two of them.
The one with no name
The algorithm starts by topologically sorting the dag (see Section 22.4) to impose a linear ordering on the vertices. If the dag contains a path from vertex u to vertex v, then u precedes v in the topological sort. We make just one pass over the vertices in the topologically sorted order. As we process each vertex, we relax each edge that leaves the vertex.
Dijkstra's Algorithm
This is quite well known
As far as the book shows, time complexity of without name one is O(V+E) but of Dijstra's is O(ElogV). We cannot use Dijkstra's on negative weight but we can use the other. What are the advantages of using Dijkstra's Algorithm except it can be used in cyclic ones?
Because the first algorithm you give only works on acyclic graphs, whereas Dijkstra runs on graph with non-negative weight.
The limitations are not the same.
In real-world, many applications can be modelled as graphs with non-negative weights, that's why Dijkstra is so used. Plus, it is very simple to implement. The complexity of Dijkstra is higher because it relies on priority queue, but this does not mean it takes necessarily more time to execute. (nlog(n) time is not that bad, because log(n) is a relatively small number: log(10^80) = 266)
However, this stand for sparse graphs (low density of edges). For dense graphs, other algorithms may be more efficient.
I want to detect cycles in an undirected graph such that I can find the minimum spanning tree (in particular I want to use Kruskal algorithm). Since I want to parallelize the code, I was wondering which algorithm is the best, Depth-first search of union-find algorithm?
Thanks for any suggestions.
Of all three MST algorithms only Boruvka's MST algorithm is easily parallelizable whereas the kruskal and prims are sequential greedy algorithms hence there is minimum scope for parallel implementation of them.
Note: It is a research topic to achieve efficient parallel boruvka might find some papers
What I'm looking for is a comprehensive list of graph traversal algorithms, with brief descriptions of their purpose, as a jump off point for researching them. So far I'm aware of:
Dijkstra's - single-source shortest path
Kruskal's - finds a minimum spanning tree
What are some other well-known ones? Please provide a brief description of each algorithm to each of your answers.
the well knowns are :
Depth-first search http://en.wikipedia.org/wiki/Depth-first_search
Breadth-first search http://en.wikipedia.org/wiki/Breadth-first_search
Prim's algorithm http://en.wikipedia.org/wiki/Prim's_algorithm
Kruskal's algorithm http://en.wikipedia.org/wiki/Kruskal's_algorithm
Bellman–Ford algorithm http://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm
Floyd–Warshall algorithm http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm
Reverse-delete algorithm http://en.wikipedia.org/wiki/Reverse-Delete_algorithm
Dijkstra's_algorithm http://en.wikipedia.org/wiki/Dijkstra's_algorithm
network flow
Ford–Fulkerson algorithm http://en.wikipedia.org/wiki/Ford%E2%80%93Fulkerson_algorithm
Maximum Flow http://en.wikipedia.org/wiki/Maximum_flow_problem
A few off of the top of my head:
Depth-first and Breadth-first traversals, really just two different ways of touching all of the nodes.
Floyd-Warshall algorithm finds the shortest paths between any pair of points, in (big-theta)(v^3) time.
Prim's algorithm is an alternative to Kruskal's for MST.
There are also algorithms for finding fully connected components, which are groups of nodes where you can get from any member in the component to any other member. This only matters for "directed graphs", where you can traverse an edge only one direction.
Personally, I think the coolest extension of graph theory (not exactly related to your question, but if you're interested in learning more about graphs in general its certainly worth your while) is the concepts of "flow networks": http://en.wikipedia.org/wiki/Flow_network . It is a way of computing about, say, how much electricity can one distribute over houses with a variety of power needs and requirements, and a variety of power stations.
Graph algorithms
http://en.wikipedia.org/wiki/List_of_algorithms#Graph_algorithms
All algorithms in one place
http://en.wikipedia.org/wiki/List_of_algorithms
Dictionary of algorithms and data structures:
Dictionary: http://xlinux.nist.gov/dads/
By area: http://xlinux.nist.gov/dads/termsArea.html
By terms type: http://xlinux.nist.gov/dads/termsType.html
List of all implementations in diff. languages: http://xlinux.nist.gov/dads/termsImpl.html