What I mean is this: To know powers, I need to know multiplication, and to know multiplication, I need to know addition. So to know A I need to know B, or A depends on B. I can only think of a couple of rules: if A depends on B, B cannot depend on A. And if A depends on B, and B depends on C, C cannot depend on A.
Does this kind of data structure have a name? I don't think it is a hierarchical tree. And also, am I missing any other rule? I would like to implement a map of human knowledge in such a way that if I ask my database what I need to know to learn quantum physics, it gives me an ordered list of subjects on which quantum physics depends. Of course, this list could have some sublists that run in parallel, in the sense that A could depend on B and C without B depending on C or C depending on B. In this case B would be parallel to C, so graphically they could be displayed below A, both at the same height.
I'm pretty sure there are many other cases in which the same kind of structure is used.
Edit: How about a partially ordered set? Sorry, not trying to be picky, but it sounds to me like that formalizes the same thing without any unnecessary references to graphs.
Such dependency constraints are usually represented by a directed acyclic graph, or DAG for short.
A DAG is a graph which is
directed
...since each edge represents a dependency, and a dependency has a direction. If "A depends on B" you have A → B.
acyclic
...since (as you point out in your post) it is undesirable to have cyclic dependencies.
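To make the "ordered list of subjects" query concrete: once the dependencies form a DAG, a topological sort produces exactly such a list. A minimal sketch in Python using the standard library's `graphlib`; the prerequisite map and subject names are made up for illustration:

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: each subject maps to the subjects it depends on.
prereqs = {
    "powers": {"multiplication"},
    "multiplication": {"addition"},
    "addition": set(),
}

# TopologicalSorter raises CycleError if a dependency cycle sneaks in,
# enforcing the "no cyclic dependencies" rule automatically.
order = list(TopologicalSorter(prereqs).static_order())
print(order)  # dependencies come before the subjects that need them
```

Subjects that do not depend on each other (the "parallel" case from the question) may appear in either order; a topological sort only promises that every subject comes after its prerequisites.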
It’s simply a directed acyclic graph.
Yes: this structure is a directed acyclic graph (DAG).
Yup. The DAG (Directed acyclic graph) is most commonly used!
See here: https://en.wikipedia.org/wiki/Directed_acyclic_graph
This is often implemented using a graph.
Related
I have an application that uses a directed acyclic graph (DAG) to represent events ordered by time. My goal is to create or find an algorithm to simplify the graph by removing certain edges with specific properties. I'll try to define what I mean:
In the example below, a is the first node and f is the last. In the first picture, there are four distinct paths from a to f. If we isolate the paths between b and e, we have two alternative paths. The path that is a single edge, namely the edge between b and e, is the type of path I want to remove, leaving the graph in the second picture as a result.
Therefore, all the edges I want to remove are defined as: single edges between two nodes that are also connected by at least one other path of more than one edge.
I realize this might be a very specific kind of graph operation, but hoping this algorithm already exists out there, my question to Stack Overflow is: Is this a known graph operation, or should I get my hiney to the algorithm drawing board?
Like Matt Timmermans said in the comment: that operation is called a transitive reduction.
Thanks Matt!
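For anyone landing here, a naive sketch of transitive reduction on a small DAG (networkx also ships a ready-made `transitive_reduction`): an edge u→v is removed when v remains reachable from u without using that edge. The graph below is a guess at the question's example, where the direct b→e edge is redundant because of b→c→d→e.

```python
def transitive_reduction(adj):
    """Remove edge (u, v) when v is still reachable from u via a longer path."""
    def reachable(src, dst, skip_edge):
        # Depth-first search that ignores the single edge `skip_edge`.
        stack, seen = [src], {src}
        while stack:
            node = stack.pop()
            for nxt in adj[node]:
                if (node, nxt) == skip_edge or nxt in seen:
                    continue
                if nxt == dst:
                    return True
                seen.add(nxt)
                stack.append(nxt)
        return False

    reduced = {u: set(vs) for u, vs in adj.items()}
    for u in adj:
        for v in list(adj[u]):
            if reachable(u, v, (u, v)):
                reduced[u].discard(v)
    return reduced

# b→e is a single edge shadowed by the longer path b→c→d→e, so it gets removed.
g = {"a": {"b"}, "b": {"c", "e"}, "c": {"d"}, "d": {"e"}, "e": {"f"}, "f": set()}
print(transitive_reduction(g)["b"])  # only the b→c edge survives
```

For a DAG the transitive reduction is unique, so checking each edge against the original graph (as above) is correct, though it costs O(E·(V+E)) in the worst case.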
In a recent algorithms course we had to form a condensation graph and compute its reflexive-transitive closure to get a partial order. But it was never really explained why we would want to do that in a graph. I understand the gist of a condensation graph in that it highlights the strongly connected components, but what does the partial order give us that the original graph did not?
The algorithm implemented went like this:
Find strongly connected components (I used Tarjan's algorithm)
Create condensation graph for the SCCs
Form reflexive-transitive closure of adjacency matrix (I used Warshall's algorithm)
Doing that forms the partial order, but.... what advantage does finding the partial order give us?
Like any other data structure or algorithm, the advantages are there only if its properties are needed :-)
The result of the procedure you described is a structure that can be used to (easily) answer questions like:
For two nodes x, y: is x <= y and/or y <= x, or neither?
For a node x: find all nodes a such that a <= x, or x <= a.
These properties can be used to answer other questions about the initial graph (DAG), such as whether adding an edge x -> y will produce a cycle. That can be checked by intersecting the set A of all a with a <= x and the set B of all b with y <= b. If A ∩ B is not empty, then the edge x -> y creates a cycle.
The structure can also be used to simplify the implementation of algorithms where the graph describes other dependencies. E.g. x -> y means that the result of calculation x is used in calculation y. If calculation x changes, then every calculation a with x <= a should be re-evaluated, flagged 'dirty', or have its result removed from the cache.
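Sketched in Python, assuming the reflexive–transitive closure has already been computed as a dict mapping each node to its set of reachable nodes (the node names are made up):

```python
# closure[x] = all nodes reachable from x in the condensation, including x itself.
closure = {
    "add": {"add", "mul", "pow"},  # addition feeds multiplication feeds powers
    "mul": {"mul", "pow"},
    "pow": {"pow"},
}

def creates_cycle(closure, x, y):
    """Adding edge x -> y creates a cycle iff x is already reachable from y."""
    return x in closure[y]

def dirty_set(closure, x):
    """Everything downstream of x that must be recomputed when x changes."""
    return closure[x] - {x}

print(creates_cycle(closure, "add", "pow"))  # False: add already precedes pow
print(creates_cycle(closure, "pow", "add"))  # True: would close a cycle
print(dirty_set(closure, "add"))             # {'mul', 'pow'}
```

The cycle test here is phrased as a single membership check, which is equivalent to the set-intersection formulation: some node lies both below x and above y exactly when y already reaches x.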
Suppose I have a file with a one-liner (joke) on each line. I want to sort the jokes by how funny I find them. My first thought is to implement any sorting algorithm (preferably one that makes as few comparisons as possible) and having the comparison algorithm take my input; I'd just sit there and choose which of each pair of jokes it presented me was funnier.
There's a problem with that. My joke preference is not a total order. It lacks transitivity. For example, I might think that B is funnier than A when presented with them, and that C is funnier than B, but when presented with A and C I somehow find A to be funnier than C. If “>” means “is funnier than,” this means that C > B and B > A does not imply C > A. All sorting algorithms’ correctness depends on this.
But it still seems that there should be an algorithm that sorts the list of jokes so that the one at the top is most preferred over other jokes, and the one at the bottom is least preferred over other jokes, even if there are individual exceptions.
I don’t know how to Google this. Is there an algorithm for this kind of preference sorting? The answer here is not applicable because it forces the user’s preference to be transitive.
If you represent your decisions as a directed graph, where each joke is a node and each directed edge indicates one joke being better than the other, then you can retrieve an ordering by constructing the path which follows the edges and visits each node exactly once.
This type of graph is called a tournament, and the path is a Hamiltonian path. I've got good news for you, Bub: every tournament is proven to contain a Hamiltonian path (Rédei's theorem), so you just need to keep comparing jokes until every pair has an edge between them. (If the tournament is also strongly connected, meaning every node can be reached from every node obeying the direction of the edges, it even contains a Hamiltonian cycle.)
Tournament: https://en.wikipedia.org/wiki/Tournament_(graph_theory)
Hamiltonian Path: https://en.wikipedia.org/wiki/Hamiltonian_path
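Since every tournament contains a Hamiltonian path, one can be built by simple insertion, much like insertion sort with an intransitive comparator. A sketch, where `beats(a, b)` stands in for the user's pairwise judgment (all names hypothetical):

```python
def hamiltonian_path(nodes, beats):
    """Build a path v1 -> v2 -> ... -> vn where each vi beats v(i+1).

    Works for any tournament: insert each node just before the first
    node it beats, or append it at the end if it beats none. Because
    the comparison is total, the node before the insertion point is
    guaranteed to beat the newcomer, preserving the path property.
    """
    path = []
    for v in nodes:
        for i, u in enumerate(path):
            if beats(v, u):
                path.insert(i, v)
                break
        else:
            path.append(v)
    return path

# The intransitive example from the question: B beats A, C beats B, A beats C.
wins = {("B", "A"), ("C", "B"), ("A", "C")}
order = hamiltonian_path(["A", "B", "C"], lambda x, y: (x, y) in wins)
print(order)  # each joke beats the one right after it
```

Note the caveat from the question still applies: with intransitive preferences the result is one consistent chain of adjacent wins, not a ranking free of individual exceptions (here A sits last even though A beats C).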
I'm trying to represent a transitive relation (in a database) and having a hard time working out the best data structure.
Basically, the data structure is a series of pairs A → B such that if A → B and B → C, then implicitly A → C. It's important to me to be able to identify which entries are original input and which entries exist implicitly. Asking if A → C is equivalent to me having a digraph and asking if there exists a path from A to C in that digraph.
I could just represent the original entries, but if I do that, then it takes a lot of time to determine whether two items are related, since I need to search all possible paths, and this is rather slow.
Alternatively, I can store the original edges as well as a listing of all paths. This makes adding a new edge easy, because when I add A → B I can just take the Cartesian product of the paths ending at A and the paths starting at B and put them together. This has a significant space overhead of O(n²) in the worst case, but has the nice property that lookups, by far the most common operation, will be constant time. The issue is deleting, where I cannot think of anything better than recalculating all paths that may or may not run through the deleted edge, and this can be really nasty.
Does anyone have any better ideas?
Technical notes: the digraph may be cyclic, but the relation is reflexive so I don't need to represent the reflexivity or store anything about it.
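A sketch of the Cartesian-product bookkeeping described above, keeping the original edges and the implied closure side by side (in a real database each set would be a two-column table; this is just the in-memory shape, with made-up names):

```python
class TransitiveStore:
    def __init__(self):
        self.edges = set()    # original input pairs
        self.closure = set()  # all implied pairs, original edges included

    def add(self, a, b):
        self.edges.add((a, b))
        # Everything that reaches a, paired with everything b reaches.
        sources = {x for (x, y) in self.closure if y == a} | {a}
        targets = {y for (x, y) in self.closure if x == b} | {b}
        self.closure |= {(x, y) for x in sources for y in targets}

    def related(self, a, b):
        """Constant-time lookup: does A -> B hold, directly or implicitly?"""
        return (a, b) in self.closure

    def is_original(self, a, b):
        return (a, b) in self.edges

s = TransitiveStore()
s.add("A", "B")
s.add("B", "C")
print(s.related("A", "C"), s.is_original("A", "C"))  # True False
```

This handles cycles too (pairs like (x, x) simply appear in the closure), but as the question notes, it offers no help for deletion: removing an edge invalidates an unknown subset of the closure.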
This is called the Reachability problem.
It would seem that you want an efficient online algorithm, which is an open problem, and an area of much research.
See my similar question on cs.SE: An incrementally-condensed transitive-reduction of a DAG, with efficient reachability queries, where I reference several related questions across Stack Exchange:
Related:
What is the fastest deterministic algorithm for dynamic digraph reachability with no edge deletion?
What is the fastest deterministic algorithm for incremental DAG reachability?
Does an algorithm exist to efficiently maintain connectedness information for a DAG in presence of inserts/deletes?
Is there an online-algorithm to keep track of components in a changing undirected graph?
Dynamic shortest path data structure for DAG
Note that even though some algorithms may be for a DAG only, if they support condensation (that is, collapsing strongly connected components into one node, since those nodes are considered equal, i.e. they relate back and forth), the problem is equivalent: after condensation, you can query the graph for the representative node in place of any of the condensed nodes (because they were all reachable from each other, and thus related to the rest of the graph in exactly the same way).
My conclusion is that, as of yet, there does not seem to be an efficient way to do this (on the order of O(log n) queries for a dynamic graph, with output-sensitive update times on the condensed graph). For less efficient ways, see the related links above.
The closest practical algorithm I found was here (source), which is an interesting read. I am not sure how easy or practical this data structure, or any data structure in any paper you will find, would be to adapt to a database.
PS. Consider asking CS-related questions on cs.stackexchange.com in the future.
Sorry for the wall of text; it's as concise as I could make it!
I've got one very large directed graph, G, and a subset of its vertices, S. What I want to do is find the subgraph of G induced by S, with the additional consideration that if some path exists between a vertex p and a vertex q in G, then an edge exists between these two vertices in the induced subgraph. This is key; it's a little more complicated (I think) than the usual induced subgraph problem.
The most rudimentary way I can think of to solve the problem is the following (I realize it's probably not the most efficient; let me know if you have other suggestions that aren't too complicated to implement): for every pair of vertices within S, test for the existence of a path between them in G. If such a path exists, insert an edge between p and q in the induced subgraph. For my purposes, n² time isn't that bad.
So, I suppose I have two questions:
1) What is the fastest way to determine whether or not a path EXISTS between two vertices? I don't need to know the path, just whether or not it exists. Furthermore, if there is some preprocessing I can do to the whole graph to make this calculation faster, what might it be, since I have to perform this calculation between each pair of vertices?
2) Is there a faster way than the one I suggested to find the type of induced subgraph I described?
Thanks so much for the help!
The problem of finding whether a path exists between two vertices is called the transitive closure problem, and it's as hard as matrix multiplication in the general case. I would first run a strongly connected components algorithm on your graph to compress cycles into single nodes and form a directed acyclic graph. If you are lucky, you'll have some big cycles, and that will make the subsequent transitive closure problem easy. Then I'd run the Floyd–Warshall all-pairs shortest paths algorithm on that graph to compute the transitive closure, because it's incredibly simple to code. Maybe one of the o(n³) matrix-multiplication-based algorithms would be faster, but I doubt it would be that much faster, because the constant factor for Floyd–Warshall is so low.
Here is a fast algorithm for strongly connected components.
And this contains a proof of the equivalence of matrix multiplication and transitive closure.
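When only the existence of a path matters, the Floyd–Warshall recurrence degenerates into a boolean transitive closure. A minimal sketch on an adjacency matrix (the SCC condensation step is omitted here):

```python
def transitive_closure(adj):
    """Boolean Floyd-Warshall: reach[i][j] is True iff a path i -> ... -> j exists."""
    n = len(adj)
    reach = [row[:] for row in adj]  # copy so the input matrix is untouched
    for k in range(n):               # allow k as an intermediate vertex
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

# Edges 0 -> 1 and 1 -> 2 but no direct 0 -> 2; the closure reveals it.
adj = [
    [False, True,  False],
    [False, False, True],
    [False, False, False],
]
print(transitive_closure(adj)[0][2])  # True
```

With the closure matrix in hand, building the induced subgraph on S is a straight lookup: for each pair (p, q) in S, add the edge when `reach[p][q]` is set.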
I am not sure if there is any good way to get around computing the transitive closure to solve your original problem. I suspect not, but on the other hand, sometimes clever people come up with something great.