I have a graph that is implemented as a list of edges connecting arbitrary nodes, with the data types defined below.
type edge = int * int;;
type graph = edge list;;
How would I perform a purely functional depth-first search while avoiding getting stuck on a cycle? I am not quite sure of how to keep track of all the nodes visited while remaining purely functional. The answer is probably something trivial that I am not conceptually grasping for some reason.
The search function has a parameter that tracks visited nodes. In FP one of the insights is that you can keep calling deeper and deeper (with tail calls). So you can pass the parameter along through all the calls, adding new nodes as you go.
Another parameter could be the nodes you plan to visit later. For DFS this would work like a stack.
Related
I would like to implement a DFS algorithm on a DAG that collects information from the children and passes it on to the parents. It is allowed to revisit a node multiple times when it is connected to the root node via different paths. In my case, recursion is not suitable because the graph might be so large that running the algorithm would exceed the maximal recursion depth.
My current implementation uses a stack to manage the depth-first search and it consists of a message passing scheme in which I build a hash map (Node → List) which contains the return values of the children to their parents. I was wondering whether there might be a more efficient way of doing this.
Here's my situation. I have a graph that has different sets of data being added at different times. For example, set1 might have a few thousand nodes and then set2 comes in later and we apply business logic to create edges from set1 to set2(and disgard any Vertices from set1 that do not have edges to set2). Then at a later point, we get set3, set4, and so on and the same process applies between each set and its previous set.
Question, what's the best way to organize this? What I did before was name the nodes set1-xx, set2-xx,etc.. The problem I faced was when I was trying to run analytics between the current set and the previous set I would have to run a loop through the entire graph and look for all the nodes that started with 'setx'. It took a long time as the graph grew, so I thought of another solution which was to create a node called 'set1' and have it connected to all nodes for that particular set. I am testing it but I was wondering if there way a more efficient way or a build in way of handling data structures like this? Is there a way to somehow segment data like this?
I think a general solution would be application but if it helps I'm using neo4j(so any specific solution to that database would be good as well).
You have a very special type of a directed graph, called a layered graph.
The choice of the data structure depends primarily on the expected graph density (how many nodes from a previous set/layer are typically connected to a node in the current set/layer) and on the operations that you need to perform on it most of the time. It is definitely a good idea to have each layer directly represented by a numeric index (that is, the outermost structure will be an array of sets/layers), and presumably you can also use one array of vertices per layer. However, the list of edges per vertex (out only, or in and out sets of edges depending on whether you ever traverse the layers backward) may be any of the following:
Linked list of vertex identifiers; this is good if the graph is very sparse and edges are often added/removed.
Sorted array of vertex identifiers; this is good if the graph is quite sparse and immutable.
Array of booleans, indexed by vertex identifiers, determining whether a given vertex is or is not linked by an edge from the current vertex; this is good if the graph is dense.
The "vertex identifier" can take many forms. For example, it can be an index into the array of vertices on the next layer.
Your second solution is what I would do- create a setX node and connect all nodes belonging to that set to setX. That way your data is partitioned and it is easier to query.
I am reading about DFS in Introduction to Algorithms by Cormen. Following is text
snippet.
Unlike BFS, whose predecessor subgraph forms a tree, the predecessor
subgrpah produced by DFS may be composed of several trees, because
search may be repeated from multiple sources.
In addition to above notes, following is mentioned.
It may seem arbitrary that BFS is limited to only one source where as
DFS may search from multiple sources. Although conceptually, BFS
could proceed from mutilple sources and DFS could limited to one
source, our approach reflects how the results of these searches are
typically used.
My question is
Can any one give an example how BFS is used with multiple source and
DFS is used with single source?
When it says multiple sources it is referring to the start node of the search. You'll notice that the parameters of the algorithms are BFS(G, s) and DFS(G). That should already be a hint that BFS is single-source and DFS isn't, since DFS doesn't take any initial node as an argument.
The major difference between these two, as the authors point out, is that the result of BFS is always a tree, whereas DFS can be a forest (collection of trees). Meaning, that if BFS is run from a node s, then it will construct the tree only of those nodes reachable from s, but if there are other nodes in the graph, will not touch them. DFS however will continue its search through the entire graph, and construct the forest of all of these connected components. This is, as they explain, the desired result of each algorithm in most use-cases.
As the authors mentioned there is nothing stopping slight modifications to make DFS single source. In fact this change is easy. We simply accept another parameter s, and in the routine DFS (not DFS_VISIT) instead of lines 5-7 iterating through all nodes in the graph, we simply execute DFS_VISIT(s).
Similarly, changing BFS is possible to make it run with multiple sources. I found an implementation online: http://algs4.cs.princeton.edu/41undirected/BreadthFirstPaths.java.html although that is slightly different to another possible implementation, which creates separate trees automatically. Meaning, that algorithm looks like this BFS(G, S) (where S is a collection of nodes) whereas you can implement BFS(G) and make separate trees automatically. It's a slight modification to the queueing and I'll leave it as an exercise.
As the authors point out, the reason these aren't done is that the main use of each algorithm lends to them being useful as they are. Although well done for thinking about this, it is an important point that should be understood.
Did you understand the definition? Did you see some pictures on the holy book?
When it says that DFS may be composed of several trees it is because, it goes deeper until it reaches to a leaf and then back tracks. So essentially imagine a tree, first you search the left sub tree and then right subtree. left sub tree may contain several sub trees. that s why.
When you think about BFS, it s based on level. level first search in other words. so you have a single source (node) than you search all the sub nodes of that level.
DFS with single source if there is only one child node, so you have only 1 source. i think it would be more clear if you take the source as parent node.
I'm looking for a quick method/algorithm for finding which nodes in a graph is critical.
For example, in this graph:
Node number 2 and 5 are critical.
My current method is to try removing one non-endpoint node from the graph at a time and then check if the entire network can be reached from all other nodes. This method is obvious not very efficient.
What are a better way?
See biconnected components. Calling them articulation points instead of critical nodes seems to yield better search results.
In any case, the algorithm consists of a simple depth first search where you maintain certain information for each node.
there are several better ways. research is always helpful
but since this is homework, the point of the exercise is likely to be to figure it out yourself
hint: how could you decorate the graph to tell you what nodes depend on what other nodes, and would this information perhaps be useful to spot the critical nodes?
I have a list of items (blue nodes below) which are categorized by the users of my application. The categories themselves can be grouped and categorized themselves.
The resulting structure can be represented as a Directed Acyclic Graph (DAG) where the items are sinks at the bottom of the graph's topology and the top categories are sources. Note that while some of the categories might be well defined, a lot is going to be user defined and might be very messy.
Example:
(source: theuprightape.net)
On that structure, I want to perform the following operations:
find all items (sinks) below a particular node (all items in Europe)
find all paths (if any) that pass through all of a set of n nodes (all items sent via SMTP from example.com)
find all nodes that lie below all of a set of nodes (intersection: goyish brown foods)
The first seems quite straightforward: start at the node, follow all possible paths to the bottom and collect the items there. However, is there a faster approach? Remembering the nodes I already passed through probably helps avoiding unnecessary repetition, but are there more optimizations?
How do I go about the second one? It seems that the first step would be to determine the height of each node in the set, as to determine at which one(s) to start and then find all paths below that which include the rest of the set. But is this the best (or even a good) approach?
The graph traversal algorithms listed at Wikipedia all seem to be concerned with either finding a particular node or the shortest or otherwise most effective route between two nodes. I think both is not what I want, or did I just fail to see how this applies to my problem? Where else should I read?
It seems to me that its essentially the same operation for all 3 questions. You're always asking "Find all X below node(s) Y, where X is of type Z". All you need is a generic mechanism for 'locate all nodes below node', (solves Q3) and then you can filter the results for 'nodetype=sink' (solves Q1). For Q2, you have the starting-point (your node set) and your ending point (any sink below the starting point) so your solution set is all paths from starting node specified to the sink. So I would suggest that what you basically have a is a tree, and basic tree-traversal algorithms would be the way to go.
Despite the fact that your graph is acyclic, the operations you cite remind me of similar aspects of control flow graph analysis. There is a rich set of algorithms based on dominance that may be applicable. For example, your third operation reminds me od computing dominance frontiers; I believe that algorithm would work directly if you temporarily introduce "entry" and "exit" nodes. The entry node connects the "given set of nodes" and the exit nodes connects the sinks.
Also see Robert Tarjan's basic algorithms.