Adding and removing an existing edge in a graph (Boost)?

Let's say I have an undirected graph G, and I add the following edges:
add_edge(1,2,G);
add_edge(1,3,G);
add_edge(0,2,G);
Now let's say I add this edge again:
add_edge(0,2,G);
Do I have two edges in my graph from 0 to 2?
And what happens if, having added the edge twice, I do:
remove_edge(0,2,G);
Do both edges disappear, or do I still have one of them?

The answer to both of your questions depends on the definition of graph G.
The answer to the first question, according to the boost::graph tutorial, depends on which OutEdgeList you use in your graph definition. If you use a container that cannot represent multiple edges (such as setS or hash_setS), there will be only one edge between two vertices no matter how many times you insert it. If you use vecS, multisetS, or similar, one edge is inserted for each call to add_edge().
The answer to the second question, according to the same page (that section of the page does not allow direct links - just search for remove_edge), is that all edges between the two vertices are removed by that particular remove_edge() function. There are several other versions of remove_edge() (described on the same page), each with slightly different behaviour.
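To make both behaviours concrete, here is a small sketch (the typedef names are mine; everything else is standard Boost.Graph usage):

#include <iostream>
#include <boost/graph/adjacency_list.hpp>

int main() {
    using namespace boost;
    // OutEdgeList = vecS: parallel edges are allowed
    typedef adjacency_list<vecS, vecS, undirectedS> MultiGraph;
    MultiGraph g(4);
    add_edge(1, 2, g);
    add_edge(1, 3, g);
    add_edge(0, 2, g);
    add_edge(0, 2, g);                   // inserted a second time
    std::cout << num_edges(g) << "\n";   // 4: two parallel (0,2) edges
    remove_edge(0, 2, g);                // removes *all* (0,2) edges
    std::cout << num_edges(g) << "\n";   // 2

    // OutEdgeList = setS: the duplicate insertion is rejected
    typedef adjacency_list<setS, vecS, undirectedS> SimpleGraph;
    SimpleGraph h(4);
    add_edge(0, 2, h);
    bool added = add_edge(0, 2, h).second;             // false: already present
    std::cout << num_edges(h) << " " << added << "\n"; // 1 0
    return 0;
}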

Related

Does this graph reduction operation already exist?

I have an application that uses a directed acyclic graph (DAG) to represent events ordered by time. My goal is to create or find an algorithm to simplify the graph by removing certain edges with specific properties. I'll try to define what I mean:
In the example below, a is the first node and f is the last. In the first picture, there are four unique paths from a to f. If we isolate the paths between b and e, we have two alternatives. The path that is a single edge, namely the edge between b and e, is the type of path that I want to remove, leaving the graph in the second picture as a result.
Therefore, the edges I want to remove are defined as: single edges between two nodes that have at least one other path between them with more than one edge.
I realize this might be a very specific kind of graph operation, but hoping this algorithm already exists out there, my question to Stack Overflow is: Is this a known graph operation, or should I get my hiney to the algorithm drawing board?
As Matt Timmermans said in the comments: that operation is called a transitive reduction.
Thanks Matt!
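If a library routine isn't at hand, a naive sketch of the operation is short (my own illustration, not from the answer; for large graphs you would want something smarter): in a DAG, an edge u->v is redundant exactly when v is still reachable from u through some other out-neighbour w of u.

#include <iostream>
#include <set>
#include <vector>

// Returns true if `target` is reachable from `start` in the DAG.
bool reachable(int start, int target, const std::vector<std::set<int>>& adj) {
    std::vector<int> stack{start};
    std::set<int> seen{start};
    while (!stack.empty()) {
        int u = stack.back(); stack.pop_back();
        if (u == target) return true;
        for (int w : adj[u])
            if (seen.insert(w).second) stack.push_back(w);
    }
    return false;
}

// Naive transitive reduction: drop every edge u->v for which a longer
// u-to-v path exists. Removing redundant edges never changes reachability,
// so the in-place update stays correct.
void transitive_reduction(std::vector<std::set<int>>& adj) {
    for (int u = 0; u < (int)adj.size(); ++u) {
        std::set<int> keep = adj[u];
        for (int v : adj[u])
            for (int w : adj[u])
                if (w != v && reachable(w, v, adj)) keep.erase(v);
        adj[u] = keep;
    }
}

int main() {
    // b=0, c=1, d=2, e=3: chain b->c->d->e plus the shortcut b->e
    std::vector<std::set<int>> adj(4);
    adj[0] = {1, 3};
    adj[1] = {2};
    adj[2] = {3};
    transitive_reduction(adj);
    std::cout << adj[0].count(3) << "\n"; // 0: the shortcut b->e was removed
    return 0;
}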

How to find connected components in graph efficiently without adjacency matrix?

I want to find the connected components in an undirected graph. However, I don't have an adjacency matrix. Instead I have a set of vertices as well as a function telling me whether two vertices are adjacent. What is the most efficient way to find all connected components?
I know I could just calculate the entire adjacency matrix and use depth-first search to find all components. But that would not be very efficient as I'd need to check every pair of vertices.
What I'm currently doing is the following procedure:
Pick any unassigned vertex which is now its own component
Find all neighbors of that vertex and add them to the component
Find all neighbors of the just added vertices (amongst those vertices not yet assigned to any component) and add them too
Repeat previous step until no new neighbors can be found
The component is now complete, repeat from the first step to find other components until all vertices are assigned
This is the pseudocode:
connected_components(vertices):
    // vertices belonging to the current component whose neighbors have not yet been added
    vertices_to_check = [vertices.pop()]
    // vertices belonging to the current component
    current_component = []
    components = []
    while (vertices.notEmpty() or vertices_to_check.notEmpty()):
        if (vertices_to_check.isEmpty()): // all vertices in the current component have been found
            components.add(current_component)
            current_component = []
            vertices_to_check = [vertices.pop()]
        next_vertex = vertices_to_check.pop()
        current_component.add(next_vertex)
        // find all unassigned neighbors of next_vertex;
        // iterate over a copy, because vertices is modified inside the loop
        for vertex in copy(vertices):
            if (vertices_adjacent(vertex, next_vertex)):
                vertices.remove(vertex)
                vertices_to_check.add(vertex)
    components.add(current_component)
    return components
I understand that this method is faster than calculating the adjacency matrix in most cases, since I don't need to check whether two vertices are adjacent if it is already known that they belong to the same component. But is there a way to improve this algorithm?
Ultimately, any algorithm will have to call vertices_adjacent for every single pair of vertices that turn out to belong to separate components, because otherwise it will never be able to verify that there's no link between those components.
Now, if a majority of vertices all belong to a single component, then there may not be too many such pairs; but unless you expect a majority of vertices all belong to a single component, there's little point optimizing specifically for that case. So, discarding that case, the very best-case scenario is:
There turn out to be exactly two components, each with the same number of vertices (½|V| each). So there are ¼|V|² pairs of vertices that belong to separate components, and you need to call vertices_adjacent for each of those pairs.
These two components turn out to be complete, or you turn out to be exceptionally lucky in your choice of edges to check first, such that you can detect the connected parts by checking just |V| − 2 pairs.
. . . which still involves making ¼|V|² + |V| − 2 calls to vertices_adjacent. By comparison, the build-an-adjacency-list approach makes ½|V|² − ½|V| calls, which is more than the best-case scenario, but by a factor of less than 2. (And the worst-case scenario is simply equivalent to the build-an-adjacency-list approach. That would happen if no component contains more than two vertices, or if the graph is acyclic and you get unlucky in your choice of edges to check first. Most graphs will be somewhere in between.)
So it's probably not worth trying to optimize too closely for the exact minimum number of calls to vertices_adjacent.
That said, your approach seems pretty reasonable to me; it doesn't make any calls to vertices_adjacent that are clearly unnecessary, so the only improvement would be a probabilistic one, if it could do a better job guessing which calls will turn out to be useful for eliminating later calls.
One possibility: in many graphs, there are some vertices that have a lot of neighbors and some vertices that have relatively few, according to a power-law distribution. So if you prioritize vertices based on how many neighbors they're already known to have, you may be able to take advantage of that pattern. (I think this is especially likely to be useful if the majority of vertices really do all belong to a single component, which is the only case where a better-than-factor-of-2 improvement is even conceivable.) But, you'll have to test to see if it actually makes a difference for the graphs you're interested in.
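For reference, here is the same search written out in C++ (a sketch; the names are mine, and adjacent stands in for the question's vertices_adjacent oracle). The prioritization idea above could be implemented by replacing the frontier vector with a priority queue keyed on each vertex's count of already-known neighbors:

#include <functional>
#include <vector>

std::vector<std::vector<int>> connected_components(
        int n, const std::function<bool(int, int)>& adjacent) {
    std::vector<bool> assigned(n, false);
    std::vector<std::vector<int>> components;
    for (int s = 0; s < n; ++s) {
        if (assigned[s]) continue;       // already placed in a component
        assigned[s] = true;
        std::vector<int> component{s};
        std::vector<int> frontier{s};    // members whose neighbors are unexplored
        while (!frontier.empty()) {
            int u = frontier.back();
            frontier.pop_back();
            for (int v = 0; v < n; ++v) {
                // only unassigned vertices are queried, so pairs within an
                // already-known component never cost an oracle call
                if (!assigned[v] && adjacent(u, v)) {
                    assigned[v] = true;
                    component.push_back(v);
                    frontier.push_back(v);
                }
            }
        }
        components.push_back(component);
    }
    return components;
}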

Reason for finding partial order of a graph

In a recent algorithms course we had to form a condensation graph and compute its reflexive-transitive closure to get a partial order. But it was never really explained why we would want to do that in a graph. I understand the gist of a condensation graph in that it highlights the strongly connected components, but what does the partial order give us that the original graph did not?
The algorithm implemented went like this:
Find strongly connected components (I used Tarjan's algorithm)
Create condensation graph for the SCCs
Form reflexive-transitive closure of adjacency matrix (I used Warshall's algorithm)
Doing that forms the partial order, but.... what advantage does finding the partial order give us?
Like any other data structure or algorithm, the advantages are only there if its properties are needed :-)
The result of the procedure you described is a structure that can be used to (easily) answer questions like:
For two nodes x, y: is x <= y and/or y <= x, or neither?
For a node x: find all nodes a with a <= x, or with x <= a.
These properties can be used to answer other questions about the initial graph (DAG). For example, whether adding the edge x->y will produce a cycle. That can be checked by intersecting the set A of all a with a <= x and the set B of all b with y <= b. If the intersection of A and B is not empty, then the edge x->y creates a cycle.
The structure can also be used to simplify the implementation of algorithms that use the graph to describe other dependencies. E.g. x->y means that the result of calculation x is used for calculation y. If calculation x is changed, then all calculations a with x <= a should be re-evaluated, or flagged 'dirty', or have their results removed from a cache.
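As a concrete sketch (my own illustration; the node count and edges are made up): store one bitset of reachable nodes per condensed vertex, build the closure with Warshall's algorithm as in step 3, and the cycle test above becomes a single bit lookup.

#include <bitset>
#include <iostream>
#include <vector>

int main() {
    const int n = 4;                       // condensed nodes, for illustration
    std::vector<std::bitset<8>> reach(n);  // reach[v] has bit w set iff v <= w
    for (int v = 0; v < n; ++v) reach[v].set(v);        // reflexive
    reach[0].set(1); reach[1].set(2); reach[2].set(3);  // edges 0->1->2->3

    // Warshall's algorithm: fold w's reachability into v whenever v reaches w.
    for (int w = 0; w < n; ++w)
        for (int v = 0; v < n; ++v)
            if (reach[v].test(w)) reach[v] |= reach[w];

    // Would adding x->y create a cycle? Only if y already reaches x.
    auto creates_cycle = [&](int x, int y) { return reach[y].test(x); };
    std::cout << creates_cycle(3, 0) << "\n";  // 1: 3->0 closes the chain
    std::cout << creates_cycle(0, 3) << "\n";  // 0: 0->3 is a harmless shortcut
    return 0;
}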

What are good ways of organizing directed graph data?

Here's my situation. I have a graph that has different sets of data being added at different times. For example, set1 might have a few thousand nodes; then set2 comes in later, and we apply business logic to create edges from set1 to set2 (and discard any vertices from set1 that do not have edges to set2). Then at a later point we get set3, set4, and so on, and the same process applies between each set and its previous set.
Question: what's the best way to organize this? What I did before was name the nodes set1-xx, set2-xx, etc. The problem I faced was that when I was trying to run analytics between the current set and the previous set, I had to loop through the entire graph and look for all the nodes whose names started with 'setx'. That took a long time as the graph grew, so I thought of another solution, which was to create a node called 'set1' and connect it to all nodes of that particular set. I am testing it, but I was wondering whether there is a more efficient or built-in way of handling data structures like this. Is there a way to somehow segment data like this?
A general solution would be ideal, but if it helps, I'm using neo4j (so any solution specific to that database would be good as well).
You have a very special type of directed graph, called a layered graph.
The choice of the data structure depends primarily on the expected graph density (how many nodes from a previous set/layer are typically connected to a node in the current set/layer) and on the operations that you need to perform on it most of the time. It is definitely a good idea to have each layer directly represented by a numeric index (that is, the outermost structure will be an array of sets/layers), and presumably you can also use one array of vertices per layer. However, the list of edges per vertex (out only, or in and out sets of edges depending on whether you ever traverse the layers backward) may be any of the following:
Linked list of vertex identifiers; this is good if the graph is very sparse and edges are often added/removed.
Sorted array of vertex identifiers; this is good if the graph is quite sparse and immutable.
Array of booleans, indexed by vertex identifiers, determining whether a given vertex is or is not linked by an edge from the current vertex; this is good if the graph is dense.
The "vertex identifier" can take many forms. For example, it can be an index into the array of vertices on the next layer.
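For illustration, one way to make the layers explicit in code, using the "sorted array of vertex identifiers" option above (the type names and payload are mine, not from the answer):

#include <algorithm>
#include <string>
#include <vector>

struct Vertex {
    std::string data;        // whatever payload a node carries
    std::vector<int> out;    // sorted indices of neighbours in layers[i + 1]
};

using Layer = std::vector<Vertex>;

// layers[0] holds set1, layers[1] holds set2, and so on; analytics between
// consecutive sets becomes a scan of layers[i] instead of the whole graph.
void add_edge(std::vector<Layer>& layers, int layer, int from, int to) {
    auto& out = layers[layer][from].out;
    out.insert(std::lower_bound(out.begin(), out.end(), to), to);
}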
Your second solution is what I would do: create a setX node and connect all nodes belonging to that set to it. That way your data is partitioned and easier to query.

How do I find all paths through a set of given nodes in a DAG?

I have a list of items (blue nodes below) which are categorized by the users of my application. The categories can themselves be grouped and categorized.
The resulting structure can be represented as a Directed Acyclic Graph (DAG) where the items are sinks at the bottom of the graph's topology and the top categories are sources. Note that while some of the categories might be well defined, a lot is going to be user defined and might be very messy.
Example: (image omitted; source: theuprightape.net)
On that structure, I want to perform the following operations:
find all items (sinks) below a particular node (all items in Europe)
find all paths (if any) that pass through all of a set of n nodes (all items sent via SMTP from example.com)
find all nodes that lie below all of a set of nodes (intersection: goyish brown foods)
The first seems quite straightforward: start at the node, follow all possible paths to the bottom and collect the items there. However, is there a faster approach? Remembering the nodes I have already passed through probably helps avoid unnecessary repetition, but are there more optimizations?
How do I go about the second one? It seems that the first step would be to determine the height of each node in the set, so as to determine at which one(s) to start, and then find all paths below that which include the rest of the set. But is this the best (or even a good) approach?
The graph traversal algorithms listed at Wikipedia all seem to be concerned with either finding a particular node or the shortest or otherwise most effective route between two nodes. Neither is what I want, or did I just fail to see how this applies to my problem? Where else should I read?
It seems to me that it's essentially the same operation for all 3 questions. You're always asking "Find all X below node(s) Y, where X is of type Z". All you need is a generic mechanism for 'locate all nodes below node' (solves Q3), and then you can filter the results for 'nodetype=sink' (solves Q1). For Q2, you have the starting point (your node set) and your ending point (any sink below the starting point), so your solution set is all paths from the specified starting nodes to the sinks. So I would suggest that what you basically have is a tree, and basic tree-traversal algorithms would be the way to go.
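A sketch of that generic mechanism (the names are mine; an adjacency-list representation is assumed): one depth-first search with a visited set, after which Q1 is a filter and Q3 is a set intersection.

#include <set>
#include <vector>

// Collect every node reachable from u; the visited set doubles as the
// "remember the nodes I already passed through" optimization.
void collect_below(int u, const std::vector<std::vector<int>>& adj,
                   std::set<int>& visited) {
    if (!visited.insert(u).second) return; // already passed through this node
    for (int v : adj[u]) collect_below(v, adj, visited);
}

// Sinks below u are the visited nodes with no outgoing edges (solves Q1);
// intersecting the visited sets of several start nodes solves Q3.
std::vector<int> sinks_below(int u, const std::vector<std::vector<int>>& adj) {
    std::set<int> visited;
    collect_below(u, adj, visited);
    std::vector<int> sinks;
    for (int v : visited)
        if (adj[v].empty()) sinks.push_back(v);
    return sinks;
}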
Despite the fact that your graph is acyclic, the operations you cite remind me of similar aspects of control flow graph analysis. There is a rich set of algorithms based on dominance that may be applicable. For example, your third operation reminds me of computing dominance frontiers; I believe that algorithm would work directly if you temporarily introduce "entry" and "exit" nodes. The entry node connects to the "given set of nodes", and the exit node connects the sinks.
Also see Robert Tarjan's basic algorithms.
