Multigraph reduction in O(m+n) time - algorithm

Consider a multigraph G, where the following three reductions need to be made:
Vertices with two neighbors are removed from the graph and their neighbors joined to each other via a new edge.
Vertices with one neighbor are removed from the graph.
Duplicate edges are removed from the graph.
This is a homework question that I had on a recent assignment, where I am asked to show that these three reductions can be done in O(m+n) time. Any help to better understand how to go about doing this is greatly appreciated. Thanks!

This reduction isn't unique: consider a graph with two vertices and one edge, v-w, which has two possible reductions. I will explain how to get an arbitrary valid reduction.
You'll first want to remove duplicate edges: this can be done using a set or a hash-table to identify duplicates, in O(n+m) time. I'll assume you're storing the graph as a dictionary from vertices to their adjacency sets.
After this, you'll want to iterate over the vertices, and keep a set (or any container with O(1) membership testing) to store 'to be deleted' vertices. After this first pass over vertices, this will contain any vertices with degree 1 or 2.
Now, while your 'to be deleted' set isn't empty, you'll:
Pop a vertex v from the set.
If v has degree 0, ignore it.
If v has degree 1 and its neighbor is w, delete v from your graph and remove v from w's adjacency set. If w now has degree 1 or 2 and isn't in the 'to be deleted' set, add it to the set.
Otherwise, v has degree 2, and two distinct neighbors u, w.
If u and w are not adjacent: add an edge from u to w, remove v and its edges from your graph.
If u and w are adjacent: remove v and its edges from your graph. If u or w now have degree 1 or 2, add them to the 'to be deleted' set.
This does constant work per vertex and edge, but relies upon a certain graph representation of 'adjacency sets' where edges can be deleted in constant time. Converting to and from this representation, given adjacency lists or a list of edges, can be done in O(m+n) time.
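The procedure above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the function name `reduce_multigraph` and the edge-list input format are assumptions, and self-loops are simply dropped because the question does not say how to treat them. Since the reduction is not unique, the surviving vertices depend on the order in which the set hands them out.

```python
def reduce_multigraph(edges):
    """Apply the three reductions to an undirected multigraph.

    `edges` is an iterable of (u, v) pairs (hypothetical input format).
    Returns a dict mapping each surviving vertex to its adjacency set.
    """
    # Adjacency sets remove duplicate edges as a side effect.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:  # assumption: self-loops are ignored
            adj[u].add(v)
            adj[v].add(u)

    # Every vertex of degree 1 or 2 is a candidate for deletion.
    pending = {v for v in adj if len(adj[v]) in (1, 2)}
    while pending:
        v = pending.pop()
        degree = len(adj[v])
        if degree == 0:
            continue  # its neighbours were deleted in the meantime
        if degree == 1:
            (w,) = adj[v]
            adj[w].discard(v)
            del adj[v]
            if len(adj[w]) in (1, 2):
                pending.add(w)
        else:
            u, w = adj[v]
            adj[u].discard(v)
            adj[w].discard(v)
            del adj[v]
            if w in adj[u]:
                # The new edge would be a duplicate, so both degrees dropped.
                for x in (u, w):
                    if len(adj[x]) in (1, 2):
                        pending.add(x)
            else:
                # Joining u and w keeps their degrees unchanged.
                adj[u].add(w)
                adj[w].add(u)
    return adj
```

For example, a K4 in which one edge has been subdivided and one vertex carries a pendant leaf reduces back to K4, while any tree collapses to a single isolated vertex.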

Related

Difficulties understanding, bipartite graphs

I am looking back at some of my algorithms homework(exam soon haha), and I am having troubles understanding the solution to one of the questions.
The Question
Tutor Solution
I am having difficulties visualizing the solution. I understand that if a graph contains a cycle of odd length then it cannot be bipartite. But as I stated, I don't understand the shortest path from s to u and v to s.
Let's say G is bipartite. Then the vertex set can be divided into V1 and V2 s.t. every edge of G includes a vertex from each set. Then the vertices of every path in G alternate between V1 and V2, so for every pair of vertices, all paths between them have lengths of the same parity. I.e., if G is bipartite and v,w are two vertices of G, then the lengths of the paths connecting v & w are either all even or all odd.
If there is an edge (u,v) connecting two vertices in the same layer then this is violated. The BFS paths from s to v and from s to u have the same length, so the same parity, since v & u are in the same layer. Following the path to u and then the edge between u & v gives us a path from s to v that is longer by 1, so of a different parity. Contradiction.
The tutor's solution isn't as clear as it could be, since it talks about cycles, and the two paths don't necessarily form a cycle since they can share vertices. And the definition of bipartite graph is slightly wrong (or uses non-standard definitions of edges, cross-product etc. where the things aren't pairs but 2-element sets).
But instead of the definition as given, I'd just say that a graph is bipartite if the vertices can be coloured black or white, such that each edge goes between different-coloured vertices. (Equivalently you can simply say "the graph is 2-colourable").
From this definition, it follows that if there's an even-length path between two vertices they must be the same colour, and if there's an odd-length path they must be different colours. In the BFS on a bipartite graph, a vertex u in layer i has a path of length i from s to u, so all vertices in the same layer have the same colour. Thus there can be no edge between two vertices in the same layer.
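The layer argument can be checked mechanically. The sketch below assumes an adjacency-list dictionary and a hypothetical helper name `same_layer_edge`; it only inspects the connected component of s, and it returns the first edge it finds whose endpoints share a BFS layer, or None if there is no such edge.

```python
from collections import deque


def same_layer_edge(adj, s):
    """Return an edge (u, v) joining two vertices of the same BFS layer, else None."""
    layer = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in layer:
                # v is discovered from u, so it sits one layer further out.
                layer[v] = layer[u] + 1
                queue.append(v)
            elif layer[v] == layer[u]:
                # Two shortest paths of equal length plus this edge:
                # the graph cannot be 2-coloured.
                return (u, v)
    return None
```

On a 4-cycle the function returns None, and on a triangle it reports the edge between the two vertices in layer 1.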

Prove that a graph is bipartite

Given a graph G in which every edge connects an even degree node with an odd degree node, how can I prove that the graph is bipartite?
Thanks in advance
This is the Welsh-Powell graph colouring algorithm:
1. All vertices are sorted according to the decreasing value of their degree in a list V
2. Colours are ordered in a list C
3. The first non-coloured vertex v in V is coloured with the first available colour in C. "Available" means a colour that has not previously been used by this algorithm
4. The remaining part of the ordered list V is traversed and the same colour is allocated to every vertex for which no adjacent vertex has the same colour
5. Steps 3 and 4 are applied iteratively until all the vertices have been coloured
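A minimal Python sketch of these steps, assuming the graph is given as an adjacency-list dictionary (the function name `welsh_powell` and the use of integers as colours are illustrative choices). Note that this greedy heuristic always yields a proper colouring but is not guaranteed to use the minimum number of colours on every graph.

```python
def welsh_powell(adj):
    """Greedy colouring; returns a dict mapping each vertex to a colour index."""
    # Sort vertices by decreasing degree.
    order = sorted(adj, key=lambda v: len(adj[v]), reverse=True)
    colour = {}
    next_colour = 0  # colours are simply 0, 1, 2, ...
    for v in order:
        if v in colour:
            continue
        # The first uncoloured vertex gets a fresh colour.
        colour[v] = next_colour
        # Give the same colour to every later vertex
        # that has no neighbour with this colour.
        for w in order:
            if w not in colour and all(colour.get(x) != next_colour for x in adj[w]):
                colour[w] = next_colour
        next_colour += 1
    return colour
```

If the result uses only two colours, the two colour classes are a valid bipartition.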
A graph is bipartite if it is 2-colourable. This intuitive fact is proven in Cambridge Tracts in Mathematics 131.
This is, of course, the cannon with which to shoot a mosquito. A graph is bipartite iff its vertices can be divided into two sets, such that every edge connects a vertex from set 1 to one in set 2. You already have such a division: each edge connects a vertex from the set of odd-degree vertices, to a vertex in the set of even-degree vertices.
Because by definition you already have two disjoint sets of vertices such that the only edges go between a vertex in one set and a vertex in the other set.
The even degree nodes are one set, and the odd degree nodes are the other set.
Pick any node, put it in set A. Take all the nodes that link to it, put them in set B. Now for every node added, add all its neighbors to the opposite set, and for the neighbors that already belong to one of the sets, check that they are in the right set. If you get a contradiction then the graph is not bipartite.
If you run out of neighbors but there are still nodes left, pick again any node and continue the algorithm until you either have no nodes or you found a contradiction.
If by "prove", you mean "find out", the complete solution is here
Bipartite.java
It works by keeping two boolean arrays. A marked array to check if a node has been visited, and another colored array. It then does a depth first search, marking neighbors with alternate colors. If a neighbor is marked with same color, graph is not bipartite.
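The linked Bipartite.java is not reproduced here. As an independent illustration of the same idea, the following Python sketch keeps a `marked` map and a boolean `colour` map and uses an explicit stack instead of recursion; the function name and the adjacency-list input are assumptions.

```python
def is_bipartite(adj):
    """Two-colour the graph with a stack-based traversal; False on a conflict."""
    marked = {v: False for v in adj}
    colour = {v: False for v in adj}
    for start in adj:  # restart in every connected component
        if marked[start]:
            continue
        marked[start] = True
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if not marked[w]:
                    # An unvisited neighbour gets the opposite colour.
                    marked[w] = True
                    colour[w] = not colour[u]
                    stack.append(w)
                elif colour[w] == colour[u]:
                    # A visited neighbour with the same colour: not bipartite.
                    return False
    return True
```

The path 0-1-2, in which every edge joins an odd-degree vertex to an even-degree one, is accepted, while a triangle is rejected.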

Number of edge distinct paths in bipartite graph

We have a bipartite graph, where set A has n vertices and set B has n vertices.
Also each vertex in set A has k edges to set B and each vertex in set B has k edges to set A.
There is a special vertex s that has edges to all vertices in set A, and a special vertex t that has edges to all vertices in B.
How can I prove that there are k edge-disjoint paths from s to t?
The problem that I am facing is that, given the graph mentioned above (minus the vertices s and t), I need to prove the following: if in each round I remove edges between A and B such that no vertex loses more than 1 edge in that round, there is a way to do this removal so that A and B become disconnected in k rounds.
"Also each vertex in set A has k edges to set B and each vertex in set B has k edges to set A."
=> There exist at least k vertices in A and there exist at least k vertices in B. (I)
Now we use:
"There is a special vertex s that has edges to all vertices in set A, and a special vertex t that has edges to all vertices in B."
(which we'll call (II)) to show there must be at least k edge-disjoint paths from s to t.
Consider the following removal-process:
Go from s to a vertex v_a in A.
Go from v_a to a vertex v_b in B.
Go from v_b to t.
Remove all the edges along this path (to make sure we are not reusing them later on)
Note: one such removal round corresponds to exactly a path from s to t.
Now: we can repeat this removal-process at least k times. Why?
Because after k-1 rounds, there must remain at least one vertex v_a_last in A that we have not visited yet, because of (I). This vertex can be reached from s because of (II). This vertex v_a_last must have at least one adjacent vertex v_b_last in B which we have not visited yet (v_a_last has k neighbors in B but we have visited at most k-1 of them so far, since we have only made k-1 removal rounds). Since we haven't visited v_b_last so far, the edge from v_b_last to t must still be in the graph. Hence in round k we can go from s to v_a_last to v_b_last to t, which is the k-th edge-disjoint path from s to t.
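The removal process can be written as a short greedy routine. This is only a sketch under the assumptions of the argument above: `adj_a` is a hypothetical dictionary mapping every vertex of A to its set of k neighbours in B, each round uses a fresh vertex of A and a fresh vertex of B, and the labels "s" and "t" stand for the two special vertices.

```python
def greedy_disjoint_paths(adj_a, k):
    """Build k edge-disjoint s-t paths, one per removal round."""
    used_b = set()
    paths = []
    for a in adj_a:  # every round picks an A-vertex not visited before
        if len(paths) == k:
            break
        # a has k neighbours and fewer than k of them are used, so one is free.
        # (next() raises StopIteration if that assumption is violated.)
        b = next(x for x in adj_a[a] if x not in used_b)
        used_b.add(b)
        paths.append(("s", a, b, "t"))
    return paths
```

Because no vertex of A or B appears in two paths, no edge can be shared between them.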

Find all edges in min-cut

Let (G,s,t,{c}) be a flow network, and let F be the set of all edges e for which there exists at least one minimum cut (A,B) such that e goes from A to B. Give a polynomial time algorithm that finds all edges in F.
NOTE: So far I know I need to run Ford-Fulkerson so that each edge has a flow. Furthermore I know that for all edges in F, the flow f(e) = c(e). However, not every edge in G which satisfies that constraint will be in a min-cut. I am stuck here.
Suppose you have computed a max flow on a graph G and you know the flow through every edge in the graph. From the source vertex s, perform a breadth-first or depth-first search on the residual graph: traverse a forward edge only if its flow is less than its capacity, and traverse an edge backwards only if it carries positive flow. Denote the set of vertices reachable in this traversal as S, and the unreachable vertices as T.
To obtain the minimum cut C, we simply find all edges in the original graph G which begin at some vertex in S and end at some vertex in T.
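As a rough illustration of the two steps, the sketch below computes a max flow with BFS augmenting paths (Edmonds-Karp, chosen here only for brevity) and then reads the cut off the final residual graph. The function name `min_cut_edges` and the nested-dictionary capacity format are assumptions. It returns the edges of one minimum cut, the one whose source side is the set S described above.

```python
from collections import deque


def min_cut_edges(capacity, s, t):
    """capacity: {u: {v: c}}. Returns (max_flow_value, list of cut edges)."""
    # Residual capacities, with reverse edges starting at 0.
    residual = {s: {}, t: {}}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)

    def search():
        # BFS over edges with positive residual capacity.
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v, r in residual[u].items():
                if r > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0
    while True:
        parent = search()
        if t not in parent:
            break  # no augmenting path left: the flow is maximum
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= push
            residual[v][u] += push
        flow += push

    reachable = set(parent)  # the set S from the last search
    cut = [(u, v) for u in capacity for v in capacity[u]
           if u in reachable and v not in reachable]
    return flow, cut
```

In the test network used to check this sketch, the total capacity of the returned edges equals the max flow value, as the max-flow min-cut theorem requires.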
This tutorial in Topcoder provides an explanation / proof of the above algorithm. Look at the section beginning with the following text:
A cut in a flow network is simply a partition of the vertices in two sets, let's call them A and B, in such a way that the source vertex is in A and the sink is in B.
I shall attempt to provide an explanation of the corresponding section in the Topcoder tutorial (just for me to brush up on this as well).
Now, suppose that we have computed a max flow on a graph G, and that we have computed the set of edges C using the procedure outlined above. From here, we can conclude several facts.
Fact 1: Source vertex s must be in set S, and sink vertex t must be in set T.
Otherwise, vertices s and t must be in the same set, which means that we must have found a path from s to t consisting only of edges that have flow less than capacity. This means that we can push more flow from s to t, and therefore we have found an augmenting path! However, this is a contradiction, since we have already computed a max flow on the graph. Hence, it is impossible for source vertex s and sink vertex t to be connected, and they must be in different sets.
Fact 2: Every edge beginning at set S and ending at set T must have flow == capacity
Again we prove this by contradiction. Suppose that there is a vertex u in S and a vertex v in T, such that edge (u,v) in the residual network has flow less than capacity. By our algorithm above, this edge will be traversed, and vertex v should be in set S. This is a contradiction. Therefore, such an edge must have flow == capacity.
Fact 3: Removing the edges in C from graph G will mean that there is no path from any vertex in set S to any vertex in set T
Suppose that this is not the case, and there is some edge (u,v) that connects vertex u in set S to vertex v in set T. We can separate this into 2 cases:
Flow through edge (u,v) is less than its capacity. But we know this will cause vertex v to be part of set S, so this case is impossible.
Flow through edge (u,v) is equal to its capacity. This is impossible since edge (u,v) will be considered as part of the edge set C.
Hence both cases are impossible, and we see that removing the edges in C from the original graph G will indeed result in a situation where there is no path from S to T.
Fact 4: Every edge in the original graph G that begins at vertex set T but ends at vertex set S must have a flow of 0
The explanation on the Topcoder tutorial may not be obvious on first reading and the following is an educated guess on my part and may be incorrect.
Suppose that there exists some edge (x,y) (where x belongs to vertex set T and y belongs to vertex set S), such that the flow through (x,y) is greater than 0. For convenience, we denote the flow through (x,y) as f. This means that on the residual network, there must exist a backward edge (y,x) with capacity f and flow 0. Since vertex y is part of set S, the backward edge (y,x) has flow 0 with capacity f > 0, our algorithm will traverse the edge (y,x) and place vertex x as part of vertex set S. However, we know that vertex x is part of vertex set T, and hence this is a contradiction. As such, all edges from T to S must have a flow of 0.
With these 4 facts, along with the Max-flow min-cut theorem, we can conclude that:
The max flow must be less than or equal to the capacity of any cut. By Fact 3, C is a cut of the graph, so the max flow must be less than or equal to the capacity of cut C.
Fact 4 allows us to conclude that there is no "back flow" from T to S. This along with Fact 2 means that the flow consists entirely of "forward flow" from S to T. In particular, all the forward flow must result from the cut C. This flow value happens to be the max flow. As such, by the Max-flow min-cut theorem, we know that C must be a minimum cut.

Path from s to e in a weighted DAG graph with limitations

Consider a directed graph with n nodes and m edges. Each edge is weighted. There is a start node s and an end node e. We want to find the path from s to e that has the maximum number of nodes such that:
the total distance is less than some constant d
starting from s, each node in the path is closer than the previous one to the node e. (as in, when you traverse the path you are getting closer to your destination e. in terms of the edge weight of the remaining path.)
We can assume there are no cycles in the graph. There are no negative weights. Does an efficient algorithm already exist for this problem? Is there a name for this problem?
Whatever you end up doing, do a BFS/DFS starting from s first to see if e can even be reached; this only takes you O(n+m) so it won't add to the complexity of the problem (since you need to look at all vertices and edges anyway). Also, delete all edges with weight 0 before you do anything else since those never fulfill your second criterion.
EDIT: I figured out an algorithm; it's polynomial, depending on the size of your graphs it may still not be sufficiently efficient though. See the edit further down.
Now for some complexity. The first thing to think about here is an upper bound on how many paths we can actually have, so depending on the choice of d and the weights of the edges, we also have an upper bound on the complexity of any potential algorithm.
How many edges can there be in a DAG? The answer is n(n-1)/2, which is a tight bound: take n vertices, order them from 1 to n; for two vertices i and j, add an edge i->j to the graph iff i<j. This sums to a total of n(n-1)/2, since this way, for every pair of vertices, there is exactly one directed edge between them, meaning we have as many edges in the graph as we would have in a complete undirected graph over n vertices.
How many paths can there be from one vertex to another in the graph described above? The answer is 2^(n-2). Proof by induction:
Take the graph over 2 vertices as described above; there is 1 = 2^0 = 2^(2-2) path from vertex 1 to vertex 2: (1->2).
Induction step: assuming there are 2^(n-2) paths from the vertex with number 1 of an n vertex graph as described above to the vertex with number n, increment the number of each vertex and add a new vertex 1 along with the required n edges. It has its own edge to the vertex now labeled n+1. Additionally, it has 2^(i-2) paths to that vertex for every i in [2;n] (it has all the paths the other vertices have to the vertex n+1 collectively, each "prefixed" with the edge 1->i). This gives us 1 + Σ_{k=2}^{n} 2^(k-2) = 1 + Σ_{k=0}^{n-2} 2^k = 1 + (2^(n-1) - 1) = 2^(n-1) = 2^((n+1)-2).
So we see that there are DAGs that have 2^(n-2) distinct paths between some pairs of their vertices; this is a bit of a bleak outlook, since depending on weights and your choice of d, you may have to consider them all. This in itself doesn't mean we can't choose some form of optimum (which is what you're looking for) efficiently though.
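The count is easy to confirm numerically with a small dynamic program over the complete DAG described above (edge i->j iff i<j). The function name is a hypothetical choice for this sketch.

```python
def count_paths_complete_dag(n):
    """Number of paths from vertex 1 to vertex n when i->j is an edge iff i < j."""
    # paths[i] = number of distinct paths from vertex i to vertex n
    paths = [0] * (n + 1)
    paths[n] = 1
    for i in range(n - 1, 0, -1):
        # Every path from i starts with some edge i->j, j > i.
        paths[i] = sum(paths[j] for j in range(i + 1, n + 1))
    return paths[1]
```

For every n from 2 to 11 the result agrees with 2^(n-2).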
EDIT: Ok so here goes what I would do:
Delete all edges with weight 0 (and smaller, but you ruled that out), since they can never fulfill your second criterion.
Do a topological sort of the graph; in the following, let's only consider the part of the topological sorting of the graph from s to e, let's call that the integer interval [s;e]. Delete everything from the graph that isn't strictly in that interval, meaning all vertices outside of it along with the incident edges. During the topSort, you'll also be able to see whether there is a path from s to e, so you'll know whether there are any paths s-...->e. Complexity of this part is O(n+m).
Now the actual algorithm:
Traverse the vertices of [s;e] in the order imposed by the topological sorting.
For every vertex v, store a two-dimensional array of information; let's call it pre_v[][] since it's going to store information about the predecessors of a node on the paths leading towards it.
In pre_v[i][j], store how long the total path of length i (counted in vertices) is as a sum of the edge weights, if j is the predecessor of the current vertex on that path. For example, pre_{s+1}[1][s] would have the weight of the edge s->s+1 in it, while all other entries in pre_{s+1} would be 0/undefined.
When calculating the array for a new vertex v, all we have to do is check its incoming edges and iterate over the arrays for the start vertices of those edges. For example, let's say vertex v has an incoming edge from vertex w, having weight c. Consider what the entry pre_v[i][w] should be. We have an edge w->v, so we need to set pre_v[i][w] to min(pre_w[i-1][k] for all k, but ignore entries with 0) + c (notice the subscript of the array!); we effectively take the cost of a path of length i - 1 that leads to w, and add the cost of the edge w->v. Why the minimum? The vertex w can have many predecessors for paths of length i - 1; however, we want to stay below a cost limit, which greedy minimization at each vertex will do for us. We will need to do this for all i in [1;v-s].
While calculating the array for a vertex, do not set entries that would give you a path with cost above d; since all edges have positive weights, we can only get more costly paths with each edge, so just ignore those.
Once you have reached e and finished calculating pre_e, you're done with this part of the algorithm.
Iterate over pre_e, starting with pre_e[e-s]; since we have no cycles, all paths are simple paths and therefore the longest path from s to e can have e-s edges. Find the largest i such that pre_e[i] has a non-zero (meaning it is defined) entry; if none exists, there is no path fitting your criteria. You can reconstruct any existing path using the arrays of the other vertices.
Now that gives you a space complexity of O(n^3) and a time complexity of O(n²m) - the arrays have O(n²) entries, we have to iterate over O(m) arrays, one array for each edge - but I think it's very obvious where the wasteful use of data structures here can be optimized using hashing structures and other things than arrays. Or you could just use a one-dimensional array and only store the current minimum instead of recomputing it every time (you'll have to encapsulate the sum of edge weights of the path together with the predecessor vertex though since you need to know the predecessor to reconstruct the path), which would change the size of the arrays from n² to n since you now only need one entry per number-of-nodes-on-path-to-vertex, bringing down the space complexity of the algorithm to O(n²) and the time complexity to O(nm). You can also try and do some form of topological sort that gets rid of the vertices from which you can't reach e, because those can be safely ignored as well.
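The one-dimensional variant could look roughly like the sketch below. Several things here are assumptions rather than part of the answer: the function name, the `(u, v, weight)` edge-list format, dictionaries in place of arrays, and the reading of the first criterion as "total weight strictly less than d". With strictly positive weights the remaining distance to e shrinks at every step, so the second criterion needs no separate check. `graphlib` requires Python 3.9 or newer.

```python
from graphlib import TopologicalSorter


def longest_bounded_path(edges, s, e, d):
    """Path from s to e with the most vertices and total weight < d, or None."""
    preds = {}
    nodes = {s, e}
    for u, v, w in edges:
        if w <= 0:
            continue  # zero-weight edges never satisfy the second criterion
        preds.setdefault(v, []).append((u, w))
        nodes.update((u, v))
    # TopologicalSorter expects a mapping from node to its predecessors.
    graph = {v: [u for u, _ in preds.get(v, [])] for v in nodes}
    order = TopologicalSorter(graph).static_order()

    # best[v][i] = (cost, predecessor) of the cheapest s->v path with i edges
    best = {v: {} for v in nodes}
    best[s][0] = (0, None)
    for v in order:
        for u, w in preds.get(v, []):
            for i, (cost, _) in best[u].items():
                new_cost = cost + w
                if new_cost < d and (i + 1 not in best[v] or new_cost < best[v][i + 1][0]):
                    best[v][i + 1] = (new_cost, u)

    if not best[e]:
        return None
    i = max(best[e])  # the largest number of edges that stays under d
    path, v = [e], e
    while i > 0:
        v = best[v][i][1]
        path.append(v)
        i -= 1
    return path[::-1]
```

Each vertex keeps at most n entries and each edge is combined with at most n entries of its start vertex, which matches the O(n²) space and O(nm) time stated above.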

Resources