Let's say I have a graph and run a max-flow on it. I get some flow, f. However, I want f1 units to flow, where f1 > f. Of course, I need to increase some of the edge capacities, and I want the total increase in capacity to be as small as possible. Is there a clever algorithm to achieve this?
If it helps, my application deals with bipartite graphs. The source (s) connects to the left vertices (L) with finite integer capacities (c_l). The left vertices L connect to the right vertices R through edges of infinite capacity, in some connectivity pattern. All right vertices in R connect to a sink vertex with finite integer capacities (c_r). The c_l and the c_r sum to the same number. There are no edges among the left vertices or among the right ones.
An example is provided in the image below. The blue numbers are the flow capacities and the pink numbers are the actual flows in the max-flow. Currently, 5 units are flowing but I want 9 units to flow.
In general, turn the flow instance into a min-cost flow instance. Give each existing arc a cost of zero, and alongside each of them add a new, parallel arc with infinite capacity and cost one.
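Here is a minimal Python sketch of this reduction. It assumes integer capacities, treats a capacity of `None` as infinite, and uses successive shortest paths with Bellman-Ford as the min-cost flow solver. That solver is my choice; any min-cost flow routine would do. The function name `min_capacity_increase` is hypothetical. No arc ever needs to carry more than the requested flow, so "infinite" is modelled as a capacity equal to `target`.

```python
def min_capacity_increase(n, arcs, s, t, target):
    """Minimum total capacity increase so that `target` units can flow from s to t.

    arcs: list of (u, v, capacity) on nodes 0..n-1; capacity None means infinite.
    Existing capacity costs 0 per unit; a parallel arc of cost 1 per unit models
    buying extra capacity (the min-cost flow reduction described above).
    """
    INF = float("inf")
    head, cap, cost = [], [], []
    graph = [[] for _ in range(n)]

    def add_arc(u, v, c, w):
        # arc k and its reverse k ^ 1 are stored next to each other
        graph[u].append(len(head))
        head.append(v)
        cap.append(c)
        cost.append(w)
        graph[v].append(len(head))
        head.append(u)
        cap.append(0)
        cost.append(-w)

    for u, v, c in arcs:
        add_arc(u, v, target if c is None else c, 0)  # existing capacity is free
        add_arc(u, v, target, 1)                      # each extra unit costs 1

    sent = total = 0
    while sent < target:
        dist = [INF] * n
        via = [-1] * n
        dist[s] = 0
        for _ in range(n - 1):          # Bellman-Ford: reverse arcs can cost -1
            changed = False
            for u in range(n):
                if dist[u] == INF:
                    continue
                for a in graph[u]:
                    if cap[a] > 0 and dist[u] + cost[a] < dist[head[a]]:
                        dist[head[a]] = dist[u] + cost[a]
                        via[head[a]] = a
                        changed = True
            if not changed:
                break
        if dist[t] == INF:
            raise ValueError("t cannot be reached from s")
        path, v = [], t
        while v != s:                   # walk the cheapest path back from t
            path.append(via[v])
            v = head[via[v] ^ 1]
        push = min([target - sent] + [cap[a] for a in path])
        for a in path:
            cap[a] -= push
            cap[a ^ 1] += push
        sent += push
        total += push * dist[t]
    return total
```

Consider a hypothetical bipartite instance with s-to-left capacities 3 and 1, infinite left-to-right links L1-R1 and L2-R2, and right-to-t capacities 1 and 3. Its current maximum flow is 2. Raising it to 4 costs 2. Raising it to 7 costs 6, because once all spare capacity is used up, each further unit needs two new units of capacity.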
For these particular instances, the best you're going to do is to repeatedly find an unsaturated arc of finite capacity and push flow along any path that includes it. Once everything's saturated just use any path.
This seems a little too easy to be what you want, so I'll mention that it's possible to formulate more sophisticated objectives and solve them using linear programming techniques.
The graph is undirected, and all the "middle" edges have infinite capacity. That means we can unify all vertices in L and R that are connected by infinite-capacity edges, which gives a very simple graph indeed.
For example, in the above graph, an equivalent graph would be:
s -8-> Vertex 1+2+4 -4-> t
s -1-> Vertex 3+5 -5-> t
So we end up with just a set of separate paths with no branching. We can unify the nodes with a simple flood fill or DFS over the infinite-capacity edges. When we unify nodes, we add up their left and right capacities.
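Here is a Python sketch of the merging step. It uses union-find instead of an explicit flood fill; both produce the same groups. The function name and the input format (dicts of left and right capacities plus a list of infinite-capacity links) are assumptions made for illustration.

```python
def merge_components(left_cap, right_cap, links):
    """Unify left/right vertices joined by infinite-capacity edges.

    left_cap: dict left vertex -> capacity of its s -> vertex edge
    right_cap: dict right vertex -> capacity of its vertex -> t edge
    links: iterable of (left vertex, right vertex) infinite-capacity edges
    Returns a list of (total left capacity, total right capacity), one per merged node.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    # tag vertices with their side so left and right labels cannot collide
    for l, r in links:
        parent[find(("L", l))] = find(("R", r))

    totals = {}
    for l, c in left_cap.items():
        root = find(("L", l))
        a, b = totals.get(root, (0, 0))
        totals[root] = (a + c, b)
    for r, c in right_cap.items():
        root = find(("R", r))
        a, b = totals.get(root, (0, 0))
        totals[root] = (a, b + c)
    return list(totals.values())
```

Take hypothetical left vertices a, b, c with capacities 5, 3, 1, right vertices x, y with capacities 4, 5, and links a-x, b-x, c-y. The result is the totals (8, 4) and (1, 5), which match the two merged vertices in the example.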
To maximize flow in this graph we:
First, for each merged node whose left and right capacities differ, increase the lower one until they are equal. This converts a capacity increase of cost X into a flow increase of X.
Once the left and right capacities are equal for all nodes, we pick any path and increase both of its halves by X at a cost of 2X, which increases the flow by X.
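Once the merged capacities are known, the two steps reduce to a closed-form cost. Here is a small Python sketch that takes the merged (left, right) capacity pairs as input. `cost_to_reach` is a hypothetical name.

```python
def cost_to_reach(components, target):
    """Minimum capacity increase to reach `target` units of flow.

    components: list of (left capacity, right capacity) for each merged node.
    """
    flow = sum(min(a, b) for a, b in components)     # each merged node carries min(a, b)
    if target <= flow:
        return 0
    needed = target - flow
    slack = sum(abs(a - b) for a, b in components)   # step 1: raise the smaller side, cost 1 per unit
    cheap = min(needed, slack)
    return cheap + 2 * (needed - cheap)              # step 2: raise both halves, cost 2 per unit
```

For the merged example `[(8, 4), (1, 5)]`, the current flow is 4 + 1 = 5, and reaching 9 costs 4.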
I came across wait-for graphs, and I wonder: are there any efficient algorithms for detecting whether adding an edge to a directed graph results in a cycle?
The graphs in question are mutable (nodes and edges can be added or removed). We're not interested in actually finding an offending cycle; just knowing there is one is enough (to prevent adding the offending edge).
Of course it'd be possible to use an algorithm for computing strongly connected components (such as Tarjan's) to check if the new graph is acyclic or not, but running it again every time an edge is added seems quite inefficient.
If I understood your question correctly, then a new edge (u,v) is only inserted if there was no path from v to u before (i.e., if (u,v) does not create a cycle). Thus, your graph is always a DAG (directed acyclic graph). Using Tarjan's Algorithm to detect strongly connected components (http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) sounds like overkill in this case. Before inserting (u,v), all you have to check is whether there is a directed path from v to u, which can be done with a simple BFS/DFS.
So the simplest way of doing it is the following (n = |V|, m = |E|):
Inserting (u,v): Check whether there is a path from v to u (BFS/DFS). Time complexity: O(m)
Deleting edges: Simply remove them from the graph. Time complexity: O(1)
Although inserting (u,v) takes O(m) time in the worst case, it is probably pretty fast in your situation. When doing the BFS/DFS starting from v to check whether u is reachable, you only visit vertices that are reachable from v. I would guess that in your setting the graph is pretty sparse and that the number of vertices reachable from any given vertex is not that high.
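Here is a minimal Python sketch of this approach. Edges are kept in adjacency sets, and each insertion runs an iterative DFS from v looking for u. The class and method names are hypothetical. A self-loop (u, u) is rejected, because the DFS starting at u finds u immediately.

```python
from collections import defaultdict


class AcyclicGraph:
    """Directed graph that refuses edges which would close a cycle."""

    def __init__(self):
        self.succ = defaultdict(set)   # node -> set of successors

    def _reaches(self, start, goal):
        """Iterative DFS: is there a directed path from start to goal?"""
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            for nxt in self.succ[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def add_edge(self, u, v):
        """Insert (u, v) unless a path v -> u already exists (O(n + m) worst case)."""
        if self._reaches(v, u):
            return False
        self.succ[u].add(v)
        return True

    def remove_edge(self, u, v):
        """O(1) on average with hash sets."""
        self.succ[u].discard(v)
```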
However, if you want to improve the theoretical running time, here are some hints (mostly showing that this will not be very easy). Assume we aim to test in O(1) time whether there exists a directed path from v to u. The keyword in this context is the transitive closure of a DAG (i.e., a graph that contains an edge (u, v) if and only if there is a directed path from u to v in the DAG). Unfortunately, maintaining the transitive closure in a dynamic setting does not seem to be that simple. There are several papers on this problem, and all the ones I found were STOC or FOCS papers, which indicates that they are very involved. The newest (and fastest) result I found is in the paper Dynamic Transitive Closure via Dynamic Matrix Inverse by Sankowski (http://dl.acm.org/citation.cfm?id=1033207).
Even if you are willing to understand one of those dynamic transitive closure algorithms (or even want to implement it), they will not give you any speed-up, for the following reason. These algorithms are designed for the situation where you have a lot of connectivity queries (which can then be answered in O(1) time) and only a few changes to the graph. The goal then is to make these changes cheaper than recomputing the transitive closure. However, such an update is still slower than a single connectivity check. Thus, if you need to do an update for every connectivity query, it is better to use the simple approach mentioned above.
So why do I mention maintaining the transitive closure if it does not fit your needs? Well, it shows that searching for an algorithm with only O(1) query time will probably not lead you to a solution faster than the simple one using BFS/DFS. What you could try is to get a query time that is faster than O(m) but worse than O(1), while updates are also faster than O(m). This is a very interesting problem, but it sounds to me like a very ambitious goal (so maybe do not spend too much time trying to achieve it...).
As Mark suggested, it is possible to use a data structure that stores which nodes are connected. The best choice is a |V|x|V| boolean matrix. Its values can be initialized with the Floyd–Warshall algorithm in O(|V|^3).
Let T(i) be the set of vertices that have a path to vertex i, and F(j) the set of vertices reachable from vertex j. If matrix element (a,b) records a path from a to b, then T(i) is the set of true entries in column i, and F(j) is the set of true entries in row j.
Adding an edge (i,j) is a simple operation. If i and j were not connected before, then for each a in T(i) and each b in F(j) (including i and j themselves), set matrix element (a,b) to true. But the operation isn't cheap: in the worst case it is O(|V|^2). That worst case is a directed line, where adding an edge from the end vertex to the start vertex connects every vertex to every other vertex.
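Here is a Python sketch of the matrix approach. It assumes vertices numbered 0..n-1 and takes `reach[a][b]` to mean "there is a path from a to b". It covers initialization and edge insertion only, and the function names are hypothetical. For cycle detection, adding (i, j) would close a cycle exactly when `reach[j][i]` is already true or i equals j.

```python
def transitive_closure(n, edges):
    """Floyd-Warshall (Warshall) closure: reach[a][b] is True iff a path a -> b exists. O(n^3)."""
    reach = [[False] * n for _ in range(n)]
    for a, b in edges:
        reach[a][b] = True
    for k in range(n):
        for a in range(n):
            if reach[a][k]:
                for b in range(n):
                    if reach[k][b]:
                        reach[a][b] = True
    return reach


def add_edge_to_closure(reach, i, j):
    """Update the closure after inserting i -> j. O(n^2) in the worst case."""
    n = len(reach)
    if reach[i][j]:
        return                                            # already connected: nothing changes
    into_i = [a for a in range(n) if reach[a][i]] + [i]   # T(i) plus i itself
    from_j = [b for b in range(n) if reach[j][b]] + [j]   # F(j) plus j itself
    for a in into_i:
        for b in from_j:
            reach[a][b] = True
```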
Removing an edge (i,j) is not as simple, but it is not more expensive in the worst case :-) If there is still a path from i to j after removing the edge, then nothing changes. That is checked with Dijkstra in less than O(|V|^2). The pairs (a,b) that are no longer connected are:
a in T(i) - i - T(j),
b in F(j) + j
Only T(j) changes when edge (i,j) is removed, so it has to be recalculated. That is done by any kind of graph traversal (BFS, DFS), following edges in the opposite direction from vertex j, in less than O(|V|^2). Since setting the matrix elements is again O(|V|^2) in the worst case, this operation has the same worst-case complexity as adding an edge.
This is a problem which I recently faced in a slightly different situation (optimal ordering of interdependent compiler instructions).
While I can't improve on O(n*n) theoretical bounds, after a fair amount of experimentation and assuming heuristics for my case (for example, assuming that the initial ordering wasn't created maliciously) the following was the best compromise algorithm in terms of performance.
(In my case I had an acceptable "right side failure": after the initial nodes and arcs were added (which was guaranteed to be possible), it was acceptable for the optimiser to occasionally reject the addition of further arcs where one could actually be added. This approximation isn't necessary for this algorithm when carried to completion, but the algorithm does admit such an approximation if you wish to use it, thereby limiting its runtime further.)
As long as the nodes are kept in topologically sorted order, the graph is guaranteed to be cycle-free. In the first phase, when I had a static bulk of nodes and arcs to add, I added the nodes and then topologically sorted them.
During the second phase, adding additional arcs, there are two situations when considering an arc from A to B. If A already lies to the left of B in the sort, an arc can simply be added and no cycle can be generated, as the list is still topologically sorted.
If B is to the left of A, we consider the sub-sequence between B and A and partition it into two disjoint sequences X and Y, where X is the nodes that can reach A (and Y the others). If A is not reachable from B, i.e. there are no direct arcs from B into X or to A, then the sequence can be reordered XABY before adding the A to B arc, showing that it is still cycle-free and maintaining the topological sort. The efficiency gain over the naive algorithm is that we only need to consider the subsequence between B and A, because our list is topologically sorted: A is not reachable from any node to the right of A. In my situation, where localised reorderings are the most frequent and important, this is an important gain.
As we don't reorder within the sequences X,A,B,Y, clearly any arcs which start or end within the same sequence are still ordered correctly, and the same in each flank, and any "fly-over" arcs from the left to the right flanks. Any arcs between the flanks and X,A,B,Y are also still ordered correctly as our reordering is restricted to this local region. So we only need to consider arcs between our four sequences. Consider each possible "problematic" arc for our final ordering XABY in turn: YB YA YX BA BX AX. Our initial order was B[XY]A, so AX and YB cannot occur. X reaches A, but Y does not, therefore YX and YA do not occur or A could be reached from the source of the arc in Y (potentially via X) a contradiction. Our criterion for acceptability was that there are no links BX or BA. So there are no problematic arcs, and we are still topologically sorted.
Conversely, if our only acceptability criterion (that A is not reachable from B) fails, adding the arc A->B clearly creates a cycle, B -(X)-> A -> B. So the criterion is also necessary.
This can be implemented reasonably efficiently if we can add a flag to each node. Consider the nodes [BXY] going right-to-left, starting from the node immediately to the left of A. If that node has a direct arc to A, set its flag. For an arbitrary such node, we need only consider its direct outgoing arcs. The nodes to its right are either after A (and so irrelevant), or else have already been flagged if they can reach A. So the flag on such a node is set when it has a direct arc to A or to any flagged node. If B is not flagged at the end of the process, the reordering is acceptable and the flagged nodes comprise X.
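Here is a Python sketch of the insertion step. It assumes the nodes are supplied in an initial topological order and that arcs are stored as adjacency sets. The class name `IncrementalDAG` is hypothetical. This is my reading of the procedure, not the original code, and it leaves out the optional distance limit for the approximation.

```python
class IncrementalDAG:
    """Keeps nodes in topological order; rejects arcs that would close a cycle."""

    def __init__(self, nodes):
        self.order = list(nodes)                       # assumed to start topologically sorted
        self.pos = {n: i for i, n in enumerate(self.order)}
        self.out = {n: set() for n in self.order}

    def add_arc(self, a, b):
        """Try to add a -> b. Returns False (changing nothing) if it would create a cycle."""
        if a == b:
            return False
        pa, pb = self.pos[a], self.pos[b]
        if pa < pb:                                    # already consistent with the order
            self.out[a].add(b)
            return True
        # B lies left of A: scan right-to-left from just left of A down to B,
        # flagging nodes with a direct arc to A or to an already-flagged node.
        flagged = set()
        for i in range(pa - 1, pb - 1, -1):
            n = self.order[i]
            if a in self.out[n] or any(m in flagged for m in self.out[n]):
                flagged.add(n)
        if b in flagged:                               # B reaches A, so A -> B closes a cycle
            return False
        between = self.order[pb + 1:pa]                # nodes strictly between B and A
        xs = [n for n in between if n in flagged]      # X: nodes that reach A
        ys = [n for n in between if n not in flagged]  # Y: the others
        self.order[pb:pa + 1] = xs + [a, b] + ys       # reorder B[XY]A into XABY
        for i in range(pb, pa + 1):
            self.pos[self.order[i]] = i
        self.out[a].add(b)
        return True
```

Because X and Y keep their relative order, only the slice between B and A is rewritten. That is the local reordering the answer relies on.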
Though this always yields a correct ordering if carried to completion (as far as I can tell), as I mentioned in the introduction it is particularly efficient if your initial build is approximately correct (in the sense of accommodating likely additional arcs without reordering).
There also exists an effective approximation, if your context is such that "outrageous" arcs can be rejected (those which would massively reorder) by limiting the A to B distance you are prepared to scan. If you have an initial list of the additional arcs you wish to add, they can be ordered by increasing distance in the initial ordering until you run out of some scanning "credit", and call your optimisation a day at that point.
If the graph is directed, you would only have to check the parent nodes (navigate up until you reach the root) of the node where the new edge should start. If one of the parent nodes is equal to the end of the edge, adding the edge would create a cycle.
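This check only works when every node has at most one parent (a tree or forest). With several parents, you need a full search from the target node instead. Here is a small Python sketch under that assumption, using a hypothetical `parent` dict that maps each node to its parent (edges go parent -> child):

```python
def creates_cycle_in_forest(parent, u, v):
    """Adding u -> v closes a cycle iff v is u itself or one of u's ancestors.

    parent: dict node -> its single parent; roots are absent or map to None.
    """
    node = u
    while node is not None:          # walk up from the start of the new edge
        if node == v:
            return True
        node = parent.get(node)
    return False
```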
If all previous jobs are in topologically sorted order, then adding an edge that appears to break the sort, and that cannot be fixed, means you have a cycle.
https://stackoverflow.com/a/261621/831850
So if we have a sorted list of nodes:
1, 2, 3, ..., x, ..., z, ...
Such that each node is waiting for nodes to its left.
Say we want to add an edge x->z. That appears to break the sort, so we can move the node at x to position z+1, which fixes the sort if and only if none of the nodes in (x, z] have an edge to the node at x.
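Here is a Python sketch of the single-node move. It assumes `order` lists the nodes so that each node only waits for nodes to its left, and that `waits[n]` is the set of nodes n waits for; both names are hypothetical. A `False` result only means this one move fails. It does not by itself prove a cycle, because some other reordering might still work.

```python
def try_add_wait(order, waits, x, z):
    """Record that x waits for z, moving x to just after z if needed."""
    ix, iz = order.index(x), order.index(z)
    if ix == iz:
        return False                             # a node cannot wait for itself
    if iz < ix:                                  # z already to the left: sort still valid
        waits[x].add(z)
        return True
    # z is to the right of x: moving x after z is valid only if
    # nobody in (x, z] waits for x
    if any(x in waits[n] for n in order[ix + 1:iz + 1]):
        return False
    order.pop(ix)
    order.insert(iz, x)                          # after the pop, index iz is just after z
    waits[x].add(z)
    return True
```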
A question to the following exercise:
Let N = (V,E,c,s,t) be a flow network such that (V,E) is acyclic, and let m = |E|. Describe a polynomial-time algorithm that checks whether N has a unique maximum flow by solving ≤ m + 1 max-flow problems.
Explain the correctness and running time of the algorithm.
My suggestion would be the following:
run FF (Ford-Fulkerson) once and save the value of the flow v(f) and the flow over all edges f(e_i)
for each edge e_i with f(e_i)>0:
set the capacity of this edge (for this iteration only) to c(e_i) = f(e_i) - 1 and run FF.
If the value of the flow is the same as in the original graph, then there exists another way to push the max flow through the network and we're done - the max flow isn't unique --> return "not unique"
Otherwise we continue
if the loop finishes without finding another max flow of the same value, the max flow is unique -> return "unique"
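Here is a Python sketch of the proposed algorithm. It assumes integer capacities and nodes numbered 0..n-1, and uses Edmonds-Karp (BFS-based Ford-Fulkerson) as the max-flow routine. The function names are hypothetical. "Unique" here means there is no different edge-to-flow function with the same value.

```python
from collections import deque


def max_flow(n, edges, s, t):
    """Edmonds-Karp on nodes 0..n-1; edges is a list of (u, v, capacity).
    Returns (value, flows) where flows[i] is the flow on edges[i]."""
    graph = [[] for _ in range(n)]
    head, cap = [], []               # arc 2i is edge i, arc 2i + 1 is its reverse

    def add_arc(u, v, c):
        graph[u].append(len(head))
        head.append(v)
        cap.append(c)

    for u, v, c in edges:
        add_arc(u, v, c)
        add_arc(v, u, 0)
    value = 0
    while True:
        via = [-1] * n               # arc used to reach each node; -2 marks s
        via[s] = -2
        queue = deque([s])
        while queue and via[t] == -1:    # BFS for a shortest augmenting path
            u = queue.popleft()
            for a in graph[u]:
                if cap[a] > 0 and via[head[a]] == -1:
                    via[head[a]] = a
                    queue.append(head[a])
        if via[t] == -1:
            return value, [cap[2 * i + 1] for i in range(len(edges))]
        path, v = [], t
        while v != s:                # walk back from t; arc a ^ 1 points at a's tail
            path.append(via[v])
            v = head[via[v] ^ 1]
        push = min(cap[a] for a in path)
        for a in path:
            cap[a] -= push
            cap[a ^ 1] += push
        value += push


def has_unique_max_flow(n, edges, s, t):
    """The proposed test: at most m + 1 max-flow runs."""
    value, flows = max_flow(n, edges, s, t)
    for i, (u, v, _) in enumerate(edges):
        if flows[i] > 0:
            restricted = list(edges)
            restricted[i] = (u, v, flows[i] - 1)     # forbid the current flow on edge i
            if max_flow(n, restricted, s, t)[0] == value:
                return False                          # another max flow exists
    return True
```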
Any feedback? Have I overlooked some cases where this does not work?
Your question leaves a few details open, e.g., is this an integer flow graph (probably yes, although Ford-Fulkerson, if it converges, can run on other networks as well), and how exactly do you define whether two flows are different (is it enough that the function mapping edges to flows be different, or must the set of edges actually flowing something be different, which is a stronger requirement).
If the network does not necessarily have integer flows, then no, this will not necessarily work. Consider the following graph. On each edge, the number within the parentheses represents the actual flow, and the number to the left of the parentheses represents the capacity (e.g., the capacity of each of (a, c) and (c, d) is 1.1, and the flow on each is 1):
In this graph, the flow is non-unique: it's possible to flow a total of 1 by sending 0.5 through (a, b) and (b, d). Your algorithm, however, won't find this by reducing the capacity of each of the edges to 1 below its current flow.
If the network is integer, the algorithm is not guaranteed to find a different set of participating edges than the current one. You can see this in the following graph:
Finally, though, if the network is an integer flow network, and the meaning of a different flow is simply a different function of edges to flows, then your algorithm is correct.
Sufficiency: If your algorithm finds a different flow with the same total value, then the new flow is obviously legal, and at least one of the edges necessarily carries a different amount than it did before.
Necessity: Suppose there is a different flow than the original one (with the same total value), with at least one edge carrying a different amount. Suppose, for contradiction, that on every edge the flow in the alternative solution is not less than the flow in the original solution. Since the flows are different, there must be at least one edge on which the alternative flow is larger. Without some other edge decreasing its flow, though, either flow conservation is violated or the original solution was suboptimal. Hence there is some edge e on which the alternative flow is lower than the original one. In particular, e carries positive flow in the original solution, so the algorithm does test it. Since it is an integer flow network, the alternative flow on e must be at least 1 lower. Reducing the capacity of e to exactly 1 below its current flow therefore does not make the alternative flow illegal, so some alternative flow of the same value must be found when the capacity of e is decreased.
Non-integer rational flows can be scaled to integers.
Changing edge capacities is risky, because some edges may be critical and included in every maximum flow.
There is a solution with a better running time: you don't need to check every single edge.
Create the residual network (https://en.wikipedia.org/wiki/Flow_network) of a maximum flow and run DFS on it. If you find a cycle (other than the trivial one formed by an edge's own forward and backward residual arcs), there is another maximum flow in which the flow on at least one edge is different.
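Here is a Python sketch of the residual-network test. It assumes integer capacities and nodes numbered 0..n-1, the function names are hypothetical, and Edmonds-Karp supplies the initial maximum flow. A naive DFS would always find the trivial two-arc cycle of an edge that is neither empty nor saturated. To avoid that, for every residual arc u -> v the sketch searches for a path from v back to u that does not use the original edge behind that arc. This favours simplicity over speed: after the single max-flow computation, it runs in O(m(n + m)).

```python
from collections import deque


def max_flow(n, edges, s, t):
    """Edmonds-Karp on nodes 0..n-1; edges is a list of (u, v, capacity).
    Returns (value, flows) where flows[i] is the flow on edges[i]."""
    graph = [[] for _ in range(n)]
    head, cap = [], []               # arc 2i is edge i, arc 2i + 1 is its reverse

    def add_arc(u, v, c):
        graph[u].append(len(head))
        head.append(v)
        cap.append(c)

    for u, v, c in edges:
        add_arc(u, v, c)
        add_arc(v, u, 0)
    value = 0
    while True:
        via = [-1] * n               # arc used to reach each node; -2 marks s
        via[s] = -2
        queue = deque([s])
        while queue and via[t] == -1:
            u = queue.popleft()
            for a in graph[u]:
                if cap[a] > 0 and via[head[a]] == -1:
                    via[head[a]] = a
                    queue.append(head[a])
        if via[t] == -1:
            return value, [cap[2 * i + 1] for i in range(len(edges))]
        path, v = [], t
        while v != s:
            path.append(via[v])
            v = head[via[v] ^ 1]
        push = min(cap[a] for a in path)
        for a in path:
            cap[a] -= push
            cap[a ^ 1] += push
        value += push


def reaches_avoiding(residual, src, dst, banned):
    """BFS in the residual graph from src to dst, never using arcs of edge `banned`."""
    seen = {src}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt, edge in residual[node]:
            if edge != banned and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def max_flow_is_unique(n, edges, s, t):
    """True iff the residual network has no cycle besides an edge's own forward/backward pair."""
    _, flows = max_flow(n, edges, s, t)
    residual = [[] for _ in range(n)]       # residual[u] holds (v, index of original edge)
    for i, (u, v, c) in enumerate(edges):
        if flows[i] < c:
            residual[u].append((v, i))      # more flow can be pushed along edge i
        if flows[i] > 0:
            residual[v].append((u, i))      # flow on edge i can be pushed back
    for u in range(n):
        for v, i in residual[u]:
            if reaches_avoiding(residual, v, u, i):
                return False                # arc u -> v plus that path is a real cycle
    return True
```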
I have a graph with N nodes and edges with costs (the graph may be complete, but it can also contain zero edges).
I want to find K trees in the graph (K < N) such that every node is visited and the total cost is as low as possible.
Any recommendations what the best approach could be?
I tried to reduce the problem to finding just a single minimum spanning tree, but didn't succeed.
Thank you for any hint!
EDIT
A little detail, which can be significant: the cost is not related to crossing an edge. The cost is the price to BUILD such an edge. Once an edge is built, you can traverse it forwards and backwards at no cost. The problem is not to "ride along all nodes"; the problem is about "creating a net among all nodes". I am sorry for the previous explanation.
The story
Here is the story I heard and am trying to solve.
There is a city without a connection to electricity. The electrical company is able to connect just K houses. The other houses can be connected by dropping cables from already-connected houses, but dropping such a cable costs something. The goal is to choose which K houses will be connected directly to the power plant, and which houses will be connected with separate cables, so that the cable cost is minimal and all houses are covered :)
As others have mentioned, this is NP-hard. However, if you're willing to accept a good solution, you could use simulated annealing. For example, the traveling salesman problem is NP-hard, yet near-optimal solutions can be found using simulated annealing, e.g. http://www.codeproject.com/Articles/26758/Simulated-Annealing-Solving-the-Travelling-Salesma
You are describing something like a cardinality-constrained path cover. It's in the Traveling Salesman / Vehicle Routing family of problems and is NP-hard. To create an algorithm, you should ask:
Are you only going to run it on small graphs?
Are you only going to run it on special cases of graphs which do have exact algorithms?
Can you live with a heuristic that solves the problem approximately?
Assume you can find a minimum spanning tree in O(V^2) using Prim's algorithm.
For each vertex, find the minimum spanning tree with that vertex as the root.
This will be O(V^3) as you run the algorithm V times.
Sort these by total mass (sum of the weights of their edges). This is O(V^2 lg V), which is absorbed by the O(V^3), so essentially free in terms of order complexity.
Take the X least massive graphs. The roots of these are your "anchors" that are connected directly to the grid, as they are most likely to have the shortest paths. To determine which route a node takes, you simply follow the path to the root from each node in each tree and wire up whichever is the shortest. (This may be further optimized by sorting all paths to the root and using only the shortest ones first, which allows optimizations in the next iterations. Finding a path to the root is O(V). Finding it for all V, X times, is O(V^2 * X). Because you would be doing this for every V, you're looking at O(V^3 * X). This exceeds the biggest complexity so far, but I think the average case will be small, even if the worst case is large.)
I cannot prove that this is the optimal solution. In fact, I am certain it is not. But when you consider an electrical grid of 100,000 homes, you cannot (in any practical application) consider an exact NP-hard solution. This gives you a very good solution in O(V^3 * X), which I imagine will be very close to optimal.
Looking at your story, I think that what you call a path can be a tree, which means that we don't have to worry about Hamiltonian circuits.
Looking at the proof of correctness of Prim's algorithm at http://en.wikipedia.org/wiki/Prim%27s_algorithm, consider taking a minimum spanning tree and removing the most expensive X-1 links. I think the proof there shows that the result has the same cost as the best possible answer to your problem. The only difference is that when you compare edges, you may find that the new edge joins two separated components; in this case you can maintain the number of separated components by removing an edge with cost at most that of the new edge.
So I think an answer for your problem is to take a minimum spanning tree and remove the X-1 most expensive links. This is certainly the case for X=1!
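Here is a Python sketch of this suggestion. It assumes a connected graph with nodes 0..n-1 and undirected edges given as (cost, u, v). It builds the MST with a heap-based Prim's algorithm and then drops the X-1 most expensive MST edges. The function name is hypothetical.

```python
import heapq


def min_forest_by_mst(n, edges, x):
    """Prim's MST (graph assumed connected), then drop the x-1 most expensive edges.
    edges: list of (cost, u, v). Returns (total_cost, kept_edges) for x trees."""
    adj = [[] for _ in range(n)]
    for c, u, v in edges:
        adj[u].append((c, u, v))
        adj[v].append((c, v, u))
    in_tree = [False] * n
    in_tree[0] = True
    heap = list(adj[0])
    heapq.heapify(heap)
    mst = []
    while heap and len(mst) < n - 1:
        c, u, v = heapq.heappop(heap)
        if in_tree[v]:
            continue                          # both ends already in the tree
        in_tree[v] = True
        mst.append((c, u, v))
        for e in adj[v]:
            if not in_tree[e[2]]:
                heapq.heappush(heap, e)
    if len(mst) < n - 1:
        raise ValueError("graph is not connected")
    mst.sort()
    kept = mst[:n - x]                        # removing x-1 of the n-1 edges leaves n-x
    return sum(c for c, _, _ in kept), kept
```

With X = 1 this returns the full MST, and with X = n it keeps no edges at all.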
Here is an attempt at solving this...
For X=1 I can calculate minimal spanning tree (MST) with Prim's algorithm from each node (this node is the only one connected to the grid) and select the one with the lowest overall cost
For X=2, I create an extra node (the power plant node, PP) beside my graph. I connect it to a random node (e.g. N0) by an edge with cost 0. Now I am sure I have one power plant connection right (the random node will definitely be in one of the trees, so that whole tree will be connected). Now the iterative part: I take another node (e.g. N1) and also connect it to the PP with a 0-cost edge. Then I calculate the MST, and repeat this process, replacing N1 with N2, N3, ...
So I will test every pair [N0, NX]. The lowest cost MST wins.
For X>2 it is really the same as for X=2, but I have to test connecting every (X-1)-tuple to the PP and calculate the MST.
with x^2 for MST I have complexity about (N over X-1) * x^2... Pretty complex, but I think it will give me THE OPTIMAL solution
what do you think?
Edit: by random node I mean a random but FIXED node.
Attempt to visualize for X=2 (each description belongs to the image above it):
Let this be our city: nodes A-F are houses, and edges are candidates for future cables (each has some cost to build).
Just for image, this could be the solution
Let the green one be the power plant; this is what a connection to one tree can look like.
But this different connection is really the same (the connection to the power plant (PP) costs the same, and the cables remain untouched). That is why we can set one of the nodes as a fixed point of contact with the PP. We can be sure that this node will be in one of the trees, and it does not matter where in the tree it is.
So let this be our fixed situation with G as PP. Edge (B,G) with zero cost is added.
Now I add a second connection to the PP (A,G, cost 0).
Now I calculate the MST from the PP. Because the red edges are the cheapest (they could even have negative cost), it is certain that both of them will be in the MST.
So when running MST I get something like this. Imagine detaching the PP: two MINIMAL COST trees are left. This is the best solution when A and B are the connections to the PP. I store the cost and move on.
Now I do the same for B and C connections
I could get something like this, so compare cost to previous one and choose the better one.
This way I have to try all the connection pairs (B,A) (B,C) (B,D) (B,E) (B,F) and the cheapest one is the winner.
For X=3 I would just test other tuples with one fixed again. (A,B,C) (A,B,D) ... (A,C,D) ... (A,E,F)
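Here is a brute-force Python sketch of this idea. It assumes nodes 0..n-1, with node 0 as the fixed node and an extra power-plant node numbered n. For simplicity it uses Kruskal's algorithm for each MST rather than Prim's; both give the same MST cost. The function names are hypothetical. The running time grows with the number of (X-1)-tuples, as noted above.

```python
from itertools import combinations


def mst_cost(n, edges):
    """Kruskal: cost of a minimum spanning tree on nodes 0..n-1, or None if disconnected."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    total = used = 0
    for c, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            total += c
            used += 1
    return total if used == n - 1 else None


def best_forest_with_power_plant(n, edges, x):
    """Try every root set {0} + (x-1)-tuple, joined to the plant node n by zero-cost edges."""
    best = None
    for others in combinations(range(1, n), x - 1):
        roots = (0,) + others
        extended = edges + [(0, n, r) for r in roots]   # zero-cost cables to the plant
        cost = mst_cost(n + 1, extended)
        if cost is not None and (best is None or cost < best):
            best = cost
    return best
```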
I just came up with the easy solution as follows:
N - node count
C - direct connections to the grid
E - available edges
1, Sort all edges by cost
2, Repeat until (N-C) edges have been added:
Take the cheapest edge not yet considered
Check whether adding this edge would create a cycle among the edges already added
If not, add this edge
3, That is all... You will end up with C disjoint trees (a tree may be a single house); connect each of them to the grid
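These steps amount to Kruskal's algorithm stopped early. Here is a Python sketch that uses union-find for the cycle check. It assumes nodes 0..n-1 and edges given as (cost, u, v), and the function name is hypothetical:

```python
def cheapest_forest(n, edges, c):
    """Kruskal's algorithm stopped after n - c edges: a minimum-cost forest with c trees.
    edges: list of (cost, u, v). Returns (total_cost, chosen_edges)."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    chosen = []
    for cost, u, v in sorted(edges):        # cheapest edge first
        if len(chosen) == n - c:
            break
        ru, rv = find(u), find(v)
        if ru != rv:                        # no cycle: u and v are in different trees
            parent[ru] = rv
            chosen.append((cost, u, v))
    if len(chosen) < n - c:
        raise ValueError("not enough edges to get down to c trees")
    return sum(e[0] for e in chosen), chosen
```

Counting added edges, rather than loop iterations, matters here: edges that would close a cycle are skipped and do not count towards the N-C.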
Sounds like the famous Traveling Salesman problem, which is known to be NP-hard. Take a look at Wikipedia as your starting point: http://en.wikipedia.org/wiki/Travelling_salesman_problem
The minimum spanning tree problem is to take a connected weighted graph and find the subset of its edges with the lowest total weight while keeping the graph connected (and as a consequence resulting in an acyclic graph).
The algorithm I am considering is:
Find all cycles.
remove the largest edge from each cycle.
The impetus for this version is an environment that is restricted to "rule satisfaction" without any iterative constructs. It might also be applicable to insanely parallel hardware (i.e. a system where you expect to have several times more degrees of parallelism than cycles).
Edits:
The above is done in a stateless manner (all edges that are not the largest edge in any cycle are selected/kept/ignored, all others are removed).
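One way to apply this rule without enumerating cycles relies on an assumption: all edge weights are distinct. Under it, an edge is the largest edge of some cycle exactly when its endpoints are already connected by strictly lighter edges. Each edge can therefore be tested independently, and in parallel. Here is a Python sketch of that test. The function name is hypothetical, and ties in weight would need a tie-breaking rule.

```python
from collections import defaultdict


def keep_edges_not_heaviest_on_any_cycle(edges):
    """edges: list of (u, v, weight), weights assumed distinct.
    Keeps exactly the edges that are not the heaviest edge of any cycle."""
    kept = []
    for u, v, w in edges:                  # each test is independent of the others
        lighter = defaultdict(list)
        for a, b, x in edges:
            if x < w:
                lighter[a].append(b)
                lighter[b].append(a)
        stack, seen = [u], {u}
        while stack:                       # DFS over strictly lighter edges only
            node = stack.pop()
            for nxt in lighter[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if v not in seen:                  # no cycle where (u, v) is the heaviest edge
            kept.append((u, v, w))
    return kept
```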
What happens if two cycles overlap? Which one has its longest edge removed first? Does it matter if the longest edge of each is shared between the two cycles or not?
For example:
V = { a, b, c, d }
E = { (a,b,1), (b,c,2), (c,a,4), (b,d,9), (d,a,3) }
There's an a -> b -> c -> a cycle, and an a -> b -> d -> a cycle.
#shrughes.blogspot.com:
I don't know about removing all but two. I've been sketching out various runs of the algorithm and, assuming that parallel runs may remove an edge more than once, I can't find a situation where I'm left without a spanning tree. Whether or not it's minimal, I don't know.
For this to work, you'd have to detail how you would want to find all cycles, apparently without any iterative constructs, because that is a non-trivial task. I'm not sure that's possible. If you really want to find a MST algorithm that doesn't use iterative constructs, take a look at Prim's or Kruskal's algorithm and see if you could modify those to suit your needs.
Also, is recursion barred in this theoretical architecture? If so, it might actually be impossible to find a MST on a graph, because you'd have no means whatsoever of inspecting every vertex/edge on the graph.
I dunno if it works, but no matter what, your algorithm is not even worth implementing. Finding all cycles will be the freaking huge bottleneck that kills it. Also, doing that without iteration is impossible. Why don't you implement some standard algorithm, let's say Prim's?
Your algorithm isn't quite clearly defined. If you have a complete graph, your algorithm would seem to entail, in the first step, removing all but the two minimum elements. Also, listing all the cycles in a graph can take exponential time.
Elaboration:
In a graph with n nodes and an edge between every pair of nodes, there are, if I have my math right, n!/(2k(n-k)!) cycles of size k, if you're counting a cycle as some subgraph of k nodes and k edges with each node having degree 2.
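Here is a quick Python check of this count. It compares the formula with a brute-force enumeration of cycles, represented as edge sets, in small complete graphs. The function names are hypothetical, and the formula is taken for k ≥ 3.

```python
from itertools import permutations
from math import factorial


def cycles_by_formula(n, k):
    """n! / (2k (n-k)!) cycles of length k in the complete graph on n nodes (k >= 3)."""
    return factorial(n) // (2 * k * factorial(n - k))


def cycles_by_brute_force(n, k):
    """Count distinct k-cycles by collecting the edge set of every ordered k-tuple."""
    seen = set()
    for p in permutations(range(n), k):
        edges = frozenset(frozenset((p[i], p[(i + 1) % k])) for i in range(k))
        seen.add(edges)
    return len(seen)
```

For n = 5 this gives 10 triangles, 15 four-cycles and 12 five-cycles, so 37 cycles in total; the sum grows factorially with n.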
#Tynan The system can be described (somewhat oversimplified) as a system of rules describing categorizations: "Things are in category A if they are in B but not in C", "Nodes connected to nodes in Z are also in Z", "Every category in M is connected to a node N and has 'child' categories, also in M, for every node connected to N". It's slightly more complicated than this. (I have shown that by creating unstable rules you can model a Turing machine, but that's beside the point.) It can't explicitly define iteration or recursion, but it can operate on recursive data with rules like the 2nd and 3rd ones.
#Marcin, assume that there is an unlimited number of processors. It is trivial to show that the program can be run in O(n^2), with n being the length of the longest cycle. With better data structures, this can be reduced to O(n*O(set lookup function)). I can envision hardware (quantum computers?) that could evaluate all cycles in constant time, giving an O(1) solution to the MST problem.
The Reverse-delete algorithm seems to provide a partial proof of correctness (that the proposed algorithm will not produce a non-minimal spanning tree). This is derived by arguing that my algorithm will remove every edge that the Reverse-delete algorithm removes. However, I'm not sure how to show that my algorithm won't delete more than that algorithm does.
Hhmm....
OK, this is an attempt to finish the proof of correctness. By analogy to the Reverse-delete algorithm, we know that enough edges will be removed. What remains is to show that not too many edges will be removed.
Removing too many edges can be described as removing all the edges between the two sides of a binary partition of the graph nodes. However, only edges in a cycle are ever removed. Therefore, for every edge between the partitions to be removed, there needs to be a return path to complete its cycle. If we only consider edges between the partitions, then the algorithm can at most remove the larger of each pair of edges, so it can never remove the smallest bridging edge. Therefore, for any arbitrary binary partitioning, the algorithm can't sever all links between the two sides.
What remains is to show that this extends to >2 way partitions.