For an audio processing chain (like Propellerheads' Reason), I'm developing a circuit of nodes which communicate with each other in an environment where there may be loops (shown below).
Audio devices (nodes) are processed in batches of 64 audio frames, and in the ideal situation, communication propagates within a single batch. Unfortunately feedback loops can be created, which causes a delay in the chain.
What type of algorithm would consistently minimize the number of feedback loops?
In my system, a cycle leads to at least one audio device having to be a "feedback node" (shown below), which means its "feedback input" cannot be processed within the same batch.
An example of feedback can be seen in the following processing schedule:
D -> A
A -> B
B -> C
C -> 'D
In this case the output from C to D has to be processed on the next batch.
Below is an example of an inefficient processing schedule which results in two feedback loops:
A -> B
B -> C
C -> D
D -> E, 'A
E -> F
F -> G
G -> 'D
Here the output from G to D, and D to A must be processed on the next batch. This means that the output from G reaches A after 2 batches, compared to the output from A to D occurring within the same batch.
The most efficient processing schedule begins with D, which results in just one feedback node (D).
How large can this graph become?
It's quite common to have 1000 audio devices (for example a song with 30 audio channels, each with 30 effects devices connected), though there are typically 2-4 outputs per device and the circuits aren't incredibly complex. Instead, audio devices tend to be connected within localised scopes, so cycles (if they do exist) are more likely to be locally confined. I just need to produce the most efficient node schedule to reduce the number of feedbacks.
Ideally, a pair of audio devices connected by two paths should not have mismatched feedback nodes between them.
Suppose there are two nodes, M and N, with two separate paths from M to N. There should not be a feedback node on one path but not on the other, as this would desynchronise the input to N, which is highly undesirable. This aim complicates the algorithm further, but I will examine how Reason behaves (it might not actually be so complex).
This survey describes several approaches to feedback set problems but only briefly describes branch and bound. I think that branch and bound is a promising approach, so I'll expand on that description here.
In branch and bound, we explore a search tree whose nodes are partial assignments: each graph vertex is labeled 0, 1, or ?. The label ? means that we don't know what label to give yet, and the root node has all vertices labeled ?. The leaves of the search tree have no vertices labeled ?. The children of a node with at least one vertex labeled ? are determined by choosing an arbitrary ?-labeled vertex and letting it be 0 in the left child and 1 in the right. This is branching.
To bound a node, we do something to determine a lower bound (because we're minimizing) on the number of vertices labeled 1 in a solution where each of the ?s is replaced by a 0 or a 1. If this lower bound is no better than the best solution that we have found so far, then there is no need to explore the subtree further. For proving optimality, the best approach, given space, is depth-first search with best-first backtracking. The depth-first search part consists of repeatedly exploring the more promising child (lower lower bound) and putting the other into a priority queue. Then, when we get stuck because we're at a leaf or because the node got pruned, we pull the most promising possibility out of the queue. We stop when the queue is empty.
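To make that control flow concrete, here is a minimal generic sketch in Python; `lower_bound`, `branch`, `is_leaf`, and `cost` are hypothetical problem-specific callbacks (for example, the LP bound discussed next).

    import heapq
    import itertools

    def branch_and_bound(root, lower_bound, branch, is_leaf, cost):
        """Depth-first search with best-first backtracking (generic sketch)."""
        best, best_cost = None, float("inf")
        tiebreak = itertools.count()               # keeps heap entries comparable
        queue = [(lower_bound(root), next(tiebreak), root)]
        while queue:
            bound, _, node = heapq.heappop(queue)  # most promising deferred node
            if bound >= best_cost:
                continue                           # pruned: cannot beat the incumbent
            while True:                            # depth-first dive
                if is_leaf(node):
                    if cost(node) < best_cost:
                        best, best_cost = node, cost(node)
                    break
                left, right = branch(node)         # fix one ?-vertex to 0 and to 1
                lb_left, lb_right = lower_bound(left), lower_bound(right)
                if lb_left <= lb_right:
                    node, other, lb_other = left, right, lb_right
                else:
                    node, other, lb_other = right, left, lb_left
                if lb_other < best_cost:           # park the less promising child
                    heapq.heappush(queue, (lb_other, next(tiebreak), other))
                if min(lb_left, lb_right) >= best_cost:
                    break                          # even the better child is pruned
        return best, best_cost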
One very common approach for obtaining bounds is linear programming. Instead of labeling vertices 0 or 1, it turns out that if we allow fractional labels in the interval [0, 1], then we can find a "solution" relatively efficiently. The cost of this solution is no greater than the true optimum, but of course it's not possible to have a node be "half feedback". Select one of the vertices with a fractional label (closest to 0.5 is often a good bet) and branch on it.
In fact, this approach is so common that most linear programming solvers provide a convenient interface to it in the form of integer programming. Unfortunately, integer programming won't work directly for us, because the program has too many constraints: one for each simple cycle. (Come to think of it, if there aren't too many simple cycles, then you could use integer programming after all.)
The linear program for feedback vertex set looks like this. The variable x_v is the solution label: 0 if v is combinatorial, 1 if v is feedback, fractional values interpolating between those two possibilities.
minimize sum_v x_v (as few feedback vertices as possible)
subject to
for all simple cycles C, sum_{v in C} x_v >= 1 (at least one vertex on each cycle)
for all v, x_v >= 0 (vertices cannot be "negative feedback")
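As an aside, here is what that parenthetical looks like in practice: if the graph is small enough to enumerate its simple cycles, the formulation above can be handed to an integer programming solver directly. This is only a sketch, assuming networkx for cycle enumeration and OR-Tools' CP-SAT as the solver.

    import networkx as nx
    from ortools.sat.python import cp_model

    def min_feedback_vertex_set(G):
        """Exact minimum feedback vertex set of a DiGraph G via the cycle-covering
        integer program; only viable when the number of simple cycles is small."""
        model = cp_model.CpModel()
        x = {v: model.NewBoolVar(f"x_{v}") for v in G.nodes()}
        for cycle in nx.simple_cycles(G):             # one constraint per simple cycle
            model.Add(sum(x[v] for v in cycle) >= 1)  # at least one feedback vertex on it
        model.Minimize(sum(x.values()))               # as few feedback vertices as possible
        solver = cp_model.CpSolver()
        solver.Solve(model)
        return {v for v in G.nodes() if solver.Value(x[v]) == 1}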
You actually want to solve the dual program; by weak LP duality, any feasible solution of the dual lower-bounds the optimal value of the primal.
maximize sum_C y_C
subject to
for all vertices v, sum_{C containing v} y_C <= 1
for all C, y_C >= 0
The intuitive meaning of this program is as follows. Suppose that we identified disjoint simple cycles. Each of these cycles contains at least one feedback vertex, and these vertices are distinct. This is the fractional analog of that technique (which works on every LP, not just this one; the dual of this LP is the first LP again).
The technique for solving this LP is called column (i.e., variable) generation. Initially, we send the LP to the solver with no variables. We then interact with the solver repeatedly, getting solutions and adding variables that look useful, until it becomes clear that we've reached an optimum (or stalled). The solver returns corresponding values of x_v for every v and of y_C for the cycles C that we've told it about. To find another cycle C' worth adding, we want sum_{v in C'} x_v < 1, preferably much less. Label each arc with the value x_v of its head v. For each vertex, run Dijkstra to find shortest paths, then check whether any arc back into that vertex closes a sufficiently short cycle.
This is complicated slightly by the side constraints imposed by the current node. If a vertex is labeled 0 or 1, then we omit its constraint from the dual and use that label in place of x_v in labeling the graph for Dijkstra.
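Here is a sketch of that separation routine, assuming `adj` maps each vertex to its successors and `x` maps each vertex to its current fractional label (vertices already fixed by the current search node simply carry 0.0 or 1.0).

    import heapq

    def find_violated_cycle(adj, x, eps=1e-9):
        """Look for a simple cycle C with sum of x[v] over v in C strictly below 1.
        Each arc (u, v) is weighted by x[v], the label of its head, so the weight
        of a cycle equals the sum of the labels of the vertices it visits.
        Returns the vertex list of the most violated cycle found, or None."""
        best_cycle, best_weight = None, 1.0 - eps
        for s in adj:
            # Dijkstra from s with arc weight x[head of arc].
            dist, pred, heap = {s: 0.0}, {}, [(0.0, s)]
            while heap:
                d, u = heapq.heappop(heap)
                if d > dist.get(u, float("inf")):
                    continue
                for v in adj[u]:
                    nd = d + x[v]
                    if nd < dist.get(v, float("inf")):
                        dist[v], pred[v] = nd, u
                        heapq.heappush(heap, (nd, v))
            # A path s -> ... -> u plus an existing arc u -> s closes a cycle
            # whose weight is dist[u] + x[s].
            for u in adj:
                if s in adj[u] and u in dist and dist[u] + x[s] < best_weight:
                    cycle = [u]
                    while cycle[-1] != s:
                        cycle.append(pred[cycle[-1]])
                    cycle.reverse()
                    best_cycle, best_weight = cycle, dist[u] + x[s]
        return best_cycle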
I hope that, when you determine what the side constraints are, we can devise another column generation strategy to deal with them. I'm more confident about that than I would be modifying the combinatorial reduction strategies surveyed.
Related
Is there an algorithm that takes a directed graph as input, and as output gives the minimum number of edges to "break" in order to prevent loops?
As an example, the loops in the above graph are:
A, B, C, D
A, B, E
E, G, H, F
And the minimum number of edges to break all of them:
A - B, breaks 2 loops
E - G, breaks 1 loop
It gets more complicated when the loops are nested inside each other and share edges.
My current approach is to find all the loops, group them by their most common edge, order descending, and break edges in that order, skipping any loop that has already been broken in a previous iteration.
I have tried a few methods and they all vary by the count of edges they break - I'm looking for the minimum theoretically possible.
Is there an established algorithm that does this?
This is the NP-hard Minimum Feedback Arc Set problem. Wikipedia doesn't point at an exact algorithm for small- to medium-size graphs, so let me suggest one.
Using an integer programming solver library such as OR-Tools, we formulate an integer program with a 0-1 variable for each arc, where 0 means retain the arc and 1 means delete the arc. The objective is to minimize the sum of these variables.
Now, we need to specify constraints, but in general there can be an exponential number of cycles, which would quickly overwhelm the solver as the graph grows. Instead, do the following:
Solve the integer program (initially, with no constraints).
Use breadth-first search to find a shortest cycle in the retained edges, if there is one.
If there is a cycle, add a constraint that the sum of the corresponding variables must be greater than or equal to one, then go back to Step 1. Otherwise, the current solution is optimal.
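A rough sketch of this loop, assuming OR-Tools' CP-SAT solver, with `nodes` a list of vertices and `edges` a list of (u, v) arcs:

    from collections import deque
    from ortools.sat.python import cp_model

    def shortest_cycle(nodes, edges):
        """Return the arc list of a shortest directed cycle, or None if the graph
        is acyclic (BFS from each node over the given arcs)."""
        adj = {u: [] for u in nodes}
        for u, v in edges:
            adj[u].append(v)
        best = None
        for s in nodes:
            parent, queue = {s: None}, deque([s])
            while queue:
                u = queue.popleft()
                for v in adj[u]:
                    if v == s:                       # shortest cycle through s found
                        cycle, w = [(u, s)], u
                        while parent[w] is not None:
                            cycle.append((parent[w], w))
                            w = parent[w]
                        cycle.reverse()
                        if best is None or len(cycle) < len(best):
                            best = cycle
                        queue.clear()
                        break
                    if v not in parent:
                        parent[v] = u
                        queue.append(v)
        return best

    def min_feedback_arc_set(nodes, edges):
        """Add one cycle constraint at a time until the retained arcs are acyclic."""
        model = cp_model.CpModel()
        delete = {e: model.NewBoolVar(f"del_{e}") for e in edges}
        model.Minimize(sum(delete.values()))          # delete as few arcs as possible
        solver = cp_model.CpSolver()
        while True:
            solver.Solve(model)                                 # Step 1
            retained = [e for e in edges if solver.Value(delete[e]) == 0]
            cycle = shortest_cycle(nodes, retained)             # Step 2
            if cycle is None:
                return [e for e in edges if solver.Value(delete[e]) == 1]
            model.Add(sum(delete[e] for e in cycle) >= 1)       # Step 3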
I'm looking at several problems of similar format but different difficulty. Help would be appreciated with polynomial-time solutions (preferably relatively fast, but not necessarily), or even brute-force solutions, for any of them.
The idea of all of the problems is that you have a weighted, undirected graph, and that an agent controls some of the nodes of the graph at the start. The agent can gain control of a node if they already control two adjacent nodes. The agent is trying to minimise the time they take to control a certain number of nodes. The problems differ on some details.
(1) You gain control of nodes in order (ie. you cannot take over multiple nodes simultaneously). The time taken to take control of a node is defined as the minimum of the edges from the two nodes used to take control of it. The goal is to take control of every single node in the graph.
(2) Again, you gain nodes in order and the goal is to take control of every single node in the graph. The time taken to take control of a node is defined as the maximum of the edges from the two nodes used to take control of it.
(3) Either (1) or (2), but with the goal of taking control of a certain number of nodes, not necessarily all of them.
(4) (3), but you can take control of multiple nodes simultaneously. Basically, say nodes 2 and 4 are being used to take over node 3 in time of 5. During that time of 5, nodes 2 and 4 cannot be used to take over a node that is not node 3. However, nodes 5 and 6 may for example be simultaneously taking over node 1.
(5) (4), but with an unweighted graph.
I started with problem (4). I progressively made the problem easier, from (4) to (3) to (2) to (1), in the hope that I could construct the solution for (4) from that. Finally, I have solved (1) but do not know how to solve any of the others. My solution to (1) is this: of all candidate nodes which have two adjacent nodes that we control, simply take the one which takes the shortest amount of time. This is similar to Dijkstra's shortest path algorithm. However, this kind of solution probably does not solve any of the others. I believe a dynamic programming solution might work, but I have no idea how to formulate one. I also have not found brute-force solutions for any of the other four problems. It is also possible that some of the problems are not polynomially solvable, and I would be curious to know why if that is the case.
The ideas for the questions are my own, and I'm solving them for my own entertainment. But I would not be surprised if they can be found elsewhere.
This isn't an answer to the problem. It is a demonstration that the greedy approach fails for problem 1.
Suppose that we have a graph with 7 nodes. We start by controlling A and B. The edges A-B, B-C, and C-D all have cost 1. Both E and F connect to A, B, and D with cost 10. G connects to A, B, C, and D with cost 100.
The greedy strategy that you describe will connect to E and F at cost 10 each, then D at cost 10, then C at cost 1, then G at cost 100 for a total cost of 131.
The best strategy is to connect to G at cost 100, then C and D at cost 1, then E and F at cost 10 for a total cost of 122 < 131.
And this example demonstrates that greedy is not always going to produce the right answer.
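To double-check the arithmetic, here is a small brute force over capture orders (a hypothetical encoding of the graph above, using the problem (1) rule that a capture costs the cheapest edge to a controlled neighbour once two neighbours are controlled); it confirms that 122 is the optimum.

    import itertools

    # Edge weights of the example graph (undirected, hypothetical encoding).
    raw = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 1,
           ("E", "A"): 10, ("E", "B"): 10, ("E", "D"): 10,
           ("F", "A"): 10, ("F", "B"): 10, ("F", "D"): 10,
           ("G", "A"): 100, ("G", "B"): 100, ("G", "C"): 100, ("G", "D"): 100}
    weight = {}
    for (u, v), c in raw.items():
        weight[(u, v)] = weight[(v, u)] = c

    def capture_cost(order, start=("A", "B")):
        """Total time to capture nodes in the given order, or None if infeasible."""
        controlled, total = set(start), 0
        for node in order:
            costs = sorted(weight[(u, node)] for u in controlled if (u, node) in weight)
            if len(costs) < 2:
                return None                 # fewer than two controlled neighbours
            total += costs[0]               # min of the two cheapest usable edges
            controlled.add(node)
        return total

    print(capture_cost(("E", "F", "D", "C", "G")))    # greedy order: 131
    best = min(c for order in itertools.permutations(("C", "D", "E", "F", "G"))
               if (c := capture_cost(order)) is not None)
    print(best)                                       # 122, e.g. via G, C, D, E, F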
I haven't been able to come up with a reduction yet, but these problems have the flavor of NP-hard network design and maximum coverage problems, so I would be quite surprised if variants (3) through (5) were tractable.
My practical suggestion would be to apply the Biased Random-Key Genetic Algorithm framework. The linked slide deck covers the generic part (an individual is a map from nodes to numbers; at each step, we rank individuals, retain the top x% "elite" individuals as is, produce y% offspring by crossing a random elite individual with a random non-elite individual, biased toward selecting the elite chromosomes, and fill out the rest of the population with random individuals). The non-generic part is translating an individual into a solution. My recommended starting point would be to choose to explore the lowest-numbered eligible node each time.
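A decoder for variant (1) might look like the sketch below; here `graph[v]` is assumed to map each neighbour of v to the connecting edge's weight, `keys` is the individual's node-to-number map, and `start` is the initially controlled set (all hypothetical names).

    def decode(keys, graph, start):
        """Turn a random-key individual into a capture schedule: repeatedly take
        the eligible node (two controlled neighbours) with the lowest key."""
        controlled, order, total = set(start), [], 0
        while True:
            eligible = [v for v in graph if v not in controlled
                        and sum(u in controlled for u in graph[v]) >= 2]
            if not eligible:
                break
            v = min(eligible, key=keys.get)
            cheapest_two = sorted(graph[v][u] for u in graph[v] if u in controlled)[:2]
            total += min(cheapest_two)      # variant (1) cost; use max(...) for variant (2)
            controlled.add(v)
            order.append(v)
        return order, total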
I came upon wait-for graphs and I wonder, are there any efficient algorithms for detecting if adding an edge to a directed graph results in a cycle?
The graphs in question are mutable (they can have nodes and edges added or removed). And we're not interested in actually knowing an offending cycle, just knowing there is one is enough (to prevent adding an offending edge).
Of course it'd be possible to use an algorithm for computing strongly connected components (such as Tarjan's) to check if the new graph is acyclic or not, but running it again every time an edge is added seems quite inefficient.
If I understood your question correctly, then a new edge (u,v) is only inserted if there was no path from v to u before (i.e., if (u,v) does not create a cycle). Thus, your graph is always a DAG (directed acyclic graph). Using Tarjan's Algorithm to detect strongly connected components (http://en.wikipedia.org/wiki/Tarjan%27s_strongly_connected_components_algorithm) sounds like overkill in this case. Before inserting (u,v), all you have to check is whether there is a directed path from v to u, which can be done with a simple BFS/DFS.
So the simplest way of doing it is the following (n = |V|, m = |E|):
Inserting (u,v): Check whether there is a path from v to u (BFS/DFS). Time complexity: O(m)
Deleting edges: Simply remove them from the graph. Time complexity: O(1)
Although inserting (u,v) takes O(m) time in the worst case, it is probably pretty fast in your situation. When doing the BFS/DFS starting from v to check whether u is reachable, you only visit vertices that are reachable from v. I would guess that in your setting the graph is pretty sparse and that the number of vertices reachable from any given vertex is not that high.
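In code, the check before each insertion is just a reachability test from v; a sketch, with `adj` mapping each node to its set of successors:

    def can_add_edge(adj, u, v):
        """True iff inserting the edge (u, v) keeps the graph acyclic,
        i.e. u is not reachable from v (iterative DFS from v)."""
        stack, seen = [v], {v}
        while stack:
            node = stack.pop()
            if node == u:
                return False    # a path v ~> u exists, so (u, v) would close a cycle
            for succ in adj.get(node, ()):
                if succ not in seen:
                    seen.add(succ)
                    stack.append(succ)
        return True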
However, if you want to improve the theoretical running time, here are some hints (mostly showing that this will not be very easy). Assume we aim for testing in O(1) time whether there exists a directed path from v to u. The keyword in this context is the transitive closure of a DAG (i.e., a graph that contains an edge (u, v) if and only if there is a directed path from u to v in the DAG). Unfortunately, maintaining the transitive closure in a dynamic setting seems to be not that simple. There are several papers considering this problem and all papers I found were STOC or FOCS papers, which indicates that they are very involved. The newest (and fastest) result I found is in the paper Dynamic Transitive Closure via Dynamic Matrix Inverse by Sankowski (http://dl.acm.org/citation.cfm?id=1033207).
Even if you are willing to understand one of those dynamic transitive closure algorithms (or even want to implement it), they will not give you any speed-up, for the following reason. These algorithms are designed for the situation where you have a lot of connectivity queries (which can then be performed in O(1) time) and only few changes in the graph. The goal then is to make these changes cheaper than recomputing the transitive closure. However, such an update is still slower than a single check for connectivity. Thus, if you need to do an update on every connectivity query, it is better to use the simple approach mentioned above.
So why do I mention this approach of maintaining the transitive closure if it does not fit your needs? Well, it shows that searching for an algorithm with only O(1) query time will probably not lead you to a solution faster than the simple one using BFS/DFS. What you could try is to get a query time that is faster than O(m) but worse than O(1), while updates are also faster than O(m). This is a very interesting problem, but it sounds to me like a very ambitious goal (so maybe do not spend too much time trying to achieve it).
As Mark suggested, it is possible to use a data structure that stores which nodes are connected. The simplest is a boolean matrix of size |V|x|V|, whose values can be initialized with the Floyd–Warshall algorithm in O(|V|^3).
Let T(i) be the set of vertices that have a path to vertex i, and F(j) the set of vertices reachable from vertex j. The first corresponds to the true entries in the i-th column of the matrix, and the second to the true entries in the j-th row.
Adding an edge (i,j) is a simple operation. If i and j weren't connected before, then for each a in T(i) (plus i itself) and each b in F(j) (plus j itself), set matrix element (a,b) to true. The operation isn't cheap, though: in the worst case it is O(|V|^2). That happens, for example, with a directed line, where adding an edge from the last vertex back to the first makes every vertex connected to every other vertex.
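A sketch of that update, where `reach` is the |V|x|V| boolean matrix and `reach[a][b]` is true iff there is a path from a to b:

    def add_edge(reach, i, j):
        """Update the reachability matrix after inserting edge (i, j):
        everything that reaches i can now reach everything reachable from j."""
        if reach[i][j]:
            return                          # nothing new becomes reachable
        n = len(reach)
        sources = [a for a in range(n) if a == i or reach[a][i]]   # T(i) plus i
        targets = [b for b in range(n) if b == j or reach[j][b]]   # F(j) plus j
        for a in sources:
            for b in targets:
                reach[a][b] = True          # O(|V|^2) in the worst case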
Removing an edge (i,j) is not so simple, but no more expensive in the worst case :-) If there is still a path from i to j after removing the edge, then nothing changes. That can be checked with a graph traversal (BFS/DFS), in less than O(|V|^2). The vertex pairs (a,b) that are no longer connected are:
a in T(i) - i - T(j),
b in F(j) + j
Only T(j) changes when edge (i,j) is removed, so it has to be recalculated. That is done by any kind of graph traversal (BFS, DFS), going against the edge direction from vertex j, which takes less than O(|V|^2). Since setting the matrix elements is again O(|V|^2) in the worst case, this operation has the same worst-case complexity as adding an edge.
This is a problem which I recently faced in a slightly different situation (optimal ordering of interdependent compiler instructions).
While I can't improve on O(n*n) theoretical bounds, after a fair amount of experimentation and assuming heuristics for my case (for example, assuming that the initial ordering wasn't created maliciously) the following was the best compromise algorithm in terms of performance.
(In my case I had an acceptable "right side failure": after the initial nodes and arcs were added (which was guaranteed to be possible), it was acceptable for the optimiser to occasionally reject the addition of further arcs where one could actually have been added. This approximation isn't necessary for the algorithm when carried to completion, but it does admit such an approximation if you wish, limiting its runtime further.)
While a graph is topologically sorted, it is guaranteed to be cycle-free. In the first phase when I had a static bulk of nodes and arcs to add, I added the nodes and then topologically sorted them.
During the second phase, adding additional arcs, there are two situations when considering an arc from A to B. If A already lies to the left of B in the sort, an arc can simply be added and no cycle can be generated, as the list is still topologically sorted.
If B is to the left of A, we consider the sub-sequence between B and A and partition it into two disjoint sequences X, Y, where X is those nodes which can reach A (and Y the others). If A is not reachable from B, i.e. there are no direct arcs from B into X or to A, then the sequence can be reordered XABY before adding the A to B arc, showing it is still cycle-free and maintaining the topological sort. The efficiency over the naive algorithm here is that we only need to consider the subsequence between B and A, as our list is topologically sorted: A is not reachable from any node to the right of A. For my situation, where localised reorderings are the most frequent and important, this is an important gain.
As we don't reorder within the sequences X, A, B, Y, clearly any arcs which start and end within the same sequence are still ordered correctly, as are arcs within each flank, and any "fly-over" arcs from the left flank to the right flank. Any arcs between the flanks and X, A, B, Y are also still ordered correctly, as our reordering is restricted to this local region. So we only need to consider arcs between our four sequences. Consider each possible "problematic" arc for our final ordering XABY in turn: YB, YA, YX, BA, BX, AX. Our initial order was B[XY]A, so AX and YB cannot occur. X reaches A but Y does not, therefore YX and YA cannot occur, or else A could be reached from the source of the arc in Y (potentially via X), a contradiction. Our criterion for acceptability was that there are no arcs BX or BA. So there are no problematic arcs, and we are still topologically sorted.
Conversely, our acceptability criterion (that A is not reachable from B) is also necessary: if A is reachable from B, then adding the arc A->B clearly creates a cycle B -(X)-> A -> B.
This can be implemented reasonably efficiently if we can add a flag to each node. Consider the nodes [BXY] going right-to-left from the node immediately to the left of A. If such a node has a direct arc to A, set its flag. For an arbitrary such node we need only consider its direct outgoing arcs: the nodes to its right are either after A (and so irrelevant), or else have already been flagged if they can reach A, so the flag on such a node is set as soon as a direct arc to A or to any flagged node is encountered. If B is not flagged at the end of the process, the reordering is acceptable and the flagged nodes comprise X.
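A sketch of that bookkeeping, with hypothetical structures: `order` is the node list in topological order, `pos` maps each node to its index, and `succ` maps each node to its set of direct successors.

    def add_arc(order, pos, succ, a, b):
        """Insert arc a -> b while keeping `order` topologically sorted.
        Returns False (and changes nothing) if the arc would create a cycle."""
        if pos[a] < pos[b]:
            succ[a].add(b)
            return True
        lo, hi = pos[b], pos[a]
        window = order[lo:hi]        # B and the interleaved X, Y nodes, left of A
        flagged = {a}                # window nodes that can reach A, found right to left
        for node in reversed(window):
            if succ[node] & flagged:
                flagged.add(node)
        if b in flagged:
            return False             # A is reachable from B: the arc would close a cycle
        x = [n for n in window if n in flagged]                   # X, in original order
        y = [n for n in window if n not in flagged and n != b]    # Y, in original order
        new_window = x + [a, b] + y                               # reorder to X A B Y
        order[lo:hi + 1] = new_window
        for i, n in enumerate(new_window, start=lo):
            pos[n] = i
        succ[a].add(b)
        return True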
Though this always yields a correct ordering if carried to completion (as far as I can tell), as I mentioned in the introduction it is particularly efficient if your initial build is approximately correct (in the sense of accommodating likely additional arcs without reordering).
There also exists an effective approximation, if your context is such that "outrageous" arcs can be rejected (those which would massively reorder) by limiting the A to B distance you are prepared to scan. If you have an initial list of the additional arcs you wish to add, they can be ordered by increasing distance in the initial ordering until you run out of some scanning "credit", and call your optimisation a day at that point.
If the graph is directed, you only have to check the ancestors of the node where the new edge starts (navigate backwards along edges until you run out of ancestors). If one of those ancestors is the end node of the new edge, adding the edge would create a cycle.
If all previous jobs are in topologically sorted order, then if you add an edge that appears to break the sort and cannot be fixed, you have a cycle.
https://stackoverflow.com/a/261621/831850
So if we have a sorted list of nodes:
1, 2, 3, ..., x, ..., z, ...
Such that each node is waiting for nodes to its left.
Say we want to add an edge from x -> z. Well, that appears to break the sort. So we can move the node at x to position z+1, which will fix the sort iff none of the nodes in positions (x, z] have an edge to the node at x.
A question about the following exercise:
Let N = (V,E,c,s,t) be a flow network such that (V,E) is acyclic, and let m = |E|. Describe a polynomial-time algorithm that checks whether N has a unique maximum flow, by solving ≤ m + 1 max-flow problems. Explain the correctness and running time of the algorithm.
My suggestion would be the following:
run FF (Ford-Fulkerson) once and save the value of the flow v(f) and the flow over all edges f(e_i)
for each edge e_i with f(e_i)>0:
set the capacity (in this iteration) of this edge to c(e_i) = f(e_i) - 1 and run FF again.
If the value of the flow is the same as in the original graph, then there exists another way to push the max flow through the network and we're done - the max flow isn't unique --> return "not unique"
Otherwise we continue
If we finish looping without finding another max flow of the same value, then the max flow is unique -> return "unique"
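In code the idea would be roughly this (a sketch using networkx's maximum_flow, assuming a DiGraph with integer capacities stored in a 'capacity' edge attribute):

    import networkx as nx

    def is_max_flow_unique(G, s, t):
        """Re-run max flow once per flow-carrying edge with its capacity lowered
        below the current flow; if the value is ever unchanged, another max flow exists."""
        value, flow = nx.maximum_flow(G, s, t)
        for u, v in G.edges():
            if flow[u][v] > 0:
                H = G.copy()
                H[u][v]["capacity"] = flow[u][v] - 1
                if nx.maximum_flow_value(H, s, t) == value:
                    return False        # another max flow of the same value exists
        return True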
Any feedback? Have I overlooked some cases where this does not work?
Your question leaves a few details open, e.g., is this an integer flow graph (probably yes, although Ford-Fulkerson, if it converges, can run on other networks as well), and how exactly do you define whether two flows are different (is it enough that the function mapping edges to flows be different, or must the set of edges actually flowing something be different, which is a stronger requirement).
If the network is not necessarily integer flows, then, no, this will not necessarily work. Consider the following graph, where, on each edge, the number within the parentheses represents the actual flow, and the number to the left of the parentheses represents the capacity (e.g., the capacity of each of (a, c) and (c, d) is 1.1, and the flow of each is 1.):
In this graph, the flow is non-unique. It's possible to flow a total of 1 by routing 0.5 through (a, b) and (b, d). Your algorithm, however, won't find this by reducing the capacity of each of the edges to 1 below its current flow.
If the network is integer, it is not guaranteed to find a different set of participating edges than the current one. You can see it through the following graph:
Finally, though, if the network is an integer flow network, and the meaning of a different flow is simply a different function of edges to flows, then your algorithm is correct.
Sufficiency If your algorithm finds a different flow with the same total result, then obviously the new flow is legal, and, also, necessarily, at least one of the edges is flowing a different amount than it did before.
Necessity Suppose there is a different flow than the original one (with the same total value), with at least one of the edges flowing a different amount. Suppose, for the moment, that for each edge the flow in the alternative solution is not less than the flow in the original solution. Since the flows are different, there must be at least a single edge where the flow in the alternative solution increased. Without a different edge decreasing the flow, though, there is either a violation of the conservation of flow, or the original solution was suboptimal. Hence there is some edge e where the flow in the alternative solution is lower than in the original solution. Since it is an integer flow network, the flow must be at least 1 lower on e. By definition, though, reducing the capacity of e to 1 below the current flow will not make the alternative flow illegal. Hence some alternative flow must be found when the capacity is decreased for e.
Non-integer, rational flows can be 'scaled' to integer ones.
Changing edge capacities is risky, because some edges may be critical and included in every max flow.
There is a solution with a better runtime: you don't need to check every single edge.
Create a residual network (https://en.wikipedia.org/wiki/Flow_network) and run DFS on the residual graph; if you find a cycle, it means there is another max flow, in which the flow on at least one edge is different. (One caveat: ignore the trivial two-arc cycles formed by the forward and backward residual arcs of a single edge, since pushing flow around such a pair leaves the flow unchanged.)
I'm trying to understand how the LP formulation for the shortest path problem works. However, I'm having trouble understanding the constraints. Why does this formulation work?
http://ie.bilkent.edu.tr/~ie400/Lecture8.pdf
I'm having trouble understanding how the constraints on pages 15 and 17 work. I get the main idea and I understand how and why x should take certain values, but I don't understand how the whole system works mathematically. Can someone explain? In the exam I am supposed to be able to create and modify such constraints, but I am pretty far from being able to do that.
What isn't very clear on those slides (pp. 15 and 17) is that the line beginning with "s.t." is actually specifying one constraint per vertex i, i.e., n separate constraints in total (if there are n vertices). Normally this would be communicated by writing something like "∀i ϵ V" alongside the constraint.
In any case, this line says that for each vertex i, the total amount of flow entering it from any other vertices must equal the total amount of flow leaving it -- unless the vertex is the source, in which case the total amount of flow leaving it must be greater by 1, or the sink, in which case the total amount of flow entering it must be greater by 1. It may not be obvious how to come up with this system of constraints in the first place, but by looking at some examples you should be able to see that any shortest path (or in fact, any path from s to t at all) satisfies all of them: every internal vertex in the path will have 1 incoming edge and 1 outgoing edge, while s and t will have just 1 outgoing, or 1 incoming, edge, respectively. Vertices that don't participate in the path at all have 0 incoming and 0 outgoing flow, so they work too.
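Written out explicitly, with one variable x_ij per edge (i, j) and c_ij its cost, the program those slides describe is roughly:

minimize sum_{(i,j) in E} c_ij x_ij
subject to
for all vertices i, sum_{j : (i,j) in E} x_ij - sum_{j : (j,i) in E} x_ji = 1 if i = s, -1 if i = t, 0 otherwise
for all (i,j) in E, x_ij >= 0

The reason an optimal solution of this LP is actually a path, rather than some fractional spread of flow, is that the flow-conservation constraint matrix is totally unimodular, so (with nonnegative costs) there is an optimal solution that is integral, and it picks out a shortest s-t path.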
One more point is that with flow problems, very often the numbers labelling the edges represent capacity constraints -- maximum limits on the amount of flow between the two endpoints -- not costs as they do here.