I am trying to solve the question below from Kleinberg and Tardos. Any suggestions or help would be appreciated.
You’ve been called in to help some network administrators diagnose the extent of a failure in their network. The network is designed to carry traffic from a designated source node s to a designated target node t, so we will model it as a directed graph G = (V,E), in which the capacity of each edge is 1, and in which each node lies on at least one path from s to t.
Now, when everything is running smoothly in the network, the maximum s-t flow in G has value k. However, the current situation - and the reason you’re here - is that an attacker has destroyed some of the edges in the network, so that there is now no path from s to t using the remaining (surviving) edges. For reasons that we won’t go into here, they believe the attacker has destroyed only k edges, the minimum number needed to separate s from t (i.e. the size of a minimum s-t cut); and we’ll assume they’re correct in believing this.
The network administrators are running a monitoring tool on node s, which has the following behavior: if you issue the command ping(v), for a given node v, it will tell you whether there is currently a path from s to v. (So ping(t) reports that no path currently exists; on the other hand, ping(s) always reports a path from s to itself.) Since it's not practical to go out and inspect every edge of the network, they'd like to determine the extent of the failure using this monitoring tool, through judicious use of the ping command.
So here’s the problem you face: give an algorithm that issues a sequence of ping commands to various nodes in the network, and then reports the full set of nodes that are not currently reachable from s. You could do this by pinging every node in the network, of course, but you’d like to do it using many fewer pings (given the assumption that only k edges have been deleted). In issuing this sequence, your algorithm is allowed to decide which node to ping next based on the outcome of earlier ping operations.
Give an algorithm that accomplishes this task using only O(k log n) pings.
Use Ford-Fulkerson on the intact network to compute a max flow, which will consist of k edge-disjoint paths.
Since exactly k edges have been deleted, and all flow is cut off, exactly one edge must have been deleted along each of these paths.
For each path, which will contain at most n edges, do a binary search to discover the position of the broken edge, using O(log n) pings to the nodes on the path. This works because each flow path crosses the min cut exactly once and never crosses back, so reachability along the path is monotone: every node before the broken edge answers yes, every node after it answers no.
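A minimal sketch of the binary search step, assuming a hypothetical ping(v) callable that returns True iff v is currently reachable from s:

```python
def find_broken_edge(path, ping):
    """Locate the single failed edge on one flow path via binary search.

    path: list of nodes [s, ..., t] along one of the k edge-disjoint paths.
    ping: callable; ping(v) is True iff v is currently reachable from s.
    Assumes exactly one edge on this path was destroyed, so reachability
    along the path is monotone. Uses O(log n) pings and returns the
    broken edge (u, v).
    """
    lo, hi = 0, len(path) - 1          # path[lo] reachable, path[hi] not
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if ping(path[mid]):
            lo = mid                   # failure lies further along the path
        else:
            hi = mid
    return path[lo], path[hi]
```

Running this once per flow path gives the claimed O(k log n) total pings.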
What I want to solve:
I want to detect the newly created unreachable subgraph (marking the nodes that are unreachable) in a given directed graph when a specific edge is cut.
The restrictions of this problem:
The given graph is a directed graph (see the useful information below).
The number of nodes is more than 100,000.
The number of edges is around 1.5x the number of nodes.
The running time of the solution should be less than a second.
The information that might be useful:
The given graph was made by connecting numerous cycles, and there is at least 1 route from any node to any other node.
A few (~10%) of the nodes have a branch. No node in the graph has more than 3 edges.
The meaning of "unreachable area" includes "not connected", but you can ignore this if you think it mixes two different problems into one.
My trials
When I met this problem, I tried 4 approaches, but none of them worked.
Find another path that can replace the cut edge.
This method was rejected because of its running time. Currently we use Dijkstra's algorithm for path-finding, and when I tried this method by putting the work into a job queue, the queue was flooded in less than an hour.
Check levels of edges (like packets' Time-To-Live in networking).
Search from the edge node with a given threshold level.
If I meet a branch, keep the previous level; otherwise, decrease the level.
If the level is 0, do nothing.
The current temporary solution is this one, but it obviously ignores a lot of corner cases.
Simulate a flow network on the graph.
It's simple:
Give a threshold (like 100) to every node and simulate its flow.
If I meet a branch, split the number across the branches.
Check for the values that are lower than 1.
But this method was also rejected because of its time complexity.
SCC and Topological sorting.
Lastly, I checked the strongly connected components together with topological orders. (Of course I know I'm using the term loosely; see below.)
The idea is: topological sorting is defined for DAGs (directed acyclic graphs), but if I add some rules (like "if I detect a cycle, treat that cycle as a virtual node, recursively", using SCCs), I can compute a "topological order" for a general directed graph. If I find such an order, it means there is an area that is unreachable. (It's hard to explain; think about it together with method 3, the flow-network simulation.)
I think this approach is the best one and might solve the problem, but I have no idea what keywords I should search for and learn about, and the same goes for the implementation.
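The keywords to search for are "strongly connected components" and "condensation", computed with Tarjan's or Kosaraju's algorithm. A rough sketch of the contraction idea using Kosaraju's algorithm (written iteratively, since recursion would overflow at 100,000 nodes); the resulting DAG can then be topologically sorted like any acyclic graph:

```python
from collections import defaultdict

def condensation(n, edges):
    """Contract strongly connected components into single nodes (Kosaraju).

    n: number of nodes (0..n-1); edges: list of (u, v) pairs.
    Returns (comp, dag_edges): comp[v] is the component id of node v,
    and dag_edges is the edge set of the condensed DAG.
    """
    g, rg = defaultdict(list), defaultdict(list)
    for u, v in edges:
        g[u].append(v)
        rg[v].append(u)

    # First pass: iterative DFS on g to compute finish order.
    order, seen = [], [False] * n
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(g[s]))]
        while stack:
            v, it = stack[-1]
            advanced = False
            for w in it:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(g[w])))
                    advanced = True
                    break
            if not advanced:
                order.append(v)
                stack.pop()

    # Second pass: DFS on the reversed graph in reverse finish order;
    # every tree found here is one strongly connected component.
    comp, c = [-1] * n, 0
    for s in reversed(order):
        if comp[s] != -1:
            continue
        comp[s] = c
        stack = [s]
        while stack:
            v = stack.pop()
            for w in rg[v]:
                if comp[w] == -1:
                    comp[w] = c
                    stack.append(w)
        c += 1

    dag_edges = {(comp[u], comp[v]) for u, v in edges if comp[u] != comp[v]}
    return comp, dag_edges
```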
EDIT
I forgot to explain what unreachable means. If there is no route from a node (node 'A') to any other node, node 'A' is "unreachable". Initially, in the given digraph, there are no unreachable nodes.
In this problem, let's assume that node 1 is the master node. If there is no route from node 1 to node 2, then node 2 is unreachable.
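Given this definition, a single BFS from the master node with the cut edge excluded already answers the question in O(V + E), which is comfortably under a second at ~150,000 edges. A sketch, assuming an adjacency-list dict:

```python
from collections import deque

def unreachable_after_cut(adj, master, cut_edge):
    """Return the set of nodes unreachable from `master` once `cut_edge`
    is removed.

    adj: dict mapping every node to its list of successors.
    cut_edge: the (u, v) pair being cut.
    One BFS is O(V + E); no flow simulation or SCC machinery needed
    for a single query.
    """
    cu, cv = cut_edge
    seen = {master}
    queue = deque([master])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if (u, v) == (cu, cv) or v in seen:
                continue            # skip the cut edge and visited nodes
            seen.add(v)
            queue.append(v)
    return set(adj) - seen
```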
I understand that the distance-vector routing protocol is a distributed version of the Bellman-Ford algorithm.
It is used to find the shortest path from every node to every other node in the network.
So, every node advertises its routing table information (computed distances to all other nodes in the network) to its neighbors, and at the same time learns from the neighboring nodes.
So, my question is: how long does this advertisement keep happening between the neighbors? That is, since this is a distributed system, how does each node get to know that the entire system has converged and that it should stop advertising?
Like in the case of the (centralized) Bellman-Ford algorithm, we can say that convergence has happened when the number of iterations equals one shy of the number of nodes in the graph (one shy of the number of routers in the network), and we can stop the algorithm's execution...
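For reference, a minimal centralized Bellman-Ford sketch showing that n - 1 relaxation rounds suffice (any shortest path uses at most n - 1 edges); the distributed protocol has no such global iteration counter, which is exactly the difficulty the question raises:

```python
def bellman_ford(n, edges, src):
    """Centralized Bellman-Ford over (u, v, weight) edge triples.

    Distances stabilize after at most n - 1 rounds of relaxation,
    where n is the number of nodes.
    """
    INF = float("inf")
    dist = [INF] * n
    dist[src] = 0
    for _ in range(n - 1):            # n - 1 rounds always suffice
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:               # early exit: already converged
            break
    return dist
```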
A little more learning on the topic and searching through different articles brought me to the conclusion below.
Below is the excerpt from Wikipedia - https://en.wikipedia.org/wiki/Distance-vector_routing_protocol#Example
None of the routers have any new shortest-paths to broadcast.
Therefore, none of the routers receive any new information that might
change their routing tables. The algorithm comes to a stop.
So what this indicates is: in a distributed network using a distance-vector protocol, the nodes stop advertising information when there is no more change to their estimated shortest-path distances. At this stage, you can say the distributed network has converged.
A node in the network starts advertising its initial table (which will contain the distance information for the directly connected nodes) once it is part of the network.
And it keeps advertising the distance information to its neighbors until there is no more change to its table.
Routers periodically send distance vectors to their neighbors. A router updates its table only if new information is present in a received distance vector, compared with the neighbor's old distance vector already stored at the router... There is no stopping of the algorithm...
We are given a flow network, as well as a max flow in the network. An edge is called an increasing edge if increasing its capacity by an arbitrary positive number would also increase the max flow.
Present an algorithm that finds an increasing edge (if one exists) and runs in $O(n^2)$.
I thought about the following idea -
Find the minimum cut in the graph, as it is given to us by the Ford-Fulkerson algorithm.
Increase the capacity of an edge crossing the cut from the source side by 1.
Run BFS in the residual network to find whether an augmenting path exists. If one exists, we have an increasing edge. To find it, we have to compare the original network with the new network. We have to do that n times, since we have to check for an augmenting path every time we increase a capacity by 1.
Is this correct, and am I in line with the required running time?
Thank you!
I think you just need to find a path from the source to the sink that would be an augmenting path if at most one edge were increased in capacity.
First, find all the vertices you can reach from the source along edges with residual capacity. If you reach the sink, then you weren't given a max flow to begin with.
Then find all the other vertices that are adjacent to those through edges that are at capacity.
Then try to find an augmenting path from those vertices to the sink.
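The steps above can be sketched as follows, under the assumption of integer capacities stored in dicts keyed by edge. Equivalently, a saturated edge (u, v) is increasing exactly when s can still reach u and v can still reach t in the residual graph:

```python
from collections import deque

def find_increasing_edge(edges, cap, flow, s, t):
    """Return a saturated edge (u, v) with s -> u and v -> t residual
    reachability, or None. Raising such an edge's capacity creates an
    augmenting path, so the max flow would increase.

    edges: list of (u, v); cap, flow: dicts keyed by (u, v).
    A residual arc u->v exists if cap > flow, and v->u if flow > 0.
    """
    res, rev = {}, {}
    def add(d, a, b):
        d.setdefault(a, []).append(b)
    for (u, v) in edges:
        if cap[(u, v)] > flow[(u, v)]:
            add(res, u, v); add(rev, v, u)   # forward residual arc
        if flow[(u, v)] > 0:
            add(res, v, u); add(rev, u, v)   # backward residual arc

    def bfs(adj, start):
        seen = {start}
        q = deque([start])
        while q:
            x = q.popleft()
            for y in adj.get(x, []):
                if y not in seen:
                    seen.add(y)
                    q.append(y)
        return seen

    from_s = bfs(res, s)                     # vertices s can still reach
    to_t = bfs(rev, t)                       # vertices that can still reach t
    for (u, v) in edges:
        if cap[(u, v)] == flow[(u, v)] and u in from_s and v in to_t:
            return (u, v)
    return None
```

Two BFS passes plus one scan of the edges, so the whole check is linear in the size of the graph.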
The total complexity is linear, so whoever asked you this question probably had something else in mind.
A question about the following exercise:
Let N = (V,E,c,s,t) be a flow network such that (V,E) is acyclic, and let m = |E|. Describe a polynomial-time algorithm that checks whether N has a unique maximum flow by solving ≤ m + 1 max-flow problems.
Explain the correctness and running time of the algorithm.
My suggestion would be the following:
run FF (Ford-Fulkerson) once and save the value of the flow v(f) and the flow over all edges f(e_i)
for each edge e_i with f(e_i) > 0:
set the capacity (in this iteration) of this edge to c(e_i) = f(e_i) - 1 and run FF.
If the value of the flow is the same as in the original graph, then there exists another way to push the max flow through the network and we're done; the max flow isn't unique --> return "not unique"
Otherwise, we continue.
If we finish the loop without finding another max flow of the same value, that means the max flow is unique -> return "unique"
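The loop above can be sketched as follows, using Edmonds-Karp as the FF implementation and assuming integer capacities with no antiparallel edges (which holds here, since (V,E) is acyclic):

```python
import copy
from collections import deque

def max_flow(n, cap, s, t):
    """Edmonds-Karp on an n x n capacity matrix; mutates cap into the
    final residual capacities and returns the max-flow value."""
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:
            u = q.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow
        path, v = [], t                       # recover the augmenting path
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        aug = min(cap[u][v] for u, v in path)
        for u, v in path:                     # update residual capacities
            cap[u][v] -= aug
            cap[v][u] += aug
        flow += aug

def is_unique_max_flow(n, cap, s, t):
    """Rerun max flow once per flow-carrying edge with that edge's
    capacity reduced to f(e) - 1; if any rerun still reaches the
    original value, the max flow is not unique. At most m + 1 max-flow
    computations in total."""
    residual = copy.deepcopy(cap)
    v_f = max_flow(n, residual, s, t)
    for u in range(n):
        for w in range(n):
            if cap[u][w] <= 0:
                continue
            f_uw = cap[u][w] - residual[u][w]   # flow pushed on edge (u, w)
            if f_uw > 0:
                reduced = copy.deepcopy(cap)
                reduced[u][w] = f_uw - 1
                if max_flow(n, reduced, s, t) == v_f:
                    return False
    return True
```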
Any feedback? Have I overlooked some cases where this does not work?
Your question leaves a few details open, e.g., is this an integer flow network (probably yes, although Ford-Fulkerson, if it converges, can run on other networks as well), and how exactly do you define whether two flows are different (is it enough that the functions mapping edges to flows be different, or must the sets of edges actually carrying flow be different, which is a stronger requirement)?
If the network does not necessarily carry integer flows, then, no, this will not necessarily work. Consider the following graph, where, on each edge, the number within the parentheses represents the actual flow, and the number to the left of the parentheses represents the capacity (e.g., the capacity of each of (a, c) and (c, d) is 1.1, and the flow of each is 1):
In this graph, the flow is non-unique. It's possible to route a total of 1 by sending 0.5 through (a, b) and (b, d). Your algorithm, however, won't find this by reducing the capacity of each of the edges to 1 below its current flow.
If the network is integer, your algorithm is not guaranteed to find a different set of participating edges than the current one. You can see this through the following graph:
Finally, though, if the network is an integer flow network, and the meaning of a different flow is simply a different function of edges to flows, then your algorithm is correct.
Sufficiency: If your algorithm finds a different flow with the same total value, then obviously the new flow is legal, and, necessarily, at least one of the edges carries a different amount of flow than it did before.
Necessity: Suppose there is a flow different from the original one (with the same total value), with at least one edge carrying a different amount. Suppose first that, for each edge, the flow in the alternative solution is not less than the flow in the original solution. Since the flows are different, there must be at least a single edge where the flow in the alternative solution increased. Without a different edge decreasing its flow, though, there is either a violation of the conservation of flow, or the original solution was suboptimal. Hence there is some edge e where the flow in the alternative solution is lower than in the original solution. Since it is an integer flow network, the flow must be at least 1 lower on e. By definition, though, reducing the capacity of e to at least 1 below the current flow will not make the alternative flow illegal. Hence some alternative flow must be found when the capacity is decreased for e.
Non-integer, rational flows can be 'scaled' to integer ones.
Changing edge capacities is risky, because some edges may be critical and included in every max flow.
There is a solution with better runtime; you don't need to check every single edge.
Create a residual network (https://en.wikipedia.org/wiki/Flow_network). Run DFS on the residual network graph; if you find a cycle, it means there is another max flow in which the flow on at least one edge is different.
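A sketch of this check, assuming an integer flow on an acyclic network (so a two-node residual cycle can only come from one edge's own forward and backward arcs, which must be skipped as trivial):

```python
def residual_has_cycle(n, cap, flow):
    """Detect a rerouting cycle in the residual graph of an integer flow
    on an acyclic network. Pushing one unit around such a cycle changes
    the flow on every edge of the cycle without changing the total, so a
    cycle means the max flow is not unique. The `prev` argument skips
    the trivial two-node cycle formed by a single edge's own forward and
    backward residual arcs."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if cap[u][v] - flow[u][v] > 0:   # forward residual arc
                adj[u].append(v)
            if flow[u][v] > 0:               # backward residual arc
                adj[v].append(u)

    WHITE, GRAY, BLACK = 0, 1, 2
    color = [WHITE] * n

    def dfs(u, prev):
        color[u] = GRAY
        for v in adj[u]:
            if v == prev:                    # same edge's reverse arc: trivial
                continue
            if color[v] == GRAY:
                return True                  # back edge closes a real cycle
            if color[v] == WHITE and dfs(v, u):
                return True
        color[u] = BLACK
        return False

    return any(color[u] == WHITE and dfs(u, -1) for u in range(n))
```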
I'm trying to optimize a graph-traversal problem, but can't figure out the best way to tackle it. It seems like neither an A* search problem (because we want to maximize the path rather than minimize it) nor a traveling salesman problem (because we don't have to visit all cities). The simplified version of it is something along these lines:
We have a set of nodes and connections/edges. Connections are arbitrary and nodes can have one or more of them. Connections also have an interface/type associated with them, and interfaces can't support more than a single connection. So for example, if node A can connect to nodes B or C via interface alpha, and we decide to connect it to node B, that interface on node A can no longer support other connections, so C can't be connected to A anymore. However, we could connect node C to node D, if it happens to have the same alpha interface.
I should also mention that these interfaces work like lock-and-key, so A can connect to either B or C, but B and C can't connect to each other (the interface is like a mirror). Also, while A can no longer connect to anything via the alpha interface because it's used by B, if it happens to have another interface (bravo) and something else can connect to bravo, then we can connect more than one node to A. The goal is to obtain the largest group of connected nodes (discarding all smaller groups).
There are a few heuristics I'm considering:
prefer nodes with more interfaces (I already discarded interfaces without pairs)
prefer interfaces that are more popular
The above two rules can be useful for prioritizing which node to try connecting next (for now I naively grouped them into one rank: the total number of connectable nodes), but my gut tells me I can do better. Moreover, I don't think this would favor an optimal solution.
I was trying to figure out whether I can invert the heuristic somehow to create a variation of A* search such that the A* 'optimistic heuristic cost' rule still applies (i.e., heuristic cost = number of nodes discarded); however, this breaks the actual cost computation, since we'd be starting with all but one node discarded.
Another idea I had was computing the distance (number of intermediate nodes) to each node from the starting node and using the average of that as a heuristic, with goal being all nodes connected. However, I'm not guaranteed that all nodes will connect.
EDIT:
Here is an example
dashed lines represent allowed (but not activated/traveled) connections
interfaces are not allowed to connect to an interface with the identical name, but can connect to the ' (primed) version of themselves
interface can only be used once (if we connect A to B via α, we can no longer connect A to C because A no longer has interface α available)
number of nodes is arbitrary (but constant during the algorithm's execution), and should be assumed to be very large
number of interfaces per node is going to be at least one; we could assume an upper limit if it makes the problem easier, e.g., 3
number of possible connections is simply a function of interface compatibility, interface defines what the node can connect to, whether/how you use that interface is up to you
direction/order of activating the connections doesn't matter
the goal is to generate the largest set of connected nodes (we don't care about number of connections or interfaces used)
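As a baseline to experiment against, here is a hedged greedy sketch (the names largest_group and the primed-interface encoding are my own, not from the problem): scan the nodes, pair each free port with a waiting mating port as soon as one exists, and track connected groups with union-find. Greedy pairing is not optimal for this problem; it only illustrates the lock-and-key constraint and gives something to compare heuristics against.

```python
from collections import defaultdict

class DSU:
    """Union-find to track groups of connected nodes."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

def largest_group(n, node_ifaces):
    """Greedy baseline. node_ifaces[v] lists the interfaces on node v,
    e.g. "alpha" or "alpha'" (the primed mirror). An "alpha" port pairs
    only with an "alpha'" port, and each port is used at most once.
    Returns the size of the largest connected group found. A heuristic
    sketch, not an optimal algorithm."""
    free = defaultdict(list)          # interface name -> nodes with a free port
    dsu = DSU(n)
    for v in range(n):
        for iface in node_ifaces[v]:
            base = iface.rstrip("'")
            mate = base if iface.endswith("'") else base + "'"
            if free[mate]:
                u = free[mate].pop()  # consume one waiting mating port
                dsu.union(u, v)
            else:
                free[iface].append(v)
    return max(dsu.size[dsu.find(v)] for v in range(n))
```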