I'm trying to optimize a graph-traversal problem, but can't figure out the best way to tackle it. It seems like neither an A* search problem (because we want to maximize the path rather than minimize it) nor a traveling salesman problem (because we don't have to visit all cities). The simplified version of it is something along these lines:
We have a set of nodes and connections/edges. Connections are arbitrary and nodes can have one or more of them. Connections also have an interface/type associated with them, and interfaces can't support more than a single connection. So for example, if node A can connect to nodes B or C via interface alpha, and we decide to connect it to node B, that interface on node A can no longer support other connections, so C can't be connected to A anymore. However, we could connect node C to node D, if it happens to have the same alpha interface.
I should also mention that these interfaces work like lock-and-key, so A can connect to either B or C, but B and C can't connect to each other (the interface is like a mirror). Also, while A can no longer connect to anything via the alpha interface because it's used by B, if it happens to have another interface (bravo) and something else can connect to bravo, then we can connect more than one node to A. The goal is to obtain the largest group of connected nodes (discarding all smaller groups).
There are a few heuristics I'm considering:
prefer nodes with more interfaces (I already discarded interfaces without pairs)
prefer interfaces that are more popular
The above two rules can be useful for prioritizing which node to try connecting to next (for now I naively grouped them into one rank - total number of connectable nodes), but my gut is telling me I can do better. Moreover, I don't think this would favor an optimal solution.
I was trying to figure out if I can invert the heuristic somehow to create a variation of A* search such that the A* 'optimistic heuristic cost' rule still applies (i.e. heuristic cost = number of nodes discarded); however, this breaks the actual cost computation, since we'd be starting with all but one node discarded.
Another idea I had was computing the distance (number of intermediate nodes) to each node from the starting node and using the average of that as a heuristic, with the goal being all nodes connected. However, I'm not guaranteed that all nodes will connect.
EDIT:
Here is an example
dashed lines represent allowed (but not activated/traveled) connections
interfaces are not allowed to connect to an interface with the identical name, but can connect to the ' (primed) version of themselves
interface can only be used once (if we connect A to B via α, we can no longer connect A to C because A no longer has interface α available)
number of nodes is arbitrary (but constant during the algorithm's execution), and should be assumed to be very large
number of interfaces per node is going to be at least one, we could assume an upper limit if it makes the problem easier - i.e. 3
number of possible connections is simply a function of interface compatibility, interface defines what the node can connect to, whether/how you use that interface is up to you
direction/order of activating the connections doesn't matter
the goal is to generate the largest set of connected nodes (we don't care about number of connections or interfaces used)
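For very small instances, the lock-and-key model can be brute-forced exactly, which is handy as a ground truth when evaluating heuristics. Here is a sketch in Python (the function name and the string encoding, where interface x pairs only with x', are made up for illustration; the search is exponential and only meant for tiny test cases):

```python
from itertools import combinations

def largest_group(nodes):
    """nodes: dict name -> list of interface labels; a label x pairs only
    with x' on another node (lock and key), and each interface slot is
    usable at most once.  Exhaustively tries subsets of candidate
    connections and returns the size of the largest connected group."""
    def mate(i):
        return i[:-1] if i.endswith("'") else i + "'"
    names = list(nodes)
    # candidate connections as pairs of (node, slot-index) endpoints
    cands = [((a, ia), (b, ib))
             for a, b in combinations(names, 2)
             for ia, i in enumerate(nodes[a])
             for ib, j in enumerate(nodes[b]) if j == mate(i)]
    best = 1 if nodes else 0
    for r in range(1, len(cands) + 1):
        for subset in combinations(cands, r):
            slots = [s for e in subset for s in e]
            if len(slots) != len(set(slots)):
                continue                  # some interface slot reused
            parent = {n: n for n in names}
            def find(x):
                while parent[x] != x:
                    x = parent[x]
                return x
            for (a, _), (b, _) in subset:
                parent[find(a)] = find(b)
            sizes = {}
            for n in names:
                root = find(n)
                sizes[root] = sizes.get(root, 0) + 1
            best = max(best, max(sizes.values()))
    return best
```

For example, a node A with interfaces α and β can act as a hub for B (with α') and C (with β'), giving a group of size 3, which is exactly what the one-use-per-interface rule permits.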
Related
Given multiple abstract objects, but not their value, and a list of which objects are bigger than other objects, find the biggest object.
One thought I had was representing the objects in a directed graph. An outgoing connection indicates that the node the connection originates from is bigger than the node that receives the connection.
Therefore a node with only outgoing connections has no known node bigger than itself, meaning it's the biggest of them all. If there are multiple nodes with only outgoing connections, then there's not enough given information to definitively pick the biggest node.
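This idea reduces to counting in-degrees: a node with only outgoing connections is exactly a node with in-degree zero. A minimal sketch in Python (the function name and the edge encoding are illustrative):

```python
def find_biggest(n, edges):
    """edges: list of (a, b) pairs meaning 'a is bigger than b', over
    nodes 0..n-1.  Returns the unique biggest node, or None when more
    than one node has no incoming edge (not enough information)."""
    indegree = [0] * n
    for a, b in edges:
        indegree[b] += 1
    # nodes with no incoming edge: nothing is known to be bigger than them
    candidates = [v for v in range(n) if indegree[v] == 0]
    return candidates[0] if len(candidates) == 1 else None
```

Note that this relies on the comparison data being consistent (acyclic): in a DAG every node is reachable from some in-degree-zero node, so a unique in-degree-zero node is provably bigger than everything it reaches, which is all other nodes.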
For example, given this graph we know that nodes 4 and 5 are currently the biggest, but it would require a connection to tell which of the two is bigger.
On the other hand in this graph node 4 is the sole node with only outgoing connections and therefore is the biggest.
This method has worked on the examples I've tried, but I'd like to know whether it is actually correct: can somebody think of a counterexample that proves the method false, prove that the method is correct, or share a proven working approach to this problem?
Thanks in advance!
The problem statement
Given a directed graph of N nodes where each node points to exactly one of the N nodes (possibly itself). Ishu, the coder, is bored, and he has made a problem out of it to keep himself busy. The problem is as follows:
A node is 'good' if it satisfies one of the following:
It is the special node (marked as node 1)
It is pointing to the special node (node 1)
It is pointing to a good node.
Ishu is going to change pointers of some nodes to make them all 'good'. You have to find the minimum number of pointers to change in order to make all the nodes good (Thus, a Good Graph).
NOTE: Resultant Graph should hold the property that all nodes are good and each node must point to exactly one node.
Problem Constraints
1 <= N <= 10^5
Input Format
First and only argument is an integer array A containing N numbers all between 1 to N, where i-th number is the number of node that i-th node is pointing to.
Output Format
An Integer denoting minimum number of pointer changes.
My Attempted Solution
I tried building a graph by reversing the edges, and then trying to colour connected nodes; the answer would be colors - 1. In short, I was attempting to find the number of connected components of a directed graph, which does not make sense to me (as directed graphs do not have a concept of connected components). Other solutions on the web, like this and this, also suggest finding the number of connected components, but again, connected components of a directed graph do not make sense to me. The question looks a bit trickier to me than a first look at it suggests.
@mcdowella gives an accurate characterization of the graph -- in each connected component, all chains of pointers end up in the same dead end or cycle.
There's a complication, though: if the special node has a non-null pointer, it may or may not be in the terminal cycle for its connected component. First set the special node's pointer to null to make this problem go away, and then:
If the graph has m connected components, then m-1 pointers would have to be changed to make all nodes good.
Since you don't have to actually change the pointers, all you need to do to solve the problem is count the connected components in the graph.
There are many ways to do this. If you think of the edges as undirected, then you could use BFS or DFS to trace each connected component, for example.
With the kind of graph you actually have, though, where each node has a directed pointer, the easiest way to solve this is with a disjoint set data structure: https://en.wikipedia.org/wiki/Disjoint-set_data_structure
Initially you just put each node in its own set, then union the sets on either side of each pointer, and then count the number of sets by counting the number of roots in the disjoint set structure.
If you haven't implemented a disjoint set structure before, it's a little tough to appreciate just how easy it is. There are a couple of simple implementations here: How to properly implement disjoint set data structure for finding spanning forests in Python?
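A minimal union-find sketch of the counting step (illustrative Python; it assumes the 1-indexed input array from the problem statement, and ignores the special node's own pointer as described above, so the answer to the problem is the returned count minus 1):

```python
def count_components(a):
    """a[i] is the node (1-indexed) that node i+1 points to.
    Treats edges as undirected and counts connected components,
    skipping the special node's own pointer."""
    n = len(a)
    parent = list(range(n + 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for i, target in enumerate(a, start=1):
        if i != 1:                 # ignore the special node's pointer
            union(i, target)
    return len({find(v) for v in range(1, n + 1)})
```

For instance, with a = [2, 1, 3] (node 3 points to itself), there are two components, so one pointer change is needed.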
I think this is a very similar question to Graphic: Rerouting problem test in python language. Because each node points to just one other node, your graph consists of a set of cycles (counting a node pointing to itself as a cycle) with trees feeding in to them. You can find a cycle by following pointers between nodes and checking for nodes that you have already visited. You need to break one link in each cycle, redirecting it to point to the special node. This routes everything else in that cycle, and all the trees that feed into it, to the special node.
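The cycle-following idea can be sketched like this (illustrative Python): walk the pointer chain from each unvisited node, and every newly discovered cycle that does not already contain the special node costs exactly one pointer change.

```python
def min_changes(a):
    """a[i] (1-indexed values) is where node i+1 points.
    Counts the cycles of the functional graph that don't contain
    the special node 1; each needs one pointer change."""
    n = len(a)
    state = [0] * (n + 1)   # 0 = unvisited, 1 = on current walk, 2 = done
    changes = 0
    for start in range(1, n + 1):
        path = []
        v = start
        while state[v] == 0:
            state[v] = 1
            path.append(v)
            v = a[v - 1]
        if state[v] == 1:                   # closed a brand-new cycle
            cycle = path[path.index(v):]
            if 1 not in cycle:
                changes += 1
        for u in path:
            state[u] = 2
    return changes
```

Each node is visited once, so this runs in O(N) time, comfortably within the N ≤ 10^5 constraint.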
I understand that distance vector routing protocol is a distributed version of Bellman-Ford algorithm.
It is used to find the shortest-path from every node to every other node in the network.
So, every node advertises its routing table information (computed distances to all other nodes in the network) to its neighbors, and at the same time learns from the neighboring nodes.
So, my question is: how long do these advertisements keep happening between the neighbors? That is, since this is a distributed system, how does each node get to know that the entire system has converged and that it should stop advertising?
In the case of the centralized Bellman-Ford algorithm, we can say that convergence has happened when the number of iterations is one shy of the number of nodes in the graph, and we can stop the algorithm's execution...
A little more learning on the topic and searching through different articles brought me to the conclusion below.
Below is the excerpt from Wikipedia - https://en.wikipedia.org/wiki/Distance-vector_routing_protocol#Example
None of the routers have any new shortest-paths to broadcast.
Therefore, none of the routers receive any new information that might
change their routing tables. The algorithm comes to a stop.
So what this indicates is: in a distributed network using the distance vector protocol, the nodes stop advertising the information when there is no more change to their estimated shortest-path distances. At this stage, you can say the distributed network has converged.
The node in the network starts advertising its initial table (which will contain the distance information to the directly connected nodes) once it is part of the network.
And it keeps advertising the distance information to its neighbors as long as its table keeps changing.
Routers periodically send their distance vectors to their neighbours. A router updates its table only if there is new information in a received distance vector, compared with the old distance vector from that neighbour v which is already present at the router... There is no stopping of the algorithm...
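The "no table changed, so stop" convergence condition can be seen in a toy synchronous simulation (illustrative Python, not a real protocol implementation): each round, every node recomputes its estimates from its neighbours' tables, and the loop terminates exactly when no estimate changed.

```python
import math

def distance_vector(n, links):
    """Synchronous distance-vector simulation.
    links: dict mapping (u, v) to the cost of an undirected link.
    Returns (dist, rounds): dist[u][w] is u's converged estimate of
    its distance to w, and rounds counts the exchange rounds used."""
    inf = math.inf
    neigh = {u: {} for u in range(n)}
    for (u, v), c in links.items():
        neigh[u][v] = c
        neigh[v][u] = c
    # each node starts knowing only its directly connected neighbours
    dist = [[0 if u == w else neigh[u].get(w, inf) for w in range(n)]
            for u in range(n)]
    rounds = 0
    changed = True
    while changed:
        changed = False
        new = [row[:] for row in dist]       # compute rounds synchronously
        for u in range(n):
            for w in range(n):
                for v, c in neigh[u].items():
                    if c + dist[v][w] < new[u][w]:
                        new[u][w] = c + dist[v][w]
                        changed = True
        dist = new
        rounds += 1
    return dist, rounds
```

In a real network there is no global view, so a node only knows its own table stopped changing; the per-node rule "re-advertise only when my table changes" is what makes the whole system quiesce.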
I am trying to solve the question below from Tardos. Any suggestions or help would be appreciated.
You’ve been called in to help some network administrators diagnose the extent of a failure in their network. The network is designed to carry traffic from a designated source node s to a designated target node t, so we will model it as a directed graph G = (V,E), in which the capacity of each edge is 1, and in which each node lies on at least one path from s to t.
Now, when everything is running smoothly in the network, the maximum s-t flow in G has value k. However, the current situation - and the reason you’re here - is that an attacker has destroyed some of the edges in the network, so that there is now no path from s to t using the remaining (surviving) edges. For reasons that we won’t go into here, they believe the attacker has destroyed only k edges, the minimum number needed to separate s from t (i.e. the size of a minimum s-t cut); and we’ll assume they’re correct in believing this.
The network administrators are running a monitoring tool on node s, which has the following behavior: if you issue the command ping(v), for a given node v, it will tell you whether there is currently a path from s to v. (So ping(t) reports that no path currently exists; on the other hand, ping(s) always reports a path from s to itself.) Since it's not practical to go out and inspect every edge of the network, they'd like to determine the extent of the failure using this monitoring tool, through judicious use of the ping command.
So here’s the problem you face: give an algorithm that issues a sequence of ping commands to various nodes in the network, and then reports the full set of nodes that are not currently reachable from s. You could do this by pinging every node in the network, of course, but you’d like to do it using many fewer pings (given the assumption that only k edges have been deleted). In issuing this sequence, your algorithm is allowed to decide which node to ping next based on the outcome of earlier ping operations.
Give an algorithm that accomplishes this task using only O(k log n) pings.
Use Ford-Fulkerson on the complete network to calculate a max flow, which will consist of k edge-disjoint paths.
Since exactly k edges have been deleted, and all flow is cut off, exactly one edge must have been deleted along each of these paths.
For each path, which will contain at most n edges, do a binary search to discover the position of the broken edge, using O(log n) pings to the nodes on the path.
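The per-path binary search can be sketched as follows (illustrative Python; it assumes ping answers are monotone along each flow path, which is the property the full argument has to justify, and it takes the k paths and a ping callback as inputs):

```python
def locate_failures(paths, ping):
    """paths: the k edge-disjoint s-t paths from the max flow, each a
    list of nodes starting at s.  ping(v) -> True iff v is currently
    reachable from s.  Binary-searches each path for the last node
    that still answers; the broken edge immediately follows it."""
    broken = []
    for path in paths:
        lo, hi = 0, len(path) - 1    # path[0] == s is always reachable
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            if ping(path[mid]):
                lo = mid             # still reachable: break is later
            else:
                hi = mid             # unreachable: break is earlier
        broken.append((path[lo], path[hi]))
    return broken
```

Each path uses O(log n) pings and there are k paths, giving the O(k log n) bound.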
A question to the following exercise:
Let N = (V,E,c,s,t) be a flow network such that (V,E) is acyclic, and let m = |E|. Describe a polynomial-time algorithm that checks whether N has a unique maximum flow, by solving ≤ m + 1 max-flow problems.
Explain correctness and running time of the algorithm
My suggestion would be the following:
run FF (Ford-Fulkerson) once and save the value of the flow v(f) and the flow over all edges f(e_i)
for each edge e_i with f(e_i)>0:
set capacity (in this iteration) of this edge c(e_i)=f(e_i)-1 and run FF.
If the value of the flow is the same as in the original graph, then there exists another way to push the max flow through the network and we're done - the max flow isn't unique --> return "not unique"
Otherwise we continue
if we finish the loop without finding another max flow of the same value, that means the max flow is unique -> return "unique"
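Here is the proposed procedure as a runnable sketch (illustrative Python around a small Edmonds-Karp max-flow implementation; it assumes integer capacities and no antiparallel edge pairs, so the flow on each edge can be read off the residual capacities):

```python
from collections import deque

def max_flow(n, cap, s, t):
    """Edmonds-Karp on nodes 0..n-1; cap: dict {(u, v): capacity}.
    Returns (value, flow) where flow maps edges to positive flows."""
    residual = {}
    for (u, v), c in cap.items():
        residual[(u, v)] = residual.get((u, v), 0) + c
        residual.setdefault((v, u), 0)
    adj = {u: set() for u in range(n)}
    for (u, v) in residual:
        adj[u].add(v)
    value = 0
    while True:
        parent = {s: None}                 # BFS for a shortest augmenting path
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        path, v = [], t                    # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= b
            residual[(v, u)] += b
        value += b
    flow = {e: cap[e] - residual[e] for e in cap if cap[e] - residual[e] > 0}
    return value, flow

def is_unique_max_flow(n, cap, s, t):
    """The procedure above: lower each flow-carrying edge's capacity by
    one and re-solve; the max flow is unique iff the value always drops."""
    best, flow = max_flow(n, cap, s, t)
    for e, f in flow.items():
        reduced = dict(cap)
        reduced[e] = f - 1
        if max_flow(n, reduced, s, t)[0] == best:
            return False
    return True
```

This solves at most m + 1 max-flow problems, matching the exercise's budget.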
Any feedback? Have I overlooked some cases where this does not work?
Your question leaves a few details open, e.g., is this an integer flow graph (probably yes, although Ford-Fulkerson, if it converges, can run on other networks as well), and how exactly do you define whether two flows are different (is it enough that the function mapping edges to flows be different, or must the set of edges actually flowing something be different, which is a stronger requirement).
If the network is not necessarily integer flows, then, no, this will not necessarily work. Consider the following graph, where, on each edge, the number within the parentheses represents the actual flow, and the number to the left of the parentheses represents the capacity (e.g., the capacity of each of (a, c) and (c, d) is 1.1, and the flow of each is 1.):
In this graph, the flow is non-unique. It's possible to flow a total of 1 by floating 0.5 through (a, b) and (b, d). Your algorithm, however, won't find this by reducing the capacity of each of the edges to 1 below its current flow.
If the network is integer, it is not guaranteed to find a different set of participating edges than the current one. You can see it through the following graph:
Finally, though, if the network is an integer flow network, and the meaning of a different flow is simply a different function of edges to flows, then your algorithm is correct.
Sufficiency If your algorithm finds a different flow with the same total result, then obviously the new flow is legal, and, also, necessarily, at least one of the edges is flowing a different amount than it did before.
Necessity Suppose there is a different flow than the original one (with the same total value), with at least one of the edges flowing a different amount. Suppose, for contradiction, that for each edge the flow in the alternative solution is not less than the flow in the original solution. Since the flows are different, there must be at least a single edge where the flow in the alternative solution increased. Without a different edge decreasing the flow, though, there is either a violation of the conservation of flow, or the original solution was suboptimal. Hence there is some edge e where the flow in the alternative solution is lower than in the original solution. Since it is an integer flow network, the flow must be at least 1 lower on e. By definition, though, reducing the capacity of e to at least 1 lower than the current flow will not make the alternative flow illegal. Hence some alternative flow must be found if the capacity is decreased for e.
non-integer, rational flows can be 'scaled' to integers
changing edge capacities is risky, because some edges may be critical and are included in every max flow
there is a better-runtime solution; you don't need to check every single edge.
Create a residual network (https://en.wikipedia.org/wiki/Flow_network) and run DFS on the residual graph; if you find a cycle, it means there is another max flow, wherein the flow on at least one edge is different.