Graph travelling algorithm

I've got an interesting graph-theory problem. I am given a tree T with n nodes and a set of edges. T is, of course, undirected. Each edge has a weight that indicates how many times (at least) it has to be traversed. We walk from node to node along edges, and the task is to find the minimal number of steps needed to satisfy the above conditions. I can start from any node.
For example, this tree (edge weight in parentheses):
1 - 2 (1)
2 - 3 (1)
3 - 4 (2)
4 - 5 (1)
4 - 6 (1)
we need 8 steps to walk this tree, for example: 1->2->3->4->3->4->5->4->6
I don't know how to approach this problem. Is it possible to find an optimal tour, or can we at least compute the minimal number of steps without constructing one?

Add extra edges to your graph corresponding to the weight of each edge (i.e. if a->b has weight 3, then your graph should contain 3 parallel undirected edges between a and b).
Then what you are trying to find is called an Eulerian trail on this graph.
An Eulerian trail can be closed (if start == end) or open (if start != end).
A closed trail exists if every node has even degree.
An open trail exists if exactly two nodes have odd degree.
The trail itself can be found using Fleury's algorithm (faster, linear-time algorithms such as Hierholzer's also exist if this is too slow).
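As a quick illustration of the degree test on the example tree, here is a minimal sketch (the helper name and edge list are just for illustration) that expands each weighted edge into that many parallel edges and counts odd-degree vertices:

from collections import Counter

def odd_degree_count(weighted_edges):
    # Each edge of weight w contributes w to the degree of both endpoints.
    degree = Counter()
    for a, b, w in weighted_edges:
        degree[a] += w
        degree[b] += w
    return sum(1 for d in degree.values() if d % 2 == 1)

edges = [(1, 2, 1), (2, 3, 1), (3, 4, 2), (4, 5, 1), (4, 6, 1)]
print(odd_degree_count(edges))  # 4, so extra edges are needed before a trail exists

Here four vertices (1, 3, 5, 6) have odd degree; duplicating edges 3-4 and 4-5 leaves only two (1 and 6), which matches the 8-step open trail from the question.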
If your graph does not satisfy the requirements for an Eulerian trail, then simply add the smallest number of extra edges until it does.
One way of doing this is to perform a depth-first search over the tree, keeping track of the minimum number of edges you can add to each subtree so that it has 0, 1, or 2 vertices of odd degree. This takes time linear in the number of nodes in the tree.
EXAMPLE CODE
This Python code computes the minimal number of steps for the example graph.
(To construct the graph, consider the tree as rooted and add an adjacency entry for each edge directed away from the root, as in D below.)
from collections import defaultdict

# Rooted adjacency lists: D[node] = [(child, weight), ...]
D = defaultdict(list)
D[1].append((2, 1))
D[2].append((3, 1))
D[3].append((4, 2))
D[4].append((5, 1))
D[4].append((6, 1))

BIGNUM = 100000  # sentinel for "impossible"

class Memoize:
    def __init__(self, fn):
        self.fn = fn
        self.memo = {}
    def __call__(self, *args):
        if args not in self.memo:
            self.memo[args] = self.fn(*args)
        return self.memo[args]

@Memoize
def min_odd(node, num_odd, odd, k):
    """Return minimum cost for num_odd (<=2) odd vertices in the subtree
    rooted at node, using only children >= k.
    odd is 1 if we have an odd number of edges into this node from already
    considered edges."""
    edges = D[node]
    if k == len(edges):
        # No more children to consider, and no choices to make
        if odd:
            return 0 if num_odd == 1 else BIGNUM
        return 0 if num_odd == 0 else BIGNUM
    # We decide whether to add another edge, and how many of the odd
    # vertices should come from the subtree
    dest, w0 = edges[k]
    best = BIGNUM
    for extra in [0, 1]:
        w = w0 + extra
        for sub_odd in range(num_odd + 1):
            best = min(best, w + min_odd(dest, sub_odd, w & 1, 0)
                              + min_odd(node, num_odd - sub_odd, (odd + w) & 1, k + 1))
    return best

root = 1
print(min(min_odd(root, 2, 0, 0), min_odd(root, 0, 0, 0)))  # 8 for the example


Compute distance using DFS

I was torn between these two methods:
M1:
Use adjacency list to represent graph G with vertices P and edges A
Use DFS on G storing all the distances from p in an array d;
Loop through d checking all entries. If some d[u] >6, return false otherwise true
M2:
Use adjacency list to represent graph G with vertices P and edges A
Use BFS on G storing all the distances from p in an array d;
Loop through d checking all entries. If some d[u] >6, return false otherwise true
Both these methods have a worst case of O(|P| + |A|), so I think both would be a correct answer to this question. I chose the DFS method, reasoning that with DFS you should be able to find the "outlier" of freedom degree 7 earlier than with BFS, since with BFS you would have to traverse every single vertex up to degree 7 in every case.
Apparently this is wrong according to the teacher: with DFS, you can't compute the distances. I don't understand why you wouldn't be able to. I could keep a number n indicating the degree of freedom I am currently at. Starting from root p, the child would have n = 1, which I store in array d. Then I keep traversing down until no child is to be found, incrementing n and storing the values in d. When back-tracking starts, n is decremented until we find an unvisited child of any of the visited nodes on the stack. If there is an unvisited child, I increment again, keep incrementing until no more children are found, decrement until the next unvisited child on the stack is found, and so on.
I believe that would be a way to store the distances with DFS.
Both BFS and DFS can do the job: they can both limit their search to a depth of 6, and at the end of the traversal they can check whether the whole population was reached or not. But there are some important differences:
With BFS
The BFS traversal is the algorithm I would opt for. When a BFS search determines the degree of a person, it is definitive: no correction needs to be made to it.
Here is a sketch of how you can do this with BFS:
visited = set()    # people already reached
frontier = []      # current search frontier
visited.add(p)     # search starts at person p
frontier.append(p)
for degree in [1, 2, 3, 4, 5, 6]:
    nextFrontier = []
    for person in frontier:
        for acquaintance in A[person]:
            if acquaintance not in visited:
                visited.add(acquaintance)
                nextFrontier.append(acquaintance)
    frontier = nextFrontier
    if len(visited) == len(P):  # have we reached the whole population?
        return True
# After six rounds we did not reach all people, so...
return False
This assumes that you can find the list of acquaintances for a given person via A[person]. If A is not structured like an adjacency list but as a list of pairs, then first do some preprocessing on the original A to create such an adjacency list.
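For instance, a minimal sketch of that preprocessing step (assuming A starts out as a list of undirected (x, y) acquaintance pairs; names are illustrative):

from collections import defaultdict

def build_adjacency(pairs):
    # adj[x] lists everyone x knows; each undirected pair is added both ways
    adj = defaultdict(list)
    for x, y in pairs:
        adj[x].append(y)
        adj[y].append(x)
    return adj

A = build_adjacency([(1, 2), (2, 3)])
print(A[2])  # [1, 3]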
With DFS
A DFS algorithm has as downside that it will not necessarily start with optimal paths, and so it will find that some persons have degree 6, while there really are shorter, uninvestigated paths that could improve on that degree. This means that a DFS algorithm may need to revisit nodes and even partial paths (edges) to register such improvements and cascade them through a visited path up to degree 6. And there might even be several improvements to be applied for the same person.
A DFS algorithm could look like this:
degreeOfPerson = dict()  # empty key/value dictionary
for person in P:
    degreeOfPerson[person] = 7  # some value greater than 6

def dfs(person, degree):
    if degree >= 7:
        return  # don't lose time on degrees higher than 6
    for acquaintance in A[person]:
        if degree < degreeOfPerson[acquaintance]:  # improvement?
            degreeOfPerson[acquaintance] = degree
            dfs(acquaintance, degree + 1)

# start DFS
degreeOfPerson[p] = 0
dfs(p, 1)

# Check if all persons got a degree of at most 6
for person in P:
    if degreeOfPerson[person] > 6:
        return False
return True
Example
If the graph has three nodes, linked as a triangle a-b-c, with starting point a, then this would be the sequence. Indentation means (recursive) call of dfs:
degreeOfPerson[a] = 0
a->b: degreeOfPerson[b] = 1
    b->c: degreeOfPerson[c] = 2
        c->a: # cannot improve degreeOfPerson[a]. Backtrack
        c->b: # cannot improve degreeOfPerson[b]. Backtrack
    b->a: # cannot improve degreeOfPerson[a]. Backtrack
a->c: degreeOfPerson[c] = 1 # improvement!
    c->a: # cannot improve degreeOfPerson[a]. Backtrack
    c->b: # cannot improve degreeOfPerson[b]. Backtrack
Time Complexity
The number of times the same edge can be visited with DFS is not more than the maximum degree we are looking for -- in your case 6. If that is a constant, then it does not affect the time complexity. If however the degree to check for is an input value, then the time complexity of DFS becomes O(maxdegree * |E| + |V|).
A simple depth-first search algorithm does not necessarily yield the shortest path in an undirected graph. For example, consider a simple triangle graph. If you start at one vertex, you will process the other two vertices. A naive algorithm will report one vertex at distance one from the source and the other at distance two, even though both are actually at distance one from the source.
A much more natural approach is to use the breadth-first search (BFS) algorithm. It can be shown that a breadth-first search computes shortest paths, and it requires significantly fewer modifications.
You definitely can use depth-first search to compute the distances from one node to another, but it is not a natural approach. In fact, it is very common to miscompute distances using a depth-first search algorithm (see: http://www-student.cse.buffalo.edu/~atri/cse331/support/dfs-bfs/index.html), particularly when the underlying graph has cycles. There are some special cases you must handle if you want to do it this way, but it definitely is possible.
With that being said, the depth-first search algorithm you describe does not appear to be correct. For example, it will fail on the triangle graph that I described above. This is true because the standard depth-first search only visits each vertex once, and you would not revisit a vertex after its distance has been set. Thus, if you take the "longer path" to a vertex in a cycle at first, you will end up with an incorrect distance value.
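To make the failure mode concrete, here is a minimal sketch (the triangle graph is hard-coded for illustration) comparing a naive once-per-vertex DFS distance computation with BFS:

from collections import deque

graph = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}  # triangle

def naive_dfs_dist(g, src):
    dist = {src: 0}
    def dfs(u):
        for v in g[u]:
            if v not in dist:  # visits each vertex only once
                dist[v] = dist[u] + 1
                dfs(v)
    dfs(src)
    return dist

def bfs_dist(g, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in g[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

print(naive_dfs_dist(graph, 'a'))  # {'a': 0, 'b': 1, 'c': 2} -- c is wrong
print(bfs_dist(graph, 'a'))        # {'a': 0, 'b': 1, 'c': 1}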

Shortest path from one source which goes through N edges

In my economics research I am currently dealing with a specific shortest path problem:
Given a directed deterministic dynamic graph with weights on the edges, I need to find the shortest path from one source S, which goes through N edges. The graph can have cycles, the edge weights could be negative, and the path is allowed to go through a vertex or edge more than once.
Is there an efficient algorithm for this problem?
One possibility would be:
First find the lowest edge-weight in the graph.
And then build a priority queue of paths from the starting vertex (initially just the empty path at the starting point), where all yet-to-be-handled edges are counted as having that lowest weight.
Main loop:
Remove path with lowest weight from the queue.
If path has N edges you are done
Otherwise add all possible one-edge extensions of that path to priority queue
However, that simple algorithm has a flaw: you might visit the same vertex as the i-th step of multiple paths (visiting a vertex as the 2nd and as the 4th step is fine, but reaching it as the 4th step along two different paths is the issue), which is inefficient.
The algorithm can be improved by skipping such duplicates in the third step above, since the priority queue guarantees that the first partial path to reach a vertex at a given step count has the lowest weight-sum to that vertex, and the rest of the path does not depend on how you reached the vertex (edges and vertices may be reused).
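A minimal sketch of this priority-queue approach with that de-duplication (illustrative names; the lowest edge weight serves as an optimistic per-step estimate so negative weights stay admissible):

import heapq

def shortest_exactly_n(graph, weight, s, n):
    """graph: {u: [v, ...]}, weight: {(u, v): w}.
    Minimum total weight of a walk from s using exactly n edges, or None."""
    min_w = min(weight.values())  # cheapest edge in the graph
    # priority = cost so far + optimistic estimate for the remaining steps
    pq = [(n * min_w, 0, s, 0)]   # (priority, cost, vertex, steps taken)
    seen = set()                  # (vertex, steps) pairs already expanded
    while pq:
        _, cost, u, steps = heapq.heappop(pq)
        if steps == n:
            return cost
        if (u, steps) in seen:    # a cheaper path already reached u at this step
            continue
        seen.add((u, steps))
        for v in graph.get(u, []):
            c = cost + weight[(u, v)]
            heapq.heappush(pq, (c + (n - steps - 1) * min_w, c, v, steps + 1))
    return None

print(shortest_exactly_n({'a': ['b'], 'b': ['a']},
                         {('a', 'b'): 1, ('b', 'a'): -2}, 'a', 3))  # 0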
The "exactly N edges" constraint makes this problem much easier to solve than if that constraint didn't exist. Essentially you can solve N = 0 (just the start node), use that to solve N = 1 (all the neighbors of the start node), then N = 2 (neighbors of the solution to N = 1, taking the lowest cost path for nodes that are are connected to multiple nodes), etc.
In pseudocode (using {field: val} to mean "a record with a field named field with value val"):
# returns a map from node to cost, where each key represents
# a node reachable from start_node in exactly n steps, and the
# associated value is the total cost of the cheapest path to
# that node
cheapest_path(n, start_node):
    i = 0
    horizon = new map()
    horizon[start_node] = {cost: 0, path: []}
    while i < n:
        next_horizon = new map()
        for node, entry in key_value_pairs(horizon):
            for neighbor in neighbors(node):
                this_neighbor_cost = entry.cost + edge_weight(node, neighbor, i)
                this_neighbor_path = entry.path + [neighbor]
                if next_horizon[neighbor] does not exist or this_neighbor_cost < next_horizon[neighbor].cost:
                    next_horizon[neighbor] = {cost: this_neighbor_cost, path: this_neighbor_path}
        i = i + 1
        horizon = next_horizon
    return horizon
We take account of dynamic weights using edge_weight(node, neighbor, i), meaning "the cost of going from node to neighbor at time step i".
This is a degenerate version of a single-source shortest-path algorithm like Dijkstra's Algorithm, but it's much simpler because we know we must walk exactly N steps so we don't need to worry about getting stuck in negative-weight cycles, or longer paths with cheaper weights, or anything like that.
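A minimal runnable Python version of the sketch above (names are illustrative; a static weight table stands in for the dynamic edge_weight):

def cheapest_path(n, start_node, neighbors, edge_weight):
    """Cheapest cost (and one witness path) for each node reachable from
    start_node in exactly n steps."""
    horizon = {start_node: (0, [start_node])}
    for i in range(n):
        next_horizon = {}
        for node, (cost, path) in horizon.items():
            for nb in neighbors(node):
                c = cost + edge_weight(node, nb, i)
                if nb not in next_horizon or c < next_horizon[nb][0]:
                    next_horizon[nb] = (c, path + [nb])
        horizon = next_horizon
    return horizon

# toy graph with a negative-weight cycle: a->b costs 1, b->a costs -2
adj = {'a': ['b'], 'b': ['a']}
w = {('a', 'b'): 1, ('b', 'a'): -2}
print(cheapest_path(3, 'a',
                    neighbors=lambda u: adj[u],
                    edge_weight=lambda u, v, i: w[(u, v)]))
# {'b': (0, ['a', 'b', 'a', 'b'])}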

Divide a graph into two sets

The Question is from Code jam.
Question:
Is there any way to divide the nodes of a graph into two groups such that any two nodes which can't remain in the same group end up in different groups?
Is there any standard algorithm for this?
How should I tackle this problem when each group should have an equal number of elements?
First, the feasibility problem (is there such set/ doesn't exist such set) is 2-coloring problem, where:
G = (V,E)
V = { all nodes }
E = { (u,v) | u and v are "troubling each other" }
This problem is solved by checking whether the graph is bipartite, which can be done using BFS.
How to tackle the problem when each group should have an equal number of elements?
First, let's assume the graph is bipartite, so some solution exists.
Split the graph into its connected components: (S1, S2, S3, ..., Sk).
Each connected component is itself a bipartite subgraph Si = (Li, Ri), where Li and Ri are the two sides of the bipartition (this splitting is unique within each connected component, up to swapping Li and Ri).
Create a new array:
arr[i] = |Li| - |Ri|
where |X| is the cardinality of X (number of elements in the set)
Now, solving this problem is the same as solving the partition problem, which can be done in pseudo-polynomial time (polynomial here, since the values are bounded by the number of nodes).
The solution to the partition problem assigns each arr[i] to A or B such that sum{A} is as close as possible to sum{B}. If arr[i] is in A, color Li with "1" and Ri with "2" in your solution; otherwise, do the opposite.
The solution will be O(k*n + m), where k is the number of connected components, n is the number of nodes in the graph, and m is the number of edges in the graph.
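A minimal sketch of this balancing step (hypothetical helper; assumes the per-component imbalances arr[i] = |Li| - |Ri| are already computed, and chooses which components to flip via a subset-sum style DP):

def best_balance(arr):
    """Pick a sign for each arr[i] so the total is as close to 0 as possible.
    Returns (best_total, flips); flips[i] is True if component i should have
    its Li/Ri coloring swapped."""
    reachable = {0: []}  # achievable total -> flip decisions producing it
    for x in arr:
        nxt = {}
        for total, flips in reachable.items():
            nxt.setdefault(total + x, flips + [False])  # keep orientation
            nxt.setdefault(total - x, flips + [True])   # swap Li and Ri
        reachable = nxt
    best = min(reachable, key=abs)
    return best, reachable[best]

print(best_balance([3, -1, -2]))  # (0, [False, False, False]): 3 - 1 - 2 = 0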
You build a graph from the given nodes (using a hash table to map names to nodes) and then use BFS or DFS to traverse the graph and determine whether it is bipartite (that is, divisible into two disjoint sets such that a node in one set is only in "trouble" with nodes in the other set, but not with any node in its own set). This is done by assigning a boolean value to each node as it is visited by the BFS/DFS and then checking whether any of its visited neighbors has the same value, which would mean the graph is not bipartite (not divisible into two groups).
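A minimal sketch of that check with BFS (illustrative names; the graph is an adjacency-list dict):

from collections import deque

def is_bipartite(adj):
    """2-color an undirected graph given as {node: [neighbors, ...]}.
    Returns (True, color) if bipartite, otherwise (False, None)."""
    color = {}
    for start in adj:  # cover every connected component
        if start in color:
            continue
        color[start] = 0
        q = deque([start])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in color:
                    color[v] = 1 - color[u]
                    q.append(v)
                elif color[v] == color[u]:  # neighbor got the same color
                    return False, None
    return True, color

print(is_bipartite({1: [2], 2: [1, 3], 3: [2]}))        # bipartite path
print(is_bipartite({1: [2, 3], 2: [1, 3], 3: [1, 2]}))  # triangle: not bipartite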

Code on Weighted, Acyclic Graph

We have code on a weighted acyclic graph G(V, E) with positive and negative edge weights. We change the weights of this graph with the following code, to obtain a graph G' without negative edges. Let V = {1, 2, ..., n} and let G_ij be the weight of the edge from vertex i to vertex j.
Change_weight(G)
  for i=1 to n
    for j=1 to n
      G_i = min G_ij for all k
      if G_i < 0   (we have a bar on G)
        G_ij = G_ij + G_i for all j
        G_ki = G_ki + G_i for all k
We have two claims:
1) the shortest path between every two vertices in G is the same as in G'.
2) the length of the shortest path between every two vertices in G is the same as in G'.
I read this in a low-quality PDF, so I'm not sure the code is exactly as written above; I've added the picture. The book says the above claims are false. Could anyone help me? I think they are true.
I think claim 2 is false, as in the following counterexample: the original graph is on the left, and the result after the algorithm is run is on the right. The shortest path from 1 to 3 changed: it used to pass through vertex 2, but after the algorithm is run it no longer does.
Algorithm
My reading of the PDF is:
Change_weight(G)
  for i=1 to n
    c_i = min c_ij for all j
    if c_i < 0
      c_ij = c_ij - c_i for all j
      c_ki = c_ki + c_i for all k
The interpretation is that for each vertex we increase each of its outgoing edges by |c_i| and decrease each of its incoming edges by |c_i|, where c_i (the cheapest outgoing weight) is chosen such that all outgoing edges become non-negative.
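A minimal sketch of this interpretation on an adjacency matrix (illustrative only; math.inf marks missing edges, and vertices are processed in index order, which matters, as discussed under Topological sort below):

import math

INF = math.inf

def change_weight(c):
    # c[i][j] is the weight of edge i->j, INF if absent
    n = len(c)
    for i in range(n):
        c_i = min(c[i])  # cheapest outgoing edge weight of vertex i
        if c_i < 0:
            for j in range(n):
                if c[i][j] != INF:
                    c[i][j] -= c_i  # raise outgoing edges to be non-negative
                if c[j][i] != INF:
                    c[j][i] += c_i  # lower incoming edges by the same amount
    return c

# the counterexample to claim 2: a single edge A->B of weight -1
c = [[INF, -1],
     [INF, INF]]
print(change_weight(c))  # A->B becomes 0, so the shortest path length changed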
Claim 1
"the shortest path between every two vertex in G is the same as G'"
With my reading of the pdf, this claim is true because every path between vertices i and j is changed by the same amount (c_i-c_j) and so the relative order of paths is unchanged. (Note that the path may go via intermediate vertices, but the net effect is 0 because for each intermediate vertex k we decrease the length by c_k when entering, but increase by c_k when exiting.)
Claim 2
"the length of shortest path between every two vertex in G is the same as G'".
This cannot be true - suppose we start with an original graph which has a single edge A to B with weight -1.
In the modified graph this weight will become 0.
Therefore the length of the shortest path has changed from -1 in G to 0 in G' so the statement is false.
Example
Shown below is what would happen to your graph as you applied this algorithm to node 1, followed by node 2:
Topological sort
Note that as shown in the example, we still end up with some negative weights which is probably unintended. This is because the weights of incoming edges are reduced.
However, if we work backwards through the graph (e.g. by using a topological sort), then we will always end up with non-negative weights everywhere.
In the given example, working backwards means we first update 2, and then 1 as shown below:
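In terms of the matrix sketch above, working backwards just means fixing the processing order (a hypothetical variant; a reverse topological order of the DAG is supplied explicitly):

import math

INF = math.inf

def change_weight_in_order(c, order):
    # Same update as before, but vertices are processed in the given order.
    # In reverse topological order, a vertex's outgoing edges are never
    # lowered again after they have been made non-negative.
    n = len(c)
    for i in order:
        c_i = min(c[i])
        if c_i < 0:
            for j in range(n):
                if c[i][j] != INF:
                    c[i][j] -= c_i
                if c[j][i] != INF:
                    c[j][i] += c_i
    return c

For the two-node example above this means calling change_weight_in_order(c, [1, 0]), i.e. updating B before A.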

Find a maximum-weight subgraph with a given number of edges that is a subgraph of a tree

The problem is as follows: you are given a graph which is a tree, and the number of edges that you can use. Starting at v1, you choose edges that go out of any of the vertices you have already visited.
An example:
In this example the optimal approach is:
for k==1 AC -> 5
for k==2 AB BH -> 11
for k==3 AC AB BH -> 16
At first I thought this was a problem of finding the maximum-weight path of length k starting from A, which would be trivial, but the point is you can always choose to go a different way, so that approach did not work.
What I have thought of so far:
Cut the tree at depth k, and brute-force all the possibilities.
Calculate the cost of reaching each edge, for all edges.
The cost would be the sum of all edges on the way to the edge we are trying to reach, divided by the number of edges you need to add in order to get to it.
From there pick the maximum over all edges, update the costs, and repeat until you have reached k edges.
The second approach seems good, but it reminds me a bit of the knapsack problem.
So my question is: is there a better approach for this? Is this problem NP-hard?
EDIT: A counterexample for the trimming answer:
This code illustrates a memoisation approach based on the subproblem of computing the max weight from a tree rooted at a certain node.
I think the complexity will be O(kE) where E is the number of edges in the graph (E=n-1 for a tree).
edges = {}
edges['A'] = ('B', 1), ('C', 5)
edges['B'] = ('G', 3), ('H', 10)
edges['C'] = ('D', 2), ('E', 1), ('F', 3)

cache = {}

def max_weight_subgraph(node, k, used=0):
    """Compute the max weight from a subgraph rooted at node.
    Can use up to k edges.
    Not allowed to use the first `used` connections from the node."""
    if k == 0:
        return 0
    key = node, k, used
    if key in cache:
        return cache[key]
    if node not in edges:
        return 0
    E = edges[node]
    best = 0
    if used < len(E):
        child, weight = E[used]
        # Choose the number r of edges to take from the subgraph at child
        for r in range(k):
            # k-1-r edges remain for the rest of the children
            best = max(best, weight +
                       max_weight_subgraph(node, k - 1 - r, used + 1) +
                       max_weight_subgraph(child, r, 0))
        # Also consider not using this child at all
        best = max(best, max_weight_subgraph(node, k, used + 1))
    cache[key] = best
    return best

for k in range(1, 4):
    print(k, max_weight_subgraph('A', k))  # prints 1 5, 2 11, 3 16
