I was torn between these two methods:
M1:
Use adjacency list to represent graph G with vertices P and edges A
Use DFS on G storing all the distances from p in an array d;
Loop through d checking all entries. If some d[u] >6, return false otherwise true
M2:
Use adjacency list to represent graph G with vertices P and edges A
Use BFS on G storing all the distances from p in an array d;
Loop through d checking all entries. If some d[u] >6, return false otherwise true
Both these methods will produce a worst case O(|P| + |A|), therefore I think that both would be a correct answer to this question. I had chosen the DFS method, with the reasoning that with DFS you should be able to find the "outlier" of freedom degree 7 earlier than with BFS, since with BFS you would have to traverse every single Vertex until degree 7 in every case.
Apparently this is wrong according to the teacher, as using DFS, you can't compute the distances. I don't understand why you wouldn't be able to compute the distances. I could have a number n indicating the degree of freedom I am currently at. Starting from root p, the child would have n = 1. Now I store n in array d. Then I keep traversing down until no child is to be found, while incrementing n and storing the value in my array d. Then, if the back-tracking starts, the value n will be decremented until we find an unvisited child node of any of the visited nodes on the stack. If there is an unvisited child, increment once again, then increment until no more child is found, decrement until the next unvisited child from the stack is found...
I believe that would be a way to store the distances with DFS
Both BFS and DFS can do the job: they can both limit their search to a depth of 6, and at the end of the traversal they can check whether the whole population was reached or not. But there are some important differences:
With BFS
The BFS traversal is the algorithm I would opt for. When a BFS search determines the degree of a person, it is definitive: no correction needs to be made to it.
Here is sketch of how you can do this with BFS:
visited = set() # empty set
frontier = [] # empty array
visited.add(p) # search starts at person p
frontier.append(p)
for degree in [1, 2, 3, 4, 5, 6]:
nextFrontier = [] # empty array
for person in frontier:
for acquaintance in A[person]:
if acquaintance not in visited:
visited.add(acquaintance)
nextFrontier.append(acquaintance)
frontier = nextFrontier
if size(visited) == size(P): # have we reached the whole population?
return True
# After six rounds we did not reach all people, so...
return False
This assumes that you can find the list of acquaintances for a given person via A[person]. If A is not structured like an adjacency list but as a list of pairs, then first do some preprocessing on the original A to create such an adjacency list.
With DFS
A DFS algorithm has as downside that it will not necessarily start with optimal paths, and so it will find that some persons have degree 6, while there really are shorter, uninvestigated paths that could improve on that degree. This means that a DFS algorithm may need to revisit nodes and even partial paths (edges) to register such improvements and cascade them through a visited path up to degree 6. And there might even be several improvements to be applied for the same person.
A DFS algorithm could look like this:
degreeOfPerson = dict() # empty key/value dictionary
for person in P:
degreeOfPerson[person] = 7 # some value greater than 6
function dfs(person, degree):
if degree >= 7:
return # don't lose time for higher degrees than 6.
for acquaintance in A[person]:
if degree < degreeOfPerson[acquaintance]: # improvement?
degreeOfPerson[acquaintance] = degree
dfs(acquaintance, degree+1)
# start DFS
degreeOfPerson[p] = 0
dfs(p, 1)
# Check if all persons got a degree of maximum 6
for person in P:
if degreeOfPerson[person] > 6:
return False
return True
Example
If the graph has three nodes, linked as a triangle a-b-c, with starting point a, then this would be the sequence. Indentation means (recursive) call of dfs:
degreeOfPerson[a] = 0
a->b: degreeOfPerson[b] = 1
b->c: degreeOfPerson[c] = 2
c->a: # cannot improve degreeOfPerson[a]. Backtrack
c->b: # cannot improve degreeOfPerson[b]. Backtrack
b->a: # cannot improve degreeOfPerson[a]. Backtrack
a->c: degreeOfPerson[c] = 1 # improvement!
c->a: # cannot improve degreeOfPerson[a]. Backtrack
c->b: # cannot improve degreeOfPerson[b]. Backtrack
Time Complexity
The number of times the same edge can be visited with DFS is not more than the maximum degree we are looking for -- in your case 6. If that is a constant, then it does not affect the time complexity. If however the degree to check for is an input value, then the time complexity of DFS becomes O(maxdegree * |E| + |V|).
A simple depth-first search algorithm does not necessary yield the shortest path in an undirected graph. For example, consider a simple triangle graph. If you start at one vertex, you will process the other two vertices. A naive algorithm will find that there is one vertex whose distance equals one away from the source, and a second vertex whose distance equals two away from the source. However, this is incorrect since the distance from the source to either vertex is actually one.
A much more natural approach is to use the breadth-first search (BFS) algorithm. It can be shown that a breadth-first search computes shortest paths, and it requires significantly fewer modifications.
You definitely can use depth-first search to compute the distances from one node to another, but it is not a natural approach. In fact, it is very common to miscompute distances using a depth-first search algorithm (see: http://www-student.cse.buffalo.edu/~atri/cse331/support/dfs-bfs/index.html), particularly when the underlying graph has cycles. There are some special cases you must handle if you want to do it this way, but it definitely is possible.
With that being said, the depth-first search algorithm you describe does not appear to be correct. For example, it will fail on the triangle graph that I described above. This is true because the standard depth-first search only visits each vertex once, and you would not revisit a vertex after its distance has been set. Thus, if you take the "longer path" to a vertex in a cycle at first, you will end up with an incorrect distance value.
Related
Consider a directed graph on n vertices, where each vertex has exactly
one outgoing edge. This graph consists of a collection of cycles as
well as additional vertices that have paths to the cycles, which we
call the branches. Describe a linear time algorithm that identifies
all of the cycles and computes the length of each cycle. You can
assume that the input is given as an array A, where A[i] is the
neighbor of i, so that the graph has the edge (i, A[i]).
So far my approach to the algorithm is basically marking the vertices I have traversed, and every time a vertex points back to the ones that I've traversed I count one cycle and move on to the next unvisited vertex. During the process, I also have a hashmap or something to record the order in which each node is traversed so I can calculate the length whenever I identify a cycle. (Would that be linear?) However, I very new to proof and I have no idea how to justify the correctness of an algorithm.
If you are allowed to use extra memory, the algorithm in Python would be like this.
colors = [0] ** N; # initialize N element array withe values of zero (not seen)
for i in range(N):
v = i # current vertex
if colors[v] != 0: continue # already seen
colors[v] = 1 # seen
v = A[v] # move to neighbor
while colors[v] == 0:
colors[v] = 1
v = A[v] # move to neighbor
# we have reached previously seen node; this is the start node of a cycle
colors[v] = 2 # mark the start of a cycle
cycle_len = 1
v = A[v] # move to neighbor
while colors[v] == 1:
cycle_len += 1
v = A[v] # move to neighbor
print("got a cycle with length =", cycle_len)
The basic idea is to use three colors to differently mark nodes that have already been visited and nodes that are the starting points of cycles; obviously, a single node can only belong to a single cycle.
The algorithm is linear as the internal while loop is only executed for nodes that have not been previously seen. Nodes already seen are skipped. In the worst case, both internal while loops are fully executed, but 2*N is still O(N).
Using a hashmap would not match the requirements, as the worst-case time complexity for hashmaps is not linear.
In my economics research I am currently dealing with a specific shortest path problem:
Given a directed deterministic dynamic graph with weights on the edges, I need to find the shortest path from one source S, which goes through N edges. The graph can have cycles, the edge weights could be negative, and the path is allowed to go through a vertex or edge more than once.
Is there an efficient algorithm for this problem?
One possibility would be:
First find the lowest edge-weight in the graph.
And then build a priority queue of all paths from the starting edge (initially an empty path from starting point) where all yet-to-be-handled edges are counted as having the lowest weight.
Main loop:
Remove path with lowest weight from the queue.
If path has N edges you are done
Otherwise add all possible one-edge extensions of that path to priority queue
However, that simple algorithm has a flaw - you might re-visit a vertex multiple times as i:th edge (visiting as 2nd and 4th is ok, but 4th in two different paths is the issue), which is inefficient.
The algorithm can be improved by skipping them in the 3rd step above, since the priority queue guarantees that the first partial path to the vertex had the lowest weight-sum to that vertex, and the rest of the path does not depend on how you reached the vertex (since edges and vertices can be duplicated).
The "exactly N edges" constraint makes this problem much easier to solve than if that constraint didn't exist. Essentially you can solve N = 0 (just the start node), use that to solve N = 1 (all the neighbors of the start node), then N = 2 (neighbors of the solution to N = 1, taking the lowest cost path for nodes that are are connected to multiple nodes), etc.
In pseudocode (using {field: val} to mean "a record with a field named field with value val"):
# returns a map from node to cost, where each key represents
# a node reachable from start_node in exactly n steps, and the
# associated value is the total cost of the cheapest path to
# that node
cheapest_path(n, start_node):
i = 0
horizon = new map()
horizon[start_node] = {cost: 0, path: []}
while i <= n:
next_horizon = new map()
for node, entry in key_value_pairs(horizon):
for neighbor in neighbors(node):
this_neighbor_cost = entry.cost + edge_weight(node, neighbor, i)
this_neighbor_path = entry.path + [neighbor]
if next_horizon[neighbor] does not exist or this_neighbor_cost < next_horizon[neighbor].cost:
next_horizon[neighbor] = {cost: this_neighbor_cost, path: this_neighbor_path}
i = i + 1
horizon = next_horizon
return horizon
We take account of dynamic weights using edge_weight(node, neighbor, i), meaning "the cost of going from node to neighbor at time step i.
This is a degenerate version of a single-source shortest-path algorithm like Dijkstra's Algorithm, but it's much simpler because we know we must walk exactly N steps so we don't need to worry about getting stuck in negative-weight cycles, or longer paths with cheaper weights, or anything like that.
I am working on the following past paper question for an algorithms module:
Let G = (V, E) be a simple directed acyclic graph (DAG).
For a pair of vertices v, u in V, we say v is reachable from u if there is a (directed) path from u to v in G.
(We assume that every vertex is reachable from itself.)
For any vertex v in V, let R(v) be the reachability number of vertex v, which is the number of vertices u in V that are reachable from v.
Design an algorithm which, for a given DAG, G = (V, E), computes the values of R(v) for all vertices v in V.
Provide the analysis of your algorithm (i.e., correctness and running time
analysis).
(Optimally, one should try to design an algorithm running in
O(n + m) time.)
So, far I have the following thoughts:
The following algorithm for finding a topological sort of a DAG might be useful:
TopologicalSort(G)
1. Run DFS on G and compute a DFS-numbering, N // A DFS-numbering is a numbering (starting from 1) of the vertices of G, representing the point at which the DFS-call on a given vertex v finishes.
2. Let the topological sort be the function a(v) = n - N[v] + 1 // n is the number of nodes in G and N[v] is the DFS-number of v.
My second thought is that dynamic programming might be a useful approach, too.
However, I am currently not sure how to combine these two ideas into a solution.
I would appreciate any hints!
EDIT: Unfortunately the approach below is not correct in general. It may count multiple times the nodes that can be reached via multiple paths.
The ideas below are valid if the DAG is a polytree, since this guarantees that there is at most one path between any two nodes.
You can use the following steps:
find all nodes with 0 in-degree (i.e. no incoming edges).
This can be done in O(n + m), e.g. by looping through all edges
and marking those nodes that are the end of any edge. The nodes with 0
in-degree are those which have not been marked.
Start a DFS from each node with 0 in-degree.
After the DFS call for a node ends, we want to have computed for that
node the information of its reachability.
In order to achieve this, we need to add the reachability of the
successors of this node. Some of these values might have already been
computed (if the successor was already visited by DFS), therefore this
is a dynamic programming solution.
The following pseudocode describes the DFS code:
function DFS(node) {
visited[node] = true;
reachability[node] = 1;
for each successor of node {
if (!visited[successor]) {
DFS(successor);
}
reachability[node] += reachability[successor];
}
}
After calling this for all nodes with 0 in-degree, the reachability
array will contain the reachability for all nodes in the graph.
The overall complexity is O(n + m).
I'd suggest using a Breadth First Search approach.
For every node, add all the nodes that are connected to the queue. In addition to that, maintain a separate array for calculating the reachability.
For example, if a A->B, then
1.) Mark A as traversed
2.) B is added to the queue
3.) arr[B]+=1
This way, we can get R(v) for all vertices in O(|V| + |E|) time through arr[].
I'm looking for an algorithm to solve this problem. I have to implement it (so I need a not np solution XD)
I have a complete graph with a cost on each arch and a reward on each vertex. I have only a start point, but it doesn't matter the end point, becouse the problem is to find a path to see as many vertex as possible, in order to have the maximum reward possible, but subject to a maximum cost limit. (for this reason it doesn't matter the end position).
I think to find the optimum solution is a np-hard problem, but also an approximate solution is apprecciated :D
Thanks
I'm trying study how to solve the problem with branch & bound...
update: complete problem dscription
I have a region in which there are several areas identify by its id and x,y,z position. Each vertex identifies one ot these areas. The maximum number of ares is 200.
From a start point S, I know the cost, specified in seconds and inserted in the arch (so only integer values), to reach each vertex from each other vertex (a complete graph).
When I visit a vertex I get a reward (float valiues).
My objective is to find a paths in a the graph that maximize the reward but I'm subject to a cost constraint on the paths. Indeed I have only limited minutes to complete the path (for example 600 seconds.)
The graph is made as matrix adjacency matrix for the cost and reward (but if is useful I can change the representation).
I can visit vertex more time but with one reward only!
Since you're interested in branch and bound, let's formulate a linear program. Use Floyd–Warshall to adjust the costs minimally downward so that cost(uw) ≤ cost(uv) + cost(vw) for all vertices u, v, w.
Let s be the starting vertex. We have 0-1 variables x(v) that indicate whether vertex v is part of the path and 0-1 variables y(uv) that indicate whether the arc uv is part of the path. We seek to maximize
sum over all vertices v of reward(v) x(v).
The constraints unfortunately are rather complicated. We first relate the x and y variables.
for all vertices v ≠ s, x(v) - sum over all vertices u of y(uv) = 0
Then we bound the cost.
sum over all arcs uv of cost(uv) y(uv) ≤ budget
We have (pre)flow constraints to ensure that the arcs chosen look like a path possibly accompanied by cycles (we'll handle the cycles shortly).
for all vertices v, sum over all vertices u of y(uv)
- sum over all vertices w of y(vw)
≥ -1 if v = s
0 if v ≠ s
To handle the cycles, we add cut covering constraints.
for all subsets of vertices T such that s is not in T,
for all vertices t in T,
x(t) - sum over all vertices u not in T and v in T of y(uv) ≥ 0
Because of the preflow constraints, a cycle necessarily is disconnected from the path structure.
There are exponentially many cut covering constraints, so when solving the LP, we have to generate them on demand. This means finding the minimum cut between s and each other vertex t, then verifying that the capacity of the cut is no greater than x(t). If we find a violation, then we add the constraint and use the dual simplex method to find the new optimum (repeat as necessary).
I'm going to pass on describing the branching machinery – this should be taken care of by your LP solver anyway.
Finding the optimal solution
Here is a recursive approach to solving your problem.
Let's begin with some definitions :
Let A = (Ai)1 ≤ i ≤ N be the areas.
Let wi,j = wj,i the time cost for traveling from Ai to Aj and vice versa.
Let ri the reward for visiting area Ai
Here is the recursive procedure that will output the exact requested solution : (pseudo-code)
List<Area> GetBestPath(int time_limit, Area S, int *rwd) {
int best_reward(0), possible_reward(0), best_fit(0);
List<Area> possible_path[N] = {[]};
if (time_limit < 0) {
return [];
}
if (!S.visited) {
*rwd += S.reward;
S.visit();
}
for (int i = 0; i < N; ++i) {
if (S.index != i) {
possible_path[i] = GetBestPath(time_limit - W[S.index][i], A[i], &possible_reward);
if (possible_reward > best_reward) {
best_reward = possible_reward;
best_fit = i;
}
}
}
*rwd+= best_reward;
possible_path[best_fit].push_front(S);
return possible_path[best_fit];
}
For obvious clarity reasons, I supposed the Ai to be globally reachable, as well as the wi,j.
Explanations
You start at S. First thing you do ? Collect the reward and mark the node as visited. Then you have to check which way to go is best between the S's N-1 neighbors (lets call them NS,i for 1 ≤ i ≤ N-1).
This is the exact same thing as solving the problem for NS,i with a time limit of :
time_limit - W(S ↔ NS,i)
And since you mark the visited nodes, when arriving at an area, you first check if it is marked. If so you have no reward ... Else you collect and mark it as visited ...
And so forth !
The ending condition is when time_limit (C) becomes negative. This tells us we reached the limit and cannot proceed to further moves : the recursion ends. The final path may contain useless journeys if all the rewards have been collected before the time limit C is reached. You'll have to "prune" the output list.
Complexity ?
Oh this solution is soooooooo awful in terms of complexity !
Each calls leads to N-1 calls ... Until the time limit is reached. The longest possible call sequence is yielded by going back and forth each time on the shortest edge. Let wmin be the weight of this edge.
Then obviously, the overall complexity is bounded by NC/wmin.C/wmin.
This is huuuuuge.
Another approach
Maintain a hash table of all the visited nodes.
On the other side, maintain a Max-priority queue (eg. using a MaxHeap) of the nodes that have not been collected yet. (The top of the heap is the node with the highest reward). The priority value for each node Ai in the queue is set as the couple (ri, E[wi,j])
Pop the heap : Target <- heap.pop().
Compute the shortest path to this node using Dijkstra algorithm.
Check out the path : If the cost of the path is too high, then the node is not reachable, add it to the unreachable nodes list.
Else collect all the uncollected nodes that you find in it and ...
Remove each collected node from the heap.
Set Target as the new starting point.
In either case, proceed to step 1. until the heap is empty.
Note : A hash table is the best suited to keep track of the collected node. This way, we can check a node in a path computed using Dijkstra in O(1).
Likewise, maintaining a hashtable leading to the position of each node in the heap might be useful to optimise the "pruning" of the heap, when collecting the nodes along a path.
A little analysis
This approach is slightly better than the first one in terms of complexity, but may not lead to the optimal result. In fact, it can even perform quite poorly on some graph configurations. For example, if all nodes have a reward r, except one node T that has r+1 and W(N ↔ T) = C for every node N, but the other edges would be all reachable, then this will only make you collect T and miss every other node. In this particular case, the best solution would have been to ignore T and collect everyone else leading to a reward of (N-1).r instead of only r+1.
I've got an interesting graph-theory problem. I am given a tree T with n nodes and a set of edges. T is, of course, undirected. Each edge has weight that indicates how many times (at least) it has to be visited. We are strolling from node to node using edges and the task is to find minimal number of needed steps to satisfy above conditions. I can start from any node.
For example, this tree (edge weight in parentheses):
1 - 2 (1)
2 - 3 (1)
3 - 4 (2)
4 - 5 (1)
4 - 6 (1)
we need 8 steps to walk this tree. That are for example: 1->2->3->4->3->4->5->4->6
I don't know how to approach this algorithm. Is it possible to find this optimal tour or can we find this minimal number not directly?
Add extra edges to your graph corresponding to the weight of each edge. (i.e. if a->b has weight 3, then your graph should include 3 undirected edges connections between a and b).
Then what you are trying to find is called an Eulerian trail on this graph.
A Eulerian trail can be closed (if start==end) or open (if start!=end).
Closed trails exist if all nodes have even degree.
Open trails exist if all nodes except 2 have even degree.
Paths can be found using Fleury’s Algorithm (faster linear algorithms also exist if this is too slow).
If your graph does not satisfy the requirements for an Eulerian trail, then simply add the smallest number of extra edges until it does.
One way of doing this is to perform a depth first search over the tree and keep track of the minimum number of edges that you can add to each subtree in order that it has 0,1, or 2 vertices of odd degree. This should take time linear in the number of nodes in the tree.
EXAMPLE CODE
This Python code computes the shortest number of steps for a graph.
(To construct the graph you should consider it as a rooted graph and add edges for each edge going away from the root)
from collections import defaultdict
D=defaultdict(list)
D[1].append((2,1))
D[2].append((3,1))
D[3].append((4,2))
D[4].append((5,1))
D[4].append((6,1))
BIGNUM=100000
class Memoize:
def __init__(self, fn):
self.fn = fn
self.memo = {}
def __call__(self, *args):
if not self.memo.has_key(args):
self.memo[args] = self.fn(*args)
return self.memo[args]
#Memoize
def min_odd(node,num_odd,odd,k):
"""Return minimum cost for num_odd (<=2) odd vertices in subtree centred at node, and using only children >=k
odd is 1 if we have an odd number of edges into this node from already considered edges."""
edges=D[node]
if k==len(edges):
# No more children to consider, and no choices to make
if odd:
return 0 if num_odd==1 else BIGNUM
return 0 if num_odd==0 else BIGNUM
# We decide whether to add another edge, and how many of the odd vertices to have coming from the subtree
dest,w0 = edges[k]
best = BIGNUM
for extra in [0,1]:
w = w0+extra
for sub_odd in range(num_odd+1):
best = min(best, w + min_odd(dest,sub_odd,w&1,0) + min_odd(node,num_odd-sub_odd,(odd+w)&1,k+1) )
return best
root = 1
print min( min_odd(root,2,0,0),min_odd(root,0,0,0) )