Solving cycle in undirected graph in log space?

A slightly more theoretical question, but here it is nonetheless.
Setting
Let:
UCYLE = { ⟨G⟩ : G is an undirected graph that contains a simple cycle }.
My Solution
We show UCYLE is in L by constructing an algorithm M that decides UCYLE using log space.
M = "On input ⟨G⟩ where G = (V,E):
1. For each v_i in V and for each v_j in Neighbor(v_i), store the current v_i and v_j.
2. Traverse the edge (v_i,v_j) and then follow all possible paths through G using DFS.
3. If we encounter some v_k in Neighbor(v_i) \ {v_j} such that there is an edge (v_i,v_k) in E, ACCEPT. Else REJECT."
First we claim M decides UCYLE. If there exists a cycle in $G$, it must start and end on some vertex $v_i$; step one of $M$ tries all such $v_i$'s and therefore must find the desired vertex. Next, suppose the cycle starts at $v_i$. Then there must exist a starting edge $(v_i,v_j)$ such that, if we follow the cycle, we come back to $v_i$ through a different edge $(v_k,v_i)$, so we accept in step three. Since the graph is undirected, we can always come back to $v_i$ through $(v_i,v_j)$, but $M$ does not accept in this case. By construction, neither does $M$ accept if we come upon some $v_k \in Neighbor(v_i) \setminus \{v_j\}$ when there is no edge from $v_k$ to $v_i$.
Now we show M runs in log space. First, if the vertices are labeled $1,\ldots,n$ where $|V| = n$, then it requires $\log(n)$ bits to specify each $v_i$. Next, note that $M$ only needs to keep track of the current $v_i$ and $v_j$, so $M$ uses $2\log(n) = O(\log n)$ space, which is in L.
My Problem
My problem is how to perform the DFS of step two in $\log(n)$ space. For example, in the worst case where each vertex has degree $n$, you'd have to keep a counter of which vertex you took on a particular path, which would require $n \log(n)$ space.

The state you maintain as you search is four vertices: (v_i, v_j, prev, current).
The next state is (v_i, v_j, current, v), where v is the next neighbour of current after prev (wrapping back to the first if prev is the numerically last neighbour of current).
You stop when current comes back to v_i; you have found a cycle if the edge you arrived on is not (v_j, v_i), i.e. if prev != v_j.
In pseudo-code, something like this:
for v_i in vertices
    for v_j in neighbours(v_i)
        current, prev = v_j, v_i
        repeat
            idx = neighbours(current).index(prev)
            idx = (idx + 1) % len(neighbours(current))
            current, prev = neighbours(current)[idx], current
        until current = v_i
        if prev != v_j
            return FOUND_A_CYCLE
return NO_CYCLES_EXIST
Intuitively, this says: for each point in a maze, and for each corridor from that point, follow the left-hand wall; if you get back to the start point through a different corridor than the one you left by, you've found a cycle.
While it's easy to see that this algorithm uses O(log n) space, some proof is necessary to show that it terminates.
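A minimal runnable sketch of this walk in Python (my own illustration, not from the original answer; the name `has_cycle` is hypothetical). Adjacency lists are assumed sorted, so "the next neighbour after prev" is well defined. The sketch stops when the walk arrives back at v_i itself and inspects the edge it came in on; stopping at any neighbour of v_i would misreport, e.g., a plain star graph as having a cycle.

```python
def has_cycle(adj):
    """adj[v] = sorted list of neighbours of v (undirected graph).

    For every start edge (vi, vj), follow the rotation walk: leave each
    vertex along the next neighbour after the one you arrived from.
    The walk permutes the 2|E| directed edges, so it returns to vi
    within 2|E| steps; arriving on an edge other than (vj, vi) means
    the walk closed a cycle.
    """
    limit = sum(len(a) for a in adj)  # 2|E|: each edge is in two lists
    for vi in range(len(adj)):
        for vj in adj[vi]:
            prev, cur = vi, vj
            for _ in range(limit + 1):
                if cur == vi:
                    break
                nbrs = adj[cur]
                idx = (nbrs.index(prev) + 1) % len(nbrs)
                prev, cur = cur, nbrs[idx]
            if cur == vi and prev != vj:
                return True
    return False
```

Note that only a constant number of vertices is stored at any time, matching the O(log n) space claim.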

Related

How to prove a linear algorithm that identifies all cycles and the length in a graph where each vertex has exactly one outgoing edge

Consider a directed graph on n vertices, where each vertex has exactly
one outgoing edge. This graph consists of a collection of cycles as
well as additional vertices that have paths to the cycles, which we
call the branches. Describe a linear time algorithm that identifies
all of the cycles and computes the length of each cycle. You can
assume that the input is given as an array A, where A[i] is the
neighbor of i, so that the graph has the edge (i, A[i]).
So far my approach is basically marking the vertices I have traversed, and every time a vertex points back to one I've already traversed, I count one cycle and move on to the next unvisited vertex. During the process, I also keep a hashmap or something to record the order in which each node is traversed, so I can calculate the length whenever I identify a cycle. (Would that be linear?) However, I'm very new to proofs and have no idea how to justify the correctness of an algorithm.
If you are allowed to use extra memory, the algorithm in Python would be like this.
colors = [0] * N  # 0 = not seen, 1 = on the path currently being followed, 2 = done
for i in range(N):
    if colors[i] != 0: continue  # already processed
    # follow the chain of outgoing edges, marking the current path
    v = i
    while colors[v] == 0:
        colors[v] = 1
        v = A[v]  # move to neighbor
    if colors[v] == 1:
        # we ran into our own path: v is the start of a new cycle
        cycle_len = 1
        u = A[v]  # move to neighbor
        while u != v:
            cycle_len += 1
            u = A[u]
        print("got a cycle with length =", cycle_len)
    # mark the whole path (including the cycle, if any) as done
    v = i
    while colors[v] == 1:
        colors[v] = 2
        v = A[v]
The basic idea is to use three colors: not seen, on the path currently being followed, and done. A new cycle is found exactly when the walk runs into its own current path; running into a node finished in an earlier iteration only means we have reached an already handled branch or cycle. Obviously, a single node can only belong to a single cycle.
The algorithm is linear because each node's outgoing edge is followed only a constant number of times: once while marking the path, at most once while measuring a cycle, and once while marking everything done. That is at most 3*N steps, which is still O(N).
Using a hashmap would not match the requirements, as hashmap operations are not O(1) in the worst case, so the overall algorithm would not be guaranteed linear.

Minimum Spanning tree different from another

Assume we are given
an undirected graph g where every node i, 1 <= i < n, is connected to every j, i < j <= n,
and a source s.
We want to find the total cost (defined as the sum of all edge weights) of the cheapest spanning tree that differs from the minimum distance tree of s (i.e. from the tree obtained by running Prim/Dijkstra from s) by at least one edge.
What would be the best way to tackle this? Because currently, I can only think of some kind of fixed-point iteration:
1. run dijkstra on (g,s) to obtain reference tree r that we need to differ from
2. costs := sum(edge_weights_of(r))
3. change := 0
4. for each vertex u in r, run a bfs and note, for each reached vertex v, the longest edge on the path from u to v
5. iterate through all edges e = (a,b) in g and find e' = (a',b') that is NOT in r and minimizes newchange := weight(e') - weight(longest_edge(a',b'))
6. if (first_time_here OR newchange < 0) then change += newchange
7. if (newchange < 0) goto 4
8. result := costs + change
That seems to waste a lot of time... It relies on the fact that adding an edge to a spanning tree creates a cycle from which we can remove the longest edge.
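That exchange step can be sketched in Python. This is my own illustration with hypothetical names (`second_best_spanning_cost` is not from the question): given the reference tree, each non-tree edge closes exactly one cycle when added, and dropping the heaviest tree edge on the path between its endpoints yields a different spanning tree.

```python
from collections import defaultdict

def second_best_spanning_cost(edges, tree_edges):
    """Cheapest spanning tree differing from tree_edges by >= 1 edge.

    For every non-tree edge (u, v, w), swapping it in and dropping the
    heaviest edge on the tree path u..v yields a spanning tree; the
    best such swap is the answer.  Edges are (u, v, weight) triples.
    """
    tree = defaultdict(list)
    in_tree = set()
    base = 0
    for u, v, w in tree_edges:
        tree[u].append((v, w))
        tree[v].append((u, w))
        in_tree.add(frozenset((u, v)))
        base += w

    def max_on_path(src):
        # heaviest edge weight on the tree path from src to every vertex
        best = {src: 0}
        stack = [src]
        while stack:
            x = stack.pop()
            for y, w in tree[x]:
                if y not in best:
                    best[y] = max(best[x], w)
                    stack.append(y)
        return best

    answer = None
    for u, v, w in edges:
        if frozenset((u, v)) in in_tree:
            continue
        cand = base + w - max_on_path(u)[v]
        if answer is None or cand < answer:
            answer = cand
    return answer
```

On the 4-cycle with edge weights 1, 2, 3, 4 and the weight-{1,2,3} tree as reference, the only possible swap trades the weight-4 edge for the weight-3 edge, giving total cost 7.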
I also thought about using Kruskal to get an overall minimum spanning tree and only using the above algorithm to replace a single edge when the trees from both Prim and Kruskal happen to be the same, but that doesn't seem to work, as the result would be highly dependent on the edges selected during a run of Kruskal.
Any suggestions/hints?
You can do it using Prim's algorithm.
Prim's algorithm:
let T be a single vertex x
while (T has fewer than n vertices)
{
    1. find the smallest edge connecting T to G-T
    2. add it to T
}
Now let's modify it.
Say you already have one minimum spanning tree, Tree(E,V).
Use this algorithm:
Prim's algorithm (modified):
let T be a single vertex
let isOther = false
while (T has fewer than n vertices)
{
    1. find the smallest edge (say e) connecting T to G-T
    2. if more than one such edge is found {
        check which of them is in E(Tree)
        choose one different from it
        add it to T
        set isOther = true
    }
    else if exactly one edge is found {
        add it to T
        if E(Tree) doesn't contain this edge, set isOther = true
        else don't touch isOther (keep its value)
    }
}
If isOther = true, you have found another tree, T, different from Tree(E,V).
Otherwise, the graph has a single minimum spanning tree.

Graph Theory - Length of Cycle UnDirected Graph - Adjacency Matrix

Study Review Question for comprehensive Exam for algorithms part.
Let G be an undirected graph with n vertices that contains exactly one cycle plus isolated vertices (i.e. no leaves). That means the degree of a vertex is 0 (isolated) if it is not in the cycle and 2 if it is part of the cycle. Assume that the graph is represented by an adjacency matrix. Describe an efficient algorithm that finds the length of the cycle.
I am looking for assistance on verifying my understanding, checking if it is correct and if the analysis is also correct.
My Answer (pseudo pythonic)
visited = [] // this will be a list of u,v pairs belonging to the cycle
for each u,v in G[][]:
    if G[u][v] == 1: // is an edge
        if G[u][v] in visited:
            return len(visited) // return the length of the cycle, since we've hit the beginning of the cycle
        else:
            visited.add((u,v))
English Based understanding
We know a cycle must exist by definition of the question, so the case where no cycle is found need not be accounted for.
For each pair of vertices, check if it is an edge.
If it is an edge, check if we've been there before. If we have, we've found the cycle, and return the number of visited edges (the size of the cycle).
If it is not a visited edge, add it to the visited list and continue until we find the source edge (grow the cycle by 1 until we hit the source).
My analysis, I think, may be off. Since we visit each (u,v) pair at least once and then check whether it is an edge, plus two comparisons per edge, I think it comes to O(|V|^2 + 2|E|):
the number of vertices squared (since we visit every pair in the matrix), plus two comparisons per edge.
Can someone please advise on efficiency and correctness? Also, maybe provide a more English-based explanation if there is a logical leap I made without acknowledging it?
Thanks for reading and thanks in advance for assistance.
Given the conditions in the question (that is, the only edges in the graph are part of the cycle), the length of the cycle is the number of edges in the graph, which is half the number of 1s in the adjacency matrix (each edge (i, j) appears twice in the adjacency matrix: A[i,j]=1 and A[j,i]=1).
The obvious algorithm therefore is to just sum the entries of the adjacency matrix and divide by 2. This is O(V^2) if there's V vertices.
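As a runnable sketch (my own illustration, hypothetical name), that counting approach is a one-liner over the matrix:

```python
def cycle_length_from_matrix(A):
    # each undirected edge (i, j) contributes two 1s: A[i][j] and A[j][i]
    return sum(map(sum, A)) // 2
```

For example, a triangle on three of four vertices (one isolated) gives 3.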
One thing that looks like it might help is, once you've found the first 1 in the adjacency matrix, follow edges until you've got back to the start:
Find i, j such that A[i, j] = 1.
start = i
cycle_length = 1
repeat
    find k != i with A[j, k] = 1
    i, j = j, k
    cycle_length++
until j = start
After this process terminates, cycle_length is the length of the cycle. This is still worst-case O(V^2) though, although if you can find a single vertex on the cycle quickly, it's O(V*C) where C is the length of the cycle.
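The edge-following pseudocode above might look like this as runnable Python (my own sketch, hypothetical name; it relies on the question's guarantee that every non-isolated vertex has degree exactly 2):

```python
def cycle_length_by_walk(A):
    n = len(A)
    # any vertex with an incident edge lies on the (unique) cycle
    start = next(i for i in range(n) if any(A[i]))
    i, j = start, next(k for k in range(n) if A[start][k])
    length = 1
    while j != start:
        # step to j's neighbour other than the vertex we came from
        k = next(m for m in range(n) if A[j][m] and m != i)
        i, j = j, k
        length += 1
    return length
```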
The code in your question doesn't work. You're iterating over (u, v) as indexes in the matrix, and it's impossible to find the same (u, v) twice.
Since there's exactly one cycle, a vertex is part of the cycle if it is connected to at least one other vertex. Since the graph is undirected, the following rule can be used:
if an edge between v1 and v2 exists, the edge also exists between v2 and v1. In other words, the search for the first cycle vertex only needs to scan the part of the matrix where v1 < v2, which reduces the number of matrix elements read, even in the worst case, by more than 50%. And since we're searching for a cycle, we can simply remember the node visited before the current one to ensure we don't walk back along the same edge, and stop when the current node equals the starting node.
//search for any node that is in the cycle
int firstNode = -1
for int i in 1 , len(graph)
    boolean notIsolated = false
    for int j in 0 , i - 1
        if graph[i][j] == 1
            notIsolated = true
            break
    if notIsolated
        firstNode = i
        break

int node_c = firstNode
int node_p = -1
int count = 0
do
    //find the neighbour that isn't the previous node
    //(here the full row must be scanned, since the next cycle
    //vertex may have a higher index than node_c)
    int n
    for n in 0 , len(graph) - 1
        if graph[node_c][n] == 1 AND n != node_p
            break
    //update the node variables for the next step
    node_p = node_c
    node_c = n
    ++count
while node_c != firstNode //continue until the algorithm reaches the start node
Apart from that, there isn't much left to optimize (at least I don't know of any way to further reduce the runtime).

Find a path in a complete graph with cost limit and max reward

I'm looking for an algorithm to solve this problem, which I have to implement (so I need a non-NP solution XD).
I have a complete graph with a cost on each arc and a reward on each vertex. I have only a start point; the end point doesn't matter, because the problem is to find a path that visits as many vertices as possible in order to collect the maximum possible reward, subject to a maximum cost limit (which is why the end position doesn't matter).
I think finding the optimum solution is an NP-hard problem, but an approximate solution is also appreciated :D
Thanks
I'm trying to study how to solve the problem with branch & bound...
update: complete problem description
I have a region in which there are several areas, each identified by an id and an x,y,z position. Each vertex identifies one of these areas. The maximum number of areas is 200.
From a start point S, I know the cost, specified in seconds and stored on the arc (so only integer values), to reach each vertex from every other vertex (a complete graph).
When I visit a vertex I get a reward (float values).
My objective is to find a path in the graph that maximizes the reward, subject to a cost constraint on the path: I have only limited time to complete the path (for example 600 seconds).
The graph is given as adjacency matrices for the costs and rewards (but if useful I can change the representation).
I can visit a vertex more than once, but I collect its reward only once!
Since you're interested in branch and bound, let's formulate a linear program. Use Floyd–Warshall to adjust the costs minimally downward so that cost(uw) ≤ cost(uv) + cost(vw) for all vertices u, v, w.
Let s be the starting vertex. We have 0-1 variables x(v) that indicate whether vertex v is part of the path and 0-1 variables y(uv) that indicate whether the arc uv is part of the path. We seek to maximize
sum over all vertices v of reward(v) x(v).
The constraints unfortunately are rather complicated. We first relate the x and y variables.
for all vertices v ≠ s, x(v) - sum over all vertices u of y(uv) = 0
Then we bound the cost.
sum over all arcs uv of cost(uv) y(uv) ≤ budget
We have (pre)flow constraints to ensure that the arcs chosen look like a path possibly accompanied by cycles (we'll handle the cycles shortly).
for all vertices v,
    sum over all vertices u of y(uv) - sum over all vertices w of y(vw)
        ≥ -1 if v = s
        ≥  0 if v ≠ s
To handle the cycles, we add cut covering constraints.
for all subsets of vertices T such that s is not in T,
for all vertices t in T,
x(t) - sum over all vertices u not in T and v in T of y(uv) ≥ 0
Because of the preflow constraints, a cycle necessarily is disconnected from the path structure.
There are exponentially many cut covering constraints, so when solving the LP, we have to generate them on demand. This means finding the minimum cut between s and each other vertex t, then verifying that the capacity of the cut is no greater than x(t). If we find a violation, then we add the constraint and use the dual simplex method to find the new optimum (repeat as necessary).
I'm going to pass on describing the branching machinery – this should be taken care of by your LP solver anyway.
Finding the optimal solution
Here is a recursive approach to solving your problem.
Let's begin with some definitions:
Let A = (A_i), 1 ≤ i ≤ N, be the areas.
Let w_i,j = w_j,i be the time cost for traveling from A_i to A_j and vice versa.
Let r_i be the reward for visiting area A_i.
Here is the recursive procedure that will output the exact requested solution : (pseudo-code)
List<Area> GetBestPath(int time_limit, Area S, int *rwd) {
    int best_reward(0), possible_reward(0), best_fit(0);
    List<Area> possible_path[N] = {[]};
    if (time_limit < 0) {
        return [];
    }
    if (!S.visited) {
        *rwd += S.reward;
        S.visit();
    }
    for (int i = 0; i < N; ++i) {
        if (S.index != i) {
            possible_path[i] = GetBestPath(time_limit - W[S.index][i], A[i], &possible_reward);
            if (possible_reward > best_reward) {
                best_reward = possible_reward;
                best_fit = i;
            }
        }
    }
    *rwd += best_reward;
    possible_path[best_fit].push_front(S);
    return possible_path[best_fit];
}
For obvious clarity reasons, I assumed the A_i, as well as the w_i,j, to be globally accessible.
Explanations
You start at S. First thing you do? Collect the reward and mark the node as visited. Then you have to check which way to go is best among S's N-1 neighbours (let's call them N_S,i for 1 ≤ i ≤ N-1).
This is exactly the same problem, solved for N_S,i with a time limit of:
time_limit - W(S ↔ N_S,i)
And since you mark the visited nodes, when arriving at an area, you first check if it is marked. If so you have no reward ... Else you collect and mark it as visited ...
And so forth !
The ending condition is when time_limit (C) becomes negative: we have reached the limit and cannot make further moves, so the recursion ends. The final path may contain useless journeys if all the rewards have been collected before the time limit C is reached; you'll have to "prune" the output list.
Complexity ?
Oh this solution is soooooooo awful in terms of complexity !
Each call leads to N-1 further calls, until the time limit is reached. The longest possible call sequence is obtained by going back and forth along the shortest edge; let w_min be the weight of this edge.
Then, obviously, the overall complexity is bounded by N^(C/w_min).
This is huuuuuge.
Another approach
Maintain a hash table of all the visited nodes.
On the other side, maintain a max-priority queue (e.g. using a max-heap) of the nodes that have not been collected yet (the top of the heap is the node with the highest reward). The priority value for each node A_i in the queue is the pair (r_i, E[w_i,j]).
1. Pop the heap: Target <- heap.pop().
2. Compute the shortest path to this node using Dijkstra's algorithm.
3. Check the path: if its cost is too high, the node is not reachable within the budget; add it to the unreachable nodes list.
4. Otherwise, collect all the uncollected nodes you find along the path and...
5. Remove each collected node from the heap.
6. Set Target as the new starting point.
7. In either case, go back to step 1 until the heap is empty.
Note : A hash table is the best suited to keep track of the collected node. This way, we can check a node in a path computed using Dijkstra in O(1).
Likewise, maintaining a hashtable leading to the position of each node in the heap might be useful to optimise the "pruning" of the heap, when collecting the nodes along a path.
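A runnable sketch of this greedy loop (my own illustration; `greedy_collect`, the cost matrix and reward list are hypothetical names, and unreachable targets are simply skipped rather than kept in a separate list):

```python
import heapq

def greedy_collect(cost, reward, start, budget):
    """Greedy heuristic: repeatedly aim for the highest-reward
    uncollected vertex, walk the cheapest path to it (Dijkstra),
    and collect every vertex along the way.  Not optimal."""
    n = len(cost)

    def dijkstra(src):
        dist = [float("inf")] * n
        prev = [None] * n
        dist[src] = 0
        pq = [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue  # stale queue entry
            for v in range(n):
                if v != u and d + cost[u][v] < dist[v]:
                    dist[v] = d + cost[u][v]
                    prev[v] = u
                    heapq.heappush(pq, (dist[v], v))
        return dist, prev

    collected = {start}
    total = reward[start]
    pos, left = start, budget
    targets = [(-reward[v], v) for v in range(n) if v != start]
    heapq.heapify(targets)
    while targets:
        _, t = heapq.heappop(targets)
        if t in collected:
            continue  # lazily "removed from the heap"
        dist, prev = dijkstra(pos)
        if dist[t] > left:
            continue  # not reachable within the remaining budget
        # walk back along the shortest path, collecting rewards
        path, v = [], t
        while v is not None:
            path.append(v)
            v = prev[v]
        for v in path:
            if v not in collected:
                collected.add(v)
                total += reward[v]
        left -= dist[t]
        pos = t
    return total
```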
A little analysis
This approach is slightly better than the first one in terms of complexity, but may not lead to the optimal result. In fact, it can even perform quite poorly on some graph configurations. For example, suppose all nodes have reward r, except one node T with reward r+1, and W(N ↔ T) = C for every node N, while all the other nodes are cheaply reachable from one another. Then this heuristic will only collect T and miss every other node. In this particular case, the best solution would have been to ignore T and collect everyone else, for a reward of (N-1)·r instead of only r+1.

Number of paths between two nodes in a DAG

I want to find the number of paths between two nodes in a DAG. O(V^2) and O(V+E) are both acceptable.
O(V+E) suggests somehow using BFS or DFS, but I don't know how.
Can somebody help?
Do a topological sort of the DAG, then scan the vertices from the target backwards to the source. For each vertex v, keep a count of the number of paths from v to the target. When you get to the source, the value of that count is the answer. That is O(V+E).
The number of distinct paths from node u to v is the sum of distinct paths from nodes x to v, where x is a direct descendant of u.
Store the number of paths to target node v for each node (initially set to 0), then go from v (where the value is 1) using the opposite edge orientation and recompute this value for each node (summing the values of all direct descendants) until you reach u.
If you process the nodes in topological order (again opposite orientation) you are guaranteed that all direct descendants are already computed when you visit given node.
Hope it helps.
This question has been asked elsewhere on SO, but nowhere has the simpler solution of using DFS + DP been mentioned; all solutions seem to use topological sorting. The simpler solution goes like this (paths from s to t):
1. Add a field to the vertex representation to hold an integer count. Initially, set vertex t's count to 1 and all other vertices' counts to 0.
2. Start running DFS with s as the start vertex.
3. When t is discovered, immediately mark it as finished (BLACK), without further processing starting from it.
4. Subsequently, each time DFS finishes a vertex v, set v's count to the sum of the counts of all vertices adjacent to v.
5. When DFS finishes vertex s, stop and return the count computed for s.
The time complexity of this solution is O(V+E).
Pseudo-code:
simple_path(s, t)
    if (s == t)
        return 1
    else if (path_count[s] != NULL)
        return path_count[s]
    else
        path_count[s] = 0
        for each node w ∈ adj[s]
            path_count[s] = path_count[s] + simple_path(w, t)
        end
        return path_count[s]
end
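The same memoised-DFS idea fits in a few lines of Python (my own sketch, hypothetical name, taking the DAG as a dict from vertex to successor list):

```python
from functools import lru_cache

def count_paths(adj, s, t):
    @lru_cache(maxsize=None)   # path_count memo: one entry per vertex
    def paths(u):
        if u == t:
            return 1           # the empty path from t to itself
        return sum(paths(w) for w in adj[u])
    return paths(s)
```

On the diamond 0→{1,2}, 1→3, 2→3, this returns 2.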
