Study review question for a comprehensive exam, algorithms section.
Let G be an undirected graph with n vertices that contains exactly one cycle plus isolated vertices (i.e. no leaves). That means the degree of a vertex is 0 (isolated) if it is not in the cycle and 2 if it is part of the cycle. Assume that the graph is represented by an adjacency matrix. Describe an efficient algorithm that finds the length of the cycle.
I am looking for help verifying my understanding: is it correct, and is the analysis also correct?
My answer (pseudo-Pythonic):
visited = [] // this will be a list of (u,v) pairs belonging to the cycle
for each u,v in G[][]:
    if G[u][v] == 1: // is an edge
        if (u,v) in visited:
            return len(visited) // return the length of the cycle, since we've hit the beginning of the cycle
        else:
            visited.add((u,v))
English-based understanding
We know a cycle must exist by definition of the question, so the case where no cycle is found need not be accounted for.
For each pair of vertices, check if it is an edge.
If it is an edge, check if we've been there before. If we have, we've found the cycle and return the number of all visited edges (the size of the cycle).
If it is not a visited edge, add it to the visited list and continue until we find the source edge (grow the cycle by 1 until we hit the source).
My analysis of it I think may be off. We visit each (u,v) pair at least once, check if it is an edge, and do 2 comparisons per edge, so I think it comes to O(|V|^2 + 2|E|): the number of vertices squared (since we visit every pair in the matrix), plus 2 comparisons per edge.
Can someone please advise on efficiency and correctness? Also, please point out any logical leap I may have made without realizing it.
Thanks for reading and thanks in advance for assistance.
Given the conditions in the question (that is, the only edges in the graph are part of the cycle), the length of the cycle is the number of edges in the graph, which is half the number of 1s in the adjacency matrix (each edge (i, j) appears twice in the adjacency matrix: A[i,j] = 1 and A[j,i] = 1).
The obvious algorithm therefore is to just sum the entries of the adjacency matrix and divide by 2. This is O(V^2) if there's V vertices.
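Under the question's guarantee that every edge lies on the unique cycle, this counting approach is a few lines of Python (a sketch; `adj` is assumed to be the n x n 0/1 adjacency matrix as a list of lists):

```python
def cycle_length(adj):
    # every edge (i, j) contributes two 1s to the matrix, A[i][j] and A[j][i],
    # so the number of edges (and hence the cycle length) is half the total
    return sum(map(sum, adj)) // 2

# 4-cycle 0-1-2-3-0 plus an isolated vertex 4
adj = [
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(cycle_length(adj))  # 4
```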
One improvement that looks like it might help: once you've found the first 1 in the adjacency matrix, follow edges until you get back to the start:
Find i, j such that A[i, j] = 1.
start = i
cycle_length = 1
repeat
    find k != i with A[j, k] = 1
    i, j = j, k
    cycle_length++
until j = start
After this process terminates, cycle_length is the length of the cycle. This is still worst-case O(V^2), although if you can find a single vertex on the cycle quickly, it's O(V*C) where C is the length of the cycle.
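A Python sketch of this edge-following approach (function and variable names are mine; it assumes the graph really does contain a cycle, as the question promises, otherwise the walk would fail):

```python
def cycle_length_walk(adj):
    n = len(adj)
    # find any vertex with an incident edge: O(V^2) worst case
    start = next(i for i in range(n) if any(adj[i]))
    i, j = start, adj[start].index(1)  # first edge of the walk
    length = 1
    while j != start:
        # the next cycle vertex is j's neighbour other than the one we came from
        k = next(v for v in range(n) if adj[j][v] == 1 and v != i)
        i, j = j, k
        length += 1
    return length

# triangle 0-1-2 plus an isolated vertex 3
adj = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
print(cycle_length_walk(adj))  # 3
```

Each step of the walk scans one row of the matrix, so once the first edge is located the walk itself costs O(V*C).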
The code in your question doesn't work. You're iterating over (u, v) as indexes in the matrix, and it's impossible to find the same (u, v) twice.
Since there's exactly one cycle, a vertex is part of the cycle if it is connected to at least one other vertex. Since the graph is undirected, the following rule can be used:
if an edge exists between v1 and v2, it also exists between v2 and v1; in other words, the search for a cycle vertex only needs to scan the part of the matrix where v1 < v2, which cuts the number of matrix elements read by more than half even in the worst case. And since we're tracing a cycle, we can simply remember the node visited before the previous node to ensure we don't visit it again, and stop when the current node equals the starting node.
// search for any node that is in the cycle
int firstNode = -1
for int i in 1 , len(graph)
    boolean notIsolated = false
    for int j in 0 , i - 1    // the lower triangle is enough to detect a neighbour
        if graph[i][j] == 1
            notIsolated = true
            break
    if notIsolated
        firstNode = i
        break

int node_c = firstNode
int node_p = -1
int count = 0
do
    // search the neighbour that isn't the previous node; here we must scan
    // the full row, since the next cycle vertex may have a higher index
    int n
    for n in 0 , len(graph) - 1
        if graph[node_c][n] == 1 AND n != node_p
            break
    // update the node vars for the next step
    node_p = node_c
    node_c = n
    ++count
while node_c != firstNode // continue until the algorithm reaches the start node
Apart from that, there isn't much left to optimize (at least I don't know of any way to further improve the runtime).
Related
Consider a directed graph on n vertices, where each vertex has exactly
one outgoing edge. This graph consists of a collection of cycles as
well as additional vertices that have paths to the cycles, which we
call the branches. Describe a linear time algorithm that identifies
all of the cycles and computes the length of each cycle. You can
assume that the input is given as an array A, where A[i] is the
neighbor of i, so that the graph has the edge (i, A[i]).
So far my approach is basically to mark the vertices as I traverse them, and every time a vertex points back to one I've already traversed, I count one cycle and move on to the next unvisited vertex. During the process I also keep a hashmap or something similar to record the order in which each node is traversed, so I can calculate the length whenever I identify a cycle. (Would that be linear?) However, I am very new to proofs and I have no idea how to justify the correctness of an algorithm.
If you are allowed to use extra memory, the algorithm in Python would be like this.
colors = [0] * N  # 0 = not seen, 1 = on the current walk, 2 = fully processed
for i in range(N):
    if colors[i] != 0: continue  # already seen
    v = i  # current vertex
    while colors[v] == 0:
        colors[v] = 1  # mark as part of the current walk
        v = A[v]       # move to neighbor
    if colors[v] == 1:
        # the current walk ran into itself: v is the start node of a new cycle
        start = v
        cycle_len = 1
        v = A[v]  # move to neighbor
        while v != start:
            cycle_len += 1
            v = A[v]
        print("got a cycle with length =", cycle_len)
    # finalize the walk so later walks don't mistake it for a new cycle
    v = i
    while colors[v] == 1:
        colors[v] = 2
        v = A[v]
The basic idea is to use three colors to mark nodes: not yet seen, part of the walk currently being explored, and fully processed. A new cycle is found exactly when the current walk runs into itself; if a walk instead runs into a previously processed node, it merely merges into an already-counted structure. Obviously, a single node can only belong to a single cycle.
The algorithm is linear because each node's color only ever advances (0 to 1 to 2), so the inner while loops do O(1) work per node a constant number of times, roughly 2*N steps in total, which is still O(N).
Using a hashmap would not match the requirements, as hashmap operations are not worst-case constant time, so the overall bound would not be guaranteed linear.
I am looking for an algorithm that finds a minimal subset of vertices such that removing this subset (and the edges incident to these vertices) from the graph leaves all other vertices unconnected (i.e. the graph won't have any edges).
Is there such an algorithm?
If not: could you recommend some kind of heuristic to choose the vertices?
I have only a basic knowledge of graph theory, so please excuse any incorrectness.
IIUC, this is the classic Minimum Vertex Cover problem, which is, unfortunately, NP-complete.
Fortunately, the most intuitive and greedy possible algorithm is as good as it gets in this case.
A simple greedy algorithm (take both endpoints of a maximal matching) is a 2-approximation for vertex cover, which in theory, under the Unique Games Conjecture, is as good as it gets. In practice, solving a formulation of vertex cover as an integer program will most likely yield much better results. The program is
min sum_{v in V} x(v)
s.t.
forall {u, v} in E, x(u) + x(v) >= 1
forall v in V, x(v) in {0, 1}.
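For completeness, here is the matching-based 2-approximation as a small Python sketch (the edge-list input format and the names are my choice, not from the question):

```python
def vertex_cover_2approx(edges):
    # Greedily build a maximal matching and take BOTH endpoints of every
    # matched edge; the result covers all edges and is at most twice optimal,
    # since any cover must contain at least one endpoint of each matched edge.
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:  # edge not yet covered
            cover.update((u, v))
    return cover

# path 0-1-2-3: an optimal cover is {1, 2}; the approximation returns at most 4 vertices
edges = [(0, 1), (1, 2), (2, 3)]
cover = vertex_cover_2approx(edges)
print(sorted(cover))  # [0, 1, 2, 3]
```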
Try this way:
Define a variable to count the number of vertices, starting at 0;
Create a max-heap of vertices sorted by the length of each vertex's adjacency list;
Remove all edges from the first vertex of the heap (the one with the largest number of edges) and remove it from the heap, adding 1 to the count;
Reorder the heap now that the vertex degrees have changed, repeating the previous step until the length of the adjacency list of the first vertex is 0.
Heap Q
int count = 0
while (1) {
    Q = Create_Heap(G)
    Vertex first = Q.pop
    if (first.adjacent.size() == 0) {
        break
    }
    for (Vertex v : first.adjacent) {
        RemoveEdge(first, v)
        RemoveEdge(v, first) /* depends on the implementation */
    }
    count = count + 1
}
return count
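The same greedy idea in Python, rebuilding the degree information on every round instead of maintaining a heap (a simple sketch with my own naming; fine for small graphs, though slower than a proper heap):

```python
def greedy_cover_size(adj):
    # adj: dict mapping each vertex to the set of its neighbours (mutated here)
    count = 0
    while True:
        v = max(adj, key=lambda u: len(adj[u]))  # vertex with most remaining edges
        if not adj[v]:
            return count  # no edges left anywhere: done
        for u in adj[v]:
            adj[u].discard(v)  # delete v's edges from the other endpoints too
        adj[v] = set()
        count += 1

# star graph: centre 0 connected to 1..4, so removing vertex 0 removes all edges
adj = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(greedy_cover_size(adj))  # 1
```

Note that, like the pseudocode above, this is a heuristic: it does not always find a minimum cover.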
A slightly more theoretical question, but here it is nonetheless.
Setting
Let:
UCYCLE = { <G> : G is an undirected graph that contains a simple cycle }.
My Solution
We show UCYCLE is in L by constructing an algorithm M that decides UCYCLE using log space.
M = "On input <G> where G = (V,E):
For each v_i in V, for each v_j in Neighbor(v_i), store the current v_i and v_j
Traverse the edge (v_i,v_j) and then follow all possible paths through G using DFS.
If we encounter v_k in Neighbor(v_i) \ {v_j} such that there is an edge (v_i,v_k) in E, then ACCEPT. Else REJECT."
First we claim M decides UCYCLE. If there exists a cycle in G, then it must start and end on some vertex v_i; step one of M tries all such v_i's and therefore must find the desired vertex. Next, suppose the cycle starts at v_i; then there must exist a starting edge (v_i,v_j) so that if we follow the cycle, we come back to v_i through a different edge (v_k,v_i), so we accept in step three. Since the graph is undirected, we can always come back to v_i through (v_i,v_j), but M does not accept this case. By construction, neither does M accept if we come upon some v_k in Neighbor(v_i) \ {v_j} but there is no edge from v_k to v_i.
Now we show M runs in log space. First, if the vertices are labeled 1,...,n where |V| = n, then it requires log(n) bits to specify each v_i. Next, note that M only needs to keep track of the current v_i and v_j, so M uses 2 log(n) = O(log n) space, which is in L.
My Problem
My problem is how to perform DFS on the graph in log(n) space. For example, in the worst case where each vertex has degree n, you'd have to keep track of which edge you took at each vertex along a particular path, which would require n log(n) space.
The state you maintain as you search is four vertices: (v_i, v_j, prev, current).
The next state is: (v_i, v_j, current, v) where v is the next neighbour of current after prev (wrapping back to the first if prev is the numerically last neighbour of current).
You stop when current is a neighbour of v_i, and you have found a cycle if current is not v_j (if it is v_j, you got back the way you came).
In pseudo-code, something like this:
for v_i in vertices
    for v_j in neighbours(v_i)
        current, prev = v_j, v_i
        repeat
            idx = neighbours(current).index(prev)
            idx = (idx + 1) % len(neighbours(current))
            current, prev = neighbours(current)[idx], current
        until current adjacent to v_i
        if current != v_j
            return FOUND_A_CYCLE
return NO_CYCLES_EXIST
Intuitively, this is saying: for each point in a maze, and for each corridor from that point, follow the left-hand wall; if, when you can see the start point again, it's not through the original corridor, then you've found a cycle.
While it's easy to see that this algorithm uses O(log n) space, some proof is necessary to show that it terminates.
I'm looking for an algorithm to solve this problem. I have to implement it (so I need a tractable, non-NP solution XD).
I have a complete graph with a cost on each arc and a reward on each vertex. I have only a start point; the end point doesn't matter, because the goal is to visit as many vertices as possible in order to collect the maximum possible reward, subject to a maximum cost limit (this is why the end position doesn't matter).
I think finding the optimum solution is NP-hard, but an approximate solution is also appreciated :D
Thanks
I'm trying to study how to solve the problem with branch & bound...
Update: complete problem description
I have a region in which there are several areas, each identified by its id and x,y,z position. Each vertex identifies one of these areas. The maximum number of areas is 200.
From a start point S, I know the cost, specified in seconds and stored on the arc (so only integer values), to reach each vertex from each other vertex (a complete graph).
When I visit a vertex I get a reward (float values).
My objective is to find a path in the graph that maximizes the reward, but I'm subject to a cost constraint on the path. Indeed, I have only limited time to complete the path (for example 600 seconds).
The graph is represented as adjacency matrices for the cost and the reward (but if useful I can change the representation).
I can visit a vertex more than once, but I collect its reward only once!
Since you're interested in branch and bound, let's formulate a linear program. Use Floyd–Warshall to adjust the costs minimally downward so that cost(uw) ≤ cost(uv) + cost(vw) for all vertices u, v, w.
Let s be the starting vertex. We have 0-1 variables x(v) that indicate whether vertex v is part of the path and 0-1 variables y(uv) that indicate whether the arc uv is part of the path. We seek to maximize
sum over all vertices v of reward(v) x(v).
The constraints unfortunately are rather complicated. We first relate the x and y variables.
for all vertices v ≠ s, x(v) - sum over all vertices u of y(uv) = 0
Then we bound the cost.
sum over all arcs uv of cost(uv) y(uv) ≤ budget
We have (pre)flow constraints to ensure that the arcs chosen look like a path possibly accompanied by cycles (we'll handle the cycles shortly).
for all vertices v,
    sum over all vertices u of y(uv) - sum over all vertices w of y(vw)
        ≥ -1 if v = s
        ≥ 0 if v ≠ s
To handle the cycles, we add cut covering constraints.
for all subsets of vertices T such that s is not in T,
for all vertices t in T,
x(t) - sum over all vertices u not in T and v in T of y(uv) ≥ 0
Because of the preflow constraints, a cycle necessarily is disconnected from the path structure.
There are exponentially many cut covering constraints, so when solving the LP, we have to generate them on demand. This means finding the minimum cut between s and each other vertex t, then verifying that the capacity of the cut is no greater than x(t). If we find a violation, then we add the constraint and use the dual simplex method to find the new optimum (repeat as necessary).
I'm going to pass on describing the branching machinery – this should be taken care of by your LP solver anyway.
Finding the optimal solution
Here is a recursive approach to solving your problem.
Let's begin with some definitions:
Let A = (A_i), 1 ≤ i ≤ N, be the areas.
Let w_i,j = w_j,i be the time cost for traveling from A_i to A_j and vice versa.
Let r_i be the reward for visiting area A_i.
Here is the recursive procedure that will output the exact requested solution (pseudo-code):
List<Area> GetBestPath(int time_limit, Area S, int *rwd) {
    if (time_limit < 0) {
        return [];
    }
    bool collected = false;
    if (!S.visited) {
        *rwd += S.reward;
        S.visit();
        collected = true;
    }
    int best_reward(0), best_fit(-1);
    List<Area> possible_path[N] = {[]};
    for (int i = 0; i < N; ++i) {
        if (S.index != i) {
            int possible_reward(0); // reset for every candidate branch
            possible_path[i] = GetBestPath(time_limit - W[S.index][i], A[i], &possible_reward);
            if (possible_reward > best_reward) {
                best_reward = possible_reward;
                best_fit = i;
            }
        }
    }
    if (collected) {
        S.unvisit(); // backtrack, so sibling branches see a clean state
    }
    *rwd += best_reward;
    if (best_fit >= 0) {
        possible_path[best_fit].push_front(S);
        return possible_path[best_fit];
    }
    return [S];
}
For obvious clarity reasons, I assumed the A_i to be globally reachable, as well as the w_i,j.
Explanations
You start at S. First thing you do? Collect the reward and mark the node as visited. Then you have to check which way to go is best among S's N-1 neighbors (let's call them N_S,i for 1 ≤ i ≤ N-1).
This is exactly the same as solving the problem for N_S,i with a time limit of:
time_limit - W(S ↔ N_S,i)
And since you mark the visited nodes, when arriving at an area you first check whether it is marked. If so, you get no reward... Else you collect the reward and mark the area as visited...
And so forth!
The ending condition is when time_limit (C) becomes negative. This tells us we reached the limit and cannot make any further moves: the recursion ends. The final path may contain useless journeys if all the rewards have been collected before the time limit C is reached. You'll have to "prune" the output list.
Complexity?
Oh, this solution is soooooooo awful in terms of complexity!
Each call leads to N-1 calls... until the time limit is reached. The longest possible call sequence comes from going back and forth on the cheapest edge. Let w_min be the weight of this edge.
Then obviously, the overall complexity is bounded by N^(C/w_min).
This is huuuuuge.
Another approach
Maintain a hash table of all the visited nodes.
On the other side, maintain a max-priority queue (e.g. using a max-heap) of the nodes that have not been collected yet (the top of the heap is the node with the highest reward). The priority value for each node A_i in the queue is the pair (r_i, E[w_i,j]).
Pop the heap: Target <- heap.pop().
Compute the shortest path to this node using Dijkstra's algorithm.
Check the path: if the cost of the path is too high, then the node is not reachable; add it to the unreachable-nodes list.
Else collect all the uncollected nodes that you find along it and...
Remove each collected node from the heap.
Set Target as the new starting point.
In either case, proceed to step 1 until the heap is empty.
Note: a hash table is best suited for keeping track of the collected nodes. This way, we can check a node on a path computed with Dijkstra in O(1).
Likewise, maintaining a hashtable mapping each node to its position in the heap might be useful to optimize the "pruning" of the heap when collecting the nodes along a path.
A little analysis
This approach is slightly better than the first one in terms of complexity, but may not lead to the optimal result. In fact, it can even perform quite poorly on some graph configurations. For example, if all nodes have a reward r except one node T that has r+1, and W(N ↔ T) = C for every node N while all the other edges are cheap, then this heuristic will only collect T and miss every other node. In this particular case, the best solution would have been to ignore T and collect everyone else, leading to a reward of (N-1)·r instead of only r+1.
I have an adjacency matrix for an ordered graph, and I need to find the vertex to which all others have an edge (in its row there are all 1s except on the diagonal):
If this is adjacency matrix:
0 0 0
0 0 0
1 1 0
the algorithm should yield vertex 3.
Suppose that there is at least one such vertex.
A solution in O(N^2) (N being the number of vertices) is trivial, but how can this be done in O(N)?
Preconditions:
the graph is an ordered graph
there is one vertex that all other vertices have an in-edge to
Since the edges induce a total ordering, the vertex to be found is the "smallest" vertex: it has no out-edges, because any out-edge would go to one of the vertices that already have an edge to it, creating a cycle, which is not allowed in an ordered graph.
Also, the graph needs to be connected, therefore all paths lead to the smallest vertex, which brings us to this algorithm:
start with the set of all rows as possible candidates
choose one vertex from the set and iterate over possible edges to the remaining candidates.
If there is an edge to the candidate, remove the candidate from step 2 from the list and continue at 2 with the new candidate.
If there is no edge to that vertex remove the target candidate from the candidate set and continue with the next possible edge.
if no candidate is left, the current vertex is the one you were looking for
Since each step can be carried out in O(1) and in each step the set of remaining candidates is reduced, the running time should be O(N).
An ordered graph is a graph with a total order over its vertices. There is no other requirement, in particular, it does not restrict where the edges can go. So the answer to the question is that you sometimes cannot do better than O(N^2): Consider an adjacency matrix where one row has all non-diagonal entries equal to one and all other rows have exactly one zero non-diagonal entry. Unless you are extremely lucky, you need to go through nearly the whole adjacency matrix to find out which row has no non-diagonal zero.
So I assume that you mean a directed graph admitting a topological ordering. That is, a directed acyclic graph (DAG). In that case, Sebastian already answered it, but since the answer is not accepted, let me try to explain it hopefully more clearly.
If a vertex in a DAG has incoming edges from every other vertex, then it has no outgoing edges, since these would form a cycle of length 2. In other words, its corresponding column has only zeros. Such a vertex is called a universal sink and there is a well-known O(N) algorithm for finding it.
General algorithm:
candidates := {0, 1, ..., N-1}
while |candidates| > 1 do:
    arbitrarily select candidates u and v
    if (u,v) is an edge, remove u from candidates, else remove v from candidates
test if the last remaining candidate is a universal sink
If you know that your graph has a universal sink, then the last step is unnecessary.
The number of iterations of the while loop is N-1, because every iteration removes one vertex from the set of candidates. The algorithm is correct, because it removes only vertices that cannot be universal sinks - either the removed vertex has an outgoing edge or it does not have an incoming edge from some vertex.
In the code below, notice that we do not hold the list of candidates explicitly. The list of candidates in step i of the for loop is {candidate, i, i+1, ..., N-1}, and the selected candidates u and v are candidate and i.
// step 2
int candidate = 0;
for (int i = 1; i < N; i++)
{
    if (edge[candidate][i] == 1)
        candidate = i;
}

// step 3
bool no_sink = false;
for (int i = 0; i < N; i++)
{
    if (candidate != i && (edge[candidate][i] == 1 || edge[i][candidate] == 0))
        no_sink = true;
}

if (no_sink)
    printf("No universal sink.");
else
    printf("The universal sink is %d.", candidate);
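The same elimination scheme in Python, assuming the convention that edge[u][v] = 1 means an edge from u to v (so a universal sink has an all-zero row and, apart from the diagonal, an all-one column):

```python
def universal_sink(edge):
    n = len(edge)
    # elimination pass: an outgoing edge disqualifies the current candidate
    candidate = 0
    for i in range(1, n):
        if edge[candidate][i] == 1:
            candidate = i
    # verification pass: no outgoing edges, incoming edges from everyone else
    for i in range(n):
        if i != candidate and (edge[candidate][i] == 1 or edge[i][candidate] == 0):
            return None  # no universal sink exists
    return candidate

# vertices 0 and 1 both point to vertex 2, which has no outgoing edges
edge = [[0, 0, 1],
        [0, 0, 1],
        [0, 0, 0]]
print(universal_sink(edge))  # 2
```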