We consider only graphs that are undirected. The diameter of a graph is the maximum, over all choices of vertices s and t, of the shortest-path distance between s and t. (Recall that the shortest-path distance between s and t is the minimum number of edges in an s-t path.) Next, for a vertex s, let l(s) denote the maximum, over all vertices t, of the shortest-path distance between s and t. The radius of a graph is the minimum of l(s) over all choices of the vertex s.
For a graph with radius r and diameter d, which of the following always holds? Choose the best answer.
1) r >= d/2
2) r <= d
We know that (1) and (2) both always hold; this is stated in any reference book.
My challenge is that this problem appeared on an entrance exam where only one of (1) or (2) could be chosen. The question says to choose the best answer, and the answer sheet released after the exam gave (1) as the best choice. How can I verify why (1) is better than (2)?
They are both indeed true.
Don't let an exam with ambiguous questions weaken your concepts.
Well as for the Proof:
First of all, the 2nd inequality is quite trivial: d is the maximum of l(s) over all vertices s, and r is the minimum, so r <= d follows directly from the definitions.
Now the 1st one,
d <= 2*r
Let z be a central vertex, so that
e(z) = r [e(z) denotes the eccentricity of z, i.e. the maximum distance from z to any vertex]
Now, let the diameter be attained by vertices x and y:
diameter = d(x, y) [d(x, y) denotes the distance between vertices x and y]
By the triangle inequality,
d(x, y) <= d(x, z) + d(z, y) = d(z, x) + d(z, y) <= e(z) + e(z) [this is an upper bound since e(z) >= d(z, u) for all u]
diameter <= 2*r
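As a sanity check, both inequalities can be verified numerically with BFS; a minimal sketch (the path graph below is just an illustrative example, not from the question):

```python
from collections import deque

def bfs_dist(adj, s):
    """Shortest-path distances (edge counts) from s, by BFS."""
    dist = {s: 0}
    q = deque([s])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def radius_and_diameter(adj):
    # l(s) from the definition above: the eccentricity of s
    ecc = {s: max(bfs_dist(adj, s).values()) for s in adj}
    return min(ecc.values()), max(ecc.values())

# Path graph 0-1-2-3-4: the center is 2, so r = 2 and d = 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
r, d = radius_and_diameter(path)
print(r, d)              # 2 4
assert r <= d <= 2 * r   # both inequalities hold
```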
They both hold.
2) should be clear.
1) holds using the triangle inequality. We can use this property because distances on graphs form a metric (http://en.wikipedia.org/wiki/Metric_%28mathematics%29). Let x and z be vertices with d(x, z) = diameter(G), and let y be a center of G (i.e. a vertex with max over v of d(y, v) equal to radius(G)). Then d(x, y) <= radius(G) and d(y, z) <= radius(G), so diameter(G) = d(x, z) <= d(x, y) + d(y, z) <= 2*radius(G).
The OP defined the shortest-path distance between s and t as "the fewest number of edges in an s-t path". This makes things simpler.
We may write the definitions in terms of some pseudocode:
def dist(s, t):
    return min([len(path) - 1 for path in paths(s, t)])  # paths(s, t): all paths that start with s and end with t
r = min([max([dist(s, t) for t in V]) for s in V])
d = max([max([dist(s, t) for t in V]) for s in V])
where V is the set of all vertices.
Now (2) is obviously true. The definition itself tells this: max always >= min.
(1) is slightly less obvious. It requires at least a few steps to prove.
Suppose d = dist(A, B) and r = dist(C, D), where C attains the minimum in the definition of r and D is a vertex farthest from C. We have
dist(C, A) + dist(C, B) >= dist(A, B),
otherwise the length of path A-C-B would be smaller than dist(A, B).
Since dist(C, D) = max over t of dist(C, t), we know that
dist(C, D) >= dist(C, A)
dist(C, D) >= dist(C, B)
Hence 2 * dist(C, D) >= dist(A, B), i.e., 2 * r >= d.
So which one is better? This depends on how you define "better". If we consider something non-trivially correct (or not so obvious) to be better than something trivially correct, then we may agree that (1) is better than (2).
I think with the A* algorithm it should be SAEFG, but the answer is SBEFG. My prof is unavailable, so can someone explain why SBEFG?
The heuristic function used in the example is not consistent. For the heuristic to be consistent, the following inequality must hold:
For all nodes x, y: h(x) + w(x, y) >= h(y), where h(v) is the heuristic function's value for node v and w(x, y) is the real distance between nodes x and y.
In this example, h(B) = 13, h(E) = 4 and w(B, E) = 6. As you can see, h(E) + w(B, E) = 10 < h(B), so the heuristic is not consistent.
What does that mean? Well, the A* search with such a heuristic might not be optimal in graphs. It might not find the shortest path without revisiting some nodes.
In this example the author probably assumes revisiting the E node, so the A* will first go to C, then A, E, F, D, B and it should revisit E again from B, because SBE is shorter than SAE, then revisit F and go to G. The final path will be SBEFG.
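The consistency condition can be checked mechanically over all edges. A sketch: only h(B) = 13, h(E) = 4 and w(B, E) = 6 are taken from the question; the rest of the graph and heuristic values below are hypothetical, made up for illustration.

```python
# Hypothetical edge weights and heuristic values; only (B, E) is set up
# to violate consistency, matching the numbers in the question.
edges = {('S', 'A'): 3, ('S', 'B'): 2, ('A', 'E'): 9, ('B', 'E'): 6,
         ('E', 'F'): 2, ('F', 'G'): 4}
h = {'S': 12, 'A': 10, 'B': 13, 'E': 4, 'F': 4, 'G': 0}

def inconsistent_edges(edges, h):
    """Return the edges violating h(x) <= w(x, y) + h(y)."""
    bad = []
    for (x, y), w in edges.items():
        # check both directions, since the graph is undirected
        if h[x] > w + h[y] or h[y] > w + h[x]:
            bad.append((x, y))
    return bad

print(inconsistent_edges(edges, h))   # [('B', 'E')]: 13 > 6 + 4
```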
So I'm trying to write an algorithm for computing the shortest path with constraints on the vertices you can visit, in O(m + n log n) time. In this problem, we are given an undirected weighted (non-negative) graph G = (V, E), a set of vertices X ⊂ V, and two vertices s, t ∈ V \ X. The graph is given as adjacency lists, and we can assume it takes O(1) time to determine whether a vertex is in X. I'm trying to write an algorithm that computes the shortest path between s and t in G such that the path includes at most one vertex from X, if such a path exists, in O(m + n log n) time. I know this would require a modified Dijkstra's algorithm, but I'm unsure how to proceed from here. Could anyone please help me out? Thank you
Construct a new graph by taking two disjoint copies of your graph G (call them G0 and G1), deleting the vertices of X from G0, deleting the edges between pairs of X-vertices from G1, and directing each remaining G1 edge incident to an X-vertex away from that vertex (so an X-vertex in G1 can be left but never entered from within G1). Then, for each edge (v, x) in G with v ∉ X and x ∈ X, add a cross edge from v in G0 to x in G1. This new graph has at most twice as many vertices as G and at most three times as many edges.
A path that stays inside G0 uses no vertex of X, and a path from s in G0 to t in G1 crosses exactly once, at its unique X-vertex. The answer is therefore the shorter of the shortest path from s in G0 to t in G0 and the shortest path from s in G0 to t in G1.
On this new graph, Dijkstra's algorithm (with the priority queue implemented as a Fibonacci heap) runs in time O(3m + 2n log 2n) = O(m + n log n).
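Equivalently, the two-copy construction can be run without materializing the new graph, as Dijkstra over states (v, c) where c counts the X-vertices used so far. A minimal sketch (the graph and names below are illustrative, not from the question):

```python
import heapq

def shortest_with_at_most_one_x(adj, X, s, t):
    """adj[u] = list of (neighbor, weight) pairs.  Returns the length of the
    shortest s-t path using at most one vertex of X.
    State (v, c): c = number of X-vertices on the path so far (0 or 1),
    i.e. c = 0 lives in copy G0 and c = 1 in copy G1."""
    INF = float('inf')
    dist = {(v, c): INF for v in adj for c in (0, 1)}
    dist[(s, 0)] = 0
    pq = [(0, s, 0)]
    while pq:
        d, u, c = heapq.heappop(pq)
        if d > dist[(u, c)]:
            continue                        # stale queue entry
        for v, w in adj[u]:
            c2 = c + (1 if v in X else 0)   # entering an X-vertex crosses to G1
            if c2 <= 1 and d + w < dist[(v, c2)]:
                dist[(v, c2)] = d + w
                heapq.heappush(pq, (d + w, v, c2))
    return min(dist[(t, 0)], dist[(t, 1)])

# Going through x (in X) costs 2; avoiding X entirely costs 10.
adj = {'s': [('x', 1), ('a', 5)], 'x': [('s', 1), ('t', 1)],
       'a': [('s', 5), ('t', 5)], 't': [('x', 1), ('a', 5)]}
print(shortest_with_at_most_one_x(adj, {'x'}, 's', 't'))  # 2
```

With a Fibonacci-heap priority queue this matches the O(m + n log n) bound; the heapq version above is O(m log n), which is enough to check correctness.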
A possibility is to double the number of vertices not in X.
For each vertex v not in X, you create v0 and v1: v0 is reachable only without passing through any vertex in X, and v1 is reachable by passing through one and only one vertex in X.
Let us call w another vertex, relaxed toward v. Then:
if w is in X and v is not in X:
    only v1 can be updated: dist(v1) = min(dist(v1), dist(w) + length(w, v))
if w is in X and v is in X:
    the edge is unusable: treat length(w, v) as infinite
if w is not in X and v is not in X:
    dist(v0) = min(dist(v0), dist(w0) + length(w, v))
    dist(v1) = min(dist(v1), dist(w1) + length(w, v))
if w is not in X and v is in X:
    dist(v) = min(dist(v), dist(w0) + length(w, v))
I need to find a 'dynamic - programming' kind of solution for the following problem:
Input:
Perfect binary tree, T = (V, E) (each node has exactly 2 children, except the leaves).
V = V(blue) ∪ V(black).
V(blue) ∩ V(black) = ∅.
(In other words, some vertices in the tree are blue)
Root of the tree 'r'.
integer k
A legal Solution:
A subset of vertices V' ⊆ V which is a vertex cover of T, and |V' ∩ V(blue)| = k. (In other words, the cover V' contains k blue vertices)
Solution Value:
The value of a legal solution V' is the number of vertices in the set = |V'|.
For convenience, we will define the value of an illegal solution to be ∞.
What we need to find:
A solution with minimal Value.
(In other words, The best solution is a solution which is a cover, contains exactly k blue vertices and the number of vertices in the set is minimal.)
I need to define a typical sub-problem (i.e., if I know the solution values of the subtrees, I can use them to find the solution value of the whole problem) and suggest a formula to solve it.
To me, it looks like you are on the right track!
Still, I think you will have to use an additional parameter to tell us how far is any picked vertex from the current subtree's root.
For example, it can be just the indication whether we pick the current vertex, as below.
Let fun (v, b, p) be the optimal size for subtree with root v such that, in this subtree, we pick exactly b blue vertices, and p = 1 if we pick vertex v or p = 0 if we don't.
The answer is the minimum of fun (r, k, 0) and fun (r, k, 1): we want the answer for the full tree (v = r), with exactly k blue vertices picked (b = k), and we can either pick or not pick the root.
Now, how do we calculate this?
For the leaves, fun (v, 0, 0) is 0 and fun (v, t, 1) is 1, where t tells us whether vertex v is blue (1 if yes, 0 if no).
All other combinations are invalid, and we can simulate it by saying the respective values are positive infinities: for example, for a leaf vertex v, the value fun (v, 3, 1) = +infinity.
In the implementation, the infinity can be just any value greater than any possible answer.
For all internal vertices, let v be the current vertex and u and w be its children.
We have two options: to pick or not to pick the vertex v.
Suppose we pick it.
Then the value we get for fun (v, b, 1) is 1 (for the picked vertex v) plus the minimum of fun (u, x, q) + fun (w, y, s) such that x + y is b if vertex v is black, or b - 1 if it is blue; here q and s can be arbitrary: since we picked the vertex v, the edges v--u and v--w are already covered by our vertex cover.
Now let us not pick the vertex v.
Then the value we get for fun (v, b, 0) is just the minimum of fun (u, x, 1) + fun (w, y, 1) such that x + y = b: if we did not pick the vertex v, the edges v--u and v--w have to be covered by u and w.
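A sketch of this recurrence in Python (the names fun, children and blue are my own; memoization is done with functools.lru_cache, and INF plays the role of the "invalid" value):

```python
from functools import lru_cache

INF = float('inf')

def min_cover_with_k_blue(children, blue, root, k):
    """children[v] = (left, right) for internal vertices; leaves are absent.
    blue is the set of blue vertices.  Returns the minimum size of a vertex
    cover containing exactly k blue vertices, or INF if none exists."""

    @lru_cache(maxsize=None)
    def fun(v, b, p):
        is_blue = 1 if v in blue else 0
        if v not in children:                  # leaf
            if p == 1:
                return 1 if b == is_blue else INF
            return 0 if b == 0 else INF
        u, w = children[v]
        best = INF
        if p == 1:
            # v is picked: edges v--u and v--w are covered, children are free.
            need = b - is_blue                 # blue vertices left for subtrees
            for x in range(need + 1):          # empty range if need < 0 -> INF
                for q in (0, 1):
                    for s in (0, 1):
                        best = min(best, 1 + fun(u, x, q) + fun(w, need - x, s))
        else:
            # v is not picked: both children must be picked to cover the edges.
            for x in range(b + 1):
                best = min(best, fun(u, x, 1) + fun(w, b - x, 1))
        return best

    return min(fun(root, k, 0), fun(root, k, 1))

# Tiny example: root 1 with leaf children 2 (blue) and 3, k = 1.
print(min_cover_with_k_blue({1: (2, 3)}, {2}, 1, 1))  # 2, e.g. V' = {2, 3}
```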
We have a directed graph G = (V, E) for a communication network, with each edge having a probability of not failing r(u, v) (defined as the edge weight), which lies in the interval [0, 1]. The probabilities are independent, so if we multiply the probabilities along a path from one vertex to another, we get the probability of the entire path not failing.
I need an efficient algorithm to find a most reliable path from one given vertex to another given vertex (i.e., a path from the first vertex to the second that is least likely to fail). I am given that log(r · s) = log r + log s will be helpful.
This is what I have so far -:
DIJKSTRA-VARIANT (G, s, t)
    for v in V:
        val[v] ← ∞
    A ← ∅
    Q ← V    (to initialize Q with the vertices in V)
    val[s] ← 0
    while Q is not ∅ and t is not in A
        do x ← EXTRACT-MIN (Q)
           A ← A ∪ {x}
           for each vertex y ∈ Adj[x]
               do if val[x] + p(x, y) < val[y]:
                      val[y] = val[x] + p(x, y)
s is the source vertex and t is the destination vertex. Of course, I have not exploited the log property as I am not able to understand how to use it. The relaxation portion of the algorithm at the bottom needs to be modified, and the val array will capture the results. Without log, it would probably be storing the next highest probability. How should I modify the algorithm to use log?
Right now, your code has
do if val[x] + p(x, y) < val[y]:
val[y] = val[x] + p(x, y)
Since the edge weights in this case represent probabilities, you need to multiply them together (rather than adding):
do if val[x] * p(x, y) > val[y]:
val[y] = val[x] * p(x, y)
I've changed the sign to >, since you want the probability to be as large as possible.
Logs are helpful because (1) log(xy) = log(x) + log(y) (as you said) and sums are easier to compute than products, and (2) log(x) is a monotonic function of x, so log(x) and x have their maximum in the same place. Therefore, you can deal with the logarithm of the probability, instead of the probability itself:
do if log_val[x] + log(p(x, y)) > log_val[y]:
log_val[y] = log_val[x] + log(p(x, y))
Edited to add (since I don't have enough rep to leave a comment): you'll want to initialize your val array to 0, rather than Infinity, because you're calculating a maximum instead of a minimum. (Since you want the largest probability of not failing.) So, after log transforming, the initial log_val array values should be -Infinity.
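A runnable sketch of the log-transformed variant described above (the network and reliability values below are hypothetical, for illustration only):

```python
import heapq
from math import log, exp, inf

def most_reliable(adj, s, t):
    """adj[u] = list of (v, r) pairs, r in (0, 1] = P(edge does not fail).
    Maximizing prod(r) == minimizing sum(-log r), so ordinary min-Dijkstra
    applies after the log transform (all weights -log r are >= 0)."""
    dist = {v: inf for v in adj}      # -log(path probability); smaller = better
    dist[s] = 0.0
    pq = [(0.0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue                  # stale queue entry
        for v, r in adj[u]:
            nd = d - log(r)           # add -log(r) along the edge
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return exp(-dist[t])              # probability of the best path

# Hypothetical network: s->a->t succeeds with 0.9 * 0.9 = 0.81,
# while the direct edge s->t succeeds only with 0.5.
adj = {'s': [('a', 0.9), ('t', 0.5)], 'a': [('t', 0.9)], 't': []}
print(round(most_reliable(adj, 's', 't'), 2))  # 0.81
```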
In order to calculate probabilities you should multiply (instead of add) in the relaxation phase, which means changing:
do if val[x] + p(x, y) < val[y]:
val[y] = val[x] + p(x, y)
to:
do if val[x] * p(x, y) > val[y]:
    val[y] = val[x] * p(x, y)
(note that the comparison flips to >, since we are now maximizing the probability).
Using the log is possible since the range is (0, 1]: log(1) = 0 and log(x) → -infinity as x → 0, and for every x, y in (0, 1], x < y implies log(x) < log(y). Since the log preserves the order between probabilities, this modification still gives the correct answer.
I think you'll be able to take it from here.
I think I may have solved the question partially.
Here is my attempt. Edits and pointers are welcome -:
DIJKSTRA-VARIANT (G, s, t)
    for v in V:
        val[v] ← 0
    A ← ∅
    Q ← V    (to initialize Q with the vertices in V)
    val[s] ← 1
    while Q is not ∅ and t is not in A
        do x ← EXTRACT-MAX (Q)
           A ← A ∪ {x}
           for each vertex y ∈ Adj[x]
               do if log(val[x]) + log(p(x, y)) > log(val[y]):
                      val[y] ← val[x] * p(x, y)
Since I am to find the highest possible probability values, I believe I should be using >. The following questions remain -:
What should the initial values in the val array be?
Is there anything else I need to add?
EDIT: I have changed the initial val values to 0. However, log is undefined at 0. I am open to a better alternative. Also, I changed the priority queue's method to EXTRACT-MAX since it is the larger probabilities that need to be extracted. This would ideally be implemented on a binary max-heap.
FURTHER EDIT: I have marked tinybike's answer as accepted, since they have posted most of the necessary details that I require. The algorithm should be as I have posted here.
In order to find the diameter of a tree I can take any node from the tree, perform BFS to find a node which is farthest away from it and then perform BFS on that node. The greatest distance from the second BFS will yield the diameter.
I am not sure how to prove this, though? I have tried using induction on the number of nodes, but there are too many cases.
Any ideas would be much appreciated...
Let's call the endpoint found by the first BFS x. The crucial step is proving that the x found in this first step always "works" -- that is, that it is always at one end of some longest path. (Note that in general there can be more than one equally-longest path.) If we can establish this, it's straightforward to see that a BFS rooted at x will find some node as far as possible from x, which must therefore be an overall longest path.
Hint: Suppose (to the contrary) that there is a longer path between two vertices u and v, neither of which is x.
Observe that, on the unique path between u and v, there must be some highest (closest to the root) vertex h. There are two possibilities: either h is on the path from the root of the BFS to x, or it is not. Show a contradiction by showing that in both cases, the u-v path can be made at least as long by replacing some path segment in it with a path to x.
[EDIT] Actually, it may not be necessary to treat the 2 cases separately after all. But I often find it easier to break a configuration into several (or even many) cases, and treat each one separately. Here, the case where h is on the path from the BFS root to x is easier to handle, and gives a clue for the other case.
[EDIT 2] Coming back to this later, it now seems to me that the two cases that need to be considered are (i) the u-v path intersects the path from the root to x (at some vertex y, not necessarily at the u-v path's highest point h); and (ii) it doesn't. We still need h to prove each case.
I'm going to work out j_random_hacker's hint. Let s, t be a maximally distant pair. Let u be the arbitrary vertex. We have a schematic like
u
|
|
|
x
/ \
/ \
/ \
s t ,
where x is the junction of s, t, u (i.e. the unique vertex that lies on each of the three paths between these vertices).
Suppose that v is a vertex maximally distant from u. If the schematic now looks like
u
|
|
|
x v
/ \ /
/ *
/ \
s t ,
then
d(s, t) = d(s, x) + d(x, t) <= d(s, x) + d(x, v) = d(s, v),
where the inequality holds because d(u, t) = d(u, x) + d(x, t) and d(u, v) = d(u, x) + d(x, v). There is a symmetric case where v attaches between s and x instead of between x and t.
The other case looks like
u
|
*---v
|
x
/ \
/ \
/ \
s t .
Now,
d(u, s) <= d(u, v) <= d(u, x) + d(x, v)
d(u, t) <= d(u, v) <= d(u, x) + d(x, v)
d(s, t) = d(s, x) + d(x, t)
= d(u, s) + d(u, t) - 2 d(u, x)
<= 2 d(x, v)
2 d(s, t) <= d(s, t) + 2 d(x, v)
= d(s, x) + d(x, v) + d(v, x) + d(x, t)
= d(v, s) + d(v, t),
so max(d(v, s), d(v, t)) >= d(s, t) by an averaging argument, and v belongs to a maximally distant pair.
Here's an alternative way to look at it:
Suppose G = ( V, E ) is a nonempty, finite tree with vertex set V and edge set E.
Consider the following algorithm:
Let count = 0. Let all edges in E initially be uncolored. Let C initially be equal to V.
Consider the subset V' of V containing all vertices with exactly one uncolored edge:
if V' is empty then let d = count * 2, and stop.
if V' contains exactly two elements then color their mutual (uncolored) edge green, let d = count * 2 + 1, and stop.
otherwise, V' contains at least three vertices; proceed as follows:
Increment count by one.
Remove all vertices from C that have no uncolored edges.
For each vertex in V with two or more uncolored edges, re-color each of its green edges red (some vertices may have zero such edges).
For each vertex in V', color its uncolored edge green.
Return to step (2).
That basically colors the graph from the leaves inward, marking paths with maximal distance to a leaf in green and marking those with only shorter distances in red. Meanwhile, the nodes of C, the center, with shorter maximal distance to a leaf are pared away until C contains only the one or two nodes with the largest maximum distance to a leaf.
By construction, all simple paths from leaf vertices to their nearest center vertex that traverse only green edges are the same length (count), and all other simple paths from a leaf vertex to its nearest center vertex (traversing at least one red edge) are shorter. It can furthermore be proven that
this algorithm always terminates under the conditions given, leaving every edge of G colored either red or green, and leaving C with either one or two elements.
at algorithm termination, d is the diameter of G, measured in edges.
Given a vertex v in V, the maximum-length simple paths in G starting at v are exactly those that contain all vertices of the center, terminate at a leaf, and traverse only green edges between the center and the far endpoint. These go from v, across the center, to one of the leaves farthest from the center.
Now consider your algorithm, which might be more practical, in light of the above. Starting from any vertex v, there is exactly one simple path p from that vertex, ending at a center vertex, and containing all vertices of the center (because G is a tree, and if there are two vertices in C then they share an edge). It can be shown that the maximal simple paths in G having v as one endpoint all have the form of the union of p with a simple path from center to leaf traversing only green edges.
The key point for our purposes is that the incoming edge of the other endpoint is necessarily green. Therefore, when we perform a search for the longest paths starting there, we have access to those traversing only green edges from leaf across (all vertices of) the center to another leaf. Those are exactly the maximal-length simple paths in G, so we can be confident that the second search will indeed reveal the graph diameter.
1: procedure TreeDiameter(T)
2:     pick an arbitrary vertex v where v ∈ V
3:     u = BFS(T, v)
4:     t = BFS(T, u)
5:     return distance(u, t)
Result: Complexity = O(|V|)
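The procedure above can be sketched in Python for an adjacency-list tree (the helper names are my own):

```python
from collections import deque

def bfs_farthest(adj, src):
    """Return (farthest_vertex, distance_to_it) from src, by BFS."""
    dist = {src: 0}
    q = deque([src])
    far = src
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] > dist[far]:
                    far = v
                q.append(v)
    return far, dist[far]

def tree_diameter(adj):
    v = next(iter(adj))          # arbitrary start vertex
    u, _ = bfs_farthest(adj, v)  # u is an endpoint of some longest path
    _, d = bfs_farthest(adj, u)  # farthest vertex from u gives the diameter
    return d

# Example tree with edges 1-2, 2-3, 2-4, 4-5: diameter 3 (e.g. path 1-2-4-5).
adj = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
print(tree_diameter(adj))  # 3
```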