Dijkstra's Algorithm on a graph modeling a network

We have a directed graph G = (V, E) for a communication network, with each edge having a probability of not failing r(u, v) (defined as the edge weight) which lies in the interval [0, 1]. The probabilities are independent, so that if we multiply the probabilities of all edges along a path from one vertex to another, we get the probability of the entire path not failing.
I need an efficient algorithm to find a most reliable path from one given vertex to another given vertex (i.e., a path from the first vertex to the second that is least likely to fail). I am given that log(r · s) = log r + log s will be helpful.
This is what I have so far:
DIJKSTRA-VARIANT (G, s, t)
    for v in V:
        val[v] ← ∞
    A ← ∅
    Q ← V    // initialize Q with all the vertices in V
    val[s] ← 0
    while Q is not ∅ and t is not in A
        do x ← EXTRACT-MIN (Q)
           A ← A ∪ {x}
           for each vertex y ∈ Adj[x]
               do if val[x] + p(x, y) < val[y]:
                      val[y] ← val[x] + p(x, y)
Here s is the source vertex and t is the destination vertex. Of course, I have not yet exploited the log property, as I am not able to understand how to use it. The relaxation portion of the algorithm at the bottom needs to be modified, and the val array will capture the results; without logs, it would presumably be storing the highest path probability found so far. How should I modify the algorithm to use logs?

Right now, your code has
    do if val[x] + p(x, y) < val[y]:
           val[y] = val[x] + p(x, y)
Since the edge weights in this case represent probabilities, you need to multiply them together (rather than adding):
    do if val[x] * p(x, y) > val[y]:
           val[y] = val[x] * p(x, y)
I've changed the comparison to >, since you want the probability to be as large as possible.
Logs are helpful because (1) log(xy) = log(x) + log(y) (as you said) and sums are easier to compute than products, and (2) log(x) is a monotonic function of x, so log(x) and x have their maximum in the same place. Therefore, you can deal with the logarithm of the probability, instead of the probability itself:
    do if log_val[x] + log(p(x, y)) > log_val[y]:
           log_val[y] = log_val[x] + log(p(x, y))
Edited to add (since I don't have enough rep to leave a comment): you'll want to initialize your val array to 0 rather than Infinity, because you're computing a maximum instead of a minimum (you want the largest probability of not failing). After the log transform, the initial log_val values should therefore be -Infinity.
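For concreteness, here is a minimal runnable Python sketch of this log-space variant (the adjacency-dict representation and the function name are my own choices, not from the question; Python's heapq is a min-heap, so negated log-values are pushed):

import heapq
from math import exp, log

def most_reliable_path_prob(graph, s, t):
    # graph[u] is a dict {v: r} of edge reliabilities r in (0, 1].
    # Work in log space: maximizing the sum of log r is the same as
    # maximizing the product of r, since log is monotonic.
    log_val = {v: float('-inf') for v in graph}
    log_val[s] = 0.0                        # log(1) = 0
    pq = [(0.0, s)]                         # entries are (-log_val, vertex)
    done = set()
    while pq:
        _, x = heapq.heappop(pq)
        if x in done:
            continue
        done.add(x)
        if x == t:
            break
        for y, r in graph[x].items():
            cand = log_val[x] + log(r)      # relaxation step, in log space
            if cand > log_val[y]:
                log_val[y] = cand
                heapq.heappush(pq, (-cand, y))
    return exp(log_val[t])                  # probability the best s-t path survives

For example, with graph = {'s': {'a': 0.9, 'b': 0.5}, 'a': {'t': 0.8}, 'b': {'t': 0.99}, 't': {}}, the function returns roughly 0.72, corresponding to the path s → a → t (0.9 · 0.8 = 0.72, which beats 0.5 · 0.99 = 0.495).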

In order to calculate path probabilities you should multiply (instead of add) in the relaxation phase, which means changing:
    do if val[x] + p(x, y) < val[y]:
           val[y] = val[x] + p(x, y)
to:
    do if val[x] * p(x, y) > val[y]:
           val[y] = val[x] * p(x, y)
Using the log is possible when the weights lie in (0, 1], since log(0) = -infinity and log(1) = 0. Because log is increasing, for every x, y in (0, 1]: if probability x < probability y, then log(x) < log(y). Since this modification preserves the ordering between path probabilities, it yields the correct answer.
I think you'll be able to take it from here.

I think I may have solved the question partially.
Here is my attempt. Edits and pointers are welcome:
DIJKSTRA-VARIANT (G, s, t)
    for v in V:
        val[v] ← 0
    A ← ∅
    Q ← V    // initialize Q with all the vertices in V
    val[s] ← 1
    while Q is not ∅ and t is not in A
        do x ← EXTRACT-MAX (Q)
           A ← A ∪ {x}
           for each vertex y ∈ Adj[x]
               do if log(val[x]) + log(p(x, y)) > log(val[y]):
                      val[y] ← val[x] · p(x, y)
Since I am to find the highest possible probability values, I believe I should be using >. The following questions remain:
What should the initial values in the val array be?
Is there anything else I need to add?
EDIT: I have changed the initial val values to 0. However, log is undefined at 0, so I am open to a better alternative. Also, I changed the priority queue's operation to EXTRACT-MAX, since it is the larger probabilities that need to be extracted; this would ideally be implemented on a binary max-heap.
FURTHER EDIT: I have marked tinybike's answer as accepted, since it provides most of the necessary details. The algorithm should be as I have posted here.

Related

Find minimum number of iterations to reach a certain sum

I have been trying to solve this problem for weeks, but couldn't arrive at a solution.
You start with two numbers X and Y, both equal to 1. The only valid operations are X = X+Y and Y = X+Y, one at a time. We need to find the minimum number of iterations needed to reach a specific number.
eg : if the number is 5
X=1, Y=1; X = X+Y
X=2, Y=1; Y = X+Y
X=2, Y=3; Y = Y+X
X=2, Y=5; Stop answer reached
My take: if the number is odd, say 23, decrement it by 1, giving 22. Find the largest proper divisor of 22, which is 11. Now reach the number starting from X = 11 and Y = 1:
X=11; Y=1 ; Y=Y+X
X=11; Y=12; X=X+Y
X=23, answer reached
But the problem with this approach is that I cannot recursively reach a specific number: even if I reach a certain point, say X = the required value, the Y value gets misplaced, and I can't reuse it to reach another value.
I can give an O(n log n) solution.
The method resembles the greatest common divisor.
Let f(x, y) denote the minimum number of iterations needed to reach this state. The state can only have been reached from (x-y, y) if x > y, or from (x, y-x) if x < y. Since the way to reach a state (x, y) is unique, we can calculate f in O(log n), like the gcd.
The answer is min( f(n, i) | 1 <= i < n ), so the complexity is O(n log n).
Python code:
def gcd(n, m):
    if m == 0:
        return n
    return gcd(m, n % m)

def calculate(x, y):
    # Sum of the quotients along the Euclidean chain; the -1 at the
    # base accounts for the starting state (1, 1).
    if y == 0:
        return -1
    return calculate(y, x % y) + x // y

def solve(n):
    best = n
    for i in range(1, n):
        if gcd(n, i) == 1:
            best = min(best, calculate(n, i))
    print(best)

if __name__ == '__main__':
    solve(5)   # prints 3
If the numbers are not that big (say, below 1000), you can use a breadth-first search.
Consider a directed graph where each vertex is a pair of numbers (X, Y), and from each such vertex there are two edges, to vertices (X+Y, Y) and (X, X+Y). Run a BFS on that graph from (1,1) until you reach any of the positions you need.
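A minimal Python sketch of that BFS (the function name is mine; states where a coordinate exceeds the target are pruned, which is safe because the values only ever grow):

from collections import deque

def min_iterations(target):
    # BFS over states (X, Y), starting from (1, 1); each step goes to
    # (X+Y, Y) or (X, X+Y). Returns the fewest steps until either
    # coordinate equals target.
    if target == 1:
        return 0
    start = (1, 1)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        (x, y), steps = queue.popleft()
        for nxt in ((x + y, y), (x, x + y)):
            if target in nxt:
                return steps + 1
            if max(nxt) < target and nxt not in seen:   # prune overshoots
                seen.add(nxt)
                queue.append((nxt, steps + 1))
    return -1   # unreachable (target < 1)

min_iterations(5) returns 3, matching the X=2, Y=3, Y=5 trace above.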

Find the highest distance in vector

I have an array of positive integers. The problem is to find the highest distance in the vector. The distance is calculated as A[p] + A[q] + (q - p), where A is a vector, p and q are indexes, and p <= q. The complexity of the solution must be O(n). I'm able to solve this problem with an O(n^2) solution, but I can't find an O(n) algorithm for this problem.
Could someone help me? Thanks in advance. The language used for the solution doesn't matter.
Rearrange the objective as (A[p] - p) + (A[q] + q). The first term is a function only of p, and the second term is a function only of q. Thus they can be optimized separately subject to p ≤ q. As we increase q from 0 to n-1, the best choice of p can be computed from the previous best and A[q] - q.
def highest_distance(A):
    highest = float('-inf')
    max_Ap_minus_p = float('-inf')   # best A[p] - p over all p <= q so far
    for q in range(len(A)):
        max_Ap_minus_p = max(max_Ap_minus_p, A[q] - q)   # allows p == q
        highest = max(highest, max_Ap_minus_p + (A[q] + q))
    return highest
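For example, highest_distance([1, 3, 2]) returns 6, achieved at p = 1, q = 2: A[1] + A[2] + (2 - 1) = 3 + 2 + 1 = 6.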

Relation between Diameter and Radius in an Undirected Graph?

We consider only graphs that are undirected. The diameter of a graph is the maximum, over all choices of vertices s and t, of the shortest-path distance between s and t. (Recall that the shortest-path distance between s and t is the fewest number of edges in an s-t path.) Next, for a vertex s, let l(s) denote the maximum, over all vertices t, of the shortest-path distance between s and t. The radius of a graph is the minimum of l(s) over all choices of the vertex s.
With radius r and diameter d, which of the following always holds? Choose the best answer.
1) r >= d/2
2) r <= d
We know that (1) and (2) always hold; this is written in every reference book.
My challenge is that this problem appeared on an entrance exam where only one of (1) and (2) could be chosen. The question said to choose the best answer, and after the exam the answer sheet gave (1) as the best choice. Can someone explain why (1) is better than (2)?
They are both indeed true.
Don't let an exam with ambiguous questions weaken your concepts.
As for the proof:
First of all, the 2nd inequality is quite trivial (it follows directly from the definitions).
Now the 1st one:
d <= 2*r
Let z be a central vertex; then:
e(z) = r
Now, let x and y be vertices realizing the diameter:
diameter = d(x,y) [d(x,y) denotes the distance between vertices x and y]
d(x,y) <= d(x,z) + d(z,y)
d(x,y) <= d(z,x) + d(z,y)
d(x,y) <= e(z) + e(z) [this upper bound holds because e(z) >= d(z,u) for all u]
diameter <= 2*r
They both hold.
(2) should be clear.
(1) holds by the triangle inequality. We can use this property because distances on graphs form a metric (http://en.wikipedia.org/wiki/Metric_%28mathematics%29). Let d(x, z) = diameter(G) and let y be a center of G, i.e., a vertex with d(y, u) <= radius(G) for every vertex u. Then diameter(G) = d(x, z) <= d(x, y) + d(y, z) <= 2*radius(G).
The OP defined the shortest-path distance between s and t as "the fewest number of edges in an s-t path". This makes things simpler.
We may write the definitions in terms of some pseudocode:
def dist(s, t):
    return min([len(path) - 1 for path starts with s and ends with t])

r = min([max([dist(s, t) for t in V]) for s in V])
d = max([max([dist(s, t) for t in V]) for s in V])
where V is the set of all vertices.
Now (2) is obviously true; the definitions themselves tell us this, since a max is always >= a min over the same values.
(1) is slightly less obvious. It requires at least a few steps to prove.
Suppose d = dist(A, B) and r = dist(C, D), where C is a center and D is a farthest vertex from C. We have
dist(C, A) + dist(C, B) >= dist(A, B),
otherwise the walk from A to B through C would be shorter than dist(A, B).
From the definition of r (D is the farthest vertex from C), we know that
dist(C, D) >= dist(C, A)
dist(C, D) >= dist(C, B)
Hence 2 * dist(C, D) >= dist(A, B), i.e., 2 * r >= d.
So which one is better? That depends on how you define "better". If we consider something non-trivially correct (or not so obvious) to be better than something trivially correct, then we may agree that (1) is better than (2).
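To make the definitions concrete, here is a small runnable Python check (names are mine) that computes l(s) for every s by BFS on an unweighted graph and verifies r <= d <= 2r:

from collections import deque

def eccentricities(adj):
    # adj maps each vertex to a list of neighbors (undirected, connected).
    # Returns {s: l(s)}, the maximum shortest-path distance from each s.
    ecc = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        ecc[s] = max(dist.values())
    return ecc

# Example: a path on 4 vertices has radius 2 and diameter 3.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
ecc = eccentricities(adj)
r, d = min(ecc.values()), max(ecc.values())
assert r <= d <= 2 * r   # both inequalities hold
print(r, d)              # 2 3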

Random Polynomial TM

I need to show that there exists a randomized polynomial-time TM M that uses O(log n) space such that,
for input (G,s,t), where G is a directed graph and s,t are two vertices in
G: if there is a path from s to t, then Pr[M(G,s,t) = 1] ≥ 1/nⁿ;
else Pr[M(G,s,t) = 1] = 0.
I tried choosing a random neighbor each time, but I can't figure out why the probability is 1/nⁿ,
and I'm not sure about the number of iterations.
And another question:
I need to use the above result, and the fact that I have a "random counter" that uses O(log k) space and can count up to 2^k, to show that:
L is in NL iff there exists a randomized polynomial-time TM M that uses O(log n)
space and, for every input x, M halts and: if x is in L then
Pr[M(x) = 1] ≥ 1/2, else Pr[M(x) = 1] = 0.
I will only answer the first question, as it should be one question per post.
Algorithm:
1. Start at vertex s.
2. Set a counter to zero.
3. Choose a random neighbor v (and increment the counter).
4. Check whether v equals t.
5. If not, choose another random neighbor of v (and increment the counter).
6. Repeat 3 and 4 until you find t or the counter reaches n (or maybe c⋅n, where c is a constant).
To do this, you only have to store 3 vertices (s, v, t) and the counter. If the counter is stored as a binary number, it needs log₂(n) bits. So this runs in O(log(n)) space.
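To illustrate, here is a small Python simulation of one run of this walk (the adjacency-list representation and function name are mine; the actual claim is about a logspace TM, and this sketch only mirrors its behavior):

import random

def random_walk_reaches(adj, s, t):
    # One run: starting at s, repeatedly move to a uniformly random
    # out-neighbor, for at most n steps. One-sided error: this never
    # returns True when no s-t path exists.
    n = len(adj)
    v = s
    for _ in range(n):
        if v == t:
            return True
        if not adj[v]:              # dead end, no out-neighbors
            return False
        v = random.choice(adj[v])
    return v == t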
If there is no path from s to t, the walk can never reach t, so Pr[M(G,s,t) = 1] = 0 holds.
If a path exists, the probability of finding it is the product of the probabilities of choosing the right neighbor at each step. The worst case is that G is a complete graph, where every vertex has n-1 neighbors. The path need not be longer than n vertices, so let [s, v₁, v₂, ..., vₘ, t] be a path from s to t, with m < n. Then we get
Pr[M(G,s,t) = 1] ≥ Π_{k=1,...,m} 1/|neighbors(vₖ)|
                 ≥ Π_{k=1,...,m} 1/(n-1)
                 ≥ Π_{k=1,...,m} 1/n
                 ≥ Π_{k=1,...,n} 1/n
                 = 1/nⁿ

Recursion Puzzle

Recently, one of my friends challenged me to solve this puzzle which goes as follows:
Suppose that you have two variables x and y. These are the only variables which can be used for storage in the program. There are three operations which can be done:
Operation 1: x = x+y
Operation 2: x = x-y
Operation 3: y = x-y
Now, you are given two numbers n1 and n2 and a target number k. Starting with x = n1 and y = n2, is there a way to arrive at x = k using the operations mentioned above? If yes, what is the sequence of operations that can generate x = k?
Example: If n1 = 16, n2 = 6 and k = 28 then the answer is YES. The sequence is:
Operation 1
Operation 1
If n1 = 19, n2 = 7 and k = 22 then the answer is YES. The sequence is:
Operation 2
Operation 3
Operation 1
Operation 1
Now, I have wrapped my head around this problem for too long, but I am not getting any initial ideas. I have a feeling that this is recursion, but I do not know what the boundary conditions should be. It would be very helpful if someone could direct me toward an approach that can be used to solve this problem. Thanks!
Maybe not a complete answer, but a proof that a sequence exists if and only if k is a multiple of the greatest common divisor (GCD) of n1 and n2. Let's write G = GCD(n1, n2) for brevity.
First I'll prove that x and y are always integer multiples of G. The proof is by straightforward induction. Hypothesis: x = p * G and y = q * G, for some integers p and q.
Initially, the hypothesis holds by definition of G.
Each of the rules respects the induction hypothesis. The rules yield:
x + y = p * G + q * G = (p + q) * G
x - y = p * G - q * G = (p - q) * G
y - x = q * G - p * G = (q - p) * G
Due to this result, there can only be a sequence to k if k is an integer multiple of the GCD of n1 and n2.
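As a quick sanity check of this invariant, here is a small Python sketch (names mine) that applies randomly chosen rules and verifies that x and y stay multiples of G:

import random
from math import gcd

def invariant_holds(n1, n2, steps=1000):
    # Apply random rules; x and y should always remain multiples of G.
    g = gcd(n1, n2)
    x, y = n1, n2
    for _ in range(steps):
        op = random.randrange(3)
        if op == 0:
            x = x + y        # rule 1
        elif op == 1:
            x = x - y        # rule 2
        else:
            y = x - y        # rule 3
        if x % g != 0 or y % g != 0:
            return False
    return True

print(invariant_holds(16, 6))   # True: everything stays a multiple of 2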
For the other direction we need to show that any integer multiple of G can be achieved by the rules. This is definitely the case if we can reach x = G and y = G. For this we use Euclid's algorithm. Consider the subtraction-based implementation from the Wikipedia article on the Euclidean algorithm:
function gcd(a, b)
    while a ≠ b
        if a > b
            a := a − b
        else
            b := b − a
    return a
This can be carried out with the given rules: rule 2 gives a := a − b directly, and b := b − a can be simulated by composing rules (for instance, applying rule 1, then rule 3, then rule 2 swaps x and y). The process results in x = G and y = G.
Knowing that a solution exists, you can apply a BFS, as shown in Amit's answer, to find the shortest sequence.
Assuming a solution exists, finding the shortest sequence to get to it can be done using a BFS.
The pseudo code should be something like:
queue <- new empty queue
parent <- new map of type map: pair -> pair
parent[(x,y)] = 'root'   // special indicator to stop the search there
queue.enqueue(pair(x,y))
while !queue.empty():
    curr <- queue.dequeue()
    x <- curr.first
    y <- curr.second
    if x == target or y == target:
        printSolAccordingToMap(parent, (x,y))
        return
    x1 <- x+y
    x2 <- x-y
    y1 <- x-y
    if (x1,y) is not a key in parent:
        parent[(x1,y)] = (x,y)
        queue.enqueue(pair(x1,y))
    // similarly for (x2,y) and (x,y1)
The function printSolAccordingToMap() simply traces back through the map until it finds the root, and prints the sequence.
Note that this solution finds the optimal sequence if one exists, but it will loop forever if one does not exist, so it is only a partial answer.
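Here is a runnable Python version of this pseudocode (names are mine; a max_states cap is added as a crude guard against the infinite-loop case noted above):

from collections import deque

def find_sequence(n1, n2, k, max_states=10**6):
    # BFS over states (x, y) from (n1, n2). Returns the shortest list of
    # operations ('x=x+y', 'x=x-y', 'y=x-y') reaching x == k, or None.
    start = (n1, n2)
    parent = {start: None}              # state -> (previous state, op)
    queue = deque([start])
    while queue and len(parent) <= max_states:
        x, y = queue.popleft()
        if x == k:                      # found: trace back to the root
            ops = []
            state = (x, y)
            while parent[state] is not None:
                state, op = parent[state]
                ops.append(op)
            return ops[::-1]
        for nxt, op in (((x + y, y), 'x=x+y'),
                        ((x - y, y), 'x=x-y'),
                        ((x, x - y), 'y=x-y')):
            if nxt not in parent:
                parent[nxt] = ((x, y), op)
                queue.append(nxt)
    return None

For instance, find_sequence(16, 6, 28) returns ['x=x+y', 'x=x+y'], matching the first example in the question.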
Observe that you can keep both x and y in the range (0, target] at all times; if an operation takes them out of this range, you can always bring them back with simple operations. With this constraint you can build a graph with O(target*target) nodes, whose edges are found by applying one of the three operations at each node. You then need the shortest path from the start node to a target node, which is any node of the form (target, y). The time complexity is O(target*target*log(target)) using Dijkstra.
Regarding Vincent's answer, I think the proof is not complete.
Suppose we take two relatively prime numbers, say n1 = 19 and n2 = 13, whose GCD is 1. According to him, a sequence exists if k is a multiple of the GCD, and every number is a multiple of 1. I don't think that is possible for every k.
