Julia : getting whole path from Dijkstra function in Graphs.jl - algorithm

Is there a possibility to acquire whole path from source vertex to destination vertex using Dijkstra algorithm from Graphs.jl module ?
There is a update_vertex!(visitor, u, v, d) method invoked when distance to vertex is updated. Is distance updated only when new vertex that belongs to the shortest path is found? I am not really sure.
Thanks.
Edit:
According to documentation there is possibility to reconstruct shortest path using Floyd-Warshall algorithm with atributes dists and nexts but I am not sure how. I would like to run it on my GenericGraph.
Any idea ?

The result of the algorithm contains a field called parents as #DanGetz pointed out. Each of the nodes has the last node visited before arriving to it (i.e. the parent in the shortest path). Using the parents for each of the nodes, you can backtrack the shortest path for each of them with a simple recursive function:
spath(x, r, s) = x == s ? x : [spath(r.parents[x], r, s) x]
where r is the result of the Dijkstra algorithm and s is the source passed to it.
The shortest path for each of the nodes can be obtained by list-comprehension. Find bellow the result for the example in the documentation:
julia> [spath(x, r, 1) for x in g.vertices]
5-element Array{Any,1}:
1
1x3 Array{Int64,2}:
1 3 2
1x2 Array{Int64,2}:
1 3
1x4 Array{Int64,2}:
1 3 2 4
1x3 Array{Int64,2}:
1 3 5
There are probably better algorithms to do it (i.e. some dynamic programming method to remember paths for large graphs), but as an example, the recursive method does the job.
A quick recursion code adapted to your dictionaries with multiple parents for each shortest path:
function spath(current, parents, source, current_path)
if current == source || isempty(parents[current])
return Any[[current; current_path]]
end
results = []
for node in parents[current]
results = [spath(node, parents, source, [current; current_path]); results]
end
results
end
Note the current path is passed as a parameter (copies of it) until the leaf node (source), and thus, returns the whole shortest path when it reaches it. Again, it is probably not the most efficient implementation (I'm not a julia guru) but it does the job.
For your example:
julia> parents = {2=>[1,3],3=>[1],1=>[]}
julia> [(i, spath(i, parents, 1, [])) for i in keys(parents)]
3-element Array{Tuple{Any,Array{Any,1}},1}:
(2,Any[Any[1,3,2],Any[1,2]])
(3,Any[Any[1,3]])
(1,Any[Any[1]])

Related

Returning all shortest paths in lowest run time and complexity

This post has is the result that constantly appears for this problem but doesn't provide an optimal solution.
Currently I am trying to return all shortest paths starting atfrom and ending at target using BFS but I am running into a bottleneck with either my algorithm or the data structures I use.
pseudocode:
// The graph is an adjacency list of type unordered_map<string, unordered_set<string>>
// deque with pair of (visited unordered_set, vector with current path)
deque q = [({from}, [from]);
while q:
pair = q.dequeue()
visited = pair.first
path = pair.second
foreach adjacent_node to path[-1] in the graph:
if (adjacent_node == target):
res.append(path + [adjacent_node])
else if adjacent_node not in visited:
newPath = path + [adjacent_node]
visited.add(adjacent_node)
q.push((visited, newPath))
Currently the bottleneck seems to be with the queue's pair of items. I'm unsure how to solve the problem without storing a visited set with every path, or without copying a new path into the queue.
Firstly you should know that number of shortest paths can be huge and returning them all is not practical. Consider a graph with 2k+1 layers numbered from 1 to 2k+1, in which each layer is fully connected with the next layer, and odd layers has only one point while even layers has q points. Although this graph only has k(q+1)+1 nodes and kq edges, there are in total q^k different shortest paths which can be inefficient for normal computers to handle. However if you're sure that the number of shortest paths is relatively small I can introduce the following algorithm.
The basic idea is to store a list back for each node, meaning the shortest distance between from and x equals to the shortest distance between from and v plus one if and only if v in back[x]. back[x] can be computed during the process. Then you can perform a depth-first search to print all the shortest path. Pseudo code (BTW I noticed that your code is not correct):
queue q = [ from ]
visited = set<node>
back = map<node, list<node>>
while q.not_empty():
now = q.front()
if (now == target):
continue
foreach adjacent_node to now in the graph:
if (adjacent_node in visited):
back[adjacent_node].push(now)
else:
visited.add(adjacent_node)
back[adjacent_node] = [ now ]
q.push(adjacent_node)
# Now collect all shortest paths
ret = []
current = []
def collect(x):
current.push(x)
if (x == from):
ret.push(current.reversed())
return
foreach v in back[x]:
collect(v)
current.pop()
Sorry for my poor English. Feel free to point out my typos and mistakes.

Longest common path between k graphs

I was looking at interview problems and come across this one, failed to find a liable solution.
Actual question was asked on Leetcode discussion.
Given multiple school children and the paths they took from their school to their homes, find the longest most common path (paths are given in order of steps a child takes).
Example:
child1 : a -> g -> c -> b -> e
child2 : f -> g -> c -> b -> u
child3 : h -> g -> c -> b -> x
result = g -> c -> b
Note: There could be multiple children.The input was in the form of steps and childID. For example input looked like this:
(child1, a)
(child2, f)
(child1, g)
(child3, h)
(child1, c)
...
Some suggested longest common substring can work but it will not example -
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
1 and 2 will give abc, 3 and 4 give pfg
now ans will be none but ans is fg
it's like graph problem, how can we find longest common path between k graphs ?
You can construct a directed graph g with an edge a->b present if and only if it is present in all individual paths, then drop all nodes with degree zero.
The graph g will have have no cycles. If it did, the same cycle would be present in all individual paths, and a path has no cycles by definition.
In addition, all in-degrees and out-degrees will be zero or one. For example, if a node a had in-degree greater than one, there would be two edges representing two students arriving at a from two different nodes. Such edges cannot appear in g by construction.
The graph will look like a disconnected collection of paths. There may be multiple paths with maximum length, or there may be none (an empty path if you like).
In the Python code below, I find all common paths and return one with maximum length. I believe the whole procedure is linear in the number of input edges.
import networkx as nx
path_data = """1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g"""
paths = [line.split(" ")[1].split("-") for line in path_data.split("\n")]
num_paths = len(paths)
# graph h will include all input edges
# edge weight corresponds to the number of students
# traversing that edge
h = nx.DiGraph()
for path in paths:
for (i, j) in zip(path, path[1:]):
if h.has_edge(i, j):
h[i][j]["weight"] += 1
else:
h.add_edge(i, j, weight=1)
# graph g will only contain edges traversed by all students
g = nx.DiGraph()
g.add_edges_from((i, j) for i, j in h.edges if h[i][j]["weight"] == num_paths)
def longest_path(g):
# assumes g is a disjoint collection of paths
all_paths = list()
for node in g.nodes:
path = list()
if g.in_degree[node] == 0:
while True:
path.append(node)
try:
node = next(iter(g[node]))
except:
break
all_paths.append(path)
if not all_paths:
# handle the "empty path" case
return []
return max(all_paths, key=len)
print(longest_path(g))
# ['f', 'g']
Approach 1: With Graph construction
Consider this example:
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Draw a directed weighted graph.
I am a lazy person. So, I have not drawn the direction arrows but believe they are invisibly there. Edge weight is 1 if not marked on the arrow.
Give the length of longest chain with each edge in the chain having Maximum Edge Weight MEW.
MEW is 4, our answer is FG.
Say AB & BC had edge weight 4, then ABC should be the answer.
The below example, which is the case of MEW < #children, should output ABC.
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-h
4 m-x-o-p-f-i
If some kid is like me, the kid will keep roaming multiple places before reaching home. In such cases, you might see MEW > #children and the solution would become complicated. I hope all the children in our input are obedient and they go straight from school to home.
Approach 2: Without Graph construction
If luckily the problem mentions that the longest common piece of path should be present in the paths of all the children i.e. strictly MEW == #children then you can solve by easier way. Below picture should give you clue on what to do.
Take the below example
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Method 1:
Get longest common graph for first two: a-b-c, f-g (Result 1)
Get longest common graph for last two: p-f-g (Result 2)
Using Result 1 & 2 we get: f-g (Final Result)
Method 2:
Get longest common graph for first two: a-b-c, f-g (Result 1)
Take Result 1 and next graph i.e. m-n-o-p-f-g: f-g (Result 2)
Take Result 2 and next graph i.e. m-x-o-p-f-g: f-g (Final Result)
The beauty of the approach without graph construction is that even if kids roam same pieces of paths multiple times, we get the right solution.
If you go a step ahead, you could combine the approaches and use approach 1 as a sub-routine in approach 2.

Find largest ones after setting a coordinate to one

Interview Question:
You are given a grid of ones and zeros. You can arbitrarily select any point in that grid. You have to write a function which does two things:
If you choose e.g. coordinate (3,4) and it is zero you need to flip
that to a one. If it is a one you need to flip that to a zero.
You need to return the largest contiguous region
with the most ones i.e. ones have to be at least connected to
another one.
E.g.
[0,0,0,0]
[0,1,1,0]
[1,0,1,0]
We have the largest region being the 3 ones. We have another region which have only one one (found at coordinate (2,0)).
You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve.
My Solution which has Time Complexity:O(num_row*num_col) each time this function is called:
def get_all_coordinates_of_ones(grid):
set_ones = set()
for i in range(len(grid[0])):
for j in range(len(grid)):
if grid[i][j]:
set_ones.add((i, j))
return set_ones
def get_largest_region(x, y, grid):
num_col = len(grid)
num_row = len(grid[0])
one_or_zero = grid[x][y]
if not grid[x][y]:
grid[x][y] = 1 - grid[x][y]
# get the coordinates of ones in the grid
# Worst Case O(num_col * num_row)
coordinates_ones = get_all_coordinates_of_ones(grid)
while coordinates_ones:
queue = collections.deque([coordinates_ones.pop()])
largest_one = float('-inf')
count_one = 1
visited = set()
while queue:
x, y = queue.popleft()
visited.add((x, y))
for new_x, new_y in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)):
if (0 <= new_x < num_row and 0 <= new_y < num_col):
if grid[new_x][new_y] == 1 and (new_x, new_y) not in visited:
count_one += 1
if (new_x, new_y) in coordinates_ones:-
coordinates_ones.remove((new_x, new_y))
queue.append((new_x, new_y))
largest_one = max(largest_one, count_one)
return largest_one
My Proposed modifications:
Use Union Find by rank. Encountered a problem. Union all the ones that are adjacent to each other. Now when one of the
coordinates is flipped e.g. from zero to one I will need to remove that coordinate from the region that it is connected to.
Questions are:
What is the fastest algorithm in terms of time complexity?
Using Union Find with rank entails removing a node. Is this the way to do improve the time complexity. If so, is there an implementation of removing a node in union find online?
------------------------ EDIT ---------------------------------
Should we always subtract one from the degree from sum(degree-1 of each 'cut' vertex). Here are two examples the first one where we need to subtract one and the second one where we do not need to subtract one:
Block Cut Tree example 1
Cut vertex is vertex B. Degree of vertex B in the block cut tree is 2.
Sum(cardinality of each 'block' vertex) : 2(A,B) + 1(B) + 3 (B,C,D) = 6
Sum(degree of each 'cut' vertex) : 1 (B)
Block cut size: 6 – 1 = 5 but should be 4 (A. B, C, D, E, F). Here need to subtract one more.
Block Cut Tree Example 2
Sum(cardinality of each 'block' vertex) : 3 (A,B,C) + 1(C) + 1(D) + 3 (D, E, F) = 8
Sum(degree of each 'cut' vertex) : 2 (C and D)
Block cut size: 8 – 2 = 6 which is (A. B, C, D, E, F). Here no need to subtract one.
Without preprocessing:
Flip the cell in the matrix.
Consider the matrix as a graph where each '1' represents a node, and neighbor nodes are connected with an edge.
Find all connected components. For each connected component - store its cardinality.
Return the highest cardinality.
Note that O(V) = O(E) = O(num_row*num_col).
Step 3 takes O(V+E)=O(num_row*num_col), which is similar to your solution.
You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve.
That hints that you can benefit from preprocessing:
Preprocessing:
Consider the original matrix as a graph G where each '1' represents a node, and neighbor nodes are connected with an edge.
Find all connected components
Construct the set of block-cut trees (section 5.2) of G (also here, here and here) (one block-cut tree for each connected component of G). Construction: see here.
Processing:
If you flip a '0' cell to '1':
Find neighbor connected components (0 to 4)
Remove old block-cut trees, construct a new block-cut tree for the merged component (Optimizations are possible: in some cases, previous tree(s) may be updated instead of reconstructed).
If you flip a '1' cell to '0':
If this cell is a 'cut' in a block-cut tree:
remove it from the block-cut-tree
remove it from each neighbor 'cut' vertex
split the block-cut-tree into several block-cut trees
Otherwise (this cell is part of only one 'block vertex')
remove it from the 'block' vertex; if empty - remove vertex. If block-cut-tree empty - remove it from the set of trees.
The size of a block-cut tree = sum(cardinality of each 'block' vertex) - sum(neighbor_blocks-1 of each 'cut' vertex).
Block-cut trees are not 'well known' as other data structures, so I'm not sure if this is what the interviewer had in mind. If it is - they're really looking for someone well experienced with graph algorithms.

Dijkstra's algorithm: is my implementation flawed?

In order to train myself both in Python and graph theory, I tried to implement the Dijkstra algo using Python 3, and submitted it against several online judges, to see if it was correct.
It works well in many cases, but not always.
For example, I am stuck with this one: the test case works fine and I also have tried custom test cases of my own, but when I submit the following solution, the judge keeps telling me "wrong answer", and the expected result is very different from my output, indeed.
Notice that the judge tests it against quite a complex graph (10000 nodes with 100000 edges), while all the cases I tried before never exceeded 20 nodes and around 20-40 edges.
Here is my code.
Given al an adjacency list in the following form:
al[n] = [(a1, w1), (a2, w2), ...]
where
n is the node id;
a1, a2, etc. are its adjacent nodes and w1, w2, etc. the respective weights for the given edge;
and supposing that maximum distance never exceeds 1 billion, I implemented Dijkstra's algorithm this way:
import queue
distance = [1000000000] * (N+1) # this is the array where I store the shortest path between 1 and each other node
distance[1] = 0 # starting from node 1 with distance 0
pq = queue.PriorityQueue()
pq.put((0, 1)) # same as above
visited = [False] * (N+1)
while not pq.empty():
n = pq.get()[1]
if visited[n]:
continue
visited[n] = True
for edge in al[n]:
if distance[edge[0]] > distance[n] + edge[1]:
distance[edge[0]] = distance[n] + edge[1]
pq.put((distance[edge[0]], edge[0]))
Could you please help me understand wether my implementation is flawed, or if I simply ran into some bugged online judge?
Thank you very much.
UPDATE
As requested, I'm providing the snippet I use to populate the adjacency list al for the linked problem.
N,M = input().split()
N,M = int(N), int(M)
al = [[] for n in range(N+1)]
for m in range(M):
try:
a,b,w = input().split()
a,b,w = int(a), int(b), int(w)
al[a].append((b, w))
al[b].append((a, w))
except:
pass
(Please don't mind the ugly "except: pass", I was using it just for debugging purposes... :P)
Primary problem in interpreting the question:
According to your parsing code, you are treating the input data as an undirected graph, i.e. each edge from A to B also is an edge from B to A.
Is seems like this premise is not valid and it should instead be a directed graph, i.e. you have to remove this line:
al[b].append((a, w)) # no back reference!
Previous problem, now already fixed in the code:
Currently, you are using the never-changing weight of the edges in your queue:
pq.put((edge[1], edge[0]))
This way, the nodes always end up at the same position in the queue, no matter at what stage of the algorithm and how far the path to reach that node actually is.
Instead, you should use the new distance to the target node edge[0], i.e. distance[edge[0]] as the priority in the queue:
pq.put((distance[edge[0]], edge[0]))

Finding the path with the maximum minimal weight

I'm trying to work out an algorithm for finding a path across a directed graph. It's not a conventional path and I can't find any references to anything like this being done already.
I want to find the path which has the maximum minimum weight.
I.e. If there are two paths with weights 10->1->10 and 2->2->2 then the second path is considered better than the first because the minimum weight (2) is greater than the minimum weight of the first (1).
If anyone can work out a way to do this, or just point me in the direction of some reference material it would be incredibly useful :)
EDIT:: It seems I forgot to mention that I'm trying to get from a specific vertex to another specific vertex. Quite important point there :/
EDIT2:: As someone below pointed out, I should highlight that edge weights are non negative.
I am copying this answer and adding also adding my proof of correctness for the algorithm:
I would use some variant of Dijkstra's. I took the pseudo code below directly from Wikipedia and only changed 5 small things:
Renamed dist to width (from line 3 on)
Initialized each width to -infinity (line 3)
Initialized the width of the source to infinity (line 8)
Set the finish criterion to -infinity (line 14)
Modified the update function and sign (line 20 + 21)
1 function Dijkstra(Graph, source):
2 for each vertex v in Graph: // Initializations
3 width[v] := -infinity ; // Unknown width function from
4 // source to v
5 previous[v] := undefined ; // Previous node in optimal path
6 end for // from source
7
8 width[source] := infinity ; // Width from source to source
9 Q := the set of all nodes in Graph ; // All nodes in the graph are
10 // unoptimized – thus are in Q
11 while Q is not empty: // The main loop
12 u := vertex in Q with largest width in width[] ; // Source node in first case
13 remove u from Q ;
14 if width[u] = -infinity:
15 break ; // all remaining vertices are
16 end if // inaccessible from source
17
18 for each neighbor v of u: // where v has not yet been
19 // removed from Q.
20 alt := max(width[v], min(width[u], width_between(u, v))) ;
21 if alt > width[v]: // Relax (u,v,a)
22 width[v] := alt ;
23 previous[v] := u ;
24 decrease-key v in Q; // Reorder v in the Queue
25 end if
26 end for
27 end while
28 return width;
29 endfunction
Some (handwaving) explanation why this works: you start with the source. From there, you have infinite capacity to itself. Now you check all neighbors of the source. Assume the edges don't all have the same capacity (in your example, say (s, a) = 300). Then, there is no better way to reach b then via (s, b), so you know the best case capacity of b. You continue going to the best neighbors of the known set of vertices, until you reach all vertices.
Proof of correctness of algorithm:
At any point in the algorithm, there will be 2 sets of vertices A and B. The vertices in A will be the vertices to which the correct maximum minimum capacity path has been found. And set B has vertices to which we haven't found the answer.
Inductive Hypothesis: At any step, all vertices in set A have the correct values of maximum minimum capacity path to them. ie., all previous iterations are correct.
Correctness of base case: When the set A has the vertex S only. Then the value to S is infinity, which is correct.
In current iteration, we set
val[W] = max(val[W], min(val[V], width_between(V-W)))
Inductive step: Suppose, W is the vertex in set B with the largest val[W]. And W is dequeued from the queue and W has been set the answer val[W].
Now, we need to show that every other S-W path has a width <= val[W]. This will be always true because all other ways of reaching W will go through some other vertex (call it X) in the set B.
And for all other vertices X in set B, val[X] <= val[W]
Thus any other path to W will be constrained by val[X], which is never greater than val[W].
Thus the current estimate of val[W] is optimum and hence algorithm computes the correct values for all the vertices.
You could also use the "binary search on the answer" paradigm. That is, do a binary search on the weights, testing for each weight w whether you can find a path in the graph using only edges of weight greater than w.
The largest w for which you can (found through binary search) gives the answer. Note that you only need to check if a path exists, so just an O(|E|) breadth-first/depth-first search, not a shortest-path. So it's O(|E|*log(max W)) in all, comparable to the Dijkstra/Kruskal/Prim's O(|E|log |V|) (and I can't immediately see a proof of those, too).
Use either Prim's or Kruskal's algorithm. Just modify them so they stop when they find out that the vertices you ask about are connected.
EDIT: You ask for maximum minimum, but your example looks like you want minimum maximum. In case of maximum minimum Kruskal's algorithm won't work.
EDIT: The example is okay, my mistake. Only Prim's algorithm will work then.
I am not sure that Prim will work here. Take this counterexample:
V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (1, 4), (4, 2)}
weight function w:
w((1,2)) = .1,
w((2,3)) = .3
w((1,4)) = .2
w((4,2)) = .25
If you apply Prim to find the maxmin path from 1 to 3, starting from 1 will select the 1 --> 2 --> 3 path, while the max-min distance is attained for the path that goes through 4.
This can be solved using a BFS style algorithm, however you need two variations:
Instead of marking each node as "visited", you mark it with the minimum weight along the path you took to reach it.
For example, if I and J are neighbors, I has value w1, and the weight of the edge between them is w2, then J=min(w1, w2).
If you reach a marked node with value w1, you might need to remark and process it again, if assigning a new value w2 (and w2>w1). This is required to make sure you get the maximum of all minimums.
For example, if I and J are neighbors, I has value w1, J has value w2, and the weight of the edge between them is w3, then if min(w2, w3) > w1 you must remark J and process all it's neighbors again.
Ok, answering my own question here just to try and get a bit of feedback I had on the tentative solution I worked out before posting here:
Each node stores a "path fragment", this is the entire path to itself so far.
0) set current vertex to the starting vertex
1) Generate all path fragments from this vertex and add them to a priority queue
2) Take the fragment off the top off the priority queue, and set the current vertex to the ending vertex of that path
3) If the current vertex is the target vertex, then return the path
4) goto 1
I'm not sure this will find the best path though, I think the exit condition in step three is a little ambitious. I can't think of a better exit condition though, since this algorithm doesn't close vertices (a vertex can be referenced in as many path fragments as it likes) you can't just wait until all vertices are closed (like Dijkstra's for example)
You can still use Dijkstra's!
Instead of using +, use the min() operator.
In addition, you'll want to orient the heap/priority_queue so that the biggest things are on top.
Something like this should work: (i've probably missed some implementation details)
let pq = priority queue of <node, minimum edge>, sorted by min. edge descending
push (start, infinity) on queue
mark start as visited
while !queue.empty:
current = pq.top()
pq.pop()
for all neighbors of current.node:
if neighbor has not been visited
pq.decrease_key(neighbor, min(current.weight, edge.weight))
It is guaranteed that whenever you get to a node you followed an optimal path (since you find all possibilities in decreasing order, and you can never improve your path by adding an edge)
The time bounds are the same as Dijkstra's - O(Vlog(E)).
EDIT: oh wait, this is basically what you posted. LOL.

Resources