Longest common path between k graphs - algorithm

I was looking at interview problems and came across this one, but failed to find a viable solution.
The actual question was asked on a LeetCode discussion board.
Given multiple school children and the paths they took from their school to their homes, find the longest most common path (paths are given in order of steps a child takes).
Example:
child1 : a -> g -> c -> b -> e
child2 : f -> g -> c -> b -> u
child3 : h -> g -> c -> b -> x
result = g -> c -> b
Note: There could be multiple children. The input was in the form of (childID, step) pairs. For example, the input looked like this:
(child1, a)
(child2, f)
(child1, g)
(child3, h)
(child1, c)
...
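Because the records for different children arrive interleaved, the first job is to regroup them into one ordered path per child. A minimal sketch, assuming the records are already in the order the steps were taken; `group_steps` is a hypothetical helper name, not part of the original question:

```python
from collections import defaultdict

def group_steps(records):
    """Group interleaved (child_id, step) records into one ordered path per child."""
    paths = defaultdict(list)
    for child_id, step in records:
        # assumption: records appear in the order the steps were taken
        paths[child_id].append(step)
    return dict(paths)

records = [("child1", "a"), ("child2", "f"), ("child1", "g"),
           ("child3", "h"), ("child1", "c")]
print(group_steps(records))
```

Each resulting list can then be treated as one child's path in the discussion that follows.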
Some suggested that longest common substring can work, but it will not. For example:
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Paths 1 and 2 will give abc; paths 3 and 4 give opfg.
Combining those two results gives nothing in common, but the answer is fg.
It's like a graph problem: how can we find the longest common path between k graphs?

You can construct a directed graph g with an edge a->b present if and only if it is present in all individual paths, then drop all nodes with degree zero.
The graph g will have no cycles. If it did, the same cycle would be present in all individual paths, and a path has no cycles by definition.
In addition, all in-degrees and out-degrees will be zero or one. For example, if a node a had in-degree greater than one, there would be two edges representing two students arriving at a from two different nodes. Such edges cannot appear in g by construction.
The graph will look like a disconnected collection of paths. There may be multiple paths with maximum length, or there may be none (an empty path if you like).
In the Python code below, I find all common paths and return one with maximum length. I believe the whole procedure is linear in the number of input edges.
import networkx as nx
path_data = """1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g"""
paths = [line.split(" ")[1].split("-") for line in path_data.split("\n")]
num_paths = len(paths)
# graph h will include all input edges
# edge weight corresponds to the number of students
# traversing that edge
h = nx.DiGraph()
for path in paths:
    for (i, j) in zip(path, path[1:]):
        if h.has_edge(i, j):
            h[i][j]["weight"] += 1
        else:
            h.add_edge(i, j, weight=1)
# graph g will only contain edges traversed by all students
g = nx.DiGraph()
g.add_edges_from((i, j) for i, j in h.edges if h[i][j]["weight"] == num_paths)
def longest_path(g):
    # assumes g is a disjoint collection of paths
    all_paths = list()
    for node in g.nodes:
        path = list()
        # only start walking from the first node of a path
        if g.in_degree[node] == 0:
            while True:
                path.append(node)
                try:
                    # follow the single outgoing edge, if there is one
                    node = next(iter(g[node]))
                except StopIteration:
                    break
            all_paths.append(path)
    if not all_paths:
        # handle the "empty path" case
        return []
    return max(all_paths, key=len)
print(longest_path(g))
# ['f', 'g']

Approach 1: With Graph construction
Consider this example:
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Draw a directed weighted graph.
I am a lazy person. So, I have not drawn the direction arrows but believe they are invisibly there. Edge weight is 1 if not marked on the arrow.
Find the longest chain in which every edge has the Maximum Edge Weight (MEW).
MEW is 4, so our answer is FG.
Say AB and BC had edge weight 4; then ABC would be the answer.
The example below, which is a case of MEW < #children, should output ABC.
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-h
4 m-x-o-p-f-i
If some kid is like me, the kid will keep roaming multiple places before reaching home. In such cases, you might see MEW > #children and the solution would become complicated. I hope all the children in our input are obedient and they go straight from school to home.
Approach 2: Without Graph construction
If, luckily, the problem states that the longest common piece of path must be present in the paths of all the children, i.e. strictly MEW == #children, then you can solve it in an easier way. The picture below should give you a clue on what to do.
Take the below example
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Method 1:
Get longest common graph for first two: a-b-c, f-g (Result 1)
Get longest common graph for last two: o-p-f-g (Result 2)
Using Result 1 & 2 we get: f-g (Final Result)
Method 2:
Get longest common graph for first two: a-b-c, f-g (Result 1)
Take Result 1 and next graph i.e. m-n-o-p-f-g: f-g (Result 2)
Take Result 2 and next graph i.e. m-x-o-p-f-g: f-g (Final Result)
The beauty of the approach without graph construction is that even if kids roam same pieces of paths multiple times, we get the right solution.
If you go a step ahead, you could combine the approaches and use approach 1 as a sub-routine in approach 2.
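When the common piece must appear in every child's path, the pairwise folding in Method 2 can be sketched as a running intersection of edge sets, followed by a walk along the surviving edges. This is a minimal sketch, assuming no child visits the same place twice (so each node has at most one outgoing common edge); the function names are hypothetical:

```python
def common_edges(paths):
    """Edges (u, v) that appear as consecutive steps in every path."""
    edge_sets = [set(zip(p, p[1:])) for p in paths]
    return set.intersection(*edge_sets) if edge_sets else set()

def longest_chain(edges):
    """Longest chain in a set of edges that forms disjoint simple paths."""
    successor = dict(edges)  # assumption: at most one outgoing edge per node
    starts = set(successor) - set(successor.values())
    best = []
    for start in starts:
        chain = [start]
        while chain[-1] in successor:
            chain.append(successor[chain[-1]])
        if len(chain) > len(best):
            best = chain
    return best

paths = [p.split("-") for p in
         ["a-b-c-d-e-f-g", "a-b-c-x-y-f-g", "m-n-o-p-f-g", "m-x-o-p-f-g"]]
print(longest_chain(common_edges(paths)))  # ['f', 'g']
```

Intersecting edge sets rather than node sets keeps the order of steps, which is what makes f-g a common path rather than just two common places.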

Related

Find largest ones after setting a coordinate to one

Interview Question:
You are given a grid of ones and zeros. You can arbitrarily select any point in that grid. You have to write a function which does two things:
1. If you choose e.g. coordinate (3,4) and it is zero, you need to flip it to a one. If it is a one, you need to flip it to a zero.
2. You need to return the largest contiguous region of ones, i.e. the ones have to be connected to at least one other one.
E.g.
[0,0,0,0]
[0,1,1,0]
[1,0,1,0]
The largest region here is the one with 3 ones. There is another region which has only a single one (found at coordinate (2,0)).
You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve.
My solution, which has time complexity O(num_row*num_col) each time the function is called:
import collections
def get_all_coordinates_of_ones(grid):
    set_ones = set()
    for i in range(len(grid)):
        for j in range(len(grid[0])):
            if grid[i][j]:
                set_ones.add((i, j))
    return set_ones
def get_largest_region(x, y, grid):
    num_row = len(grid)
    num_col = len(grid[0])
    # flip the chosen cell: 0 -> 1 or 1 -> 0
    grid[x][y] = 1 - grid[x][y]
    # get the coordinates of ones in the grid
    # Worst Case O(num_col * num_row)
    coordinates_ones = get_all_coordinates_of_ones(grid)
    largest_one = 0
    visited = set()
    while coordinates_ones:
        # BFS over one region, starting from any one not yet assigned to a region
        start = coordinates_ones.pop()
        queue = collections.deque([start])
        visited.add(start)
        count_one = 1
        while queue:
            x, y = queue.popleft()
            for new_x, new_y in ((x, y + 1), (x, y - 1), (x + 1, y), (x - 1, y)):
                if (0 <= new_x < num_row and 0 <= new_y < num_col):
                    if grid[new_x][new_y] == 1 and (new_x, new_y) not in visited:
                        # mark on enqueue so that no cell is counted twice
                        visited.add((new_x, new_y))
                        coordinates_ones.discard((new_x, new_y))
                        count_one += 1
                        queue.append((new_x, new_y))
        largest_one = max(largest_one, count_one)
    return largest_one
My Proposed modifications:
Use Union Find by rank. I encountered a problem: I union all the ones that are adjacent to each other, but when one of the coordinates is flipped, e.g. from one to zero, I will need to remove that coordinate from the region that it is connected to.
Questions are:
1. What is the fastest algorithm in terms of time complexity?
2. Using Union Find with rank entails removing a node. Is this the way to improve the time complexity? If so, is there an implementation of removing a node in union find online?
------------------------ EDIT ---------------------------------
Should we always subtract one more from sum(degree - 1 of each 'cut' vertex)? Here are two examples: in the first we need to subtract one more, and in the second we do not.
Block-cut tree example 1
The cut vertex is vertex B. The degree of vertex B in the block-cut tree is 2.
Sum(cardinality of each 'block' vertex): 2 (A,B) + 1 (B) + 3 (B,C,D) = 6
Sum(degree - 1 of each 'cut' vertex): 1 (B)
Block-cut size: 6 - 1 = 5, but it should be 4 (A, B, C, D). Here we need to subtract one more.
Block-cut tree example 2
Sum(cardinality of each 'block' vertex): 3 (A,B,C) + 1 (C) + 1 (D) + 3 (D,E,F) = 8
Sum(degree - 1 of each 'cut' vertex): 2 (C and D)
Block-cut size: 8 - 2 = 6, which is (A, B, C, D, E, F). Here there is no need to subtract one.
Without preprocessing:
1. Flip the cell in the matrix.
2. Consider the matrix as a graph where each '1' represents a node, and neighbor nodes are connected with an edge.
3. Find all connected components. For each connected component, store its cardinality.
4. Return the highest cardinality.
Note that O(V) = O(E) = O(num_row*num_col).
Step 3 takes O(V+E) = O(num_row*num_col), which is similar to your solution.
"You are to find an algorithm that will solve this where you will call that function many times. You need to ensure that your amortized run time is the lowest you can achieve."
That hints that you can benefit from preprocessing:
Preprocessing:
1. Consider the original matrix as a graph G where each '1' represents a node, and neighbor nodes are connected with an edge.
2. Find all connected components.
3. Construct the set of block-cut trees of G (one block-cut tree for each connected component of G).
Processing:
If you flip a '0' cell to '1':
  1. Find the neighbor connected components (0 to 4).
  2. Remove the old block-cut trees and construct a new block-cut tree for the merged component (optimizations are possible: in some cases, previous tree(s) may be updated instead of reconstructed).
If you flip a '1' cell to '0':
  If this cell is a 'cut' in a block-cut tree:
    1. Remove it from the block-cut tree.
    2. Remove it from each neighbor 'cut' vertex.
    3. Split the block-cut tree into several block-cut trees.
  Otherwise (this cell is part of only one 'block' vertex):
    Remove it from the 'block' vertex; if the block is empty, remove the vertex. If the block-cut tree is empty, remove it from the set of trees.
The size of a block-cut tree = sum(cardinality of each 'block' vertex) - sum(neighbor_blocks-1 of each 'cut' vertex).
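The size formula can be checked on a small case. The sketch below is only an illustration, assuming each block is given as the set of grid cells it contains (with cut vertices listed in every block they belong to) and that `block_cut_tree_size` is a hypothetical helper name:

```python
def block_cut_tree_size(blocks, cut_vertices):
    """Number of distinct cells in a component described by its blocks and cut vertices."""
    total = sum(len(block) for block in blocks)
    # a cut vertex shared by b blocks was counted b times; keep one copy
    overcount = sum(
        sum(1 for block in blocks if cut in block) - 1
        for cut in cut_vertices
    )
    return total - overcount

# chain of three blocks joined at the cut vertices C and D
blocks = [{"A", "B", "C"}, {"C", "D"}, {"D", "E", "F"}]
cut_vertices = {"C", "D"}
print(block_cut_tree_size(blocks, cut_vertices))  # 6
```

Here the blocks sum to 3 + 2 + 3 = 8 and each cut vertex is counted once too often, giving 8 - 2 = 6 cells.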
Block-cut trees are not as well known as other data structures, so I'm not sure if this is what the interviewer had in mind. If it is, they're really looking for someone well experienced with graph algorithms.

Julia : getting whole path from Dijkstra function in Graphs.jl

Is it possible to get the whole path from a source vertex to a destination vertex using the Dijkstra algorithm from the Graphs.jl module?
There is an update_vertex!(visitor, u, v, d) method that is invoked when the distance to a vertex is updated. Is the distance updated only when a new vertex that belongs to the shortest path is found? I am not really sure.
Thanks.
Edit:
According to the documentation, it is possible to reconstruct the shortest path using the Floyd-Warshall algorithm with the attributes dists and nexts, but I am not sure how. I would like to run it on my GenericGraph.
Any idea ?
The result of the algorithm contains a field called parents, as @DanGetz pointed out. For each node it holds the last node visited before arriving at it (i.e. its parent in the shortest path). Using the parents, you can backtrack the shortest path for each node with a simple recursive function:
spath(x, r, s) = x == s ? x : [spath(r.parents[x], r, s) x]
where r is the result of the Dijkstra algorithm and s is the source passed to it.
The shortest path for each of the nodes can be obtained with a list comprehension. Below is the result for the example in the documentation:
julia> [spath(x, r, 1) for x in g.vertices]
5-element Array{Any,1}:
1
1x3 Array{Int64,2}:
1 3 2
1x2 Array{Int64,2}:
1 3
1x4 Array{Int64,2}:
1 3 2 4
1x3 Array{Int64,2}:
1 3 5
There are probably better algorithms to do it (i.e. some dynamic programming method to remember paths for large graphs), but as an example, the recursive method does the job.
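The same backtracking can also be written as a loop instead of a recursion. The following is a sketch in Python rather than Julia, assuming the parents are available as a plain dictionary; the parent values are inferred from the output shown above and `path_to` is a hypothetical name:

```python
def path_to(parents, source, target):
    """Follow parent links from target back to source and return the path source..target."""
    path = [target]
    while path[-1] != source:
        path.append(parents[path[-1]])
    path.reverse()
    return path

# parent of each vertex on its shortest path from vertex 1
# (inferred from the example output; the source is its own parent here)
parents = {1: 1, 2: 3, 3: 1, 4: 2, 5: 3}
print([path_to(parents, 1, v) for v in sorted(parents)])
```

The loop walks each path once, so no intermediate arrays are concatenated along the way.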
A quick recursion code adapted to your dictionaries with multiple parents for each shortest path:
function spath(current, parents, source, current_path)
    if current == source || isempty(parents[current])
        return Any[[current; current_path]]
    end
    results = []
    for node in parents[current]
        results = [spath(node, parents, source, [current; current_path]); results]
    end
    results
end
Note that (a copy of) the current path is passed down as a parameter until the recursion reaches the source, at which point the whole shortest path is returned. Again, it is probably not the most efficient implementation (I'm not a Julia guru) but it does the job.
For your example:
julia> parents = {2=>[1,3],3=>[1],1=>[]}
julia> [(i, spath(i, parents, 1, [])) for i in keys(parents)]
3-element Array{Tuple{Any,Array{Any,1}},1}:
(2,Any[Any[1,3,2],Any[1,2]])
(3,Any[Any[1,3]])
(1,Any[Any[1]])
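For comparison, the multi-parent case can be sketched in Python as well. This is an illustration only, assuming `parents` maps each node to a list of its possible predecessors and that the parent links contain no cycles; `all_shortest_paths` is a hypothetical name:

```python
def all_shortest_paths(parents, source, target):
    """Enumerate every path source..target that follows the parent links."""
    if target == source or not parents[target]:
        return [[target]]
    return [
        path + [target]
        for parent in parents[target]
        for path in all_shortest_paths(parents, source, parent)
    ]

parents = {2: [1, 3], 3: [1], 1: []}
for node in parents:
    print(node, all_shortest_paths(parents, 1, node))
```

The number of paths can grow exponentially with the number of ties, so enumerating them all is only sensible for small graphs.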

minimum weight vertex cover of a tree

There's an existing question dealing with trees where the weight of a vertex is its degree, but I'm interested in the case where the vertices can have arbitrary weights.
This isn't homework but it is one of the questions in the algorithm design manual, which I'm currently reading; an answer set gives the solution as
1. Perform a DFS; at each step update Score[v][include], where v is a vertex and include is either true or false.
2. If v is a leaf, set Score[v][false] = 0 and Score[v][true] = wv, where wv is the weight of vertex v.
3. During the DFS, when moving up from the last child of the node v, update Score[v][include]: Score[v][false] = Sum for c in children(v) of Score[c][true] and Score[v][true] = wv + Sum for c in children(v) of min(Score[c][true]; Score[c][false])
4. Extract the actual cover by backtracking Score.
However, I can't actually translate that into something that works. (In response to the comment: what I've tried so far is drawing some smallish graphs with weights and running through the algorithm on paper, up until step four, where the "extract actual cover" part is not transparent.)
In response to Ali's answer: So suppose I have this graph, with the vertices given by A etc. and the weights in parens after:
A(9)---B(3)---C(2)
 \       \
  E(1)    D(4)
The right answer is clearly {B,E}.
Going through this algorithm, we'd set values like so:
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6; score[B][true] = 3
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4; score[A][true] = 12
Ok, so, my question is basically, now what? Doing the simple thing and iterating through the score vector and deciding what's cheapest locally doesn't work; you only end up including B. Deciding based on the parent and alternating also doesn't work: consider the case where the weight of E is 1000; now the correct answer is {A,B}, and they're adjacent. Perhaps it is not supposed to be confusing, but frankly, I'm confused.
There's no actual backtracking done (or needed). The solution uses dynamic programming to avoid backtracking, since that would take exponential time. My guess is that "backtracking Score" means the Score contains the partial results you would get by doing backtracking.
A vertex cover of a tree may contain both alternating and adjacent vertices. What it may not do is exclude two adjacent vertices, because it must cover every edge.
The answer is given in the way the Score is recursively calculated. The cost of not including a vertex is the cost of including all of its children. However, the cost of including a vertex is its weight plus, for each child, whichever is cheaper: including that child or not including it, because both are allowed.
As your solution suggests, the scores can be computed with a DFS in post-order. The cover is then read off from the root downwards: include a vertex if its parent was excluded (otherwise we'd be excluding two adjacent vertices) or if the Score says including it is cheaper. The choice for a vertex depends on the choice made for its parent, so it cannot be settled while the scores are still being computed.
Here's some pseudocode:
compute_scores(v)
    for c in children(v)
        compute_scores(c)
    Score[v][false] = Sum for c in children(v) of Score[c][true]
    Score[v][true] = v weight + Sum for c in children(v) of min(Score[c][true]; Score[c][false])
extract_cover(v, parent_included)
    if not parent_included or Score[v][true] < Score[v][false] then
        add v to cover
        for c in children(v)
            extract_cover(c, true)
    else
        for c in children(v)
            extract_cover(c, false)
call compute_scores(root), then extract_cover(root, true)
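A minimal Python sketch of the scoring and of reading off the cover, assuming the tree is given as a dictionary of child lists and rooted at a chosen vertex (the names are hypothetical). The cover is extracted from the root downwards, because whether a vertex may be left out depends on whether its parent was included:

```python
def min_weight_vertex_cover(children, weight, root):
    """Return (cost, cover) of a minimum weight vertex cover of a rooted tree."""
    # pre-order list; its reverse visits every child before its parent
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children.get(v, []))

    score = {}  # score[v] = (cost with v excluded, cost with v included)
    for v in reversed(order):
        kids = children.get(v, [])
        excluded = sum(score[c][1] for c in kids)
        included = weight[v] + sum(min(score[c]) for c in kids)
        score[v] = (excluded, included)

    cover = set()
    stack = [(root, True)]  # the root has no parent edge that needs covering
    while stack:
        v, parent_included = stack.pop()
        take = (not parent_included) or score[v][1] < score[v][0]
        if take:
            cover.add(v)
        stack.extend((c, take) for c in children.get(v, []))
    return min(score[root]), cover

children = {"A": ["B", "E"], "B": ["C", "D"]}
weight = {"A": 9, "B": 3, "C": 2, "D": 4, "E": 1}
print(min_weight_vertex_cover(children, weight, "A"))
```

On the graph from the question this yields a cost of 4 with the cover {B, E}; with the weight of E raised to 1000 it yields 12 with {A, B}.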
It doesn't actually mean anything confusing; it is just dynamic programming, and you seem to understand almost all of the algorithm. To make it clearer:
1. First perform a DFS on your graph and find the leaves.
2. For every leaf, assign values as the algorithm says.
3. Now start from the leaves and assign values to each leaf's parent using that formula.
4. Keep assigning values to the parents of nodes that already have values until you reach the root of your graph.
That is all. By "backtracking", your algorithm means that you assign a value to each node whose children already have values. As I said above, this way of solving a problem is called dynamic programming.
Edit, to address the changes in your question. You have the following graph and the answer is clearly B, E, but you thought this algorithm gives you only B. That is incorrect: this algorithm gives you B and E.
A(9)---B(3)---C(2)
 \       \
  E(1)    D(4)
score[D][false] = 0; score[D][true] = 4
score[C][false] = 0; score[C][true] = 2
score[B][false] = 6 (this means we use C and D); score[B][true] = 3 (this means we use B)
score[E][false] = 0; score[E][true] = 1
score[A][false] = 4 (this means we use B and E); score[A][true] = 12 (this means we use B and A)
You select 4, so you must use B and E. If it were just B, your answer would be 3, but as you correctly found, your answer is 4 = 3 + 1 = B + E.
Also, when E = 1000:
A(9)---B(3)---C(2)
 \       \
  E(1000) D(4)
it is 100% correct that the answer is B and A; it would be wrong to use E just because you don't want to select adjacent nodes. With this algorithm you will find that the answer is A and B, and you can confirm it just by checking. Consider these covers:
C D A = 15
B E = 1003
A B = 12
Although the first two covers have no adjacent nodes, they cost more than the last one, which does have adjacent nodes. So it is best to use A and B for the cover.

How to store a map and generate a graph with BFS in Ruby

So I guess this is a classical question for somebody with an MSc in CS.
I have N elements and the distances between them. Let's say I have 3 elements with the following distances. It is symmetric, so
A -> B == B -> A
It looks like a matrix:
     A    B    C
A    0   10   20
B   10    0   30
C   20   30    0
My question would be:
1. How can I store this efficiently (what data structure)?
2. What is the most efficient way to get a linked list where the sum of the distances is minimal?
In this case the best is
B -> A -> C = 30, which equals C -> A -> B
The other case:
A -> B -> C = 40, which equals C -> B -> A
I had the impression that BFS might be suitable for this. A link to documentation in English would be good, even books on Amazon...
The ideal solution for your data structure is an adjacency list.
An adjacency list is simply a list of objects (representing vertices on your graph). Each object has a list containing all the vertices that it has an adjacent edge to and the corresponding weight.
In Ruby, a simple implementation might look something like:
vertices = {}
a = Vertex.new
b = Vertex.new
a.add(b, 10)
b.add(a, 10)
vertices[a] = a
vertices[b] = b
The algorithm to find the shortest, weighted path is called Dijkstra's.
If you would like to get the shortest path after running the algorithm, you can do a traceback. This is done by storing the (optimal) parent of each node as you reach it. In order to do this, you could keep a hash that maps each visited node to the node which led to it with the least cost.
Once you have finished the algorithm, your recursive traceback may look something like this:
def traceback(parent, start, node, path)
  if(start == node)
    (path + start.to_s).reverse
  else
    path += node.to_s + " "
    traceback(parent, start, parent[node], path)
  end
end
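To show how the adjacency list, the parent hash and the traceback fit together, here is a sketch in Python rather than Ruby, assuming the graph is stored as a dictionary of dictionaries; the function names are hypothetical. It computes shortest paths from one source, which is what Dijkstra's algorithm provides; it does not by itself choose an ordering that visits all elements:

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, parent) for shortest paths from source in a weighted graph."""
    dist = {source: 0}
    parent = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u  # remember which node led here most cheaply
                heapq.heappush(heap, (nd, v))
    return dist, parent

def traceback(parent, start, node):
    """Rebuild the path start..node from the parent map."""
    path = [node]
    while node != start:
        node = parent[node]
        path.append(node)
    return path[::-1]

# adjacency list for the symmetric distance matrix in the question
graph = {"A": {"B": 10, "C": 20},
         "B": {"A": 10, "C": 30},
         "C": {"A": 20, "B": 30}}
dist, parent = dijkstra(graph, "A")
print(dist, traceback(parent, "A", "C"))
```

Storing nodes in a list instead of concatenating strings avoids reversing the characters of multi-character node names.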
I hear Dijkstra has an algorithm to navigate a weighted graph.

Paths in complete graph

I have a friend that needs to compute the following:
In the complete graph K_k (k <= 13), there are k*(k-1)/2 edges.
Each edge can be directed in 2 ways, hence there are 2^[(k*(k-1))/2] different cases.
She needs to compute P[A !-> B && C !-> D] - P[A !-> B]*P[C !-> D]
X !-> Y means "there is no path from X to Y", and P[ ] is the probability.
So the brute-force algorithm is to examine every one of the 2^[(k*(k-1))/2] different graphs, and since they are complete, in each graph one only needs to consider one set of A, B, C, D because of symmetry.
P[A !-> B] is then computed as "number of graphs with no path from node 1 to node 2" divided by the total number of graphs, i.e. 2^[(k*(k-1))/2].
The brute-force method works in Mathematica up to K8, but she needs K9, K10, ... up to K13.
We obviously don't need to find the shortest path in each case; we just want to know whether there is one.
Does anyone have optimization suggestions? (This sounds like a typical Project Euler problem.)
Example:
The minimal graph K4 has 4 vertices, giving 6 edges. Hence there are 2^6 = 64 possible ways to assign directions to the edges, if we label the 4 vertices A, B, C and D.
In some graphs, there is NOT a path from A to B (let's say X of them), and in some others, there is no path from C to D (let's say Y). But in some graphs, there is no path from A to B and at the same time no path from C to D. These are W.
So P[A !-> B]=X/64, P[C !-> D]=Y/64 and P[A !-> B && C !-> D] = W/64.
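The counts X, Y and W can be obtained for small k by direct enumeration. The following is a brute-force sketch only, assuming vertices 0, 1, 2, 3 play the roles of A, B, C, D; the function names are hypothetical, and the running time grows as 2^(k*(k-1)/2), so it does not address the larger cases:

```python
from itertools import combinations, product

def reachable(adj, start):
    """Set of vertices reachable from start by directed paths."""
    seen, stack = {start}, [start]
    while stack:
        for v in adj[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def count_no_path(k):
    """Return (X, Y, W, total) over all orientations of the complete graph on k vertices."""
    pairs = list(combinations(range(k), 2))
    x = y = w = 0
    for bits in product((0, 1), repeat=len(pairs)):
        adj = [[] for _ in range(k)]
        for (u, v), bit in zip(pairs, bits):
            if bit:
                adj[u].append(v)  # edge directed u -> v
            else:
                adj[v].append(u)  # edge directed v -> u
        no_ab = 1 not in reachable(adj, 0)
        no_cd = 3 not in reachable(adj, 2)
        x += no_ab
        y += no_cd
        w += no_ab and no_cd
    return x, y, w, 2 ** len(pairs)

x, y, w, total = count_no_path(4)
print(w / total - (x / total) * (y / total))
```

Using `fractions.Fraction` instead of floating point division would keep the result exact.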
Update:
A, B, C and D are 4 different vertices, hence we need at least K4.
Observe that we are dealing with DIRECTED graphs, so the normal representation with UT-matrices won't suffice.
There is a function in Mathematica that finds the distance between nodes in a directed graph (if it returns infinity, there is no path), but this is a little bit overkill since we don't need the distance, just whether there is a path or not.
I have a theory, but I don't have mathematica to test it with, so here goes. (And please excuse my mistakes in terminology, I'm not really familiar with graph theory.)
I agree that there are 2^(n*(n-1)/2) different directed Kn graphs. The question is how many of those contain a path A->B. Call that number S(n).
Suppose we know S(n) for some n, and we want to add another node, X, and calculate S(n+1). We will look for paths X->A.
There are 2^n ways to connect X to the preexisting graph.
The edge X-A might point in the "right" direction (X->A); there are 2^(n-1) ways to connect X this way, and it will lead to a path for any of the 2^(n*(n-1)/2) different Kn graphs.
If X-A points to X, try the edge X-B. If X-B points to B (and there are 2^(n-2) such ways to connect X) then some Kn graphs will give a path B->A, S(n) of them in fact.
If X-B points to X, try X-C; there are 2^(n-3)S(n) successful graphs there.
If my math is correct, S(n+1) = 2^((n+2)(n-1)/2) + (2^(n-1)-1)S(n)
So this gives the following:
S(2) = 1
S(3) = 5
S(4) = 47
S(5) = 841
S(6) = 28999
Can someone check this? Or give a closed form for S(n)?
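One way to check the numbers for small n is direct enumeration. This is a brute-force sketch, assuming vertices 0 and 1 play the roles of A and B (the function names are hypothetical). Enumerating all orientations gives 1, 5 and 48 for n = 2, 3, 4, so the recurrence above agrees for n <= 3 but gives 47 rather than 48 at n = 4:

```python
from itertools import combinations, product

def count_with_path(n):
    """Count orientations of the complete graph on n vertices with a directed path 0 -> 1."""
    pairs = list(combinations(range(n), 2))
    count = 0
    for bits in product((0, 1), repeat=len(pairs)):
        succ = {v: set() for v in range(n)}
        for (u, v), bit in zip(pairs, bits):
            if bit:
                succ[u].add(v)
            else:
                succ[v].add(u)
        # depth-first search from vertex 0
        seen, stack = {0}, [0]
        while stack:
            for v in succ[stack.pop()]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        count += 1 in seen
    return count

def recurrence(n_max):
    """Values S(2)..S(n_max) from the recurrence stated in the answer."""
    s = {2: 1}
    for n in range(2, n_max):
        s[n + 1] = 2 ** ((n + 2) * (n - 1) // 2) + (2 ** (n - 1) - 1) * s[n]
    return s

for n in range(2, 5):
    print(n, count_with_path(n), recurrence(4)[n])
```

The enumeration is only practical for very small n, but that is enough to test a proposed recurrence before relying on it.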
EDIT:
I see now that the hard part is this P[A !-> B && C !-> D]. But I think the recursion approach will still work: start with {A,B,C,D}, then keep adding points, keeping track of the number of graphs in which A->(a points), (b points)->B, C->(c points) and (d points)->D, keeping the desired constraint. Ugly, but tractable.
The brute force approach of considering all graphs will not get you much further; you'll have to consider more than one graph at a time.
For 8 you have 2^28 ~ 256 million graphs.
9: 2^36 ~ 64 billion
10: 2^45 ~ 32 trillion
11: 2^55 > 10^16
12: 2^66 > 10^19
13: 2^78 > 10^23
For the purpose of finding paths the interesting part is the partial ordering on the strongly connected components of the graph. Actually the ordering must be total, because there is an edge between any two nodes.
So you could try to consider total orderings instead; there are certainly a lot fewer of those than graphs.
I think that representing the graph using a matrix will be very helpful.
If A !-> B, put 0 in the A-th row and B-th column.
Put 1 everywhere else.
Count the number of 0s = Z.
Then P[A !-> B] = 1 / 2^Z
=> P[A !-> B && C !-> D] - P[A !-> B]*P[C !-> D] = 1/2^2 - 1/2^(X-2) // Something is wrong here, I'm fixing it
where X = k(k-1)/2
  A B C D
A . 0 1 1
B . . 1 1
C . . . 1
D . . . .
NOTE: We can use the upper triangle without loss of generality.

Resources