Graph data structure terms - data-structures

What is the difference between the terms edge and path in graph data structure?

An edge is something that connects two nodes. A path is a series of edges in sequence that defines a "path" from node A to node B.
http://en.wikipedia.org/wiki/Graph_(data_structure)

Edge: connects one node to another, so there are no nodes between node A and node B.
e.g. A<-->B or A-->B or A<--B.
Path: connects two or more nodes to each other through a sequence of edges, so a path contains one or more edges.
eg. 1.) A---B---C : here path is ABC
2.)
A
/ \
B C
/
D
Here different paths are A-B-C and A-C.
Different edges are: A-B, B-C, A-C.
I hope this clears your doubt

Edge is a connection between two vertices of the graph.
Consider the graph a b
6---4----5
| | \ e
c | d| 1
| | / f
3----2
g
Here a, b, c, d, e label edges of the graph, whereas a path from a to g can be a,b,d,g or a,c,g.

A vertex is a point/dot (maybe a starting point, midpoint or ending point), and an edge is the line joining two such points.
A path is a longer line (a sequence of points/dots joined by edges makes a path).

A graph is a 2-tuple G = (V, E), where:
V -> set of vertices (points/nodes or whatever you call them)
E -> set of edges (an edge is a line which connects any two vertices)
such that (v,u) belonging to E (the set of edges) implies that v and u belong to V (the set of vertices).
Now, when we talk about paths: a path is a series of connected edges that starts at one vertex and ends at another.
Then there are several types of graphs, e.g. connected/disconnected, directed/undirected, weighted/unweighted.
Further reading : http://en.wikipedia.org/wiki/Graph_(mathematics)
Hope it helps!!

An edge connects two nodes, and a path is a sequence of nodes and edges.
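As a rough illustration of the definitions in the answers above, here is a minimal Python sketch. The helper name `is_path` is made up for this example, the graph is assumed to be undirected, and the edges A-B, B-C and A-C are the ones listed in the earlier answer.

```python
# Undirected graph G = (V, E); frozenset makes the edge A-B equal to B-A.
V = {"A", "B", "C"}
E = {frozenset(("A", "B")), frozenset(("B", "C")), frozenset(("A", "C"))}


def is_path(vertices, edges):
    """Return True if each consecutive pair of vertices is joined by an edge.

    Hypothetical helper: it does not check for repeated vertices.
    """
    return all(frozenset(pair) in edges for pair in zip(vertices, vertices[1:]))


print(is_path(["A", "B", "C"], E))  # uses the edges A-B and B-C
print(is_path(["A", "C"], E))       # a single edge is also a path
```

An edge is therefore one element of E, while a path is a sequence of vertices whose consecutive pairs are all elements of E.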

Related

Longest common path between k graphs

I was looking at interview problems and came across this one, but failed to find a reliable solution.
Actual question was asked on Leetcode discussion.
Given multiple school children and the paths they took from their school to their homes, find the longest most common path (paths are given in order of steps a child takes).
Example:
child1 : a -> g -> c -> b -> e
child2 : f -> g -> c -> b -> u
child3 : h -> g -> c -> b -> x
result = g -> c -> b
Note: There could be multiple children. The input was in the form of steps and childID. For example, the input looked like this:
(child1, a)
(child2, f)
(child1, g)
(child3, h)
(child1, c)
...
Some suggested that longest common substring would work, but it does not. For example:
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
1 and 2 will give a-b-c, 3 and 4 will give o-p-f-g.
Combining those two results gives nothing, but the answer is f-g.
It is like a graph problem: how can we find the longest common path between k graphs?
You can construct a directed graph g with an edge a->b present if and only if it is present in all individual paths, then drop all nodes with degree zero.
The graph g will have no cycles. If it did, the same cycle would be present in all individual paths, and a path has no cycles by definition.
In addition, all in-degrees and out-degrees will be zero or one. For example, if a node a had in-degree greater than one, there would be two edges representing two students arriving at a from two different nodes. Such edges cannot appear in g by construction.
The graph will look like a disconnected collection of paths. There may be multiple paths with maximum length, or there may be none (an empty path if you like).
In the Python code below, I find all common paths and return one with maximum length. I believe the whole procedure is linear in the number of input edges.
import networkx as nx
path_data = """1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g"""
paths = [line.split(" ")[1].split("-") for line in path_data.split("\n")]
num_paths = len(paths)
# graph h will include all input edges
# edge weight corresponds to the number of students
# traversing that edge
h = nx.DiGraph()
for path in paths:
    for (i, j) in zip(path, path[1:]):
        if h.has_edge(i, j):
            h[i][j]["weight"] += 1
        else:
            h.add_edge(i, j, weight=1)
# graph g will only contain edges traversed by all students
g = nx.DiGraph()
g.add_edges_from((i, j) for i, j in h.edges if h[i][j]["weight"] == num_paths)
def longest_path(g):
    # assumes g is a disjoint collection of paths
    all_paths = list()
    for node in g.nodes:
        path = list()
        # only start walking from the first node of each path
        if g.in_degree[node] == 0:
            while True:
                path.append(node)
                try:
                    # follow the single outgoing edge, if there is one
                    node = next(iter(g[node]))
                except StopIteration:
                    break
            all_paths.append(path)
    if not all_paths:
        # handle the "empty path" case
        return []
    return max(all_paths, key=len)
print(longest_path(g))
# ['f', 'g']
Approach 1: With Graph construction
Consider this example:
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Draw a directed weighted graph.
I am a lazy person. So, I have not drawn the direction arrows but believe they are invisibly there. Edge weight is 1 if not marked on the arrow.
The answer is the longest chain in which every edge has the Maximum Edge Weight (MEW).
MEW is 4, our answer is FG.
Say AB & BC had edge weight 4, then ABC should be the answer.
The below example, which is the case of MEW < #children, should output ABC.
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-h
4 m-x-o-p-f-i
If some kid is like me, the kid will keep roaming multiple places before reaching home. In such cases, you might see MEW > #children and the solution would become complicated. I hope all the children in our input are obedient and they go straight from school to home.
Approach 2: Without Graph construction
If, luckily, the problem states that the longest common piece of path must be present in the paths of all the children, i.e. strictly MEW == #children, then you can solve it in an easier way. The picture below should give you a clue on what to do.
Take the below example
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Method 1:
Get longest common graph for first two: a-b-c, f-g (Result 1)
Get longest common graph for last two: o-p-f-g (Result 2)
Using Result 1 & 2 we get: f-g (Final Result)
Method 2:
Get longest common graph for first two: a-b-c, f-g (Result 1)
Take Result 1 and next graph i.e. m-n-o-p-f-g: f-g (Result 2)
Take Result 2 and next graph i.e. m-x-o-p-f-g: f-g (Final Result)
The beauty of the approach without graph construction is that even if kids roam same pieces of paths multiple times, we get the right solution.
If you go a step ahead, you could combine the approaches and use approach 1 as a sub-routine in approach 2.
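As a rough sketch of the pairwise folding in Method 2, the Python below keeps only the steps shared by every path and then walks the surviving chains. The function names `edge_set` and `longest_common_chain` are made up for this example, and the sketch assumes that no child visits the same node twice, so it does not cover the roaming case mentioned above.

```python
from functools import reduce


def edge_set(path):
    """Return the set of directed steps (u, v) taken along one path."""
    return set(zip(path, path[1:]))


def longest_common_chain(paths):
    """Fold the paths one by one, keeping only steps shared by all of them,
    then return the longest chain formed by the surviving steps.

    Assumption: paths is non-empty and no path repeats a node.
    """
    common = reduce(set.intersection, (edge_set(p) for p in paths))
    succ = dict(common)                      # u -> v for each surviving step
    starts = set(succ) - set(succ.values())  # nodes with no incoming common step
    best = []
    for node in starts:
        chain = [node]
        while chain[-1] in succ:
            chain.append(succ[chain[-1]])
        if len(chain) > len(best):
            best = chain
    return best


paths = [p.split("-") for p in
         ["a-b-c-d-e-f-g", "a-b-c-x-y-f-g", "m-n-o-p-f-g", "m-x-o-p-f-g"]]
print(longest_common_chain(paths))  # ['f', 'g']
```

Folding with a set intersection gives the same result whichever pair is combined first, which is why Method 1 and Method 2 agree.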

Modifying a Depth First Search Algorithm using an Adjacency Matrix to search for specific end-node

So I have an adjacency matrix of size N x N for a graph with N nodes. I would like to conduct a Depth First Search through this matrix in order to find if a path does or does not exist from a Source node to a Destination node. If it exists I would like to print the path.
The pseudocode below uses a matrix/graph G to find all vertices that can be reached from a starting node v. How would I modify this algorithm so I can have something similar to this: procedure DFS(G,v,d) where d is the target node I am searching for?
procedure DFS(G,v):
    label v as discovered
    for all edges from v to w in G.adjacentEdges(v) do
        if vertex w is not labeled as discovered then
            recursively call DFS(G,w)
Also as a sidenote, how would I add the ability to return the total weight of all the edges for the path that it discovered?
The algorithm needs to be modified in two ways:
it needs to stop when it finds the destination
it needs to produce a path to the destination
In the pseudocode below, the path variable P starts as an empty list. When the destination is found, the destination node is placed in P. Then, as each level of recursion returns, the current node v is appended to the path. When the top-level call returns, P contains the full path. There's only one problem: the path is in reverse, destination to source, so you'll have to turn it around.
procedure DFS(G,v,d,P=empty):
    if v equals d
        initialize P with d
        return P
    label v as discovered
    for all edges from v to w in G.adjacentEdges(v) do
        if vertex w is not labeled as discovered then
            set P to the result of recursively calling DFS(G,w,d,P)
            if P is not empty
                append v to P
                return P
    return empty
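A minimal Python sketch of the pseudocode above, extended to cover the side note about the total edge weight. It assumes the adjacency matrix stores the weight of edge i -> j in `matrix[i][j]` and uses 0 for "no edge"; that convention and the name `dfs_path` are assumptions, not part of the question. The path is built front to back here, so it does not need to be reversed.

```python
def dfs_path(matrix, source, target, visited=None):
    """Return (path, total_weight) for some path from source to target,
    or None if the target cannot be reached.

    Assumption: matrix[i][j] is the weight of edge i -> j, 0 meaning no edge.
    """
    if visited is None:
        visited = set()
    if source == target:
        return [source], 0
    visited.add(source)  # label source as discovered
    for w, weight in enumerate(matrix[source]):
        if weight and w not in visited:
            found = dfs_path(matrix, w, target, visited)
            if found is not None:
                path, total = found
                # prepend the current node and add the weight of the edge used
                return [source] + path, total + weight
    return None


# Hypothetical 4-node graph: 0 -> 1 (weight 2), 1 -> 2 (weight 3), 0 -> 3 (weight 1)
matrix = [
    [0, 2, 0, 1],
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
print(dfs_path(matrix, 0, 2))  # ([0, 1, 2], 5)
print(dfs_path(matrix, 3, 0))  # None
```

Note that DFS returns the first path it finds, which is not necessarily the shortest or the lightest one.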

sort graph by distance to end nodes

I have a list of nodes which belong in a graph. The graph is directed and does not contain cycles. Also, some of the nodes are marked as "end" nodes. Every node has a set of input nodes I can use.
The question is the following: How can I sort (ascending) the nodes in the list by the biggest distance to any reachable end node? Here is an example of what the graph could look like.
I have already added the calculated distance by which I can sort the nodes (grey). The end nodes have the distance 0 while C, D and G have the distance 1. However, F has the distance 3, even though the route over D would be shorter (2).
I have worked out a concept that I think would solve the problem. Here is some pseudo-code:
sortedTable<Node, depth> // used to store nodes and their currently calculated distance
tempTable<Node> // used to store nodes
currentDepth = 0;
- fill tempTable with end nodes
while (tempTable is not empty)
{
    - create empty newTempTable<Node node>
    // add tempTable to sortedTable
    for (every "node" in tempTable)
    {
        if ("node" is in sortedTable)
        {
            - overwrite depth in sortedTable with currentDepth
        }
        else
        {
            - add (node, currentDepth) to sortedTable
        }
        // get the node in the next layer
        for (every "newNode" connected to node)
        {
            - add newNode to newTempTable
        }
    }
    - tempTable = newTempTable
    currentDepth++;
}
This approach should work. However, the problem with this algorithm is that it basically creates a tree from the graph starting from every end node and then corrects old distance calculations at every depth. For example: G would have the depth 1 (calculated directly over B), then the depth 3 (calculated over A, D and F) and then depth 4 (calculated over A, C, E and F).
Do you have a better solution to this problem?
It can be done with dynamic programming.
The graph is a DAG, so first do a topological sort on the graph, let the sorted order be v1,v2,v3,...,vn.
Now, set D(v)=0 for every "end node", and from last to first (according to the topological order) do:
D(v) = max { D(u) + 1, for each edge (v,u) }
It works because the graph is a DAG: when the nodes are processed in reverse topological order, the values D(u) for all outgoing edges (v,u) are already known.
Example on your graph:
Topological sort (one possible):
H,G,B,F,D,E,C,A
Then, the algorithm:
init:
D(B)=D(A)=0
Go back from last to first:
D(A) - no out edges, done
D(C) = max{D(A) + 1} = max{0+1}=1
D(E) = max{D(C) + 1} = 2
D(D) = max{D(A) + 1} = 1
D(F) = max{D(E)+1, D(D)+1} = max{2+1,1+1} = 3
D(B) = 0
D(G) = max{D(B)+1,D(F)+1} = max{1,4}=4
D(H) = max{D(G) + 1} = 5
As a side note, if the graph is not a DAG but a general graph, this is a variant of the Longest Path Problem, which is NP-hard.
Luckily, it does have an efficient solution when our graph is a DAG.
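The recurrence above can be sketched in Python as follows. The function name `longest_distance_to_end` and the dict-of-successors input format are assumptions made for this example, and the edge list is read off the worked example in this answer. As in the answer, end nodes are fixed at 0; nodes that cannot reach any end node get None, which is a choice of this sketch.

```python
def longest_distance_to_end(edges, end_nodes):
    """edges maps each node to a list of its successors (the graph must be a DAG).

    Returns a dict node -> D(node), the biggest distance to a reachable end node.
    """
    nodes = set(edges) | {u for succs in edges.values() for u in succs} | set(end_nodes)
    order, seen = [], set()

    def visit(v):
        # DFS post-order: every successor is appended before v itself,
        # so `order` ends up in reverse topological order.
        if v in seen:
            return
        seen.add(v)
        for u in edges.get(v, ()):
            visit(u)
        order.append(v)

    for v in sorted(nodes):
        visit(v)

    dist = {}
    for v in order:
        if v in end_nodes:
            dist[v] = 0
            continue
        candidates = [dist[u] + 1 for u in edges.get(v, ()) if dist[u] is not None]
        dist[v] = max(candidates) if candidates else None
    return dist


edges = {"H": ["G"], "G": ["B", "F"], "F": ["E", "D"],
         "E": ["C"], "C": ["A"], "D": ["A"]}
dist = longest_distance_to_end(edges, {"A", "B"})
ranked = sorted((v for v in dist if dist[v] is not None), key=lambda v: (dist[v], v))
print(ranked)  # ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
```

Each node and edge is handled once, so the distances are computed in time linear in the size of the graph, with the final sort on top.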

What does boost::out_edges( v, g ) in Boost.Graph do?

I am not able to comprehend the documentation for this function. I have seen the following several times:
tie(ei, ei_end) = out_edges(*(vi+a), g);
g <- graph
vi <- beginning vertex of graph
a <- a node
ei and ei_end <- edge iterators
What does the function return, what does it do, and when could I use it?
Can I find all edges from a node, for example?
Provides iterators to iterate over the out-going edges of node u from graph g, e.g.:
typename graph_traits < Graph >::out_edge_iterator ei, ei_end;
for (boost::tie(ei, ei_end) = out_edges(u, g); ei != ei_end; ++ei) {
    auto source = boost::source ( *ei, g );
    auto target = boost::target ( *ei, g );
    std::cout << "There is an edge from " << source << " to " << target << std::endl;
}
where Graph is your type definition of the graph and g is an instance of it. The distinction between out_edges and in_edges only matters for graphs with directed edges. The opposite of out_edges is in_edges, which provides iterators over the incoming edges of a node.
In an undirected graph both out_edges and in_edges will return all the edges connecting to the node in question.
However, more information can be easily found on http://www.boost.org/doc/libs/1_55_0/libs/graph/doc/graph_concepts.html or just in the Boost.Graph examples/tests.
As explained above, for a directed graph, out_edges accepts a "vertex_descriptor and the graph(adjacency list) to be examined" and returns "all the edges that emanate (directed from) the given vertex_descriptor", by means of an iterator-range.
As described in https://www.boost.org/doc/libs/1_69_0/libs/graph/doc/adjacency_list.html
std::pair<out_edge_iterator, out_edge_iterator>
out_edges(vertex_descriptor u, const adjacency_list& g)
Returns an iterator-range providing access to the out-edges of vertex
u in graph g. If the graph is undirected, this iterator-range provides
access to all edges incident on vertex u. For both directed and
undirected graphs, for an out-edge e, source(e, g) == u and target(e,
g) == v where v is a vertex adjacent to u.
In short, to answer some of your questions,
Yes, you can use it to find all edges from a node.
For undirected graphs, the behavior is as explained in the link above, it returns all the edges incident on the vertex (all edges connected to it)

Algorithm: find connections between towns with a limit of train changes

What algorithm would you use to create an application that, given appropriate data (list of cities, train routes, train stations), is capable of returning a list of connections between any two user-selected cities? The application has to choose only those connections that fall within the limit of accepted train changes.
Example: I ask the application which train to take if I need to travel from Paris to Moscow with max. 1 stop/switch - the application returns a route: Train 1 (Paris-Berlin) -> Train 2 (Berlin->Moscow) (No direct connection exists).
Graphical example
http://i.imgur.com/KEJ3I.png
If I ask the system about possible connections from Town A to Town G I get a response:
Brown Line (0 switches = direct)
Brown Line to Town B / Orange Line to Town G (1 switch)
Brown Line to Town B / Orange Line to Town D / Red Line to G (2 switches)
... all other possibilities
And though the 2nd and 3rd options are shorter than the 1st, it's the 1st that should have priority (since no train-switching is involved).
Assuming the only thing important is "number of stops/switches", then the problem is actually finding a shortest path in an unweighted directed graph.
The graph model is G = (V,E) where V = {all possible stations} and E = { (u,v) | there is a train/route from station u to station v }
Note: let's say you have a train which starts at a_0 and passes through a_1, a_2,...,a_n: then E will contain (a_0,a_1),(a_0,a_2),..,(a_0,a_n) and also (a_1,a_2),(a_1,a_3),... Formally: for each i < j : (a_i,a_j) ∈ E.
BFS solves this problem, and is both complete [always finds a solution if there is one] and optimal [finds the shortest path].
If the edges [routes] are weighted, something like Dijkstra's algorithm will be needed instead.
If you want a list of all possible routes, Iterative-Deepening DFS could be used, without maintaining a visited set, and print all the paths found to the target up to the relevant depth. [BFS fails to return all paths with the counter example of a clique]
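The graph model and the BFS described above can be sketched in Python as follows. The line data is hypothetical (loosely modelled on the Brown/Orange/Red example in the question, not read from the image), and `build_graph` and `fewest_changes` are names made up for this sketch. Each BFS step is one train ride, so the number of switches is the number of legs minus one.

```python
from collections import deque


def build_graph(lines):
    """lines maps a line name to its ordered list of stations.

    For every i < j an edge (a_i, a_j) is added, labelled with the line name.
    """
    graph = {}
    for name, stops in lines.items():
        for i, u in enumerate(stops):
            for v in stops[i + 1:]:
                graph.setdefault(u, []).append((v, name))
    return graph


def fewest_changes(lines, start, goal):
    """Return the legs (line, from, to) of a route using the fewest trains,
    or None if the goal cannot be reached."""
    graph = build_graph(lines)
    parent = {start: None}  # station -> (previous station, line used)
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            break
        for v, name in graph.get(u, []):
            if v not in parent:
                parent[v] = (u, name)
                queue.append(v)
    if goal not in parent:
        return None
    legs, node = [], goal
    while parent[node] is not None:
        prev, name = parent[node]
        legs.append((name, prev, node))
        node = prev
    legs.reverse()
    return legs


# Hypothetical network, for illustration only.
lines = {"Brown": ["A", "B", "G"], "Orange": ["B", "D"], "Red": ["D", "E"]}
print(fewest_changes(lines, "A", "G"))  # [('Brown', 'A', 'G')]
print(fewest_changes(lines, "A", "D"))  # [('Brown', 'A', 'B'), ('Orange', 'B', 'D')]
```

To enforce a limit of k changes, a caller could reject any result with more than k + 1 legs.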
I think you need to compute all pairs shortest paths. Check http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm.
