Determining the distance between pairs of nodes in a BST tree - data-structures

I want to write a psudo-code to find the distance between 2 nodes in a BST.
I have already implemented LCA function.
Here is my attempt:
findDistance(root,p,q){
if(root==null) return -1;
Node LCA = findLCA(root,p,q);
d1=distance(p,LCA);
d2=distance(q,LCA);
return abs(d1-d2);
}
My only problem here is that I don't know how to compute the distance between a node to its LCA.
Any help will be amazing!
Thanks!

Finding the LCA is helpful, but you could determine the length of the path while determining the LCA.
I would first define a helper function which will give the path from the root to the given node. As the root is the only node without a parent, we don't actually need to provide the root to this function:
function getPath(p) {
let path = [];
// Walk up the tree and log each node
for (let node = p; node != null; node = node.parent) {
path.push(node);
}
// We want the root to be first element of path, and p to be the last:
return path.reverse();
}
Now with this function we can collect the two paths for the two given nodes. Then we can ignore the common prefix in these two paths (the last common node is the LCA). The remaining lengths of the paths should be summed to get the final result:
function findDistance(p, q) {
let path1 = getPath(p);
let path2 = getPath(q);
let len = min(path1.length, path2.length);
// Find the index where the paths diverge
let i = 0;
while (i < len && path1[i] == path2[i]) {
i++;
}
// LCA is at path1[i-1] == path2[i-1]
// Subtract the nodes that the paths have in common (from both):
return path1.length + path2.length - i*2;
}

Related

how should I find the shortest path in a single direction tree

I have a graph that is not binary, and is of single direction. It can look like the following:
I would like to find the shortest path, for example Team "G" to Team "D". What methods can I use?
By "of single direction" I imagine what you mean is that the tree is represented by nodes which only point at the children, not at the parents. Given that, user3386109's comment provides a simple way to get the answer you are looking for.
Find the path from the root to the first node by doing any tree traversal, like in-order (even if the order of children is insignificant, in practice, there will be some way to enumerate them in some order), and record the sequence of nodes from the root to the first node. In your example, we would get G-B-A (assuming a recursive solution where we are printing these nodes in reverse order).
Find the path from the root to the second node in the same way. Here, we'd get a path like D-A.
Find the first common ancestor of the two nodes; assuming the nodes are labeled uniquely as in this example, we can simply find the first symbol in either string of nodes, that is also in the other string of nodes. Regardless of which string we start with, we should get the same answer, as we do in your example: A
Chop off everything in both strings after the common ancestor, reverse one of the strings, and concatenate them with the common ancestor in the middle. This gives a path starting with node1 going to node2. Note that the problem asks for the shortest path; however, in a tree, there will be exactly one path between any two nodes. In your example, we get G-B-A-D.
Some pseudocode...
class Node {
char label;
Node[] children;
}
string FindPath(Node root, Node node1, Node node2) {
// make sure we have valid inputs
if (root == null || node1 == null || node2 == null) return null;
// look for paths to the nodes from the root
string path1 = FindPath(root, node1);
string path2 = FindPath(root, node2);
// one of the nodes wasn't found
if (path1 == null || path2 == null) return null;
// look for first common node
// note: this isn't the most efficient approach, see
// comments on time complexity below
for (int i = 0; i < path1.Length; i++) {
char label = path1[i];
if (path2.Contains(label) {
path1 = path1.Substring(0, i);
path2 = path2.Substring(0, path2.IndexOf(label);
return path1 + label + ReverseString(path2);
}
}
// will never reach here because it's guaranteed we will find
// a common ancestor in a tree
throw new Exception("Unreachable statement");
}
string FindPath(Node root, Node node) {
// make sure inputs are valid
if (root == null || node == null) return null;
if (root.label == node.label) {
// found the node
return node.label;
} else {
// this is not the node, exit early if no children
if (root.children == null || root.children.Count == 0) return null;
// check each child and if we find a path, return it
foreach (Node child in root.children) {
string path = FindPath(child, node);
if (path != null && path.Length > 0) {
return path + root.label;
}
}
// it's possible that the target node is not in this subtree
return null;
}
}
In terms of the number of nodes in the tree, the complexity should look like...
Getting the path from the root to each node should at worst visit each node in the tree, so O(|V|)
Looking for the first common node... well, a naive approach would be O(|V|^2) if we use the code as proposed above, since in the worst case, the tree is split in two long branches. But, we could be a little smarter and start at the end and work our way back as long as the strings match, and return the last matching node we saw as soon as we see that they don't match anymore (or we run out of symbols to check). Coding that is left as an exercise and that should be O(|V|)

Traversal directed graph with cycles

I wrote a script to construct a directed graph using networkx in python, and I want to get all possible path from start to end including cycles.
For example, there is a directed graph:
I want to get these paths:
A->B->D
A->B->C->D
A->B->C->B->D
A->B->C->B->C->B->D
...
As far as I know, there are many algorithms to find shortest paths or paths without cycles between 2 nodes, but I want to find paths with cycles.
Is there any algorithm to achieve this ?
Thx a lot
As noted, there is an infinite number of such paths.
However, you can still generate all of them in a lazy way by maintaining all nodes v (and path you used to reach v) you can reach from the start node in k steps for k=1,2,...; if v is your target node, remember it.
When you have to return the next path, (i) pop the first target node off list, and (ii) generate the next candidates for all non-target nodes on the list. If there is no target node on the list, repeat (ii) until you find one.
The method works assuming the path always exists. If you don't find a path in n-1 steps, where n is the number of nodes, simply report that no path exists.
Here's the pseudo code for an algorithm that generates paths from shortest to longest assuming unit weights:
class Node {
int steps
Node prev
Node(int steps=0, Node prev=null) {
prev = prev
steps = steps
}
}
class PathGenerator {
Queue<Node> nodes
Node start, target;
PathGenerator(Node start, Node target) {
start = start
target = target
nodes = new Queue<>()
nodes.add(start) // assume start.steps=0 and stat.prev=null
}
Node nextPath(int n) {
current_length = -1;
do {
node = nodes.poll()
current_length = node.steps
// expand to all others you can reach from node
for each u in node.neighbors()
list.add(new Node(node, node.steps+1))
// if node is the target, return the path
if (node == target)
return node
} while (current_length < n);
throw new Exception("no path of length <=n exists");
}
}
Beware that the list nodes can grow exponentially in the worst case (think of what happens in case you run it on a complete graph).

Remove all nodes which don’t lie in any path with sum>= k

I was reading the topic here and reading the code- http://www.geeksforgeeks.org/remove-all-nodes-which-lie-on-a-path-having-sum-less-than-k/
However I am stuck at a point. In the following functions of the code:
struct Node *pruneUtil(struct Node *root, int k, int *sum)
{
// Base Case
if (root == NULL) return NULL;
// Initialize left and right sums as sum from root to
// this node (including this node)
int lsum = *sum + (root->data);
int rsum = lsum;
// Recursively prune left and right subtrees
root->left = pruneUtil(root->left, k, &lsum);
root->right = pruneUtil(root->right, k, &rsum);
// Get the maximum of left and right sums
*sum = max(lsum, rsum);
// If maximum is smaller than k, then this node
// must be deleted
if (*sum < k)
{
free(root);
root = NULL;
}
return root;
}
// A wrapper over pruneUtil()
struct Node *prune(struct Node *root, int k)
{
int sum = 0;
return pruneUtil(root, k, &sum);
}
I have following two queries:
In the function *prune, we call the function pruneUTIL send the address of sum and then in function pruneUTIL, we use *sum everywhere.Why do we send the address? What's the point of using address of sum?
Why don't we send simple the integer sum?
Why do we calculate max of lsum and rsum?
Can anybody solve the query?
The point of using the pointer to the sum variable is that the variable sum will be modifiable. If you just send one sum then the change in value will not be reflected when the function returns . Eg
int main()
{
int v=1;
goot(v);
gootp(&v)l
}
void goot(int val)
{
val=10;
}
void gootp(int * val)
{
*val=10;
}
After goot return the value of v will not be 10 it will remain to be 1
but when a call to gootp is made the value will be modified to 10.
And here we want the change to be reflected as for each node we want sum to be the maximum of all the paths it can take to reach any leaf + the value calculated from root to the node we could have simply have returned the value but here we are deleting nodes so we need to return the nodes if they are null then left/right child need to be modified as only one thing can be returned so we use pointer to this variable
So we do not send the variable sum because when the function pruneUtil return for left and right child of a node then the value of sum calculated for those subtrees would be lost. So that we can compare the sum of the path because we need to add the value of nodes below the node to calculate the sum of the path. If we just send a variable its changes could not be seen in the parent function(as we can not return the value as it is being used to return the node).
We calculate max of left and right because when for a node there are two paths going towards leaf one via its left and other via its right child (these paths further will divide) so if either of these paths have sum value > k then we need not to delete the node. So we take max of left and right its done for each node so for each node we get the maximum value of the path from root to leaf.

Finding all the shortest paths between two nodes in unweighted undirected graph

I need help finding all the shortest paths between two nodes in an unweighted undirected graph.
I am able to find one of the shortest paths using BFS, but so far I am lost as to how I could find and print out all of them.
Any idea of the algorithm / pseudocode I could use?
As a caveat, remember that there can be exponentially many shortest paths between two nodes in a graph. Any algorithm for this will potentially take exponential time.
That said, there are a few relatively straightforward algorithms that can find all the paths. Here's two.
BFS + Reverse DFS
When running a breadth-first search over a graph, you can tag each node with its distance from the start node. The start node is at distance 0, and then, whenever a new node is discovered for the first time, its distance is one plus the distance of the node that discovered it. So begin by running a BFS over the graph, writing down the distances to each node.
Once you have this, you can find a shortest path from the source to the destination as follows. Start at the destination, which will be at some distance d from the start node. Now, look at all nodes with edges entering the destination node. A shortest path from the source to the destination must end by following an edge from a node at distance d-1 to the destination at distance d. So, starting at the destination node, walk backwards across some edge to any node you'd like at distance d-1. From there, walk to a node at distance d-2, a node at distance d-3, etc. until you're back at the start node at distance 0.
This procedure will give you one path back in reverse order, and you can flip it at the end to get the overall path.
You can then find all the paths from the source to the destination by running a depth-first search from the end node back to the start node, at each point trying all possible ways to walk backwards from the current node to a previous node whose distance is exactly one less than the current node's distance.
(I personally think this is the easiest and cleanest way to find all possible paths, but that's just my opinion.)
BFS With Multiple Parents
This next algorithm is a modification to BFS that you can use as a preprocessing step to speed up generation of all possible paths. Remember that as BFS runs, it proceeds outwards in "layers," getting a single shortest path to all nodes at distance 0, then distance 1, then distance 2, etc. The motivating idea behind BFS is that any node at distance k + 1 from the start node must be connected by an edge to some node at distance k from the start node. BFS discovers this node at distance k + 1 by finding some path of length k to a node at distance k, then extending it by some edge.
If your goal is to find all shortest paths, then you can modify BFS by extending every path to a node at distance k to all the nodes at distance k + 1 that they connect to, rather than picking a single edge. To do this, modify BFS in the following way: whenever you process an edge by adding its endpoint in the processing queue, don't immediately mark that node as being done. Instead, insert that node into the queue annotated with which edge you followed to get to it. This will potentially let you insert the same node into the queue multiple times if there are multiple nodes that link to it. When you remove a node from the queue, then you mark it as being done and never insert it into the queue again. Similarly, rather than storing a single parent pointer, you'll store multiple parent pointers, one for each node that linked into that node.
If you do this modified BFS, you will end up with a DAG where every node will either be the start node and have no outgoing edges, or will be at distance k + 1 from the start node and will have a pointer to each node of distance k that it is connected to. From there, you can reconstruct all shortest paths from some node to the start node by listing of all possible paths from your node of choice back to the start node within the DAG. This can be done recursively:
There is only one path from the start node to itself, namely the empty path.
For any other node, the paths can be found by following each outgoing edge, then recursively extending those paths to yield a path back to the start node.
This approach takes more time and space than the one listed above because many of the paths found this way will not be moving in the direction of the destination node. However, it only requires a modification to BFS, rather than a BFS followed by a reverse search.
Hope this helps!
#templatetypedef is correct, but he forgot to mention about distance check that must be done before any parent links are added to node. This means that se keep the distance from source in each of nodes and increment by one the distance for children. We must skip this increment and parent addition in case the child was already visited and has the lower distance.
public void addParent(Node n) {
// forbidding the parent it its level is equal to ours
if (n.level == level) {
return;
}
parents.add(n);
level = n.level + 1;
}
The full java implementation can be found by the following link.
http://ideone.com/UluCBb
I encountered the similar problem while solving this https://oj.leetcode.com/problems/word-ladder-ii/
The way I tried to deal with is first find the shortest distance using BFS, lets say the shortest distance is d. Now apply DFS and in DFS recursive call don't go beyond recursive level d.
However this might end up exploring all paths as mentioned by #templatetypedef.
First, find the distance-to-start of all nodes using breadth-first search.
(if there are a lot of nodes, you can use A* and stop when top of the queue has distance-to-start > distance-to-start(end-node). This will give you all nodes that belong to some shortest path)
Then just backtrack from the end-node. Anytime a node is connected to two (or more) nodes with a lower distance-to-start, you branch off into two (or more) paths.
templatetypedef your answer was very good, thank you a lot for that one(!!), but it missed out one point:
If you have a graph like this:
A-B-C-E-F
| |
D------
Now lets imagine I want this path:
A -> E.
It will expand like this:
A-> B -> D-> C -> F -> E.
The problem there is,
that you will have F as a parent of E, but
A->B->D->F-E is longer than
A->B->C->E. You will have to take of tracking the distances of parents you are so happily adding.
Step 1: Traverse the graph from the source by BFS and assign each node the minimal distance from the source
Step 2: The distance assigned to the target node is the shortest length
Step 3: From source, do a DFS search along all paths where the minimal distance is increased one by one until the target node is reached or the shortest length is reached. Print the path whenever the target node is reached.
A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:
Every adjacent pair of words differs by a single letter.
Every si for 1 <= i <= k is in wordList. Note that beginWord does not need to be in wordList.
sk == endWord
Given two words, beginWord and endWord, and a dictionary wordList, return all the shortest transformation sequences from beginWord to endWord, or an empty list if no such sequence exists. Each sequence should be returned as a list of the words [beginWord, s1, s2, ..., sk].
Example 1:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: [["hit","hot","dot","dog","cog"],["hit","hot","lot","log","cog"]]
Explanation: There are 2 shortest transformation sequences:
"hit" -> "hot" -> "dot" -> "dog" -> "cog"
"hit" -> "hot" -> "lot" -> "log" -> "cog"
Example 2:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: []
Explanation: The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.
https://leetcode.com/problems/word-ladder-ii
class Solution {
public List<List<String>> findLadders(String beginWord, String endWord, List<String> wordList) {
List<List<String>> result = new ArrayList<>();
if (wordList == null) {
return result;
}
Set<String> dicts = new HashSet<>(wordList);
if (!dicts.contains(endWord)) {
return result;
}
Set<String> start = new HashSet<>();
Set<String> end = new HashSet<>();
Map<String, List<String>> map = new HashMap<>();
start.add(beginWord);
end.add(endWord);
bfs(map, start, end, dicts, false);
List<String> subList = new ArrayList<>();
subList.add(beginWord);
dfs(map, result, subList, beginWord, endWord);
return result;
}
private void bfs(Map<String, List<String>> map, Set<String> start, Set<String> end, Set<String> dicts, boolean reverse) {
// Processed all the word in start
if (start.size() == 0) {
return;
}
dicts.removeAll(start);
Set<String> tmp = new HashSet<>();
boolean finish = false;
for (String str : start) {
char[] chars = str.toCharArray();
for (int i = 0; i < chars.length; i++) {
char old = chars[i];
for (char n = 'a' ; n <='z'; n++) {
if(old == n) {
continue;
}
chars[i] = n;
String candidate = new String(chars);
if (!dicts.contains(candidate)) {
continue;
}
if (end.contains(candidate)) {
finish = true;
} else {
tmp.add(candidate);
}
String key = reverse ? candidate : str;
String value = reverse ? str : candidate;
if (! map.containsKey(key)) {
map.put(key, new ArrayList<>());
}
map.get(key).add(value);
}
// restore after processing
chars[i] = old;
}
}
if (!finish) {
// Switch the start and end if size from start is bigger;
if (tmp.size() > end.size()) {
bfs(map, end, tmp, dicts, !reverse);
} else {
bfs(map, tmp, end, dicts, reverse);
}
}
}
private void dfs (Map<String, List<String>> map,
List<List<String>> result , List<String> subList,
String beginWord, String endWord) {
if(beginWord.equals(endWord)) {
result.add(new ArrayList<>(subList));
return;
}
if (!map.containsKey(beginWord)) {
return;
}
for (String word : map.get(beginWord)) {
subList.add(word);
dfs(map, result, subList, word, endWord);
subList.remove(subList.size() - 1);
}
}
}

How to determine if a given directed graph is a tree

The input to the program is the set of edges in the graph. For e.g. consider the following simple directed graph:
a -> b -> c
The set of edges for this graph is
{ (b, c), (a, b) }
So given a directed graph as a set of edges, how do you determine if the directed graph is a tree? If it is a tree, what is the root node of the tree?
First of I'm looking at how will you represent this graph, adjacency list/ adjacency matrix / any thing else? How will utilize the representation that you have chosen to efficiently answer the above questions?
Edit 1:
Some people are mentoning about using DFS for cycle detection but the problem is which node to start the DFS from. Since it is a directed graph we cannot start the DFS from a random node, for e.g. if I started a DFS from vertex 'c' it won't proceed further since there is no back edge to go to any other nodes. The follow up question here should be how do you determine what is the root of this tree.
Here's a fairly direct method. It can be done with either an adjacency matrix or an edge list.
Find the set, R, of nodes that do not appear as the destination of any edge. If R does not have exactly one member, the graph is not a tree.
If R does have exactly one member, r, it is the only possible root.
Mark r.
Starting from r, recursively mark all nodes that can be reached by following edges from source to destination. If any node is already marked, there is a cycle and the graph is not a tree. (This step is the same as a previously posted answer).
If any node is not marked at the end of step 3, the graph is not a tree.
If none of those steps find that the graph is not a tree, the graph is a tree with r as root.
It's difficult to know what is going to be efficient without some information about the numbers of nodes and edges.
There are 3 properties to check if a graph is a tree:
(1) The number of edges in the graph is exactly one less than the number of vertices |E| = |V| - 1
(2) There are no cycles
(3) The graph is connected
I think this example algorithm can work in the cases of a directed graph:
# given a graph and a starting vertex, check if the graph is a tree
def checkTree(G, v):
# |E| = |V| - 1
if edges.size != vertices.size - 1:
return false;
for v in vertices:
visited[v] = false;
hasCycle = explore_and_check_cycles(G, v);
# a tree is acyclic
if hasCycle:
return false;
for v in vertices:
if not visited[v]: # the graph isn't connected
return false;
# otherwise passes all properties for a tree
return true;
# given a Graph and a vertex, explore all reachable vertices from the vertex
# and returns true if there are any cycles
def explore_and_check_cycles(G, v):
visited[v] = true;
for (v, u) in edges:
if not visited[u]:
return explore_and_check_cyles(G, u)
else: # a backedge between two vertices indicates a cycle
return true
return false
Sources:
Algorithms by S. Dasgupta, C.H. Papadimitriou, and U.V. Vazirani
http://www.cs.berkeley.edu/~vazirani/algorithms.html
start from the root, "mark" it and then go to all children and repeat recursively. if you reach a child that is already marked it means that it is not a tree...
Note: By no means the most efficient way, but conceptually useful. Sometimes you want efficiency, sometimes you want an alternate viewpoint for pedagogic reasons. This is most certainly the later.
Algorithm: Starting with an adjacency matrix A of size n. Take the matrix power A**n. If the matrix is zero for every entry you know that it is at least a collection of trees (a forest). If you can show that it is connected, then it must be a tree. See a Nilpotent matrix. for more information.
To find the root node, we assume that you've shown the graph is a connected tree. Let k be the the number of times you have to raise the power A**k before the matrix becomes all zero. Take the transpose to the (k-1) power A.T ** (k-1). The only non-zero entry must be the root.
Analysis: A rough worse case analysis shows that it is bounded above by O(n^4), three for the matrix multiplication at most n times. You can do better by diagonalizing the matrix which should bring it down to O(n^3). Considering that this problem can be done in O(n), O(1) time/space this is only a useful exercise in logic and understanding of the problem.
For a directed graph, the underlying undirected graph will be a tree if the undirected graph is acyclic and fully connected. The same property holds good if for the directed graph, each vertex has in-degree=1 except one having in-degree=0.
If adjacency-list representation also supports in-degree property of each vertex, then we can apply the above rule easily. Otherwise, we should apply a tweaked DFS to find no-loops for underlying undirected graph as well as that |E|=|V|-1.
Following is the code I have written for this. Feel free to suggest optimisations.
import java.util.*;
import java.lang.*;
import java.io.*;
class Graph
{
private static int V;
private static int adj[][];
static void initializeGraph(int n)
{
V=n+1;
adj = new int[V][V];
for(int i=0;i<V;i++)
{
for(int j=0;j<V ;j++)
adj[i][j]= 0;
}
}
static int isTree(int edges[][],int n)
{
initializeGraph(n);
for(int i=0;i< edges.length;i++)
{
addDirectedEdge(edges[i][0],edges[i][1]);
}
int root = findRoot();
if(root == -1)
return -1;
boolean visited[] = new boolean[V];
boolean isTree = isTree(root, visited, -1);
boolean isConnected = isConnected(visited);
System.out.println("isTree= "+ isTree + " isConnected= "+ isConnected);
if(isTree && isConnected)
return root;
else
return -1;
}
static boolean isTree(int node, boolean visited[], int parent)
{
// System.out.println("node =" +node +" parent" +parent);
visited[node] = true;int j;
for(j =1;j<V;j++)
{
// System.out.println("node =" +node + " j=" +j+ "parent" + parent);
if(adj[node][j]==1)
{
if(visited[j])
{
// System.out.println("returning false for j="+ j);
return false;
}
else
{ //visit all adjacent vertices
boolean child = isTree(j, visited, node);
if(!child)
{
// System.out.println("returning false for j="+ j + " node=" +node);
return false;
}
}
}
}
if(j==V)
return true;
else
return false;
}
static int findRoot()
{
int root =-1, j=0,i;
int count =0;
for(j=1;j<V ;j++)
{
count=0;
for(i=1 ;i<V;i++)
{
if(adj[i][j]==1)
count++;
}
// System.out.println("j"+j +" count="+count);
if(count==0)
{
// System.out.println(j);
return j;
}
}
return -1;
}
static void addDirectedEdge(int s, int d)
{
// System.out.println("s="+ s+"d="+d);
adj[s][d]=1;
}
static boolean isConnected(boolean visited[])
{
for(int i=1; i<V;i++)
{
if(!visited[i])
return false;
}
return true;
}
public static void main (String[] args) throws java.lang.Exception
{
int edges[][]= {{2,3},{2,4},{3,1},{3,5},{3,7},{4,6}, {2,8}, {8,9}};
int n=9;
int root = isTree(edges,n);
System.out.println("root is:" + root);
int edges2[][]= {{2,3},{2,4},{3,1},{3,5},{3,7},{4,6}, {2,8}, {6,3}};
int n2=8;
root = isTree(edges2,n2);
System.out.println("root is:" + root);
}
}

Resources