I have a few million records in a database that reference each other (a directed acyclic graph). There are direct references (A -> B) and indirect references (if A -> B and B -> C, then A -> C). Indirect references can have any recursion depth, but in reality the depth is at most 100. This is very similar to how objects in an object-oriented language can reference other objects, recursively, except that cycles are not allowed.
A record can have between zero and 100 direct references.
Each record can be marked or not (most records are not marked).
Problem
I'm looking for an efficient data structure and algorithm to find all marked records that are referenced (directly or indirectly) by a given set of records (often just one, or up to 100). A result can be directly marked (a directly referenced record is marked) or indirectly marked (an indirectly referenced record is marked).
Reading the records is relatively slow, let's say 2 milliseconds per record.
I'm not looking for faster storage or similar here. I know it is possible, but it is quite hard to keep in sync. I'm trying to add a secondary data structure that contains just the relevant data. That will speed things up quite a bit (maybe by a factor of 10 or even 100), but it is only a constant-factor improvement. I'm still interested in understanding whether the algorithm itself can be improved as the amount of data grows.
Ideas
I have thought about the following options:
Brute force: One algorithm would be to search for all (directly or indirectly referenced) entries, and filter for marked entries. But that is slow, obviously, as I have to process all (directly or indirectly) referenced entries. Maybe none are marked, but 20'000 are referenced.
Shadow mark: Another algorithm would be to have a reverse index (which entries are referencing which other entries), and then each time an entry is marked, also "shadow-mark" all the entries that reference this entry, recursively. That way, when searching for marked entries, we can filter for those that have the "shadow-mark" set. The disadvantage is that many updates are needed if an entry is marked. A related option would be using a Bloom filter for shadow marking. But this would just reduce the memory usage.
Meet in the middle: Let's say we maintain a "maximum depth", which is the maximum depth of the tree (the maximum number of hops from any record). Then we use the shadow-mark algorithm from above, but only partially: only up to maximum-depth / 2 recursion levels, so we limit propagating the shadow-mark. And for a query, we also limit the recursion depth to maximum-depth / 2. That way, we "meet in the middle" in the worst case. A sub-problem is then how to efficiently maintain this maximum depth.
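A rough sketch of this meet-in-the-middle idea in Python (the data layout, the names, and the reverse index are my assumptions, and for brevity the query only reports whether some marked record is reachable, not which ones):

```python
from collections import deque

def propagate_shadow(reverse_index, marked_node, half_depth):
    # When a record is marked, shadow-mark it and up to half_depth
    # levels of the records that (transitively) reference it.
    shadowed = {marked_node}
    queue = deque([(marked_node, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == half_depth:
            continue
        for referrer in reverse_index.get(node, []):
            if referrer not in shadowed:
                shadowed.add(referrer)
                queue.append((referrer, depth + 1))
    return shadowed  # ids to flag as shadow-marked

def has_marked_reference(references, shadow_marked, start, half_depth):
    # A query descends at most half_depth levels; hitting a shadow-marked
    # record proves a marked record lies at most half_depth below it.
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in shadow_marked:
            return True
        if depth == half_depth:
            continue
        for child in references.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return False
```

If a marked record is d hops below the start, either d <= half_depth (the query reaches it directly, since a marked record shadow-marks itself) or the record at depth half_depth on the path is itself shadow-marked, so both halves together cover the full maximum depth.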
I wonder, is there something similar to this approach? Something that doesn't require many updates when marking an entry, and doesn't require too many reads when querying? Or maybe a solution that allows gradually updating entries after an entry is marked?
Example
In this example (blue is "marked"), if I search for the (directly or indirectly) referenced marked records of 5, I would like to quickly find 1 and 3.
You could keep a table on each node that records which marked nodes are reachable from it, and keep it updated whenever a node (or edge) is added to or removed from the graph, similar to how network routing tables are kept for each node in a network. There are a couple of specifics about your problem that make it simpler than a network routing table though:
You don't want to know the actual path to the marked nodes from a given node, only that one (or more) exists.
The graph is acyclic.
It's not a distributed system so you have full control (obviously ...).
Because you don't care about the path and because the graph is acyclic, the table on each node can be a map marked_node_id -> count, where count is the number of paths from the given node to the given marked node. When a new node is added, the new node's table is built as the union of the tables of all nodes the new node is adjacent to, where count is the sum. Additionally, the tables of all nodes adjacent from the new node have to be updated by adding the new node's table to each of them, and this has to be done recursively up the adjacent-from chain. When a node is deleted you have to do something similar.
Basic complexity analysis: Finding all marked nodes reachable from a given node is O(1) and can be done with info stashed on a single node - which is the whole point. In general, adding and removing an edge (or a new node plus its edges) will require updating the tables of all connected nodes recursively (up to a call depth of 100 and a branching factor of up to 100). Building the tables initially would be O(number-of-nodes) by reverse flooding from marked nodes.
Code Example:
This is an abstract, in-code solution, but it should translate. I'm using Python (+GraphViz) because you didn't specify a language, it's probably the most accessible to the widest audience, and it's easy to prototype in. I'm also only going to implement add/remove node operations (to modify a node, you can remove it and then re-add it with a different initialization) and build the graph from scratch, which isn't really realistic, but you can build the tables for an existing graph by working backwards from marked nodes pretty easily. Also note:
The following requires each node to have/maintain an adjacent_from list in addition to the adjacent_to list, so we can recurse up the adjacent-from paths when a given node is deleted.
I've assumed each marked node is reachable from itself - it just makes things a bit easier to implement.
def main():
    ''' Build a test graph, then test. '''
    graph = Graph()
    a = graph.add_node('a', marked=True)
    b = graph.add_node('b', marked=True)
    c = graph.add_node('c', marked=True)
    d = graph.add_node('d', adjacent_to=[a])
    e = graph.add_node('e', adjacent_to=[d])
    f = graph.add_node('f', adjacent_to=[c])
    g = graph.add_node('g', adjacent_to=[d, f])
    h = graph.add_node('h', adjacent_to=[e, g])
    i = graph.add_node('i')
    j = graph.add_node('j', marked=True, adjacent_to=[i])
    k = graph.add_node('k', adjacent_to=[j])
    l = graph.add_node('l', adjacent_to=[k])
    m = graph.add_node('m', adjacent_to=[j])
    with open('main0.dot', 'w') as dot_file:
        dot_file.write(graph.gviz())
    graph.delete_node('f')
    with open('main1.dot', 'w') as dot_file:
        dot_file.write(graph.gviz())
    graph.delete_node('e')
    with open('main2.dot', 'w') as dot_file:
        dot_file.write(graph.gviz())
    graph.delete_node('g')
    with open('main3.dot', 'w') as dot_file:
        dot_file.write(graph.gviz())
    # Process the graphviz files with:
    #   for i in *.dot; do dot -Tpng $i > "${i%%.dot}.png"; done
class Graph:
    ''' Container for nodes. '''
    def __init__(self):
        self.nodes = {}

    def add_node(self, id, marked=False, adjacent_to=[]):
        assert id not in self.nodes
        self.nodes[id] = Node(id, marked, adjacent_to)
        return self.nodes[id]

    def delete_node(self, id):
        assert id in self.nodes
        node = self.nodes[id]
        self._recursive_subtract_table_on_delete(node, node)
        for adjacent_from_node in node.adjacent_from:
            adjacent_from_node._remove_adjacent_node(node.id)
        del self.nodes[id]

    def _recursive_subtract_table_on_delete(self, node, deleted_node):
        for adjacent_from_node in node.adjacent_from:
            self._recursive_subtract_table_on_delete(adjacent_from_node, deleted_node)
        node._delete_reachability_table(deleted_node)

    def gviz(self):
        return 'strict digraph {\n%s}' % ''.join(
            n._gviz_edges() for n in self.nodes.values())
class Node:
    def __init__(self, id, marked=False, adjacent_to=[]):
        ''' Init node. Note only adjacent_to (not adjacent_from) nodes are allowed,
        which means we don't have to update adjacent_from reachable_marks. '''
        self.id = id
        self.marked = marked
        self.adjacent_to = adjacent_to
        self.adjacent_from = []
        self.reachable_marks = {}
        if marked:
            self.reachable_marks[id] = 1
        for adjacent_node in adjacent_to:
            adjacent_node.adjacent_from.append(self)
            self._add_reachability_table(adjacent_node)

    def _add_reachability_table(self, node):
        ''' Add node's reachable_marks table to self's.
        Add node's path count k, not 1: paths from self to a marked node
        via node = paths from node to that marked node. '''
        for (marked_node_id, k) in node.reachable_marks.items():
            self.reachable_marks[marked_node_id] = \
                self.reachable_marks.get(marked_node_id, 0) + k

    def _delete_reachability_table(self, node):
        ''' Subtract node's reachable_marks table from self's.
        This is called once per path from self to node, so subtracting
        node's path count k each time removes all affected paths. '''
        for (marked_node_id, k) in node.reachable_marks.items():
            self.reachable_marks[marked_node_id] = \
                self.reachable_marks.get(marked_node_id, 0) - k
        self.reachable_marks = {k: v for k, v in self.reachable_marks.items() if v > 0}

    def _remove_adjacent_node(self, id):
        self.adjacent_to = [n for n in self.adjacent_to if n.id != id]

    def _gviz_edges(self):
        ''' Helper to print graphviz edges adjacent to this node. '''
        _str = ''
        if self.marked:
            _str += ' %s[style=filled,fillcolor=blue]\n' % (self._gviz_name(),)
        else:
            _str += self._gviz_name() + '\n'
        for adjacent_node in self.adjacent_to:
            _str += ' %s -> %s\n' % (self._gviz_name(), adjacent_node._gviz_name())
        return _str

    def _gviz_name(self):
        ''' Helper to print graphviz name with reachable marks. '''
        return '"' + self.id + '(' + ','.join(self.reachable_marks.keys()) + ')"'
if __name__ == '__main__':
    main()
Results:
The output graph shows marked nodes reachable from each node in brackets.
Initial:
Remove node f:
Remove node e:
Remove node g:
This problem is related to fully dynamic transitive closure. I'm not intimately familiar with the research literature on the latter (probably most of which is not practical), but there is one algorithmic trick that you might not know about, related to your "maximum depth" idea.
Add a binary flag ("open" or "closed") to each node, and store both incoming and outgoing arcs. The rules are, every node that can reach an open node is open, and (equivalently) every node that can be reached by a closed node is closed. Each closed node also stores the set of marked nodes that it can reach. To query, traverse forward (outgoing arcs) from the queried node via open nodes, stopping at closed nodes. To update, traverse backward (incoming arcs) from the updated node via closed nodes, stopping at open nodes.
A closed node with incoming arcs from open nodes only can be converted to open. An open node with outgoing arcs to closed nodes only can be converted to closed. Conversion requires updates proportional to (in- or out-) degree. At this scale, I would suggest dumping the whole graph periodically and computing a reasonable set of adjustments in main memory.
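To make the query side concrete, here is a minimal sketch of the forward traversal (illustrative Python; the node layout, the 'marks' field on closed nodes, and all names are my assumptions):

```python
from collections import deque

def query_marked(start):
    """BFS forward over open nodes; stop at closed nodes and take their
    precomputed sets of reachable marked ids (the 'marks' field)."""
    seen = {start['id']}
    result = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if not node['open']:
            # Closed node: its stored table already covers everything below it.
            result |= node['marks']
            continue
        if node['marked']:
            result.add(node['id'])
        for nxt in node['out']:
            if nxt['id'] not in seen:
                seen.add(nxt['id'])
                queue.append(nxt)
    return result
```

The invariant makes this correct: since every node reachable from an open node through open nodes eventually hits closed nodes, the union of the closed frontier's stored sets plus any marked open nodes seen along the way is the full answer.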
To find all the marked records that are reachable from a given record is equivalent to counting the marked records in the component that contains the given record.
This can be done with breadth first or depth first search.
There is no faster algorithm. To improve your performance I believe you need to:
Implement efficient search code using an optimising compiler
Switch to a high-performance database engine
Optimize your queries (do not read records one at a time!)
Optimize your hardware configuration (no networks, no spinning disks)
I have implemented 4 query algorithms with real-world data. None of the algorithms require new data structures or updates when marking an entry.
The test uses real-world data, with 40'000 nodes, and for each node tries to find a connection to 200 other random nodes. I expect that with more data, the results will be similar, because I expect that the "shape" of the data is very similar. For this experiment, I report the minimum, maximum, and average number of nodes read for each check.
Depth-first search: min: 1, max: 1659, avg: 4.03
Breadth-first search: min: 1, max: 1659, avg: 4.02
Reverse breadth-first search: min: 1, max: 102859, avg: 4.21
Bidirectional adaptive breadth-first search: min: 1, max: 174, avg: 1.29
The best algorithm is 3.11 times faster than breadth-first search (BFS). Reverse BFS is slower than BFS due to the "shape" of the data: it seems each node references at most a few children, but a few nodes are referenced by a lot of other nodes, so reverse search can be slow (the max is much higher for reverse BFS).
Bidirectional adaptive BFS uses the following algorithm:
The idea is to do a mix of BFS and reverse BFS, in a balanced way.
We use a mix of downward search (source to target), and upward search (target to source).
We remember the nodes we have seen so far in both directions.
At each step, we either expand downwards, or upwards.
Which direction we go at each step depends on how many nodes we have seen in the last step on the way down, versus on the way up: If we have seen more on the way down, then the next step is to go up. Otherwise, go down.
Implementation (Java):
System.out.println("Number of nodes: " + nodes.size());
System.out.println();
int dfsCount = 0, bfsCount = 0, revCount = 0, biCount = 0;
int dfsMin = Integer.MAX_VALUE, bfsMin = Integer.MAX_VALUE, revMin = Integer.MAX_VALUE, biMin = Integer.MAX_VALUE;
int dfsMax = Integer.MIN_VALUE, bfsMax = Integer.MIN_VALUE, revMax = Integer.MIN_VALUE, biMax = Integer.MIN_VALUE;
int totalCount = 0;
Random r = new Random(1);
for (int i = 0; i < nodeList.size(); i++) {
    int a = i;
    for (int j = 0; j < 200; j++) {
        int b;
        do {
            b = r.nextInt(nodes.size());
        } while (a == b);
        Node na = nodeList.get(a);
        Node nb = nodeList.get(b);
        totalCount++;
        AtomicInteger x = new AtomicInteger();
        boolean r1 = depthFirstSearch(x, na, nb);
        dfsCount += x.get();
        dfsMin = Math.min(dfsMin, x.get());
        dfsMax = Math.max(dfsMax, x.get());
        x.set(0);
        boolean r2 = breathFirstSearch(x, na, nb);
        bfsCount += x.get();
        bfsMin = Math.min(bfsMin, x.get());
        bfsMax = Math.max(bfsMax, x.get());
        x.set(0);
        boolean r3 = reverseBreathFirstSearch(x, na, nb);
        revCount += x.get();
        revMin = Math.min(revMin, x.get());
        revMax = Math.max(revMax, x.get());
        x.set(0);
        boolean r4 = bidirectionalAdaptiveBFS(x, na, nb);
        biCount += x.get();
        biMin = Math.min(biMin, x.get());
        biMax = Math.max(biMax, x.get());
        x.set(0);
        if (r1 != r2 || r1 != r3 || r1 != r4) {
            depthFirstSearchTrace(na, nb);
            bidirectionalAdaptiveBFS(x, na, nb);
            throw new AssertionError(r1 + " " + r2 + " " + r3 + " " + r4);
        }
    }
}
System.out.println("Depth-first search");
System.out.printf("min: %d, max: %d, avg: %2.2f\n", dfsMin, dfsMax, ((double) dfsCount / totalCount));
System.out.println();
System.out.println("Breadth-first search");
System.out.printf("min: %d, max: %d, avg: %2.2f\n", bfsMin, bfsMax, ((double) bfsCount / totalCount));
System.out.println();
System.out.println("Reverse breadth-first search");
System.out.printf("min: %d, max: %d, avg: %2.2f\n", revMin, revMax, ((double) revCount / totalCount));
System.out.println();
System.out.println("Bidirectional adaptive breadth-first search");
System.out.printf("min: %d, max: %d, avg: %2.2f\n", biMin, biMax, ((double) biCount / totalCount));
System.out.println();
static boolean depthFirstSearch(AtomicInteger count, Node source, Node target) {
    HashSet<Node> tested = new HashSet<>();
    tested.add(source);
    return depthFirstSearch(count, tested, source, target);
}

static boolean depthFirstSearch(AtomicInteger count, HashSet<Node> tested, Node source, Node target) {
    count.incrementAndGet();
    for (Node n : source.references) {
        if (n == target) {
            return true;
        }
        if (!tested.contains(n)) {
            tested.add(n);
            // Pass the shared 'tested' set so nodes are not re-visited
            if (depthFirstSearch(count, tested, n, target)) {
                return true;
            }
        }
    }
    return false;
}
static boolean breathFirstSearch(AtomicInteger count, Node source, Node target) {
    HashSet<Node> tested = new HashSet<>();
    tested.add(source);
    return breathFirstSearch(count, tested, source, target);
}

static boolean breathFirstSearch(AtomicInteger count, HashSet<Node> tested, Node source, Node target) {
    count.incrementAndGet();
    // First check all direct references before descending
    for (Node n : source.references) {
        if (n == target) {
            return true;
        }
    }
    for (Node n : source.references) {
        if (!tested.contains(n)) {
            tested.add(n);
            // Pass the shared 'tested' set so nodes are not re-visited
            if (breathFirstSearch(count, tested, n, target)) {
                return true;
            }
        }
    }
    return false;
}
static boolean reverseBreathFirstSearch(AtomicInteger count, Node source, Node target) {
    HashSet<Node> tested = new HashSet<>();
    tested.add(target);
    return reverseBreathFirstSearch(count, tested, source, target);
}

static boolean reverseBreathFirstSearch(AtomicInteger count, HashSet<Node> tested, Node source, Node target) {
    count.incrementAndGet();
    // First check all direct referrers before ascending
    for (Node n : target.referencedBy) {
        if (n == source) {
            return true;
        }
    }
    for (Node n : target.referencedBy) {
        if (!tested.contains(n)) {
            tested.add(n);
            // Recurse with the reverse search and the shared 'tested' set
            if (reverseBreathFirstSearch(count, tested, source, n)) {
                return true;
            }
        }
    }
    return false;
}
static boolean bidirectionalAdaptiveBFS(AtomicInteger count, Node source, Node target) {
    HashSet<Node> allSources = new HashSet<>();
    HashSet<Node> sources = new HashSet<>();
    allSources.add(source);
    sources.add(source);
    HashSet<Node> allTargets = new HashSet<>();
    HashSet<Node> targets = new HashSet<>();
    allTargets.add(target);
    targets.add(target);
    return bidirectionalAdaptiveBFS(count, allSources, allTargets, sources, targets);
}

static boolean bidirectionalAdaptiveBFS(AtomicInteger count, Set<Node> allSources, Set<Node> allTargets, Set<Node> sources, Set<Node> targets) {
    while (!sources.isEmpty() && !targets.isEmpty()) {
        if (sources.size() <= targets.size()) {
            // Expand the smaller frontier downwards (source to target)
            HashSet<Node> newSources = new HashSet<>();
            for (Node source : sources) {
                count.incrementAndGet();
                for (Node n : source.references) {
                    if (!allSources.contains(n)) {
                        newSources.add(n);
                        allSources.add(n);
                        if (allTargets.contains(n)) {
                            return true;
                        }
                    }
                }
            }
            sources = newSources;
        } else {
            // Expand the smaller frontier upwards (target to source)
            HashSet<Node> newTargets = new HashSet<>();
            for (Node target : targets) {
                count.incrementAndGet();
                for (Node n : target.referencedBy) {
                    if (!allTargets.contains(n)) {
                        newTargets.add(n);
                        allTargets.add(n);
                        if (allSources.contains(n)) {
                            return true;
                        }
                    }
                }
            }
            targets = newTargets;
        }
    }
    return false;
}
static class Node {
    String name;
    HashSet<Node> references = new HashSet<>();
    HashSet<Node> referencedBy = new HashSet<>();
    boolean marked;

    Node(String name) {
        this.name = name;
    }

    void addReference(Node n) {
        references.add(n);
        n.referencedBy.add(this);
    }

    public String toString() {
        return name;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Node)) {
            return false;
        }
        return name.equals(((Node) other).name);
    }

    @Override
    public int hashCode() {
        return name.hashCode();
    }
}
I am looking at the Wikipedia page for KD trees Nearest Neighbor Search.
The pseudocode given in Wikipedia works when the points are in 2-D (x, y).
I want to know what changes I should make when the points are 3-D (x, y, z).
I googled a lot and even went through similar questions on Stack Overflow, but I didn't find a 3-D implementation anywhere; all previous questions take 2-D points as input, not the 3-D points I am looking for.
The pseudocode in Wikipedia for building the KD tree is:
function kdtree (list of points pointList, int depth)
{
    // Select axis based on depth so that axis cycles through all valid values
    var int axis := depth mod k;

    // Sort point list and choose median as pivot element
    select median by axis from pointList;

    // Create node and construct subtrees
    var tree_node node;
    node.location := median;
    node.leftChild := kdtree(points in pointList before median, depth+1);
    node.rightChild := kdtree(points in pointList after median, depth+1);
    return node;
}
How do I find the nearest neighbor now, after building the KD tree?
Thanks!
You find the nearest neighbour exactly as described on the Wikipedia page under the heading "Nearest neighbour search". The description there applies in any number of dimensions. That is:
Go down the tree recursively from the root as if you're about to insert the point you're looking for the nearest neighbour of.
When you reach a leaf, note it as best-so-far.
On the way up the tree again, for each node as you meet it:
If it's closer than the best-so-far, update the best-so-far.
If the distance from best-so-far to the target point is greater than the distance from the target point to the splitting hyperplane at this node,
process the other child of the node too (using the same recursion).
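As a minimal illustration of these steps in 3-D (a Python sketch under my own point representation, not the Java code below):

```python
import math

def dist(a, b):
    # Euclidean distance between two point tuples
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build(points, depth=0, k=3):
    # points: list of (x, y, z) tuples; the split axis cycles with depth
    if not points:
        return None
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {'point': points[mid], 'axis': axis,
            'left': build(points[:mid], depth + 1, k),
            'right': build(points[mid + 1:], depth + 1, k)}

def nearest(node, target, best=None):
    if node is None:
        return best
    point, axis = node['point'], node['axis']
    if best is None or dist(point, target) < dist(best, target):
        best = point
    # Descend the side of the splitting plane containing the target first
    diff = target[axis] - point[axis]
    near, far = ((node['left'], node['right']) if diff < 0
                 else (node['right'], node['left']))
    best = nearest(near, target, best)
    # Visit the far side only if the best-so-far sphere crosses the plane
    if abs(diff) < dist(best, target):
        best = nearest(far, target, best)
    return best
```

Nothing here is 2-D specific: the only dimension-dependent part is `axis = depth % k`, so the same code handles any k by changing the tuples and k.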
I've recently coded up a KDTree for nearest neighbor search in 3-D space and ran into the same problems understanding the NNS, particularly 3.2 of the wiki. I ended up using the following algorithm, which seems to work in all my tests:
Here is the initial leaf search:
public Collection<T> nearestNeighbourSearch(int K, T value) {
    if (value == null) return null;

    // Map used for results
    TreeSet<KdNode> results = new TreeSet<KdNode>(new EuclideanComparator(value));

    // Find the closest leaf node
    KdNode prev = null;
    KdNode node = root;
    while (node != null) {
        if (KdNode.compareTo(node.depth, node.k, node.id, value) < 0) {
            // Greater
            prev = node;
            node = node.greater;
        } else {
            // Lesser
            prev = node;
            node = node.lesser;
        }
    }
    KdNode leaf = prev;

    if (leaf != null) {
        // Used to not re-examine nodes
        Set<KdNode> examined = new HashSet<KdNode>();

        // Go up the tree, looking for better solutions
        node = leaf;
        while (node != null) {
            // Search node
            searchNode(value, node, K, results, examined);
            node = node.parent;
        }
    }

    // Load up the collection of the results
    Collection<T> collection = new ArrayList<T>(K);
    for (KdNode kdNode : results) {
        collection.add((T) kdNode.id);
    }
    return collection;
}
Here is the recursive search which starts at the closest leaf node:
private static final <T extends KdTree.XYZPoint> void searchNode(T value, KdNode node, int K, TreeSet<KdNode> results, Set<KdNode> examined) {
    examined.add(node);

    // Search node
    KdNode lastNode = null;
    Double lastDistance = Double.MAX_VALUE;
    if (results.size() > 0) {
        lastNode = results.last();
        lastDistance = lastNode.id.euclideanDistance(value);
    }
    Double nodeDistance = node.id.euclideanDistance(value);
    if (nodeDistance.compareTo(lastDistance) < 0) {
        if (results.size() == K && lastNode != null) results.remove(lastNode);
        results.add(node);
    } else if (nodeDistance.equals(lastDistance)) {
        results.add(node);
    } else if (results.size() < K) {
        results.add(node);
    }
    lastNode = results.last();
    lastDistance = lastNode.id.euclideanDistance(value);

    int axis = node.depth % node.k;
    KdNode lesser = node.lesser;
    KdNode greater = node.greater;

    // Search children branches, if axis aligned distance is less than current distance
    if (lesser != null && !examined.contains(lesser)) {
        examined.add(lesser);

        double nodePoint = Double.MIN_VALUE;
        double valuePlusDistance = Double.MIN_VALUE;
        if (axis == X_AXIS) {
            nodePoint = node.id.x;
            valuePlusDistance = value.x - lastDistance;
        } else if (axis == Y_AXIS) {
            nodePoint = node.id.y;
            valuePlusDistance = value.y - lastDistance;
        } else {
            nodePoint = node.id.z;
            valuePlusDistance = value.z - lastDistance;
        }
        boolean lineIntersectsCube = (valuePlusDistance <= nodePoint);

        // Continue down lesser branch
        if (lineIntersectsCube) searchNode(value, lesser, K, results, examined);
    }
    if (greater != null && !examined.contains(greater)) {
        examined.add(greater);

        double nodePoint = Double.MIN_VALUE;
        double valuePlusDistance = Double.MIN_VALUE;
        if (axis == X_AXIS) {
            nodePoint = node.id.x;
            valuePlusDistance = value.x + lastDistance;
        } else if (axis == Y_AXIS) {
            nodePoint = node.id.y;
            valuePlusDistance = value.y + lastDistance;
        } else {
            nodePoint = node.id.z;
            valuePlusDistance = value.z + lastDistance;
        }
        boolean lineIntersectsCube = (valuePlusDistance >= nodePoint);

        // Continue down greater branch
        if (lineIntersectsCube) searchNode(value, greater, K, results, examined);
    }
}
The full java source can be found here.
I want to know, what changes should I make, when the points are 3-D (x, y, z).
You get the current axis on this line
var int axis := depth mod k;
Now, depending on the axis, you find the median by comparing the corresponding property. E.g. if axis = 0, you compare against the x property. One way to implement this is to pass a comparator function into the routine that does the search.
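For example (a hypothetical sketch in Python; the tuple representation of points is my assumption), the comparison key can just index the point with the current axis:

```python
def axis_key(axis):
    # axis 0 -> x, 1 -> y, 2 -> z for points stored as (x, y, z) tuples
    return lambda point: point[axis]

points = [(3, 1, 4), (1, 5, 9), (2, 6, 5)]
points.sort(key=axis_key(0))          # sort by x to split on the x axis
median = points[len(points) // 2]     # (2, 6, 5)
```

The same key function works for any number of dimensions, which is why the pseudocode needs no structural change for 3-D.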
I'm trying to find a linear-time algorithm using recursion to solve the diameter problem for a rooted k-ary tree implemented with adjacency lists. The diameter of a tree is the maximum distance between any pair of leaves. If I choose a root r (that is, a node whose degree is > 1), it can be shown that the diameter is either the maximum distance between two leaves in the same subtree or the maximum distance between two leaves of a path that goes through r. My pseudocode for this problem:
Tree-Diameter(T, r)
    if degree[r] = 1 then
        height[r] = 0
        return 0
    for each v in Adj[r] do
        d_v = Tree-Diameter(T, v)
    height[r] = max_{v in Adj[r]} (height[v]) + 1
    return max(max_{v in Adj[r]} (d_v),
               max_{v in Adj[r]} (height[v]) + secmax_{v in Adj[r]} (height[v], 0) + 1)
To get linear time, I compute the diameter AND the height of each subtree at the same time. Then I choose the maximum of the diameters of the subtrees and the sum of the two biggest heights of the tree + 1 (the secmax function chooses between height[v] and 0 because a subtree can have only one child: in this case, the second biggest height is 0). Does this algorithm work, and if not, what are the problems? I tried to generalize an algorithm that solves the same problem for a binary tree, but I don't know if it's a good generalization.
Any help is appreciated! Thanks in advance!
In any tree, to find the diameter do as below:
Select a random node A and run BFS from it to find the farthest node from A. Name this node S.
Now run BFS starting from S and find the farthest node from S. Name it D.
The path between S and D is the diameter of your tree. This algorithm is O(n) and traverses the tree just twice. The proof is a little tricky but not hard (try it yourself, or if you think it is not true, I'll write it later). And be careful: I'm talking about trees, not general graphs (a tree has no cycles and is connected).
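A compact sketch of this double-BFS in Python (the function names and the adjacency-dict representation are my assumptions):

```python
from collections import deque

def farthest(adj, start):
    # BFS from start; return the farthest node and its distance.
    dist = {start: 0}
    far = start
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] > dist[far]:
                    far = v
                queue.append(v)
    return far, dist[far]

def tree_diameter(adj):
    s, _ = farthest(adj, next(iter(adj)))  # step 1: any node -> S
    d, length = farthest(adj, s)           # step 2: S -> D
    return length
```

For the path a-b-c-d with an extra branch b-e, both e..d and a..d have length 3, which is the diameter this returns.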
This is a python implementation of what I believe you are interested in. Here, a tree is represented as a list of child trees.
def process(tree):
    max_child_height = 0
    secmax_child_height = 0
    max_child_diameter = 0
    for child in tree:
        child_height, child_diameter = process(child)
        if child_height > max_child_height:
            secmax_child_height = max_child_height
            max_child_height = child_height
        elif child_height > secmax_child_height:
            secmax_child_height = child_height
        if child_diameter > max_child_diameter:
            max_child_diameter = child_diameter
    height = max_child_height + 1
    if len(tree) > 1:
        diameter = max(max_child_diameter, max_child_height + secmax_child_height)
    else:
        diameter = max_child_diameter
    return height, diameter

def diameter(tree):
    height, diameter = process(tree)
    return diameter
This is a recursive solution in Java.
import java.util.ArrayList;
import java.util.List;

public class DiameterOrNAryTree {
    public int diameter(Node root) {
        Result result = new Result();
        getDepth(root, result);
        return result.max;
    }

    private int getDepth(Node node, Result result) {
        if (node == null) return 0;
        int h1 = 0, h2 = 0;
        for (Node c : node.children) {
            int d = getDepth(c, result);
            if (d > h1) {
                h2 = h1;
                h1 = d;
            } else if (d > h2) h2 = d;
        }
        result.max = Math.max(result.max, h1 + h2);
        return h1 + 1;
    }

    class Result {
        int max;

        Result() {
            max = 0;
        }
    }

    class Node {
        public int val;
        public List<Node> children;

        public Node() {
            children = new ArrayList<Node>();
        }

        public Node(int _val) {
            val = _val;
            children = new ArrayList<Node>();
        }

        public Node(int _val, ArrayList<Node> _children) {
            val = _val;
            children = _children;
        }
    }
}
I have a class Graph with two list types, namely nodes and edges.
I have a function
List<int> GetNodesInRange(Graph graph, int Range)
When I get these parameters, I need an algorithm that will go through the graph and return the list of nodes only as deep (as many levels) as the range.
The algorithm should be able to accommodate large number of nodes and large ranges.
On top of this, if I use a similar function
List<int> GetNodesInRange(Graph graph, int Range, int selected)
I want to be able to search outwards from the selected node, to the number of nodes outwards (range) specified.
So for the first function, if I pass the graph and a range of, say, 2, I expect the result to be the nodes shown in the blue box.
For the other function, if I pass the graph with a range of 1 starting at node 5, I want it to return the list of nodes that satisfy this criterion (placed in the orange box).
What you need seems to be simply a depth-limited breadth-first search or depth-first search, with an option of ignoring edge directionality.
Here's a recursive definition that may help you:
I'm the only one of range 1 from myself.
I know who my immediate neighbors are.
If N > 1, then those of range N from myself are
The union of all that is of range N-1 from my neighbors
It should be a recursive function that finds the neighbours of the selected node, then the neighbours of each neighbour, until the range is 0 - a DFS, something like this:
List<int> GetNodesInRange(Graph graph, int Range, int selected) {
    var result = new List<int>();
    result.Add(selected);
    if (Range > 0) {
        foreach (int neighbour in GetNeighbours(graph, selected)) {
            result.AddRange(GetNodesInRange(graph, Range - 1, neighbour));
        }
    }
    return result;
}
You should also check for cycles, if they are possible. This code is for tree structure.
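For graphs that can contain cycles, a breadth-first version with a visited set avoids both infinite loops and the depth-limited-DFS pitfall where a node first reached by a long path blocks a later, shorter path (a Python sketch; the names and adjacency-dict representation are mine):

```python
from collections import deque

def nodes_in_range(graph, selected, max_range):
    # graph: dict node -> list of neighbouring nodes.
    # BFS reaches every node at its minimum distance first, and the
    # 'seen' set makes the traversal safe on graphs with cycles.
    seen = {selected}
    queue = deque([(selected, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_range:
            continue
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen
```

On a tree this returns the same set as the recursive version; on a cyclic graph it still terminates and each node's depth is its true shortest distance.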
// get all the nodes that are within Range distance of the root node of graph
Set<int> GetNodesInRange(Graph graph, int Range)
{
    Set<int> out = new Set<int>();
    GetNodesInRange(graph.root, Range, out);
    return out;
}

// get all the nodes that are within Range successor distance of node
// accepted nodes are placed in out
void GetNodesInRange(Node node, int Range, Set<int> out)
{
    boolean added = out.add(node.value);
    if (!added) return; // already visited
    if (Range == 0) return;
    // for each successor node
    {
        GetNodesInRange(successor, Range - 1, out);
    }
}

// get all the nodes that are within Range distance of selected node in graph
Set<int> GetNodesInRange(Graph graph, int Range, int selected)
{
    Set<int> out = new Set<int>();
    GetNodesInRange(graph.root, Range, selected, out);
    return out;
}

// get all the nodes that are successors of node and within Range distance
// of selected node
// accepted nodes are placed in out
// returns distance to selected node
int GetNodesInRange(Node node, int Range, int selected, Set<int> out)
{
    if (node.value == selected)
    {
        GetNodesInRange(node, Range - 1, out);
        return 1;
    }
    else
    {
        int shortestDistance = Range + 1;
        // for each successor node
        {
            int distance = GetNodesInRange(successor, Range, selected, out);
            if (distance < shortestDistance) shortestDistance = distance;
        }
        if (shortestDistance <= Range)
        {
            out.add(node.value);
        }
        return shortestDistance + 1;
    }
}
I modified your requirements somewhat to return a Set rather than a List.
The GetNodesInRange(Graph, int, int) method will not handle graphs that contain cycles. This can be overcome by maintaining a collection of nodes that have already been visited. The GetNodesInRange(Graph, int) method makes use of the fact that the out set is a collection of visited nodes to overcome cycles.
Note: This has not been tested in any way.