Efficiently find marked referenced records - algorithm

I have a few million records in a database that reference each other (a directed acyclic graph). There are direct references (A -> B) and indirect references (if A -> B and B -> C, then A -> C). Indirect references can have any recursion depth, but in reality the depth is at most 100. This is very similar to how objects in an object-oriented language can reference other objects, recursively, except that cycles are not allowed.
A record can have between zero and 100 direct references.
Each record can be marked or not (most records are not marked).
Problem
I'm looking for an efficient data structure and algorithm to find all marked referenced records (directly or indirectly referenced), given a set of records (often just one, or up to 100). There are directly marked results (a directly referenced record that is marked) and indirectly marked results (an indirectly referenced record that is marked).
Reading the records is relatively slow, let's say 2 milliseconds per record.
I'm not looking for faster storage or similar here. I know that is possible, but it is quite hard to keep in sync. I'm trying to add a secondary data structure that contains just the relevant data. This will speed things up quite a bit (maybe by a factor of 10 or even 100), but it only brings a constant-factor improvement. I'm still interested in understanding whether it's possible to improve the algorithm as the amount of data grows.
Ideas
I have thought about the following options:
Brute force: One algorithm would be to search for all (directly or indirectly referenced) entries, and filter for marked entries. But that is slow, obviously, as I have to process all (directly or indirectly) referenced entries. Maybe none are marked, but 20'000 are referenced.
Shadow mark: Another algorithm would be to have a reverse index (which entries are referencing which other entries), and then each time an entry is marked, also "shadow-mark" all the entries that reference this entry, recursively. That way, when searching for marked entries, we can filter for those that have the "shadow-mark" set. The disadvantage is that many updates are needed if an entry is marked. A related option would be using a Bloom filter for shadow marking. But this would just reduce the memory usage.
Meet in the middle: Let's say we maintain a "maximum depth", which is the maximum number of hops from any record. Then we use the shadow-mark algorithm from above, but only partially: only up to maximum-depth / 2 recursion levels, so we limit how far the shadow-mark is propagated. For a query, we likewise limit the recursion depth to maximum-depth / 2. That way, we "meet in the middle" in the worst case. (I should probably draw a picture.) A sub-problem is then how to efficiently maintain this maximum depth.
I wonder, is there something similar to this approach? Something that doesn't require many updates when marking an entry, and doesn't require too many reads when querying? Or maybe a solution that allows gradually updating entries when an entry is marked?
Example
In this example (blue means "marked"), if I search for (indirectly) referenced marked records for 5, I would like to quickly find 1 and 3.

You could keep a table on each node that records which marked nodes are reachable from it, and keep it updated whenever a node (or edge) is added to or removed from the graph, similar to how network routing tables are kept for each node in a network. There are a couple of specifics about your problem that make it simpler than a network routing table though:
You don't want to know the actual path to the marked nodes from a given node, only that one (or more) exists.
The graph is acyclic.
It's not a distributed system so you have full control (obviously ...).
Because you don't care about the path and because the graph is acyclic, the table on each node can be a map marked_node_id -> count, where count is the number of paths from the given node to the given marked node. When a new node is added, the new node's table is built as the union of the tables of the nodes adjacent to the new node, where count is the sum. Additionally, the tables of all nodes adjacent from the new node have to be updated by adding the new node's table to each of them, and this has to be done recursively up the adjacent-from chain. When a node is deleted you have to do something similar.
Basic complexity analysis: Finding all marked nodes reachable from a given node is O(1) and can be done with the info stashed on a single node - which is the whole point. In general, adding and removing an edge (or a new node plus its edges) will require updating the tables of all connected nodes recursively (up to a call depth of 100 and a branching factor of up to 100). Building the tables initially would be O(number of nodes), by reverse flooding from the marked nodes.
Code Example:
This is an abstract, in-code solution, but it should translate. I'm using Python (+ GraphViz) because you didn't specify a language, it's probably accessible to the widest audience, and it is easy to prototype in. I'm also only going to implement the add/remove node operations (to modify a node you can remove it and then add it again with a different initialization) and build the graph from scratch, which isn't really realistic, but you can build the tables initially for an existing graph by working backwards from marked nodes pretty easily. Also note:
The following requires each node to have/maintain an adjacent_from list in addition to an adjacent_to list, so we can recurse up the adjacent-from paths when a given node is deleted.
I've assumed each marked node is reachable from itself - it just makes things a bit easier to implement.
def main():
    ''' Build a test graph, then test. '''
    graph = Graph()
    a = graph.add_node('a', marked=True)
    b = graph.add_node('b', marked=True)
    c = graph.add_node('c', marked=True)
    d = graph.add_node('d', adjacent_to=[a])
    e = graph.add_node('e', adjacent_to=[d])
    f = graph.add_node('f', adjacent_to=[c])
    g = graph.add_node('g', adjacent_to=[d, f])
    h = graph.add_node('h', adjacent_to=[e, g])
    i = graph.add_node('i')
    j = graph.add_node('j', marked=True, adjacent_to=[i])
    k = graph.add_node('k', adjacent_to=[j])
    l = graph.add_node('l', adjacent_to=[k])
    m = graph.add_node('m', adjacent_to=[j])
    with open('main0.dot', 'w') as f:
        f.write(graph.gviz())
    graph.delete_node('f')
    with open('main1.dot', 'w') as f:
        f.write(graph.gviz())
    graph.delete_node('e')
    with open('main2.dot', 'w') as f:
        f.write(graph.gviz())
    graph.delete_node('g')
    with open('main3.dot', 'w') as f:
        f.write(graph.gviz())
    # Run this script to process graphviz files: for i in *.dot; do dot -Tpng $i > "${i%%.dot}.png"; done

class Graph:
    ''' Container for nodes. '''
    def __init__(self):
        self.nodes = {}

    def add_node(self, id, marked=False, adjacent_to=None):
        assert id not in self.nodes
        self.nodes[id] = Node(id, marked, adjacent_to or [])
        return self.nodes[id]

    def delete_node(self, id):
        assert id in self.nodes
        node = self.nodes[id]
        self._recursive_subtract_table_on_delete(node, node)
        for adjacent_from_node in node.adjacent_from:
            adjacent_from_node._remove_adjacent_node(node.id)
        del self.nodes[id]

    def _recursive_subtract_table_on_delete(self, node, deleted_node):
        for adjacent_from_node in node.adjacent_from:
            self._recursive_subtract_table_on_delete(adjacent_from_node, deleted_node)
        node._delete_reachability_table(deleted_node)

    def gviz(self):
        return 'strict digraph {\n%s}' % ''.join([n._gviz_edges() for n in self.nodes.values()])

class Node:
    def __init__(self, id, marked=False, adjacent_to=None):
        ''' Init node. Note only adjacent_to (not adjacent_from) nodes are allowed,
        which means we don't have to update adjacent_from reachable_marks. '''
        self.id = id
        self.marked = marked
        self.adjacent_to = adjacent_to or []
        self.adjacent_from = []
        self.reachable_marks = {}
        if marked:
            self.reachable_marks[id] = 1
        for adjacent_node in self.adjacent_to:
            adjacent_node.adjacent_from.append(self)
            self._add_reachability_table(adjacent_node)

    def _add_reachability_table(self, node):
        ''' Add the reachable_marks table from node to self. '''
        for (marked_node_id, k) in node.reachable_marks.items():
            self.reachable_marks[marked_node_id] = self.reachable_marks[marked_node_id] + 1 if marked_node_id in self.reachable_marks else 1

    def _delete_reachability_table(self, node):
        ''' Delete the reachable_marks table from node from self. '''
        for (marked_node_id, k) in node.reachable_marks.items():
            self.reachable_marks[marked_node_id] = self.reachable_marks[marked_node_id] - 1 if marked_node_id in self.reachable_marks else 0
        self.reachable_marks = {k: v for k, v in self.reachable_marks.items() if v}

    def _remove_adjacent_node(self, id):
        self.adjacent_to = list(filter(lambda n: n.id != id, self.adjacent_to))

    def _gviz_edges(self):
        ''' Helper to print graphviz edges adjacent to this node. '''
        _str = ''
        if self.marked:
            _str += ' %s[style=filled,fillcolor=blue]\n' % (self._gviz_name(),)
        else:
            _str += self._gviz_name() + '\n'
        for adjacent_node in self.adjacent_to:
            _str += ' %s -> %s\n' % (self._gviz_name(), adjacent_node._gviz_name())
        return _str

    def _gviz_name(self):
        ''' Helper to print graphviz name with reachable marks. '''
        return '"' + self.id + '(' + ','.join(self.reachable_marks.keys()) + ')"'

if __name__ == '__main__':
    main()
Results:
The output graph shows marked nodes reachable from each node in brackets.
Initial:
Remove node f:
Remove node e:
Remove node g:

This problem is related to fully dynamic transitive closure. I'm not intimately familiar with the research literature on the latter (probably most of which is not practical), but there is one algorithmic trick that you might not know about, related to your "maximum depth" idea.
Add a binary flag ("open" or "closed") to each node, and store both incoming and outgoing arcs. The rules are, every node that can reach an open node is open, and (equivalently) every node that can be reached by a closed node is closed. Each closed node also stores the set of marked nodes that it can reach. To query, traverse forward (outgoing arcs) from the queried node via open nodes, stopping at closed nodes. To update, traverse backward (incoming arcs) from the updated node via closed nodes, stopping at open nodes.
A closed node with incoming arcs from open nodes only can be converted to open. An open node with outgoing arcs to closed nodes only can be converted to closed. Conversion requires updates proportional to (in- or out-) degree. At this scale, I would suggest dumping the whole graph periodically and computing a reasonable set of adjustments in main memory.
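Here is a rough sketch of the query side of this scheme in Java (the Node class and field names below are illustrative assumptions, not part of your existing schema): traverse outgoing arcs through open nodes, and stop at closed nodes, which carry a precomputed set of reachable marked nodes.
import java.util.*;

class OpenClosedSketch {
    static class Node {
        boolean open;                                 // open: keep traversing; closed: use the cached set
        boolean marked;
        Set<Node> outgoing = new HashSet<>();
        Set<Node> reachableMarked = new HashSet<>();  // maintained only on closed nodes
    }

    static Set<Node> queryMarked(Node start) {
        Set<Node> result = new HashSet<>();
        Set<Node> visited = new HashSet<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(start);
        visited.add(start);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.marked) result.add(n);
            if (!n.open) {
                result.addAll(n.reachableMarked);     // closed node: its cached set covers the rest
                continue;
            }
            for (Node next : n.outgoing) {
                if (visited.add(next)) stack.push(next);
            }
        }
        return result;
    }
}
The update side is the mirror image: walk incoming arcs through closed nodes, stopping at open nodes, and adjust the stored sets as you go.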

To find all the marked records that are reachable from a given record is equivalent to counting the marked records in the component that contains the given record.
This can be done with breadth first or depth first search.
There is no faster algorithm. To improve your performance, I believe you need to:
Implement efficient search code using an optimising compiler
Switch to a high-performance database engine
Optimize your queries (do not read records one at a time!)
Optimize your hardware configuration (no network hops, no spinning disks)

I have implemented 4 query algorithms and tested them with real-world data. None of the algorithms require new data structures or updates when marking an entry.
The test uses real-world data with 40'000 nodes, and for each node it tries to find a connection to 200 other random nodes. I expect that with more data the results will be similar, because the "shape" of the data should stay very similar. For this experiment, I report the minimum, maximum, and average number of nodes read for each check.
Depth-first search: min: 1, max: 1659, avg: 4.03
Breadth-first search: min: 1, max: 1659, avg: 4.02
Reverse breadth-first search: min: 1, max: 102859, avg: 4.21
Bidirectional adaptive breadth-first search: min: 1, max: 174, avg: 1.29
The best algorithm is 3.11 times faster than breadth-first search (BFS). Reverse BFS is slower than BFS due to the "shape" of the data: it seems each node references at most a few children, but a few nodes are referenced by a lot of other nodes, so reverse search can be slow (the max is much higher for reverse BFS).
Bidirectional adaptive BFS uses the following algorithm:
The idea is to do a mix of BFS and reverse BFS, in a balanced way.
We use a mix of downward search (source to target), and upward search (target to source).
We remember the nodes we have seen so far in both directions.
At each step, we either expand downwards, or upwards.
Which direction we go at each step depends on how many nodes we have seen in the last step on the way down versus on the way up: if we have seen more on the way down, then the next step goes up; otherwise, it goes down.
Implementation (Java):
System.out.println("Number of nodes: " + nodes.size());
System.out.println();
int dfsCount = 0, bfsCount = 0, revCount = 0, biCount = 0;
int dfsMin = Integer.MAX_VALUE, bfsMin = Integer.MAX_VALUE, revMin = Integer.MAX_VALUE, biMin = Integer.MAX_VALUE;
int dfsMax = Integer.MIN_VALUE, bfsMax = Integer.MIN_VALUE, revMax = Integer.MIN_VALUE, biMax = Integer.MIN_VALUE;
int totalCount = 0;
Random r = new Random(1);
for (int i = 0; i < nodeList.size(); i++) {
int a = i;
for (int j = 0; j < 200; j++) {
int b;
do {
b = r.nextInt(nodes.size());
} while (a == b);
Node na = nodeList.get(a);
Node nb = nodeList.get(b);
totalCount++;
AtomicInteger x = new AtomicInteger();
boolean r1 = depthFirstSearch(x, na, nb);
dfsCount += x.get();
dfsMin = Math.min(dfsMin, x.get());
dfsMax = Math.max(dfsMax, x.get());
x.set(0);
boolean r2 = breathFirstSearch(x, na, nb);
bfsCount += x.get();
bfsMin = Math.min(bfsMin, x.get());
bfsMax = Math.max(bfsMax, x.get());
x.set(0);
boolean r3 = reverseBreathFirstSearch(x, na, nb);
revCount += x.get();
revMin = Math.min(revMin, x.get());
revMax = Math.max(revMax, x.get());
x.set(0);
boolean r4 = bidirectionalAdaptiveBFS(x, na, nb);
biCount += x.get();
biMin = Math.min(biMin, x.get());
biMax = Math.max(biMax, x.get());
x.set(0);
if (r1 != r2 || r1 != r3 || r1 != r4) {
depthFirstSearchTrace(na, nb);
bidirectionalAdaptiveBFS(x, na, nb);
throw new AssertionError(r1 + " " + r2 + " " + r3 + " " + r4);
}
}
}
System.out.println("Depth-first search");
System.out.printf("min: %d, max: %d, avg: %2.2f\n", dfsMin, dfsMax, ((double) dfsCount / totalCount));
System.out.println();
System.out.println("Breath-first search");
System.out.printf("min: %d, max: %d, avg: %2.2f\n", bfsMin, bfsMax, ((double) bfsCount / totalCount));
System.out.println();
System.out.println("Reverse breath-first search");
System.out.printf("min: %d, max: %d, avg: %2.2f\n", revMin, revMax, ((double) revCount / totalCount));
System.out.println();
System.out.println("Bidirectional adaptive breath-first search");
System.out.printf("min: %d, max: %d, avg: %2.2f\n", biMin, biMax, ((double) biCount / totalCount));
System.out.println();
static boolean depthFirstSearch(AtomicInteger count, Node source, Node target) {
HashSet<Node> tested = new HashSet<>();
tested.add(source);
return depthFirstSearch(count, tested, source, target);
}
static boolean depthFirstSearch(AtomicInteger count, HashSet<Node> tested, Node source, Node target) {
count.incrementAndGet();
for(Node n : source.references) {
if (n == target) {
return true;
}
if (!tested.contains(n)) {
tested.add(n);
if (depthFirstSearch(count, tested, n, target)) {
return true;
}
}
}
return false;
}
static boolean breathFirstSearch(AtomicInteger count, Node source, Node target) {
HashSet<Node> tested = new HashSet<>();
tested.add(source);
return breathFirstSearch(count, tested, source, target);
}
static boolean breathFirstSearch(AtomicInteger count, HashSet<Node> tested, Node source, Node target) {
count.incrementAndGet();
for(Node n : source.references) {
if (n == target) {
return true;
}
}
for(Node n : source.references) {
if (!tested.contains(n)) {
tested.add(n);
if (breathFirstSearch(count, tested, n, target)) {
return true;
}
}
}
return false;
}
static boolean reverseBreathFirstSearch(AtomicInteger count, Node source, Node target) {
HashSet<Node> tested = new HashSet<>();
tested.add(target);
return reverseBreathFirstSearch(count, tested, source, target);
}
static boolean reverseBreathFirstSearch(AtomicInteger count, HashSet<Node> tested, Node source, Node target) {
count.incrementAndGet();
for(Node n : target.referencedBy) {
if (n == source) {
return true;
}
}
for(Node n : target.referencedBy) {
if (!tested.contains(n)) {
tested.add(n);
if (reverseBreathFirstSearch(count, tested, source, n)) {
return true;
}
}
}
return false;
}
static boolean bidirectionalAdaptiveBFS(AtomicInteger count, Node source, Node target) {
HashSet<Node> allSources = new HashSet<>();
HashSet<Node> sources = new HashSet<>();
allSources.add(source);
sources.add(source);
HashSet<Node> allTargets = new HashSet<>();
HashSet<Node> targets = new HashSet<>();
allTargets.add(target);
targets.add(target);
return bidirectionalAdaptiveBFS(count, allSources, allTargets, sources, targets);
}
static boolean bidirectionalAdaptiveBFS(AtomicInteger count, Set<Node> allSources, Set<Node> allTargets, Set<Node> sources, Set<Node> targets) {
while (!sources.isEmpty() && !targets.isEmpty()) {
if (sources.size() <= targets.size()) {
HashSet<Node> newSources = new HashSet<>();
for(Node source: sources) {
count.incrementAndGet();
for(Node n : source.references) {
if (!allSources.contains(n)) {
newSources.add(n);
allSources.add(n);
if (allTargets.contains(n)) {
return true;
}
}
}
}
sources = newSources;
} else {
HashSet<Node> newTargets = new HashSet<>();
for(Node target: targets) {
count.incrementAndGet();
for(Node n : target.referencedBy) {
if (!allTargets.contains(n)) {
newTargets.add(n);
allTargets.add(n);
if (allSources.contains(n)) {
return true;
}
}
}
}
targets = newTargets;
}
}
return false;
}
static class Node {
String name;
HashSet<Node> references = new HashSet<>();
HashSet<Node> referencedBy = new HashSet<>();
boolean marked;
Node(String name) {
this.name = name;
}
void addReference(Node n) {
references.add(n);
n.referencedBy.add(this);
}
public String toString() {
return name;
}
@Override
public boolean equals(Object other) {
if (!(other instanceof Node)) {
return false;
}
return name.equals(((Node) other).name);
}
@Override
public int hashCode() {
return name.hashCode();
}
}

Related

How to calculate a height of a tree

I am trying to learn DSA and got stuck on one problem.
How do I calculate the height of a tree? I mean a normal tree, not any specific implementation of a tree like a BT or BST.
I have tried Google, but it seems everyone is talking about binary trees and nothing is available for a normal tree.
Can anyone redirect me to a page or article on calculating the height of a tree?
Let's say a typical node in your tree is represented as a Java class.
class Node{
Entry entry;
ArrayList<Node> children;
Node(Entry entry, ArrayList<Node> children){
this.entry = entry;
this.children = children;
}
ArrayList<Node> getChildren(){
return children;
}
}
Then a simple Height Function can be -
int getHeight(Node node){
if(node == null){
return 0;
}else if(node.getChildren() == null){
return 1;
} else{
int childrenMaxHeight = 0;
for(Node n : node.getChildren()){
childrenMaxHeight = Math.max(childrenMaxHeight, getHeight(n));
}
return 1 + childrenMaxHeight;
}
}
Then you just need to call this function, passing the root of the tree as the argument. Since it traverses all the nodes exactly once, the run time is O(n).
1. If the height of a leaf node is considered 0 (i.e. height is measured by the number of edges on the longest path from the root to a leaf):
int maxHeight(treeNode<int>* root){
    if(root == NULL)
        return -1; // -1 because if a leaf node is 0, then a NULL node should be -1
    int h = 0;
    for(int i = 0; i < root->childNodes.size(); i++){
        int temp = maxHeight(root->childNodes[i]);
        if(temp > h){
            h = temp;
        }
    }
    return h + 1;
}
2. If the height of the root node is considered 1:
int maxHeight(treeNode<int>* root){
    if(root == NULL)
        return 0;
    int h = 0;
    for(int i = 0; i < root->childNodes.size(); i++){
        int temp = maxHeight(root->childNodes[i]);
        if(temp > h){
            h = temp;
        }
    }
    return h + 1;
}
The above code is based upon the following class:
template <typename T>
class treeNode{
public:
    T data;
    vector<treeNode<T>*> childNodes; // vector for storing pointers to child tree nodes
    // constructor: create a tree node
    treeNode(T data){
        this->data = data;
    }
};
In the case of a 'normal tree' you can recursively calculate the height of the tree in a similar fashion to a binary tree, but here you have to consider all children at a node instead of just two.
To find the height of a tree, a BFS iteration will work fine.
Edited from Wikipedia:
Breadth-First-Search(Graph, root):
    create empty set S
    create empty queues Q1, Q2
    root.parent = NIL
    height = -1
    Q1.enqueue(root)
    while Q1 is not empty:
        height = height + 1
        switch Q1 and Q2
        while Q2 is not empty:
            current = Q2.dequeue()
            for each node n that is adjacent to current:
                if n is not in S:
                    add n to S
                    n.parent = current
                    Q1.enqueue(n)
You can see that adding another queue lets us know which level of the tree we are on.
It iterates over each level, and over each node in that level.
This is an iterative way to do it (the opposite of recursive), so you don't have to worry about recursion depth either.
The run time is O(|V| + |E|).
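For reference, here is a rough Java translation of the two-queue idea for a general (n-ary) tree; the Node type is a minimal stand-in, not taken from the question:
import java.util.*;

class TreeHeightBFS {
    static class Node {
        List<Node> children = new ArrayList<>();
    }

    static int height(Node root) {
        if (root == null) return -1;            // height counted in edges; empty tree = -1
        Deque<Node> q1 = new ArrayDeque<>();
        q1.add(root);
        int height = -1;
        while (!q1.isEmpty()) {
            height++;
            Deque<Node> q2 = q1;                // "switch Q1 and Q2"
            q1 = new ArrayDeque<>();
            while (!q2.isEmpty()) {
                Node current = q2.poll();
                q1.addAll(current.children);    // enqueue the next level
            }
        }
        return height;
    }

    public static void main(String[] args) {
        Node root = new Node();
        Node child = new Node();
        root.children.add(child);
        child.children.add(new Node());
        System.out.println(height(root));       // 2: two edges on the longest path
    }
}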

My A* pathfinding implementation does not produce the shortest path

I am building a flash game that requires correct pathfinding. I used the pseudo code in this tutorial and a diagonal heuristic. I did not closely follow their code. The language is ActionScript 3 and I am also using flashpunk libraries.
My current issue is that the program is producing a path that is clearly not the shortest path possible. Here is a screenshot showing the problem:
The grey blocks are non traversable, the green blocks mark nodes that have been "visited" and the blue blocks show the path generated by the algorithm.
It looks as if the diagonal travel cost is equal to the non-diagonal travel cost, despite my attempt to make the diagonal cost higher (1.414).
This is the overall algorithm implementation.
function solveMaze() {
// intitialize starting node
startingNode.g = 0;
startingNode.h = diagonalHeuristic(startingNode, destinationNode);
startingNode.f = startingNode.g + startingNode.h;
// Loop until destination node has been reached.
while (currentNode != destinationNode) {
if (openNodes.length == 0) {
return null;
}
// set lowest cost node in openNode list to current node
currentNode = lowestCostInArray(openNodes);
//remove current node from openList
openNodes.splice(openNodes.indexOf(currentNode), 1);
//find 8 nodes adjacent to current node
connectedNodes = findConnectedNodes(currentNode);
//for each adjacent node,
for each (var n:Node in connectedNodes) {
// if node is not in open list AND its not in closed list AND its traversable
if ((openNodes.indexOf(n) == -1) && (closedNodes.indexOf(n) == -1) && n.traversable) {
// Calculate g and h values for the adjacent node and add the adjacent node to the open list
// also set the current node as the parent of the adjacent node
if ((n.mapX != currentNode.mapX) && (n.mapY != currentNode.mapY)) {
cost = 1.414;
} else {
cost = 1;
}
if(n.g> currentNode.g + cost){
n.g = currentNode.g + cost;
n.f=calculateCostOfNode(n);
n.parentNode =currentNode;
openNodes.push(n);
}
}
}
// turn current node into grass to indicate its been traversed
currentNode.setType("walked_path");
//var temp2:TextEntity = new TextEntity(n.h.toFixed(1).toString(), 32 * currentNode.mapX, 32 * currentNode.mapY);
//add(temp2);
// add current node to closed list
closedNodes.push(currentNode);
}
// create a path from the destination node back to the starting node by following each parent node
var tempNode:Node = destinationNode.parentNode;
tempNode.setType("path2"); // blue blocks
while(tempNode != startingNode){
tempNode = tempNode.parentNode;
tempNode.setType("path2");
}
}
These were the helper functions used:
function findConnectedNodes(inputNode:Node):Array {
var outputArray:Array=[];
// obtain all nodes that are either 1 unit away or 1.4 units away.
for each (var n:Node in listOfNodes){
if ((diagonalHeuristic(inputNode, n) == 1) || (diagonalHeuristic(inputNode, n) == 1.4)) {
outputArray.push(n);
}
}
return outputArray;
}
public static function diagonalHeuristic(node:Node, destinationNode:Node, cost:Number = 1.0, diagonalCost:Number = 1.4):Number {
var dx:Number = Math.abs(node.mapX - destinationNode.mapX);
var dy:Number = Math.abs(node.mapY - destinationNode.mapY);
if (dx > dy) {
return diagonalCost * dy + (dx - dy);
}else {
return diagonalCost * dx + (dy - dx);
}
}
function lowestCostInArray(inputArray:Array):Node {
var tempNode:Node = inputArray[0];
for each (var n:Node in inputArray) {
if (n.f < tempNode.f) {
tempNode = n;
}
}
return tempNode;
}
I can provide the project source code if it would help.
I see a few potential things wrong.
You are potentially overwriting values here:
n.g = currentNode.g + cost;
n.f=calculateCostOfNode(n);
n.parentNode =currentNode;
openNodes.push(n);
It should be:
if n.g > currentNode.g + cost:
n.g = currentNode.g + cost;
n.f=calculateCostOfNode(n);
n.parentNode =currentNode;
if n not already in openNodes:
openNodes.push(n);
With n.g initialized to a very large value; or you can do the check as "n is not in the open set, or n.g > currentNode.g + cost".
You should remove the check if (openNodes.indexOf(n) == -1) from where you have it now and put it where I said. If the new g cost is better, you should update it, even if the node is already in the open list. As written, you only update each node once; if it so happens that you check diagonals first, you will completely ignore the side steps.
This is likely the problem: by ignoring neighbors that are in the open list, you will only update their cost once. It is OK to update their cost as long as they are not in the closed list.
I don't know for sure if this is a valid concern, but I think you're playing with fire a little by using 1.414 in the heuristic function. The heuristic function has to be admissible, which means it should never overestimate the cost. If you run into some floating point issues, you might overestimate. I'd play it safe and use 1.4 for the heuristic and 1.414 for the actual cost between diagonally adjacent nodes.

Dijkstra algorithm optimization regarding priority queueu

I use the code below in a simulation. Because I am calling the dijkstra method over and over, performance is very crucial for me. I use a PriorityQueue to keep the nodes of the graph in ascending order of their distance to the source. The PriorityQueue gives me access to the node with the smallest distance with O(log n) complexity. However, to keep the nodes in order after recalculating a node's distance, I need to first remove the node and then add it again. I suppose there may be a better way. I appreciate ANY feedback. Thanks in advance to the whole community.
public HashMap<INode, Double> getSingleSourceShortestDistance(INode sourceNode) {
HashMap<INode, Double> distance = new HashMap<>();
PriorityQueue<INode> pq;
// The nodes are stored in a priority queue in which all nodes are sorted
// according to their estimated distances.
INode u = null;
INode v = null;
double alt;
Set<INode> nodeset = nodes.keySet();
Iterator<INode> iter = nodeset.iterator();
//Mark all nodes with infinity
while (iter.hasNext()) {
INode node = iter.next();
distance.put(node, Double.POSITIVE_INFINITY);
previous.put(node, null);
}
iter = null;
// Mark the distance[source] as 0
distance.put(sourceNode, 0d);
pq = new PriorityQueue<>(this.network.getNodeCount(), new NodeComparator(distance));
pq.addAll(nodeset);
// Loop while the queue is not empty
while (!pq.isEmpty()) {
// Fetch the node with the smallest estimated distance.
u = pq.peek();
/**
* break the loop if the distance is greater than the max net size.
* That shows that the nodes in the queue can not be reached from
* the source node.
*/
if ((Double.isInfinite(distance.get(u).doubleValue()))) {
break;
}
// Remove the node with the smallest estimated distance.
pq.remove(u);
// Iterate over all nodes (v) which are neighbors of node u
iter = nodes.get(u).keySet().iterator();
while (iter.hasNext()) {
v = (INode) iter.next();
alt = distance.get(u) + nodes.get(u).get(v).getDistance();
if (alt < distance.get(v)) {
distance.put(v, alt);
//To reorder the queue node v is first removed and then inserted.
pq.remove(v);
pq.add(v);
}
}
}
return distance;
}
protected static class NodeComparator<INode> implements Comparator<INode> {
private Map<INode, Number> distances;
protected NodeComparator(Map<INode, Number> distances) {
this.distances = distances;
}
@Override
public int compare(INode node1, INode node2) {
return ((Double) distances.get(node1)).compareTo((Double) distances.get(node2));
}
}
You could use a Heap with increase_key and decrease_key implemented, so you could update the node distance without removing and adding it again.
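For illustration, here is a minimal sketch of such a heap: an indexed binary min-heap with decreaseKey. It assumes nodes are identified by integer indices 0..n-1 rather than by your INode objects, so it is a sketch of the idea, not a drop-in replacement for the code above.
import java.util.*;

class IndexedMinHeap {
    private final int[] heap;      // heap position -> node id
    private final int[] pos;       // node id -> heap position, -1 if not in the heap
    private final double[] key;    // node id -> current key (estimated distance)
    private int size = 0;

    IndexedMinHeap(int n) {
        heap = new int[n];
        pos = new int[n];
        key = new double[n];
        Arrays.fill(pos, -1);
    }

    boolean isEmpty() { return size == 0; }
    boolean contains(int node) { return pos[node] != -1; }

    void insert(int node, double k) {
        key[node] = k;
        heap[size] = node;
        pos[node] = size;
        siftUp(size++);
    }

    int extractMin() {
        int min = heap[0];
        swap(0, --size);
        pos[min] = -1;
        if (size > 0) siftDown(0);
        return min;
    }

    // Lower the key of a node that is already in the heap and restore the heap order.
    void decreaseKey(int node, double k) {
        key[node] = k;
        siftUp(pos[node]);
    }

    private void siftUp(int i) {
        while (i > 0 && key[heap[(i - 1) / 2]] > key[heap[i]]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int smallest = i, l = 2 * i + 1, r = 2 * i + 2;
            if (l < size && key[heap[l]] < key[heap[smallest]]) smallest = l;
            if (r < size && key[heap[r]] < key[heap[smallest]]) smallest = r;
            if (smallest == i) return;
            swap(i, smallest);
            i = smallest;
        }
    }

    private void swap(int i, int j) {
        int tmp = heap[i]; heap[i] = heap[j]; heap[j] = tmp;
        pos[heap[i]] = i;
        pos[heap[j]] = j;
    }
}
In the relaxation step you would then call decreaseKey(v, alt) while the node is still in the heap, instead of remove(v) followed by add(v). Both heap operations are O(log n), whereas PriorityQueue.remove(Object) does a linear scan.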

Binary tree level order traversal

Three types of tree traversals are inorder, preorder, and post order.
A fourth, less often used, traversal is level-order traversal. In a
level-order traversal, all nodes at depth "d" are processed before
any node at depth d + 1. Level-order traversal differs from the other
traversals in that it is not done recursively; a queue is used,
instead of the implied stack of recursion.
My questions on above text snippet are
Why level order traversals are not done recursively?
How queue is used in level order traversal? Request clarification with Pseudo code will be helpful.
Thanks!
Level order traversal is actually a BFS, which is not recursive by nature. It uses a Queue instead of a Stack to hold the next vertices that should be opened. The reason is that in this traversal you want to open the nodes in FIFO order, instead of the LIFO order obtained by recursion.
As I mentioned, level order traversal is actually a BFS, and its [BFS] pseudocode [taken from Wikipedia] is:
1 procedure BFS(Graph,source):
2 create a queue Q
3 enqueue source onto Q
4 mark source
5 while Q is not empty:
6 dequeue an item from Q into v
7 for each edge e incident on v in Graph:
8 let w be the other end of e
9 if w is not marked:
10 mark w
11 enqueue w onto Q
(*) in a tree, marking the vertices is not needed, since you cannot get to the same node in 2 different paths.
void levelorder(Node *n)
{ queue < Node * >q;
q.push(n);
while(!q.empty())
{
Node *node = q.front();
cout<<node->value;
q.pop();
if(node->left != NULL)
q.push(node->left);
if (node->right != NULL)
q.push(node->right);
}
}
Instead of a queue, I used a map to solve this. Take a look, if you are interested. As I do a depth-first traversal, I maintain the depth at which each node is positioned and use this depth as the key in a map to collect values in the same level.
class Solution {
public:
map<int, vector<int> > levelValues;
void recursivePrint(TreeNode *root, int depth){
if(root == NULL)
return;
if(levelValues.count(depth) == 0)
levelValues.insert(make_pair(depth, vector<int>()));
levelValues[depth].push_back(root->val);
recursivePrint(root->left, depth+1);
recursivePrint(root->right, depth+1);
}
vector<vector<int> > levelOrder(TreeNode *root) {
recursivePrint(root, 1);
vector<vector<int> > result;
for(map<int,vector<int> >::iterator it = levelValues.begin(); it!= levelValues.end(); ++it){
result.push_back(it->second);
}
return result;
}
};
The entire solution can be found here - http://ideone.com/zFMGKU
The solution returns a vector of vectors with each inner vector containing the elements in the tree in the correct order.
you can try solving it here - https://oj.leetcode.com/problems/binary-tree-level-order-traversal/
And, as you can see, we can also do this recursively in the same time and space complexity as the queue solution!
My questions on above text snippet are
Why level order traversals are not done recursively?
How queue is used in level order traversal? Request clarification with Pseudo code will be helpful.
I think it'd actually be easier to start with the second question. Once you understand the answer to the second question, you'll be better prepared to understand the answer to the first.
How level order traversal works
I think the best way to understand how level order traversal works is to go through the execution step by step, so let's do that.
We have a tree.
We want to traverse it level by level.
So, the order that we'd visit the nodes would be A B C D E F G.
To do this, we use a queue. Remember, queues are first in, first out (FIFO). I like to imagine that the nodes are waiting in line to be processed by an attendant.
Let's start by putting the first node A into the queue.
Ok. Buckle up. The setup is over. We're about to start diving in.
The first step is to take A out of the queue so it can be processed. But wait! Before we do so, let's put A's children, B and C, into the queue also.
Note: A isn't actually in the queue anymore at this point. I grayed it out to try to communicate this. If I removed it completely from the diagram, it'd make it harder to visualize what's happening later on in the story.
Note: A is being processed by the attendant at the desk in the diagram. In real life, processing a node can mean a lot of things. Using it to compute a sum, send an SMS, log to the console, etc, etc. Going off the metaphor in my diagram, you can tell the attendant how you want them to process the node.
Now we move on to the node that is next in line. In this case, B.
We do the same thing that we did with A: 1) add the children to the line, and 2) process the node.
Hey, check it out! It looks like what we're doing here is going to get us that level order traversal that we were looking for! Let's prove this to ourselves by continuing the step through.
Once we finish with B, C is next in line. We place C's children at the back of the line, and then process C.
Now let's see what happens next. D is next in line. D doesn't have any children, so we don't place anything at the back of the line. We just process D.
And then it's the same thing for E, F, and G.
Why it's not done recursively
Imagine what would happen if we used a stack instead of a queue. Let's rewind to the point where we had just visited A.
Here's how it'd look if we were using a stack.
Now, instead of going "in order", this new attendant likes to serve the most recent clients first, not the ones who have been waiting the longest. So C is who is up next, not B.
Here's where the key point is. Where the stack starts to cause a different processing order than we had with the queue.
Like before, we add C's children and then process C. We're just adding them to a stack instead of a queue this time.
Now, what's next? This new attendant likes to serve the most recent clients first (ie. we're using a stack), so G is up next.
I'll stop the execution here. The point is that something as simple as replacing the queue with a stack actually gives us a totally different execution order. I'd encourage you to finish the step through though.
You might be thinking: "Ok... but the question asked about recursion. What does this have to do with recursion?" Well, when you use recursion, something sneaky is going on. You never did anything with a stack data structure like s = new Stack(). However, the runtime uses the call stack. This ends up being conceptually similar to what I did above, and thus doesn't give us that A B C D E F G ordering we were looking for from level order traversal.
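To make the contrast concrete, here is a small self-contained Java demo (mine, not part of the original explanation) that runs the same loop once with a queue and once with a stack on the A..G tree from the story above, and prints the two orders:
import java.util.*;

class QueueVsStackDemo {
    static class Node {
        String name; Node left, right;
        Node(String name, Node left, Node right) { this.name = name; this.left = left; this.right = right; }
    }

    public static void main(String[] args) {
        // The example tree: A has children B and C; B has D and E; C has F and G.
        Node root = new Node("A",
                new Node("B", new Node("D", null, null), new Node("E", null, null)),
                new Node("C", new Node("F", null, null), new Node("G", null, null)));

        // Queue (FIFO): nodes are processed in the order they were discovered.
        Deque<Node> queue = new ArrayDeque<>(List.of(root));
        StringBuilder byQueue = new StringBuilder();
        while (!queue.isEmpty()) {
            Node n = queue.pollFirst();
            byQueue.append(n.name).append(' ');
            if (n.left != null) queue.addLast(n.left);
            if (n.right != null) queue.addLast(n.right);
        }
        System.out.println("queue: " + byQueue);   // A B C D E F G  (level order)

        // Stack (LIFO): the most recently added node is processed first, like the call stack in recursion.
        Deque<Node> stack = new ArrayDeque<>(List.of(root));
        StringBuilder byStack = new StringBuilder();
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            byStack.append(n.name).append(' ');
            if (n.left != null) stack.push(n.left);
            if (n.right != null) stack.push(n.right);
        }
        System.out.println("stack: " + byStack);   // A C G F B E D  (not level order)
    }
}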
https://github.com/arun2pratap/data-structure/blob/master/src/main/java/com/ds/tree/binarytree/BinaryTree.java
For the complete implementation, you can look at the link above.
public void levelOrderTreeTraversal(List<Node<T>> nodes){
if(nodes == null || nodes.isEmpty()){
return;
}
List<Node<T>> levelNodes = new ArrayList<>();
nodes.stream().forEach(node -> {
if(node != null) {
System.out.print(" " + node.value);
levelNodes.add(node.left);
levelNodes.add(node.right);
}
});
System.out.println("");
levelOrderTreeTraversal(levelNodes);
}
You can also check out http://www.geeksforgeeks.org/ where you will find answers for almost all data structure topics.
Level order traversal implemented by queue
# class TreeNode:
# def __init__(self, val=0, left=None, right=None):
# self.val = val
# self.left = left
# self.right = right
def levelOrder(root: TreeNode) -> List[int]:
    res = []  # store the node values
    queue = [root]
    while queue:
        node = queue.pop()
        # visit the node
        res.append(node.val)
        if node.left:
            queue.insert(0, node.left)
        if node.right:
            queue.insert(0, node.right)
    return res
Recursive implementation is also possible. However, it needs to know the max depth of the root in advance.
def levelOrder(root: TreeNode) -> List[int]:
    res = []
    max_depth = maxDepth(root)
    for i in range(max_depth):
        # levels go from 0 to max_depth - 1
        visitLevel(root, i, res)
    return res

def visitLevel(root: TreeNode, level: int, res: List):
    if not root:
        return
    if level == 0:
        res.append(root.val)
    else:
        visitLevel(root.left, level - 1, res)
        visitLevel(root.right, level - 1, res)

def maxDepth(root: TreeNode) -> int:
    if not root:
        return 0
    if not root.left and not root.right:
        return 1
    return max([maxDepth(root.left), maxDepth(root.right)]) + 1
For your point 1), we can use the Java code below for level order traversal in a recursive manner; we have not used any library functions for the tree, all are user-defined tree-specific functions -
class Node
{
int data;
Node left, right;
public Node(int item)
{
data = item;
left = right = null;
}
boolean isLeaf() { return left == null ? right == null : false; }
}
public class BinaryTree {
Node root;
Queue<Node> nodeQueue = new ConcurrentLinkedDeque<>();
public BinaryTree() {
root = null;
}
public static void main(String args[]) {
BinaryTree tree = new BinaryTree();
tree.root = new Node(1);
tree.root.left = new Node(2);
tree.root.right = new Node(3);
tree.root.left.left = new Node(4);
tree.root.left.right = new Node(5);
tree.root.right.left = new Node(6);
tree.root.right.right = new Node(7);
tree.root.right.left.left = new Node(8);
tree.root.right.left.right = new Node(9);
tree.printLevelOrder();
}
/*Level order traversal*/
void printLevelOrder() {
int h = height(root);
int i;
for (i = 1; i <= h; i++)
printGivenLevel(root, i);
System.out.println("\n");
}
void printGivenLevel(Node root, int level) {
if (root == null)
return;
if (level == 1)
System.out.print(root.data + " ");
else if (level > 1) {
printGivenLevel(root.left, level - 1);
printGivenLevel(root.right, level - 1);
}
}
/*Height of Binary tree*/
int height(Node root) {
if (root == null)
return 0;
else {
int lHeight = height(root.left);
int rHeight = height(root.right);
if (lHeight > rHeight)
return (lHeight + 1);
else return (rHeight + 1);
}
}
}
For your point 2) If you want to use non recursive function then you can use queue as below function-
public void levelOrder_traversal_nrec(Node node){
System.out.println("Level order traversal !!! ");
if(node == null){
System.out.println("Tree is empty");
return;
}
nodeQueue.add(node);
while (!nodeQueue.isEmpty()){
node = nodeQueue.remove();
System.out.printf("%s ",node.data);
if(node.left !=null)
nodeQueue.add(node.left);
if (node.right !=null)
nodeQueue.add(node.right);
}
System.out.println("\n");
}
Recursive Solution in C++
/**
* Definition for a binary tree node.
* struct TreeNode {
* int val;
* TreeNode *left;
* TreeNode *right;
* TreeNode() : val(0), left(nullptr), right(nullptr) {}
* TreeNode(int x) : val(x), left(nullptr), right(nullptr) {}
* TreeNode(int x, TreeNode *left, TreeNode *right) : val(x), left(left), right(right) {}
* };
*/
class Solution {
public:
vector<vector<int>> levels;
void helper(TreeNode* node,int level)
{
if(levels.size() == level) levels.push_back({});
levels[level].push_back(node->val);
if(node->left)
helper(node->left,level+1);
if(node->right)
helper(node->right,level+1);
}
vector<vector<int>> levelOrder(TreeNode* root) {
if(!root) return levels;
helper(root,0);
return levels;
}
};
We can use a queue to solve this problem in less time. Here is a solution for level order traversal using Java.
class Solution {
public List<List<Integer>> levelOrder(TreeNode root) {
List<List<Integer>> levelOrderTraversal = new ArrayList<List<Integer>>();
List<Integer> currentLevel = new ArrayList<Integer>();
Queue<TreeNode> queue = new LinkedList<TreeNode>();
if(root != null)
{
queue.add(root);
queue.add(null);
}
while(!queue.isEmpty())
{
TreeNode queueRoot = queue.poll();
if(queueRoot != null)
{
currentLevel.add(queueRoot.val);
if(queueRoot.left != null)
{
queue.add(queueRoot.left);
}
if(queueRoot.right != null)
{
queue.add(queueRoot.right);
}
}
else
{
levelOrderTraversal.add(currentLevel);
if(!queue.isEmpty())
{
currentLevel = new ArrayList<Integer>();
queue.add(null);
}
}
}
return levelOrderTraversal;
}
}

Data structure for handling intervals

I have got a series of time intervals (t_start, t_end) that cannot overlap, i.e. t_end(i) <= t_start(i+1). I want to do the following operations:
1) Add new (union of) intervals [ {(1,4),(8,10)} U (3,7) = {(1,7),(8,10)} ]
2) Take intervals out [ (1,7) - (3,5) = {(1,3),(5,7)} ]
3) Check whether a point or an interval overlaps with an interval in my series (intersection)
4) Find the first "non-interval" of a minimum length after some point [ {(1,4),(7,8)}: there is a "non-interval" of length 3 between 4 and 7 ].
I want to know good ways of implementing this, with low complexity (log n for all operations would do it).
Related question: Data structure for quick time interval look up
It sounds like you could just use a balanced binary tree of all the boundary times.
For example, represent {(1,4), (8,10), (12,15)} as a tree containing 1, 4, 8, 10, 12, and 15.
Each node needs to say whether it's the start or end of an interval. So:
            8 (start)
           /         \
   1 (start)          12 (start)
           \          /         \
        4 (end)  10 (end)     15 (end)
(Here all the "end" nodes ended up at the bottom by coincidence.)
Then I think you can have all your operations in O(log n) time. To add an interval:
Find the start time. If it's already in the tree as a start time, you can leave it there. If it's already in the tree as an end time, you'll want to remove it. If it's not in the tree and it doesn't fall during an existing interval, you'll want to add it. Otherwise you don't want to add it.
Find the stop time, using the same method to find out if you need to add it, remove it, or neither.
Now you just want to add or remove the abovementioned start and stop nodes and, at the same time, delete all the existing nodes in between. To do this you only need to rebuild the tree nodes at or directly above those two places in the tree. If the height of the tree is O(log n), which you can guarantee by using a balanced tree, this takes O(log n) time.
(Disclaimer: If you're in C++ and doing explicit memory management, you might end up freeing more than O(log n) pieces of memory as you do this, but really the time it takes to free a node should be billed to whoever added it, I think.)
Removing an interval is largely the same.
Checking a point or interval is straightforward.
Finding the first gap of at least a given size after a given time can be done in O(log n) too, if you also cache two more pieces of information per node:
In each start node (other than the leftmost), the size of the gap immediately to the left.
In every node, the size of the largest gap that appears in that subtree.
To find the first gap of a given size that appears after a given time, first find that time in the tree. Then walk up until you reach a node that claims to contain a large enough gap. If you came up from the right, you know this gap is to the left, so you ignore it and keep walking up. Otherwise you came from the left. If the node is a start node, check to see if the gap to its left is large enough. If so, you're done. Otherwise, the large-enough gap must be somewhere to the right. Walk down to the right and continue down until you find the gap. Again, because the height of the tree is O(log n), walking it three times (down, up, and possibly down again) is O(log n).
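To make this concrete, here is a rough sketch of the add (union) and point-check operations using java.util.TreeMap as the balanced tree of boundary times. It assumes half-open intervals [start, end) and only covers two of the four operations; removal and gap search would follow the same pattern of inspecting the neighbouring boundaries.
import java.util.*;

class IntervalSetSketch {
    // Keys are boundary times; the value says whether the boundary starts (true) or ends (false) an interval.
    private final TreeMap<Integer, Boolean> bounds = new TreeMap<>();

    // Is the point t inside one of the stored intervals?
    boolean covers(int t) {
        Map.Entry<Integer, Boolean> e = bounds.floorEntry(t);
        return e != null && e.getValue();   // nearest boundary at or before t is a start boundary
    }

    // Add [start, end), merging with any intervals it touches.
    void add(int start, int end) {
        Boolean startKind = bounds.get(start);   // true = start boundary, false = end boundary, null = absent
        Boolean endKind = bounds.get(end);
        boolean startCovered = covers(start);
        boolean endCovered = covers(end);
        // Delete every boundary strictly inside the new interval.
        bounds.subMap(start, false, end, false).clear();
        if (startKind == null) {
            if (!startCovered) bounds.put(start, true);   // starts in a gap: new start boundary
        } else if (!startKind) {
            bounds.remove(start);                         // fell on an end boundary: the intervals merge
        }                                                 // fell on a start boundary: keep it
        if (endKind == null) {
            if (!endCovered) bounds.put(end, false);      // ends in a gap: new end boundary
        } else if (endKind) {
            bounds.remove(end);                           // fell on a start boundary: the intervals merge
        }
    }
}
For example, starting from {(1,4), (8,10)}, add(3, 7) leaves the boundaries 1(start), 7(end), 8(start), 10(end), i.e. {(1,7), (8,10)}, and covers(5) then returns true.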
Without knowing any more specifics, I'd suggest reading about Interval Trees. Interval trees are a special 1-dimensional case of more generic kd-trees; they have O(n log n) construction time and O(log n) typical operation times. Exact algorithm implementations you'd need to find yourself, but you can start by looking at CGAL.
I know you've already accepted an answer, but since you indicated that you will probably be implementing in C++, you could also have a look at Boost's Interval Container Library (http://www.boost.org/doc/libs/1_46_1/libs/icl/doc/html/index.html).
My interval tree implementation with AVL tree.
public class IntervalTreeAVL<T>{
private static class TreeNode<T>{
private T low;
private T high;
private TreeNode<T> left;
private TreeNode<T> right;
private T max;
private int height;
private TreeNode(T l, T h){
this.low=l;
this.high=h;
this.max=high;
this.height=1;
}
}
private TreeNode<T> root;
public void insert(T l, T h){
root=insert(root, l, h);
}
private TreeNode<T> insert(TreeNode<T> node, T l, T h){
if(node==null){
return new TreeNode<T>(l, h);
}
else{
int k=((Comparable)node.low).compareTo(l);
if(k>0){
node.left=insert(node.left, l, h);
}
else{
node.right=insert(node.right, l, h);
}
node.height=Math.max(height(node.left), height(node.right))+1;
node.max=findMax(node);
int hd = heightDiff(node);
if(hd<-1){
int kk=heightDiff(node.right);
if(kk>0){
node.right=rightRotate(node.right);
return leftRotate(node);
}
else{
return leftRotate(node);
}
}
else if(hd>1){
if(heightDiff(node.left)<0){
node.left = leftRotate(node.left);
return rightRotate(node);
}
else{
return rightRotate(node);
}
}
else;
}
return node;
}
private TreeNode<T> leftRotate(TreeNode<T> n){
TreeNode<T> r = n.right;
n.right = r.left;
r.left=n;
n.height=Math.max(height(n.left), height(n.right))+1;
r.height=Math.max(height(r.left), height(r.right))+1;
n.max=findMax(n);
r.max=findMax(r);
return r;
}
private TreeNode<T> rightRotate(TreeNode<T> n){
TreeNode<T> r = n.left;
n.left = r.right;
r.right=n;
n.height=Math.max(height(n.left), height(n.right))+1;
r.height=Math.max(height(r.left), height(r.right))+1;
n.max=findMax(n);
r.max=findMax(r);
return r;
}
private int heightDiff(TreeNode<T> a){
if(a==null){
return 0;
}
return height(a.left)-height(a.right);
}
private int height(TreeNode<T> a){
if(a==null){
return 0;
}
return a.height;
}
private T findMax(TreeNode<T> n){
if(n.left==null && n.right==null){
return n.max;
}
if(n.left==null){
if(((Comparable)n.right.max).compareTo(n.max)>0){
return n.right.max;
}
else{
return n.max;
}
}
if(n.right==null){
if(((Comparable)n.left.max).compareTo(n.max)>0){
return n.left.max;
}
else{
return n.max;
}
}
Comparable c1 = (Comparable)n.left.max;
Comparable c2 = (Comparable)n.right.max;
Comparable c3 = (Comparable)n.max;
T max=null;
if(c1.compareTo(c2)<0){
max=n.right.max;
}
else{
max=n.left.max;
}
if(c3.compareTo((Comparable)max)>0){
max=n.max;
}
return max;
}
TreeNode intervalSearch(T t1){
TreeNode<T> t = root;
while(t!=null && !isInside(t, t1)){
if(t.left!=null){
if(((Comparable)t.left.max).compareTo(t1)>0){
t=t.left;
}
else{
t=t.right;
}
}
else{
t=t.right;
}
}
return t;
}
private boolean isInside(TreeNode<T> node, T t){
Comparable cLow=(Comparable)node.low;
Comparable cHigh=(Comparable)node.high;
int i = cLow.compareTo(t);
int j = cHigh.compareTo(t);
if(i<=0 && j>=0){
return true;
}
return false;
}
}
I've just found Guava's Range and RangeSet which do exactly that.
It implements all the operations cited:
Union
RangeSet<Integer> intervals = TreeRangeSet.create();
intervals.add(Range.closedOpen(1,4)); // stores {[1,4)}
intervals.add(Range.closedOpen(8,10)); // stores {[1,4), [8,10)}
// Now unite 3,7
intervals.add(Range.closedOpen(3,7)); // stores {[1,7), [8,10)}
Subtraction
intervals.remove(Range.closedOpen(3,5)); //stores {[1,3), [5, 7), [8, 10)}
Intersection
intervals.contains(3); // returns false
intervals.contains(5); // returns true
intervals.encloses(Range.closedOpen(2,4)); //returns false
intervals.subRangeSet(Range.closedOpen(2,4)); // returns {[2,3)} (isEmpty returns false)
intervals.subRangeSet(Range.closedOpen(3,5)).isEmpty(); // returns true
Finding empty spaces (this will be the same complexity as a set iteration in the worst case):
Range<Integer> freeSpace(RangeSet<Integer> ranges, int size) {
    RangeSet<Integer> frees = ranges.complement().subRangeSet(Range.atLeast(0));
    for (Range<Integer> free : frees.asRanges()) {
        if (!free.hasUpperBound()) {
            return free;
        }
        if (free.upperEndpoint() - free.lowerEndpoint() >= size) {
            return free;
        }
    }
    return null; // no gap of at least the requested size
}
