Algorithm for Tree Traversal - algorithm

Update:
I found more of an example of what I'm trying to pull off: Managing Hierarchical Data in MySQL. I want to do that but in JavaScript because I am building an app that takes in comments that are in a hierarchical structure, to be more specific reddit.com. If you have the Pretty JSON extension on your chrome web browser go to reddit and click on a threads comments and then add .json to the url to see what I am parsing.
I get the JSON data just fine, its just parsing through the comments and adding the appropriate HTML to show that its nested.
Ideas for solutions?
OLD question:
I am working on a program and I have come to a part that I need to figure out the logic before I write the code.
I am taking in data that is in a tree format but with the possibility of several children for each parent node and the only tree's I can seem to find data on are tree's with weights or tree's where at most each node has two child nodes. So I'm trying to figure out the algorithm to evaluate each node of a tree like this:
startingParent[15] // [# of children]
child1[0]
child2[5]
child2ch1[4]
...
child2ch5[7]
child3[32]
...
child15[4]
Now when I try to write out how my algorithm would work I end up writing nested for/while loops but I end up writing a loop for each level of the height of the tree which for dynamic data and tree's of unknown height with unknown number of children per node this doesn't work. I know that at some point I learned how to traverse a tree like this but its completely escaping me right now. Anyone know how this is done in terms of loops?

If you're not going to use recursion, you need an auxiliary data structure. A queue will give you a breadth-first traversal, whereas a stack will give you a depth-first traversal. Either way it looks roughly like this:
structure <- new stack (or queue)
push root onto structure
while structure is not empty
node <- pop top off of structure
visit(node)
for each child of node
push child onto structure
loop
Wikipedia References
Queue
Stack

Use recursion, not loops.
Breadth first search
Depth first search
Those should help you get started with what you're trying to accomplish

Just use recursion like
def travel(node):
for child in node.childs:
# Do something
travel(child)

The simplest code for most tree traversal is usually recursive. For a multiway tree like yours, it's usually easiest to have a loop that looks at each pointer to a child, and calls itself with that node as the argument, for all the child nodes.

Related

Why we are not saving the parent pointer in "B+ Tree" for easy upward traversal in tree?

Will it affect much if I add a pointer to the parent Node to get simplicity during splitting and insertion process?
General Node would then look something like this :
class BPTreeNode{
bool leaf;
BPTreeNode *next;
BPTreeNode *parent; //add-on
std::vector < int* >pointers;
std::vector < int >keys;
};
What are the challenges I might get in real life database system since right now.
I am only implementing it as a hobby project.
There are two reasons I can think of:
The algorithm for deleting a value from a B+tree may result in an internal block A that has too few child blocks. If neither the block at the left or right of A can pass an entry to A in order to resolve this violation, then block A needs to merge into a sibling block B. This means that all the child blocks of block A need to have their parent pointer updated to block B. This is additional work that increases (a lot) the number of blocks that need an update in a delete-operation.
It represents extra space that is really not needed for performing the standard B+Tree operations. When searching a value via a B+Tree you can easily keep track of the path to that leaf level and use it for backtracking upwards in the tree.

Rope tree that returns node that can be used to find the position and who's parent is a token

I am trying to write a compiler/code editor.
To speed up the process I want a red and black tree that returns a node which I can then use to get the strings under it, and it's position value, and use it's parent node as a place to store a token (such as alphanumeric_word or left_parenthesis).
I am having trouble finding the best way to go about this.
I basically want something that can do the following:
tree.insert("01234567890123456789",0);
node = tree.at(10);
tree.insert("string",5);
node.index(); //should be 10+length("string")
node.value(); //should be '0'
node.tokenPtr.value; //should point to a token with the value of NUMBER
I am looking for the simplest implementation of such a tree that I could modify since these can be frustrating to build and debug from scratch.
The following code is sort of what I am looking for (it has parent nodes), but it lacks an indexing feature for index look up. This is needed because I want to create a map that uses the node as it's key and node.index() as it's sorting value so that I don't have to update the keys in that map.
[[archive.gamedev.net/archive/reference/programming/features/TStorage/page2.html]]
I have tried to look at sgi's rope implimentation, but the code is overwhelming and difficult to understand.
This tutorial seems to be helpfull, however it also doesn't provide a doubly linked tree which I think can be used to find the index of a node:
[[eternallyconfuzzled.com/tuts/datastructures/jsw_tut_rbtree.aspx]]
Update:
I have found an implementation that has a parent node, however it still lacks an index count property:
[[web.mit.edu/~emin/Desktop/ref_to_emin/www.old/source_code/red_black_tree/index.html]]
I have found one solution and another that might work.
You have to use the sgi stl rope mutable_begin()+(index) iterator.
There is also this function, however I am still having trouble analyzing the sgi rope code to see what it does:
mutable_reference_at(index)

Neo4j - slow cypher query - big graph with hierarchies

Using Neo4j 2.1.4. I have a graph with 'IS A' relationships (and other types of relationships) between nodes. I have some hierarchies inside the graph (IS A relationships) and I need to know the descendants (IS A relationship) of one hierarchy that has a particular-known relationship with some descendant of second hierarchy. If that particular-known relationship exists, I return the descendant/s of the first hierarchy.
INPUTS: 'ID_parentnode_hierarchy_01', 'ID_relationship', 'ID_parentnode_hierarchy_02'.
OUTPUT: Descendants (IS A relationship) of 'ID_parentnode_hierarchy_01' that has 'ID_relationship' with some descendant of 'ID_parentnode_hierarchy_02'.
Note: The graph has 500.000 nodes and 2 million relationships.
I am using this cypher query but it is very slow (aprox. 40s in a 4GB RAM and 3GHz Pentium Dual Core 64 bit PC). It is possible to build a faster query?
MATCH (parentnode_hierarchy_01: Node{nodeid : {ID_parentnode_hierarchy_01}})
WITH parentnode_hierarchy_01
MATCH (parentnode_hierarchy_01) <- [:REL* {reltype: {isA}}] - (descendants01: Node)
WITH descendants01
MATCH (descendants01) - [:REL {reltype: {ID_relationship}}] -> (descendants02: Node)
WITH descendants02, descendants01
MATCH (parentnode_hierarchy_02: Node {nodeid: {ID_parentnode_hierarchy_02} })
<- [:REL* {reltype: {isA}}] - (descendants02)
RETURN DISTINCT descendants01;
Thank you very much.
Well, I can slightly clean up your query - this might help us understand the issues better. I doubt this one will run faster, but using the cleaned up version we can discuss what's going on: (mostly eliminating unneeded uses of MATCH/WITH)
MATCH (parent:Node {nodeid: {ID_parentnode_hierarchy_01}})<-[:REL* {reltype:{isA}}]-
(descendants01:Node)-[:REL {reltype:{ID_relationship}}]->(descendants02:Node),
(parent2:Node {nodeid: {ID_parentnode_hierarchy_02}})<-[:REL* {reltype:{isA}}]-
(descendants02)
RETURN distinct descendants01;
This looks like you're searching two (probably large) trees, starting from the root, for two nodes somewhere in the tree that are linked by an {ID_relationship}.
Unless you can provide some query hints about which node in the tree might have an ID_relationship or something like that, at worst, this looks like you could end up comparing every two nodes in the two trees. So this looks like it could take n * k time, where n is the number of nodes in the first tree, k the number of nodes in the second tree.
Here are some strategy things to think about - which you should use depends on your data:
Is there some depth in the tree where these links are likely to be found? Can you put a range on the depth of [:REL* {reltype:{isA}}]?
What other criteria can you add to descendants01 and descendants02? Is there anything that can help make the query more selective so that you're not comparing every node in one tree to every node in the other?
Another strategy you might try is this: (this might be a horrible idea, but it's worth trying) -- basically look for a path from one root to the other, over any number of undirected edges of either isa type, or the other. Your data model has :REL relationships with a reltype attribute. This is probably an antipattern; instead of a reltype attribute, why is the relationship type not just that? This prevents the query that I want to write, below:
MATCH p=shortestPath((p1:Node {nodeid: {first_parent_id}})-[:isA|ID_relationship*]-(p2:Node {nodeid: {second_parent_id}}))
return p;
This would return the path from one "root" to the other, via the bridge you want. You could then use path functions to extract whatever nodes you wanted. Note that this query isn't possible currently because of your data model.

Need some help understanding this problem about maximizing graph connectivity

I was wondering if someone could help me understand this problem. I prepared a small diagram because it is much easier to explain it visually.
alt text http://img179.imageshack.us/img179/4315/pon.jpg
Problem I am trying to solve:
1. Constructing the dependency graph
Given the connectivity of the graph and a metric that determines how well a node depends on the other, order the dependencies. For instance, I could put in a few rules saying that
node 3 depends on node 4
node 2 depends on node 3
node 3 depends on node 5
But because the final rule is not "valuable" (again based on the same metric), I will not add the rule to my system.
2. Execute the request order
Once I built a dependency graph, execute the list in an order that maximizes the final connectivity. I am not sure if this is a really a problem but I somehow have a feeling that there might exist more than one order in which case, it is required to choose the best order.
First and foremost, I am wondering if I constructed the problem correctly and if I should be aware of any corner cases. Secondly, is there a closely related algorithm that I can look at? Currently, I am thinking of something like Feedback Arc Set or the Secretary Problem but I am a little confused at the moment. Any suggestions?
PS: I am a little confused about the problem myself so please don't flame on me for that. If any clarifications are needed, I will try to update the question.
It looks like you are trying to determine an ordering on requests you send to nodes with dependencies (or "partial ordering" for google) between nodes.
If you google "partial order dependency graph", you get a link to here, which should give you enough information to figure out a good solution.
In general, you want to sort the nodes in such a way that nodes come after their dependencies; AKA topological sort.
I'm a bit confused by your ordering constraints vs. the graphs that you picture: nothing matches up. That said, it sounds like you have soft ordering constraints (A should come before B, but doesn't have to) with costs for violating the constraint. An optimal algorithm for scheduling that is NP-hard, but I bet you could get a pretty good schedule using a DFS biased towards large-weight edges, then deleting all the back edges.
If you know in advance the dependencies of each node, you can easily build layers.
It's amusing, but I faced the very same problem when organizing... the compilation of the different modules of my application :)
The idea is simple:
def buildLayers(nodes):
layers = []
n = nodes[:] # copy the list
while not len(n) == 0:
layer = _buildRec(layers, n)
if len(layer) == 0: raise RuntimeError('Cyclic Dependency')
for l in layer: n.remove(l)
layers.append(layer)
return layers
def _buildRec(layers, nodes):
"""Build the next layer by selecting nodes whose dependencies
already appear in `layers`
"""
result = []
for n in nodes:
if n.dependencies in flatten(layers): result.append(n) # not truly python
return result
Then you can pop the layers one at a time, and each time you'll be able to send the request to each of the nodes of this layer in parallel.
If you keep a set of the already selected nodes and the dependencies are also represented as a set the check is more efficient. Other implementations would use event propagations to avoid all those nested loops...
Notice in the worst case you have O(n3), but I only had some thirty components and there are not THAT related :p

Is there a specific name for the node that coresponds to a subtree?

I'm designing a web site navigation hierarchy. It's a tree of nodes. Nodes represent web pages.
Some nodes on the tree are special. I need a name for them.
There are multiple such nodes. Each is the "root" of a sub-tree with pages that have a distinct logo, style sheet, or layout. Think of different departments.
site map with color-coded sub-trees http://img518.imageshack.us/img518/153/subtreesfe1.gif
What should I name this type of node?
How about Root (node with children, but no parent), Node (node with children and parent) and Leaf (node with no children and parent)?
You can then distinguish by name and position within the tree structure (E.g. DepartmentRoot, DepartmentNode, DepartmentLeaf) if need be..
Update Following Comment from OP
Looking at your question, you said that "some" are special, and in your diagram, you have different nodes looking differently at different levels. The nodes may be different in their design, you can build a tree structure many ways. For example, a single abstract class that can have child nodes, if no children, its a leaf, if no parent, its a root but this can change in its lifetime. Or, a fixed class structure in which leafs are a specific class type that cannot have children added to them in any way.
IF your design does not need you to distinguish nodes differently depending on their position (relative to the root) it suggests that you have an abstract class used for them all.
In which case, it raises the question, how is it different?
If it is simply the same as the standard node everywhere else, but with a bit of styling, how about StyledNode? Do you even need it to be seperate (no style == no big deal, it doesn't render).
Since I don't know the mechanics of how the tree is architected, there could possibly be several factors to consider when naming.
The word you are looking for is "Section". It's part of a whole and has the same stuff inside.
So, you have Nodes, which have children and a parent, and you have SectionNodes which are the roots of these special subtrees.
How about PageTemplate to embody the fact that its children have their own layout, CSS etc?
AreaNode
So, it sounds like you are gathering categories. The nodes are the entry points of this categories. How about "TopCategoryNode", "CategoryEntry" then for som,ething that is below them. Or, if you want to divide more, something like "CategoryCSS", "CategoryLayout" etc?
This is kind of generic, but you make clear that there are "categories", and that these do consist of more than one subnode, or subtheme.
Branch ?
Keeps the tree analogy and also hints in this case at departments etc
Thinking about class heirarchies, Root is probably a special case of Branch, which is a special case of Node, special case of Leaf. The Branch/Node distinction is one you get to make for your special situation.

Resources