When there is an imbalance and a rotation is needed in an AVL tree, how do you go about picking which node to rotate first? In examples, sometimes I see the root node get rotated first, sometimes a parent node, and sometimes a leaf node.
With AVL trees, the imbalance caused by an insertion or deletion of a leaf will be detected as the balance factors are updated along the path from the leaf to root. As soon as a node is found (in that upwards traversal) that gets a balance factor that is out of range, either a single or double rotation will happen at that node. However, a double rotation breaks down into two single rotations, and the first of these two really acts on a child of the given node.
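The choice described above can be sketched roughly as follows. This is a minimal illustration, not a full AVL implementation: the `Node` class, the stored `height` field and the helper names (`update`, `balance`, `rebalance`) are assumptions made for the example. `rebalance` is meant to be called on each node while walking up from the changed leaf; in the double-rotation cases the first rotation acts on the child.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.height = 1 + max(height(left), height(right))

def height(node):
    return node.height if node else 0

def update(node):
    node.height = 1 + max(height(node.left), height(node.right))

def balance(node):
    # Positive means left-heavy, negative means right-heavy.
    return height(node.left) - height(node.right)

def rotate_right(y):
    x = y.left
    y.left, x.right = x.right, y
    update(y)
    update(x)
    return x

def rotate_left(x):
    y = x.right
    x.right, y.left = y.left, x
    update(x)
    update(y)
    return y

def rebalance(node):
    """Rebalance `node` if it is out of range; return the new subtree root."""
    update(node)
    if balance(node) > 1:
        if balance(node.left) < 0:
            # Left-right case: the first single rotation acts on the child.
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if balance(node) < -1:
        if balance(node.right) > 0:
            # Right-left case, the mirror image.
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node
```

For example, calling `rebalance` on a root 3 whose left child 1 has a right child 2 first rotates left at 1 and then right at 3, leaving 2 as the new subtree root.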
Related
If a node is inserted into an AVL tree, one of the nodes on the path to the new node might lose its height balance. My question is: once that node is fixed, can other nodes above it (its ancestors up to the root) still be imbalanced (in case they lost their balance earlier)?
I have done some paper work and could observe that such a scenario is not possible. Once height imbalance is fixed at a node, all its ancestors should be fixed automatically (if they were affected).
After an insertion, the imbalance can be fixed locally; in total there are four cases that need to be considered. More precisely, there are two cases (single rotation and double rotation), with the other two being their mirrored versions; the operations are described here. There is no need to follow the path to the root and rebalance every node on this path. In total, the rebalancing can be done in constant time.
I have an algorithm problem that needs a structure similar to a binary tree. The difference is that there may also be nodes that exist independently, apart from the original tree.
Each node has one of three types. The first type marks the starting node, and only one such node exists. The second type marks a connecting node, and the last type marks a leaf node. Each edge has a cost for traversing down to the node below it.
Which data structure would be good for computing the cost to reach each node?
UPDATE
OK, I asked this under the data-structure tag because I wanted to avoid explaining what the problem is. But since my explanation was lacking and my English is poor, I will have to explain the problem after all.
I have a list of nodes and edges with costs. There is a starting node (the root node), nodes located in the middle of the tree, and leaf nodes, which are the destinations my program traverses to from the root node. Some of the leaf nodes may be ignored depending on the value in them, but that is not important here. I have to calculate the cost to reach every leaf node from the root node and take the maximum of these values. Now, the problem is to adjust the cost values on the edges so that all other leaf nodes have the same total cost as the maximum cost. But the sum of the adjustments has to be the minimum.
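One possible way to approach this, sketched under some assumptions the question does not state: the structure is a tree stored as a plain dictionary from each node to a list of `(child, cost)` pairs, edge costs may only be increased, and no leaves are ignored. The function name `equalize` and the dictionary layout are hypothetical. The idea is a bottom-up pass: at each node, raise the cheaper child edges so that every branch below that node has the same cost to its leaves. Raising an edge as high in the tree as possible lets one increase serve all leaves below it.

```python
def equalize(children, root):
    """Return (max root-to-leaf cost, {(parent, child): amount added})."""
    adjust = {}

    def longest(node):
        kids = children.get(node, [])
        if not kids:
            return 0  # a leaf: nothing below it
        # Cost from `node` down to the leaves through each child.
        below = [(child, cost + longest(child)) for child, cost in kids]
        target = max(total for _, total in below)
        for child, total in below:
            if total < target:
                adjust[(node, child)] = target - total
        return target

    return longest(root), adjust


tree = {
    "S": [("A", 2), ("B", 5)],
    "A": [("L1", 1), ("L2", 3)],
}
max_cost, adjustments = equalize(tree, "S")
```

Here only the edge from `A` to `L1` needs to grow (by 2); every leaf then costs 5 to reach. The recursion depth equals the height of the tree, so a very deep tree would need an iterative version.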
How to find a loop in a binary tree? I am looking for a solution other than marking the visited nodes as visited or doing an address hashing. Any ideas?
Suppose you have a binary tree but you don't trust it and think it might be a graph. In the general case you have to remember the visited nodes. It is somewhat the same algorithm as constructing a minimum spanning tree from a graph, which means space and time complexity will be an issue.
Another approach would be to consider the data you store in the tree. Suppose the nodes hold numbers or hashes that you can compare.
Pseudocode would test for these conditions:
1. Every node has at most 2 children and 1 parent (at most 3 connections). More than 3 connections => not a binary tree.
2. The parent must not be a child.
3. If a node has two children, then the left child has a smaller value than the parent and the right child has a bigger value. So if a leaf or inner node has as a child some node on a higher level (like its parent's parent), you can detect the loop from the values. If a child is a right node, its value must be bigger than its parent's; but if that child forms a loop, it comes from either the left part or the right part of the parent.
3.a. So if it is from the left part, then its value is smaller than its sibling's. So => not a binary tree. The idea is much the same for the other part.
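The value-based test in condition 3 can be sketched as a bounds check. This is only an illustration and assumes a binary search tree with strictly ordered, distinct values; the `Node` class and the function name `is_strict_bst` are made up for the example. Every node must fall strictly inside the open interval allowed by its ancestors, so a link back to an ancestor violates the bounds and is reported without keeping a visited set.

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right

def is_strict_bst(root):
    # Each stack entry is (node, lower bound, upper bound), bounds exclusive.
    stack = [(root, float("-inf"), float("inf"))]
    while stack:
        node, lo, hi = stack.pop()
        if node is None:
            continue
        if not (lo < node.value < hi):
            return False  # ordering broken, possibly by a loop
        stack.append((node.left, lo, node.value))
        stack.append((node.right, node.value, hi))
    return True
```

If the tree is not ordered by value, this check does not apply and the visited-set approach remains the general fallback.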
Testing aside, in what form is the tree that you want to test? Remember that every node has a pointer to its parent, and this pointer points to a single parent. So depending on the format your tree is in, you can take advantage of this.
As mentioned already: A tree does not (by definition) contain cycles (loops).
To test whether your directed graph contains cycles (references to nodes already added to the tree), you can iterate through the tree, add each node to a visited list (or its hash, if you prefer), and check whether each new node is already in the list.
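A minimal sketch of that visited-list check, assuming nodes expose `left` and `right` attributes (the `Node` class and the name `has_repeated_node` are hypothetical). It uses a set of `id(node)` values, which is the "hash of it" variant. Note that it reports any node reached twice, so it also flags a node shared by two parents, not only a true cycle.

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def has_repeated_node(root):
    seen = set()      # identities of the nodes visited so far
    stack = [root]
    while stack:
        node = stack.pop()
        if node is None:
            continue
        if id(node) in seen:
            return True   # reached the same node a second time
        seen.add(id(node))
        stack.append(node.left)
        stack.append(node.right)
    return False
```

This takes time and space proportional to the number of reachable nodes.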
Plenty of algorithms for cycle-detection in graphs are just a google-search away.
Problem: I have a binary tree, all leaves are numbered (from left to right, starting from 0) and no connection exists between them.
I want an algorithm that, given two indices (of 2 distinct leaves), visits the tree starting from the greater leaf (the one with the higher index) and gets to the lower one.
The internal nodes of the tree do not contain any useful information.
I should choose the path based only on the leaves' indices. The path starts at a leaf and terminates at a leaf, and of course I can access a leaf if I know its index (through an array of pointers).
The tree is static, no insertion or deletion of nodes is allowed.
I have developed an algorithm to do it but it really sucks... any ideas?
One option would be to find the least common ancestor of the two nodes, along with the sequence of nodes you should take from each node to get to that ancestor. Here's a sketch of the algorithm:
1. Starting from each node, walk back up to that node's parent until you reach the root. Count the number of nodes on the path from each node to the root. Let the height of the first node be h1 and the height of the second node be h2.
2. Let h = min(h1, h2). This is the height of the higher of the two nodes.
3. Starting from each node, keep following the node's parent pointer until both nodes are at height h. Record the nodes you followed during this step. At this point, both nodes are at the same height.
4. Until you find a common node, keep marching upwards from each node to its parent. Eventually you will hit their common ancestor. At this point, follow the path from the first node up to this ancestor, then down the path from the ancestor down to the second node.
In the worst case, this takes O(h) time and O(h) space, where h is the height of the tree. For a balanced binary tree this is O(lg n) time and space, which is quite good.
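The steps above can be sketched as follows, assuming each node has a `parent` attribute that is `None` at the root. The `Node` class and the function names are made up for the illustration, and "height" in the answer corresponds to `depth` here (number of edges up to the root).

```python
class Node:
    def __init__(self, name, parent=None):
        self.name, self.parent = name, parent

def depth(node):
    d = 0
    while node.parent is not None:
        node = node.parent
        d += 1
    return d

def path_between(a, b):
    """Nodes on the path from a up to the common ancestor, then down to b."""
    up, down = [a], [b]
    da, db = depth(a), depth(b)
    # Bring the deeper node up until both are at the same depth.
    while da > db:
        a, da = a.parent, da - 1
        up.append(a)
    while db > da:
        b, db = b.parent, db - 1
        down.append(b)
    # March both upwards in lockstep until they meet.
    while a is not b:
        a, b = a.parent, b.parent
        up.append(a)
        down.append(b)
    # Both lists end at the common ancestor; drop the duplicate copy.
    return up + down[:-1][::-1]
```

To start from the leaf with the greater index, pass it as `a` after looking both leaves up in the array of pointers.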
If you're interested in a Much More Hardcore version of this algorithm, consider looking into Tarjan's Least Common Ancestors algorithm, which with linear preprocessing time, can be used to find the least common ancestor much more rapidly than this.
Hope this helps!
Distance between any two nodes can be calculated with the help of lowest common ancestor:
Dist(n1, n2) = Dist(root, n1) + Dist(root, n2) - 2*Dist(root, lca)
where lca is lowest common ancestor.
See this for more help with this algorithm, and see this video to learn how to calculate the LCA.
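As a rough illustration of the formula, here is a sketch that assumes unweighted edges (so Dist(root, n) is just the depth of n) and a tree given as a dictionary mapping each node to its parent, with `None` for the root. The names `depth_of`, `lca` and `dist` are hypothetical, and the `lca` shown is the simple ancestor-set method rather than a fast one.

```python
def depth_of(parent, node):
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d

def lca(parent, n1, n2):
    # Collect every ancestor of n1 (including n1 itself) ...
    ancestors = set()
    while n1 is not None:
        ancestors.add(n1)
        n1 = parent[n1]
    # ... then climb from n2 until one of them is reached.
    while n2 not in ancestors:
        n2 = parent[n2]
    return n2

def dist(parent, n1, n2):
    common = lca(parent, n1, n2)
    return (depth_of(parent, n1) + depth_of(parent, n2)
            - 2 * depth_of(parent, common))


parent = {"r": None, "x": "r", "l0": "x", "l1": "x", "l2": "r"}
```

With this `parent` map, `dist(parent, "l0", "l2")` evaluates 2 + 1 - 2*0, since the lowest common ancestor is the root.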
Wikipedia: Directed Acyclic Graph
Not sure if leaf node is still proper terminology since it's not really a tree (each node can have multiple children and also multiple parents), and I'm actually trying to find all the root nodes (which is really just a matter of semantics: if you reverse the direction of all the edges, they'd be leaf nodes).
Right now we're just traversing the entire graph (that's reachable from the specified node), but that's turning out to be somewhat expensive, so I'm wondering if there's a better algorithm for doing this. One thing I'm thinking is that we keep track of nodes that have been visited already (while traversing a different path) and don't recheck those.
Are there any other algorithmic optimizations?
We also thought about keeping a list of root nodes that this node is a descendant of, but it seems like maintaining such a list would be fairly expensive as well if we need to check if it changes every time a node is added, moved, or removed.
Edit:
This is more than just finding a single node, but rather finding ALL nodes that are endpoints.
Also there is no master list of nodes. Each node has a list of its children and its parents. (Well, that's not completely true, but pulling millions of nodes from the DB ahead of time is prohibitively expensive and would likely cause an OutOfMemory exception.)
Edit2:
May or may not change possible solutions, but the graph is bottom-heavy in that there's at most a few dozen root nodes (what I'm trying to find) and some millions (possibly tens or hundreds of millions) leaf nodes (where I'm starting from).
There are a few methods that may each be faster depending on your structure, but in general what you're going to want is a traversal.
A depth first search goes through each possible route, keeping track of nodes that have already been visited. It's a recursive function, because at each node you have to branch and try each of its child nodes. There's no faster method: if you don't know which way to look for the object, you just have to try each way! You definitely need to keep track of where you have already been, because it would be wasteful otherwise. A full traversal should require time on the order of the number of nodes.
A breadth first search is similar but visits each child of the node before "moving on" and as such builds up layers of distance from the chosen root. This can be faster if the destination is expected to be close to the root node. It would be slower if it is expected to be all the way down a path, because it forces you to traverse every possible edge.
You're right about maybe keeping a list of known root nodes. The tradeoff there is that you basically have to do the search whenever you alter the graph. If you alter the graph rarely, this is acceptable, but if you alter the graph more frequently than you need to generate this information, then of course it is too costly.
EDIT: Info Update.
It sounds like we are actually looking for a path between two arbitrary nodes, the root/leaf semantic keeps getting switched. The DepthFirstSearch (DFS) starts at one node, and then for each unvisited child, recurse. Break if you find the target node. Due to the way recursion evaluates, this will traverse all the way down the 'left' path, then enumerate nodes at this distance before ever getting to the 'right' path. This is time costly and inefficient if the target node is potentially the first child on the right. BreadthFirst walks in steps, covering all children before moving forward. Because your graph is bottom heavy like a tree, both will be approximately the same execution time.
When the graph is bottom heavy you might be interested in a reverse traversal. Start at the target node and walk upwards, because there are relatively fewer nodes in this direction. So long as the nodes in general have more children than parents, this direction will be much faster. You can also combine the approaches, stepping one up and one down, then comparing lists of nodes, and meeting somewhere in the middle. (This combination might seem the fastest if you ignore that twice as much work is done at each step.)
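A reverse traversal of this kind might look like the sketch below. It assumes each node can report its parents through some lookup; `get_parents` is a hypothetical callback standing in for whatever database or in-memory access is actually available, and `find_roots` is a made-up name. Visited nodes are remembered so that an ancestor reachable along several paths is expanded only once.

```python
from collections import deque

def find_roots(start, get_parents):
    """Return every ancestor of `start` (or `start` itself) with no parents."""
    seen = {start}
    queue = deque([start])
    roots = set()
    while queue:
        node = queue.popleft()
        parents = get_parents(node)
        if not parents:
            roots.add(node)       # nothing above it: an endpoint
        for p in parents:
            if p not in seen:     # skip ancestors already expanded
                seen.add(p)
                queue.append(p)
    return roots


parents = {"leaf": ["a", "b"], "a": ["r1"], "b": ["r1", "r2"], "r1": [], "r2": []}
roots = find_roots("leaf", lambda n: parents[n])
```

Only the ancestors of the starting node are touched, so the cost depends on how many nodes sit above it rather than on the size of the whole graph.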
However, since you said that your graph is stored as a list of lists of children, you have no real way of traversing the graph backwards. A node does not know what its parents are. This is a problem. To fix it you have to get a node to know what its parents are by adding that data on graph update, or by creating a duplicate of the whole structure (which you have said is too large). It will need the whole structure to be rewritten, which sounds probably out of the question due to it being a large database at this point.
There's a lot of work to do.
http://en.wikipedia.org/wiki/Graph_(data_structure)
Just color (keep track of) visited nodes.
Sample in Python:
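def reachable(nodes, edges, start, end):
    # color[n] becomes True once node n has been visited
    color = {}
    for n in nodes:
        color[n] = False
    q = [start]  # popped from the end, so this is a depth-first traversal
    while q:
        n = q.pop()
        if color[n]:
            continue  # already visited along another path
        color[n] = True
        for adj in edges[n]:
            q.append(adj)
    return color[end]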
For a vertex x you want to compute a bit array f(x), where each bit corresponds to a root vertex Ri, and 1 (resp. 0) means "x can (resp. can't) be reached from root vertex Ri".
You could partition the graph into one "upper" set U containing all your target roots R and such that if x in U then all parents of x are in U. For example the set of all vertices at distance <=D from the closest Ri.
Keep U not too big, and precompute f for each vertex x of U.
Then, for a query vertex y: if y is in U, you already have the result. Otherwise recursively perform the query for all parents of y, caching the value f(x) for each visited vertex x (in a map for example), so you won't compute a value twice. The value of f(y) is the bitwise OR of the value of its parents.
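The query step could be sketched like this, using Python integers as bit arrays (bit i set means reachable from Ri). The names `make_query`, `precomputed` and `get_parents` are assumptions for the example: `precomputed` holds f(x) for every vertex in U, including each root, and `get_parents` is a hypothetical parent lookup.

```python
def make_query(precomputed, get_parents):
    cache = dict(precomputed)   # f(x) for U, extended as queries run

    def f(y):
        if y not in cache:
            bits = 0
            for p in get_parents(y):
                bits |= f(p)    # bitwise OR over all parents
            cache[y] = bits
        return cache[y]

    return f


# Two roots: R0 owns bit 0, R1 owns bit 1; "u" is a precomputed vertex of U.
precomputed = {"R0": 0b01, "R1": 0b10, "u": 0b11}
parents = {"a": ["R0"], "b": ["u", "a"], "c": ["a"]}
f = make_query(precomputed, lambda v: parents[v])
```

The recursion goes as deep as the longest chain of uncached ancestors, so for very deep graphs an explicit stack would be safer than Python's call stack.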