Induced height imbalance in AVL tree - algorithm

If a node is inserted into an AVL tree, one of the nodes on the path to the new node might lose its height balance. My question is: once that node is fixed, can other nodes above it (its ancestors up to the root) still be out of balance (in case they lost their balance earlier)?
I worked through some examples on paper and observed that such a scenario is not possible. Once the height imbalance is fixed at a node, all its ancestors should be fixed automatically (if they were affected).

The imbalance can be fixed locally; in total there are four cases to consider. More precisely, there are two cases (single rotation and double rotation), and the other two are their mirrored versions; the operations are described here. After an insertion, the rotation restores the subtree to the height it had before the insertion, so the balance of its ancestors is no longer affected. There is no need to follow the path to the root and rebalance every node on this path. The rebalancing itself (the rotation) can be done in constant time.

Related

AVL Tree rotation order

When there is an imbalance and a rotation is needed in an AVL tree, how do you go about picking which nodes to rotate first? In examples, sometimes I see the root node get rotated first, sometimes a parent node, and sometimes a leaf node.
With AVL trees, the imbalance caused by an insertion or deletion of a leaf will be detected as the balance factors are updated along the path from the leaf to root. As soon as a node is found (in that upwards traversal) that gets a balance factor that is out of range, either a single or double rotation will happen at that node. However, a double rotation breaks down into two single rotations, and the first of these two really acts on a child of the given node.
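The rule described above can be sketched in a few lines of Python. This is only a minimal illustration for insertion, not a reference implementation: the names `Node`, `update`, `balance`, `rotate_left`, `rotate_right` and `insert` are made up for this example, and heights are counted in nodes (an empty subtree has height 0).

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.height = 1  # a single node has height 1 in this sketch


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def balance(node):
    # Positive means left-heavy, negative means right-heavy.
    return height(node.left) - height(node.right)


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)  # y is now below x, so refresh it first
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    else:
        node.right = insert(node.right, key)
    # Unwinding the recursion is the upward walk from the new leaf.
    update(node)
    b = balance(node)
    if b > 1:
        if balance(node.left) < 0:
            # Double rotation: the first single rotation acts on the child.
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if b < -1:
        if balance(node.right) > 0:
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node
```

Inserting 3, 1, 2 in that order triggers the left-right case at the node holding 3: the child holding 1 is rotated left first, then 3 is rotated right, and 2 ends up as the new subtree root.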

AI minimax algorithm to process a state reachable from root through multiple paths, of different lengths

A standard minimax algorithm treats the root level as MAX, with subsequent levels alternating between MIN and MAX. Consider a tree node that can be reached through more than one path. If the difference in path lengths is odd, the paths place it on different kinds of levels, so should that node be MIN or MAX? Is this more likely if the branching factor is > 2? If it is not possible, please explain why.
Consider the state node N which is reachable from root through multiple paths. The downward path from N to the goal states should be identical.
Keeping in mind the purpose of the minimax algorithm, the node N (and its downstream subtree) should be duplicated in the tree to create a loop-free star topology. The two copies (and the downstream nodes in their subtrees) will then operate as MIN or MAX according to their individual path lengths.
Let us define a "node" as a full state, which includes which player is to move (the MAX or the MIN player). Then, it is not possible to have a single node that is reachable both by MIN and MAX, because they would be, by definition, different nodes.
In chess, you can reach the exact same position of pieces in the board with the only difference being whether white or black is to move. But those are fundamentally different game-states, and therefore different nodes in the tree!
So, to answer your questions:
- Player-to-move is an important part of node identity.
- This makes it impossible to reach the same node through odd-length paths from the root, by the very definition of node identity. If it is reachable through an odd-length path, it is a different node.
- High branching factors (assuming the same total number of possible nodes) make it more likely to find previously-encountered nodes (more positions would be repeated in chess if, say, pawns could go backwards) -- but do not alter the above.
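To make the "player-to-move is part of the node" point concrete, here is a small sketch on a toy game that is not from the question: a pile of stones, each player takes 1 or 2, and whoever takes the last stone wins. The function name `minimax` and the state layout `(stones, max_to_move)` are assumptions for illustration. The cache plays the role of a transposition table, and it is keyed on the full state, so the same pile with a different player to move is a different entry.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def minimax(stones, max_to_move):
    """Value of the state (stones, max_to_move): +1 if MAX wins, -1 if MIN wins."""
    if stones == 0:
        # The previous player took the last stone and therefore won.
        return -1 if max_to_move else 1
    values = [
        minimax(stones - take, not max_to_move)
        for take in (1, 2)
        if take <= stones
    ]
    return max(values) if max_to_move else min(values)
```

Starting from 4 stones with MAX to move, a pile of 2 can be reached after one move (take 2) or after two moves (take 1, take 1). Those are the states `(2, False)` and `(2, True)`, and they get different values, which is exactly why they have to be separate nodes.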

What is the number of nodes at a particular level in a balanced binary search tree?

I was asked this question in a phone screen interview and I was not able to answer it. For example, in a BST, I know that the maximum number of nodes at height h is given by 2^h (assuming the root node is at height 0).
I wanted to ask: is there a similar mathematical result for a balanced binary search tree as well (for AVL or red-black trees?), i.e. the number of nodes at a particular level k?
Thanks!
A balanced binary tree starts with one node, which has two children. Each of those then has two children again. So there will be 1, 2, 4, 8 and so on nodes per level.
As a formula you can use 2^(level-1), counting the root as level 1 (or 2^level if the root is level 0, as in the question). The last level might not be completely full, so it can have fewer elements.
Because the balancing step has a cost, some implementations do not rebalance after every mutation of the tree (AVL and red-black trees do rebalance on every insertion and deletion). They instead apply a heuristic to decide when rebalancing makes the most sense. So in practice, levels might have fewer nodes than if the tree were perfectly balanced, and there might be additional levels from nodes being inserted in the wrong places.
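A quick way to compare the formula with an actual tree is to count nodes level by level. The sketch below assumes the root is at level 0 (so the bound is 2^level) and represents a tree as nested `(key, left, right)` tuples purely for illustration; `max_nodes_at_level` and `nodes_per_level` are made-up names.

```python
from collections import deque


def max_nodes_at_level(level):
    """Upper bound on the number of nodes at a level, with the root at level 0."""
    return 2 ** level


def nodes_per_level(root):
    """Count the actual nodes on each level with a breadth-first traversal."""
    counts = []
    queue = deque([(root, 0)]) if root is not None else deque()
    while queue:
        node, level = queue.popleft()
        if level == len(counts):
            counts.append(0)  # first node seen on this level
        counts[level] += 1
        _, left, right = node
        for child in (left, right):
            if child is not None:
                queue.append((child, level + 1))
    return counts


# A balanced tree whose last level is not full.
tree = (4,
        (2, (1, None, None), (3, None, None)),
        (6, (5, None, None), None))
print(nodes_per_level(tree))
```

Here the first two levels reach the bound (1 and 2 nodes), while the last level has 3 nodes instead of the maximum of 4.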

Recommend a proper data structure

I have an algorithm problem that needs a structure similar to a binary tree, but the difference is that it may have nodes that exist independently of the original tree.
Each node has one of three types. The first type marks the starting node, and only one such node exists. The second type marks a connecting node, and the last type marks a leaf node. Each edge has a cost to traverse to its bottom node.
Which data structure is a good fit for computing the cost to reach each node?
UPDATE
OK, I asked this under the data-structure tag because I wanted to avoid explaining the whole problem. But given my incomplete explanation and my poor English, I had better explain it.
I have a list of nodes and edges with costs. There is a starting node (the root node), nodes located in the middle of the tree, and leaf nodes, which are the destinations my program traverses to from the root node. Some of the leaf nodes may be ignored depending on the value in them, but that is not important here. I have to calculate the cost to reach every leaf node from the root node and take the maximum of those costs. The problem is then to adjust the edge costs so that every other leaf node has the same total cost as that maximum, while keeping the sum of the adjustments to a minimum.
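The first half of the problem (the cost to reach each leaf, and the maximum) can be sketched with a plain adjacency mapping. This is only an illustration under assumptions: the structure is taken to be a tree, `children` is a hypothetical dict from a node to a list of `(child, edge_cost)` pairs, and the node names are invented. It does not attempt the edge-adjustment step.

```python
def leaf_costs(children, root):
    """Return {leaf: total edge cost from root to that leaf}."""
    costs = {}
    stack = [(root, 0)]
    while stack:
        node, cost = stack.pop()
        kids = children.get(node, [])
        if not kids:
            costs[node] = cost  # no outgoing edges, so this is a leaf
        for child, edge_cost in kids:
            stack.append((child, cost + edge_cost))
    return costs


children = {
    "S": [("A", 2), ("B", 5)],
    "A": [("L1", 1), ("L2", 4)],
    "B": [("L3", 3)],
}
costs = leaf_costs(children, "S")
max_cost = max(costs.values())
```

With this input the leaf costs are 3, 6 and 8, so the target every other leaf would have to be raised to is 8.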

Is there an efficient way of determining whether a leaf node is reachable from another arbitrary node in a Directed Acyclic Graph?

Wikipedia: Directed Acyclic Graph
Not sure if "leaf node" is still proper terminology, since it's not really a tree (each node can have multiple children and also multiple parents), and I'm actually trying to find all the root nodes (which is really just a matter of semantics: if you reverse the direction of all the edges, they'd be leaf nodes).
Right now we're just traversing the entire graph (that's reachable from the specified node), but that's turning out to be somewhat expensive, so I'm wondering if there's a better algorithm for doing this. One thing I'm thinking is that we keep track of nodes that have been visited already (while traversing a different path) and don't recheck those.
Are there any other algorithmic optimizations?
We also thought about keeping a list of root nodes that this node is a descendant of, but it seems like maintaining such a list would be fairly expensive as well if we need to check if it changes every time a node is added, moved, or removed.
Edit:
This is more than just finding a single node, but rather finding ALL nodes that are endpoints.
Also, there is no master list of nodes. Each node has a list of its children and its parents. (Well, that's not completely true, but pulling millions of nodes from the DB ahead of time is prohibitively expensive and would likely cause an OutOfMemory exception.)
Edit2:
May or may not change possible solutions, but the graph is bottom-heavy in that there's at most a few dozen root nodes (what I'm trying to find) and some millions (possibly tens or hundreds of millions) leaf nodes (where I'm starting from).
There are a few methods, each of which may be faster depending on your structure, but in general what you're going to want is a traversal.
A depth-first search goes through each possible route, keeping track of nodes that have already been visited. It is naturally recursive, because at each node you have to branch and try each of its child nodes. There's no faster method: if you don't know which way to look for the object, you just have to try each way! You definitely need to keep track of where you have already been, because it would be wasteful otherwise. A full traversal takes time on the order of the number of nodes plus the number of edges.
A breadth-first search is similar but visits each child of the node before "moving on", and as such builds up layers of distance from the chosen root. This can be faster if the destination is expected to be close to the root node. It would be slower if the destination is expected to be all the way down a path, because it forces you to traverse every possible edge.
You're right about maybe keeping a list of known root nodes; the tradeoff there is that you basically have to redo the search whenever you alter the graph. If you alter the graph rarely this is acceptable, but if you alter the graph more frequently than you need to generate this information, then of course it is too costly.
EDIT: Info Update.
It sounds like we are actually looking for a path between two arbitrary nodes; the root/leaf semantics keep getting switched. A depth-first search (DFS) starts at one node and then recurses into each unvisited child, stopping if it finds the target node. Because of the way the recursion evaluates, it traverses all the way down the 'left' path and enumerates the nodes there before ever getting to the 'right' path. This is costly and inefficient if the target node happens to be the first child on the right. A breadth-first search walks in steps, covering all children before moving forward. Because your graph is bottom-heavy like a tree, both will have approximately the same execution time.
When the graph is bottom-heavy you might be interested in a reverse traversal. Start at the target node and walk upwards, because there are relatively fewer nodes in this direction. So long as the nodes in general have fewer parents than children, this direction will be much faster. You can also combine the approaches, stepping one up and one down, then comparing lists of nodes, and meeting somewhere in the middle. (This combination might seem the fastest if you ignore that twice as much work is done at each step.)
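A reverse traversal of this kind could look roughly like the following. It is a sketch under assumptions: `get_parents` is a hypothetical callback (for example a DB lookup) that returns the parents of a node, and a node with no parents is treated as a root. The name `find_roots` is made up for this example.

```python
def find_roots(start, get_parents):
    """Walk upwards from start and collect every ancestor that has no parents."""
    visited = {start}
    stack = [start]
    roots = set()
    while stack:
        node = stack.pop()
        parents = get_parents(node)
        if not parents:
            roots.add(node)  # nothing above this node, so it is a root
        for parent in parents:
            if parent not in visited:  # never expand the same node twice
                visited.add(parent)
                stack.append(parent)
    return roots


# Tiny in-memory stand-in for the real parent lookup.
parents = {"leaf": ["a", "b"], "a": ["r1"], "b": ["r1", "r2"]}
print(find_roots("leaf", lambda n: parents.get(n, [])))
```

Only the ancestors of the starting node are ever loaded, which matters when the full graph is too large to pull into memory.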
However, if your graph is stored only as a list of lists of children, you have no real way of traversing it backwards, because a node does not know what its parents are. This is a problem. To fix it you have to make each node know its parents, either by adding that data on graph update or by creating a duplicate of the whole structure (which you have said is too large). That would mean rewriting the whole structure, which is probably out of the question given that it is already a large database.
There's a lot of work to do.
http://en.wikipedia.org/wiki/Graph_(data_structure)
Just color (keep track of) visited nodes.
Sample in Python:
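def reachable(nodes, edges, start, end):
    # color[n] becomes True once node n has been visited
    color = {}
    for n in nodes:
        color[n] = False
    q = [start]  # used as a stack, so this is a depth-first traversal
    while q:
        n = q.pop()
        if color[n]:
            continue  # already visited via another path
        color[n] = True
        for adj in edges[n]:
            q.append(adj)
    return color[end]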
For a vertex x you want to compute a bit array f(x), where each bit corresponds to a root vertex Ri, and 1 (resp. 0) means "x can (resp. can't) be reached from root vertex Ri".
You could partition the graph into one "upper" set U containing all your target roots R and such that if x in U then all parents of x are in U. For example the set of all vertices at distance <=D from the closest Ri.
Keep U not too big, and precompute f for each vertex x of U.
Then, for a query vertex y: if y is in U, you already have the result. Otherwise recursively perform the query for all parents of y, caching the value f(x) for each visited vertex x (in a map for example), so you won't compute a value twice. The value of f(y) is the bitwise OR of the value of its parents.
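The query step could be sketched like this, using a Python int as the bit array. The names are assumptions for illustration: `precomputed` is a dict holding f(x) for every vertex in U (including the roots themselves), `get_parents` is a hypothetical parent lookup, and `cache` is the map of values computed so far. The recursion is fine for a sketch, but a very deep graph would need an iterative version.

```python
def reachable_roots(y, get_parents, precomputed, cache):
    """Return f(y) as an int bitmask: bit i is set if y is reachable from root Ri."""
    if y in precomputed:
        return precomputed[y]
    if y in cache:
        return cache[y]
    mask = 0
    for parent in get_parents(y):
        # f(y) is the bitwise OR of f over all parents of y.
        mask |= reachable_roots(parent, get_parents, precomputed, cache)
    cache[y] = mask
    return mask


# Two roots: R0 is bit 0, R1 is bit 1. U = {R0, R1, u}.
precomputed = {"R0": 0b01, "R1": 0b10, "u": 0b11}
parents = {"a": ["R0"], "b": ["u", "a"], "c": ["a"]}
cache = {}
get_parents = lambda v: parents.get(v, [])
print(bin(reachable_roots("b", get_parents, precomputed, cache)))
```

Reusing the same `cache` across queries means that a later query for "c" finds f("a") already computed and does not walk above it again.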

Resources