locating lowest common ancestor in AVL tree - algorithm

I have an AVL tree and 2 keys in it. how do I find the lowest common ancestor (by lowest I mean hight, not value) with O(logn) complexity?
I've seen an answer here on stackoverflow, but I admit I didn't exactly understand it. it involved finding the routes from each key to the root and then comparing them. I'm not sure how this meets the complexity requirements

For the first node you move up and mark the nodes. For the second node you move up and look if a node on the path is marked. As soon as you find a marked node you can stop. (And remove the marks by doing the first path again).
If you cannot mark nodes in the tree directly then modify the values contained to include a place where you can mark. If you cannot do this either then add a hashmap that stores which nodes are marked.
This is O(logn) because the tree is O(logn) deep and at worst you walk 3 times to the root.
Also, if you wish you can alternate steps of the two paths instead of first walking the first path completely. (Note that then both paths have to check for marks.) This might be better if you expect the two nodes to have their ancestor somewhat locally. The asymptotic runtime is the same as above.

A better solution for the AVL tree (balanced binary search tree) is (I have used C pointers like notation)-
Let K1 and K2 be 2 keys, for which LCA is to be found. Assume K1 < K2
A pointer P = root of tree
If P->key >= K1 and P->key <= K2 : return P
Else if P->key > K1 and P->key > K2 : P = P->left
Else P = P->right
Repeat step 3 to 5
The returned P points to the required LCA.
Note that this approach works only for BST, not any other Binary tree.

Related

when exactly should root split in a B Tree

I learned B trees recently and from what I understand a node can have minimum t-1 keys and maximum 2t-1 keys given minimum degree t. Exception being root can have even 1 key.
Here is the example from CLRS 3rd edition Fig 18.7 (Page 498) where t=3
min keys = 3-1 = 2
max keys = 2*3-1 = 5
In the d) example when L is inserted why is the root splitted when it doesn't violate the B tree properties at the moment (It has 5 keys which is maximum allowed).
Why isn't inserting L into [J K L] without splitting [G M P T X] considered.
Should I always split the root when it reaches the maximum?
There are several variants of the insertion algorithm for B-trees. In this case the insertion algorithm is the "single pass down the tree" variant.
The background for this variant is given on page 493:
Since we cannot insert a key into a leaf node that is full, we introduce an operation that splits a full node 𝑦 (having 2𝑡 − 1 keys) around its median key 𝑦:key𝑡 into two nodes having only 𝑡 − 1 keys each. The median key moves up into 𝑦’s parent to identify the dividing point between the two new trees. But if 𝑦’s parent is also full, we must split it before we can insert the new key, and thus we could end up splitting full nodes all the way up the tree.
As with a binary search tree, we can insert a key into a B-tree in a single pass down the tree from the root to a leaf. To do so, we do not wait to find out whether we will actually need to split a full node in order to do the insertion. Instead, as we travel down the tree searching for the position where the new key belongs, we split each full node we come to along the way (including the leaf itself). Thus whenever we want to split a full node 𝑦, we are assured that its parent is not full.
In other words, this insertion algorithm will split a node earlier than might be strictly needed, in order to avoid to have to split nodes while backtracking out of recursion.
This algorithm is further described on page 495 with pseudo code.
This explains why at the insertion of L the root node is split immediately before any recursive call is made.
Alternative algorithms would not do this, and would delay the split up to the point when it is inevitable.

RB tree with sum

I have some questions about augmenting data structures:
Let S = {k1, . . . , kn} be a set of numbers. Design an efficient
data structure for S that supports the following two operations:
Insert(S, k) which inserts the
number k into S (you can assume that k is not contained in S yet), and TotalGreater(S, a)
which returns the sum of all keys ki ∈ S which are larger than a, that is, P ki∈S, ki>a ki .
Argue the running time of both operations and give pseudo-code for TotalGreater(S, a) (do not given pseudo-code for Insert(S, k)).
I don't understand how to do this, I was thinking of adding an extra field to the RB-tree called sum, but then it doesn't work because sometimes I need only the sum of the left nodes and sometimes I need the sum of the right nodes too.
So I was thinking of adding 2 fields called leftSum and rightSum and if the current node is > GivenValue then add the cached value of the sum of the sub nodes to the current sum value.
Can someone please help me with this?
You can just add a variable size to each node, which is the number of nodes in the subtree rooted at that node. When finding the node with the smallest value that is larger than the value a, two things can happen on the path to that node: you can go left or right. Every time you go left, you add the size of the right child + 1 to the running total. Every time you go right, you do nothing.
There are two conditions for termination. 1) we find a node containing the exact value a, in which case we add the size of its right child to the total. 2) we reach a leaf, in which case we add 1 if it is larger than a, or nothing if it is smaller.
As Jordi describes: The key-word could be augmented red-black tree.

Checking if A is a part of binary tree B

Let's say I have binary trees A and B and I want to know if A is a "part" of B. I am not only talking about subtrees. What I want to know is if B has all the nodes and edges that A does.
My thoughts were that since tree is essentially a graph, and I could view this question as a subgraph isomorphism problem (i.e. checking to see if A is a subgraph of B). But according to wikipedia this is an NP-complete problem.
http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem
I know that you can check if A is a subtree of B or not with O(n) algorithms (e.g. using preorder and inorder traversals to flatten the trees to strings and checking for substrings). I was trying to modify this a little to see if I can also test for just "parts" as well, but to no avail. This is where I'm stuck.
Are there any other ways to view this problem other than using subgraph isomorphism? I'm thinking there must be faster methods since binary trees are much more restricted and simpler versions of graphs.
Thanks in advance!
EDIT: I realized that the worst case for even a brute force method for my question would only take O(m * n), which is polynomial. So I guess this isn't a NP-complete problem after all. Then my next question is, is there an algorithm that is faster than O(m*n)?
I would approach this problem in two steps:
Find the root of A in B (either BFS of DFS)
Verify that A is contained in B (giving that starting node), using a recursive algorithm, as below (I concocted same crazy pseudo-language, because you didn't specify the language. I think this should be understandable, no matter your background). Note that a is a node from A (initially the root) and b is a node from B (initially the node found in step 1)
function checkTrees(node a, node b) returns boolean
if a does not exist or b does not exist then
// base of the recursion
return false
else if a is different from b then
// compare the current nodes
return false
else
// check the children of a
boolean leftFound = true
boolean rightFound = true
if a.left exists then
// try to match the left child of a with
// every possible neighbor of b
leftFound = checkTrees(a.left, b.left)
or checkTrees(a.left, b.right)
or checkTrees(a.left, b.parent)
if a.right exists then
// try to match the right child of a with
// every possible neighbor of b
leftFound = checkTrees(a.right, b.left)
or checkTrees(a.right, b.right)
or checkTrees(a.right, b.parent)
return leftFound and rightFound
About the running time: let m be the number of nodes in A and n be the number of nodes in B. The search in the first step takes O(n) time. The running time of the second step depends on one crucial assumption I made, but that might be wrong: I assumed that every node of A is equal to at most one node of B. If that is the case, the running time of the second step is O(m) (because you can never search too far in the wrong direction). So the total running time would be O(m + n).
While writing down my assumption, I start to wonder whether that's not oversimplifying your case...
you could compare the trees in bottom-up as follows:
for each leaf in tree A, identify the corresponding node in tree B.
start a parallel traversal towards the root in both trees from the nodes just matched.
specifically, move to the parent of a node in A and subsequently move towards the root in B until you either encounter the corresponding node in B (proceed) or a marked node in A (see below, if a match in B is found proceed, else fail) or the root of B (fail)
mark all nodes visited in A.
you succeed, if you haven't failed ;-).
the main part of the algorithm runs in O(e_B) - in the worst case, all edges in B are visited a constant number of times. the leaf node matching will run in O(n_A * log n_B) if there the B vertices are sorted, O(n_A * log n_A + n_B * log n_B + n) = O(n_B * log n_B) (sort each node set, lienarly scan the results thereafter) otherwise.
EDIT:
re-reading your question, abovementioned step 2 is even easier, as for matching nodes in A, B, their parents must match too (otheriwse there would be a mismatch between the edge sets). no effect on worst-case run time, of course.

Generating suffix tree of string S[2..m] from suffix tree of string S[1..m]

Is there a fast (O(1) time complexity) way of generating a suffix tree of string S[2..m] from suffix tree of string S[1..m]?
I am familiar with Ukkonen's, so I know how to make fast suffix tree of string S[1..m+1] from suffix tree of string S[1..m], but I couldn't apply the algorithm for reverse situation.
Well, as #jogojapan says, to get the S[2..m] tree from the S[1..m] tree we need to:
Find the position-0 leaf L.
If L has more than one sibling, delete the pointer from L's parent to L
If L has exactly one sibling, change the pointer from L's grandparent to L's parent so it instead points to L's sibling.
#jogojapan further suggests that you keep a pointer to the deepest leaf in the tree. There are two problems with that: L isn't necessarily the deepest leaf in the tree, as Wikipedia's example shows, and second if you want to be able to output the same type of data structure as you received, once removing L you need to find the new position-0 leaf, which will take O(m) time anyway.
(What you could do is construct an array of pointers to each leaf in O(m) time and count-sort them by position in another O(m) time. Then you'd be able to construct all the trees { S[t..n] : 1 <= t <= m } in constant amortized time each)
Assuming you're not interested in amortized time though, let's prove what you ask is impossible.
We know any algorithm to modify the suffix tree of S[1..m] must start at the root: it can't start anywhere else because we know nothing about the underlying concrete data structure, and we don't know that the tree's nodes have parent pointers, so the only position the whole tree is accessible from is the root.
We also know that it must locate the position-0 leaf before it can hope to modify the data structure into the suffix tree for S[2..m]. To do this, it must obviously traverse every node between the root and the position-0 leaf.
Thing is, consider the suffix tree of a^m (the character a repeated m times): the length of the path is m-1. So any algorithm must visit at least m-1 nodes, and therefore take O(m) time in the worst case.

O(1) algorithm to determine if node is descendant of another node in a multiway tree?

Imagine the following tree:
A
/ \
B C
/ \ \
D E F
I'm looking for a way to query if for example F is a descendant of A (note: F doesn't need to be a direct descendant of A), which, in this particular case would be true. Only a limited amount of potential parent nodes need to be tested against a larger potential descendants node pool.
When testing whether a node is a descendant of a node in the potential parent pool, it needs to be tested against ALL potential parent nodes.
This is what a came up with:
Convert multiway tree to a trie, i.e. assign the following prefixes to every node in the above tree:
A = 1
B = 11
C = 12
D = 111
E = 112
F = 121
Then, reserve a bit array for every possible prefix size and add the parent nodes to be tested against, i.e. if C is added to the potential parent node pool, do:
1 2 3 <- Prefix length
*[1] [1] ...
[2] *[2] ...
[3] [3] ...
[4] [4] ...
... ...
When testing if a node is a descendant of a potential parent node, take its trie prefix, lookup the first character in the first "prefix array" (see above) and if it is present, lookup the second prefix character in the second "prefix array" and so on, i.e. testing F leads to:
F = 1 2 1
*[1] [1] ...
[2] *[2] ...
[3] [3] ...
[4] [4] ...
... ...
so yes F, is a descendant of C.
This test seems to be worst case O(n), where n = maximum prefix length = maximum tree depth, so its worst case is exactly equal to the obvious way of just going up the tree and comparing nodes. However, this performs much better if the tested node is near the bottom of the tree and the potential parent node is somewhere at the top. Combining both algorithms would mitigate both worst case scenarios. However, memory overhead is a concern.
Is there another way for doing that? Any pointers greatly appreciated!
Are your input trees always static? If so, then you can use a Lowest Common Ancestor algorithm to answer the is descendant question in O(1) time with an O(n) time/space construction. An LCA query is given two nodes and asked which is the lowest node in the tree whose subtree contains both nodes. Then you can answer the IsDescendent query with a single LCA query, if LCA(A, B) == A or LCA(A, B) == B, then one is the descendent of the other.
This Topcoder algorithm tuorial gives a thorough discussion of the problem and a few solutions at various levels of code complexity/efficiency.
I don't know if this would fit your problem, but one way to store hierarchies in databases, with quick "give me everything from this node and downwards" features is to store a "path".
For instance, for a tree that looks like this:
+-- b
|
a --+ +-- d
| |
+-- c --+
|
+-- e
you would store the rows as follows, assuming the letter in the above tree is the "id" of each row:
id path
a a
b a*b
c a*c
d a*c*d
e a*c*e
To find all descendants of a particular node, you would do a "STARTSWITH" query on the path column, ie. all nodes with a path that starts with a*c*
To find out if a particular node is a descendant of another node, you would see if the longest path started with the shortest path.
So for instance:
e is a descendant of a since a*c*e starts with a
d is a descendant of c since a*c*d starts with a*c
Would that be useful in your instance?
Traversing any tree will require "depth-of-tree" steps. Therefore if you maintain balanced tree structure it is provable that you will need O(log n) operations for your lookup operation. From what I understand your tree looks special and you can not maintain it in a balanced way, right? So O(n) will be possible. But this is bad during creation of the tree anyways, so you will probably die before you use the lookup anyway...
Depending on how often you will need that lookup operation compared to insert, you could decide to pay during insert to maintain an extra data structure. I would suggest a hashing if you really need amortized O(1). On every insert operation you put all parents of a node into a hashtable. By your description this could be O(n) items on a given insert. If you do n inserts this sounds bad (towards O(n^2)), but actually your tree can not degrade that bad, so you probably get an amortized overall hastable size of O(n log n). (actually, the log n part depends on the degration-degree of your tree. If you expect it to be maximal degraed, don't do it.)
So, you would pay about O(log n) on every insert, and get hashtable efficiency O(1) for a lookup.
For a M-way tree, instead of your bit array, why not just store the binary "trie id" (using M bits per level) with each node? For your example (assuming M==2) : A=0b01, B=0b0101, C=0b1001, ...
Then you can do the test in O(1):
bool IsParent(node* child, node* parent)
{
return ((child->id & parent->id) == parent->id)
}
You could compress the storage to ceil(lg2(M)) bits per level if you have a fast FindMSB() function which returns the position of the most significant bit set:
mask = (1<<( FindMSB(parent->id)+1) ) -1;
retunr (child->id&mask == parent->id);
In a pre-order traversal, every set of descendants is contiguous. For your example,
A B D E C F
+---------+ A
+---+ B
+ D
+ E
+-+ C
+ F
If you can preprocess, then all you need to do is number each node and compute the descendant interval.
If you can't preprocess, then a link/cut tree offers O(log n) performance for both updates and queries.
You can answer query of the form "Is node A a descendant of node B?" in constant time, by just using two auxiliary arrays.
Preprocess the tree, by visiting in Depth-First order, and for each node A store its starting and ending time in the visit in the two arrays Start[] and End[].
So, let us say that End[u] and Start[u] are respectively the ending and starting time of the visit of node u.
Then node u is a descendant of node v if and only if:
Start[v] <= Start[u] and End[u] <= End[v].
and you are done, checking this condition requires just two lookup in the arrays Start and End
Take a look at Nested set model It's very effective to select but too slow to update
For what it's worth, what you're asking for here is equivalent to testing if a class is a subtype of another class in a class hierarchy, and in implementations like CPython this is just done the good old fashioned "iterate the parents looking for the parent" way.

Resources