Dynamic sets using RB trees - algorithm

Let us say we have a dynamic set S of integers and an index i, and we wish to find the i-th smallest negative number in S (in increasing order), if any.
Example:
S = {-5, -2, -1, 2, 5}: the answer is -1 for i = 3 and is undefined for i = 4.
The objective is to use a red-black tree as the underlying data structure and define an additional attribute that allows the problem to be solved in O(lg n) time. Any guidance on the algorithm that should be used to solve such a question?

It's called an Order Statistic Tree (https://en.wikipedia.org/wiki/Order_statistic_tree).
In general, you extend your tree node with an extra attribute: the size of its subtree. For a leaf it's 1; for an inner node it's
size(left_subtree) + size(right_subtree) + 1
The wiki has a clear explanation and pseudocode. It works with any kind of balanced tree (RB/AVL/Treap/etc.); you just need to maintain the subtree sizes during rotations (or any other tree modification).
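Here's a rough sketch of the select operation built on those subtree sizes (plain Python; the Node/size/select names are mine, not from any particular library, and rebalancing is omitted):

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)   # subtree size, maintained on every modification

def size(node):
    return node.size if node else 0

def select(node, i):
    # return the i-th smallest key (1-based) in the subtree rooted at node
    rank_of_root = size(node.left) + 1
    if i == rank_of_root:
        return node.key
    if i < rank_of_root:
        return select(node.left, i)
    return select(node.right, i - rank_of_root)

# For the original question you would additionally store, per node, the number
# of negative keys in its subtree and run the same select against that count.
# In this toy tree the 3rd smallest element happens to be the 3rd smallest
# negative number, -1:
tree = Node(-1, Node(-5, None, Node(-2)), Node(2, None, Node(5)))
print(select(tree, 3))   # -1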

Related

How to decide a value resides in which subtree from a root for a level order binary tree?

I have a level-order complete binary tree rooted at 2, as in the figure below.
Given a root value and another value v, how can I decide whether v is on left or right subtree of the tree, without traversing the tree?
For example: Let's say root = 2, v = 15. I want to decide using a mathematical function or something that v is in right subtree.
Another example could be, root = 3, v = 10. Answer should be left subtree.
I know I can do this by a tree traversal. I want to know if this is possible in O(1).
It is unclear from your question if you want O(1) to be the time complexity or space complexity.
But, I assume you are talking about the time complexity as space is abundant these days.
If the space complexity permits, there is an approach with which you can find the subtree containing a search value in constant time.
The idea is to store all the ancestors of the node with proper direction.
For example:
Let's assume Node 11 to be the target node.
In a single traversal, we can maintain a separate ancestors map for every node, containing each of its ancestors and the direction taken at that ancestor on the way down to the node.
Starting from the root, Node 2.
Node 2 has no parent, therefore, its ancestors map will be empty.
For Node 3, store a key value pair <2, L> (2 for parent and L for left).
Likewise, for Node 4, store a key value pair <2, R> (2 for parent and R for right).
For Node 6, the ancestors map looks like:
{
    2 : "L",
    3 : "R"
}
Repeat the procedure until we cover each node.
Now, the ancestors map for Node 11 will look like as follows:
{
    2 : "L",
    3 : "R",
    6 : "L"
}
Just check if the value of the root of the subtree is present in the ancestors map of Node 11.
If present, just return its value, which denotes the left/right subtree, in constant time.
PS: Using an unordered map can be beneficial in such a case.
Also, as it is a complete binary tree, the maximum height for N nodes will be log2(N).
Therefore, the space complexity required is O(N * log2(N)).
The time complexity for insertion into an unordered map is O(1) on average.
Therefore, the time complexity for building all the maps is O(N * log2(N) * some constant factor).
The time complexity for a query is constant, ~ O(1).
For N <= 10^5, the logic for building the ancestors maps can be executed within 1 second.
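For what it's worth, here is a small Python sketch of this idea (the build_ancestor_maps name and the dict-of-children tree representation are mine):

def build_ancestor_maps(tree, root):
    # tree maps a node's value to its (left_child, right_child) pair, None for a missing child
    ancestors = {root: {}}      # the root has no ancestors
    stack = [root]
    while stack:
        node = stack.pop()
        left, right = tree.get(node, (None, None))
        for child, direction in ((left, "L"), (right, "R")):
            if child is not None:
                ancestors[child] = dict(ancestors[node])   # copy the parent's map
                ancestors[child][node] = direction         # record the turn taken at the parent
                stack.append(child)
    return ancestors

# Level-order complete tree rooted at 2: children of 2 are 3 and 4, of 3 are 5 and 6, etc.
tree = {2: (3, 4), 3: (5, 6), 4: (7, 8), 5: (9, 10), 6: (11, 12), 7: (13, 14), 8: (15, 16)}
ancestors = build_ancestor_maps(tree, 2)
print(ancestors[11])          # {2: 'L', 3: 'R', 6: 'L'}
print(ancestors[15].get(2))   # 'R' -> 15 lies in the right subtree of 2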

Finding the minimum "internal" nodes of AVL tree?

I know how to find the minimum numbers of nodes in an AVL tree of height h (which includes external nodes) with the formula n(h) = n(h-1) + n(h-2) + 1 but I was wondering if there was a formula to just find the minimum internal nodes only of an AVL tree with height h.
So for n(3) = 4, if we're only counting internal nodes. n(4) = 7, if we're only counting internal nodes. I can draw it out and count the internal nodes but when you get to bigger AVL trees it's a mess.
I can't seem to find anything on this and trying to find a pattern with consistent answers has only led to hours of frustration. Thanks in advance.
Yep, there’s a nice way to calculate this. Let’s begin with the two simplest AVL trees, which have order 0 and order 1:
*        *
         |
         *
This first tree has no internal nodes, and the second has one internal node. This gives us our base cases for a recurrence relation:
I(0) = 0
I(1) = 1
From here, we notice that the way to get the fewest internal nodes in an AVL tree of order n+2 is to pick two trees of order n and n+1 as children (minimizing the number of nodes) that have the fewest internal nodes possible. The resulting tree will have a number of internal nodes equal to the number of internal nodes in the two subtrees, plus one for the new root. This means that
I(n+2) = I(n) + I(n+1) + 1.
Applying this recurrence gives us the sequence
0, 1, 2, 4, 7, 12, 20, etc.
And hey - have we seen this before somewhere? We have! Adding one to each term gives us
1, 2, 3, 5, 8, 13, 21, etc.
which is the Fibonacci sequence, shifted down two positions! So our hypothesis is that
I(n) = F(n+2) - 1
You can prove that this is the case by induction on n.
Here’s a different way to arrive at this result. Imagine you take an AVL tree of height n and remove all the leaves. You’re now left with an AVL tree of height n-1 (prove this!), and all of the remaining nodes in this tree are the internal nodes of the original tree. The smallest possible number of nodes in an AVL tree of height n is F(n+2)-1, matching our result.

Optimal solution for tree traversal and summing node values on condition

Hi all, I have an algorithmic problem and I am struggling to find an optimal solution. I have a tree which I want to traverse. Nodes of the tree consist of a value and a rank (both the value and the rank can be arbitrary numbers).
What I want to do is traverse the tree and, for each node, sum the values of all descendant nodes except descendants with a lower rank and all nodes under them (irrespective of rank). The tree has no special properties, as every node can have anywhere from 0 to Integer.MAX_VALUE children. There are no rules relating a parent's rank or value to those of its children.
My naive solution is to recursively traverse the subtree of each node and stop the recursion as soon as I find a node with a lower rank, summing values on the way back to the root of the subtree. However, this feels like a suboptimal solution (in the worst case - each node has only one descendant, so the tree is basically a linked list, and ranks are sorted ascending towards the root - this solution is O(n^2)).
Is it possible to have the sums for all nodes settled after one traversal?
Edit:
A solution slightly better than my naive approach could be: for every node visited, propagate its value recursively back towards the root while keeping the minimum rank of the visited nodes (during the back-to-root traversal), then add the value only to nodes whose rank is lower than that minimum.
Edit from my phone: since we only ever inspect the value of tree roots, we can use a disjoint sets structure with path compression only instead of a dynamic tree. Don't bother updating non-roots.
Here's an O(n log n)-time algorithm with dynamic trees. (I know. They're a pain to implement. I try not to include them in answers.)
Sort the nodes from greatest to least rank and initialize a totally disconnected dynamic tree with their values. For each node in order, issue dynamic tree operations to
Report its current value (O(log n) amortized, output this value eventually), and
If it's not the root, add its value to each of its ancestors' values (O(log n) amortized) and then link it to its parent (also O(log n) amortized).
The effect of Step 2 is that each node's dynamic value is the sum of its descendants' (relative to the present tree) original values.
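If dynamic trees are overkill, here is a rough Python sketch of the disjoint-set shortcut from the edit above (the function and variable names are mine, and for simplicity it assumes all ranks are distinct - equal ranks would need an extra tie-breaking pass):

def subtree_sums(parent, value, rank):
    # parent[v] is v's parent index (None for the root); returns, for every v, the sum of
    # the values of v's descendants, cutting below any descendant of lower rank
    n = len(value)
    dsu_parent = list(range(n))          # disjoint-set parent pointers
    comp_value = list(value)             # sum of original values, kept only at component roots

    def find(v):                         # find with path compression (halving)
        while dsu_parent[v] != v:
            dsu_parent[v] = dsu_parent[dsu_parent[v]]
            v = dsu_parent[v]
        return v

    answer = [0] * n
    for v in sorted(range(n), key=lambda x: rank[x], reverse=True):
        # every strictly higher-ranked node is already linked to its parent, so the
        # component rooted at v holds exactly the descendants that must be counted
        answer[v] = comp_value[v] - value[v]
        if parent[v] is not None:        # link v's component into its nearest unprocessed ancestor
            p = find(parent[v])
            comp_value[p] += comp_value[v]
            dsu_parent[v] = p
    return answer

# chain 0-1-2-3 where every child outranks its parent, so every descendant is counted
print(subtree_sums([None, 0, 1, 2], [1, 10, 100, 1000], [1, 2, 3, 4]))
# [1110, 1100, 1000, 0]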
Edit
Not correct answer, does not solve the problem asked by the OP (cf comment)
Old answer (before edit)
You can see that this problem (like most problems on trees) can be solved with a recursive approach. That is because the sum value of a node depends only on the sum values and respective ranks of its children.
Here is a pseudo-code describing a solution.
get_sum(my_node, result_arr):
    my_sum = 0
    for child in my_node.children():
        get_sum(child, result_arr)        // we compute the sum value of the children
        if rank(child) >= rank(my_node):  // if the child node's rank is big enough, add its sum
            my_sum += result_arr[child]
    result_arr[my_node] = my_sum          // store the result somewhere
This is a recursive (DFS-style) algorithm, which should run in O(n) with n the number of nodes in your tree. To get the values for all the nodes, call this recursive function on the root node of your tree.
I suggest a postorder ("postfixed") DFS. For every node, keep a reference to its parent.
the sum_by_rank for a leaf is an empty dict;
the sum_by_rank for any other node is a dict mapping each rank appearing among its subnodes to the total value of the subnodes with that rank; if two or more subnodes have the same rank, their values are simply added.
The postorder DFS allows you to compute the sums from the bottom up.
Here's a Python 3.7 program to play with (the code is probably not optimized):
from dataclasses import dataclass
from typing import List, Dict

@dataclass
class Node:
    value: int
    rank: int
    sum_by_rank: Dict[int, int]
    children: List[object]

tree = Node(5, 0, {}, [
    Node(4, 2, {}, [Node(3, 1, {}, [])]),
    Node(7, 2, {}, [Node(11, 1, {}, [Node(7, 8, {}, [])])]),
    Node(8, 4, {}, [Node(3, 3, {}, []), Node(9, 5, {}, []), Node(4, 2, {}, [])]),
])

def dfs(node, previous=None):
    # previous (the parent) is passed along but not used here
    for child in node.children:
        dfs(child, node)
        # add this child's rank -> value
        node.sum_by_rank[child.rank] = node.sum_by_rank.get(child.rank, 0) + child.value
        # add the subtree's rank -> value
        for r, v in child.sum_by_rank.items():
            node.sum_by_rank[r] = node.sum_by_rank.get(r, 0) + v

dfs(tree)
print(tree)
# Node(value=5, rank=0, sum_by_rank={2: 15, 1: 14, 8: 7, 4: 8, 3: 3, 5: 9}, children=[Node(value=4, rank=2, sum_by_rank={1: 3}, children=[Node(value=3, rank=1, sum_by_rank={}, children=[])]), Node(value=7, rank=2, sum_by_rank={1: 11, 8: 7}, children=[Node(value=11, rank=1, sum_by_rank={8: 7}, children=[Node(value=7, rank=8, sum_by_rank={}, children=[])])]), Node(value=8, rank=4, sum_by_rank={3: 3, 5: 9, 2: 4}, children=[Node(value=3, rank=3, sum_by_rank={}, children=[]), Node(value=9, rank=5, sum_by_rank={}, children=[]), Node(value=4, rank=2, sum_by_rank={}, children=[])])])
Hence, to get the sum for a node, just add the values associated with ranks greater than or equal to the node's rank. In Python:
sum(value for rank, value in node.sum_by_rank.items() if rank >= node.rank)
Let a, b, c be nodes. Assume that a is an ancestor of b, and b is an ancestor of c.
Observe: if b.rank > c.rank and a.rank > b.rank then a.rank > c.rank.
This leads us to the conclusion that the sum_by_rank of a is equal to the sum of sum_by_rank(b) + b.value for every b direct child of a, having a rank lower than a.
That suggests the following recursion:
ComputeRank(v)
    if v is null
        return
    let sum = 0
    foreach child in v.children
        ComputeRank(child)
        if child.rank <= v.rank
            sum += child.sumByRank + child.value
    v.sumByRank = sum
By the end of the algorithm each node will have its sumByRank as you required (if I understood correctly).
Observe that for each node n in the input tree, the algorithm will visit n exactly once, and query it once again while visiting its parent. This is a constant number of times, meaning the algorithm will take O(N) time.
Hope it helps :)

What data structure should I use for these operations?

I need a data structure that stores a subset—call it S—of {1, . . . , n} (n given initially)
and supports just these operations:
• Initially: n is given, S = {1, . . . , n} at the beginning.
• delete(i): Delete i from S. If i isn't in S already, no effect.
• pred(i): Return the predecessor in S of i. This means max{j ∈ S | j < i}, the greatest element in S
that is strictly less than i. If there is none, return 0. The parameter i is guaranteed to be in {1, . . . , n},
but may or may not be in S.
For example, if n = 7 and S = {1, 3, 6, 7}, then pred(1) returns 0, pred(2) and pred(3) return 1.
I need to figure out:
a data structure that represents S
an algorithm for initialization (O(n) time)
an algorithm for delete (O(α(n)) amortized time)
an algorithm for pred (O(α(n)) amortized time)
Would appreciate any help (I don't need code - just the algorithms).
You can use Disjoint-set data structure.
Let's represent our subset as a disjoint-set. Each element of the disjoint-set consists of an element i of the subset (with zero always present), unioned with all absent values that are greater than i and less than the next subset element.
Example:
n = 10
s = [1, 4, 7, 8], disjoint-set = [{0}, {1,2,3}, {4,5,6}, {7}, {8, 9, 10}]
s = [3, 5, 6, 10], disjoint-set = [{0, 1, 2}, {3, 4}, {5}, {6, 7, 8, 9}, {10}]
Initially, we have a full set that is represented by n+1 disjoint-set elements (with zero included). As usual, every disjoint-set element is a rooted tree, and for every tree root we store the leftmost number in its element.
Let leftmost(i) be the leftmost value of the disjoint-set element that contains i.
leftmost(i) operation is similar to Find operation of a disjoint-set. We just go from i to the root of the element and return the leftmost number stored for the root. Complexity: O(α(n))
We can check if i is in the subset comparing i with leftmost(i). If they are equal (and i > 0) then i is in the subset.
pred(i) will be equal to leftmost(i) if i is not in the subset, and equal to leftmost(i-1) if i is in the subset. Complexity: O(α(n))
On every delete(i) operation we check if i is in the subset at first. If i is in the subset we should union an element containing i with the left neighbor element (this is the element that contains i-1). This operation is similar to Union operation of a disjoint-set. The leftmost number of resulting tree will be equal to leftmost(i-1). Complexity: O(α(n))
Edit: I've just noticed "strictly less than i" in the question, changed description a bit.
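To make the idea concrete, here is a small Python sketch (the class and method names are mine; the union is done without union-by-rank, so this is illustrative rather than a proof of the O(α(n)) bound):

class PredecessorSet:
    def __init__(self, n):
        # S = {1, ..., n}; every value 0..n starts in its own component,
        # and each component root remembers the leftmost value of its component
        self.parent = list(range(n + 1))
        self.leftmost_of = list(range(n + 1))

    def _find(self, i):                       # find with path compression (halving)
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def leftmost(self, i):
        return self.leftmost_of[self._find(i)]

    def delete(self, i):
        if self.leftmost(i) != i or i == 0:   # i is already absent (or the zero sentinel)
            return
        # union i's component with the left neighbour's component; the left
        # neighbour's root survives, so leftmost(i-1) is kept automatically
        left_root = self._find(i - 1)
        self.parent[self._find(i)] = left_root

    def pred(self, i):
        if self.leftmost(i) == i and i > 0:   # i is still in S
            return self.leftmost(i - 1)
        return self.leftmost(i)

s = PredecessorSet(7)
for x in (2, 4, 5):                           # leaves S = {1, 3, 6, 7}
    s.delete(x)
print(s.pred(1), s.pred(2), s.pred(3))        # expected: 0 1 1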
I'm not sure if there is a data structure that can guarantee all these properties in O(α(n)) time, but a good start would be predecessor data structures like van Emde Boas trees or y-fast tries
The vEB tree is defined recursively based on the binary representation of the element indices. Let's assume that n = 2^b for some b = 2^k.
If we have only two elements, store the minimum and maximum
Otherwise, we divide the binary representation of all the elements into the upper and lower b/2 bits.
We build a vEB tree (the 'summary') for the upper bits of all elements and √n vEB trees for the lower bits (one for every choice of the upper bits). Additionally, we store the minimum and maximum element.
This gives you O(n) space usage and O(log log n) = O(k) time for search, insertion and deletion.
Note however that the constant factors involved might be very large. If your n is representable in 32 bits, I know of at least one report, by Dementiev et al., that breaks the recursion when the problem sizes become solvable more easily with other techniques.
The idea of y-fast tries builds on x-fast tries:
They are most simply described as a trie based on the binary representation of its elements, combined with a hash table for every level and some additional pointers.
y-fast tries reduce the space usage by splitting the elements in nearly equally-sized partitions and choosing representatives (maximum) from them, over which an x-fast trie is built. Searches within the partitions are then realized using normal balanced search trees.
The space usage and time complexity are comparable to the vEBs. I'm guessing the constant factors will be a bit smaller than a naïve implementation of vEBs, but the claim is only based on intuition.
A last note: Always keep in mind that log log n < 6, which will probably not change in the near future
In terms of providing O(α(n)) time, it really becomes tricky. Here is my idea for approaching this:
Since we know the range of i, which is from 1 to n, we can first form a self-balancing BST such as an AVL tree. The nodes of this AVL tree will be objects of DataNode. Here is how it might look:
public class DataNode {
    int value;
    boolean type;

    DataNode(int value, boolean type) {
        this.value = value;
        this.type = type;
    }
}
The value field would simply hold the values in the range 1 to n. The type variable would be set to true if the item we are inserting into the tree is present in the set S; if not, it would be marked as false.
This would take O(n) time for creation. Deletion can be done in O(log n) time.
For pred(i), we can achieve an average-case time complexity of around O(log n), if I am correct. The algorithm for pred(i) would be something like this:
Locate the element i in the tree. If its type is true, then return the inorder predecessor of this element i, provided the type value of that predecessor is true.
If it is false, recur for the next predecessor (i.e. the predecessor of i-1) until we find an element whose type = true.
If no such predecessor with type = true is found, return 0.
I hope this approach can be optimized further to bring pred(i) down to O(α(n)).

How to generate an AVL tree as lopsided as possible?

I saw this in some paper, and someone argued that there can be at most log(n) rotations when we delete a node of an AVL tree. I believe we can achieve this bound by generating an AVL tree that is as lopsided as possible. The problem is how to do this. It would help me a lot in researching the removal-rotation question. Thanks very much!
If you want to make a maximally lopsided AVL tree, you are looking for a Fibonacci tree, which is defined inductively as follows:
A Fibonacci tree of order 0 is empty.
A Fibonacci tree of order 1 is a single node.
A Fibonacci tree of order n + 2 is a node whose left child is a Fibonacci tree of order n and whose right child is a Fibonacci tree of order n + 1.
For example, a Fibonacci tree of order 5 has a Fibonacci tree of order 3 as its left subtree and a Fibonacci tree of order 4 as its right subtree.
The Fibonacci trees represent the maximum amount of skew that an AVL tree can have: if the tree were any more lopsided, the balance factor of some node would exceed the limits placed by AVL trees.
You can use this definition to very easily generate maximally lopsided AVL trees:
function FibonacciTree(int order):
    if order = 0, return the empty tree.
    if order = 1, create a single node and return it.
    otherwise:
        let left = FibonacciTree(order - 2)
        let right = FibonacciTree(order - 1)
        return a tree whose left child is "left" and whose right child is "right."
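If it helps, here's the same construction as a runnable Python sketch (the Node/helper names are mine), plus a quick check of the height and node count:

class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def fibonacci_tree(order):
    if order == 0:
        return None                      # the empty tree
    if order == 1:
        return Node()                    # a single node
    return Node(left=fibonacci_tree(order - 2),
                right=fibonacci_tree(order - 1))

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def size(t):
    return 0 if t is None else 1 + size(t.left) + size(t.right)

t = fibonacci_tree(5)
print(height(t), size(t))                # 4 12: height order-1, and F(5+2)-1 = 12 nodes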
Hope this helps!
