RB tree with sum - algorithm

I have some questions about augmenting data structures:
Let S = {k_1, ..., k_n} be a set of numbers. Design an efficient
data structure for S that supports the following two operations:
Insert(S, k), which inserts the number k into S (you can assume that k is not contained in S yet), and
TotalGreater(S, a), which returns the sum of all keys k_i ∈ S that are larger than a, that is, $\sum_{k_i \in S,\ k_i > a} k_i$.
Argue the running time of both operations and give pseudo-code for TotalGreater(S, a) (do not give pseudo-code for Insert(S, k)).
I don't understand how to do this. I was thinking of adding an extra field called sum to the RB-tree, but that doesn't seem to work, because sometimes I need only the sum of the left nodes and sometimes I need the sum of the right nodes too.
So I was thinking of adding 2 fields called leftSum and rightSum: if the current node is > GivenValue, add the cached sum of its subnodes to the running total.
Can someone please help me with this?

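You can add a field sum to each node, holding the sum of the keys in the subtree rooted at that node (a size field holding the number of nodes works the same way if you only need a count). A single field is enough, because the sum of a left or right subtree is just the sum field of that child. When searching for the node with the smallest key larger than a, you go either left or right at each node on the path. Every time you go left (the current key is larger than a), add the current key plus the sum of its right child to the running total. Every time you go right (the current key is smaller than a), add nothing.
There are two conditions for termination: 1) you find a node containing exactly a, in which case you add the sum of its right child to the total; 2) you fall off the tree (reach a NIL child), in which case the total is complete. The search follows a single root-to-leaf path, so TotalGreater takes O(log n) time in a red-black tree. Insert also stays O(log n): the new key is added to the sum field of every node on the insertion path, and each rotation only has to recompute the sum fields of the two rotated nodes from their children, which is O(1).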
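A minimal Python sketch of this search with a per-node sum field, assuming a plain (unbalanced) binary search tree for brevity; the names `Node`, `insert` and `total_greater` are illustrative, not from the answer. In a real red-black tree, `insert` would also recolor and rotate, and every rotation would recompute `sum` for the two rotated nodes.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.sum = key          # sum of all keys in the subtree rooted here
        self.left = None
        self.right = None


def subtree_sum(node):
    return node.sum if node else 0


def insert(root, key):
    """Plain BST insert that keeps `sum` up to date along the search path."""
    if root is None:
        return Node(key)
    node = root
    while True:
        node.sum += key                      # key will end up below this node
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def total_greater(root, a):
    """Sum of all keys > a, found by walking a single root-to-leaf path."""
    total = 0
    node = root
    while node:
        if node.key > a:
            # node and its whole right subtree are > a; keep looking left
            total += node.key + subtree_sum(node.right)
            node = node.left
        elif node.key < a:
            # node and its left subtree are < a; nothing to add
            node = node.right
        else:
            # exact match: only the right subtree is larger than a
            total += subtree_sum(node.right)
            break
    return total
```

For example, after inserting 50, 30, 70, 20, 40, 60, 80, `total_greater(root, 45)` adds 50 + (60 + 70 + 80) at the root and nothing further down.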

As Jordi describes, the keyword here is "augmented red-black tree".

Related

Algorithmic Puzzle: Distinct nodes in a subtree

I am trying to solve this question:
You are given a rooted tree consisting of n nodes. The nodes are
numbered 1,2,…,n, and node 1 is the root. Each node has a color.
Your task is to determine for each node the number of distinct colors
in the subtree of the node.
The brute-force solution is to store a set for each node and then cumulatively merge them in a depth-first search. That runs in O(n^2) time, which is not very efficient.
How do I solve this (and the same class of problems) efficiently?
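For each node:
Recursively traverse the left and right children.
Have each call return a HashSet of colors.
At each node, merge the left child's set and the right child's set.
Add the color of the current node (it belongs to its own subtree).
Record the size of the set as the count for the current node in a Dictionary, and return the set.
(This assumes a binary tree; for a general rooted tree, merge the sets of all children in the same way.)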
Sample C# code:
public Dictionary<TreeNode, int> distinctColorCount = new Dictionary<TreeNode, int>();
public HashSet<Color> GetUniqueColorsTill(TreeNode t) {
    // If null node, return an empty set.
    if (t == null) return new HashSet<Color>();
    // If we reached here, we are at a non-null node.
    // First get the set from its left child.
    var lSet = GetUniqueColorsTill(t.Left);
    // Second get the set from its right child.
    var rSet = GetUniqueColorsTill(t.Right);
    // Now, merge the two sets.
    // Be a little clever here: merge the smaller set into the bigger one.
    if (lSet.Count > rSet.Count) (lSet, rSet) = (rSet, lSet);
    var returnSet = rSet;
    returnSet.UnionWith(lSet);
    // Add the color of the current node: it is part of its own subtree.
    returnSet.Add(t.Color);
    // Put the count for this node in the dictionary and return the set.
    distinctColorCount[t] = returnSet.Count;
    return returnSet;
}
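As @user58697 commented on your question, you can try to work out the complexity with the Master Theorem. This is another answer from me, written a long time ago, that explains the Master Theorem if you need a refresher. Note that a plain merge can copy O(n) elements at every node of a skewed tree (O(n^2) overall). Merging the smaller set into the bigger one fixes that: each time a color is copied, it lands in a set at least twice the size of the one it came from, so each color is copied O(log n) times and the total is O(n log n) expected hash-set operations.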
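The C# sample assumes a binary tree with Left/Right children, while the question's tree is a general rooted tree. Below is a rough Python sketch of the same smaller-into-larger merge for any number of children. It assumes 0-based nodes with node 0 as the root and a `parent` array as input (these input conventions are assumptions, not from the question), and it processes nodes in reverse DFS order instead of recursing, to stay clear of Python's recursion limit.

```python
def distinct_colors(n, parent, color):
    """parent[v] is v's parent (parent[0] == -1, node 0 is the root).
    Returns ans where ans[v] is the number of distinct colors in v's subtree."""
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)

    # DFS preorder; walking it backwards handles children before parents.
    order, stack = [], [0]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])

    sets = [None] * n
    ans = [0] * n
    for v in reversed(order):
        s = {color[v]}
        for c in children[v]:
            t = sets[c]
            if len(t) > len(s):
                s, t = t, s          # always merge the smaller set into the larger
            s |= t
            sets[c] = None           # the child's set is no longer needed
        sets[v] = s
        ans[v] = len(s)
    return ans
```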
First of all, you'd want to turn the tree into a list. This technique is often called an 'Euler tour'.
Basically, you make an empty list and run a DFS. When you enter a node, and again when you leave it, push its color onto the end of the list. This gives a list of length 2n, where n is the number of nodes. It's easy to see that the colors of all nodes in a node's subtree lie between that node's first and last occurrence in the list. So instead of a tree with queries 'how many different colors are there in this node's subtree?', you have a list with queries 'how many different colors are there between index i and index j?'. That actually makes things a lot easier.
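A small Python sketch of the tour described above, assuming 0-based nodes, a `children` adjacency list and a `color` list (the names are illustrative). It records each node's first and last position, so a subtree query becomes a range query over `tour[first[v] .. last[v]]`. An explicit stack replaces recursion.

```python
def euler_tour(n, children, color, root=0):
    """Returns (tour, first, last). tour has length 2n, and the colors of
    v's subtree are exactly those in tour[first[v] : last[v] + 1]."""
    tour, first, last = [], [0] * n, [0] * n
    stack = [(root, False)]
    while stack:
        v, leaving = stack.pop()
        if leaving:
            last[v] = len(tour)          # leaving v: push its color again
            tour.append(color[v])
            continue
        first[v] = len(tour)             # entering v
        tour.append(color[v])
        stack.append((v, True))          # schedule the exit after the children
        for c in reversed(children[v]):
            stack.append((c, False))
    return tour, first, last
```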
First idea: Mo's technique, O(n sqrt(n)):
I will describe it briefly; I strongly recommend searching for Mo's technique, as it is well explained in many sources.
Sort all your queries (reminder: each looks like 'given a pair (i, j), count the distinct numbers in the subarray from index i to index j') by their start. Make sqrt(n) buckets and place a query starting at index i into bucket number i / sqrt(n).
For each bucket we answer the queries separately. Sort all queries in the bucket by their end. Process the first one (the query whose end is furthest to the left) by brute force: iterate over the subarray, store a count for each number in a hash map, and track how many numbers have a nonzero count.
To process the next query, we add some numbers at the end (the next query ends further right than the previous one!) and, unfortunately, also have to do something about its start. We need to either remove some numbers (if the next query's start > the old query's start) or add some numbers at the beginning (if the next query's start < the old query's start). Keeping counts rather than a plain set matters here: removing one occurrence only decreases the distinct total when that number's count drops to zero. We can do this by brute force too, since all queries in a bucket start within the same segment of sqrt(n) indices! In total we get O(n sqrt(n)) time complexity (more precisely O((n + q) sqrt(n)) for q queries).
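A compact Python sketch of Mo's technique for distinct-count queries. It uses the common single-window variant (the window is moved from one sorted query to the next instead of restarting per bucket), which has the same O((n + q) sqrt(n)) bound. `distinct_in_ranges` is a hypothetical name, and queries are assumed to be inclusive, 0-based (i, j) pairs.

```python
from math import isqrt


def distinct_in_ranges(arr, queries):
    """Mo's algorithm: returns the number of distinct values in arr[i..j]
    for each (i, j) query (inclusive, 0-based)."""
    block = max(1, isqrt(len(arr)))
    # Sort by the bucket of the start, then by the end inside a bucket.
    order = sorted(range(len(queries)),
                   key=lambda q: (queries[q][0] // block, queries[q][1]))
    count = {}                 # value -> occurrences inside the current window
    distinct = 0
    lo, hi = 0, -1             # current window is arr[lo..hi] (starts empty)
    ans = [0] * len(queries)

    def add(x):
        nonlocal distinct
        count[x] = count.get(x, 0) + 1
        if count[x] == 1:
            distinct += 1

    def remove(x):
        nonlocal distinct
        count[x] -= 1
        if count[x] == 0:      # last copy left the window
            distinct -= 1

    for q in order:
        i, j = queries[q]
        # Grow first, then shrink, so counts never go negative.
        while hi < j:
            hi += 1
            add(arr[hi])
        while lo > i:
            lo -= 1
            add(arr[lo])
        while hi > j:
            remove(arr[hi])
            hi -= 1
        while lo < i:
            remove(arr[lo])
            lo += 1
        ans[q] = distinct
    return ans
```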
Second idea, O(n log n): check out the question "Is it possible to query number of distinct integers in a range in O(lg N)?"
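For the O(n log n) direction, one well-known offline technique (not necessarily the one used in the answers to that question) sweeps the right endpoint and keeps a 1 in a Fenwick tree only at the most recent occurrence of each value seen so far. Once the sweep reaches j, the number of distinct values in [i, j] is the Fenwick sum over positions i..j. A minimal Python sketch with hypothetical names:

```python
def distinct_offline(arr, queries):
    """Distinct values in arr[i..j] for each (i, j) query, answered offline
    in O((n + q) log n) by sweeping the right endpoint."""
    n = len(arr)
    tree = [0] * (n + 1)            # Fenwick tree over positions 1..n

    def add(pos, d):                # pos is 0-based
        pos += 1
        while pos <= n:
            tree[pos] += d
            pos += pos & -pos

    def prefix(pos):                # sum over positions 0..pos-1
        s = 0
        while pos > 0:
            s += tree[pos]
            pos -= pos & -pos
        return s

    by_end = sorted(range(len(queries)), key=lambda q: queries[q][1])
    last = {}                       # value -> latest index where it appeared
    ans = [0] * len(queries)
    r = -1
    for q in by_end:
        i, j = queries[q]
        while r < j:
            r += 1
            x = arr[r]
            if x in last:
                add(last[x], -1)    # the older occurrence no longer counts
            last[x] = r
            add(r, 1)
        ans[q] = prefix(j + 1) - prefix(i)
    return ans
```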

Queries on Tree Path with Modifications

Question:
You are given a tree with n nodes (n can be up to 10^5) and n-1 bidirectional edges. Each node contains two values:
Its index (a unique number for the node), from 1 to n.
Its value V_i, which can vary from 1 to 10^8.
There will be multiple queries of the same type (up to 10^5 of them) on this same tree, as follows:
You are given node1, node2 and a value P (which can vary from 1 to 10^8).
For each such query, you have to find the number of nodes on the path from node1 to node2 whose value is less than P.
NOTE: There is a unique path between any two nodes, and no two edges connect the same pair of nodes.
Required time complexity: O(n log(n)), or anything else that runs within 1 second under the given constraints.
What I have Tried:
(A). I could solve it easily if the value of P were fixed, using an LCA approach in O(n log(n)) by storing the following info at each node:
The number of nodes whose value is less than P on the path from the root to that node.
But here P varies too much, so this will not help.
(B). Another approach I was thinking of is a simple DFS per query. But that takes O(nq), where q is the number of queries. Since n and q can both be up to 10^5, this will not fit the time limit either.
I could not think anything else. Any help would be appreciated. :)
Source:
I think I read this problem somewhere on SPOJ, but I cannot find it now. I tried searching for it on the web but could not find a solution anywhere (Codeforces, CodeChef, SPOJ, StackOverflow).
Let ans(v, P) be the answer for the vertical path from the root to v (inclusive) and the given value of P.
How can we compute it? There's a simple offline solution: store all queries for a given node in a vector associated with it, then run a depth-first search, keeping the values of all nodes on the current root-to-node path in a data structure that can do the following:
add a value
delete a value
count the number elements smaller than X
Any balanced binary search tree would do. You can make it even simpler: if you know all the queries beforehand, you can compress the numbers so that they're in the [0..n - 1] range and use a binary indexed tree (Fenwick tree).
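Back to the original problem: the answer to a (u, v, P) query is ans(u, P) + ans(v, P) - 2 * ans(LCA(u, v), P), plus 1 if the value of LCA(u, v) itself is less than P (subtracting ans(LCA(u, v), P) twice also removes the LCA node, which lies on the path).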
That's it. The time complexity is O((N + Q) log N).
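A rough Python sketch of the whole offline method, assuming 0-based node indices, an edge list as input, and binary lifting for the LCA (the answer doesn't say how the LCA is computed). The LCA lies on the path, so after subtracting ans(LCA, P) twice it is added back when its value is below P. `path_count_less` is a hypothetical name.

```python
from bisect import bisect_left


def path_count_less(n, values, edges, queries, root=0):
    """For each query (u, v, P), count the nodes on the u-v path whose value
    is less than P. Nodes are 0..n-1. Offline, O((n + q) log n)."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    # Parents and depths via an iterative DFS (avoids Python's recursion limit).
    parent, depth = [-1] * n, [0] * n
    seen = [False] * n
    seen[root] = True
    stack = [root]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if not seen[y]:
                seen[y] = True
                parent[y], depth[y] = x, depth[x] + 1
                stack.append(y)

    # Binary-lifting table for LCA; the root is treated as its own ancestor.
    LOG = max(1, n.bit_length())
    up = [[p if p != -1 else v for v, p in enumerate(parent)]]
    for _ in range(1, LOG):
        prev = up[-1]
        up.append([prev[prev[v]] for v in range(n)])

    def lca(a, b):
        if depth[a] < depth[b]:
            a, b = b, a
        diff = depth[a] - depth[b]
        for k in range(LOG):
            if diff >> k & 1:
                a = up[k][a]
        if a == b:
            return a
        for k in reversed(range(LOG)):
            if up[k][a] != up[k][b]:
                a, b = up[k][a], up[k][b]
        return up[0][a]

    # answer = ans(u,P) + ans(v,P) - 2*ans(w,P) + [value(w) < P], w = LCA(u, v)
    need = [[] for _ in range(n)]
    ans = [0] * len(queries)
    for qi, (u, v, P) in enumerate(queries):
        w = lca(u, v)
        need[u].append((P, qi, 1))
        need[v].append((P, qi, 1))
        need[w].append((P, qi, -2))
        ans[qi] += int(values[w] < P)

    # Compressed values; a Fenwick tree counts the values on the current path.
    vals = sorted(set(values))
    tree = [0] * (len(vals) + 1)

    def bit_add(i, d):                  # i is a 0-based rank
        i += 1
        while i < len(tree):
            tree[i] += d
            i += i & -i

    def bit_prefix(i):                  # number of path values with rank < i
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    # Second DFS: a node's value is in the tree exactly while it is on the path.
    stack = [(root, False)]
    while stack:
        x, leaving = stack.pop()
        r = bisect_left(vals, values[x])
        if leaving:
            bit_add(r, -1)
            continue
        bit_add(r, 1)
        for P, qi, coef in need[x]:
            ans[qi] += coef * bit_prefix(bisect_left(vals, P))
        stack.append((x, True))
        for y in adj[x]:
            if y != parent[x]:
                stack.append((y, False))
    return ans
```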

Create a binary search tree with a better complexity

You are given a number, which is the root of a binary search tree. Then you are given an array of N elements that you have to insert into the binary search tree. The time complexity is O(N^2) if the array is sorted. I need to get the same tree structure with a much better complexity (say O(N log N)). I tried a lot but wasn't able to solve it. Can somebody help?
I assume that all numbers are distinct (if it's not the case, you can use a pair (number, index) instead).
Let's assume that we want to insert an element X. If it's the smallest or the largest element so far, it's clear where it goes (as the left child of the current minimum or the right child of the current maximum).
Let a = max{y in tree : y < X} and b = min{y in tree : y > X}. I claim that:
One of them is an ancestor of the other.
Either a doesn't have the right child or b doesn't have the left child.
Proof:
Suppose neither is an ancestor of the other. Let l = lca(a, b). As a is in its left subtree and b is in its right subtree, a < l < b, so l would be a key in the tree strictly between a and b. Contradiction.
Let a be an ancestor of b. If b has a left child c, then a < c < b. Contradiction (the other case is handled similarly).
So the solution goes like this:
Keep a set of the elements that are already in the tree (an efficient ordered set with a lower_bound operation, like std::set in C++ or TreeSet in Java).
On every insertion, find a and b as described above (in O(log N) time using the set's lower_bound operation). Exactly one of them lacks the appropriate child (a right child for a, a left child for b); that's where the new element goes.
The total time complexity is clearly O(N log N).
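A Python sketch of the predecessor/successor rule. Python has no built-in balanced ordered set, so a sorted list with `bisect` stands in for std::set/TreeSet here; `list.insert` is O(N), so this sketch shows the placement logic rather than the O(N log N) bound, which needs a real ordered-set container. Children are kept in two dicts, `left` and `right`, keyed by node value (an assumption that relies on the keys being distinct); `build_bst` is a hypothetical name.

```python
import bisect


def build_bst(root, items):
    """Builds the same tree as naive BST insertion of `items` under `root`.
    Returns (left, right): dicts mapping a key to its left/right child key."""
    left, right = {}, {}
    keys = [root]                  # sorted keys already in the tree
    for x in items:
        i = bisect.bisect_left(keys, x)
        a = keys[i - 1] if i > 0 else None          # largest key < x
        b = keys[i] if i < len(keys) else None      # smallest key > x
        # Exactly one of a, b is missing the child slot that x needs.
        if a is not None and a not in right:
            right[a] = x
        else:
            left[b] = x
        keys.insert(i, x)
    return left, right
```

On a sorted input such as `build_bst(0, [1, 2, 3, 4, 5])` it reproduces the degenerate right-leaning chain, as naive insertion would, without walking the chain for each element.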
If you look up a word in a dictionary, you open the dictionary about halfway and look at the page. That tells you whether the search word is in the first or second half of the dictionary. Repeat, eliminating half the remaining words on each pass, and you soon narrow it down to a single word. A 4-billion-word dictionary will take about 32 passes.
A binary search tree uses the same principle, except that, as well as looking up, you can also insert. Insertion is O(log N), unless the tree becomes degenerate.
To prevent the tree from becoming degenerate, you use a system of "red" and "black" nodes (the colours are just conventional): a red node may not have a red child, and every path from the root down to a leaf passes through the same number of black nodes, which keeps the height O(log N). The full explanation is in my book, Basic Algorithms
http://www.lulu.com/spotlight/bgy1mm
An implementation is here
https://github.com/MalcolmMcLean/babyxrc/blob/master/src/rbtree.c
https://github.com/MalcolmMcLean/babyxrc/blob/master/src/rbtree.h
But you will need some explanation if you want to learn about red black
trees from it.

Missing number in binary search tree

I have an order-statistic balanced binary tree whose keys are n distinct integers, and I want to write a function find(x) that returns the minimal integer that is not in the tree and is greater than x, in O(log(n)) time.
For example, if the keys in the tree are 6,7,8,10,11,13,14 then find(6)=9, find(8)=9, find(10)=12, find(13)=15.
I thought about finding the max m in O(log(n)) and the index of x (call it i_x) in O(log(n)); then if i_x = n - 1 - (m - x), every integer from x to m is in the tree and I can simply return m + 1.
By index I mean in 6,7,8,10,11,13,14 that index of 6 is 0 and index of 10 is 3 for example...
But I'm having trouble with the other cases...
According to Wikipedia, an order statistic tree supports these two operations in O(log(n)) time:
Select(i) — find the i'th smallest element stored in the tree in O(log(n))
Rank(x) – find the rank of element x in the tree, i.e. its index in the sorted list of elements of the tree in O(log(n))
Start by getting the rank of x, then select the successive ranks after x until you find a place to insert your missing element. But this is O(n log(n)) in the worst case.
So instead, once you have the rank of x, you do a kind of binary search. The basic idea is to test whether there is a gap between two numbers x and y that are in the tree: there is a gap if rank(x) - rank(y) != x - y.
In general, when searching for the number within the rank interval [lo, hi] (lo and hi are ranks in the tree, and mid is the middle rank), if there is a gap between select(lo) and select(mid), search inside [lo, mid]; otherwise search inside [mid, hi].
You will end up finding the number you seek.
However, this solution does not run in O(log(n)) time but in O(log^2(n)), because each of the O(log(n)) steps of the binary search calls select. This is the best I can think of for a general solution.
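A Python sketch of this rank-based binary search. A sorted list of distinct integers stands in for the order-statistic tree: `keys[i]` plays the role of Select(i) and `bisect_left` the role of Rank. With a real tree, each probe is an O(log(n)) Select, which gives the O(log^2(n)) total. `find_missing_after` is a hypothetical name.

```python
import bisect


def find_missing_after(keys, x):
    """Smallest integer > x that is not in `keys` (sorted, distinct ints)."""
    v = x + 1
    lo = bisect.bisect_left(keys, v)          # rank of the first key >= v
    if lo == len(keys) or keys[lo] != v:
        return v                              # v itself is missing
    # keys[lo] == v. Find the largest rank hi with keys[hi] - keys[lo] == hi - lo,
    # i.e. no gap between ranks lo and hi. The test is monotone in hi.
    ok, bad = lo, len(keys)                   # ok: no gap up to it; bad: gap or past end
    while bad - ok > 1:
        mid = (ok + bad) // 2
        if keys[mid] - keys[lo] == mid - lo:
            ok = mid
        else:
            bad = mid
    return keys[ok] + 1
```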
EDIT:
Well, it's a tough question, I changed my mind several times. Here is what I came up with:
I assume that the left child holds smaller values and the right child holds larger values.
Intuition of find(x): start at the root and go down the tree almost as in a standard binary tree search. If the branch we would go into cannot contain the solution of find(x), skip it.
We'll go through the basic cases first:
If the node I found is null, then I am done, and I return the value I was looking for.
If the current value is less than the one I am looking for, I search for x in the right subtree
If I found the node containing x, then I search for x+1 on the right subtree.
The case where x is in the left subtree is trickier, because the left subtree may contain x, x+1, x+2, x+3, etc. up to y-1, where y is the value stored in the current node. In that case, we want to search for y+1 in the right subtree.
However, if not all the numbers from x to y-1 are in the left subtree (that is, there is a gap), the answer lies in that gap, so we look for x in the left subtree.
The question is: how do we find out whether the whole sequence from x to y-1 is present in the subtree?
The algorithm in python looks like this:
def find(node, x):
    if node is None:
        return x
    if node.data < x:
        return find(node.right, x)
    if node.data == x:
        return find(node.right, x + 1)
    if is_full(...):
        return find(node.right, node.data + 1)
    return find(node.left, x)
To get the smallest value strictly greater than x that is not in the tree, the first call is find(root, x+1). If you want the smallest value greater than or equal to x that is not in the tree, the first call is find(root, x).
The is_full method checks whether the left subtree contains all numbers from x to node.data-1.
Now, using this as a starting point, I believe you can find a suitable solution by yourself, using the fact that the number of nodes contained in each subtree is stored at the subtree's root.
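One possible way to fill in is_full with the stored subtree sizes, sketched in Python under assumptions: the `count` field name, the `is_full(sub, x, y)` signature and the `count_ge` helper are hypothetical. Because every key in `node.left` is smaller than `node.data`, the left subtree contains all of x..node.data-1 exactly when it holds node.data - x keys that are >= x. Each is_full call costs O(h), so this version of find runs in O(h^2), i.e. O(log^2(n)) on a balanced tree rather than O(log(n)).

```python
class Node:
    """BST node augmented with `count`, the number of nodes in its subtree."""
    def __init__(self, data, left=None, right=None):
        self.data, self.left, self.right = data, left, right
        self.count = 1 + size(left) + size(right)


def size(node):
    return node.count if node else 0


def count_ge(node, x):
    """Number of keys >= x in the subtree, in O(h) using the stored counts."""
    total = 0
    while node:
        if node.data >= x:
            total += 1 + size(node.right)   # node and its right subtree qualify
            node = node.left
        else:
            node = node.right
    return total


def is_full(sub, x, y):
    """Hypothetical helper: all keys in `sub` are < y; True if sub holds x..y-1."""
    return count_ge(sub, x) == y - x


def find(node, x):
    """Called as find(root, x): smallest integer >= x not in the tree."""
    if node is None:
        return x
    if node.data < x:
        return find(node.right, x)
    if node.data == x:
        return find(node.right, x + 1)
    if is_full(node.left, x, node.data):
        return find(node.right, node.data + 1)
    return find(node.left, x)
```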
I faced a similar question.
There were no restrictions about finding greater than some x, simply find the missing element in the BST.
Below is my answer: it is perfectly possible to do this in O(lg(n)) time, assuming the tree is almost balanced. You might want to consider the proof that the expected height of a randomly built BST is O(lg(n)) for n elements. I use a simpler notation, O(h), where h is the height of the tree, so that the two things stay separate.
Assumptions and/or requirements:
I augment the data structure: store the count (left_subtree + right_subtree + 1) at each node.
Obviously, the count of a single node is 1.
This count is pre-computed and stored at each node.
Kindly pardon my multiple notations for not equal to (=/= and !=).
Also note that the code could be structured a little better if one were to write working code on a machine.
Moreover, I believe this is correct. I tried as many corner cases as I could think of, and in general it works. Even if there is a counterexample, I don't think it will be difficult to modify the code to fit that particular case; but please comment with the counterexample, I am interested.

Design a data structure

I am trying to design a data structure that stores elements according to some prescribed ordering, each element with its own value, and that supports each of the following four operations in logarithmic time (amortized or worst-case, your choice):
add a new element of value v in the kth position
delete the kth element
return the sum of the values of elements i through j
increase by x the values of elements i through j
Any idea will be appreciated,
Thanks
I suspect you could do it with a red-black tree. Over the classic red-black tree, each node would need the following additional fields:
size
sum
increment
The size field would track the number of nodes in the subtree, allowing log(n) time insertion and deletion by position.
The sum field would track the sum of the values in the subtree, allowing log(n) time summing.
The increment field would track an increment that applies to every node in the subtree and is added on when calculating sums, so the total of a subtree would be sum + size*increment. This is the trickiest one. I think that by adding positive and negative increments at the appropriate nodes, it would be possible to alter the returned sum correctly in all cases by altering only log(n) nodes.
Needless to say, implementation would be very tricky. Sum and increment fields would have to be updated after each insertion and deletion, and each would have at least five cases to deal with.
Update: I'm not going to try to solve this completely, but I would note that incrementing i through j by n is equivalent to incrementing the whole tree by n, then decrementing 0 through i-1 by n and decrementing j+1 through to the end by n. A global increment can be done in constant time, with the other two operations being a 'left side decrement' and a 'right side decrement', which are symmetrical. Doing a left side decrement up to i would be something like: 'Take the count of the left subtree of the root node. If the count is less than i, decrement the increment field on the left child of the root by n, then apply a left decrement of n to the right subtree of the root, up to i - count(left subtree) elements. Alternatively, if the count is greater than i, decrement the increment field of the left-left grandchild of the root by n, then apply a left decrement of n to the left-right subtree of the root, up to count(left-left subtree).' As the tree is balanced, I think the left decrement operation need only be applied recursively O(log(n)) times. The right decrement would be similar, but reversed.
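For comparison, a structure often used for exactly these four operations is an implicit treap (a randomized balanced BST keyed by position) that stores size, sum and a lazy increment per node: the same three fields proposed above, with increments pushed down lazily. This is a swapped-in technique, not the red-black scheme described in this answer. The Python sketch below uses hypothetical names and 0-based positions, and gives expected O(log n) time per operation.

```python
import random


class TNode:
    __slots__ = ("val", "prio", "size", "sum", "lazy", "left", "right")

    def __init__(self, val):
        self.val, self.prio = val, random.random()
        self.size, self.sum, self.lazy = 1, val, 0   # lazy: pending add for children
        self.left = self.right = None


def _size(t): return t.size if t else 0
def _sum(t): return t.sum if t else 0


def _apply(t, x):
    """Add x to every element of subtree t in O(1), deferring the children."""
    if t:
        t.val += x
        t.sum += x * t.size
        t.lazy += x


def _push(t):
    if t and t.lazy:
        _apply(t.left, t.lazy)
        _apply(t.right, t.lazy)
        t.lazy = 0


def _pull(t):
    t.size = 1 + _size(t.left) + _size(t.right)
    t.sum = t.val + _sum(t.left) + _sum(t.right)


def _split(t, k):
    """Split t into (first k elements, remaining elements)."""
    if t is None:
        return None, None
    _push(t)
    if _size(t.left) >= k:
        a, b = _split(t.left, k)
        t.left = b
        _pull(t)
        return a, t
    a, b = _split(t.right, k - _size(t.left) - 1)
    t.right = a
    _pull(t)
    return t, b


def _merge(a, b):
    """Concatenate a and b (every element of a comes before b)."""
    if a is None or b is None:
        return a or b
    if a.prio > b.prio:
        _push(a)
        a.right = _merge(a.right, b)
        _pull(a)
        return a
    _push(b)
    b.left = _merge(a, b.left)
    _pull(b)
    return b


class ImplicitTreap:
    def __init__(self):
        self.root = None

    def insert(self, k, v):            # new element becomes the k-th (0-based)
        a, b = _split(self.root, k)
        self.root = _merge(_merge(a, TNode(v)), b)

    def delete(self, k):               # remove the k-th element (0-based)
        a, b = _split(self.root, k)
        _, c = _split(b, 1)
        self.root = _merge(a, c)

    def range_sum(self, i, j):         # sum of elements i..j inclusive
        a, b = _split(self.root, i)
        m, c = _split(b, j - i + 1)
        s = _sum(m)
        self.root = _merge(_merge(a, m), c)
        return s

    def range_add(self, i, j, x):      # add x to elements i..j inclusive
        a, b = _split(self.root, i)
        m, c = _split(b, j - i + 1)
        _apply(m, x)
        self.root = _merge(_merge(a, m), c)
```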
What you're asking for isn't feasible.
Requirement #3 might be possible, but #4 just can't be done in logarithmic time. You might have to edit every node: imagine i is 0 and j is n-1. Even with constant-time access to each node, that's linear time.
Edit:
Upon further consideration, if you kept track of "mass increases", you could potentially control access to a node, decorating it on the way out with whatever mass increases it required. I still think it would be entirely unwieldy, but I suppose it's possible.
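Requirements 1, 2 and 3 can be satisfied by a Binary Indexed Tree (BIT, Fenwick tree), though note that a plain BIT works over a fixed set of positions, so inserting or deleting in the middle, which shifts the later positions, needs extra work: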
http://community.topcoder.com/tc?module=Static&d1=tutorials&d2=binaryIndexedTrees
I am thinking of a way to modify the BIT to handle #4 in logarithmic time as well.
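One well-known way to get #3 and #4 together from a BIT is to keep two Fenwick trees over the difference array, which gives range add and range sum in O(log n) each. Below is a hedged Python sketch with hypothetical names. It works over a fixed number of positions (all values start at 0), so it does not handle #1 and #2, which shift positions.

```python
class RangeFenwick:
    """Range add / range sum over n fixed positions (0-based public API)."""

    def __init__(self, n):
        self.n = n
        self.b1 = [0] * (n + 1)      # stores the difference array d[k]
        self.b2 = [0] * (n + 1)      # stores d[k] * (k - 1)

    def _add(self, tree, i, x):      # i is 1-based; no-op when i > n
        while i <= self.n:
            tree[i] += x
            i += i & -i

    def _query(self, tree, i):       # prefix sum of tree over 1..i
        s = 0
        while i > 0:
            s += tree[i]
            i -= i & -i
        return s

    def _prefix(self, i):            # sum of elements 1..i (1-based)
        return self._query(self.b1, i) * i - self._query(self.b2, i)

    def range_add(self, i, j, x):    # add x to elements i..j (inclusive)
        l, r = i + 1, j + 1
        self._add(self.b1, l, x)
        self._add(self.b1, r + 1, -x)
        self._add(self.b2, l, x * (l - 1))
        self._add(self.b2, r + 1, -x * r)

    def range_sum(self, i, j):       # sum of elements i..j (inclusive)
        return self._prefix(j + 1) - self._prefix(i)
```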
