I need to find a data structure that supports the following operations:
Build(S,k) - O(n log n)
Search(S,k) - O(log n)
Insert(S,k) - O(log n)
Delete(S,k) - O(log n)
Decrease-Upto(S,k,d) - O(log n) - this method should subtract d (d > 0) from every key that is <= k
The obvious first choice was a red-black tree.
However, I can't come up with a way to implement Decrease-Upto in O(log n).
What happens if k is greater than the maximum key in the tree? In that case I would have to update the whole tree.
Can someone suggest an alternative, or maybe some tips?
You can store an extra value in each node of the tree; let's call it delta. A node's delta is added to the keys stored in its whole subtree to get the actual keys. So, to get the actual key at a particular node, you sum the deltas on the path from the root down to that node and add this sum to the stored key.
To do Decrease-Upto, you only need to change the deltas of O(log n) nodes along a single root-to-leaf path.
You don't have to change the structure of the tree after this operation, because it doesn't change the ordering of the keys: every key <= k decreases by the same d, so the order within that group is preserved, and a decreased key is still smaller than any untouched key > k.
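A minimal sketch of the delta idea in Python, using a plain (unbalanced) BST for brevity and the convention that a node's delta applies to the node itself and its whole subtree; all names here are illustrative, and a real solution would layer this on a balanced tree:

```python
class Node:
    """BST node; actual key = stored key + sum of deltas from the root down."""
    def __init__(self, key):
        self.key = key      # stored key, relative to accumulated deltas above
        self.delta = 0      # lazy offset applied to this whole subtree
        self.left = None
        self.right = None

def decrease_upto(node, k, d, acc=0):
    """Subtract d from every actual key <= k, touching only one path."""
    if node is None:
        return
    acc += node.delta
    actual = node.key + acc
    if actual <= k:
        # Everything in the left subtree is < actual <= k: push d lazily.
        if node.left is not None:
            node.left.delta -= d
        node.key -= d
        decrease_upto(node.right, k, d, acc)   # some right keys may also be <= k
    else:
        decrease_upto(node.left, k, d, acc)

def inorder(node, acc=0, out=None):
    """Collect the actual keys in sorted order (for checking the result)."""
    if out is None:
        out = []
    if node is None:
        return out
    acc += node.delta
    inorder(node.left, acc, out)
    out.append(node.key + acc)
    inorder(node.right, acc, out)
    return out
```

For example, on a tree holding keys {3, 5, 8}, `decrease_upto(root, 6, 2)` lowers 3 and 5 to 1 and 3 while leaving 8 untouched, and only O(height) nodes are visited.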
Related
Is there general pseudocode or a related data structure to get the nth value of a B-tree? For example, the eighth value of the tree [1,4,9,9,11,11,12,13] is 13.
If I have some values sorted in a B-tree, I would like to find the nth value without having to go through the entire tree. Is there a better structure for this problem? The data could be updated at any time.
You are looking for an order statistics tree. The idea is that, in addition to any data stored in the nodes, you also store the size of each node's subtree, and keep those sizes updated on insertions and deletions.
Since each insert/delete operation already "touches" O(log n) nodes, keeping the sizes up to date preserves the O(log n) behavior of these operations.
FindKth() is then done by descending the tree: at each node, the size of the left subtree tells you whether the k-th element lies in the left subtree, at the current node, or in the right subtree (adjusting k accordingly). Since you follow a single root-to-node path rather than exploring whole subtrees, you touch only O(log n) nodes, which makes this operation O(log n) as well.
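Here is a hedged sketch of the size-augmented descent in Python, on an unbalanced BST to keep it short (a production version would use a balanced tree; all names are illustrative):

```python
class OSNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1          # number of nodes in this subtree

def size(n):
    return n.size if n else 0

def insert(root, key):
    """Plain BST insert, updating subtree sizes along the insertion path."""
    if root is None:
        return OSNode(key)
    root.size += 1
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def find_kth(root, k):
    """Return the k-th smallest key (1-based), descending a single path."""
    left = size(root.left)
    if k == left + 1:
        return root.key
    if k <= left:
        return find_kth(root.left, k)
    return find_kth(root.right, k - left - 1)
```

With the question's example values [1,4,9,9,11,11,12,13] inserted, `find_kth(root, 8)` returns 13.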
Build a Data structure that has functions:
set(arr,n) - initialize the structure with array arr of length n. Time O(n)
fetch(i) - fetch arr[i]. Time O(log(n))
invert(k,j) - (where 0 <= k <= j < n) reverses the subarray [k,j]; for example, [4,7,2,8,5,4] with invert(2,5) becomes [4,7,4,5,8,2]. Time O(log(n))
How about saving the indices in a binary search tree and using a flag saying the subtree is inverted? But if I do more than one invert, it messes things up.
Here is how we can approach designing such a data structure.
Indeed, using a balanced binary search tree is a good idea to start.
First, let us store array elements as pairs (index, value).
Naturally, the elements are sorted by index, so that the in-order traversal of a tree will yield the array in its original order.
Now, if we maintain a balanced binary search tree, and store the size of the subtree in each node, we can already do fetch in O(log n).
Next, let us only pretend we store the index.
Instead, we still arrange elements as we did with (index, value) pairs, but store only the value.
The index is now stored implicitly and can be calculated as follows.
Start from the root and go down to the target node.
Whenever we move to a left subtree, the index does not change.
When moving to a right subtree, add the size of the left subtree plus one (for the current node) to the index.
What we got at this point is a fixed-length array stored in a balanced binary search tree. It takes O(log n) to access (read or write) any element, as opposed to O(1) for a plain fixed-length array, so it is about time to get some benefit for all the trouble.
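The implicit-index descent described above can be sketched like this (class and function names are illustrative, and the tree here is built by hand rather than balanced automatically):

```python
class N:
    """Node of an implicit-index tree: no index field, only value and size."""
    def __init__(self, value, left=None, right=None):
        self.value, self.left, self.right = value, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def fetch(node, i):
    """Return the i-th (0-based) element of the array encoded by the tree."""
    left = node.left.size if node.left else 0
    if i < left:
        return fetch(node.left, i)          # index unchanged going left
    if i == left:
        return node.value                   # this node is exactly position i
    return fetch(node.right, i - left - 1)  # skip left subtree and this node
```

For a tree whose in-order traversal is [10, 20, 30], `fetch` recovers each element by position in O(height).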
The next step is to devise a way to split our array into left and right parts in O(log n) given the required size of the left part, and merge two arrays by concatenation.
This step introduces dependency on our choice of the balanced binary search tree.
Treap is the obvious candidate since it is built on top of the split and merge primitives, so this improvement comes for free.
Perhaps it is also possible to split a Red-black tree or a Splay tree in O(log n) (though I admit I didn't try to figure out the details myself).
Right now, the structure is already more powerful than an array: it allows splitting and concatenation of "arrays" in O(log n), although element access is as slow as O(log n) too.
Note that this would not be possible if we still stored index explicitly at this point, since indices would be broken in the right part of a split or merge operation.
Finally, it is time to introduce the invert operation.
Let us store a flag in each node to signal whether the whole subtree of this node has to be inverted.
This flag is lazily propagated: whenever we access a node, before doing anything else, check whether the flag is true.
If this is the case, swap the left and right subtrees, toggle (true <-> false) the flag in the root nodes of both subtrees, and set the flag in the current node to false.
Now, when we want to invert a subarray:
split the array into three parts (before the subarray, the subarray itself, and after the subarray) by two split operations,
toggle (true <-> false) the flag in the root of the middle (subarray) part,
then merge the three parts back in their original order by two merge operations.
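Putting all the pieces together, here is a sketch of an implicit treap with split/merge and a lazy reversal flag (all names are illustrative; note that this `build` is O(n log n) rather than the O(n) asked for in set, which would need a linear-time treap construction):

```python
import random

class TreapNode:
    def __init__(self, value):
        self.value = value
        self.prio = random.random()  # random heap priority keeps the tree balanced in expectation
        self.left = None
        self.right = None
        self.size = 1
        self.flag = False            # lazy "reverse this subtree" marker

def size(t):
    return t.size if t else 0

def update(t):
    t.size = 1 + size(t.left) + size(t.right)

def push(t):
    """Lazily propagate the reversal flag to the children."""
    if t and t.flag:
        t.left, t.right = t.right, t.left
        for c in (t.left, t.right):
            if c:
                c.flag = not c.flag
        t.flag = False

def merge(a, b):
    """Concatenate two implicit arrays."""
    if a is None: return b
    if b is None: return a
    if a.prio > b.prio:
        push(a); a.right = merge(a.right, b); update(a); return a
    push(b); b.left = merge(a, b.left); update(b); return b

def split(t, n):
    """Split into (first n elements, the rest)."""
    if t is None:
        return None, None
    push(t)
    if size(t.left) >= n:
        l, r = split(t.left, n)
        t.left = r; update(t)
        return l, t
    l, r = split(t.right, n - size(t.left) - 1)
    t.right = l; update(t)
    return t, r

def build(arr):
    t = None
    for v in arr:
        t = merge(t, TreapNode(v))
    return t

def fetch(t, i):
    push(t)
    if i < size(t.left):
        return fetch(t.left, i)
    if i == size(t.left):
        return t.value
    return fetch(t.right, i - size(t.left) - 1)

def invert(t, k, j):
    """Reverse the subarray [k, j] inclusive: two splits, flag toggle, two merges."""
    left, rest = split(t, k)
    mid, right = split(rest, j - k + 1)
    if mid:
        mid.flag = not mid.flag
    return merge(merge(left, mid), right)
```

With the question's example, `invert` on positions 2..5 of [4,7,2,8,5,4] yields [4,7,4,5,8,2], and every operation touches only O(log n) nodes in expectation.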
I want to augment a binary search tree so that search, insertion, and deletion are still supported in O(h) time, and then I want to implement an algorithm to find the sum of all node values in a given range.
You can add an additional data structure to your BST class, specifically a hashmap (hashtable). The keys are the distinct values your BST contains, and the values are the number of occurrences of each. BST search(...) is not impacted, but insert(...) and delete(...) need slight code changes.
Insert
When adding nodes to the BST, check whether that value exists in the hashmap as a key. If it does, increment its occurrence count by 1. If it doesn't, add it to the hashmap with an initial count of 1.
Delete
When deleting, decrement the occurrence count in the hashmap (assuming you aren't being told to delete a node that doesn't exist).
Sum
Now for the sum function
sum(int start, int end)
You can iteratively check your hashmap to see which numbers in the range exist in the map and how many occurrences each has. Build the sum by adding up every value in the range that appears in the map, multiplied by its occurrence count.
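A minimal sketch of the counting-map idea, assuming integer node values; the BST bookkeeping itself is elided (comments mark where it would go), since only the map is involved in the sum:

```python
from collections import defaultdict

class CountingBST:
    """BST plus a value -> occurrence-count map (names are illustrative)."""
    def __init__(self):
        self.counts = defaultdict(int)
        # ... BST fields omitted; only the parts relevant to range_sum are shown

    def insert(self, value):
        # ... normal BST insertion would go here ...
        self.counts[value] += 1

    def delete(self, value):
        # ... normal BST deletion would go here ...
        self.counts[value] -= 1
        if self.counts[value] == 0:
            del self.counts[value]

    def range_sum(self, start, end):
        # O(range size): test each value in [start, end] against the map.
        return sum(v * self.counts[v]
                   for v in range(start, end + 1) if v in self.counts)
```

For instance, after inserting 3, 3, 5, 9, `range_sum(3, 6)` returns 3*2 + 5 = 11.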
Complexities
Space: O(n)
Time of sum method: O(range size).
All other method time complexity isn't impacted.
You didn't mention a space constraint, so hopefully this is OK. I am very interested to see if you can somehow use the properties of a BST to solve this more efficiently; nothing comes to mind for me.
I'm looking for some help on a specific augmented red-black binary tree. My goal is to make every single operation run in O(log(n)) in the worst case. The nodes of the tree will have an integer as their key. This integer cannot be negative, and the tree should be sorted by a simple compare function on this integer. Additionally, each node will also store another value: its power. (Note that this has nothing to do with mathematical exponents.) Power is a floating point value. Both power and key are always non-negative. The tree must be able to provide these operations in O(log(n)) runtime:
insert(key, power): Insert into the tree. The node in the tree should also store the power, and any other variables needed to augment the tree in such a way that all other operations are also O(log(n)). You can assume that there is no node in the tree which already has the same key.
get(key): Return the power of the node identified by the key.
delete(key): Delete the node with the given key (assume that the key exists in the tree prior to the delete).
update(key,power): Update the power at the node given by key.
Here is where it gets interesting:
highestPower(key1, key2): Return the maximum power of all nodes with key k in the range key1 <= k <= key2. That is, all keys from key1 to key2, inclusive on both ends.
powerSum(key1, key2): Return the sum of the powers of all nodes with key k in the range key1 <= k <= key2. That is, all keys from key1 to key2, inclusive on both ends.
The main thing I would like to know is what extra variables I should store at each node. Then I need to work out how to use each of these in the above functions so that the tree stays balanced and all operations run in O(log(n)). My original thought was to store the following:
highestPowerLeft: The highest power of all child nodes to the left of this node.
highestPowerRight: The highest power of all child nodes to the right of this node.
powerSumLeft: The sum of the powers of all child nodes to the left of this node.
powerSumRight: The sum of the powers of all child nodes to the right of this node.
Would just this extra information work? If so, I'm not sure how to deal with it in the required functions. Frankly, my knowledge of red-black trees isn't great, because I feel like every explanation of them gets convoluted really fast, and all the rotations confuse the hell out of me. Thanks to anyone willing to attempt helping here; I know what I'm asking is far from simple.
A very interesting problem! For the sum, your proposed method should work (it should be enough to only store the sum of the powers to the left of the current node; this technique is called a prefix sum). For the max, it doesn't work: if both stored max values are equal, that value may lie outside your interval, so you have no idea what the max value inside the interval is. My only idea is to use a segment tree (in which the leaves are the nodes of your red-black tree), which lets you answer the question "what is the maximal value within a given range?" in logarithmic time, and also lets you update individual values in logarithmic time. However, since you need to insert new values into it, you need to keep it balanced as well.
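To make the segment-tree idea concrete, here is a simplified sketch over a fixed key universe 0..n-1 (sidestepping the balancing/insertion issue the answer raises; a real solution would attach the leaves to the red-black tree nodes). It supports point updates plus range max and range sum in O(log n):

```python
class SegTree:
    """Iterative segment tree over positions 0..n-1 with point update
    and combined range-max / range-sum queries (illustrative sketch)."""
    def __init__(self, n):
        self.n = n
        self.mx = [float('-inf')] * (2 * n)   # max over each segment
        self.sm = [0.0] * (2 * n)             # sum over each segment

    def update(self, key, power):
        """Set the power at position `key` and fix all ancestors."""
        i = key + self.n
        self.mx[i] = power
        self.sm[i] = power
        i //= 2
        while i:
            self.mx[i] = max(self.mx[2 * i], self.mx[2 * i + 1])
            self.sm[i] = self.sm[2 * i] + self.sm[2 * i + 1]
            i //= 2

    def query(self, lo, hi):
        """Return (max, sum) over positions lo..hi inclusive."""
        mx, sm = float('-inf'), 0.0
        lo += self.n
        hi += self.n + 1
        while lo < hi:
            if lo & 1:
                mx = max(mx, self.mx[lo]); sm += self.sm[lo]; lo += 1
            if hi & 1:
                hi -= 1; mx = max(mx, self.mx[hi]); sm += self.sm[hi]
            lo //= 2
            hi //= 2
        return mx, sm
```

So highestPower(key1, key2) and powerSum(key1, key2) both fall out of a single `query(key1, key2)` call.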
Deleting a node from the middle of a heap can be done in O(lg n), provided we can find the element in the heap in constant time. Suppose each node of the heap contains an id field. Now, given an id, how can we delete the corresponding node in O(lg n) time?
One solution is to store, in each node, the address of a location where we maintain that node's index in the heap; this auxiliary array would be ordered by node ids. This requires an additional array to be maintained, though. Is there any other good method to achieve the same?
PS: I came across this problem while implementing Dijkstra's shortest path algorithm.
The index (id -> node) can be maintained separately in a hashtable, which has O(1) lookup complexity (on average). The overall complexity then remains O(log n).
Each data structure is designed with certain operations in mind. From wikipedia about heap operations
The operations commonly performed with a heap are:
create-heap: create an empty heap
find-max or find-min: find the maximum item of a max-heap or a minimum item of a min-heap, respectively
delete-max or delete-min: removing the root node of a max- or min-heap, respectively
increase-key or decrease-key: updating a key within a max- or min-heap, respectively
insert: adding a new key to the heap
merge: joining two heaps to form a valid new heap containing all the elements of both.
This means a heap is not the best data structure for the operation you are looking for. I would advise you to look for a better-suited data structure (depending on your requirements).
I've had a similar problem and here's what I've come up with:
Solution 1: if your calls to delete some random item will have a pointer to the item, you can store your individual data items outside of the heap; have the heap contain pointers to these items; and have each item contain its current heap array index.
Example: the heap contains pointers to items with keys [2 10 5 11 12 6]. The item holding value 10 has a field called ArrayIndex = 1 (counting from 0). So if I have a pointer to item 10 and want to delete it, I just look at its ArrayIndex and use that in the heap for a normal delete. O(1) to find heap location, then usual O(log n) to delete it via recursive heapify.
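Solution 1 can be sketched like this in Python (class and method names are illustrative, not the poster's code); each item records its own array position, so locating it is O(1) and restoring heap order after removal is O(log n):

```python
class Item:
    def __init__(self, key):
        self.key = key
        self.index = -1          # current position in the heap array

class IndexedMinHeap:
    """Min-heap of Item objects that track their own array index."""
    def __init__(self):
        self.a = []

    def _swap(self, i, j):
        self.a[i], self.a[j] = self.a[j], self.a[i]
        self.a[i].index, self.a[j].index = i, j   # keep back-pointers current

    def _up(self, i):
        while i and self.a[i].key < self.a[(i - 1) // 2].key:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def _down(self, i):
        n = len(self.a)
        while True:
            small = i
            for c in (2 * i + 1, 2 * i + 2):
                if c < n and self.a[c].key < self.a[small].key:
                    small = c
            if small == i:
                return
            self._swap(i, small)
            i = small

    def push(self, item):
        item.index = len(self.a)
        self.a.append(item)
        self._up(item.index)

    def delete(self, item):
        """O(1) to locate via item.index, then O(log n) to re-heapify."""
        i = item.index
        last = self.a.pop()
        if i < len(self.a):          # unless we just removed the last slot
            self.a[i] = last
            last.index = i
            self._up(i)
            self._down(i)
```

Deleting the item holding 10 from a heap built over [2, 10, 5, 11, 12, 6] leaves a valid heap over the remaining five keys.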
Solution 2: If you only have the key of the item you want to delete, not its address, try this. Switch to a red-black tree, putting your payload data in the actual tree nodes. This is also O(log n) for insert and delete. It can additionally find an item with a given key in O(log n), which keeps delete-by-key at O(log n).
Between these, solution 1 requires the overhead of constantly updating ArrayIndex fields with every swap. It also results in a somewhat unusual one-off data structure that the next code maintainer would need to study and understand. I think solution 2 would be about as fast, and has the advantage that it's a well-understood algorithm.