Suppose each object has this form: (key, priority), e.g. (0, 3), (5, 7).
Say you are given two numbers, x and y.
What data structure would you use to return the highest-priority object whose key is between x and y?
I think a priority queue could be a good solution, but I have no idea how to modify it to return the highest-priority object in the given range.
Starting with a binary search tree, add two fields to each node: the priority, and (a pointer to) the highest-priority node in the subtree (see CLRS Chapter 14 for more on how to carry out this augmentation).
Now, to do a range query, start the search normally until the current node's key lies in the range. Examine that node, then search its left and right subtrees using symmetric variants of the following procedure; together these steps identify O(log n) candidates that are guaranteed to include the highest-priority node in range. The procedure below handles the left subtree.
If the (sub)root is in range, consider it for highest priority, along with the highest-priority node in its right subtree (cached in the node), and continue with the left subtree. If the (sub)root is not in range, continue with the right subtree.
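A minimal Python sketch of this augmented-BST query. For brevity it uses a plain unbalanced BST; a real implementation would use a balanced tree and also fix the cached maximum during rotations. All names here are my own, not from any library:

```python
import math

class Node:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.best = priority              # max priority in this subtree (cached)
        self.left = self.right = None

def insert(node, key, priority):
    if node is None:
        return Node(key, priority)
    if key < node.key:
        node.left = insert(node.left, key, priority)
    else:
        node.right = insert(node.right, key, priority)
    node.best = max(node.best, priority)  # keep the cache up to date
    return node

def best_in(node):
    return node.best if node else -math.inf

def left_arm(node, x):
    # max priority among keys >= x, in a subtree whose keys are all <= y
    best = -math.inf
    while node:
        if node.key >= x:
            best = max(best, node.priority, best_in(node.right))
            node = node.left
        else:
            node = node.right
    return best

def right_arm(node, y):
    # mirror image: max priority among keys <= y
    best = -math.inf
    while node:
        if node.key <= y:
            best = max(best, node.priority, best_in(node.left))
            node = node.right
        else:
            node = node.left
    return best

def range_max(node, x, y):
    # descend to the split node whose key lies in [x, y], then fan out
    while node:
        if node.key < x:
            node = node.right
        elif node.key > y:
            node = node.left
        else:
            return max(node.priority,
                       left_arm(node.left, x),
                       right_arm(node.right, y))
    return -math.inf                       # no key in range

root = None
for key, prio in [(0, 3), (5, 7), (2, 9), (8, 1), (4, 5)]:
    root = insert(root, key, prio)
```

With the sample objects above, `range_max(root, 1, 4)` returns 9 (the priority of key 2).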
This problem is known as RMQ (Range Minimum/Maximum Query).
The best data structure to use in your case is a segment tree.
It is a hard structure to get right the first time, but keep trying: it solves exactly your problem.
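For illustration, here is one way to sketch that segment tree in Python: the keys are coordinate-compressed, each leaf stores the maximum priority seen for its key, and a range query returns the maximum over [x, y]. The class and method names are my own:

```python
import bisect

class SegTreeMax:
    # iterative max segment tree over the compressed key space
    def __init__(self, keys):
        self.keys = sorted(set(keys))
        self.n = len(self.keys)
        self.t = [float('-inf')] * (2 * self.n)

    def update(self, key, priority):
        # record one (key, priority) pair; key must be from the initial key set
        i = bisect.bisect_left(self.keys, key) + self.n
        self.t[i] = max(self.t[i], priority)
        while i > 1:
            i //= 2
            self.t[i] = max(self.t[2 * i], self.t[2 * i + 1])

    def query(self, x, y):
        # max priority among objects with x <= key <= y
        lo = bisect.bisect_left(self.keys, x) + self.n
        hi = bisect.bisect_right(self.keys, y) + self.n   # exclusive bound
        best = float('-inf')
        while lo < hi:
            if lo & 1:
                best = max(best, self.t[lo]); lo += 1
            if hi & 1:
                hi -= 1; best = max(best, self.t[hi])
            lo //= 2; hi //= 2
        return best

st = SegTreeMax([0, 5, 2, 8, 4])
for k, p in [(0, 3), (5, 7), (2, 9), (8, 1), (4, 5)]:
    st.update(k, p)
```

Both `update` and `query` run in O(log n), so this matches the augmented-BST approach in asymptotic cost.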
Given k AVL trees (T1, T2, ..., Tk).
All the trees together contain n different numbers (T1 nodes + T2 nodes + ... + Tk nodes = n).
I want to sort all of them together and print the numbers from the smallest to the biggest as efficiently as possible.
Any help?
On each AVL tree define a cursor (generator, iterator, ...) that will visit the nodes in-order.
Create a priority queue (like a binary min-heap) with k entries -- one for each AVL tree -- keyed by the value of that tree's current (cursor) node.
Then repeat as long as there are entries in the priority queue:
Pull the first entry from the priority queue. This gives you a tree with a current node that has the least value (compared to all other "current" nodes).
Output that current node's value.
Forward the tree's cursor to the next node (according to in-order traversal).
If this succeeds (i.e. there is a next node) then push this tree back on the priority queue with its new key (i.e. the value of its now current node). Otherwise, skip this step.
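The steps above can be sketched with Python's standard `heapq` module. Plain sorted lists stand in here for the AVL trees' in-order cursors (any iterator that yields a tree's keys in order works the same way); values are assumed comparable and non-None:

```python
import heapq

def merge_k_sorted(iterables):
    # one heap entry per source: (current value, source index, iterator);
    # the index breaks ties so iterators are never compared
    heap = []
    for i, it in enumerate(map(iter, iterables)):
        first = next(it, None)
        if first is not None:
            heap.append((first, i, it))
    heapq.heapify(heap)
    while heap:
        value, i, it = heap[0]
        yield value                       # smallest "current" value overall
        nxt = next(it, None)
        if nxt is None:
            heapq.heappop(heap)           # this source is exhausted
        else:
            heapq.heapreplace(heap, (nxt, i, it))
```

Each of the n outputs costs one O(log k) heap operation, so the whole merge is O(n log k). (The standard library's `heapq.merge` does essentially this.)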
This question was taken from a bigger one while preparing for a job interview (I solved the rest of it successfully).
Question: Suggest DataStructure to handle boxes where a box has: special ID, weight and size.
I want to save those boxes in an AVL tree, and being able to solve the following problem:
From all boxes whose size is at most v (in other words: size <= v), I want to find the heaviest one.
How can I do this in O(log n), where n is the total number of saved boxes?
I know that the solution would be saving some extra data in each node, but I'm not sure which data would be helpful (no need to explain how to maintain the data under rotations etc.).
Example of extra data saved in each node: Id of heaviest box in right sub-tree.
It sounds like you're already on the right track: each node stores its heaviest descendant. The only missing piece is coming up with a set of log(n) nodes such that the target node is the descendant of one of them.
In other words, you need to identify all the subtrees of the AVL tree which consist entirely of nodes whose size is less than (i.e. are to the left of) your size=v node.
Which ones are those? Well, for one thing, the left child of your size=v node, of course. Then, walk from that node up to the root: whenever a node on the path is the right child of its parent, the parent and the parent's left subtree lie entirely to the left, so consider the parent itself and the root of its left subtree. The set of subtree roots you collect along the way covers exactly the nodes to the left of the size=v node.
As a simplification, you can combine the upper-bound size search with the collection of candidates in a single walk down from the root. Whenever the current node satisfies the size constraint, both the node itself and its entire left subtree are valid, so compare the best candidate so far against the node and against the cached heaviest descendant of its left subtree, then continue into the right subtree (the only place a heavier valid box might still hide). Otherwise continue into the left subtree.

best = null
x = root
while x is not null:
    if x.size <= v:
        best = heaviest of (best, x, cached heaviest descendant of x.left)
        x = x.right    # only the right subtree may still hold heavier valid boxes
    else:
        x = x.left     # x and its whole right subtree exceed size v
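A runnable Python sketch of this idea. For brevity it uses a plain (unbalanced) BST keyed on size and caches a pointer to the heaviest descendant in each node; a real AVL tree would additionally rebalance and repair the cache during rotations. All names are my own:

```python
class BoxNode:
    def __init__(self, box_id, size, weight):
        self.box_id, self.size, self.weight = box_id, size, weight
        self.max_node = self              # heaviest box in this subtree (cached)
        self.left = self.right = None

def insert(node, new):
    # plain BST insert keyed on size; an AVL tree would also rebalance here
    if node is None:
        return new
    if new.size < node.size:
        node.left = insert(node.left, new)
    else:
        node.right = insert(node.right, new)
    if new.weight > node.max_node.weight:
        node.max_node = new               # keep the cached pointer up to date
    return node

def heaviest_at_most(root, v):
    # heaviest box among those with size <= v, in O(height) time
    best = None
    x = root
    while x is not None:
        if x.size <= v:
            # x and its whole left subtree satisfy the size constraint
            for cand in (x, x.left.max_node if x.left else None):
                if cand and (best is None or cand.weight > best.weight):
                    best = cand
            x = x.right
        else:
            x = x.left
    return best

root = None
for box_id, size, weight in [(1, 5, 10), (2, 3, 40), (3, 8, 25),
                             (4, 2, 15), (5, 7, 30)]:
    root = insert(root, BoxNode(box_id, size, weight))
```

Caching a pointer (rather than just an ID) keeps each step of the query O(1), so the whole query is O(log n) on a balanced tree.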
In these slides (slide 13) the deletion of a point in a kd-tree is described: it states that the left subtree can be swapped to become the right subtree if the deleted node has only a left subtree. Then the minimum can be found and recursively deleted (just as with a right subtree).
This is because kd-trees with equal keys for the current dimensions should be on the right.
My question: Why does the equal-key point have to be the right child of the parent point? Also, what happens if my kd-tree algorithm returns a tree with an equal-key point on the left?
For example:
Assume the dataset (7,2), (7,4), (9,6)
The resulting kd-tree would be (sorted with respect to one axis):
(7,2)
/ \
(7,4) (9,6)
Another source that states the same theory is this one (paragraph above Example 15.4.3)
Note that we can replace the node to be deleted with the least-valued node from the right subtree only if the right subtree exists. If it does not, then a suitable replacement must be found in the left subtree. Unfortunately, it is not satisfactory to replace N's record with the record having the greatest value for the discriminator in the left subtree, because this new value might be duplicated. If so, then we would have equal values for the discriminator in N's left subtree, which violates the ordering rules for the kd tree. Fortunately, there is a simple solution to the problem. We first move the left subtree of node N to become the right subtree (i.e., we simply swap the values of N's left and right child pointers). At this point, we proceed with the normal deletion process, replacing the record of N to be deleted with the record containing the least value of the discriminator from what is now N's right subtree.
Both refer to nodes that only have a left subtree but why would this be any different?
Thanks!
There is no hard and fast rule that equal keys must go on the right; you can put them on the left instead.
But if you do, you will also need to update your search and delete algorithms accordingly.
Have a look at these links:
https://www.geeksforgeeks.org/k-dimensional-tree/
https://www.geeksforgeeks.org/k-dimensional-tree-set-3-delete/
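For illustration, here is a minimal Python sketch of kd-tree deletion using the left-to-right swap trick from the quoted passage (2-D points, equal keys inserted to the right). This is a simplified sketch under those conventions, not production code:

```python
K = 2  # number of dimensions

class Node:
    def __init__(self, point):
        self.point = point
        self.left = self.right = None

def insert(node, point, depth=0):
    if node is None:
        return Node(point)
    cd = depth % K
    if point[cd] < node.point[cd]:
        node.left = insert(node.left, point, depth + 1)
    else:                                   # equal keys go to the right
        node.right = insert(node.right, point, depth + 1)
    return node

def find_min(node, dim, depth=0):
    # point with the smallest coordinate in dimension `dim` in this subtree
    if node is None:
        return None
    cd = depth % K
    candidates = [node.point, find_min(node.left, dim, depth + 1)]
    if cd != dim:                           # only then can the min lie on the right
        candidates.append(find_min(node.right, dim, depth + 1))
    return min((p for p in candidates if p is not None), key=lambda p: p[dim])

def delete(node, point, depth=0):
    if node is None:
        return None
    cd = depth % K
    if node.point == point:
        if node.right is None:              # the trick: swap left subtree to right
            node.right, node.left = node.left, None
        if node.right is None:              # it was a leaf: just drop it
            return None
        node.point = find_min(node.right, cd, depth + 1)
        node.right = delete(node.right, node.point, depth + 1)
    elif point[cd] < node.point[cd]:
        node.left = delete(node.left, point, depth + 1)
    else:
        node.right = delete(node.right, point, depth + 1)
    return node

def points(node):                           # dump all points, for checking
    return [] if node is None else points(node.left) + [node.point] + points(node.right)

root = None
for p in [(7, 2), (7, 4), (9, 6)]:
    root = insert(root, p)
root = delete(root, (7, 2))
```

Because the replacement is the minimum of the (possibly swapped-in) right subtree, no remaining key in that subtree can be smaller than the new discriminator value, so the equal-keys-go-right invariant is preserved.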
We are given a tree with n nodes in form of a pointer to its root node, where each node contains a pointer to its parent, left child and right child, and also a key which is an integer. For each node v I want to add additional field v.bigger which should contain number of nodes with key bigger than v.key, that are in a subtree rooted at v. Adding such a field to all nodes of a tree should take O(n log n) time in total.
I'm looking for any hints that would allow me to solve this problem. I tried several heuristics - for example, when thinking about doing this problem in a bottom-up manner, for a fixed node v, v.left and v.right could provide v with some kind of set (a balanced BST?) supporting an operation bigger(x), which for a given x returns the number of elements bigger than x in that set in logarithmic time. The problem is, we would need to merge such sets in O(log n), so this seems like a no-go, as I don't know of any ordered-set-like data structure that supports quick merging.
I also thought about a top-down approach - a node v adds one to u.bigger for some node u if and only if u lies on the simple path from v to the root and u.key < v.key. So v could update all such u's somehow, but I couldn't come up with any reasonable way of doing that...
So, what is the right way of thinking about this problem?
Perform a depth-first search in the given tree (starting from the root node).
When any node is visited for the first time (coming from its parent), add its key to some order-statistics data structure (OSDS). At the same time, query the OSDS for the number of keys larger than the current key and initialize v.bigger with the negated result of this query.
When any node is visited for the last time (coming from its right child), query the OSDS for the number of keys larger than the current key and add the result to v.bigger.
You can apply this algorithm to any rooted tree (not necessarily binary). And it does not actually need parent pointers (you could use the DFS stack instead).
For the OSDS you could use either an augmented BST or a Fenwick tree. In the case of a Fenwick tree you need to preprocess the given tree so that the key values are compressed: just copy all the keys to an array, sort it, remove duplicates, then replace each key by its index in this array.
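A possible Python implementation of this approach, using a Fenwick tree over coordinate-compressed keys and an explicit stack with pre/post events instead of parent pointers. The names are my own; the difference between the post-visit and pre-visit counts is exactly the number of larger keys inserted while v's subtree was being traversed:

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.bigger = 0

def count_bigger(root):
    # 1. collect and coordinate-compress all keys
    keys, stack = [], [root]
    while stack:
        v = stack.pop()
        keys.append(v.key)
        stack.extend(c for c in (v.left, v.right) if c)
    rank = {k: i + 1 for i, k in enumerate(sorted(set(keys)))}
    m = len(rank)

    fen = [0] * (m + 1)                     # Fenwick tree over compressed ranks
    def add(i):
        while i <= m:
            fen[i] += 1
            i += i & -i
    def prefix(i):                          # how many inserted keys have rank <= i
        s = 0
        while i > 0:
            s += fen[i]
            i -= i & -i
        return s

    # 2. DFS with explicit pre/post events (no parent pointers needed)
    total = 0                               # keys inserted so far
    stack = [(root, False)]
    while stack:
        v, seen = stack.pop()
        if not seen:
            v.bigger = -(total - prefix(rank[v.key]))  # pre-visit: negated count
            add(rank[v.key])
            total += 1
            stack.append((v, True))
            stack.extend((c, False) for c in (v.left, v.right) if c)
        else:
            v.bigger += total - prefix(rank[v.key])    # post-visit: final count

root = Node(3, Node(5, Node(1), Node(4)), Node(2))
count_bigger(root)
```

Each node causes O(1) Fenwick operations of O(log n) each, for O(n log n) total.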
Basic idea:
Using the bottom-up approach, each node gets a sorted list of the values in each of its sons' subtrees, finds how many of those values are bigger than its own, and then passes the combined sorted list upwards.
Details:
Leaves:
Leaves obviously have v.bigger = 0. A node just above the leaves merges its children's values into a short sorted list, computes its own v.bigger from that list, and then adds its own value to the list before passing it up.
All other nodes:
Get both lists from the sons and merge them in an ordered way. Since they are already sorted, this is O(number of nodes in the subtree). During the merge you can also count how many values are bigger than the node's own key, which gives the node's v.bigger.
Why is this O(n log n)?
Every node does work proportional to the number of nodes in its subtree: the root touches all n nodes, the root's sons together touch n - 1 nodes, and in general all nodes at the same height together touch at most all the nodes below them. The total number of values touched is therefore at most (number of nodes) * (height of the tree), i.e. O(n * h), which is O(n log n) when the tree is balanced.
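The bottom-up merge can be sketched in Python like this (recursive, so it assumes the tree fits the recursion stack; the names are my own):

```python
import bisect

class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.bigger = 0

def annotate(node):
    # sets node.bigger for the whole subtree and
    # returns a sorted list of all keys in it
    if node is None:
        return []
    left, right = annotate(node.left), annotate(node.right)
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):    # ordinary sorted-list merge
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    merged.extend(left[i:]); merged.extend(right[j:])
    # keys in the subtree strictly bigger than this node's key
    node.bigger = len(merged) - bisect.bisect_right(merged, node.key)
    bisect.insort(merged, node.key)            # include the node itself
    return merged

root = Node(3, Node(5, Node(1), Node(4)), Node(2))
all_keys = annotate(root)
```

The merge at each node dominates the cost, giving the O(n * h) total discussed above.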
What if, for each node, we keep a separate binary search tree (BST) consisting of the nodes of the subtree rooted at that node?
For a node v at level k of a balanced tree, merging the two subtrees v.left and v.right, which each contain O(n/2^(k+1)) elements, takes O(n/2^k) time. After forming the BST for this node, we can find v.bigger in O(n/2^(k+1)) time by counting the elements in the right (in the traditional sense) subtree of the BST. Summing up, we have O(3n/2^(k+1)) operations for a single node at level k. There are 2^k nodes at level k, so level k costs O(2^k * 3n/2^(k+1)), which simplifies to O(n) after dropping the constant. There are O(log n) levels, hence O(n log n) operations in total.
Problem: Given a rooted tree T containing N nodes, numbered from 1 to N, with node 1 being the root. Each node also contains some value. We have to support three kinds of queries on the given tree.
Query 1::
Given a node nd, you have to find the sum of the values of all the nodes of the subtree rooted at nd and print the answer.
Query 2::
Given a node nd, you have to delete the subtree rooted at nd, completely (including node nd).
Query 3::
Given a node nd and some integer v, you have to add a child to node nd having value equal to v.
Constraints: N will be of the order of 100000, and the total number of queries will also be of the order of 100000. So I can't do a DFS traversal every time.
My idea: My solution is offline. I would first find all the nodes that are added to the tree at least once and build the corresponding tree. Then I would do a pre-order traversal of the tree and convert it into an array, in which every subtree appears as a contiguous range. Then I can use a segment tree to solve the problem. This algorithm is O(Q log N), where Q is the total number of queries. However, I am looking for an efficient "online" solution: I have to perform each query as soon as it is asked, and cannot store all the queries first and then perform them one by one.
Any help is appreciated a lot!
Thanks.
Assuming the tree is balanced, two extra fields in every node let you solve it in O(Q log N).
With every node, maintain a field sum equal to the sum of the values of the nodes in the subtree rooted at it (excluding the node itself), and maintain a parent pointer as well.
With these two fields, query one reduces to returning sum plus the value at the node (O(1)). Query two reduces to subtracting that same quantity from the sum of every ancestor of the node, up to the root, and detaching the node (O(log n)). Query three reduces to creating the child and adding v to the sum of every ancestor of it, up to the root (O(log n)).
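A minimal Python sketch of this scheme, with a sum field that excludes the node's own value and an explicit parent pointer (field names are my own; per-query cost is O(depth), so O(log n) only under the balance assumption above):

```python
class TreeNode:
    def __init__(self, value, parent=None):
        self.value = value
        self.parent = parent
        self.children = []
        self.subtree_sum = 0          # sum of descendants' values, excluding self

def query_sum(nd):                    # Query 1: O(1)
    return nd.subtree_sum + nd.value

def add_child(nd, v):                 # Query 3: O(depth)
    child = TreeNode(v, parent=nd)
    nd.children.append(child)
    p = nd
    while p is not None:              # propagate the new value to all ancestors
        p.subtree_sum += v
        p = p.parent
    return child

def delete_subtree(nd):               # Query 2: O(depth)
    removed = query_sum(nd)           # total value being removed
    p = nd.parent
    if p is not None:
        p.children.remove(nd)         # O(#children); O(1) with a linked list
    while p is not None:              # subtract it from all ancestors
        p.subtree_sum -= removed
        p = p.parent

root = TreeNode(10)
a = add_child(root, 5)
b = add_child(root, 7)
c = add_child(a, 3)
```

After the setup above, deleting the subtree rooted at `a` removes values 5 and 3 from every ancestor's cached sum.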