I'm trying to solve an exercise that turns out to be a little difficult: I have to implement a priority queue starting from a template class for a tree node (the kind used in a red-black or binary search tree).
Using the template, which looks like:

    class Node
        int key
        Node left
        Node right
        Node parent
        int leftNodes
        int rightNodes
Initially, when inserting a new element, I tried to completely fill a level of the tree, then fill an array using an in-order tree traversal, generate a binary search tree from that array, and replace the original root with the new one, expecting the result to be a balanced tree.
Unfortunately, this approach turns out to be inappropriate, since the tree must emulate the max-heap property while staying balanced on every insertion/deletion (and my code didn't even fill a tree level completely). Is it possible to implement a tree with heap capabilities? I mean a tree in which each element is greater than or equal to its children, which remains balanced after insertion, and which rebalances itself when the root node (the element with the largest key) is deleted?
You probably want to implement a binary heap; see http://en.wikipedia.org/wiki/Binary_heap
IIRC, one of the main advantages of this data structure is that it can be embedded in an array (because the tree is always complete). Heapsort uses this kind of data structure to sort in place.
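As a rough sketch of that array embedding, here is a minimal fixed-capacity max-heap in C (the names `max_heap`, `heap_push`, and `heap_pop` are mine, not from any library; capacity and empty-heap checks are omitted for brevity):

```c
/* A minimal array-embedded binary max-heap sketch.
 * Children of index i live at 2*i+1 and 2*i+2 (0-based),
 * so no pointers are needed between nodes. */

#define HEAP_CAP 64

typedef struct {
    int data[HEAP_CAP];
    int size;
} max_heap;

static void swap_ints(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Insert: append at the end, then sift up while larger than the parent. */
void heap_push(max_heap *h, int key) {
    int i = h->size++;
    h->data[i] = key;
    while (i > 0 && h->data[(i - 1) / 2] < h->data[i]) {
        swap_ints(&h->data[(i - 1) / 2], &h->data[i]);
        i = (i - 1) / 2;
    }
}

/* Pop the maximum: move the last element to the root, then sift down. */
int heap_pop(max_heap *h) {
    int top = h->data[0];
    h->data[0] = h->data[--h->size];
    int i = 0;
    for (;;) {
        int l = 2 * i + 1, r = 2 * i + 2, best = i;
        if (l < h->size && h->data[l] > h->data[best]) best = l;
        if (r < h->size && h->data[r] > h->data[best]) best = r;
        if (best == i) break;
        swap_ints(&h->data[i], &h->data[best]);
        i = best;
    }
    return top;
}
```

Popping repeatedly yields the keys in descending order, which is exactly the in-place pass heapsort builds on.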
Related
Is there a balanced BST structure that also keeps track of subtree size in each node?
In Java, TreeMap is a red-black tree, but doesn't provide subtree size in each node.
Previously, I wrote a BST that kept track of the subtree size of each node, but it wasn't balanced.
The questions are:
Is it possible to implement such a tree while keeping the basic operations at O(lg(n))?
If yes, are there any third-party libraries that provide such an implementation?
A Java implementation would be great, but other languages (e.g. C, Go) would also be helpful.
BTW:
The subtree size should be tracked in each node,
so that the size can be obtained without traversing the subtree.
Possible application:
Keeping track of the rank of items whose value (which the rank depends on) might change on the fly.
The Weight Balanced Tree (also called the Adams Tree, or Bounded Balance tree) keeps the subtree size in each node.
This also makes it possible to find the Nth element, from the start or end, in log(n) time.
My implementation in Nim is on GitHub. It has these properties:
Generic (parameterized) key,value map
Insert (add), lookup (get), and delete (del) in O(log(N)) time
Key-ordered iterators (inorder and revorder)
Lookup by relative position from beginning or end (getNth) in O(log(N)) time
Get the position (rank) by key in O(log(N)) time
Efficient set operations using tree keys
Map extensions to set operations with optional value merge control for duplicates
There are also implementations in Scheme and Haskell available.
That's called an "order statistic tree": https://en.wikipedia.org/wiki/Order_statistic_tree
It's pretty easy to add the size to any kind of balanced binary tree (red-black, AVL, B-tree, etc.), or you can use a balancing algorithm that works with the size directly, like weight-balanced trees (see Doug Currie's answer) or (better) size-balanced trees: https://cs.wmich.edu/gupta/teaching/cs4310/lectureNotes_cs4310/Size%20Balanced%20Tree%20-%20PEGWiki%20sourceMayNotBeFullyAuthentic%20but%20description%20ok.pdf
Unfortunately, I don't think there are any standard-library implementations, but you can find open-source ones if you look. You may want to roll your own.
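To illustrate the size augmentation, here is a minimal sketch in C (the names `os_node`, `os_insert`, and `os_select` are illustrative; balancing is intentionally omitted, since rotations in a red-black or AVL tree only need a constant-time size fixup on the affected nodes):

```c
#include <stdlib.h>

/* Sketch of size augmentation for an order statistic tree.
 * Each node stores the size of its subtree, so rank/select
 * queries run in O(height) without traversing the subtree. */

typedef struct os_node {
    int key;
    int size;                     /* nodes in this subtree, including self */
    struct os_node *left, *right;
} os_node;

static int os_size(const os_node *n) { return n ? n->size : 0; }

/* Plain BST insert that maintains the size field on the way down. */
os_node *os_insert(os_node *n, int key) {
    if (!n) {
        n = calloc(1, sizeof *n);
        n->key = key;
        n->size = 1;
        return n;
    }
    n->size++;
    if (key < n->key) n->left = os_insert(n->left, key);
    else              n->right = os_insert(n->right, key);
    return n;
}

/* Select the k-th smallest key (1-based) in O(height) time. */
int os_select(const os_node *n, int k) {
    int r = os_size(n->left) + 1;   /* rank of this node within its subtree */
    if (k == r) return n->key;
    return (k < r) ? os_select(n->left, k)
                   : os_select(n->right, k - r);
}
```

The symmetric rank query (position of a given key) follows the same pattern, accumulating `os_size(n->left) + 1` on every step to the right.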
I've been playing with Binomial Heaps, and I encountered a problem I'd like to discuss here, because I think it may concern some user implementing a Binomial Heap data structure.
My aim is to have pointers to nodes, or handles, that point directly to the Binomial Heap's internal nodes, which in turn contain my priority value (usually an integer), when I insert them into a Binomial Heap.
This way I keep a pointer/handle to what I have inserted and, if need be, I can delete the value directly using binomial_heap_delete_node(node), just like an iterator works.
As we'll see, this is NOT possible with Binomial Heaps, because of the architecture of this data structure.
The main problem with Binomial Heaps is that at some point you'll need an operation binomial_heap_swap_parent_and_child(parent, child), and you'll need it in both binomial_heap_decrease_key(node, key) and binomial_heap_delete_node(node). The purpose of these operations is quite clear from their names.
So, the problem is how binomial_heap_swap_parent_and_child(parent, child) works: in all the implementations I have seen, it swaps the priority values between the nodes, NOT the nodes themselves.
This invalidates all of your pointers/handles/iterators to nodes: they will still point to valid nodes, but those nodes will no longer hold the priority value you inserted, but a different one.
And this is quite logical if we look at how Binomial Heaps (or Binomial Trees in general) are structured: a parent node is treated by many children as "the parent", so many children point to it, but the parent node doesn't know how many children (or, more importantly, which children) point to it, so it is impossible to swap the position of a node this way. Your only choice is to swap the integer priority keys, but that invalidates all pointers/handles/iterators to the nodes.
NOTE: One might think a workaround would be, instead of using binomial_heap_delete_node(node), to set the priority of the node to be removed to some minimum value (e.g. -999999999) and pop the minimum node out. This does not solve the problem, since binomial_heap_decrease_key(node, key) still needs the parent-child swap operation, whose only implementation is to swap the integer priorities.
I want to know if someone has run into this problem before.
I think the only solution is to use another heap structure, such as a Binary Heap, a Pairing Heap, or something else.
As with many data structure problems, it's not hard to solve this one with an extra level of indirection. Define something like:
struct handle {
    struct heap_node *node; // points to node that "owns" this handle
    struct user_data *data; // priority key calculated from this
};
You provide users with handles (either by copy or pointer/reference; your choice).
Each internal node points to exactly one handle (i.e. node->handle->node == node). Two such pointers are swapped to exchange parent and child.
Several variations are possible. E.g., the data field could be the data itself rather than a pointer to it. The main idea is that the level of indirection between handles and nodes provides the necessary flexibility.
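A sketch of the resulting swap in C (the binomial-heap node fields `parent`, `child`, and `sibling` are hypothetical placeholders; only the handle wiring matters here):

```c
/* Handle indirection for swapping a parent and child in a heap
 * without invalidating user-held handles. */

struct heap_node;

struct handle {
    struct heap_node *node;   /* the node that currently "owns" this handle */
    int key;                  /* priority data kept with the handle */
};

struct heap_node {
    struct handle *handle;    /* invariant: node->handle->node == node */
    struct heap_node *parent, *child, *sibling;  /* illustrative links */
};

/* Exchange the payloads of two nodes: swap the handle pointers and
 * re-point each handle back at its new node. A handle the user kept
 * still reaches the key it was created for. */
void swap_parent_and_child(struct heap_node *parent, struct heap_node *child) {
    struct handle *tmp = parent->handle;
    parent->handle = child->handle;
    child->handle = tmp;
    parent->handle->node = parent;
    child->handle->node = child;
}
```

After the swap, both directions of the invariant `node->handle->node == node` still hold, which is exactly what keeps user handles valid.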
Which one is better memory-wise (RAM), a linked list or a tree?
A linked list is a linear structure, while a tree is a leveled structure (with child nodes).
Which one is better memory-wise, not search-wise?
Besides Damien's witty comment: what sort of tree? Binary? Red-black? Ternary? With a linked list of children for each node? Nodes referencing their parent or not?
Once you choose your data structure, you just look at the per-node overhead. For instance, a singly linked list node's overhead is one pointer to the next element. A simple binary tree node's overhead will typically be two pointers: one to each child. So there you go, simple as that: that particular list has half the overhead of that particular tree, considering only the data structure itself.
When comparing a linked list and a tree, memory is rarely the deciding factor, because the purposes of these two data structures are completely different. In terms of memory, a linked list can be compared to a vector (an array): because a vector stores items in adjacent memory, it does not need a pointer alongside each item, so a vector/array consumes less memory. A tree needs a collection of children in each node, where each item in that collection is a pointer to a child node. So a tree consumes at least as much memory as a linked list, because for each node except the root, a pointer to that node is stored in its parent.
So I see that trees are usually implemented as linked structures where each node is dynamically allocated and contains pointers to its two children.
But a heap is almost always implemented (or so textbooks recommend) using an array. Why is that? Is there some underlying assumption about the uses of these two data structures? For example, if you are implementing a priority queue using a min-heap, then the number of nodes in the queue is bounded, so it can be implemented using an array of fixed size. But when you are talking/teaching about a heap in general, why recommend implementing it using an array? Or, to flip the question a bit, why not recommend learning about trees with an array implementation?
(I assume by heap you mean binary heap; other heaps are almost always linked nodes.)
A binary heap is always a complete tree, and no operation on it moves whole subtrees around or otherwise alters the topology of the tree in any nontrivial way. This is not an assumption: the first fact is part of the definition of a heap, and the second is immediately obvious from the definition of the operations. Together, these two facts make the implicit array (Ahnentafel) layout a perfect fit, for two reasons.
First, since the Ahnentafel layout requires reserving space for every internal node (and all leaf nodes except the rightmost ones), an incomplete tree stored this way would waste space on nodes that don't exist. Conversely, for a complete tree it's the most efficient layout possible, since all space is actually used for node data and no space is needed for pointers.
Second, moving a subtree in the array would require copying all child elements to their new positions (since the left child's index is always twice the parent's index, the former changes whenever the latter changes, recursively down to the leaves). With nodes linked via pointers, you only need to move a few pointers around, regardless of how large the subtrees below those pointers are. Moving subtrees is a core component of many tree algorithms, including all kinds of binary search trees, and it needs to be lightning fast for those algorithms to be efficient. Binary heap operations, however, never need to do this, so it's a non-issue.
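For concreteness, the index arithmetic behind that layout, in a 0-based variant (the function names are illustrative):

```c
/* Index arithmetic for a 0-based array heap (Ahnentafel layout):
 * positions are computed, so no child or parent pointers are stored. */
static int parent_index(int i) { return (i - 1) / 2; }
static int left_index(int i)   { return 2 * i + 1; }
static int right_index(int i)  { return 2 * i + 2; }
```

Because a node's position encodes its place in the tree, "moving" a node means copying its data to another slot, which is why relocating whole subtrees is expensive in this layout.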
My problem is not with the structure used to hold the tree but with the way I am doing it, because I think this implementation will be costly in the long run.
I have a tree structure in which each node contains a list of references to its children. The problem is that finding a particular child of a node requires going through the node's list of children, which takes linear time. I also need to store all of these as immediate children (as the word "children" is used here for the immediate children).
Now, is there any way to store the children in something other than a list, so that retrieval and deletion of a child is efficient, ideally logarithmic?
If I traverse the tree, then to get from the root to the right child I have to check a condition on each child node. That check is a linear search.
I just want a technique that improves this search for the right child in the children list during traversal.
Instead of having each node keep a plain list, have it keep either a sorted list (O(log n) lookup) or a hash map (constant-time lookup). In this case a sorted list is probably best, so you can easily iterate over the elements in order and save space.
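A minimal sketch of the sorted-children approach in C (the `tnode` type and `find_child` name are illustrative): each node's children are kept in an array sorted by key, so a lookup is a binary search over the immediate children.

```c
#include <stddef.h>

/* A tree node whose children are kept sorted by key, so that a
 * child can be found by binary search in O(log c) time instead of
 * a linear scan (c = number of immediate children). */

typedef struct tnode {
    int key;
    struct tnode **children;  /* array of child pointers, sorted by key */
    int nchildren;
} tnode;

/* Binary search among the immediate children; NULL if absent. */
tnode *find_child(const tnode *n, int key) {
    int lo = 0, hi = n->nchildren - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (n->children[mid]->key == key) return n->children[mid];
        if (n->children[mid]->key < key) lo = mid + 1;
        else hi = mid - 1;
    }
    return NULL;
}
```

Insertion and deletion must keep the array sorted (a shift of at most c pointers), which is the price paid for the fast lookup and the ordered iteration.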