Why does Fibonacci heap keep a global node counter? - data-structures

I read that Fibonacci heap keeps a global node counter, but I can't see why. I even found an implementation that had the counter but didn't use it at all.

It's to make queries of the form "how many elements are in the heap?" take time O(1). Without caching this information, this query would take time O(n), since each tree would have to be traversed to count up how many nodes it contains. This is similar to why some linked list implementations keep a counter tracking the number of nodes.
Hope this helps!

Related

Delete and Increase key for Binomial heap

I currently studying the binomial heap right now.
I learned that following operations for the binomial heaps can be completed in Theta(log n) time.:
Get-max
Insert
Extract Max
Merge
Increase-Key
Delete
But, the two operations Increase key and Delete operations said they need the pointer to the element that need to be complete in Theta(log n).
Here is 3 questions I want to ask:
Is this because if Increase key and Delete don't have the pointer to element, they have to search the elements before the operations took place?
what is the time complexity for the searching operations for the binomial heap? (I believe O(n))
If the pointer to the element is not given for Increase key and Delete operations, those two operations will take O(n) time or it can be lower than that.
It’s good that you’re thinking about this!
Yes, that’s exactly right. The nodes in a binomial heap are organized in a way that makes it very quick to find the minimum value, but the relative ordering of the remaining elements is not guaranteed to be in an order that makes it easy to find things.
There isn’t a general way to search a binomial heap for an element faster than O(n). Or, stated differently, the worst-case cost of any way of searching a binomial heap is Ω(n). Here’s one way to see this. Form a binomial heap where n-1 items have priority 137 and one item has priority 42. The item with priority 42 must be a leaf node. There are (roughly) n/2 leaves in the heap, and since there is no ordering on them to find that one item you’d have to potentially look at all the leaves. To formalize this, you could form multiple different binomial heaps with these items, and whatever algorithm was looking for the item of priority 42 would necessarily have to find it in the last place it looks at least once.
For the reasons given above, no, there’s no way to implement those operations quickly without having pointers to them, since in the worst case you have to search everywhere.

Why not use an array of pointers to optimize linked list but use skip list?

Recently I was learning about the skip list, and I've learned that it is designed to speed up the lookup of linked list. But I am wondering why not use a data structure which adds an array of pointers of all nodes based on the structure of the linked list ? For a list of 2^n nodes, if each levels we have half of the number of pointers of the lower level we will add 2^n-1 ponters, it's almost the same number of adding a list of pointers of each nodes, and at the same time it's O(1) to access by index.
There must be some reasons not to implement my idea, can anyone tell me?
A skip list offers average performance of O(log(n)) to read, insert, update and delete the element at any index.
Your array of pointers idea offers average performance of O(1), O(n), O(1) and O(n) for the same operations. Admittedly with a good constant on the O(n) operations, but still that is how it scales.
Both data structures are used in practice. They are just used for different situations.

Best algorithm/data structure for a continually updated priority queue

I need to frequently find the minimum value object in a set that's being continually updated. I need to have a priority queue type of functionality. What's the best algorithm or data structure to do this? I was thinking of having a sorted tree/heap, and every time the value of an object is updated, I can remove the object, and re-insert it into the tree/heap. Is there a better way to accomplish this?
A binary heap is hard to beat for simplicity, but it has the disadvantage that decrease-key takes O(n) time. I know, the standard references say that it's O(log n), but first you have to find the item. That's O(n) for a standard binary heap.
By the way, if you do decide to use a binary heap, changing an item's priority doesn't require a remove and re-insert. You can change the item's priority in-place and then either bubble it up or sift it down as required.
If the performance of decrease-key is important, a good alternative is a pairing heap, which is theoretically slower than a Fibonacci heap, but is much easier to implement and in practice is faster than the Fibonacci heap due to lower constant factors. In practice, pairing heap compares favorably with binary heap, and outperforms binary heap if you do a lot of decrease-key operations.
You could also marry a binary heap and a dictionary or hash map, and keep the dictionary updated with the position of the item in the heap. This gives you faster decrease-key at the cost of more memory and increased constant factors for the other operations.
Quoting Wikipedia:
To improve performance, priority queues typically use a heap as their
backbone, giving O(log n) performance for inserts and removals, and
O(n) to build initially. Alternatively, when a self-balancing binary
search tree is used, insertion and removal also take O(log n) time,
although building trees from existing sequences of elements takes O(n
log n) time; this is typical where one might already have access to
these data structures, such as with third-party or standard libraries.
If you are looking for a better way, there must be something special about the objects in your priority queue. For example, if the keys are numbers from 1 to 10, a countsort-based approach may outperform the usual ones.
If your application looks anything like repeatedly choosing the next scheduled event in a discrete event simulation, you might consider the options listed in e.g. http://en.wikipedia.org/wiki/Discrete_event_simulation and http://www.acm-sigsim-mskr.org/Courseware/Fujimoto/Slides/FujimotoSlides-03-FutureEventList.pdf. The later summarizes results from different implementations in this domain, including many of the options considered in other comments and answers - and a search will find a number of papers in this area. Priority queue overhead really does make some difference in how many times real time you can get your simulation to run - and if you wish to simulate something that takes weeks of real time this can be important.

Can we use binary search tree to simulate heap operation?

I was wondering if we can use a binary search tree to simulate heap operations (insert, find minimum, delete minimum), i.e., use a BST for doing the same job?
Are there any kind of benefits for doing so?
Sure we can. but with a balanced BST.
The minimum is the leftest element. The maximum is the rightest element. finding those elements is O(logn) each, and can be cached on each insert/delete, after the data structure was modified [note there is room for optimizations here, but this naive approach also doesn't contradict complexity requirement!]
This way you get insert,delete: O(logn), findMin/findMax: O(1)
EDIT:
The only advantage I can think of in this implementtion is that you get both findMin,findMax in one data structure.
However, this solution will be much slower [more ops per step, more cache misses are expected...] and consume more space then the regular array-based implementation of a heap.
Yes, but you lose the O(1) average insert of the heap
As others mentioned, you can use a BST to simulate a heap.
However this has one major downside: you lose the O(1) insert average time, which is basically the only reason to use the heap in the first place: https://stackoverflow.com/a/29548834/895245
If you want to track both min and max on a heap, I recommend that you do it with two heaps instead of a BST to keep the O(1) insert advantage.
Yes, we can, by simply inserting and finding the minimum into the BST. There are few benefits, however, since a lookup will take O(log n) time and other functions receive similar penalties due to the stricter ordering enforced throughout the tree.
Basically, I agree with #amit answer. I will elaborate more on the implementation of this modified BST.
Heap can do findMin or findMax in O(1) but not both in the same data structure. With a slight modification, the BST can do both findMin and findMax in O(1).
In this modified BST, you keep track of the the min node and max node every time you do an operation that can potentially modify the data structure. For example in insert operation you can check if the min value is larger than the newly inserted value, then assign the min value to the newly added node. The same technique can be applied on the max value. Hence, this BST contain these information which you can retrieve them in O(1). (same as binary heap)
In this BST (specifically Balanced BST), when you pop min or pop max, the next min value to be assigned is the successor of the min node, whereas the next max value to be assigned is the predecessor of the max node. Thus it perform in O(1). Thanks to #JimMischel comment below however we need to re-balance the tree, thus it will still run O(log n). (same as binary heap)
In my opinion, generally Heap can be replaced by Balanced BST because BST perform better in almost all of the heap data structure can do. However, I am not sure if Heap should be considered as an obsolete data structure. (What do you think?)
PS: Have to cross reference to different questions: https://stackoverflow.com/a/27074221/764592

Which node data structure to use for a trie

I am using trie for the first time.I wanted to know which is the best data structure to use for a trie while deciding which is the next branch that one is supposed to traverse. I was looking among an array,a hashmap and a linked list.
Each of these options has their advantages and disadvantages.
If you store the child nodes in an array, then you can look up which child to visit extremely efficiently by just indexing into the array. However, the space usage per node will be high: O(|Σ|), where Σ is the set of letters that your words can be formed from, even if most of those children are null.
If you store the child nodes in a linked list, then the time required to find a child will be O(|Σ|), since you may need to scan across all of the nodes of the linked list to find the child you want. On the other hand, the space efficiency will be quite good, because you only store the children that you're using. You could also consider using a fixed-sized array here, which has even better space usage but leads to very expensive insertions and deletions.
If you store the child nodes in a hash table, then the (expected) time to find a child will be O(1) and the memory usage will only be proportional (roughly) to the number of children you have. Interestingly, because you know in advance what values you're going to be hashing, you could consider using a dynamic perfect hash table to ensure worst-case O(1) lookups, at the expense of some precomputation.
Another option would be to store the child nodes in a binary search tree. This gives rise to the ternary search tree data structure. This choice is somewhere between the linked list and hash table options - the space usage is low and you can perform predecessor and successor queries efficiently, but there's a slight increase in the cost of performing a lookup due to the search cost in the BST. If you have a static trie where insertions never occur, you can consider using weight-balanced trees as the BSTs at each point; this gives excellent runtime for searches (O(n + log k), where n is the length of the string to search for and k is the total number of words in the trie).
In short, the array lookups are fastest but its space usage in the worst case is the worst. A statically-sized array has the best memory usage but expensive insertions and deletions. The hash table has decently fast lookups and good memory usage (on average). Binary search trees are somewhere in the middle. I would suggest using the hash table here, though if you put a premium on space and don't care about lookup times the linked list might be better. Also, if your alphabet is small (say, you're making a binary trie), the array overhead won't be too bad and you may want to use that.
Hope this helps!
If you are trying to build trie just for alphabets, I would suggest to use array and then use particia tree (space optimized trie).
http://en.wikipedia.org/wiki/Radix_tree
This will allow you to do fast lookup with array and doesn't waste too much of space if branching factor of certain node is low.

Resources