Deque with a max api? - data-structures

Implementing a queue with a max() API so that push(), pop(), and max() all work in (amortized) O(1) is a known solved problem. Is there a known solution for implementing a double ended queue with the same max() API which is faster than O(n)? Can it be proven that it's impossible?

Yes, a deque with an O(1) max API is entirely possible (amortized O(1) for the push and pop operations, just as in the queue case).
A deque can be implemented from two stacks. There is some extra bookkeeping to keep the deque balanced (when one stack runs empty, roughly half of the other stack's elements are moved over), but the idea is fairly simple: imagine two stacks facing opposite directions, joined at their bottoms. With this structure you can push and pop at both ends.
It's also possible to build a stack with a constant-time get_max() or get_min(). Each time you push a value, push a pair (value, current_max) instead. The new current_max is computed in constant time by comparing the value being pushed with the current_max stored in the element below it. get_max() then simply returns the current_max stored at the top of the stack.
If you implement the deque from two stacks that each have the get_max() API, then to get the max of the deque you only have to call get_max() on both stacks and return the larger value (ignoring a stack that is empty).
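To make this concrete, here is a minimal Python sketch of the idea. The class and method names (MaxStack, MaxDeque, push_front, and so on) are made up for illustration, and the "move half of the elements over" rebalancing rule is one common choice, not the only one.

```python
class MaxStack:
    """Stack of (value, max_so_far) pairs, so get_max() is O(1)."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def push(self, value):
        # Compare the new value with the max stored just below it.
        current_max = value if not self._items else max(value, self._items[-1][1])
        self._items.append((value, current_max))

    def pop(self):
        return self._items.pop()[0]

    def get_max(self):
        return self._items[-1][1]


class MaxDeque:
    """Deque built from two MaxStacks joined at their bottoms."""

    def __init__(self):
        self._front = MaxStack()  # top of this stack is the front of the deque
        self._back = MaxStack()   # top of this stack is the back of the deque

    def __len__(self):
        return len(self._front) + len(self._back)

    def push_front(self, value):
        self._front.push(value)

    def push_back(self, value):
        self._back.push(value)

    def _refill(self, items, split):
        # items is in front-to-back order; the first `split` go to the front stack.
        for value in reversed(items[:split]):
            self._front.push(value)
        for value in items[split:]:
            self._back.push(value)

    def pop_front(self):
        if not len(self._front):
            if not len(self._back):
                raise IndexError("pop from an empty deque")
            # Drain the back stack (back-most first), then flip to front-to-back order.
            items = [self._back.pop() for _ in range(len(self._back))][::-1]
            self._refill(items, (len(items) + 1) // 2)
        return self._front.pop()

    def pop_back(self):
        if not len(self._back):
            if not len(self._front):
                raise IndexError("pop from an empty deque")
            # Draining the front stack already yields front-to-back order.
            items = [self._front.pop() for _ in range(len(self._front))]
            self._refill(items, len(items) // 2)
        return self._back.pop()

    def get_max(self):
        if not len(self):
            raise IndexError("max of an empty deque")
        return max(s.get_max() for s in (self._front, self._back) if len(s))


d = MaxDeque()
for v in (1, 3, 2):
    d.push_back(v)
print(d.get_max())    # 3
print(d.pop_front())  # 1
d.push_front(5)
print(d.get_max())    # 5
```

Because each rebalance moves only half of the elements, a long run of cheap operations has to happen before the next expensive one, which is where the amortized O(1) bound comes from.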

Related

Why would you implement a stack or queue using a linked list rather than an array or vector implementation?

So I know that implementing a stack or queue with a vector or an array has these properties:
- O(n) for searching
- With an array implementation at least, everything lives on the call stack rather than the heap
- O(1) peek at the top/front or back/bottom
And if the array's fixed capacity is an issue, you would implement the stack or queue with a vector instead. So why would anyone implement one of these data structures with a linked list? Any real-life examples would be great, along with the Big O of any basic operations that differ from the array/vector implementation.

For the queue, a linked list gives faster results when manipulating data in the middle of the queue (add/delete): O(1), provided you already hold a reference to the node at that position. With an array or vector it would be O(n), because you have to shift the other elements to make room for the new element or to close the gap left by the deleted one.
As far as the stack, I refer you to this answer: Linked list vs. dynamic array for implementing a stack

Special stack to find median

From this, we can design a SpecialStack data structure with a getMin() method that returns the minimum element in the SpecialStack.
My question is: how can I implement a getMed() method that returns the median element of the SpecialStack?
From this "Data Structure to find median" question, we know the best data structure is two heaps: the left one a max-heap, the right one a min-heap. For my question, however, that doesn't seem to work, because the most recently pushed element has to be tracked (so that it can be popped), and a heap can't do that. Am I right?
Edit: I do not know how to keep track of the position of the most recently pushed element in a heap.
Thanks a lot.

You could alternatively use an Order Statistic Tree.

You can use any balanced binary search tree here instead of a heap. It is easy to find the min or max element in a tree (the leftmost and rightmost nodes, respectively), and a tree also supports deleting an arbitrary element in O(log N).
So you can maintain a stack and two binary search trees (instead of two heaps). The push implementation is pretty straightforward. To pop an element, delete the top element of the stack from whichever tree it is stored in, rebalance the trees (as in the two-heaps algorithm), and then pop it from the stack.
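As a rough illustration of the "stack plus ordered structure" idea, here is a small Python sketch. Note the substitution: Python's standard library has no balanced BST, so this uses a single sorted list maintained with bisect instead of two trees. That makes push and pop O(n) rather than O(log n); a real balanced tree (or order statistic tree) would be needed to get the bounds described above. The names MedianStack and get_median are made up for the example.

```python
import bisect


class MedianStack:
    """Stack with a median query; a sorted list stands in for the balanced BST(s)."""

    def __init__(self):
        self._stack = []   # elements in push order
        self._sorted = []  # the same elements, kept in sorted order

    def push(self, value):
        self._stack.append(value)
        bisect.insort(self._sorted, value)  # O(n) here; O(log n) with a real tree

    def pop(self):
        value = self._stack.pop()
        # Delete one occurrence of the popped value from the ordered structure.
        index = bisect.bisect_left(self._sorted, value)
        del self._sorted[index]
        return value

    def get_median(self):
        if not self._sorted:
            raise IndexError("median of an empty stack")
        # Lower median when the number of elements is even.
        return self._sorted[(len(self._sorted) - 1) // 2]


s = MedianStack()
for v in (5, 1, 3):
    s.push(v)
print(s.get_median())  # 3
s.push(2)
print(s.get_median())  # 2 (lower median of 1, 2, 3, 5)
```

The key point carries over unchanged: the stack remembers which element was pushed last, and the ordered structure only needs to support "delete this value" and "find the middle".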

How to update key of a relaxed vertex in Dijkstra's algorithm?

Just like it was asked here,
I fail to understand how we can find the index of a relaxed vertex in the heap.
Programming style-wise, the heap is a black box that abstracts away the details of a priority queue. Now if we need to maintain a hash table that maps vertex keys to corresponding indices in the heap array, that would need to be done in heap implementation, right?
But most standard heaps don't provide a hash table that does such mapping.
Another way to deal with this whole problem is to add the relaxed vertices to the heap regardless of anything. When we extract the minimum we'll get the best one. To prevent the same vertex being extracted multiple times, we can mark it visited.
So my exact question is, what is the typical way (in the industry) of dealing with this problem?
What are the pros and cons compared with the methods I mentioned?

Typically, you'd need a specially-constructed priority queue that supports the decreaseKey operation in order to get this to work. I've seen this implemented by having the priority queue explicitly keep track of a hash table of the indices (if using a binary heap), or by having an intrusive priority queue where elements stored are nodes in the heap (if using a binomial heap or Fibonacci heap, for example). Sometimes, the priority queue's insertion operation will return a pointer to the node in the priority queue that holds the newly-added key. As an example, here is an implementation of a Fibonacci heap that supports decreaseKey. It works by having each insert operation return a pointer to the node in the Fibonacci heap, which makes it possible to look up the node in O(1), assuming you keep track of the returned pointers.
Hope this helps!

You are asking some very valid questions but unfortunately they are kind of vague so we won't be able to give you a 100% solid "industry standard" answer. However, I'll try to go over your points anyway:
"Programming style-wise, the heap is a black box that abstracts away the details of a priority queue"
Technically, a priority queue is the abstract interface (insert elements with a priority, extract the lowest-priority element) and a heap is a concrete implementation (array-based heap, binomial heap, Fibonacci heap, etc.).
What I'm trying to say is that using an array is only one particular way to implement a priority queue.
"Now if we need to maintain a hash table that maps vertex keys to corresponding indices in the heap array, that would need to be done in heap implementation, right?"
Yes, because every time you move an element inside the array you will need to update its index in the hash table.
"But most standard heaps don't provide a hash table that does such mapping."
Yes. This can be very annoying.
"Another way to deal with this whole problem is to add the relaxed vertices to the heap regardless of anything."
I guess that could work, but I don't think I have ever seen anyone do that. The whole point of using a heap here is to increase performance, and by adding redundant elements to the heap you kind of go against that. Sure, you preserve the "black-boxness" of the priority queue, but I don't know if that is worth it. Additionally, there is a chance that the extra pop_heap operations could negatively affect your asymptotic complexity, but I'd have to do the math to check.
"what is the typical way (in the industry) of dealing with this problem?"
First of all, ask yourself if you can get away with using a dumb array instead of a priority queue.
Sure, finding the minimum element is now O(n) instead of O(log n), but the implementation is the simplest possible (an advantage in its own right). Additionally, a plain array is at least as efficient when your graph is dense, since the algorithm has to examine on the order of V^2 edges anyway, and even for a sparse graph it might be efficient enough, depending on how big the graph is.
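For reference, the plain-array variant looks roughly like this in Python. This is only a sketch under assumptions: the function name dijkstra_array and the graph format (a dict mapping every vertex to a list of (neighbor, weight) pairs, with non-negative weights) are chosen for the example.

```python
import math


def dijkstra_array(graph, source):
    """Dijkstra without a priority queue: O(V) scan for the minimum each round."""
    dist = {v: math.inf for v in graph}
    dist[source] = 0
    unvisited = set(graph)
    while unvisited:
        # Linear scan replaces extract-min on a heap.
        u = min(unvisited, key=dist.__getitem__)
        if dist[u] == math.inf:
            break  # everything left is unreachable
        unvisited.remove(u)
        for v, w in graph[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w  # "decrease key" is a plain assignment
    return dist


example = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 5)],
    "d": [],
}
print(dijkstra_array(example, "a"))  # {'a': 0, 'b': 3, 'c': 1, 'd': 4}
```

Notice that the decrease-key problem disappears entirely here, because there is no heap position to keep track of.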
If you really need a priority queue, then you are going to have to find one that has a decreaseKey operation implemented. If you can't find one, I would say it's not that bad to implement it yourself; it might be less trouble than trying to find an existing implementation and then fitting it in with the rest of your code.
Finally, I would not recommend using the really fancy heap data structures (such as Fibonacci heaps). While these often show up in textbooks as a way to get optimal asymptotics, in practice they have terrible constant factors, and these constant factors are significant when compared with something that is logarithmic.

"Programming style-wise, the heap is a black box that abstracts away the details of a priority queue."
Not necessarily. Both C++ and Python have heap libraries that provide functions on arrays rather than black box objects. Go abstracts a bit, but requires the programmer to provide an array-like data structure for its heap operations to work on.
All this abstraction leaking in standardized, industry-strength libraries has a reason: some algorithms (Dijkstra) require a heap with additional operations, and supporting those operations would degrade performance for other algorithms. Yet other algorithms (heapsort) need heap operations that work in place on input arrays. If your library's heap gives you a black-box object, and it doesn't suffice for some algorithm, then it's time to re-implement the operations as functions on arrays, or to find a library that does have the operations you need.
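A tiny Python example of what "functions on arrays" means in practice: the heapq module operates directly on an ordinary list that you own, rather than handing you an opaque heap object.

```python
import heapq

data = [5, 1, 4]          # a plain list, nothing heap-specific about it
heapq.heapify(data)       # rearranges the list in place into a min-heap
heapq.heappush(data, 2)   # still the same list object
smallest = heapq.heappop(data)

print(smallest)           # 1
print(data[0])            # 2 (the new minimum is always at index 0)
```

Since the list is yours, nothing stops you from inspecting or indexing into it, which is exactly the kind of leak the black-box view does not allow.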

This is a great question, and one of those details that algorithms books like CLRS just gloss over without mention.
There are a few ways to handle this:
1. Use a custom heap implementation that supports decreaseKey operations.
2. Every time you "relax" a vertex, just add it back into the heap with the new, lower weight, then write a custom way to ignore the old entries later. You can take advantage of the fact that you only ever add a node to the heap/priority queue if its weight has decreased.
Option #1 is definitely used. For example, the Open Source Routing Machine (OSRM), which searches graphs with many millions of nodes to compute road routing directions, uses a Boost implementation of a d-ary heap specifically because it has better decreaseKey operations (source). The Fibonacci heap is also often mentioned for this purpose because it supports amortized O(1) decrease-key operations, but there too you would probably have to roll your own.
In option #2 you end up doing more insertions and removeMin operations in total. If D is the total number of "relax" operations you must do, you end up doing a total of D additional heap operations. So while this has a theoretically worse runtime complexity, in practice there is research evidence that option #2 can be more performant because you can take advantage of cache locality and avoid the additional overhead of keeping pointers to do the decreaseKey operations (see [1], specifically pg. 16). This approach also has the advantage of being simpler and allows you to use standard library heap/priority-queue implementations in most languages.
To give you some pseudocode for how option #2 would look:
    // Imagine this is some lookup table that has the minimum weight
    // so far for each node.
    weights = {}
    while Queue is not empty:
        u = Queue.removeMin()
        // This is our new logic to discard the duplicate entries.
        if u.weight > weights[u]:
            continue
        visit neighbors[u] and relax() each one
As an alternative, you can also check out the Python standard library heapq docs, which describe another approach to keeping track of "dead" entries in the heap. Whether you find it helpful depends on what data structure you are using for your graph representation and for storing vertex distances.
[1] Priority Queues and Dijkstra’s Algorithm 2007
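For anyone who wants something runnable, here is roughly how option #2 can look in Python with the standard heapq module. Treat it as a sketch: the function name dijkstra_lazy and the graph format (a dict mapping a vertex to a list of (neighbor, weight) pairs, non-negative weights) are assumptions made for the example.

```python
import heapq
import math


def dijkstra_lazy(graph, source):
    """Dijkstra with re-insertion instead of decreaseKey; stale entries are skipped."""
    dist = {source: 0}
    heap = [(0, source)]  # entries are (distance, vertex)
    while heap:
        d, u = heapq.heappop(heap)
        # A better distance for u was found after this entry was pushed: discard it.
        if d > dist.get(u, math.inf):
            continue
        for v, w in graph.get(u, []):
            candidate = d + w
            if candidate < dist.get(v, math.inf):
                dist[v] = candidate
                # Push a fresh entry; the old one (if any) stays behind as a duplicate.
                heapq.heappush(heap, (candidate, v))
    return dist


example = {
    "a": [("b", 4), ("c", 1)],
    "b": [("d", 1)],
    "c": [("b", 2), ("d", 5)],
    "d": [],
}
print(dijkstra_lazy(example, "a"))  # {'a': 0, 'b': 3, 'c': 1, 'd': 4}
```

In this run the heap ends up holding both (4, "b") and (3, "b"); the second is popped first, and the first is later discarded by the staleness check.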

Algorithm Question.. Linked List

The scenario is as follows:
I want to reverse the direction of a singly linked list. In other words, after the reversal all pointers should point backwards.
The algorithm should take linear time.
The solution I have thought of uses another data structure, a stack, with whose help the singly linked list can easily be reversed so that all pointers point backwards. But I am in doubt as to whether such an implementation yields linear time complexity. Please comment on this, and if a more efficient algorithm exists, please discuss it.
Thanks.

You could do it like this: As long as there are nodes in the input list, remove its first node and insert it at the beginning of the output list:
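    /* Assumes a type like: typedef struct node { struct node *next; } node; */
    node *reverse(node *in) {
        node *out = NULL;
        while (in) {
            node *cur = in;      /* detach the first node of the input list */
            in = in->next;
            cur->next = out;     /* insert it at the front of the output list */
            out = cur;
        }
        return out;
    }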

2 times O(n) = O(2n), which is still O(n). So first pushing n elements onto a stack and then popping n elements off it is indeed linear in time, as you expected.
See also the section Multiplication by a Constant on the "Big O Notation" wikipedia entry.
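For illustration, the stack-based approach from the question could look like this in Python. The Node class and the function names are made up for the example; the two loops correspond to the n pushes and n pops counted above.

```python
class Node:
    def __init__(self, value, next=None):
        self.value = value
        self.next = next


def reverse_with_stack(head):
    """Reverse a singly linked list using an explicit stack: n pushes, then n pops."""
    stack = []
    node = head
    while node is not None:  # first pass: n pushes
        stack.append(node)
        node = node.next
    if not stack:
        return None
    new_head = stack.pop()   # the old tail becomes the new head
    tail = new_head
    while stack:             # second pass: the remaining pops
        tail.next = stack.pop()
        tail = tail.next
    tail.next = None         # the old head is now the last node
    return new_head


def to_list(head):
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


print(to_list(reverse_with_stack(Node(1, Node(2, Node(3))))))  # [3, 2, 1]
```

The stack here holds references to the existing nodes, so the cost is O(n) extra space for the references but no node is copied.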

If you put all of the nodes of your linked list in a stack, it will run in linear time, as you simply traverse the nodes on the stack backwards.
However, I don't think you need a stack. All you need to remember is the node you were just at, to reverse the pointer of the current node. Make note of the next node before you reverse the pointer at this node.

The previous answers have already (and rightly) mentioned that the solution using pointer manipulation and the solution using a stack are both O(n).
The remaining question is to compare the real run time (machine cycle complexity) performance of the two different implementations of the reverse() function.
I expect that the following two aspects might be relevant:
1. The stack implementation. Does it require the maximum stack depth to be explicitly specified? If so, how is that specified? If not, how does the stack manage memory as its size grows arbitrarily large?
2. I guess that nodes have to be copied from the list to the stack. [Is there a way without copying?] In that case, the cost of copying a node needs to be accounted for, because the size of a node can be (arbitrarily) large.
Given these, in place reversal by manipulating pointers seems more attractive to me.

For a list of size n, you call n times push and n times pop, both of which are O(1) operations, so the whole operation is O(n).

You can use a stack to achieve a O(n) implementation. But the recursive solution IS using a stack (THE stack)! And, like all recursive algorithms, it is equivalent to looping. However, in this case, using recursion or an explicit stack would create a space complexity of O(n) which is completely unnecessary.

position index for binary heap priority queues?

So let's say I have a priority queue of N items with priorities, where N is in the thousands, using a priority queue implemented with a binary heap. I understand the EXTRACT-MIN and INSERT primitives (see Cormen, Leiserson, Rivest which uses -MAX rather than -MIN).
But DELETE and DECREASE-KEY both seem to require the priority queue to be able to find an item's index in the heap given the item itself (alternatively, that index needs to be given by consumers of the priority queue, but this seems like an abstraction violation).... which looks like an oversight. Is there a way to do this efficiently without having to add a hashtable on top of the heap?

Right, I think the point here is that for the implementation of the priority queue you may use a binary heap whose API takes an index (i) for its HEAP-INCREASE-KEY(A, i, key), while the interface of the priority queue itself may be allowed to take an arbitrary key. You're free to have the priority queue encapsulate the details of the key->index map. If you need your PQ-INCREASE-KEY(A, old, new) to work in O(log n), then you'd better have an O(log n) or better key-to-index lookup that you keep up to date. That could be a hash table or another fast lookup structure.
So, to answer your question: I think it's inevitable that the data structure be augmented somehow.
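Here is a minimal Python sketch of such an augmented structure: a binary min-heap that keeps a dict from item to heap index and updates it on every swap. It is written for the min-heap/DECREASE-KEY flavor from the question rather than the INCREASE-KEY one. The class and method names are invented for the example, and items are assumed to be hashable and unique.

```python
class IndexedMinHeap:
    """Binary min-heap with an item -> index map, so decrease_key is O(log n)."""

    def __init__(self):
        self._heap = []  # list of [priority, item]
        self._pos = {}   # item -> current index in self._heap

    def __len__(self):
        return len(self._heap)

    def _swap(self, i, j):
        h = self._heap
        h[i], h[j] = h[j], h[i]
        # Keep the lookup table in sync with every move.
        self._pos[h[i][1]] = i
        self._pos[h[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self._heap[i][0] >= self._heap[parent][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self._heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self._heap[child][0] < self._heap[smallest][0]:
                    smallest = child
            if smallest == i:
                return
            self._swap(i, smallest)
            i = smallest

    def insert(self, item, priority):
        if item in self._pos:
            raise ValueError("item already in heap")
        self._heap.append([priority, item])
        self._pos[item] = len(self._heap) - 1
        self._sift_up(len(self._heap) - 1)

    def decrease_key(self, item, priority):
        i = self._pos[item]  # O(1) expected lookup instead of an O(n) search
        if priority > self._heap[i][0]:
            raise ValueError("new key is larger than current key")
        self._heap[i][0] = priority
        self._sift_up(i)

    def extract_min(self):
        if not self._heap:
            raise IndexError("extract from an empty heap")
        self._swap(0, len(self._heap) - 1)
        priority, item = self._heap.pop()
        del self._pos[item]
        if self._heap:
            self._sift_down(0)
        return item, priority


h = IndexedMinHeap()
h.insert("a", 5)
h.insert("b", 3)
h.insert("c", 8)
h.decrease_key("c", 1)
print(h.extract_min())  # ('c', 1)
print(h.extract_min())  # ('b', 3)
```

The caller only ever passes items, never indices, so the index bookkeeping stays hidden inside the priority queue.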

FWIW, and if someone still comes looking for something similar — I recently chanced upon an implementation for an Indexed priority queue while doing one of the Coursera courses on Algorithms.
The basic gist is to incorporate a reverse lookup using 2 arrays to support the operations that the OP stated.
Here's a clear implementation for Min Ordered Indexed Priority Queue.

"But DELETE and DECREASE-KEY both seem to require the priority queue to be able to find an item's index in the heap given the item itself" -- it's clear from the code that at least a few of these methods use an index into the heap rather than the item's priority. Clearly, i is an index in HEAP-INCREASE-KEY:
    HEAP-INCREASE-KEY(A, i, key)
        if key < A[i]
            then error "new key is smaller than current key"
        A[i] <-- key
        ...
So if that's the API, use it.

I modified my node class to add a heapIndex member. This is maintained by the heap as nodes are swapped during insert, delete, decrease, etc.
This breaks encapsulation (my nodes are now tied to the heap), but it runs fast, which was more important in my situation.

One way is to split up the heap into the elements on one side and the organization on the other.
For full functionality, you need two relations:
a) Given a Heap Location (e.g. Root), find the Element seated there.
b) Given an Element, find its Heap Location.
The second is very easy: add a value "location" (most likely an index in an array-based heap) that is updated every time the element is moved in the heap.
The first is also simple: instead of storing Elements, you simply keep a heap of pointers to Elements (or array indices). Now, given a Location (e.g. Root), you can find the Element seated there by dereferencing the pointer (or indexing into the vector).

"But DELETE and DECREASE-KEY both seem to require the priority queue to be able to find an item's index in the heap given the item itself"
Actually, that's not true. You can implement these operations in an unindexed graph, linked-lists and 'traditional' search trees by having predecessor(s) and successor(s) pointers.
