Efficient nested priority queue - algorithm

I'm looking for a data-structure/algorithm (not multithreaded) which is essentially a nested priority queue. That is:
The next element to be taken is the one with the highest priority.
An element can either be a simple element with a priority, or it can be another priority queue (though a limit of one level of nesting is fine for my purposes). Regardless of the level of nesting, the element with the highest priority across the queue/sub-queues/sub-sub-queues/etc. is the one chosen next.
Elements can be added or deleted at any level, though a simple node never turns into a sub-queue (or vice versa).
The priority of a simple element doesn't change after being inserted.
I haven't been able to come up with anything efficient/elegant, and Googling hasn't turned up anything.

I haven't actually built this, but I did some pretty extensive analysis on the idea and it seems like it should work. I call it a queue of queues. I never built it because the project I was designing it for was canceled before I needed the queue.
First, I decided that a "simple element" would instead be a priority queue that contains a single element. Not having to manage two different types of elements simplified the design, and analysis showed that it shouldn't affect performance in any significant way.
Because a sub-queue's priority can change whenever an item is added to it or removed from it, I elected to use a pairing heap for both the main queue and the sub-queues. A pairing heap performs better than a binary heap when you have to make a lot of priority changes. The problem with a binary heap is that to change an item's priority you first have to find the item, and in a plain binary heap (one without an auxiliary index of positions) that's an O(n) search. In a pairing heap, the amortized cost of a priority change is O(log n), because you already hold a reference to the node.
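To make the handle-based priority change concrete, here is a minimal max-pairing-heap sketch in Python. It follows the textbook two-pass pairing heap; the class and method names (`PairingHeap`, `update`) are hypothetical. `push()` returns the node itself, so `update()` never has to search for it: an increase cuts the node's subtree and melds it back at the root, and a decrease deletes and re-inserts the same node object. No thread safety is attempted.

```python
class Node:
    """One heap entry; the object returned by push() doubles as its handle."""
    __slots__ = ("priority", "value", "child", "sibling", "prev")

    def __init__(self, priority, value):
        self.priority, self.value = priority, value
        self.child = None    # leftmost child
        self.sibling = None  # next sibling to the right
        self.prev = None     # left sibling, or the parent if this is the leftmost child


def _meld(a, b):
    """Link two detached roots; the higher-priority one becomes the parent."""
    if a is None:
        return b
    if b is None:
        return a
    if b.priority > a.priority:
        a, b = b, a
    b.prev, b.sibling = a, a.child   # b becomes a's new leftmost child
    if a.child is not None:
        a.child.prev = b
    a.child = b
    return a


def _merge_pairs(first):
    """Two-pass pairing: meld siblings in pairs left to right, then fold right to left."""
    pairs = []
    node = first
    while node is not None:
        a, b = node, node.sibling
        node = b.sibling if b is not None else None
        a.prev = a.sibling = None
        if b is not None:
            b.prev = b.sibling = None
        pairs.append(_meld(a, b))
    result = None
    for tree in reversed(pairs):
        result = _meld(tree, result)
    return result


class PairingHeap:
    """Max-heap: the root always holds the highest priority."""
    def __init__(self):
        self.root = None
        self.size = 0

    def __len__(self):
        return self.size

    def push(self, priority, value=None):
        node = Node(priority, value)
        self.root = _meld(self.root, node)
        self.size += 1
        return node

    def pop(self):
        top = self.root
        if top is None:
            raise IndexError("pop from empty heap")
        self.root = _merge_pairs(top.child)
        top.child = None
        self.size -= 1
        return top

    def _cut(self, node):
        """Detach a non-root node, together with its subtree, from its sibling list."""
        if node.prev.child is node:      # leftmost child: prev is the parent
            node.prev.child = node.sibling
        else:                            # otherwise prev is the left sibling
            node.prev.sibling = node.sibling
        if node.sibling is not None:
            node.sibling.prev = node.prev
        node.prev = node.sibling = None

    def delete(self, node):
        if node is self.root:
            self.pop()
            return
        self._cut(node)
        self.root = _meld(self.root, _merge_pairs(node.child))
        node.child = None
        self.size -= 1

    def update(self, node, new_priority):
        """Change a priority via its handle: no search is needed."""
        if new_priority >= node.priority:
            node.priority = new_priority
            if node is not self.root:    # cut the subtree and meld it back at the top
                self._cut(node)
                self.root = _meld(self.root, node)
        else:                            # decrease: delete, then re-insert the same node
            self.delete(node)
            node.priority = new_priority
            self.root = _meld(self.root, node)
            self.size += 1
```

For example, after pushing `a` (5), `b` (1) and `c` (3), calling `update(b, 10)` and `update(a, 0)` makes the pops come out as `b`, `c`, `a`.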
So the idea is, if you're adding a new sub-queue you just add it to the main queue and it'll get put in the proper place. If you're updating a sub-queue, you add or remove the item (which is O(log n) on the sub-queue), and then adjust the sub-queue's position in the main queue (which is O(log n) on the main queue).
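A compact sketch of the queue-of-queues idea in Python. To keep it short it does not use a pairing heap: it substitutes `heapq` (a binary heap) with lazy re-insertion. Instead of moving a sub-queue in place, every change pushes a fresh main-heap entry with a new version stamp, and outdated entries are skipped when they surface. That trades memory for stale entries against simplicity. As in the answer, a simple element is stored as a one-item sub-queue. All names (`NestedQueue`, `SubQueue`, `add_simple`) are hypothetical.

```python
import heapq
import itertools


class SubQueue:
    """Max-priority sub-queue; removal is lazy (entries are flagged, then skipped)."""
    def __init__(self):
        self._heap = []                   # entries: [-priority, seq, item, alive]
        self._seq = itertools.count()     # unique tie-breaker so items are never compared

    def push(self, priority, item):
        entry = [-priority, next(self._seq), item, True]
        heapq.heappush(self._heap, entry)
        return entry                      # handle for remove()

    def remove(self, entry):
        entry[3] = False                  # discarded when it reaches the top

    def _prune(self):
        while self._heap and not self._heap[0][3]:
            heapq.heappop(self._heap)

    def best_priority(self):
        self._prune()
        return -self._heap[0][0] if self._heap else None

    def pop(self):
        self._prune()
        neg_priority, _, item, _ = heapq.heappop(self._heap)
        return -neg_priority, item


class NestedQueue:
    """Main heap of (-best priority, stamp, queue id); stale stamps are skipped on pop."""
    def __init__(self):
        self._main = []
        self._queues = {}                 # queue id -> SubQueue
        self._version = {}                # queue id -> stamp of its only valid main entry
        self._ids = itertools.count()
        self._stamps = itertools.count()

    def new_queue(self):
        qid = next(self._ids)
        self._queues[qid] = SubQueue()
        return qid

    def add(self, qid, priority, item):
        handle = self._queues[qid].push(priority, item)
        self._refresh(qid)
        return handle

    def add_simple(self, priority, item):
        """A simple element is stored as a sub-queue holding one item."""
        qid = self.new_queue()
        self.add(qid, priority, item)
        return qid

    def remove(self, qid, handle):
        self._queues[qid].remove(handle)
        self._refresh(qid)

    def remove_queue(self, qid):
        del self._queues[qid]
        self._version.pop(qid, None)      # its main-heap entries are now all stale

    def _refresh(self, qid):
        # Re-position the sub-queue by pushing a fresh entry; older entries become stale.
        stamp = next(self._stamps)
        self._version[qid] = stamp
        best = self._queues[qid].best_priority()
        if best is not None:
            heapq.heappush(self._main, (-best, stamp, qid))

    def pop(self):
        while self._main:
            _, stamp, qid = heapq.heappop(self._main)
            if self._version.get(qid) != stamp:
                continue                  # stale: the sub-queue changed or was removed
            priority, item = self._queues[qid].pop()
            self._refresh(qid)
            return priority, item
        raise IndexError("pop from empty NestedQueue")
```

With a sub-queue holding 5, 9 and (after deleting the 5) 8, plus a simple element of priority 7, three pops return `(9, 'job-9')`, `(8, 'job-8')`, `(7, 'simple-7')`.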
All my analysis said that this should work quite well, although I'm still not sure how well it would work with multiple threads. I think I have a good idea how to synchronize access and not end up blocking the entire queue for every insertion and deletion, except for a very brief time. I guess I'll find out if I ever build it. It might be possible to create a lock-free concurrent pairing heap.
I selected a pairing heap because of its better performance when re-ordering keys, and because it's much easier to implement than a Fibonacci heap or many of the other alternatives. Although its asymptotic bounds are weaker than a Fibonacci heap's, its real-world performance is much, much better. The only drawback, to me, is that a pairing heap occupies more memory than an equivalent binary heap. It's the old time/space tradeoff.
Another option would be to implement a skip list priority queue, which also has expected O(log n) performance for insertion and changing priority, and I've seen lock-free concurrent skip list implementations. Implementing an efficient skip list isn't difficult in C, because C handles variable record sizes very well. In C# and other languages that don't allow you to build variable-length structures, a skip list can be a real memory hog.
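A minimal skip-list priority queue sketch in Python, assuming higher numbers mean higher priority and FIFO order among equal priorities. Each node's list of forward pointers is the variable-length record mentioned above (in C it could be a flexible array member). The level probability p = 1/2 and `MAX_LEVEL = 16` are common but arbitrary choices, and the class names are hypothetical. A priority change is `remove()` followed by `push()`.

```python
import random


class _SkipNode:
    __slots__ = ("key", "value", "next")

    def __init__(self, key, value, level):
        self.key, self.value = key, value
        self.next = [None] * level        # forward pointers, one per level


class SkipListPQ:
    """Max-priority queue on a skip list kept sorted by (-priority, arrival)."""
    MAX_LEVEL = 16

    def __init__(self, seed=None):
        self._head = _SkipNode(None, None, self.MAX_LEVEL)   # sentinel, holds no data
        self._level = 1                                      # levels currently in use
        self._rng = random.Random(seed)
        self._arrival = 0

    def _random_level(self):
        level = 1                         # each extra level is kept with probability 1/2
        while level < self.MAX_LEVEL and self._rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        # update[i] = last node on level i whose key is smaller than `key`
        update = [self._head] * self.MAX_LEVEL
        x = self._head
        for i in range(self._level - 1, -1, -1):
            while x.next[i] is not None and x.next[i].key < key:
                x = x.next[i]
            update[i] = x
        return update

    def _shrink(self):
        while self._level > 1 and self._head.next[self._level - 1] is None:
            self._level -= 1

    def push(self, priority, value):
        """Expected O(log n) insert; returns the key as a handle for remove()."""
        key = (-priority, self._arrival)  # smallest key = highest priority, oldest first
        self._arrival += 1
        update = self._predecessors(key)
        level = self._random_level()
        self._level = max(self._level, level)
        node = _SkipNode(key, value, level)
        for i in range(level):
            node.next[i] = update[i].next[i]
            update[i].next[i] = node
        return key

    def remove(self, key):
        """Expected O(log n) delete of an arbitrary entry."""
        update = self._predecessors(key)
        node = update[0].next[0]
        if node is None or node.key != key:
            raise KeyError(key)
        for i in range(len(node.next)):
            update[i].next[i] = node.next[i]
        self._shrink()
        return node.value

    def pop(self):
        """Remove and return (priority, value) of the highest-priority entry."""
        first = self._head.next[0]
        if first is None:
            raise IndexError("pop from empty queue")
        for i in range(len(first.next)):  # the head is its predecessor on every level
            self._head.next[i] = first.next[i]
        self._shrink()
        return -first.key[0], first.value
```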
As I said, I never actually built this thing, but all my research and design notes tell me that it should be reasonably easy to build and should perform quite well.

Related

Floodfill: Stack vs. Queue

It is possible to write a flood fill function that uses either a queue or a stack. Which is faster under which circumstances (if at all), and why?
Provided you implement them correctly, they should be equally fast. That is, avoid recursion, and implement the queue using a vector rather than a linked list.
Both have O(N) complexity (N is the number of cells to be filled).
For very large examples (I would guess 10k x 10k), you might implement the stack approach so that it favors memory cache lines, which would give you a slight advantage. This is hard to do reliably, since it is hardware-dependent.
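A small Python sketch showing that the two variants differ only in which end of the container is popped. `collections.deque` stands in for the vector-backed container the answer recommends (CPython's deque stores elements in blocks rather than one node per element). Cells are recolored when they are pushed, so each cell enters the container at most once and both variants do O(N) work. `flood_fill` and its parameters are hypothetical names.

```python
from collections import deque


def flood_fill(grid, start, new_color, use_stack=False):
    """Iterative 4-way flood fill; returns the number of cells recolored."""
    rows, cols = len(grid), len(grid[0])
    r0, c0 = start
    old = grid[r0][c0]
    if old == new_color:
        return 0
    grid[r0][c0] = new_color              # color on push so no cell is queued twice
    pending = deque([start])
    filled = 1
    while pending:
        # The only difference: LIFO (stack) pops the newest cell, FIFO (queue) the oldest.
        r, c = pending.pop() if use_stack else pending.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == old:
                grid[nr][nc] = new_color
                pending.append((nr, nc))
                filled += 1
    return filled
```

Both variants leave the grid in the same final state; only the visiting order (and therefore the memory access pattern) differs.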

Having a hash map to keep track of elements in another data structure

Sometimes I have a data structure with a get (lookup) time complexity of O(N), like a queue, stack, heap, etc. I use one of these data structures in a program that just needs to check whether a certain element is in it, but because lookup is O(N), it becomes the bottleneck in my algorithm.
If memory isn't much of a concern, would it be poor design to have a hash map that keeps track of the elements in the restricted data structure? Doing this would essentially remove the O(N) restriction and make membership checks O(1).
Having a supplemental hash table is warranted in many situations. However, maintaining a "parallel hash" could become a liability.
The situation that you describe, when you need to check membership quickly, is often modeled with a hash-based set (HashSet<T>, std::unordered_set<T>, and so on, depending on the language). The disadvantage of these structures is that the order of elements is not specified, and that they cannot have duplicates.
Depending on the library, you may have access to data structures that fix these shortcomings. For example, Java offers LinkedHashSet<T> which provides a predictable order of enumeration, and C++ provides std::unordered_multiset<T>, which allows duplicates.
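A minimal Python sketch of the parallel-hash approach: a FIFO queue paired with a `collections.Counter`, so membership tests are O(1) on average. Counting occurrences instead of using a plain set handles duplicates, the same problem `std::unordered_multiset<T>` addresses. The liability mentioned above is visible here: every method that mutates the deque must also update the counter, or the two drift apart. `CountedQueue` is a hypothetical name, and items must be hashable.

```python
from collections import Counter, deque


class CountedQueue:
    """FIFO queue plus a parallel Counter for average O(1) membership tests."""
    def __init__(self):
        self._items = deque()
        self._counts = Counter()

    def push(self, item):
        self._items.append(item)
        self._counts[item] += 1           # keep the parallel hash in sync

    def pop(self):
        item = self._items.popleft()
        self._counts[item] -= 1
        if self._counts[item] == 0:
            del self._counts[item]        # drop zero entries so `in` stays accurate
        return item

    def __contains__(self, item):
        return item in self._counts

    def __len__(self):
        return len(self._items)
```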

Looking for an optimal in-place sorting algorithm

I'm working on an avionics OS (thread layer) and I'm looking for an optimal solution regarding the following (simplified) requirement:
"Threads waiting for [various objects] are queued in priority order. For the same priority, threads are also queued in FIFO order".
[various objects] are e.g. semaphores.
I thought I could build such a waiting list using a classical linked list, which makes insertion/sorting relatively fast and easy, and which fits the expected usage perfectly (one thread enters the waiting state at a time). But I am working on a bare-metal target and I don't have any libc support, so I have no malloc (which is very useful for linked lists!).
For sorting threads by priority I usually use binary heaps (http://en.wikipedia.org/wiki/Binary_heap), which are very efficient, but they can't be used here because "FIFO order" cannot be maintained this way.
Of course, I can do it with more classical sorting algorithms, but they are usually time-consuming, even for one insertion, because a lot of array elements may be moved at each insertion.
So I wonder if an appropriate algorithm exists... maybe a kind of improved binary heap?... Or a “static” linked list?... Or maybe the best thing is an allocator algorithm associated with a linked list?...
For information:
- the total number of threads is limited to 128, so memory need is always finite and can be known/reserved at compile time.
- I have a limited quantity of RAM, so I can hardly afford constructions such as a binary heap sorted by priority pointing to FIFOs (naturally ordered by arrival time)…
I'd really appreciate any idea and fresh look regarding this problem.
Thanks!
You probably need a stable in-place sort: it maintains the relative order of items with equal priority after sorting, satisfying your FIFO requirement.
Pick anything from the list on Wikipedia; for example, in-place merge sort and block sort are both in-place and stable (Timsort is stable too, but it needs O(n) auxiliary memory):
http://en.wikipedia.org/wiki/Sorting_algorithm
Regarding memory allocation and linked lists, maybe you can implement your own malloc?
You can allocate a fixed-size heap (128 * thread information size) and then use the index of each block as a pointer. The real pointer to an object is then (heap start address) + index * (block size).
Then implement sorting as you normally would, but with indexes instead of pointers.
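A sketch of the index-as-pointer idea, written in Python for brevity (on the target these would be C arrays sized at compile time). All storage is preallocated: parallel arrays of slots, a free list threaded through the same `next` array, and `NIL` (-1) in place of a null pointer. Instead of re-sorting, `insert` walks past every entry whose priority is greater than or equal to the new one, so equal priorities stay in FIFO order; insertion costs O(n) comparisons but never moves an element. The class name, the `NIL` convention, and "higher number = higher priority" are assumptions.

```python
class PriorityWaitList:
    """Priority-ordered, FIFO-within-priority list in fixed arrays (no malloc).

    Assumes capacity >= 1 and that a higher number means a higher priority.
    """
    NIL = -1                                     # stands in for a null pointer

    def __init__(self, capacity=128):
        self.thread_id = [0] * capacity
        self.priority = [0] * capacity
        self.next = list(range(1, capacity)) + [self.NIL]   # all slots start free
        self.free = 0                            # head of the free list
        self.head = self.NIL                     # head of the wait list

    def insert(self, thread_id, priority):
        slot = self.free
        if slot == self.NIL:
            raise MemoryError("wait list full")
        self.free = self.next[slot]              # take a slot off the free list
        self.thread_id[slot] = thread_id
        self.priority[slot] = priority
        # Skip every entry with priority >= ours, so equal priorities stay FIFO.
        prev, cur = self.NIL, self.head
        while cur != self.NIL and self.priority[cur] >= priority:
            prev, cur = cur, self.next[cur]
        self.next[slot] = cur
        if prev == self.NIL:
            self.head = slot
        else:
            self.next[prev] = slot

    def pop(self):
        slot = self.head
        if slot == self.NIL:
            raise IndexError("wait list empty")
        self.head = self.next[slot]
        self.next[slot] = self.free              # return the slot to the free list
        self.free = slot
        return self.thread_id[slot]
```

With threads 10 (priority 1), 11 (5), 12 (1) and 13 (5) inserted in that order, pops return 11, 13, 10, 12.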
Another idea is to separate the FIFO requirement from the priority-queue requirement and sort containers that hold queues of same-priority items, but this would require dynamic list allocation and a larger heap.
The standard technique for this problem is the Bentley-Saxe priority queue. I would describe it in detail here, along with some tips on how to implement it in-place with minimal memory requirements, but anything I said would just be reiterating Pat Morin's excellent answer to "Is there a stable heap?" over on the CS Theory StackExchange (cstheory.SE).
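For comparison, here is a different and much simpler technique than the Bentley-Saxe structure: a binary heap keyed on the pair (priority, arrival counter). Ties on priority are broken by arrival order, which yields FIFO behavior within a priority. It costs one counter per entry, and in C a fixed-width counter eventually wraps around, which would need handling (for example by renumbering the entries). Python sketch; `StableHeap` is a hypothetical name.

```python
import heapq
import itertools


class StableHeap:
    """Binary heap where equal priorities come out in arrival (FIFO) order.

    heapq is a min-heap, so priorities are negated to pop the highest first.
    """
    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()   # increasing tie-breaker

    def push(self, priority, item):
        heapq.heappush(self._heap, (-priority, next(self._arrival), item))

    def pop(self):
        _, _, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)
```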

Data Structure Parallel Add Serial Remove Needed

I'm working on a dynamically branching particle system on the GPU. I need a parallel data structure with the following properties:
One thread must be able to remove elements one by one, in constant time. The element returned isn't important to the algorithm--just so long as some element is returned when nonempty. For extra awesomeness, change to any number of threads.
Any number of threads must be able to add elements to the data structure in constant time. Note that some locking is allowed (and necessary), but it must still scale independently of the number of threads, i.e., more threads shouldn't slow it down.
Basic synchronization primitives (mutexes, semaphores), and anything that can be implemented using them, are available.
I had toyed with the idea of a linked list, but this violates condition two (adding would be O(m) for m threads, because locking must be taken into consideration). I'm not sure such a data structure exists, but I thought I would ask.
Without knowing more about how you want your data organized (sorted? FIFO? LIFO?), I'm not sure whether I can give you an exact answer. However, what you're describing sounds like the definition of a lock-free structure. Lock-free implementations of stacks and queues exist, which do support O(1) insertions and deletions even when many threads are modifying the structure concurrently. They're usually based on atomic compare-and-swap (CAS) operations.
If locks are okay and you just want a highly-concurrent data structure that's sorted, consider looking into concurrent skip lists, which provide O(log n) sorted insertion and deletion with multiple active threads.
Hope this helps!

Linked list vs. dynamic array for implementing a stack

I've started reviewing data structures and algorithms before my final year of school starts to make sure I'm on top of everything. One review problem said "Implement a stack using a linked list or dynamic array and explain why you made the best choice".
To me, it seemed more intuitive to use a list with a tail pointer to implement a stack since it may need to be resized often. It seems like for a large amount of data, a list is the better choice since a dynamic array re-size is an expensive operation. Additionally, with a list, you don't need to allocate any more space than you actually need so it's more space efficient.
However, a dynamic array would definitely allow adding data far more quickly (except when it needs to be resized). Still, I'm not sure whether using an array is quicker overall, or only when it doesn't need to be resized.
The book's solution said "for storing very large objects, a list is a better implementation" but I don't understand why.
Which way is best? What factors should be used to determine which implementation is "best"? Also, is any of my logic here off?
There are many tradeoffs involved here and I don't think that there's a "correct" answer to this question.
If you implement the stack using a linked list with a tail pointer, then the worst-case runtime to push, pop, or peek is O(1). However, each element carries some extra overhead (namely, the pointer), which means there is always O(n) overhead for the structure. Additionally, depending on the speed of your memory allocator, the cost of allocating new nodes for the stack might be noticeable. Also, if you were to continuously pop off all the elements from the stack, you might take a performance hit from poor locality, since there is no guarantee that the linked list cells are stored contiguously in memory.
If you implement the stack with a dynamic array, then the amortized runtime to push or pop is O(1) and the worst-case cost of a peek is O(1). This means that if you care about the cost of any single operation in the stack, this may not be the best approach. That said, allocations are infrequent, so the total cost of adding or removing n elements is likely to be lower than in the linked-list approach. Additionally, the memory overhead of this approach is usually better than that of the linked list. If your dynamic array just stores pointers to the elements, then the worst-case memory overhead occurs when only half the cells are filled in, in which case there are n extra pointers (the same as with the linked list); in the best case, when the dynamic array is full, there are no empty cells and the extra overhead is O(1). If, on the other hand, your dynamic array directly contains the elements, the memory overhead can be worse in the worst case. Finally, because the elements are stored contiguously, there is better locality if you want to continuously push or pop elements from the stack, since all the elements are right next to each other in memory.
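A minimal Python sketch of the linked-list stack. Pushing and popping at the head of a singly linked list gives the same worst-case O(1) operations as the tail-pointer version; here the single `_top` reference plays that role. Each push allocates one node carrying one extra pointer, which is the allocator cost and per-element overhead discussed above. `LinkedStack` is a hypothetical name.

```python
class LinkedStack:
    """Stack as a singly linked list; push, pop and peek are worst-case O(1)."""

    class _Node:
        __slots__ = ("value", "below")

        def __init__(self, value, below):
            self.value = value
            self.below = below            # the per-element pointer overhead

    def __init__(self):
        self._top = None
        self._size = 0

    def push(self, value):
        self._top = self._Node(value, self._top)   # one allocation per push
        self._size += 1

    def pop(self):
        if self._top is None:
            raise IndexError("pop from empty stack")
        node = self._top
        self._top = node.below
        self._size -= 1
        return node.value

    def peek(self):
        if self._top is None:
            raise IndexError("peek at empty stack")
        return self._top.value

    def __len__(self):
        return self._size
```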
In short:
The linked-list approach has worst-case O(1) guarantees on each operation; the dynamic array has amortized O(1) guarantees.
The locality of the linked list is not as good as the locality of the dynamic array.
The total overhead of the dynamic array is likely to be smaller than the total overhead of the linked list, assuming both store pointers to their elements.
The total overhead of the dynamic array is likely to be greater than that of the linked list if the elements are stored directly.
Neither of these structures is clearly "better" than the other. It really depends on your use case. The best way to figure out which is faster would be to time both and see which performs better.
Hope this helps!
Well, for the small objects vs. large objects question, consider how much extra space to use for a linked list if you've got small objects on your stack. Then consider how much extra space you'll need if you've got a bunch of large objects on your stack.
Next, consider the same questions, but with an implementation based on dynamic arrays.
What matters is the number of times malloc() gets called in the course of running a task. It could take from hundreds to thousands of instructions to get you a block of memory. (The time in free() or GC should be proportional to that.) Also, keep a sense of perspective. This might be 99% of the total time, or only 1%, depending what else is happening.
I think you answered the question yourself. For a stack with a large number of items, the dynamic array would have excessive overhead costs (copying overhead) when simply adding an extra item to the top of the stack. With a list it's a simple switch of pointers.
Resizing the dynamic array would not be an expensive task if you design your implementation well.
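For instance, when the array is full, create a new array of twice the size and copy the items into it.
You will end up with a total cost of at most about 3N element writes for adding N items (each item is written once when pushed, and all the doublings together copy fewer than 2N items), i.e., amortized O(1) per push.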
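A small Python sketch that makes the doubling cost visible. A Python list is already a dynamic array, so the fixed-size list and the explicit copy loop here exist only to count element copies; the starting capacity of 1 and the name `ArrayStack` are assumptions. Pushing 1025 items triggers doublings at sizes 1, 2, 4, ..., 1024, which copy 2047 items in total, so copies plus the 1025 initial writes come to 3072, just under 3 × 1025.

```python
class ArrayStack:
    """Stack on a manually managed array that doubles its capacity when full."""

    def __init__(self):
        self._data = [None]               # capacity 1
        self._size = 0
        self.copies = 0                   # element copies performed by resizes

    def push(self, value):
        if self._size == len(self._data):
            bigger = [None] * (2 * len(self._data))
            for i in range(self._size):   # the O(n) copy that happens only rarely
                bigger[i] = self._data[i]
                self.copies += 1
            self._data = bigger
        self._data[self._size] = value
        self._size += 1

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty stack")
        self._size -= 1
        value = self._data[self._size]
        self._data[self._size] = None     # drop the reference
        return value


s = ArrayStack()
for i in range(1025):
    s.push(i)
print(s.copies)  # 2047
```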

Resources