Data Structure Parallel Add Serial Remove Needed

I'm working on a dynamically branching particle system on the GPU. I need a parallel data structure with the following properties:
One thread must be able to remove elements one by one, in constant time. The element returned isn't important to the algorithm--just so long as some element is returned when nonempty. For extra awesomeness, change to any number of threads.
Any number of threads must be able to add elements to the data structure in constant time. Note that some locking is allowed (and necessary), but the cost of an add must not depend on the number of threads. I.e., more threads shouldn't slow it down.
Basic synchronization primitives (mutexes, semaphores), and anything that can be implemented using them, are available.
I had toyed with the idea of a linked list, but this violates condition two (since adding would be O(m) for m threads, since locking must be taken into consideration). I'm not sure such a data structure exists--but I thought I would ask.

Without knowing more about how you want your data organized (sorted? FIFO? LIFO?) I'm not sure whether I can give you an exact answer. However, what you're describing sounds like the definition of a lock-free structure. Lock-free implementations of stacks and queues exist, which do support O(1) insertions and deletions even when there are a lot of threads modifying the structure concurrently. They're usually based on atomic operations such as compare-and-swap or test-and-set.
If locks are okay and you just want a highly-concurrent data structure that's sorted, consider looking into concurrent skip lists, which provide O(log n) sorted insertion and deletion with multiple active threads.
Hope this helps!
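To make the lock-free idea a bit more concrete, here is a minimal Python sketch of a Treiber-style stack, which fits the "many adders, one remover" requirement. One big assumption: Python exposes no hardware compare-and-swap, so `AtomicRef` below is a hypothetical stand-in that emulates a single CAS with a tiny lock. On a GPU or in C++ you would use the native atomic instead. The names `AtomicRef`, `TreiberStack`, and `demo` are made up for illustration.

```python
import threading


class AtomicRef:
    """Hypothetical stand-in for a hardware compare-and-swap cell.

    A real lock-free structure would use a native atomic instruction;
    here one CAS is emulated with a short critical section.
    """

    def __init__(self, value=None):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_set(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False


class Node:
    __slots__ = ("item", "next")

    def __init__(self, item, next_node):
        self.item = item
        self.next = next_node


class TreiberStack:
    def __init__(self):
        self._head = AtomicRef(None)

    def push(self, item):
        # Retry loop: read the head, link the new node in front of it,
        # and publish it only if the head has not changed in the meantime.
        while True:
            old = self._head.get()
            if self._head.compare_and_set(old, Node(item, old)):
                return

    def pop(self):
        # Returns None when the stack is empty.
        while True:
            old = self._head.get()
            if old is None:
                return None
            if self._head.compare_and_set(old, old.next):
                return old.item


def demo(num_threads=4, per_thread=1000):
    """Several producer threads push; one consumer drains afterwards."""
    stack = TreiberStack()

    def producer(tid):
        for i in range(per_thread):
            stack.push((tid, i))

    threads = [threading.Thread(target=producer, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    popped = []
    while True:
        item = stack.pop()
        if item is None:
            break
        popped.append(item)
    return popped
```

Each push does a constant amount of work per attempt. Under contention an attempt may be retried, but no thread ever waits on another thread that has been descheduled, which is the property that matters for scaling.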

Related

Why Redis SortedSet uses Skip List instead of Balanced Tree?

The Redis documentation says:
ZSETs are ordered sets using two data structures to hold the same elements
in order to get O(log(N)) INSERT and REMOVE operations into a sorted
data structure.
The elements are added to a hash table mapping Redis objects to
scores. At the same time the elements are added to a skip list
mapping scores to Redis objects (so objects are sorted by scores in
this "view").
I can not understand very much. Could someone give me a detailed explanation?
Antirez explained this at https://news.ycombinator.com/item?id=1171423:
There are a few reasons:
They are not very memory intensive. It's up to you basically. Changing parameters about the probability of a node to have a given number of levels will make them less memory intensive than btrees.
A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.
They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.
About the Append Only durability & speed, I don't think it is a good idea to optimize Redis at cost of more code and more complexity for a use case that IMHO should be rare for the Redis target (fsync() at every command). Almost no one is using this feature even with ACID SQL databases, as the performance hit is big anyway.
About threads: our experience shows that Redis is mostly I/O bound. I'm using threads to serve things from Virtual Memory. The long term solution to exploit all the cores, assuming your link is so fast that you can saturate a single core, is running multiple instances of Redis (no locks, almost fully scalable linearly with number of cores), and using the "Redis Cluster" solution that I plan to develop in the future.
First of all, I think I understand what the Redis documentation says. A Redis sorted set keeps its elements ordered by a score that the user specifies for each element. But many of the ZSET commands take only the member as an argument, not its score. For example:
ZREM key member [member ...]
ZINCRBY key increment member
...
Redis needs to know the score of this member (element), so it uses a hash table to maintain the mapping, just as the documentation says:
The elements are added to a hash table mapping Redis objects to
scores.
When it receives a member, it looks up the member's score in the hash table and then performs the operation on the skip list to keep the set ordered. Redis uses the two data structures to maintain a double mapping that satisfies the needs of the different commands.
I read William Pugh's paper "Skip Lists: A Probabilistic Alternative to Balanced Trees", and found the skip list very elegant and easier to implement than tree rotations.
Also, I think a general balanced binary tree could do this work at the same time cost. In case I've missed something, please point that out.
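The double mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a sorted Python list maintained with `bisect` stands in for the skip list (so inserts here are O(N), not O(log N)), and the class and method names (`SortedSetSketch`, `zadd`, `zrem`, `zincrby`, `members_in_order`) merely mimic the Redis command names. It is not how Redis is implemented.

```python
import bisect


class SortedSetSketch:
    def __init__(self):
        self._score_of = {}  # hash table: member -> score
        self._ordered = []   # sorted (score, member) pairs; skip-list stand-in

    def _remove_from_order(self, member):
        # The hash table gives us the score, which lets us locate the
        # entry in the ordered structure without scanning it.
        key = (self._score_of[member], member)
        idx = bisect.bisect_left(self._ordered, key)
        del self._ordered[idx]

    def zadd(self, member, score):
        if member in self._score_of:
            self._remove_from_order(member)
        self._score_of[member] = score
        bisect.insort(self._ordered, (score, member))

    def zrem(self, member):
        if member not in self._score_of:
            return False
        self._remove_from_order(member)
        del self._score_of[member]
        return True

    def zincrby(self, member, increment):
        # Only the member is given; the score comes from the hash table.
        new_score = self._score_of.get(member, 0) + increment
        self.zadd(member, new_score)
        return new_score

    def members_in_order(self):
        return [member for _, member in self._ordered]
```

Note how `zrem` and `zincrby` receive only the member: without the hash table there would be no way to find the member's position in the score-ordered structure short of a full scan.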

Having a hash map to keep track of elements in another data structure

Sometimes I have a data structure (ds) such as a queue, stack, or heap whose lookup ("get") time complexity is O(N). I use one of these in a program that just needs to check whether a certain element is in the structure, and because lookup is O(N), it becomes the bottleneck of my algorithm.
If memory isn't much of my worries, would it be poor design to have a hashmap which keeps track of the elements in the restricted data structure? Doing this would essentially remove the O(N) restriction and allow it to be O(1).
Having a supplemental hash table is warranted in many situations. However, maintaining a "parallel hash" could become a liability.
The situation that you describe, when you need to check membership quickly, is often modeled with a hash-based set (HashSet<T>, std::unordered_set<T>, and so on, depending on the language). The disadvantage of these structures is that the order of elements is not specified, and that they cannot have duplicates.
Depending on the library, you may have access to data structures that fix these shortcomings. For example, Java offers LinkedHashSet<T> which provides a predictable order of enumeration, and C++ provides std::unordered_multiset<T>, which allows duplicates.
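As a rough illustration of the supplemental-index idea, here is a small Python sketch of a FIFO queue that keeps a count of each element alongside it, so membership checks are O(1) on average and duplicates are still allowed. The class name `IndexedQueue` and its methods are hypothetical. The point to notice is that every mutation has to update both structures, which is exactly the maintenance liability mentioned above.

```python
from collections import Counter, deque


class IndexedQueue:
    def __init__(self):
        self._items = deque()     # the actual FIFO order
        self._counts = Counter()  # element -> number of copies in the queue

    def enqueue(self, item):
        self._items.append(item)
        self._counts[item] += 1

    def dequeue(self):
        item = self._items.popleft()
        self._counts[item] -= 1
        if self._counts[item] == 0:
            # Drop exhausted keys so membership stays accurate.
            del self._counts[item]
        return item

    def __contains__(self, item):
        return item in self._counts

    def __len__(self):
        return len(self._items)
```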

Efficient nested priority queue

I'm looking for a data-structure/algorithm (not multithreaded) which is essentially a nested priority queue. That is:
The next element to be taken is the one with the highest priority.
An element can either be a simple element with a priority, or it can be another priority queue (though a limit of one level of nesting is fine for my purposes). Regardless of the level of nesting, the element with the highest priority across the queue/sub-queues/sub-sub-queues/etc. is the one chosen next.
Elements can be added or deleted at any level, though a simple node never turns into a sub-queue (or vice versa).
The priority of a simple element doesn't change after being inserted.
I haven't been able to come up with anything efficient/elegant, and Googling hasn't turned up anything.
I haven't actually built this, but I did some pretty extensive analysis on the idea and it seems like it should work. I call it a queue of queues. The reason I never built it is because the project I was building it for was canceled before I needed the queue.
First, I decided that a "simple element" would instead be a priority queue that contains a single element. Not having to manage two different types of elements simplified the design, and analysis showed that it shouldn't affect performance in any significant way.
Because a sub-queue's priority can change whenever a new item is added, or an item is removed from it, I elected to use a Pairing heap for the main queue and the subqueues. Pairing heap performs better than binary heap when you have to do a lot of priority changes. The problem with binary heap is that if you want to change an item's priority, you have to find the item first. In a binary heap, that's an O(n) operation. In pairing heap, the amortized cost of a priority change is O(log n) because you already have a reference to the node.
So the idea is, if you're adding a new sub-queue you just add it to the main queue and it'll get put in the proper place. If you're updating a sub-queue, you add or remove the item (which is O(log n) on the sub-queue), and then adjust the sub-queue's position in the main queue (which is O(log n) on the main queue).
All my analysis said that this should work quite well, although I'm still not sure how well it would work with multiple threads. I think I have a good idea how to synchronize access and not end up blocking the entire queue for every insertion and deletion, except for a very brief time. I guess I'll find out if I ever build it. It might be possible to create a lock-free concurrent pairing heap.
I selected Pairing heap because of its better performance in re-ordering keys, and also because it's much easier to implement than Fibonacci heap or many of the others, and although its asymptotic performance is slower than Fibonacci heap, its real-world performance is much, much better. The only drawback to me is that a Pairing heap will occupy more memory than an equivalent binary heap. It's the old time/space tradeoff.
Another option would be to implement a skip list priority queue, which also has O(log n) performance for insertion and changing priority. And I've seen lock-free concurrent skip list implementations. Implementing an efficient skip list isn't difficult in C, because it handles variable record sizes very well. In C# and other languages that don't allow you to build varying length structures, skip list can be a real memory hog.
As I said, I never actually built this thing, but all my research and design notes tell me that it should be reasonably easy to build and should perform quite well.
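For what it's worth, the queue-of-queues shape can be sketched in Python. This swaps in a different technique from the one described above: instead of pairing heaps with in-place priority changes, it uses binary heaps (`heapq`) and handles a sub-queue's changing priority by pushing a fresh entry into the main queue and skipping stale entries on the way out (lazy deletion). The class name `QueueOfQueues` and its methods are made up for illustration, and a simple element is modeled as a sub-queue holding one item, as in the design above.

```python
import heapq
import itertools


class QueueOfQueues:
    def __init__(self):
        self._subqueues = {}  # name -> heap of (-priority, seq, item)
        self._outer = []      # heap of (-top_priority, seq, name); may be stale
        self._seq = itertools.count()

    def _publish(self, name):
        # Advertise the sub-queue's current head in the main queue.
        neg_priority, seq, _ = self._subqueues[name][0]
        heapq.heappush(self._outer, (neg_priority, seq, name))

    def push(self, name, priority, item):
        sub = self._subqueues.setdefault(name, [])
        seq = next(self._seq)
        heapq.heappush(sub, (-priority, seq, item))
        if sub[0][1] == seq:
            # The new item became the head, so the sub-queue's priority changed.
            self._publish(name)

    def pop(self):
        while self._outer:
            neg_priority, seq, name = heapq.heappop(self._outer)
            sub = self._subqueues.get(name)
            if not sub or sub[0][0] != neg_priority or sub[0][1] != seq:
                continue  # stale entry: that head was already replaced or removed
            _, _, item = heapq.heappop(sub)
            if sub:
                self._publish(name)
            else:
                del self._subqueues[name]
            return item
        raise IndexError("pop from an empty queue")
```

Both the sub-queue update and the main-queue update are O(log n), matching the cost structure described above, at the price of some stale entries lingering in the main queue.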

Is there any practical usage of Doubly Linked List, Queues and Stacks?

I've been coding for quite some time now, and my work pertains to solving real-world business scenarios. However, I have not really come across any practical usage of some of the data structures like the Linked List, Queues, Stacks, etc.
Not even at the business framework level. Of course, there is the ubiquitous HashTable, ArrayList and of late the List...but is there any practical usage of some of the other basic data structures?
It would be great if someone gave a real-world solution where a Doubly Linked List "performs" better than the obvious easily usable counterpart.
Of course it’s possible to get by with only a Map (aka HashTable) and a List. A Queue is only a glorified List but if you use a Queue everywhere you really need a queue then your code gets a lot more readable because nobody has to guess what you are using that List for.
And then there are algorithms that work a lot better when the underlying data structure is not a plain List but a DoublyLinkedList due to the way they have to navigate the list. The same is valid for all other data structures: there’s always a use for them. :)
Stacks can be used for pairing (parsing), such as matching open brackets to closing brackets.
Queues can be used for messaging, or activity processing.
Linked list, or double linked lists can be used for circular navigation.
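The bracket-matching use of a stack mentioned in the list above can be sketched as follows. It is a minimal Python example; the function name `brackets_balanced` is made up, and a plain list serves as the stack.

```python
def brackets_balanced(text):
    """Return True if every bracket in text is closed in the right order."""
    closer_to_opener = {")": "(", "]": "[", "}": "{"}
    openers = set(closer_to_opener.values())
    stack = []
    for ch in text:
        if ch in openers:
            stack.append(ch)
        elif ch in closer_to_opener:
            # The most recently opened bracket must match this closer.
            if not stack or stack.pop() != closer_to_opener[ch]:
                return False
    return not stack  # leftover openers mean something was never closed
```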
Most of these algorithms are usually at a lower level than your usual "business" application. For example, indices on a database are a variation of a multiply linked list. The implementation of a function-calling mechanism (or a parse tree) is a stack. Queues and FIFOs are used for servicing network requests, etc.
These are just examples of collection structures that are optimized for speed in various scenarios.
LIFO-Stack and FIFO-Queue are reasonably abstract (behavioral spec-level) data structures, so of course there are plenty of practical uses for them. For example, LIFO-Stack is a great way to help remove recursion (stack up the current state and loop, instead of making a recursive call); FIFO-Queue helps "buffer up" and "peel away" work nuggets in a coroutine arrangement; etc, etc.
Doubly-linked-List is more of an implementation issue than a behavioral spec-level one, mostly... it can be a good way to implement a FIFO-Queue, for example. If you need a sequence with fast splicing and removal given a pointer to one sequence item, you'll find plenty of other real-world uses, too.
I use queues, linked lists etc. in business solutions all the time.
Except they are implemented by Oracle, IBM, JMS etc.
These constructs are generally at a much lower level of abstraction than you would want while implementing a business solution. Where a business problem would benefit from such low-level constructs (e.g. delivery route planning, production line scheduling etc.) there is usually a package available to do it for you.
I don't use them very often, but they do come up. For example, I'm using a queue in a current project to process asynchronous character equipment changes that must happen in the order the user makes them.
A linked list is useful if you have a subset of "selected" items out of a larger set of items, where you must perform one type of operation on a "selected" item and a default operation or no operation at all on a normal item and the set of "selected" items can change at will (possibly due to user input). Because linked list removal can be done nearly instantaneously (vs. the traversal time it would take for an array search), if the subsets are large enough then it's faster to maintain a linked list than to either maintain an array or regenerate the whole subset by scanning through the whole larger set every time you need the subset.
With a hash table or binary tree, you could search for a single "selected" item, but you couldn't search for all "selected" items without checking every item (or having a separate dictionary for every permutation of selected items, which is obviously impractical).
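Here is a small Python sketch of the "selected items" idea, assuming each selected item keeps a reference to its own list node so it can be unlinked in O(1) without a search. The names `ListNode` and `SelectedList` are hypothetical, and a circular sentinel node is used to avoid special cases at the ends.

```python
class ListNode:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class SelectedList:
    def __init__(self):
        # Circular sentinel: an empty list points at itself.
        self._head = ListNode(None)
        self._head.prev = self._head
        self._head.next = self._head

    def add(self, value):
        """Append value and return its node, to be kept by the caller."""
        node = ListNode(value)
        last = self._head.prev
        last.next = node
        node.prev = last
        node.next = self._head
        self._head.prev = node
        return node

    def remove(self, node):
        """Unlink a node in O(1); no traversal is needed."""
        node.prev.next = node.next
        node.next.prev = node.prev
        node.prev = None
        node.next = None

    def __iter__(self):
        current = self._head.next
        while current is not self._head:
            yield current.value
            current = current.next
```

Iterating over the list visits only the selected items, and deselecting an item costs the same no matter how large the list is.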
A queue can be useful if you are in a scenario where you have a lot of requests coming in and you want to make sure to handle them fairly, in order.
I use stacks whenever I have a recursive algorithm, which usually means it's operating on some hierarchical data structure, and I want to print an error message if I run out of memory instead of simply letting the software crash if the program stack runs out of space. Instead of calling the function recursively, I store its local variables in an object, run a loop, and maintain a stack of those objects.
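As a minimal Python sketch of that pattern, the function below walks a hierarchical structure with an explicit stack instead of recursive calls. The tree layout (dicts with a `"size"` value and an optional `"children"` list) and the function name `total_size` are assumptions made purely for the example.

```python
def total_size(root):
    """Sum the "size" field over a whole tree without recursion."""
    total = 0
    stack = [root]  # pending nodes; replaces the call stack
    while stack:
        node = stack.pop()
        total += node["size"]
        # Children are deferred onto the explicit stack instead of being
        # visited through a recursive call.
        stack.extend(node.get("children", []))
    return total


example_tree = {
    "size": 1,
    "children": [
        {"size": 2},
        {"size": 3, "children": [{"size": 4}]},
    ],
}
```

Because the pending work lives in an ordinary list on the heap, a very deep hierarchy exhausts memory in a way the program can detect and report, rather than overflowing the program stack.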

Efficient reordering of large dataset to maximize memory cache effectiveness

I've been working on a problem which I thought people might find interesting (and perhaps someone is aware of a pre-existing solution).
I have a large dataset consisting of a long list of pairs of pointers to objects, something like this:
[
(a8576, b3295),
(a7856, b2365),
(a3566, b5464),
...
]
There are way too many objects to keep in memory at any one time (potentially hundreds of gigabytes), so they need to be stored on disk, but can be cached in memory (probably using an LRU cache).
I need to run through this list processing every pair, which requires that both objects in the pair be loaded into memory (if they aren't already cached there).
So, the question: is there a way to reorder the pairs in the list to maximize the effectiveness of an in-memory cache (in other words: minimize the number of cache misses)?
Notes
Obviously, the re-ordering algorithm should be as fast as possible, and shouldn't depend on being able to have the entire list in memory at once (since we don't have enough RAM for that) - but it could iterate over the list several times if necessary.
If we were dealing with individual objects, not pairs, then the simple answer would be to sort them. This obviously won't work in this situation because you need to consider both elements in the pair.
The problem may be related to that of finding a minimum graph cut, but even if the problems are equivalent, I don't think solutions to min-cut meet
My assumption is that the heuristic would stream the data off the disk, and write it back in chunks in a better order. It may need to iterate over this several times.
Actually it may not just be pairs, it could be triplets, quadruplets, or more. I'm hoping that an algorithm that does this for pairs can be easily generalized.
Your problem is related to a similar one for computer graphics hardware:
When rendering indexed vertices in a triangle mesh, typically the hardware has a cache of most recently transformed vertices (~128 the last time I had to worry about it, but suspect the number is larger these days). Vertices not cached need a relatively expensive transform operation to calculate. "Mesh optimisation" to restructure triangle meshes to optimise cache usage used to be a pretty hot research topic. Googling
vertex cache optimisation
(or optimization :^) might find you some interesting material relevant to your problem. As other posters suggest, I suspect doing this effectively will depend on exploiting any inherent coherence in your data.
Another thing to bear in mind: as an LRU cache becomes overloaded it can be well worth changing to an MRU replacement strategy to at least hold some of the items in memory (rather than turning over the entire cache each pass). I seem to remember John Carmack has written some good material on this subject in connection with Direct3D texture caching strategies.
For a start, you could mmap the list. That works if there's enough address space, not memory, e.g. on 64-bit CPUs. This makes it easier to access the elements in order.
You could sort that list according to a minimum distance in cache which considers both elements, which works well if the objects are in a contiguous space. The sorting function could be something like: compare (a, b) to (c, d) = (a - c) + (b - d) (which resembles a Manhattan distance without the absolute values). Then you pull in slices of the object store and process according to the list.
EDIT: fixed a mistake in the distance.
Even though you're not just sorting this list, the general pattern of a multiway merge sort might be applicable - that is, consider some kind of (possibly recursive) breakdown of the set into smaller sets that can be dealt with in memory separately, and then a second phase where small chunks of the previously dealt-with sets can all be combined together. Even not knowing the specific nature of what you're doing with the pairs, it's safe to say that many algorithmic problems are made much more straightforward when you're dealing with sorted data (including graph problems, which might be what you have on your hands here).
I think the answer to this question is going to depend very heavily on exactly the access pattern of the pair of objects. As you said, just sorting the pointers would be best in a simple, non-paired case. In a more complex case it may still make sense to sort by one of the halves of the pair if the pattern is such that locality for those values is more important (if, for example, these are key/value pairs and you are doing a lot of searches, locality for the keys is infinitely more important than for the values).
So, really, my answer is that this question can't be answered in a general case.
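Whatever ordering heuristic you try, it helps to be able to measure it. The Python sketch below simulates an LRU cache and counts the misses a given ordering of pairs would cause, so a candidate ordering (here, simply sorting by the first element of each pair) can be compared against the original on a sample. The function name `count_misses`, the toy object names, and the cache capacity are all made up for illustration.

```python
from collections import OrderedDict


def count_misses(pairs, capacity):
    """Count LRU cache misses when processing pairs in the given order."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for pair in pairs:
        for obj in pair:
            if obj in cache:
                cache.move_to_end(obj)  # mark as most recently used
            else:
                misses += 1
                cache[obj] = True
                if len(cache) > capacity:
                    cache.popitem(last=False)  # evict least recently used
    return misses


original = [("a1", "b1"), ("a2", "b2"), ("a1", "b2"), ("a2", "b1")]
reordered = sorted(original)  # groups pairs that share their first object

misses_before = count_misses(original, capacity=3)
misses_after = count_misses(reordered, capacity=3)
```

On this toy input the sorted order saves one miss. Whether sorting by one half helps on real data depends entirely on the access pattern, as noted above, which is why measuring on a sample is worthwhile.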
For storing your structure, what you actually want is probably a B-tree. These are designed for what you're talking about--keeping track of large collections where you don't want to (or can't) keep the whole thing in memory.
