What is the best data structure for storing a partially ordered set (poset)?

I am looking for a data structure for storing a poset, that supports the following operations with good big-Oh complexity (these are done frequently):
determining if a given element can be consistently ordered before another one (top priority above others)
adding new ordering constraints
returning all elements that can be ordered before a given element
The data structure must also support generating a totally ordered set from the poset, but that only needs to be done once for the output after the algorithm has already run.
So far I have been relying on a naive poset implementation, but I am reaching a point where that is no longer sustainable.
I have looked at https://cstheory.stackexchange.com/questions/8503/what-persistent-data-structure-for-a-set-of-partially-ordered-elements, and there are some interesting papers in the answers, but most of these data structures (like ChainMerge) seem to be optimized with sorting in mind, whereas for me sorting is the least important step.


Data structure for non overlapping ranges of integers?

I remember learning a data structure that stored a set of integers as ranges in a tree, but it's been 10 years and I can't remember the name of the data structure, and I'm a bit fuzzy on the details. If it helps, it's a functional data structure that was taught at CMU, I believe in 15-212 (Principles of Programming) in 2002.
Basically, I want to store a set of integers, most of which are consecutive. I want to be able to query for set membership efficiently, add a range of integers efficiently, and remove a range of integers efficiently. In particular, I don't care to preserve what the original ranges are. It's better if adjacent ranges are coalesced into a single larger range.
A naive implementation would be to simply use a generic set data structure such as a HashSet or TreeSet, and add all integers in a range when adding a range, or remove all integers in a range when removing a range. But of course, that would waste a lot of memory in addition to making add and remove slow.
I'm thinking of a purely functional data structure, but for my current use I don't need it to be. IIRC, lookup, insertion, and deletion were all O(log N), where N was the number of ranges in the set.
So, can you tell me the name of the data structure I'm trying to remember, or a suitable alternative?
I found the old homework, and the data structure I had in mind was the Discrete Interval Encoding Tree, or diet for short. Diets are described in detail in Diets for Fat Sets, Martin Erwig. Journal of Functional Programming, Vol. 8, No. 6, 627-632, 1998. A diet is basically a tree of intervals with the invariant that all of the intervals are non-overlapping and non-touching. There is a Haskell implementation on Hackage. I was hoping there would be an existing implementation for Scala, but I'm not seeing any.
The homework also included another data structure they called a Recursive Interval-Occluding Tree (RIOT), which rather than keeping only an interval at each node keeps an interval and another (possibly empty) RIOT of things removed from the interval. The assignment included benchmarks showing it did better than diets for random insertions and deletions. AFAICT it is simply something the TAs made up and never published as it no longer seems to exist anywhere on the Internets, at least not under that name.
You probably are looking for segment trees. This might be helpful: http://www.topcoder.com/tc?d1=tutorials&d2=lowestCommonAncestor&module=Static
You can also use binary search trees for the same, for which each node will have two data fields: min_val and max_val.
During insertion, you also run a merge step that checks whether the new range touches or overlaps its neighboring ranges (its in-order predecessor and successor), and if so combines them into a single node. In a balanced tree this takes O(log n) time.
Other operations like deletion and look-up take O(log n) time as usual, but deletion needs special care: removing a range from the middle of a node splits that node into two.
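To make the merge and split steps concrete, here is a minimal sketch. Note that it is not a tree: as an assumption for brevity it keeps the ranges in two sorted Python lists and uses binary search, so lookup is O(log n) but add and remove pay O(n) for list splicing. The class name RangeSet and its methods are made up for illustration. The same coalescing logic would carry over to a balanced tree with min_val and max_val fields.

```python
import bisect


class RangeSet:
    """Set of integers stored as sorted, disjoint, non-touching closed ranges."""

    def __init__(self):
        self.starts = []  # starts[i] is the low end of range i
        self.ends = []    # ends[i] is the high end of range i (inclusive)

    def contains(self, x):
        # Last range whose start is <= x, if any.
        i = bisect.bisect_right(self.starts, x) - 1
        return i >= 0 and x <= self.ends[i]

    def add(self, lo, hi):
        # Ranges i..j-1 overlap or touch [lo, hi] and get coalesced with it.
        i = bisect.bisect_left(self.ends, lo - 1)
        j = bisect.bisect_right(self.starts, hi + 1)
        if i < j:
            lo = min(lo, self.starts[i])
            hi = max(hi, self.ends[j - 1])
        self.starts[i:j] = [lo]
        self.ends[i:j] = [hi]

    def remove(self, lo, hi):
        # Ranges i..j-1 overlap [lo, hi]; keep only the parts sticking out.
        i = bisect.bisect_left(self.ends, lo)
        j = bisect.bisect_right(self.starts, hi)
        if i >= j:
            return
        new_starts, new_ends = [], []
        if self.starts[i] < lo:       # leftover piece on the left
            new_starts.append(self.starts[i])
            new_ends.append(lo - 1)
        if self.ends[j - 1] > hi:     # leftover piece on the right
            new_starts.append(hi + 1)
            new_ends.append(self.ends[j - 1])
        self.starts[i:j] = new_starts
        self.ends[i:j] = new_ends

    def ranges(self):
        return list(zip(self.starts, self.ends))


s = RangeSet()
s.add(1, 5)
s.add(7, 10)
s.add(6, 6)        # bridges the gap, so everything coalesces
print(s.ranges())  # [(1, 10)]
s.remove(4, 6)     # removing from the middle splits the range
print(s.ranges())  # [(1, 3), (7, 10)]
```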

When to use a treap

Can anyone provide real examples of when a treap is the best way to store your data?
I want to understand in which situations a treap will be better than heaps and other tree structures.
If it's possible, please provide some examples from real situations.
I've tried to search cases of using treaps here and by googling, but did not find anything.
Thank you.
If hash values are used as priorities, treaps provide a unique representation of the content.
Consider an ordered set of items implemented as an AVL tree or red-black tree. Inserting items in different orders will typically produce trees with different shapes (although all of them are balanced). For a given content, a treap will always have the same shape regardless of history.
I have seen two reasons for why unique representation could be useful:
Security reasons. A treap can not contain information on history.
Efficient sub tree sharing. The fastest algorithms for set operations I have seen use treaps.
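The unique-representation property can be checked with a small sketch. The assumptions here are mine, not part of the answer above: priorities are the SHA-256 digest of the key's repr, the treap is a max-heap on priority, and the helper names (insert, shape) are made up for illustration.

```python
import hashlib


class Node:
    def __init__(self, key):
        self.key = key
        # Assumption: the priority is a hash of the key, so it depends only
        # on the key and never on when the key was inserted.
        self.priority = hashlib.sha256(repr(key).encode()).digest()
        self.left = None
        self.right = None


def rotate_right(node):
    left = node.left
    node.left, left.right = left.right, node
    return left


def rotate_left(node):
    right = node.right
    node.right, right.left = right.left, node
    return right


def insert(node, key):
    """Standard BST insert, then rotate up to restore the max-heap on priority."""
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
        if node.left.priority > node.priority:
            node = rotate_right(node)
    elif key > node.key:
        node.right = insert(node.right, key)
        if node.right.priority > node.priority:
            node = rotate_left(node)
    return node  # duplicate keys are ignored


def shape(node):
    """Nested-tuple picture of the tree, used to compare two treaps."""
    if node is None:
        return None
    return (shape(node.left), node.key, shape(node.right))


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


ascending = build(range(20))
descending = build(reversed(range(20)))
print(shape(ascending) == shape(descending))  # True
```

With distinct keys and distinct priorities, the root must be the key with the highest priority, and the same argument applies recursively to both subtrees, which is why insertion order cannot matter.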
I can not provide you any real-world examples. But I do use treaps to solve some problems in programming contests:
These are not actually real problems, but they make sense.
You can use it as a tree-based map implementation. Depending on the application, it could be faster. A couple of years ago I implemented a Treap and a Skip list myself (in Java) just for fun and did some basic benchmarking comparing them to TreeMap, and the Treap was the fastest. You can see the results here.
One of its greatest advantages is that it's very easy to implement, compared to Red-Black trees, for example. However, as far as I remember, it doesn't have a guaranteed cost in its operations (search is O(log n) with high probability), in comparison to Red-Black trees, which means that you wouldn't be able to use it in safety-critical applications where a specific time bound is a requirement.
Treaps are an awesome variant of the balanced binary search tree. There are many algorithms for balancing binary trees, but most of them are horrible things with tons of special cases to handle. Treaps, on the other hand, are very easy to code. By making some use of randomness, we get a balanced binary tree whose height is expected to be logarithmic.
Some good problems to solve using treaps are --
http://www.spoj.com/problems/QMAX3VN/ ( Easy level )
http://www.spoj.com/problems/GSS6/ ( Moderate level )
Let's say you have a company and you want to create an inventory tool:
Be able to (efficiently) search products by name so you can update the stock.
Get, at any time, the product with the fewest items in stock, so that you are able to plan your next order.
One way to handle these requirements could be by using two different
data structures: one for efficient search by name, for instance, a
hash table, and a priority queue to get the item that most urgently
needs to be resupplied. You have to manage to coordinate those two
data structures, and you will need more than twice the memory. If we sort
the list of entries according to name, we need to scan the whole list
to find a given value for the other criterion, in this case, the
quantity in stock. Also, if we use a min-heap with the scarcer
products at its top, then we will need linear time to scan the whole
heap looking for a product to update.
A treap is a blend of a tree and a heap. The idea is to enforce BST’s
constraints on the names, and heap’s constraints on the quantities.
Product names are treated as the keys of a binary search tree.
The inventory quantities, instead, are treated as priorities of a
heap, so they define a partial ordering from top to bottom. For
priorities, like all heaps, we have a partial ordering, meaning that
only nodes on the same path from the root to leaves are ordered with
respect to their priority. In the above image, you can see that
children nodes always have a higher stock count than their parents,
but there is no ordering between siblings.
Any subtree of a treap is also a treap (i.e. it satisfies the BST rule as well as the min- or max-heap rule). Because of this property, an ordered list can be split, or multiple ordered lists can be merged, more easily with treaps than with a red-black tree. The implementation is easier, and so is the design.
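Here is a minimal sketch of split and merge, assuming random priorities and a max-heap on priority (the names Node, split, merge, and insert are illustrative). Note that merge here joins two treaps where every key in the first is no larger than every key in the second; it is not a general set union.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()
        self.left = None
        self.right = None


def split(node, key):
    """Split into (treap with keys < key, treap with keys >= key)."""
    if node is None:
        return None, None
    if node.key < key:
        # node and its left subtree belong to the left part
        left, right = split(node.right, key)
        node.right = left
        return node, right
    # node and its right subtree belong to the right part
    left, right = split(node.left, key)
    node.left = right
    return left, node


def merge(a, b):
    """Join two treaps, assuming every key in a is <= every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = merge(a.right, b)
        return a
    b.left = merge(a, b.left)
    return b


def insert(root, key):
    # Insertion falls out of the two primitives: split, then join three pieces.
    left, right = split(root, key)
    return merge(merge(left, Node(key)), right)


def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.key] + in_order(node.right)


root = None
for k in [7, 2, 9, 4, 1, 8]:
    root = insert(root, k)
low, high = split(root, 5)
print(in_order(low), in_order(high))  # [1, 2, 4] [7, 8, 9]
```

Both operations walk a single root-to-leaf path, so they run in expected O(log n) time, and every result is again a valid treap because each piece is assembled from existing subtrees.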

What is the proper data structure to store self-sorting list with repeating keys?

I need something that will work in O(log(n)) complexity, and I thought about AVL trees, but the problem is that some keys may repeat themselves (score of a person for example), so I can't think of how to implement it as a tree. What is a proper way to do this?
There are many options available. Most flavors of binary search trees can easily be modified to allow for nodes with duplicated values, since the balancing operations (usually) purely consist of rotations, which keep the sequence in order. For cases like these, you'd just do a normal BST insertion, but every time you see a duplicated value, you just arbitrarily move to the left or the right and continue as if the value were distinct.
Skiplists are particularly easy to update to support multiple copies of each key, since they don't do any complicated structural updates on insertions or deletions.
If you don't have auxiliary information associated with each key, then another, simpler option is to store a standard binary search tree, but to augment each node with a "count" field indicating how many logical copies of that key exist. Every time you do an insertion, if the key doesn't exist, you create it with count 1. If it already exists, you just increment the count in the existing node. Deletions would be implemented analogously.
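A rough sketch of the count-field idea follows. To keep it short it assumes a plain, unbalanced BST (a real version would add AVL or red-black rebalancing to get the O(log n) bound), and deletion simply decrements the count and leaves a zero-count node in place. All names are illustrative.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.count = 1      # how many logical copies of this key are stored
        self.left = None
        self.right = None


def insert(node, key):
    if node is None:
        return Node(key)
    if key < node.key:
        node.left = insert(node.left, key)
    elif key > node.key:
        node.right = insert(node.right, key)
    else:
        node.count += 1     # duplicate: no new node, just bump the counter
    return node


def find(node, key):
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node


def count(root, key):
    node = find(root, key)
    return node.count if node else 0


def remove_one(root, key):
    """Remove one copy; a node whose count reaches 0 stays as a placeholder."""
    node = find(root, key)
    if node is None or node.count == 0:
        return False
    node.count -= 1
    return True


def in_order(node):
    if node is None:
        return []
    return in_order(node.left) + [node.key] * node.count + in_order(node.right)


root = None
for score in [70, 85, 70, 90, 85, 70]:
    root = insert(root, score)
print(in_order(root))  # [70, 70, 70, 85, 85, 90]
```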
Of course, if you don't want to roll your own data structure, just go and find a good implementation of a multimap or multiset, which should get the job done for you quite nicely. Depending on your Programming Language of Choice, you might even find these in the standard libraries. :-)

Best Data Structure to Store Large Amounts of Data with Dynamic and Non-unique Keys?

Basically, I have a large number of C structs to keep track of, that are essentially:
struct Data {
    int key;
    ... // More data
};
I need to periodically access lots (hundreds) of these, and they must be sorted from lowest to highest key values. The keys are not unique and they will be changed over the course of the program. To make matters even more interesting, the majority of the structures will be culled (based on criteria completely unrelated to the key values) from the pool right before being sorted, but I still need to keep references to them.
I've looked into using a binary search tree to store them, but the keys are not guaranteed to be unique and I'm not entirely sure how to restructure the tree once a key is changed or how to cull specific structures.
To recap in case that was unclear above, I need to:
Store a large number of structures with non-unique and dynamic keys.
Cull a large percentage of the structures (but not free them entirely because different structures are culled each time).
Sort the remaining structures from lowest to highest key value.
What data structure/algorithms would you use to solve this problem? The method needs to be as fast and/or memory efficient as possible, since this is a real-time application.
EDIT: The culling is done by iterating over all of the objects and making a decision for each one. The keys change between the culling/sorting runs. I should have stated that they don't change a lot, but they do change, and they can change multiple times between the culling/sorting runs. (If it helps, the key for each structure is actually a z-order for a Sprite. They need to be sorted before each drawing loop so the Sprites with lower z-orders are drawn first.)
Just stick 'em all in a big array.
When the time comes to do the cull and sort, start by doing the sort. Do an insertion sort. That's right - nothing clever, just an insertion sort.
After the sort, go through the sorted array, and for each object, make the culling decision, then immediately output the object if it isn't culled.
This is about as memory-efficient as it gets. It should also require very little computation: there's no bookkeeping on updates between cull/sort passes, and the sort will be cheap, because insertion sort is adaptive and, for an almost-sorted array like this, runs in close to O(n) time. The one thing it doesn't do well is exploit cache locality: there will be two separate passes over the array, one for the sort and one for the cull/output.
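In code, the sort-then-cull pass might look roughly like this. It's only a sketch: the sprites are assumed to be dicts with a "z" field, and is_culled stands in for whatever culling test the application actually uses.

```python
def insertion_sort_by_key(items, key):
    """In-place, stable insertion sort; close to O(n) on almost-sorted input."""
    for i in range(1, len(items)):
        current = items[i]
        j = i - 1
        # Shift larger elements one slot right until current fits.
        while j >= 0 and key(items[j]) > key(current):
            items[j + 1] = items[j]
            j -= 1
        items[j + 1] = current


def visible_in_draw_order(sprites, is_culled):
    # Pass 1: re-sort the whole array (cheap if keys changed only a little).
    insertion_sort_by_key(sprites, key=lambda s: s["z"])
    # Pass 2: walk the sorted array and emit everything that survives the cull.
    return [s for s in sprites if not is_culled(s)]


sprites = [
    {"name": "background", "z": 0, "on_screen": True},
    {"name": "tree", "z": 5, "on_screen": False},
    {"name": "player", "z": 3, "on_screen": True},  # z changed since last frame
    {"name": "hud", "z": 9, "on_screen": True},
]
drawn = visible_in_draw_order(sprites, is_culled=lambda s: not s["on_screen"])
print([s["name"] for s in drawn])  # ['background', 'player', 'hud']
```

Because the array stays sorted between frames, the next pass only has to fix up the few elements whose keys moved.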
If you demand more cleverness, then instead of an insertion sort, you could use another adaptive, in-place sort that's faster. Timsort and smoothsort are good candidates; both are utterly fiendish to implement.
The big alternative to this is to only sort unculled objects, using a secondary, temporary list of such objects which you sort (or keep in a binary tree or whatever). But the thing is, if the keys don't change that much, then the win you get from using an adaptive sort on an almost-sorted array will (I reckon!) outweigh the win you would get from sorting a smaller dataset. It's O(n) vs O(n log n).
The general solution to this type of problem is to use a balanced search tree (e.g. AVL tree, red-black tree, B-tree), which guarantees O(log n) time (almost constant, but not quite) for insertion, deletion, and lookup, where n is the number of items currently stored in the tree. Guaranteeing no key is stored in the tree twice is quite trivial, and is done automatically by many implementations.
If you're working in C++, you could try using std::map<int, yourtype>, or std::multimap<int, yourtype> if several structs need to share the same key. If in C, find or implement some simple binary search tree code, and see if it's fast enough.
However, if you use such a tree and find it's too slow, you could look into some more fine-tuned approaches. One might be to put your structs in one big array, radix sort by the integer key, cull on it, then re-sort per pass. Another approach might be to use a Patricia tree.

Self-sorted data structure with random access

I need to implement self-sorted data structure with random access. Any ideas?
A binary search tree is a self-sorted data structure. If you want one that is also self-balancing, an AVL tree is the way to go. Retrieval time will be O(lg n) for random access.
Maintaining a sorted list and accessing it arbitrarily requires at least O(lg N) per operation. So, look at AVL trees, red-black trees, treaps, or any similar data structure, and augment it to support random indexing. I suggest treaps since they are the easiest to understand and implement.
One way to augment the treap is to keep in each node the count of nodes in the subtree rooted at that node. You'll have to update the count whenever you modify the tree (e.g. on insertion or deletion).
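A minimal sketch of that augmentation, assuming a split/merge style treap with random priorities. The size field and the get_at helper are names made up for this example.

```python
import random


class Node:
    def __init__(self, key):
        self.key = key
        self.priority = random.random()
        self.size = 1       # number of nodes in the subtree rooted here
        self.left = None
        self.right = None


def size(node):
    return node.size if node else 0


def update(node):
    # Recompute the count after the node's children have changed.
    node.size = 1 + size(node.left) + size(node.right)
    return node


def split(node, key):
    """Split into (keys < key, keys >= key), fixing sizes on the way back up."""
    if node is None:
        return None, None
    if node.key < key:
        left, right = split(node.right, key)
        node.right = left
        return update(node), right
    left, right = split(node.left, key)
    node.left = right
    return left, update(node)


def merge(a, b):
    """Join two treaps, assuming every key in a is <= every key in b."""
    if a is None:
        return b
    if b is None:
        return a
    if a.priority > b.priority:
        a.right = merge(a.right, b)
        return update(a)
    b.left = merge(a, b.left)
    return update(b)


def insert(root, key):
    left, right = split(root, key)
    return merge(merge(left, Node(key)), right)


def get_at(node, index):
    """Return the index-th smallest key (0-based) using the subtree counts."""
    while node is not None:
        left_size = size(node.left)
        if index < left_size:
            node = node.left
        elif index == left_size:
            return node.key
        else:
            index -= left_size + 1
            node = node.right
    raise IndexError("index out of range")


root = None
for k in [5, 3, 8, 1, 9, 3]:
    root = insert(root, k)
print([get_at(root, i) for i in range(size(root))])  # [1, 3, 3, 5, 8, 9]
```

Indexing follows a single root-to-leaf path, so it costs expected O(log n), the same as insertion.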
I haven't been much involved with implementing data structures lately, so this is probably not an answer at all, but you should see "Introduction to Algorithms" by Thomas Cormen et al. That book has many "recipes" with explanations of the inner workings of many data structures.
On the other hand, you have to take into account how much time you want to spend writing an algorithm, the size of the input, and whether there is an actual need for a special kind of data structure.
I see one thing missing from the answers here: the skip list.
You get order automatically, there is a probabilistic element to search and creation.
Fits the question no worse than binary trees.
"Self-sorting" is a little bit too ambiguous. First of all:
What kind of data structure?
There are a lot of different data structures out there, such as:
Linked list
Double linked list
Binary tree
Hash set / map
And many more and each of them behave differently than others and have their benefits of course.
Now, not all of them could or should be self-sorting, such as the Stack, it would be weird if that one were self-sorting.
However, the Linked List and the Binary Tree could be self sorting, and for this you could sort it in different ways and on different times.
For Linked Lists
I would prefer insertion sort for this; you can read various good articles about it on wikis and elsewhere. I like the pasted link though. Look at it and try to understand the concept.
If you want to sort after insertion, i.e. at arbitrary times, you can implement a sorting algorithm other than insertion sort, such as bubble sort or quicksort. I would avoid bubble sort though: it's a lot slower, although it is easier to wrap your mind around.
Random Access
Randomness is a much-discussed topic, so have a read about how to perform good randomization and you will be on your way. If you have a linked list with a "getAt" method, you could just pick a random index between 0 and n - 1 and get the item at that index.
