When to use a treap - data-structures

Can anyone provide real examples of when is the best way to store your data is treap?
I want to understand in which situations treap will be better than heaps and tree structures.
If it's possible, please provide some examples from real situations.
I've tried to search cases of using treaps here and by googling, but did not find anything.
Thank you.

If hash values are used as priorities, treaps provide unique representation of the content.
Consider an order set of items implemented as an AVL-tree or rb-tree. Inserting items in different orders will typically end up in trees with different shapes (although all of them are balanced). For a given content a treap will always be of the same shape regardless of history.
I have seen two reasons for why unique representation could be useful:
Security reasons. A treap can not contain information on history.
Efficient sub tree sharing. The fastest algorithms for set operations I have seen use treaps.

I can not provide you any real-world examples. But I do use treaps to solve some problems in programming contests:
http://poj.org/problem?id=2761
http://poj.org/problem?id=3481
These are not actually real problems, but they make sense.

You can use it as a tree-based map implementation. Depending on the application, it could be faster. A couple of years ago I implemented a Treap and a Skip list myself (in Java) just for fun and did some basic benchmarking comparing them to TreeMap, and the Treap was the fastest. You can see the results here.
One of its greatest advantages is that it's very easy to implement, compared to Red-Black trees, for example. However, as far as I remember, it doesn't have a guaranteed cost in its operations (search is O(log n) with high probability), in comparison to Red-Black trees, which means that you wouldn't be able to use it in safety-critical applications where a specific time bound is a requirement.

Treaps are awesome variant of balanced binary search tree. There do exist many algorithms to balance binary trees, but most of them are horrible things with tons of special cases to handle. On the other hand , it is very easy to code Treaps.By making some use of randomness, we have a BBT that is expected to be of logarithmic height.
Some good problems to solve using treaps are --
http://www.spoj.com/problems/QMAX3VN/ ( Easy level )
http://www.spoj.com/problems/GSS6/ ( Moderate level )

Let's say you have a company and you want to create an inventory tool:
Be able to (efficiently) search products by name so you can update the stock.
Get, at any time, the product with the lowest items in stock, so that you are able to plan your next order.
One way to handle these requirements could be by using two different
data structures: one for efficient search by name, for instance, a
hash table, and a priority queue to get the item that most urgently
needs to be resupplied. You have to manage to coordinate those two
data structures and you will need more than twice memory. if we sort
the list of entries according to name, we need to scan the whole list
to find a given value for the other criterion, in this case, the
quantity in stock. Also, if we use a min-heap with the scarcer
products at its top, then we will need linear time to scan the whole
heap looking for a product to update.
Treap
Treap is the blend of tree and heap. The idea is to enforce BST’s
constraints on the names, and heap’s constraints on the quantities.
Product names are treated as the keys of a binary search tree.
The inventory quantities, instead, are treated as priorities of a
heap, so they define a partial ordering from top to bottom. For
priorities, like all heaps, we have a partial ordering, meaning that
only nodes on the same path from the root to leaves are ordered with
respect to their priority. In the above image, you can see that
children nodes always have a higher stock count than their parents,
but there is no ordering between siblings.
Reference

Any subtree in Treap is also a Treap (i.e. satisfies BST rule as well as min- or max- heap rule too). Due to this property, an ordered list can be easily split, or multiple ordered lists can be easily merged using Treaps than using an RB Tree. The implementation is easier. Design is also easier.

Related

Data structure for non overlapping ranges of integers?

I remember learning a data structure that stored a set of integers as ranges in a tree, but it's been 10 years and I can't remember the name of the data structure, and I'm a bit fuzzy on the details. If it helps, it's a functional data structure that was taught at CMU, I believe in 15-212 (Principles of Programming) in 2002.
Basically, I want to store a set of integers, most of which are consecutive. I want to be able to query for set membership efficiently, add a range of integers efficiently, and remove a range of integers efficiently. In particular, I don't care to preserve what the original ranges are. It's better if adjacent ranges are coalesced into a single larger range.
A naive implementation would be to simply use a generic set data structure such as a HashSet or TreeSet, and add all integers in a range when adding a range, or remove all integers in a range when removing a range. But of course, that would waste a lot of memory in addition to making add and remove slow.
I'm thinking of a purely functional data structure, but for my current use I don't need it to be. IIRC, lookup, insertion, and deletion were all O(log N), where N was the number of ranges in the set.
So, can you tell me the name of the data structure I'm trying to remember, or a suitable alternative?
I found the old homework and the data structure I had in mind were Discrete Interval Encoding Trees or diets for short. They are described in detail in Diets for Fat Sets, Martin Erwig. Journal of Functional Programming, Vol. 8, No. 6, 627-632, 1998. It is basically a tree of intervals with the invariant that all of the intervals are non-overlapping and non-touching. There is a Haskell implementation in Hackage. I was hoping there would be an existing implementation for Scala, but I'm not seeing any.
The homework also included another data structure they called a Recursive Interval-Occluding Tree (RIOT), which rather than keeping only an interval at each node keeps an interval and another (possibly empty) RIOT of things removed from the interval. The assignment included benchmarks showing it did better than diets for random insertions and deletions. AFAICT it is simply something the TAs made up and never published as it no longer seems to exist anywhere on the Internets, at least not under that name.
You probably are looking for segment trees. This might be helpful: http://www.topcoder.com/tc?d1=tutorials&d2=lowestCommonAncestor&module=Static
You can also use binary search trees for the same, for which each node will have two data fields: min_val and max_val.
During insertion algorithm, you just need to call another merging operation to check if the left-child,parent,right-child create a sequence, so as to club them into a single node. This will take O(log n) time.
Other operations like deletion and look-up will take O(log n) time as usual, but special measures need to be taken while deletion.

How do I balance a BK-Tree and is it necessary?

I am looking into using an Edit Distance algorithm to implement a fuzzy search in a name database.
I've found a data structure that will supposedly help speed this up through a divide and conquer approach - Burkhard-Keller Trees. The problem is that I can't find very much information on this particular type of tree.
If I populate my BK-tree with arbitrary nodes, how likely am I to have a balance problem?
If it is possibly or likely for me to have a balance problem with BK-Trees, is there any way to balance such a tree after it has been constructed?
What would the algorithm look like to properly balance a BK-tree?
My thinking so far:
It seems that child nodes are distinct on distance, so I can't simply rotate a given node in the tree without re-calibrating the entire tree under it. However, if I can find an optimal new root node this might be precisely what I should do. I'm not sure how I'd go about finding an optimal new root node though.
I'm also going to try a few methods to see if I can get a fairly balanced tree by starting with an empty tree, and inserting pre-distributed data.
Start with an alphabetically sorted list, then queue from the middle. (I'm not sure this is a great idea because alphabetizing is not the same as sorting on edit distance).
Completely shuffled data. (This relies heavily on luck to pick a "not so terrible" root by chance. It might fail badly and might be probabilistically guaranteed to be sub-optimal).
Start with an arbitrary word in the list and sort the rest of the items by their edit distance from that item. Then queue from the middle. (I feel this is going to be expensive, and still do poorly as it won't calculate metric space connectivity between all words - just each word and a single reference word).
Build an initial tree with any method, flatten it (basically like a pre-order traversal), and queue from the middle for a new tree. (This is also going to be expensive, and I think it may still do poorly as it won't calculate metric space connectivity between all words ahead of time, and will simply get a different and still uneven distribution).
Order by name frequency, insert the most popular first, and ditch the concept of a balanced tree. (This might make the most sense, as my data is not evenly distributed and I won't have pure random words coming in).
FYI, I am not currently worrying about the name-synonym problem (Bill vs William). I'll handle that separately, and I think completely different strategies would apply.
There is a lisp example in the article: http://cliki.net/bk-tree. About unbalancing the tree I think the data structure and the method seems to be complicated enough and also the author didn't say anything about unbalanced tree. When you experience unbalanced tree maybe it's not for you?

What is the proper data structure to store self-sorting list with repeating keys?

I need something that will work in O(log(n)) complexity, and I thought about AVL trees, but the problem is that some keys may repeat themselves (score of a person for example), so I can't think of how to implement it as a tree. What is a proper way to do this?
There are many options available. Most flavors of binary search trees can easily be modified to allow for nodes with duplicated values, since the balancing operations (usually) purely consist of rotations, which keep the sequence in order. For cases like these, you'd just do a normal BST insertion, but every time you see a duplicated value, you just arbitrarily move to the left or the right and continue as if the value were distinct.
Skiplists are particularly easy to update to support multiple copies of each key, since they don't do any complicated structural updates on insertions or deletions.
If you don't have auxiliary information associated with each key, then another simpler option would be to store a standard binary search tree, but to augment each node with a "count" field indicating how many logical copies of that field exist. Every time you do an insertion, if the key doesn't exist, you create it with count 1. If it already exists, you just increment the count in the existing node. Deletions would be implemented analogously.
Of course, if you don't want to roll your own data structure, just go and find a good implementation of a multimap or multiset, which should get the job done for you quite nicely. Depending on your Programming Language of Choice, you might even find these in the standard libraries. :-)

Useful data structure for the following case

Which can be the beste data structures for the following case.
1.Should have operations like search, insert and delete. Mostly searching activities will be there.Around 90% of the operations will be search and rest are delete and insert.
2 Insertion,deletion and searching will be based on the key of the objects. Each key will point to a object. The keys will be sorted.
Any suggestion for optimal data structure will be highly appreciated.
AVL tree, or at least BST.
If you want to acces often the same elements you might want to consider splay trees too.
(Should I explain why?)
Not sure by what you mean with "data structures"
I would suggest MySQL.
Read more here: WikiPedia
Self-balancing tree of sorts (AVL, RB), or a hash table.
My guess is that you want to optimize time. Overall, a red-black tree will have logarithmic-time performance in all three operations. It will probably be your best overall bet on execution time; however, red-black trees are complex to implement and require a node structure meaning they will be stored using more memory than the contained data itself requires.
You want a tree-backed Map; basically you just want a tree where the nodes are dynamically sorted ("self-balanced") by key, with your objects hanging off of each node with corresponding key.
If you would like an "optimal" data structure, that completely depends on the distribution of patterns of inputs you expect. The nice thing about a self-balancing tree is you don't really need to care too much about the pattern of inputs. If you really want the best-guess as-close-to-optimal as possible we know of, and you don't know much about the specific sequences of queries, you can use a http://en.wikipedia.org/wiki/Tango_tree which is O(log(log(N))-competitive. This grows so slowly that, for all practical purposes, you have something which performs no worse than effectively a constant factor from the best possible data structure you could have chosen.
However it's somewhat grungy to implement, you may just be better using a library for a self-balancing tree.
Python:
https://github.com/pgrafov/python-avl-tree/
Java:
If you're just Java, just use a TreeMap (red-black tree based) and ignore the implementation details. Most languages have similar data structures in their standard libraries.

Self-sorted data structure with random access

I need to implement self-sorted data structure with random access. Any ideas?
A self sorted data structure can be binary search trees. If you want a self sorted data structure and a self balanced one. AVL tree is the way to go. Retrieval time will be O(lgn) for random access.
Maintaining a sorted list and accessing it arbitrarily requires at least O(lgN) / operation. So, look for AVL, red-black trees, treaps or any other similar data structure and enrich them to support random indexing. I suggest treaps since they are the easiest to understand/implement.
One way to enrich the treap tree is to keep in each node the count of nodes in the subtree rooted at that node. You'll have to update the count when you modify the tree (eg: insertion/deletion).
I'm not too much involved lately with data structures implementation. Probably this answer is not an answer at all... you should see "Introduction to algorithms" written by Thomas Cormen. That book has many "recipes" with explanations about the inner workings of many data structures.
On the other hand you have to take into account how much time do you want to spend writing an algorithm, the size of the input and the if there is an actual necessity of an special kind of datastructure.
I see one thing missing from the answers here, the Skiplist
https://en.wikipedia.org/wiki/Skip_list
You get order automatically, there is a probabilistic element to search and creation.
Fits the question no worse than binary trees.
Self sorting is a little bit to ambigious. First of all
What kind of data structure?
There are a lot of different data structures out there, such as:
Linked list
Double linked list
Binary tree
Hash set / map
Stack
Heap
And many more and each of them behave differently than others and have their benefits of course.
Now, not all of them could or should be self-sorting, such as the Stack, it would be weird if that one were self-sorting.
However, the Linked List and the Binary Tree could be self sorting, and for this you could sort it in different ways and on different times.
For Linked Lists
I would preffere Insertion sort for this, you can read various good articles about this on both wikis and other places. I like the pasted link though. Look at it and try to understand the concept.
If you want to sort after it is inserted, i.e. on random times, well then you can just implement a sorting algororithm different than insertion sort maybe, bubblesort or maybe quicksort, I would avoid bubblesort though, it's a lot slower! But easier to gasp the mind around.
Random Access
Random is always something thats being discusses around so have a read about how to perform good randomization and you will be on your way, if you have a linked list and have a "getAt"-method, you could just randomize an index between 0 and n and get the item at that index.

Resources