What is faster in practice: Treap or Splay tree?

I've learned both the Treap and the Splay tree and solved a few problems using them.
In theory, their complexity is O(log n) on average, but in the worst case a Treap's complexity is O(n), while a Splay tree's is amortized O(log n).
In which case does the worst case occur in a Treap (since its priorities are randomly chosen), and is a Treap really slower than a Splay tree? I've solved some tasks on SPOJ with both a Splay tree and a Treap, and the solutions using a Treap were a bit faster (by around 0.2s) than the ones using a Splay tree. So which one is actually faster, which one should I mainly use, and when?

In practice, neither is really used. They are often way more complex than necessary and are mostly interesting academically and for programming contests. I've really only run into red-black trees and B-trees in production code; other types of balanced trees are extremely rare.
If you're finding that treaps are faster, then just use them: the O(n) worst-case performance is due to bad luck in the random priorities, not to adversarial input. Splay trees are slightly slower because in practice you have to "pay" for the amortization that brings the worst case down to O(log n).
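To make the "bad luck" point concrete, here is a minimal treap insert sketch in Java (illustrative names, not code from either solution): each key gets a random priority, and rotations keep the tree heap-ordered by priority while the keys stay in BST order. The tree only degenerates into an O(n) chain if the random priorities happen to come out in a degenerate order, which the input itself cannot force.

    import java.util.Random;

    // Minimal treap sketch: random priorities, heap order maintained by rotations.
    class Treap {
        private static final Random RNG = new Random();

        private static class Node {
            int key;
            int priority = RNG.nextInt();   // chosen by the structure, not the input
            Node left, right;
            Node(int key) { this.key = key; }
        }

        private Node root;

        public void insert(int key) { root = insert(root, key); }

        private Node insert(Node t, int key) {
            if (t == null) return new Node(key);
            if (key < t.key) {
                t.left = insert(t.left, key);
                if (t.left.priority > t.priority) t = rotateRight(t);
            } else {
                t.right = insert(t.right, key);
                if (t.right.priority > t.priority) t = rotateLeft(t);
            }
            return t;
        }

        private Node rotateRight(Node t) {   // lift the left child
            Node l = t.left;
            t.left = l.right;
            l.right = t;
            return l;
        }

        private Node rotateLeft(Node t) {    // lift the right child
            Node r = t.right;
            t.right = r.left;
            r.left = t;
            return r;
        }
    }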

Related

Is there any practical application of Tango Trees?

A balanced binary search tree gives an O(log(n)) guaranteed search time.
Tango trees achieve a search time of O(log(log(n))) while using a small amount of extra memory per node. While I understand that from a theoretical point of view log(n) versus log(log(n)) makes a huge difference, for the majority of practical applications it provides almost no advantage.
For example, even for a huge number like n = 10^20 (on the order of a hundred thousand petabytes of data), the difference between log(n) ≈ 66 and log(log(n)) ≈ 6 is pretty negligible. So is there any practical usage of a Tango tree?
tl;dr: no, use a splay tree instead.
Tango trees don't give you O(log log n) worst-case lookups -- the average case is, I think, O(log n log log n). What they do give you is a guarantee of running at most O(log log n) times more slowly than a binary tree with an oracle that performs rotations to optimize for the access pattern.
Splay trees might run only O(1) times more slowly than that theoretical magic tree -- this is the Dynamic Optimality conjecture. Splay trees are also much simpler than tango trees and will have lower constant factors to boot. I can't imagine a practical application where the tango tree guarantee would be useful.
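To give a sense of how simple splay trees are by comparison, here is a hedged sketch of the standard recursive splay step (zig-zig and zig-zag cases) in Java; names are illustrative and this is not tuned production code. A search simply splays the accessed key (or the last node on its search path) to the root.

    // Core splay operation sketch: search() splays the key to the root
    // and reports whether it was found.
    class SplayTree {
        private static class Node {
            int key;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        private Node root;

        public boolean search(int key) {
            root = splay(root, key);
            return root != null && root.key == key;
        }

        private static Node splay(Node t, int key) {
            if (t == null || t.key == key) return t;
            if (key < t.key) {
                if (t.left == null) return t;
                if (key < t.left.key) {                     // zig-zig
                    t.left.left = splay(t.left.left, key);
                    t = rotateRight(t);
                } else if (key > t.left.key) {              // zig-zag
                    t.left.right = splay(t.left.right, key);
                    if (t.left.right != null) t.left = rotateLeft(t.left);
                }
                return t.left == null ? t : rotateRight(t); // final zig
            } else {
                if (t.right == null) return t;
                if (key > t.right.key) {                    // zig-zig
                    t.right.right = splay(t.right.right, key);
                    t = rotateLeft(t);
                } else if (key < t.right.key) {             // zig-zag
                    t.right.left = splay(t.right.left, key);
                    if (t.right.left != null) t.right = rotateRight(t.right);
                }
                return t.right == null ? t : rotateLeft(t); // final zig
            }
        }

        private static Node rotateRight(Node t) {
            Node l = t.left;
            t.left = l.right;
            l.right = t;
            return l;
        }

        private static Node rotateLeft(Node t) {
            Node r = t.right;
            t.right = r.left;
            r.left = t;
            return r;
        }
    }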

Time complexity for binary search trees

If I use an insert() function for my BST, the time complexity can be as bad as O(n) and as good as O(log n). I'm assuming that if I had a perfectly balanced tree, the time complexity is O(log n) because I am able to ignore half of the tree every time I go down a branch. And if my tree is completely unbalanced it would be O(n). Am I correct in thinking this?
Yes, that is correct; see e.g. Wikipedia, http://en.wikipedia.org/wiki/Binary_search_tree#Searching.
If you use e.g. the C++ STL std::map or std::set, you get a balanced red-black tree. Also worth noting is that with these STL data structures you get this performance 100% of the time, which can be very important in e.g. hard real-time systems. Hash tables are even faster on average, but are not fast 100% of the time like red-black trees are.
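To see where the O(n) case comes from, here is a minimal unbalanced BST insert in Java (a sketch with illustrative names, not library code). Inserting keys in sorted order makes every insert walk the entire right spine, so the tree degenerates into a list; a self-balancing tree such as java.util.TreeSet (red-black) keeps every insert and lookup at O(log n) instead.

    // Naive, unbalanced BST insert. Feeding it 1, 2, 3, ..., n in order
    // produces a right-leaning chain, so each insert/search costs O(n).
    class NaiveBst {
        private static class Node {
            int key;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        private Node root;

        public void insert(int key) {
            root = insert(root, key);
        }

        private Node insert(Node t, int key) {
            if (t == null) return new Node(key);
            if (key < t.key)      t.left  = insert(t.left, key);
            else if (key > t.key) t.right = insert(t.right, key);
            return t;                            // duplicates ignored
        }
    }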

How are insertion and deletion faster in a red-black tree than in an AVL tree?

I would like to understand the difference a bit better, but haven't found a source that breaks it down to my level.
I am aware that both trees require at most 2 rotations per insertion. Then how is insertion faster in red-black trees?
And why does insertion require O(log n) rotations in an AVL tree but only O(1) in a red-black tree?
Well, I don't know what your level is exactly, but to put it simply: red-black trees are less strictly balanced than AVL trees. In a red-black tree, the path from the root to the furthest leaf is no more than twice as long as the path from the root to the nearest leaf, while in an AVL tree there is never more than one level of height difference between two neighboring subtrees. This makes insertions and deletions slightly more costly in AVL trees but lookups faster. The asymptotic and worst-case behavior of the two data structures is identical, though: the runtime (not the number of rotations) is O(log n) for insertions in both cases, and the O(1) you mentioned is the so-called amortized cost of the rebalancing.
See this paragraph for a short comparison of the two data structures.
Insertion and deletion are not actually faster in red-black trees. This is a common assumption, based on the fact that red-black trees perform slightly fewer rotations on average per insert than AVL trees (about 0.6 vs. 0.7).
You can check for yourself in Java by comparing TreeMap (red-black) to this implementation of an AVL-based TreeMap, which gives you exact numbers instead of the common but incorrect assumptions: https://github.com/dmcmanam/bbst-showdown
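To illustrate the rotation counts being discussed, here is a hedged AVL insert sketch in Java (illustrative names, not the TreeMapAVL code from the linked repository): the recursion updates heights and checks balance factors along an O(log n) path, but performs at most one single or one double rotation per insert.

    // AVL insert sketch: O(log n) height updates, at most one (double) rotation.
    class AvlTree {
        private static class Node {
            int key, height = 1;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        private Node root;

        public void insert(int key) { root = insert(root, key); }

        private static int height(Node n)  { return n == null ? 0 : n.height; }
        private static int balance(Node n) { return height(n.left) - height(n.right); }
        private static void update(Node n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }

        private Node insert(Node t, int key) {
            if (t == null) return new Node(key);
            if (key < t.key)      t.left  = insert(t.left, key);
            else if (key > t.key) t.right = insert(t.right, key);
            else return t;                       // ignore duplicates
            update(t);
            return rebalance(t);
        }

        private Node rebalance(Node t) {
            int b = balance(t);
            if (b > 1) {                         // left-heavy
                if (balance(t.left) < 0) t.left = rotateLeft(t.left);     // left-right case
                return rotateRight(t);
            }
            if (b < -1) {                        // right-heavy
                if (balance(t.right) > 0) t.right = rotateRight(t.right); // right-left case
                return rotateLeft(t);
            }
            return t;                            // balanced: no rotation needed
        }

        private static Node rotateRight(Node t) {
            Node l = t.left;
            t.left = l.right;
            l.right = t;
            update(t); update(l);
            return l;
        }

        private static Node rotateLeft(Node t) {
            Node r = t.right;
            t.right = r.left;
            r.left = t;
            update(t); update(r);
            return r;
        }
    }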

Implementation of a priority queue using an AVL tree data structure

Priority queue:
Basic operations: insert, and delete (delete the minimum element).
Goal: to provide an efficient running time (order of growth) for these operations.
Implementations of a priority queue:
Linked list: insertion takes O(n) when inserting at the end, O(1) when inserting at the head; delete (find the minimum and remove it) takes O(n).
BST: insertion/deletion of the minimum takes O(log n) on average and O(n) in the worst case.
AVL tree: insertion/deletion/search take O(log n) in all cases.
My confusion is this:
Why don't we use an AVL tree to implement a priority queue? Why did we go for a binary heap, when we know that an AVL tree can do insertion/deletion/search in O(log n) even in the worst case?
Complexity isn't everything; there are other considerations for actual performance.
For most purposes, most people don't even use an AVL tree as their balanced tree of choice (red-black trees are more common, as far as I've seen), let alone as a priority queue.
This is not to say that AVL trees are useless; I quite like them. But they do have a relatively expensive insert. What AVL trees are good for (beating even red-black trees) is doing lots and lots of lookups without modification. That is not what you need for a priority queue.
As a separate consideration -- never mind the O(log n) insert of a binary heap, a Fibonacci heap has O(1) (amortized) insert and O(log n) delete-minimum. There are a lot of data structures to choose from, with slightly different trade-offs, so you wouldn't expect everyone to just pick the first thing that satisfies your (quite brief) criteria.
A binary heap is not a binary search tree (BST). A BST that is severely unbalanced and has deteriorated into a list will indeed take O(n) time, but heap operations are always O(log(n)) or better. IIRC Sedgewick claimed O(1) average time for array-based heaps.
Why not AVL? Because it maintains too much order in the structure, and too much order means too much effort went into maintaining it. The less order we can get away with, the better -- it usually translates into faster operations. For example, red-black trees (RBTs) do better than AVL trees here: RBTs are only approximately balanced, which saves rebalancing work while still guaranteeing O(log(n)) time.
But any BST is a totally ordered structure, so heaps are generally better still: they are only partially ordered and only guarantee that the minimal element is on top.
Because in a binary heap the minimum element is the root.
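To make the "only partially ordered" point concrete, here is a minimal array-based binary min-heap sketch in Java (illustrative, not java.util.PriorityQueue's actual code): the heap only enforces parent <= children, so the minimum always sits at index 0 and insert/delete-min just swap along a single root-to-leaf path, with no global rebalancing as in an AVL tree.

    import java.util.ArrayList;
    import java.util.List;

    // Array-based binary min-heap sketch.
    class MinHeap {
        private final List<Integer> a = new ArrayList<>();

        public void insert(int x) {
            a.add(x);
            int i = a.size() - 1;
            while (i > 0 && a.get((i - 1) / 2) > a.get(i)) {   // sift up
                swap(i, (i - 1) / 2);
                i = (i - 1) / 2;
            }
        }

        public int deleteMin() {                                // assumes non-empty heap
            int min = a.get(0);
            int last = a.remove(a.size() - 1);
            if (!a.isEmpty()) {
                a.set(0, last);
                int i = 0;
                while (true) {                                  // sift down
                    int l = 2 * i + 1, r = 2 * i + 2, smallest = i;
                    if (l < a.size() && a.get(l) < a.get(smallest)) smallest = l;
                    if (r < a.size() && a.get(r) < a.get(smallest)) smallest = r;
                    if (smallest == i) break;
                    swap(i, smallest);
                    i = smallest;
                }
            }
            return min;
        }

        private void swap(int i, int j) {
            int tmp = a.get(i);
            a.set(i, a.get(j));
            a.set(j, tmp);
        }
    }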

Data structure needed

After giving it some thought, I came to the conclusion that I need a data structure that supports:
A. Insert
B. Remove
C. Find
D. Delete minimum
Of course, I want to implement this with the best complexity I can.
My thinking is that a self-balancing binary search tree will do A-D in O(log(n)) (worst case).
Maybe this can be improved somehow, so that A-C are in O(log(n)) and D (which I think will be the most frequent operation) runs in O(1).
I'm doing a worst-case analysis, but if you can think of something that runs fast in an amortized or average-case sense, that's no problem.
Any improvement to what I have in mind is welcome!
(Note: I believe that A and D will be much more frequent than B and C.)
It needs to be some sort of sorted, balanced tree. It is not likely that any particular tree will be significantly better suited for the minimum deletion, as it will still require rebalancing anyway. All of the operations you ask for will be O(log(n)). Red-black trees are readily available in C++ and Java.
What you’re describing is a priority queue, augmented by a “find” operation.
It is usually implemented in terms of a min-heap. All of the operations you listed except “find” run in O(log n), and it is notably the most efficient overall data structure for this job. It is important to note that a binary heap is a special case of a binary tree that can be implemented much more efficiently than a general binary search tree, both in terms of memory consumption and performance (the same asymptotic performance, but much better constant factors).
Unfortunately, “find” still takes O(n).
It is implemented in Java in the PriorityQueue class.
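Here is a hedged usage sketch of both suggestions, using only the standard Java collections: TreeSet (a red-black tree) covers A-D in O(log n) each, while PriorityQueue (a binary heap) has cheaper constants for insert/delete-min but its find (contains) is a linear scan.

    import java.util.PriorityQueue;
    import java.util.TreeSet;

    // Comparing a balanced tree and a binary heap for the four operations.
    public class MinStructures {
        public static void main(String[] args) {
            TreeSet<Integer> tree = new TreeSet<>();
            tree.add(5); tree.add(1); tree.add(9);     // A: insert, O(log n)
            tree.remove(9);                            // B: remove, O(log n)
            boolean found = tree.contains(5);          // C: find, O(log n)
            int min = tree.pollFirst();                // D: delete minimum, O(log n)

            PriorityQueue<Integer> heap = new PriorityQueue<>();
            heap.add(5); heap.add(1); heap.add(9);     // insert, O(log n)
            int heapMin = heap.poll();                 // delete minimum, O(log n)
            boolean heapFound = heap.contains(5);      // find, O(n) linear scan
            System.out.println(found + " " + min + " " + heapMin + " " + heapFound);
        }
    }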
