Balanced binary search tree gives an O(log(n)) guaranteed search time.
Tango trees achieves a search of O(log(log(n)) while compromising small amount of memory per node. While I understand that from theoretical point of view log(n) and log(log(n)) makes a huge difference, for majority of practical applications it provides almost no advantage.
For example even for a huge number like n = 10^20 (which is like few thousand petabytes) the difference between log(n) = 64 and log(log(n)) = 6 is pretty negligible. So is there any practical usage of a Tango tree?
tl;dr: no, use a splay tree instead.
Tango trees don't give you O(log log n) worst case lookups -- the average case is I think O(log n log log n). What they do is run at most O(log log n) times more slowly than a binary tree with an oracle that performs rotations to optimize the access patterns.
Splay trees might run O(1) times more slowly than the aforementioned theoretical magic tree -- this is the Dynamic Optimality conjecture. Splay trees are much simpler than tango trees and will have lower constant factors to boot. I can't imagine a practical application where the tango tree guarantee would be useful.
Related
I've learned both Treap and Splay tree and solved few problems using them.
In theory, their complexity is O(log n) on average, but in worst-case Treap's complexity is O(n) while Splay tree's is amortized O(log n).
In which case does worst case occur in Treap (since its priorities are randomly chosen), and is Treap really slower than Splay tree? I've solved some tasks on SPOJ with both Splay tree and Treap, and solutions using Treap were a bit faster (around 0.2s) than ones using Splay tree. So which one is actually faster, and which one should I mainly use and when?
In practice, neither are really used. They are often way more complex than necessary. They're mostly interesting academically and for programming contests. I've really only run into red-black trees and B-trees in production code, other types of balanced trees are extremely rare.
If you're finding that treaps are faster, then just use them, as the O(n) worst case time performance is due to bad luck, not adversarial input. Splay trees are slightly slower because you have to "pay" for the amortization in practice to get the worst case down to O(log n).
What's the worst case time complexity in a log-structured merge tree for a simple search query (like querying a single WHERE clause)?
Is it O(log N)? O(N*Log N)? Something else?
How about for a multiple query, like searching for multiple WHERE clauses in a key-value database?
The wikipedia page on LSM trees is currently lacking this info.
And I'm trying to make sense of the original paper.
I have been wondering the same.
If you have a series of trees, getting smaller by a constant factor each time, and you need to search them all for a single key, the cost seems to be O(log(N)^2).
Say the first (binary) tree takes log_2(N) branches to reach a node. The second might be half the size, and take (log_2(N) - 1) branches to find a node. The smallest tree will be some O(1) constant in size and there are roughly log_2(N) trees total. Summing the series gives O(log_2(N)^2).
However, I'm wondering if there is some more clever scheme where arbitrary single-key lookups, insertions or deletions have amortized cost O(log(N)), but haven't been able to find an answer (yet).
For a simple search indexed by a LSM tree, it is O(log n). This is because the biggest tree in the LSM tree is a B tree, which is O(log n), and the other trees are subsets of B trees or in the case of in memory trees, more efficient trees, which are no worse than O(log n). The number of trees is a constant, so it doesn't affect the order of the search time.
Priority queue:
Basic operations: Insertion
Delete (Delete minumum element)
Goal: To provide efficient running time or order of growth for above functionality.
Implementation of Priority queue By:
Linked List: Insertion will take o(n) in case of insertion at end o(1) in case of
insertion at head.
Delet (Finding minumum and Delete this ) will take o(n)
BST:
Insertion/Deltion of minimum = In avg case it will take o(logn) worst case 0(n)
AVL Tree:
Insertion/deletion/searching: o(log n) in all cases.
My confusion goes here:
Why not we have used AVL Tree for implementation of Priority queue, Why we gone
for Binary heap...While as we know that in AVL Tree we can do insertion/ Deletion/searching in o(log n) in worst case.
Complexity isn't everything, there are other considerations for actual performance.
For most purposes, most people don't even use an AVL tree as a balanced tree (Red-Black trees are more common as far as I've seen), let alone as a priority queue.
This is not to say that AVL trees are useless, I quite like them. But they do have a relatively expensive insert. What AVL trees are good for (beating even Red-Black trees) is doing lots and lots of lookups without modification. This is not what you need for a priority queue.
As a separate consideration -- never mind your O(log n) insert for a binary heap, a fibonacci heap has O(1) insert and O(log N) delete-minimum. There are a lot of data structures to choose from with slightly different trade-offs, so you wouldn't expect to see everyone just pick the first thing that satisfies your (quite brief) criteria.
Binary heap is not Binary Search Tree (BST). If severely unbalanced / deteriorated into a list, it will indeed take O(n) time. Heaps are usually always O(log(n)) or better. IIRC Sedgewick claimed O(1) average-time for array-based heaps.
Why not AVL? Because it maintains too much order in a structure. Too much order means, too much effort went into maintaining that order. The less order we can get away with, the better - it will usually translate to faster operations. For example, RBTs are better than AVL trees. RBTs, red-black trees, are almost balanced trees - they save operations while still ensuring O(log(n)) time.
But any tree is totally-ordered structure, so heaps are generally better, because they only ensure that the minimal element is on top. They are only partially ordered.
Because in a binary heap the minimum element is the root.
After doing some thought I came to the conclusion that I require a data structure that supports:
Insert
Remove
Find
Delete minimum
of course I want to implement this in the best complexity I can.
My thoughts are that a Self-balancing binary search tree will do A-D in O(log(n)) (worst case).
Maybe this can be improved somehow so A-C will be in O(log(n)) and D (that I think will be more frequent) will run in O(1).
I do a worst case analysis, but if you can think of something that will run 'fast' but it's Amortized analysis or on average than it's no problem.
any improvement to what I have in mind is welcomed!
(note: I believe that A and D will be much more frequent that B and C)
It needs to be some sort of sorted, balanced tree. It is not likely that any tree will be significantly better suited for the minimum deletion, as it will still require re-balancing anyway. All of the operations you ask for will be O(log(n)). Red-black trees are readily available in C++ and Java.
What you’re describing is a priority queue, augmented by a “find” operation.
It is usually implemented in terms of a min-heap. All operations you listed, except “find”, run in O(log n), and it is notably the most efficient overall data structure for this job. It is important to note that this is a special case of a binary tree that can be implemented much more efficiently than a general binary search tree, both in terms of memory consumption and performance (same asymptotic performance but much better constant factors).
Unfortunately, “find” still takes O(n).
It is implemented in Java in the PriorityQueue class.
I am reading the basics of splay trees. The amortized cost of an operation is O(log n) over n operations. Some rough basic idea is that when you access a node, you splay it i.e. you take it to root so next time this is quickly accessed and also if the node was deep, it enhances balance-ness of tree.
I don't understand how the tree can perform amortized O(log n) for this sample input:
Say a tree of n nodes is already built. My next n operations are n reads. I access a deep node say at depth n. This takes O(n). True that after this access, the tree will become balanced. But say every time I access the most current deep node. This will never be less than O(log n). then how we can ever compensate for the first costly O(n) operation and bring the amortized cost of each read as O(log n)?
Thanks.
Assuming your analysis is correct and the operations are O(log(n)) per access and O(n) the first time...
If you always access the bottommost element (using some kind of worst-case oracle), a sequence of a accesses will take O(a*log(n) + n). And thus the amortized cost per operation is O((a*log(n) + n)/a)=O(log(n) + n/a) or just O(log(n)) as the number of accesses grows large.
This is the definition of asymptotic average-case performance/time/space, also called "amortized performance/time/space". You are accidentally thinking that a single O(n) step means all steps are at least O(n); one such step is only a constant amount of work in the long run; the O(...) is hiding what's really going on, which is taking the limit of [total amount of work]/[queries]=[average ("amortized") work per query].
This will never be less than O(log n).
It has to be in order to get O(log n) average performance. To get intuition, the following website may be good: http://users.informatik.uni-halle.de/~jopsi/dinf504/chap4.shtml specifically the image http://users.informatik.uni-halle.de/~jopsi/dinf504/splay_example.gif -- it seems that while performing the O(n) operations, you move the path you searched scrunching it towards the top of the tree. You probably only have a finite number of such O(n) operations to perform until the entire tree is balanced.
Here's another way to think about it:
Consider an unbalanced binary search tree. You can spend O(n) time balancing it. Assuming you don't add elements to it*, it takes O(log(n)) amortized time per query to fetch an element. The balancing setup cost is included in the amortized cost because it is effectively a constant which, as demonstrated in the equations in the answer, disappears (is dwarfed) by the infinite amount of work you are doing. (*if you do add elements to it, you need a self-balancing binary search tree, one of which is a splay tree)