time complexity for binary search trees - big-o

If I use an insert() function for my bst, the time complexity can be as bad as O(n) and as good as O(log n). I'm assumng that if I had a perfectly balanced tree, the time complexity is log n because I am able to ignore half of the tree every time I go down a "branch". And if my tree is completely unbalanced it would be O(n). Am I correct for thinking this?

Yes, that is correct, see e.g. wikipedia, http://en.wikipedia.org/wiki/Binary_search_tree#Searching.
If you use e.g. C++ STL std::map or std::set, you get a red-black, balanced tree. Also worth noting is that with these STL data structures, you get this performance 100% of the time, which can be very important in e.g. hard real-time systems. Hash tables are even faster, but are not fast a 100% of the time like the red-black trees.

Related

What is faster in practice: Treap or Splay tree?

I've learned both Treap and Splay tree and solved few problems using them.
In theory, their complexity is O(log n) on average, but in worst-case Treap's complexity is O(n) while Splay tree's is amortized O(log n).
In which case does worst case occur in Treap (since its priorities are randomly chosen), and is Treap really slower than Splay tree? I've solved some tasks on SPOJ with both Splay tree and Treap, and solutions using Treap were a bit faster (around 0.2s) than ones using Splay tree. So which one is actually faster, and which one should I mainly use and when?
In practice, neither are really used. They are often way more complex than necessary. They're mostly interesting academically and for programming contests. I've really only run into red-black trees and B-trees in production code, other types of balanced trees are extremely rare.
If you're finding that treaps are faster, then just use them, as the O(n) worst case time performance is due to bad luck, not adversarial input. Splay trees are slightly slower because you have to "pay" for the amortization in practice to get the worst case down to O(log n).

Splay Tree Worst Case Search Time

Since splay tree is a type of unbalanced binary search tree (brilliant.org/wiki/splay-tree), it cannot guarantee a height of at most O(log(n)). Thus, I would think it cannot guarantee a worst case search time of O(log(n)).
But according to bigocheatsheet.com:
Splay Tree has worst case search time of O(log(n))???
You’re correct; the cost of a lookup in a splay tree can reach Θ(n) for an imbalanced tree.
Many resources like the big-O cheat sheet either make simplifying assumptions or just have factually incorrect data in them. It’s unclear whether they were just wrong here, or whether they were talking amortized worst case, etc.
It’s always best to know the internals of the data structures you’re working with so that you can understand where the runtimes come from.

Implementation of priority queue by AVL Tree data structure

Priority queue:
Basic operations: Insertion
Delete (Delete minumum element)
Goal: To provide efficient running time or order of growth for above functionality.
Implementation of Priority queue By:
Linked List: Insertion will take o(n) in case of insertion at end o(1) in case of
insertion at head.
Delet (Finding minumum and Delete this ) will take o(n)
BST:
Insertion/Deltion of minimum = In avg case it will take o(logn) worst case 0(n)
AVL Tree:
Insertion/deletion/searching: o(log n) in all cases.
My confusion goes here:
Why not we have used AVL Tree for implementation of Priority queue, Why we gone
for Binary heap...While as we know that in AVL Tree we can do insertion/ Deletion/searching in o(log n) in worst case.
Complexity isn't everything, there are other considerations for actual performance.
For most purposes, most people don't even use an AVL tree as a balanced tree (Red-Black trees are more common as far as I've seen), let alone as a priority queue.
This is not to say that AVL trees are useless, I quite like them. But they do have a relatively expensive insert. What AVL trees are good for (beating even Red-Black trees) is doing lots and lots of lookups without modification. This is not what you need for a priority queue.
As a separate consideration -- never mind your O(log n) insert for a binary heap, a fibonacci heap has O(1) insert and O(log N) delete-minimum. There are a lot of data structures to choose from with slightly different trade-offs, so you wouldn't expect to see everyone just pick the first thing that satisfies your (quite brief) criteria.
Binary heap is not Binary Search Tree (BST). If severely unbalanced / deteriorated into a list, it will indeed take O(n) time. Heaps are usually always O(log(n)) or better. IIRC Sedgewick claimed O(1) average-time for array-based heaps.
Why not AVL? Because it maintains too much order in a structure. Too much order means, too much effort went into maintaining that order. The less order we can get away with, the better - it will usually translate to faster operations. For example, RBTs are better than AVL trees. RBTs, red-black trees, are almost balanced trees - they save operations while still ensuring O(log(n)) time.
But any tree is totally-ordered structure, so heaps are generally better, because they only ensure that the minimal element is on top. They are only partially ordered.
Because in a binary heap the minimum element is the root.

Data structure needed

After doing some thought I came to the conclusion that I require a data structure that supports:
Insert
Remove
Find
Delete minimum
of course I want to implement this in the best complexity I can.
My thoughts are that a Self-balancing binary search tree will do A-D in O(log(n)) (worst case).
Maybe this can be improved somehow so A-C will be in O(log(n)) and D (that I think will be more frequent) will run in O(1).
I do a worst case analysis, but if you can think of something that will run 'fast' but it's Amortized analysis or on average than it's no problem.
any improvement to what I have in mind is welcomed!
(note: I believe that A and D will be much more frequent that B and C)
It needs to be some sort of sorted, balanced tree. It is not likely that any tree will be significantly better suited for the minimum deletion, as it will still require re-balancing anyway. All of the operations you ask for will be O(log(n)). Red-black trees are readily available in C++ and Java.
What you’re describing is a priority queue, augmented by a “find” operation.
It is usually implemented in terms of a min-heap. All operations you listed, except “find”, run in O(log n), and it is notably the most efficient overall data structure for this job. It is important to note that this is a special case of a binary tree that can be implemented much more efficiently than a general binary search tree, both in terms of memory consumption and performance (same asymptotic performance but much better constant factors).
Unfortunately, “find” still takes O(n).
It is implemented in Java in the PriorityQueue class.

Using red black trees for sorting

The worst-case running time of insertion on a red-black tree is O(lg n) and if I perform a in-order walk on the tree, I essentially visit each node, so the total worst-case runtime to print the sorted collection would be O(n lg n)
I am curious, why are red-black trees not preferred for sorting over quick sort (whose average-case running time is O(n lg n).
I see that maybe because red-black trees do not sort in-place, but I am not sure, so maybe someone could help.
Knowing which sort algorithm performs better really depend on your data and situation.
If you are talking in general/practical terms,
Quicksort (the one where you select the pivot randomly/just pick one fixed, making worst case Omega(n^2)) might be better than Red-Black Trees because (not necessarily in order of importance)
Quicksort is in-place. The keeps your memory footprint low. Say this quicksort routine was part of a program which deals with a lot of data. If you kept using large amounts of memory, your OS could start swapping your process memory and trash your perf.
Quicksort memory accesses are localized. This plays well with the caching/swapping.
Quicksort can be easily parallelized (probably more relevant these days).
If you were to try and optimize binary tree sorting (using binary tree without balancing) by using an array instead, you will end up doing something like Quicksort!
Red-Black trees have memory overheads. You have to allocate nodes possibly multiple times, your memory requirements with trees is doubles/triple that using arrays.
After sorting, say you wanted the 1045th (say) element, you will need to maintain order statistics in your tree (extra memory cost because of this) and you will have O(logn) access time!
Red-black trees have overheads just to access the next element (pointer lookups)
Red-black trees do not play well with the cache and the pointer accesses could induce more swapping.
Rotation in red-black trees will increase the constant factor in the O(nlogn).
Perhaps the most important reason (but not valid if you have lib etc available), Quicksort is very simple to understand and implement. Even a school kid can understand it!
I would say you try to measure both implementations and see what happens!
Also, Bob Sedgewick did a thesis on quicksort! Might be worth reading.
There are plenty of sorting algorithms which are worst case O(n log n) - for example, merge sort. The reason quicksort is preferred is because it is faster in practice, even though algorithmically it may not be as good as some other algorithms.
Often in-built sorts use a combination of various methods depending on the values of n.
There are many cases where red-back trees are not bad for sorting. My testing showed, compared to natural merge sort, that red-black trees excel where:
Trees are better for Dups:
All the tests where dups need to be eleminated, tree algorithm is better. This is not astonishing, since the tree can be kept very small from the beginning, whereby algorithms that are designed for inline array sort might pass around larger segments for a longer time.
Trees are better for Random:
All the tests with random, tree algorithm is better. This is also not astonishing, since in a tree distance between elements is shorter and shifting is not necessary. So repeatedly inserting into a tree could need less effort than sorting an array.
So we get the impression that the natural merge sort only excels in ascending and descending special cases. Which cant be even said for quick sort.
Gist with the test cases here.
P.S.: it should be noted that using trees for sorting is non-trivial. One has not only to provide an insert routine but also a routine that can linearize the tree back to an array. We are currently using a get_last and a predecessor routine, which doesn't need a stack. But these routines are not O(1) since they contain loops.
Big-O time complexity measures do not usually take into account scalar factors, e.g., O(2n) and O(4n) are usually just reduced to O(n). Time complexity analysis is based on operational steps at an algorithmic level, not at a strict programming level, i.e., no source code or native machine instruction considerations.
Quicksort is generally faster than tree-based sorting since (1) the methods have the same algorithmic average time complexity, and (2) lookup and swapping operations require fewer program commands and data accesses when working with simple arrays than with red-black trees, even if the tree uses an underlying array-based implementation. Maintenance of the red-black tree constraints requires additional operational steps, data field value storage/access (node colors), etc than the simple array partition-exchange steps of a quicksort.
The net result is that red-black trees have higher scalar coefficients than quicksort does that are being obscured by the standard O(n log n) average time complexity analysis result.
Some other practical considerations related to machine architectures are briefly discussed in the Quicksort article on Wikipedia
Generally, representations of O(nlgn) algorithms can be expanded to A*nlgn + B where A and B are constants. There are many algorithmic proofs that show the coefficients for quicksort are smaller than those of other algorithms. That is in best-case (quick sort performs horribly on sorted data).
Hi the best way to explain the difference between all sorting routine in my opinion is.
(My answer is for people who are confused how quick sort is faster in practice than another sorting algo).
"Think u are running on a very slow computer".
First thing one comparing operation takes 1 hour.
One shifting operation takes 2 hours.
"I am using hour just to make people understand how important time is".
Now from all the sorting operations quick-sort have very very less comparisons and very less swapping for elements.
Quick-sort is faster for this main reason.

Resources