Splay Tree Worst Case Search Time - algorithm

Since a splay tree is a type of unbalanced binary search tree (brilliant.org/wiki/splay-tree), it cannot guarantee a height of O(log(n)). Thus, I would think it cannot guarantee a worst-case search time of O(log(n)).
But according to bigocheatsheet.com:
Splay Tree has worst case search time of O(log(n))???

You’re correct; the cost of a lookup in a splay tree can reach Θ(n) for an imbalanced tree.
Many resources like the big-O cheat sheet either make simplifying assumptions or just have factually incorrect data in them. It’s unclear whether they were just wrong here, or whether they meant the amortized cost, which is O(log n) per operation for a splay tree.
It’s always best to know the internals of the data structures you’re working with so that you can understand where the runtimes come from.
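To make the Θ(n) lookup concrete, here is a minimal sketch of a bottom-up splay tree. The class and function names (`SplayTree`, `height`, `demo`) are made up for this illustration and do not come from any library. Inserting keys in sorted order leaves the tree as a single left-leaning chain, so a search for the smallest key has to visit every node.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None


class SplayTree:
    """Illustrative bottom-up splay tree (insert and search only)."""

    def __init__(self):
        self.root = None

    def _rotate(self, x):
        # Rotate x above its parent while preserving BST order.
        p, g = x.parent, x.parent.parent
        if p.left is x:
            p.left, x.right = x.right, p
            if p.left:
                p.left.parent = p
        else:
            p.right, x.left = x.left, p
            if p.right:
                p.right.parent = p
        p.parent, x.parent = x, g
        if g is None:
            self.root = x
        elif g.left is p:
            g.left = x
        else:
            g.right = x

    def _splay(self, x):
        while x.parent is not None:
            p, g = x.parent, x.parent.parent
            if g is not None:
                # zig-zig: rotate the parent first; zig-zag: rotate x twice.
                same_side = (g.left is p) == (p.left is x)
                self._rotate(p if same_side else x)
            self._rotate(x)

    def insert(self, key):
        node = Node(key)
        if self.root is None:
            self.root = node
            return
        cur = self.root
        while True:
            if key < cur.key:
                if cur.left is None:
                    cur.left = node
                    break
                cur = cur.left
            else:
                if cur.right is None:
                    cur.right = node
                    break
                cur = cur.right
        node.parent = cur
        self._splay(node)

    def search(self, key):
        # Returns the number of nodes visited, then splays the last one.
        cur, last, visited = self.root, None, 0
        while cur is not None:
            last, visited = cur, visited + 1
            if key == cur.key:
                break
            cur = cur.left if key < cur.key else cur.right
        if last is not None:
            self._splay(last)
        return visited


def height(root):
    # Height in edges (-1 for an empty tree), computed without recursion.
    best, stack = -1, [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is not None:
            best = max(best, depth)
            stack.append((node.left, depth + 1))
            stack.append((node.right, depth + 1))
    return best


def demo(n=100):
    tree = SplayTree()
    for key in range(1, n + 1):  # sorted insertions build a left chain
        tree.insert(key)
    before = height(tree.root)
    visited = tree.search(1)     # deepest node: every node is visited
    after = height(tree.root)
    return before, visited, after
```

With `demo(100)`, the height before the search is n - 1 and the search visits all n nodes. The splay that follows cuts the height to roughly half, which is why the expensive access cannot simply be repeated and the amortized cost stays logarithmic.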

Related

Why is amortized analysis of splay tree only focusing on the splay operation and not accounting for the downwards search

Each dictionary operation in a splay tree uses a splay operation to bring a node to the root of the tree. The amortized efficiency of this splay operation is typically analyzed with the potential method and described in many sources online (including the Wikipedia page). The total time for a sequence of m splay operations is then reported as O(m lg n), i.e. O(lg n) amortized per operation.
However, nowhere do I find an actual analysis of complete dictionary operations, such as insert, delete, ...
Each of these operations uses, besides a splay operation, also a downward search through the tree to find the correct position of the node to insert or delete. Only after you have found that node can you start the splay operation.
People tend to make statements like:
"the complexity of splay tree operation is the same as that of the associated splay"
["For our analysis, we note that the time for performing a search, insertion, or deletion
is proportional to the time for the associated splaying"], p. 456 of the book of Goodrich titled "Data structures and algorithms in C++"
I have two questions:
How is one able to make this conclusion that the time to perform a search is proportional to the time for the splay? This kind of implies that the time for the downwards traversal to the node is also proportional to the time of the splay?
What is the amortized time efficiency of a downwards traversal? Is it a constant, simply because you don't change the structure of the tree by simply doing a downwards traversal (so your potential stays the same)? And isn't this constant then = N, since this is the worst case?
How is one able to make this conclusion?
How is one able to make this conclusion that the time to perform a search is proportional to the time for the splay? This kind of implies that the time for the downwards traversal to the node is also proportional to the time of the splay?
The splay phase operates on each of the nodes traversed during the search phase. Since the work done at each node during the search phase is constant, we infer that over any sequence of operations, search = O(splay), hence O(search + splay) = O(splay).
What is the amortized time efficiency of a downwards traversal? Is it a constant, simply because you don't change the structure of the tree by simply doing a downwards traversal (so your potential stays the same)? And isn't this constant then = N, since this is the worst case?
Yes, if it were possible to search without splaying afterward. For the reason previously discussed, we treat them as inseparable, so effectively we multiply by a constant the amortization credits used by a lengthy splay to cover the search too.
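The argument above can be written out as a short calculation. This is a sketch under assumptions that are not stated in the thread: d denotes the depth of the accessed node, the search is charged d + 1 node visits, the splay is charged d rotations, Φ is the usual sum-of-log-subtree-sizes potential, and the bound d + ΔΦ ≤ 3 log₂ n + 1 is the standard Access Lemma of Sleator and Tarjan. The search itself leaves Φ unchanged, so doubling the potential is enough to pay for it.

```latex
\begin{align*}
c_{\text{search}} &= d + 1, \qquad c_{\text{splay}} = d, \\
\hat{c}_{\text{splay}} &= d + \Delta\Phi \le 3\log_2 n + 1
  && \text{(Access Lemma)}, \\
\hat{c}_{\text{total}} &= (d + 1) + d + 2\,\Delta\Phi
  && \text{(potential scaled to } \Phi' = 2\Phi\text{)} \\
 &= 2\,(d + \Delta\Phi) + 1 \le 6\log_2 n + 3 = O(\log n).
\end{align*}
```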

Time complexity of binary search in a slightly unbalanced binary tree

The best case running time for binary search is O(log(n)), if the binary tree is balanced. The worst case would be, if the binary tree is so unbalanced, that it basically represents a linked list. In that case the running time of a binary search would be O(n).
However, what if the tree is only slightly unbalanced, as is the case for this tree:
Best case would still be O(log n) if I am not mistaken. But what would be the worst case?
Typically, when we say something like "the cost of looking up an element in a balanced binary search tree is O(log n)," what we mean is "in the worst case, we have to do O(log n) work in the course of performing a search on a balanced binary search tree." And since we're talking about big-O notation here, the previous statement is meant to be taken about balanced trees in general rather than a specific concrete tree.
If you have a specific BST in mind, you can work out the maximum number of comparisons required to find any element. Just find the deepest node in the tree, then imagine searching for a value that's bigger than that value but smaller than the next value in the tree. That will cause you to walk all the way down the tree as deeply as possible, making the maximum number of comparisons possible (specifically, h + 1 of them, where h is the height of the tree).
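As a rough illustration of the h + 1 bound, the sketch below builds a small, slightly unbalanced BST and counts the nodes visited by a search. The tree shape, the `Node` class, and the helper names are invented for this example; the tree from the question is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def height(node):
    # Height in edges; an empty tree has height -1.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))


def nodes_visited(root, target):
    # Number of nodes compared against target during a standard BST search.
    count, cur = 0, root
    while cur is not None:
        count += 1
        if target == cur.key:
            break
        cur = cur.left if target < cur.key else cur.right
    return count


# Hypothetical, slightly unbalanced tree:
#         8
#       /   \
#      4     12
#     / \    /
#    2   6  10
#   /
#  1
root = Node(8, Node(4, Node(2, Node(1)), Node(6)), Node(12, Node(10)))

h = height(root)
# 1.5 lies between the deepest key (1) and the next key (2), so the
# search walks the longest root-to-leaf path before failing.
worst = nodes_visited(root, 1.5)
```

Here `worst` equals `h + 1`, while a search for the root key 8 visits a single node.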
To be able to talk about the big-O cost of performing lookups in a tree, you'd need to talk about a family of trees of different numbers of nodes. You could imagine "kinda balanced" trees whose depth is Θ(√n), for example, where lookups would take time O(√n). However, it's uncommon to encounter trees like that in practice, since generally you'd either (1) have a totally imbalanced tree or (2) use some sort of balanced tree that would prevent the height from getting that high.
In a sorted array of n values, the run-time of binary search for a value is O(log n) in the worst case. In the best case, the element you are searching for is in the exact middle, and the search can finish in constant time. In the average case too, the run-time is O(log n).
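The best and worst cases for a sorted array can be checked with a small sketch that counts probes. The function name `binary_search` and the array of 1023 consecutive integers are choices made for this illustration only.

```python
def binary_search(values, target):
    # Returns (index, probes); index is -1 if target is absent.
    lo, hi, probes = 0, len(values) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if values[mid] == target:
            return mid, probes
        if values[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1, probes


values = list(range(1023))                      # n = 2**10 - 1 sorted keys
best = binary_search(values, 511)               # the exact middle element
worst = max(binary_search(values, v)[1] for v in values)
```

For this array the middle element is found on the first probe, and no successful search needs more than 10 probes, which is floor(log2(n)) + 1 for n = 1023.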

Splay tree: worst-case sequence

I want to try executing the worst-case sequence on a splay tree.
But what is the worst-case sequence on splay trees?
And is there any way to calculate this sequence easily, given the keys that are inserted into the tree?
Can anyone help me with this?
Unless someone corrects me, I'm going to go with "no one actually knows what the worst-case series of operations is on a splay tree or what the complexity is in that case."
While we do know many results about the efficiency of splay trees, we actually don't know all that much about how to bound the time complexity of a splay tree. There's a conjecture called the dynamic optimality conjecture that says that in the worst case, any sufficiently long series of operations on a splay tree will take no more than a constant factor more time than the best possible self-adjusting binary search tree on that series of operations. One of the challenges we're having in trying to prove this is that no one actually knows how to determine the cost of the best possible BST on all inputs. Another is that finding upper bounds on the runtimes of various input combinations to splay trees is hard - as of now, no one knows whether it takes time O(n) to treat a splay tree as a deque!
Hope this helps!
I don't know if an attempt of an answer after more than five years is of any use to you, but sorry, I made my Master in CS only recently :-) In the wake of that, I played around exactly with your question.
Consider the sequence S(3,2) (it should be obvious how S(m,n) works generally if you graph it): S(3,2)=[5,13,6,14,3,15,4,16,1,17,2,18,11,19,12,20,9,21,10,22,7,23,8,24]. Splay is so lousy on this sequence type that the competitive ratio r to the "Greedy Future" algorithm (see Demaine) is S[infty,infty]=2. I was never able to get over 2 even though Greedy Future is also not completely optimal and I could shave off a few operations.
(Legend: black,grey,blue: S(7,4); purple,orange,red: Splay must access these points too. Shown in the Demaine formulation.)
But note that your question is somewhat ill defined! If you ask for the absolutely worst sequence, then take e.g. the bit-reversal sequence; ANY tree algorithm needs Ω(n log n) for that. But if you ask for the competitive ratio r as implied in templatetypedef's answer, then indeed nobody knows (but I would make bets on r=2, see above).
Feel free to email me for details, I'm easily googled.
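For anyone who wants to experiment with the bit-reversal sequence mentioned above, here is a minimal sketch that generates it. The function name `bit_reversal_sequence` is made up for this example, and it assumes the sequence is taken over the keys 0 .. 2^bits - 1 with bits ≥ 1.

```python
def bit_reversal_sequence(bits):
    # Key i is mapped to the integer whose binary digits (padded to
    # `bits` digits) are the digits of i in reverse order.
    n = 1 << bits
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]


seq = bit_reversal_sequence(3)
```

For `bits = 3` this gives the access order 0, 4, 2, 6, 1, 5, 3, 7, which is a permutation of all eight keys.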

time complexity for binary search trees

If I use an insert() function for my bst, the time complexity can be as bad as O(n) and as good as O(log n). I'm assuming that if I had a perfectly balanced tree, the time complexity is log n because I am able to ignore half of the tree every time I go down a "branch". And if my tree is completely unbalanced it would be O(n). Am I correct for thinking this?
Yes, that is correct, see e.g. wikipedia, http://en.wikipedia.org/wiki/Binary_search_tree#Searching.
If you use e.g. C++ STL std::map or std::set, you typically get a balanced red-black tree. Also worth noting is that with these STL data structures, you get this O(log n) performance 100% of the time, which can be very important in e.g. hard real-time systems. Hash tables are even faster on average, but are not fast 100% of the time like the red-black trees.
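The two extremes from the question can be reproduced with a plain, non-rebalancing BST. The sketch below is illustrative only: the helper names (`insert`, `build`, `balanced_order`) are hypothetical, and 127 keys are used so that a perfectly balanced tree exists.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None


def insert(root, key):
    # Plain BST insert with no rebalancing; returns the (unchanged) root.
    node = Node(key)
    if root is None:
        return node
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = node
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = node
                return root
            cur = cur.right


def height(root):
    # Height in edges (-1 for an empty tree), computed without recursion.
    best, stack = -1, [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if node is not None:
            best = max(best, depth)
            stack.append((node.left, depth + 1))
            stack.append((node.right, depth + 1))
    return best


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def balanced_order(sorted_keys):
    # Emit the median of each range before the medians of its halves.
    order, stack = [], [(0, len(sorted_keys))]
    while stack:
        lo, hi = stack.pop()
        if lo < hi:
            mid = (lo + hi) // 2
            order.append(sorted_keys[mid])
            stack.append((lo, mid))
            stack.append((mid + 1, hi))
    return order


keys = list(range(1, 128))                        # 127 = 2**7 - 1 keys
chain_height = height(build(keys))                # sorted insertion order
balanced_height = height(build(balanced_order(keys)))
```

Sorted insertion order yields a chain of height n - 1 = 126, so an insert walks up to n nodes; the median-first order yields height 6, so an insert walks at most 7.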

Data structure needed

After doing some thought I came to the conclusion that I require a data structure that supports:
A. Insert
B. Remove
C. Find
D. Delete minimum
Of course, I want to implement this with the best complexity I can.
My thoughts are that a self-balancing binary search tree will do A-D in O(log(n)) (worst case).
Maybe this can be improved somehow so A-C will be in O(log(n)) and D (which I think will be more frequent) will run in O(1).
I am doing a worst-case analysis, but if you can think of something that runs 'fast' in an amortized or average-case sense, then that's no problem.
Any improvement to what I have in mind is welcome!
(Note: I believe that A and D will be much more frequent than B and C.)
It needs to be some sort of sorted, balanced tree. It is not likely that any tree will be significantly better suited for the minimum deletion, as it will still require re-balancing anyway. All of the operations you ask for will be O(log(n)). Red-black trees are readily available in C++ and Java.
What you’re describing is a priority queue, augmented by a “find” operation.
It is usually implemented in terms of a min-heap. All operations you listed, except “find”, run in O(log n) (for “remove”, this assumes you already know the element's position in the heap; otherwise it must be found first), and it is notably the most efficient overall data structure for this job. It is important to note that this is a special case of a binary tree that can be implemented much more efficiently than a general binary search tree, both in terms of memory consumption and performance (same asymptotic performance but much better constant factors).
Unfortunately, “find” still takes O(n).
It is implemented in Java in the PriorityQueue class.
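As a small sketch of the min-heap approach, the example below uses Python's standard `heapq` module as a stand-in for the Java class mentioned above; the keys are arbitrary sample values.

```python
import heapq

heap = []
for key in [7, 3, 9, 1, 5]:
    heapq.heappush(heap, key)    # insert: O(log n)

smallest = heap[0]               # peek at the minimum: O(1)
contains_nine = 9 in heap        # find: linear scan of the list, O(n)

first = heapq.heappop(heap)      # delete minimum: O(log n)
second = heapq.heappop(heap)
```

The two pops return the keys in ascending order (1, then 3), while the membership test has to scan the underlying list because a heap keeps no search order among siblings.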
