Running time complexity for binary search tree - data-structures

I already know if you try to find the item with particular key the running time of worst case
is O(n) ,nis the number of node. If you try to print out all the data items in order of their keys then the running time of worst case is O(n). If you try to search for a particular data item(you don't know the key) then the running time of worst case is O(n). However, what if the keys and data are both integers and, the input items were randomly scrambled before they were inserted. Will the worst cases of running time still the same?

In the worst-case, yes. A randomly-built BST with n nodes has a 2n-1 / n! chance of being built degenerately, which is extremely rare as n gets to any reasonable size but still possible. In that case, a lookup might take Θ(n) time because the search might need to descend all the way down to the deepest leaf.
On expectation, though, the tree height will be Θ(log n), so lookups will take expected O(log n) time.
The time to print a tree is independent of the shape of the tree, by the way. It's always Θ(n).
Hope this helps!

You might not be able to change the worst case running time of a normal BST, however, if you randomize the input(in less than O(log n) time, if you're targeting O(log n) overall), then chances of that worst case occurring are highly rare. See mathematical analysis here.
In case you are interested in guaranteed O(log n) time, you can use Balanced BSTs like Red Black Trees etc. However, time to print will still be O(n) as you still need to visit each and every node before you can print it.

Related

Worst case Big-O runtime for heaps

You intend to run heapify on an array of n integers, in order to turn it into a heap in-place. How long will this operation take, in the worst case? (choose the tightest possible bound)
Options are:
a) O(n)
b) O(nlogn)
c) O(nlog^2n)
d) O(n^2)
I tried this out and got the following:
Since we have at most n nodes we have O(n) and since we need to move up and compare only the height of the tree times we get O(logn) thereby giving us O(nlogn). But this solution is wrong.
Then I thought maybe we don't compare the only the height of the tree times because a smaller node can be placed on the right side of the tree forcing us to go all the way to the right side and I marked O(n^2). And that was wrong too. Any suggestions?

Data structure with O(1) insertion and O(log(n)) search complexity?

Is there any data structure available that would provide O(1) -- i.e. constant -- insertion complexity and O(log(n)) search complexity even in the worst case?
A sorted vector can do a O(log(n)) search but insertion would take O(n) (taken the fact that I am not always inserting the elements either at the front or the back). Whereas a list would do O(1) insertion but would fall short of providing O(log(n)) lookup.
I wonder whether such a data structure can even be implemented.
Yes, but you would have to bend the rules a bit in two ways:
1) You could use a structure that has O(1) insertion and O(1) search (such as the CritBit tree, also called bitwise trie) and add artificial cost to turn search into O(log n).
A critbit tree is like a binary radix tree for bits. It stores keys by walking along the bits of a key (say 32bits) and use the bit to decide whether to navigate left ('0') or right ('1') at every node. The maximum complexity for search and insertion is both O(32), which becomes O(1).
2) I'm not sure that this is O(1) in a strict theoretical sense, because O(1) works only if we limit the value range (to, say, 32 bit or 64 bit), but for practical purposes, this seems a reasonable limitation.
Note that the perceived performance will be O(log n) until a significant part of the possible key permutations are inserted. For example, for 16 bit keys you probably have to insert a significant part of 2^16 = 65563 keys.
No (at least in a model where the elements stored in the data structure can be compared for order only; hashing does not help for worst-case time bounds because there can be one big collision).
Let's suppose that every insertion requires at most c comparisons. (Heck, let's make the weaker assumption that n insertions require at most c*n comparisons.) Consider an adversary that inserts n elements and then looks up one. I'll describe an adversarial strategy that, during the insertion phase, forces the data structure to have Omega(n) elements that, given the comparisons made so far, could be ordered any which way. Then the data structure can be forced to search these elements, which amount to an unsorted list. The result is that the lookup has worst-case running time Omega(n).
The adversary's goal is to give away as little information as possible. Elements are sorted into three groups: winners, losers, and unknown. Initially, all elements are in the unknown group. When the algorithm compares two unknown elements, one chosen arbitrarily becomes a winner and the other becomes a loser. The winner is deemed greater than the loser. Similarly, unknown-loser, unknown-winner, and loser-winner comparisons are resolved by designating one of the elements a winner and the other a loser, without changing existing designations. The remaining cases are loser-loser and winner-winner comparisons, which are handled recursively (so the winners' group has a winner-unknown subgroup, a winner-winners subgroup, and a winner-losers subgroup). By an averaging argument, since at least n/2 elements are compared at most 2*c times, there exists a subsub...subgroup of size at least n/2 / 3^(2*c) = Omega(n). It can be verified that none of these elements are ordered by previous comparisons.
I wonder whether such a data structure can even be implemented.
I am afraid the answer is no.
Searching OK, Insertion NOT
When we look at the data structures like Binary search tree, B-tree, Red-black tree and AVL tree, they have average search complexity of O(log N), but at the same time the average insertion complexity is same as O(log N). Reason is obvious, the search will follow (or navigate through) the same pattern in which the insertion happens.
Insertion OK, Searching NOT
Data structures like Singly linked list, Doubly linked list have average insertion complexity of O(1), but again the searching in Singly and Doubly LL is painful O(N), just because they don't have any indexing based element access support.
Answer to your question lies in the Skiplist implementation, which is a linked list, still it needs O(log N) on average for insertion (when lists are expected to do insertion in O(1)).
On closing notes, Hashmap comes very close to meet the speedy search and speedy insertion requirement with the cost of huge space, but if horribly implemented, it can result into a complexity of O(N) for both insertion and searching.

Does every algorithm has a best case data input?

Does every algorithm has a 'best case' and 'worst case' , this was a question raised by someone who answered it with no ! I thought that every algorithm has a case depending on its input so that one algorithm finds that a particular set of input are the best case but other algorithms consider it the worst case.
so which answer is correct and if there are algorithms that doesn't have a best case can you give an example ?
Thank You :)
No, not every algorithm has a best and worst case. An example of that is the linear search to find the max/min element in an unsorted array: it always checks all items in the array no matter what. It's time complexity is therefore Theta(N) and it's independent of the particular input.
Best Case input is the casein which your code would take the least number of procedure calls. eg. You have an if in your code and in that, you iterate for every element and no such functionality in else part. So, any input in which the code does not enter if block will be the best case input and conversely, any input in which code enters this if will be worst case for this algorithm.
If, for any algorithm, branching or recursion or looping causes a difference in complexity factor for that algorithm, it will have a possible best case or possible worst case scenario. Otherwise, you can say that it does not or that it has similar complexity for best case or worst case.
Talking about sorting algorithms, lets take example of merge and quick sorts. (I believe you know them well, and their complexities for that matter).
In merge sort every time, array is divided into two equal parts thus taking log n factor in splitting while in recombining, it takes O(n) time (for every split, of course). So, total complexity is always O(n log n) and it does not depend on the input. So, you can either say merge sort has no best/worst case conditions or its complexity is same for best/worst cases.
On the other hand, if quick sort (not randomized, pivot always the 1st element) is given a random input, it will always divide the array in two parts, (may or may not be equal, doesn't matter) and if it does this, log factor of its complexity comes into picture (though base won't always be 2). But, if the input is sorted already (ascending or descending) it will always split it into 1 element + rest of array, so will take n-1 iterations to split the array, which changes its O(log n) factor to O(n) thereby changing complexity to O(n^2). So, quick sort will have best and worst cases with different time complexities.
Well, I believe every algorithm has a best and worst case though there's no guarantee that they will differ. For example, the algorithm to return the first element in an array has an O(1) best, worst and average case.
Contrived, I know, but what I'm saying is that it depends entirely on the algorithm what their best and worst cases are, but the cases will exist, even if they're the same, or unbounded at the top end.
I think its reasonable to say that most algorithms have a best and a worst case. If you think about algorithms in terms of Asymptotic Analysis you can say that a O(n) search algorithm will perform worse than a O(log n) algorithm. However if you provide the O(n) algorithm with data where the search item is early on in the data set and the O(log n) algorithm with data where the search item is in the last node to be found the O(n) will perform faster than the O(log n).
However an algorithm that has to examine each of the inputs every time such as an Average algorithm won't have a best/worst as the processing time would be the same no matter the data.
If you are unfamiliar with Asymptotic Analysis (AKA big O) I suggest you learn about it to get a better understanding of what you are asking.

Heapsort. How is it possible so simulate worstcase-scenario?

I am rather clear on how to programme it, but I am not sure on the definition, e.g. how to write it down in mathematics terms.
A normal heapsort is done with N elements in O notation. So O(log(n))
I just started with heapsort, so I might be a little bit off here.
But how can I for example look for a random element, when there are N elements?
And then pick that random element and delete it?
I was thinking that in a worst case - situation it has to go through the whole tree (Because the element could either be at the first place or at the last place, e.g. highest or lowest).
But how can I write that down in mathematics terms?
Heapsort's worst case performance is O(n log n), and to quote alestanis:
Max in max-heap: O(1). Min in min-heap: O(1). Opposite cases in O(n).
Here's an SO-answer explaining how to do the opposite cases in O(1) if you create the heap yourself.
To build maxheap array worstcase is O(n) and to max heapify complexcity in worst case is O(logn) so HeapSort worstCase is O(nlogn)

Intuition behind splay tree (self balancing trees)

I am reading the basics of splay trees. The amortized cost of an operation is O(log n) over n operations. Some rough basic idea is that when you access a node, you splay it i.e. you take it to root so next time this is quickly accessed and also if the node was deep, it enhances balance-ness of tree.
I don't understand how the tree can perform amortized O(log n) for this sample input:
Say a tree of n nodes is already built. My next n operations are n reads. I access a deep node say at depth n. This takes O(n). True that after this access, the tree will become balanced. But say every time I access the most current deep node. This will never be less than O(log n). then how we can ever compensate for the first costly O(n) operation and bring the amortized cost of each read as O(log n)?
Thanks.
Assuming your analysis is correct and the operations are O(log(n)) per access and O(n) the first time...
If you always access the bottommost element (using some kind of worst-case oracle), a sequence of a accesses will take O(a*log(n) + n). And thus the amortized cost per operation is O((a*log(n) + n)/a)=O(log(n) + n/a) or just O(log(n)) as the number of accesses grows large.
This is the definition of asymptotic average-case performance/time/space, also called "amortized performance/time/space". You are accidentally thinking that a single O(n) step means all steps are at least O(n); one such step is only a constant amount of work in the long run; the O(...) is hiding what's really going on, which is taking the limit of [total amount of work]/[queries]=[average ("amortized") work per query].
This will never be less than O(log n).
It has to be in order to get O(log n) average performance. To get intuition, the following website may be good: http://users.informatik.uni-halle.de/~jopsi/dinf504/chap4.shtml specifically the image http://users.informatik.uni-halle.de/~jopsi/dinf504/splay_example.gif -- it seems that while performing the O(n) operations, you move the path you searched scrunching it towards the top of the tree. You probably only have a finite number of such O(n) operations to perform until the entire tree is balanced.
Here's another way to think about it:
Consider an unbalanced binary search tree. You can spend O(n) time balancing it. Assuming you don't add elements to it*, it takes O(log(n)) amortized time per query to fetch an element. The balancing setup cost is included in the amortized cost because it is effectively a constant which, as demonstrated in the equations in the answer, disappears (is dwarfed) by the infinite amount of work you are doing. (*if you do add elements to it, you need a self-balancing binary search tree, one of which is a splay tree)

Resources