Next best/worst case for any algorithm

I encountered a question like this for mergesort specifically and was wondering how one approaches a question like this for other algorithms (insertion sort, heapsort, quicksort, etc.).
Is it safe to assume that the nth best/worst arrangement for any algorithm is the nth step of solving the best/worst arrangement for the same set of data?
Example:
If the worst case for mergesort with the following array of integers [1,2,3,4,5,6,7,8] is [1,5,3,7,2,6,4,8]. What is the next worst case for this array of integers?
I assumed it would be the next arrangement when solving the worst case which is [1,3,5,7,2,6,4,8]. Am I approaching such a question wrongly?

The concept of a "next-best" or "next-worst" case is not really well-defined in the first place. Neither is the concept of "the state of the array after one step", because not all algorithms modify an array in-place.
When we say the "worst case" of an algorithm, we don't mean a single input to an algorithm. For example, the array [5, 4, 3, 2, 1] is not - by itself - the worst case of the insertion sort algorithm. This array is one of the worst inputs (i.e. highest number of steps to compute) for insertion sort out of arrays of length 5, but we are very rarely interested in arrays of one specific length.
What we mean by "best case" or "worst case" is actually an infinite family of inputs, such that each member of that family is a best or worst input for its own value of n, and the family must contain inputs for arbitrarily large values of n. So, for example:
The infinite set of arrays {[1], [2, 1], [3, 2, 1], [4, 3, 2, 1], ...} is a worst case for insertion sort. For inputs from this infinite set, the asymptotic complexity of insertion sort is Θ(n^2).
The infinite set of arrays {[1], [1, 2], [1, 2, 3], [1, 2, 3, 4], ...} is a best case for insertion sort. For inputs from this infinite set, the asymptotic complexity of insertion sort is Θ(n).
Note that the (larger) infinite set of all arrays which are in descending order is also a worst case for insertion sort, and the (larger) infinite set of all arrays in ascending order is also a best case. So the family is not unique, but the asymptotic complexity of the algorithm on inputs from any two "best case" (or any two "worst case") families is the same.
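As a rough illustration of these two families, here is a minimal sketch that counts key comparisons made by a textbook insertion sort. The function names `insertion_sort_comparisons`, `ascending` and `descending` are made up for this example, and comparisons are used as a stand-in for "number of steps".

```python
def insertion_sort_comparisons(values):
    """Sort a copy of `values` with insertion sort and return the number of key comparisons."""
    a = list(values)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break  # key is already in the right place relative to a[j]
            a[j + 1] = a[j]  # shift the larger element one slot to the right
            j -= 1
        a[j + 1] = key
    return comparisons


def ascending(n):
    """Member of the best-case family: [1, 2, ..., n]."""
    return list(range(1, n + 1))


def descending(n):
    """Member of the worst-case family: [n, ..., 2, 1]."""
    return list(range(n, 0, -1))


for n in (4, 8, 16):
    best = insertion_sort_comparisons(ascending(n))
    worst = insertion_sort_comparisons(descending(n))
    print(n, best, worst)  # best grows like n - 1, worst like n * (n - 1) / 2
```

The ascending family needs n - 1 comparisons and the descending family needs n(n - 1)/2, which matches the Θ(n) and Θ(n^2) bounds above.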
Now we've got that out of the way, let's think about what a "next-best" or "next-worst" case would have to mean. If the asymptotic complexity of insertion sort on some family of inputs is also Θ(n^2), then that family is a worst case for insertion sort; so the asymptotic complexity of a "next-worst" case would have to be something lower than Θ(n^2).
But however small a gap you choose, it is not the "next-worst":
If you choose a family where the complexity is Θ(n^1.999), then it is not "next-worst" because I can find another family where the complexity is Θ(n^1.9999).
If you choose a family where the complexity is Θ(n^2 / log n), I can find one where it's Θ(n^2 / log log n).
That is, the asymptotic complexities of different families of possible inputs for insertion sort form a dense order: for any two different complexities there is another complexity in between those two, so there is no "next" or "previous" one.
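One way to see the density claim, as a sketch: insertion sort takes Θ(n + I) time on an input with I inversions, and every inversion count between 0 and n(n-1)/2 is achievable, so (roughly speaking) a family can be built for any growth rate between n and n^2. Given two such growth rates f and g with f = o(g), their geometric mean, called h here purely for illustration, sits strictly between them:

```latex
h(n) = \sqrt{f(n)\,g(n)}, \qquad
\frac{f(n)}{h(n)} = \frac{h(n)}{g(n)} = \sqrt{\frac{f(n)}{g(n)}} \to 0
\quad\Longrightarrow\quad f = o(h) \ \text{and}\ h = o(g).
```

For example, f(n) = n^1.999 and g(n) = n^2 give h(n) = n^1.9995, and the same step can be repeated on either side indefinitely.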

Related

Why is Selection Sort said to have O(n) swaps?

I am reading about use cases of Selection Sort, and this source says:
(selection sort is used when...) cost of writing to a memory matters like in flash memory (number of writes/swaps is O(n) as compared to O(n^2) of bubble sort)
We can even see O(n^2) swaps in this example:
[1, 2, 3, 4, 5]. It's going to have 4 swaps, then 3, then 2, and 1. That is O(n^2), not O(n) swaps. Why do they say the opposite?
A selection sort has a time complexity of O(n^2), but only O(n) swaps.
In each iteration i, you go over all the remaining items (in indexes i and onwards), find the right value to populate that index, and swap it there. So in total you perform O(n^2) comparisons, but only O(n) swaps.
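To make the difference between comparisons and swaps concrete, here is a minimal sketch of a textbook selection sort that counts both. The function name `selection_sort_counts` is made up for this example.

```python
def selection_sort_counts(values):
    """Selection-sort a copy of `values`; return (sorted_list, comparisons, swaps)."""
    a = list(values)
    n = len(a)
    comparisons = 0
    swaps = 0
    for i in range(n - 1):
        smallest = i
        # Scan the unsorted suffix: this inner loop is where the O(n^2) comparisons come from.
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        # At most one swap per outer iteration, so at most n - 1 swaps in total.
        if smallest != i:
            a[i], a[smallest] = a[smallest], a[i]
            swaps += 1
    return a, comparisons, swaps


print(selection_sort_counts([1, 2, 3, 4, 5]))  # already sorted: 10 comparisons, 0 swaps
print(selection_sort_counts([2, 3, 4, 5, 1]))  # 10 comparisons, 4 swaps
```

The "4, then 3, then 2, then 1" from the question counts the comparisons in the inner scan, not the swaps.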

Space complexity of an array

I recently came across a problem that made me wonder.
What if I stored an N-element array at each of the N indexes of an array of length N?
As a tiny example:
[
[1, 2, 3],
[5, 6, 7],
[8, 9, 10],
]
An array of length 3, and at every index there is again an array of length 3.
What would be the space complexity? Is it still O(N), or has it changed?
If the inner arrays have a fixed length that does not depend on n (3 in your example), it would still be O(n), because space complexity analysis describes how the space used grows with n; it doesn't care that you store an array of 3 elements at every index. The space used will be 3 times higher, but the relationship is still linear.
Big-O notation describes an asymptotic upper bound. It represents the
algorithm’s scalability and performance.
Simply put, it gives the worst-case scenario of an algorithm’s growth
rate.
from here.
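It would be different if the length of the array stored at every index grew with n, for example N elements at each of the N indexes (as in the wording of your question), or index-many elements at each index. In that case it would be O(n^2).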
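The cases above can be compared by simply counting stored cells. This is a small sketch; the helper names `total_cells`, `fixed_width`, `square` and `triangular` are made up for illustration.

```python
def total_cells(rows):
    """Number of elements stored across all inner arrays."""
    return sum(len(row) for row in rows)


def fixed_width(n, width=3):
    """n inner arrays of constant length: n * width cells, linear in n."""
    return [[0] * width for _ in range(n)]


def square(n):
    """n inner arrays of length n: n * n cells, quadratic in n."""
    return [[0] * n for _ in range(n)]


def triangular(n):
    """Inner array at index i has i elements: n * (n - 1) / 2 cells, still quadratic."""
    return [[0] * i for i in range(n)]


for n in (10, 100):
    print(n, total_cells(fixed_width(n)), total_cells(square(n)), total_cells(triangular(n)))
```

Multiplying n by 10 multiplies the first count by 10 but the other two by roughly 100.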

Why is randomised quicksort considered better than standard quicksort?

In Cormen's own words - "The difference is that with the deterministic algorithm, a particular input can elicit that worst-case behavior. With the randomized algorithm, however, no input can always elicit the worst-case behavior."
How does adding a randomized pivot change anything? The algorithm is still going to perform badly on some particular input, and if every kind of input is equally likely, this is no better than standard quicksort; the only difference is that we don't know in advance which particular input will cause the worst-case time complexity. So why is the randomized version considered better?
Consider the following version of quicksort, where we always pick the last element as the pivot. Now consider the following array:
int[] arr = {9, 8, 7, 6, 5, 4, 3, 2, 1};
When this array is sorted using our version of quicksort, it will always pick the smallest element as its pivot, the last element. And in the first iteration, it will change the array like this:
arr = [1, 8, 7, 6, 5, 4, 3, 2, 9];
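Now the pivot 1 is in its final position, and it will recurse on the sub-arrays on either side of it:
s1 = [];
s2 = [8, 7, 6, 5, 4, 3, 2, 9];
In s2 it will pick 9, the largest element, as its pivot, so again only one element is split off. In the remaining [8, 7, 6, 5, 4, 3, 2] it will pick 2 as its pivot, and only 8 and 2 will interchange positions. Every partition removes just one element, so if we try to formulate a recurrence relation for its complexity, it will be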
T(n) = T(n-1) + O(n)
which corresponds to O(n^2).
So, for this array, the standard version will always take O(n^2) time.
In the randomized version, we first exchange the last element with some random element in the array and then select it as the pivot. So, for the given array, the pivot splits the array at a random position, and on average the splits are balanced enough to give O(n log n) expected time. In the idealized case where every pivot lands in the middle, the recurrence will be
T(n) = 2T(n/2) + O(n)
which will be O(n * Log(n)).
That's why we consider randomized quicksort better than standard quicksort, because, there is very low probability of bad splits in randomized quicksort.
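The two variants can be compared with a short experiment. This is a sketch, assuming a Lomuto-style partition with the last element as pivot (consistent with the example array above); the function name `quicksort_comparisons` and its parameters are made up for illustration.

```python
import random


def quicksort_comparisons(values, randomized, rng=None):
    """Quicksort a copy of `values`; return (sorted_list, number_of_comparisons).

    With randomized=False the last element is always the pivot.
    With randomized=True a random element is swapped into the last slot first.
    """
    a = list(values)
    rng = rng or random.Random()
    comparisons = 0
    stack = [(0, len(a) - 1)]  # explicit stack avoids Python's recursion limit
    while stack:
        lo, hi = stack.pop()
        if lo >= hi:
            continue
        if randomized:
            r = rng.randint(lo, hi)
            a[r], a[hi] = a[hi], a[r]
        pivot = a[hi]
        i = lo
        for j in range(lo, hi):
            comparisons += 1
            if a[j] < pivot:
                a[i], a[j] = a[j], a[i]
                i += 1
        a[i], a[hi] = a[hi], a[i]  # put the pivot in its final position
        stack.append((lo, i - 1))
        stack.append((i + 1, hi))
    return a, comparisons


n = 200
data = list(range(n, 0, -1))  # descending input, as in the example above
_, fixed = quicksort_comparisons(data, randomized=False)
_, rand = quicksort_comparisons(data, randomized=True, rng=random.Random(0))
print(fixed, rand)  # fixed pivot: n * (n - 1) / 2; random pivot: typically far fewer
```

On the descending input the fixed-pivot version performs n(n - 1)/2 comparisons every time, while the randomized count varies from run to run and is usually close to n log n.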
"The difference is that with the deterministic algorithm, a particular input can elicit that worst-case behavior. With the randomized algorithm, however, no input can always elicit the worst-case behavior."
This should be clarified to mean a truly randomized algorithm. If instead a deterministic pseudo-random algorithm is used, then a deliberately created input can elicit worst case behavior.
"With the randomized algorithm, however, no input can always elicit the worst-case behavior."
This should be clarified: even with a truly randomized algorithm, there is still the possibility of some specific input that could elicit worst-case behavior in one or more invocations of a randomized quicksort with that input, but no input could always elicit worst-case behavior for an infinite number of invocations of a truly randomized quicksort on that same input.
Most library implementations of single-pivot quicksort use a median of 3 or median of 9, since they can't rely on having fast instructions for random numbers, like x86 RDRAND, and fast divide (for the modulo operation). If a quicksort were somehow part of an encryption scheme, then a truly randomized algorithm could be used to avoid timing attacks.

Can someone clarify the difference between Quicksort and Randomized Quicksort?

How is it different if I select a randomized pivot versus just selecting the first pivot in an unordered set/list?
If the set is unordered, isn't selecting the first value in the set random in itself? So essentially, I am trying to understand how/if randomizing promises a better worst-case runtime.
I think you may be mixing up the concepts of arbitrary and random. It's arbitrary to pick the first element of the array - you could pick any element you'd like and it would work equally well - but it's not random. A random choice is one that can't be predicted in advance. An arbitrary choice is one that can be.
Let's imagine that you're using quicksort on the sorted sequence 1, 2, 3, 4, 5, 6, ..., n. If you choose the first element as a pivot, then you'll choose 1 as the pivot. All n - 1 other elements then go to the right and nothing goes to the left, and you'll recursively quicksort 2, 3, 4, 5, ..., n.
When you quicksort that range, you'll choose 2 as the pivot. Partitioning the elements then puts nothing on the left and the numbers 3, 4, 5, 6, ..., n on the right, so you'll recursively quicksort 3, 4, 5, 6, ..., n.
More generally, after k steps, you'll choose the number k as a pivot, put the numbers k+1, k+2, ..., n on the right, then recursively quicksort them.
The total work done here ends up being Θ(n^2), since on the first pass (to partition 2, 3, ..., n around 1) you have to look at n-1 elements, on the second pass (to partition 3, 4, 5, ..., n around 2), you have to look at n-2 elements, etc. This means that the work done is (n-1)+(n-2)+ ... +1 = Θ(n^2), quite inefficient!
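For reference, the sum of the per-pass costs is the standard arithmetic series, written out here as a short worked step:

```latex
\sum_{k=1}^{n-1} k \;=\; (n-1) + (n-2) + \dots + 1 \;=\; \frac{n(n-1)}{2} \;=\; \frac{n^2}{2} - \frac{n}{2} \;=\; \Theta(n^2)
```

For n = 1000 that is 499,500 element inspections, compared with roughly 10,000 for n log2 n.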
Now, contrast this with randomized quicksort. In randomized quicksort, you truly choose a random element as your pivot at each step. This means that while you technically could choose the same pivots as in the deterministic case, it's very unlikely (the probability would be roughly 2^(2 - n), which is quite low) that this will happen and trigger the worst-case behavior. You're more likely to choose pivots closer to the center of the array, and when that happens the recursion branches more evenly and thus terminates a lot faster.
The advantage of randomized quicksort is that there's no one input that will always cause it to run in time Θ(n^2), and the runtime is expected to be O(n log n). Deterministic quicksort algorithms usually have the drawback that either (1) they run in worst-case time O(n log n), but with a high constant factor, or (2) they run in worst-case time O(n^2) and the sort of input that triggers this case is deterministic.
In standard quicksort, the pivot is always taken from a fixed position, e.g. the rightmost index of the current sub-array, whereas in randomized quicksort the pivot can be any element of the sub-array, chosen at random.

Inserting a new value in binary search tree

Using an algorithm Tree-Insert(T, v) that inserts a new value v into a binary search tree T, the following algorithm grows a binary search tree by repeatedly inserting each value in a given section of an array into the tree:
Tree-Grow(A, first, last, T)
    for i ← first to last
        do Tree-Insert(T, A[i])
1) If the tree is initially empty, and the length of the array section (i.e., last-first+1) is n, what are the best-case and the worst-case asymptotic running times of the above algorithm, respectively?
2) When n = 7, give a best-case instance (as an array containing digits 1 to 7, in a certain order), and a worst-case instance (in the same form) of the algorithm.
3) If the array is sorted and all the values are distinct, find a way to modify Tree-Grow, so that it will always build the shortest tree.
4) What are the best-case and the worst-case asymptotic running times of the modified algorithm, respectively?
Please tag homework questions with the homework tag. In order to do well on your final exam, I suggest you actually learn this stuff, but I'm not here to judge you.
1) It takes O(n) to iterate from first to last. It takes O(lg n) to insert into a balanced binary search tree, therefore the algorithm that you have shown takes O(n lg n) in the best case.
The worst case of inserting into a binary tree is when the tree is really long, but not very bushy; similar to a linked list. In that case, it would take O(n) to insert, therefore it would take O(n^2) in the worst case.
2) Best Case: [4, 2, 6, 1, 3, 5, 7], Worst Case: [1, 2, 3, 4, 5, 6, 7]
3) Insert the element at the n/2 index first, so that it becomes the root, then recursively do the same for the left side and right side of the array.
4) O(n lg n) in the best and worst case.
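The points above can be checked with a small sketch. It assumes a plain, unbalanced binary search tree, and the names `Node`, `tree_insert`, `tree_grow`, `midpoint_order` and `height` are made up for this example rather than taken from the question.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None


def tree_insert(root, value):
    """Insert `value` into an unbalanced BST and return the (possibly new) root."""
    new = Node(value)
    if root is None:
        return new
    cur = root
    while True:
        if value < cur.value:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def tree_grow(values):
    """Tree-Grow: insert the values one by one, in the given order."""
    root = None
    for v in values:
        root = tree_insert(root, v)
    return root


def midpoint_order(sorted_values):
    """Reorder a sorted list: middle element first, then each half recursively."""
    if not sorted_values:
        return []
    mid = len(sorted_values) // 2
    return ([sorted_values[mid]]
            + midpoint_order(sorted_values[:mid])
            + midpoint_order(sorted_values[mid + 1:]))


def height(root):
    """Number of nodes on the longest root-to-leaf path."""
    if root is None:
        return 0
    return 1 + max(height(root.left), height(root.right))


print(height(tree_grow([1, 2, 3, 4, 5, 6, 7])))                  # degenerate chain
print(height(tree_grow([4, 2, 6, 1, 3, 5, 7])))                  # balanced tree
print(height(tree_grow(midpoint_order([1, 2, 3, 4, 5, 6, 7]))))  # modified Tree-Grow
```

Feeding the sorted array through `midpoint_order` before inserting is one way to realize the modification in 3); each insert then walks a path of length at most about lg n.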
I hope this helps.
