Difference in Space Complexity of different sorting algorithms

I am trying to understand the space complexities of different sorting algorithms.
http://bigocheatsheet.com/?goback=.gde_98713_member_241501229
From the above link I found that the space complexity of bubble sort, insertion sort and selection sort is O(1), whereas quicksort is O(log(n)) and merge sort is O(n).
We are not actually allocating extra memory in any of these algorithms, so why are the space complexities different when we are sorting the same array in place?

When you run code, memory is assigned in two ways:
Implicitly, as you set up function calls.
Explicitly, as you create chunks of memory.
Quicksort is a good example of implicit use of memory. While I'm doing a quicksort, I'm recursively calling myself O(n) times in the worst case and O(log(n)) times in the average case. Each of those recursive calls takes O(1) space to keep track of, leading to O(n) space in the worst case and O(log(n)) in the average case.
Mergesort is a good example of explicit use of memory. I take two blocks of sorted data, create a place to put the merged result, and then merge from those two blocks into that new space. Creating a place to put the merged result takes O(n) space.
To get down to O(1) memory you need to both not allocate memory AND not call yourself recursively. This is true of all of bubble, insertion and selection sort.
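As a concrete illustration of the O(1) case, here is a minimal insertion sort sketch in C (one possible implementation, not the only one): it sorts within the array it is given, allocates nothing, and never calls itself, so the extra space is just a few local variables.
#include <stddef.h>

/* In-place insertion sort: no allocation, no recursion, O(1) extra space. */
void insertion_sort(int *a, size_t n) {
    for (size_t i = 1; i < n; i++) {
        int key = a[i];
        size_t j = i;
        while (j > 0 && a[j - 1] > key) {  /* shift larger elements one slot right */
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;                        /* drop the saved element into its place */
    }
}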

It's important to keep in mind that there are a lot of different ways to implement each of these algorithms, and each different implementation has a different associated space complexity.
Let's start with merge sort. The most common implementation of mergesort on arrays works by allocating an external buffer in which to perform the merges of the individual ranges. This requires space to hold all the elements of the array, which takes extra space Θ(n). However, you could alternatively use an in-place merge for each merge, which means that the only extra space you'd need would be space for the stack frames of the recursive calls, dropping the space complexity down to Θ(log n), but increasing the runtime of the algorithm by a large constant factor. You could alternatively do a bottom-up mergesort using in-place merging, which requires only O(1) extra space but with a higher constant factor.
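As a rough sketch of that buffered version (one common way to write it, assuming int elements; the in-place and bottom-up variants mentioned above are not shown), the single full-size buffer is the Θ(n) extra space, and the recursion adds Θ(log n) stack frames on top of it:
#include <stdlib.h>
#include <string.h>

/* Merge a[lo..mid) and a[mid..hi) through the auxiliary buffer tmp. */
static void merge(int *a, int *tmp, size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    memcpy(a + lo, tmp + lo, (hi - lo) * sizeof *a);  /* copy the merged run back */
}

static void msort(int *a, int *tmp, size_t lo, size_t hi) {
    if (hi - lo < 2) return;              /* 0 or 1 elements: already sorted */
    size_t mid = lo + (hi - lo) / 2;
    msort(a, tmp, lo, mid);
    msort(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

/* Theta(n) extra space: one buffer the size of the whole array,
   plus Theta(log n) stack frames for the recursion. */
int merge_sort(int *a, size_t n) {
    if (n < 2) return 0;
    int *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return -1;
    msort(a, tmp, 0, n);
    free(tmp);
    return 0;
}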
On the other hand, if you're merge sorting linked lists, then the space complexity is going to be quite different. You can merge linked lists in space O(1) because the elements themselves can easily be rewired. This means that the space complexity of merge sorting linked lists is Θ(log n) from the space needed to store the stack frames for the recursive calls.
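A minimal sketch of such a pointer-rewiring merge, assuming a simple singly linked node type: nothing is allocated, the existing nodes are just relinked, so the merge step itself needs only O(1) extra space.
struct node { int val; struct node *next; };

/* Merge two sorted lists by relinking the existing nodes. */
struct node *merge_lists(struct node *a, struct node *b) {
    struct node head, *tail = &head;   /* dummy head simplifies the relinking */
    while (a && b) {
        if (a->val <= b->val) { tail->next = a; a = a->next; }
        else                  { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b;            /* append whichever list remains */
    return head.next;
}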
Let's look at quicksort as another example. Quicksort doesn't normally allocate any external memory, but it does need space for the stack frames it uses. A naive implementation of quicksort might need space Θ(n) in the worst case for stack frames if the pivots always end up being the largest or smallest element of the array, since in that case you keep recursively calling the function on arrays of size n - 1, n - 2, n - 3, etc. However, there's a standard optimization you can perform that's essentially tail-call elimination: you recursively invoke quicksort on the smaller of the two halves of the array, then reuse the stack space from the current call for the larger half. This means that you only allocate new memory for a recursive call on subarrays of size at most n / 2, then n / 4, then n / 8, etc. so the space usage drops to O(log n).
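Here is one way that optimization can look (a sketch using a Lomuto-style partition; real implementations vary): the recursive call always goes to the smaller part and the loop continues on the larger part, so the stack depth stays O(log n) even with consistently bad pivots.
#include <stddef.h>

/* Lomuto partition: returns the final index of the pivot (a[hi]). */
static size_t partition(int *a, size_t lo, size_t hi) {
    int pivot = a[hi];
    size_t i = lo;
    for (size_t j = lo; j < hi; j++) {
        if (a[j] < pivot) { int t = a[i]; a[i] = a[j]; a[j] = t; i++; }
    }
    int t = a[i]; a[i] = a[hi]; a[hi] = t;
    return i;
}

/* Sorts a[lo..hi] inclusive (call as quicksort(a, 0, n - 1) for n >= 1).
   The recursion always handles the smaller part, which has at most half
   the elements, so the stack depth is bounded by O(log n). */
void quicksort(int *a, size_t lo, size_t hi) {
    while (lo < hi) {
        size_t p = partition(a, lo, hi);
        if (p - lo < hi - p) {               /* left part is smaller */
            if (p > lo) quicksort(a, lo, p - 1);
            lo = p + 1;                      /* loop on the larger right part */
        } else {                             /* right part is smaller (or equal) */
            quicksort(a, p + 1, hi);
            if (p == 0) break;               /* guard against size_t underflow */
            hi = p - 1;                      /* loop on the larger left part */
        }
    }
}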

I'll assume the array we're sorting is passed by reference, and I'm assuming the space for the array does not count in the space complexity analysis.
The space complexity of quicksort can be kept to O(n) in the worst case (and expected O(log n) for randomized quicksort) with a careful implementation: e.g. don't copy the whole sub-arrays, just pass indexes into the original array.
The O(n) for quicksort comes from the fact that the number of "nested" recursive calls can be O(n): think of what happens if you keep making unlucky choices for the pivot. While each stack frame takes O(1) space, there can be O(n) stack frames. The expected depth (i.e. expected stack space) is O(log n) if we're talking about randomized quicksort.
For merge sort I'd expect the stack space to be O(log n), because you make at most O(log n) "nested" recursive calls (an array implementation that merges through an auxiliary buffer additionally needs O(n) for that buffer).
The results you're citing also count the space taken by the arrays: then the space complexity of merge sort is O(log n) for stack space plus O(n) for the array, which means O(n) total space complexity. For quicksort it is O(n) + O(n) = O(n).


Space complexity Merge sort, Insertion sort explained (for dummies)

I was wondering if someone could explain to me how the space complexity of both of these algorithms works. I have done some reading on it, but the explanations seem contradictory, if I understand them correctly.
For example, I'm interested in how a linked list would affect the space complexity, and this question suggests it makes a difference:
Why is mergesort space complexity O(log(n)) with linked lists?
This question, however, says it shouldn't matter: Merge Sort Time and Space Complexity
Now, I'm a bit new to programming and would like to understand the theory a bit better, so dummy-friendly language would be appreciated.
The total space complexity of merge sort is O(n) since you have to store the elements somewhere. Nevertheless, there can indeed be a difference in additional space complexity, between an array implementation and a linked-list implementation.
Note that you can implement an iterative (bottom-up) version that only requires O(1) additional space. However, if I remember correctly, this version performs rather poorly.
In the conventional recursive version, you need to account for the stack frames. That alone gives a O(log n) additional space requirement.
In a linked-list implementation, you can perform merges in-place without any auxiliary memory. Hence the O(log n) additional space complexity.
In an array implementation, merges require auxiliary memory (likely an auxiliary array), and the last merge requires the same amount of memory as that used to store the elements in the first place. Hence the O(n) additional space complexity.
Keep in mind that space complexity tells you how the space needs of the algorithm grow as the input size grows. There are details that space complexity ignores: notably, the size of a stack frame and the size of an element are probably different, and a linked list takes up more space than an array because of the links (the references). That last detail is important for small elements, since the additional space requirement of the array implementation is then likely less than the additional space taken by the links of the linked-list implementation.
Why is merge sort space complexity O(log(n)) with linked lists?
This is only true for top-down merge sort on linked lists, where O(log2(n)) stack space is used due to recursion. For bottom-up merge sort on linked lists, the space complexity is O(1) (constant space). One example of an optimized bottom-up merge sort for a linked list uses a small (26 to 32 entry) array of pointers or references to the first nodes of lists. This is still considered O(1) space complexity. Link to pseudo code in the wiki article:
https://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation_using_lists
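A sketch of that scheme in C, assuming a simple singly linked node type (the pseudo code behind the link above is the reference; details vary): bins[i] holds a sorted run of 2^i nodes, and since the bins array has a fixed size, the extra space is O(1).
#include <stddef.h>

struct node { int val; struct node *next; };

/* Merge two sorted lists by relinking nodes (O(1) extra space). */
static struct node *merge(struct node *a, struct node *b) {
    struct node head, *tail = &head;
    while (a && b) {
        if (a->val <= b->val) { tail->next = a; a = a->next; }
        else                  { tail->next = b; b = b->next; }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return head.next;
}

#define NBINS 32   /* 32 bins cover lists of billions of nodes */

/* Bottom-up merge sort: bins[i] holds a sorted run of 2^i nodes or NULL.
   Only the fixed-size bins array is extra space, so this is O(1). */
struct node *sort_list(struct node *list) {
    struct node *bins[NBINS] = {0};
    while (list) {
        struct node *run = list;           /* peel off one node as a run of size 1 */
        list = list->next;
        run->next = NULL;
        int i = 0;
        while (i < NBINS && bins[i]) {     /* merge with same-sized runs, like a binary carry */
            run = merge(bins[i], run);
            bins[i] = NULL;
            i++;
        }
        if (i == NBINS) i--;               /* don't run off the end of the array */
        bins[i] = run;
    }
    struct node *result = NULL;
    for (int i = 0; i < NBINS; i++)        /* final sweep: merge whatever runs remain */
        result = merge(bins[i], result);   /* merge(NULL, x) just returns x */
    return result;
}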

For faster searching, shouldn't one apply merge sort on the data before doing a binary search, or just jump straight to a linear search?

I'm learning about algorithms and have doubts about their application in certain situations. There is divide-and-conquer merge sort, and there is binary search. Both are faster than linear-time algorithms.
Let's say I want to search for some value in a large list of data, and I don't know whether the data is sorted or not. Instead of doing a linear search, why not first do a merge sort and then a binary search? Would that be faster? Or would doing a merge sort followed by a binary search be even slower than a linear search? Why? Would it depend on the size of the data?
There's a flaw in the premise of your question. Merge Sort has O(N log N) complexity, which is the best any comparison-based sorting algorithm can be, but that's still a lot slower than a single linear scan. Note that log2(1000) ~= 10. (Obviously, the constant factors matter a lot, esp. for smallish problem sizes. Linear search of an array is one of the most efficient things a CPU can do. Copying stuff around for MergeSort is not bad, because the loads and stores are from sequential addresses (so caches and prefetching are effective), but it's still a ton more work than 10 reads through the array.)
If you need to support a mix of insert/delete and query operations, all with good time complexity, pick the right data structure for the task. A binary search tree is probably appropriate (or a Red-Black tree or some other variant that does some kind of rebalancing to prevent O(n) worst-case behaviour). That'll give you O(log n) query, and O(log n) insert/delete.
sorted array gives you O(n) insert/delete (because you have to shuffle the remaining elements over to make or close gaps), but O(log n) query (with lower time and space overhead than a tree).
unsorted array: O(n) query (linear search), O(1) insert (append to the end), O(n) delete (O(n) query, then shuffle elements to close the gap). Efficient deletion of elements near the end.
linked list, sorted or unsorted: few advantages other than simplicity.
hash table: insert/delete: O(1) average (amortized). query for present/not-present: O(1). Query for which two elements a non-present value is between: O(n) linear scan keeping track of the min element greater than x, and max element less than x.
If your inserts/deletes happen in large chunks, then sorting the new batch and doing a merge is much more efficient than adding elements one at a time to a sorted array (i.e. InsertionSort). Adding a chunk at the end and doing QuickSort on the whole array is also an option, and might modify less memory.
So the best choice depends on the access pattern you're optimizing for.
If the list is of size n, then
TimeOfMergeSort(list) + TimeOfBinarySearch(list) = O(n log n) + O(log n) = O(n log n)
TimeOfLinearSearch(list) = O(n)
O(n) < O(n log n)
Implies
TimeOfLinearSearch(list) < TimeOfMergeSort(list) + TimeOfBinarySearch(list)
Of course, as mentioned in the comments, the frequency of sorting and the frequency of searching play a huge role in the amortized cost.
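To make the trade-off concrete, here is a small sketch using the C standard library's qsort and bsearch (illustrative only): for a one-off query the plain scan is enough, while repeated queries are what justify paying O(n log n) once to sort and then O(log n) per lookup.
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* One-off lookup: a plain linear scan, O(n), no preprocessing. */
static int contains_linear(const int *a, size_t n, int key) {
    for (size_t i = 0; i < n; i++)
        if (a[i] == key) return 1;
    return 0;
}

int main(void) {
    int data[] = {42, 7, 19, 3, 88, 51, 23, 11};
    size_t n = sizeof data / sizeof data[0];

    printf("%d\n", contains_linear(data, n, 19));  /* fine for a single query */

    /* Many queries: pay O(n log n) once, then each lookup is O(log n). */
    qsort(data, n, sizeof data[0], cmp_int);
    int key = 19;
    printf("%s\n", bsearch(&key, data, n, sizeof data[0], cmp_int) ? "found" : "absent");
    return 0;
}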

Why is quicksort considered the fastest sorting algorithm?

Quicksort has a worst-case time complexity of O(n^2), while others like heapsort and merge sort have a worst-case time complexity of O(n log n). Still, quicksort is considered faster. Why?
On a side note, if sorting an array of integers, then counting / radix sort is fastest.
In general, merge sort does more moves but fewer compares than quicksort. The typical implementation of merge sort uses a temp array of the same size as the original array, or half that size (sort the 2nd half within the second half, sort the first half into the temp array, then merge the temp array and the 2nd half into the original array), so it needs more space than quicksort, which optimally only needs log2(n) levels of recursion. To avoid worst-case nesting, a depth check may be used and quicksort switched to heapsort; this is called introsort.
If the compare overhead is greater than the move overhead, then merge sort is faster. A common example where compares take longer than moves would be sorting an array of pointers to strings. Only the (4 or 8 byte) pointers are moved, while the strings may be significantly larger (and similar for a large number of strings).
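A small illustration of that case using the C standard library's qsort (illustrative only): each compare may walk two whole strings, while the sort itself only moves 4- or 8-byte pointers.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator over an array of char*: the compare reads string contents,
   which can be expensive, while the sort only shuffles the pointers. */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(char * const *)a, *(char * const *)b);
}

int main(void) {
    char *words[] = {"pear", "apple", "banana", "cherry"};
    size_t n = sizeof words / sizeof words[0];
    qsort(words, n, sizeof words[0], cmp_str);
    for (size_t i = 0; i < n; i++)
        puts(words[i]);
    return 0;
}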
If there is significant pre-ordering of the data to be sorted, then timsort (fixed sized runs) or a "natural" merge sort (variable sized runs) will be faster.
While it is true that quicksort has worst case time complexity of O(n^2), as long as the quicksort implementation properly randomizes the input, its average case (expected) running time is O(n log n).
Additionally, the constant factors hidden by the asymptotic notation, which do matter in practice, are pretty small compared to other popular choices such as merge sort. Thus, in expectation, quicksort will outperform other O(n log n) comparison sorts despite its less savory worst-case bounds.
Not exactly. Quicksort is the best choice in most cases; however, its pessimistic (worst-case) time complexity can be O(n^2), which doesn't mean it always is. The issue lies in choosing the right pivot: if you choose it well, you get O(n log n) time complexity.
In addition, quicksort is one of the cheapest/easiest sorts to implement.

Sort Stack Ascending Order (Space Analysis)

I was going through the book "Cracking the Coding Interview" and came across the question
"Write a program to sort a stack in ascending order. You may use additional stacks to hold items, but you may not copy the elements into any other data structures (such as an array). The stack supports the following operations: push, pop, peek, isEmpty."
The book gave an answer with O(n^2) time complexity and O(n) space.
However, I came across this blog providing an answer with O(n log n) time complexity using a quicksort approach.
What I was wondering is whether the space complexity is O(n^2), since each call to the method initializes another two stacks and makes another two recursive calls.
I'm still a little shaky on space complexity, and I'm not sure whether this would be O(n^2) space, given that the new stacks spawned by each recursive call are smaller than the ones a level up.
If anyone could give a little explanation behind their answer, that would be great.
The space complexity is also O(n log n) in the average case. If the space complexity were O(n^2), the time complexity could not be O(n log n), since each allocated cell needs at least one access.
So, in the average case, assuming the stack is divided in half each time, at the ith depth of recursion each sub-stack has size O(n/2^i), and there are 2^i recursion branches at that depth.
So the total size allocated at the ith depth is O(n/2^i) * 2^i = O(n).
Since the maximum depth is log n, the overall space complexity is O(n log n).
However, in the worst case, the space complexity is O(n^2).
In this quicksort-based method, the space complexity follows the time complexity exactly, and the reason is quite simple: you keep dividing the sub-stacks recursively (around the pivot) until each element sits in a stack of size one. That takes log n levels of division (2^x = n), and at the end you have n stacks, each of size one. Hence the total space complexity is O(n log n).
Keep in mind that, in this case, the space complexity follows the time complexity exactly because we are literally occupying new space at each level. So, in the worst case, the space complexity will be O(n^2).

How to compute the algorithmic space complexity

I am reviewing my data structures and algorithm analysis lessons, and I have a question about how to determine the space complexity of the merge sort and quicksort algorithms:
The depth of recursion is only O(log n) for linked list merge-sort
The amount of extra storage space needed for contiguous quick sort is O(n).
My thoughts:
Both use a divide-and-conquer strategy, so I guess the space complexity of linked-list merge sort should be the same as that of contiguous quicksort. Actually, I'd opt for O(log n), because before every iteration or recursive call the list is divided in half.
Thanks for any pointers.
The worst case depth of recursion for quicksort is not (necessarily) O(log n), because quicksort doesn't divide the data "in half", it splits it around a pivot which may or may not be the median. It's possible to implement quicksort to address this[*], but presumably the O(n) analysis was of a basic recursive quicksort implementation, not an improved version. That would account for the discrepancy between what you say in the blockquote, and what you say under "my thoughts".
Other than that I think your analysis is sound - neither algorithm uses any extra memory other than a fixed amount per level of recursion, so depth of recursion dictates the answer.
Another possible way to account for the discrepancy, I suppose, is that the O(n) analysis is just wrong. Or, "contiguous quicksort" isn't a term I've heard before, so if it doesn't mean what I think it does ("quicksorting an array"), it might imply a quicksort that's necessarily space-inefficient in some sense, such as returning an allocated array instead of sorting in-place. But it would be silly to compare quicksort and mergesort on the basis of the depth of recursion of the mergesort vs. the size of a copy of the input for the quicksort.
[*] Specifically, instead of calling the function recursively on both parts, you put it in a loop. Make a recursive call on the smaller part, and loop around to do the bigger part, or equivalently push (pointers to) the larger part onto a stack of work to do later, and loop around to do the smaller part. Either way, you ensure that the depth of the stack never exceeds log n, because each chunk of work not put on the stack is at most half the size of the chunk before it, down to a fixed minimum (1 or 2 if you're sorting purely with quicksort).
I'm not really familiar with the term "contiguous quicksort". But quicksort can have either O(n) or O(log n) space complexity depending on how it is implemented.
If it is implemented as follows:
quicksort(start, stop) {
    if (start >= stop) return;      // base case: at most one element
    m = partition(start, stop);     // place the pivot at index m
    quicksort(start, m - 1);        // first recursive call
    quicksort(m + 1, stop);         // second recursive call
}
Then the space complexity is O(n) in the worst case, not O(log n) as is commonly believed.
This is because the space used is the peak depth of the call stack, i.e. the number of nested calls that are live at the same time. If the partitioning is maximally unbalanced (the pivot always ends up at one end of the range), the depth follows the recurrence:
T(n) = T(n - 1) + O(1)
whose solution is T(n) = O(n). (With perfectly balanced partitions the depth is only O(log n), because the first recursive call returns and frees its frame before the second one starts.)
If we replace the second quicksort call with a loop (tail-call elimination) and always recurse on the smaller part, the depth satisfies T(n) = T(n/2) + O(1), and therefore T(n) = O(log n) (by case 2 of the Master Theorem).
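For concreteness, the tail-call-eliminated variant might look like this (a sketch in the same style as the snippet above; to get the O(log n) bound in the worst case, and not only under the balanced-partition assumption, recurse on the smaller part and loop over the larger, as described in the earlier answers):
quicksort(start, stop) {
    while (start < stop) {
        m = partition(start, stop);
        quicksort(start, m - 1);    // only one real recursive call remains
        start = m + 1;              // the second call becomes another loop iteration
    }
}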
Perhaps "contiguous quicksort" refers to the first implementation, because the two quicksort calls are next to each other, in which case the worst-case space complexity is O(n).
