Bubble sort complexity O(n) - complexity-theory

We have the series of numbers. We can see that this series is almost sorted.
Since this series is almost sorted, does it mean that the complexity is O(n)?

No. There are so many reasons it's hard to know where to start. First, O() notation is not defined for specific input examples. The complexity of an algorithm is defined for any possible input.
Aside from that, even an almost sorted list can require O(N^2) time to sort. Simply take a sorted list, swap the first and last elements, and pass that to Bubble Sort. That seems like it would meet the definition of almost sorted, but Bubble Sort will take on the order of N^2 operations to put the list in total order, because the small element that starts at the end moves only one position towards the front per pass.
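A minimal sketch of that experiment, assuming the common bubble sort variant that stops as soon as a pass makes no swaps. The function name `bubble_sort_comparisons` and the list size are made up for illustration; the function counts comparisons instead of measuring time.

```python
def bubble_sort_comparisons(values):
    """Sort a copy of values with early-exit bubble sort.

    Returns (sorted_list, number_of_comparisons).
    """
    data = list(values)
    comparisons = 0
    end = len(data) - 1          # last index that may still be out of place
    swapped = True
    while swapped and end > 0:
        swapped = False
        for i in range(end):
            comparisons += 1
            if data[i] > data[i + 1]:
                data[i], data[i + 1] = data[i + 1], data[i]
                swapped = True
        end -= 1                 # the largest remaining element is now in place
    return data, comparisons


n = 100
already_sorted = list(range(n))
ends_swapped = list(range(n))
ends_swapped[0], ends_swapped[-1] = ends_swapped[-1], ends_swapped[0]

print(bubble_sort_comparisons(already_sorted)[1])  # 99, i.e. n - 1
print(bubble_sort_comparisons(ends_swapped)[1])    # 4950, i.e. n(n-1)/2
```

With only two elements out of place, the second input still needs every pass, which is the quadratic behaviour described above.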

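Yes. This example can be considered as O(n).
There are cases in which bubble sort (with the usual early exit when a pass makes no swaps) finishes in O(n). It cannot do better than that, since every element has to be looked at at least once.
Examples: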
Already sorted array (1 2 3 4 5 6)
An array in which only the alternate values are exchanged (2 1 4 3 6 5)
etc.
Keeping these best cases or exceptional cases aside, the complexity of Bubble sort for a given random unsorted array is O(N^2).

This is very vague, but O() notation is usually quoted for worst-case runtime. So whatever input is handed to bubble sort (for instance) takes at most on the order of n^2 operations to sort. Specific examples may take anywhere from the fewest operations possible to the most operations possible (with bubble sort that is O(n^2)).

Related

Sorting Algorithms with time complexity Log(n)

Is there any sorting algorithm with an average time complexity log(n)??
example [8,2,7,5,0,1]
sort given array with time complexity log(n)
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) take n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, we actually have no way to sort arbitrary objects, say strings, by comparisons alone in less than n*log(n).
That said, there may be times when the list is not arbitrary; e.g. we have a list that is entirely sorted except for one element, and we need to put that element in the list. In that case, a structure like a balanced binary search tree lets you insert in log(n), but this is only possible because we are operating on a single element. Building up a tree (i.e. performing n inserts) takes n*log(n) time.
As @dominicm00 also mentioned, the answer is no.
In general, when you see an algorithm with a time complexity of log N with base 2, it means that you are dividing the input into 2 sets and getting rid of one of them, repeatedly. A sorting algorithm needs to put all the elements in their appropriate places; getting rid of half of the list in each iteration is not compatible with that.
The most efficient sorting algorithms have a time complexity of O(n), but with some limitations. The three most famous algorithms with O(n) complexity are:
Counting sort, with time complexity O(n+k), where k is the maximum number in the given list. Assuming n>>k, you can consider its time complexity as O(n).
Radix sort, with time complexity O(d*(n+k)), where k is the number of possible digit values (the base) and d is the maximum number of digits of any number in the input list. Similar to counting sort, assuming n>>k && n>>d, the time complexity will be O(n).
Bucket sort, with average time complexity O(n) when the input is spread evenly across the buckets.
But in general, due to the limitations of each of these algorithms, most implementations rely on O(n log n) algorithms such as merge sort, quick sort, and heap sort.
There are also some sorting algorithms with time complexity O(n^2), such as insertion sort, selection sort, and bubble sort, which are recommended for lists of smaller sizes.
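As a rough illustration of the first of these, here is a sketch of counting sort for non-negative integers. The function name `counting_sort` is made up for this example, and the restriction to non-negative integers is an assumption of the sketch, not a property of every counting sort.

```python
def counting_sort(values):
    """Sort non-negative integers in O(n + k) time, where k = max(values)."""
    if not values:
        return []
    if min(values) < 0:
        raise ValueError("this sketch only handles non-negative integers")

    # One counter per possible value: this is the O(k) memory cost.
    counts = [0] * (max(values) + 1)
    for v in values:                 # O(n): tally each value
        counts[v] += 1

    result = []
    for value, count in enumerate(counts):   # O(n + k): emit values in order
        result.extend([value] * count)
    return result


print(counting_sort([8, 2, 7, 5, 0, 1]))  # [0, 1, 2, 5, 7, 8]
```

No two elements are ever compared with each other, which is why the n*log(n) bound for comparison sorts does not apply, and the `counts` array is why the approach stops being attractive when k is much larger than n.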
Using a PLA it might be possible to implement counting sort for a few elements with a low range of values:
count each amount in parallel and sum using lg2(N) steps
find the offset of each element in lg2(N) steps
write the array in O(1)
Only massively parallel computation would be able to do this; general-purpose CPUs would not do here unless they implement it in silicon as part of their SIMD.

Why is Insertion Sort, at O(n^2), better at sorting small arrays (~7 elements) compared to O(n log n) sorting algorithms like Quick Sort and Merge Sort?

What I have seen:
First, I have read these two other SO posts:
Why is Insertion sort better than Quick sort for small list of elements?
Is there ever a good reason to use Insertion Sort?
But the answers there are not specific enough for me.
The answers to these two posts mainly pointed out that Merge Sort and Quick Sort can be slow because of the extra overhead from the recursive function calls. But I am wondering how the specific threshold of 7 gets set.
My Question:
I want to know why the cutoff below which a quadratic sorting algorithm like Insertion Sort is faster than an O(n log n) sorting algorithm like Quick Sort or Merge Sort is around 7 elements.
Use insertion sort on small subarrays. Mergesort has too much overhead for tiny subarrays.
Cutoff to insertion sort for ~ 7 elements.
I got this from a Princeton lecture slide, which I think is a reputable enough source. See the 11th slide, under the Mergesort: Practical Improvements section.
I will really appreciate it if your answer includes examples for mathematical proof.
Big-O only notes the factor that dominates as n gets large. It ignores constant factors and lesser terms, which pretty much always exist and are more significant when n is small. As a consequence, Big-O is near useless for comparing algorithms that will only ever need to work on tiny inputs.
For example, you can have an O(n log n) function with a time graph like t = 5n log n + 2n + 3, and an O(n^2) function whose time graph was like t = 0.5n^2 + n + 2.
Compare those two graphs, and you'll find that in spite of Big-O, the O(n^2) function would be slightly faster until n reaches about 13.
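To check that crossover numerically, here is a small sketch. The answer does not say which logarithm base it has in mind; base 10 is assumed here because it reproduces the "about 13" figure, and with base 2 the crossover would come later. The function names are made up for illustration.

```python
import math


def t_nlogn(n):
    # Hypothetical cost model from the answer: t = 5 n log n + 2n + 3 (log base 10 assumed)
    return 5 * n * math.log10(n) + 2 * n + 3


def t_quadratic(n):
    # Hypothetical cost model from the answer: t = 0.5 n^2 + n + 2
    return 0.5 * n**2 + n + 2


def first_n_where_quadratic_is_slower(limit=1000):
    """Return the smallest n at which the quadratic model costs more."""
    for n in range(1, limit + 1):
        if t_quadratic(n) > t_nlogn(n):
            return n
    return None


print(first_n_where_quadratic_is_slower())  # 14
```

Under these made-up constants the quadratic function is cheaper for every n up to 13 and loses from 14 onwards. A real cutoff such as 7 comes out of the same kind of comparison, only with constants measured on actual hardware rather than invented.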

How can I find the upper and lower boundary for quick sort?

I got the average case complexity for quick sort. Now how can I find the upper and lower bounds for quick sort?
Quick Sort runs in O(N log(N)) time on average, with a worst case of O(N^2). Each partitioning step must go through all the numbers of the array and divide them into two sub-arrays, one lower and one higher than the selected pivot. Each of these sub-arrays then goes through the same process. This divide and conquer continues until the sub-arrays are small enough (size 1 or 2) to be trivially sorted. When the splits are roughly equal there are about log(N) levels, each doing O(N) work, which gives N log(N). This is easily seen with a binary tree of sub-array sizes, where the leaves (the bottom row) are sorted. Then you just concatenate them.
       8
   4       4
 2   2   2   2
A naive Quick Sort (for example one that always picks the first or last element as the pivot) runs into its O(N^2) worst case when you have an already sorted array. Something like insertion sort takes only O(N) time in this situation. If you are dealing with arrays that are partially sorted and you are pressed for time (for instance with millions of records), then you might want to create an algorithm of your own design that suits your data.
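The insertion sort claim can be checked by counting comparisons. This is only a sketch; the function name `insertion_sort_comparisons` and the input size are chosen for illustration.

```python
def insertion_sort_comparisons(values):
    """Sort a copy of values with insertion sort.

    Returns (sorted_list, number_of_comparisons).
    """
    data = list(values)
    comparisons = 0
    for i in range(1, len(data)):
        current = data[i]
        j = i - 1
        # Shift larger elements one slot to the right until current fits.
        while j >= 0:
            comparisons += 1
            if data[j] <= current:
                break
            data[j + 1] = data[j]
            j -= 1
        data[j + 1] = current
    return data, comparisons


n = 1000
print(insertion_sort_comparisons(list(range(n)))[1])         # 999, i.e. n - 1
print(insertion_sort_comparisons(list(range(n, 0, -1)))[1])  # 499500, i.e. n(n-1)/2
```

On sorted input each element needs a single comparison, so the total is linear; reversed input is the quadratic worst case.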
Reference: https://en.wikipedia.org/wiki/Quicksort

Sorting in O(n*log(n)) worst case

Is there a sort of an array that works in O(n*log(n)) worst case time complexity?
I saw in Wikipedia that there are sorts like that, but they are unstable. What does that mean? Is there a way to do it with low space complexity?
Is there a best sorting algorithm?
An algorithm that requires only O(1) extra memory (so modifying the input array is permitted) is generally described as "in-place", and that's the lowest space complexity there is.
A sort is described as "stable" or not, according to what happens when there are two elements in the input which compare as equal, but are somehow distinguishable. For example, suppose you have a bunch of records with an integer field and a string field, and you sort them on the integer field. The question is, if two records have the same integer value but different string values, then will the one that came first in the input, also come first in the output, or is it possible that they will be reversed? A stable sort is one that guarantees to preserve the order of elements that compare the same, but aren't identical.
It is difficult to make a comparison sort that is in-place, and stable, and achieves O(n log n) worst-case time complexity. I've a vague idea that it's unknown whether or not it's possible, but I don't keep up to date on it.
Last time someone asked about the subject, I found a couple of relevant papers, although that question wasn't identical to this question:
How to sort in-place using the merge sort algorithm?
As far as a "best" sort is concerned - some sorting strategies take advantage of the fact that on the whole, taken across a large number of applications, computers spend a lot of time sorting data that isn't randomly shuffled, it has some structure to it. Timsort is an algorithm to take advantage of commonly-encountered structure. It performs very well in a lot of practical applications. You can't describe it as a "best" sort, since it's a heuristic that appears to do well in practice, rather than being a strict improvement on previous algorithms. But it's the "best" known overall in the opinion of people who ship it as their default sort (Python, Java 7, Android). You probably wouldn't describe it as "low space complexity", though, it's no better than a standard merge sort.
You can check out between mergesort, quicksort or heapsort all nicely described here.
There is also radix sort, whose complexity is O(kN), but it pays for that with extra memory consumption.
You can also see that for smaller collections quicksort is faster and that mergesort then takes the lead, but all of this is case-specific, so take your time to study all 4 algorithms.
For the question of the best algorithm, the simple answer is: it depends. It depends on the size of the data set you want to sort, and it depends on your requirements. Say, bubble sort has worst-case and average complexity both O(n^2), where n is the number of items being sorted. There exist many sorting algorithms with substantially better worst-case or average complexity of O(n log n). Even other O(n^2) sorting algorithms, such as insertion sort, tend to have better performance than bubble sort. Therefore, bubble sort is not a practical sorting algorithm when n is large.
Among simple average-case Θ(n^2) algorithms, selection sort almost always outperforms bubble sort, but is generally outperformed by insertion sort.
Selection sort is greatly outperformed on larger arrays by Θ(n log n) divide-and-conquer algorithms such as mergesort. However, insertion sort and selection sort are both typically faster for small arrays.
In the same way, you can select the best sorting algorithm for your own requirements.
It is proven that Ω(n log n) comparisons is the lower bound for sorting generic items by comparison. It is also clear that Ω(n) is a lower bound for sorting integers (you need at least to read the input :) ).
The specific instance of the problem will determine what is the best algorithm for your needs, e.g. sorting 1M strings is different from sorting 2M 7-bit integers in 2MB of RAM.
Also consider that besides the asymptotic runtime complexity, the implementation makes a big difference, as do the amount of available memory and the caching policy.
I could implement quicksort in 1 line in python, roughly keeping O(n log n) complexity (with some caveat about the pivot), but Big-Oh notation says nothing about the constant terms, which are relevant too (i.e. this is ~30x slower than python built-in sort, which is likely written in C btw):
qsort = lambda a: [] if not a else qsort([x for x in a if x < a[len(a)//2]]) + [x for x in a if x == a[len(a)//2]] + qsort([x for x in a if x > a[len(a)//2]])
For a discussion about stable/unstable sorting, look here http://www.developerfusion.com/article/3824/a-guide-to-sorting/6/.
You may want to get yourself a good algorithm book (e.g. Cormen, or Skiena).
Heapsort, maybe randomized quicksort
stable sort
As others already mentioned: no, there isn't. For example, you might want to parallelize your sorting algorithm. This leads to totally different sorting algorithms.
Regarding your question meaning stable, let's consider the following: We have a class of children associated with ages:
Phil, 10
Hans, 10
Eva, 9
Anna, 9
Emil, 8
Jonas, 10
Now, we want to sort the children in order of ascending age (and nothing else). Then, we see that Phil, Hans and Jonas all have age 10, so it is not clear in which order we have to order them since we sort just by age.
Now comes stability: If we sort stably, we keep Phil, Hans and Jonas in the order they were in before, i.e. we put Phil first, then Hans, and last, Jonas (simply because they were in this order in the original sequence and we only consider age as the comparison criterion). Similarly, we have to put Eva before Anna (both the same age, but in the original sequence Eva was before Anna).
So, the result is:
Emil, 8
Eva, 9
Anna, 9
Phil, 10 \
Hans, 10 | all aged 10, and left in original order.
Jonas, 10 /
To put it in a nutshell: Stability means that if two elements are equal (w.r.t. the chosen sorting criterion), the one coming first in the original sequence still comes first in the resulting sequence.
Note that you can easily transform any sorting algorithm into a stable sorting algorithm: If your original sequence holds n elements: e1, e2, e3, ..., en, you simply attach a counter to each one: (e1, 0), (e2, 1), (e3, 2), ..., (en, n-1). This means you store for each element its original position.
If two elements are now equal, you simply compare their counters and put the one with the lower counter value first. This increases runtime (and memory) by O(n) for attaching the counters, plus a constant factor per comparison, which is no asymptotic worsening since the best comparison sort already needs O(n lg n).
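The counter trick can be sketched as follows, using the class of children from above. The `selection_sort` here is a deliberately unstable sort written for this illustration, and `stabilized` and `by_age` are made-up helper names.

```python
children = [("Phil", 10), ("Hans", 10), ("Eva", 9),
            ("Anna", 9), ("Emil", 8), ("Jonas", 10)]


def selection_sort(items, key):
    """Plain selection sort with swaps; the long-distance swaps make it unstable."""
    items = list(items)
    for i in range(len(items)):
        smallest = min(range(i, len(items)), key=lambda j: key(items[j]))
        items[i], items[smallest] = items[smallest], items[i]
    return items


def stabilized(sort, items, key):
    """Attach each element's original position and use it as a tie-breaker."""
    tagged = [(item, position) for position, item in enumerate(items)]
    ordered = sort(tagged, key=lambda pair: (key(pair[0]), pair[1]))
    return [item for item, _ in ordered]


def by_age(child):
    return child[1]


unstable_names = [name for name, _ in selection_sort(children, by_age)]
stable_names = [name for name, _ in stabilized(selection_sort, children, by_age)]

print(unstable_names)  # ['Emil', 'Eva', 'Anna', 'Hans', 'Phil', 'Jonas']
print(stable_names)    # ['Emil', 'Eva', 'Anna', 'Phil', 'Hans', 'Jonas']
```

In the first result Hans has jumped ahead of Phil; with the positions attached, the same sorting routine returns the stable order from the example.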

Analysis of algorithms (complexity)

How are algorithms analyzed? What makes quicksort have an O(n^2) worst-case performance while merge sort has an O(n log(n)) worst-case performance?
That's a topic for an entire semester. Ultimately we are talking about the upper bound on the number of operations that must be completed before the algorithm finishes, as a function of the size of the input. We do not include the coefficients (e.g. the 10 in 10N or the 4 in 4N^2) because for N large enough, they don't matter anymore.
Proving what the big-oh of an algorithm is can be quite difficult. It requires a formal proof and there are many techniques. Often a good ad hoc way is to just count how many passes over the data the algorithm makes. For instance, if your algorithm has nested for loops, then for each of N items you must operate N times. That would generally be O(N^2).
As to merge sort, you split the data in half over and over. That takes log2(n) levels of splitting. And for each level you make a pass over the data, which gives N log(n).
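The merge sort count can be made concrete by instrumenting a simple top-down implementation. This is a sketch for illustration, not a tuned implementation; the name `merge_sort` and the input size are arbitrary.

```python
import math
import random


def merge_sort(values):
    """Return (sorted_list, comparisons) using top-down merge sort."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, left_cmp = merge_sort(values[:mid])     # log2(n) levels of halving
    right, right_cmp = merge_sort(values[mid:])

    merged = []
    comparisons = left_cmp + right_cmp
    i = j = 0
    # One linear pass over both halves per merge.
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


n = 1024
data = random.sample(range(n), n)
result, comparisons = merge_sort(data)
print(comparisons, "<=", int(n * math.log2(n)))
```

Whatever the input order, the comparison count stays below n*log2(n), because the halving does not depend on the contents of the list.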
Quick sort is a bit trickier because in the average case it is also N log(N). You have to imagine what happens if your partition splits the data such that every time you get only one element on one side of the partition. Then you will need to split the data N times instead of log(N) times, which makes it N^2. The advantage of quicksort is that it can be done in place, and that we usually get closer to N log(N) performance.
This is introductory analysis of algorithms course material.
An operation is defined (e.g. multiplication) and the analysis is performed in terms of either space or time.
Occurrences of this operation are counted. Typically analyses are performed with time as the dependent variable and input size as the independent variable.
Example pseudocode:
foreach $elem in @list
    op();
endfor
There will be n operations performed, where n is the size of @list. Count it yourself if you don't believe me.
To analyze quicksort and mergesort requires a decent level of what is known as mathematical sophistication. Loosely, you solve a recurrence relation (the discrete analogue of a differential equation) derived from the recursive structure of the algorithm.
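As a sketch of what those recurrences look like, assume each split or merge step on n elements costs c*n for some constant c, and that n is a power of two in the merge sort case. Both are simplifying assumptions made for this illustration.

```latex
% Merge sort: two half-size subproblems plus a linear merge
T(n) = 2\,T\!\left(\tfrac{n}{2}\right) + c\,n
\quad\Longrightarrow\quad
T(n) = c\,n\log_2 n + n\,T(1) = \Theta(n \log n)

% Quicksort, worst case: the pivot splits off a single element every time
T(n) = T(n-1) + c\,n
\quad\Longrightarrow\quad
T(n) = T(1) + c\sum_{k=2}^{n} k = T(1) + c\left(\frac{n(n+1)}{2} - 1\right) = \Theta(n^2)
```

The only difference between the two is how much the subproblem shrinks per step: halving gives log2(n) levels, while shrinking by one gives n levels.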
Both quicksort and merge sort split the array into two, sort each part recursively, then combine the result. Quicksort splits by choosing a "pivot" element and partitioning the array into elements smaller or greater than the pivot. Merge sort splits arbitrarily and then merges the results in linear time. In both cases a single step is O(n), and if the array size halves each time this would give a logarithmic number of steps. So we would expect O(n log(n)).
However, quicksort has a worst case where the split is always uneven, so you don't get a number of steps proportional to the logarithm of n, but a number of steps proportional to n. Merge sort splits exactly into two halves (or as close as possible), so it doesn't have this problem.
Quick sort has many variants, depending on pivot selection.
Let's assume we always select the 1st item in the array as the pivot.
If the input array is sorted, then Quick sort behaves like a kind of selection sort!
That is because you are not really dividing the array; you are only picking off the first item in each cycle.
On the other hand, merge sort will always divide the input array in the same manner, regardless of its content!
Also note: divide and conquer performs best when the divisions' lengths are nearly equal!
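A sketch of that first-item-pivot variant, written with list copies rather than in place to keep it short. The function name `quicksort_first_pivot` is made up, and one comparison is counted per element partitioned against the pivot.

```python
import random


def quicksort_first_pivot(values):
    """Quicksort that always uses the first element as pivot.

    Returns (sorted_list, comparisons). Not in place; for illustration only.
    """
    if len(values) <= 1:
        return list(values), 0
    pivot, rest = values[0], values[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, left_cmp = quicksort_first_pivot(smaller)
    right, right_cmp = quicksort_first_pivot(larger)
    # Every element of rest was compared against the pivot once.
    return left + [pivot] + right, len(rest) + left_cmp + right_cmp


n = 200
sorted_input = list(range(n))
shuffled_input = sorted_input[:]
random.Random(0).shuffle(shuffled_input)

print(quicksort_first_pivot(sorted_input)[1])    # 19900, i.e. n(n-1)/2
print(quicksort_first_pivot(shuffled_input)[1])  # far fewer, on the order of n log n
```

On the sorted input `smaller` is always empty, so each call only peels off the pivot, which is the selection-sort-like behaviour described above.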
Analysing algorithms is a painstaking effort, and it is error-prone. I would compare it with a question like, how much chance do I have to get dealt two aces in a bridge game. One has to carefully consider all possibilities and must not overlook that the aces can arrive in any order.
So what one does to analyse those algorithms is go through the actual pseudocode of the algorithm and work out what a worst-case situation would cost. In the following I will paint with a broad brush.
For quicksort one has to choose a pivot to split the set. In a case of dramatically bad luck the set splits into a set of n-1 and a set of 1 each time, for n steps, where each step means inspecting up to n elements. This arrives at N^2.
For merge sort one starts by splitting the sequence into in-order sequences. Even in the worst case that means at most n sequences. Those can be combined two by two, then the larger sets are combined two by two, etc. Those (at most) n/2 first combinations deal with extremely small subsets, and the last step deals with subsets that have about size n in total, but there is just one such step. Each round of combining touches about n elements in total and there are about log(N) rounds. This arrives at N log(N).

Resources