Algorithm comparison in unsorted array

If I have an unsorted array A[1...n] and want to search for a number x, I can either:
1. use linear search to look for x directly, or
2. use bubble sort to sort the array A in ascending order, then use binary search to find x in the sorted array.
Which way will be more efficient, 1 or 2?
How can I justify it?

If you need to search for a single number, nothing can beat a linear search: sorting cannot proceed faster than O(n), and even that is achievable only in special cases. Moreover, bubble sort is extremely inefficient, taking O(n^2) time. Binary search is faster than that, so the overall time of option 2 is dominated by the O(n^2) sort.
Hence you are comparing O(n) to O(n^2); obviously, O(n) wins.
The picture would be different if you needed to search for k different numbers: repeated linear search costs O(kn), while sorting once and running k binary searches costs O(n^2 + k log n), so once k grows to roughly n or beyond, the outcome of the comparison may very well be the opposite.
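To see the gap concretely, here is a rough sketch of the two options in Python (the function names are just illustrative); timing them on a large random array makes the O(n) versus O(n^2) difference obvious:

def linear_search(a, x):
    # Option 1: a single O(n) scan, no preprocessing.
    for i, v in enumerate(a):
        if v == x:
            return i
    return -1

def bubble_sort_then_binary_search(a, x):
    # Option 2: the O(n^2) bubble sort dominates the O(log n) search.
    a = list(a)
    for i in range(len(a)):
        for j in range(len(a) - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return mid          # index in the sorted copy, not the original
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1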

Related

Sorting Algorithms with time complexity Log(n)

Is there any sorting algorithm with an average time complexity of log(n)?
For example, sort the array [8, 2, 7, 5, 0, 1] with time complexity log(n).
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) are n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, we actually have no way to, say, sort arbitrary objects like strings in less than n*log(n).
That said, there may be times when the list is not arbitrary; e.g. we have a list that is entirely sorted except for one element, and we need to put that element in place. In that case, a structure like a binary search tree can let you insert in log(n) time, but this is only possible because we are operating on a single element. Building up a tree (i.e. performing n inserts) is n*log(n) time.
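As a small illustration of that special case, here is a sketch using Python's standard bisect module rather than an actual tree: finding where the single out-of-place element belongs costs O(log n) comparisons, although inserting into a flat list still shifts O(n) elements, which a balanced binary search tree would avoid.

import bisect

a = [0, 1, 2, 5, 7, 8]           # already sorted
x = 4                            # the single element to place
pos = bisect.bisect_left(a, x)   # O(log n) comparisons to find the spot
a.insert(pos, x)                 # O(n) shift in a plain Python list
print(a)                         # [0, 1, 2, 4, 5, 7, 8]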
As @dominicm00 also mentioned, the answer is no.
In general, when you see an algorithm with a time complexity of log N (base 2), it means you are repeatedly dividing the input into two halves and discarding one of them. A sorting algorithm has to put every element in its appropriate place; discarding half of the list in each iteration is incompatible with that, since you must at least look at every element.
The most efficient sorting algorithms have a time complexity of O(n), but with some limitations. The three most famous algorithms with O(n) complexity are:
Counting sort, with time complexity O(n+k), where k is the maximum value in the given list. Assuming n >> k, you can consider its time complexity to be O(n).
Radix sort, with time complexity O(d*(n+k)), where d is the number of digits of the largest value and k is the base (the range of a single digit). Assuming n >> k and n >> d, the time complexity is effectively O(n).
Bucket sort, with time complexity O(n) on average, assuming the input is roughly uniformly distributed across the buckets.
But in general, due to the limitations of each of these algorithms, most implementations rely on O(n*log n) algorithms such as merge sort, quick sort, and heap sort.
There are also sorting algorithms with time complexity O(n^2), such as insertion sort, selection sort, and bubble sort, which are recommended only for small lists.
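For reference, a minimal counting sort sketch in Python, assuming non-negative integers whose maximum value k is small compared to n:

def counting_sort(a):
    # O(n + k): count occurrences, then rebuild the list in order.
    if not a:
        return []
    k = max(a)                        # assumes non-negative integers
    counts = [0] * (k + 1)
    for v in a:
        counts[v] += 1
    out = []
    for value, c in enumerate(counts):
        out.extend([value] * c)
    return out

print(counting_sort([8, 2, 7, 5, 0, 1]))   # [0, 1, 2, 5, 7, 8]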
Using a PLA it might be possible to implement counting sort for a few elements with a low range of values.
count each amount in parallel and sum using lg2(N) steps
find the offset of each element in lg2(N) steps
write the array in O(1)
Only massively parallel hardware would be able to do this; general-purpose CPUs would not, unless they implemented it in silicon as part of their SIMD units.

Sorting partially sorted array

Suppose you are given a sorted list of n elements followed by f(n) randomly ordered elements. How would you sort the list if (i) f(n) = O(log n)? I feel the best algorithm would be merge sort, but I am not sure of the resulting time complexity.
You should first sort the f(n) elements with any sort method and then use the merge step of merge sort for the final phase. The time complexity would be O(n), as the cost of sorting the O(log n) tail, at most O((log n)^2), is negligible compared to the linear merge over the sorted portion.
If by list you mean an array, you could reduce the number of comparisons to O((log n)^2) by looking for each insertion point in the sorted left portion using binary search. It would still take O(n) copying operations, so depending on the relative costs of copying vs. comparing, in practice the running time may be dominated by those comparisons even for moderately large values of n.
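A sketch of the first suggestion in Python, assuming the input is a list whose last f(n) elements are the unsorted ones (with 1 <= f(n) < n): sort the short tail, then do a single linear merge.

def sort_partially_sorted(a, f_n):
    # a[:-f_n] is already sorted; a[-f_n:] is the random tail (f_n >= 1).
    head, tail = a[:-f_n], sorted(a[-f_n:])   # sorting the tail: O(f(n) log f(n))
    # Single linear merge of the two sorted runs: O(n).
    out, i, j = [], 0, 0
    while i < len(head) and j < len(tail):
        if head[i] <= tail[j]:
            out.append(head[i]); i += 1
        else:
            out.append(tail[j]); j += 1
    out.extend(head[i:])
    out.extend(tail[j:])
    return out

print(sort_partially_sorted([1, 3, 4, 8, 9, 12, 7, 2], 2))   # [1, 2, 3, 4, 7, 8, 9, 12]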

Does comparison really take O(1) time? and if not... why do we use comparison sorts?

Consider two k-bit numbers (in binary representation):
$$A = A_1 A_2 A_3 A_4 ... A_k $$
$$B = B_1 B_2 B_3 B_4 ... B_k $$
to compare them we scan from left to right looking for the first position where the digits differ; the number with the 0 in that position is the smaller one. But what if the numbers are:
111111111111
111111111110
clearly this will require scanning the whole number, so if we are told nothing about the numbers ahead of time and are simply handed them:
Comparison takes $O(k)$ time.
Therefore when we look at the code for a sorting method such as high-performance quick sort:
HPQuicksort(list):                                                      T(n)
    check if list is sorted: if so, return list
    compute the median                                                  O(n) comparisons, i.e. O(nk)
    create empty lists $L_1$, $L_2$, and $L_3$                          O(1)
    scan through the list                                               O(n) iterations
        if element is less, place into $L_1$                            O(k) per comparison
        if element is more, place into $L_2$                            O(k)
        if element is equal, place into $L_3$                           O(k)
    return concatenation of HPQuicksort($L_1$), $L_3$, HPQuicksort($L_2$)   2 T(n/2)
Thus: T(n) = O(n) + O(nk) + 2 T(n/2), which solves to T(n) = O(nk log(n)).
Which means quicksort is slower than radix sort.
Why do we still use it then?
There seem to be two independent questions here:
1. Why do we claim that comparisons take time O(1) when analyzing sorting algorithms, when in reality they might not?
2. Why would we use quicksort on large integers instead of radix sort?
For (1), typically, the runtime analysis of sorting algorithms is measured in terms of the number of comparisons made rather than in terms of the total number of operations performed. For example, the famous sorting lower bound gives a lower bound in terms of number of comparisons, and the analyses of quicksort, heapsort, selection sort, etc. all work by counting comparisons. This is useful for a few reasons. First, typically, a sorting algorithm will be implemented by being given an array and some comparison function used to compare them (for example, C's qsort or Java's Arrays.sort). From the perspective of the sorting algorithm, this is a black box. Therefore, it makes sense to analyze the algorithm by trying to minimize the number of calls to the black box. Second, if we do perform our analyses of sorting algorithms by counting comparisons, it's easy to then determine the overall runtime by multiplying the number of comparisons by the cost of a comparison. For example, you correctly determined that sorting n k-bit integers will take expected time O(kn log n) using quicksort, since you can just multiply the number of comparisons by the cost of a comparison.
For your second question - why would we use quicksort on large integers instead of radix sort? - typically, you would actually use radix sort in this context, not quicksort, for the specific reason that you pointed out. Quicksort is a great sorting algorithm for sorting objects that can be compared to one another and has excellent performance, but radix sort frequently outperforms it on large arrays of large strings or integers.
Hope this helps!
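For completeness, here is a small LSD radix sort sketch in Python for non-negative integers (base 256 is an arbitrary choice, giving roughly k/8 stable passes for k-bit keys); this is the kind of routine that can beat comparison-based quicksort on large arrays of wide integer keys:

def radix_sort(a, base=256):
    # LSD radix sort for non-negative integers: one stable bucket pass
    # per digit, roughly O((n + base) * ceil(k / log2(base))) total.
    if not a:
        return []
    out = list(a)
    shift, max_val = 0, max(a)
    while (max_val >> shift) > 0:
        buckets = [[] for _ in range(base)]
        for v in out:
            buckets[(v >> shift) % base].append(v)
        out = [v for b in buckets for v in b]
        shift += 8               # log2(base) bits per pass; assumes base = 256
    return out

print(radix_sort([70303, 255, 12, 99999, 0, 1024]))   # [0, 12, 255, 1024, 70303, 99999]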

Insertion Sort with binary search

When implementing Insertion Sort, a binary search could be used to locate the position within the first i - 1 elements of the array into which element i should be inserted.
How would this affect the number of comparisons required? How would using such a binary search affect the asymptotic running time for Insertion Sort?
I'm pretty sure this would decrease the number of comparisons, but I'm not exactly sure why.
Straight from Wikipedia:
If the cost of comparisons exceeds the cost of swaps, as is the case
for example with string keys stored by reference or with human
interaction (such as choosing one of a pair displayed side-by-side),
then using binary insertion sort may yield better performance. Binary
insertion sort employs a binary search to determine the correct
location to insert new elements, and therefore performs ⌈log2(n)⌉
comparisons in the worst case, which is O(n log n). The algorithm as a
whole still has a running time of O(n2) on average because of the
series of swaps required for each insertion.
Source:
http://en.wikipedia.org/wiki/Insertion_sort#Variants
Here is an example:
http://jeffreystedfast.blogspot.com/2007/02/binary-insertion-sort.html
I'm pretty sure this would decrease the number of comparisons, but I'm
not exactly sure why.
Well, if you know insertion sort and binary search already, then it's pretty straightforward. When you insert a piece in insertion sort, you must compare it against the previous pieces. Say you want to move this [2] to the correct place; you would have to compare against 7 pieces before you find the right place.
[1][3][3][3][4][4][5] ->[2]<- [11][0][50][47]
However, if you start the comparison at the halfway point (like a binary search), then you'll only compare against about 3 pieces! You can do this because you know the left pieces are already in order (you can only do binary search if the pieces are in order!).
Now imagine if you had thousands of pieces (or even millions); this would save you a lot of time. I hope this helps. |=^)
If you have a good data structure for efficient binary searching, it is unlikely to have O(log n) insertion time. Conversely, a good data structure for fast insert at an arbitrary position is unlikely to support binary search.
To achieve the O(n log n) performance of the best comparison sorts with insertion sort would require both O(log n) binary search and O(log n) arbitrary insertion.
Binary Insertion Sort - take this array => {4, 5, 3, 2, 1}
Now, inside the main loop, imagine we are at the 3rd element. Using binary search we know where to insert 3, i.e. before 4.
Binary search uses O(log n) comparisons, which is an improvement, but we still need to insert 3 in the right place. For that we need to swap 3 with 5 and then with 4.
Because the insertion takes the same amount of time as it would without binary search, the worst-case complexity still remains O(n^2).
I hope this helps.
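A short binary insertion sort sketch in Python (the standard bisect module supplies the O(log n) search; the shift is still O(n) per insertion, which is why the overall worst case stays O(n^2)):

import bisect

def binary_insertion_sort(a):
    for i in range(1, len(a)):
        x = a[i]
        # O(log i) comparisons to find where a[i] belongs within a[:i].
        pos = bisect.bisect_right(a, x, 0, i)
        # O(i) moves to shift the larger elements one slot to the right.
        a[pos + 1:i + 1] = a[pos:i]
        a[pos] = x
    return a

print(binary_insertion_sort([4, 5, 3, 2, 1]))   # [1, 2, 3, 4, 5]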
Assuming the input array is already sorted (the best case), binary search will not reduce the number of comparisons, since the inner loop of plain insertion sort ends immediately after one compare (the previous element is already smaller). In general, the number of compares in insertion sort is at most the number of inversions plus the array size minus 1.
Since the number of inversions in a sorted array is 0, the maximum number of compares on an already sorted array is N - 1.
With binary search, each insertion costs log n time for the comparisons plus up to order n for the shifts.
For n elements in the worst case, n*(log n + n) is of order n^2.

Can you sort n integers in O(n) amortized complexity?

Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
What about trying to create a worst case of O(n) complexity?
Most of the algorithms in use today run in O(n log n) on average with an O(n^2) worst case.
Some, while using more memory, are O(n log n) in the worst case.
Can you with no limitation on memory usage create such an algorithm?
What if your memory is limited? How will this hurt your algorithm?
Any page on the intertubes that deals with comparison-based sorts will tell you that you cannot sort faster than O(n lg n) with comparison sorts. That is, if your sorting algorithm decides the order by comparing 2 elements against each other, you cannot do better than that. Examples include quicksort, bubblesort, mergesort.
Some algorithms, like count sort or bucket sort or radix sort do not use comparisons. Instead, they rely on the properties of the data itself, like the range of values in the data or the size of the data value.
Those algorithms might have faster complexities. Here is an example scenario:
You are sorting 10^6 integers, and each integer is between 0 and 10. Then you can just count the number of zeros, ones, twos, etc. and spit them back out in sorted order. That is how countsort works, in O(n + m) where m is the number of values your datum can take (in this case, m=11).
Another:
You are sorting 10^6 binary strings that are all at most 5 characters in length. You can use the radix sort for that: first split them into 2 buckets depending on their first character, then radix-sort them for the second character, third, fourth and fifth. As long as each step is a stable sort, you should end up with a perfectly sorted list in O(nm), where m is the number of digits or bits in your datum (in this case, m=5).
But in the general case, you cannot sort faster than O(n lg n) reliably (using a comparison sort).
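As a sketch of the second scenario (the answer above describes the MSD variant; an LSD version with stable passes is a little shorter to write and gives the same result for equal-length strings):

def radix_sort_strings(strings, width=5):
    # Assumes every string has exactly `width` characters over {'0', '1'}.
    out = list(strings)
    for pos in range(width - 1, -1, -1):       # last character first (LSD)
        buckets = {'0': [], '1': []}
        for s in out:                          # each pass is a stable bucket split
            buckets[s[pos]].append(s)
        out = buckets['0'] + buckets['1']
    return out

print(radix_sort_strings(["10110", "00011", "11100", "01010", "00001"]))
# ['00001', '00011', '01010', '10110', '11100']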
I'm not quite happy with the accepted answer so far. So I'm retrying an answer:
Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
The answer to this question depends on the machine that would execute the sorting algorithm. If you have a random access machine, which can operate on exactly 1 bit, you can do radix sort for integers with at most k bits, which was already suggested. So you end up with complexity O(kn).
But if you are operating on a fixed-size word machine with a word size of at least k bits (which all consumer computers are), the best you can achieve is O(n log n). This is because either log n < k, or you could do a counting sort first and then sort with an O(n log n) algorithm, which would yield the first case as well.
What about trying to create a worst case of O(n) complexity?
That is not possible; a link was already given. The idea of the proof is that in order to sort, you have to decide, for every pair of elements, which one is larger. Using transitivity, this can be represented as a decision tree, which must have at least n! leaves (one per possible permutation of the input) and therefore a depth of at least log2(n!) = Ω(n log n). So if you want performance better than Ω(n log n), you would have to remove edges from that decision tree. But if the decision tree is not complete, how can you be sure that you have made a correct decision about some elements a and b?
Can you with no limitation on memory usage create such an algorithm?
So, as explained above, that is not possible, and the remaining questions are therefore moot.
If the integers are in a limited range, then an O(n) "sort" of them would involve having a bit vector with one bit per possible value ... looping over the integers in question and, for each value v, setting bit v % 8 of byte v // 8 in that byte array to true. That is an "O(n)" operation. Another loop over that bit array to list/enumerate/return/print all the set bits is, likewise, an O(n) operation, assuming the range is proportional to n. (Naturally O(2n) is reduced to O(n).)
This is a special case where the range is small enough for the bit vector to fit within memory or in a file (with seek() operations). It is not a general solution; but it is described in Bentley's "Programming Pearls" --- and was allegedly a practical solution to a real-world problem (involving something like a "freelist" of telephone numbers ... something like: find the first available phone number that could be issued to a new subscriber).
(Note: it takes about 24 bits (log2(10^7) ≈ 23.3) to represent every possible integer up to 7 digits in length ... so there's plenty of room in the 2^31 bits of a typical Unix/Linux maximum-sized memory mapping.)
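A sketch of that bit-vector trick in Python, assuming the values are distinct non-negative integers below some known limit (the sample values are made up):

def bitmap_sort(values, limit):
    # One bit per possible value; assumes the values are distinct
    # integers in range(limit).
    bits = bytearray(limit // 8 + 1)
    for v in values:                      # O(n): set bit v % 8 of byte v // 8
        bits[v // 8] |= 1 << (v % 8)
    out = []
    for v in range(limit):                # O(limit): enumerate set bits in order
        if bits[v // 8] & (1 << (v % 8)):
            out.append(v)
    return out

print(bitmap_sort([8675309, 5551212, 8675310], 10_000_000))
# [5551212, 8675309, 8675310]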
I believe you are looking for radix sort.

Resources