Best runtime for n-1 comparisons? - algorithm

If an algorithm must make n-1 comparisons to find a certain element, can we assume that the best possible runtime of the algorithm is O(n)?
I know that the lower bound for comparison-based sorting algorithms is n log n, but since we only return the one element we found, I figured it might be possible to do better in terms of running time?
Thanks!

To find a certain element in an unsorted list you need O(n) time.
But if you sort the array first (which takes O(n log n) in general), you can then find a certain element in O(log n).
So if you often need to find elements in the same list, it is most likely worth sorting the list once so that later lookups are much more efficient.
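A minimal sketch of that idea (illustrative Python; the names are mine, not from the answer):

    from bisect import bisect_left

    def contains(sorted_items, target):
        # Binary search on an already-sorted list: O(log n) per lookup.
        i = bisect_left(sorted_items, target)
        return i < len(sorted_items) and sorted_items[i] == target

    data = [8, 2, 7, 5, 0, 1]
    data.sort()                 # O(n log n), paid once
    print(contains(data, 5))    # True, O(log n)
    print(contains(data, 3))    # False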

If your array is unsorted and you search for some element in it, then in the worst case the linear search algorithm makes n-1 comparisons and the time complexity is O(n).
But if you want to reduce the time complexity, first sort your array and then use the binary search algorithm, which takes O(log n) in the worst case.
So binary search is more efficient than linear search.
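For contrast, a minimal linear-search sketch (illustrative Python):

    def linear_search(items, target):
        # Scan left to right: up to n comparisons in the worst case, i.e. O(n).
        for i, x in enumerate(items):
            if x == target:
                return i
        return -1

    print(linear_search([8, 2, 7, 5, 0, 1], 0))   # 4
    print(linear_search([8, 2, 7, 5, 0, 1], 3))   # -1, after scanning everything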

For unsorted elements, worst case is when you have to go over all the elements, i.e., O(N). If you need many look-ups then you have several pre-processing alternatives that speed up all future accesses.
Option 1: put the elements in a standard hash table. Creating the hash table costs O(N) on average, and later you pay O(1) on average for each lookup. This assumes that a reasonable hash function can be created for this type of element.
Most languages/libraries implement bucket-based hash-tables, which in pathological cases can put all elements in one bucket, costing O(N) per lookup.
Option 2: there are other hash-table implementations that don't suffer from pathological O(N) cases. Robin Hood hashing (Wikipedia; more at Programming.Guide) guarantees O(log N) lookup in the worst case, with an average of O(1).
Option 3: another option is to sort elements in O(N log N) once, and then use binary-search to lookup in O(log N). Usually this is slower than Robin Hood hashing (Option 2).
Option 4: if the values are simple integers with a limited range, with max-min around N, then it is possible to put the values in an array (list) such that array[value-min] contains a count of how many times the value appears in the input. It costs O(N) to construct and O(1) to look up. Better yet, the constants for both preprocessing and lookup are significantly lower than in any other method (see the sketch after this answer).
Note: I didn't mention the O(N) counting-sort as an alternative to the general case of O(N log N) sorting (option 3), since if max(value)-min(value) is small enough for counting-sort, then option 4 is relevant and is simpler and faster.
If applicable, choose option 4; otherwise, if you are willing to invest time and code, choose option 2. If option 4 isn't applicable and option 2 is not worth the effort in your case, choose option 1, provided you don't mind the pathological worst case (never choose option 1 when an adversary may want to harm you, e.g. in a DoS attack).
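A sketch of option 4 (illustrative Python, assuming integer values with a small range):

    def build_counts(values):
        # O(N) preprocessing: counts[v - lo] = multiplicity of v.
        lo, hi = min(values), max(values)
        counts = [0] * (hi - lo + 1)
        for v in values:
            counts[v - lo] += 1
        return counts, lo

    def lookup(counts, lo, v):
        # O(1) lookup: how many times does v appear?
        i = v - lo
        return counts[i] if 0 <= i < len(counts) else 0

    counts, lo = build_counts([8, 2, 7, 5, 0, 1, 5])
    print(lookup(counts, lo, 5))   # 2
    print(lookup(counts, lo, 3))   # 0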

Your question has nothing to do with sorting, let alone linear search.
If you claim that n-1 comparisons are mandated, then your problem certainly has complexity Ω(n). But with that information alone you can't guarantee O(n): nothing says that these n-1 comparisons are sufficient, nor that the algorithm performs no extra operations, for instance to decide which comparisons to perform. It could turn out that your algorithm is O(n³) with no chance to do better, but we can't tell.
Best case complexity: Ω(n).
Worst case complexity: unknown.

Related

Sorting Algorithms with time complexity Log(n)

Is there any sorting algorithm with an average time complexity of log(n)?
Example: [8,2,7,5,0,1]
Sort the given array with time complexity log(n).
No; this is, in fact, impossible for an arbitrary list! We can prove this fairly simply: the absolute minimum thing we must do for a sort is look at each element in the list at least once. After all, an element may belong anywhere in the sorted list; if we don't even look at an element, it's impossible for us to sort the array. This means that any sorting algorithm has a lower bound of n, and since n > log(n), a log(n) sort is impossible.
Although n is the lower bound, most sorts (like merge sort, quick sort) are n*log(n) time. In fact, while we can sort purely numerical lists in n time in some cases with radix sort, for objects that only support comparisons, like strings under an arbitrary ordering, we have no way to sort in less than n*log(n).
That said, there may be times when the list is not arbitrary; e.g. we have a list that is entirely sorted except for one element, and we need to put that element in place. In that case, structures like a binary search tree let you insert in log(n), but this is only possible because we are operating on a single element. Building up a tree (i.e. performing n inserts) is n*log(n) time.
As @dominicm00 also mentioned, the answer is no.
In general, when you see an algorithm with a time complexity of log N (base 2), it means you are repeatedly dividing the input into two sets and discarding one of them. A sorting algorithm has to put every element in its proper place; discarding half of the list in each iteration is incompatible with that.
The most efficient sorting algorithms run in O(n) time, but only with certain limitations. The three best-known O(n) algorithms are:
Counting sort, with time complexity O(n+k), where k is the maximum value in the given list. Assuming n >> k, you can consider its time complexity to be O(n); a sketch follows below.
Radix sort, with time complexity O(d*(n+k)), where k is the radix (bucket count) and d is the maximum number of digits an input value may have. As with counting sort, assuming n >> k and n >> d, the time complexity is O(n).
Bucket sort, with average time complexity O(n) (assuming reasonably uniform input).
But in general, due to the limitations of each of these algorithms, most implementations rely on O(n log n) algorithms such as merge sort, quick sort, and heap sort.
There are also sorting algorithms with O(n^2) time complexity, such as insertion sort, selection sort, and bubble sort, which are recommended for smaller lists.
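A minimal counting sort sketch (illustrative Python, assuming small non-negative integers):

    def counting_sort(values):
        # O(n + k) where k = max(values); effectively linear when n >> k.
        if not values:
            return []
        counts = [0] * (max(values) + 1)
        for v in values:
            counts[v] += 1
        out = []
        for v, c in enumerate(counts):
            out.extend([v] * c)
        return out

    print(counting_sort([8, 2, 7, 5, 0, 1]))   # [0, 1, 2, 5, 7, 8]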
Using a PLA it might be possible to implement counting sort for a few elements with a low range of values.
count each value in parallel and sum using lg2(N) steps,
find the offset of each element in lg2(N) steps,
write the array in O(1).
Only massively parallel computation would be able to do this; general-purpose CPUs would not suffice unless they implement it in silicon as part of their SIMD units.

What is the appropriate data structure for insertion sort?

I revisited insertion sort algorithm and noticed something funny.
One obviously shouldn't use an array with this sort, since upon each insertion one has to shift all subsequent elements, which I figured gives O(n^2 log(n)). However a linked list is also no good here, since we would prefer to find the right position using binary search, which isn't possible in a simple linked list (so we end up with O(n^2)).
Which makes me wonder: what data structure would let this sorting algorithm deliver its premise of O(n log(n)) complexity?
From where did you get the premise of O(n log n)? Wikipedia disagrees, as does my own experience. The premises of the insertion sort include components that are O(n) for each of the n elements.
Also, I believe your claim of O(n^2 log n) is incorrect. The binary search is log n, and the ensuing "move sideways" is n, but these two steps happen in succession, not nested. The cost per insertion is n + log n, not a product, so the total is the expected O(n^2).
If you use a gapped array and a binary search to figure out where to insert things, then with high probability your sort will be O(n log(n)). See https://en.wikipedia.org/wiki/Library_sort for details.
However this is not as efficient as a wide variety of other sorts that are widely implemented. So this knowledge is only of theoretical interest.
Insertion sort is defined over an array or a list; if you use some other data structure, it becomes a different algorithm.
Of course, if you use a BST, insertion and search are O(log(n)) and the overall complexity is O(n log(n)) on average (remember that it is O(n^2) in the worst case), but this is no longer an insertion sort, it is a tree sort; a sketch follows below. If you use an AVL tree, you get the O(n log(n)) worst-case complexity.
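A minimal tree sort sketch using a plain (unbalanced) BST; an AVL or red-black tree would give the O(n log n) worst case (illustrative Python, not from the answer):

    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        # O(log n) on average in a random tree, O(n) in the worst case.
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def in_order(root, out):
        if root:
            in_order(root.left, out)
            out.append(root.key)
            in_order(root.right, out)

    def tree_sort(values):
        root = None
        for v in values:
            root = insert(root, v)
        out = []
        in_order(root, out)
        return out

    print(tree_sort([8, 2, 7, 5, 0, 1]))   # [0, 1, 2, 5, 7, 8]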
In insertion sort, the best case is an already-sorted sequence, which takes linear time; the worst case takes O(n^2) time. I do not know how you got the logarithmic part in the complexity.

O(nlogn) in-place sorting algorithm

This question was in the preparation exam for my midterm in introduction to computer science.
There exists an algorithm which can find the kth element in a list in
O(n) time, and suppose that it is in place. Using this algorithm,
write an in place sorting algorithm that runs in worst case time
O(n*log(n)), and prove that it does. Given that this algorithm exists,
why is mergesort still used?
I assume I must write some alternate form of the quicksort algorithm, which has a worst case of O(n^2), since merge sort is not an in-place algorithm. What confuses me is the given algorithm to find the kth element in a list. Isn't a simple loop iteration through the elements of an array already an O(n) algorithm?
How can the provided algorithm make any difference to the sorting algorithm's running time? I don't see how, used with quicksort, insertion sort or selection sort, it could lower the worst case to O(n log n). Any input is appreciated!
Check wiki, namely the "Selection by sorting" section:
Similarly, given a median-selection algorithm or general selection algorithm applied to find the median, one can use it as a pivot strategy in Quicksort, obtaining a sorting algorithm. If the selection algorithm is optimal, meaning O(n), then the resulting sorting algorithm is optimal, meaning O(n log n). The median is the best pivot for sorting, as it evenly divides the data, and thus guarantees optimal sorting, assuming the selection algorithm is optimal. A sorting analog to median of medians exists, using the pivot strategy (approximate median) in Quicksort, and similarly yields an optimal Quicksort.
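A sketch of that idea (illustrative Python, written out-of-place for clarity; an in-place version would partition the array around the selected pivot). The median_of_medians helper is my stand-in for the assumed O(n) selection algorithm:

    def median_of_medians(a):
        # Approximate median: a guaranteed "good enough" pivot, O(n) worst case.
        if len(a) <= 5:
            return sorted(a)[len(a) // 2]
        chunks = [a[i:i + 5] for i in range(0, len(a), 5)]
        medians = [sorted(c)[len(c) // 2] for c in chunks]
        return median_of_medians(medians)

    def quicksort(a):
        # Worst-case O(n log n): the pivot always splits off a constant fraction.
        if len(a) <= 1:
            return a
        p = median_of_medians(a)
        return (quicksort([x for x in a if x < p])
                + [x for x in a if x == p]
                + quicksort([x for x in a if x > p]))

    print(quicksort([8, 2, 7, 5, 0, 1]))   # [0, 1, 2, 5, 7, 8]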
The short answer why mergesort is preferred over quicksort in some cases is that it is stable (while quicksort is not).
Reasons for merge sort. Merge Sort is stable. Merge sort does more moves but fewer compares than quick sort. If the compare overhead is greater than move overhead, then merge sort is faster. One situation where compare overhead may be greater is sorting an array of indices or pointers to objects, like strings.
If sorting a linked list, then merge sort using an array of pointers to the first nodes of working lists is the fastest method I'm aware of. This is how HP / Microsoft std::list::sort() is implemented. In the array of pointers, array[i] is either NULL or points to a list of length pow(2,i) (except the last pointer points to a list of unlimited length).
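A rough sketch of that scheme (illustrative Python, with ordinary lists standing in for linked lists; the real std::list::sort splices nodes instead of copying):

    def merge(a, b):
        # Standard stable merge of two sorted runs.
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if b[j] < a[i]:
                out.append(b[j]); j += 1
            else:
                out.append(a[i]); i += 1
        return out + a[i:] + b[j:]

    def list_sort(items):
        # bins[i] is None or a sorted run of length 2**i, mirroring the
        # array of list-head pointers described above.
        bins = []
        for x in items:
            run = [x]
            i = 0
            while i < len(bins) and bins[i] is not None:
                run = merge(bins[i], run)
                bins[i] = None
                i += 1
            if i == len(bins):
                bins.append(run)
            else:
                bins[i] = run
        result = []
        for run in bins:              # final pass: merge the surviving runs
            if run is not None:
                result = merge(run, result)
        return result

    print(list_sort([8, 2, 7, 5, 0, 1]))   # [0, 1, 2, 5, 7, 8]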
I found the solution:
    if (start > stop)                      2 ops
    pivot <- partition(A, start, stop)     2 ops + n
    quickSort(A, start, pivot-1)           2 ops + T(n/2)
    quickSort(A, pivot+1, stop)            2 ops + T(n/2)
T(n) = 8 + 2T(n/2) + n                     (k = 1)
     = 8 + 2(8 + 2T(n/4) + n/2) + n
     = 24 + 4T(n/4) + 2n                   (k = 2)
     ...
     = (2^k - 1)*8 + 2^k * T(n/2^k) + k*n
The recursion bottoms out when n = 2^k, i.e. k = log2(n). Taking T(1) = 2:
T(n) = (2^(log2 n) - 1)*8 + 2^(log2 n)*2 + n*log2(n)
     = 8n - 8 + 2n + n*log2(n)
     = n*(10 + log2(n)) - 8
which is O(n log n).
Quicksort has a worst case of O(n^2), but that only occurs if you have bad luck when choosing the pivot. If you can select the kth element in O(n), that means you can choose a good pivot with O(n) extra work, which yields a worst-case O(n log n) algorithm. There are a couple of reasons why mergesort is still used. First, this selection algorithm is more or less cumbersome to implement in place, and it also adds several extra operations to the regular quicksort, so the result is not as fast as one might expect.
Nevertheless, it is not its worst-case time complexity that keeps MergeSort in use: HeapSort achieves the same worst-case bounds and is also in place, yet it didn't replace MergeSort either, though it has other disadvantages against quicksort. The main reason MergeSort survives is that it is the fastest stable sorting algorithm known so far. There are several applications in which it is paramount to have a stable sorting algorithm, and that is MergeSort's strength.
A stable sort is one in which equal items preserve their original relative order. For example, this is very useful when you have two keys: sort by the secondary key first, then stable-sort by the primary key, and ties in the primary key preserve the secondary-key order (see the sketch below).
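A small illustration with Python's built-in sort, which is stable (the data here is made up):

    rows = [("b", 2), ("a", 1), ("b", 1), ("a", 2)]

    # Sort by the secondary key first, then stably by the primary key.
    # Rows with equal primary keys keep their secondary-key order.
    rows.sort(key=lambda r: r[1])   # secondary key
    rows.sort(key=lambda r: r[0])   # primary key, stable
    print(rows)   # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]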
The problem with HeapSort compared to quicksort is that it is cache-inefficient: it swaps/compares elements that are far apart in the array, while quicksort compares consecutive elements, which are more likely to be in the cache at the same time.

Does every algorithm have a best-case data input?

Does every algorithm have a 'best case' and a 'worst case'? This was a question raised by someone who answered it with no! I thought that every algorithm has cases depending on its input, so that one algorithm might find a particular set of inputs to be its best case while another algorithm considers the same inputs its worst case.
So which answer is correct? And if there are algorithms that don't have a best case, can you give an example?
Thank You :)
No, not every algorithm has a best and a worst case. An example is the linear search to find the max/min element in an unsorted array: it always checks all items in the array no matter what. Its time complexity is therefore Theta(N), independent of the particular input.
A best-case input is the one for which your code makes the fewest procedure calls. E.g., you have an if in your code that iterates over every element, and no such work in the else branch. Then any input for which the code does not enter the if block is a best-case input and, conversely, any input for which the code enters the if is a worst case for this algorithm.
If, for an algorithm, branching, recursion, or looping causes the complexity to differ across inputs, it will have a best-case and a worst-case scenario. Otherwise, you can say that it does not, or that its best-case and worst-case complexities are the same.
Talking about sorting algorithms, let's take merge sort and quicksort as examples (I believe you know them well, and their complexities for that matter).
In merge sort, the array is always divided into two equal parts, contributing a log n factor for the splitting, while recombining takes O(n) time at every level. So the total complexity is always O(n log n) and does not depend on the input. You can either say merge sort has no best/worst case, or that its complexity is the same in the best and worst cases.
On the other hand, if quicksort (not randomized, pivot always the first element) is given random input, it will split the array into two parts (equal or not, it doesn't matter), and the log factor of its complexity appears (though the base won't always be 2). But if the input is already sorted (ascending or descending), it will always split it into 1 element + the rest of the array, taking n-1 levels to exhaust the array, which changes the O(log n) factor to O(n) and the overall complexity to O(n^2). So quicksort has best and worst cases with different time complexities; the sketch below demonstrates this.
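A tiny demonstration of that effect (illustrative Python; the depth counter is just instrumentation, not part of the algorithm):

    import random

    def quicksort_first_pivot(a, depth=0):
        # First-element pivot: depth ~ log n on random input, ~ n on sorted input.
        if len(a) <= 1:
            return a, depth
        p = a[0]
        left, ld = quicksort_first_pivot([x for x in a[1:] if x < p], depth + 1)
        right, rd = quicksort_first_pivot([x for x in a[1:] if x >= p], depth + 1)
        return left + [p] + right, max(ld, rd)

    rnd = [random.random() for _ in range(100)]
    print(quicksort_first_pivot(rnd)[1])          # around a dozen levels
    print(quicksort_first_pivot(sorted(rnd))[1])  # 99: degenerates to O(n^2)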
Well, I believe every algorithm has a best and worst case though there's no guarantee that they will differ. For example, the algorithm to return the first element in an array has an O(1) best, worst and average case.
Contrived, I know, but what I'm saying is that it depends entirely on the algorithm what their best and worst cases are, but the cases will exist, even if they're the same, or unbounded at the top end.
I think it's reasonable to say that most algorithms have a best and a worst case. If you think about algorithms in terms of asymptotic analysis, you can say that an O(n) search algorithm will perform worse than an O(log n) algorithm. However, if you give the O(n) algorithm data where the search item is early in the data set, and the O(log n) algorithm data where the search item is the last node to be found, the O(n) algorithm will run faster than the O(log n) one.
However, an algorithm that has to examine each of its inputs every time, such as one that computes an average, won't have a best/worst case, as the processing time is the same no matter the data.
If you are unfamiliar with Asymptotic Analysis (AKA big O) I suggest you learn about it to get a better understanding of what you are asking.

Heapsort. How is it possible to simulate the worst-case scenario?

I am rather clear on how to program it, but I am not sure about the definition, i.e. how to write it down in mathematical terms.
A normal heapsort is done with N elements; in O notation, is that O(log(n))?
I just started with heapsort, so I might be a little bit off here.
But how can I, for example, look for a random element when there are N elements?
And then pick that random element and delete it?
I was thinking that in a worst-case situation it has to go through the whole tree (because the element could be either in first place or in last place, i.e. highest or lowest).
But how can I write that down in mathematical terms?
Heapsort's worst case performance is O(n log n), and to quote alestanis:
Max in max-heap: O(1). Min in min-heap: O(1). Opposite cases in O(n).
Here's an SO-answer explaining how to do the opposite cases in O(1) if you create the heap yourself.
Building a max-heap from an array is O(n) in the worst case, and each max-heapify (sift-down) call is O(log n) in the worst case, so heapsort's worst case is O(n log n); a sketch follows below.
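A compact heapsort sketch matching that analysis (illustrative Python):

    def sift_down(a, i, end):
        # Restore the max-heap property below index i: O(log n).
        while 2 * i + 1 < end:
            child = 2 * i + 1
            if child + 1 < end and a[child + 1] > a[child]:
                child += 1
            if a[i] >= a[child]:
                break
            a[i], a[child] = a[child], a[i]
            i = child

    def heapsort(a):
        n = len(a)
        for i in range(n // 2 - 1, -1, -1):   # build max-heap: O(n)
            sift_down(a, i, n)
        for end in range(n - 1, 0, -1):       # n-1 extractions, O(log n) each
            a[0], a[end] = a[end], a[0]
            sift_down(a, 0, end)
        return a

    print(heapsort([8, 2, 7, 5, 0, 1]))   # [0, 1, 2, 5, 7, 8]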
