What does "1.5n comparisons" mean? - algorithm

I'm beginning to learn Data Structure and Algorithms with UCSD's MOOC.
For the second problem, they ask us to implement an algorithm to find the two highest values in an array.
As an additional problem, they add the following exercise:
Exercise Break. Find two largest elements in an array in 1.5n comparisons.
I don't know exactly what 1.5n comparisons means. I've searched on Google but couldn't find an explanation of comparison counts in algorithms.
Is there a site with some examples of comparisons?

It is talking about the complexity of the algorithm, measured by the number of comparisons it performs.
You have to give an algorithm that takes about 3/2·n comparisons in the worst case.
Just as an example, bubble sort takes O(n^2) comparisons in the worst case.
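For concreteness, here is a hedged sketch (my own, not from the course or the answer) of the standard pairing trick: one comparison inside each pair, then at most two more comparisons against the running largest/second-largest, for roughly 3n/2 comparisons total. The function name `two_largest` is mine.

```python
def two_largest(a):
    """Return (largest, second largest) of a using roughly 1.5n comparisons.

    Elements are processed in pairs: one comparison inside the pair,
    then at most two comparisons against the running (first, second).
    """
    n = len(a)
    if n < 2:
        raise ValueError("need at least two elements")

    # Initialise (first, second) from the first pair: 1 comparison.
    if a[0] >= a[1]:
        first, second = a[0], a[1]
    else:
        first, second = a[1], a[0]

    i = 2
    while i + 1 < n:
        x, y = a[i], a[i + 1]
        if x < y:                      # comparison 1: order the pair so x >= y
            x, y = y, x
        if x > first:                  # comparison 2: against the current maximum
            # x is the new maximum; the old maximum competes with y for second place
            second = first if first >= y else y   # comparison 3
            first = x
        elif x > second:               # comparison 3 (only if x did not beat first)
            second = x                 # y <= x, so y cannot be the runner-up here
        i += 2

    if i < n:                          # odd length: one leftover element
        x = a[i]
        if x > first:
            first, second = x, first
        elif x > second:
            second = x

    return first, second


print(two_largest([3, 1, 4, 1, 5, 9, 2, 6]))   # (9, 6)
```

Each pair of elements costs at most 3 comparisons, which is where the ~1.5n total comes from; compare that with the naive 2n comparisons of checking every element against both running values.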

Related

Big O algorithms minimum time

I know that for some problems, no matter what algorithm you use to solve them, there will always be a certain minimum amount of time required. I know Big-O captures the worst case (maximum time needed), but how can you find the minimum time required as a function of n? Can we find the minimum time needed for sorting n integers, or maybe for finding the minimum of n integers?
What you are looking for is called best-case complexity. It is a fairly useless analysis for algorithms, while worst-case analysis is the most important one and average-case analysis is sometimes used in special scenarios.
The best-case complexity depends on the algorithm. For example, in a linear search the best case is when the searched number is at the beginning of the array, and in a binary search it is when the number sits at the first dividing point. In these cases the complexity is O(1).
For a single problem, the best-case complexity may vary depending on the algorithm. For example, let's discuss some basic sorting algorithms.
In bubble sort the best case is when the array is already sorted, but even then you have to check all elements to be sure, so the best case here is O(n) (see the sketch after this answer). The same goes for insertion sort.
For quicksort/mergesort/heapsort the best-case complexity is O(n log n).
For selection sort it is O(n^2).
So from the above cases you can see that the complexity (whether best, worst or average) depends on the algorithm, not on the problem.
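To make the bubble-sort point concrete, here is a minimal sketch (my own, not the answerer's) of bubble sort with an early-exit flag: on an already-sorted input it does a single pass of n-1 comparisons and stops, which is where the O(n) best case comes from.

```python
def bubble_sort(a):
    """Sort a in place. Best case O(n) (already sorted), worst case O(n^2)."""
    n = len(a)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
                swapped = True
        if not swapped:          # no swaps means the array is already sorted
            break                # -> single pass of n-1 comparisons in the best case
    return a


print(bubble_sort([1, 2, 3, 4, 5]))   # already sorted: one pass, then stop
print(bubble_sort([5, 3, 1, 4, 2]))   # general case: up to ~n^2/2 comparisons
```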

Is a better sorting algorithm required with a time complexity of O(n)?

I am working on a program that uses just one for-loop for N times and sorts the N elements.
Just wished to ask: is it worth it? I know it's going to work, because it works pretty well on paper.
It also uses comparisons.
I also wished to know if there were any drawbacks in Radix Sort.
Cheers.
Your post mentions that you are using comparisons. Comparison-based sorting algorithms need on the order of n log n comparisons for average inputs; the Ω(n log n) lower bound on comparison sorting has been proven mathematically using information theory. You can only achieve O(n) in the best-case scenario, where the input data is already sorted. There is a lot more detail on sorting algorithms on Wikipedia.
I would only implement your sorting algorithm as a challenging programming exercise. Most modern languages already provide fast sorting algorithms that have been thoroughly tested.

No key-comparison sorting algorithm

On this webpage I can read:
A few special case algorithms (one example is mentioned in
Programming Pearls) can sort certain data sets faster than
O(n*log(n)). These algorithms are not based on comparing the items
being sorted and rely on tricks. It has been shown that no
key-comparison algorithm can perform better than O(n*log(n)).
It's the first time I've heard about non-comparison algorithms. Could anybody give me an example of one of those algorithms and explain how they solve the sorting problem faster than O(n log(n))? What kind of tricks is the author of that webpage talking about?
Any link to papers or other good source are welcome. Thank you.
First, let's get the terminology straight:
Key-comparison algorithms can't do better than O(n log n).
There exist other -- non-comparison -- algorithms that, given certain assumptions about the data, can do better than O(n log n). Bucket sort is one such example.
To give an intuitive example of the second class, let's say you know that your input array consists entirely of zeroes and ones. You could iterate over the array, counting the number of zeroes and ones. Let's call the final counts n0 and n1. You then iterate over the output array, writing out n0 zeroes followed by n1 ones. This is an O(n) sorting algorithm.
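As a hedged sketch of that zeroes-and-ones example (my own code, not the answerer's; the function name is mine), the whole sort reduces to counting:

```python
def sort_zeros_ones(a):
    """Sort a list known to contain only 0s and 1s, in O(n), with no key comparisons."""
    n0 = a.count(0)          # one pass to count the zeroes
    n1 = len(a) - n0         # everything else is a one
    return [0] * n0 + [1] * n1


print(sort_zeros_ones([1, 0, 1, 1, 0, 0, 1]))   # [0, 0, 0, 1, 1, 1, 1]
```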
It has been possible to come up with a linear-time algorithm for this problem only because we exploit the special structure of the data. This is in contrast to key comparison algorithms, which are general-purpose. Such algorithms don't need to know anything about the data, except for one thing: they need to know how to compare the sorting keys of any two elements. In other words, given any two elements, they need to know which should come first in the sorted array.
The price of being able to sort anything in any way imaginable using just one algorithm is that no such algorithm can hope to do better than O(n log n) on average.
Yes, non-comparison sorting usually takes O(n); examples of such sorting algorithms are bucket sort and radix sort (a radix sort sketch follows below).
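For reference, a minimal LSD radix sort sketch for non-negative integers (my own illustration; the choice of base 10 is arbitrary):

```python
def radix_sort(a, base=10):
    """LSD radix sort for non-negative integers: O(n * d), where d is the digit count."""
    if not a:
        return []
    result = list(a)
    max_val = max(result)
    exp = 1
    while max_val // exp > 0:
        # Stable distribution of elements into buckets by the current digit.
        buckets = [[] for _ in range(base)]
        for x in result:
            buckets[(x // exp) % base].append(x)
        result = [x for bucket in buckets for x in bucket]
        exp *= base
    return result


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```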

Need an efficient selection algorithm?

I am looking for an algorithm for selecting the A[N/4]-th element in an unsorted array A, where N is the number of elements of A. I want the algorithm to do the selection in sublinear time. I have knowledge of basic structures like a BST, etc. Which algorithm will be best for me, keeping in mind that I want it to be as fast as possible and not too tough to implement? Here N can vary up to 250000. Any help will be highly appreciated. Note: the array can have non-unique elements.
As @Jerry Coffin mentioned, you cannot hope to get a sublinear-time algorithm here unless you are willing to do some preprocessing up front. If you want a linear-time algorithm for this problem, you can use the quickselect algorithm, which runs in expected O(n) time with an O(n^2) worst case (a sketch follows below). The median-of-medians algorithm has worst-case O(n) behavior, but has a high constant factor. One algorithm that you might find useful is the introselect algorithm, which combines the two previous algorithms to get a worst-case O(n) algorithm with a low constant factor. This algorithm is typically what's used to implement the std::nth_element algorithm in the C++ standard library.
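Here is a hedged sketch (mine, not the answerer's) of quickselect with a random pivot and a three-way partition, so duplicate elements are handled; expected O(n), worst case O(n^2):

```python
import random


def quickselect(a, k):
    """Return the k-th smallest element (0-based) of a. Expected O(n) time."""
    a = list(a)                       # work on a copy
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        pivot = a[random.randint(lo, hi)]
        # Three-way partition around the pivot (handles duplicate elements).
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if a[i] < pivot:
                a[lt], a[i] = a[i], a[lt]
                lt += 1
                i += 1
            elif a[i] > pivot:
                a[i], a[gt] = a[gt], a[i]
                gt -= 1
            else:
                i += 1
        if k < lt:
            hi = lt - 1
        elif k > gt:
            lo = gt + 1
        else:
            return pivot              # k lands inside the block equal to the pivot


data = [7, 1, 5, 5, 9, 3, 2, 8]
print(quickselect(data, len(data) // 4))     # the N/4-th smallest element
```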
If you are willing to do some preprocessing ahead of time, you can put all of the elements into an order statistic tree. From that point forward, you can look up the kth element for any k in time O(log n) worst-case. The preprocessing time required is O(n log n), though, so unless you are making repeated queries this is unlikely to be the best option.
Hope this helps!

Is it possible to calculate median of a list of numbers better than O(n log n)?

I know that it is possible to calculate the mean of a list of numbers in O(n). But what about the median? Is there any algorithm better than sorting (O(n log n)) and looking up the middle element (or the mean of the two middle elements if there is an even number of items in the list)?
Yes. You can do it (deterministically) in O(n).
What you're talking about is a selection algorithm, where k = n/2. There is a method based on the same partitioning function used in quicksort, called, not surprisingly, quickselect. While it can, like quicksort, have an O(n^2) worst case, this can be brought down to linear time using proper pivot selection.
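If you just want this in practice, and assuming NumPy is available, `numpy.partition` does the selection for you (its default method is a variant of introselect), so the median comes out in roughly linear time without hand-rolling quickselect. The wrapper below is my own sketch:

```python
import numpy as np


def median_via_partition(values):
    """Median in (roughly) linear time using selection instead of a full sort."""
    a = np.asarray(values, dtype=float)
    n = a.size
    mid = n // 2
    if n % 2:                                   # odd length: single middle element
        return float(np.partition(a, mid)[mid])
    # Even length: we need both middle order statistics, then average them.
    part = np.partition(a, [mid - 1, mid])
    return float(0.5 * (part[mid - 1] + part[mid]))


print(median_via_partition([7, 1, 5, 9, 3]))       # 5.0
print(median_via_partition([7, 1, 5, 9, 3, 2]))    # 4.0
```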
Partially irrelevant, but: a quick tip on how to quickly find answers to common basic questions like this on the web.
We're talking about medians? So go to the page about medians on Wikipedia.
Search page for algorithm:
Efficient computation of the sample median
Even though sorting n items takes in general O(n log n) operations, by using a "divide and conquer" algorithm the median of n items can be computed with only O(n) operations (in fact, you can always find the k-th element of a list of values with this method; this is called the selection problem).
Follow the link to the selection problem for the description of algorithm. Read intro:
... There are worst-case linear time selection algorithms. ...
And if you're interested read about the actual ingenious algorithm.
If the numbers are discrete (e.g. integers) and there is a manageable number of distinct values, you can use a "bucket sort" which is O(N), then iterate over the buckets to figure out which bucket holds the median. The complete calculation is O(N) in time and O(B) in space.
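A hedged sketch of that bucket/counting idea for non-negative integers in a small range (my own code; the function name and the `max_value` parameter are mine, and B here is the number of buckets):

```python
def counting_median(a, max_value):
    """Median of non-negative integers <= max_value: O(N) time, O(B) space."""
    counts = [0] * (max_value + 1)         # one bucket per possible value
    for x in a:
        counts[x] += 1

    def kth(k):                            # k-th smallest (0-based) by walking buckets
        seen = 0
        for value, c in enumerate(counts):
            seen += c
            if seen > k:
                return value
        raise IndexError("k out of range")

    n = len(a)
    if n % 2:
        return kth(n // 2)
    return (kth(n // 2 - 1) + kth(n // 2)) / 2


print(counting_median([4, 1, 3, 3, 2], max_value=10))    # 3
print(counting_median([4, 1, 3, 2], max_value=10))       # 2.5
```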
Just for fun (and who knows, it may be faster), there's another randomized median algorithm, explained technically in Mitzenmacher and Upfal's book. Basically, you choose a polynomially smaller subset of the list that (with some fancy bookwork) probably contains the real median, and then use it to find the real median. The book is on Google Books, and here's a link. Note: I was able to read the pages describing the algorithm, so assuming that Google Books reveals the same pages to everyone, you can read them too.
It is a randomized algorithm such that if it finds the answer, it is 100% certain to be the correct answer (this is called Las Vegas style). The randomness arises in the runtime: occasionally (with probability 1/sqrt(n), I think) it FAILS to find the median and must be re-run.
Asymptotically, it is exactly linear when you take the chance of failure into account --- that is to say, each run is a wee bit less than linear, exactly such that when you account for the number of times you may need to re-run it, it becomes linear.
Note: I'm not saying this is better or worse --- I certainly haven't done a real-life runtime comparison between these algorithms! I'm simply presenting an additional algorithm that has linear runtime, but works in a significantly different way.
This link has popped up recently on calculating median: http://matpalm.com/median/question.html .
In general I think you can't go below O(n log n) time, but I don't have any proof of that :). No matter how much you parallelize it, aggregating the results into a single value takes at least log n levels of execution.
Try the randomized algorithm: the sampling size (e.g. 2000) is independent of the data size n, yet still achieves sufficiently high (99%) accuracy. If you need higher accuracy, just increase the sampling size. A Chernoff bound can be used to prove the error probability for a given sampling size. I've written some JavaScript code to implement the algorithm; feel free to take it. http://www.sfu.ca/~wpa10
