'Rare' sorting algorithms? [closed] - algorithm

Our algorithms professor gave us an assignment that requires us to choose a rare sorting algorithm (e.g. Introsort, Gnome Sort, etc.) and do some research on it.
Wikipedia certainly has plenty of information about this, but it is still not enough for me to research the topic in depth.
So I would like to find a book that includes discussions of those rare sorting algorithms, since most textbooks (like CLRS, the one I am using) only discuss the basic sorting algorithms (e.g. Bubble Sort, Merge Sort, Insertion Sort).
Is there a book or website that contains a good amount of that information?
Thanks!

Well, a very interesting "rare" sorting algorithm is Smoothsort, by Edsger Dijkstra. On paper it is almost a perfect sort:
O(n) best
O(n log n) average
O(n log n) worst
O(1) memory
n comparisons, 0 swaps when input is sorted
It is so rare because of its complex nature (which makes it hard to optimize).
You can read the paper written by Dijkstra himself here: http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF
And here are the Wikipedia entry and a very extensive article about smoothsort (by Keith Schwarz).
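To give a taste of that complexity: smoothsort is built on "Leonardo heaps", whose sizes are the Leonardo numbers L(0) = L(1) = 1, L(k) = L(k-1) + L(k-2) + 1. Here is a minimal C++ sketch that only generates that size sequence; it is not an implementation of smoothsort itself:

#include <cstdint>
#include <iostream>
#include <vector>

// Leonardo numbers: L(0) = L(1) = 1, L(k) = L(k-1) + L(k-2) + 1.
// Smoothsort maintains the array as a list of implicit heaps whose
// sizes are distinct Leonardo numbers, which is where much of its
// implementation complexity comes from.
std::vector<std::uint64_t> leonardo_numbers(std::uint64_t limit) {
    std::vector<std::uint64_t> L{1, 1};
    while (L[L.size() - 1] + L[L.size() - 2] + 1 <= limit)
        L.push_back(L[L.size() - 1] + L[L.size() - 2] + 1);
    return L;
}

int main() {
    // Heap sizes available for an array of up to one million elements.
    for (std::uint64_t v : leonardo_numbers(1000000))
        std::cout << v << ' ';   // 1 1 3 5 9 15 25 41 67 ...
    std::cout << '\n';
}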

One sorting algorithm you might call "rare" is Timsort. It works great on arrays that already contain sorted parts: the best case is O(n), and the worst and average cases are O(n log n).
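Timsort's central trick is scanning for stretches that are already sorted ("runs") and merging them instead of sorting from scratch. The real algorithm adds a minimum run length, a merge stack with balancing invariants, and galloping merges; the hypothetical C++ sketch below shows only the run-detection step:

#include <cstddef>
#include <iostream>
#include <vector>

// Find the boundaries of maximal non-decreasing "runs", the pre-sorted
// stretches that timsort merges. (Real timsort also reverses strictly
// descending runs and extends short runs with insertion sort.)
std::vector<std::size_t> run_boundaries(const std::vector<int>& a) {
    std::vector<std::size_t> bounds{0};
    for (std::size_t i = 1; i < a.size(); ++i)
        if (a[i] < a[i - 1]) bounds.push_back(i);  // a run ends just before i
    bounds.push_back(a.size());
    return bounds;
}

int main() {
    std::vector<int> a{1, 3, 5, 2, 4, 9, 9, 0};
    for (std::size_t b : run_boundaries(a))
        std::cout << b << ' ';   // 0 3 7 8: runs [1,3,5], [2,4,9,9], [0]
    std::cout << '\n';
}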
Another fast way of sorting is bitonic sort, which underlies many parallel sorting algorithms. You can find thousands of papers about it on the web; in books such as Quinn's text on parallel algorithms you can also find an extended discussion of it and of related variations.
The Art of Computer Programming, Volume 3, also has a good discussion of sorting strategies.

Bitonic sort is O(N log^2(N)) (slightly asymptotically slower than the likes of quicksort), but it is parallelizable, with a highly regular structure. This lets you use SIMD vector instruction sets like SSE -- providing a constant-factor boost which makes it an interesting option for "bottom-level" sorts (instead of the more commonly used insertion sort).
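For reference, here is the textbook recursive formulation in C++, assuming (as the classic network does) that the input length is a power of two. The compare-and-swap pattern is fixed and data-independent, which is exactly what makes it SIMD- and GPU-friendly:

#include <cstddef>
#include <iostream>
#include <utility>
#include <vector>

// Merge a bitonic sequence a[lo, lo+n) into ascending or descending order.
void bitonic_merge(std::vector<int>& a, std::size_t lo, std::size_t n, bool ascending) {
    if (n <= 1) return;
    std::size_t m = n / 2;
    for (std::size_t i = lo; i < lo + m; ++i)
        if ((a[i] > a[i + m]) == ascending)
            std::swap(a[i], a[i + m]);   // fixed compare-and-swap network
    bitonic_merge(a, lo, m, ascending);
    bitonic_merge(a, lo + m, m, ascending);
}

void bitonic_sort(std::vector<int>& a, std::size_t lo, std::size_t n, bool ascending) {
    if (n <= 1) return;
    std::size_t m = n / 2;
    bitonic_sort(a, lo, m, true);        // first half ascending
    bitonic_sort(a, lo + m, m, false);   // second half descending -> bitonic
    bitonic_merge(a, lo, n, ascending);
}

int main() {
    std::vector<int> a{7, 3, 6, 1, 8, 2, 5, 4};   // length must be a power of two
    bitonic_sort(a, 0, a.size(), true);
    for (int v : a) std::cout << v << ' ';        // 1 2 3 4 5 6 7 8
    std::cout << '\n';
}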

Related

Interview Question: What sort would you use if you required tight max time bounds and wanted highly regular performance?

I came across this question on several interview sites and the answer was always Balanced Tree Sort, because it's guaranteed to have O(n log n) runtime. My question is, why can't the answer also be MergeSort? MergeSort is also guaranteed to be O(n log n).

Quicksort in Θ(n lg n) [closed]

This is a question for people who are programmers for a living:
I just proved (using the Master theorem) that if we use quicksort and pick the pivot to be the median of the subarray we are partitioning (using the median-of-medians algorithm, with Θ(n) worst-case running time), then the worst-case running time of quicksort is Θ(n lg n). So basically this means that this version of quicksort is as good as it can get.
My question now is - does anyone implement quicksort like this in practice? Or is it just one of those nice theoretical things that are actually not good in real life?
P.S. I don't need proofs of what I'm stating; I just want to know whether this is widely known/useful.
This is known (see the Wikipedia entry), but since in practice the worst case is relatively rare, the added overhead of an O(n) selection algorithm in the average case is generally considered unacceptable.
It really depends on where you're working. Personally, I have never actually implemented it, but I think it varies depending on the requirements of your workplace.
Once you have partitioned around some pivot, you already know the "quality" of that pivot (how evenly it divides the array). If it is below some threshold, you can try smarter ways to select the pivot. This keeps the time complexity at O(n log n) and keeps the constants low, because the expensive selection is done rarely.
If I'm not mistaken, the C++ STL uses something like this, but I don't have any links; that's from a conversation at work.
Update: the C++ STL (at least the one in Visual Studio) seems to do a different thing (see the sketch after this list):
Perform a partition.
Unconditionally sort the smaller part by recursion (since it cannot be bigger than half, that is safe for O(n log n)).
Handle the larger part in the same loop (without a recursive call).
If the number of iterations exceeds approximately 1.5 log2(N), it switches to heapsort, which is O(n log n).
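Here is a hypothetical C++ sketch of that pattern (not the actual STL source): partition, recurse only into the smaller side, loop on the larger side, and fall back to heapsort once a depth budget of about 1.5 log2(N) is spent:

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <iostream>
#include <vector>

void introsort_impl(std::vector<int>& a, std::size_t lo, std::size_t hi, int depth) {
    while (hi - lo > 1) {
        if (depth-- == 0) {   // depth budget exhausted: pathological input, use heapsort
            std::make_heap(a.begin() + lo, a.begin() + hi);
            std::sort_heap(a.begin() + lo, a.begin() + hi);
            return;
        }
        int pivot = a[lo + (hi - lo) / 2];   // real code uses median-of-three or better
        // Three-way partition: [lo,m1) < pivot, [m1,m2) == pivot, [m2,hi) > pivot.
        auto m1 = std::partition(a.begin() + lo, a.begin() + hi,
                                 [pivot](int x) { return x < pivot; });
        auto m2 = std::partition(m1, a.begin() + hi,
                                 [pivot](int x) { return x == pivot; });
        std::size_t i1 = m1 - a.begin(), i2 = m2 - a.begin();
        if (i1 - lo < hi - i2) {
            introsort_impl(a, lo, i1, depth);   // recurse into the smaller part
            lo = i2;                            // loop on the larger part
        } else {
            introsort_impl(a, i2, hi, depth);
            hi = i1;
        }
    }
}

void introsort(std::vector<int>& a) {
    int depth = a.empty() ? 0 : int(1.5 * std::log2(double(a.size()))) + 1;
    introsort_impl(a, 0, a.size(), depth);
}

int main() {
    std::vector<int> v{5, 1, 4, 1, 5, 9, 2, 6};
    introsort(v);
    for (int x : v) std::cout << x << ' ';   // 1 1 2 4 5 5 6 9
    std::cout << '\n';
}

Recursing only into the smaller part bounds the stack depth at O(log n), while the heapsort fallback caps the total running time at O(n log n) even on adversarial inputs.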

Why isn't smoothsort more common? [closed]

From reading this article from Wikipedia on sorting algorithms, it would seem that smoothsort is the best sorting algorithm there is. It has top performance in all categories: best, average, and worst. Nothing beats it in any category. It also has constant memory requirements. The only downside is that it isn't stable.
It beats timsort in memory, and it beats quicksort in both worst-case performance and memory.
But I never hear about smoothsort. Nobody ever mentions it, and most discussions seem to revolve around other sorting algorithms.
Why is that?
Big-O performance is great for publishing papers, but in the real world we have to look at the constants too. Quicksort has been the algorithm of choice for unstable, in-place, in-memory sorting for so long because we can implement its inner loop very efficiently and it is very cache-friendly. Even if you can implement smoothsort's inner loop as efficiently, or nearly as efficiently, as quicksort's, you will probably find that its cache miss rate makes it slower.
We mitigate quicksort's worst-case performance by spending a little more effort choosing good pivots (to reduce the number of pathological cases) and detecting pathological cases. Look up introsort. Introsort runs quicksort first, but switches to heapsort if it detects excessive recursion (which indicates a pathological case for quicksort).
A better asymptotic bound doesn't imply better performance (though it usually turns out that way). The hidden constant may be several times bigger, causing the algorithm to be slower than another one (with the same or even worse asymptotic complexity) on arrays of relatively small size, where a "relatively small" array may, in fact, be of arbitrary size: 10^100, for example. That's asymptotic analysis. But I don't know anything about smoothsort's hidden constants.
For example, there is an O(n) worst-case algorithm for finding the k-th order statistic (median of medians), but it is so complex that the O(n log n) worst-case version (just sort and take the k-th element) outperforms it in most cases.
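For the curious, here is a hypothetical C++ sketch of that deterministic median-of-medians selection (0-based k-th smallest). It copies subvectors for clarity rather than speed, but the structure already hints at why the constant factors are large:

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <vector>

int select_kth(std::vector<int> a, std::size_t k) {   // k-th smallest, 0-based
    if (a.size() <= 5) {                              // base case: sort tiny input
        std::sort(a.begin(), a.end());
        return a[k];
    }
    // 1. Take the median of each group of 5 elements.
    std::vector<int> medians;
    for (std::size_t i = 0; i < a.size(); i += 5) {
        std::size_t j = std::min(i + 5, a.size());
        std::sort(a.begin() + i, a.begin() + j);
        medians.push_back(a[i + (j - i) / 2]);
    }
    // 2. Recursively pick the median of those medians as the pivot;
    //    it is guaranteed to split off roughly 30% on each side.
    int pivot = select_kth(medians, medians.size() / 2);
    // 3. Three-way split around the pivot, then recurse into one side.
    std::vector<int> less, equal, greater;
    for (int x : a)
        (x < pivot ? less : x > pivot ? greater : equal).push_back(x);
    if (k < less.size()) return select_kth(less, k);
    if (k < less.size() + equal.size()) return pivot;
    return select_kth(greater, k - less.size() - equal.size());
}

int main() {
    std::vector<int> v{9, 1, 8, 2, 7, 3, 6, 4, 5, 0};
    std::cout << select_kth(v, 4) << '\n';   // 4, the fifth smallest
}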
Also, there is an interesting comparison:
…As you can see, both Timsort and Smoothsort didn't cut the mustard. Smoothsort is worse than the STL sorts in all cases (even with std::bitset replaced with raw bit operations)…
Well, first I would say that it is not as if smoothsort were unknown. Whether to use it depends on the needs of the user.
The advantage of smoothsort is that it comes closer to O(n) time if the input is already sorted to some degree, whereas heapsort averages O(n log n) regardless of the initial sorted state.
From the documentation:
The smoothsort algorithm needs to be able to hold in memory the sizes of all of the heaps in the string. Since all these values are distinct, this is usually done using a bit vector. Moreover, since there are at most O(log n) numbers in the sequence, these bits can be encoded in O(1) machine words, assuming a transdichotomous machine model.

Help In Learning Algorithm Basics [closed]

I am learning algorithms and need you guys to help me. I am a beginner, so forgive me if my question is not clear. While I am learning, I keep seeing notation like N log N, N^2, and so on.
I don't really understand it clearly when it comes to checking the efficiency/performance of different algorithms using these notations. I understand logarithms very well, but the way they are used to analyze algorithm performance drives me mad.
I am asking if someone can point me to a tutorial where such notations are explained, so that I can learn the basics very well. I really want to understand them and am willing to learn.
Thanks for your help.
Kap.
What you've described is called big O notation. Here is a guide explaining it.
An important thing to notice is that the notation ignores insignificant terms. If your algorithm takes 6X^2 + 3X + 12 seconds to execute, where X is the number of data points being processed, just call it O(X^2) because when X gets big, the 6 won't really make a difference, nor will the 3 nor the 12.
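To see concretely why the lower-order terms drop out: for all X >= 1 we have 3X <= 3X^2 and 12 <= 12X^2, so

6X^2 + 3X + 12 <= 6X^2 + 3X^2 + 12X^2 = 21X^2

Taking c = 21 and X0 = 1 in the definition "f(X) <= c*g(X) for all X >= X0" shows the whole expression is O(X^2).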
Buy Introduction to Algorithms. You can get a second hand version at an affordable price.
And/or view these great online video lectures from MIT, built around the aforementioned book.
By viewing those lectures, you'll understand how some algorithms have logarithmic time complexity, whereas some have exponential, etc.
These are just functions that receive the number of items in the input and return how many operations it takes to complete the algorithm (usually they describe the algorithm's limiting growth factor rather than an exact count; more on that here).
http://www.amazon.com/Structures-Algorithm-Analysis-Allen-Weiss/dp/0805390529 is one of the best books for explaining algorithms in an excellent way.
--Cheers
You're talking about Big-O notation. This notation is a way of describing the worst possible running time of an algorithm as a function of its input size.
O(n^2) means that if the input has a size of n (such as a list with n elements in it), the algorithm will require on the order of n^2 steps to execute in the worst-case scenario (big-O is worst case; there are other notations for best case and average case). This can happen if you have a for loop nested inside another, and both run from 1 to n.
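A concrete toy example in C++: this doubly nested loop performs up to n*(n-1)/2 comparisons, so its running time grows as O(n^2):

#include <cstddef>
#include <iostream>
#include <vector>

// Count pairs of equal elements with two nested loops: for each of the
// n elements, scan the rest of the array, giving on the order of n^2 steps.
int count_equal_pairs(const std::vector<int>& a) {
    int pairs = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = i + 1; j < a.size(); ++j)
            if (a[i] == a[j]) ++pairs;
    return pairs;
}

int main() {
    std::vector<int> a{1, 2, 1, 3, 2, 1};
    std::cout << count_equal_pairs(a) << '\n';   // 4: three (1,1) pairs, one (2,2)
}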
O(n log n) is similar. It typically arises when an algorithm does O(log n) work (for example, descending a balanced binary tree) for each of n elements.
Note that you will probably never see something like O(3n) because for very large values of n, the constant 3 doesn't matter much, so it would be simplified to O(n).
Many of the others have already posted good links to read at.

Resource on computing time complexity of algorithms [closed]

Is there any good resource (book, reference, web site, application...) which explains how to compute time complexity of an algorithm?
Because it is hard to make these things concrete in my mind. Sometimes a text says an iteration has time complexity lg n; then, combined with another loop, it becomes n lg n; and sometimes they use big-Omega notation to express it, sometimes big-O, and so on...
These things are quite abstract to me. So I need a resource with good explanations and a lot of examples to make me see what's going on.
Hopefully I explained my problem clearly. I am quite sure that everyone who has just started to study algorithms has had the same problem. So maybe those experienced guys can share their experience with us. Thanks...
I think the best book is Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein.
Here it is on Amazon.
Guys, you're all recommending true complexity-theory books -- Arora and Barak contains all sorts of things like PCP, interactive proofs, quantum computing, and topics on expander graphs, which most programmers/software developers have never heard of and will never use. Papadimitriou is in the same category. Knuth is freaking impossible to read (and I was a CS/math major) and gives zero intuition on how things work.
If you want a simple way to compute runtimes and to get the flavour of the analysis, try the first chapter or so of Kleinberg and Tardos's "Algorithm Design", which holds your hand through the fundamentals, and then you can work on much harder problems.
I read Christos Papadimitriou's Computational Complexity in university and really enjoyed it. It's not an easy matter, the book is long but it's well written (i.e., understandable and suitable for self-teaching) and it contains lots of useful knowledge, much more than just the "how do I figure out time complexity of algorithm x".
I agree that Introduction to Algorithms is a good book. For more detailed instructions on, e.g., how to solve recurrence relations, see Concrete Mathematics by Graham, Knuth, and Patashnik. A good book on computational complexity itself is the one by Papadimitriou. But that last book might be a bit too thorough if you only want to calculate the complexity of given algorithms.
A short overview of big-O/big-Omega notation:
If you can give an algorithm that solves a problem in time c*(n log n) (c being a constant), then the time complexity of that problem is O(n log n). The big-O gets rid of the c, that is, of any constant factors not depending on the input size n. Big-O gives an upper bound on the running time, because you have shown (by giving an algorithm) that you can solve the problem in that amount of time.
If you can prove that any algorithm solving the problem takes at least time c*(n log n), then the problem is in Omega(n log n). Big-Omega gives a lower bound on the problem. In most cases lower bounds are much harder to prove than upper bounds, because you have to show that every possible algorithm for the problem takes at least time c*(n log n). Knowing a lower bound does not mean knowing an algorithm that reaches that lower bound.
If you have a lower bound, e.g. Omega(n log n), and an algorithm that solves the problem in that time (an upper bound), then the problem is in Theta(n log n). This means an "optimal" algorithm for this problem is known. Of course this notation hides the constant factor c, which can be quite big (that's why I wrote "optimal" in quotes).
Note that when using these notations you are usually talking about the worst-case running time of an algorithm.
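Stated formally, the standard definitions behind the prose above are:

f(n) = O(g(n))     if there exist constants c > 0 and n0 such that f(n) <= c*g(n) for all n >= n0
f(n) = Omega(g(n)) if there exist constants c > 0 and n0 such that f(n) >= c*g(n) for all n >= n0
f(n) = Theta(g(n)) if both f(n) = O(g(n)) and f(n) = Omega(g(n)) hold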
Computational complexity theory article in Wikipedia has a list of references, including a link to the book draft Computational Complexity: A Modern Approach, a textbook by Sanjeev Arora and Boaz Barak, Cambridge University Press.
The classic set of books on the subject is Knuth's Art of Computer Programming series. They're heavy on theory and formal proofs, so brush up on your calculus before you tackle them.
A course in Discrete Mathematics is sometimes given before Introduction to Algorithms.
