Is there a chart or table anywhere that displays a lot of (at least the popular) data structures and algorithms with their running times and efficiency?
What I am looking for is something that I can glance at, and decide which structure/algorithm is best for a particular case. It would be helpful when working on a new project or just as a study guide.
A chart or table isn't going to be a particularly useful reference.
If you're going to use a particular algorithm or data structure to tackle a problem, you'd better know and understand it inside and out, and that includes knowing (and knowing how to derive) its efficiency. It's not particularly difficult: most standard algorithms have simple, intuitive run-times like N^2 or N*log N.
That being said, run-time Big-O isn't everything. Take sorting, for example. Heap sort has a better worst-case Big-O than, say, quicksort, yet quicksort performs much better in practice. The constant factors hidden by Big-O can also make a huge difference.
When you're talking about data structures, there's a lot more to them than meets the eye. For example, a hash map seems like just a tree map with much better performance, but a tree map additionally keeps its keys in sorted order.
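For instance, in C++ terms (my choice of language; the point is generic), std::map plays the role of the tree map and std::unordered_map the hash map:

#include <iostream>
#include <map>
#include <unordered_map>

int main() {
    std::map<int, char> tree = {{3, 'c'}, {1, 'a'}, {2, 'b'}};
    std::unordered_map<int, char> hash = {{3, 'c'}, {1, 'a'}, {2, 'b'}};

    for (const auto& kv : tree)   // tree map: always iterates 1, 2, 3
        std::cout << kv.first << ' ';
    std::cout << '\n';

    for (const auto& kv : hash)   // hash map: unspecified iteration order
        std::cout << kv.first << ' ';
    std::cout << '\n';
}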
Knowing what is the best algorithm/data structure to use is a matter of knowledge and experience, not a lookup table.
Coming back to your question: I don't know of any such reference, but it would be a good exercise to make one yourself. Wikipedia has pretty decent articles on common algorithms and data structures, along with some decent analysis.
I don't believe that any such list exists. The sheer number of known algorithms and data structures is staggering, and new ones are being developed all the time. Moreover, many of these algorithms and data structures are specialized, meaning that even if you had a list in front of you it would be difficult to know which ones were applicable for the particular problems you were trying to solve.
Another concern with such a list is how to quantify efficiency. If you were to rank algorithms in terms of asymptotic complexity (big-O), you might end up putting certain algorithms and data structures that are asymptotically optimal but impractically slow on small inputs ahead of algorithms that are known to be fast for practical cases but might not be theoretically perfect. As an example, consider the median-of-medians algorithm for linear-time order statistics, which has such a huge constant factor that other algorithms tend to be much better in practice. Or consider quicksort, which is O(n²) in the worst case but has average complexity O(n lg n) and is much faster than other sorting algorithms in practice.
On the other hand, were you to try to list the algorithms by empirical runtime efficiency, the list would be misleading. Runtime efficiency depends on a number of machine- and input-specific factors (such as locality, size of the input, shape of the input, speed of the machine, processor architecture, etc.). It might be useful as a rule of thumb, but in many cases you might be misled by the numbers into picking one algorithm when another is far superior.
There's also implementation complexity to consider. Many algorithms exist only in papers, or have reference implementations that are not optimized or are written in a language that isn't what you're looking for. If you find a Holy Grail algorithm that does exactly what you want but no implementation for it, it might be impossibly difficult to code up and debug your own version. For example, if there weren't a preponderance of red/black tree implementations, do you think you'd be able to code it up on your own? How about Fibonacci heaps? Or (from personal experience) van Emde Boas trees? Often it may be a good idea to pick a simpler algorithm that's "good enough" but easy to implement over a much more complex algorithm.
In short, I wish a table like this could exist with all this information, but practically speaking I doubt it could be constructed in a way that's useful. The Wikipedia links from @hammar's comments are actually quite good, but the best way to learn which algorithms and data structures to use in practice is to get practice trying them out.
Collecting all algorithms and/or data structures is essentially impossible -- even as I'm writing this, there's undoubtedly somebody, somewhere, inventing some new algorithm or data structure. In the greater scheme of things, it's probably not of much significance, but it's still probably new and (ever so slightly) different from anything anybody's done before (though, of course, it's always possible it'll turn out to be a big, important thing).
That said, the US NIST has a Dictionary of Algorithms and Data Structures that lists more than most people ever know or care about. It covers most of the obvious "big" ones that everybody knows, and an awful lot of less-known ones as well. The University of Canterbury has another that is (or at least seems to me) a bit more modest, but still covers most of what a typical programmer probably cares about, and is a bit better organized for finding an algorithm to solve a particular problem, rather than being based primarily on already knowing the name of the algorithm you want.
There are also various collections/lists that are more specialized. For example, The Stony Brook Algorithm Repository is devoted primarily (exclusively?) to combinatorial algorithms. It's based on the Algorithm Design Manual, so it can be particularly useful if you have/use that book (and in case you're wondering, this book is generally quite highly regarded).
The first priority for a computer program is correctness, and the second, most of the time, is programmer time - something directly linked to maintainability and extensibility.
Because of this, there is a school of programming that advocates just using simple things like arrays of records, unless it happens to be a performance-sensitive part, in which case you need to consider not only data structures and algorithms but also the "architecture" that led you to have that problem in the first place.
Why would you choose bubble sort over other sorting algorithms?
You wouldn't.
Owen Astrachan of Duke University once wrote a research paper tracing the history of bubble sort (Bubble Sort: An Archaeological Algorithmic Analysis) and quotes CS legend Don Knuth as saying
In short, the bubble sort seems to have nothing to recommend it, except a catchy name
and the fact that it leads to some interesting theoretical problems.
The paper concludes with
In this paper we have investigated the origins of bubble sort and its enduring popularity despite warnings against its use by many experts. We confirm the warnings by analyzing its complexity both in coding and runtime.
Bubble sort is slower than the other O(n²) sorts; it's about four times as slow as insertion sort and twice as slow as selection sort. It does have good best-case behavior (if you include a check for no swaps), but so does insertion sort: just one pass over an already-sorted array.
Bubble Sort is impractically slow on almost all real data sets. Any good implementation of quicksort, heapsort, or mergesort is likely to outperform it by a wide margin. Recursive sorts that use a simpler sorting algorithm for small-enough base-cases use Insertion Sort, not Bubble Sort.
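To make that comparison concrete, here is a minimal insertion sort sketch (my code, not from the answer); on already-sorted input the inner loop never runs, so the whole sort is one O(n) pass:

#include <cstddef>
#include <vector>

void insertionSort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        int key = v[i];
        std::size_t j = i;
        while (j > 0 && v[j - 1] > key) {
            v[j] = v[j - 1];  // shift larger elements one slot right
            --j;
        }
        v[j] = key;
    }
}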
Also, the President of the United States says you shouldn't use it.
Related: Why bubble sort is not efficient? has some more details.
There's one circumstance in which bubble sort is optimal, but it's one that can only really occur with ancient hardware (basically, something like a drum memory with two heads, where you can only read through the data in order, and only work with two data items that are directly next to each other on the drum).
Other than that, it's utterly useless, IMO. Even the excuse of getting something up and running quickly is nonsense: a selection sort or insertion sort is just as easy to write and understand.
You would implement bubble sort if you needed to create a web page showing an animation of bubble sort in action.
When all of the following conditions are true:
Implementation speed is way more important than execution speed (probability <1%)
Bubble sort is the only sorting algorithm you remember from university class (probability 99%)
You have no sorting library at hand (probability <1%)
You don't have access to Google (probability <1%)
That works out to a less than 0.000099% chance that you need to implement bubble sort, which is less than one in a million.
If your data is on a tape that is fast to read forward, slow to seek backward, and fast to rewind (or is a loop so it doesn't need rewinding), then bubblesort will perform quite well.
I suspect a trick question. No one would choose bubble sort over other sorting algorithms in the general case. The only time it really makes any sense is when you're virtually certain that the input is (nearly) sorted already.
Bubble sort is easy to implement, and while the 'standard' implementation has poor performance, there is a very simple optimization that makes it a strong contender compared to many other simple algorithms: Google 'combsort' and see the magic of a few well-placed lines. Quicksort still outperforms it, but it is less obvious to implement and is most natural in a language that supports recursion.
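A hedged sketch of comb sort (the 1.3 shrink factor is the commonly cited choice; the code itself is mine, not the answerer's):

#include <utility>
#include <vector>

void combSort(std::vector<int>& v) {
    std::size_t gap = v.size();
    bool swapped = true;
    while (gap > 1 || swapped) {
        gap = (gap * 10) / 13;   // shrink the gap by roughly 1.3x each pass
        if (gap < 1) gap = 1;    // finish with gap 1, i.e. plain bubble sort
        swapped = false;
        for (std::size_t i = 0; i + gap < v.size(); ++i)
            if (v[i] > v[i + gap]) {
                std::swap(v[i], v[i + gap]);
                swapped = true;
            }
    }
}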
I can think of a few reasons for bubble sort:
It's a basic elementary sort. They're great for beginner programmers learning the if, for, and while statements.
I can picture a programmer with some free time experimenting with how all the sorts work. What better place to start than at the top with bubble sort (yes, this demeans its rank, but who doesn't think 'bubble sort' when someone says 'sorting algorithms')?
It's very easy to remember and implement.
When I was starting on linked lists, bubble sort helped me understand how all the nodes worked together.
Now I'm feeling like a lame commercial advertising about bubble sort so I'll be quiet now.
I suppose you would choose bubble sort if you needed a sorting algorithm which was guaranteed to be stable and had a very small memory footprint. Basically, if memory is really scarce in the system (and performance isn't a concern) then it would work, and would be easily understood by anybody supporting the code. It also helps if you know ahead of time that the values are mostly sorted already.
Even in that case, insertion sort would probably be better.
And if it's a trick question, next time suggest Bogosort as an alternative. After all, if they're looking for bad sorting, that's the way to go.
It's useful for "Baby's First Sort" types of exercises in school because it's easy to explain how it works and it's easy to implement. Once you've written it, and maybe run it once, delete it and never think of it again.
You might use Bubblesort if you just wanted to try something quickly. If, for instance, you are in a new environment and you are playing around with a new idea, you can quickly throw in a bubble sort in very little time. It might take you much longer to remember and write a different sort and debug it and you still might not get it right. If your experiment works out and you need to use the code for something real, then you can spend the time to get it right.
No sense putting a lot of effort into the sort algorithm if you are just prototyping.
When demonstrating with a concrete example how not to implement a sort routine.
Because your other sorting algorithm is Monkey Sort? ;)
Seriously though, bubble sort is mainly a sorting algorithm of educational value and has no practical use.
When the array is already "almost" sorted, or you have a few new additions to an already-sorted list, you can use bubble sort to re-sort it. Bubble sort usually works well on small data sets.
I'm now looking at an old school assignment and want to find the solution to one question.
Which sorting method is most suitable for parallel processing?
Bubble sort
Quick sort
Merge sort
Selection sort
I guess quick sort (or merge sort?) is the answer.
Am I correct?
Like merge sort, quicksort can also be easily parallelized due to its divide-and-conquer nature. Individual in-place partition operations are difficult to parallelize, but once divided, different sections of the list can be sorted in parallel.
One advantage of parallel quicksort over other parallel sort algorithms is that no synchronization is required. A new thread is started as soon as a sublist is available for it to work on and it does not communicate with other threads. When all threads complete, the sort is done.
http://en.wikipedia.org/wiki/Quicksort
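A hedged C++ sketch of that scheme (the depth cutoff, the three-way partition, and the use of std::async are my assumptions, not from the article):

#include <algorithm>
#include <future>
#include <vector>

// Call as: parallelQuicksort(v, 0, (int)v.size(), 3);
void parallelQuicksort(std::vector<int>& v, int lo, int hi, int depth) {
    if (hi - lo < 2) return;
    int pivot = v[lo + (hi - lo) / 2];
    // Three-way partition into [< pivot][== pivot][> pivot]; the middle
    // block is already in its final place, which also guarantees progress.
    auto lt = std::partition(v.begin() + lo, v.begin() + hi,
                             [&](int x) { return x < pivot; });
    auto gt = std::partition(lt, v.begin() + hi,
                             [&](int x) { return x == pivot; });
    int m1 = int(lt - v.begin()), m2 = int(gt - v.begin());
    if (depth > 0) {
        // Sort the left part on another thread, the right part here; the
        // only coordination is waiting for the spawned task to finish.
        auto left = std::async(std::launch::async,
                               [&] { parallelQuicksort(v, lo, m1, depth - 1); });
        parallelQuicksort(v, m2, hi, depth - 1);
        left.wait();
    } else {
        parallelQuicksort(v, lo, m1, 0);
        parallelQuicksort(v, m2, hi, 0);
    }
}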
It depends completely on the method of parallelization. For multithreaded general computing, a merge sort provides pretty reliable load balancing and memory localization properties. For a large sorting network in hardware, a form of Batcher, Bitonic, or Shell sort is actually best if you want good O(log² n) performance.
I think merge sort. You can divide the data set and run parallel operations on the parts.
I think merge sort would be the best answer here, because the basic idea behind it is to divide the problem into independent subproblems, solve them, and merge the results.
That's what we actually do in parallel processing too: divide the whole problem into small units that can be computed in parallel, then join the results.
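A hedged sketch of that divide/solve/merge structure (the name, the depth cutoff, and std::async are my choices):

#include <algorithm>
#include <future>
#include <vector>

template <typename It>
void parallelMergesort(It first, It last, int depth = 3) {
    if (last - first < 2) return;
    It mid = first + (last - first) / 2;
    if (depth > 0) {
        // Sort the two halves in parallel...
        auto left = std::async(std::launch::async,
                               [=] { parallelMergesort(first, mid, depth - 1); });
        parallelMergesort(mid, last, depth - 1);
        left.wait();
    } else {
        std::sort(first, mid);   // fall back to sequential below the cutoff
        std::sort(mid, last);
    }
    std::inplace_merge(first, mid, last);  // ...then join the results
}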
Just a couple of random remarks:
Many discussions of how easy it is to parallelize quicksort ignore the pivot selection. If you traverse the array to find it, you've introduced a linear time sequential component.
Quicksort is not easy to implement at all in distributed memory; there is a discussion in the Kumar book.
Yeah, I know, one should not use bubble sort. But "odd-even transposition sort", which is more or less equivalent, is actually a pretty good parallel-programming exercise, in particular for distributed-memory parallelism. It is the easiest example of a sorting network, which is very doable in MPI and such.
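For reference, here is a sketch of the sequential form of odd-even transposition sort (my code); in a parallel version, every compare-swap within a single phase can run concurrently:

#include <utility>
#include <vector>

void oddEvenTranspositionSort(std::vector<int>& v) {
    const std::size_t n = v.size();
    for (std::size_t phase = 0; phase < n; ++phase)
        // Even phases compare pairs (0,1), (2,3), ...; odd phases (1,2), (3,4), ...
        for (std::size_t i = phase % 2; i + 1 < n; i += 2)
            if (v[i] > v[i + 1])
                std::swap(v[i], v[i + 1]);
}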
It is merge sort, since the sorting is done on two subarrays that are compared and merged at the end; those two sorts can be done in parallel.
Sorting has been studied for decades, so surely the sorting algorithms provided by any programming platform (Java, .NET, etc.) must be good by now, right? Is there any reason to override something like System.Collections.SortedList?
There are absolutely times when your intimate understanding of your data can result in much, much more efficient sorting algorithms than any general-purpose algorithm available. I shared an example of such a situation in another post on SO, but I'll share it here just to provide a case in point:
Back in the days of COBOL, FORTRAN, etc... a developer working for a phone company had to take a relatively large chunk of data that consisted of active phone numbers (I believe it was in the New York City area), and sort that list. The original implementation used a heap sort (these were 7 digit phone numbers, and a lot of disk swapping was taking place during the sort, so heap sort made sense).
Eventually, the developer stumbled on a different approach. Realizing that each phone number could occur at most once in his data set, he saw that he didn't have to store the actual phone numbers in memory at all. Instead, he treated the entire 7-digit phone number space as a very long bit array (at 8 phone numbers per byte, 10 million phone numbers requires just over a megabyte to capture the entire space). He then made a single pass through his source data and set the bit for each phone number he found to 1. Finally, he made a pass through the bit array looking for set bits and output the sorted list of phone numbers.
This new algorithm was much, much faster (at least 1000x faster) than the heap sort algorithm, and consumed about the same amount of memory.
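A minimal sketch of that bitmap trick (essentially the bitmap sort Jon Bentley describes in Programming Pearls; the code and names here are mine, not the developer's):

#include <cstdio>
#include <vector>

void bitmapSortPhoneNumbers(const std::vector<int>& numbers) {
    std::vector<bool> seen(10000000, false);  // one bit per 7-digit number, ~1.25 MB
    for (int n : numbers)                     // pass 1: mark each number present
        seen[n] = true;
    for (int n = 0; n < 10000000; ++n)        // pass 2: scan the space in order
        if (seen[n])
            std::printf("%07d\n", n);
}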
I would say that, in this case, it absolutely made sense for the developer to develop his own sorting algorithm.
If your application is all about sorting, and you really know your problem space, then it's quite possible for you to come up with an application specific algorithm that beats any general purpose algorithm.
However, if sorting is an ancillary part of your application, or you are just implementing a general purpose algorithm, chances are very, very good that some extremely smart university types have already provided an algorithm that is better than anything you will be able to come up with. Quick Sort is really hard to beat if you can hold things in memory, and heap sort is quite effective for massive data set ordering (although I personally prefer to use B+Tree type implementations for the heap b/c they are tuned to disk paging performance).
Generally no.
However, you know your data better than the people who wrote those sorting algorithms. Perhaps you could come up with an algorithm that is better than a generic algorithm for your specific set of data.
Implementing your own sorting algorithm is akin to premature optimization, and as Sir Charles Antony Richard Hoare said, "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil".
Certain libraries (such as Java's very own Collections.sort) implement a sort based on criteria that may or may not apply to you. For example, Collections.sort uses a merge sort for its O(n log n) efficiency and for the fact that it's a stable sort: if two elements compare as equal, the one that came first in the original collection stays in front. That makes multi-pass sorting on different criteria work (first sort by date, then by name; the collection ends up sorted by name, then date). However, if you want slightly better constants or have a special data set, it might make more sense to implement your own quicksort or radix sort tailored to exactly what you want to do.
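The same multi-pass trick, sketched with C++'s std::stable_sort (the struct and field names are illustrative, not from the answer):

#include <algorithm>
#include <string>
#include <vector>

struct Record { std::string name; int date; };

void sortByNameThenDate(std::vector<Record>& records) {
    // Sort by the secondary key first...
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) { return a.date < b.date; });
    // ...then stable-sort by the primary key; stability preserves the
    // date order among records with equal names.
    std::stable_sort(records.begin(), records.end(),
                     [](const Record& a, const Record& b) { return a.name < b.name; });
}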
That said, all operations are fast on sufficiently small n.
Short answer; no, except for academic interest.
You might want to multi-thread the sorting implementation.
You might need better performance characteristics than Quicksort's O(n log n); think bucket sort, for example.
You might need a stable sort while the default algorithm uses quicksort. Especially for user interfaces, you'll want the sort order to be consistent.
More efficient algorithms might be available for the data structures you're using.
You might need an iterative implementation of the default sorting algorithm because of stack overflows (eg. you're sorting large sets of data).
Ad infinitum.
A few months ago the Coding Horror blog reported on some platform with an atrociously bad sorting algorithm. If you have to use that platform then you sure do want to implement your own instead.
The problem of general purpose sorting has been researched to hell and back, so worrying about that outside of academic interest is pointless. However, most sorting isn't done on generalized input, and often you can use properties of the data to increase the speed of your sorting.
A common example is the counting sort. It is proven that for general purpose comparison sorting, O(n lg n) is the best that we can ever hope to do.
However, suppose we know that the values to be sorted lie in a fixed range, say [a, b]. If we create an array of size b - a + 1 (with every entry initialized to zero), we can scan the input once, using this array to store the count of each element, and then write the elements back out in order. The result is a linear-time sort (linear in the input size plus the range of the data), beating the n lg n bound, but only because we are exploiting a special property of our data. For more detail, see here.
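A minimal counting sort sketch along those lines (assuming every value really does lie in [a, b]):

#include <cstddef>
#include <vector>

void countingSort(std::vector<int>& v, int a, int b) {
    std::vector<int> count(b - a + 1, 0);
    for (int x : v)
        ++count[x - a];               // tally each value
    std::size_t out = 0;
    for (int value = a; value <= b; ++value)
        for (int c = 0; c < count[value - a]; ++c)
            v[out++] = value;         // write values back in sorted order
}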
So yes, it is useful to write your own sorting algorithms. Pay attention to what you are sorting, and you will sometimes be able to come up with remarkable improvements.
If you have experience implementing sorting algorithms and understand how the characteristics of the data influence their performance, then you already know the answer to your question. In other words, you would already know things like: a naive QuickSort has pedestrian performance on an almost-sorted list. :-) And that if you have your data in certain structures, some kinds of sorting are (almost) free. Etc.
Otherwise, no.
Do bubble sorts have any real world use? Every time I see one mentioned, it's always either:
A sorting algorithm to learn with.
An example of a sorting algorithm not to use.
Bubble sort is (provably) the fastest sort available under a very specific circumstance. It originally became well known primarily because it was one of the first algorithms (of any kind) that was rigorously analyzed, and the proof was found that it was optimal under its limited circumstance.
Consider a file stored on a tape drive, and so little random access memory (or such large keys) that you can only load two records into memory at any given time. Rewinding the tape is slow enough that doing random access within the file is generally impractical -- if possible, you want to process records sequentially, no more than two at a time.
Back when tape drives were common, and machines with only a few thousand (words|bytes) of RAM (of whatever sort) were common, that was sufficiently realistic to be worth studying. That circumstance is now rare, so studying bubble sort makes little sense at all -- but even worse, the circumstance when it's optimal isn't taught anyway, so even when/if the right situation arose, almost nobody would realize it.
As far as being the fastest on an extremely small and/or nearly sorted set of data, while that can cover up the weakness of bubble sort (to at least some degree), an insertion sort will essentially always be better for either/both of those.
It depends on the way your data is distributed - if you can make some assumptions.
One of the best links I've found for understanding when to use a bubble sort (or some other sort) is this animated view of sorting algorithms:
http://www.sorting-algorithms.com/
It doesn't get used much in the real world. It's a good learning tool because it's easy to understand and fast to implement. It has bad (O(n^2)) worst case and average performance. It has good best case performance when you know the data is almost sorted, but there are plenty of other algorithms that have this property, with better worst and average case performance.
I came across a great use for it in an optimisation anecdote recently. A program needed a set of sprites sorted in depth order each frame. The sprites' order wouldn't change much between frames, so as an optimisation they were bubble sorted with a single pass each frame, alternating direction (top to bottom, then bottom to top). So the sprites were always almost sorted, with a very efficient O(N) algorithm.
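A hypothetical sketch of that per-frame pass (the names are mine; the anecdote gave no code):

#include <utility>
#include <vector>

// One bubble pass per frame, alternating direction. Nearly sorted input
// stays nearly sorted, and each frame costs only O(N).
void onePassBubbleByDepth(std::vector<float>& depth, bool forward) {
    if (forward) {
        for (std::size_t i = 0; i + 1 < depth.size(); ++i)
            if (depth[i] > depth[i + 1])
                std::swap(depth[i], depth[i + 1]);
    } else {
        for (std::size_t i = depth.size(); i > 1; --i)
            if (depth[i - 2] > depth[i - 1])
                std::swap(depth[i - 2], depth[i - 1]);
    }
}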
It's probably the fastest for tiny sets.
Speaking of education: a link to the last scene of Sorting Out Sorting. It's amazing, a must-see.
It's good for small data sets - which is why some qsort implementations switch to it when the partition size gets small. But insertion sort is still faster, so there's no good reason to use it except as a teaching aid.
We recently used bubble sort in an optimality proof for an algorithm. We had to transform an arbitrary optimal solution, represented by a sequence of objects, into the solution found by our algorithm. Because our algorithm was just "sort by this criterion", we had to prove that we can sort an optimal solution without making it worse. In this case, bubble sort was a very good algorithm to use, because it has the nice invariant of only ever swapping two adjacent elements that are in the wrong order. Using more complicated algorithms there would have melted brains, I think.
I think it's a good "teaching" algorithm because it's very easy to understand and implement. It may also be useful for small data sets for the same reason (although some of the O(n lg n) algorithms are pretty easy to implement too).
I can't resist responding to any remarks on bubble sort by mentioning the faster Comb Sort (it seems to be O(n log n), but this has not really been proven). Note that comb sort is a bit faster if you use a precomputed table. Comb sort is exactly the same as bubble sort except that it doesn't start out swapping only adjacent elements. It's almost as easy to implement and understand as bubble sort.
Bubble sort is easy to implement and it is fast enough when you have small data sets.
Bubble sort is fast enough when your set is almost sorted (e.g. one or a few elements are not in their correct positions); in this case it's better to interlace traversals from index 0 to n and from index n to 0.
Using C++ it can be implemented in the following way:
void bubbleSort(vector<int>& v) { // sort in ascending order
    bool go = true;
    while (go) {
        go = false;
        // Forward pass: bubble the largest remaining element toward the end.
        for (int i = 0; i + 1 < (int)v.size(); ++i)
            if (v[i] > v[i + 1]) {
                swap(v[i], v[i + 1]);
                go = true;
            }
        // Backward pass: bubble the smallest remaining element toward the front.
        for (int i = (int)v.size() - 1; i > 0; --i)
            if (v[i - 1] > v[i]) {
                swap(v[i - 1], v[i]);
                go = true;
            }
    }
}
It can be good if a swap of two adjacent items is cheap and a swap of arbitrary items is expensive.
Donald Knuth, in his famous "The Art of Computer Programming", concluded that "the bubble sort seems to have nothing to recommend it, except a catchy name and the fact that it leads to some interesting theoretical problems".
Since this algorithm is easy to implement, it is easy to support, and reducing support effort matters over a real application's life cycle.
I used to use it in some cases for small N on the TRS-80 Model 1.
Using a for loop, one could implement the complete sort on one program line.
Other than that, it is good for teaching, and sometimes for lists that are nearly in sorted order.
I once used it for a case where the vast majority of the time it would be sorting two items.
The next time I saw that code, someone had replaced it with the library sort. I hope they benchmarked it first!
It's quick and easy to code and nearly impossible to get wrong. It has its place if you're not doing heavy lifting and there's no library sorting support.
It is the sort I use most often actually. (In our project, we cannot use any external libraries.)
It is useful when I know for sure that data set is really small, so I do not care one bit about speed and want shortest and simplest code.
Bubble sort is not the lowest you can go, though. Recently, I was in a situation where I needed to sort exactly three elements, so I wrote something like this:
// A sort-of-stooge sort: three fixed compare-and-swaps (a tiny
// sorting network) that order the three elements by cpFirst.
SwapElementsIfNeeded(&elementTop, &elementBottom);
SwapElementsIfNeeded(&elementTop, &elementMiddle);
SwapElementsIfNeeded(&elementMiddle, &elementBottom);

*pelement1 = elementTop;
*pelement2 = elementMiddle;
*pelement3 = elementBottom;
Oh yes, it is a good selection mechanism: if you find bubble sort in code someone wrote, you don't hire them.
Mostly nothing. Use QuickSort or SelectionSort instead!