Algorithms for Big O Analysis

Which algorithms do you find to have amazing (tough, strange) complexity analyses, both in terms of the resulting O notation and the uniqueness of the way they are analyzed?

I have (quite) a few examples:
The union-find data structure, which supports operations in (amortized) inverse Ackermann time. It's particularly nice because the data structure is incredibly easy to code.
Splay trees, which are self-balancing binary trees (that is, no extra information is stored other than the BST -- no red/black information). Amortized analysis was essentially invented to prove bounds for splay trees; splay trees run in amortized logarithmic time, but worst-case linear time. The proofs are cool.
Fibonacci heaps, which perform most of the priority queue operations in amortized constant time, thus improving the runtime of Dijkstra's algorithm and other problems. As with splay trees, there are slick "potential function" proofs.
Bernard Chazelle's algorithm for computing minimum spanning trees in O(m α(m, n)) time, that is, linear times inverse Ackermann. The algorithm uses soft heaps, a variant of the traditional priority queue, except that some "corruption" might occur and queries might not be answered correctly.
While on the topic of MSTs: an optimal algorithm has been given by Pettie and Ramachandran, but we don't know the running time!
Lots of randomized algorithms have interesting analyses. I'll only mention one example: Delaunay triangulation can be computed in expected O(n log n) time by incrementally adding points; the analysis is apparently intricate, though I haven't seen it.
Algorithms that use "bit tricks" can be neat, e.g. sorting in O(n log log n) time (and linear space) -- that's right, it breaks the O(n log n) barrier by using more than just comparisons.
Cache-oblivious algorithms often have interesting analyses. For example, cache-oblivious priority queues (see page 3) use log log n levels of sizes n, n^(2/3), n^(4/9), and so on.
(Static) range-minimum queries on arrays are neat. The standard proof tests your limits with respect to reduction: the range-minimum query problem is reduced to least common ancestor in trees, which is in turn reduced to range-minimum queries on a specific kind of array. The final step uses a cute trick, too.

Ackermann's function.

This one is kinda simple but Comb Sort blows my mind a little.
http://en.wikipedia.org/wiki/Comb_sort
It is such a simple algorithm; for the most part it reads like an overly complicated bubble sort, yet it often behaves like O(n log n) in practice (its worst case is still O(n^2)). I find that mildly impressive.
The plethora of algorithms for Fast Fourier Transforms is impressive too. The math that proves their validity is trippy, and it was fun to try to prove a few on my own.
http://en.wikipedia.org/wiki/Fast_Fourier_transform
I can fairly easily understand the prime radix, multiple prime radix, and mixed radix algorithms, but one that works on sets whose size is prime is quite cool.

2D ordered search analysis is quite interesting. You've got an N×N two-dimensional array of numbers where each row is sorted left to right and each column is sorted top to bottom. The task is to find a particular number in the array.
The recursive algorithm is quite interesting to analyze: pick the element in the middle, compare it with the target number, discard a quarter of the array (depending on the result of the comparison), and apply the procedure recursively to the remaining three quarters.
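A minimal Python sketch of the quarter-discarding search described above. The function name `search_sorted_matrix` and the half-open index bounds are assumptions made for illustration. Each call discards one quadrant and recurses into the other three, which gives the recurrence T(n) = 3T(n/2) + O(1), i.e. roughly O(n^1.585) for an n×n array.

```python
def search_sorted_matrix(m, target, r0=0, r1=None, c0=0, c1=None):
    """Return (row, col) of target in m, or None if it is absent.

    Rows and columns of m must be sorted in ascending order.
    The search window is rows [r0, r1) and columns [c0, c1).
    """
    if r1 is None:
        r1 = len(m)
    if c1 is None:
        c1 = len(m[0]) if m else 0
    if r0 >= r1 or c0 >= c1:
        return None

    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    v = m[rm][cm]
    if v == target:
        return (rm, cm)

    if v < target:
        # Everything above-left of (rm, cm), inclusive, is <= v: discard it.
        regions = [(r0, rm + 1, cm + 1, c1),
                   (rm + 1, r1, c0, cm + 1),
                   (rm + 1, r1, cm + 1, c1)]
    else:
        # Everything below-right of (rm, cm), inclusive, is >= v: discard it.
        regions = [(r0, rm, c0, cm),
                   (r0, rm, cm, c1),
                   (rm, r1, c0, cm)]

    for a, b, c, d in regions:
        found = search_sorted_matrix(m, target, a, b, c, d)
        if found is not None:
            return found
    return None
```

For comparison, the well-known "staircase" walk from the top-right corner solves the same problem in O(N) steps, which is part of why the analysis of the recursive version is instructive.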

Non-deterministically polynomial complexity gets my vote, especially with the (admittedly considered unlikely) possibility that it may turn out to be the same as polynomial. In the same vein, anything that can theoretically benefit from quantum computing (N.B. this set is by no means all algorithms).
The other that would get my vote would be common mathematical operations on arbitrary-precision numbers -- this is where you have to consider things like the fact that multiplying big numbers is more expensive than multiplying small ones. There is quite a lot of analysis of this in Knuth (which shouldn't be news to anyone). Karatsuba's method is pretty neat: cut the two factors in half by digit, (A1;A2) and (B1;B2). The schoolbook approach multiplies A1 B1, A1 B2, A2 B1 and A2 B2 separately and then combines the results; Karatsuba gets the same answer from only three products, A1 B1, A2 B2 and (A1 + A2)(B1 + B2), since A1 B2 + A2 B1 = (A1 + A2)(B1 + B2) - A1 B1 - A2 B2. Recurse if desired...
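As a rough illustration of Karatsuba's idea, here is a small Python sketch that splits the operands by bits rather than decimal digits and uses three recursive products instead of four. The function name and the base-case threshold of 10 are arbitrary choices for this example, and the code assumes non-negative integers.

```python
def karatsuba(x, y):
    """Multiply two non-negative integers using three recursive products."""
    if x < 10 or y < 10:
        return x * y  # small operands: fall back to built-in multiplication

    # Split both numbers at bit position n: x = a1 * 2**n + a2, y = b1 * 2**n + b2.
    n = max(x.bit_length(), y.bit_length()) // 2
    mask = (1 << n) - 1
    a1, a2 = x >> n, x & mask
    b1, b2 = y >> n, y & mask

    high = karatsuba(a1, b1)
    low = karatsuba(a2, b2)
    # (a1 + a2)(b1 + b2) - high - low equals a1*b2 + a2*b1.
    cross = karatsuba(a1 + a2, b1 + b2) - high - low

    return (high << (2 * n)) + (cross << n) + low
```

Three half-size products per level give the recurrence T(n) = 3T(n/2) + O(n), which solves to about O(n^1.585) digit operations instead of O(n^2).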

Shell sort. There are tons of variants with various increments, most of which have no benefits except to make the complexity analysis simpler.

Related

What sorting techniques can I use when comparing elements is expensive?

Problem
I have an application where I want to sort an array a of elements a_0, a_1, ..., a_{n-1}. I have a comparison function cmp(i,j) that compares elements a_i and a_j and a swap function swap(i,j) that swaps elements a_i and a_j of the array. In the application, execution of the cmp(i,j) function might be extremely expensive, to the point where one execution of cmp(i,j) takes longer than all other steps in the sort (except for other cmp(i,j) calls, of course) together. You may think of cmp(i,j) as a rather lengthy IO operation.
Please assume for the sake of this question that there is no way to make cmp(i,j) faster. Assume all optimizations that could possibly make cmp(i,j) faster have already been done.
Questions
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
It is possible in my application to write a predicate expensive(i,j) that is true iff a call to cmp(i,j) would take a long time. expensive(i,j) is cheap and expensive(i,j) ∧ expensive(j,k) → expensive(i,k) mostly holds in my current application. This is not guaranteed though.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
I'd like pointers to further material on this topic.
Example
This is an example that is not entirely unlike the application I have.
Consider a set of possibly large files. In this application the goal is to find duplicate files among them. This essentially boils down to sorting the files by some arbitrary criterion and then traversing them in order, outputting sequences of equal files that were encountered.
Of course, reading in large amounts of data is expensive; therefore one can, for instance, read only the first megabyte of each file and calculate a hash function on this data. If the files compare equal, so do the hashes, but the reverse may not hold. Two large files could differ in only one byte near the end.
The implementation of expensive(i,j) in this case is simply a check whether the hashes are equal. If they are, an expensive deep comparison is necessary.

I'll try to answer each question as best as I can.
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
Traditional sorting methods may have some variation, but in general there is a mathematical limit to the minimum number of comparisons necessary to sort a list, and most algorithms are designed around it, since comparisons are often expensive. You could try sorting by something else, or try a faster shortcut that only approximates the real solution.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
I don't think you can get around the necessity of doing at least the minimum number of comparisons, but you may be able to change what you compare. If you can compare hashes or subsets of the data instead of the whole thing, that could certainly be helpful. Anything you can do to simplify the comparison operation will make a big difference, but without knowing specific details of the data, it's hard to suggest specific solutions.
I'd like pointers to further material on this topic.
Check these out:
Apparently Donald Knuth's The Art of Computer Programming, Volume 3 has a section on this topic, but I don't have a copy handy.
Wikipedia of course has some insight into the matter.
Sorting an array with minimal number of comparisons
How do I figure out the minimum number of swaps to sort a list in-place?
Limitations of comparison based sorting techniques

The theoretical minimum number of comparisons needed to sort an array of n elements on average is lg (n!), which is about n lg n - 1.44 n. There's no way to do better than this on average if you're using comparisons to order the elements.
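To get a feel for the lg(n!) bound, a tiny Python sketch can evaluate it directly. The function name is made up for this example, and the value returned is only the information-theoretic lower bound, not necessarily a number of comparisons that some algorithm actually achieves.

```python
import math

def comparison_lower_bound(n):
    """Information-theoretic lower bound ceil(lg(n!)) on comparisons needed to sort n items."""
    return math.ceil(math.log2(math.factorial(n)))

for n in (3, 5, 10, 100):
    print(n, comparison_lower_bound(n))
```

For n = 5 this gives 7, and sorting five elements with seven comparisons is indeed possible.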
Of the standard O(n log n) comparison-based sorting algorithms, mergesort makes the lowest number of comparisons (just about n lg n, compared with about 1.39 n lg n on average for quicksort and about n lg n + 2n for heapsort), so it might be a good algorithm to use as a starting point. Typically mergesort is slower than heapsort and quicksort, but that's usually under the assumption that comparisons are fast.
If you do use mergesort, I'd recommend using an adaptive variant of mergesort like natural mergesort so that if the data is mostly sorted, the number of comparisons is closer to linear.
There are a few other options available. If you know for a fact that the data is already mostly sorted, you could use insertion sort or a standard variation of heapsort to try to speed up the sorting. Alternatively, you could use mergesort but use an optimal sorting network as a base case when n is small. This might shave off enough comparisons to give you a noticeable performance boost.
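One way to check such claims on your own data is to instrument a sort and count comparisons. The following is a plain top-down mergesort sketch in Python (not the adaptive natural mergesort mentioned above); the function name and return shape are assumptions for this example.

```python
def merge_sort_counting(items):
    """Return (sorted list, number of element comparisons performed)."""
    if len(items) <= 1:
        return list(items), 0

    mid = len(items) // 2
    left, left_count = merge_sort_counting(items[:mid])
    right, right_count = merge_sort_counting(items[mid:])

    merged, i, j = [], 0, 0
    count = left_count + right_count
    while i < len(left) and j < len(right):
        count += 1  # exactly one comparison per loop iteration
        if right[j] < left[i]:
            merged.append(right[j])
            j += 1
        else:
            merged.append(left[i])  # ties take the left item, keeping the sort stable
            i += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count
```

On already sorted input of length 8 this makes 12 comparisons, while the worst case stays below n ⌈lg n⌉.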
Hope this helps!

A technique called the Schwartzian transform can be used to reduce any sorting problem to that of sorting integers. It requires you to apply a function f to each of your input items, where f(x) < f(y) if and only if x < y.
(Python-oriented answer, when I thought the question was tagged [python])
If you can define a function f such that f(x) < f(y) if and only if x < y, then you can sort using
sorted(L, key=f)
Python guarantees that key is called at most once for each element of the iterable you are sorting. This provides support for the Schwartzian transform.
Python 3 does not support specifying a cmp function, only the key parameter. This page provides a way of easily converting any cmp function to a key function.
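A short Python sketch of both points: the key function runs once per element, and the standard library's `functools.cmp_to_key` adapts an old-style cmp function. The names `expensive_key`, `compare_lengths` and the sample word list are invented for this illustration. Note that `cmp_to_key` only adapts the interface; it does not reduce the number of comparisons.

```python
from functools import cmp_to_key

key_calls = []

def expensive_key(word):
    """Stand-in for a costly key computation; records each invocation."""
    key_calls.append(word)
    return len(word)

words = ["pear", "fig", "banana", "kiwi", "apple"]

# The key is computed once per element, then cheap integer comparisons do the rest.
by_length = sorted(words, key=expensive_key)

def compare_lengths(a, b):
    """Old-style three-way comparison: negative, zero, or positive."""
    return len(a) - len(b)

# cmp_to_key wraps the cmp function so it can be passed as key=.
by_length_cmp = sorted(words, key=cmp_to_key(compare_lengths))
```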

Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
Edit: Ah, sorry. There are algorithms that minimize the number of comparisons (below), but not that I know of for specific elements.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
Not that I know of, but perhaps you'll find it in these papers below.
I'd like pointers to further material on this topic.
On Optimal and Efficient in Place Merging
Stable Minimum Storage Merging by Symmetric Comparisons
Optimal Stable Merging (this one seems to be O(n log^2 n) though)
Practical In-Place Mergesort
If you implement any of them, posting them here might be useful for others too! :)

Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
The merge insertion algorithm, described in D. Knuth's "The Art of Computer Programming", Vol. 3, chapter 5.3.1, uses fewer comparisons than other comparison-based algorithms. But it still needs O(N log N) comparisons.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
I think some of existing sorting algorithms may be modified to take into account expensive(i,j) predicate. Let's take the simplest of them - insertion sort. One of its variants, named in Wikipedia as binary insertion sort, uses only O(N log N) comparisons.
It employs a binary search to determine the correct location to insert new elements. We could apply expensive(i,j) predicate after each binary search step to determine if it is cheap to compare the inserted element with "middle" element found in binary search step. If it is expensive we could try the "middle" element's neighbors, then their neighbors, etc. If no cheap comparisons could be found we just return to the "middle" element and perform expensive comparison.
There are several possible optimizations. If the predicate and/or cheap comparisons are not so cheap, we could roll back to the "middle" element before all other possibilities have been tried. Also, if move operations cannot be considered very cheap, we could use some order statistics data structure (like an indexable skiplist) to reduce insertion cost to O(N log N).
This modified insertion sort needs O(N log N) time for data movement, O(N^2) predicate computations and cheap comparisons, and O(N log N) expensive comparisons in the worst case. But more likely there would be only O(N log N) predicates and cheap comparisons and O(1) expensive comparisons.
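The modified binary insertion sort described above could be sketched in Python roughly as follows. This is only an illustration under assumptions: `cmp` and `expensive` here take element values rather than array indices, `cmp` returns a negative, zero, or positive number, and data movement uses a plain list insert rather than an order-statistics structure.

```python
def insertion_sort_avoiding_expensive(items, cmp, expensive):
    """Binary insertion sort that prefers probes whose comparison is predicted cheap."""
    result = []
    for x in items:
        lo, hi = 0, len(result)
        while lo < hi:
            mid = (lo + hi) // 2
            # Candidate probes in the current window, nearest to the middle first.
            candidates = sorted(range(lo, hi), key=lambda i: abs(i - mid))
            # Take the first cheap candidate; fall back to the middle if none is cheap.
            probe = next((i for i in candidates if not expensive(x, result[i])), mid)
            if cmp(x, result[probe]) < 0:
                hi = probe       # x belongs somewhere before the probe
            else:
                lo = probe + 1   # x belongs somewhere after the probe
        result.insert(lo, x)
    return result
```

Any probe inside the current window keeps the search correct; probing off-center only costs extra steps, which is the trade being made for fewer expensive comparisons.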
Consider a set of possibly large files. In this application the goal is to find duplicate files among them.
If the only goal is to find duplicates, I think sorting (at least comparison sorting) is not necessary. You could just distribute the files between buckets depending on a hash value computed for the first megabyte of data from each file. If there is more than one file in some bucket, take the next 10, 100, 1000, ... megabytes. If there is still more than one file in some bucket, compare them byte-by-byte. Actually this procedure is similar to radix sort.
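A rough Python sketch of this bucketing procedure, using `hashlib` for prefix hashes and `filecmp.cmp(..., shallow=False)` for the final byte-by-byte check. The function names, the choice of SHA-256, and the parameters `first`, `factor` and `rounds` are assumptions for illustration only.

```python
import filecmp
import hashlib
from collections import defaultdict

def prefix_hash(path, nbytes, chunk=1 << 20):
    """Hash the first nbytes of a file, reading it in chunks."""
    digest = hashlib.sha256()
    remaining = nbytes
    with open(path, "rb") as f:
        while remaining > 0:
            block = f.read(min(chunk, remaining))
            if not block:
                break
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest()

def find_duplicates(paths, first=1 << 20, factor=10, rounds=3):
    """Return lists of paths whose contents are identical."""
    buckets = [list(paths)]
    nbytes = first
    for _ in range(rounds):
        refined = []
        for bucket in buckets:
            groups = defaultdict(list)
            for path in bucket:
                groups[prefix_hash(path, nbytes)].append(path)
            # Only buckets that still hold several files need more work.
            refined.extend(g for g in groups.values() if len(g) > 1)
        buckets = refined
        nbytes *= factor

    duplicates = []
    for bucket in buckets:
        classes = []  # groups of files confirmed equal byte-by-byte
        for path in bucket:
            for cls in classes:
                if filecmp.cmp(cls[0], path, shallow=False):
                    cls.append(path)
                    break
            else:
                classes.append([path])
        duplicates.extend(c for c in classes if len(c) > 1)
    return duplicates
```

Each round rereads the prefix from the start of the file, which keeps the sketch short; a real tool would probably hash incrementally and bucket by file size first.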

Most sorting algorithms out there try to minimize the number of comparisons during sorting.
My advice:
Pick quicksort as a base algorithm and memoize the results of comparisons in case you happen to compare the same elements again. This should help you in the O(N^2) worst case of quicksort. Bear in mind that this will make you use O(N^2) memory.
Now if you are really adventurous you could try the Dual-Pivot quick-sort.

Something to keep in mind is that if you are continuously sorting the list with new additions, and the comparison between two elements is guaranteed to never change, you can memoize the comparison operation which will lead to a performance increase. In most cases this won't be applicable, unfortunately.
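A minimal sketch of memoizing a comparison function in Python. It assumes the elements are hashable, that the comparison result for a pair never changes, and that `cmp(a, b) == -cmp(b, a)`; the names `memoize_cmp` and `slow_cmp` are invented for this example.

```python
from functools import cmp_to_key

def memoize_cmp(cmp):
    """Wrap a three-way comparison so each unordered pair is computed at most once."""
    cache = {}

    def cached(a, b):
        if (a, b) in cache:
            return cache[(a, b)]
        if (b, a) in cache:
            return -cache[(b, a)]  # relies on cmp being antisymmetric
        result = cache[(a, b)] = cmp(a, b)
        return result

    return cached

calls = []

def slow_cmp(a, b):
    """Stand-in for an expensive comparison; records each real invocation."""
    calls.append((a, b))
    return (a > b) - (a < b)

fast_cmp = memoize_cmp(slow_cmp)
data = [4, 1, 3, 2, 5]
first_pass = sorted(data, key=cmp_to_key(fast_cmp))
calls_after_first = len(calls)
second_pass = sorted(data, key=cmp_to_key(fast_cmp))  # served entirely from the cache
```

The cache can grow to one entry per pair of elements, which is the O(N^2) memory cost of this approach.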

We can look at your problem from another direction. It seems your problem is IO-related, so you can take advantage of parallel sorting algorithms. In fact, you can run many threads to perform comparisons on files, then sort them with one of the best-known parallel algorithms, such as sample sort.

Quicksort and mergesort are among the fastest general-purpose sorting algorithms, unless you have some additional information about the elements you want to sort. They need O(n log(n)) comparisons (on average, in quicksort's case), where n is the size of your array.
It is mathematically proven that no generic comparison-based sorting algorithm can do asymptotically better than that.
If you want to make the procedure faster, you might consider adding some metadata to accelerate the computation (can't be more precise unless you are, too).
If you know something stronger, such as the existence of a maximum and a minimum, you can use faster sorting algorithms, such as radix sort or bucket sort.
You can look for all the mentioned algorithms on wikipedia.
As far as I know, you can't benefit from the expensive relationship. Even if you know that, you still need to perform such comparisons. As I said, you'd better try and cache some results.
EDIT I took some time to think about it, and I came up with a slightly customized solution, that I think will make the minimum possible amount of expensive comparisons, but totally disregards the overall number of comparisons. It will make at most (n-m)*log(k) expensive comparisons, where
n is the size of the input vector
m is the number of distinct components which are easy to compare between each other
k is the maximum number of elements which are hard to compare and have consecutive ranks.
Here is the description of the algorithm. It's worth noting that it will perform much worse than a simple merge sort, unless m is big and k is small. The total running time is O[n^4 + E(n-m)log(k)], where E is the cost of an expensive comparison (I assumed E >> n, to prevent it from being wiped out from the asymptotic notation). That n^4 can probably be further reduced, at least in the average case.
EDIT The file I posted contained some errors. While trying it, I also fixed them (I overlooked the pseudocode for the insert_sorted function, but the idea was correct). I made a Java program that sorts a vector of integers, with delays added as you described. Even though I was skeptical, it actually does better than mergesort if the delay is significant (I used a 1s delay against integer comparison, which usually takes nanoseconds to execute).

Sorting in O(n*log(n)) worst case

Is there a sort of an array that works in O(n*log(n)) worst case time complexity?
I saw on Wikipedia that there are sorts like that, but they are unstable. What does that mean? Is there a way to do it with low space complexity?
Is there a best sorting algorithm?

An algorithm that requires only O(1) extra memory (so modifying the input array is permitted) is generally described as "in-place", and that's the lowest space complexity there is.
A sort is described as "stable" or not, according to what happens when there are two elements in the input which compare as equal, but are somehow distinguishable. For example, suppose you have a bunch of records with an integer field and a string field, and you sort them on the integer field. The question is, if two records have the same integer value but different string values, then will the one that came first in the input, also come first in the output, or is it possible that they will be reversed? A stable sort is one that guarantees to preserve the order of elements that compare the same, but aren't identical.
It is difficult to make a comparison sort that is in-place, and stable, and achieves O(n log n) worst-case time complexity. I've a vague idea that it's unknown whether or not it's possible, but I don't keep up to date on it.
Last time someone asked about the subject, I found a couple of relevant papers, although that question wasn't identical to this question:
How to sort in-place using the merge sort algorithm?
As far as a "best" sort is concerned - some sorting strategies take advantage of the fact that on the whole, taken across a large number of applications, computers spend a lot of time sorting data that isn't randomly shuffled, it has some structure to it. Timsort is an algorithm to take advantage of commonly-encountered structure. It performs very well in a lot of practical applications. You can't describe it as a "best" sort, since it's a heuristic that appears to do well in practice, rather than being a strict improvement on previous algorithms. But it's the "best" known overall in the opinion of people who ship it as their default sort (Python, Java 7, Android). You probably wouldn't describe it as "low space complexity", though, it's no better than a standard merge sort.

You can choose between mergesort, quicksort or heapsort, all nicely described here.
There is also radix sort, whose complexity is O(kN), but it pays for that with extra memory consumption.
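For reference, a least-significant-digit radix sort can be sketched in a few lines of Python. This version is an illustration that assumes non-negative integers; the function name and the default base of 256 are arbitrary choices. The per-pass bucket lists are the extra memory mentioned above, and k is the number of digits in the chosen base.

```python
def radix_sort(numbers, base=256):
    """LSD radix sort for non-negative integers; returns a new sorted list."""
    result = list(numbers)
    if not result:
        return result

    largest = max(result)
    place = 1
    while largest // place > 0:
        buckets = [[] for _ in range(base)]
        for value in result:
            buckets[(value // place) % base].append(value)
        # Concatenating buckets in order keeps each pass stable.
        result = [value for bucket in buckets for value in bucket]
        place *= base
    return result
```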
You can also see that quicksort is faster for smaller collections while mergesort takes the lead for larger ones, but all of this is case-specific, so take your time to study all four algorithms.

For the question of the best algorithm, the simple answer is: it depends. It depends on the size of the data set you want to sort, and it depends on your requirements. Say, bubble sort has worst-case and average complexity both O(n^2), where n is the number of items being sorted. There exist many sorting algorithms with substantially better worst-case or average complexity of O(n log n). Even other O(n^2) sorting algorithms, such as insertion sort, tend to have better performance than bubble sort. Therefore, bubble sort is not a practical sorting algorithm when n is large.
Among simple average-case Θ(n^2) algorithms, selection sort almost always outperforms bubble sort, but is generally outperformed by insertion sort.
Selection sort is greatly outperformed on larger arrays by Θ(n log n) divide-and-conquer algorithms such as mergesort. However, insertion sort and selection sort are both typically faster for small arrays.
Likewise, you can yourself select the best sorting algorithm according to your requirements.

It is proven that Ω(n log n) comparisons is the lower bound for sorting generic items by comparison. It is also proven that Ω(n) is the lower bound for sorting integers (you need at least to read the input :) ).
The specific instance of the problem will determine what is the best algorithm for your needs, i.e. sorting 1M strings is different from sorting 2M 7-bit integers in 2MB of RAM.
Also consider that besides the asymptotic runtime complexity, the implementation is making a lot of difference, as well as the amount of available memory and caching policy.
I could implement quicksort in 1 line in Python, roughly keeping O(n log n) complexity (with some caveat about the pivot), but Big-Oh notation says nothing about the constant terms, which are relevant too (i.e. this is ~30x slower than Python's built-in sort, which is likely written in C btw):
qsort = lambda a: [] if not a else qsort([x for x in a if x < a[len(a)//2]]) + [x for x in a if x == a[len(a)//2]] + qsort([x for x in a if x > a[len(a)//2]])
For a discussion about stable/unstable sorting, look here http://www.developerfusion.com/article/3824/a-guide-to-sorting/6/.
You may want to get yourself a good algorithm book (ie. Cormen, or Skiena).

Heapsort, maybe randomized quicksort
stable sort
As others already mentioned: no, there isn't. For example, you might want to parallelize your sorting algorithm, which leads to totally different sorting algorithms.

Regarding your question meaning stable, let's consider the following: We have a class of children associated with ages:
Phil, 10
Hans, 10
Eva, 9
Anna, 9
Emil, 8
Jonas, 10
Now, we want to sort the children in order of ascending age (and nothing else). Then, we see that Phil, Hans and Jonas all have age 10, so it is not clear in which order we have to order them since we sort just by age.
Now comes stability: if we sort stably, we keep Phil, Hans and Jonas in the order they were in before, i.e. we put Phil first, then Hans, and lastly Jonas (simply because they were in this order in the original sequence and we only consider age as the comparison criterion). Similarly, we have to put Eva before Anna (both the same age, but in the original sequence Eva was before Anna).
So, the result is:
Emil, 8
Eva, 9
Anna, 9
Phil, 10 \
Hans, 10 | all aged 10, and left in original order.
Jonas, 10 /
To put it in a nutshell: Stability means that if two elements are equal (w.r.t. the chosen sorting criterion), the one coming first in the original sequence still comes first in the resulting sequence.
Note that you can easily transform any sorting algorithm into a stable sorting algorithm: If your original sequence holds n elements: e1, e2, e3, ..., en, you simply attach a counter to each one: (e1, 0), (e2, 1), (e3, 2), ..., (en, n-1). This means you store for each element its original position.
If two elements are now equal, you simply compare their counters and put the one with the lower counter value first. This increases runtime (and memory) by O(n), which is no asymptotic worsening, since even the best (comparison) sort algorithm already needs O(n lg n).
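The counter trick can be sketched in Python using a heap-based sort, which is not stable on its own. The function name `stable_heap_sort` is invented for this example; the data is the class of children from above.

```python
import heapq

def stable_heap_sort(items, key):
    """Heap-based sort made stable by attaching each item's original position."""
    # Ties on key(x) are broken by the index i, so equal keys keep their input order.
    heap = [(key(x), i, x) for i, x in enumerate(items)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]

children = [("Phil", 10), ("Hans", 10), ("Eva", 9),
            ("Anna", 9), ("Emil", 8), ("Jonas", 10)]

by_age = stable_heap_sort(children, key=lambda child: child[1])
# by_age: Emil, Eva, Anna, Phil, Hans, Jonas
```

Because every index is unique, the tuple comparison never falls through to the items themselves.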

How to test an algorithm for perfect optimization?

Is there any way to test an algorithm for perfect optimization?

There is no easy way to prove that any given algorithm is asymptotically optimal.
Proving optimality (if ever) sometimes follows years and/or decades after the algorithm has been written. A classic example is the Union-Find/disjoint-set data structure.
Disjoint-set forests are a data structure where each set is represented by a tree data structure, in which each node holds a reference to its parent node. They were first described by Bernard A. Galler and Michael J. Fischer in 1964, although their precise analysis took years.
[...] These two techniques complement each other; applied together, the amortized time per operation is only O(α(n)), where α(n) is the inverse of the function f(n) = A(n,n), and A is the extremely quickly-growing Ackermann function.
[...] In fact, this is asymptotically optimal: Fredman and Saks showed in 1989 that Ω(α(n)) words must be accessed by any disjoint-set data structure per operation on average.
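For readers who have not seen the structure, here is a compact Python sketch of a disjoint-set forest using union by rank and path compression, the two techniques usually combined to obtain the O(α(n)) amortized bound. The class and method names are choices made for this example.

```python
class DisjointSet:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, n):
        self.parent = list(range(n))  # each element starts as its own root
        self.rank = [0] * n

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path directly at the root.
        while self.parent[x] != root:
            next_node = self.parent[x]
            self.parent[x] = root
            x = next_node
        return root

    def union(self, a, b):
        """Merge the sets containing a and b; return False if already merged."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True
```

The code really is this short, which is what makes the years it took to pin down its analysis so striking.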
For some algorithms optimality can be proven after very careful analysis, but generally speaking, there's no easy way to tell if an algorithm is optimal once it's written. In fact, it's not always easy to prove if the algorithm is even correct.
See also
Wikipedia/Matrix multiplication
The naive algorithm is O(N^3), Strassen's is roughly O(N^2.807), Coppersmith-Winograd is O(N^2.376), and we still don't know what is optimal.
Wikipedia/Asymptotically optimal
it is an open problem whether many of the most well-known algorithms today are asymptotically optimal or not. For example, there is an O(nα(n)) algorithm for finding minimum spanning trees. Whether this algorithm is asymptotically optimal is unknown, and would be likely to be hailed as a significant result if it were resolved either way.
Practical considerations
Note that sometimes asymptotically "worse" algorithms are better in practice due to many factors (e.g. ease of implementation, actually better performance for the given input parameter range, etc).
A typical example is quicksort with a simple pivot selection that may exhibit quadratic worst-case performance, but is still favored in many scenarios over a more complicated variant and/or other asymptotically optimal sorting algorithms.

For those among us mortals that merely want to know if an algorithm:
reasonably works as expected;
is faster than others;
there is an easy step called 'benchmark'.
Pick up the best contenders in the area and compare them with your algorithm.
If your algorithm wins then it better matches your needs (the ones defined by
your benchmarks).
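A benchmark along these lines might look like the following Python sketch, which uses the standard `timeit` module. The contender (`insertion_sort`), the input size and the repetition count are arbitrary assumptions for illustration; real benchmarks should use inputs that resemble your actual workload.

```python
import random
import timeit

def insertion_sort(values):
    """Simple quadratic contender, used here only as a benchmark subject."""
    out = list(values)
    for i in range(1, len(out)):
        item, j = out[i], i - 1
        while j >= 0 and out[j] > item:
            out[j + 1] = out[j]
            j -= 1
        out[j + 1] = item
    return out

random.seed(0)
data = [random.random() for _ in range(500)]

# Time each contender on the same input, repeated a few times.
t_builtin = timeit.timeit(lambda: sorted(data), number=20)
t_insertion = timeit.timeit(lambda: insertion_sort(data), number=20)
print(f"sorted(): {t_builtin:.4f}s  insertion_sort(): {t_insertion:.4f}s")
```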

Using red black trees for sorting

The worst-case running time of insertion on a red-black tree is O(lg n), and if I perform an in-order walk on the tree, I essentially visit each node, so the total worst-case runtime to print the sorted collection would be O(n lg n).
I am curious: why are red-black trees not preferred for sorting over quicksort (whose average-case running time is O(n lg n))?
I see that maybe because red-black trees do not sort in-place, but I am not sure, so maybe someone could help.

Knowing which sort algorithm performs better really depends on your data and situation.
If you are talking in general/practical terms,
Quicksort (the one where you select the pivot randomly/just pick one fixed, making worst case Omega(n^2)) might be better than Red-Black Trees because (not necessarily in order of importance)
Quicksort is in-place. This keeps your memory footprint low. Say this quicksort routine was part of a program which deals with a lot of data. If you kept using large amounts of memory, your OS could start swapping your process memory and thrash your performance.
Quicksort memory accesses are localized. This plays well with the caching/swapping.
Quicksort can be easily parallelized (probably more relevant these days).
If you were to try and optimize binary tree sorting (using binary tree without balancing) by using an array instead, you will end up doing something like Quicksort!
Red-black trees have memory overheads. You have to allocate nodes, possibly multiple times, and your memory requirements with trees are double or triple those of arrays.
After sorting, say you wanted the 1045th (say) element, you will need to maintain order statistics in your tree (extra memory cost because of this) and you will have O(logn) access time!
Red-black trees have overheads just to access the next element (pointer lookups)
Red-black trees do not play well with the cache and the pointer accesses could induce more swapping.
Rotation in red-black trees will increase the constant factor in the O(nlogn).
Perhaps the most important reason (but not valid if you have lib etc available), Quicksort is very simple to understand and implement. Even a school kid can understand it!
I would say you try to measure both implementations and see what happens!
Also, Bob Sedgewick did a thesis on quicksort! Might be worth reading.

There are plenty of sorting algorithms which are worst case O(n log n) - for example, merge sort. The reason quicksort is preferred is because it is faster in practice, even though algorithmically it may not be as good as some other algorithms.
Often in-built sorts use a combination of various methods depending on the values of n.

There are many cases where red-black trees are not bad for sorting. My testing showed, compared to natural merge sort, that red-black trees excel where:
Trees are better for dups:
In all the tests where dups need to be eliminated, the tree algorithm is better. This is not astonishing, since the tree can be kept very small from the beginning, whereas algorithms that are designed for inline array sort might pass around larger segments for a longer time.
Trees are better for random:
In all the tests with random input, the tree algorithm is better. This is also not astonishing, since in a tree the distance between elements is shorter and shifting is not necessary. So repeatedly inserting into a tree could need less effort than sorting an array.
So we get the impression that natural merge sort only excels in the ascending and descending special cases, which can't even be said for quicksort.
Gist with the test cases here.
P.S.: it should be noted that using trees for sorting is non-trivial. One has not only to provide an insert routine but also a routine that can linearize the tree back to an array. We are currently using a get_last and a predecessor routine, which doesn't need a stack. But these routines are not O(1) since they contain loops.

Big-O time complexity measures do not usually take into account scalar factors, e.g., O(2n) and O(4n) are usually just reduced to O(n). Time complexity analysis is based on operational steps at an algorithmic level, not at a strict programming level, i.e., no source code or native machine instruction considerations.
Quicksort is generally faster than tree-based sorting since (1) the methods have the same algorithmic average time complexity, and (2) lookup and swapping operations require fewer program commands and data accesses when working with simple arrays than with red-black trees, even if the tree uses an underlying array-based implementation. Maintenance of the red-black tree constraints requires additional operational steps, data field value storage/access (node colors), etc than the simple array partition-exchange steps of a quicksort.
The net result is that red-black trees have higher scalar coefficients than quicksort does that are being obscured by the standard O(n log n) average time complexity analysis result.
Some other practical considerations related to machine architectures are briefly discussed in the Quicksort article on Wikipedia

Generally, representations of O(nlgn) algorithms can be expanded to A*nlgn + B where A and B are constants. There are many algorithmic proofs that show the coefficients for quicksort are smaller than those of other algorithms. That is in best-case (quick sort performs horribly on sorted data).

Hi, in my opinion the best way to explain the difference between all the sorting routines is this.
(My answer is for people who are confused about how quicksort is faster in practice than other sorting algorithms.)
"Imagine you are running on a very slow computer."
First, one comparison operation takes 1 hour.
One shifting operation takes 2 hours.
"I am using hours just to make people understand how important time is."
Now, of all the sorting algorithms, quicksort performs very few comparisons and very few element swaps.
This is the main reason quicksort is faster.

How do I find out the fundamental operation when calculating run-time complexity?

I am trying to get the worst-case run-time complexity order of a couple of algorithms I created. However, I have run into a problem: I keep selecting the wrong fundamental operations, or the wrong number of them, for an algorithm.
To me it appears that the selection of the fundamental operation is more of an art than a science. After googling and reading my textbooks, I still have not found a good definition. So far I have defined it as "an operation that always occurs within an algorithm's execution", such as a comparison or array manipulation.
But algorithms often have many comparisons that are always executed so which operation do you pick?

I agree to some degree it's an art, so you should always clarify when writing documentation, etc.. But usually it's a "visit" to the underlying data structure. So like you said, for an array it's a comparison or a swap, for a hash map it may be a manual examination of a key, for a graph it's a visit to a vertex or edge, etc.

Even practicing complexity theorists have disagreements about this sort of thing, so what follows may be a bit subjective: http://blog.computationalcomplexity.org/2009/05/shaving-logs-with-unit-cost.html
The purpose of big-O notation is to summarize the efficiency of an algorithm for the reader. In practical contexts, I am most concerned with how many clock cycles an algorithm takes, assuming that the big-O constant is neither extremely small nor extremely large (and ignoring the effects of the memory hierarchy); this is the "unit-cost" model alluded to in the linked post.
The reason to count comparisons for sorting algorithms is that the cost of a comparison depends on the type of the input data. You could say that a sorting algorithm takes O(c n log n) cycles where c is the expense of a comparison, but it's simpler in this case to count comparisons instead because the other work performed by the algorithm is O(n log n). There's a sorting algorithm that sorts the concatenation of n sorted arrays of length n in n^2 log n steps and n^2 comparisons; here, I would expect that the number of comparisons and the computational overhead be stated separately, because neither necessarily dominates the other.

This only works when you have actually implemented the algorithm, but you could just use a profiler to see which operation is the bottleneck. That's a practical point of view. In theory, some assume that everything that is not the fundamental operation runs in zero time.

The somewhat simple definition I have heard is:
The operation which is executed at least as many times as any other
operation in the algorithm.
For example, in a sorting algorithm, these tend to be comparisons rather than assignments, as you almost always have to visit and 'check' an element before you re-order it, but the check may not result in a re-ordering. So there will always be at least as many comparisons as assignments.
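To see this on a concrete case, one can instrument a simple sort and count both kinds of operation. The sketch below is a swap-based insertion sort in Python; the function name and the decision to count swaps (rather than individual assignments) are assumptions for this example.

```python
def insertion_sort_counting(values):
    """Return (sorted list, comparisons, swaps) for a swap-based insertion sort."""
    a = list(values)
    comparisons = swaps = 0
    for i in range(1, len(a)):
        j = i
        while j > 0:
            comparisons += 1          # every potential swap is preceded by a check
            if a[j - 1] <= a[j]:
                break                 # the check did not lead to a re-ordering
            a[j - 1], a[j] = a[j], a[j - 1]
            swaps += 1
            j -= 1
    return a, comparisons, swaps

print(insertion_sort_counting([3, 1, 2]))  # ([1, 2, 3], 3, 2)
```

Since each swap happens only after a comparison, the comparison count can never fall below the swap count, which is why comparisons are the natural operation to count here.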
