Which programs/algorithms change the representation of their data structure at runtime in order to obtain better performance?
Context:
Data structures "define" how real-world concepts are structured and represented in computer memory. For different kinds of computations a different data structure should/can be used to achieve acceptable performance (e.g., linked-list versus array implementation).
Self-adaptive (cf. self-updating) data structures are data structures that change their internal state according to a concrete usage pattern (e.g., self balancing trees). These changes are internal, i.e., depending the data. Moreover, these changes are anticipated upon by design.
Other algorithms can benefit from an external change of representation. In matrix multiplication, for instance, it is a well-known performance trick to transpose "the second matrix" (so that caches are used more efficiently). This actually changes the matrix representation from row-major to column-major order. Because "A" is not the same as "Transposed(A)", the second matrix is transposed again after the multiplication to keep the program semantically correct.
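For concreteness, a minimal sketch of the trick in Python (plain lists of lists; the function name is my own, and in Python itself the cache benefit is largely hidden - the payoff shows up in lower-level languages or large numerical arrays):

    def matmul_with_transpose(A, B):
        # A is n x m, B is m x p; build BT, a column-major copy of B,
        # so that the innermost loop walks both operands sequentially.
        n, m, p = len(A), len(B), len(B[0])
        BT = [[B[i][j] for i in range(m)] for j in range(p)]
        C = [[0] * p for _ in range(n)]
        for i in range(n):
            for j in range(p):
                row_a, col_b = A[i], BT[j]
                C[i][j] = sum(row_a[k] * col_b[k] for k in range(m))
        return C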
A second example is using a linked list at program start-up to populate "the data structure" and switching to an array-based implementation once the content of the list becomes "stable".
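A minimal sketch of that second example, assuming the "linked list" phase is a collections.deque and the "stable" phase is a compact array.array (the concrete values are just stand-ins):

    from collections import deque
    from array import array

    staging = deque()                  # cheap appends/prepends while the data arrives
    for value in (3, 1, 4, 1, 5):      # stand-in for the start-up population phase
        staging.appendleft(value)

    stable = array("i", staging)       # once "stable": contiguous storage, O(1) indexing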
I am looking for programmers that have similar experiences with other example programs where an external change of representation is performed in their application in order to have better performance. Thus, where the representation (chosen implementation) of a data structure is changed at runtime as an explicit part of the program.
The pattern of transforming the input representation in order to enable a more efficient algorithm comes up in many situations. I would go as far as to say this is an important way to think about designing efficient algorithms in general. Some examples that come to mind:
HeapSort. It works by transforming your original input list into a binary heap (probably a min-heap), and then repeatedly calling the remove-min function to get the list elements in sorted order (see the sketch after this list). Asymptotically, it is tied for the fastest comparison-based sorting algorithm.
Finding duplicates in a list. Without changing the input list, this will take O(n^2) time. But if you can sort the list, or store the elements in a hash table or Bloom filter, you can find all the duplicates in O(n log n) time or better.
Solving a linear program. A linear program (LP) is a certain kind of optimization problem with many applications in economics and elsewhere. One of the most important techniques in solving LPs is duality, which means converting your original LP into what is called the "dual", and then solving the dual. Depending on your situation, solving the dual problem may be much easier than solving the original ("primal") LP. This book chapter starts with a nice example of primal/dual LPs.
Multiplying very large integers or polynomials. The fastest known method is using the FFT; see here or here for some nice descriptions. The gist of the idea is to convert from the usual representation of your polynomial (a list of coefficients) to an evaluation basis (a list of evaluations of that polynomial at certain carefully-chosen points). The evaluation basis makes multiplication trivial - you can just multiply each pair of evaluations. Now you have the product polynomial in an evaluation basis, and you interpolate (opposite of evaluation) to get back the coefficients, like you wanted. The Fast Fourier Transform (FFT) is a very efficient way of doing the evaluation and interpolation steps, and the whole thing can be much faster than working with the coefficients directly.
Longest common substring. If you want to find the longest substring that appears in a bunch of text documents, one of the fastest ways is to create a suffix tree from each one, then merge them together and find the deepest common node.
Linear algebra. Various matrix computations are performed most efficiently by converting your original matrix into a canonical form such as Hermite normal form or computing a QR factorization. These alternate representations of the matrix make standard things such as finding the inverse, determinant, or eigenvalues much faster to compute.
There are certainly many examples besides these, but I was trying to come up with some variety.
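As an illustration of the HeapSort item, here is a minimal sketch in Python using the standard heapq module, which makes the change of representation (plain list -> binary min-heap) explicit:

    import heapq

    def heap_sort(items):
        heap = list(items)
        heapq.heapify(heap)        # O(n): re-represent the list as a min-heap
        return [heapq.heappop(heap) for _ in range(len(heap))]  # n removals, O(log n) each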
Related
So, I am studying 3-way merge sort and I am wondering about the "without loss of generality" assumption.
Let's assume that we have an array A' whose size is a power of 3 and an array A whose size is a power of some constant.
Here is my question.
Why is the assumption that n (the number of elements) is a power of three without loss of generality?
Why is any assumption of the form that n is a power of a constant also without loss of generality?
Because you can always enlarge an array A to the size you need just to make the algorithm work.
An actual implementation might or might not use that assumption, but in principle taking the assumption does not prevent you from applying the algorithm to an array A of any size. The assumption about size is there because it simplifies the algorithm and is convenient for analyzing time complexity.
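A minimal sketch of that padding idea in Python, assuming the elements can be compared against a +infinity sentinel; note that padding at most triples the array, so the asymptotic bounds are unaffected:

    import math

    def pad_to_power_of_three(a):
        target = 1
        while target < len(a):
            target *= 3
        # Sentinels sort to the end and can be dropped after the merge sort finishes.
        return a + [math.inf] * (target - len(a))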
I have been studying this topic in an Algorithms textbook.
The clever usage of the complex roots of unity seems to work mathematically. However, I do not understand how one could actually represent this in a computer.
I can think of two things:
Use the real/imaginary decomposition to represent the complex numbers. But this means using floats, which means I open up my algorithm to numerical error and I would lose precision even if I want to multiply two polynomials with integer coefficients.
Represent exp(i 2pi/n) symbolically as omega. Then I'd eventually get a polynomial in omega, and if I have to keep it in this form, I'd essentially be doing polynomial multiplication in omega again, taking us back to square one.
I'd really like to see an implementation of this algorithm in a familiar programming language.
Indeed, as you identify, the roots of unity are typically not nice numbers that can be stored exactly in a computer. Since the numerical error is small, if you know the output should be integers, rounding usually produces the right result.
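For the first option, a minimal sketch with NumPy, assuming integer coefficients small enough that the floating-point error stays well below 0.5, so rounding recovers the exact product:

    import numpy as np

    def poly_mul_fft(a, b):
        n = len(a) + len(b) - 1                   # length of the product polynomial
        size = 1 << (n - 1).bit_length()          # next power of two for the FFT
        fa = np.fft.rfft(a, size)                 # coefficients -> evaluations
        fb = np.fft.rfft(b, size)
        prod = np.fft.irfft(fa * fb, size)        # evaluations -> coefficients
        return [int(round(x)) for x in prod[:n]]  # round away the numerical error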
If you don't want to (or cannot) rely on that, an exact option is the Number Theoretic Transform. It substitutes the roots of unity in the complex plane with roots of unity in a finite field ℤ/pℤ where p is a suitable prime. p has to be large enough for all the necessary roots to exist, and the efficiency is affected by properties of p. If you choose a Fermat prime then the roots of unity have convenient forms and there is a trick to do reduction modulo p more efficiently than usual. That is all exact integer arithmetic and the values stay small, so there is no problem implementing it in a computer.
That technique is used in the Schönhage–Strassen algorithm so you can look up the specifics there.
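For the exact option, here is a minimal sketch of NTT-based multiplication in Python. It uses the common NTT-friendly prime 998244353 = 119 * 2^23 + 1 with primitive root 3 rather than a Fermat prime, but the mechanics are the same as described above; the coefficients come back reduced modulo that prime:

    MOD = 998244353      # 119 * 2**23 + 1, so 2**23-th roots of unity exist mod MOD
    ROOT = 3             # a primitive root modulo MOD

    def ntt(a, invert=False):
        # Iterative Cooley-Tukey transform over Z/MOD, in place; len(a) must be a power of two.
        n = len(a)
        j = 0
        for i in range(1, n):                        # bit-reversal permutation
            bit = n >> 1
            while j & bit:
                j ^= bit
                bit >>= 1
            j |= bit
            if i < j:
                a[i], a[j] = a[j], a[i]
        length = 2
        while length <= n:
            w = pow(ROOT, (MOD - 1) // length, MOD)  # a primitive length-th root of unity
            if invert:
                w = pow(w, MOD - 2, MOD)
            for start in range(0, n, length):
                wk = 1
                for k in range(start, start + length // 2):
                    u = a[k]
                    v = a[k + length // 2] * wk % MOD
                    a[k] = (u + v) % MOD
                    a[k + length // 2] = (u - v) % MOD
                    wk = wk * w % MOD
            length <<= 1
        if invert:
            inv_n = pow(n, MOD - 2, MOD)
            for i in range(n):
                a[i] = a[i] * inv_n % MOD

    def poly_mul_ntt(a, b):
        n = len(a) + len(b) - 1
        size = 1 << (n - 1).bit_length()
        fa = list(a) + [0] * (size - len(a))
        fb = list(b) + [0] * (size - len(b))
        ntt(fa)
        ntt(fb)
        prod = [x * y % MOD for x, y in zip(fa, fb)]
        ntt(prod, invert=True)
        return prod[:n]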
As an undergrad, I studied O(n log n) algorithms for sorting and proofs that we cannot do better for the general case when all we can do is compare 2 numbers. That was for a random access memory model of computation.
I want to know if such theoretical lower bounds exist for functional-style (referentially transparent) programs. Let us assume that each beta reduction counts as one step, and each comparison counts as one step.
So let us assume that you agree with the proof that we cannot do better than O(n*log n) in the general case where the only thing we have is comparison.
So the question is whether we can show that we can do the same without side effects.
One way to do that uses the idea that if you can build an immutable binary search tree in O(n*log n) and then traverse it in order (which can be done in O(n)), then we would have a sorting algorithm.
If we can go through all items and add each one to a balanced immutable (persistent) tree in O(log n), that gives us an O(n*log n) algorithm.
Can we add to a persistent binary tree in O(log n)? Sure, there are immutable variants of balanced binary search trees with O(log n) insert in every reasonable persistent data structure library.
To get an idea why this is possible, imagine a standard balanced binary search tree such as a red-black tree. You can make an immutable version of it by following the same algorithm as for the mutable one, except that whenever pointers or colors change, you allocate a new node and consequently new copies of all of its ancestors up to the root (transforming them too if necessary). Side branches that do not change are reused. There are at most O(log n) affected nodes, so at most O(log n) operations (including allocations) per insertion. If you know red-black trees, you can see that there are no other multipliers here except constants (rotations may add a few extra allocations for affected siblings, but that is still a constant factor).
This - quite informal - demonstration can give you an idea that a proof of O(n*log n) sorting without side effects exists. However, there are a few more things that I left out. E.g., allocation is considered to be O(1) here, which may not always be the case, but that would get too complex.
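To make the path copying concrete, here is a minimal sketch in Python in the style of Okasaki's purely functional red-black insertion (nodes are plain tuples, names are illustrative, duplicates are dropped and deletion is omitted). Each insert allocates only the O(log n) nodes along the search path; everything else is shared with the previous version of the tree:

    RED, BLACK = "R", "B"
    # A node is a tuple (color, left, key, right); the empty tree is None.

    def balance(c, l, k, r):
        # Okasaki's four rebalancing cases: a red child with a red grandchild.
        if c == BLACK:
            if l and l[0] == RED and l[1] and l[1][0] == RED:
                _, (_, a, x, b), y, c2 = l
                return (RED, (BLACK, a, x, b), y, (BLACK, c2, k, r))
            if l and l[0] == RED and l[3] and l[3][0] == RED:
                _, a, x, (_, b, y, c2) = l
                return (RED, (BLACK, a, x, b), y, (BLACK, c2, k, r))
            if r and r[0] == RED and r[1] and r[1][0] == RED:
                _, (_, b, y, c2), z, d = r
                return (RED, (BLACK, l, k, b), y, (BLACK, c2, z, d))
            if r and r[0] == RED and r[3] and r[3][0] == RED:
                _, b, y, (_, c2, z, d) = r
                return (RED, (BLACK, l, k, b), y, (BLACK, c2, z, d))
        return (c, l, k, r)

    def insert(t, key):
        # Returns a new tree; the old tree t is left untouched (persistence).
        def ins(node):
            if node is None:
                return (RED, None, key, None)
            c, l, k, r = node
            if key < k:
                return balance(c, ins(l), k, r)
            if key > k:
                return balance(c, l, k, ins(r))
            return node                      # duplicate key: reuse the existing subtree
        c, l, k, r = ins(t)
        return (BLACK, l, k, r)              # the root is always black

    def tree_sort(xs):
        t = None
        for x in xs:
            t = insert(t, x)                 # n inserts, O(log n) new nodes each
        out = []
        def inorder(node):
            if node is not None:
                _, l, k, r = node
                inorder(l)
                out.append(k)
                inorder(r)
        inorder(t)
        return out                           # sorted, with duplicates removed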
I think modern functional programming implementations (at least Clojure, since that's the only one I know) do have immutable memory, but that doesn't mean that altering lists and the like results in copying the whole original list. As such, I don't believe there are order-of-magnitude computational differences between implementing sorting algorithms with imperative or functional idioms.
More on how immutable lists can be modified without memory copying:
For an example of how this might work, consider this snippet from the Clojure reference at: http://clojure-doc.org/articles/tutorials/introduction.html
...In Clojure, all scalars and core data structures are like this. They
are values. They are immutable.
The map {:name "John" :hit-points 200 :super-power :resourcefulness} is a
value. If you want to "change" John's hit-points, you don't change
anything per se, but rather, you just conjure up a whole new hashmap
value.
But wait: If you've done any imperative style programming in C-like
languages, this sounds crazy wasteful. However, the yin to this
immutability yang is that --- behind the scenes --- Clojure shares
data structures. It keeps track of all their pieces and re-uses them
pervasively. For example, if you have a 1,000,000-item list and want
to tack on one more item, you just tell Clojure, "give me a new one
but with this item added" --- and Clojure dutifully gives you back a
1,000,001-item list in no time flat. Unbeknownst to you it's re-using
the original list.
So why the fuss about immutability?
I don't know the complete history of functional programming, but it seems to me that the immutability characteristic of functional languages essentially abstracts away the intricacies of shared memory.
While this is cool, without a language-managed shared data structure underpinning the whole mechanism, it would be impractically slow for many use cases.
What is meant by "sort in place"?
The idea of an in-place algorithm isn't unique to sorting, but sorting is probably the most important case, or at least the most well-known. The idea is about space efficiency - using the minimum amount of RAM, hard disk or other storage that you can get away with. This was especially relevant going back a few decades, when hardware was much more limited.
The idea is to produce an output in the same memory space that contains the input by successively transforming that data until the output is produced. This avoids the need to use twice the storage - one area for the input and an equal-sized area for the output.
Sorting is a fairly obvious case for this because sorting can be done by repeatedly exchanging items - sorting only re-arranges items. Exchanges aren't the only approach - the Insertion Sort, for example, uses a slightly different approach which is equivalent to doing a run of exchanges but faster.
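For instance, a minimal sketch of insertion sort in Python: it rearranges the input list itself and needs only a constant number of extra variables:

    def insertion_sort(a):
        for i in range(1, len(a)):
            key = a[i]
            j = i - 1
            while j >= 0 and a[j] > key:
                a[j + 1] = a[j]   # shift right instead of a full swap
                j -= 1
            a[j + 1] = key        # drop the saved element into its slot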
Another example is matrix transposition - again, this can be implemented by exchanging items. Adding two very large numbers can also be done in-place (the result replacing one of the inputs) by starting at the least significant digit and propagating carries upwards.
Getting back to sorting, the advantages to re-arranging "in place" get even more obvious when you think of stacks of punched cards - it's preferable to avoid copying punched cards just to sort them.
Some algorithms for sorting allow this style of in-place operation whereas others don't.
However, all algorithms require some additional storage for working variables. If the goal is simply to produce the output by successively modifying the input, it's fairly easy to define algorithms that do that by reserving a huge chunk of memory, using that to produce some auxiliary data structure, then using that to guide those modifications. You're still producing the output by transforming the input "in place", but you're defeating the whole point of the exercise - you're not being space-efficient.
For that reason, the normal definition of an in-place algorithm requires that you achieve some standard of space efficiency. It's absolutely not acceptable to use extra space proportional to the input (that is, O(n) extra space) and still call your algorithm "in-place".
The Wikipedia page on in-place algorithms currently claims that an in-place algorithm can only use a constant amount - O(1) - of extra space.
In computer science, an in-place algorithm (or in Latin in situ) is an algorithm which transforms input using a data structure with a small, constant amount of extra storage space.
There are some technicalities specified in the In Computational Complexity section, but the conclusion is still that e.g. Quicksort requires O(log n) space (true) and therefore is not in-place (which I believe is false).
O(log n) is much smaller than O(n) - for example the base 2 log of 16,777,216 is 24.
Quicksort and heapsort are both normally considered in-place, and heapsort can be implemented with O(1) extra space (I was mistaken about this earlier). Mergesort is more difficult to implement in-place, but the out-of-place version is very cache-friendly - I suspect real-world implementations accept the O(n) space overhead - RAM is cheap but memory bandwidth is a major bottleneck, so trading memory for cache-efficiency and speed is often a good deal.
[EDIT When I wrote the above, I assume I was thinking of in-place merge-sorting of an array. In-place merge-sorting of a linked list is very simple. The key difference is in the merge algorithm - doing a merge of two linked lists with no copying or reallocation is easy, doing the same with two sub-arrays of a larger array (and without O(n) auxiliary storage) AFAIK isn't.]
Quicksort is also cache-efficient, even in-place, but can be disqualified as an in-place algorithm by appealing to its worst-case behaviour. There is a degenerate case (in a non-randomized version, typically when the input is already sorted) where the run-time is O(n^2) rather than the expected O(n log n). In this case the extra space requirement is also increased to O(n). However, for large datasets and with some basic precautions (mainly randomized pivot selection) this worst-case behaviour becomes absurdly unlikely.
My personal view is that O(log n) extra space is acceptable for in-place algorithms - it's not cheating as it doesn't defeat the original point of working in-place.
However, my opinion is of course just my opinion.
One extra note - sometimes, people will call a function in-place simply because it has a single parameter for both the input and the output. It doesn't necessarily follow that the function was space efficient, that the result was produced by transforming the input, or even that the parameter still references the same area of memory. This usage isn't correct (or so the prescriptivists will claim), though it's common enough that it's best to be aware but not get stressed about it.
In-place sorting means sorting without any significant extra space requirement. According to Wikipedia, it says
an in-place algorithm is an algorithm which transforms input using a data structure with a small, constant amount of extra storage space.
Quicksort is one example of In-Place Sorting.
I don't think these terms are closely related:
Sort in place means to sort an existing list by modifying the element order directly within the list. The opposite is leaving the original list as is and create a new list with the elements in order.
Natural ordering is a term that describes how complete objects can somehow be ordered. You can for instance say that 0 is lower than 1 (natural ordering for integers) or that A comes before B in alphabetical order (natural ordering for strings). You can hardly say, though, that Bob is greater or lower than Alice in general, as it heavily depends on specific attributes (alphabetically by name, by age, by income, ...). Therefore there is no natural ordering for people.
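A minimal sketch of the difference in Python (the Person class is purely illustrative): integers have a natural ordering, while people must be sorted by an explicitly chosen key. Note also that sorted() returns a new list, while list.sort() sorts in place:

    from dataclasses import dataclass

    @dataclass
    class Person:
        name: str
        age: int

    print(sorted([3, 1, 2]))                        # natural ordering of integers
    people = [Person("Bob", 40), Person("Alice", 35)]
    by_name = sorted(people, key=lambda p: p.name)  # ordering chosen explicitly
    by_age = sorted(people, key=lambda p: p.age)    # a different, equally valid choice
    people.sort(key=lambda p: p.age)                # same ordering, but sorted in place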
I'm not sure these concepts are similar enough to compare as suggested. Yes, they both involve sorting, but one is about a sort ordering that is humanly understandable (natural) and the other describes an algorithm that is efficient in terms of memory because it overwrites the existing structure instead of using an additional data structure (bubble sort, for example, works this way).
It can be done by using a swap function; instead of making a whole new structure, we implement that algorithm without even knowing its name :D
I was curious to know how to select a sorting algorithm based on the input, so that I can get the best efficiency.
Should it be based on the size of the input, or how the input is arranged (ascending/descending), or the data structure used, etc.?
What is important in algorithms generally, and in sorting algorithms as well, is the following:
(*) Correctness - This is the most important thing. It is worth nothing if your algorithm is super fast and efficient, but wrong. In sorting, even if you have 2 candidates that sort correctly, but you need a stable sort - you will choose the stable sort algorithm, even if it is less efficient - because it is correct for your purpose, and the other is not.
Next are basically trade-offs between running time, needed space and implementation time (if you need to implement something from scratch rather than use a library, for a minor performance enhancement - it probably isn't worth it).
Some things to take into consideration when thinking about the trade off mentioned above:
Size of the input (for example: for small inputs, insertion sort is empirically faster than more advanced algorithms, even though it takes O(n^2) time).
Location of the input (sorting algorithms on disk are different from algorithms on RAM, because disk reads are much less efficient when not sequential. The algorithm which is usually used to sort on disk is a variation of merge-sort).
How is the data distributed? If the data is likely to be "almost sorted" - maybe a usually terrible bubble sort can sort it in just 2-3 passes and be super fast compared to other algorithms (see the sketch after this list).
What libraries do you have already implemented? How much work will it take to implement something new? Will it be worth it?
Type (and range) of the input - for enumerable data (integers, for example) - an integer-oriented algorithm (like radix sort) might be more efficient than a general-purpose algorithm.
Latency requirements - if you are designing a missile head and the result must be returned within a specific amount of time, quicksort, which might decay to quadratic running time in the worst case, might not be a good choice, and you might want to use a different algorithm that has a strict O(n*log n) worst case instead.
Your hardware - if, for example, you are using a huge cluster and huge data, a distributed sorting algorithm will probably be better than trying to do all the work on one machine.
It should be based on all those things.
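To illustrate the "almost sorted" point from the list above, a minimal sketch of bubble sort with an early-exit flag; on nearly sorted input it stops after a couple of passes, i.e. in roughly O(n) time:

    def bubble_sort_adaptive(a):
        n = len(a)
        for _ in range(n - 1):
            swapped = False
            for i in range(n - 1):
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
                    swapped = True
            if not swapped:       # a full pass with no swaps: already sorted, stop early
                return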
You need to take into account the size of your data, as insertion sort can be faster than quicksort for small data sets, etc.
You need to know the arrangement of your data due to the differing worst/average/best case asymptotic runtimes of each algorithm (some have identical worst and average cases, whereas others have a significantly worse worst case than average case).
And you obviously need to know the data structure used, as there are some very specialized sorting algorithms for data that is already in a special format, or you may be able to put the data efficiently into a new data structure that will automatically do the sorting for you (à la BSTs or heaps).
The 2 main things that determine your choice of a sorting algorithm are time complexity and space complexity. Depending on your scenario, and the resources (time and memory) available to you, you might need to choose between sorting algorithms, based on what each sorting algorithm has to offer.
The actual performance of a sorting algorithm depends on the input data too, and it helps if we know certain characteristics of the input data beforehand, like the size of input, how sorted the array already is.
For example,
If you know beforehand that the input data consists of non-negative integers bounded by, say, 1000, you can very well use counting sort to sort such an array in linear time.
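A minimal sketch of counting sort in Python, assuming non-negative integers with a known upper bound (here 1000); it runs in O(n + k) time, where k is the size of the value range:

    def counting_sort(a, max_value=1000):
        counts = [0] * (max_value + 1)
        for x in a:
            counts[x] += 1        # tally each value
        out = []
        for value, count in enumerate(counts):
            out.extend([value] * count)
        return out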
The choice of a sorting algorithm depends on the constraints of space and time, and also the size/characteristics of the input data.
At a very high level you need to consider the ratio of data moves (insertions) to comparisons for each algorithm.
For integers in a file, this isn't going to be hugely relevant, but if, say, you're sorting files based on their contents, you'll naturally want to do as few comparisons as possible.