Number of ways of counting nothing is one? - algorithm

This is something that I routinely err in while solving problems. How do we decide the value of a recursive function when its argument is at the lowest extreme? An example will help:
Given n, find the number of ways to tile a 3xN grid using 2x1 blocks only. Rotation of blocks is allowed.
The DP solution is easily found as
f(n): the number of ways of tiling a 3xN grid
g(n): the number of ways of tiling a 3xN grid with a 1x1 block cut off at the rightmost column
f(n) = f(n-2) + 2*g(n-1)
g(n) = f(n-1) + g(n-2)
I initially thought that the base cases would be f(0)=0, g(0)=0, f(1)=0, g(1)=1. However, this yields a wrong answer. I then read somewhere that f(0)=1 and reasoned it out as
The number of ways of tiling a 3x0 grid is one, because there is exactly one way to use no tiles (2x1 blocks) at all.
My question is, by that logic, shouldn't g(0) also be one? But in the correct solution, g(0)=0. In general, when can we say that the number of ways of using nothing is one?

About your specific question of tiling, think this way:
How many ways are there to "tile a 3*0 grid"?
I would say: Just one way, don't do anything! and you can't "do nothing" any other way. (f(0) = 1)
How many ways are there to "tile a 3*0 grid, cutting that specific block off"?
I would say: Hey! That's impossible! You can't cut the specific block off since there is nothing. So, there's no way one can solve the task anyhow. (g(0) = 0)
Now, let's get to the general case:
There's no "general" rule about zero cases.
Depending on your problem, you may be able to somehow "interpret" the situation, and find the reasonable value. Most of the time (depending on your definition of "ways") the number of ways of doing "nothing" is 1, and the number of ways of doing something impossible is 0!
Warning! Being able to somehow "interpret" the zero case is not enough for the relation to be correct! You should recheck your recursive relation (i.e. the way you get the n-th value from the previous ones) to be applicable for the zero-to-one case as well, since most of the time this would be a "tricky" case.
You may find it easier to base your recursive relation on some non-zero case, if you find the zero-case being tricky, or confusing.

The way I see it, g(0) is invalid, since there is no way to cut a 1x1 block out of a 3x0 grid.
Invalid values are typically represented as 0, -∞ or ∞, but this largely depends on the problem. The number of ways to place something would make sense to be 0 to represent invalid values (otherwise your counts will be off). When working with min, you'd generally use ∞. When working with max, you'd generally use -∞ (or possibly 0).
Generally, the number of ways to place 0 objects or objects in a 0-sized space makes sense to be 1 (i.e. placing no objects) (so f(0) = 1). In a lot of other cases valid values would be 0.
But these are far from rules (and avoid blindly following rules, because you'll get hurt with exceptions); the best advice I can give - when in doubt, throw a few values in and see what happens.
In this case you can easily determine what the actual values for g(1), f(1), g(2) and f(2) should be, and use these to calculate g(0) and f(0):
g(1) = 1
f(1) = 0
g(2): (all invalid, since ? is not populated)
|X |X ?X
|? || --
-- ?| --
g(2) = 0, thus g(0) = 0 - f(1) = 0 - 0 = 0
f(2):
|| -- --
|| -- ||
-- -- ||
f(2) = 3, thus f(0) = 3 - 2*g(1) = 3 - 2 = 1
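To see the base cases at work, here is a minimal Python sketch of the two recurrences from the question. The function name count_tilings is just an illustrative choice; the base values f(0)=1, g(0)=0, f(1)=0, g(1)=1 are the ones argued for above.

```python
def count_tilings(n):
    """Return (f(n), g(n)) for a 3 x n grid, using the recurrences above."""
    # Base cases: the empty grid has exactly one tiling (place nothing),
    # but a 1x1 block cannot be cut from a grid with no columns, so g(0) = 0.
    f = [1, 0]  # f(0), f(1)
    g = [0, 1]  # g(0), g(1)
    for i in range(2, n + 1):
        f.append(f[i - 2] + 2 * g[i - 1])
        g.append(f[i - 1] + g[i - 2])
    return f[n], g[n]


for n in range(0, 9, 2):
    print(n, count_tilings(n)[0])
```

For even n from 0 to 8 this prints the counts 1, 3, 11, 41, 153; with f(0)=0 instead, every f value would collapse to a wrong answer, which is the error described in the question.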

Related

Is Recursion W/Memoization In Staircase Problem Bottom-Up?

Considering the classical staircase problem as "Davis has a number of staircases in his house and he likes to climb each staircase 1, 2, or 3 steps at a time. Being a very precocious child, he wonders how many ways there are to reach the top of the staircase."
My approach is to use memoization with recursion as
# TimeO(N), SpaceO(N), DP Bottom Up + Memoization
def stepPerms(n, memo = {}):
    if n < 3:
        return n
    elif n == 3:
        return 4
    if n in memo:
        return memo[n]
    else:
        memo[n] = stepPerms(n - 1, memo) + stepPerms(n - 2, memo) + stepPerms(n - 3, memo)
        return memo[n]
The question that comes to my mind is whether this solution is bottom-up or top-down. My reasoning is that, since we go all the way down to calculate the upper N values (imagine the recursion tree), I consider this bottom-up. Is this correct?
Recursive strategies are, as a general rule, top-down approaches, whether they have memory or not. The underlying algorithm design is dynamic programming, which is traditionally built in a bottom-up fashion.
I noticed that you wrote your code in Python, and Python is generally not happy about deep recursion (small amounts are okay, but performance quickly takes a hit, and there is a default maximum recursion depth of 1000, unless it was changed since I read that).
If we make a bottom-up dynamic programming version, we can get rid of this recursion, and we can also recognise that we only need a constant amount of space, since we are only really interested in the last 3 values:
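def stepPerms(n):
    if n < 1: return n
    memo = [1, 2, 4]
    if n <= 3: return memo[n-1]
    for i in range(3, n):
        # overwrite the oldest of the last three values with the new one
        memo[i % 3] = sum(memo)
    # the value for n lives at index n-1, wrapped into the 3-slot buffer
    return memo[(n-1) % 3]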
Notice how much simpler the logic is, apart from i being one less than the step count, since positions start at 0 while the count starts at 1.
In the top-down approach, the complex module is divided into submodules, so your solution is a top-down approach. The bottom-up approach, on the other hand, begins with elementary modules and then combines them further.
And the bottom-up approach for this solution would be:
memo = {}
for i in range(0, 3):
    memo[i] = i
memo[3] = 4
for i in range(4, n + 1):
    memo[i] = memo[i-1] + memo[i-2] + memo[i-3]
# memo[n] now holds the answer
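To make the distinction concrete, here is a small side-by-side sketch. The names steps_top_down and steps_bottom_up are made up for this illustration, and functools.lru_cache stands in for the hand-written memo dictionary; both use the same base cases as the question.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def steps_top_down(n):
    """Top-down: start from n and recurse toward the base cases."""
    if n < 3:
        return n
    if n == 3:
        return 4
    return steps_top_down(n - 1) + steps_top_down(n - 2) + steps_top_down(n - 3)


def steps_bottom_up(n):
    """Bottom-up: start from the base cases and build up to n."""
    if n < 3:
        return n
    a, b, c = 1, 2, 4  # ways to climb 1, 2 and 3 steps
    for _ in range(3, n):
        a, b, c = b, c, a + b + c
    return c


# Both directions compute the same values; only the order of evaluation differs.
assert all(steps_top_down(n) == steps_bottom_up(n) for n in range(1, 30))
```

The memoized version still begins at n and works downward on demand, which is what makes it top-down even though the base cases are the first values to be finalized.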

safe array partition based on some criteria

I am trying to solve this problem. the problem can be summarized as:
Given a sequence of integers find no of safe partitions, where safe partitions are defined as:
A safe partition is a partition into subsequences S1,S2,…,SK such that for each valid i, min(Si)≤|Si|≤max(Si)— that is, for each subsequence in this partition, its length is greater or equal to its smallest element and smaller or equal to its largest element.
Ex:
Input => 1 6 2 3 4 3 4
Output => 6 partitions
[1],[6,2,3,4,3,4]
[1,6,2],[3,4,3,4]
[1,6,2,3],[4,3,4]
[1],[6,2],[3,4,3,4]
[1],[6,2,3],[4,3,4]
[1,6],[2,3],[4,3,4]
I can probably find the solution, including the code, somewhere on the internet, but I am more interested in finding out the approach to solving this problem, so I am asking here which points I am missing in my observations.
These are the things that pop in my mind when I read this problem:
If an element at index i extends a sequence safely, it's quite possible that it could also be the start of a new sequence. So at every element I am left with two choices: whether it extends the sequence or not.
So I think it can be represented mathematically as:
P(0..N)=1+P(i..N)+P(i+1..N), if A[i] is safe to extend the current partition
P(0..N)=1+P(i..N), if A[i] can't be used to extend
where P is the partition function.
Is this reasoning valid? Am I missing something?
[I'm having trouble giving a direction without actually giving the solution, because once a person thinks in the right direction then the solution becomes evident. I'll try to highlight some facts which may put a person on the right track.]
Explicitly enumerating safe partitions is problematic, since there are O(2^n) safe partitions. For example in:
1,N,1,N,1,N ... [N elements]
For this sequence, any subsequence of length > 1, as well as the subsequence [1], matches the criteria. The number of safe partitions for such a sequence of length n=2k is 3^(k-1). To prove that, look at the following
Base k = 1: f(1) = f(2) = 1
Step assumption: f(2k) = 3^(k-1).
f(2k+1) = f(2k+2)
= (f(2k) + f(2k-1)) + (f(2k-2) + f(2k-3)) + ... + f(1) + 1
= 2*(f(2k) + f(2k-2) + ... + f(2)) + 1
= 2 * (3^(k-1) + 3^(k-2) + ... + 1) + 1
= 2 * (3^k - 1) / 2 + 1
= 3^k
Since enumeration is out of the question, for any reasonable performance, the solution must somehow count without iterating. Since the proof that 1,N,...,1,N has 3^(k-1) safe partitions did not have to explicitly enumerate all sequences, its principles can be generalized to any sequence.
NOTES:
I have solved similar problems before, so the direction was clear to me. For this question I tried to break my thoughts into something manageable and came up with the thought about complexity. I had a very strong feeling that this is exponential even before writing it down and trying to prove it. This comes from experience and from seeing other problems. The complexity function felt worse than Fibonacci because adding an element to a sequence seemed to be adding at least two elements of smaller sizes (similar to the Fibonacci sequence). Since Fibonacci is exponential, the 1,...,1 partitioning must be exponential too. From there I went on and analyzed it with a recurrence relation.
The exact way I reached the solution matches my way of thought. Everybody has a different way of thought that works for them, and they need to develop and find it.
This is how I came to suspect that the number of safe sequences in the example was 3^(k-1):
I recursively calculated f(2k), with base condition f(1)=f(2)=1. Then for 3:
[1,N,1]
[1],[N,1]
[1,N],[1]
And for 4:
[1,N,1,N]
[1],[N,1,N]
[1,N],[1,N]
Meaning f(3)=f(4)=3. Then I recursively applied
f(2k+2)=2*(f(2k) + f(2k-2) + .. + f(2)) + 1
resulting in f(2)=1, f(4)=3, f(6)=9, f(8)=27. This looks suspiciously like 3^(k-1). Then I simply had to prove that with induction.
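For anyone who wants to check these counts on small inputs, here is a quadratic-time counting sketch in Python. It is only a verification aid under my own naming (count_safe_partitions), not necessarily the efficient method the original problem expects. It counts without listing partitions: ways[i] is the number of safe partitions of the first i elements, and the empty prefix counts as one.

```python
def count_safe_partitions(seq):
    """Count partitions of seq into contiguous safe pieces in O(n^2) time."""
    n = len(seq)
    ways = [0] * (n + 1)
    ways[0] = 1  # one way to partition the empty prefix: use no pieces
    for end in range(1, n + 1):
        lo = hi = seq[end - 1]
        # Grow the last piece leftwards, tracking its min and max.
        for start in range(end - 1, -1, -1):
            lo = min(lo, seq[start])
            hi = max(hi, seq[start])
            if lo <= end - start <= hi:  # min <= length <= max
                ways[end] += ways[start]
    return ways[n]


print(count_safe_partitions([1, 6, 2, 3, 4, 3, 4]))
print([count_safe_partitions([1, 2 * k] * k) for k in range(1, 6)])
```

The first call reproduces the 6 partitions from the question, and the second gives 1, 3, 9, 27, 81 for the alternating sequences of length 2, 4, 6, 8, 10.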

Count ways to take atleast one stick

There are N sticks placed in a straight line. Bob is planning to take a few of these sticks. But whatever number of sticks he takes, he will never take two successive sticks (i.e. if he takes stick i, he will not take sticks i-1 and i+1).
So given N, we need to calculate how many different sets of sticks he could select. He needs to take at least one stick.
Example : Let N=3 then answer is 4.
The 4 sets are: (1, 3), (1), (2), and (3)
The main problem is that I want a solution better than simple recursion. Can there be any formula for it? I am not able to crack it.
It's almost identical to Fibonacci. The final solution is actually fibonacci(N+2)-1 (with fibonacci(1) = fibonacci(2) = 1), but let's explain it in terms of actual sticks.
To begin with, we disregard the fact that he needs to pick up at least 1 stick. The solution in this case looks as follows:
If N = 0, there is 1 solution (the solution where he picks up 0 sticks)
If N = 1, there are 2 solutions (pick up the stick, or don't)
Otherwise he can choose to either
pick up the first stick and recurse on N-2 (since the second stick needs to be discarded), or
leave the first stick and recurse on N-1
After this computation is finished, we remove 1 from the result to avoid counting the case where he picks up 0 sticks in total.
Final solution in pseudo code:
int numSticks(int N) {
    return N == 0 ? 1
         : N == 1 ? 2
         : numSticks(N-2) + numSticks(N-1);
}
solution = numSticks(X) - 1;
As you can see numSticks is actually Fibonacci, which can be solved efficiently using for instance memoization.
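As a rough sketch of the efficient version, the same recurrence can be run iteratively in Python with two running values instead of recursion. The function name num_stick_sets is just illustrative.

```python
def num_stick_sets(n):
    """Non-empty selections from n sticks in a row with no two adjacent."""
    # a and b are the counts (empty selection included) for the current
    # and the next stick count, starting from 0 and 1 sticks.
    a, b = 1, 2
    for _ in range(n):
        a, b = b, a + b
    return a - 1  # drop the selection where he takes no stick at all


print([num_stick_sets(n) for n in range(6)])
```

This runs in O(N) time and O(1) space; for N=3 it gives 4, matching the example in the question.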
Let the number of sticks taken by Bob be r.
The problem has a bijection to the number of binary vectors with exactly r 1's, and no two adjacent 1's.
This is solvable by first placing the r 1's, which leaves exactly n-r 0's to place between them and at the sides. However, you must place r-1 0's between the 1's (one in each gap), so you are left with exactly n-r-(r-1) = n-2r+1 "free" 0's.
The number of ways to arrange such vectors is now given as:
(1) = Choose(n-2r+1 + (r+1) -1 , n-2r+1) = Choose(n-r+1, n-2r+1)
Formula (1) derives from the number of ways of choosing n-2r+1 elements from r+1 distinct possibilities with replacement.
Since we solved it for a specific value of r, and you are interested in all r>=1, you need to sum over each 1<=r<=n.
So, the solution of the problem is given by the formula:
(2) = Sum{ Choose(n-r+1, n-2r+1) | for each 1<=r<=n }
Disclaimer: a close variant of the problem with fixed r was given as HW in the course I am TAing this semester; the main difference is the need to sum over the various values of r.
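Formula (2) is easy to evaluate directly. The sketch below uses Python's math.comb (available from Python 3.8) and the identity Choose(n-r+1, n-2r+1) = Choose(n-r+1, r), which avoids negative arguments; the function name is my own.

```python
from math import comb


def stick_sets_by_size(n):
    """Sum over r of the number of ways to pick r non-adjacent sticks from n."""
    # comb(m, r) is 0 whenever r > m, so sizes that cannot fit contribute nothing.
    return sum(comb(n - r + 1, r) for r in range(1, n + 1))


print([stick_sets_by_size(n) for n in range(1, 6)])
```

For n=3 the terms are Choose(3,1) + Choose(2,2) = 3 + 1 = 4, the same answer as in the question.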

How to test if one set of (unique) integers belongs to another set, efficiently?

I'm writing a program where I'm having to test if one set of unique integers A belongs to another set of unique numbers B. However, this operation might be done several hundred times per second, so I'm looking for an efficient algorithm to do it.
For example, if A = [1 2 3] and B = [1 2 3 4], it is true, but if B = [1 2 4 5 6], it's false.
I'm not sure how efficient it is to just sort and compare, so I'm wondering if there are any more efficient algorithms.
One idea I came up with was to map each number n to the corresponding n'th prime: that is 1 -> 2, 2 -> 3, 3 -> 5, 4 -> 7 etc. Then I could calculate the product for A, and if that product is a factor of the corresponding product for B, we could say with certainty that A is a subset of B. For example, if A = [1 2 3], B = [1 2 3 4], the primes are [2 3 5] and [2 3 5 7] and the products are 2*3*5=30 and 2*3*5*7=210. Since 210%30=0, A is a subset of B. I'm expecting the largest integer to be a couple of million at most, so I think it's doable.
Are there any more efficient algorithms?
The asymptotically fastest approach would be to just put each set in a hash table and query each element, which is O(N) time. You cannot do better (since it will take that much time to read the data).
Most set data structures already support expected and/or amortized O(1) query time. Some languages even support this operation directly. For example, in Python you could just do the following (use A < B instead if you want a proper subset, which excludes A == B):
A <= B
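A quick illustration with the sets from the question, in case the operators are unfamiliar. Python's built-in set type treats <= as "is a subset of" and < as "is a proper subset of"; the variable names are just for this example.

```python
A = {1, 2, 3}
B = {1, 2, 3, 4}
C = {1, 2, 4, 5, 6}

print(A <= B)         # subset test against B
print(A.issubset(C))  # same test in method form, against C
print(A <= A, A < A)  # a set is a subset of itself, but not a proper subset
```

This prints True, then False, then True False.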
Of course the picture changes drastically depending on what you mean by "this operation is repeated". If you have the ability to do precalculations on the data as you add it to the set (which presumably you have the ability to do so), this will allow you to subsume the minimal O(N) time into other operations such as constructing the set. But we can't advise without knowing much more.
Assuming you had full control of the set datastructure, your approach to keep a running product (whenever you add an element, you do a single O(1) multiplication) is a very good idea IF there exists a divisibility test that is faster than O(N)... in fact your solution is really smart, because we can just do a single ALU division and hope we're within float tolerance. Do note however this will only allow you roughly a speedup factor of 20x max I think, since 21! > 2^64. There might be tricks to play with congruence-modulo-an-integer, but I can't think of any. I have a slight hunch though that there is no divisibility test that is faster than O(#primes), though I'd like to be proved wrong!
If you are doing this repeatedly on duplicates, you may benefit from caching, depending on what exactly you are doing; give each set a unique ID (though since this makes updates hard, you may ironically wish to do something exactly like your scheme to make fingerprints, but mod max_int_size with collision detection). To manage memory, you can pin extremely expensive set comparisons (e.g. checking if a giant set is part of itself) into the cache, while otherwise using a most-recent policy if you run into memory issues. The nice thing about this is that it synergizes with an element-by-element rejection test. That is, you will be throwing out sets quickly if they don't have many overlapping elements, but if they have many overlapping elements the calculations will take a long time, and if you repeat these calculations, caching could come in handy.
Let A and B be two sets, and you want to check if A is a subset of B. The first idea that pops into my mind is to sort both sets and then simply check if every element of A is contained in B, as follows:
Let n_A and n_B be the cardinality of A and B, respectively. Let i_A = 1, i_B = 1. Then the following algorithm (that is O(n_A + n_B)) will solve the problem:
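// A and B assumed to be sorted (1-based indexing)
i_A = 1;
i_B = 1;
n_A = size(A);
n_B = size(B);
if (n_A > 0 && n_B == 0) return false;  // nothing in B to match against
while (i_A <= n_A) {
    // skip the elements of B that are smaller than the current element of A
    while (A[i_A] > B[i_B]) {
        i_B++;
        if (i_B > n_B) return false;
    }
    if (A[i_A] != B[i_B]) return false;
    i_A++;
}
return true;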
The same thing, but in a more functional, recursive way (some will find the previous easier to understand, others might find this one easier to understand):
// A and B assumed to be sorted
function subset(A, B)
    n_A = size(A)
    n_B = size(B)
    function subset0(i_A, i_B)
        if (i_A > n_A) true
        else if (i_B > n_B) false
        else
            if (A[i_A] <= B[i_B]) return (A[i_A] == B[i_B]) && subset0(i_A + 1, i_B + 1);
            else return subset0(i_A, i_B + 1);
    subset0(1, 1)
In this last example, notice that subset0 is tail recursive: if (A[i_A] == B[i_B]) is false then there will be no recursive call; otherwise, if (A[i_A] == B[i_B]) is true, then there's no need to keep this information, since the result of true && subset0(...) is exactly the same as subset0(...). So, any smart compiler will be able to transform this into a loop, avoiding stack overflows or any performance hits caused by function calls.
This will certainly work, but we might be able to optimize it a lot in the average case if you have and provide more information about your sets, such as the probability distribution of the values in the sets, if you somehow expect the answer to be biased (ie, it will more often be true, or more often be false), etc.
Also, have you already written any code to actually measure its performance? Or are you trying to pre-optimize?
You should start by writing the simplest and most straightforward solution that works, and measure its performance. If it's not already satisfactory, only then you should start trying to optimize it.
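For measuring, a runnable Python version of the sorted two-pointer check might look like the sketch below. It is 0-indexed, unlike the pseudocode, and the function name is_subset_sorted is just my choice; it assumes both lists are sorted and contain unique values.

```python
def is_subset_sorted(a, b):
    """Return True if every element of sorted list a occurs in sorted list b."""
    j = 0
    for x in a:
        # Skip the elements of b that are too small to match x.
        while j < len(b) and b[j] < x:
            j += 1
        if j == len(b) or b[j] != x:
            return False
        j += 1  # values are unique, so b[j] cannot be needed again
    return True


print(is_subset_sorted([1, 2, 3], [1, 2, 3, 4]))
print(is_subset_sorted([1, 2, 3], [1, 2, 4, 5, 6]))
```

Each index only moves forward, so one call costs O(n_A + n_B) once the inputs are sorted.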
I'll present an O(m+n) time-per-test algorithm. But first, two notes regarding the problem statement:
Note 1 - Your edits say that set sizes may be a few thousand, and numbers may range up to a million or two.
In the following, let m, n denote the sizes of sets A, B and let R denote the size of the largest numbers allowed in sets.
Note 2 - The multiplication method you proposed is quite inefficient. Although it uses O(m+n) multiplies, it is not an O(m+n) method because the product lengths are worse than O(m) and O(n), so it would take more than O(m^2 + n^2) time, which is worse than the O(m ln(m) + n ln(n)) time required for sorting-based methods, which in turn is worse than the O(m+n) time of the following method.
For the presentation below, I suppose that sets A, B can completely change between tests, which you say can occur several hundred times per second. If there are partial changes, and you know which p elements change in A from one test to next, and which q change in B, then the method can be revised to run in O(p+q) time per test.
Step 0. (Performed one time only, at outset.) Clear an array F, containing R bits or bytes, as you prefer.
Step 1. (Initial step of per-test code.) For i from 0 to n-1, set F[B[i]], where B[i] denotes the i'th element of set B. This is O(n).
Step 2. For i from 0 to m-1, { test F[A[i]]. If it is clear, report that A is not a subset of B, and go to step 4; else continue }. This is O(m).
Step 3. Report that A is a subset of B.
Step 4. (Clear used bits) For i from 0 to n-1, clear F[B[i]]. This is O(n).
The initial step (clearing array F) is O(R) but steps 1-4 amount to O(m+n) time.
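Steps 0 to 4 can be sketched in Python roughly as follows. The class name SubsetTester and its max_value parameter are hypothetical, chosen only for this illustration; a bytearray plays the role of F, with one byte per possible value.

```python
class SubsetTester:
    """Reusable flag array for repeated subset tests on small integers."""

    def __init__(self, max_value):
        # Step 0: allocate and clear F once, O(R).
        self.flags = bytearray(max_value + 1)

    def is_subset(self, a, b):
        flags = self.flags
        for x in b:          # Step 1: mark the members of B, O(n)
            flags[x] = 1
        # Steps 2-3: A is a subset iff every member of A is marked, O(m)
        result = all(flags[x] for x in a)
        for x in b:          # Step 4: clear only the bits that were set, O(n)
            flags[x] = 0
        return result


tester = SubsetTester(2_000_000)
print(tester.is_subset([1, 2, 3], [1, 2, 3, 4]))
print(tester.is_subset([1, 2, 3], [1, 2, 4, 5, 6]))
```

Because step 4 clears exactly the entries that step 1 set, the array is clean again after every call and the O(R) setup cost is paid only once.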
Given the limit on the size of the integers, if the set of B sets is small and changes seldom, consider representing the B sets as bitsets (bit arrays indexed by integer set member). This doesn't require sorting, and the test for each element is very fast.
If the A members are sorted and tend to be clustered together, then get another speedup by testing all the element in one word of the bitset at a time.
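As a small sketch of the bitset idea, Python's arbitrary-precision integers can serve as bit arrays, so the whole subset test becomes one AND over machine words. The helper names to_bitset and is_subset_bits are illustrative assumptions; a dedicated bit-array type would do the same job.

```python
def to_bitset(values):
    """Pack a collection of non-negative integers into one int, one bit each."""
    bits = 0
    for v in values:
        bits |= 1 << v
    return bits


def is_subset_bits(a_bits, b_bits):
    """A is a subset of B iff A has no bit set that B lacks."""
    return (a_bits & ~b_bits) == 0


b_bits = to_bitset([1, 2, 3, 4])  # built once, reused while B stays the same
print(is_subset_bits(to_bitset([1, 2, 3]), b_bits))
print(is_subset_bits(to_bitset([1, 2, 5]), b_bits))
```

This pays off mainly when the B bitsets are built once and reused, as the answer suggests, since building a bitset is itself linear in the set size.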

sorting algorithm where pairwise-comparison can return more information than -1, 0, +1

Most sort algorithms rely on a pairwise comparison that determines whether A < B, A = B or A > B.
I'm looking for algorithms (and for bonus points, code in Python) that take advantage of a pairwise-comparison function that can distinguish a lot less from a little less or a lot more from a little more. So perhaps instead of returning {-1, 0, 1} the comparison function returns {-2, -1, 0, 1, 2} or {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5} or even a real number on the interval (-1, 1).
For some applications (such as near sorting or approximate sorting) this would enable a reasonable sort to be determined with fewer comparisons.
The extra information can indeed be used to minimize the total number of comparisons. Calls to the super_comparison function can be used to make deductions equivalent to a great number of calls to a regular comparison function. For example, a much-less-than b and c little-less-than b implies a < c < b.
The deductions can be organized into bins or partitions which can each be sorted separately. Effectively, this is equivalent to QuickSort with n-way partition. Here's an implementation in Python:
from collections import defaultdict
from random import choice

def quicksort(seq, compare):
    'Stable in-place sort using a 3-or-more-way comparison function'
    # Make an n-way partition on a random pivot value
    segments = defaultdict(list)
    pivot = choice(seq)
    for x in seq:
        ranking = 0 if x is pivot else compare(x, pivot)
        segments[ranking].append(x)
    seq.clear()
    # Recursively sort each segment and store it in the sequence
    for ranking, segment in sorted(segments.items()):
        if ranking and len(segment) > 1:
            quicksort(segment, compare)
        seq += segment

if __name__ == '__main__':
    from random import randrange
    from math import log10

    def super_compare(a, b):
        'Compare with extra logarithmic near/far information'
        c = -1 if a < b else 1 if a > b else 0
        return c * (int(log10(max(abs(a - b), 1.0))) + 1)

    n = 10000
    data = [randrange(4*n) for i in range(n)]
    goal = sorted(data)
    quicksort(data, super_compare)
    print(data == goal)
By instrumenting this code with the trace module, it is possible to measure the performance gain. In the above code, a regular three-way compare uses 133,000 comparisons while a super comparison function reduces the number of calls to 85,000.
The code also makes it easy to experiment with a variety of comparison functions. This will show that naïve n-way comparison functions do very little to help the sort. For example, if the comparison function returns +/-2 for differences greater than four and +/-1 for differences four or less, there is only a modest 5% reduction in the number of comparisons. The root cause is that the coarse-grained partitions used in the beginning only have a handful of "near matches" and everything else falls in "far matches".
An improvement to the super comparison is to cover logarithmic ranges (i.e. +/-1 if within ten, +/-2 if within a hundred, +/-3 if within a thousand).
An ideal comparison function would be adaptive. For any given sequence size, the comparison function should strive to subdivide the sequence into partitions of roughly equal size. Information theory tells us that this will maximize the number of bits of information per comparison.
The adaptive approach makes good intuitive sense as well. People should first be partitioned into love vs like before making more refined distinctions such as love-a-lot vs love-a-little. Further partitioning passes should each make finer and finer distinctions.
You can use a modified quick sort. Let me explain with an example where your comparison function returns [-2, -1, 0, 1, 2]. Say you have an array A to sort.
Create 5 empty arrays - Aminus2, Aminus1, A0, Aplus1, Aplus2.
Pick an arbitrary element of A, X.
For each element of the array, compare it with X.
Depending on the result, place the element in one of the Aminus2, Aminus1, A0, Aplus1, Aplus2 arrays.
Apply the same sort recursively to Aminus2, Aminus1, Aplus1, Aplus2 (note: you don't need to sort A0, as all the elements there are equal to X).
Concatenate the arrays to get the final result: A = Aminus2 + Aminus1 + A0 + Aplus1 + Aplus2.
It seems like using raindog's modified quicksort would let you stream out results sooner and perhaps page into them faster.
Maybe those features are already available from a carefully-controlled qsort operation? I haven't thought much about it.
This also sounds kind of like radix sort except instead of looking at each digit (or other kind of bucket rule), you're making up buckets from the rich comparisons. I have a hard time thinking of a case where rich comparisons are available but digits (or something like them) aren't.
I can't think of any situation in which this would be really useful. Even if I could, I suspect the added CPU cycles needed to sort fuzzy values would be more than those "extra comparisons" you allude to. But I'll still offer a suggestion.
Consider this possibility (all strings use the 27 characters a-z and _):
            11111111112
   12345678901234567890
1/ now_is_the_time
2/ now_is_never
3/ now_we_have_to_go
4/ aaa
5/ ___
Obviously strings 1 and 2 are more similar than 1 and 3, and much more similar than 1 and 4.
One approach is to scale the difference value for each identical character position and use the first different character to set the last position.
Putting aside signs for the moment, comparing string 1 with 2, they differ in position 8 by 'n' - 't'. That's a difference of 6. In order to turn that into a single digit 1-9, we use the formula:
digit = ceiling(9 * abs(diff) / 26)
since the maximum difference is 26. The minimum difference of 1 becomes the digit 1. The maximum difference of 26 becomes the digit 9. Our difference of 6 becomes 3.
And because the difference is in position 8, our comparison function will return 3x10^-8 (actually it will return the negative of that, since string 1 comes after string 2).
Using a similar process for strings 1 and 4, the comparison function returns -5x10^-1. The highest possible return (strings 4 and 5) has a difference in position 1 of '_' - 'a' (26), which generates the digit 9 and hence gives us 9x10^-1.
Take these suggestions and use them as you see fit. I'd be interested in knowing how your fuzzy comparison code ends up working out.
Considering you are looking to order a number of items based on human comparison, you might want to approach this problem like a sports tournament. You might allow each human vote to increase the score of the winner by 3 and decrease the loser's by 3 (or +2 and -2, or +1 and -1), or just 0 and 0 for a draw.
Then you just do a regular sort based on the scores.
Another alternative would be a single or double elimination tournament structure.
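A minimal Python sketch of the score-based variant, assuming each vote arrives as a (winner, loser, margin) tuple with margin between 0 and 3. That tuple format and the name rank_by_votes are my own assumptions, not something from the question.

```python
from collections import defaultdict


def rank_by_votes(votes):
    """Rank items by total score; each vote is (winner, loser, margin)."""
    scores = defaultdict(int)
    for winner, loser, margin in votes:
        scores[winner] += margin  # a margin of 0 records a draw
        scores[loser] -= margin
    # A regular sort on the accumulated scores, best first.
    return sorted(scores, key=scores.get, reverse=True)


votes = [("a", "b", 3), ("b", "c", 1), ("a", "c", 2)]
print(rank_by_votes(votes))
```

Here a ends with 5 points, b with -2 and c with -3, so the ranking is a, b, c. Items with equal scores keep their first-seen order because Python's sort is stable.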
You can use two comparisons to achieve this. Multiply the more important comparison by 2, and add them together.
Here is an example of what I mean in Perl.
It compares two array references by the first element, then by the second element.
use strict;
use warnings;
use 5.010;
my @array = (
    [a => 2],
    [b => 1],
    [a => 1],
    [c => 0]
);
say "$_->[0] => $_->[1]" for sort {
    ($a->[0] cmp $b->[0]) * 2 +
    ($a->[1] <=> $b->[1]);
} @array;
a => 1
a => 2
b => 1
c => 0
You could extend this to any number of comparisons very easily.
Perhaps there's a good reason to do this but I don't think it beats the alternatives for any given situation and certainly isn't good for general cases. The reason? Unless you know something about the domain of the input data and about the distribution of values you can't really improve over, say, quicksort. And if you do know those things, there are often ways that would be much more effective.
Anti-example: suppose your comparison returns a value of "huge difference" for numbers differing by more than 1000, and that the input is {0, 10000, 20000, 30000, ...}
Anti-example: same as above but with input {0, 10000, 10001, 10002, 20000, 20001, ...}
But, you say, I know my inputs don't look like that! Well, in that case tell us what your inputs really look like, in detail. Then someone might be able to really help.
For instance, once I needed to sort historical data. The data was kept sorted. When new data were added it was appended, then the list was run again. I did not have the information of where the new data was appended. I designed a hybrid sort for this situation that handily beat qsort and others by picking a sort that was quick on already sorted data and tweaking it to be fast (essentially switching to qsort) when it encountered unsorted data.
The only way you're going to improve over the general purpose sorts is to know your data. And if you want answers you're going to have to communicate that here very well.
