Earlier today I asked if there's an idiomatic way to count elements matching a predicate function in Mathematica, as I was concerned with performance.
My initial approach for a given predicate pred was the following:
PredCount1[lst_, pred_] := Length@Select[lst, pred];
and I got a suggestion to instead use
PredCount2[lst_, pred_] := Count[lst, x_ /; pred@x];
I started profiling these functions, with different lst sizes and pred functions, and added two more definitions:
PredCount3[lst_, pred_] := Count[Thread@pred@lst, True];
PredCount4[lst_, pred_] := Total[If[pred@#, 1, 0] & /@ lst];
My data samples were ranges between 1 and 10 million elements, and my test functions were EvenQ, #<5& and PrimeQ. The following graphs demonstrate time taken.
EvenQ
PredCount2 is slowest, 3 and 4 duke it out.
Comparison predicate: #<5&
I selected this function because it's close to what I need in my actual problem. Don't worry that this is a silly test function; it actually shows that the 4th function has some merit, and it is the one I ended up using in my solution.
Same as EvenQ, but 3 is clearly slower than 4.
PrimeQ
This is just bizarre. Everything is flipped. I don't suspect caching as the culprit here, since the worst values are for the function computed last.
So, what's the right (fastest) way to count the number of elements in a list that match a given predicate function?

You are seeing the result of auto-compilation.
First, for Listable functions such as EvenQ and PrimeQ, the use of Thread is unnecessary:
EvenQ[{1, 2, 3}]
{False, True, False}
This also explains why PredCount3 performs well on these functions. (They are internally optimized for threading over a list.)
Now let us look at timings.
dat = RandomInteger[1*^6, 1*^6];
test = # < 5 &;
First@Timing[#[dat, test]] & /@ {PredCount1, PredCount2, PredCount3, PredCount4}
{0.343, 0.437, 0.25, 0.047}
If we change a System Option to prevent auto-compilation within Map and run the test again:
SetSystemOptions["CompileOptions" -> {"MapCompileLength" -> Infinity}]
First@Timing[#[dat, test]] & /@ {PredCount1, PredCount2, PredCount3, PredCount4}
{0.343, 0.452, 0.234, 0.765}
You can clearly see that without compilation PredCount4 is much slower. In short, if your test function can be compiled by Mathematica this is a good option.
Here are some other examples of fast counting using numeric functions.

The nature of the integers in the list can have a significant effect on the achievable timings. The use of Tally can improve performance if the range of the integers is constrained.
(* Count items in the list matching predicate, pred *)
PredCountID[lst_, pred_] :=
  Select[Tally@lst, pred@First@# &]\[Transpose] // Last // Total
(* Define the values over which to check timings *)
ranges = {100, 1000, 10000, 100000, 1000000};
sizes = {100, 1000, 10000, 100000, 1000000, 10000000,100000000};
For PrimeQ this function gives the following timings:
This shows that even in a list of 10^8 elements, primes can be counted in less than a tenth of a second if they are drawn from the integers {0,...,100000}, and in less time than the resolution of Timing if they lie within a small range such as 1 to 100.
Because the predicate only has to be applied over the set of Tally values, this approach is relatively insensitive to the exact predicate function.
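As a rough illustration of the same idea outside Mathematica, here is a minimal Python sketch: tally the distinct values first, then apply the predicate once per distinct value and add up the multiplicities. The names pred_count_tally and is_prime are made up for this example, and the trial-division test is only a stand-in for PrimeQ.

```python
from collections import Counter


def pred_count_tally(items, pred):
    """Count items satisfying pred, evaluating pred once per distinct value."""
    tally = Counter(items)  # distinct value -> multiplicity
    return sum(count for value, count in tally.items() if pred(value))


def is_prime(n):
    """Trial-division primality test (illustrative stand-in for PrimeQ)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


data = [3, 4, 5, 5, 7, 8, 9, 3, 2, 2]
print(pred_count_tally(data, is_prime))  # 7
```

The saving only materializes when the number of distinct values is much smaller than the length of the list; for data with few repeats the tally adds overhead instead.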


How to construct a list of all Fibonacci numbers less than n in Mathematica

I would like to write a Mathematica function that constructs a list of all Fibonacci numbers less than n. Moreover, I would like to do this as elegantly and functionally as possible (so without an explicit loop).
Conceptually I want to take an infinite list of the natural numbers, map Fib[n] onto it, and then take elements from this list while they are less than n. How can I do this in Mathematica?
The first part can be done fairly easily in Mathematica. Below, I provide two functions nextFibonacci, which provides the next Fibonacci number greater than the input number (just like NextPrime) and fibonacciList, which provides a list of all Fibonacci numbers less than the input number.
ClearAll[nextFibonacci, fibonacciList]
nextFibonacci[m_] := Fibonacci[
  Block[{n},
    NArgMax[{n, 1/Sqrt[5] (GoldenRatio^n - (-1)^n GoldenRatio^-n) <= m, n ∈ Integers}, n]
  ] + 1
]
nextFibonacci[1] := 2;
fibonacciList[m_] := Fibonacci@
  Range[0, Block[{n},
    NArgMax[{n, 1/Sqrt[5] (GoldenRatio^n - (-1)^n GoldenRatio^-n) < m, n ∈ Integers}, n]
  ]]
Now you can do things like:
(* 21 *)
(* {0, 1, 1, 2, 3, 5, 8, 13, 21, 34} *)
The second part, though, is tricky. What you're looking for is Haskell-style lazy evaluation that will only evaluate if and when necessary (as otherwise, you can't hold an infinite list in memory). For example, something like (in Haskell):
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
which then allows you to do things like
take 10 fibs
-- [0,1,1,2,3,5,8,13,21,34]
takeWhile (<100) fibs
-- [0,1,1,2,3,5,8,13,21,34,55,89]
Unfortunately, there is no built-in support for what you want. However, you can extend Mathematica to implement lazy style lists as shown in this answer, which was also implemented as a package. Now that you have all the pieces that you need, I'll let you work on this yourself.
If you grab my Lazy package from GitHub, your solution is as simple as:
LazySource[Fibonacci] ~TakeWhile~ ((# < 1000) &) // List
If you want to slightly more literally implement your original description
Conceptually I want to take an infinite list of the natural numbers, map Fib[n] onto it, and then take elements from this list while they are less than n.
you could do it as follows:
Fibonacci ~Map~ Lazy[Integers] ~TakeWhile~ ((# < 1000) &) // List
To prove that this is completely lazy, try the previous example without the // List on the end. You'll see that it stops with the (rather ugly) form:
LazyList[Fibonacci[First[LazyList[1, LazySource[#1 &, 2]]]],
Fibonacci /@ Rest[LazyList[1, LazySource[#1 &, 2]]]]],
Rest[LazyList[Fibonacci[First[LazyList[1, LazySource[#1 &, 2]]]],
Fibonacci /@ Rest[LazyList[1, LazySource[#1 &, 2]]]]], #1 <
1000 &]]
This consists of a LazyList[] expression whose first element is the first value of the expression that you're lazily evaluating and whose second element is instructions for how to continue the expansion.
It's a little bit inefficient to continually call Fibonacci[n] all the time, especially as n starts getting large. It's actually possible to construct a lazy generator that will calculate the current value of the Fibonacci sequence as we stream:
LazyFibonacci[] ~TakeWhile~ ((# < 1000)&) // List
Finally, we could generalize this up to a more abstract generating function that takes an initial value for an accumulator, a List of Rules to compute the accumulator's value for the next step and a List of Rules to compute the result from the current accumulator value.
LazyGenerator[init_, step_, extract_] :=
LazyList[Evaluate[init /. extract],
LazyGenerator[init /. step, step, extract]]
And could use it to generate the Fibonacci sequence as follows:
LazyGenerator[{1, 1}, {a_, b_} :> {b, a + b}, {a_, b_} :> a]
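For comparison, the same accumulator pattern can be sketched with a Python generator, which is lazy by construction. This is only an analogy and not the Lazy package's implementation; the name lazy_generator is hypothetical, and plain functions stand in for the step and extract rules.

```python
from itertools import takewhile


def lazy_generator(init, step, extract):
    """Yield extract(state) forever, advancing the accumulator with step."""
    state = init
    while True:
        yield extract(state)
        state = step(state)


# Fibonacci: the accumulator is the pair (a, b); the result is a.
fibs = lazy_generator((1, 1), lambda s: (s[1], s[0] + s[1]), lambda s: s[0])
result = list(takewhile(lambda x: x < 1000, fibs))
print(result)
```

Nothing is computed until takewhile pulls values, so the infinite sequence is never held in memory.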
Ok, I hope I understood the question. Please note that I am not a pure math major; I am a mechanical engineering student. But this sounded interesting, so I looked up the formula, and this is what I could come up with for now. I have to run, but if there is a bug, please let me know and I will fix it.
This Manipulate asks for n and then lists all Fibonacci numbers less than n. There is no loop to find how many Fibonacci numbers there are less than n; it uses Reduce to solve for that count. I take the floor of the result and also throw away a constant with a complex multiplier that appears in the solution.
It then simply makes a table of all these numbers using Mathematica's Fibonacci command. So if you enter n=20 it will list 1,1,2,3,5,8,13 and so on. I could not do it for infinity, as I ran out of memory (I only have 8 GB of RAM on my PC).
I put the limit for n at 500000. Feel free to edit the code and change it.
Manipulate[
 Module[{k, m},
  k = Floor@N[Assuming[Element[m, Integers] && m > 0,
       Reduce[f[m] == n, m]][[2, 1, 2]] /. Complex[0, 2] -> 0];
  TableForm@Join[{{"#", "Fibonacci number"}},
    Table[{i, Fibonacci[i]}, {i, 1, k}]]
  ],
 {{n, 3, "n="}, 2, 500000, 1, Appearance -> "Labeled", ImageSize -> Small},
 SynchronousUpdating -> False,
 ContentSize -> {200, 500}, Initialization :>
  {
   \[CurlyPhi][n_] := ((1 + Sqrt[5])/2)^n;
   \[Psi][n_] := -(1/\[CurlyPhi][n]);
   f[n_] := (\[CurlyPhi][n] - \[Psi][n])/Sqrt[5];
   }]
The index k of the Fibonacci number Fk is k = Floor[Log[GoldenRatio, Fk*Sqrt[5] + 1/2]]. Hence, the list of Fibonacci numbers less than or equal to n consists of the Fibonacci numbers with index up to Floor[Log[GoldenRatio, n*Sqrt[5] + 1/2]].
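A small Python sketch of the closed-form index, assuming the standard form floor(log_phi(sqrt(5)*n + 1/2)) and ordinary double-precision floats. The function names are illustrative, and for very large n the floating-point logarithm may land on the wrong side of an integer, so treat this as a sanity check of the formula and not as a robust implementation.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def largest_fib_index(n):
    """Index k of the largest Fibonacci number F_k <= n, for n >= 1."""
    return math.floor(math.log(math.sqrt(5) * n + 0.5, PHI))


def fibs_up_to(n):
    """Fibonacci numbers F_1..F_k with F_k <= n; k comes from the closed form."""
    k = largest_fib_index(n)
    fibs, a, b = [], 1, 1
    for _ in range(k):
        fibs.append(a)
        a, b = b, a + b
    return fibs


print(fibs_up_to(20))  # [1, 1, 2, 3, 5, 8, 13]
```

The count k is obtained without searching; the loop only fills in the k values.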

no explicit loop to calculate product of list to some modulo in Mathematica

In Mathematica, do I have to use an explicit loop to calculate the product of the elements in a given list (potentially very long) modulo another number?
Please teach me your elegant approach if you have one. Thanks!
Just to give an example
The above is very inefficient, because while calculating the products, one could have taken the modulo to make the multipliers smaller.
Edit 2
I guess my question relates to how to replace for loop for
Module[{ret = initial_value}, For[i = 1, i <= Length[list], i++, ret = general_function[list[[i]], ret]]; ret]
given a general function general_function and a list list.
For long lists a divide-and-conquer approach is typically faster. The idea is to compute the times-mod for the first and second halves, multiply the two results, and take the mod.
Here is an example. We'll use a list of 10^6 integers, all between 0 and 10^10.
len = 6;
max = 10;
list = RandomInteger[10^max, 10^len];
Multiplying and taking the modulus, for a slightly larger mod (I wanted to decrease the likelihood that the result was zero):
In[119]:= Timing[Mod[Times @@ list, 32327541]]
Out[119]= {1.360000, 8826597}
Here is a variant of the sort I described. Trial and error tuning indicated that lists of length 2^9 or so were best done nonrecursively, at least for numbers in the size range indicated above.
tmod2[ll_List, m_] := With[{len=Floor[Length[ll]/2]},
  If[len <= 2^9,
    Mod[Times @@ ll, m],
    Mod[tmod2[Take[ll,len],m] * tmod2[Drop[ll,len],m], m]]]
In[120]:= Timing[tmod2[list, 32327541]]
Out[120]= {0.310000, 8826597}
When I increase the list length to 10^7 and allow ints from 0 to 10^20, the first method takes 50 seconds and the second one takes 5 seconds. So clearly the scaling is working to our advantage.
For situations where an iteration interleaving two operations might be preferred to divide-and-conquer, one might use Fold as below.
tmod3[ll_List, m_] := Fold[Mod[#1*#2,m]&, First[ll], Rest[ll]]
While not competitive with tmod2 on long lists, this is faster than multiplying out everything prior to invoking Mod. For length 10^7 and a max element of 10^20 it takes around 8 seconds to do what tmod2 did in 5.
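The two strategies described above (reducing after every multiplication, and splitting the list in half recursively) can be sketched in Python roughly as follows. The function names are hypothetical, and the default cutoff of 512 merely mirrors the 2^9 tuning mentioned above; it has not been tuned for Python.

```python
from functools import reduce
from math import prod


def prod_mod_fold(values, m):
    """Left fold that reduces modulo m after every multiplication."""
    return reduce(lambda acc, x: (acc * x) % m, values, 1)


def prod_mod_split(values, m, cutoff=512):
    """Divide and conquer: multiply short runs directly, combine halves mod m."""
    if len(values) <= max(cutoff, 1):
        return prod(values) % m
    mid = len(values) // 2
    left = prod_mod_split(values[:mid], m, cutoff)
    right = prod_mod_split(values[mid:], m, cutoff)
    return (left * right) % m


print(prod_mod_fold([3, 5, 7], 11), prod_mod_split([3, 5, 7], 11))  # 6 6
```

The fold is also the general answer to replacing a For loop that threads an accumulator through a list: reduce(f, list, initial) plays the role of Fold.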
Why not use Times? The following
will probably be the most efficient. From a recent WRI blog post,
Times knows a clever binary splitting trick that can be used when you have a large number of integer arguments. It is faster to recursively split the arguments into two smaller products, (1*2*…32767)(32768*…*65536), rather than working through the arguments from first to last. It still has to do the same number of multiplications, but fewer of them involve very big integers, and so, on average, are quicker to do
I'm assuming that list in your question is just an example. If you really have to take the product of n consecutive integers starting with 1, then Factorial will be the fastest. i.e.,
Mod[2000!, 32327]
This appears to be as much as twice as fast as Daniel's code on my system:
list = RandomInteger[1*^20, 1*^7];
m = 32327501;
Mod[Times @@ Mod[Times @@@ Partition[list, 50, 50, 1, {}], m], m] // AbsoluteTiming
tmod2[list, m] // AbsoluteTiming
{1.5800904, 21590133}
{3.1081778, 21590133}
Different partition lengths could be used to tune this for your system and work set.

On PackedArray, looking for advice for using them

I have not used PackedArray before, but just started looking at using them from reading some discussion on them here today.
What I have is lots of large size 1D and 2D matrices of all reals, and no symbolic (it is a finite difference PDE solver), and so I thought that I should take advantage of using PackedArray.
I have an initialization function where I allocate all the data/grids needed. So I went and used ToPackedArray on them. It seems a bit faster, but I need to do more performance testing to better compare speed before and after and also compare RAM usage.
But while I was looking at this, I noticed that some operations in M automatically return lists in PackedArray already, and some do not.
For example, this does not return packed array
a = Table[RandomReal[], {5}, {5}];
But this does
a = RandomReal[1, {5, 5}];
and this does
a = Table[0, {5}, {5}];
b = ListConvolve[ {{0, 1, 0}, {1, 4, 1}, {0, 1, 1}}, a, 1];
and also matrix multiplication does return result in packed array
a = Table[0, {5}, {5}];
b = a.a;
But element wise multiplication does not
b = a*a;
My question : Is there a list somewhere which documents which M commands return PackedArray vs. not? (assuming data meets the requirements, such as Real, not mixed, no symbolic, etc..)
Also, a minor question: do you think it would be better to check first whether a list/matrix is already packed before calling ToPackedArray on it? I would think calling ToPackedArray on a list that is already packed will not cost anything, as the call will return right away.
update (1)
Just wanted to mention that I just found out that PackedArray symbols are not allowed in a demo CDF, as I got an error when uploading one that used them. So I had to remove all my packing code. Since I mainly write demos, this topic is now just of academic interest for me. But I wanted to thank everyone for their time and good answers.
There isn't a comprehensive list. To point out a few things:
Basic operations with packed arrays will tend to remain packed:
In[66]:= a = RandomReal[1, {5, 5}];
In[67]:= Developer`PackedArrayQ /@ {a, a.a, a*a}
Out[67]= {True, True, True}
Note above that my version (8.0.4) doesn't unpack for element-wise multiplication.
Whether a Table will result in a packed array depends on the number of elements:
In[71]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {10}]]
Out[71]= False
In[72]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {11}]]
Out[72]= True
In[73]:= Developer`PackedArrayQ[Table[RandomReal[], {25}, {10}]]
Out[73]= True
On["Packing"] will turn on messages to let you know when things unpack:
In[77]:= On["Packing"]
In[78]:= a = RandomReal[1, 10];
In[79]:= Developer`PackedArrayQ[a]
Out[79]= True
In[80]:= a[[1]] = 0 (* force unpacking due to type mismatch *)
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {10}. >>
Out[80]= 0
Operations that do per-element inspection will usually unpack the array,
In[81]:= a = RandomReal[1, 10];
In[82]:= Position[a, Max[a]]
Developer`FromPackedArray::unpack: Unpacking array in call to Position. >>
Out[82]= {{4}}
The penalty for calling ToPackedArray on an already packed list is small enough that I wouldn't worry about it too much:
In[90]:= a = RandomReal[1, 10^7];
In[91]:= Timing[Do[Identity[a], {10^5}];]
Out[91]= {0.028089, Null}
In[92]:= Timing[Do[Developer`ToPackedArray[a], {10^5}];]
Out[92]= {0.043788, Null}
The frontend prefers packed to unpacked arrays, which can show up when dealing with Dynamic and Manipulate:
In[97]:= Developer`PackedArrayQ[{1}]
Out[97]= False
In[98]:= Dynamic[Developer`PackedArrayQ[{1}]]
Out[98]= True
When looking into performance, focus on cases where large lists are getting unpacked, rather than the small ones. Unless the small ones are in big loops.
This is just an addendum to Brett's answer:
will give you the lengths being used for which a function will return a packed array. So if you did need to pack a small list, as an alternative to using Developer`ToPackedArray you could temporarily set a smaller number for one of the compile options. e.g.
SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> 20}]
Note also some difference between functions which to me at least doesn't seem intuitive so I generally have to test these kind of things whenever I use them rather than instinctively knowing what will work best:
f = # + 1 &;
g[x_] := x + 1;
data = RandomReal[1, 10^6];
Timing[Developer`PackedArrayQ[f /@ data]]
{0.131565, True}
Timing[Developer`PackedArrayQ[g /@ data]]
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {1000000}.
{1.95083, False}
Another addition to Brett's answer: if a list is already a packed array, then ToPackedArray is very fast, since this is checked quite early. Also you might find this valuable:
In general for numerics stuff look for talks from Rob Knapp and/or Mark Sofroniou.
When I develop numerics codes, I write the function and then use On["Packing"] to make sure that everything is packed that needs to be packed.
Concerning Mike's answer, the threshold was introduced because there is overhead for small inputs. Where the threshold lies is hardware dependent. It might be an idea to write a function that sets these thresholds based on measurements done on the computer.

Subset counting algorithm

I have the following problem that I want to solve efficiently. I am given a set of k-tuples of Boolean values where I know in advance that some fraction of each of the values in each of the k-tuples is true. For example, I might have the following 4-tuples, where each tuple has at least 60% of its Boolean values set to true:
(1, 0, 1, 0)
(1, 1, 0, 1)
(0, 0, 1, 0)
I am interested in finding sets of indices that have a particular property: if I look at each of the values in the tuples at the indicated indices, at least the given fraction of those tuples have the corresponding bit set. For example, in the above set of 4-tuples, I could consider the set {0}, since if you look at the zeroth element of each of the above tuples, two-thirds of them are 1, and 2/3 ~= 66% > 60%. I could also consider the set {2} for the same reason. However, I could not consider {1}, since at index 1 only one third of the tuples have a 1 and 1/3 is less than 60%. Similarly, I could not use {0, 2} as a set, because it is not true that at least 60% of the tuples have both bits 0 and 2 set.
My goal is to find all sets for which this property holds. Does anyone have a good algorithm for solving this?
Thank you.
Since, as you wrote, we can assume an x86_64 architecture, and you are looking for implementation performance rather than asymptotic complexity (which cannot go below linear, by the definition of the problem ;) ), I propose the following algorithm (C++-like pseudocode):
/* N=16 -> int16; N=8 -> int8 etc. Select N according to input sizes. (maybe N=24 ;) ) */
count_occurences_intN(vector<intN> t, vector<long> &result_counters){
    long counters[1 << N] = {};
    // first, count bit combinations
    for_each(v in t)
        ++counters[v];
    // second, count bit occurrences, using aggregated data
    for(column=0; column<N; ++column){
        mask = 1 << column;
        long *result_counter_ptr = &(result_counters[column]);
        for(v=0; v<(1 << N); ++v)
            if( v & mask )
                *result_counter_ptr += counters[v];
    }
}
Then split your input k-bit vectors into N-bit vectors, and apply the above function.
Depending on the input size, you might improve performance by choosing N=8, N=16 or N=24, or by applying the naive approach.
Since, as you wrote, you cannot assume anything on the client side, just implement N={8,16,24} and the naive approach, and select one of the four implementations depending on the size of the input.
Make a k-vector of integers, describing how many passes there were for each index. Loop through your set, for each element incrementing the k-vector of passes.
Then figure out the cardinality of your set (either in a separate loop, or in the above one). Then loop through your vector of counts, and emit a pass/fail vector based on your criteria.
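A minimal Python sketch of the counting scheme just described, using the three 4-tuples from the question and a 60% threshold. The name index_pass_vector is made up for illustration, and this only covers single indices, not larger index sets.

```python
def index_pass_vector(tuples, fraction):
    """For each index, report whether at least `fraction` of tuples have it set."""
    if not tuples:
        return []
    k = len(tuples[0])
    counts = [0] * k                      # the k-vector of passes
    for t in tuples:                      # one pass over the set
        for i, bit in enumerate(t):
            if bit:
                counts[i] += 1
    total = len(tuples)                   # cardinality of the set
    return [c >= fraction * total for c in counts]


tuples = [(1, 0, 1, 0), (1, 1, 0, 1), (0, 0, 1, 0)]
print(index_pass_vector(tuples, 0.6))  # [True, False, True, False]
```

This agrees with the example in the question: indices 0 and 2 qualify, index 1 does not.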

sorting algorithm where pairwise-comparison can return more information than -1, 0, +1

Most sort algorithms rely on a pairwise comparison that determines whether A < B, A = B or A > B.
I'm looking for algorithms (and for bonus points, code in Python) that take advantage of a pairwise-comparison function that can distinguish a lot less from a little less or a lot more from a little more. So perhaps instead of returning {-1, 0, 1} the comparison function returns {-2, -1, 0, 1, 2} or {-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5} or even a real number on the interval (-1, 1).
For some applications (such as near sorting or approximate sorting) this would enable a reasonable sort to be determined with fewer comparisons.
The extra information can indeed be used to minimize the total number of comparisons. Calls to the super_comparison function can be used to make deductions equivalent to a great number of calls to a regular comparison function. For example, a much-less-than b and c little-less-than b implies a < c < b.
The deductions can be organized into bins or partitions which can each be sorted separately. Effectively, this is equivalent to QuickSort with n-way partition. Here's an implementation in Python:
from collections import defaultdict
from random import choice


def quicksort(seq, compare):
    'Stable in-place sort using a 3-or-more-way comparison function'
    # Make an n-way partition on a random pivot value
    segments = defaultdict(list)
    pivot = choice(seq)
    for x in seq:
        ranking = 0 if x is pivot else compare(x, pivot)
        segments[ranking].append(x)
    # Empty the sequence so the sorted segments can be stored back into it
    del seq[:]

    # Recursively sort each segment and store it in the sequence
    for ranking, segment in sorted(segments.items()):
        if ranking and len(segment) > 1:
            quicksort(segment, compare)
        seq += segment


if __name__ == '__main__':
    from random import randrange
    from math import log10

    def super_compare(a, b):
        'Compare with extra logarithmic near/far information'
        c = -1 if a < b else 1 if a > b else 0
        return c * (int(log10(max(abs(a - b), 1.0))) + 1)

    n = 10000
    data = [randrange(4*n) for i in range(n)]
    goal = sorted(data)
    quicksort(data, super_compare)
    print(data == goal)
By instrumenting this code with the trace module, it is possible to measure the performance gain. In the above code, a regular three-way compare uses 133,000 comparisons while a super comparison function reduces the number of calls to 85,000.
The code also makes it easy to experiment with a variety of comparison functions. This will show that naïve n-way comparison functions do very little to help the sort. For example, if the comparison function returns +/-2 for differences greater than four and +/-1 for differences four or less, there is only a modest 5% reduction in the number of comparisons. The root cause is that the coarse-grained partitions used in the beginning only have a handful of "near matches" and everything else falls in "far matches".
An improvement to the super comparison is to cover logarithmic ranges (i.e. +/-1 if within ten, +/-2 if within a hundred, +/-3 if within a thousand).
An ideal comparison function would be adaptive. For any given sequence size, the comparison function should strive to subdivide the sequence into partitions of roughly equal size. Information theory tells us that this will maximize the number of bits of information per comparison.
The adaptive approach makes good intuitive sense as well. People should first be partitioned into love vs like before making more refined distinctions such as love-a-lot vs love-a-little. Further partitioning passes should each make finer and finer distinctions.
You can use a modified quick sort. Let me explain with an example where your comparison function returns [-2, -1, 0, 1, 2]. Say you have an array A to sort.
Create 5 empty arrays - Aminus2, Aminus1, A0, Aplus1, Aplus2.
Pick an arbitrary element of A, X.
For each element of the array, compare it with X.
Depending on the result, place the element in one of the Aminus2, Aminus1, A0, Aplus1, Aplus2 arrays.
Apply the same sort recursively to Aminus2, Aminus1, Aplus1, Aplus2 (note: you don't need to sort A0, as all the elements there are equal to X).
Concatenate the arrays to get the final result: A = Aminus2 + Aminus1 + A0 + Aplus1 + Aplus2.
It seems like using raindog's modified quicksort would let you stream out results sooner and perhaps page into them faster.
Maybe those features are already available from a carefully-controlled qsort operation? I haven't thought much about it.
This also sounds kind of like radix sort except instead of looking at each digit (or other kind of bucket rule), you're making up buckets from the rich comparisons. I have a hard time thinking of a case where rich comparisons are available but digits (or something like them) aren't.
I can't think of any situation in which this would be really useful. Even if I could, I suspect the added CPU cycles needed to sort fuzzy values would be more than those "extra comparisons" you allude to. But I'll still offer a suggestion.
Consider this possibility (all strings use the 27 characters a-z and _):
1/ now_is_the_time
2/ now_is_never
3/ now_we_have_to_go
4/ aaa
5/ ___
Obviously strings 1 and 2 are more similar than 1 and 3 and much more similar than 1 and 4.
One approach is to scale the difference value for each identical character position and use the first different character to set the last position.
Putting aside signs for the moment, comparing string 1 with 2, they differ in position 8 by 'n' - 't'. That's a difference of 6. In order to turn that into a single digit 1-9, we use the formula:
digit = ceiling(9 * abs(diff) / 27)
since the maximum difference is 26. The minimum difference of 1 becomes the digit 1. The maximum difference of 26 becomes the digit 9. Our difference of 6 becomes 2 (9 * 6 / 27 is exactly 2).
And because the difference is in position 8, our comparison function will return 2x10^-8 (actually it will return the negative of that, since string 1 comes after string 2).
Using a similar process for strings 1 and 4, the comparison function returns -5x10^-1. The highest possible return (strings 4 and 5) has a difference in position 1 of '_' - 'a' (26), which generates the digit 9 and hence gives us 9x10^-1.
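Here is a minimal Python sketch of that comparison, following the formula literally (so a difference of 6 maps to ceiling(9 * 6 / 27) = 2). The name fuzzy_compare is hypothetical, the 27-symbol alphabet places '_' last, and the handling of a string that is a strict prefix of the other is an assumption of this sketch, since the description above does not cover it.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz_"   # 27 symbols, '_' last


def fuzzy_compare(s1, s2):
    """Signed digit * 10**-position for the first differing position (1-based)."""
    for pos, (c1, c2) in enumerate(zip(s1, s2), start=1):
        if c1 != c2:
            diff = ALPHABET.index(c2) - ALPHABET.index(c1)
            digit = (9 * abs(diff) + 26) // 27   # integer ceiling of 9*|diff|/27
            sign = 1 if diff > 0 else -1
            return sign * digit * 10.0 ** -pos
    if len(s1) == len(s2):
        return 0.0
    # Assumption (not in the text): a strict prefix differs minimally just past its end.
    pos = min(len(s1), len(s2)) + 1
    return (1 if len(s1) < len(s2) else -1) * 10.0 ** -pos


print(fuzzy_compare("now_is_the_time", "now_is_never"))
print(fuzzy_compare("now_is_the_time", "aaa"))
print(fuzzy_compare("aaa", "___"))
```

The sign follows the convention used above: the result is negative when the first string sorts after the second.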
Take these suggestions and use them as you see fit. I'd be interested in knowing how your fuzzy comparison code ends up working out.
Considering you are looking to order a number of items based on human comparison, you might want to approach this problem like a sports tournament. You might allow each human vote to increase the score of the winner by 3 and decrease the loser's by 3, or use +2 and -2, +1 and -1, or just 0 and 0 for a draw.
Then you just do a regular sort based on the scores.
Another alternative would be a single or double elimination tournament structure.
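The score-based variant can be sketched in a few lines of Python. The vote format (winner, loser, margin) and the name rank_by_votes are assumptions made for this example; a margin of 0 records a draw.

```python
from collections import defaultdict


def rank_by_votes(items, votes):
    """votes: iterable of (a, b, margin); margin > 0 means a beat b by that much."""
    scores = defaultdict(int)
    for a, b, margin in votes:
        scores[a] += margin
        scores[b] -= margin
    # A regular sort on the accumulated scores, best first.
    return sorted(items, key=lambda item: scores[item], reverse=True)


votes = [("tea", "coffee", 3), ("coffee", "water", 1),
         ("tea", "water", 2), ("water", "juice", 0)]
print(rank_by_votes(["water", "juice", "tea", "coffee"], votes))
# ['tea', 'juice', 'coffee', 'water']
```

Items that never appear in a vote keep a score of 0, so the ranking depends on which pairs were actually compared.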
You can use two comparisons to achieve this. Multiply the more important comparison by 2, and add them together.
Here is an example of what I mean in Perl.
It compares two array references by the first element, then by the second element.
use strict;
use warnings;
use 5.010;
my @array = (
  [a => 2],
  [b => 1],
  [a => 1],
  [c => 0]
);
say "$_->[0] => $_->[1]" for sort {
  ($a->[0] cmp $b->[0]) * 2 +
  ($a->[1] <=> $b->[1]);
} @array;
a => 1
a => 2
b => 1
c => 0
You could extend this to any number of comparisons very easily.
Perhaps there's a good reason to do this but I don't think it beats the alternatives for any given situation and certainly isn't good for general cases. The reason? Unless you know something about the domain of the input data and about the distribution of values you can't really improve over, say, quicksort. And if you do know those things, there are often ways that would be much more effective.
Anti-example: suppose your comparison returns a value of "huge difference" for numbers differing by more than 1000, and that the input is {0, 10000, 20000, 30000, ...}
Anti-example: same as above but with input {0, 10000, 10001, 10002, 20000, 20001, ...}
But, you say, I know my inputs don't look like that! Well, in that case tell us what your inputs really look like, in detail. Then someone might be able to really help.
For instance, once I needed to sort historical data. The data was kept sorted. When new data were added it was appended, then the list was run again. I did not have the information of where the new data was appended. I designed a hybrid sort for this situation that handily beat qsort and others by picking a sort that was quick on already sorted data and tweaking it to be fast (essentially switching to qsort) when it encountered unsorted data.
The only way you're going to improve over the general purpose sorts is to know your data. And if you want answers you're going to have to communicate that here very well.
