I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large (~100k elements) data set of the form
T={{x1,y1,z1},{x2,y2,z2},...}
and I want to bin it into a 2D array of around 100x100 bins, with the bin value being given by the sum of the Z values that fall into each bin.
Currently I am iterating through each element of the table, using Select to pick out which bin it is supposed to be in based on lists of bin boundaries, and adding the z value to a list of values occupying that bin. At the end I map Total onto the list of bins, summing their contents (I do this because I sometimes want to do other things, like maximize).
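A sketch of what I am currently doing (xbounds and ybounds stand in for my lists of bin boundaries; the real code differs in details):

(* naive approach: locate each point's bin with Select, collect z values, total at the end *)
binContents = Table[{}, {Length[xbounds] - 1}, {Length[ybounds] - 1}];
Do[
  With[{x = T[[k, 1]], y = T[[k, 2]], z = T[[k, 3]]},
   Module[{i, j},
    i = First@Select[Range[Length[xbounds] - 1], xbounds[[#]] <= x < xbounds[[# + 1]] &];
    j = First@Select[Range[Length[ybounds] - 1], ybounds[[#]] <= y < ybounds[[# + 1]] &];
    AppendTo[binContents[[i, j]], z]]],
  {k, Length[T]}];
binTotals = Map[Total, binContents, {2}];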
I have tried using Gather and other such functions to do this, but the above method was ridiculously faster, though perhaps I am using Gather poorly. Anyway, it still takes a few minutes to do the sorting by my method, and I feel like Mathematica can do better. Does anyone have a nice efficient algorithm handy?
Here is a method based on Szabolcs's post that is about an order of magnitude faster.
data = RandomReal[5, {500000, 3}];
(*500k values*)
zvalues = data[[All, 3]];
epsilon = 1*^-10;(*prevent 101 index*)
(*rescale and round (x,y) coordinates to index pairs in the 1..100 range*)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];
res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]},
    SparseArray[gb[[All, 1, 1]] -> Total[gb[[All, All, 2]], {2}]]
    ]; // AbsoluteTiming
Gives about {2.012217, Null}
AbsoluteTiming[
System`SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}];
res3 = SparseArray[indexes -> zvalues];
System`SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];
]
Gives about {0.195228, Null}
res3 == res2
True
"TreatRepeatedEntries" -> 1 adds duplicate positions up.
I intend to do a rewrite of the code below because of Szabolcs' readability concerns. Until then, know that if your bins are regular, and you can use Round, Floor, or Ceiling (with a second argument) in place of Nearest, the code below will be much faster. On my system, it tests faster than the GatherBy solution also posted.
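As a quick illustration of the regular-bins point (example mine, not the timing code referred to above): Floor with a second argument snaps a value down to a multiple of the bin width, so the bin index comes from plain arithmetic instead of a nearest-neighbor search.

(* width-20 bins covering [0, 100), numbered 1..5 *)
toBin[x_] := 1 + Floor[x, 20]/20
toBin /@ {13.2, 47.3, 99.9}
(* {1, 3, 5} *)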
Assuming I understand your requirements, I propose:
data = RandomReal[100, {75, 3}];
bins = {0, 20, 40, 60, 80, 100};
Reap[
  Sow[{#3, #2}, bins ~Nearest~ #] & @@@ data,
  bins,
  Reap[Sow[#, bins ~Nearest~ #2] & @@@ #2, bins, Tr@#2 &][[2]] &
  ][[2]] ~Flatten~ 1 ~Total~ {3} // MatrixForm
Refactored:
f[bins_] := Reap[Sow[{##2}, bins ~Nearest~ #] & @@@ #, bins, #2][[2]] &
bin2D[data_, X_, Y_] := f[X][data, f[Y][#2, #2~Total~2 &] &] ~Flatten~ 1 ~Total~ {3}
Use:
bin2D[data, xbins, ybins]
Here's my approach:
data = RandomReal[5, {500000, 3}]; (* 500k values *)
zvalues = data[[All, 3]];
epsilon = 1*^-10; (* prevent 101 index *)
(* rescale and round (x,y) coordinates to index pairs in the 1..100 range *)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];
(* approach 1: create bin-matrix first, then fill up elements by adding zvalues *)
res1 = Module[
{result = ConstantArray[0, {100, 100}]},
Do[
AddTo[result[[##]], zvalues[[i]]] & @@ indexes[[i]],
{i, Length[indexes]}
];
result
]; // Timing
(* approach 2: gather zvalues by indexes, add them up, convert them to a matrix *)
res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]},
SparseArray[gb[[All, 1, 1]] -> (Total /@ gb[[All, All, 2]])]
]; // Timing
res1 == res2
These two approaches (res1 & res2) can handle 100k and 200k elements per second, respectively, on this machine. Is this sufficiently fast, or do you need to run this whole program in a loop?
Here's my approach using the function SelectEquivalents defined in "What is in your Mathematica tool bag?", which is perfect for a problem like this one.
data = RandomReal[100, {75, 3}];
bins = Range[0, 100, 20];
binMiddles = (Most@bins + Rest@bins)/2;
nearest = Nearest[binMiddles];
SelectEquivalents[
data
,
TagElement -> ({First@nearest[#[[1]]], First@nearest[#[[2]]]} &)
,
TransformElement -> (#[[3]] &)
,
TransformResults -> (Total[#2] &)
,
TagPattern -> Flatten[Outer[List, binMiddles, binMiddles], 1]
,
FinalFunction -> (Partition[Flatten[# /. {} -> 0], Length[binMiddles]] &)
]
If you want to group according to more than two dimensions, you can use the following function in FinalFunction to reshape the flat result list to the desired dimensions (I don't remember where I found it).
InverseFlatten[l_, dimensions_] := Fold[Partition[#, #2] &, l, Most[Reverse[dimensions]]];
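For example, reshaping a flat list of eight values back into a 2x2x2 array:

InverseFlatten[Range[8], {2, 2, 2}]
(* {{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}} *)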
The following question was given by my teacher:
Generate a sequence of N = 1000 independent observations of a random variable with distribution: (c) exponential with parameter λ = 1, by the inversion method.
Present the obtained sequences graphically (except for those generated in point e), e.g.:
i. plot in the coordinates (no. of obs., value of the obs.);
ii. plot in the coordinates (obs. no. n, obs. no. n + i) for i = 1, 2, 3;
iii. plot the so-called covariance function, K(tau) = 1/(n - tau) Sum[(x[i] - avg)(x[i + tau] - avg), {i, 1, n - tau}], for some values of tau, where avg is the sample average.
I have written the following code,
(*****************************************************************)
(*Task 01(c) and 02(a)*)
(*****************************************************************)
n = 1000;
taskC = Table[-Log[RandomReal[]], {n}];
ListPlot[taskC, AxesLabel->{"No. obs", "value of the obs"}]
i = 1;
ListPlot[Table[
{taskC[[k]], taskC[[k+i]]},
{k, 1, n-i,1}],
AxesLabel->{"obs.no.n", "obs.no.n+1"}]
i++;
ListPlot[Table[
{taskC[[k]], taskC[[k+i]]},
{k, 1, n-i,1}],
AxesLabel-> {"obs.no.n", "obs.no.n+2"}]
i++;
ListPlot[Table[
{taskC[[k]], taskC[[k+i]]},
{k,1,n-i,1}],
AxesLabel->{"obs.no.n", "obs.no.n+3"}]
avg = (1/n)*Sum[taskC[[i]], {i,n}];
ListPlot[Table[1/(n-tau) * Sum[(taskC[[i]]-avg)*(taskC[[i+tau]] - avg), {i, n-tau}], {tau, 1, 100}],
Joined->True,
AxesLabel->"Covariance Function"]
He has commented,
The plots of co-variance functions should start from 0-shift. Note
that for larger than 0 shifts you are estimating co-variance between
independent observations which is zero, while for 0 shift you are
estimating variance of observation which is large. Thus the contrast
between these two cases is a clear indication that the observations
are uncorrelated.
What did I do wrong?
How can I correct my code?
Zero-shift means calculating the covariance for tau = 0, which is simply the variance.
Labeled[ListPlot[Table[{tau,
1/(n - tau)*Sum[(taskC[[i]] - avg)*(taskC[[i + tau]] - avg), {i, n - tau}]},
{tau, 0, 5}], Filling -> Axis, FillingStyle -> Thick, PlotRange -> All,
Frame -> True, PlotRangePadding -> 0.2, AspectRatio -> 1],
{"Covariance Function K(n)", "n"}, {{Top, Left}, Bottom}]
Variance[taskC]
0.93484
Covariance[taskC, taskC]
0.93484
(* n = 1 *)
Covariance[Most[taskC], Rest[taskC]]
0.00926913
I'm trying to solve the following problem using Mathematica:
What is the smallest positive integer not obtainable from the set {2,3,4,5,6,7,8} via arithmetic operations {+,-,*,/}, exponentiation, and parentheses? Each number in the set must be used exactly once. Unary operations are NOT allowed (1 cannot be converted to -1 without using a 0, for example).
For example, the number 1073741824000000000000000 is obtainable via (((3+2)*(5+4))/6)^(8+7).
I am a beginner with Mathematica. I have written code that I believe solves the problem for the set {2,3,4,5,6,7} (I obtained 2249 as my answer), but my code is not efficient enough to work with the set {2,3,4,5,6,7,8}. (My code already takes 71 seconds to run on the set {2,3,4,5,6,7}.)
I would very much appreciate any tips or solutions to solving this harder problem with Mathematica, or general insights as to how I could speed my existing code.
My existing code uses a brute force, recursive approach:
(* this defines combinations for a set of 1 number as the set of that 1 number *)
combinations[list_ /; Length[list] == 1] := list
(* this tests whether it's ok to exponentiate two numbers including (somewhat) arbitrary restrictions to prevent overflow *)
oktoexponent[number1_, number2_] :=
If[number1 == 0, number2 >= 0,
If[number1 < 0,
(-number1)^number2 < 10000 \[And] IntegerQ[number2],
number1^number2 < 10000 \[And] IntegerQ[number2]]]
(* this takes a list and removes fractions with denominators greater than 100000 *)
cleanup[list_] := Select[list, Denominator[#] < 100000 &]
(* this defines combinations for a set of 2 numbers - and returns a set of all possible numbers obtained via applications of + - * / filtered by oktoexponent and cleanup rules *)
combinations[list_ /; Length[list] == 2 && Depth[list] == 2] :=
cleanup[DeleteCases[#, Null] &@DeleteDuplicates@
{list[[1]] + list[[2]],
list[[1]] - list[[2]],
list[[2]] - list[[1]],
list[[1]]*list[[2]],
If[oktoexponent[list[[1]], list[[2]]], list[[1]]^list[[2]],],
If[oktoexponent[list[[2]], list[[1]]], list[[2]]^list[[1]],],
If[list[[2]] != 0, list[[1]]/list[[2]],],
If[list[[1]] != 0, list[[2]]/list[[1]],]}]
(* this extends combinations to work with sets of sets *)
combinations[
list_ /; Length[list] == 2 && Depth[list] == 3] :=
Module[{m, n, list1, list2},
list1 = list[[1]];
list2 = list[[2]];
m = Length[list1]; n = Length[list2];
cleanup[
DeleteDuplicates#
Flatten#Table[
combinations[{list1[[i]], list2[[j]]}], {i, m}, {j, n}]]]
(* for a given set, partition returns the set of all partitions into two non-empty subsets *)
partition[list_] := Module[{subsets},
subsets = Select[Subsets[list], # != {} && # != list &];
DeleteDuplicates#
Table[Sort#{subsets[[i]], Complement[list, subsets[[i]]]}, {i,
Length[subsets]}]]
(* this finally extends combinations to work with sets of any size *)
combinations[list_ /; Length[list] > 2] :=
Module[{partitions, k},
partitions = partition[list];
k = Length[partitions];
cleanup[Sort@
   DeleteDuplicates@
    Flatten@(combinations /@
       Table[{combinations[partitions[[i]][[1]]],
         combinations[partitions[[i]][[2]]]}, {i, k}])]]
Timing[desiredset = combinations[{2, 3, 4, 5, 6, 7}];]
{71.5454, Null}
Complement[Range[1, 3000], #] &@
 (Cases[#, x_Integer /; x > 0 && x <= 3000] &@ desiredset)
{2249, 2258, 2327, 2509, 2517, 2654, 2789, 2817, 2841, 2857, 2990, 2998}
This is unhelpful, but I'm under my quota for useless babbling today:
(* it turns out the symbolizing + * is not that useful after all *)
f[x_,y_] = x+y
fm[x_,y_] = x-y
g[x_,y_] = x*y
gd[x_,y_] = x/y
(* power properties *)
h[h[a_,b_],c_] = h[a,b*c]
h[a_/b_,n_] = h[a,n]/h[b,n]
h[1,n_] = 1
(* expand simple powers only! *)
(* does this make things worse? *)
h[a_,2] = a*a
h[a_,3] = a*a*a
(* all symbols for two numbers *)
allsyms[x_,y_] := allsyms[x,y] =
DeleteDuplicates[Flatten[{f[x,y], fm[x,y], fm[y,x],
g[x,y], gd[x,y], gd[y,x], h[x,y], h[y,x]}]]
allsymops[s_,t_] := allsymops[s,t] =
DeleteDuplicates[Flatten[Outer[allsyms[#1,#2]&,s,t]]]
Clear[reach];
reach[{}] = {}
reach[{n_}] := reach[n] = {n}
reach[s_] := reach[s] = DeleteDuplicates[Flatten[
Table[allsymops[reach[i],reach[Complement[s,i]]],
{i,Complement[Subsets[s],{ {},s}]}]]]
The general idea here is to avoid calculating powers (which are
expensive and non-commutative), while at the same time using the
commutativity/associativity of addition/multiplication to reduce the
cardinality of reach[].
Code above also available at:
https://github.com/barrycarter/bcapps/blob/master/playground.m#L20
along with literally gigabytes of other useless code, data, and humor.
I think the answer to your question lies in the command Groupings. This allows you to create binary trees from a list. The binary tree is very useful, since each of the operations you allow (Plus, Subtract, Times, Divide, Power) takes two arguments. E.g.
In> Groupings[3,2]
Out> {List[List[1,2],3],List[1,List[2,3]]}
Thus all we need to do is replace List with any combination of the allowed operations.
However, Groupings seems to be almighty, as it has an option to do exactly this. Imagine you have two functions foo and bar, each taking 2 arguments; then you can make all combinations as:
In> Groupings[3,{foo->2,bar->2}]
Out> {foo[foo[1,2],3],foo[1,foo[2,3]],foo[bar[1,2],3],foo[1,bar[2,3]],
bar[foo[1,2],3],bar[1,foo[2,3]],bar[bar[1,2],3],bar[1,bar[2,3]]}
Now it is possible to count the number of combinations we have:
In> Groupings[Permutations[#],
{Plus->2,Subtract->2,Times->2,Divide->2,Power->2}
] &@ {a,b,c,d,e};
In> Length#%
In> DeleteDuplicates#%%
In> Length#%
Out> 1050000
Out> 219352
This means that for 5 distinct numbers, we have 219352 unique combinations.
Sadly, many of these combinations cannot be evaluated due to overflow, division by zero or underflow. However, it is not evident which ones to remove. The value a^(b^(c^(d^e))) could be humongous, or just small. Fractional powers could result in perfect roots and divisions by large numbers can become perfect.
In> Groupings[Permutations[#],
{Plus->2,Subtract->2,Times->2,Divide->2,Power->2}
] &@ {2, 3, 4};
In> Union[Cases[%, _?(IntegerQ[#] && # >= 0 &)]];
In> Split[%, #2 - #1 <= 1 &][[1]]
Out> {1, 2, 3, 4, 5, 6}
Does anyone know why the following random distributions of matrices generate different plots? (This code plots the PDFs of the first few cells from a set of 10x10 matrices sampled from an inverse Wishart distribution; amazingly, the plots differ depending on how one performs the matrix inverse, and it seems the right plots are the ones obtained with Inverse[_]. Why?)
base code:
<< MultivariateStatistics`;
Module[{dist, p, k, data, samples, scale, graphics, distribution},
p = 10;
k = 13;
samples = 500;
dist = WishartDistribution[IdentityMatrix[p], k];
(* a samples x p x p array *)
data = Inverse[#] & /@ RandomVariate[dist, samples];
(* distribution graphics *)
distribution[i_, j_] := Module[{fiber, f, mean, rangeAll, colorHue},
fiber = data[[All, i, j]];
dist = SmoothKernelDistribution[fiber];
f = PDF[dist];
Plot[f[z], {z, -2, 2},
PlotLabel -> ("Mean=" <> ToString[Mean[fiber]]),
PlotRange -> All]
];
Grid@Table[distribution[i, j], {i, 1, 3}, {j, 1, 5}]
]
code variant: in the above, change the line
data = Inverse[#] & /@ RandomVariate[dist, samples];
to this
data = #^(-1) & /@ RandomVariate[dist, samples];
and you will see the plotted distributions are different.
Inverse computes a matrix inverse, i.e. if a is a square matrix, then Inverse[a].a will be the identity matrix.
a^(-1) is the same as 1/a, i.e. it gives you the reciprocal of each matrix element. The ^ operator gives powers element-wise. If you want a matrix power, use MatrixPower.
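A small illustration of the difference (example mine, not part of the original answer):

a = {{1., 2.}, {3., 4.}};
a^(-1)        (* element-wise reciprocals: {{1., 0.5}, {0.333333, 0.25}} *)
Inverse[a]    (* true matrix inverse: {{-2., 1.}, {1.5, -0.5}} *)
Inverse[a].a  (* the 2x2 identity matrix *)
MatrixPower[a, -1] == Inverse[a]  (* True *)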
I want to test if a list contains consecutive integers.
consQ[a_] := Module[{ret = True},
  Do[
   If[i > 1 && a[[i]] != a[[i - 1]] + 1, ret = False; Break[]],
   {i, 1, Length[a]}];
  ret]
Although the function consQ does the job, I wonder if there is a better (shorter, faster) method of doing this, preferably using a functional programming style.
EDIT:
The function above maps lists with consecutive integers in decreasing sequence to False. I would like to change this to True.
Szabolcs' solution is probably what I'd do, but here's an alternative:
consQ[a : {___Integer}] := Most[a] + 1 === Rest[a]
consQ[_] := False
Note that these approaches differ in how they handle the empty list.
You could use
consQ[a_List ? (VectorQ[#, IntegerQ]&)] := Union@Differences[a] === {1}
consQ[_] = False
You may want to remove the test for integers if you know that every list you pass to it will only have integers.
EDIT: A little extra: if you use a very old version that doesn't have Differences, or wonder how to implement it,
differences[a_List] := Rest[a] - Most[a]
EDIT 2: The requested change:
consQ[a : {___Integer}] := MatchQ[Union@Differences[a], {1} | {-1} | {}]
consQ[_] = False
This works with both increasing and decreasing sequences, and gives True for a list of size 1 or 0 as well.
More generally, you can test whether a list of numbers is equally spaced with something like equallySpacedQ[a_List] := Length@Union@Differences[a] == 1
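For instance (examples mine):

equallySpacedQ /@ {{1, 3, 5, 7}, {1, 2, 4}, {5, 4, 3, 2}}
(* {True, False, True} -- any common step counts, including a negative one *)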
I think the following is also fast, and comparing reversed lists does not take longer:
a = Range[10^7];
f[a_] := Range[Sequence @@ ##, Sign[-#[[1]] + #[[2]]]] &@{a[[1]], a[[-1]]} == a;
Timing[f[a]]
b = Reverse@a;
Timing[f[b]]
Edit
A short test of the fastest solutions so far:
a = Range[2 10^7];
Timing@consQSzab@a
Timing@consQBret@a
Timing@consQBeli@a
(*
{6.5,True}
{0.703,True}
{0.203,True}
*)
I like the solutions by the other two, but I'd be concerned about very long lists. Consider the data
d : dat[n_Integer?Positive] := d = {1}~Join~Range[1, n]
which has its non-sequential point at the very beginning. Setting consQ1 for Brett's solution and consQ2 for Szabolcs's, I get
AbsoluteTiming[#[dat[10000]]] & /@ {consQ1, consQ2}
{ {0.000110, False}, {0.001091, False} }
Or, roughly a ten times difference between the two, which stays relatively consistent with multiple trials. This time can be cut in roughly half by short-circuiting the process using NestWhile:
Clear[consQ3]
consQ3[a : {__Integer}] :=
Module[{l = Length[a], i = 1},
NestWhile[# + 1 &, i,
(#2 <= l) && a[[#1]] + 1 == a[[#2]] &,
2] > l
]
which tests each pair and only continues if they return true. The timings
AbsoluteTiming[consQ3[dat[ 10000 ]]]
{0.000059, False}
with
{0.000076, False}
for consQ1. So, Brett's answer is fairly close, but occasionally it will take twice as long.
Edit: I moved the graphs of the timing data to a Community Wiki answer.
Fold can be used in a fairly concise expression that runs very quickly:
consQFold[a_] := (Fold[If[#2 == #1 + 1, #2, Return[False]] &, a[[1]]-1, a]; True)
Pattern-matching can be used to provide a very clear expression of intent at the cost of substantially slower performance:
consQMatch[{___, i_, j_, ___}] /; j - i != 1 := False
consQMatch[_] = True;
Edit
consQFold, as written, works in Mathematica v8.0.4 but not in earlier versions of v8 or v7. To correct this problem, there are a couple of alternatives. The first is to explicitly name the Return point:
consQFold[a_] :=
(Fold[If[#2==#1+1, #2, Return[False,CompoundExpression]] &, a[[1]]-1, a]; True)
The second, as suggested by @Mr.Wizard, is to replace Return with Throw / Catch:
consQFold[a_] :=
Catch[Fold[If[#2 == #1 + 1, #2, Throw[False]]&, a[[1]]-1, a]; True]
Since the timing seems to be rather important, I've moved the comparisons between the various methods to this Community Wiki answer.
The data used are simply lists of consecutive integers, with a single non-consecutive point, and they're generated via
d : dat[n_Integer?Positive] := (d = {1}~Join~Range[1, n])
d : dat[n_Integer?Positive, p_Integer?Positive] /; p <= n :=
Range[1, p]~Join~{p}~Join~Range[p + 1, n]
where the first form of dat[n] is equivalent to dat[n, 1]. The timing code is simple:
Clear[consQTiming]
Options[consQTiming] = {
NonConsecutivePoints -> {10, 25, 50, 100, 250,500, 1000}};
consQTiming[fcns__, OptionPattern[]] :=
 With[{rnd = RandomInteger[{1, #}, 100]},
    With[{fcn = #},
       Timing[fcn[dat[10000, #]] & /@ rnd][[1]]/100
       ] & /@ {fcns}
    ] & /@ OptionValue[NonConsecutivePoints]
It generates 100 random integers between 1 and each of {10, 25, 50, 100, 250, 500, 1000} and dat then uses each of those random numbers as the non-consecutive point in a list 10,000 elements long. Each consQ implementation is then applied to each list produced by dat, and the results are averaged. The plotting function is simply
Clear[PlotConsQTimings]
Options[PlotConsQTimings] = {
NonConsecutivePoints -> {10, 25, 50, 100, 250, 500, 1000}};
PlotConsQTimings[timings : { _?VectorQ ..}, OptionPattern[]] :=
ListLogLogPlot[
Thread[{OptionValue[NonConsecutivePoints], #}] & /@ Transpose[timings],
Frame -> True, Joined -> True, PlotMarkers -> Automatic
]
I timed the following functions consQSzabolcs1, consQSzabolcs2, consQBrett, consQRCollyer, consQBelisarius, consQWRFold, consQWRFold2, consQWRFold3, consQWRMatch, and MrWizard's version of consQBelisarius.
In ascending order of the leftmost timing: consQBelisarius, consQWizard, consQRCollyer, consQBrett, consQSzabolcs1, consQWRMatch, consQSzabolcs2, consQWRFold2, consQWRFold3, and consQWRFold.
Edit: Reran all of the functions with timeAvg (the second one) instead of Timing in consQTiming. I did still average over 100 runs, though. For the most part, there wasn't any change, except for the lowest two, where there is some variation from run to run. So, take those two lines with a grain of salt, as their timings are practically identical.
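The timeAvg referred to is not defined in this thread; presumably it is the widely circulated utility below (an assumption on my part; check the source thread for the exact version):

(* assumed definition of timeAvg: repeat the expression in batches of 5^i until
   the total exceeds 0.3 s, then return the average time per evaluation *)
timeAvg = Function[func,
   Do[If[# > 0.3, Return[#/5^i]] & @@ AbsoluteTiming@Do[func, {5^i}], {i, 0, 15}],
   HoldFirst];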
I am now convinced that belisarius is trying to get my goat by writing intentionally convoluted code. :-p
I would write: f = Range[##, Sign[#2 - #]] & @@ #[[{1, -1}]] == # &
Also, I believe that WReach probably intended to write something like:
consQFold[a_] :=
Catch[
Fold[If[#2 === # + 1, #2, Throw@False] &, a[[1]] - 1, a];
True
]
Often I have written: {Min@#, Max@#} &
Yet this seems inefficient, as the expression must be scanned twice, once to find the minimum value, and once to find the maximum value. Is there a faster way to do this? The expression is often a tensor or array.
This beats it by a bit.
minMax = Compile[{{list, _Integer, 1}},
Module[{currentMin, currentMax},
currentMin = currentMax = First[list];
Do[
Which[
x < currentMin, currentMin = x,
x > currentMax, currentMax = x],
{x, list}];
{currentMin, currentMax}],
CompilationTarget -> "C",
RuntimeOptions -> "Speed"];
v = RandomInteger[{0, 1000000000}, {10000000}];
minMax[v] // Timing
I think it's a little faster than Leonid's version because Do is a bit faster than For, even in compiled code.
Ultimately, though, this is an example of the kind of performance hit you take when using a high level, functional programming language.
Addition in response to Leonid:
I don't think that the algorithm can account for all the time difference. Much more often than not, I think, both tests will be applied anyway. The difference between Do and For is measurable, however.
cf1 = Compile[{{list, _Real, 1}},
Module[{sum},
sum = 0.0;
Do[sum = sum + list[[i]]^2,
{i, Length[list]}];
sum]];
cf2 = Compile[{{list, _Real, 1}},
Module[{sum, i},
sum = 0.0;
For[i = 1, i <= Length[list],
i = i + 1, sum = sum + list[[i]]^2];
sum]];
v = RandomReal[{0, 1}, {10000000}];
First /@ {Timing[cf1[v]], Timing[cf2[v]]}
{0.685562, 0.898232}
I think this is as fast as it gets, within Mathematica programming practices. The only way I see to attempt to make it faster within mma is to use Compile with the C compilation target, as follows:
getMinMax =
Compile[{{lst, _Real, 1}},
Module[{i = 1, min = 0., max = 0.},
For[i = 1, i <= Length[lst], i++,
If[min > lst[[i]], min = lst[[i]]];
If[max < lst[[i]], max = lst[[i]]];];
{min, max}], CompilationTarget -> "C", RuntimeOptions -> "Speed"]
However, even this seems to be somewhat slower than your code:
In[61]:= tst = RandomReal[{-10^7,10^7},10^6];
In[62]:= Do[getMinMax[tst],{100}]//Timing
Out[62]= {0.547,Null}
In[63]:= Do[{Min@#,Max@#}&[tst],{100}]//Timing
Out[63]= {0.484,Null}
You probably can write the function entirely in C, and then compile and load as dll - you may eliminate some overhead this way, but I doubt that you will win more than a few percents - not worth the effort IMO.
EDIT
It is interesting that you may significantly increase the speed of the compiled solution with manual common subexpression elimination (lst[[i]] here):
getMinMax =
Compile[{{lst, _Real, 1}},
Module[{i = 1, min = 0., max = 0., temp},
While[i <= Length[lst],
temp = lst[[i++]];
If[min > temp, min = temp];
If[max < temp, max = temp];];
{min, max}], CompilationTarget -> "C", RuntimeOptions -> "Speed"]
is a little faster than {Min@#, Max@#}&.
For an array, you could do the simplest functional thing and use
Fold[{Min[##], Max[##]} &, First@#, Rest@#] &@ data
Unfortunately, it is no speed demon. Even for short lists (5 elements), all of Leonid's answers and Mark's answer are at least 7 times faster, uncompiled, in v7. For long lists (25000 elements), this gets worse, with Mark's being 19.6 times faster, yet even at this length this solution took only about 0.05 seconds to run.
However, I'd not count out {Min[#], Max[#]}& as an option. Uncompiled, it was 1.7 times faster than Mark's for short lists and nearly 15 times faster for long lists (8 times and nearly 300 times faster, respectively, than the Fold solution).
Unfortunately, I could not get good numbers for the compiled versions of either {Min[#], Max[#]}&, Leonid's, or Mark's answers; instead I got numerous incomprehensible error messages. In fact, {Min[#], Max[#]}& increased in execution time. The Fold solution improved dramatically, though, and took twice as long as the uncompiled times of Leonid's answers.
Edit: for the curious, here are some timing measurements of the uncompiled functions:
Each function was used on 100 lists of the length specified on the horizontal axis, and the average time, in seconds, is on the vertical axis. In ascending order of time, the curves are {Min[#], Max[#]}&, Mark's answer, Leonid's second answer, Leonid's first answer, and the Fold method from above.
For all of you who are doing timings, I'd like to warn you that the order of execution is extremely important. For instance, look at the following two subtly different timing tests:
(1)
res =
Table[
a = RandomReal[{0, 100}, 10^8];
{
Min[a] // AbsoluteTiming // First, Max[a] // AbsoluteTiming // First,
Max[a] // AbsoluteTiming // First, Min[a] // AbsoluteTiming // First
}
, {100}
]
The odd man out here is the last Min
(2)
res =
Table[
a = RandomReal[{0, 100}, 10^8];
{
Max[a] // AbsoluteTiming // First, Min[a] // AbsoluteTiming // First,
Min[a] // AbsoluteTiming // First, Max[a] // AbsoluteTiming // First
}
, {100}
]
Here, the highest timing is found for the first Max, the second-highest for the second Max, and the two Mins are about the same and lowest. Actually, I'd expect Max and Min to take about the same time, but they don't. The former seems to take 50% more time than the latter. Having the pole position also seems to come with a 50% handicap.
Now a comparison with the algorithms given by Mark and Leonid:
res =
Table[
a = RandomReal[{0, 100}, 10^8];
{
{Max[a], Min[a]} // AbsoluteTiming // First,
{Min@#, Max@#} &@a // AbsoluteTiming // First,
getMinMax[a] // AbsoluteTiming // First,
minMax[a] // AbsoluteTiming // First,
{Min[a], Max[a]} // AbsoluteTiming // First
}
, {100}
]
Here we find about 0.3 s for {Max[a], Min[a]} (which includes the pole-position handicap); the 0.1 s level is Mark's method; the others are all about the same.