Often I have written: {Min##, Max##} &
Yet this seems inefficient, as the expression must be scanned twice, once to find the minimum value, and once to find the maximum value. Is there a faster way to do this? The expression is often a tensor or array.
This beats it by a bit.
minMax = Compile[{{list, _Integer, 1}},
Module[{currentMin, currentMax},
currentMin = currentMax = First[list];
Do[
Which[
x < currentMin, currentMin = x,
x > currentMax, currentMax = x],
{x, list}];
{currentMin, currentMax}],
CompilationTarget -> "C",
RuntimeOptions -> "Speed"];
v = RandomInteger[{0, 1000000000}, {10000000}];
minMax[v] // Timing
I think it's a little faster than Leonid's version because Do is a bit faster than For, even in compiled code.
Ultimately, though, this is an example of the kind of performance hit you take when using a high level, functional programming language.
Addition in response to Leonid:
I don't think that the algorithm can account for all the time difference. Much more often than not, I think, both tests will be applied anyway. The difference between Do and For is measurable, however.
cf1 = Compile[{{list, _Real, 1}},
Module[{sum},
sum = 0.0;
Do[sum = sum + list[[i]]^2,
{i, Length[list]}];
sum]];
cf2 = Compile[{{list, _Real, 1}},
Module[{sum, i},
sum = 0.0;
For[i = 1, i <= Length[list],
i = i + 1, sum = sum + list[[i]]^2];
sum]];
v = RandomReal[{0, 1}, {10000000}];
First /#{Timing[cf1[v]], Timing[cf2[v]]}
{0.685562, 0.898232}
I think this is as fast as it gets, within Mathematica programming practices. The only way I see to attempt to make it faster within mma is to use Compile with the C compilation target, as follows:
getMinMax =
Compile[{{lst, _Real, 1}},
Module[{i = 1, min = 0., max = 0.},
For[i = 1, i <= Length[lst], i++,
If[min > lst[[i]], min = lst[[i]]];
If[max < lst[[i]], max = lst[[i]]];];
{min, max}], CompilationTarget -> "C", RuntimeOptions -> "Speed"]
However, even this seems to be somewhat slower than your code:
In[61]:= tst = RandomReal[{-10^7,10^7},10^6];
In[62]:= Do[getMinMax[tst],{100}]//Timing
Out[62]= {0.547,Null}
In[63]:= Do[{Min##,Max##}&[tst],{100}]//Timing
Out[63]= {0.484,Null}
You probably can write the function entirely in C, and then compile and load as dll - you may eliminate some overhead this way, but I doubt that you will win more than a few percents - not worth the effort IMO.
EDIT
It is interesting that you may significantly increase the speed of the compiled solution with manual common subexpression elimination (lst[[i]] here):
getMinMax =
Compile[{{lst, _Real, 1}},
Module[{i = 1, min = 0., max = 0., temp},
While[i <= Length[lst],
temp = lst[[i++]];
If[min > temp, min = temp];
If[max < temp, max = temp];];
{min, max}], CompilationTarget -> "C", RuntimeOptions -> "Speed"]
is a little faster than {Min##,Max##}.
For an array, you could do the simplest functional thing and use
Fold[{Min[##],Max[##]}&, First##, Rest##]& # data
Unfortunately, it is no speed demon. Even for short lists, 5 elements, all of Leonid's answers and Mark's answer are at least 7 times faster, uncompiled in v.7. For long lists, 25000 elements, this gets worse with Mark's being 19.6 times faster, yet even at this length this solution took only about 0.05 secs to run.
However, I'd not count out {Min[#], Max[#]}& as an option. Uncompiled it was 1.7 times faster than Mark's for short list and nearly 15 times faster for long lists (8 times and nearly 300 times faster, respectively, than the Fold solution).
Unfortunately, I could not get good numbers for the compiled versions of either {Min[#], Max[#]}&, Leonid's, or Mark's answers, instead I got numerous incomprehensible error messages. In fact, {Min[#], Max[#]}& increased in execution time. The Fold solution improved dramatically, though, and took twice as long as Leonid's answers' uncompiled times.
Edit: for the curious, here's some timing measurements of the uncompiled functions -
Each function was used on 100 lists of the length specified on the horizontal axis and the average time, in seconds, is the vertical axis. In ascending order of time, the curves are {Min[#], Max[#]}&, Mark's answer, Leonid's second answer, Leonid's first answer, and the Fold method from above.
For all of you who are doing timings I'd like to warn you that order of execution is extremely important. For instance, look at the following two subtly different timings tests:
(1)
res =
Table[
a = RandomReal[{0, 100}, 10^8];
{
Min[a] // AbsoluteTiming // First, Max[a] // AbsoluteTiming // First,
Max[a] // AbsoluteTiming // First, Min[a] // AbsoluteTiming // First
}
, {100}
]
The odd man out here is the last Min
(2)
res =
Table[
a = RandomReal[{0, 100}, 10^8];
{
Max[a] // AbsoluteTiming // First, Min[a] // AbsoluteTiming // First,
Min[a] // AbsoluteTiming // First, Max[a] // AbsoluteTiming // First
}
, {100}
]
Here, the highest timing is found for the first Max, the second-highest for the second max and the two Mins are about the same and lowest. Actually, I'd expect Max and Min to take about the same time,but they don't. The former seems to take 50% more time than the latter. Having the pole position also seems to come with a 50% handicap.
Now a comparison with the algorithms given by Mark and Leonid:
res =
Table[
a = RandomReal[{0, 100}, 10^8];
{
{Max[a], Min[a]} // AbsoluteTiming // First,
{Min##, Max##} &#a // AbsoluteTiming // First,
getMinMax[a] // AbsoluteTiming // First,
minMax[a] // AbsoluteTiming // First,
{Min[a], Max[a]} // AbsoluteTiming // First
}
, {100}
]
Here we find about .3 s for the {Max[a], Min[a]} (which includes the pole position handicap), the .1 level is for Mark's method; the others are all about the same.
Related
I am looking for some help with a Poisson Solver I am writing in Mathematica. The code is quite long with Arrays plugged in, but the full details can be found at http://pastebin.com/uSrSDcW6
I am calculating voltages given charge densities using the central difference method derived from Poisson's Eqn. After calculating the voltage, I test the data set for convergence. I am setting convergence thresholds on the order of 10^-1000+. I have the loop set up to kick out after 10000 iterations incase something goes awry, as a fail safe. I have a loop counter in place for sanity. The program seems to run fine as long as the convergence threshold is set to 10^-100.
My question is this: No matter what I update the threshold too, ex, 10^-100, 10^-150, the computation stops after 633 iterations and kicks out of the loop. I would appreciate any help with this, I am completely stuck. I've added comments to the program that should be explanatory for anyone on this forum. Again, I know this description is limited, so please see the attached url http://pastebin.com/uSrSDcW6 for the full program.
*Update10/9/12***I've isolated my issue down to the 16 digit machine precision. I need to open that up to my machine max precision of 10^309. Mathematica Help is sparse on how to do this. ex "N[MachinePrecision, 50]". Where would I set this in my program to apply it to all computation? Ill paste the loop here if that helps
Vnew / Vold / RHO are 10x10x34 Matrices
Epsilon is a constant
(Initialize ConvergenceLoop to O - This will serve as a fail safe to kick out of the loop if necessary)
ConvergenceLoop = 0;
(Initialize Convergence to zero)
Convergence = 0;
While[Convergence == 0 && ConvergenceLoop < 10000,
(Run through all i,j,k elements,calculating new voltage values)
Do[Vnew[[i]][[j]][[k]] = (1/(2/deltaX^2 + 2/deltaY^2 +
2/deltaZ^2)) *(((Vold[[i + 1]][[j]][[k]] +
Vold[[i - 1]][[j]][[k]])/(deltaX^2)) + ((Vold[[i]][[j + 1]][[k]] +
Vold[[i]][[j - 1]][[k]])/(deltaY^2)) + ((Vold[[i]][[j]][[k + 1]] +
Vold[[i]][[j]][[k - 1]])/(deltaZ^2)) + ((Rho[[i]][[j]][[k]]/Epsilon))), {i, 2, 9}, {j, 2,9}, {k, 2, 33}];
(Assume converged so the loop is triggered when the test hits the first value exceeding the defined convergence threshold)
Convergence = 1;
(This is the convergence test. User defined Convergence threshold)
Do[If[Vold[[i]][[j]][[k]] == 0, Null,
If[(Vnew[[i]][[j]][[k]] - Vold[[i]][[j]][[k]])/Vold[[i]][[j]][[k]] > .0000001, Convergence = 0;
(*This is purely diagnostic. I added a Tracker point to follow the convergence along.
user defined at any element*)
If[i == 5 && j == 5 && k == 10,
Print[ "Tracker Point" (Vnew[[i]][[j]][[k]] -
Vold[[i]][[j]][[k]])/Vold[[i]][[j]][[k]]], Null],Null]], {i, 2, 9}, {j, 2, 9}, {k, 2, 33}];
(Ignore the first iteration until Vnew and Vold are nonzero)
If[ConvergenceLoop < 2, Convergence = 0, Null];
(Forces Vold to evolve with Vnew)
Vold = Vnew;
ConvergenceLoop ++;]
(Added SessionTime for future planning purposes)
If[ConvergenceLoop == 10000,
Print["Convergence Loop Limit Reached. " (SessionTime[]/3600) ],
Print["Convergence Loop Limit Not Reached."]];
(We broke out of the while loop,meaning our data converged,so print the converged values)
If[Convergence == 1,
Print[ ConvergenceLoop "Congratulations Converged!" MatrixForm [Vnew]], Print["Did Not Converge!"]];
Since based on the comments above you have narrowed this to a precision problem as I suspected, please read these:
Funny behaviour when plotting a polynomial of high degree and large coefficients
Global precision setting
Confused by (apparent) inconsistent precision
I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large (~100k elements) data set of the form
T={{x1,y1,z1},{x2,y2,z2},....}
and I want to bin it into a 2D array of around 100x100 bins, with the bin value being given by the sum of the Z values that fall into each bin.
Currently I am iterating through each element of the table, using Select to pick out which bin it is supposed to be in based on lists of bin boundaries, and adding the z value to a list of values occupying that bin. At the end I map Total onto the list of bins, summing their contents (I do this because I sometimes want to do other things, like maximize).
I have tried using Gather and other such functions to do this but the above method was ridiculously faster, though perhaps I am using Gather poorly. Anyway It still takes a few minutes to do the sorting by my method and I feel like Mathematica can do better. Does anyone have a nice efficient algorithm handy?
Here is a method based on Szabolcs's post that is about about an order of magnitude faster.
data = RandomReal[5, {500000, 3}];
(*500k values*)
zvalues = data[[All, 3]];
epsilon = 1*^-10;(*prevent 101 index*)
(*rescale and round (x,y) coordinates to index pairs in the 1..100 range*)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];
res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]},
SparseArray[
gb[[All, 1, 1]] ->
Total[gb[[All, All, 2]], {2}]]]; // AbsoluteTiming
Gives about {2.012217, Null}
AbsoluteTiming[
System`SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}];
res3 = SparseArray[indexes -> zvalues];
System`SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];
]
Gives about {0.195228, Null}
res3 == res2
True
"TreatRepeatedEntries" -> 1 adds duplicate positions up.
I intend to do a rewrite of the code below because of Szabolcs' readability concerns. Until then, know that if your bins are regular, and you can use Round, Floor, or Ceiling (with a second argument) in place of Nearest, the code below will be much faster. On my system, it tests faster than the GatherBy solution also posted.
Assuming I understand your requirements, I propose:
data = RandomReal[100, {75, 3}];
bins = {0, 20, 40, 60, 80, 100};
Reap[
Sow[{#3, #2}, bins ~Nearest~ #] & ### data,
bins,
Reap[Sow[#, bins ~Nearest~ #2] & ### #2, bins, Tr##2 &][[2]] &
][[2]] ~Flatten~ 1 ~Total~ {3} // MatrixForm
Refactored:
f[bins_] := Reap[Sow[{##2}, bins ~Nearest~ #]& ### #, bins, #2][[2]] &
bin2D[data_, X_, Y_] := f[X][data, f[Y][#2, #2~Total~2 &] &] ~Flatten~ 1 ~Total~ {3}
Use:
bin2D[data, xbins, ybins]
Here's my approach:
data = RandomReal[5, {500000, 3}]; (* 500k values *)
zvalues = data[[All, 3]];
epsilon = 1*^-10; (* prevent 101 index *)
(* rescale and round (x,y) coordinates to index pairs in the 1..100 range *)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];
(* approach 1: create bin-matrix first, then fill up elements by adding zvalues *)
res1 = Module[
{result = ConstantArray[0, {100, 100}]},
Do[
AddTo[result[[##]], zvalues[[i]]] & ## indexes[[i]],
{i, Length[indexes]}
];
result
]; // Timing
(* approach 2: gather zvalues by indexes, add them up, convert them to a matrix *)
res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]},
SparseArray[gb[[All, 1, 1]] -> (Total /# gb[[All, All, 2]])]
]; // Timing
res1 == res2
These two approaches (res1 & res2) can handle 100k and 200k elements per second, respectively, on this machine. Is this sufficiently fast, or do you need to run this whole program in a loop?
Here's my approach using the function SelectEquivalents defined in What is in your Mathematica tool bag? which is perfect for a problem like this one.
data = RandomReal[100, {75, 3}];
bins = Range[0, 100, 20];
binMiddles = (Most#bins + Rest#bins)/2;
nearest = Nearest[binMiddles];
SelectEquivalents[
data
,
TagElement -> ({First#nearest[#[[1]]], First#nearest[#[[2]]]} &)
,
TransformElement -> (#[[3]] &)
,
TransformResults -> (Total[#2] &)
,
TagPattern -> Flatten[Outer[List, binMiddles, binMiddles], 1]
,
FinalFunction -> (Partition[Flatten[# /. {} -> 0], Length[binMiddles]] &)
]
If you would want to group according to more than two dimensions you could use in FinalFunction this function to give to the list result the desired dimension (I don't remember where I found it).
InverseFlatten[l_,dimensions_]:= Fold[Partition[#, #2] &, l, Most[Reverse[dimensions]]];
I want to test if a list contains consecutive integers.
consQ[a_] := Module[
{ret = True},
Do[If[i > 1 && a[[i]] != a[[i - 1]] + 1, ret = False; Break[]], {i,
1, Length[a]}]; ret]
Although the function consQ does the job, I wonder if there is a better ( shorter, faster ) method of doing this, preferably using functional programming style.
EDIT:
The function above maps lists with consecutive integers in decreasing sequence to False. I would like to change this to True.
Szablics' solution is probably what I'd do, but here's an alternative:
consQ[a : {___Integer}] := Most[a] + 1 === Rest[a]
consQ[_] := False
Note that these approaches differ in how they handle the empty list.
You could use
consQ[a_List ? (VectorQ[#, IntegerQ]&)] := Union#Differences[a] === {1}
consQ[_] = False
You may want to remove the test for integers if you know that every list you pass to it will only have integers.
EDIT: A little extra: if you use a very old version that doesn't have Differences, or wonder how to implement it,
differences[a_List] := Rest[a] - Most[a]
EDIT 2: The requested change:
consQ[a : {Integer___}] := MatchQ[Union#Differences[a], {1} | {-1} | {}]
consQ[_] = False
This works with both increasing and decreasing sequences, and gives True for a list of size 1 or 0 as well.
More generally, you can test if the list of numbers are equally spaced with something like equallySpacedQ[a_List] := Length#Union#Differences[a] == 1
I think the following is also fast, and comparing reversed lists does not take longer:
a = Range[10^7];
f[a_] := Range[Sequence ## ##, Sign[-#[[1]] + #[[2]]]] &#{a[[1]], a[[-1]]} == a;
Timing[f[a]]
b = Reverse#a;
Timing[f[b]]
Edit
A short test for the fastests solutions so far:
a = Range[2 10^7];
Timing#consQSzab#a
Timing#consQBret#a
Timing#consQBeli#a
(*
{6.5,True}
{0.703,True}
{0.203,True}
*)
I like the solutions by the other two, but I'd be concerned about very long lists. Consider the data
d:dat[n_Integer?Positive]:= d = {1}~Join~Range[1, n]
which has its non-sequential point at the very beginning. Setting consQ1 for Brett's and consQ2 for Szabolcs, I get
AbsoluteTiming[ #[dat[ 10000 ] ]& /# {consQ1, consQ2}
{ {0.000110, False}, {0.001091, False} }
Or, roughly a ten times difference between the two, which stays relatively consistent with multiple trials. This time can be cut in roughly half by short-circuiting the process using NestWhile:
Clear[consQ3]
consQ3[a : {__Integer}] :=
Module[{l = Length[a], i = 1},
NestWhile[# + 1 &, i,
(#2 <= l) && a[[#1]] + 1 == a[[#2]] &,
2] > l
]
which tests each pair and only continues if they return true. The timings
AbsoluteTiming[consQ3[dat[ 10000 ]]]
{0.000059, False}
with
{0.000076, False}
for consQ. So, Brett's answer is fairly close, but occasionally, it will take twice as long.
Edit: I moved the graphs of the timing data to a Community Wiki answer.
Fold can be used in a fairly concise expression that runs very quickly:
consQFold[a_] := (Fold[If[#2 == #1 + 1, #2, Return[False]] &, a[[1]]-1, a]; True)
Pattern-matching can be used to provide a very clear expression of intent at the cost of substantially slower performance:
consQMatch[{___, i_, j_, ___}] /; j - i != 1 := False
consQMatch[_] = True;
Edit
consQFold, as written, works in Mathematica v8.0.4 but not in earlier versions of v8 or v7. To correct this problem, there are a couple of alternatives. The first is to explicitly name the Return point:
consQFold[a_] :=
(Fold[If[#2==#1+1, #2, Return[False,CompoundExpression]] &, a[[1]]-1, a]; True)
The second, as suggested by #Mr.Wizard, is to replace Return with Throw / Catch:
consQFold[a_] :=
Catch[Fold[If[#2 == #1 + 1, #2, Throw[False]]&, a[[1]]-1, a]; True]
Since the timing seems to be rather important. I've moved the comparisons between the various methods to this, Community Wiki, answer.
The data used are simply lists of consecutive integers, with a single non-consecutive point, and they're generated via
d : dat[n_Integer?Positive] := (d = {1}~Join~Range[1, n])
d : dat[n_Integer?Positive, p_Integer?Positive] /; p <= n :=
Range[1, p]~Join~{p}~Join~Range[p + 1, n]
where the first form of dat[n] is equivalent to dat[n, 1]. The timing code is simple:
Clear[consQTiming]
Options[consQTiming] = {
NonConsecutivePoints -> {10, 25, 50, 100, 250,500, 1000}};
consQTiming[fcns__, OptionPattern[]]:=
With[{rnd = RandomInteger[{1, #}, 100]},
With[{fcn = #},
Timing[ fcn[dat[10000, #]] & /# rnd ][[1]]/100
] & /# {fcns}
] & /# OptionValue[NonConsecutivePoints]
It generates 100 random integers between 1 and each of {10, 25, 50, 100, 250, 500, 1000} and dat then uses each of those random numbers as the non-consecutive point in a list 10,000 elements long. Each consQ implementation is then applied to each list produced by dat, and the results are averaged. The plotting function is simply
Clear[PlotConsQTimings]
Options[PlotConsQTimings] = {
NonConsecutivePoints -> {10, 25, 50, 100, 250, 500, 1000}};
PlotConsQTimings[timings : { _?VectorQ ..}, OptionPattern[]] :=
ListLogLogPlot[
Thread[{OptionValue[NonConsecutivePoints], #}] & /# Transpose[timings],
Frame -> True, Joined -> True, PlotMarkers -> Automatic
]
I timed the following functions consQSzabolcs1, consQSzabolcs2, consQBrett, consQRCollyer, consQBelisarius, consQWRFold, consQWRFold2, consQWRFold3, consQWRMatch, and MrWizard's version of consQBelisarius.
In ascending order of the left most timing: consQBelisarius, consQWizard, consQRCollyer, consQBrett, consQSzabolcs1, consQWRMatch, consQSzabolcs2, consQWRFold2, consQWRFold3,and consQWRFold.
Edit: Reran all of the functions with timeAvg (the second one) instead of Timing in consQTiming. I did still average over 100 runs, though. For the most part, there was any change, except for the lowest two where there is some variation from run to run. So, take those two lines with a grain of salt as they're timings are practically identical.
I am now convinced that belisarius is trying to get my goat by writing intentionally convoluted code. :-p
I would write: f = Range[##, Sign[#2 - #]]& ## #[[{1, -1}]] == # &
Also, I believe that WReach probably intended to write something like:
consQFold[a_] :=
Catch[
Fold[If[#2 === # + 1, #2, Throw#False] &, a[[1]] - 1, a];
True
]
This question is in a way a continuation of the question I asked here:Simple way to delete a matrix column in Mathematica to which #belisarius and #Daniel provided very helpful answers.
What I am generally trying to do is to extract from a matrix A specific lines and columns OR what remains after what those specified are removed. So this can be formally writtewn as, find TakeOperator and Drop Operator such that:
TakeOperator[A,{i1,..,ip},{j1,...,jq}]=(A[[ik]][[jl]]) (1<=k<=p, 1<=l<=q) =Table[A[[ik]][[jl]],{k,p},{l,q}]
We note Ic={i'1,...,i'p'}=Complement[{1,...,Length[A]},{i1,...,ip}];Jc={j'1,...,j'q'}=Complement[{1,...,Length[A]},{j1,...,jq}];
DropOperator[A,{i1,..,ip},{j1,...,jq}]=(A[[ik]][[jl]]) (1<=k'<=p', 1<=l'<=q') =Table[A[[ik']][[jl']],{k',p'},{l','q}]
While Table as described above does the trick, it is highly inefficient to use Table in that manner.
Just to give an idea, I took # belisarius example:
In: First#Timing[a = RandomInteger[1000, {5000, 5000}];]
Out:0.218
In:Clear[b,c]
In:First#Timing[
b = Table[
If[i < 100, If[j < 100, a[[i]][[j]], a[[i]][[j + 1]]],
If[j < 100, a[[i + 1]][[j]], a[[i + 1]][[j + 1]]]], {i,
4999}, {j, 4999}]]
Out:140.807
In:First#Timing[c = Drop[a, {100}, {100}]]
Out:0.093
In:c===b
Out:True
Note: With respect to the use of Drop in the earlier post, I thought about using it as well, but when I checked the documentation, there was no suggestion of getting it done the way #belisarius and #daniel suggested. If the documentation could be updated in that direction in future releases, it would be helpful.
Part directly supports lists of indices when slicing arrays. The following definitions exploit that:
takeOperator[a_?MatrixQ, rows_List, cols_List] :=
a[[rows, cols]]
dropOperator[a_?MatrixQ, rows_List, cols_List] :=
a[[##]]& ## complementaryIndices[a, rows, cols]
complementaryIndices[a_?MatrixQ, rows_List, cols_List] :=
Complement ### Transpose # {Range /# Dimensions # a, {rows, cols}}
Example use:
a = RandomInteger[1000, {5000, 5000}];
First # Timing # takeOperator[a, Range[1, 5000, 2], Range[1, 5000, 2]]
(* 0.016 *)
First # Timing # dropOperator[a, Range[1, 5000, 2], Range[1, 5000, 2]]
(* 0.015 *)
You can also use explicit ranges in a way that is fairly efficient. They may provide some more flexibility. Here is your example.
a = RandomInteger[1000, {5000, 5000}];
Timing[b = Drop[a, {101}, {101}];]
Out[66]= {0.041993, Null}
Timing[
c = a[[Join[Range[100], Range[102, 5000]],
Join[Range[100], Range[102, 5000]]]];]
Out[67]= {0.061991, Null}
c == b
Out[62]= True
I would also suggest use of Span except offhand I do not see how to get it to work in this setting.
Daniel Lichtblau
Wolfram Research
I need a way to identify local minima and maxima in time series data with Mathematica. This seems like it should be an easy thing to do, but it gets tricky. I posted this on the MathForum, but thought I might get some additional eyes on it here.
You can find a paper that discusses the problem at: http://www.cs.cmu.edu/~eugene/research/full/compress-series.pdf
I've tried this so far…
Get and format some data:
data = FinancialData["SPY", {"May 1, 2006", "Jan. 21, 2011"}][[All, 2]];
data = data/First#data;
data = Transpose[{Range[Length#data], data}];
Define 2 functions:
First method:
findMinimaMaxima[data_, window_] := With[{k = window},
data[[k + Flatten#Position[Partition[data[[All, 2]], 2 k + 1, 1], x_List /; x[[k + 1]] < Min[Delete[x, k + 1]] || x[[k + 1]] > Max[Delete[x, k + 1]]]]]]
Now another approach, although not as flexible:
findMinimaMaxima2[data_] := data[[Accumulate#(Length[#] & /# Split[Prepend[Sign[Rest#data[[All, 2]] - Most#data[[All, 2]]], 0]])]]
Look at what each the functions does. First findMinimaMaxima2[]:
minmax = findMinimaMaxima2[data];
{Length#data, Length#minmax}
ListLinePlot#minmax
This selects all minima and maxima and results (in this instance) in about a 49% data compression, but it doesn't have the flexibility of expanding the window.
This other method does. A window of 2, yields fewer and arguably more important extrema:
minmax2 = findMinimaMaxima[data, 2];
{Length#data, Length#minmax2}
ListLinePlot#minmax2
But look at what happens when we expand the window to 60:
minmax2 = findMinimaMaxima[data, 60];
ListLinePlot[{data, minmax2}]
Some of the minima and maxima no longer alternate.
Applying findMinimaMaxima2[] to the output of findMinimaMaxima[] gives a workaround...
minmax3 = findMinimaMaxima2[minmax2];
ListLinePlot[{data, minmax2, minmax3}]
, but this seems like a clumsy way to address the problem.
So, the idea of using a fixed window to look left and right doesn't quite do everything one would like. I began thinking about an alternative that could use a range value R (e.g. a percent move up or down) that the function would need to meet or exceed to set the next minima or maxima. Here's my first try:
findMinimaMaxima3[data_, R_] := Module[{d, n, positions},
d = data[[All, 2]];
n = Transpose[{data[[All, 1]], Rest#FoldList[If[(#2 <= #1 + #1*R && #2 >= #1) || (#2 >= #1 - #1* R && #2 <= #1), #1, #2] &, d[[1]], d]}];
n = Sign[Rest#n[[All, 2]] - Most#n[[All, 2]]];
positions = Flatten#Rest[Most[Position[n, Except[0]]]];
data[[positions]]
]
minmax4 = findMinimaMaxima3[data, 0.1];
ListLinePlot[{data, minmax4}]
This too benefits from post processing with findMinimaMaxima2[]
ListLinePlot[{data, findMinimaMaxima2[minmax4]}]
But if you look closely, you see that it misses the extremes if they go beyond the R value in several positions - including the chart's absolute minimum and maximum as well as along the big moves up and down. Changing the R value shows how it misses the top and bottoms even more:
minmax4 = findMinimaMaxima3[data, 0.15];
ListLinePlot[{data, minmax4}]
So, I need to reconsider. Anyone can look at a plot of the data and easily identify the important minima and maxima. It seems harder to get an algorithm to do it. A window and/or an R value seem important to the solution, but neither on their own seems enough (at least not in the approaches above).
Can anyone extend any of the approaches shown or suggest an alternative to identifying the important minima and maxima?
Happy to forward a notebook with all of this code and discussion in it. Let me know if anyone needs it.
Thank you,
Jagra
I suggest to use an iterative approach. The following functions are taken from this post, and while they can be written more concisely without Compile, they'll do the job:
localMinPositionsC =
Compile[{{pts, _Real, 1}},
Module[{result = Table[0, {Length[pts]}], i = 1, ctr = 0},
For[i = 2, i < Length[pts], i++,
If[pts[[i - 1]] > pts[[i]] && pts[[i + 1]] > pts[[i]],
result[[++ctr]] = i]];
Take[result, ctr]]];
localMaxPositionsC =
Compile[{{pts, _Real, 1}},
Module[{result = Table[0, {Length[pts]}], i = 1, ctr = 0},
For[i = 2, i < Length[pts], i++,
If[pts[[i - 1]] < pts[[i]] && pts[[i + 1]] < pts[[i]],
result[[++ctr]] = i]];
Take[result, ctr]]];
Here is your data plot:
dplot = ListLinePlot[data]
Here we plot the mins, which are obtained after 3 iterations:
mins = ListPlot[Nest[#[[localMinPositionsC[#[[All, 2]]]]] &, data, 3],
PlotStyle -> Directive[PointSize[0.015], Red]]
The same for maxima:
maxs = ListPlot[Nest[#[[localMaxPositionsC[#[[All, 2]]]]] &, data, 3],
PlotStyle -> Directive[PointSize[0.015], Green]]
And the resulting plot:
Show[{dplot, mins, maxs}]
You may vary the number of iterations, to get more coarse-grained or finer minima/maxima.
Edit:
actually, I just noticed that a couple of points were still missed by this method, both for the
minima and maxima. So, I suggest it as a starting point, not as a complete solution. Perhaps, you
could analyze minima/maxima, coming from different iterations, and sometimes include those from a "previous", more fine-grained one. Also, the only "physical reason" that this kind of works, is that the nature of the financial data appears to be fractal-like, with several distinctly different scales. Each iteration in the above Nest-s targets a particular scale. This would not work so well for an arbitrary signal.