I have a main function that I am using to fit measured heat capacities to a certain model:
HeatCapacity[a_, t_] :=
 If[t > 1, t,
  (6*a^3/(\[Pi]^2*t)) NIntegrate[
    FermiDirac[a, \[Epsilon], t]*(1 - FermiDirac[a, \[Epsilon], t])*
     (Energy[\[Epsilon], t]^2/t - 0.5*d\[CapitalDelta]2[t]),
    {\[Epsilon], 0, \[Infinity]}, AccuracyGoal -> 5]];
Implicit to this function are repeated calls to another numerically integrated function:
Delta[t_] :=
 Block[{a = Subscript[k, B] Subscript[\[CapitalTheta], D]/Subscript[\[CapitalDelta], 0],
   b = Subscript[\[Alpha], BCS]/2/t},
  Return[
   FindRoot[
      NIntegrate[(1/Sqrt[\[Epsilon]^2 + x^2]) Tanh[b*Sqrt[\[Epsilon]^2 + x^2]],
       {\[Epsilon], 0, a}, AccuracyGoal -> 5] - Log[2 a],
      {x, 0.01, 0.1}][[1, 2]]*1]]
Now, once Delta[t] has been calculated once, it doesn't change, and should in principle not need to be recalculated every time it's called - which is what my current method is doing.
My question is: how can I best optimise my code so that Delta[t] is only calculated once? Would some form of lookup table be required? If so, does this change my requirements for performing the non-linear fit routine (i.e., would I need some kind of discrete nonlinear model fit)?
For completeness, I shall include my full code with all functions used. I realise the Mathematica subscripts etc. don't appear nicely on here, so I can reformat if people prefer.
Cheers
Energy[\[Epsilon]_, t_] :=
 Sqrt[\[Epsilon]^2 + Delta[t]^2]; (* energy spectrum, \[Epsilon] measured wrt Fermi level *)
g[\[Epsilon]_, t_] := Subscript[\[Alpha], BCS] Energy[\[Epsilon], t]/(2 t);
dtop[t_] :=
 NIntegrate[Sech[g[\[Epsilon], t]]^2, {\[Epsilon], 0, \[Infinity]},
  AccuracyGoal -> 5];
dbottom[t_] :=
 NIntegrate[
  t*Sech[g[\[Epsilon], t]]^2/(2 Energy[\[Epsilon], t]^2) -
   t^2 Tanh[g[\[Epsilon], t]]/(Subscript[\[Alpha], BCS] Energy[\[Epsilon], t]^3),
  {\[Epsilon], 0, \[Infinity]}, AccuracyGoal -> 5];
d\[CapitalDelta]2[t_] := dtop[t]/dbottom[t];
FermiDirac[\[Alpha]_, \[Epsilon]_, t_] := (E^(\[Alpha] Energy[\[Epsilon], t]/t) + 1)^(-1);
HeatCapacity[a_, t_] :=
 If[t > 1, t,
  (6*a^3/(\[Pi]^2*t)) NIntegrate[
    FermiDirac[a, \[Epsilon], t]*(1 - FermiDirac[a, \[Epsilon], t])*
     (Energy[\[Epsilon], t]^2/t - 0.5*d\[CapitalDelta]2[t]),
    {\[Epsilon], 0, \[Infinity]}, AccuracyGoal -> 5]];
ScaledHC[\[Gamma]_, Tc_, a_, t_] := \[Gamma] Tc HeatCapacity[a, t/Tc];
result = NonlinearModelFit[datain,
  ScaledHC[gamma, 4.7, alpha, t],
  {{gamma, Subscript[\[Gamma], fit]}, {alpha, Subscript[\[Alpha], fit]}}, t,
  Weights -> (1./err^2.),
  {StepMonitor :>
    Print["Gamma = ", Evaluate[gamma],
     " \!\(\*SubscriptBox[\(T\), \(C\)]\) = ", Evaluate[b],
     " alpha = ", Evaluate[alpha]]}]
You might read a little about the difference between = and :=, sometimes called Set[] and SetDelayed[] (not to be confused with ==, and there is even an ===; they are all different). = evaluates the right-hand side the moment the cell is evaluated, with the current values that all the variables have, and saves that result under the name Delta. It shouldn't evaluate the body of Delta again when the left-hand side is used, even repeatedly, in the future, just as long as you don't manually evaluate that cell again. := simply stores away the form of the right-hand side and will evaluate that each time the left-hand side is used in the future, with the values the variables have at that future time. If your variables won't change, then perhaps = will be enough for you.
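For example, a minimal demonstration of the difference (variable names are mine):

x = 1;
f[t_] = x + t;    (* Set: RHS evaluated now, so f is fixed as 1 + t *)
g[t_] := x + t;   (* SetDelayed: RHS re-evaluated on every call *)
x = 100;
{f[2], g[2]}

(* -> {3, 102} *)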
If you can arrange to have all the variables except t initialized before you then evaluate
Delta[t_]=Block[...]
Then that should evaluate only once. You could verify this by including a diagnostic Print[] inside your Delta function.
You might also investigate whether you really need the Return[] in that. Return[] has been a source of perplexing problems in the past, and if I understand your code correctly it can be eliminated. The *1 might also be discarded, because I can't see what it is doing for you.
If you don't necessarily need to hide the values of a and b then you might even write this as
Delta[t_]=(a=...;b=...;FindRoot[...][[1,2]]);
where you replace each ... with the obvious. The ( and ) will override the precedence of the semicolons and allow you to have compound statements in a single function definition.
You could even do further modifications to the code, but perhaps this is enough for now.
I have not carefully tested all your code after making such modifications (I don't have enough information to do so), so test this carefully.
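As an aside, the standard Mathematica idiom for the lookup table asked about in the question is memoization, where each computed value is saved as a new definition the first time it is requested. A sketch only, since whether it pays off depends on how often the fit re-uses the same t:

Delta[t_] := Delta[t] = Block[...]   (* same body as above; caches one result per distinct t *)

Note this only helps if Delta is called repeatedly with identical arguments; for continuously varying t some interpolation scheme would be needed instead.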
I have not used PackedArray before, but I just started looking into using packed arrays after reading some discussion on them here today.
What I have is lots of large 1D and 2D matrices, all reals, nothing symbolic (it is a finite-difference PDE solver), so I thought that I should take advantage of using PackedArray.
I have an initialization function where I allocate all the data/grids needed. So I went and used ToPackedArray on them. It seems a bit faster, but I need to do more performance testing to better compare speed before and after and also compare RAM usage.
But while I was looking at this, I noticed that some operations in M automatically return lists in PackedArray already, and some do not.
For example, this does not return a packed array
a = Table[RandomReal[], {5}, {5}];
Developer`PackedArrayQ[a]
But this does
a = RandomReal[1, {5, 5}];
Developer`PackedArrayQ[a]
and this does
a = Table[0, {5}, {5}];
b = ListConvolve[ {{0, 1, 0}, {1, 4, 1}, {0, 1, 1}}, a, 1];
Developer`PackedArrayQ[b]
and matrix multiplication also returns its result as a packed array
a = Table[0, {5}, {5}];
b = a.a;
Developer`PackedArrayQ[b]
But element-wise multiplication does not
b = a*a;
Developer`PackedArrayQ[b]
My question: is there a list somewhere which documents which Mathematica commands return packed arrays and which do not (assuming the data meets the requirements, such as all Real, not mixed, no symbolics, etc.)?
Also, a minor question: do you think it would be better to check first whether a list/matrix is already packed before calling ToPackedArray on it? I would think calling ToPackedArray on an already packed list will not cost anything, as the call should return right away.
thanks,
update (1)
Just wanted to mention that I just found that packed arrays are not allowed in a demo CDF; I got an error uploading one that used them. So I had to take all my packing code out. Since I mainly write demos, this topic is now of just academic interest for me. But I wanted to thank everyone for their time and the good answers.
There isn't a comprehensive list. To point out a few things:
Basic operations with packed arrays will tend to remain packed:
In[66]:= a = RandomReal[1, {5, 5}];
In[67]:= Developer`PackedArrayQ /@ {a, a.a, a*a}
Out[67]= {True, True, True}
Note above that my version (8.0.4) doesn't unpack for element-wise multiplication.
Whether a Table will result in a packed array depends on the number of elements:
In[71]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {10}]]
Out[71]= False
In[72]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {11}]]
Out[72]= True
In[73]:= Developer`PackedArrayQ[Table[RandomReal[], {25}, {10}]]
Out[73]= True
On["Packing"] will turn on messages to let you know when things unpack:
In[77]:= On["Packing"]
In[78]:= a = RandomReal[1, 10];
In[79]:= Developer`PackedArrayQ[a]
Out[79]= True
In[80]:= a[[1]] = 0 (* force unpacking due to type mismatch *)
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {10}. >>
Out[80]= 0
Operations that do per-element inspection will usually unpack the array:
In[81]:= a = RandomReal[1, 10];
In[82]:= Position[a, Max[a]]
Developer`FromPackedArray::unpack: Unpacking array in call to Position. >>
Out[82]= {{4}}
The penalty for calling ToPackedArray on an already packed list is small enough that I wouldn't worry about it too much:
In[90]:= a = RandomReal[1, 10^7];
In[91]:= Timing[Do[Identity[a], {10^5}];]
Out[91]= {0.028089, Null}
In[92]:= Timing[Do[Developer`ToPackedArray[a], {10^5}];]
Out[92]= {0.043788, Null}
The frontend prefers packed to unpacked arrays, which can show up when dealing with Dynamic and Manipulate:
In[97]:= Developer`PackedArrayQ[{1}]
Out[97]= False
In[98]:= Dynamic[Developer`PackedArrayQ[{1}]]
Out[98]= True
When looking into performance, focus on cases where large lists are getting unpacked rather than the small ones - unless the small ones are in big loops.
This is just an addendum to Brett's answer:
SystemOptions["CompileOptions"]
will show you the length thresholds above which a function will return a packed array. So if you did need to pack a small list, as an alternative to using Developer`ToPackedArray you could temporarily set a smaller number for one of the compile options, e.g.
SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> 20}]
Note also some differences between functions which, to me at least, don't seem intuitive, so I generally have to test these kinds of things whenever I use them rather than instinctively knowing what will work best:
f = # + 1 &;
g[x_] := x + 1;
data = RandomReal[1, 10^6];
On["Packing"]
Timing[Developer`PackedArrayQ[f /@ data]]
{0.131565, True}
Timing[Developer`PackedArrayQ[g /@ data]]
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {1000000}.
{1.95083, False}
Another addition to Brett's answer: if a list is already a packed array, then ToPackedArray is very fast, since this is checked quite early. Also you might find this valuable:
http://library.wolfram.com/infocenter/Articles/3141/
In general for numerics stuff look for talks from Rob Knapp and/or Mark Sofroniou.
When I develop numerics codes, I write the function and then use On["Packing"] to make sure that everything is packed that needs to be packed.
Concerning Mike's answer: the threshold was introduced because for small arrays there is overhead. Where the threshold lies is hardware-dependent. It might be an idea to write a function that sets these thresholds based on measurements done on the computer.
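A rough sketch of the measurement half of that idea (the helper name and the Infinity setting are my assumptions; the right threshold will vary by machine):

(* time Table at a given length under a given packing threshold *)
tableTime[len_, threshold_] := (
  SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> threshold}];
  First@AbsoluteTiming[Do[Table[RandomReal[], {len}], {10^4}]])

(* compare e.g. tableTime[100, 20] with tableTime[100, Infinity]
   and keep whichever threshold wins at the sizes you actually use *)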
I have a list of numbers. I want to extract from the list runs of numbers that fall inside some band and have some minimum length. For instance, suppose I want to operate on this list:
thisList = {-1.2, -1.8, 1.5, -0.6, -0.8, -0.1, 1.4, -0.3, -0.1, -0.7}
with band=1 and runLength=3. I would like to have
{{-0.6, -0.8, -0.1}, {-0.3, -0.1, -0.7}}
as the result. Right now I'm using
Cases[
Partition[thisList,runLength,1],
x_ /; Abs[Max[x] - Min[x]] < band
]
The main problem is that where runs overlap, I get many copies of the run. For instance, using
thisList = {-1.2, -1.8, 1.5, -0.6, -0.8, -0.1, -0.5, -0.3, -0.1, -0.7}
gives me
{{-0.6, -0.8, -0.1}, {-0.8, -0.1, -0.5}, {-0.1, -0.5, -0.3}, {-0.5, -0.3, -0.1}, {-0.3, -0.1, -0.7}}
where I would rather have
{-0.6, -0.8, -0.1, -0.5, -0.3, -0.1, -0.7}
without doing some hokey reduction of the overlapping result. What's the proper way? It'd be nice if it didn't involve exploding the data using Partition.
EDIT
Apparently, my first solution has at least two serious flaws: it is dead slow and completely impractical for lists larger than 100 elements, and it contains some bug(s) which I haven't been able to identify yet - it sometimes misses some bands. So I will provide two (hopefully correct) and much more efficient alternatives, and I include the flawed one below for anyone interested.
A solution based on linked lists
Here is a solution based on linked lists. It allows us to still use patterns but avoid inefficiencies caused by patterns containing __ or ___ (when repeatedly applied):
ClearAll[toLinkedList];
toLinkedList[x_List] := Fold[{#2, #1} &, {}, Reverse@x]
ClearAll[accumF];
accumF[llFull_List, acc_List, {h_, t_List}, ctr_, max_, min_, band_, rLen_] :=
With[{cmax = Max[max, h], cmin = Min[min, h]},
accumF[llFull, {acc, h}, t, ctr + 1, cmax, cmin, band, rLen] /;
Abs[cmax - cmin] < band];
accumF[llFull_List, acc_List, ll : {h_, _List}, ctr_, _, _, band_, rLen_] /; ctr >= rLen :=
accumF[ll, (Sow[acc]; {}), ll, 0, h, h, band, rLen];
accumF[llFull : {h_, t : {_, _List}}, _List, ll : {head_, _List}, _, _, _, band_, rLen_] :=
accumF[t, {}, t, 0, First@t, First@t, band, rLen];
accumF[llFull_List, acc_List, {}, ctr_, _, _, _, rLen_] /; ctr >= rLen := Sow[acc];
ClearAll[getBandsLL];
getBandsLL[lst_List, runLength_Integer, band_?NumericQ] :=
Block[{$IterationLimit = Infinity},
With[{ll = toLinkedList@lst},
 Map[Flatten,
  If[# === {}, #, First@#] &@
   Reap[
     accumF[ll, {}, ll, 0, First@ll, First@ll, band, runLength]
][[2]]
]
]
];
Here are examples of use:
In[246]:= getBandsLL[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,1.4,-0.3,-0.1,-0.7},3,1]
Out[246]= {{-0.6,-0.8,-0.1},{-0.3,-0.1,-0.7}}
In[247]:= getBandsLL[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7},3,1]
Out[247]= {{-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7}}
The main idea of the function accumF is to traverse the number list (converted to a linked list beforehand) and accumulate a band in another linked list, which is passed to it as a second argument. Once the band condition fails, the accumulated band is memorized using Sow (if it was long enough), and the process starts over with the remaining part of the linked list. The ctr parameter may not be needed if we chose to use Depth[acc] instead.
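To see what accumF actually walks over, here is the linked-list representation for a small input:

toLinkedList[{1, 2, 3}]

(* -> {1, {2, {3, {}}}} : a head plus a nested tail, terminated by {} *)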
There are a few non-obvious things present in the above code. One subtle point is that an attempt to join the two middle rules for accumF into a single rule (they look very similar) and use CompoundExpression (something like If[ctr >= rLen, Sow[acc]; accumF[...]] on the r.h.s.) would lead to a non-tail-recursive accumF (see this answer for a more detailed discussion of this issue; this is also why I put the (Sow[acc]; {}) inside a function call - to avoid a top-level CompoundExpression on the r.h.s.). Another subtle point is that I have to maintain a copy of the linked list containing the elements remaining right after the last successful match was found, since in the case of an unsuccessful sequence I need to roll back to that list minus its first element and start over. This linked list is stored in the first argument of accumF.
Note that passing large linked lists around does not cost much, since what is copied is only the first element (head) and a pointer to the rest (tail). This is the main reason why using linked lists vastly improves performance compared to patterns like {___,x__,right___}: in the latter case, the full sequences x and right are copied. With linked lists, we effectively copy only a few references, and therefore our algorithm behaves roughly as we expect (linearly in the length of the data list here). In this answer, I also mentioned the use of linked lists in such cases as one of the techniques to optimize code (section 3.4).
Compile-based solution
Here is a straightforward but not too elegant function based on Compile, which finds a list of starting and ending band positions in the list:
bandPositions =
Compile[{{lst, _Real, 1}, {runLength, _Integer}, {band, _Real}},
Module[{i = 1, j, currentMin, currentMax,
startEndPos = Table[{0, 0}, {Length[lst]}], ctr = 0},
For[i = 1, i <= Length[lst], i++,
currentMin = currentMax = lst[[i]];
For[j = i + 1, j <= Length[lst], j++,
If[lst[[j]] < currentMin,
currentMin = lst[[j]],
(* else *)
If[lst[[j]] > currentMax,
currentMax = lst[[j]]
]
];
If[Abs[currentMax - currentMin] >= band ,
If[ j - i >= runLength,
startEndPos[[++ctr]] = {i, j - 1}; i = j - 1
];
Break[],
(* else *)
If[j == Length[lst] && j - i >= runLength - 1,
startEndPos[[++ctr]] = {i, j}; i = Length[lst];
Break[];
];
]
]; (* inner For *)
]; (* outer For *)
Take[startEndPos, ctr]], CompilationTarget -> "C"];
This is used in the final function:
getBandsC[lst_List, runLength_Integer, band_?NumericQ] :=
Map[Take[lst, #] &, bandPositions[lst, runLength, band]]
Examples of use:
In[305]:= getBandsC[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,1.4,-0.3,-0.1,-0.7},3,1]
Out[305]= {{-0.6,-0.8,-0.1},{-0.3,-0.1,-0.7}}
In[306]:= getBandsC[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7},3,1]
Out[306]= {{-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7}}
Benchmarks
In[381]:=
largeTest = RandomReal[{-5,5},50000];
(res1 =getBandsLL[largeTest,3,1]);//Timing
(res2 =getBandsC[largeTest,3,1]);//Timing
res1==res2
Out[382]= {1.109,Null}
Out[383]= {0.016,Null}
Out[384]= True
Obviously, if one wants performance, Compile wins hands down. My observation for larger lists is that both solutions have approximately linear complexity in the size of the number list (as they should), with the compiled one roughly 150 times faster on my machine than the one based on linked lists.
Remarks
In fact, both methods encode the same algorithm, although this may not be obvious. The one with recursion and patterns is arguably somewhat more understandable, but that is a matter of opinion.
A simple, but slow and buggy version
Here is the original code that I wrote first to solve this problem. It is based on a rather straightforward use of patterns and repeated rule application. As mentioned, one disadvantage of this method is its very bad performance. This is actually another case against using constructs like {___,x__,y___} in conjunction with repeated rule application for anything longer than a few dozen elements. In the mentioned recommendations for code optimization techniques, this corresponds to section 4.1.
Anyways, here is the code:
If[# === {}, #, First@#] &@
 Reap[thisList //. {
     left___,
     Longest[x__] /; Length[{x}] >= runLength && Abs[Max[{x}] - Min[{x}]] < band,
     right___} :> (Sow[{x}]; {right})][[2]]
It works correctly for both of the original small test lists. It also looks generally correct, but for larger lists it often misses some bands, as can be seen by comparison with the other two methods. I have so far been unable to localize the bug, since the code seems pretty transparent.
I'd try this instead:
thisList /. {___, Longest[a : Repeated[_, {3, Infinity}]], ___} :>
  {a} /; Abs[Max@{a} - Min@{a}] < 1
where Repeated[_, {3, Infinity}] guarantees that you get at least 3 terms, and Longest ensures it gives you the longest run possible. As a function,
Clear[f]
f[list_List, band_, minlen_Integer?Positive] := f[list, band, minlen, Infinity]
f[list_List, band_,
minlen_Integer?Positive, maxlen_?Positive] /; maxlen >= minlen :=
list /. {___, Longest[a : Repeated[_, {minlen, maxlen}]], ___} :> {a} /;
Abs[Max@{a} - Min@{a}] < band
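For example, on the second test list from the question this merges the overlapping runs as desired:

f[{-1.2, -1.8, 1.5, -0.6, -0.8, -0.1, -0.5, -0.3, -0.1, -0.7}, 1, 3]

(* -> {-0.6, -0.8, -0.1, -0.5, -0.3, -0.1, -0.7} *)

Note that /. stops after the first match, so a list containing several separated runs yields only the first of them.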
Very complex answers given. :-) I think I have a simpler approach for you. Define to yourself what similarity means to you, and use GatherBy[] to collect all similar elements, or SplitBy[] to collect "runs" of similar elements (then remove "runs" of minimal unaccepted length, say 1 or 2, via DeleteCases[]).
The harder question is establishing similarity. By your method, 1.2, 0.9, 1.9, 0.8 would group the first three elements but not the last three, yet 0.9 and 0.8 are just as similar, and 1.9 would kick them out of your band. And what about 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5 - where does similarity end?
Also look into ClusteringComponents[] and FindClusters[].
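A minimal sketch of the splitting idea from above, using Split (the pairwise relative of SplitBy) with neighbour-to-neighbour similarity - which, as just discussed, is not quite the same criterion as the question's max-min band:

runs[list_, band_, runLength_] :=
 DeleteCases[Split[list, Abs[#1 - #2] < band &], l_ /; Length[l] < runLength]

runs[{-1.2, -1.8, 1.5, -0.6, -0.8, -0.1, 1.4, -0.3, -0.1, -0.7}, 1, 3]

(* -> {{-0.6, -0.8, -0.1}, {-0.3, -0.1, -0.7}} *)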
I have a linear recurrence problem where the next element relies on more than just the prior value, e.g. the Fibonacci sequence. One method of calculating the nth element is to define it via a function call, e.g.
Fibonacci[0] = 0; Fibonacci[1] = 1;
Fibonacci[n_Integer?Positive] := Fibonacci[n] = Fibonacci[n - 1] + Fibonacci[n - 2]
and for the sequence I'm working with, that is exactly what I do. (The definition is inside of a Module so I don't pollute Global`.) However, I am going to be using this with 2^10 - 2^13 points, so I'm concerned about the extra overhead when I just need the last term and none of the prior elements. I'd like to use Fold to do this, but Fold only passes the immediately prior result, which means it is not directly useful for a general linear recurrence problem.
I'd like a pair of functions to replace Fold and FoldList that pass a specified number of prior sequence elements to the function, i.e.
In[1] := MultiFoldList[f, {1,2}, {3,4,5}] (* for lack of a better name *)
Out[1]:= {1, 2, f[3,2,1], f[4,f[3,2,1],2], f[5,f[4,f[3,2,1],2],f[3,2,1]]}
I had something that did this, but I closed the notebook prior to saving it. So, if I rewrite it on my own, I'll post it.
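(For illustration only, here is a sketch of what such a function might look like for exactly two prior values; the name and the f[next, prev1, prev2] argument order follow the example above:)

multiFoldList[f_, init_List, rest_List] :=
 Module[{state = init},
  Join[init,
   Function[x,
     With[{new = f[x, Last@state, First@state]},
      state = {Last@state, new}; new]] /@ rest]]

multiFoldList[f, {1, 2}, {3, 4, 5}]

(* -> {1, 2, f[3, 2, 1], f[4, f[3, 2, 1], 2], f[5, f[4, f[3, 2, 1], 2], f[3, 2, 1]]} *)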
Edit: as to why I am not using RSolve or MatrixPower to solve this. My specific problem is that I'm performing an n-point Pade approximant to analytically continue a function I only know at a set number of points on the imaginary axis, {z_i}. Part of creating the approximant is to generate a set of coefficients, a_i, via another recurrence relation, which are then fed into the final relationship
A[n+1]== A[n] + (z - z[[n]]) a[[n+1]] A[n-1]
which is not amenable to either RSolve or MatrixPower, at least as far as I can see.
Can RecurrenceTable perform this task for you?
Find the 1000th term in a recurrence depending on two previous values:
In[1]:= RecurrenceTable[{a[n] == a[n - 1] + a[n - 2],
a[1] == a[2] == 1}, a,
{n, {1000}}]
Out[1]= {4346655768693745643568852767504062580256466051737178040248172\
9089536555417949051890403879840079255169295922593080322634775209689623\
2398733224711616429964409065331879382989696499285160037044761377951668\
49228875}
Edit: If your recurrence is defined by a function f[m, n] that doesn't like to get evaluated for non-numeric m and n, then you could use Condition:
In[2]:= f[m_, n_] /; IntegerQ[m] && IntegerQ[n] := m + n
The recurrence table in terms of f:
In[3]:= RecurrenceTable[{a[n] == f[a[n - 1], a[n - 2]],
a[1] == a[2] == 1}, a, {n, {1000}}]
Out[3]= {4346655768693745643568852767504062580256466051737178040248172\
9089536555417949051890403879840079255169295922593080322634775209689623\
2398733224711616429964409065331879382989696499285160037044761377951668\
49228875}
A multiple foldlist can be useful but it would not be an efficient way to get linear recurrences evaluated for large inputs. A couple of alternatives are to use RSolve or matrix powers times a vector of initial values.
Here are these methods applied to an example where the nth term equals the (n-1)th term plus two times the (n-2)th term.
f[n_] = f[n] /. RSolve[{f[n] == f[n - 1] + 2*f[n - 2], f[1] == 1, f[2] == 1},
f[n], n][[1]]
Out[67]= 1/3 (-(-1)^n + 2^n)
f2[n_Integer] := Last[MatrixPower[{{0, 1}, {2, 1}}, n - 2].{1, 1}]
{f[11], f2[11]}
Out[79]= {683, 683}
Daniel Lichtblau
Wolfram Research
Almost a convoluted joke, but you could use a side-effect of NestWhileList
fibo[n_] :=
Module[{i = 1, s = 1},
NestWhileList[ s &, 1, (s = Total[{##}]; ++i < n) &, 2]];
Not too bad performance:
In[153]:= First@Timing@fibo[10000]
Out[153]= 0.235
By changing the last 2 to any integer k, you may pass the last k results to your function (in this case Total[]).
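For instance, summing the last three values instead (a tribonacci-style sequence; the name is mine):

fibo3[n_] :=
  Module[{i = 1, s = 1},
   NestWhileList[s &, 1, (s = Total[{##}]; ++i < n) &, 3]];
fibo3[8]

(* -> {1, 1, 1, 3, 5, 9, 17, 31, 57} *)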
LinearRecurrence and RecurrenceTable are very useful.
For small kernels, the MatrixPower method that Daniel gave is the fastest.
For some problems these may not be applicable, and you may need to roll your own.
I will be using Nest because I believe that is appropriate for this problem, but a similar construct can be used with Fold.
A specific example, the Fibonacci sequence. This may not be the cleanest possible for that, but I believe you will see the utility as I continue.
fib[n_] :=
 First@Nest[{##2, # + #2} & @@ # &, {1, 1}, n - 1]
fib[15]
Fibonacci[15]
Here I use Apply (@@) so that I can address elements with #, #2, etc., rather than #[[1]] etc. I use SlotSequence (##2) to drop the first element from the old list, and Sequence it into the new list at the same time.
If you are going to operate on the entire list at once, then a simple Append[Rest@#, ... may be better. Either method can be easily generalized. For example, a simple linear recurrence implementation is
lr[a_, b_, n_Integer] := First@Nest[Append[Rest@#, a.#] &, b, n - 1]
lr[{1,1}, {1,1}, 15]
(the kernel is in reverse order from the built in LinearRecurrence)
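For example, with Daniel's recurrence f[n] == f[n - 1] + 2 f[n - 2] (note the reversed kernel):

lr[{2, 1}, {1, 1}, 11]
Last@LinearRecurrence[{1, 2}, {1, 1}, 11]

(* both -> 683, matching f[11] above *)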
I was wondering how I could obtain the first element of an (already ordered) list that is greater than a given threshold.
I don't know the list manipulation functions in Mathematica very well; maybe someone can give me a trick to do that efficiently.
Select does what you need, and will be consistent, respecting the pre-existing order of the list:
Select[list, # > threshold &, 1]
For example:
In[1]:= Select[{3, 5, 4, 1}, # > 3 &, 1]
Out[1]= {5}
You can provide whatever threshold or criterion function you need in the second argument.
The third argument specifies that you want only one (i.e., the first) element that matches.
Hope that helps!
Joe correctly states in his answer that one would expect a binary search technique to be faster than Select, which seems to just do a linear search even if the list is sorted:
ClearAll[selectTiming]
selectTiming[length_, iterations_] := Module[
{lst},
lst = Sort[RandomInteger[{0, 100}, length]];
(Do[Select[lst, # == 2 &, 1], {i, 1, iterations}] // Timing //
First)/iterations
]
(I arbitrarily put the threshold at 2 for demonstration purposes).
However, the BinarySearch function in Combinatorica is not appropriate here: it returns an element which matches the requested one, but not necessarily the first (leftmost) one, which is what the question asks for.
To obtain the leftmost element that is larger than a threshold, given an ordered list, we may proceed either recursively:
binSearch[lst_, threshold_] := binSearchRec[lst, threshold, 1, Length@lst]
(*
return position of leftmost element greater than threshold
breaks if the first element is greater than threshold
lst must be sorted
*)
binSearchRec[lst_,threshold_,min_,max_] :=
Module[{i=Floor[(min+max)/2],element},
element=lst[[i]];
Which[
min==max,max,
element <= threshold,binSearchRec[lst,threshold,i+1,max],
(element > threshold) && ( lst[[i-1]] <= threshold ), i,
True, binSearchRec[lst,threshold,min,i-1]
]
]
or iteratively:
binSearchIterative[lst_,threshold_]:=Module[
{min = 1, max = Length@lst, i, element},
While[
min<=max,
i=Floor[(min+max)/2];
element=lst[[i]];
Which[
min==max, Break[],
element<=threshold, min=i+1,
(element>threshold) && (lst[[i-1]] <= threshold), Break[],
True, max=i-1
]
];
i
]
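A quick check on a small sorted list (illustration only):

binSearchIterative[{1, 2, 2, 3, 5, 8}, 2]

(* -> 4, the position of the leftmost element greater than 2 *)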
The recursive approach is clearer but I'll stick to the iterative one.
To test its speed,
ClearAll[binSearchTiming]
binSearchTiming[length_, iterations_] := Module[
{lst},
lst = Sort[RandomInteger[{0, 100}, length]];
(Do[binSearchIterative[lst, 2], {i, 1, iterations}] // Timing //
First)/iterations
]
which produces much faster timings than the Select version, with better scaling behaviour (the timing plots are not reproduced here).
Actually it's not necessary to compile it but I did anyway.
In conclusion, then, don't use Select for long lists.
This concludes my answer. There follow some comments on doing a binary search by hand or via the Combinatorica package.
I compared the speed of a (compiled) short routine to do binary search vs the BinarySearch from Combinatorica. Note that this does not do what the question asks (and neither does BinarySearch from Combinatorica); the code I gave above does.
The binary search may be implemented iteratively as
binarySearch = Compile[{{arg, _Integer}, {list, _Integer, 1}},
Module[{min = 1, max = Length@list,
i, x},
While[
min <= max,
i = Floor[(min + max)/2];
x = list[[i]];
Which[
x == arg, min = max = i; Break[],
x < arg, min = i + 1,
True, max = i - 1
]
];
If[ 0 == max,
0,
max
]
],
CompilationTarget -> "C",
RuntimeOptions -> "Speed"
];
and we can now compare this and BinarySearch from Combinatorica. Note that (a) the list must be sorted, and (b) this will not return the first matching element, but some matching element.
lst = Sort[RandomInteger[{0, 100}, 1000000]];
Let us compare the two binary search routines. Repeating 50000 times:
Needs["Combinatorica`"]
Do[binarySearch[2, lst], {i, 50000}] // Timing
Do[BinarySearch[lst, 2], {i, 50000}] // Timing
(*
{0.073437, Null}
{4.8354, Null}
*)
So the handwritten one is faster. Now since in fact a binary search just visits 6-7 points in the list for these parameters (something like {500000, 250000, 125000, 62500, 31250, 15625, 23437} for instance), clearly the difference is simply overhead; perhaps BinarySearch is more general, for instance, or not compiled.
You might want to look at TakeWhile[] and LengthWhile[] as well.
http://reference.wolfram.com/mathematica/ref/TakeWhile.html
http://reference.wolfram.com/mathematica/ref/LengthWhile.html
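For example, on a sorted list (a sketch; it assumes some element does exceed the threshold):

list = {1, 3, 5, 7};
list[[LengthWhile[list, # <= 4 &] + 1]]

(* -> 5, the first element above the threshold 4 *)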
list /. {___, y_ /; y > 3, ___} :> {y}
For example
{3, 5, 4, 1} /. {___, y_ /; y > 3, ___} :> {y}
{5}
Using Select will solve the problem, but it is a poor solution if you care about efficiency. Select goes over all the elements of the list, and therefore will take time which is linear in the length of the list.
Since you say the list is ordered, it is much better to use BinarySearch, which works in time logarithmic in the size of the list. The expression (edit: I have made a small adjustment, since the previous expression I wrote did not correctly handle recurring elements in the list; another edit: this still doesn't work when the threshold itself appears in the list as a recurring element, see comments):
Floor[BinarySearch[list,threshold]+1]
will give you the index of the desired element. If all the elements are smaller than the threshold, you'll get the length of the list plus one.
p.s. don't forget to call Needs["Combinatorica`"] before using BinarySearch.
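For example (an illustration; when the threshold is absent from the list, Combinatorica's BinarySearch returns a fractional position):

Needs["Combinatorica`"]
list = {1, 3, 5, 7};
list[[Floor[BinarySearch[list, 4] + 1]]]

(* BinarySearch[list, 4] gives 2.5, so the index is 3 and the element is 5 *)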
Just for future reference, starting from v10 you can use SelectFirst.
It has some added niceties, such as returning Missing[] or default values.
From the docs:
SelectFirst[{e1,e2,…}, crit] gives the first ei for which crit[ei] is True, or Missing["NotFound"] if none is found.
SelectFirst[{e1,e2,…}, crit, default] gives default if there is no ei such that crit[ei] is True.
For your specific case, you would use:
SelectFirst[list, # > threshold &]
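For example, with the list used earlier in this thread:

SelectFirst[{3, 5, 4, 1}, # > 3 &]
(* -> 5 *)

SelectFirst[{3, 1}, # > 3 &, None]
(* -> None *)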