Related
I have a list of numbers. I want to extract from the list runs of numbers that fall inside some band and have some minimum length. For instance, suppose the I want to operate on this list:
thisList = {-1.2, -1.8, 1.5, -0.6, -0.8, -0.1, 1.4, -0.3, -0.1, -0.7}
with band=1 and runLength=3. I would like to have
{{-0.6, -0.8, -0.1}, {-0.3, -0.1, -0.7}}
as the result. Right now I'm using
Cases[
Partition[thisList,runLength,1],
x_ /; Abs[Max[x] - Min[x]] < band
]
The main problem is that where runs overlap, I get many copies of the run. For instance, using
thisList = {-1.2, -1.8, 1.5, -0.6, -0.8, -0.1, -0.5, -0.3, -0.1, -0.7}
gives me
{{-0.6, -0.8, -0.1}, {-0.8, -0.1, -0.5}, {-0.1, -0.5, -0.3}, {-0.5, -0.3, -0.1}, {-0.3, -0.1, -0.7}}
where I would rather have
{-0.6, -0.8, -0.1, -0.5, -0.3, -0.1, -0.7}
without doing some hokey reduction of the overlapping result. What's the proper way? It'd be nice if it didn't involve exploding the data using Partition.
EDIT
Apparenty, my first solution has at least two serious flaws: it is dead slow and completely impractical for lists larger than 100 elements, and it contains some bug(s) which I wasn't able to identify yet - it is missing some bands sometimes. So, I will provide two (hopefuly correct) and much more efficient alternatives, and I provide the flawed one below for any one interested.
A solution based on linked lists
Here is a solution based on linked lists. It allows us to still use patterns but avoid inefficiencies caused by patterns containing __ or ___ (when repeatedly applied):
ClearAll[toLinkedList];
toLinkedList[x_List] := Fold[{#2, #1} &, {}, Reverse#x]
ClearAll[accumF];
accumF[llFull_List, acc_List, {h_, t_List}, ctr_, max_, min_, band_, rLen_] :=
With[{cmax = Max[max, h], cmin = Min[min, h]},
accumF[llFull, {acc, h}, t, ctr + 1, cmax, cmin, band, rLen] /;
Abs[cmax - cmin] < band];
accumF[llFull_List, acc_List, ll : {h_, _List}, ctr_, _, _, band_, rLen_] /; ctr >= rLen :=
accumF[ll, (Sow[acc]; {}), ll, 0, h, h, band, rLen];
accumF[llFull : {h_, t : {_, _List}}, _List, ll : {head_, _List}, _, _, _, band_, rLen_] :=
accumF[t, {}, t, 0, First#t, First#t, band, rLen];
accumF[llFull_List, acc_List, {}, ctr_, _, _, _, rLen_] /; ctr >= rLen := Sow[acc];
ClearAll[getBandsLL];
getBandsLL[lst_List, runLength_Integer, band_?NumericQ] :=
Block[{$IterationLimit = Infinity},
With[{ll = toLinkedList#lst},
Map[Flatten,
If[# === {}, #, First##] &#
Reap[
accumF[ll, {}, ll, 0, First#ll, First#ll, band,runLength]
][[2]]
]
]
];
Here are examples of use:
In[246]:= getBandsLL[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,1.4,-0.3,-0.1,-0.7},3,1]
Out[246]= {{-0.6,-0.8,-0.1},{-0.3,-0.1,-0.7}}
In[247]:= getBandsLL[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7},3,1]
Out[247]= {{-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7}}
The main idea of the function accumF is to traverse the number list (converted to a linked list prior to that), and accumulate a band in another linked list, which is passed to it as a second argument. Once the band condition fails, the accumulated band is memorized using Sow (if it was long enough), and the process starts over with the remaining part of the linked list. The ctr parameter may not be needed if we would choose to use Depth[acc] instead.
There are a few non-obvious things present in the above code. One subtle point is that an attempt to join the two middle rules for accumF into a single rule (they look very similar) and use CompoundExpression (something like (If[ctr>=rLen, Sow[acc];accumF[...])) on the r.h.s. would lead to a non-tail-recursive accumF (See this answer for a more detailed discussion of this issue. This is also why I make the (Sow[acc]; {}) line inside a function call - to avoid the top-level CompoundExpression on the r.h.s.). Another subtle point is that I have to maintain a copy of the linked list containing the remaining elements right after the last successful match was found, since in the case of unsuccessful sequence I need to roll back to that list minus its first element, and start over. This linked list is stored in the first argument of accumF.
Note that passing large linked lists does not cost much, since what is copied is only a first element (head) and a pointer to the rest (tail). This is the main reason why using linked lists vastly improves performance, as compared to the case of patterns like {___,x__,right___} - because in the latter case, a full sequences x or right are copied. With linked lists, we effectively only copy a few references, and therefore our algorithms behave roughly as we expect (linearly in the length of the data list here). In this answer, I also mentioned the use of linked lists in such cases as one of the techniques to optimize code (section 3.4).
Compile - based solution
Here is a straightforward but not too elegant function based on Compile, which finds a list of starting and ending bands positions in the list:
bandPositions =
Compile[{{lst, _Real, 1}, {runLength, _Integer}, {band, _Real}},
Module[{i = 1, j, currentMin, currentMax,
startEndPos = Table[{0, 0}, {Length[lst]}], ctr = 0},
For[i = 1, i <= Length[lst], i++,
currentMin = currentMax = lst[[i]];
For[j = i + 1, j <= Length[lst], j++,
If[lst[[j]] < currentMin,
currentMin = lst[[j]],
(* else *)
If[lst[[j]] > currentMax,
currentMax = lst[[j]]
]
];
If[Abs[currentMax - currentMin] >= band ,
If[ j - i >= runLength,
startEndPos[[++ctr]] = {i, j - 1}; i = j - 1
];
Break[],
(* else *)
If[j == Length[lst] && j - i >= runLength - 1,
startEndPos[[++ctr]] = {i, j}; i = Length[lst];
Break[];
];
]
]; (* inner For *)
]; (* outer For *)
Take[startEndPos, ctr]], CompilationTarget -> "C"];
This is used in the final function:
getBandsC[lst_List, runLength_Integer, band_?NumericQ] :=
Map[Take[lst, #] &, bandPositions[lst, runLength, band]]
Examples of use:
In[305]:= getBandsC[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,1.4,-0.3,-0.1,-0.7},3,1]
Out[305]= {{-0.6,-0.8,-0.1},{-0.3,-0.1,-0.7}}
In[306]:= getBandsC[{-1.2,-1.8,1.5,-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7},3,1]
Out[306]= {{-0.6,-0.8,-0.1,-0.5,-0.3,-0.1,-0.7}}
Benchmarks
In[381]:=
largeTest = RandomReal[{-5,5},50000];
(res1 =getBandsLL[largeTest,3,1]);//Timing
(res2 =getBandsC[largeTest,3,1]);//Timing
res1==res2
Out[382]= {1.109,Null}
Out[383]= {0.016,Null}
Out[384]= True
Obviously, if one wants performance, Compile wins hands down. My observations for larger lists are that both solutions have approximately linear complexity with the size of the number list (as they should), with compiled one roughly 150 times faster on my machine than the one based on linked lists.
Remarks
In fact, both methods encode the same algorithm, although this may not be obvious. The one with recursion and patterns is arguably somewhat more understandable, but that is a matter of opinion.
A simple, but slow and buggy version
Here is the original code that I wrote first to solve this problem. This is based on a rather straightforward use of patterns and repeated rule application. As mentioned, one disadvantage of this method is its very bad performance. This is actually another case against using constructs like {___,x__,y___} in conjunction with repeated rule application, for anything longer than a few dozens elements. In the mentioned recommendations for code optimization techniques, this corresponds to the section 4.1.
Anyways, here is the code:
If[# === {}, #, First##] &#
Reap[thisList //. {
left___,
Longest[x__] /;Length[{x}] >= runLength && Abs[Max[{x}] - Min[{x}]] < band,
right___} :> (Sow[{x}]; {right})][[2]]
It works correctly for both of the original small test lists. It also looks generally correct, but for larger lists it often misses some bands, which can be seen by comparison with the other two methods. I wasn't so far able to localize the bug, since the code seems pretty transparent.
I'd try this instead:
thisList /. {___, Longest[a : Repeated[_, {3, Infinity}]], ___} :>
{a} /; Abs[Max#{a} - Min#{a}] < 1
where Repeated[_, {3, Infinity}] guarantees that you get at least 3 terms, and Longest ensures it gives you the longest run possible. As a function,
Clear[f]
f[list_List, band_, minlen_Integer?Positive] := f[list, band, minlen, Infinity]
f[list_List, band_,
minlen_Integer?Positive, maxlen_?Positive] /; maxlen >= minlen :=
list /. {___, Longest[a : Repeated[_, {minlen, maxlen}]], ___} :> {a} /;
Abs[Max#{a} - Min#{a}] < band
Very complex answers given. :-) I think I have a simpler approach for you. Define to yourself what similarity means to you, and use GatherBy[] to collect all similar elements, or SplitBy[] to collect "runs" of similar elements (then remove "runs" of minimal unaccepted length, say 1 or 2, via DeleteCases[]).
Harder question is establishing similarity. By your method 1.2,0.9,1.9,0.8 would group first three elements, but not last three, but 0.9 and 0.8 are just as similar, and 1.9 would kick it out of your band. What about .4,.5,.6,.7,.8,.9,1.0,1.1,1.2,1.3,1.4,1.5 - where does similarity end?
Also look into ClusteringComponents[] and FindClusters[]
Given a set of numbers and a set of binary operations,
what is the fastest way to create random expression trees or exhaustively check every possible combination in Mathematica?
What I am trying to solve is given:
numbers={25,50,75,100,3,6} (* each can ONLY be used ONCE *)
operators={Plus,Subtract,Times,Divide} (* each can be used repeatedly *)
target=99
find expression trees that would evaluate to target.
I have two solutions whose performances I give for the case where expression trees contain exactly 4 of the numbers and 3 operators:
random sample & choice: ~25K trees / second
exhaustive scan: 806400 trees in ~2.15 seconds
(timed on a laptop with: Intel(R) Core(TM)2 Duo CPU T9300 # 2.50GHz, 3GB ram, no parallelization used yet but would be most welcome in answers)
My notebooks are a bit messy at the moment. So I would first love to pose the question and hope for original ideas and answers while I clean up my code for sharing.
Largest possible case is where every expression tree uses up all the (6) numbers and 'Length[numbers]-1' (5) operators.
Performance of methods in the largest case is:
random sample & choice: ~21K trees / second
exhaustive scan: 23224320 trees in ~100 seconds
Also I am using Mathematica 8.0.1 so I am more than all ears if there are any ways to do it in OpenCL or using compiled functions wiht CompilationTarget->"C", etc.
OK, this is not elegant or fast, and it's buggy, but it works (sometimes). It uses a monte carlo method, implementing the metropolis algorithm for a weight function that I (arbitrarily) selected just to see if this would work. This was some time ago for a similar problem; I suppose my mathematica skills have improved as it looks ugly now, but I have no time to fix it at the moment.
Execute this (it looks more reasonable when you paste it into a notebook):
ClearAll[swap];
swap[lst_, {p1_, p2_}] :=
ReplacePart[
lst, {p1 \[Rule] lst\[LeftDoubleBracket]p2\[RightDoubleBracket],
p2 \[Rule] lst\[LeftDoubleBracket]p1\[RightDoubleBracket]}]
ClearAll[evalops];
(*first element of opslst is Identity*)
evalops[opslst_, ord_, nums_] :=
Module[{curval}, curval = First#nums;
Do[curval =
opslst\[LeftDoubleBracket]p\[RightDoubleBracket][curval,
nums\[LeftDoubleBracket]ord\[LeftDoubleBracket]p\
\[RightDoubleBracket]\[RightDoubleBracket]], {p, 2, Length#nums}];
curval]
ClearAll[randomizeOrder];
randomizeOrder[ordlst_] :=
swap[ordlst, RandomInteger[{1, Length#ordlst}, 2]]
ClearAll[randomizeOps];
(*never touch the first element*)
randomizeOps[oplst_, allowedOps_] :=
ReplacePart[
oplst, {RandomInteger[{2, Length#oplst}] \[Rule] RandomChoice[ops]}]
ClearAll[takeMCstep];
takeMCstep[goal_, opslst_, ord_, nums_, allowedops_] :=
Module[{curres, newres, newops, neword, p},
curres = evalops[opslst, ord, nums];
newops = randomizeOps[opslst, allowedops];
neword = randomizeOrder[ord];
newres = evalops[newops, neword, nums];
Switch[Abs[newres - goal],
0, {newops,
neword}, _, (p = Abs[curres - goal]/Abs[newres - goal];
If[RandomReal[] < p, {newops, neword}, {opslst, ord}])]]
then to solve your actual problem, do
ops = {Times, Plus, Subtract, Divide}
nums = {25, 50, 75, 100, 3, 6}
ord = Range[Length#nums]
(*the first element is identity to simplify the logic later*)
oplist = {Identity}~Join~RandomChoice[ops, Length#nums - 1]
out = NestList[
takeMCstep[
99, #\[LeftDoubleBracket]1\[RightDoubleBracket], #\
\[LeftDoubleBracket]2\[RightDoubleBracket], nums, ops] &, {oplist,
ord}, 10000]
and then to see that it worked,
ev = Map[evalops[#\[LeftDoubleBracket]1\[RightDoubleBracket], #\
\[LeftDoubleBracket]2\[RightDoubleBracket], nums] &, out];
ev // Last // N
ev // ListPlot[#, PlotMarkers \[Rule] None] &
giving
thus, it obtained the correct order of operators and numbers after around 2000 tries.
As I said, it's ugly, inefficient, and badly programmed as it was a quick-and-dirty adaptation of a quick-and-dirty hack. If you're interested I can clean up and explain the code.
This was a fun question. Here's my full solution:
ExprEval[nums_, ops_] := Fold[
#2[[1]][#1, #2[[2]]] &,
First#nums,
Transpose[{ops, Rest#nums}]]
SymbolicEval[nums_, ops_] := ExprEval[nums, ToString /# ops]
GetExpression[nums_, ops_, target_] := Select[
Tuples[ops, Length#nums - 1],
(target == ExprEval[nums, #]) &]
Usage example:
nums = {-1, 1, 2, 3};
ops = {Plus, Subtract, Times, Divide};
solutions = GetExpression[nums, ops, 3]
ExprEval[nums, #] & /# solutions
SymbolicEval[nums, #] & /# solutions
Outputs:
{{Plus, Times, Plus}, {Plus, Divide, Plus}, {Subtract, Plus,
Plus}, {Times, Plus, Times}, {Divide, Plus, Times}}
{3, 3, 3, 3, 3}
{"Plus"["Times"["Plus"[-1, 1], 2], 3],
"Plus"["Divide"["Plus"[-1, 1], 2], 3],
"Plus"["Plus"["Subtract"[-1, 1], 2], 3],
"Times"["Plus"["Times"[-1, 1], 2], 3],
"Times"["Plus"["Divide"[-1, 1], 2], 3]}
How it works
The ExprEval function takes in the numbers and operations, and applies them using (I think) RPN:
ExprEval[{1, 2, 3}, {Plus, Times}] == (1 + 2) * 3
It does this by continually folding pairs of numbers using the appropriate operation.
Now that I have a way to evaluate an expression tree, I just needed to generate them. Using Tuples, I'm able to generate all the different operators that I would intersperse between the numbers.
Once you get all possible operations, I used Select to pick out the the ones that evaluate to the target number.
Drawbacks
The solution above is really slow. Generating all the possible tuples is exponential in time. If there are k operations and n numbers, it's on the order of O(k^n).
For n = 10, it took 6 seconds to complete on Win 7 x64, Core i7 860, 12 GB RAM. The timings of the runs match the theoretical time complexity almost exactly:
Red line is the theoretical, blue is experimental. The x-axis is size of the nums input and the y-axis is the time in seconds to enumerate all solutions.
The above solution also solves the problem using a functional programming style. It looks pretty, but the thing also sucks up a butt ton of memory since it's storing the full results at nearly every step.
It doesn't even make use of parallelization, and I'm not entirely certain how you would even parallelize the solution I produced.
Some limitations
Mr. Wizard brought to my attention that this code only solves for only particular set of solutions. Given some input such as {a, b, c, d, e, ... } it only permutes the operators in between the numbers. It doesn't permute the ordering of the numbers. If it were to permute the numbers as well, the time complexity would rise up to O(k^n * n!) where k is the number of operators and n is the length of the input number array.
The following will produce the set of solutions for any permutation of the input numbers and operators:
(* generates a lists of the form
{
{number permutation, {{op order 1}, {op order 2}, ... }
}, ...
}*)
GetAllExpressions[nums_, ops_, target_] :=
ParallelMap[{#, GetExpression[#, ops, target]} &,
Tuples[nums, Length#nums]]
I have the following problem.
I am developing a stochastic simulator which samples configurations of the system at random and stores the statistics of how many times each configuration has been visited at certain time instances. Roughly the code works like this
f[_Integer][{_Integer..}] :=0
...
someplace later in the code, e.g.,
index = get index;
c = get random configuration (i.e. a tuple of integers, say a pair {n1, n2});
f[index][c] = f[index][c] + 1;
which tags that configuration c has occurred once more in the simulation at time instance index.
Once the code has finished there is a list of definitions for f that looks something like this (I typed it by hand just to emphasize the most important parts)
?f
f[1][{1, 2}] = 112
f[1][{3, 4}] = 114
f[2][{1, 6}] = 216
f[2][{2, 7}] = 227
...
f[index][someconfiguration] = some value
...
f[_Integer][{_Integer..}] :=0
Please note that pattern free definitions that come first can be rather sparse. Also one cannot know which values and configurations will be picked.
The problem is to efficiently extract down values for a desired index, for example issue something like
result = ExtractConfigurationsAndOccurences[f, 2]
which should give a list with the structure
result = {list1, list2}
where
list1 = {{1, 6}, {2, 7}} (* the list of configurations that occurred during the simulation*)
list2 = {216, 227} (* how many times each of them occurred *)
The problem is that ExtractConfigurationsAndOccurences should be very fast. The only solution I could come up with was to use SubValues[f] (which gives the full list) and filter it with Cases statement. I realize that this procedure should be avoided at any cost since there will be exponentially many configurations (definitions) to test, which slows down the code considerably.
Is there a natural way in Mathematica to do this in a fast way?
I was hoping that Mathematica would see f[2] as a single head with many down values but using DownValues[f[2]] gives nothing. Also using SubValues[f[2]] results in an error.
This is a complete rewrite of my previous answer. It turns out that in my previous attempts, I overlooked a much simpler method based on a combination of packed arrays and sparse arrays, that is much faster and more memory - efficient than all previous methods (at least in the range of sample sizes where I tested it), while only minimally changing the original SubValues - based approach. Since the question was asked about the most efficient method, I will remove the other ones from the answer (given that they are quite a bit more complex and take a lot of space. Those who would like to see them can inspect past revisions of this answer).
The original SubValues - based approach
We start by introducing a function to generate the test samples of configurations for us. Here it is:
Clear[generateConfigurations];
generateConfigurations[maxIndex_Integer, maxConfX_Integer, maxConfY_Integer,
nconfs_Integer] :=
Transpose[{
RandomInteger[{1, maxIndex}, nconfs],
Transpose[{
RandomInteger[{1, maxConfX}, nconfs],
RandomInteger[{1, maxConfY}, nconfs]
}]}];
We can generate a small sample to illustrate:
In[3]:= sample = generateConfigurations[2,2,2,10]
Out[3]= {{2,{2,1}},{2,{1,1}},{1,{2,1}},{1,{1,2}},{1,{1,2}},
{1,{2,1}},{2,{1,2}},{2,{2,2}},{1,{2,2}},{1,{2,1}}}
We have here only 2 indices, and configurations where both "x" and "y" numbers vary from 1 to 2 only - 10 such configurations.
The following function will help us imitate the accumulation of frequencies for configurations, as we increment SubValues-based counters for repeatedly occurring ones:
Clear[testAccumulate];
testAccumulate[ff_Symbol, data_] :=
Module[{},
ClearAll[ff];
ff[_][_] = 0;
Do[
doSomeStuff;
ff[#1][#2]++ & ## elem;
doSomeMoreStaff;
, {elem, data}]];
The doSomeStuff and doSomeMoreStaff symbols are here to represent some code that might preclude or follow the counting code. The data parameter is supposed to be a list of the form produced by generateConfigurations. For example:
In[6]:=
testAccumulate[ff,sample];
SubValues[ff]
Out[7]= {HoldPattern[ff[1][{1,2}]]:>2,HoldPattern[ff[1][{2,1}]]:>3,
HoldPattern[ff[1][{2,2}]]:>1,HoldPattern[ff[2][{1,1}]]:>1,
HoldPattern[ff[2][{1,2}]]:>1,HoldPattern[ff[2][{2,1}]]:>1,
HoldPattern[ff[2][{2,2}]]:>1,HoldPattern[ff[_][_]]:>0}
The following function will extract the resulting data (indices, configurations and their frequencies) from the list of SubValues:
Clear[getResultingData];
getResultingData[f_Symbol] :=
Transpose[{#[[All, 1, 1, 0, 1]], #[[All, 1, 1, 1]], #[[All, 2]]}] &#
Most#SubValues[f, Sort -> False];
For example:
In[10]:= result = getResultingData[ff]
Out[10]= {{2,{2,1},1},{2,{1,1},1},{1,{2,1},3},{1,{1,2},2},{2,{1,2},1},
{2,{2,2},1},{1,{2,2},1}}
To finish with the data-processing cycle, here is a straightforward function to extract data for a fixed index, based on Select:
Clear[getResultsForFixedIndex];
getResultsForFixedIndex[data_, index_] :=
If[# === {}, {}, Transpose[#]] &[
Select[data, First## == index &][[All, {2, 3}]]];
For our test example,
In[13]:= getResultsForFixedIndex[result,1]
Out[13]= {{{2,1},{1,2},{2,2}},{3,2,1}}
This is presumably close to what #zorank tried, in code.
A faster solution based on packed arrays and sparse arrays
As #zorank noted, this becomes slow for larger sample with more indices and configurations. We will now generate a large sample to illustrate that (note! This requires about 4-5 Gb of RAM, so you may want to reduce the number of configurations if this exceeds the available RAM):
In[14]:=
largeSample = generateConfigurations[20,500,500,5000000];
testAccumulate[ff,largeSample];//Timing
Out[15]= {31.89,Null}
We will now extract the full data from the SubValues of ff:
In[16]:= (largeres = getResultingData[ff]); // Timing
Out[16]= {10.844, Null}
This takes some time, but one has to do this only once. But when we start extracting data for a fixed index, we see that it is quite slow:
In[24]:= getResultsForFixedIndex[largeres,10]//Short//Timing
Out[24]= {2.687,{{{196,26},{53,36},{360,43},{104,144},<<157674>>,{31,305},{240,291},
{256,38},{352,469}},{<<1>>}}}
The main idea we will use here to speed it up is to pack individual lists inside the largeres, those for indices, combinations and frequencies. While the full list can not be packed, those parts individually can:
In[18]:= Timing[
subIndicesPacked = Developer`ToPackedArray[largeres[[All,1]]];
subCombsPacked = Developer`ToPackedArray[largeres[[All,2]]];
subFreqsPacked = Developer`ToPackedArray[largeres[[All,3]]];
]
Out[18]= {1.672,Null}
This also takes some time, but it is a one-time operation again.
The following functions will then be used to extract the results for a fixed index much more efficiently:
Clear[extractPositionFromSparseArray];
extractPositionFromSparseArray[HoldPattern[SparseArray[u___]]] := {u}[[4, 2, 2]]
Clear[getCombinationsAndFrequenciesForIndex];
getCombinationsAndFrequenciesForIndex[packedIndices_, packedCombs_,
packedFreqs_, index_Integer] :=
With[{positions =
extractPositionFromSparseArray[
SparseArray[1 - Unitize[packedIndices - index]]]},
{Extract[packedCombs, positions],Extract[packedFreqs, positions]}];
Now, we have:
In[25]:=
getCombinationsAndFrequenciesForIndex[subIndicesPacked,subCombsPacked,subFreqsPacked,10]
//Short//Timing
Out[25]= {0.094,{{{196,26},{53,36},{360,43},{104,144},<<157674>>,{31,305},{240,291},
{256,38},{352,469}},{<<1>>}}}
We get a 30 times speed-up w.r.t. the naive Select approach.
Some notes on complexity
Note that the second solution is faster because it uses optimized data structures, but its complexity is the same as that of Select- based one, which is, linear in the length of total list of unique combinations for all indices. Therefore, in theory, the previously - discussed solutions based on nested hash-table etc may be asymptotically better. The problem is, that in practice we will probably hit the memory limitations long before that. For the 10 million configurations sample, the above code was still 2-3 times faster than the fastest solution I posted before.
EDIT
The following modification:
Clear[getCombinationsAndFrequenciesForIndex];
getCombinationsAndFrequenciesForIndex[packedIndices_, packedCombs_,
packedFreqs_, index_Integer] :=
With[{positions =
extractPositionFromSparseArray[
SparseArray[Unitize[packedIndices - index], Automatic, 1]]},
{Extract[packedCombs, positions], Extract[packedFreqs, positions]}];
makes the code twice faster still. Moreover, for more sparse indices (say, calling the sample-generation function with parameters like generateConfigurations[2000, 500, 500, 5000000] ), the speed-up with respect to the Select- based function is about 100 times.
I'd probably use SparseArrays here (see update below), but if you insist on using functions and *Values to store and retrieve values an approach would be to have the first part (f[2] etc.) replaced by a symbol you create on the fly like:
Table[Symbol["f" <> IntegerString[i, 10, 3]], {i, 11}]
(* ==> {f001, f002, f003, f004, f005, f006, f007, f008, f009, f010, f011} *)
Symbol["f" <> IntegerString[56, 10, 3]]
(* ==> f056 *)
Symbol["f" <> IntegerString[56, 10, 3]][{3, 4}] = 12;
Symbol["f" <> IntegerString[56, 10, 3]][{23, 18}] = 12;
Symbol["f" <> IntegerString[56, 10, 3]] // Evaluate // DownValues
(* ==> {HoldPattern[f056[{3, 4}]] :> 12, HoldPattern[f056[{23, 18}]] :> 12} *)
f056 // DownValues
(* ==> {HoldPattern[f056[{3, 4}]] :> 12, HoldPattern[f056[{23, 18}]] :> 12} *)
Personally I prefer Leonid's solution, as it's much more elegant but YMMV.
Update
On OP's request, about using SparseArrays:
Large SparseArrays take up a fraction of the size of standard nested lists. We can make f to be a large (100,000 entires) sparse array of sparse arrays:
f = SparseArray[{_} -> 0, 100000];
f // ByteCount
(* ==> 672 *)
(* initialize f with sparse arrays, takes a few seconds with f this large *)
Do[ f[[i]] = SparseArray[{_} -> 0, {100, 110}], {i,100000}] // Timing//First
(* ==> 18.923 *)
(* this takes about 2.5% of the memory that a normal array would take: *)
f // ByteCount
(* ==> 108000040 *)
ConstantArray[0, {100000, 100, 100}] // ByteCount
(* ==> 4000000176 *)
(* counting phase *)
f[[1]][[1, 2]]++;
f[[1]][[1, 2]]++;
f[[1]][[42, 64]]++;
f[[2]][[100, 11]]++;
(* reporting phase *)
f[[1]] // ArrayRules
f[[2]] // ArrayRules
f // ArrayRules
(*
==>{{1, 2} -> 2, {42, 64} -> 1, {_, _} -> 0}
==>{{100, 11} -> 1, {_, _} -> 0}
==>{{1, 1, 2} -> 2, {1, 42, 64} -> 1, {2, 100, 11} -> 1, {_, _, _} -> 0}
*)
As you can see, ArrayRules makes a nice list with contributions and counts. This can be done for each f[i] separately or the whole bunch together (last line).
In some scenarios (depending upon the performance needed to generate the values), the following easy solution using an auxiliary list (f[i,0]) may be useful:
f[_Integer][{_Integer ..}] := 0;
f[_Integer, 0] := Sequence ## {};
Table[
r = RandomInteger[1000, 2];
f[h = RandomInteger[100000]][r] = RandomInteger[10];
f[h, 0] = Union[f[h, 0], {r}];
, {i, 10^6}];
ExtractConfigurationsAndOccurences[f_, i_] := {f[i, 0], f[i][#] & /# f[i, 0]};
Timing#ExtractConfigurationsAndOccurences[f, 10]
Out[252]= {4.05231*10^-15, {{{172, 244}, {206, 115}, {277, 861}, {299,
862}, {316, 194}, {361, 164}, {362, 830}, {451, 306}, {614,
769}, {882, 159}}, {5, 2, 1, 5, 4, 10, 4, 4, 1, 8}}}
Many thanks for everyone on the help provided. I've been thinking a lot about everybody's input and I believe that in the simulation setup the following is the optimal solution:
SetAttributes[linkedList, HoldAllComplete];
temporarySymbols = linkedList[];
SetAttributes[bookmarkSymbol, Listable];
bookmarkSymbol[symbol_]:=
With[{old = temporarySymbols}, temporarySymbols= linkedList[old,symbol]];
registerConfiguration[index_]:=registerConfiguration[index]=
Module[
{
cs = linkedList[],
bookmarkConfiguration,
accumulator
},
(* remember the symbols we generate so we can remove them later *)
bookmarkSymbol[{cs,bookmarkConfiguration,accumulator}];
getCs[index] := List ## Flatten[cs, Infinity, linkedList];
getCsAndFreqs[index] := {getCs[index],accumulator /# getCs[index]};
accumulator[_]=0;
bookmarkConfiguration[c_]:=bookmarkConfiguration[c]=
With[{oldCs=cs}, cs = linkedList[oldCs, c]];
Function[c,
bookmarkConfiguration[c];
accumulator[c]++;
]
]
pattern = Verbatim[RuleDelayed][Verbatim[HoldPattern][HoldPattern[registerConfiguration [_Integer]]],_];
clearSimulationData :=
Block[{symbols},
DownValues[registerConfiguration]=DeleteCases[DownValues[registerConfiguration],pattern];
symbols = List ## Flatten[temporarySymbols, Infinity, linkedList];
(*Print["symbols to purge: ", symbols];*)
ClearAll /# symbols;
temporarySymbols = linkedList[];
]
It is based on Leonid's solution from one of previous posts, appended with belsairus' suggestion to include extra indexing for configurations that have been processed. Previous approaches are adapted so that configurations can be naturally registered and extracted using the same code more or less. This is hitting two flies at once since bookkeeping and retrieval and strongly interrelated.
This approach will work better in the situation when one wants to add simulation data incrementally (all curves are normally noisy so one has to add runs incrementally to obtain good plots). The sparse array approach will work better when data are generated in one go and then analyzed, but I do not remember being personally in such a situation where I had to do that.
Also, I was rather naive thinking that the data extraction and generation could be treated separately. In this particular case it seems one should have both perspectives in mind. I profoundly apologise for bluntly dismissing any previous suggestions in this direction (there were few implicit ones).
There are some open/minor problems that I do not know how to handle, e.g. when clearing the symbols I cannot clear headers like accumulator$164, I can only clean subvalues associated with it. Have not clue why. Also, if With[{oldCs=cs}, cs = linkedList[oldCs, c]]; is changed into something like cs = linkedList[cs, c]]; configurations are not stored. Have no clue either why the second option does not work. But these minor problems are well defined satellite issues that one can address in the future. By and large the problem seems solved by the generous help from all involved.
Many thanks again for all the help.
Regards
Zoran
p.s. There are some timings, but to understand what is going on I will append the code that is used for benchmarking. In brief, idea is to generate lists of configurations and just Map through them by invoking registerConfiguration. This essentially simulates data generation process. Here is the code used for testing:
fillSimulationData[sampleArg_] :=MapIndexed[registerConfiguration[#2[[1]]][#1]&, sampleArg,{2}];
sampleForIndex[index_]:=
Block[{nsamples,min,max},
min = Max[1,Floor[(9/10)maxSamplesPerIndex]];
max = maxSamplesPerIndex;
nsamples = RandomInteger[{min, max}];
RandomInteger[{1,10},{nsamples,ntypes}]
];
generateSample :=
Table[sampleForIndex[index],{index, 1, nindexes}];
measureGetCsTime :=((First # Timing[getCs[#]])& /# Range[1, nindexes]) // Max
measureGetCsAndFreqsTime:=((First # Timing[getCsAndFreqs[#]])& /# Range[1, nindexes]) // Max
reportSampleLength[sampleArg_] := StringForm["Total number of confs = ``, smallest accumulator length ``, largest accumulator length = ``", Sequence## {Total[#],Min[#],Max[#]}& [Length /# sampleArg]]
The first example is relatively modest:
clearSimulationData;
nindexes=100;maxSamplesPerIndex = 1000; ntypes = 2;
largeSample1 = generateSample;
reportSampleLength[largeSample1];
Total number of confs = 94891, smallest accumulator length 900, largest accumulator length = 1000;
First # Timing # fillSimulationData[largeSample1]
gives 1.375 secs which is fast I think.
With[{times = Table[measureGetCsTime, {50}]},
ListPlot[times, Joined -> True, PlotRange -> {0, Max[times]}]]
gives times around 0.016 secs, and
With[{times = Table[measureGetCsAndFreqsTime, {50}]},
ListPlot[times, Joined -> True, PlotRange -> {0, Max[times]}]]
gives same times. Now the real killer
nindexes = 10; maxSamplesPerIndex = 100000; ntypes = 10;
largeSample3 = generateSample;
largeSample3 // Short
{{{2,2,1,5,1,3,7,9,8,2},92061,{3,8,6,4,9,9,7,8,7,2}},8,{{4,10,1,5,9,8,8,10,8,6},95498,{3,8,8}}}
reported as
Total number of confs = 933590, smallest accumulator length 90760, largest accumulator length = 96876
gives generation times of ca 1.969 - 2.016 secs which is unbeliavably fast. I mean this is like going through the gigantic list of ca one million elements and applying a function to each element.
The extraction times for configs and {configs, freqs} are roughly 0.015 and 0.03 secs respectivelly.
To me this is a mind blowing speed I would never expect from Mathematica!
The problem:
I am trying to solve this diffrential equation:
K[x_, x1_] := 1;
NDSolve[{A''[x] == Integrate[K[x, x1] A[x1], {x1, 0, 1}],
A[0] == 0, A'[1] == 1}, A[x], x]
and I'm getting errors (Function::slotn and NDSolve::ndnum)
(it should return a numeric function that is equal to 3/16 x^2 + 5/8 x)
I am looking for a way to solve this differential equation: Is there a way to write it in a better form, such that NDSolve will understand it? Is there another function or package that can help?
Note 1: In my full problem, K[x, x1] is not 1 -- it depends (in a complex way) on x and x1.
Note 2: Naively deriving the two sides of the equation with respect to x won't work, because the integral limits are definite.
My first impression:
It seems that Mathematica doesn't like me referencing a point in A[x] -- the same errors occur when I'm doing this simplified version:
NDSolve[{A''[x] == A[0.5], A[0] == 0, A'[1] == 1}, A[x], x]
(it should return a numeric function that is equal to 2/11 x^2 + 7/11 x)
In this case one can avoid this problem by analytically solving A''[x] == c, and then finding c, but in my first problem it seems to not work -- it only transform the differential equation to an integral one, which (N)DSolve doesn't solve afterwards.
I can suggest a way to reduce your equation to an integral equation, which can be solved numerically by approximating its kernel with a matrix, thereby reducing the integration to matrix multiplication.
First, it is clear that the equation can be integrated twice over x, first from 1 to x, and then from 0 to x, so that:
We can now discretize this equation, putting it on a equidistant grid:
Here, the A[x] becomes a vector, and the integrated kernel iniIntK becomes a matrix, while integration is replaced by a matrix multiplication. The problem is then reduced to a system of linear equations.
The easiest case (that I will consider here) is when the kernel iniIntK can be derived analytically - in this case this method will be quite fast. Here is the function to produce the integrated kernel as a pure function:
Clear[computeDoubleIntK]
computeDoubleIntK[kernelF_] :=
Block[{x, x1},
Function[
Evaluate[
Integrate[
Integrate[kernelF[y, x1], {y, 1, x}] /. x -> y, {y, 0, x}] /.
{x -> #1, x1 -> #2}]]];
In our case:
In[99]:= K[x_,x1_]:=1;
In[100]:= kernel = computeDoubleIntK[K]
Out[100]= -#1+#1^2/2&
Here is the function to produce the kernel matrix and the r.h,s vector:
computeDiscreteKernelMatrixAndRHS[intkernel_, a0_, aprime1_ ,
delta_, interval : {_, _}] :=
Module[{grid, rhs, matrix},
grid = Range[Sequence ## interval, delta];
rhs = a0 + aprime1*grid; (* constant plus a linear term *)
matrix =
IdentityMatrix[Length[grid]] - delta*Outer[intkernel, grid, grid];
{matrix, rhs}]
To give a very rough idea how this may look like (I use here delta = 1/2):
In[101]:= computeDiscreteKernelMatrixAndRHS[kernel,0,1,1/2,{0,1}]
Out[101]= {{{1,0,0},{3/16,19/16,3/16},{1/4,1/4,5/4}},{0,1/2,1}}
We now need to solve the linear equation, and interpolate the result, which is done by the following function:
Clear[computeSolution];
computeSolution[intkernel_, a0_, aprime1_ , delta_, interval : {_, _}] :=
With[{grid = Range[Sequence ## interval, delta]},
Interpolation#Transpose[{
grid,
LinearSolve ##
computeDiscreteKernelMatrixAndRHS[intkernel, a0, aprime1, delta,interval]
}]]
Here I will call it with a delta = 0.1:
In[90]:= solA = computeSolution[kernel,0,1,0.1,{0,1}]
Out[90]= InterpolatingFunction[{{0.,1.}},<>]
We now plot the result vs. the exact analytical solution found by #Sasha, as well as the error:
I intentionally chose delta large enough so the errors are visible. If you chose delta say 0.01, the plots will be visually identical. Of course, the price of taking smaller delta is the need to produce and solve larger matrices.
For kernels that can be obtained analytically, the main bottleneck will be in the LinearSolve, but in practice it is pretty fast (for matrices not too large). When kernels can not be integrated analytically, the main bottleneck will be in computing the kernel in many points (matrix creation. The matrix inverse has a larger asymptotic complexity, but this will start play a role for really large matrices - which are not necessary in this approach, since it can be combined with an iterative one - see below). You will typically define:
intK[x_?NumericQ, x1_?NumericQ] := NIntegrate[K[y, x1], {y, 1, x}]
intIntK[x_?NumericQ, x1_?NumericQ] := NIntegrate[intK[z, x1], {z, 0, x}]
As a way to speed it up in such cases, you can precompute the kernel intK on a grid and then interpolate, and the same for intIntK. This will however introduce additional errors, which you'll have to estimate (account for).
The grid itself needs not be equidistant (I just used it for simplicity), but may (and probably should) be adaptive, and generally non-uniform.
As a final illustration, consider an equation with a non-trivial but symbolically integrable kernel:
In[146]:= sinkern = computeDoubleIntK[50*Sin[Pi/2*(#1-#2)]&]
Out[146]= (100 (2 Sin[1/2 \[Pi] (-#1+#2)]+Sin[(\[Pi] #2)/2]
(-2+\[Pi] #1)))/\[Pi]^2&
In[157]:= solSin = computeSolution[sinkern,0,1,0.01,{0,1}]
Out[157]= InterpolatingFunction[{{0.,1.}},<>]
Here are some checks:
In[163]:= Chop[{solSin[0],solSin'[1]}]
Out[163]= {0,1.}
In[153]:=
diff[x_?NumericQ]:=
solSin''[x] - NIntegrate[50*Sin[Pi/2*(#1-#2)]&[x,x1]*solSin[x1],{x1,0,1}];
In[162]:= diff/#Range[0,1,0.1]
Out[162]= {-0.0675775,-0.0654974,-0.0632056,-0.0593575,-0.0540479,-0.0474074,
-0.0395995,-0.0308166,-0.0212749,-0.0112093,0.000369261}
To conclude, I just want to stress that one has to perform a careful error - estimation analysis for this method, which I did not do.
EDIT
You can also use this method to get the initial approximate solution, and then iteratively improve it using FixedPoint or other means - in this way you will have a relatively fast convergence and will be able to reach the required precision without the need to construct and solve huge matrices.
This is complementary to Leonid Shifrin's approach. We start with a linear function that interpolates the value and first derivative at the starting point. We use that in the integration with the given kernel function. We can then iterate, using each previous approximation in the integrated kernel that is used to make the next approximation.
I show an example below, using a more complicated kernel than just a constant function. I'll take it through two iterations and show tables of discrepancies.
kernel[x_, y_] := Sqrt[x]/(y^2 + 1/5)*Sin[x^2 + y]
intkern[x_?NumericQ, aa_] :=
NIntegrate[kernel[x, y]*aa[y], {y, 0, 1}, MinRecursion -> 2,
AccuracyGoal -> 3]
Clear[a];
a0 = 0;
a1 = 1;
a[0][x_] := a0 + a1*x
soln1 = a[1][x] /.
First[NDSolve[{(a[1]^\[Prime]\[Prime])[x] == intkern[x, a[0], y],
a[1][0] == a0, a[1][1] == a1}, a[1][x], {x, 0, 1}]];
a[1][x_] = soln1;
In[283]:= Table[a[1]''[x] - intkern[x, a[1]], {x, 0., 1, .1}]
Out[283]= {4.336808689942018*10^-19, 0.01145100326794241, \
0.01721655945379122, 0.02313249302884235, 0.02990900241909161, \
0.03778448183557359, 0.04676409320217928, 0.05657128568058478, \
0.06665818935524814, 0.07624149919589895, 0.08412643746245929}
In[285]:=
soln2 = a[2][x] /.
First[NDSolve[{(a[2]^\[Prime]\[Prime])[x] == intkern[x, a[1]],
a[2][0] == a0, a[2][1] == a1}, a[2][x], {x, 0, 1}]];
a[2][x_] = soln2;
In[287]:= Table[a[2]''[x] - intkern[x, a[2]], {x, 0., 1, .1}]
Out[287]= {-2.168404344971009*10^-19, -0.001009606971360516, \
-0.00152476679745811, -0.002045817184941901, -0.002645356229312557, \
-0.003343218015068372, -0.004121109614310836, -0.004977453722712966, \
-0.005846840469889258, -0.006731367269472544, -0.007404971586975062}
So we have errors of less than .01 at this stage. Not too bad. One drawback is that it was fairly slow to get the second approximation. There may be ways to tune NDSolve to improve on that.
This is complementary to Leonid's method for two reasons.
(1) If this did not converge well because the initial linear approximation was not sufficiently close to the true result, one might instead begin with an approximation found by a finite differencing scheme. That would be akin to what he did.
(2) He pretty much indicated this himself, as a method that might follow his and produce refinements.
Daniel Lichtblau
The way your equation is currently written A''[x] == const, and than constant is independent of x. Hence the solution always has the form of quadratic polynomial. Your problem then reduces to a solving for indeterminate coefficients:
In[13]:= A[x_] := a2 x^2 + a1 x + a0;
In[14]:= K[x_, x1_] := 1;
In[16]:= Solve[{A''[x] == Integrate[K[x, x1] A[x1], {x1, 0, 1}],
A[0] == 0, A'[1] == 1}, {a2, a1, a0}]
Out[16]= {{a2 -> 3/16, a1 -> 5/8, a0 -> 0}}
In[17]:= A[x] /. First[%]
Out[17]= (5 x)/8 + (3 x^2)/16
I have a linear recurrence problem where the next element relies on more than just the prior value, e.g. the Fibonacci sequence. One method calculating the nth element is to define it via a function call, e.g.
Fibonacci[0] = 0; Fibonacci[1] = 1;
Fibonacci[n_Integer?Positive] := Fibonacci[n] + Fibonacci[n - 1]
and for the sequence I'm working with, that is exactly what I do. (The definition is inside of a Module so I don't pollute Global`.) However, I am going to be using this with 210 - 213 points, so I'm concerned about the extra overhead when I just need the last term and none of the prior elements. I'd like to use Fold to do this, but Fold only passes the immediately prior result which means it is not directly useful for a general linear recurrence problem.
I'd like a pair of functions to replace Fold and FoldList that pass a specified number of prior sequence elements to the function, i.e.
In[1] := MultiFoldList[f, {1,2}, {3,4,5}] (* for lack of a better name *)
Out[1]:= {1, 2, f[3,2,1], f[4,f[3,2,1],2], f[5,f[4,f[3,2,1],2],f[3,2,1]]}
I had something that did this, but I closed the notebook prior to saving it. So, if I rewrite it on my own, I'll post it.
Edit: as to why I am not using RSolve or MatrixPower to solve this. My specific problem is I'm performing an n-point Pade approximant to analytically continue a function I only know at a set number of points on the imaginary axis, {zi}. Part of creating the approximant is to generate a set of coefficients, ai, which is another recurrence relation, that are then fed into the final relationship
A[n+1]== A[n] + (z - z[[n]]) a[[n+1]] A[n-1]
which is not amenable to either RSolve nor MatrixPower, at least that I can see.
Can RecurrenceTable perform this task for you?
Find the 1000th term in a recurrence depending on two previous values:
In[1]:= RecurrenceTable[{a[n] == a[n - 1] + a[n - 2],
a[1] == a[2] == 1}, a,
{n, {1000}}]
Out[1]= {4346655768693745643568852767504062580256466051737178040248172\
9089536555417949051890403879840079255169295922593080322634775209689623\
2398733224711616429964409065331879382989696499285160037044761377951668\
49228875}
Edit: If your recurrence is defined by a function f[m, n] that doesn't like to get evaluated for non-numeric m and n, then you could use Condition:
In[2]:= f[m_, n_] /; IntegerQ[m] && IntegerQ[n] := m + n
The recurrence table in terms of f:
In[3]:= RecurrenceTable[{a[n] == f[a[n - 1], a[n - 2]],
a[1] == a[2] == 1}, a, {n, {1000}}]
Out[3]= {4346655768693745643568852767504062580256466051737178040248172\
9089536555417949051890403879840079255169295922593080322634775209689623\
2398733224711616429964409065331879382989696499285160037044761377951668\
49228875}
A multiple foldlist can be useful but it would not be an efficient way to get linear recurrences evaluated for large inputs. A couple of alternatives are to use RSolve or matrix powers times a vector of initial values.
Here are these methods applied to example if nth term equal to n-1 term plus two times n-2 term.
f[n_] = f[n] /. RSolve[{f[n] == f[n - 1] + 2*f[n - 2], f[1] == 1, f[2] == 1},
f[n], n][[1]]
Out[67]= 1/3 (-(-1)^n + 2^n)
f2[n_Integer] := Last[MatrixPower[{{0, 1}, {2, 1}}, n - 2].{1, 1}]
{f[11], f2[11]}
Out[79]= {683, 683}
Daniel Lichtblau
Wolfram Research
Almost a convoluted joke, but you could use a side-effect of NestWhileList
fibo[n_] :=
Module[{i = 1, s = 1},
NestWhileList[ s &, 1, (s = Total[{##}]; ++i < n) &, 2]];
Not too bad performance:
In[153]:= First#Timing#fibo[10000]
Out[153]= 0.235
By changing the last 2 by any integer you may pass the last k results to your function (in this case Total[]).
LinearRecurrence and RecurrenceTable are very useful.
For small kernels, the MatrixPower method that Daniel gave is the fastest.
For some problems these may not be applicable, and you may need to roll your own.
I will be using Nest because I believe that is appropriate for this problem, but a similar construct can be used with Fold.
A specific example, the Fibonacci sequence. This may not be the cleanest possible for that, but I believe you will see the utility as I continue.
fib[n_] :=
First#Nest[{##2, # + #2} & ## # &, {1, 1}, n - 1]
fib[15]
Fibonacci[15]
Here I use Apply (##) so that I can address elements with #, #2, etc., rathern than #[[1]] etc. I use SlotSequence to drop the first element from the old list, and Sequence it into the new list at the same time.
If you are going to operate on the entire list at once, then a simple Append[Rest##, ... may be better. Either method can be easily generalized. For example, a simple linear recurrence implementation is
lr[a_, b_, n_Integer] := First#Nest[Append[Rest##, a.#] &, b, n - 1]
lr[{1,1}, {1,1}, 15]
(the kernel is in reverse order from the built in LinearRecurrence)