Suppose I have two very large lists {a1, a2, …} and {b1, b2, …} where all ai and bj are large sparse arrays. For the sake of memory efficiency I store each list as one comprehensive sparse array.
Now I would like to compute some function f on all possible pairs of ai and bj where each result f[ai, bj] is a sparse array again. All these sparse arrays have the same dimensions, by the way.
While
Flatten[Outer[f, {a1, a2, ...}, {b1, b2, ...}, 1], 1]
returns the desired result (in principle) it appears to consume excessive amounts of memory. Not the least because the return value is a list of sparse arrays whereas one comprehensive sparse array turns out much more efficient in my cases of interest.
Is there an efficient alternative to the above use of Outer?
More specific example:
{SparseArray[{{1, 1, 1, 1} -> 1, {2, 2, 2, 2} -> 1}],
SparseArray[{{1, 1, 1, 2} -> 1, {2, 2, 2, 1} -> 1}],
SparseArray[{{1, 1, 2, 1} -> 1, {2, 2, 1, 2} -> 1}],
SparseArray[{{1, 1, 2, 2} -> -1, {2, 2, 1, 1} -> 1}],
SparseArray[{{1, 2, 1, 1} -> 1, {2, 1, 2, 2} -> 1}],
SparseArray[{{1, 2, 1, 2} -> 1, {2, 1, 2, 1} -> 1}],
SparseArray[{{1, 2, 2, 1} -> -1, {2, 1, 1, 2} -> 1}],
SparseArray[{{1, 2, 2, 2} -> 1, {2, 1, 1, 1} -> 1}]};
ByteCount[%]
list = SparseArray[%%]
ByteCount[%]
Flatten[Outer[Dot, list, list, 1], 1];
ByteCount[%]
list1x2 = SparseArray[%%]
ByteCount[%]
Flatten[Outer[Dot, list1x2, list, 1], 1];
ByteCount[%]
list1x3 = SparseArray[%%]
ByteCount[%]
etc. Not only are the raw intermediate results of Outer (lists of sparse arrays) extremely inefficient, Outer seems to consume way too much memory during the computation itself, too.
I will propose a solution which is rather complex but allows one to only use about twice as much memory during the computation as is needed to store the final result as a SparseArray. The price to pay for this will be a much slower execution.
The code
Sparse array construction / deconstruction API
Here is the code. First, a slightly modified (to address higher-dimensional sparse arrays) sparse array construction - deconstruction API, taken from this answer:
ClearAll[spart, getIC, getJR, getSparseData, getDefaultElement,
makeSparseArray];
HoldPattern[spart[SparseArray[s___], p_]] := {s}[[p]];
getIC[s_SparseArray] := spart[s, 4][[2, 1]];
getJR[s_SparseArray] := spart[s, 4][[2, 2]];
getSparseData[s_SparseArray] := spart[s, 4][[3]];
getDefaultElement[s_SparseArray] := spart[s, 3];
makeSparseArray[dims_List, jc_List, ir_List, data_List, defElem_: 0] :=
SparseArray ## {Automatic, dims, defElem, {1, {jc, ir}, data}};
Iterators
The following functions produce iterators. Iterators are a good way to encapsulate the iteration process.
ClearAll[makeTwoListIterator];
makeTwoListIterator[fname_Symbol, a_List, b_List] :=
With[{indices = Flatten[Outer[List, a, b, 1], 1]},
With[{len = Length[indices]},
Module[{i = 0},
ClearAll[fname];
fname[] := With[{ind = ++i}, indices[[ind]] /; ind <= len];
fname[] := Null;
fname[n_] :=
With[{ind = i + 1}, i += n;
indices[[ind ;; Min[len, ind + n - 1]]] /; ind <= len];
fname[n_] := Null;
]]];
Note that I could have implemented the above function more memory - efficiently and not use Outer in it, but for our purposes this won't be the major concern.
Here is a more specialized version, which produces interators for pairs of 2-dimensional indices.
ClearAll[make2DIndexInterator];
make2DIndexInterator[fname_Symbol, i : {iStart_, iEnd_}, j : {jStart_, jEnd_}] :=
makeTwoListIterator[fname, Range ## i, Range ## j];
make2DIndexInterator[fname_Symbol, ilen_Integer, jlen_Integer] :=
make2DIndexInterator[fname, {1, ilen}, {1, jlen}];
Here is how this works:
In[14]:=
makeTwoListIterator[next,{a,b,c},{d,e}];
next[]
next[]
next[]
Out[15]= {a,d}
Out[16]= {a,e}
Out[17]= {b,d}
We can also use this to get batch results:
In[18]:=
makeTwoListIterator[next,{a,b,c},{d,e}];
next[2]
next[2]
Out[19]= {{a,d},{a,e}}
Out[20]= {{b,d},{b,e}}
, and we will be using this second form.
SparseArray - building function
This function will build a SparseArray object iteratively, by getting chunks of data (also in SparseArray form) and gluing them together. It is basically code used in this answer, packaged into a function. It accepts the code piece used to produce the next chunk of data, wrapped in Hold (I could alternatively make it HoldAll)
Clear[accumulateSparseArray];
accumulateSparseArray[Hold[getDataChunkCode_]] :=
Module[{start, ic, jr, sparseData, dims, dataChunk},
start = getDataChunkCode;
ic = getIC[start];
jr = getJR[start];
sparseData = getSparseData[start];
dims = Dimensions[start];
While[True, dataChunk = getDataChunkCode;
If[dataChunk === {}, Break[]];
ic = Join[ic, Rest#getIC[dataChunk] + Last#ic];
jr = Join[jr, getJR[dataChunk]];
sparseData = Join[sparseData, getSparseData[dataChunk]];
dims[[1]] += First[Dimensions[dataChunk]];
];
makeSparseArray[dims, ic, jr, sparseData]];
Putting it all together
This function is the main one, putting it all together:
ClearAll[sparseArrayOuter];
sparseArrayOuter[f_, a_SparseArray, b_SparseArray, chunkSize_: 100] :=
Module[{next, wrapperF, getDataChunkCode},
make2DIndexInterator[next, Length#a, Length#b];
wrapperF[x_List, y_List] := SparseArray[f ### Transpose[{x, y}]];
getDataChunkCode :=
With[{inds = next[chunkSize]},
If[inds === Null, Return[{}]];
wrapperF[a[[#]] & /# inds[[All, 1]], b[[#]] & /# inds[[All, -1]]]
];
accumulateSparseArray[Hold[getDataChunkCode]]
];
Here, we first produce the iterator which will give us on demand portions of index pair list, used to extract the elements (also SparseArrays). Note that we will generally extract more than one pair of elements from two large input SparseArray-s at a time, to speed up the code. How many pairs we process at once is governed by the optional chunkSize parameter, which defaults to 100. We then construct the code to process these elements and put the result back into SparseArray, where we use an auxiliary function wrapperF. The use of iterators wasn't absolutely necessary (could use Reap-Sow instead, as with other answers), but allowed me to decouple the logic of iteration from the logic of generic accumulation of sparse arrays.
Benchmarks
First we prepare large sparse arrays and test our functionality:
In[49]:=
arr = {SparseArray[{{1,1,1,1}->1,{2,2,2,2}->1}],SparseArray[{{1,1,1,2}->1,{2,2,2,1}->1}],
SparseArray[{{1,1,2,1}->1,{2,2,1,2}->1}],SparseArray[{{1,1,2,2}->-1,{2,2,1,1}->1}],
SparseArray[{{1,2,1,1}->1,{2,1,2,2}->1}],SparseArray[{{1,2,1,2}->1,{2,1,2,1}->1}]};
In[50]:= list=SparseArray[arr]
Out[50]= SparseArray[<12>,{6,2,2,2,2}]
In[51]:= larger = sparseArrayOuter[Dot,list,list]
Out[51]= SparseArray[<72>,{36,2,2,2,2,2,2}]
In[52]:= (large= sparseArrayOuter[Dot,larger,larger])//Timing
Out[52]= {0.047,SparseArray[<2592>,{1296,2,2,2,2,2,2,2,2,2,2}]}
In[53]:= SparseArray[Flatten[Outer[Dot,larger,larger,1],1]]==large
Out[53]= True
In[54]:= MaxMemoryUsed[]
Out[54]= 21347336
Now we do the power tests
In[55]:= (huge= sparseArrayOuter[Dot,large,large,2000])//Timing
Out[55]= {114.344,SparseArray[<3359232>,{1679616,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}]}
In[56]:= MaxMemoryUsed[]
Out[56]= 536941120
In[57]:= ByteCount[huge]
Out[57]= 262021120
In[58]:= (huge1 = Flatten[Outer[Dot,large,large,1],1]);//Timing
Out[58]= {8.687,Null}
In[59]:= MaxMemoryUsed[]
Out[59]= 2527281392
For this particular example, the suggested method is 5 times more memory-efficient than the direct use of Outer, but about 15 times slower. I had to tweak the chunksize parameter (default is 100, but for the above I used 2000, to get the optimal speed / memory use combination). My method only used as a peak value twice as much memory as needed to store the final result. The degree of memory-savings as compared to Outer- based method will depend on the sparse arrays in question.
If lst1 and lst2 are your lists,
Reap[
Do[Sow[f[#1[[i]], #2[[j]]]],
{i, 1, Length##1},
{j, 1, Length##2}
] &[lst1, lst2];
] // Last // Last
does the job and may be more memory-efficient. On the other hand, maybe not. Nasser is right, an explicit example would be useful.
EDIT: Using Nasser's randomly-generated arrays, and for len=200, MaxMemoryUsed[] indicates that this form needs 170MB while the Outer form in the question takes 435MB.
Using your example list data, I believe that you will find the ability to Append to a SparseArray quite helpful.
acc = SparseArray[{}, {1, 2, 2, 2, 2, 2, 2}]
Do[AppendTo[acc, i.j], {i, list}, {j, list}]
Rest[acc]
I need Rest to drop the first zero-filled tensor in the result. The second argument of the seed SparseArray must be the dimensions of each of your elements with a prefixed 1. You may need to explicitly specify a background for the seed SparseArray to optimize performance.
Related
I have a very large array of numbers in the form of a third order tensor.I want to find the highest of all the values in that tensor. How can I do it in mathematica? The context is that a reaction is carried out by varying temperature pressure and vessel volume. I want to find the optimum combination of the three to maximize the product. Each element of the tensor represents a value of the product produced corresponding to a specific combination of temperature pressure and volume.
Given some matrix, tensor, or basically any list-of-lists of real numbers, you can simply use the Max function to determine the maximum value and then Position to say where it is. Assuming your data isn't enormous (requiring some conservative/careful approach to save time/memory), this should be fine.
For example, here is a random list of lists of of lists of reals:
data = Table[RandomReal[],
{i, 1, RandomInteger[{4, 8}]},
{j, 1, RandomInteger[{4, 8}]},
{k, 1, RandomInteger[{4, 8}]}
];
You can just do:
m = Max[data]
Position[data, m]
This will tell you the position of the maximum value. If you did random integers instead, you could have ties, in which case you might have repeats:
data = Table[RandomInteger[{1, 10}],
{i, 1, RandomInteger[{4, 8}]},
{j, 1, RandomInteger[{4, 8}]},
{k, 1, RandomInteger[{4, 8}]}
];
m = Max[data]
Position[data, m]
Table[RandomInteger[100, 3], 3]
Prepend[Ordering[%[[First[Ordering[Reverse#*Sort /# %, -1]]]], -1],
First[Ordering[Reverse#*Sort /# %, -1]]]
% stands for the tensor to sort, in this case it's a random tensor generated from Table[RandomInteger[100, 3], 3]
This gives the position and value in one shot.
(m = RandomReal[{-1, 1}, {4, 3, 2}]) // MatrixForm
First#MaximalBy[
Flatten[MapIndexed[ {##} &, #, {-1}], ArrayDepth[#] - 1],
First] &#m
{0.903213, {3, 2, 2}}
Here is an alternate that will work with ragged lists:
Module[{h},
First#MaximalBy[List ### Flatten[MapIndexed[h### &, #, {-1}]],
First]] &#{{1, 2, 3}, {4, 5, {2, 3}}}
{5, {2, 2}}
Suppose I want to construct a matrix A such that A[[i,i]]=f[x_,y_]+d[i], A[[i,i+1]]=u[i], A[[i+1,i]]=l[i], i=1,N . Say, f[x_,y_]=x^2+y^2.
How can I code this in Mathematica?
Additionally, if I want to integrate the first diagonal element of A, i.e. A[[1,1]] over x and y, both running from 0 to 1, how can I do that?
In[1]:= n = 4;
f[x_, y_] := x^2 + y^2;
A = Normal[SparseArray[{
{i_,i_}/;i>1 -> f[x,y]+ d[i],
{i_,j_}/;j-i==1 -> u[i],
{i_,j_}/;i-j==1 -> l[i-1],
{1, 1} -> Integrate[f[x,y]+d[1], {x,0,1}, {y,0,1}]},
{n, n}]]
Out[3]= {{2/3+d[1], l[1], 0, 0},
{u[1], x^2+y^2+ d[2], l[2], 0},
{0, u[2], x^2+y^2+d[3], l[3]},
{0, 0, u[3], x^2+y^2+d[4]}}
Band is tailored specifically for this:
myTridiagonalMatrix#n_Integer?Positive :=
SparseArray[
{ Band#{1, 1} -> f[x, y] + Array[d, n]
, Band#{1, 2} -> Array[u, n - 1]
, Band#{2, 1} -> Array[l, n - 1]}
, {n, n}]
Check it out (no need to define f, d, u, l):
myTridiagonalMatrix#5 // MatrixForm
Note that MatrixForm should not be part of a definition. For example, it's a bad idea to set A = (something) // MatrixForm. You will get a MatrixForm object instead of a table (= array of arrays) or a sparse array, and its only purpose is to be pretty-printed in FrontEnd. Trying to use MatrixForm in calculations will yield errors and will lead to unnecessary confusion.
Integrating the element at {1, 1}:
myTridiagonalMatrixWithFirstDiagonalElementIntegrated#n_Integer?Positive :=
MapAt[
Integrate[#, {x, 0, 1}, {y, 0, 1}]&
, myTridiagonalMatrix#n
, {1, 1}]
You may check it out without defining f or d, as well:
myTridiagonalMatrixWithFirstDiagonalElementIntegrated#5
The latter operation, however, looks suspicious. For example, it does not leave your matrix (or its corresponding linear system) invariant w.r.t. reasonable transformations. (This operation does not even preserve linearity of matrices.) You probably don't want to do it.
Comment on comment above: there's no need to define A[x_, y_] := … to Integrate[A[[1,1]], {x,0,1}, {y,0,1}]. Note that A[[1,1]] is totally different from A[1, 1]: the former is Part[A, 1, 1] which is a certain element of table A. A[1, 1] is a different expression: if A is some table then A[1, 1] is (that table)[1, 1], which is a valid expression but is normally considered meaningless.
I'm looking for a fast implementation for the following, I'll call it Position2D for lack of a better term:
Position2D[ matrix, sub_matrix ]
which finds the locations of sub_matrix inside matrix and returns the upper left and lower right row/column of a match.
For example, this:
Position2D[{
{0, 1, 2, 3},
{1, 2, 3, 4},
{2, 3, 4, 5},
{3, 4, 5, 6}
}, {
{2, 3},
{3, 4}
}]
should return this:
{
{{1, 3}, {2, 4}},
{{2, 2}, {3, 3}},
{{3, 1}, {4, 2}}
}
It should be fast enough to work quickly on 3000x2000 matrices with 100x100 sub-matrices. For simplicity, it is enough to only consider integer matrices.
Algorithm
The following code is based on an efficient custom position function to find positions of (possibly overlapping) integer sequences in a large integer list. The main idea is that we can first try to eficiently find the positions where the first row of the sub-matrix is in the Flatten-ed large matrix, and then filter those, extracting full sub-matrices and comparing to the sub-matrix of interest. This will be efficient for most cases except very pathological ones (those, for which this procedure would generate a huge number of potential position candidates, while the true number of entries of the sub-matrix would be much smaller. But such cases seem rather unlikely generally, and then, further improvements to this simple scheme can be made).
For large matrices, the proposed solution will be about 15-25 times faster than the solution of #Szabolcs when a compiled version of sequence positions function is used, and 3-5 times faster for the top-level implementation of sequence positions - finding function. The actual speedup depends on matrix sizes, it is more for larger matrices. The code and benchmarks are below.
Code
A generally efficient function for finding positions of a sub-list (sequence)
These helper functions are due to Norbert Pozar and taken from this Mathgroup thread. They are used to efficiently find starting positions of an integer sequence in a larger list (see the mentioned post for details).
Clear[seqPos];
fdz[v_] := Rest#DeleteDuplicates#Prepend[v, 0];
seqPos[list_List, seq_List] :=
Fold[
fdz[#1 (1 - Unitize[list[[#1]] - #2])] + 1 &,
fdz[Range[Length[list] - Length[seq] + 1] *
(1 - Unitize[list[[;; -Length[seq]]] - seq[[1]]])] + 1,
Rest#seq
] - Length[seq];
Example of use:
In[71]:= seqPos[{1,2,3,2,3,2,3,4},{2,3,2}]
Out[71]= {2,4}
A faster position-finding function for integers
However fast seqPos might be, it is still the major bottleneck in my solution. Here is a compiled-to-C version of this, which gives another 5x performance boost to my code:
seqposC =
Compile[{{list, _Integer, 1}, {seq, _Integer, 1}},
Module[{i = 1, j = 1, res = Table[0, {Length[list]}], ctr = 0},
For[i = 1, i <= Length[list], i++,
If[list[[i]] == seq[[1]],
While[j < Length[seq] && i + j <= Length[list] &&
list[[i + j]] == seq[[j + 1]],
j++
];
If[j == Length[seq], res[[++ctr]] = i];
j = 1;
]
];
Take[res, ctr]
], CompilationTarget -> "C", RuntimeOptions -> "Speed"]
Example of use:
In[72]:= seqposC[{1, 2, 3, 2, 3, 2, 3, 4}, {2, 3, 2}]
Out[72]= {2, 4}
The benchmarks below have been redone with this function (also the code for main function is slightly modified )
Main function
This is the main function. It finds positions of the first row in a matrix, and then filters them, extracting the sub-matrices at these positions and testing against the full sub-matrix of interest:
Clear[Position2D];
Position2D[m_, what_,seqposF_:Automatic] :=
Module[{posFlat, pos2D,sp = If[seqposF === Automatic,seqposC,seqposF]},
With[{dm = Dimensions[m], dwr = Reverse#Dimensions[what]},
posFlat = sp[Flatten#m, First#what];
pos2D =
Pick[Transpose[#], Total[Clip[Reverse#dm - # - dwr + 2, {0, 1}]],2] &#
{Mod[posFlat, #, 1], IntegerPart[posFlat/#] + 1} &#Last[dm];
Transpose[{#, Transpose[Transpose[#] + dwr - 1]}] &#
Select[pos2D,
m[[Last## ;; Last## + Last#dwr - 1,
First## ;; First## + First#dwr - 1]] == what &
]
]
];
For integer lists, the faster compiled subsequence position-finding function seqposC can be used (this is a default). For generic lists, one can supply e.g. seqPos, as a third argument.
How it works
We will use a simple example to dissect the code and explain its inner workings. This defines our test matrix and sub-matrix:
m = {{0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}};
what = {{2, 3}, {3, 4}};
This computes the dimensions of the above (it is more convenient to work with reversed dimensions for a sub-matrix):
In[78]:=
dm=Dimensions[m]
dwr=Reverse#Dimensions[what]
Out[78]= {3,4}
Out[79]= {2,2}
This finds a list of starting positions of the first row ({2,3} here) in the Flattened main matrix. These positions are at the same time "flat" candidate positions of the top left corner of the sub-matrix:
In[77]:= posFlat = seqPos[Flatten#m, First#what]
Out[77]= {3, 6, 9}
This will reconstruct the 2D "candidate" positions of the top left corner of a sub-matrix in a full matrix, using the dimensions of the main matrix:
In[83]:= posInterm = Transpose#{Mod[posFlat,#,1],IntegerPart[posFlat/#]+1}&#Last[dm]
Out[83]= {{3,1},{2,2},{1,3}}
We can then try using Select to filter them out, extracting the full sub-matrix and comparing to what, but we'll run into a problem here:
In[84]:=
Select[posInterm,
m[[Last##;;Last##+Last#dwr-1,First##;;First##+First#dwr-1]]==what&]
During evaluation of In[84]:= Part::take: Cannot take positions 3 through 4
in {{0,1,2,3},{1,2,3,4},{2,3,4,5}}. >>
Out[84]= {{3,1},{2,2}}
Apart from the error message, the result is correct. The error message itself is due to the fact that for the last position ({1,3}) in the list, the bottom right corner of the sub-matrix will be outside the main matrix. We could of course use Quiet to simply ignore the error messages, but that's a bad style. So, we will first filter those cases out, and this is what the line Pick[Transpose[#], Total[Clip[Reverse#dm - # - dwr + 2, {0, 1}]], 2] &# is for. Specifically, consider
In[90]:=
Reverse#dm - # - dwr + 2 &#{Mod[posFlat, #, 1],IntegerPart[posFlat/#] + 1} &#Last[dm]
Out[90]= {{1,2,3},{2,1,0}}
The coordinates of the top left corners should stay within a difference of dimensions of matrix and a sub-matrix. The above sub-lists were made of x and y coordiantes of top - left corners. I added 2 to make all valid results strictly positive. We have to pick only coordiantes at those positions in Transpose#{Mod[posFlat, #, 1], IntegerPart[posFlat/#] + 1} &#Last[dm] ( which is posInterm), at which both sub-lists above have strictly positive numbers. I used Total[Clip[...,{0,1}]] to recast it into picking only at those positions at which this second list has 2 (Clip converts all positive integers to 1, and Total sums numbers in 2 sublists. The only way to get 2 is when numbers in both sublists are positive).
So, we have:
In[92]:=
pos2D=Pick[Transpose[#],Total[Clip[Reverse#dm-#-dwr+2,{0,1}]],2]&#
{Mod[posFlat,#,1],IntegerPart[posFlat/#]+1}&#Last[dm]
Out[92]= {{3,1},{2,2}}
After the list of 2D positions has been filtered, so that no structurally invalid positions are present, we can use Select to extract the full sub-matrices and test against the sub-matrix of interest:
In[93]:=
finalPos =
Select[pos2D,m[[Last##;;Last##+Last#dwr-1,First##;;First##+First#dwr-1]]==what&]
Out[93]= {{3,1},{2,2}}
In this case, both positions are genuine. The final thing to do is to reconstruct the positions of the bottom - right corners of the submatrix and add them to the top-left corner positions. This is done by this line:
In[94]:= Transpose[{#,Transpose[Transpose[#]+dwr-1]}]&#finalPos
Out[94]= {{{3,1},{4,2}},{{2,2},{3,3}}}
I could have used Map, but for a large list of positions, the above code would be more efficient.
Example and benchmarks
The original example:
In[216]:= Position2D[{{0,1,2,3},{1,2,3,4},{2,3,4,5},{3,4,5,6}},{{2,3},{3,4}}]
Out[216]= {{{3,1},{4,2}},{{2,2},{3,3}},{{1,3},{2,4}}}
Note that my index conventions are reversed w.r.t. #Szabolcs' solution.
Benchmarks for large matrices and sub-matrices
Here is a power test:
nmat = 1000;
(* generate a large random matrix and a sub-matrix *)
largeTestMat = RandomInteger[100, {2000, 3000}];
what = RandomInteger[10, {100, 100}];
(* generate upper left random positions where to insert the submatrix *)
rposx = RandomInteger[{1,Last#Dimensions[largeTestMat] - Last#Dimensions[what] + 1}, nmat];
rposy = RandomInteger[{1,First#Dimensions[largeTestMat] - First#Dimensions[what] + 1},nmat];
(* insert the submatrix nmat times *)
With[{dwr = Reverse#Dimensions[what]},
Do[largeTestMat[[Last#p ;; Last#p + Last#dwr - 1,
First#p ;; First#p + First#dwr - 1]] = what,
{p,Transpose[{rposx, rposy}]}]]
Now, we test:
In[358]:= (ps1 = position2D[largeTestMat,what])//Short//Timing
Out[358]= {1.39,{{{1,2461},{100,2560}},<<151>>,{{1900,42},{1999,141}}}}
In[359]:= (ps2 = Position2D[largeTestMat,what])//Short//Timing
Out[359]= {0.062,{{{2461,1},{2560,100}},<<151>>,{{42,1900},{141,1999}}}}
(the actual number of sub-matrices is smaller than the number we try to generate, since many of them overlap and "destroy" the previously inserted ones - this is so because the sub-matrix size is a sizable fraction of the matrix size in our benchmark).
To compare, we should reverse the x-y indices in one of the solutions (level 3), and sort both lists, since positions may have been obtained in different order:
In[360]:= Sort#ps1===Sort[Reverse[ps2,{3}]]
Out[360]= True
I do not exclude a possibility that further optimizations are possible.
This is my implementation:
position2D[m_, k_] :=
Module[{di, dj, extractSubmatrix, pos},
{di, dj} = Dimensions[k] - 1;
extractSubmatrix[{i_, j_}] := m[[i ;; i + di, j ;; j + dj]];
pos = Position[ListCorrelate[k, m], ListCorrelate[k, k][[1, 1]]];
pos = Select[pos, extractSubmatrix[#] == k &];
{#, # + {di, dj}} & /# pos
]
It uses ListCorrelate to get a list of potential positions, then filters those that actually match. It's probably faster on packed real matrices.
As per Leonid's suggestion here's my solution. I know it isn't very efficient (it's about 600 times slower than Leonid's when I timed it) but it's very short, rememberable, and a nice illustration of a rarely used function, PartitionMap. It's from the Developer package, so it needs a Needs["Developer`"] call first.
Given that, Position2D can be defined as:
Position2D[m_, k_] := Position[PartitionMap[k == # &, m, Dimensions[k], {1, 1}], True]
This only gives the upper-left coordinates. I feel the lower-right coordinates are really redundant, since the dimensions of the sub-matrix are known, but if the need arises one can add those to the output by prepending {#, Dimensions[k] + # - {1, 1}} & /# to the above definition.
How about something like
Position2D[bigMat_?MatrixQ, smallMat_?MatrixQ] :=
Module[{pos, sdim = Dimensions[smallMat] - 1},
pos = Position[bigMat, smallMat[[1, 1]]];
Quiet[Select[pos, (MatchQ[
bigMat[[Sequence##Thread[Span[#, # + sdim]]]], smallMat] &)],
Part::take]]
which will return the top left-hand positions of the submatrices.
Example:
Position2D[{{0, 1, 2, 3}, {1, 2, 3, 4}, {2, 3, 4, 5}, {3, 5, 5, 6}},
{{2, 3}, {3, _}}]
(* Returns: {{1, 3}, {2, 2}, {3, 1}} *)
And to search a 1000x1000 matrix, it takes about 2 seconds on my old machine
SeedRandom[1]
big = RandomInteger[{0, 10}, {1000, 1000}];
Position2D[big, {{1, 1, _}, {1, 1, 1}}] // Timing
(* {1.88012, {{155, 91}, {295, 709}, {685, 661},
{818, 568}, {924, 45}, {981, 613}}} *)
Suppose we want to generate a list of primes p for which p + 2 is also prime.
A quick solution is to generate a complete list of the first n primes and use the Select function to return the elements which meet the condition.
Select[Table[Prime[k], {k, n}], PrimeQ[# + 2] &]
However, this is inefficient as it loads a large list into the memory before returning the filtered list. A For loop with Sow/Reap (or l = {}; AppendTo[l, k]) solves the memory issue, but it is far from elegant and is cumbersome to implement a number of times in a Mathematica script.
Reap[
For[k = 1, k <= n, k++,
p = Prime[k];
If[PrimeQ[p + 2], Sow[p]]
]
][[-1, 1]]
An ideal solution would be a built-in function which allows an option similar to this.
Table[Prime[k], {k, n}, AddIf -> PrimeQ[# + 2] &]
I will interpret this more as a question about automation and software engineering rather than about the specific problem at hand, and given a large number of solutions posted already. Reap and Sow are good means (possibly, the best in the symbolic setting) to collect intermediate results. Let us just make it general, to avoid code duplication.
What we need is to write a higher-order function. I will not do anything radically new, but will simply package your solution to make it more generally applicable:
Clear[tableGen];
tableGen[f_, iter : {i_Symbol, __}, addif : Except[_List] : (True &)] :=
Module[{sowTag},
If[# === {}, #, First##] &#
Last#Reap[Do[If[addif[#], Sow[#,sowTag]] &[f[i]], iter],sowTag]];
The advantages of using Do over For are that the loop variable is localized dynamically (so, no global modifications for it outside the scope of Do), and also the iterator syntax of Do is closer to that of Table (Do is also slightly faster).
Now, here is the usage
In[56]:= tableGen[Prime, {i, 10}, PrimeQ[# + 2] &]
Out[56]= {3, 5, 11, 17, 29}
In[57]:= tableGen[Prime, {i, 3, 10}, PrimeQ[# + 1] &]
Out[57]= {}
In[58]:= tableGen[Prime, {i, 10}]
Out[58]= {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}
EDIT
This version is closer to the syntax you mentioned (it takes an expression rather than a function):
ClearAll[tableGenAlt];
SetAttributes[tableGenAlt, HoldAll];
tableGenAlt[expr_, iter_List, addif : Except[_List] : (True &)] :=
Module[{sowTag},
If[# === {}, #, First##] &#
Last#Reap[Do[If[addif[#], Sow[#,sowTag]] &[expr], iter],sowTag]];
It has an added advantage that you may even have iterator symbols defined globally, since they are passed unevaluated and dynamically localized. Examples of use:
In[65]:= tableGenAlt[Prime[i], {i, 10}, PrimeQ[# + 2] &]
Out[65]= {3, 5, 11, 17, 29}
In[68]:= tableGenAlt[Prime[i], {i, 10}]
Out[68]= {2, 3, 5, 7, 11, 13, 17, 19, 23, 29}
Note that since the syntax is different now, we had to use the Hold-attribute to prevent the passed expression expr from premature evaluation.
EDIT 2
Per #Simon's request, here is the generalization for many dimensions:
ClearAll[tableGenAltMD];
SetAttributes[tableGenAltMD, HoldAll];
tableGenAltMD[expr_, iter__List, addif : Except[_List] : (True &)] :=
Module[{indices, indexedRes, sowTag},
SetDelayed ## Prepend[Thread[Map[Take[#, 1] &, List ## Hold ### Hold[iter]],
Hold], indices];
indexedRes =
If[# === {}, #, First##] &#
Last#Reap[Do[If[addif[#], Sow[{#, indices},sowTag]] &[expr], iter],sowTag];
Map[
First,
SplitBy[indexedRes ,
Table[With[{i = i}, Function[Slot[1][[2, i]]]], {i,Length[Hold[iter]] - 1}]],
{-3}]];
It is considerably less trivial, since I had to Sow the indices together with the added values, and then split the resulting flat list according to the indices. Here is an example of use:
{i, j, k} = {1, 2, 3};
tableGenAltMD[i + j + k, {i, 1, 5}, {j, 1, 3}, {k, 1, 2}, # < 7 &]
{{{3, 4}, {4, 5}, {5, 6}}, {{4, 5}, {5, 6}, {6}}, {{5, 6}, {6}}, {{6}}}
I assigned the values to i,j,k iterator variables to illustrate that this function does localize the iterator variables and is insensitive to possible global values for them. To check the result, we may use Table and then delete the elements not satisfying the condition:
In[126]:=
DeleteCases[Table[i + j + k, {i, 1, 5}, {j, 1, 3}, {k, 1, 2}],
x_Integer /; x >= 7, Infinity] //. {} :> Sequence[]
Out[126]= {{{3, 4}, {4, 5}, {5, 6}}, {{4, 5}, {5, 6}, {6}}, {{5, 6}, {6}}, {{6}}}
Note that I did not do extensive checks so the current version may contain bugs and needs some more testing.
EDIT 3 - BUG FIX
Note the important bug-fix: in all functions, I now use Sow with a custom unique tag, and Reap as well. Without this change, the functions would not work properly when expression they evaluate also uses Sow. This is a general situation with Reap-Sow, and resembles that for exceptions (Throw-Catch).
EDIT 4 - SyntaxInformation
Since this is such a potentially useful function, it is nice to make it behave more like a built-in function. First we add syntax highlighting and basic argument checking through
SyntaxInformation[tableGenAltMD] = {"ArgumentsPattern" -> {_, {_, _, _., _.}.., _.},
"LocalVariables" -> {"Table", {2, -2}}};
Then, adding a usage message allows the menu item "Make Template" (Shift+Ctrl+k) to work:
tableGenAltMD::usage = "tableGenAltMD[expr,{i,imax},addif] will generate \
a list of values expr when i runs from 1 to imax, \
only including elements if addif[expr] returns true.
The default of addiff is True&."
A more complete and formatted usage message can be found in this gist.
I think the Reap/Sow approach is likely to be most efficient in terms of memory usage. Some alternatives might be:
DeleteCases[(With[{p=Prime[#]},If[PrimeQ[p+2],p,{}] ] ) & /# Range[K]),_List]
Or (this one might need some sort of DeleteCases to eliminate Null results):
FoldList[[(With[{p=Prime[#2]},If[PrimeQ[p+2],p] ] )& ,1.,Range[2,K] ]
Both hold a big list of integers 1 to K in memory, but the Primes are scoped inside the With[] construct.
Yes, this is another answer. Another alternative that includes the flavour of the Reap/Sow approach and the FoldList approach would be to use Scan.
result = {1};
Scan[With[{p=Prime[#]},If[PrimeQ[p+2],result={result,p}]]&,Range[2,K] ];
Flatten[result]
Again, this involves a long list of integers, but the intermediate Prime results are not stored because they are in the local scope of With. Because p is a constant in the scope of the With function, you can use With rather than Module, and gain a bit of speed.
You can perhaps try something like this:
Clear[f, primesList]
f = With[{p = Prime[#]},Piecewise[{{p, PrimeQ[p + 2]}}, {}] ] &;
primesList[k_] := Union#Flatten#(f /# Range[k]);
If you want both the prime p and the prime p+2, then the solution is
Clear[f, primesList]
f = With[{p = Prime[#]},Piecewise[{{p, PrimeQ[p + 2]}}, {}] ] &;
primesList[k_] :=
Module[{primes = f /# Range[k]},
Union#Flatten#{primes, primes + 2}];
Well, someone has to allocate memory somewhere for the full table size, since it is not known before hand what the final size will be.
In the good old days before functional programming :), this sort of thing was solved by allocating the maximum array size, and then using a separate index to insert to it so no holes are made. Like this
x=Table[0,{100}]; (*allocate maximum possible*)
j=0;
Table[ If[PrimeQ[k+2], x[[++j]]=k],{k,100}];
x[[1;;j]] (*the result is here *)
{1,3,5,9,11,15,17,21,27,29,35,39,41,45,51,57,59,65,69,71,77,81,87,95,99}
Here's another couple of alternatives using NextPrime:
pairs1[pmax_] := Select[Range[pmax], PrimeQ[#] && NextPrime[#] == 2 + # &]
pairs2[pnum_] := Module[{p}, NestList[(p = NextPrime[#];
While[p + 2 != (p = NextPrime[p])];
p - 2) &, 3, pnum]]
and a modification of your Reap/Sow solution that lets you specify the maximum prime:
pairs3[pmax_] := Module[{k,p},
Reap[For[k = 1, (p = Prime[k]) <= pmax, k++,
If[PrimeQ[p + 2], Sow[p]]]][[-1, 1]]]
The above are in order of increasing speed.
In[4]:= pairs2[10000]//Last//Timing
Out[4]= {3.48,1261079}
In[5]:= pairs1[1261079]//Last//Timing
Out[5]= {6.84,1261079}
In[6]:= pairs3[1261079]//Last//Timing
Out[7]= {0.58,1261079}
Example:
list:={ Plus[1,1], Times[2,3] }
When looking at list, I get
{2,6}
I want to keep them unevaluated (as above) so that list returns
{ Plus[1,1], Times[2,3] }
Later I want to evaluate the functions in list sequence to get
{2,6}
The number of unevaluated functions in list is not known beforehand. Besides Plus, user defined functions like f[x_] may be stored in list
I hope the example is clear.
What is the best way to do this?
The best way is to store them in Hold, not List, like so:
In[255]:= f[x_] := x^2;
lh = Hold[Plus[1, 1], Times[2, 3], f[2]]
Out[256]= Hold[1 + 1, 2 3, f[2]]
In this way, you have full control over them. At some point, you may call ReleaseHold to evaluate them:
In[258]:= ReleaseHold#lh
Out[258]= Sequence[2, 6, 4]
If you want the results in a list rather than Sequence, you may use just List##lh instead. If you need to evaluate a specific one, simply use Part to extract it:
In[261]:= lh[[2]]
Out[261]= 6
If you insist on your construction, here is a way:
In[263]:= l:={Plus[1,1],Times[2,3],f[2]};
Hold[l]/.OwnValues[l]
Out[264]= Hold[{1+1,2 3,f[2]}]
EDIT
In case you have some functions/symbols with UpValues which can evaluate even inside Hold, you may want to use HoldComplete in place of Hold.
EDIT2
As pointed by #Mr.Wizard in another answer, sometimes you may find it more convenient to have Hold wrapped around individual items in your sequence. My comment here is that the usefulness of both forms is amplified once we realize that it is very easy to transform one into another and back. The following function will split the sequence inside Hold into a list of held items:
splitHeldSequence[Hold[seq___], f_: Hold] := List ## Map[f, Hold[seq]]
for example,
In[274]:= splitHeldSequence[Hold[1 + 1, 2 + 2]]
Out[274]= {Hold[1 + 1], Hold[2 + 2]}
grouping them back into a single Hold is even easier - just Apply Join:
In[275]:= Join ## {Hold[1 + 1], Hold[2 + 2]}
Out[275]= Hold[1 + 1, 2 + 2]
The two different forms are useful in diferrent circumstances. You can easily use things such as Union, Select, Cases on a list of held items without thinking much about evaluation. Once finished, you can combine them back into a single Hold, for example, to feed as unevaluated sequence of arguments to some function.
EDIT 3
Per request of #ndroock1, here is a specific example. The setup:
l = {1, 1, 1, 2, 4, 8, 3, 9, 27}
S[n_] := Module[{}, l[[n]] = l[[n]] + 1; l]
Z[n_] := Module[{}, l[[n]] = 0; l]
placing functions in Hold:
In[43]:= held = Hold[Z[1], S[1]]
Out[43]= Hold[Z[1], S[1]]
Here is how the exec function may look:
exec[n_] := MapAt[Evaluate, held, n]
Now,
In[46]:= {exec[1], exec[2]}
Out[46]= {Hold[{0, 1, 1, 2, 4, 8, 3, 9, 27}, S[1]], Hold[Z[1], {1, 1, 1, 2, 4, 8, 3, 9, 27}]}
Note that the original variable held remains unchanged, since we operate on the copy. Note also that the original setup contains mutable state (l), which is not very idiomatic in Mathematica. In particular, the order of evaluations matter:
In[61]:= Reverse[{exec[2], exec[1]}]
Out[61]= {Hold[{0, 1, 1, 2, 4, 8, 3, 9, 27}, S[1]], Hold[Z[1], {2, 1, 1, 2, 4, 8, 3, 9, 27}]}
Whether or not this is desired depends on the specific needs, I just wanted to point this out. Also, while the exec above is implemented according to the requested spec, it implicitly depends on a global variable l, which I consider a bad practice.
An alternative way to store functions suggested by #Mr.Wizard can be achieved e.g. like
In[63]:= listOfHeld = splitHeldSequence[held]
Out[63]= {Hold[Z1], Hold[S1]}
and here
In[64]:= execAlt[n_] := MapAt[ReleaseHold, listOfHeld, n]
In[70]:= l = {1, 1, 1, 2, 4, 8, 3, 9, 27} ;
{execAlt[1], execAlt[2]}
Out[71]= {{{0, 1, 1, 2, 4, 8, 3, 9, 27}, Hold[S[1]]}, {Hold[Z[1]], {1, 1, 1, 2, 4, 8, 3, 9, 27}}}
The same comments about mutability and dependence on a global variable go here as well. This last form is also more suited to query the function type:
getType[n_, lh_] := lh[[n]] /. {Hold[_Z] :> zType, Hold[_S] :> sType, _ :> unknownType}
for example:
In[172]:= getType[#, listOfHeld] & /# {1, 2}
Out[172]= {zType, sType}
The first thing that spings to mind is to not use List but rather use something like this:
SetAttributes[lst, HoldAll];
heldL=lst[Plus[1, 1], Times[2, 3]]
There will surely be lots of more erudite suggestions though!
You can also use Hold on every element that you want held:
a = {Hold[2 + 2], Hold[2*3]}
You can use HoldForm on either the elements or the list, if you want the appearance of the list without Hold visible:
b = {HoldForm[2 + 2], HoldForm[2*3]}
c = HoldForm#{2 + 2, 2*3}
{2 + 2, 2 * 3}
And you can recover the evaluated form with ReleaseHold:
a // ReleaseHold
b // ReleaseHold
c // ReleaseHold
Out[8]= {4, 6}
Out[9]= {4, 6}
Out[10]= {4, 6}
The form Hold[2+2, 2*3] or that of a and b above are good because you can easily add terms with e.g. Append. For b type is it logically:
Append[b, HoldForm[8/4]]
For Hold[2+2, 2*3]:
Hold[2+2, 2*3] ~Join~ Hold[8/4]
Another way:
lh = Function[u, Hold#u, {HoldAll, Listable}];
k = lh#{2 + 2, Sin[Pi]}
(*
->{Hold[2 + 2], Hold[Sin[\[Pi]]]}
*)
ReleaseHold#First#k
(*
-> 4
*)