Definition lookup speed: a performance issue - wolfram-mathematica

I have the following problem.
I need to build a very large number of definitions(*) such as
f[{1,0,0,0}] = 1
f[{0,1,0,0}] = 2
f[{0,0,1,0}] = 3
f[{0,0,0,1}] = 2
...
f[{2,3,1,2}] = 4
...
f[{n1,n2,n3,n4}] = some integer
...
This is just an example. The length of the argument list does not need to be 4 but can be anything.
I noticed that the lookup for each value slows down, apparently exponentially, as the length of the argument list increases. Perhaps this is not so strange, since in principle there is a combinatorial explosion in how many definitions Mathematica needs to store.
However, I had expected Mathematica to be smart enough that extracting a value would take constant time. Apparently it does not.
Is there any way to speed up lookup time? This probably has to do with how Mathematica internally handles symbol definition lookups. Does it parse the list until it finds a match? It seems that it does.
All suggestions highly appreciated.
With best regards
Zoran
(*) I am working on a stochastic simulation software that generates all configurations of a system and needs to store how many times each configuration occurred. In that sense a list {n1, n2, ..., nT} describes a particular configuration of the system saying that there are n1 particles of type 1, n2 particles of type 2, ..., nT particles of type T. There can be exponentially many such configurations.

Could you give some detail on how you worked out that lookup time is exponential?
If it is indeed exponential, perhaps you could speed things up by using Hash on your keys (configurations), then storing key-value pairs in a list like {{key1,value1},{key2,value2}}, kept sorted by key and then using binary search (which should be log time). This should be very quick to code up but not optimum in terms of speed.
If that's not fast enough, one could think about setting up a proper hashtable implementation (which I thought was what the f[{0,1,0,1}]=3 approach did, without having checked).
But some toy example of the slowdown would be useful to proceed further...
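The sorted key-value idea above can be sketched as follows. This is only an illustration in Python rather than Mathematica. It assumes configurations are stored as tuples, which already compare lexicographically, so no separate Hash step is used here. The names `increment` and `lookup` are hypothetical.

```python
import bisect

def increment(keys, counts, config):
    """Add one occurrence of config, keeping keys sorted (insertion is O(n))."""
    i = bisect.bisect_left(keys, config)
    if i < len(keys) and keys[i] == config:
        counts[i] += 1
    else:
        keys.insert(i, config)
        counts.insert(i, 1)

def lookup(keys, counts, config):
    """Binary search, O(log n) comparisons; returns 0 for unseen configurations."""
    i = bisect.bisect_left(keys, config)
    if i < len(keys) and keys[i] == config:
        return counts[i]
    return 0

keys, counts = [], []
for cfg in [(1, 0, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0), (2, 3, 1, 2)]:
    increment(keys, counts, cfg)
print(lookup(keys, counts, (1, 0, 0, 0)))
```

Lookups cost a logarithmic number of comparisons, but each new key still shifts list elements on insertion, so a real hash table would likely do better for write-heavy simulations.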
EDIT: I just tried
test[length_] := Block[{f},
Do[
f[RandomInteger[{0, 10}, 100]] = RandomInteger[{0, 10}];,
{i, 1, length}
];
f[{0, 0, 0, 0, 1, 7, 0, 3, 7, 8, 0, 4, 5, 8, 0, 8, 6, 7, 7, 0, 1, 6,
3, 9, 6, 9, 2, 7, 2, 8, 1, 1, 8, 4, 0, 5, 2, 9, 9, 10, 6, 3, 6,
8, 10, 0, 7, 1, 2, 8, 4, 4, 9, 5, 1, 10, 4, 1, 1, 3, 0, 3, 6, 5,
4, 0, 9, 5, 4, 6, 9, 6, 10, 6, 2, 4, 9, 2, 9, 8, 10, 0, 8, 4, 9,
5, 5, 9, 7, 2, 7, 4, 0, 2, 0, 10, 2, 4, 10, 1}] // timeIt
]
with timeIt defined to accurately time even short runs as follows:
timeIt::usage = "timeIt[expr] gives the time taken to execute expr,
repeating as many times as necessary to achieve a total time of \
1s";
SetAttributes[timeIt, HoldAll]
timeIt[expr_] := Module[{t = Timing[expr;][[1]], tries = 1},
While[t < 1.,
tries *= 2;
t = Timing[Do[expr, {tries}];][[1]];
];
Return[t/tries]]
and then
out = {#, test[#]} & /@ {10, 100, 1000, 10000, 100000, 100000};
ListLogLogPlot@out
(also for larger runs). So it seems constant time here.

Suppose you enter your information not like
f[{1,0,0,0}] = 1
f[{0,1,0,0}] = 2
but into a n1 x n2 x n3 x n4 matrix m like
m[[2,1,1,1]] = 1
m[[1,2,1,1]] = 2
etc.
(you could even enter values not as f[{1,0,0,0}]=1, but as f[{1,0,0,0},1] with
f[li_List, i_Integer] := Part[m, Apply[Sequence, li + 1]] = i;
f[li_List] := Part[m, Apply[Sequence, li + 1]];
where you have to initialize m e.g. by m = ConstantArray[0, {4, 4, 4, 4}];)
Let's compare timings:
testf[z_] :=
(
Do[ f[{n1, n2, n3, n4}] = RandomInteger[{1,100}], {n1,z}, {n2,z}, {n3,z},{n4,z}];
First[ Timing[ Do[ f[{n2, n4, n1, n3}], {n1, z}, {n2, z}, {n3, z}, {n4, z} ] ] ]
);
Framed[
ListLinePlot[
Table[{z, testf[z]}, {z, 22, 36, 2}],
PlotLabel -> Row[{"DownValue approach: ",
Round[MemoryInUse[]/1024.^2],
" MB needed"
}],
AxesLabel -> {"n1,n2,n3,n4", "time/s"},ImageSize -> 500
]
]
Clear[f];
testf2[z_] :=
(
m = RandomInteger[{1, 100}, {z, z, z, z}];
f2[ni__Integer] := m[[ni]]; (* indices already run from 1 to z here, so no +1 offset *)
First[ Timing[ Do[ f2[n2, n4, n1, n3], {n1, z}, {n2, z}, {n3, z}, {n4, z}] ] ]
)
Framed[
ListLinePlot[
Table[{z, testf2[z]}, {z, 22, 36, 2}],
PlotLabel -> Row[{"Matrix approach: ",
Round[MemoryInUse[]/1024.^2],
" MB needed"
}],
AxesLabel -> {"n1,n2,n3,n4", "time/s"}, ImageSize -> 500
]
]
gives two timing plots (not reproduced here).
So for larger sets of information a matrix approach seems clearly preferable.
Of course, if you have truly large data, say more GB than you have RAM, then you just
have to use a database and DatabaseLink.

Related

prolog improvement of an algorithm

% SEND+MORE=MONEY
solve(VarList):-
VarList=[D,E,M,N,O,R,S,Y], % The variables of the problem
Digits=[0,1,2,3,4,5,6,7,8,9], % The values of the variables (the digits)
member(D,Digits),
member(E,Digits),
member(M,Digits),
member(N,Digits), % Assignment of values to the variables
member(O,Digits),
member(R,Digits),
member(S,Digits),
member(Y,Digits),
M=0, S=0, % Περιορισμοί (constraints)
E=D,
M=D, M=E,
N=D, N=E, N=M,
O=D, O=E, O=M, O=N,
R=D, R=E, R=M, R=N, R=O,
S=D, S=E, S=M, S=N, S=O, S=R,
Y=D, Y=E, Y=M, Y=N, Y=O, Y=R, Y=S,
S*1000+E*100+N*10+D + M*1000+O*100+R*10+E =:= M*10000+O*1000+N*100+E*10+Y.
If I decrease the number of variables in VarList, does the speed improve?
If I put S*1000+E*100+N*10+D + M*1000+O*100+R*10+E =:= M*10000+O*1000+N*100+E*10+Y
before the checks, does the speed improve?
A clpfd approach, I am putting my solution in case someone is looking into this problem.
:- use_module( library( clpfd)).
puzzle(X):-
X=([S,E,N,D]+[M,O,R,E]=[M,O,N,E,Y]),
Vars=[S,E,N,D,M,O,R,Y],Vars ins 0..9,
S*1000 + E*100 + N*10 + D +
M*1000 + O*100 + R*10 + E #=
M*10000 + O*1000 + N*100 + E*10 + Y,
S#\=0, M#\=0,
all_different(Vars),
labeling([],Vars).
?- puzzle(X).
X = ([9, 5, 6, 7]+[1, 0, 8, 5]=[1, 0, 6, 5, 2]).
No, if you move the line
S*1000+E*100+N*10+D + M*1000+O*100+R*10+E =:= M*10000+O*1000+N*100+E*10+Y
above what you call "Περιορισμοί" ("restrictions", according to Google Translate), it will only become slower because it will needlessly perform the arithmetic calculations which would have been avoided with the restrictions weeding out the illegal digits assignments first.
You also have erroneous equations S = 0, M = 0, E = D, ... when it should have been S =\= 0, M =\= 0, E =\= D, ..., since all the digits in these numbers are required to be unique and the first digits in the numbers can't be zeroes.
Overall your code's speed can be improved, by reducing the domain of available values with each choice of a digit value, using select/3, instead of making all the choices from the same unaffected domain Digits with member/2. This will much reduce the combinatorial choices space, and all the digits picked will be different by construction obviating the inequality checks. The tag cryptarithmetic-puzzle's info page and Q&A entries should have more discussion and / or examples of this technique (also, the tag zebra-puzzle).
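The idea of drawing each digit from a shrinking domain can be sketched in Python (not Prolog, and not the select/3 code itself). This sketch assumes `itertools.permutations` as a stand-in for the chain of select/3 calls, since it never reuses a value that has already been picked. The function name `solve` is hypothetical.

```python
from itertools import permutations

def solve():
    """Yield every digit assignment satisfying SEND + MORE = MONEY."""
    # permutations() draws each digit from the values not yet used,
    # so all eight letters are distinct without any inequality checks.
    for s, e, n, d, m, o, r, y in permutations(range(10), 8):
        if s == 0 or m == 0:  # leading digits cannot be zero
            continue
        send = 1000 * s + 100 * e + 10 * n + d
        more = 1000 * m + 100 * o + 10 * r + e
        money = 10000 * m + 1000 * o + 100 * n + 10 * e + y
        if send + more == money:
            yield send, more, money

solutions = list(solve())
print(solutions)
```

This visits 10*9*8*7*6*5*4*3 = 1,814,400 candidate assignments instead of the 10^8 produced by eight independent member/2 calls.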

How can I implement such a map-like operation in mathematica

I have a list and an arbitrary function taking 4 parameters, let's say {1, 11, 3, 13, 9, 0, 12, 7} and f[{x,y,z,w}]={x+y, z+w}, what I want to do is to form a new list such that 4 consecutive elements in the original list are evaluated to get a new value as the new list's component, and the evaluation has to be done in every 2 positions in the original list, in this case, the resulting list is:
{{12, 16}, {16, 9}, {9, 19}}
Note here 4 and 2 can change. How to do this conveniently in Mathematica? I imagine this as something like Map, but not sure how to relate.
There's an alternative to Map[f, Partition[...]]: Developer`PartitionMap, which works exactly like Map[f, Partition[list, n, ...]]. So your code would be
Needs["Developer`"]
f[{x_, y_, z_, w_}] = {x + y, z + w};
list = {1, 11, 3, 13, 9, 0, 12, 7};
PartitionMap[f,list, 4, 2]
giving the same result as Mark's answer.
f[{x_, y_, z_, w_}] = {x + y, z + w};
list = {1, 11, 3, 13, 9, 0, 12, 7};
f /@ Partition[list, 4, 2]
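For readers who want to see what the partition-then-map step does, here is a rough Python equivalent. It is only a sketch: `partition_map` is a hypothetical helper, not a port of Developer`PartitionMap, and it keeps only full windows, as Partition[list, 4, 2] does.

```python
def partition_map(f, items, size, step):
    """Apply f to each full window of `size` items, advancing by `step`."""
    return [f(items[i:i + size]) for i in range(0, len(items) - size + 1, step)]

def f(window):
    x, y, z, w = window
    return [x + y, z + w]

result = partition_map(f, [1, 11, 3, 13, 9, 0, 12, 7], 4, 2)
print(result)
```

The window size and the step are ordinary arguments, so the 4 and the 2 can be changed freely.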

Deriving Mean of Values within a Tensor

I have a 20000 x 185 x 5 tensor, which looks like
{{{a1_1,a2_1,a3_1,a4_1,a5_1},{b1_1,b2_1,b3_1,b4_1,b5_1}...
(continue for 185 times)}
{{a1_2,a2_2,a3_2,a4_2,a5_2},{b1_2,b2_2,b3_2,b4_2,b5_2}...
...
...
...
{{a1_20000,a2_20000,a3_20000,a4_20000,a5_20000},
{b1_20000,b2_20000,b3_20000,b4_20000,b5_20000}... }}
The 20000 represents iteration number, the 185 represents individuals, and each individual has 5 attributes. I need to construct a 185 x 5 matrix that stores the mean value for each individual's 5 attributes, averaged across the 20000 iterations.
Not sure what the best way to do this is. I know Mean[ ] works on matrices, but with a Tensor, the derived values might not be what I need. Also, Mathematica ran out of memory if I tried to do Mean[tensor]. Please provide some help or advice. Thank you.
When in doubt, drop the size of the dimensions. (You can still keep them distinct to easily see where things end up.)
(* In[1]:= *) data = Array[a, {4, 3, 2}]
(* Out[1]= *) {{{a[1, 1, 1], a[1, 1, 2]}, {a[1, 2, 1],
a[1, 2, 2]}, {a[1, 3, 1], a[1, 3, 2]}}, {{a[2, 1, 1],
a[2, 1, 2]}, {a[2, 2, 1], a[2, 2, 2]}, {a[2, 3, 1],
a[2, 3, 2]}}, {{a[3, 1, 1], a[3, 1, 2]}, {a[3, 2, 1],
a[3, 2, 2]}, {a[3, 3, 1], a[3, 3, 2]}}, {{a[4, 1, 1],
a[4, 1, 2]}, {a[4, 2, 1], a[4, 2, 2]}, {a[4, 3, 1], a[4, 3, 2]}}}
(* In[2]:= *) Dimensions[data]
(* Out[2]= *) {4, 3, 2}
(* In[3]:= *) means = Mean[data]
(* Out[3]= *) {
{1/4 (a[1, 1, 1] + a[2, 1, 1] + a[3, 1, 1] + a[4, 1, 1]),
1/4 (a[1, 1, 2] + a[2, 1, 2] + a[3, 1, 2] + a[4, 1, 2])},
{1/4 (a[1, 2, 1] + a[2, 2, 1] + a[3, 2, 1] + a[4, 2, 1]),
1/4 (a[1, 2, 2] + a[2, 2, 2] + a[3, 2, 2] + a[4, 2, 2])},
{1/4 (a[1, 3, 1] + a[2, 3, 1] + a[3, 3, 1] + a[4, 3, 1]),
1/4 (a[1, 3, 2] + a[2, 3, 2] + a[3, 3, 2] + a[4, 3, 2])}
}
(* In[4]:= *) Dimensions[means]
(* Out[4]= *) {3, 2}
Mathematica ran out of memory if I tried to do Mean[tensor]
This is probably because intermediate results are larger than the final result. This is likely if the elements are not type Real or Integer. Example:
a = Tuples[{x, Sqrt[y], z^x, q/2, Mod[r, 1], Sin[s]}, {2, 4}];
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
{109125576, 124244808}
{269465456, 376960648}
If they are, and are in packed array form, perhaps the elements are such that the array is unpacked during processing.
Here is an example where the tensor is a packed array of small numbers, and unpacking does not occur.
a = RandomReal[99, {20000, 185, 5}];
Developer`PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
True
{163012808, 163016952}
{163018944, 163026688}
Here is the same size of tensor with very large numbers.
a = RandomReal[$MaxMachineNumber, {20000, 185, 5}];
Developer`PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
True
{163010680, 458982088}
{163122608, 786958080}
To elaborate a little on the other answers, there is no reason to expect Mathematica functions to operate materially differently on tensors than on matrices, because Mathematica considers both to be nested Lists that merely differ in nesting depth. How functions behave with lists depends on whether they're Listable, which you can check using Attributes[f], where f is the function you are interested in.
Your data list's dimensionality isn't actually that big in the scheme of things. Without seeing your actual data it is hard to be sure, but I suspect the reason you are running out of memory is that some of your data is non-numerical.
I don't know what you're doing incorrectly (your code will help). But Mean[] already works as you want it to.
a = RandomReal[1, {20000, 185, 5}];
b = Mean@a;
Dimensions@b
Out[1]= {185, 5}
You can even check that this is correct:
{Max@b, Min@b}
Out[2]={0.506445, 0.494061}
which is the expected value of the mean given that RandomReal uses a uniform distribution by default.
Assume you have the following data :
a = Table[RandomInteger[100], {i, 20000}, {j, 185}, {k, 5}];
In a straightforward manner you can build a table that stores the means of a[[1,j,k]], a[[2,j,k]], ..., a[[20000,j,k]]:
c = Table[Sum[a[[i, j, k]], {i, Length[a]}], {j, 185}, {k, 5}]/
Length[a] // N; // Timing
{37.487, Null}
or simply :
d = Total[a]/Length[a] // N; // Timing
{0.702, Null}
The second way is about 50 times faster.
c == d
True
To extend on Brett's answer a bit, when you call Mean on an n-dimensional tensor, it averages over the first index and returns an (n-1)-dimensional tensor:
a = RandomReal[1, {a1, a2, a3, ... an}];
Dimensions[a] (* This would have n entries in it *)
b = Mean[a];
Dimensions[b] (* Has n-1 entries, where averaging was done over the first index *)
In the more general case where you may wish to average over the i-th index, you have to transpose the data first. For example, say you want to average over the 3rd of 5 dimensions. You need the 3rd level first, followed by the 1st, 2nd, 4th and 5th.
a = RandomReal[1, {5, 10, 2, 40, 10}];
b = Transpose[a, {2, 3, 1, 4, 5}];
c = Mean[b]; (* Now of dimensions {5, 10, 40, 10} *)
In other words, you would make a call to Transpose where you placed the i-th index as the first tensor index and moved everything before it ahead one. Anything that comes after the i-th index stays the same.
This tends to come in handy when your data comes in odd formats where the first index may not always represent different realizations of a data sample. I've had this come up, for example, when I had to do time averaging of large wind data sets where the time series came third (!) in terms of the tensor representation that was available.
You could imagine that generalizedTensorMean would look something like this:
Clear[generalizedTensorMean];
generalizedTensorMean[A_, i_] :=
Module[{n = Length@Dimensions@A, ordering},
ordering =
Join[Table[x, {x, 2, i}], {1}, Table[x, {x, i + 1, n}]];
Mean@Transpose[A, ordering]]
This reduces to the plain-old-mean when i == 1. Try it out:
A = RandomReal[1, {2, 4, 6, 8, 10, 12, 14}];
Dimensions@A (* {2, 4, 6, 8, 10, 12, 14} *)
Dimensions@generalizedTensorMean[A, 1] (* {4, 6, 8, 10, 12, 14} *)
Dimensions@generalizedTensorMean[A, 7] (* {2, 4, 6, 8, 10, 12} *)
On a side note, I'm surprised that Mathematica doesn't support this by default. You don't always want to average over the first level of a list.
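To make the "average over a chosen level" idea concrete without any transposing, here is a small pure-Python sketch on nested lists. It is an illustration only: the helpers `mean_first`, `mean_axis` and `dims` are hypothetical names, the input is assumed to be a rectangular nested list of numbers, and `axis` is 0-based (so axis 0 corresponds to Mathematica's first level).

```python
def mean_first(t):
    """Average a rectangular nested list over its first level."""
    if isinstance(t[0], list):
        # zip(*t) groups the j-th sub-block of every entry; recurse on each group.
        return [mean_first(list(group)) for group in zip(*t)]
    return sum(t) / len(t)

def mean_axis(t, axis):
    """Average over the given 0-based level, leaving the other levels in place."""
    if axis == 0:
        return mean_first(t)
    return [mean_axis(sub, axis - 1) for sub in t]

def dims(t):
    """Dimensions of a rectangular nested list."""
    d = []
    while isinstance(t, list):
        d.append(len(t))
        t = t[0]
    return d

# A 2 x 3 x 4 tensor whose entries encode their own indices.
t = [[[100 * i + 10 * j + k for k in range(4)] for j in range(3)] for i in range(2)]
print(dims(mean_axis(t, 0)), dims(mean_axis(t, 2)))
```

Averaging over level 0 drops the first dimension, and averaging over level 2 drops the last, matching the behaviour described for generalizedTensorMean.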

What is the best way to find the period of a (repeating) list in Mathematica?

What is the best way to find the period in a repeating list?
For example:
a = {4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2}
has repeat {4, 5, 1, 2, 3} with the remainder {4, 5, 1, 2} matching, but being incomplete.
The algorithm should be fast enough to handle longer cases, like so:
b = RandomInteger[10000, {100}];
a = Join[b, b, b, b, Take[b, 27]]
The algorithm should return $Failed if there is no repeating pattern like above.
Please see the comments interspersed with the code on how it works.
(* True if a has period p *)
testPeriod[p_, a_] := Drop[a, p] === Drop[a, -p]
(* are all the list elements the same? *)
homogeneousQ[list_List] := Length@Tally[list] === 1
homogeneousQ[{}] := Throw[$Failed] (* yes, it's ugly to put this here ... *)
(* auxiliary for findPeriodOfFirstElement[] *)
reduce[a_] := Differences@Flatten@Position[a, First[a], {1}]
(* the first element occurs every ?th position ? *)
findPeriodOfFirstElement[a_] := Module[{nl},
nl = NestWhileList[reduce, reduce[a], ! homogeneousQ[#] &];
Fold[Total@Take[#2, #1] &, 1, Reverse[nl]]
]
(* the period must be a multiple of the period of the first element *)
period[a_] := Catch@With[{fp = findPeriodOfFirstElement[a]},
Do[
If[testPeriod[p, a], Return[p]],
{p, fp, Quotient[Length[a], 2], fp}
]
]
Please ask if findPeriodOfFirstElement[] is not clear. I did this independently (for fun!), but now I see that the principle is the same as in Verbeia's solution, except the problem pointed out by Brett is fixed.
I was testing with
b = RandomInteger[100, {1000}];
a = Flatten[{ConstantArray[b, 1000], Take[b, 27]}];
(Note the low integer values: there will be lots of repeating elements within the same period.)
EDIT: According to Leonid's comment below, another 2-3x speedup (~2.4x on my machine) is possible by using a custom position function, compiled specifically for lists of integers:
(* Leonid's reduce[] *)
myPosition = Compile[
{{lst, _Integer, 1}, {val, _Integer}},
Module[{pos = Table[0, {Length[lst]}], i = 1, ctr = 0},
For[i = 1, i <= Length[lst], i++,
If[lst[[i]] == val, pos[[++ctr]] = i]
];
Take[pos, ctr]
],
CompilationTarget -> "C", RuntimeOptions -> "Speed"
]
reduce[a_] := Differences@myPosition[a, First[a]]
Compiling testPeriod gives a further ~20% speedup in a quick test, but I believe this will depend on the input data:
Clear[testPeriod]
testPeriod =
Compile[{{p, _Integer}, {a, _Integer, 1}},
Drop[a, p] === Drop[a, -p]]
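As a language-neutral illustration of the core test, here is a simplified Python sketch. It keeps the testPeriod check and the observation that a period must be an offset at which the first element recurs, but it does not implement the nested reduction done by findPeriodOfFirstElement[]. The names `has_period` and `find_period` are hypothetical, and returning None stands in for $Failed.

```python
def has_period(a, p):
    """True if a[k] == a[k + p] wherever both exist (same test as testPeriod)."""
    return a[p:] == a[:-p]

def find_period(a):
    """Smallest p <= len(a) // 2 that is a period of a, or None."""
    for p in range(1, len(a) // 2 + 1):
        # Only offsets where the first element recurs can be periods.
        if a[p] == a[0] and has_period(a, p):
            return p
    return None

print(find_period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2]))
```

With many repeated values inside one period, this version tests more candidates than the Mathematica code above, which is the case the reduction step is meant to speed up.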
Above methods are better if you have no noise. If your signal is only approximate then Fourier transform methods might be useful. I'll illustrate with a "parametrized" setup wherein the length and number of repetitions of the base signal, the length of the trailing part, and a bound on the noise perturbation are all variables one can play with.
noise = 20;
extra = 40;
baselen = 103;
base = RandomInteger[10000, {baselen}];
repeat = 5;
signal = Flatten[Join[ConstantArray[base, repeat], Take[base, extra]]];
noisysignal = signal + RandomInteger[{-noise, noise}, Length[signal]];
We compute the absolute value of the FFT. We adjoin zeros to both ends. The object will be to threshold by comparing to neighbors.
sigfft = Join[{0.}, Abs[Fourier[noisysignal]], {0}];
Now we create two 0-1 vectors. In one we threshold by making a 1 for each element in the fft that is greater than twice the geometric mean of its two neighbors. In the other we use the average (arithmetic mean) but we lower the size bound to 3/4. This was based on some experimentation. We count the number of 1s in each case. Ideally we'd get 100 for each, as that would be the number of nonzeros in a "perfect" case of no noise and no tail part.
In[419]:=
thresh1 =
Table[If[sigfft[[j]]^2 > 2*sigfft[[j - 1]]*sigfft[[j + 1]], 1,
0], {j, 2, Length[sigfft] - 1}];
count1 = Count[thresh1, 1]
thresh2 =
Table[If[sigfft[[j]] > 3/4*(sigfft[[j - 1]] + sigfft[[j + 1]]), 1,
0], {j, 2, Length[sigfft] - 1}];
count2 = Count[thresh2, 1]
Out[420]= 114
Out[422]= 100
Now we get our best guess as to the value of "repeats", by taking the floor of the total length over the average of our counts.
approxrepeats = Floor[2*Length[signal]/(count1 + count2)]
Out[423]= 5
So we have found that the basic signal is repeated 5 times. That can give a start toward refining to estimate the correct length (baselen, above). To that end we might try removing elements at the end and seeing when we get ffts closer to actually having runs of four 0s between nonzero values.
Something else that might work for estimating number of repeats is finding the modal number of zeros in run length encoding of the thresholded ffts. While I have not actually tried that, it looks like it might be robust to bad choices in the details of how one does the thresholding (mine were just experiments that seem to work).
Daniel Lichtblau
The following assumes that the cycle starts on the first element and gives the period length and the cycle.
findCyclingList[a_?VectorQ] :=
Module[{repeats1, repeats2, cl, cLs, vec},
repeats1 = Flatten@Differences[Position[a, First[a]]];
repeats2 = Flatten[Position[repeats1, First[repeats1]]];
If[Equal @@ Differences[repeats2] && Length[repeats2] > 2(*
is potentially cyclic - first element appears cyclically *),
cl = Plus @@@ Partition[repeats1, First[Differences[repeats2]]];
cLs = Partition[a, First[cl]];
If[SameQ @@ cLs (* candidate cycles all actually the same *),
vec = First[cLs];
{Length[vec], vec}, $Failed], $Failed] ]
Testing
b = RandomInteger[50, {100}];
a = Join[b, b, b, b, Take[b, 27]];
findCyclingList[a]
{100, {47, 15, 42, 10, 14, 29, 12, 29, 11, 37, 6, 19, 14, 50, 4, 38,
23, 3, 41, 39, 41, 17, 32, 8, 18, 37, 5, 45, 38, 8, 39, 9, 26, 33,
40, 50, 0, 45, 1, 48, 32, 37, 15, 37, 49, 16, 27, 36, 11, 16, 4, 28,
31, 46, 30, 24, 30, 3, 32, 31, 31, 0, 32, 35, 47, 44, 7, 21, 1, 22,
43, 13, 44, 35, 29, 38, 31, 31, 17, 37, 49, 22, 15, 28, 21, 8, 31,
42, 26, 33, 1, 47, 26, 1, 37, 22, 40, 27, 27, 16}}
b1 = RandomInteger[10000, {100}];
a1 = Join[b1, b1, b1, b1, Take[b1, 23]];
findCyclingList[a1]
{100, {1281, 5325, 8435, 7505, 1355, 857, 2597, 8807, 1095, 4203,
3718, 3501, 7054, 4620, 6359, 1624, 6115, 8567, 4030, 5029, 6515,
5921, 4875, 2677, 6776, 2468, 7983, 4750, 7609, 9471, 1328, 7830,
2241, 4859, 9289, 6294, 7259, 4693, 7188, 2038, 3994, 1907, 2389,
6622, 4758, 3171, 1746, 2254, 556, 3010, 1814, 4782, 3849, 6695,
4316, 1548, 3824, 5094, 8161, 8423, 8765, 1134, 7442, 8218, 5429,
7255, 4131, 9474, 6016, 2438, 403, 6783, 4217, 7452, 2418, 9744,
6405, 8757, 9666, 4035, 7833, 2657, 7432, 3066, 9081, 9523, 3284,
3661, 1947, 3619, 2550, 4950, 1537, 2772, 5432, 6517, 6142, 9774,
1289, 6352}}
This case should fail because it isn't cyclical.
findCyclingList[Join[b, Take[b, 11], b]]
$Failed
I tried to do something with Repeated, e.g. a /. Repeated[t__, {2, 100}] -> {t}, but it just doesn't work for me.
Does this work for you?
period[a_] :=
Quiet[Check[
First[Cases[
Table[
{k, Equal @@ Partition[a, k]},
{k, Floor[Length[a]/2]}],
{k_, True} :> k
]],
$Failed]]
Strictly speaking, this will fail for things like
a = {1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5}
although this can be fixed by using something like:
(Equal @@ Partition[a, k]) && (Equal @@ Partition[Reverse[a], k])
(probably computing Reverse[a] just once ahead of time.)
I propose this. It borrows from both Verbeia's and Brett's answers.
Do[
If[MatchQ @@ Equal @@ Partition[#, i, i, 1, _], Return @@ i],
{i, #[[ 2 ;; Floor[Length@#/2] ]] ~Position~ First@#}
] /. Null -> $Failed &
It is not quite as efficient as Verbeia's function on long periods, but it is faster on short ones, and it is simpler as well.
I don't know how to solve it in Mathematica, but the following algorithm (written in Python) should work. It's O(n), so speed should be no concern.
def period(array):
    if len(array) == 0:
        return False
    else:
        s = array[0]
        match = False
        end = 0
        i = 0
        for k in range(1,len(array)):
            c = array[k]
            if not match:
                if c == s:
                    i = 1
                    match = True
                    end = k
            else:
                if not c == array[i]:
                    match = False
                i += 1
        if match:
            return array[:end]
        else:
            return False
# False
print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2,1]))
# [4, 5, 1, 2, 3]
print(period([4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2]))
# False
print(period([4]))
# [4, 2]
print(period([4,2,4]))
# False
print(period([4,2,1]))
# False
print(period([]))
Ok, just to show my own work here:
ModifiedTortoiseHare[a_List] := Module[{counter, tortoise, hare},
Quiet[
Check[
counter = 1;
tortoise = a[[counter]];
hare = a[[2 counter]];
While[(tortoise != hare) || (a[[counter ;; 2 counter - 1]] != a[[2 counter ;; 3 counter - 1]]),
counter++;
tortoise = a[[counter]];
hare = a[[2 counter]];
];
counter,
$Failed]]]
I'm not sure this is 100% correct, especially for cases like {pattern, pattern, different, pattern, pattern}, and it gets slower and slower when there are a lot of repeating elements, like so:
{ 1,2,1,1, 1,2,1,1, 1,2,1,1, ...}
because it is making too many expensive comparisons.
#include <iostream>
#include <vector>
using namespace std;
int period(vector<int> v)
{
int p=0; // period 0
for(int i=p+1; i<v.size(); i++)
{
if(v[i] == v[0])
{
p=i; // new potential period
bool periodical=true;
for(int i=0; i<v.size()-p; i++)
{
if(v[i]!=v[i+p])
{
periodical=false;
break;
}
}
if(periodical) return p;
i=p; // try to find new period
}
}
return 0; // no period
}
int main()
{
vector<int> v3{1,2,3,1,2,3,1,2,3};
cout<<"Period is :\t"<<period(v3)<<endl;
vector<int> v0{1,2,3,1,2,3,1,9,6};
cout<<"Period is :\t"<<period(v0)<<endl;
vector<int> v1{1,2,1,1,7,1,2,1,1,7,1,2,1,1};
cout<<"Period is :\t"<<period(v1)<<endl;
return 0;
}
This sounds like it might relate to sequence alignment. These algorithms are well studied, and might already be implemented in Mathematica.

Permutations distinct under given symmetry (Mathematica 8 group theory)

Given a list of integers like {2,1,1,0} I'd like to list all permutations of that list that are not equivalent under given group. For instance, using symmetry of the square, the result would be {{2, 1, 1, 0}, {2, 1, 0, 1}}.
The approach below (Mathematica 8) generates all permutations, then weeds out the equivalent ones. I can't use it because I can't afford to generate all permutations. Is there a more efficient way?
Update: actually, the bottleneck is in DeleteCases. The following list {2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0} has about a million permutations and takes 0.1 seconds to compute. Apparently there are supposed to be 1292 orderings after removing symmetries, but my approach doesn't finish in 10 minutes.
removeEquivalent[{}] := {};
removeEquivalent[list_] := (
Sow[First[list]];
equivalents = Permute[First[list], #] & /@ GroupElements[group];
DeleteCases[list, Alternatives @@ equivalents]
);
nonequivalentPermutations[list_] := (
reaped = Reap@FixedPoint[removeEquivalent, Permutations@list];
reaped[[2, 1]]
);
group = DihedralGroup[4];
nonequivalentPermutations[{2, 1, 1, 0}]
What's wrong with:
nonequivalentPermutations[list_,group_]:= Union[Permute[list,#]& /@ GroupElements[group]];
nonequivalentPermutations[{2,1,1,0},DihedralGroup[4]]
I don't have Mathematica 8, so I can't test this. I just have Mathematica 7.
I got an elegant and fast solution from Maxim Rytin, relying on the ConnectedComponents function:
Module[{gens, verts, edges},
gens = PermutationList /@ GroupGenerators@DihedralGroup[16];
verts =
Permutations@{2, 2, 2, 2, 2, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0};
edges = Join @@ (Transpose@{verts, verts[[All, #]]} &) /@ gens;
Length@ConnectedComponents@Graph[Rule @@@ Union@edges]] // Timing
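A different way to collapse equivalent arrangements, shown here as a Python sketch rather than as a port of the ConnectedComponents code, is to map every permutation to a canonical representative (the smallest image under the group) and keep the distinct representatives. The helper names are hypothetical, the positions are assumed to be numbered 0..n-1 around the polygon, and this sketch still enumerates all permutations, so it does not remove that cost.

```python
from itertools import permutations

def dihedral_group(n):
    """Symmetries of a regular n-gon as index tuples g, acting by a -> [a[g[i]]]."""
    rotations = [tuple((i + k) % n for i in range(n)) for k in range(n)]
    reflections = [tuple((k - i) % n for i in range(n)) for k in range(n)]
    return rotations + reflections

def canonical(a, group):
    """Smallest image of a under the group; equal for equivalent arrangements."""
    return min(tuple(a[j] for j in g) for g in group)

def nonequivalent_permutations(items, group):
    """One representative per orbit, in sorted order."""
    return sorted({canonical(p, group) for p in set(permutations(items))})

reps = nonequivalent_permutations([2, 1, 1, 0], dihedral_group(4))
print(reps)
```

For {2, 1, 1, 0} under the symmetries of the square this leaves two classes, in agreement with the example in the question, although the representatives chosen are the lexicographically smallest members rather than the first ones encountered.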

Resources