I have not used PackedArray before, but I just started looking into using it after reading some discussion of it here today.
What I have is lots of large 1D and 2D arrays, all reals, nothing symbolic (it is a finite difference PDE solver), so I thought I should take advantage of PackedArray.
I have an initialization function where I allocate all the data/grids needed, so I went and used ToPackedArray on them. It seems a bit faster, but I need to do more performance testing to compare speed before and after, and also to compare RAM usage.
But while I was looking at this, I noticed that some operations in Mathematica automatically return lists as packed arrays already, and some do not.
For example, this does not return a packed array:
a = Table[RandomReal[], {5}, {5}];
Developer`PackedArrayQ[a]
But this does
a = RandomReal[1, {5, 5}];
Developer`PackedArrayQ[a]
and this does
a = Table[0, {5}, {5}];
b = ListConvolve[ {{0, 1, 0}, {1, 4, 1}, {0, 1, 1}}, a, 1];
Developer`PackedArrayQ[b]
and matrix multiplication also returns its result as a packed array:
a = Table[0, {5}, {5}];
b = a.a;
Developer`PackedArrayQ[b]
But element-wise multiplication does not:
b = a*a;
Developer`PackedArrayQ[b]
My question: Is there a list somewhere which documents which Mathematica commands return packed arrays and which do not (assuming the data meets the requirements, such as being all Real, not mixed, not symbolic, etc.)?
Also, a minor question: do you think it would be better to check first whether a list/matrix is already packed before calling ToPackedArray on it? I would think calling ToPackedArray on an already packed list costs nothing, as the call should return right away.
thanks,
update (1)
Just wanted to mention that I found that PackedArray-related symbols are not allowed in a demo CDF, as I got an error uploading one that used them. So I had to take all my packing code out. Since I mainly write demos, this topic is now of only academic interest for me. But I wanted to thank everyone for their time and good answers.
There isn't a comprehensive list. To point out a few things:
Basic operations with packed arrays will tend to remain packed:
In[66]:= a = RandomReal[1, {5, 5}];
In[67]:= Developer`PackedArrayQ /@ {a, a.a, a*a}
Out[67]= {True, True, True}
Note above that my version (8.0.4) doesn't unpack for element-wise multiplication.
Whether a Table will result in a packed array depends on the number of elements:
In[71]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {10}]]
Out[71]= False
In[72]:= Developer`PackedArrayQ[Table[RandomReal[], {24}, {11}]]
Out[72]= True
In[73]:= Developer`PackedArrayQ[Table[RandomReal[], {25}, {10}]]
Out[73]= True
On["Packing"] will turn on messages to let you know when things unpack:
In[77]:= On["Packing"]
In[78]:= a = RandomReal[1, 10];
In[79]:= Developer`PackedArrayQ[a]
Out[79]= True
In[80]:= a[[1]] = 0 (* force unpacking due to type mismatch *)
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {10}. >>
Out[80]= 0
Operations that do per-element inspection will usually unpack the array:
In[81]:= a = RandomReal[1, 10];
In[82]:= Position[a, Max[a]]
Developer`FromPackedArray::unpack: Unpacking array in call to Position. >>
Out[82]= {{4}}
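If you need that particular result while staying packed, Ordering may do the job (a suggestion of mine, not part of the original example; I believe Ordering does not unpack, but verify with On["Packing"]):
Ordering[a, -1]  (* {4}, the position of the maximum *)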
The penalty for calling ToPackedArray on an already packed list is small enough that I wouldn't worry about it too much:
In[90]:= a = RandomReal[1, 10^7];
In[91]:= Timing[Do[Identity[a], {10^5}];]
Out[91]= {0.028089, Null}
In[92]:= Timing[Do[Developer`ToPackedArray[a], {10^5}];]
Out[92]= {0.043788, Null}
The frontend prefers packed to unpacked arrays, which can show up when dealing with Dynamic and Manipulate:
In[97]:= Developer`PackedArrayQ[{1}]
Out[97]= False
In[98]:= Dynamic[Developer`PackedArrayQ[{1}]]
Out[98]= True
When looking into performance, focus on cases where large lists are getting unpacked rather than small ones, unless the small ones are in big loops.
This is just an addendum to Brett's answer:
SystemOptions["CompileOptions"]
will give you the length thresholds at which functions start returning packed arrays. So if you did need to pack a small list, as an alternative to using Developer`ToPackedArray, you could temporarily set a smaller value for one of the compile options, e.g.
SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> 20}]
Note also some differences between functions which, to me at least, don't seem intuitive, so I generally have to test these kinds of things whenever I use them rather than knowing instinctively what will work best:
f = # + 1 &;
g[x_] := x + 1;
data = RandomReal[1, 10^6];
On["Packing"]
Timing[Developer`PackedArrayQ[f /@ data]]
{0.131565, True}
Timing[Developer`PackedArrayQ[g /@ data]]
Developer`FromPackedArray::punpack1: Unpacking array with dimensions {1000000}.
{1.95083, False}
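A likely explanation (my assumption, not something the documentation spells out): Map can auto-compile a pure function like f, keeping the result packed, but not a function defined via patterns like g. The relevant threshold lives in the same option group:
(* length above which Map tries to auto-compile its function argument *)
"MapCompileLength" /. ("CompileOptions" /. SystemOptions["CompileOptions"])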
Another addition to Brett's answer: If a list is a packed array, then ToPackedArray is very fast, since this is checked quite early. Also, you might find this valuable:
http://library.wolfram.com/infocenter/Articles/3141/
In general for numerics stuff look for talks from Rob Knapp and/or Mark Sofroniou.
When I develop numerics codes, I write the function and then use On["Packing"] to make sure that everything is packed that needs to be packed.
Concerning Mike's answer: the threshold was introduced because for small arrays there is overhead. Where the threshold lies is hardware-dependent. It might be an idea to write a function that sets these thresholds based on measurements done on the computer at hand.
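Something along these lines might serve as a starting point (a rough, untested sketch of mine; timings are machine-dependent by design, and the step sizes are arbitrary):
timeTable[n_, thresh_] :=
 Module[{old = SystemOptions["CompileOptions"], t},
  SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> thresh}];
  t = First[AbsoluteTiming[Do[Table[RandomReal[], {n}], {10^3}]]];
  SetSystemOptions[old];
  t]

(* first size at which forcing packing (threshold 1) beats suppressing it *)
hits = Select[Range[50, 1000, 50], timeTable[#, 1] < timeTable[#, 10^6] &, 1];
If[hits =!= {},
 SetSystemOptions["CompileOptions" -> {"TableCompileLength" -> First[hits]}]]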
Please consider:
Clear[x]
expr = Sum[x^i, {i, 15}]^30;
CoefficientList[expr, x]; // Timing
Coefficient[Expand@expr, x, 234]; // Timing
Coefficient[expr, x, 234]; // Timing
{0.047, Null}
{0.047, Null}
{4.93, Null}
Help states:
Coefficient works whether or not expr is explicitly given in expanded form.
Is there a reason why Coefficient needs to be so slow in the last case?
Here is a hack which may enable your code to be fast, but I don't guarantee it to always work correctly:
ClearAll[withFastCoefficient];
SetAttributes[withFastCoefficient, HoldFirst];
withFastCoefficient[code_] :=
Block[{Binomial},
Binomial[x_, y_] := 10 /; ! FreeQ[Stack[_][[-6]], Coefficient];
code]
Using it, we get:
In[58]:= withFastCoefficient[Coefficient[expr,x,234]]//Timing
Out[58]= {0.172,3116518719381876183528738595379210}
The idea is that Coefficient uses Binomial internally to estimate the number of terms, and expands (calls Expand) only if the number of terms is less than 1000, which you can check by using Trace[..., TraceInternal -> True]. When it does not expand, it computes lots of sums of large coefficient lists dominated by zeros, and for a range of expressions this is apparently slower than expanding. What I do is fool Binomial into returning a small number (10), while trying to ensure that this only affects the Binomial called internally by Coefficient:
In[67]:= withFastCoefficient[f[Binomial[7,4]]Coefficient[expr,x,234]]
Out[67]= 3116518719381876183528738595379210 f[35]
I cannot, however, guarantee that there are no examples where a Binomial somewhere else in the code will be computed incorrectly.
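As mentioned above, you can peek at the internal Binomial call with Trace; a small example (whether and where Binomial shows up is version-dependent):
Trace[Coefficient[(1 + x)^10, x, 3], _Binomial, TraceInternal -> True]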
EDIT
Of course, a safer alternative always exists: redefine Coefficient using the Villegas-Gayley trick, expanding the expression inside it and calling it again:
Unprotect[Coefficient];
Module[{inCoefficient},
Coefficient[expr_, args__] :=
Block[{inCoefficient = True},
Coefficient[Expand[expr], args]] /; ! TrueQ[inCoefficient]
];
Protect[Coefficient];
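With the redefinition above in effect, a plain call should now take the expanding branch, so its timing should be close to the Coefficient[Expand@expr, ...] figure from the question:
Coefficient[expr, x, 234]; // Timing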
EDIT 2
My first suggestion had the advantage that it defined a macro modifying the behavior of a function locally, but the disadvantage that it was unsafe. My second suggestion is safer but modifies Coefficient globally, so it will always expand until we remove that definition. We can have the best of both worlds with the help of Internal`InheritedBlock, which creates a local copy of a given function. Here is the code:
ClearAll[withExpandingCoefficient];
SetAttributes[withExpandingCoefficient, HoldFirst];
withExpandingCoefficient[code_] :=
Module[{inCoefficient},
Internal`InheritedBlock[{Coefficient},
Unprotect[Coefficient];
Coefficient[expr_, args__] :=
Block[{inCoefficient = True},
Coefficient[Expand[expr], args]] /; ! TrueQ[inCoefficient];
Protect[Coefficient];
code
]
];
The usage is similar to the first case:
In[92]:= withExpandingCoefficient[Coefficient[expr,x,234]]//Timing
Out[92]= {0.156,3116518719381876183528738595379210}
The main Coefficient function remains unaffected however:
In[93]:= DownValues[Coefficient]
Out[93]= {}
Coefficient will not expand unless it deems it absolutely necessary to do so. This does indeed avoid memory explosions. I believe it has been this way since version 3 (I think I was working on it around 1995 or so).
It can also be faster to avoid expansion. Here is a simple example.
In[28]:= expr = Sum[x^i + y^j + z^k, {i, 15}, {j, 10}, {k, 20}]^20;
In[29]:= Coefficient[expr, x, 234]; // Timing
Out[29]= {0.81, Null}
But this next appears to hang in version 8, and takes at least a half minute in the development Mathematica (where Expand was changed).
Coefficient[Expand[expr], x, 234]; // Timing
Possibly some heuristics should be added to look for univariates that will not explode. Does not seem like a high priority item though.
Daniel Lichtblau
expr = Sum[x^i, {i, 15}]^30;
scoeff[ex_, var_, n_] /; PolynomialQ[ex, var] :=
ex + O[var]^(n + 1) /.
Verbatim[SeriesData][_, 0, c_List, nmin_, nmax_, 1] :>
If[nmax - nmin != Length[c], 0, c[[-1]]];
Timing[scoeff[expr, x, 234]]
seems quite fast, too.
After some experimentation following Rolf Mertig's answer, this appears to be the fastest method on expressions of the type Sum[x^i, {i, 15}]^30:
SeriesCoefficient[expr, {x, 0, 234}]
The problem:
I am trying to solve this differential equation:
K[x_, x1_] := 1;
NDSolve[{A''[x] == Integrate[K[x, x1] A[x1], {x1, 0, 1}],
A[0] == 0, A'[1] == 1}, A[x], x]
and I'm getting errors (Function::slotn and NDSolve::ndnum)
(it should return a numeric function that is equal to 3/16 x^2 + 5/8 x)
I am looking for a way to solve this differential equation: Is there a way to write it in a better form, such that NDSolve will understand it? Is there another function or package that can help?
Note 1: In my full problem, K[x, x1] is not 1 -- it depends (in a complex way) on x and x1.
Note 2: Naively differentiating both sides of the equation with respect to x won't work, because the integration limits are definite.
My first impression:
It seems that Mathematica doesn't like me referencing the value of A at a specific point; the same errors occur with this simplified version:
NDSolve[{A''[x] == A[0.5], A[0] == 0, A'[1] == 1}, A[x], x]
(it should return a numeric function that is equal to 2/11 x^2 + 7/11 x)
In this case one can avoid the problem by analytically solving A''[x] == c and then finding c, but in my original problem this seems not to work: it only transforms the differential equation into an integral one, which (N)DSolve doesn't solve either.
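For the record, here is the indeterminate-coefficients computation for the simplified version (a quick sketch of mine; the result matches the 2/11 and 7/11 quoted above):
A[x_] := a2 x^2 + a1 x;  (* a0 = 0 follows from A[0] == 0 *)
Solve[{A''[x] == A[1/2], A'[1] == 1}, {a2, a1}]
(* {{a2 -> 2/11, a1 -> 7/11}} *)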
I can suggest a way to reduce your equation to an integral equation, which can be solved numerically by approximating its kernel with a matrix, thereby reducing the integration to matrix multiplication.
First, it is clear that the equation can be integrated twice over x, first from 1 to x (using A'[1] == 1) and then from 0 to x (using A[0] == 0), so that:
A[x] == A[0] + A'[1] x + Integrate[intIntK[x, x1] A[x1], {x1, 0, 1}]
where intIntK[x, x1] == Integrate[Integrate[K[z, x1], {z, 1, y}], {y, 0, x}] is the twice-integrated kernel.
We can now discretize this equation, putting it on an equidistant grid:
a[[i]] == rhs[[i]] + delta Sum[intIntK[grid[[i]], grid[[j]]] a[[j]], {j, 1, Length[grid]}]
Here, A[x] becomes a vector, and the integrated kernel intIntK becomes a matrix, while integration is replaced by matrix multiplication. The problem is then reduced to a system of linear equations.
The easiest case (that I will consider here) is when the kernel intIntK can be derived analytically; in this case this method will be quite fast. Here is the function to produce the integrated kernel as a pure function:
Clear[computeDoubleIntK]
computeDoubleIntK[kernelF_] :=
Block[{x, x1},
Function[
Evaluate[
Integrate[
Integrate[kernelF[y, x1], {y, 1, x}] /. x -> y, {y, 0, x}] /.
{x -> #1, x1 -> #2}]]];
In our case:
In[99]:= K[x_,x1_]:=1;
In[100]:= kernel = computeDoubleIntK[K]
Out[100]= -#1+#1^2/2&
Here is the function to produce the kernel matrix and the r.h.s. vector:
computeDiscreteKernelMatrixAndRHS[intkernel_, a0_, aprime1_ ,
delta_, interval : {_, _}] :=
Module[{grid, rhs, matrix},
grid = Range[Sequence @@ interval, delta];
rhs = a0 + aprime1*grid; (* constant plus a linear term *)
matrix =
IdentityMatrix[Length[grid]] - delta*Outer[intkernel, grid, grid];
{matrix, rhs}]
To give a very rough idea of how this may look (I use delta = 1/2 here):
In[101]:= computeDiscreteKernelMatrixAndRHS[kernel,0,1,1/2,{0,1}]
Out[101]= {{{1,0,0},{3/16,19/16,3/16},{1/4,1/4,5/4}},{0,1/2,1}}
We now need to solve the linear equation, and interpolate the result, which is done by the following function:
Clear[computeSolution];
computeSolution[intkernel_, a0_, aprime1_, delta_, interval : {_, _}] :=
 With[{grid = Range[Sequence @@ interval, delta]},
  Interpolation@Transpose[{
    grid,
    LinearSolve @@
     computeDiscreteKernelMatrixAndRHS[intkernel, a0, aprime1, delta, interval]
    }]]
Here I will call it with a delta = 0.1:
In[90]:= solA = computeSolution[kernel,0,1,0.1,{0,1}]
Out[90]= InterpolatingFunction[{{0.,1.}},<>]
We now plot the result vs. the exact analytical solution found by @Sasha, as well as the error:
I intentionally chose delta large enough so the errors are visible. If you chose delta say 0.01, the plots will be visually identical. Of course, the price of taking smaller delta is the need to produce and solve larger matrices.
For kernels that can be obtained analytically, the main bottleneck will be in LinearSolve, but in practice it is pretty fast (for matrices that are not too large). When the kernel cannot be integrated analytically, the main bottleneck will be in computing it at many points (matrix creation; the matrix inverse has a larger asymptotic complexity, but that only starts to play a role for really large matrices, which are not necessary in this approach, since it can be combined with an iterative one - see below). You will typically define:
intK[x_?NumericQ, x1_?NumericQ] := NIntegrate[K[y, x1], {y, 1, x}]
intIntK[x_?NumericQ, x1_?NumericQ] := NIntegrate[intK[z, x1], {z, 0, x}]
As a way to speed it up in such cases, you can precompute the kernel intK on a grid and then interpolate, and the same for intIntK. This will however introduce additional errors, which you'll have to estimate (account for).
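Concretely, the precomputation could look something like this (a sketch using the intIntK defined just above; the grid density controls the additional error):
kgrid = Range[0, 1, 0.05];
intIntKinterp = Interpolation[
   Flatten[Table[{{x, x1}, intIntK[x, x1]}, {x, kgrid}, {x1, kgrid}], 1]];
One would then pass intIntKinterp instead of the analytic kernel to computeDiscreteKernelMatrixAndRHS.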
The grid itself needs not be equidistant (I just used it for simplicity), but may (and probably should) be adaptive, and generally non-uniform.
As a final illustration, consider an equation with a non-trivial but symbolically integrable kernel:
In[146]:= sinkern = computeDoubleIntK[50*Sin[Pi/2*(#1-#2)]&]
Out[146]= (100 (2 Sin[1/2 \[Pi] (-#1+#2)]+Sin[(\[Pi] #2)/2]
(-2+\[Pi] #1)))/\[Pi]^2&
In[157]:= solSin = computeSolution[sinkern,0,1,0.01,{0,1}]
Out[157]= InterpolatingFunction[{{0.,1.}},<>]
Here are some checks:
In[163]:= Chop[{solSin[0],solSin'[1]}]
Out[163]= {0,1.}
In[153]:=
diff[x_?NumericQ]:=
solSin''[x] - NIntegrate[50*Sin[Pi/2*(#1-#2)]&[x,x1]*solSin[x1],{x1,0,1}];
In[162]:= diff /@ Range[0, 1, 0.1]
Out[162]= {-0.0675775,-0.0654974,-0.0632056,-0.0593575,-0.0540479,-0.0474074,
-0.0395995,-0.0308166,-0.0212749,-0.0112093,0.000369261}
To conclude, I just want to stress that one has to perform a careful error-estimation analysis for this method, which I did not do.
EDIT
You can also use this method to get the initial approximate solution, and then iteratively improve it using FixedPoint or other means - in this way you will have a relatively fast convergence and will be able to reach the required precision without the need to construct and solve huge matrices.
This is complementary to Leonid Shifrin's approach. We start with a linear function that interpolates the value and first derivative at the starting point. We use that in the integration with the given kernel function. We can then iterate, using each previous approximation in the integrated kernel that is used to make the next approximation.
I show an example below, using a more complicated kernel than just a constant function. I'll take it through two iterations and show tables of discrepancies.
kernel[x_, y_] := Sqrt[x]/(y^2 + 1/5)*Sin[x^2 + y]
intkern[x_?NumericQ, aa_] :=
NIntegrate[kernel[x, y]*aa[y], {y, 0, 1}, MinRecursion -> 2,
AccuracyGoal -> 3]
Clear[a];
a0 = 0;
a1 = 1;
a[0][x_] := a0 + a1*x
soln1 = a[1][x] /.
   First[NDSolve[{a[1]''[x] == intkern[x, a[0]],
      a[1][0] == a0, a[1][1] == a1}, a[1][x], {x, 0, 1}]];
a[1][x_] = soln1;
In[283]:= Table[a[1]''[x] - intkern[x, a[1]], {x, 0., 1, .1}]
Out[283]= {4.336808689942018*10^-19, 0.01145100326794241, \
0.01721655945379122, 0.02313249302884235, 0.02990900241909161, \
0.03778448183557359, 0.04676409320217928, 0.05657128568058478, \
0.06665818935524814, 0.07624149919589895, 0.08412643746245929}
In[285]:=
soln2 = a[2][x] /.
First[NDSolve[{a[2]''[x] == intkern[x, a[1]],
a[2][0] == a0, a[2][1] == a1}, a[2][x], {x, 0, 1}]];
a[2][x_] = soln2;
In[287]:= Table[a[2]''[x] - intkern[x, a[2]], {x, 0., 1, .1}]
Out[287]= {-2.168404344971009*10^-19, -0.001009606971360516, \
-0.00152476679745811, -0.002045817184941901, -0.002645356229312557, \
-0.003343218015068372, -0.004121109614310836, -0.004977453722712966, \
-0.005846840469889258, -0.006731367269472544, -0.007404971586975062}
So we have errors of less than .01 at this stage. Not too bad. One drawback is that it was fairly slow to get the second approximation. There may be ways to tune NDSolve to improve on that.
This is complementary to Leonid's method for two reasons.
(1) If this did not converge well because the initial linear approximation was not sufficiently close to the true result, one might instead begin with an approximation found by a finite differencing scheme. That would be akin to what he did.
(2) He pretty much indicated this himself, as a method that might follow his and produce refinements.
Daniel Lichtblau
The way your equation is currently written, A''[x] == const, and that constant is independent of x. Hence the solution always has the form of a quadratic polynomial. Your problem then reduces to solving for indeterminate coefficients:
In[13]:= A[x_] := a2 x^2 + a1 x + a0;
In[14]:= K[x_, x1_] := 1;
In[16]:= Solve[{A''[x] == Integrate[K[x, x1] A[x1], {x1, 0, 1}],
A[0] == 0, A'[1] == 1}, {a2, a1, a0}]
Out[16]= {{a2 -> 3/16, a1 -> 5/8, a0 -> 0}}
In[17]:= A[x] /. First[%]
Out[17]= (5 x)/8 + (3 x^2)/16
I have a question regarding Mathematica's global optimization capability. I came across this text related to the NAG toolbox (a kind of white paper).
Now I tried to solve the test case from the paper. As expected, Mathematica was pretty fast in solving it.
n=2;
fun[x_,y_]:=10 n+(x-2)^2-10Cos[2 Pi(x-2)]+(y-2)^2-10 Cos[2 Pi(y-2)];
NMinimize[{fun[x,y],-5<= x<= 5&&-5<= y<= 5},{x,y},Method->{"RandomSearch","SearchPoints"->13}]//AbsoluteTiming
Output was
{0.0470026,{0.,{x->2.,y->2.}}}
One can see the points visited by the optimization routine.
{sol, pts} = Reap[
   NMinimize[{fun[x, y], -5 <= x <= 5 && -5 <= y <= 5}, {x, y},
    Method -> {"RandomSearch", "SearchPoints" -> 13},
    EvaluationMonitor :> Sow[{x, y}]]];
Show[
 ContourPlot[fun[x, y], {x, -5.5, 5.5}, {y, -5.5, 5.5},
  ColorFunction -> "TemperatureMap",
  Contours -> Function[{min, max}, Range[min, max, 5]],
  ContourLines -> True, PlotRange -> All],
 ListPlot[pts, Frame -> True, Axes -> False, PlotRange -> All,
  PlotStyle -> Directive[Red, Opacity[.5], PointSize[Large]]],
 Graphics[
  Map[{Black, Opacity[.7], Arrowheads[.026], Arrow[#]} &,
   Partition[pts // First, 2, 1]],
  PlotRange -> {{-5.5, 5.5}, {-5.5, 5.5}}]]
Now I thought of solving the same problem in higher dimensions. For a problem of five variables, Mathematica started to fall into the trap of a local minimum, even when a large number of search points is allowed.
n = 5;
funList[x_?ListQ] := Block[{i, symval, rule},
  i = Table[ToExpression["x$" <> ToString[j]], {j, 1, n}];
  symval = 10 n + Sum[(i[[k]] - 2)^2 - 10 Cos[2 Pi (i[[k]] - 2)], {k, 1, n}];
  rule = MapThread[(#1 -> #2) &, {i, x}];
  symval /. rule]
val = Table[RandomReal[{-5, 5}], {i, 1, n}];
vars = Table[ToExpression["x$" <> ToString[j]], {j, 1, n}];
cons = Table[-5 <= ToExpression["x$" <> ToString[j]] <= 5, {j, 1, n}] /. List -> And;
NMinimize[{funList[vars], cons}, vars,
  Method -> {"RandomSearch", "SearchPoints" -> 4013}] // AbsoluteTiming
The output was not what we would have liked to see. It took 49 seconds on my Core 2 Duo machine, and it is still a local minimum:
{48.5157750,{1.98992,{x$1->2.,x$2->2.,x$3->2.,x$4->2.99496,x$5->1.00504}}}
Then I tried SimulatedAnnealing with 100000 iterations.
NMinimize[{funList[vars],cons},vars,Method->"SimulatedAnnealing",MaxIterations->100000]//AbsoluteTiming
The output was still not agreeable:
{111.0733530,{0.994959,{x$1->2.,x$2->2.99496,x$3->2.,x$4->2.,x$5->2.}}}
Now, Mathematica has an exact optimization function called Minimize. As expected, it is impractical for larger problems, and it becomes infeasible very quickly as the problem size increases.
n = 3;
funList[x_?ListQ] := Block[{i, symval, rule},
  i = Table[ToExpression["x$" <> ToString[j]], {j, 1, n}];
  symval = 10 n + Sum[(i[[k]] - 2)^2 - 10 Cos[2 Pi (i[[k]] - 2)], {k, 1, n}];
  rule = MapThread[(#1 -> #2) &, {i, x}];
  symval /. rule]
val = Table[RandomReal[{-5, 5}], {i, 1, n}];
vars = Table[ToExpression["x$" <> ToString[j]], {j, 1, n}];
cons = Table[-5 <= ToExpression["x$" <> ToString[j]] <= 5, {j, 1, n}] /. List -> And;
Minimize[{funList[vars], cons}, vars] // AbsoluteTiming
The output is perfectly all right:
{5.3593065,{0,{x$1->2,x$2->2,x$3->2}}}
But if one pushes the problem size one step further, to n = 4, the solution does not appear for a long time in my notebook.
Now the question is simple: does anybody here think there is a way to solve this problem numerically and efficiently in Mathematica for higher-dimensional cases? Let's share our ideas and experience. One should remember that this is a benchmark nonlinear global optimization problem; most numerical root-finding/minimization algorithms search only for a local minimum.
BR
P
Increasing initial points allows me to get to the global minimum:
n = 5;
funList[x_?ListQ] := Total[10 + (x - 2)^2 - 10 Cos[2 Pi (x - 2)]]
val = Table[RandomReal[{-5, 5}], {i, 1, n}];
vars = Array[Symbol["x$" <> ToString[#]] &, n];
cons = Apply[And, Thread[-5 <= vars <= 5]];
These are the calls. The timings may not be too impressive, but with randomized algorithms one has to have enough initial samples, or a good feel for the function.
In[27]:= NMinimize[{funList[vars], cons}, vars,
Method -> {"DifferentialEvolution",
"SearchPoints" -> 5^5}] // AbsoluteTiming
Out[27]= {177.7857768, {0., {x$1 -> 2., x$2 -> 2., x$3 -> 2.,
x$4 -> 2., x$5 -> 2.}}}
In[29]:= NMinimize[{funList[vars], cons}, vars,
Method -> {"RandomSearch", "SearchPoints" -> 7^5}] // AbsoluteTiming
Out[29]= {609.3419281, {0., {x$1 -> 2., x$2 -> 2., x$3 -> 2.,
x$4 -> 2., x$5 -> 2.}}}
Have you seen this page of the documentation? It goes over the methods supported by NMinimize, with examples for each. One of the SimulatedAnnealing examples is Rastrigin's function (or one very similar), and the docs suggest that you need to increase the perturbation size to get good results.
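For example, something along these lines (the "PerturbationScale" option is the one that documentation page refers to; the values here are just starting points to experiment with):
NMinimize[{funList[vars], cons}, vars,
 Method -> {"SimulatedAnnealing", "PerturbationScale" -> 3,
   "SearchPoints" -> 100}]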
While preparing an answer to "Count how many different values a list takes in Mathematica" I came across an instability (for lack of a better term) in both DeleteDuplicates and Tally that I do not understand.
Consider first:
a = {2.2000000000000005, 2.2, 2.1999999999999999};
a // InputForm
DeleteDuplicates@a // InputForm
Union@a // InputForm
Tally@a // InputForm
{2.2000000000000006`, 2.2, 2.1999999999999997`}
{2.2000000000000006`, 2.2, 2.1999999999999997`}
{2.1999999999999997`, 2.2, 2.2000000000000006`}
{{2.2000000000000006`, 3}}
This behavior is as I expected in each case. Tally compensates for the slight numerical differences and sees each element as being equivalent. Union and DeleteDuplicates see all elements as unique. (This behavior of Tally is not documented to my knowledge, but I have made use of it before.)
Now, consider this complication:
a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};
a // InputForm
DeleteDuplicates@a // InputForm
Union@a // InputForm
Tally@a // InputForm
{11/5, 2.2000000000000006, 2.2, 2.1999999999999997}
{11/5, 2.2000000000000006, 2.2}
{2.1999999999999997, 2.2, 11/5, 2.2000000000000006}
{{11/5, 1}, {2.2000000000000006, 1}, {2.2, 2}}
The output of Union is as anticipated, but the results from both DeleteDuplicates and Tally are surprising.
Why does DeleteDuplicates suddenly see 2.1999999999999997 as a duplicate to be eliminated?
Why does Tally suddenly see 2.2000000000000006 and 2.2 as distinct, when it did not before?
As a related point, it can be seen that packed arrays affect Tally:
a = {2.2000000000000005, 2.2, 2.1999999999999999};
a // InputForm
Tally@a // InputForm
{2.2000000000000006, 2.2, 2.1999999999999997}
{{2.2000000000000006`, 3}}
a = Developer`ToPackedArray#a;
a // InputForm
Tally@a // InputForm
{2.2000000000000006, 2.2, 2.1999999999999997}
{{2.2000000000000006`, 1}, {2.2, 2}}
The exhibited behaviour appears to be the result of the usual woes associated with floating-point arithmetic, coupled with some questionable behaviour in some of the functions under discussion.
SameQ Is Not An Equivalence Relation
First on the slate: consider that SameQ is not an equivalence relation because it is not transitive:
In[1]:= $a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};
In[2]:= SameQ[$a[[2]], $a[[3]]]
Out[2]= True
In[3]:= SameQ[$a[[3]], $a[[4]]]
Out[3]= True
In[4]:= SameQ[$a[[2]], $a[[4]]]
Out[4]= False (* !!! *)
So right out the gate, we are faced with erratic behaviour even before turning to the other functions.
This behaviour is due to the documented rule for SameQ that says that two real numbers are treated as "equal" if they "differ in their last binary digit":
In[5]:= {# // InputForm, Short@RealDigits[#, 2][[1, -10 ;;]]} & /@ $a[[2 ;; 4]] // TableForm
(* showing only the last ten binary digits for each *)
Out[5]//TableForm= 2.2000000000000006 {0,1,1,0,0,1,1,0,1,1}
2.2 {0,1,1,0,0,1,1,0,1,0}
2.1999999999999997 {0,1,1,0,0,1,1,0,0,1}
Note that, strictly speaking, $a[[3]] and $a[[4]] differ in the last two binary digits, but the magnitude of the difference is one bit of the lowest order.
DeleteDuplicates Does Not Really Use SameQ
Next, consider that the documentation states that DeleteDuplicates[...] is equivalent to DeleteDuplicates[..., SameQ]. Well, that is strictly true -- but probably not in the sense that you might expect:
In[6]:= DeleteDuplicates[$a] // InputForm
Out[6]//InputForm= {11/5, 2.2000000000000006, 2.2}
In[7]:= DeleteDuplicates[$a, SameQ] // InputForm
Out[7]//InputForm= {11/5, 2.2000000000000006, 2.2}
The same, as documented... but what about this:
In[8]:= DeleteDuplicates[$a, SameQ[#1, #2]&] // InputForm
Out[8]//InputForm= {11/5, 2.2000000000000006, 2.1999999999999997}
It appears that DeleteDuplicates goes through a different branch of logic when the comparison function is manifestly SameQ as opposed to a function whose behaviour is identical to SameQ.
Tally is... Confused
Tally shows similar, but not identical, erratic behaviour:
In[9]:= Tally[$a] // InputForm
Out[9]//InputForm= {{11/5, 1}, {2.2000000000000006, 1}, {2.2, 2}}
In[10]:= Tally[$a, SameQ] // InputForm
Out[10]//InputForm= {{11/5, 1}, {2.2000000000000006, 1}, {2.2, 2}}
In[11]:= Tally[$a, SameQ[#1, #2]&] // InputForm
Out[11]//InputForm= {{11/5, 1}, {2.2000000000000006, 1}, {2.2000000000000006, 2}}
That last is particularly baffling, since the same number appears twice in the list with different counts.
Equal Suffers Similar Problems
Now, back to the problem of floating point equality. Equal fares a little bit better than SameQ -- but emphasis on "little". Equal looks at the last seven binary digits instead of the last one. That doesn't fix the problem, though... troublesome cases can always be found:
In[12]:= $x1 = 0.19999999999999823;
$x2 = 0.2;
$x3 = 0.2000000000000018;
In[15]:= Equal[$x1, $x2]
Out[15]= True
In[16]:= Equal[$x2, $x3]
Out[16]= True
In[17]:= Equal[$x1, $x3]
Out[17]= False (* Oops *)
The Villain Unmasked
The main culprit in all of this discussion is the floating-point real number format. It is simply not possible to represent arbitrary real numbers in full fidelity using a finite format. This is why Mathematica stresses symbolic form and makes every possible attempt to work with expressions in symbolic form for as long as possible. If one finds numeric forms to be unavoidable, then one must wade into that swamp called numerical analysis to sort out all of the corner cases involving equality and inequality.
Poor SameQ, Equal, DeleteDuplicates, Tally and all of their friends never stood a chance.
In my opinion, relying on the behavior of Tally or DeleteDuplicates with the default (SameQ-based) comparison function on numerical values is relying on implementation details, because SameQ does not have well-defined semantics on numerical values. What you see is what would normally be called "undefined behavior" in other languages. What one should be doing to get robust results is to use
DeleteDuplicates[a,Equal]
or
Tally[a,Equal]
and similarly for Union (although I would not use Union here, since an explicit test leads to quadratic complexity for it). On the other hand, if your desire is to understand the internal implementation details because you want to make use of them, I cannot say much except to warn that this may cause more harm than good, particularly because these implementations are subject to change from version to version, even assuming that you get all their details right for some particular version.
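To illustrate with the numbers from the question, this should group all four values together, since Equal tolerates differences in the last few bits and compares exact with approximate numbers (expected output shown as a comment):
a = {11/5, 2.2000000000000005, 2.2, 2.1999999999999997};
Tally[a, Equal]
(* expect {{11/5, 4}} *)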