Standard deviation of one element - wolfram-mathematica

When I try to execute
StandardDeviation[{1}]
I get an error
StandardDeviation::shlen: "The argument {1} should have at least two elements"
But std of one element is 0, isn't it?

The standard deviation is commonly defined as the square root of the unbiased estimator of the variance:
s = Sqrt[ Sum[(x_i - xbar)^2, {i, 1, N}] / (N - 1) ]
For a single sample, N = 1, so both the sum in the numerator and the N - 1 in the denominator are zero, giving 0/0, which is undefined. Hence the standard deviation is undefined for a single sample in Mathematica.
Now, depending on your conventions, you might want to define a standard deviation for a single sample yourself (returning 0, Null, or some other value). Here's an example that shows how to define it so that a single sample gives 0.
std[x_List] := Which[(Length[x] == 1), 0, True, StandardDeviation[x]]
std[{1}]
Out[1]= 0

The standard deviation of a constant is zero.
The estimated standard deviation of one sample is undefined.
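A quick illustration of both statements:
StandardDeviation[{3, 3, 3}]   (* a constant list of two or more elements: 0 *)
StandardDeviation[{3}]         (* a single sample: the StandardDeviation::shlen error above *)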

If you want some formality:
p[x_] := DiracDelta[x - mu];
expValue = Integrate[x p[x] , {x, -Infinity, Infinity}]
stdDev = Sqrt[Integrate[(x - expValue)^2 p[x] , {x, -Infinity, Infinity}]]
(*
-> ConditionalExpression[mu, mu \[Element] Reals]
-> ConditionalExpression[0, mu \[Element] Reals]
*)
Edit
Or better, using Mathematica's ProbabilityDistribution[]:
dist = ProbabilityDistribution[DiracDelta[x - mu], {x, -Infinity, Infinity}];
{Mean[dist], StandardDeviation[dist]}
(*
-> { mu, ConditionalExpression[0, mu \[Element] Reals]}
*)

If your population consists of a single element, then yes, the standard deviation of your population is 0. However, standard deviations are typically computed on samples rather than on the entire population, so instead of dividing by the number of elements in the sample, you divide by the number of elements minus one (Bessel's correction). This accounts for the extra uncertainty inherent in estimating from a sample rather than from the whole population.
Performing a calculation of the standard deviation over a population of size 1 makes absolutely no sense, which I think is where the confusion is coming from. If you know that your population contains only one element then finding out the standard deviation of that element is pointless, so generally you will see the standard deviation of a single element written as undefined.
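A small illustration of the two conventions (the numbers are made up):
xs = {2, 4, 4, 4, 5, 5, 7, 9};
StandardDeviation[xs]            (* built-in sample version, divides by n - 1: 4 Sqrt[2/7] ~ 2.14 *)
Sqrt[Mean[(xs - Mean[xs])^2]]    (* population version, divides by n: 2 *)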

Standard deviation - a measure of how much the values in a set deviate from the set's average - doesn't make any sense for a list of one element (you can define it to be 0 if you want).

Related

Algorithm for recovering sparse vector x from Ax - Restricted Isometry Property

I encountered this problem and can't figure out how to do it. Help will be appreciated!
I'm pretty sure that the distribution of matrices needs to satisfy the Restricted Isometry Property, but beyond that I don't know how to recover the vector:
Assume that x is s-sparse, for a constant s, which we set as s = 60
for convenience.
Find the best parameter k (you can use O()-notation) and a distribution of matrices A of size k×n, together with an efficient ‘recovery algorithm’, such that both of the
following two properties hold:
(a) With probability at least 0.99, for all s-sparse vectors x ∈ R^n with coefficients in {−1, 0, 1}, the recovery algorithm returns x upon input Ax, and
(b) For all non s-sparse vectors x with coefficients in {−1, 0, 1}, with
probability at least 0.99, the algorithm returns ‘FAIL’.
Note 1: The order of the quantifiers in (a) and (b) is reversed. In (a) you have 'with prob. ... for all' and in (b) you have 'for all ... with prob.'.
Note 2: The requirement of the coefficients being in {−1, 0, 1} can be removed, and a
stronger statement can be made for integer coefficients.
Thank you!

Bug in density calculation std::piecewise_constant_distribution?

It seems that std::piecewise_constant_distribution computes the densities incorrectly, at least with GCC and its standard library.
According to http://www.cplusplus.com/reference/random/piecewise_constant_distribution/:
The densities should be computed as:
Checking this manually reveals the bug!
This can be seen here: http://coliru.stacked-crooked.com/a/ca171bf600b5148f
The source code related to this is found in /usr/include/c++/4.8/bits/random.tcc (on linux) and the extract of the initialization function _M_initialize called by the constructor shows that there is something incorrect here:
const double __sum = std::accumulate(_M_den.begin(), _M_den.end(), 0.0);
__detail::__normalize(_M_den.begin(), _M_den.end(), _M_den.begin(),
                      __sum);   // <----- WRONG
// THIS is not the cumulative distribution (since the above normalization
// does not give the probability of the intervals!)
_M_cp.reserve(_M_den.size());
std::partial_sum(_M_den.begin(), _M_den.end(),
                 std::back_inserter(_M_cp));
// Make sure the last cumulative probability is one.
_M_cp[_M_cp.size() - 1] = 1.0;
// Dividing here by the interval length is WRONG!!!
for (size_t __k = 0; __k < _M_den.size(); ++__k)
  _M_den[__k] /= _M_int[__k + 1] - _M_int[__k];
Here's the applicable part of the specification (N4296): the probability density over each subinterval [b_k, b_k+1) is w_k / (S (b_k+1 − b_k)), where S is the sum of all the weights.
As can be seen clearly, the summation applies only to the weights.
It's easy to see that there's something wrong with your testing code. Reducing the number of intervals to two, the first of length 1 and the second of length 2:
std::array<PREC,3> intervals {0, 1, 3};
and giving each interval weight equal to its length:
std::array<PREC,2> weights {1, 2};
One would expect the density to be constant. But your code reports:
Probability : 0.200000000000000011102230246252
Probability : 0.400000000000000022204460492503
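A quick check of that example against the density formula above (a sketch, here in Mathematica; b and w are simply the boundaries and weights):
b = {0, 1, 3};   (* interval boundaries *)
w = {1, 2};      (* weights *)
(* density over each subinterval: w_k / (S (b_(k+1) - b_k)), with S the sum of the weights *)
w/(Total[w] Differences[b])
(* -> {1/3, 1/3}: the density is indeed constant *)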
The wording on cplusplus.com is ambiguous. cppreference.com gives a clearer explanation, and it matches what is written in the C++ standard.

NMinimize with function containing random variables

I was wondering whether it is possible to use NMinimize from Mathematica with an objective function which contains random variables. E.g., I have a function whose parameters follow a distribution (normal and truncated normal). I want to fit its histogram to data that I have, and I constructed an objective function which now needs to be minimized (so the objective function depends on the mus and sigmas of the parameters, which need to be determined). If I run my code, there is an error message: it claims the parameter for the NormalDistribution needs to be positive. (If I plug in numbers for the mus and sigmas of my objective function by hand, I don't get an error message.)
So, I am wondering if it is not possible for NMinimize to handle a non-analytic function.
Thanks!
Here, I give you an example code (please note that the original function is more complicated)
listeS and listeT are both lists of event times. I want to fit the curve of my statistical model for the times (here a very simple one; it consists of a truncated normal distribution) to the data I have.
For this I compare the survival curves and need to minimize the sum of the least squares.
My problem is that the function NMinimize doesn't seem to work. (Please note that the original objective function is a more complicated function with parameters that are random variables.)
(* Both lists are supposed to be the list of times *)
SurvivalS[listeS_, x_] := Module[{res, survivald},
survivald = SurvivalDistribution[listeS];
res = SurvivalFunction[survivald, x];
res]
Residuum[listeT_, listeS_] :=
Table[(SurvivalS[listeT, listeT[[i]]] - SurvivalS[listeS, listeT[[i]]]), {i,
1, dataN}];
LeastSquare[listeT_, listeS_] :=
Total[Function[x, x^2] /@
Residuum[listeT,
listeS]];(* objective function, here it is the sum of least squares *)
objectiveF[mu_, sigma_] :=
Piecewise[{{LeastSquare[listeT, listeS[mu, sigma]], mu > 0 && sigma > 0}},
20 (1 + (sigma + mu)^2)];
pool = 100; (* No. points from MonteCarlo *)
listeS[mu_, sigma_] := RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[mu, sigma]],pool];(* simulated data *)
listeT = Sort[RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[.5, .9]],60]]; (* list of "measured" data *)
dataN = Length[listeT];
NMinimize[objectiveF[mu, .9], {{mu, .4}}]
The error message is: "RandomVariate::realprm: Parameter mu at position 1 in NormalDistribution[mu,0.9] is expected to be real. >>"
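A likely cause (my reading, not something stated in the thread): NMinimize first evaluates objectiveF[mu, .9] with mu still symbolic, so RandomVariate is handed a symbolic mean and complains. A common workaround is to restrict the definition to numeric arguments, so it only evaluates once the optimizer supplies actual numbers:
Clear[objectiveF];
(* evaluate only for explicit numbers; a symbolic mu just stays unevaluated *)
objectiveF[mu_?NumericQ, sigma_?NumericQ] :=
  Piecewise[{{LeastSquare[listeT, listeS[mu, sigma]], mu > 0 && sigma > 0}},
   20 (1 + (sigma + mu)^2)];
NMinimize[objectiveF[mu, .9], {{mu, .4}}]
Note that the objective is still stochastic (a fresh RandomVariate sample on every evaluation), so the minimizer sees a noisy landscape; seeding the generator inside the objective or increasing pool may make it better behaved.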

Please explain this code in Mathematica that creates a heat / intensity map

Graphics@Flatten[Table[
   (*colors, dont mind*)
   {ColorData["CMYKColors"][(a[[r, t]] - .000007)/(.0003 - 0.000007)],
    (*point size, dont mind*)
    PointSize[1/Sqrt[r]/10],
    (*Coordinates for your points; "a" is your data matrix*)
    Point[
     {(rr = Log[.025 + (.58 - .25)/64 r]) Cos@(tt = t 5 Degree),
      rr Sin@tt}]
    } &@
   (*values for the iteration*)
   , {r, 7, 64}, {t, 1, 72}], 1]
  (*Rotation, dont mind*)
  /. gg : Graphics[___] :> Rotate[gg, Pi/2]
Okay, I'll bite. First, Mathematica allows functions to be applied via one of several forms: standard form - f[x], prefix form - f @ x, postfix form - x // f, and infix form - x ~ f ~ y. Belisarius's code uses both standard and prefix form.
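For instance, with a hypothetical function f:
f[x]        (* standard form *)
f @ x       (* prefix form *)
x // f      (* postfix form *)
x ~ f ~ y   (* infix form, for a function of two arguments *)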
So, let's look at the outermost functions first: Graphics @ x /. gg : Graphics[___] :> Rotate[gg, Pi/2], where x is everything inside of Flatten. Essentially, this creates a Graphics object from x and, using a named pattern (gg : Graphics[___]), rotates the resulting Graphics object by 90 degrees.
Now, to create a Graphics object, we need to supply a bunch of primitives, and this is in the form of a nested list, where each sublist describes some element. This is done via the Table command, which has the form Table[ expr, iterators ]. Iterators can have several forms, but here they both have the form {var, min, max}, and since they lack a 4th term, they take on each value between min and max in integer steps. So, our iterators are {r, 7, 64} and {t, 1, 72}, and expr is evaluated for each value that they take on. Since we have two iterators, this produces a matrix, which would confuse Graphics, so using Flatten[ Table[ ... ], 1] we take every element of the matrix and put it into a simple list.
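A toy version of that structure (unrelated to the actual data, just to show the shape):
Table[{r, t}, {r, 1, 2}, {t, 1, 3}]
(* -> {{{1,1},{1,2},{1,3}},{{2,1},{2,2},{2,3}}}, a matrix of sublists *)
Flatten[Table[{r, t}, {r, 1, 2}, {t, 1, 3}], 1]
(* -> {{1,1},{1,2},{1,3},{2,1},{2,2},{2,3}}, a simple list of sublists *)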
Each element that Table produces is simply: color (ColorData), point size (PointSize), and point location (Point). So, with Flatten, we have created the following:
Graphics[{{color, point size, point}, {color, point size, point}, ... }]
The color generation is taken from the data, and it assumes that the data has been put into a list called a. The individual elements of a are accessed through the Part construct: [[]]. On the surface, the ColorData construct is a little odd, but it can be read as ColorData["CMYKColors"] returns a ColorDataFunction that produces a CMYK color value when a value between 0 and 1 is supplied. That is why the data from a is scaled the way it is.
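For example:
ColorData["CMYKColors"]         (* a ColorDataFunction *)
ColorData["CMYKColors"][0.75]   (* the color for the scaled value 0.75 *)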
The point size is generated from the radial coordinate. You'd expect with 1/Sqrt[r] the point size should be getting smaller as r increases, but the Log inverts the scale.
Similarly, the point location is produced from the radial and angular (t) variables, but Point only accepts them in {x,y} form, so he needed to convert them. Two odd constructs occur in the transformation from {r,t} to {x,y}: both rr and tt are Set (=) while calculating x, allowing them to be used when calculating y. Also, the term t 5 Degree lets Mathematica know that the angle is in degrees, not radians. Additionally, as written, there is a bug: immediately following the closing }, the terms & and @ should not be there.
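The same trick in isolation (with made-up values):
{(rr = 2) Cos@(tt = 30 Degree), rr Sin@tt}
(* rr and tt are assigned while computing the x coordinate and reused for y -> {Sqrt[3], 1} *)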
Does that help?

Continuous Fourier transform on discrete data using Mathematica?

I have some periodic data, but the amount of data is not a multiple of
the period. How can I Fourier analyze this data? Example:
% Let's create some data for testing:
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}]
% I now receive this data, but have no idea that it came from the
formula above. I'm trying to reconstruct the formula just from 'data'.
% Looking at the first few non-constant terms of the Fourier series:
ListPlot[Table[Abs[Fourier[data]][[x]], {x,2,20}], PlotJoined->True,
PlotRange->All]
shows an expected spike at 6 (since the number of periods is really
25000/(623*2*Pi) or about 6.38663, though we don't know this).
% Now, how do I get back 6.38663? One way is to "convolve" the data with
arbitrary multiples of Cos[x].
convolve[n_] := Sum[data[[x]]*Cos[n*x], {x,1,25000}]
% And graph the "convolution" near n=6:
Plot[convolve[n],{n,5,7}, PlotRange->All]
we see a spike roughly where expected.
% We try FindMaximum:
FindMaximum[convolve[n],{n,5,7}]
but the result is useless and inaccurate:
FindMaximum::fmmp:
Machine precision is insufficient to achieve the requested accuracy or
precision.
Out[119]= {98.9285, {n -> 5.17881}}
because the function is very wiggly.
% By refining our interval (using visual analysis on the plots), we
finally find an interval where convolve[] doesn't wiggle too much:
Plot[convolve[n],{n,6.2831,6.2833}, PlotRange->All]
and FindMaximum works:
FindMaximum[convolve[n],{n,6.2831,6.2833}] // FortranForm
List(1.984759605826571e7,List(Rule(n,6.2831853071787975)))
% However, this process is ugly, requires human intervention, and
computing convolve[] is REALLY slow. Is there a better way to do this?
% Looking at the Fourier series of the data, can I somehow divine the
"true" number of periods is 6.38663? Of course, the actual result
would be 6.283185, since my data fits that better (because I'm only
sampling at a finite number of points).
Based on Mathematica help for the Fourier function / Applications / Frequency Identification:
Checked on version 7
n = 25000;
data = Table[N[753 + 919*Sin[x/623 - 125]], {x, 1, n}];
pdata = data - Total[data]/Length[data];
f = Abs[Fourier[pdata]];
pos = Ordering[-f, 1][[1]]; (*the position of the first Maximal value*)
fr = Abs[Fourier[pdata Exp[2 Pi I (pos - 2) N[Range[0, n - 1]]/n],
FourierParameters -> {0, 2/n}]];
frpos = Ordering[-fr, 1][[1]];
N[(pos - 2 + 2 (frpos - 1)/n)]
returns 6.37072
Look for the period length using autocorrelation to get an estimate:
autocorrelate[data_, d_] :=
Plus @@ (Drop[data, d]*Drop[data, -d])/(Length[data] - d)
ListPlot[Table[{d, autocorrelate[data, d]}, {d, 0, 5000, 100}]]
A smart search for the first maximum away from d=0 may be the best estimate you can get from the available data?
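A sketch of that idea, assuming data and autocorrelate as defined above (a brute-force scan over every lag, so it is not fast):
acf = Table[autocorrelate[data, d], {d, 1, 5000}];
(* lag of the first local maximum away from d = 0 *)
peak = First[Flatten[Position[Differences[Sign[Differences[acf]]], -2]]] + 1
(* should land near 2 Pi 623 ~ 3914 samples per period *)
N[Length[data]/peak]
(* -> roughly 6.4 periods in the data *)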
(* the data *)
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}];
(* Find the position of the largest Fourier coefficient, after
removing the last half of the list (which is redundant) and the
constant term; the [[1]] is necessary because Ordering returns a list *)
f2 = Ordering[Abs[Take[Fourier[data], {2,Round[Length[data]/2+1]}]],-1][[1]]
(* Result: 6 *)
(* Directly find the least squares difference between all functions of
the form a+b*Sin[c*n-d], with intelligent starting values *)
sol = FindMinimum[Sum[((a+b*Sin[c*n-d]) - data[[n]])^2, {n,1,Length[data]}],
{{a,Mean[data]},{b,(Max[data]-Min[data])/2},{c,2*f2*Pi/Length[data]},d}]
(* Result (using //InputForm):
FindMinimum::sszero:
The step size in the search has become less than the tolerance prescribed by
the PrecisionGoal option, but the gradient is larger than the tolerance
specified by the AccuracyGoal option. There is a possibility that the method
has stalled at a point that is not a local minimum.
{2.1375902350021628*^-19, {a -> 753., b -> -919., c -> 0.0016051364365971107,
d -> 2.477886509998064}}
*)
(* Create a table of values for the resulting function to compare to 'data' *)
tab = Table[a+b*Sin[c*x-d], {x,1,Length[data]}] /. sol[[2]];
(* The maximal difference is effectively 0 *)
Max[Abs[data-tab]] // InputForm
(* Result: 7.73070496506989*^-12 *)
Although the above doesn't necessarily fully answer my question, I found it
somewhat remarkable.
Earlier, I'd tried using FindFit[] with Method -> NMinimize (which is
supposed to give a better global fit), but that didn't work well,
possibly because you can't give FindFit[] intelligent starting values.
The error I get bugs me but appears to be irrelevant.
