NMinimize with function containing random variables - wolfram-mathematica

I was wondering if it is possible to use NMinimize from Mathematica with an objective function that contains random variables. For example, I have a function whose parameters follow a distribution (normal and truncated normal). I want to fit its histogram to data that I have, so I constructed an objective function that now needs to be minimized (the objective function depends on the mus and sigmas of the parameters, which need to be determined). If I run my code, there is an error message: it claims the parameter for the NormalDistribution needs to be positive. (If I plug numbers for the mus and sigmas into my objective function by hand, I don't get an error message.)
So I am wondering whether NMinimize is simply unable to handle a non-analytic function.
Thanks!
Here is some example code (please note that the original function is more complicated).
listeS and listeT are both lists of event times. I want to fit the curve of my statistical model for the times (here a very simple one, consisting of a truncated normal distribution) to the data I have.
For this I compare the survival curves and need to minimize the sum of the least squares.
My problem is that NMinimize doesn't seem to work. (Please note that the original objective function is a more complicated function whose parameters are random variables.)
(* Both lists are supposed to be the list of times *)
SurvivalS[listeS_, x_] := Module[{res, survivald},
  survivald = SurvivalDistribution[listeS];
  res = SurvivalFunction[survivald, x];
  res]
Residuum[listeT_, listeS_] :=
  Table[(SurvivalS[listeT, listeT[[i]]] - SurvivalS[listeS, listeT[[i]]]), {i,
    1, dataN}];
LeastSquare[listeT_, listeS_] :=
  Total[Function[x, x^2] /@
    Residuum[listeT,
     listeS]]; (* objective function, here it is the sum of least squares *)
objectiveF[mu_, sigma_] :=
Piecewise[{{LeastSquare[listeT, listeS[mu, sigma]], mu > 0 && sigma > 0}},
20 (1 + (sigma + mu)^2)];
pool = 100; (* No. points from MonteCarlo *)
listeS[mu_, sigma_] := RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[mu, sigma]],pool];(* simulated data *)
listeT = Sort[RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[.5, .9]],60]]; (* list of "measured" data *)
dataN = Length[listeT];
NMinimize[objectiveF[mu, .9], {{mu, .4}}]
The error message is: "RandomVariate::realprm: Parameter mu at position 1 in NormalDistribution[mu,0.9] is expected to be real. >>"
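The usual cause here is that NMinimize evaluates objectiveF[mu, .9] symbolically before it starts the numerical search, so RandomVariate sees the bare symbol mu inside NormalDistribution. A common workaround, sketched below for the code above, is to restrict the definition with _?NumericQ so the body only evaluates once mu is an explicit number:
(* Restrict the pattern to numeric arguments; NMinimize then treats
   objectiveF as a black-box numeric function and never passes a
   symbolic mu into NormalDistribution *)
objectiveF[mu_?NumericQ, sigma_?NumericQ] :=
  Piecewise[{{LeastSquare[listeT, listeS[mu, sigma]], mu > 0 && sigma > 0}},
   20 (1 + (sigma + mu)^2)];
NMinimize[objectiveF[mu, .9], {{mu, .4}}]
Note that the objective is still stochastic, so the minimizer may behave erratically; wrapping the Monte Carlo sampling in BlockRandom with a fixed SeedRandom is one way to make each evaluation deterministic.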

Related

How to apply function involving elements in a column of a table

I have an n x m matrix of data.
How do I create a function with a sum over the elements of each column, such that if I input a value, I get back a 1 x m row (where m > 100)?
More specifically, I am computing a discrete Fourier transform of the data in each column, and it should work for any input frequency I put in.
Here is my code for a single column:
(* Length of time data *)
n = Length[t]
(* Compute discrete fourier transform at specified frequency f *)
DFT[f_] := (t[[2]] - t[[1]]) Sum[
mat[[i + 1]] * Exp[2 Pi I f mat[[i + 1]]], {i, 0, n - 1}];
I'd like to extend this to m columns so that if I want to compute the DFT for a given column at a specific frequency, I can just extract an element of a 1 x m row.
I've considered a function like Map, but it seems like it'll directly apply my function by inputting the value of each element in the row, which isn't exactly what I want.
I am guessing you just want to map a function over a column?
mat = RandomInteger[{0, 10}, {5, 6}];
map[f_, mat_?(MatrixQ[#] &), c_Integer /; c > 0] := f /@ mat[[All, c]]
map[f, mat, 2]
It seems like you just need to get the column. The way that matrices are stored in Mathematica has the first coordinate as the row and the second as the column. All coordinates start at 1, not 0. To get an element at a specific coordinate, you use matrix[[row, column]]. If you want a whole row, matrix[[row]]. If you want a column, matrix[[All, column]]. Accordingly, here is one way you might adjust the DFT function:
DFT[f_, list_] := (t[[2]] - t[[1]]) Sum[
list[[i]] * Exp[2 Pi I f list[[i]]], {i, 1, n}];
yourColumnDFT = DFT[f, matrix[[All, columnNumber]]]
In fact, you can make this even simpler by removing the call to Sum because these operations automatically map over lists by index:
DFT[f_, list_] := (t[[2]] - t[[1]]) Total[list Exp[2 Pi I f list]]
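To get the transform of every column at once, one option is to map the two-argument DFT over the transposed matrix (an illustrative sketch; allColumnDFTs is a hypothetical helper, and matrix and t are assumed to be defined as above):
(* Transpose turns columns into rows, so Map visits one column at a time *)
allColumnDFTs[f_, matrix_] := DFT[f, #] & /@ Transpose[matrix]
allColumnDFTs[f, matrix]  (* -> a 1 x m row of column DFTs *)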
By the way, there is a built-in function for this, Fourier, which gives a slightly different DFT than yours but is also useful. I recommend looking for built-in functions for these tasks in the future, because Mathematica has a wide range of functionality like this and it will save you a lot of trouble.

Set::write: Error When Creating Function

I'm very new to Mathematica, and I'm getting pretty frustrated with the errors I'm generating when it comes to creating a function. Below, I have a function I'm writing for 'centering' a matrix where rows correspond to examples, columns to features. The aim is to subtract from each element the mean of the column to which it belongs.
centerdata[datamat_] := (
numdatapoints =
Dimensions[datamat][[1]](*Get number of datapoints*)
numberfeatures =
Dimensions[datamat[[1]]][[1]](*Get number of datapoints*)
columnmean = ((Total[datamat])/numdatapoints)
For[i = 1, i < numdatapoints + 1, i++, (* For each row*)
For[j = 1, j < numfeatures + 1, j++, (* For each element*)
datum = datamat[[i]][[j]];
newval = (datum - (colmean[[j]]));
ReplacePart[datamat, {i, j} -> newval];
];
];
Return[datamat];
)
Running this function for a matrix, I get the following error:
"Set::write: Tag Times in 4 {5.84333,3.054,3.75867,1.19867} is Protected. >>
Set::write: "Tag Times in 4\ 150 is Protected."
Where {5.84333,3.054,3.75867,1.19867} is the first example in the data matrix and 150 is the number of examples in the matrix (I'm using the famous iris dataset, for anyone interested). These errors correspond to this code:
numdatapoints = Dimensions[datamat][[1]](*Get number of datapoints*)
numberfeatures = Dimensions[datamat[[1]]][[1]](*Get number of datapoints*)
Googling and toying with this error hasn't helped much as the replies in general relate to multiplication, which clearly isn't being done here.
Given a table (tab) of data, Mean[tab] will return a list of the means of each column. Next, you want to subtract this (element-wise) from each row in the table; try this:
Map[Plus[-Mean[tab], #] &, tab]
I have a feeling that there is probably either an intrinsic statistical function to do this in one statement or that I am blind to a much simpler solution.
Since you are a beginner I suggest that you immediately read the documentation for:
Map, which is one of the fundamental operators in the functional programming languages that Mathematica pretends to be; and
pure functions whose use involves the cryptic symbols # and &.
If you are writing loops in Mathematica programs you are almost certainly mis-using the system.
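As an aside, the Set::write messages come from the missing semicolons after the first three assignments in centerdata: without them, consecutive expressions are multiplied together, so Mathematica ends up trying to assign to a Times expression, whose head is Protected. For completeness, here is a sketch of the whole function in the style recommended above (the check on random data is purely illustrative):
(* Subtract the per-column means from every row; Mean of a matrix is
   the list of column means *)
centerdata[datamat_] := (# - Mean[datamat]) & /@ datamat
(* Illustrative check: column means of the centered data are ~0 *)
mat = RandomReal[{0, 10}, {5, 3}];
Chop[Mean[centerdata[mat]]]  (* -> {0, 0, 0} *)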

Fastest way to count elements matching predicate

Earlier today I asked if there's an idiomatic way to count elements matching predicate function in Mathematica, as I was concerned with performance.
My initial approach for a given predicate pred was the following:
PredCount1[lst_, pred_] := Length@Select[lst, pred];
and I got a suggestion to instead use
PredCount2[lst_, pred_] := Count[lst, x_ /; pred@x];
I started profiling these functions, with different lst sizes and pred functions, and added two more definitions:
PredCount3[lst_, pred_] := Count[Thread@pred@lst, True];
PredCount4[lst_, pred_] := Total[If[pred@#, 1, 0] & /@ lst];
My data samples were ranges between 1 and 10 million elements, and my test functions were EvenQ, #<5&, and PrimeQ. The timings behaved as follows.
EvenQ
PredCount2 is slowest, 3 and 4 duke it out.
Comparison predicate: #<5&
I've selected this function because it's close to what I need in my actual problem. Don't worry that this is a silly test function; it actually proves that the 4th function has some merit, and I ended up using it in my solution.
Same as EvenQ, but 3 is clearly slower than 4.
PrimeQ
This is just bizarre. Everything is flipped. I don't suspect caching as the culprit here, since the worst values are for the function computed last.
So, what's the right (fastest) way to count the number of elements in a list, that match a given predicate function?
You are seeing the result of auto-compilation.
First, for Listable functions such as EvenQ and PrimeQ, the use of Thread is unnecessary:
EvenQ[{1, 2, 3}]
{False, True, False}
This also explains why PredCount3 performs well on these functions. (They are internally optimized for threading over a list.)
Now let us look at timings.
dat = RandomInteger[1*^6, 1*^6];
test = # < 5 &;
First@Timing[#[dat, test]] & /@ {PredCount1, PredCount2, PredCount3, PredCount4}
{0.343, 0.437, 0.25, 0.047}
If we change a System Option to prevent auto-compilation within Map and run the test again:
SetSystemOptions["CompileOptions" -> {"MapCompileLength" -> Infinity}]
First@Timing[#[dat, test]] & /@ {PredCount1, PredCount2, PredCount3, PredCount4}
{0.343, 0.452, 0.234, 0.765}
You can clearly see that without compilation PredCount4 is much slower. In short, if your test function can be compiled by Mathematica this is a good option.
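To illustrate that last point, the test can also be compiled explicitly, which makes the speed independent of the auto-compilation thresholds (a minimal sketch; countLessThan5 is a hypothetical name, and dat is the list defined above):
(* Explicitly compiled counter for the # < 5 test; the Do loop over a
   packed integer array runs entirely in compiled code *)
countLessThan5 = Compile[{{lst, _Integer, 1}},
   Module[{c = 0}, Do[If[lst[[i]] < 5, c++], {i, Length[lst]}]; c]];
First@Timing[countLessThan5[dat]]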
Here are some other examples of fast counting using numeric functions.
The nature of the integers in the list can have a significant effect on the achievable timings. The use of Tally can improve performance if the range of the integers is constrained.
(* Count items in the list matching predicate, pred *)
PredCountID[lst_, pred_] :=
Select[Tally@lst, pred@First@# &]\[Transpose] // Last // Total
(* Define the values over which to check timings *)
ranges = {100, 1000, 10000, 100000, 1000000};
sizes = {100, 1000, 10000, 100000, 1000000, 10000000, 100000000};
For PrimeQ, this function shows that even in a 10^8-sized list, primes can be counted in less than a tenth of a second if the integers are drawn from {0, ..., 100000}, and below the resolution of Timing if they lie within a small range such as 1 to 100.
Because the predicate only has to be applied over the set of Tally values, this approach is relatively insensitive to the exact predicate function.
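A hypothetical usage, mirroring the setup above (lst is drawn from a deliberately small integer range so the Tally stays short):
(* Counting primes in 10^7 integers drawn from {1, ..., 100} *)
lst = RandomInteger[{1, 100}, 10^7];
PredCountID[lst, PrimeQ] // Timing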

Standard deviation of one element

When I try to execute
StandardDeviation[{1}]
I get an error
StandardDeviation::shlen: "The argument {1} should have at least two elements"
But std of one element is 0, isn't it?
The standard deviation is commonly defined as the square root of the unbiased estimator of the variance:
s = Sqrt[Sum[(x_i - xbar)^2, {i, 1, N}]/(N - 1)]
You can easily see that for a single sample, N = 1 and you get 0/0, which is undefined. Hence the standard deviation is undefined for a single sample in Mathematica.
Now, depending on your conventions, you might want to define a standard deviation for a single sample (returning 0, Null, or some other value). Here's an example that shows how to define it for a single sample.
std[x_List] := Which[(Length[x] == 1), 0, True, StandardDeviation[x]]
std[{1}]
Out[1]= 0
The standard deviation of a constant is zero.
The estimated standard deviation of one sample is undefined.
If you want some formality:
p[x_] := DiracDelta[x - mu];
expValue = Integrate[x p[x] , {x, -Infinity, Infinity}]
stdDev = Sqrt[Integrate[(x - expValue)^2 p[x] , {x, -Infinity, Infinity}]]
(*
-> ConditionalExpression[mu, mu \[Element] Reals]
-> ConditionalExpression[0, mu \[Element] Reals]
*)
Edit
Or better, using Mathematica's ProbabilityDistribution[]:
dist = ProbabilityDistribution[DiracDelta[x - mu], {x, -Infinity, Infinity}];
{Mean[dist], StandardDeviation[dist]}
(*
-> { mu, ConditionalExpression[0, mu \[Element] Reals]}
*)
If your population size is one element, then yes, the standard deviation of your population will be 0. However, standard deviations are typically computed on samples, not on the entire population, so instead of dividing by the number of elements in the sample, you divide by the number of elements minus one. This accounts for the error inherent in performing calculations on a sample rather than a population.
Performing a calculation of the standard deviation over a population of size 1 makes absolutely no sense, which I think is where the confusion is coming from. If you know that your population contains only one element then finding out the standard deviation of that element is pointless, so generally you will see the standard deviation of a single element written as undefined.
Standard deviation, which measures how actual values deviate from the average of a given set, doesn't make any sense for a list of one element (you can set it to 0 if you want).
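If you prefer the population (divide-by-N) convention discussed above, a minimal sketch is easy to write yourself; popStd here is a hypothetical helper, not a built-in:
(* Population standard deviation: divide by N instead of N - 1 *)
popStd[x_List] := Sqrt[Mean[(x - Mean[x])^2]]
popStd[{1}]  (* -> 0 *)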

Continuous Fourier transform on discrete data using Mathematica?

I have some periodic data, but the amount of data is not a multiple of
the period. How can I Fourier analyze this data? Example:
% Let's create some data for testing:
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}]
% I now receive this data, but have no idea that it came from the
formula above. I'm trying to reconstruct the formula just from 'data'.
% Looking at the first few non-constant terms of the Fourier series:
ListPlot[Table[Abs[Fourier[data]][[x]], {x,2,20}], PlotJoined->True,
PlotRange->All]
shows an expected spike at 6 (since the number of periods is really
25000/(623*2*Pi) or about 6.38663, though we don't know this).
% Now, how do I get back 6.38663? One way is to "convolve" the data with
arbitrary multiples of Cos[x].
convolve[n_] := Sum[data[[x]]*Cos[n*x], {x,1,25000}]
% And graph the "convolution" near n=6:
Plot[convolve[n],{n,5,7}, PlotRange->All]
we see a spike roughly where expected.
% We try FindMaximum:
FindMaximum[convolve[n],{n,5,7}]
but the result is useless and inaccurate:
FindMaximum::fmmp:
Machine precision is insufficient to achieve the requested accuracy or
precision.
Out[119]= {98.9285, {n -> 5.17881}}
because the function is very wiggly.
% By refining our interval (using visual analysis on the plots), we
finally find an interval where convolve[] doesn't wiggle too much:
Plot[convolve[n],{n,6.2831,6.2833}, PlotRange->All]
and FindMaximum works:
FindMaximum[convolve[n],{n,6.2831,6.2833}] // FortranForm
List(1.984759605826571e7,List(Rule(n,6.2831853071787975)))
% However, this process is ugly, requires human intervention, and
computing convolve[] is REALLY slow. Is there a better way to do this?
% Looking at the Fourier series of the data, can I somehow divine the
"true" number of periods is 6.38663? Of course, the actual result
would be 6.283185, since my data fits that better (because I'm only
sampling at a finite number of points).
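% A quick aside on speed: convolve is slow mainly because Sum
re-evaluates 25000 terms for every n. A sketch of a vectorized
version (xs and convolveFast are illustrative names; data is the
list defined above) computes the same sum as a dot product:
(* Precompute the sample positions once; the sum becomes a dot product *)
xs = N[Range[25000]];
convolveFast[n_?NumericQ] := data . Cos[n xs]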
Based on Mathematica help for the Fourier function / Applications / Frequency Identification:
Checked on version 7
n = 25000;
data = Table[N[753 + 919*Sin[x/623 - 125]], {x, 1, n}];
pdata = data - Total[data]/Length[data];
f = Abs[Fourier[pdata]];
pos = Ordering[-f, 1][[1]]; (*the position of the first Maximal value*)
fr = Abs[Fourier[pdata Exp[2 Pi I (pos - 2) N[Range[0, n - 1]]/n],
FourierParameters -> {0, 2/n}]];
frpos = Ordering[-fr, 1][[1]];
N[(pos - 2 + 2 (frpos - 1)/n)]
returns 6.37072
Look for the period length using autocorrelation to get an estimate:
autocorrelate[data_, d_] :=
  Plus @@ (Drop[data, d]*Drop[data, -d])/(Length[data] - d)
ListPlot[Table[{d, autocorrelate[data, d]}, {d, 0, 5000, 100}]]
A smart search for the first maximum away from d = 0 may be the best estimate you can get from the available data.
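A sketch of such a search, assuming autocorrelate and data as defined above (a local maximum is where the sign of the slope flips from positive to negative):
(* Autocorrelation at every lag up to 5000, then the first index where
   the differences change sign from rising to falling *)
ac = Table[autocorrelate[data, d], {d, 1, 5000}];
firstMax = First@Flatten@Position[Differences[Sign[Differences[ac]]], -2] + 1
For the test data this should land near 623*2*Pi, i.e. about 3914, one full period.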
(* the data *)
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}];
(* Find the position of the largest Fourier coefficient, after
removing the last half of the list (which is redundant) and the
constant term; the [[1]] is necessary because Ordering returns a list *)
f2 = Ordering[Abs[Take[Fourier[data], {2,Round[Length[data]/2+1]}]],-1][[1]]
(* Result: 6 *)
(* Directly find the least squares difference between all functions of
the form a+b*Sin[c*n-d], with intelligent starting values *)
sol = FindMinimum[Sum[((a+b*Sin[c*n-d]) - data[[n]])^2, {n,1,Length[data]}],
{{a,Mean[data]},{b,(Max[data]-Min[data])/2},{c,2*f2*Pi/Length[data]},d}]
(* Result (using //InputForm):
FindMinimum::sszero:
The step size in the search has become less than the tolerance prescribed by
the PrecisionGoal option, but the gradient is larger than the tolerance
specified by the AccuracyGoal option. There is a possibility that the method
has stalled at a point that is not a local minimum.
{2.1375902350021628*^-19, {a -> 753., b -> -919., c -> 0.0016051364365971107,
d -> 2.477886509998064}}
*)
(* Create a table of values for the resulting function to compare to 'data' *)
tab = Table[a+b*Sin[c*x-d], {x,1,Length[data]}] /. sol[[2]];
(* The maximal difference is effectively 0 *)
Max[Abs[data-tab]] // InputForm
(* Result: 7.73070496506989*^-12 *)
Although the above doesn't necessarily fully answer my question, I found it
somewhat remarkable.
Earlier, I'd tried using FindFit[] with Method -> NMinimize (which is
supposed to give a better global fit), but that didn't work well,
possibly because you can't give FindFit[] intelligent starting values.
The error I get bugs me but appears to be irrelevant.
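For what it's worth, FindFit does accept starting values in the same {{param, start}, ...} form as FindMinimum, so a sketch like the following (reusing the starting values from the fit above) may also work:
(* FindFit with explicit starting values; data is treated as values
   of the model at n = 1, 2, ... *)
fit = FindFit[data, a + b*Sin[c*n - d],
   {{a, Mean[data]}, {b, (Max[data] - Min[data])/2},
    {c, 2*f2*Pi/Length[data]}, d}, n]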
