I am trying to model the Kerr effect with experimental data, and the relationship between the independent variable voltage applied(U) and light intensity on crossed polarizers (L) is L = a * sin(b*U^2), where a and b are independent constants to be determined.
data = {{300, 0.014336918}, {350, 0.023297491}, {400,
0.053763441}, {450, 0.098566308}, {500, 0.172043011}, {550,
0.23297491}, {600, 0.336917563}, {650, 0.336917563}, {700,
0.403225806}, {750, 0.448028674}, {800, 0.480286738}, {850,
0.485663082}, {900, 0.487455197}, {950, 0.476702509}, {970,
0.465949821}, {985, 0.435483871}, {995, 0.415770609}}
nlm = NonlinearModelFit[data, a*Sin (b*(x^2)), {a, b}, x]
However, I get the following error:
NonlinearModelFit::nrlnum: ...
is not a list of real numbers with dimensions {17} at {a,b} = {1.,1.}.
I'm new to programming in this language but I have no idea what I am doing wrong. Is there any way to structure my data so that this function actually works?
After reading through the documentation, I realize that the non-linear function is a local iterated approximation method and since the coefficient of the parameter b is too small, Mathematica is unable to compute the value of b. Thus linearisation of the function and substitution back into the original equation solved my problem.
Related
My y variable (n=30,000) is distributed with very heavy tails (both positive and negative), for which the fitDist GAMLSS function selects the ST4 family.
I tried to assess a GAMLSS-based regression with an explanatory variable x (pb smoothing), but tails on y are so heavy that convergence does not reach after 50 cycles, even after refit (time consuming+++).
Therefore, I normalized y using the orderNorm transformation (bestNormalize package), which allowed to easily and quickly reach convergence, and then to predict the fitted value from the GAMLSS object.
However, these fitted "orderNormalized" values are a GAMLSS object, and thus cannot be inversed using the predict function from bestNormalize (since this latter seems to not recognize a GAMLSS object).
My question: is it possible, whatever the means, to apply an inverse orderNorm transformation to fitted values from a GAMLSS object?
It is easy to get confused about on what to use the predict function, so I list here the steps without code (as there is no example in the question):
1) transposeObj = orderNorm(data$outputvariable)
2) fitObj = gamlls(transposeObj$x.t ~., data)
3) pred = predict(fitObj, type = 'response')
4) inversedpredictions = predict(transposeObj, newdata = pred, inverse = TRUE)
In plain text, you normalize your data, fit a model, make predictions with the fit, and then predict on the predictions with the normalization object obtained from orderNorm.
The vignette for bestNormalize has a similar example, using lm instead of GAMLSS. See the application section of the vignette. Once you have run the normalization procedure, you should be able to repeat and invert the transformation with the predict function.
The key is storing a transformation object as an R object that can then be fed into the predict (or rather, the predict.bestNormalize) function.
#dataset, sigmoidal curve
ratio278a267 ={{5.445, 0.0501}, {6.177, 0.035}, {7., 0.0589}, {7.368,
0.0953}, {7.73, 0.1419}, {8.169, 1.0697}, {9.141, 1.0869}, {10.3,
1.0981}}
#nonlinearmodelfitting for dataset
fit = FindFit[ratio278a267, (1/(a + (E^(-b*(x - c))))), {a, b, c}, x]
fit2 = NonlinearModelFit[
ratio278a267, (1/(a + (E^(-b*(x - c))))), {a, b, c}, x]["RSquared"]
#fit1 & fit2 output respectively
output:
{a -> 0.639719, b -> -250.201, c -> -1008.92}
0.
The code above is what i used for a nonlinear fitting in Mathematica, and the output has not provided reasonably small figures, despite me having plotted this in a graphing calculator overlaid ontop of my dataset, with numbers between 0-10 for a,b,c and have obtained reasonable fitting
One way to get FindFit to converge on a good solution is to give it good starting values, particularly when your model could give wildly bad fits for some values of the values.
x=.;a=.;b=.;c=.;
ratio278a267 ={{5.445, 0.0501}, {6.177, 0.035}, {7., 0.0589}, {7.368,0.0953},
{7.73, 0.1419}, {8.169, 1.0697}, {9.141, 1.0869}, {10.3,1.0981}};
fit = FindFit[ratio278a267, (1/(a+(E^(-b*(x-c))))), {{a,0.92}, {b,8.7}, {c,7.9}}, x]
Show[ListPlot[ratio278a267],Plot[(1/(a+(E^(-b*(x-c)))))/.fit,{x,5.445,10.3}]]
In this example I found those starting values by doing ten thousand Monte Carlo trials looking for the smallest sum of squared error between the model and the data points and then let FindFit converge on the best values that it could find.
I'm studying with this ppt. Starting from page 22, it's showing how a missing data can be filled with the most likely value with EM algorithm. I managed to understand this but I started wondering how I can fill 2 missing data. If 2 data only in field B were missing, I can see how I would calculate it. But what if one data is missing, both in A and B field? The calculation on the ppt is being conducted with the assumption that data on A is firm but in this case it's not... Can someone explain a little bit?
If you want missing values on both A and B, you need some additional hidden variables.
To be more precise:
Assume that you have 4 hidden variables, H1, H2, A' and B' taking values in {0, 1} which generates your observations (A, B) as follows:
A = A' if H1=0, A = 'H' otherwise
B = B' if H1=0, B = 'H' otherwise
and assume that (A', B') is independent from (H1, H2). Therefore, your model is parametrized by the joint distribution of (A', B') and the joint distribution of (H1, H2).
Now to learn the model, you can just run EM as you did before, the only difference is that your hidden variable H is now extended by A', B', H1 and H2. Once your model is learnt, you can fill the missing pairs of observations by the most likely pair (given the distribution of the model).
Graphics#Flatten[Table[
(*colors, dont mind*)
{ColorData["CMYKColors"][(a[[r, t]] - .000007)/(.0003 - 0.000007)],
(*point size, dont mind*)
PointSize[1/Sqrt[r]/10],
(*Coordinates for your points "a" is your data matrix *)
Point[
{(rr =Log[.025 + (.58 - .25)/64 r]) Cos#(tt = t 5 Degree),
rr Sin#tt}]
} &#
(*values for the iteration*)
, {r, 7, 64}, {t, 1, 72}], 1]
(*Rotation, dont mind*)
/. gg : Graphics[___] :> Rotate[gg, Pi/2]
Okay, I'll bite. First, Mathematica allows functions to be applied via one of several forms: standard form - f[x], prefix form - f # x, postfix form - f // x, and infix form - x ~ f ~ y. Belisarius's code uses both standard and prefix form.
So, let's look at the outermost functions first: Graphics # x /. gg : Graphics[___]:> Rotate[gg,Pi/2], where x is everything inside of Flatten. Essentially, what this does is create a Graphics object from x and using a named pattern (gg : Graphics[___]) rotates the resulting Graphics object by 90 degrees.
Now, to create a Graphics object, we need to supply a bunch of primitives and this is in the form of a nested list, where each sublist describes some element. This is done via the Table command which has the form: Table[ expr, iterators ]. Iterators can have several forms, but here they both have the form {var, min, max}, and since they lack a 4th term, they take on each value between min and max in integer steps. So, our iterators are {r, 7, 64} and {t, 1, 72}, and expr is evaluated for each value that they take on. Since, we have two iterators this produces a matrix, which would confuse Graphics, so we using Flatten[ Table[ ... ], 1] we take every element of the matrix and put it into a simple list.
Each element that Table produces is simply: color (ColorData), point size (PointSize), and point location (Point). So, with Flatten, we have created the following:
Graphics[{{color, point size, point}, {color, point size, point}, ... }]
The color generation is taken from the data, and it assumes that the data has been put into a list called a. The individual elements of a are accessed through the Part construct: [[]]. On the surface, the ColorData construct is a little odd, but it can be read as ColorData["CMYKColors"] returns a ColorDataFunction that produces a CMYK color value when a value between 0 and 1 is supplied. That is why the data from a is scaled the way it is.
The point size is generated from the radial coordinate. You'd expect with 1/Sqrt[r] the point size should be getting smaller as r increases, but the Log inverts the scale.
Similarly, the point location is produced from the radial and angular (t) variables, but Point only accepts them in {x,y} form, so he needed to convert them. Two odd constructs occur in the transformation from {r,t} to {x,y}: both rr and tt are Set (=) while calculating x allowing them to be used when calculating y. Also, the term t 5 Degree lets Mathematica know that the angle is in degrees, not radians. Additionally, as written, there is a bug: immediately following the closing }, the terms & and # should not be there.
Does that help?
I have some periodic data, but the amount of data is not a multiple of
the period. How can I Fourier analyze this data? Example:
% Let's create some data for testing:
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}]
% I now receive this data, but have no idea that it came from the
formula above. I'm trying to reconstruct the formula just from 'data'.
% Looking at the first few non-constant terms of the Fourier series:
ListPlot[Table[Abs[Fourier[data]][[x]], {x,2,20}], PlotJoined->True,
PlotRange->All]
shows an expected spike at 6 (since the number of periods is really
25000/(623*2*Pi) or about 6.38663, though we don't know this).
% Now, how do I get back 6.38663? One way is to "convolve" the data with
arbitrary multiples of Cos[x].
convolve[n_] := Sum[data[[x]]*Cos[n*x], {x,1,25000}]
% And graph the "convolution" near n=6:
Plot[convolve[n],{n,5,7}, PlotRange->All]
we see a spike roughly where expected.
% We try FindMaximum:
FindMaximum[convolve[n],{n,5,7}]
but the result is useless and inaccurate:
FindMaximum::fmmp:
Machine precision is insufficient to achieve the requested accuracy or
precision.
Out[119]= {98.9285, {n -> 5.17881}}
because the function is very wiggly.
% By refining our interval (using visual analysis on the plots), we
finally find an interval where convolve[] doesn't wiggle too much:
Plot[convolve[n],{n,6.2831,6.2833}, PlotRange->All]
and FindMaximum works:
FindMaximum[convolve[n],{n,6.2831,6.2833}] // FortranForm
List(1.984759605826571e7,List(Rule(n,6.2831853071787975)))
% However, this process is ugly, requires human intervention, and
computing convolve[] is REALLY slow. Is there a better way to do this?
% Looking at the Fourier series of the data, can I somehow divine the
"true" number of periods is 6.38663? Of course, the actual result
would be 6.283185, since my data fits that better (because I'm only
sampling at a finite number of points).
Based on Mathematica help for the Fourier function / Applications / Frequency Identification:
Checked on version 7
n = 25000;
data = Table[N[753 + 919*Sin[x/623 - 125]], {x, 1, n}];
pdata = data - Total[data]/Length[data];
f = Abs[Fourier[pdata]];
pos = Ordering[-f, 1][[1]]; (*the position of the first Maximal value*)
fr = Abs[Fourier[pdata Exp[2 Pi I (pos - 2) N[Range[0, n - 1]]/n],
FourierParameters -> {0, 2/n}]];
frpos = Ordering[-fr, 1][[1]];
N[(pos - 2 + 2 (frpos - 1)/n)]
returns 6.37072
Look for the period length using autocorrelation to get an estimate:
autocorrelate[data_, d_] :=
Plus ## (Drop[data, d]*Drop[data, -d])/(Length[data] - d)
ListPlot[Table[{d, autocorrelate[data, d]}, {d, 0, 5000, 100}]]
A smart search for the first maximum away from d=0 may be the best estimate you can get form the available data?
(* the data *)
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}];
(* Find the position of the largest Fourier coefficient, after
removing the last half of the list (which is redundant) and the
constant term; the [[1]] is necessary because Ordering returns a list *)
f2 = Ordering[Abs[Take[Fourier[data], {2,Round[Length[data]/2+1]}]],-1][[1]]
(* Result: 6 *)
(* Directly find the least squares difference between all functions of
the form a+b*Sin[c*n-d], with intelligent starting values *)
sol = FindMinimum[Sum[((a+b*Sin[c*n-d]) - data[[n]])^2, {n,1,Length[data]}],
{{a,Mean[data]},{b,(Max[data]-Min[data])/2},{c,2*f2*Pi/Length[data]},d}]
(* Result (using //InputForm):
FindMinimum::sszero:
The step size in the search has become less than the tolerance prescribed by
the PrecisionGoal option, but the gradient is larger than the tolerance
specified by the AccuracyGoal option. There is a possibility that the method
has stalled at a point that is not a local minimum.
{2.1375902350021628*^-19, {a -> 753., b -> -919., c -> 0.0016051364365971107,
d -> 2.477886509998064}}
*)
(* Create a table of values for the resulting function to compare to 'data' *)
tab = Table[a+b*Sin[c*x-d], {x,1,Length[data]}] /. sol[[2]];
(* The maximal difference is effectively 0 *)
Max[Abs[data-tab]] // InputForm
(* Result: 7.73070496506989*^-12 *)
Although the above doesn't necessarily fully answer my question, I found it
somewhat remarkable.
Earlier, I'd tried using FindFit[] with Method -> NMinimize (which is
supposed to give a better global fit), but that didn't work well,
possibly because you can't give FindFit[] intelligent starting values.
The error I get bugs me but appears to be irrelevant.