Linear fit with Math.NET: error in data and error in fit parameters?

I am trying to use Math.NET to perform a simple linear fit through a small set of datapoints. Using Fit.Line I am very easily able to perform the linear fit and obtain the slope and intercept:
Tuple<double, double> result = Fit.Line(xdata, ydata);
var intercept = result.Item1;
var slope = result.Item2;
This is very simple, but what about errors?
Errors in y-data
My y-data might contain error bars; can Math.NET take these errors into account? There are no errors in the x-data, just in the y-data.
Errors in fit parameters
What about the errors in the resulting fit parameters? The slope and intercept should have an error, or at least some way for me to tell how well these parameters fit. Typically I think you'd use the covariance matrix, whose diagonal elements give the errors in the parameters. I don't see any option to use that. Is Math.NET able to give me the fit parameter errors?

I suppose you can use this line to measure the fit error:
GoodnessOfFit.RSquared(xdata.Select(x => a+b*x), ydata); // == 1.0
where a and b are the fitted intercept and slope, 1 means a perfect fit (exactly on the line), and 0 means a poor one.
It is described in the Math.NET documentation on this page:
Math.net - Curve Fitting: Linear Regression
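As for the fit parameter errors: as far as I know, Fit.Line does not expose the covariance matrix directly, but for an ordinary least-squares line the standard errors of the intercept and slope follow from the textbook formulas: with residual variance s^2 = SSR/(n-2), SE(slope) = s/sqrt(Sxx) and SE(intercept) = s*sqrt(1/n + xMean^2/Sxx), where Sxx = sum((x - xMean)^2). Here is a minimal sketch computing them by hand from the fit result (the xdata/ydata values below are placeholders):

using System;
using System.Linq;
using MathNet.Numerics;

class FitErrors
{
    static void Main()
    {
        // Placeholder data; substitute your own measurements.
        double[] xdata = { 1, 2, 3, 4, 5 };
        double[] ydata = { 2.1, 3.9, 6.2, 7.8, 10.1 };

        Tuple<double, double> result = Fit.Line(xdata, ydata);
        double intercept = result.Item1;
        double slope = result.Item2;

        int n = xdata.Length;
        double xMean = xdata.Average();
        double sxx = xdata.Sum(x => (x - xMean) * (x - xMean));

        // Residual variance: sum of squared residuals over (n - 2),
        // since two parameters were estimated from the data.
        double ssr = xdata.Zip(ydata, (x, y) =>
            Math.Pow(y - (intercept + slope * x), 2)).Sum();
        double s2 = ssr / (n - 2);

        // Square roots of the diagonal of the covariance matrix
        // for (intercept, slope).
        double slopeError = Math.Sqrt(s2 / sxx);
        double interceptError = Math.Sqrt(s2 * (1.0 / n + xMean * xMean / sxx));

        Console.WriteLine($"slope = {slope} +/- {slopeError}");
        Console.WriteLine($"intercept = {intercept} +/- {interceptError}");
    }
}

For error bars on the y-data, the standard approach is weighted least squares with weights 1/sigma_i^2; if I recall correctly Math.NET offers this through the WeightedRegression class in MathNet.Numerics.LinearRegression, though equal weights are assumed in the sketch above.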

Related

MSE giving negative results in High-Level Synthesis

I am trying to calculate the mean squared error in Vitis HLS. I am using hls::pow(..., 2) and dividing by n, but all I receive is a negative value, for example -0.004. This does not make sense to me. Could anyone point out the problem or offer a proper explanation?
Besides, calculating the mean squared error using hls::pow does not give the same results as (a - b) * (a - b). For information, I am using ap_fixed<> types, not plain float or double precision.
Thanks in advance!
It sounds like an overflow and/or underflow issue, meaning that the values reach the sign bit and are interpreted as negative while they are in fact just very large.
Have you tried tuning the representation precision or the different saturation/rounding options for the fixed point class? This tuning will depend on the data you're processing.
For example, if you handle data that you know will range between -128.5 and 1023.4, you might need very few fractional bits, say 3 or 4, leaving the rest for the integer part (which must hold roughly log2((1023+128)^2) ≈ 21 bits for the squared differences).
Alternatively, if n is very large, you can try a moving average and calculate the mean in small "chunks" of length m < n.
P.S. Taking the absolute value of a - b and storing it into an ap_ufixed before the multiplication can already give you one extra bit, but it adds an instruction/operation/logic to the algorithm (which might not be a problem if the design is pipelined, but requires space if the size of ap_ufixed is very large).
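To see the wrap-around concretely, here is a minimal sketch (in C#, with a 16-bit integer standing in for a narrow fixed-point word; an ap_fixed in its default AP_WRAP overflow mode behaves analogously):

using System;

class OverflowDemo
{
    static void Main()
    {
        // 16-bit word standing in for a narrow fixed-point type.
        short diff = 200;                      // a - b
        short squared = (short)(diff * diff);  // 40000 does not fit in 16 bits

        // 40000 - 65536 = -25536: the value wraps into the sign bit,
        // so a genuinely large square reads back as negative.
        Console.WriteLine(squared);
    }
}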

How to minimize a cost function with Matlab when input variable is a large image: increase speed and prevent memory crash

I am trying to implement a differential phase integration method described in this paper:
Thüring, Thomas, et al. "Non-linear regularized phase retrieval for unidirectional X-ray differential phase contrast radiography." Optics express 19.25 (2011): 25545-25558.
Basically, it's a way to integrate a differential image across the columns only, while imposing some constraints on continuity across the rows to prevent stripe noise.
From a mathematical point of view, I want to minimize the following functional:

$$\min_A \; \lVert D_x A - \varphi \rVert_2^2 + \lambda \, \lVert D_y A \rVert_2^2$$

where ||.|| is the L2 norm, Dx is the derivative along the columns, Dy is the derivative across the rows, A is the unknown integrated matrix, lambda is a user-defined parameter, and phi is the differential profile I measured. Note that for the Dy operator the L1 norm can also be used.
I wrote the following code using fminunc as the Matlab solver:
pdiff=imresize(diff(padarray(p,[0,1],'replicate','post'),1,2),[128,128]);
noise = 0.02 * randn(size(pdiff));
pdiff_noise = pdiff + noise ;
% normal integration
integratedProfile=cumsum(pdiff_noise,2);
options=optimoptions(@fminunc,'Display','iter-detailed','UseParallel',true,'MaxIterations',35);
% regularized integration
startingPoint=zeros(size(pdiff_noise));
fun=@(x)costFunction(pdiff_noise,x);
integratedProfile_optmized=fminunc(fun,startingPoint,options);
function difference=costFunction(ep,op)
L=0.2; % regularization parameter lambda
dep_o=diff(padarray(op,[0,1],'replicate','post'),1,2); % Dx: derivative along the columns
dep_v=diff(padarray(op,[1,0],'replicate','post'),1,1); % Dy: derivative across the rows
difference=sum(sum((ep-dep_o).^2))+L*sum(sum(dep_v.^2)); % data term + L*regularizer
end
It works using a 128x128 differential image.
The problem arises as soon as I try to work with a larger image. In particular, with a 256x256 matrix each iteration takes forever even using the parallel option, and it consumes almost the entire RAM.
When I move to a 512x512 matrix I get this error:
Requested 262144x262144 (512.0GB) array exceeds maximum
array size preference.
Error in fminusub (line 165)
H = eye(sizes.nVar);
Error in fminunc (line 446)
[x,FVAL,GRAD,HESSIAN,EXITFLAG,OUTPUT] =
fminusub(funfcn,x, ...
Error in Untitled (line 13)
integratedProfile_optmized=fminunc(fun,startingPoint,options);
Unfortunately, my final goal is to process approximately 3000 images of 500x500 size.
I think I have understood that the crash is related to the size of the matrix and to the fact that each pixel is a variable. Therefore, Matlab needs to calculate a huge Hessian that doesn't fit into memory.
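For reference, the numbers in the error message follow directly from that (assuming 8-byte doubles; fminunc's default quasi-Newton algorithm stores a dense Hessian approximation with one row and column per variable):

$$512 \times 512 = 262144 \ \text{variables} \quad\Rightarrow\quad 262144^2 \times 8\ \text{bytes} = 2^{39}\ \text{bytes} = 512\ \text{GB}$$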
However, I don't really know how to solve it while also speeding up the processing.
Do you have any suggestions on how to work with large images? Is there another solver that may work in a faster way? Any mathematical approach to making the problem easier?
Thanks!

ListPlot only returning one value

I'm new to the Mathematica language, and I'm having a big issue graphing a set of points. It goes as follows:
f[w_] = expr.1
calculations = Table[expr.1, {w,0,numtimes}]
omegas = Table[i,{i,0,numtimes}]
orderedpairs = Transpose[{calculations,omegas}]
ListPlot[orderedpairs]
This returns a graph with just one point rather than numtimes points, and it doesn't match the first point in the dataset. I've tried a ListPlot command for the two lists separately, like
Listplot[{orderedpairs[[i]],omegas[[i]]},{i,0,numtimes}]
but I get an error that says "the expression i cannot be used as a part specification."
The data set is in the form x+iy, where x and y are real numbers. If I could get some help, I would appreciate it greatly.
ListPlot silently skips points with complex coordinates, so plot the real part by itself:
ListPlot[Re[orderedpairs]]

MATLAB script to generate reports of rounding errors in algorithms

I am interested in using, or creating, a script to get rounding-error reports for algorithms.
I hope the script, or something similar, already exists...
I think this would be useful for digital electronic system design, because sometimes it's necessary to study how the accuracy error depends on the number of decimal places considered in the design.
This script would work with 3 elements: the algorithm code, the input, and the output.
This script would show the error line by line of the algorithm code.
It would modify the algorithm code with some command like roundn and compare the error of the output.
I would define the error as:
Errorrounding = Output(without rounding) - Output(rounded)
For instance, I have the following algorithm:
calculation1 = input*constant1 + constant2 %line 1 of the algorithm
output = exp(calculation1) %line 2 of the algorithm
Where 'input' is the input, a vector of n elements, 'output' is the output, and 'constant1' and 'constant2' are constants.
So, I would put my algorithm into the script and it would automatically generate the following algorithm:
input_round = roundn(input,-1*mdec)
calculation1 = input_round*constant1 + constant2*ones(1,n)
calculation1_round = roundn(calculation1,-1*mdec)
output = exp(calculation1_round)
output_round = roundn(output,-1*mdec)
where mdec is the number of decimal places to consider.
Finally, the script would give the following messages:
The rounding error at line 1 is #Errorrounding_calculation1
where '#Errorrounding_calculation1' would be the result of the operation Errorrounding_calculation1 = calculation1 - calculation1_round.
The rounding error at line 2 is #Errorrounding_output
where '#Errorrounding_output' would be the result of the operation Errorrounding_output = output - output_round.
Does anyone know if something similar has already been done, or whether Matlab provides a solution to deal with related issues?
Thank you.
First point: I suggest reading What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg. It should illuminate a lot of issues regarding floating-point computations that will help you understand more of the intricacies of the problem you are considering.
Second point: I think the problem you are considering is a lot more complicated than you realize. You are interested in the error introduced into a calculation due to the reduced precision from rounding. What you don't realize is that these errors will propagate through your computations. Consider your example:
output = input*C1 + C2
If each of the three operands is a double-precision floating-point number, they will each have some round-off error in their precision. A bound on this round-off error can be found using the function EPS, which tells you the distance from one double-precision number to the next largest one. For example, a bound on the relative error of the representation of input will be 0.5*eps(input), or halfway between it and the next largest double-precision number. We can therefore estimate some errors bounds on the three operands as follows:
err_input = 0.5.*eps(input); %# Maximum round-off error for input
err_C1 = 0.5.*eps(C1); %# Maximum round-off error for C1
err_C2 = 0.5.*eps(C2); %# Maximum round-off error for C2
Note that these errors could be positive or negative, since the true number may have been rounded up or down to represent it as a double-precision value. Now, notice what happens when we estimate the true value of the operands before they were rounded-off by adding these errors to them, then perform the calculation for output:
output = (input+err_input)*(C1+err_C1) + C2+err_C2
%# ...and after reordering terms
output = input*C1 + C2 + err_input*C1 + err_C1*input + err_input*err_C1 + err_C2
%# ^-----------^ ^-----------------------------------------------------^
%# | |
%# rounded computation difference
You can see from this that the precision round-off of the three operands before performing the calculation could change the output we get by as much as difference. In addition, there will be another source of round-off error when the value output is rounded off to represent it as a double-precision value.
So, you can see how it's quite a bit more complicated than you thought to adequately estimate the errors introduced by precision round-off.
This is more of an extended comment than an answer:
I'm voting to close this on the grounds that it isn't a well-formed question. It sort of expresses a hope or wish that there exists some type of program which would be interesting or useful to you. I suggest that you revise the question to, well, to be a question.
You propose to write a Matlab program to analyse the numerical errors in other Matlab programs. I would not use Matlab for this. I'd probably use Mathematica, which offers more sophisticated structural operations on strings (such as program source text), symbolic computation, and arbitrary precision arithmetic. One of the limitations of Matlab for what you propose is that Matlab, like all other computer implementations of real arithmetic, suffers rounding errors. There are other languages which you might choose too.
What you propose is quite difficult, and would probably require a longer answer than most SOers, including this one, would be happy to contemplate writing. Happily for you, other people have written books on the subject; I suggest you start with the one by N. J. Higham. You might also want to investigate matters such as interval arithmetic.
Good luck.

How should I filter this data?

I have several series of data points that need to be graphed. For each graph, some points may need to be thrown out due to error. An example is the following:
The circled areas are errors in the data.
What I need is an algorithm to filter this data so that it eliminates the error by replacing the bad points with flat lines, like so:
Are there any algorithms out there that are especially good at detecting error points? Do you have any tips that could point me in the right direction?
EDIT: Error points are any points that don't look consistent with the data on both sides. There can be large jumps, as long as the data after the jump still looks consistent. If it's on the edge of the graph, large jumps should probably be considered error.
This is a problem that is hard to solve generically; your final solution will end up being very process-dependent, and unique to your situation.
That being said, you need to start by understanding your data: from one sample to the next, what kind of variation is possible? Using that, you can use previous data samples (and maybe future data samples) to decide if the current sample is bogus or not. Then, you'll end up with a filter that looks something like:
const int MaxQueueLength = 100; // adjust these two values as necessary
const double MaxProjectionError = 5;
List<double> FilterData(List<double> rawData)
{
List<double> toRet = new List<double>(rawData.Count);
Queue<double> history = new Queue<double>(MaxQueueLength);
foreach (double raw_Sample in rawData)
{
while (history.Count >= MaxQueueLength)
history.Dequeue();
double ProjectedSample = GuessNext(history, raw_Sample);
double CurrentSample = (Math.Abs(ProjectedSample - raw_Sample) > MaxProjectionError) ? ProjectedSample : raw_Sample;
toRet.Add(CurrentSample);
history.Enqueue(CurrentSample);
}
return toRet;
}
The magic, then, is coming up with your GuessNext function. Here, you'll be getting into stuff that is specific to your situation, and should take into account everything you know about the process that is gathering data. Are there physical limits to how quickly the input can change? Does your data have known bad values you can easily filter?
Here is a simple example of a GuessNext function that works off the first derivative of your data (i.e. it assumes that your data is roughly a straight line when you only look at a small section of it):
double lastSample = double.NaN;
double GuessNext(Queue<double> history, double nextSample)
{
lastSample = double.IsNaN(lastSample) ? nextSample : lastSample;
//ignore the history for simple first derivative. Assume that input will always approximate a straight line
double toRet = (nextSample + (nextSample - lastSample));
lastSample = nextSample;
return toRet;
}
If your data is particularly noisy, you may want to apply a smoothing filter to it before you pass it to GuessNext. You'll just have to spend some time with the algorithm to come up with something that makes sense for your data.
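For instance, a centered moving average is a simple starting point (a sketch; the default window of 5 samples is an arbitrary choice to tune against your data):

using System;
using System.Collections.Generic;

static class Smoothing
{
    // Centered moving average: each output sample is the mean of the
    // input samples in a window around it, clamped at the edges.
    public static List<double> Smooth(List<double> data, int window = 5)
    {
        var smoothed = new List<double>(data.Count);
        int half = window / 2;
        for (int i = 0; i < data.Count; i++)
        {
            int lo = Math.Max(0, i - half);
            int hi = Math.Min(data.Count - 1, i + half);
            double sum = 0;
            for (int j = lo; j <= hi; j++)
                sum += data[j];
            smoothed.Add(sum / (hi - lo + 1));
        }
        return smoothed;
    }
}

You would then call FilterData(Smoothing.Smooth(rawData)) instead of FilterData(rawData).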
Your example data appears to be parametric in that each sample defines both an X and a Y value. You might be able to apply the above logic to each dimension independently, which would be appropriate if only one dimension is giving you bad numbers. This can be particularly successful in cases where one dimension is a timestamp, for instance, and the timestamp is occasionally bogus.
If removing the outliers by eye is not possible, try kriging (with error terms) as in http://www.ipf.tuwien.ac.at/cb/publications/pipeline.pdf. This seems to work quite well for automatically dealing with occasional extreme noise. I know that French meteorologists use such an approach to remove outliers in their data (like a fire next to a temperature sensor, or something kicking a wind sensor, for instance).
Please note that this is a difficult problem in general. Any information about the errors is precious. Did someone kick the measuring device? Then you cannot do much except remove the offending data by hand. Is your noise systematic? Then you can do a lot by making (reasonable) hypotheses about it.
