Matlab - Least Squares data fitting - Cost function with extra constraint - algorithm

I am currently working on some MatLab code to fit experimental data to a sum of exponentials following a method described in this paper.
According to the paper, the data has to follow the following equation (written in pseudo-code):
y = sum(v(i)*exp(-x/tau(i)),i=1..n)
Here tau(i) is a set of n predefined constants. The number of constants also determines the size of the summation, and hence the size of v. For example, we can try to fit a sum of 100 exponentials, each with a different tau(i) to our data. However, due to the nature of the fitting and the exponential sum, we need to add another constraint to the problem, and hence to the cost function of the least-squares method used.
Normally, the cost function of the least-squares method is given by:
(y_data - sum(v(i)*exp(-x/tau(i)),i=1..n)^2
And this has to be minimized. However, to prevent over-fitting that would make the time-constant spectrum extremely noisy, the paper adds the following constraint to the cost function:
|v(i) - v(i+1)|^2
Because of this extra constraint, as far as I know, the regular algorithms, like lsqcurvefit aren't useable any longer, and I have to use fminsearch to search the minimum of my least-squares cost function with a constraint. The function that has to be minimized, according to me, is the following:
(y_data - sum(v(i)*exp(-x/tau(i)),i=1..n)^2 + sum(|v(j) - v(j+1)|^2,j=1..n-1)
My attempt to code this in MatLab is the following. Initially we define the function in a function script, then we use fminsearch to actually minimize the function and get values for v.
function res = funcost( v )
%FUNCOST Definition of the function that has to be minimised
%We define a function yvalues with 2 exponentials with known time-constants
% so we know the result that should be given by minimising.
xvalues = linspace(0,50,10000);
yvalues = 3-2*exp(-xvalues/1)-exp(-xvalues/10);
%Definition of 30 equidistant point in the logarithmic scale
terms = 30;
termsvector = [1:terms];
tau = termsvector;
for i = 1:terms
tau(i) = 10^(-1+3/terms*i);
end
%Definition of the regular function
res_1 = 3;
for i=1:terms
res_1 =res_1+ v(i).*exp(-xvalues./tau(i));
end
res_1 = res_1-yvalues;
%Added constraint
k=1;
res_2=0;
for i=1:terms-1
res_2 = res_2 + (v(i)-v(i+1))^2;
end
res=sum(res_1.*res_1) + k*res_2;
end
fminsearch(#funcost,zeros(30,1),optimset('MaxFunEvals',1000000,'MaxIter',1000000))
However, this code is giving me inaccurate results (no error, just inaccurate results), which leads me to believe I either made a mistake in the coding or in the interpretation of the added constraint for the least-squares method.

I would try to introduce the additional constrain in following way:
res_2 = max((v(1:(end-1))-v(2:end)).^2);
e.g. instead of minimizing an integrated (summed up) error, it does minmax.
You may also make this constrain stiff by
if res_2 > some_number
k = a_very_big_number;
else
k=0; % or k = a_small_number
end;

Related

Efficient Power Iteration for largest and smallest eigenvalue

I am preparing for my numerics exam in a couple of days and had an old exam asking to implement a search for the largest and the smallest eigenvalue of a symmetric matrix in one MATLAB-function, without of course using eig(A). Now are neither the power method nor the inverse power method a problem to be implemented standing alone, but there is one specification in the question I cannot really get:
You should access the elements of A as few times as possible.
Does there exist a trick how to get the following code snippet, using the power iteration method, calculate the smallest value as efficient as possible?
Attached code excerpt for power method:
while error > tol
i = i + 1;
y_k = z_k/norm(z_k);
z_new = A*y_k;
mu_k = transpose(y_k)*z_new;
%mu_small = transpose(y_small)*z_newsmal;
error = abs(mu_k - mu_old);
mu_old = mu_k;
z_k = z_new;
end

Understanding a FastICA implementation

I'm trying to implement FastICA (independent component analysis) for blind signal separation of images, but first I thought I'd take a look at some examples from Github that produce good results. I'm trying to compare the main loop from the algorithm's steps on Wikipedia's FastICA and I'm having quite a bit of difficulty seeing how they're actually the same.
They look very similar, but there's a few differences that I don't understand. It looks like this implementation is similar to (or the same as) the "Multiple component extraction" version from Wiki.
Would someone please help me understand what's going on in the four or so lines having to do with the nonlinearity function with its first and second derivatives, and the first line of updating the weight vector? Any help is greatly appreciated!
Here's the implementation with the variables changed to mirror the Wiki more closely:
% X is sized (NxM, 3x50K) mixed image data matrix (one row for each mixed image)
C=3; % number of components to separate
W=zeros(numofIC,VariableNum); % weights matrix
for p=1:C
% initialize random weight vector of length N
wp = rand(C,1);
wp = wp / norm(wp);
% like do:
i = 1;
maxIterations = 100;
while i <= maxIterations+1
% until mat iterations
if i == maxIterations
fprintf('No convergence: ', p,maxIterations);
break;
end
wp_old = wp;
% this is the main part of the algorithm and where
% I'm confused about the particular implementation
u = 1;
t = X'*b;
g = t.^3;
dg = 3*t.^2;
wp = ((1-u)*t'*g*wp+u*X*g)/M-mean(dg)*wp;
% 2nd and 3rd wp update steps make sense to me
wp = wp-W*W'*wp;
wp = wp / norm(wp);
% or until w_p converges
if abs(abs(b'*bOld)-1)<1e-10
W(:,p)=b;
break;
end
i=i+1;
end
end
And the Wiki algorithms for quick reference:
First, I don't understand why the term that is always zero remains in the code:
wp = ((1-u)*t'*g*wp+u*X*g)/M-mean(dg)*wp;
The above can be simplified into:
wp = X*g/M-mean(dg)*wp;
Also removing u since it is always 1.
Second, I believe the following line is wrong:
t = X'*b;
The correct expression is:
t = X'*wp;
Now let's go through each variable here. Let's refer to
w = E{Xg(wTX)T} - E{g'(wTX)}w
as the iteration equation.
X is your input data, i.e. X in the iteration equation.
wp is the weight vector, i.e. w in the iteration equation. Its initial value is randomised.
g is the first derivative of a nonquadratic nonlinear function, i.e. g(wTX) in the iteration equation
dg is the first derivative of g, i.e. g'(wTX) in the iteration equation
M although its definition is not shown in the code you provide, but I think it should be the size of X.
Having the knowledge of the meaning of all variables, we can now try to understand the codes.
t = X'*b;
The above line computes wTX.
g = t.^3;
The above line computes g(wTX) = (wTX)3. Note that g(u) can be any equation as long as f(u), where g(u) = df(u)/du, is nonlinear and nonquadratic.
dg = 3*t.^2;
The above line computes the derivative of g.
wp = X*g/M-mean(dg)*wp;
Xg obviously calculates Xg(wTX). Xg/M calculates the average of Xg, which is equivalent to E{Xg(wTX)T}.
mean(dg) is E{g'(wTX)} and multiplies by wp or w in the equation.
Now you have what you needed for Newton-Raphson Method.

MATLAB: Speeding up a discretization function using bsxfun

For a current project, I have to discretize quasi-continuous values into bins defined by some pre-defined binning resolution. For this purpose, I have written a function, which I expected to be highly efficient as it is able to both process scalar inputs as well as vector inputs using bsxfun. However, after some profiling, I found out that almost all processing time of my much larger project is produced in this function, and within the function, it's mainly the bsxfun part that takes time, with the min-query following on second place. Long story short, I am looking for advice on how to solve this task MUCH faster in MATLAB. Side note: I am usually passing vectors with some 50k elements.
Here's the code:
function sampleNo = value2sample(value,bins)
%Make sure both vectors have orientations fitting bsxfun
value = value(:);
bins = bins(:)';
%Recover bin resolution (avoids passing another parameter)
delta = median(diff(bins));
%Calculate distance matrix between all combinations
dist = abs(bsxfun(#minus,value,bins));
%What we really want to know is the minimum distance per row
[minval,ind] = min(dist,[],2);
%Make sure we don't accidentally further process NaNs as 1st bin
ind(isnan(minval))=NaN;
sampleNo = ind;
sampleNo(minval>delta) = NaN;
end
The reason that your function is slow is because you are computing the distance between every element of values and bins and storing them all in an array - if there are N values and M bins then you will require NM elements to store all the distances, and this is probably a really big number (e.g. if each input has 50,000 elements then you need 2.5 billion elements in the output array).
Moreover, since your bins are sorted (you didn't state this, but it looks like you are assuming it in your code) you do not need to compute the distance from every value to every bin. You can be much smarter,
function ind = value2sample(value, bins)
% Find median bin distance
delta = median(diff(bins));
% Bucket into 'nearest' bin by using midpoints
bins = bins(:);
mids = [-Inf; 0.5 * (bins(1:end-1) + bins(2:end))];
[~, ind] = histc(value, mids);
% Ensure that NaN values and points that aren't near any bin are returned as NaN
ind(isnan(value)) = NaN;
ind(abs(value - bins(ind)) > delta) = NaN;
end
In my tests, with values = randn(10000, 1) and bins = -50:50 it takes around 4.5 milliseconds to run the original function, and 485 microseconds to run the code above, so you are getting around a 10x speedup (and the speedup will be even greater as you increase the size of the inputs).
Thanks to #Chris Taylor, I was able to solve the problem very efficiently. The code now runs almost 400 times faster than before. The only changes I had to make from his version are reflected in the code below. Main issue was to replace histc (whose use is not encouraged anymore) by discretize.
function ind = value2sample(value, bins)
% Make sure the vectors are standing
value = value(:);
bins = bins(:);
% Bucket into 'nearest' bin by using midpoints
mids = [eps; 0.5 * (bins(1:end-1) + bins(2:end))];
ind = discretize(value, mids);
The only thing is, that in this implementation your bins must be non-negative. Other than that, this code does exactly what I want, including the fact that ind has the same size as value and contains NaNs whenever a value is NaN or out of the range of bins.

matlab curve fitting: restrictions on parameters

I have 5 non-parametric models all with 5 to 8 parameters. This models are used to fit longitudinal data y(t) with t being time. Every datafile is fitted by all 5 models for comparison. The model itself cannot be altered.
For fitting starting values are used and these are fitted into a lsqcurvefit model using a levenberg-marquardt algortihm. So I've written a script for several models and one function for curvefitting
if i perform the curve fitting a lot of the starting values are wandering off to extreme values. This is the thing I want to avoid since these parameters should stay in the proximity off it's starting values and should only change between a well defined range or so that only curve fits within a standard deviation are included.Important to note here is that this restrictions should be imposed during the curve fitting (iterative numerization techique) and not afterwards.
The function I've written to fit models into height:
% Fit a specific model for all valid persons
try
opts = optimoptions(#lsqcurvefit, 'Algorithm', 'levenberg-marquardt');
[personalParams,personalRes,personalResidual] = lsqcurvefit(heightModel,initialValues,personalData(:,1),personalData(:,2),[],[],opts);
catch
x=1;
end
The function I've written for one of my models
elseif strcmpi(model,'jpss')
% y = h_1(1-(1/(1+((t+0.75)^c_1/d_1)+((t+0.75)^c_2/d_2)+((t+0.75)^c_3/d_3)))
% heightModel = #(params,ages) params(1).*(1-1./(1+((ages+0.75).^params(2))./params(3) + ((ages+0.75).^params(4))./params(5) + ((ages+0.75).^params(6))./params(7)));
heightModel = #(params,ages) params(1).*(1-1./(1+(((ages+0.75)./params(3)).^params(2)) + (((ages+0.75)./params(5)).^params(4)) + ((ages+0.75)./params(7)).^params(6))); % Adapted 25/07
modelStrings = {'h1','c1','d1','c2','d2','c3','d3'};
% Define initial values
if strcmpi('male',gender)
initialValues = [174.8 0.6109 2.9743 3.614 9.88 22.393 13.59];
else
initialValues = [162.7 0.6546 2.43 4.011 8.579 18.394 11.846];
end
What I would like to do:
Is it possible to place restrictions on every startingvalue #initial values? Putting restrictions on lsqcurvefit wouldn't be a good idea I think since there are different models with different starting values and different ranges that are allowed.
I had 2 things in my mind:
1. using range and place this between the initial values
initialValues = [162.7 0.6546 2.43 4.011 8.579 18.394 11.846]`
if range a1=[150,180]; range a2=[0.3,0.8] and so one
place lb and ub restrictions seperatly on all my initialvalues between lsqcurvefit
if Heightmodel='name model'
initial value* 1.2 and lb = initial value* 0.8
Can someone give me some hints or pointers because I can't make it work.
Thanks in advance
Lucy
Could somebody help me out
You state: there are different models with different starting values and different ranges that are allowed. This is where you can use ub and lb. How to do this is outlined in the lsqcurvefit documentation:
X=LSQCURVEFIT(FUN,X0,XDATA,YDATA,LB,UB) defines a set of lower and
upper bounds on the design variables, X, so that the solution is in the
range LB <= X <= UB. Use empty matrices for LB and UB if no bounds
exist. Set LB(i) = -Inf if X(i) is unbounded below; set UB(i) = Inf if
X(i) is unbounded above.
For instance in the following example the parameters are constrained within limits during the fit. The lower bound (lb) and upper bound (ub) are set to 20% below and above the starting values, respectively.
heightModel = #(params,ages) abs(params(1).*(1-1./(1+(params(2).* (ages+params(8) )).^params(5) +(params(3).* (ages+params(8) )).^params(6) +(params(4) .*(ages+params(8) )).^params(7) )));
initialValues = [161.92 0.4173 0.1354 0.090 0.540 2.87 14.281 0.3701];
lb = 0.8*initialValues; % <-- lower bound is 20% smaller than initial par values
ub = 1.2*initialValues;
[parsout,resnorm,residual] = lsqcurvefit(heightModel,initialValues,t,ht,lb,ub);

matlab: optimum amount of points for linear fit

I want to make a linear fit to few data points, as shown on the image. Since I know the intercept (in this case say 0.05), I want to fit only points which are in the linear region with this particular intercept. In this case it will be lets say points 5:22 (but not 22:30).
I'm looking for the simple algorithm to determine this optimal amount of points, based on... hmm, that's the question... R^2? Any Ideas how to do it?
I was thinking about probing R^2 for fits using points 1 to 2:30, 2 to 3:30, and so on, but I don't really know how to enclose it into clear and simple function. For fits with fixed intercept I'm using polyfit0 (http://www.mathworks.com/matlabcentral/fileexchange/272-polyfit0-m) . Thanks for any suggestions!
EDIT:
sample data:
intercept = 0.043;
x = 0.01:0.01:0.3;
y = [0.0530642513911393,0.0600786706929529,0.0673485248329648,0.0794662409166333,0.0895915873196170,0.103837395346484,0.107224784565365,0.120300492775786,0.126318699218730,0.141508831492330,0.147135757370947,0.161734674733680,0.170982455701681,0.191799936622712,0.192312642057298,0.204771365716483,0.222689541632988,0.242582251060963,0.252582727297656,0.267390860166283,0.282890010610515,0.292381165948577,0.307990544720676,0.314264952297699,0.332344368808024,0.355781519885611,0.373277721489254,0.387722683944356,0.413648156978284,0.446500064130389;];
What you have here is a rather difficult problem to find a general solution of.
One approach would be to compute all the slopes/intersects between all consecutive pairs of points, and then do cluster analysis on the intersepts:
slopes = diff(y)./diff(x);
intersepts = y(1:end-1) - slopes.*x(1:end-1);
idx = kmeans(intersepts, 3);
x([idx; 3] == 2) % the points with the intersepts closest to the linear one.
This requires the statistics toolbox (for kmeans). This is the best of all methods I tried, although the range of points found this way might have a few small holes in it; e.g., when the slopes of two points in the start and end range lie close to the slope of the line, these points will be detected as belonging to the line. This (and other factors) will require a bit more post-processing of the solution found this way.
Another approach (which I failed to construct successfully) is to do a linear fit in a loop, each time increasing the range of points from some point in the middle towards both of the endpoints, and see if the sum of the squared error remains small. This I gave up very quickly, because defining what "small" is is very subjective and must be done in some heuristic way.
I tried a more systematic and robust approach of the above:
function test
%% example data
slope = 2;
intercept = 1.5;
x = linspace(0.1, 5, 100).';
y = slope*x + intercept;
y(1:12) = log(x(1:12)) + y(12)-log(x(12));
y(74:100) = y(74:100) + (x(74:100)-x(74)).^8;
y = y + 0.2*randn(size(y));
%% simple algorithm
[X,fn] = fminsearch(#(ii)P(ii, x,y,intercept), [0.5 0.5])
[~,inds] = P(X, y,x,intercept)
end
function [C, inds] = P(ii, x,y,intercept)
% ii represents fraction of range from center to end,
% So ii lies between 0 and 1.
N = numel(x);
n = round(N/2);
ii = round(ii*n);
inds = min(max(1, n+(-ii(1):ii(2))), N);
% Solve linear system with fixed intercept
A = x(inds);
b = y(inds) - intercept;
% and return the sum of squared errors, divided by
% the number of points included in the set. This
% last step is required to prevent fminsearch from
% reducing the set to 1 point (= minimum possible
% squared error).
C = sum(((A\b)*A - b).^2)/numel(inds);
end
which only finds a rough approximation to the desired indices (12 and 74 in this example).
When fminsearch is run a few dozen times with random starting values (really just rand(1,2)), it gets more reliable, but I still wouln't bet my life on it.
If you have the statistics toolbox, use the kmeans option.
Depending on the number of data values, I would split the data into a relative small number of overlapping segments, and for each segment calculate the linear fit, or rather the 1-st order coefficient, (remember you know the intercept, which will be same for all segments).
Then, for each coefficient calculate the MSE between this hypothetical line and entire dataset, choosing the coefficient which yields the smallest MSE.

Resources