How to find least square fit for two combined functions - curve-fitting

I have a curve-fitting problem.
I have two functions:
y = ax+b
y = ax^2+bx-2.3
I have one data set for each of the above functions.
I need to find a and b by the least-squares method, fitting both functions together.
I was using the fminsearch function to minimize the combined sum of squared errors of these two functions.
I am unable to get this approach working with lsqcurvefit.
Kindly help me
Regards
Ram

I think you'll need to worry less about which library routine to use and more about the math. Assuming you mean vertical-offset least squares, you'll want to minimize
D = sum_{i=1..m}(y_Li - a x_Li - b)^2 + sum_{j=1..n}(y_Pj - a x_Pj^2 - b x_Pj + 2.3)^2
where there are m points (x_Li, y_Li) on the line and n points (x_Pj, y_Pj) on the parabola. Now find partial derivatives of D with respect to a and b. Setting them to zero provides two linear equations in 2 unknowns, a and b. Solve this linear system.
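Gene's recipe can be checked with a short sketch in plain Ruby. `combined_fit` is a hypothetical helper name, and the known constant c = -2.3 is passed as a parameter. Each data point contributes one row [coefficient of a, coefficient of b, target] to a stacked least-squares problem, so setting the partial derivatives of D to zero gives the 2x2 normal equations solved below by Cramer's rule.

```ruby
# Least-squares fit of a and b shared by two models:
#   line:     y = a*x + b
#   parabola: y = a*x**2 + b*x + c   (c is a known constant, -2.3 here)
def combined_fit(xl, yl, xp, yp, c = -2.3)
  # Each row is [coefficient of a, coefficient of b, target value].
  rows = xl.zip(yl).map { |x, y| [x.to_f, 1.0, y.to_f] } +
         xp.zip(yp).map { |x, y| [x.to_f**2, x.to_f, y.to_f - c] }
  # Normal equations: [s11 s12; s12 s22] * [a; b] = [r1; r2]
  s11 = rows.sum { |ca, _cb, _t| ca * ca }
  s12 = rows.sum { |ca, cb, _t| ca * cb }
  s22 = rows.sum { |_ca, cb, _t| cb * cb }
  r1  = rows.sum { |ca, _cb, t| ca * t }
  r2  = rows.sum { |_ca, cb, t| cb * t }
  det = s11 * s22 - s12 * s12
  a = (r1 * s22 - s12 * r2) / det   # Cramer's rule
  b = (s11 * r2 - s12 * r1) / det
  [a, b]
end
```

With noisy data this returns the closed-form minimizer of D, which should agree with what fminsearch converges to, without any iteration.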

y = ax+b
y = ax^2+bx-2.3
To avoid confusing the y of the first equation with the y of the second, we use distinct notation:
u = ax+b
v = ax^2+bx+c (here c = -2.3)
The combined linear regression for the two functions is shown on the attached page:
HINT: If you want to derive the matrix equation shown there yourself, follow Gene's answer.

Related

Understanding a FastICA implementation

I'm trying to implement FastICA (independent component analysis) for blind signal separation of images, but first I thought I'd take a look at some examples from GitHub that produce good results. I'm trying to compare their main loop with the algorithm's steps in Wikipedia's FastICA article, and I'm having quite a bit of difficulty seeing how they're actually the same.
They look very similar, but there are a few differences that I don't understand. It looks like this implementation is similar to (or the same as) the "Multiple component extraction" version from Wikipedia.
Would someone please help me understand what's going on in the four or so lines having to do with the nonlinearity function and its first and second derivatives, and the first line of updating the weight vector? Any help is greatly appreciated!
Here's the implementation with the variables changed to mirror the Wikipedia article more closely:
% X is sized (NxM, 3x50K) mixed image data matrix (one row for each mixed image)
C=3; % number of components to separate
W=zeros(size(X,1),C); % weights matrix, one column per component
for p=1:C
% initialize random weight vector of length N
wp = rand(C,1);
wp = wp / norm(wp);
% like do:
i = 1;
maxIterations = 100;
while i <= maxIterations+1
% until max iterations
if i == maxIterations
fprintf('No convergence for component %d after %d iterations\n', p, maxIterations);
break;
end
wp_old = wp;
% this is the main part of the algorithm and where
% I'm confused about the particular implementation
u = 1;
t = X'*b;
g = t.^3;
dg = 3*t.^2;
wp = ((1-u)*t'*g*wp+u*X*g)/M-mean(dg)*wp;
% 2nd and 3rd wp update steps make sense to me
wp = wp-W*W'*wp;
wp = wp / norm(wp);
% or until w_p converges
if abs(abs(wp'*wp_old)-1)<1e-10
W(:,p)=wp;
break;
end
i=i+1;
end
end
And the Wiki algorithms for quick reference:
First, I don't understand why the term that is always zero remains in the code:
wp = ((1-u)*t'*g*wp+u*X*g)/M-mean(dg)*wp;
The above can be simplified into:
wp = X*g/M-mean(dg)*wp;
Since u is always 1, the factor (1-u) is zero, so u can be dropped entirely.
Second, I believe the following line is wrong:
t = X'*b;
The correct expression is:
t = X'*wp;
Now let's go through each variable. Let's refer to
$$ w \leftarrow E\{X g(w^T X)^T\} - E\{g'(w^T X)\} w $$
as the iteration equation.
X is your input data, i.e. X in the iteration equation.
wp is the weight vector, i.e. w in the iteration equation. Its initial value is randomised.
g is the first derivative of a nonquadratic nonlinear function, i.e. g(w^T X) in the iteration equation.
dg is the first derivative of g, i.e. g'(w^T X) in the iteration equation.
M: its definition is not shown in the code you provided, but it should be the number of samples, i.e. the number of columns of X (M = size(X,2)).
With the meaning of every variable established, we can now walk through the code.
t = X'*wp;
The above line computes w^T X (stored as an M-by-1 column, one entry per sample).
g = t.^3;
The above line computes g(w^T X) = (w^T X)^3 element-wise. Note that g(u) can be any function as long as f(u), where g(u) = df(u)/du, is nonlinear and nonquadratic; here f(u) = u^4/4.
dg = 3*t.^2;
The above line computes the derivative of g, g'(w^T X) = 3(w^T X)^2.
wp = X*g/M-mean(dg)*wp;
X*g computes the sum of X g(w^T X)^T over all M samples, so X*g/M is its average, i.e. the sample estimate of E{X g(w^T X)^T}.
mean(dg) is the sample estimate of E{g'(w^T X)}, and it multiplies wp (w in the equation).
Now you have everything needed for the (approximate) Newton-Raphson iteration.
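A plain-Ruby sketch of the corrected loop body may help map each MATLAB line to the iteration equation. It assumes the data are already centered and whitened (the code in the question does not show that step), stores X as an array of N rows with M samples each, and uses the hypothetical helper names `fastica_step` and `decorrelate_and_normalize`. It is an illustration, not the GitHub implementation.

```ruby
# One FastICA fixed-point step with g(u) = u**3, mirroring the MATLAB lines
#   t = X'*wp; g = t.^3; dg = 3*t.^2; wp = X*g/M - mean(dg)*wp;
# x: N rows of M samples each (assumed centered and whitened); w: length-N vector.
def fastica_step(x, w)
  n = x.size
  m = x[0].size
  t = (0...m).map { |k| (0...n).sum { |i| w[i] * x[i][k] } } # w^T x_k per sample
  g = t.map { |tk| tk**3 }                                    # g(w^T x)
  dg_mean = t.sum { |tk| 3 * tk**2 } / m.to_f                 # E{g'(w^T x)}
  # E{x g(w^T x)} - E{g'(w^T x)} w
  (0...n).map { |i| (0...m).sum { |k| x[i][k] * g[k] } / m.to_f - dg_mean * w[i] }
end

# Deflation (wp = wp - W*W'*wp) followed by wp = wp/norm(wp).
# found holds previously extracted unit-norm, mutually orthogonal vectors.
def decorrelate_and_normalize(w, found)
  found.each do |v|
    proj = w.zip(v).sum { |a, b| a * b }
    w = w.zip(v).map { |a, b| a - proj * b }
  end
  norm = Math.sqrt(w.sum { |a| a * a })
  w.map { |a| a / norm }
end
```

A full extraction would repeat these two calls until |w^T w_old| is within a tolerance of 1, exactly as the MATLAB convergence check does.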

How is `(d*a)mod(b)=1` written in Ruby?

How should I write this:
(d*a)mod(b)=1
in order to make it work properly in Ruby? I tried it on Wolfram Alpha, but their solution:
(da(b, d))/(dd) = -a/d
doesn't help me. I know a and b. I need to solve (d*a)mod(b)=1 for d, in the form d = ...
It's not clear what you're asking, and, depending on what you mean, a solution may be impossible.
First off, (da(b, d))/(dd) = -a/d is not a solution to that equation; rather, it's a misinterpretation of the notation used for partial derivatives. What Wolfram Alpha actually gave you was:
, which is entirely unrelated.
Secondly, if you're trying to solve (d*a)mod(b)=1 for d, you may be out of luck. If a and b have a common prime factor, no value of d satisfies the equation, because d*a mod b is then always a multiple of that factor. If a and b are coprime, there are infinitely many solutions (all congruent modulo b), and you can use the formula given in LutzL's answer.
Additionally, if you're looking to perform symbolic manipulation of equations, Ruby is likely not the proper tool. Consider using a CAS, like Python's SymPy or Wolfram Mathematica.
Finally, if you're just trying to compute (d*a)mod(b), the modulo operator in Ruby is %, so you'd write (d*a)%(b).
You are looking for the modular inverse of a modulo b.
For any two integers a, b, the extended Euclidean algorithm
g,u,v = xgcd(a, b)
gives coefficients u, v such that
u*a+v*b = g
where g is the greatest common divisor. You need a and b coprime (for example, by choosing b to be a prime that does not divide a) to get g=1; then you can set d=u, or d = u % b to get a value between 0 and b-1.
def xgcd(a, b)
  return [a, 1, 0] if b == 0
  q, r = a.divmod(b)
  # a = q*b + r
  g, u, v = xgcd(b, r)
  # g = u*b + v*r = u*b + v*(a-q*b) = v*a + (u-q*v)*b
  [g, v, u - q * v]
end
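An iterative variant of the same idea, as a Ruby sketch (the name `mod_inverse` is hypothetical): it tracks only the coefficient of a, since the coefficient of b is not needed for the inverse, and returns nil when a and b are not coprime, in which case no d exists.

```ruby
# Modular inverse of a modulo b via an iterative extended Euclidean algorithm.
# Returns d in 0...b with (d * a) % b == 1, or nil when gcd(a, b) != 1.
def mod_inverse(a, b)
  old_r, r = a, b
  old_u, u = 1, 0 # invariant: old_u*a ≡ old_r and u*a ≡ r (mod b)
  until r.zero?
    q = old_r / r
    old_r, r = r, old_r - q * r
    old_u, u = u, old_u - q * u
  end
  old_r == 1 ? old_u % b : nil # Ruby's % is non-negative for positive b
end
```

For example, mod_inverse(3, 11) returns 4, since (4*3) % 11 == 1.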

Least-squares approximation: how is this matrix equation derived?

I am reading the book "Kernel Methods for Pattern Analysis". Least-squares approximation minimises the sum of the squares of the discrepancies:
e=y-Xw
Therefore we minimize
L(w,S)=(y-Xw)'(y-Xw)
which leads to
$$ w=(X'X)^{-1} X'y $$
I understand up to this point.
But how does it lead to this? What exactly is α? Is it a constant?
The same way you would find the minima (or maxima) of a quadratic function of one variable: by finding the zero of the first derivative:
diff((y-Xw)' (y-Xw), w) = 0
(only that this "0" is a row vector with as many elements as w.)
After performing the differentiation we get the following (note that ' denotes the transpose, not a derivative):
-2y'X + 2w'X'X = 0
We transpose the whole expression (so 0 becomes a column vector) and divide by two:
-X'y + X'Xw = 0
and finally solve for w:
w = (X'X)^-1 X'y
Regarding your second question: α is simply the whole expression X(X'X)^-2 X'y. The point is that w can be written as the product of X' and some vector, w = X'α, which means that w is a linear combination of the columns of X' (the rows of X).
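A small numeric check of both results, as a Ruby sketch with hypothetical helpers (`matmul`, `inv2`, `least_squares`). To stay short it only inverts 2x2 matrices, so X must have exactly two columns; it computes w = (X'X)^-1 X'y and α = X(X'X)^-2 X'y so you can confirm that X'α reproduces w.

```ruby
# Product of two matrices stored as arrays of rows.
def matmul(a, b)
  cols = b.transpose
  a.map { |row| cols.map { |col| row.zip(col).sum { |u, v| u * v } } }
end

# Inverse of a 2x2 matrix (this sketch handles two-column X only).
def inv2(m)
  (a11, a12), (a21, a22) = m
  det = (a11 * a22 - a12 * a21).to_f
  [[a22 / det, -a12 / det], [-a21 / det, a11 / det]]
end

# Returns [w, alpha] with w = (X'X)^-1 X'y and alpha = X (X'X)^-2 X'y.
# y is an n-by-1 matrix (array of one-element rows).
def least_squares(x, y)
  xt = x.transpose
  inv = inv2(matmul(xt, x))
  xty = matmul(xt, y)
  w = matmul(inv, xty)
  alpha = matmul(x, matmul(matmul(inv, inv), xty))
  [w, alpha]
end
```

For X = [[1, 0], [1, 1], [1, 2]] and y = [[1], [3], [5]] (points on a line with intercept 1 and slope 2), w comes out as approximately [[1.0], [2.0]], and matmul(X', alpha) gives the same vector.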

matlab: optimum amount of points for linear fit

I want to make a linear fit to a few data points, as shown in the image. Since I know the intercept (in this case, say, 0.05), I want to fit only the points that are in the linear region with this particular intercept. In this case that would be, say, points 5:22 (but not 22:30).
I'm looking for a simple algorithm to determine this optimal number of points, based on... hmm, that's the question... R^2? Any ideas how to do it?
I was thinking about probing R^2 for fits using points 1 to 2:30, 2 to 3:30, and so on, but I don't really know how to wrap it into a clear and simple function. For fits with a fixed intercept I'm using polyfit0 (http://www.mathworks.com/matlabcentral/fileexchange/272-polyfit0-m). Thanks for any suggestions!
EDIT:
sample data:
intercept = 0.043;
x = 0.01:0.01:0.3;
y = [0.0530642513911393,0.0600786706929529,0.0673485248329648,0.0794662409166333,0.0895915873196170,0.103837395346484,0.107224784565365,0.120300492775786,0.126318699218730,0.141508831492330,0.147135757370947,0.161734674733680,0.170982455701681,0.191799936622712,0.192312642057298,0.204771365716483,0.222689541632988,0.242582251060963,0.252582727297656,0.267390860166283,0.282890010610515,0.292381165948577,0.307990544720676,0.314264952297699,0.332344368808024,0.355781519885611,0.373277721489254,0.387722683944356,0.413648156978284,0.446500064130389;];
This is a rather difficult problem to solve in general.
One approach would be to compute the slopes/intercepts between all consecutive pairs of points, and then do cluster analysis on the intercepts:
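slopes = diff(y)./diff(x);
intercepts = y(1:end-1) - slopes.*x(1:end-1);
[idx, centroids] = kmeans(intercepts(:), 3); % kmeans expects one observation per row
[~, k] = min(abs(centroids - intercept)); % cluster whose centre is closest to the known intercept
x([idx; 0] == k) % the points with the intercepts closest to the linear one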
This requires the Statistics Toolbox (for kmeans). This is the best of all the methods I tried, although the range of points found this way might have a few small holes in it; e.g., when the slope between two points in the start or end range happens to lie close to the slope of the line, those points will be detected as belonging to the line. This (and other factors) will require a bit more post-processing of the solution found this way.
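The pairwise-intercept idea can be sketched in plain Ruby without clustering. This swaps k-means for a simple tolerance test against the known intercept, so it is not the answer's exact method; `tol` and the helper names are hypothetical, and an index i in the result means the pair of points (i, i+1) is consistent with the line.

```ruby
# Intercepts implied by the slope between each pair of consecutive points
# (the MATLAB diff(y)./diff(x) and y(1:end-1) - slopes.*x(1:end-1)).
def pairwise_intercepts(x, y)
  (0...x.size - 1).map do |i|
    slope = (y[i + 1] - y[i]) / (x[i + 1] - x[i])
    y[i] - slope * x[i]
  end
end

# 0-based indices i whose pair (i, i+1) implies an intercept within tol
# of the known intercept.
def points_on_line(x, y, intercept, tol)
  ints = pairwise_intercepts(x, y)
  ints.each_index.select { |i| (ints[i] - intercept).abs <= tol }
end
```

With noisy data, tol has to be chosen heuristically, which is the same subjectivity problem the clustering step tries to avoid.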
Another approach (which I failed to get working) is to do a linear fit in a loop, each time extending the range of points from some point in the middle towards both endpoints, and checking whether the sum of squared errors remains small. I gave up on this quickly, because defining "small" is very subjective and has to be done heuristically.
I tried a more systematic and robust version of the above:
function test
%% example data
slope = 2;
intercept = 1.5;
x = linspace(0.1, 5, 100).';
y = slope*x + intercept;
y(1:12) = log(x(1:12)) + y(12)-log(x(12));
y(74:100) = y(74:100) + (x(74:100)-x(74)).^8;
y = y + 0.2*randn(size(y));
%% simple algorithm
[X,fn] = fminsearch(@(ii)P(ii, x,y,intercept), [0.5 0.5])
[~,inds] = P(X, x,y,intercept)
end
function [C, inds] = P(ii, x,y,intercept)
% ii represents fraction of range from center to end,
% So ii lies between 0 and 1.
N = numel(x);
n = round(N/2);
ii = round(ii*n);
inds = min(max(1, n+(-ii(1):ii(2))), N);
% Solve linear system with fixed intercept
A = x(inds);
b = y(inds) - intercept;
% and return the sum of squared errors, divided by
% the number of points included in the set. This
% last step is required to prevent fminsearch from
% reducing the set to 1 point (= minimum possible
% squared error).
C = sum(((A\b)*A - b).^2)/numel(inds);
end
which only finds a rough approximation to the desired indices (12 and 74 in this example).
When fminsearch is run a few dozen times with random starting values (really just rand(1,2)), it gets more reliable, but I still wouldn't bet my life on it.
If you have the Statistics Toolbox, use the kmeans option.
Depending on the number of data values, I would split the data into a relatively small number of overlapping segments and, for each segment, calculate the linear fit, or rather just the first-order coefficient (remember that you know the intercept, which is the same for all segments).
Then, for each coefficient, calculate the MSE between this hypothetical line and the entire dataset, and choose the coefficient that yields the smallest MSE.
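A Ruby sketch of the segment approach, under stated assumptions: windows of `width` points advanced by `step` (both hypothetical tuning parameters), each window's slope fitted with the intercept c held fixed via slope = Σ x(y - c) / Σ x², and the MSE taken over the whole data set as described.

```ruby
# Least-squares slope of y = intercept + slope*x with the intercept held fixed.
def fixed_intercept_slope(xs, ys, intercept)
  num = xs.zip(ys).sum { |x, y| x * (y - intercept) }
  den = xs.sum { |x| x * x }
  num / den
end

# Mean squared error of the line intercept + slope*x over the given points.
def mse(xs, ys, intercept, slope)
  xs.zip(ys).sum { |x, y| (y - intercept - slope * x)**2 } / xs.size
end

# Fit a slope on each overlapping window, then keep the slope whose line
# has the smallest MSE over the whole data set.
def best_segment_slope(xs, ys, intercept, width, step)
  starts = (0..xs.size - width).step(step).to_a
  slopes = starts.map { |s| fixed_intercept_slope(xs[s, width], ys[s, width], intercept) }
  slopes.min_by { |m| mse(xs, ys, intercept, m) }
end
```

Because the MSE is measured against every point, strongly curved tails can still pull the chosen slope away from the linear region, so window size and step are worth experimenting with.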

Get equidistant intervals on approximated Bark scale

Wikipedia says we can approximate the Bark scale with the equation:
b(f) = 13*atan(0.00076*f)+3.5*atan(power(f/7500,2))
How can I divide frequency spectrum into n intervals of the same length on Bark scale (interval division points will be equidistant on Bark scale)?
The best way would be to invert the function analytically (express f as a function of b). I tried doing it on paper but failed. The WolframAlpha search bar couldn't do it either. I tried Octave's finverse function, but I got an error.
Octave says (for a simpler example):
octave:2> x = sym('x');
octave:3> finverse(2*x)
error: `finverse' undefined near line 3 column 1
This is the finverse description from MATLAB: http://www.mathworks.com/help/symbolic/finverse.html
There could also be a numerical way to do it. I imagine you could start by dividing the y axis (the Bark axis) equally and then find each corresponding frequency by binary search. But maybe there are existing tools that do it?
You need to solve this equation numerically (there is no analytical inverse function). Set equally spaced values of b and solve the equation to find the corresponding values of f. Bisection is somewhat slow, but a very good alternative is Brent's method. See http://en.wikipedia.org/wiki/Brent%27s_method
This function can't be inverted analytically, so you'll have to use a numerical procedure. Binary search would be fine, but there are more efficient ways to do this sort of thing: look into root-finding algorithms. Apply your algorithm of choice to the equation b(f) = b_n for each of the equally spaced Bark values b_n; the solutions f_n are the frequency interval endpoints.
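A bisection sketch in Ruby of the numerical inversion described above (bisection rather than Brent's method, for brevity). It relies on b(f) being increasing for f ≥ 0; the search bracket, tolerance and helper names (`bark`, `inverse_bark`, `bark_band_edges`) are assumptions for illustration.

```ruby
# Bark-scale approximation from the question.
def bark(f)
  13.0 * Math.atan(0.00076 * f) + 3.5 * Math.atan((f / 7500.0)**2)
end

# Solve bark(f) = target for f by bisection on [lo, hi].
def inverse_bark(target, lo = 0.0, hi = 30_000.0, tol = 1e-9)
  raise ArgumentError, "target out of range" unless bark(lo) <= target && target <= bark(hi)
  while hi - lo > tol
    mid = 0.5 * (lo + hi)
    if bark(mid) < target
      lo = mid
    else
      hi = mid
    end
  end
  0.5 * (lo + hi)
end

# Frequency edges (in Hz) of n intervals of equal width on the Bark scale.
def bark_band_edges(f_min, f_max, n)
  b0 = bark(f_min)
  b1 = bark(f_max)
  inner = (1...n).map { |k| inverse_bark(b0 + (b1 - b0) * k / n.to_f, f_min, f_max) }
  [f_min.to_f] + inner + [f_max.to_f]
end
```

The outer edges are returned directly so that rounding in b0 + (b1 - b0) cannot push the last target outside the bracket.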
Just so you know, in (say) Octave, to implement rpsmi's or David Zaslavsky's answer you'd do something like this:
global x0 = 0.
function res = b(f)
global x0
res = 13*atan(0.00076*f)+3.5*atan(power(f/7500,2)) - x0
end
function [intervals, barks] = barkintervals(left, right, n)
global x0
intervals = linspace(left, right, n);
barks = intervals;
for i = 1:n
x0 = intervals(i);
# 125*x0 is just a crude guess starting point given the values
[barks(i), fval, info] = fsolve('b', 125*x0);
endfor
end
and run it like so:
octave:1> barks
octave:2> [i,bx] = barkintervals(0, 10, 10)
[... lots of output from fsolve deleted...]
i =
Columns 1 through 8:
0.00000 1.11111 2.22222 3.33333 4.44444 5.55556 6.66667 7.77778
Columns 9 and 10:
8.88889 10.00000
bx =
Columns 1 through 6:
0.0000e+00 1.1266e+02 2.2681e+02 3.4418e+02 4.6668e+02 5.9653e+02
Columns 7 through 10:
7.3639e+02 8.8960e+02 1.0605e+03 1.2549e+03
I finally decided not to use the Bark approximation, but the ideal values for the critical band centres (defined for n=1..24). I plotted them with gnuplot, and on the same graph I plotted arbitrarily chosen values for points of greater density (for the required n>24). I adjusted the point values in Hz until both curves were approximately the same.
Of course, rpsmi's and David Zaslavsky's answers are more general and scalable.
