Understanding a FastICA implementation - algorithm

I'm trying to implement FastICA (independent component analysis) for blind signal separation of images, but first I thought I'd take a look at some examples from GitHub that produce good results. I'm trying to compare the main loop with the algorithm's steps in Wikipedia's FastICA article, and I'm having quite a bit of difficulty seeing how they're actually the same.
They look very similar, but there are a few differences that I don't understand. It looks like this implementation is similar to (or the same as) the "Multiple component extraction" version from the Wiki.
Would someone please help me understand what's going on in the four or so lines that deal with the nonlinearity function and its first and second derivatives, and in the first line that updates the weight vector? Any help is greatly appreciated!
Here's the implementation with the variables changed to mirror the Wiki more closely:
% X is sized (NxM, 3x50K) mixed image data matrix (one row for each mixed image)
C=3; % number of components to separate
W=zeros(numofIC,VariableNum); % weights matrix
for p=1:C
% initialize random weight vector of length N
wp = rand(C,1);
wp = wp / norm(wp);
% like do:
i = 1;
maxIterations = 100;
while i <= maxIterations+1
% until max iterations
if i == maxIterations
fprintf('No convergence: ', p,maxIterations);
wp_old = wp;
% this is the main part of the algorithm and where
% I'm confused about the particular implementation
u = 1;
t = X'*b;
g = t.^3;
dg = 3*t.^2;
wp = ((1-u)*t'*g*wp+u*X*g)/M-mean(dg)*wp;
% 2nd and 3rd wp update steps make sense to me
wp = wp-W*W'*wp;
wp = wp / norm(wp);
% or until w_p converges
if abs(abs(b'*bOld)-1)<1e-10
And the Wiki algorithms for quick reference:

First, I don't understand why the term that is always zero remains in the code:
wp = ((1-u)*t'*g*wp+u*X*g)/M-mean(dg)*wp;
The above can be simplified into:
wp = X*g/M-mean(dg)*wp;
Also removing u since it is always 1.
Second, I believe the following line is wrong:
t = X'*b;
The correct expression is:
t = X'*wp;
Now let's go through each variable here. Let's refer to
w = E{X g(w^T X)^T} - E{g'(w^T X)} w
as the iteration equation.
X is your input data, i.e. X in the iteration equation.
wp is the weight vector, i.e. w in the iteration equation. Its initial value is randomised.
g is the first derivative of a nonquadratic nonlinear function, i.e. g(w^T X) in the iteration equation.
dg is the first derivative of g, i.e. g'(w^T X) in the iteration equation.
M is not defined in the code you provided, but it should be the number of samples, i.e. the number of columns of X (50K here).
Now that we know what every variable means, we can go through the code.
t = X'*wp;
The above line (after replacing b with wp) computes w^T X.
g = t.^3;
The above line computes g(w^T X) = (w^T X)^3. Note that g(u) can be any function as long as f(u), where g(u) = df(u)/du, is nonlinear and nonquadratic.
dg = 3*t.^2;
The above line computes the derivative of g.
wp = X*g/M-mean(dg)*wp;
X*g calculates X g(w^T X)^T summed over all samples. X*g/M is therefore its average, which is the sample estimate of E{X g(w^T X)^T}.
mean(dg) is E{g'(w^T X)}, which is multiplied by wp, i.e. w in the equation.
Now you have what you need for the Newton-Raphson method.
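To make the walkthrough concrete, here is a minimal sketch of the simplified one-unit update on synthetic data. The two sources, the mixing matrix A and the fixed starting vector are assumptions made up for this illustration (they are not from the question), and the data are whitened first because the fixed-point update assumes whitened input.

```matlab
% Two deterministic, non-Gaussian sources (assumed test data, not from the post)
M  = 5000;
tt = linspace(0, 50, M);
S  = [sin(2*pi*tt); mod(3.7*tt, 1) - 0.5];
A  = [1 0.6; 0.4 1];                  % hypothetical mixing matrix
X  = A * S;                           % 2 x M mixed signals

% Center and whiten so that X*X'/M is (numerically) the identity
X = X - repmat(mean(X, 2), 1, M);
[E, D] = eig(X * X' / M);
X = diag(1 ./ sqrt(diag(D))) * E' * X;

wp = [1; 1] / sqrt(2);                % fixed start instead of rand, for reproducibility
for it = 1:100
    wp_old = wp;
    t  = X' * wp;                     % M x 1 vector of projections w'*x
    g  = t.^3;                        % g(u) = u^3
    dg = 3 * t.^2;                    % g'(u)
    wp = X * g / M - mean(dg) * wp;   % E{x g(w'x)} - E{g'(w'x)} w
    wp = wp / norm(wp);               % renormalise
    if abs(abs(wp' * wp_old) - 1) < 1e-10
        break;                        % direction stopped changing
    end
end
y = wp' * X;                          % one estimated independent component
```

With only one component extracted, the deflation step wp = wp-W*W'*wp is not needed, which is why it is left out here.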


How to speed up the solving of multiple optimization problems?

Currently, I'm writing a simulation that assesses the performance of a positioning algorithm by measuring the mean error of the position estimator for different points around the room. Unfortunately, the running times are pretty slow, so I am looking for ways to speed up my code.
The working principle of the position estimator is based on the MUSIC algorithm. The estimator gets an autocorrelation matrix (sized 12x12, with complex values in general) as an input and follows the next steps:
Find the 12 eigenvalues and eigenvectors of the autocorrelation matrix R.
Construct a new 12x11 matrix EN whose columns are the 11 eigenvectors corresponding to the 11 smallest eigenvalues.
Using the matrix EN, construct a function P = 1/(a' EN EN' a).
Where a is a 12x1 complex vector and a' is the Hermitian conjugate of a. The components of a are functions of 3 variables (named x,y and z) and so the scalar P is also a function P(x,y,z)
Finally, find the values (x0,y0,z0) which maximizes the value of P and return it as the position estimate.
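The steps above can be sketched on a toy problem as follows. The steering vectors a0 and a1 and the noise level are made-up values for illustration only; in the real estimator a would be computed from (x,y,z) and the antenna positions.

```matlab
% Hypothetical 12x1 steering vector of the true source (not from the post)
a0 = exp(1i * (1:12)' * 0.7);
R  = a0 * a0' + 0.01 * eye(12);       % rank-one signal plus white noise
R  = (R + R') / 2;                    % enforce exact Hermitian symmetry

% Steps 1-2: eigendecomposition, keep the 11 eigenvectors of the smallest eigenvalues
[V, D]   = eig(R);
[~, idx] = sort(real(diag(D)), 'ascend');
EN       = V(:, idx(1:11));

% Step 3: a'*EN*EN'*a is the same number as norm(EN'*a)^2
P = @(a) 1 / norm(EN' * a)^2;

% Step 4 (illustrated with two candidates only): P peaks at the true steering vector
a1    = exp(1i * (1:12)' * 1.9);      % steering vector of some other candidate point
ratio = P(a0) / P(a1);
```

Writing the denominator as norm(EN'*a)^2 keeps it real and non-negative by construction, so the call to real() is not needed in this form.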
In my code, I choose some constant z and create a grid of points in the plane (at height z, parallel to the xy plane). For each point I make n4Avg repetitions and calculate the error of the estimated point. At the end of the parfor loop (and some reshaping), I have a matrix of errors with dims (nx) x (ny) x (n4Avg), and the mean error is calculated by taking the mean of the error matrix along the 3rd dimension.
nx=30 is the number of point along the x axis.
ny=15 is the number of points along the y axis.
n4Avg=100 is the number of repetitions used for calculating the mean error at each point.
nGen=100 is the number of generations in the GA algorithm (100 was tested to be good enough).
x = linspace(-20,20,nx);
y = linspace(0,20,ny);
z = 5;
[X,Y] = meshgrid(x,y);
parfor ri = 1:nx*ny
rT = [X(ri);Y(ri);z];
[ENs] = getEnNs(rT,stdv,n4R,n4Avg); % create n4Avg EN matrices
for rep = 1:n4Avg
pos_est = estPos_helper(squeeze(ENs(:,:,rep)),nGen);
posEstErr(ri,rep) = vecnorm(pos_est(:)-rT(:));
The matrices EN are generated by the following code
function [ENs] = getEnNs(rT,stdv,n4R,nEN)
% generate nEN simulated EN matrices, each using n4R simulated phases
f_c = 2402e6; % center frequency [Hz]
c0 = 299702547; % speed of light [m/s]
load antennaeArr1.mat antennaeArr1;
% generate initial phases.
phi0 = 2*pi*rand(n4R*nEN,1);
k0 = 2*pi.*(f_c)./c0;
I = cos(-k0.*vecnorm(antennaeArr1 - rT(:),2,1)-phi0);
Q = -sin(-k0.*vecnorm(antennaeArr1 - rT(:),2,1)-phi0);
phases = I+1i*Q;
phases = phases + stdv/sqrt(2)*(randn(size(phases)) + 1i*randn(size(phases)));
phases = reshape(phases',[12,n4R,nEN]);
Rxx = pagemtimes(phases,pagectranspose(phases));
ENs = zeros(12,11,nEN);
for i=1:nEN
[ENs(:,:,i),~] = eigs(squeeze(Rxx(:,:,i)),11,'smallestabs');
The position estimator uses a solver utilizing a 'genetic algorithm' (chosen because it performed the best of all the solvers).
function pos_est = estPos_helper(EN,nGen)
load antennaeArr1.mat antennaeArr1; % 3x12 constant matrix
antennae_array = antennaeArr1;
x0 = [0;10;5];
lb = [-20;0;0];
ub = [20;20;10];
function y = myfun(x)
k0 = 2*pi*2.402e9/299702547;
a = exp( -1i*k0*sqrt( (x(1)-antennae_array(1,:)').^2 + (x(2) - antennae_array(2,:)').^2 + (x(3)-antennae_array(3,:)').^2 ) );
y = 1/real((a')*(EN)*(EN')*a);
% Create optimization variables
x3 = optimvar("x",3,1,"LowerBound",lb,"UpperBound",ub);
% Set initial starting point for the solver
initialPoint2.x = x0;
% Create problem
problem = optimproblem("ObjectiveSense","Maximize");
% Define problem objective
problem.Objective = fcn2optimexpr(@myfun,x3);
% Set nondefault solver options
options2 = optimoptions("ga","Display","off","HybridFcn","fmincon",...
% Solve problem
solution = solve(problem,initialPoint2,"Solver","ga","Options",options2);
% Clear variables
clearvars x3 initialPoint2 options2
pos_est = solution.x;
The current runtime of the code, when setting the parameters as shown above, is around 700-800 seconds. This is a problem as I would like to increase the number of points in the grid and the number of repetitions to get a more accurate result.
The main ways I've tried to tackle this are using parallel computing (in the form of the parfor loop) and reducing the nested loops I had (one for x and one for y) to a single loop going over all the points in the grid.
It indeed helped, but not quite enough.
I apologize for the messy code.

Numeric and symbolic gradients don't match although Hessians do

For context, I have a small project in MATLAB where I try to replicate an algorithm involving some optimisation with the Newton algorithm. Although my issue is mainly with MATLAB, it may be my lack of deeper background knowledge that is keeping me from finding a solution, so feel free to redirect me to the appropriate StackExchange site if needed.
The function I need to calculate the gradient vector and Hessian matrix for the optimization is :
function [zi] = Zi(lambda,j)
zi = m(j)*exp(-(lambda*v_tilde(j,:).'));
function [z] = Z(lambda)
res = arrayfun(@(x) Zi(lambda,x),1:length(omega));
z = sum(res);
function [f] = F(lambda)
f = log(Z(lambda));
where omega and v_tilde are matrices of n d-dimensional vectors and lambda is the d-dimensional argument to the function. (Right now, the m(j) are just selectors (1 or 0), but the algorithm allows these to be refined, so they shouldn't be removed.)
I use the Derivest Suite to calculate the gradient and Hessian numerically, and, although logically slow for high dimensions, the algorithm as a whole works.
I implemented the same solution using the sym package, so that I could compute the gradient and Hessian in advance for some fixed n and d and then evaluate them quickly when needed. This would be the symbolic version:
V_TILDE = sym('v_tilde',[d,n])
syms n k
lambda = sym('lambda',[d,1]);
F = log(M*exp(-(transpose(V_TILDE)*lambda)));
matlabFunction( grad_F, 'File', sprintf('Grad_%d_dim_%d_n.m',d,n_max), 'vars',{a,lambda,V_TILDE});
matlabFunction( hesse_F, 'File', sprintf('Hesse_%d_dim_%d_n.m',d,n_max), 'vars',{a,lambda,V_TILDE});
As n is fixed, there is no need to iterate over omega. The gradient and Hessian of this can be obtained through the corresponding functions of sym and then stored as matlabFunctions.
However, when I test both implementations against some values, they don't match. Surprisingly, the values of the Hessian matrix match while the values of the gradient don't (the numerical calculation being the correct one), and the Newton algorithm iterates until the values are just NaN. These are some example values for d=2 and n=8:
12.6987 91.3376
95.7507 96.4889
15.7613 97.0593
95.7167 48.5376
70.6046 3.1833
27.6923 4.6171
9.7132 82.3458
95.0222 3.4446
gNum =
    8.3624
   -1.1496
HNum = 1.0e+03 *
    1.4066   -0.5653
   -0.5653    1.6826
gSym =
  -52.8700
  -53.3768
HSym = 1.0e+03 *
    1.4066   -0.5653
   -0.5653    1.6826
As you can see, the values of HNum and HSym match, but the gradients don't.
I'm happy to give any more context information, code snippets, or anything that helps. Thank you in advance!
Edit: As requested, here is a minimal test. The problem is basically that the values of gNum and gSym don't match (longer explanation above):
omega = [[12.6987, 91.3376];[95.7507, 96.4889];[15.7613, 97.0593];
[95.7167, 48.5376];[70.6046, 3.1833];[27.6923, 4.6171];[9.7132, 82.3458];
[95.0222, 3.4446]];
v = [61.2324;52.2271];
gradStr = sprintf('Grad_%d_dim_%d_n',length(omega(1,:)),length(omega));
hesseStr = sprintf('Hesse_%d_dim_%d_n',length(omega(1,:)),length(omega));
g = str2func(gradStr);
H = str2func(hesseStr);
selector = ones(1,length(omega)); %this will change, when n_max>n
vtilde = zeros(length(omega),length(omega(1,:)));
for i = 1:length(omega)
vtilde(i,:) = omega(i,:)-v;
lambda = zeros(1,length(omega(1,:))); % start of the optimization
[gNum,~,~] = gradest(@F,lambda)
[HNum,~] = hessian(@F,lambda)
gSym = g(selector,lambda.',omega.')
HSym = H(selector,lambda.',omega.')
Note: The DerivestSuite is a small library (~6 source files) that can be obtained under https://de.mathworks.com/matlabcentral/fileexchange/13490-adaptive-robust-numerical-differentiation
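One thing worth checking, sketched below with the values from the minimal test: for F(lambda) = log(sum_j m(j)*exp(-lambda*v_j)) with all m(j) = 1, the gradient at lambda = 0 is minus the mean of the vectors v_j and the Hessian is their covariance (normalised by n). Subtracting a constant vector from every row changes the mean but not the covariance. The helper names grad0 and hess0 are introduced only for this illustration. If the closed forms reproduce gNum when fed vtilde and gSym when fed omega, that would suggest the generated functions are being evaluated on omega rather than vtilde, as in the call g(selector,lambda.',omega.'). This is an inference from the posted numbers, not a confirmed diagnosis.

```matlab
omega = [12.6987 91.3376; 95.7507 96.4889; 15.7613 97.0593; 95.7167 48.5376;
         70.6046  3.1833; 27.6923  4.6171;  9.7132 82.3458; 95.0222  3.4446];
v      = [61.2324 52.2271];
n      = size(omega, 1);
vtilde = omega - repmat(v, n, 1);

% Closed forms at lambda = 0 with all selectors equal to 1 (hypothetical helpers)
grad0 = @(V) -mean(V, 1)';
hess0 = @(V) (V - repmat(mean(V, 1), n, 1))' * (V - repmat(mean(V, 1), n, 1)) / n;

g_vtilde = grad0(vtilde);   % approximately [8.3624; -1.1497], like gNum
g_omega  = grad0(omega);    % approximately [-52.8700; -53.3768], like gSym
H_vtilde = hess0(vtilde);   % covariance is unaffected by the shift by v ...
H_omega  = hess0(omega);    % ... so both Hessians agree
```

If that is indeed the cause, passing vtilde.' instead of omega.' to the generated functions should make the gradients agree as well.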

How to speed up a 3D nested loop to fill a (i,j,k)-matrix with indices from other arrays in Matlab?

I have the following problem: given a 3D irregular geometry A with (i,j,k)-coordinates, which are the centroids of connected voxels, create a table with the (i_out,j_out,k_out)-coordinates of the cells that represent the complement B of A within its bounding box, which we may call C. That is to say, I need the voxel coordinates of the set B = C - A.
To get this done, I am using the Matlab code below, but it takes too long to complete when C is fairly large, so I would like to speed it up. To make it clear: cvc is the matrix of voxel coordinates of A; allcvc should produce C, and B results from outcvc after setdiff.
Does anyone have a clue about the code's performance, or a suggestion for improving my strategy?
Problem: the for-loop seems to be the villain.
My attempts: I have tried to follow some hints from Yair Altman's book by doing some tic,toc analyses, and by using pre-allocation and int8, since I do not need double values. Using deal gave me only a slight improvement with min,max. I have also checked this discussion here, but parallelism, for instance, is not an option for me right now.
% A bounding box limits
m = min(cvc,[],1);
M = max(cvc,[],1);
[im,jm,km,iM,jM,kM] = deal(m(1),m(2),m(3),M(1),M(2),M(3));
% (i,j,k) indices of regular grid
I = im:iM;
J = jm:jM;
K = km:kM;
% (i,j,k) table
m = length(I);
n = length(J);
p = length(K);
num = m*n*p;
allcvc = zeros(num,3,'int8');
for N = 1:num
for i = 1:m
for j = 1:n
for k = 1:p
aux = [I(i),J(j),K(k)];
allcvc(N,:) = aux;
% operation of exclusion: out = all - in
[outcvc,~] = setdiff(allcvc,cvc,'rows');
To avoid all for-loops in the present code you can use ndgrid or meshgrid functions. For example
[I,J,K] = ndgrid(im:iM, jm:jM, km:kM);
allcvc = [I(:),J(:),K(:)];
instead of your code between % (i,j,k) indices of regular grid and % operation of exclusion: out =.
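A small end-to-end sketch of this suggestion, on a made-up voxel list (the four rows of cvc below are hypothetical and kept as doubles to sidestep integer-class questions):

```matlab
cvc = [1 1 1; 2 1 1; 2 2 1; 3 2 2];      % hypothetical voxel coordinates of A

% Bounding box limits
m = min(cvc, [], 1);
M = max(cvc, [], 1);

% All voxels of the bounding box C, one row per voxel, no loops
[I, J, K] = ndgrid(m(1):M(1), m(2):M(2), m(3):M(3));
allcvc = [I(:), J(:), K(:)];

% B = C - A
outcvc = setdiff(allcvc, cvc, 'rows');
```

Here the box is 3 x 2 x 2 = 12 voxels and A occupies 4 of them, so outcvc has 8 rows. If you cast to int8 afterwards, keep in mind that int8 saturates at 127, so larger grids would need a wider integer type.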

Global minimum in a huge convex matrix by using small matrices

I have a function J(x,y,z) that gives me a value for each set of coordinates. This function is convex. What I need to do is find the minimum value of this huge matrix.
At first I tried to loop through all of them, calculate the values and then search with the min function, but that takes too long ...
so I decided to take advantage of the convexity.
Take a random (for now) set of coordinates that will be the center of my small 3x3x3 matrix, find the local minimum and make it the center of the next matrix. This continues until we reach the global minimum.
Another issue is that the function is not perfectly convex, so the search can get stuck in a fake (local) minimum,
so I'm thinking of a control measure: when it finds a fake minimum, increase the search range to make sure of it.
How would you advise me to go about it? Is this approach good? Or should I look into something else?
This is something I started myself, but I am fairly new to Matlab and I am not sure how to continue.
clear all
%the initial size of the search matrix 2*level +1
i=input('Enter the starting coordinate for i (X) : ');
j=input('Enter the starting coordinate for j (Y) : ');
k=input('Enter the starting coordinate for k (Z) : ');
for m=i-level:i+level
for n=j-level:j+level
for p=k-level:k+level
if A(m,n,p)<min
display(min, 'Minim');
[r,c,d] = ind2sub(size(A),find(A ==min));
Any guidance, improvement and constructive criticism are appreciated. Thanks in advance.
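The neighbourhood descent described in the question could be sketched roughly as follows. The test volume A and the starting voxel are assumptions made up for the illustration, and the sketch does not include the "widen the search range" safeguard for imperfectly convex data.

```matlab
% Hypothetical convex test volume with its minimum at voxel (25, 7, 12)
[x, y, z] = ndgrid(1:40, 1:30, 1:20);
A = (x - 25).^2 + (y - 7).^2 + (z - 12).^2;

c = [1 1 1];                                   % starting voxel (i, j, k)
while true
    lo  = max(c - 1, 1);                       % clip the 3x3x3 block at the borders
    hi  = min(c + 1, size(A));
    blk = A(lo(1):hi(1), lo(2):hi(2), lo(3):hi(3));
    [vmin, idx] = min(blk(:));
    if vmin >= A(c(1), c(2), c(3))
        break;                                 % no neighbour is strictly better
    end
    [di, dj, dk] = ind2sub(size(blk), idx);
    c = lo + [di dj dk] - 1;                   % move the centre to the block minimum
end
```

Stopping only when no neighbour is strictly better guarantees termination, because the value at the centre decreases at every step.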
Try fminsearch because it is fairly general and easy to use. This is especially easy if you can specify your function anonymously. For example:
aFunc = @(x)100*(x(2)-x(1)^2)^2+(1-x(1))^2
then using fminsearch:
[x,fval] = fminsearch( aFunc, [-1.2, 1]);
If your 3-dimensional function, J(x,y,z), can be described anonymously or as regular function, then you can try fminsearch. The input takes a vector so you would need to write your function as J(X) where X is a vector of length 3 so x=X(1), y=X(2), z=X(3)
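As a minimal sketch of that wrapping step, assuming a made-up convex function J3 in place of the real J(x,y,z):

```matlab
% Hypothetical convex function of three variables, minimum at (1, -2, 0.5)
J3 = @(x, y, z) (x - 1).^2 + 2*(y + 2).^2 + 3*(z - 0.5).^2;

% fminsearch expects a single vector argument, so wrap J3 accordingly
Jvec = @(X) J3(X(1), X(2), X(3));

[Xmin, fval] = fminsearch(Jvec, [2, 2, 2]);
```

Note that fminsearch works on continuous variables, so if J is only available on integer grid indices the result would still need to be rounded or interpolated.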
fminsearch can fail, especially if the starting point is not near the solution. It is often better to refine the initial starting point first. For example, the code below samples a patch around the starting vector and generally improves the chances of finding the global minimum.
% Assumes aFunc, startVec and deltaR are already defined.
% deltaR is used to refine the start vector with a scatter min search over
% the region defined by a patch of [-deltaR+startVec(i):dx:deltaR+startVec(i)]
% on a side.
% Determine dx using maxIter.
maxIter = 1e4;
dx = max( ( 2*deltaR+1)^2/maxIter, 1/8);
dim = length( startVec);
[x,y] = meshgrid( -deltaR:dx:deltaR);
xV = zeros( length(x(:)), dim);
% Alternate patches as sequential x-y grids.
for ii = 1:2:dim
    xV(:, ii) = startVec(ii) + x(:);
end
for ii = 2:2:dim
    xV(:, ii) = startVec(ii) + y(:);
end
% Find the scatter min index to update startVec.
nS = zeros( size(xV,1), 1);
for ii = 1:size(xV,1)
    nS(ii) = aFunc( xV(ii,:));
end
[fSmin, iw] = min( nS);
startVec = xV( iw,:);
fSmin = fSmin
startVec = startVec
[x,fval] = fminsearch( aFunc, startVec);
You can run a 2 dimensional test case f(x,y)=z on AlgorithmHub. The app is running the above code in Octave. You can edit the in-line function (possibly even try your problem) from this web-site as well.

Finding the best scale/shift between two vectors

I have one vector that represents a function f(x), and another vector that represents f(ax+b), i.e. a scaled and shifted version of f(x). I would like to find the best scale and shift factors.
*best - in the sense of least squares error, maximum likelihood, etc.
Any ideas?
for example:
f1 = [0;0.450541598502498;0.0838213779969326;0.228976968716819;0.91333736150167;0.152378018969223;0.825816977489547;0.538342435260057;0.996134716626885;0.0781755287531837;0.442678269775446;0];
f2 = [-0.029171964726699;-0.0278570165494982;0.0331454732535324;0.187656956432487;0.358856370923984;0.449974662483267;0.391341738643094;0.244800719791534;0.111797007617227;0.0721767235173722;0.0854437239807415;0.143888234591602;0.251750993723227;0.478953530572365;0.748209818420035;0.908044924557262;0.811960826711455;0.512568916956487;0.22669198638799;0.168136111568694;0.365578085161896;0.644996661336714;0.823562159983554;0.792812945867018;0.656803251999341;0.545799498053254;0.587013303815021;0.777464637372241;0.962722388208354;0.980537136457874;0.734416947254272;0.375435649393553;0.106489547770962;0.0892376361668696;0.242467741982851;0.40610516900965;0.427497319032133;0.301874099075184;0.128396341665384;0.00246347624097456;-0.0322120242872125]
*note that f(x) may not be invertible...
For each f(x), take the absolute value of f(x) and normalize it such that it can be considered a probability mass function over its support. Calculate the expected value E[x] and variance of Var[x]. Then, we have that
E[a x + b] = a E[x] + b
Var[a x + b] = a^2 Var[x]
Use the above equations and the known values of E[x] and Var[x] to calculate a and b. Taking your values of f1 and f2 from your example, the following Octave script performs this procedure:
% Octave script
% f1, f2 are defined as given in your example
f1 = [zeros(length(f2) - length(f1), 1); f1];
save_f1 = f1; save_f2 = f2;
f1 = abs( f1 ); f2 = abs( f2 );
f1 = f1 ./ sum( f1 ); f2 = f2 ./ sum( f2 );
mean = @(x)sum(((1:length(x))' .* x));
var = @(x)sum((((1:length(x))'-mean(x)).^2) .* x);
m1 = mean(f1); m2 = mean(f2);
v1 = var(f1); v2 = var(f2)
a = sqrt( v2 / v1 ); b = m2 - a * m1;
plot( a .* (1:length( save_f1 )) + b, save_f1, ...
1:length( save_f2 ), save_f2 );
xlim([0 length( save_f1 )]);
And the output is
Here's a simple, effective, but perhaps somewhat naive approach.
First make sure you make a generic interpolator through both functions. That way you can evaluate both functions in between the given data points. I used a cubic-splines interpolator, since that seems general enough for the type of smooth functions you provided (and does not require additional toolboxes).
Then you evaluate the source function ("original") at a large number of points. Use this number also as a parameter in an inline function, that takes as input X, where
X = [a b]
(as in ax+b). For any input X, this inline function will compute
the function values of the target function at the same x-locations, but then scaled and offset by a and b, respectively.
The sum of the squared-differences between the resulting function values, and the ones of the source function you computed earlier.
Use this inline function in fminsearch with some initial estimate (one that you have obtained visually or via automatic means). For the example you provided, I used a few random ones, which all converged to near-optimal fits.
All of the above in code:
function s = findScaleOffset
%% initialize
f2 = [0;0.450541598502498;0.0838213779969326;0.228976968716819;0.91333736150167;0.152378018969223;0.825816977489547;0.538342435260057;0.996134716626885;0.0781755287531837;0.442678269775446;0];
f1 = [-0.029171964726699;-0.0278570165494982;0.0331454732535324;0.187656956432487;0.358856370923984;0.449974662483267;0.391341738643094;0.244800719791534;0.111797007617227;0.0721767235173722;0.0854437239807415;0.143888234591602;0.251750993723227;0.478953530572365;0.748209818420035;0.908044924557262;0.811960826711455;0.512568916956487;0.22669198638799;0.168136111568694;0.365578085161896;0.644996661336714;0.823562159983554;0.792812945867018;0.656803251999341;0.545799498053254;0.587013303815021;0.777464637372241;0.962722388208354;0.980537136457874;0.734416947254272;0.375435649393553;0.106489547770962;0.0892376361668696;0.242467741982851;0.40610516900965;0.427497319032133;0.301874099075184;0.128396341665384;0.00246347624097456;-0.0322120242872125];
figure(1), clf, hold on
h(1) = subplot(2,1,1); hold on
h(2) = subplot(2,1,2); hold on
axis([0 max(length(f1),length(f2)), min(min(f1),min(f2)),max(max(f1),max(f2))])
%% make cubic interpolators and test points
pp1 = spline(1:numel(f1), f1);
pp2 = spline(1:numel(f2), f2);
maxX = max(numel(f1), numel(f2));
N = 100 * maxX;
x2 = linspace(1, maxX, N);
y1 = ppval(pp1, x2);
%% search for parameters
s = fminsearch(@(X) sum( (y1 - ppval(pp2,X(1)*x2+X(2))).^2 ), [0 0])
%% plot results
y2 = ppval( pp2, s(1)*x2+s(2));
figure(1), hold on
subplot(2,1,2), hold on
plot(x2,y2, 'r')
legend('before', 'after')
s =
2.886234493867320e-001 3.734482822175923e-001
Note that this computes the opposite transformation from the one you generated the data with. Reversing the numbers:
>> 1/s(1)
ans =
3.464721948700991e+000 % seems pretty decent
>> -s(2)
ans =
-3.734482822175923e-001 % hmmm...rather different from 7/11!
(I'm not sure about the 7/11 value you provided; using the exact values you gave to make a plot results in a less accurate approximation to the source function...Are you sure about the 7/11?)
Accuracy can be improved by either
using a different optimizer (fmincon, fminunc, etc.)
demanding a higher accuracy from fminsearch through optimset
having more sample points in both f1 and f2 to improve the quality of the interpolations
Using a better initial estimate
Anyway, this approach is pretty general and gives nice results. It also requires no toolboxes.
It has one major drawback though -- the solution found may not be the global optimizer, e.g., the quality of the outcomes of this method could be quite sensitive to the initial estimate you provide. So, always make a (difference) plot to make sure the final solution is accurate, or if you have a large number of such things to do, compute some sort of quality factor upon which you decide to re-start the optimization with a different initial estimate.
It is of course very possible to use the results of the Fourier+Mellin transforms (as suggested by chaohuang below) as an initial estimate to this method. That might be overkill for the simple example you provide, but I can easily imagine situations where this could indeed be very useful.
For the scale factor a, you can estimate it by computing the ratio of the amplitude spectra of the two signals, since the magnitude of the Fourier transform is invariant to shift.
Similarly, you can estimate the shift factor b by using the Mellin transform, which is scale invariant.
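The shift invariance that this idea relies on is easy to check numerically. The sketch below uses a made-up zero-padded signal and a circular shift, for which the property holds exactly; for a non-circular shift of sampled data it holds only approximately.

```matlab
x  = [0 1 3 2 5 1 0 0 0 0]';       % hypothetical signal, zero-padded
xs = circshift(x, 3);              % same signal delayed by 3 samples (circularly)

% A shift only changes the phase of the DFT, so the magnitudes coincide
d = max(abs(abs(fft(x)) - abs(fft(xs))));
```

The difference d is at the level of floating-point round-off, which is what makes the amplitude spectrum usable for estimating the scale independently of the shift.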
Here's a super simple approach to estimate the scale a that works on your example data:
a = length(f2) / length(f1)
This gives 3.4167 which is close to your stated value of 3.4. If that estimate is good enough, you can use correlation to estimate the shift.
I realize that this is not exactly what you asked, but it may be an acceptable alternative depending on the data.
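The correlation step mentioned above could look roughly like this, once both signals are on the same sampling grid. The template s and the delay of 4 samples are made up for the illustration, and conv with a flipped template is used in place of xcorr so that no toolbox is needed.

```matlab
s = [0 1 3 2 5 1 0]';                  % hypothetical template
g = [zeros(4, 1); s; zeros(3, 1)];     % the same template delayed by 4 samples

c = conv(g, flipud(s));                % cross-correlation via convolution
[~, k] = max(c);                       % position of the correlation peak
lag = k - numel(s);                    % estimated shift in samples
```

For this example the peak sits at k = 11, giving lag = 4, which is the delay that was built into g.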
Both Rody Oldenhuis and jstarr's answers are correct. I'm adding my own answer just to sum things up, and connect between them.
I've messed up Rody's code a little bit and ended up with the following:
function findScaleShift
load f1f2
x0 = [length(f1)/length(f2) 0]; %initial guess, can do better
costFunc = @(z) sum((eval_f1(z,f2,n)-f1).^2);
opt.TolFun = eps;
title('squared error')
function y = eval_f1(x,f2,n)
t = maketform('affine',[x(1) 0 x(2); 0 1 0 ; 0 0 1]');
y=imtransform(f2',t,'cubic','xdata',[1 n ],'ydata',[1 1])';
This gives zero results:
This method is accurate but exhaustive and may take some time. Another disadvantage is that it finds only a local minimum, and may give false results if the initial guess (x0) is far from the solution.
On the other hand, jstarr's method gave the following results:
xopt = [ 3.49655562549115 -0.676062367063033]
which deviates by about 10% from the correct answer. It is a pretty fast solution, though not as accurate as I requested; still, it should be noted.
I think that, in order to get the best results, jstarr's method should be used as an initial guess for the method proposed by Rody, giving an accurate solution.
