Gradient Descent for Linear Regression not finding optimal parameters

Gradient Descent for Linear Regression not finding optimal parameters - algorithm

I am trying to implement the gradient descent algorithm to fit a straight line to noisy data following the following image taken from Andrew Ng's course.
First, I am declaring the noisy straight line I want to fit:
xrange =(-10:0.1:10); % data lenght
ydata = 2*(xrange)+5; % data with gradient 2, intercept 5
plot(xrange,ydata); grid on;
noise = (2*randn(1,length(xrange))); % generating noise
target = ydata + noise; % adding noise to data
figure; scatter(xrange,target); grid on; hold on; % plot a sctter
I then initialize both parameters, the objective function history as follows:
tita0 = 0 %intercept (randomised)
tita1 = 0 %gradient (randomised)
% Initialize Objective Function History
J_history = zeros(num_iters, 1);
% Number of training examples
m = (length(xrange));
I proceed to write the gradient descent algorithm:
for iter = 1:num_iters
h = tita0 + tita1.*xrange; % building the estimated
%c = (1/(2*length(xrange)))*sum((h-target).^2)
temp0 = tita0 - alpha*((1/m)*sum((h-target)));
temp1 = tita1 - alpha*((1/m)*sum((h-target))).*xrange;
tita0 = temp0;
tita1 = temp1;
J_history(iter) = (1/(2*m))*sum((h-target).^2); % Calculating cost from data to estimate
end
Last but not least, the plots. I am using MATLAB's inbuilt polyfit function to test the accuracy of my fit.
% print theta to screen
fprintf('Theta found by gradient descent: %f %f\n',tita0, tita1(end));
fprintf('Minimum of objective function is %f \n',J_history(num_iters));
%Plot the linear fit
hold on; % keep previous plot visibledesg
plot(xrange, tita0+xrange*tita1(end), '-'); title(sprintf('Cost is %g',J_history(num_iters))); % plotting line on scatter
% Validate with polyfit fnc
poly_theta = polyfit(xrange,ydata,1);
plot(xrange, poly_theta(1)*xrange+poly_theta(2), 'y--');
legend('Training data', 'Linear regression','Linear regression with polyfit')
hold off
Result:
AS can be seen my linear regression is not working well at all. It seems as though both parameters (y-intercept and gradient) are not converging to the optimal solution.
Any suggestions on what I may be doing wrong in my implementation would be appreciated. I can't seem to understand where my solution is diverging from the equations shown above. Thanks!

Your implementation for theta_1 is incorrect. Andrew Ng's equation sums across the x's as well. What you have for theta_0 and theta_1 are
temp0 = tita0 - alpha*((1/m)*sum((h-target)));
temp1 = tita1 - alpha*((1/m)*sum((h-target))).*xrange;
Notice that sum((h-target)) appears in both formulas. You will need to multiply the x's first before summing it up. I am not a MatLab programmer so I can't fix your code.
The big picture of what you are doing in your incorrect implementation is that you are pushing the predicted values for the intercept and slope in the same direction because your change is always proportional to sum((h-target)). That's not the way that gradient descent works.

Change your update rule for tita1 as follows:
temp1 = tita1 - alpha*((1/m)*sum((h-target).*xrange));
Also, another remark is you don't really need the temporary variable.
By setting
num_iters = 100000
alpha = 0.001
I can recover
octave:152> tita0
tita0 = 5.0824
octave:153> tita1
tita1 = 2.0085

Related

Real-time peak detection in noisy sinusoidal time-series

I have been attempting to detect peaks in sinusoidal time-series data in real time, however I've had no success thus far. I cannot seem to find a real-time algorithm that works to detect peaks in sinusoidal signals with a reasonable level of accuracy. I either get no peaks detected, or I get a zillion points along the sine wave being detected as peaks.
What is a good real-time algorithm for input signals that resemble a sine wave, and may contain some random noise?
As a simple test case, consider a stationary, sine wave that is always the same frequency and amplitude. (The exact frequency and amplitude don't matter; I have arbitrarily chosen a frequency of 60 Hz, an amplitude of +/− 1 unit, at a sampling rate of 8 KS/s.) The following MATLAB code will generate such a sinusoidal signal:
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
x = sin(2*pi*60*t);
Using the algorithm developed and published by Jean-Paul, I either get no peaks detected (left) or a zillion "peaks" detected (right):
I've tried just about every combination of values for these 3 parameters that I could think of, following the "rules of thumb" that Jean-Paul gives, but I have so far been unable to get my expected result.
I found an alternative algorithm, developed and published by Eli Billauer, that does give me the results that I want—e.g.:
Even though Eli Billauer's algorithm is much simpler and does tend to reliably produce the results that I want, it is not suitable for real-time applications.
As another example of a signal that I'd like to apply such an algorithm to, consider the test case given by Eli Billauer for his own algorithm:
t = 0:0.001:10;
x = 0.3*sin(t) + sin(1.3*t) + 0.9*sin(4.2*t) + 0.02*randn(1, 10001);
This is a more unusual (less uniform/regular) signal, with a varying frequency and amplitude, but still generally sinusoidal. The peaks are plainly obvious to the eye when plotted, but hard to identify with an algorithm.
What is a good real-time algorithm to correctly identify the peaks in a sinusoidal input signal? I am not really an expert when it comes to signal processing, so it would be helpful to get some rules of thumb that consider sinusoidal inputs. Or, perhaps I need to modify e.g. Jean-Paul's algorithm itself in order to work properly on sinusoidal signals. If that's the case, what modifications would be required, and how would I go about making these?

Case 1: sinusoid without noise
If your sinusoid does not contain any noise, you can use a very classic signal processing technique: taking the first derivative and detecting when it is equal to zero.
For example:
function signal = derivesignal( d )
% Identify signal
signal = zeros(size(d));
for i=2:length(d)
if d(i-1) > 0 && d(i) <= 0
signal(i) = +1; % peak detected
elseif d(i-1) < 0 && d(i) >= 0
signal(i) = -1; % trough detected
end
end
end
Using your example data:
% Generate data
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
y = sin(2*pi*60*t);
% Add some trends
y(1:1000) = y(1:1000) + 0.001*(1:1000)';
y(1001:2000) = y(1001:2000) - 0.002*(1:1000)';
% Approximate first derivative (delta y / delta x)
d = [0; diff(y)];
% Identify signal
signal = derivesignal(d);
% Plot result
figure(1); clf; set(gcf,'Position',[0 0 677 600])
subplot(4,1,1); hold on;
title('Data');
plot(t,y);
subplot(4,1,2); hold on;
title('First derivative');
area(d);
ylim([-0.05, 0.05]);
subplot(4,1,3); hold on;
title('Signal (-1 for trough, +1 for peak)');
plot(t,signal); ylim([-1.5 1.5]);
subplot(4,1,4); hold on;
title('Signals marked on data');
markers = abs(signal) > 0;
plot(t,y); scatter(t(markers),y(markers),30,'or','MarkerFaceColor','red');
This yields:
This method will work extremely well for any type of sinusoid, with the only requirement that the input signal contains no noise.
Case 2: sinusoid with noise
As soon as your input signal contains noise, the derivative method will fail. For example:
% Generate data
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
y = sin(2*pi*60*t);
% Add some trends
y(1:1000) = y(1:1000) + 0.001*(1:1000)';
y(1001:2000) = y(1001:2000) - 0.002*(1:1000)';
% Add some noise
y = y + 0.2.*randn(2000,1);
Will now generate this result because first differences amplify noise:
Now there are many ways to deal with noise, and the most standard way is to apply a moving average filter. One disadvantage of moving averages is that they are slow to adapt to new information, such that signals may be identified after they have occurred (moving averages have a lag).
Another very typical approach is to use Fourier Analysis to identify all the frequencies in your input data, disregard all low-amplitude and high-frequency sinusoids, and use the remaining sinusoid as a filter. The remaining sinusoid will be (largely) cleansed from the noise and you can then use first-differencing again to determine the peaks and troughs (or for a single sine wave you know the peaks and troughs happen at 1/4 and 3/4 pi of the phase). I suggest you pick up any signal processing theory book to learn more about this technique. Matlab also has some educational material about this.
If you want to use this algorithm in hardware, I would suggest you also take a look at WFLC (Weighted Fourier Linear Combiner) with e.g. 1 oscillator or PLL (Phase-Locked Loop) that can estimate the phase of a noisy wave without doing a full Fast Fourier Transform. You can find a Matlab algorithm for a phase-locked loop on Wikipedia.
I will suggest a slightly more sophisticated approach here that will identify the peaks and troughs in real-time: fitting a sine wave function to your data using moving least squares minimization with initial estimates from Fourier analysis.
Here is my function to do that:
function [result, peaks, troughs] = fitsine(y, t, eps)
% Fast fourier-transform
f = fft(y);
l = length(y);
p2 = abs(f/l);
p1 = p2(1:ceil(l/2+1));
p1(2:end-1) = 2*p1(2:end-1);
freq = (1/mean(diff(t)))*(0:ceil(l/2))/l;
% Find maximum amplitude and frequency
maxPeak = p1 == max(p1(2:end)); % disregard 0 frequency!
maxAmplitude = p1(maxPeak); % find maximum amplitude
maxFrequency = freq(maxPeak); % find maximum frequency
% Initialize guesses
p = [];
p(1) = mean(y); % vertical shift
p(2) = maxAmplitude; % amplitude estimate
p(3) = maxFrequency; % phase estimate
p(4) = 0; % phase shift (no guess)
p(5) = 0; % trend (no guess)
% Create model
f = #(p) p(1) + p(2)*sin( p(3)*2*pi*t+p(4) ) + p(5)*t;
ferror = #(p) sum((f(p) - y).^2);
% Nonlinear least squares
% If you have the Optimization toolbox, use [lsqcurvefit] instead!
options = optimset('MaxFunEvals',50000,'MaxIter',50000,'TolFun',1e-25);
[param,fval,exitflag,output] = fminsearch(ferror,p,options);
% Calculate result
result = f(param);
% Find peaks
peaks = abs(sin(param(3)*2*pi*t+param(4)) - 1) < eps;
% Find troughs
troughs = abs(sin(param(3)*2*pi*t+param(4)) + 1) < eps;
end
As you can see, I first perform a Fourier transform to find initial estimates of the amplitude and frequency of the data. I then fit a sinusoid to the data using the model a + b sin(ct + d) + et. The fitted values represent a sine wave of which I know that +1 and -1 are the peaks and troughs, respectively. I can therefore identify these values as the signals.
This works very well for sinusoids with (slowly changing) trends and general (white) noise:
% Generate data
dt = 1/8000;
t = (0:dt:(1-dt)/4)';
y = sin(2*pi*60*t);
% Add some trends
y(1:1000) = y(1:1000) + 0.001*(1:1000)';
y(1001:2000) = y(1001:2000) - 0.002*(1:1000)';
% Add some noise
y = y + 0.2.*randn(2000,1);
% Loop through data (moving window) and fit sine wave
window = 250; % How many data points to consider
interval = 10; % How often to estimate
result = nan(size(y));
signal = zeros(size(y));
for i = window+1:interval:length(y)
data = y(i-window:i); % Get data window
period = t(i-window:i); % Get time window
[output, peaks, troughs] = fitsine(data,period,0.01);
result(i-interval:i) = output(end-interval:end);
signal(i-interval:i) = peaks(end-interval:end) - troughs(end-interval:end);
end
% Plot result
figure(1); clf; set(gcf,'Position',[0 0 677 600])
subplot(4,1,1); hold on;
title('Data');
plot(t,y); xlim([0 max(t)]); ylim([-4 4]);
subplot(4,1,2); hold on;
title('Model fit');
plot(t,result,'-k'); xlim([0 max(t)]); ylim([-4 4]);
subplot(4,1,3); hold on;
title('Signal (-1 for trough, +1 for peak)');
plot(t,signal,'r','LineWidth',2); ylim([-1.5 1.5]);
subplot(4,1,4); hold on;
title('Signals marked on data');
markers = abs(signal) > 0;
plot(t,y,'-','Color',[0.1 0.1 0.1]);
scatter(t(markers),result(markers),30,'or','MarkerFaceColor','red');
xlim([0 max(t)]); ylim([-4 4]);
Main advantages of this approach are:
You have an actual model of your data, so you can predict signals in the future before they happen! (e.g. fix the model and calculate the result by inputting future time periods)
You don't need to estimate the model every period (see parameter interval in the code)
The disadvantage is that you need to select a lookback window, but you will have this problem with any method that you use for real-time detection.
Video demonstration
Data is the input data, Model fit is the fitted sine wave to the data (see code), Signal indicates the peaks and troughs and Signals marked on data gives an impression of how accurate the algorithm is. Note: watch the model fit adjust itself to the trend in the middle of the graph!
That should get you started. There are also a lot of excellent books on signal detection theory (just google that term), which will go much further into these types of techniques. Good luck!

Consider using findpeaks, it is fast, which may be important for realtime. You should filter high-frequency noise to improve accuracy. here I smooth the data with a moving window.
t = 0:0.001:10;
x = 0.3*sin(t) + sin(1.3*t) + 0.9*sin(4.2*t) + 0.02*randn(1, 10001);
[~,iPeak0] = findpeaks(movmean(x,100),'MinPeakProminence',0.5);
You can time the process (0.0015sec)
f0 = #() findpeaks(movmean(x,100),'MinPeakProminence',0.5)
disp(timeit(f0,2))
To compare, processing the slope is only a bit faster (0.00013sec), but findpeaks have many useful options, such as minimum interval between peaks etc.
iPeaks1 = derivePeaks(x);
f1 = #() derivePeaks(x)
disp(timeit(f1,1))
Where derivePeaks is:
function iPeak1 = derivePeaks(x)
xSmooth = movmean(x,100);
goingUp = find(diff(movmean(xSmooth,100)) > 0);
iPeak1 = unique(goingUp([1,find(diff(goingUp) > 100),end]));
iPeak1(iPeak1 == 1 | iPeak1 == length(iPeak1)) = [];
end

How to fit rotated hyperbola accurately?

My experimental data points looks like pieces of hyperbola. Below I provide a code (Matlab), which generates "dummy" data, which is very similar to original one:
function [x_out,y_out,alpha1,alpha2,ecK,offsetX,offsetY,branchDirection] = dummyGenerator(mu_alpha)
alpha_range=0.1;
numberPoint2Return=100; % number of points to return
ecK=10.^((rand(1)-0.5)*2*2); % eccentricity-related parameter
% slope of the first asimptote (radians)
alpha1 = ((rand(1)-0.5)*alpha_range+mu_alpha);
% slope of the first asimptote (radians)
alpha2 = -((rand(1)-0.5)*alpha_range+mu_alpha);
beta = pi-abs(alpha1-alpha2); % angle between asimptotes (radians)
branchDirection = datasample([0,1],1); % where branch directed
% up: branchDirection==0;
% down: branchDirection==1;
% generate branch
x = logspace(-3,2,numberPoint2Return*100)'; %over sampling
y = (tan(pi/2-beta)*x+ecK./x);
% rotate branch using branchDirection
theta = -(pi/2-alpha1)-pi*branchDirection;
% get rotation matrix
rotM = [ cos(theta), -sin(theta);
sin(theta), cos(theta) ];
% get rotated coordinates
XY1=[x,y]*rotM;
x1=XY1(:,1); y1=XY1(:,2);
% remove possible Inf
x1(~isfinite(y1))=[];
y1(~isfinite(y1))=[];
% add noise
y1=((rand(numel(y1),1)-0.5)+y1);
% downsampling
%x_out=linspace(min(x1),max(x1),numberPoint2Return)';
x_out=linspace(-10,10,numberPoint2Return)';
y_out=interp1(x1,y1,x_out,'nearest');
% randomize offset
offsetX=(rand(1)-0.5)*50;
offsetY=(rand(1)-0.5)*50;
x_out=x_out+offsetX;
y_out=y_out+offsetY;
end
Typical results are presented on figure:
The data has following important property: slopes of both asymptotes comes from the same distribution (just different signs), so for my fitting I have rather goo estimation for mu_alpha.
Now starts the problematic part. I try to fit these data points. The main idea of my approach is to find a rotation to obtain y=k*x+b/x shape and then just fit it.
I use the following code:
function Rsquare = fitFunction(x,y,alpha1,alpha2,ecK,offsetX,offsetY)
R=[];
for branchDirection=[0 1]
% translate back
xt=x-offsetX;
yt=y-offsetY;
% rotate back
theta = (pi/2-alpha1)+pi*branchDirection;
rotM = [ cos(theta), -sin(theta);
sin(theta), cos(theta) ];
XY1=[xt,yt]*rotM;
x1=XY1(:,1); y1=XY1(:,2);
% get fitted values
beta = pi-abs(alpha1-alpha2);
%xf = logspace(-3,2,10^3)';
y1=y1(x1>0);
x1=x1(x1>0);
%x1=x1-min(x1);
xf=sort(x1);
yf=(tan(pi/2-beta)*xf+ecK./xf);
R(end+1)=sum((xf-x1).^2+(yf-y1).^2);
end
Rsquare=min(R);
end
Unfortunately this code works not good, very often I have bad results, even when I use known(from simulation) initial parameters.
Could You help me to find a good solution for such fitting problem?
UPDATE:
I find a solution (see Answer), but
I still have a small problem - my estimation of aparameter is bad, sometimes I did no have good fits because of this reason.
Could You suggest some ideas how to estimate a from experimental point?

I found the main problem (it was my brain as usually)! I did not know about general equation of hyperbola. So equation for my hyperbolas are:
((x-x0)/a).^2-((y-y0)/b).^2=-1
So ,we may not take care about sign, then I may use the following code:
mu_alpha=pi/6;
[x,y,alpha1,alpha2,ecK,offsetX,offsetY,branchDirection] = dummyGenerator(mu_alpha);
% hyperb=#(alpha,a,x0,y0) tan(alpha)*a*sqrt(((x-x0)/a).^2+1)+y0;
hyperb=#(x,P) tan(P(1))*P(2)*sqrt(((x-P(3))./P(2)).^2+1)+P(4);
cost =#(P) fitFunction(x,y,P);
x0=mean(x);
y0=mean(y);
a=(max(x)-min(x))./20;
P0=[mu_alpha,a,x0,y0];
[P,fval] = fminsearch(cost,P0);
hold all
plot(x,y,'-o')
plot(x,hyperb(x,P))
function Rsquare = fitFunction(x,y,P)
%x=sort(x);
yf=tan(P(1))*P(2)*sqrt(((x-P(3))./P(2)).^2+1)+P(4);
Rsquare=sum((yf-y).^2);
end
P.S. LaTex tags did not work for me

Why surface fitting is not working in logarithmic domain?

Let's have a corrupted Image C, Bias profile, B and True Image A. So if we can define a model,
C = A * B;
We can get the original Image back as,
A = C / B;
in the log domain,
log A = log C - log B.
Now let's say, I have true image, A and I am introducing the bias B and I am getting the corrupted image C. Now I can correct this biased Image, C using the polynomial regression. I will fit the surface once I convert the corrupted image C in the log domain, and I can subtract the bias profile from it as shown above. After the subtraction, I don't need to apply exp(log C - log B) as obvious. Onlu normalization is needed to get [0 255] range.
Algorithm:
Original Image without any bias field is introduced with the polynomial profile, which results in an image having non-uniform illumination.
Biased image is converted in log domain and surface is approximated using polynomial fit
approximated surface is subtracted from the Biased image which results in original image back with no bias fields.(Ideally).
measure RMSE between approximated surface and introduced polynomial field in step 1. Measure RMSE between the Biased Image and the image we get back at the end after subtraction.
Code:
clear;clc;close all;
%read the image, on which profile is to be generated
I = ones(300);
I = padarray(I,[20,20],'both','symmetric'); % padding
%%
%creating a bias profile using polynomial modeling
[x,y] = meshgrid(1:size(I,1),1:size(I,2));
profile = -2.5.*x.^3 - 2.5.* y.^3 + 0.25 .*(x.* y.^2) - 0.3*( x.^2 .* y ) - 0.5.* x .* y - x + y - 2.5*( x.^2) - y.^2 + 0.5 .* x .*y + 1;
% come to scale [0 1]
profile = profile - min(profile(:));
profile = profile / max(profile(:));
figure,imshow(profile,[]); %introduced bias profile
%% corrupt the image
biasedImage = (I .* profile);
figure,imshow(biasedImage,[]); %biased Image
cImage = log(biasedImage+1);% conversion to log domain/ +1 is needed to avoid infinite values in case of 0 intensty values in corrupted image.
%% forming the input for prediction of surface
colorChannel = cImage;
rows = size(colorChannel, 1);
columns = size(colorChannel, 2);
[X,Y] = meshgrid(1:columns, 1:rows);
z = colorChannel;
x1d = reshape(X, numel(X), 1);
y1d = reshape(Y, numel(Y), 1);
z1d = double(reshape(z, numel(z), 1)); %dependent variables
x = [x1d y1d]; % two regressors
%% surface fitting
m = 3; %define the order of polynomial
p = polyfitn(x,z1d,m); %prediction step
zg = polyvaln(p, x);
modeledColorChannel = reshape(zg, [rows columns]); % predicted surface
%modeledColorChannel = exp(modeledColorChannel)-1; if we use this step then the step below will be division instead of subtraction
%f = biasedImage./ modeledColorChannel; Same as the step below but as we are using exponential, it will be devision.
%% correction
f = cImage- modeledColorChannel; %getting the original image back.
%grayImage = f(21:end-20,21:end-20);
%modeledColorChannel = modeledColorChannel(21:end-20,21:end-20); %to remove the padding
figure,imshow(f,[]);
figure,imshow(modeledColorChannel,[]);
%% measure the RMSE for image
y = (I - f);
RMSE = sqrt(mean(y(:).^2));
disp(RMSE);
% RMSE for profile
z = (modeledColorChannel - profile);
RMSE = sqrt(mean(z(:).^2));
disp(RMSE);
Results:
In case of: f = cImage- modeledColorChannel
1.0000
0.2127
Corrected Image: enter image description here
In case of division: f = cImage ./ modeledColorChannel (although it is not correct as per theory.)
0.0190
0.2127
Corrected Image:enter image description here
Now, the question is: I am getting lower RMSE value at the end if I do division in the log domain instead of subtraction as I am doing here(See %% correction section). How does it possible to have higher RMSE for subtraction where it is theoriticaly correct? As per my understanding if I keep all of my calculation in log domain image division will become image subtraction.It is obvious if you run the code and see the image f at the end of the correction for division and subtraction in log domain.
Note: In both the cases, RMSE between the introduced and perceived profile is same as I am doing my estimation in log domain in both the cases.Either image division or in image subtraction.
See this for polyfitn tool box.
www.mathworks.com/matlabcentral/fileexchange/34765-polyfitn

Let me add to the answer to my question as I found my mistake, in case anybody faces same issue in future.
Mistake 1: After the subtraction, I don't need to apply exp(log C - log B) as obvious. Only normalisation is needed to get [0 255] range.
My intuition was, I don't need to apply exp() to get the original values back. But in fact I have to apply exp(). Log-log on LHS and RHS never cancels each other.
if log(A) = log(B), to get the values of A back I need A = exp(log(B)).
Mistake 2: In log domain, I subtract two images so I do not have to face infinity problem in the log domain, we usually face in case of division.
So simply while converting image in log domain, I could just do,
cImage = log(biasedImage);
instead of,
cImage = log(biasedImage+1);
Here, adding +1 is creating the unwanted blur in the images because in estimation while predicting surface it will push the surface to high values in the dark areas.

Efficiently calculate optical flow parameters - MATLAB

I am implementing the partial derivative equations from the Horn & Schunck paper on optical flow. However, even for relative small images (320x568), it takes a frustratingly long time (~30-40 seconds) to complete. I assume this is due to the 320 x 568 = 181760 loop iterations, but I can't figure out a more efficient way to do this (short of a MEX file).
Is there some way to turn this into a more efficient MATLAB operation (a convolution perhaps)? I can figure out how to do this as a convolution for It but not Ix and Iy. I've also considered matrix shifting, but that only works for It as well, as far as I can figure out.
Has anyone else run into this issue and found a solution?
My code is below:
function [Ix, Iy, It] = getFlowParams(img1, img2)
% Make sure image dimensions match up
assert(size(img1, 1) == size(img2, 1) && size(img1, 2) == size(img2, 2), ...
'Images must be the same size');
assert(size(img1, 3) == 1, 'Images must be grayscale');
% Dimensions of original image
[rows, cols] = size(img1);
Ix = zeros(numel(img1), 1);
Iy = zeros(numel(img1), 1);
It = zeros(numel(img1), 1);
% Pad images to handle edge cases
img1 = padarray(img1, [1,1], 'post');
img2 = padarray(img2, [1,1], 'post');
% Concatenate i-th image with i-th + 1 image
imgs = cat(3, img1, img2);
% Calculate energy for each pixel
for i = 1 : rows
for j = 1 : cols
cube = imgs(i:i+1, j:j+1, :);
Ix(sub2ind([rows, cols], i, j)) = mean(mean(cube(:, 2, :) - cube(:, 1, :)));
Iy(sub2ind([rows, cols], i, j)) = mean(mean(cube(2, :, :) - cube(1, :, :)));
It(sub2ind([rows, cols], i, j)) = mean(mean(cube(:, :, 2) - cube(:, :, 1)));
end
end

2D convolution is the way to go here as also predicted in the question to replace those heavy mean/average calculations. Also, those iterative differentiations could be replaced by MATLAB's diff. Thus, incorporating all that, a vectorized implementation would be -
%// Pad images to handle edge cases
img1 = padarray(img1, [1,1], 'post');
img2 = padarray(img2, [1,1], 'post');
%// Store size parameters for later usage
[m,n] = size(img1);
%// Differentiation along dim-2 on input imgs for Ix calculations
df1 = diff(img1,[],2)
df2 = diff(img2,[],2)
%// 2D Convolution to simulate average calculations & reshape to col vector
Ixvals = (conv2(df1,ones(2,1),'same') + conv2(df2,ones(2,1),'same'))./4;
Ixout = reshape(Ixvals(1:m-1,:),[],1);
%// Differentiation along dim-1 on input imgs for Iy calculations
df1 = diff(img1,[],1)
df2 = diff(img2,[],1)
%// 2D Convolution to simulate average calculations & reshape to col vector
Iyvals = (conv2(df1,ones(1,2),'same') + conv2(df2,ones(1,2),'same'))./4
Iyout = reshape(Iyvals(:,1:n-1),[],1);
%// It just needs elementwise diffentiation between input imgs.
%// 2D convolution to simulate mean calculations & reshape to col vector
Itvals = conv2(img2-img1,ones(2,2),'same')./4
Itout = reshape(Itvals(1:m-1,1:n-1),[],1)
Benefits with such a vectorized implementation would be :
Memory efficiency : No more concatenation along the third dimension that would incur memory overhead. Again, performance wise it would be a benefit as we won't need to index into such heavy arrays.
The iterative differentiations inside the loopy codes are replaced by differentiation with diff, so this should be another improvement.
Those expensive average calculations are replaced by very fast convolution calculations and this should be the major improvement section.

A faster method, with improved results (in the cases I have observed) is the following:
function [Ix, Iy, It] = getFlowParams(imNew,imPrev)
gg = [0.2163, 0.5674, 0.2163];
f = imNew + imPrev;
Ix = f(:,[2:end end]) - f(:,[1 1:(end-1)]);
Ix = conv2(Ix,gg','same');
Iy = f([2:end end],:) - f([1 1:(end-1)],:);
Iy = conv2(Iy,gg ,'same');
It = 2*conv2(gg,gg,imNew - imPrev,'same');
This handles the boundary cases elegantly.
I made this as part of my optical flow toolbox, where you can easily view H&S, Lucas Kanade and more in realtime. In the toolbox, the function is called grad3D.m. You may also want to check out grad3Drec.m, in the same toolbox, which adds simple temporal blurring.

How to speed up vector equations in Matlab?

I'm using the following code to do logistic regression with stochastic gradient descent in Matlab. The total number of training + testing samples are about 600K. The code runs for hours. How can I speed it up?
%% load dataset
clc;
clear;
load('covtype.mat');
Data = [X y];
%% Split into testing and training data in a 1:9 split
nRows=size(Data,1);
randRows=randperm(nRows); % generate random ordering of row indices
Test=Data(randRows(1:58101),:); % index using random order
Train=Data(randRows(58102:end),:);
Testx=Test(:,1:54);
Testy=Test(:,55:end);
Trainx=Train(:,1:54);
Trainy=Train(:,55:end);
%% Perform stochastic gradient descent on training data
lambda=0.01; % regularisation constant
alpha=0.01; % step length constant
theta_old = zeros(54,1);
theta_new = theta_old;
z=1;
for count = 1:size(Train,1)
theta_old = theta_new;
theta_new = theta_old + (alpha*Trainy(count)* (1.0 ./ (1.0 + exp(Trainy(count)*(Trainx(count,:)*theta_old)))).*Trainx(count,:))' - alpha*lambda*2*theta_old; %//'
n = norm(theta_new);
llr = lambda*n*n;
count_dummy(z)=count; % dummy variable to store iteration number for plotting later
% calculate log likelihood error for test data with current value of theta_new
for i = 1:size(Test,1)
llr = llr - 1.*log(1.0 + exp(-(Testy(i)*(Testx(i,:)*theta_new))));
end
llr_dummy(z)=llr; % dummy variable to store llr for plotting later
z=z+1;
end
thetaopt = theta_new; % this is optimal theta
%% Plot results on testing data
plot(count_dummy, llr_dummy);
I have to calculate the log likelihood error of test data at every iteration to plot it. How can I speed up this code?

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio