Related
i have written a fairly large class for the calculation of measurement uncertainties, but it is painfully slow. Profiling the code shows that the slowest operation, by far, is to insert the computation results into a large sparse matrix. About 97% of all time is spent on that operation. The matrix keeps the uncertainties of all measurement data, and I cannot change the data structures without breaking a lot of other code. So my only option is to optimize the data insertion step. This is done about 5700 times in my benchmark, and every time the amout of data increases.
First solution, extremely slow:
% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), numel(obj.w));
% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), numel(obj.w));
obj.w = [obj.w; z(:)]; % insert new measurement results into the column vector obj.w
obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; % insert new uncertainties of the measurement results
obj.dw_dw = [obj.dw_dw, transpose(dz_dw); dz_dw, dz_dz]; % insert "dependencies" of new measurements on old results
The line obj.Uww = [obj.Uww, transpose(Uzw); Uzw, Uzz]; is the slowest, by far. Perhaps it is slow because Matlab needs to allocate a new, larger buffer for obj.Uww and copy everything over. Thus I changed the code to the following:
% Preallocation in the class constructor
obj.w = spalloc(nnz_w, 1, nnz_w);
obj.Uww = spalloc(nnz_w, nnz_w, nnz_Uww);
obj.dw_dw = spalloc(nnz_w, nnz_w, nnz_dw_dw);
obj.num_w = 0; % to manually keep track of the "real" size of obj.w, obj.Uww and obj.dw_dw
The class constructor is called with the sizes the three properties w, Uww and dw_dw will have at the end of the computation (nzz_w is approximately 0.1 million, nzz_Uww is at 9 million and nzz_dw_dw is about 1.6 million). Thus, no new allocation of memory should be needed. This is the inserting step now:
% this automatically sums up duplicate yInd entries
[zInd_grid, yInd_grid] = ndgrid(1:numel(z), yInd(:));
Uzw = sparse(zInd_grid(:), yInd_grid(:), Uzy(:), numel(z), obj.num_w);
% this automatically sums up duplicate yInd entries
dz_dw = sparse(zInd_grid(:), yInd_grid(:), dz_dy(:), numel(z), obj.num_w);
wInd = 1:obj.num_w;
obj.w(zInd, 1) = z(:); % insert new measurement results
obj.num_w = zInd(end); % new "real" size of w, Uww and dw_dw
obj.Uww(zInd, wInd) = Uzw; % about 51% of all computation time
obj.Uww(wInd, zInd) = transpose(Uzw); % about 15% of all computation time
obj.Uww(zInd, zInd) = Uzz; % about 14.4% of all computation time
obj.dw_dw(zInd, wInd) = dz_dw; % about 13% of all computation time
obj.dw_dw(wInd, zInd) = transpose(dz_dw); % about 3.5% of all computation time
obj.dw_dw(zInd, zInd) = dz_dz; % less than 3.5% of all computation time
Still, these lines account for 97% of all computation time, and no speed improvement. Thus I tried version three:
obj.w = [obj.w; z(:)];
[zInd_zy, yInd_zy] = ndgrid(zInd, yInd(:));
[zzInd_i, zzInd_j] = ndgrid(zInd, zInd);
[Uww_i, Uww_j, Uww_v] = find(obj.Uww); % 14% of all computation time
Uww_new = sparse( ... % this statement takes 66% of all computation time
[Uww_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
[Uww_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
[Uww_v; Uzy(:); Uzy(:); Uzz(:)], ...
numel(obj.w), numel(obj.w));
[dw_dw_i, dw_dw_j, dw_dw_v] = find(obj.dw_dw);
dw_dw_new = sparse( ... % 14% of all computation time
[dw_dw_i; zInd_zy(:); yInd_zy(:); zzInd_i(:)], ...
[dw_dw_j; yInd_zy(:); zInd_zy(:); zzInd_j(:)], ...
[dw_dw_v; dz_dy(:); dz_dy(:); dz_dz(:)], ...
numel(obj.w), numel(obj.w));
obj.Uww = Uww_new;
obj.dw_dw = dw_dw_new;
which is even slower thant the two other versions. Why is inserting into an already preallocated array so slow? And how can I speed it up?
(All the matrices are symmetric, but I did not try to exploit that yet.)
I don't understand the details of your update pattern, but keep in mind that Matlab stores sparse matrices internally in compressed-sparse column format. So adding entries in sequence column-by-column is significantly faster than other orders. E.g., on my old version of Matlab (R2006a), this:
n=10000;
nz=400000;
v=floor(n*rand(nz,3))+1;
fprintf('Random\n');
A=sparse(n, n);
tic
for k=1:nz
A(v(k,1), v(k,2))=v(k,3);
end
toc
fprintf('Row-wise\n');
v=sortrows(v);
A=sparse(n, n);
tic
for k=1:nz
A(v(k,1), v(k,2))=v(k,3);
end
toc
fprintf('Column-wise\n');
v=sortrows(v, [2 1]);
A=sparse(n, n);
tic
for k=1:nz
A(v(k,1), v(k,2))=v(k,3);
end
toc
gives this:
>> sparsetest
Random
Elapsed time is 19.276089 seconds.
Row-wise
Elapsed time is 20.714324 seconds.
Column-wise
Elapsed time is 1.498150 seconds.
Likely best of all would be if you can somehow just collect the nonzeros in a form suitable for spconvert or sparse and then make the whole sparse matrix at the end, but I gather that you might not be able to do that.
#bg2b Pointed out how adding data column-wise is much faster.
It turns out adding rows to a sparse matrix or sparse vector is painfully slow (more precisely, to the lower triangular part). Thus, I now store only the upper triangular part of the sparse matrix, because that is fast. When I need data from the matrix, I recreate it from the upper triangular sparse matrix. See the end of the benchmarking script for this.
This is my benchmarking script. It nicely shows the exponential increase in computation time for adding data.
% Benchmark the extension of sparse matrices of the form
% Uww_new = [ Uww, Uwz; ...
% Uzw, Uzz];
% where Uzw = transpose(Uwz). Uww and Uzz are always square and symmetric.
close all;
clearvars;
rng(70101557, 'twister'); % the seed is the number of the stack overflow question
density = 0.25;
nZ = 10;
iterations = 5e2;
n = nZ * iterations;
nonzeros = n*n*density;
all_Uzz = cell(iterations, 1);
all_Uwz = cell(iterations, 1);
for k = 1:iterations
% Uzz must be symmetric!
Uzz_nonsymmetric = sprand(nZ, nZ, density);
all_Uzz{k} = (Uzz_nonsymmetric + transpose(Uzz_nonsymmetric))./2;
all_Uwz{k} = sprand((k-1)*nZ, nZ, density);
end
f = figure();
ax = axes(f);
hold(ax, 'on');
grid(ax, 'on');
xlabel(ax, 'Iteration');
ylabel(ax, 'Elapsed time in seconds.');
h = gobjects(1, 0);
name = 'Optimised.';
fprintf('%s\n', name);
Uww_optimised = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
Uzz = all_Uzz{k};
Uwz = all_Uwz{k};
zInd = size(Uww_optimised, 1) + (1:size(Uzz, 1));
wInd = 1:size(Uww_optimised, 1);
Uzz_triu = triu(Uzz);
Uww_optimised(wInd, zInd) = Uwz; % add columns
Uww_optimised(zInd, zInd) = Uzz_triu; % add rows and columns
elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
name = 'Only Uwz and Uzz.';
fprintf('%s\n', name);
Uww_wz_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
Uzz = all_Uzz{k};
Uwz = all_Uwz{k};
zInd = size(Uww_wz_zz, 1) + (1:size(Uzz, 1));
wInd = 1:size(Uww_wz_zz, 1);
Uzw = transpose(Uwz);
Uww_wz_zz(wInd, zInd) = Uwz; % add columns
% Uww_wz_zz(zInd, wInd) = Uzw; % add rows
Uww_wz_zz(zInd, zInd) = Uzz; % add rows and columns
elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
name = 'Only Uzw and Uzz.';
fprintf('%s\n', name);
Uww_zw_zz = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
Uzz = all_Uzz{k};
Uwz = all_Uwz{k};
zInd = size(Uww_zw_zz, 1) + (1:size(Uzz, 1));
wInd = 1:size(Uww_zw_zz, 1);
Uzw = transpose(Uwz);
% Uww_zw_zz(wInd, zInd) = Uwz;
Uww_zw_zz(zInd, wInd) = Uzw;
Uww_zw_zz(zInd, zInd) = Uzz;
elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
name = 'Uzw, Uwz and Uzz.';
fprintf('%s\n', name);
Uww = spalloc(0, 0, nonzeros);
t1 = tic();
elapsedTimes = NaN(iterations, 1);
for k = 1:iterations
Uzz = all_Uzz{k};
Uwz = all_Uwz{k};
zInd = size(Uww, 1) + (1:size(Uzz, 1));
wInd = 1:size(Uww, 1);
Uzw = transpose(Uwz);
Uww(wInd, zInd) = Uwz;
Uww(zInd, wInd) = Uzw;
Uww(zInd, zInd) = Uzz;
elapsedTimes(k, 1) = toc(t1);
end
toc(t1)
h = [h, plot(ax, 1:iterations, seconds(elapsedTimes), 'DisplayName', name)];
leg = legend(ax, h, 'Location', 'northwest');
assert(issymmetric(Uww));
assert(istriu(Uww_optimised));
assert(isequal(Uww, Uww_optimised + transpose(triu(Uww_optimised, 1))));
%% Get Uyy from Uww_optimised. Uyy is a symmetric subset of Uww
yInd = randi(size(Uww_optimised, 1), 1, nZ); % indices to extract
[yIndRowInds, yIndColInds] = ndgrid(yInd, yInd);
indsToFlip = yIndRowInds > yIndColInds;
temp = yIndColInds(indsToFlip);
yIndColInds(indsToFlip) = yIndRowInds(indsToFlip);
yIndRowInds(indsToFlip) = temp;
linInd = sub2ind(size(Uww_optimised), yIndRowInds, yIndColInds);
assert(issymmetric(linInd));
Uyy = Uww_optimised(linInd);
assert(issymmetric(Uyy));
While using Matlab for image processing (exactly improving img by Fuzzy Logic) I found a really strange thing. My fuzzy function is correct, I tested it on random values and they are basically simple linear functions.
function f = Udark(z)
if z < 50
f = 1;
elseif z > 125
f = 0;
elseif (z >= 50) && (z <= 125)
f = -z/75 + 125/75;
end
end
where z is a value of a pixel (in grayscale). Now there is a really strange thing going on.
f = -z/75 + 125/75;, where a is an image. However, it is giving really different results if used as an input. I.e. if I use a variable p = 99, the output of the function is 0.3467 as it should be, when if I use A(i,j) it is giving me result f=2. Since it is clearly impossible, I do not know where is the problem. I thought that maybe there is a case with the type of the variable but if I change it to uint8 it stays the same... If you know what's going on, please, let me know :)
1.Changed line:
f = (125/75) - (z/75);
After editing the third condition the resultant/transformed image has no pixel values of 2. Not sure if you intend to work with decimals. If decimals are necessary using the im2double() function to convert the image and scaling it up by a factor of 255 might suffice your needs. See heading 3 for rounding details.
2.Reading in Image and Testing:
%Reading in the image and applying the function%
Image = imread("RGB_Image.png");
Greyscale_Image = rgb2gray(Image);
[Image_Height,Image_Width] = size(Greyscale_Image);
Transformed_Image = zeros(Image_Height,Image_Width);
for Row = 1: +1: Image_Height
for Column = 1: +1: Image_Width
Pixel_Value = Greyscale_Image(Row,Column);
[Transformed_Pixel_Value] = Udark(Pixel_Value);
Transformed_Image(Row,Column) = Transformed_Pixel_Value;
end
end
subplot(1,2,1); imshow(Greyscale_Image);
subplot(1,2,2); imshow(Transformed_Image);
%Checking that no transformed pixels falls in this impossible range%
Check = (Transformed_Image > (125/75)) & (Transformed_Image ~= 1);
Check_Flag = any(Check,'all');
%Function to transform pixel values%
function f = Udark(z)
if z < 50
f = 1;
elseif z > 125
f = 0;
elseif (z >= 50) && (z <= 125)
f = (125/75) - (z/75);
end
end
3.Evaluating the Specifics of the Third Condition
Working with integers (uint8) will force the values to be rounded to the nearest integer. Any number that falls between the range (50,125] will evaluate to 1 or 0.
f = -z/75 + 125/75;
If z = 50.1,
-50.1/75 + 125/75 = 74.9/75 ≈ 0.9987 → rounds to 1
Using MATLAB version: R2019b
Does this code have mutation, selection, and crossover, just like the original genetic algorithm.
Since this, a hybrid algorithm (i.e PSO with GA) does it use all steps of original GA or skips some
of them.Please do tell me.
I am just new to this and still trying to understand. Thank you.
%%% Hybrid GA and PSO code
function [gbest, gBestScore, all_scores] = QAP_PSO_GA(CreatePopFcn, FitnessFcn, UpdatePosition, ...
nCity, nPlant, nPopSize, nIters)
% Set algorithm parameters
constant = 0.95;
c1 = 1.5; %1.4944; %2;
c2 = 1.5; %1.4944; %2;
w = 0.792 * constant;
% Allocate memory and initialize
gBestScore = inf;
all_scores = inf * ones(nPopSize, nIters);
x = CreatePopFcn(nPopSize, nCity);
v = zeros(nPopSize, nCity);
pbest = x;
% update lbest
cost_p = inf * ones(1, nPopSize); %feval(FUN, pbest');
for i=1:nPopSize
cost_p(i) = FitnessFcn(pbest(i, 1:nPlant));
end
lbest = update_lbest(cost_p, pbest, nPopSize);
for iter = 1 : nIters
if mod(iter,1000) == 0
parents = randperm(nPopSize);
for i = 1:nPopSize
x(i,:) = (pbest(i,:) + pbest(parents(i),:))/2;
% v(i,:) = pbest(parents(i),:) - x(i,:);
% v(i,:) = (v(i,:) + v(parents(i),:))/2;
end
else
% Update velocity
v = w*v + c1*rand(nPopSize,nCity).*(pbest-x) + c2*rand(nPopSize,nCity).*(lbest-x);
% Update position
x = x + v;
x = UpdatePosition(x);
end
% Update pbest
cost_x = inf * ones(1, nPopSize);
for i=1:nPopSize
cost_x(i) = FitnessFcn(x(i, 1:nPlant));
end
s = cost_x<cost_p;
cost_p = (1-s).*cost_p + s.*cost_x;
s = repmat(s',1,nCity);
pbest = (1-s).*pbest + s.*x;
% update lbest
lbest = update_lbest(cost_p, pbest, nPopSize);
% update global best
all_scores(:, iter) = cost_x;
[cost,index] = min(cost_p);
if (cost < gBestScore)
gbest = pbest(index, :);
gBestScore = cost;
end
% draw current fitness
figure(1);
plot(iter,min(cost_x),'cp','MarkerEdgeColor','k','MarkerFaceColor','g','MarkerSize',8)
hold on
str=strcat('Best fitness: ', num2str(min(cost_x)));
disp(str);
end
end
% Function to update lbest
function lbest = update_lbest(cost_p, x, nPopSize)
sm(1, 1)= cost_p(1, nPopSize);
sm(1, 2:3)= cost_p(1, 1:2);
[cost, index] = min(sm);
if index==1
lbest(1, :) = x(nPopSize, :);
else
lbest(1, :) = x(index-1, :);
end
for i = 2:nPopSize-1
sm(1, 1:3)= cost_p(1, i-1:i+1);
[cost, index] = min(sm);
lbest(i, :) = x(i+index-2, :);
end
sm(1, 1:2)= cost_p(1, nPopSize-1:nPopSize);
sm(1, 3)= cost_p(1, 1);
[cost, index] = min(sm);
if index==3
lbest(nPopSize, :) = x(1, :);
else
lbest(nPopSize, :) = x(nPopSize-2+index, :);
end
end
If you are new to Optimization, I recommend you first to study each algorithm separately, then you may study how GA and PSO maybe combined, Although you must have basic mathematical skills in order to understand the operators of the two algorithms and in order to test the efficiency of these algorithm (this is what really matter).
This code chunk is responsible for parent selection and crossover:
parents = randperm(nPopSize);
for i = 1:nPopSize
x(i,:) = (pbest(i,:) + pbest(parents(i),:))/2;
% v(i,:) = pbest(parents(i),:) - x(i,:);
% v(i,:) = (v(i,:) + v(parents(i),:))/2;
end
Is not really obvious how selection randperm is done (I have no experience about Matlab).
And this is the code that is responsible for updating the velocity and position of each particle:
% Update velocity
v = w*v + c1*rand(nPopSize,nCity).*(pbest-x) + c2*rand(nPopSize,nCity).*(lbest-x);
% Update position
x = x + v;
x = UpdatePosition(x);
This version of velocity updating strategy is utilizing what is called Interia-Weight W, which basically mean we are preserving the velocity history of each particle (not completely recomputing it).
It worth mentioning that velocity updating is done more often than crossover (each 1000 iteration).
Could someone please run this for me and tell me how long it takes for you? It took my laptop 60s. I can't tell if it's my laptop that's crappy or my code. Probably both.
I just started learning MatLab, so I'm not yet familiar with which functions are better than others for specific tasks. If you have any suggestions on how I could improve this code, it would be greatly appreciated.
function gbp
clear; clc;
zi = 0; % initial position
zf = 100; % final position
Ei = 1; % initial electric field
c = 3*10^8; % speed of light
epsilon = 8.86*10^-12; % permittivity of free space
lambda = 1064*10^-9; % wavelength
k = 2*pi/lambda; % wave number
wi = 1.78*10^-3; % initial waist width (minimum spot size)
zr = (pi*wi^2)/lambda; % Rayleigh range
Ri = zi + zr^2/zi; % initial radius of curvature
qi = 1/(1/Ri-1i*lambda/(pi*wi^2)); % initial complex beam parameter
Psii = atan(real(qi)/imag(qi)); % Gouy phase
mat = [1 zf; 0 1]; % transformation matrix
A = mat(1,1); B = mat(1,2); C = mat(2,1); D = mat(2,2);
qf = (A*qi + B)/(C*qi + D); % final complex beam parameter
wf = sqrt(-lambda/pi*(1/imag(1/qf))); % final spot size
Rf = 1/real(1/qf); % final radius of curvature
Psif = atan(real(qf)/imag(qf)); % final Gouy phase
% Hermite - Gaussian modes function
u = #(z, x, n, w, R, Psi) (2/pi)^(1/4)*sqrt(exp(1i*(2*n+1)*Psi)/(2^n*factorial(n)*w))*...
hermiteH(n,sqrt(2)*x/w).*exp(-x.^2*(1/w^2+1i*k/(2*R))-1i*k*z);
% Complex amplitude coefficients function
a = #(n) exp(1i*k*zi)*integral(#(x) Ei.*conj(u(zi, x, n, wi, Ri, Psii)),-2*wi,2*wi);
%----------------------------------------------------------------------------
xlisti = -0.1:1/10000:0.1; % initial x-axis range
xlistf = -0.1:1/10000:0.1; % final x-axis range
nlist = 0:2:20; % modes range
function Eiplot
Efieldi = zeros(size(xlisti));
for nr = nlist
Efieldi = Efieldi + a(nr).*u(zi, xlisti, nr, wi, Ri, Psii)*exp(-1i*k*zi);
end
Ii = 1/2*c*epsilon*arrayfun(#(x)x.*conj(x),Efieldi);
end
function Efplot
Efieldf = zeros(size(xlistf));
for nr = nlist
Efieldf = Efieldf + a(nr).*u(zf, xlistf, nr, wf, Rf, Psif)*exp(-1i*k*zf);
end
If = 1/2*c*epsilon*arrayfun(#(x)x.*conj(x),Efieldf);
end
Eiplot
Efplot
plot(xlisti,real(Ii),xlistf,real(If))
xlabel('x(m)') % x-axis label
ylabel('I(W/m^2)') % y-axis label
end
The cost is coming from the calls to hermiteH -- for every call, this creates a new function using symbolic variables, then evaluates the function at your input. The key to speeding this up is to pre-compute the hermite polynomial functions then evaluate those rather than create them from scratch each time (speedup from ~26 seconds to around 0.75 secs on my computer).
With the changes:
function gbp
x = sym('x');
zi = 0; % initial position
zf = 100; % final position
Ei = 1; % initial electric field
c = 3*10^8; % speed of light
epsilon = 8.86*10^-12; % permittivity of free space
lambda = 1064*10^-9; % wavelength
k = 2*pi/lambda; % wave number
wi = 1.78*10^-3; % initial waist width (minimum spot size)
zr = (pi*wi^2)/lambda; % Rayleigh range
Ri = zi + zr^2/zi; % initial radius of curvature
qi = 1/(1/Ri-1i*lambda/(pi*wi^2)); % initial complex beam parameter
Psii = atan(real(qi)/imag(qi)); % Gouy phase
mat = [1 zf; 0 1]; % transformation matrix
A = mat(1,1); B = mat(1,2); C = mat(2,1); D = mat(2,2);
qf = (A*qi + B)/(C*qi + D); % final complex beam parameter
wf = sqrt(-lambda/pi*(1/imag(1/qf))); % final spot size
Rf = 1/real(1/qf); % final radius of curvature
Psif = atan(real(qf)/imag(qf)); % final Gouy phase
% Hermite - Gaussian modes function
nlist = 0:2:20; % modes range
% precompute hermite polynomials for nlist
hermites = {};
for n = nlist
if n == 0
hermites{n + 1} = #(x)1.0;
else
hermites{n + 1} = matlabFunction(hermiteH(n, x));
end
end
u = #(z, x, n, w, R, Psi) (2/pi)^(1/4)*sqrt(exp(1i*(2*n+1)*Psi)/(2^n*factorial(n)*w))*...
hermites{n + 1}(sqrt(2)*x/w).*exp(-x.^2*(1/w^2+1i*k/(2*R))-1i*k*z);
% Complex amplitude coefficients function
a = #(n) exp(1i*k*zi)*integral(#(x) Ei.*conj(u(zi, x, n, wi, Ri, Psii)),-2*wi,2*wi);
%----------------------------------------------------------------------------
xlisti = -0.1:1/10000:0.1; % initial x-axis range
xlistf = -0.1:1/10000:0.1; % final x-axis range
function Eiplot
Efieldi = zeros(size(xlisti));
for nr = nlist
Efieldi = Efieldi + a(nr).*u(zi, xlisti, nr, wi, Ri, Psii)*exp(-1i*k*zi);
end
Ii = 1/2*c*epsilon*arrayfun(#(x)x.*conj(x),Efieldi);
end
function Efplot
Efieldf = zeros(size(xlistf));
for nr = nlist
Efieldf = Efieldf + a(nr).*u(zf, xlistf, nr, wf, Rf, Psif)*exp(-1i*k*zf);
end
If = 1/2*c*epsilon*arrayfun(#(x)x.*conj(x),Efieldf);
end
Eiplot
Efplot
plot(xlisti,real(Ii),xlistf,real(If))
xlabel('x(m)') % x-axis label
ylabel('I(W/m^2)') % y-axis label
end
For a fixed and given tform, the imwarp command in the Image Processing Toolbox
B = imwarp(A,tform)
is linear with respect to A, meaning there exists some sparse matrix W, depending on tform but independent of A, such that the above can be equivalently implemented
B(:)=W*A(:)
for all A of fixed known dimensions [n,n]. My question is whether there are fast/efficient options for computing W. The matrix form is necessary when I need the transpose operation W.'*B(:), or if I need to do W\B(:) or similar linear algebraic things which I can't do directly through imwarp alone.
I know that it is possible to compute W column-by-column by doing
E=zeros(n);
W=spalloc(n^2,n^2,4*n^2);
for i=1:n^2
E(i)=1;
tmp=imwarp(E,tform);
E(i)=0;
W(:,i)=tmp(:);
end
but this is brute force and slow.
The routine FUNC2MAT is somewhat more optimal in that it uses the loop to compute/gather the sparse entry data I,J,S of each column W(:,i). Then, after the loop, it uses this to construct the overall sparse matrix. It also offers the option of using a PARFOR loop. However, this is still slower than I would like.
Can anyone suggest more speed-optimal alternatives?
EDIT:
For those uncomfortable with my claim that imwarp(A,tform) is linear w.r.t. A, I include the demo script below, which tests that the superposition property is satisfied for random input images and tform data. It can be run repeatedly to see that the nonlinearityError is always small, and easily attributable to floating point noise.
tform=affine2d(rand(3,2));
%tform=projective2d(rand(3));
fun=#(A) imwarp(A,tform,'cubic');
I1=rand(100); I2=rand(100);
c1=rand; c2=rand;
LHS=fun(c1*I1+c2*I2); %left hand side
RHS=c1*fun(I1)+c2*fun(I2); %right hand side
linearityError = norm(LHS(:)-RHS(:),'inf')
That's actually pretty simple:
W = sparse(B(:)/A(:));
Note that W is not unique, but this operation probably produces the most sparse result. Another way to calculate it would be
W = sparse( B(:) * pinv(A(:)) );
but that results in a much less sparse (yet still valid) result.
I constructed the warping matrix using the optical flow fields [u,v] and it is working well for my application
% this function computes the warping matrix
% M x N is the size of the image
function [ Fw ] = generateFwi( u,v,M,N )
Fw = zeros(M*N, M*N);
k =1;
for i=1:M
for j= 1:N
newcoord(1) = i+u(i,j);
newcoord(2) = j+v(i,j);
newi = newcoord(1);
newj = newcoord(2);
if newi >0 && newj >0
newi1x = floor(newi);
newi1y = floor(newj);
newi2x = floor(newi);
newi2y = ceil(newj);
newi3x = ceil(newi); % four nearest points to the given point
newi3y = floor(newj);
newi4x = ceil(newi);
newi4y = ceil(newj);
x1 = [newi,newj;newi1x,newi1y];
x2 = [newi,newj;newi2x,newi2y];
x3 = [newi,newj;newi3x,newi3y];
x4 = [newi,newj;newi4x,newi4y];
w1 = pdist(x1,'euclidean');
w2 = pdist(x2,'euclidean');
w3 = pdist(x3,'euclidean');
w4 = pdist(x4,'euclidean');
if ceil(newi) == floor(newi) && ceil(newj)==floor(newj) % both the new coordinates are integers
Fw(k,(newi1x-1)*N+newi1y) = 1;
else if ceil(newi) == floor(newi) % one of the new coordinates is an integer
w = w1+w2;
w1new = w1/w;
w2new = w2/w;
W = w1new*w2new;
y1coord = (newi1x-1)*N+newi1y;
y2coord = (newi2x-1)*N+newi2y;
if y1coord <= M*N && y2coord <=M*N
Fw(k,y1coord) = W/w2new;
Fw(k,y2coord) = W/w1new;
end
else if ceil(newj) == floor(newj) % one of the new coordinates is an integer
w = w1+w3;
w1 = w1/w;
w3 = w3/w;
W = w1*w3;
y1coord = (newi1x-1)*N+newi1y;
y2coord = (newi3x-1)*N+newi3y;
if y1coord <= M*N && y2coord <=M*N
Fw(k,y1coord) = W/w3;
Fw(k,y2coord) = W/w1;
end
else % both the new coordinates are not integers
w = w1+w2+w3+w4;
w1 = w1/w;
w2 = w2/w;
w3 = w3/w;
w4 = w4/w;
W = w1*w2*w3 + w2*w3*w4 + w3*w4*w1 + w4*w1*w2;
y1coord = (newi1x-1)*N+newi1y;
y2coord = (newi2x-1)*N+newi2y;
y3coord = (newi3x-1)*N+newi3y;
y4coord = (newi4x-1)*N+newi4y;
if y1coord <= M*N && y2coord <= M*N && y3coord <= M*N && y4coord <= M*N
Fw(k,y1coord) = w2*w3*w4/W;
Fw(k,y2coord) = w3*w4*w1/W;
Fw(k,y3coord) = w4*w1*w2/W;
Fw(k,y4coord) = w1*w2*w3/W;
end
end
end
end
else
Fw(k,k) = 1;
end
k=k+1;
end
end
end