Related
I am working in MATLAB to process two 512x512 images, the domain image and the range image. What I am trying to accomplish is the following:
Divide both domain and range images into 8x8 pixel blocks
For each 8x8 block in the domain image, I have to apply a linear transformations to it and compare each of the 4096 transformed blocks with each of the 4096 range blocks.
Compute error in each case between the transformed block and the range image block and find the minimum error.
Finally I'll have for each 8x8 range block, the id of the 8x8 domain block for which the error was minimum (error between the range block and the transformed domain block)
To achieve this, I have written the following code:
RangeImagecolor = imread('input.png'); %input is 512x512
DomainImagecolor = imread('input.png'); %Range and Domain images are identical
RangeImagetemp = rgb2gray(RangeImagecolor);
DomainImagetemp = rgb2gray(DomainImagecolor);
RangeImage = im2double(RangeImagetemp);
DomainImage = im2double(DomainImagetemp);
%For the (k,l)th 8x8 range image block
for k = 1:64
for l = 1:64
minerror = 9999;
min_i = 0;
min_j = 0;
for i = 1:64
for j = 1:64
%here I compute for the (i,j)th domain block, the transformed domain block stored in D_trans
error = 0;
D_trans = zeros(8,8);
R = zeros(8,8); %Contains the pixel values of the (k,l)th range block
for m = 1:8
for n = 1:8
R(m,n) = RangeImage(8*k-8+m,8*l-8+n);
%ApplyTransformation can depend on (k,l) so I can't compute the transformation outside the k,l loop.
[m_dash,n_dash] = ApplyTransformation(8*i-8+m,8*j-8+n);
D_trans(m,n) = DomainImage(m_dash,n_dash);
error = error + (R(m,n)-D_trans(m,n))^2;
end
end
if(error < minerror)
minerror = error;
min_i = i;
min_j = j;
end
end
end
end
end
As an example ApplyTransformation, one can use the identity transformation:
function [x_dash,y_dash] = Iden(x,y)
x_dash = x;
y_dash = y;
end
Now the problem I am facing is the high computation time. The order of computation in the above code is 64^5, which is of the order 10^9. This computation should take at the worst minutes or an hour. It takes about 40 minutes to compute just 50 iterations. I don't know why the code is running so slow.
Thanks for reading my question.
You can use im2col* to convert the image to column format so each block forms a column of a [64 * 4096] matrix. Then apply transformation to each column and use bsxfun to vectorize computation of error.
DomainImage=rand(512);
RangeImage=rand(512);
DomainImage_col = im2col(DomainImage,[8 8],'distinct');
R = im2col(RangeImage,[8 8],'distinct');
[x y]=ndgrid(1:8);
function [x_dash, y_dash] = ApplyTransformation(x,y)
x_dash = x;
y_dash = y;
end
[x_dash, y_dash] = ApplyTransformation(x,y);
idx = sub2ind([8 8],x_dash, y_dash);
D_trans = DomainImage_col(idx,:); %transformation is reduced to matrix indexing
Error = 0;
for mn = 1:64
Error = Error + bsxfun(#minus,R(mn,:),D_trans(mn,:).').^2;
end
[minerror ,min_ij]= min(Error,[],2); % linear index of minimum of each block;
[min_i min_j]=ind2sub([64 64],min_ij); % convert linear index to subscript
Explanation:
Our goal is to reduce number of loops as much as possible. For it we should avoid matrix indexing and instead we should use vectorization. Nested loops should be converted to one loop. As the first step we can create a more optimized loop as here:
min_ij = zeros(4096,1);
for kl = 1:4096 %%% => 1:size(D_trans,2)
minerror = 9999;
min_ij(kl) = 0;
for ij = 1:4096 %%% => 1:size(R,2)
Error = 0;
for mn = 1:64
Error = Error + (R(mn,kl) - D_trans(mn,ij)).^2;
end
if(Error < minerror)
minerror = Error;
min_ij(kl) = ij;
end
end
end
We can re-arrange the loops and we can make the most inner loop as the outer loop and separate computation of the minimum from the computation of the error.
% Computation of the error
Error = zeros(4096,4096);
for mn = 1:64
for kl = 1:4096
for ij = 1:4096
Error(kl,ij) = Error(kl,ij) + (R(mn,kl) - D_trans(mn,ij)).^2;
end
end
end
% Computation of the min
min_ij = zeros(4096,1);
for kl = 1:4096
minerror = 9999;
min_ij(kl) = 0;
for ij = 1:4096
if(Error(kl,ij) < minerror)
minerror = Error(kl,ij);
min_ij(kl) = ij;
end
end
end
Now the code is arranged in a way that can best be vectorized:
Error = 0;
for mn = 1:64
Error = Error + bsxfun(#minus,R(mn,:),D_trans(mn,:).').^2;
end
[minerror ,min_ij] = min(Error, [], 2);
[min_i ,min_j] = ind2sub([64 64], min_ij);
*If you don't have the Image Processing Toolbox a more efficient implementation of im2col can be found here.
*The whole computation takes less than a minute.
First things first - your code doesn't do anything. But you likely do something with this minimum error stuff and only forgot to paste this here, or still need to code that bit. Never mind for now.
One big issue with your code is that you calculate transformation for 64x64 blocks of resulting image AND source image. 64^5 iterations of a complex operation are bound to be slow. Rather, you should calculate all transformations at once and save them.
allTransMats = cell(64);
for i = 1 : 64
for j = 1 : 64
allTransMats{i,j} = getTransformation(DomainImage, i, j)
end
end
function D_trans = getTransformation(DomainImage, i,j)
D_trans = zeros(8);
for m = 1 : 8
for n = 1 : 8
[m_dash,n_dash] = ApplyTransformation(8*i-8+m,8*j-8+n);
D_trans(m,n) = DomainImage(m_dash,n_dash);
end
end
end
This serves to get allTransMat and is OUTSIDE the k, l loop. Preferably as a simple function.
Now, you make your big k, l, i, j loop, where you compare all the elements as needed. Comparison could be also done block-wise instead of filling a small 8x8 matrix, yet doing it per element for some reason.
m = 1 : 8;
n = m;
for ...
R = RangeImage(...); % This will give 8x8 output as n and m are vectors.
D = allTransMats{i,j};
difference = sum(sum((R-D).^2));
if (difference < minDifference) ...
end
Even though this is a simple no transformations case, this speeds up code a lot.
Finally, are you sure you need to compare each block of transformed output with each block in the source? Typically you compare block1(a,b) with block2(a,b) - blocks (or pixels) on the same position.
EDIT: allTransMats requires k and l too. Ouch. There is NO WAY to make this fast for a single iteration, as you require 64^5 calls to ApplyTransformation (or a vectorization of that function, but even then it might not be fast - we would have to see the function to help here).
Therefore, I will re-iterate my advice to generate all transformations and then perform lookup: this upper part of the answer with allTransMats generation should be changed to have all 4 loops and generate allTransMats{i,j,k,l};. It WILL be slow, there is no way around that as I mentioned in the upper part of edit. But, it is a cost you pay once, as after saving the allTransMats, all further image analyses will be able to simply load it instead of generating it again.
But ... what do you even do? Transformation that depends on source and destination block indices plus pixel indices (= 6 values total) sounds like a mistake somewhere, or a prime candidate to optimize instead of all the rest.
I tried implement the levenberg-marquardt method for solving non-linear equations on Julia based on Numerical Optimization using the
Levenberg-Marquardt Algorithm presentation. This my code:
function get_J(ArrOfFunc,X,delta)
N = length(ArrOfFunc)
J = zeros(Float64,N,N)
for i = 1:N
for j=1:N
Temp = copy(X);
Temp[j]=Temp[j]+delta;
J[i,j] = (ArrOfFunc[i](Temp)-ArrOfFunc[i](X))/delta;
end
end
return J
end
function get_resudial(ArrOfFunc,Arg)
return map((x)->x(Arg),ArrOfFunc)
end
function lm_solve(Funcs,Init)
X = copy(Init)
delta = 0.01;
Lambda = 0.01;
Factor = 2;
J = get_J(Funcs,X,delta)
R = get_resudial(Funcs,X)
N = 5
for t = 1:N
G = J'*J+Lambda.*eye(length(X))
dC = J'*R
C = sum(R.*R)/2;
Xnew = X-(inv(G)\dC);
Rnew = get_resudial(Funcs,Xnew)
Cnew = sum(Rnew.*Rnew)/2;
if ( Cnew < C)
X = Xnew;
R = Rnew;
Lambda = Lambda/Factor;
J = get_J(Funcs,X,delta)
else
Lambda = Lambda*Factor;
end
if(maximum(abs(Rnew)) < 0.001)
return X
end
end
return X
end
function test()
ArrOfFunc = [
(X)->X[1]+X[2]-2;
(X)->X[1]-X[2]
];
X = lm_solve(ArrOfFunc,Float64[3;3])
println(X)
return X
end
But from any starting point the step not accepted. What's I doing wrong?
Any help would be appreciated.
I have at the moment no way to test this, but one line does not make sense mathematically:
In the computation of Xnew it should be either inv(G)*dC or G\dC, but not a mix of both. Preferably the second, since the solution of a linear system does not require the computation of the inverse matrix.
With this one wrong calculation at the center of the iteration, the trajectory of the computation is almost surely going astray.
I need to evaluate an integral, and my code is
r=0:25;
t=0:250;
Ti=exp(-r.^2);
T=zeros(length(r),length(t));
for n=1:length(t)
w=1/2/t(n);
for m=1:length(r)
T(m,n)=w*trapz(r,Ti.*exp(-(r(m).^2+r.^2)*w/2).*r.*besseli(0,r(m)*r*w));
end
end
Currently the evaluation is fairly fast, but I wonder if there is a way to vectorize the double for-loop and make it even faster, especially when function trapz is used.
You can optimize it by passing matrix argument Y to trapz(A,Y), and using dim = 2, i.e. the loop becomes:
r = 0:25;
t = 0:250;
Ti = exp(-r.^2);
tic
T = zeros(length(r),length(t));
for n = 1:length(t)
w = 1/2/t(n);
for m = 1:length(r)
T(m,n) = w*trapz(r,Ti.*exp(-(r(m).^2+r.^2)*w/2).*r.*besseli(0,r(m)*r*w));
end
end
toc
tic
T1 = zeros(length(r),length(t));
for n = 1:length(t)
w = 1/2/t(n);
Y = bsxfun(#times,Ti.*r, exp(-bsxfun(#plus,r'.^2,r.^2)*w/2).*besseli(0,bsxfun(#times,r',r*w)));
T1(:,n) = w* trapz(r,Y,2);
end
toc
max(abs(T(:)-T1(:)))
You could probably vectorize it completely, will have a look later.
Here is my little script for simulating Levy motion:
clear all;
clc; close all;
t = 0; T = 1000; I = T-t;
dT = T/I; t = 0:dT:T; tau = T/I;
alpha = 1.5;
sigma = dT^(1/alpha);
mu = 0; beta = 0;
N = 1000;
X = zeros(N, length(I));
for k=1:N
L = zeros(1,I);
for i = 1:I-1
L( (i + 1) * tau ) = L(i*tau) + stable2( alpha, beta, sigma, mu, 1);
end
X(k,1:length(L)) = L;
end
q = 0.1:0.1:0.9;
quant = qlines2(X, q, t(1:length(X)), tau);
hold all
for i = 1:length(quant)
plot( t, quant(i) * t.^(1/alpha), ':k' );
end
Where stable2 returns a stable random variable with given parameters (you may replace it with normrnd(mu, sigma) for this case, it's not crucial); qlines2 returns quantiles needed for plotting.
But I don't want to talk about math here. My problem is that this implementation is pretty slow, and I would like to speed it up. Unfortunately, computer science is not my main field - I heard something about methods like memoization, vectorization and that there is a lot of other techniques, but I don't know how to use them.
For example, I'm pretty sure I should replace this filthy double for-loop somehow, but I'm not sure what to do instead.
EDIT: Maybe I should use (and learn...) another language (Python, C, any functional one)? I always though that Matlab/OCTAVE is designed for numerical computation, but if change, then for which one?
The crucial bit is, as you said, the for loops, Matlab does not like those, so vectorization is indeed the keyword. (Together with preallocating the space.
I just altered you for loop section somewhat so that you do not have to reset L over and over again, instead we save all Ls in a bigger matrix (also I elimiated the length(L) command).
L = zeros(N,I);
for k=1:N
for i = 1:I-1
L(k,(i + 1) * tau ) = L(k,i*tau) + normrnd(mu, sigma);
end
X(k,1:I) = L(k,1:I);
end
Now you can already see that X(k,1:I) = L(k,1:I); in the loop is obsolete and that also means that we can switch the order of the loops. This is crucial, because the i-steps are recursive (depend on the previous step) that means we cannot vectorize this loop, we can only vectorize the k-loop.
Now your original code needed 9.3 seconds on my machine, the new code still needs about the same time)
L = zeros(N,I);
for i = 1:I-1
for k=1:N
L(k,(i + 1) * tau ) = L(k,i*tau) + normrnd(mu, sigma);
end
end
X = L;
But now we can apply the vectorization, instead of looping throu all rows (the loop over k) we can instead eliminate this loop, and doing all rows at "once".
L = zeros(N,I);
for i = 1:I-1
L(:,(i + 1) * tau ) = L(:,i*tau) + normrnd(mu, sigma); %<- this is not yet what you want, see comment below
end
X = L;
This code need only 0.045 seconds on my machine. I hope you still get the same output, because I have no idea what you are calculating, but I also hope you could see how you go about vectorizing code.
PS: I just noticed that we now use the same random number in the last example for the whole column, this is obviously not what you want. Instad you should generate a whole vector of random numbers, e.g:
L = zeros(N,I);
for i = 1:I-1
L(:,(i + 1) * tau ) = L(:,i*tau) + normrnd(mu, sigma,N,1);
end
X = L;
PPS: Great question!
I am having problems with the following loop, since it is taking too much time. Hence, I would like to use parallel processing, specifically parfor function.
P = numel(scaleX); % quite BIG number
sz = P;
start = 1;
sqrL = 10; % sqr len
e = 200;
A = false(sz, sz);
for m = sz-sqrL/2:(-1)*sqrL:start
for n = M(m):-sqrL:1
temp = [scaleX(m), scaleY(m); scaleX(n), scaleY(n)];
d = pdist(temp, 'euclidean');
if d < e
A(m, n) = 1;
end
end
end
Can anyone, please, help me to convert the outer 'far' loop into 'parfor' in this code?