I normally replace the for in MATLAB code with parfor, but all the 2 dimension matrix did not work.
Code
parfor k=1 : n
sonic = data1((1+(k-1)*2400):(2400*k));
signal1 = (sonic(1:2400))./100;
Ar = abs(fftshift(fft(signal1,2400)));
[maxb,ind] = max(b);
Tp(k) = 2*pi/x(ind);
E = #(x)(x^2+1);
for i=1:length(x2)
Ex(i,k) = E(x2(i));
Exm0(i,k) = Ex(i,k)-m0(k);
signal2(i) = Exm0(i,k);
end
epsilong(:,k) = Ar;
end
Only variables such as Tp(k) appear in the workspace; two dimension matrices like Ex(i,k) did not work.
The limitations of FOR loops nested within PARFOR loops are described here - I think the problem in this case is your loop bounds 1:length(x2). As described in that page, you should be able to work around this like so:
len_x2 = length(x2);
parfor k = 1:n
...
for i = 1:len_x2
...
end
end
Related
I am working in MATLAB to process two 512x512 images, the domain image and the range image. What I am trying to accomplish is the following:
Divide both domain and range images into 8x8 pixel blocks
For each 8x8 block in the domain image, I have to apply a linear transformations to it and compare each of the 4096 transformed blocks with each of the 4096 range blocks.
Compute error in each case between the transformed block and the range image block and find the minimum error.
Finally I'll have for each 8x8 range block, the id of the 8x8 domain block for which the error was minimum (error between the range block and the transformed domain block)
To achieve this, I have written the following code:
RangeImagecolor = imread('input.png'); %input is 512x512
DomainImagecolor = imread('input.png'); %Range and Domain images are identical
RangeImagetemp = rgb2gray(RangeImagecolor);
DomainImagetemp = rgb2gray(DomainImagecolor);
RangeImage = im2double(RangeImagetemp);
DomainImage = im2double(DomainImagetemp);
%For the (k,l)th 8x8 range image block
for k = 1:64
for l = 1:64
minerror = 9999;
min_i = 0;
min_j = 0;
for i = 1:64
for j = 1:64
%here I compute for the (i,j)th domain block, the transformed domain block stored in D_trans
error = 0;
D_trans = zeros(8,8);
R = zeros(8,8); %Contains the pixel values of the (k,l)th range block
for m = 1:8
for n = 1:8
R(m,n) = RangeImage(8*k-8+m,8*l-8+n);
%ApplyTransformation can depend on (k,l) so I can't compute the transformation outside the k,l loop.
[m_dash,n_dash] = ApplyTransformation(8*i-8+m,8*j-8+n);
D_trans(m,n) = DomainImage(m_dash,n_dash);
error = error + (R(m,n)-D_trans(m,n))^2;
end
end
if(error < minerror)
minerror = error;
min_i = i;
min_j = j;
end
end
end
end
end
As an example ApplyTransformation, one can use the identity transformation:
function [x_dash,y_dash] = Iden(x,y)
x_dash = x;
y_dash = y;
end
Now the problem I am facing is the high computation time. The order of computation in the above code is 64^5, which is of the order 10^9. This computation should take at the worst minutes or an hour. It takes about 40 minutes to compute just 50 iterations. I don't know why the code is running so slow.
Thanks for reading my question.
You can use im2col* to convert the image to column format so each block forms a column of a [64 * 4096] matrix. Then apply transformation to each column and use bsxfun to vectorize computation of error.
DomainImage=rand(512);
RangeImage=rand(512);
DomainImage_col = im2col(DomainImage,[8 8],'distinct');
R = im2col(RangeImage,[8 8],'distinct');
[x y]=ndgrid(1:8);
function [x_dash, y_dash] = ApplyTransformation(x,y)
x_dash = x;
y_dash = y;
end
[x_dash, y_dash] = ApplyTransformation(x,y);
idx = sub2ind([8 8],x_dash, y_dash);
D_trans = DomainImage_col(idx,:); %transformation is reduced to matrix indexing
Error = 0;
for mn = 1:64
Error = Error + bsxfun(#minus,R(mn,:),D_trans(mn,:).').^2;
end
[minerror ,min_ij]= min(Error,[],2); % linear index of minimum of each block;
[min_i min_j]=ind2sub([64 64],min_ij); % convert linear index to subscript
Explanation:
Our goal is to reduce number of loops as much as possible. For it we should avoid matrix indexing and instead we should use vectorization. Nested loops should be converted to one loop. As the first step we can create a more optimized loop as here:
min_ij = zeros(4096,1);
for kl = 1:4096 %%% => 1:size(D_trans,2)
minerror = 9999;
min_ij(kl) = 0;
for ij = 1:4096 %%% => 1:size(R,2)
Error = 0;
for mn = 1:64
Error = Error + (R(mn,kl) - D_trans(mn,ij)).^2;
end
if(Error < minerror)
minerror = Error;
min_ij(kl) = ij;
end
end
end
We can re-arrange the loops and we can make the most inner loop as the outer loop and separate computation of the minimum from the computation of the error.
% Computation of the error
Error = zeros(4096,4096);
for mn = 1:64
for kl = 1:4096
for ij = 1:4096
Error(kl,ij) = Error(kl,ij) + (R(mn,kl) - D_trans(mn,ij)).^2;
end
end
end
% Computation of the min
min_ij = zeros(4096,1);
for kl = 1:4096
minerror = 9999;
min_ij(kl) = 0;
for ij = 1:4096
if(Error(kl,ij) < minerror)
minerror = Error(kl,ij);
min_ij(kl) = ij;
end
end
end
Now the code is arranged in a way that can best be vectorized:
Error = 0;
for mn = 1:64
Error = Error + bsxfun(#minus,R(mn,:),D_trans(mn,:).').^2;
end
[minerror ,min_ij] = min(Error, [], 2);
[min_i ,min_j] = ind2sub([64 64], min_ij);
*If you don't have the Image Processing Toolbox a more efficient implementation of im2col can be found here.
*The whole computation takes less than a minute.
First things first - your code doesn't do anything. But you likely do something with this minimum error stuff and only forgot to paste this here, or still need to code that bit. Never mind for now.
One big issue with your code is that you calculate transformation for 64x64 blocks of resulting image AND source image. 64^5 iterations of a complex operation are bound to be slow. Rather, you should calculate all transformations at once and save them.
allTransMats = cell(64);
for i = 1 : 64
for j = 1 : 64
allTransMats{i,j} = getTransformation(DomainImage, i, j)
end
end
function D_trans = getTransformation(DomainImage, i,j)
D_trans = zeros(8);
for m = 1 : 8
for n = 1 : 8
[m_dash,n_dash] = ApplyTransformation(8*i-8+m,8*j-8+n);
D_trans(m,n) = DomainImage(m_dash,n_dash);
end
end
end
This serves to get allTransMat and is OUTSIDE the k, l loop. Preferably as a simple function.
Now, you make your big k, l, i, j loop, where you compare all the elements as needed. Comparison could be also done block-wise instead of filling a small 8x8 matrix, yet doing it per element for some reason.
m = 1 : 8;
n = m;
for ...
R = RangeImage(...); % This will give 8x8 output as n and m are vectors.
D = allTransMats{i,j};
difference = sum(sum((R-D).^2));
if (difference < minDifference) ...
end
Even though this is a simple no transformations case, this speeds up code a lot.
Finally, are you sure you need to compare each block of transformed output with each block in the source? Typically you compare block1(a,b) with block2(a,b) - blocks (or pixels) on the same position.
EDIT: allTransMats requires k and l too. Ouch. There is NO WAY to make this fast for a single iteration, as you require 64^5 calls to ApplyTransformation (or a vectorization of that function, but even then it might not be fast - we would have to see the function to help here).
Therefore, I will re-iterate my advice to generate all transformations and then perform lookup: this upper part of the answer with allTransMats generation should be changed to have all 4 loops and generate allTransMats{i,j,k,l};. It WILL be slow, there is no way around that as I mentioned in the upper part of edit. But, it is a cost you pay once, as after saving the allTransMats, all further image analyses will be able to simply load it instead of generating it again.
But ... what do you even do? Transformation that depends on source and destination block indices plus pixel indices (= 6 values total) sounds like a mistake somewhere, or a prime candidate to optimize instead of all the rest.
I need to evaluate an integral, and my code is
r=0:25;
t=0:250;
Ti=exp(-r.^2);
T=zeros(length(r),length(t));
for n=1:length(t)
w=1/2/t(n);
for m=1:length(r)
T(m,n)=w*trapz(r,Ti.*exp(-(r(m).^2+r.^2)*w/2).*r.*besseli(0,r(m)*r*w));
end
end
Currently the evaluation is fairly fast, but I wonder if there is a way to vectorize the double for-loop and make it even faster, especially when function trapz is used.
You can optimize it by passing matrix argument Y to trapz(A,Y), and using dim = 2, i.e. the loop becomes:
r = 0:25;
t = 0:250;
Ti = exp(-r.^2);
tic
T = zeros(length(r),length(t));
for n = 1:length(t)
w = 1/2/t(n);
for m = 1:length(r)
T(m,n) = w*trapz(r,Ti.*exp(-(r(m).^2+r.^2)*w/2).*r.*besseli(0,r(m)*r*w));
end
end
toc
tic
T1 = zeros(length(r),length(t));
for n = 1:length(t)
w = 1/2/t(n);
Y = bsxfun(#times,Ti.*r, exp(-bsxfun(#plus,r'.^2,r.^2)*w/2).*besseli(0,bsxfun(#times,r',r*w)));
T1(:,n) = w* trapz(r,Y,2);
end
toc
max(abs(T(:)-T1(:)))
You could probably vectorize it completely, will have a look later.
I have a cell array of size m x 1 and each cell is again s x t cell array (size varies). I would like to concatenate vertically. The code is as follows:
function(cell_out) = vert_cat(cell_in)
[row,col] = cellfun(#size,cell_in,'Uni',0);
fcn_vert = #(x)([x,repmat({''},size(x,1),max(cell2mat(col))-size(x,2))]);
cell_out = cellfun(fcn_vert,cell_in,'Uni',0); % Taking up lot of time
cell_out = vertcat(cell_out{:});
end
Step 3 takes a lot of time. Is it the right way to do or is there any another faster way to achieve this?
cellfun has been found to be slower than loops (kind of old, but agrees with what I have seen).
In addition, repmat has also been a performance hit in the past (though that may be different now).
Try this two-loop code that aims to accomplish your task:
function cellOut = vert_cat(c)
nElem = length(c);
colPad = zeros(nElem,1);
nRow = zeros(nElem,1);
for k = 1:nElem
[nRow(k),colPad(k)] = size(c{k});
end
colMax = max(colPad);
colPad = colMax - colPad;
cellOut = cell(sum(nRow),colMax);
bottom = cumsum(nRow) - nRow + 1;
top = bottom + nRow - 1;
for k = 1:nElem
cellOut(bottom(k):top(k),:) = [c{k},cell(nRow(k),colPad(k))];
end
end
My test for this code was
A = rand(20,20);
A = mat2cell(A,ones(20,1),ones(20,1));
C = arrayfun(#(c) A(1:c,1:c),randi([1,15],1,5),'UniformOutput',false);
ccat = vert_cat(c);
I used this pice of code to generate data:
%generating some dummy data
m=1000;
s=100;
t=100;
cell_in=cell(m,1);
for idx=1:m
cell_in{idx}=cell(randi(s),randi(t));
end
Applying some minor modifications, I was able to speed up the code by a factor of 5
%Minor modifications of the original code
%use arrays instead of cells for row and col
[row,col] = cellfun(#size,cell_in);
%claculate max(col) once
tcol=max(col);
%use cell instead of repmat to generate an empty cell
fcn_vert = #(x)([x,cell(size(x,1),tcol-size(x,2))]);
cell_out = cellfun(fcn_vert,cell_in,'Uni',0); % Taking up lot of time
cell_out = vertcat(cell_out{:});
Using simply a for loop is even faster, because the data is only moved once
%new approac. Basic idea: move every data only once
[row,col] = cellfun(#size,cell_in);
trow=sum(row);
tcol=max(col);
r=1;
cell_out2 = cell(trow,tcol);
for idx=1:numel(cell_in)
cell_out2(r:r+row(idx)-1,1:col(idx))=cell_in{idx};
r=r+row(idx);
end
Communication overhead (parfor) and preallocating for speed (for) in Multidimensional Arrays
I am getting two warnings in the following script at the places indicated by **'s
Variable is indexed but not sliced... (the array A shown by ** in the second parfor loop) - What is causing this and how can it be avoided?
The variable appears to change size on every loop... (the array Sol shown by ** in the for loop) - Maybe I am not doing it right, but preallocating memory hasn't worked.
Edit: My initial idea was to preallocate the arrays (as done in the first parfor loop) so that it will execute the rest of the script faster (the full version of the script repeats various array operations similar to the second parfor and for loops).
Any suggestions? :)
N = 1000;
parfor i=1:N
A(:,:,i) = rand(2);
X(:,:,i) = rand(2,1);
Sol1(1,1,i) = zeros();
Sol2(1,1,i) = zeros();
Sol(2,1,i) = zeros();
end
t0 = tic;
parfor i=1:N
Sol1(1,:,i) = A(1,:,i)*X(:,1,i);
Sol2(1,:,i) = **A**(2,:,i)*X(:,1,i);
end
for i=1:N
**Sol**(:,1,i) = [Sol1(1,:,i);Sol2(1,:,i)];
end
toc(t0);
Your pre-allocation is not right - you need to do each in a single call.
A = rand(2, 2, N);
X = rand(2, 1, N);
Sol1 = zeros(1, 1, N);
Sol2 = zeros(1, 1, N);
Sol = zeros(2, 1, N); % not really needed actually.
In your PARFOR loop, you can avoid 'broadcasting' A by using a syntax that MATLAB understands as slicing
parfor i = 1:N
tmp = A(:, :, i);
Sol1(1, :, i) = tmp(1,:) * X(:, 1, i);
Sol2(1, :, i) = tmp(2,:) * X(:, 1, i);
end
Finally, I think you can do this as a vectorised concatenation like so:
Sol = [Sol1; Sol2];
EDIT
On the GPU, you can use pagefun to get the whole job done in a single call, like so:
Ag = gpuArray.rand(2,2,N);
Xg = gpuArray.rand(2,1,N);
Sol = pagefun(#mtimes, Ag, Xg);
I have these nested for-loops that I would like to convert to parfor:
row = 1;
for i = 5 : 0.2 : 5.4
col = 1;
for j = 2 : 0.5 : 2.5
matrx(row, col) = i * j;
col = col + 1;
end
row = row + 1;
end
Does anyone any way in which this would be possible?
I hope you're only displaying an extremely simplified version of your code, but anyways, the secret to parfor can be found by listening to Matlab numerous messages and reading the documentation. Start by learning good Matlab coding practices, and streamlining your code in such a way to fit your data into what Matlab wants in a parfor loop.
Things to note:
Parfor loops should be integers.
All matrices must be classified (read the documentation).
Container matrices should be used in nested for loops
This is one way I would do it, although it depends on your final application
iVal = 5 : 0.2 : 5.4;
jVal = 2 : 0.5 : 2.5;
iLen = length(iVal);
jLen = length(jVal);
matrx = zeros(iLen, jLen);
parfor i = 1:iLen
dummy = zeros(1, jLen);
for j = 1:jLen
dummy(j) = iVal(i) * jVal(j);
end
matrx(i,:) = dummy;
end