Faster alternative to INTERSECT with 'rows' - MATLAB - performance

I have MATLAB code that uses intersect with the 'rows' option to find the rows (and their indices) common to two large matrices. Profiling shows that intersect is by far the slowest line in my code, and so far I haven't been able to find a faster alternative.
As an example running the code below takes approx 5 seconds on my pc:
profile on
for i = 1 : 500
a = rand(10000,5);
b = rand(10000,5);
[intersectVectors, ind_a, ind_b] = intersect(a,b,'rows');
end
profile viewer
I was wondering if there is a faster way. Note that the matrices a and b each have 5 columns; the number of rows does not necessarily have to be the same for the two matrices.
Any help would be great.
Thanks

Discussion and solution codes
You can use an approach that leverages MATLAB's fast matrix multiplication to collapse the 5 columns of each input array into a single column, by treating each column as a significant "digit" of a single number. You end up with one-column arrays on which you can use intersect or ismember without the 'rows' option, and that speeds things up considerably!
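To make the "digit" idea concrete, here is a toy illustration of my own with small integers (the functions below build the multiplier automatically from the data range):
a = [1 2; 3 4];
mult = [10; 1];       %// each column acts as a decimal digit
acol1 = a*mult        %// rows [1 2] and [3 4] collapse to 12 and 34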
Here are the promised implementations as function codes for easy usage -
intersectrows_fast_v1.m:
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v1(a,b)
%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]';
acol1 = a*mult;
bcol1 = b*mult;
%// Use intersect without 'rows' option for a good speedup
[~, ind_a, ind_b] = intersect(acol1,bcol1);
intersectVectors = a(ind_a,:);
return;
intersectrows_fast_v2.m:
function [intersectVectors, ind_a, ind_b] = intersectrows_fast_v2(a,b)
%// Calculate equivalent one-column versions of input arrays
mult = [10^ceil(log10( 1+max( [a(:);b(:)] ))).^(size(a,2)-1:-1:0)]';
acol1 = a*mult;
bcol1 = b*mult;
%// Use ismember to get indices of the common elements
[match_a,idx_b] = ismember(acol1,bcol1);
%// Unlike intersect, ismember does not take care of duplicate items
%// automatically, so we need to find the duplicates and remove them
%// from the outputs of ismember
[~,a_sorted_ind] = sort(acol1);
a_rm_ind = a_sorted_ind([false;diff(sort(acol1))==0]); %// indices to be removed
match_a(a_rm_ind)=0;
intersectVectors = a(match_a,:);
ind_a = find(match_a);
ind_b = idx_b(match_a);
return;
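As a quick sanity check (my own example, not part of the original answer), you can verify that version I agrees with the built-in on data that is guaranteed to contain common rows; note that the two approaches may return the matched rows in a different order, hence the sortrows before comparing -
a = rand(10000,5);
b = [a(1:100,:); rand(9900,5)];        %// force 100 common rows
[v0, ia0, ib0] = intersect(a,b,'rows');
[v1, ia1, ib1] = intersectrows_fast_v1(a,b);
isequal(sortrows(v0), sortrows(v1))    %// expected: true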
Quick tests and conclusions
With the datasizes listed in the question, the runtimes were -
-------------------------- With original approach
Elapsed time is 3.885792 seconds.
-------------------------- With Proposed approach - Version - I
Elapsed time is 0.581123 seconds.
-------------------------- With Proposed approach - Version - II
Elapsed time is 0.963409 seconds.
The results suggest a clear advantage for Version I of the two proposed approaches, with a speedup of around 6.7x over the original approach!
Also note that if you don't need all three outputs of the original intersect with 'rows' based approach, both proposed approaches could be shortened further for even better runtime performance!
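For completeness, a hypothetical harness to reproduce those timings (same data sizes as in the question; the absolute numbers will of course vary with machine and MATLAB version) -
a = rand(10000,5); b = rand(10000,5);
tic, for k = 1:500, [v, ia, ib] = intersect(a,b,'rows');      end, toc
tic, for k = 1:500, [v, ia, ib] = intersectrows_fast_v1(a,b); end, toc
tic, for k = 1:500, [v, ia, ib] = intersectrows_fast_v2(a,b); end, toc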

Related

MATLAB: Speeding up a discretization function using bsxfun

For a current project, I have to discretize quasi-continuous values into bins defined by some pre-defined binning resolution. For this purpose, I have written a function, which I expected to be highly efficient as it is able to both process scalar inputs as well as vector inputs using bsxfun. However, after some profiling, I found out that almost all processing time of my much larger project is produced in this function, and within the function, it's mainly the bsxfun part that takes time, with the min-query following on second place. Long story short, I am looking for advice on how to solve this task MUCH faster in MATLAB. Side note: I am usually passing vectors with some 50k elements.
Here's the code:
function sampleNo = value2sample(value,bins)
%Make sure both vectors have orientations fitting bsxfun
value = value(:);
bins = bins(:)';
%Recover bin resolution (avoids passing another parameter)
delta = median(diff(bins));
%Calculate distance matrix between all combinations
dist = abs(bsxfun(@minus,value,bins));
%What we really want to know is the minimum distance per row
[minval,ind] = min(dist,[],2);
%Make sure we don't accidentally further process NaNs as 1st bin
ind(isnan(minval))=NaN;
sampleNo = ind;
sampleNo(minval>delta) = NaN;
end
The reason your function is slow is that you are computing the distance between every element of value and every bin and storing them all in an array - if there are N values and M bins then you need N*M elements to store all the distances, and that is probably a really big number (e.g. if each input has 50,000 elements you need 2.5 billion elements in the output array).
Moreover, since your bins are sorted (you didn't state this, but it looks like you are assuming it in your code) you do not need to compute the distance from every value to every bin. You can be much smarter:
function ind = value2sample(value, bins)
% Work with column vectors (as in the original function)
value = value(:);
bins = bins(:);
% Find median bin distance
delta = median(diff(bins));
% Bucket into 'nearest' bin by using midpoints, padded with -Inf/Inf so that
% every finite value falls into some bin
mids = [-Inf; 0.5 * (bins(1:end-1) + bins(2:end)); Inf];
[~, ind] = histc(value, mids);
% Ensure that NaN values and points that aren't near any bin are returned as NaN
bad = isnan(value) | ind < 1 | ind > numel(bins);
ind(bad) = 1;                                % temporary valid index for the lookup below
ind(bad | abs(value - bins(ind)) > delta) = NaN;
end
In my tests, with values = randn(10000, 1) and bins = -50:50 it takes around 4.5 milliseconds to run the original function, and 485 microseconds to run the code above, so you are getting around a 10x speedup (and the speedup will be even greater as you increase the size of the inputs).
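For reference, a hypothetical way to reproduce that comparison with timeit, assuming the original bsxfun-based function has been saved under the (hypothetical) name value2sample_orig:
values = randn(10000, 1);
bins = -50:50;
timeit(@() value2sample_orig(values, bins))  % original bsxfun-based version
timeit(@() value2sample(values, bins))       % histc-based version above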
Thanks to @Chris Taylor, I was able to solve the problem very efficiently. The code now runs almost 400 times faster than before. The only changes I had to make from his version are reflected in the code below. The main change was to replace histc (whose use is no longer encouraged) with discretize.
function ind = value2sample(value, bins)
% Make sure both inputs are column vectors
value = value(:);
bins = bins(:);
% Bucket into 'nearest' bin by using midpoints
mids = [eps; 0.5 * (bins(1:end-1) + bins(2:end))];
ind = discretize(value, mids);
end
The only thing is, that in this implementation your bins must be non-negative. Other than that, this code does exactly what I want, including the fact that ind has the same size as value and contains NaNs whenever a value is NaN or out of the range of bins.
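For illustration, here is a hypothetical call of my own; NaNs and values outside the covered range come back as NaN:
bins = 0:0.5:10;                  % non-negative, evenly spaced bins
value = [0.1; 3.26; NaN; 42];
ind = value2sample(value, bins)   % same size as value; NaN for the NaN and for 42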

Looking for efficient way to perform a computation - Matlab

I have a scalar function f([x,y],[i,j])= exp(-norm([x,y]-[i,j])^2/sigma^2) which receives two 2-dimensional vectors as input (norm here implements the Euclidean norm). The values of x,i range in 1:w and the values y,j range in 1:h. I want to create a cell array X such that X{x,y} will contain a w x h matrix such that X{x,y}(i,j) = f([x,y],[i,j]). This can obviously be done using 4 nested loops like so:
for x=1:w;
for y=1:h;
X{x,y}=zeros(w,h);
for i=1:w
for j=1:h
X{x,y}(i,j)=f([x,y],[i,j])
end
end
end
end
This is however extremely inefficient. I would very much appreciate an efficient way to create X.
One way to do this is to remove the 2 innermost loops and replace them with a vectorised version. By the look of your f function this shouldn't be too bad.
First we need to construct two w-by-h matrices, one containing 1 to w down every column and one containing 1 to h along every row, like so
wMat = repmat((1:w)', 1, h);
hMat = repmat(1:h, w, 1);
These represent the inner two loops: wMat(i,j) = i and hMat(i,j) = j, so together they cover all combinations. Now we can vectorise the calculation (f([x,y],[i,j]) = exp(-norm([x,y]-[i,j])^2/sigma^2)):
for x=1:w;
for y=1:h;
temp1=sqrt((x-wMat).^2+(y-hMat).^2);
X{x,y}=exp(-temp1.^2/(sigma^2));
end
end
Where we have computed the Euclidean norm for all pairs of nodes in the inner loops at once.
Some discussion and code
The trick here is to perform the norm calculations on numeric arrays and convert the results into a cell array as late as possible. For the norm calculations you can use ndgrid and bsxfun, plus some permute + reshape to give the result the "shape" needed for the final cell array version. So, here's the vectorized approach to perform these tasks -
%// Create x-y/i-j values to be used for calculation of function values
[xi,yi] = ndgrid(1:w,1:h);
%// Get the norm values
normvals = sqrt(bsxfun(@minus,xi(:),xi(:).').^2 + ...
bsxfun(@minus,yi(:),yi(:).').^2);
%// Get the actual function values
vals = exp(-normvals.^2/sigma^2);
%// Get the values into blocks of a 4D array and then re-arrange to match
%// with the shape of numeric array version of X
blks = reshape(permute(reshape(vals, w*h, h, []), [2 1 3]), h, w, h, w);
arranged_blks = reshape(permute(blks,[2 3 1 4]),w,h,w,h);
%// Finally get the cell array version
X = squeeze(mat2cell(arranged_blks,w,h,ones(1,w),ones(1,h)));
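As a side note of mine (not part of the original answer), on R2016b or newer you can use implicit expansion instead of bsxfun for the normvals computation -
%// Implicit-expansion equivalent of the bsxfun-based norm calculation (R2016b+)
normvals = sqrt((xi(:) - xi(:).').^2 + (yi(:) - yi(:).').^2);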
Benchmarking and runtimes
After improving the original loopy code with pre-allocation for X and inlining of f, runtime benchmarks were performed against the proposed vectorized approach with data sizes w = h = 60, and the runtime results thus obtained were -
----------- With Improved loopy code
Elapsed time is 41.227797 seconds.
----------- With Vectorized code
Elapsed time is 2.116782 seconds.
This suggests close to a 20x speedup with the proposed solution!
For extremely huge datasizes
If you are dealing with huge data sizes, you may simply not have enough memory for bsxfun to work with, as bsxfun is known to use a lot of memory in exchange for a performance-efficient vectorized solution. For such huge-datasize cases, you can replace the normvals calculation from the earlier bsxfun-based solution with the following loopy approach -
%// Get the norm values
nx = numel(xi);
normvals = zeros(nx,nx);
for ii = 1:nx
normvals(:,ii) = sqrt( (xi(:) - xi(ii)).^2 + (yi(:) - yi(ii)).^2 );
end
It seems to me that when you run through the cycle for x=w, y=h, you are calculating all the values you need at once, so you don't need to recalculate them. Once you have this:
for i=1:w
for j=1:h
temp(i,j)=f([x,y],[i,j])
end
end
Then, e.g. X{1,1} is just temp(1,1), X{2,2} is just temp(1:2,1:2), and so on. If you can vectorise the calculation of f (norm here is just the Euclidean norm of that vector?) then it will get even simpler.

Why is my Matlab for-loop code faster than my vectorized version

I had always heard that vectorized code runs faster than for loops in MATLAB. However, when I tried vectorizing my MATLAB code it seemed to run slower.
I used tic and toc to measure the times. I changed only the implementation of a single function in my program. My vectorized version ran in 47.228801 seconds and my for-loop version ran in 16.962089 seconds.
Also, in my main program I used a large number for N (N = 1000000), and DataSet's size is 1 x 301. I ran each version several times for different data sets of the same size and the same N.
Why is the vectorized so much slower and how can I improve the speed further?
The "vectorized" version
function [RNGSet] = RNGAnal(N,DataSet)
%Creates a random number generated set of numbers to check accuracy overall
% This function will produce random numbers and normalize a new Data set
% that is derived from an old data set by multiply random numbers and
% then dividing by N/2
randData = randint(N,length(DataSet));
tempData = repmat(DataSet,N,1);
RNGSet = randData .* tempData;
RNGSet = sum(RNGSet,1) / (N/2); % sum and normalize by the N
end
The "for-loop" version
function [RNGData] = RNGAnsys(N,Data)
%RNGAnsys This function produces statistical RNG data using a for loop
% This function will produce RNGData that will be used to plot on another
% plot that possesses the actual data
multData = zeros(N,length(Data));
for i = 1:length(Data)
photAbs = randint(N,1); % Create N number of random 0's or 1's
multData(:,i) = Data(i) * photAbs; % multiply each element in the molar data by the random numbers
end
sumData = sum(multData,1); % sum each individual energy level's data point
RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
end
Vectorization
A first glance at the for-loop code tells us that, since photAbs is a binary array each column of which is scaled by the corresponding element of Data, this binary structure can be exploited for vectorization. That is done in the code here -
function RNGData = RNGAnsys_vect1(N,Data)
%// Get the 2D Matrix of random ones and zeros
photAbsAll = randint(N,numel(Data));
%// Take care of multData internally by summing along the columns of the
%// binary 2D matrix and then multiply each element of it with each scalar
%// taken from Data by performing elementwise multiplication
sumData = Data.*sum(photAbsAll,1);
%// Divide by n, but account for 0.5 average by n/2
RNGData = (sumData./(N/2))';
return;
After profiling, it appears that the bottleneck is the creation of the random binary array. So, using a faster random binary array creator as suggested in this smart solution, the above function can be further optimized like so -
function RNGData = RNGAnsys_vect2(N,Data)
%// Create a random binary array and sum along the columns on the fly to
%// save on any variable space that would be required otherwise.
%// Also perform the elementwise multiplication as discussed before.
sumData = Data.*sum(rand(N,numel(Data))<0.5,1);
%// Divide by n, but account for 0.5 average by n/2
RNGData = (sumData./(N/2))';
return;
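(A note of my own, not from the original answer: randint came from an older Communications Toolbox and has since been removed, so on newer releases you can get the same 0/1 matrix with randi or the rand(...) < 0.5 trick used above.)
%// Hypothetical drop-in replacement for randint(N, M)
photAbsAll = randi([0 1], N, numel(Data));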
Using the smart binary random array creator, the original code can be optimized as well; that version will be used later for a fair benchmark between the optimized for-loop and vectorized codes. The optimized for-loop code is listed here -
function RNGData = RNGAnsys_opt1(N,Data)
multData = zeros(N,numel(Data));
for i = 1:numel(Data)
%// Create N number of random 0's or 1's using a smart approach
%// Then, multiply each element in the molar data by the random numbers
multData(:,i) = Data(i) * (rand(N,1)<.5);
end
sumData = sum(multData,1); % sum each individual energy level's data point
RNGData = (sumData/(N/2))'; % divide by n, but account for 0.5 average by n/2
return;
Benchmarking
Benchmarking Code
N = 15000; %// Kept at this value as it runs out of memory with higher N's.
%// The dataset size matters more anyway, as it decides how well
%// the vectorized code does against the for-loop code
DS_arr = [50 100 200 500 800 1500 5000]; %// Dataset sizes
timeall = zeros(2,numel(DS_arr));
for k1 = 1:numel(DS_arr)
DS = DS_arr(k1);
Data = rand(1,DS);
f = @() RNGAnsys_opt1(N,Data); %// Optimized for-loop code
timeall(1,k1) = timeit(f);
clear f
f = @() RNGAnsys_vect2(N,Data); %// Vectorized code
timeall(2,k1) = timeit(f);
clear f
end
%// Display benchmark results
figure,hold on, grid on
plot(DS_arr,timeall(1,:),'-ro')
plot(DS_arr,timeall(2,:),'-kx')
legend('Optimized for-loop code','Vectorized code')
xlabel('Dataset size ->'),ylabel('Time(sec) ->')
avg_speedup = mean(timeall(1,:)./timeall(2,:))
title(['Average Speedup with vectorized code = ' num2str(avg_speedup) 'x'])
Results
Concluding remarks
Based on the experience I had so far with MATLAB, neither for loops nor vectorized techniques are fit for all situations, but everything is situation-specific.
Try using the MATLAB profiler to determine which line or lines of code are using the most time. That way you can find out whether the repmat function is what is slowing you down, as has been suggested. Let us know what you find, I'm interested!
randData = randint(N,length(DataSet));
allocates roughly a 2.4 GB array (8 bytes * 301 * 1000000, since the random numbers are stored as doubles). Implicitly you create up to 4 of these monsters in your program, causing continuous cache misses.
Your for-loop code could nearly run in the processor cache (or it does on the bigger Xeons).

Efficient (fastest) way to sum elements of matrix in matlab

Let's have a matrix A, say A = magic(100);. I have seen 2 ways of computing the sum of all elements of matrix A.
sumOfA = sum(sum(A));
Or
sumOfA = sum(A(:));
Is one of them faster (or better practice) than the other? If so, which one is it? Or are they both equally fast?
It seems that you can't make up your mind about whether performance or floating point accuracy is more important.
If floating point accuracy were of paramount importance, then you would segregate the positive and negative elements, sorting each segment, and then sum in order of increasing absolute value. Yeah, I know, it's more work than anyone would do, and it probably would be a waste of time.
Instead, use adequate precision such that any errors made will be irrelevant. Use good numerical practices about tests, etc, such that there are no problems generated.
As far as the time goes, for an NxM array,
sum(A(:)) will require N*M-1 additions.
sum(sum(A)) will require (N-1)*M + M-1 = N*M-1 additions.
Either method requires the same number of adds, so for a large array, even if the interpreter is not smart enough to recognize that they are both the same op, who cares?
It is simply not an issue. Don't make a mountain out of a mole hill to worry about this.
Edit: in response to Amro's comment about the errors for one method over the other, there is little you can control. The additions will be done in a different order, but there is no assurance about which sequence will be better.
A = randn(1000);
format long g
The two solutions are quite close. In fact, compared to eps, the difference is barely significant.
sum(A(:))
ans =
945.760668102446
sum(sum(A))
ans =
945.760668102449
sum(sum(A)) - sum(A(:))
ans =
2.72848410531878e-12
eps(sum(A(:)))
ans =
1.13686837721616e-13
Suppose you choose the segregate and sort trick I mentioned. See that the negative and positive parts will be large enough that there will be a loss of precision.
sum(sort(A(A<0),'descend'))
ans =
-398276.24754782
sum(sort(A(A<0),'descend')) + sum(sort(A(A>=0),'ascend'))
ans =
945.7606681037
So you really would need to accumulate the pieces in a higher precision array anyway. We might try this:
[~,tags] = sort(abs(A(:)));
sum(A(tags))
ans =
945.760668102446
An interesting problem arises even in these tests. Will there be an issue because the tests are done on a random (normal) array? Essentially, we can view sum(A(:)) as a random walk, a drunkard's walk. But consider sum(sum(A)). Each element of sum(A) (i.e., the internal sum) is itself a sum of 1000 normal deviates. Look at a few of them:
sum(A)
ans =
Columns 1 through 6
-32.6319600960983 36.8984589766173 38.2749084367497 27.3297721091922 30.5600109446534 -59.039228262402
Columns 7 through 12
3.82231962760523 4.11017616179294 -68.1497901792032 35.4196443983385 7.05786623564426 -27.1215387236418
When we add them up, there will be a loss of precision. So potentially, sum(A(:)) might be slightly more accurate. Is it so? What if we use a higher precision for the accumulation? First, I'll form the sum down the columns using doubles, then convert to 25 digits of decimal precision, and sum the rows. (I've displayed only 20 digits here, leaving 5 digits hidden as guard digits.)
sum(hpf(sum(A)))
ans =
945.76066810244807408
Or, instead, convert immediately to 25 digits of precision, then summing the result.
sum(hpf(A(:)))
945.76066810244749807
So both forms in double precision were equally wrong here, in opposite directions. In the end, this is all moot, since any of the alternatives I've shown are far more time consuming compared to the simple variations sum(A(:)) or sum(sum(A)). Just pick one of them and don't worry.
Performance-wise, I'd say both are very similar (assuming a recent MATLAB version). Here is quick test using the TIMEIT function:
function sumTest()
M = randn(5000);
timeit( @() func1(M) )
timeit( @() func2(M) )
end
function v = func1(A)
v = sum(A(:));
end
function v = func2(A)
v = sum(sum(A));
end
the results were:
>> sumTest
ans =
0.0020917
ans =
0.0017159
What I would worry about is floating-point issues. Example:
>> M = randn(1000);
>> abs( sum(M(:)) - sum(sum(M)) )
ans =
3.9108e-11
Error magnitude increases for larger matrices
I think a simple way to check is to apply tic and toc at the start and end of your code.
tic
A = randn(5000);
format long g
sum(A(:));
toc
But when you use the randn function, its elements are random and the computation time can differ from one run to the next.
It is better to use a single fixed matrix with a large number of elements when comparing the timings.

How to write vectorized functions in MATLAB

I am just learning MATLAB and I find it hard to understand the performance factors of loops vs vectorized functions.
In my previous question: Nested for loops extremely slow in MATLAB (preallocated) I realized that using a vectorized function vs. 4 nested loops made a 7x difference in running time.
In that example instead of looping through all dimensions of a 4 dimensional array and calculating median for each vector, it was much cleaner and faster to just call median(stack, n) where n meant the working dimension of the median function.
But median is just a very easy example and I was just lucky that it had this dimension parameter implemented.
My question is: how do you write a function yourself that works as efficiently as one that has this dimension parameter implemented?
For example you have a function my_median_1D which only works on a 1-D vector and returns a number.
How do you write a function my_median_nD which acts like MATLAB's median, by taking an n-dimensional array and a "working dimension" parameter?
Update
I found the code for calculating median in higher dimensions
% In all other cases, use linear indexing to determine exact location
% of medians. Use linear indices to extract medians, then reshape at
% end to appropriate size.
cumSize = cumprod(s);
total = cumSize(end); % Equivalent to NUMEL(x)
numMedians = total / nCompare;
numConseq = cumSize(dim - 1); % Number of consecutive indices
increment = cumSize(dim); % Gap between runs of indices
ixMedians = 1;
y = repmat(x(1),numMedians,1); % Preallocate appropriate type
% Nested FOR loop tracks down medians by their indices.
for seqIndex = 1:increment:total
for consIndex = half*numConseq:(half+1)*numConseq-1
absIndex = seqIndex + consIndex;
y(ixMedians) = x(absIndex);
ixMedians = ixMedians + 1;
end
end
% Average in second value if n is even
if 2*half == nCompare
ixMedians = 1;
for seqIndex = 1:increment:total
for consIndex = (half-1)*numConseq:half*numConseq-1
absIndex = seqIndex + consIndex;
y(ixMedians) = meanof(x(absIndex),y(ixMedians));
ixMedians = ixMedians + 1;
end
end
end
% Check last indices for NaN
ixMedians = 1;
for seqIndex = 1:increment:total
for consIndex = (nCompare-1)*numConseq:nCompare*numConseq-1
absIndex = seqIndex + consIndex;
if isnan(x(absIndex))
y(ixMedians) = NaN;
end
ixMedians = ixMedians + 1;
end
end
Could you explain why this code is so effective compared to the simple nested loops? It has nested loops just like the other function.
I don't understand how it could be 7x faster, and also why it is so complicated.
Update 2
I realized that using median was not a good example as it is a complicated function itself requiring sorting of the array or other neat tricks. I re-did the tests with mean instead and the results are even more crazy:
19 seconds vs 0.12 seconds.
It means that the built-in approach is about 160 times faster than the nested loops.
It is really hard for me to understand how an industry-leading language can have such an extreme performance difference depending on programming style, but I see the points mentioned in the answers below.
Update 2 (to address your updated question)
MATLAB is optimized to work well with arrays. Once you get used to it, it is actually really nice to just have to type one line and have MATLAB do the full 4D looping stuff itself without having to worry about it. MATLAB is often used for prototyping / one-off calculations, so it makes sense to save time for the person coding and give up some of C[++|#]'s flexibility.
This is why MATLAB internally does some loops really well - often by coding them as a compiled function.
The code snippet you give doesn't really contain the relevant line of code which does the main work, namely
% Sort along given dimension
x = sort(x,dim);
In other words, the code you show only needs to access the median values by their correct index in the now-sorted multi-dimensional array x (which doesn't take much time). The actual work accessing all array elements was done by sort, which is a built-in (i.e. compiled and highly optimized) function.
Original answer (about how to built your own fast functions working on arrays)
There are actually quite a few built-ins that take a dimension parameter: min(stack, [], n), max(stack, [], n), mean(stack, n), std(stack, [], n), median(stack,n), sum(stack, n)... together with the fact that other built-in functions like exp(), sin() automatically work on each element of your whole array (i.e. sin(stack) automatically does four nested loops for you if stack is 4D), you can build up a lot of the functions you might need just by relying on the existing built-ins.
If this is not enough for a particular case you should have a look at repmat, bsxfun, arrayfun and accumarray, which are very powerful functions for doing things "the MATLAB way". Just search on SO for questions (or rather answers) using one of these; I learned a lot about MATLAB's strong points that way.
As an example, say you wanted to implement the p-norm of stack along dimension n, you could write
function result=pnorm(stack, p, n)
result = sum(abs(stack).^p, n).^(1/p);
... where you effectively reuse the "which-dimension-capability" of sum.
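For example (a usage sketch of my own), the 2-norm of each row of a matrix, taken along dimension 2:
stack = rand(4,3);
rowNorms = pnorm(stack, 2, 2);   % same as sqrt(sum(stack.^2, 2)) for this non-negative stack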
Update
As Max points out in the comments, also have a look at the colon operator (:), which is a very powerful tool for selecting elements from an array (or even changing its shape, which is more generally done with reshape).
In general, have a look at the section Array Operations in the help - it contains repmat et al. mentioned above, but also cumsum and some more obscure helper functions which you should use as building blocks.
Vectorization
In addition to whats already been said, you should also understand that vectorization involves parallelization, i.e. performing concurrent operations on data as opposed to sequential execution (think SIMD instructions), and even taking advantage of threads and multiprocessors in some cases...
MEX-files
Now although the "interpreted vs. compiled" point has already been argued, no one mentioned that you can extend MATLAB by writing MEX-files, which are compiled functions written in C that can be called directly as normal functions from inside MATLAB. This allows you to implement performance-critical parts in a lower-level language like C.
Column-major order
Finally, when trying to optimize some code, always remember that MATLAB stores matrices in column-major order. Accessing elements in that order can yield significant improvements compared to other arbitrary orders.
For example, in your previous linked question, you were computing the median of a set of stacked images along some dimension. Now the order of those dimensions greatly affects the performance. Illustration:
%# sequence of 10 images
fPath = fullfile(matlabroot,'toolbox','images','imdemos');
files = dir( fullfile(fPath,'AT3_1m4_*.tif') );
files = strcat(fPath,{filesep},{files.name}');
I = imread( files{1} );
%# stacked images along the 1st dimension: [numImages H W RGB]
stack1 = zeros([numel(files) size(I) 3], class(I));
for i=1:numel(files)
I = imread( files{i} );
stack1(i,:,:,:) = repmat(I, [1 1 3]); %# grayscale to RGB
end
%# stacked images along the 4th dimension: [H W RGB numImages]
stack4 = permute(stack1, [2 3 4 1]);
%# compute median image from each of these two stacks
tic, m1 = squeeze( median(stack1,1) ); toc
tic, m4 = median(stack4,4); toc
isequal(m1,m4)
The timing difference was huge:
Elapsed time is 0.257551 seconds. %# stack1
Elapsed time is 17.405075 seconds. %# stack4
Could you explain why this code is so effective compared to the simple nested loops? It has nested loops just like the other function.
The problem with nested loops is not the nested loops themselves. It's the operations you perform inside.
Each function call (especially to a non-built-in function) generates a little bit of overhead; more so if the function performs e.g. error checking that takes the same amount of time regardless of input size. Thus, if a function has only 1 ms of overhead and you call it 1000 times, you will have wasted a second. If you can call it once to perform a vectorized calculation, you pay the overhead only once.
Furthermore, the JIT compiler can help speed up simple for-loops where you, for example, only perform basic arithmetic operations. Thus, the loops with simple calculations in your post are sped up by a lot, while the loops calling median are not.
In this case
M = median(A,dim) returns the median values for elements along the dimension of A specified by scalar dim
But with a general function you can try splitting your array with mat2cell (which can work with n-D arrays and not just matrices) and applying your my_median_1D function through cellfun. Below I will use median as an example to show that you get equivalent results, but instead you can pass it any function defined in an m-file, or an anonymous function defined with the @(args) notation.
>> testarr = [[1 2 3]' [4 5 6]']
testarr =
1 4
2 5
3 6
>> median(testarr,2)
ans =
2.5000
3.5000
4.5000
>> shape = size(testarr)
shape =
3 2
cellfun(@median,mat2cell(testarr,repmat(1,1,shape(1)),[shape(2)]))
ans =
2.5000
3.5000
4.5000
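The same pattern works with your own my_median_1D (or any other function handle that returns a scalar); a minimal sketch of my own:
% Split testarr into one cell per row, then apply my_median_1D to each row
rowCells = mat2cell(testarr, ones(1,shape(1)), shape(2));
result = cellfun(@my_median_1D, rowCells)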
