How to multiply tensors in MATLAB without looping? - performance

Suppose I have:
A = rand(1,10,3);
B = rand(10,16);
And I want to get:
C(:,1) = A(:,:,1)*B;
C(:,2) = A(:,:,2)*B;
C(:,3) = A(:,:,3)*B;
Can I somehow multiply this in a single line so that it is faster?
What if I create new tensor b like this
for i = 1:3
b(:,:,i) = B;
end
Can I multiply A and b to get the same C but faster? Time taken in creation of b by the loop above doesn't matter since I will be needing C for many different A-s while B stays the same.

Permute the dimensions of A and B and then apply matrix multiplication:
C = B.'*permute(A, [2 3 1]);
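As a quick sanity check, the one-liner can be compared against the loop from the question. This is only a sketch: the names C_loop, C_vec and max_err are made up here, and any tolerance you compare against is an arbitrary choice.

```matlab
% Compare the loop from the question with the permute-based one-liner.
A = rand(1,10,3);
B = rand(10,16);

% Loop version: each A(:,:,k)*B is 1x16 and is stored as column k.
C_loop = zeros(size(B,2), size(A,3));
for k = 1:size(A,3)
    C_loop(:,k) = (A(:,:,k)*B).';
end

% Vectorized version: permute A from 1x10x3 to 10x3, then multiply once.
C_vec = B.'*permute(A, [2 3 1]);

% The two results should agree up to floating-point rounding.
max_err = max(abs(C_loop(:) - C_vec(:)));
```

Both results are 16x3. Exact equality is not guaranteed, since the two versions may sum in a different order, so comparing max_err against a small tolerance is the safer test.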

If A is a true 3D array, something like A = rand(4,10,3) and assuming that B stays as a 2D array, then each A(:,:,1)*B would yield a 2D array.
So, assuming that you want to store those 2D arrays as slices in the third dimension of output array, C like so -
C(:,:,1) = A(:,:,1)*B;
C(:,:,2) = A(:,:,2)*B;
C(:,:,3) = A(:,:,3)*B; and so on.
To solve this in a vectorized manner, one approach is to reshape A into a 2D array by merging the first and third dimensions and then perform matrix multiplication. Finally, to bring the output to the same size as the C listed earlier, we need a final reshaping step.
The implementation would look something like this -
%// Get size and then the final output C
[m,n,r] = size(A);
out = permute(reshape(reshape(permute(A,[1 3 2]),[],n)*B,m,r,[]),[1 3 2]);
Sample run -
>> A = rand(4,10,3);
B = rand(10,16);
C(:,:,1) = A(:,:,1)*B;
C(:,:,2) = A(:,:,2)*B;
C(:,:,3) = A(:,:,3)*B;
>> [m,n,r] = size(A);
out = permute(reshape(reshape(permute(A,[1 3 2]),[],n)*B,m,r,[]),[1 3 2]);
>> all(C(:)==out(:)) %// Verify results
ans =
1
As per the comments, if A is a 3D array with always a singleton dimension at the start, you can just use squeeze and then matrix-multiplication like so -
C = B.'*squeeze(A)

EDIT: @LuisMendo points out that this is indeed possible for this specific use case. However, it is not (in general) possible if the first dimension of A is not 1.
I've grappled with this for a while now, and I've never been able to come up with a solution. Performing element-wise calculations is made nice by bsxfun, but tensor multiplication is something which is woefully unsupported. Sorry, and good luck!
You can check out this MathWorks File Exchange submission, which will make it easier for you and supports the behavior you're looking for, but I believe that it relies on loops as well. Edit: it relies on MEX/C++, so it isn't a pure MATLAB solution if that's what you're looking for.
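For what it's worth, newer MATLAB releases (R2020b or later, as far as I know) ship pagemtimes, which multiplies matching pages of N-D arrays. A minimal sketch, assuming that function is available on your installation; C_loop, C_page and max_err are names invented for this check:

```matlab
A = rand(4,10,3);
B = rand(10,16);

% Reference: page-by-page loop.
C_loop = zeros(size(A,1), size(B,2), size(A,3));
for k = 1:size(A,3)
    C_loop(:,:,k) = A(:,:,k)*B;
end

% pagemtimes treats the 2-D matrix B as a single page and
% reuses it for every page of A.
C_page = pagemtimes(A, B);

% Largest absolute difference between the two results.
max_err = max(abs(C_loop(:) - C_page(:)));
```

The result is 4x16x3, the same layout as the loop. On older releases the reshape/permute approach remains the pure-MATLAB option.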

I have to agree with @GJSein, the for loop is really fast:
time
0.7050 0.3145
Here's the timer function
function time
n = 1E7;
A = rand(1,n,3);
B = rand(n,16);
t = [];
C = {};
tic
C{length(C)+1} = squeeze(cell2mat(cellfun(@(x) x*B,num2cell(A,[1 2]),'UniformOutput',false)));
t(length(t)+1) = toc;
tic
idx = length(C)+1; %// pick the target cell once; length(C) changes after the first assignment
for i = 1:size(A,3)
C{idx}(:,i) = (A(:,:,i)*B).';
end
t(length(t)+1) = toc;
disp(t)
end

Related

Julia : How to fill a matrix row by row in julia

I have 200 vectors; each one has a length of 10000.
I want to fill a matrix such that each line represents a vector.
If your vectors are already stored in an array then you can use vcat( ) here:
A = [rand(10000)' for idx in 1:200]
B = vcat(A...)
Julia stores matrices in column-major order, so you are going to have to adapt a bit to that.
If you have 200 vectors of length 10000, you should first make an empty vector, a = []; this will become your matrix.
Then you have to vcat the first vector to your empty vector, like so:
# v = your vectors, however they are defined
a = []
a = vcat(a, v[1])
Then you can iterate through vectors 2:200 by
for i in 2:200
a = hcat(a,v[i])
end
And finally transpose a
a = a'
Alternatively, you could do
a = zeros(200,10000)
for i in 1:length(v)
a[i,:] = v[i]
end
but I suppose that won't be as fast, if performance is at all an issue, because, as I said, Julia stores matrices in column-major order, so row-wise access will be slower.
EDIT from reschu's comment
a = zeros(10000,200)
for i in 1:length(v)
a[:,i] = v[i]
end
a = a'

Vectorizing distance calculation between vectors

I have a 3 X 1000 (and later 3 X 10 000) matrix cord given, which contains the three dimensional coordinates for my pixels.
My intention is to calculate the distance between all the pixels, and I do it with a for loop (see below), but I will have to calculate this for huge matrices soon, and am wondering if I could vectorize the code for making it faster...?
dist = zeros(size(cord,2),size(cord,2));
for i = 1:size(cord,2)
for j = 1:size(cord,2)
dist(i,j) = norm(cord(:,i)-cord(:,j));
dist(j,i) = dist(i,j);
end
end
pdist does exactly that. squareform is needed to get the result in the form of a square, symmetric matrix:
dist = squareform(pdist(cord.'));
Approach 1 (vectorized approach with bsxfun) -
squeeze(sqrt(sum(bsxfun(@minus,cord,permute(cord,[1 3 2])).^2)))
Not sure if this will be faster though.
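To see what the bsxfun expression computes, here is a small check against the double loop from the question. It is a sketch on a made-up random input; dist_loop, diffs, dist_bsx and max_err are names chosen for this example.

```matlab
cord = rand(3,50);           % small random test case
n = size(cord,2);

% Reference: double loop with norm, as in the question.
dist_loop = zeros(n);
for i = 1:n
    for j = 1:n
        dist_loop(i,j) = norm(cord(:,i) - cord(:,j));
    end
end

% bsxfun version: cord is 3 x n, the permuted copy is 3 x 1 x n,
% so the difference array is 3 x n x n and is summed over dimension 1.
diffs = bsxfun(@minus, cord, permute(cord,[1 3 2]));
dist_bsx = squeeze(sqrt(sum(diffs.^2, 1)));

% Largest absolute difference between the two results.
max_err = max(abs(dist_loop(:) - dist_bsx(:)));
```

Note that the intermediate array holds 3*n*n doubles, which for n = 10000 is about 2.4 GB, so memory may become the limiting factor before speed does.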
Approach 2 -
Inspired by this very smart approach and all credits to the poster. The code posted here is just slightly customized for your case and hopefully slightly better in terms of runtime. Here it is -
A = cord'; %//'
numA = size(cord,2);
helpA = ones(numA,9);
helpB = ones(numA,9);
for idx = 1:3
sqA_idx = A(:,idx).^2;
helpA(:,3*idx-1:3*idx) = [-2*A(:,idx), sqA_idx ];
helpB(:,3*idx-2:3*idx-1) = [sqA_idx , A(:,idx)];
end
dist1 = sqrt(helpA * helpB'); %// desired output
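The reason helpA * helpB' works is the expansion (a - b)^2 = a^2 - 2*a*b + b^2 applied per coordinate, which turns the whole computation into a single matrix product. A more compact sketch of the same identity follows. It is not the code from the credited post; sq, D2, dist2, ref and err are names chosen here, and the max(D2, 0) guard is my own assumption, added so that rounding errors cannot put tiny negative values under the square root.

```matlab
cord = rand(3,50);               % made-up test input
A = cord.';                      % n x 3, one point per row
sq = sum(A.^2, 2);               % n x 1 vector of squared norms

% D2(i,j) = sq(i) + sq(j) - 2*dot(A(i,:), A(j,:))
D2 = bsxfun(@plus, sq, sq.') - 2*(A*A.');
dist2 = sqrt(max(D2, 0));        % clamp rounding noise below zero

% Spot check against a direct computation for one pair of points.
ref = norm(cord(:,3) - cord(:,7));
err = abs(dist2(3,7) - ref);
```

One caveat with this family of approaches: diagonal entries can come out as very small nonzero values instead of exact zeros, because the three terms do not always cancel exactly in floating point.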
From your code, you have recognized that the dist matrix is symmetric
dist(i,j) = norm(cord(:,i)-cord(:,j));
dist(j,i) = dist(i,j);
You could change the inner loop to account for this and reduce by roughly one half the number of calculations needed
for j = i:size(cord,2)
Further, we can avoid the dist(j,i) = dist(i,j); at each iteration and just do that at the end by extracting the upper triangle part of dist and adding its transpose to the dist matrix to account for the symmetry
dist = zeros(size(cord,2),size(cord,2));
for i = 1:size(cord,2)
for j = i:size(cord,2)
dist(i,j) = norm(cord(:,i)-cord(:,j));
end
end
dist = dist + triu(dist)';
The above addition is fine since the main diagonal is all zeros.
It still performs poorly though and so we should take advantage of vectorization. We can do that as follows against the inner loop
dist = zeros(size(cord,2),size(cord,2));
for i = 1:size(cord,2)
dist(i,i+1:end) = sum((repmat(cord(:,i),1,size(cord,2)-i)-cord(:,i+1:end)).^2);
end
dist = dist + triu(dist)';
dist = sqrt(dist);
For every element in cord we need to calculate its distance with all other elements that follow it. We reproduce the element with repmat so that we can subtract it from every element that follows without the need for the loop. The differences are squared and summed and assigned to the dist matrix. We take care of the symmetry and then take the square root of the matrix to complete the norm operation.
With tic and toc, the original distance calculation with a random cord (cord = rand(3,num);) took ~93 seconds. This version took ~2.8 seconds.

Speeding up a nested for loop

I've been working on speeding up the following function, but with no results:
function beta = beta_c(k,c,gamma)
beta = zeros(size(k));
E = @(x) (1.453*x.^4)./((1 + x.^2).^(17/6));
for ii = 1:size(k,1)
for jj = 1:size(k,2)
E_int = integral(E,k(ii,jj),10000);
beta(ii,jj) = c*gamma/(k(ii,jj)*sqrt(E_int));
end
end
end
Up to now, I solved it this way:
function beta = beta_calc(k,c,gamma)
k_1d = reshape(k,[1,numel(k)]);
E_1d = @(k) 1.453.*k.^4./((1 + k.^2).^(17/6));
E_int = zeros(1,numel(k_1d));
parfor ii = 1:numel(k_1d)
E_int(ii) = quad(E_1d,k_1d(ii),10000);
end
beta_1d = c*gamma./(k_1d.*sqrt(E_int));
beta = reshape(beta_1d,[size(k,1),size(k,2)]);
end
It seems to me that it didn't really improve performance. What do you think about this?
Could you shed some light on it?
Thank you in advance.
EDIT
I am gonna introduce some theoretical background involving my question.
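Generally, beta is to be calculated as follows (matching the code above, where the upper limit 10000 stands in for infinity):
beta(k) = c*gamma / (k*sqrt(E_int(k))), where E_int(k) is the integral of E(x) from k to infinity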
Therefore, in the reduced case of unidimensional k array, E_int may be calculated as
E = 1.453.*k.^4./((1 + k.^2).^(17/6));
E_int = 1.5 - cumtrapz(k,E);
or, alternatively as
E_int(1) = 1.5;
for jj = 2:numel(k)
E = @(k) 1.453.*k.^4./((1 + k.^2).^(17/6));
E_int(jj) = E_int(jj - 1) - integral(E,k(jj-1),k(jj));
end
Nonetheless, k is currently a matrix k(size1,size2).
Here's another approach, parallelize, because it's easy using spmd or parfor. Instead of integral consider quad, see this link for examples...
I like this question.
The problem: the function integral accepts only scalars as integration limits. Hence, it is difficult to vectorize the computation of E_int.
A clue: there seems to be lot of redundancy in integrating the same function over and over from k(ii,jj) to infinity...
Proposed solution: How about sorting the values of k from smallest to largest and integrating E_sort_int(si) = integral( E, sortedK(si), sortedK(si+1) ); with sortedK( numel(k) + 1 ) = 10000;. Since E_int for a given k is the integral from that k up to 10000, the full value is the cumulative sum taken from the top end, E_int = fliplr( cumsum( fliplr( E_sort_int ) ) ); (you only need to "undo" the sorting and reshape it back to the size of k).
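The sorting idea could be sketched roughly as follows. The test matrix k and the names edges, seg, tail, order, ref and rel_err are assumptions for illustration. Note that the running sum has to be taken from the largest k downwards, because each E_int value is the integral from that k up to the upper limit.

```matlab
k = 0.5 + 4*rand(4,5);           % example matrix of lower limits (made up)
kmax = 10000;                    % upper limit used in the question
E = @(x) (1.453*x.^4)./((1 + x.^2).^(17/6));

% Sort the limits and append the common upper limit.
[sortedK, order] = sort(k(:));
edges = [sortedK; kmax];

% Integrate once per segment between consecutive sorted limits.
seg = zeros(numel(sortedK), 1);
for si = 1:numel(sortedK)
    seg(si) = integral(E, edges(si), edges(si+1));
end

% Integral from sortedK(si) to kmax = sum of segments si..end,
% i.e. a cumulative sum taken in reverse.
tail = flipud(cumsum(flipud(seg)));

% Undo the sort; linear indexing keeps the shape of k.
E_int = zeros(size(k));
E_int(order) = tail;

% Spot check one entry against a direct integral.
ref = integral(E, k(2,3), kmax);
rel_err = abs(E_int(2,3) - ref)/ref;
```

This still calls integral once per element, but each call covers only a short interval (except the last one), so there is less repeated work. Whether it is actually faster for a given k would need to be timed.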

How to speed this kind of for-loop?

I would like to compute the maximum of translated images along the direction of a given axis. I know about ordfilt2, however I would like to avoid using the Image Processing Toolbox.
So here is the code I have so far:
imInput = imread('tire.tif');
n = 10;
imMax = imInput(:, n:end);
for i = 1:(n-1)
imMax = max(imMax, imInput(:, i:end-(n-i)));
end
Is it possible to avoid using a for-loop in order to speed the computation up, and, if so, how?
First edit: Using Octave's code for im2col is actually 50% slower.
Second edit: Pre-allocating did not appear to improve the result enough.
sz = [size(imInput,1), size(imInput,2)-n+1];
range_j = 1:size(imInput, 2)-sz(2)+1;
range_i = 1:size(imInput, 1)-sz(1)+1;
B = zeros(prod(sz), length(range_j)*length(range_i));
counter = 0;
for j = range_j % left to right
for i = range_i % up to bottom
counter = counter + 1;
v = imInput(i:i+sz(1)-1, j:j+sz(2)-1);
B(:, counter) = v(:);
end
end
imMax = reshape(max(B, [], 2), sz);
Third edit: I shall show the timings.
For what it's worth, here's a vectorized solution using IM2COL function from the Image Processing Toolbox:
imInput = imread('tire.tif');
n = 10;
sz = [size(imInput,1) size(imInput,2)-n+1];
imMax = reshape(max(im2col(imInput, sz, 'sliding'),[],2), sz);
imshow(imMax)
You could perhaps write your own version of IM2COL as it simply consists of well crafted indexing, or even look at how Octave implements it.
Check out the answer to this question about doing a rolling median in C. I've successfully made it into a MEX function and it is way faster than even ordfilt2. It will take some work to adapt it to a max, but I'm sure it's possible.
Rolling median in C - Turlach implementation
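Another toolbox-free option on recent releases might be movmax, which as far as I know is part of base MATLAB from R2016a on. The sketch below assumes that function is available and uses a small made-up matrix instead of tire.tif; imMax2 and same are names chosen for the comparison.

```matlab
imInput = magic(12);             % small stand-in for the image (made up)
n = 4;

% Loop from the question.
imMax = imInput(:, n:end);
for i = 1:(n-1)
    imMax = max(imMax, imInput(:, i:end-(n-i)));
end

% Sliding maximum along dimension 2 over the current column and the
% n-1 columns after it, keeping only full windows.
imMax2 = movmax(imInput, [0 n-1], 2, 'Endpoints', 'discard');

% True when both methods give identical results.
same = isequal(imMax, imMax2);
```

With 'Endpoints' set to 'discard' the output has size(imInput,2)-n+1 columns, the same as the loop result. I have not timed it against the loop.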

Sorting rows of two matrices using same ordering [duplicate]

Suppose I have a matrix A and I sort the rows of this matrix. How do I replicate the same ordering on a matrix B (same size of course)?
E.g.
A = rand(3,4);
[val ind] = sort(A,2);
B = rand(3,4);
%// Reorder the elements of B according to the reordering of A
This is the best I've come up with
m = size(A,1);
B = B(bsxfun(@plus,(ind-1)*m,(1:m)'));
Out of curiosity, any alternatives?
Update: Jonas' excellent solution profiled on 2008a (XP):
n = n
0.048524 1.4632 1.4791 1.195 1.0662 1.108 1.0082 0.96335 0.93155 0.90532 0.88976
n = 2m
0.63202 1.3029 1.1112 1.0501 0.94703 0.92847 0.90411 0.8849 0.8667 0.92098 0.85569
It just goes to show that loops aren't anathema to MATLAB programmers anymore thanks to JITA (perhaps).
A somewhat clearer way to do this is to use a loop
A = rand(3,4);
B = rand(3,4);
[sortedA,ind] = sort(A,2);
for r = 1:size(A,1)
B(r,:) = B(r,ind(r,:));
end
Interestingly, the loop version is faster for small (<12 rows) and large (>~700 rows) square arrays (r2010a, OS X). The more columns there are relative to rows, the better the loop performs.
Here's the code I quickly hacked up for testing:
siz = 10:100:1010;
tt = zeros(100,2,length(siz));
for s = siz
for k = 1:100
A = rand(s,1*s);
B = rand(s,1*s);
[sortedA,ind] = sort(A,2);
tic;
for r = 1:size(A,1)
B(r,:) = B(r,ind(r,:));
end,tt(k,1,s==siz) = toc;
tic;
m = size(A,1);
B = B(bsxfun(@plus,(ind-1)*m,(1:m).'));
tt(k,2,s==siz) = toc;
end
end
m = squeeze(mean(tt,1));
m(1,:)./m(2,:)
For square arrays
ans =
0.7149 2.1508 1.2203 1.4684 1.2339 1.1855 1.0212 1.0201 0.8770 0.8584 0.8405
For twice as many columns as there are rows (same number of rows)
ans =
0.8431 1.2874 1.3550 1.1311 0.9979 0.9921 0.8263 0.7697 0.6856 0.7004 0.7314
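Before comparing timings it is worth confirming that the two variants produce the same reordering. A small sketch on made-up data; B_loop, B_lin, A_lin, same and ok are names introduced for this check:

```matlab
A = rand(5,4);
B = rand(5,4);
[sortedA, ind] = sort(A, 2);

% Loop version: reorder each row of B with that row's sort indices.
B_loop = B;
for r = 1:size(A,1)
    B_loop(r,:) = B(r, ind(r,:));
end

% Linear-index version: element (r,c) comes from column ind(r,c) of row r,
% which is linear index (ind(r,c)-1)*m + r in column-major storage.
m = size(A,1);
B_lin = B(bsxfun(@plus, (ind-1)*m, (1:m).'));

% True when loop and linear-index versions agree.
same = isequal(B_loop, B_lin);

% Applying the same indexing to A must reproduce the sorted A.
A_lin = A(bsxfun(@plus, (ind-1)*m, (1:m).'));
ok = isequal(A_lin, sortedA);
```

Both comparisons use isequal rather than a tolerance because the values are only moved around, not recomputed.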
sort() returns the index along the dimension you sorted on. You can explicitly construct indexes for the other dimensions that cause the rows to remain stable, and then use linear indexing to rearrange the whole array.
A = rand(3,4);
B = A; %// Start with same values so we can programmatically check result
[A2 ix2] = sort(A,2);
%// ix2 is the index along dimension 2, and we want dimension 1 to remain unchanged
ix1 = repmat([1:size(A,1)]', [1 size(A,2)]); %//'
%// Convert to linear index equivalent of the reordering of the sort() call
ix = sub2ind(size(A), ix1, ix2)
%// And apply it
B2 = B(ix)
ok = isequal(A2, B2) %// confirm reordering
Can't you just do this?
[val ind]=sort(A);
B=B(ind);
It worked for me, unless I'm understanding your problem wrong.
