I am calculating vector multiply vector and vector multiply matrix multiply vector in Julia.
The orignal code is below.
using LinearAlgebra
n=10000
vector1 = randn(n,1)
vector2 = randn(n,1)
matrix1 = randn(n,n)
matrix2 = randn(n,n)
want1 = transpose(vector1)*matrix1*vector1
want2 = transpose(vector1)*vector2
I would like to improve the speed for the calculation, so I am using mul! in LinearAlgebra.
w1 = similar(vector1')
mywant1 = similar(want1)
mywant2 = similar(want2)
mul!(w1,vector1',matrix1)
mul!(mywant1,w1,vector1)
mul!(mywant2,vector1',vector2)
When n is large, for example n=100000, the calculation without mul! is faster. I am just wondering if there is anway to improve the speed of the calculation as well as save memory?
Related
Suppose I have the following two matrices:
x = torch.randint(0, 256, (100000, 48), dtype=torch.uint8)
x_new = torch.randint(0, 256, (1000, 48), dtype=torch.uint8)
I wish to do a matrix multiplication like operation where I compare the 48 dimensions and sum up all the elements that are equal. The following operation takes 7.81 seconds. Batching does not seem to help:
matrix = (x_new.unsqueeze(1) == x).sum(dim=-1)
However, doing a simple matrix multiplication (matrix = x_new # x.T) takes 3.54 seconds. I understand this is most likely calling a deeper library that isn't slowed down by python. However, the question is, is there a way to speed up the multiplication like operation? by using scripting, or any other way at all?
What is even stranger though is that if I do matrix = x_new.float() # x.float().T this operation takes 214ms. This is more than 10x faster than the uint8 multiplication.
For context, I am trying to quantize vectors so that I can find the closest vector by comparing integers than directly doing dot products.
How to vectorize this code in MATLAB?
If possible, I wish matrix B to be a sparse matrix.
%% Y is a matrix l*n
%% X is a matrix k*n
B = [];
for i=1:l
for j=1:n
temp1 = zeros(1,n*l);
temp1((i-1)*n+j) = -1;
temp2 = zeros(1,l*k);
temp2((i-1)*k+1:i*k) = (-Y(i,j)).*(X(:,j)');
B = [B;[temp1,temp2]];
end
end
I don't know how to vectorize this code, please help! Thanks!
Making use of bsxfun for masking and vectorized calculations of the elementwise multiplications, here's a vectorized approach -
%// Create left part of the output that is basically an identity matrix
parte1 = -eye(n*l);
%// Setup right part of output
parte2 = zeros(n*l,l*k);
%// Mask to set elements from the calculations of (-Y(i,j)).*(X(:,j)')
M = bsxfun(#eq,reshape(repmat(1:l,n,1),[],1),reshape(repmat(1:l,k,1),1,[]));
%// OR concisely : M = kron(eye(l),ones(n,k))==1
%// Perform vectorized calculations of (-Y(i,j)).*(X(:,j)') and set those
%// into second part at masked places
parte2(M) = -bsxfun(#times,permute(X,[2,1,3]),permute(Y,[2,3,1]));
%// Finally concatenate those parts for final output
out = [parte1,parte2];
I am implementing the partial derivative equations from the Horn & Schunck paper on optical flow. However, even for relative small images (320x568), it takes a frustratingly long time (~30-40 seconds) to complete. I assume this is due to the 320 x 568 = 181760 loop iterations, but I can't figure out a more efficient way to do this (short of a MEX file).
Is there some way to turn this into a more efficient MATLAB operation (a convolution perhaps)? I can figure out how to do this as a convolution for It but not Ix and Iy. I've also considered matrix shifting, but that only works for It as well, as far as I can figure out.
Has anyone else run into this issue and found a solution?
My code is below:
function [Ix, Iy, It] = getFlowParams(img1, img2)
% Make sure image dimensions match up
assert(size(img1, 1) == size(img2, 1) && size(img1, 2) == size(img2, 2), ...
'Images must be the same size');
assert(size(img1, 3) == 1, 'Images must be grayscale');
% Dimensions of original image
[rows, cols] = size(img1);
Ix = zeros(numel(img1), 1);
Iy = zeros(numel(img1), 1);
It = zeros(numel(img1), 1);
% Pad images to handle edge cases
img1 = padarray(img1, [1,1], 'post');
img2 = padarray(img2, [1,1], 'post');
% Concatenate i-th image with i-th + 1 image
imgs = cat(3, img1, img2);
% Calculate energy for each pixel
for i = 1 : rows
for j = 1 : cols
cube = imgs(i:i+1, j:j+1, :);
Ix(sub2ind([rows, cols], i, j)) = mean(mean(cube(:, 2, :) - cube(:, 1, :)));
Iy(sub2ind([rows, cols], i, j)) = mean(mean(cube(2, :, :) - cube(1, :, :)));
It(sub2ind([rows, cols], i, j)) = mean(mean(cube(:, :, 2) - cube(:, :, 1)));
end
end
2D convolution is the way to go here as also predicted in the question to replace those heavy mean/average calculations. Also, those iterative differentiations could be replaced by MATLAB's diff. Thus, incorporating all that, a vectorized implementation would be -
%// Pad images to handle edge cases
img1 = padarray(img1, [1,1], 'post');
img2 = padarray(img2, [1,1], 'post');
%// Store size parameters for later usage
[m,n] = size(img1);
%// Differentiation along dim-2 on input imgs for Ix calculations
df1 = diff(img1,[],2)
df2 = diff(img2,[],2)
%// 2D Convolution to simulate average calculations & reshape to col vector
Ixvals = (conv2(df1,ones(2,1),'same') + conv2(df2,ones(2,1),'same'))./4;
Ixout = reshape(Ixvals(1:m-1,:),[],1);
%// Differentiation along dim-1 on input imgs for Iy calculations
df1 = diff(img1,[],1)
df2 = diff(img2,[],1)
%// 2D Convolution to simulate average calculations & reshape to col vector
Iyvals = (conv2(df1,ones(1,2),'same') + conv2(df2,ones(1,2),'same'))./4
Iyout = reshape(Iyvals(:,1:n-1),[],1);
%// It just needs elementwise diffentiation between input imgs.
%// 2D convolution to simulate mean calculations & reshape to col vector
Itvals = conv2(img2-img1,ones(2,2),'same')./4
Itout = reshape(Itvals(1:m-1,1:n-1),[],1)
Benefits with such a vectorized implementation would be :
Memory efficiency : No more concatenation along the third dimension that would incur memory overhead. Again, performance wise it would be a benefit as we won't need to index into such heavy arrays.
The iterative differentiations inside the loopy codes are replaced by differentiation with diff, so this should be another improvement.
Those expensive average calculations are replaced by very fast convolution calculations and this should be the major improvement section.
A faster method, with improved results (in the cases I have observed) is the following:
function [Ix, Iy, It] = getFlowParams(imNew,imPrev)
gg = [0.2163, 0.5674, 0.2163];
f = imNew + imPrev;
Ix = f(:,[2:end end]) - f(:,[1 1:(end-1)]);
Ix = conv2(Ix,gg','same');
Iy = f([2:end end],:) - f([1 1:(end-1)],:);
Iy = conv2(Iy,gg ,'same');
It = 2*conv2(gg,gg,imNew - imPrev,'same');
This handles the boundary cases elegantly.
I made this as part of my optical flow toolbox, where you can easily view H&S, Lucas Kanade and more in realtime. In the toolbox, the function is called grad3D.m. You may also want to check out grad3Drec.m, in the same toolbox, which adds simple temporal blurring.
I have the following setup
matrix2D_1 = zeros(40,191);
matrix2D_2 = zeros(40,191);
matrix3D_1 = zeros(40,191,191);
for j = 1:40
for jw = 1:191
matrix2D_1(j,jw) = sum(squeeze(matrix3D_1(j,jw,:))'*matrix2D_2' );
end
end
so I want the sum of all products of the 3rd dimension of the 3D matrix with the elements of the of the first 2D matrix which is the matrix product in
squeeze(matrix3D_1(j,jw,:))'*matrix2D_2'
The sum of these results shall then be stored in the first 2D matrix.
As I have to run this in a large loop this takes the most time in my code. I cant get my head around it how to vectorize it in a more elegant way. Any faster solution would be higly appreciated....
Yup! Use matrix-multiplication and reshape magic -
M = size(matrix2D_1,2);
matrix2D_1 = reshape(sum(reshape(matrix3D_1,[],M)*matrix2D_2.',2),[],M)
Or sum and then do matrix-multiplication -
matrix2D_1 = reshape(reshape(matrix3D_1,[],M)*sum(matrix2D_2,1).',[],M)
I used the last two dimensions of this 3D matrix as a 2D matrix. So I just wanna multiply the 2D matrix from the result of Matrix1(i,:,:) (which i-by-i) with the vector Matrix2(i,:).') (which is 1-by-i).
The only way I could do that was using an auxiliary matrix that picked up all the numbers from the 2 dimensions from the 3D matrix:
matrixAux(:,:) = Matrix1(1,:,:)
and then I did the multiplication:
matrixAux * (Matrix2(i,:).')
and it worked. However, this is slow because I need to copy all the 3D matrix to a lot of auxiliary matrices, and I need to speed up my code because I'm doing the same operation many times.
How can I do that more efficiently, without having to copy the matrix?
Approach I: bsxfun Multiplication
One approach would be to use the output of bsxfun with #times whose values you can use instead of calculating the matrix multiplication results in a loop -
sum(bsxfun(#times,Matrix1,permute(Matrix2,[1 3 2])),3).'
Example
As an example, let's suppose Matrix1 and Matrix2 are defined like this -
nrows = 3;
p = 6;
ncols = 2;
Matrix1 = rand(nrows,ncols,p)
Matrix2 = rand(nrows,p)
Then, you have your loop like this -
for i = 1:size(Matrix1,1)
matrixAux(:,:) = Matrix1(i,:,:);
matrix_mult1 = matrixAux * (Matrix2(i,:).') %//'
end
So, instead of the loops, you can directly calculate the matrix multiplication results -
matrix_mult2 = sum(bsxfun(#times,Matrix1,permute(Matrix2,[1 3 2])),3).'
Thus, each column of matrix_mult2 would represent matrix_mult1 at each iteration of the loop, as the output of the codes would make it clearer -
matrix_mult1 =
0.7693
0.8690
matrix_mult1 =
1.0649
1.2574
matrix_mult1 =
1.2949
0.6222
matrix_mult2 =
0.7693 1.0649 1.2949
0.8690 1.2574 0.6222
Approach II: "Full" Matrix Multiplication
Now, this must be exciting! Well you can also leverage MATLAB's fast matrix multiplication to get your intermediate matrix multiplication results again without loops. If Matrix1 is nrows x p x ncols, you can reshape it to nrows*p x ncols and then perform matrix multiplication of it with Matrix2. Then, to get an equivalent of matrix_mult2, you need to select indices from the multiplication result. This is precisely achieved here -
%// Get size of Matrix1 to be used regularly inside the codes later on
[m1,n1,p1] = size(Matrix1);
%// Convert 3D Matrix1 to 2D and thus perform "full" matrix multiplication
fmult = reshape(Matrix1,m1*n1,p1)*Matrix2'; %//'
%// Get valid indices
ind = bsxfun(#plus,[1:m1:size(fmult,1)]',[0:nrows-1]*(size(fmult,1)+1)); %//'
%// Get values from the full matrix multiplication result
matrix_mult3 = fmult(ind);
Here, matrix_mult3 must be the same as matrix_mult2.
Observations: Since we are not using all the values calculated from the full matrix multiplication, rather indexing into it and selecting few of its elements, as such this approach performs better than the other approaches under certain circumstances. This approach seems to be the best one when nrows is a small value as we would be using more number of elements from the full matrix multiplication output in that case.
Benchmark Results
Two cases were tested against the three approaches and the test results seem to support our hypotheses discussed earlier.
Case 1
Matrix 1 as 400 x 400 x 400, the runtimes are -
--------------- With Loops
Elapsed time is 2.253536 seconds.
--------------- With BSXFUN
Elapsed time is 0.910104 seconds.
--------------- With Full Matrix Multiplication
Elapsed time is 4.361342 seconds.
Case 2
Matrix 1 as 40 x 2000 x 2000, the runtimes are -
--------------- With Loops
Elapsed time is 5.402487 seconds.
--------------- With BSXFUN
Elapsed time is 2.585860 seconds.
--------------- With Full Matrix Multiplication
Elapsed time is 1.516682 seconds.