Reduce number of allocations in matrix multiplication in Julia?

Is there any way to reduce the number of allocations when multiplying matrices in Julia? I have attached a screenshot that shows the allocation count I would like to reduce.

You can go down to zero allocations if you pre-allocate the output matrix and use the in-place mul! from LinearAlgebra:
julia> using LinearAlgebra
julia> x = rand(600, 600);
julia> y = rand(600, 600);
julia> z = zeros(600, 600);
julia> @allocated mul!(z, x, y)
0
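mul!(z, x, y) overwrites z with the product of x and y, so the same buffer can be reused across calls. As a quick sanity check (not part of the original answer) that the in-place result matches the allocating product:
julia> z ≈ x * y
true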

Related

Speeding up matrix-multiplication-like operations

Suppose I have the following two matrices:
x = torch.randint(0, 256, (100000, 48), dtype=torch.uint8)
x_new = torch.randint(0, 256, (1000, 48), dtype=torch.uint8)
I wish to do a matrix-multiplication-like operation where I compare along the 48 dimensions and count the elements that are equal. The following operation takes 7.81 seconds, and batching does not seem to help:
matrix = (x_new.unsqueeze(1) == x).sum(dim=-1)
However, doing a simple matrix multiplication (matrix = x_new @ x.T) takes 3.54 seconds. I understand this is most likely calling a lower-level library that isn't slowed down by Python. However, the question is: is there a way to speed up this multiplication-like operation, by using scripting or any other means?
What is even stranger is that matrix = x_new.float() @ x.float().T takes 214 ms, which is more than 10x faster than the uint8 multiplication.
For context, I am trying to quantize vectors so that I can find the closest vector by comparing integers rather than directly computing dot products.

Improve the speed of vector * matrix * vector multiplication

I am calculating vector times vector and vector times matrix times vector products in Julia.
The original code is below.
using LinearAlgebra
n=10000
vector1 = randn(n,1)
vector2 = randn(n,1)
matrix1 = randn(n,n)
matrix2 = randn(n,n)
want1 = transpose(vector1)*matrix1*vector1
want2 = transpose(vector1)*vector2
I would like to improve the speed of the calculation, so I am using mul! from LinearAlgebra.
w1 = similar(vector1')
mywant1 = similar(want1)
mywant2 = similar(want2)
mul!(w1,vector1',matrix1)
mul!(mywant1,w1,vector1)
mul!(mywant2,vector1',vector2)
When n is large, for example n = 100000, the calculation without mul! is faster. Is there any way to improve the speed of the calculation while also saving memory?
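One possible direction, sketched here as a suggestion rather than a definitive answer: store the vectors as 1-D arrays (randn(n) instead of randn(n, 1)), preallocate a single length-n buffer, and combine mul! with dot so that no 1-by-n or 1-by-1 matrices are created:
using LinearAlgebra
n = 10_000
v1 = randn(n)        # 1-D vectors instead of n-by-1 matrices
v2 = randn(n)
M1 = randn(n, n)
w  = similar(v1)     # reusable buffer for M1 * v1
mul!(w, M1, v1)      # in-place matrix-vector product (BLAS gemv)
want1 = dot(v1, w)   # scalar result of transpose(v1) * M1 * v1
want2 = dot(v1, v2)  # scalar inner product, no temporaries
LinearAlgebra also provides the three-argument dot(v1, M1, v1), which computes the same quadratic form with no buffer at all; whether it beats the mul!-plus-dot combination for large dense matrices would need to be benchmarked.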

Counting sticks that touch red lines in Julia

I want to randomly "scatter" sticks of length 1 as shown in the diagram.
I also want to count the sticks that touched the red lines.
My approach was to create normalized vectors that are oriented randomly in space.
The problem is that they are all at the origin, and I am also not sure how to identify and count those that touch the red lines:
using LinearAlgebra
a = [randn(), randn()];
line = a/norm(a) # normalized vector in a random direction
Here is one way to approach your question using simulation:
julia> using Statistics
julia> function gen_point()
           α = rand() * 2π
           range_x = 5      # anything, as your lines are horizontal
           range_y::Int = 5 # must be a positive integer
           @assert range_y >= 1
           x0 = [rand() * range_x, rand() * range_y]
           xd = [cos(α), sin(α)]
           return (x0, x0 .+ xd)
       end
gen_point (generic function with 1 method)
julia> intersects(point) = floor(point[1][2]) != floor(point[2][2])
intersects (generic function with 1 method)
julia> mean((intersects(gen_point()) for _ in 1:100_000))
0.63731
julia> 2/π # our simulation recovers the theoretical result
0.6366197723675814
Some comments:
In my solution I sample the angle in the (0, 2π) range;
I use range_x and range_y to define the rectangle in which the sticks are scattered (range_x can be anything, but it is important that range_y is an integer);
I have not optimized the code for speed but for simplicity. In my gen_point function I generate a pair of 2-element vectors holding the (x, y) locations of the endpoints of the stick. The intersects function, as you can see, is quite simple: if the y coordinates of the two endpoints do not have the same integer part, the stick must cross a horizontal line of the form y = i, where i is an integer (I ignore the case where a sampled endpoint has an exactly integer y value, as its probability is negligible);
note that your plot is incorrect: the x and y axes are scaled differently, so the sticks you have drawn do not all have length 1 (this is a side comment and does not affect the solution).
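Since the answer above is written for simplicity rather than speed, here is a minimal allocation-free sketch of the same simulation (my own variant, not from the answer), under the same assumptions: unit-length sticks and horizontal lines at every integer y, so only the y coordinates of the endpoints matter:
function crossing_fraction(n)
    hits = 0
    for _ in 1:n
        α  = rand() * 2π
        y0 = rand() * 5          # y range spans whole units, as above
        y1 = y0 + sin(α)         # y coordinate of the other endpoint
        hits += floor(y0) != floor(y1)
    end
    return hits / n
end
crossing_fraction(100_000) should again come out close to 2/π ≈ 0.6366.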

Compute the mean absolute error between two images in MATLAB

I want to compute the mean absolute error between two images in MATLAB and name it MAE.
Here is the code:
x=imread('duck.jpg');
imshow(x)
xmin=min(x);
xmax = max(x);
xmean=mean(x);
I = double(x) / 255;
v = var(I(:));
y = imnoise(x, 'gaussian', 0, v / 10);
y = double(y) / 255;
imshow(y)
There's no need to evaluate min(), max(), or mean() of the first image in order to evaluate the MAE.
Since the MAE is the sum of the absolute (L1-norm) differences between corresponding pixels of the two images, divided by the number of pixels, you can simply evaluate it as:
MAE = sum(abs(I(:)-y(:)))/numel(I);
where numel() is a function that returns the number of elements of its argument. Note that I is the normalized double-precision version of x that you already computed (I = double(x)/255), so it is on the same [0, 1] scale as y; since I and y have the same number of elements, you can use either numel(I) or numel(y).
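In formula form, with N denoting the number of pixels, the quantity being computed is
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert I_i - y_i \rvert
where I_i and y_i are the corresponding pixel values of the two normalized images.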

How can I perform this array-slicing and multiplication operation more efficiently?

I used the last two dimensions of this 3D matrix as a 2D matrix. So I just want to multiply the 2D matrix obtained from Matrix1(i,:,:) with the column vector Matrix2(i,:).'.
The only way I could do that was to use an auxiliary matrix that collects all the entries along the last two dimensions of the 3D matrix:
matrixAux(:,:) = Matrix1(1,:,:)
and then I did the multiplication:
matrixAux * (Matrix2(i,:).')
and it worked. However, this is slow because I need to copy slices of the 3D matrix into a lot of auxiliary matrices, and I need to speed up my code because I am doing the same operation many times.
How can I do that more efficiently, without having to copy the matrix?
Approach I: bsxfun Multiplication
One approach would be to use the output of bsxfun with @times, whose values you can use instead of calculating the matrix multiplication results in a loop -
sum(bsxfun(@times,Matrix1,permute(Matrix2,[1 3 2])),3).'
Example
As an example, let's suppose Matrix1 and Matrix2 are defined like this -
nrows = 3;
p = 6;
ncols = 2;
Matrix1 = rand(nrows,ncols,p)
Matrix2 = rand(nrows,p)
Then, you have your loop like this -
for i = 1:size(Matrix1,1)
    matrixAux(:,:) = Matrix1(i,:,:);
    matrix_mult1 = matrixAux * (Matrix2(i,:).') %//'
end
So, instead of the loops, you can directly calculate the matrix multiplication results -
matrix_mult2 = sum(bsxfun(@times,Matrix1,permute(Matrix2,[1 3 2])),3).'
Thus, each column of matrix_mult2 represents matrix_mult1 at the corresponding iteration of the loop, as the output of the code makes clearer -
matrix_mult1 =
0.7693
0.8690
matrix_mult1 =
1.0649
1.2574
matrix_mult1 =
1.2949
0.6222
matrix_mult2 =
0.7693 1.0649 1.2949
0.8690 1.2574 0.6222
Approach II: "Full" Matrix Multiplication
Now, this must be exciting! Well, you can also leverage MATLAB's fast matrix multiplication to get the intermediate matrix multiplication results, again without loops. If Matrix1 is nrows x ncols x p, you can reshape it to (nrows*ncols) x p and then multiply it by Matrix2' (which is p x nrows). Then, to get an equivalent of matrix_mult2, you need to select the right indices from the multiplication result. This is precisely what is done here -
%// Get size of Matrix1 to be used regularly inside the codes later on
[m1,n1,p1] = size(Matrix1);
%// Convert 3D Matrix1 to 2D and thus perform "full" matrix multiplication
fmult = reshape(Matrix1,m1*n1,p1)*Matrix2'; %//'
%// Get valid indices
ind = bsxfun(@plus,[1:m1:size(fmult,1)]',[0:nrows-1]*(size(fmult,1)+1)); %//'
%// Get values from the full matrix multiplication result
matrix_mult3 = fmult(ind);
Here, matrix_mult3 must be the same as matrix_mult2.
Observations: Since we are not using all of the values calculated by the full matrix multiplication, but rather indexing into it and selecting only some of its elements, this approach performs better than the others only under certain circumstances. It seems to be the best one when nrows is small, as in that case a larger fraction of the full matrix multiplication output is actually used.
Benchmark Results
Two cases were tested against the three approaches, and the results seem to support the hypotheses discussed above.
Case 1
With Matrix1 of size 400 x 400 x 400, the runtimes are -
--------------- With Loops
Elapsed time is 2.253536 seconds.
--------------- With BSXFUN
Elapsed time is 0.910104 seconds.
--------------- With Full Matrix Multiplication
Elapsed time is 4.361342 seconds.
Case 2
With Matrix1 of size 40 x 2000 x 2000, the runtimes are -
--------------- With Loops
Elapsed time is 5.402487 seconds.
--------------- With BSXFUN
Elapsed time is 2.585860 seconds.
--------------- With Full Matrix Multiplication
Elapsed time is 1.516682 seconds.
