Vectorization for array multiplication

Vectorization for array multiplication - performance

I have an 3d array U, and a 2d matrix A. I want to do the multiplication like following way. How can I vectorize my code? The loop is too slow, of course.
for j=1:N
for k=1:N
UU(:,j,k)=A*U(:,j,k);
end
end

Reshape U to 2D and perform the matrix-multiplication, thus reducing the first axis/dimension of U with A's last axis to give us a 2D array. Finally, reshape back to 3D for the final result, like so -
[m1,n1] = size(A);
[~,m2,n2] = size(U);
out = reshape(A*reshape(U,[n1,m2*n2]),[m1,m2,n2])

Depending on the size of your matrices, you might find that eliminating both loops chews up a lot of memory, and that removing just the loop over the columns is sufficient,
for k = 1:N
UU(:,:,k) = A*U(:,:,k);
end

Related

Vectorize code in MATLAB

How to vectorize this code in MATLAB?
If possible, I wish matrix B to be a sparse matrix.
%% Y is a matrix l*n
%% X is a matrix k*n
B = [];
for i=1:l
for j=1:n
temp1 = zeros(1,n*l);
temp1((i-1)*n+j) = -1;
temp2 = zeros(1,l*k);
temp2((i-1)*k+1:i*k) = (-Y(i,j)).*(X(:,j)');
B = [B;[temp1,temp2]];
end
end
I don't know how to vectorize this code, please help! Thanks!

Making use of bsxfun for masking and vectorized calculations of the elementwise multiplications, here's a vectorized approach -
%// Create left part of the output that is basically an identity matrix
parte1 = -eye(n*l);
%// Setup right part of output
parte2 = zeros(n*l,l*k);
%// Mask to set elements from the calculations of (-Y(i,j)).*(X(:,j)')
M = bsxfun(#eq,reshape(repmat(1:l,n,1),[],1),reshape(repmat(1:l,k,1),1,[]));
%// OR concisely : M = kron(eye(l),ones(n,k))==1
%// Perform vectorized calculations of (-Y(i,j)).*(X(:,j)') and set those
%// into second part at masked places
parte2(M) = -bsxfun(#times,permute(X,[2,1,3]),permute(Y,[2,3,1]));
%// Finally concatenate those parts for final output
out = [parte1,parte2];

Vectorize 2D 3D matrix product for speedup

I have the following setup
matrix2D_1 = zeros(40,191);
matrix2D_2 = zeros(40,191);
matrix3D_1 = zeros(40,191,191);
for j = 1:40
for jw = 1:191
matrix2D_1(j,jw) = sum(squeeze(matrix3D_1(j,jw,:))'*matrix2D_2' );
end
end
so I want the sum of all products of the 3rd dimension of the 3D matrix with the elements of the of the first 2D matrix which is the matrix product in
squeeze(matrix3D_1(j,jw,:))'*matrix2D_2'
The sum of these results shall then be stored in the first 2D matrix.
As I have to run this in a large loop this takes the most time in my code. I cant get my head around it how to vectorize it in a more elegant way. Any faster solution would be higly appreciated....

Yup! Use matrix-multiplication and reshape magic -
M = size(matrix2D_1,2);
matrix2D_1 = reshape(sum(reshape(matrix3D_1,[],M)*matrix2D_2.',2),[],M)
Or sum and then do matrix-multiplication -
matrix2D_1 = reshape(reshape(matrix3D_1,[],M)*sum(matrix2D_2,1).',[],M)

Best way to find all points of lattice in sphere

Given a bunch of arbitrary vectors (stored in a matrix A) and a radius r, I'd like to find all integer-valued linear combinations of those vectors which land inside a sphere of radius r. The necessary coordinates I would then store in a Matrix V. So, for instance, if the linear combination
K=[0; 1; 0]
lands inside my sphere, i.e. something like
if norm(A*K) <= r then
V(:,1)=K
end
etc.
The vectors in A are sure to be the simplest possible basis for the given lattice and the largest vector will have length 1. Not sure if that restricts the vectors in any useful way but I suspect it might. - They won't have as similar directions as a less ideal basis would have.
I tried a few approaches already but none of them seem particularly satisfying. I can't seem to find a nice pattern to traverse the lattice.
My current approach involves starting in the middle (i.e. with the linear combination of all 0s) and go through the necessary coordinates one by one. It involves storing a bunch of extra vectors to keep track of, so I can go through all the octants (in the 3D case) of the coordinates and find them one by one. This implementation seems awfully complex and not very flexible (in particular it doesn't seem to be easily generalizable to arbitrary numbers of dimension - although that isn't strictly necessary for the current purpose, it'd be a nice-to-have)
Is there a nice* way to find all the required points?
(*Ideally both efficient and elegant**. If REALLY necessary, it wouldn't matter THAT much to have a few extra points outside the sphere but preferably not that many more. I definitely do need all the vectors inside the sphere. - if it makes a large difference, I'm most interested in the 3D case.
**I'm pretty sure my current implementation is neither.)
Similar questions I found:
Find all points in sphere of radius r around arbitrary coordinate - this is actually a much more general case than what I'm looking for. I am only dealing with periodic lattices and my sphere is always centered at 0, coinciding with one point on the lattice.
But I don't have a list of points but rather a matrix of vectors with which I can generate all the points.
How to efficiently enumerate all points of sphere in n-dimensional grid - the case for a completely regular hypercubic lattice and the Manhattan-distance. I'm looking for completely arbitary lattices and euclidean distance (or, for efficiency purposes, obviously the square of that).

Offhand, without proving any assertions, I think that 1) if the set of vectors is not of maximal rank then the number of solutions is infinite; 2) if the set is of maximal rank, then the image of the linear transformation generated by the vectors is a subspace (e.g., plane) of the target space, which intersects the sphere in a lower-dimensional sphere; 3) it follows that you can reduce the problem to a 1-1 linear transformation (kxk matrix on a k-dimensional space); 4) since the matrix is invertible, you can "pull back" the sphere to an ellipsoid in the space containing the lattice points, and as a bonus you get a nice geometric description of the ellipsoid (principal axis theorem); 5) your problem now becomes exactly one of determining the lattice points inside the ellipsoid.
The latter problem is related to an old problem (counting the lattice points inside an ellipse) which was considered by Gauss, who derived a good approximation. Determining the lattice points inside an ellipse(oid) is probably not such a tidy problem, but it probably can be reduced one dimension at a time (the cross-section of an ellipsoid and a plane is another ellipsoid).

I found a method that makes me a lot happier for now. There may still be possible improvements, so if you have a better method, or find an error in this code, definitely please share. Though here is what I have for now: (all written in SciLab)
Step 1: Figure out the maximal ranges as defined by a bounding n-parallelotope aligned with the axes of the lattice vectors. Thanks for ElKamina's vague suggestion as well as this reply to another of my questions over on math.se by chappers: https://math.stackexchange.com/a/1230160/49989
function I=findMaxComponents(A,r) //given a matrix A of lattice basis vectors
//and a sphere radius r,
//find the corners of the bounding parallelotope
//built from the lattice, and store it in I.
[dims,vecs]=size(A); //figure out how many vectors there are in A (and, unnecessarily, how long they are)
U=eye(vecs,vecs); //builds matching unit matrix
iATA=pinv(A'*A); //finds the (pseudo-)inverse of A^T A
iAT=pinv(A'); //finds the (pseudo-)inverse of A^T
I=[]; //initializes I as an empty vector
for i=1:vecs //for each lattice vector,
t=r*(iATA*U(:,i))/norm(iAT*U(:,i)) //find the maximum component such that
//it fits in the bounding n-parallelotope
//of a (n-1)-sphere of radius r
I=[I,t(i)]; //and append it to I
end
I=[-I;I]; //also append the minima (by symmetry, the negative maxima)
endfunction
In my question I only asked for a general basis, i.e, for n dimensions, a set of n arbitrary but linearly independent vectors. The above code, by virtue of using the pseudo-inverse, works for matrices of arbitrary shapes and, similarly, Scilab's "A'" returns the conjugate transpose rather than just the transpose of A so it equally should work for complex matrices.
In the last step I put the corresponding minimal components.
For one such A as an example, this gives me the following in Scilab's console:
A =
0.9701425 - 0.2425356 0.
0.2425356 0.4850713 0.7276069
0.2425356 0.7276069 - 0.2425356
r=3;
I=findMaxComponents(A,r)
I =
- 2.9494438 - 3.4186986 - 4.0826424
2.9494438 3.4186986 4.0826424
I=int(I)
I =
- 2. - 3. - 4.
2. 3. 4.
The values found by findMaxComponents are the largest possible coefficients of each lattice vector such that a linear combination with that coefficient exists which still land on the sphere. Since I'm looking for the largest such combinations with integer coefficients, I can safely drop the part after the decimal point to get the maximal plausible integer ranges. So for the given matrix A, I'll have to go from -2 to 2 in the first component, from -3 to 3 in the second and from -4 to 4 in the third and I'm sure to land on all the points inside the sphere (plus superfluous extra points, but importantly definitely every valid point inside) Next up:
Step 2: using the above information, generate all the candidate combinations.
function K=findAllCombinations(I) //takes a matrix of the form produced by
//findMaxComponents() and returns a matrix
//which lists all the integer linear combinations
//in the respective ranges.
v=I(1,:); //starting from the minimal vector
K=[];
next=1; //keeps track of what component to advance next
changed=%F; //keeps track of whether to add the vector to the output
while or(v~=I(2,:)) //as long as not all components of v match all components of the maximum vector
if v <= I(2,:) then //if each current component is smaller than each largest possible component
if ~changed then
K=[K;v]; //store the vector and
end
v(next)=v(next)+1; //advance the component by 1
next=1; //also reset next to 1
changed=%F;
else
v(1:next)=I(1,1:next); //reset all components smaller than or equal to the current one and
next=next+1; //advance the next larger component next time
changed=%T;
end
end
K=[K;I(2,:)]'; //while loop ends a single iteration early so add the maximal vector too
//also transpose K to fit better with the other functions
endfunction
So now that I have that, all that remains is to check whether a given combination actually does lie inside or outside the sphere. All I gotta do for that is:
Step 3: Filter the combinations to find the actually valid lattice points
function points=generatePoints(A,K,r)
possiblePoints=A*K; //explicitly generates all the possible points
points=[];
for i=possiblePoints
if i'*i<=r*r then //filter those that are too far from the origin
points=[points i];
end
end
endfunction
And I get all the combinations that actually do fit inside the sphere of radius r.
For the above example, the output is rather long: Of originally 315 possible points for a sphere of radius 3 I get 163 remaining points.
The first 4 are: (each column is one)
- 0.2425356 0.2425356 1.2126781 - 0.9701425
- 2.4253563 - 2.6678919 - 2.4253563 - 2.4253563
1.6977494 0. 0.2425356 0.4850713
so the remainder of the work is optimization. Presumably some of those loops could be made faster and especially as the number of dimensions goes up, I have to generate an awful lot of points which I have to discard, so maybe there is a better way than taking the bounding n-parallelotope of the n-1-sphere as a starting point.

Let us just represent K as X.
The problem can be represented as:
(a11x1 + a12x2..)^2 + (a21x1 + a22x2..)^2 ... < r^2
(x1,x2,...) will not form a sphere.

This can be done with recursion on dimension--pick a lattice hyperplane direction and index all such hyperplanes that intersect the r-radius ball. The ball intersection of each such hyperplane itself is a ball, in one lower dimension. Repeat. Here's the calling function code in Octave:
function lat_points(lat_bas_mx,rr)
% **globals for hyperplane lattice point recursive function**
clear global; % this seems necessary/important between runs of this function
global MLB;
global NN_hat;
global NN_len;
global INP; % matrix of interior points, each point(vector) a column vector
global ctr; % integer counter, for keeping track of lattice point vectors added
% in the pre-allocated INP matrix; will finish iteration with actual # of points found
ctr = 0; % counts number of ball-interior lattice points found
MLB = lat_bas_mx;
ndim = size(MLB)(1);
% **create hyperplane normal vectors for recursion step**
% given full-rank lattice basis matrix MLB (each vector in lattice basis a column),
% form set of normal vectors between successive, nested lattice hyperplanes;
% store them as columnar unit normal vectors in NN_hat matrix and their lengths in NN_len vector
NN_hat = [];
for jj=1:ndim-1
tmp_mx = MLB(:,jj+1:ndim);
tmp_mx = [NN_hat(:,1:jj-1),tmp_mx];
NN_hat(:,jj) = null(tmp_mx'); % null space of transpose = orthogonal to columns
tmp_len = norm(NN_hat(:,jj));
NN_hat(:,jj) = NN_hat(:,jj)/tmp_len;
NN_len(jj) = dot(MLB(:,jj),NN_hat(:,jj));
if (NN_len(jj)<0) % NN_hat(:,jj) and MLB(:,jj) must have positive dot product
% for cutting hyperplane indexing to work correctly
NN_hat(:,jj) = -NN_hat(:,jj);
NN_len(jj) = -NN_len(jj);
endif
endfor
NN_len(ndim) = norm(MLB(:,ndim));
NN_hat(:,ndim) = MLB(:,ndim)/NN_len(ndim); % the lowest recursion level normal
% is just the last lattice basis vector
% **estimate number of interior lattice points, and pre-allocate memory for INP**
vol_ppl = prod(NN_len); % the volume of the ndim dimensional lattice paralellepiped
% is just the product of the NN_len's (they amount to the nested altitudes
% of hyperplane "paralellepipeds")
vol_bll = exp( (ndim/2)*log(pi) + ndim*log(rr) - gammaln(ndim/2+1) ); % volume of ndim ball, radius rr
est_num_pts = ceil(vol_bll/vol_ppl); % estimated number of lattice points in the ball
err_fac = 1.1; % error factor for memory pre-allocation--assume max of err_fac*est_num_pts columns required in INP
INP = zeros(ndim,ceil(err_fac*est_num_pts));
% **call the (recursive) function**
% for output, global variable INP (matrix of interior points)
% stores each valid lattice point (as a column vector)
clp = zeros(ndim,1); % confirmed lattice point (start at origin)
bpt = zeros(ndim,1); % point at center of ball (initially, at origin)
rd = 1; % initial recursion depth must always be 1
hyp_fun(clp,bpt,rr,ndim,rd);
printf("%i lattice points found\n",ctr);
INP = INP(:,1:ctr); % trim excess zeros from pre-allocation (if any)
endfunction
Regarding the NN_len(jj)*NN_hat(:,jj) vectors--they can be viewed as successive (nested) altitudes in the ndim-dimensional "parallelepiped" formed by the vectors in the lattice basis, MLB. The volume of the lattice basis parallelepiped is just prod(NN_len)--for a quick estimate of the number of interior lattice points, divide the volume of the ndim-ball of radius rr by prod(NN_len). Here's the recursive function code:
function hyp_fun(clp,bpt,rr,ndim,rd)
%{
clp = the lattice point we're entering this lattice hyperplane with
bpt = location of center of ball in this hyperplane
rr = radius of ball
rd = recrusion depth--from 1 to ndim
%}
global MLB;
global NN_hat;
global NN_len;
global INP;
global ctr;
% hyperplane intersection detection step
nml_hat = NN_hat(:,rd);
nh_comp = dot(clp-bpt,nml_hat);
ix_hi = floor((rr-nh_comp)/NN_len(rd));
ix_lo = ceil((-rr-nh_comp)/NN_len(rd));
if (ix_hi<ix_lo)
return % no hyperplane intersections detected w/ ball;
% get out of this recursion level
endif
hp_ix = [ix_lo:ix_hi]; % indices are created wrt the received reference point
hp_ln = length(hp_ix);
% loop through detected hyperplanes (updated)
if (rd<ndim)
bpt_new_mx = bpt*ones(1,hp_ln) + NN_len(rd)*nml_hat*hp_ix; % an ndim by length(hp_ix) matrix
clp_new_mx = clp*ones(1,hp_ln) + MLB(:,rd)*hp_ix; % an ndim by length(hp_ix) matrix
dd_vec = nh_comp + NN_len(rd)*hp_ix; % a length(hp_ix) row vector
rr_new_vec = sqrt(rr^2-dd_vec.^2);
for jj=1:hp_ln
hyp_fun(clp_new_mx(:,jj),bpt_new_mx(:,jj),rr_new_vec(jj),ndim,rd+1);
endfor
else % rd=ndim--so at deepest level of recursion; record the points on the given 1-dim
% "lattice line" that are inside the ball
INP(:,ctr+1:ctr+hp_ln) = clp + MLB(:,rd)*hp_ix;
ctr += hp_ln;
return
endif
endfunction
This has some Octave-y/Matlab-y things in it, but most should be easily understandable; M(:,jj) references column jj of matrix M; the tic ' means take transpose; [A B] concatenates matrices A and B; A=[] declares an empty matrix.
Updated / better optimized from original answer:
"vectorized" the code in the recursive function, to avoid most "for" loops (those slowed it down a factor of ~10; the code now is a bit more difficult to understand though)
pre-allocated memory for the INP matrix-of-interior points (this speeded it up by another order of magnitude; before that, Octave was having to resize the INP matrix for every call to the innermost recursion level--for large matrices/arrays that can really slow things down)
Because this routine was part of a project, I also coded it in Python. From informal testing, the Python version is another 2-3 times faster than this (Octave) version.
For reference, here is the old, much slower code in the original posting of this answer:
% (OLD slower code, using for loops, and constantly resizing
% the INP matrix) loop through detected hyperplanes
if (rd<ndim)
for jj=1:length(hp_ix)
bpt_new = bpt + hp_ix(jj)*NN_len(rd)*nml_hat;
clp_new = clp + hp_ix(jj)*MLB(:,rd);
dd = nh_comp + hp_ix(jj)*NN_len(rd);
rr_new = sqrt(rr^2-dd^2);
hyp_fun(clp_new,bpt_new,rr_new,ndim,rd+1);
endfor
else % rd=ndim--so at deepest level of recursion; record the points on the given 1-dim
% "lattice line" that are inside the ball
for jj=1:length(hp_ix)
clp_new = clp + hp_ix(jj)*MLB(:,rd);
INP = [INP clp_new];
endfor
return
endif

How can I perform this array-slicing and multiplication operation more efficiently?

I used the last two dimensions of this 3D matrix as a 2D matrix. So I just wanna multiply the 2D matrix from the result of Matrix1(i,:,:) (which i-by-i) with the vector Matrix2(i,:).') (which is 1-by-i).
The only way I could do that was using an auxiliary matrix that picked up all the numbers from the 2 dimensions from the 3D matrix:
matrixAux(:,:) = Matrix1(1,:,:)
and then I did the multiplication:
matrixAux * (Matrix2(i,:).')
and it worked. However, this is slow because I need to copy all the 3D matrix to a lot of auxiliary matrices, and I need to speed up my code because I'm doing the same operation many times.
How can I do that more efficiently, without having to copy the matrix?

Approach I: bsxfun Multiplication
One approach would be to use the output of bsxfun with #times whose values you can use instead of calculating the matrix multiplication results in a loop -
sum(bsxfun(#times,Matrix1,permute(Matrix2,[1 3 2])),3).'
Example
As an example, let's suppose Matrix1 and Matrix2 are defined like this -
nrows = 3;
p = 6;
ncols = 2;
Matrix1 = rand(nrows,ncols,p)
Matrix2 = rand(nrows,p)
Then, you have your loop like this -
for i = 1:size(Matrix1,1)
matrixAux(:,:) = Matrix1(i,:,:);
matrix_mult1 = matrixAux * (Matrix2(i,:).') %//'
end
So, instead of the loops, you can directly calculate the matrix multiplication results -
matrix_mult2 = sum(bsxfun(#times,Matrix1,permute(Matrix2,[1 3 2])),3).'
Thus, each column of matrix_mult2 would represent matrix_mult1 at each iteration of the loop, as the output of the codes would make it clearer -
matrix_mult1 =
0.7693
0.8690
matrix_mult1 =
1.0649
1.2574
matrix_mult1 =
1.2949
0.6222
matrix_mult2 =
0.7693 1.0649 1.2949
0.8690 1.2574 0.6222
Approach II: "Full" Matrix Multiplication
Now, this must be exciting! Well you can also leverage MATLAB's fast matrix multiplication to get your intermediate matrix multiplication results again without loops. If Matrix1 is nrows x p x ncols, you can reshape it to nrows*p x ncols and then perform matrix multiplication of it with Matrix2. Then, to get an equivalent of matrix_mult2, you need to select indices from the multiplication result. This is precisely achieved here -
%// Get size of Matrix1 to be used regularly inside the codes later on
[m1,n1,p1] = size(Matrix1);
%// Convert 3D Matrix1 to 2D and thus perform "full" matrix multiplication
fmult = reshape(Matrix1,m1*n1,p1)*Matrix2'; %//'
%// Get valid indices
ind = bsxfun(#plus,[1:m1:size(fmult,1)]',[0:nrows-1]*(size(fmult,1)+1)); %//'
%// Get values from the full matrix multiplication result
matrix_mult3 = fmult(ind);
Here, matrix_mult3 must be the same as matrix_mult2.
Observations: Since we are not using all the values calculated from the full matrix multiplication, rather indexing into it and selecting few of its elements, as such this approach performs better than the other approaches under certain circumstances. This approach seems to be the best one when nrows is a small value as we would be using more number of elements from the full matrix multiplication output in that case.
Benchmark Results
Two cases were tested against the three approaches and the test results seem to support our hypotheses discussed earlier.
Case 1
Matrix 1 as 400 x 400 x 400, the runtimes are -
--------------- With Loops
Elapsed time is 2.253536 seconds.
--------------- With BSXFUN
Elapsed time is 0.910104 seconds.
--------------- With Full Matrix Multiplication
Elapsed time is 4.361342 seconds.
Case 2
Matrix 1 as 40 x 2000 x 2000, the runtimes are -
--------------- With Loops
Elapsed time is 5.402487 seconds.
--------------- With BSXFUN
Elapsed time is 2.585860 seconds.
--------------- With Full Matrix Multiplication
Elapsed time is 1.516682 seconds.

Matlab mode filter for dependent RGB channels

I've been performing a 2D mode filter on an RGB image by running medfilt2 independently on the R,G and B channels. However, splitting the RGB channels like this gives artifacts in the colouring. Is there a way to perform the 2D median filter while keeping RGB values 'together'?
Or, I could explain this more abstractly: Imagine I had a 2D matrix, where each value contained a pair of index coordinates (i.e. a cell matrix of 2X1 vectors). How would I go about performing a median filter on this?
Here's how I can do an independent mode filter (giving the artifacts):
r = colfilt(r0,[5 5],'sliding',#mode);
g = colfilt(g0,[5 5],'sliding',#mode);
b = colfilt(b0,[5 5],'sliding',#mode);
However colfilt won't work on a cell matrix.
Another approach could be to somehow combine my RGB channels into a single number and thus create a standard 2D matrix. Not sure how to implement this, though...
Any ideas?
Thanks for your help.
Cheers,
Hugh
EDIT:
OK, so problem solved. Here's how I did it.
I adapted my question so that I'm no longer dealing with (RGB) vectors, but (UV) vectors. Still essentially the same problem, except that my vectors are 2D not 3D.
So firstly I load the individual U and V channels, arrange them each into a 1D list, then combine them, so I essentially have a list of vectors. Then I reduce it to just those which are unique. Then, I assign each pixel in my matrix the value of the index of that unique vector. After this I can do the mode filter. Then I basically do the reverse, in that I go through the filtered image pixelwise, and read the value at each pixel (i.e. an index in my list), and find the unique vector associated with that index and insert it at that pixel.
% Create index list
img_u = img_iuv(:,:,2);
img_v = img_iuv(:,:,3);
coordlist = unique(cat(2,img_u(:),img_v(:)),'rows');
% Create a 2D matrix of indices
img_idx = zeros(size(img_iuv,1),size(img_iuv,2),2);
for y = 1:length(Y)
for x = 1:length(X)
coords = squeeze(img_iuv(x,y,2:3))';
[~,idx] = ismember(coords,coordlist,'rows');
img_idx(x,y) = idx;
end
end
% Apply the mode filter
img_idx = colfilt(img_idx,[n,n],'sliding',#mode);
% Re-construct the original image using the filtered data
for y = 1:length(Y)
for x = 1:length(X)
idx = img_idx(x,y);
try
coords = coordlist(idx,:);
end
img_iuv(x,y,2:3) = coords(:);
end
end
Not pretty but it gets the job done. I suppose this approach would also work for RGB images, or other similar situations.
Cheers,
Hugh

I don't see how you can define the median of a vector variable. You probably need to reduce the R,G,B components to a single value and then compunte the median on that value. Why not use the intensity level as that single value? You could do it easily with rgb2gray.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio