I have to multiply arrays A and B element by element and calculate the sum of the first dimension, then returns the result in C. A is a N-by-M-by-L matrix. B is a N-by-1-by-L matrix. N and M is lower than 30, but L is very large. My code is:
C=zeros(size(B));
parfor i=1:size(A,2)
C(i,1,:) = sum(bsxfun(#times, A(:,i,:), B(:,1,:)), 1);
end
The problem is the code is slow, anyone can help to make the code faster? Thank you very much.
How about something along the lines of this:
C = permute(sum(A.*repmat(B,1,M)),[2,1,3]);
This speeds computation on my PC up by a factor of ~4. Interestingly enough, you can actually speed up the computation by a factor of 2 (at least on my PC) simply by changing the parfor loop to a for loop.
If I understand correctly, just do this:
C = squeeze(sum(bsxfun(#times, A, B)));
This gives C with size M x L.
Taking the comments from Luis Mendo, I propose to use this command:
C=reshape(sum(bsxfun(#times, A, B), 1), size(B))
I think this is the fastest.
Related
I have this code which runs very slow. Can someone help me vectorize it.
for ii=1:K,
y=w*x(ii,:)'; % y is N by 1
u=zeros(N,M);
disp(num2str(ii));
for jj=1:N,
u(jj,:)=y(jj)*(x(ii,:)-y(1:jj)'*w(1:jj,:));
end
wold=w;
w=wold+eta*u; % updated weight matrix
end
the inner loop takes the most time. The code is for generalised hebb algorithm.
Input sizes:
M=153600;
K=5000;
N=400;
eta=0.004;
size(w)=5000x153600
size(x)=400x153600
You can kill the inner loop to get u with bsxfun -
yN = y(1:N);
u = bsxfun(#times,yN,bsxfun(#minus,x(ii,:),cumsum(bsxfun(#times,w(1:N,:),yN))))
For the outer loop, owing to the data dependency between iterations with the updates on w, it might be hard to vectorize that one.
I have a scalar function f([x,y],[i,j])= exp(-norm([x,y]-[i,j])^2/sigma^2) which receives two 2-dimensional vectors as input (norm here implements the Euclidean norm). The values of x,i range in 1:w and the values y,j range in 1:h. I want to create a cell array X such that X{x,y} will contain a w x h matrix such that X{x,y}(i,j) = f([x,y],[i,j]). This can obviously be done using 4 nested loops like so:
for x=1:w;
for y=1:h;
X{x,y}=zeros(w,h);
for i=1:w
for j=1:h
X{x,y}(i,j)=f([x,y],[i,j])
end
end
end
end
This is however extremely inefficient. I would very much appreciate an efficient way to create X.
The one way to do this is to remove the 2 innermost loops and replace then with a vectorised version. By the look of your f function this shouldn't be too bad
First we need to construct two matrices containing the 1 to w on every row and 1 to h on every column like so
wMat=repmat(1:w,h,1);
hMat=repmat(1:h,w,1)';
This is going to represent the inner two loops, and the transpose will allow us to get all combinations. Now we can vectorise the calculation (f([x,y],[i,j])= exp(-norm([x,y]-[i,j])^2/sigma^2)):
for x=1:w;
for y=1:h;
temp1=sqrt((x-wMat).^2+(y-hMat).^2);
X{x,y}=exp(temp1/(sigma^2));
end
end
Where we have computed the Euclidean norm for all pairs of nodes in the inner loops at once.
Some discussion and code
The trick here is to perform the norm-calculations with numeric arrays and save the results into a cell array version as late as possible. For performing the norm-calculations you can take help of ndgrid, bsxfun and some permute + reshape to give it the "shape" as needed for the final cell array version. So, here's the vectorized approach to perform these tasks -
%// Create x-y/i-j values to be used for calculation of function values
[xi,yi] = ndgrid(1:w,1:h);
%// Get the norm values
normvals = sqrt(bsxfun(#minus,xi(:),xi(:).').^2 + ...
bsxfun(#minus,yi(:),yi(:).').^2);
%// Get the actual function values
vals = exp(-normvals.^2/sigma^2);
%// Get the values into blocks of a 4D array and then re-arrange to match
%// with the shape of numeric array version of X
blks = reshape(permute(reshape(vals, w*h, h, []), [2 1 3]), h, w, h, w);
arranged_blks = reshape(permute(blks,[2 3 1 4]),w,h,w,h);
%// Finally get the cell array version
X = squeeze(mat2cell(arranged_blks,w,h,ones(1,w),ones(1,h)));
Benchmarking and runtimes
After improving the original loopy code with pre-allocation for X and function-inling f, runtime-benchmarks were performed with it against the proposed vectorized approach with datasizes as w, h = 60 and the runtime results thus obtained were -
----------- With Improved loopy code
Elapsed time is 41.227797 seconds.
----------- With Vectorized code
Elapsed time is 2.116782 seconds.
This suggested a whooping close to 20x speedup with the proposed solution!
For extremely huge datasizes
If you are dealing with huge datasizes, essentially you are not giving enough memory for bsxfun to work with, and bsxfun is known to use up a lot of memory for giving you a performance-efficient vectorized solution. So, for such huge-datasize cases, you can use the following loopy approach to replace normvals calculations that was listed in the earlier bsxfun based solution -
%// Get the norm values
nx = numel(xi);
normvals = zeros(nx,nx);
for ii = 1:nx
normvals(:,ii) = sqrt( (xi(:) - xi(ii)).^2 + (yi(:) - yi(ii)).^2 );
end
It seems to me that when you run through the cycle for x=w, y=h, you are calculating all the values you need at once. So you don't need recalculate them. Once you have this:
for i=1:w
for j=1:h
temp(i,j)=f([x,y],[i,j])
end
end
Then, e.g. X{1,1} is just temp(1,1), X{2,2} is just temp(1:2,1:2), and so on. If you can vectorise the calculation of f (norm here is just the Euclidean norm of that vector?) then it will get even simpler.
I have a problem which I hope can be easily solved.
A is a NG matrix, B is NG matrix. The goal is to get matrix C
which is equal to multiplying each column of transposed A by each row of B and summing resulting matrices; total number of such matrices before summing is NN, their size is GG
This can be easily done in MatLab with two for-loops:
N=5;
G=10;
A=rand(N,G);
B=rand(N,G);
C=zeros(G);
for n=1:1:N
for m=1:1:N
C=C+A(m,:)'*B(n,:);
end
end
However, for large matrices it is quite slow.
So, my question is:
is there a more efficient way for calculating C matrix in Matlab?
Thank you
If you write it all out for two 3×3 matrices, you'll find that the operation basically equals this:
C = bsxfun(#times, sum(B), sum(A).');
Running each of the answers here for N=50, G=100 and repeating each method 100 times:
Elapsed time is 13.839893 seconds. %// OP's original method
Elapsed time is 19.773445 seconds. %// Luis' method
Elapsed time is 0.306447 seconds. %// Robert's method
Elapsed time is 0.005036 seconds. %// Rody's method
(a factor of ≈ 4000 between the fastest and slowest method...)
I think this should improve the performance significantly
C = zeros(G);
for n = 1:N
C = C + sum(A,1)'*B(n,:);
end
You avoid one loop, and should also avoid the problems of running out of memory. According to my benchmarking, it's about 20 times faster than the approach with two loops. (Note, I had to benchmark in Octace since I don't have MATLAB on this PC).
Use bsxfun instead of the loops, and then sum twice:
C = sum(sum(bsxfun(#times, permute(A, [2 3 1]), permute(B,[3 2 4 1])), 3), 4);
I don't have enough memory to simply create a diagonal D-by-D matrix, since D is large. I keep getting an 'out of memory' error.
Instead of performing M x D x D operations in the first multiplication, I do M x D operations, but still my code takes ages to run.
Can anybody find a more effective way to perform the multiplication A'*B*A? Here's what I've attempted so far:
D=20000
M=25
A = floor(rand(D,M)*10);
B = floor(rand(1,D)*10);
for i=1:D
for j=1:M
result(i,j) = A(i,j) * B(1,j);
end
end
manual = result * A';
auto = A*diag(B)*A';
isequal(manual,auto)
One option that should solve your problem is using sparse matrices. Here's an example:
D = 20000;
M = 25;
A = floor(rand(D,M).*10); %# A D-by-M matrix
diagB = rand(1,D).*10; %# Main diagonal of B
B = sparse(1:D,1:D,diagB); %# A sparse D-by-D diagonal matrix
result = (A.'*B)*A; %'# An M-by-M result
Another option would be to replicate the D elements along the main diagonal of B to create an M-by-D matrix using the function REPMAT, then use element-wise multiplication with A.':
B = repmat(diagB,M,1); %# Replicate diagB to create an M-by-D matrix
result = (A.'.*B)*A; %'# An M-by-M result
And yet another option would be to use the function BSXFUN:
result = bsxfun(#times,A.',diagB)*A; %'# An M-by-M result
Maybe I'm having a bit of a brainfart here, but can't you turn your DxD matrix into a DxM matrix (with M copies of the vector you're given) and then .* the last two matrices rather than multiply them (and then, of course, normally multiply the first with the found product quantity)?
You are getting "out of memory" because MATLAB can not find a chunk of memory large enough to accommodate the entire matrix. There are different techniques to avoid this error described in MATLAB documentation.
In MATLAB you obviously do not need programming explicit loops in most cases because you can use operator *. There exists a technique how to speed up matrix multiplication if it is done with explicit loops, here is an example in C#. It has a good idea how (potentially large) matrix can be split into smaller matrices. To contain these smaller matrices in MATLAB you can use cell matrix. It is much more probably that system finds enough RAM to accommodate two smaller sub-matrices then the resulting large matrix.
I have five values, A, B, C, D and E.
Given the constraint A + B + C + D + E = 1, and five functions F(A), F(B), F(C), F(D), F(E), I need to solve for A through E such that F(A) = F(B) = F(C) = F(D) = F(E).
What's the best algorithm/approach to use for this? I don't care if I have to write it myself, I would just like to know where to look.
EDIT: These are nonlinear functions. Beyond that, they can't be characterized. Some of them may eventually be interpolated from a table of data.
There is no general answer to this question. A solver finding the solution to any equation does not exist. As Lance Roberts already says, you have to know more about the functions. Just a few examples
If the functions are twice differentiable, and you can compute the first derivative, you might try a variant of Newton-Raphson
Have a look at the Lagrange Multiplier Method for implementing the constraint.
If the function F is continuous (which it probably is, if it is an interpolant), you could also try the Bisection Method, which is a lot like binary search.
Before you can solve the problem, you really need to know more about the function you're studying.
As others have already posted, we do need some more information on the functions. However, given that, we can still try to solve the following relaxation with a standard non-linear programming toolbox.
min k
st.
A + B + C + D + E = 1
F1(A) - k = 0
F2(B) - k = 0
F3(C) -k = 0
F4(D) - k = 0
F5(E) -k = 0
Now we can solve this in any manner we wish, such as penalty method
min k + mu*sum(Fi(x_i) - k)^2
st
A+B+C+D+E = 1
or a straightforward SQP or interior-point method.
More details and I can help advise as to a good method.
m
The functions are all monotonically increasing with their argument. Beyond that, they can't be characterized. The approach that worked turned out to be:
1) Start with A = B = C = D = E = 1/5
2) Compute F1(A) through F5(E), and recalculate A through E such that each function equals that sum divided by 5 (the average).
3) Rescale the new A through E so that they all sum to 1, and recompute F1 through F5.
4) Repeat until satisfied.
It converges surprisingly fast - just a few iterations. Of course, each iteration requires 5 root finds for step 2.
One solution of the equations
A + B + C + D + E = 1
F(A) = F(B) = F(C) = F(D) = F(E)
is to take A, B, C, D and E all equal to 1/5. Not sure though whether that is what you want ...
Added after John's comment (thanks!)
Assuming the second equation should read F1(A) = F2(B) = F3(C) = F4(D) = F5(E), I'd use the Newton-Raphson method (see Martijn's answer). You can eliminate one variable by setting E = 1 - A - B - C - D. At every step of the iteration you need to solve a 4x4 system. The biggest problem is probably where to start the iteration. One possibility is to start at a random point, do some iterations, and if you're not getting anywhere, pick another random point and start again.
Keep in mind that if you really don't know anything about the function then there need not be a solution.
ALGENCAN (part of TANGO) is really nice. There are Python bindings, too.
http://www.ime.usp.br/~egbirgin/tango/codes.php - " general nonlinear programming that does not use matrix manipulations at all and, so, is able to solve extremely large problems with moderate computer time. The general algorithm is of Augmented Lagrangian type ... "
http://pypi.python.org/pypi/TANGO%20Project%20-%20ALGENCAN/1.0
Google OPTIF9 or ALLUNC. We use these for general optimization.
You could use standard search technic as the others mentioned. There are a few optimization you could make use of it while doing the search.
First of all, you only need to solve A,B,C,D because 1-E = A+B+C+D.
Second, you have F(A) = F(B) = F(C) = F(D), then you can search for A. Once you get F(A), you could solve B, C, D if that is possible. If it is not possible to solve the functions, you need to continue search each variable, but now you have a limited range to search for because A+B+C+D <= 1.
If your search is discrete and finite, the above optimizations should work reasonable well.
I would try Particle Swarm Optimization first. It is very easy to implement and tweak. See the Wiki page for it.