Speedup by reducing number of for loops - performance

I am currently coding using Matlab. As everyone knows, for loops are slow, while Matlab does matrix multiplication very efficiently. Unfortunately, my code is full of for loops.
My code resembles this:
function FInt = ComputeF(A, B, C, D, E, F, G, H)
%A is a real column vector of size [Na, 1].
%B is a real column vector of size [Nb, 1].
%C is a real column vector of size [Nc, 1].
%D is a real column vector of size [Nd, 1].
%E, F, G are real column vectors of the same size as A.
%H is a real column vector of the same size as A.
%This function evaluates FInt, a tensor of size [Na, Nb, Nc, Nd].
%Recording the correct dimensions and initializing FInt
Na = size(A, 1);
Nb = size(B, 1);
Nc = size(C, 1);
Nd = size(D, 1);
FInt = zeros(Na, Nb, Nc, Nd);
%Computing the tensor FInt
for na = 1:Na
for nc=1:Nc
for nd=1:Nd
%Calculating intermediate values
S1 = -((B(:) - C(nc) + E(na)) ./ (2 * sin(D(nd) ./ 2))).^2;
S2 = (B(:) + C(nc) + F(na)) ./ (2 .* cos(D(nd) ./ 2));
S3 = (B(:) + C(nc) + G(na)) ./ (2 .* cos(D(nd) ./ 2));
S4 = H(na) ./ cos(D(nd) ./ 2);
%Calculating the integrand FInt
FInt(na, :, nc, nd) = exp(S1) .* (sinh(S2 + 1i * S4) + conj(sinh(S3 + 1i * S4)));
end
end
end
end
As you can see I have already tried to vectorize the process by using : for the vector B, improving at least a bit the speed of computations. (Why B? Usually it is the longest vector).
My problem is that the quantities depend on so many indexes that I have no idea how to vectorize it properly.

In numpy, there is an idea that is formally called broadcasting. MATLAB introduced the same concept in R2016b, where the documentation calls it implicit expansion; in the MATLAB community it is also referred to as "vectorization", "expansion", and sometimes "broadcasting". The idea is that if you line up the dimensions of a bunch of arrays, you can expand unit dimensions to match the full ones. Here is a great resource on the subject: https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/.
If you want your result to have size [Na, Nb, Nc, Nd], you can make all your arrays of the appropriate size, with ones filling in the missing dimensions:
A = reshape(A, Na, 1, 1, 1);
B = reshape(B, 1, Nb, 1, 1);
C = reshape(C, 1, 1, Nc, 1);
D = reshape(D, 1, 1, 1, Nd);
E = reshape(E, Na, 1, 1, 1);
F = reshape(F, Na, 1, 1, 1);
G = reshape(G, Na, 1, 1, 1);
H = reshape(H, Na, 1, 1, 1);
Now you can perform vectorized operations on these arrays directly with no ambiguity:
S1 = -((B - C + E) ./ (2 * sin(D ./ 2))).^2;
S2 = (B + C + F) ./ (2 .* cos(D ./ 2));
S3 = (B + C + G) ./ (2 .* cos(D ./ 2));
S4 = H ./ cos(D ./ 2);
%Calculating the integrand FInt
FInt = exp(S1) .* (sinh(S2 + 1i * S4) + conj(sinh(S3 + 1i * S4)));
Notice that all the explicit loops were removed here. The sizes of the intermediate arrays depend on the sizes of their inputs:
size(S1) == [Na, Nb, Nc, Nd]
size(S2) == [Na, Nb, Nc, Nd]
size(S3) == [Na, Nb, Nc, Nd]
size(S4) == [Na, 1, 1, Nd]
You don't need to preallocate the output because it automatically results from the sizes of the inputs.
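Since the answer mentions numpy, here is a minimal NumPy sketch of the same reshaping idea, checked against an explicit-loop reference. The function names compute_f_loops and compute_f_broadcast are made up for this illustration, and the code is a translation of the MATLAB above rather than anything from the original post.

```python
import numpy as np

def compute_f_loops(A, B, C, D, E, F, G, H):
    """Reference version: one explicit loop per index."""
    out = np.zeros((A.size, B.size, C.size, D.size), dtype=complex)
    for na in range(A.size):
        for nb in range(B.size):
            for nc in range(C.size):
                for nd in range(D.size):
                    s1 = -((B[nb] - C[nc] + E[na]) / (2 * np.sin(D[nd] / 2))) ** 2
                    s2 = (B[nb] + C[nc] + F[na]) / (2 * np.cos(D[nd] / 2))
                    s3 = (B[nb] + C[nc] + G[na]) / (2 * np.cos(D[nd] / 2))
                    s4 = H[na] / np.cos(D[nd] / 2)
                    out[na, nb, nc, nd] = np.exp(s1) * (
                        np.sinh(s2 + 1j * s4) + np.conj(np.sinh(s3 + 1j * s4)))
    return out

def compute_f_broadcast(A, B, C, D, E, F, G, H):
    """Broadcast version: each vector gets its own axis, singleton elsewhere."""
    B = B.reshape(1, -1, 1, 1)
    C = C.reshape(1, 1, -1, 1)
    D = D.reshape(1, 1, 1, -1)
    # E, F, G, H share the axis of A (A itself only fixes the size Na)
    E, F, G, H = (v.reshape(-1, 1, 1, 1) for v in (E, F, G, H))
    s1 = -((B - C + E) / (2 * np.sin(D / 2))) ** 2   # shape (Na, Nb, Nc, Nd)
    s2 = (B + C + F) / (2 * np.cos(D / 2))           # shape (Na, Nb, Nc, Nd)
    s3 = (B + C + G) / (2 * np.cos(D / 2))           # shape (Na, Nb, Nc, Nd)
    s4 = H / np.cos(D / 2)                           # shape (Na, 1, 1, Nd)
    return np.exp(s1) * (np.sinh(s2 + 1j * s4) + np.conj(np.sinh(s3 + 1j * s4)))
```

The broadcast version allocates several full [Na, Nb, Nc, Nd] temporaries, so for very large inputs it may be worth keeping one loop (for example over the shortest vector) to limit memory use.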

Related

Solving systems of second order differential equations

I'm working on a script in Mathematica that will simulate a string held at either end and plucked, by solving the wave equation via numerical methods. (http://en.wikipedia.org/wiki/Wave_equation#Investigation_by_numerical_methods)
n = 5; (*The number of discrete elements to be used*)
L = 1.0; (*The length of the string that is vibrating*)
a = 1.0/3.0; (*The distance from the left side that the string is \
plucked at*)
T = 1; (*The tension in the string*)
\[Rho] = 1; (*The length density of the string*)
y0 = 0.1; (*The vertical distance of the string pluck*)
\[CapitalDelta]x = L/n; (*The length of each discrete element*)
m = (\[Rho]*L)/n;(*The mass of each individual node*)
c = Sqrt[T/\[Rho]];(*The speed at which waves in the string propagate*)
I set all my variables
Y[t] = Array[f[t], {n - 1, 1}];
MatrixForm(*Creates a vector size n-1 by 1 of functions \
representing each node*)
I define my Vector of nodal position functions
K = MatrixForm[
SparseArray[{Band[{1, 1}] -> -2, Band[{2, 1}] -> 1,
Band[{1, 2}] -> 1}, {n - 1,
n - 1}]](*Creates a matrix size n-1 by n-1 governing the coupling \
between each node*)
I create the stiffness matrix relating all the nodal functions to one another
Y0 = MatrixForm[
Table[Piecewise[{{(((i*L)/n)*y0)/a,
0 < ((i*L)/n) < a}, {(-((i*L)/n)*y0)/(L - a) + (y0*L)/(L - a),
a < ((i*L)/n) < L}}], {i, 1, n - 1}]]
I define the initial positions of each node using a piecewise function
NDSolve[{Y''[t] == (c/\[CapitalDelta]x)^2 Y[t].K, Y[0] == Y0,
Y'[0] == 0},
Y, {t, 0, 10}];(*Numerically solves the system of second order DE's*)
Finally, This should solve for the values of the individual nodes, but it returns an error:
"NDSolve::ndinnt : Initial condition [Y0 table] is not a number or a rectangular array"
So, it would seem that I don't have a firm grasp on how matrices work in Mathematica. I would greatly appreciate it if anyone could help me get this last line of code to run properly.
Thank you,
Brad
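I don't think you should use MatrixForm when defining the matrices. MatrixForm is used to format a list of lists as a matrix, usually when you display it. If you assign the result of MatrixForm to a variable, the variable holds the MatrixForm wrapper rather than a plain array, which is consistent with the error NDSolve reports for Y0. Try removing it and see if it works.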
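To make the structure of the discretized system concrete, here is a rough NumPy sketch of the same setup with the coupling matrix and the initial shape held as plain arrays. It is not a translation of NDSolve: the velocity-Verlet time stepper, the step size dt, and the function name simulate are assumptions chosen for this illustration.

```python
import numpy as np

n = 5            # number of discrete elements
L = 1.0          # length of the string
a = 1.0 / 3.0    # pluck position measured from the left end
T = 1.0          # tension
rho = 1.0        # linear density
y0 = 0.1         # height of the pluck
dx = L / n
c = np.sqrt(T / rho)

# Tridiagonal coupling matrix for the n-1 interior nodes
K = (np.diag(-2.0 * np.ones(n - 1))
     + np.diag(np.ones(n - 2), 1)
     + np.diag(np.ones(n - 2), -1))

# Triangular initial shape, same piecewise formula as in the question
x = np.arange(1, n) * dx
Y0 = np.where(x < a, x * y0 / a, (L - x) * y0 / (L - a))

def simulate(t_end=10.0, dt=1e-3):
    """Integrate Y'' = (c/dx)^2 K Y with Y(0) = Y0, Y'(0) = 0 (velocity Verlet)."""
    steps = int(round(t_end / dt))
    Y = Y0.copy()
    V = np.zeros_like(Y)
    acc = (c / dx) ** 2 * (K @ Y)
    history = np.empty((steps + 1, n - 1))
    history[0] = Y
    for k in range(1, steps + 1):
        V_half = V + 0.5 * dt * acc
        Y = Y + dt * V_half
        acc = (c / dx) ** 2 * (K @ Y)
        V = V_half + 0.5 * dt * acc
        history[k] = Y
    return history, V

history, final_velocity = simulate()
```

Each row of history holds the displacement of the interior nodes at one time step; the two fixed end points are implicit in K and are not stored.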

optimization of pairwise L2 distance computations

I need help optimizing this loop. Matrix_1 is an (n x 2) int matrix and Matrix_2 is (m x 2); m and n vary.
index_j = 1;
for index_k = 1:size(Matrix_1,1)
for index_l = 1:size(Matrix_2,1)
M2_Index_Dist(index_j,:) = [index_l, sqrt(bsxfun(@plus,sum(Matrix_1(index_k,:).^2,2),sum(Matrix_2(index_l,:).^2,2)')-2*(Matrix_1(index_k,:)*Matrix_2(index_l,:)'))];
index_j = index_j + 1;
end
end
I need M2_Index_Dist to provide a ((n*m) x 2) matrix with the index of matrix_2 in the first column and the distance in the second column.
Output example:
M2_Index_Dist = [ 1, 5.465
2, 56.52
3, 6.21
1, 35.3
2, 56.52
3, 0
1, 43.5
2, 9.3
3, 236.1
1, 8.2
2, 56.52
3, 5.582]
Here's how to apply bsxfun with your formula (||A-B|| = sqrt(||A||^2 + ||B||^2 - 2*A*B)):
d = real(sqrt(bsxfun(@plus, dot(Matrix_1,Matrix_1,2), ...
bsxfun(@minus, dot(Matrix_2,Matrix_2,2).', 2 * Matrix_1*Matrix_2.')))).';
You can avoid the final transpose if you change your interpretation of the matrix.
Note: There shouldn't be any complex values to handle with real but it's there in case of very small differences that may lead to tiny negative numbers.
Edit: It may be faster without dot:
d = sqrt(bsxfun(@plus, sum(Matrix_1.*Matrix_1,2), ...
bsxfun(@minus, sum(Matrix_2.*Matrix_2,2)', 2 * Matrix_1*Matrix_2.'))).';
Or with just one call to bsxfun:
d = sqrt(bsxfun(@plus, sum(Matrix_1.*Matrix_1,2), sum(Matrix_2.*Matrix_2,2)') ...
- 2 * Matrix_1*Matrix_2.').';
Note: This last order of operations gives results identical to yours, rather than differing by an error of ~1e-14.
Edit 2: To replicate M2_Index_Dist:
II = ndgrid(1:size(Matrix_2,1),1:size(Matrix_1,1));
M2_Index_Dist = [II(:) d(:)];
If I understand correctly, this does what you want:
ind = repmat((1:size(Matrix_2,1)).',size(Matrix_1,1),1); % first column: index
d = pdist2(Matrix_2,Matrix_1); % compute distance between each pair of rows
d = d(:); % second column: distance
result = [ind d]; % build result from first column and second column
As you see, this code calls pdist2 to compute the distance between every pair of rows of your matrices. By default this function uses Euclidean distance.
If you don't have pdist2 (which is part of the Statistics Toolbox), you can replace the second line above with bsxfun:
d = squeeze(sqrt(sum(bsxfun(@minus,Matrix_2,permute(Matrix_1, [3 2 1])).^2,2)));
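For comparison, the expansion ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b used in the answers above can be sketched in NumPy as follows. The function name pairwise_index_dist is hypothetical, and clipping negative round-off to zero before the square root is an assumption that plays the role of the real(...) guard in the first answer.

```python
import numpy as np

def pairwise_index_dist(matrix_1, matrix_2):
    """Return an (n*m, 2) array: 1-based row index of matrix_2, then distance.

    Rows are ordered like the original double loop: matrix_1 is the outer
    loop and matrix_2 the inner loop.
    """
    m1 = np.asarray(matrix_1, dtype=float)
    m2 = np.asarray(matrix_2, dtype=float)
    n, m = m1.shape[0], m2.shape[0]
    sq1 = np.sum(m1 * m1, axis=1)[:, None]      # shape (n, 1)
    sq2 = np.sum(m2 * m2, axis=1)[None, :]      # shape (1, m)
    d2 = sq1 + sq2 - 2.0 * (m1 @ m2.T)          # shape (n, m) by broadcasting
    d = np.sqrt(np.maximum(d2, 0.0))            # clip tiny negative round-off
    idx = np.tile(np.arange(1, m + 1), n)       # 1..m repeated n times
    return np.column_stack([idx, d.ravel()])
```

Because NumPy flattens in row-major order, d.ravel() on the (n, m) array already cycles through the matrix_2 index fastest, so no transpose is needed here.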

Vectorization: friend or foe? bsxfun/arrayfun to avoid loops, repmat, permute, squeeze, etc

This question is related to this question and probably to this other as well.
Suppose you have two matrices A and B. A is M-by-N and B is N-by-K. I want to obtain an M-by-K matrix C such that C(i, j) = 1 - prod(1 - A(i, :)' .* B(:, j)). I have tried some solutions in Matlab - I am here comparing their computation performance.
% Size of matrices:
M = 4e3;
N = 5e2;
K = 5e1;
GG = 50; % GG instances
rntm1 = zeros(GG, 1); % running time of first algorithm
rntm2 = zeros(GG, 1); % running time of second algorithm
rntm3 = zeros(GG, 1); % running time of third algorithm
rntm4 = zeros(GG, 1); % running time of fourth algorithm
rntm5 = zeros(GG, 1); % running time of fifth algorithm
for gg = 1:GG
A = rand(M, N); % M-by-N matrix of random numbers
A = A ./ repmat(sum(A, 2), 1, N); % M-by-N matrix of probabilities (?)
B = rand(N, K); % N-by-K matrix of random numbers
B = B ./ repmat(sum(B), N, 1); % N-by-K matrix of probabilities (?)
%% First solution
% One-liner solution:
tic
C = squeeze(1 - prod(1 - repmat(A, [1 1 K]) .* permute(repmat(B, [1 1 M]), [3 1 2]), 2));
rntm1(gg) = toc;
%% Second solution
% Full vectorization, using meshgrid, arrayfun and reshape (from Luis Mendo, second link above)
tic
[ii jj] = meshgrid(1:size(A, 1), 1:size(B, 2));
D = arrayfun(@(n) 1 - prod(1 - A(ii(n), :)' .* B(:, jj(n))), 1:numel(ii));
D = reshape(D, size(B, 2), size(A, 1)).';
rntm2(gg) = toc;
clear ii jj
%% Third solution
% Partial vectorization 1
tic
E = zeros(M, K);
for hh = 1:M
tmp = repmat(A(hh, :)', 1, K);
E(hh, :) = 1 - prod((1 - tmp .* B), 1);
end
rntm3(gg) = toc;
clear tmp hh
%% Fourth solution
% Partial vectorization 2
tic
F = zeros(M, K);
for hh = 1:M
for ii = 1:K
F(hh, ii) = 1 - prod(1 - A(hh, :)' .* B(:, ii));
end
end
rntm4(gg) = toc;
clear hh ii
%% Fifth solution
% No vectorization at all
tic
G = ones(M, K);
for hh = 1:M
for ii = 1:K
for jj = 1:N
G(hh, ii) = G(hh, ii) * prod(1 - A(hh, jj) .* B(jj, ii));
end
G(hh, ii) = 1 - G(hh, ii);
end
end
rntm5(gg) = toc;
clear hh ii jj C D E F G
end
prctile([rntm1 rntm2 rntm3 rntm4 rntm5], [2.5 25 50 75 97.5])
% 3.6519 3.5261 0.5912 1.9508 2.7576
% 5.3449 6.8688 1.1973 3.3744 3.9940
% 8.1094 8.7016 1.4116 4.9678 7.0312
% 8.8124 10.5170 1.9874 6.1656 8.8227
% 9.5881 12.0150 2.1529 6.6445 9.5115
mean([rntm1 rntm2 rntm3 rntm4 rntm5])
% 7.2420 8.3068 1.4522 4.5865 6.4423
std([rntm1 rntm2 rntm3 rntm4 rntm5])
% 2.1070 2.5868 0.5261 1.6122 2.4900
The solutions are equivalent but the algorithms with partial vectorization are way more efficient in terms of memory and execution time. Even the triple loop seems to perform better than arrayfun! Is there any approach that is actually better than the third, only partially vectorized solution?
EDIT: Dan's solutions are the best so far. Let rntm6, rntm7 and rntm8 be the runtime of his first, second and third solution. Then:
prctile(rntm6, [2.5 25 50 75 97.5])
% 0.6337 0.6377 0.6480 0.7110 1.2932
mean(rntm6)
% 0.7440
std(rntm6)
% 0.1970
prctile(rntm7, [2.5 25 50 75 97.5])
% 0.6898 0.7130 0.9050 1.1505 1.4041
mean(rntm7)
% 0.9313
std(rntm7)
% 0.2276
prctile(rntm8, [2.5 25 50 75 97.5])
% 0.5949 0.6005 0.6036 0.6370 1.3529
mean(rntm8)
% 0.6753
std(rntm8)
% 0.1890
You can get a minor performance gain with bsxfun:
E = zeros(M, K);
for hh = 1:M
E(hh, :) = 1 - prod((1 - bsxfun(@times, A(hh,:)', B)), 1);
end
And you could squeeze (pun intended) a tiny bit more performance with this:
E = squeeze(1 - prod((1-bsxfun(@times, permute(B, [3 1 2]), A)),2));
Or you could try pre-computing the transpose for my first suggestion:
E = zeros(M, K);
At = A';
for hh = 1:M
E(hh, :) = 1 - prod((1 - bsxfun(@times, At(:,hh), B)), 1);
end
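The trade-off between the fully expanded one-liner and the row-by-row loop can also be sketched in NumPy. The function name one_minus_prod and the chunk parameter are assumptions for this illustration: rows of A are processed in blocks so that the intermediate (rows, N, K) array stays small, which sits between the two bsxfun variants above.

```python
import numpy as np

def one_minus_prod(A, B, chunk=256):
    """Compute C[i, j] = 1 - prod_k (1 - A[i, k] * B[k, j]) in row blocks."""
    M, N = A.shape
    K = B.shape[1]
    C = np.empty((M, K))
    for start in range(0, M, chunk):
        block = A[start:start + chunk]                    # shape (m, N)
        # (m, N, 1) * (1, N, K) broadcasts to (m, N, K); reduce over axis 1
        terms = 1.0 - block[:, :, None] * B[None, :, :]
        C[start:start + chunk] = 1.0 - np.prod(terms, axis=1)
    return C
```

With chunk=1 this behaves like the per-row loop, and with chunk >= M like the one-liner; which setting is fastest depends on the machine and the matrix sizes, so it would need to be timed rather than assumed.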
One situation where you would absolutely benefit from using arrayfun or bsxfun is where you have Parallel Computing Toolbox available and a compatible NVIDIA GPU. In that case, the performance of those two functions is blazingly fast since the body can be sent to the GPU for execution there. See for example: http://www.mathworks.co.uk/help/distcomp/examples/improve-performance-of-element-wise-matlab-functions-on-the-gpu-using-arrayfun.html

How to find optimal overlap of noisy bivalent matrices

I'm dealing with an image processing problem that I've simplified as follows. I have three 10x10 matrices, each with the values 1 or -1 in each cell. Each matrix has an irregular object located somewhere, and there is some noise in the matrix. I'd like to figure out how to find the optimal alignment of the matrices that would let me line up the objects so I can get their average.
With the 1/-1 coding, I know that the product of two matrices (using element-wise multiplication, not matrix multiplication) will yield 1 if there is a match between two multiplied cells and -1 if there is a mismatch, thus the sum of the products yields a measure of overlap. With this, I know I can try out all possible alignments of two matrices to find that which yields the optimal overlap, but I'm not sure how to do this with 3 matrices (or more - I really have 20+ in my actual data set).
To help clarify the problem, here is some code, written in R, that sets up the sort of matrices I'm dealing with:
#set up the 3 matrices
m1 = c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)
m1 = matrix(m1,10)
m2 = c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)
m2 = matrix(m2,10)
m3 = c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1)
m3 = matrix(m3,10)
#show the matrices
image(m1)
image(m2)
image(m3)
#notice there's a "+" shaped object in each
#create noise
set.seed(1)
n1 = sample(c(1,-1),100,replace=T,prob=c(.95,.05))
n1 = matrix(n1,10)
n2 = sample(c(1,-1),100,replace=T,prob=c(.95,.05))
n2 = matrix(n2,10)
n3 = sample(c(1,-1),100,replace=T,prob=c(.95,.05))
n3 = matrix(n3,10)
#add noise to the matrices
mn1 = m1*n1
mn2 = m2*n2
mn3 = m3*n3
#show the noisy matrices
image(mn1)
image(mn2)
image(mn3)
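The pairwise overlap search described in the question (sum of element-wise products over all candidate alignments) can be sketched in NumPy as below. Two things here are assumptions and not part of the question: shifts wrap around the edges (np.roll), and for more than two matrices every matrix is simply aligned to the first one before averaging, which is only one possible heuristic. The names best_shift and aligned_mean are made up for this example.

```python
import numpy as np

def best_shift(ref, img):
    """Try every circular shift of img and return (score, (dr, dc)) for the
    shift that maximizes sum(ref * shifted_img)."""
    best_score, best_offset = -np.inf, (0, 0)
    rows, cols = ref.shape
    for dr in range(rows):
        for dc in range(cols):
            shifted = np.roll(img, (dr, dc), axis=(0, 1))
            score = np.sum(ref * shifted)    # +1 per match, -1 per mismatch
            if score > best_score:
                best_score, best_offset = score, (dr, dc)
    return best_score, best_offset

def aligned_mean(images):
    """Align every image to the first one, then average cell by cell."""
    ref = images[0]
    aligned = [ref]
    for img in images[1:]:
        _, offset = best_shift(ref, img)
        aligned.append(np.roll(img, offset, axis=(0, 1)))
    return np.mean(aligned, axis=0)
```

Using the first matrix as the reference makes the result depend on how noisy that particular matrix is; aligning to a running average instead would be a natural variation to try.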
Here is a program in Mathematica that does what you want (I think).
I may explain it in more detail, if you need.
(*define temp tables*)
r = m = Table[{}, {100}];
(*define noise function*)
noise := Partition[RandomVariate[BinomialDistribution[1, .05], 100],
10];
For[i = 1, i <= 100, i++,
(*generate 100 10x10 matrices with the random cross and noise added*)
w = RandomInteger[6]; h = RandomInteger[6];
m[[i]] = (ArrayPad[CrossMatrix[4, 4], {{w, 6 - w}, {h, 6 - h}}] +
noise) /. 2 -> 1;
(*Select connected components in each matrix and keep only the biggest*)
id = Last@
Commonest[
Flatten@(mf =
MorphologicalComponents[m[[i]], CornerNeighbors -> False]), 2];
d = mf /. {id -> x, x_Integer -> 0} /. {x -> 1};
{minX, maxX, minY, maxY} =
{Min@Thread[g[#]] /. g -> First,
Max@Thread[g[#]] /. g -> First,
Min@Thread[g[#]] /. g -> Last,
Max@Thread[g[#]] /. g -> Last} &@Position[d, 1];
(*Trim the image of the biggest component *)
r[[i]] = d[[minX ;; maxX, minY ;; maxY]];
]
(*As the noise is low, the more repeated component is the image*)
MatrixPlot @@ Commonest@r

Initial conditions with a non-linear ODE in Mathematica

I'm trying to use Mathematica's NDSolve[] to compute a geodesic along a sphere using the coupled ODE:
x" - (x" . x) x = 0
The problem is that I can only enter initial conditions for x(0) and x'(0), and the solver is happy with the solution where x" = 0. My geodesic on the sphere, however, has the initial condition x"(0) = -x(0), and I have no idea how to tell Mathematica that. If I add it as a condition, it says I'm adding True to the list of conditions.
Here is my code:
s1 = NDSolve[{x1''[t] - (x1[t] * x1''[t] + x2[t] * x2''[t] + x3[t]*x3''[t]) * x1[t] == 0, x2''[t] - (x1[t] * x1''[t] + x2[t] * x2''[t] + x3[t]*x3''[t]) * x2[t] == 0, x3''[t] - (x1[t] * x1''[t] + x2[t] * x2''[t] + x3[t]*x3''[t]) * x3[t] == 0, x1[0] == 1, x2[0] == 0, x3[0] == 0, x1'[0] == 0, x2'[0] == 0, x3'[0] == 1} , { x1, x2, x3}, {t, -1, 1}][[1]]
I would like to modify this so that the initial acceleration is not zero but -x(0).
Thanks
Well, as the error message says -- NDSolve only accepts initial conditions for derivatives of orders strictly less than the maximal order appearing in the ODE.
I have a feeling this is more of a mathematics question. Mathematically, {x''[0]==-x0, x[0]==x0} doesn't define a unique solution - you'd have to do something along the lines of {x0.x''[0]==-1, x[0]==x0, x'[0]-x0 x0.x'[0]==v0} for that to work out (NDSolve would still fail with the same error). You do realize you will just get a great circle on the unit sphere, right?
By the way, here is how I would have coded up your example:
x[t_] = Table[Subscript[x, j][t], {j, 3}];
s1 = NDSolve[Flatten[Thread /@ #] &@{
x''[t] - (x''[t].x[t]) x[t] == {0, 0, 0},
x[0] == {1, 0, 0},
x'[0] == {0, 0, 1}
}, x[t], {t, -1, 1}]
I fixed this problem through a mathematical rearrangement rather than addressing my original issue:
Let V(t) be a vector field along x(t).
x . V = 0 implies d/dt (x . V) = (x' . V) + (x . V') = 0
So the equation D/dt V = V' - (x . V') x = V' + (x' . V) x holds
This means the geodesic equation becomes: x" + (x' . x') x = 0 and so it can be solved using the initial conditions I originally had.
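As a quick numerical check of the rearranged equation x" + (x' . x') x = 0, here is a small NumPy sketch that integrates it with the initial conditions from the question and compares the result with the great circle (cos t, 0, sin t). The fixed-step RK4 integrator, the step count, and the function names are assumptions for this illustration, not what NDSolve does internally.

```python
import numpy as np

def rhs(state):
    """First-order form of x'' = -(x'.x') x; state = [x1, x2, x3, v1, v2, v3]."""
    x, v = state[:3], state[3:]
    return np.concatenate([v, -np.dot(v, v) * x])

def rk4(state, t_end, steps):
    """Classical fourth-order Runge-Kutta with a fixed step size."""
    h = t_end / steps
    for _ in range(steps):
        k1 = rhs(state)
        k2 = rhs(state + 0.5 * h * k1)
        k3 = rhs(state + 0.5 * h * k2)
        k4 = rhs(state + h * k3)
        state = state + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return state

# x(0) = (1, 0, 0), x'(0) = (0, 0, 1), as in the question
start = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 1.0])
end = rk4(start, t_end=1.0, steps=1000)
position = end[:3]
```

With unit initial speed the equation reduces to x" = -x along the solution, so the position should stay on the unit sphere and trace the great circle through (1, 0, 0) in the direction (0, 0, 1).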
Thanks a lot Janus for going through and pointing out the various problems I was having including horrible code layout, I learnt a lot through your re-writing as well.
