Speeding up program in matlab

Speeding up program in matlab - performance

I have 2 functions:
ccexpan - which calculates coefficients of interpolating polynomial of function f with N nodes in Chebyshew polynomial of the first kind basis.
csum - calculates value for arguments t using coefficients c from ccexpan (using Clenshaw algorithm).
This is what I have written so far:
function c = ccexpan(f,N)
z = zeros (1,N+1);
s = zeros (1,N+1);
for i = 1:(N+1)
z(i) = pi*(i-1)/N;
end
t = f(cos(z));
for k = 1:(N+1)
s(k) = sum(t.*cos(z.*(k-1)));
s(k) = s(k)-(f(1)+f(-1)*cos(pi*(k-1)))/2;
end
c = s.*2/N;
and:
function y = csum(t,c)
M = length(t);
N = length(c);
y = t;
b = zeros(1,N+2);
for k = 1:M
for i = N:-1:1
b(i) = c(i)+2*t(k)*b(i+1)-b(i+2);
end
y(k)=(b(1)-b(3))/2;
end
Unfortunately these programs are very slow, and also slightly inacurrate. Please give me some tips on how to speed them up, and how to improve accuracy.

Where possible try to get away from looping structures. At first blush, I would trade out your first for loop of
for i = 1:(N+1)
z(i) = pi*(i-1)/N;
end
and replace with
i=1:(N+1)
z = pi*(i-1)/N
I did not check the rest of you code but the above example will definitely speed up you code. And a second strategy is to combine loops when possible.

Martin,
Consider the following strategy.
% create hypothetical N and f
N = 3
f = #(x) 1./(1+15*x.*x)
% calculate z and t
i=1:(N+1)
z = pi*(i-1)/N
t = f(cos(z))
% make a column vector of k's
k = (1:(N+1))'
% do this: s(k) = sum(t.*cos(z.*(k-1)))
s1 = t.*cos(z.*(k-1)) % should be a matrix with one row for each row of k
% via implicit expansion
s2 = sum(s1,2) % row sum, i.e., one value for each row of k
% do this: s(k) = s(k)-(f(1)+f(-1)*cos(pi*(k-1)))/2
s3 = s2 - (f(1)+f(-1)*cos(pi*(k-1)))/2
% calculate c
c = s3 .* 2/N

Related

Optimizing algorithm calculating (sin(x)-x)*x^{-3} (in matlab)

My task is to write optimal program that calculates matrix Y, given matrix X, where:
y = (sin(x)-x) x-3
Here's the code I have written so far:
n = size(X, 1);
m = size(X, 2);
Y = zeros(n, m);
d = n*m;
for i = 1:d
x = X(i);
if abs(x)<0.1
Y(i) = -1/6+x.^2/120-x.^4/5040+x.^6/362880;
else
Y(i) = (sin(x)-x).*(x.^(-3));
end
end
So, generally the formula was inaccurate around 0, so I have approximated it using Taylor theorem.
Unfortunately this program has accuracy of 91% and efficiency of only 24% (so it's 4 times slower than the optimal solution).
The tests are around 13 million samples, out of which around 6 million have value of less than 0.1. The range of samples is (-8π , 8π).
The target accuracy (100%) is 4*epsilon where epsilon equals 2^(-52) (that means that numbers calculated by program shouldn't be larger or smaller than numbers calculated "perfectly" than 4*epsilon).
100*epsilon means accuracy of 86%.
Do you have any ideas on how to make it faster and more accurate? I'm looking both for mathematical tricks on how to further transform given formula, and general MATLAB tips that can accelerate programs?
EDIT:
Using Horner method, I have managed to bring up efficiency up to 81% (accuracy still 91%) with this program:
function Y = main(X)
Y = (sin(X)-X).*(X.^(-3));
i = abs(X) < 0.1;
Y(i) = horner(X(i));
function y = horner (x)
pow = x.*x;
y = -1/6+pow.*(1/120+pow.*(-1/5040+pow./362880));
Do you have any further ideas on how to improve it?

Program seems to work fine for a great range of input:
x = linspace(-8*pi,8*pi,13e6); % 13 million samples in the desired range
y = (sin(x)-x)./x.^3;
plot(x,y)
Due due round-off errors, you may have problem calculating it for very small values of x:
x = 0
y = (sin(x)-x)./x.^3
y =
NaN
You already have the Taylor series expansion of the function around 0. As the Taylor expansion does not include a division by x, you can expect a better behaviour of the Taylor function around this region:
x = -1e-6:1e-9:1e-6;
y = (sin(x)-x)./x.^3;
y_taylor = -1/6 + x.^2/120 - x.^4/5040 + x.^6/362880;
plot(x,y,x,y_taylor); legend('y','taylor expansion','location','best')

You can replace your loop with vectorized code. This is usually more efficient than loop because the loop has a conditional in it, which is bad for branch prediction:
Y = (sin(X)-X).*(X.^(-3));
i = abs(X) < 0.1;
Y(i) = -1/6+X(i).^2/120-X(i).^4/5040+X(i).^6/362880;
Rewriting the primary equation to avoid the cubic root yields a 3x speedup for that computation:
Y = (sin(X)./X - 1) ./ (X.*X);
Speed comparison:
The following script compares timing for this method compared to OP's loop code. I use data that has 7 million values uniformly distributed in (-8π, 8π), and another 6 million values uniformly distributed in (-0.1,0.1).
OP's loop code takes 2.4412 s, and the vectorized solution takes 0.7224 s. Using OP's Horner method and the rewritten sin expression it takes 0.1437 s.
X = [linspace(-8*pi,8*pi,7e6), linspace(-0.1,0.1,6e6)];
timeit(#()method1(X))
timeit(#()method2(X))
function Y = method1(X)
n = size(X, 1);
m = size(X, 2);
Y = zeros(n, m);
d = n*m;
for i = 1:d
x = X(i);
if abs(x)<0.1
Y(i) = -1/6+x.^2/120-x.^4/5040+x.^6/362880;
else
Y(i) = (sin(x)-x).*(x.^(-3));
end
end
end
function Y = method2(X)
Y = (sin(X)-X).*(X.^(-3));
i = abs(X) < 0.1;
Y(i) = -1/6+X(i).^2/120-X(i).^4/5040+X(i).^6/362880;
end
function Y = method3(X)
Y = (sin(X)./X - 1) ./ (X.*X);
i = abs(X) < 0.1;
Y(i) = horner(X(i));
end
function y = horner (x)
pow = x.*x;
y = -1/6+pow.*(1/120+pow.*(-1/5040+pow./362880));
end

How to use the RK4 algorithm to solve an ODE?

I am using an RK4 algorithm:
function R=RK4_h(f,a,b,ya,h)
% Input
% - f field of the edo y'=f(t,y). A string of characters 'f'
% - a and b initial and final time
% - ya initial value y0
% - h lenght of the step
% Output
% - R=[T' Y'] where T independent variable and Y dependent variable
N = fix((b-a) / h);
T = zeros(1,N+1);
Y = zeros(1,N+1);
% Vector of the time values
T = a:h:b;
% Solving ordinary differential equation
Y(1) = ya;
for j = 1:N
k1 = h*feval(f,T(j),Y(j));
k2 = h*feval(f,T(j)+h/2,Y(j)+k1/2);
k3 = h*feval(f,T(j)+h/2,Y(j)+k2/2);
k4 = h*feval(f,T(j)+h,Y(j)+k3);
Y(j+1) = Y(j) + (k1+2*k2+2*k3+k4)/6;
end
R=[T' Y'];
In my main script I call it for every value as:
xlabel('x')
ylabel('y')
h=0.05;
fprintf ('\n First block \n');
xx = [0:h:1];
Nodes = length(xx);
yy = zeros(1,Nodes);
for i=1:Nodes
fp(i)=feval('edo',-1,xx(i));
end
E=RK4_h('edo',0,1,-1,h);
plot(E);
fprintf ('\n%f',E);
The problem is when I try to use RK4 algorithm with edo formula:
function edo = edo(y,t)
edo = 6*((exp(1))^(6*t))*(y-(2*t))^2+2;
The results are not logical, for example, the real value for are: y(0)=8, y(1)=11,53. But the estimate is not close. Any of both coordinates in E vector represents a feasible approach for the problem, so I do not know if this is the correct implementation.
There is a basic error of implementation?

The function edo takes t as the first parameter, and y as the second parameter. You have the parameters reversed.
Your function should be:
function edo = edo(t,y) % NOT edo(y,t)
edo = 6*((exp(1))^(6*t))*(y-(2*t))^2+2;

Compute double sum in matlab efficiently?

I am looking for an optimal way to program this summation ratio. As input I have two vectors v_mn and x_mn with (M*N)x1 elements each.
The ratio is of the form:
The vector x_mn is 0-1 vector so when x_mn=1, the ration is r given above and when x_mn=0 the ratio is 0.
The vector v_mn is a vector which contain real numbers.
I did the denominator like this but it takes a lot of times.
function r_ij = denominator(v_mn, M, N, i, j)
%here x_ij=1, to get r_ij.
S = [];
for m = 1:M
for n = 1:N
if (m ~= i)
if (n ~= j)
S = [S v_mn(i, n)];
else
S = [S 0];
end
else
S = [S 0];
end
end
end
r_ij = 1+S;
end
Can you give a good way to do it in matlab. You can ignore the ratio and give me the denominator which is more complicated.
EDIT: I am sorry I did not write it very good. The i and j are some numbers between 1..M and 1..N respectively. As you can see, the ratio r is many values (M*N values). So I calculated only the value i and j. More precisely, I supposed x_ij=1. Also, I convert the vectors v_mn into a matrix that's why I use double index.

If you reshape your data, your summation is just a repeated matrix/vector multiplication.
Here's an implementation for a single m and n, along with a simple speed/equality test:
clc
%# some arbitrary test parameters
M = 250;
N = 1000;
v = rand(M,N); %# (you call it v_mn)
x = rand(M,N); %# (you call it x_mn)
m0 = randi(M,1); %# m of interest
n0 = randi(N,1); %# n of interest
%# "Naive" version
tic
S1 = 0;
for mm = 1:M %# (you call this m')
if mm == m0, continue; end
for nn = 1:N %# (you call this n')
if nn == n0, continue; end
S1 = S1 + v(m0,nn) * x(mm,nn);
end
end
r1 = v(m0,n0)*x(m0,n0) / (1+S1);
toc
%# MATLAB version: use matrix multiplication!
tic
ninds = [1:m0-1 m0+1:M];
minds = [1:n0-1 n0+1:N];
S2 = sum( x(minds, ninds) * v(m0, ninds).' );
r2 = v(m0,n0)*x(m0,n0) / (1+S2);
toc
%# Test if values are equal
abs(r1-r2) < 1e-12
Outputs on my machine:
Elapsed time is 0.327004 seconds. %# loop-version
Elapsed time is 0.002455 seconds. %# version with matrix multiplication
ans =
1 %# and yes, both are equal
So the speedup is ~133×
Now that's for a single value of m and n. To do this for all values of m and n, you can use an (optimized) double loop around it:
r = zeros(M,N);
for m0 = 1:M
xx = x([1:m0-1 m0+1:M], :);
vv = v(m0,:).';
for n0 = 1:N
ninds = [1:n0-1 n0+1:N];
denom = 1 + sum( xx(:,ninds) * vv(ninds) );
r(m0,n0) = v(m0,n0)*x(m0,n0)/denom;
end
end
which completes in ~15 seconds on my PC for M = 250, N= 1000 (R2010a).
EDIT: actually, with a little more thought, I was able to reduce it all down to this:
denom = zeros(M,N);
for mm = 1:M
xx = x([1:mm-1 mm+1:M],:);
denom(mm,:) = sum( xx*v(mm,:).' ) - sum( bsxfun(#times, xx, v(mm,:)) );
end
denom = denom + 1;
r_mn = x.*v./denom;
which completes in less than 1 second for N = 250 and M = 1000 :)

For a start you need to pre-alocate your S matrix. It changes size every loop so put
S = zeros(m*n, 1)
at the start of your function. This will also allow you to do away with your else conditional statements, ie they will reduce to this:
if (m ~= i)
if (n ~= j)
S(m*M + n) = v_mn(i, n);
Otherwise since you have to visit every element im afraid it may not be able to get much faster.
If you desperately need more speed you can look into doing some mex coding which is code in c/c++ but run in matlab.
http://www.mathworks.com.au/help/matlab/matlab_external/introducing-mex-files.html

Rather than first jumping into vectorization of the double loop, you may want modify the above to make sure that it does what you want. In this code, there is no summing of the data, instead a vector S is being resized at each iteration. As well, the signature could include the matrices V and X so that the multiplication occurs as in the formula (rather than just relying on the value of X to be zero or one, let us pass that matrix in).
The function could look more like the following (I've replaced the i,j inputs with m,n to be more like the equation):
function result = denominator(V,X,m,n)
% use the size of V to determine M and N
[M,N] = size(V);
% initialize the summed value to one (to account for one at the end)
result = 1;
% outer loop
for i=1:M
% ignore the case where m==i
if i~=m
for j=1:N
% ignore the case where n==j
if j~=n
result = result + V(m,j)*X(i,j);
end
end
end
end
Note how the first if is outside of the inner for loop since it does not depend on j. Try the above and see what happens!

You can vectorize from within Matlab to speed up your calculations. Every time you use an operation like ".^" or ".*" or any matrix operation for that matter, Matlab will do them in parallel, which is much, much faster than iterating over each item.
In this case, look at what you are doing in terms of matrices. First, in your loop you are only dealing with the mth row of $V_{nm}$, which we can use as a vector for itself.
If you look at your formula carefully, you can figure out that you almost get there if you just write this row vector as a column vector and multiply the matrix $X_{nm}$ to it from the left, using standard matrix multiplication. The resulting vector contains the sums over all n. To get the final result, just sum up this vector.
function result = denominator_vectorized(V,X,m,n)
% get the part of V with the first index m
Vm = V(m,:)';
% remove the parts of X you don't want to iterate over. Note that, since I
% am inside the function, I am only editing the value of X within the scope
% of this function.
X(m,:) = 0;
X(:,n) = 0;
%do the matrix multiplication and the summation at once
result = 1-sum(X*Vm);
To show you how this optimizes your operation, I will compare it to the code proposed by another commenter:
function result = denominator(V,X,m,n)
% use the size of V to determine M and N
[M,N] = size(V);
% initialize the summed value to one (to account for one at the end)
result = 1;
% outer loop
for i=1:M
% ignore the case where m==i
if i~=m
for j=1:N
% ignore the case where n==j
if j~=n
result = result + V(m,j)*X(i,j);
end
end
end
end
The test:
V=rand(10000,10000);
X=rand(10000,10000);
disp('looped version')
tic
denominator(V,X,1,1)
toc
disp('matrix operation')
tic
denominator_vectorized(V,X,1,1)
toc
The result:
looped version
ans =
2.5197e+07
Elapsed time is 4.648021 seconds.
matrix operation
ans =
2.5197e+07
Elapsed time is 0.563072 seconds.
That is almost ten times the speed of the loop iteration. So, always look out for possible matrix operations in your code. If you have the Parallel Computing Toolbox installed and a CUDA-enabled graphics card installed, Matlab will even perform these operations on your graphics card without any further effort on your part!
EDIT: That last bit is not entirely true. You still need to take a few steps to do operations on CUDA hardware, but they aren't a lot. See Matlab documentation.

Correct implementation of The Johnson-Lindenstrauss lemma

I am trying to implement the Johnson-Lindenstrauss lemma. I have search for the pseudocode here but could not get any.
I don't know if I have implemented it correctly or not. I just want you guys who understand the lemma to please check my code for me and advice me as to the correct matlab implementation.
n = 2;
d = 4;
k = 2;
G = rand(n,d);
epsilon = sqrt(log(n)/k);
% Projection in dim k << d
% Defining P (k x d)
P = randn(k,d);
% Projecting down to k-dim
proj = P.*G;
u = proj(:,1);
v = proj(:,2);
% u = P * G(:,5);
% v = P * G(:,36);
norm(G(:,1)-G(:,2))^2 * k * (1-epsilon);
norm(u - v)^2;
norm(G(:,1)-G(:,2))^2 * k * (1+epsilon);

for the first part of that to find the epsilon you need to solve a polynomial equation.
n = 2;
k = 2;
pol1 = [-1/3 1/2 0 4*log2(n)/k];
c = roots(pol1)
1.4654 + 1.4304i
1.4654 - 1.4304i
-1.4308 + 0.0000i
Then you need to remove the complex roots and keep the real one:
epsilon = c(imag(c)==0);
% if there are more than one root with imaginary part equal to 0 then you need to select the smaller one.
now you know that the epsilon should be equal or greater that the result.

For any set of m points in R^N and for k = 20*logm/epsilon^2 and epsilon < 1/2:
1/sqrt(k).*randn(k,N)
obtain Pr[success]>=1-2m^(5*epsilon-3)

An R package is available to perform Random projection using Johnson Lindenstrauss Lemma RandPro

Improving performance of interpolation (Barycentric formula)

I have been given an assignment in which I am supposed to write an algorithm which performs polynomial interpolation by the barycentric formula. The formulas states that:
p(x) = (SIGMA_(j=0 to n) w(j)*f(j)/(x - x(j)))/(SIGMA_(j=0 to n) w(j)/(x - x(j)))
I have written an algorithm which works just fine, and I get the polynomial output I desire. However, this requires the use of some quite long loops, and for a large grid number, lots of nastly loop operations will have to be done. Thus, I would appreciate it greatly if anyone has any hints as to how I may improve this, so that I will avoid all these loops.
In the algorithm, x and f stand for the given points we are supposed to interpolate. w stands for the barycentric weights, which have been calculated before running the algorithm. And grid is the linspace over which the interpolation should take place:
function p = barycentric_formula(x,f,w,grid)
%Assert x-vectors and f-vectors have same length.
if length(x) ~= length(f)
sprintf('Not equal amounts of x- and y-values. Function is terminated.')
return;
end
n = length(x);
m = length(grid);
p = zeros(1,m);
% Loops for finding polynomial values at grid points. All values are
% calculated by the barycentric formula.
for i = 1:m
var = 0;
sum1 = 0;
sum2 = 0;
for j = 1:n
if grid(i) == x(j)
p(i) = f(j);
var = 1;
else
sum1 = sum1 + (w(j)*f(j))/(grid(i) - x(j));
sum2 = sum2 + (w(j)/(grid(i) - x(j)));
end
end
if var == 0
p(i) = sum1/sum2;
end
end

This is a classical case for matlab 'vectorization'. I would say - just remove the loops. It is almost that simple. First, have a look at this code:
function p = bf2(x, f, w, grid)
m = length(grid);
p = zeros(1,m);
for i = 1:m
var = grid(i)==x;
if any(var)
p(i) = f(var);
else
sum1 = sum((w.*f)./(grid(i) - x));
sum2 = sum(w./(grid(i) - x));
p(i) = sum1/sum2;
end
end
end
I have removed the inner loop over j. All I did here was in fact removing the (j) indexing and changing the arithmetic operators from / to ./ and from * to .* - the same, but with a dot in front to signify that the operation is performed on element by element basis. This is called array operators in contrast to ordinary matrix operators. Also note that treating the special case where the grid points fall onto x is very similar to what you had in the original implementation, only using a vector var such that x(var)==grid(i).
Now, you can also remove the outermost loop. This is a bit more tricky and there are two major approaches how you can do that in MATLAB. I will do it the simpler way, which can be less efficient, but more clear to read - using repmat:
function p = bf3(x, f, w, grid)
% Find grid points that coincide with x.
% The below compares all grid values with all x values
% and returns a matrix of 0/1. 1 is in the (row,col)
% for which grid(row)==x(col)
var = bsxfun(#eq, grid', x);
% find the logical indexes of those x entries
varx = sum(var, 1)~=0;
% and of those grid entries
varp = sum(var, 2)~=0;
% Outer-most loop removal - use repmat to
% replicate the vectors into matrices.
% Thus, instead of having a loop over j
% you have matrices of values that would be
% referenced in the loop
ww = repmat(w, numel(grid), 1);
ff = repmat(f, numel(grid), 1);
xx = repmat(x, numel(grid), 1);
gg = repmat(grid', 1, numel(x));
% perform the calculations element-wise on the matrices
sum1 = sum((ww.*ff)./(gg - xx),2);
sum2 = sum(ww./(gg - xx),2);
p = sum1./sum2;
% fix the case where grid==x and return
p(varp) = f(varx);
end
The fully vectorized version can be implemented with bsxfun rather than repmat. This can potentially be a bit faster, since the matrices are not explicitly formed. However, the speed difference may not be large for small system sizes.
Also, the first solution with one loop is also not too bad performance-wise. I suggest you test those and see, what is better. Maybe it is not worth it to fully vectorize? The first code looks a bit more readable..

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Speeding up program in matlab - performance

Related

Optimizing algorithm calculating (sin(x)-x)*x^{-3} (in matlab)

How to use the RK4 algorithm to solve an ODE?

Compute double sum in matlab efficiently?

Correct implementation of The Johnson-Lindenstrauss lemma

Improving performance of interpolation (Barycentric formula)

Categories

Resources