Vectorization of matlab code - performance

i'm kinda new to vectorization. Have tried myself but couldn't. Can somebody help me vectorize this code as well as give a short explaination on how u do it, so that i can adapt the thinking process too. Thanks.
function [result] = newHitTest (point,Polygon,r,tol,stepSize)
%This function calculates whether a point is allowed.
%First is a quick test is done by calculating the distance from point to
%each point of the polygon. If that distance is smaller than range "r",
%the point is not allowed. This will slow down the algorithm at some
%points, but will greatly speed it up in others because less calls to the
%circleTest routine are needed.
polySize=size(Polygon,1);
testCounter=0;
for i=1:polySize
d = sqrt(sum((Polygon(i,:)-point).^2));
if d < tol*r
testCounter=1;
break
end
end
if testCounter == 0
circleTestResult = circleTest (point,Polygon,r,tol,stepSize);
testCounter = circleTestResult;
end
result = testCounter;

Given the information that Polygon is 2 dimensional, point is a row vector and the other variables are scalars, here is the first version of your new function (scroll down to see that there are lots of ways to skin this cat):
function [result] = newHitTest (point,Polygon,r,tol,stepSize)
result = 0;
linDiff = Polygon-repmat(point,size(Polygon,1),1);
testLogicals = sqrt( sum( ( linDiff ).^2 ,2 )) < tol*r;
if any(testLogicals); result = circleTest (point,Polygon,r,tol,stepSize); end
The thought process for vectorization in Matlab involves trying to operate on as much data as possible using a single command. Most of the basic builtin Matlab functions operate very efficiently on multi-dimensional data. Using for loop is the reverse of this, as you are breaking your data down into smaller segments for processing, each of which must be interpreted individually. By resorting to data decomposition using for loops, you potentially loose some of the massive performance benefits associated with the highly optimised code behind the Matlab builtin functions.
The first thing to think about in your example is the conditional break in your main loop. You cannot break from a vectorized process. Instead, calculate all possibilities, make an array of the outcome for each row of your data, then use the any keyword to see if any of your rows have signalled that the circleTest function should be called.
NOTE: It is not easy to efficiently conditionally break out of a calculation in Matlab. However, as you are just computing a form of Euclidean distance in the loop, you'll probably see a performance boost by using the vectorized version and calculating all possibilities. If the computation in your loop were more expensive, the input data were large, and you wanted to break out as soon as you hit a certain condition, then a matlab extension made with a compiled language could potentially be much faster than a vectorized version where you might be performing needless calculation. However this is assuming that you know how to program code that matches the performance of the Matlab builtins in a language that compiles to native code.
Back on topic ...
The first thing to do is to take the linear difference (linDiff in the code example) between Polygon and your row vector point. To do this in a vectorized manner, the dimensions of the 2 variables must be identical. One way to achieve this is to use repmat to copy each row of point to make it the same size as Polygon. However, bsxfun is usually a superior alternative to repmat (as described in this recent SO question), making the code ...
function [result] = newHitTest (point,Polygon,r,tol,stepSize)
result = 0;
linDiff = bsxfun(#minus, Polygon, point);
testLogicals = sqrt( sum( ( linDiff ).^2 ,2 )) < tol*r;
if any(testLogicals); result = circleTest (point,Polygon,r,tol,stepSize); end
I rolled your d value into a column of d by summing across the 2nd axis (note the removal of the array index from Polygon and the addition of ,2 in the sum command). I then went further and evaluated the logical array testLogicals inline with the calculation of the distance measure. You will quickly see that a downside of heavy vectorisation is that it can make the code less readable to those not familiar with Matlab, but the performance gains are worth it. Comments are pretty necessary.
Now, if you want to go completely crazy, you could argue that the test function is so simple now that it warrants use of an 'anonymous function' or 'lambda' rather than a complete function definition. The test for whether or not it is worth doing the circleTest does not require the stepSize argument either, which is another reason for perhaps using an anonymous function. You can roll your test into an anonymous function and then jut use circleTest in your calling script, making the code self documenting to some extent . . .
doCircleTest = #(point,Polygon,r,tol) any(sqrt( sum( bsxfun(#minus, Polygon, point).^2, 2 )) < tol*r);
if doCircleTest(point,Polygon,r,tol)
result = circleTest (point,Polygon,r,tol,stepSize);
else
result = 0;
end
Now everything is vectorised, the use of function handles gives me another idea . . .
If you plan on performing this at multiple points in the code, the repetition of the if statements would get a bit ugly. To stay dry, it seems sensible to put the test with the conditional function into a single function, just as you did in your original post. However, the utility of that function would be very narrow - it would only test if the circleTest function should be executed, and then execute it if needs be.
Now imagine that after a while, you have some other conditional functions, just like circleTest, with their own equivalent of doCircleTest. It would be nice to reuse the conditional switching code maybe. For this, make a function like your original that takes a default value, the boolean result of the computationally cheap test function, and the function handle of the expensive conditional function with its associated arguments ...
function result = conditionalFun( default, cheapFunResult, expensiveFun, varargin )
if cheapFunResult
result = expensiveFun(varargin{:});
else
result = default;
end
end %//of function
You could call this function from your main script with the following . . .
result = conditionalFun(0, doCircleTest(point,Polygon,r,tol), #circleTest, point,Polygon,r,tol,stepSize);
...and the beauty of it is you can use any test, default value, and expensive function. Perhaps a little overkill for this simple example, but it is where my mind wandered when I brought up the idea of using function handles.

Related

Poor performance in matlab

So I had to write a program in Matlab to calculate the convolution of two functions, manually. I wrote this simple piece of code that I know is not that optimized probably:
syms recP(x);
recP(x) = rectangularPulse(-1,1,x);
syms triP(x);
triP(x) = triangularPulse(-1,1,x);
t = -10:0.1:10;
s1 = -10:0.1:10;
for i = 1:201
s1(i) = 0;
for j = t
s1(i) = s1(i) + ( recP(j) * triP(t(i)-j) );
end
end
plot(t,s1);
I have a core i7-7700HQ coupled with 32 GB of RAM. Matlab is stored on my HDD and my Windows is on my SSD. The problem is that this simple code is taking I think at least 20 minutes to run. I have it in a section and I don't run the whole code. Matlab is only taking 18% of my CPU and 3 GB of RAM for this task. Which is I think probably enough, I don't know. But I don't think it should take that long.
Am I doing anything wrong? I've searched for how to increase the RAM limit of Matlab, and I found that it is not limited and it takes how much it needs. I don't know if I can increase the CPU usage of it or not.
Is there any solution to how make things a little bit faster? I have like 6 or 7 of these for loops in my homework and it takes forever if I run the whole live script. Thanks in advance for your help.
(Also, it highlights the piece of code that is currently running. It is the for loop, the outer one is highlighted)
Like Ander said, use the symbolic toolbox in matlab as a last resort. Additionally, when trying to speed up matlab code, focus on taking advantage of matlab's vectorized operations. What I mean by this is matlab is very efficient at performing operations like this:
y = x.*z;
where x and z are some Nx1 vectors each and the operator '.*' is called 'dot multiplication'. This is essentially telling matlab to perform multiplication on x1*z1, x[2]*z[2] .... x[n]*z[n] and assign all the values to the corresponding value in the vector y. Additionally, many of the functions in matlab are able to accept vectors as inputs and perform their operations on each element and return an equal size vector with the output at each element. You can check this for any given function by scrolling down in its documentation to the inputs and outputs section and checking what form of array the inputs and outputs can take. For example, rectangularPulse's documentation says it can accept vectors as inputs. Therefore, you can simplify your inner loop to this:
s1(i) = s1(i) + ( rectangularPulse(-1,1,t) * triP(t(i)-t) );
So to summarize:
Avoid the symbolic toolbox in matlab until you have a better handle of what you're doing or you absolutely have to use it.
Use matlab's ability to handle vectors and arrays very well.
Deconstruct any nested loops you write one at a time from the inside out. Usually this dramatically accelerates matlab code especially when you are new to writing it.
See if you can even further simplify the code and get rid of your outer loop as well.

construct a structured matrix efficiently in fortran

Having left Fortran for several years, now I have to pick it up and start to work with it again.
I'd like to construct a matrix with entry(i,j) in the form f(x_i,y_j), where f is a function of two variables, e.g., f(x,y)=cos(x-y). In Matlab or Python(Numpy), there are efficient ways to handle this kind of specific issue. I wonder whether there is such optimization in Fortran.
BTW, is it also true in Fortran that a vectorized operation is faster than a do/for loop (as is the case in Matlab and Numpy) ?
If you mean by vectorized the same as you mean in Matlab and Python, the short form you call on whole array then no, these forms are often slower, because they mey be harder to optimize than simple loops. What is faster is when the compiler actually uses the vector instructions of the CPU, but that is something else. And it is easier for the compiler to use them for simple loops.
Fortran has elemental functions, do concurrent, forall and where constructs, implied loops and array constructors. There is no point repeating them here, they have been described many times on this site or in tutorials.
Your example is most simply done using a loop
do j = 1, ny
do i = 1, nx
entry(i,j) = f(x(i), y(j))
end do
end do
One of the short ways, you probably meant by Python-like vectorization, would be the whole-array operations, e.g.,
A = cos(B)
C = A * B
D = f(A*B)
and similar. The function (which is called on each element of the array), must be elemental. These operations are not necessarily efficient. For example, the last call may require a temporary array to be created, which would be avoided when using a loop.

Faster way of testing a condition in MATLAB

I need to run many many tests of the form a<0 where a is a vector (a relatively short one). I am currently doing it with
all(v<0)
Is there a faster way?
Not sure which one will be faster (that may depend on the machine and Matlab version), but here are some alternatives to all(v<0):
~any(v>0)
nnz(v>=0)==0 %// Or ~nnz(v>=0)
sum(v>=0)==0 %// Or ~sum(v>=0)
isempty(find(v>0, 1)) %// Or isempty(find(v>0))
I think the issue is that the conditional is executed on all elements of the array first, then the condition is tested... That is, for the test "any(v<0)", matlab does the following I believe:
Step 1: compute v<0 for every element of v
Step 2: search through the results of step 1 for a true value
So even if the first element of v is less than zero, the conditional was first computed for all elements, hence wasting a lot of time. I think this is also true for any of the alternative solutions offered above.
I don't know of a faster way to do it easily, but wish I did. In some cases, breaking the array v up into smaller chunks and testing incrementally could speed things up, particularly if the condition is common. For example:
function result = anyLessThanZero(v);
w = v(:);
result = true;
for i=1:numel(w)
if ( w(i) < 0 )
return;
end
end
result = false;
end
but that can be very inefficient if the condition is rare. (If you were to really do this, there is probably a better way than I illustrate above to handle any condition, not just <0, but I show it this way to make it clear).

How to run a method in parallel using Julia?

I was reading Parallel Computing docs of Julia, and having never done any parallel coding, I was left wanting a gentler intro. So, I thought of a (probably) simple problem that I couldn't figure out how to code in parallel Julia paradigm.
Let's say I have a matrix/dataframe df from some experiment. Its N rows are variables, and M columns are samples. I have a method pwCorr(..) that calculates pairwise correlation of rows. If I wanted an NxN matrix of all the pairwise correlations, I'd probably run a for-loop that'd iterate for N*N/2 (upper or lower triangle of the matrix) and fill in the values; however, this seems like a perfect thing to parallelize since each of the pwCorr() calls are independent of others. (Am I correct in thinking this way about what can be parallelized, and what cannot?)
To do this, I feel like I'd have to create a DArray that gets filled by a #parallel for loop. And if so, I'm not sure how this can be achieved in Julia. If that's not the right approach, I guess I don't even know where to begin.
This should work, first you need to propagate the top level variable (data) to all the workers:
for pid in workers()
remotecall(pid, x->(global data; data=x; nothing), data)
end
then perform the computation in chunks using the DArray constructor with some fancy indexing:
corrs = DArray((20,20)) do I
out=zeros(length(I[1]),length(I[2]))
for i=I[1], j=I[2]
if i<j
out[i-minimum(I[1])+1,j-minimum(I[2])+1]= 0.0
else
out[i-minimum(I[1])+1,j-minimum(I[2])+1] = cor(vec(data[i,:]), vec(data[j,:]))
end
end
out
end
In more detail, the DArray constructor takes a function which takes a tuple of index ranges and returns a chunk of the resulting matrix which corresponds to those index ranges. In the code above, I is the tuple of ranges with I[1] being the first range. You can see this more clearly with:
julia> DArray((10,10)) do I
println(I)
return zeros(length(I[1]),length(I[2]))
end
From worker 2: (1:10,1:5)
From worker 3: (1:10,6:10)
where you can see it split the array into two chunks on the second axis.
The trickiest part of the example was converting from these 'global' index ranges to local index ranges by subtracting off the minimum element and then adding back 1 for the 1 based indexing of Julia.
Hope that helps!

Unexpected slowdown of function that modifies array in-place

This bug is due to Matlab being too smart for its own good.
I have something like
for k=1:N
stats = subfun(E,k,stats);
end
where statsis a 1xNarray, N=5000 say, and subfun calculates stats(k)from E, and fills it into stats
function stats = subfun(E,k,stats)
s = mean(E);
stats(k) = s;
end
Of course, there is some overhead in passing a large array back and forth, only to fill in one of its elements. In my case, however, the overhead is negligable, and I prefer this code instead of
for k=1:N
s = subfun(E,k);
stats(k) = s;
end
My preference is because I actually have a lot more assignments than just stats.
Also some of the assignments are actually a good deal more complicated.
As mentioned, the overhead is negligable. But, if I do something trivial, like this inconsequential if-statement
for k=1:N
i = k;
if i>=1
stats = subfun(E,i,stats);
end
end
the assignments that take place inside subfun then suddenly takes "forever" (it increases much faster than linearly with N). And it's the assignment, not the calculation that takes forever. In fact, it is even worse than the following nonsensical subfun
function stats = subfun(E,k,stats)
s = calculation_on_E(E);
clear stats
stats(k) = s;
end
which requires re-allocation of stats every time.
Does anybody have the faintest idea why this happens?
This might be due to some obscure detail of Matlab's JIT. The JIT of recent versions of Matlab knows not to create a new array, but to do modifications in-place in some limited cases. One of the requirements is that the function is defined as
function x = modify_big_matrix(x, i, j)
x(i, j) = 123;
and not as
function x_out = modify_big_matrix(x_in, i, j)
x_out = x_in;
x_out(i, j) = 123;
Your examples seem to follow this rule, so, as Praetorian mentioned, your if statement might prevent the JIT from recognizing that it is an in-place operation.
If you really need to speed up your algorithm, it is possible to modify arrays in-place using your own mex-functions. I have successfully used this trick to gain a factor of 4 speedup on some medium sized arrays (order 100x100x100 IIRC). This is however not recommended, could segfault Matlab if you are not careful and might stop working in future versions.
As discussed by others, the problem almost certainly lies with JIT and its relatively fragile ability to modify in place.
As mentioned, I really prefer the first form of the function call and assignments, although other workable solutions have been suggested. Without relying on JIT, the only way this can be efficient (as far as I can see) is some form of passing by reference.
Therefore I made a class Stats that inherits from handle, and which contains the data array for k=1:N. It is then passed by reference.
For future reference, this seems to work very well, with good performance, and I'm currently using it as my working solution.

Resources