Related
Suppose I have:
A = rand(1,10,3);
B = rand(10,16);
And I want to get:
C(:,1) = A(:,:,1)*B;
C(:,2) = A(:,:,2)*B;
C(:,3) = A(:,:,3)*B;
Can I somehow multiply this in a single line so that it is faster?
What if I create new tensor b like this
for i = 1:3
b(:,:,i) = B;
end
Can I multiply A and b to get the same C but faster? Time taken in creation of b by the loop above doesn't matter since I will be needing C for many different A-s while B stays the same.
Permute the dimensions of A and B and then apply matrix multiplication:
C = B.'*permute(A, [2 3 1]);
If A is a true 3D array, something like A = rand(4,10,3) and assuming that B stays as a 2D array, then each A(:,:,1)*B would yield a 2D array.
So, assuming that you want to store those 2D arrays as slices in the third dimension of output array, C like so -
C(:,:,1) = A(:,:,1)*B;
C(:,:,2) = A(:,:,2)*B;
C(:,:,3) = A(:,:,3)*B; and so on.
To solve this in a vectorized manner, one of the approaches would be to use reshape A into a 2D array merging the first and third dimensions and then performing matrix-muliplication. Finally, to bring the output size same as the earlier listed C, we need a final step of reshaping.
The implementation would look something like this -
%// Get size and then the final output C
[m,n,r] = size(A);
out = permute(reshape(reshape(permute(A,[1 3 2]),[],n)*B,m,r,[]),[1 3 2]);
Sample run -
>> A = rand(4,10,3);
B = rand(10,16);
C(:,:,1) = A(:,:,1)*B;
C(:,:,2) = A(:,:,2)*B;
C(:,:,3) = A(:,:,3)*B;
>> [m,n,r] = size(A);
out = permute(reshape(reshape(permute(A,[1 3 2]),[],n)*B,m,r,[]),[1 3 2]);
>> all(C(:)==out(:)) %// Verify results
ans =
1
As per the comments, if A is a 3D array with always a singleton dimension at the start, you can just use squeeze and then matrix-multiplication like so -
C = B.'*squeeze(A)
EDIT: #LuisMendo points out that this is indeed possible for this specific use case. However, it is not (in general) possible if the first dimension of A is not 1.
I've grappled with this for a while now, and I've never been able to come up with a solution. Performing element-wise calculations is made nice by bsxfun, but tensor multiplication is something which is woefully unsupported. Sorry, and good luck!
You can check out this mathworks file exchange file, which will make it easier for you and supports the behavior you're looking for, but I believe that it relies on loops as well. Edit: it relies on MEX/C++, so it isn't a pure MATLAB solution if that's what you're looking for.
I have to agree with #GJSein, the for loop is really fast
time
0.7050 0.3145
Here's the timer function
function time
n = 1E7;
A = rand(1,n,3);
B = rand(n,16);
t = [];
C = {};
tic
C{length(C)+1} = squeeze(cell2mat(cellfun(#(x) x*B,num2cell(A,[1 2]),'UniformOutput',false)));
t(length(t)+1) = toc;
tic
for i = 1:size(A,3)
C{length(C)+1}(:,i) = A(:,:,i)*B;
end
t(length(t)+1) = toc;
disp(t)
end
Suppose I have an unknown vector v, and a permutation p.
How can I reconstruct v from v(p) and p?
An equivalent question would be to find a permutation q such that p(q) = [1 2 ... n]?
Since this is going to run in a tight loop, I need the answer to be vectorized (and efficient).
To find the inverse permutation I usually use:
[~,q] = sort(p);
Which is faster than the methods suggested by Divakar.
If you want the inverse permutation q of p, it won't get more efficient than:
q(p) = 1:numel(p);
You can thus reconstruct v from vp = v(p) and p via:
q(p) = 1:numel(p);
v = vp(q);
or even faster without explicitly constructing q:
v(p) = vp;
(You might have noticed that v = vp(q) corresponds to v == P^(-1)*vp and v(p) = vp corresponds to P*v == vp for appropriate permutation operators (matrices) P = sparse(1:numel(p),p,1) and P^(-1)==P.'==sparse(p,1:numel(p),1). Thus yielding the same result.)
If you use this in a loop, do however mind to properly reset q or v respectively to [] before this operation. In case of changing length of p, you would otherwise get wrong results if the new p was shorter than the old p.
With ismember -
[~,q] = ismember(1:numel(p),p)
With intersect -
[~,~,q] = intersect(1:numel(p),p)
With bsxfun -
[q,~] = find(bsxfun(#eq,[1:numel(p)],p(:)))
My question is twofold:
In the below, A = full(S) where S is a sparse matrix.
What's the "correct" way to access an element in a sparse matrix?
That is, what would the sparse equivalent to var = A(row, col) be?
My view on this topic: You wouldn't do anything different. var = S(row, col) is as efficient as it gets.
What's the "correct" way to add elements to a sparse matrix?
That is, what would the sparse equivalent of A(row, col) = var be? (Assuming A(row, col) == 0 to begin with)
It is known that simply doing A(row, col) = var is slow for large sparse matrices. From the documentation:
If you wanted to change a value in this matrix, you might be tempted
to use the same indexing:
B(3,1) = 42; % This code does work, however, it is slow.
My view on this topic: When working with sparse matrices, you often start with the vectors and use them to create the matrix this way: S = sparse(i,j,s,m,n). Of course, you could also have created it like this: S = sparse(A) or sprand(m,n,density) or something similar.
If you start of the first way, you would simply do:
i = [i; new_i];
j = [j; new_j];
s = [s; new_s];
S = sparse(i,j,s,m,n);
If you started out not having the vectors, you would do the same thing, but use find first:
[i, j, s] = find(S);
i = [i; new_i];
j = [j; new_j];
s = [s; new_s];
S = sparse(i,j,s,m,n);
Now you would of course have the vectors, and can reuse them if you're doing this operation several times. It would however be better to add all new elements at once, and not do the above in a loop, because growing vectors are slow. In this case, new_i, new_j and new_s will be vectors corresponding to the new elements.
MATLAB stores sparse matrices in compressed column format. This means that when you perform an operations like A(2,2) (to get the element in at row 2, column 2) MATLAB first access the second column and then finds the element in row 2 (row indices in each column are stored in ascending order). You can think of it as:
A2 = A(:,2);
A2(2)
If you are only accessing a single element of sparse matrix doing var = S(r,c) is fine. But if you are looping over the elements of a sparse matrix, you probably want to access one column at a time, and then loop over the nonzero row indices via [i,~,x]=find(S(:,c)). Or use something like spfun.
You should avoid constructing a dense matrix A and then doing S = sparse(A), as this operations just squeezes out zeros. Instead, as you note, it's much more efficient to build a sparse matrix from scratch using triplet-form and a call to sparse(i,j,x,m,n). MATLAB has a nice page which describes how to efficiently construct sparse matrices.
The original paper describing the implementation of sparse matrices in MATLAB is quite a good read. It provides some more info on how the sparse matrix algorithms were originally implemented.
EDIT: Answer modified according to suggestions by Oleg (see comments).
Here is my benchmark for the second part of your question. For testing direct insertion, the matrices are initialized empty with a varying nzmax. For testing rebuilding from index vectors this is irrelevant as the matrix is built from scratch at every call. The two methods were tested for doing a single insertion operation (of a varying number of elements), or for doing incremental insertions, one value at a time (up to the same numbers of elements). Due to the computational strain I lowered the number of repetitions from 1000 to 100 for each test case. I believe this is still statistically viable.
Ssize = 10000;
NumIterations = 100;
NumInsertions = round(logspace(0, 4, 10));
NumInitialNZ = round(logspace(1, 4, 4));
NumTests = numel(NumInsertions) * numel(NumInitialNZ);
TimeDirect = zeros(numel(NumInsertions), numel(NumInitialNZ));
TimeIndices = zeros(numel(NumInsertions), 1);
%% Single insertion operation (non-incremental)
% Method A: Direct insertion
for iInitialNZ = 1:numel(NumInitialNZ)
disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
S(r,c) = 1;
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
TimeDirect(iInsertions, iInitialNZ) = tSum;
end
end
% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
i = []; j = []; s = [];
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
s_ones = ones(NumInsertions(iInsertions), 1);
tic
i_new = [i; r];
j_new = [j; c];
s_new = [s; s_ones];
S = sparse(i_new, j_new ,s_new , Ssize, Ssize);
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
TimeIndices(iInsertions) = tSum;
end
SingleOperation.TimeDirect = TimeDirect;
SingleOperation.TimeIndices = TimeIndices;
%% Incremental insertion
for iInitialNZ = 1:numel(NumInitialNZ)
disp(['Running with initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]);
% Method A: Direct insertion
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
S = spalloc(Ssize, Ssize, NumInitialNZ(iInitialNZ));
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
for ii = 1:NumInsertions(iInsertions)
S(r(ii),c(ii)) = 1;
end
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' direct insertions: ' num2str(tSum) ' seconds']);
TimeDirect(iInsertions, iInitialNZ) = tSum;
end
end
% Method B: Rebuilding from index vectors
for iInsertions = 1:numel(NumInsertions)
tSum = 0;
for jj = 1:NumIterations
i = []; j = []; s = [];
r = randi(Ssize, NumInsertions(iInsertions), 1);
c = randi(Ssize, NumInsertions(iInsertions), 1);
tic
for ii = 1:NumInsertions(iInsertions)
i = [i; r(ii)];
j = [j; c(ii)];
s = [s; 1];
S = sparse(i, j ,s , Ssize, Ssize);
end
tSum = tSum + toc;
end
disp([num2str(NumInsertions(iInsertions)) ' indexed insertions: ' num2str(tSum) ' seconds']);
TimeIndices(iInsertions) = tSum;
end
IncremenalInsertion.TimeDirect = TimeDirect;
IncremenalInsertion.TimeIndices = TimeIndices;
%% Plot results
% Single insertion
figure;
loglog(NumInsertions, SingleOperation.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
loglog(NumInsertions, SingleOperation.TimeDirect(:, iInitialNZ));
cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for single insertion operation');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;
% Incremental insertions
figure;
loglog(NumInsertions, IncremenalInsertion.TimeIndices);
cellLegend = {'Using index vectors'};
hold all;
for iInitialNZ = 1:numel(NumInitialNZ)
loglog(NumInsertions, IncremenalInsertion.TimeDirect(:, iInitialNZ));
cellLegend = [cellLegend; {['Direct insertion, initial nzmax = ' num2str(NumInitialNZ(iInitialNZ))]}];
end
hold off;
title('Benchmark for incremental insertions');
xlabel('Number of insertions'); ylabel('Runtime for 100 operations [sec]');
legend(cellLegend, 'Location', 'NorthWest');
grid on;
I ran this in MATLAB R2012a. The results for doing a single insertion operations are summarized in this graph:
This shows that using direct insertion is much slower than using index vectors, if only a single operation is done. The growth in the case of using index vectors can be either because of growing the vectors themselves or from the lengthier sparse matrix construction, I'm not sure which. The initial nzmax used to construct the matrices seems to have no effect on their growth.
The results for doing incremental insertions are summarized in this graph:
Here we see the opposite trend: using index vectors is slower, because of the overhead of incrementally growing them and rebuilding the sparse matrix at every step. A way to understand this is to look at the first point in the previous graph: for insertion of a single element, it is more effective to use direct insertion rather than rebuilding using the index vectors. In the incrementlal case, this single insertion is done repetitively, and so it becomes viable to use direct insertion rather than index vectors, against MATLAB's suggestion.
This understanding also suggests that were we to incrementally add, say, 100 elements at a time, the efficient choice would then be to use index vectors rather than direct insertion, as the first graph shows this method to be faster for insertions of this size. In between these two regimes is an area where you should probably experiment to see which method is more effective, though probably the results will show that the difference between the methods is neglibile there.
Bottom line: which method should I use?
My conclusion is that this is dependant on the nature of your intended insertion operations.
If you intend to insert elements one at a time, use direct insertion.
If you intend to insert a large (>10) number of elements at a time, rebuild the matrix from index vectors.
This is the code:
f = dsolve('D3y+12*Dy+y = 0 ,y(2) = 1 ,Dy(2) = 1, D2y(2) = -1');
feval(symengine, 'numeric::solve',strcat(char(f),'=1'),'t=-4..16','AllRealRoots')
If I remove 'AllRealRoots' option it works fast and finds a solution, but when I enable the option Matlab does not finish for an hour. Am I using a wrong numerical method?
First, straight from the documentation for numeric::solve:
If eqs is a non-polynomial/non-rational equation or a set or list containing such an equation, then the equations and the appropriate optional arguments are passed to the numerical solver numeric::fsolve.
So, as your equation f is non-polynomial, you should probably call numeric::fsolve directly. However, even with the 'MultiSolutions' it fails to return more than one root over your range (A bug perhaps? – I'm using R2013b). A workaround is to call numeric::realroots to get bounds on each of the district real roots in your range and then solve for them separately:
f = dsolve('D3y+12*Dy+y = 0 ,y(2) = 1 ,Dy(2) = 1, D2y(2) = -1');
r = feval(symengine, 'numeric::realroots', f==1, 't = -4 .. 16');
num_roots = numel(r);
T = zeros(num_roots,1); % Wrap in sym or vpa for higher precision output
syms t;
for i = 1:num_roots
bnds = r(i);
ri = feval(symengine, '_range', bnds(1), bnds(2));
s = feval(symengine, 'numeric::fsolve', f==1, t==ri);
T(i) = feval(symengine, 'rhs', s(1));
end
The resultant solution vector, T, is double-precision (allocate it as sym or vpa you want higher precision):
T =
-0.663159371123072
0.034848320470578
0.999047064621451
2.000000000000000
2.695929753727520
3.933983894260340
4.405822476913172
5.868112290810963
6.108685019679461
You may be able to remove the for loop if you can figure out how to cleanly pass in the output of 'numeric::realroots' to 'numeric::fsolve' in one go (it's doable, but may require converting stuf to strings unless you're clever).
Another (possibly even faster) approach is to switch to using the numeric (floating-point) function fzero for the second half after you bound all of the roots:
f = dsolve('D3y+12*Dy+y = 0 ,y(2) = 1 ,Dy(2) = 1, D2y(2) = -1');
r = feval(symengine, 'numeric::realroots', f==1, 't = -4 .. 16');
num_roots = numel(r);
T = zeros(num_roots,1);
g = matlabFunction(f-1); % Create anonymous function from f
for i = 1:num_roots
bnds = double(r(i));
T(i) = fzero(g,bnds);
end
I checked and, for your problem here and using the default tolerances, the resultant T is within a few times machine epsilon (eps) of the numeric::fsolve' solution.
I would like to compute the maximum of translated images along the direction of a given axis. I know about ordfilt2, however I would like to avoid using the Image Processing Toolbox.
So here is the code I have so far:
imInput = imread('tire.tif');
n = 10;
imMax = imInput(:, n:end);
for i = 1:(n-1)
imMax = max(imMax, imInput(:, i:end-(n-i)));
end
Is it possible to avoid using a for-loop in order to speed the computation up, and, if so, how?
First edit: Using Octave's code for im2col is actually 50% slower.
Second edit: Pre-allocating did not appear to improve the result enough.
sz = [size(imInput,1), size(imInput,2)-n+1];
range_j = 1:size(imInput, 2)-sz(2)+1;
range_i = 1:size(imInput, 1)-sz(1)+1;
B = zeros(prod(sz), length(range_j)*length(range_i));
counter = 0;
for j = range_j % left to right
for i = range_i % up to bottom
counter = counter + 1;
v = imInput(i:i+sz(1)-1, j:j+sz(2)-1);
B(:, counter) = v(:);
end
end
imMax = reshape(max(B, [], 2), sz);
Third edit: I shall show the timings.
For what it's worth, here's a vectorized solution using IM2COL function from the Image Processing Toolbox:
imInput = imread('tire.tif');
n = 10;
sz = [size(imInput,1) size(imInput,2)-n+1];
imMax = reshape(max(im2col(imInput, sz, 'sliding'),[],2), sz);
imshow(imMax)
You could perhaps write your own version of IM2COL as it simply consists of well crafted indexing, or even look at how Octave implements it.
Check out the answer to this question about doing a rolling median in c. I've successfully made it into a mex function and it is way faster than even ordfilt2. It will take some work to do a max, but I'm sure it's possible.
Rolling median in C - Turlach implementation