Is there a way to vectorize this for loop to speed it up?
Thank you.
for j =1 :size(Rond_Input2Cell,1)
for k=1: size(Rond_Input2Cell,2)
Rond_Input2Cell(j,k)= (Pre_Rond_Input2Cell(j,k)*Y_FGate(k))+(net_Cell(k)*Y_InGate(k)*tmp_input(j)) ;
end
end
P.s.
Matrix size:
Rond_Input2Cell =39*120
Pre_Rond_Input2Cell = 39*120
Y_FGate=1*120 (row vector)
net_Cell=1*120 (row vector)
Y_InGate =1*120 (row vector)
tmp_input =1*39 (row vector)
You can speed up this calculation by replacing the for loop with bsxfun, which trades some extra memory for speed.
The code below performs the same computation on whole rows at once and adds the two terms:
Rond_Input2Cell = bsxfun(@times,tmp_input.' ,net_Cell.*Y_InGate) + bsxfun(@times ,Pre_Rond_Input2Cell,Y_FGate);
Explanation:
Pre_Rond_Input2Cell(j,k)*Y_FGate(k)
This is performed by bsxfun(@times ,Pre_Rond_Input2Cell,Y_FGate), which multiplies each of the 39 rows of Pre_Rond_Input2Cell element-wise by the 120 columns of Y_FGate.
net_Cell(k)*Y_InGate(k)*tmp_input(j) is replaced by bsxfun(@times,tmp_input.' ,net_Cell.*Y_InGate), which multiplies each element of tmp_input by the element-wise product of net_Cell and Y_InGate. In the end, the sum of the two terms is stored in Rond_Input2Cell.
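As a quick sanity check of this explanation, the following sketch compares the double loop with the bsxfun expression on random data of the sizes given in the question. The random inputs and the names loop_result, vec_result and max_diff are assumptions made for illustration only; they are not part of the original code.

```matlab
% Random test data with the sizes stated in the question (assumed values)
Pre_Rond_Input2Cell = rand(39,120);
Y_FGate   = rand(1,120);
net_Cell  = rand(1,120);
Y_InGate  = rand(1,120);
tmp_input = rand(1,39);

% Reference result: the original double loop
loop_result = zeros(39,120);
for j = 1:39
    for k = 1:120
        loop_result(j,k) = Pre_Rond_Input2Cell(j,k)*Y_FGate(k) ...
            + net_Cell(k)*Y_InGate(k)*tmp_input(j);
    end
end

% Vectorized result: column vector times row vector expands to 39x120,
% and the 39x120 matrix is scaled column by column by Y_FGate
vec_result = bsxfun(@times, tmp_input.', net_Cell.*Y_InGate) ...
           + bsxfun(@times, Pre_Rond_Input2Cell, Y_FGate);

% Largest absolute difference between the two results
max_diff = max(abs(loop_result(:) - vec_result(:)));
```

If max_diff is zero or on the order of floating-point rounding, the two versions can be treated as equivalent for these inputs.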
Here is a performance check
>> perform_check
Elapsed time is 0.000475 seconds.
Elapsed time is 0.000156 seconds.
>> perform_check
Elapsed time is 0.001089 seconds.
Elapsed time is 0.000288 seconds.
Another method is to use repmat:
tic;
Rond_Input2Cell =(Pre_Rond_Input2Cell.*repmat(Y_FGate,size(Pre_Rond_Input2Cell,1),1)) + (repmat(tmp_input.',1,size(Pre_Rond_Input2Cell,2)).*repmat(net_Cell.*Y_InGate,size(Pre_Rond_Input2Cell,1),1));
toc;
Here is a performance test with a for loop
>> perf_test
Elapsed time is 0.003268 seconds.
Elapsed time is 0.001719 seconds.
>> perf_test
Elapsed time is 0.004211 seconds.
Elapsed time is 0.002348 seconds.
>> perf_test
Elapsed time is 0.002384 seconds.
Elapsed time is 0.000509 seconds.
Here is an article by Loren on Performance of repmat vs bsxfun
Your vectorized code should be something like this.
temp_mat = tmp_input' * (net_Cell .* Y_InGate);   % outer product, size 39x120
Rond_Input2Cell = (Pre_Rond_Input2Cell .* Y_FGate) + temp_mat;   % size 39x120; the .* with a row vector relies on implicit expansion (MATLAB R2016b or later)
I've been scratching my head at this for several hours. I have a script that calls a function 625 times, but that causes lag, so I want to delay each iteration of the for loop by 5 seconds. Any help would be great.
I use this little function for second-resolution delays.
function os.sleep(sec)
local now = os.time() + sec
repeat until os.time() >= now
end
EDIT: Added msec version (approximate -- not very precise)
function os.sleep(msec)
local now = os.clock() + msec/1000
repeat until os.clock() >= now
end
I have the following piece of code in Matlab
n=15
eqtocheck=randn(196584,17);
tic
others=zeros(size(eqtocheck,1),n-1);
for i=1:n-1
behavothers=eqtocheck(:,3:end);
behavothers(:,i)=[];
others(:,i)=sum(behavothers,2);
%for each kth row of eqtocheck,
%sum all elements of the row except the ith element
%and report the sum in the (k,i) element of others
end
toc
It takes me around 0.25 sec to run it with Matlab-r2015a. Could you suggest a way to reduce execution time (I cannot use parfor because it is applied to an external loop)?
Let's bsxfun it -
A = eqtocheck(:,3:end);
others = bsxfun(@minus,sum(A,2),A(:,1:end-1));
Benchmarking
Benchmarking code -
n=15;
eqtocheck=randn(196584,17);
disp('---------------- Before BSXFUNing -------------')
tic
others=zeros(size(eqtocheck,1),n-1);
for i=1:n-1
behavothers=eqtocheck(:,3:end);
behavothers(:,i)=[];
others(:,i)=sum(behavothers,2);
end
toc
disp('---------------- After BSXFUNing -------------')
tic
A = eqtocheck(:,3:end);
others_out = bsxfun(@minus,sum(A,2),A(:,1:end-1));
toc
Runtimes -
---------------- Before BSXFUNing -------------
Elapsed time is 0.759202 seconds.
---------------- After BSXFUNing -------------
Elapsed time is 0.069710 seconds.
Verify results -
>> error_val = max(abs(others(:)-others_out(:)))
error_val =
6.2172e-15
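To see why a single subtraction is enough, here is a small sketch on a hand-checkable 2-by-3 matrix. The matrix A_small and the names row_sums and leave_one_out are made up for illustration. The sum of a row without its ith element equals the full row sum minus that element, so no column has to be deleted.

```matlab
% Small example matrix (illustrative values only)
A_small = [1 2 3; 4 5 6];

% Full sum of each row: [6; 15]
row_sums = sum(A_small, 2);

% Subtract every element from its row sum to get the leave-one-out sums
leave_one_out = bsxfun(@minus, row_sums, A_small);   % [5 4 3; 11 10 9]
```

The answer's code subtracts only A(:,1:end-1) because the original loop runs over the first n-1 columns; this sketch keeps all columns.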
n=13
eqtocheck=randn(196584,16);
tic
others=zeros(size(eqtocheck,1),16);
for i=1:n-1
behavothers=eqtocheck(:,3:end);
behavothers(:,i)=[];
others(:,i)=sum(behavothers,2);
%for each kth row of eqtocheck,
%sum all elements of the row except the ith element
%and report the sum in the (k,i) element of others
end
toc
% using straight math with repmat to expand the matrix
tic
values = eqtocheck(:,3:end);
tempsum = sum(values, 2);
tempsum2 = repmat(tempsum, 1, n + 1);
result = tempsum2 - values;
toc
Elapsed time is 0.237134 seconds.
Elapsed time is 0.026789 seconds.
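The values are not bit-for-bit identical, but they agree to within floating-point rounding. The columns of others and result do not line up one-to-one (result has one column per column of values, while the loop fills only the first n-1 columns of others), so pick the columns you need.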
I get a pretty consistent time difference for small matrices in favor of max(A(:)):
>> A=rand(100); tic; max(A(:)); toc; tic; max(max(A)); toc;
Elapsed time is 0.000060 seconds.
Elapsed time is 0.000083 seconds.
but for large matrices, the time difference is inconsistent:
>> A=rand(1e3); tic; max(A(:)); toc; tic; max(max(A)); toc;
Elapsed time is 0.001072 seconds.
Elapsed time is 0.001103 seconds.
>> A=rand(1e3); tic; max(A(:)); toc; tic; max(max(A)); toc;
Elapsed time is 0.000847 seconds.
Elapsed time is 0.000792 seconds.
same for larger,
>> A = rand(1e4); tic; max(A(:)); toc; tic; max(max(A)); toc;
Elapsed time is 0.049073 seconds.
Elapsed time is 0.050206 seconds.
>> A = rand(1e4); tic; max(A(:)); toc; tic; max(max(A)); toc;
Elapsed time is 0.072577 seconds.
Elapsed time is 0.060357 seconds.
Why is there a difference and what would be the best practice?
As horchler says, this is machine dependent. However, on my machine I saw a clear performance decrease for max(max(max(... in higher dimensions. I also saw a slight (but consistent) speed advantage for max(A(:)) on a more sorted type of matrix, such as a Toeplitz matrix. Still, for the test case that you tried I saw hardly any difference.
Also, max(max(max(... is error prone because of all the parentheses, so I would prefer max(A(:)). The execution time of this form seems to be stable across dimensions, which makes it easy to predict how long it takes.
Thirdly, max seems to be very fast, which means that performance should be a minor issue here. So max(A(:)) would be preferred in this case for its readability.
So, in conclusion, I would prefer max(A(:)), but if you think that max(max(A)) is clearer, you could probably use that.
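The readability argument is easiest to see with a three-dimensional array. A minimal sketch, assuming a random 4-by-5-by-6 array (the names A3, m_colon and m_nested are illustrative and not from the thread):

```matlab
% Random 3-D array (assumed size, for illustration only)
A3 = rand(4,5,6);

% One call, independent of the number of dimensions
m_colon = max(A3(:));

% One nested call per dimension; easy to get the count wrong
m_nested = max(max(max(A3)));
```

Both forms return the same scalar; the colon form does not change when the number of dimensions does.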
On my machine there are no differences in times that are really worth worrying about.
n = 2:0.2:4;
for i = 1:numel(n)
a = rand(floor(10^n(i)));
t1(i) = timeit(@()max(a(:)));
t2(i) = timeit(@()max(max(a)));
end
>> t1
t1 =
Columns 1 through 7
7.4706e-06 1.5349e-05 3.1569e-05 2.803e-05 5.6141e-05 0.00041006 0.0011328
Columns 8 through 11
0.0027755 0.006876 0.0171 0.042889
>> t2
t2 =
Columns 1 through 7
1.1959e-05 2.2539e-05 2.3641e-05 4.1313e-05 7.6301e-05 0.00040654 0.0011396
Columns 8 through 11
0.0027885 0.0068966 0.01718 0.042997
I have the following data:
a cell array of labels (e.g. a cell array of 4 message types, where each type is a string)
a cell array of messages (e.g. a cell array of 5000 messages, where each message is a cell array of many word strings).
a cell array of labels for each message (e.g. a cell array of 5000 strings, where the string in cell i is the type of the message in cell i of the array in part 2).
My goal is to get from this data a cell array with one cell per label, where each cell holds the concatenated contents of all messages of that label's type (e.g. a cell array of 4 cells, where cell i holds a cell array of all the words from all the messages whose type is i).
I implemented 3 methods to do this. This is the code for my 3 implementations:
%...............................................................
% setting data for tic toc tests
messagesTypesOptions = {'type1';'type2';'type3';'type4'};
messages = cell(5000,1);
for i = 1:5000
messages{i} = {'word1';'word2';'word3';'word4';'word5';'word6';'word7';'word8';'word9';'word10'};
end
messages_labels = cell(5000,1);
for i = 1:5000
messages_labels{i} = messagesTypesOptions{randi([1 4])};
end
%...............................................................
% start test
% method 1
type_to_msgs1 = cell(size(messagesTypesOptions,1),1);
tic
for i = 1:size(messagesTypesOptions,1)
type_to_msgs1{i} = messages(strcmp(messages_labels,messagesTypesOptions{i}));
end
type_to_concatenated1 = cell(4,1);
for i = 1:4
type_to_msgs1{i} = type_to_msgs1{i}';
end
for i =1:4
label_msgs = type_to_msgs1{i};
num_of_label_msgs = size(label_msgs,2);
for j = 1: num_of_label_msgs
label_msgs{j} = label_msgs{j}';
end
type_to_concatenated1{i} = [label_msgs{:}];
end
toc
% method 2
type_to_concatenated2 = cell(4,1);
tic
labelStr_to_labelIndex = containers.Map(messagesTypesOptions,1:4);
for textIndex = 1:5000
type_to_concatenated2{labelStr_to_labelIndex(messages_labels{textIndex})} = ...
[type_to_concatenated2{labelStr_to_labelIndex(messages_labels{textIndex})},...
messages{textIndex}'];
end
toc
% method 3
type_to_concatenated3 = cell(4,1);
tic
labelStr_to_labelIndex2 = containers.Map(messagesTypesOptions,1:4);
matrix_label_to_isMsgFromLabel = zeros(4,5000);
for textIndex = 1:5000
matrix_label_to_isMsgFromLabel(labelStr_to_labelIndex2(messages_labels{textIndex})...
,textIndex) = 1;
end
for i = 1:4
label_msgs3 = messages(~~matrix_label_to_isMsgFromLabel(i,:))';
num_of_label_msgs3 = size(label_msgs3,2);
for j = 1: num_of_label_msgs3
label_msgs3{j} = label_msgs3{j}';
end
type_to_concatenated3{i} = [label_msgs3{:}];
end
toc
Those are the results I get:
Elapsed time is 0.033120 seconds.
Elapsed time is 0.471959 seconds.
Elapsed time is 0.095011 seconds.
So, the conclusion is that method 1 is the fastest.
Now, my question is: Is there a way to solve this in a faster way?
Intuitively, it seems that my method 1 is not very efficient, because it has a for loop with strcmp, and each strcmp call scans all the message labels; so the same data is read once per label (type).
So, is there a way to modify one of my methods to get a faster solution? Is there another method that is faster?
EDIT: Here I used constant messages for the examples. But I want a solution for the case where the messages differ from each other and can be of different sizes.
EDIT2: Also, the types are strings that don't necessarily have numbers in them (e.g. instead of type1, type2, ... that I used for the example code, they can be 'error', 'warning', 'valid').
Basically you have messages and need to index into them to get output for each cell of the output cell array and finally concatenate the elements. For indexing you can use logical indexing which in most cases is very efficient. For getting the logical indexing arrays, you can take help of bsxfun. Here's the code to wrap up the discussion -
%// Get the parameters
lbls_len = numel(messages_labels);
msgtypeops_len = numel(messagesTypesOptions);
%// Tag messages_labels and messagesTypesOptions with numbers
alltypes = [messages_labels ; messagesTypesOptions];
[~,~,IDs] = unique(alltypes,'stable');
lbls = IDs(1:lbls_len);
typeops = IDs(lbls_len+1:end);
%// Positions of matches for each label IDs against type IDS
pos = bsxfun(@eq,lbls,typeops'); %//'
%// Logically index into messages and select the ones based on positions
%// obtained in the previous step for the final output and finally
%// concatenate along the rows to get us the final output cell array
out = arrayfun(@(n) vertcat(messages{pos(:,n)})',1:msgtypeops_len,'Uni',0)';
Benchmarking
Here are some runtimes comparing Method - 1 that turned out to be best one as listed in the question against the proposed solution.
1) With length of messages_labels as 5000:
------------------ With Method - 1
Elapsed time is 0.072821 seconds.
------------------ With Proposed solution
Elapsed time is 0.053961 seconds.
2) With length of messages_labels as 500000:
------------------ With Method - 1
Elapsed time is 6.998149 seconds.
------------------ With Proposed solution
Elapsed time is 2.765090 seconds.
A speedup of roughly 1.3x to 2.5x might be good enough for you!
As ever, this boils down to a simple indexing problem, and for cell arrays of strings MATLAB has a nice way to generate those indices: ismember. There might be a clever way to then use that index vector to pull all the messages out in one go, but logical indexing is easy and quick enough, and JIT magic actually makes the trivial loop faster than arrayfun (using R2013b on Linux). That gives us this:
tic
out = cell(4,1);
[~, idx] = ismember(messages_labels, messagesTypesOptions);
for ii=1:4
out{ii} = vertcat(messages{idx == ii})';
end
toc
With the above added to the end of the original code:
>> test
Elapsed time is 0.056497 seconds.
Elapsed time is 0.857934 seconds.
Elapsed time is 0.201966 seconds.
Elapsed time is 0.017667 seconds.
Not bad :D
Replace all the 5000's with 50000's and it still scales linearly like #1 and #3:
>> test
Elapsed time is 0.550462 seconds.
Elapsed time is 48.685048 seconds.
Elapsed time is 1.965559 seconds.
Elapsed time is 0.162989 seconds.
Just to be sure:
>> isequal(type_to_concatenated1, type_to_concatenated2, type_to_concatenated3, out)
ans =
1
And, if you can handle the grouped messages being column vectors rather than rows, take out the transpose...
...
out{ii} = vertcat(messages{idx == ii});
...
...and it's twice as fast again:
>> test
Elapsed time is 0.552040 seconds.
Elapsed time is <skipped>
Elapsed time is 1.986059 seconds.
Elapsed time is 0.077958 seconds.
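The ismember approach also covers the cases raised in the question's edits: labels without numbers in them and messages of different lengths. A minimal sketch with made-up data (the names labels, options, msgs and grouped, and all of their contents, are hypothetical):

```matlab
% Hypothetical labels and label options without any numbering
labels  = {'warning'; 'error'; 'valid'; 'error'};
options = {'error'; 'warning'; 'valid'};

% Hypothetical messages of different lengths, one per label
msgs = {{'a'; 'b'}; {'c'}; {'d'; 'e'}; {'f'}};

% idx(k) is the position of labels{k} within options
[~, idx] = ismember(labels, options);

% Group the words of all messages that share a label
grouped = cell(numel(options), 1);
for ii = 1:numel(options)
    grouped{ii} = vertcat(msgs{idx == ii})';
end
```

Here grouped{1} collects the words of the two 'error' messages into one row cell array, even though the label strings carry no index and the messages differ in length.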
What would be the best way to manage large number of instances of the same class in MATLAB?
Using the naive way produces abysmal results:
classdef Request
properties
num=7;
end
methods
function f=foo(this)
f = this.num + 4;
end
end
end
>> a=[];
>> tic,for i=1:1000 a=[a Request];end;toc
Elapsed time is 5.426852 seconds.
>> tic,for i=1:1000 a=[a Request];end;toc
Elapsed time is 31.261500 seconds.
Inheriting from handle drastically improves the results:
classdef RequestH < handle
properties
num=7;
end
methods
function f=foo(this)
f = this.num + 4;
end
end
end
>> tic,for i=1:1000 a=[a RequestH];end;toc
Elapsed time is 0.097472 seconds.
>> tic,for i=1:1000 a=[a RequestH];end;toc
Elapsed time is 0.134007 seconds.
>> tic,for i=1:1000 a=[a RequestH];end;toc
Elapsed time is 0.174573 seconds.
but this is still not acceptable performance, especially considering the increasing reallocation overhead.
Is there a way to preallocate a class array? Any ideas on how to manage large quantities of objects effectively?
Thanks,
Dani
Coming to this late, but would this not be another solution?
a = Request.empty(1000,0); tic; for i=1:1000, a(i)=Request; end; toc;
Elapsed time is 0.087539 seconds.
Or even better:
a(1000, 1) = Request;
Elapsed time is 0.019755 seconds.
This solution expands on Marc's answer. Use repmat to initialize an array of RequestH objects and then use a loop to create the desired objects:
>> a = repmat(RequestH,10000,1);tic,for i=1:10000 a(i)=RequestH;end;toc
Elapsed time is 0.396645 seconds.
This is an improvement over:
>> a=[];tic,for i=1:10000 a=[a RequestH];end;toc
Elapsed time is 2.313368 seconds.
repmat is your friend:
b = repmat(Request, 1000, 1);
Elapsed time is 0.056720 seconds.
b = repmat(RequestH, 1000, 1);
Elapsed time is 0.021749 seconds.
Growing by appending is abysmally slow, which is why mlint calls it out.