I've got a detector that spits out two numbers at a high rate when I turn it on. I've been capturing my data using HyperTerminal, which seems to be able to keep up with the device.
I wanted to automate the process and control the device entirely through Matlab, but discovered that less than half of the data gets through to Matlab. Are there any known issues with Matlab's speed in this area?
Here's what I'm using to read in data:
s = serial('COM1', 'BaudRate', 115200, 'DataBits', 8, 'Terminator','CR/LF', 'InputBufferSize', 1024);
T1 = 1; % Initial T1, T2 values
T2 = 10000;
timer = 300;
% Inputs to serial device: T1, T2, runtime (seconds)
fprintf(s, sprintf('%d %d %d\r', T1, T2, timer));
tdata = zeros(1e5,2,'uint16');
data = fopen(sprintf('%s.txt',date_and_trial),'w');
tic;
while toc <= timer
% Read data into an array, and write to file.
if s.BytesAvailable >= 13
line = fgets(s);
if length(line) == 13
a = sscanf(line, '%u %u');
if length(a) == 2
tdata(i,1) = a(1);
tdata(i,2) = a(2);
fprintf(data, sprintf('%d %d\r', tdata(i,1), tdata(i,2)));
i++;
end
end
else
pause(0.01);
end
end
disp(toc);
fclose(s);
fclose(data);
fprintf('Finished!\r');
I've been thinking that the conditionals might be slowing it down, but they also seem to be necessary to keep things in the '%i %i\n' format that I need. Maybe there's some way to read in all the data and process it after completion?
Since you pass 'Terminator', 'CR/LF' in the serial port open call, you can replace
if s.BytesAvailable >= 13
line = fgets(s);
if length(line) == 13
with line = fscanf(s);. fscanf will wait for the terminator, and return a whole line (assuming there aren't errors on the wire). You can also remove the else pause part. These changes should make the loop run fast enough to keep up with the serial data. I would guess that s.BytesAvailable and pause actually take much more time than you'd expect - the former because it calls out to the OS and the latter because the pause could go much longer than you specify depending on timeslicing.
Making this change introduces a new problem though: fscanf will block waiting for the terminator, meaning if the device stops sending before your toc >= timer condition becomes true, the program will hang. So you should make sure to set a sensible timeout in the serial call.
You can make two minor speedups in the body of the loop: tdata(i,:) = a'; will fill the row in one shot, and fprintf(data, '%s\r', line); will skip printf. So putting it all together:
s = serial('COM1', 'BaudRate', 115200, 'DataBits', 8, 'Terminator','CR/LF', 'InputBufferSize', 1024, 'Timeout', 3);
T1 = 1; % Initial T1, T2 values
T2 = 10000;
timer = 300;
% Inputs to serial device: T1, T2, runtime (seconds)
fprintf(s, sprintf('%d %d %d\r', T1, T2, timer));
tdata = zeros(1e5,2,'uint16');
data = fopen(sprintf('%s.txt',date_and_trial),'wt');
tic;
while toc <= timer
% Read data into an array, and write to file.
line = fscanf(s); %# waits for CR/LF terminator
a = sscanf(line, '%u %u');
if length(a) == 2
tdata(i,:) = a'; %# ' assumes sscanf returned a column vector
fprintf(data, '%s\r', line);
i++;
end
end
disp(toc);
fclose(data);
fclose(s);
fprintf('Finished!\r');
Related
I have a program I am trying to parallelize using OpenMP - it makes a very large loop over some data. Since incrementing a shared variable (so I can report progress as it goes) is somewhat of an issue, I thought I'd break the loop up into smaller chunks, loop over those multiple times, and just report the status at the end of/outside the openmp loop.
Problem is, before the OpenMP for loop starts for the 3rd time, the program locks up. Just sits there, does nothing. I've stripped out all but the simplest code. Here it is:
some other variable declarations for removed code above here
int dbl = 0;
int lasttime = 0;
int seedbase = 0;
const char *pl = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
const double mm = 62.0 / 2147483647.0;
for(dbl = 0; dbl < 2048 && !abort; dbl++) {
seedbase = dbl; //(dbl * 2097152) - 2147483648;
printf("Loop %d %d\n", dbl, abort);
#pragma omp parallel for private(seed) shared(dbl)
for(seed = 0; seed < 20971; seed++) { //52
if(dbl == 2)
printf("oo\n");
}
if(abort)
break;
lasttime = time();
hps = (double)((dbl*2097152) * clk_tck) / (double)((times(&tms) - start_time));
printf("So far: %0.2fsec (%0.2fhps) %0.2f sec left\n", (double)(times(&tms) - start_time) / (double)clk_tck, hps, (((long)1 << 32) - (dbl * 2097152)) / hps);
}
}
When compiled and run, I get:
Loop 0 0
So far: 0.02sec (0.00hps) inf sec left
Loop 1 0
So far: 0.02sec (104857600.00hps) 40.94 sec left
Loop 2 0
^C
Loop 0 starts, and the openmp runs (and does nothing) then exits, and the "So far:" is printed.
Loop 1 starts, same thing.
Loop 2 starts, and everything hangs. The printf("oo"); never happens. If I change the line to be if(dbl <= 2) my screen fills with looped "oo"'s as the loop runs.
But before the seed loop ever happens the third time - it's dead. Just sits there chewing up CPU time doing nothing.
Can you not quickly loop over a openmp loop? Is that the issue? I find it odd it's ALWAYS stopping before the 3rd run, regardless of how complex the code inside the seed loop is (I removed 200 lines of code - it had no effect)
This is a follow-up question of this question.
The following code takes an enormous amount of time to loop through. Do you have any recommendations for speeding up the process? The variable z has a size of 479x1672 and others will be around 479x12000.
z = HongKongPrices;
zmat = false(size(z));
r = size(z,1);
c = size(z,2);
for k = 1:c
for i = 5:r
if z(i,k) == z(i-4,k) && z(i,k) == z(i-3,k) && z(i,k) == z(end,k)
zmat(i-3:i,k) = 1
end
end
end
z(zmat) = NaN
I am currently running this with MatLab R2014b on an iMac with 3.2 Intel i5 and 16 GB DDR3.
You can use logical indexing here to your advantage to replace the IF-conditional statement and have a small-loop -
%// Get size parameters
[r,c] = size(z);
%// Get logical mask with ones for each column at places that satisfy the condition
%// mentioned as the IF conditional statement in the problem code
mask = z(1:r-4,:) == z(5:r,:) & z(2:r-3,:) == z(5:r,:) & ...
bsxfun(#eq,z(end,:),z(5:r,:));
%// Use logical indexing to map entire z array and set mask elements as NaNs
for k = 1:4
z([false(k,c) ; mask ; false(4-k,c)]) = NaN;
end
Benchmarking
%// Size parameters
nrows = 479;
ncols = 12000;
max_num = 10;
num_iter = 10; %// number of iterations to run each approach,
%// so that runtimes are over 1 sec mark
z_org = randi(max_num,nrows,ncols); %// random input data of specified size
disp('--------------------------------- With proposed approach')
tic
for iter = 1:num_iter
z = z_org;
[..... code from the proposed approach ...]
end
toc, clear z k mask r c
disp('--------------------------------- With original approach')
tic
for iter = 1:num_iter
z = z_org;
[..... code from the problem ...]
end
toc
Results
Case # 1: z as 479 x 1672 (num_iter = 50)
--------------------------------- With proposed approach
Elapsed time is 1.285337 seconds.
--------------------------------- With original approach
Elapsed time is 2.008256 seconds.
Case # 2: z as 479 x 12000 (num_iter = 10)
--------------------------------- With proposed approach
Elapsed time is 1.941858 seconds.
--------------------------------- With original approach
Elapsed time is 2.897006 seconds.
My question doesn't depend expressly on one snippet of code, but is more conceptual.
Unlike some programming languages, MATLAB doesn't require variables to be initialized expressly before they're used. For example, this is perfectly valid to have halfway through a script file to define 'myVector':
myVector = vectorA .* vectorB
My question is: Is it faster to initialize variables (such as 'myVector' above) to zero and then assign values to them, or to keep initializing things throughout the program?
Here's a direct comparison of what I'm talking about:
Initializing throughout:
varA = 8;
varB = 2;
varC = varA - varB;
varD = varC * varB;
Initializing at start:
varA = 8;
varB = 2;
varC = 0;
varD = 0;
varC = varA - varB;
varD = varC * varB;
On one hand, it seems a bit of a waste to have these extra lines of code for no reason. On the other hand, though, it makes a little bit of sense that it would be faster to allocate all the memory for a program at once instead of spread out over the runtime.
Does anyone have a little insight?
Copy and paste your Initializing at start: code into MATLAB Editor Window and you would get this warning that looks like this -
And if you go into the Details, you would read this -
Explanation
The code does not appear to use the assignment to the indicated variable. This situation occurs when any of the following are true:
Another assignment overwrites the value of the variable before an operation uses it.
The specified argument value contains a typographical error, causing it to appear unused.
The code does not use all values returned by a function call...
In our case, the reason for this warning is The code does not use all values. So, this clarifies that initialization/pre-allocation won't help for that case.
When should we pre-allocate?
From my experience, pre-allocation helps when you need to later on index into part of it.
Thus, if you need to index into a portion of varC to store the results, pre-allocation would help. Hence, this would make more sense -
varC = zeros(...)
varD = zeros(...)
varC(k,:) = varA - varB;
varD(k,:) = varC * varB;
Again, while indexing if you are going beyond the size of varC, MATLAB would spend time trying to allocate more memory space for it, so that would slow things a bit. So, pre-allocate output variables to the maximum size which you think would be used for storing results. But, if you don't know the size of results, you are in a catch there and have to append results into the output variable(s) and that would slow down things for sure.
Alright! I've done some tests, and here are the results.
This is the code I used for the "throughout" variable assignments:
tic;
a = 1;
b = 2;
c = 3;
d = 4;
e = a - b;
f = e + c;
g = f - a;
h = g * c;
i = h - g;
j = 9 * i;
k = [j i h];
l = any(k);
b2(numel(b2) + 1) = toc
Here's the code for the "At Start" variable assignments:
tic;
a = 1;
b = 2;
c = 3;
d = 4;
e = 0;
f = 0;
g = 0;
h = 0;
i = 0;
j = 0;
k = 0;
l = 0;
e = a - b;
f = e + c;
g = f - a;
h = g * c;
i = h - g;
j = 9 * i;
k = [j i h];
l = any(k);
b1(numel(b1) + 1) = toc
I saved the time in the vectors 'b1' and 'b2'. Each was run with only MATLAB and Chrome open, and was the only script file open inside MATLAB. Each was run 201 times. Because the first time a program is run it compiles, I disregarded the first time value for both (I'm not interested in compile time).
To find the average, I used
mean(b1(2:201))
and
mean(b2(2:201))
The results:
"Throughout": 1.634311562062418e-05 seconds (0.000016343)
"At Start": 2.832598989758290e-05 seconds (0.000028326)
Interestingly (or perhaps not, who knows) defining variables only when needed, spread throughout the program was almost twice as fast.
I don't know whether this is because of the way MATLAB allocates memory (maybe it just grabs a huge chunk and doesn't need to keep allocating more every time you define a variable?) or if the allocation speed is just so fast that it's eclipsed by the extra lines of code.
NOTE: As Divakar points out, mileage may vary when using arrays. My testing should hold true for when the size of variables doesn't change, however.
tl;dr Setting variables to zero only to change it later is slow
I am trying to create random lines and select some of them, which are really rare. My code is rather simple, but to get something that I can use I need to create very large vectors(i.e.: <100000000 x 1, tracks variable in my code). Is there any way to be able to creater larger vectors and to reduce the time needed for all those calculations?
My code is
%Initial line values
tracks=input('Give me the number of muon tracks: ');
width=1e-4;
height=2e-4;
Ystart=15.*ones(tracks,1);
Xstart=-40+80.*rand(tracks,1);
%Xend=-40+80.*rand(tracks,1);
Xend=laprnd(tracks,1,Xstart,15);
X=[Xstart';Xend'];
Y=[Ystart';zeros(1,tracks)];
b=(Ystart.*Xend)./(Xend-Xstart);
hot=0;
cold=0;
for i=1:tracks
if ((Xend(i,1)<width/2 && Xend(i,1)>-width/2)||(b(i,1)<height && b(i,1)>0))
plot(X(:, i),Y(:, i),'r');%the chosen ones!
hold all
hot=hot+1;
else
%plot(X(:, i),Y(:, i),'b');%the rest of them
%hold all
cold=cold+1;
end
end
I am also using and calling a Laplace distribution generator made my Elvis Chen which can be found here
function y = laprnd(m, n, mu, sigma)
%LAPRND generate i.i.d. laplacian random number drawn from laplacian distribution
% with mean mu and standard deviation sigma.
% mu : mean
% sigma : standard deviation
% [m, n] : the dimension of y.
% Default mu = 0, sigma = 1.
% For more information, refer to
% http://en.wikipedia.org./wiki/Laplace_distribution
% Author : Elvis Chen (bee33#sjtu.edu.cn)
% Date : 01/19/07
%Check inputs
if nargin < 2
error('At least two inputs are required');
end
if nargin == 2
mu = 0; sigma = 1;
end
if nargin == 3
sigma = 1;
end
% Generate Laplacian noise
u = rand(m, n)-0.5;
b = sigma / sqrt(2);
y = mu - b * sign(u).* log(1- 2* abs(u));
The result plot is
As you indicate, your problem is two-fold. On the one hand, you have memory issues because you need to do so many trials. On the other hand, you have performance issues, because you have to process all those trials.
Solutions to each issue often have a negative impact on the other issue. IMHO, the best approach would be to find a compromise.
More trials are only possible of you get rid of those gargantuan arrays that are required for vectorization, and use a different strategy to do the loop. I will give priority to the possibility of using more trials, possibly at the cost of optimal performance.
When I execute your code as-is in the Matlab profiler, it immediately shows that the initial memory allocation for all your variables takes a lot of time. It also shows that the plot and hold all commands are the most time-consuming lines of them all. Some more trial-and-error shows that there is a disappointingly low maximum value for the trials you can do before OUT OF MEMORY errors start appearing.
The loop can be accelerated tremendously if you know a few things about its limitations in Matlab. In older versions of Matlab, it used to be true that loops should be avoided completely in favor of 'vectorized' code. In recent versions (I believe R2008a and up), the Mathworks introduced a piece of technology called the JIT accelerator (Just-in-Time compiler) which translates M-code into machine language on the fly during execution. Simply put, the JIT accelerator allows your code to bypass Matlab's interpreter and talk much more directly with the underlying hardware, which can save a lot of time.
The advice you'll hear a lot that loops should be avoided in Matlab, is no longer generally true. While vectorization still has its value, any procedure of sizable complexity that is implemented using only vectorized code is often illegible, hard to understand, hard to change and hard to upkeep. An implementation of the same procedure that uses loops, often has none of these drawbacks, and moreover, it will quite often be faster and require less memory.
Unfortunately, the JIT accelerator has a few nasty (and IMHO, unnecessary) limitations that you'll have to learn about.
One such thing is plot; it's generally a better idea to let a loop do nothing other than collect and manipulate data, and delay any plotting commands etc. until after the loop.
Another such thing is hold; the hold function is not a Matlab built-in function, meaning, it is implemented in M-language. Matlab's JIT accelerator is not able to accelerate non-builtin functions when used in a loop, meaning, your entire loop will run at Matlab's interpretation speed, rather than machine-language speed! Therefore, also delay this command until after the loop :)
Now, in case you're wondering, this last step can make a HUGE difference -- I know of one case where copy-pasting a function body into the upper-level loop caused a 1200x performance improvement. Days of execution time had been reduced to minutes!).
There is actually another minor issue in your loop (which is really small, and rather inconvenient, I will immediately agree with) -- the name of the loop variable should not be i. The name i is the name of the imaginary unit in Matlab, and the name resolution will also unnecessarily consume time on each iteration. It's small, but non-negligible.
Now, considering all this, I've come to the following implementation:
function [hot, cold, h] = MuonTracks(tracks)
% NOTE: no variables larger than 1x1 are initialized
width = 1e-4;
height = 2e-4;
% constant used for Laplacian noise distribution
bL = 15 / sqrt(2);
% Loop through all tracks
X = [];
hot = 0;
ii = 0;
while ii <= tracks
ii = ii + 1;
% Note that I've inlined (== copy-pasted) the original laprnd()
% function call. This was necessary to work around limitations
% in loops in Matlab, and prevent the nececessity of those HUGE
% variables.
%
% Of course, you can still easily generalize all of this:
% the new data
u = rand-0.5;
Ystart = 15;
Xstart = 800*rand-400;
Xend = Xstart - bL*sign(u)*log(1-2*abs(u));
b = (Ystart*Xend)/(Xend-Xstart);
% the test
if ((b < height && b > 0)) ||...
(Xend < width/2 && Xend > -width/2)
hot = hot+1;
% growing an array is perfectly fine when the chances of it
% happening are so slim
X = [X [Xstart; Xend]]; %#ok
end
end
% This is trivial to do here, and prevents an 'else' in the loop
cold = tracks - hot;
% Now plot the chosen ones
h = figure;
hold all
Y = repmat([15;0], 1, size(X,2));
plot(X, Y, 'r');
end
With this implementation, I can do this:
>> tic, MuonTracks(1e8); toc
Elapsed time is 24.738725 seconds.
with a completely negligible memory footprint.
The profiler now also shows a nice and even distribution of effort along the code; no lines that really stand out because of their memory use or performance.
It's possibly not the fastest possible implementation (if anyone sees obvious improvements, please, feel free to edit them in). But, if you're willing to wait, you'll be able to do MuonTracks(1e23) (or higher :)
I've also done an implementation in C, which can be compiled into a Matlab MEX file:
/* DoMuonCounting.c */
#include <math.h>
#include <matrix.h>
#include <mex.h>
#include <time.h>
#include <stdlib.h>
void CountMuons(
unsigned long long tracks,
unsigned long long *hot, unsigned long long *cold, double *Xout);
/* simple little helper functions */
double sign(double x) { return (x>0)-(x<0); }
double rand_double() { return (double)rand()/(double)RAND_MAX; }
/* the gateway function */
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int
dims[] = {1,1};
const mxArray
/* Output arguments */
*hot_out = plhs[0] = mxCreateNumericArray(2,dims, mxUINT64_CLASS,0),
*cold_out = plhs[1] = mxCreateNumericArray(2,dims, mxUINT64_CLASS,0),
*X_out = plhs[2] = mxCreateDoubleMatrix(2,10000, mxREAL);
const unsigned long long
tracks = (const unsigned long long)mxGetPr(prhs[0])[0];
unsigned long long
*hot = (unsigned long long*)mxGetPr(hot_out),
*cold = (unsigned long long*)mxGetPr(cold_out);
double
*Xout = mxGetPr(X_out);
/* call the actual function, and return */
CountMuons(tracks, hot,cold, Xout);
}
// The actual muon counting
void CountMuons(
unsigned long long tracks,
unsigned long long *hot, unsigned long long *cold, double *Xout)
{
const double
width = 1.0e-4,
height = 2.0e-4,
bL = 15.0/sqrt(2.0),
Ystart = 15.0;
double
Xstart,
Xend,
u,
b;
unsigned long long
i = 0ul;
*hot = 0ul;
*cold = tracks;
/* seed the RNG */
srand((unsigned)time(NULL));
/* aaaand start! */
while (i++ < tracks)
{
u = rand_double() - 0.5;
Xstart = 800.0*rand_double() - 400.0;
Xend = Xstart - bL*sign(u)*log(1.0-2.0*fabs(u));
b = (Ystart*Xend)/(Xend-Xstart);
if ((b < height && b > 0.0) || (Xend < width/2.0 && Xend > -width/2.0))
{
Xout[0 + *hot*2] = Xstart;
Xout[1 + *hot*2] = Xend;
++(*hot);
--(*cold);
}
}
}
compile in Matlab with
mex DoMuonCounting.c
(after having run mex setup :) and then use it in conjunction with a small M-wrapper like this:
function [hot,cold, h] = MuonTrack2(tracks)
% call the MEX function
[hot,cold, Xtmp] = DoMuonCounting(tracks);
% process outputs, and generate plots
hot = uint32(hot); % circumvents limitations in 32-bit matlab
X = Xtmp(:,1:hot);
clear Xtmp
h = NaN;
if ~isempty(X)
h = figure;
hold all
Y = repmat([15;0], 1, hot);
plot(X, Y, 'r');
end
end
which allows me to do
>> tic, MuonTrack2(1e8); toc
Elapsed time is 14.496355 seconds.
Note that the memory footprint of the MEX version is slightly larger, but I think that's nothing to worry about.
The only flaw I see is the fixed maximum number of Muon counts (hard-coded as 10000 as the initial array size of Xout; needed because there are no dynamically growing arrays in standard C)...if you're worried this limit could be broken, simply increase it, change it to be equal to a fraction of tracks, or do some smarter (but more painful) dynamic array-growing tricks.
In Matlab, it is sometimes faster to vectorize rather than use a for loop. For example, this expression:
(Xend(i,1) < width/2 && Xend(i,1) > -width/2) || (b(i,1) < height && b(i,1) > 0)
which is defined for each value of i, can be rewritten in a vectorised manner like this:
isChosen = (Xend(:,1) < width/2 & Xend(:,1) > -width/2) | (b(:,1) < height & b(:,1)>0)
Expessions like Xend(:,1) will give you a column vector, so Xend(:,1) < width/2 will give you a column vector of boolean values. Note then that I have used & rather than && - this is because & performs an element-wise logical AND, unlike && which only works on scalar values. In this way you can build the entire expression, such that the variable isChosen holds a column vector of boolean values, one for each row of your Xend/b vectors.
Getting counts is now as simple as this:
hot = sum(isChosen);
since true is represented by 1. And:
cold = sum(~isChosen);
Finally, you can get the data points by using the boolean vector to select rows:
plot(X(:, isChosen),Y(:, isChosen),'r'); % Plot chosen values
hold all;
plot(X(:, ~isChosen),Y(:, ~isChosen),'b'); % Plot unchosen values
EDIT: The code should look like this:
isChosen = (Xend(:,1) < width/2 & Xend(:,1) > -width/2) | (b(:,1) < height & b(:,1)>0);
hot = sum(isChosen);
cold = sum(~isChosen);
plot(X(:, isChosen),Y(:, isChosen),'r'); % Plot chosen values
I have written my own SHA1 implementation in MATLAB, and it gives correct hashes. However, it's very slow (a string a 1000 a's takes 9.9 seconds on my Core i7-2760QM), and I think the slowness is a result of how MATLAB implements bitwise logical operations (bitand, bitor, bitxor, bitcmp) and bitwise shifts (bitshift, bitrol, bitror) of integers.
Especially I wonder the need to construct fixed-point numeric objects for bitrol and bitror using fi command, because anyway in Intel x86 assembly there's rol and ror both for registers and memory addresses of all sizes. However, bitshift is quite fast (it doesn't need any fixed-point numeric costructs, a regular uint64 variable works fine), which makes the situation stranger: why in MATLAB bitrol and bitror need fixed-point numeric objects constructed with fi, whereas bitshift does not, when in assembly level it all comes down to shl, shr, rol and ror?
So, before writing this function in C/C++ as a .mex file, I'd be happy to know if there is any way to improve the performance of this function. I know there are some specific optimizations for SHA1, but that's not the issue, if the very basic implementation of bitwise rotations is so slow.
Testing a little bit with tic and toc, it's evident that what makes it slow are the loops in with bitrol and fi. There are two such loops:
%# Define some variables.
FFFFFFFF = uint64(hex2dec('FFFFFFFF'));
%# constants: K(1), K(2), K(3), K(4).
K(1) = uint64(hex2dec('5A827999'));
K(2) = uint64(hex2dec('6ED9EBA1'));
K(3) = uint64(hex2dec('8F1BBCDC'));
K(4) = uint64(hex2dec('CA62C1D6'));
W = uint64(zeros(1, 80));
... some other code here ...
%# First slow loop begins here.
for index = 17:80
W(index) = uint64(bitrol(fi(bitxor(bitxor(bitxor(W(index-3), W(index-8)), W(index-14)), W(index-16)), 0, 32, 0), 1));
end
%# First slow loop ends here.
H = sha1_handle_block_struct.H;
A = H(1);
B = H(2);
C = H(3);
D = H(4);
E = H(5);
%# Second slow loop begins here.
for index = 1:80
rotatedA = uint64(bitrol(fi(A, 0, 32, 0), 5));
if (index <= 20)
% alternative #1.
xorPart = bitxor(D, (bitand(B, (bitxor(C, D)))));
xorPart = bitand(xorPart, FFFFFFFF);
temp = rotatedA + xorPart + E + W(index) + K(1);
elseif ((index >= 21) && (index <= 40))
% FIPS.
xorPart = bitxor(bitxor(B, C), D);
xorPart = bitand(xorPart, FFFFFFFF);
temp = rotatedA + xorPart + E + W(index) + K(2);
elseif ((index >= 41) && (index <= 60))
% alternative #2.
xorPart = bitor(bitand(B, C), bitand(D, bitxor(B, C)));
xorPart = bitand(xorPart, FFFFFFFF);
temp = rotatedA + xorPart + E + W(index) + K(3);
elseif ((index >= 61) && (index <= 80))
% FIPS.
xorPart = bitxor(bitxor(B, C), D);
xorPart = bitand(xorPart, FFFFFFFF);
temp = rotatedA + xorPart + E + W(index) + K(4);
else
error('error in the code of sha1_handle_block.m!');
end
temp = bitand(temp, FFFFFFFF);
E = D;
D = C;
C = uint64(bitrol(fi(B, 0, 32, 0), 30));
B = A;
A = temp;
end
%# Second slow loop ends here.
Measuring with tic and toc, the entire computation of SHA1 hash of message abc takes on my laptop around 0.63 seconds, of which around 0.23 seconds is passed in the first slow loop and around 0.38 seconds in the second slow loop. So is there some way to optimize those loops in MATLAB before writing a .mex file?
There's this DataHash from the MATLAB File Exchange that calculates SHA-1 hashes lightning fast.
I ran the following code:
x = 'The quick brown fox jumped over the lazy dog'; %# Just a short sentence
y = repmat('a', [1, 1e6]); %# A million a's
opt = struct('Method', 'SHA-1', 'Format', 'HEX', 'Input', 'bin');
tic, x_hashed = DataHash(uint8(x), opt), toc
tic, y_hashed = DataHash(uint8(y), opt), toc
and got the following results:
x_hashed = F6513640F3045E9768B239785625CAA6A2588842
Elapsed time is 0.029250 seconds.
y_hashed = 34AA973CD4C4DAA4F61EEB2BDBAD27316534016F
Elapsed time is 0.020595 seconds.
I verified the results with a random online SHA-1 tool, and the calculation was indeed correct. Also, the 106 a's were hashed ~1.5 times faster than the first sentence.
So how does DataHash do it so fast??? Using the java.security.MessageDigest library, no less!
If you're interested with a fast MATLAB-friendly SHA-1 function, this is the way to go.
However, if this is just an exercise for implementing fast bit-level operations, then MATLAB doesn't really handle them efficiently, and in most cases you'll have to resort to MEX.
why in MATLAB bitrol and bitror need fixed-point numeric objects constructed with fi, whereas bitshift does not
bitrol and bitror are not part of the set of bitwise logic functions that are applicable for uints. They are part of the fixed-point toolbox, which also contains variants of bitand, bitshift etc that apply to fixed-point inputs.
A bitrol could be expressed as two bitshifts, a bitand and a bitor if you want to try using only the uint-functions. That might be even slower though.
As most MATLAB functions, bitand, bitor, bitxor are vectorized. So you get a lot faster if you give these function vector input rather than calling them in a loop over each element
Example:
%# create two sets of 10k random numbers
num = 10000;
hex = '0123456789ABCDEF';
A = uint64(hex2dec( hex(randi(16, [num 16])) ));
B = uint64(hex2dec( hex(randi(16, [num 16])) ));
%# compare loop vs. vectorized call
tic
C1 = zeros(size(A), class(A));
for i=1:numel(A)
C1(i) = bitxor(A(i),B(i));
end
toc
tic
C2 = bitxor(A,B);
toc
assert(isequal(C1,C2))
The timing was:
Elapsed time is 0.139034 seconds.
Elapsed time is 0.000960 seconds.
That's an order of magnitude faster!
The problem is, and as far as I can tell, the SHA-1 computation cannot be well vectorized. So you might not be able to take advantage of such vectorization.
As an experiment, I implemented a pure MATLAB-based funciton to compute such bit operations:
function num = my_bitops(op,A,B)
%# operation to perform: not, and, or, xor
if ischar(op)
op = str2func(op);
end
%# integer class: uint8, uint16, uint32, uint64
clss = class(A);
depth = str2double(clss(5:end));
%# bit exponents
e = 2.^(depth-1:-1:0);
%# convert to binary
b1 = logical(dec2bin(A,depth)-'0');
if nargin == 3
b2 = logical(dec2bin(B,depth)-'0');
end
%# perform binary operation
if nargin < 3
num = op(b1);
else
num = op(b1,b2);
end
%# convert back to integer
num = sum(bsxfun(#times, cast(num,clss), cast(e,clss)), 2, 'native');
end
Unfortunately, this was even worse in terms of performance:
tic, C1 = bitxor(A,B); toc
tic, C2 = my_bitops('xor',A,B); toc
assert(isequal(C1,C2))
The timing was:
Elapsed time is 0.000984 seconds.
Elapsed time is 0.485692 seconds.
Conclusion: write a MEX function or search the File Exchange to see if someone already did :)