for example: the period is 10ms and i need to find the frequency it would look like this: f= 1/t = 1/10ms = 100Hz because (10ms=.01 seconds, so its really 1/.01=100)
I understand that but when it changes to a larger unit such as GHz i get confused.
Example: if the frequency of a clock pulse is 2GHz, what is the period of the clock pulse?
My thoughts: F= 2GHz= 2 000 000 000Hz
T= 1/F = 1/2 000 000 000?? i somehow doubt this is even close to the answer. Could someone explain this one to me? Thanks a million.
The period is correct as you calculated it.
P = 1/F
P = 1/2GHz = 1/(2×10^9)
P = 0.0000000005s (second)
P = 0.0000005ms (millisecond)
P = 0.0005us (microsecond)
P = 0.5ns (nanosecond)
P = 500ps (picosecond)
Another way to look at it is 2Ghz is 2×10^7 times faster than 100hz so the period should be 2×10^7 times shorter
2Ghz/100Hz = 2×10^7
10ms/(2×10^7) = 500ps
Related
I am doing some online exercises, and am stuck as the last test keeps failing.
The task involves creating a program to calculate the number of cars produced by a factory given the factory's speed. The rate per hour is equal to the speed of the factory (between 1-10) * 221. However, the speed effects the quality of the cars produced exponentially, so discounts are made depending on the speed, which were provided in the specification.
The code below is used to calculate how many whole and working cars can be produced per minute by a factory of a given speed between 1-10.
I have used the .floor method to ensure the answer is rounded down to the nearest whole number. All the tests are passing, except for when I enter 10. In this case, the code returns 27, not 28.
class AssemblyLine
def initialize(speed)
#speed = speed
end
def working_items_per_minute
rate = #speed * 221 / 60
if #speed <= 4
rate = rate
elsif #speed <= 8
rate = rate * 0.9
elsif #speed == 9
rate = rate * 0.8
else
rate = rate * 0.77
end
return rate.floor
end
end
Any ideas on why it is returning 27 would be appreciated! (221*10/60 *0.77 should be rounded down to 28!)
The problem you're having is because ruby will deal in whole numbers until you introduce a Float.
It's easy to see what's going on if you launch a repl like irb or pry:
$ pry
pry(main)> 2210 / 60
=> 36
pry(main)> 2210 / 60.0
=> 36.833333333333336
pry(main)> 60.class
=> Integer
pry(main)> 60.0.class
=> Float
I have the following data:
a cell array of labels (e.g. a cell array of 4 options of types of messages where each type is a string)
an cell array of messages (e.g. a cell array of 5000 messages where each message is a cell array of many words strings).
an cell array of labels for each message (e.g. a cell array of 5000 strings where string in cell i is type of message in cell i in array in part 2).
My goal is to get from this data a cell array of size as of num of labels where in each cell there is concatenated contents from all the messages of type as the label (e.g. get a cell array of 4 cells where in cell i there is a cell array of all the words from all the messages that their type is i).
I implemented 3 method to perform this. This is the code for my 3 implementations:
%...............................................................
% setting data for tic toc tests
messagesTypesOptions = {'type1';'type2';'type3';'type4'};
messages = cell(5000,1);
for i = 1:5000
messages{i} = {'word1';'word2';'word3';'word4';'word5';'word6';'word7';'word8';'word9';'word10'};
end
messages_labels = cell(5000,1);
for i = 1:5000
messages_labels{i} = messagesTypesOptions{randi([1 4])};
end
%...............................................................
% start test
% method 1
type_to_msgs1 = cell(size(messagesTypesOptions,1),1);
tic
for i = 1:size(messagesTypesOptions,1)
type_to_msgs1{i} = messages(strcmp(messages_labels,messagesTypesOptions{i}));
end
type_to_concatenated1 = cell(4,1);
for i = 1:4
type_to_msgs1{i} = type_to_msgs1{i}';
end
for i =1:4
label_msgs = type_to_msgs1{i};
num_of_label_msgs = size(label_msgs,2);
for j = 1: num_of_label_msgs
label_msgs{j} = label_msgs{j}';
end
type_to_concatenated1{i} = [label_msgs{:}];
end
toc
% method 2
type_to_concatenated2 = cell(4,1);
tic
labelStr_to_labelIndex = containers.Map(messagesTypesOptions,1:4);
for textIndex = 1:5000
type_to_concatenated2{labelStr_to_labelIndex(messages_labels{textIndex})} = ...
[type_to_concatenated2{labelStr_to_labelIndex(messages_labels{textIndex})},...
messages{textIndex}'];
end
toc
% method 3
type_to_concatenated3 = cell(4,1);
tic
labelStr_to_labelIndex2 = containers.Map(messagesTypesOptions,1:4);
matrix_label_to_isMsgFromLabel = zeros(4,5000);
for textIndex = 1:5000
matrix_label_to_isMsgFromLabel(labelStr_to_labelIndex2(messages_labels{textIndex})...
,textIndex) = 1;
end
for i = 1:4
label_msgs3 = messages(~~matrix_label_to_isMsgFromLabel(i,:))';
num_of_label_msgs3 = size(label_msgs3,2);
for j = 1: num_of_label_msgs3
label_msgs3{j} = label_msgs3{j}';
end
type_to_concatenated3{i} = [label_msgs3{:}];
end
toc
Those are the results I get:
Elapsed time is 0.033120 seconds.
Elapsed time is 0.471959 seconds.
Elapsed time is 0.095011 seconds.
So, the conclusion is that method 1 is the fastest.
Now, my question is: Is there a way to solve this in a faster way?
Intuitively, it seams that my method1 is not very efficient because it has a for loop with strcmp and the strcmp is reading all the messages, so it is reading num of labels times all the messages, i.e reading num of labels (types) the same thing.
So, is there a way to modify one of my methods to get faster solution? Is there another method which is faster?
EDIT: Here I used for the examples constant messages. But, I want a solution for the case that the messages are different from each other and can be of different size.
EDIT2: Also, the types are strings that don't necessarily has numbers in them. (e.g. instead of type1,type2,... that I used for the example code, it can be 'error', 'warning', 'valid').
Basically you have messages and need to index into them to get output for each cell of the output cell array and finally concatenate the elements. For indexing you can use logical indexing which in most cases is very efficient. For getting the logical indexing arrays, you can take help of bsxfun. Here's the code to wrap up the discussion -
%// Get the parameters
lbls_len = numel(messages_labels);
msgtypeops_len = numel(messagesTypesOptions);
%// Tag messages_labels and messagesTypesOptions with numbers
alltypes = [messages_labels ; messagesTypesOptions];
[~,~,IDs] = unique(alltypes,'stable');
lbls = IDs(1:lbls_len);
typeops = IDs(lbls_len+1:end);
%// Positions of matches for each label IDs against type IDS
pos = bsxfun(#eq,lbls,typeops'); %//'
%// Logically index into messages and select the ones based on positions
%// obtained in the previous step for the final output and finally
%// concatenate along the rows to get us the final output cell array
out = arrayfun(#(n) vertcat(messages{pos(:,n)})',1:msgtypeops_len,'Uni',0)';
Benchmarking
Here are some runtimes comparing Method - 1 that turned out to be best one as listed in the question against the proposed solution.
1) With length of messages_labels as 5000:
------------------ With Method - 1
Elapsed time is 0.072821 seconds.
------------------ With Proposed solution
Elapsed time is 0.053961 seconds.
2) With length of messages_labels as 500000:
------------------ With Method - 1
Elapsed time is 6.998149 seconds.
------------------ With Proposed solution
Elapsed time is 2.765090 seconds.
An almost 1.5x-2.5x speeedup might be good enough for you!
As ever, this boils down to a simple indexing problem, and for cell arrays of strings MATLAB has a nice way to generate those indices: ismember. There might be a clever way to then use that index vector to pull all the messages out in one go, but logical indexing is easy and quick enough, and JIT magic actually makes the trivial loop faster than arrayfun (using R2013b on Linux). That gives us this:
tic
out = cell(4,1);
[~, idx] = ismember(messages_labels, messagesTypesOptions);
for ii=1:4
out{ii} = vertcat(messages{idx == ii})';
end
toc
With the above added to the end of the original code:
>> test
Elapsed time is 0.056497 seconds.
Elapsed time is 0.857934 seconds.
Elapsed time is 0.201966 seconds.
Elapsed time is 0.017667 seconds.
Not bad :D
Replace all the 5000's with 50000's and it still scales linearly like #1 and #3:
>> test
Elapsed time is 0.550462 seconds.
Elapsed time is 48.685048 seconds.
Elapsed time is 1.965559 seconds.
Elapsed time is 0.162989 seconds.
Just to be sure:
>> isequal(type_to_concatenated1, type_to_concatenated2, type_to_concatenated3, out)
ans =
1
And, if you can handle the grouped messages being column vectors rather than rows, take out the transpose...
...
out{ii} = vertcat(messages{idx == ii});
...
...and it's twice as fast again:
>> test
Elapsed time is 0.552040 seconds.
Elapsed time is <skipped>
Elapsed time is 1.986059 seconds.
Elapsed time is 0.077958 seconds.
I was writing a simple Autocorrelation function for the matrix array. Each row is a separate time series and we autocorrelate it with itself with certain set of lags. I have found out that one of the most contrintuitive things works best.
% 17 second benchmark
A = cell2mat(arrayfun(#(i) AutoCorrelation(array(i,:),lags)',1:size(array,1),'UniformOutput',false))';
where the function itself is the following
function ans = AutoCorrelation(array,lags)
arrayfun(#(x) dot(array(lags(x)+1:end),array(1:end-lags(x)))/(length(array)-lags(x)),1:length(lags));
end
Other things I have tried:
A = zeros(size(array,1),length(lags));
T = size(array,2);
% 97 seconds benchmark
A = cell2mat(arrayfun(#(i) arrayfun(#(x) dot(array(i,lags(x)+1:end),array(i,1:end-lags(x)))/(size(array,2)-lags(x)),1:length(lags))',1:size(array,1),'UniformOutput',false))';
% 100 seconds benchmark
A = arrayfun(#(i,x) dot(array(i,lags(x)+1:end),array(i,1:end-lags(x)))/(size(array,2)-lags(x)),repmat((1:size(array,1))',1,length(lags)),repmat(1:length(lags),size(array,1),1));
% 27 second benchmark
for i = 1:length(lags)
A(:,i) = dot(array(:,lags(i)+1:end),array(:,1:end-lags(i)),2)/(T-lags(i));
end
% 95 second benchmark
for i = 1:length(lags)
for j = 1:size(array,1);
A(j,i) = dot(array(j,lags(i)+1:end),array(j,1:end-lags(i)),2)/(T-lags(i));
end
end
It is more a curiosity question. If you ask me I would bet a direct dot product method would work the best. Also if arrayfun works that well, why then the double argument arrayfun doesn't perform well?
My array was 512*100000 array of doubles.
I did a program that takes a .csv file and stacks the 3rd column of each files in the appropriate 3rd dimension of a 512x512xNumberOfFiles cell array. The code goes like this :
[filenames,filepath] = uigetfile('*.csv','Opening the data files','','Multiselect','on');
filenames = fullfile(filepath,filenames);
NumFiles = numel(filenames);
Pixel = cell(512,512,NumFiles);
count=0;
num_pixels = size(Pixel,1)*size(Pixel,2);
for k = 1:NumFiles
fid = fopen(char(filenames(k)));
C = textscan(fid, '%d, %d, %d','HeaderLines',1);
Pixel(count + sub2ind(size(Pixel),C{1}+1,C{2}+1)) = num2cell(C{3});
count = count + num_pixels;
fclose(fid);
end
The textscan call here takes approximately 0.5 +/- 0.03s per file I open (which is 262144 (512x512) data long), and my sub2ind call takes approximately 0.2 +/- 0.01s per file.
Is there any way to decrease this time or this seems like the most optimal way to run the code? I'll be working with approximately 1000 files each time, so waiting 8-9 minutes only to get the data right seems a bit excessive (considering I haven't used it yet for anything else).
Any tips?
Marc-Olivier
Hoping this would result in some improvement by still keeping it with textscan. Also, make sure the values look good.
Code
[filenames,filepath] = uigetfile('*.csv','Opening the data files',...
'','Multiselect','on');
filenames = fullfile(filepath,filenames);
NumFiles = numel(filenames);
PixelDouble = NaN(512*512,NumFiles);
for k = 1:NumFiles
fid = fopen(char(filenames(k)));
C = textscan(fid, '%d, %d, %d','HeaderLines',1);
PixelDouble(:,k) = C{3};
fclose(fid);
end
Pixel = num2cell(permute(reshape(PixelDouble,512,512,[]),[2 1 3]))
I must encourage you to follow this question - Fastest Matlab file reading? and it's answers.
I have 2 input variables:
a vector of p-values (p) with N elements (unsorted)
and N x M matrix with p-values obtained by random permutations (pr) with M iterations. N is quite large, 10K to 100K or more. M let's say 100.
I'm estimating the False Discovery Rate (FDR) for each element of p representing how many p-values from random permutations will pass if the current p-value (from p) will be the threshold.
I wrote the function with ARRAYFUN, but it takes lot of time for large N (2 min for N=20K), comparable to for-loop.
function pfdr = fdr_from_random_permutations(p, pr)
%# ... skipping arguments checks
pfdr = arrayfun( #(x) mean(sum(pr<=x))./sum(p<=x), p);
Any ideas how to make it faster?
Comments about statistical issues here are also welcome.
The test data can be generated as p = rand(N,1); pr = rand(N,M);.
Well, the trick was indeed sorting the vectors. I give credit to #EgonGeerardyn for that. Also, there is no need to use mean. You can just divide everything afterwards by M. When p is sorted, finding the amount of values that are less than current x, is just a running index. pr is a more interesting case - I used a running index called place to discover how many elements are less than x.
Edit(2): Here is the fastest version I come up with:
function Speedup2()
N = 10000/4 ;
M = 100/4 ;
p = rand(N,1); pr = rand(N,M);
tic
pfdr = arrayfun( #(x) mean(sum(pr<=x))./sum(p<=x), p);
toc
tic
out = zeros(numel(p),1);
[p,sortIndex] = sort(p);
pr = sort(pr(:));
pr(end+1) = Inf;
place = 1;
N = numel(pr);
for i=1:numel(p)
x = p(i);
while pr(place)<=x
place = place+1;
end
exp1a = place-1;
exp2 = i;
out(i) = exp1a/exp2;
end
out(sortIndex) = out/ M;
toc
disp(max(abs(pfdr-out)));
end
And the benchmark results for N = 10000/4 ; M = 100/4 :
Elapsed time is 0.898689 seconds.
Elapsed time is 0.007697 seconds.
2.220446049250313e-016
and for N = 10000 ; M = 100 ;
Elapsed time is 39.730695 seconds.
Elapsed time is 0.088870 seconds.
2.220446049250313e-016
First of all, tr to analyze this using the profiler. Profiling should ALWAYS be the first step when trying to improve performance. We can all guess at what is causing your performance drop, but the only way to be sure and focus on the right part is to inspect the profiler report.
I didn't run the profiler on your code, as I don't want to generate test data to do so; but I have some ideas about what work is being carried out in vain. In your function mean(sum(pr<=x))./sum(p<=x), you are repeatedly summing over p<=x. All in all, one call includes N comparisons and N-1 summations. So for both, you have behavior that is quadratic in N when all N values of p are calculated.
If you step through a sorted version of p, you need less calculations and comparisons, as you can keep track of a running sum (i.e. behavior that is linear in N). I guess a similar method could be applied to the other part of the calculation.
edit:
The implementation of my idea as expressed above:
function pfdr = fdr(p,pr)
[N, M] = size(pr);
[p, idxP] = sort(p);
[pr] = sort(pr(:));
pfdr = NaN(N,1);
parfor iP = 1:N
x = p(iP);
m = sum(pr<=x)/M;
pfdr(iP) = m/iP;
end
pfdr(idxP) = pfdr;
If you have access to the parallel computing toolbox, the parfor loop will allow you to gain some performance. I used two basic ideas: mean(sum(pr<=x)) is actually equal to sum(pr(:)<=x)/M. On the other hand, since p is sorted, this allows you to just take the index as the number of elements (in the assumption that every element is unique, otherwise you'll have to work with unique to do the full rigorous analysis).
As you should already know very well by running the profiler yourself, the line m = sum(pr<=x)/M; is the main resource hog. This can be tackled similarly to p by making use of the sorted nature of pr.
I tested my code (both for identical results and for time consumption) against yours. For N=20e3; M=100, I get about 63 seconds to run your code and 43 seconds to run mine on my main computer (MATLAB 2011a on 64 bit Arch Linux, 8 GiB RAM, Core i7 860). For smaller values of M the gain is larger. But this gain is in part due to parallelization.
edit2: Apparently, I came to very similar results as Andrey, my result would have been very similar had I pursued the same approach.
However, I realised that there are some built-in functions that do more or less what you need, i.e. quite similar to determining the empirical cumulative density function. And this can be done by constructing the histogram:
function pfdr = fdr(p,pr)
[N, M] = size(pr);
[p, idxP] = sort(p);
count = histc(pr(:), [0; p]);
count = cumsum(count(1:N));
pfdr = count./(1:N).';
pfdr(idxP) = pfdr/M;
For the same M and N as above, this code takes 228 milliseconds on my computer. It takes 104 milliseconds for Andrey's parameters, so on my computer it turns out a bit slower, but I think this code is far more readable than intricate for loops (as was the case in both our examples).
Following the discussion between me and Andrey in this question, this very late answer is just to prove to Andrey that vectorized solutions are still faster than JIT'ed loops, they sometimes just aren't as easy to find.
I am more than willing to remove this answer if it is deemed inappropriate by the OP.
Now, on to business, here's the original arrayfun, looped version by Andrey, and vectorized version by Egon:
function test
clc
N = 10000/4 ;
M = 100/4 ;
p = rand(N,1);
pr = rand(N,M);
%% first option
tic
pfdr = arrayfun( #(x) mean(sum(pr<=x))./sum(p<=x), p);
toc
%% second option
tic
out = zeros(numel(p),1);
[p2,sortIndex] = sort(p);
pr2 = sort(pr(:));
pr2(end+1) = Inf;
place = 1;
for i=1:numel(p2)
x = p2(i);
while pr2(place)<=x
place = place+1;
end
exp1a = place-1;
exp2 = i;
out(i) = exp1a/exp2;
end
out(sortIndex) = out/ M;
toc
%% third option
tic
[p2,sortIndex] = sort(p);
count = histc(pr2(:), [0; p2]);
count = cumsum(count(1:N));
out = count./(1:N).';
out(sortIndex) = out/M;
toc
end
Results on my laptop:
Elapsed time is 0.916196 seconds.
Elapsed time is 0.011429 seconds.
Elapsed time is 0.007328 seconds.
and for N=1000; M = 100; :
Elapsed time is 38.082718 seconds.
Elapsed time is 0.127052 seconds.
Elapsed time is 0.042686 seconds.
So: vectorized is 2-3 times faster.