I've been scratching my head at this for several hours. I have a script that calls a function 625 times, but that causes lag, so I want to delay each iteration of the for loop by 5 seconds. Any help would be great.
I use this little function for second-resolution delays.
function os.sleep(sec)
local now = os.time() + sec
repeat until os.time() >= now
end
EDIT: Added msec version (approximate -- not very precise)
function os.sleep(msec)
local now = os.clock() + msec/1000
repeat until os.clock() >= now
end
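Note that both versions above busy-wait: the repeat loop spins and keeps a CPU core fully occupied until the deadline passes. For comparison, here is a minimal Python sketch (not Lua; the names busy_wait and run_with_delay are made up for illustration) that contrasts that approach with a blocking sleep, and spaces out loop iterations by a fixed delay.

```python
import time

def busy_wait(seconds):
    """Spin until the deadline passes (same idea as the Lua os.sleep above)."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        pass  # burns CPU for the whole wait

def run_with_delay(func, iterations, delay, sleep=time.sleep):
    """Call func(i) for each iteration, pausing `delay` seconds between calls."""
    results = []
    for i in range(iterations):
        results.append(func(i))
        if i < iterations - 1:  # no need to wait after the last call
            sleep(delay)
    return results

if __name__ == "__main__":
    start = time.monotonic()
    out = run_with_delay(lambda i: i * i, iterations=3, delay=0.1)
    print(out, round(time.monotonic() - start, 2))
```

Passing sleep=busy_wait reproduces the spinning behaviour; the default time.sleep hands the CPU back to the operating system while waiting.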
Is there a way to vectorize this for loop to speed it up?
thank you
for j =1 :size(Rond_Input2Cell,1)
for k=1: size(Rond_Input2Cell,2)
Rond_Input2Cell(j,k)= (Pre_Rond_Input2Cell(j,k)*Y_FGate(k))+(net_Cell(k)*Y_InGate(k)*tmp_input(j)) ;
end
end
P.s.
Matrix size:
Rond_Input2Cell =39*120
Pre_Rond_Input2Cell = 39*120
Y_FGate=1*120 (row vector)
net_Cell=1*120 (row vector)
Y_InGate =1*120 (row vector)
tmp_input =1*39 (row vector)
You can speed up this calculation without a for loop by using bsxfun, which trades some extra memory for speed.
The code below computes the same result as the loop, applying each multiplication across whole rows at once and adding the two terms.
Rond_Input2Cell = bsxfun(@times, tmp_input.', net_Cell.*Y_InGate) + bsxfun(@times, Pre_Rond_Input2Cell, Y_FGate);
Explanation:
Pre_Rond_Input2Cell(j,k)*Y_FGate(k)
This is computed by bsxfun(@times, Pre_Rond_Input2Cell, Y_FGate), which multiplies each of the 39 rows of Pre_Rond_Input2Cell element-wise by the 120 columns of Y_FGate.
net_Cell(k)*Y_InGate(k)*tmp_input(j) is replaced by bsxfun(@times, tmp_input.', net_Cell.*Y_InGate), which multiplies each element of tmp_input by the element-wise product of net_Cell and Y_InGate. In the end, the sum of the two terms is stored in Rond_Input2Cell.
Here is a performance check
>> perform_check
Elapsed time is 0.000475 seconds.
Elapsed time is 0.000156 seconds.
>> perform_check
Elapsed time is 0.001089 seconds.
Elapsed time is 0.000288 seconds.
Another method is to use repmat
tic;
Rond_Input2Cell =(Pre_Rond_Input2Cell.*repmat(Y_FGate,size(Pre_Rond_Input2Cell,1),1)) + (repmat(tmp_input.',1,size(Pre_Rond_Input2Cell,2)).*repmat(net_Cell.*Y_InGate,size(Pre_Rond_Input2Cell,1),1));
toc;
Here is a performance test with a for loop
>> perf_test
Elapsed time is 0.003268 seconds.
Elapsed time is 0.001719 seconds.
>> perf_test
Elapsed time is 0.004211 seconds.
Elapsed time is 0.002348 seconds.
>> perf_test
Elapsed time is 0.002384 seconds.
Elapsed time is 0.000509 seconds.
Here is an article by Loren on Performance of repmat vs bsxfun
Your vectorized code should be something like this.
temp_mat = tmp_input' * (net_Cell .* Y_InGate); % outer product, size 39x120
% The .* below relies on implicit expansion (MATLAB R2016b or later, or Octave)
Rond_Input2Cell = (Pre_Rond_Input2Cell .* Y_FGate) + temp_mat; % size 39x120
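For readers more at home in Python, the same decomposition can be checked with NumPy broadcasting. This is only a sketch under assumptions: it uses NumPy rather than MATLAB, random data with the shapes listed in the question, and helper names (update_loop, update_vectorized) invented for the comparison.

```python
import numpy as np

rng = np.random.default_rng(0)
pre = rng.random((39, 120))   # stands in for Pre_Rond_Input2Cell
y_fgate = rng.random(120)     # Y_FGate
net_cell = rng.random(120)    # net_Cell
y_ingate = rng.random(120)    # Y_InGate
tmp_input = rng.random(39)    # tmp_input

def update_loop(pre, y_fgate, net_cell, y_ingate, tmp_input):
    """Direct translation of the double for loop from the question."""
    out = np.empty_like(pre)
    for j in range(pre.shape[0]):
        for k in range(pre.shape[1]):
            out[j, k] = (pre[j, k] * y_fgate[k]
                         + net_cell[k] * y_ingate[k] * tmp_input[j])
    return out

def update_vectorized(pre, y_fgate, net_cell, y_ingate, tmp_input):
    """Row-wise scaling by broadcasting, plus an outer product."""
    return pre * y_fgate + np.outer(tmp_input, net_cell * y_ingate)

if __name__ == "__main__":
    a = update_loop(pre, y_fgate, net_cell, y_ingate, tmp_input)
    b = update_vectorized(pre, y_fgate, net_cell, y_ingate, tmp_input)
    print(a.shape, np.allclose(a, b))
```

The first term broadcasts the 120-element vector across all 39 rows, and the second term is the outer product of tmp_input with the element-wise product of the two gate vectors, mirroring the two bsxfun calls.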
Lua is said to be a fast scripting language. But when I tested looping e.g.:
a = 0
while a < 1000000000 do
a = a + 1
end
It takes a lot of time (over 1 minute). Is it because Lua needs to copy and paste loop's content, and then evaluate?
I know that when evaluating you need to pop() items away from stack.
I tested this "speed-test" on Ruby too and it did the loop in about 20s.
EDIT:
Why is this so much faster on local variables? (~16 seconds to do same iteration but on local variable inside function)
Try the code below. It compares while vs for loops and globals vs local variables.
I get these numbers (with Lua 5.1.4, but they are similar for 5.3.2), which tell you the cost of using global variables in a loop:
WG 9.16 100
WL 1.96 467
FG 4.93 186
FL 1.18 776
Of course, these costs get diluted if you do real work inside the loop.
Here is the code:
local N=1e8
t0=os.clock()
a = 0
while a < N do
a = a + 1
end
t1=os.clock()-t0
print("WG",t1,math.floor(t1/t1*100+0.5))
t0=os.clock()
local a = 0
while a < N do
a = a + 1
end
t2=os.clock()-t0
print("WL",t2,math.floor(t1/t2*100+0.5))
t0=os.clock()
b = 0
for i=1,N do
b = b + 1
end
t3=os.clock()-t0
print("FG",t3,math.floor(t1/t3*100+0.5))
t0=os.clock()
local b = 0
for i=1,N do
b = b + 1
end
t4=os.clock()-t0
print("FL",t4,math.floor(t1/t4*100+0.5))
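The global-versus-local gap is not specific to Lua. As a rough analogue, here is a Python sketch of the same comparison (an illustration only: the function names are made up, and the mechanisms differ in detail, since Lua looks globals up in a table while CPython looks them up in a dictionary; in both, locals live in fast slots).

```python
import time

counter = 0  # module-level (global) counter

def count_global(n):
    """Increment a global variable n times."""
    global counter
    counter = 0
    while counter < n:
        counter += 1
    return counter

def count_local(n):
    """Increment a local variable n times."""
    a = 0
    while a < n:
        a += 1
    return a

def timed(func, n):
    """Return (result, elapsed seconds) for a single call of func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    n = 1_000_000
    for name, func in (("global", count_global), ("local", count_local)):
        result, seconds = timed(func, n)
        print(name, result, f"{seconds:.3f}s")
```

The actual ratio depends on the interpreter version and machine, so no timings are quoted here.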
Your loop is inefficient and impractical.
You're doing one billion iterations. That's not exactly "light".
Not to mention you're using a while loop as a substitute for a numeric for loop.
I'm trying to oversimplify this as much as possible.
Functions f1 and f2 implement a very simplified version of a roulette wheel selection over a Vector R. The only difference between them is that f1 uses a for and f2 a while. Both functions return the index of the array where the condition was met.
R=rand(100)
function f1(X::Vector)
l = length(X)
r = rand()*X[l]
for i = 1:l
if r <= X[i]
return i
end
end
end
function f2(X::Vector)
l = length(X)
r = rand()*X[l]
i = 1
while true
if r <= X[i]
return i
end
i += 1
end
end
Now I created a couple of test functions...
M is the number of times we repeat the function execution.
Now this is critical... I want to store the values I get from the functions because I need them later... To oversimplify the code I just created a new variable r where I sum up the returns from the functions.
function test01(M,R)
cumR = cumsum(R)
r = 0
for i = 1:M
a = f1(cumR)
r += a
end
return r
end
function test02(M,R)
cumR = cumsum(R)
r = 0
for i = 1:M
a = f2(cumR)
r += a
end
return r
end
So, next I get:
@time test01(1e7,R)
elapsed time: 1.263974802 seconds (320000832 bytes allocated, 15.06% gc time)
@time test02(1e7,R)
elapsed time: 0.57086421 seconds (1088 bytes allocated)
So, for some reason I can't figure out, f1 allocates a lot of memory, and it's even greater the larger M gets.
I said the line r += a was critical, because if I remove it from both test functions, I get the same result with both tests, so no problems! So I thought there was a problem with the type of a being returned by the functions (because f1 returns the iterator of the for loop, and f2 uses its own variable i "manually declared" inside the function).
But...
aa = f1(cumsum(R))
bb = f2(cumsum(R))
typeof(aa) == typeof(bb)
true
So... what the hell is going on???
I apologize if this is some sort of basic question but, I've been going over this for over 3 hours now and couldn't find an answer... Even though the functions are fixed by using a while loop I hate not knowing what's going on.
Thanks.
When you see lots of surprising allocations like that, a good first thing to check is type-stability. The @code_warntype macro is very helpful here:
julia> @code_warntype f1(R)
# … lots of annotated code, but the important part is this last line:
end::Union{Int64,Void}
Compare that to f2:
julia> @code_warntype f2(R)
# ...
end::Int64
So, why are the two different? Julia thinks that f1 might sometimes return nothing (which is of type Void)! Look again at your f1 function: what would happen if the last element of X is NaN? It'll just fall off the end of the function with no explicit return statement. In f2, however, you'll end up indexing beyond the bounds of X and get an error instead. Fix this type instability by deciding what to do if the loop completes without finding the answer, and you'll see more similar timings.
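To make the two failure modes concrete, here is a small Python sketch of the same pair of functions. It is a translation for illustration only: indices are 0-based, and r is passed in rather than drawn at random so the behaviour is repeatable. With NaN as the last element no comparison ever succeeds, so the for version silently returns None while the while version raises.

```python
import math
from typing import Optional, Sequence

def f1(X: Sequence[float], r: float) -> Optional[int]:
    """for-loop version: may fall off the end and return None."""
    for i in range(len(X)):
        if r <= X[i]:
            return i
    # no explicit return here, so the result is None

def f2(X: Sequence[float], r: float) -> int:
    """while-loop version: either returns an int or raises."""
    i = 0
    while True:
        if r <= X[i]:  # raises IndexError once i runs past the end
            return i
        i += 1

if __name__ == "__main__":
    cum = [0.2, 0.5, math.nan]
    r = 0.9 * cum[-1]      # NaN, so every comparison is False
    print(f1(cum, r))      # prints None
    try:
        f2(cum, r)
    except IndexError as exc:
        print("f2 raised:", exc)
```

The Optional[int] annotation on f1 plays the role of Union{Int64,Void}: callers have to handle two possible result types, whereas f2 only ever produces one.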
As I stated in the comment, your functions f1 and f2 both draw a random number internally and use it as the stopping criterion. The timings therefore vary for reasons that do not depend on the implementation, so there is no deterministic way to measure which of the functions is faster.
You can replace f1 and f2 functions to accept r as a parameter:
function f1(X::Vector, r)
for i = 1:length(X)
if r <= X[i]
return i
end
end
end
function f2(X::Vector, r)
i = 1
while i <= length(X)
if r <= X[i]
return i
end
i += 1
end
end
And then measure the time properly with the same R and r for both functions:
>>> R = cumsum(rand(100))
>>> r = rand(1_000_000) * R[end] # generate 1_000_000 random thresholds
>>> @time for i=1:length(r); f1(R, r[i]); end;
0.177048 seconds (4.00 M allocations: 76.278 MB, 2.70% gc time)
>>> @time for i=1:length(r); f2(R, r[i]); end;
0.173244 seconds (4.00 M allocations: 76.278 MB, 2.76% gc time)
As you can see, the timings are now nearly identical. Any difference will be caused by external factors (warm-up, or the processor being busy with other tasks).
I'm trying to understand why the following AppleScript handlers complete in such different amounts of time. I have started (!) reading a little about Big O and complexity, but am struggling to apply my thus far limited understanding to these cases:
Handler 1:
on ranger1(n)
set outList to {}
repeat with i from 1 to n
set end of outList to i
end repeat
return outList
end ranger1
Handler 2:
on ranger2(n)
set outList to {}
set i to 1
repeat n times
set end of outList to i
set i to i+1
end repeat
return outList
end ranger2
I've tried these handlers out with values for n of up to 1 000 000. (If anyone reading plans on trying these out, stick to values <= 100 000!)
Timing a call of ranger1(100000):
set timeStart to (time of (current date))
ranger1(100000)
log (time of (current date)) - timeStart
is giving me a time of between 8-10 secs to complete.
However, timing a call of ranger2(100000) results in about 240 secs to complete.
I'm assuming that in ranger2() it is the statement set i to i+1 that is increasing the "complexity" of the handler. I might be wrong, I might be right; I honestly don't know.
So, I guess my question is (!) - Am I wrong?
I will be extremely appreciative of any explanation that can help me understand the real difference between these handlers. Particularly one that can help me move towards applying concepts of "complexity" to such simple functions.
Cheers :)
Big O tells you how the run time will grow as the size of the data increases.
So there is nothing directly practical in it, other than as a rule of thumb.
Your findings suggest that it is a little bit faster to use the repeat with i from 1 to n loop, since the counter is then increased behind the scenes. If you try to measure stuff theoretically, then the i+1 of course also counts as an extra statement. :)
For comparison, here's the equivalent in Python (which took me a week to get into coming from AppleScript, and I'm a slow learner):
#!/usr/bin/python
from time import time
def ranger3(n):
    outlist = []
    i = 1
    for _ in range(n):
        outlist.append(i)
        i += 1
    return outlist
def ranger4(n):
    outlist = []
    for i in range(1, n+1):
        outlist.append(i)
    return outlist
n = 10000000 # 10 million
t = time()
ranger3(n)
print(time()-t) # 2.2633600235
t = time()
ranger4(n)
print(time()-t) # 1.52647018433
Needless to say, both are O(n) as you'd expect, in addition to being one or two orders of magnitude faster than AS - and Python is considered slow compared to most mainstream languages. It just shows how pointless it is to obsess over "performance-optimizing" AppleScript, when every other language leaves it in the dust straight out of the box.
I ran the following timer loops on the AS ranger code supplied above:
set minIter to 0
set maxIter to 200000
set incIter to 50000
repeat with iters from minIter to maxIter by incIter
set timeStart to (time of (current date))
ranger1(iters)
log (" ranger1(" & iters & ") took seconds:" & (time of (current date)) - timeStart) & " seconds "
end repeat
repeat with iters from minIter to maxIter by incIter
set timeStart to (time of (current date))
ranger2(iters)
log (" ranger2(" & iters & ") took seconds:" & (time of (current date)) - timeStart) & " seconds "
end repeat
with these results:
(* ranger1(0) took seconds:0 seconds *)
(* ranger1(50000) took seconds:1 seconds *)
(* ranger1(100000) took seconds:3 seconds *)
(* ranger1(150000) took seconds:8 seconds *)
(* ranger1(200000) took seconds:13 seconds *)
(* ranger2(0) took seconds:0 seconds *)
(* ranger2(50000) took seconds:74 seconds *)
(* ranger2(100000) took seconds:262 seconds *)
(* ranger2(150000) took seconds:471 seconds *)
(* ranger2(200000) took seconds:734 seconds *)
Certainly ranger1 is (relatively) faster, but definitely not linear, and ranger2 is downright glacial.
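One way to put a number on "not linear" is to estimate the growth exponent from two measurements: if the time grows like n^p, then p is roughly log(t2/t1) / log(n2/n1). A minimal Python sketch using the timings logged above (they have whole-second resolution, so treat the result as rough; growth_exponent is a name made up here):

```python
import math

def growth_exponent(n1, t1, n2, t2):
    """Estimate p in t ~ n**p from two (size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)

# Seconds reported in the log output above, keyed by n.
ranger1 = {50000: 1, 100000: 3, 150000: 8, 200000: 13}
ranger2 = {50000: 74, 100000: 262, 150000: 471, 200000: 734}

if __name__ == "__main__":
    for name, timings in (("ranger1", ranger1), ("ranger2", ranger2)):
        p = growth_exponent(50000, timings[50000], 200000, timings[200000])
        print(f"{name}: time grows roughly like n^{p:.2f}")
```

An exponent near 1 would indicate linear growth; on these figures both handlers come out well above that, somewhere between 1.5 and 2.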
Just to check how parallel processing works in MATLAB, I tried the code below and measured its execution time. But I found that the parallel code takes more time than the normal code, which is unexpected. Am I doing something wrong?
Code with parallel processing
function t = parl()
matlabpool('open',2);
tic;
A = 5:10000000;
parfor i = 1:length(A)
A(i) = 3*A(i) + (A(i)/5);
A(i) = 0.456*A(i) + (A(i)/45);
end
tic;
matlabpool('close');
t = toc;
end
The result for parallel processing
>> parl
Starting matlabpool using the 'local' profile ... connected to 2 workers.
Sending a stop signal to all the workers ... stopped.
ans =
3.3332
Code without parallel processing
function t = parl()
tic;
A = 5:10000000;
for i = 1:length(A)
A(i) = 3*A(i) + (A(i)/5);
A(i) = 0.456*A(i) + (A(i)/45);
end
tic;
t = toc;
end
Result without parallel processing
>> parl
ans =
2.8737e-05
Look at the time to (apparently) execute the serial version of the code, it is effectively 0. That's suspicious, so look at the code ...
tic;
t = toc;
Hmmm, this starts a stopwatch and immediately stops it. Yep, that should take about 0s. Have a look at the parallel code ...
tic;
matlabpool('close');
t = toc;
Ahh, in this case the code times the closing of the pool of workers. That requires a fair bit of work, and the time it takes, the 3.33s, is part of the overhead of using parallel computation in Matlab.
Yes, I do believe that you are doing something wrong: you are not measuring what you (probably) think you are measuring. tic starts a stopwatch and toc reads it. Your code starts a stopwatch twice and reads it once; it should probably start timing only once.
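The same mistake is easy to reproduce in any language. A minimal Python sketch, with time.perf_counter standing in for tic/toc and a short time.sleep standing in for the loop (the function names are invented for the example):

```python
import time

def work():
    """Placeholder for the loop being timed."""
    time.sleep(0.05)

def timed_wrong():
    start = time.perf_counter()   # first "tic"
    work()
    start = time.perf_counter()   # second "tic" throws the first one away
    return time.perf_counter() - start   # "toc" now measures almost nothing

def timed_right():
    start = time.perf_counter()   # start the stopwatch once
    work()
    return time.perf_counter() - start   # read it once, after the work

if __name__ == "__main__":
    print(f"wrong: {timed_wrong():.6f}s")
    print(f"right: {timed_right():.6f}s")
```

Only timed_right brackets the work between a single start and a single read, which is the change the serial and parallel MATLAB functions both need.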