I'm trying to understand why the following AppleScript handlers complete in such different amounts of time. I have started (!) reading a little about Big O and complexity, but am struggling to apply my thus far limited understanding to these cases:
Handler 1:
on ranger1(n)
    set outList to {}
    repeat with i from 1 to n
        set end of outList to i
    end repeat
    return outList
end ranger1
Handler 2:
on ranger2(n)
    set outList to {}
    set i to 1
    repeat n times
        set end of outList to i
        set i to i + 1
    end repeat
    return outList
end ranger2
I've tried these handlers out with values for n of up to 1 000 000. (If anyone reading plans on trying these out, stick to values <= 100 000!)
Timing a call of ranger1(100000):
set timeStart to (time of (current date))
ranger1(100000)
log (time of (current date)) - timeStart
gives me a time of between 8 and 10 seconds.
However, timing a call of ranger2(100000) takes about 240 seconds to complete.
I'm assuming that in ranger2() it is the statement set i to i+1 that is increasing the "complexity" of the handler. I might be wrong, I might be right; I honestly don't know.
So, I guess my question is (!) - Am I wrong?
I will be extremely appreciative of any explanation that can help me understand the real difference between these handlers. Particularly one that can help me move towards applying concepts of "complexity" to such simple functions.
Cheers :)
Big O tells you how the run time will grow as the size of the data increases.
So there is really nothing practical in it; it is more of a rule of thumb.
Your findings suggest that it is a little faster to use the repeat with i from 1 to n loop, since the counter is then incremented behind the scenes. If you count statements theoretically, then the set i to i + 1 of course also counts as an extra statement per iteration. :)
For comparison, here's the equivalent in Python (which took me a week to get into coming from AppleScript, and I'm a slow learner):
#!/usr/bin/python
from time import time

def ranger3(n):
    outlist = []
    i = 1
    for _ in range(n):
        outlist.append(i)
        i += 1
    return outlist

def ranger4(n):
    outlist = []
    for i in range(1, n+1):
        outlist.append(i)
    return outlist
n = 10000000 # 10 million
t = time()
ranger3(n)
print(time()-t) # 2.2633600235
t = time()
ranger4(n)
print(time()-t) # 1.52647018433
Needless to say, both are O(n) as you'd expect, in addition to being one or two orders of magnitude faster than AppleScript - and Python is considered slow compared to most mainstream languages. Just to show how pointless it is obsessing over "performance-optimizing" AppleScript, when every other language leaves it in the dust straight out of the box.
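For yet another data point, here's a rough Julia translation of the same O(n) pattern. This is a sketch of my own, not from the original post; the ranger5 name and the push!-based append are my choices:

function ranger5(n)
    outlist = Int[]          # empty vector of Ints
    for i = 1:n
        push!(outlist, i)    # amortized O(1) append keeps the whole loop O(n)
    end
    return outlist
end

@time ranger5(10000000)      # ten million, same n as the Python run above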
I ran the following timer loops on the AS ranger code supplied above:
set minIter to 0
set maxIter to 200000
set incIter to 50000
repeat with iters from minIter to maxIter by incIter
    set timeStart to (time of (current date))
    ranger1(iters)
    log (" ranger1(" & iters & ") took seconds:" & (time of (current date)) - timeStart) & " seconds "
end repeat

repeat with iters from minIter to maxIter by incIter
    set timeStart to (time of (current date))
    ranger2(iters)
    log (" ranger2(" & iters & ") took seconds:" & (time of (current date)) - timeStart) & " seconds "
end repeat
with these results:
(* ranger1(0) took seconds:0 seconds *)
(* ranger1(50000) took seconds:1 seconds *)
(* ranger1(100000) took seconds:3 seconds *)
(* ranger1(150000) took seconds:8 seconds *)
(* ranger1(200000) took seconds:13 seconds *)
(* ranger2(0) took seconds:0 seconds *)
(* ranger2(50000) took seconds:74 seconds *)
(* ranger2(100000) took seconds:262 seconds *)
(* ranger2(150000) took seconds:471 seconds *)
(* ranger2(200000) took seconds:734 seconds *)
Certainly ranger1 is (relatively) faster, but the growth is definitely not linear - for ranger1 the timings (1, 3, 8, 13 seconds) look roughly quadratic in n - and ranger2 is downright glacial.
We can use
c = @parallel (vcat) for i=1:10
    (i,i+1)
end
But when I try to use push!() instead of vcat(), I get an error. How can I use push!() in this parallel loop?
c = @parallel (push!) for i=1:10
    (c, (i,i+1))
end
The @parallel is somewhat similar to foldl(op, itr) in that it uses the first value of itr as an initial first parameter for op. push! lacks the required symmetry between the operands. Perhaps what you are looking for is:
julia> c = @parallel (append!) for i=1:10
           [(i,i+1)]
       end
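Sequentially, you can picture that reduction as a foldl over the per-iteration results. A single-worker sketch (illustrative only, not the macro's actual implementation):

parts = [[(i, i+1)] for i = 1:10]   # what each iteration of the loop body produces
c = foldl(append!, parts)           # append!(append!(parts[1], parts[2]), parts[3]), ...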
Elaborating a bit on Dan's point: to see how the parallel macro works, compare the following two invocations:
julia> @parallel print for i in 1:10
           (i,i+1)
       end
(1, 2)(2, 3)nothing(3, 4)nothing(4, 5)nothing(5, 6)nothing(6, 7)nothing(7, 8)nothing(8, 9)nothing(9, 10)nothing(10, 11)
julia> @parallel string for i in 1:10
           (i,i+1)
       end
"(1, 2)(2, 3)(3, 4)(4, 5)(5, 6)(6, 7)(7, 8)(8, 9)(9, 10)(10, 11)"
From the top one it should be clear what's going on. Each iteration produces an output. When the specified function is applied to those outputs, it is done in pairs: the first pair of outputs is fed to print, and the result of that print call then becomes the first item in the next pair to be processed. Since that result is nothing, the next call prints nothing followed by (3, 4). The result of this print statement is again nothing, therefore the next pair to be printed is nothing and (4, 5), and so on until all elements are consumed. I.e. in terms of pseudocode, this is what's happening:
Step 1: state = print((1,2), (2,3)); # state becomes nothing
Step 2: state = print(state, (3,4)); # state becomes nothing again
Step 3: state = print(state, (4,5)); # and so forth
The reason string works as expected is that the following steps happen instead:
Step 1: state = string((1,2),(2,3));
Step 2: state = string(state, (3,4));
Step 3: state = string(state, (4,5));
etc
In general, the function you pass to the parallel macro should be something that takes two inputs of the same type, and outputs an object of the same type.
You therefore cannot use push!, because it always takes two inputs of different types (one array and one plain element), even though it outputs an array. Use append! instead, which takes two arrays and returns an array, and so fits the specification.
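A quick sketch of that mismatch, using the same (i, i+1) tuples as above:

# push!((1, 2), (2, 3))       # MethodError: mid-reduction, both operands are loop
                              # outputs (tuples), and push! needs an array first
append!([(1, 2)], [(2, 3)])   # fine: Vector in, Vector in, Vector out - same type throughout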
Also note that the order of outputs is not guaranteed (here it happens to be in order because I only used one worker). If you want something where the order of operations matters, then you shouldn't use this construct. In something like addition it obviously doesn't matter, because addition is associative and commutative; but if I used string and the outputs were processed in a different order, then obviously you could end up with a different string than you'd expect.
EDIT - addressing benchmark between vcat / append! / indexed assignment
I think the most efficient way to do this is in fact via normal indexing into a preallocated array. But between append! and vcat, append! will almost certainly be faster, as vcat always makes a copy (as I understand it).
Benchmarks:
function parallelWithVcat!( A::Array{Tuple{Int64, Int64}, 1} )
    A = @parallel vcat for i = 1:10000
        (i, i+1)
    end
end;

function parallelWithFunction!( A::Array{Tuple{Int64, Int64}, 1} )
    A = @parallel append! for i in 1:10000
        [(i, i+1)];
    end
end;

function parallelWithPreallocation!( A::Array{Tuple{Int64, Int64}, 1} )
    @parallel for i in 1:10000
        A[i] = (i, i+1);
    end
end;
A = Array{Tuple{Int64, Int64}, 1}(10000);
### first runs omitted, all benchmarks here are from 2nd runs ###
# first on a single worker:
@time for n in 1:100; parallelWithVcat!(A); end
#> 8.050429 seconds (24.65 M allocations: 75.341 GiB, 15.42% gc time)
@time for n in 1:100; parallelWithFunction!(A); end
#> 0.072325 seconds (1.01 M allocations: 141.846 MiB, 52.69% gc time)
@time for n in 1:100; parallelWithPreallocation!(A); end
#> 0.000387 seconds (4.21 k allocations: 234.750 KiB)

# now with true parallelism:
addprocs(10);
@time for n in 1:100; parallelWithVcat!(A); end
#> 1.177645 seconds (160.02 k allocations: 109.618 MiB, 0.75% gc time)
@time for n in 1:100; parallelWithFunction!(A); end
#> 0.060813 seconds (111.87 k allocations: 70.585 MiB, 3.91% gc time)
@time for n in 1:100; parallelWithPreallocation!(A); end
#> 0.058134 seconds (116.16 k allocations: 4.174 MiB)
If someone can suggest an even more efficient way, please do so!
Note in particular that the indexed assignment is much faster than the rest - so much so that, for this example at least, most of the time in the parallel case appears to be spent on the parallelisation itself.
Disclaimer: I make no claim that the above are correct summonings of the @parallel spell. I have not delved into the inner workings of the macro in enough detail to claim otherwise. In particular, I am not aware which parts the macro causes to be processed remotely vs locally (e.g. the assignment part). Caution is advised, ymmv, etc.
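One concrete caveat along those lines, and it is my own addition rather than something verified against the benchmarks above: with multiple workers, a plain Array is copied to each worker, so the remote A[i] = ... writes may never reach the master's copy. The documented tool for cross-worker writes in this era of Julia is a SharedArray; a hedged sketch (constructor syntax varies between Julia versions, this is the 0.6 form):

A = SharedArray{Tuple{Int64, Int64}}(10000)   # one array visible to every worker
@sync @parallel for i in 1:10000
    A[i] = (i, i+1)                           # remote writes land in shared memory
end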
I am trying to increase the number of sets until workout is at least as long as min_workout but not longer than max_workout. I am testing with the values min_workout = 30, max_workout = 40, run = 3, and walk = 2. It should stop at 5 repeats and a total of 33 minutes.
warm_cool = 8
puts "What is the least amount of time you want to work out?"
min_workout = gets.chomp.to_i
puts "What is the longest time you want to work out?"
max_workout = gets.chomp.to_i
puts "How many minutes per set do you want to run?"
run = gets.chomp.to_i
puts "How many minutes do you want to walk each set?"
walk = gets.chomp.to_i
i = 0
workout = (run+walk)*i + warm_cool
until workout >= min_workout && workout <= max_workout do
  workout = (run+walk)*i+=1 + warm_cool
  puts "You need to perform #{i} repeats and your workout time will be #{workout} minutes, including a 4 minute warmup and cooldown"
end
I can't figure out why I am getting an infinite loop here.
There are a few little slips here that have had some pretty dramatic consequences. The first is using += in the middle of a statement. That's generally bad form, and it's caused absolute chaos here.
The reason is that your code is evaluated like this:
workout = (run + walk) * i += (1 + warm_cool)
Since warm_cool is 8, i is incremented by 9 each time, so you can easily skip past the end of your range. This is why it's generally best to limit how many times you try things to a reasonable count. Wrapping it all in a simple method also helps contain things and makes managing flow easier:
def intervals_required(run, walk, warm_cool, range)
  10.times do |i|
    workout = (run + walk) * i + warm_cool
    return i if range.include?(workout)
  end
  # Couldn't find a matching interval
  nil
end
Where you call it like this:
if (intervals = intervals_required(run, walk, warm_cool, min_workout..max_workout))
  puts "..."
end
You're probably thinking that your first iteration is yielding the expression
(3 + 2) * 0 + 1 + 8
which would evaluate correctly, but you need to understand the way += works under the hood.
Underneath the syntax convenience of += given to you by Ruby, you're actually doing a few things at once. += is parser-level sugar for an assignment, and everything to the right of it is treated as if it were wrapped in implicit parentheses, as in i += (1 + 8). It expands into an addition followed by an assignment, adding the receiver to the argument before assigning, like so
i = i + (1 + 8)
and the addition itself is an ordinary method call, which with explicit dot notation and parentheses looks like
i = i.+(1 + 8)
So instead of
(3 + 2) * 1 + 8
5 * 1 + 8
5 + 8
13
on the first pass you're actually getting
(3 + 2) * (i = 0 + (1 + 8))
(3 + 2) * 9
5 * 9
45
and skipping your upper condition of 40, so it just keeps going. i is now set to 9, and i increases by 9 on each pass, so your next result is 90, then 135, and so on.
Try wrapping the assignment in parentheses, like this
(run + walk) * (i += 1) + 8
Also consider adding a guard clause inside your loop to prevent infinite repetition, something like break if workout > max_workout
I believe your problem is that you have reached an invalid state with your until loop, and it just keeps going out of bounds. Your logic reads like a while-style condition, but an until loop only breaks when its condition becomes true.
So once workout overshoots max_workout, the condition workout >= min_workout && workout <= max_workout can never be satisfied, and the until keeps incrementing forever.
I've been scratching my head at this for several hours. I have a script that calls a function 625 times, but that causes lag, so I want to delay each iteration of the for loop by 5 seconds. Any help would be great.
I use this little function for second-resolution delays.
function os.sleep(sec)
    -- busy-wait until the wall-clock time has advanced by sec seconds
    local now = os.time() + sec
    repeat until os.time() >= now
end
EDIT: Added msec version (approximate -- not very precise)
function os.sleep(msec)
    -- same idea at millisecond resolution, busy-waiting on os.clock()
    local now = os.clock() + msec/1000
    repeat until os.clock() >= now
end
I'm trying to oversimplify this as much as possible.
Functions f1 and f2 implement a very simplified version of roulette wheel selection over a Vector R. The only difference between them is that f1 uses a for loop and f2 a while loop. Both functions return the index of the array where the condition was met.
R = rand(100)

function f1(X::Vector)
    l = length(X)
    r = rand()*X[l]
    for i = 1:l
        if r <= X[i]
            return i
        end
    end
end

function f2(X::Vector)
    l = length(X)
    r = rand()*X[l]
    i = 1
    while true
        if r <= X[i]
            return i
        end
        i += 1
    end
end
Now I created a couple of test functions.
M is the number of times we repeat the function execution.
This part is critical: I want to store the values I get from the functions because I need them later. To oversimplify the code, I just created a new variable r into which I sum up the returns from the functions.
function test01(M,R)
    cumR = cumsum(R)
    r = 0
    for i = 1:M
        a = f1(cumR)
        r += a
    end
    return r
end

function test02(M,R)
    cumR = cumsum(R)
    r = 0
    for i = 1:M
        a = f2(cumR)
        r += a
    end
    return r
end
So, next I get:
@time test01(1e7,R)
elapsed time: 1.263974802 seconds (320000832 bytes allocated, 15.06% gc time)
@time test02(1e7,R)
elapsed time: 0.57086421 seconds (1088 bytes allocated)
So, for some reason I can't figure out, f1 allocates a lot of memory, and the allocation grows the larger M gets.
I said the line r += a was critical, because if I remove it from both test functions, I get the same result from both tests - no problem! So I thought there was a problem with the type of a being returned by the functions (because f1 returns the counter of the for loop, and f2 uses its own variable i "manually declared" inside the function).
But...
aa = f1(cumsum(R))
bb = f2(cumsum(R))
typeof(aa) == typeof(bb)
true
So... what the hell is going on???
I apologize if this is some sort of basic question, but I've been going over this for over 3 hours now and couldn't find an answer. Even though the functions are fixed by using a while loop, I hate not knowing what's going on.
Thanks.
When you see lots of surprising allocations like that, a good first thing to check is type stability. The @code_warntype macro is very helpful here:
julia> @code_warntype f1(R)
# … lots of annotated code, but the important part is this last line:
end::Union{Int64,Void}
Compare that to f2:
julia> @code_warntype f2(R)
# ...
end::Int64
So, why are the two different? Julia thinks that f1 might sometimes return nothing (which is of type Void)! Look again at your f1 function: what would happen if the last element of X is NaN? It would just fall off the end of the function with no explicit return statement. In f2, however, you'd end up indexing beyond the bounds of X and get an error instead. Fix this type instability by deciding what to do if the loop completes without finding the answer, and you'll see much more similar timings.
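For example, here's a minimal sketch of one possible fix; the choice to return 0 when nothing matches is mine (returning l, or throwing an error, would do just as well):

function f1(X::Vector)
    l = length(X)
    r = rand()*X[l]
    for i = 1:l
        if r <= X[i]
            return i
        end
    end
    return 0   # explicit fallback, so the inferred return type is plain Int64
end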
As I stated in the comment, your functions f1 and f2 both generate random numbers inside them, and you are using those random numbers as the stopping criterion. Thus, there is no deterministic way to measure which of the functions is faster (the outcome doesn't depend only on the implementation).
You can change the f1 and f2 functions to accept r as a parameter:
function f1(X::Vector, r)
    for i = 1:length(X)
        if r <= X[i]
            return i
        end
    end
end

function f2(X::Vector, r)
    i = 1
    while i <= length(X)
        if r <= X[i]
            return i
        end
        i += 1
    end
end
And then measure the time properly with the same R and r for both functions:
julia> R = cumsum(rand(100))
julia> r = rand(1_000_000) * R[end] # generate 1_000_000 random thresholds
julia> @time for i=1:length(r); f1(R, r[i]); end;
0.177048 seconds (4.00 M allocations: 76.278 MB, 2.70% gc time)
julia> @time for i=1:length(r); f2(R, r[i]); end;
0.173244 seconds (4.00 M allocations: 76.278 MB, 2.76% gc time)
As you can see, the timings are now nearly identical. Any remaining difference will be caused by external factors (warm-up, or the processor being busy with other tasks).
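As an aside, if you have the BenchmarkTools.jl package available, its @btime macro is a more robust way to time small functions, since it handles warm-up and repeated sampling for you (a sketch; the $ signs interpolate the arguments so that setup isn't included in the measurement):

using BenchmarkTools
@btime f1($R, $(rand() * R[end]))
@btime f2($R, $(rand() * R[end]))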
I'm trying to parallelize a little scientific code I wrote. But when I add @parallel, similar code on just one processor suddenly takes 10 times as long to execute. It should take roughly the same amount of time. The first code makes one memory allocation, while the second makes 20. But zeros(Float64, num_bins) should not be a bottleneck: num_bins is 1800, so each call to zeros() should be allocating 8*1800 bytes, and 20 calls allocating 14,400 bytes each should not be taking this long.
I can't figure out what I'm doing wrong, and the Julia documentation is vague and non-specific about how variables are accessed within @parallel. Both versions of the code below compute the correct value for the rdf vector. Can anyone tell by looking at it what is making it allocate so much memory and take so long?
atoms = readAtoms(file)
rdf = zeros(Float64, num_bins)
@time for k = 1:20
    for i = 1:num_atoms
        for j = 1:num_atoms
            r = distance(k, atoms, i, atoms, j)
            bin_number = floor(r / dr) + 1
            rdf[bin_number] += 1
        end
    end
end
elapsed time: 8.1 seconds (0 bytes allocated)
atoms = readAtoms(file)
@time rdf = @parallel (+) for k = 1:20
    rdf_part = zeros(Float64, num_bins)
    for i = 1:num_atoms
        for j = 1:num_atoms
            r = distance(k, atoms, i, atoms, j)
            bin_number = floor(r / dr) + 1
            rdf_part[bin_number] += 1
        end
    end
    rdf_part
end
elapsed time: 81.2 seconds (33472513332 bytes allocated, 17.40% gc time)