Julia: Return early from pmap() - parallel-processing

Say I have something like the following:
function f(x)
some_test ? true : false
end
If I do p = pmap(f,some_array) I'll get an array of Bools. I'd like to do something if contains(==,p,false), that is, as soon as there is at least one false. In other words, if some_array is very large I would like pmap to stop once it finds its first false.
some_test may be quite involved so I've read that a parallel for loop is not the way to go.
E.g if I have
p = pmap(f,some_array,[N for i = 1:some_large_value])
if contains(==,p,false)
return false
else
return true
end
and a false appears when i=100, how can I stop pmap from checking 101:some_large_value?
As another example of the behavior I'd like to do, take this example from ?pmap.
julia> pmap(x->iseven(x) ? error("foo") : x, 1:4; on_error=ex->0)
4-element Array{Int64,1}:
1
0
3
0
Instead of on_error=ex->0 I'd like pmap to return on the first even. Something like
pmap(x->iseven(x) ? return : x, 1:4)
which would ideally result in only a 1-element Array{Int64,1}.

This is generally hard to do, since other tasks may have already started. If you're not that worried about doing a few extra runs, one way is to modify the pmap example from the parallel computing docs:
function pmap_chk(f, lst)
np = nprocs() # determine the number of processes available
n = length(lst)
results = Vector{Any}(undef, n) # on Julia versions before 0.7, write Vector{Any}(n)
i = 0
nextidx() = (i+=1; i)
done = false
isdone() = done
setdone(flag) = (done = flag)
@sync begin
for p=1:np
if p != myid() || np == 1
@async begin
while !isdone()
idx = nextidx()
if idx > n
break
end
r, flag = remotecall_fetch(f, p, lst[idx])
results[idx] = r
if flag
setdone(flag)
end
end
end
end
end
end
resize!(results, i)
end
Here f should return a tuple containing the result and whether or not it is done, e.g.
julia> pmap_chk(1:100) do f
r = rand()
sleep(r)
(r, r>0.9)
end
15-element Array{Any,1}:
0.197364
0.60551
0.794526
0.105827
0.612087
0.170032
0.8584
0.533681
0.46603
0.901562
0.0894842
0.718619
0.501523
0.407671
0.514958
Note that it doesn't stop immediately.

Related

Parallelise backslash matrix inversion using @distributed

I'm solving a PDE using an implicit scheme, which I can divide into two matrices at every time step, that are then connected by a boundary condition (also at every time step). I'm trying to speed up the process by using multi-processing to invert both matrices at the same time.
Here's an example of what this looks like in a minimal (non-PDE-solving) example.
using Distributed
using LinearAlgebra
function backslash(N, T, b, exec)
A = zeros(N,N)
α = 0.1
for i in 1:N, j in 1:N
abs(i-j)<=1 && (A[i,j]+=-α)
i==j && (A[i,j]+=3*α+1)
end
A = Tridiagonal(A)
a = zeros(N, 4, T)
if exec == "parallel"
for i = 1:T
@distributed for j = 1:2
a[:, j, i] = A\b[:, i]
end
end
elseif exec == "single"
for i = 1:T
for j = 1:2
a[:, j, i] = A\b[:, i]
end
end
end
return a
end
b = rand(1000, 1000)
a_single = @time backslash(1000, 1000, b, "single");
a_parallel = @time backslash(1000, 1000, b, "parallel");
a_single == a_parallel
Here comes the problem: the last line evaluates to true, with a 6-fold speed-up; however, only a 2-fold speed-up should be possible. What am I getting wrong?
You are measuring compile time.
Your @distributed loop exits prematurely.
Your @distributed loop does not collect the results.
Hence you get:
julia> addprocs(2);
julia> sum(backslash(1000, 1000, b, "single")), sum(backslash(1000, 1000, b, "parallel"))
(999810.3418359067, 0.0)
So in order to make your code work you need to collect the data from the distributed loop which can be done as:
function backslash2(N, T, b, exec)
A = zeros(N,N)
α = 0.1
for i in 1:N, j in 1:N
abs(i-j)<=1 && (A[i,j]+=-α)
i==j && (A[i,j]+=3*α+1)
end
A = Tridiagonal(A)
a = zeros(N, 4, T)
if exec == :parallel
for i = 1:T
aj = @distributed (append!) for j = 1:2
[A\b[:, i]]
end
# you could consider using SharedArrays instead
a[:, 1, i] .= aj[1]
a[:, 2, i] .= aj[2]
end
elseif exec == :single
for i = 1:T
for j = 1:2
a[:, j, i] = A\b[:, i]
end
end
end
return a
end
Now you have equal results:
julia> sum(backslash2(1000, 1000, b, :single)) == sum(backslash2(1000, 1000, b, :parallel))
true
However, the distributed code is very inefficient for loops that take only a few milliseconds to execute. In this example the @distributed version will take many times longer, because it dispatches a distributed job 1000 times and each dispatch costs something like a few milliseconds.
Perhaps your production task takes longer, so that it then makes sense. Or maybe you will consider Threads.@threads instead.
Last but not least, BLAS might be configured to be multi-threaded, in which case parallelizing on a single machine might make no sense (it depends on the scenario).

Execution time julia program to count primes

I am experimenting a bit with Julia, since I've heard that it is suitable for scientific computing and its syntax is reminiscent of Python. I tried to write and execute a program to count the prime numbers below a certain n, but the performance is not what I hoped for.
Here I post my code, with the disclaimer that I've literally started yesterday in julia programming and I am almost sure that something is wrong:
n = 250000
counter = 0
function countPrime(counter)
for i = 1:n
# print("begin counter= ", counter, "\n")
isPrime = true
# print("i= ", i, "\n")
for j = 2:(i-1)
if (i%j) == 0
isPrime = false
# print("j= ", j, "\n")
break
end
end
(isPrime==true) ? counter += 1 : counter
# print("Counter= ", counter, "\n")
end
return counter
end
println(countPrime(counter))
The fact is that the same program ported to C takes about 5 seconds to run, while this one in Julia takes about 3 minutes and 50 seconds, which sounds odd to me since I thought that Julia is a compiled language. What's happening?
Here is how I would change it:
function countPrime(n)
counter = 0
for i in 1:n
isPrime = true
for j in 2:i-1
if i % j == 0
isPrime = false
break
end
end
isPrime && (counter += 1)
end
return counter
end
This code runs in about 5 seconds on my laptop. Apart from stylistic changes, the major change is that you should pass n as a parameter to your function and define the counter variable inside the function.
These changes follow one of the first pieces of advice in the Performance Tips section of the Julia Manual.
The point is that when you use a global variable the Julia compiler is not able to make assumptions about the type of this variable (as it might change after the function was compiled), so it defensively assumes that it might be anything, which slows things down.
As for stylistic changes note that (isPrime==true) ? counter += 1 : counter can be written just as isPrime && (counter += 1) as you want to increment the counter if isPrime is true. Using the ternary operator ? : is not needed here.
To give a MWE of a problem with using global variables in functions:
julia> x = 10
10
julia> f() = x
f (generic function with 1 method)
julia> @code_warntype f()
MethodInstance for f()
from f() in Main at REPL[2]:1
Arguments
#self#::Core.Const(f)
Body::Any
1 ─ return Main.x
You can see that inside the function f you refer to the global variable x. Therefore, when Julia compiles f it must assume that the value of x can have any type (which in Julia is called Any). Working with such values is slow, as the compiler cannot apply optimizations that rely on a more specific type of the value being processed.

How to compare 1 number to an entire array?

Create a function that takes an array of hurdle heights and a jumper's jump height, and determine whether or not the hurdler can clear all the hurdles. A hurdler can clear a hurdle if their jump height is greater than or equal to the hurdle height.
My code:
def hj (arr, h)
i = 0
while i < arr.length
j = 0
while j < arr.length
if arr[i] > h
return false
end
j += 1
end
return true
i += 1
end
end
puts hj([2, 3, 6, 1, 3, 1, 8], 7)
Desired output: true if h is >= to any number in the array; false if h is < any number in the array (I want true or false to display once)
Where I'm questioning my own code:
Not sure if I need two while statements
the current array being passed should output false
loop seems to only be comparing the first set of numbers, so 7 and 2. Not sure why the loop is stopping.
Not sure if I'm utilizing true and false correctly
Feel like I should be using a block for this, but not sure where to implement it.
Thank you in advance for any feedback.
Some solutions:
Using loop
def hj(arr, h)
for elem in arr
return false if elem > h
end
true
end
See? Only one loop. Actually, this is the least Ruby-like implementation.
Using Enumerable#all?
def hj(arr, h)
arr.all?{|elem| elem <= h}
end
This is the most intuitive and most Ruby implementation.
Using Enumerable#max
If one can jump over the highest hurdle, he can jump over all hurdles.
def hj(arr, h)
arr.max <= h
end
Not sure if I need two while statements
You don't. You only need to traverse the list once. You're traversing, not sorting / reordering.
loop seems to only be comparing the first set of numbers, so 7 and 2. Not sure why the loop is stopping.
This is because you are using return true as the second last statement of your outer loop. Return interrupts function execution and returns immediately to the calling function - in this case, the last line of your program.
Feel like I should be using a block for this, but not sure where to implement it.
A block is the idiomatic Ruby way to solve this. You are, essentially, wanting to check that your second parameter is greater than or equal to every value in the list which is your first parameter.
A solution in idiomatic ruby would be
def hj(arr, h)
  # return true if h >= all elements in arr, false otherwise
  # given arr = [1, 2, 3] and h = 2,
  # the block yields true, true, false for the three elements,
  # so all? returns false
  # (all? returns the boolean AND of the results of the block evaluation)
  arr.all? { |elem| elem <= h }
end
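As a quick check that the approaches discussed above agree, here is a minimal sketch that runs a loop version, an all? version and a max version side by side. The method names are made up for this illustration, and the empty-array guard in the max version is an addition that the answers above do not include.

```ruby
# Three equivalent ways to check that a jumper of height h clears every hurdle.
# The method names are hypothetical, chosen only for this comparison.

def clears_with_loop?(hurdles, h)
  # `return` inside the block returns from the enclosing method.
  hurdles.each { |hurdle| return false if hurdle > h }
  true
end

def clears_with_all?(hurdles, h)
  hurdles.all? { |hurdle| hurdle <= h }
end

def clears_with_max?(hurdles, h)
  # Added guard: [].max is nil, and nil <= h would raise NoMethodError.
  hurdles.empty? || hurdles.max <= h
end

hurdles = [2, 3, 6, 1, 3, 1, 8]
[7, 8].each do |h|
  results = [
    clears_with_loop?(hurdles, h),
    clears_with_all?(hurdles, h),
    clears_with_max?(hurdles, h)
  ]
  puts "h=#{h}: #{results.inspect}"
end
```

With the array from the question, a jump height of 7 fails in all three versions because of the hurdle of height 8, while a jump height of 8 passes in all three.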

Is each slower than while in Ruby?

The following two functions, which check if a number is prime:
def prime1?(prime_candidate)
return true if [1, 2].include? prime_candidate
range = 2.upto(Math.sqrt(prime_candidate).ceil).to_a
i = 0
while i < range.count
return false if prime_candidate % range[i] == 0
range = range.reject { |j| j % range[i] == 0 }
end
true
end
def prime2?(prime_candidate)
return true if [1, 2].include? prime_candidate
range = 2.upto(Math.sqrt(prime_candidate).ceil).to_a
range.each do |i|
return false if prime_candidate % i == 0
range = range.reject { |j| j % i == 0 }
end
true
end
yield the following benchmarking result when testing with a very large prime (5915587277):
user system total real
prime1: 2.500000 0.010000 2.510000 ( 2.499585)
prime2: 20.700000 0.030000 20.730000 ( 20.717267)
Why is that? Is it because in the second function range does not get modified by the reject, so the each is iterating over the original long range?
When you do range = range.reject {..}, you don't modify the original array (which you shouldn't do, because it would mess up the iteration; you would need reject! for that). Instead you build a new array and rebind the range variable to it.
The each iteration in prime2? still runs over the whole original array, not the shortened one, which is only reachable through the range variable inside the block.
The while version modifies the original array and is therefore quicker (BTW, you realize that i remains zero and it's the range.count that changes (decreases) in that while condition and that reject iterates over the WHOLE array again--even the beginning where there couldn't possibly be any more nonprimes to reject).
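The difference can be seen in a small sketch. The helper name walk_while_rebinding is made up for this illustration; it mimics the body of prime2? on a short array and records which elements each actually visits.

```ruby
# Hypothetical helper: iterate with `each` while rebinding the local variable
# to a filtered copy, as prime2? does, and record what was visited.
def walk_while_rebinding(values)
  visited = []
  range = values
  range.each do |i|
    visited << i
    # Rebinds the variable `range` to a new, shorter array;
    # `each` keeps walking the original array object.
    range = range.reject { |j| j % i == 0 }
  end
  [visited, range]
end

visited, remaining = walk_while_rebinding([2, 3, 4, 5, 6])
puts "visited:   #{visited.inspect}"
puts "remaining: #{remaining.inspect}"
```

Every element of the original array is visited, even though the array referenced by range shrinks on each pass.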
You'll get a much faster result if you improve the logic of your code. That array manipulation is costly and for this, you don't even need an array:
def prime3?(prime_candidate)
return false if prime_candidate==1
return true if prime_candidate==2
range = 2..(Math.sqrt(prime_candidate).ceil)
range.all? {|x| prime_candidate % x !=0 }
end # about 300 times as fast as prime1? for your example on my PC

Max and Min Value ... Need method to return two variable values

I am new to coding and need help understanding what is wrong with my logic and or syntax in the following method... The program is supposed to return the max and min values of an array. My goal was to have two variables (max and min) outside of the method, so that as the method ran through the array the values would get replaced accordingly. thank you for your help...
list=[4,6,10,7,1,2]
max=list[0]
min=list[0]
def maxmin(list)
f=list.shift
if list.empty?then
return max = f
return min = f
end
t=maxmin(list)
if(f>t) then
return max = f
return min = t
else
return max = t
return min = f
end
end
printf("max=#{max}, min=#{min}, method return=%d\n", maxmin(list))
As of Ruby 1.9.1, there's minmax:
>> list=[4,6,10,7,1,2]
=> [4, 6, 10, 7, 1, 2]
>> list.minmax
=> [1, 10]
Max and min methods are already in Ruby's core library:
Enumerable#max
Enumerable#min
So use them:
list.max
list.min
Edit: is your question just about returning two variable? If so, just separate them with a comma, and they will be returned as an array:
return min_value, max_value
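A minimal sketch of how the caller can pick up such a return value, either by destructuring it or by keeping it as one array. The method name min_and_max is made up for this example.

```ruby
# Hypothetical method returning two values at once.
def min_and_max(list)
  return list.min, list.max # Ruby packs the two values into an array
end

# Destructure the returned array into two variables...
min_value, max_value = min_and_max([4, 6, 10, 7, 1, 2])
puts "min=#{min_value}, max=#{max_value}"

# ...or keep it as a single array.
pair = min_and_max([4, 6, 10, 7, 1, 2])
puts pair.inspect
```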
To add to what has already been written (yes, use the built-in libraries), it is generally a bad idea to modify variables outside of the method they are being used in. Note that in framework call, new values are being returned. This lets the person calling the method decide what to do with those values, rather than this method assuming that the variables exist and then changing them all the time.
If I had to write it (and I'm new to Ruby so I might not be doing this as elegantly as possible, but it should be easy to follow), I would write it something like this:
def find_min_max(list)
if (list.nil? || list.count == 0)
return nil, nil
end
min = list.first
max = list.first
list.each do |item|
if item.nil?
next
elsif item < min
min = item
elsif item > max
max = item
end
end
return min, max
end
list = [1, 439, 2903, 23]
min_max = find_min_max list
p min_max
I would agree with the other answers given. There is no point writing this method, other than as a programming exercise.
There are a few problems with the logic which might explain why you are not getting the result you expect.
Firstly there are 3 pairs of return statements, the second of which would never be called because the method has already returned, e.g.
return max = f
return min = f # never gets called
You need to return both the min and max values to make the recursive algorithm work so I guess you need to return a pair of values or an array in a single return statement.
Secondly, your min and max variables initialized at the top of the script are not in scope within the maxmin method body, so you are actually defining new local variables there.
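The scoping rule can be checked with a short sketch. The method names below are made up for this illustration; the rescue is only there so the script keeps running and shows the error.

```ruby
max = 0 # top-level local variable

# Hypothetical method that tries to read the outer local.
def try_to_read_outer_local
  max # not visible here: `def` starts a fresh local scope
rescue NameError => e
  "NameError: #{e.message}"
end

# Hypothetical method that assigns to a variable of the same name.
def assign_inside
  max = 99 # creates a new local, unrelated to the outer `max`
  max
end

puts try_to_read_outer_local
puts assign_inside
puts max # the outer variable is untouched
```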
If you adapt your code you might end up with something like this, but this is not a method you need to write in production and I'm sure there are better ways to do it:
list = [4,6,10,7,1,2]
def maxmin(list)
f = list.shift
if list.empty?
return f, f
end
max, min = maxmin(list)
return f > max ? f : max, f < min ? f : min
end
max, min = maxmin(list)
puts "min = #{min}, max = #{max}"
The problem is that you try to use global variables (which in Ruby are written like this: $max, $min), but you expect your code to assign values to them, which it never does. Prefer local variables over global ones when possible, to limit accessibility.
The second problem is that if you use global variables, you don't have to return anything. For example:
$value = 0
def test
  $value = 1
end
puts $value # => 0
test # changes $value to 1
# it also returns 1 because Ruby returns the value of the last statement
puts $value # => 1
If you use local variables, you should return multiple results.
That's where Ruby does the job: it automatically converts multiple return values into an array, and an array into a multiple-variable assignment:
list = [4,6,10,7,1,2]
def maxmin(list)
f = list.shift
if list.empty? then
return f, f # f is the minimum and the maximum of a list of one element
end
mi, ma = maxmin(list)
if (f > ma) then
ma = f
elsif (f < mi)
mi = f
end
return mi, ma
end
min, max = maxmin(list)
printf("max=#{max}, min=#{min}")
Your way of doing it is pretty fun (I love recursion), but it's not really elegant, the performance is not very good, and it's a bit confusing, which is far from the Ruby vision.
list = [4,6,10,7,1,2]
def minmax(list)
max = list[0]
min = list[0]
list.each do |elem|
if elem > max then
max = elem
elsif elem < min
min = elem
end
end
return min, max
end
min, max = minmax(list)
printf("max=#{max}, min=#{min}")
This is a clearer version of your code, even if less cool. You can try these answers with global variables; it should be easy.
Of course, in keeping with the Ruby vision, once you're done with that you're welcome to use Array#max and Array#min.
