Confusing scoping rules under @distributed macro in Julia - parallel-processing

I am trying to parallelize some code in Julia but ran into a weird scoping issue. I do not understand the scoping rules when passing a local variable to a function in a @distributed for loop. The following code behaves as expected:
using Distributed
addprocs(4)
@everywhere function k(x)
    println("x = ", x)
    return x
end
sumk = 0
sumk += @distributed (+) for i in 2:nprocs()
    k(myid())
end
println("sumk = ", sumk)
Running this code gives
From worker 4: x = 4
From worker 2: x = 2
From worker 3: x = 3
From worker 5: x = 5
14
sumk = 14
Now, I modify the code a little bit to
using Distributed
addprocs(4)
@everywhere function k(x)
    println("x = ", x)
    return x
end
@everywhere x = myid()
sumk = 0
sumk += @distributed (+) for i in 2:nprocs()
    k(x)
end
println("sumk = ", sumk)
which gives the following result upon execution:
From worker 2: x = 1
From worker 4: x = 1
From worker 5: x = 1
From worker 3: x = 1
4
sumk = 4
Here, I do not understand why myid() is evaluated on each worker while x is taken from process 1 only.
Thank you for your help.

It looks like the right-hand side of the @everywhere macro is evaluated locally.
You could do:
remote_do.( Ref(()->global x = myid()), workers())
And now:
@distributed (+) for i in 2:nprocs()
    k(x)
end
From worker 3: x = 3
From worker 2: x = 2
From worker 5: x = 5
From worker 4: x = 4
14
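To check which value of x each worker actually holds, one option is to read the global by name rather than mentioning x in code that is sent from process 1. The snippet below is only a sketch and not part of the original answer: the two local workers and the name worker_x are assumptions made for illustration.

```julia
using Distributed
addprocs(2)                    # assumption: two local workers, for illustration only

@everywhere x = myid()         # every process evaluates myid() for itself

# Look up each worker's own global `x` by name, so that no closure
# mentioning `x` has to be serialized from process 1.
worker_x = Dict(w => remotecall_fetch(getfield, w, Main, :x) for w in workers())
println(worker_x)
```

Each entry of worker_x maps a worker id to the value of x stored on that worker.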

Related

Julia 1.0 UndefVarError - Scope of Variable

I am moving from Julia 0.7 to 1.0. It seems that Julia's rule for the scope of variables changed from 0.7 to 1.0. For example, I want to run a simple loop like this:
num = 0
for i = 1:5
    if i == 3
        num = num + 1
    end
end
print(num)
In Julia 0.7 (and in most other languages), we would expect num = 1 after the loop. In Julia 1.0, however, it throws UndefVarError: num not defined. I know that by using let I can do this:
let
    num = 0
    for i = 1:5
        if i == 3
            num = num + 1
        end
    end
    print(num)
end
It will print out 1. But I want num = 1 to be available outside both the loop and the let block. Some answers suggest putting all the code in a let block, but that causes other problems, including UndefVarError when testing line by line. Is there any way to do this without a let block? Thanks!
This is discussed here.
Add global as shown below inside the loop for the num variable.
num = 0
for i = 1:5
    if i == 3
        global num = num + 1
    end
end
print(num)
Running in the Julia 1.0.0 REPL:
julia> num = 0
0
julia> for i = 1:5
           if i == 3
               global num = num + 1
           end
       end
julia> print(num)
1
Edit
For anyone coming here new to Julia, the excellent comment made by vasja in the answer below should be noted:
Just remember that inside a function you won't use global, since the scope rules inside a function are as you would expect:
See that answer for a good example of using a function for the same code without the scoping problem.
Just remember that inside a function you won't use global, since the scope rules inside a function are as you would expect:
function testscope()
    num = 0
    for i = 1:5
        if i == 3
            num = num + 1
        end
    end
    return num
end
julia> t = testscope()
1
The unexpected behaviour occurs only at global (top-level) scope, such as in the REPL.
More on this here
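If the result is still needed as a global after the loop, one way to avoid both let and global inside the loop is to do the work in a function and assign its return value at top level. This is a small sketch in the spirit of the answer above; the helper name count_matches and its arguments are hypothetical.

```julia
# Hypothetical helper: counts how many values in `range` equal `target`.
function count_matches(range, target)
    n = 0
    for i in range
        if i == target
            n = n + 1      # `n` is local to the function, so no `global` is needed
        end
    end
    return n
end

num = count_matches(1:5, 3)   # assignment happens at top level, outside any loop
print(num)
```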

Use different arrays for each worker instead of SharedArrays in Julia

I have a function like this:
@everywhere function bellman_operator!(rbc::RBC)
    ...
    @sync @parallel for i = 1:m
        ....
        for j = 1:n
            vmax = -1000.0
            ...
            for l = Next : n
                ......
                if v > vmax
                    vmax = v
                    Next = l
                else
                    break
                end
            end
            f_v[j, i] = vmax
            f_p[j, i] = k
        end
    end
end
f_v and f_p are SharedArrays. I want each worker to write its results to its own array. I have seen some examples but could not get them to work. How can I use a separate array for each worker's results and combine the results at the end, instead of using SharedArrays?
Is this what you want?
Example 1. Combining results using +:
a = @parallel (+) for i in 1:1000
    rand(10, 10)
end
Example 2. Just collecting the results without combining them:
x = Future[]
for i in 1:1000
    push!(x, @spawn rand(10,10))
end
y = fetch.(x)
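Example 3 (a sketch, not from the original answer). For the two-matrix case in the question, one possible pattern is to let each task return the column it computed and assemble the matrices on the calling process. This assumes Julia 1.x with the Distributed standard library; solve_column is a hypothetical stand-in for the per-column work, and the sizes n and m are made up.

```julia
using Distributed
addprocs(2)                      # assumption: two local workers, for illustration only

# Hypothetical stand-in for the work done for column i in the question's loop.
@everywhere function solve_column(i, n)
    v = [Float64(i * j) for j in 1:n]   # values that would go into f_v[:, i]
    p = [j for j in 1:n]                # values that would go into f_p[:, i]
    return v, p
end

n, m = 3, 4
# One (v, p) tuple per column, returned in the order of 1:m.
results = pmap(solve_column, 1:m, fill(n, m))

# Combine the per-column results into ordinary (non-shared) matrices.
f_v = reduce(hcat, first.(results))
f_p = reduce(hcat, last.(results))
```

Because every task only returns values, no array has to be shared between processes; the cost is that the results are copied back to the caller.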

Perform index-wise matrix operation in julia

I want to perform index-wise operation on a matrix. I know that you can write a regular function and perform it on each entry of the matrix e.g.
function foo(x::Int64)
    return x * 2
end
myArray = [1 2 3; 4 5 6]
foo.(myArray)
How would I go about doing something like x * x.elementCol + x.elementrow? Essentially, I want the following code in parallel:
function goo(x::Array{Int64,2})
    for j = 1:size(x,2)
        for i = 1:size(x,1)
            x[i,j] = (x[i,j] * j) + i
        end
    end
    return x
end
You can write:
x .= x .* indices(x, 2)' .+ indices(x, 1)
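In Julia 1.x the indices function is no longer available and axes takes its place. The following sketch assumes Julia 1.x and shows the same broadcast with axes, together with an equivalent comprehension over CartesianIndices; the names y0 and y are only for illustration.

```julia
using LinearAlgebra   # loaded explicitly so the adjoint (') is certainly available

x = [1 2 3; 4 5 6]

# axes(x, 2)' is a row of column indices, axes(x, 1) acts as a column of row indices.
x .= x .* axes(x, 2)' .+ axes(x, 1)

# Equivalent comprehension, written against a fresh copy of the input.
y0 = [1 2 3; 4 5 6]
y = [y0[I] * I[2] + I[1] for I in CartesianIndices(y0)]
```

Both forms compute x[i,j] * j + i for every entry, matching the loop in goo.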

Why is my Julia shared array code running so slow?

I'm trying to implement Smith-Waterman alignment in parallel using Julia (see: Figure 1 of http://www.cs.virginia.edu/~rl6sf/paper_dump/2011:12:33:22.pdf), but the parallel version runs much slower than the serial version. I'm using shared arrays and figure I am doing something silly that makes the code slow. Could someone take a look and see whether my code is as optimized as possible? The parallel version should run faster than the serial one.
The basic concept is to compute the anti-diagonal elements of a matrix in parallel, from the upper left to the lower right corner, and to update them. I'm trying to use 32 cores on a shared-memory machine to do this. I have a SharedArray matrix and compute the elements of each anti-diagonal in parallel as shown below. The while loops in the spSW function submit tasks to the workers, synchronized per anti-diagonal, using the helper function shared_get_score!(). The main goal of this function is to fill in each element of the shared arrays "matrix" and "path".
function spSW(seq1,seq2,p)
    indel = -1
    match = 2
    seq1 = "^$seq1"
    seq2 = "^$seq2"
    col = length(seq1)
    row = length(seq2)
    wl = workers()
    matrix,path = shared_initialize_path(seq1,seq2)
    for j = 2:col
        jcol = j
        irow = 2
        @sync begin
            count = 0
            while jcol > 1 && irow < row + 1
                #println(j," ",irow," ",jcol)
                if seq1[jcol] == seq2[irow]
                    equal = true
                else
                    equal = false
                end
                w = wl[(count % p) + 1]
                @async remotecall_wait(w,shared_get_score!,matrix,path,equal,indel,match,irow,jcol)
                jcol -= 1
                irow += 1
                count += 1
            end
        end
    end
    for i = 3:row
        jcol = col
        irow = i
        @sync begin
            count = 0
            while irow < row+1 && jcol > 1
                #println(j," ",irow," ",jcol)
                if seq1[jcol] == seq2[irow]
                    equal = true
                else
                    equal = false
                end
                w = wl[(count % p) + 1]
                @async remotecall_wait(w,shared_get_score!,matrix,path,equal,indel,match,irow,jcol)
                jcol -= 1
                irow += 1
                count += 1
            end
        end
    end
    return matrix,path
end
The other helper functions are:
function shared_initialize_path(seq1,seq2)
    col = length(seq1)
    row = length(seq2)
    matrix = convert(SharedArray,fill(0,(row,col)))
    path = convert(SharedArray,fill(0,(row,col)))
    return matrix,path
end
@everywhere function shared_get_score!(matrix,path,equal,indel,match,i,j)
    pathvalscode = ["-","|","M"]
    pathvals = [1,2,3]
    scores = []
    push!(scores,matrix[i,j-1]+indel)
    push!(scores,matrix[i-1,j]+indel)
    if equal
        push!(scores,matrix[i-1,j-1]+match)
    else
        push!(scores,matrix[i-1,j-1]+indel)
    end
    val,ind = findmax(scores)
    if val < 0
        matrix[i,j] = 0
    else
        matrix[i,j] = val
    end
    path[i,j] = pathvals[ind]
end
Does anyone see an obvious way to make this run faster? Right now it's about 10 times slower than the serial version.
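For reference, the anti-diagonal traversal that the two loops in spSW perform can be written down serially as a list of cells per anti-diagonal. This is only an illustrative sketch of the traversal order, not a faster implementation; the function name antidiagonals is hypothetical.

```julia
# Cells (irow, jcol) with 2 <= irow <= row and 2 <= jcol <= col, grouped by
# anti-diagonal. All cells within one group are independent of each other.
function antidiagonals(row, col)
    diags = Vector{Vector{Tuple{Int,Int}}}()
    for s in 4:(row + col)                  # irow + jcol is constant on an anti-diagonal
        cells = [(i, s - i) for i in max(2, s - col):min(row, s - 2)]
        push!(diags, cells)
    end
    return diags
end

for cells in antidiagonals(3, 3)
    println(cells)
end
```

Each inner vector corresponds to one @sync block in spSW, which makes it easy to see how few cells the early and late anti-diagonals contain compared with the number of remote calls issued.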

How to write a parallel loop in julia?

I have the following Julia code and I would like to parallelize it.
using DistributedArrays
function f(x)
    return x^2;
end
y = DArray[]
@parallel for i in 1:100
    y[i] = f(i)
end
println(y)
The output is DistributedArrays.DArray[]. I would like to have the value of y as follows: y=[1,4,9,16,...,10000]
You can use n-dimensional distributed array comprehensions:
First you need to add some more processes, either local or remote:
julia> addprocs(CPU_CORES - 1);
Then you must use DistributedArrays at every one of the spawned processes:
julia> @everywhere using DistributedArrays
Finally you can use the @DArray macro, like this:
julia> x = @DArray [@show x^2 for x = 1:10];
From worker 2: x ^ 2 = 1
From worker 2: x ^ 2 = 4
From worker 4: x ^ 2 = 64
From worker 2: x ^ 2 = 9
From worker 4: x ^ 2 = 81
From worker 4: x ^ 2 = 100
From worker 3: x ^ 2 = 16
From worker 3: x ^ 2 = 25
From worker 3: x ^ 2 = 36
From worker 3: x ^ 2 = 49
You can see it does what you expect:
julia> x
10-element DistributedArrays.DArray{Int64,1,Array{Int64,1}}:
1
4
9
16
25
36
49
64
81
100
Remember it works with an arbitrary number of dimensions:
julia> y = @DArray [@show i + j for i = 1:3, j = 4:6];
From worker 4: i + j = 7
From worker 4: i + j = 8
From worker 4: i + j = 9
From worker 2: i + j = 5
From worker 2: i + j = 6
From worker 2: i + j = 7
From worker 3: i + j = 6
From worker 3: i + j = 7
From worker 3: i + j = 8
julia> y
3x3 DistributedArrays.DArray{Int64,2,Array{Int64,2}}:
5 6 7
6 7 8
7 8 9
julia>
This is the most julian way to do what you intended IMHO.
We can look at macroexpand output in order to see what's going on:
Note: this output has been slightly edited for readability, T stands for:
DistributedArrays.Tuple{DistributedArrays.Vararg{DistributedArrays.UnitRange{DistributedArrays.Int}}}
julia> macroexpand(:(@DArray [i^2 for i = 1:10]))
:(
DistributedArrays.DArray(
(
#231#I::T -> begin
[i ^ 2 for i = (1:10)[#231#I[1]]]
end
),
DistributedArrays.tuple(DistributedArrays.length(1:10))
)
)
Which basically is the same as manually typing:
julia> n = 10; dims = (n,);
julia> DArray(x -> [i^2 for i = (1:n)[x[1]]], dims)
10-element DistributedArrays.DArray{Any,1,Array{Any,1}}:
1
4
9
16
25
36
49
64
81
100
julia>
Hi Kira,
I am new to Julia, but I am facing the same problem. Try this approach and see if it fits your needs.
function f(x)
    return x^2;
end
y = @parallel vcat for i = 1:100
    f(i);
end;
println(y)
Regards, RN
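On Julia 1.x, where @parallel has been renamed to @distributed and lives in the Distributed standard library, the same idea might look like the sketch below. The two local workers are an assumption for illustration, and pmap is shown as an alternative that also returns the results in order.

```julia
using Distributed
addprocs(2)                 # assumption: two local workers, for illustration only

@everywhere f(x) = x^2      # the function must be defined on every process

# Reduce the per-iteration results with vcat into one ordinary vector.
y = @distributed (vcat) for i in 1:100
    f(i)
end

# Alternative: pmap applies f to each element and collects the results in order.
z = pmap(f, 1:100)

println(y[1:4])
```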
