Julia: parallel loops involving a function that calls another function - parallel-processing

I have two files.
test_file.jl:
using Distributed

function inner_function(v)
    v_2 = 2 * v
    return v_2
end

function loop(N, v)
    @sync @distributed for i in 1:N
        v_3 = inner_function(v)
        v_3[1] = i
        println(i)
        println(v_3)
    end
end
test_file_call.jl:
@everywhere include("test_file.jl")
v = [1 2 3 4]
loop(100,v_2)
When I run julia -p 2 test_file_call.jl, I get an error saying that
ERROR: LoadError: UndefVarError: v_2 not defined
I'm not sure why. v_2 is a variable created inside a function, and I've already used @everywhere include("test_file.jl"), so Julia shouldn't say that the variable is undefined. Can I get a hint? Thank you!

You use v_2 in the loop(100, v_2) call, so Julia looks for v_2 in the global scope and does not find it there. You probably wanted to write loop(100, v), since v is the variable you define.
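For reference, a corrected test_file_call.jl would look like this (same files as above, only the call changed):

@everywhere include("test_file.jl")
v = [1 2 3 4]
loop(100, v)   # pass v; v_2 exists only inside inner_function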

Related

Run function in same module and file in different process

If I have the following Julia code snippet, is there any way I can run the for loop on multiple processes without putting complicated into an extra file and doing something like @everywhere include("complicated.jl")? Otherwise, the worker processes don't seem to be able to find the function.
function complicated(x)
    # long and complicated computation
    x^2
end

function run()
    results = []
    for i in 1:4
        push!(results, @spawn complicated(3))
    end
    return mean(results)
end
Just annotate the expression you want defined on all processes with the @everywhere macro (in Julia everything is an expression):
julia> addprocs(Sys.CPU_CORES)
12-element Array{Int64,1}:
2
3
4
5
6
7
8
9
10
11
12
13
julia> @everywhere function complicated(x)
           # long and complicated computation
           x^2
       end

julia> function main()
           @sync results = @parallel vcat for i in 1:4
               complicated(3)
           end
           return mean(results)
       end
main (generic function with 1 method)

julia> main()
9.0
Note: run is an already existing function in Base, which is why the function above is named main instead.
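This answer uses pre-0.7 syntax. On Julia 1.x the same pattern still works after a few renames; a sketch under that assumption:

using Distributed, Statistics

addprocs(Sys.CPU_THREADS)   # Sys.CPU_CORES was renamed Sys.CPU_THREADS

@everywhere function complicated(x)
    # long and complicated computation
    x^2
end

function main()
    # @parallel became @distributed; the reducer goes in parentheses
    results = @distributed (vcat) for i in 1:4
        complicated(3)
    end
    return mean(results)   # mean now lives in Statistics
end

main()   # 9.0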

Data Movement on Julia (Parallel)

We have a function in count_heads.jl:
function count_hands(n)
    c::Int = 0
    for i = 1:n
        c += rand(Bool)
    end
    c
end
We run Julia as ./julia -p 2. We want to calculate a and b on different processes, so we have:
julia> @everywhere include("count_hands.jl")

julia> a = @spawn count_hands(1000000000)

julia> b = @spawn count_hands(1000000000)

julia> fetch(a) + fetch(b)
1: How can we be sure that a and b are calculated on different processes? I know we can use @spawnat instead of @spawn and choose the process number, but I saw this code and I want to know how we can be sure about that.
2: Suppose it is correct and both are computed on different processes: count_hands(1000000000) is calculated on a different process for each of a and b, and then the results are added together on process 1. Is this right?
How can we be sure that a and b are calculated on different processes?
You can't, unless you use @spawnat n and ensure that nprocs() is greater than or equal to n and that the ns are different.
Is this right?
Yes, assuming that you've used @spawnat 1 for a. You can test this by rewriting your function as follows:
julia> @everywhere function count_hands(n)
           println("this process is $(myid())")
           c::Int = 0
           for i = 1:n
               c += rand(Bool)
           end
           c
       end

julia> a = @spawnat 1 count_hands(1000)
this process is 1
Future(1, 1, 11, Nullable{Any}())

julia> b = @spawnat 2 count_hands(1000)
Future(2, 1, 12, Nullable{Any}())

julia>  From worker 2:  this process is 2

julia> fetch(a) + fetch(b)
1021
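If you stay with plain @spawn and just want to see where each task landed, a small check (my addition, not part of the original answer) is to return myid() alongside the result:

a = @spawn (myid(), count_hands(1000))
b = @spawn (myid(), count_hands(1000))
pid_a, result_a = fetch(a)
pid_b, result_b = fetch(b)
println((pid_a, pid_b))   # with julia -p 2, @spawn typically hands tasks to workers 2 and 3 in turn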

JuMP - use variable defined in sum range

I am trying to define a constraint containing a summation over two indices, k and t.
for i in data.I
    for j in 1:length(data.P[i])
        @constraint(m, w[i, j, length(data.T[data.P[i][j]])] / (1 + sum(data.A[i][k][t] for k in 1:length(data.P[i]), t in data.T[data.P[i][k]])) <= s[i, j])
    end
end
I get the following error when running the code:
ERROR: LoadError: UndefVarError: k not defined
I have implemented the same model in OPL for CPLEX in the same way, and this was not an issue. Am I not allowed to introduce such a variable as an index in the summation and then use it as an index into an array within the same sum(), as I am trying to do above?
This is a question of Julia syntax: in a comma-separated generator the ranges are independent, so a later range cannot refer to an earlier index. Nesting the for clauses makes the dependence legal:
julia> sum(i+j for i in 1:3, j in 1:i)
ERROR: UndefVarError: i not defined

julia> sum(i+j for i in 1:3 for j in 1:i)
24
The same holds for JuMP.
My colleague found a workaround: converting the sum into the equivalent double sum made it work, i.e.
sum(data.A[i][k][t] for k = 1:length(data.P[i]), t = data.T[data.P[i][k]])
was changed to:
sum(sum(data.A[i][k][t] for t = data.T[data.P[i][k]]) for k = 1:length(data.P[i]))
This solves the issue.
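Both fixes can be seen in a self-contained toy model (illustrative names, not the asker's data):

using JuMP

m = Model()
K = 3
@variable(m, x[1:K, 1:K] >= 0)

# nested generator: the inner range may depend on the outer index k
@constraint(m, sum(x[k, t] for k in 1:K for t in 1:k) <= 1)

# equivalent double sum, as in the workaround above
@constraint(m, sum(sum(x[k, t] for t in 1:k) for k in 1:K) <= 1)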

Julia - partially shared arrays with @parallel

I want to alter a shared array owned by only some of my processes:
julia> addprocs(4)
4-element Array{Int64,1}:
2
3
4
5
julia> s = SharedArray(Int, (100,), pids=[2,3]);
julia> for i in procs() println(remotecall_fetch(localindexes, i, s)) end
1:0
1:50
51:100
1:0
1:0
This works, but I want to be able to parallelize the loop:
julia> for i=1:100 s[i] = i end
This results in processes 4 and 5 terminating with a segfault:
julia> @parallel for i=1:100 s[i] = i end
Question: why does this terminate the processes rather than throw an exception, or split the loop only among the processes that share the array?
I expected this to work instead, but it does not fill the entire array:
julia> @parallel for i in localindexes(s) s[i] = i end
Each process updates the part of the array that is local to it, and since the array is shared, changes made by one process should be visible to all processes. Why is some of the array still unchanged?
Replacing @parallel with @everywhere gives the error that s is not defined on process 2. How can a process that owns part of the array be unaware of it?
I am so confused. What is the best way to parallelize this loop?
Will this do the trick?
@sync begin
    for i in procs(s) # <-- note the loop over the process IDs of the SharedArray s!
        @async @spawnat i setindex!(s, i, localindexes(s))
    end
end
You may run into issues if the master process appears in this list; in that case, you can try building your own function modeled on the pmap example.
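The same idea spelled out with an explicit inner loop (my elaboration of the answer above, same assumptions):

@sync for p in procs(s)            # only the pids that actually map s, here [2, 3]
    @async @spawnat p begin
        for i in localindexes(s)   # each worker writes only its own index range
            s[i] = i
        end
    end
end

Workers outside procs(s) (4 and 5 above) never map the shared memory segment, which is presumably why indexing s from them crashes instead of raising a tidy error.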

Julia: Accessing DArray in a function on a processor

I create a DArray:
d = dzeros(3)
Now I would like to run a function in parallel using pmap, and I would like that function to access whatever part of d is local to the current process. Something like:
function foo()
    global d
    a = localpart(d)
    a[1] = 1
end
However, I get
exception on 2: exception on 4: ERROR: d not defined
in mcmc_sub! at /home/benjamin/.julia/v0.3/Mamba/src/model/mcmc.jl:67
in anonymous at multi.jl:847
in run_work_thunk at multi.jl:613
in anonymous at task.jl:847
on each process.
d should be defined on each process. For example, code like this works:
julia> d = dzeros(3)
3-element DArray{Float64,1,Array{Float64,1}}:
0.0
0.0
0.0
julia> @spawnat(2, (a = localpart(d); a[1] = 1;))
RemoteRef(2,1,65)
julia> d
3-element DArray{Float64,1,Array{Float64,1}}:
1.0
0.0
0.0
I'm not totally sure that no copy happens, but my understanding is that you can just pass d as a parameter (it is a reference to the whole array; passing it won't move the data).
Simple example:
function foo(d, u)
    r, = myindexes(d)
    return u * 100000 + sum(d[r])
end

function main()
    d = distribute(1:100)
    show(pmap(x -> foo(d, x), 1:10))
end

# julia -p 2 -L test.jl -e "main()"
I'm not sure whether you can assign to the distributed array in this way; you probably want to create a new one piece by piece, which is what is done in the cellular automaton example.
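For reference, the piece-by-piece construction mentioned above can use the DArray constructor that maps a tuple of index ranges to the local block; a minimal sketch, assuming the constructor of this Julia era (the fill value is just an illustration):

# each worker builds only its own block; I is a tuple of index ranges
d2 = DArray((100,)) do I
    fill(myid(), map(length, I))
end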
