Run function in same module and file in different process - multiprocessing

If I have the following Julia code snippet, is there any way I can run the for loop with multiple processes without moving complicated into an extra file and doing something like @everywhere include("complicated.jl")?
Otherwise, the worker processes don't seem to be able to find the function.
function complicated(x)
    # long and complicated computation
    x^2
end

function run()
    results = []
    for i in 1:4
        push!(results, @spawn complicated(3))
    end
    return mean(results)
end

Just annotate the expression you want to define on all processes with the @everywhere macro (in Julia everything is an expression):
julia> addprocs(Sys.CPU_CORES)
12-element Array{Int64,1}:
2
3
4
5
6
7
8
9
10
11
12
13
julia> @everywhere function complicated(x)
           # long and complicated computation
           x^2
       end
julia> function main()
           @sync results = @parallel vcat for i in 1:4
               complicated(3)
           end
           return mean(results)
       end
main (generic function with 1 method)
julia> main()
9.0
Note: run is an already existing function in Base, which is why the example above defines main instead.
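For readers on Julia 1.x: @parallel was replaced by @distributed, addprocs and @everywhere moved into the Distributed standard library, and mean now lives in Statistics. A minimal sketch of the same pattern under those assumptions (not tested against any specific 1.x release):
using Distributed, Statistics
addprocs(4)

@everywhere function complicated(x)
    # long and complicated computation
    x^2
end

function main()
    # with a reducer (vcat), @distributed blocks and returns the result
    results = @distributed vcat for i in 1:4
        complicated(3)
    end
    return mean(results)
end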

Related

Execution time of a Julia program to count primes

I am experimenting a bit with Julia, since I've heard that it is suitable for scientific computing and its syntax is reminiscent of Python. I tried to write and run a program to count the prime numbers below a certain n, but the performance is not what I hoped for.
Here is my code, with the disclaimer that I literally started Julia programming yesterday and I am almost sure that something is wrong:
n = 250000
counter = 0

function countPrime(counter)
    for i = 1:n
        # print("begin counter= ", counter, "\n")
        isPrime = true
        # print("i= ", i, "\n")
        for j = 2:(i-1)
            if (i%j) == 0
                isPrime = false
                # print("j= ", j, "\n")
                break
            end
        end
        (isPrime==true) ? counter += 1 : counter
        # print("Counter= ", counter, "\n")
    end
    return counter
end

println(countPrime(counter))
The thing is that the same program ported to C takes about 5 seconds to run, while this Julia version takes about 3 minutes and 50 seconds, which seems odd to me since I thought Julia was a compiled language. What's happening?
Here is how I would change it:
function countPrime(n)
    counter = 0
    for i in 1:n
        isPrime = true
        for j in 2:i-1
            if i % j == 0
                isPrime = false
                break
            end
        end
        isPrime && (counter += 1)
    end
    return counter
end
This code runs in about 5 seconds on my laptop. Apart from stylistic changes, the major change is that you should pass n as a parameter to your function and define the counter variable inside it.
The changes follow one of the first recommendations in the Performance Tips section of the Julia Manual.
The point is that when you use a global variable, the Julia compiler cannot make any assumptions about the type of that variable (as it might change after the function was compiled), so it defensively assumes that it might be anything, which slows things down.
As for the stylistic changes, note that (isPrime==true) ? counter += 1 : counter can be written simply as isPrime && (counter += 1), since you want to increment the counter only when isPrime is true; the ternary operator ? : is not needed here.
To give an MWE of the problem with using global variables in functions:
julia> x = 10
10
julia> f() = x
f (generic function with 1 method)
julia> @code_warntype f()
MethodInstance for f()
  from f() in Main at REPL[2]:1
Arguments
  #self#::Core.Const(f)
Body::Any
1 ─ return Main.x
You can see that inside the function f you refer to the global variable x. Therefore, when Julia compiles f it must assume that the value of x can have any type (which is called Any in Julia). Working with such values is slow, as the compiler cannot use any optimizations that would take advantage of a more specific type of the processed value.
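For contrast, passing the value as a function argument lets the compiler see a concrete type. A minimal sketch of the fixed MWE (the exact @code_warntype formatting varies between Julia versions):
julia> g(x) = x
g (generic function with 1 method)

julia> @code_warntype g(10)
MethodInstance for g(::Int64)
  from g(x) in Main at REPL[3]:1
Arguments
  #self#::Core.Const(g)
  x::Int64
Body::Int64
1 ─ return x
Here the body is inferred as Int64 rather than Any, so the compiled code can work with the raw integer directly.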

Julia: parallel loops involving a function that calls another function

I have two files
test_file.jl
using Distributed

function inner_function(v)
    v_2 = 2 * v
    return v_2
end

function loop(N, v)
    @sync @distributed for i in 1:N
        v_3 = inner_function(v)
        v_3[1] = i
        println(i)
        println(v_3)
    end
end
test_file_call.jl
@everywhere include("test_file.jl")
v = [1 2 3 4]
loop(100,v_2)
When I run julia -p 2 test_file_call.jl, I get an error saying that
ERROR: LoadError: UndefVarError: v_2 not defined
I'm not sure why. v_2 is a variable created inside a function, and I've already used @everywhere include("test_file.jl"), so Julia shouldn't say that the variable is undefined. Can I get a hint? Thank you!
You use v_2 in the loop(100, v_2) call, so Julia looks for v_2 in the global scope and does not find it there. You probably meant to write loop(100, v), since v is the variable you define.
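With that fix applied (and, to be safe, an explicit using Distributed so that @everywhere is available in the script itself), test_file_call.jl would read:
using Distributed
@everywhere include("test_file.jl")
v = [1 2 3 4]
loop(100, v)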

Data Movement in Julia (Parallel)

We have a function in count_hands.jl:
function count_hands(n)
    c::Int = 0
    for i = 1:n
        c += rand(Bool)
    end
    c
end
We run Julia as ./julia -p 2. We want to calculate a and b in different processes, so we have:
julia> @everywhere include("count_hands.jl")

julia> a = @spawn count_hands(1000000000)

julia> b = @spawn count_hands(1000000000)

julia> fetch(a) + fetch(b)
1: How can we be sure that a and b are calculated in different processes?
I know we can use @spawnat instead of @spawn and choose the process number, but I saw this code and I want to know how we can be sure about that.
2: Suppose it is correct and both are computed in different processes: count_hands(1000000000) for each of a and b is calculated in a different process, and then the two results are added together on process 1. Is this right?
How can we be sure that a and b are calculated in different processes?
You can't, unless you use @spawnat n and ensure that nprocs() is greater than or equal to n, and that the ns are different.
Is this right?
Yes, assuming that you've used @spawnat 1 for a. You can test this by rewriting your function as follows:
julia> @everywhere function count_hands(n)
           println("this process is $(myid())")
           c::Int = 0
           for i = 1:n
               c += rand(Bool)
           end
           c
       end
julia> a = @spawnat 1 count_hands(1000)
this process is 1
Future(1, 1, 11, Nullable{Any}())
julia> b = @spawnat 2 count_hands(1000)
Future(2, 1, 12, Nullable{Any}())
julia> From worker 2: this process is 2
julia>
julia> fetch(a) + fetch(b)
1021
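As an aside, the printed Future values above hint at another way to check placement: their first field (where) is the ID of the process the computation was placed on. This relies on an internal field, so treat it as an assumption that may change between versions:
julia> a.where
1

julia> b.where
2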

Julia - partially shared arrays with @parallel

I want to alter a shared array owned by only some of my processes:
julia> addprocs(4)
4-element Array{Int64,1}:
2
3
4
5
julia> s = SharedArray(Int, (100,), pids=[2,3]);
julia> for i in procs() println(remotecall_fetch(localindexes, i, s)) end
1:0
1:50
51:100
1:0
1:0
This works, but I want to be able to parallelize the loop:
julia> for i=1:100 s[i] = i end
This results in processes 4 and 5 terminating with a segfault:
julia> @parallel for i=1:100 s[i] = i end
Question: Why does this terminate the processes rather than throw an exception or split the loop only among the processes that share the array?
I expected this to work instead, but it does not fill the entire array:
julia> @parallel for i in localindexes(s) s[i] = i end
Each process updates the part of the array that is local to it, and since the array is shared, changes made by one process should be visible to all processes. Why is some of the array still unchanged?
Replacing @parallel with @everywhere gives the error that s is not defined on process 2. How can a process which owns part of the array be unaware of it?
I am so confused. What is the best way to parallelize this loop?
Will this do the trick?
@sync begin
    for i in procs(s) # <-- note the loop over the process IDs of the SharedArray s!
        @async @spawnat i setindex!(s, i, localindexes(s))
    end
end
You may run into issues if the master process is included here; in that case, you can try building your own function modeled on the pmap example.
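For concreteness, here is a minimal sketch of that idea, assuming the SharedArray s from the question; fill_local! is a hypothetical helper name, and each process that shares s fills only its own chunk:
@everywhere function fill_local!(s)
    for i in localindexes(s) # this worker's own index range
        s[i] = i
    end
end

@sync for p in procs(s)      # only the processes that share s
    @async remotecall_wait(fill_local!, p, s)
end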

Parallel programming in Julia

I have been following the docs for parallel programming in Julia, and to my mind, which thinks in terms of OpenMP or MPI, the design choices seem quite strange.
I have an application where I want data to be distributed among processes, and then I want to tell each process to apply some operation to whatever data it is assigned, yet I do not see a way of doing this in Julia. Here is an example:
julia> r = remotecall(2, rand, 2)
RemoteRef{Channel{Any}}(2,1,30)
julia> fetch(r)
2-element Array{Float64,1}:
0.733308
0.45227
so on process 2 lives a random array with 2 elements. I can apply some function to this array via
julia> remotecall_fetch(2, getindex, r, 1)
0.7333080770447185
but why does it not work if I apply a function which should change the vector, like:
julia> remotecall_fetch(2, setindex!, r, 1,1)
ERROR: On worker 2:
MethodError: `setindex!` has no method matching setindex!(::RemoteRef{Channel{Any}}, ::Int64, ::Int64)
in anonymous at multi.jl:892
in run_work_thunk at multi.jl:645
[inlined code] from multi.jl:892
in anonymous at task.jl:63
in remotecall_fetch at multi.jl:731
in remotecall_fetch at multi.jl:734
I don't quite know how to describe it, but it seems like the workers can only return "new" things. I don't see how I can send some variables and a function to a worker and have the function modify the variables in place. In the above example, I'd like the array to live on a single process and ideally I'd be able to tell that process to perform some operations on that array. After all the operations are finished I could then fetch results etc.
I think you can achieve this with the @spawnat macro:
julia> addprocs(2)
2-element Array{Int64,1}:
2
3
julia> r = remotecall(2, rand, 2)
RemoteRef{Channel{Any}}(2,1,3)
julia> fetch(r)
2-element Array{Float64,1}:
0.149753
0.687653
julia> remotecall_fetch(2, getindex, r, 1)
0.14975250913699378
julia> @spawnat 2 setindex!(fetch(r), 320.0, 1)
RemoteRef{Channel{Any}}(2,1,6)
julia> fetch(r)
2-element Array{Float64,1}:
320.0
0.687653
julia> @spawnat 2 setindex!(fetch(r), 950.0, 2)
RemoteRef{Channel{Any}}(2,1,8)
julia> fetch(r)
2-element Array{Float64,1}:
320.0
950.0
But with remotecall_fetch, it looks like the returned array is really a copy:
julia> remotecall_fetch(2, setindex!, fetch(r), 878.99, 1)
2-element Array{Float64,1}:
878.99
950.0
julia> remotecall_fetch(2, setindex!, fetch(r), 232.99, 2)
2-element Array{Float64,1}:
320.0
232.99
julia> fetch(r)
2-element Array{Float64,1}:
320.0
950.0
Tested with Julia Version 0.4.3.
Based on the description of your needs, you may also find DistributedArrays.jl useful.
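For reference, a minimal sketch of that approach using the DistributedArrays.jl package (an assumption: names such as dzeros, localpart, and procs(d) come from that package's API, which may differ between versions):
using DistributedArrays          # Pkg.add("DistributedArrays") first
@everywhere using DistributedArrays

d = dzeros(8)                    # an array distributed across the workers

# each worker mutates, in place, only the chunk it owns
@sync for p in procs(d)
    @async remotecall_wait(p, d -> fill!(localpart(d), myid()), d)
end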
