Data Movement in Julia (Parallel)

We have a function in count_heads.jl:

function count_heads(n)
    c::Int = 0
    for i = 1:n
        c += rand(Bool)
    end
    c
end
We run Julia as ./julia -p 2 and we want to calculate a and b in different processes:

julia> @everywhere include("count_heads.jl")

julia> a = @spawn count_heads(1000000000)

julia> b = @spawn count_heads(1000000000)

julia> fetch(a) + fetch(b)
1. How can we be sure that a and b are calculated in different processes?
I know we can use @spawnat instead of @spawn and choose the process number, but I saw this code and I want to know how we can be sure of that.
2. Suppose it is correct and both are computed in different processes: count_heads(1000000000) is calculated in a separate process for each of a and b, and the two results are then added together in process 1. Is this right?

How can we be sure that a and b are calculated in different processes?
You can't, unless you use @spawnat n and ensure that nprocs() is greater than or equal to n, and that the ns are different.
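For reference, in the ./julia -p 2 session above:

julia> nprocs()    # total number of processes, including the master (process 1)
3

julia> workers()   # the ids you can pass to @spawnat
2-element Array{Int64,1}:
 2
 3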
Is this right?
Yes, assuming that you've used @spawnat 1 for a. You can test this by rewriting your function as follows:
julia> @everywhere function count_heads(n)
           println("this process is $(myid())")
           c::Int = 0
           for i = 1:n
               c += rand(Bool)
           end
           c
       end

julia> a = @spawnat 1 count_heads(1000)
this process is 1
Future(1, 1, 11, Nullable{Any}())

julia> b = @spawnat 2 count_heads(1000)
Future(2, 1, 12, Nullable{Any}())

julia>  From worker 2:  this process is 2

julia> fetch(a) + fetch(b)
1021
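Another check that needs no println: the Future returned by @spawn or @spawnat records the id of the process the call was sent to in its first field, where (a sketch, assuming the Future internals of this Julia version):

julia> b.where   # the worker the computation above was scheduled on
2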

Related

Run function in same module and file in different processes

If I have the following Julia code snippet, is there any way I can run the for loop with multiple processes without putting complicated into an extra file and doing something like @everywhere include("complicated.jl")? Otherwise, the processes don't seem to be able to find the function.
function complicated(x)
    # long and complicated computation
    x^2
end

function run()
    results = []
    for i in 1:4
        push!(results, @spawn complicated(3))
    end
    return mean(results)
end
Just annotate the expression you want to define on all processes with the @everywhere macro (in Julia, everything is an expression):
julia> addprocs(Sys.CPU_CORES)
12-element Array{Int64,1}:
2
3
4
5
6
7
8
9
10
11
12
13
julia> @everywhere function complicated(x)
           # long and complicated computation
           x^2
       end

julia> function main()
           @sync results = @parallel vcat for i in 1:4
               complicated(3)
           end
           return mean(results)
       end
main (generic function with 1 method)
julia> main()
9.0
Note: run is an already existing function in Base, which is why the example defines main instead.
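An alternative to the @parallel reduction is pmap, which distributes the calls over the available workers and collects the results; a minimal sketch, given the @everywhere definition above:

julia> results = pmap(complicated, fill(3, 4));  # four calls of complicated(3), spread over the workers

julia> mean(results)
9.0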

Reduction parallel loop in Julia

We can use

c = @parallel (vcat) for i = 1:10
    (i, i+1)
end

But when I try to use push!() instead of vcat() I get an error. How can I use push!() in this parallel loop?

c = @parallel (push!) for i = 1:10
    (c, (i, i+1))
end
The @parallel macro is somewhat similar to foldl(op, itr) in that it uses the first value of itr as an initial first parameter for op. push! lacks the required symmetry between the operands. Perhaps what you are looking for is:

julia> c = @parallel (append!) for i = 1:10
           [(i, i+1)]
       end
Elaborating a bit on Dan's point: to see how the @parallel macro works, compare the following two invocations:

julia> @parallel print for i in 1:10
           (i, i+1)
       end
(1, 2)(2, 3)nothing(3, 4)nothing(4, 5)nothing(5, 6)nothing(6, 7)nothing(7, 8)nothing(8, 9)nothing(9, 10)nothing(10, 11)

julia> @parallel string for i in 1:10
           (i, i+1)
       end
"(1, 2)(2, 3)(3, 4)(4, 5)(5, 6)(6, 7)(7, 8)(8, 9)(9, 10)(10, 11)"
From the top one it should be clear what's going on. Each iteration produces an output, and the specified function is then applied to those outputs pairwise. The first pair of outputs is fed to print, and the result of that call becomes the first item of the next pair to be processed. Since print returns nothing, the second call prints nothing followed by (3,4). That call also returns nothing, so the next pair to be printed is nothing and (4,5), and so on until all elements are consumed. In pseudocode, this is what's happening:
Step 1: state = print((1,2), (2,3)); # state becomes nothing
Step 2: state = print(state, (3,4)); # state becomes nothing again
Step 3: state = print(state, (4,5)); # and so forth
The reason string works as expected is that the steps become:
Step 1: state = string((1,2),(2,3));
Step 2: state = string(state, (3,4));
Step 3: state = string(state, (4,5));
etc.
In general, the function you pass to the @parallel macro should take two inputs of the same type and output an object of that same type.
Therefore you cannot use push!, because it always takes two inputs of different types (an array and a plain element) and outputs an array. You need append! instead, which fits the specification.
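For instance, addition has the required symmetry (two values in, one value of the same type out), so it works directly as the reducer:

julia> c = @parallel (+) for i = 1:10
           i
       end
55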
Also note that the order of outputs is not guaranteed (here it happens to be in order because only one worker was used). If the order of operations matters, you should not use this construct. For something like addition it obviously doesn't matter, because addition is associative and commutative; but with string, if the outputs are processed in a different order, you could end up with a different string than the one you'd expect.
EDIT - addressing benchmark between vcat / append! / indexed assignment
I think the most efficient way to do this is in fact normal indexing into a preallocated array. But between append! and vcat, append! will almost certainly be faster, as vcat always makes a copy (as I understand it).
Benchmarks:
function parallelWithVcat!(A::Array{Tuple{Int64, Int64}, 1})
    A = @parallel vcat for i = 1:10000
        (i, i+1)
    end
end;

function parallelWithFunction!(A::Array{Tuple{Int64, Int64}, 1})
    A = @parallel append! for i in 1:10000
        [(i, i+1)];
    end
end;

function parallelWithPreallocation!(A::Array{Tuple{Int64, Int64}, 1})
    @parallel for i in 1:10000
        A[i] = (i, i+1);
    end
end;

A = Array{Tuple{Int64, Int64}, 1}(10000);

### first runs omitted, all benchmarks here are from 2nd runs ###

# first on a single worker:
@time for n in 1:100; parallelWithVcat!(A); end
#> 8.050429 seconds (24.65 M allocations: 75.341 GiB, 15.42% gc time)
@time for n in 1:100; parallelWithFunction!(A); end
#> 0.072325 seconds (1.01 M allocations: 141.846 MiB, 52.69% gc time)
@time for n in 1:100; parallelWithPreallocation!(A); end
#> 0.000387 seconds (4.21 k allocations: 234.750 KiB)

# now with true parallelism:
addprocs(10);
@time for n in 1:100; parallelWithVcat!(A); end
#> 1.177645 seconds (160.02 k allocations: 109.618 MiB, 0.75% gc time)
@time for n in 1:100; parallelWithFunction!(A); end
#> 0.060813 seconds (111.87 k allocations: 70.585 MiB, 3.91% gc time)
@time for n in 1:100; parallelWithPreallocation!(A); end
#> 0.058134 seconds (116.16 k allocations: 4.174 MiB)
If someone can suggest an even more efficient way, please do so!
Note in particular that the indexed assignment is much faster than the rest; in the parallel case (for this example at least) most of its time appears to be spent on the parallelisation overhead itself.
Disclaimer: I make no claim that the above are correct summonings of the @parallel spell. I have not delved into the inner workings of the macro in enough detail to claim otherwise. In particular, I am not aware which parts the macro causes to be processed remotely vs locally (e.g. the assignment part). Caution is advised, ymmv, etc.

Julia - partially shared arrays with @parallel

I want to alter a shared array owned by only some of my processes:
julia> addprocs(4)
4-element Array{Int64,1}:
2
3
4
5
julia> s = SharedArray(Int, (100,), pids=[2,3]);
julia> for i in procs() println(remotecall_fetch(localindexes, i, s)) end
1:0
1:50
51:100
1:0
1:0
This works, but I want to be able to parallelize the loop:
julia> for i=1:100 s[i] = i end
This results in processes 4 and 5 terminating with a segfault:
julia> @parallel for i=1:100 s[i] = i end
Question: Why does this terminate the processes rather than throw an exception or split the loop only among the processes that share the array?
I expected this to work instead, but it does not fill the entire array:
julia> @parallel for i in localindexes(s) s[i] = i end
Each process updates the part of the array that is local to it, and since the array is shared, changes made by one process should be visible to all processes. Why is some of the array still unchanged?
Replacing @parallel with @everywhere gives the error that s is not defined on process 2. How can a process that owns part of the array be unaware of it?
I am so confused. What is the best way to parallelize this loop?
Will this do the trick?
@sync begin
    for i in procs(s) # <-- note the loop over the process IDs of the SharedArray s!
        @async @spawnat i setindex!(s, i, localindexes(s))
    end
end
You may run into issues if the master process is included here; in that case, you can try building your own function modeled on the pmap example.
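If you want each element set to its own index (as in the original loop) rather than each local block filled with a pid, a sketch along the same lines, with each worker writing only the range it owns:

@sync for p in procs(s)
    @async @spawnat p begin
        for i in localindexes(s)
            s[i] = i   # each process writes only the indices local to it
        end
    end
end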

Add a row to a matrix inside a function (and propagate the changes outside) in Julia?

This is similar to this question:
Add a row to a matrix in Julia?
But now I want to grow the matrix inside a function:
function f(mat)
    mat = vcat(mat, [1 2 3])
end
Now, outside this function:
mat = [2 3 4]
f(mat)
But this doesn't work. The changes made to mat inside f aren't propagated outside, because a new mat was created inside f (see http://docs.julialang.org/en/release-0.4/manual/faq/#functions).
Is it possible to do what I want?
Multi-dimensional arrays cannot have their size changed. There are pointer hacks to share data, but these do not modify the size of the original array.
Even if it were possible, be aware that because Julia matrices are column major, this operation is very slow, and requires a copy of the array.
In Julia, operations that modify the data passed in (i.e., performing computations on data instead of with data) are typically marked with !. This denotes to the programmer that the collection being processed will be modified. These kinds of operations are typically called "in-place" operations, because although they are harder to use and reason about, they avoid using additional memory, and can usually complete faster.
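Base itself follows this convention; for example, sort returns a sorted copy while sort! reorders its argument in place:

julia> v = [3, 1, 2];

julia> sort(v);   # returns a new sorted array; v is unchanged

julia> sort!(v);  # reorders v itself

julia> v
3-element Array{Int64,1}:
 1
 2
 3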
There is no way to avoid a copy for this operation because of how matrices are stored in memory. So there is not much real benefit to turning this particular operation into an in-place operation. Therefore, I recommend against it.
If you really need this operation for some reason, you should not use a matrix, but rather a vector of vectors:
v = Vector{Float64}[]
push!(v, [1.0, 2.0, 3.0])
This data structure is slightly slower to access, but much faster to add to.
On the other hand, from what it sounds like, you may be interested in a more specialized data structure, such as a DataFrame.
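For instance (a minimal sketch, assuming the DataFrames.jl package is installed), a DataFrame grows row by row with push!:

julia> using DataFrames

julia> df = DataFrame(a = Int[], b = Int[], c = Int[]);  # empty table with three typed columns

julia> push!(df, [1, 2, 3]);  # append a row

julia> push!(df, [2, 3, 4]);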
I agree with Fengyang's very comprehensive answer and excellent avoidance of an XY Problem. Just adding my 2¢.
push!ing is a fairly expensive operation, since each push may require allocating a new, larger block of memory to accommodate the growing collection.
If you care about efficiency, then it would be a lot more prudent to preallocate your mat and mutate it (i.e. alter its contents) inside your function, which is permitted, i.e.
julia> mat = Array(Int64, (100,3)); # mat contains garbage values

julia> f(mat, ind, x) = mat[ind,:] = x;

julia> f(mat, 1, [1 2 3]); # mat[1,:] now contains [1,2,3]
If the reason you prefer the push! approach is that you don't want to keep track of the index and pass it manually, then you can automate this process in your function too, by keeping a mutable counter, e.g.
function f(mat, c, x)
    c[1] = c[1] + 1;
    mat[c[1], :] = x;
end;

julia> mat = Array(Int64, (100,3)); counter = [0];

julia> f(mat, counter, [1 2 3]);

julia> f(mat, counter, [1 2 3]);

julia> f(mat, counter, [1 2 3]);

julia> mat[1:3,:]
3×3 Array{Int64,2}:
 1  2  3
 1  2  3
 1  2  3
Alternatively, you can even create a closure, i.e. a function with state, that has an internal counter, and forget about keeping an external counter variable, e.g.
julia> f = () -> (); # creating an f at 'outer' scope

julia> let c = 1 # creates c at local 'let' scope
           f = (mat, x) -> (mat[c,:] = x; c += 1;) # f reassigned from outer scope
       end; # f now contains the 'closed' variable c

julia> mat = Array(Int64, (100,3));

julia> f(mat, [1 2 3]);

julia> f(mat, [2 3 4]);

julia> f(mat, [3 4 5]);

julia> mat[1:3,:]
3×3 Array{Int64,2}:
 1  2  3
 2  3  4
 3  4  5

Parallel programming in Julia

I have been following the docs for parallel programming in Julia, and to my mind, which thinks in terms of OpenMP or MPI, the design choices seem quite strange.
I have an application where I want data to be distributed among processes, and then I want to tell each process to apply some operation to whatever data it is assigned, yet I do not see a way of doing this in Julia. Here is an example:
julia> r = remotecall(2, rand, 2)
RemoteRef{Channel{Any}}(2,1,30)
julia> fetch(r)
2-element Array{Float64,1}:
0.733308
0.45227
so on process 2 lives a random array with 2 elements. I can apply some function to this array via
julia> remotecall_fetch(2, getindex, r, 1)
0.7333080770447185
but why does it not work if I apply a function that should change the vector, like:
julia> remotecall_fetch(2, setindex!, r, 1,1)
ERROR: On worker 2:
MethodError: `setindex!` has no method matching setindex!(::RemoteRef{Channel{Any}}, ::Int64, ::Int64)
in anonymous at multi.jl:892
in run_work_thunk at multi.jl:645
[inlined code] from multi.jl:892
in anonymous at task.jl:63
in remotecall_fetch at multi.jl:731
in remotecall_fetch at multi.jl:734
I don't quite know how to describe it, but it seems like the workers can only return "new" things. I don't see how I can send some variables and a function to a worker and have the function modify the variables in place. In the above example, I'd like the array to live on a single process, and ideally I'd be able to tell that process to perform some operations on that array. After all the operations are finished, I could then fetch the results, etc.
I think you can achieve this with the @spawnat macro:
julia> addprocs(2)
2-element Array{Int64,1}:
2
3
julia> r = remotecall(2, rand, 2)
RemoteRef{Channel{Any}}(2,1,3)
julia> fetch(r)
2-element Array{Float64,1}:
0.149753
0.687653
julia> remotecall_fetch(2, getindex, r, 1)
0.14975250913699378
julia> @spawnat 2 setindex!(fetch(r), 320.0, 1)
RemoteRef{Channel{Any}}(2,1,6)
julia> fetch(r)
2-element Array{Float64,1}:
320.0
0.687653
julia> @spawnat 2 setindex!(fetch(r), 950.0, 2)
RemoteRef{Channel{Any}}(2,1,8)
julia> fetch(r)
2-element Array{Float64,1}:
320.0
950.0
But with remotecall_fetch, it looks like the returned array is really a copy:
julia> remotecall_fetch(2, setindex!, fetch(r), 878.99, 1)
2-element Array{Float64,1}:
878.99
950.0
julia> remotecall_fetch(2, setindex!, fetch(r), 232.99, 2)
2-element Array{Float64,1}:
320.0
232.99
julia> fetch(r)
2-element Array{Float64,1}:
320.0
950.0
with: Julia Version 0.4.3
You may find Distributed Arrays useful, based on the description of your need.
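A minimal sketch of that approach, assuming the DistributedArrays.jl package (DArrays moved out of Base into that package as of 0.4):

addprocs(2)
@everywhere using DistributedArrays   # load the package on all processes

d = dzeros(100)                       # a distributed Float64 array, chunked across the workers
@sync for p in procs(d)
    @async @spawnat p begin
        lp = localpart(d)             # the chunk owned by this worker
        for i in eachindex(lp)
            lp[i] = myid()            # modified in place; nothing is copied back to the master
        end
    end
end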
