Parallel programming in Julia - parallel-processing

I have been following the docs for parallel programming in Julia, and to my mind, which thinks in terms of OpenMP or MPI, the design choice seems quite strange.
I have an application where I want data to be distributed among processes, and then I want to tell each process to apply some operation to whatever data it is assigned, yet I do not see a way of doing this in Julia. Here is an example:
julia> r = remotecall(2, rand, 2)
RemoteRef{Channel{Any}}(2,1,30)
julia> fetch(r)
2-element Array{Float64,1}:
0.733308
0.45227
so on process 2 lives a random array with 2 elements. I can apply some function to this array via
julia> remotecall_fetch(2, getindex, r, 1)
0.7333080770447185
but why does it not work if I apply a function which should change the vector, like:
julia> remotecall_fetch(2, setindex!, r, 1,1)
ERROR: On worker 2:
MethodError: `setindex!` has no method matching setindex!(::RemoteRef{Channel{Any}}, ::Int64, ::Int64)
in anonymous at multi.jl:892
in run_work_thunk at multi.jl:645
[inlined code] from multi.jl:892
in anonymous at task.jl:63
in remotecall_fetch at multi.jl:731
in remotecall_fetch at multi.jl:734
I don't quite know how to describe it, but it seems like the workers can only return "new" things. I don't see how I can send some variables and a function to a worker and have the function modify the variables in place. In the above example, I'd like the array to live on a single process and ideally I'd be able to tell that process to perform some operations on that array. After all the operations are finished I could then fetch results etc.

I think you can achieve this with the macro @spawnat:
julia> addprocs(2)
2-element Array{Int64,1}:
2
3
julia> r = remotecall(2, rand, 2)
RemoteRef{Channel{Any}}(2,1,3)
julia> fetch(r)
2-element Array{Float64,1}:
0.149753
0.687653
julia> remotecall_fetch(2, getindex, r, 1)
0.14975250913699378
julia> @spawnat 2 setindex!(fetch(r), 320.0, 1)
RemoteRef{Channel{Any}}(2,1,6)
julia> fetch(r)
2-element Array{Float64,1}:
320.0
0.687653
julia> @spawnat 2 setindex!(fetch(r), 950.0, 2)
RemoteRef{Channel{Any}}(2,1,8)
julia> fetch(r)
2-element Array{Float64,1}:
320.0
950.0
But with remotecall_fetch, it looks like the returned array is really a copy:
julia> remotecall_fetch(2, setindex!, fetch(r), 878.99, 1)
2-element Array{Float64,1}:
878.99
950.0
julia> remotecall_fetch(2, setindex!, fetch(r), 232.99, 2)
2-element Array{Float64,1}:
320.0
232.99
julia> fetch(r)
2-element Array{Float64,1}:
320.0
950.0
with: Julia Version 0.4.3
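For readers on current Julia (1.x), where these functions live in the Distributed standard library and the argument order is remotecall_fetch(f, pid, args...), the idea of keeping an array on one worker and mutating it there might be sketched as follows. This is only a minimal illustration under assumptions: STORE, create!, update! and snapshot are hypothetical names made up for the example, not part of Julia or of the answer above.

```julia
using Distributed

addprocs(1)                 # start one worker process
w = first(workers())        # id of a worker (2 in a fresh session)

# Hypothetical helpers, defined on every process. Each process has its own
# STORE; only the worker's copy is used below.
@everywhere const STORE = Dict{Symbol,Vector{Float64}}()
@everywhere create!(name, n) = (STORE[name] = rand(n); nothing)
@everywhere update!(name, val, i) = (STORE[name][i] = val; nothing)
@everywhere snapshot(name) = copy(STORE[name])

remotecall_fetch(create!, w, :x, 2)          # the array is created on the worker
remotecall_fetch(update!, w, :x, 320.0, 1)   # and mutated in place there
remotecall_fetch(update!, w, :x, 950.0, 2)

result = remotecall_fetch(snapshot, w, :x)   # copy the final state back to process 1
```

Only the small arguments (a name, a value, an index) cross the process boundary on each call; the array itself stays on the worker until snapshot is requested.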

You may find Distributed Arrays useful, based on the description of your need.

Related

Replace values with for loop

Suppose I have the following function:
function y1(x)
y = x^(2) - 4
return y
end
Now, I want to evaluate all the values from this sequence: collect(range(-10,10, 1000))
I tried this
y_1 = zeros(1000);
for x in collect(range(-10, 10, 1000))
y_1 = y1.(x)
end
Note that I use the broadcast operator to apply the function y1 for every value that takes the iterator. But if I don't use it I get the same result.
But as an answer, I just get 96.0.
How can I refill the y_1 vector with the for loop, so I get the evaluated values?
The evaluated vector should be of size 1000
Thanks in advance!
Edit:
I found a way to get to my desired result without the for loop:
y_1 = y1.(collect(range(-10, 10, 1000)))
But I still want to know how can I do it in a loop.
The broadcast operator broadcasts the function over the entire iterator by itself, i.e. y1.(arr) will call y1 on each of the elements of the array arr, collect the results of all those calls, and allocate memory to store those results as an array too.
So the following are all equivalent in terms of functionality:
julia> arr = range(-4, 5, length = 10) #define a simple range
-4.0:1.0:5.0
julia> y1.(arr)
10-element Vector{Float64}:
12.0
5.0
0.0
-3.0
-4.0
-3.0
0.0
5.0
12.0
21.0
julia> [y1(x) for x in arr]
10-element Vector{Float64}:
(same values as above)
julia> map(y1, arr)
10-element Vector{Float64}:
(same values as above)
julia> y_1 = zeros(10);
julia> for (i, x) in pairs(arr)
y_1[i] = y1(x)
end
julia> y_1
10-element Vector{Float64}:
(same values as above)
In practice, there may be other considerations, including performance, that decide between these and other choices.
As an aside, note that very often you don't want to collect a range in Julia, i.e. don't think of collect as somehow equivalent to c() in R. For many operations, ranges can be used directly, including for iteration in for loops. collect should only be necessary in the rare cases where an actual Vector is necessary, e.g. when a value in the middle of the array needs to be changed for some reason. As a general rule, use the range results as they are, unless and until you get an error that requires you to change that.
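To make explicit why the loop in the question ends up with a single number: y_1 = y1.(x) rebinds the name y_1 to a scalar on every pass, so only the value for the last x survives. A small sketch of the difference between rebinding and writing into the preallocated vector, assuming the y1 from the question (last_only and fill_loop are hypothetical helper names):

```julia
y1(x) = x^2 - 4

xs = range(-10, 10, length = 1000)   # used directly, no collect needed

# Mirrors the loop in the question: y is rebound to a scalar each time.
function last_only(xs)
    y = zeros(length(xs))
    for x in xs
        y = y1(x)          # replaces the whole vector with one number
    end
    return y               # y1 of the last element of xs
end

# Writes each result into its own slot of the preallocated vector.
function fill_loop(xs)
    y = zeros(length(xs))
    for (i, x) in enumerate(xs)
        y[i] = y1(x)
    end
    return y
end

y_loop = fill_loop(xs)

# In-place broadcast: fills an existing vector without allocating a new one.
y_bcast = zeros(length(xs))
y_bcast .= y1.(xs)
```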

Manipulating several variables within a for loop in Julia

I'm new to Julia. I want to write code which, for each of several vectors, outputs a new vector, the name of which depends on the name of the input vector.
For example, the following code works
a = ones(10)
b = ones(10)
for var in [a, b]
global log_var = log.(var)
end
except I want the resulting vectors to be named log_a and log_b (rather than have the loop overwrite log_var). I had thought this would be simple, but having read a few tutorials about locals in Julia, I'm still lost! Is there a simple way to go about this?
In case this question is unclear, I'll describe how I would do this in Stata, with which I'm more familiar:
clear
set obs 10
gen a = 1
gen b = 1
foreach var in a b {
gen log_`var' = log(`var')
}
Thank you!
If you are looking for something similar to what you do in Stata, you can use DataFrames.jl:
julia> using DataFrames
julia> df = DataFrame(a=ones(10), b=ones(10))
julia> for col in ["a", "b"]
df[:, "log_"*col] = log.(df[:, col])
end
julia> df
You really probably don't want to do that. But, if you had to, you could do it pretty easily with metaprogramming. In this case for example:
macro logify(variable)
quote
$(esc(Symbol("log_$variable"))) = log.($(esc(variable)))
end
end
then
julia> b = rand(5)
5-element Vector{Float64}:
0.29129581739244315
0.21098023915449915
0.8736387630142392
0.34378216482772417
0.621583372934101
julia> @logify b;
julia> log_b
5-element Vector{Float64}:
-1.2334159735391819
-1.555990803188027
-0.13508830339365252
-1.0677470639708686
-0.4754852291054692
In general, any time you need to depend on the name of a variable rather than its contents, you're going to need metaprogramming.
However, to emphasize, again, this feels like a bad idea.
Rather than defining new top-level variables, you might consider instead using some sort of data structure like a Dict or a NamedTuple or a DataFrame, or even just a multidimensional Array. For example, with NamedTuples:
julia> data = (a = rand(5), b = rand(5));
julia> typeof(data)
NamedTuple{(:a, :b), Tuple{Vector{Float64}, Vector{Float64}}}
julia> data.a
5-element Vector{Float64}:
0.7146929585896256
0.5248314042991269
0.040560190890127856
0.9714549101298824
0.9477790450084252
julia> data.b
5-element Vector{Float64}:
0.6856764745285641
0.3066093923258396
0.5655243277481422
0.13478854894985115
0.8495720250298817
julia> logdata = NamedTuple{keys(data)}(log.(data[x]) for x in keys(data));
julia> logdata.a
5-element Vector{Float64}:
-0.335902257064951
-0.6446782026336225
-3.204968213346185
-0.02896042387181646
-0.05363387877891503
julia> logdata.b
5-element Vector{Float64}:
-0.3773493739743169
-1.182180679204628
-0.5700019644606769
-2.0040480325554944
-0.1630225562612911
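The Dict option mentioned above can be sketched in the same spirit. The key names (log_a, log_b) are built from the input names, which is roughly what the Stata loop in the question does with variable names; data and logs are just illustrative names.

```julia
# Inputs stored by name instead of as separate top-level variables.
data = Dict(:a => ones(10), :b => ones(10))

# Build a new Dict whose keys are derived from the input keys.
logs = Dict(Symbol("log_", name) => log.(v) for (name, v) in data)

logs[:log_a]   # the log of data[:a], element by element
```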
Not really recommended for such usage, but a quick and dirty variant is
for var in [:a, :b]
@eval global $(Symbol("log_", var)) = log.($var)
end

Pre-allocation in Julia

I am trying to minimize memory allocations in Julia by pre-allocating arrays as shown in the documentation. My sample code looks as follows:
using BenchmarkTools
dim1 = 100
dim2 = 1000
A = rand(dim1,dim2)
B = rand(dim1,dim2)
C = rand(dim1,dim2)
D = rand(dim1,dim2)
M = Array{Float64}(undef,dim1,dim2)
function calc!(a, b, c, d, E)
@. E = a * b * ((d-c)/d)
nothing
end
function run_calc(A,B,C,D,M)
for i in 1:dim2
@views calc!(A[:,i], B[:,i], C[:,i], D[:,i], M[:,i])
end
end
My understanding is that this should essentially not allocate, since M is pre-allocated outside either of the two functions. However, when I benchmark this I still see a lot of allocations:
@btime run_calc(A,B,C,D,M)
1.209 ms (14424 allocations: 397.27 KiB)
In this case I can of course run the much more concise
@btime @. M = A * B * ((D-C)/D)
which performs very few allocations as expected:
122.599 μs (6 allocations: 144 bytes)
However my actual code is more complex and cannot be reduced like this, hence I am wondering where I am going wrong with the first version.
You are not doing anything wrong. Currently creation of views in Julia is allocating (as Stefan noted it has gotten much better than in the past, but still some allocations seem to happen in this case). The allocations you see are a consequence of this.
See:
julia> @allocated view(M, 1:10, 1:10)
64
Your case is one of the situations where it is simplest to just write an appropriate loop (I assume that in your code the loop will be more complex but I hope the intent is clear), e.g.:
julia> function run_calc2(A,B,C,D,M)
@inbounds for i in eachindex(A,B,C,D,M)
M[i] = A[i] * B[i] * ((D[i] - C[i])/D[i])
end
end
run_calc2 (generic function with 1 method)
julia> @btime run_calc2($A,$B,$C,$D,$M)
56.441 μs (0 allocations: 0 bytes)
julia> @btime run_calc($A,$B,$C,$D,$M)
893.789 μs (14424 allocations: 397.27 KiB)
julia> @btime @. $M = $A * $B * (($D-$C)/$D);
381.745 μs (0 allocations: 0 bytes)
EDIT: all timings on Julia Version 1.6.0-DEV.1580
EDIT2: for completeness, here is code that passes @views down to the inner function. It still allocates (but is better) and is still slower than using just the loop:
julia> function calc2!(a, b, c, d, E, i)
@inbounds @. @views E[:,i] = a[:,i] * b[:,i] * ((d[:,i]-c[:,i])/d[:,i])
nothing
end
calc2! (generic function with 1 method)
julia> function run_calc3(A,B,C,D,M)
for i in 1:dim2
calc2!(A,B,C,D,M,i)
end
end
run_calc3 (generic function with 1 method)
julia> @btime run_calc3($A,$B,$C,$D,$M);
305.709 μs (1979 allocations: 46.56 KiB)
Prior to Julia 1.5, creating array views would often allocate a bit of memory for the view object. From Julia 1.5 onward, creating views usually does not cause any allocation. Your post doesn't say which version of Julia you're using, so I'll assume that it's older than 1.5. In your code, you are creating a view for each index of a potentially large array dimension, which will definitely add up. You could refactor this code to pass the dimension through to the inner calculation. Otherwise you can upgrade Julia and see if the allocation goes away.
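One further possible contributor, not raised in the answers above: run_calc loops over 1:dim2, where dim2 is a non-constant global, so the loop index cannot be inferred to a concrete type inside the function. A hedged sketch that takes the loop bounds from the array itself instead is shown below; run_calc_axes! is a hypothetical name, D is shifted away from zero only to keep the example well behaved, and any allocation figures should be measured with @btime on your own Julia version rather than assumed.

```julia
# Same kernel as in the question: writes the result into E in place.
function calc!(a, b, c, d, E)
    @. E = a * b * ((d - c) / d)
    return nothing
end

# Loop bounds come from M itself, so no global is read inside the function.
function run_calc_axes!(A, B, C, D, M)
    for i in axes(M, 2)
        @views calc!(A[:, i], B[:, i], C[:, i], D[:, i], M[:, i])
    end
    return M
end

A = rand(100, 1000); B = rand(100, 1000); C = rand(100, 1000)
D = rand(100, 1000) .+ 1.0               # kept away from zero for the division
M = Array{Float64}(undef, 100, 1000)     # pre-allocated output

run_calc_axes!(A, B, C, D, M)
```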

Data Movement on Julia (Parallel)

We have a function in count_hands.jl:
function count_hands(n)
c::Int=0
for i=1:n
c+=rand(Bool)
end
c
end
We run Julia as ./julia -p 2.
We want to calculate a and b in different processes, and we have:
julia> @everywhere include("count_hands.jl")
julia> a=@spawn count_hands(1000000000)
julia> b=@spawn count_hands(1000000000)
julia> fetch(a)+fetch(b)
1: How can we be sure that a and b are calculated in different processes?
I know we can use @spawnat instead of @spawn and choose the process number, but I saw this code and I want to know how we can be sure about that.
2: Suppose that is correct and a and b are computed in different processes: count_hands(1000000000) is calculated in a separate process for each of a and b, and the results are then added together in process 1. Is this right?
How can we be sure that a and b are calculated in different processes?
You can't, unless you use @spawnat n, ensure that nprocs() is greater than or equal to n, and make sure that the ns are different.
Is this right?
Yes, assuming that you've used @spawnat 1 for a. You can test this by rewriting your function as follows:
julia> @everywhere function count_hands(n)
println("this process is $(myid())")
c::Int=0
for i=1:n
c+=rand(Bool)
end
c
end
julia> a = @spawnat 1 count_hands(1000)
this process is 1
Future(1, 1, 11, Nullable{Any}())
julia> b = @spawnat 2 count_hands(1000)
Future(2, 1, 12, Nullable{Any}())
julia> From worker 2: this process is 2
julia>
julia> fetch(a) + fetch(b)
1021
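Instead of printing from the workers, the process id can also be returned together with the count, which makes the check programmatic. A minimal sketch for Julia 1.x, where these tools live in the Distributed standard library; count_heads_tagged is a hypothetical variant of the function above, and the worker ids are taken from workers() rather than assumed to be 2 and 3.

```julia
using Distributed

addprocs(2)          # two worker processes, as with `julia -p 2`
w = workers()        # their ids

# Hypothetical variant: returns the id of the process that did the work
# along with the number of heads.
@everywhere function count_heads_tagged(n)
    c = 0
    for _ in 1:n
        c += rand(Bool)
    end
    return (pid = myid(), heads = c)
end

a = @spawnat w[1] count_heads_tagged(10^6)   # pinned to the first worker
b = @spawnat w[2] count_heads_tagged(10^6)   # pinned to the second worker

ra, rb = fetch(a), fetch(b)      # results are copied back to process 1
total = ra.heads + rb.heads      # the addition happens on process 1
```

Comparing ra.pid and rb.pid then shows directly which process ran each call.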

Add a row to a matrix inside a function (and propagate the changes outside) in Julia?

This is similar to this question:
Add a row to a matrix in Julia?
But now I want to grow the matrix inside a function:
function f(mat)
mat = vcat(mat, [1 2 3])
end
Now, outside this function:
mat = [2 3 4]
f(mat)
But this doesn't work. The changes made to mat inside f aren't propagated outside, because a new mat was created inside f (see http://docs.julialang.org/en/release-0.4/manual/faq/#functions).
Is it possible to do what I want?
Multi-dimensional arrays cannot have their size changed. There are pointer hacks to share data, but these do not modify the size of the original array.
Even if it were possible, be aware that because Julia matrices are column major, this operation is very slow, and requires a copy of the array.
In Julia, operations that modify the data passed in (i.e., performing computations on data instead of with data) are typically marked with !. This denotes to the programmer that the collection being processed will be modified. These kinds of operations are typically called "in-place" operations, because although they are harder to use and reason about, they avoid using additional memory, and can usually complete faster.
There is no way to avoid a copy for this operation because of how matrices are stored in memory. So there is not much real benefit to turning this particular operation into an in-place operation. Therefore, I recommend against it.
If you really need this operation for some reason, you should not use a matrix, but rather a vector of vectors:
v = Vector{Float64}[]
push!(v, [1.0, 2.0, 3.0])
This data structure is slightly slower to access, but much faster to add to.
On the other hand, from what it sounds like, you may be interested in a more specialized data structure, such as a DataFrame.
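If an occasional copy is acceptable, a simple way to get the effect asked for is to return the enlarged matrix and rebind the name at the call site. The sketch below shows that, and also one way to turn the vector-of-vectors structure back into a matrix at the end; add_row, rows and stacked are hypothetical names used only for illustration.

```julia
# Returns a new, larger matrix; the input is left untouched.
add_row(mat) = vcat(mat, [1 2 3])

mat = [2 3 4]
mat = add_row(mat)        # rebind the caller's name to the result

# Vector-of-vectors variant: cheap to grow, converted to a matrix once.
rows = Vector{Int}[]
push!(rows, [2, 3, 4])
push!(rows, [1, 2, 3])
stacked = permutedims(reduce(hcat, rows))   # each inner vector becomes a row
```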
I agree with Fengyang's very comprehensive answer and excellent avoidance of an XY Problem. Just adding my 2¢.
push!ing is a fairly expensive operation, since at each push a new location in memory needs to be found to accommodate the new variable of larger size.
If you care about efficiency, then it would be a lot more prudent to preallocate your mat and mutate it (i.e. alter its contents) inside your function, which is permitted, i.e.
julia> mat = Array(Int64, (100,3)); # mat[1,:] contains garbage values
julia> f(mat, ind, x) = mat[ind,:] = x;
julia> f(mat, 1, [1 2 3]); # mat[1,:] now contains [1,2,3]
If the reason you prefer the push! approach is because you don't want to keep track of the index and pass it manually, then you can automate this process in your function too, by keeping a mutable counter, e.g.
function f(mat, c, x)
c[1] = c[1] + 1;
mat[c[1], :] = x;
end;
julia> mat = Array(Int64, (100,3)); counter = [0];
julia> f(mat, counter, [1 2 3]);
julia> f(mat, counter, [1 2 3]);
julia> f(mat, counter, [1 2 3]);
julia> mat[1:3,:]
3×3 Array{Int64,2}:
1 2 3
1 2 3
1 2 3
Alternatively, you can even create a closure, i.e. a function with state, that has an internal counter, and forget about keeping an external counter variable, e.g.
julia> f = () -> (); # creating an f at 'outer' scope
julia> let c = 1 # creates c at local 'let' scope
f = (mat, x) -> (mat[c,:] = x; c += 1;) # f reassigned from outer scope
end; # f now contains the 'closed' variable c
julia> mat = Array(Int64, (100,3));
julia> f(mat, [1 2 3]);
julia> f(mat, [2 3 4]);
julia> f(mat, [3 4 5]);
julia> mat[1:3,:]
3×3 Array{Int64,2}:
1 2 3
2 3 4
3 4 5
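The snippets above use pre-1.0 syntax: Array(Int64, (100,3)) was removed in Julia 1.0, where the uninitialized constructor is spelled Array{Int64}(undef, 100, 3), and the let-based reassignment of f may need an explicit global f on 1.x because of changed scoping rules. A hedged sketch of the same closure idea for current Julia, using a hypothetical factory function make_row_writer so that no global reassignment is needed:

```julia
# Returns a closure that remembers both the matrix and the next free row.
function make_row_writer(buf)
    c = 0                       # counter captured by the closure
    return function (x)
        c += 1
        buf[c, :] = x           # mutate the preallocated matrix in place
        return c                # row that was just written
    end
end

buf = zeros(Int, 100, 3)        # preallocated (zero-filled rather than undef)
write_row! = make_row_writer(buf)

write_row!([1, 2, 3])
write_row!([2, 3, 4])
write_row!([3, 4, 5])

buf[1:3, :]                     # the three rows written so far
```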
