I'm trying to run two functions simultaneously in Julia, but I don't know how to do it. Here is my code:
function area(side::Float64)
return side*side
end
function f(n::Int64)
mat = zeros(n,n)
for i=1:n
for j=1:n
mat[i,j] = area(rand())
end
end
return mat
end
function g(n::Int64)
mat = zeros(n,n)
for i=1:n
for j=1:n
mat[i,j] = area(rand()*rand())
end
end
return mat
end
s1 = f(10)
s2 = g(10)
hcat(s1,s2)
In Julia 1.3 you can spawn tasks that will get scheduled on different threads using Threads.@spawn:
begin
s1 = Threads.@spawn f(10)
s2 = Threads.@spawn g(10)
s1 = fetch(s1)
s2 = fetch(s2)
end
See the announcement blog post for more info: https://julialang.org/blog/2019/07/multithreading.
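Note that the tasks will only actually run in parallel if Julia was started with more than one thread; a minimal sketch, reusing f and g from the question:
# The tasks only run in parallel if Julia was started with more than one
# thread, e.g. by setting the JULIA_NUM_THREADS environment variable
# (or, on Julia >= 1.5, the --threads command-line flag).
@assert Threads.nthreads() > 1  # otherwise both tasks share a single thread

t1 = Threads.@spawn f(10)
t2 = Threads.@spawn g(10)
result = hcat(fetch(t1), fetch(t2))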
Generally, there are different notions of "simultaneous" in parallel computing.
Since you tagged your question as "multiprocessing", let me give you a straightforward multi-process solution (which should work for any Julia version >= 0.7). As such, it utilizes Julia's built-in Distributed computing tools.
using Distributed
nworkers() < 2 && addprocs(2) # add two worker processes if necessary
@everywhere begin # define your functions on both workers
area(side::Float64) = side*side
function f(n::Int64)
mat = zeros(n,n)
for i=1:n
for j=1:n
mat[i,j] = area(rand())
end
end
return mat
end
function g(n::Int64)
mat = zeros(n,n)
for i=1:n
for j=1:n
mat[i,j] = area(rand()*rand())
end
end
return mat
end
end
# spawn tasks on the two workers (non-blocking)
t1 = @spawn f(10)
t2 = @spawn g(10)
# fetch the results (blocking until workers have finished)
r1 = fetch(t1)
r2 = fetch(t2)
hcat(r1,r2)
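On newer Julia versions (1.3 and later) Distributed.@spawn is documented as deprecated in favor of @spawnat; a sketch of the equivalent calls, with f and g still defined via @everywhere as above:
t1 = @spawnat :any f(10)  # let the scheduler pick a worker
t2 = @spawnat :any g(10)
hcat(fetch(t1), fetch(t2))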
For more on how to use Distributed for parallel computing, check out, for example, this part of the Julia documentation or this Jupyter notebook from one of my workshops: Parallel Computing in Julia.
I have a function like this:
@everywhere function bellman_operator!(rbc::RBC)
...
@sync @parallel for i = 1:m
....
for j = 1:n
vmax = -1000.0
...
for l = Next : n
......
if v > vmax
vmax = v
Next = l
else
break
end
end
f_v[j, i] = vmax
f_p[j, i] = k
end
end
end
f_v and f_p are SharedArrays. I want to give each worker its own array for its results; I saw some examples but I couldn't adapt them. How can I use separate arrays for the results of each worker and finally combine the results, instead of using SharedArrays?
Is this what you want?
Example 1. Combining results using +:
a = @parallel (+) for i in 1:1000
rand(10, 10)
end
Example 2. Just collecting the results without combining them:
x = Future[]
for i in 1:1000
push!(x, @spawn rand(10,10))
end
y = fetch.(x)
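On Julia >= 1.0 these macros live in the Distributed standard library and have been renamed (@parallel became @distributed, and Distributed.@spawn gave way to @spawnat); a sketch of the same two examples, assuming Julia >= 1.3 for @spawnat :any:
using Distributed
addprocs(4)

# Example 1: reduce the per-iteration results with +
a = @distributed (+) for i in 1:1000
    rand(10, 10)
end

# Example 2: collect the individual results without combining them
x = Future[]
for i in 1:1000
    push!(x, @spawnat :any rand(10, 10))
end
y = fetch.(x)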
I am getting performance degradation after parallelizing code that calculates graph centrality. The graph is relatively large, 100K vertices. The single-threaded application takes approximately 7 minutes. As recommended on the julialang site (http://julia.readthedocs.org/en/latest/manual/parallel-computing/#man-parallel-computing), I adapted the code and used the pmap API in order to parallelize the calculations. I started the calculation with 8 processes (julia -p 8 calc_centrality.jl). To my surprise, I got a 10-fold slowdown: the parallel run now takes more than an hour. I noticed that it takes several minutes for the parallel processes to initialize and start the calculation. Even after all 8 CPUs are 100% busy with the Julia app, the calculation is super slow.
Any suggestion on how to improve the parallel performance is appreciated.
calc_centrality.jl :
using Graphs
require("read_graph.jl")
require("centrality_mean.jl")
function main()
file_name = "test_graph.csv"
println("graph creation: ", file_name)
g = create_generic_graph_from_file(file_name)
println("num edges: ", num_edges(g))
println("num vertices: ", num_vertices(g))
data = cell(8)
data[1] = {g, 1, 2500}
data[2] = {g, 2501, 5000}
data[3] = {g, 5001, 7500}
data[4] = {g, 7501, 10000}
data[5] = {g, 10001, 12500}
data[6] = {g, 12501, 15000}
data[7] = {g, 15001, 17500}
data[8] = {g, 17501, 20000}
cm = pmap(centrality_mean, data)
println(cm)
end
println("Elapsed: ", #elapsed main(), "\n")
centrality_mean.jl
using Graphs
function centrality_mean(gr, start_vertex)
centrality_cnt = Dict()
vertex_to_visit = Set()
push!(vertex_to_visit, start_vertex)
cnt = 0
while !isempty(vertex_to_visit)
next_vertex_set = Set()
for vertex in vertex_to_visit
if !haskey(centrality_cnt, vertex)
centrality_cnt[vertex] = cnt
for neigh in out_neighbors(vertex, gr)
push!(next_vertex_set, neigh)
end
end
end
cnt += 1
vertex_to_visit = next_vertex_set
end
mean([ v for (k,v) in centrality_cnt ])
end
function centrality_mean(data::Array{})
gr = data[1]
v_start = data[2]
v_end = data[3]
n = v_end - v_start + 1;
cm = Array(Float64, n)
v = vertices(gr)
cnt = 0
for i = v_start:v_end
cnt += 1
if cnt%10 == 0
println(cnt)
end
cm[cnt] = centrality_mean(gr, v[i])
end
return cm
end
I'm guessing this has nothing to do with parallelism. Your second centrality_mean method has no clue what type of objects gr, v_start, and v_end are. So it's going to have to use non-optimized, slow code for that "outer loop."
While there are several potential solutions, probably the easiest is to break up your function that receives the "command" from pmap:
function centrality_mean(data::Array{})
gr = data[1]
v_start = data[2]
v_end = data[3]
centrality_mean(gr, v_start, v_end)
end
function centrality_mean(gr, v_start, v_end)
n = v_end - v_start + 1;
cm = Array(Float64, n)
v = vertices(gr)
cnt = 0
for i = v_start:v_end
cnt += 1
if cnt%10 == 0
println(cnt)
end
cm[cnt] = centrality_mean(gr, v[i])
end
return cm
end
All this does is create a break, and gives julia a chance to optimize the second part (which contains the performance-critical loop) for the actual types of the inputs.
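The same idea in isolation, as a generic sketch with hypothetical names (the "function barrier" idiom): the elements of an untyped container have unknown types inside the outer function, so its loop runs with dynamic dispatch, whereas handing the extracted values to a separate kernel function lets Julia compile a method specialized for their concrete types.
function sum_slow(data::Vector{Any})
    x, n = data[1], data[2]   # types unknown to the compiler here
    s = zero(x)
    for _ in 1:n
        s += x                # dynamic dispatch on every iteration
    end
    return s
end

function kernel(x, n)         # specialized on the concrete types of x and n
    s = zero(x)
    for _ in 1:n
        s += x
    end
    return s
end

sum_fast(data::Vector{Any}) = kernel(data[1], data[2])

sum_slow(Any[0.5, 10_000_000])  # slow: untyped loop
sum_fast(Any[0.5, 10_000_000])  # fast: one dynamic dispatch, then a typed loop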
Below is the code with @everywhere, as suggested in the comments by https://stackoverflow.com/users/1409374/rickhg12hs. That fixed the performance issue!
test_parallel_pmap.jl
using Graphs
require("read_graph.jl")
require("centrality_mean.jl")
function main()
@everywhere file_name = "test_data.csv"
println("graph creation from: ", file_name)
@everywhere data_graph = create_generic_graph_from_file(file_name)
@everywhere data_graph_vertex = vertices(data_graph)
println("num edges: ", num_edges(data_graph))
println("num vertices: ", num_vertices(data_graph))
range = cell(2)
range[1] = {1, 25000}
range[2] = {25001, 50000}
cm = pmap(centrality_mean_pmap, range)
for i = 1:length(cm)
println(length(cm[i]))
end
end
println("Elapsed: ", #elapsed main(), "\n")
centrality_mean.jl
using Graphs
function centrality_mean(start_vertex::ExVertex)
centrality_cnt = Dict{ExVertex, Int64}()
vertex_to_visit = Set{ExVertex}()
push!(vertex_to_visit, start_vertex)
cnt = 0
while !isempty(vertex_to_visit)
next_vertex_set = Set()
for vertex in vertex_to_visit
if !haskey(centrality_cnt, vertex)
centrality_cnt[vertex] = cnt
for neigh in out_neighbors(vertex, data_graph)
push!(next_vertex_set, neigh)
end
end
end
cnt += 1
vertex_to_visit = next_vertex_set
end
mean([ v for (k,v) in centrality_cnt ])
end
function centrality_mean(v_start::Int64, v_end::Int64)
n = v_end - v_start + 1;
cm = Array(Float64, n)
cnt = 0
for i = v_start:v_end
cnt += 1
cm[cnt] = centrality_mean(data_graph_vertex[i])
end
return cm
end
function centrality_mean_pmap(range::Array{})
v_start = range[1]
v_end = range[2]
centrality_mean(v_start, v_end)
end
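A hedged alternative sketch for newer Julia (>= 1.0), assuming centrality_mean.jl defines the three-argument centrality_mean(gr, v_start, v_end) from the earlier answer: instead of @everywhere globals, capture the graph in a closure and reuse a CachingPool, so the captured graph is shipped to each worker only once.
using Distributed
nworkers() < 2 && addprocs(2)

include("read_graph.jl")                   # defines create_generic_graph_from_file (master only)
@everywhere include("centrality_mean.jl")  # defines centrality_mean on every worker

function main()
    data_graph = create_generic_graph_from_file("test_data.csv")
    pool = CachingPool(workers())          # caches the shipped closure (and its captured graph) per worker
    ranges = [(1, 25000), (25001, 50000)]
    pmap(pool, ranges) do r
        v_start, v_end = r
        centrality_mean(data_graph, v_start, v_end)
    end
end

cm = main()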
From the Julia page on parallel computing:
Julia provides a multiprocessing environment based on message passing to allow programs to run on multiple processes in separate memory domains at once.
If I interpret this right, Julia's parallelism requires message passing to synchronize the processes. If each individual process does only a little work and then does a message pass, the computation will be dominated by message-passing overhead rather than by useful work.
I can't tell from your code, and I don't know Julia well enough to see, where the parallelism breaks are. But you have a big complicated graph that may be spread willy-nilly across multiple processes. If they need to interact while walking across graph links, you'll have exactly that kind of overhead.
You may be able to fix it by precomputing a partition of the graph into roughly equal size, highly cohesive regions. I suspect that breakup requires that same type of complex graph processing that you already want to do, so you may have a chicken and egg problem to boot.
It may be that Julia offers you the wrong parallelism model. You might want a shared address space so that threads walking the graph don't have to use messages to traverse arcs.
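One general way to keep message-passing overhead down is to batch many small work items into each message; pmap accepts a batch_size keyword for this. A rough sketch, with a hypothetical cheap_work task, assuming Julia >= 1.0:
using Distributed
addprocs(4)

@everywhere cheap_work(x) = sum(sin, 1:x)   # hypothetical, cheap per-item task

# One remote call per item: dominated by communication for cheap tasks.
r1 = pmap(cheap_work, 1:10_000)

# Batched: each message carries up to 500 items, amortizing the overhead.
r2 = pmap(cheap_work, 1:10_000; batch_size=500)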
I normally replace the for loops in MATLAB code with parfor, but none of the two-dimensional matrices work.
Code
parfor k=1 : n
sonic = data1((1+(k-1)*2400):(2400*k));
signal1 = (sonic(1:2400))./100;
Ar = abs(fftshift(fft(signal1,2400)));
[maxb,ind] = max(b);
Tp(k) = 2*pi/x(ind);
E = @(x)(x^2+1);
for i=1:length(x2)
Ex(i,k) = E(x2(i));
Exm0(i,k) = Ex(i,k)-m0(k);
signal2(i) = Exm0(i,k);
end
epsilong(:,k) = Ar;
end
Only variables such as Tp(k) appear in the workspace; two-dimensional matrices like Ex(i,k) do not.
The limitations of FOR loops nested within PARFOR loops are described here - I think the problem in this case is your loop bounds 1:length(x2). As described on that page, you should be able to work around this like so:
len_x2 = length(x2);
parfor k = 1:n
...
for i = 1:len_x2
...
end
end
Using DistributedArrays in cases when the worker only needs to store unshared data seems overly complicated. I would like to do
r=remotecall(2,a=Float64[])
remotecall(2,setindex!,a,5,10) #Error
or
r=remotecall(2,zeros,10)
remotecall(2,setindex!,r,5,10) #Error.
I would like to do this for each worker and then access the array in an async context, perform some computations, and then fetch the results. I am not sure if this is possible because of the let behavior of @async.
Below I have made a simplified example, for which I modified the pmap example from the docs.
times=linspace(0.1,2.0,10) # times in secs representing different difficult computations
sort!(times,rev=true)
np = nprocs()
n = length(times)
#create local variables
for p=1:np
if p != myid() || np == 1
remotecall(p,stack = Float64[]) #does not work
end
end
@everywhere function fun(s)
mid=myid()
sleep(s)
#s represents some computation save to local stack
push!(stack,s)
end
#asynchronously do the computations
@everywhere i = 1
function nextidx()
global i
idx=i;
i+=1;
return idx;
end
@sync begin
for p=1:np
if p != myid() || np == 1
@async begin
j=1
res=zeros(40);
while true
idx = nextidx()
if idx > n
break
end
remotecall(fun, times[idx])
end
end
end
end
end
# collect the results of the computations
for p=1:np
if p != myid() || np == 1
tmpStack=fetch(p,stack)
#do something with the results
end
end
By using 'global' when you modify the global variable of the worker (e.g., one set by @everywhere a = 3), you may be able to resolve your problem. Check out the example code below.
@everywhere a = 0
remotecall_fetch(2, ()->a) # print 0
@everywhere function change_a(b)
global a
a = b
end
b = 10
remotecall_fetch(2, change_a, b)
remotecall_fetch(2, ()->a) # print 10
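For reference, on Julia >= 1.0 the Distributed standard library changed the argument order to remotecall_fetch(f, worker_id, args...); a sketch of the same pattern:
using Distributed
nworkers() < 2 && addprocs(2)

@everywhere a = 0
remotecall_fetch(() -> a, 2)        # returns 0

@everywhere function change_a(b)
    global a
    a = b
end

remotecall_fetch(change_a, 2, 10)
remotecall_fetch(() -> a, 2)        # returns 10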
In the Matlab programs I use, I often have to average within a matrix (interpolation). The most straightforward way is to add the matrix and a shifted one (avg). However, you could do the same operation using matrix multiplication (avg2). I noticed a considerable speed increase when using matrix multiplication on large matrices.
Could anyone explain why Matlab is able to process this multiplication faster than adding the same matrix? Also, what are the possible downsides of using avg2() with respect to avg()?
The difference in runtime was a factor of ~6 for this case (n=500).
function [] = speed()
%Speed test for averaging a matrix
n = 500;
A = rand(n,n);
tic
for i=1:100
avg(A);
end
toc
tic
for i=1:100
avg2(A);
end
toc
end
function B = avg(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2, B = (A(2:end,:)+A(1:end-1,:))/2; else B = avg(A,k-1); end
if size(A,2)==1, B = B'; end
end
function B = avg2(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2,
m = size(A,1);
e = ones(m,1);
S = spdiags(e*[1 1],-1:0,m,m-1)'/2;
B = S*A; else B = avg2(A,k-1); end
if size(A,2)==1, B = B'; end
end
I'm afraid I can't give you an answer about the inner workings of the functions you are using. However, as they seem overly complicated, I felt I should make you aware of an easier (and a bit faster) way of doing this averaging.
You can instead use conv2 with a kernel of [0.5;0.5]. I have extended your code below:
function [A, T1, T2, T3] = speed()
%Speed test for averaging a matrix
n = 900;
A = rand(n,n);
tic
for i=1:100
T1 = avg(A);
end
toc
tic
for i=1:100
T2 = avg2(A);
end
toc
tic
for i=1:100
T3 = conv2(A,[1;1]/2,'valid');
end
toc
if sum(sum(abs(T3-T2))) > 0
warning('Method 3 not equal the other methods')
end
end
function B = avg(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2, B = (A(2:end,:)+A(1:end-1,:))/2; else B = avg(A,k-1); end
if size(A,2)==1, B = B'; end
end
function B = avg2(A,k)
if nargin<2, k = 1; end
if size(A,1)==1, A = A'; end
if k<2,
m = size(A,1);
e = ones(m,1);
S = spdiags(e*[1 1],-1:0,m,m-1)'/2;
B = S*A; else B = avg2(A,k-1); end
if size(A,2)==1, B = B'; end
end
Results:
Elapsed time is 10.201399 seconds.
Elapsed time is 1.088003 seconds.
Elapsed time is 1.040471 seconds.
Apologies if you already knew this.