Julia parallel processing in separate modules

I'm trying to write a simple program to do parallel processing with code in separate modules. I have the following code in two separate files:
main.jl
push!(LOAD_PATH, ".")
#using other # This doesn't work either
importall other
np = 4
b = [Bounds(k, k+8) for k in 1:np]
fut = Array{Future}(np)
for k = 1:np
fut[k] = @spawn center(b[k])
end
for k = 1:np
xc = fetch(fut[k])
println("Center for k ", k, " is ", xc)
end
other.jl
@everywhere module other
export Bounds
export center
@everywhere type Bounds
x1::Int
x2::Int
end
@everywhere function center(bound::Bounds)
return (bound.x1 + bound.x2) / 2
end
end
When I run with a single process with "julia main.jl" it runs with no errors, but if I try to add processes with "julia -p4 main.jl" I get the error below. It looks like maybe the additional processes can't see the code in other.jl, but it seems I have the @everywhere macro in all the right places. What is the problem?
ERROR: LoadError: LoadError: On worker 2:
UndefVarError: ##5#7 not defined
in deserialize_datatype at ./serialize.jl:823
in handle_deserialize at ./serialize.jl:571
in deserialize_msg at ./multi.jl:120
in message_handler_loop at ./multi.jl:1317
in process_tcp_streams at ./multi.jl:1276
in #618 at ./event.jl:68
in #remotecall_fetch#606(::Array{Any,1}, ::Function, ::Function, ::Base.Worker) at ./multi.jl:1070
in remotecall_fetch(::Function, ::Base.Worker) at ./multi.jl:1062
in #remotecall_fetch#609(::Array{Any,1}, ::Function, ::Function, ::Int64) at ./multi.jl:1080
in remotecall_fetch(::Function, ::Int64) at ./multi.jl:1080
in (::other.##6#8)() at ./multi.jl:1959
...and 3 other exceptions.
in sync_end() at ./task.jl:311
in macro expansion; at ./multi.jl:1968 [inlined]
in anonymous at ./<missing>:?
in eval(::Module, ::Any) at ./boot.jl:234
in (::##1#3)() at ./multi.jl:1957
in sync_end() at ./task.jl:311
in macro expansion; at ./multi.jl:1968 [inlined]
in anonymous at ./<missing>:?
in include_from_node1(::String) at ./loading.jl:488
in eval(::Module, ::Any) at ./boot.jl:234
in require(::Symbol) at ./loading.jl:409
in include_from_node1(::String) at ./loading.jl:488
in process_options(::Base.JLOptions) at ./client.jl:262
in _start() at ./client.jl:318
while loading /mnt/mint320/home/bmaier/BillHome/Programs/Julia/parallel/modules/other.jl, in expression starting on line 1
while loading /mnt/mint320/home/bmaier/BillHome/Programs/Julia/parallel/modules/main.jl, in expression starting on line 5

Try @everywhere importall other in main.jl
This will load other on all of the current workers. There is no requirement for other to be anything other than a normal script or module, and so you don't need the @everywhere's in other.jl.
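For reference, a minimal sketch of the top of main.jl with that change, keeping the Julia 0.5/0.6-era syntax used in the question; the extra @everywhere push! line is an assumption, only needed if the workers cannot otherwise locate other.jl:
push!(LOAD_PATH, ".")
@everywhere push!(LOAD_PATH, ".")   # assumption: lets each worker find other.jl
@everywhere importall other          # load other on all current workers, as suggested above
# ... the rest of main.jl is unchanged, and other.jl becomes a plain
# module with no @everywhere annotations.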

Related

julia implement convert for struct containing NTuple

I'm trying to implement a convert for struct containing NTuple:
import Base: convert
abstract type AbstractMyType{N, T} end
struct MyType1{N, T} <: AbstractMyType{N, T}
data::NTuple{T, N}
end
struct MyType2{N, T} <: AbstractMyType{N, T}
data::NTuple{T, N}
end
foo(::Type{MyType2}, x::AbstractMyType{N, T}) where {N, T} = x
convert(::Type{MyType2}, x::AbstractMyType{N, T}) where {N, T} = MyType2{T}(x.data)
println(foo(MyType2, MyType1((1,2,3)))) # MyType1{Int64,3}((1, 2, 3))
println(convert(MyType2, MyType1((1,2,3)))) # MethodError
The defined functions foo and convert have the same signature. For some reason foo returns normally while convert throws a MethodError. Why can't Julia find my convert method?
julia version 1.4.1
Julia is finding your convert method:
julia> println(convert(MyType2, MyType1((1,2,3)))) # MethodError
ERROR: MethodError: no method matching MyType2{3,T} where T(::Tuple{Int64,Int64,Int64})
Stacktrace:
[1] convert(::Type{MyType2}, ::MyType1{Int64,3}) at ./REPL[16]:1
[2] top-level scope at REPL[18]:1
That stack trace is saying that it's inside your convert function (in my case, I defined it on the first line of the 16th REPL prompt). The problem is that it cannot find a MyType2{T}(::Tuple) constructor.
Julia automatically creates a number of constructors for you when you don't use an inner constructor; in this case you can either call MyType2(data) with no type parameters or MyType2{N, T}(data) with both, but Julia doesn't know what to do with only one type parameter passed (by default):
julia> MyType2((1,2,3))
MyType2{Int64,3}((1, 2, 3))
julia> MyType2{Int, 3}((1,2,3))
MyType2{Int64,3}((1, 2, 3))
julia> MyType2{Int}((1,2,3))
ERROR: MethodError: no method matching MyType2{Int64,T} where T(::Tuple{Int64,Int64,Int64})
Stacktrace:
[1] top-level scope at REPL[7]:1
[2] eval(::Module, ::Any) at ./boot.jl:331
[3] eval_user_input(::Any, ::REPL.REPLBackend) at /Users/mbauman/Julia/release-1.4/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
[4] run_backend(::REPL.REPLBackend) at /Users/mbauman/.julia/packages/Revise/AMRie/src/Revise.jl:1023
[5] top-level scope at none:0
So the fix is either to define that method yourself, or change the body of your convert method to call MyType2{N, T} explicitly.
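For completeness, here is a sketch of that second option; it assumes the parameter order of the struct definitions above, where N ends up being the element type and T the tuple length:
# Alternative sketch: call the two-parameter constructor explicitly
convert(::Type{MyType2}, x::AbstractMyType{N, T}) where {N, T} = MyType2{N, T}(x.data)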
Just define the method
convert(::Type{MyType2}, x::AbstractMyType{N, T}) where {N, T} = MyType2(x.data)
Testing:
julia> convert(MyType2, MyType1((1,2,3)))
MyType2{Int64,3}((1, 2, 3))

Distributed Julia: parallel map (pmap) with a timeout / time limit for each map task to complete

My project involves computing a map in parallel using Julia's Distributed pmap function.
Mapping a given element could take a few seconds, or it could take essentially forever. I want a timeout or time limit for an individual map task/computation to complete.
If a map task finishes in time, great, return the result of the computation. If the task doesn't complete by the time limit, stop computation when the time limit has been reached, and return some value or message indicating a timeout occurred.
A minimal example follows. First are imported modules, and then worker processes are launched:
num_procs = 1
using Distributed
if num_procs > 1
# The main process (without calling addprocs) can be used for `pmap`:
addprocs(num_procs-1)
end
Next, the mapping task is defined for all the worker processes. The mapping task should time out after 1 second:
@everywhere import Random
@everywhere begin
"""
Compute stuff for `wait_time` seconds, and return `wait_time`.
If `timeout` seconds elapses, stop computation and return something else.
"""
function waitForTimeUnlessTimeout(wait_time, timeout=1)
# < Insert some sort of timeout code? >
# This block of code simulates a long computation.
# (pretend the computation time is unknown)
t0 = time()
x = 0
while time()-t0 < wait_time
x += Random.rand() - 0.5
end
# computation completed before time limit. Return wait_time.
round(wait_time, digits=2)
end
end
The function that executes the parallel map (pmap) is defined on the main process. Each map task randomly takes up to 2 seconds to complete, but should time out after 1 second.
function myParallelMapping(num_tasks = 20, max_runtime=2)
# random task runtimes between 0 and max_runtime
runtimes = Random.rand(num_tasks) * max_runtime
# return the parallel computation of the mapping tasks
pmap((runtime)->waitForTimeUnlessTimeout(runtime), runtimes)
end
print(myParallelMapping())
How should this time-limited parallel map be implemented?
You could put something like this inside your pmap body
pmap(runtimes) do runtime
t0 = time()
task = @async waitForTimeUnlessTimeout(runtime)
while !istaskdone(task) && time()-t0 < time_limit
sleep(1)
end
istaskdone(task) && (return fetch(task))
error("time over")
end
Also note that (runtime)->waitForTimeUnlessTimeout(runtime) is the same as just waitForTimeUnlessTimeout.
Following @Fredrik Bagge's very helpful answer, here is the full working example implementation with some extra explanation.
num_procs = 8
using Distributed
if num_procs > 1
addprocs(num_procs-1)
end
@everywhere import Random
@everywhere begin
function waitForTime(wait_time)
# This code block simulates a long computation.
# Pretend the computation time is unknown.
t0 = time()
x = 0
while time()-t0 < wait_time
x += Random.rand() - 0.5
yield() # CRITICAL to release computation to check if task is done.
# If you comment out yield(), you will see that the timeout doesn't work!
end
return round(wait_time, digits=2)
end
end
function myParallelMapping(num_tasks = 16, max_runtime=2, time_limit=1)
# random task runtimes between 0 and max_runtime
runtimes = Random.rand(num_tasks) * max_runtime
# parallel compute the mapping tasks. See "do block" in
# the Julia documentation, it's just syntactic sugar.
return pmap(runtimes) do runtime
t0 = time()
task = @async waitForTime(runtime)
while !istaskdone(task) && time()-t0 < time_limit
# releases computation to waitForTime
sleep(0.1)
# nothing past here will run until waitForTime calls yield()
# *and* 0.1 seconds have passed.
end
# equal to if istaskdone(task); return fetch(task); end
istaskdone(task) && (return fetch(task))
return "TimeOut"
# `return error("TimeOut")` halts pmap unless pmap is
# given an error handler argument. See pmap documentation.
end
end
The output is
julia> print(myParallelMapping())
Any["TimeOut", "TimeOut", 0.33, 0.35, 0.56, 0.41, 0.08, 0.14, 0.72,
"TimeOut", "TimeOut", "TimeOut", 0.52, "TimeOut", 0.33, "TimeOut"]
Note that there are two tasks per process in this example. The original task (the "time checker") checks every 0.1 seconds whether the other task has completed its computation. The other task (created with @async) is computing something, periodically calling yield() to release control to the time checker; if it doesn't call yield(), time checking cannot occur.
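As a standalone, single-process sketch of that cooperation (illustrative names, not part of the answer above): without the yield() call inside the compute loop, the checker never gets a chance to run before the computation finishes.
function busy_work(seconds)
    t0 = time()
    x = 0.0
    while time() - t0 < seconds
        x += rand() - 0.5
        yield()               # hand control back to the scheduler so the checker can run
    end
    return round(seconds, digits=2)
end

task = @async busy_work(5.0)                 # the computing task
t0 = time()
while !istaskdone(task) && time() - t0 < 1   # the "time checker"
    sleep(0.1)
end
println(istaskdone(task) ? fetch(task) : "TimeOut")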

Tensorflow doesn't calculate summary

I'm trying to understand how to collect summaries for TensorBoard and wrote simple code to increment x from 1 to 5.
For some unknown reason, I see the variable My_x as 0 in all steps.
import tensorflow as tf
tf.reset_default_graph() # To clear the defined variables/operations
# create the scalar variable
x = tf.Variable(0, name='x')
# ____step 1:____ create the scalar summary
x_summ = tf.summary.scalar(name='My_x', tensor=x)
# accumulate all summaries
merged_summary = tf.summary.merge_all()
# create the op for initializing all variables
model = tf.global_variables_initializer()
# launch the graph in a session
with tf.Session() as session:
    # ____step 2:____ creating the writer inside the session
    summary_writer = tf.summary.FileWriter('output', session.graph)
    for i in range(5):
        # initialize variables
        session.run(model)
        x = x + 1
        # ____step 3:____ evaluate the scalar summary
        merged_summary_ans, x_summ_ans, x_ans = session.run([merged_summary, x_summ, x])
        print(x_ans)
        print(x_summ_ans)
        print(merged_summary_ans)
        # ____step 4:____ add the summary to the writer (i.e. to the event file)
        summary_writer.add_summary(summary=x_summ_ans, global_step=i)
    summary_writer.flush()
    summary_writer.close()
    print('Done with writing the scalar summary')
There are two problems that I can see in your code:
1) The first is that in each loop iteration you are re-initialising the global variables, which resets x back to its original value (0).
2) Second, when you update x you overwrite the reference to the variable with a TensorFlow addition operation. Your code to increase x replaces x with a tf.add operation, so your summary is no longer tracking a tf.Variable but an addition operation. If you add print(x) after you define it and run that once in every loop, you will see that it originally starts out as <tf.Variable 'x:0' shape=() dtype=int32_ref>, but after "x = x + 1" it becomes Tensor("add:0", shape=(), dtype=int32). Here you can see that tf.summary.scalar is still attached to the original variable, which is why the summary value never updates.
Here is the code I altered to get it to work, so you can see the linear increase in the value of x in TensorBoard.
import tensorflow as tf
tf.reset_default_graph()
x = tf.Variable(0, name='x')
x_summary = tf.summary.scalar('x_', x)
init = tf.global_variables_initializer()
with tf.Session() as session:
    session.run(init)
    merged_summary_op = tf.summary.merge_all()
    summary_writer = tf.summary.FileWriter('output', session.graph)
    for i in range(5):
        print(x.eval())
        summary = session.run(merged_summary_op)
        summary_writer.add_summary(summary, i)
        session.run(tf.assign(x, x+1))
    summary_writer.flush()
    summary_writer.close()

Julia: How to copy data to another processor in Julia

How do you move data from one processor to another in Julia?
Say I have an array
a = [1:10]
Or some other data structure. What is the proper way to put it on all other available processors so that it will be available on those processors as the same variable name?
I didn't know how to do this at first, so I spent some time figuring it out.
Here are some functions I wrote to pass objects:
sendto
Send an arbitrary number of variables to specified processes.
New variables are created in the Main module on specified processes. The
name will be the key of the keyword argument and the value will be the
associated value.
function sendto(p::Int; args...)
for (nm, val) in args
@spawnat(p, eval(Main, Expr(:(=), nm, val)))
end
end
function sendto(ps::Vector{Int}; args...)
for p in ps
sendto(p; args...)
end
end
Examples
# creates an integer x and Matrix y on processes 1 and 2
sendto([1, 2], x=100, y=rand(2, 3))
# create a variable here, then send it everywhere else
z = randn(10, 10); sendto(workers(), z=z)
getfrom
Retrieve an object defined in an arbitrary module on an arbitrary
process. Defaults to the Main module.
The name of the object to be retrieved should be a symbol.
getfrom(p::Int, nm::Symbol; mod=Main) = fetch(@spawnat(p, getfield(mod, nm)))
Examples
# get the object named x from the Main module on process 2. Name it x
x = getfrom(2, :x)
passobj
Pass an arbitrary number of objects from one process to arbitrary
processes. The variable must be defined in the from_mod module of the
src process and will be copied under the same name to the to_mod
module on each target process.
function passobj(src::Int, target::Vector{Int}, nm::Symbol;
from_mod=Main, to_mod=Main)
r = RemoteRef(src)
@spawnat(src, put!(r, getfield(from_mod, nm)))
for to in target
@spawnat(to, eval(to_mod, Expr(:(=), nm, fetch(r))))
end
nothing
end
function passobj(src::Int, target::Int, nm::Symbol; from_mod=Main, to_mod=Main)
passobj(src, [target], nm; from_mod=from_mod, to_mod=to_mod)
end
function passobj(src::Int, target, nms::Vector{Symbol};
from_mod=Main, to_mod=Main)
for nm in nms
passobj(src, target, nm; from_mod=from_mod, to_mod=to_mod)
end
end
Examples
# pass variable named x from process 2 to all other processes
passobj(2, filter(x->x!=2, procs()), :x)
# pass variables t, u, v from process 3 to process 1
passobj(3, 1, [:t, :u, :v])
# Pass a variable from the `Foo` module on process 1 to Main on workers
passobj(1, workers(), [:foo]; from_mod=Foo)
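Note that these functions use the Julia 0.4/0.5-era API (eval(Main, ...), RemoteRef). As a rough sketch of how the sendto idea could be adapted for Julia 1.x (an adaptation of mine, not part of the original answer):
using Distributed

# Julia 1.x sketch: assign each keyword argument as a global in Main on process p
function sendto(p::Int; args...)
    for (nm, val) in args
        remotecall_wait(Core.eval, p, Main, Expr(:(=), nm, val))
    end
end

sendto(ps::Vector{Int}; args...) = foreach(p -> sendto(p; args...), ps)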
Use @eval @everywhere ... and interpolate (escape) the local variable, like this:
julia> a=collect(1:3)
3-element Array{Int64,1}:
1
2
3
julia> addprocs(1)
1-element Array{Int64,1}:
2
julia> @eval @everywhere a=$a
julia> @fetchfrom 2 a
3-element Array{Int64,1}:
1
2
3
Just so everyone here knows, I put these ideas together into a package ParallelDataTransfer.jl for this. So you just need to do
using ParallelDataTransfer
(after installing) in order to use the functions mentioned in the answers here. Why? These functions are pretty useful! I added some testing, some new macros, and updated them a bit (they pass on v0.5, fail on v0.4.x). Feel free to put in pull requests to edit these and add more.
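A minimal usage sketch, assuming the package keeps the sendto/getfrom/passobj interface shown in the other answers here:
using Distributed
addprocs(2)
@everywhere using ParallelDataTransfer   # conservative: load the package on every process

z = randn(10, 10)
sendto(workers(), z = z)     # define z in Main on every worker
z2 = getfrom(2, :z)          # fetch it back from process 2
passobj(2, 3, :z)            # copy z from process 2 to process 3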
To supplement @spencerlyon2's answer, here are some macros:
function sendtosimple(p::Int, nm, val)
ref = @spawnat(p, eval(Main, Expr(:(=), nm, val)))
end
macro sendto(p, nm, val)
return :( sendtosimple($p, $nm, $val) )
end
macro broadcast(nm, val)
quote
@sync for p in workers()
@async sendtosimple(p, $nm, $val)
end
end
end
The @sendto macro (built on @spawnat) binds a value to a symbol on a particular process
julia> @sendto 2 :bip pi/3
RemoteRef{Channel{Any}}(9,1,5340)
julia> @fetchfrom 2 bip
1.0471975511965976
The @broadcast macro binds a value to a symbol in all processes except 1 (as I found that doing so made future expressions using the name copy the version from process 1)
julia> @broadcast :bozo 5
julia> @fetchfrom 2 bozo
5
julia> bozo
ERROR: UndefVarError: bozo not defined
julia> bozo = 3 #these three lines are why I exclude pid 1
3
julia> @fetchfrom 7 bozo
3
julia> @fetchfrom 7 Main.bozo
5

mpi4py: Internal Error: invalid error code 409e0e (Ring ids do not match)

I am coding in python and using mpi4py to do some optimization in parallel. I am using Ordinary Least Squares, and my data is too large to fit on one processor, so I have a master process that then spawns other processes. These child processes each import a section of the data that they respectively work with throughout the optimization process.
I am using scipy.optimize.minimize for the optimization, so the child processes receive a coefficient guess from the parent process, and then report the sum of squared error (SSE) to the parent process, and then scipy.optimize.minimize goes through iterations, trying to find a minimum for the SSE. After each iteration of the minimize function, the parent broadcasts the new coefficient guesses to the child processes, who then calculate the SSE again. In the child processes, this algorithm is set up in a while loop. In the parent process, I simply call scipy.optimize.minimize.
On the part that is giving me a problem, I am doing a nested optimization, or an optimization within an optimization. The inner optimization is an OLS regression as described above, and then the outer optimization is minimizing another function that uses the coefficient of the inner optimization (the OLS regression).
So in my parent process, I have two functions that I minimize, and the second function calls on the first and does a new optimization for every iteration of the second function's optimization. The child processes have a nested while loop for those two optimizations.
Hopefully that all makes sense. If more information is needed, please let me know.
Here is the relevant code for the parent process:
comm = MPI.COMM_SELF.Spawn(sys.executable,args = ['IVQTparallelSlave_cdf.py'],maxprocs=processes)
# First stage: reg D on Z, X
def OLS(betaguess):
    comm.Bcast([betaguess,MPI.DOUBLE], root=MPI.ROOT)
    SSE = np.array([0],dtype='d')
    comm.Reduce(None,[SSE,MPI.DOUBLE], op=MPI.SUM, root = MPI.ROOT)
    comm.Bcast([np.array([1],'i'),MPI.INT], root=MPI.ROOT)
    return SSE
# Here is the CDF function.
def CDF(yguess, delta_FS, tau):
    # Calculate W(y) in the slave process
    # Solving the Reduced form after every iteration: reg W(y) on Z, X
    comm.Bcast([yguess,MPI.DOUBLE], root=MPI.ROOT)
    betaguess = np.zeros(94).astype('d')
    ###########
    # This calculates the reduced form coefficient
    coeffs_RF = scipy.minimize(OLS,betaguess,method='Powell')
    # This little block is to get the slave processes to stop
    comm.Bcast([betaguess,MPI.DOUBLE], root=MPI.ROOT)
    SSE = np.array([0],dtype='d')
    comm.Reduce(None,[SSE,MPI.DOUBLE], op=MPI.SUM, root = MPI.ROOT)
    cont = np.array([0],'i')
    comm.Bcast([cont,MPI.INT], root=MPI.ROOT)
    ###########
    contCDF = np.array([1],'i')
    comm.Bcast([contCDF,MPI.INT], root=MPI.ROOT) # This is to keep the outer while loop going
    delta_RF = coeffs_RF.x[1]
    return abs(delta_RF/delta_FS - tau)
########### This one finds Y(1) ##############
betaguess = np.zeros(94).astype('d')
######### First Stage: reg D on Z, X #########
coeffs_FS = scipy.minimize(OLS,betaguess,method='Powell')
print coeffs_FS
# This little block is to get the slave processes' while loops to stop
comm.Bcast([betaguess,MPI.DOUBLE], root=MPI.ROOT)
SSE = np.array([0],dtype='d')
comm.Reduce(None,[SSE,MPI.DOUBLE], op=MPI.SUM, root = MPI.ROOT)
cont = np.array([0],'i')
comm.Bcast([cont,MPI.INT], root=MPI.ROOT)
delta_FS = coeffs_FS.x[1]
######### CDF Function #########
yguess = np.array([3340],'d')
CDF1 = lambda yguess: CDF(yguess, delta_FS, tau)
y_minned_1 = scipy.minimize(CDF1,yguess,method='Powell')
Here is the relevant code for the child processes:
#IVQTparallelSlave_cdf.py
comm = MPI.Comm.Get_parent()
.
.
.
# Importing data. The data is the matrices D, and ZX
.
.
.
########### This one finds Y(1) ##############
######### First Stage: reg D on Z, X #########
cont = np.array([1],'i')
betaguess = np.zeros(94).astype('d')
# This corresponds to 'coeffs_FS = scipy.minimize(OLS,betaguess,method='Powell')' of the parent process
while cont[0]:
    comm.Bcast([betaguess,MPI.DOUBLE], root=0)
    SSE = np.array(((D - np.dot(ZX,betaguess).reshape(local_n,1))**2).sum(),'d')
    comm.Reduce([SSE,MPI.DOUBLE],None, op=MPI.SUM, root = 0)
    comm.Bcast([cont,MPI.INT], root=0)
if rank==0: print '1st Stage OLS regression done'
######### CDF Function #########
cont = np.array([1],'i')
betaguess = np.zeros(94).astype('d')
contCDF = np.array([1],'i')
yguess = np.array([0],'d')
# This corresponds to 'y_minned_1 = scipy.minimize(CDF1,yguess,method='Powell')'
while contCDF[0]:
    comm.Bcast([yguess,MPI.DOUBLE], root=0)
    # This calculates the reduced form coefficient
    while cont[0]:
        comm.Bcast([betaguess,MPI.DOUBLE], root=0)
        W = 1*(Y<=yguess)*D
        SSE = np.array(((W - np.dot(ZX,betaguess).reshape(local_n,1))**2).sum(),'d')
        comm.Reduce([SSE,MPI.DOUBLE],None, op=MPI.SUM, root = 0)
        comm.Bcast([cont,MPI.INT], root=0)
        #if rank==0: print cont
    comm.Bcast([contCDF,MPI.INT], root=0)
My problem is that after one iteration through the outer minimization, it spits out the following error:
Internal Error: invalid error code 409e0e (Ring ids do not match) in MPIR_Bcast_impl:1328
Traceback (most recent call last):
File "IVQTparallelSlave_cdf.py", line 100, in <module>
if rank==0: print 'CDF iteration'
File "Comm.pyx", line 406, in mpi4py.MPI.Comm.Bcast (src/mpi4py.MPI.c:62117)
mpi4py.MPI.Exception: Other MPI error, error stack:
PMPI_Bcast(1478).....: MPI_Bcast(buf=0x2409f50, count=1, MPI_INT, root=0, comm=0x84000005) failed
MPIR_Bcast_impl(1328):
I haven't been able to find any information about this "ring id" error or how to fix it. Help would be much appreciated. Thanks!
