How to run a function in parallel with Julia language?

I am trying to figure out how to work with parallel computing in Julia. The documentation looks great, even for someone like me who has never worked with parallel computing (and who does not understand most of the concepts behind the documentation ;)).
Just to mention: I am working on a PC with Ubuntu. It has a 4-core processor.
To run the code I describe below, I start the Julia terminal as:
$ julia -p 4
I am following the documentation here. I am facing some problems with examples described in this section
I am trying to run the following piece of code:
@everywhere advection_shared_chunk!(q, u) = advection_chunk!(q, u, myrange(q)..., 1:size(q,3)-1)
function advection_shared!(q, u)
    @sync begin
        for p in procs(q)
            @async remotecall_wait(advection_shared_chunk!, p, q, u)
        end
    end
end
q = SharedArray(Float64, (500,500,500))
u = SharedArray(Float64, (500,500,500))
# Run once to JIT-compile
advection_shared!(q, u)
But I am facing the following error:
ERROR: MethodError: `remotecall_wait` has no method matching remotecall_wait(::Function, ::Int64, ::SharedArray{Float64,3}, ::SharedArray{Float64,3})
Closest candidates are:
remotecall_wait(::LocalProcess, ::Any, ::Any...)
remotecall_wait(::Base.Worker, ::Any, ::Any...)
remotecall_wait(::Integer, ::Any, ::Any...)
in anonymous at task.jl:447
...and 3 other exceptions.
in sync_end at ./task.jl:413
[inlined code] from task.jl:422
in advection_shared! at none:2
What am I doing wrong here? As far as I know I am just reproducing the example in the docs... or not?
Thanks for any help,
Thanks @Daniel Arndt, you found the trick! I was looking at a version of the docs that I thought was for Julia 0.4.x (the latest stable version so far), but it seems to be for Julia 0.5.x (the latest of all versions).
I made the changes you suggested (changed the argument order and added the functions that were missing) and everything worked like a charm. I will leave the updated code here:
# Here's the kernel
@everywhere function advection_chunk!(q, u, irange, jrange, trange)
    @show (irange, jrange, trange)  # display so we can see what's happening
    for t in trange, j in jrange, i in irange
        q[i,j,t+1] = q[i,j,t] + u[i,j,t]
    end
    q
end
# This function returns the (irange,jrange) indexes assigned to this worker
@everywhere function myrange(q::SharedArray)
    idx = indexpids(q)
    if idx == 0
        # This worker is not assigned a piece
        return 1:0, 1:0
    end
    nchunks = length(procs(q))
    splits = [round(Int, s) for s in linspace(0,size(q,2),nchunks+1)]
    1:size(q,1), splits[idx]+1:splits[idx+1]
end
@everywhere advection_shared_chunk!(q, u) = advection_chunk!(q, u, myrange(q)..., 1:size(q,3)-1)
function advection_shared!(q, u)
    @sync begin
        for p in procs(q)
            @async remotecall_wait(p, advection_shared_chunk!, q, u)
        end
    end
end
q = SharedArray(Float64, (500,500,500))
u = SharedArray(Float64, (500,500,500))
# Run once to JIT-compile
advection_shared!(q, u)

I don't believe you are doing anything wrong, other than you are likely using a newer version of the docs (or we're seeing different things!).
Let's make sure you're using Julia 0.4.x and these docs:
In Julia v0.5.0, the order of the first two parameters of remotecall_wait was changed. Switch the order to remotecall_wait(p, advection_shared_chunk!, q, u) and you should be on to your next error (myrange is not defined; its definition can be found earlier in the docs).


I can't get pool.map() or pool.starmap() to work the way I think it should

So for the life of me I can't figure out which combination of pool.map() or pool.starmap() should be used to make this work as I want. The code below executes correctly, firing off (sequentially!) two instances of runSim and appending the results.
results = []
argsToRun = []
for n in range(2):
    env = simpy.Environment()  # Create the SimPy environment
    env.clockRate = 1e9
    argsToRun.append([env, adist, sdist, tdist, 108, intSpeed, runUntil])
for arg_list in argsToRun:
    line = runSim(*arg_list)
    results.append(line)  # collect the result of each sequential run
I was under the impression that something like the code below would be a simple way to run these instances in parallel (the results are not order-dependent)... but neither of them seem to work. Surely I'm missing something obvious?
with multiprocessing.Pool() as pool:
    results = pool.map(runSim, argsToRun)
with multiprocessing.Pool() as pool:
    results = pool.starmap(runSim, argsToRun)
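As a minimal sketch of how the two calls differ: map hands each element of the iterable to the function as a single argument, while starmap unpacks each element into positional arguments. The names run_sim_stub, call_with_one_arg and args_to_run are hypothetical stand-ins, not the code from the question, and a ThreadPool is used only so the snippet runs without any pickling; multiprocessing.Pool exposes the same map and starmap methods.

```python
from multiprocessing.pool import ThreadPool


def run_sim_stub(rate, label):
    """Hypothetical stand-in for runSim: takes two positional arguments."""
    return "%s:%d" % (label, rate)


def call_with_one_arg(args):
    """Wrapper for map(): receives the whole tuple and unpacks it itself."""
    return run_sim_stub(*args)


args_to_run = [(10, "a"), (20, "b")]

with ThreadPool(2) as pool:
    # starmap unpacks each tuple: run_sim_stub(10, "a"), run_sim_stub(20, "b")
    starmap_results = pool.starmap(run_sim_stub, args_to_run)
    # map passes each tuple as ONE argument, so it needs the wrapper
    map_results = pool.map(call_with_one_arg, args_to_run)

print(starmap_results)
print(map_results)
```

With a function that takes several positional arguments, starmap is therefore the call that matches a list of argument tuples directly.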
After lots of frustration and searching, it turns out that some of my parameters were not pickleable. This, combined with a bug in VSCode that prevented the debugger from running, made it much harder to diagnose than it should have been.
Switching to the pathos versions of Pool() solved the problem.
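One way to spot this kind of problem earlier is to try pickling each argument before handing it to a pool, since multiprocessing.Pool relies on pickle to send arguments to worker processes. The sketch below assumes the standard pickle module; find_unpicklable and the sample arguments are hypothetical and not part of the original code.

```python
import pickle


def find_unpicklable(args):
    """Return (index, exception name) for every argument pickle rejects."""
    problems = []
    for i, arg in enumerate(args):
        try:
            pickle.dumps(arg)
        except (pickle.PicklingError, TypeError, AttributeError) as exc:
            # Which exception is raised depends on the kind of object.
            problems.append((i, type(exc).__name__))
    return problems


if __name__ == "__main__":
    # A generator object is a simple example of something pickle refuses.
    sample_args = (1e9, "label", (n for n in range(3)))
    for index, error in find_unpicklable(sample_args):
        print("argument %d is not picklable (%s)" % (index, error))
```

Running a check like this on one entry of the argument list points at the offending parameter without needing a debugger.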
for n in range(15):
    env = simpy.Environment()  # Create the SimPy environment
    argsToRun.append((env, adist, sdist, tdist, 108, intSpeed, runUntil))
with as pool: ## tada!
    results = pool.starmap(runSim, argsToRun)

Expected type '{__name__}', got '() -> None' instead

I have a question about my Python (3.6) code, or about the PyCharm IDE, on a MacBook.
I wrote a function that uses "timeit" to measure the time spent by another function:
import timeit

def timeit_func(func_name, num_of_round=1):
    print("start" + func_name.__name__ + "()")
    str_setup = "from __main__ import " + func_name.__name__
    print('%s() spent %f s' % (func_name.__name__,
                               timeit.timeit(func_name.__name__ + "()",
                                             setup=str_setup,
                                             number=num_of_round)))
    print(func_name.__name__ + "() finish")
The parameter "func_name" is simply the function to be tested, which has already been defined.
I call this function with the code:
if __name__ == "__main__":
    timeit_func(func_name=another_function)
The function works well, but PyCharm shows this message for the code "func_name=another_function":
Expected type '{__name__}', got '() -> None' instead
This inspection detects type errors in function call expressions. Due to dynamic dispatch and duck typing, this is possible in a limited but useful number of cases. Types of function parameters can be specified in docstrings or in Python 3 function annotations
I have googled "Expected type '{__name__}', got '() -> None'" but found nothing helpful. I am new to Python.
What does this message mean, and how can I make it disappear? Right now the code is highlighted, which makes me uncomfortable.
I use it in Python 3.6 by importing timeit. This is what I found in the timeit module for timeit.timeit():
def timeit(stmt="pass", setup="pass", timer=default_timer,
           number=default_number, globals=None):
    """Convenience function to create Timer object and call timeit method."""
    return Timer(stmt, setup, timer, globals).timeit(number)
Your parameter func_name is badly named because you are passing it a function, not the name of a function. This probably indicates the source of your confusion.
The error message is simply saying that PyCharm expects you to pass an object with an attribute __name__ but was given a function instead. Functions do have that attribute, but it is part of the internal detail, not something you normally need to access.
The simplest solution would be to work with the function directly. The documentation for timeit isn't very clear on this point, but you can actually give it a function (or any callable) instead of a string. So your code could be:
def timeit_func(func, num_of_round=1):
    print("start" + func.__name__ + "()")
    print('%s() spent %f s' % (func.__name__,
                               timeit.timeit(func, number=num_of_round)))
    print(func.__name__ + "() finish")
if __name__ == "__main__":
    timeit_func(another_function)
That at least makes the code slightly less confusing, as the parameter name now matches the value rather better. I don't use PyCharm, so I don't know whether it will still warn; that probably depends on whether it knows that timeit takes a callable.
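A further option, offered as an assumption rather than something verified in PyCharm, is to annotate the parameter as a callable so that a type checker has an explicit declaration to compare against. The sketch below uses typing.Callable; another_function is a hypothetical function to be timed, and the return value is added only to make the result easy to inspect.

```python
import timeit
from typing import Any, Callable


def another_function() -> None:
    """Hypothetical function to be timed."""
    sum(range(1000))


def timeit_func(func: Callable[[], Any], num_of_round: int = 1) -> float:
    """Time a zero-argument callable by passing it straight to timeit.timeit."""
    elapsed = timeit.timeit(func, number=num_of_round)
    print('%s() spent %f s' % (func.__name__, elapsed))
    return elapsed


if __name__ == "__main__":
    timeit_func(another_function, num_of_round=100)
```

The annotation states that func is something callable with no arguments, which is exactly what timeit.timeit needs when it is given a callable instead of a statement string.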
An alternative that should get rid of the error would be to make the code match your parameter name by actually passing in a function name:
def timeit_func(func_name, num_of_round=1):
    print("start" + func_name + "()")
    str_setup = "from __main__ import " + func_name
    print('%s() spent %f s' % (func_name,
                               timeit.timeit(func_name + "()",
                                             setup=str_setup,
                                             number=num_of_round)))
    print(func_name + "() finish")
if __name__ == "__main__":
    timeit_func("another_function")
This has the disadvantage that you can now only time functions defined in, and importable from, your main script, whereas if you actually pass the function to timeit you could use a function defined anywhere.
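The timeit signature quoted in the question also lists a globals parameter. As a sketch, passing a namespace dictionary there lets a statement string refer to a function by name even when that function cannot be imported from __main__. The helper names make_adder and time_by_name are hypothetical.

```python
import timeit


def make_adder(n):
    """Return a nested function, which cannot be imported from __main__ by name."""
    def add_n():
        return n + 1
    return add_n


def time_by_name(name, namespace, num_of_round=1):
    """Time the call name() with the statement evaluated inside namespace."""
    return timeit.timeit(name + "()", globals=namespace, number=num_of_round)


namespace = {"add_five": make_adder(5)}
elapsed = time_by_name("add_five", namespace, num_of_round=1000)
print('add_five() spent %f s' % elapsed)
```

This keeps the name-based interface while removing the need for the "from __main__ import ..." setup string.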

Building an UCS4 string buffer in python 2.7 ctypes

In an attempt to recreate the getenvironment(..) C function of _winapi.c (direct link) in plain python using ctypes, I'm wondering how the following C code could be translated:
buffer = PyMem_NEW(Py_UCS4, totalsize);
if (! buffer) {
    goto error;
}
p = buffer;
end = buffer + totalsize;
for (i = 0; i < envsize; i++) {
    PyObject* key = PyList_GET_ITEM(keys, i);
    PyObject* value = PyList_GET_ITEM(values, i);
    if (!PyUnicode_AsUCS4(key, p, end - p, 0))
        goto error;
    p += PyUnicode_GET_LENGTH(key);
    *p++ = '=';
    if (!PyUnicode_AsUCS4(value, p, end - p, 0))
        goto error;
    p += PyUnicode_GET_LENGTH(value);
    *p++ = '\0';
}
/* add trailing null byte */
*p++ = '\0';
It seems that the function ctypes.create_unicode_buffer(..) (doc, code) does something quite close, which I could reproduce if only I had access to the Py_UCS4 C type or could be sure how it relates to another type accessible from Python through ctypes.
Would c_wchar be a good candidate? It seems I can't make that assumption, since Python 2.7 can be compiled as a UCS-2 build if I'm right (source), and I guess Windows really expects UCS-4 there, even though ctypes.wintypes.LPWSTR appears to be an alias of c_wchar_p in CPython 2.7 (code).
For this question, it is safe to make the assumption that the target platform is python 2.7 on Windows if that helps.
Context (if it has some importance):
I'm delving into ctypes for the first time to attempt a plain Python fix for a CPython 2.7 bug that hits the Windows implementation of subprocess.Popen(..). This bug is a "won't fix". It prevents the use of unicode in command line calls (as executable name or arguments). This is fixed in Python 3, so I'm trying to reimplement in plain Python the actual CPython 3 implementation of the required CreateProcess(..) in _winapi.c, which in turn calls getenvironment(..).
This possible workaround was mentioned in the comments of this answer to a question related to subprocess.Popen(..) unicode issues.
This doesn't answer the part of the title about building a UCS4 buffer specifically. But it gives a partial answer to the question in bold and manages to create a unicode buffer that seems to work on my current Python 2.7 on Windows (so maybe UCS4 is not required).
So here we assume that c_wchar is what Windows requires (whether that is UCS4 or UCS2 is not yet clear to me, and it may not matter, but I admit I have little confidence in my knowledge here).
So here is the python code that reproduces the C code as requested in the question:
from ctypes import c_wchar
## creation of buffer of size totalsize
wenv = (c_wchar * totalsize)()
wenv.value = (unicode("").join([
    unicode("%s=%s\0") % (k, v)
    for k, v in env.items()])) + "\0"
This wenv can then be fed to CreateProcessW and this seems to work.
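For comparison, here is a minimal sketch of the same block layout ("key=value" entries, each followed by a null, plus one final null) written for Python 3, where str is already Unicode. The helper name build_env_block and the sample environment are hypothetical, and the sketch assumes every character fits in a single wchar_t.

```python
import ctypes


def build_env_block(env):
    """Build a double-null-terminated block of 'key=value' wide strings."""
    block = "".join("%s=%s\0" % (k, v) for k, v in env.items()) + "\0"
    buf = (ctypes.c_wchar * len(block))()
    # Copy character by character so the embedded nulls are kept explicitly.
    for i, ch in enumerate(block):
        buf[i] = ch
    return buf


wenv = build_env_block({"A": "1", "B": "two"})
# Slicing the array returns every element, including the null separators.
print(repr(wenv[:]))
```

Slicing with wenv[:] is used for inspection because the .value attribute of a c_wchar array stops at the first null character when read.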

Profiling parallel code Julia

How can we profile parallel code in julia? This question has been asked before. In fact, following the advice to call profile on each node does not work.
function profileSlaveTask(param)
    @profile slaveTask(param)
    return Profile.retrieve()
end
for i=1:length(machines)
    rrefs[i] = @spawnat machines[i] slaveTask(initdamp)
end
pres = fetch(rrefs[1])
using ProfileView
Using ProfileView I obtain:
Works just fine for me (julia 0.4.0-dev, Ubuntu 14.04):
p = addprocs(1)[1]
@everywhere function profile_svd(A)
    println("starting profile_svd")
    @profile svd(sdata(A))
    println("done with svd")
    Profile.retrieve()  # return the profile data (bt, lidict) to the caller
end
println("about to allocate SharedArray")
A = SharedArray(Float64,1000,1000)
println("about to fill SharedArray")
println("about to call worker")
bt, lidict = remotecall_fetch(p, profile_svd, A)

Using parfor and labSend/labReceive

I want to run two MATLAB scripts in parallel for a project and communicate between them. The purpose is to have one script do image analysis and send the results to the other, which will use them for more calculations (time consuming, but not related to the task of finding things in the images). Since both tasks are time consuming and should preferably be done in real time, I believe that parallelization is necessary.
To get a feel for how this should be done, I created a test script to find out how to communicate between the two scripts.
The first script takes user input using the built-in function input and then sends it with labSend to the other, which receives it and prints it.
function [blarg] = inputStuff(blarg)
mpiInit(); %added because of error message, but do not work...
for i=1:2
labBarrier; % added because of error message
inp = input('Enter a number to write');
if (inp == 0)
i = 1;
function [ blarg ] = testWrite( blarg )
mpiInit(); % added because of error message, but does not help
par = 0;
if ( blarg == 0)
par = 1;
for i = 1:10
if (par == 1)
delta = labReceive();
i = 1;
delta = input('Enter number to write');
if (delta == 0)
s = strcat('This lab no', num2str(labindex), '. Delta is = ')
%%This is the file test_parfor.m
funlist = {@inputStuff, @testWrite};
mpiInit(); % added because of error message, but does not help
parfor i=1:2
matlabpool close;
Then, when the code is run, the following error message appears:
Starting matlabpool using the 'local' profile ... connected to 2 labs.
Error using parallel_function (line 589)
The MPI implementation has not yet been loaded. Please
call mpiInit.
Error stack:
testWrite.m at 11
Error in test_parfor (line 8)
parfor i=1:2
Calling the method mpiInit does not help (it is called as shown in the code above).
And none of the examples that MathWorks has in the documentation or on their website shows this error or what to do about it.
Any help is appreciated!
You would typically use constructs such as labSend, labReceive and labBarrier within an spmd block, rather than a parfor block.
parfor is intended for implementing embarrassingly parallel algorithms, in other words algorithms that consist of multiple independent tasks that can be run in parallel, and do not require communication between tasks.
I'm stretching my knowledge here (perhaps someone more expert can correct me), but as I understand things, parfor does not set up an MPI ring for communication between workers, which is probably the explanation for the (rather uninformative) error message you're getting.
An spmd block enables communication between workers using labSend, labReceive and labBarrier. There are quite a few examples of using them all in the documentation.
Sam is right that the MPI functionality is not enabled during parfor, only during spmd. You need to do something more like this:
(Sam is also quite right that the error message you saw is pretty unhelpful)
