Why is SharedArray not correct after operation? - parallel-processing

I have the following simple code to understand the use of the @distributed macro, as described in the docs https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1:
using Distributed
using SharedArrays
using DelimitedFiles
f(x) = x^2
t = collect(-1:0.001:1)
y = SharedArray{Float64}(size(t,1))
@distributed for i in 1:size(t,1)
y[i] = f(t[i])
end
file = open("foo.dat", "w")
writedlm(file, [t, y])
close(file)
But when I open the file with data = readdlm("foo.dat"), the y values are all zero. Interestingly enough, if I run this in a Jupyter notebook and the file-writing section,
file = open("foo.dat", "w")
writedlm(file, [t, y])
close(file)
is in a different cell, then the file contains the correct content. The same holds in the REPL, where running the file-writing commands separately works fine. Additionally, if the above code is in a script, foo.dat is also incorrect unless I do something with y before the writedlm command. For example, with println(y) before writedlm(file, [t, y]), foo.dat contains the correct contents. Is there something I'm not doing right? There seems to be a workaround of simply doing something with y before writing it to file, but it looks like a weird bug, and I'm wondering if anyone has any suggestions, or whether this should be brought up as an issue on GitHub.

The @distributed macro launches the distributed computations asynchronously, using green threads (tasks) to control them. You should wait until they complete before processing the data further (e.g. writing it to a file).
Hence your loop should look like this:
@sync @distributed for i in 1:size(t,1)
y[i] = f(t[i])
end
Additionally, your code does not spawn any worker processes.
For example, to add two workers you could run:
addprocs(2)
But then you will notice that your @distributed loop crashes, because f needs to be defined on all worker processes, not just on the master. Hence, your code should also include:
@everywhere f(x) = x^2
The above line should be after the addprocs command.
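Putting the pieces together, a complete version of the script could look like this (a sketch of the approach described above):
using Distributed
addprocs(2)                        # spawn two worker processes

@everywhere using SharedArrays     # SharedArrays must be available on every process
@everywhere f(x) = x^2             # ...and so must f

using DelimitedFiles

t = collect(-1:0.001:1)
y = SharedArray{Float64}(size(t, 1))

@sync @distributed for i in 1:size(t, 1)   # @sync blocks until all iterations have finished
    y[i] = f(t[i])
end

open("foo.dat", "w") do file
    writedlm(file, [t, y])
end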
Happy distributed computing!

Related

Parallel for loop in Julia

I am aware there are a multitude of questions about running parallel for loops in Julia, using @threads, @distributed, and other methods. I have tried to implement the solutions there with no luck. The structure of what I'd like to do is as follows.
for index in list_of_indices
data = h5read("data_set_$index.h5")
result = perform_function(data)
save(result)
end
The data sets are independent, and no part of this loop depends on any other. It seems this should be parallelizable.
I have tried, e.g.,
"#threads for index in list_of_indices..." and I get a segmentation error
"#distributed for index in list_of_indices..." and the code does not actually perform the function on my data.
I assume I'm missing something about how parallel processes work, and any insight would be appreciated.
Here is a MWE:
Assume we have files data_1.h5, data_2.h5, data_3.h5 in our working directory. (I don't know how to make things more self-contained than this because I think the problem is arising from asking multiple threads to read files.)
using Distributed
using HDF5
list = [1,2,3]
Threads.@threads for index in list
data = h5read("data_$index.h5", "data")
println(data)
end
The error I get is
signal (11): Segmentation fault
signal (6): Aborted
Allocations: 1587194 (Pool: 1586780; Big: 414); GC: 1
Segmentation fault (core dumped)
As noted by other people, there are not enough details. However, given the current state of information, the safest code with the highest chance of working is:
using Distributed
addprocs(4)
#everywhere using HDF5
list = [1,2,3]
@sync @distributed for index in list
data = h5read("data_$index.h5", "data")
println(data)
end
The distributed approach separates the processes completely, and hence you have a much lower chance of doing something wrong (e.g. using a library with a shared resource, etc.).
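If you need the results collected back on the master process rather than just printed, a pmap-based variant could look like this (a sketch, assuming each file contains a dataset named "data" as in the MWE):
using Distributed
addprocs(4)
@everywhere using HDF5

list = [1, 2, 3]

# pmap runs the anonymous function on the workers and gathers the return values on the master
results = pmap(list) do index
    h5read("data_$index.h5", "data")
end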

Julia: call a parallel loop in another file

Suppose I have these two files.
test_file.jl
using Distributed
function loop(N)
@distributed for i in 1:N
println(i)
end
end
test_file_call.jl
include("test_file.jl")
loop(10)
If I run julia -p 2 test_file_call.jl, I expect the function loop to be executed on different processes, printing out 10 numbers in an arbitrary order. However, this command doesn't print anything.
I'm not sure what I did wrong; it's just a simple loop. Is it possible to put a parallel loop in file A, write another file B that includes A and calls this loop, and then run B to execute the parallel loop from file A? This two-file structure is what I want. Is it doable?
The problem is that you forgot to @sync your loop, so the program exits before the loop actually has time to print anything.
Hence this should be:
function loop(N)
@sync @distributed for i in 1:N
println(i, " ",myid())
end
end
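With the two-file structure from the question, the corrected files would then be (a sketch; run it with julia -p 2 test_file_call.jl as before):
# test_file.jl
using Distributed
function loop(N)
    # @sync makes loop block until every iteration has run on the workers
    @sync @distributed for i in 1:N
        println(i, " ", myid())   # myid() shows which process printed the line
    end
end
# test_file_call.jl
include("test_file.jl")
loop(10)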

Julia: serialize error when sending large objects to workers

I am trying to send data from the master process to worker processes. I am able to do so just fine with relatively small pieces of data. But, as soon as they get above a certain size, I encounter a serialize error.
Is there a way to resolve this, or would I just need to break my objects down into smaller pieces and then reassemble them on the workers? If so, is there a good way to determine ahead of time the max size that I can send (which I suppose may be dependent upon system variables)? Below is code showing a transfer that works and one that fails. It's possible the sizes might need to be tinkered with to reproduce on other systems.
function sendto(p::Int; args...)
for (nm, val) in args
@spawnat(p, eval(Main, Expr(:(=), nm, val)))
end
end
X1 = rand(10^5, 10^3);
X2 = rand(10^6, 10^3);
sendto(2, X1 = X1) ## works fine
sendto(2, X2 = X2)
ERROR: write: invalid argument (EINVAL)
in yieldto at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
in wait at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
in stream_wait at /Applications/Julia-0.4.6.app/Contents/Resources/julia/lib/julia/sys.dylib
in uv_write at stream.jl:962
in buffer_or_write at stream.jl:982
in write at stream.jl:1011
in serialize_array_data at serialize.jl:164
in serialize at serialize.jl:181
in serialize at serialize.jl:127
in serialize at serialize.jl:310
in serialize_any at serialize.jl:422
in send_msg_ at multi.jl:222
in remotecall at multi.jl:726
in sendto at none:3
Note: I have plenty of system memory, even for two copies of the larger object, so the problem isn't a lack of memory.
This issue seems to be resolved now with Julia 0.5.
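For reference, the two-argument eval(Main, ex) used in the helper no longer exists on Julia 1.x; a sketch of the same idea in current syntax (not the original poster's code) would be:
using Distributed
addprocs(2)

# Assign each keyword argument as a global in Main on worker p.
# fetch() waits for the assignment, so any error surfaces immediately.
function sendto(p::Int; args...)
    for (nm, val) in args
        fetch(@spawnat(p, Core.eval(Main, Expr(:(=), nm, val))))
    end
end

X1 = rand(10^5, 10^3);
sendto(2, X1 = X1)
remotecall_fetch(isdefined, 2, Main, :X1)   # returns true once X1 exists on worker 2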

Effect of print("") in a while loop in julia

I encountered a strange bug in Julia. Essentially, adding a print("") statement in the right place noticeably changes the behavior of the following code (in a positive way). I am quite puzzled. Why?
xs = [1,2,3,4,5,6,7,8]
cmds = [`sleep $x` for x in xs]
f = open("results.txt", "w")
i = 1
nb_cmds = length(cmds)
max_running_ps = 3
nb_running_ps = 0
ps = Dict()
while true
# launching new processes if possible
if nb_running_ps < max_running_ps
if i <= nb_cmds && !(i in keys(ps))
print("spawn:")
println(i)
p = spawn(cmds[i], Base.DevNull, f, f)
setindex!(ps,p,i)
nb_running_ps = nb_running_ps + 1
i = i+1
end
end
# detecting finished processes to be able to launch new ones
for j in keys(ps)
if process_exited(ps[j])
print("endof:")
println(j)
delete!(ps,j)
nb_running_ps = nb_running_ps - 1
else
print("")
# why do I need that ????
end
end
# nothing runs and there is nothing to run
if nb_running_ps <= 0 && i > nb_cmds
break
end
end
close(f)
println("finished")
(Indeed, the commands are in fact more useful than sleep.)
If the print("") is removed or commented, the content of the conditional "if process_exited(ps[j])" seems to never run, and the program runs into an infinite while loop even though the first max_running_ps processes have finished.
Some background: I need to run a piece of code which takes quite a long time (and uses a lot of memory) for different values of a set of parameters (represented here by x). As the runs take a long time, I want to run them in parallel. On the other hand, there is almost nothing to share between the different runs, so the usual parallel tools are not really relevant. Finally, I want to avoid a simple pmap, first to avoid losing everything if there is a failure, and second because it may be useful to have partial results during the run. Hence this code (written in Julia because the main code doing the actual computations is in Julia).
This is not a bug, though it might be surprising if you are not used to this kind of asynchronous programming.
Julia is single-threaded by default and only one task will run at once. And for a task to finish, Julia needs to switch to it. Tasks are switched whenever the current task yields.
print is also an asynchronous operation, so it yields for you, but a simpler way to achieve the same result is to call yield(). For more information on asynchronous programming, see the documentation.
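Concretely, the inner loop from the question can drop the print("") and yield explicitly; a minimal sketch of just that part:
for j in keys(ps)
    if process_exited(ps[j])
        print("endof:")
        println(j)
        delete!(ps, j)
        nb_running_ps = nb_running_ps - 1
    else
        yield()   # give the scheduler a chance to run the tasks that track the child processes
    end
end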

Self-restarting MathKernel - is it possible in Mathematica?

This question comes from the recent question "Correct way to cap Mathematica memory use?"
I wonder, is it possible to programmatically restart MathKernel, keeping the current FrontEnd process connected to the new MathKernel process and evaluating some code in the new MathKernel session? I mean a "transparent" restart which allows the user to continue working with the FrontEnd while having a fresh MathKernel process with some code from the previous kernel evaluated (or evaluating) in it?
The motivation for the question is to have a way to automate restarting MathKernel when it takes up too much memory, without breaking the computation. In other words, the computation should continue automatically in the new MathKernel process without user interaction (while keeping the user's ability to interact with Mathematica as before). The details of what code should be evaluated in the new kernel are of course specific to each computational task; I am looking for a general way to automatically continue the computation.
From a comment by Arnoud Buzing yesterday, on Stack Exchange Mathematica chat, quoting entirely:
In a notebook, if you have multiple cells you can put Quit in a cell by itself and set this option:
SetOptions[$FrontEnd, "ClearEvaluationQueueOnKernelQuit" -> False]
Then if you have a cell above it and below it and select all three and evaluate, the kernel will Quit but the frontend evaluation queue will continue (and restart the kernel for the last cell).
-- Arnoud Buzing
The following approach runs one kernel to open a front-end with its own kernel, which is then closed and reopened, renewing the second kernel.
This file is the MathKernel input, C:\Temp\test4.m
Needs["JLink`"];
$FrontEndLaunchCommand="Mathematica.exe";
UseFrontEnd[
nb = NotebookOpen["C:\\Temp\\run.nb"];
SelectionMove[nb, Next, Cell];
SelectionEvaluate[nb];
];
Pause[8];
CloseFrontEnd[];
Pause[1];
UseFrontEnd[
nb = NotebookOpen["C:\\Temp\\run.nb"];
Do[SelectionMove[nb, Next, Cell],{12}];
SelectionEvaluate[nb];
];
Pause[8];
CloseFrontEnd[];
Print["Completed"]
The demo notebook, C:\Temp\run.nb contains two cells:
x1 = 0;
Module[{},
While[x1 < 1000000,
If[Mod[x1, 100000] == 0, Print["x1=" <> ToString[x1]]]; x1++];
NotebookSave[EvaluationNotebook[]];
NotebookClose[EvaluationNotebook[]]]
Print[x1]
x1 = 0;
Module[{},
While[x1 < 1000000,
If[Mod[x1, 100000] == 0, Print["x1=" <> ToString[x1]]]; x1++];
NotebookSave[EvaluationNotebook[]];
NotebookClose[EvaluationNotebook[]]]
The initial kernel opens a front-end and runs the first cell, then it quits the front-end, reopens it and runs the second cell.
The whole thing can be run either by pasting (in one go) the MathKernel input into a kernel session, or it can be run from a batch file, e.g. C:\Temp\RunTest2.bat
#echo off
setlocal
PATH = C:\Program Files\Wolfram Research\Mathematica\8.0\;%PATH%
echo Launching MathKernel %TIME%
start MathKernel -noprompt -initfile "C:\Temp\test4.m"
ping localhost -n 30 > nul
echo Terminating MathKernel %TIME%
taskkill /F /FI "IMAGENAME eq MathKernel.exe" > nul
endlocal
It's a little elaborate to set up, and in its current form it depends on knowing how long to wait before closing and restarting the second kernel.
Perhaps the parallel computation machinery could be used for this? Here is a crude set-up that illustrates the idea:
Needs["SubKernels`LocalKernels`"]
doSomeWork[input_] := {$KernelID, Length[input], RandomReal[]}
getTheJobDone[] :=
Module[{subkernel, initsub, resultSoFar = {}}
, initsub[] :=
( subkernel = LaunchKernels[LocalMachine[1]]
; DistributeDefinitions["Global`"]
)
; initsub[]
; While[Length[resultSoFar] < 1000
, DistributeDefinitions[resultSoFar]
; Quiet[ParallelEvaluate[doSomeWork[resultSoFar], subkernel]] /.
{ $Failed :> (Print@"Ouch!"; initsub[])
, r_ :> AppendTo[resultSoFar, r]
}
]
; CloseKernels[subkernel]
; resultSoFar
]
This is an over-elaborate setup to generate a list of 1,000 triples of numbers. getTheJobDone runs a loop that continues until the result list contains the desired number of elements. Each iteration of the loop is evaluated in a subkernel. If the subkernel evaluation fails, the subkernel is relaunched. Otherwise, its return value is added to the result list.
To try this out, evaluate:
getTheJobDone[]
To demonstrate the recovery mechanism, open the Parallel Kernel Status window and kill the subkernel from time to time. getTheJobDone will feel the pain and print Ouch! whenever the subkernel dies. However, the overall job continues and the final result is returned.
The error-handling here is very crude and would likely need to be bolstered in a real application. Also, I have not investigated whether really serious error conditions in the subkernels (like running out of memory) would have an adverse effect on the main kernel. If so, then perhaps subkernels could kill themselves if MemoryInUse[] exceeded a predetermined threshold.
Update - Isolating the Main Kernel From Subkernel Crashes
While playing around with this framework, I discovered that any use of shared variables between the main kernel and subkernel rendered Mathematica unstable should the subkernel crash. This includes the use of DistributeDefinitions[resultSoFar] as shown above, and also explicit shared variables using SetSharedVariable.
To work around this problem, I transmitted the resultSoFar through a file. This eliminated the synchronization between the two kernels with the net result that the main kernel remained blissfully unaware of a subkernel crash. It also had the nice side-effect of retaining the intermediate results in the event of a main kernel crash as well. Of course, it also makes the subkernel calls quite a bit slower. But that might not be a problem if each call to the subkernel performs a significant amount of work.
Here are the revised definitions:
Needs["SubKernels`LocalKernels`"]
doSomeWork[] := {$KernelID, Length[Get[$resultFile]], RandomReal[]}
$resultFile = "/some/place/results.dat";
getTheJobDone[] :=
Module[{subkernel, initsub, resultSoFar = {}}
, initsub[] :=
( subkernel = LaunchKernels[LocalMachine[1]]
; DistributeDefinitions["Global`"]
)
; initsub[]
; While[Length[resultSoFar] < 1000
, Put[resultSoFar, $resultFile]
; Quiet[ParallelEvaluate[doSomeWork[], subkernel]] /.
{ $Failed :> (Print@"Ouch!"; CloseKernels[subkernel]; initsub[])
, r_ :> AppendTo[resultSoFar, r]
}
]
; CloseKernels[subkernel]
; resultSoFar
]
I have a similar requirement: I run a CUDAFunction in a long loop and CUDALink runs out of memory (similar to this: https://mathematica.stackexchange.com/questions/31412/cudalink-ran-out-of-available-memory). There is no improvement in the memory leak even with the latest Mathematica 10.4 version. I figured out a workaround and hope you may find it useful. The idea is to use a bash script to call a Mathematica program (run in batch mode) multiple times, passing parameters from the bash script. Here are the detailed instructions and a demo (this is for Windows):
To use a bash script on Windows you need to install Cygwin (https://cygwin.com/install.html).
Convert your Mathematica notebook to a package (.m) so that it can be used in script mode. If you save your notebook using "Save As...", all the commands will be converted to comments (this was noted by Wolfram Research), so it's better to create a package (File -> New -> Package) and then copy and paste your commands into it.
Write the bash script using the vi editor (instead of Notepad or gedit on Windows) to avoid the "\r" line-ending problem (http://www.linuxquestions.org/questions/programming-9/shell-scripts-in-windows-cygwin-607659/).
Here is a demo of the test.m file
str=$CommandLine;
len=Length[str];
Do[
If[str[[i]]=="-start",
start=ToExpression[str[[i+1]]];
Pause[start];
Print["Done in ",start," second"];
];
,{i,2,len-1}];
This Mathematica code reads the parameter from the command line and uses it for the calculation.
Here is the bash script (script.sh) to run test.m many times with different parameters.
#!c:\cygwin64\bin\bash
for ((i=2;i<10;i+=2))
do
math -script test.m -start $i
done
In the Cygwin terminal, type "chmod a+x script.sh" to make the script executable; then you can run it by typing "./script.sh".
You can programmatically terminate the kernel using Exit[]. The front end (notebook) will automatically start a new kernel when you next try to evaluate an expression.
Preserving "some code from the previous kernel" is going to be more difficult. You have to decide what you want to preserve. If you think you want to preserve everything, then there's no point in restarting the kernel. If you know what definitions you want to save, you can use DumpSave to write them to a file before terminating the kernel, and then use << to load that file into the new kernel.
On the other hand, if you know what definitions are taking up too much memory, you can use Unset, Clear, ClearAll, or Remove to remove those definitions. You can also set $HistoryLength to something smaller than Infinity (the default) if that's where your memory is going.
Sounds like a job for CleanSlate.
<< Utilities`CleanSlate`;
CleanSlate[]
From: http://library.wolfram.com/infocenter/TechNotes/4718/
"CleanSlate, tries to do everything possible to return the kernel to the state it was in when the CleanSlate.m package was initially loaded."
