Effect of print("") in a while loop in julia - parallel-processing

I encountered a strange bug in julia. Essentially, adding a print("") statement somewhere sensibly changes the behavior of the following code (in a positive way). I am quite puzzled. Why?
xs = [1,2,3,4,5,6,7,8]
cmds = [`sleep $x` for x in xs]
f = open("results.txt", "w")
i = 1
nb_cmds = length(cmds)
max_running_ps = 3
nb_running_ps = 0
ps = Dict()
while true
# launching new processes if possible
if nb_running_ps < max_running_ps
if i <= nb_cmds && !(i in keys(ps))
print("spawn:")
println(i)
p = spawn(cmds[i], Base.DevNull, f, f)
setindex!(ps,p,i)
nb_running_ps = nb_running_ps + 1
i = i+1
end
end
# detecting finished processes to be able to launch new ones
for j in keys(ps)
if process_exited(ps[j])
print("endof:")
println(j)
delete!(ps,j)
nb_running_ps = nb_running_ps - 1
else
print("")
# why do I need that ????
end
end
# nothing runs and there is nothing to run
if nb_running_ps <= 0 && i > nb_cmds
break
end
end
close(f)
println("finished")
(Indeed, the commands are in fact more useful than sleep.)
If the print("") is removed or commented, the content of the conditional "if process_exited(ps[j])" seems to never run, and the program runs into an infinite while loop even though the first max_running_ps processes have finished.
Some background: I need to run a piece of code which takes quite a long time to run (and uses a lot of memory), for different values of a set of parameters (represented here by x). As they take a long time, I want to run them in parallel. On the other hand, there is nearly nothing to share between the different runs, so the usual parallel tools are not really relevant. Finally, I want to avoid a simple pmap, first in order to avoid loosing everything if there is a failure, and second because it may be useful to have partial results during the run. Hence this code (written in julia because the main code doing actual computations is in julia).

This is not a bug, though it might be surprising if you are not used to this kind of asynchronous programming.
Julia is single-threaded by default and only one task will run at once. And for a task to finish, Julia needs to switch to it. Tasks are switched whenever the current task yields.
print is also an asynchronous operation and so it will yield for you, but a simpler way to do it is yield(), which achieves the same result. For more information on asynchronous programming, see the documentation.

Related

Why is SharedArray not correct after operation?

I have the following simple code to understand using the #distributed command, as in the docs https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1:
using Distributed
using SharedArrays
using DelimitedFiles
f(x) = x^2
t = collect(-1:0.001:1)
y = SharedArray{Float64}(size(t,1))
#distributed for i in 1:size(t,1)
y[i] = f(t[i])
end
file = open("foo.dat", "w")
writedlm(file, [t, y])
close(file)
But when I open the file data = readdlm("foo.dat"), the y values are all zero. Interestingly enough, if I were running a Jupyter notebook, and the file write section,
file = open("foo.dat", "w")
writedlm(file, [t, y])
close(file)
was in a different cell, then the file contains the correct content. This is consistent in the REPL where running the data write commands works fine. Additionally, if the above code is in a script, the foo.dat file is also incorrect unless I have something before the writedlm command dealing with y. For example, having println(y) before writedlm(file, [t, y]), then foo.dat will contain the correct contents. Is there something I'm not doing right? It seems that there is a workaround by simply doing something with y before writing it to file, but it just seems like a weird bug and I'm wondering if anyone has any suggestions, or if this is something that should be brought up as an issue on GitHub.
The macro #distributed is launching the distributed computations asynchronously using green threads to control them. You should wait until they complete before processing further the data (e.g. writing to a file).
Hence your loop should look like this:
#sync #distributed for i in 1:size(t,1)
y[i] = f(t[i])
end
Additionally, your code does not spawn any worker processes.
You could run for an example to add two workers:
addprocs(2)
But then you will notice that your #distributed loop crashes because your f function should be defined across all worker processes not just the master. Hence, your code should look like:
#everywhere f(x) = x^2
The above line should be after the addprocs command.
Happy distributed computing!

Ruby multithreading

I have code that is running in a manner similar to the sample code below. There are two threads that loop at certain time intervals. The first thread sets a flag and depending on the value of this flag the second thread prints out a result. My question is in a situation like this where only one thread is changing the value of the resource (#flag), and the second thread is only accessing its value but not changing it, is a mutex lock required? Any explanations?
Class Sample
def initialize
#flag=""
#wait_interval1 = 20
#wait_interval2 = 5
end
def thread1(x)
Thread.start do
loop do
if x.is_a?(String)
#flag = 0
else
#flag = 1
sleep #wait_interval
end
end
end
def thread2(y)
Thread.start do
loop do
if #flag == 0
if y.start_with?("a")
puts "yes"
else
puts "no"
end
end
end
end
end
end
As a general rule, the mutex lock is required (or better yet, a read/write lock so multiple reads can run in parallel and the exclusive lock's only needed when changing the value).
It's possible to avoid needing the lock if you can guarantee that the underlying accesses (both the read and the write) are atomic (they happen as one uninterruptible action so it's not possible for two to overlap). On modern multi-core and multi-processor hardware it's difficult to guarantee that, and when you add in virtualization and semi-interpreted languages like Ruby it's all but impossible. Don't be fooled into thinking that being 99.999% certain there won't be an overlap is enough, that just means you can expect an error due to lack of locking once every 100,000 iterations which translates to several times a second for your code and probably at least once every couple of seconds for the kind of code you'd see in a real application. That's why it's advisable to follow the general rule and not worry about when it's safe to break it until you've exhausted every other option for getting acceptable performance and shown through profiling that it's acquiring/releasing that lock that's the bottleneck.

Fortran issue with multiple cpus

I'm running some code written in fortan. It is made of several subroutines and I share variables among them using global variables specified in a module.
The problem occurs when using multiple cpus. In one subroutine the code should update a value of a local variable by the value of a global variable. It so happens that in some random passes though the subroutine the code does not update the variables when I run it using multiple cpus. However if I pause it and make it go up to force the code to pass in the piece of code that updates the variable it works! Magic! I've then implemented a loop that checks if the variable was updated and tries to go back using (GOTO's) in the code to make it update the variables.... but for 2 tries it still sometimes do not update the variables. If I run the code with only one core then it works fine.... Any ideas??
Thanks
Piece of code:
Subroutine1() !Where the variable A0 should be updated
nTries = 0
777 IF (nItems.NE.0) THEN
DO J = 1,nItems
IF (nint(mDATA(J,3)).EQ.nint(XCOORD+U1NE0)
& .AND. nint(mDATA(J,4)).EQ.nint(YCOORD+U2NE0) .AND.
2 nint(mDATA(J,5)).EQ.nint(ZCOORD+U3NE0)) THEN
A0 = mDATA(J,1)
JNODE = mDATA(J,2)
EXIT
ELSE
A0 = A02
ENDIF
ENDDO
IF (A0.EQ.ZERO) THEN !If the variable was not updated
IF (nTries.LE.2) THEN
nTries = nTries + 1
GOTO 777
ENDIF
write(6,*) "ZERO A0", IELEM, JTYPE
A0 = MAXT
ENDIF
I don't exactly know how Abaqus interacts with your FORTRAN subroutines, nor is it clear from the above code what is going wrong, but what you're running into seems to be a classical example of a "race condition," which what you're calling "one core going ahead of the other."
A general comment is that GOTOs and global variables are extremely dangerous in that they make programs very hard to reason about. These problems compound once you start parallelizing. If Abaqus is doing some kind of "black box" computation that it is responsible for parallelizing, you (as a user who is only preprocessing and postprocessing the data) should be insulated from this. However, from the above, it sounds like you're doing some stuff that is interleaved with the Abaqus parallel computation. In that case, you need to make sure everything you're doing is thread-safe. Among many other things, you absolutely need to make sure you're not writing to any global variables.
Another comment is that your checking of A0 is basically a lock called a "spinlock." This is one way of making things thread-safe, but locks have pitfalls of their own. If Abaqus doesn't give you a way to synchronize all of the threads and guarantee that it's done with its job, some sort of lock like this may be the way to go.

MPI : Wondering about process sequence and for-loops

I am trying to build something with MPI, so since i am not very familiar with it, i started with some arrays and printing stuff. I noticed that a plain C command (not an MPI one) works simultaneously on every process, i.e. printing something like that :
printf("Process No.%d",rank);
Them i noticed that the numbers of the processes got all scrambled and because the right sequence of the processes would fit me, i tried using a for-loop like that :
for(rank=0; rank<processes; rank++) printf("Process No.%d",rank);
And that started a third world war in my computer. Lots of strange errors in a strange format that i couldn't understand and that made me suspicious. How is it possible since an if-loop stating a ranks value , like the master rank:
if(rank==0) printf("Process No.%d",rank);
cant use a for-loop for the same reason. Well, that is my first question.
My second question is about an other for-loop i used, that it got ignored.
printf("PROCESS --------------->**%d**\n",id);
for (i = 0; i < PARTS; ++i){
printf("Array No.%d\n", i+1);
for (j = 0; j < MAXWORDS; ++j)
printf("%d, ",0);
printf("\n\n");
}
I run that for-loop and every process printed only the first line:
$ mpiexec -n 6 `pwd`/test
PROCESS --------------->**0**
PROCESS --------------->**1**
PROCESS --------------->**3**
PROCESS --------------->**2**
PROCESS --------------->**4**
PROCESS --------------->**5**
And not the following amount of zeros (there was an array there at first that i removed cause i was trying to figure out why it didn't get printed).
So, why is it about MPI and for-loops that don't get along?
--edit 1: grammar
--edit 2: Code paste
It is not the same as above, but same problem in the the last for-loop with fprintf.
This is a paste zone, sorry for that, i couldn't deal with the code system here
--edit 3: fixed
Well i finally figured it out. For first i have to say that the fprintf function when used inside MPI is a mess. Apparently there is a kind of overlap while every process writes in a text file. I tested it with the printf function and it worked. The second thing i was doing is, i was calling the MPI_Scatter function from inside root:
if(rank==root) MPI_Scatter();
..which only scatters the data inside the process and not the others.
Now that i have fixed those two issues, the program works as it should, apart a minor problem when i printf the my_list arrays. It seems like every array has a random number of inputs, but when i tested using a counter for every array, it's only the data that is printed like this. Tried using fflush(stdout); but it returned me an error.
usr/lib/gcc/x86_64-pc-linux-gnu/4.2.2/../../../../x86_64-pc-linux-gnu/bin/ld: final link failed: `Input/output error collect2: ld returned 1 exit status`
MPI in and of itself does not have a problem with for loops. However, just like with any other software, you should always remember that it will work the way you code it, not the way you intend it. It appears that you are having two distinct issues, both of which are only tangentially related to MPI.
The first issue is that the variable PARTS is defined in such a way that it depends on another variable, procs, which is not initialized at before-hand. This means that the value of PARTS is undefined, and probably ends up causing a divide-by-zero as often as not. PARTS should actually be set after line 44, when procs is initialized.
The second issue is with the loop for(i = 0; i = LISTS; i++) labeled /*Here is the problem*/. First of all, the test condition of the loop always sets i to the value of LISTS, regardless of the initial value of 0 and the increment at the end of the loop. Perhaps it was intended to be i < LISTS? Secondly, LISTS is initialized in a way that depends on PARTS, which depends on procs, before that variable is initialized. As with PARTS, LISTS must be initialized after the statement MPI_Comm_size(MPI_COMM_WORLD, &procs); on line 44.
Please be more careful when you write your loops. Also, make sure that you initialize variables correctly. I highly recommend using print statements (for small programs) or the debugger to make sure your variables are being set to the expected values.

Self-restarting MathKernel - is it possible in Mathematica?

This question comes from the recent question "Correct way to cap Mathematica memory use?"
I wonder, is it possible to programmatically restart MathKernel keeping the current FrontEnd process connected to new MathKernel process and evaluating some code in new MathKernel session? I mean a "transparent" restart which allows a user to continue working with the FrontEnd while having new fresh MathKernel process with some code from the previous kernel evaluated/evaluating in it?
The motivation for the question is to have a way to automatize restarting of MathKernel when it takes too much memory without breaking the computation. In other words, the computation should be automatically continued in new MathKernel process without interaction with the user (but keeping the ability for user to interact with the Mathematica as it was originally). The details on what code should be evaluated in new kernel are of course specific for each computational task. I am looking for a general solution how to automatically continue the computation.
From a comment by Arnoud Buzing yesterday, on Stack Exchange Mathematica chat, quoting entirely:
In a notebook, if you have multiple cells you can put Quit in a cell by itself and set this option:
SetOptions[$FrontEnd, "ClearEvaluationQueueOnKernelQuit" -> False]
Then if you have a cell above it and below it and select all three and evaluate, the kernel will Quit but the frontend evaluation queue will continue (and restart the kernel for the last cell).
-- Arnoud Buzing
The following approach runs one kernel to open a front-end with its own kernel, which is then closed and reopened, renewing the second kernel.
This file is the MathKernel input, C:\Temp\test4.m
Needs["JLink`"];
$FrontEndLaunchCommand="Mathematica.exe";
UseFrontEnd[
nb = NotebookOpen["C:\\Temp\\run.nb"];
SelectionMove[nb, Next, Cell];
SelectionEvaluate[nb];
];
Pause[8];
CloseFrontEnd[];
Pause[1];
UseFrontEnd[
nb = NotebookOpen["C:\\Temp\\run.nb"];
Do[SelectionMove[nb, Next, Cell],{12}];
SelectionEvaluate[nb];
];
Pause[8];
CloseFrontEnd[];
Print["Completed"]
The demo notebook, C:\Temp\run.nb contains two cells:
x1 = 0;
Module[{},
While[x1 < 1000000,
If[Mod[x1, 100000] == 0, Print["x1=" <> ToString[x1]]]; x1++];
NotebookSave[EvaluationNotebook[]];
NotebookClose[EvaluationNotebook[]]]
Print[x1]
x1 = 0;
Module[{},
While[x1 < 1000000,
If[Mod[x1, 100000] == 0, Print["x1=" <> ToString[x1]]]; x1++];
NotebookSave[EvaluationNotebook[]];
NotebookClose[EvaluationNotebook[]]]
The initial kernel opens a front-end and runs the first cell, then it quits the front-end, reopens it and runs the second cell.
The whole thing can be run either by pasting (in one go) the MathKernel input into a kernel session, or it can be run from a batch file, e.g. C:\Temp\RunTest2.bat
#echo off
setlocal
PATH = C:\Program Files\Wolfram Research\Mathematica\8.0\;%PATH%
echo Launching MathKernel %TIME%
start MathKernel -noprompt -initfile "C:\Temp\test4.m"
ping localhost -n 30 > nul
echo Terminating MathKernel %TIME%
taskkill /F /FI "IMAGENAME eq MathKernel.exe" > nul
endlocal
It's a little elaborate to set up, and in its current form it depends on knowing how long to wait before closing and restarting the second kernel.
Perhaps the parallel computation machinery could be used for this? Here is a crude set-up that illustrates the idea:
Needs["SubKernels`LocalKernels`"]
doSomeWork[input_] := {$KernelID, Length[input], RandomReal[]}
getTheJobDone[] :=
Module[{subkernel, initsub, resultSoFar = {}}
, initsub[] :=
( subkernel = LaunchKernels[LocalMachine[1]]
; DistributeDefinitions["Global`"]
)
; initsub[]
; While[Length[resultSoFar] < 1000
, DistributeDefinitions[resultSoFar]
; Quiet[ParallelEvaluate[doSomeWork[resultSoFar], subkernel]] /.
{ $Failed :> (Print#"Ouch!"; initsub[])
, r_ :> AppendTo[resultSoFar, r]
}
]
; CloseKernels[subkernel]
; resultSoFar
]
This is an over-elaborate setup to generate a list of 1,000 triples of numbers. getTheJobDone runs a loop that continues until the result list contains the desired number of elements. Each iteration of the loop is evaluated in a subkernel. If the subkernel evaluation fails, the subkernel is relaunched. Otherwise, its return value is added to the result list.
To try this out, evaluate:
getTheJobDone[]
To demonstrate the recovery mechanism, open the Parallel Kernel Status window and kill the subkernel from time-to-time. getTheJobDone will feel the pain and print Ouch! whenever the subkernel dies. However, the overall job continues and the final result is returned.
The error-handling here is very crude and would likely need to be bolstered in a real application. Also, I have not investigated whether really serious error conditions in the subkernels (like running out of memory) would have an adverse effect on the main kernel. If so, then perhaps subkernels could kill themselves if MemoryInUse[] exceeded a predetermined threshold.
Update - Isolating the Main Kernel From Subkernel Crashes
While playing around with this framework, I discovered that any use of shared variables between the main kernel and subkernel rendered Mathematica unstable should the subkernel crash. This includes the use of DistributeDefinitions[resultSoFar] as shown above, and also explicit shared variables using SetSharedVariable.
To work around this problem, I transmitted the resultSoFar through a file. This eliminated the synchronization between the two kernels with the net result that the main kernel remained blissfully unaware of a subkernel crash. It also had the nice side-effect of retaining the intermediate results in the event of a main kernel crash as well. Of course, it also makes the subkernel calls quite a bit slower. But that might not be a problem if each call to the subkernel performs a significant amount of work.
Here are the revised definitions:
Needs["SubKernels`LocalKernels`"]
doSomeWork[] := {$KernelID, Length[Get[$resultFile]], RandomReal[]}
$resultFile = "/some/place/results.dat";
getTheJobDone[] :=
Module[{subkernel, initsub, resultSoFar = {}}
, initsub[] :=
( subkernel = LaunchKernels[LocalMachine[1]]
; DistributeDefinitions["Global`"]
)
; initsub[]
; While[Length[resultSoFar] < 1000
, Put[resultSoFar, $resultFile]
; Quiet[ParallelEvaluate[doSomeWork[], subkernel]] /.
{ $Failed :> (Print#"Ouch!"; CloseKernels[subkernel]; initsub[])
, r_ :> AppendTo[resultSoFar, r]
}
]
; CloseKernels[subkernel]
; resultSoFar
]
I have a similar requirement when I run a CUDAFunction for a long loop and CUDALink ran out of memory (similar here: https://mathematica.stackexchange.com/questions/31412/cudalink-ran-out-of-available-memory). There's no improvement on the memory leak even with the latest Mathematica 10.4 version. I figure out a workaround here and hope that you may find it's useful. The idea is that you use a bash script to call a Mathematica program (run in batch mode) multiple times with passing parameters from the bash script. Here is the detail instruction and demo (This is for Window OS):
To use bash-script in Win_OS you need to install cygwin (https://cygwin.com/install.html).
Convert your mathematica notebook to package (.m) to be able to use in script mode. If you save your notebook using "Save as.." all the command will be converted to comments (this was noted by Wolfram Research), so it's better that you create a package (File->New-Package), then copy and paste your commands to that.
Write the bash script using Vi editor (instead of Notepad or gedit for window) to avoid the problem of "\r" (http://www.linuxquestions.org/questions/programming-9/shell-scripts-in-windows-cygwin-607659/).
Here is a demo of the test.m file
str=$CommandLine;
len=Length[str];
Do[
If[str[[i]]=="-start",
start=ToExpression[str[[i+1]]];
Pause[start];
Print["Done in ",start," second"];
];
,{i,2,len-1}];
This mathematica code read the parameter from a commandline and use it for calculation.
Here is the bash script (script.sh) to run test.m many times with different parameters.
#c:\cygwin64\bin\bash
for ((i=2;i<10;i+=2))
do
math -script test.m -start $i
done
In the cygwin terminal type "chmod a+x script.sh" to enable the script then you can run it by typing "./script.sh".
You can programmatically terminate the kernel using Exit[]. The front end (notebook) will automatically start a new kernel when you next try to evaluate an expression.
Preserving "some code from the previous kernel" is going to be more difficult. You have to decide what you want to preserve. If you think you want to preserve everything, then there's no point in restarting the kernel. If you know what definitions you want to save, you can use DumpSave to write them to a file before terminating the kernel, and then use << to load that file into the new kernel.
On the other hand, if you know what definitions are taking up too much memory, you can use Unset, Clear, ClearAll, or Remove to remove those definitions. You can also set $HistoryLength to something smaller than Infinity (the default) if that's where your memory is going.
Sounds like a job for CleanSlate.
<< Utilities`CleanSlate`;
CleanSlate[]
From: http://library.wolfram.com/infocenter/TechNotes/4718/
"CleanSlate, tries to do everything possible to return the kernel to the state it was in when the CleanSlate.m package was initially loaded."

Resources