Julia: call a parallel loop in another file

Suppose I have these two files.
test_file.jl
using Distributed
function loop(N)
    @distributed for i in 1:N
        println(i)
    end
end
test_file_call.jl
include("test_file.jl")
loop(10)
If I run julia -p 2 test_file_call.jl, I expect the function loop to execute on different processes, printing the 10 numbers in an arbitrary order. However, this command prints nothing.
I'm not sure what I did wrong; it's just a simple loop. Is it possible to put a parallel loop in file A, write another file B that includes A, and run B to execute the parallel loop from file A? This two-file structure is what I want. Is that doable?

The problem is that you forgot to @sync your loop, so the program exits before the loop has had time to print anything.
Hence this should be:
function loop(N)
    @sync @distributed for i in 1:N
        println(i, " ", myid())
    end
end
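Putting it together, a minimal working version of the two files (a sketch that keeps the question's structure) is:
test_file.jl
using Distributed

function loop(N)
    @sync @distributed for i in 1:N
        println(i, " ", myid())   # myid() shows which worker printed the line
    end
end
test_file_call.jl
include("test_file.jl")
loop(10)
Running julia -p 2 test_file_call.jl now prints the ten numbers in arbitrary order, each tagged with the id of the worker that handled it.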

Related

Why is SharedArray not correct after operation?

I have the following simple code to understand the use of the @distributed macro, as described in the docs https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1:
using Distributed
using SharedArrays
using DelimitedFiles
f(x) = x^2
t = collect(-1:0.001:1)
y = SharedArray{Float64}(size(t,1))
@distributed for i in 1:size(t,1)
    y[i] = f(t[i])
end
file = open("foo.dat", "w")
writedlm(file, [t, y])
close(file)
But when I open the file with data = readdlm("foo.dat"), the y values are all zero. Interestingly enough, if I run this in a Jupyter notebook and the file-writing section,
file = open("foo.dat", "w")
writedlm(file, [t, y])
close(file)
is in a different cell, then the file contains the correct contents. The same holds in the REPL, where running the write commands separately works fine. Additionally, if the above code is in a script, the foo.dat file is also incorrect unless something involving y appears before the writedlm command; for example, with println(y) before writedlm(file, [t, y]), foo.dat will contain the correct contents. Is there something I'm not doing right? There seems to be a workaround (simply doing something with y before writing it to file), but it looks like a weird bug, and I'm wondering if anyone has suggestions, or whether this is something that should be brought up as an issue on GitHub.
The macro @distributed launches the distributed computations asynchronously, using green threads (tasks) to control them. You should wait until they complete before processing the data further (e.g. writing it to a file).
Hence your loop should look like this:
@sync @distributed for i in 1:size(t,1)
    y[i] = f(t[i])
end
Additionally, your code does not spawn any worker processes.
For example, to add two workers you could run:
addprocs(2)
But then you will notice that the @distributed loop crashes, because f must be defined on all worker processes, not just on the master. Hence your code should contain:
@everywhere f(x) = x^2
The above line should be after the addprocs command.
Happy distributed computing!
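Putting the pieces together, the whole corrected script might look like this (a minimal sketch; addprocs(2) stands in for however many workers you want):
using Distributed
addprocs(2)                  # spawn the workers first

@everywhere f(x) = x^2       # define f on the master and on every worker

using SharedArrays
using DelimitedFiles

t = collect(-1:0.001:1)
y = SharedArray{Float64}(size(t,1))

@sync @distributed for i in 1:size(t,1)
    y[i] = f(t[i])           # @sync waits for all workers to finish
end

open("foo.dat", "w") do file
    writedlm(file, [t, y])   # y is now fully populated
end
The open(...) do ... end form is just a convenience that closes the file automatically; the question's explicit open/close works equally well once the loop is synchronized.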

Missing print-out from the MPI root process after it handles data reading alone

I'm writing a project that first designates the root process to read a large data file and do some calculations, and second broadcasts the calculated results to all other processes. Here is my code: (1) it reads random numbers from a txt file with n_sample=30000, (2) it generates the dens_ent matrix by some rule, and (3) it broadcasts the result to the other processes. By the way, I'm using OpenMPI with gfortran.
IF (myid==0) THEN
   OPEN(UNIT=8,FILE='rnseed_ent20.txt')
   DO i=1,n_sample
      DO j=1,3
         READ(8,*) rn(i,j)
      END DO
   END DO
   CLOSE(8)
END IF

dens_ent=0.0d0
DO i=1,n_sample
   IF (myid==0) THEN
      !Random draws of productivity and savings
      rn_zb=MC_JOINT_SAMPLE((/-0.1d0,mu_b0/),var,rn(i,1:2))
      iz=minloc(abs(log(zgrid)-rn_zb(1)),dim=1)
      ib=minloc(abs(log(bgrid(1:nb/2))-rn_zb(2)),dim=1) !Find the closest saving grid
      CALL SUB2IND(j,(/nb,nm,nk,nxi,nz/),(/ib,1,1,1,iz/))
      DO iixi=1,nxi
         DO iiz=1,nz
            CALL SUB2IND(jj,(/nb,nm,nk,nxi,nz/),(/policybmk_2_statebmk_index(j,:),iixi,iiz/))
            dens_ent(jj)=dens_ent(jj)+1.0d0/real(nxi)*markovian(iz,iiz)*merge(1.0d0,0.0d0,vent(j) .GE. -bgrid(ib)+ce)
            !Density only recorded if the value of entry is greater than b0+ce
         END DO
      END DO
   END IF
END DO

PRINT *, 'dingdongdingdong',myid
IF (myid==0) dens_ent=dens_ent/real(n_sample)*Mpo
IF (myid==0) PRINT *, 'sum_density by joint normal distribution',sum(dens_ent)
PRINT *, 'BLBLALALALALALA',myid
CALL MPI_BCAST(dens_ent,N,MPI_DOUBLE_PRECISION,0,MPI_COMM_WORLD,ierr)
Problems arise:
(1) IF (myid==0) PRINT *, 'sum_density by joint normal distribution',sum(dens_ent) seems not to be executed, as there is no print-out.
(2) I then verified this by adding messages such as PRINT *, 'BLBLALALALALALA',myid. Again, there is no print-out for the root process myid=0.
It seems like the root process is not working. How can that be? I'm quite confused. Is it because I'm not calling MPI_BARRIER before PRINT *, 'dingdongdingdong',myid?
Is it possible that you are missing the following statement at the very beginning of your code?
CALL MPI_COMM_RANK (MPI_COMM_WORLD, myid, ierr)
IF (ierr /= MPI_SUCCESS) THEN
   STOP "MPI_COMM_RANK failed!"
END IF
MPI_COMM_RANK returns into myid (if it succeeds) the identifier of the calling process within the MPI_COMM_WORLD communicator, i.e. a value between 0 and NP-1, where NP is the total number of MPI ranks.
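(For comparison with the Julia questions elsewhere on this page: the same rank query looks like this in Julia with the MPI.jl package. A minimal sketch, assuming MPI.jl is installed.)
using MPI

MPI.Init()
myid = MPI.Comm_rank(MPI.COMM_WORLD)   # a value between 0 and NP-1, as above
println("hello from rank ", myid)
MPI.Finalize()
In both languages the point is the same: myid only holds a meaningful rank after it has been explicitly queried from the communicator.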
Thanks for contributions from @cw21, @Harald and @Hristo Iliev.
The failure lies in unit numbering. One reference says:
unit number: This must be present and takes any integer type. Note this 'number' identifies the file and must be unique, so if you have more than one file open you must specify a different unit number for each file. Avoid using 0, 5 or 6, as these units are typically reserved by Fortran as follows.
– Standard Error = 0 : Used to print error messages to the screen.
– Standard In = 5 : Used to read in data from the keyboard.
– Standard Out = 6 : Used to print general output to the screen.
So I changed each unit number i to 1i, which did not work; I then changed them to 10i, and the code started to work. Mysteriously, as correctly pointed out by @Hristo Iliev, as long as the numbering avoids 0, 5 and 6, the code should behave properly. I cannot explain to myself why 1i did not work, but in any case the root process is now printing out results.

Effect of print("") in a while loop in Julia

I encountered a strange bug in Julia. Essentially, adding a print("") statement somewhere noticeably changes the behavior of the following code (in a positive way). I am quite puzzled. Why?
xs = [1,2,3,4,5,6,7,8]
cmds = [`sleep $x` for x in xs]
f = open("results.txt", "w")
i = 1
nb_cmds = length(cmds)
max_running_ps = 3
nb_running_ps = 0
ps = Dict()
while true
    # launching new processes if possible
    if nb_running_ps < max_running_ps
        if i <= nb_cmds && !(i in keys(ps))
            print("spawn:")
            println(i)
            p = spawn(cmds[i], Base.DevNull, f, f)
            setindex!(ps, p, i)
            nb_running_ps = nb_running_ps + 1
            i = i + 1
        end
    end
    # detecting finished processes to be able to launch new ones
    for j in keys(ps)
        if process_exited(ps[j])
            print("endof:")
            println(j)
            delete!(ps, j)
            nb_running_ps = nb_running_ps - 1
        else
            print("")
            # why do I need that ????
        end
    end
    # nothing runs and there is nothing to run
    if nb_running_ps <= 0 && i > nb_cmds
        break
    end
end
close(f)
println("finished")
(In reality, the commands do something more useful than sleep.)
If the print("") is removed or commented out, the body of the conditional if process_exited(ps[j]) seems never to run, and the program falls into an infinite while loop, even though the first max_running_ps processes have finished.
Some background: I need to run a piece of code that takes quite a long time (and uses a lot of memory) for different values of a set of parameters (represented here by x). As the runs take a long time, I want to run them in parallel. On the other hand, there is almost nothing to share between the different runs, so the usual parallel tools are not really relevant. Finally, I want to avoid a simple pmap, first to avoid losing everything if there is a failure, and second because it may be useful to have partial results during the run. Hence this code (written in Julia because the main code doing the actual computations is in Julia).
This is not a bug, though it might be surprising if you are not used to this kind of asynchronous programming.
Julia is single-threaded by default, and only one task runs at a time; for a task to make progress, Julia has to switch to it, and tasks are only switched when the current task yields.
print performs I/O, which can yield to the scheduler, so it incidentally fixes your loop; the simpler and more explicit way is to call yield(), which achieves the same result. For more information on asynchronous programming, see the documentation.
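A minimal sketch of the fixed loop, written for current Julia (where the old spawn(cmd, in, out, err) is expressed as run(pipeline(...); wait=false)); the explicit yield() replaces the mysterious print(""):
xs = [1,2,3,4,5,6,7,8]
f = open("results.txt", "w")
ps = Dict{Int,Base.Process}()
i = 1
while i <= length(xs) || !isempty(ps)
    # launch new processes while fewer than 3 are running
    while i <= length(xs) && length(ps) < 3
        println("spawn: ", i)
        ps[i] = run(pipeline(`sleep $(xs[i])`; stdout=f, stderr=f); wait=false)
        i += 1
    end
    # reap finished processes so new ones can start
    for j in collect(keys(ps))
        if process_exited(ps[j])
            println("endof: ", j)
            delete!(ps, j)
        end
    end
    yield()   # give the tasks that monitor the child processes a chance to run
end
close(f)
println("finished")
Without the yield() (or the print("") incidentally standing in for it), this busy loop never hands control to the scheduler, so the process states are never updated.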

Parallel execution of program within C++/CLI

I'm writing a Windows Forms program (C++/CLI) that calls an executable multiple times within a large 'for' loop. I want the calls to the executable to run in parallel, since a single run takes up to a minute.
The key part of the windows forms code is the large for loop (actually 2 loops):
for (int a = 0; a < 1000; a++) {
    for (int b = 0; b < 100; b++) {
        int run = a*100 + b;
        char startstr[50], configstr[50];
        strcpy(startstr, "solver.exe");
        sprintf(configstr, " %d %d %d", run, a, b);
        strcat(startstr, configstr);
        CreateProcessA(NULL, startstr, ......);
    }
}
The integers "run", "a" and "b" are used by the solver.exe program.
"Run" is used to write a unique output text file from each program run.
"a" and "b" are numbers used to read specific input text files. These are not unique to each run.
I'm not waiting after each call to "CreateProcess" as I want these to execute in parallel.
Currently my code runs and appears to work correctly. However, it spawns a huge number of instances of solver.exe at once, causing my computer to become very slow until everything finishes.
My question is, how can I create a queue that limits the number of concurrent processes (for example to the number of physical cores on the machine) so that they don't all try to run at the same time? Memory may also be an issue when the for loops are set larger.
A secondary question is, could potential concurrent file reads by different instances of solver.exe create a problem? (I can fix this but don't want to if I don't need to.)
I'm familiar with OpenMP and C, but this is my first attempt at running parallel processes in a Windows Forms program.
Thanks
I've managed to do what I want using the OpenMP "parallel for" directive to run the outer loop in parallel, and omp_set_num_threads() to set the number of concurrent processes. As suggested, the concurrent file reads haven't caused any problems on my system.
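For what it's worth, the same queueing idea is easy to express in Julia (the language of the earlier questions on this page). A minimal sketch, assuming a solver invocation of the form solver.exe run a b as in the question:
const MAX_PROCS = Sys.CPU_THREADS          # cap at the number of hardware threads
sem = Base.Semaphore(MAX_PROCS)

@sync for a in 0:999, b in 0:99
    run_id = a*100 + b
    @async begin
        Base.acquire(sem)                  # block until a slot is free
        try
            run(`solver.exe $run_id $a $b`)
        finally
            Base.release(sem)              # free the slot even if the run fails
        end
    end
end
Each @async task waits on the semaphore, so at most MAX_PROCS copies of solver.exe run at once, regardless of how large the loops are.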

OpenMP using Critical construct crashes my code

So I am writing a bit of parallel code in Fortran, but I need to use a critical block to prevent a race condition. Here's a bare-bones version of my code (it's an optimizer):
do i=2,8,2
   do j=1,9-i
      Ftemp=1.0e20 !A large number
      !$OMP parallel do default(shared) private(...variables...)
      do k=1,N
         ###Some code that calculates variable Fobj###
         !$OMP Critical
         !$OMP Flush(Ftemp,S,Fv) !Variables I want optimized
         if (Fobj.lt.Ftemp) then
            S=Stemp
            Fv=Ft
            Ftemp=Fobj
         end if
         !OMP Flush(Ftemp,S,Fv)
         !OMP end Critical
      end do !Line 122
      !$OMP end parallel do !Line 123
   end do
end do
So without OpenMP the code works fine. It also runs without the critical commands (the flush commands are fine). The errors I get are "unexpected END statement" on line 122 and "unexpected !$OMP end parallel do statement" on line 123. I have no idea why this won't work, as the critical block is fully contained inside the parallel loop, and there are no exit/goto statements that leave or enter either block; some gotos jump around the main part of the loop, but they never leave it or enter/bypass the critical block.
As Hristo Iliev points out in the comments: your closing directive !OMP end Critical is missing a $ right after the ! (and so does the second !OMP Flush line).
Both are therefore treated as plain comments and ignored, which leaves the critical construct unterminated and produces the "unexpected END statement" errors.
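The critical section here implements a classic "keep the best result" reduction. For reference, the same pattern in Julia (the language of the other questions above) would use a lock in place of the OpenMP critical construct; a minimal sketch with a stand-in objective function:
using Base.Threads

function best_objective(N)
    Ftemp = 1.0e20            # a large number, as in the Fortran code
    best_k = 0
    lk = ReentrantLock()      # plays the role of the critical section
    @threads for k in 1:N
        Fobj = (k - N/2)^2    # stand-in for the real objective computation
        lock(lk) do           # only one thread updates the best value at a time
            if Fobj < Ftemp
                Ftemp = Fobj
                best_k = k
            end
        end
    end
    return Ftemp, best_k
end
Start Julia with julia -t 4 (or similar) so the loop actually runs on several threads.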
