Assign MPI cpus-per-proc individually - parallel-processing

I have a code that runs on 2 MPI processes, each of them using multiple threads with OpenMP. My computer (Ubuntu) has 4 CPUs (cores?). So I wrote a small script to give 2 CPUs per process so that both can run in parallel:
# Environment variables
export CPU_PER_PROC=2
export OMP_NUM_THREADS=${CPU_PER_PROC}
export OPTION="-map-by node:PE=${CPU_PER_PROC}"
mpiexec ${OPTION} -n 2 python3 main.py
But now I would like to give, for instance, 3 CPUs to process (rank) 0 and only 1 CPU to rank 1, so that:
Process 0: uses 3 CPUs with X threads (most likely 3, for better performance).
Process 1: uses 1 CPU with Y threads (most likely 1).

All MPI processes inherit the parent environment by default, which contains the OMP_NUM_THREADS variable. To adjust the value per rank, one solution is to change the environment variable in the child processes. In Python, you can do that using, for example:
import os
os.environ['OMP_NUM_THREADS'] = str(3)
This line must be executed before OpenMP is initialized (that is, before the first parallel directive runs). You can adjust the right-hand side according to the MPI rank.
An alternative solution is the omp_set_num_threads function call, but this is not so simple from Python code. Another solution is a num_threads clause on a parallel section, but again, this is generally awkward from Python code. A last alternative is to call a shell script that changes the environment before invoking the Python script, but such a script can hardly get the MPI rank (AFAIK there are shell variables for this, but they depend on the MPI implementation and are therefore not portable).
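A minimal sketch of the per-rank approach. The portable way to get the rank is mpi4py; the fallback environment variable OMPI_COMM_WORLD_RANK is Open MPI-specific (other implementations use different names), and the 3-threads-for-rank-0 mapping is just the example from the question:

```python
import os

# Per-rank thread counts (assumption from the question: rank 0 gets 3
# threads, every other rank gets 1).
THREADS_PER_RANK = {0: 3}

def rank_of_this_process():
    try:
        # Portable way: ask MPI itself (requires mpi4py).
        from mpi4py import MPI
        return MPI.COMM_WORLD.Get_rank()
    except ImportError:
        # Fallback: Open MPI's launcher exports this variable; it is
        # implementation-specific and defaults to 0 outside mpiexec.
        return int(os.environ.get("OMPI_COMM_WORLD_RANK", "0"))

rank = rank_of_this_process()
# Must happen before the first OpenMP parallel region, i.e. before
# importing or calling any OpenMP-backed code.
os.environ["OMP_NUM_THREADS"] = str(THREADS_PER_RANK.get(rank, 1))
print(f"rank {rank}: OMP_NUM_THREADS={os.environ['OMP_NUM_THREADS']}")
```

Launched as `mpiexec -n 2 python3 main.py`, rank 0 would then run its parallel regions on 3 threads and rank 1 on a single thread.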

Related

mpirun without options runs a program on one process only

Here I read
If no value is provided for the number of copies to execute (i.e.,
neither the "-np" nor its synonyms are provided on the command line),
Open MPI will automatically execute a copy of the program on each
process slot (see below for description of a "process slot")
So I would expect
mpirun program
to run eight copies of the program (actually a simple hello world), since I have an Intel® Core™ i7-2630QM CPU @ 2.00GHz × 8, but it doesn't: it simply runs a single process.
If you do not specify the number of processes to be used, mpirun tries to obtain them from the (specified or) default host file. From the corresponding section of the man page you linked:
If the hostfile does not provide slots information, a default of 1 is assumed.
Since you did not modify this file (I assume), mpirun will use one slot only.
On my machine, the default host file is located in
/etc/openmpi-x86_64/openmpi-default-hostfile
i7-2630QM is a 4-core CPU with two hardware threads per core. For computationally intensive programs, you are better off starting four MPI processes instead of eight.
Simply use mpiexec -n 4 ... as you do not need a hostfile for starting processes on the same node where mpiexec is executed.
Hostfiles are used when launching MPI processes on remote nodes. If you really need to create one, the following should do it:
hostname slots=4 max_slots=8
(replace hostname with the host name of the machine)
Run the program as
mpiexec -hostfile name_of_hostfile ...
max_slots=8 allows you to oversubscribe the node with up to eight MPI processes if your MPI program can make use of the hyperthreading. You can also set the environment variable OMPI_MCA_orte_default_hostfile to the full path of the hostfile instead of explicitly passing it each and every time as a parameter to mpiexec.
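Putting the above together, a sketch for a single local machine (the file name my_hostfile is a placeholder; the mpiexec line is shown commented since it needs your actual program):

```shell
# Create a hostfile for the local machine: 4 regular slots,
# up to 8 when oversubscribing with hyperthreads.
cat > my_hostfile <<'EOF'
localhost slots=4 max_slots=8
EOF

# Either pass it explicitly on each launch ...
#   mpiexec -hostfile my_hostfile -n 4 ./my_program
# ... or set it once as the default for this shell session:
export OMPI_MCA_orte_default_hostfile="$PWD/my_hostfile"
```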
If you happen to be using a distributed resource manager like Torque, LSF, SGE, etc., then, if properly compiled, Open MPI integrates with the environment and builds a host and slot list from the reservation automatically.

Which operating systems support passing -j options to sub-makes?

From the man page for gnu make:
The ‘-j’ option is a special case (see Parallel Execution). If you set
it to some numeric value ‘N’ and your operating system supports it
(most any UNIX system will; others typically won’t), the parent make
and all the sub-makes will communicate to ensure that there are only
‘N’ jobs running at the same time between them all. Note that any job
that is marked recursive (see Instead of Executing Recipes) doesn’t
count against the total jobs (otherwise we could get ‘N’ sub-makes
running and have no slots left over for any real work!)
If your operating system doesn’t support the above communication, then
‘-j 1’ is always put into MAKEFLAGS instead of the value you
specified. This is because if the ‘-j’ option were passed down to
sub-makes, you would get many more jobs running in parallel than you
asked for. If you give ‘-j’ with no numeric argument, meaning to run
as many jobs as possible in parallel, this is passed down, since
multiple infinities are no more than one.
Which common operating systems support or don't support this behavior?
And how can you tell if your os supports it?
To tell if your make supports this, run this command from your shell prompt:
echo 'all:;@echo $(filter jobserver,$(.FEATURES))' | make -f-
If it prints 'jobserver', then you have support; if it prints nothing, you do not. Or, if your OS doesn't support echo or pipelines, create a small makefile containing:
all:;@echo $(filter jobserver,$(.FEATURES))
then run make with that makefile.
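To see the shared job limit in action with a recursive build, the key point is to invoke sub-makes via $(MAKE), never a literal "make", so that '-j' and the jobserver are passed down. A throwaway two-level sketch (directory and target names are made up):

```shell
mkdir -p jsdemo/sub
# Top-level makefile: recurse with $(MAKE) so the jobserver is inherited.
echo 'all: ; $(MAKE) -C sub' > jsdemo/Makefile
# Sub-makefile with two independent targets that may build in parallel.
printf 'all: a b\na: ; @echo building a\nb: ; @echo building b\n' > jsdemo/sub/Makefile
# Both sub-targets share the parent's 4-job limit through the jobserver.
make -C jsdemo -j4
```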

mpirun OpenFOAM parallel app in gdb

I'm trying to step through an OpenFOAM application (in this case, icoFoam, but this question is in general for any OpenFOAM app).
I'd like to use gdb to step through an analysis running in parallel (let's say, 2 procs).
To simply launch the app in parallel, I type:
mpirun -np 2 icoFoam -parallel
Now I want to step through it in gdb. But I'm having trouble launching icoFoam in parallel and debugging it, since I can't figure out how to set a break point before the application begins to execute.
One thing I know I could do is insert a section of code after MPI_Init that waits (an endless loop) until I change some variable in gdb. Then I'd run the app in parallel, attach a gdb session to each of those PIDs, and happily debug. But I'd rather not have to alter the OpenFOAM source and recompile.
So, how can I start the application running in parallel, some how get it to stop (like at the beginning of main) and then step through it in gdb? All without changing the original source code?
Kindest regards,
Madeleine.
You could try this resource
It looks like the correct command is:
mpirunDebug -np 2 xxxFoam -parallel
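If mpirunDebug is not available, a common alternative that also avoids touching the source is to start every rank under its own gdb in a separate terminal window. A sketch (it assumes an X session with xterm and gdb installed; the wrapper name is made up):

```shell
# Wrapper that runs whatever command mpirun hands it under gdb,
# each rank in its own xterm window.
cat > debug_wrapper.sh <<'EOF'
#!/bin/sh
exec xterm -e gdb --args "$@"
EOF
chmod +x debug_wrapper.sh

# Launch 2 ranks; in each window set "break main", then "run".
#   mpirun -np 2 ./debug_wrapper.sh icoFoam -parallel
```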

How to run several commands in one PBS job submission

I have written a code that takes only 1-4 cpus. But when I submit a job on the cluster, I have to take at least one node with 16 cores per job. So I want to run several simulations on each node with each job I submit.
I was wondering if there is a way to submit the simulations in parallel in one job.
Here's an example:
My code takes 4 cpus. I submit a job for one node, and I want the node to run 4 instances of my code (each instance with different parameters) to use all 16 cores.
Yes, of course; generally such systems will have instructions for how to do this, like these.
If you have (say) 4x 4-cpu jobs that you know will each take the same amount of time, and (say) you want them to run in 4 different directories (so the output files are easier to keep track of), use the shell ampersand to run them each in the background and then wait for all background tasks to finish:
(cd jobdir1; myexecutable argument1 argument2) &
(cd jobdir2; myexecutable argument1 argument2) &
(cd jobdir3; myexecutable argument1 argument2) &
(cd jobdir4; myexecutable argument1 argument2) &
wait
(where myexecutable argument1 argument2 is just a placeholder for however you usually run your program; if you use mpiexec or something similar, that goes in there just as you'd normally use it. If you're using OpenMP, you can export the environment variable OMP_NUM_THREADS before the first line above.)
If you have a number of tasks that won't all take the same length of time, it's easiest to assign well more than the (say) 4 jobs above and let a tool like gnu parallel launch the jobs as necessary, as described in this answer.
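If GNU parallel is not installed on the cluster, plain xargs -P gives a similar effect (a sketch; the echo stands in for your real 4-cpu command, and 16/4 are the numbers from the question):

```shell
# Run 16 tasks, at most 4 at a time; xargs substitutes each parameter
# for {} and starts a new task as soon as one of the 4 slots frees up.
seq 1 16 | xargs -n1 -P4 -I{} sh -c 'echo "task {} done"'
```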

qsub for one machine?

A frequent problem I encounter is having to run some script with 50 or so different parameterizations. In the old days, I'd write something like (e.g.)
for i in `seq 1 50`
do
    ./myscript $i
done
In the modern era though, all my machines can handle 4 or 8 threads at once. The scripts aren't multithreaded, so what I want to be able to do is run 4 or 8 parameterizations at a time, and to automatically start new jobs as the old ones finish. I can rig up a haphazard system myself (and have in the past), but I suspect that there must be a linux utility that does this already. Any suggestions?
GNU parallel does this. With it, your example becomes:
parallel ./myscript ::: $(seq 1 50)
(Note: the moreutils parallel, which some distributions also ship under the name parallel, uses -- instead of ::: as the argument separator. Add -j 4 to cap the number of simultaneous jobs; by default GNU parallel runs one job per CPU core.)
