Given: 2 Ubuntu 16.04 machines with multiple CPU cores.
I want to execute multiple instances of program fixed_arg arg2 on the machines, passing one file name per call as arg2 to the program.
So far, working with xargs, this works on a single machine:
find . -iname "*.ext" -print | xargs -n1 -P12 program fixed_arg
(This will find all files with extension "ext" in the current directory (.), print one file per line (-print), and have xargs run program up to 12 times in parallel (-P12), passing only one argument as arg2 per call (-n1). Note the trailing white space at the end of the whole command.)
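If any file names contain spaces or newlines, the default whitespace splitting done by xargs will break them apart. A safer sketch of the same pipeline uses NUL-delimited names (program and fixed_arg stand for the asker's command as above):

```shell
# -print0 emits NUL-terminated names and -0 makes xargs split on NUL,
# so file names with spaces or newlines survive intact.
find . -iname "*.ext" -print0 | xargs -0 -n1 -P12 program fixed_arg
```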
I want to use multiple machines on which I installed the "mpich" package from the official Ubuntu 16.04 repositories.
I just do not know how to make mpiexec to run my program with only one argument on multiple machines.
I do know that mpiexec will accept a list of arguments, but my list will be in the range of 800 to 2000 files, which so far has been too long for any program.
Any help is appreciated.
You have chosen the wrong instrument (or give us more details about your target program). MPI (the mpich implementation, with its mpiexec and mpirun commands) is not for starting unrelated programs on multiple hosts. It is for starting one program, with exactly the same source code, in such a way that each copy knows how many copies exist (up to 100 thousand and more) and can do well-defined point-to-point and collective message passing between them. It is an instrument for parallelizing scientific codes, such as a computation over a huge array that cannot be computed on a single machine or does not even fit into its memory.
A better instrument for you would be GNU parallel (https://www.gnu.org/software/parallel/). And if you have only one or two machines, or this is just a few runs, it is easier to split your file list into two parts manually and run parallel or xargs on each machine (by hand, or over ssh using authorized_keys). I'll assume that all files are accessible from both machines at the same path (an NFS share or similar); no magic tool like MPI or GNU parallel will forward the files for you, though some modern batch-processing systems may:
find . -iname "*.ext" -print > list
l=$(wc -l < list)
sp=$((l/2))
split -l $sp list
cat xaa | xargs -n1 -P12 program fixed_arg &
cat xab | ssh SECOND_HOST xargs -n1 -P12 program fixed_arg &
wait
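With GNU coreutils, the wc/split arithmetic above can be collapsed into one step: split -n l/2 chunks the list into two pieces without breaking lines (a sketch, assuming GNU split; the output files are again xaa and xab):

```shell
# GNU split: -n l/2 means "2 line-aligned chunks", so no manual
# line counting is needed before splitting the list.
find . -iname "*.ext" -print > list
split -n l/2 list
```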
Or just learn about multi-host usage of GNU parallel: https://www.gnu.org/software/parallel/man.html
-S @hostgroup Distribute jobs to remote computers. The jobs will be run on a list of remote computers. GNU parallel will determine the number of CPU cores on the remote computers and run the number of jobs as specified by -j.
EXAMPLE: Using remote computers
It can also send files to the remote machine with the --transferfile filename option if you have no shared file system between the two Ubuntu machines.
Related
We have a large number of files in a directory which need to be processed by a program called process, which takes two arguments, infile and outfile.
We want to name the outfile after infile and add a suffix. E.g. for processing a single file, we would do:
process somefile123 somefile123-processed
How can we process all files at once from the command line?
(This is in a bash command line)
As @Cyrus says in the comments, the usual way to do that would be:
for f in *; do process "$f" "${f}-processed" & done
However, that may be undesirable and lead to a "thundering herd" of processes if you have thousands of files, so you might consider GNU Parallel which is:
more controllable,
can give you progress reports,
is easier to type
and by default runs one process per CPU core to keep them all busy. So, that would become:
parallel process {} {}-processed ::: *
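If GNU Parallel is not available, a rough xargs equivalent is possible (a sketch; -P4 is an assumed core count, and process is the program from the question):

```shell
# -I{} substitutes the file name at every occurrence of {}, so the same
# name can appear as both the input and the derived output argument;
# printf '%s\0' with xargs -0 keeps names with spaces intact.
printf '%s\0' * | xargs -0 -I{} -P4 process {} {}-processed
```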
On Linux or Mac, I regularly run several Python scripts (or other programs, for that matter) in parallel. A use case could be that the script runs a simulation based on random numbers, and I just want to run it many times to get good statistics.
An easy way of doing this on linux or Mac would be to use a for loop, and use an ampersand & to make the jobs run in parallel:
for i in {1..10}; do python script.py & done
Another use case would be that I want to run a script on some data stored in files. Say I have a bunch of .npy files with stored data and I want to process them all with the same script, running 4 jobs in parallel (since I have a 4-core CPU). I could use xargs:
ls *.npy | xargs -P4 -n1 python script.py
Are there equivalent ways of doing this on the Windows command line?
Parallel processing using xargs takes too much time (~8 hrs) on some servers
I have a script that scans an entire file system and does some processing on a selective bunch of files. I am using xargs to do this in parallel. I am using xargs rather than GNU parallel because I will have to run this script on hundreds of servers, and installing the utility on all of them is not an option.
All the servers have the below configuration
Architecture: x86_64
CPU(s): 24
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 2
I tried increasing the number of processes, but beyond a point that doesn't help. I read somewhere that if the script is I/O bound, it's better to keep the number of processes equal to the number of cores. Is that true?
find . -type f ! -empty -print0 | xargs -L1 -P 10 -0 "./process.sh"
I believe the above command will make my script I/O bound; is that right?
I have to scan the entire file system. How do I optimize the code so I can significantly reduce the processing time.
Also, my code only needs to handle parallel processing of files in a file system. Processing the servers in parallel is taken care of.
You need to find where your bottleneck is.
From your question it is unclear that you have found where your bottleneck is.
If it is the CPU, then you can use your 100 servers with GNU Parallel without installing GNU Parallel on all of them (are you, by the way, aware of parallel --embed, available since 20180322?).
You simply prefix the sshlogins with the number of CPU threads and a slash. So for 24 threads:
find ... |
parallel -S 24/server1,24/server2,24/server3 command
If your bottleneck is your disk, then using more servers will not help.
In that case it is better to get a faster disk (e.g. an SSD, mirrored disks, RAM disks, and similar).
The optimal number of threads to use on a disk can, in practice, not be predicted: it can only be measured. I have had a 40-spindle RAID system where the optimal number was 10 threads.
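One way to measure that sweet spot is to time the same job list at several -P values and stop where the wall-clock time stops improving (a sketch; ./process.sh and the find expression are from the question above):

```shell
# Time the pipeline at increasing parallelism levels; the best -P value
# is the smallest one after which elapsed time no longer drops.
for p in 2 4 8 16 24; do
  echo "== -P $p =="
  time (find . -type f ! -empty -print0 | xargs -0 -n1 -P "$p" ./process.sh)
done
```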
Let's say I have N Fortran executables and M cores on my machine, where N is greater than M. I want to be able to run these executables in parallel. I am using RHEL 6.9
I have used both OpenMP and GNU Parallel in the past to run code in parallel. However, for my current purposes, neither of these two options would work: RHEL doesn't ship GNU Parallel in its repositories, and OpenMP parallelizes blocks within a single executable, not multiple executables.
What is the best way to run these N executables in parallel? Would a simple approach like
executable_1 & executable_2 & ... & executable_N
work?
Just because it is not part of the official repository doesn't mean you cannot use GNU parallel on a RHEL system. Just build GNU parallel yourself or install a third-party RPM.
xargs supports parallel execution as well. Its interface is not ideal for your use case, but this should work:
echo executable_1 executable_2 ... executable_N | xargs -n1 -P8 bash -c
(-P8 means “run eight processes in parallel”.)
For more complex tasks, I sometimes write makefiles and use make -j8 to run targets in parallel.
Here I read
If no value is provided for the number of copies to execute (i.e., neither the "-np" nor its synonyms are provided on the command line), Open MPI will automatically execute a copy of the program on each process slot (see below for description of a "process slot").
So I would expect
mpirun program
to run eight copies of the program (actually a simple hello world), since I have an Intel® Core™ i7-2630QM CPU @ 2.00GHz × 8, but it doesn't: it simply runs a single process.
If you do not specify the number of processes to be used, mpirun tries to obtain them from the (specified or) default host file. From the corresponding section of the man page you linked:
If the hostfile does not provide slots information, a default of 1 is assumed.
Since you did not modify this file (I assume), mpirun will use one slot only.
On my machine, the default host file is located in
/etc/openmpi-x86_64/openmpi-default-hostfile
i7-2630QM is a 4-core CPU with two hardware threads per core. With computationally intensive programs, you are better off starting four MPI processes instead of eight.
Simply use mpiexec -n 4 ... as you do not need a hostfile for starting processes on the same node where mpiexec is executed.
Hostfiles are used when launching MPI processes on remote nodes. If you really need to create one, the following should do it:
hostname slots=4 max_slots=8
(replace hostname with the host name of the machine)
Run the program as
mpiexec -hostfile name_of_hostfile ...
max_slots=8 allows you to oversubscribe the node with up to eight MPI processes if your MPI program can make use of the hyperthreading. You can also set the environment variable OMPI_MCA_orte_default_hostfile to the full path of the hostfile instead of explicitly passing it each and every time as a parameter to mpiexec.
If you happen to be using a distributed resource manager like Torque, LSF, SGE, etc., then, if properly compiled, Open MPI integrates with the environment and builds a host and slot list from the reservation automatically.