GNU parallel kills command in bash script on HPC - parallel-processing

I am running GNU parallel to run a bash script, but GNU parallel seems to kill my program automatically and I am not sure why. The script runs normally when I run it on its own.
I wonder why this happens and how to solve it.
Your help is really appreciated!
Here is my code:
parallel --progress --joblog ${home}/data/hsc-admmt/Projects/log_a.sh -j 5 :::: a.sh
Here is the message at the end of output of GNU parallel
/scratch/eu82/bt0689/data/hsc-admmt/Projects/sim_causal_mantel_generate.sh: line 54: 3050285 Killed $home/data/hsc-admmt/Tools/mtg2 -plink plink_all${nsamp}${nsnp}_1 -simreal snp.lst1

I just had the same problem and found this possible explanation:
“Some jobs need a lot of memory, and should only be started when there is enough memory free. Using --memfree GNU parallel can check if there is enough memory free. Additionally, GNU parallel will kill off the youngest job if the memory free falls below 50% of the size. The killed job is put back on the queue and retried later.”
(from https://www.gnu.org/software/parallel/parallel_tutorial.html).
In my case the killed job was not resumed. I'm not sure if this is the reason for your problem, but it would explain mine, since the error only occurs for me when I parallelize my script across more than 3 jobs.
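If memory pressure is the cause here too, one hedged workaround is to tell GNU parallel how much memory each job needs (--memfree) and to retry jobs that die (--retries). A sketch based on the command in the question, assuming roughly 2 GB per job; adjust the size to your workload:
# only start a new job when at least 2 GB of RAM is free; retry a killed/failed job up to 3 times
parallel --memfree 2G --retries 3 --progress --joblog ${home}/data/hsc-admmt/Projects/log_a.sh -j 5 :::: a.sh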

Related

How to utilise GNU parallel efficiently?

I have a script, say parallelise.sh, whose contents are 10 different python calls, shown below:
python3.8 script1.py
python3.8 script2.py
.
.
.
python3.8 script10.py
Now, I use GNU parallel
nohup parallel -j 5 < parallelise.sh &
It starts as expected; 5 different processors are being used and the first 5 scripts, script1.py ... script5.py, are running. Now I notice that some of them (say script1.py and script2.py) complete very fast, whereas the others need more time.
Now there are unused resources (2 processors) while waiting for the remaining 3 scripts (script3.py, script4.py, and script5.py) to complete, so that the next 5 can be loaded. Is there a way to use these resources by loading new commands as existing ones complete?
For information: My OS is CentOS
As @RenaudPacalet says, there is nothing else to do: with -j 5, GNU parallel already starts the next command as soon as one of the running jobs finishes.
So there is something in your scripts which causes this not to happen.
To help debug you can use:
parallel --lb --tag < parallelise.sh
and maybe add a "Starting X" line at the beginning of scriptX.py and a "Finishing X" line at the end, so you can see that the scripts are indeed finishing.
Without knowing anything about scriptX.py it is impossible to say what is causing this.
(Instead of nohup consider using tmux or screen, so the jobs can run in the background while you can still check in on them and see their output. nohup is not ideal for debugging.)
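If editing the Python scripts is inconvenient, you can also let GNU parallel print the markers for you by wrapping each line of the job file. A rough sketch (the marker text is arbitrary; eval is used because parallel quotes the substituted line before handing it to the shell):
# print a start/finish marker around every command in the job file; {} is the line itself
parallel --lb --tag 'echo Starting: {}; eval {}; echo Finished: {}' < parallelise.sh
With --tag every output line is prefixed with the command it came from, so interleaved output stays readable.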

Running bash scripts in parallel

I would like to run commands in parallel, so that if one fails, the whole job exits as a failure. More specifically, I have 5 aws sync commands that currently run sequentially; I would like them to run in parallel so that if any one of them fails, the whole job fails. How can I do that?
GNU Parallel is a really handy and powerful tool that works with anything you can run from bash.
http://www.gnu.org/software/parallel/
https://www.youtube.com/watch?v=OpaiGYxkSuQ
# run lines from a file, 8 at a time
cat commands.txt | parallel --eta -j 8 "{}"
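The question also asks for the whole job to fail if any one command fails; GNU parallel's --halt option handles that. A sketch, assuming the five aws sync commands are stored one per line in a file (sync_jobs.txt is a placeholder name):
# run the 5 syncs in parallel; on the first failure, kill the rest and exit non-zero
parallel -j 5 --halt now,fail=1 < sync_jobs.txt
Because parallel then exits with a non-zero status, the surrounding script or CI step fails as well.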

Julia Command Line Running Processes in Parallel

I have a Julia script that converts csvs to a binary format. Trust me, it's great. I also have many (seemingly innumerable) csvs that I want to process. It's a shared network, so I can only process five files at a clip without savagely burdening the CPU and making my coworkers irate and potentially unstable. Accordingly, I want to run the script in groups of five, wait for them to finish, and then run the next batch as background processes until it's Miller time, all using Julia's wonderful run() function, à la:
julia csvparse3.jl /home/file1.csv > /dev/null 2>&1 &
I'm fairly certain that I could sidestep all of this by using addprocs() and pmap() if I made my parsing script into a Julia module/function. However, the reason I'm asking this is because I don't know what I would then do if my original script was written in Fortran or even worse Python? Is there a way for me to achieve my aforementioned goals for an arbitrary number of external programs, ascertain when the processes are finished, and start anew in the context of a simple loop? Many thanks.
With GNU Parallel you can run:
parallel -j5 julia csvparse3.jl ::: /home/*.csv > /dev/null 2>&1
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs active and thus saving time.
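The same pattern works for any external program, which answers the Fortran/Python part of the question: GNU Parallel only needs the command line to run and does not care what language the program is written in. For example, with placeholder names parse.py and fortran_parser standing in for your own tools:
# batch an arbitrary external program over the csvs, 5 at a time
parallel -j5 python3 parse.py ::: /home/*.csv
parallel -j5 ./fortran_parser ::: /home/*.csv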
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

Queue using several processes to launch bash jobs

I need to run many (hundreds) commands in shell, but I only want to have a maximum of 4 processes running (from the queue) at once. Each process will last several hours.
When a process finishes I want the next command to be "popped" from the queue and executed.
I also want to be able to add more process after the beginning, and it will be great if I could remove some jobs from the queue, or at least empty the queue.
I have seen solutions using a makefile, but that only works if I have the full list of commands before starting. I also tried using mkfifo sjobq and other approaches, but I could never quite meet my needs...
Does anyone have code to solve this problem?
Edit: In response to Mark Setchell
The solution with tail -f and parallel is almost perfect, but when I do it, it always holds back the last 4 commands until I add more, and so on; I don't know why, and it is quite troublesome...
As for Redis, good solution also, but it takes more time to master all of it.
Thanks!
Use GNU Parallel to make a job queue like this:
# Clear out file containing job queue
> jobqueue
# Start GNU Parallel processing jobs from queue
# -k means "keep" output in order
# -j 4 means run 4 jobs at a time
tail -f jobqueue | parallel -k -j 4
# From another terminal, submit 40 jobs to the queue
for i in {1..40}; do echo "sleep 5;date +'%H:%M:%S Job $i'"; done >> jobqueue
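The question also asks about being able to stop the queue. One hedged way to get a clean shutdown is GNU parallel's -E option, which defines an end-of-file string: once that exact line is read from the queue, no further jobs are accepted. For example, reserving the line STOP as a sentinel:
# same queue runner as above, but it stops reading new jobs after the line "STOP"
tail -f jobqueue | parallel -k -j 4 -E STOP
# later, from another terminal: let the remaining jobs finish, then shut the queue down
echo STOP >> jobqueue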
Another option is to use REDIS - see my answer here Run several jobs parallelly and Efficiently

Making qsub block until job is done?

Currently, I have a driver program that runs several thousand instances of a "payload" program and does some post-processing of the output. The driver currently calls the payload program directly, using a shell() function, from multiple threads. The shell() function executes a command in the current working directory, blocks until the command is finished running, and returns the data that was sent to stdout by the command. This works well on a single multicore machine. I want to modify the driver to submit qsub jobs to a large compute cluster instead, for more parallelism.
Is there a way to make the qsub command output its results to stdout instead of a file and block until the job is finished? Basically, I want it to act as much like "normal" execution of a command as possible, so that I can parallelize to the cluster with as little modification of my driver program as possible.
Edit: I thought all the grid engines were pretty much standardized. If they're not and it matters, I'm using Torque.
You don't mention what queuing system you're using, but SGE supports the '-sync y' option to qsub which will cause it to block until the job completes or exits.
In TORQUE this is done using the -x and -I options. qsub -I specifies that the job should be interactive and -x says to run only the command specified. For example:
qsub -I -x myscript.sh
will not return until myscript.sh finishes execution.
In PBS you can use qsub -Wblock=true <command>
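To also get the job's output back on stdout, one driver-side pattern (a sketch only; payload.sh and payload.out are placeholder names, and -j oe merges stderr into the output file) is to block on qsub and then read the file it wrote:
# submit and block until the cluster job finishes (TORQUE/PBS)
qsub -Wblock=true -j oe -o payload.out payload.sh
# then print the job's output, as a local run of payload.sh would have done
cat payload.out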
