I'm using Slurm on my lab's server, and I would like to submit a job that looks like this:
mkdir my/file/architecture
echo "#HEADER" > my/file/architecture/output_summary.txt
for f in my/dir/*.csv; do
    python3 myscript.py "$f"
done
Is there any way to run this so that it completes the first instructions, then runs the for loop in parallel? Each iteration is independent, so they can all run at the same time.
The initial steps are not very complex, so if needed I could move them into a separate sbatch script. However, my/dir/ contains about 7000 CSV files to process, so typing them all out manually would be a pain.
GNU Parallel might be a good fit here, or xargs, though I prefer parallel in Slurm jobs.
Here's an example of an sbatch script running an 8-way parallel:
#SBATCH --nodes=1
#SBATCH --ntasks=
srun="srun --exclusive -N1 -n1"
# -j is the number of tasks parallel runs so we set it to $SLURM_NTASKS
# Note that --ntasks=1 and --cpus-per-task=8 will have srun start one copy of the program at a time. We use "find" to generate a list of files to operate on.
find /my/dir/*.csv -type f | parallel -j $SLURM_NTASKS "$srun python3 myscript.py {}"
The easiest way is to run on a single node, though parallel can use SSH (I believe) to run on multiple computers.
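For the xargs route mentioned above, a minimal sketch might look like the following. It assumes a find and xargs that support -print0/-0 and -P (the GNU and BSD versions do), and run_all is a hypothetical helper name, not part of Slurm. Unlike the srun-based version, every process here runs inside the batch step on a single node, so the concurrency given to -P should not exceed the cores the job was allocated.

```shell
#!/bin/bash
# run_all DIR CMD...: run CMD once per *.csv file in DIR, several at a time.
# (run_all is a hypothetical helper used only for illustration.)
run_all() {
    local dir="$1"; shift
    # -print0 / -0 keep file names containing spaces intact.
    # -n 1 passes one file per invocation; -P sets how many run at once
    # (falls back to 4 when SLURM_NTASKS is not set, e.g. outside Slurm).
    find "$dir" -maxdepth 1 -type f -name '*.csv' -print0 |
        xargs -0 -n 1 -P "${SLURM_NTASKS:-4}" "$@"
}

# Inside the batch script this would be called as:
#   run_all my/dir python3 myscript.py
```

Because the file list is produced by find at run time, nothing has to be typed out by hand, whatever the number of CSV files.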
This is a follow-up question to [How to run jobs in paralell using one slurm batch script?]. The goal was to create a single sbatch script that can start multiple processes and run them in parallel. The answer given by damienfrancois was very detailed and looked something like this:
#SBATCH --job-name=test
#SBATCH --output=/dev/null
#SBATCH --error=/dev/null
#SBATCH --partition=All
srun -n 1 -c 1 --exclusive sleep 60 &
srun -n 1 -c 1 --exclusive sleep 60 &
wait
However, I am not able to understand the exclusive keyword. If I use the keyword, one node of the cluster is chosen and all processes are launched there. However, I would like Slurm to distribute the ["sleeps"/steps] over the entire cluster.
So how does the --exclusive keyword work? According to the Slurm documentation, the restriction to one node should not happen, since the keyword is used within a step allocation.
[I am new to Slurm]
Hello, I need some help.
I need to execute several bash files, e.g.:
These files generate data that will be used by another bash file called final.sh.
To save time, I want to execute the fileNb.sh files simultaneously on a cluster by doing:
for file in file*.sh; do sbatch $file; done
and then, when all jobs have finished, I would like final.sh to be executed automatically.
Does anyone have an idea?
Thank you very much
One clean option is to reorganise the set of jobs as a job array and then add a dependency for the final job on the whole array.
Assuming fileN.sh looks like this:
#SBATCH --<some option>
#SBATCH --<some other option>
./my_program input_fileN
you can make this a job array. In a single submission file file.sh, write this
#SBATCH --<some option>
#SBATCH --<some other option>
#SBATCH --array=1-4
./my_program input_file${SLURM_ARRAY_TASK_ID}
Then run
JOBID=$(sbatch --parsable file.sh)
sbatch --dependency=afterok:$JOBID final.sh
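The two submission commands can be wrapped in a small helper. This is only a sketch: submit_chain is a hypothetical name, and it assumes afterok is the wanted dependency type (final.sh starts only if every array task exits successfully; afterany would start it regardless of exit status). With --parsable, sbatch may print the job ID followed by a semicolon and the cluster name, so the helper keeps only the part before the semicolon.

```shell
#!/bin/bash
# submit_chain ARRAY_SCRIPT FINAL_SCRIPT
# (hypothetical helper) Submit the array job, then submit the final job so
# that it only starts once the whole array has completed successfully.
submit_chain() {
    local jobid
    jobid="$(sbatch --parsable "$1")" || return 1
    jobid="${jobid%%;*}"          # strip an optional ";cluster" suffix
    sbatch --dependency="afterok:${jobid}" "$2"
}

# Usage: submit_chain file.sh final.sh
```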
In case your jobs cannot be parametrised by an integer directly, create a Bash array like this:
#SBATCH --<some option>
#SBATCH --<some other option>
#SBATCH --array=0-2
ARGS=(SRR63563 SRR63564 SRR63565)
fasterq-dump --threads 10 ${ARGS[$SLURM_ARRAY_TASK_ID]} -O /path1/path2/path3/
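When the list of arguments is long, it may be easier to keep it in a text file than in the script itself. A sketch, assuming a hypothetical file accessions.txt with one entry per line and the hypothetical helper names load_args and pick_arg; the --array range then has to match the number of lines (0 to N-1):

```shell
#!/bin/bash
# load_args FILE: read one entry per line into the global Bash array ARGS.
load_args() {
    ARGS=()
    local line
    # The "|| [ -n ... ]" part keeps a last line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        ARGS+=("$line")
    done < "$1"
}

# pick_arg INDEX: print the entry for a zero-based array task ID.
pick_arg() {
    printf '%s\n' "${ARGS[$1]}"
}

# In the job script, after the #SBATCH lines, this could be used as:
#   load_args accessions.txt
#   fasterq-dump --threads 10 "$(pick_arg "$SLURM_ARRAY_TASK_ID")" -O /path1/path2/path3/
```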
You could do:
sbatch --wait file1.sh &
sbatch --wait file2.sh &
sbatch --wait file3.sh &
sbatch --wait file4.sh &
wait
sbatch final.sh
Or, more simply with GNU Parallel:
parallel -j4 sbatch --wait ::: file*.sh
sbatch final.sh
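Without GNU Parallel, the same pattern can be written as a loop so that the script names do not have to be listed one by one. This is a sketch with a hypothetical helper name, submit_then_final. Note that wait with no arguments returns once all background children have finished, without checking whether they succeeded.

```shell
#!/bin/bash
# submit_then_final FINAL_SCRIPT SCRIPT...
# (hypothetical helper) Submit every SCRIPT, block until all of those jobs
# have ended, then submit FINAL_SCRIPT.
submit_then_final() {
    local final="$1"; shift
    local script
    for script in "$@"; do
        sbatch --wait "$script" &   # each sbatch returns only when its job ends
    done
    wait                            # block until every background sbatch has returned
    sbatch "$final"
}

# Usage: submit_then_final final.sh file*.sh
```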
Is this no good?
for file in file*.sh; do sbatch $file; done; ./final.sh
I am trying to launch several tasks on a SLURM-managed cluster, and would like to avoid dealing with dozens of files.
Right now, I have 50 tasks (subscripted i, and for simplicity, i is also the input parameter of my program), and for each one a single bash file slurm_run_i.sh which indicates the computations configuration, and the srun command:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=30G
srun python plotConvergence.py i
I am then using another bash file to submit all these tasks, slurm_run_all.sh
for i in {1..50}; do
    sbatch slurm_run_$i.sh
done
This works (50 jobs are running on the cluster), but I find it troublesome to have more than 50 input files. Searching for a solution, I came up with the & operator, obtaining something like:
#SBATCH --ntasks=50
#SBATCH --cpus-per-task=1
#SBATCH -J pltall
#SBATCH --mem=30G
# Running jobs
srun python plotConvergence.py 1 &
srun python plotConvergence.py 2 &
srun python plotConvergence.py 49 &
srun python plotConvergence.py 50 &
wait
echo "All done"
This seems to run as well. However, I cannot manage each of these jobs independently: the output of squeue shows a single job (pltall) running on a single node. As there are only 12 cores on each node in the partition I am working in, I assume most of my tasks are waiting on the single node I've been allocated. Setting the -N option doesn't change anything either. Moreover, I can no longer cancel individual tasks if I realize there's a mistake, which sounds problematic to me.
Is my interpretation right, and is there a better way than my attempt to process several jobs in Slurm without getting lost among many files?
What you are looking for is the job array feature of Slurm.
In your case, you would have a single submission file (slurm_run.sh) like this:
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=30G
#SBATCH --array=1-50
srun python plotConvergence.py ${SLURM_ARRAY_TASK_ID}
and then submit the array of jobs with
sbatch slurm_run.sh
You will see that you will have 50 jobs submitted. You can cancel all of them at once or one by one. See the man page of sbatch for details.
I wanted to run a python script main.py multiple times with different arguments through a sbatch_run.sh script as in:
#SBATCH --job-name=sbatch_run
#SBATCH --array=1-1000
#SBATCH --exclude=node047
arg1=10 # argument to be changed between runs
arg2=12 # argument to be changed between runs
python main.py $arg1 $arg2
The arguments are hard-coded in the bash file run by sbatch. I was worried that if I ran sbatch_run.sh several times in a row, changing the values of arg1 and arg2 between submissions, it might cause errors in my runs. For example, if I do:
sbatch sbatch_run.sh # with arg1=10 and arg2=12
and then immediately afterwards change sbatch_run.sh and submit the file again, as in:
sbatch sbatch_run.sh # with arg1=69 and arg2=666
would that cause all my runs to use the last values (i.e. arg1=69 and arg2=666) instead of each run using its own arguments?
I know for sure that if I hard-code the arguments in main.py, submit the sbatch script, and then change main.py, the queued jobs will run the latest version. I was wondering whether the same is true if I change the sbatch_run.sh script.
Just so you know, I did try this experiment: I submitted 1000 jobs so that some were queued, put a sleep command in the script, and then changed sbatch_run.sh. It does not seem to change what my runs do; however, this is too important to get wrong by accident, so I wanted to ask as well.
For the record I ran:
#SBATCH --job-name=ECHO
#SBATCH --array=1-1000
#SBATCH --exclude=node047
sleep 15
echo helloworld
echo 5
and then changed the echo to echo 10 or echo byebyeworld.
When sbatch is run, Slurm copies the submission script to its internal database; you can convince yourself with the following experiment:
$ cat submit.sh
#SBATCH --hold
echo helloworld
The --hold option is there to make sure the job does not start. Submit it:
$ sbatch submit.sh
Then modify the submission script:
$ sed -i 's/hello/bye/' submit.sh
$ cat submit.sh
#SBATCH --hold
echo byeworld
and now use scontrol show job to see the script Slurm is planning to run:
$ scontrol show -ddd job YOURJOBID
JobId=******* JobName=submit.sh
#SBATCH --hold
echo helloworld
It hasn't changed although the original script has.
[EDIT] Recent versions of Slurm use scontrol write batch_script YOURJOBID - (the trailing - sends the script to standard output) rather than scontrol show -dd job to show the submission script.
Suppose that I have the following simple bash script which I want to submit to a batch server through SLURM:
#SBATCH -o "outFile"$1".txt"
#SBATCH -e "errFile"$1".txt"
hostname
exit 0
In this script, I simply want to write the output of hostname to a text file whose full name I control via the command line, like so:
login-2:jobs$ sbatch -D `pwd` exampleJob.sh 1
Submitted batch job 203775
Unfortunately, it seems that my last command-line argument (1) is not substituted by sbatch: the files created do not have the suffix I'm looking for, and the string "$1" is interpreted literally:
login-2:jobs$ ls
errFile$1.txt exampleJob.sh outFile$1.txt
I've looked around places in SO and elsewhere, but I haven't had any luck. Essentially what I'm looking for is the equivalent of the -v switch of the qsub utility in Torque-enabled clusters.
Edit: As mentioned in the underlying comment thread, I solved my problem the hard way: instead of having one single script that would be submitted several times to the batch server, each with different command line arguments, I created a "master script" that simply echoed and redirected the same content onto different scripts, the content of each being changed by the command line parameter passed. Then I submitted all of those to my batch server through sbatch. However, this does not answer the original question, so I hesitate to add it as an answer to my question or mark this question solved.
I thought I'd offer some insight because I was also looking for a replacement for the -v option of qsub, which for sbatch can be accomplished using the --export option. I found a nice site here that shows a list of conversions from Torque to Slurm, and it made the transition much smoother.
You can define the environment variable ahead of time in your shell:
$ export var_name='1'
$ sbatch -D `pwd` --export=var_name exampleJob.sh
Or define it directly within the sbatch command just like qsub allowed:
$ sbatch -D `pwd` --export=var_name='1' exampleJob.sh
Note that sbatch options must come before the script name; anything after the script name is passed to the script as an argument. Whether this works in the #SBATCH directives of exampleJob.sh is another question, but I assume that it should give the same functionality found in Torque.
Using a wrapper is more convenient. I found this solution from this thread.
Basically the problem is that the SBATCH directives are seen as comments by the shell and therefore you can't use the passed arguments in them. Instead you can use a here document to feed in your bash script after the arguments are set accordingly.
In your case, you can replace the shell script file with this:
sbatch <<EOT
#SBATCH -o "outFile"$1".txt"
#SBATCH -e "errFile"$1".txt"
hostname
exit 0
EOT
And you run the shell script like this:
bash [script_name].sh [suffix]
And the outputs will be saved to outFile[suffix].txt and errFile[suffix].txt
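To see why the here document works, it can help to print the generated script instead of submitting it. In the sketch below, render_job is a hypothetical helper. Unescaped variables such as ${suffix} are expanded by the submitting shell before sbatch reads anything, while escaped ones (\$SLURM_JOB_ID) are left for the job itself to expand.

```shell
#!/bin/bash
# render_job SUFFIX: print a batch script with SUFFIX baked into the #SBATCH lines.
# (render_job is a hypothetical helper used only for illustration.)
render_job() {
    local suffix="$1"
    cat <<EOT
#!/bin/bash
#SBATCH -o outFile${suffix}.txt
#SBATCH -e errFile${suffix}.txt
# \$SLURM_JOB_ID is escaped, so it is expanded when the job runs, not now.
echo "job \$SLURM_JOB_ID running on \$(hostname)"
EOT
}

# Inspect the result with:  render_job 1
# Submit it with:           render_job 1 | sbatch
```

sbatch reads the batch script from standard input when no script file is given, which is what the pipe in the last comment relies on.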
If you pass your commands via the command line, you can actually bypass the issue of not being able to pass command line arguments in the batch script. So for instance, at the command line :
sbatch --error=$var1 --output=$var2 batch_script.sh
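Submitting with several suffixes then needs no templating at all. A sketch with a hypothetical helper, submit_with_suffix: the options come before the script name, and the suffix is also passed as the script's first argument so that $1 is available inside the job.

```shell
#!/bin/bash
# submit_with_suffix SUFFIX
# (hypothetical helper) Set the output and error file names on the command
# line, where the shell can expand variables, instead of in #SBATCH lines.
submit_with_suffix() {
    local suffix="$1"
    sbatch --output="outFile${suffix}.txt" \
           --error="errFile${suffix}.txt" \
           exampleJob.sh "$suffix"
}

# Usage: for s in 1 2 3; do submit_with_suffix "$s"; done
```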
The lines starting with #SBATCH are not interpreted by bash (to the shell they are just comments); sbatch itself parses them as options.
The sbatch options do not support $1 variables (only %j and some other filename patterns; replacing $1 with %1 will not work).
When you don't have different sbatch processes running in parallel, you could try
touch outFile${1}.txt errFile${1}.txt
rm link_out.sbatch link_err.sbatch 2>/dev/null # remove links from previous runs
ln -s outFile${1}.txt link_out.sbatch
ln -s errFile${1}.txt link_err.sbatch
#SBATCH -o link_out.sbatch
#SBATCH -e link_err.sbatch
# I do not know about the background processing of sbatch, are the jobs still running
# at this point? When they are, you can not delete the temporary symlinks yet.
exit 0
As you said in a comment yourself, you could make a masterscript.
This script can contain lines like
sed -e 's/File.txt/File'"$1"'.txt/' exampleJob.sh.template > exampleJob.sh
# I do not know, is the following needed with sbatch?
chmod +x exampleJob.sh
In your template the #SBATCH lines look like
#SBATCH -o "outFile.txt"
#SBATCH -e "errFile.txt"
This is an old question but I just stumbled into the same task and I think this solution is simpler:
Let's say I have the variable $OUT_PATH in the bash script launch_analysis.bash and I want to pass this variable to task_0_generate_features.sl which is my SLURM file to send the computation to a batch server. I would have the following in launch_analysis.bash:
sbatch --export=OUT_PATH=$OUT_PATH task_0_generate_features.sl
The variable is then directly accessible in task_0_generate_features.sl.
In @Jason's case we would have:
sbatch -D `pwd` --export=hostname=$hostname exampleJob.sh
Reference: Using Variables in SLURM Jobs
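On the receiving side, the job script can check that the variable actually arrived before using it. A minimal sketch, with resolve_out_path as a hypothetical helper name; the error message assumes the --export=ALL,NAME=value form, which keeps the rest of the submitting environment as well:

```shell
#!/bin/bash
# resolve_out_path: print OUT_PATH, or fail if it was not exported to the job.
# (resolve_out_path is a hypothetical helper used only for illustration.)
resolve_out_path() {
    if [ -z "${OUT_PATH:-}" ]; then
        echo "OUT_PATH is not set; submit with --export=ALL,OUT_PATH=..." >&2
        return 1
    fi
    printf '%s\n' "$OUT_PATH"
}

# In the job script this could be used as:
#   out_dir="$(resolve_out_path)" || exit 1
```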
Something like this works for me with Torque:
echo "$(pwd)/slurm.qsub 1" | qsub -S /bin/bash -N Slurm-TEST
hostname > outFile${1}.txt 2>errFile${1}.txt
exit 0