Send multiple qsub jobs with different runtime parameters - bash

Last night, I submitted a large number of qsub jobs running the same executable but with different input parameters. Most of the jobs sat in the queue, waiting for the others to finish. This morning, I realized that all the jobs that were still in the queue used the last instance of my input file.
What is the standard way of working around this issue? Should I have one input file per job and compile my code so it reads the correct one? Or is there a better/more robust solution?

You could create a master PBS script that loops over the different input parameters and executes them either sequentially or in parallel.
The example below simply gives executable a different input number (IN) for each job; change it to loop over one or more of your input parameters as needed.
#PBS -l mppwidth=2048
NIN=10   # number of input parameters
for IN in $(seq -w 1 $NIN); do
    (
        cd "sub_job_${IN}"
        executable $IN   # runs the jobs sequentially (you might have to prefix this with aprun)
    )
done
or in parallel:
#PBS -l mppwidth=2048
# ^^ these resources are now shared among all the jobs
NIN=10   # number of input parameters
for IN in $(seq -w 1 $NIN); do
    (
        cd "sub_job_${IN}"
        executable $IN   # you might have to prefix this with `aprun -n .. -N ..` or similar
                         # so that each job only uses a portion of the total requested CPUs
    ) &                  # run each job in the background
done
wait   # wait for all background jobs to finish
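If you go the one-input-file-per-job route mentioned in the question, a small setup step can create the per-job directories that the loops above expect. This is only a sketch: the sub_job_NN layout matches the loops above, but the template and input file names (and the @PARAM@ placeholder) are assumptions, not from the original post.
#!/bin/bash
NIN=10
for IN in $(seq -w 1 $NIN); do
    mkdir -p "sub_job_${IN}"
    # input.template and input.dat are placeholder names; each job gets its own
    # copy, so later edits to the template cannot affect jobs already queued
    sed "s/@PARAM@/${IN}/" input.template > "sub_job_${IN}/input.dat"
done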

Related

How to make sbatch wait until last submitted job is *running* when submitting multiple jobs?

I'm running a numerical model whose parameters are in a "parameter.input" file. I use sbatch to submit multiple iterations of the model, with one parameter in the parameter file changing every time. Here is the loop I use:
#!/bin/bash -l
for a in {01..30}
do
    sed -i "s/control_[0-9][0-9]/control_${a}/g" parameter.input
    sbatch --time=21-00:00:00 run_model.sh
    sleep 60
done
The sed line changes a parameter in the parameter file. The
run_model.sh file runs the model.
The problem: depending on the resources available, a job might run immediately or stay pending for a few hours. With my default loop, if 60 seconds is not enough time to find resources for job n to run, the parameter file will be modified while job n is pending, meaning job n will run with the wrong parameters. (and I can't wait for job n to complete before submitting job n+1 because each job takes several days to complete)
How can I force sbatch to wait to submit job n+1 until job n is running?
I am not sure how to create an until loop that would grab the status of job n and wait until it changes to 'running' before submitting job n+1. I have experimented with a few things, but the server I use also hosts another 150 people's jobs, and I'm afraid too much experimenting might create some issues...
Use the following to grab the last submitted job's ID and its status, and wait until it isn't pending anymore to start the next job:
sentence=$(sbatch --time=21-00:00:00 run_model.sh)   # capture the output from sbatch ("Submitted batch job <ID>")
stringarray=($sentence)                              # split the output into words
jobid=${stringarray[3]}                              # isolate the job ID
sentence="$(squeue -j $jobid)"                       # read the job's Slurm status
stringarray=($sentence)
jobstatus=${stringarray[12]}                         # isolate the status field (ST) of job $jobid
Check that the job status is 'running' before submitting the next job with:
if [ "$jobstatus" = "R" ];then
# insert here relevant code to run next job
fi
You can insert that last snippet in an until loop that checks the job's status every few seconds.
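Putting those pieces together, a minimal sketch of the whole submission loop might look like the one below. The sed command and the 30 iterations come from the question's loop; the 10-second polling interval is an arbitrary choice, and the word indices assume squeue's default output format.
#!/bin/bash -l
for a in {01..30}
do
    sed -i "s/control_[0-9][0-9]/control_${a}/g" parameter.input
    sentence=$(sbatch --time=21-00:00:00 run_model.sh)
    stringarray=($sentence)
    jobid=${stringarray[3]}
    # poll every 10 seconds until the job leaves the pending state and is running
    jobstatus=""
    until [ "$jobstatus" = "R" ]
    do
        sleep 10
        sentence="$(squeue -j $jobid)"
        stringarray=($sentence)
        jobstatus=${stringarray[12]}
    done
done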

Running a queue of MPI calls in parallel with SLURM and limited resources

I'm trying to run a Particle Swarm Optimization problem on a cluster using SLURM, with the optimization algorithm managed by a single-core matlab process. Each particle evaluation requires multiple MPI calls that alternate between two Python programs until the result converges. Each MPI call takes up to 20 minutes.
I initially naively submitted each MPI call as a separate SLURM job, but the resulting queue time made it slower than running each job locally in serial. I am now trying to figure out a way to submit an N node job that will continuously run MPI tasks to utilize the available resources. The matlab process would manage this job with text file flags.
Here is a pseudo-code bash file that might help to illustrate what I am trying to do on a smaller scale:
#!/bin/bash
#SBATCH -t 4:00:00 # walltime
#SBATCH -N 2 # number of nodes in this job
#SBATCH -n 32 # total number of processor cores in this job
# Set required modules
module purge
module load intel/16.0
module load gcc/6.3.0
# Job working directory
echo Working directory is $SLURM_SUBMIT_DIR
cd $SLURM_SUBMIT_DIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
# Run Command
while <"KeepRunning.txt” == 1>
do
for i in {0..40}
do
if <“RunJob_i.txt” == 1>
then
mpirun -np 8 -rr -f ${PBS_NODEFILE} <job_i> &
fi
done
done
wait
This approach doesn't work (just crashes), but I don't know why (probably overutilization of resources?). Some of my peers have suggested using parallel with srun, but as far as I can tell this requires that I call the MPI functions in batches. This will be a huge waste of resources, as a significant portion of the runs finish or fail quickly (this is expected behavior). A concrete example of the problem would be starting a batch of 5 8-core jobs and having 4 of them crash immediately; now 32 cores would be doing nothing while they wait up to 20 minutes for the 5th job to finish.
Since the optimization will likely require upwards of 5000 mpi calls, any increase in efficiency will make a huge difference in absolute walltime. Does anyone have any advice as to how I could run a constant stream of MPI calls on a large SLURM job? I would really appreciate any help.
A couple of things. First, under SLURM you should be using srun, not mpirun.
Second, the pseudo-code you provided launches an infinite number of jobs without waiting for any completion signal. You should move the wait inside the outer while loop, after the inner for loop, so that you launch just one set of jobs, wait for them to finish, evaluate the condition and, maybe, launch the next set:
#!/bin/bash
#SBATCH -t 4:00:00   # walltime
#SBATCH -N 2         # number of nodes in this job
#SBATCH -n 4         # total number of tasks in this job
#SBATCH -c 8         # number of processor cores for each task
# Set required modules
module purge
module load intel/16.0
module load gcc/6.3.0
# Job working directory
echo Working directory is $SLURM_SUBMIT_DIR
cd $SLURM_SUBMIT_DIR
echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`
# Run Command
while <"KeepRunning.txt" == 1>
do
    for i in {0..40}
    do
        if <"RunJob_i.txt" == 1>
        then
            srun -n 8 --exclusive <job_i> &
        fi
    done
    wait
    <Update "KeepRunning.txt">
done
Take care also to distinguish tasks from cores: -n says how many tasks will be used, and -c says how many CPUs per task will be allocated.
The code above launches 41 jobs in the background (from 0 to 40, inclusive), but they will only start once the resources are available (--exclusive); until then they wait while the CPUs are occupied. Each job will use 8 CPUs. The script then waits for all of them to finish, and I assume you will update KeepRunning.txt after that round.
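As a side note, the waiting behaviour of --exclusive job steps can be seen with a toy script like the one below; this is my illustration rather than part of the original answer, with sleep standing in for the real MPI program. Eight single-task steps are launched at once, but the allocation only has room for four, so the remaining steps wait and start automatically as CPUs free up.
#!/bin/bash
#SBATCH -t 0:10:00
#SBATCH -N 1
#SBATCH -n 4   # room for 4 single-CPU steps at a time
for i in $(seq 1 8); do
    srun -n 1 --exclusive sleep 30 &   # steps beyond the free CPUs wait, then start
done
wait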

How do I create a Stack or LIFO for GNU Parallel in Bash

While my original problem was solved in a different manner (see the comment thread under this question, as well as the edits to this question), I was able to create a stack/LIFO for GNU Parallel in Bash, so I have edited my background/question to reflect a situation where it could be needed.
Background
I am using GNU Parallel to process files with a Bash script. As the files are processed, more files are created and new commands need to be added to parallel's list. I am not able to give parallel a complete list of commands, as information is generated as the initial files are processed.
I need a way to add the lines to parallel's list while it is running.
Parallel will also need to wait for a new line if nothing is in the queue and exit once the queue is finished.
Solution
First I created a fifo:
mkfifo /tmp/fifo
Next I created a bash script that cats the fifo and pipes the output to parallel, which checks for the end_of_file line. (I wrote this with help from the accepted answer as well as from here.)
#!/bin/bash
while true; do
    cat /tmp/fifo
done | parallel --ungroup --gnu --eof "end_of_file" "{}"
Then I write to the pipe with this command, adding lines to parallel's queue:
echo "command here" > /tmp/fifo
With this setup, all new commands are added to the queue. Once the queue is full parallel will begin processing it. This means that if you have slots for 32 jobs (32 processors), then you will need to add 32 jobs in order to start the queue.
If parallel is occupying all of its processors, it will put the job on hold until a processor becomes available.
By using the --ungroup argument, parallel will process/output jobs as they are added to the queue once the queue is full.
Without the --ungroup argument, parallel holds back the output of completed jobs until enough new jobs have started. From the accepted answer:
Output from the running or completed jobs are held back and will only be printed when JobSlots more jobs has been started (unless you use --ungroup or -u, in which case the output from the jobs are printed immediately). E.g. if you have 10 jobslots then the output from the first completed job will only be printed when job 11 has started, and the output of second completed job will only be printed when job 12 has started.
From http://www.gnu.org/software/parallel/man.html#EXAMPLE:-GNU-Parallel-as-queue-system-batch-manager
There is a small issue when using GNU parallel as queue system/batch manager: You have to submit JobSlot number of jobs before they will start, and after that you can submit one at a time, and job will start immediately if free slots are available. Output from the running or completed jobs are held back and will only be printed when JobSlots more jobs has been started (unless you use --ungroup or -u, in which case the output from the jobs are printed immediately). E.g. if you have 10 jobslots then the output from the first completed job will only be printed when job 11 has started, and the output of second completed job will only be printed when job 12 has started.

Queue using several processes to launch bash jobs

I need to run many (hundreds) commands in shell, but I only want to have a maximum of 4 processes running (from the queue) at once. Each process will last several hours.
When a process finishes I want the next command to be "popped" from the queue and executed.
I also want to be able to add more processes after the queue has started, and it would be great if I could remove some jobs from the queue, or at least empty it.
I have seen solutions using a makefile, but that only works if I have the full list of commands before starting. I also tried mkfifo sjobq and other approaches, but I never managed to meet my needs...
Does anyone have code to solve this problem?
Edit: In response to Mark Setchell
The solution with tail -f and parallel is almost perfect, but when I use it, it always holds back the last 4 commands until I add more, and so on; I don't know why, and it is quite troublesome...
As for Redis, it is also a good solution, but it takes more time to master.
Thanks!
Use GNU Parallel to make a job queue like this:
# Clear out file containing job queue
> jobqueue
# Start GNU Parallel processing jobs from queue
# -k means "keep" output in order
# -j 4 means run 4 jobs at a time
tail -f jobqueue | parallel -k -j 4
# From another terminal, submit 40 jobs to the queue
for i in {1..40}; do echo "sleep 5;date +'%H:%M:%S Job $i'"; done >> jobqueue
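One caveat worth noting (my addition, not part of the original answer): tail -f never ends on its own, so once everything has been submitted you can shut the queue down by stopping the tail; parallel then finishes the jobs it has already read and exits.
# stop feeding the queue once all work has been submitted
# (assumes no other "tail -f jobqueue" processes are running)
pkill -f "tail -f jobqueue"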
Another option is to use REDIS - see my answer here Run several jobs parallelly and Efficiently

Hold a bash script on PBS status without Torque

I have access to a low-priority queue on a large national system, where I can have only 1 job allocated in the queue at a time.
The PBS job runs a program that is unlikely to complete before the wall-time ends, and no more than 3 jobs on hold can be queued at once.
It means that:
I cannot use -W depend=afterok:$ID_of_previous_job. The script would submit all the jobs at once, but only the first 3 would enter the queue (the last 2 in the H state).
I cannot add a last line to the submission script that submits the next job (it is very likely that the actual program won't finish before the walltime ends, so the last line would never be executed).
I cannot install any software, so I am limited to using a bash script rather than Torque.
I'd rather not use a "time check" script (e.g., check every 5 minutes whether the previous job is over).
Is it possible to use a while loop and/or sleep?
Option 1
Using while and sleep requires doing something very similar to a time-check script:
#!/bin/bash
jobid=`submit the first job`
while [[ -z `qstat ${jobid} | grep C` ]]; do
    sleep 5
done
# submit the new job once the loop is done, after checking the exit status if desired
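Extending that idea to a whole chain of submissions could look roughly like the sketch below. This is only an illustration: the job script names are placeholders, and it keeps the snippet's assumption that qstat either shows completed jobs in the C state or drops them from its output.
#!/bin/bash
for script in job_01.pbs job_02.pbs job_03.pbs; do   # placeholder script names
    jobid=$(qsub "$script")
    # wait until the job shows the C (completed) state or disappears from qstat
    while qstat "${jobid}" >/dev/null 2>&1 && [[ -z $(qstat "${jobid}" | grep " C ") ]]; do
        sleep 300
    done
done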
Option 2 - may be TORQUE only, not sure:
Perhaps a better way, suggested by Dmitri Chubarov in the comments, would be to use the per-job epilogue option. To do this the compute nodes have to be able to submit jobs, but since you were considering having the final line of the job do the submission, this seems like a possibility.
Add a per-job epilogue by adding this line to the job script:
#PBS -l epilogue=/path/to/script
And then have the script:
#!/bin/bash
# check the exit code if desired; it is passed as argument 10 to the epilogue script
# submit the next job
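A minimal concrete version of that epilogue might look like the following. This is only a sketch: the path to the next job script is a placeholder, and it assumes qsub works from the compute node as discussed above.
#!/bin/bash
exit_status="${10}"              # TORQUE passes the job's exit status as the 10th argument
if [ "$exit_status" -eq 0 ]; then
    qsub /path/to/next_job.pbs   # placeholder path: submit the next job in the chain
fi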
