LSF ERROR:Project must be 'acc_*' - bash

I need to run a Python script on a supercomputer by submitting a job with LSF. I have been trying to become acquainted with the syntax using a simple example script:
#!/bin/bash
#BSUB -q alloc
#BSUB -n 1
#BSUB -o t.out
echo "Salve Munde!"
I saved this file as example.txt, and on the command line, I ran:
$ bsub < example.txt
This returned the message:
LSF ERROR:Project must be 'acc_*'. Request aborted by esub. Job not submitted.
What is the cause of this error?
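This error typically means the cluster's esub plugin rejects any job that does not charge an LSF project (account) whose name matches acc_*. The usual fix is to add a #BSUB -P line naming the allocation you were given; a minimal sketch, where acc_myproject is a placeholder for your actual project name:
#!/bin/bash
#BSUB -q alloc
#BSUB -n 1
#BSUB -o t.out
#BSUB -P acc_myproject   # placeholder: use the acc_* project/allocation assigned to you
echo "Salve Munde!"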

Related

Read job name from bash script parameters in SGE

I am using Sun Grid Engine to submit jobs, and I want a single bash script that submits whatever file I need to run, instead of running a different qsub command with a different bash file for each job. I have managed to generate output and error files that share the name of the input file, but now I am struggling to set a different job name for each submission. My approach has been the following:
#!/bin/bash
#
#$ -cwd
#$ -S /bin/bash
#$ -N $1
#
python -u $1 >/output_dir/$1.out 2>/error_dir/$1.error
This way, running qsub send_to_sge.sh foo executes the program, and creates the files foo.error and foo.out with the errors and printouts, respectively. However, the job appears with the name $1 in the SGE queue. Instead, I would like to have foo as the job name. Is there any way to achieve what I am seeking?
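One workaround (a sketch, not tested on your setup) is to drop the -N directive and set the name on the qsub command line instead: SGE reads #$ lines literally and never expands $1, but -N passed to qsub itself does take effect. The wrapper name submit_named.sh is hypothetical:
#!/bin/bash
# submit_named.sh (hypothetical wrapper): use the argument both as the
# SGE job name and as the argument handed to send_to_sge.sh
qsub -N "$1" send_to_sge.sh "$1"
Running ./submit_named.sh foo should then show foo as the job name in the queue.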

How to prevent multiple executables from running at the same time on cluster

I have submitted a job to a multicore cluster running the LSF platform; the script is shown at the end. The two executables, exec1 and exec2, start at the same time. My intention was that, since they are separated by a semicolon, the second would start only after the first has finished. Of course, this caused several problems and the job could not terminate correctly. Now that I have figured out this behavior, I am writing separate job-submission files for each executable. Can anybody explain why these executables run at the same time?
#!/bin/bash -l
#
# Batch script for bash users
#
#BSUB -L /bin/bash
#BSUB -n 10
#BSUB -J jobname
#BSUB -oo output.log
#BSUB -eo error.log
#BSUB -q queue
#BSUB -P project
#BSUB -R "span[hosts=1]"
#BSUB -W 4:0
source /etc/profile.d/modules.sh
module purge
module load intel_comp/c4/2013.0.028
module load hdf5/1.8.9
module load platform_mpi/8.2.1
export OMP_NUM_THREADS=1
export MP_TASK_AFFINITY=core:$OMP_NUM_THREADS
OPT="-aff=automatic:latency"
mpirun $OPT exec1; mpirun $OPT exec2
I assume that both exec1 and exec2 are MPI applications?
Theoretically it should work, but LSF is probably doing something odd and the mpirun for exec1 is exiting before exec1 actually exits. You could instead try:
mpirun $OPT exec1 && mpirun $OPT exec2
so that mpirun $OPT exec1 has to exit with return code 0 before exec2 is launched.
However, it probably isn't a great idea to run two MPI jobs from the same script like this, since, for instance, the MPI environment variable setup may introduce conflicts. What you should really do is use job chaining, so that exec2 is submitted with a dependency and only runs after exec1 has finished.
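A minimal sketch of that chaining with bsub's -w dependency option; the job names and the two submission files (each assumed to contain one of the mpirun lines) are placeholders:
bsub -J run1 < exec1_job.lsf
bsub -J run2 -w "done(run1)" < exec2_job.lsf   # run2 starts only after run1 completes successfully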

Is changing the bash script sent to sbatch in Slurm during a run a bad idea?

I wanted to run a python script main.py multiple times with different arguments through a sbatch_run.sh script as in:
#!/bin/bash
#SBATCH --job-name=sbatch_run
#SBATCH --array=1-1000
#SBATCH --exclude=node047
arg1=10 #arg to be changed between runs
arg2=12 #arg to be changed between runs
python main.py $arg1 $arg2
The arguments are encoded in the bash file run by sbatch. I was worried that if I submitted sbatch_run.sh multiple times in a row, changing the values of arg1 and arg2 for each submission, it might cause errors in my runs. For example, if I do:
sbatch sbatch_run.sh # with arg1=10 and arg2=12
and then immediately after I change sbatch_run.sh but run the file again as in:
sbatch sbatch_run.sh # with arg1=69 and arg2=666
would that cause all my runs to use the last values (i.e. arg1=69 and arg2=666) instead of each run using its own arguments?
I know for sure that if I hard-code the arguments in main.py and then run the same sbatch script but change main.py, it will run the last version. I was wondering if that is also the case when I change the sbatch_run.sh script.
Just so you know, I did try this experiment: I submitted the 1000-task array so that some tasks were queued, put in a sleep command, and then changed sbatch_run.sh. It did not seem to change what my runs did; however, if I am wrong, this is far too important to get wrong by accident, so I wanted to make sure and ask as well.
For the record I ran:
#!/bin/bash
#SBATCH --job-name=ECHO
#SBATCH --array=1-1000
#SBATCH --exclude=node047
sleep 15
echo helloworld
echo 5
and then changed the echo lines to echo 10 or echo byebyeworld.
When sbatch is run, Slurm copies the submission script to its internal database; you can convince yourself of this with the following experiment:
$ cat submit.sh
#!/bin/bash
#SBATCH --hold
echo helloworld
The --hold is there to make sure the job does not start. Submit it:
$ sbatch submit.sh
Then modify the submission script:
$ sed -i 's/hello/bye/' submit.sh
$ cat submit.sh
#!/bin/bash
#SBATCH --hold
echo byeworld
and now use scontrol show job to see the script Slurm is planning to run:
$ scontrol show -ddd job YOURJOBID
JobId=******* JobName=submit.sh
[...]
BatchScript=
#!/bin/bash
#SBATCH --hold
echo helloworld
[...]
It hasn't changed although the original script has.
[EDIT] Recent versions of Slurm use scontrol write batch_script YOURJOBID - rather than scontrol show -dd job YOURJOBID to show the submission script.
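Since Slurm snapshots the script at submission time, a cleaner way to vary arg1 and arg2 than editing sbatch_run.sh between submissions is to pass them on the sbatch command line; anything after the script name is handed to the batch script as $1, $2, and so on. A minimal sketch based on the script from the question:
#!/bin/bash
#SBATCH --job-name=sbatch_run
#SBATCH --array=1-1000
#SBATCH --exclude=node047
arg1=$1   # taken from the sbatch command line instead of being edited in the file
arg2=$2
python main.py "$arg1" "$arg2"
Each submission then keeps its own arguments:
sbatch sbatch_run.sh 10 12
sbatch sbatch_run.sh 69 666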

Hold remainder of shell script commands until PBS qsub array job completes

I am very new to shell scripting, and I am trying to write a shell pipeline that submits multiple qsub jobs with several commands to run in between them, each contingent on the most recent job completing. I have been researching multiple ways to keep the shell script from proceeding after a qsub job is submitted, but none have been successful.
The simplest chunk of code I can provide to illustrate the issue is as follows:
THREADS=`wc -l < list1.txt`
qsub -V -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
There are obviously other lines of code after this that are actually contingent on firstjob.sh finishing, but I have omitted them here for clarity. I have tried the following methods of pausing/holding the script:
1) Only using wait, which is supposed to stop the script until all background programs are completed. This pushed right past the wait and printed the echo statement to the terminal while the array job was still running. My guess is this happens because qsub exits as soon as the job is submitted, so wait thinks everything has completed?
qsub -V -t 1-$THREADS firstjob.sh
wait
echo "firstjob.sh completed"
2) Capturing the qsub output in a variable, echoing that variable, and passing the full job ID to wait to pause. The script should then wait until all elements of the array job have completed. The error message received is shown after the code, within the code block.
job1=$(qsub -V -t 1-$THREADS firstjob.sh)
echo "$job1"
wait $job1
echo "firstjob.sh completed"
####ERROR RECEIVED####
-bash: wait: `4585057[].cluster-name.local': not a pid or valid job spec
3) Using -sync y with qsub. This should prevent qsub from exiting until the job is complete, acting as an effective pause... or so I had hoped. The error received is shown after the commands. For some reason it is not reading the -sync option correctly?
qsub -V -sync y -t 1-$THREADS firstjob.sh
echo "firstjob.sh completed"
####ERROR RECEIVED####
qsub: script file 'y' cannot be loaded - No such file or directory
4) Using a dummy shell script (the dummy just creates an empty file) so that I could use the -W depend=afterok: option of qsub to pause the script. This again pushes right past to the echo statement without any pause: both jobs get submitted, one right after the other.
job1=$(qsub -V -t 1-$THREADS demux.sh)
echo "$job1"
check=$(qsub -V -W depend=afterok:$job1 dummy.sh)
echo "$check"
echo "firstjob.sh completed"
Some further details regarding the script:
Each job submission is an array job.
The pipeline is being run in the terminal using a command resembling the following, so that I may provide it with 3 inputs: source Pipeline.sh -r list1.txt -d /workingDir/ -s list2.txt
I am certain that firstjob.sh has not actually completed, because I can still see its tasks in the queue when I use showq.
Perhaps there is an easy fix in most of these scenarios, but being new to all this, I am really struggling. I have to use this method in 8-10 places throughout the script, so it is really hindering progress. Would appreciate any assistance. Thanks.
POST EDIT 1
Here is the code contained in firstjob.sh, though I doubt it will help. Everything in it functions as expected and always produces the correct results.
#!/bin/bash
#PBS -S /bin/bash
#PBS -N demux
#PBS -l walltime=72:00:00
#PBS -j oe
#PBS -l nodes=1:ppn=4
#PBS -l mem=15gb
module load biotools
cd ${WORKDIR}/rawFQs/
INFILE=`head -$PBS_ARRAYID ${WORKDIR}${RAWFQ} | tail -1`
BASE=`basename "$INFILE" .fq.gz`
zcat $INFILE | fastx_barcode_splitter.pl --bcfile ${WORKDIR}/rawFQs/DemuxLists/${BASE}_sheet4splitter.txt --prefix ${WORKDIR}/fastqs/ --bol --suffix ".fq"
I just tried using -sync y, and that worked for me, so good idea there... Not sure what's different about your setup.
But a couple of other things you could try involve having your main script keep track of the status of the qsub jobs it launches. One idea is to have your main script check the status of your job using qstat and wait until it finishes before proceeding.
Alternatively, you could have the first job write to a file as its last step (or, as you suggested, set up a dummy job that waits for the first job to finish). Then in your main script, you can test to see whether that file has been written before going on.
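A rough sketch of the qstat-polling idea, assuming a Torque/PBS-style qstat whose state column shows Q/R/H while the array is still pending or running (the exact output format and state letters vary between versions, so treat this as a starting point):
job1=$(qsub -V -t 1-$THREADS firstjob.sh)
echo "$job1"
# Poll until the array job is no longer held, queued, or running
while qstat "$job1" 2>/dev/null | grep -qE ' (Q|R|H) '; do
    sleep 60
done
echo "firstjob.sh completed"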

Run Julia codes on a cluster

I aim to run some Julia-coded simulations on a cluster (no complicated parallel processing involved) using a .pbs file and qsub.
I know two ways to run a .jl file from bash. The first one is
/path/to/julia myscript.jl
The second one is
exec '/Applications/bla/bla/julia/bin/julia'
include("myscript.jl")
Here is my .pbs file. I cannot test if it works because I don't know yet where the Julia application is stored on the cluster.
#!/bin/bash
#PBS -l procs=1
#PBS -l walltime=240:00:00
#PBS -N Name
#PBS -m ea
#PBS -M name#something.com
#PBS -l pmem=1000mb
#PBS -t 1-3
echo "Starting run at: `date`"
exec '/Applications/bla/bla/julia/bin/julia'
include("myscript.jl")
echo "Job finished with exit code $? at: `date`"
Does it seem correct to you? Or should I, somehow, make an .exec out of my .jl?
You want to directly execute Julia, with your .jl program file as an argument.
Something like:
echo "Starting run at: `date`"
/Applications/bla/bla/julia/bin/julia myscript.jl
echo "Job finished with exit code $? at: `date`"
PBS will capture standard output and put it in a file such as .pbs.o#### (and similarly standard error in .pbs.e####).
You might run into an issue with where your 'present working directory' is when the script runs. Some clusters are set up to 'cd' you to a /tmp/ filesystem, or just drop you in your home directory, rather than leaving you where the script was submitted from.
In that case, the simple solution is to use a full path for the Julia script, but this makes it difficult to reuse your PBS submission script.
/Applications/bla/bla/julia/bin/julia ~/mydirectory/myscript.jl
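An alternative, assuming the cluster sets $PBS_O_WORKDIR (the directory the job was submitted from, which standard PBS/Torque does), is to cd there first so the relative path keeps working:
echo "Starting run at: `date`"
cd "$PBS_O_WORKDIR"
/Applications/bla/bla/julia/bin/julia myscript.jl
echo "Job finished with exit code $? at: `date`"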

Resources