Run netlogo in parallel mpi using Sun Grid Engine - bash

#!/bin/bash
#$ -N new
#$ -q all.q
#$ -pe mpi 30
unset SGE_ROOT
/opt/mpi/1.8.1/bin/mpirun -np $NSLOTS -hostfile $TMPDIR/machines /home/abhishekb/netlogo/netlogo-5.2.0/netlogo-headless.sh \
--model /home/abhishekb/scale_med/try4.nlogo \
--experiment experiment1 \
--table /home/abhishekb/Trash/anything.csv
Error:
The: Command not found.
queuing: Command not found.
time-to-exit: Command not found.
Badly placed ()'s.
Output:
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
--------------------------------------------------------------------------
A hostfile was provided that contains at least one node not
present in the allocation:
hostfile: /tmp/8396.1.all.q/machines
node: compute-0-1
If you are operating in a resource-managed environment, then only
nodes that are in the allocation can be used in the hostfile. You
may find relative node syntax to be a useful alternative to
specifying absolute node names see the orte_hosts man page for
further information.
--------------------------------------------------------------------------
PE file:
rm: cannot remove `/tmp/8396.1.all.q/rsh': No such file or directory
Earlier, I used to run the below:
#!/bin/bash
#$ -N new
#$ -q all.q
#$ -pe mpi 30
/home/abhishekb/netlogo/netlogo-5.2.0/netlogo-headless.sh \
--model /home/abhishekb/std_low/try4.nlogo \
--experiment experiment1 \
--table /home/abhishekb/Trash/anything.csv \
--threads 30
This simply runs on one core (as confirmed on the HPC side), even though it reserves 30 slots.
Edit:
Doc for submitting jobs:
http://it.iiitd.edu.in/HPC_final_doc.pdf (please refer to pages 4 and 5, section 10, "Job submission steps").
I submitted the job with qsub <filename.sh>.

Related

Running different tasks on individual resource sets within same node

I asked about this problem before, using a different approach (Having issues running mpi4py on large HPC system. Receving startup errors and sometimes variable errors), but I'm now attempting two other approaches, so far with no success. All examples below still put the same task on each of the six resource sets.
Background: I'm attempting to distribute predictions across resource sets on a node. Each resource set (RS) contains 1 GPU and 7 CPUs, and there are six sets per node. Once an RS task completes, it should move on to the next prediction in a list (part00.lst through part05.lst; in theory one list per RS).
The first approach looks something like this (a submission bash script calls this using jsrun -r6 -g1 -a1 -c7 -b packed:7 -d packed -l gpu-cpu):
#!/bin/bash
output=/path/ ##where completed predictions will be collected
for i in {0..5}; do
target=part0${i}.lst
........ ##the singularity job script to execute using $target and $output variables
done
The next attempt uses simultaneous job steps via UNIX backgrounding (which others have been able to adapt to do similar things, but for different jobs and tasks). Here I created six separate bash files, one for each input file ($target, i.e. part00.lst through part05.lst):
#!/bin/bash
## Various submission flags here
for i in {0..5}; do
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_0${i}.sh &
done
wait
I also attempted just hardcoding the six separate bash files:
#!/bin/bash
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_00.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_01.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_02.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_03.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_04.sh &
jsrun -r 6 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_05.sh &
wait
Thanks for any help! I'm still quite new to all of this!
Okay, attempt number two using simultaneous job steps/UNIX process backgrounding was nearly correct!
It now works. An example for one node:
Submission script:
#!/bin/bash
## Various submission flags here
for i in {0..5}; do
jsrun -n 1 -r 1 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu bash batch_run_0${i}.sh &
done
wait
The only problem was the flags: each jsrun call should use -n 1 -r 1, not -n 1 -r 6.
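As a small extension of the backgrounding pattern, the sketch below waits on each job step by PID so that a failed step is not silently lost (a bare wait returns 0 regardless of how the steps exited). LAUNCHER, run_steps and failed_steps are hypothetical names introduced for this illustration. On the cluster, LAUNCHER would hold the jsrun command with its flags; it is left empty here so the pattern can be tried on any machine.

```shell
#!/bin/bash
# Sketch only: LAUNCHER, run_steps and failed_steps are illustrative names.
# On the cluster, LAUNCHER would be something like
#   "jsrun -n 1 -r 1 -g 1 -a 1 -c 7 -brs -d packed -l gpu-cpu"
# With LAUNCHER unset, each batch script runs as a local background job.

run_steps() {
  local pids="" pid i
  failed_steps=0
  for i in 0 1 2 3 4 5; do
    # Unquoted on purpose so the launcher string splits into command + flags.
    ${LAUNCHER:-} bash "batch_run_0${i}.sh" &
    pids="$pids $!"
  done
  # Wait for each step by PID; "wait PID" returns that step's exit status.
  for pid in $pids; do
    wait "$pid" || failed_steps=$((failed_steps + 1))
  done
  # Succeed only if every step succeeded.
  [ "$failed_steps" -eq 0 ]
}
```

Calling run_steps from the submission script and checking failed_steps afterwards gives a count of the steps that exited with a non-zero status.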

Why is it not possible to run wget with the background option in a Slurm script?

I use this script to download files. Without -b, wget downloads the files one by one. With -b, the downloads run in the background and also simultaneously. Unfortunately, the script doesn't work under Slurm; it only works there without -b.
Script for downloading files
#!/bin/bash
mkdir data
cd data
for i in 11 08 15 26 ;
do
wget -c -b -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_1.fastq.gz
wget -c -b -q ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_2.fastq.gz
done
cd ..
Slurm Script
#!/bin/bash
#SBATCH --job-name=mytestjob # create a short name for your job
#SBATCH --nodes=2 # node count
#SBATCH --ntasks=2 # total number of tasks across all nodes
#SBATCH --cpus-per-task=2 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=4G # memory per cpu-core (4G is default)
#SBATCH --time=10:01:00 # total run time limit (HH:MM:SS)
#SBATCH --array=1-2 # job array with index values 1, 2
#Execution
bash download.sh
On the terminal: sbatch slurmsript.sh (it doesn't work; no job ID is returned).
You can download multiple files with a single curl command (add --parallel, available since curl 7.66.0, if the transfers should run concurrently).
In your case, this should work:
# Create an empty bash array of urls.
urls=()
# Add each url to the array, such that '-O' and the url are separate
# items in the array. This is necessary so that the curl command will
# look like 'curl -O <url1> -O <url2> ...', since the -O command must
# be provided for each url.
for i in 11 08 15 26; do
urls+=( "-O" "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_1.fastq.gz" )
urls+=( "-O" "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116/0${i}/SRR116802${i}/SRR116802${i}_2.fastq.gz" )
done
# Download from all urls in one curl invocation. Without --parallel the
# transfers run one after another.
curl --silent -C - "${urls[@]}"
To explain each of the curl options:
--silent is the equivalent of wget's -q. Disables curl's progress meter.
-C - is the equivalent of wget's -c. It tells curl to automatically find out where/how to resume a transfer.
-O tells curl to write the output to a file with the same name as the remote file (this is the behavior of wget). This must be specified for each url.
Alternatively, you might want to consider installing and using aria2.
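If the goal is several transfers at once from inside a batch job, one option (not part of the answer above) is to feed the URL list to xargs -P, which stays in the foreground until every download has finished. That matters under Slurm because a likely reason wget -b fails there is that wget detaches and the batch script returns right away, which ends the job. list_urls, download_all and the limit of 4 concurrent transfers are assumptions made for this sketch; the sample numbers are the ones from the question.

```shell
#!/bin/bash
# Sketch only: list_urls and download_all are illustrative helper names.

# Print one URL per line for both read files of each sample.
list_urls() {
  local base="ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR116"
  local i r
  for i in 11 08 15 26; do
    for r in 1 2; do
      echo "${base}/0${i}/SRR116802${i}/SRR116802${i}_${r}.fastq.gz"
    done
  done
}

# Run up to 4 curl processes at a time (-P 4), one URL per process (-n 1).
# xargs returns only after all transfers finish, so the job script stays alive.
download_all() {
  mkdir -p data
  ( cd data && list_urls | xargs -n 1 -P 4 curl --silent -C - -O )
}

# Dry run: show what would be fetched without touching the network.
list_urls
```

In the Slurm script, download_all would be called in place of bash download.sh; the dry run at the end only prints the eight URLs.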

Why is a job submitted with qsub reported as unknown?

I'm trying to create a PBS script to run long-running jobs on a server with 256 GB of RAM and two CPUs, each with 12 cores and 24 threads, yielding 48 computing units. I tried to do it, but I think there is something wrong.
I created a PBS script named run_trinity and submitted it to the server with qsub (qsub run_trinity.sh) from the directory that contains my program (Trinity) and data, and it returned something like 47.chpc. But when I tried to check the status of the job with qstat, it said: unknown job id 47.chpc. I'm a biology student and really new to this field; could you please help me figure out what happened? Here is my PBS script:
#!/bin/bash
#PBS -N run_trinity
#PBS -l nodes=1:ppn=6
#PBS -l walltime=100:00:00
#PBS -l mem=200gb
#PBS -j oe
#Set stack size to unlimited
ulimit -s unlimited
cd /home/mary/software/trinityrnaseq_r20140717
perl /home/mary/software/trinityrnaseq_r20140717/Trinity.pl --seqType fq --JM 200G --normalize_reads --left reads8_1.fq.gz --right reads8_2.fq.gz --SS_lib_type FR --CPU 6 --full_cleanup --output /home/mary/software/trinityrnaseq_r20140717
Looking forward to hearing your perfect solutions.

Create a new batch .txt file with specified content for every current file in a directory

I have a huge list of files on a cluster and I need to create a .txt file for each "pair". Each pair consists of filename_R1.fq.gz and filename_R2.fq.gz. For each pair of R1 and R2 files I need to create a text file that contains:
#!/bin/bash
#$ -N align.$i
#$ -j y
#$ -l h_rt=4:00:00
#$ -pe omp 12
bowtie2 \
--phred33 \
--fast-local \
-X 1000 \
-p 12 \
-x /usr3/graduate/dhc285/reference_files/21G6 \
-1 $i -2 ${i%_R1.fq.gz}_R2.fq.gz \
| samtools view -bS - > ${i%_R1.fq.gz}.bam
Here $i refers to my filenames. I would also like each file to be named ${i%_R1.fq.gz}.txt. Thanks!
Using GNU Parallel it looks like this:
sge_jobfile() {
i="$1"
cat <<EOF > ${i%_R1.fq.gz}.txt
#!/bin/bash
#$ -N align.$i
#$ -j y
#$ -l h_rt=4:00:00
#$ -pe omp 12
bowtie2 \\
--phred33 \\
--fast-local \\
-X 1000 \\
-p 12 \\
-x /usr3/graduate/dhc285/reference_files/21G6 \\
-1 $i -2 ${i%_R1.fq.gz}_R2.fq.gz \\
| samtools view -bS - > ${i%_R1.fq.gz}.bam
EOF
}
export -f sge_jobfile
parallel sge_jobfile ::: *_R1.fq.gz
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to. It can often replace a for loop.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU.
GNU Parallel instead spawns a new process when one finishes, keeping the CPUs active and thus saving time.
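For comparison, the same job files can be generated with a plain for loop, which is the kind of loop GNU Parallel often replaces. This is a minimal sketch rather than the answer's method: write_jobfile and write_all_jobfiles are hypothetical names, and the job-file body is copied from the question.

```shell
#!/bin/bash
# Sketch only: write_jobfile and write_all_jobfiles are illustrative names.

# Write one SGE job file for the read pair that starts with the given R1 file.
write_jobfile() {
  local r1="$1"
  local base="${r1%_R1.fq.gz}"
  # Unquoted EOF: ${r1} and ${base} expand now, "\$" yields a literal "$",
  # and "\\" becomes a single "\" (a line continuation) in the job file.
  cat > "${base}.txt" <<EOF
#!/bin/bash
#\$ -N align.${r1}
#\$ -j y
#\$ -l h_rt=4:00:00
#\$ -pe omp 12
bowtie2 \\
--phred33 \\
--fast-local \\
-X 1000 \\
-p 12 \\
-x /usr3/graduate/dhc285/reference_files/21G6 \\
-1 ${r1} -2 ${base}_R2.fq.gz \\
| samtools view -bS - > ${base}.bam
EOF
}

# Loop over every R1 file in the current directory.
write_all_jobfiles() {
  local f
  for f in *_R1.fq.gz; do
    # Skip the literal pattern when no file matches.
    [ -e "$f" ] || continue
    write_jobfile "$f"
  done
}

write_all_jobfiles
```

The loop writes the files one after another, which is fine for small text files like these; GNU Parallel pays off when each iteration does real work.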
Installation
If GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
(wget -O - pi.dk/3 || curl pi.dk/3/ || fetch -o - http://pi.dk/3) | bash
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel

SunGridEngine, Condor, Torque as Resource Managers for PVM

Anyone have any idea which Resource manager is good for PVM? Or should I not have used PVM and instead relied on MPI (or any version of it, such as MPICH-2 [are there any other ones that are better?]). Main reason for using PVM was because the person before me who started this project assumed the use of PVM. However, now that this project is mine (he hasn't done any significant work that relies on PVM) this can be easily changed, preferably to something that is easy to install because installing and setting up PVM was a big hassle.
I'm leaning towards SunGridEngine, since I have dedicated hardware, and after reading another post about which ones are better for dedicated hardware, SGE seems to be the winner. However, I'm unsure of its performance with PVM. Has anyone had any experience with PVM and SGE?
If people use SGE, what do you use to communicate from computer to computer (or virtual machine to virtual machine)?
Oh and I will be running Perl applications/lines if this matters.
Any suggestions or ideas?
Thanks in advance to all comments,
Tyug
I run PVM on Linux systems using Torque, SGE and LSF without any problems. Are you asking "Is it possible to use SGE, Torque, etc. to run PVM applications?"?
If so, check out my example Linux c-shell job scripts below. Note the scripts are nearly identical, except for the header of each script, which conforms to the appropriate format for each resource manager.
SGE job script:
#!/bin/csh
#$ -N LTR-001
#$ -o LTR-001.output
#$ -e LTR-001.error
#$ -pe comp 24
#$ -l h_rt=04:00:00
#$ -A cmit2
#$ -cwd
#$ -V
# Set up environment
setenv LD_LIBRARY_PATH /lfs0/projects/cmit2/opt-intel/overture-noX/lib:${LD_LIBRARY_PATH}
setenv PVM_ARCH LINUX
setenv PVM_ROOT /lfs0/projects/cmit2/opt-intel/pvm3
setenv PVM_BIN ${PVM_ROOT}/bin
setenv PVM_RSH /usr/bin/ssh
setenv MY_HOSTS pvm_hostfile
rm -f ~/.pvmprofile
env | grep PVM_ > ~/.pvmprofile
# Create file containing _unique_ host names. Note that there are two possible sources of available hosts
sort -k 1,1 -u ${MACHINE_FILE} >! ${MY_HOSTS}
# Start PVM & add nodes
printf "%s\n%s\n" conf quit|${PVM_ROOT}/lib/pvm ${MY_HOSTS}
wait
sleep 2
#
# Run apps requiring PVM.
#
wait
# Exit PVM daemon
echo "reset" | $PVM_ROOT/lib/pvm
echo "halt" | $PVM_ROOT/lib/pvm
Torque job script:
#!/bin/csh
#PBS -N LTR-001
#PBS -o LTR-001.output
#PBS -e LTR-001.error
#PBS -l nodes=3:ppn=8
#PBS -l walltime=04:00:00
#PBS -q compute
#PBS -d .
# Set up environment
setenv LD_LIBRARY_PATH /users/ps14/opt-intel/overture/lib:${LD_LIBRARY_PATH}
setenv PVM_ARCH LINUX64
setenv PVM_ROOT /users/ps14/opt-intel/pvm3
setenv PVM_BIN ${PVM_ROOT}/bin
setenv PVM_RSH ${PVM_ROOT}/ssh
setenv MY_HOSTS pvm_hostfile
rm -f ~/.pvmprofile
env | grep PVM_ > ~/.pvmprofile
# Create file containing _unique_ host names. Note that there are two possible sources of available hosts
sort -k 1,1 -u ${PBS_NODEFILE} >! ${MY_HOSTS}
# Start PVM & add nodes
printf "%s\n%s\n" conf quit|${PVM_ROOT}/lib/pvm ${MY_HOSTS}
wait
sleep 2
#
# Run apps requiring PVM.
#
wait
# Exit PVM daemon
echo "reset" | $PVM_ROOT/lib/pvm
echo "halt" | $PVM_ROOT/lib/pvm
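Both scripts rely on the same deduplication step: the scheduler's host file usually lists a node more than once (Torque's $PBS_NODEFILE repeats a host name once per allocated slot), while PVM only needs each host once. The step can be tried in isolation with the sketch below. Note that it is written in bash, whereas the job scripts above are csh; unique_hosts and the sample node names are made up for illustration.

```shell
#!/bin/bash
# Sketch only: unique_hosts is an illustrative helper name.

# Keep one line per distinct host name (the first whitespace-separated field).
unique_hosts() {
  sort -k 1,1 -u "$1"
}

# Demo with a made-up node file in the Torque style (one line per slot).
nodefile=$(mktemp)
printf '%s\n' node-a node-a node-b node-a node-b > "$nodefile"
unique_hosts "$nodefile"
rm -f "$nodefile"
```

The demo prints each of the two sample host names once, which is the form the PVM console expects in its host file.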

Resources