Slurm script for parallel execution of independent tasks not working - parallel-processing

I am having a problem with the Slurm script as shown below:
#!/bin/bash
#
#SBATCH --job-name=parReconstructPar # Job name
#SBATCH --output=log.parReconstructPar # Standard output and error log
#SBATCH --partition=orbit # define the partition
#SBATCH -n 32
#
srun --exclusive -n1 reconstructPar -allRegions -time 0.0:0.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 0.35:0.65 &
srun --exclusive -n1 reconstructPar -allRegions -time 0.7:1.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.05:1.35 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.4:1.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.75:2.05 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.1:2.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.45:2.75 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.8:3.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.15:3.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.45:3.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.75:4.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.05:4.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.35:4.6 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.65:4.9 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.95:5.2 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.25:5.5 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.55:5.8 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.85:6.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.15:6.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.45:6.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.75:7.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.05:7.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.35:7.6 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.65:7.9 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.95:8.2 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.25:8.5 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.55:8.8 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.85:9.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.15:9.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.45:9.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.75:10.0 &
The script is supposed to submit several tasks that are independent of each other and should run in parallel. However, when I submit the job to the scheduler, the tasks are not launched and the job disappears immediately. The log file does not contain a single entry.
If someone could tell me what is wrong with this, it would be much appreciated.
Best regards
I have already tried running the script without --exclusive and with an explicit memory allocation.

You are missing the wait command at the end of the submission script. Without wait to block until all the backgrounded srun processes have completed, the script exits straight away, the job ends, and the steps it just launched are killed, which is exactly what you are seeing.
In other words, your script should be:
#!/bin/bash
#
#SBATCH --job-name=parReconstructPar # Job name
#SBATCH --output=log.parReconstructPar # Standard output and error log
#SBATCH --partition=orbit # define the partition
#SBATCH -n 32
#
srun --exclusive -n1 reconstructPar -allRegions -time 0.0:0.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 0.35:0.65 &
srun --exclusive -n1 reconstructPar -allRegions -time 0.7:1.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.05:1.35 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.4:1.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 1.75:2.05 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.1:2.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.45:2.75 &
srun --exclusive -n1 reconstructPar -allRegions -time 2.8:3.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.15:3.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.45:3.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 3.75:4.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.05:4.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.35:4.6 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.65:4.9 &
srun --exclusive -n1 reconstructPar -allRegions -time 4.95:5.2 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.25:5.5 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.55:5.8 &
srun --exclusive -n1 reconstructPar -allRegions -time 5.85:6.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.15:6.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.45:6.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 6.75:7.0 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.05:7.3 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.35:7.6 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.65:7.9 &
srun --exclusive -n1 reconstructPar -allRegions -time 7.95:8.2 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.25:8.5 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.55:8.8 &
srun --exclusive -n1 reconstructPar -allRegions -time 8.85:9.1 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.15:9.4 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.45:9.7 &
srun --exclusive -n1 reconstructPar -allRegions -time 9.75:10.0 &
wait
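As an aside, the same script can be written as a loop over the time windows, which makes the final wait harder to forget. A sketch of the script body, using the first few ranges from above (extend the list with the remaining windows):
# One background srun per time window, each running as a single-task step.
for window in 0.0:0.3 0.35:0.65 0.7:1.0 1.05:1.35   # ...remaining windows
do
    srun --exclusive -n1 reconstructPar -allRegions -time "$window" &
done
wait   # block until every background step has finished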

Related

How to append memory usage for each step within a shell script in slurm output

I have a bash script:
#!/bin/bash
time srun -p my_partition -c 1 --mem=4G my_code -i my_file_1 -o my_output_file_1
time srun -p my_partition -c 1 --mem=4G my_code -i my_file_2 -o my_output_file_2
time srun -p my_partition -c 1 --mem=4G my_code -i my_file_3 -o my_output_file_3
time srun -p my_partition -c 1 --mem=4G my_code -i my_file_4 -o my_output_file_4
I want to know the average memory usage for each step (printed after the real/user/sys time) while the script is running.
I have tried
#!/bin/bash
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_1 -o my_output_file_1
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_2 -o my_output_file_2
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_3 -o my_output_file_3
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_4 -o my_output_file_4
sstat -a -j my_job --format=JobName,AveRSS,MaxRSS
You can try
#!/bin/bash
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_1 -o my_output_file_1
sstat -j ${SLURM_JOB_ID}.1 --format=JobName,AveRSS,MaxRSS
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_2 -o my_output_file_2
sstat -j ${SLURM_JOB_ID}.2 --format=JobName,AveRSS,MaxRSS
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_3 -o my_output_file_3
sstat -j ${SLURM_JOB_ID}.3 --format=JobName,AveRSS,MaxRSS
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_4 -o my_output_file_4
sstat -j ${SLURM_JOB_ID}.4 --format=JobName,AveRSS,MaxRSS
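If job accounting is enabled on the cluster, an alternative is to query sacct once after all steps have finished rather than calling sstat after each one (sstat is intended for steps that are still running; also note that numbered job steps usually start at 0, so the first srun in a job is typically step ${SLURM_JOB_ID}.0). A sketch:
#!/bin/bash
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_1 -o my_output_file_1
time srun -p my_partition -c 1 --mem=4G --job-name="my_job" my_code -i my_file_2 -o my_output_file_2
# ... remaining steps ...
# Average and peak resident memory per step, once the steps have completed
# (requires Slurm accounting to be configured on the cluster).
sacct -j "$SLURM_JOB_ID" --format=JobID,JobName,AveRSS,MaxRSS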

Hisat2 with job array

I want to use HISAT2 instead of Bowtie2, but I have a problem with my script:
#!/bin/bash
##SBATCH --time=5:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=32G
#SBATCH --cpus-per-task=16 # Nb of threads we want to run on
#SBATCH -o log/slurmjob-%j
#SBATCH --job-name=hist2
#SBATCH --partition=short
#SBATCH --array=0-5
module load gcc/4.8.4 HISAT2/2.0.5 samtools/1.3
SCRATCHDIR=/storage/scratch/"$USER"/"$SLURM_JOB_ID"
DATABANK="$HOME/projet/GRCm38/bwa"
OUTPUT="$HOME"/hisat2
mkdir -p "$OUTPUT"
mkdir -p "$SCRATCHDIR"
cd "$SCRATCHDIR"
#Define an array to optimize tasks
ARRAY=()
hisat2 -p 8 -x "$DATABANK"/all.* -1 "$HOME"/chipseq/${ARRAY[$SLURM_ARRAY_TASK_ID]}_R1_trim.fastq.gz -2 "$HOME"/chipseq/${ARRAY[$SLURM_ARRAY_TASK_ID]}_R2_trim.fastq.gz -S Hisat2_out.${ARRAY[$SLURM_ARRAY_TASK_ID]}.sam | samtools view -b -S - | samtools sort - -o Hisat2_out.${ARRAY[$SLURM_ARRAY_TASK_ID]}.mapped.sorted.bam
samtools idxstats Hisat2_out.${ARRAY[$SLURM_ARRAY_TASK_ID]}.mapped.sorted.bam > $OUTPUT/"$HOME/hisat2/hisat2_indxstat".log
mv "$SCRATCHDIR" "$OUTPUT"
The error occurs here:
${ARRAY[$SLURM_ARRAY_TASK_ID]}: unbound variable
Thank you for the help !
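The root of that message is that ARRAY is declared empty (ARRAY=()) and never populated, so ${ARRAY[$SLURM_ARRAY_TASK_ID]} expands to nothing. A minimal sketch of how the array is meant to be filled and indexed (the sample prefixes below are purely hypothetical placeholders; use your own FASTQ name prefixes and keep --array in line with the number of entries):
#SBATCH --array=0-5
# Hypothetical sample name prefixes, one per array task (indices 0 to 5).
ARRAY=(sampleA sampleB sampleC sampleD sampleE sampleF)
SAMPLE=${ARRAY[$SLURM_ARRAY_TASK_ID]}
echo "Task $SLURM_ARRAY_TASK_ID maps to sample prefix $SAMPLE"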

How to process a list of files with SLURM

I'm new to SLURM. I want to process a list of files, assembled_reads/*.sorted.bam, in parallel. With the code below, however, only one process is used over and over again.
#!/bin/bash
#
#SBATCH --job-name=****
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --partition=short
#SBATCH --time=12:00:00
#SBATCH --array=1-100
#SBATCH --mem-per-cpu=16000
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=****#***.edu
srun hostname
for FILE in assembled_reads/*.sorted.bam; do
echo ${FILE}
OUTFILE=$(basename ${FILE} .sorted.bam).raw.snps.indels.g.vcf
PLDY=$(awk -F "," '$1=="$FILE"{print $4}' metadata.csv)
PLDYNUM=$( [[$PLDY = "haploid" ]] && echo "1" || echo "2")
srun java -Djava.io.tmpdir="tmp" -jar GenomeAnalysisTK.jar \
-R scaffs_HAPSgracilaria92_50REF.fasta \
-T HaplotypeCaller \
-I ${${SLURM_ARRAY_TASK_ID}} \
--emitRefConfidence GVCF \
-ploidy $PLDYNUM \
-nt 1 \
-nct 24 \
-o $OUTFILE
sleep 1 # pause to be kind to the scheduler
done
You are creating a job array but not using it. You should replace the for-loop with indexing into the list of files based on the Slurm array task ID:
#!/bin/bash
#
#SBATCH --job-name=****
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
#SBATCH --partition=short
#SBATCH --time=12:00:00
#SBATCH --array=0-99
#SBATCH --mem-per-cpu=16000
#SBATCH --mail-type=FAIL
#SBATCH --mail-user=****#***.edu
srun hostname
FILES=(assembled_reads/*.sorted.bam)
FILE=${FILES[$SLURM_ARRAY_TASK_ID]}
echo ${FILE}
OUTFILE=$(basename ${FILE} .sorted.bam).raw.snps.indels.g.vcf
PLDY=$(awk -F"," -v file="$FILE" '$1==file {print $4}' metadata.csv)
PLDYNUM=$( [[ "$PLDY" = "haploid" ]] && echo "1" || echo "2")
srun java -Djava.io.tmpdir="tmp" -jar GenomeAnalysisTK.jar \
-R scaffs_HAPSgracilaria92_50REF.fasta \
-T HaplotypeCaller \
-I ${FILE} \
--emitRefConfidence GVCF \
-ploidy $PLDYNUM \
-nt 1 \
-nct 24 \
-o $OUTFILE
Just make sure to adapt the range given to --array so that it matches the number of files to process.
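For instance, the array can be sized automatically at submission time (a sketch; job.sh is a hypothetical name for the script above, and the command-line option overrides the #SBATCH --array line):
N=$(ls assembled_reads/*.sorted.bam | wc -l)   # number of input files
sbatch --array=0-$((N - 1)) job.sh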

Hbase Indexer with Lily Solr deletes indexed data on adding New Indexer

I use HBase Indexer with Lily Solr (Lucene 4.4) on my CDH4 system.
I have 2 servers running lily-solr and 5 indexer configurations to index different data types; I also use lily-server for storing data in HBase.
When I add an indexer configuration and request a batch build, it initially indexes all the data but then gradually deletes all the indexed documents.
I add each indexer in the following way:
asset_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n asset_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c asset_indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n asset_idx01 --batch BUILD_REQUESTED
asset_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n asset_idx02 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster02:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c asset_indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n asset_idx02 --batch BUILD_REQUESTED
brightcove_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n brightcove_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c brightcove_indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n brightcove_idx01 --batch BUILD_REQUESTED
brightcove_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n brightcove_idx02 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster02:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c brightcove_indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n brightcove_idx02 --batch BUILD_REQUESTED
fortune_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n fortune_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c fortune_indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n fortune_idx01 --batch BUILD_REQUESTED
fortune_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n fortune_idx01 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster02:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c fortune_indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n fortune_idx02 --batch BUILD_REQUESTED
populist_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n populist_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n populist_idx01 --batch BUILD_REQUESTED
populist_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n populist_idx02 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n populist_idx02 --batch BUILD_REQUESTED
twitter_stat_idx01
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n twitter_stats_idx01 -z spotlight-prod-cluster01 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster01:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c twitter_stats_indexconfig.xml -cp lily.zk=spotlight-prod-cluster01
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n twitter_stats_idx01 --batch BUILD_REQUESTED
twitter_stat_idx02
/usr/lib/hbase-indexer/bin/hbase-indexer add-indexer -n twitter_stats_idx02 -z spotlight-prod-cluster02 -cp solr.mode=classic -cp solr.shard.1=http://spotlight-prod-cluster02:7575/solr -r org.lilyproject.indexer.hbase.mapper.LilyIndexerComponentFactory -cp solr.sharder=org.lilyproject.indexer.hbase.mapper.LilySharder -c twitter_stats_indexconfig.xml -cp lily.zk=spotlight-prod-cluster02
/usr/lib/hbase-indexer/bin/hbase-indexer update-indexer -n twitter_stats_idx02 --batch BUILD_REQUESTED
Can anyone please help me identify the possible root causes, so that I can rectify the problem?
I follow these steps:
Import Data to Hbase
Start Lily Server
Start Hbase Indexer
Add Indexer Configuration to Hbase
Request Batch Build as mentioned above.

xautolock doesn't start a second time

I'll give an example to describe my problem.
#!/bin/sh
if (( $# == 1 ))
then
xmessage "before kill"
killall xautolock
xmessage "after kill"
var=$1
let "var += 1"
xautolock -time $var -locker "\"./test1.sh\"" &
xmessage "after run"
exit 0
fi
The first time I start xautolock from bash:
$ xautolock -time 1 -locker "./test1.sh 1" &
The -time option means that xautolock will start the program passed as the argument of the -locker option after 1 minute of idle time.
After starting xautolock from bash:
$ ps ax | grep -E "xaut|test"
6038 pts/1 S 0:00 xautolock -time 1 -locker ./test1.sh 1
6046 pts/2 S+ 0:00 grep -E xaut|test
After starting xmessage "before kill" :
$ ps ax | grep -E "xaut|test"
6038 pts/1 S 0:00 xautolock -time 1 -locker ./test1.sh 1
6223 pts/1 S 0:00 /bin/sh /home/mhd/Texts/Programming/Programms/test1.sh 1
6240 pts/2 S+ 0:00 grep -E xaut|test
After starting xmessage "after kill":
$ ps ax | grep -E "xaut|test"
6223 pts/1 S 0:00 /bin/sh /home/mhd/Texts/Programming/Programms/test1.sh 1
6373 pts/2 S+ 0:00 grep -E xaut|test
After starting xmessage "after run":
$ ps ax | grep -E "xaut|test"
6223 pts/1 S 0:00 /bin/sh /home/mhd/Texts/Programming/Programms/test1.sh 1
6470 pts/2 S+ 0:00 grep -E xaut|test
Why isn't xautolock in the list of processes after this step? How can I start it a second time in a Bash script?
xautolock closes stdout and stderr by default. If you pass the "-noclose" option to xautolock, it will not close stdout and stderr, and you can then start xautolock a second time from the Bash script. But I don't understand why xautolock will not start a second time in my sample script once it has closed stdout and stderr.
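That is, in the sample script the relaunch line would become something like the following (a sketch of the change suggested above; only the -noclose flag is added):
xautolock -noclose -time $var -locker "\"./test1.sh\"" &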
