Variable not getting recognized in shell script

I use the following shell script to run a simulation on my cluster.
#PBS -N 0.05_0.05_m_1_200k
#PBS -l nodes=1:ppn=1,pmem=1000mb
#PBS -S /bin/bash
#$ -m n
#$ -j oe
FOLDER= 0.57
WDIR=/home/vikas/ala_1_free_energy/membrane_200k/restraint_decoupling_pullinit_$FOLDER
cd /home/vikas/ala_1_free_energy/membrane_200k/restraint_decoupling_pullinit_$FOLDER
LAMBDA= 0.05
/home/durba/gmx455/bin/mdrun -np 1 -deffnm md0.05 -v
############################
Now my problem is that my script doesn't recognize the variable FOLDER and throws the error
couldn't find md0.05.tpr
even though the file exists in the folder. If I write 0.57 in place of $FOLDER, it works fine, which makes me think it's not recognizing the variable FOLDER. LAMBDA is recognized perfectly in both cases. If somebody can help me here, I will be grateful.

There should not be a space between the = and the value you wish to assign to the variables:
FOLDER="0.57"
WDIR="/home/vikas/ala_1_free_energy/membrane_200k/restraint_decoupling_pullinit_$FOLDER"
cd "/home/vikas/ala_1_free_energy/membrane_200k/restraint_decoupling_pullinit_$FOLDER"
LAMBDA="0.05"
/home/durba/gmx455/bin/mdrun -np 1 -deffnm md0.05 -v
############################
None of the double quotes I added are strictly necessary for this example; however, it is good practice to get into the habit of using them.
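To see why the space breaks things: with FOLDER= 0.57, the shell performs an empty assignment to FOLDER only for the environment of a command named 0.57, which it then tries (and fails) to run; FOLDER stays empty in the script itself. A minimal sketch of the behavior in bash, using the builtin `true` as a harmless stand-in for 0.57:

```shell
unset FOLDER            # start clean
FOLDER= true            # runs `true` with FOLDER="" in its environment only
echo "FOLDER=[$FOLDER]" # prints FOLDER=[] -- no assignment happened here

FOLDER="0.57"           # correct: no space around the =
echo "FOLDER=[$FOLDER]" # prints FOLDER=[0.57]
```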

Related

How to use variables on qsub?

I'm new to qsub and I'm trying to figure out how to use the task queue optimally. I have this script that works well:
#!/bin/bash
##PBS -V # Export all environment variables from the qsub command environment to the batch job.
#PBS -N run
#PBS -q normal.q
#PBS -e archivo.err
#PBS -o archivo.out
#PBS -pe mpirun 8
#PBS -d ~/ # Working directory (PBS_O_WORKDIR)
#PBS -l nodes=1:ppn=8
~/brinicle/step-2/onephase_3/./main.x --mesh ~/brinicle/step-2/onephase_3/results/mesh.msh -Rmin 0 -Rmax 10 -Zmin 0 -Zmax 10 -o 2 -r 2 -T_f -10 -a_l 7.8 -a_s 70.8 -dt 0.01 -t_f 1 -v_s 10 -ode 12 -reltol 0.00001 -abstol 0.00001
The problem, as you can see, is that the command line is huge and hard to edit from the command shell. I would want to separate it into variables such as
#MESH="--mesh ~/brinicle/step-2/onephase_3/results/mesh.msh"
#EXE="~/brinicle/step-2/onephase_3/./main.x"
.
.
.
$EXE $MESH $PARAMETERS
And for the other parameters too.
But when I do this, the program doesn't run and says that there's an illegal variable or that the variable is undefined. It is also very important to me to be able to easily change the parameters -o, -r, and -ode, and to send multiple jobs at once, for example 5 equal jobs with -o 1, then 5 with -o 2, and so on. I want to be able to modify -r and -ode the same way. The problem is that without using variables I don't know how to do that.
Please, if someone can tell me how to automate the script in this way, it would be a huge help.
Use bash arrays.
exe=(~/brinicle/step-2/onephase_3/./main.x)
mesh=(--mesh ~/brinicle/step-2/onephase_3/results/mesh.msh)
parms=(
-Rmin 0
-Rmax 10
-Zmin 0
-Zmax 10
# etc.
)
"${exe[@]}" "${mesh[@]}" "${parms[@]}"
Research bash arrays and how to use them, and quoting in shell. Prefer lower-case variable names. Research the order of expansions in the shell.
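A quick, hypothetical demonstration of that quoting rule (the --label option is made up for the demo): each array element expands to exactly one argument, even when it contains a space.

```shell
# Each element of the array becomes exactly one argument on expansion,
# including the element with an embedded space.
parms=(-Rmin 0 --label "run 1")
printf '<%s>\n' "${parms[@]}"
# <-Rmin>
# <0>
# <--label>
# <run 1>
```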
One alternative if you have a lot of static parameters and a lot of dynamic ones is to refactor into a function where you hard-code what doesn't change, and interpolate the parts which do change.
qrunmesh () {
qsub <<:
#!/bin/bash
##PBS -V # Export all environment variables from the qsub command environment to the batch job.
#PBS -N run
#PBS -q normal.q
#PBS -e archivo.err
#PBS -o archivo.out
#PBS -pe mpirun 8
#PBS -d ~/ # Working directory (PBS_O_WORKDIR)
#PBS -l nodes=1:ppn=8
"$1" --mesh "$2" -Rmin 0 -Rmax 10 -Zmin 0 -Zmax 10 \
-o "$3" -r "$4" -T_f -10 -a_l 7.8 -a_s 70.8 \
-dt 0.01 -t_f 1 -v_s 10 -ode "$5" \
-reltol 0.00001 -abstol 0.00001
:
}
for o in 1 2 3; do
for r in 5 10 15; do
for x in onephase_3 onephase_2 twophase_3; do
for ode in 12 13 15; do
for mesh in onephase_3 otherphase_2; do
qrunmesh "$x" "$mesh" "$o" "$r" "$ode"
done
done
done
done
done
(I'm not very familiar with qsub; I assume it accepts the script on standard input if you don't pass in a script name. If not, maybe you have to store the here document in a temporary file, submit it, and remove the temporary file.)

Creating a job-array from text file for Bash

I'm trying to create a job array that runs jobs simultaneously, taking each line from the text file "somemore.txt", where I have the directories of some files I want to run through the program "fastqc". This is the script:
#!/bin/bash
#$ -S /bin/bash
#$ -N QC
#$ -cwd
#$ -l h_vmem=24G
cd /emc/cbmr/users/czs772/
FILENAME=$(sed -n $SGE_TASK_ID"p" somemore.txt)
/home/czs772/FastQC/fastqc $FILENAME -outdir /emc/cbmr/users/czs772/marcQC
but I get the error: "No such file or directory"
Instead, if I run the code through a for loop, I get no error:
for name in $(cat /emc/cbmr/users/czs772/somemore.txt)
do /home/czs772/FastQC/fastqc $name -outdir /emc/cbmr/users/czs772/marcQC
done
So it makes me think that the mistake is in the script code and not the directory, but I can't make it work. I've also tried to open the file with "cat", but again, it didn't work.
Any idea why?
Problem solved!
I typed "cat -vet" to see hidden characters:
cat -vet 2fastqc.sh
#!/bin/bash^M$
#$ -S /bin/bash^M$
#$ -N FastQC^M$
#$ -cwd^M$
#$ -pe smp 1^M$
#$ -l h_vmem=12G^M$
^M$
cd /emc/cbmr/users/czs772/^M$
FILENAME=$(sed -n $SGE_TASK_ID"p" somemore.txt)^M$
/home/czs772/FastQC/fastqc $FILENAME -outdir marcQC
This showed a "^M" at the end of each line. I discovered this is something that can happen when writing scripts on Windows. It can be solved:
from the code editor (I use Sublime Text): select the code, then View tab -> Line Endings -> Unix (instead of Windows)
from the server, by typing: dos2unix [name of the script.sh]
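If dos2unix happens not to be installed on the server, the same fix can be sketched with tr, which simply deletes the carriage-return characters (the file names here are made up for the demo):

```shell
# Create a file with Windows (CRLF) line endings, then strip the \r
# the same way dos2unix would.
printf 'echo hello\r\n' > demo_crlf.sh
tr -d '\r' < demo_crlf.sh > demo_unix.sh
cat -vet demo_unix.sh   # prints: echo hello$  (no ^M any more)
```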
Thanks for your comments!

how to write a bash script that creates new scripts iteratively

How would I write a script that loops through all of my subjects and creates a new script per subject? The goal is to create a script that runs a program called FreeSurfer per subject on a supercomputer. The supercomputer queue restricts how long each script/job will take, so I will have each job run 1 subject. Ultimately I would like to automate the job submitting process since I cannot submit all the jobs at the same time. In my subjects folder I have three subjects: 3123, 3315, and 3412.
I am familiar with MATLAB scripting, so I was envisioning something like this
for i=1:length(subjects)
nano subjects(i).sh
<contents of FreeSurfer script>
input: /subjects(i)/scan_name.nii
output: /output/subjects(i)/<FreeSurfer output folders>
end
I know I mixed aspects of MATLAB and Linux, but hopefully it's relatively clear what the goal is. Please let me know if there is a better method.
Here is an example of the FreeSurfer script for a given subject
#!/bin/bash
#PBS -l walltime=25:00:00
#PBS -q long
export FREESURFER_HOME=/gpfs/software/freesurfer/6.0.0/freesurfer
source $FREESURFER_HOME/SetUpFreeSurfer.sh
export SUBJECTS_DIR=/gpfs/projects/Group/ppmi/freesurfer/subjects/
recon-all -i /gpfs/projects/Group/ppmi/all_anat/3105/Baseline/*.nii \
    -s $SUBJECTS_DIR/freesurfer/subjects/3105 -autorecon-all
The -i option gives the input and the -s option gives the output.
Change your script to accept the subject as an argument, so that you have only one generic script.
#!/bin/bash
#PBS -l walltime=25:00:00
#PBS -q long
subject="$1"
export FREESURFER_HOME=/gpfs/software/freesurfer/6.0.0/freesurfer
source $FREESURFER_HOME/SetUpFreeSurfer.sh
export SUBJECTS_DIR=/gpfs/projects/Group/ppmi/freesurfer/subjects/
recon-all -i /gpfs/projects/Group/ppmi/all_anat/"$subject"/Baseline/*.nii \
    -s $SUBJECTS_DIR/freesurfer/subjects/"$subject" -autorecon-all
and you can call it for all your subjects
for s in 3123 3315 3412;
do
./yourscriptnamehere.sh "$s"
done
Add error handling as desired.
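If you do want one script file per subject, as in the MATLAB-style sketch in the question, you can generate the scripts with a here document and submit each one. This is only a sketch: the paths come from the question, and the commented-out qsub call is an assumption about how jobs are submitted on your cluster.

```shell
#!/bin/bash
# Generate one job script per subject (paths taken from the question).
for subject in 3123 3315 3412; do
  cat > "job_${subject}.sh" <<EOF
#!/bin/bash
#PBS -l walltime=25:00:00
#PBS -q long
export FREESURFER_HOME=/gpfs/software/freesurfer/6.0.0/freesurfer
source \$FREESURFER_HOME/SetUpFreeSurfer.sh
export SUBJECTS_DIR=/gpfs/projects/Group/ppmi/freesurfer/subjects/
recon-all -i /gpfs/projects/Group/ppmi/all_anat/${subject}/Baseline/*.nii \\
    -s \$SUBJECTS_DIR/freesurfer/subjects/${subject} -autorecon-all
EOF
  # qsub "job_${subject}.sh"   # uncomment to actually submit each job
done
```

Inside the unquoted here document, ${subject} is expanded while the \$-escaped variables are left for the generated script to expand at run time.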

RNA-seq STAR alignment error in reading fastq files

I am writing a script to use the STAR aligner to map fastq files to a reference genome. Here is my code:
#!/bin/bash
#$ -N DT_STAR
#$ -l mem_free=200G
#$ -pe openmp 8
#$ -q bio,abio,pub8i
module load STAR/2.5.2a
cd /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017
mkdir David_data1
STAR --genomeDir /dfs1/bio/dtatarak/indexes/STAR_Index --readFilesIn /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read1.fastq
/dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read2.fastq --runThreadN 8 --outFileNamePrefix "David_data1/DT_1"
I keep getting this error message
EXITING because of fatal input ERROR: could not open readFilesIn=/dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read1.fastq
Does anyone have experience using STAR? I cannot figure out why it isn't able to open my read files.
The second space character between STAR and --genomeDir is a syntax error. There should be only one.
Another thing is the argument --outFileNamePrefix "David_data1/DT_1".
Are you sure that it accepts a path in quotes? Also, you have to create the directory DT_1 within David_data1 first, if you didn't already do so manually. Also, there always has to be a / in front of the paths.
--outFileNamePrefix /David_data1/DT_1/
Besides, are there any subdirectories in your STAR_Index folder? I always have to set the genomeDir argument like this:
--genomeDir path/to/STAR_index/STARindex/hg38/
The message is known to come up after syntax errors, so I hope it works if you try something like this:
#!/bin/bash
#$ -N DT_STAR
#$ -l mem_free=200G
#$ -pe openmp 8
#$ -q bio,abio,pub8i
module load STAR/2.5.2a
cd /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017
mkdir David_data1
cd David_data1
mkdir DT_1
cd ..
STAR --genomeDir /dfs1/bio/dtatarak/indexes/STAR_Index \
    --readFilesIn /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read1.fastq \
    /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read2.fastq \
    --runThreadN 8 --outFileNamePrefix /David_data1/DT_1/

Sun Grid Engine: name output file using value stored in variable

Thanks in advance for the help.
I am trying to submit a job using
qsub -q myQ myJob.sh
in myJob.sh I have
# Name of the output log file:
temp=$( date +"%s")
out="myPath"
out=$out$temp
#$ -v out
#$ -o $out
unset temp
unset out
What I want is for my output file to have a standard name with the Unix timestamp appended to the end, such as myOutputFile123456789.
When I run this, my output file is literally named "$out" rather than myOutputFile123456789. Is it possible to do what I want, and if so, how might I do it?
You can't set -o or -e programmatically inside the script. What you can do is point them at /dev/null and then redirect inside the script. Assuming you want the timestamp to be the time the job ran, and the job script is a Bourne-family shell script (including bash, ksh, and zsh scripts), the following should work:
#$ -o /dev/null
exec >myPath$(date +"%s")
You'll be throwing away any output from the prolog/epilog, though.
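Putting the two pieces together, a job script along these lines might look like the sketch below; "myPath" is the prefix from the question, and the final echo is just a placeholder for the real work.

```shell
#!/bin/bash
#$ -o /dev/null
#$ -j y
# From here on, send all stdout/stderr to a timestamped file,
# e.g. myPath1700000000 (prefix taken from the question).
out="myPath$(date +%s)"
exec > "$out" 2>&1
echo "job started at $(date)"   # placeholder for the real work
```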
