Creating separate output file per input file - bash

I'm using kofamscan by KEGG to annotate a bunch of fasta files. Since I run it over multiple fasta files, the output file gets overwritten every time a new file is analyzed. I want a separate output file per input file (i.e. a.fasta -> a.txt, b.fasta -> b.txt, etc.). I have tried the following, but it doesn't seem to work:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe def_slot 8
#$ -N coral_kofam
#$ -o stdout
#$ -e stderr
#$ -l os7
# perform kofam operation from file 1 to file 47
#$ -t 1-47:1
#$ -tc 10
#setting
source ~/.bash_profile
readarray -t files < kofam_files #input files
TASK_ID=$((SGE_TASK_ID - 1))
~/kofamscan/bin/exec_annotation -o kofam_out_[$TASK_ID].txt --tmp-dir $(mktemp -d) ${files[$TASK_ID]}
The following part of the script is what I need to change (obviously, since it is not working for me now):
-o kofam_out_[$TASK_ID].txt
Could anybody help me make this work?

Do you want to name the output file with $TASK_ID? Brackets are literal characters in a bash word, while ${TASK_ID} is the proper variable-expansion syntax.
Just write the file name like this: kofam_out_${TASK_ID}.txt
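A minimal sketch of why the braces matter, with stand-in values for the real readarray list and SGE_TASK_ID: inside a quoted word, brackets are kept as literal characters (so you get files like kofam_out_[0].txt), while ${TASK_ID} expands the variable; brackets remain correct only as the array index syntax.

```shell
#!/bin/bash
files=(a.fasta b.fasta c.fasta)   # stand-in for: readarray -t files < kofam_files
TASK_ID=1                         # stand-in for: TASK_ID=$((SGE_TASK_ID - 1))

echo "kofam_out_[$TASK_ID].txt"   # -> kofam_out_[1].txt (literal brackets)
echo "kofam_out_${TASK_ID}.txt"   # -> kofam_out_1.txt   (what you want)
echo "${files[$TASK_ID]}"         # -> b.fasta (brackets as array index)
```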

How to use variables on qsub?

I'm new to qsub and I'm trying to figure out how to use the task queue optimally. I have this script that works well:
#!/bin/bash
##PBS -V # Export all environment variables from the qsub command environment to the batch job.
#PBS -N run
#PBS -q normal.q
#PBS -e archivo.err
#PBS -o archivo.out
#PBS -pe mpirun 8
#PBS -d ~/ # Working directory (PBS_O_WORKDIR)
#PBS -l nodes=1:ppn=8
~/brinicle/step-2/onephase_3/./main.x --mesh ~/brinicle/step-2/onephase_3/results/mesh.msh -Rmin 0 -Rmax 10 -Zmin 0 -Zmax 10 -o 2 -r 2 -T_f -10 -a_l 7.8 -a_s 70.8 -dt 0.01 -t_f 1 -v_s 10 -ode 12 -reltol 0.00001 -abstol 0.00001
The problem, as you can see, is that the command line is huge and hard to edit from the command shell. I would want to separate it into variables such as
#MESH="--mesh ~/brinicle/step-2/onephase_3/results/mesh.msh"
#EXE="~/brinicle/step-2/onephase_3/./main.x"
.
.
.
$EXE $MESH $PARAMETERS
And for the other parameters too.
But when I do this, the program doesn't run and says that there's an illegal variable or that the variable is undefined. Also, it is very important for me to be able to easily change the parameters -o, -r, -ode and send multiple jobs at once: for example, 5 equal jobs with -o 1, then 5 with -o 2, and so on. I want to be able to vary -r and -ode in the same way. The problem is that without using variables I really don't know how to do that.
Please, if someone can tell me how to automate the script in this way, it would be a huge help.
Use bash arrays.
exe=(~/brinicle/step-2/onephase_3/./main.x)
mesh=(--mesh ~/brinicle/step-2/onephase_3/results/mesh.msh)
parms=(
-Rmin 0
-Rmax 10
-Zmin 0
-Zmax 10
. etc.
)
"${exe[@]}" "${mesh[@]}" "${parms[@]}"
Research bash arrays and how to use them, and quoting in the shell. Prefer lower-case variable names. Also research the order of expansions in the shell.
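A short sketch of what the arrays buy you (the values here are stand-ins): expanding a quoted "${name[@]}" yields exactly one word per element, so arguments containing spaces survive intact.

```shell
#!/bin/bash
mesh=(--mesh "results/my mesh.msh")   # one element contains a space
parms=(-Rmin 0 -Rmax 10)

# Prints one <...> line per element; the spaced element stays one word.
printf '<%s>\n' "${mesh[@]}" "${parms[@]}"
```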
One alternative if you have a lot of static parameters and a lot of dynamic ones is to refactor into a function where you hard-code what doesn't change, and interpolate the parts which do change.
qrunmesh () {
qsub <<:
#!/bin/bash
##PBS -V # Export all environment variables from the qsub command environment to the batch job.
#PBS -N run
#PBS -q normal.q
#PBS -e archivo.err
#PBS -o archivo.out
#PBS -pe mpirun 8
#PBS -d ~/ # Working directory (PBS_O_WORKDIR)
#PBS -l nodes=1:ppn=8
"$1" --mesh "$2" -Rmin 0 -Rmax 10 -Zmin 0 -Zmax 10 \
-o "$3" -r "$4" -T_f -10 -a_l 7.8 -a_s 70.8 \
-dt 0.01 -t_f 1 -v_s 10 -ode "$5" \
-reltol 0.00001 -abstol 0.00001
:
}
for o in 1 2 3; do
for r in 5 10 15; do
for x in onephase_3 onephase_2 twophase_3; do
for ode in 12 13 15; do
for mesh in onephase_3 otherphase_2; do
qrunmesh "$x" "$mesh" "$o" "$r" "$ode"
done
done
done
done
done
(I'm not very familiar with qsub; I assume it accepts the script on standard input if you don't pass a script name. If not, you may have to store the here document in a temporary file, submit that, and then remove the temporary file.)
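That temporary-file fallback could be sketched like this (qrunmesh here is a reduced, hypothetical version of the function above, with the qsub call commented out so the sketch only prints the script it would submit):

```shell
#!/bin/bash
qrunmesh () {
    script=$(mktemp) || return 1
    cat > "$script" <<EOF
#!/bin/bash
#PBS -N run
"$1" --mesh "$2" -o "$3" -r "$4" -ode "$5"
EOF
    # qsub "$script"   # submission disabled in this sketch
    cat "$script"      # show what would be submitted
    rm -f "$script"
}

qrunmesh main.x results/mesh.msh 1 5 12
```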

Creating a job-array from text file for Bash

I'm trying to create a job array to run simultaneously taking each line from the text file "somemore.txt" where I have the directories of some files I want to run through the program "fastqc". This is the script:
#!/bin/bash
#$ -S /bin/bash
#$ -N QC
#$ -cwd
#$ -l h_vmem=24G
cd /emc/cbmr/users/czs772/
FILENAME=$(sed -n $SGE_TASK_ID"p" somemore.txt)
/home/czs772/FastQC/fastqc $FILENAME -outdir /emc/cbmr/users/czs772/marcQC
but I get the error: "No such file or directory"
If, instead, I run the code through a for loop, I get no error:
for name in $(cat /emc/cbmr/users/czs772/somemore.txt)
do /home/czs772/FastQC/fastqc $name -outdir /emc/cbmr/users/czs772/marcQC
done
So it makes me think the mistake is in the script and not in the directory, but I can't make it work. I've also tried to open the file with "cat", but again, it didn't work.
Any idea why?
Problem solved!
I typed "cat -vet" to see hidden characters:
cat -vet 2fastqc.sh
#!/bin/bash^M$
#$ -S /bin/bash^M$
#$ -N FastQC^M$
#$ -cwd^M$
#$ -pe smp 1^M$
#$ -l h_vmem=12G^M$
^M$
cd /emc/cbmr/users/czs772/^M$
FILENAME=$(sed -n $SGE_TASK_ID"p" somemore.txt)^M$
/home/czs772/FastQC/fastqc $FILENAME -outdir marcQC
This showed a "^M" at the end of each line. I just discovered this is something that can happen when a script is written on Windows. It can be fixed:
from the code editor (I use Sublime Text): select the code, then View tab -> Line Endings -> Unix (instead of Windows)
from the server by typing: dos2unix [name of the script.sh]
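The ^M problem and the fix can be reproduced in a few lines (crlf_demo.sh is a made-up file name; tr -d '\r' does the same job as dos2unix):

```shell
#!/bin/bash
# Create a file with Windows (CRLF) line endings, then strip the \r bytes.
printf 'line one\r\nline two\r\n' > crlf_demo.sh
tr -d '\r' < crlf_demo.sh > unix_demo.sh

cat -vet crlf_demo.sh   # shows: line one^M$  (the stray carriage returns)
cat -vet unix_demo.sh   # shows: line one$    (clean Unix line endings)
```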
Thanks for your comments!

RNA-seq STAR alignment error in reading fastq files

I am writing a script to use the STAR aligner to map fastq files to a reference genome. Here is my code:
#!/bin/bash
#$ -N DT_STAR
#$ -l mem_free=200G
#$ -pe openmp 8
#$ -q bio,abio,pub8i
module load STAR/2.5.2a
cd /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017
mkdir David_data1
STAR --genomeDir /dfs1/bio/dtatarak/indexes/STAR_Index --readFilesIn /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read1.fastq
/dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read2.fastq --runThreadN 8 --outFileNamePrefix "David_data1/DT_1"
I keep getting this error message
EXITING because of fatal input ERROR: could not open readFilesIn=/dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read1.fastq
Does anyone have experience using STAR? I cannot figure out why it isn't able to open my read files.
The second space character between STAR and --genomeDir is a syntax error. There should be only one.
Another thing is the argument --outFileNamePrefix "David_data1/DT_1"
Are you sure that it takes a path in quotes? Also, you have to create the directory DT_1 within David_data1 first, if you haven't already done so manually. Also, there always has to be a / in front of the paths.
--outFileNamePrefix /David_data1/DT_1/
Besides, are there any subdirectories in your STAR_Index folder? I always have to set the genomeDir argument like this:
--genomeDir path/to/STAR_index/STARindex/hg38/
This message is known to come up after syntax errors, so I hope it works if you try something like this:
#!/bin/bash
#$ -N DT_STAR
#$ -l mem_free=200G
#$ -pe openmp 8
#$ -q bio,abio,pub8i
module load STAR/2.5.2a
cd /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017
mkdir David_data1
cd David_data1
mkdir DT_1
cd ..
STAR --genomeDir /dfs1/bio/dtatarak/indexes/STAR_Index --readFilesIn /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read1.fastq \
/dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read2.fastq --runThreadN 8 --outFileNamePrefix /David_data1/DT_1/
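One small simplification, if helpful: the mkdir / cd / mkdir / cd .. sequence can be collapsed into a single call, since mkdir -p creates the whole path at once and doesn't complain if it already exists.

```shell
#!/bin/bash
# Equivalent to: mkdir David_data1; cd David_data1; mkdir DT_1; cd ..
mkdir -p David_data1/DT_1
```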

How to save the output files in the corresponding folders

I have many allsamples.bam files in different folders, and I want to extract the unmapped reads from all of them and save them as unmapped.bam in the corresponding folders. How do I do that? allbamfiles.txt contains the paths to all my bam files.
#!/usr/bin/env bash
#$ -q cluster
#$ -cwd
#$ -N test
#$ -e /path/to/log
#$ -o /path/to/log
#$ -l job_mem=8G
#$ -pe serial 4
SAMTOOLS="/path/to/samtools"
while IFS= read -r file
do
$SAMTOOLS view -b -f 4 $file > "${file%.bam}_unmapped.bam"
done < "/path/to/allbamfiles.txt"
wait
Assuming that the paths in allbamfiles.txt are relative to the current directory or are absolute, this solution should work.
Notice that the dirname command gets the directory part of the path and the basename command gets the file name.
SAMTOOLS="/path/to/samtools"
while IFS= read -r file; do
dir=$(dirname "$file")
fileName=$(basename "$file")
$SAMTOOLS view -b -f 4 "$file" > "${dir}/${fileName%.bam}_unmapped.bam"
done < "/path/to/allbamfiles.txt"
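To see how the output name is assembled, here is the same expansion applied to a made-up path:

```shell
#!/bin/bash
file="/data/sampleA/allsamples.bam"     # hypothetical entry from allbamfiles.txt
dir=$(dirname "$file")                  # -> /data/sampleA
fileName=$(basename "$file")            # -> allsamples.bam
out="${dir}/${fileName%.bam}_unmapped.bam"
echo "$out"                             # -> /data/sampleA/allsamples_unmapped.bam
```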

Sun Grid Engine: name output file using value stored in variable

Thanks in advance for the help.
I am trying to pass a job using
qsub -q myQ myJob.sh
in myJob.sh I have
# Name of the output log file:
temp=$( date +"%s")
out="myPath"
out=$out$temp
#$ -v out
#$ -o $out
unset temp
unset out
What I want is for my output file to have standard name with the unix timestamp appended to the end such as myOutputFile123456789
When I run this, my output file is named literally "$out" rather than myOutputFile123456789. Is it possible to do what I want and if so how might I do it?
You can't set -o or -e programmatically inside the script. What you can do is point them at /dev/null and then redirect inside the script. Assuming you want the timestamp to be the time the job ran, and the job script is a Bourne-style shell script (bash, ksh and zsh included), the following should work:
#$ -o /dev/null
exec >myPath$(date +"%s")
You'll be throwing away any output from the prolog/epilog though.
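A self-contained sketch of the redirect trick (demo_ is a stand-in prefix; the job script would use its own myPath prefix): after the exec line, everything written to stdout lands in the timestamped file.

```shell
#!/bin/bash
out="demo_$(date +%s).log"   # stand-in for: myPath$(date +"%s")
exec > "$out"                # from here on, stdout goes to the file
echo "this line lands in the log file, not on the terminal"
```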
