Creating a job-array from text file for Bash - bash

I'm trying to create a job array whose tasks run simultaneously, each taking one line from the text file "somemore.txt", which lists the paths of the files I want to run through the program "fastqc". This is the script:
#!/bin/bash
#$ -S /bin/bash
#$ -N QC
#$ -cwd
#$ -l h_vmem=24G
cd /emc/cbmr/users/czs772/
FILENAME=$(sed -n $SGE_TASK_ID"p" somemore.txt)
/home/czs772/FastQC/fastqc $FILENAME -outdir /emc/cbmr/users/czs772/marcQC
But I get the error: "No such file or directory"
If I instead run the code through a for loop, I get no error:
for name in $(cat /emc/cbmr/users/czs772/somemore.txt)
do /home/czs772/FastQC/fastqc $name -outdir /emc/cbmr/users/czs772/marcQC
done
So it makes me think that the mistake is in the script code and not in the directory, but I can't get it to work. I've also tried reading the file with "cat", but again it didn't work.
Any idea why?

Problem solved!
I typed "cat -vet" to see hidden characters:
cat -vet 2fastqc.sh
#!/bin/bash^M$
#$ -S /bin/bash^M$
#$ -N FastQC^M$
#$ -cwd^M$
#$ -pe smp 1^M$
#$ -l h_vmem=12G^M$
^M$
cd /emc/cbmr/users/czs772/^M$
FILENAME=$(sed -n $SGE_TASK_ID"p" somemore.txt)^M$
/home/czs772/FastQC/fastqc $FILENAME -outdir marcQC
This showed a "^M" (carriage return) at the end of the lines, which I just discovered can happen when scripts are written on Windows. It can be fixed in either of two ways:
From the code editor (I use Sublime Text): select the code, then View tab -> Line Endings -> Unix (instead of Windows)
From the server: run dos2unix [name of the script.sh]
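If dos2unix is not installed on the server, the same check and fix can be sketched with standard tools. This is only a minimal sketch, not the method used above: the function names has_crlf and strip_cr and the demo file are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not from the thread.

# Succeeds if at least one line of the file ends in a carriage return.
has_crlf() {
    cr=$(printf '\r')
    grep -q "${cr}\$" "$1"
}

# Writes the file to stdout with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}

# Demo on a throwaway file with Windows line endings.
demo=$(mktemp)
printf 'echo one\r\necho two\r\n' > "$demo"
if has_crlf "$demo"; then
    strip_cr "$demo" > "$demo.unix"
    echo "converted: $demo.unix"
fi
```

The same check is worth running on somemore.txt, since a trailing carriage return on a path would also make the file name invalid.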
Thanks for your comments!

Related

Adding printers by shell script; works in terminal but not as .command

I am trying to provide a clickable .command file to set up printers on Macs at my workplace. Since it is something I do very frequently, I thought I could write a shell script for each printer and save it on a shared server. Then, when I need to add a printer for someone, I can just find the shell script on the server and execute it. My current command works in Terminal, but once executed as a .command, it fails with an error.
This is my script:
#!/bin/sh
lpadmin -p ‘PRINTERNAME’ -D PRINTER\ NAME -L ‘OFFICE’ -v lpd://xx.xx.xx.xx -P /Library/Printers/PPDs/Contents/Resources/Xerox\ WorkCentre\ 7855.gz -o printer-is-shared=false -E​
I get this error after running the script:
lpadmin: Unknown option “?”.
I find this strange, because there is no "?" in the script.
I have an idea: why not try it like this? There are big differences between sh shells, so let me know if it works; I have more ideas.
#!/bin/sh
# Printer (queue) name passed to -p
PPD="PRINTERNAME"
# Inside double quotes a space needs no backslash
INFO="PRINTER NAME"
LOC="OFFICE"
URI="lpd://xx.xx.xx.xx"
# No space is allowed before the = in an assignment
OP="printer-is-shared=false"
# -P takes the path to the printer's PPD file
P="/Library/Printers/PPDs/Contents/Resources/Xerox WorkCentre 7855.gz"
lpadmin -p "$PPD" -D "$INFO" -L "$LOC" -v "$URI" -P "$P" -o "$OP" -E
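One possible cause of an option reported as "?" is a character outside plain ASCII, such as a curly quote or an invisible space picked up when the command was pasted from a rich-text editor. This is an assumption, not something confirmed in the thread. A rough sketch for spotting such lines, with a hypothetical function name and a made-up demo file:

```shell
#!/bin/sh
# Hypothetical helper: prints the number of lines that contain bytes other
# than printable ASCII and ordinary whitespace.
count_non_ascii_lines() {
    LC_ALL=C grep -c '[^[:print:][:space:]]' "$1"
}

# Demo: the first line uses curly quotes (UTF-8 bytes), the second is plain.
demo=$(mktemp)
printf 'lpadmin -p \342\200\230NAME\342\200\231 -E\n' > "$demo"
printf "lpadmin -p 'NAME' -E\n" >> "$demo"
count_non_ascii_lines "$demo"
```

Running cat -v on the script is another way to look: it displays such bytes as M- sequences.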

Creating separate output file per input file

I'm using kofamscan by KEGG to annotate a bunch of fasta files. I'm running it with multiple fasta files, so whenever a new file is analyzed the output file is overwritten. I really want a separate output file per input file (i.e. a.fasta -> a.txt; b.fasta -> b.txt, etc.). I have tried the following, but it does not seem to work:
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe def_slot 8
#$ -N coral_kofam
#$ -o stdout
#$ -e stderr
#$ -l os7
# perform kofam operation from file 1 to file 47
#$ -t 1-47:1
#$ -tc 10
#setting
source ~/.bash_profile
readarray -t files < kofam_files #input files
TASK_ID=$((SGE_TASK_ID - 1))
~/kofamscan/bin/exec_annotation -o kofam_out_[$TASK_ID].txt --tmp-dir $(mktemp -d) ${files[$TASK_ID]}
The following part of the code is what I need to change (obviously, as it is not working for me now):
-o kofam_out_[$TASK_ID].txt
Could anybody help me make this work?
Do you want to name the output file using $TASK_ID?
Then drop the square brackets and write the file name like this: kofam_out_${TASK_ID}.txt (the braces only mark where the variable name ends).
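Since the goal stated in the question is one output per input (a.fasta -> a.txt), the name can also be derived from the input file instead of the task number. A minimal sketch; the function name out_name_for is hypothetical and the example paths are made up:

```shell
#!/bin/sh
# Hypothetical helper: maps an input path to "<name without extension>.txt".
out_name_for() {
    base=$(basename "$1")           # drop the directory part
    printf '%s.txt\n' "${base%.*}"  # drop the last extension, add .txt
}

out_name_for /data/a.fasta   # prints a.txt
out_name_for b.fasta         # prints b.txt
```

In the job script this could be used as -o "$(out_name_for "${files[$TASK_ID]}")", assuming the names listed in kofam_files are still unique once the directory part is dropped.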

Running executable file with additional options or arguments

I'm writing a bash script Test.sh that aims to execute anotherscript (a linux executable file):
#!/bin/bash -l
mp1='/my/path1/'
mp2='/my/path2/anotherscript'
for myfile in $mp1*.txt; do
echo "$myfile"
"$mp2 $myfile -m mymode"
echo "finished file"
done
Note that anotherscript takes $myfile as an argument and -m mymode as an option.
But I get a file-not-found error (it says Test.sh: line 8: /my.path2/anotherscript: No such file or directory).
My questions are:
I have followed this question to get my bash script to run the executable file. But I'm afraid I still get the error above.
Am I specifying arguments as they should to execute the file?
I suggest you use
sh -c "$mp2 $myfile -m mymode"
instead of just
"$mp2 $myfile -m mymode"
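#!/bin/bash -l
mp2='/my/path2/anotherscript'
# Read find's output one NUL-delimited path at a time; looping over a quoted
# "$dir" would hand every path to the loop as a single word.
find /my/path1/ -name "*.txt" -print0 | while IFS= read -r -d '' myfile; do
echo "$myfile"
# Command, file and options are separate words, each quoted on its own
"$mp2" "$myfile" -m mymode
echo "finished file"
done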
Make sure anotherscript has execute permission (chmod +x anotherscript).
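The difference between the failing line and the corrected one comes down to how many words the shell passes on. A small sketch that makes this visible; count_args is a hypothetical helper and the .txt path is made up:

```shell
#!/bin/sh
# Hypothetical helper: prints how many separate words it received.
count_args() {
    echo "$#"
}

mp2='/my/path2/anotherscript'
myfile='/my/path1/example.txt'   # made-up file name

# One quoted string: the shell looks for a program with this entire name.
count_args "$mp2 $myfile -m mymode"    # prints 1

# Separately quoted words: a command followed by three arguments.
count_args "$mp2" "$myfile" -m mymode  # prints 4
```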

RNA-seq STAR alignment error in reading fastq files

I am writing a script to use the STAR aligner to map fastq files to a reference genome. Here is my code:
#!/bin/bash
#$ -N DT_STAR
#$ -l mem_free=200G
#$ -pe openmp 8
#$ -q bio,abio,pub8i
module load STAR/2.5.2a
cd /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017
mkdir David_data1
STAR --genomeDir /dfs1/bio/dtatarak/indexes/STAR_Index --readFilesIn /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read1.fastq
/dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read2.fastq --runThreadN 8 --outFileNamePrefix "David_data1/DT_1"`
I keep getting this error message
EXITING because of fatal input ERROR: could not open readFilesIn=/dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read1.fastq
Does anyone have experience using STAR? I cannot figure out why it isn't able to open my read files.
Extra spaces between STAR and --genomeDir are harmless, but check that the whole command is on one logical line. As posted, the command breaks after DT_1_read1.fastq, so the shell would run STAR with only the first read file and then try to execute the second path as a separate command. End the first line with a backslash, or put everything on one line.
Another thing is the argument --outFileNamePrefix "David_data1/DT_1"
The quotes are fine, because the shell removes them before STAR sees the argument. If you want the output inside a directory DT_1 within David_data1, you have to create that directory first and end the prefix with a /. Do not put a / in front of a relative path, since that would point to the filesystem root.
--outFileNamePrefix David_data1/DT_1/
Besides, are there any subdirectories in your STAR_Index folder? Because I always have to set the genomeDir argument like this:
--genomeDir path/to/STAR_index/STARindex/hg38/
The message can come up after a malformed command line, so I hope it works if you try something like this:
#!/bin/bash
#$ -N DT_STAR
#$ -l mem_free=200G
#$ -pe openmp 8
#$ -q bio,abio,pub8i
module load STAR/2.5.2a
cd /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017
# Create the output directory (and its parent) if it does not exist yet
mkdir -p David_data1/DT_1
# Trailing backslashes keep the command on one logical line
STAR --genomeDir /dfs1/bio/dtatarak/indexes/STAR_Index \
--readFilesIn /dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read1.fastq \
/dfs1/bio/dtatarak/DT-advancement_RNAseq_stuff/RNAseq_10_4_2017/DT_1_read2.fastq \
--runThreadN 8 --outFileNamePrefix David_data1/DT_1/
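Because the error says STAR could not open the read file, it can help to test the inputs from inside the job before STAR starts. A minimal sketch; check_readable is a hypothetical helper, not part of STAR:

```shell
#!/bin/sh
# Hypothetical helper: reports every argument that is not a readable regular
# file and returns non-zero if any was reported.
check_readable() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ] || [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Demo with one existing temp file and one path that does not exist.
ok=$(mktemp)
check_readable "$ok" "$ok.missing" || echo "at least one input is missing"
```

In the job script the two fastq paths would be passed instead, for example check_readable "$read1" "$read2" || exit 1, where read1 and read2 are assumed variables holding those paths.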

How to save the output files in the corresponding folders

I have many allsamples.bam files in different folders, and I want to extract the unmapped reads from each of them and save them as unmapped.bam in the corresponding folder. How can I do this? allbamfiles.txt contains the paths to all my bam files.
#!/usr/bin/env bash
#$ -q cluster
#$ -cwd
#$ -N test
#$ -e /path/to/log
#$ -o /path/to/log
#$ -l job_mem=8G
#$ -pe serial 4
SAMTOOLS="/path/to/samtools"
while IFS= read -r file
do
$SAMTOOLS view -b -f 4 $file > "${file%.bam}_unmapped.bam"
done < "/path/to/allbamfiles.txt"
wait
Assuming that the paths in allbamfiles.txt are either relative to the current directory or absolute, this solution should work.
Note that the dirname command returns the directory part of a path and the basename command returns the file name.
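SAMTOOLS="/path/to/samtools"
# IFS= and -r keep each path exactly as written in the list
while IFS= read -r file; do
dir=$(dirname "$file")
fileName=$(basename "$file")
# Quote the expansions so paths containing spaces stay intact
"$SAMTOOLS" view -b -f 4 "$file" > "${dir}/${fileName%.bam}_unmapped.bam"
done < "/path/to/allbamfiles.txt"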
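The question asks for a file named unmapped.bam next to each input, whereas the loops above append _unmapped to the input name. A sketch of the path computation for the former; unmapped_path is a hypothetical helper and the sample paths are made up:

```shell
#!/bin/sh
# Hypothetical helper: prints "<directory of the input>/unmapped.bam".
unmapped_path() {
    printf '%s/unmapped.bam\n' "$(dirname "$1")"
}

unmapped_path /data/run1/allsamples.bam   # prints /data/run1/unmapped.bam
unmapped_path allsamples.bam              # prints ./unmapped.bam
```

Inside the loop, the redirect target would then become "$(unmapped_path "$file")". This only works if each folder holds a single input file, as the question describes.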
