I need to run a Java program that merges multiple files with a *.bam extension. The structure of the command is:
java -jar mergefiles.jar \
I=file1.bam \
I=file2.bam \
I=file3.bam \
O=output.bam
So I am trying to run this program on all *.bam files in a directory. Initially, I tried creating a list with the names of the *.bam files (filenames.txt):
file1.bam
file2.bam
file3.bam
and then using a 'while' loop, like:
while read -r line; do
java -jar MergeFiles.jar \
I=$line \
O=output.bam
done < filenames.txt
However, the program is executed once for each *.bam file in the text file rather than once for all of them (it merges only one file at a time and overwrites the output). So, how can I run the program so that it merges all the *.bam files together?
Also, are there other options in bash (e.g. using a for loop) to solve this issue?
Thanks in advance.
In your question you specify that you would like to use all .bam files in a directory, so instead of creating a file with the filenames, you should probably use globbing. Here's an example:
#! /bin/bash
# set nullglob to be safe
shopt -s nullglob
# read the filenames into an array
files=( *.bam )
# check that files actually exist
if (( ${#files[@]} == 0 )); then echo "no files" && exit 1; fi
# expand the array with a replacement
java -jar MergeFiles.jar \
"${files[@]/#/I=}" \
O=output.bam
The problem with your current solution is that the while loop will only read one line at a time, calling the command on each line separately.
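If the list file from the question is still wanted, the same array expansion can be fed from it. The following is only a sketch: it assumes bash 4 or later (for mapfile), builds a throwaway copy of filenames.txt so that it is self-contained, and prints the command rather than running it.

```shell
#!/bin/bash
# Temporary stand-in for filenames.txt, created only for this sketch.
list=$(mktemp)
printf '%s\n' file1.bam file2.bam file3.bam > "$list"

# Read one filename per line into an array (mapfile needs bash 4+).
mapfile -t files < "$list"

# Prefix every element with "I=" and append the output argument.
args=( "${files[@]/#/I=}" O=output.bam )

# Dry run: capture and print the command; drop "echo" to execute it.
merge_cmd=$(echo java -jar MergeFiles.jar "${args[@]}")
echo "$merge_cmd"

rm -f "$list"
```

Because all the I= arguments are passed in a single invocation, the output file is written once instead of being overwritten on every iteration.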
I needed to convert several pnm image files to jpeg with pnmtojpeg. So I used this script, which I named 'pnm2jpg':
for f in *.pnm;
do pnmtojpeg -quality=85 "$f" > "${f%.pnm}.jpg";
done
This works very nicely. However, I would like to adapt it further so that it can be used for a single file as well.
In other words, if no files are specified in the command line, then process all the files.
$ pnm2jpg thisfile.pnm # Process only this file.
$ pnm2jpg # Process all pnm files in the current directory.
Your insight is greatly appreciated- Thank you.
Something like:
#!/bin/bash
if [[ -z "$1" ]]; then
for f in *.pnm; do
pnmtojpeg -quality=85 "$f" > "${f%.pnm}.jpg"
done
else
pnmtojpeg -quality=85 "$1" > "${1%.pnm}.jpg"
fi
If you execute pnm2jpg without an argument, the if block is processed.
If you execute pnm2jpg thisfile.pnm, the else block is processed.
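A possible variation accepts any number of file arguments and falls back to *.pnm when none are given. This is a sketch only: convert_one is a hypothetical dry-run stand-in that prints the pnmtojpeg command instead of running it, and the temporary directory exists just to make the example self-contained.

```shell
#!/bin/bash
# Hypothetical dry-run stand-in for:
#   pnmtojpeg -quality=85 "$1" > "${1%.pnm}.jpg"
convert_one() {
    echo "pnmtojpeg -quality=85 $1 > ${1%.pnm}.jpg"
}

pnm2jpg() {
    # With no arguments, fall back to every .pnm file in the current directory.
    if (( $# == 0 )); then
        set -- *.pnm
    fi
    local f
    for f in "$@"; do
        # Skip the literal pattern left behind when nothing matched.
        [[ -e $f ]] || continue
        convert_one "$f"
    done
}

# Demonstration in a scratch directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.pnm" "$demo_dir/b.pnm"
all_out=$(cd "$demo_dir" && pnm2jpg)        # no arguments: both files
one_out=$(cd "$demo_dir" && pnm2jpg a.pnm)  # one argument: only a.pnm
printf '%s\n' "$all_out"
rm -r "$demo_dir"
```

Replacing the positional parameters with set -- lets one loop serve both cases, so the conversion command appears only once.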
Say I have a command, command.py, and it pairs together files, File_01_R1.fastq to File_01_R2.fastq. The command executed on a single pair looks like this:
command.py -f File_01_R1.fastq -r File_01_R2.fastq
I have many files however, each with a R1 and R2 version. How can I tell this command to go through every file I have, so it also executes
command.py -f File_02_R1.fastq -r File_02_R2.fastq
command.py -f File_03_R1.fastq -r File_03_R2.fastq
and so on.
You may use a simple parameter expansion:
for f in *_R1.fastq; do
echo command.py -f "$f" -r "${f%_R1.fastq}_R2.fastq"
done
This will just print out what's to be executed. Remove the echo if you're happy with the result.
# Loop over all R1.fastq files
for f in File_*_R1.fastq; do
# Replace R1 with R2 in the filename and run the command on both files.
command.py -f "$f" -r "${f/_R1./_R2.}"
done; unset -v f
As @gniourf_gniourf indicates in his comment, my answer is slightly less safe than his in that it may match at an incorrect location in the filename (whereas his is anchored at the end).
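The difference between the two expansions only shows up when _R1. occurs more than once in a name. A small sketch with a made-up filename (lib_R1.old_R1.fastq is purely illustrative):

```shell
#!/bin/bash
# Hypothetical filename in which "_R1." appears twice.
f=lib_R1.old_R1.fastq

# Suffix removal is anchored at the end of the name.
anchored="${f%_R1.fastq}_R2.fastq"

# Pattern substitution replaces the first match, wherever it is.
unanchored="${f/_R1./_R2.}"

echo "anchored:   $anchored"     # lib_R1.old_R2.fastq
echo "unanchored: $unanchored"   # lib_R2.old_R1.fastq
```

For names that follow the File_NN_R1.fastq pattern from the question, both forms give the same result.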
I have a list of file paths, and I need to copy all of the files to another location, /sample, placing them in different folders:
/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59329/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz
/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59329/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_2.fq.gz
/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59329/clean_111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz.total.info
I want to copy those files into the AS34_59329 folder inside /sample.
/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59328/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz
/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59328/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_2.fq.gz
/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59328/clean_111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz.total.info
I want to copy those files into the AS34_59328 folder inside /sample.
I wrote some code to scp all the files into the /sample folder, but I don't know how to put each file into its own sub-directory. For example:
/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59328/clean_111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz.total.info
should be put into AS34_59328.
#! /bin/bash
while read myline
do
for i in $myline
do
if [ -f $i ]; then
#how to put different files into different sub-directory
scp -r $i xxx@191.168.174.43:/sample
fi
done
done < data.list
Updated part:
#! /bin/bash
while read myline
do
for i in $myline
do
if [ -f $i ]
then
relname=$(echo $i | sed 's%\(/[^/][^/]*\)\{5\}/%%')
echo $relname
fi
done
done < /home/jesse/T11073_all_3254.fq.list
It appears you need to strip the leading 5 components of the pathname off the filename. Since you don't have spaces in your names (the way you're using for i in $myline precludes that possibility), you can use:
#! /bin/bash
while read myline
do
for i in $myline
do
if [ -f $i ]
then
relname=$(echo $i | sed 's%\(/[^/][^/]*\)\{5\}/%%')
scp -r $i xxx@191.168.174.43:/sample/$relname
fi
done
done < data.list
The regex is just a way of looking for a sequence of five sets of slash followed by one or more non-slashes plus one more slash and deleting them. Since slashes figure prominently in the search, I used % to mark the sections of the s/// operation instead.
For example, given the input:
/a/b/c/d/e/f/g
the output from the sed is:
f/g
Note that this code does not explicitly create directories on the remote machine; it just specifies where the file is to go. If you need to create them too, you will have to investigate ssh, probably, to run mkdir -p /sample/$(dirname $relname) on the remote machine (where the dirname operation can be run either locally or remotely).
Note that scp has a recursive copy mode (-r) which would simplify things considerably if you knew you needed to copy all the files from the local directory to the remote.
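If only the last directory component (for example AS34_59328) should be kept, parameter expansion can stand in for sed. The sketch below is a dry run that merely prints the ssh and scp commands. It assumes the sample folder is always the directory directly containing the file, and it reuses the placeholder user and host from the question.

```shell
#!/bin/bash
# Example path taken from the question.
path=/ifshk5/BC_IP/PROJECT/T11073/T11073_RICekkR/Fq/AS34_59328/111220_I631_FCC0E5EACXX_L4_RICwdsRSYHSD11-2-IPAAPEK-93_1.fq.gz

dir=${path%/*}        # drop the file name
sample=${dir##*/}     # last directory component, e.g. AS34_59328
file=${path##*/}      # bare file name
relname=$sample/$file

remote=xxx@191.168.174.43   # placeholder user and host from the question

# Dry run: print the commands; remove "echo" to execute them.
echo ssh "$remote" "mkdir -p '/sample/$sample'"
echo scp "$path" "$remote:/sample/$relname"
```

Running mkdir -p over ssh before each scp creates the target directory if it does not exist yet and is harmless if it does.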
#!/bin/bash
for i in /home/xxx/sge_jobs_output/split_rCEU_results/*.rCEU.bed
do
intersectBed -a /home/xxx/sge_jobs_output/split_rCEU_results/$i.rCEU.bed -b /home/xxx/sge_jobs_output/split_NA12878_results/$i.NA12878.bed -f 0.90 -r > $i.overlap_90.bed
done
However I got the errors like:
Error: can't determine file type of '/home/xug/sge_jobs_output/split_NA12878_results//home/xug/sge_jobs_output/split_rCEU_results/chr4.rCEU.bed.NA12878.bed': No such file or directory
Seems the computer mixes the two .bed files together, and I don't know why.
thx
Your i has the format /home/xxx/sge_jobs_output/split_rCEU_results/whatever.rCEU.bed, and you insert it into the file name, which leads to the duplication. It's probably simplest to switch to the directory and use basename, like this:
pushd /home/xxx/sge_jobs_output/split_rCEU_results
for i in *.rCEU.bed
do
intersectBed -a $i -b ../../sge_jobs_output/split_NA12878_results/`basename $i .rCEU.bed`.NA12878.bed -f 0.90 -r > `basename $i .rCEU.bed`.overlap_90.bed
done
popd
Notice the use of basename, with which you can replace the extension of a file: If you have a file called filename.foo.bar, basename filename.foo.bar .foo.bar returns just filename.
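The suffix stripping can be checked without running intersectBed. A short sketch, using the chr4 file name from the error message and the directory names from the question. The parameter-expansion form is shown only as an equivalent that avoids starting an external process.

```shell
#!/bin/bash
# Full path as produced by the glob in the original loop.
i=/home/xxx/sge_jobs_output/split_rCEU_results/chr4.rCEU.bed

# basename removes the directory part and, optionally, a trailing suffix.
stem=$(basename "$i" .rCEU.bed)

# The same result with parameter expansion only.
stem2=${i##*/}
stem2=${stem2%.rCEU.bed}

b_file=/home/xxx/sge_jobs_output/split_NA12878_results/$stem.NA12878.bed
out_file=$stem.overlap_90.bed

echo "stem=$stem"
echo "b=$b_file"
echo "out=$out_file"
```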
Another one I can't find an answer for, and it feels like I've gone mad.
I have a BASH script using a for loop to run a complex command (many protein sequence alignments) on a lot of files (~5000). The loop produces statements that will execute when given alone (i.e. copy-pasted from the error message to the command prompt), but which return "no such file or directory" inside the loop. Script below; there are actually several more arguments but this includes some representative ones and the file arguments.
#!/bin/bash
# Pass directory with targets as FASTA sequences as argument.
# Arguments to psiblast
# Common
db=local/db/nr/nr
outfile="/mnt/scratch/psi-blast"
e=0.001
threads=8
itnum=5
pssm="/mnt/scratch/psi-blast/pssm."
pssm_txt="/mnt/scratch/psi-blast/pssm."
pseudo=0
pwa_inclusion=0.002
for i in ${1}/*
do
filename=$(basename $i)
"local/ncbi-blast-2.2.23+/bin/psiblast\
-query ${i}\
-db $db\
-out ${outfile}/${filename}.out\
-evalue $e\
-num_threads $threads\
-num_iterations $itnum\
-out_pssm ${pssm}$filename\
-out_ascii_pssm ${pssm_txt}${filename}.txt\
-pseudocount $pseudo\
-inclusion_ethresh $pwa_inclusion"
done
Running this script gives "<scriptname>: line <last line before 'done'>: <attempted command>: No such file or directory". If I then paste the attempted command onto the prompt, it will run.
Each of these commands takes a couple of minutes to run.
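Try it without the surrounding quotes: with them, the shell treats the entire string as the name of a single command. You also need a space before each line-continuation backslash, otherwise adjacent arguments are joined together.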
for i in ${1}/*
do
filename=$(basename $i)
local/ncbi-blast-2.2.23+/bin/psiblast \
-query "${i}" \
-db "$db" \
-out "${outfile}/${filename}.out" \
-evalue "$e" \
-num_threads "$threads" \
-num_iterations "$itnum" \
-out_pssm "${pssm}${filename}" \
-out_ascii_pssm "${pssm_txt}${filename}.txt" \
-pseudocount "$pseudo" \
-inclusion_ethresh "$pwa_inclusion"
done
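The effect of quoting a whole command line can be reproduced in isolation. In this sketch /bin/echo and printf merely stand in for psiblast, and the file name is invented. Building the command in an array is one common way to keep a long argument list readable without quoting it as one string.

```shell
#!/bin/bash
# A quoted string is looked up as ONE command name, spaces included.
# Since it contains a slash, bash tries to execute a file literally named
# "echo hello" in /bin and reports "No such file or directory".
quoted_status=0
"/bin/echo hello" 2>/dev/null || quoted_status=$?

# An array keeps the command and each argument as separate words,
# even when an argument contains spaces.
cmd=( printf '%s\n' -query "my file.fasta" )
array_output=$("${cmd[@]}")

echo "status of quoted string: $quoted_status"
printf '%s\n' "$array_output"
```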
The behavior you're observing will occur if there are spaces in the filenames you're iterating over. For this reason, you'll want to properly quote your filenames, as in the following minimal example:
#!/bin/bash
for i in *
do
filename="$(basename "$i")"
command="ls -lah '$filename'"
echo "filename=$filename"
echo "Command = $command"
eval "$command"
done
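Adding quotes inside the loop body will not help if the for loop iterates over unquoted command output (for example, for i in $(ls directory)), because the names have already been split on whitespace by then. To overcome this, I've always done something similar to the following example whenever I needed to loop over filenames:
ls -1 directory | { while IFS= read -r line; do echo "$line"; done; }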
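How the different loop styles treat a name containing a space can be compared directly. The sketch below creates two scratch files (one with a space in its name) and counts the iterations of each loop. It assumes bash and the default IFS.

```shell
#!/bin/bash
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/with space.txt"

# Glob in a for loop: each match stays one word, whatever it contains.
glob_count=0
for f in "$demo"/*; do
    glob_count=$((glob_count + 1))
done

# Unquoted command substitution: "with space.txt" is split into two words.
split_count=0
for f in $(ls "$demo"); do
    split_count=$((split_count + 1))
done

# NUL-delimited read from find: robust for any file name.
read_count=0
while IFS= read -r -d '' f; do
    read_count=$((read_count + 1))
done < <(find "$demo" -type f -print0)

echo "glob=$glob_count split=$split_count read=$read_count"
# prints: glob=2 split=3 read=2
rm -r "$demo"
```

In this test the glob and the NUL-delimited read each see two files, while the unquoted $(ls ...) loop sees three words.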