bash script to execute a program sequentially - bash

I have a problem with a bash script I am trying to use. I have a directory with thousands of files and I want to run a command sequentially using each file. However, each file is paired with another, e.g. File1.sam, File1.gz, File2.sam, File2.gz, etc., and the command I am executing requires that I use both files from a pair as arguments. I have been using something similar to the command below when only a single argument was required, and I thought (wrongly) that I could simply extend it like below.
shopt -s nullglob
for myfile1 in *.sam && for myfile2 in *.gz
do
./bwa samse -r "@RG\tID:$myfile1\tLB:$myfile1\tSM:$myfile1\tPL:ILLUMINA" lope_V1.2.fasta $myfile1 $myfile2 > $myfile1.sam2 2>$myfile1.log
done
Anyone know how I can modify this or point me in the direction of another way of doing it?

Why not generate the second filename, e.g. by replacing .sam with .gz?
for myfile1 in *.sam ; do
myfile2="${myfile1%.sam}.gz"
[ -e "$myfile2" ] || continue
./bwa samse -r "@RG\tID:$myfile1\tLB:$myfile1\tSM:$myfile1\tPL:ILLUMINA" lope_V1.2.fasta "$myfile1" "$myfile2" > "$myfile1.sam2" 2>"$myfile1.log"
done

shopt -s nullglob
for myfile1 in *.sam
do
myfile2=$(echo "$myfile1" | sed 's/\.sam$/.gz/')
./bwa samse -r "@RG\tID:$myfile1\tLB:$myfile1\tSM:$myfile1\tPL:ILLUMINA" lope_V1.2.fasta "$myfile1" "$myfile2" > "$myfile1.sam2" 2>"$myfile1.log"
done

Iterate only over files with one of the extensions (for instance *.gz) and use for instance sed to get the matching .sam file.
Something like this:
for gz_name in *.gz
do
sam_name=$(echo "$gz_name" | sed -e 's#gz$#sam#')
./bwa samse -r "@RG\tID:$sam_name\tLB:$sam_name\tSM:$sam_name\tPL:ILLUMINA" lope_V1.2.fasta "$sam_name" "$gz_name" > "$sam_name.sam2" 2>"$sam_name.log"
done

Change your for loop using one of the file extensions and calculate the other file name. For example:
for p in a b c; do touch $p.1 $p.2; done
for f in *.1; do g=${f%%.*}.2; echo $f $g; done
This displays:
a.1 a.2
b.1 b.2
c.1 c.2
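The suffix-rewriting idea from the answers above can be tried out without bwa. The following is a minimal sketch, assuming a scratch directory and a hypothetical stand-in function run_pair in place of the real ./bwa samse call; it also counts .sam files that have no .gz partner instead of silently skipping them.

```shell
#!/bin/bash
# Sketch: pair each .sam file with its .gz partner by rewriting the suffix.
# run_pair is a hypothetical stand-in for the real command (./bwa samse ...).
run_pair() {
    printf '%s + %s\n' "$1" "$2"
}

workdir=$(mktemp -d)            # scratch directory, nothing real is touched
touch "$workdir"/File1.sam "$workdir"/File1.gz \
      "$workdir"/File2.sam "$workdir"/File2.gz \
      "$workdir"/File3.sam      # File3 has no .gz partner on purpose

shopt -s nullglob               # an unmatched glob expands to nothing
paired=0
skipped=0
for sam in "$workdir"/*.sam; do
    gz="${sam%.sam}.gz"         # strip the .sam suffix, append .gz
    if [ -e "$gz" ]; then
        run_pair "${sam##*/}" "${gz##*/}"
        paired=$((paired + 1))
    else
        echo "no partner for ${sam##*/}" >&2
        skipped=$((skipped + 1))
    fi
done
rm -rf "$workdir"
echo "paired=$paired skipped=$skipped"
```

Replacing the body of run_pair with the real command keeps the loop unchanged.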

Related

Listing files with spaces in name

Problem
In some directory, I have some files with spaces (or maybe some special character) in their filenames.
Trying to list one file per line, I used ls -1 but files with spaces in name are not processed as I expected.
Example
I have these three files:
$ ls -1
My file 1.zip
My file 2.zip
My file 3.zip
and I want to list and do something with them, so I use a loop like this:
for i in `ls -1 My*.zip`; do
# statements
echo $i;
# Do something with each file;
done
But the loop splits the names at the spaces:
My
file
1.zip
My
file
2.zip
My
file
3.zip
Question
How can I solve this? Is there some alternative in the shell?
Don't use output from ls, use:
for f in *.zip; do
echo "processing $f"
done
By not using ls, and by quoting properly.
for i in My*.zip
do
echo "$i"
done
shopt -s dotglob
for i in *;
do :; # work on "$i"
done
shopt -u dotglob
By default, files that begin with a dot are special in that * will not match them.
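To see the difference between the two loops side by side, here is a small sketch that recreates the three example files in a scratch directory (the directory itself is an assumption made for the demonstration) and counts how many times each loop body runs.

```shell
#!/bin/bash
workdir=$(mktemp -d)            # scratch directory for the demonstration
touch "$workdir/My file 1.zip" "$workdir/My file 2.zip" "$workdir/My file 3.zip"

# Word-splitting the output of ls: every name breaks into three words.
split_count=0
for i in $(ls -1 "$workdir"); do
    split_count=$((split_count + 1))
done

# Globbing: each result of the expansion is exactly one file name.
glob_count=0
for f in "$workdir"/My*.zip; do
    echo "processing ${f##*/}"
    glob_count=$((glob_count + 1))
done

rm -rf "$workdir"
echo "ls loop: $split_count iterations, glob loop: $glob_count iterations"
```

With the default IFS the first loop runs nine times and the second three times.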

Shell script Issues and Errors when tested in school's program

Files created in 'testdir':
file1 file2.old file3old file4.old
Execution of 'oldfiles2 testdir':
Files in 'testdir' after 'oldfiles2' was run:
file1.old file2.old file3old.old file4.old
Error: 'for' does not seem to loop only through required filenames
Please hit to continue with the Assignment
This is the error I am hitting with a script I am writing for school.
Here is the script below
#!/bin/bash
shopt -s extglob nullglob
dir=$1
for file in "$dir"/!(*.old)
do
[[ $file == *.old ]] || mv -- "$file" "$file.old"
done
The assignment was written by someone who doesn't know bash well. Your approach is way better.
Instead of grepping ls, you can use extglob (and also nullglob in case there are no matches):
#!/bin/bash
shopt -s extglob nullglob
for file in "$dir"/!(*.old)
do
mv -- "$file" "$file.old"
done
As demonstrated by your test validator's output, it works perfectly:
file1 does not end in .old, and so it's renamed to file1.old
file2.old ends in .old, and is not renamed.
file3old does not end in .old (old != .old), and is renamed.
file4.old ends in .old, and is not renamed.
However, the validator refuses to accept it, indicating that the validator is wrong. A common mistake for people who don't know bash well (like your professor) is to use grep -v .old or grep -v '.old$', which doesn't actually check whether a name ends in .old, because . means "any character".
We can emulate this bug in the script:
#!/bin/bash
shopt -s extglob nullglob
for file in "$dir"/!(*?old*)
do
mv -- "$file" "$file.old"
done
This code is objectively wrong, but may pass the incorrect validator. Alternatively, "$dir"/!(*?old) will emulate a buggy grep anchored to the end of the line.
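The grep pitfall described above can be checked directly. This is a small sketch that feeds the four file names from the validator output through both patterns; the variable names are made up for the example.

```shell
#!/bin/bash
names='file1
file2.old
file3old
file4.old'

# Unescaped dot: "." matches any character, so file3old is filtered out too.
loose=$(printf '%s\n' "$names" | grep -v '.old$')
# Escaped dot: only names that really end in ".old" are filtered out.
strict=$(printf '%s\n' "$names" | grep -v '\.old$')

echo "loose:  $(echo $loose)"
echo "strict: $(echo $strict)"
```

The loose pattern keeps only file1, while the strict one keeps file1 and file3old.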
If I read correctly what your teacher wants, then here is a one liner using grep -v and no if statement. You can block it out in the script or leave it as a one liner.
ls | grep -v '\.old' | while read FILE; do mv "${FILE}" "${FILE}.old"; done
BTW I've tested this and it works because the "." in '\.old' is a dot (or period) and not "any character" because it's escaped with a backslash.
Here is sample output from Terminal
System1:test 123$ ls -1
file name 1
file name 2
file name.old
file.old
file1
file2
System1:test 123$ ls | grep -v '\.old' | while read FILE; do mv "${FILE}" "${FILE}.old"; done
System1:test 123$ ls -1
file name 1.old
file name 2.old
file name.old
file.old
file1.old
file2.old
System1:test 123$
Try:
#!/bin/bash
for filename in $(ls $1 | grep -v "\.old$")
do
mv $1/$filename $1/$filename.old
done
In Bash you can use character classes beginning with the inversion character ^ or ! to match all characters except the listed character. In your case:
for file in "$dir"/*.[^o][^l][^d]*; do
case $file in *.old) ;; *) mv -- "$file" "$file.old" ;; esac
done
That is meant to pick out files in $dir that do NOT have an .old extension and move each one to $file.old. Note that the pattern only matches names that contain a dot followed by at least three characters, so a name such as file1 is not touched. For a case-insensitive version:
for file in "$dir"/*.[^oO][^lL][^dD]*; do
The *.old check uses case because [ "$file" = *.old ] compares strings literally rather than matching a pattern. In bash you can write [[ "$file" == *.old ]] instead, but it is less portable. (The [^...] form of a character class is not portable either; POSIX specifies [!...].) Unless a filename may start with -, there isn't any reason to include -- following mv (but it doesn't hurt either).
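For comparison, here is a sketch of the same rename that relies only on a plain * glob and a case statement, so it does not need extglob or negated character classes. The function name add_old_suffix and the scratch directory are assumptions for the demonstration; the four file names are the ones from the validator output.

```shell
#!/bin/bash
# add_old_suffix is a hypothetical helper: add_old_suffix DIR
add_old_suffix() {
    local dir=$1 file
    for file in "$dir"/*; do
        [ -e "$file" ] || continue      # an unmatched glob stays literal; skip it
        case $file in
            *.old) ;;                   # already ends in .old: leave it alone
            *) mv -- "$file" "$file.old" ;;
        esac
    done
}

testdir=$(mktemp -d)
touch "$testdir"/file1 "$testdir"/file2.old "$testdir"/file3old "$testdir"/file4.old
add_old_suffix "$testdir"
result=$(cd "$testdir" && echo *)       # names after the rename, space separated
echo "$result"
rm -rf "$testdir"
```

This produces the same directory listing the question reports for the extglob script.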

Bash: Why mv won't move files under this for-loop?

Using a Bash script, I'd like to move a list of files by using a for-loop, not a while-loop (for testing purpose). Can anyone explain to me why mv always acts as file rename rather than file move under this for loop? How can I fix it to move the list of files?
The following works:
for file in "/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
do
mv "$file" "/Volumes/HDD2/"
done
UPDATE#1:
However, suppose that I have a sample_pathname.txt
cat sample_pathname.txt
"/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
Why the following for-loop will not work then?
array=$(cat sample_path2.txt)
for file in "${array[@]}"
do
mv "$file" "/Volumes/HDD2/"
done
Thanks.
System: OS X
Bash version: 3.2.53(1)
cat sample_pathname.txt
"/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
The quotation marks here are the problem. Unless you need to cope with file names with newlines in them, the simple and standard way to do this is to list one file name per line, with no quotes or other metainformation.
vbvntv$ cat sample_pathname_fixed.txt
/Volumes/HDD1/001.jpg
/Volumes/HDD1/002.jpg
vbvntv$ while read -r file; do
> mv "$file" "/Volumes/HDD2/"
> done <sample_pathname_fixed.txt
In fact, you could even
xargs mv -t /Volumes/HDD2 <sample_pathname_fixed.txt
(somewhat depending on how braindead your xargs is).
The syntax used in your example will not create an array... It is just storing the file contents in a variable named array.
IFS=$'\n' array=$(cat sample_path2.txt)
If you have a text file containing filenames (one per line would be simplest), you can load it into an array and iterate over it as follows. Note the use of $(< file ) as a better alternative to cat, and the parentheses that initialize the contents as an array. Each line of the file corresponds to an index.
array=($(< file_list.txt ))
for file in "${array[@]}"; do
mv "$file" "/absolute/path"
done
Update: Your IFS was probably not set correctly if the command at the top of the post didn't work. I updated it to reflect that. Also, there are a couple of other reliable ways to initialize an array from a file. But like you mentioned, if you are just piping the file directly into a while loop, you may not need it.
readarray is a shell builtin in Bash 4+ and a synonym of mapfile. It works great if it's available.
readarray -t array < file
The 'read' command can also initialize an array for you:
IFS=$'\n' read -d '' -r -a array < file
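Since the question mentions Bash 3.2, where readarray is not available, a plain while read loop that appends to an array is another option. This is only a sketch: the list file is created in a temporary location, the second path (with a space in it) is made up to show that whole lines are preserved, and mv is replaced by echo so nothing is moved.

```shell
#!/bin/bash
list=$(mktemp)                  # stand-in for the file holding one path per line
printf '%s\n' '/Volumes/HDD1/001.jpg' '/Volumes/HDD1/my photo 002.jpg' > "$list"

files=()
# IFS= and -r keep each line exactly as written; the || clause also
# picks up a final line that lacks a trailing newline.
while IFS= read -r line || [ -n "$line" ]; do
    files+=("$line")
done < "$list"
rm -f "$list"

echo "read ${#files[@]} entries"
for file in "${files[@]}"; do
    echo "would move: $file"    # replace echo with: mv -- "$file" /Volumes/HDD2/
done
```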
use this:
for file in "/Volumes/HDD1/001.jpg" "/Volumes/HDD1/002.jpg"
do
f=$(basename "$file")
mv "$file" "/Volumes/HDD2/$f"
done

bash scripting and conditional statements

I am trying to run a simple bash script, but I am struggling with how to incorporate a condition. Any pointers? I would like to add a condition so that when gdalinfo cannot open an image, that particular file is copied to another location. This loop
for file in `cat path.txt`; do gdalinfo $file;done
works fine in opening the images and also shows which ones cannot be opened.
the wrong code is
for file in `cat path.txt`; do gdalinfo $file && echo $file; else cp $file /data/temp
Again, and again, and again, for the zillionth time...
Don't use constructions like
for file in `cat path.txt`
or
for file in `find .....`
for file in `any command what produces filenames`
Because the code will BREAK immediately when a filename or path contains a space. Never use it with any command that produces filenames. Bad practice. Very bad. It is incorrect, mistaken, erroneous, inaccurate, inexact, imprecise, faulty, WRONG.
The correct form is:
for file in some/* #if want/can use filenames directly from the filesystem
or
find . -print0 | while IFS= read -r -d '' file
or (if you are sure that no filename contains a newline) you can use
cat path.txt | while read -r file
but here the cat is useless (really, a command that only copies a file to STDOUT is useless). You should use instead
while read -r file
do
#whatever
done < path.txt
It is faster (it doesn't fork a new process, as every pipe does).
The while loops above put the correct filename into the variable file even when the filename contains a space. The for loop will not. Period. Uff. Omg.
And use "$variable_with_filename" instead of a bare $variable_with_filename for the same reason. If the filename contains whitespace, any command will misread it as two filenames. That is probably not what you want either.
So, enclose any shell variable that contains a filename in double quotes (not only filenames, but anything that can contain a space). "$variable" is correct.
If I understand correctly, you want to copy files to /data/temp when gdalinfo returns an error.
while read -r file
do
gdalinfo "$file" || cp "$file" /data/temp
done < path.txt
Nice, short and safe (at least if your path.txt really contains one filename per line).
And maybe you want to use your script more than once, so don't put the filename inside it; save the script in this form:
while read -r file
do
gdalinfo "$file" || cp "$file" /data/temp
done
and use it like:
mygdalinfo < path.txt
more universal...
And maybe you only want to show the filenames for which gdalinfo returns an error:
while read -r file
do
gdalinfo "$file" || printf '%s\n' "$file"
done
and if you change the printf '%s\n' "$file" to printf '%s\0' "$file" you can use the script in a pipe safely, so:
while read -r file
do
gdalinfo "$file" || printf '%s\0' "$file"
done
and use it for example as:
mygdalinfo < path.txt | xargs -0 -J% mv % /tmp/somewhere
Howgh.
You can say:
for file in `cat path.txt`; do gdalinfo $file || cp $file /data/temp; done
This would copy the file to /data/temp if gdalinfo cannot open the image.
If you want to print the filename in addition to copying it in case of failure, say:
for file in `cat path.txt`; do gdalinfo $file || (echo $file && cp $file /data/temp); done
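The question's attempt mixes && with else, which the shell does not accept; an else branch needs an if. Below is a sketch of the if/else form combined with the while read loop. Because gdalinfo may not be installed, the hypothetical function can_open stands in for it (it simply treats non-empty files as readable), and all paths live in a scratch directory.

```shell
#!/bin/bash
# can_open is a hypothetical stand-in for: gdalinfo "$1" >/dev/null 2>&1
can_open() {
    [ -s "$1" ]                 # pretend that non-empty files are valid images
}

workdir=$(mktemp -d)
mkdir "$workdir/temp"           # plays the role of /data/temp
printf 'data' > "$workdir/good image.tif"
: > "$workdir/bad image.tif"    # empty file, so can_open fails on it
printf '%s\n' "$workdir/good image.tif" "$workdir/bad image.tif" > "$workdir/path.txt"

while IFS= read -r file; do
    if can_open "$file"; then
        echo "ok: ${file##*/}"
    else
        cp -- "$file" "$workdir/temp/"
    fi
done < "$workdir/path.txt"

copied=$(cd "$workdir/temp" && echo *)
echo "copied: $copied"
rm -rf "$workdir"
```

Only the file that fails the check ends up in the temp directory.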

Performance with bash loop when renaming files

Sometimes I need to rename some amount of files, such as add a prefix or remove something.
At first I wrote a python script. It works well, and I want a shell version. Therefore I wrote something like that:
$1 - which directory to list,
$2 - the pattern to be replaced,
$3 - replacement.
echo "usage: dir pattern replacement"
for fname in `ls $1`
do
newName=$(echo $fname | sed "s/^$2/$3/")
echo 'mv' "$1/$fname" "$1/$newName&&"
mv "$1/$fname" "$1/$newName"
done
It works but very slowly, probably because it needs to create a process (here sed and mv) and destroy it and create same process again just to have a different argument. Is that true? If so, how to avoid it, how can I get a faster version?
I thought of generating all the new names at once (one sed run over the whole list), but that still needs mv in the loop.
Please tell me, how you guys do it? Thanks. If you find my question hard to understand please be patient, my English is not very good, sorry.
--- update ---
I am sorry for my description. My core question is: "If we have to run some command in a loop, will that lower performance?" Because in for i in {1..100000}; do ls 1>/dev/null; done creating and destroying a process will take most of the time. So what I want to know is: "Is there any way to reduce that cost?"
Thanks to kev and S.R.I for giving me a rename solution to rename files.
Every time you call an external binary (ls, sed, mv), bash has to fork itself to exec the command and that takes a big performance hit.
You can do everything you want to do in pure bash 4.X and only need to call mv
pat_rename(){
if [[ ! -d "$1" ]]; then
echo "Error: '$1' is not a valid directory"
return
fi
shopt -s globstar
cd "$1"
for file in **; do
echo "mv $file ${file//$2/$3}"
done
}
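As a sketch of the same idea applied to the question's original task (replacing something at the start of each name), the sed call can be swapped for the ${name/#pattern/replacement} expansion, which leaves mv as the only external command per file. The function name rename_prefix and the sample file names are assumptions, and unlike sed "s/^$2/$3/" the prefix is treated here as a literal string, not a regular expression.

```shell
#!/bin/bash
# rename_prefix is a hypothetical helper: rename_prefix DIR OLD_PREFIX NEW_PREFIX
rename_prefix() {
    local dir=$1 old=$2 new=$3 path name newname
    for path in "$dir"/*; do
        [ -e "$path" ] || continue           # empty directory: glob stays literal
        name=${path##*/}                     # basename without forking
        newname=${name/#"$old"/$new}         # replace $old only at the start
        [ "$name" = "$newname" ] || mv -- "$path" "$dir/$newname"
    done
}

workdir=$(mktemp -d)
touch "$workdir/img_001.txt" "$workdir/img_002.txt" "$workdir/notes.txt"
rename_prefix "$workdir" img_ photo_
result=$(cd "$workdir" && echo *)
echo "$result"
rm -rf "$workdir"
```

Names that do not start with the old prefix are left untouched, so no mv is spawned for them.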
Simplest first. What's wrong with rename?
mkdir tstbin
for i in `seq 1 20`
do
touch tstbin/filename$i.txt
done
rename .txt .html tstbin/*.txt
Or are you using an older *nix machine?
To avoid re-executing sed on each file, you could instead setup two name streams, one original, and one transformed, then sip from the ends:
exec 3< <(ls)
exec 4< <(ls | sed 's/from/to/')
while IFS= read -r -u3 orig && IFS= read -r -u4 to; do
mv "${orig}" "${to}"
done
I think you can store all of the file names in a file or a string, and use awk or sed to process them in one pass instead of one by one.
