Add file by (*) star character to variable in for loop - bash

I have a folder structure where each folder contains two files. The files have long names but are distinguished by R1 and R2. I am running this over many folders using a for loop, but I am keeping it simple for this example. I am wondering how to correctly refer to the files with a star (*) wildcard so that I don't have to type out each full file name. My attempt is below:
#!/bin/bash
for item in Folder_Directory:
do
forward=$item/*R1*
reverse=$item/*R2*
bbmap.sh ref=reference.fna in1=$forward in2=$reverse outu=Unmapped.fasta
done
The output I am getting is an error because the variable is not identifying the desired file:
Error:
align2.BBMap build=1 overwrite=true fastareadlen=500 ref=reference.fna
in1=Folder_Dictory/*R1* in2=Folder_Dictory/*R2* outu=Folder_Dictory/Unmapped.fastq
In this example I could type out the file names myself, but once I expand this loop to cover multiple folders that is no longer practical. Matching the names with (*) wildcards was my first approach; any other suggestions or fixes are greatly appreciated.

The problem is that the shell sees in1=Folder_Dictory/*R1* and notices that there are no files which match the glob with the literal in1= prefix, and so the wildcard does not get expanded at all.
You probably want to evaluate the wildcard before passing it to the command, like for instance
for item in Folder_Directory
do
forward=$item/*R1*
reverse=$item/*R2*
bbmap.sh ref=reference.fna in1="$(echo $forward)" in2="$(echo $reverse)" outu=Unmapped.fasta
done
This will of course still be erratic if the wildcard expands to more than one file.
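One way to sidestep both the quoting problem and the multiple-match problem is to let the glob expand into an array, where it is expanded at assignment time, and then check that exactly one file matched. A minimal sketch, assuming each folder holds exactly one R1 and one R2 file; the helper name pick_one and the demo file names are hypothetical, and echo stands in for the real bbmap.sh call:

```shell
#!/bin/bash
# pick_one DIR PATTERN: print the single file in DIR matching PATTERN,
# or fail if there are zero or several matches. (Hypothetical helper.)
pick_one() {
    local dir=$1 pattern=$2
    local matches=( "$dir"/$pattern )   # the glob expands here, inside the array
    # An unmatched glob stays literal, so also check that the file exists.
    if (( ${#matches[@]} != 1 )) || [[ ! -e ${matches[0]} ]]; then
        echo "expected exactly one match for $dir/$pattern" >&2
        return 1
    fi
    printf '%s\n' "${matches[0]}"
}

# Demo data in a throwaway directory (names are made up).
demo=$(mktemp -d)
mkdir "$demo/Folder_A"
touch "$demo/Folder_A/sample_R1_001.fastq" "$demo/Folder_A/sample_R2_001.fastq"

for item in "$demo"/*/; do
    item=${item%/}                      # drop the trailing slash
    forward=$(pick_one "$item" '*R1*') || continue
    reverse=$(pick_one "$item" '*R2*') || continue
    # Replace echo with the real command once the output looks right.
    echo bbmap.sh ref=reference.fna in1="$forward" in2="$reverse" outu=Unmapped.fasta
done
```

Folders where the check fails are skipped with a message on stderr instead of passing an unexpanded pattern to the tool.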

If you want only two files from your folder structure, then I believe it would be better to use find to search for the files and assign them to separate variables as required. I don't see the need for a for loop here.
forward=$(find Folder_Directory -type f -name "*R1*")
reverse=$(find Folder_Directory -type f -name "*R2*")
bbmap.sh ref=reference.fna in1="$forward" in2="$reverse" outu=Unmapped.fasta

It works like this:
test=f*
$ echo $test
file
But
$ echo "$test"
f*
And
test2=$test
$ echo "$test" $test2
f* file
$ echo "$test" "$test2"
f* f*
To make it work, you have to do something like this:
test3="$(echo $test)"
$ echo "$test" "$test2" "$test3"
f* f* file

Related

Using brace expansion to move files on the command line

I have a question concerning why this doesn't work. Probably, it's a simple answer, but I just can't seem to figure it out.
I want to move a couple of files I have. They all have the same filename (let's say file1) but they are all in different directories (lets say /tmp/dir1,dir2 and dir3). If I were to move these individually I could do something along the lines of:
mv /tmp/dir1/file1 /tmp
That works. However, I have multiple directories and they're all going to end up in the same spot....AND I don't want to overwrite. So, I tried something like this:
mv /tmp/{dir1,dir2,dir3}/file1 /tmp/file1.{a,b,c}
When I try this I get:
/tmp/file1.c is not a directory
Just to clarify...this also works:
mv /tmp/dir1/file1 /tmp/file1.c
Pretty sure this has to do with brace expansion but not certain why.
Thanks
Just do echo to understand how the shell expands:
$ echo mv /tmp/{dir1,dir2,dir3}/file1 /tmp/file1.{a,b,c}
mv /tmp/dir1/file1 /tmp/dir2/file1 /tmp/dir3/file1 /tmp/file1.a /tmp/file1.b /tmp/file1.c
Now you can see that your command is not what you want, because in a mv command, the destination (directory or file) is the last argument.
That's unfortunately not how shell expansion works.
You will probably have to use an associative array.
#!/bin/bash
declare -A MAP=( [dir1]=a [dir2]=b [dir3]=c )
for ext in "${!MAP[@]}"; do
echo mv "/tmp/$ext/file1" "/tmp/file1.${MAP[$ext]}"
done
You get the following output when you run it:
mv /tmp/dir2/file1 /tmp/file1.b
mv /tmp/dir3/file1 /tmp/file1.c
mv /tmp/dir1/file1 /tmp/file1.a
As in many other languages, key ordering is not guaranteed.
${!MAP[@]} returns a list of all the keys, while ${MAP[@]} returns a list of all the values.
Your syntax of /tmp/{dir1,dir2,dir3}/file1 expands to /tmp/dir1/file1 /tmp/dir2/file1 /tmp/dir3/file1. This is similar to the way the * expansion works. The shell does not execute your command once for each possible combination; it executes the command a single time, with your one word expanded to as many words as are required.
Perhaps instead of a/b/c you could differentiate them with the actual number of the dir they came from?
$: for d in 1 2 3
do echo mv /tmp/dir$d/file1 /tmp/file1.$d
done
mv /tmp/dir1/file1 /tmp/file1.1
mv /tmp/dir2/file1 /tmp/file1.2
mv /tmp/dir3/file1 /tmp/file1.3
When happy with it, take out the echo.
A relevant point - brace expansion is not a wildcard. It has nothing to do with what's on disk. It just creates strings.
So, if you create a bunch of files named with single letters or digits, echo ? will act as a wildcard and list them all, but only the ones actually present. If there are files for vowels but not consonants, only the vowels will show. But if you say echo {foo,bar,nope}, it will output foo bar nope regardless of whether any of those exist as files or directories.
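Because brace expansion only generates strings, pairing each source with its own destination needs an explicit loop. A minimal sketch using two indexed arrays, which keep their order (unlike associative-array keys); the names dirs, suffixes and plan_moves are illustrative, the -n (no-clobber) option is the one supported by GNU and BSD mv, and echo makes it a dry run:

```shell
#!/bin/bash
dirs=(dir1 dir2 dir3)
suffixes=(a b c)

# Print one mv command per directory, pairing the two arrays by index.
plan_moves() {
    local i
    for i in "${!dirs[@]}"; do
        echo mv -n "/tmp/${dirs[i]}/file1" "/tmp/file1.${suffixes[i]}"
    done
}

plan_moves
```

When the printed commands look right, remove the echo inside the loop to perform the moves.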

how to loop over folders/directories using bash script?

I'm trying to count all the .txt files in a set of folders. The main folder contains more than one folder, and each of those contains txt files, so I want the total number of txt files. So far I've tried to build the following solution, but of course it's wrong:
#!/bin/bash
counter=0
for i in $(ls /Da) ; do
for j in $(ls i) ; do
$counter=$counter+1
done
done
echo $counter
The error I'm getting is: ls cannot access i ...
The problem is that I don't know how I'm supposed to build the inner for loop, since it depends on the outer for loop.
This can work for you
find . -name "*.txt" | wc -l
In the first part, find looks for *.txt files in this folder (.) and its subfolders. In the second part, wc counts the lines (-l) returned by find.
You want to avoid parsing ls and you want to quote your variables.
There is no need for repeated loops, either.
printf 'x%.0s\n' /Da/* /Da/*/* | wc -l   # %.0s consumes one argument per line and prints nothing from it
depending also on whether you expect the entries in /Da to be all files (in which case /Da/* will suffice), all directories (in which case /Da/*/* alone is enough), or both. Additionally, if you don't want to count directories at all, maybe switch to find /Da -type f -printf 'x\n' or similar.
There is no need to print the file names at all; this avoids getting the wrong result if a file name should ever contain a line feed (touch $'/Da/ick\npoo' to see this in action.)
More generally, a correct nested loop looks like
for i in list of things; do
for j in different items, perhaps involving "$i"; do
things with "$j" and perhaps also "$i"
done
done
i is a variable, so you need to reference it via $, i.e. the second loop should be
for j in $(ls "$i") ; do
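The count can also be done in pure bash by letting a glob fill an array and reading its length. A minimal sketch, assuming the .txt files sit exactly one level below the main folder; the helper name count_txt and the demo tree are hypothetical:

```shell
#!/bin/bash
# count_txt ROOT: print the number of .txt files one level below ROOT.
# The body is a subshell so that nullglob does not leak into the caller.
count_txt() (
    shopt -s nullglob            # an unmatched glob expands to nothing
    files=( "$1"/*/*.txt )
    echo "${#files[@]}"
)

# Demo tree in a throwaway directory (names are made up).
demo=$(mktemp -d)
mkdir "$demo/one" "$demo/two"
touch "$demo/one/a.txt" "$demo/one/b c.txt" "$demo/two/d.txt" "$demo/two/notes.md"

count_txt "$demo"
```

Because the names are never printed and re-parsed, spaces or line feeds in file names do not affect the result.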

bash - assigning output of find to variable while looping through list

I'm new to shell scripting. In bash, I'm trying to assign the output of find to a new variable while looping through a list.
for i in {25,27}; do
r1=$(find $i*R1_001.fastq.gz)
r2=$(find $i*R2_001.fastq.gz)
done
What I want to happen is for the computer to assign a file name to r1 and r2. For instance:
$ echo $r1
25-NVB206M02_S27_R1_001.fastq.gz
However, the computer interprets this as if the * is not a wildcard. I get an error that states:
find: `25*R1_001.fastq.gz': No such file or directory
Thank you for any advice you can provide.
If you want to use find you need something like:
r1=$(find . -name "${i}*R1_001.fastq.gz" | sed 's#^.*/##')
or, better, you can use:
r1="${i}*R1_001.fastq.gz"
Be aware that you may match 0 or more files with this. If multiple files match, for instance, if i=2 then r1 and r2 would match 2-NVB..., 20-NVB..., 21-NVB..., 200-NVB..., and so on. You probably are only expecting the variable to hold a single file name. Try to tighten up what you are matching on.
If no files match, r1 and r2 may still be non-empty, because an unmatched pattern is left as the literal string. Example:
(pi19 692) $ ls "*foo*"
ls: *foo*: No such file or directory
(pi19 693) $ r1="*foo*"
(pi19 694) $ echo $r1
*foo*
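To end up with a real file name (or nothing) rather than an unexpanded pattern, the glob can be expanded into an array with nullglob set. A minimal sketch, assuming the sample number is always followed by a hyphen as in the example name; the files are created only for the demo:

```shell
#!/bin/bash
shopt -s nullglob                 # unmatched patterns expand to nothing

# Throwaway directory with one pair of made-up sample files.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 25-NVB206M02_S27_R1_001.fastq.gz 25-NVB206M02_S27_R2_001.fastq.gz

for i in 25 27; do
    r1=( "$i"-*R1_001.fastq.gz )  # array: one element per matching file
    r2=( "$i"-*R2_001.fastq.gz )
    if (( ${#r1[@]} == 1 && ${#r2[@]} == 1 )); then
        echo "$i: ${r1[0]} ${r2[0]}"
    else
        echo "$i: expected one R1 and one R2 file, found ${#r1[@]} and ${#r2[@]}" >&2
    fi
done
```

Including the hyphen in the pattern also keeps a short prefix such as 2 from matching 25-... by accident.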

Using Wildcards with 'rename'

I have been using the rename command to batch rename files. Up to now, I have had files like:
2010.306.18.08.11.0000.BO.ADM..BHZ.SAC
2010.306.18.08.11.0000.BO.AMM..BHZ.SAC
2010.306.18.08.11.0000.BO.ASI..BHE.SAC
2010.306.18.08.11.0000.BO.ASI..BHZ.SAC
and using rename 2010.306.18.08.11.0000.BO. "" * and rename .. _. * I have reduced them to:
ADM_.BHZ.SAC
AMM_.BHZ.SAC
ASI_.BHE.SAC
ASI_.BHZ.SAC
which is exactly what I want. A bit clumsy, I guess, but it works. The problem occurs now that I have files like:
2010.306.18.06.12.8195.TW.MASB..BHE.SAC
2010.306.18.06.14.7695.TW.CHGB..BHN.SAC
2010.306.18.06.24.4195.TW.NNSB..BHZ.SAC
2010.306.18.06.25.0695.TW.SSLB..BHZ.SAC
which exist in the same folder. I have been trying to get the similar results to above using wildcards in the rename command eg. rename 2010.306.18.*.*.*.*. "" but this appends the first appearance of 2010.306.18.*.*.*.*. to the beginning of all the other files - clearly not what I'm after, such that I get:
2010.306.18.06.12.8195.TW.MASB..BHE.SAC
2010.306.18.06.12.8195.TW.MASB..BHE.SAC2010.306.18.06.14.7695.TW.CHGB..BHN.SAC
2010.306.18.06.12.8195.TW.MASB..BHE.SAC2010.306.18.06.24.4195.TW.NNSB..BHZ.SAC
2010.306.18.06.12.8195.TW.MASB..BHE.SAC2010.306.18.06.25.0695.TW.SSLB..BHZ.SAC
I guess I am not understanding a fairly fundamental principle of wildcards here, so can someone please explain why this doesn't work and what I can do to get the desired result (preferably using rename)?
N.B.
To clarify, the output wants to be:
ADM_.BHZ.SAC
AMM_.BHZ.SAC
ASI_.BHE.SAC
ASI_.BHZ.SAC
MASB.BHE.SAC
CHGB.BHN.SAC
NNSB.BHZ.SAC
SSLB.BHZ.SAC
You can try this first to see what commands would be executed:
for f in *; do echo mv "$f" "$(echo "$f" | sed 's/2010.*\.TW\.//')" ; done
If it's what you expect, you can remove the first echo from the command to execute it:
for f in *; do mv "$f" "$(echo "$f" | sed 's/2010.*\.TW\.//')" ; done
rename does not allow wildcards in the from and to strings. When you run rename 2010.306.18.*.*.*.*. "" * it is actually your shell which first expands the wildcard and then passes the result of the expansion to rename, hence why it does not work.
Instead of using rename, use a loop as follows:
for file in *
do
tmp="${file##2010*TW.}" # remove the file prefix
mv "$file" "${tmp/../_}" # replace the first '..' with '_'
done
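The prefix can also be stripped without naming the network code (BO, TW) by removing the first seven dot-terminated fields. A minimal sketch, assuming every file name starts with exactly seven such fields and that '..' should become '_.' for all files, as in the earlier rename commands; new_name is a hypothetical helper, and echo keeps the loop a dry run:

```shell
#!/bin/bash
# new_name FILE: print FILE without its first seven dot-terminated fields,
# with the first '..' turned into '_.'. (Hypothetical helper.)
new_name() {
    local file=$1
    local rest=${file#*.*.*.*.*.*.*.}   # drop the shortest prefix holding seven dots
    printf '%s\n' "${rest/../_.}"
}

for file in 2010.306.18.08.11.0000.BO.ADM..BHZ.SAC \
            2010.306.18.06.12.8195.TW.MASB..BHE.SAC; do
    # Dry run: remove echo to actually rename.
    echo mv -- "$file" "$(new_name "$file")"
done
```

In a real directory the loop would run over * instead of the two literal names used here.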

Properly handle lists of files with whitespace in filename

I want to iterate over a list of files in Bash and perform some action. The problem: the file names may contain whitespace, which creates an obvious problem with wildcards or ls:
touch a\ b
FILES=* # or $(ls)
for FILE in $FILES; do echo $FILE; done
yields
a
b
Now, the conventional way to handle this is to use find … -print0 instead. However, this only works (well) in conjunction with xargs -0, not with Bash variables / loops.
My idea was to set $IFS to the null character to make this work. However, the comp.unix.shell seems to think that this is impossible in bash.
Bummer. Well, it’s theoretically possible to use another character, such as : (after all, $PATH uses this format, too):
IFS=$':'
FILES=$(find . -print0 | xargs -0 printf "%s:")
for FILE in $FILES; do echo $FILE; done
(The output is slightly different but fair enough.)
However, I can’t help but feel that this is clumsy and that there should be a more direct way of doing this. I’m looking for a more direct way of accomplishing this, preferably using wildcards or ls.
The best way to handle this is to store the file list as an array, rather than a string (and be sure to double-quote all variable substitutions):
files=(*)
for file in "${files[@]}"; do
echo "$file"
done
If you want to generate an array from find's output (e.g. if you need to search recursively), see this previous answer.
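For a recursive listing, find's NUL-separated output can be read into an array one entry at a time. A minimal sketch of that approach; the demo directory and file names are made up:

```shell
#!/bin/bash
# Throwaway tree with names containing one and two spaces.
demo=$(mktemp -d)
mkdir "$demo/sub dir"
touch "$demo/a b" "$demo/sub dir/c  d"

files=()
# read -d '' stops at each NUL byte; IFS= and -r keep the name untouched.
while IFS= read -r -d '' file; do
    files+=("$file")
done < <(find "$demo" -type f -print0)

for file in "${files[@]}"; do
    printf '<%s>\n' "$file"
done
```

The process substitution keeps the while loop in the current shell, so the array is still populated after the loop ends.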
Exactly what you have in the first example works fine for me in Msys Bash, Cygwin and on my Fedora box:
FILES=*
for FILE in $FILES
do
echo $FILE
done
It's very important to precede this with
IFS=""
otherwise file names containing two consecutive spaces will not be found.
