Globbing for only files in Bash

I'm having a bit of trouble with globs in Bash. For example:
echo *
This prints out all of the files and folders in the current directory.
e.g. (file1 file2 folder1 folder2)
echo */
This prints out all of the folders with a / after the name.
e.g. (folder1/ folder2/)
How can I glob for just the files?
e.g. (file1 file2)
I know it could be done by parsing ls, but I also know that is a bad idea. I tried using extended globbing but couldn't get that to work either.

Without using any external utility, you can try a for loop over a glob and test each entry:
for i in *; do [ -f "$i" ] && echo "$i"; done

I don't know if you can solve this with globbing, but you can certainly solve it with find:
find . -maxdepth 1 -type f

You can do what you want in bash like this:
shopt -s extglob
echo !(*/)
But note that what this actually does is match "not directory-likes."
It will still match dangling symlinks, symlinks pointing to not-directories, device nodes, fifos, etc.
It won't match symlinks pointing to directories, though.
If you want to iterate over normal files and nothing more, use find -maxdepth 1 -type f.
The safe and robust way to use it goes like this:
find -maxdepth 1 -type f -print0 | while IFS= read -r -d '' file; do
    printf "%s\n" "$file"
done

My go-to in this scenario is the find command. I just had to use it to find/replace dozens of instances in a given directory. I'm sure there are many other ways of skinning this cat, but the pure for-loop example above isn't recursive.
for file in $(find path/to/dir -type f -name '*.js'); do
    sed -i -e 's#FIND#REPLACEMENT#g' "$file"
done

Related

Bash script to concatenate text files with specific substrings in filenames

Within a certain directory I have many directories containing a bunch of text files. I'm trying to write a script that concatenates only those files in each directory that have the string 'R1' in their filename into one file within that specific directory, and those that have 'R2' into another. This is what I wrote, but it's not working.
#!/bin/bash
for f in */*.fastq; do
    if grep 'R1' $f ; then
        cat "$f" >> R1.fastq
    fi
    if grep 'R2' $f ; then
        cat "$f" >> R2.fastq
    fi
done
I get no errors and the files are created as intended but they are empty files. Can anyone tell me what I’m doing wrong?
Thank you all for the fast and detailed responses! I think I wasn't very clear in my question, but I need the script to only concatenate the files within each specific directory so that each directory has a new file (R1 and R2). I tried doing
cat /*R1*.fastq >*/R1.fastq
but it gave me an ambiguous redirect error. I also tried Charles Duffy's for loop, but looping through the directories and doing a nested loop to run through each file within a directory, like so
for f in */; do
    for d in "$f"/*.fastq; do
        case "$d" in
            *R1*) cat "$d" >&3
            *R2*) cat "$d" >&4
        esac
    done 3>R1.fastq 4>R2.fastq
done
but it was giving an unexpected token error regarding ')'.
Sorry in advance if I'm missing something elementary, I'm still very new to bash.
A Note To The Reader
Please review edit history on the question in considering this answer; several parts have been made less relevant by question edits.
One cat Per Output File
For the purpose at hand, you can probably just let shell globbing do all the work (if R1 or R2 will be in the filenames, as opposed to the directory names):
set -x # log what's happening!
cat */*R1*.fastq >R1.fastq
cat */*R2*.fastq >R2.fastq
One find Per Output File
If it's a really large number of files, by contrast, you might need find:
find . -mindepth 2 -maxdepth 2 -type f -name '*R1*.fastq' -exec cat '{}' + >R1.fastq
find . -mindepth 2 -maxdepth 2 -type f -name '*R2*.fastq' -exec cat '{}' + >R2.fastq
...this is because of the OS-dependent limit on command-line length; the find command given above will put as many arguments onto each cat command as possible for efficiency, but will still split them up into multiple invocations where otherwise the limit would be exceeded.
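A small way to see this batching for yourself: the sh -c wrapper below is my own illustration, not part of the original command, and each line printed to stderr corresponds to one cat invocation.
find . -mindepth 2 -maxdepth 2 -type f -name '*R1*.fastq' \
    -exec sh -c 'echo "cat invoked with $# files" >&2; cat "$@"' _ {} + >R1.fastq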
Iterate-And-Test
If you really do want to iterate over everything, and then test the names, consider a case statement for the job, which is much more efficient than using grep to check just one line:
for f in */*.fastq; do
    case $f in
        *R1*) cat "$f" >&3 ;;
        *R2*) cat "$f" >&4 ;;
    esac
done 3>R1.fastq 4>R2.fastq
Note the use of file descriptors 3 and 4 to write to R1.fastq and R2.fastq respectively -- that way we're only opening the output files once (and thus truncating them exactly once) when the for loop starts, and reusing those file descriptors rather than re-opening the output files at the beginning of each cat. (That said, running cat once per file -- which find -exec {} + avoids -- is probably more overhead on balance).
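If you prefer to make that descriptor handling explicit, an equivalent sketch (my rephrasing of the same idea, not part of the original answer) opens the files with exec before the loop and closes them afterwards:
exec 3>R1.fastq 4>R2.fastq
for f in */*.fastq; do
    case $f in
        *R1*) cat "$f" >&3 ;;
        *R2*) cat "$f" >&4 ;;
    esac
done
exec 3>&- 4>&-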
Operating Per-Directory
All of the above can be updated to work on a per-directory basis quite trivially. For example:
for d in */; do
find "$d" -name R1.fastq -prune -o -name '*R1*.fastq' -exec cat '{}' + >"$d/R1.fastq"
find "$d" -name R2.fastq -prune -o -name '*R2*.fastq' -exec cat '{}' + >"$d/R2.fastq"
done
There are only two significant changes:
We're no longer specifying -mindepth, to ensure that our input files only come from subdirectories.
We're excluding R1.fastq and R2.fastq from our input files, so we never try to use the same file as both input and output. This is a consequence of the prior change: Previously, our output files couldn't be considered as input because they didn't meet the minimum depth.
Your grep is searching the file contents instead of the file name. You could rewrite it this way:
for f in */*.fastq; do
    [[ -f $f ]] || continue
    if [[ $f = *R1* ]]; then
        cat "$f" >> R1.fastq
    elif [[ $f = *R2* ]]; then
        cat "$f" >> R2.fastq
    fi
done
find in a for loop might suit this:
for i in R1 R2
do
find . -type f -name "*${i}*" -exec cat '{}' + >"$i.txt"
done

Go into every subdirectory and mass rename files by stripping leading characters

From the current directory I have multiple sub directories:
subdir1/
001myfile001A.txt
002myfile002A.txt
subdir2/
001myfile001B.txt
002myfile002B.txt
where I want to strip every character from the filenames before myfile so I end up with
subdir1/
myfile001A.txt
myfile002A.txt
subdir2/
myfile001B.txt
myfile002B.txt
I have some code to do this...
#!/bin/bash
for d in `find . -type d -maxdepth 1`; do
    cd "$d"
    for f in `find . "*.txt"`; do
        mv "$f" "$(echo "$f" | sed -r 's/^.*myfile/myfile/')"
    done
done
however the newly renamed files end up in the parent directory
i.e.
myfile001A.txt
myfile002A.txt
myfile001B.txt
myfile002B.txt
subdir1/
subdir2/
In which the sub-directories are now empty.
How do I alter my script to rename the files and keep them in their respective sub-directories? As you can see the first loop changes directory to the sub directory so not sure why the files end up getting sent up a directory...
Your script has multiple problems. In the first place, your outer find command doesn't do quite what you expect: it outputs not only each of the subdirectories, but also the search root, ., which is itself a directory. You could have discovered this by running the command manually, among other ways. You don't really need to use find for this, but supposing that you do use it, this would be better:
for d in $(find * -maxdepth 0 -type d); do
Moreover, . is the first result of your original find command, and your problems continue there. Your initial cd is without meaningful effect, because you're just changing to the same directory you're already in. The find command in the inner loop is rooted there, and descends into both subdirectories. The path information for each file you choose to rename is therefore stripped by sed, which is why the results end up in the initial working directory (./subdir1/001myfile001A.txt --> myfile001A.txt). By the time you process the subdirectories, there are no files left in them to rename.
But that's not all: the find command in your inner loop is incorrect. Because you do not specify an option before it, find interprets "*.txt" as designating a second search root, in addition to .. You presumably wanted to use -name "*.txt" to filter the find results; without it, find outputs the name of every file in the tree. Presumably you're suppressing or ignoring the error messages that result.
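A quick way to see the difference for yourself (these commands only print names; nothing is modified):
find . "*.txt"         # "*.txt" is taken as a second search root: an error message, plus every name under .
find . -name "*.txt"   # -name filters the results to names matching *.txt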
But supposing that your subdirectories have no subdirectories of their own, as shown, and that you aren't concerned with dotfiles, even this corrected version ...
for f in `find . -name "*.txt"`;
... is an awfully heavyweight way of saying this ...
for f in *.txt;
... or even this ...
for f in *?myfile*.txt;
... the latter of which will avoid attempts to rename any files whose names do not, in fact, change.
Furthermore, launching a sed process for each file name is pretty wasteful and expensive when you could just use bash's built-in substitution feature:
mv "$f" "${f/#*myfile/myfile}"
And you will find also that your working directory gets messed up. The working directory is a characteristic of the overall shell environment, so it does not automatically reset on each loop iteration. You'll need to handle that manually in some way. pushd / popd would do that, as would running the outer loop's body in a subshell.
Overall, this will do the trick:
#!/bin/bash
for d in $(find * -maxdepth 0 -type d); do
pushd "$d"
for f in *.txt; do
mv "$f" "${f/#*myfile/myfile}"
done
popd
done
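If you would rather avoid pushd/popd, the subshell variant mentioned above looks like this (a sketch under the same assumptions about the directory layout):
#!/bin/bash
for d in */; do
    (
        cd "$d" || exit
        for f in *.txt; do
            mv "$f" "${f/#*myfile/myfile}"
        done
    )
done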
You can do it without find and sed:
$ for f in */*.txt; do echo mv "$f" "${f/\/*myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
If you remove the echo, it'll actually rename the files.
This uses shell parameter expansion to replace a slash and anything up to myfile with just a slash and myfile.
Notice that this breaks if there is more than one level of subdirectories. In that case, you could use extended pattern matching (enabled with shopt -s extglob) and the globstar shell option (shopt -s globstar):
$ for f in **/*.txt; do echo mv "$f" "${f/\/*([!\/])myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir1/subdir3/001myfile001A.txt subdir1/subdir3/myfile001A.txt
mv subdir1/subdir3/002myfile002A.txt subdir1/subdir3/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
This uses the *([!\/]) pattern ("zero or more characters that are not a forward slash"). The slash has to be escaped in the bracket expression because we're still inside of the pattern part of the ${parameter/pattern/string} expansion.
Maybe you want to use the following command instead:
rename 's#(.*/).*(myfile.*)#$1$2#' subdir*/*
You can use rename -n ... to check the outcome without actually renaming anything.
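For example, a dry run of the command above (assuming the Perl-based rename, which accepts an expression like this) would be:
rename -n 's#(.*/).*(myfile.*)#$1$2#' subdir*/*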
Regarding your actual question:
The find command from the outer loop returns 3 (!) directories:
.
./subdir1
./subdir2
The unwanted . is the reason why all files end up in the parent directory (that is .). You can exclude . by using the option -mindepth 1.
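For example, with -mindepth 1 the current directory itself is no longer reported:
find . -mindepth 1 -maxdepth 1 -type d
# ./subdir1
# ./subdir2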
Unfortunately, this was only the reason for the files landing in the wrong place; it is not the only problem. Since you already accepted one of the answers, there is no need to list them all.
a slight modification should fix your problem:
#!/bin/bash
for f in `find . -maxdepth 2 -name "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's,[^/]+(myfile),\1,')"
done
note: this sed uses , instead of / as the delimiter.
however, there are much faster ways.
here is with the rename utility, available or easily installed wherever there is bash and perl:
find . -maxdepth 2 -name "*.txt" | rename 's,[^/]+(myfile),/$1,'
here are tests on 1000 files:
for `find`; do mv 9.176s
rename 0.099s
that's about 100x as fast.
John Bollinger's accepted answer is twice as fast as the OP's, but about 50x as slow as this rename solution:
for|for|mv "$f" "${f//}" 4.316s
also, it won't work if there is a directory with too many items for a shell glob. likewise any answers that use for f in *.txt or for f in */*.txt or find * or rename ... subdir*/*. answers that begin with find ., on the other hand, will also work on directories with any number of items.

find and verify contents in subfolders

I have a directory that contains subdirectories, each containing a particular script and supporting files. I need to verify that the proper files are in place in each of these directories. These directories can change at any time, so I'd like to use bash (I think) and store the output of the following command, which returns the proper subdirectories, in an array
find . -maxdepth 1 -type d -not -name home -not -name lhome -print
and then verify that each of those directories contains the proper files:
file1 file2 file3.sh file4.conf
If a particular directory does not contain those files, I need to know which directory is the issue and which files are missing. What is the best/proper way to achieve that goal? Maybe bash is the wrong tool and perl or something would be better?
There may be a more integrated way, but here's my shot at it :
while read -rd '' directory; do
    files=("file1" "file2" "$directory.sh" "$directory.conf")
    for file in "${files[@]}"; do
        if [ ! -e "$directory/$file" ]; then
            echo "$directory is missing $file"
        fi
    done
done < <(find . -maxdepth 1 -type d -not -name home -not -name lhome -print0)
Note that this find also returns the current directory. If you wish to avoid that, you might want to add a -mindepth 1 option.
Also, to make it into a script, you might want to replace the find location . with $1 so you can specify the target more flexibly.
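Putting both suggestions together, a minimal script sketch might look like this (the required file names are the fixed ones from the question; adjust to taste):
#!/usr/bin/env bash
# Check every first-level subdirectory of the target (given as $1, default .)
# for the required files and report anything missing.
target=${1:-.}
required=(file1 file2 file3.sh file4.conf)
while IFS= read -rd '' directory; do
    for file in "${required[@]}"; do
        [ -e "$directory/$file" ] || echo "$directory is missing $file"
    done
done < <(find "$target" -mindepth 1 -maxdepth 1 -type d -not -name home -not -name lhome -print0)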
I think something like this might work:
shopt -s nullglob extglob
diff <(while IFS= read -r f; do printf "%s\n" "$f/"{file1,file2,file3.sh,file4.conf}; done < <(printf "%s\n" !(home|lhome))) \
<(printf "%s\n" !(home|lhome)/{file1,file2,file3.sh,file4.conf})
Basically what is happening is that a list of all possible files is generated by the while loop, something like:
c/file1
c/file2
c/file3.sh
c/file4.conf
d/file1
d/file2
d/file3.sh
d/file4.conf
Then another list is generated with the existing files:
c/file1
c/file2
Now all that is missing is to compare the two lists to find the differences:
2,5d1
< c/file2
< c/file3.sh
< c/file4.conf
< d/file1
7,8d2
< d/file3.sh
< d/file4.conf
As you can see this has some serious drawbacks: for one, the list of expected files is written twice. And each list is stored in memory, which would cause problems if many directories are present.

bash: how to change the basename only of a list of files [duplicate]

I have a list of files (which I get from find bla -name "*.so") such as:
/bla/a1.so
/bla/a2.so
/bla/blo/a3.so
/bla/blo/a4.so
/bla/blo/bli/a5.so
and I want to rename them such as it becomes:
/bla/liba1.so
/bla/liba2.so
/bla/blo/liba3.so
/bla/blo/liba4.so
/bla/blo/bli/liba5.so
... i.e. add the prefix 'lib' to the basename
any idea on how to do that in bash?
Something along the lines of:
for a in /bla/a1.so /bla/a2.so /bla/blo/a4.so
do
    dn=$(dirname "$a")
    fn=$(basename "$a")
    mv "$a" "${dn}/lib${fn}"
done
should do it. You might want to add code to read the list of filenames from a file, rather than listing them verbatim in the script, of course.
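For instance, a sketch that reads the list from a file instead, one path per line (the file name filenames.txt is just an assumption):
while IFS= read -r a; do
    dn=$(dirname "$a")
    fn=$(basename "$a")
    mv "$a" "${dn}/lib${fn}"
done < filenames.txt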
find . -name "*.so" -printf "mv '%h/%f' '%h/lib%f'\n" | bash
The code will rename files in the current directory and subdirectories, prepending "lib" to the .so filenames.
No looping needed, as find already does its recursive work to list the files. The code builds the "mv" commands one by one and executes them. To see the "mv" commands without executing them, simply remove the piping to shell part "| bash".
find's printf command understands many variables which makes it pretty scalable. I only needed to use two here:
%h: directory
%f: filename
How to test it:
Run this first (will perform nothing yet, only print lines on the screen):
find . -name "*.so" -printf "mv '%h/%f' '%h/lib%f'\n" | less -S
This will show you all the commands that your script will execute. If you're satisfied with the result, simply execute it afterwards by piping it into bash instead of less.
find . -name "*.so" -printf "mv '%h/%f' '%h/lib%f'\n" | bash
while multiliner
A slightly more robust and generalized solution based on nfm's answer (maybe more than you really need) would be
while IFS= read -r -u3 -d $'\0' FILE; do
    DIR=$(dirname "$FILE")
    FILENAME=$(basename "$FILE")
    mv "$FILE" "${DIR}/lib${FILENAME}"
done 3< <(find bla -name '*.so' -print0 | sort -rz)
This is quite robust:
read -u3 and 3< do not interfere with stdin
-print0 + IFS= + -d $'\0' allows for newlines in filenames
sort -rz renames deeper paths first, so that you can even rename directories and the files inside them at once
find -execdir + rename
This would be perfect if it weren't for the PATH annoyances, see: Find multiple files and rename them in Linux
Try mmv:
cd /bla/
mmv "*.so" "lib#1.so"
(mmv "*" "lib#1" would also work but it's less safe).
If you don't have mmv installed, get it.
basename and dirname are your friends :)
You want something like this (excuse my bash syntax - it's a little rusty):
for FILE in `find bla -name '*.so'`; do
    DIR=`dirname "$FILE"`
    FILENAME=`basename "$FILE"`
    mv "$FILE" "${DIR}/lib${FILENAME}"
done
Beaten to the punch!
Note I've commented out the mv command to prevent any accidental mayhem
for f in *
do
dir=`dirname "$f"`
fname=`basename "$f"`
new="$dir/lib$fname"
echo "new name is $new"
# only uncomment this if you know what you are doing
# mv "$f" "$new"
done

How to do something to every file in a directory using bash?

I started with this:
command *
But it doesn't work when the directory is empty; the * wildcard becomes a literal "*" character. So I switched to this:
for i in *; do
...
done
which works, but again, not if the directory is empty. I resorted to using ls:
for i in `ls -A`
but of course, then file names with spaces in them get split. I tried tacking on the -Q switch:
for i in `ls -AQ`
which causes the names to still be split, only with a quote character at the beginning and ending of the name. Am I missing something obvious here, or is this harder than it ought to be?
Assuming you only want to do something to files, the simple solution is to test if $i is a file:
for i in *
do
    if test -f "$i"
    then
        echo "Doing something to $i"
    fi
done
You should really always make such tests, because you almost certainly don't want to treat files and directories in the same way. Note the quotes around the "$i" which prevent problems with filenames containing spaces.
find could be what you want.
find . | while IFS= read -r file; do
    # do something with "$file"
done
Or maybe like this:
find . -exec <command> {} \;
If you do not want the search to include subdirectories you might need to add a combination of -type f and -maxdepth 1 to the find command. See the find man page for details.
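For instance, restricted to regular files in the current directory only (keeping the <command> placeholder from above):
find . -maxdepth 1 -type f -exec <command> {} \;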
It depends whether you're going to type this at a command prompt, and which command you're applying to the files.
If it's typed you could go with your second choice and substitute something harmless for the command. I like to use echo instead of mv or rm, for example.
Put it all on one line:
for i in *; do command "$i"; done
When that works - or you can see where it fails, and whether it's harmless, you can press up-arrow, edit the command and try again.
Use shopt -s nullglob to prevent an unmatched glob from expanding to the literal *.txt:
shopt -s nullglob
for myfile in *.txt
do
    # your code here
    echo "$myfile"
done
this should do the trick:
find . -type d -print0 | xargs -0 -I {} echo "your folder: {} !"
find . -type f -print0 | xargs -0 -I {} echo "your file: {} !"
the -print0 / -0 options are there to avoid problems with whitespace in file names
