I am trying to write a script to copy all files in a directory tree to another directory, using the find command. However, some files have the same name as others. Since I am not interested in the file names at all, I thought the simplest solution would be to give the copies progressive numbers as names.
I tried with this command:
i=0
find . -iname "*.jpg" -exec cp {} $DEST_DIR/$i ; i=$i+1;
however, this command obviously won't work, as -exec runs a subshell in which the i variable is not defined.
Does anyone have an idea how to do this, preferably with find? Is there a better way to do it?
i=0
find . -iname "*.jpg" | while IFS= read -r f; do
    cp -- "$f" "$DEST_DIR/$i"
    i=$((i + 1))
done
... assuming there are no filenames containing newlines (IFS= read -r already copes with ordinary spaces).
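If newlines in filenames are a possibility, a null-delimited sketch like the one below keeps the counter in the same shell; it assumes GNU find for -print0 and that $DEST_DIR is already set:
i=0
while IFS= read -r -d '' f; do
    # copy each match to a progressively numbered name in $DEST_DIR
    cp -- "$f" "$DEST_DIR/$i"
    i=$((i + 1))
done < <(find . -iname '*.jpg' -print0)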
Within a certain directory I have many directories containing a bunch of text files. I'm trying to write a script that concatenates only those files in each directory that have the string 'R1' in their filename into one file within that specific directory, and those that have 'R2' in their filename into another. This is what I wrote, but it's not working.
#!/bin/bash
for f in */*.fastq; do
    if grep 'R1' $f ; then
        cat "$f" >> R1.fastq
    fi
    if grep 'R2' $f ; then
        cat "$f" >> R2.fastq
    fi
done
I get no errors and the files are created as intended but they are empty files. Can anyone tell me what I’m doing wrong?
Thank you all for the fast and detailed responses! I think I wasn't very clear in my question, but I need the script to only concatenate the files within each specific directory, so that each directory gets its own new files (R1 and R2). I tried doing
cat /*R1*.fastq >*/R1.fastq
but it gave me an ambiguous redirect error. I also tried Charles Duffy's for loop but looping through the directories and doing a nested loop to run though each file within a directory like so
for f in */; do
    for d in "$f"/*.fastq; do
        case "$d" in
            *R1*) cat "$d" >&3
            *R2*) cat "$d" >&4
        esac
    done 3>R1.fastq 4>R2.fastq
done
but it was giving an unexpected token error regarding ')'.
Sorry in advance if I'm missing something elementary, I'm still very new to bash.
A Note To The Reader
Please review edit history on the question in considering this answer; several parts have been made less relevant by question edits.
One cat Per Output File
For the purpose at hand, you can probably just let shell globbing do all the work (if R1 or R2 will be in the filenames, as opposed to the directory names):
set -x # log what's happening!
cat */*R1*.fastq >R1.fastq
cat */*R2*.fastq >R2.fastq
One find Per Output File
If it's a really large number of files, by contrast, you might need find:
find . -mindepth 2 -maxdepth 2 -type f -name '*R1*.fastq' -exec cat '{}' + >R1.fastq
find . -mindepth 2 -maxdepth 2 -type f -name '*R2*.fastq' -exec cat '{}' + >R2.fastq
...this is because of the OS-dependent limit on command-line length; the find command given above will put as many arguments onto each cat command as possible for efficiency, but will still split them up into multiple invocations where otherwise the limit would be exceeded.
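For comparison, a rough xargs equivalent of the R1 command above, which batches arguments the same way (a sketch; GNU find and xargs are assumed for -print0/-0):
find . -mindepth 2 -maxdepth 2 -type f -name '*R1*.fastq' -print0 | xargs -0 cat >R1.fastq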
Iterate-And-Test
If you really do want to iterate over everything, and then test the names, consider a case statement for the job, which is much more efficient than using grep to check just one line:
for f in */*.fastq; do
    case $f in
        *R1*) cat "$f" >&3 ;;
        *R2*) cat "$f" >&4 ;;
    esac
done 3>R1.fastq 4>R2.fastq
Note the use of file descriptors 3 and 4 to write to R1.fastq and R2.fastq respectively -- that way we're only opening the output files once (and thus truncating them exactly once) when the for loop starts, and reusing those file descriptors rather than re-opening the output files at the beginning of each cat. (That said, running cat once per file -- which find -exec {} + avoids -- is probably more overhead on balance).
Operating Per-Directory
All of the above can be updated to work on a per-directory basis quite trivially. For example:
for d in */; do
    find "$d" -name R1.fastq -prune -o -name '*R1*.fastq' -exec cat '{}' + >"$d/R1.fastq"
    find "$d" -name R2.fastq -prune -o -name '*R2*.fastq' -exec cat '{}' + >"$d/R2.fastq"
done
There are only two significant changes:
We're no longer specifying -mindepth (which previously ensured that our input files came only from subdirectories), because each find is now rooted inside a subdirectory.
We're excluding R1.fastq and R2.fastq from our input files, so we never try to use the same file as both input and output. This is a consequence of the prior change: Previously, our output files couldn't be considered as input because they didn't meet the minimum depth.
Your grep is searching the file contents instead of the file name. You could rewrite it this way:
for f in */*.fastq; do
    [[ -f $f ]] || continue
    if [[ $f = *R1* ]]; then
        cat "$f" >> R1.fastq
    elif [[ $f = *R2* ]]; then
        cat "$f" >> R2.fastq
    fi
done
A find in a for loop might suit this:
for i in R1 R2
do
    find . -type f -name "*${i}*" -exec cat '{}' + >"$i.txt"
done
Under the current directory I have multiple subdirectories:
subdir1/
    001myfile001A.txt
    002myfile002A.txt
subdir2/
    001myfile001B.txt
    002myfile002B.txt
where I want to strip every character from the filenames before myfile so I end up with
subdir1/
    myfile001A.txt
    myfile002A.txt
subdir2/
    myfile001B.txt
    myfile002B.txt
I have some code to do this...
#!/bin/bash
for d in `find . -type d -maxdepth 1`; do
    cd "$d"
    for f in `find . "*.txt"`; do
        mv "$f" "$(echo "$f" | sed -r 's/^.*myfile/myfile/')"
    done
done
however the newly renamed files end up in the parent directory
i.e.
myfile001A.txt
myfile002A.txt
myfile001B.txt
myfile002B.txt
subdir1/
subdir2/
The sub-directories themselves are now empty.
How do I alter my script to rename the files and keep them in their respective sub-directories? As you can see, the first loop changes directory to the sub-directory, so I'm not sure why the files end up getting sent up a directory...
Your script has multiple problems. In the first place, your outer find command doesn't do quite what you expect: it outputs not only each of the subdirectories, but also the search root, ., which is itself a directory. You could have discovered this by running the command manually, among other ways. You don't really need to use find for this, but supposing that you do use it, this would be better:
for d in $(find * -maxdepth 0 -type d); do
Moreover, . is the first result of your original find command, and your problems continue there. Your initial cd is without meaningful effect, because you're just changing to the same directory you're already in. The find command in the inner loop is rooted there, and descends into both subdirectories. The path information for each file you choose to rename is therefore stripped by sed, which is why the results end up in the initial working directory (./subdir1/001myfile001A.txt --> myfile001A.txt). By the time you process the subdirectories, there are no files left in them to rename.
But that's not all: the find command in your inner loop is incorrect. Because you do not specify an option before it, find interprets "*.txt" as designating a second search root, in addition to .. You presumably wanted to use -name "*.txt" to filter the find results; without it, find outputs the name of every file in the tree. Presumably you're suppressing or ignoring the error messages that result.
But supposing that your subdirectories have no subdirectories of their own, as shown, and that you aren't concerned with dotfiles, even this corrected version ...
for f in `find . -name "*.txt"`;
... is an awfully heavyweight way of saying this ...
for f in *.txt;
... or even this ...
for f in *?myfile*.txt;
... the latter of which will avoid attempts to rename any files whose names do not, in fact, change.
Furthermore, launching a sed process for each file name is pretty wasteful and expensive when you could just use bash's built-in substitution feature:
mv "$f" "${f/#*myfile/myfile}"
And you will find also that your working directory gets messed up. The working directory is a characteristic of the overall shell environment, so it does not automatically reset on each loop iteration. You'll need to handle that manually in some way. pushd / popd would do that, as would running the outer loop's body in a subshell.
Overall, this will do the trick:
#!/bin/bash
for d in $(find * -maxdepth 0 -type d); do
    pushd "$d"
    for f in *.txt; do
        mv "$f" "${f/#*myfile/myfile}"
    done
    popd
done
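Alternatively, here is a minimal sketch of the subshell approach mentioned above; the parentheses run the loop body in a subshell, so the directory change dies with it and no pushd/popd is needed:
for d in $(find * -maxdepth 0 -type d); do
    (
        cd "$d" || exit
        for f in *.txt; do
            mv "$f" "${f/#*myfile/myfile}"
        done
    )
done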
You can do it without find and sed:
$ for f in */*.txt; do echo mv "$f" "${f/\/*myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
If you remove the echo, it'll actually rename the files.
This uses shell parameter expansion to replace a slash and anything up to myfile with just a slash and myfile.
Notice that this breaks if there is more than one level of subdirectories. In that case, you could use extended pattern matching (enabled with shopt -s extglob) and the globstar shell option (shopt -s globstar):
$ for f in **/*.txt; do echo mv "$f" "${f/\/*([!\/])myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir1/subdir3/001myfile001A.txt subdir1/subdir3/myfile001A.txt
mv subdir1/subdir3/002myfile002A.txt subdir1/subdir3/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
This uses the *([!\/]) pattern ("zero or more characters that are not a forward slash"). The slash has to be escaped in the bracket expression because we're still inside of the pattern part of the ${parameter/pattern/string} expansion.
Maybe you want to use the following command instead:
rename 's#(.*/).*(myfile.*)#$1$2#' subdir*/*
You can use rename -n ... to check the outcome without actually renaming anything.
Regarding your actual question:
The find command from the outer loop returns 3 (!) directories:
.
./subdir1
./subdir2
The unwanted . is the reason why all files end up in the parent directory (that is .). You can exclude . by using the option -mindepth 1.
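For example, the outer command becomes:
find . -mindepth 1 -maxdepth 1 -type d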
Unfortunately, this was only the reason the files ended up in the wrong place, not the only problem. Since you already accepted one of the answers, there is no need to list them all.
A slight modification should fix your problem:
#!/bin/bash
for f in `find . -maxdepth 2 -name "*.txt"`; do
    mv "$f" "$(echo "$f" | sed -r 's,[^/]+(myfile),\1,')"
done
Note: this sed uses , instead of / as the delimiter.
However, there are much faster ways.
Here is a version with the rename utility, which is available or easily installed wherever there are bash and perl:
find . -maxdepth 2 -name "*.txt" | rename 's,[^/]+(myfile),/$1,'
Here are tests on 1000 files:
for `find`; do mv    9.176s
rename               0.099s
That's 100x as fast.
John Bollinger's accepted answer is twice as fast as the OP's, but 50x as slow as this rename solution:
for|for|mv "$f" "${f//}" 4.316s
Also, it won't work if there is a directory with too many items for a shell glob to handle; likewise any answers that use for f in *.txt or for f in */*.txt or find * or rename ... subdir*/*. Answers that begin with find ., on the other hand, will also work on directories with any number of items.
I have a directory that contains subdirectories, each containing a particular script and supporting files. I need to verify that the proper files are in place in each of these directories. These directories can change at any time, so I'd like to use bash (I think) and store the output of the following command, which returns the proper subdirectories, in an array
find . -maxdepth 1 -type d -not -name home -not -name lhome -print
and then verify that each of those directories contains the proper files:
file1 file2 file3.sh file4.conf
If a particular directory does not contain those files, I need to know which directory is the issue and which files are missing. What is the best/proper way to achieve that goal? Maybe bash is the wrong tool and perl or something would be better?
There may be a more integrated way, but here's my shot at it:
while read -rd '' directory; do
    files=("file1" "file2" "$directory.sh" "$directory.conf")
    for file in "${files[@]}"; do
        if [ ! -e "$directory/$file" ]; then
            echo "$directory is missing $file"
        fi
    done
done < <(find . -maxdepth 1 -type d -not -name home -not -name lhome -print0)
Note that this find also returns the current directory. If you wish to avoid that, you might want to add a -mindepth 1 option.
Also, to make it into a script, you might want to replace the find location . by $1 so you can specify the target more flexibly.
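Putting those two notes together, here is a minimal sketch of the scripted form; the fixed file list comes from the question, and the script name is only a placeholder:
#!/bin/bash
# check_dirs.sh (hypothetical name); pass the directory to scan as the first argument
target=${1:-.}
while IFS= read -rd '' directory; do
    for file in file1 file2 file3.sh file4.conf; do
        if [ ! -e "$directory/$file" ]; then
            echo "$directory is missing $file"
        fi
    done
done < <(find "$target" -mindepth 1 -maxdepth 1 -type d -not -name home -not -name lhome -print0)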
I think something like this might work:
shopt -s nullglob extglob
diff <(while IFS= read -r f; do printf "%s\n" "$f/"{file1,file2,file3.sh,file4.conf}; done < <(printf "%s\n" !(home|lhome))) \
<(printf "%s\n" !(home|lhome)/{file1,file2,file3.sh,file4.conf})
Basically what is happening is that a list of all possible files is generated by the while loop, something like:
c/file1
c/file2
c/file3.sh
c/file4.conf
d/file1
d/file2
d/file3.sh
d/file4.conf
Then another list is generated with the existing files:
c/file1
c/file2
Now all that is missing is to compare the two lists to find the differences:
2,5d1
< c/file2
< c/file3.sh
< c/file4.conf
< d/file1
7,8d2
< d/file3.sh
< d/file4.conf
As you can see, this has some serious drawbacks; for one, the list of expected files is written twice. And each list is stored in memory, which would cause problems if many directories are present.
The scenario is that I want to convert all of my music files from .mp3 to .ogg. They are in a folder called "Music". In this folder there are folders and files. The files are .mp3s. The directories may contain .mp3s or directories which further contain .mp3s or directories, and so on. This is because some artists have albums which have parts and some do not, etc.
I want to write a script that converts each file using avconv.
Basically, what I am going to do is manually cd into every directory and run the following:
for file in $(ls); do avconv -i $file `echo \`basename $file .mp3\`.ogg`; done
This successfully gets me what I want. However, this is not great as I have a lot of folders, and manually going into each of them and executing this is slow.
My question, then, is how do I write a script that runs this in any directory that has .mp3s, and then goes into any subdirectory it finds and recursively calls itself? My intuition tells me to use Perl or Python because of the complex nature of this.
Thanks for any suggestions!
I'm not familiar with avconv but assuming your command is:
avconv -i inputname outputname
And you want to convert all inputname.mp3 to inputname.ogg in their original directories below Music, then the following should work in bash:
#!/bin/bash
while read -r fname; do
    avconv -i "$fname" "${fname%.mp3}.ogg"
done < <(find /path/to/Music -type f -name "*.mp3")
Note: this does not remove the original .mp3, and the space between < < is required. Also note that for file in $(ls) is filled with potential for errors.
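If the filenames might contain newlines, a null-delimited sketch of the same loop is a bit safer (GNU find is assumed for -print0):
#!/bin/bash
while IFS= read -r -d '' fname; do
    avconv -i "$fname" "${fname%.mp3}.ogg"
done < <(find /path/to/Music -type f -name "*.mp3" -print0)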
You can do it with bash in a one-liner:
First you find all files (of type file, -type f) that match the pattern "*.mp3". To read each one you use while and invoke avconv.
To change the extension I prefer the sed command, which keeps the folder part of the path so you don't need the cd command.
Notice that you must put quotes around the $FN variable because it can contain spaces.
find . -type f -iname "*.mp3" | while IFS= read -r FN ; do avconv -i "$FN" "$(echo "$FN" | sed 's/\.mp3$/.ogg/')" ; done
find <music-folder> -type f -name '*.mp3' | \
xargs -I{} bash -c 'mp3="$0"; ogg="${mp3%.mp3}.ogg"; avconv -i "$mp3" "$ogg";' {}
This should survive "weird" filenames with spaces and most other strange symbols, though names containing quotes or newlines can still trip up xargs unless you feed it null-delimited input.
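For example, a null-delimited sketch of the same pipeline (GNU find and xargs are assumed; <music-folder> is the placeholder from above):
find <music-folder> -type f -name '*.mp3' -print0 | \
    xargs -0 -I{} bash -c 'mp3="$0"; ogg="${mp3%.mp3}.ogg"; avconv -i "$mp3" "$ogg";' {}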
You can list directories with absolute paths and recursively cd into every directory using find $PWD -type d syntax:
Just run this from inside the Music directory:
for d in $(find "$PWD" -type d)
do
    cd "$d"
    for file in $(find . -maxdepth 1 -type f -name '*.mp3')
    do
        echo "$file"
        avconv -i "$file" "$(basename "$file" .mp3).ogg"
    done
done
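If your bash is 4.0 or newer, a globstar sketch avoids the cd dance entirely; the shell options below are assumptions about your setup:
# run from inside the Music directory; ** recurses into subdirectories
shopt -s globstar nullglob
for file in **/*.mp3; do
    avconv -i "$file" "${file%.mp3}.ogg"
done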
I'm having a bit of trouble with globs in Bash. For example:
echo *
This prints out all of the files and folders in the current directory.
e.g. (file1 file2 folder1 folder2)
echo */
This prints out all of the folders with a / after the name.
e.g. (folder1/ folder2/)
How can I glob for just the files?
e.g. (file1 file2)
I know it could be done by parsing ls, but I also know that is a bad idea. I tried using extended globbing but couldn't get that to work either.
Without using any external utility, you can try a for loop with glob support:
for i in *; do [ -f "$i" ] && echo "$i"; done
I don't know if you can solve this with globbing, but you can certainly solve it with find:
find . -maxdepth 1 -type f
You can do what you want in bash like this:
shopt -s extglob
echo !(*/)
But note that what this actually does is match "not directory-likes."
It will still match dangling symlinks, symlinks pointing to not-directories, device nodes, fifos, etc.
It won't match symlinks pointing to directories, though.
If you want to iterate over normal files and nothing more, use find -maxdepth 1 -type f.
The safe and robust way to use it goes like this:
find -maxdepth 1 -type f -print0 | while IFS= read -r -d $'\0' file; do
    printf "%s\n" "$file"
done
My go-to in this scenario is the find command. I just had to use it to find/replace dozens of instances in a given directory. I'm sure there are many other ways of skinning this cat, but the pure glob for loop example above isn't recursive.
for file in $( find path/to/dir -type f -name '*.js' ); do
    sed -i -e 's#FIND#REPLACEMENT#g' "$file"
done
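A sketch of the same idea that lets find invoke sed directly, which sidesteps the word-splitting of the $( find ... ) substitution (FIND and REPLACEMENT are the placeholders from above; GNU sed is assumed for -i):
find path/to/dir -type f -name '*.js' -exec sed -i -e 's#FIND#REPLACEMENT#g' {} +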