Trouble iterating through all files in directory - bash

Part of my Bash script's intended function is to accept a directory name and then iterate through every file.
Here is part of my code:
#! /bin/bash
# sameln --- remove duplicate copies of files in specified directory
D=$1
cd $D #go to directory specified as default input
fileNum=0 #save file numbers
DIR=".*|*"
for f in $DIR #for every file in the directory
do
files[$fileNum]=$f #save that file into the array
fileNum=$((fileNum+1)) #increment the fileNum
echo aFile
done
The echo statement is for testing purposes. I passed as an argument the name of a directory with four regular files, and I expected my output to look like:
aFile
aFile
aFile
aFile
but the echo statement only shows up once.

A single operation
Use find for this, it's perfect for it.
find <dirname> -maxdepth 1 -type f -exec echo "{}" \;
The flags explained: -maxdepth defines how deep in the hierarchy you want to look (dirs in dirs in dirs), -type f selects files, as opposed to -type d for dirs, and -exec lets you process each found file/dir, which can be accessed through {}. You can alternatively pass it to a bash function to perform more tasks.
This simple bash script takes a dir as an argument and lists all its files:
#!/bin/bash
find "$1" -maxdepth 1 -type f -exec echo "{}" \;
Note that the last line does essentially the same thing as find "$1" -maxdepth 1 -type f -print (not -print0, which separates names with NUL bytes rather than newlines).
Performing multiple tasks
Using find one can also perform multiple tasks by either piping to xargs or while read, but I prefer to use a function. An example:
#!/bin/bash
function dostuff {
# echo filename
echo "filename: $1"
# remove extension from file
mv "$1" "${1%.*}"
# get containing dir of file
dir="${1%/*}"
# get filename without containing dirs
file="${1##*/}"
# do more stuff like echoing results
echo "containing dir = $dir and file was called $file"
}; export -f dostuff
# export the function so you can call it in a subshell (important!!!)
find . -maxdepth 1 -type f -exec bash -c 'dostuff "$1"' _ {} \;
Note that the function needs to be exported, as shown, so that it can be called in the subshell opened by bash -c. To test it out, I suggest you comment out the mv command in dostuff, otherwise you will strip the extension from every file in the directory, haha.
Also note that passing the filename to the function as a positional argument ("$1"), rather than splicing {} into the command string, keeps this safe for weird characters like spaces in filenames, so no worries there.
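As mentioned above, piping to a while read loop is an alternative to -exec; here is a minimal sketch of that variant, assuming the dostuff function above is defined in the same script (no export is needed in this case, since the loop runs in the script's own subshell):
find . -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' f; do
    dostuff "$f"
done
The -print0 / read -d '' pairing keeps filenames containing spaces or even newlines intact.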
Closing note
If you decide to go with the find command, which is a great choice, I advise you to read up on it, because it is a very powerful tool. A simple man find will teach you a lot, and you will pick up many useful options. For instance, you can make find quit as soon as it has found a result, which is handy for quickly checking whether a dir contains any videos, for example. It's truly an amazing tool that can be used on many occasions, and often you'll be done with a one-liner (kinda like awk).
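For example, here is a rough sketch of the "quit once it has found a result" idea mentioned above (-quit is a GNU find action; the $dir variable and the *.mp4 pattern are just placeholders for this example):
if [ -n "$(find "$dir" -type f -name '*.mp4' -print -quit)" ]; then
    echo "$dir contains at least one video"
fi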

You can directly read the files into the array, then iterate through them:
#! /bin/bash
cd "$1"
files=(*)
for f in "${files[@]}"
do
echo "$f"
done
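One caveat with this approach: if the directory is empty, the unmatched * is kept as a literal entry in the array. Enabling nullglob avoids that; a small sketch:
shopt -s nullglob   # unmatched globs expand to nothing instead of themselves
files=(*)
echo "found ${#files[@]} files"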

If you are iterating only files below a single directory, you are better off using simple filename/path expansion to avoid certain uncommon filename issues. The following will iterate through all files in a given directory passed as the first argument (default ./):
#!/bin/bash
srchdir="${1:-.}"
for i in "$srchdir"/*; do
printf " %s\n" "$i"
done
If you must iterate below an entire subtree that includes numerous branches, then find will likely be your only choice. However, be aware that using find or ls to populate a for loop brings with it the potential for problems with embedded characters such as a \n within a filename, etc. See Why for i in $(find . -type f) is wrong, even though it is unavoidable at times.
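If you do have to walk a whole subtree, one commonly used pattern (shown here only as a sketch) pairs find -print0 with a NUL-delimited read, so that even filenames containing newlines survive:
while IFS= read -r -d '' i; do
    printf " %s\n" "$i"
done < <(find "$srchdir" -type f -print0)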

Related

How to recursively find & replace whole files with bash?

I have hundreds of files that I need to recursively replace as the files are currently stored like so:
/2019/01/
file1.pdf
file2.pdf
/2019/02
file3.pdf
file4.pdf
etc
I then have all of the updated files in another directory like so:
/new-files
file1.pdf
file2.pdf
file3.pdf
file4.pdf
Could someone please tell me the best way of doing this with a bash script? I'd basically like to read the new-files directory and then replace any matching file names in the other folders.
Thanks in advance for any help!
Assuming that the 'new-files' directory and all the directory trees containing PDF files are under the current directory, try this Shellcheck-clean Bash code:
#! /bin/bash -p
find . -path ./new-files -prune -o -type f -name '*.pdf' -print0 \
| while IFS= read -r -d '' pdfpath; do
pdfname=${pdfpath##*/}
new_pdfpath=new-files/$pdfname
if [[ -f $new_pdfpath ]]; then
printf "Replace '%s' with '%s'\n" "$pdfpath" "$new_pdfpath" >&2
# cp -- "$new_pdfpath" "$pdfpath"
fi
done
The -path ./new-files -prune in the find command stops the 'new-files' directory from being searched.
The -o in the find command causes the next test and actions to be tried after checking for 'new-files'.
See BashFAQ/001 (How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?) for an explanation of the use of the -print0 option to find and the while IFS= read -r -d '' .... In short, the code can handle arbitrary file paths, including ones with whitespace and newline characters in them.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${pdfpath##*/}.
It's not clear to me if you want to copy or move the new file to replace the old file, or do something else. Run the code as it is to check if it is identifying the correct replacements to be done. If you are happy with it, uncomment the cp line, and modify it to do something different if that is what you want.
The -- in the cp command protects against arguments beginning with dash characters being interpreted as options. It's unnecessary in this case, but I always use it when arguments begin with variable (or other) expansions so the code will remain safe if it is used in other contexts.
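For example, if you decide you want to move the new file into place rather than copy it, the commented line might become something like the following (only switch it on after checking the dry-run output):
mv -- "$new_pdfpath" "$pdfpath"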
I think this calls for a bash array.
#!/usr/bin/env bash
# Make an associative array
declare -A files=()
# Populate array as $files[file.pdf]="/path/to/file.pdf"
for f in 20*/*/*.pdf; do
files[${f##*/}]="$f"
done
# Step through files and replace
for f in new-files/*.pdf; do
if [[ ! -e "${files[${f##*/}]}" ]]; then
echo "ERROR: missing $f" >&2
continue
fi
mv -v "$f" "${files[${f##*/}]}"
done
Note that associative arrays require bash version 4 or above. If you're using the native bash on a Mac, this won't work as-is.
Note also that if you remove continue in the final lines, then the mv command will NOT safely move files that do not exist in the date hash directories, since no target is known.
If you wanted further protection you might use test -nt or friends to confirm that an update is happening in the right direction.
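If you want the script to fail fast on an older shell (for example the bash 3.2 that ships as /bin/bash on macOS), a simple guard near the top could look like this sketch:
if (( BASH_VERSINFO[0] < 4 )); then
    echo "this script needs bash >= 4 for associative arrays" >&2
    exit 1
fi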

Go into every subdirectory and mass rename files by stripping leading characters

From the current directory I have multiple sub directories:
subdir1/
001myfile001A.txt
002myfile002A.txt
subdir2/
001myfile001B.txt
002myfile002B.txt
where I want to strip every character from the filenames before myfile so I end up with
subdir1/
myfile001A.txt
myfile002A.txt
subdir2/
myfile001B.txt
myfile002B.txt
I have some code to do this...
#!/bin/bash
for d in `find . -type d -maxdepth 1`; do
cd "$d"
for f in `find . "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's/^.*myfile/myfile/')"
done
done
however the newly renamed files end up in the parent directory
i.e.
myfile001A.txt
myfile002A.txt
myfile001B.txt
myfile002B.txt
subdir1/
subdir2/
In which the sub-directories are now empty.
How do I alter my script to rename the files and keep them in their respective sub-directories? As you can see the first loop changes directory to the sub directory so not sure why the files end up getting sent up a directory...
Your script has multiple problems. In the first place, your outer find command doesn't do quite what you expect: it outputs not only each of the subdirectories, but also the search root, ., which is itself a directory. You could have discovered this by running the command manually, among other ways. You don't really need to use find for this, but supposing that you do use it, this would be better:
for d in $(find * -maxdepth 0 -type d); do
Moreover, . is the first result of your original find command, and your problems continue there. Your initial cd is without meaningful effect, because you're just changing to the same directory you're already in. The find command in the inner loop is rooted there, and descends into both subdirectories. The path information for each file you choose to rename is therefore stripped by sed, which is why the results end up in the initial working directory (./subdir1/001myfile001A.txt --> myfile001A.txt). By the time you process the subdirectories, there are no files left in them to rename.
But that's not all: the find command in your inner loop is incorrect. Because you do not specify an option before it, find interprets "*.txt" as designating a second search root, in addition to .. You presumably wanted to use -name "*.txt" to filter the find results; without it, find outputs the name of every file in the tree. Presumably you're suppressing or ignoring the error messages that result.
But supposing that your subdirectories have no subdirectories of their own, as shown, and that you aren't concerned with dotfiles, even this corrected version ...
for f in `find . -name "*.txt"`;
... is an awfully heavyweight way of saying this ...
for f in *.txt;
... or even this ...
for f in *?myfile*.txt;
... the latter of which will avoid attempts to rename any files whose names do not, in fact, change.
Furthermore, launching a sed process for each file name is pretty wasteful and expensive when you could just use bash's built-in substitution feature:
mv "$f" "${f/#*myfile/myfile}"
And you will find also that your working directory gets messed up. The working directory is a characteristic of the overall shell environment, so it does not automatically reset on each loop iteration. You'll need to handle that manually in some way. pushd / popd would do that, as would running the outer loop's body in a subshell.
Overall, this will do the trick:
#!/bin/bash
for d in $(find * -maxdepth 0 -type d); do
pushd "$d"
for f in *.txt; do
mv "$f" "${f/#*myfile/myfile}"
done
popd
done
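As noted above, a subshell works in place of pushd/popd; here is a minimal sketch of that variant (listing the subdirectories with the */ glob is just an alternative to find here):
#!/bin/bash
for d in */; do
    (
        cd "$d" || exit
        for f in *.txt; do
            mv "$f" "${f/#*myfile/myfile}"
        done
    )
done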
You can do it without find and sed:
$ for f in */*.txt; do echo mv "$f" "${f/\/*myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
If you remove the echo, it'll actually rename the files.
This uses shell parameter expansion to replace a slash and anything up to myfile with just a slash and myfile.
Notice that this breaks if there is more than one level of subdirectories. In that case, you could use extended pattern matching (enabled with shopt -s extglob) and the globstar shell option (shopt -s globstar):
$ for f in **/*.txt; do echo mv "$f" "${f/\/*([!\/])myfile/\/myfile}"; done
mv subdir1/001myfile001A.txt subdir1/myfile001A.txt
mv subdir1/002myfile002A.txt subdir1/myfile002A.txt
mv subdir1/subdir3/001myfile001A.txt subdir1/subdir3/myfile001A.txt
mv subdir1/subdir3/002myfile002A.txt subdir1/subdir3/myfile002A.txt
mv subdir2/001myfile001B.txt subdir2/myfile001B.txt
mv subdir2/002myfile002B.txt subdir2/myfile002B.txt
This uses the *([!\/]) pattern ("zero or more characters that are not a forward slash"). The slash has to be escaped in the bracket expression because we're still inside of the pattern part of the ${parameter/pattern/string} expansion.
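Put together, a sketch of the deep-tree variant with both shell options enabled up front (assuming a bash new enough to support them):
shopt -s extglob globstar
for f in **/*.txt; do
    echo mv "$f" "${f/\/*([!\/])myfile/\/myfile}"
done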
Maybe you want to use the following command instead:
rename 's#(.*/).*(myfile.*)#$1$2#' subdir*/*
You can use rename -n ... to check the outcome without actually renaming anything.
Regarding your actual question:
The find command from the outer loop returns 3 (!) directories:
.
./subdir1
./subdir2
The unwanted . is the reason why all files end up in the parent directory (that is .). You can exclude . by using the option -mindepth 1.
Unfortunately, this was only the reason for the files landing in the wrong place, not the only problem. Since you have already accepted one of the answers, there is no need to list them all here.
a slight modification should fix your problem:
#!/bin/bash
for f in `find . -maxdepth 2 -name "*.txt"`; do
mv "$f" "$(echo "$f" | sed -r 's,[^/]+(myfile),\1,')"
done
note: this sed uses , instead of / as the delimiter.
however, there are much faster ways.
here is with the rename utility, available or easily installed wherever there is bash and perl:
find . -maxdepth 2 -name "*.txt" | rename 's,[^/]+(myfile),/$1,'
here are tests on 1000 files:
for `find`; do mv 9.176s
rename 0.099s
that's 100x as fast.
John Bollinger's accepted answer is twice as fast as the OP's, but 50x as slow as this rename solution:
for|for|mv "$f" "${f//}" 4.316s
also, answers that expand a glob into the argument list of an external command (such as find * or rename ... subdir*/*) can fail on a directory with too many entries for the argument-length limit. answers that begin with find ., on the other hand, will also work on directories with any number of items.

Efficiently rereading (or otherwise reusing) content from an input file in bash

I would like to apply a list of commands to the files in a directory, target_dir, using the following code.
for t_file in $(find 'target_dir' -maxdepth 1 -type f);
do
exec {fd}<'command_list.txt'
while read -u "${fd}" eval_command
do
eval "${eval_command}"
done
exec {fd}>&-
done
An example of command_list.txt is
# command_list.txt
cat "${t_file}"
The program reopens command_list.txt for every file, but I expect it would be more efficient if I could move the file pointer back to the first line of the file without needing to close and reopen it between iterations.
exec {fd}<'command_list.txt'
for t_file in $(find 'target_dir' -maxdepth 1 -type f);
do
(move cursor of read to the first line of 'command_list.txt')
while read -u "${fd}" eval_command
do
eval "${eval_command}"
done
done
exec {fd}>&-
Is seeking a file pointer back to the beginning of a file without reopening it possible in bash?
To answer the literal question: You can seek a file descriptor in bash only with a loadable module (plugin) that adds a new builtin to the shell, or by spinning up an external program (inheriting your file descriptors) and asking it to do the seek operation (an approach given in this answer). However, the cost of spinning up an external program is larger than the cost of just closing and reopening your file, so that's not an approach that really makes sense in this case.
If you want to store your command list, just do that -- store it as an array. If by "moving the cursor" you're referring to seeking within the input file, bash doesn't provide a seek primitive in the default set of builtins -- but there's no particular need for one anyhow.
# read command_list.txt only once, storing its contents in an array
readarray -t eval_commands < command_list.txt
while IFS= read -r -d '' t_file <&3; do # iterate over filenames from find
for eval_command in "${eval_commands[@]}"; do # iterate over your array
eval "$eval_command"
done
done 3< <(find target_dir -maxdepth 1 -type f -print0)
By the way -- if you were going to pass your filename as an argument to the command rather than substituting it in, you'd want to do so as follows:
# put the filename expansion in *single quotes*
eval "$eval_command "'"$t_file"'
...or as follows:
# generate a safe version of the filename
printf -v t_file_q '%q' "$t_file"
# ...and use that safe version instead of the original
eval "$eval_command $t_file_q"
If one ran eval "$eval_command $t_file" -- not following either of these precautions -- a file created with touch $'target_dir/hello world $(rm -rf $HOME)' would be very bad news.
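For instance, a small hypothetical illustration of the printf %q approach (the command string and filename below are made up for the example):
eval_command='echo processing'        # pretend this line came from command_list.txt
t_file='target_dir/hello world.txt'   # a filename containing a space
printf -v t_file_q '%q' "$t_file"
eval "$eval_command $t_file_q"        # safely runs: echo processing target_dir/hello\ world.txt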
find 'target_dir' -maxdepth 1 -type f -exec cat {} \;
That would cat every file in the target directory 1 folder deep.

Bash: How to control iteration flow/loops?

For going over some recovered data, I am working on a script that recursively goes through folders & files and finally runs file on them, to check if they are likely fully recovered from a certain backup or not. (recovered files play, and are identified as mp3 or other audio, non-working files as ASCII-Text)
For now I would just be satisfied with having it go over my test folder structure, print all folders & corresponding files. (printing them mainly for testing, but also because I would like to log where the script currently is and how far along it is in the end, to verify what has been processed)
I tried using two for loops, one for the folders and then one for the files, so that ideally it would take one folder, list the files in there (or potentially delve into subfolders), give below each folder only the files in that subfolder, and then move on to the next.
Such as:
Folder1
- File 1
- File 2
-- Subfolder
-- File3
-- File4
Folder2
- File5
However this doesn't seem to work in the ways (such as with for loops) that are normally proposed. I got as far as using "find . -type d" for the directories and "find . -type f" or "find * -type f" (so that it doesn't go into subdirectories). However, when just printing the paths/files in order to check if it ran as I wanted it to, it became obvious that that didn't work.
It always seemed to first print all the directories (first loop) and then all the files (second loop). For keeping track of what it is doing and for making it easier to know what was checked/recovered I would like to do this in a more orderly fashion as explained above.
So is it that I just did something wrong, or is this maybe a general limitation of the for loop in bash?
Another problem that could be related: Although assigning the output of find to an array seemed to work, it wasn't accessible as an array ...
Example for loop:
for folder in '$(find . -type d)' ; do
echo $folder
let foldercounter++
done
Arrays:
folders=("$(find . -type d)")
#As far as I know this should assign the output as an array
#However, it is not really assigned properly somehow as
echo "$folders[1]"
# does not work (quotes necessary for spaces)
A find ... -exec ... solution @H.-Dirk Schmitt was referring to might look something like:
find . -type f -exec sh -c '
case $(file "$1") in
*Audio file*)
echo "$1 is an audio file"
;;
*ASCII text*)
echo "$1 is an ascii text file"
;;
esac
' _ {} ';'
If you want to run file on every file and directory in the current directory, including its subdirectories and so on, you don't need to use a Bash for-loop, because you can just tell find to run file:
find -exec file '{}' ';'
(The -exec ... ';' option runs the command ... on every matched file or directory, replacing the argument {} with the path to the file.)
If you only want to run file on regular files (not directories), you can specify -type f:
find -type f -exec file '{}' ';'
If you (say) want to just print the names of directories, but run the above on regular files, you can use the -or operator to connect one directive that uses -type d and one that uses -type f:
find -type d -print -or -type f -exec file '{}' ';'
Edited to add: If desired, the effect of the above commands can be achieved in pure Bash (plus the file command, of course), by writing a recursive shell function. For example:
function foo () {
local file
for file in "$1"/* ; do
if [[ -d "$file" ]] ; then
echo "$file"
foo "$file"
else
file "$file"
fi
done
}
foo .
This differs from the find command in that it will sort the files more consistently, and perhaps in gritty details such as handling of dot-files and symbolic links, but is broadly the same, so may be used as a starting-point for further adjustments.

Bash scripting, loop through files in folder fails

I'm looping through certain files (all files starting with MOVIE) in a folder with this bash script code:
for i in MY-FOLDER/MOVIE*
do
which works fine when there are files in the folder. But when there aren't any, it somehow goes on with one file which it thinks is named MY-FOLDER/MOVIE*.
How can I stop it from entering the loop body after
do
when there aren't any matching files in the folder?
With the nullglob option.
$ shopt -s nullglob
$ for i in zzz* ; do echo "$i" ; done
$
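If you only want nullglob to affect that one loop, you can switch it back off afterwards (or enable it inside a subshell); a small sketch:
shopt -s nullglob
for i in MY-FOLDER/MOVIE*; do
    echo "$i"
done
shopt -u nullglob   # restore the default afterwards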
for i in $(find MY-FOLDER -name 'MOVIE*' -type f); do
echo "$i"
done
The find utility is one of the Swiss Army knives of linux. It starts at the directory you give it and finds all files in all subdirectories, according to the options you give it.
-type f will find only regular files (not directories).
As I wrote it, the command will find files in subdirectories as well; you can prevent that by adding -maxdepth 1
Edit, 8 years later (thanks for the comment, @tadman!)
You can avoid the loop altogether with
find . -type f -exec echo "{}" \;
This tells find to echo the name of each file by substituting its name for {}. The escaped semicolon is necessary to terminate the command that's passed to -exec.
for file in MY-FOLDER/MOVIE*
do
# Skip if not a file
test -f "$file" || continue
# Now you know it's a file.
...
done
