How to recursively traverse a directory tree and find only files? - bash

I am working on a scp call to download a folder present on a remote system. Downloaded folder has subfolders and within these subfolders there are a bunch of files which I want to pass as arguments to a python script like this:
scp -r researcher@192.168.150.4:SomeName/SomeNameElse/$folder_name/ $folder_name/
echo "File downloaded successfully"
echo "Running BD scanner"
for d in $folder_name/*; do
if [[ -d $d ]]; then
echo "It is a directory"
elif [[ -f $d ]]; then
echo "It is a file"
echo "Running the scanner :"
python bd_scanner_new.py /home/nsadmin/Some/bash_script_run_files/$d
else
echo "$d is invalid file"
exit 1
fi
done
I have added logic to detect directories and exclude them. However, the script does not descend into those directories recursively.
Partial results below:
File downloaded successfully
Running BD scanner
It is a directory
It is a directory
It is a directory
Exiting
I want to improve this code so that it traverses all directories and picks up all files. Please help me with any suggestions.

You can use shopt -s globstar in Bash 4.0+:
#!/bin/bash
shopt -s globstar nullglob
cd _your_base_dir
for file in **/*; do
# ** also matches directories, so skip anything that is not a regular file
[[ -f $file ]] || continue
# files with white spaces or other special characters are gracefully handled
python bd_scanner_new.py "$file"
done
Bash manual says this about globstar:
If set, the pattern ‘**’ used in a filename expansion context will
match all files and zero or more directories and subdirectories. If
the pattern is followed by a ‘/’, only directories and subdirectories
match.
More globstar discussion here: https://unix.stackexchange.com/questions/117826/bash-globstar-matching

Why go through the trouble of globbing for file matching when find, which is meant for this, can be combined with process substitution (< <(...)) and a while loop?
#!/bin/bash
while IFS= read -r -d '' file; do
# single filename is in $file
python bd_scanner_new.py "$file"
done < <(find "$folder_name" -type f -print0)
Here, find searches recursively for all files under the given path, at any depth of sub-directories. Filenames can contain blanks, tabs, and even newlines. To process them safely, find is run with -print0: each filename is printed verbatim, control characters included, and terminated with a NUL byte, which read -d '' then uses as its delimiter.
Note: always double-quote variable expansions in bash to prevent word splitting and globbing by the shell.
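As a rough sketch of the same find -print0 technique, the loop below collects the paths into an array first, so the list can be counted or reused before the scanner runs. The function name collect_files, the array name found, and the throwaway demo tree are assumptions made for illustration; they are not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the global array "found" with every regular
# file below the directory given as $1, safely for any filename.
collect_files() {
    found=()
    local path
    while IFS= read -r -d '' path; do
        found+=("$path")
    done < <(find "$1" -type f -print0)
}

# Demo on a throwaway tree containing awkward names (space, newline).
demo=$(mktemp -d)
mkdir -p "$demo/sub dir/deeper"
touch "$demo/top.txt" "$demo/sub dir/with space.txt" \
      "$demo/sub dir/deeper/new"$'\n'"line.txt"

collect_files "$demo"
echo "Found ${#found[@]} files"
for f in "${found[@]}"; do
    # %q prints the name in a shell-quoted form so odd characters are visible
    printf 'would scan: %q\n' "$f"
done
rm -rf "$demo"
```

In the real script the printf line would presumably be replaced by the python bd_scanner_new.py "$f" call.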

Related

How i get the file name with certain extension in bash [duplicate]

I'm trying to uncompress ROM (ISO) files, which usually come as zip or 7z archives, and once I have the ISO file I would like to compress it again to CHD (a format the emulator can read). I thought I could use find to look up the files. When I just run the find command, the files are displayed properly (one per line), but when I try to process each file name, it gets split on spaces (yes, these file names contain spaces) instead of being kept as the full name. It is worth mentioning that each ISO file sits inside a subdirectory with the same name as the file itself (without the .iso extension, obviously). This is what I'm trying:
#!/bin/bash
dir="/home/creeper/Downloads/"
dest="/home/creeper/Documents/"
for i in $(find $dir -name '*.7z' -or -name '*.zip' -or -name '*.iso');
do
if [[ $i == *7z ]]
then
7z x $i
rm -fr $i
fi
if [[ $i == *zip ]]
then
unzip $i
rm -fr $i
fi
if [[ $i == *iso ]]
then
chd_file="${i%.*}.chd"
chdman createcd -i $i -o $chd_file;
mv -v $chd_file $dest
rm -fr $i
fi
done;
when I try to process each file name, it gets split on spaces (yes, these file names contain spaces) instead of being kept as the full name
That's because for performs word splitting (among other expansions) when its list comes from a command substitution. See "Don't Read Lines with For" in the bash wiki for details.
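The difference can be made visible with a small experiment. This is only a sketch: the helper names count_with_for and count_with_read and the sample file name are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that count how many loop iterations each style
# produces for the same directory.
count_with_for() {
    local n=0 i
    # Unquoted command substitution: find's output is split on whitespace.
    for i in $(find "$1" -type f); do
        n=$((n + 1))
    done
    echo "$n"
}

count_with_read() {
    local n=0 i
    # NUL-delimited read: one iteration per file, whatever the name contains.
    while IFS= read -r -d '' i; do
        n=$((n + 1))
    done < <(find "$1" -type f -print0)
    echo "$n"
}

demo=$(mktemp -d)
touch "$demo/Some Game (USA).iso"
echo "for loop iterations:   $(count_with_for "$demo")"
echo "while read iterations: $(count_with_read "$demo")"
rm -rf "$demo"
```

With a single file whose name contains spaces, the for version runs once per word while the while read version runs exactly once.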
One alternative is to use bash's extended globbing features instead of find:
#!/usr/bin/env bash
shopt -s extglob globstar
dir="/home/creeper/Downloads/"
for i in "$dir"/**/*.@(7z|zip|iso); do
# Remember to quote expansions of $i!
# ...
done
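To make the elided loop body a little more concrete, here is a hedged sketch that only prints what it would do for each match. The helper name plan_action and the demo files are assumptions; the actual 7z, unzip, and chdman calls from the question are deliberately left out so the sketch is safe to run.

```shell
#!/usr/bin/env bash
shopt -s extglob globstar nullglob

# Hypothetical helper: print the action the loop would take for one path.
# It only echoes; no archive tool is actually invoked.
plan_action() {
    case $1 in
        *.7z)  echo "extract with 7z: $1" ;;
        *.zip) echo "extract with unzip: $1" ;;
        *.iso) echo "convert to chd: ${1%.*}.chd" ;;
    esac
}

demo=$(mktemp -d)
mkdir -p "$demo/Some Game"
touch "$demo/Some Game/Some Game.iso" "$demo/archive one.zip" "$demo/notes.txt"

# The quoted "$demo" keeps the directory intact; the glob part stays unquoted.
for i in "$demo"/**/*.@(7z|zip|iso); do
    plan_action "$i"
done
rm -rf "$demo"
```

nullglob is added here (it is not in the answer above) so that the loop simply does not run when nothing matches.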

How to recursively find & replace whole files with bash?

I have hundreds of files that I need to recursively replace as the files are currently stored like so:
/2019/01/
file1.pdf
file2.pdf
/2019/02
file3.pdf
file4.pdf
etc
I then have all of the updated files in another directory like so:
/new-files
file1.pdf
file2.pdf
file3.pdf
file4.pdf
Could someone please tell me the best way of doing this with a bash script? I'd basically like to read the new-files directory and then replace any matching file names in the other folders.
Thanks in advance for any help!
Assuming that the 'new-files' directory and all the directory trees containing PDF files are under the current directory, try this Shellcheck-clean Bash code:
#! /bin/bash -p
find . -path ./new-files -prune -o -type f -name '*.pdf' -print0 \
| while IFS= read -r -d '' pdfpath; do
pdfname=${pdfpath##*/}
new_pdfpath=new-files/$pdfname
if [[ -f $new_pdfpath ]]; then
printf "Replace '%s' with '%s'\n" "$pdfpath" "$new_pdfpath" >&2
# cp -- "$new_pdfpath" "$pdfpath"
fi
done
The -path ./new-files -prune in the find command stops the 'new-files' directory from being searched.
The -o in the find command causes the next test and actions to be tried after checking for 'new-files'.
See BashFAQ/001 (How can I read a file (data stream, variable) line-by-line (and/or field-by-field)?) for an explanation of the use of the -print0 option to find and the while IFS= read -r -d '' .... In short, the code can handle arbitrary file paths, including ones with whitespace and newline characters in them.
See Removing part of a string (BashFAQ/100 (How do I do string manipulation in bash?)) for an explanation of ${pdfpath##*/}.
It's not clear to me if you want to copy or move the new file to replace the old file, or do something else. Run the code as it is to check if it is identifying the correct replacements to be done. If you are happy with it, uncomment the cp line, and modify it to do something different if that is what you want.
The -- in the cp command protects against arguments beginning with dash characters being interpreted as options. It's unnecessary in this case, but I always use it when arguments begin with variable (or other) expansions so the code will remain safe if it is used in other contexts.
I think this calls for a bash array.
#!/usr/bin/env bash
# Make an associative array
declare -A files=()
# Populate array as $files[file.pdf]="/path/to/file.pdf"
for f in 20*/*/*.pdf; do
files[${f##*/}]="$f"
done
# Step through files and replace
for f in new-files/*.pdf; do
if [[ ! -e "${files[${f##*/}]}" ]]; then
echo "ERROR: missing $f" >&2
continue
fi
mv -v "$f" "${files[${f##*/}]}"
done
Note that associative arrays require bash version 4 or above. If you're using the native bash on a Mac, this won't work as-is.
Note also that if you remove the continue from the final loop, mv will be attempted even for new files that have no counterpart in the dated directories; that is not safe, since no target is known.
If you wanted further protection you might use test -nt or friends to confirm that an update is happening in the right direction.
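One possible shape for such a check is sketched below. The function name is_update and the demo layout are assumptions for illustration; -nt is the standard "newer than" file test in bash.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only when the replacement ($1) exists and is
# newer than the file it would overwrite ($2).
is_update() {
    [[ -f $1 && -f $2 && $1 -nt $2 ]]
}

demo=$(mktemp -d)
mkdir -p "$demo/2019/01" "$demo/new-files"
touch -t 201901010000 "$demo/2019/01/file1.pdf"   # old copy, dated 2019
touch "$demo/new-files/file1.pdf"                 # fresh copy, dated now

if is_update "$demo/new-files/file1.pdf" "$demo/2019/01/file1.pdf"; then
    echo "safe to replace file1.pdf"
else
    echo "skipping file1.pdf: replacement is not newer" >&2
fi
rm -rf "$demo"
```

In the loop above, the mv line could be guarded with a call like this so that an older file never overwrites a newer one.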

Remove numbers at beginning of filenames in directory in bash

In an attempt to rename the files in one directory with numbers at the front, I made an error in my script so that this happened in the wrong directory. I now need to remove these numbers from the beginning of all of the filenames in that directory. The numbers range from 1 to 3 digits. Examples of the filenames I am working with are:
706terrain_Slope1000m_Minimum_all_25PCs_bolt_all_25PCs_qq_bolt.png
680met_sfcWind_all_25PCs_bolt_number.txt
460greenness_NDVI_500m_min_all_25PCs_bolt_number.txt
I was thinking of using mv but I'm not really sure how to do it with varying numbers of digits at the beginning, so any advice would be appreciated!
A simple way in bash is making use of a regular expression test:
for file in *; do
[[ -f "${file}" ]] && [[ "${file}" =~ (^[0-9]+) ]] && mv "${file}" "${file/${BASH_REMATCH[1]}}"
done
This does the following:
[[ -f "${file}" ]]: test if file is a file, if so
[[ "${file}" =~ (^[0-9]+) ]]: check if file starts with a number
${file/${BASH_REMATCH[1]}}: remove the number from the string file by using BASH_REMATCH, an array variable that holds the capture groups from the last regex match.
If you've got perl's rename installed, the following should work :
rename 's/^[0-9]{1,3}//' /path/to/files
/path/to/files can be a list of specific files, or probably in your case a glob (e.g. *.{png,txt}). You don't need to select only files starting with digits as rename won't modify those that do not.
Using bash parameter expansion:
shopt -s extglob
for i in +([0-9])*.{txt,png}; do
mv -- "$i" "${i##+([0-9])}"
done
This will remove starting digits (any number) in filenames having png and txt extension.
The ## is removing the longest matching prefix pattern.
The +(...) is extended glob (extglob) syntax that matches one or more occurrences of the enclosed pattern.
And [0-9] is pattern matching digits.
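The expansion can be tried on plain strings before any file is renamed. This is a small sketch; the function name strip_leading_digits is made up for the example, and the first three names come from the question.

```shell
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: print a name with any leading run of digits removed.
strip_leading_digits() {
    printf '%s\n' "${1##+([0-9])}"
}

for name in \
    706terrain_Slope1000m_Minimum_all_25PCs_bolt_all_25PCs_qq_bolt.png \
    680met_sfcWind_all_25PCs_bolt_number.txt \
    460greenness_NDVI_500m_min_all_25PCs_bolt_number.txt \
    no_digits_here.txt
do
    printf '%s -> %s\n' "$name" "$(strip_leading_digits "$name")"
done
```

Digits later in the name (such as 1000m or 25PCs) are untouched because ## only strips a prefix, and a name without leading digits comes back unchanged.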
Alternate method using GNU find:
#!/usr/bin/env bash
find ./ \
-maxdepth 1 \
-type f \
-name '[[:digit:]]*' \
-exec bash -c 'shopt -s extglob; f="${1##*/}"; d="${1%%/*}"; mv -- "$1" "${d}/${f##+([[:digit:]])}"' _ {} \;
Find all actual files in current directory whose name start with a digit.
For each found file, execute the Bash script below:
shopt -s extglob # need for extended pattern syntax
f="${1##*/}" # Get file name without directory path
d="${1%%/*}" # Get directory path without file name
mv -- "$1" "${d}/${f##+([[:digit:]])}" # Rename without the leading digits
Using basic features of a POSIX-compliant shell:
#!/bin/sh
for f in [[:digit:]]*; do
if [ -f "$f" ]; then
pf="${f%${f#???}}" pf="${pf##*[[:digit:]]}"
mv "$f" "$pf${f#???}"
fi
done

Shell Script to list files in a given directory and if they are files or directories

I am currently learning some bash scripting and am having an issue with a question that involves listing all files in a given directory and stating whether each is a file or a directory. The issue I am having is that I only get either my current directory or, if I specify a directory, the script just says that it is a directory. For example, /home/user/shell_scripts will return "shell_scripts is a directory" rather than the files contained within it.
This is what I have so far:
dir=$dir
for file in $dir; do
if [[ -d $file ]]; then
echo "$file is a directory"
if [[ -f $file ]]; then
echo "$file is a regular file"
fi
done
Your line:
for file in $dir; do
will expand $dir just to a single directory string. What you need to do is expand that to a list of files in the directory. You could do this using the following:
for file in "${dir}/"* ; do
This will expand "${dir}/"* into the list of entries in that directory, each prefixed with the directory path. As Biffen points out, because the list comes from a glob rather than from a command substitution, names containing whitespace will not end up split into partial file names in file.
If you want to recurse into the directories in dir then using find might be a better approach. Simply use:
for file in $( find ${dir} ); do
Note that while simple, this will not handle files or directories with spaces in their names. Because of this, I would be tempted to drop the loop and generate the output in one go. This might be slightly different from what you want, but it is likely to be easier to read and a lot more efficient, especially with large numbers of files. For example, to list all the directories:
find ${dir} -maxdepth 1 -type d
and to list the files:
find ${dir} -maxdepth 1 -type f
If you want to descend into the directories below, remove the -maxdepth 1.
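If the "is a directory" / "is a regular file" wording is still wanted, both find calls could be folded into one, as in this sketch. The function name list_entries is an assumption, and -mindepth is added (it is not in the answer above) so that the starting directory itself is not reported; like -maxdepth, it is supported by GNU and BSD find but is not part of POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: describe each direct child of directory $1 without a
# shell loop. printf repeats its format for every path that find passes in.
list_entries() {
    find "$1" -mindepth 1 -maxdepth 1 \
        \( -type d -exec printf '%s is a directory\n' {} + \) -o \
        \( -type f -exec printf '%s is a regular file\n' {} + \)
}

demo=$(mktemp -d)
mkdir "$demo/shell scripts"
touch "$demo/notes.txt"
list_entries "$demo"
rm -rf "$demo"
```

Because find hands each path to printf as a separate argument, names containing spaces stay intact.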
This is a good use for globbing:
for file in "$dir/"*
do
[[ -d "$file" ]] && echo "$file is a directory"
[[ -f "$file" ]] && echo "$file is a regular file"
done
This will work even if files in $dir have special characters in their names, such as spaces, asterisks and even newlines.
Also note that variables should be quoted ("$file"). But * must not be quoted. And I removed dir=$dir since it doesn't do anything (except break when $dir contains special characters).
ls -F ~ | \
sed 's#.*/$#/& is a Directory#;t quit;s#.*#/& is a File#;:quit;s/[*/=>@|] / /'
The -F "classify" switch appends a "/" if a file is a directory. The sed code prints the desired message, then removes the suffix.
for file in $(ls $dir)
do
[ -f "$dir/$file" ] && echo "$file is File"
[ -d "$dir/$file" ] && echo "$file is Directory"
done
or replace the
$(ls $dir)
with
`ls $`
If you want to list files that also start with . use:
for file in "${dir}/"* "${dir}/"/.[!.]* "${dir}/"/..?* ; do
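In bash, an alternative to spelling out the dot patterns is the dotglob option. The sketch below assumes bash and uses an invented function name, classify_entries; the shopt calls run in a subshell so the caller's settings are not changed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "<path> is a ..." line per entry in $1,
# hidden entries included.
classify_entries() {
    local entry
    (
        # dotglob makes * match names starting with a dot (never . or ..);
        # nullglob makes the loop run zero times in an empty directory.
        shopt -s dotglob nullglob
        for entry in "$1"/*; do
            if [[ -d $entry ]]; then
                echo "$entry is a directory"
            elif [[ -f $entry ]]; then
                echo "$entry is a regular file"
            fi
        done
    )
}

demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/.hidden" "$demo/visible.txt"
classify_entries "$demo"
rm -rf "$demo"
```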

Filenames with wildcards in variables

#!/bin/bash
outbound=/home/user/outbound/
putfile=DATA_FILE_PUT_*.CSV
cd $outbound
filecnt=0
for file in $putfile; do let filecnt=filecnt+1; done
echo "Filecount: " $filecnt
So this code works well when there are files located in the outbound directory. I can place files into the outbound path and as long as they match the putfile mask then the files are incremented as expected.
Where the problem comes in is if I run this while there are no files located in $outbound.
If there are zero files there $filecnt still returns a 1 but I'm looking to have it return a 0 if there are no files there.
Am I missing something simple?
Put set -x just below the #! line to watch what your script is doing.
If there is no matching file, then the wildcard is left unexpanded, and the loop runs once, with file having the value DATA_FILE_PUT_*.CSV.
To change that, set the nullglob option. Note that this only works in bash, not in sh.
shopt -s nullglob
putfile=DATA_FILE_PUT_*.CSV
for file in $putfile; do let filecnt=filecnt+1; done
Note that the putfile variable contains the wildcard pattern, not the list of file names. It might make more sense to put the list of matches in a variable instead. This needs to be an array variable, and you need to change the current directory first. The number of matching files is then the length of the array.
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
echo "Filecount: " ${#putfiles[@]}
If you need to iterate over the files, take care to protect the expansion of the array with double quotes, otherwise if a file name contains whitespace then it will be split over several words (and if a filename contains wildcard characters, they will be expanded).
#!/bin/bash
shopt -s nullglob
outbound=/home/user/outbound/
cd "$outbound"
putfiles=(DATA_FILE_PUT_*.CSV)
for file in "${putfiles[@]}"; do
echo "Processing $file"
done
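The counting logic can be wrapped in a function and checked against an empty directory. This is a sketch under assumptions: the name count_matches is invented, and the pattern is assumed to contain no whitespace because it is expanded unquoted on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how many names in directory $1 match pattern $2.
count_matches() {
    local dir=$1 pattern=$2 matches
    (
        # Subshell: neither the cd nor the shopt affects the caller.
        shopt -s nullglob
        cd "$dir" || exit 1
        # $pattern is deliberately unquoted so the shell expands the wildcard.
        matches=($pattern)
        echo "${#matches[@]}"
    )
}

demo=$(mktemp -d)
echo "Filecount before: $(count_matches "$demo" 'DATA_FILE_PUT_*.CSV')"
touch "$demo/DATA_FILE_PUT_1.CSV" "$demo/DATA_FILE_PUT_2.CSV" "$demo/OTHER.CSV"
echo "Filecount after:  $(count_matches "$demo" 'DATA_FILE_PUT_*.CSV')"
rm -rf "$demo"
```

With nullglob set, the empty directory yields a count of 0 instead of the 1 described in the question.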
You could test if file exists first
for file in $putfile; do
if [ -f "$file" ] ; then
let filecnt=filecnt+1
fi
done
Or look for your files with find
for file in $(find . -type f -name "$putfile"); do
let filecnt=filecnt+1
done
or simply
filecnt=$(find . -type f -name "$putfile" | wc -l); echo $filecnt
This is because when no matches are found, bash by default expands the wildcard DATA_FILE_PUT_*.CSV to the word DATA_FILE_PUT_*.CSV and therefore you end up with a count of 1.
To disable this behavior, use shopt -s nullglob
Not sure why you need a whole script here. The following one-liner should do the job.
ls ${outbound}/${putfile} | wc -l
Or
find ${outbound} -maxdepth 1 -type f -name "${putfile}" | wc -l

Resources