Bash: looping through files in a directory

I have a bash script, created by someone else, that I need to modify a little.
Since I'm new to Bash, I may need a little help with some common commands.
The script simply loops through a directory (recursively) for a specific file extension.
Here's the current script: (runme.sh)
#! /bin/bash
SRC=/docs/companies/
function report()
{
echo "-----------------------"
find $SRC -iname "*.aws" -type f -print
echo -e "\033[1mSOURCE FILES=\033[0m" `find $SRC -iname "*.aws" -type f -print |wc -l`
echo "-----------------------"
exit 0
}
report
I simply type ./runme.sh and I can see a list of all files with the .aws extension.
My primary goal is to limit the search. (some directories have way too many files)
I would like to run the script, limiting it to just 20 files.
Do I need to place the entire script into a loop method?

That's easy -- as long as you want the first 20 files, just pipe the first find command through head -n 20. But I can't resist a little cleanup while I'm at it. First, as written, it runs find twice, once to print the filenames and once to count them; if there are a lot of files to search, this is a waste of time. Second, wrapping the actual content of the script in a function (report) doesn't make much sense, and having the function exit (rather than return) makes even less. Finally, I like to protect filenames with double-quotes and hate backquotes (use $() instead). So I took the liberty of a bit of cleanup:
#! /bin/bash
SRC=/docs/companies/
files="$(find "$SRC" -iname "*.aws" -type f -print)"
if [ -n "$files" ]; then
count="$(echo "$files" | wc -l)"
else # echo would print one line even if there are no files, so special-case the empty list
count=0
fi
echo "-----------------------"
echo "$files" | head -n 20
echo -e "\033[1mSOURCE FILES=\033[0m $count"
echo "-----------------------"

Use head -n 20 (as proposed by Peter). Additional remark: the script is very inefficient, as it runs find twice. You should consider using tee to generate a temporary file the first time the command runs, count the lines of that file afterwards, and then delete the file.
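A minimal sketch of that single-pass idea (I write straight to a temp file rather than through tee, because head exiting after 20 lines can kill tee before it finishes writing; the use of mktemp is my choice, not from the original):
#! /bin/bash
SRC=/docs/companies/
tmp="$(mktemp)"                     # scratch file for the full result list
find "$SRC" -iname "*.aws" -type f -print > "$tmp"
echo "-----------------------"
head -n 20 "$tmp"                   # show at most the first 20 files
echo -e "\033[1mSOURCE FILES=\033[0m $(wc -l < "$tmp")"
echo "-----------------------"
rm -f "$tmp"                        # clean up the scratch file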

I would personally prefer to do it like this:
files=0
while read -r file; do
    files=$((files + 1))
    echo "$file"
done < <(find "$SRC" -iname "*.aws" -type f -print | head -n 20)
echo "-----------------------"
echo -e "\033[1mSOURCE FILES=\033[0m" $files
echo "-----------------------"
If you just want the count of the (at most 20) files, you could simply use find "$SRC" -iname "*.aws" -type f -print | head -n 20 | wc -l on its own.


Issues renaming files using bash script with input from .txt file with find -exec rename command

Update 01/12/2022
With triplee's helpful suggestions, I resolved it to handle both files and directories by using -type d,f (a comma between d and f); the final code now looks like this:
while read -r old new; do
    echo "replacing ${old} by ${new}" >&2
    find '/path/to/dir' -depth -type d,f -name "$old" -exec rename "s/${old}/${new}/" {} ';'
done <input.txt
Thank you!
Original request:
I am trying to rename a list of files (from $old to $new), all present in $homedir or in subdirectories in $homedir.
In the command line this line works to rename files in the subfolders:
find ${homedir}/ -name ${old} -exec rename "s/${old}/${new}/" */${old} ';'
However, when I want to implement this line in a simple bash script getting the $old and $new filenames from input.txt, it doesn't work anymore...
input.txt looks like this:
name_old name_new
name_old2 name_new2
etc...
the script looks like this:
#!/bin/bash
homedir='/path/to/dir'
cat input.txt | while read old new;
do
echo 'replacing' ${old} 'by' ${new}
find ${homedir}/ -name ${old} -exec rename "s/${old}/${new}/" */${old} ';'
done
After running the script, the echo line with the $old and $new filenames is printed for the entire loop, but no files are renamed. No error is printed either. What am I missing? Your help would be greatly appreciated!
I checked whether the $old and $new variables were correctly passed to the find -exec rename command, but since they are printed correctly by echo, that doesn't seem to be the issue.
If you add an echo, like -exec echo rename ..., you'll see what actually gets executed. I'd say the path to $old is wrong (you're not using the result of find in the -exec clause), and */$old isn't quoted, so it might be expanded by the shell before find ever gets to see it.
You're also leaving most other expansions unquoted, which can lead to all sorts of trouble.
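For instance, a sketch of that dry run, applied to the question's command as written (quoting issues and all):
# Prefixing rename with echo makes find print each command instead of running it:
find ${homedir}/ -name ${old} -exec echo rename "s/${old}/${new}/" */${old} ';'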
You could do it in pure Bash (drop echo when output looks good):
shopt -s globstar
for f in **/"$old"; do echo mv "$f" "${f/%*/$new}"; done
Or with rename directly, though this would run into trouble if too many files match (drop -n when output looks good):
rename -n "s/$old\$/$new/" **/"$old"
Or with GNU find, using -execdir to run in the same directory as the matching file (drop echo when output looks good):
find -type f -name "$old" -execdir echo mv "$old" "$new" \;
And finally, a version with find that spawns just a single subshell (drop echo when output looks right):
find -type f -name "$old" -exec bash -c '
new=$1
shift
for f; do
echo mv "$f" "${f/%*/$new}"
done
' bash "$new" {} +
The argument to rename should be the file itself, not */${old}. You also have a number of quoting errors, and a useless cat.
#!/bin/bash
while read -r old new;
do
echo "replacing ${old} by ${new}" >&2
find /path/to/dir -name "$old" -exec rename "s/${old}/${new}/" {} ';'
done <input.txt
Running find multiple times on the same directory is hugely inefficient, though. Probably a better solution is to find all the files in one go, and skip any whose names are not on the list.
find /path/to/dir -type f -exec sh -c '
    for f in "$@"; do
        awk -v f="$f" "f==\$1 { print \"s/\" \$1 \"/\" \$2 \"/\" }" "$0" |
        xargs -I _ -r rename _ "$f"
    done' input.txt {} +
(Untested; probably try with echo before you run this live.)

How to copy files in Bash that have more than 1 line

I am trying to copy files from one directory (defined as $inDir below) to another (defined as $outDir below) if they 1) exist and 2) have more than 1 line in the file (this is to avoid copying files that are empty text files). I am able to do the first part using the code below but am struggling to work out how to do the second. I'm guessing maybe awk and NR could do it somehow, but I'm not very good with coding in Bash, so any help would be appreciated. I'd like this to be incorporated into the code below if possible, so that it can be done in one step.
for i in $inDir/NVDI_500m_mean_distance_*_40PCs; do
batch_name_dir=$i;
batch_name=$(basename $i);
if [ ! -f $outDir/${batch_name}.plink.gz ]; then
echo 'Copying' $batch_name;
find $batch_name_dir -name ${batch_name}.plink.gz -exec cp {} $outDir/${batch_name}.plink.gz \;
else
echo $batch_name 'already exists'
fi
done
You can use wc -l to check how many lines are in a file and awk to strip only the number from the result.
lines=$(wc -l "$YOUR_FILE_NAME" | awk '{print $1}')
if [ "$lines" -gt 0 ]; then
    cp "$YOUR_FILE_NAME" "$outDir"/  # copy the file
fi
Edit: I have corrected LINES to lines according to the comments below.
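To fold that check into the question's loop, a minimal sketch might look like this (assuming each .plink.gz sits directly under its batch directory, and using zcat to count uncompressed lines since the files are gzipped; both assumptions are mine, not from the original):
for i in "$inDir"/NVDI_500m_mean_distance_*_40PCs; do
    batch_name=$(basename "$i")
    src="$i/${batch_name}.plink.gz"
    # Copy only if the file exists and has more than one (uncompressed) line.
    if [ -f "$src" ] && [ "$(zcat "$src" | wc -l)" -gt 1 ]; then
        cp "$src" "$outDir/${batch_name}.plink.gz"
    fi
done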
I propose this:
for f in "$(find $indir -type f -name 'NVDI_500m_mean_distance_*_40PC' -not -empty)";
do
cp "$f" /some/targetdir;
done
find's -not -empty test checks the file size directly, so it is faster than running wc on every file (note that it only excludes zero-byte files; it does not count lines).
I consider it more readable than the other solution, subjectively.
However, the loop is not necessary at all, since:
find "$indir" -type f -name 'NVDI_500m_mean_distance_*_40PC' -not -empty |\
xargs -I % cp % /some/targetdir/%
Always "quote" path strings, since most shell utils break when there are unescaped shell chars or white spaces in the string. There are rarely good reasons to use unquoted strings.

count number of lines for each file found

I think I don't understand very well how the find command in Unix works; I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.
find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
echo "$nb_ligne_fichier_R"
done
output:
43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R
90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
45 .//system d exploi/r-repos/gbm/R/checks.R
178 total: File name too long
Can I just save the total number of lines in my variable? Here in my example, that would be just 178, and I'd like that for each folder in "$d_path".
Many thanks
Maybe I'm missing something, but wouldn't this do what you want?
wc -l R/*.[Rr]
Solution:
find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
echo "$nb_fichier_R" #here is fine
find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
wc -l $fille #here is the problem nothing shown
done
done
Explanation:
Adding -print0 meant the first find produced no newlines, so you had to tell read (with -d '') not to look for one. Your subsequent finds output newlines, so you can use read without a delimiter. I removed -print0 and -d '' from all calls so it is consistent and idiomatic. Newlines are good in the Unix world.
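For reference, if you ever do need the NUL-safe pairing (e.g. for names containing newlines), the two options go together like this (a minimal sketch):
find "$d_path" -maxdepth 1 -type d -name R -print0 |
while IFS= read -r -d '' dir; do     # -d '' matches find's -print0
    printf 'found: %s\n' "$dir"
done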
For the command:
find "$d_path" -type d -maxdepth 1 -name R -print0
there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:
The number of files matching *.R
For each such file, the number of lines in it.
Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.
counter.sh
shopt -s nullglob
for dir in "$@"
do
    count=0
    for file in "$dir"/*.R; do ((count++)); done
    echo "$count"
    wc -l "$dir"/*.R </dev/null
done
The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
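A quick illustration of the difference, run in a directory with no .R files:
shopt -u nullglob; echo *.R    # prints the literal string: *.R
shopt -s nullglob; echo *.R    # prints an empty line: the glob expands to nothing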
On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:
for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done
or use find (rather carefully):
find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +
Using counter.sh
find "$d_path" -type d -maxdepth 1 -name R -exec sh ./counter.sh {} +
This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The sh ./counter.sh part of the find command assumes that the counter.sh script is in the current directory. If it can be found on $PATH, then you can drop the sh and the ./.
Discussion
The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and xargs -0, yet gives you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need, but you could write one as a shell script, then do so and use it. If you might need to do the job more than once, you can keep the script; if you're sure you won't, you can delete it after you're done. It is generally much easier to handle files with awkward names this way than it is to fiddle with $IFS.
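As a side-by-side sketch of the two equivalent techniques just mentioned:
# Batch file names directly onto wc's command line:
find . -type f -name '*.R' -exec wc -l {} +
# Or pass NUL-separated names through a pipe; same result, same safety:
find . -type f -name '*.R' -print0 | xargs -0 wc -l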
Consider this solution:
# If `"$dir"/*.R` doesn't match anything, yield nothing instead of giving the pattern.
shopt -s nullglob
# Allows matching both `*.r` and `*.R` in one expression. Using them separately would
# give double results.
shopt -s nocaseglob
while IFS= read -ru 4 -d '' dir; do
    files=("$dir"/*.R)
    echo "${#files[@]}"
    for file in "${files[@]}"; do
        wc -l "$file"
    done
# Use process substitution to prevent going to a subshell. This may not be
# necessary for now but it could be useful to future modifications.
# Let's also use a custom fd to keep troubles isolated.
# It works with `-u 4`.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)
Another form is to use readarray, which reads all found directories at once. The only caveat is that it can only read normal newline-terminated paths.
shopt -s nullglob
shopt -s nocaseglob
readarray -t dirs < <(exec find "$d_path" -type d -maxdepth 1 -name R)
for dir in "${dirs[@]}"; do
    files=("$dir"/*.R)
    echo "${#files[@]}"
    for file in "${files[@]}"; do
        wc -l "$file"
    done
done

Why is part of my one-liner not being executed

I am trying to write a one-liner to find the number of files in each home directory. I am trying to do this because the other day I had a situation where I ran out of inodes on /home. It took me a long time to find the offender, and I want to shorten this process. This is what I have, but it is not working.
for i in /home/*; do if [ -d "$i" ]; then cd $i find . -xdev -maxdepth 100 -type f |wc -l; fi done
When I run it, it prints a 0 for each home directory, and I remain in root's home directory.
However when I run just this part:
for i in /home/*; do if [ -d "$i" ]; then cd $i; fi done
I wind up in the last home directory leading me to believe I traversed them all.
And when I run this in each users home directory:
find . -xdev -maxdepth 100 -type f |wc -l
I get a legit answer.
You're missing a terminating semicolon (or newline) after your cd, so find and its arguments are passed to cd instead of being run. But more importantly, using cd can cause unwanted errors if you're not careful, so try the below instead (cd not needed).
for i in /home/*; do [ -d "$i" ] && echo "$i" && find "$i" -xdev -maxdepth 100 -type f | wc -l; done
Since find can take multiple paths, you don't need a loop:
find /home/*/ -xdev -maxdepth 100 -type f | wc -l
To avoid any issues with filenames containing newlines (rare, yes), you can take advantage of an additional GNU extension to find (you're using -maxdepth, so I assume you can use -printf as well):
find /home/*/ -xdev -maxdepth 100 -type f -printf "." | wc -c
Since you aren't actually using the name of the file for counting, replace it with a single-character string, then count the length of the resulting string.
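Since the original goal was to find which home directory holds the most files, a per-directory variant might look like this (a sketch combining the same -printf trick with a loop; the sort is my addition):
for d in /home/*/; do
    printf '%s\t%s\n' "$(find "$d" -xdev -type f -printf '.' | wc -c)" "$d"
done | sort -rn     # biggest consumers first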

Suppress output to stdout when piping echo

I'm making a bash script that crawls through a directory and outputs all files of a certain type into a text file. I've got that working; it just also writes a bunch of output to the console that I don't want (the names of the files).
Here's the relevant code so far (tmpFile is the file I'm writing to):
for DIR in `find . -type d` # Find problem directories
do
for FILE in `ls "$DIR"` # Loop through problems in directory
do
if [[ `echo ${FILE} | grep -e prob[0-9]*_` ]]; then
`echo ${FILE} >> ${tmpFile}`
fi
done
done
The files I'm putting into the text file are in the format described by the regex prob[0-9]*_ (something like prob12345_01)
Where I pipe the output from echo ${FILE} into grep, it still outputs to stdout, something I want to avoid. I think it's a simple fix, but it's escaping me.
All this can be done in one single find command. Consider this:
find . -type f -name "prob[0-9]*_*" -exec echo {} >> ${tmpFile} \;
EDIT:
Even simpler (thanks to @GlennJackman):
find . -type f -name "prob[0-9]*_*" >> $tmpFile
To answer your specific question, you can pass -q to grep for silent output.
if echo "hello" | grep -q el; then
echo "found"
fi
But since you're already using find, this can be done with just one command:
find . -regex ".*prob[0-9]*_.*" -printf '%f\n' >> ${tmpFile}
find's regex is a match on the whole path, which is why the leading and trailing .* is needed.
The -printf '%f\n' prints the file name without directory, to match what your script is doing.
What you want to do is read the output of the find command; for every entry find returned, get all (*) the files under that location, then check whether each filename matches the pattern you want, and if it matches, add it to the tmpfile:
while read -r dir; do
    for file in "$dir"/*; do # will not match hidden files, unless dotglob is set
        if [[ "$file" =~ prob[0-9]*_ ]]; then
            echo "$file" >> "$tmpfile"
        fi
    done
done < <(find . -type d)
However, find can do all of that alone; anubhava got there first ;) so see his answer for how that's done.
