count number of lines for each file found - bash

I think I don't understand very well how the find command in Unix works; I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.
find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
echo "$nb_ligne_fichier_R"
done
output:
43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R
90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
45 .//system d exploi/r-repos/gbm/R/checks.R
178 total: File name too long
Can I just save the total number of lines in my variable? Here in my example that would be just 178, and the same for each folder found under "$d_path".
Many thanks!

Maybe I'm missing something, but wouldn't this do what you want?
wc -l R/*.[Rr]
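If all you need is the grand total in a variable, a minimal sketch (the variable name total_lines is just illustrative, and it assumes the R directory sits directly under "$d_path" as in the question) is to let a single wc count the concatenated files, which sidesteps the per-file output entirely:
# Sum the lines of every .R file directly below the R directory into one variable.
# A single wc on the concatenated files avoids wc's per-file lines and its "total" line.
total_lines=$(find "$d_path"/R -maxdepth 1 -type f -iname '*.R' -exec cat {} + | wc -l)
echo "$total_lines"   # e.g. 178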

Solution:
find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
echo "$nb_fichier_R" #here is fine
find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
wc -l $fille #here is the problem nothing shown
done
done
Explanation:
With -print0, the first find produced NUL-terminated output with no newlines, so you had to pass -d '' to read to tell it not to look for a newline. Your subsequent find calls output newlines, so you can use read without a custom delimiter. I removed -print0 and -d '' from all calls so the code is consistent and idiomatic. Newlines are good in the Unix world.
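To illustrate the pairing: NUL-delimited find output must be consumed with read -d '', while newline-delimited output is consumed with plain read. A minimal sketch of the NUL-safe variant, should you ever need it:
# -print0 emits NUL-terminated names; read -d '' consumes them.
# This is safe even for file names that contain newlines.
find "$d_path" -maxdepth 1 -type d -name R -print0 |
while IFS= read -r -d '' dir; do
    printf 'found: %s\n' "$dir"
done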

For the command:
find "$d_path" -type d -maxdepth 1 -name R -print0
there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:
The number of files matching *.R
For each such file, the number of lines in it.
Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.
counter.sh
#!/bin/bash
shopt -s nullglob
for dir in "$@"
do
    count=0
    for file in "$dir"/*.R; do ((count++)); done
    echo "$count"
    wc -l "$dir"/*.R </dev/null
done
The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
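A quick way to see what nullglob changes, assuming a hypothetical directory empty_dir that contains no .R files:
shopt -u nullglob
echo empty_dir/*.R    # prints the literal pattern: empty_dir/*.R
shopt -s nullglob
echo empty_dir/*.R    # prints an empty line: the glob expands to nothing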
On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:
for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done
or use find (rather carefully):
find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +
Using counter.sh
find "$d_path" -type d -maxdepth 1 -name R -exec sh ./counter.sh {} +
This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The sh ./counter.sh part of the find command assumes that the counter.sh script is in the current directory. If it can be found on $PATH, then you can drop the sh and the ./.
Discussion
The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and xargs -0, while giving you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need, but you could write one as a shell script, then write it and use it. If you might need to do the job more than once, keep the script; if you're sure you won't, delete it after you're done. It is generally much easier to handle files with awkward names this way than it is to fiddle with $IFS.

Consider this solution:
# If `"$dir"/*.R` doesn't match anything, yield nothing instead of giving the pattern.
shopt -s nullglob
# Allows matching both `*.r` and `*.R` in one expression. Using them separately would
# give double results.
shopt -s nocaseglob
while IFS= read -ru 4 -d '' dir; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
# Use process substitution to prevent going to a subshell. This may not be
# necessary for now but it could be useful to future modifications.
# Let's also use a custom fd to keep troubles isolated.
# It works with `-u 4`.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)
Another form is to use readarray, which reads all found directories at once. The only caveat is that it can only read normal newline-terminated paths.
shopt -s nullglob
shopt -s nocaseglob
readarray -t dirs < <(exec find "$d_path" -type d -maxdepth 1 -name R)
for dir in "${dirs[@]}"; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
done

Related

find command - get base name only - NOT with basename command / NOT with printf

Is there any way to get the basename in the command find?
What I don't need:
find /dir1 -type f -printf "%f\n"
find /dir1 -type f -exec basename {} \;
Why you may ask? Because I need to continue using the found file. I basically want something like this:
find . -type f -exec find /home -type l -name "*{}*" \;
And it uses ./file1, not file1, as the argument for -name.
Thanks for your help :)
If you've got Bash version 4.3 or later, try this Shellcheck-clean pure Bash code:
#! /bin/bash -p
shopt -s dotglob globstar nullglob
for path in ./**; do
[[ -L $path ]] && continue
[[ -f $path ]] || continue
base=${path##*/}
for path2 in /home/**/*"$base"*; do
[[ -L $path2 ]] && printf '%s\n' "$path2"
done
done
shopt -s ... enables some Bash settings that are required by the code:
dotglob enables globs to match files and directories whose names begin with a dot (.); find shows such files by default.
globstar enables the use of ** to match paths recursively through directory trees. globstar was introduced in Bash 4.0, but it was dangerous to use before Bash 4.3 (2014) because it followed symlinks when looking for matches.
nullglob makes globs expand to nothing when nothing matches (otherwise they expand to the glob pattern itself, which is almost never useful in programs).
See Removing part of a string (BashFAQ/100, "How do I do string manipulation in bash?") for an explanation of ${path##*/}. That always works, even in some rare cases where $(basename "$path") doesn't.
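For instance, with a made-up path containing a space:
path='./some dir/file.R'
echo "${path##*/}"    # -> file.R  (strips everything up to and including the last /)
echo "${path%/*}"     # -> ./some dir  (the complementary, dirname-like expansion)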
See the accepted, and excellent, answer to Why is printf better than echo? for an explanation of why I used printf instead of echo to output the found paths.
This solution works correctly if you've got files that contain pattern characters (?, *, [, ], \) in their names.
Spawn a shell and make the second call to find from there
find /dir1 -type f -exec sh -c '
for p; do
find /dir2 -type l -name "*${p##*/}*"
done' sh {} +
If your files may contain special characters in their names (like [, ?, etc.), you may want to escape them like this to avoid false positives
find /dir1 -type f -exec sh -c '
for p; do
esc=$(printf "%sx\n" "${p##*/}" | sed "s/[?*[\]/\\\&/g")
esc=${esc%x}
find /dir2 -type l -name "*$esc*"
done' sh {} +
You'll have to forward it to another evaluator. There is no way to do that in find.
find . -type f -printf '%f\0' |
xargs -r0I{} find /home -type l -name '*{}*'
This answers your question about merging the functionality of %f and -exec find, and is based on your example. However, your example injects raw filenames as -name patterns, so avoid that approach and look at the other solutions instead.
Simply spawn a bash shell:
find /dir1 -type f -exec bash -c '
base=$(basename "$1")
echo "$base"
do_something_else "$base"
' bash {} \;
$1 in the bash part is each file filtered by find.
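If you'd rather not fork one bash per file, the same pattern works with the + terminator, which hands the shell whole batches of files to loop over. A sketch along those lines (do_something_else stands in for whatever processing you need, as above):
find /dir1 -type f -exec bash -c '
    for p; do
        base=${p##*/}
        echo "$base"
        do_something_else "$base"
    done
' bash {} +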

How to get file count and names in directory on bash

I want to get the file count & file names & folder names in directory:
mkdir -p /tmp/test/folder1
mkdir -p /tmp/test/folder2
touch /tmp/test/file1
touch /tmp/test/file2
file_names=$(find "/tmp/test" -mindepth 1 -maxdepth 1 -type f -print0 | xargs -0 -I {} basename "{}")
echo $file_names
here is the output:
file2 file1
For folder:
folder_names=$(find "/tmp/test" -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -I {} basename "{}")
echo $folder_names
here is the output:
folder2 folder1
For count:
file_count=0 && $(find "/tmp/test" -mindepth 1 -maxdepth 1 -type f -print0 | let "file_count=file_count+1")
echo $file_count
folder_count=0 && $(find "/tmp/test" -mindepth 1 -maxdepth 1 -type d -print0 | let "folder_count=folder_count+1")
echo $folder_count
The file_count and folder_count do not work.
Question 1:
How to get the correct file_count and folder_count?
Question 2:
Is it possible for getting names into an array and check the count from array size?
The answer to the second question is really the answer to the first, too.
mapfile -d '' files < <( find /tmp/test -mindepth 1 -maxdepth 1 \
    -type f -printf '%f\0')
echo "${#files[@]} files"
printf '%s\n' "${files[@]}"
The use of double quotes and @ in the array expansion are essential for printing file names with whitespace correctly. The use of a null byte terminator between file names ensures that even newlines in file names are disambiguated.
Notice also the use of -printf with a specific format string to avoid having to run basename separately. However, the -printf option and its various format strings, as well as the -print0 option you used, are a GNU find extension, and thus not portable. (Linux typically ships with GNU tools; on other platforms, they are obviously easy to install, but typically not installed out of the box.)
If you have an older version of Bash which doesn't support mapfile, try an explicit loop:
files=()
while IFS= read -r -d $'\0' file; do
files+=("$file")
done < <(find ...)
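Filled in with the same find invocation as before, that loop would read:
files=()
while IFS= read -r -d $'\0' file; do
    files+=("$file")
done < <(find /tmp/test -mindepth 1 -maxdepth 1 -type f -printf '%f\0')
echo "${#files[@]} files"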
If you don't have GNU find, a common workaround is to print a fixed string for each found file, and then the line or character count reliably reflects the number of found files.
find /tmp/test -type f \
-mindepth 1 -maxdepth 1 \
-exec printf . \; |
wc -c
Though then, how do you collect the file names? If (as in your case) you don't require recursion into subdirectories, simply loop over all items in the directory.
In which case, again, the number of items in the collected array will also tell you how many there are.
files=()
dirs=()
for item in /tmp/test/*; do
if [[ -f "$item" ]]; then
files+=("$item")
elif [[ -d "$item" ]]; then
dirs+=("$item")
fi
done
echo "${#dirs[#] directories}
printf '- %s\n' "${dirs[#]}"
echo "${#files[#]} files"
printf '%s\n' "${dirs[#]}"
For a further discussion, see also https://mywiki.wooledge.org/BashFAQ/020
Needlessly collecting items into an array so you can loop over it once is a common beginner antipattern. If you just want to process each item in isolation and then forget it, there is no need to waste memory on remembering all the other items - just loop over them directly.
As an aside, running find in a subprocess will create a new shell instance with its own set of variables; thus in your attempt, the pipe to let would increment from 0 to 1 each time you ran it (though of course, piping to let also does not do anything useful anyway).
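You can see that subshell effect in isolation with a two-line experiment:
count=0
printf 'a\nb\nc\n' | while read -r _; do ((count++)); done
echo "$count"    # prints 0: the loop ran in a subshell, so its increments were lost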

Counting sum of lines in all .c and .h files

I am trying to write a shell script that will count the sum of all lines in every file in a directory (and its subdirectories) of format .c and .h.
I already have that code but I am not sure how to make it find both file formats.
!/bin/bash
#Program
total=0
find /path -type f -name "*.php" | while read FILE; do
count=$(grep -c ^ < "$FILE")
echo "$FILE has $count lines"
let total=total+count
done
echo TOTAL LINES COUNTED: $total
I am a newbie to shell/bash, so if anything else is wrong I would be grateful for help.
Optimized and fast find + GNU parallel solution:
find /path -type f -name "*.[ch]" -print0 | parallel -q0 -j0 --no-notice wc -l {} \
| awk '{ sum+=$1 }END{ print "TOTAL LINES COUNTED: "sum }'
-print0 - print the full file name on the standard output, followed by a null character (instead of the newline character that -print uses). This allows file names that contain newlines or other types of white space to be correctly interpreted by programs that process the find output.
With parallel, the command wc -l {} is executed for each file in parallel (that's what's known as parallel processing).
To find .c and .h files instead of .php,
simply change the value of the -name parameter to *.[ch].
There are a few other issues in the script:
It would be safer to read the filenames with IFS= read -r
The first line should be #!/bin/bash instead of !/bin/bash
Piping find into the while loop runs the loop in a subshell, so the final value of total is lost; feeding the loop from process substitution keeps it in the current shell
And some minor improvements are possible:
The summing logic can be written a bit more simply using the ((...)) syntax (arithmetic context)
It's not recommended to use uppercase variable names, as that convention is reserved for system variables
Putting it together:
#!/bin/bash
total=0
# Process substitution keeps the loop in the current shell,
# so the final value of $total survives the loop.
while IFS= read -r file; do
    count=$(grep -c ^ < "$file")
    echo "$file has $count lines"
    ((total += count))
done < <(find /path -type f -name "*.[ch]")
echo "TOTAL LINES COUNTED: $total"
Other answers recommend variations of find ... -exec wc -l.
Although they look more elegant,
they will not work exactly the same way as your script:
wc -l counts lines a bit differently from grep -c ^. In particular it doesn't count the last line of a file if it doesn't end with a newline. Try for example printf hello > file; wc -l file; grep -c ^ file -> you'll get 0 and 1.
Getting the line count in the individual files, and the total lines is not so simple. Using find ... -exec wc -l {} + comes quite close (if your implementation of find supports +), but again there will be corner cases that need special treatment. For example if there are too many files, then wc will be invoked multiple times, producing multiple sub-totals that would need to be reconciled.
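One hedged way around those sub-totals is to let awk do the summing itself and skip wc's own total lines (a sketch that assumes no file's path begins with a word that is literally "total"):
find /path -type f -name '*.[ch]' -exec wc -l {} + |
awk '$2 != "total" { sum += $1; print } END { print "TOTAL LINES COUNTED: " sum }'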
Try this:
cat $(find /path -type f \( -name '*.c' -o -name '*.h' \)) | wc -l
It will run cat on every file returned by find and pipe the combined output into wc. Beware that this relies on word splitting, so it breaks if any file name contains whitespace. If you need the value in a variable, just do this:
lines=$(cat ...)
echo counted $lines lines
Cat all files ending in .c or .h and pipe to grep -c:
find -type f -name '*.[ch]' -exec cat {} + | grep -c '^'
For a find without the + option, the alternative is
find -type f -name '*.[ch]' -exec cat {} \; | grep -c '^'
which calls cat once per file instead of as few times as possible, making it a bit slower.
If you know that you won't have a lot of files approaching the command line length limit, you could use just shell globbing:
shopt -s globstar # enable **/* glob
cat **/*.[ch] | grep -c '^'

How to log variable in bash

for i in *.txt;
do
xxd -l 3 $i >> log
done
I also want to log file names $i for each result. E.g.:
file_name
result_of_command
You probably just need to use printf:
for f in *.txt; do
printf "%s: %s\n" "$f" "$(xxd -l 3 "$f")"
done >> log
I'm not totally clear what you are asking, but is this what you want?
for i in *.txt;
do
echo "$i" >> log
xxd -l 3 "$i" >> log
done
It's better to use find with the -exec option to run a command for every file matching certain criteria. {} is replaced by the name of each found file, and \; (an escaped ;) terminates the command. You can use + instead to tell find to replace {} with as many filenames as possible at once.
find . -maxdepth 1 -type f -name '*.txt' -exec xxd -l 3 {} \; >> log
Note that the above example includes hidden files; you can exclude them using a regex:
find . -maxdepth 1 -type f \( ! -regex '.*/\..*' \) -name '*.txt' -exec xxd -l 3 {} \; >> log
Also, if you're going to glob files in the current directory and use them in commands, always use ./*; paths beginning with - are likely to be interpreted by your command as options.
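For example, with a hypothetical file named -e.txt in the current directory:
xxd -l 3 *.txt      # -e.txt may be parsed as an option to xxd
xxd -l 3 ./*.txt    # safe: the file arrives as ./-e.txt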

while loop stops after first iteration in BASH [duplicate]

This question already has answers here:
While loop stops reading after the first line in Bash
I wrote a script that is supposed to convert *.avi files to mp4 format.
However, the while loop stops after the first iteration.
#!/bin/bash
shopt -s lastpipe
cd <some_directory>
find . -name *.avi -type f |
while read -r avi
do
/usr/bin/HandBrakeCLI -i "${avi}" -o "${avi%.avi}.mp4" -f mp4 -m -O -e x264 -q 20 --vfr \
# &> /dev/null
if [ "$?" -eq 0 ]
then
echo "${avi} was converted successfully"
rm "${avi}"
else
echo "${avi} was not converted"
break
fi
done
This part is wrong: find . -name *.avi -type f
The shell is expanding the wildcard before find starts, so the find command looks like:
find . -name a.avi b.avi c.avi d.avi ... -type f
I'm surprised you didn't notice an error message, like "find: paths must precede expression: b.avi"
You need to protect the asterisk from the shell so find can do its own expansion. Pick one of:
find . -name \*.avi -type f
find . -name '*.avi' -type f
You don't mention if you're on a GNU system or not. Your while loop is at risk of being tripped up by filenames with leading or trailing whitespace. Try this:
find . -name \*.avi -type f -print0 | while IFS= read -rd '' avi; do ...
HandBrakeCLI could also be reading input, which would make your loop end after the first instance is called. Since you're using bash, you can use process substitution with input redirected from another file descriptor. In this example we use 4:
while read -ru 4 avi; do
...
done 4< <(exec find . -name *.avi -type f)
My preferred version, too, is to use readarray, which reads all found files at once. It's quite enough if you don't have to deal with irregular filenames that contain newlines:
readarray -t files < <(exec find . -name *.avi -type f)
for avi in "${files[@]}"; do
...
done
Another way perhaps is to redirect input of HandBrakeCLI to /dev/null:
</dev/null /usr/bin/HandBrakeCLI ...
Other suggestions:
Quote your -name pattern: '*.avi'
Use IFS= to prevent stripping of leading and trailing spaces: while IFS= read ...
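Putting the -print0 variant and those suggestions together, the corrected loop might look like this (keeping the HandBrakeCLI flags from the question):
find . -name '*.avi' -type f -print0 |
while IFS= read -rd '' avi; do
    # Redirect HandBrakeCLI's stdin so it cannot consume the loop's input.
    if </dev/null /usr/bin/HandBrakeCLI -i "$avi" -o "${avi%.avi}.mp4" \
            -f mp4 -m -O -e x264 -q 20 --vfr; then
        echo "$avi was converted successfully"
        rm "$avi"
    else
        echo "$avi was not converted"
        break
    fi
done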
