Count the number of paths found by find in a bash variable

I wanted to search for some files recursively and count the number of occurrences that I found.
To find the files I did:
file=$(find . -iname "*.xml")
Now I'd like to store the number of occurrences in another variable. I just don't know how. I tried:
n=$(echo $file | wc -l)
but I don't think that's the right way...
Super grateful for any help:)

Your attempt is pretty close, but you have to quote the variable expansion to preserve the line breaks:
files=$(find . -iname '*.xml')
n=$(echo "$files" | wc -l)
echo "$n"
This can still break, though, for files with exotic names – for example ones including a newline in the filename. It also miscounts the empty case: when nothing is found, echo still prints one empty line, so wc -l reports 1. To make it robust for all possible filenames, you could do this (requires GNU find):
files=$(find . -iname '*.xml' -printf '.')
echo "${#files}"
This prints a single . for each file found and then counts these periods.
Alternatively, if you don't have GNU find, you could use null byte separation for filenames and read them into an array:
readarray -d '' files < <(find . -iname '*.xml' -print0)
echo "${#files[#]}"
or, for older versions of Bash where readarray can't specify the delimiter to use (any Bash older than 4.4):
while IFS= read -r -d '' fname; do
files+=("$fname")
done < <(find . -iname '*.xml' -print0)
echo "${#files[#]}"

#!/bin/bash
# make an array (note: word splitting means this breaks on
# file names containing whitespace or glob characters)
files=($(find . -name \*.xml -print))
# number of array elements
fcnt=${#files[@]}
echo files: "${files[@]}"
echo
echo fcnt: $fcnt

Related

How to get file count and names in directory on bash

I want to get the file count, file names and folder names in a directory:
mkdir -p /tmp/test/folder1
mkdir -p /tmp/test/folder2
touch /tmp/test/file1
touch /tmp/test/file2
file_names=$(find "/tmp/test" -mindepth 1 -maxdepth 1 -type f -print0 | xargs -0 -I {} basename "{}")
echo $file_names
here is the output:
file2 file1
For folder:
folder_names=$(find "/tmp/test" -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -I {} basename "{}")
echo $folder_names
here is the output:
folder2 folder1
For count:
file_count=0 && $(find "/tmp/test" -mindepth 1 -maxdepth 1 -type f -print0 | let "file_count=file_count+1")
echo $file_count
folder_count=0 && $(find "/tmp/test" -mindepth 1 -maxdepth 1 -type d -print0 | let "folder_count=folder_count+1")
echo $folder_count
The file_count and folder_count do not work.
Question 1:
How to get the correct file_count and folder_count?
Question 2:
Is it possible for getting names into an array and check the count from array size?
The answer to the second question is really the answer to the first, too.
mapfile -d '' files < <( find /tmp/test -type f \
-mindepth 1 -maxdepth 1 \
-printf '%f\0')
echo "${#files} files"
printf '%s\n' "${files[#]}"
The use of double quotes and @ in the array expansion is essential for printing file names with whitespace correctly. The use of a null byte terminator between file names ensures that even newlines in file names are disambiguated.
Notice also the use of -printf with a specific format string to avoid having to run basename separately. However, the -printf option and its various format strings, as well as the -print0 option you used, are GNU find extensions, and thus not portable. (Linux typically ships with GNU tools; on other platforms, they are obviously easy to install, but typically not installed out of the box.)
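As a quick illustration of why the quotes and the @ matter, consider a hand-built array (the names here are made up):
files=("two words" "three word name")
printf '%s\n' "${files[@]}"   # two lines, names intact
printf '%s\n' ${files[@]}     # five lines: word splitting broke the names apart
printf '%s\n' "${files[*]}"   # one line: all names joined into a single word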
If you have an older version of Bash which doesn't support mapfile, try an explicit loop:
files=()
while IFS= read -r -d $'\0' file; do
files+=("$file")
done < <(find ...)
If you don't have GNU find, a common workaround is to print a fixed string for each found file, and then the line or character count reliably reflects the number of found files.
find /tmp/test -type f \
-mindepth 1 -maxdepth 1 \
-exec printf . \; |
wc -c
Though then, how do you collect the file names? If (as in your case) you don't require recursion into subdirectories, simply loop over all items in the directory.
In which case, again, the number of items in the collected array will also tell you how many there are.
files=()
dirs=()
for item in /tmp/test/*; do
if [[ -f "$item" ]]; then
files+=("$item")
elif [[ -d "$item" ]]; then
dirs+=("$item")
fi
done
echo "${#dirs[#] directories}
printf '- %s\n' "${dirs[#]}"
echo "${#files[#]} files"
printf '%s\n' "${dirs[#]}"
For a further discussion, see also https://mywiki.wooledge.org/BashFAQ/020
Needlessly collecting items into an array so you can loop over it once is a common beginner antipattern. If you just want to process each item in isolation and then forget it, there is no need to waste memory on remembering all the other items - just loop over them directly.
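For instance, if all you need is to print each name, a minimal sketch (assuming the same /tmp/test layout as above) can process items as they are found, with no array at all:
for item in /tmp/test/*; do
    [[ -f "$item" ]] && printf 'file: %s\n' "$item"
done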
As an aside, both command substitution and each stage of a pipeline run in a subshell with its own copy of the variables; thus in your attempt, the let at the end of the pipe would increment its private file_count from 0 to 1 each time you ran it, and the result would be discarded with the subshell (and of course, let does not read from the pipe, so piping find's output to it does nothing useful anyway).
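A minimal demonstration of that subshell pitfall, independent of find:
count=0
printf '%s\n' a b c | while read -r line; do
    ((count++))
done
echo "$count"   # prints 0: the loop ran in a subshell
count=0
while read -r line; do
    ((count++))
done < <(printf '%s\n' a b c)
echo "$count"   # prints 3: with process substitution the loop runs in the current shell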

Store output of find with -print0 in variable

I am on macOS and using find . -type f -not -xattrname "com.apple.FinderInfo" -print0 to create a list of files. I want to store that list and be able to pass it to multiple commands in my script. However, I can't use tee because I need them to run sequentially and wait for each to complete. The issue I am having is that since -print0 uses the null character, once I put the output into a variable I can't use it in commands.
To load 0-delimited data into a shell array (much better than trying to store multiple filenames in a single string):
bash 4.4 or newer:
readarray -t -d $'\0' files < <(find . -type f -not -xattrname "com.apple.FinderInfo" -print0)
some_command "${files[@]}"
other_command "${files[@]}"
Older bash, and zsh:
while IFS= read -r -d $'\0' file; do
files+=("$file")
done < <(find . -type f -not -xattrname "com.apple.FinderInfo" -print0)
some_command "${files[@]}"
other_command "${files[@]}"
This is a bit verbose, but works with the default bash 3.2:
eval "$(find ... -print0 | xargs -0 bash -c 'files=( "$#" ); declare -p files' bash)"
Now the files array should exist in your current shell. (One caveat: if the list is long enough that xargs splits it across several bash invocations, only the last batch survives the eval.)
You will want to expand the variable with "${files[@]}", including the quotes, to pass the list of files.
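For background on why the array detour is necessary at all: a bash variable simply cannot hold a NUL byte, so command substitution silently discards them (newer bash versions even print a warning about it):
x=$(printf 'a\0b')
echo "${#x}"   # 2: the NUL was dropped, leaving just "ab"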

count number of lines for each file found

I think that I don't understand very well how the find command in Unix works; I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.
find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
echo "$nb_ligne_fichier_R"
done
output:
43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R
90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
45 .//system d exploi/r-repos/gbm/R/checks.R
178 total: File name too long
Can I just save the total number of lines in my variable? Here in my example, just save 178, and do that for each folder in "$d_path".
Many thanks!
Maybe I'm missing something, but wouldn't this do what you want?
wc -l R/*.[Rr]
Solution:
find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
echo "$nb_fichier_R" #here is fine
find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
wc -l $fille #here is the problem nothing shown
done
done
Explanation:
Adding -print0 to the first find meant it produced no newlines, so you had to tell read, via -d '', not to look for one. Your subsequent finds output newlines, so there you can use read without a delimiter option. I removed -print0 and -d '' from all calls so the code is consistent and idiomatic. Newlines are good in the Unix world.
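To come back to the "just save the total" part of the question: one hedged sketch, reusing the $file variable from the loop above, is to concatenate all matching files and count lines once, so there is no per-file output and no "total" line to parse:
total=$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec cat {} + | wc -l)
echo "$total"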
For the command:
find "$d_path" -type d -maxdepth 1 -name R -print0
there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:
The number of files matching *.R
For each such file, the number of lines in it.
Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.
counter.sh
shopt -s nullglob;
for dir in "$@"
do
count=0
for file in "$dir"/*.R; do ((count++)); done
echo "$count"
wc -l "$dir"/*.R </dev/null
done
The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
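A quick way to see what nullglob changes, in a scratch directory with no .R files (mktemp -d here just provides an empty directory):
cd "$(mktemp -d)"
for f in *.R; do echo "got: $f"; done   # prints: got: *.R
shopt -s nullglob
for f in *.R; do echo "got: $f"; done   # prints nothing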
On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:
for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done
or use find (rather carefully):
find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +
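One caveat with the -exec ... + form: given enough files, find may invoke the inner shell several times, printing one partial count per batch. Summing the batches keeps the total honest; a sketch:
find . -type f -name '*.R' -exec sh -c 'echo "$#"' arg0 {} + |
awk '{ total += $1 } END { print total + 0 }'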
Using counter.sh
find "$d_path" -type d -maxdepth 1 -name R -exec sh ./counter.sh {} +
This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The sh ./counter.sh part of the find command assumes that the counter.sh script is in the current directory. If it can be found on $PATH, then you can drop the sh and the ./.
Discussion
The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and xargs -0, but gives you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need, but you could write one as a shell script, then do so and use it. If you might need to do the job more than once, you can keep the script. If you're sure you won't, you can delete it after you're done with it. It is generally much easier to handle files with awkward names like this than it is to fiddle with $IFS.
Consider this solution:
# If `"$dir"/*.R` doesn't match anything, yield nothing instead of giving the pattern.
shopt -s nullglob
# Allows matching both `*.r` and `*.R` in one expression. Using them separately would
# give double results.
shopt -s nocaseglob
while IFS= read -ru 4 -d '' dir; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
# Use process substitution to prevent going to a subshell. This may not be
# necessary for now but it could be useful to future modifications.
# Let's also use a custom fd to keep troubles isolated.
# It works with `-u 4`.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)
Another form is to use readarray, which reads all found directories at once. The only caveat is that it can only read newline-terminated paths.
shopt -s nullglob
shopt -s nocaseglob
readarray -t dirs < <(exec find "$d_path" -type d -maxdepth 1 -name R)
for dir in "${dirs[@]}"; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
done

How to loop through files in a directory having a certain word in filename using bash script?

I'm trying to loop through all the files in a directory which contain a certain word in filename using bash script.
The following script loops through all the files in the directory,
cd path/to/directory
for file in *
do
echo $file
done
ls | grep 'my_word' gives only the files which have the word 'my_word' in the filename. However I'm unsure how to replace the * with ls | grep 'my_word' in the script.
If I do it like this,
for file in ls | grep 'my_word'
do
echo $file
done
It gives me an error "syntax error near unexpected token `|'". What is the correct way of doing this?
You should avoid parsing ls where possible. Assuming there are no subdirectories in your present directory, a glob is usually sufficient:
for file in *foo*; do echo "$file"; done
If you have one or more subdirectories, you may need to use find. For example, to cat the files:
find . -type f -name "*foo*" | xargs cat
Or if your filenames contain special characters, try:
find . -type f -name "*foo*" -print0 | xargs -0 cat
Alternatively, you can use process substitution and a while loop:
while IFS= read -r myfile; do echo "$myfile"; done < <(find . -type f -name '*foo*')
Or if your filenames contain special characters, try:
while IFS= read -r -d '' myfile; do
echo "$myfile"
done < <(find . -type f -name '*foo*' -print0)

Spaces in path names giving trouble with Find in Bash. Any *simple* work-around?

Is there any way to change the following string so I don't get any problems when there are files/folders with spaces in them?
files=`find ~/$folder -name "*#*" -type f`
I'd prefer a solution that doesn't involve changing parts of my code other than this line, as everything else seems to be working correctly apart from this minor detail.
Thanks
EDIT: Here is the code in a bit more detail:
abc=( $(find "$pasta" -name "$ficheiro_original#*" -type f) )
abc_length=${#abc[@]}
If you are not using those file names later in your script, just iterate over them and process them on the fly.
find ~/$folder -name "*#*" -type f | while read -r FILE
do
echo "do you stuff"
done
Otherwise, you can set IFS so that word splitting happens only on newlines (note this still breaks if a file name itself contains a newline):
IFS=$'\n'
files=$(find ~/$folder -name "*#*" -type f)
Update:
$ IFS=$'\n'
$ a=($(find . -type f ))
echo ${#a[@]}
14
You'd have to make some changes, but to deal with arbitrary names, think in terms of using the GNU find option -print0, etc.
find ~/$folder -name "*#*" -type f -print0 |
while read -d '^@' file
do
echo "<<$file>>"
done
(Where the single byte represented as '^@' is actually an ASCII NUL, '\0'; enter it with Control-V Control-Shift-@.)
find ~/$folder -name "*#*" -type f -print0 |
while read -d '' file
do
echo "<<$file>>"
done
The empty string for the delimiter means 'use the zero byte, ASCII NUL, as the delimiter' and is appropriate for parsing 'find ... -print0' output. (Thanks Dennis Williamson for the tip.)
This allows you to read any arbitrary names. You should probably use a bash array to hold the names, but that implies some changes further down the script.
(Given the comment response that only spaces have to be worried about, this might be overkill, though using read to process lines with the names is a key part of the operation, and using an array would probably make life simpler.)
If you need a list of files that might have spaces in the names, you pretty much have to store them as an array, rather than just a string. Create the array with something like this:
saveIFS="$IFS"; IFS=$'\n'; files=( $(find ~/"$folder" -name "*#*" -type f) ); IFS="$saveIFS"
and then you'll have to modify the rest of the script to use files as an array rather than a string, and it (and other filenames) should always be in double-quotes to keep spaces from being mistaken for separators. For instance, anyplace you're currently using $files, replace that with "${files[@]}"
ls "${files[#]}"
for f in "${files[#]}"; do
ls "$f"
done
echo "found ${#files[#]} files"
Here is another way to get around it without changing the rest of the code:
# files=($(find))
eval "files=($(find -printf '"%h/%f" '))"
for f in "${files[#]}"; do
echo "$f"
done
It's dirty and will not work for file names with certain special characters, e.g. ". It uses eval to evaluate a string that declares a Bash array, and find's -printf to form that string.
I personally prefer changing $IFS, just FYI.
To read file names with spaces into a Bash array variable, you could use the "read" builtin command as well:
printf '%q\n' "$IFS"
IFS=$'\n' read -r -d "" -a abc <<< "$(find ~/$folder -name "*#*" -type f)"
IFS=$'\n' read -r -d "" -a abc < <(find ~/$folder -name "*#*" -type f) # alternative
abc_length=${#abc[@]}
for ((i=1; i <= ${#abc[@]}; i++)); do echo "$i: ${abc[i-1]}"; done
printf '%q\n' "$IFS"
Note that the scope of the newly set IFS variable is limited to the execution of the read command (which leaves the original IFS variable intact).
