How to get file count and names in directory on bash

I want to get the file count, file names, and folder names in a directory:
mkdir -p /tmp/test/folder1
mkdir -p /tmp/test/folder2
touch /tmp/test/file1
touch /tmp/test/file2
file_names=$(find "/tmp/test" -mindepth 1 -maxdepth 1 -type f -print0 | xargs -0 -I {} basename "{}")
echo $file_names
here is the output:
file2 file1
For folder:
folder_names=$(find "/tmp/test" -mindepth 1 -maxdepth 1 -type d -print0 | xargs -0 -I {} basename "{}")
echo $folder_names
here is the output:
folder2 folder1
For count:
file_count=0 && $(find "/tmp/test" -mindepth 1 -maxdepth 1 -type f -print0 | let "file_count=file_count+1")
echo $file_count
folder_count=0 && $(find "/tmp/test" -mindepth 1 -maxdepth 1 -type d -print0 | let "folder_count=folder_count+1")
echo $folder_count
The file_count and folder_count do not work.
Question 1:
How to get the correct file_count and folder_count?
Question 2:
Is it possible to get the names into an array and check the count from the array size?

The answer to the second question is really the answer to the first, too.
mapfile -d '' files < <( find /tmp/test -mindepth 1 -maxdepth 1 \
    -type f -printf '%f\0')
echo "${#files[@]} files"
printf '%s\n' "${files[@]}"
The use of double quotes and @ in the array expansion is essential for printing file names with whitespace correctly. The use of a NUL byte terminator between file names ensures that even newlines in file names are disambiguated.
Notice also the use of -printf with a specific format string to avoid having to run basename separately. However, the -printf option and its various format strings, as well as the -print0 option you used, are a GNU find extension, and thus not portable. (Linux typically ships with GNU tools; on other platforms, they are obviously easy to install, but typically not installed out of the box.)
If you have an older version of Bash whose mapfile doesn't support -d (it was added in Bash 4.4), try an explicit loop:
files=()
while IFS= read -r -d '' file; do
    files+=("$file")
done < <(find ...)
If you don't have GNU find, a common workaround is to print a fixed string for each found file, and then the line or character count reliably reflects the number of found files.
find /tmp/test -mindepth 1 -maxdepth 1 -type f \
    -exec printf . \; |
wc -c
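If you want that count in a variable, as the question asked, capture the pipeline with a command substitution:
file_count=$(find /tmp/test -mindepth 1 -maxdepth 1 -type f -exec printf . \; | wc -c)
echo "$file_count files"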
Though then, how do you collect the file names? If (as in your case) you don't require recursion into subdirectories, simply loop over all items in the directory.
In which case, again, the number of items in the collected array will also tell you how many there are.
files=()
dirs=()
for item in /tmp/test/*; do
    if [[ -f "$item" ]]; then
        files+=("$item")
    elif [[ -d "$item" ]]; then
        dirs+=("$item")
    fi
done
echo "${#dirs[#] directories}
printf '- %s\n' "${dirs[#]}"
echo "${#files[#]} files"
printf '%s\n' "${dirs[#]}"
For further discussion, see also https://mywiki.wooledge.org/BashFAQ/020
Needlessly collecting items into an array so you can loop over it once is a common beginner antipattern. If you just want to process each item in isolation and then forget it, there is no need to waste memory on remembering all the other items - just loop over them directly.
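For example, a minimal sketch that processes each file directly instead of collecting first (the echo here stands in for whatever per-item work you actually need):
for item in /tmp/test/*; do
    [[ -f "$item" ]] && echo "processing ${item##*/}"
done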
As an aside, each command in a pipeline runs in a subshell with its own copy of variables; thus in your attempt, the pipe to let would only ever increment a throwaway copy from 0 to 1, no matter how many files were found (and piping into let does not do anything useful anyway, since let does not read standard input).
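If you do want such a count in the parent shell, feed the loop from a process substitution instead of a pipe; a minimal sketch, assuming GNU find's -print0:
file_count=0
while IFS= read -r -d '' _; do
    ((file_count++))
done < <(find /tmp/test -mindepth 1 -maxdepth 1 -type f -print0)
echo "$file_count"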


find directories but exclude list where directories have a space in name

I have a process that audits files from one day to the next on a large file system. I want to exclude some directories from consideration by using a list of directories to exclude. I can do that just fine, but I'm having trouble if an exclude directory has a space in the name.
For simplicity's sake, I'm only going to list four sub-directories, but in reality there are many more directories I want to search vs exclude. There's also the chance that a new directory gets added and I want to automatically include new directories, hence the exclude list vs using an include list.
base_dir/
├── sub_dir1
├── sub_dir2
├── sub dir3
└── sub_dir4
I have a shell script and an exclude list
$ cat exclude.txt
sub_dir2
sub dir3
The shell script uses find and printf along with awk and sort to get a list of directories to audit.
$ find ./base_dir -maxdepth 1 -type d $(printf "! -iname %s " $(cat exclude.txt)) | awk -F/ '{print $NF}' | sort
sub_dir1
sub dir3
sub_dir4
As you can probably guess and see above, this works except that it's not ignoring sub dir3. I've tried a few combinations of double quotes inside exclude list and using %q vs %s vs %a, but can't seem to find the correct combination.
My desired output is
sub_dir1
sub_dir4
I realize I could do something like:
find ./base_dir -maxdepth 1 -type d \
! -iname "sub dir3" $(printf "! -iname %s " $(cat exclude.txt)) \
| awk -F/ '{print $NF}' | sort
and get my expected output, but I want to only use the exclude.txt list.
EDIT
After reading some replies I tried using an array and thought that would work; now it's even more obscure to me why this option doesn't. printf appears to produce a string that would work if I typed it into the command line directly, but running it as a one-liner still gives me errors.
$ cat exclude.txt
base_dir
sub_dir2
"sub dir3"
$ mapfile -t exclude < exclude.txt
$ printf "! -iname %s " "${exclude[@]}"
! -iname base_dir ! -iname sub_dir2 ! -iname "sub dir3"
$ find ./base_dir -maxdepth 1 -type d $(printf "! -iname %s " "${exclude[@]}")
find: paths must precede expression: dir3"
$ find ./base_dir -maxdepth 1 -type d ! -iname base_dir ! -iname sub_dir2 ! -iname "sub dir3"
./base_dir/sub_dir1
./base_dir/sub_dir4
You could read the exclude file into a Bash array and then craft a find command like this (-mindepth 1 excludes ./base_dir itself, -regextype egrep makes sure the alternation "|" does not have to be escaped, and -printf '%f\n' prints just the filename without leading directories):
mapfile -t exclude < exclude.txt
find ./base_dir \
    -mindepth 1 \
    -type d \
    -regextype egrep \
    ! -iregex ".*/($(IFS='|'; echo "${exclude[*]}"))" \
    -printf '%f\n'
resulting in
sub_dir1
sub_dir4
For your example input, the -iregex test expands like this:
$ IFS='|'
$ echo "${exclude[*]}"
sub_dir2|sub dir3
so the regular expression for paths to exclude becomes
.*/(sub_dir2|sub dir3)
The change to IFS is limited to the command substitution.
The limitation of this approach is that if the directories to be excluded contain characters that are special in regexes, you have to escape those, which can get messy. If you wanted to escape, for example, pipes, you could use
echo "${exclude[*]//|/\\|}"
in the command substitution, resulting in
sub_dir2|sub dir3|has\|pipe
where the directory has|pipe with a | in its name has its pipe properly escaped.
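If you need to handle arbitrary names, one approach is to backslash-escape every ERE metacharacter in each name before joining the list. A sketch, assuming GNU sed and paste, and still assuming no newlines in directory names:
mapfile -t exclude < exclude.txt
# Escape ERE metacharacters in each name, then join the lines with "|".
pattern=$(printf '%s\n' "${exclude[@]}" | sed 's/[][(){}.*+?|^$\\]/\\&/g' | paste -sd'|')
find ./base_dir -mindepth 1 -type d -regextype egrep ! -iregex ".*/($pattern)" -printf '%f\n'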
edited to include new info, in case it's useful later
Don't embed printf/cat in the command line. The shell's parser is working against you.
Stack the exclusion filters with paste -s into a tempfile to build your command dynamically, then execute it.
$: find ./base_dir
./base_dir
./base_dir/sub dir1
./base_dir/sub dir3
./base_dir/sub_dir1
./base_dir/sub_dir3
$: tmpfile=/tmp/xFinder
$: printf "find ./base_dir -maxdepth 1 -type d ! -iname base_dir " > $tmpfile
$: { sed -E 's/^(.*)/! -iname \"\1\"/' exclude.txt;
printf " | xargs -I R basename R "; } | paste -s >> $tmpfile
$: cat $tmpfile
find ./base_dir -maxdepth 1 -type d ! -iname base_dir ! -iname "sub_dir1" ! -iname "sub dir3" ! -iname "sub_dir4" | xargs -I R basename R
The xargs call to basename strips the path info, and ! -iname base_dir keeps base_dir itself out of the find output as a dir of its own.
$: . $tmpfile
sub dir1
sub_dir3
Apologies for the earlier incomplete version.
Since you only want to search a single directory level, without recursion, you can use a for loop with wildcards:
$ find base_dir/
base_dir/
base_dir/sub_dir2
base_dir/sub_dir1
base_dir/sub_dir4
base_dir/sub dir3
$ cat exclude.txt
sub_dir2
sub dir3
$ cat script.sh
#!/bin/bash
for dir in base_dir/*
do
    ! [ -d "$dir" ] ||
        grep -qFx -- "$(basename -- "$dir")" exclude.txt &&
        continue
    echo "$dir" # or do something else
done
$ ./script.sh
base_dir/sub_dir1
base_dir/sub_dir4

How to log variable in bash

for i in *.txt;
do
xxd -l 3 $i >> log
done
I also want to log the file name $i with each result, e.g.:
file_name
result_of_command
You probably just need to use printf:
for f in *.txt; do
printf "%s: %s\n" "$f" "$(xxd -l 3 "$f")"
done >> log
I'm not totally clear what you are asking, but is this what you want?
for i in *.txt; do
    echo "$i" >> log
    xxd -l 3 "$i" >> log
done
It's better to use find with the -exec option to run a command for every file matching certain criteria. If you want all files in your current directory matching *.txt, find can do that for you: -exec runs a command for each file found, {} is replaced by the name of the file, and \; (an escaped ;) terminates the command. You can use + instead to tell find to replace {} with multiple filenames.
find . -maxdepth 1 -type f -name '*.txt' -exec xxd -l 3 {} \; >> log
Note that the above example includes hidden files; you can exclude them using a regex.
find . -maxdepth 1 -type f \( ! -regex '.*/\..*' \) -name '*.txt' -exec xxd -l 3 {} \; >> log
Also, if you're going to be globbing files in the current directory and using them in commands, always use ./*. Paths beginning with - are likely to be interpreted by your command as options.
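For example, with a file literally named -n.txt (a hypothetical name, purely for illustration), the ./ prefix keeps the glob result from being parsed as an option:
for f in ./*.txt; do
    xxd -l 3 "$f" >> log
done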

count number of lines for each file found

I think I don't understand very well how the find command in Unix works; I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.
find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
echo "$nb_ligne_fichier_R"
done
output:
43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R
90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
45 .//system d exploi/r-repos/gbm/R/checks.R
178 total: File name too long
Can I just save the total number of lines in my variable? Here in my example, just save 178, and do that for each folder found in "$d_path".
Many Thanks
Maybe I'm missing something, but wouldn't this do what you want?
wc -l R/*.[Rr]
Solution:
find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
echo "$nb_fichier_R" #here is fine
find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
wc -l $fille #here is the problem nothing shown
done
done
Explanation:
With -print0, the first find produced no newlines, so you had to tell read, via -d '', not to look for one. Your subsequent finds output newlines, so you can use read with its default delimiter. I removed -print0 and -d '' from all calls so the script is consistent and idiomatic. Newlines are good in the Unix world.
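For completeness, a NUL-safe variant of the inner loop, in case you do need to cope with newlines in file names (a sketch, assuming GNU find's -print0):
find "$file" -maxdepth 1 -type f -iname '*.R' -print0 |
while IFS= read -r -d '' fille; do
    wc -l "$fille"
done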
For the command:
find "$d_path" -type d -maxdepth 1 -name R -print0
there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:
The number of files matching *.R
For each such file, the number of lines in it.
Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.
counter.sh
shopt -s nullglob
for dir in "$@"
do
    count=0
    for file in "$dir"/*.R; do ((count++)); done
    echo "$count"
    wc -l "$dir"/*.R </dev/null
done
The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:
for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done
or use find (rather carefully; note that with a long enough file list, -exec ... + may invoke sh more than once, printing several partial counts):
find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +
Using counter.sh
find "$d_path" -type d -maxdepth 1 -name R -exec sh ./counter.sh {} +
This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The sh ./counter.sh part of the find command assumes that the counter.sh script is in the current directory. If it can be found on $PATH, then you can drop the sh and the ./.
Discussion
The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and using xargs -0, but gives you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need (but you could write one as a shell script), then do so and use it. If you might need to do the job more than once, you can keep the script. If you're sure you won't, you can delete it after you're done with it. It is generally much easier to handle files with awkward names like this than it is to fiddle with $IFS.
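As for the original goal of saving the grand total in a variable, a minimal sketch (assuming GNU find, and that "$d_path/R" is the directory of interest): concatenate all the files once and count lines of the combined stream, which sidesteps the per-file output and its "total" line entirely.
total=$(find "$d_path/R" -maxdepth 1 -type f -iname '*.R' -exec cat {} + | wc -l)
echo "$total"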
Consider this solution:
# If `"$dir"/*.R` doesn't match anything, yield nothing instead of giving the pattern.
shopt -s nullglob
# Allows matching both `*.r` and `*.R` in one expression. Using them separately would
# give double results.
shopt -s nocaseglob
while IFS= read -ru 4 -d '' dir; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
# Use process substitution to prevent going to a subshell. This may not be
# necessary for now but it could be useful to future modifications.
# Let's also use a custom fd to keep troubles isolated.
# It works with `-u 4`.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)
Another form is to use readarray, which reads all found directories at once. The only caveat is that it can only read normal newline-terminated paths.
shopt -s nullglob
shopt -s nocaseglob
readarray -t dirs < <(exec find "$d_path" -maxdepth 1 -type d -name R)
for dir in "${dirs[@]}"; do
    files=("$dir"/*.R)
    echo "${#files[@]}"
    for file in "${files[@]}"; do
        wc -l "$file"
    done
done

Bash looping through files in Directory

I have a bash script, created by someone else, that I need to modify a little.
Since I'm new to Bash, I may need a little help with some common commands.
The script simply loops through a directory (recursively) for a specific file extension.
Here's the current script: (runme.sh)
#! /bin/bash
SRC=/docs/companies/
function report()
{
echo "-----------------------"
find $SRC -iname "*.aws" -type f -print
echo -e "\033[1mSOURCE FILES=\033[0m" `find $SRC -iname "*.aws" -type f -print |wc -l`
echo "-----------------------"
exit 0
}
report
I simply type ./runme.sh and I can see a list of all files with the extension of .aws
My primary goal is to limit the search. (some directories have way too many files)
I would like to run the script, limiting it to just 20 files.
Do I need to place the entire script into a loop method?
That's easy -- as long as you want the first 20 files, just pipe the first find command through head -n 20. But I can't resist a little cleanup while I'm at it: as written, it runs find twice, once to print the filenames and once to count them; if there are a lot of files to search, this is a waste of time. Second, wrapping the actual content of the script in a function (report) doesn't make much sense, and having the function exit (rather than returning) makes even less. Finally, I like to protect filenames with double-quotes and hate backquotes (use $() instead). So I took the liberty of a bit of cleanup:
#! /bin/bash
SRC=/docs/companies/
files="$(find "$SRC" -iname "*.aws" -type f -print)"
if [ -n "$files" ]; then
count="$(echo "$files" | wc -l)"
else # echo would print one line even if there are no files, so special-case the empty list
count=0
fi
echo "-----------------------"
echo "$files" | head -n 20
echo -e "\033[1mSOURCE FILES=\033[0m $count"
echo "-----------------------"
Use head -n 20 (as proposed by Peter). Additional remark: the script is very inefficient, as it runs find twice. You should consider using tee to generate a temporary file when the command runs for the first time, count the lines of this file afterwards, and delete the file.
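A sketch of that single-find idea (using a plain redirection rather than tee, since head exiting after 20 lines could kill tee via SIGPIPE and truncate the temporary file):
tmpfile=$(mktemp)
find "$SRC" -iname "*.aws" -type f -print > "$tmpfile"
head -n 20 "$tmpfile"
echo -e "\033[1mSOURCE FILES=\033[0m" "$(wc -l < "$tmpfile")"
rm -f "$tmpfile"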
I would personally prefer to do it like this:
echo "-----------------------"
files=0
while IFS= read -r file; do
    files=$((files + 1))
    echo "$file"
done < <(find "$SRC" -iname "*.aws" -type f -print | head -n 20)
echo -e "\033[1mSOURCE FILES=\033[0m" "$files"
echo "-----------------------"
If you just want the count, you can use find "$SRC" -iname "*.aws" -type f -print | head -n 20 | wc -l

How can I list all unique file names without their extensions in bash?

I have a task where I need to move a bunch of files from one directory to another. I need move all files with the same file name (i.e. blah.pdf, blah.txt, blah.html, etc...) at the same time, and I can move a set of these every four minutes. I had a short bash script to just move a single file at a time at these intervals, but the new name requirement is throwing me off.
My old script is:
find ./ -maxdepth 1 -type f | while read line; do mv "$line" ~/target_dir/; echo "$line"; sleep 240; done
For the new script, I basically just need to replace find ./ -maxdepth 1 -type f
with a list of unique file names without their extensions. I can then just replace do mv "$line" ~/target_dir/; with do mv "$line*" ~/target_dir/;.
So, with all of that said. What's a good way to get a unique list of files without their file names with bash script? I was thinking about using a regex to grab file names and then throwing them in a hash to get uniqueness, but I'm hoping there's an easier/better/quicker way. Ideas?
A one-liner tolerant of weirdly-named files could be:
find . -maxdepth 1 -type f -and -iname 'blah*' -print0 | xargs -0 -I {} mv {} ~/target/dir
If the files can start with multiple prefixes, you can use logic operators in find. For example, to move blah.* and foo.*, use:
find . -maxdepth 1 -type f -and \( -iname 'blah.*' -or -iname 'foo.*' \) -print0 | xargs -0 -I {} mv {} ~/target/dir
EDIT
Updated after comment.
Here's how I'd do it:
find ./ -type f -printf '%f\n' | sed 's/\..*//' | sort | uniq | ( while read filename ; do find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \; ; sleep 240; done )
Perhaps it needs some explanation:
find ./ -type f -printf '%f\n': find all files and print just their name, followed by a newline. If you don't want to look in subdirectories, this can be substituted by a simple ls;
sed 's/\..*//': strip the file extension by removing everything after the first dot. Both foo.tar and foo.tar.gz are transformed into foo;
sort | uniq: sort the filenames just found and remove duplicates;
(: open a subshell:
while read filename: read a line and put it into the $filename variable;
find . -type f -iname "$filename"'*' -exec mv {} /dest/dir \;: find in the current directory (find .) all the files (-type f) whose name starts with the value in filename (-iname "$filename"'*', this works also for files containing whitespaces in their name) and execute the mv command on each one (-exec mv {} /dest/dir \;)
sleep 240: sleep
): end of subshell.
Add -maxdepth 1 as argument to find as you see fit for your requirements.
Never mind, I'm dumb, there's a uniq command. Duh. New working script is: find ./ -maxdepth 1 -type f | sed -e 's/\.[a-zA-Z]*$//' | sort | uniq | while read line; do mv "$line"* ~/target_dir/; echo "$line"; sleep 240; done
