Store output of find with -print0 in variable - macos

I am on macOS, using find . -type f -not -xattrname "com.apple.FinderInfo" -print0 to create a list of files. I want to store that list and pass it to multiple commands in my script. However, I can't use tee because the commands need to run sequentially, each waiting for the previous one to complete. The issue I am having is that -print0 separates entries with the null character, and a shell variable can't hold null bytes, so if I put the output into a variable I can't use it in commands.

To load null-delimited data into a shell array (much better than trying to store multiple filenames in a single string):
bash 4.4 or newer:
readarray -t -d $'\0' files < <(find . -type f -not -xattrname "com.apple.FinderInfo" -print0)
some_command "${files[@]}"
other_command "${files[@]}"
Older bash, and zsh:
while IFS= read -r -d $'\0' file; do
files+=("$file")
done < <(find . -type f -not -xattrname "com.apple.FinderInfo" -print0)
some_command "${files[@]}"
other_command "${files[@]}"

This is a bit verbose, but works with the default bash 3.2:
eval "$(find ... -print0 | xargs -0 bash -c 'files=( "$#" ); declare -p files' bash)"
Now the files array should exist in your current shell.
You will want to expand the variable with "${files[@]}", including the quotes, to pass the list of files.
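A quick way to verify what ended up in the array is printf, which prints each element on its own line, even for names containing spaces:
printf '%s\n' "${files[@]}"
echo "${#files[@]} files"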

Related

Count number of found paths with find in bash variable

I wanted to search for some files recursively and count the number of occurrences that I found.
To find the files I did:
file=$(find . -iname "*.xml")
Now I'd like to store the number of occurrences in another variable. I just don't know how. I tried:
n=$(echo $file | wc -l)
but I don't think that's the right way...
Super grateful for any help:)
Your attempt is pretty close, but you have to quote the variable expansion to preserve linebreaks:
files=$(find . -iname '*.xml')
n=$(echo "$files" | wc -l)
echo "$n"
This can still break, though, for files with exotic names – for example including a newline in the filename. To make it robust for all possible filenames, you could do this (requires GNU find):
files=$(find . -iname '*.xml' -printf '.')
echo "${#files}"
This prints a single . for each file found and then counts these periods.
Alternatively, if you don't have GNU find, you could use null byte separation for filenames and read them into an array:
readarray -d '' files < <(find . -iname '*.xml' -print0)
echo "${#files[#]}"
or, for older versions of Bash where readarray can't specify the delimiter to use (any Bash older than 4.4):
while IFS= read -r -d '' fname; do
files+=("$fname")
done < <(find . -iname '*.xml' -print0)
echo "${#files[#]}"
#!/bin/bash
# make an array (note: this relies on word splitting, so it
# miscounts filenames containing spaces or newlines)
files=($(find . -name \*.xml -print))
# number of array elements
fcnt=${#files[@]}
echo files: "${files[@]}"
echo
echo fcnt: $fcnt

how to increment with find -exec?

I would like to do something like this:
#!/bin/bash
nb=$(find . -type f -name '*.mp4' | wc -l)
var=0
find . -type f -name '*.mp4' -exec ((var++)) \;
echo $var
But it doesn't work. Can you help me?
You can't. Each exec is performed in a separate process. Those processes aren't part of your shell, so they can't access or change shell variables. (They could potentially read environment variables, but updated versions of those variables would be lost as soon as the processes exited; they couldn't make changes).
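You can see this with a minimal example; the assignment happens in a child bash, so the parent shell's variable is untouched:
var=original
find . -maxdepth 0 -exec bash -c 'var=changed' \;
echo "$var"    # still prints "original"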
If you want to modify shell state, you need to do that in the shell itself. Thus:
#!/usr/bin/env bash
# ^^^^- NOT /bin/sh; do not run as "sh scriptname"
while IFS= read -r -d '' filename; do
((++var))
done < <(find . -type f -name '*.mp4' -print0)
Note preincrement vs postincrement -- that helps you avoid some gotchas if you're running your script with set -e (though I'd argue that the better practice is to avoid that "feature").
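To see the gotcha concretely: (( var++ )) with var at 0 evaluates to 0, and an arithmetic command whose value is 0 returns exit status 1, which set -e treats as a failure:
set -e
var=0
(( var++ ))    # evaluates to 0, exit status 1: the script dies here
(( ++var ))    # evaluates to 1, exit status 0: safe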
See Using Find for details.
This uses find but without the -exec option. If you just want to store the number of items found in a variable, something like this could work:
#!/bin/bash
var=$(find . -type f -name '*.mp4' | wc -l | awk '{print $1}')
echo $var
Is this what you require?
bash-4.4$ var=$(find . -name "*.mp4" -exec echo {} \;|wc -l)
bash-4.4$ echo $var
4
It counts the number of *.mp4 files inside the dir and assigns the number to var.
Short and sweet, with the help of the -c option in egrep:
ALP ❱ find . | egrep mp4$
./T/How_to_Use_Slang_at_the_Market_English_Lessons.mp4
./T/How_to_Use_Slang_on_the_Road_English_Lessons.mp4
./T/How_to_Use_Slang_on_Vacation_English_Lessons.mp4
./T/How_to_Use_Slang_at_the_Airport_English_Lessons.mp4
./T/How_to_Use_Slang_to_Talk_about_Health_English_Lessons.mp4
./list-mp4
ALP ❱ find . | egrep -c mp4$
6
ALP ❱
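Note that the pattern mp4$ also matches ./list-mp4, which is why the count above is 6 rather than 5. Anchoring on the extension avoids that:
find . | egrep -c '\.mp4$'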

How to find all file paths in a directory with bash

I have a script that looks like this:
function main() {
for source in "$#"; do
sort_imports "${source}"
done
}
main "$#"
Right now if I pass in a file ./myFile.m the script works as expected.
I want to change it to passing in ./myClassPackage and have it find all files and call sort_imports on each of them.
I tried:
for source in $(find "$@"); do
sort_imports "${source}"
done
but when I call it I get an error that I'm passing it a directory.
Using the output of a command substitution for a for loop has pitfalls due to word splitting. A truly rock-solid solution will use null-byte delimiters to properly handle even files with newlines in their names (which is not common, but valid).
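For example, a single filename containing a space comes out of the loop as two words:
$ touch 'my file.m'
$ for source in $(find . -name '*.m'); do echo "[$source]"; done
[./my]
[file.m]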
Assuming you only want regular files (and not directories), try this:
while IFS= read -r -d '' source; do
sort_imports "$source"
done < <(find "$#" -type f -print0)
The -print0 option causes find to separate entries with null bytes, and the -d '' option for read allows these to be used as record separators.
You should use find with -exec:
find "$#" -type f -exec sort_imports "{}" \;
For more information see https://www.everythingcli.org/find-exec-vs-find-xargs/
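If sort_imports accepts multiple filenames per invocation, you can also batch them with + instead of \;, which starts far fewer processes:
find "$@" -type f -exec sort_imports {} +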
If you don't want find to enumerate directories, then exclude them (though note that this still word-splits names containing whitespace, as described above):
for source in $(find "$@" -not -type d); do
sort_imports "${source}"
done

Bash find execute process with output redirected to a different file per each

I'd like to run the following bash command for every file in a folder (outputting a unique JSON file for each processed .csv), via a Makefile:
csvtojson ./file/path.csv > ./file/path.json
Here's what I've managed; I'm struggling with the stdin/stdout syntax and arguments:
find ./ -type f -name "*.csv" -exec csvtojson {} > {}.json \;
Help much appreciated!
You're only passing a single argument to csvtojson -- the filename to convert.
The > outputfile isn't an argument at all; instead, it's an instruction to the shell that parses and invokes the relevant command to connect the command's stdout to the given filename before actually starting that command.
Thus, above, that redirection is parsed before the find command is run -- because that's the only place a shell is involved at all -- so all of find's output ends up in a single file literally named {}.json.
If you want to involve a shell, consider doing so as follows:
find ./ -type f -name "*.csv" \
-exec sh -c 'for arg; do csvtojson "$arg" >"${arg}.json"; done' _ {} +
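(Here the trailing _ fills the inline script's $0 slot, the found filenames become its positional parameters, and for arg; do iterates over those; ending -exec with + lets find pass many filenames per sh invocation.)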
...or, as follows:
find ./ -type f -name '*.csv' -print0 |
while IFS= read -r -d '' filename; do
csvtojson "$filename" >"$filename.json"
done
...or, if you want to be able to set shell variables inside the loop and have them persist after its exit, you can use a process substitution to avoid the issues described in BashFAQ #24:
bad=0
good=0
while IFS= read -r -d '' filename; do
if csvtojson "$filename" >"$filename.json"; then
(( ++good ))
else
(( ++bad ))
fi
done < <(find ./ -type f -name '*.csv' -print0)
echo "Converting CSV files to JSON: ${bad} failures, ${good} successes" >&2
See UsingFind, particularly the Complex Actions section and the section on Actions In Bulk.

count number of lines for each file found

I think I don't understand very well how the find command in Unix works; I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.
find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
echo "$nb_ligne_fichier_R"
done
output:
43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R
90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
45 .//system d exploi/r-repos/gbm/R/checks.R
178 total: File name too long
Can I just save the total number of lines in my variable? Here in my example, just save 178, and do that for each of the files in my folder "$d_path".
Many thanks!
Maybe I'm missing something, but wouldn't this do what you want?
wc -l R/*.[Rr]
Solution:
find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
echo "$nb_fichier_R" #here is fine
find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
wc -l "$fille" #here was the problem (nothing shown)
done
done
Explanation:
With -print0, the first find produced no newlines, so you had to pass read -d '' to tell it not to look for one. Your subsequent finds output newline-terminated paths, so read works there without a custom delimiter. I removed -print0 and -d '' from all calls so it is consistent and idiomatic. Newlines are good in the Unix world.
For the command:
find "$d_path" -type d -maxdepth 1 -name R -print0
there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:
The number of files matching *.R
For each such file, the number of lines in it.
Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.
counter.sh
shopt -s nullglob
for dir in "$@"
do
count=0
for file in "$dir"/*.R; do ((count++)); done
echo "$count"
wc -l "$dir"/*.R </dev/null
done
The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:
for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done
or use find (rather carefully -- with a very large number of files, -exec ... {} + may invoke sh more than once, printing several partial counts):
find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +
Using counter.sh
find "$d_path" -type d -maxdepth 1 -name R -exec sh ./counter.sh {} +
This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The sh ./counter.sh part of the find command assumes that the counter.sh script is in the current directory. If it can be found on $PATH, then you can drop the sh and the ./.
Discussion
The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and using xargs -0, but gives you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need (but you could write one as a shell script), then do so and use it. If you might need to do the job more than once, you can keep the script. If you're sure you won't, you can delete it after you're done with it. It is generally much easier to handle files with awkward names like this than it is to fiddle with $IFS.
Consider this solution:
# If `"$dir"/*.R` doesn't match anything, yield nothing instead of giving the pattern.
shopt -s nullglob
# Allows matching both `*.r` and `*.R` with a single pattern. (Listing the two
# patterns separately while nocaseglob is set would match each file twice.)
shopt -s nocaseglob
while IFS= read -ru 4 -d '' dir; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
# Use process substitution to avoid running the loop in a subshell. That may
# not be necessary for now, but it could matter to future modifications.
# A custom fd (4, matching -u 4 above) keeps this loop's input isolated.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)
Another form is to use readarray, which reads all the found directories at once. The only caveat is that, as used here, it can only read normal newline-terminated paths.
shopt -s nullglob
shopt -s nocaseglob
readarray -t dirs < <(exec find "$d_path" -type d -maxdepth 1 -name R)
for dir in "${dirs[#]}"; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
done
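If your bash is 4.4 or newer, readarray -d '' lifts the newline restriction mentioned above, along the lines of:
readarray -t -d '' dirs < <(find "$d_path" -maxdepth 1 -type d -name R -print0)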
