while loop stops after first iteration in BASH [duplicate] - bash

This question already has answers here:
While loop stops reading after the first line in Bash
(5 answers)
Closed 1 year ago.
I wrote a script that is supposed to convert *.avi files to mp4 format.
However, the while loop stops after the first iteration.
#!/bin/bash
shopt -s lastpipe
cd <some_directory>
find . -name *.avi -type f |
while read -r avi
do
    /usr/bin/HandBrakeCLI -i "${avi}" -o "${avi%.avi}.mp4" -f mp4 -m -O -e x264 -q 20 --vfr \
    # &> /dev/null
    if [ "$?" -eq 0 ]
    then
        echo "${avi} was converted successfully"
        rm "${avi}"
    else
        echo "${avi} was not converted"
        break
    fi
done

This part is wrong: find . -name *.avi -type f
The shell is expanding the wildcard before find starts, so the find command looks like:
find . -name a.avi b.avi c.avi d.avi ... -type f
I'm surprised you didn't notice an error message, like "find: paths must precede expression: b.avi"
You need to protect the asterisk from the shell so find can do its own expansion. Pick one of:
find . -name \*.avi -type f
find . -name '*.avi' -type f
You don't mention if you're on a GNU system or not. Your while loop is at risk of being tripped up by filenames with leading or trailing whitespace. Try this:
find . -name \*.avi -type f -print0 | while read -rd '' avi; do ...

HandBrakeCLI could also be reading from standard input, which would make your loop end after the first iteration. Since you're using bash, you can use process substitution and have read use a different file descriptor. This example uses fd 4:
while read -ru 4 avi; do
...
done 4< <(exec find . -name '*.avi' -type f)
My preferred version, though, is to use readarray. It's good enough as long as you don't have to deal with irregular filenames that contain newlines:
readarray -t files < <(exec find . -name '*.avi' -type f)
for avi in "${files[@]}"; do
...
done
Another option is to redirect HandBrakeCLI's input from /dev/null:
</dev/null /usr/bin/HandBrakeCLI ...
Other suggestions:
Quote your -name pattern: '*.avi'
Use IFS= to prevent stripping of leading and trailing spaces: while IFS= read ... (both suggestions are applied in the sketch below)
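Putting those suggestions together, a minimal sketch of the corrected loop might look like this (the HandBrakeCLI options are copied from the question; the </dev/null redirection keeps HandBrakeCLI from swallowing the rest of find's output):
#!/bin/bash
cd <some_directory> || exit 1

while IFS= read -r -d '' avi; do
    # Redirect stdin so HandBrakeCLI cannot consume the loop's input.
    if </dev/null /usr/bin/HandBrakeCLI -i "$avi" -o "${avi%.avi}.mp4" -f mp4 -m -O -e x264 -q 20 --vfr; then
        echo "$avi was converted successfully"
        rm "$avi"
    else
        echo "$avi was not converted"
        break
    fi
done < <(find . -name '*.avi' -type f -print0)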

Related

Removing files with/without spaces in filename

Hello stackoverflow community,
I'm facing a problem with removing files that contain spaces in the filename. I have this part of the code, which is responsible for deleting files that we get from a directory:
for f in $(find $REP -type f -name "$Filtre" -mtime +${DelAvtPurge})
do
    rm -f $f
done
I know that single or double quotes work for deleting files with spaces; they work for me when I try them on the command line, but when I put them around $f in the script it doesn't work at all.
Could anybody help me find a solution for this?
GNU find has -delete for that:
find "$REP" -type f -name "$Filtre" -mtime +"$DelAvtPurge" -delete
With any other find implementation, you can use bulk-exec:
find "$REP" -type f -name "$Filtre" -mtime +"$DelAvtPurge" -exec rm -f {} +
For a dry run, drop -delete from the first command and see the list of files to be deleted; for the second, insert echo before rm.
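For example, a dry-run sketch of both forms:
# GNU find: list what would be deleted (same command without -delete).
find "$REP" -type f -name "$Filtre" -mtime +"$DelAvtPurge"
# Portable form: print the rm commands instead of running them.
find "$REP" -type f -name "$Filtre" -mtime +"$DelAvtPurge" -exec echo rm -f {} +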
The other answer has shown how to do this properly. But fundamentally the issue in your command is the lack of quoting, due to the way the shell expands variables:
rm -f $f
needs to become
rm -f "$f"
In fact, always quoting your variables is safe and generally a good idea.
However, this alone will not fix your code: the word splitting happens when $(find …) is expanded in the for loop, before rm ever sees the name, so filenames with spaces or other valid characters (to wit, newlines) still break. Try it:
touch foo$'\n'bar
for f in $(find . -maxdepth 1 -name foo\*); do echo "rm -f $f"; done
Output:
rm -f ./foo
rm -f bar
Clearly that won’t do. In fact, you mustn’t parse the output of find, for this reason. The only way of making this safe, apart from the solution via find -exec is to use the -print0 option:
find "$REP" -type f -name "$Filtre" -mtime +"$DelAvtPurge" -print0 \
| while IFS= read -r -d '' f; do
rm -f "$f"
done
Using -print0 instead of (implicit) -print causes find to delimit hits by the null character instead of newline. Correspondingly, IFS= read -r -d '' reads a null-character delimited input string, which we do in a loop using while (the -r option prevents read from interpreting backslashes as escape sequences).

Bash find execute process with output redirected to a different file per each

I'd like to run the following bash command for every file in a folder (outputting a unique JSON file for each processed .csv), via a Makefile:
csvtojson ./file/path.csv > ./file/path.json
Here's what I've managed; I'm struggling with the stdin/stdout syntax and arguments:
find ./ -type f -name "*.csv" -exec csvtojson {} > {}.json \;
Help much appreciated!
You're only passing a single argument to csvtojson -- the filename to convert.
The > outputfile isn't an argument at all; instead, it's an instruction to the shell that parses and invokes the relevant command to connect the command's stdout to the given filename before actually starting that command.
Thus, above, that redirection is parsed before the find command is run -- because that's the only place a shell is involved at all.
If you want to involve a shell, consider doing so as follows:
find ./ -type f -name "*.csv" \
-exec sh -c 'for arg; do csvtojson "$arg" >"${arg}.json"; done' _ {} +
...or, as follows:
find ./ -type f -name '*.csv' -print0 |
while IFS= read -r -d '' filename; do
csvtojson "$filename" >"$filename.json"
done
...or, if you want to be able to set shell variables inside the loop and have them persist after its exit, you can use a process substitution to avoid the issues described in BashFAQ #24:
bad=0
good=0
while IFS= read -r -d '' filename; do
    if csvtojson "$filename" >"$filename.json"; then
        (( ++good ))
    else
        (( ++bad ))
    fi
done < <(find ./ -type f -name '*.csv' -print0)
echo "Converting CSV files to JSON: ${bad} failures, ${good} successes" >&2
See UsingFind, particularly the Complex Actions section and the section on Actions In Bulk.

Bash - Rename ".tmp" files recursively

A bunch of Word & Excel documents were being moved on the server when the process terminated before it was complete. As a result, we're left with several perfectly fine files that have a .tmp extension, and we need to rename these files back to the appropriate .xlsx or .docx extension.
Here's my current code to do this in Bash:
#!/bin/sh
for i in "$(find . -type f -name *.tmp)"; do
ft="$(file "$i")"
case "$(file "$i")" in
"$i: Microsoft Word 2007+")
mv "$i" "${i%.tmp}.docx"
;;
"$i: Microsoft Excel 2007+")
mv "$i" "${i%.tmp}.xlsx"
;;
esac
done
It seems that while this does search recursively, it only does 1 file. If it finds an initial match, it doesn't go on to rename the rest of the files. How can I get this to loop correctly through the directories recursively without it doing just 1 file at a time?
Try the find command like this:
while IFS= read -r -d '' i; do
ft="$(file "$i")"
case "$ft" in
"$i: Microsoft Word 2007+")
mv "$i" "${i%.tmp}.docx"
;;
"$i: Microsoft Excel 2007+")
mv "$i" "${i%.tmp}.xlsx"
;;
esac
done < <(find . -type f -name '*.tmp' -print0)
Using <(...) is process substitution; it is used here to run the find command
Quote the filename pattern in find
Use -print0 to get find output delimited by a null character, to allow space/newline characters in file names
Use IFS= and -d '' to read the null-separated filenames
I too would recommend using find. I would do this in two passes of find:
find . -type f -name \*.tmp \
-exec sh -c 'file "{}" | grep -q "Microsoft Word 2007"' \; \
-exec sh -c 'f="{}"; echo mv "$f" "${f%.tmp}.docx"' \;
find . -type f -name \*.tmp \
-exec sh -c 'file "{}" | grep -q "Microsoft Excel 2007"' \; \
-exec sh -c 'f="{}"; echo mv "$f" "${f%.tmp}.xlsx"' \;
Lines are split for readability.
Each instance of find will search for tmp files, then use -exec to test the output of file. This is similar to how you're doing it within the loop in your shell script, only it's launched from within find itself. We're using the pipe to grep instead of your case statement.
The second -exec only gets run if the first one returned "true" (i.e. grep -q ... found something), and executes the rename in a tiny shell instance (as written, the echo just prints the mv commands; drop it to actually rename).
I haven't profiled this to see whether it would be faster or slower than a loop in a shell script. Just another way to handle things.
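If you would rather keep it to a single pass, one possible variant (a sketch, not part of the original answer) moves the case statement into the shell launched by -exec:
find . -type f -name '*.tmp' -exec sh -c '
    for f in "$@"; do
        case $(file "$f") in
            *"Microsoft Word 2007"*)  echo mv "$f" "${f%.tmp}.docx" ;;
            *"Microsoft Excel 2007"*) echo mv "$f" "${f%.tmp}.xlsx" ;;
        esac
    done
' sh {} +
As before, the echo makes this a dry run; drop it to actually rename.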

count number of lines for each file found

I think that I don't understand very well how the find command in Unix works; I have this code for counting the number of files in each folder, but I want to count the number of lines of each file found and save the total in a variable.
find "$d_path" -type d -maxdepth 1 -name R -print0 | while IFS= read -r -d '' file; do
nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
nb_ligne_fichier_R= "$(find "$file" -type f -maxdepth 1 -iname '*.R' -exec wc -l {} +)"
echo "$nb_ligne_fichier_R"
done
output:
43 .//system d exploi/r-repos/gbm/R/basehaz.gbm.R
90 .//system d exploi/r-repos/gbm/R/calibrate.plot.R
45 .//system d exploi/r-repos/gbm/R/checks.R
178 total: File name too long
Can I just save the total number of lines in my variable? Here in my example, I'd just save 178, and do that for each folder in "$d_path".
Many Thanks
Maybe I'm missing something, but wouldn't this do what you want?
wc -l R/*.[Rr]
Solution:
find "$d_path" -type d -maxdepth 1 -name R | while IFS= read -r file; do
    nb_fichier_R="$(find "$file" -type f -maxdepth 1 -iname '*.R' | wc -l)"
    echo "$nb_fichier_R" #here is fine
    find "$file" -type f -maxdepth 1 -iname '*.R' | while IFS= read -r fille; do
        wc -l "$fille" #here is the problem nothing shown
    done
done
Explanation:
By adding -print0, the first find produced no newlines, so you had to tell read, via -d '', not to look for one. Your subsequent finds output newlines, so you can use read with the default delimiter. I removed -print0 and -d '' from all calls so it is consistent and idiomatic. Newlines are fine in the Unix world.
For the command:
find "$d_path" -type d -maxdepth 1 -name R -print0
there can be at most one directory that matches ("$d_path/R"). For that one directory, you want to print:
The number of files matching *.R
For each such file, the number of lines in it.
Allowing for spaces in $d_path and in the file names is most easily handled, I find, with an auxiliary shell script. The auxiliary script processes the directories named on its command line. You then invoke that script from the main find command.
counter.sh
shopt -s nullglob
for dir in "$@"
do
    count=0
    for file in "$dir"/*.R; do ((count++)); done
    echo "$count"
    wc -l "$dir"/*.R </dev/null
done
The shopt -s nullglob option means that if there are no .R files (with names that don't start with a .), then the glob expands to nothing rather than expanding to a string containing *.R at the end. It is convenient in this script. The I/O redirection on wc ensures that if there are no files, it reads from /dev/null, reporting 0 lines (rather than sitting around waiting for you to type something).
On the other hand, the find command will find names that start with a . as well as those that do not, whereas the globbing notation will not. The easiest way around that is to use two globs:
for file in "$dir"/*.R "$dir"/.*.R; do ((count++)); done
or use find (rather carefully):
find . -type f -name '*.R' -exec sh -c 'echo $#' arg0 {} +
Using counter.sh
find "$d_path" -type d -maxdepth 1 -name R -exec sh ./counter.sh {} +
This script allows for the possibility of more than one sub-directory (if you remove -maxdepth 1) and invokes counter.sh with all the directories to be examined as arguments. The script itself carefully handles file names so that whether there are spaces, tabs or newlines (or any other character) in the names, it will work correctly. The sh ./counter.sh part of the find command assumes that the counter.sh script is in the current directory. If it can be found on $PATH, then you can drop the sh and the ./.
Discussion
The technique of having find execute a command with the list of file name arguments is powerful. It avoids issues with -print0 and using xargs -0, but gives you the same reliable handling of arbitrary file names, including names with spaces, tabs and newlines. If there isn't already a command that does what you need (but you could write one as a shell script), then do so and use it. If you might need to do the job more than once, you can keep the script. If you're sure you won't, you can delete it after you're done with it. It is generally much easier to handle files with awkward names like this than it is to fiddle with $IFS.
Consider this solution:
# If `"$dir"/*.R` doesn't match anything, yield nothing instead of giving the pattern.
shopt -s nullglob
# Allows matching both `*.r` and `*.R` in one expression. Using them separately would
# give double results.
shopt -s nocaseglob
while IFS= read -ru 4 -d '' dir; do
    files=("$dir"/*.R)
    echo "${#files[@]}"
    for file in "${files[@]}"; do
        wc -l "$file"
    done
    # Use process substitution to prevent going to a subshell. This may not be
    # necessary for now but it could be useful to future modifications.
    # Let's also use a custom fd to keep troubles isolated.
    # It works with `-u 4`.
done 4< <(exec find "$d_path" -type d -maxdepth 1 -name R -print0)
Another form is to use readarray, which reads all found directories at once. The only caveat is that it can only read normal newline-terminated paths.
shopt -s nullglob
shopt -s nocaseglob
readarray -t dirs < <(exec find "$d_path" -type d -maxdepth 1 -name R)
for dir in "${dirs[#]}"; do
files=("$dir"/*.R)
echo "${#files[#]}"
for file in "${files[#]}"; do
wc -l "$file"
done
done

Looping through all files in a given directory [duplicate]

This question already has answers here:
Looping through all files in a directory [duplicate]
(6 answers)
Closed 4 years ago.
Here is what I'm trying to do:
Give a parameter to a shell script that will run a task on all files of jpg, bmp, tif extension.
Eg: ./doProcess /media/repo/user1/dir5
and all jpg, bmp, tif files in that directory will have a certain task run on them.
What I have now is:
for f in *
do
imagejob "$f" "output/${f%.output}" ;
done
I need help with the for loop to restrict the file types and also have some way of starting under a specified directory instead of current directory.
Use shell expansion rather than ls
for file in *.{jpg,bmp,tif}
do
imagejob "$file" "output/${file%.output}"
done
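Putting that together with the directory argument from the question, a minimal doProcess sketch might look like this (imagejob and its output naming come from the question; the cd check and mkdir -p are additions):
#!/bin/bash
# Usage: ./doProcess /media/repo/user1/dir5
cd "$1" || exit 1
mkdir -p output

for file in *.{jpg,bmp,tif}; do
    # Without nullglob an unmatched pattern stays literal, so skip it.
    [ -e "$file" ] || continue
    imagejob "$file" "output/${file%.output}"
done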
also if you have bash 4.0+, you can use globstar
shopt -s globstar
shopt -s nullglob
shopt -s nocaseglob
for file in **/*.{jpg,bmp,tif}
do
# do something with $file
done
for i in `ls $1/*.jpg $1/*.bmp $1/*.tif`; do
imagejob "$i";
done
This is assuming you're using a bashlike shell where $1 is the first argument given to it.
You could also do:
find "$1" -iname "*.jpg" -or -iname "*.bmp" -or -iname "*.tif" \
-exec imagejob \{\} \;
You could use a construct with backticks and ls (or any other command, of course):
for f in `ls *.jpg *.bmp *.tif`; do ...; done
The other solutions here are either Bash-only or recommend the use of ls in spite of it being a common and well-documented antipattern. Here is how to solve this in POSIX sh without ls:
for file in *.jpg *.bmp *.tif; do
... stuff with "$file"
done
If you have a very large number of files, perhaps you also want to look into
find . -maxdepth 1 -type f \( \
-name '*.jpg' -o -name '*.bmp' -o -name '*.tif' \) \
-exec stuff with {} +
which avoids the overhead of sorting the file names alphabetically. The -maxdepth 1 says to not recurse into subdirectories; obviously, take it out or modify it if you do want to recurse into subdirectories.
The -exec ... + predicate of find is a relatively new introduction; if your find is too old, you might want to use -exec ... \; or replace the -exec stuff with {} + with
find ... -print0 |
xargs -r0 stuff with
where however again the -print0 option and the corresponding -0 option for xargs are a GNU extension.
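For instance, with the imagejob command from the question ("$1" being the directory argument), a GNU-tools sketch could look like this; the output naming is simplified here to the file's basename, and an output/ directory is assumed to exist:
find "$1" -maxdepth 1 -type f \( -name '*.jpg' -o -name '*.bmp' -o -name '*.tif' \) -print0 |
    xargs -r0 -n1 sh -c 'imagejob "$1" "output/${1##*/}"' sh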
