Storing 'find -print0' in intermediate variable without losing nul char - shell

I have the following sequence of commands to lint JSON files in a directory:
npm install --global --no-save jsonlint-cli
FILES="$(find . -name '*.json' -type f -print)"
echo -e "Discovered JSON files:\n$FILES"
echo "$FILES" | tr '\n' ' ' | xargs jsonlint-cli
I fully realize this is a NO-NO because it does not use the -print0 | xargs -0 pattern.
However, I would like to retain the ability to echo the discovered files but also invoke find and jsonlint-cli only once.
This is made difficult by the fact that ash can't store \0 in a variable. So this will not work:
FILES="$(find . -name '*.json' -type f -print0)"
echo -e "Discovered JSON files:\n$FILES" | tr '\0' '\n' # No effect, '\0' already gone
echo "$FILES" | xargs -0 jsonlint-cli
How can I use -print0 | xargs -0 while still maintaining the current behavior of echoing the discovered files to stdout?
The shell in this case is ash (inside a node:alpine Docker container).

Can't see anything that keeps you from doing it like this:
echo 'Discovered JSON files:'
find . -name '*.json' -type f -print -exec jsonlint-cli {} +
Note that if you have so many files that find needs to split them across several jsonlint-cli invocations, the file listing and the lint output will interleave instead of appearing as two separate blocks.
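If you really do want a single find run and still print the list, one workaround that works in ash is to buffer the NUL-separated list in a temporary file instead of a variable (a sketch; assumes mktemp and a writable temp directory, both available in node:alpine):
LIST="$(mktemp)"
find . -name '*.json' -type f -print0 > "$LIST"
echo 'Discovered JSON files:'
tr '\0' '\n' < "$LIST"
xargs -0 jsonlint-cli < "$LIST"
rm -f "$LIST"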

Related

"find | xargs | ls" not running ls on filenames from find

So I have a directory with files and sub-directories in it. I want to get all the files recursively and then list them in long format, sorted by the modified date. Here's what I came up with.
find . -type f | xargs -d "\n" | ls -lt
However this only lists the files in the current directory and not the sub-directories. I don't understand why, given that the following prints out all the files.
find . -type f | xargs -d "\n" | cat
Any help appreciated.
xargs can only start ls if it's passed ls as an argument. When you pipe from xargs into ls, only one copy of ls is started -- by the parent shell -- and it isn't given any of the filenames from find | xargs as arguments -- instead they're on its stdin, but ls never reads its stdin, so it doesn't even know that they're there.
Thus, you need to remove the | character:
# Does what you specified in the common case, but buggy; don't use this
# (filenames can contain newlines!)
# ...also, xargs -d is GNU-only
find . -type f | xargs -d '\n' ls -lt
...or, better:
# uses NUL separators, which cannot exist inside filenames
# also, while a non-POSIX extension, this is supported in both GNU and BSD xargs
find . -type f -print0 | xargs -0 ls -lt
...or, even better than that:
# no need for xargs at all here; find -exec can do the same thing
# -exec ... {} + is POSIX-mandated functionality since 2008
find . -type f -exec ls -lt {} +
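One caveat that applies to the xargs variants as well: if the file list is too long for a single command line, ls runs more than once and each ls -lt sorts only its own batch.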
Much of the content in this answer is also covered in the Actions, Complex Actions, and Actions in Bulk sections of Using Find, which is well worth reading.

Solution for find -exec if single and double quotes already in use

I would like to recursively go through all subdirectories and remove the oldest two PDFs in each subfolder named "bak":
Works:
find . -type d -name "bak" \
-exec bash -c "cd '{}' && pwd" \;
Does not work, as the double quotes are already in use:
find . -type d -name "bak" \
-exec bash -c "cd '{}' && rm "$(ls -t *.pdf | tail -2)"" \;
Any solution to the double quote conundrum?
In a double quoted string you can use backslashes to escape other double quotes, e.g.
find ... "rm \"\$(...)\""
If that is too convoluted use variables:
cmd='$(...)'
find ... "rm $cmd"
However, I think your find -exec has more problems than that.
Using {} inside the command string "cd '{}' ..." is risky. If there is a ' inside the file name, things will break and might execute unexpected commands.
$() will be expanded by bash before find even runs. So ls -t *.pdf | tail -2 will only be executed once in the top directory . instead of once for each found directory. rm will (try to) delete the same file for each found directory.
rm "$(ls -t *.pdf | tail -2)" will not work if ls lists more than one file. Because of the quotes both files would be listed in one argument. Therefore, rm would try to delete one file with the name first.pdf\nsecond.pdf.
I'd suggest
cmd='cd "$1" && ls -t *.pdf | tail -n2 | sed "s/./\\\\&/g" | xargs rm'
find . -type d -name bak -exec bash -c "$cmd" -- {} \;
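With this construction the -- after bash -c "$cmd" becomes the inner shell's $0 and the found directory arrives as $1, so {} is never pasted into the script text itself.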
You have a more fundamental problem; because you are using the weaker double quotes around the entire script, the $(...) command substitution will be interpreted by the shell which parses the find command, not by the bash shell you are starting, which will only receive a static string containing the result from the command substitution.
If you switch to single quotes around the script, you get most of it right; but that would still fail if the file name you find contains a double quote (just like your attempt would fail for file names with single quotes). The proper fix is to pass the matching files as command-line arguments to the bash subprocess.
But a better fix still is to use -execdir, which runs the command from the directory containing each match; since -name "bak" pins the matched directory's basename, you don't have to pass the directory name to the subshell at all:
find . -type d -name "bak" \
-execdir bash -c 'ls -t bak/*.pdf | tail -2 | xargs -r rm' \;
This could still fail in funny ways because you are parsing the output of ls, which is inherently fragile.
You are explicitly asking for find -exec. Usually I would just chain find -exec find -delete, but in your case only two files per directory should be deleted, so a subshell is needed. Socowi already gave a nice solution; however, if your file names contain no tabs or newlines, another workaround is a find | while read loop.
This will sort files by mtime
find . -type d -iname 'bak' |
while IFS= read -r dir; do
    find "$dir" -maxdepth 1 -type f -iname '*.pdf' -printf "%T+\t%p\n" |
        sort | head -n2 | cut -f2- |
        while IFS= read -r file; do
            rm "$file"
        done
done
The above find | while read loop as a one-liner:
find . -type d -iname 'bak' | while IFS= read -r dir; do find "$dir" -maxdepth 1 -type f -iname '*.pdf' -printf "%T+\t%p\n" | sort | head -n2 | cut -f2- | while IFS= read -r file; do rm "$file"; done; done
A find | while read loop can also handle NUL-terminated file names. However, plain head cannot split records on NUL (grep -z can), so I improved on the other answers to make this work with nontrivial file names (GNU tools + bash only).
Replace realpath with rm to actually delete the files; realpath is kept here so the script dry-runs first:
#!/bin/bash
rm_old () {
find "$1" -maxdepth 1 -type f -iname \*.$2 -printf "%T+\t%p\0" | sort -z | sed -zn 's,\S*\t\(.*\),\1,p' | grep -zim$3 \.$2$ | xargs -0r realpath
}
export -f rm_old
find -type d -iname bak -execdir bash -c 'rm_old "{}" pdf 2' \;
However, bash -c with {} spliced into the script might still be exploitable. To make it more secure, let stat %N do the quoting:
#!/bin/bash
rm_old () {
local dir="$1"
# we don't like eval
# eval "dir=$dir"
# this works like eval: manually undo the quoting stat -c %N applied
dir="${dir#?}"                                  # strip the leading '
dir="${dir%?}"                                  # strip the trailing '
dir="${dir//"'$'\t''"/$'\011'}"                 # turn quoted tabs back into real tabs
dir="${dir//"'$'\n''"/$'\012'}"                 # turn quoted newlines back into real newlines
dir="${dir//$'\047'\\$'\047'$'\047'/$'\047'}"   # turn '\'' back into a single quote
find "$dir" -maxdepth 1 -type f -iname \*.$2 -printf '%T+\t%p\0' | sort -z | sed -zn 's,\S*\t\(.*\),\1,p' | grep -zim$3 \.$2$ | xargs -0r realpath
}
find -type d -iname bak -exec stat -c'%N' {} + | while read -r dir; do rm_old "$dir" pdf 2; done

Running multiple commands with xargs - for loop

Based on the top answer in Running multiple commands with xargs I'm trying to use find / xargs to operate on several files. Why is the first file, 1.txt, missing in the for loop?
$ ls
1.txt 2.txt 3.txt
$ find . -name "*.txt" -print0 | xargs -0
./1.txt ./2.txt ./3.txt
$ find . -name "*.txt" -print0 | xargs -0 sh -c 'for arg do echo "$arg"; done'
./2.txt
./3.txt
Why do you insist on using xargs? You can do the following as well:
while IFS= read -r file; do
    echo "$file"
done <<< "$(find . -name "*.txt")"
Because this is executed in the same shell, changing variables inside the loop is possible. If you piped find into the loop instead, the loop would run in a subshell, where that doesn't work.
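A NUL-safe variant of the same idea (a sketch; process substitution and read -d '' are bash features, not POSIX) also keeps the loop in the current shell:
while IFS= read -r -d '' file; do
    echo "$file"
done < <(find . -name "*.txt" -print0)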
When you use your for loop in a script example.sh, the call example.sh var1 var2 var3 puts var1 into $1, with $0 holding the script name. But sh -c assigns the first argument after the command string to $0, and for arg (short for for arg in "$@") iterates only over $1, $2, ..., so the first file ends up unused in $0.
When you want to process one file for each command, use the xargs option -L:
find . -name "*.txt" -print0 | xargs -0 -L1 sh -c 'echo "$0"'
# or for a simple case
find . -name "*.txt" -print0 | xargs -0 -L1 echo
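(-n 1 behaves much the same here, passing exactly one argument to each invocation.)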
I ran across this while having the same issue. You need the extra _ at the end as a placeholder for $0 in the shell that xargs starts:
$ find . -name "*.txt" -print0 | xargs -0 sh -c 'for arg do echo "$arg"; done' _
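With the placeholder in place, all three files come through (assuming the same find order as above):
$ find . -name "*.txt" -print0 | xargs -0 sh -c 'for arg do echo "$arg"; done' _
./1.txt
./2.txt
./3.txt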

renaming series of files using xargs

I would like to recursively pick files with find in some directory, then use xargs and mv to rename them, building the new names with parameter expansion. However, it did not work...
example:
mkdir test
touch abc.txt
touch def.txt
find . -type f -print0 | \
xargs -I {} -n 1 -0 mv {} "${{}/.txt/.tx}"
Result:
bad substitution
[1] 134 broken pipe find . -type f -print0
Working Solution:
for i in ./*.txt ; do mv "$i" "${i/.txt/.tx}" ; done
Although I finally got a way to fix the problem, I still want to know why the first find + xargs way doesn't work, since I don't think the second way is very general for similar tasks.
Thanks!
Remember that shell variable substitution happens before your command runs. So when you run:
find . -type f -print0 | \
xargs -I {} -n 1 -0 mv {} "${{}/.txt/.tx}"
The shell tries to expand that ${...} construct before xargs even runs, and since the contents of that expression aren't a valid shell variable reference, you get the bad substitution error. A better solution would be to use the rename command:
find . -type f -print0 | \
xargs -I {} -0 rename .txt .tx {}
And since rename can operate on multiple files, you can simplify that to:
find . -type f -print0 | \
xargs -0 rename .txt .tx
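If rename is not available, the parameter expansion can instead be deferred to a shell started by xargs (a sketch; note the _ placeholder for $0, as in the previous question, and that ${f%.txt} anchors the suffix at the end of the name):
find . -type f -name '*.txt' -print0 | \
xargs -0 sh -c 'for f do mv "$f" "${f%.txt}.tx"; done' _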

Bash script to change file names recursively

I have a script for changing file names of mht files but it does not traverse through dirs and sub dirs. I asked a question on a local forum and I got an answer that this is a solution:
find . -type f -name "*.mhtml" -o -type f -name "*.mht" | xargs -I item sh -c '{ echo item; echo item | sed "s/[:?|]//g"; }' | xargs -n2 mv
But it generates an error. Some experimenting shows that sh -c breaks file names containing spaces, which is what triggers the error. How can I fix this?
#!/bin/bash
# renames.sh
# basic file renamer
for i in *.mht
do
j=`echo $i | sed 's/|/ /g' | sed 's/:/ /g' | sed 's/?//g' | sed 's/"//g'`
mv "$i" "$j"
done
#!/bin/bash
find . -type f \( -name "*.mhtml" -o -name "*.mht" \) -print0 |
while IFS= read -r -d '' source; do
    target="${source//[:?|]/}"
    [ "X$source" != "X$target" ] &&
        mv -nv "$source" "$target"
done
Update: the rename now follows the original question, and support for .mht has been added.
Use rename. With rename you can specify a renaming pattern:
find . -type f \( -name "*.mhtml" -o -name "*.mht" \) -print0 | xargs -0 -I'{}' rename 's/[:?|]//g' "{}"
This way you can properly handle names with spaces. xargs will replace {} with each file name provided by the find command. Also note the use of -print0 and -0: they use \0 as the separator, which avoids problems with file names containing \n (newline).
The -o was not working the way it was intended to; you must use parentheses to group the conditions.
You may also consider using -iname instead of -name if you deal with files ending in ".mHtml".
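To preview the changes first, the Perl-based rename (assumed here; the util-linux rename takes different arguments) offers a dry-run flag:
find . -type f \( -name "*.mhtml" -o -name "*.mht" \) -print0 | xargs -0 rename -n 's/[:?|]//g'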
