How to execute multiple commands after xargs -0? - bash

find . -name "filename including space" -print0 | xargs -0 ls -aldF > log.txt
find . -name "filename including space" -print0 | xargs -0 rm -rdf
Is it possible to combine these two commands into one so that only 1 find will be done instead of 2?
I know there may be ways to do it with xargs -I, but those may lead to errors when processing filenames that include spaces. Any guidance is much appreciated.

find . -name "filename including space" -print0 |
xargs -0 -I '{}' sh -c 'ls -aldF {} >> log.txt; rm -rdf {}'
Ran across this just now, and we can invoke the shell less often:
find . -name "filename including space" -print0 |
xargs -0 sh -c '
for file; do
ls -aldF "$file" >> log.txt
rm -rdf "$file"
done
' sh
The trailing "sh" becomes $0 in the shell. xargs provides the files (returned from find) as command line parameters to the shell; we iterate over them with the for loop.
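To try the batch loop safely, here is a minimal sketch that runs it in a scratch directory. The directory, the file names, and the *.tmp pattern are hypothetical stand-ins for the real search; ls -ld and rm -rf are used instead of the longer option strings above to keep the demo portable.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is deleted.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch "file one.tmp" "file two.tmp" "keep.txt"

# One find, and as few sh invocations as xargs can manage:
# each match is logged first and removed afterwards.
find . -name "*.tmp" -print0 |
  xargs -0 sh -c '
    for file; do
      ls -ld "$file" >> log.txt
      rm -rf "$file"
    done
  ' sh

# Count what is left and what was logged.
remaining=$(find . -name "*.tmp" | wc -l)
logged=$(wc -l < log.txt)
echo "remaining=$remaining logged=$logged"
```

The names containing spaces survive intact because each one reaches the inner shell as a separate positional parameter and is always expanded inside double quotes.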

If you're just wanting to avoid doing the find multiple times, you could do a tee right after the find, saving the find output to a file, then executing the lines as:
find . -name "filename including space" -print0 | tee my_teed_file | xargs -0 ls -aldF > log.txt
cat my_teed_file | xargs -0 rm -rdf
Another way to accomplish this same thing (if indeed it's what you're wanting to accomplish), is to store the output of the find in a variable (supposing it's not TB of data):
founddata=`find . -name "filename including space" -print0`
echo "$founddata" | xargs -0 ls -aldF > log.txt
echo "$founddata" | xargs -0 rm -rdf
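One caveat with the variable approach: a bash variable cannot hold NUL bytes, so the separators written by -print0 are lost when the output is captured. A possible workaround, sketched here under the assumption that bash 4.4 or later is available (mapfile -d is not in older versions such as the bash 3.2 shipped with macOS), is to read the names into an array. The scratch directory and file names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumes bash 4.4+ for `mapfile -d`.
demo_dir=$(mktemp -d)
touch "$demo_dir/name with space" "$demo_dir/other"

# Read NUL-delimited find output into an array; each element is one path.
mapfile -d '' found < <(find "$demo_dir" -type f -print0)

# Guard against an empty result so ls and rm never run without operands.
if (( ${#found[@]} > 0 )); then
  ls -ld -- "${found[@]}" > "$demo_dir.log"
  rm -f -- "${found[@]}"
fi
echo "handled ${#found[@]} files"
```

The array is reused for both commands, so find still runs only once and no temporary file is needed.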

I believe the answers so far all give correct ways to solve this problem. I tried Jonathan's two solutions and Glenn's approach, and all of them worked well on my Mac OS X. mouviciel's method did not work on my system, possibly because of some configuration issue. I think it is similar to Jonathan's second method (I may be wrong).
As mentioned in the comments to Glenn's method, a little tweak is needed. So here is the command I tried which worked perfectly FYI:
find . -name "filename including space" -print0 |
xargs -0 -I '{}' sh -c 'ls -aldF {} | tee -a log.txt ; rm -rdf {}'
Or better as suggested by Glenn:
find . -name "filename including space" -print0 |
xargs -0 -I '{}' sh -c 'ls -aldF {} >> log.txt ; rm -rdf {}'

As long as you do not have newline in your filenames, you do not need -print0 for GNU Parallel:
find . -name "My brother's 12\" records" | parallel ls {}\; rm -rdf {} >log.txt
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ

Just a variation of the xargs approach without that horrible -print0 and xargs -0, this is how I would do it:
ls -1 *.txt | xargs --delimiter "\n" --max-args 1 --replace={} sh -c 'cat {}; echo "\n"'
Footnotes:
Yes I know newlines can appear in filenames but who in their right minds would do that
There are short options for xargs but for the reader's understanding I've used the long ones.
I would use ls -1 when I want non-recursive behavior rather than find -maxdepth 1 -iname "*.txt" which is a bit more verbose.

You can execute multiple commands after find using for instead of xargs:
IFS=$'\n'
for F in `find . -name "filename including space"`
do
ls -aldF "$F" >> log.txt
rm -rdf "$F"
done
The IFS defines the Internal Field Separator, which defaults to <space><tab><newline>. If your filenames may contain spaces, it is better to redefine it as above.

I'm late to the party, but there is one more solution that wasn't covered here: user-defined functions. Putting multiple instructions on one line is unwieldy, and can be hard to read/maintain. The for loop above avoids that, but there is the possibility of exceeding the command line length.
Here's another way (untested).
function processFiles {
ls -aldF "$@"
rm -rdf "$@"
}
export -f processFiles
find . -name "filename including space" -print0 \
| xargs -0 bash -c 'processFiles "$@"' dummyArg > log.txt
This is pretty straightforward except for the "dummyArg" which gave me plenty of grief. When running bash in this way, the arguments are read into
"$0" "$1" "$2" ....
instead of the expected
"$1" "$2" "$3" ....
Since processFiles{} is expecting the first argument to be "$1", we have to insert a dummy value into "$0".
Footnotes:
I am using some elements of bash syntax (e.g. "export -f"), but I believe this will adapt to other shells.
The first time I tried this, I didn't add a dummy argument. Instead I added "$0" to the argument lines inside my function (e.g. ls -aldF "$0" "$@"). Bad idea.
Aside from stylistic issues, it breaks when the "find" command returns nothing. In that case, $0 is set to "bash". Using the dummy argument instead avoids all of this.
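A runnable sketch of the exported-function approach, using a scratch directory. The function name process_files, the dummy_arg placeholder, and the file names are illustrative assumptions; the command string passes "$@" on explicitly so the function actually receives the file names.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
touch "$demo_dir/a b.txt" "$demo_dir/c.txt"

# Hypothetical helper: list every argument, then remove it.
process_files() {
  ls -ld -- "$@"
  rm -rf -- "$@"
}
# Export the function so the child bash started by xargs can see it.
export -f process_files

# dummy_arg fills $0; the file names supplied by xargs become $1, $2, ...
find "$demo_dir" -name "*.txt" -print0 |
  xargs -0 bash -c 'process_files "$@"' dummy_arg > "$demo_dir.log"

left=$(find "$demo_dir" -name "*.txt" | wc -l)
echo "left=$left"
```

Exported functions are a bash feature, so the child shell has to be bash as well; a plain sh would not import the function.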

Another solution:
find . -name "filename including space" -print0 \
| xargs -0 -I FOUND echo "$(ls -aldF FOUND > log.txt ; rm -rdf FOUND)"

Related

How to stop bash loop from looping over files created during the loop?

I want to run a loop over all files of a particular extension in a directory:
for i in *.bam
do
...
done
However, if the command that I run inside the loop creates a temporary file of the same extension, the loop tries to process this new tmp file as well. This is unwanted. So, I thought the following would solve the problem: first list all the *.bam files in the directory, save that list to a variable, and then loop over this saved list:
list_bam=$(for i in *.bam; do echo $i; done)
for i in $list_bam
do
...
done
To my surprise, this runs into the same problem! Could someone please explain the logic behind this and how to fix it so that the loop only processes the pre-existing .bam files?
Instead of a loop you can use find and xargs
find . -maxdepth 1 -type f -name "*.bam" -print0 | \
xargs -0 -I{} bash -c 'echo "{}" > "{}.new.bam"'
or
find . -maxdepth 1 -type f -name "*.bam" -print0 | \
xargs -0 -I{} bash -c 'echo "$1" > "$1.new.bam"' -- {}
example:
$ touch a.bam b.bam
$ ls
a.bam b.bam
$ find . -maxdepth 1 -type f -name "*.bam" -print0 | \
xargs -0 -I{} bash -c 'echo "{}" > "{}.new.bam"'
$ ls
a.bam a.bam.new.bam b.bam b.bam.new.bam
You should perhaps make sure that your globbing expression *.bam couldn't be interpreted afterward with something like:
list_bam=$(ls *.bam)
...
...but, as noted by @glenn in the comments, this is a bad idea.
Something similar should be made using a find ... -print0 | xargs -0 ... command template.
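Another way to freeze the list before the loop starts, as a minimal sketch assuming bash: expand the glob once into an array and iterate over the array. The array name bam_files and the scratch files are hypothetical, and the loop body just mimics the example above.

```shell
#!/usr/bin/env bash
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch a.bam b.bam

# nullglob makes the pattern expand to nothing when no file matches.
shopt -s nullglob
# The glob is expanded exactly once, here.
bam_files=(*.bam)

for f in "${bam_files[@]}"; do
  # Creates new *.bam files, which are not part of the saved list.
  echo "$f" > "$f.new.bam"
done

processed=${#bam_files[@]}
echo "processed=$processed"
```

Quoting "${bam_files[@]}" keeps names with spaces intact, which the unquoted $list_bam expansion in the question would not.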

Running multiple commands with xargs - for loop

Based on the top answer in Running multiple commands with xargs, I'm trying to use find / xargs to work on multiple files. Why is the first file, 1.txt, missing in the for loop?
$ ls
1.txt 2.txt 3.txt
$ find . -name "*.txt" -print0 | xargs -0
./1.txt ./2.txt ./3.txt
$ find . -name "*.txt" -print0 | xargs -0 sh -c 'for arg do echo "$arg"; done'
./2.txt
./3.txt
Why do you insist on using xargs? You can do the following as well.
while IFS= read -r file; do
echo "$file"
done < <(find . -name "*.txt")
Because this is executed in the same shell, changing variables is possible in the loop. Otherwise you'll get a sub-shell in which that doesn't work.
When you use your for-loop in a script example.sh, the call example.sh var1 var2 var3 will put var1 in the first argument, not example.sh.
When you want to process one file for each command, use the xargs option -L:
find . -name "*.txt" -print0 | xargs -0 -L1 sh -c 'echo "$0"'
# or for a simple case
find . -name "*.txt" -print0 | xargs -0 -L1 echo
I ran across this while having the same issue. You need the extra _ at the end as a placeholder for $0, because sh -c assigns the first argument after the command string to $0:
$ find . -name "*.txt" -print0 | xargs -0 sh -c 'for arg do echo "$arg"; done' _
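The effect of the placeholder can be checked without touching any files. In this small sketch printf stands in for find -print0, and the three names are the ones from the question.

```shell
#!/usr/bin/env bash
# Without a placeholder, the first item lands in $0 and the loop never sees it.
without=$(printf '%s\0' 1.txt 2.txt 3.txt |
  xargs -0 sh -c 'for arg do echo "$arg"; done')

# With a placeholder, "_" becomes $0 and all three items become $1..$3.
with=$(printf '%s\0' 1.txt 2.txt 3.txt |
  xargs -0 sh -c 'for arg do echo "$arg"; done' _)

printf 'without placeholder:\n%s\n' "$without"
printf 'with placeholder:\n%s\n' "$with"
```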

Printing the shell find and remove command to screen and log file

I have a script that finds log files older than x days within a specified directory and removes them.
find $LOG_ARCHIVE/* -mtime +$DAYS_TO_KEEP_LOGS -exec rm -f {} \;
This is working as expected but I would like to have the option to print the processing to the screen and log file so I know what files (if any) have been deleted. I've tried appending tee at the end but have had no success.
find $LOG_ARCHIVE/* -mtime +$DAYS_TO_KEEP_LOGS -exec rm -fv {} \; | tee -a $LOG
There are multiple ways the task can be done.
One possibility is to simply run find twice:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -print > "$LOG"
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -exec rm -f {} +
Another possibility is to use tee along with (GNU extensions) -print0 to find and -0 to xargs:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -print0 |
tee "$LOG" |
xargs -0 rm -f
With this version, the log file will have null bytes at the end of each file name. You can arrange to replace those with newlines if you don't mind the possible ambiguity:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -print0 |
tee >(tr '\0' '\n' >"$LOG") |
xargs -0 rm -f
This uses Bash (and Korn shell) process substitution to pass the log file through tr to map null bytes '\0' to newlines '\n'.
Another way of doing it is to write a tiny custom script (call it remove-log.sh):
printf '%s\n' "$@" >> "$LOG"
rm -f "$@"
and then use:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -exec bash remove-log.sh {} +
Note that the script needs to see the value of $LOG, so that must be exported as an environment variable. You could avoid that by passing the log name explicitly:
logfile="$1"
shift
printf '%s\n' "$@" >> "$logfile"
rm -f "$@"
plus:
find "$LOG_ARCHIVE" -mtime +"$DAYS_TO_KEEP_LOGS" -exec bash remove-log.sh "$LOG" {} +
Note that both of these use >> to append because the script might be invoked more than once (though it probably won't be). The onus is on you to ensure that the log file is empty before you run the find command.
Note that I dropped the /* from the path argument for find; it wasn't really needed. You might want to add -type f to ensure that only files are removed. The + is a feature from the POSIX 2008 specification of find which makes find act rather like xargs without needing to explicitly use xargs.
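If a separate script file is not wanted, the same logic can be inlined with sh -c. This is a sketch under assumptions: the temporary directory and log file stand in for $LOG_ARCHIVE and $LOG, and the -mtime test is left out so that the freshly created demo files match.

```shell
#!/usr/bin/env bash
log_archive=$(mktemp -d)   # stand-in for $LOG_ARCHIVE
log=$(mktemp)              # stand-in for $LOG, created empty
touch "$log_archive/old one.log" "$log_archive/old two.log"

# "sh" fills $0, the log path is $1, and find appends the matches after it.
find "$log_archive" -type f -name '*.log' -exec sh -c '
  logfile=$1
  shift
  printf "%s\n" "$@" >> "$logfile"
  rm -f -- "$@"
' sh "$log" {} +

cat "$log"
```

Because the log path is passed as an argument, nothing needs to be exported into the environment.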
find $LOG_ARCHIVE/* -mtime +$DAYS_TO_KEEP_LOGS -exec sh -c 'echo {} |tee -a "$LOG"; rm -f {}' \;
Try and see if it works.

How can I get xargs to do something with the input, then do another thing?

I'm in zsh.
I'd like to do something like:
find . -iname *.md | xargs cat && echo "---" > all_slides_with_separators_in_between.md
Of course this cats all the slides, then appends a single "---" at the end instead of after each slide.
Is there an xargs way of doing this? Can I replace cat && echo "---" with some inline function or do block?
Very strangely, when I create a file cat---.sh with the contents
cat $1
echo ---
and run
find . -iname *.md | xargs ./cat---.sh
it only executes for the first result of find.
Replace cat---.sh with cat and it runs on both files.
There's no need to use xargs at all here. Following is a properly paranoid approach (robust against files with spaces, files with newlines, files with literal backslashes in their names, etc):
while IFS= read -r -d '' filename; do
printf '---\n'
cat -- "$filename"
done < <(find . -iname '*.md' -print0) >all_slides_with_separators.md
However -- you don't even need that either: find can do all the work itself, both printing the separator and calling cat!
find . -iname '*.md' -printf '---\n' -exec cat -- '{}' ';' >all_slides_with_separators.md
A common usage pattern is xargs sh -c 'command; another' _ where the entire shell script in the quotes will have access to the command-line arguments. The underscore is because the first argument to sh -c will be assigned to $0 (where you'd often see e.g. -sh in a ps listing).
find . -iname '*.md' |
xargs sh -c 'for x; do
cat "$x" && echo "---"
done' _ > all_slides_with_separators_in_between.md
As noted in the comments, you should probably investigate find -print0 and the corresponding xargs -0 option in GNU find (and maybe install it if you don't have it).
You can do something like this, but it can be insecure in some cases (see comments):
find . -iname '*.md' | xargs -I % sh -c '{ cat %; echo "----"; }' > output.txt
You'll rarely need find in zsh; its globbing facilities cover nearly every use case of find.
for f in (#i)**/*.md; do
cat $f
print -- "---"
done > all_slides.md
This looks in the current directory hierarchy for every file that matches *.md in a case-insensitive manner.
For even more efficiency, replace cat $f with < $f; zsh itself will read the file and write its contents to standard output.
Using GNU Parallel it looks like this:
parallel cat {}\; print -- --- ::: **/*.md

In-line text replacement using sed, shell, or some other means

I want to pass two parameters to a program, a file name and a modified version of the file name. The situation is I have a bunch of .html.erb files in a directory tree, and I want invoke html2haml on them with the original filename and a new output filename with the haml extension, like so:
html2haml thing.html.erb thing.html.haml
Here's my current best attempt at this:
find . -name "*.html.erb" -exec echo {} `echo {} | sed "s/.erb/.haml/g"` \;
(after I'm done testing I'll replace echo with html2haml and run it again)
However it doesn't work. The result of the expression inside backticks is the unmodified string.
Here are some experiments I tried which DO behave as expected (to test if my syntax and levels of escaping/quotes were correct):
1. echo myfile.foo | sed 's/foo/foo2/g'
2. find . -name "*.html.erb" -exec echo {} `echo xyz | sed "s/y/Y/g"` \;
3. find . -name "*.html.erb" -exec echo {} `echo {} hello` \;
4. find . -name "*.html.erb" -exec echo {} `echo {}` \;
The fact that these all behave as expected suggests to me that I am getting some small thing wrong in the syntax, and that it is indeed possible to do this with a one-liner.
If this is impossible, it might be because of a misunderstanding about "when" find inserts its results on each invocation. Example #3 above suggests to me that it does so exactly when I need/expect it to (because I'm successfully concatenating each individual result string with "hello").
If you have gsed:
find . -name \*.erb -print0 | gsed -z 'p;s/.erb$/.haml/' | xargs -0 -n2 html2haml
If you don't have gsed and only have sed, this will work, but only if none of your file names have whitespace.
find . -name \*.erb -print | sed 'p;s/.erb$/.haml/' | xargs -n2 html2haml
Discussion about these and other techniques follows:
I have several versions of sed installed; my GNU sed is called gsed. If your sed is GNU sed, use sed instead of gsed.
You can check your sed with sed --version. If it prints something like:
sed (GNU sed) 4.2.2
Copyright (C) 2012 Free Software Foundation, Inc.
You have a GNU sed.
For example, given the following find output:
$ find . -name \*foo -print
./a/test.foo
./b/c/test.foo
./b/te st.foo #<- note the filename with space
./b/test.foo
the command above produces:
$ find . -name \*foo -print0 | gsed -z 'p;s/foo$/foo2/' | xargs -0 -n2 echo bar
bar ./a/test.foo ./a/test.foo2
bar ./b/c/test.foo ./b/c/test.foo2
bar ./b/te st.foo ./b/te st.foo2
bar ./b/test.foo ./b/test.foo2
Without additional scripts or functions. ;)
or you can replace the sed with perl, so the next
find . -name \*foo -print0 | perl -n0le 'print;s/foo/foo2/;print' | xargs -0 -n2 echo bar
produces the same result:
bar ./a/test.foo ./a/test.foo2
bar ./b/c/test.foo ./b/c/test.foo2
bar ./b/te st.foo ./b/te st.foo2
bar ./b/test.foo ./b/test.foo2
IF you REALLY want to do it within one find, try:
find . -name \*html.erb -exec sh -c 'echo html2haml "{}" "$(echo "{}" | sed 's/\.erb/\.haml/')"' \;
or, eliminating the two useless echo calls, the final command:
find . -name \*html.erb -exec sh -c 'html2haml "{}" "$(sed 's/\.erb/\.haml/'<<<"{}")"' \;
What about a loop?
find . -name "*.html.erb" | while read file
do
haml_file=${file%.erb}.haml
html2haml $file $haml_file
done
The ${var%glob} syntax expands the shell variable var with the shortest trailing portion that matches glob removed.
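A short sketch of the suffix-stripping expansion in isolation. The paths and variable names are made up for illustration; the second half contrasts % (shortest match) with %% (longest match).

```shell
#!/usr/bin/env bash
# Strip the trailing ".erb" and append ".haml".
file="./views/thing.html.erb"
haml_file="${file%.erb}.haml"
echo "$haml_file"

# % removes the shortest matching suffix, %% the longest.
name="archive.tar.gz"
short="${name%.*}"
long="${name%%.*}"
echo "$short $long"
```

The expansion happens inside the shell, so no sed or other external process is started.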
If you know that the filename ends with .foo, then you can use:
do_something "$filename" "${filename%.foo}.foo2"
(In the unlikely case that you really want to just put a 2 on the end, you could of course just use "${filename}2". But I assume the foo and foo2 are to be substituted with less similar strings.)
If you want to invoke do_something from find, your best bet would be to pass it only one filename (or, better, a number of filenames each of them representing a single operation). For example:
-- do_something.sh
#!/bin/bash
# This is the definition of what you want to do.
# It is called as `bar old_filename new_filename`
bar() {
# For example
mv "$1" "$2"
}
for filename in "$@"; do
bar "$filename" "${filename%.foo}.foo2"
done
-- find command:
find . -type f -name '*.foo' -exec do_something.sh {} +
If you really need to use sed (for something that you can't even do with the bash replace syntax, ${var/pattern/substitution}), then set up do_something as above, but replace the line inside the for loop with, for example:
bar "$filename" "$(sed -r 's/([^.]+)\.([^.]+)$/\2.\1/' <<<"$filename")"
Explanation: The above sed expression (gnu-specific) flips the last two extensions around, so it would change some.file.html.en into some.file.en.html. -r causes gnu sed to use extended regex format, which I find more readable. <<< is a bashism which expands the word following it and feeds it into stdin, somewhat similar to echo "$filename" | sed ... but without creating another subprocess.
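The extension flip can be tried on its own. This sketch uses -E rather than -r, on the assumption that a sed accepting -E is installed (both GNU and BSD sed do); the file name is the one from the explanation.

```shell
#!/usr/bin/env bash
filename="some.file.html.en"

# Here-string form: no extra echo process is needed.
flipped=$(sed -E 's/([^.]+)\.([^.]+)$/\2.\1/' <<<"$filename")

# Equivalent pipeline form, for shells without here-strings.
piped=$(echo "$filename" | sed -E 's/([^.]+)\.([^.]+)$/\2.\1/')

echo "$flipped"
```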
You can call your find like this:
find . -name "*.html.erb" -print0 -print0|xargs -0 -J % html2haml % | sed 's/\.erb$/.haml/'
This will result in executing:
html2haml thing.html.erb thing.html.haml
