I would like to do something like the following:
find ./testsrc -type f -print0 | xargs -0 -P4 -n10 -I{} cp --parents {} <dest>/
The cp here is just an example of a command that expects something after the arguments supplied by xargs. I know that in this case I could do | xargs -0 -P4 -n10 cp --parents -t <dest>/, but some commands cannot be rearranged like that.
Here -n conflicts with -I.
How can I achieve the same effect with -I{}?
Looks like you're using GNU (Linux) xargs, which gives the warning below (--max-args is the same as -n):
xargs: warning: options --max-args and --replace/-I/-i are mutually exclusive, ignoring previous --max-args value
As -I will only consume a single element from the input, to keep -n10 you'll need a different approach. The command line below works by exploiting the shell's "$@", letting -n10 pass all of those args to an sh that wraps the cp run:
find ./testsrc -type f -print0 | xargs -0 -P4 -n10 sh -c 'cp --parents "$@" <dest>/' --
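To verify the batching, you can swap in a command that just reports what each invocation received (a throwaway check reusing the same -- placeholder for $0):
find ./testsrc -type f -print0 | xargs -0 -P4 -n10 sh -c 'echo "got $# args:" "$@"' --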
Hope it helps :)
xargs cannot do this directly. GNU Parallel, however, can:
find ./testsrc -type f -print0 |
parallel -0 -P4 -n10 -I{} cp --parents {} <dest>/
(-I{} is redundant, -P4 is the default if you have 4 CPU threads).
How to create symlinks in a single directory when:
The common way fails:
ln -s /readonlyShare/mydataset/*.mrc .
-bash: /bin/ln: Argument list too long
The find command doesn't allow the following syntax:
find /readonlyShare/mydataset -maxdepth 1 -name '*.mrc' -exec ln -s {} . +
Forking wildly (one process per file) takes hours to complete:
find /readonlyShare/mydataset -maxdepth 1 -name '*.mrc' -exec ln -s {} . ';'
find readonlyShare/mydataset -name '*.mrc' -maxdepth 1 -exec ln -s '{}' '+' .
or if you prefer xargs:
find readonlyShare/mydataset -name '*.mrc' -maxdepth 1 -print0 |
xargs -0 -P0 sh -c 'ln -s "$@" .' sh
If you are using BSD xargs instead of GNU xargs, it can be simpler:
find readonlyShare/mydataset -name '*.mrc' -maxdepth 1 -print0 |
xargs -0 -J# -P0 ln -s # .
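On BSD/macOS you can see where -J splices in the batch with a quick test (GNU xargs has no -J):
printf 'a\0b\0c' | xargs -0 -J% echo before % after
# prints: before a b c after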
Why '{}' '+'?
Quoted from man find:
-exec utility [argument ...] {} +
Same as -exec, except that “{}” is replaced with as many pathnames as possible for each invocation of utility. This behaviour is similar to that of xargs(1). The primary always returns true; if at least one invocation of utility returns a non-zero exit status, find will return a non-zero exit status.
find is good at splitting a large number of arguments:
find readonlyShare/mydataset -name '*.mrc' -maxdepth 1 -exec ruby -e 'pp ARGV.size' '{}' '+'
15925
15924
15925
15927
1835
Why not xargs -I?
It is inefficient and slow because -I executes the utility once per argument. For example:
printf 'foo\0bar' | xargs -0 -I# ruby -e 'pp ARGV' #
["foo"]
["bar"]
printf 'foo\0bar' | xargs -0 ruby -e 'pp ARGV'
["foo", "bar"]
xargs is also good at splitting a large number of arguments:
seq 65536 | tr '\n' '\0' | xargs -0 ruby -e 'pp ARGV.size'
5000
5000
5000
5000
5000
5000
5000
5000
5000
5000
5000
5000
5000
536
Why sh -c?
Only BSD xargs has the -J flag to put arguments in the middle of a command. For GNU xargs, we need the combination of sh -c and "$@" to do the same thing.
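A quick way to convince yourself that "$@" lands in the middle of the command (the trailing sh is just the $0 placeholder):
printf 'a\0b\0c' | xargs -0 sh -c 'echo before "$@" after' sh
# prints: before a b c after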
find -exec vs find | xargs
It depends, but I would suggest using xargs when you want to utilize all your CPUs: xargs can run the utility in parallel via -P, while find can't.
I was in a rush when I needed this, so I didn't explore all the possibilities, but I worked out something in the meantime.
Thanks to @WeihangJian's answer I now know that find ... | xargs -I {} ... is as bad as find ... -exec ... {} ';'.
A correct answer to my question would be:
find /readonlyShare/mydataset -maxdepth 1 -name '*.mrc' \
-exec sh -c 'ln -s "$0" "$@" .' {} +
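An equivalent form passes an explicit placeholder for $0 so that every filename lands in "$@" (same technique, just more uniform):
find /readonlyShare/mydataset -maxdepth 1 -name '*.mrc' \
-exec sh -c 'ln -s "$@" .' sh {} +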
I want to run a loop over all files of a particular extension in a directory:
for i in *.bam
do
...
done
However, if the command that I run inside the loop creates a temporary file of the same extension, the loop tries to process this new tmp file as well. This is unwanted. So, I thought the following would solve the problem: first list all the *.bam files in the directory, save that list to a variable, and then loop over this saved list:
list_bam=$(for i in *.bam; do echo $i; done)
for i in $list_bam
do
...
done
To my surprise, this runs into the same problem! Could someone please explain the logic behind this and how to fix it so that the loop only processes the pre-existing .bam files?
Instead of a loop, you can use find and xargs:
find . -maxdepth 1 -type f -name "*.bam" -print0 | \
xargs -0 -I{} bash -c 'echo "{}" > "{}.new.bam"'
or, more safely, passing the filename as a positional parameter so it cannot be interpreted as shell code:
find . -maxdepth 1 -type f -name "*.bam" -print0 | \
xargs -0 -I{} bash -c 'echo "$1" > "$1.new.bam"' -- {}
example:
$ touch a.bam b.bam
$ ls
a.bam b.bam
$ find . -maxdepth 1 -type f -name "*.bam" -print0 | \
xargs -0 -I{} bash -c 'echo "{}" > "{}.new.bam"'
$ ls
a.bam a.bam.new.bam b.bam b.bam.new.bam
You could perhaps make sure that the globbing expression *.bam is expanded only once, by capturing the file list up front with something like:
list_bam=$(ls *.bam)
...
...but, as noted by @glenn in the comments, this is a bad idea.
Something similar could instead be done using a find ... -print0 | xargs -0 ... command template.
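A minimal sketch of that template, where your_command is a stand-in for whatever your loop body does:
find . -maxdepth 1 -type f -name '*.bam' -print0 |
xargs -0 -I{} bash -c 'your_command "$1"' -- {}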
Based on the top answer in Running multiple commands with xargs, I'm trying to use find / xargs to operate on multiple files. Why is the first file, 1.txt, missing in the for loop?
$ ls
1.txt 2.txt 3.txt
$ find . -name "*.txt" -print0 | xargs -0
./1.txt ./2.txt ./3.txt
$ find . -name "*.txt" -print0 | xargs -0 sh -c 'for arg do echo "$arg"; done'
./2.txt
./3.txt
Why do you insist on using xargs? You can do the following as well:
while read -r file; do
    echo "$file"
done <<< "$(find . -name "*.txt")"
Because this is executed in the same shell, changing variables inside the loop works. Otherwise you'd get a sub-shell, in which that doesn't work.
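If the find output may be large, or names have leading whitespace, a bash process substitution variant behaves the same way while still running the loop in the current shell:
while IFS= read -r file; do
    echo "$file"
done < <(find . -name "*.txt")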
When you use your for loop in a script example.sh, the call example.sh var1 var2 var3 puts var1 into $1, not into example.sh's name. With sh -c, however, the first argument after the command string fills $0, and for arg iterates over "$@", which starts at $1. That is why ./1.txt disappears.
When you want to process one file for each command, use the xargs option -L:
find . -name "*.txt" -print0 | xargs -0 -L1 sh -c 'echo "$0"'
# or for a simple case
find . -name "*.txt" -print0 | xargs -0 -L1 echo
I ran across this while having the same issue. You need the extra _ at the end as a placeholder for $0 in the shell that xargs spawns:
$ find . -name "*.txt" -print0 | xargs -0 sh -c 'for arg do echo "$arg"; done' _
I want to merge the output of three logwatch runs and pipe the result through sendmail.
Example:
#!/bin/sh
LOG_DIR="/var/log/remote-hosts"
MAIL_TO="me@email.com"
sh -c "logwatch && find ${LOG_DIR} -type d -name \"ip*\" -print0 | xargs -0 -I{} sh -c 'logwatch --logdir {} --hostname $(basename {})'" |
sed '1!b;s/^/To: '${MAIL_TO}'\nSubject: Logwatch report\n\n/' | sendmail -t
First, logwatch is executed on the /var/log folder, and then I would like to traverse the /var/log/remote-hosts subfolders (ip-10-0-0-38 and ip-10-0-0-39) with find and also run logwatch on them.
The merged output will be sent through sendmail. However, I would like to replace the hostname with the basename of the /var/log/remote-hosts subfolder, so instead of /var/log/remote-hosts/ip-10-0-0-38 I will have just ip-10-0-0-38.
Unfortunately I don't know how to do the basename part correctly. Any help? Thanks in advance.
Don't use sh -c for grouping statements, use (...):
(logwatch && find ${LOG_DIR} -type d -name "ip*" -print0 | xargs -0 -I{} sh -c 'logwatch --logdir {} --hostname $(basename {})') |
sed '1!b;s/^/To: '${MAIL_TO}'\nSubject: Logwatch report\n\n/' | sendmail -t
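If you'd rather not substitute {} into the quoted script (fragile when a directory name contains special characters), a sketch that passes the path as a positional parameter instead:
find "${LOG_DIR}" -type d -name "ip*" -print0 |
xargs -0 -I{} sh -c 'logwatch --logdir "$1" --hostname "$(basename "$1")"' sh {}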
find . -name "filename including space" -print0 | xargs -0 ls -aldF > log.txt
find . -name "filename including space" -print0 | xargs -0 rm -rdf
Is it possible to combine these two commands into one so that only 1 find will be done instead of 2?
I know that with xargs -I there may be ways to do it, but those can lead to errors when processing filenames that include spaces. Any guidance is much appreciated.
find . -name "filename including space" -print0 |
xargs -0 -I '{}' sh -c 'ls -aldF {} >> log.txt; rm -rdf {}'
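The same pair of commands without substituting {} into the script text (safer if a name contains quotes), passing each file as a positional parameter:
find . -name "filename including space" -print0 |
xargs -0 -I{} sh -c 'ls -aldF "$1" >> log.txt; rm -rdf "$1"' sh {}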
Ran across this just now, and we can invoke the shell less often:
find . -name "filename including space" -print0 |
xargs -0 sh -c '
for file; do
ls -aldF "$file" >> log.txt
rm -rdf "$file"
done
' sh
The trailing "sh" becomes $0 in the shell. xargs provides the files (returrned from find) as command line parameters to the shell: we iterate over them with the for loop.
If you just want to avoid running find twice, you can tee right after the find, saving its output to a file, then run the two commands like this:
find . -name "filename including space" -print0 | tee my_teed_file | xargs -0 ls -aldF > log.txt
cat my_teed_file | xargs -0 rm -rdf
Another way to accomplish the same thing (if indeed it's what you want) is to store the output of the find in a variable, supposing it's not TB of data. Note that a shell variable cannot hold the NUL bytes that -print0 emits, so use newline delimiters here (this handles spaces, but not filenames containing newlines; -d is GNU xargs):
founddata=$(find . -name "filename including space")
echo "$founddata" | xargs -d '\n' ls -aldF > log.txt
echo "$founddata" | xargs -d '\n' rm -rdf
I believe the answers here have already given the right ways to solve this problem. I tried Jonathan's two solutions and Glenn's approach, all of which worked great on my Mac OS X. The method from mouviciel did not work on my OS, maybe for configuration reasons, and I think it's similar to Jonathan's second method (I may be wrong).
As mentioned in the comments on Glenn's method, a little tweak is needed. So here is the command I tried, which worked perfectly, FYI:
find . -name "filename including space" -print0 |
xargs -0 -I '{}' sh -c 'ls -aldF {} | tee -a log.txt ; rm -rdf {}'
Or better as suggested by Glenn:
find . -name "filename including space" -print0 |
xargs -0 -I '{}' sh -c 'ls -aldF {} >> log.txt ; rm -rdf {}'
As long as you do not have newlines in your filenames, you do not need -print0 for GNU Parallel:
find . -name "My brother's 12\" records" | parallel ls {}\; rm -rdf {} >log.txt
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ
Just a variation of the xargs approach without that horrible -print0 and xargs -0; this is how I would do it:
ls -1 *.txt | xargs --delimiter "\n" --replace={} sh -c 'cat {}; echo'
Footnotes:
Yes, I know newlines can appear in filenames, but who in their right mind would do that?
There are short options for xargs but for the reader's understanding I've used the long ones.
I would use ls -1 when I want non-recursive behavior rather than find -maxdepth 1 -iname "*.txt" which is a bit more verbose.
You can execute multiple commands after find using for instead of xargs:
IFS=$'\n'
for F in `find . -name "filename including space"`
do
    ls -aldF "$F" >> log.txt
    rm -rdf "$F"
done
The IFS defines the Internal Field Separator, which defaults to <space><tab><newline>. If your filenames may contain spaces, it is better to redefine it as above.
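If the rest of the script relies on the default splitting, save and restore IFS around the loop, for example:
OLDIFS=$IFS
IFS=$'\n'
# ... the for loop above ...
IFS=$OLDIFS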
I'm late to the party, but there is one more solution that wasn't covered here: user-defined functions. Putting multiple instructions on one line is unwieldy, and can be hard to read/maintain. The for loop above avoids that, but there is the possibility of exceeding the command line length.
Here's another way (untested).
function processFiles {
    ls -aldF "$@"
    rm -rdf "$@"
}
export -f processFiles
find . -name "filename including space" -print0 \
| xargs -0 bash -c 'processFiles "$@"' dummyArg > log.txt
This is pretty straightforward except for the "dummyArg" which gave me plenty of grief. When running bash in this way, the arguments are read into
"$0" "$1" "$2" ....
instead of the expected
"$1" "$2" "$3" ....
Since processFiles() expects its first argument in "$1", we have to insert a dummy value into "$0".
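You can observe the $0 behavior directly; the first word after the command string fills $0:
bash -c 'echo "0=$0 1=$1 2=$2"' dummyArg a b
# prints: 0=dummyArg 1=a 2=b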
Footnotes:
I am using some elements of bash syntax (e.g. "export -f"), but I believe this will adapt to other shells.
The first time I tried this, I didn't add a dummy argument. Instead I added "$0" to the argument lists inside my function (e.g. ls -aldF "$0" "$@"). Bad idea.
Aside from stylistic issues, it breaks when the find command returns nothing. In that case, $0 is set to "bash". Using the dummy argument instead avoids all of this.
Another solution:
find . -name "filename including space" -print0 \
| xargs -0 -I FOUND sh -c 'ls -aldF "$1" >> log.txt; rm -rdf "$1"' sh FOUND