Multiple commands with find and xargs, also accounting for special characters - bash

In OS X I am trying to combine the following two commands into a single command in a bash script, so that find operates once only. The files used by find contain spaces and special characters.
Command 1:
find /path -print0 | tr '\n' '\0' | xargs -0 chmod -N
Command 2:
find /path -print0 | tr '\n' '\0' | xargs -0 xattr -c
Both the above commands work.
I understand from 'Make xargs execute the command once for each line of input' that multiple commands can be executed through xargs with something like
find /path -print0 | xargs -0 -I '{}' sh -c 'command1 {} ; command2 {}'
However, my attempt to combine the commands with
find /path -print0 | tr '\n' '\0' | xargs -0 -I '{}' sh -c 'chmod -N {} ; xattr -c {}'
results in multiple errors for each file and folder in the /path, such as
chmod: Failed to clear ACL on file {}: No such file or directory
xattr: No such file: {}
sh: -c: line 0: syntax error near unexpected token `('
Is anyone able to help? Thank you in advance.

Try the following:
find /path -exec sh -c 'chmod -N "$@"; xattr -c "$@"' - {} +
-exec ... + passes (typically) all matching paths to the specified command at once, which is the most efficient approach.
Both chmod and xattr support multiple file operands, so this approach is feasible.
find properly retains argument boundaries when substituting the paths for {}, so it would even handle filenames with embedded newlines correctly.
Incidentally: I'm unclear on what the purpose of tr '\n' '\0' in your code is, given that you already output \0-separated paths thanks to -print0.
Note the - as the first (dummy) argument passed to sh -c, because the first argument will become $0.
As for the problem with your original command:
I can't explain the specific symptoms, but one problem is that you're not quoting the {} instances inside your shell command, which makes them subject to word splitting (breaks file paths with embedded spaces into multiple arguments).
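To see the dummy-$0 / "$@" mechanics in isolation, here is a harmless sketch: printf stands in for chmod -N and xattr -c, and the temporary directory is made up purely for the demonstration.

```shell
dir=$(mktemp -d)                       # hypothetical test directory
touch "$dir/a file" "$dir/b(1).txt"    # names with a space and parentheses
# '-' becomes $0 inside the script; the found paths arrive as "$@",
# one argument each, so spaces and special characters survive intact
find "$dir" -type f -exec sh -c 'printf "%s\n" "$@"' - {} +
rm -rf "$dir"
```

Each matched path prints on its own line, including the one with the embedded space.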


Error - script move files related to name file inside folder

Hi guys, I was building a script to organize the files related to my studies, but I don't understand why the prompt gives me this error
error 1.1
mv: cannot stat 'filefilefilefilefilefilefilefilefilefilefilefile.pdf'$'\n': File name too long
Does that mean I have to rename all the long file names? Is there another way to prevent this error?
The example below is the script that generated the error.
Script 1 - move all matched files that contain 'business' in their file name into auto_folder_business
mkdir -p /mnt/c/Users/alber/Desktop/testfileorder/auto_folder_business
ls /mnt/c/Users/alber/Desktop/testfileorder | egrep -i 'business.' | xargs -0 -I '{}' mv '{}' /mnt/c/Users/alber/Desktop/testfileorder/auto_folder_business
In the example above I also got this other error
error 1.2
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
which I solved by inserting the -0 option.
Despite this, I tried to generalize the process by writing this snippet:
Script 2 - move all matched files that contain the inserted keyword in their file name into the corresponding auto folder
#!/bin/sh
read -p "file to order: --> " fetching_keyword
mypath=/mnt/c/Users/alber/Desktop/testfileorder/auto_folder_$fetching_keyword/
echo $mypath
mkdir -p $mypath
ls /mnt/c/Users/alber/Desktop/testfileorder |
egrep -i "$fetching_keyword" |
xargs -0 -I {} mv -n {} $mypath
Here I also get another error; I think they are related:
error 2
mv: cannot stat 'Statino (1).pdf'$'\n''Statino (2).pdf'$'\n''Statino (3).pdf'$'\n''Statino (4).pdf'$'\n''Statino.pdf'$'\n''auto_folder_statino'$'\n': No such file or directory
xargs: unmatched single quote; by default quotes are special to xargs unless you use the -0 option
I don't understand what I'm doing wrong...
xargs -0 expects to read null-delimited items from standard input. ls and egrep both write newline-terminated lines, so their entire output is being read as a single item here.
The quickest fix is to use find -print0:
find /mnt/c/Users/alber/Desktop/testfileorder -iname "*${fetching_keyword}*" -print0 | \
xargs -0 -I {} mv -n {} "$mypath"
...but in this specific case, you might just want to use -exec at that point.
find /mnt/c/Users/alber/Desktop/testfileorder -iname "*${fetching_keyword}*" \
-exec mv -n {} "${mypath}" \;
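The mismatch described above (newline-delimited text fed to xargs -0) is easy to observe directly. This sketch just counts arguments with a tiny sh -c script:

```shell
# Newline-separated input read with -0: no NUL bytes, so the whole
# stream arrives as a single argument
printf 'a.txt\nb.txt' | xargs -0 sh -c 'echo "$# argument(s)"' sh
# → 1 argument(s)
# NUL-separated input read with -0: two arguments, as intended
printf 'a.txt\0b.txt' | xargs -0 sh -c 'echo "$# argument(s)"' sh
# → 2 argument(s)
```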

Using touch and sed within a find -ok command

I have some wav files. For each of those files I would like to create a new text file with the same name (obviously with the wav extension being replaced with txt).
I first tried this:
find . -name *.wav -exec 'touch $(echo '{}" | sed -r 's/[^.]+\$/txt/')" \;
which outputted
< touch $(echo {} | sed -r 's/[^.]+$/txt/') ... ./The_stranglers-Golden_brown.wav > ?
Then find complained after I hit y key with:
find: ‘touch $(echo ./music.wav | sed -r 's/[^.]+$/txt/')’: No such file or directory
I figured out I was using a pipe and actually needed a shell. I then ran:
find . -name *.wav -exec sh -c 'touch $(echo "'{}"\" | sed -r 's/[^.]+\$/txt/')" \;
Which did the job.
Actually, I do not really get what is being done internally, but I guess a shell is spawned for every file, right? I fear this is costly.
Then, what if I need to run this command on a large set of files and directories!?
Now, is there a way to do this more efficiently?
Basically I need to transform the current file's name and to feed touch command.
Thank you.
This find with shell parameter expansion will do the trick for you; you don't need sed at all.
find . -type f -name "*.wav" -exec sh -c 'x=$1; file="${x##*/}"; woe="${file%.*}"; touch "${woe}.txt"; ' sh {} \;
The idea is this:
x=$1 holds each entry returned by find.
file="${x##*/}" strips the path of the file, leaving only the last component (only filename.ext).
woe="${file%.*}" stores the name without its extension, and the new file is created from that name with a .txt extension.
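The expansions can be checked in isolation. This sketch uses a made-up path to show each step:

```shell
x='./music/song.wav'     # stands in for one entry from find (hypothetical path)
file="${x##*/}"          # strip everything up to the last '/'  → song.wav
woe="${file%.*}"         # strip the shortest trailing '.*'     → song
echo "${woe}.txt"        # the name touch will create           → song.txt
```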
EDIT
Parameter expansion frees us from the command-substitution $() sub-process and from sed.
After looking at sh man page, I figured out that the command up above could be simplified.
Synopsis -c [-aCefnuvxIimqVEbp] [+aCefnuvxIimqVEbp] [-o option_name] [+o option_name] command_string [command_name [argument ...]]
...
-c Read commands from the command_string operand instead of from the standard input. Special parameter 0 will be set from the command_name operand and the positional parameters ($1, $2, etc.) set from the remaining argument operands.
We can directly pass the file path, skipping the shell's name (which is useless inside the script anyway). So {} is passed as the command_name $0 which can be expanded right away.
We end up with a cleaner command.
find . -name '*.wav' -exec sh -c 'touch "${0%.*}.txt"' {} \;
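A quick way to sanity-check the simplified command, sketched with a throwaway directory:

```shell
dir=$(mktemp -d)                 # throwaway directory for the check
touch "$dir/a.wav" "$dir/b.wav"
# {} is passed as $0, so ${0%.*} is the matched path minus its extension
find "$dir" -name '*.wav' -exec sh -c 'touch "${0%.*}.txt"' {} \;
ls "$dir"                        # a.txt  a.wav  b.txt  b.wav
rm -rf "$dir"
```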

xargs command length limits

I am using jsonlint to lint a bunch of files in a directory (recursively). I wrote the following command:
find ./config/pages -name '*.json' -print0 | xargs -0I % sh -c 'echo Linting: %; jsonlint -V ./config/schema.json -q %;'
It works for most files but some files I get the following error:
Linting: ./LONG_FILE_NAME.json
fs.js:500
return binding.open(pathModule._makeLong(path), stringToFlags(flags), mode);
^
Error: ENOENT, no such file or directory '%'
It appears to fail for long filenames. Is there a way to fix this? Thanks.
Edit 1:
Found the problem.
-I replstr
Execute utility for each input line, replacing one or more occurrences
of replstr in up to replacements (or 5 if no -R flag is specified)
arguments to utility with the entire line of input. The resulting
arguments, after replacement is done, will not be allowed to grow
beyond 255 bytes; this is implemented by concatenating as much of the
argument containing replstr as possible, to the constructed arguments
to utility, up to 255 bytes. The 255 byte limit does not apply to
arguments to utility which do not contain replstr, and furthermore, no
replacement will be done on utility itself. Implies -x.
Edit 2:
Partial solution. Supports longer file names than before but still not as long as I need.
find ./config/pages -name '*.json' -print0 | xargs -0I % sh -c 'file=%; echo Linting: $file; jsonlint -V ./config/schema.json -q $file;'
On BSD like systems (e.g. Mac OS X)
If you happen to be on a mac or freebsd etc. your xargs implementation may support option -J which does not suffer from the argument size limits imposed on option -I.
Excerpt from the manpage:
-J replstr
If this option is specified, xargs will use the data read from standard input to replace the first occurrence of replstr instead of appending that data after all other arguments. This option will not affect how many arguments will be read from input (-n), or the size of the command(s) xargs will generate (-s). The option just moves where those arguments will be placed in the command(s) that are executed. The replstr must show up as a distinct argument to xargs. It will not be recognized if, for instance, it is in the middle of a quoted string. Furthermore, only the first occurrence of the replstr will be replaced. For example, the following command will copy the list of files and directories which start with an uppercase letter in the current directory to destdir:
/bin/ls -1d [A-Z]* | xargs -J % cp -Rp % destdir
If you need to refer to the replstr multiple times (*points up* TL;DR -J only replaces first occurrence) you can use this pattern:
echo hi | xargs -J{} sh -c 'arg=$0; echo "$arg $arg"' "{}"
=> hi hi
POSIX compliant method
The POSIX-compliant method of doing this is to use some other tool, e.g. sed, to construct the code you want to execute, and then use xargs just to specify the utility. When no replstr is used in xargs, the 255-byte limit does not apply. xargs POSIX spec
find . -type f -name '*.json' -print |
sed "s_^_-c 'file=\\\"_g;s_\$_\\\"; echo \\\"Definitely over 255 byte script..$(printf "a%.0s" {1..255}): \\\$file\\\"; wc -l \\\"\\\$file\\\"'_g" |
xargs -L1 sh
This of course largely defeats the purpose of xargs to begin with, but can still be used to leverage e.g. parallel execution using xargs -L1 -P10 sh which is quite widely supported, though not posix.
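A stripped-down version of the same idea, assuming GNU xargs' default quote processing: sed wraps each input line in a complete -c script, and xargs -L1 hands one line at a time to sh. (The single quotes survive because, without -0, xargs honors them; input containing quote characters would still break this.)

```shell
# sed wraps each line in a complete -c script; xargs -L1 hands one line
# per invocation to sh, relying on xargs' default quote processing
printf '%s\n' one two |
  sed "s/.*/-c 'echo line: &'/" |
  xargs -L1 sh
# → line: one
# → line: two
```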
Use -exec in find instead of piping to xargs.
find ./config/pages -name '*.json' -print0 -exec echo Linting: {} \; -exec jsonlint -V ./config/schema.json -q {} \;
The limit on xargs's command line length is imposed by the system limit ARG_MAX (it is not an environment variable). You can check it like this:
$ getconf ARG_MAX
2097152
Surprisingly, there doesn't seem to be a way to change it, barring kernel modification.
But even more surprisingly, xargs by default is capped to a much lower value, which you can increase with the -s option. Still, ARG_MAX is not the value you can pass to -s: according to man xargs, you need to subtract the size of the environment, plus some "headroom", no idea why. To find out the actual number, use the following command (alternatively, passing an arbitrarily big number to -s will produce a descriptive error):
$ xargs --show-limits 2>&1 | grep "limit on argument length (this system)"
POSIX upper limit on argument length (this system): 2092120
So you need to run … | xargs -s 2092120 …, e.g. with your command:
find ./config/pages -name '*.json' -print0 | xargs -s 2092120 -0I % sh -c 'echo Linting: %; jsonlint -V ./config/schema.json -q %;'

xargs with multiple commands

In the current directory, I'd like to print the filename and contents in it.
I can print filenames or contents separately by
find . | grep "file_for_print" | xargs echo
find . | grep "file_for_print" | xargs cat
but what I want is printing them together like this:
file1
line1 inside file1
line2 inside file1
file2
line1 inside file2
line2 inside file2
I read xargs with multiple commands as argument
and tried
find . | grep "file_for_print" | xargs -I % sh -c 'echo; cat;'
but it doesn't work.
I'm not familiar with xargs, so I don't know exactly what "-I % sh -c" means.
could anyone help me? thank you!
find . | grep "file_for_print" | xargs -I % sh -c 'echo %; cat %;' (OP was missing %s)
To start with, there is virtually no difference between:
find . | grep "file_for_print" | xargs echo
and
find . -name "file_for_print*"
except that the second one will not match filenames like this_is_not_the_file_for_print, and it will print the filenames one per line. It will also be a lot faster, because it doesn't need to generate and print the entire recursive directory structure just in order for grep to toss most of it away.
find . -name "file_for_print*"
is actually exactly the same as
find . -name "file_for_print*" -print
where the -print action prints each matched filename followed by a newline. If you don't provide find with any actions, it assumes you wanted -print. But it has more tricks up its sleeve than that. For example:
find . -name "file_for_print*" -exec cat {} \;
The -exec action causes find to execute the following command, up to the \;, replacing {} with each matching file name.
find does not limit itself to a single action. You can tell it to do however many you want. So:
find . -name "file_for_print*" -print -exec cat {} \;
will probably do pretty well what you want.
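A self-contained check of the combined -print/-exec, sketched with a temporary file:

```shell
dir=$(mktemp -d)                                   # hypothetical test tree
printf 'line1 inside file1\n' > "$dir/file_for_print1"
# -print emits the matched name, then -exec cat {} \; prints its contents
find "$dir" -name "file_for_print*" -print -exec cat {} \;
rm -rf "$dir"
```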
For lots more information on this very useful utility, type:
man find
or
info find
and read all about it.
Since it's not been said yet: -I % tells xargs to replace '%' with the arguments in the command you give it. The sh -c '...' just means run the commands '...' in a new shell.
So
xargs -I % sh -c 'echo %; cat %;'
will run echo [filename] followed by cat [filename] for every filename given to xargs. The echo and cat commands will be executed inside a different shell process but this usually doesn't matter. Your version didn't work because it was missing the % signs inside the command passed to xargs.
For what it's worth I would use this command to achieve the same thing:
find -name "*file_for_print*" | parallel 'echo {}; cat {};'
because it's simpler (parallel automatically uses {} as the substitution character and can take multiple commands by default).
In this specific case, each command is executed for each individual file anyway, so there's no advantage in using xargs. You may just append -exec twice to your 'find':
find . -name "*file_for_print*" -exec echo {} \; -exec cat {} \;
In this case, -print could be used instead of the first echo, as pointed out by rici, but this example shows the ability to execute two arbitrary commands with a single find.
What about writing your own bash function?
#!/bin/bash
myFunction() {
    while read -r file; do
        echo "$file"
        cat "$file"
    done
}
find . -name "file_for_print*" | myFunction

Make xargs execute the command once for each line of input

How can I make xargs execute the command exactly once for each line of input given?
Its default behavior is to chunk the lines and execute the command once, passing multiple lines to each instance.
From http://en.wikipedia.org/wiki/Xargs:
find /path -type f -print0 | xargs -0 rm
In this example, find feeds the input of xargs with a long list of file names. xargs then splits this list into sublists and calls rm once for every sublist. This is more efficient than this functionally equivalent version:
find /path -type f -exec rm '{}' \;
I know that find has the "exec" flag. I am just quoting an illustrative example from another resource.
The following will only work if you do not have spaces in your input:
xargs -L 1
xargs --max-lines=1 # synonym for the -L option
from the man page:
-L max-lines
Use at most max-lines nonblank input lines per command line.
Trailing blanks cause an input line to be logically continued on
the next input line. Implies -x.
It seems to me all existing answers on this page are wrong, including the one marked as correct. That stems from the fact that the question is ambiguously worded.
Summary: If you want to execute the command "exactly once for each line of input," passing the entire line (without the newline) to the command as a single argument, then this is the best UNIX-compatible way to do it:
... | tr '\n' '\0' | xargs -0 -n1 ...
If you are using GNU xargs and don't need to be compatible with all other UNIX's (FreeBSD, Mac OS X, etc.) then you can use the GNU-specific option -d:
... | xargs -d\\n -n1 ...
Now for the long explanation…
There are two issues to take into account when using xargs:
how does it split the input into "arguments"; and
how many arguments to pass the child command at a time.
To test xargs' behavior, we need a utility that shows how many times it is executed and with how many arguments. I don't know if there is a standard utility to do that, but we can code it quite easily in bash:
#!/bin/bash
echo -n "-> "; for a in "$@"; do echo -n "\"$a\" "; done; echo
Assuming you save it as show in your current directory and make it executable, here is how it works:
$ ./show one two 'three and four'
-> "one" "two" "three and four"
Now, if the original question is really about point 2. above (as I think it is, after reading it a few times over) and it is to be read like this (changes in bold):
How can I make xargs execute the command exactly once for each argument of input given? Its default behavior is to chunk the input into arguments and execute the command as few times as possible, passing multiple arguments to each instance.
then the answer is -n 1.
Let's compare xargs' default behavior, which splits the input around whitespace and calls the command as few times as possible:
$ echo one two 'three and four' | xargs ./show
-> "one" "two" "three" "and" "four"
and its behavior with -n 1:
$ echo one two 'three and four' | xargs -n 1 ./show
-> "one"
-> "two"
-> "three"
-> "and"
-> "four"
If, on the other hand, the original question was about point 1. input splitting and it was to be read like this (many people coming here seem to think that's the case, or are confusing the two issues):
How can I make xargs execute the command with exactly one argument for each line of input given? Its default behavior is to chunk the lines around whitespace.
then the answer is more subtle.
One would think that -L 1 could be of help, but it turns out it doesn't change argument parsing. It only executes the command once for each input line, with as many arguments as were there on that input line:
$ echo $'one\ntwo\nthree and four' | xargs -L 1 ./show
-> "one"
-> "two"
-> "three" "and" "four"
Not only that, but if a line ends with whitespace, it is appended to the next:
$ echo $'one \ntwo\nthree and four' | xargs -L 1 ./show
-> "one" "two"
-> "three" "and" "four"
Clearly, -L is not about changing the way xargs splits the input into arguments.
The only option that does so in a cross-platform fashion (excluding GNU extensions) is -0, which splits the input around NUL bytes.
Then, it's just a matter of translating newlines to NUL with the help of tr:
$ echo $'one \ntwo\nthree and four' | tr '\n' '\0' | xargs -0 ./show
-> "one " "two" "three and four"
Now the argument parsing looks all right, including the trailing whitespace.
Finally, if you combine this technique with -n 1, you get exactly one command execution per input line, whatever input you have, which may be yet another way to look at the original question (possibly the most intuitive, given the title):
$ echo $'one \ntwo\nthree and four' | tr '\n' '\0' | xargs -0 -n1 ./show
-> "one "
-> "two"
-> "three and four"
As mentioned above, if you are using GNU xargs you can replace the tr with the GNU-specific option -d:
$ echo $'one \ntwo\nthree and four' | xargs -d\\n -n1 ./show
-> "one "
-> "two"
-> "three and four"
If you want to run the command for every line (i.e. result) coming from find, then what do you need the xargs for?
Try:
find path -type f -exec your-command {} \;
where the literal {} gets substituted by the filename and the literal \; is needed for find to know that the custom command ends there.
EDIT:
(after the edit of your question clarifying that you know about -exec)
From man xargs:
-L max-lines
Use at most max-lines nonblank input lines per command line. Trailing
blanks cause an input line to be logically continued on the next input line.
Implies -x.
Note that filenames ending in blanks would cause you trouble if you use xargs:
$ mkdir /tmp/bax; cd /tmp/bax
$ touch a\ b c\ c
$ find . -type f -print | xargs -L1 wc -l
0 ./c
0 ./c
0 total
0 ./b
wc: ./a: No such file or directory
So if you don't care about the -exec option, you better use -print0 and -0:
$ find . -type f -print0 | xargs -0L1 wc -l
0 ./c
0 ./c
0 ./b
0 ./a
How can I make xargs execute the command exactly once for each line of input given?
-L 1 is the simple solution but it does not work if any of the files contain spaces in them. This is a key function of find's -print0 argument – to separate the arguments by '\0' character instead of whitespace. Here's an example:
echo "file with space.txt" | xargs -L 1 ls
ls: file: No such file or directory
ls: with: No such file or directory
ls: space.txt: No such file or directory
A better solution is to use tr to convert newlines to null (\0) characters, and then use the xargs -0 argument. Here's an example:
echo "file with space.txt" | tr '\n' '\0' | xargs -0 ls
file with space.txt
If you then need to limit the number of calls you can use the -n 1 argument to make one call to the program for each input:
echo "file with space.txt" | tr '\n' '\0' | xargs -0 -n 1 ls
This also allows you to filter the output of find before converting the breaks into nulls.
find . -name \*.xml | grep -v /target/ | tr '\n' '\0' | xargs -0 tar -cf xml.tar
These two ways also work, and will work for other commands that do not use find! (In GNU xargs, -i is a deprecated synonym for -I '{}'.)
xargs -I '{}' rm '{}'
xargs -i rm '{}'
example use case:
find . -name "*.pyc" | xargs -i rm '{}'
will delete all pyc files under this directory even if the pyc files contain spaces.
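A quick check that -I '{}' survives spaces in names, sketched with a throwaway directory. Note it still relies on newline-separated input, so names with embedded newlines would break:

```shell
dir=$(mktemp -d)                 # throwaway directory
touch "$dir/has space.pyc"
# -I makes xargs take each whole input line as one argument, so the
# embedded space survives intact
find "$dir" -name '*.pyc' | xargs -I '{}' rm '{}'
ls "$dir"                        # empty: the file was removed
rm -rf "$dir"
```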
Another alternative...
find /path -type f | while IFS= read -r ln; do echo "processing $ln"; done
find path -type f | xargs -L1 command
is all you need.
The following command will find all the files (-type f) in /path and then copy them using cp to the current folder. Note the use of -I % to specify a placeholder character in the cp command line so that arguments can be placed after the file name.
find /path -type f -print0 | xargs -0 -I % cp % .
Tested with xargs (GNU findutils) 4.4.0
You can limit the number of lines, or arguments (if there are spaces between each argument) using the --max-lines or --max-args flags, respectively.
-L max-lines
Use at most max-lines nonblank input lines per command line. Trailing blanks cause an input line to be logically continued on the next input
line. Implies -x.
--max-lines[=max-lines], -l[max-lines]
Synonym for the -L option. Unlike -L, the max-lines argument is optional. If max-lines is not specified, it defaults to one. The -l option
is deprecated since the POSIX standard specifies -L instead.
--max-args=max-args, -n max-args
Use at most max-args arguments per command line. Fewer than max-args arguments will be used if the size (see the -s option) is exceeded,
unless the -x option is given, in which case xargs will exit.
@Draemon's answer seems to be right with "-0", even with spaces in the file name.
I was trying the xargs command and I found that "-0" works perfectly with "-L". Even spaces are handled (if the input was null-terminated). The following is an example:
#touch "file with space"
#touch "file1"
#touch "file2"
The following will split the nulls and execute the command on each argument in the list :
#find . -name 'file*' -print0 | xargs -0 -L1
./file with space
./file1
./file2
So, when used with "-0", -L1 will execute the command once per null-terminated argument. To see the difference, try:
#find . -name 'file*' -print0 | xargs -0 | xargs -L1
./file with space ./file1 ./file2
Even this will execute only once:
#find . -name 'file*' -print0 | xargs -0 | xargs -0 -L1
./file with space ./file1 ./file2
The command executes once, because -L no longer splits on the null bytes; you need to provide both "-0" and "-L" for this to work.
It seems I don't have enough reputation to add a comment to Tobia's answer above, so I am adding this "answer" to help those of us wanting to experiment with xargs the same way on Windows.
Here is a Windows batch file that does the same thing as Tobia's quickly coded "show" script:
#echo off
REM
REM cool trick of using "set" to echo without new line
REM (from: http://www.psteiner.com/2012/05/windows-batch-echo-without-new-line.html)
REM
if "%~1" == "" (
exit /b
)
<nul set /p=Args: "%~1"
shift
:start
if not "%~1" == "" (
<nul set /p=, "%~1"
shift
goto start
)
echo.
In your example, the point of piping the output of find to xargs is that the standard behavior of find's -exec option is to execute the command once for each found file. If you're using find, and you want its standard behavior, then the answer is simple - don't use xargs to begin with.
Execute the ant task clean-all on every build.xml in the current folder or sub-folders:
find . -name 'build.xml' -exec ant -f {} clean-all \;
