help using xargs to pass multiple filenames to shell script - shell

Can someone show me how to use xargs properly? Or if not xargs, what unix command should I use?
I basically want to supply more than one file name for the <localfile> input parameter.
For example:
1. use `find` to get a list of files
2. use each filename as input to shell script
Usage of shell script:
test.sh <localdir> <localfile> <projectname>
My attempt, but not working:
find /share1/test -name '*.dat' | xargs ./test.sh /staging/data/project/ '{}' projectZ \;
Edit:
After some input from everybody and trying -exec, I am finding that my <localfile> filename input from find also gives me the full path, /path/filename.dat, instead of filename.dat. Is there a way to get the basename from find? I think this will have to be a separate question.

I'd just use find -exec here:
% find /share1/test -name '*.dat' -exec ./test.sh /staging/data/project/ {} projectZ \;
This will invoke ./test.sh with your three arguments once for each .dat file under /share1/test.
xargs would pack up all of these filenames and pass them into one invocation of ./test.sh, which doesn't look like your desired behaviour.

If you want to execute the shell script for each file (as opposed to execute in only once on the whole list of files), you may want to use find -exec:
find /share1/test -name '*.dat' -exec ./test.sh /staging/data/project/ '{}' projectZ \;
Remember:
find -exec is for when you want to run a command on one file, for each file.
xargs instead runs the command as few times as possible, packing as many filenames as will fit onto each command line.
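To see the difference for yourself, here is a minimal sketch; count_args.sh is a hypothetical one-line script containing echo "called with $# argument(s)":
find /share1/test -name '*.dat' -exec ./count_args.sh {} \;   # prints "called with 1 argument(s)" once per file
find /share1/test -name '*.dat' | xargs ./count_args.sh       # prints a single line with the total count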

xargs stuffs as many files as it can onto the end of the command line.
Do you want to execute the script on one file at a time, or on all files at once? For one at a time, use find's -exec, which it looks like you're already using the syntax for, and which xargs doesn't use:
find /share1/test -name '*.dat' -exec ./test.sh /staging/data/project/ '{}' projectZ \;

xargs does not have to combine arguments; that's just its default behavior. This uses xargs properly to execute the command once per file, as intended:
find /share1/test -name '*.dat' -print0 | xargs -0 -I'{}' ./test.sh /staging/data/project/ '{}' projectZ
When piping find to xargs, NUL termination is usually preferred, so I recommend appending the -print0 option to find. You must then add -0 to xargs so that it expects NUL-terminated arguments. This ensures proper handling of filenames with embedded whitespace or other special characters. It's not strictly POSIX, but it is well supported; you can always drop the NUL-terminating options if your commands lack support for them.
Remember that while find's purpose is finding files, xargs is much more generic: I often use xargs to process non-filename arguments.
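For instance, a trivial sketch with non-filename input, using -n to cap how many arguments each invocation receives:
seq 1 5 | xargs -n 2 echo batch
# batch 1 2
# batch 3 4
# batch 5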

Related

awk for many compressed files

The following command calculates the GC content for each fastq file
identified with the find command. Briefly, a fastq file holds a large number of datapoints, each recorded as 4 lines of information, where the second line (the only one I'm interested in) contains only the characters ATGC. For testing, (identical) example files can be found here.
find . -iname '*.fastq' -exec awk '(NR%4==2) {N1+=length($0);gsub(/[AT]/,"");N2+=length($0);}END{print N2/N1;}' "{}" \;
How can I modify/rewrite it into a one-liner that works on gzipped fastq files? I need the regex option currently used with find.
find's -exec can be used to invoke a single program (and pass it arguments). The challenge here is that two commands (zcat | awk) need to be combined with a pipe. Two possible paths: construct a shell command, OR use the more flexible xargs.
# Using 'sh -c' to construct the pipeline
find . -iname '*.fastq.gz' -exec sh -c "zcat {} | awk '(NR%4==2) \
{N1+=length(\$0);gsub(/[AT]/,\"\");N2+=length(\$0);}END{print N2/N1;}'" \;
# OR, using process substitution
find . -iname '*.fastq.gz' -exec bash -c "awk '(NR%4==2) \
{N1+=length(\$0);gsub(/[AT]/,\"\");N2+=length(\$0);}END{print N2/N1;}' <(zcat {})" \;
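A variant worth noting (a sketch, assuming the awk program above has been saved to a file, here called gc.awk): passing the filename as a positional parameter instead of splicing {} into the command string avoids surprises with unusual filenames:
find . -iname '*.fastq.gz' -exec sh -c 'zcat "$1" | awk -f gc.awk' sh {} \;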
There are many references to find/xargs on Stack Overflow.
If, as you say, you have many large files, I would suggest processing them in parallel. If the issue is that you are having problems quoting your awk, I would suggest putting your script in a separate file called, say, script.awk, like this:
(NR%4==2) {N1+=length($0);gsub(/[AT]/,"");N2+=length($0);}END{print N2/N1;}
Now you can simply process them all in parallel with GNU Parallel:
find . -iname \*fastq.gz -print0 | parallel -0 gzcat {} \| awk -f ./script.awk

Bash: Fail to use find -exec

When using scp or rsync I often run into the 'Argument list too long' error. When I have to mv or rm, I have no problem using find and xargs, but I fail to understand how to use find with -exec despite all the SE posts on the subject. Consider the following issue...
I tried
$ scp /Path/to/B* Me@137.92.4.152:/Path/to/
-bash: /usr/bin/scp: Argument list too long
So I tried
$ find . -name "/Path/to/B*" -exec scp "{}" Me@137.92.4.152:/Path/to/ '\;'
find: -exec: no terminating ";" or "+"
so I tried
$ find . -name "/Path/to/B*" -exec scp "{}" Me@137.92.4.152:/Path/to/ ';'
find: ./.gnupg: Permission denied
find: ./.subversion/auth: Permission denied
So I tried
$ sudo find . -name "/Path/to/B*" -exec scp "{}" Me@137.92.4.152:/Path/to/ ';'
and nothing happens once I enter my password
I am on Mac OSX version 10.11.3, Terminal version 2.6.1
R. Saban's helpful answer solves your primary problem:
-name only accepts a filename pattern, not a path pattern.
Alternatively, you could simply use the -path primary instead of the -name primary.
As for using as few invocations of scp as possible - each of which requires specifying a password by default:
As an alternative, consider bypassing the use of scp altogether, as suggested in Eric Renouf's helpful answer.
While find's -exec primary allows the terminator + in lieu of ; (which must be passed as ';' or \; to prevent the shell from interpreting ; as a command terminator), so as to pass as many filenames as will fit on a single command line (a built-in xargs, in a manner of speaking), that is NOT an option here, because use of + requires that the {} placeholder come last on the command line, immediately before the +.
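For illustration (a sketch; echo stands in for any command):
find . -name 'B*' -exec echo {} +               # OK: {} comes last, immediately before +
# find . -name 'B*' -exec scp {} host:/dest/ +  # rejected by find: {} is not the final argument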
However, since you're on macOS, you can use BSD xargs's nonstandard -J option for placing the placeholder anywhere on the command line, while still passing as many arguments as possible at once (using BSD find's nonstandard -print0 option in combination with xargs's nonstandard -0 option ensures that all filenames are passed as-is, even if they have embedded spaces, for instance):
find . -path "/Path/to/B*" -print0 | xargs -0 -J {} scp {} Me@137.92.4.152:/Path/to/
Now you will be prompted at most a few times: one prompt per batch of arguments, with batches sized to fit the maximum command-line length so as to use the fewest calls possible.
EDIT after your update:
find "/Path/to" -maxdepth 1 -name "B*" -exec scp {} Me#137.92.4.152:/Path/to/ \;
A solution that wouldn't require multiple scp connections (and therefore password entries) would be to tar on one side and untar on the other like:
find /Path/to -maxdepth 1 -name 'B*' -print0 | tar -c --null -T - | ssh Me@137.92.4.152 tar -x -C /Path/to
assuming your version of find supports -print0 and the like. It works by printing a NUL-terminated list of files from find and telling tar to read its list of files from stdin (-T -), treating the list as NUL-terminated (--null), and to create a new archive (-c). By default, tar writes to stdout.
We then pipe that archive to an ssh command to the target host, which reads the output of the previous command on its stdin; there we use tar to extract (-x) the archive into the given directory (-C /Path/to).

When to use xargs when piping?

I am new to bash and I am trying to understand the use of xargs, which is still not clear to me. For example:
history | grep ls
Here I am searching for the command ls in my history. In this command, I did not use xargs and it worked fine.
find /etc -name "*.txt" | xargs ls -l
In this one, I had to use xargs, but I still cannot understand the difference, and I am not able to decide correctly when to use xargs and when not.
xargs can be used when you need to take the output from one command and use it as an argument to another. In your first example, grep takes the data from standard input, rather than as an argument. So, xargs is not needed.
xargs takes data from standard input and executes a command. By default, the data is appended to the end of the command as an argument. It can be inserted anywhere however, using a placeholder for the input. The traditional placeholder is {}; using that, your example command might then be written as:
find /etc -name "*.txt" | xargs -I {} ls -l {}
If you have 3 text files in /etc you'll get a full directory listing of each. Of course, you could just as easily have written ls -l /etc/*.txt and saved the trouble.
Another example lets you rename those files, and requires the placeholder {} to be used twice.
find /etc -name "*.txt" | xargs -I {} mv {} {}.bak
These are both bad examples, and will break as soon as you have a filename containing whitespace. You can work around that by telling find to separate filenames with a null character.
find /etc -name "*.txt" -print0 | xargs -0 -I {} mv {} {}.bak
My personal opinion is that there are almost always alternatives to using xargs (such as the -exec argument to find) and you will be better served by learning those.
When you use piping without xargs, the actual data is fed into the next command. On the other hand, when using piping with xargs, the actual data is viewed as a parameter to the next command. To give a concrete example, say you have a folder with a.txt and b.txt. a.txt contains just a single line 'hello world!', and b.txt is just empty.
If you do
ls | grep txt
you would end up getting the output:
a.txt
b.txt
Yet, if you do
ls | xargs grep txt
you would get nothing since neither file a.txt nor b.txt contains the word txt.
If the command is
ls | xargs grep hello
you would get:
hello world!
That's because with xargs, the two filenames given by ls are passed to grep as arguments, rather than the actual content.
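In other words, that pipeline is roughly equivalent to running:
grep hello a.txt b.txt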
Short answer: Avoid xargs for now. Return to xargs when you have written dozens or hundreds of scripts.
Commands can get their input from arguments (like rm bad_example) or from stdin (not just the y answering the question rm -i is_this_bad_too asks, but also read answer). Other commands like grep and sed look for arguments first, and when the arguments don't supply the input, they switch to reading stdin.
Your grep example works fine reading from stdin, nothing special needed.
Your ls needs the output of find as a parameter. xargs is just one way to turn things around. Use man xargs for more about xargs. Alternatives:
find /etc -name "*.txt" -exec ls -l {} \;
find /etc -name "*.txt" -ls
ls -l $(find /etc -name "*.txt" )
ls /etc/*.txt
First try to see which of these commands behaves best when you have a nasty filename with spaces.txt in /etc.
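A sketch for experimenting safely (using a throwaway directory instead of /etc):
mkdir -p /tmp/findtest && touch '/tmp/findtest/nasty filename with spaces.txt'
find /tmp/findtest -name '*.txt' | xargs ls -l        # breaks: the name is split into several arguments
find /tmp/findtest -name '*.txt' -exec ls -l {} \;    # works: the name is passed intact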
xargs(1) is dangerous (broken, exploitable, etc.) when reading non-NUL-delimited input.
If you're working with filenames, use find's -exec [command] {} + instead.
If you can get NUL-delimited output, use xargs -0.
GNU Parallel can do the same as xargs, but does not have the broken and exploitable "features".
You can learn GNU Parallel by looking at examples http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Working-as-xargs--n1.-Argument-appending and walking through the tutorial http://www.gnu.org/software/parallel/parallel_tutorial.html
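For example, both of these handle arbitrary filenames safely (a sketch; md5sum stands in for any command that accepts multiple file arguments):
find . -type f -exec md5sum {} +
find . -type f -print0 | xargs -0 md5sum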

Find, grep, and execute - all in one?

This is the command I've been using for finding matches (queryString) in php files, in the current directory, with grep, case insensitive, and showing matching results in line:
find . -iname "*php" -exec grep -iH queryString {} \;
Is there a way to also pipe just the file name of the matches to another script?
I could probably run the -exec command twice, but that seems inefficient.
What I'd love to do on Mac OS X is then actually to "reveal" that file in the Finder. I think I can handle that part. If I had to give up the inline matches and just let grep show the file names, and then pipe those to a third script, that would be fine, too - I would settle.
But I'm actually not even sure how to pipe the output (the matched file names) to somewhere else...
Help! :)
Clarification
I'd like to reveal each of the files in a Finder window - so I'm probably not going to use the -q flag and stop at the first one.
I'm going to run this in the console; ideally I'd like to see the inline matches printed out there, as well as being able to pipe them to another script, like osascript (AppleScript, to reveal them in the Finder). That's why I have been using -H - because I like to see both the file name and the match.
If I had to settle for just using -l so that the file name could more easily be piped to another script, that would be OK, too. But I think, after looking at the reply below from @Charlie Martin, that xargs could be helpful here in doing both at the same time with a single find and a single grep command.
I did say bash, but I don't really mind if this needs to be run as /bin/sh instead - I don't know too much about the differences yet, but I do know there are some important ones.
Thank you all for the responses, I'm going to try some of them at the command line and see if I can get any of them to work and then I think I can choose the best answer. Leave a comment if you want me to clarify anything more.
Thanks again!
You bet. The usual thing is something like
$ find /path -name pattern -print | xargs command
So you might for example do
$ find . -name '*.[ch]' -print | xargs grep -H 'main'
(Quiz: why -H?)
You can carry on with this farther; for example, you might use
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1
to get the vector of file names for files that contain 'main', or
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1 |
xargs growlnotify -
to have each name become a Growl notification.
You could also do
$ grep pattern `find /path -name pattern`
or
$ grep pattern $(find /path -name pattern)
(in bash(1) at least these are equivalent) but you can run into limits on the length of a command line that way.
Update
To answer your questions:
(1) You can do anything in bash that you can do in sh. The one thing I've mentioned that would be any different is the use of $(command) in place of backticks around command, and that works in the version of sh on Macs. csh, zsh, ash, and fish are different.
(2) I think merely doing $ open $(dirname arg) will open a Finder window on the containing directory.
It sounds like you want to open all *.php files that contain querystring from within a Terminal.app session.
You could do it this way:
find . -name '*.php' -exec grep -li 'querystring' {} \; | xargs open
With my setup, this opens MacVim with each file on a separate tab. YMMV.
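If the goal is to reveal the matches in the Finder rather than open them, a similar sketch should work (macOS's open accepts -R to reveal files in the Finder instead of opening them):
find . -name '*.php' -exec grep -li 'querystring' {} \; | xargs open -R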
Replace -H with -l and you will get a list of those filenames that matched the pattern.
If you have bash 4 (with the globstar shell option enabled via shopt -s globstar), simply do
grep pattern /path/**/*.php
the ** operator makes this work like
grep pattern `find -name \*.php -print`
find /home/aaronmcdaid/Code/ -name '*.cpp' -exec grep -q -iH boost {} \; -exec echo {} \;
The first change I made was to add -q to your grep command, which means "Exit immediately with zero status if any match is found".
The good news is that this speeds up grep when a file has many matching lines - you don't care how many matches there are, only that there is one. The catch is that grep -q prints nothing, so we need another exec on the end to actually print the filename when grep has been successful.
Since grep's own output is suppressed, a second -exec predicate is probably the best solution here.
Pipe to another script:
find . -iname "*.php" | myScript
File names will come into the stdin of myScript 1 line at a time.
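where a hypothetical myScript might consume that stream like this:
#!/bin/sh
# myScript: read one filename per line from stdin and act on each
while IFS= read -r file; do
    printf 'processing %s\n' "$file"
done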
You can also use xargs to form/execute commands to act on each file:
find . -iname "*.php" | xargs ls -l
Act on the files you find that match a pattern:
find . -iname "*.php" | xargs grep -l pattern | myScript
Act on the files that don't match a pattern:
find . -iname "*.php" | xargs grep -L pattern | myScript
In general, using multiple -execs with grep -q will be FAR faster than piping, since find implies a short-circuiting -a between each juxtaposed pair of expressions not separated by an explicit operator. The main problem here is that you want something to happen if grep matches something AND you want the matches printed. If the files are reasonably sized then this should be faster (because grep -q exits after finding a single match):
find . -iname "*php" -exec grep -iq queryString {} \; -exec grep -iH queryString {} \; -exec otherprogram {} \;
If the files are particularly big, encapsulating it in a shell script may be faster than running multiple grep commands:
find . -iname "*php" -exec bash -c \
'out=$(grep -iH queryString "$1"); [[ -n $out ]] && echo "$out" && exit 0 || exit 1' \
bash {} \; -print
Also note that if the matches themselves are not actually needed, then
find . -iname "*php" -exec grep -iq queryString {} \; -exec otherprogram {} \;
will virtually always be faster than a piped solution like
find . -iname "*php" -print0 | xargs -0 grep -iH queryString | ...
Additionally, you should really have -type f in all cases, unless you want to catch *php directories
Regarding the question of which is faster: if you actually care about the minuscule time difference (which you might, if you are trying to save your processor some work), try prefixing each command with time and see which one performs better.

How do I use a pipe in the exec parameter for a find command?

I'm trying to construct a find command to process a bunch of files in a directory using two different executables. Unfortunately, -exec on find doesn't allow you to use a pipe, or even \|, because the shell interprets that character first.
Here is specifically what I'm trying to do (which doesn't work because pipe ends the find command):
find /path/to/jpgs -type f -exec jhead -v {} | grep 123 \; -print
Try this
find /path/to/jpgs -type f -exec sh -c 'jhead -v {} | grep 123' \; -print
Alternatively, you could embed your exec statement inside a shell script and then do:
find -exec some_script {} \;
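where some_script would be a small wrapper along these lines (a sketch based on the jhead/grep pipeline above):
#!/bin/sh
# some_script: run jhead on the file passed by find and filter its output
jhead -v "$1" | grep 123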
A slightly different approach would be to use xargs:
find /path/to/jpgs -type f -print0 | xargs -0 jhead -v | grep 123
which I always found a bit easier to understand and to adapt (the -print0 and -0 arguments are necessary to cope with filenames containing blanks).
This might (not tested) be more efficient than using -exec, because it pipes the list of files to xargs, and xargs makes sure that the jhead command line does not get too long.
With -exec you can only run a single executable with some arguments, not arbitrary shell commands. To circumvent this, you can use sh -c '<shell command>'.
Do note that the use of -exec is quite inefficient. For each file that is found, the command has to be executed again. It would be more efficient if you can avoid this. (For example, by moving the grep outside the -exec or piping the results of find to xargs as suggested by Palmin.)
Using the find command for this type of task is maybe not the best alternative. I frequently use the following command to find files that contain the requested information:
for i in dist/*.jar; do echo ">> $i"; jar -tf "$i" | grep BeanException; done
As this outputs a list, would you not just do:
find /path/to/jpgs -type f -exec jhead -v {} \; | grep 123
or
find /path/to/jpgs -type f -print -exec jhead -v {} \; | grep 123
Put your grep on the results of the find -exec.
There is another way you can do it, but it is also pretty hacky.
Using the shell option extquote, you can do something similar to this in order to make find exec stuff and then pipe it to sh:
root@ifrit findtest # find -type f -exec echo ls $"|" cat \;|sh
filename
root@ifrit findtest # find -type f -exec echo ls $"|" cat $"|" xargs cat \;|sh
h
I just figured I'd add that because, at least the way I visualized it, it was closer to the OP's original question of using pipes within exec.
