xargs with multiple commands - bash

In the current directory, I'd like to print each filename together with its contents.
I can print the filenames or the contents separately with
find . | grep "file_for_print" | xargs echo
find . | grep "file_for_print" | xargs cat
but what I want is to print them together, like this:
file1
line1 inside file1
line2 inside file1
file2
line1 inside file2
line2 inside file2
I read xargs with multiple commands as argument
and tried
find . | grep "file_for_print" | xargs -I % sh -c 'echo; cat;'
but it doesn't work.
I'm not familiar with xargs, so I don't know what exactly -I % sh -c means.
Could anyone help me? Thank you!

find . | grep "file_for_print" | xargs -I % sh -c 'echo %; cat %;' (the OP's version was missing the % placeholders)

To start with, there is virtually no difference between:
find . | grep "file_for_print" | xargs echo
and
find . -name "file_for_print*"
except that the second one will not match filenames like this_is_not_the_file_for_print, and it will print the filenames one per line. It will also be a lot faster, because it doesn't need to generate and print the entire recursive directory structure just in order for grep to toss most of it away.
find . -name "file_for_print*"
is actually exactly the same as
find . -name "file_for_print*" -print
where the -print action prints each matched filename followed by a newline. If you don't provide find with any actions, it assumes you wanted -print. But it has more tricks up its sleeve than that. For example:
find . -name "file_for_print*" -exec cat {} \;
The -exec action causes find to execute the following command, up to the \;, replacing {} with each matching file name.
find does not limit itself to a single action. You can tell it to do however many you want. So:
find . -name "file_for_print*" -print -exec cat {} \;
will probably do pretty much what you want.
For lots more information on this very useful utility, type:
man find
or
info find
and read all about it.

Since it's not been said yet: -I % tells xargs to replace every % in the command you give it with the incoming argument, running the command once per input line. The sh -c '...' just means run the commands '...' in a new shell.
So
xargs -I % sh -c 'echo %; cat %;'
will run echo [filename] followed by cat [filename] for every filename given to xargs. The echo and cat commands will be executed inside a different shell process but this usually doesn't matter. Your version didn't work because it was missing the % signs inside the command passed to xargs.
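For illustration, here is a minimal sketch of that behaviour (a.txt and b.txt are made-up files; this assumes filenames without quotes or other shell metacharacters, since % is pasted into the script text):
printf '%s\n' a.txt b.txt | xargs -I % sh -c 'echo %; cat %'
# runs: sh -c 'echo a.txt; cat a.txt', then: sh -c 'echo b.txt; cat b.txt'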
For what it's worth I would use this command to achieve the same thing:
find -name "*file_for_print*" | parallel 'echo {}; cat {};'
because it's simpler (parallel automatically uses {} as the substitution character and can take multiple commands by default).
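For example, assuming GNU parallel is installed (the filenames are hypothetical):
printf '%s\n' a.txt b.txt | parallel 'echo {}; cat {}'
# one job per input line; {} is replaced with the (safely quoted) filename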

In this specific case, each command is executed for each individual file anyway, so there's no advantage in using xargs. You can just append two -exec actions to your find:
find . -name "*file_for_print*" -exec echo {} \; -exec cat {} \;
In this case, -print could be used instead of the first -exec echo, as pointed out by rici, but this example shows the ability to execute two arbitrary commands with a single find.

What about writing your own bash function?
#!/bin/bash
myFunction() {
    while read -r file; do
        echo "$file"
        cat "$file"
    done
}
find . -name "file_for_print*" | myFunction

Related

How can I get xargs to do something with the input, then do another thing?

I'm in zsh.
I'd like to do something like:
find . -iname *.md | xargs cat && echo "---" > all_slides_with_separators_in_between.md
Of course this cats all the slides, then appends a single "---" at the end instead of after each slide.
Is there an xargs way of doing this? Can I replace cat && echo "---" with some inline function or do block?
Very strangely, when I create a file cat---.sh with the contents
cat $1
echo ---
and run
find . -iname *.md | xargs ./cat---.sh
it only executes for the first result of find.
Replace cat---.sh with cat and it runs on both files.
There's no need to use xargs at all here. Following is a properly paranoid approach (robust against files with spaces, files with newlines, files with literal backslashes in their names, etc):
while IFS= read -r -d '' filename; do
    printf '---\n'
    cat -- "$filename"
done < <(find . -iname '*.md' -print0) > all_slides_with_separators.md
However -- you don't even need that either: find can do all the work itself, both printing the separator and calling cat!
find . -iname '*.md' -printf '---\n' -exec cat -- '{}' ';' >all_slides_with_separators.md
A common usage pattern is xargs sh -c 'command; another' _ where the entire shell script in the quotes will have access to the command-line arguments. The underscore is because the first argument to sh -c will be assigned to $0 (where you'd often see e.g. -sh in a ps listing).
find . -iname '*.md' |
    xargs sh -c 'for x; do
        cat "$x" && echo "---"
    done' _ > all_slides_with_separators_in_between.md
As noted in the comments, you should probably investigate find -print0 and the corresponding xargs -0 option in GNU find (and maybe install it if you don't have it).
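A sketch that combines the two, assuming GNU find and xargs (all_slides.md is a placeholder output name):
find . -iname '*.md' -print0 |
    xargs -0 sh -c 'for x; do cat "$x" && echo "---"; done' _ > all_slides.md
# -print0/-0 pass NUL-delimited names, so spaces and newlines in filenames survive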
You can do something like this, but it can be insecure in some cases (see comments):
find . -iname '*.md' | xargs -I % sh -c '{ cat %; echo "----"; }' > output.txt
You'll rarely need find in zsh; its globbing facilities cover nearly every use case of find.
for f in (#i)**/*.md; do
    cat $f
    print -- "---"
done > all_slides.md
This looks in the current directory hierarchy for every file that matches *.md in a case-insensitive manner.
For even more efficiency, replace cat $f with < $f; zsh itself will read the file and write its contents to standard output.
Using GNU Parallel it looks like this:
parallel cat {}\; print -- --- ::: **/*.md

how to grep large number of files?

I am trying to grep 40k files in the current directory and I am getting this error.
for i in $(cat A01/genes.txt); do grep $i *.kaks; done > A01/A01.result.txt
-bash: /usr/bin/grep: Argument list too long
How does one normally grep thousands of files?
Thanks
Upendra
This makes David sad...
Everyone so far is wrong (except for anubhava).
Shell scripting is not like other programming languages, because much of the interpretation of a line comes from the shell expanding it before the command is actually executed.
Let's take something simple:
$ set -x
$ ls
+ ls
bar.txt foo.txt fubar.log
$ echo The text files are *.txt
+ echo The text files are bar.txt foo.txt
The text files are bar.txt foo.txt
$ set +x
$
The set -x allows you to see how the shell actually expands the glob and then passes the result to the command. The line prefixed with + shows the command as it is actually executed.
You can see that the echo command isn't interpreting the *. Instead, the shell grabs the * and replaces it with the names of the matching files. Then, and only then, does the echo command actually execute.
When you have 40K plus files, and you do grep *, you're expanding that * to the names of those 40,000 plus files before grep even has a chance to execute, and that's where the error message /usr/bin/grep: Argument list too long is coming from.
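You can see the limit the kernel enforces with getconf; note that it covers the combined size of the argument list and the environment (the figure below is from one Linux system and will vary):
$ getconf ARG_MAX
2097152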
Fortunately, Unix has a way around this dilemma:
$ find . -maxdepth 1 -type f -name "*.kaks" | xargs grep -f A01/genes.txt
The find . -maxdepth 1 -type f -name "*.kaks" will find all of your *.kaks files, and the -maxdepth 1 will only include files in the current directory (GNU find wants -maxdepth before the other tests). The -type f makes sure you only pick up files and not directories.
The find command pipes the names of the files into xargs, and xargs will append those names to the grep -f A01/genes.txt command. However, xargs has a trick up its sleeve. It knows how long the command-line buffer is, and will execute the grep when the command-line buffer is full, then pass in another batch of files to grep. This way, grep gets executed maybe three or ten times (depending upon the size of the command-line buffer), and all of our files are used.
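You can watch that batching in miniature by capping the batch size with -n, with echo standing in for grep:
$ printf '%s\n' a b c d e | xargs -n 2 echo
a b
c d
e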
Unfortunately, xargs uses whitespace as a separator for the file names. If your files contain spaces or tabs, you'll have trouble with xargs. Fortunately, there's another fix:
$ find . -maxdepth 1 -type f -name "*.kaks" -print0 | xargs -0 grep -f A01/genes.txt
The -print0 will cause find to print out the names of the files separated not by newlines, but by the NUL character. The -0 parameter tells xargs that the file separator isn't whitespace, but the NUL character. This fixes the issue.
You could also do this too:
$ find . -maxdepth 1 -type f -name "*.kaks" -exec grep -f A01/genes.txt {} \;
This will execute grep once for each and every file found, instead of what xargs does (running grep on as many files as it can stuff onto the command line). The advantage of this is that it avoids shell interference entirely. However, it may or may not be less efficient.
What would be interesting is to experiment and see which one is more efficient. You can use time to see:
$ time find . -maxdepth 1 -type f -name "*.kaks" -exec grep -f A01/genes.txt {} \;
This will execute the command and then tell you how long it took. Try it with the -exec and with xargs and see which is faster. Let us know what you find.
You can combine find with grep like this:
find . -maxdepth 1 -name '*.kaks' -exec grep -H -f A01/genes.txt '{}' \; > A01/A01.result.txt
You can use grep's recursive feature:
for i in $(cat A01/genes.txt); do
    grep -r "$i" .
done > A01/A01.result.txt
though if you want to select only .kaks files:
for i in $(cat A01/genes.txt); do
    find . -iregex '.*\.kaks$' -exec grep "$i" {} \;
done > A01/A01.result.txt
Put another for loop inside your outer one:
for f in *.kaks; do
    grep -H "$i" "$f"
done
By the way, are you interested in finding EVERY occurrence in each file, or merely whether the search string exists one or more times? If it is "good enough" to know the string occurs one or more times, you can pass -m 1 to grep and it will not bother reading/searching the rest of the file after finding the first match, which could potentially save lots of time.
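A quick sketch of that flag as GNU grep spells it (big.txt is a throwaway test file):
yes pattern | head -n 1000000 > big.txt   # a million matching lines
grep -m 1 pattern big.txt                 # prints one line and stops reading the file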
The following solution has worked for me:
Problem:
grep -r "example\.com" *
-bash: /bin/grep: Argument list too long
Solution:
grep -r "example\.com" .
["In newer versions of grep you can omit the “.“, as the current directory is implied."]
Source:
Reinlick, J. https://www.saotn.org/bash-grep-through-large-number-files-argument-list-too-long/

In-line text replacement using sed, shell, or some other means

I want to pass two parameters to a program, a file name and a modified version of the file name. The situation is I have a bunch of .html.erb files in a directory tree, and I want invoke html2haml on them with the original filename and a new output filename with the haml extension, like so:
html2haml thing.html.erb thing.html.haml
Here's my current best attempt at this:
find . -name "*.html.erb" -exec echo {} `echo {} | sed "s/.erb/.haml/g"` \;
(after I'm done testing I'll replace echo with html2haml and run it again)
However, it doesn't work. The result of the expression inside the backticks is the unmodified string.
Here are some experiments I tried which DO behave as expected (to test if my syntax and levels of escaping/quotes were correct):
1. echo myfile.foo | sed 's/foo/foo2/g'
2. find . -name "*.html.erb" -exec echo {} `echo xyz | sed "s/y/Y/g"` \;
3. find . -name "*.html.erb" -exec echo {} `echo {} hello` \;
4. find . -name "*.html.erb" -exec echo {} `echo {}` \;
The fact that these all behave as expected suggests to me that I am getting some small thing wrong in the syntax, and that it is indeed possible to do this with a one-liner.
If this is impossible, it might be because of a misunderstanding about "when" find inserts its results on each invocation. Example #3 above suggests to me that it does so exactly when I need/expect it to (because I'm successfully concatenating each individual result string with "hello").
If you have gsed:
find . -name \*.erb -print0 | gsed -z 'p;s/\.erb$/.haml/' | xargs -0 -n2 html2haml
If you don't have gsed and only have sed, this will work, but only if none of your file names have whitespace.
find . -name \*.erb -print | sed 'p;s/\.erb$/.haml/' | xargs -n2 html2haml
Discussion about these and other techniques follows:
I have different versions of sed: my GNU sed is called gsed. If your sed is GNU sed, use sed instead of gsed.
You can check your sed with sed --version; if it prints something like:
sed (GNU sed) 4.2.2
Copyright (C) 2012 Free Software Foundation, Inc.
then you have GNU sed.
For example, given the following files:
$ find . -name \*foo -print
./a/test.foo
./b/c/test.foo
./b/te st.foo #<- note the filename with space
./b/test.foo
the following command produces:
$ find . -name \*foo -print0 | gsed -z 'p;s/foo$/foo2/' | xargs -0 -n2 echo bar
bar ./a/test.foo ./a/test.foo2
bar ./b/c/test.foo ./b/c/test.foo2
bar ./b/te st.foo ./b/te st.foo2
bar ./b/test.foo ./b/test.foo2
Without additional scripts or functions. ;)
Or you can replace sed with perl, so that the following
find . -name \*foo -print0 | perl -n0le 'print;s/foo/foo2/;print' | xargs -0 -n2 echo bar
produces the same result:
bar ./a/test.foo ./a/test.foo2
bar ./b/c/test.foo ./b/c/test.foo2
bar ./b/te st.foo ./b/te st.foo2
bar ./b/test.foo ./b/test.foo2
IF you REALLY want to do it within one find, try:
find . -name \*html.erb -exec sh -c 'echo html2haml "{}" "$(echo "{}" | sed 's/\.erb/\.haml/')"' \;
or, eliminating the two useless echos, the final command:
find . -name \*html.erb -exec sh -c 'html2haml "{}" "$(sed 's/\.erb/\.haml/'<<<"{}")"' \;
What about a loop?
find . -name "*.html.erb" | while read -r file; do
    haml_file=${file%.erb}.haml
    html2haml "$file" "$haml_file"
done
The ${var%glob} syntax takes the shell variable ${var} and strips the shortest suffix that matches glob.
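A quick demonstration of that expansion at the prompt (the filename is illustrative):
$ file=thing.html.erb
$ echo "${file%.erb}.haml"
thing.html.haml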
If you know that the filename ends with .foo, then you can use:
do_something "$filename" "${filename%.foo}.foo2"
(In the unlikely case that you really want to just put a 2 on the end, you could of course just use "${filename}2". But I assume the foo and foo2 are to be substituted with less similar strings.)
If you want to invoke do_something from find, your best bet would be to pass it only one filename (or, better, a number of filenames each of them representing a single operation). For example:
-- do_something.sh
#!/bin/bash
# This is the definition of what you want to do.
# It is called as `bar old_filename new_filename`
bar() {
    # For example
    mv "$1" "$2"
}
for filename in "$@"; do
    bar "$filename" "${filename%.foo}.foo2"
done
-- find command:
find . -type f -name '*.foo' -exec ./do_something.sh {} +
If you really need to use sed (for something that you can't even do with the bash replace syntax, ${var/pattern/substitution}), then set up do_something as above, but replace the line inside the for loop with, for example:
bar "$filename" "$(sed -r 's/([^.]+)\.([^.]+)$/\2.\1/' <<<"$filename")"
Explanation: The above sed expression (GNU-specific) flips the last two extensions around, so it would change some.file.html.en into some.file.en.html. -r causes GNU sed to use extended regex format, which I find more readable. <<< is a bashism which expands the word following it and feeds it into stdin, somewhat similar to echo "$filename" | sed ... but without creating another subprocess.
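A quick check of that expression on its own (the filename is made up):
$ sed -r 's/([^.]+)\.([^.]+)$/\2.\1/' <<< some.file.html.en
some.file.en.html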
You can call your find like this:
find . -name "*.html.erb" -print0 -print0 | xargs -0 -J % html2haml % | sed 's/\.erb$/.haml/'
This will result in executing:
html2haml thing.html.erb thing.html.haml

Find, grep, and execute - all in one?

This is the command I've been using for finding matches (queryString) in php files, in the current directory, with grep, case insensitive, and showing matching results in line:
find . -iname "*php" -exec grep -iH queryString {} \;
Is there a way to also pipe just the file name of the matches to another script?
I could probably run the -exec command twice, but that seems inefficient.
What I'd love to do on Mac OS X is then actually to "reveal" that file in the Finder. I think I can handle that part. If I had to give up the inline matches and just let grep show the file names, and then pipe that to a third script, that would be fine, too - I would settle.
But I'm actually not even sure how to pipe the output (the matched file names) to somewhere else...
Help! :)
Clarification
I'd like to reveal each of the files in a Finder window - so I'm probably not going to use the -q flag and stop at the first one.
I'm going to run this in the console; ideally I'd like to see the inline matches printed out there, as well as being able to pipe them to another script, like osascript (AppleScript, to reveal them in the Finder). That's why I have been using -H - because I like to see both the file name and the match.
If I had to settle for just using -l so that the file name could more easily be piped to another script, that would be OK, too. But I think, after looking at the reply below from @Charlie Martin, that xargs could be helpful here in doing both at the same time with a single find and a single grep command.
I did say bash, but I don't really mind if this needs to be run as /bin/sh instead - I don't know too much about the differences yet, but I do know there are some important ones.
Thank you all for the responses, I'm going to try some of them at the command line and see if I can get any of them to work and then I think I can choose the best answer. Leave a comment if you want me to clarify anything more.
Thanks again!
You bet. The usual thing is something like
$ find /path -name pattern -print | xargs command
So you might for example do
$ find . -name '*.[ch]' -print | xargs grep -H 'main'
(Quiz: why -H?)
You can carry this on farther; for example, you might use
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1
to get the vector of file names for files that contain 'main', or
$ find . -name '*.[ch]' -print | xargs grep -H 'main' | cut -d ':' -f 1 |
xargs growlnotify -
to have each name become a Growl notification.
You could also do
$ grep pattern `find /path -name pattern`
or
$ grep pattern $(find /path -name pattern)
(in bash(1) at least these are equivalent) but you can run into limits on the length of a command line that way.
Update
To answer your questions:
(1) You can do anything in bash that you can do in sh. The one thing I've mentioned that would be any different is the use of $(command) in place of backticks around command, and that works in the version of sh on Macs (see the sketch after these points). The csh, zsh, ash, and fish are different.
(2) I think merely doing $ open $(dirname arg) will open a Finder window on the containing directory.
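As a quick sketch of point (1), $(...) nests cleanly where backticks need escaping (the path is arbitrary):
dir=$(basename "$(dirname /a/b/c)")   # yields: b
# the backtick equivalent requires escaping: `basename \`dirname /a/b/c\``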
It sounds like you want to open all *.php files that contain querystring from within a Terminal.app session.
You could do it this way:
find . -name '*.php' -exec grep -li 'querystring' {} \; | xargs open
With my setup, this opens MacVim with each file on a separate tab. YMMV.
Replace -H with -l and you will get a list of those filenames that matched the pattern.
if you have bash 4, simply do
grep pattern /path/**/*.php
the ** operator is like
grep pattern `find -name \*.php -print`
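One caveat worth noting: in bash the ** glob only recurses after you turn on globstar, so the full recipe is:
shopt -s globstar            # ** is off by default in bash 4+
grep pattern /path/**/*.php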
find /home/aaronmcdaid/Code/ -name '*.cpp' -exec grep -q -iH boost {} \; -exec echo {} \;
The first change I made is to add -q to your grep command. This is "Exit immediately with zero status if any match is found".
The good news is that this speeds up grep when a file has many matching lines. You don't care how many matches there are. But that means we need another -exec at the end to actually print the filenames when grep has been successful.
The grep result will be sent to stdout, so another -exec predicate is probably the best solution here.
Pipe to another script:
find . -iname "*.php" | myScript
File names will come into the stdin of myScript 1 line at a time.
You can also use xargs to form/execute commands to act on each file:
find . -iname "*.php" | xargs ls -l
act on files you find that match:
find . -iname "*.php" | xargs grep -l pattern | myScript
act on files that don't match the pattern:
find . -iname "*.php" | xargs grep -L pattern | myScript
In general, using multiple -exec actions with grep -q will be FAR faster than piping, since find implies a short-circuiting -a between each juxtaposed pair of expressions that isn't separated by an explicit operator. The main problem here is that you want something to happen if grep matches something AND the matches to be printed. If the files are reasonably sized then this should be faster (because grep -q exits after finding a single match):
find . -iname "*php" -exec grep -iq queryString {} \; -exec grep -iH queryString {} \; -exec otherprogram {} \;
If the files are particularly big, encapsulating it in a shell script may be faster than running multiple grep commands:
find . -iname "*php" -exec bash -c \
'out=$(grep -iH queryString "$1"); [[ -n $out ]] && echo "$out" && exit 0 || exit 1' \
bash {} \; -print
Also note that if the printed matches are not actually needed, then
find . -iname "*php" -exec grep -iq queryString {} \; -exec otherprogram {} \;
will virtually always be faster than a piped solution like
find . -iname "*php" -print0 | xargs -0 grep -iH queryString | ...
Additionally, you should really have -type f in all cases, unless you want to catch *php directories
Regarding the question of which is faster: if you actually care about the minuscule time difference (perhaps because you are trying to save your processor some work), test each candidate by prefixing it with the time command and see which one performs better.

Write a shell script that find-greps and outputs filename and content in 1 line

To see all the php files that contain "abc" I can use this simple script:
find . -name "*php" -exec grep -l abc {} \;
I can omit the -l and I get parts of the matching content extracted instead of the filenames:
find . -name "*php" -exec grep abc {} \;
What I would like now is a version that does both at the same time, but on the same line.
Expected output:
path1/filename1: lorem abc ipsum
path2/filename2: ipsum abc lorem
path3/filename3: non abc quod
More or less like grep abc * does.
Edit: I want to use this as a simple shell script. It would be great if each result is on one line, so further grepping would be possible. But it is not necessary for the script to be only one line; I am putting it in a bash script file anyway.
Edit 2: Later I found "ack", which is a great tool and I use this now in most cases instead of grep. It does all this and more. http://betterthangrep.com/ You would write ack --php --nogroup abc to get the desired result
Use the -H switch (man grep):
find . -name "*php" -exec grep -H abc {} \;
Alternative using xargs (now the -H switch is not needed, at least for the version of grep I have here):
find . -name "*php" -print | xargs grep abc
Edit: As a consequence of grep's behavior as noted by orsogufo, the second command above should use -H if find could conceivably return only a single filename (i.e. if there is only a single PHP file). If orsogufo's comment w.r.t. -print0 is also incorporated, the command becomes:
find . -name "*php" -print0 | xargs -0 grep -H abc
Edit 2: A (more1) POSIX compliant version as proposed by Jonathan Leffler, which through the use of /dev/null avoids the -H switch:
find . -name "*php" -print0 | xargs -0 grep abc /dev/null
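The /dev/null trick works because grep prefixes each match with the filename whenever it is given more than one file operand, and /dev/null can never contain a match. A tiny illustration (some.php is a made-up file):
grep abc some.php /dev/null
# prints e.g.: some.php:lorem abc ipsum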
1: A quote from the opengroup.org manual on find hints that -print0 is non-standard:
A feature of SVR4's find utility was the -exec primary's + terminator. This allowed filenames containing special characters (especially <newline>s) to be grouped together without the problems that occur if such filenames are piped to xargs. Other implementations have added other ways to get around this problem, notably a -print0 primary that wrote filenames with a null byte terminator. This was considered here, but not adopted. Using a null terminator meant that any utility that was going to process find's -print0 output had to add a new option to parse the null terminators it would now be reading.
If you don't need to recursively search, you can just do..
grep -H abc *.php
..which gives you the desired output. Printing the filename is the default behaviour when grep is given more than one file (at least on the OS X version of grep), so you can omit the -H:
grep abc *.php
You can grep recursively using the -R flag, but you're unable to limit it to .php files:
grep -R abc *
Again, this has the same desired output.
I know this doesn't exactly answer your questions, it's just.. an alternative... The above are just grep with a single flag, so are easier to remember than find/-exec/grep/xargs combinations! (irrelevant for a script, but useful for day-to-day shell'ing)
find /path -type f -name "*.php" | awk '
{
    # each input line ($0) is a filename from find; read that file line by line
    while ((getline line < $0) > 0) {
        if (line ~ /time/) {
            print $0 ":" line
            # do some other things here
        }
    }
    close($0)   # close each file, or a long file list exhausts descriptors
}'
