Bash: escape characters in backticks - bash

I'm trying to escape characters within backticks in my bash command, mainly to handle spaces in filenames which cause my command to fail.
The command I have so far is:
grep -Li badword `grep -lr goodword *`
This command should result in a list of files that do not contain the word "badword" but do contain "goodword".

Your approach, even if you get the escaping right, will run into problems when the number of files output by the goodword grep reaches the limits on command-line length. It is better to pipe the output of the first grep onto a second grep, like this
grep -lr -- goodword * | xargs grep -Li -- badword
This will correctly handle files with spaces in them, but it will fail if a file name has a newline in it. At least GNU grep and xargs support separating the file names with NUL bytes, like this
grep -lrZ -- goodword * | xargs -0 grep -Li -- badword
EDIT: Added double dashes -- to grep invocations to avoid the case when some file names start with - and would be interpreted by grep as additional options.

How about rewrite it to:
grep -lr goodword * | grep -Li badword

Related

How to wrap output lines in quotes in bash?

Essentially I want the inverse operation performed in this question.
I'm running a search, looking for files that have Windows line endings (\r\n) as I want to remove them.
$ grep -URl ^M .
Some of the returned files have spaces in their names:
./file name 1.txt
./file name 2.txt
In order to pass this on to another tool via xargs, I need to quote the lines. How can I transform to this output instead:
"./file name 1.txt"
"./file name 2.txt"
BSD grep provides a --null option to print names followed by a null byte (instead of a newline).
GNU grep provides a -Z or --null option with the same semantics.
Both BSD and GNU xargs take a -0 option to indicate that file names are separated by null bytes.
Hence:
grep -URl --null ^M . | xargs -0 ...
To avoid the space problem I'd use new line character as separator for xargs with the -d option:
xargs -d '\n' ...
So my solution would be:
grep -URl ^M . | xargs -d '\n' rm

perform an operation for *each* item listed by grep

How can I perform an operation for each item listed by grep individually?
Background:
I use grep to list all files containing a certain pattern:
grep -l '<pattern>' directory/*.extension1
I want to delete all listed files but also all files having the same file name but a different extension: .extension2.
I tried using the pipe, but it seems to take the output of grep as a whole.
In find there is the -exec option, but grep has nothing like that.
If I understand your specification, you want:
grep --null -l '<pattern>' directory/*.extension1 | \
xargs -n 1 -0 -I{} bash -c 'rm "$1" "${1%.*}.extension2"' -- {}
This is essentially the same as what #triplee's comment describes, except that it's newline-safe.
What's going on here?
grep with --null will return output delimited with nulls instead of newline. Since file names can have newlines in them delimiting with newline makes it impossible to parse the output of grep safely, but null is not a valid character in a file name and thus makes a nice delimiter.
xargs will take a stream of newline-delimited items and execute a given command, passing as many of those items (one as each parameter) to a given command (or to echo if no command is given). Thus if you said:
printf 'one\ntwo three \nfour\n' | xargs echo
xargs would execute echo one 'two three' four. This is not safe for file names because, again, file names might contain embedded newlines.
The -0 switch to xargs changes it from looking for a newline delimiter to a null delimiter. This makes it match the output we got from grep --null and makes it safe for processing a list of file names.
Normally xargs simply appends the input to the end of a command. The -I switch to xargs changes this to substitution the specified replacement string with the input. To get the idea try this experiment:
printf 'one\ntwo three \nfour\n' | xargs -I{} echo foo {} bar
And note the difference from the earlier printf | xargs command.
In the case of my solution the command I execute is bash, to which I pass -c. The -c switch causes bash to execute the commands in the following argument (and then terminate) instead of starting an interactive shell. The next block 'rm "$1" "${1%.*}.extension2"' is the first argument to -c and is the script which will be executed by bash. Any arguments following the script argument to -c are assigned as the arguments to the script. This, if I were to say:
bash -c 'echo $0' "Hello, world"
Then Hello, world would be assigned to $0 (the first argument to the script) and inside the script I could echo it back.
Since $0 is normally reserved for the script name I pass a dummy value (in this case --) as the first argument and, then, in place of the second argument I write {}, which is the replacement string I specified for xargs. This will be replaced by xargs with each file name parsed from grep's output before bash is executed.
The mini shell script might look complicated but it's rather trivial. First, the entire script is single-quoted to prevent the calling shell from interpreting it. Inside the script I invoke rm and pass it two file names to remove: the $1 argument, which was the file name passed when the replacement string was substituted above, and ${1%.*}.extension2. This latter is a parameter substitution on the $1 variable. The important part is %.* which says
% "Match from the end of the variable and remove the shortest string matching the pattern.
.* The pattern is a single period followed by anything.
This effectively strips the extension, if any, from the file name. You can observe the effect yourself:
foo='my file.txt'
bar='this.is.a.file.txt'
baz='no extension'
printf '%s\n'"${foo%.*}" "${bar%.*}" "${baz%.*}"
Since the extension has been stripped I concatenate the desired alternate extension .extension2 to the stripped file name to obtain the alternate file name.
If this does what you want, pipe the output through /bin/sh.
grep -l 'RE' folder/*.ext1 | sed 's/\(.*\).ext1/rm "&" "\1.ext2"/'
Or if sed makes you itchy:
grep -l 'RE' folder/*.ext1 | while read file; do
echo rm "$file" "${file%.ext1}.ext2"
done
Remove echo if the output looks like the commands you want to run.
But you can do this with find as well:
find /path/to/start -name \*.ext1 -exec grep -q 'RE' {} \; -print | ...
where ... is either the sed script or the three lines from while to done.
The idea here is that find will ... well, "find" things based on the qualifiers you give it -- namely, that things match the file glob "*.ext", AND that the result of the "exec" is successful. The -q tells grep to look for RE in {} (the file supplied by find), and exit with a TRUE or FALSE without generating any of its own output.
The only real difference between doing this in find vs doing it with grep is that you get to use find's awesome collection of conditions to narrow down your search further if required. man find for details. By default, find will recurse into subdirectories.
You can pipe the list to xargs:
grep -l '<pattern>' directory/*.extension1 | xargs rm
As for the second set of files with a different extension, I'd do this (as usual use xargs echo rm when testing to make a dry run; I haven't tested it, it may not work correctly with filenames with spaces in them):
filelist=$(grep -l '<pattern>' directory/*.extension1)
echo $filelist | xargs rm
echo ${filelist//.extension1/.extension2} | xargs rm
Pipe the result to xargs, it will allow you to run a command for each match.

How to apply shell command to each line of a command output?

Suppose I have some output from a command (such as ls -1):
a
b
c
d
e
...
I want to apply a command (say echo) to each one, in turn. E.g.
echo a
echo b
echo c
echo d
echo e
...
What's the easiest way to do that in bash?
It's probably easiest to use xargs. In your case:
ls -1 | xargs -L1 echo
The -L flag ensures the input is read properly. From the man page of xargs:
-L number
Call utility for every number non-empty lines read.
A line ending with a space continues to the next non-empty line. [...]
You can use a basic prepend operation on each line:
ls -1 | while read line ; do echo $line ; done
Or you can pipe the output to sed for more complex operations:
ls -1 | sed 's/^\(.*\)$/echo \1/'
for s in `cmd`; do echo $s; done
If cmd has a large output:
cmd | xargs -L1 echo
You can use a for loop:
for file in * ; do
echo "$file"
done
Note that if the command in question accepts multiple arguments, then using xargs is almost always more efficient as it only has to spawn the utility in question once instead of multiple times.
You actually can use sed to do it, provided it is GNU sed.
... | sed 's/match/command \0/e'
How it works:
Substitute match with command match
On substitution execute command
Replace substituted line with command output.
A solution that works with filenames that have spaces in them, is:
ls -1 | xargs -I %s echo %s
The following is equivalent, but has a clearer divide between the precursor and what you actually want to do:
ls -1 | xargs -I %s -- echo %s
Where echo is whatever it is you want to run, and the subsequent %s is the filename.
Thanks to Chris Jester-Young's answer on a duplicate question.
xargs fails with with backslashes, quotes. It needs to be something like
ls -1 |tr \\n \\0 |xargs -0 -iTHIS echo "THIS is a file."
xargs -0 option:
-0, --null
Input items are terminated by a null character instead of by whitespace, and the quotes and backslash are
not special (every character is taken literally). Disables the end of file string, which is treated like
any other argument. Useful when input items might contain white space, quote marks, or backslashes. The
GNU find -print0 option produces input suitable for this mode.
ls -1 terminates the items with newline characters, so tr translates them into null characters.
This approach is about 50 times slower than iterating manually with for ... (see Michael Aaron Safyans answer) (3.55s vs. 0.066s). But for other input commands like locate, find, reading from a file (tr \\n \\0 <file) or similar, you have to work with xargs like this.
i like to use gawk for running multiple commands on a list, for instance
ls -l | gawk '{system("/path/to/cmd.sh "$1)}'
however the escaping of the escapable characters can get a little hairy.
Better result for me:
ls -1 | xargs -L1 -d "\n" CMD

Bash and filenames with spaces

The following is a simple Bash command line:
grep -li 'regex' "filename with spaces" "filename"
No problems. Also the following works just fine:
grep -li 'regex' $(<listOfFiles.txt)
where listOfFiles.txt contains a list of filenames to be grepped, one
filename per line.
The problem occurs when listOfFiles.txt contains filenames with
embedded spaces. In all cases I've tried (see below), Bash splits the
filenames at the spaces so, for example, a line in listOfFiles.txt
containing a name like ./this is a file.xml ends up trying to run
grep on each piece (./this, is, a and file.xml).
I thought I was a relatively advanced Bash user, but I cannot find a
simple magic incantation to get this to work. Here are the things I've
tried.
grep -li 'regex' `cat listOfFiles.txt`
Fails as described above (I didn't really expect this to work), so I
thought I'd put quotes around each filename:
grep -li 'regex' `sed -e 's/.*/"&"/' listOfFiles.txt`
Bash interprets the quotes as part of the filename and gives "No such
file or directory" for each file (and still splits the filenames with
blanks)
for i in $(<listOfFiles.txt); do grep -li 'regex' "$i"; done
This fails as for the original attempt (that is, it behaves as if the
quotes are ignored) and is very slow since it has to launch one 'grep'
process per file instead of processing all files in one invocation.
The following works, but requires some careful double-escaping if
the regular expression contains shell metacharacters:
eval grep -li 'regex' `sed -e 's/.*/"&"/' listOfFiles.txt`
Is this the only way to construct the command line so it will
correctly handle filenames with spaces?
Try this:
(IFS=$'\n'; grep -li 'regex' $(<listOfFiles.txt))
IFS is the Internal Field Separator. Setting it to $'\n' tells Bash to use the newline character to delimit filenames. Its default value is $' \t\n' and can be printed using cat -etv <<<"$IFS".
Enclosing the script in parenthesis starts a subshell so that only commands within the parenthesis are affected by the custom IFS value.
cat listOfFiles.txt |tr '\n' '\0' |xargs -0 grep -li 'regex'
The -0 option on xargs tells xargs to use a null character rather than white space as a filename terminator. The tr command converts the incoming newlines to a null character.
This meets the OP's requirement that grep not be invoked multiple times. It has been my experience that for a large number of files avoiding the multiple invocations of grep improves performance considerably.
This scheme also avoids a bug in the OP's original method because his scheme will break where listOfFiles.txt contains a number of files that would exceed the buffer size for the commands. xargs knows about the maximum command size and will invoke grep multiple times to avoid that problem.
A related problem with using xargs and grep is that grep will prefix the output with the filename when invoked with multiple files. Because xargs invokes grep with multiple files one will receive output with the filename prefixed, but not for the case of one file in listOfFiles.txt or the case of multiple invocations where the last invocation contains one filename. To achieve consistent output add /dev/null to the grep command:
cat listOfFiles.txt |tr '\n' '\0' |xargs -0 grep -i 'regex' /dev/null
Note that was not an issue for the OP because he was using the -l option on grep; however it is likely to be an issue for others.
This works:
while read file; do grep -li dtw "$file"; done < listOfFiles.txt
With Bash 4, you can also use the builtin mapfile function to set an array containing each line and iterate on this array:
$ tree
.
├── a
│ ├── a 1
│ └── a 2
├── b
│ ├── b 1
│ └── b 2
└── c
├── c 1
└── c 2
3 directories, 6 files
$ mapfile -t files < <(find -type f)
$ for file in "${files[#]}"; do
> echo "file: $file"
> done
file: ./a/a 2
file: ./a/a 1
file: ./b/b 2
file: ./b/b 1
file: ./c/c 2
file: ./c/c 1
Though it may overmatch, this is my favorite solution:
grep -i 'regex' $(cat listOfFiles.txt | sed -e "s/ /?/g")
Do note that if you somehow ended up with a list in a file which has Windows line endings, \r\n, NONE of the notes above about the input file separator $IFS (and quoting the argument) will work; so make sure that the line endings are correctly \n (I use scite to show the line endings, and easily change them from one to the other).
Also cat piped into while file read ... seems to work (apparently without need to set separators):
cat <(echo -e "AA AA\nBB BB") | while read file; do echo $file; done
... although for me it was more relevant for a "grep" through a directory with spaces in filenames:
grep -rlI 'search' "My Dir"/ | while read file; do echo $file; grep 'search\|else' "$ix"; done

Use lines in a file as filenames for grep?

I have a file which contains filenames (and the full path to them) and I want to search for a word within all of them.
some pseudo-code to explain:
grep keyword <all files specified in files.txt>
or
cat files.txt > grep keyword
cat files txt | grep keyword
the problem is that I can only get grep to search the filenames, not the contents of the actual files.
cat files.txt | xargs grep keyword
or
grep keyword `cat files.txt`
or (equivalent to previous but harder to mis-read)
grep keyword $(cat files.txt)
should do the trick.
Pitfalls:
If files.txt contains file names with spaces, either solution will malfunction, because "This is a filename.txt" will be interpreted as four files, "This", "is", "a", and "filename.txt". A good reason why you shouldn't have spaces in your filenames, ever.
There are ways around this, but none of them is trivial. (find ... -print0 / xargs -0 is one of them.)
The second (cat) version can result in a very long command line (which might fail when exceeding the limits of your environment). The first (xargs) version handles long input automatically; xargs offers several options to control the details.
Both of the answers from DevSolar work (tested on Linux Ubuntu), but the xargs version is preferable if there may be many files, since it will avoid running into command line length limits.
so:
cat files.txt | xargs grep keyword
is the way to go
tr '\n' '\0' <files.txt | LANG=C xargs -r0 grep -F keyword
tr will delimit names with NUL character so that spaces not significant (note the corresponding -0 option to xargs).
xargs -r will start a single grep process for a "large" number of files, but not start any grep process if there are no files.
LANG=C means use quick routines for matching, rather than slow locale ones
grep -F means use quick string matching rather than slow regular expression matching
bash, ksh & zsh version:
grep keyword $(<files.txt)
Long time when last created a bash shell script, but you could store the result of the first grep (the one finding all filenames) in an array and iterate over it, issuing even more grep commands.
A good starting point should be the bash scripting guide.

Resources