--exclude-dir option in grep does not work as expected - shell

I'm trying to exclude multiple directories when using grep as in the following command
grep -r --exclude-dir={folder1, folder2} 'foo'
However, an error is raised grep: foo: No such file or directory. Maybe I'm doing something wrong with --exclude-dir option since the command below works as expected
grep -r 'foo'
How can I use --exclude-dir option correctly? Thanks in advance.

The --exclude-dir flag of GNU grep takes a glob expression as an argument. Listing more than one item inside {...} is not part of that glob; it is a brace expansion, which the shell expands before grep runs.
The expansion involves words separated by a comma character and doesn't like spaces between the words. So ideally it should have been
--exclude-dir={folder1,folder2}
You can see this as a simple brace expansion in your shell by running
echo {a,b} # produces 'a b'
echo {a, b} # this doesn't undergo expansion by shell
echo --exclude-dir={folder1, folder2}
--exclude-dir={folder1, folder2}
so, your original command becomes
grep -r '--exclude-dir={folder1,' 'folder2}' foo
i.e. --exclude-dir receives the unexpanded string {folder1, as its glob, 'folder2}' becomes the pattern you are searching for, and foo is left over as the name of a file to search. Since no file named foo exists, grep reports grep: foo: No such file or directory.
Remember brace expansion is a feature of the shell (e.g. bash), and not grep. In shells that don't support the feature, putting directories between {..} will be treated literally and might not work desirably.
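A minimal sketch of the corrected command in bash, run against a throwaway directory tree. The folder and file names are made up for illustration, and it assumes a grep that supports --exclude-dir (such as GNU grep).

```shell
# Hypothetical layout, created in a temp directory for illustration only.
tmp=$(mktemp -d)
mkdir -p "$tmp/folder1" "$tmp/folder2" "$tmp/keep"
echo foo > "$tmp/folder1/a.txt"
echo foo > "$tmp/folder2/b.txt"
echo foo > "$tmp/keep/c.txt"

# No space after the comma: bash expands the braces into
# --exclude-dir=folder1 --exclude-dir=folder2 before grep runs.
matches=$(cd "$tmp" && grep -rl --exclude-dir={folder1,folder2} 'foo' .)
echo "$matches"

rm -rf "$tmp"
```

With only one directory to exclude, drop the braces: bash leaves {folder1} unexpanded, so grep would receive the braces literally.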

Related

How to escape bash parameter expansion (when a "!" followed by letters is a parameter) [duplicate]

I am attempting to parse the output of a VNC server startup event and have run into a problem in parsing using sed in a command substitution. Specifically, the remote VNC server is started in a manner such as the following:
address1="user1@lxplus.cern.ch"
VNCServerResponse="$(ssh "${address1}" 'vncserver' 2>&1)"
The standard error output produced in this startup event is then to be parsed in order to extract the server and display information. At this point the content of the variable VNCServerResponse is something such as the following:
New 'lxplus0186.cern.ch:1 (user1)' desktop is lxplus0186.cern.ch:1
Starting applications specified in /afs/cern.ch/user/u/user1/.vnc/xstartup
Log file is /afs/cern.ch/user/u/user1/.vnc/lxplus0186.cern.ch:1.log
This output can be parsed in the following way in order to extract the server and display information:
echo "${VNCServerResponse}" | sed '/New.*desktop.*is/!d' \
| awk -F" desktop is " '{print $2}'
The result is something such as the following:
lxplus0186.cern.ch:1
What I want to do is use this parsing in a command substitution something like the following:
VNCServerAndDisplayNumber="$(echo "${VNCServerResponse}" \
| sed '/New.*desktop.*is/!d' | awk -F" desktop is " '{print $2}')"
On attempting to do this, I am presented with the following error:
bash: !d': event not found
I am not sure how to address this. It appears to be a problem in the way sed is being used in the command substitution. I would appreciate guidance.
Bash history expansion is a very odd corner in the bash command line parser, and you are clearly running into an unexpected history expansion, which is explained below. However, any sort of history expansion in a script is unexpected, because normally history expansion is not enabled in scripts; not even scripts run with the source (or .) builtin.
How history expansion is enabled (or disabled)
There are two shell options which control history expansion:
set -o history: Required for the history to be recorded.
set -H (or set -o histexpand): Additionally required for history expansion to be enabled.
Both of these options must be set for history expansion to be recognized. (I found the manual unclear on this interaction, but it's logical enough.)
According to the bash manual, these options are unset for non-interactive shells, so if you want to enable history expansion in a script (and I cannot imagine a reason you would want this), you would need to set both of them:
set -o history -o histexpand
The situation for scripts run with source is more complicated (and what I'm about to say only applies to bash v4, and since it's undocumented it might change in the future). [Note 3]
History recording (and consequently expansion) is turned off in source'd scripts, but through an internal flag which, as far as I know, is not made visible. It certainly does not appear in $SHELLOPTS. Since a sourced script runs in the current bash context, it shares the current execution environment, including shell options. So in the execution of a sourced script initiated from an interactive session, you'll see both history and histexpand in $SHELLOPTS, but no history expansion will take place. In order to enable it, you need to:
set -o history
which is not a no-op because it has the side-effect of resetting the internal flag which suppresses history recording. Setting the histexpand shell option does not have this side-effect.
In short, I'm not sure how you managed to enable history expansion in a script (if, indeed, the misbehaving command was in a script and not in an interactive shell), but you might want to consider not doing so, unless you have a really good reason.
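To find out which of the two options is active in a given shell, something like the following sketch can be dropped in just before the failing command. The variable names are made up for illustration. It only reports the two documented options and cannot see the internal flag used for sourced scripts described above.

```shell
# Report the two shell options that gate history expansion.
histexpand_state=off
history_state=off

# $- lists the single-letter options in effect; H stands for histexpand.
case $- in
  *H*) histexpand_state=on ;;
esac

# shopt -o queries "set -o" options; -q suppresses output.
if shopt -q -o history 2>/dev/null; then
  history_state=on
fi

echo "histexpand: $histexpand_state, history: $history_state"
```

In a script run as ./foo both should normally report off; in an interactive session both are usually on.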
How history expansion is parsed
The bash implementation of history expansion is designed to work with readline, so that it can be performed during command input. (By default this function is bound to Meta-^; generally Meta is ESC, but you can customize that as well.) However, it is also performed immediately after each line is input, before any bash parsing is performed.
By default, the history expansion character is !, and -- as mostly documented -- that will trigger history expansion except:
when it is followed by whitespace or =
if the shell option extglob is set, and it is followed by ( [Note 1]
if it appears in a single-quoted string
if it is preceded by a \ [Note 2 and see below]
if it is preceded by $ or ${ [Note 1]
if it is preceded by [ [Note 1]
(As of bash v4.3) if it is the last character in a double-quoted string.
The immediate issue here is the precise interpretation of the third case, an ! appearing inside of a single-quoted string. Normally, bash starts a new quoting context for a command substitution ($(...) or the deprecated backtick notation). For example:
$ s=SUBSTITUTED
$ # The interior single quotes are just characters
$ echo "'Echoing $s'"
'Echoing SUBSTITUTED'
$ # The interior single quotes are single quotes
$ echo "$(echo 'Echoing $s')"
Echoing $s
However, the history expansion scanner isn't that intelligent. It keeps track of quotes, but not of command substitution. So as far as it is concerned, both of the single quotes in the above example are double-quoted single quotes, which is to say ordinary characters. So history expansion occurs in both of them:
# A no-op to indicate history expansion
$ HIST() { :; }
# Single-quoted strings inhibit history expansion
$ HIST
$ echo '!!'
!!
# Double-quoted strings allow history expansion
$ HIST
$ echo "'!!'"
echo "'HIST'"
'HIST'
# ... and it applies also to interior command substitution.
$ HIST
$ echo "$(echo '!!')"
echo "$(echo 'HIST')"
HIST
So if you have a perfectly normal command like sed '/foo/!d' file, where you would expect the single-quotes to protect you from history-expansion, and you put it inside a double-quoted command substitution:
result="$(sed '/foo/!d' file)"
you suddenly find that the ! is a history expansion character. Worse, you can't fix this by backslash escaping the exclamation point, because although "\!" inhibits history expansion, it doesn't remove the backslash:
$ echo "\!"
\!
In this particular example -- and the one in the OP -- the double quotes are completely unnecessary, because the right-hand side of a variable assignment undergoes neither filename expansion nor word splitting. However, there are other contexts in which removing the double quotes would change the semantics:
# Undesired history expansion
printf "The answer is '%s'\n" "$(sed '/foo/!d' file)"
# Undesired word splitting
printf "The answer is '%s'\n" $(sed '/foo/!d' file)
In this case, the best solution is probably to put the sed argument in a variable
# Works
sed_prog='/foo/!d'
printf "The answer is '%s'\n" "$(sed "$sed_prog" file)"
(The quotes around $sed_prog were not necessary in this case but usually they would be, and they do no harm.)
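Applied to the command from the question, the same idea might look like the sketch below. The server response is hard-coded from the sample output in the question so that the snippet runs without ssh or a VNC server; sed_prog is just an illustrative variable name.

```shell
# Sample response copied from the question (normally produced via ssh).
VNCServerResponse="New 'lxplus0186.cern.ch:1 (user1)' desktop is lxplus0186.cern.ch:1
Starting applications specified in /afs/cern.ch/user/u/user1/.vnc/xstartup
Log file is /afs/cern.ch/user/u/user1/.vnc/lxplus0186.cern.ch:1.log"

# The ! sits in a top-level single-quoted string, so the history
# expansion scanner leaves it alone.
sed_prog='/New.*desktop.*is/!d'

# Inside the double-quoted command substitution only "$sed_prog" appears.
VNCServerAndDisplayNumber="$(printf '%s\n' "$VNCServerResponse" \
  | sed "$sed_prog" | awk -F' desktop is ' '{print $2}')"

echo "$VNCServerAndDisplayNumber"
```

Another way to sidestep the ! entirely is sed -n '/New.*desktop.*is/p', which prints the same lines that '/.../!d' keeps.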
Notes:
1. The inhibition of history expansion when the following character is some form of open parenthesis only works if there is a corresponding close parenthesis in the rest of the string. However, it doesn't have to really match the open parenthesis. For example:
# No matching close parenthesis
$ echo "!("
bash: !: event not found
# The matching close parenthesis has nothing to do with the open
$ echo "!(" ")"
!( )
# An actual extended glob: files whose names don't start with a
$ echo "!(a*)"
b
2. As indicated in the bash manual, a history-expansion character is treated as an ordinary character if immediately preceded by a backslash. This is literally true; it doesn't matter whether the backslash will later be considered an escape character or not:
$ echo \!
!
$ echo \\!
\!
$ echo \\\!
\!
\ also inhibits history expansion inside double quotes, but \! is not a valid escape sequence inside the double quoted string, so the backslash is not removed:
$ echo "\!"
\!
$ echo "\\!"
\!
$ echo "\\\!"
\\!
3. I'm referring to the source code for bash v4.2 as I write this, so any undocumented behaviour may be completely different as of v4.3.
The problem is that within double quotes, bash is trying to expand !d before passing it to the subshell. You can get around this problem by removing the double quotes but I would also propose a simplification to your script:
VNCServerAndDisplayNumber=$(echo "$VNCServerResponse" | awk '/desktop/ {print $NF}')
This simply prints the last field on the line containing the word "desktop".
On a newer bash, you can use a herestring rather than piping an echo:
VNCServerAndDisplayNumber=$(awk '/desktop/ {print $NF}' <<<"$VNCServerResponse")
Don't wrap the $(...) command substitution in double quotes. You are asking the shell to perform evaluation on the contents of the quotes and are hitting the history expansion feature. Drop the quotes and you stop telling the shell to do that and you won't hit that problem.
And yes, dropping those quotes is safe on that assignment line even if the output may contain spaces or newlines or whatever. Assignments of that sort are not going to split on those the way command substitution or variable evaluation will on a normal shell execution line.
Alternatively, disable history expansion in your shell/script before you run that. (It should be off when running a script by default I believe anyway.)
This only happens when history expansion is enabled, which it normally isn't and definitely shouldn't be for scripts.
Rather than trying to work around it, figure out why history expansion is enabled and what to do so it isn't.
If you're executing your script with . foo or source foo, use ./foo instead.
If you're writing this as a function in .bashrc or similar, consider making it a separate script.
If your script (or BASH_ENV) explicitly does set -H, don't.
Quote it with '' or \ or disable history expansion with set +H or shopt -u -o histexpand. See History Expansion.

sed - inserting line with /c\ that has a variable that contains spaces

I have just recently got back into learning bash. Currently working on a project of mine and when using sed I've run into an issue, I've tried looking around the web for help but haven't had any joy. I suspect as I may not be using the correct terminology so I can't find what I'm looking for. ANYHOW.
So in my script I'm trying to assign the output of date to a variable. Here's the line from my script.
origdate=$(date)
When I call it the output looks like this:
Wed Oct 5 19:40:45 BST 2016
Part of my script then generates a file and writes information to it, part of which I am trying to use sed to find lines and replace parts of it. This is the first I've been playing around with sed, I've used it successfully so far for my needs. However I'm getting stuck when I try this:
sed -i '/origdate=empty/c\'$origdate'' $sd/pingcheck-email-$job.txt
When I run the script and it gets to this line, this is the error I'm getting:
sed: can't read Oct: No such file or directory
sed: can't read 5: No such file or directory
sed: can't read 19:52:56: No such file or directory
sed: can't read BST: No such file or directory
sed: can't read 2016: No such file or directory
I suspect it's something to do with the spaces in the date (variable), my question is: how can I work around this? Can I get sed to 'ignore' the spaces? or should I just use cut to cut the field for the date, and set that to a variable and the same thing again to set the time to another variable?
Even if someone could kindly point me in the right direction that'd be great!
Thanks in advance!
double quote the variable
sed -i '/origdate=empty/c\'"$origdate"'' $sd/pingcheck-email-$job.txt
or alternatively, the whole script
sed -i "/origdate=empty/c\\$origdate" $sd/pingcheck-email-$job.txt
The problem is not with sed but rather with how bash word splits on your date given your command.
Bash
In bash, word splitting is performed on the command line so that text is broken up into a list of arguments. To illustrate, I'm going to run a simple script that outputs the first argument only.
bash -c 'echo $1' ignored_0 foo bar
Think of bash -c 'echo $1' ignored_0 as the command (sed in your case) and foo bar as the arguments. In this case, foo bar is split into two arguments, foo and bar.
To pass foo bar in as the first parameter, you need to have the text in either single or double quotes. See the GNU manual on quoting.
bash -c 'echo $1' ignored_0 'foo bar'
bash -c 'echo $1' ignored_0 "foo bar"
Parameter expansion does not occur when the variable is inside a single quote.
var="foo bar"
bash -c 'echo $1' ignored_0 '$var'
bash -c 'echo $1' ignored_0 "$var"
NOTE: In the command bash -c 'echo $1', I do not want $1 to expand before being passed as an argument to bash because that's part of the code I want to execute.
Parameter expansion occurs when variables are outside of quotes, but word splitting will apply after the parameter is expanded. From the bash man page in the Word Splitting section:
The shell scans the results of parameter expansion, command
substitution, and arithmetic expansion that did not occur within
double quotes for word splitting.
var="foo bar"
bash -c 'echo $1' ignored_0 $var
The last step in Shell Expansions is Quote Removal, where unquoted quote characters are removed before the arguments are passed to commands. The following command shows that ''"" has no effect on the arguments passed.
bash -c 'echo $1' ignored_0 foo''""
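As a quick check of the rules above, the following sketch counts how many arguments the date string from the question turns into with and without double quotes. It uses a fixed sample string in place of $(date) and assumes the default IFS.

```shell
origdate='Wed Oct 5 19:40:45 BST 2016'   # fixed sample in place of $(date)

# Unquoted: the expansion is split on whitespace into separate words.
set -- $origdate
unquoted_count=$#

# Double-quoted: the expansion stays a single word.
set -- "$origdate"
quoted_count=$#

echo "unquoted: $unquoted_count, quoted: $quoted_count"
# prints "unquoted: 6, quoted: 1"
```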
Application
In your example, the trailing '' after $origdate is extraneous. The important part is that $origdate is not quoted so word splitting applies to the expanded variable.
When -e is not passed to the sed command, sed expects the expression to be in one argument, or word from bash. When you run your command, your expression is /origdate=empty/c\Wed and the rest of the date is considered to be files for the expression to be applied to.
The simple fix is to put double quotes around the string for which you want to prevent word splitting. I've modified the command so that anyone can run this example without having the files on their system.
In this example, the \ must be escaped so that it is not considered an escape character for $.
echo "origdate=empty" | sed "/origdate=empty/c\\$origdate"
You can also change the type of quotes you are using without affecting word splitting like so.
echo "origdate=empty" | sed '/origdate=empty/c\'"$origdate"
You need to escape it with a double backslash
\ / \%

Bash wildcard pattern using `seq`

I am trying the following command:
ls myfile.h1.{`seq -s ',' 3501 3511`}*
But ls raises the error:
ls: cannot access myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511}*: No such file or directory
Seems like ls is thinking the entire line is a filename and not a wildcard pattern. But if I just copy that command ls myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511}* in the terminal I get the listing as expected.
Why does typing out the command in full work, but not the usage with seq?
seq is not needed for your case, try
$ ls myfile.h1.{3501..3511}
if you want to use seq I would suggest using format option
$ ls $(seq -f 'myfile.h1.%g' 3501 3511)
but I don't think there is any reason to do so.
UPDATE:
Note that I didn't notice the globbing in the original post. With that, brace expansion is still the preferred way:
$ ls myfile.h1.{3501..3511}*
perhaps even factoring out the common digits, if your bash supports zero padding:
$ ls myfile.h1.35{01..11}*
If not, you can at least factor out the 3:
$ ls myfile.h1.3{501..511}*
Note that the seq alternative won't work with globbing.
Other answer has more details...
karakfa's answer, which uses a literal sequence brace expansion expression, is the right solution.
As for why your approach didn't work:
Bash's brace expansion {...} only works with literal expressions - neither variable references nor, as in your case, command substitutions (`...`, or, preferably, $(...)) work[1] - for a concise overview, see this answer of mine.
With careful use of eval, however, you can work around this limitation; to wit:
from=3501 to=3511
# CAVEAT: Only do this if you TRUST that $from and $to contain
# decimal numbers only.
eval ls "myfile.h1.{$from..$to}*"
@ghoti suggests the following improvement in a comment to make the use of eval safe here:
# Use parameter expansion to remove all non-digit characters from the values
# of $from and $to, thus ensuring that they either contain only a decimal
# number or the empty string; this expansion happens *before* eval is invoked.
eval ls "myfile.h1.{${from//[^0-9]/}..${to//[^0-9]/}}*"
As for how your command was actually evaluated:
Note: Bash applies 7-8 kinds of expansions to a command line; only the ones that actually come into play here are discussed below.
first, the command in command substitution `seq -s ',' 3501 3511` is executed, and replaced by its output (also note the trailing , emitted by BSD/macOS seq; GNU seq does not add one):
3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511,
the result then forms a single word with its prefix, myfile.h1.{ and its suffix, }*, yielding:
myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511,}*
pathname expansion (globbing) is then applied to the result - in your case, since no files match, it is left as-is (by default; shell options shopt -s nullglob or shopt -s failglob could change that).
finally, literal myfile.h1.{3501,3502,3503,3504,3505,3506,3507,3508,3509,3510,3511,}* is passed to ls, which - because it doesn't refer to an existing filesystem item - results in the error message you saw.
[1] Note that the limitation only applies to sequence brace expansions (e.g., {1..3}); list brace expansions (e.g., {1,2,3}) are not affected, because no up-front interpretation (interpolation) is needed; e.g. {$HOME,$USER} works, because brace expansion first splits the list into the separate words $HOME and $USER, which are only expanded later.
Historically, sequence brace expansions were introduced later, at a time when the order of shell expansions was already fixed.
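If the endpoints have to come from variables and eval is not wanted, a plain loop is one alternative. The sketch below is only an illustration under assumed names: it creates a few sample files in a temp directory, shows that {$from..$to} is not expanded as a sequence, and then collects the matching files with a C-style for loop (bash-specific).

```shell
# Sample files in a throwaway directory (names made up for illustration).
tmp=$(mktemp -d)
touch "$tmp/myfile.h1.3501.dat" "$tmp/myfile.h1.3511.log" "$tmp/myfile.h1.3600.dat"

from=3501 to=3511

# Brace expansion runs before variable expansion, so no sequence is produced.
literal=$(echo {$from..$to})
echo "$literal"          # prints {3501..3511}

# Loop over the range instead and let each glob expand on its own.
shopt -s nullglob        # patterns that match nothing expand to nothing
matches=()
for ((i = from; i <= to; i++)); do
  matches+=( "$tmp"/myfile.h1."$i"* )
done
shopt -u nullglob

printf '%s\n' "${matches[@]##*/}"   # file names without the directory part
rm -rf "$tmp"
```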

shell script exit with no match with question mark symbol

Why does ./script.sh ? throw No match, while ./script.sh on its own runs fine?
script.sh
#!/bin/sh
echo "Hello World"
? is a glob character on UNIX. By default, in POSIX shells, a glob that matches no files at all will evaluate to itself; however, many shells have the option to modify this behavior and either pass no arguments in this case or make it an error.
If you want to pass this (or any other string which can be interpreted as a glob) literally, quote it:
./script.sh '?'
If you didn't use quotes, consider what the following would do:
touch a b c
./script.sh ? ## this is the same as running: ./script.sh a b c
That said -- the behavior of your outer shell (exiting when no matches exist, rather than defaulting to pass the non-matching glob expression as a literal) is non-default. If this shell is bash, you can modify it with:
shopt -u failglob
Note, however, that this doesn't really fix your problem, but only masks it when your current directory has no single-character filenames. The only proper fix is to correct your usage to quote and escape values properly.
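The difference between the quoted and unquoted forms can be checked with a short sketch that counts the resulting arguments in a scratch directory holding three single-character file names (the same a, b, c as above).

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch a b c

# Unquoted: the shell replaces ? with every matching single-character name.
set -- ?
unquoted_args=$#

# Quoted: the question mark is passed through literally as one argument.
set -- '?'
quoted_args=$#
quoted_value=$1

cd - >/dev/null || exit 1
rm -rf "$tmp"

echo "unquoted: $unquoted_args, quoted: $quoted_args ($quoted_value)"
# prints "unquoted: 3, quoted: 1 (?)"
```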

Why does grep ignore the shell variable containing directories to be ignored?

On Mac OS X, I have a bash script like this:
# Directories excluded from grep go here.
EXCLUDEDIR="--exclude-dir={node_modules,.git,tmp,angular*,icons,server,coffee}"
# This grep needs to include one line below the hit.
grep -iIrn -A1 $EXCLUDEDIR -e "class=[\"\']title[\"\']>$" -e "<div class=\"content" . > microcopy.txt
but it seems to be ignoring $EXCLUDEDIR. If I simply use the --exclude-dir directly, it works. Why won't it expand the variable and work right?
The braces are technically an error. When they are in a variable, they are included verbatim, while when you type them directly as part of the command, Bash performs brace expansion, and effectively removes the braces from your expression.
bash$ echo --exclude-dir=moo{bar,baz}
--exclude-dir=moobar --exclude-dir=moobaz
bash$ x='moo{bar,baz}'
bash$ echo --exclude-dir=$x
--exclude-dir=moo{bar,baz}
The (not so simple) workaround is to list your parameters explicitly instead. This can be somewhat simplified by using an array to list the directory names you want to exclude (but this is not portable to legacy /bin/sh).
x=(node_modules .git tmp angular\* icons server coffee)
EXCLUDEDIR="${x[@]/#/--exclude-dir=}"
The backslash in angular\* is to pass this wildcard expression through to grep unexpanded -- if the shell would expand the variable, grep would not exclude directories matching the wildcard expression in subdirectories (unless they conveniently happened to match one of the expanded values in the current directory). If you have nullglob in effect, an unescaped wildcard would simply disappear from the lists.
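A variation on the array idea, sketched here under assumptions, keeps the options in an array all the way to the grep call, so that each --exclude-dir=... stays a separate, unglobbed argument. The directory layout and the names exclude_dirs and exclude_opts are invented for the demonstration, and it assumes bash plus a grep with --exclude-dir support.

```shell
# Directory names to skip; the wildcard is quoted so grep receives it as-is.
exclude_dirs=(node_modules .git tmp 'angular*' icons server coffee)

# Prefix every element with --exclude-dir= and keep the result as an array.
exclude_opts=("${exclude_dirs[@]/#/--exclude-dir=}")

# Throwaway tree to search (hypothetical names).
tmp=$(mktemp -d)
mkdir -p "$tmp/node_modules" "$tmp/angular-ui" "$tmp/src"
echo 'class="title">' > "$tmp/node_modules/x.html"
echo 'class="title">' > "$tmp/angular-ui/y.html"
echo 'class="title">' > "$tmp/src/z.html"

# "${exclude_opts[@]}" expands to one word per option, with no re-splitting.
hits=$(cd "$tmp" && grep -rl "${exclude_opts[@]}" -e 'title' .)
echo "$hits"

rm -rf "$tmp"
```

Like the array in the answer above, this is not portable to legacy /bin/sh.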
@tripleee correctly describes the problem, but there are two workarounds that I think are simpler (and, I think, more portable) than using an array: use eval in the grep command, or use echo in the variable assignment itself. The echo method is preferable.
Using eval
# Directories excluded from grep go here.
EXCLUDEDIR="--exclude-dir={node_modules,.git,tmp,angular*,icons,server,coffee}"
# This grep needs to include one line below the hit.
eval grep -iIrn -A1 $EXCLUDEDIR # .... etc
This causes the braces to be expanded as if they had been typed literally. Note, however, that it may have some unintended side-effects if you're not careful; for instance, you may need to add some extra \'s to escape quotes and $-signs.
Using echo
This is potentially safer than eval, since you won't accidentally execute code hidden in the EXCLUDEDIR variable.
# Directories excluded from grep go here.
EXCLUDEDIR="$(echo --exclude-dir={node_modules,.git,tmp,angular*,icons,server,coffee})"
# This grep needs to include one line below the hit.
grep -iIrn -A1 $EXCLUDEDIR # .... etc
