Where is bash globbing behavior documented? - bash

I know I can do the following:
ls /dir/*/subdir/file
to list any matching files in any matching directories. Is this just regular globbing? It feels like it is more advanced than the following usage:
ls /dir/subdir/file*
I think of it as "branching/searching glob". If it is different to regular globbing, what is its real name and where is its behaviour documented?

man bash
search for Pathname Expansion

There's no particular distinction between these two forms in the bash documentation; they're both described under Filename Expansion and Pattern Matching.

It's on the bash man page, in the section on Pattern Matching.
* Matches any string, including the null string.

And if you want to be more portable here are the POSIX specs.

Related

Bash pattern matching 'or' using shell parameter expansion

hashrate=${line//*:/}
hashrate=${hashrate//H\/s/}
I'm trying to unify this regex replace into a single command, something like:
hashrate=${line//*:\+/H\/s/}
However, this last option doesn't work. I also tried with \|, but it doesn't seem to work and I haven't found anything useful in bash manuals and documentation. I need to use ${} instead of sed, even if using it solves my problem.
The alternation for shell patterns (assuming extended globbing, shopt -s extglob is enabled), is #(pattern|pattern...). For your case:
${line//#(*:|H\/s)}
The trailing / is optional if you just remove a pattern instead of replacing it.
Notice that because of the double slash, //, all occurrences of the patterns will be removed, one at a time. If you used *(...) (see randomir's answer), consecutive patterns would be removed all in one go. Unless you have giant string, the difference should be negligible. (If you have giant strings, you don't want to use globbing anyway, as it's not optimized for this kind of thing.)
If you enable extended globbing (extglob via shopt), you can use the *(pattern1|pattern2|...) operator to match zero or more glob patterns:
hashrate="${line//*(*:|H\/s)/}"

Weird issue when running grep with the --include option

Here is the code at the bash shell. How is the file mask supposed to be specified, if not this way? I expected both commands to find the search expression, but it's not happening. In this example, I know in advance that I prefer to restrict the search to python source code files only, because unqualified searches are silly time wasters.
So, this works as expected:
grep -rni '/home/ga/projects' -e 'def Pr(x,u,v)'
/home/ga/projects/anom/anom.py:27:def Pr(x,u,v): blah, blah, ...
but this won't work:
grep --include=\*.{py} -rni '/home/ga/projects' -e 'def Pr(x,u,v)'
I'm using GNU grep version 2.16.
--include=\*.{py} looks like a broken attempt to use brace expansion (an unquoted {...} expression).
However, for brace expansion
to occur in bash (and ksh and zsh), you must either have:
a list of at least 2 items, separated with ,; e.g. {py,txt}, which expands to 2 arguments, py and txt.
or, a range of items formed from two end points, separated with ..; e.g., {1..3}, which expands to 3 arguments, 1, 2, and 3.
Thus, with a single item, simply do not use brace expansion:
--include=\*.py
If you did have multiple extensions to consider, e.g., *.py as well as *.pyc files, here's a robust form that illustrates the underlying shell features:
'--include=*.'{py,pyc}
Here:
Brace expansion is applied, because {...} contains a 2-item list.
Since the {...} directly follows the literal (single-quoted) string --include=*., the results of the brace expansion include the literal part.
Therefore, 2 arguments are ultimately passed to grep, with the following literal content:
--include=*.py
--include=*.pyc
Your command fails because of the braces '{}'. It will search for it in the file name. You can create a file such as 'myscript.{py}' to convince yourself. You'll see it will appear in the results.
The correct option parameter would be '*.py' or the equivalent \*.py. Either way will protect it from being (mis)interpreted by the shell.
On the other side, I can only advise to use the command find for such jobs :
find /home/ga/projects -regex '.*\.py$' -exec grep -e "def Pr(x,u,v)" {} +
That will protect you from hard to understand shell behaviour.
Try like this (using quotes to be safe; also better readability than backslash escaping IMHO):
grep --include='*.py' ...
your \*.{py} brace expansion usage isn't supported at all by grep. Please see the comments below for the full investigation regarding this. For the record, blame this answer for the resulting brace wars ;)
By the way, the brace expansion works generally fine in Bash. See mklement0 answer for more details.
Ack. As an alternative, you might consider switching to ack instead from now on. It's a tool just like grep, but fully optimized for programmers.
It's a great fit for what you are doing. A nice quote about it:
Every once in a while something comes along that improves an idea so much, you can't ignore it. Such a thing is ack, the grep replacement.

Pattern to ignore 'history numlines' in HISTIGNORE, bash

What would be the correct pattern for ignoring commands such as:
history 10
history 104
history .. #whatever may be the number here
I tried:
HISTIGNORE='history\s+\d*'
but that doesn't work.
The value of HISTIGNORE is a list of shell patterns. Not a list of regular expressions. As such the regular expressions will not work.
This pattern 'history *[0-9]*' should do what is needed here.
Edit: Pulling in added information from the comments.
To also ignore history by itself the simplest solution is to just add history to the value of HISTIGNORE.
But, when extglob is enabled (and assuming HISTIGNORE honors it) this pattern should cover that as well:
'history?( *[0-9]*)'
To have multiple patterns you can separate them with a :
Like this:
HISTIGNORE="&:exit:pwd:rm *:history *:[ \t]*"

Why would I not leave extglob enabled in bash?

I just found out about the bash extglob shell option here:-
How can I use inverse or negative wildcards when pattern matching in a unix/linux shell?
All the answers that used shopt -s extglob also mentioned shopt -u extglob to turn it off.
Why would I want to turn something so useful off? Indeed why isn't it on by default?
Presumably it has the potential for giving some nasty surprises.
What are they?
No nasty surprises -- default-off behavior is only there for compatibility with traditional, standards-compliant pattern syntax.
Which is to say: It's possible (albeit unlikely) that someone writing fo+(o).* actually intended the + and the parenthesis to be treated as literal parts of the pattern matched by their code. For bash to interpret this expression in a different manner than what the POSIX sh specification calls for would be to break compatibility, which is right now done by default in very few cases (echo -e with xpg_echo unset being the only one that comes immediately to mind).
This is different from the usual case where bash extensions are extending behavior undefined by the POSIX standard -- cases where a baseline POSIX shell would typically throw an error, but bash instead offers some new and different explicitly documented behavior -- because the need to treat these characters as matching themselves is defined by POSIX.
To quote the relevant part of the specification, with emphasis added:
An ordinary character is a pattern that shall match itself. It can be any character in the supported character set except for NUL, those special shell characters in Quoting that require quoting, and the following three special pattern characters. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself. The shell special characters always require quoting.
When unquoted and outside a bracket expression, the following three characters shall have special meaning in the specification of patterns:
? - A question-mark is a pattern that shall match any character.
* - An asterisk is a pattern that shall match multiple characters, as described in Patterns Matching Multiple Characters.
[ - The open bracket shall introduce a pattern bracket expression.
Thus, the standard explicitly requires any non-NUL character other than ?, * or [ or those listed elsewhere as requiring quoting to match themselves. Bash's behavior of having extglob off by default allows it to conform with this standard in its default configuration.
However, for your own scripts and your own interactive shell, unless you're making a habit of running code written for POSIX sh with unusual patterns included, enabling extglob is typically worth doing.
Being a Kornshell person, I have extglob on in my .bashrc by default because that's the way it is in Kornshell, and I use it a lot.
For example:
$ find !(target) -name "*.xml"
In Kornshell, this is no problem. In BASH, I need to set extglob. I also set lithist and set -o vi. This allows me to use VI commands in using my shell history, and when I hit v, it shows my code as a bunch of lines.
Without lithist set:
for i in *;do;echo "I see $i";done
With listhist set:
for i in *
do
echo "I see $i"
done
Now, only if BASH had the print statement, I'd be all set.

Search and replace in Shell

I am writing a shell (bash) script and I'm trying to figure out an easy way to accomplish a simple task.
I have some string in a variable.
I don't know if this is relevant, but it can contain spaces, newlines, because actually this string is the content of a whole text file.
I want to replace the last occurence of a certain substring with something else.
Perhaps I could use a regexp for that, but there are two moments that confuse me:
I need to match from the end, not from the start
the substring that I want to scan for is fixed, not variable.
for truncating at the start: ${var#pattern}
truncating at the end ${var%pattern}
${var/pattern/repl} for general replacement
the patterns are 'filename' style expansion, and the last one can be prefixed with # or % to match only at the start or end (respectively)
it's all in the (long) bash manpage. check the "Parameter Expansion" chapter.
amn expression like this
s/match string here$/new string/
should do the trick - s is for sustitute, / break up the command, and the $ is the end of line marker. You can try this in vi to see if it does what you need.
I would look up the man pages for awk or sed.
Javier's answer is shell specific and won't work in all shells.
The sed answers that MrTelly and epochwolf alluded to are incomplete and should look something like this:
MyString="stuff ttto be edittted"
NewString=`echo $MyString | sed -e 's/\(.*\)ttt\(.*\)/\1xxx\2/'`
The reason this works without having to use the $ to mark the end is that the first '.*' is greedy and will attempt to gather up as much as possible while allowing the rest of the regular expression to be true.
This sed command should work fine in any shell context used.
Usually when I get stuck with Sed I use this page,
http://sed.sourceforge.net/sed1line.txt

Resources