bash file globbing anomaly - bash

The bash manual (I'm using version 4.3.42 on OSX) states that the vertical bar '|' character is used as a separator for multiple file patterns in file globbing. Thus, the following should work on my system:
projectFiles=./config/**/*|./support/**/*
However, the second pattern gives a "Permission denied" on the last file that is in that directory structure so the pattern is never resolved into projectFiles. I've tried variations on this, including wrapping the patterns in parentheses,
projectFiles=(./config/**/*)|(./support/**/*)
which is laid out in the manual, but that doesn't work either.
Any suggestions on what I'm doing wrong?

You're probably referring to this part in man bash:
If the extglob shell option is enabled using the shopt builtin, several
extended pattern matching operators are recognized. In the following
description, a pattern-list is a list of one or more patterns separated
by a |. Composite patterns may be formed using one or more of the
following sub-patterns:
?(pattern-list)
Matches zero or one occurrence of the given patterns
*(pattern-list)
Matches zero or more occurrences of the given patterns
+(pattern-list)
Matches one or more occurrences of the given patterns
@(pattern-list)
Matches one of the given patterns
!(pattern-list)
Matches anything except one of the given patterns
The | separator works in pattern-lists as explained, but only when extglob is enabled:
shopt -s extglob
Try this (though see @chepner's comment below: a plain assignment like this performs no globbing at all):
projectFiles=*(./config/**/*|./support/**/*)
As @BroSlow pointed out in a comment:
Note that you can do this without extglob: ./{config,support}/**/* uses brace expansion, which first produces the two words ./config/**/* and ./support/**/*, and each of those then undergoes pattern matching. Or, with extglob, ./@(config|support)/**/*. Either seems cleaner.
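A sketch of both suggestions written as array assignments, since array assignment is where the globs actually get expanded. It assumes bash 4 or later, because ** only recurses with shopt -s globstar; nullglob is added so a directory with no matches contributes nothing instead of a literal pattern. The scratch directory and file names are made up for the demo.

```bash
#!/usr/bin/env bash
shopt -s extglob globstar nullglob   # @(...), recursive **, drop unmatched patterns

# Scratch tree standing in for the project (hypothetical layout).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
mkdir -p config/sub support other
touch config/a.conf config/sub/b.conf support/c.sh other/skip.txt

# Brace expansion runs first and yields two words, each globbed on its own.
viaBraces=( ./{config,support}/**/* )

# A single extglob word: @(config|support) matches either directory name.
viaExtglob=( ./@(config|support)/**/* )

printf '%s\n' "${viaBraces[@]}"
```

With this layout both arrays should hold the same four paths (./config/a.conf, ./config/sub, ./config/sub/b.conf and ./support/c.sh), and other/skip.txt is excluded by both.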
@chepner's comment is also worth mentioning:
Also, globbing isn't performed at all during a simple assignment; try foo=*, then compare echo "$foo" with echo $foo. Globbing does occur during array assignment; see foo=(*); echo "${foo[@]}"
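A small demonstration of the point made in that comment, run in a scratch directory with two made-up files. The same rule explains the error in the question: in projectFiles=./config/**/*|./support/**/* the unquoted | is the pipeline operator, so bash performed the (non-globbing) assignment in one pipeline stage and tried to run the first match of ./support/**/* as a command in the other, hence the Permission denied.

```bash
#!/usr/bin/env bash
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch one.txt two.txt

foo=*            # simple assignment: no globbing, foo holds the literal string *
echo "$foo"      # quoted expansion prints: *
echo $foo        # unquoted (on purpose): the * is globbed only now -> one.txt two.txt

bar=(*)          # array assignment: the glob is expanded during the assignment
echo "${bar[@]}" # one.txt two.txt
```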

Related

How to keep/remove numbers in a variable in shell?

I have a variable such as:
disk=/dev/sda1
I want to extract:
only the non numeric part (i.e. /dev/sda)
only the numeric part (i.e. 1)
I'm gonna use it in a script where I need the disk and the partition number.
How can I do that in shell (bash and zsh mostly)?
I was thinking about using Shell parameters expansions, but couldn't find working patterns in the documentation.
Basically, I tried:
echo ${disk##[:alpha:]}
and
echo ${disk##[:digit:]}
But neither worked. Both returned /dev/sda1
With bash and zsh and Parameter Expansion:
disk="/dev/sda12"
echo "${disk//[0-9]/} ${disk//[^0-9]/}"
Output:
/dev/sda 12
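A minimal sketch of that answer as a reusable bash function (split_disk is a hypothetical name). Because ${var//pattern/} deletes every match, it also strips digits that belong to the device name itself, which matters for names such as /dev/nvme0n1p2:

```bash
#!/usr/bin/env bash
# split_disk DEVICE: hypothetical helper; prints "<non-digits> <digits>".
split_disk() {
  local disk=$1
  # ${var//pattern/} deletes every match, so the first expansion keeps only
  # the non-digit characters and the second keeps only the digits.
  printf '%s %s\n' "${disk//[0-9]/}" "${disk//[^0-9]/}"
}

split_disk /dev/sda12       # /dev/sda 12
split_disk /dev/nvme0n1p2   # /dev/nvmenp 012  (digits inside the name are removed too)
```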
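The expansions kind of work the other way round. Written without the outer brackets, [:digit:] is just a bracket expression matching a single one of the characters :, d, i, g or t, and even [[:digit:]] matches only a single digit. You need to match everything up to, or from, a digit, so you need to use *.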
The following looks ok:
$ echo ${disk%%[0-9]*} ${disk##*[^0-9]}
/dev/sda 1
To use [:digit:] you need double brackets, because the character class is written [:class:] and it itself has to be inside [ ]. That's why I prefer 0-9, less typing*. The following is the same as above:
echo ${disk%%[[:digit:]]*} ${disk##*[^[:digit:]]}
* In theory they may not be equal: a range such as [0-9] can be affected by the current locale, so it is not guaranteed to be equivalent to [0123456789].
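The same %% / ## idea wrapped in two hypothetical helper functions, using the [[:digit:]] spelling. Note that %%[[:digit:]]* cuts at the first digit, so it is only safe when the device name itself contains no digits:

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration.
disk_base() { local d=$1; printf '%s\n' "${d%%[[:digit:]]*}"; }    # drop everything from the FIRST digit on
disk_part() { local d=$1; printf '%s\n' "${d##*[^[:digit:]]}"; }   # drop everything up to the LAST non-digit

disk_base /dev/sda1        # /dev/sda
disk_part /dev/sda1        # 1
disk_base /dev/nvme0n1p2   # /dev/nvme  (stops at the first digit inside the name)
disk_part /dev/nvme0n1p2   # 2
```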
You have to be careful when using patterns in parameter substitution. These patterns are not regular expressions but pathname expansion patterns, or glob patterns.
The idea is to remove the trailing number, so you want to use Remove matching suffix pattern (${parameter%%word}), which removes the longest match of the pattern described by word. A single digit is easily matched with the pattern [0-9]; a multi-digit number is harder. For this you need extended glob expressions:
*(pattern-list): Matches zero or more occurrences of the given patterns
So if you want to remove the last number, you use:
$ shopt -s extglob
$ disk="/dev/sda1"
$ echo "${disk#${disk%%*([0-9])}}" "${disk%%*([0-9])}"
1 /dev/sda
$ disk="/dev/dsk/c0t2d0s0"
$ echo "${disk#${disk%%*([0-9])}}" "${disk%%*([0-9])}"
0 /dev/dsk/c0t2d0s
We use ${disk#${disk%%*([0-9])}} to obtain the number: the inner expansion strips the trailing number, and the outer expansion then removes that remainder from the front of the original string, leaving only the number.
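A sketch of the extended-glob approach as a function (split_trailing_number is a hypothetical name). Unlike the simpler patterns, *([0-9]) with %% strips only the final run of digits, so digits inside the device name survive. The inner expansion is quoted in the # removal so the base is matched literally rather than as a pattern:

```bash
#!/usr/bin/env bash
shopt -s extglob   # needed for *(...); must be on before the function below is parsed

# split_trailing_number DEVICE: hypothetical helper; prints "<base> <trailing digits>".
split_trailing_number() {
  local disk=$1 base
  base=${disk%%*([0-9])}                      # strip only the final run of digits
  printf '%s %s\n' "$base" "${disk#"$base"}"  # quoted: remove the base literally
}

split_trailing_number /dev/sda1           # /dev/sda 1
split_trailing_number /dev/dsk/c0t2d0s0   # /dev/dsk/c0t2d0s 0
split_trailing_number /dev/nvme0n1p12     # /dev/nvme0n1p 12
```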
You can also use pattern substitution (${parameter/pattern/string}) with the anchors % and # to anchor the pattern to the end or the beginning of the parameter, respectively (see man bash for more information). This is completely equivalent to the previous solution:
$ shopt -s extglob
$ disk="/dev/sda1"
$ echo "${disk/${disk/%*([0-9])}/}" "${disk/%*([0-9])}"
1 /dev/sda
$ disk="/dev/dsk/c0t2d0s0"
$ echo "${disk/${disk/%*([0-9])}/}" "${disk/%*([0-9])}"
0 /dev/dsk/c0t2d0s
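For comparison, the anchored-substitution variant as a function (split_anchored is a hypothetical name). This sketch also anchors the second substitution with /# so the base can only be removed from the front of the string, a small deviation from the answer above:

```bash
#!/usr/bin/env bash
shopt -s extglob

# ${var/%pat/} deletes pat only at the end; ${var/#pat/} only at the beginning.
split_anchored() {
  local disk=$1 base
  base=${disk/%*([0-9])/}                         # trailing digits removed
  printf '%s %s\n' "$base" "${disk/#"$base"/}"    # leading base removed (literal match)
}

split_anchored /dev/sda1           # /dev/sda 1
split_anchored /dev/dsk/c0t2d0s0   # /dev/dsk/c0t2d0s 0
```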

Using Interval expressions with bash extended globbing

I know for a fact that bash supports extended globs, with regular-expression-like support for @(foo|bar), *(foo) and ?(foo). This syntax is quite unique, i.e. different from that of EREs: extended globs use a prefix notation (where the operator appears before its operands), rather than postfix like EREs.
I'm wondering whether it supports interval expressions of the form {n,m}, i.e. if there is one number in the braces, the preceding regexp is repeated n times, and if there are two numbers separated by a comma, the preceding regexp is repeated n to m times. I couldn't find any documentation suggesting that extended globs support this.
Actual Question
I came across a requirement in one of the questions today: remove only a pair of trailing zeroes from a string. I tried to solve this with the extended glob support in bash.
Given some sample strings like
foobar0000
foobar00
foobar000
should produce
foobar00
foobar
foobar0
I tried using extended glob with parameter expansion to do
x='foobar000'
respectively. I also tried the interval expression below, which, as I expected, did not work:
echo ${x%%+([0]{2})}
i.e. the equivalent of sed -E 's/[0]{2}$//' in ERE or sed 's/[0]\{2\}$//' in BRE.
So my question is: is this possible using any of the extended glob operators? I'm looking for answers specific to the extended glob support in bash, and I would accept 'No' if it is not possible.
Somehow I managed to find a way to do this within the confines of bash.
Are interval glob-expressions implemented in bash?
No! In contrast to other shells such as ksh and zsh, bash did not implement interval expressions for globbing.
Can we mimic interval expressions in bash?
Yes! However, it is not very practical, though printf can sometimes help. The idea is to build a glob expression that mimics the {n,m} interval using the ksh-style globs @(pattern) and ?(pattern).
In the explanation below, we assume that the pattern is stored in variable p
Match n occurrences of the given pattern ({n}):
The idea is to repeat the pattern n times. For large n you can use printf:
$ var="foobar01010"
$ echo ${var%%@(0|1)@(0|1)}
foobar010
or
$ var="foobar01010"
$ p=$(printf "@(0|1)%.0s" {1..4})
$ echo ${var%%$p}
foobar0
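A sketch of the {n} case without the printf trick (repeat_glob is a hypothetical helper). A loop is used because brace expansion happens before parameter expansion, so {1..$n} cannot take its bound from a variable, and {1..0} counts down and yields two words rather than none:

```bash
#!/usr/bin/env bash
shopt -s extglob

# repeat_glob ALT COUNT: hypothetical helper that prints "@(ALT)" COUNT times,
# a glob stand-in for the ERE (ALT){COUNT}.
repeat_glob() {
  local alt=$1 count=$2 out="" i
  for ((i = 0; i < count; i++)); do
    out+="@($alt)"
  done
  printf '%s\n' "$out"
}

var="foobar01010"
p=$(repeat_glob '0|1' 4)   # @(0|1)@(0|1)@(0|1)@(0|1)
echo "${var%%$p}"          # foobar0 (the unquoted $p inside is still used as a pattern)
```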
Match at least m occurrences of the given pattern ({m,}):
It is the same as before, but with an additional *(pattern)
$ var="foobar01010"
$ echo ${var%%@(0|1)@(0|1)*(0|1)}
foobar
or
$ var="foobar01010"
$ p="(0|1)"
$ q=$(printf "@$p%.0s" {1..4})
$ echo ${var%%$q*$p}
foobar
Match from n to m occurrences of the given pattern ({n,m}):
The interval expression {n,m} implies we have n guaranteed appearances and m-n optional ones. These can be constructed using the ksh-globs @(pat) n times and ?(pat) m-n times. For n=2 and m=3, this leads to:
$ var="foobar01010"
$ echo ${var%%@(0|1)@(0|1)?(0|1)}
foobar01
or
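$ p="(0|1)"; n=2; m=3
$ q=$(printf "@$p%.0s" $(seq "$n"))$(printf "?$p%.0s" $(seq "$((m-n))"))
$ echo ${var%%$q}
foobar01
Brace expansion happens before parameter expansion, so {1..$n} would not work here; seq is used instead, assuming m > n.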
$ var="foobar00200"
$ echo ${var%%$q}
foobar002
$ var="foobar00020"
$ echo ${var%%$q}
foobar00020
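Putting the three cases together, a sketch of a hypothetical strip_interval helper that mimics sed -E 's/(ALT){N,M}$//' by building N copies of @(ALT) followed by M-N copies of ?(ALT), or by *(ALT) when M is omitted (the {N,} case):

```bash
#!/usr/bin/env bash
shopt -s extglob

# strip_interval STRING ALT N [M]: hypothetical helper. Removes the longest
# suffix made of N..M occurrences of the extglob alternative ALT (e.g. "0|1").
strip_interval() {
  local s=$1 p="($2)" n=$3 m=${4-} q="" i
  for ((i = 0; i < n; i++)); do q+="@$p"; done     # N mandatory copies
  if [[ -z $m ]]; then
    q+="*$p"                                       # {N,}: any number more
  else
    for ((i = n; i < m; i++)); do q+="?$p"; done   # M-N optional copies
  fi
  printf '%s\n' "${s%%$q}"
}

strip_interval foobar01010 '0|1' 2 3   # foobar01
strip_interval foobar00200 '0|1' 2 3   # foobar002
strip_interval foobar00020 '0|1' 2 3   # foobar00020 (no run of 2 or more)
strip_interval foobar01010 '0|1' 4     # foobar
```

For the question's own requirement, strip_interval "$x" 0 2 2 removes exactly one trailing pair of zeros.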
Another way to construct the interval expression {n,m} is to use the ksh-glob "anything except" pattern, written !(pat), which lets us say: give me everything, except...
man bash:
!(pattern-list): Matches anything except one of the given patterns
This way we can write
$ echo ${var%%!(!(*$p)|@$p@$p@$p+$p|?$p)}
or
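$ p="(0|1)"; n=2; m=3
$ toolong=$(printf "@$p%.0s" $(seq "$m"))
$ tooshort=$(printf "?$p%.0s" $(seq "$((n-1))"))
$ echo ${var%%!(!(*$p)|$toolong+$p|$tooshort)}
note: you need to do a double exclusion here due to the or (|) in the pattern list. !(*$p) excludes anything that is not made up solely of the pattern, $toolong+$p excludes more than m occurrences and $tooshort excludes fewer than n, which reproduces the explicit pattern above for n=2, m=3. This assumes n is at least 2, since seq 0 prints nothing and printf would still emit its format once.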
What about other shells?
KSH93
The interval expression {n,m} has been implemented in ksh93:
man ksh:
{n}(pattern-list) Matches n occurrences of the given patterns.
{m,n}(pattern-list) Matches from m to n occurrences of the given patterns. If m is omitted, 0 will be used. If n is omitted at least m occurrences will be matched.
$ echo ${var%%{2,3}(0|1)}
ZSH
Also zsh has a form of interval expression. It is a globbing flag which is part of the EXTENDED_GLOB option:
man zshall:
(#cN,M) The flag (#cN,M) can be used anywhere that the # or ## operators can be used except in the expressions (*/)# and (*/)## in filename generation, where / has special meaning; it cannot be combined with other globbing flags and a bad pattern error occurs if it is misplaced. It is equivalent to the form {N,M} in regular expressions. The previous character or group is required to match between N and M times, inclusive. The form (#cN) requires exactly N matches; (#c,M) is equivalent to specifying N as 0; (#cN,) specifies that there is no maximum limit on the number of matches.
$ echo ${var%%(0|1)(#c2,3)}
No
"Extended pattern matching features" are enabled using extglob (hence the name extended glob). Extended pattern matching features are used in an operation called pattern matching. Pattern matching is used in filename expansion, in [[...]] conditional constructs with the = or != operators, and in parameter expansions such as ${parameter%%word}.
As the Pattern Matching section of the manual shows, with or without extglob, pattern matching does not support expressions like [set]{count}. We can, for example, match one or more occurrences with +(..), but we cannot specify an exact number of occurrences of a pattern.
But this is bash and bash is powerful. We can specify the number of occurrences of a pattern simply by repeating the pattern. We cannot specify the ending or the beginning (I mean like using ^ and $ in regex), but we can use ${parameter%%word} parameter expansions to remove the trailing portion of the parameter. So this will work:
var='foobar000'
echo ${var%%[0][0]}
and, with some simple hacking, we can do this:
var='foobar000'
echo ${var%%$(yes '[0]' | head -n 2 | tr -d '\n')}
and this will remove two trailing zeros from the string.
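Checking the question's sample strings against that answer (strip_pair is a hypothetical name). Since [0][0] contains no wildcard, ${x%00} would behave identically here:

```bash
#!/usr/bin/env bash
# strip_pair STRING: hypothetical helper; removes one trailing "00" if present.
strip_pair() { printf '%s\n' "${1%%[0][0]}"; }

for x in foobar0000 foobar00 foobar000; do
  printf '%s -> %s\n' "$x" "$(strip_pair "$x")"
done
# foobar0000 -> foobar00
# foobar00 -> foobar
# foobar000 -> foobar0
```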

Bash bad substitution with glob expansion for environment variables

How can I match environment variables which include the case-insensitive segment "proxy" that is not a prefix? I'm on bash:
root@PDPINTDEV9:~# echo ${SHELL}
/bin/bash
I want to unset a bunch of proxy variables simultaneously. They all have "proxy" or "PROXY" in the name, such as http_proxy or NO_PROXY. I would like to use glob expansion, which this answer & comment says is what bash uses.
Also based on that answer, I see that I can find environment vars which start with "PROXY":
root@PDPINTDEV9:~# echo "${!PROXY*}"
PROXY_IP PROXY_PORT
But that doesn't make sense with what I've read about glob expansion. Based on those, "${!PROXY*}" should match anything that doesn't start with proxy... I think.
Furthermore, I can't get anything that does make sense with glob syntax to actually work:
root@PDPINTDEV9:~# echo ${*proxy}
-bash: ${*proxy}: bad substitution
root@PDPINTDEV9:~# echo "${!*[pP][rR][oO][xX][yY]}"
-bash: ${!*[pP][rR][oO][xX][yY]}: bad substitution
SOLVED below: Turns out you can't. Crazy, but thanks everyone.
Variable name expansion, as a special case of shell parameter expansion, does not support globbing. But it has two flavors:
${!PREFIX*}
${!PREFIX@}
In both, the * and @ characters are hard-coded.
The first form will expand to variable names prefixed with PREFIX and joined by the first character of the IFS (which is a space, by default):
$ printf "%s\n" "${!BASH*}"
BASH BASHOPTS BASHPID BASH_ALIASES BASH_ARGC BASH_ARGV BASH_CMDS BASH_COMMAND ...
The second form will expand to variable names (prefixed with PREFIX), but as separate words:
$ printf "%s\n" "${!BASH@}"
BASH
BASHOPTS
BASHPID
BASH_ALIASES
BASH_ARGC
...
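A sketch contrasting the two forms with a few made-up variables. The exact list depends on what else is set in your environment, so no exact output is claimed; temporarily setting IFS to a comma makes the single-word joining of the * form visible, and proxy_user shows that the prefix match is case-sensitive:

```bash
#!/usr/bin/env bash
# Made-up variables for the demo.
PROXY_IP=10.0.0.1
PROXY_PORT=3128
proxy_user=me

# "${!PREFIX@}": every matching name becomes a separate word.
names=( "${!PROXY@}" )
printf '%s\n' "${names[@]}"      # PROXY_IP and PROXY_PORT, one per line (not proxy_user)

# "${!PREFIX*}": all names joined into ONE word by the first character of IFS.
IFS=,
joined="${!PROXY*}"
unset IFS                        # back to default word splitting
echo "$joined"                   # comma-separated names
```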
Both of these forms are case-sensitive and match only a prefix, so to get the variable names in a case-insensitive manner, you can use set in combination with cut and grep:
$ (set -o posix; set) | cut -d= -f1 | grep -i '^proxy'
PROXY_IP
proxy_port
Drop the ^ anchor to match "proxy" anywhere in the name (e.g. http_proxy or NO_PROXY).
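An alternative sketch, assuming bash 4 or later for mapfile: compgen -v prints every variable name one per line, so grep -i can match "proxy" anywhere in the name, which is what the question asks for, and the resulting list can be handed to unset. The variables below are made up for the demo; unset only affects the current shell and the processes it starts afterwards.

```bash
#!/usr/bin/env bash
# Made-up proxy settings for the demo.
export http_proxy=http://proxy.example:3128 NO_PROXY=localhost
PROXY_IP=10.0.0.1
keep_me=1

# compgen -v prints every variable name, one per line; grep -i matches
# "proxy" anywhere in the name, ignoring letter case.
mapfile -t found < <(compgen -v | grep -i 'proxy')
printf '%s\n' "${found[@]}"

# Unset every match at once (skipped when nothing matched).
if (( ${#found[@]} > 0 )); then
  unset -v "${found[@]}"
fi
```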
But that doesn't make sense with what I've read about glob expansion.
Based on those, "${!PROXY*}" should match anything that doesn't start
with proxy... I think.
No and no.
In the first place, the ! character is not significant to pathname expansion, except when it appears at the beginning of a character class in a pattern, in which case the sense of the class is inverted. For example, fo[!o] is a pattern that matches any three-character string whose first two characters are "fo" and whose third is not another 'o'. But there is no character class in your expression.
But more importantly, pathname expansion isn't relevant to your expression ${!PROXY*} at all. There is no globbing there. The '!' and '*' are fixed parts of the syntax for one of the forms of parameter expansion. That particular expansion produces, by definition, the names of all shell variables whose names start with "PROXY", separated by the first character of the value of the IFS variable. Where it appears outside of double quotes, it is equivalent to ${!PROXY@}, which is less susceptible to globbing-related confusion.
Furthermore, I can't get anything that does make sense with glob syntax to actually work: [...]
No, because, again, there is no globbing going on. You need exactly ${! followed by the name prefix of interest, followed by *} or @} to form the particular kind of parameter expansion you're asking about.
How can I match environment variables which include the case-insensitive segment "proxy"?
You need to explicitly express the case variations of interest to you. For example:
${!PROXY*} ${!proxy*} ${!Proxy*}

Remove specified string pattern(s) from a string in bash

I found a good answer that explains how to remove a specified pattern from a string variable. In this case, to remove 'foo' we use the following:
string="fooSTUFF"
string="${string#foo}"
However, I would like to add the "OR" functionality that would be able to remove 'foo' OR 'boo' in the cases when my string starts with any of them, and leave the string as is, if it does not start with 'foo' or 'boo'. So, the modified script should look something like that:
string="fooSTUFF"
string="${string#(foo OR boo)}"
How could this be properly implemented?
If you have set the extglob (extended glob) shell option with
shopt -s extglob
Then you can write:
string="${string#@(foo|boo)}"
The extended patterns are documented in the bash manual; they take the form:
?(pattern-list): Matches zero or one occurrence of the given patterns.
*(pattern-list): Matches zero or more occurrences of the given patterns.
+(pattern-list): Matches one or more occurrences of the given patterns.
@(pattern-list): Matches one of the given patterns.
!(pattern-list): Matches anything except one of the given patterns.
In all cases, pattern-list is a list of patterns separated by |
You need an extended glob pattern for that (enabled with shopt -s extglob):
$ str1=fooSTUFF
$ str2=booSTUFF
$ str3=barSTUFF
$ echo "${str1#@(foo|boo)}"
STUFF
$ echo "${str2#@(foo|boo)}"
STUFF
$ echo "${str3#@(foo|boo)}"
barSTUFF
The @(pat1|pat2) matches one of the patterns separated by |.
@(pat1|pat2) is the general solution for your question (multiple patterns); in some simple cases, you can get away without extended globs:
echo "${str#[fb]oo}"
would work for your specific example, too.
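The @(foo|boo) approach as a small sketch (strip_prefix is a hypothetical name). Because @(...) matches exactly one of the alternatives, only a single leading foo or boo is removed, and strings with neither prefix pass through unchanged:

```bash
#!/usr/bin/env bash
shopt -s extglob   # enables @(...); must be on before the function is parsed

# strip_prefix STRING: hypothetical helper; removes one leading "foo" or "boo".
strip_prefix() {
  local s=$1
  printf '%s\n' "${s#@(foo|boo)}"
}

for s in fooSTUFF booSTUFF barSTUFF foofooSTUFF; do
  printf '%s -> %s\n' "$s" "$(strip_prefix "$s")"
done
# fooSTUFF -> STUFF
# booSTUFF -> STUFF
# barSTUFF -> barSTUFF
# foofooSTUFF -> fooSTUFF
```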
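You can use:
string=$(echo $string | tr -d "foo|boo")
Be aware, though, that tr -d deletes individual characters, not strings: this removes every f, o, b and | anywhere in the value rather than a leading foo or boo. It happens to turn fooSTUFF into STUFF, but barSTUFF becomes arSTUFF.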

Glob pattern to match non-hidden filenames that don't start with a particular string

I'm pretty inexperienced with globbing in general. How would one go about writing a glob pattern that matches filenames not starting with, say, "ab", but still with a length of at least 2? i.e. names that start with a two-character string other than "ab". This is a homework question, only basic bash globs are allowed, and it must work with "echo <glob here>".
Note: the question verbatim is
(Non-hidden) files in the current directory whose names contain at least two characters, but do not start with ab.
printed on paper. I'm pretty sure I didn't misunderstand anything. The requirements are
For each of the following file search criteria, provide a globbing pattern that matches
the criterion. Your answer in each case should be a text file with the following format:
echo <pattern>
My current attempt is echo {a[!b]*,[!a.]?*} but somehow it gets no points with the automatic grader which actually runs your file against a test case automatically without human intervention.
For a single letter, this would do:
$ echo [!a]?*
However, for 2 letters (and assuming files can also start with numbers or punctuation or all kinds of other things), I can only think of this without resorting to shopt:
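$ GLOBIGNORE=ab*
$ echo *
Be aware that setting GLOBIGNORE to a non-null value also enables dotglob, so * then matches hidden files too (except . and ..), and it still matches one-character names.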
Well, now, technically, this would work:
$ echo [!a]?* [a][!b]*
BUT this would leave a nasty literal [a][!b]* in our results if no file name starts with an a followed by a character other than b, which would not only be undesirable, but would even be considered a bug in any application, so on those grounds I would not consider it a valid answer. To omit that [a][!b]*, we have to resort to nullglob (and if extglob isn't allowed, nullglob probably isn't either):
$ shopt -s nullglob
$ echo [!a]?* [a][!b]*
Fwiw, extglob would be:
$ shopt -s extglob
$ echo !(ab*)
That previous answer would also match names with fewer than 2 characters, so, as @perreal says:
$ echo @([^a]?|?[^b])*
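A sketch comparing the nullglob answer with the extglob pattern in a scratch directory of made-up names. [!a] is used here as the POSIX spelling of bash's [^a]; both negate the bracket expression. In filename expansion neither a bracket expression nor ? matches a leading dot, which keeps hidden files out of the basic form:

```bash
#!/usr/bin/env bash
shopt -s nullglob extglob

# Scratch directory with a hypothetical mix of names.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo" || exit 1
touch a b x ab1 abc ac ba xyz .hidden

# Basic globs: a first character other than "a" plus at least one more,
# or "a" followed by anything but "b".
basic=( [!a]?* [a][!b]* )

# Extended glob: the first two characters are not "a" then "b".
extended=( @([!a]?|?[!b])* )

printf '%s\n' "${basic[@]}"
```

With these names, basic should contain ac, ba and xyz, while a, b, x, ab1, abc and .hidden are left out.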
Starting with "a" or "b" OR with "ab"? For the latter:
ab*
Needless to say, you have to specify a path that resolves (relative or absolute):
/path/to/ab*
To your updated question:
{b,c,d,e,f...}{a,c,d,e,f...}*
Should work, note that ... is not actually valid, but I won't write the whole alphabet here. :P
shopt -s extglob # turn on extended globbing
echo @([^a]?|?[^b])*
My original echo {a[!b]*,[!a.]?*} is correct and works very well. The teacher actually set up the test cases wrong, so everybody was marked incorrectly, and it was all re-marked just now.

Resources