confusing case of a bash completion script - bash

I'm having trouble understanding what the following code does in a bash completion script:
case "$last" in
+\(--import|-i\))
_filedir '+(txt|html)';;
When is that case ever met? I thought the second line above would be something like
--import|-i)
which does make sense to me. I grepped my bash_completion.d directory for '+\\(' but that one was the only one that came up so I guess it's not that common.

This code is indeed puzzling without context. As it is, it matches two literal strings -
$ case "+(--import" in +\(--import|-i\)) echo match ;; esac
match
$ case "-i)" in +\(--import|-i\)) echo match ;; esac
match
It looks similar to the extended glob pattern +(--import|-i), but in this form it's neither a match for the literal pattern (would need to escape the pipe) nor the actual pattern (would need to unescape the parentheses). I'd guess "bug", but bash completion is a minefield of crazy metaprogramming, so it's impossible to say without seeing the entire script.
From bash(1)
If the extglob shell option is enabled using the shopt builtin,
several extended pattern matching operators are recognized. In the
following description, a pattern-list is a list of one or more
patterns separated by a |. Composite patterns may be formed using one
or more of the following sub-patterns:
[...]
+(pattern-list)
Matches one or more occurrences of the given patterns
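To make the difference concrete, here is a small sketch that compares the pattern exactly as written in the question with the extended glob it resembles. The function names as_written and as_extglob and the variable intended are made up for illustration, and the idea that the script author meant the extended glob is only an assumption.

```bash
#!/usr/bin/env bash
shopt -s extglob   # required for +(...) to be treated as a pattern operator

# The pattern exactly as it appears in the question: the unescaped | splits it
# into two alternatives, each containing one literal parenthesis.
as_written() {
  case "$1" in
    +\(--import|-i\)) echo "match" ;;
    *)                echo "no match" ;;
  esac
}

# Assumed intent (not confirmed by the source): one or more of --import / -i.
# Keeping the pattern in a variable means extglob only has to be on at run time.
intended='+(--import|-i)'
as_extglob() {
  case "$1" in
    $intended) echo "match" ;;
    *)         echo "no match" ;;
  esac
}

as_written '--import'   # no match
as_written '-i)'        # match
as_extglob '--import'   # match
as_extglob '-i'         # match
```

Under these assumptions, only the second function ever fires for the words a user would actually type.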

Related

How to remove a known last part from commands output string in one line?

To rephrase - I want to use Bash command substitution and string substitution in the same line.
My actual commands are longer, but the ridiculous use of echo here is just a "substitution" for shortness and acts the same - with same errors ;)
I know we can use a Bash command to produce its output string as a parameter for another command like this:
echo "$(echo "aahahah</ddd>")"
aahahah</ddd>
I also know we can remove last known part of a string like this:
var="aahahah</ddd>"; echo "${var%</ddd>}"
aahahah
I am trying to write a command where one command gives a string output, where I want to remove last part, which is known.
echo "${$(echo "aahahah</ddd>")%</ddd>}"
-bash: ${$(echo "aahahah</ddd>")%</ddd>}: bad substitution
It might be the order in which things happen, or it might be that substitution only works on variables or hardcoded strings. But I suspect I am just missing something and that it is possible.
How do I make it work?
Why doesn't it work?
When a dollar sign as in $word or equivalently ${word} is used, it asks for word's content. This is called parameter expansion, as per man bash.
You may write var="aahahah</ddd>"; echo "${var%</ddd>}": That expands var and performs a special suffix operation before returning the value.
However, you may not write echo "${$(echo "aahahah</ddd>")%</ddd>}" because there is nothing to expand once $(echo "aahahah</ddd>") is evaluated.
From man bash (my emphasis):
${parameter%word}
Remove matching suffix pattern. The word is expanded to produce a
pattern just as in pathname expansion. If the pattern
matches a trailing portion of the expanded value of parameter, then
the result of the expansion is the expanded value of parameter
with the shortest matching pattern (the ''%'' case) or the longest matching pattern (the ''%%'' case) deleted.
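As a quick illustration of the shortest versus longest distinction in that passage, the following sketch applies both operators to the same value. The file name is a made-up example, not something from the question.

```bash
#!/usr/bin/env bash
# Hypothetical value, chosen so that the suffix pattern .* can match twice.
file="archive.tar.gz"

shortest=${file%.*}    # removes the shortest trailing match of ".*"
longest=${file%%.*}    # removes the longest trailing match of ".*"

echo "$shortest"   # archive.tar
echo "$longest"    # archive
```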
Combine your commands like this
var=$(echo "aahahah</ddd>")
echo "${var%'</ddd>'}"
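A minimal sketch of the two-step approach, using the % suffix operator from the man page excerpt. The helper name strip_suffix is hypothetical; it only exists here to show how the intermediate variable can be hidden if the one-line feel matters.

```bash
#!/usr/bin/env bash
# Step 1: capture the command's output in a variable.
out=$(echo "aahahah</ddd>")

# Step 2: strip the known suffix. The quotes keep </ddd> from being
# interpreted as a glob pattern.
trimmed=${out%'</ddd>'}
echo "$trimmed"   # aahahah

# Hypothetical helper: takes a string and a literal suffix to remove.
strip_suffix() {
  local s=$1
  printf '%s\n' "${s%"$2"}"
}

result=$(strip_suffix "$(echo "aahahah</ddd>")" '</ddd>')
echo "$result"    # aahahah
```

The helper still uses a variable internally; parameter expansion always needs a named parameter to work on.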

multiple replacements on a single variable

For the following variable:
var="/path/to/my/document-001_extra.txt"
I need only the part between the last / [slash] and the _ [underscore].
Also, the - [dash] needs to be stripped.
In other words: document 001
This is what I have so far:
var="${var##*/}"
var="${var%_*}"
var="${var/-/ }"
which works fine, but I'm looking for a more compact substitution pattern that would spare me the triple var=...
Use of sed, awk, cut, etc. would perhaps make more sense for this, but I'm looking for a pure bash solution.
Needs to work under GNU bash, version 3.2.51(1)-release
After editing your question to talk about patterns instead of regular expressions, I'll now show you how to actually use regular expressions in bash :)
[[ $var =~ ^.*/(.*)-(.*)_ ]] && var="${BASH_REMATCH[@]:1:2}"
Parameter expansions like you were using previously unfortunately cannot be nested in bash (unless you use ill-advised eval hacks, and even then it will be less clear than the line above).
The =~ operator performs a match between the string on the left and the regular expression on the right. Parentheses in the regular expression define match groups. If a match is successful, the exit status of [[ ... ]] is zero, and so the code following the && is executed. (Reminder: don't confuse the "0=success, non-zero=failure" convention of process exit statuses with the common Boolean convention of "0=false, 1=true".)
BASH_REMATCH is an array parameter that bash sets following a successful regular-expression match. The first element of the array contains the full text matched by the regular expression; each of the following elements contains the contents of the corresponding capture group.
The ${foo[@]:x:y} parameter expansion produces y elements of the array, starting with index x. In this case, it's just a short way of writing ${BASH_REMATCH[1]} ${BASH_REMATCH[2]}. (Also, while var=${BASH_REMATCH[*]:1:2} would have worked as well, I tend to use @ anyway to reinforce the fact that you almost always want to use @ instead of * in other contexts.)
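The regular-expression approach can be sketched as a few explicit steps. Storing the expression in a variable (here named re, an illustrative choice) is an assumption made for readability; it also sidesteps quoting differences between bash versions.

```bash
#!/usr/bin/env bash
var="/path/to/my/document-001_extra.txt"

# Two capture groups: text between the last / and the -, and text
# between the - and the _.
re='^.*/(.*)-(.*)_'

if [[ $var =~ $re ]]; then
  # Index 0 is the whole match; 1 and 2 are the capture groups.
  result="${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
else
  result=""
fi

echo "$result"   # document 001
```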
Both of the following should work correctly, though the second is sensitive to misplaced characters (if you have a / or - after the last _, it will fail).
var=$(IFS=_ read s _ <<<"$var"; IFS=-; echo ${s##*/})
var=$(IFS=/-_; a=($var); echo "${a[@]:${#a[@]} - 3:2}")
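A variant of the splitting idea, sketched with read -a so that the value is never expanded unquoted (which avoids accidental pathname expansion). The array name parts is illustrative, and the same caveat applies: a /, - or _ after the last underscore shifts the fields.

```bash
#!/usr/bin/env bash
var="/path/to/my/document-001_extra.txt"

# Split on /, - and _ for this one read command only.
IFS='/-_' read -r -a parts <<<"$var"

# Count from the end: the last field is "extra.txt", the two before it
# are the pieces we want.
n=${#parts[@]}
result="${parts[n-3]} ${parts[n-2]}"

echo "$result"   # document 001
```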

Using pattern in Shell Parameter Expansion

I am reading a page and trying to extract some data from it. I am interested in using bash, and after going through a few links I learned that 'Shell Parameter Expansion' might help. However, I am having difficulty using it in my script. I know that using sed might be easier, but for my own knowledge I want to know how I can achieve this in bash.
shopt -s extglob
str='My work</u><br /><span style="color: rgb(34,34,34);"></span><span>abc-X7-27ABC | </span><span style="color: rgb(34,34,34);">build'
echo "${str//<.*>/|}"
I want my output to be like this: My work|abc-X7-27ABC |build
I checked whether it accepts only a literal word instead of a pattern, and it does seem to work with words.
For instance,
echo "${str//span style/|}" works but
echo "${str//span.*style/|}" doesn't
On the other hand, I saw in one of the links that it does accept a pattern. I am confused about why it's not working with the pattern I am using above.
How to make sed do non-greedy match?
(User konsolebox's solution)
One mistake you're making is mixing shell globbing and regex. In a shell glob, a dot is taken literally as a dot character, not as any single character, and * on its own (rather than .*) matches zero or more of any character.
If you try this code instead:
echo "${str//<*>/|}"
then it will print:
My work|build
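The glob behavior is easier to see on a shorter string. The sketch below uses a made-up value and compares the regex-style pattern, the greedy glob, and one possible per-tag alternative built from a bracket expression inside an extended glob. The last variant is a suggestion under the assumption that extglob is enabled; it is not part of the original answer.

```bash
#!/usr/bin/env bash
shopt -s extglob          # needed only for the +(...) variant

str='a<b>c<d>e'           # hypothetical short input
tag='<+([^>])>'           # "<", one or more non-">" characters, ">"

dot="${str//<.*>/|}"      # "." is literal in a glob, so nothing matches
greedy="${str//<*>/|}"    # "*" takes the longest match: first "<" to last ">"
per_tag="${str//$tag/|}"  # each tag is replaced separately

echo "$dot"       # a<b>c<d>e
echo "$greedy"    # a|e
echo "$per_tag"   # a|c|e
```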
This is not an answer, so much as a demonstration of why pattern-matching is not recommended for this kind of HTML editing. I attempted the following.
shopt -s extglob
set +H # Turn off history expansion, if necessary, to allow the !(...) pattern
echo ${str//+(<+(!(>))>)/|}
First: it didn't work, even for a simpler string like str='My work</u><br />bob<foo>build'. Second, for the string in the original question, it appeared to lock up the shell; I suspect such a complex pattern triggers exponential backtracking.
Here's how it's intended to work:
!(>) is anything other than a single >
+(!(>)) is one or more non-> characters.
<+(!(>))> is one or more non-> characters enclosed in < and >
+(<+(!(>))>) is one or more groups of <...>-enclosed non->s.
My theory is that since !(>) can match a multi-character string as well as a single character, there is a ton of backtracking required.
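A short check supports that theory: !(>) accepts any string that is not exactly ">", including strings that contain a > somewhere. The helper name matches is hypothetical, and the patterns are passed in variables so that extglob only needs to be on when the match runs.

```bash
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: matches PATTERN WORD prints yes or no.
matches() {
  case "$2" in
    $1) echo yes ;;
    *)  echo no ;;
  esac
}

not_gt='!(>)'

matches "$not_gt" '>'     # no:  the only string that is rejected
matches "$not_gt" 'x'     # yes
matches "$not_gt" 'abc'   # yes: multi-character strings are accepted
matches "$not_gt" 'a>b'   # yes: even strings that contain a >
```

A bracket expression such as [^>] is the pattern that means "one character other than >".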

Why would I not leave extglob enabled in bash?

I just found out about the bash extglob shell option here:-
How can I use inverse or negative wildcards when pattern matching in a unix/linux shell?
All the answers that used shopt -s extglob also mentioned shopt -u extglob to turn it off.
Why would I want to turn something so useful off? Indeed why isn't it on by default?
Presumably it has the potential for giving some nasty surprises.
What are they?
No nasty surprises -- default-off behavior is only there for compatibility with traditional, standards-compliant pattern syntax.
Which is to say: It's possible (albeit unlikely) that someone writing fo+(o).* actually intended the + and the parenthesis to be treated as literal parts of the pattern matched by their code. For bash to interpret this expression in a different manner than what the POSIX sh specification calls for would be to break compatibility, which is right now done by default in very few cases (echo -e with xpg_echo unset being the only one that comes immediately to mind).
This is different from the usual case where bash extensions are extending behavior undefined by the POSIX standard -- cases where a baseline POSIX shell would typically throw an error, but bash instead offers some new and different explicitly documented behavior -- because the need to treat these characters as matching themselves is defined by POSIX.
To quote the relevant part of the specification, with emphasis added:
An ordinary character is a pattern that shall match itself. It can be any character in the supported character set except for NUL, those special shell characters in Quoting that require quoting, and the following three special pattern characters. Matching shall be based on the bit pattern used for encoding the character, not on the graphic representation of the character. If any character (ordinary, shell special, or pattern special) is quoted, that pattern shall match the character itself. The shell special characters always require quoting.
When unquoted and outside a bracket expression, the following three characters shall have special meaning in the specification of patterns:
? - A question-mark is a pattern that shall match any character.
* - An asterisk is a pattern that shall match multiple characters, as described in Patterns Matching Multiple Characters.
[ - The open bracket shall introduce a pattern bracket expression.
Thus, the standard explicitly requires any non-NUL character other than ?, * or [ or those listed elsewhere as requiring quoting to match themselves. Bash's behavior of having extglob off by default allows it to conform with this standard in its default configuration.
However, for your own scripts and your own interactive shell, unless you're making a habit of running code written for POSIX sh with unusual patterns included, enabling extglob is typically worth doing.
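The compatibility point can be sketched by matching the same pattern with the option off and then on. The helper name matches and the sample file names are made up for illustration; the pattern is the fo+(o).* example from above, kept in a variable so the option can be toggled at run time.

```bash
#!/usr/bin/env bash
# Hypothetical helper: matches PATTERN WORD prints yes or no.
matches() {
  case "$2" in
    $1) echo yes ;;
    *)  echo no ;;
  esac
}

pat='fo+(o).*'

shopt -u extglob                               # POSIX-style: + ( ) are ordinary
off_literal=$(matches "$pat" 'fo+(o).txt')     # yes
off_run=$(matches "$pat" 'fooo.txt')           # no

shopt -s extglob                               # +(o) now means "one or more o"
on_literal=$(matches "$pat" 'fo+(o).txt')      # no
on_run=$(matches "$pat" 'fooo.txt')            # yes

echo "off: $off_literal $off_run / on: $on_literal $on_run"
```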
Being a Kornshell person, I have extglob on in my .bashrc by default because that's the way it is in Kornshell, and I use it a lot.
For example:
$ find !(target) -name "*.xml"
In Kornshell, this is no problem. In BASH, I need to set extglob. I also set lithist and set -o vi. This allows me to use VI commands in using my shell history, and when I hit v, it shows my code as a bunch of lines.
Without lithist set:
for i in *; do echo "I see $i"; done
With lithist set:
for i in *
do
    echo "I see $i"
done
Now, if only BASH had the print statement, I'd be all set.

11*(...) as a bash parameter without quotation marks

I'm trying to write a small piece of code that passes a small formula to another program. However, I've found that something strange happens when the formula starts with 11*(:
$ echo 11*15
Neatly prints '11*15'
$ echo 21*(15)
Neatly prints '21*(15)', while
echo 11*(15)
Only gives '11'. As far as I've found this only happens with '11*('. I know that this can be solved by using proper quotation marks, but I'm still curious as to why this happens.
Does anyone know?
How is your program coded? If it's coded to take in parameters, then pass your formula like
./myprogram "11*15"
or
echo '11*15' | myprogram
If you run echo like that on the command line, without quotes, you may inadvertently display the names of files that match the pattern, for example files whose names start with 11.
11*(15) uses a Bash-specific extended glob syntax. You've stumbled across it accidentally, emphasizing why quotation marks are a good idea. (I also learned a lot tracking down why it was working differently for me; thanks for that.)
The behavior of
echo 11*(15)
in bash is going to vary depending on whether extglob is enabled. If it's enabled *(PATTERN-LIST) matches zero or more occurrences of the patterns. If it's disabled, it doesn't, and the resulting ( is likely to cause a syntax error.
For example:
$ ls
11 115 1155 11555 115555
$ shopt -u extglob
$ echo 11*(55)
bash: syntax error near unexpected token `('
$ shopt -s extglob
$ echo 11*(55)
11 1155 115555
$
(This explains the odd behavior I discussed in comments.)
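The same experiment can be reproduced in a scratch directory. This is a sketch under the assumption that mktemp is available; the patterns are held in variables so they are expanded at run time, and the variable names are illustrative. It also shows why a formula such as 21*(15) appears to print "neatly": with no matching file, the unmatched pattern is left as it is.

```bash
#!/usr/bin/env bash
shopt -s extglob

# Scratch directory containing the same file names as the listing above.
tmp=$(mktemp -d)
touch "$tmp"/11 "$tmp"/115 "$tmp"/1155 "$tmp"/11555 "$tmp"/115555

pat='11*(55)'
other='21*(15)'

expanded=$(cd "$tmp" && echo $pat)      # unquoted: expanded against file names
quoted=$(cd "$tmp" && echo "$pat")      # quoted: passed through untouched
unmatched=$(cd "$tmp" && echo $other)   # no file matches, so it stays literal

rm -rf "$tmp"

echo "$expanded"    # 11 1155 115555
echo "$quoted"      # 11*(55)
echo "$unmatched"   # 21*(15)
```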
Quoting from the bash 4.2.8 documentation (info bash):
If the `extglob' shell option is enabled using the `shopt' builtin,
several extended pattern matching operators are recognized. In the
following description, a PATTERN-LIST is a list of one or more
patterns separated by a `|'. Composite patterns may be formed using
one or more of the following sub-patterns:
`?(PATTERN-LIST)'
Matches zero or one occurrence of the given patterns.
`*(PATTERN-LIST)'
Matches zero or more occurrences of the given patterns.
`+(PATTERN-LIST)'
Matches one or more occurrences of the given patterns.
`@(PATTERN-LIST)'
Matches one of the given patterns.
`!(PATTERN-LIST)'
Matches anything except one of the given patterns.
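The five operators can be tried out with a small helper. The function name matches and the sample strings are made up for illustration; the operator for "exactly one of the given patterns" is written @(...).

```bash
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: matches PATTERN WORD prints yes or no.
matches() {
  case "$2" in
    $1) echo yes ;;
    *)  echo no ;;
  esac
}

matches '?(ab)c'   'c'      # yes: zero occurrences of ab
matches '?(ab)c'   'ababc'  # no:  more than one occurrence
matches '*(ab)c'   'ababc'  # yes: zero or more occurrences
matches '+(ab)c'   'c'      # no:  at least one occurrence is required
matches '@(ab|cd)' 'cd'     # yes: exactly one of the alternatives
matches '@(ab|cd)' 'abcd'   # no:  two alternatives in a row
matches '!(ab|cd)' 'ef'     # yes: anything except ab or cd
matches '!(ab|cd)' 'ab'     # no
```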

Resources