I came across the following example in a tutorial,
and I want to understand the role of the \ in ^[-[:alnum:]\._]+$
# is input a valid filename?
read -p "Enter a single item > "
if [[ "$REPLY" =~ ^[-[:alnum:]\._]+$ ]]; then
    echo "'$REPLY' is a valid filename."
else
    echo "The string '$REPLY' is not a valid filename."
fi
I checked the bracket expression by feeding it a few inputs.
$ bash read_validate.sh
Enter a single item > test.tst.
'test.tst.' is a valid filename.
#test `\`
$ bash read_validate.sh
Enter a single item > test\\tst
The string 'test\tst' is not a valid filename.
When I remove the escape \ from ^[-[:alnum:]\._]+$, so that it becomes ^[-[:alnum:]._]+$
$ bash read_validate.sh
Enter a single item > test.tst
'test.tst' is a valid filename.
# to confirm that the dot is not acting as the any-character wildcard.
$ bash read_validate.sh
Enter a single item > test*tst
The string 'test*tst' is not a valid filename.
# the code runs properly.
It seems unnecessary to escape the dot with \ in this pattern.
Is that right?
Or am I missing some key point about bracket expressions and the escape character?
Bash uses Extended Regular Expressions. Quoting the standard:
The special characters '.', '*', '[', and '\' (period, asterisk, left-bracket, and backslash, respectively) shall lose their special meaning within a bracket expression.
So inside [ ], they don't need to be escaped.
The situation is made slightly more complicated by the fact that Bash processes backslashes in your string:
$ set -x
$ [[ '\' =~ [\.] ]] && echo yes
+ [[ \ =~ [.] ]] # look, no backslash!
So the recommended way to use regular expressions is to set a shell variable:
$ re='[\.]'
+ re='[\.]'
$ [[ '\' =~ $re ]] && echo yes
+ [[ \ =~ [\.] ]] # backslash preserved!
+ echo yes
yes
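Putting the two points together, a minimal sketch of the filename check with the pattern held in a variable might look like the following. The helper name is_valid_filename and the variable name re are made up for illustration; the pattern is the unescaped one from the question.

```bash
#!/usr/bin/env bash

# Pattern kept in a variable so the shell does not process it before matching.
# Inside [ ], the dot is literal and needs no backslash.
re='^[-[:alnum:]._]+$'

# Hypothetical helper: succeeds if $1 consists only of
# letters, digits, '-', '.' and '_'.
is_valid_filename() {
    [[ $1 =~ $re ]]
}

for name in 'test.tst' 'test*tst' 'test\tst'; do
    if is_valid_filename "$name"; then
        echo "'$name' is a valid filename."
    else
        echo "The string '$name' is not a valid filename."
    fi
done
```

Leaving $re unquoted on the right-hand side matters: quoting it would turn the match into a literal string comparison.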
Related
This is part of a bash script. I understand how getopts works in bash, but I can't understand line 116 of the script, the if [[ ! " ${FS_OPTIONS[@]} " =~ " $OPTARG " ]]; then part. Earlier in the script, FS_OPTIONS is defined, as shown here:
#!/usr/bin/env bash
FS_OPTIONS=("ubuntu" "busybox")
while getopts "hsf:" opt; do
    case $opt in
        f)
            if [[ ! " ${FS_OPTIONS[@]} " =~ " $OPTARG " ]]; then
                echo "Unsupported filesystem $OPTARG"
                echo "Use ubuntu/busybox"
                exit -1
            else
                echo "ok!"
                export FILESYSTEM=$OPTARG
            fi
            ;;
    esac
done
ckim@chan-ubuntu:~/testbash$ test3.sh -f ubuntu
ok!
ckim@chan-ubuntu:~/testbash$ test3.sh -f busybox
ok!
ckim@chan-ubuntu:~/testbash$ test3.sh -f ubunt.
Unsupported filesystem ubunt.
Use ubuntu/busybox
I changed the condition line to if [[ ! " ${FS_OPTIONS[@]} " =~ $OPTARG ]]; then and can see that the right-hand side is now treated as a regular expression. Now I get
ckim@chan-ubuntu:~/testbash$ test3.sh -f ubunt.
ok!
This is because the " " was removed and the argument was taken as regexp.
The bash manual says
An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used,
the string to the right of the operator is considered an extended regular expression and matched accordingly
(as in regex(3)). The return value is 0 if the string matches the pattern, and 1 otherwise.
That was a long preamble. My first question: from what I observed above, when the pattern matches, the =~ test appears to return 1 (the expression has a ! in front of it), which seems to contradict the manual. Am I missing something?
My second question: what are some use cases for the original if [[ ! " ${FS_OPTIONS[@]} " =~ " $OPTARG " ]]; then? Does it exploit regular expressions at all? Because of the " ", the argument is taken as a plain string rather than as a regular expression. Is there any use for =~ with a quoted right-hand side?
What's the meaning of the specific line you asked about?
Breaking down what's done by [[ ! " ${FS_OPTIONS[@]} " =~ " $OPTARG " ]]:
" ${FS_OPTIONS[@]} " is, in this context, a less-clear equivalent to " ${FS_OPTIONS[*]} " (since the usual distinction between them would have [@] expanding to multiple words, which isn't legal in this context). Thus, in acting like ${FS_OPTIONS[*]}, we're expanding to the complete list of words in FS_OPTIONS, with a separator (first character in IFS, by default a single space) added between them. Also, note that we're adding whitespace at the beginning and end (so our OPTARG can be matched in those positions and not just in the middle).
" $OPTARG " pads the string we're looking for with whitespace on both sides, so we can't match a substring.
With the right-hand side quoted, we're doing a substring search, so we're checking if the contents of $OPTARG -- with added whitespace at the beginning and end -- exist anywhere in the full expanded list generated by ${FS_OPTIONS[*]}.
The ! inverts the logical exit status of the rest of the expression (changing a 0 to a 1 or a 1 to a 0).
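The breakdown above can be tried out directly. This is only a sketch: the helper name is_supported is hypothetical, and it assumes a default IFS and that neither the array entries nor the argument contain spaces.

```bash
#!/usr/bin/env bash

FS_OPTIONS=("ubuntu" "busybox")

# Hypothetical helper: succeeds if $1 is exactly one of the entries in
# FS_OPTIONS. The quoted right-hand side is matched as a literal string.
is_supported() {
    [[ " ${FS_OPTIONS[*]} " =~ " $1 " ]]
}

is_supported ubuntu;   echo "ubuntu:  $?"   # =~ matched, so the status is 0
is_supported ubunt.;   echo "ubunt.:  $?"   # the dot is literal here, no match, status 1
! is_supported ubunt.; echo "negated: $?"   # ! flips that 1 into a 0
```

This also addresses the first question: =~ itself returns 0 on a match, as the manual says, and it is the leading ! that turns a non-match into a true condition for the if.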
What's the use of that syntax in general?
A quoted string on the RHS acts as a regular, unanchored substring search. That is, [[ $foo = "bar" ]] is true only if the variable foo expands to exactly bar, but [[ $foo =~ "bar" ]] is true if the variable foo expands to contain bar anywhere in its contents. This is a useful semantic on its own.
Quoting is determined character-by-character, not for the full string. As an example of how this can be used:
regex_pre='([[:space:]]|^)'
literal_string='** match this exactly **'
regex_post='([[:space:]]|$)'
[[ $foo =~ ${regex_pre}"${literal_string}"${regex_post} ]]
# | | |
# | | \-> unquoted: acts like a regex
# | \-> quoted: acts like a literal string
# \-> unquoted: acts like a regex
...matches ** match this exactly ** only when it's at the start of a string or immediately preceded by whitespace, and also either at the end or immediately followed by whitespace, without needing to write a regex for the literal string in between, and without needing to do the pad-with-whitespace trick shown in the question.
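To see the mixed quoting in action, here is a small sketch that wraps the pattern in a function and runs it against a few inputs. The helper name contains_phrase and the sample strings are made up for illustration.

```bash
#!/usr/bin/env bash

regex_pre='([[:space:]]|^)'
literal_string='** match this exactly **'
regex_post='([[:space:]]|$)'

# Hypothetical helper: the unquoted parts act as regex,
# the quoted part is matched literally.
contains_phrase() {
    [[ $1 =~ ${regex_pre}"${literal_string}"${regex_post} ]]
}

contains_phrase 'see ** match this exactly ** here' && echo "matched in the middle"
contains_phrase '** match this exactly **'          && echo "matched the whole string"
contains_phrase 'x** match this exactly **'         || echo "rejected: not preceded by whitespace"
contains_phrase 'ab match this exactly cd'          || echo "rejected: the asterisks are literal"
```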
ckim@chan-ubuntu:~/testbash$ test3.sh -f ubuntu
ok!
ckim@chan-ubuntu:~/testbash$ test3.sh -f ubunt.
Unsupported filesystem ubunt.
Use ubuntu/busybox
The bash manual says (in the discussion about [[...]]):
Any part of the pattern may be quoted to force the quoted portion to be matched as a string
So, to treat the input as a regular expression, plus add literal spaces at the ends, quote the spaces but leave the variable unquoted:
[[ " ${FS_OPTIONS[*]} " =~ " "$OPTARG" " ]]
# .........................^^^-------^^^
Demonstrating:
$ OPTARG=ubuntu
$ [[ " ${FS_OPTIONS[*]} " =~ " "$OPTARG" " ]] && echo Y || echo N
Y
$ OPTARG=ubunt.
$ [[ " ${FS_OPTIONS[*]} " =~ " "$OPTARG" " ]] && echo Y || echo N
Y
$ OPTARG=ubunt
$ [[ " ${FS_OPTIONS[*]} " =~ " "$OPTARG" " ]] && echo Y || echo N
N
I have a loop that evaluates based on a regex conditional:
until read -p "Enter operator: " operator
[[ $operator =~ ^[+-*\/]$ ]] #doesn't work
do...
The loop should run until the user enters an arithmetic operator (+, -, * or /). But when I enter any of those four, the loop still runs.
I've tried variations of this (e.g. placing the regex in a variable, using quotes, grep) but nothing seems to work.
The problem in ^[+-*\/]$ is the unescaped - in the middle of the bracket expression, where it is read as a range operator between + and * (and since + sorts after * in ASCII, that is not even a valid range).
You may use this regex (there is no need to escape / in a Bash regex):
[[ $operator =~ ^[-+*/]$ ]]
Or, even better, skip the regex and use a glob match:
[[ $operator == [-+*/] ]]
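As a quick check of the corrected pattern, the following sketch runs it over a handful of candidates. The helper name is_operator and the list of test inputs are assumptions for illustration only.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds only for a single +, -, * or / character.
is_operator() {
    local re='^[-+*/]$'   # the dash comes first, so it is a literal dash
    [[ $1 =~ $re ]]
}

for candidate in + - '*' / , ++ a; do
    if is_operator "$candidate"; then
        echo "operator:     $candidate"
    else
        echo "not operator: $candidate"
    fi
done
```

The comma is included on purpose: it sits between + and - in ASCII, so a pattern that accidentally formed a range around it would accept it.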
When including the dash or minus sign - in a bracket expression of a regex, it must be in the first or last position, or it will be treated as a range marker. Also, the slash / does not need escaping with a backslash:
#!/usr/bin/env bash
until
    read -r -n1 -p "Enter operator: " operator
    printf \\n
    [[ "$operator" =~ [+*/-] ]] # the dash is last, so it is literal
do
    printf 'Symbol %q is not an operator!\n' "$operator" >&2
done
POSIX shell grammar implementation:
#!/usr/bin/env sh
until
printf 'Enter operator: '
read -r operator
printf \\n
[ -n "$operator" ] && [ -z "${operator%%[-+*/]}" ]
do
printf 'Symbol %s is not an operator!\n' "$operator" >&2
done
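In a POSIX shell the same test can also be written with case, which some readers may find easier to follow than the nested parameter expansion. This is a sketch of that alternative, not part of the answer above; the helper name is_operator is hypothetical.

```sh
#!/bin/sh

# Hypothetical helper: succeeds only for a single +, -, * or / character.
# The dash is first inside the brackets, so it is taken literally.
is_operator() {
    case $1 in
        [-+*/]) return 0 ;;
        *)      return 1 ;;
    esac
}

for candidate in + - '*' / ++ a; do
    if is_operator "$candidate"; then
        printf 'operator:     %s\n' "$candidate"
    else
        printf 'not operator: %s\n' "$candidate"
    fi
done
```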
A POSIX compliant shell shall provide mechanisms like this to iterate over collections of strings:
for x in $(seq 1 5); do
echo $x
done
But, how do I iterate over each character of a word?
It's a little circuitous, but I think this'll work in any posix-compliant shell. I've tried it in dash, but I don't have busybox handy to test with.
var='ab * cd'
tmp="$var" # The loop will consume the variable, so make a temp copy first
while [ -n "$tmp" ]; do
rest="${tmp#?}" # All but the first character of the string
first="${tmp%"$rest"}" # Remove $rest, and you're left with the first character
echo "$first"
tmp="$rest"
done
Output:
a
b
*
c
d
Note that the double-quotes around the right-hand side of assignments are not needed; I just prefer to use double-quotes around all expansions rather than trying to keep track of where it's safe to leave them off. On the other hand, the double-quotes in [ -n "$tmp" ] are absolutely necessary, and the inner double-quotes in first="${tmp%"$rest"}" are needed if the string contains "*".
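For reuse, the loop can be wrapped in a function. The sketch below does that under the assumption that a helper named each_char (a made-up name) is acceptable; it uses printf instead of echo so that characters such as a backslash are printed unchanged.

```sh
#!/bin/sh

# Hypothetical helper: prints each character of $1 on its own line.
each_char() {
    tmp=$1
    while [ -n "$tmp" ]; do
        rest=${tmp#?}            # all but the first character
        first=${tmp%"$rest"}     # strip $rest literally, leaving the first character
        printf '%s\n' "$first"
        tmp=$rest
    done
}

each_char 'ab * cd'
```

Spaces are characters too, so for 'ab * cd' the function prints seven lines, two of which contain only a space.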
Use getopts to process input one character at a time. The : instructs getopts to ignore illegal options and set OPTARG. The leading - in the input makes getopts treat the string as a group of options.
If getopts encounters a colon, it will not set OPTARG, so the script uses parameter expansion to return : when OPTARG is not set/null.
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "-$1"
do echo "'${OPTARG:-:}'"
done
}
while read -r line;do
split_string "$line"
done
As with the accepted answer, this processes strings byte-wise instead of character-wise, corrupting multibyte codepoints. The trick is to detect multibyte codepoints, concatenate their bytes and then print them:
#!/bin/sh
IFS='
'
split_string () {
OPTIND=1;
while getopts ":" opt "$1";do
case "${OPTARG:=:}" in
([[:print:]])
[ -n "$multi" ] && echo "$multi" && multi=
echo "$OPTARG" && continue
esac
multi="$multi$OPTARG"
case "$multi" in
([[:print:]]) echo "$multi" && multi=
esac
done
[ -n "$multi" ] && echo "$multi"
}
while read -r line;do
split_string "-$line"
done
Here the extra case "$multi" is used to detect when the multi buffer contains a printable character. This works on shells like Bash and Zsh but Dash and busybox ash do not pattern match multibyte codepoints, ignoring locale.
This degrades somewhat nicely: Dash/ash treat sequences of multibyte codepoints as one character, but handle multibyte characters surrounded by single byte characters fine.
Depending on your requirements it may be preferable not to split consecutive multibyte codepoints anyway, as the next codepoint may be a combining character which modifies the character before it.
This won't handle the case where a single byte character is followed by a combining character.
This works in dash and busybox:
echo 'ab * cd' | grep -o .
Output:
a
b
*
c
d
I was developing a script that needed stacks, and the same pop operation can be used to iterate through a string:
#!/bin/sh
# posix script
pop () {
# $1 top
# $2 stack
eval $1='$(expr "'\$$2'" : "\(.\).*")'
eval $2='$(expr "'\$$2'" : ".\(.*\)" )'
}
string="ABCDEFG"
while [ "$string" != "" ]
do
pop c string
echo "--" "$c"
done
I need to check if a string starts and ends with a single quote, for example
'My name is Mozart'
What I have is this, which doesn't work
if [[ $TEXT == '*' ]] ;
This does not work either
if [[ $TEXT == /'*/' ]] ;
But if I change it to
if [[ $TEXT == a*a ]] ;
it works for a sentence like 'an amazing apa'. So I believe it has to do with the single quote character.
Any ideas on how I can solve it?
With a regex:
if [[ $TEXT =~ ^\'.*\'$ ]]
With globbing:
if [[ $TEXT == \'*\' ]]
Here is a complete bash script, to avoid any confusion:
#! /bin/bash
text1="'helo there"
if [[ $text1 =~ ^\'.*\'$ ]]; then
echo "text1 match"
else
echo "text1 not match"
fi
text2="'hello babe'"
if [[ $text2 =~ ^\'.*\'$ ]]; then
echo "text2 match"
else
echo "text2 not match"
fi
Save the above script as
matchCode.sh
Now make it executable and run it as:
./matchCode.sh
output:
text1 not match
text2 match
Ask if you have any confusion.
Cyrus' helpful answer solves your problem as posted.
However, I suspect you may be confused over quotes that are part of the shell syntax vs. quotes that are actually part of the string:
In a POSIX-like shell such as Bash, 'My name is Mozart' is a single-quoted string whose content is the literal My name is Mozart - without the enclosing '. That is, the enclosing ' characters are syntactic elements that tell the shell that everything between them is the literal contents of the string.
By contrast, to create a string whose content is actually enclosed in ' - i.e., has embedded ' instances, you'd have to use something like: "'My name is Mozart'". Now it is the enclosing " instances that are the syntactic elements that bookend the string content.
Note, however, that using a "..." string (double quotes) makes the contents subject to string interpolation (expansion of embedded variable references, arithmetic and command substitutions; none in the case at hand, however), so it's important to know when to use '...' (literal strings) vs. "..." (interpolated strings).
Embedding ' instances in '...' strings is actually not supported at all in POSIX-like shells, but in Bash, Ksh, and Zsh there's another string type that allows you to do that: ANSI C-quoted strings, $'...', in which you can embed ' escaped as \': $'\'My name is Mozart\''
Another option is to use string concatenation: In POSIX-like shells, you can place substrings employing different quoting styles (including unquoted tokens) directly next to one another in order to form a single string: "'"'My Name is Mozart'"'" would also give you a string with contents 'My Name is Mozart'.
POSIX-like shells also allow you to escape individual, unquoted characters (meaning: neither part of a single- nor a double-quoted string) with \; therefore, \''My name is Mozart'\' yields the same result.
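The four spellings described above can be compared side by side. This is a small sketch; the variable names a to d are arbitrary.

```bash
#!/usr/bin/env bash

a="'My name is Mozart'"        # double quotes around embedded single quotes
b=$'\'My name is Mozart\''     # ANSI C quoting (Bash, Ksh, Zsh)
c="'"'My name is Mozart'"'"    # concatenation of three quoted pieces
d=\''My name is Mozart'\'      # backslash-escaped quotes around a '...' string

# All four lines printed here are identical, single quotes included.
printf '%s\n' "$a" "$b" "$c" "$d"
```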
The behavior of Bash's == operator inside [[ ... ]] (conditionals) may have added to the confusion:
If the RHS (right-hand side - the operand to the right of operator ==) is quoted, Bash treats it like a literal; only unquoted strings (or variable references) are treated as (glob-like) patterns:
'*' matches literal *, whereas * (unquoted!) matches any sequence of characters, including none.
Thus:
[[ $TEXT == '*' ]] would only ever match the single, literal character *.
[[ $TEXT == /'*/' ]], because it mistakes / for the escape character - which in reality is \ - would only match literal /*/ (/'*/' is effectively a concatenation of unquoted / and single-quoted literal */).
[[ $TEXT == a*a ]], due to using an unquoted RHS, is the only variant that actually performs pattern matching: any string that starts with a and ends with a is matched, including aa (because unquoted * represents any sequence of characters).
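The three cases can be checked with literal operands in place of $TEXT. The sample strings in this sketch are chosen only for illustration.

```bash
#!/usr/bin/env bash

[[ '*'   == '*'   ]] && echo "quoted '*' matches a literal asterisk"
[[ 'abc' == '*'   ]] || echo "quoted '*' does not match abc"
[[ 'abc' == *     ]] && echo "unquoted * matches abc"
[[ '/*/' == /'*/' ]] && echo "/'*/' matches only the literal string /*/"
[[ 'aa'  == a*a   ]] && echo "a*a matches aa"
```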
To verify that Cyrus' commands do work with strings whose content is enclosed in (embedded) single quotes, try these commands, which - on Bash, Ksh, and Zsh - should both output yes.
[[ "'ab'" == \'*\' ]] && echo yes # pattern matching, indiv. escaped ' chars.
[[ "'ab'" =~ ^\'.*\'$ ]] && echo yes # regex operator =~
The initial string is RU="903B/100ms"
from which I wish to obtain B/100ms.
Currently, I have written:
#!/bin/bash
RU="903B/100ms"
RU=${RU#*[^0-9]}
echo $RU
which returns /100ms since the parameter expansion removes up to and including the first non-numeric character. I would like to keep the first non-numeric character in this case. How would I do this by amending the above text?
You can use BASH_REMATCH to extract the desired matching value:
$ RU="903B/100ms"
$ [[ $RU =~ ^([[:digit:]]+)(.*) ]] && echo ${BASH_REMATCH[2]}
B/100ms
Or just catch the desired part as:
$ [[ $RU =~ ^[[:digit:]]+(.*) ]] && echo ${BASH_REMATCH[1]}
B/100ms
Assuming shopt -s extglob:
RU="${RU##+([0-9])}"
echo "903B/100ms" | sed 's/^[0-9]*//g'
B/100ms
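If neither extglob nor an external tool is wanted, two plain parameter expansions can do the same job. This is a sketch of an alternative that is not among the answers above; the helper name strip_leading_digits is hypothetical.

```sh
#!/bin/sh

# Hypothetical helper: prints $1 with any leading run of digits removed.
strip_leading_digits() {
    digits=${1%%[!0-9]*}              # everything before the first non-digit, e.g. 903
    printf '%s\n' "${1#"$digits"}"    # remove exactly that prefix
}

strip_leading_digits '903B/100ms'
```

The first expansion isolates the leading digits by cutting from the first non-digit onward, and the second removes that exact prefix, so the first non-numeric character is kept.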