Shorthand character class not working in shell script [duplicate] - bash

If I pass a word as an argument like this:
$ ./file.sh hello
it prints "Even" when it should print "Argument should be a number".
#!/bin/bash
set -e
if [[ -z $1 ]]
then
echo "Argument expected"
else
if [[ $1 =~ "\D" ]] #This does not work as expected
then
echo "Argument should be a number"
else
a=$1%2
if [[ a -eq 0 ]]
then
echo "Even"
elif [[ a -eq 1 ]]
then
echo "Odd"
fi
fi
fi
#End of program
When I change "\D" to "[^0-9]" in the if statement, it works as expected and prints "Argument should be a number" to the console.
Don't they both have the same meaning? If not, in what way are the two different from each other?

Bash uses POSIX Extended Regular Expressions, not PCRE. Instead of escape sequences like \D, it uses character classes inside bracket expressions. The character class for digits is
[:digit:]
and to match non-digits, you use it inside a bracket expression with the negation operator:
[^[:digit:]]
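As you can see, this is longer than just writing [^0-9], so it's not really a shorthand. Its advantage is portability across locales: what a range such as [0-9] matches can depend on the locale's collation order, whereas [:digit:] is defined by the locale's character classification.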
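As a rough sketch of how the script from the question might look with a POSIX bracket expression, the function below validates its argument before testing parity. The function name parity, the variable non_digit, and the 10# prefix (which forces base 10 so that input such as 08 is not read as octal) are additions for illustration, not part of the original script.

```bash
#!/bin/bash
# Hypothetical helper (not from the original script): classify one argument.
parity() {
    local arg=$1
    local non_digit='[^[:digit:]]'   # POSIX ERE equivalent of PCRE's \D

    if [[ -z $arg ]]; then
        echo "Argument expected"
    elif [[ $arg =~ $non_digit ]]; then
        echo "Argument should be a number"
    elif (( 10#$arg % 2 == 0 )); then   # assumption: input is decimal, so force base 10
        echo "Even"
    else
        echo "Odd"
    fi
}

parity hello
parity 42
parity 7
```

Keeping the regex in a variable and expanding it unquoted avoids any quoting surprises on the right-hand side of =~.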

Bash regex simply does not support PCRE regex syntax.
You might want to read up on different regex dialects and their history.
See e.g. Why are there so many different regular expression dialects?

The POSIX equivalent of \D is [^[:digit:]].

Related

different interpretation of variable as pattern [duplicate]

I would like someone to clarify this because I don't understand it.
Here is some sample code that tests whether an argument is numeric (an integer):
#!/usr/bin/env bash
pattern="^[+|-]?[0-9]+$"
[[ "$1" =~ "$pattern" ]] && echo "1:number" || echo "1:NOT number"
[[ "$1" =~ $pattern ]] && echo "1:number" || echo "1:NOT number"
It is advisable to always quote variables, but if you run this simple script with various inputs, you will see that when you enter a number, the test with the quoted pattern variable (the first test) returns the wrong result.
Why is that?
Thanks in advance to anyone who takes the trouble to explain this to me.
Finally, sorry if this has already been answered, but I haven't found that particular question.
It's normally advised to quote all variables, but [[ ]] is a special construct that parses its contents differently.
You don't need to quote variables inside double square brackets, because it doesn't do word splitting or filename expansion. But there's no harm in quoting most variables.
However, the pattern operand to =~ is treated very specially. Any part of it that's quoted is treated as a literal, not a regular expression pattern. So when you write "$pattern" it no longer does a regular expression match, it just searches for the actual characters in $pattern in $1.
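The difference can be seen in a small sketch, assuming bash 3.2 or later with default compatibility settings. The helper name check is hypothetical, and the pattern here drops the stray | from the bracket expression in the question.

```bash
#!/bin/bash
pattern='^[+-]?[0-9]+$'

# Hypothetical helper: run the same test with the pattern quoted and unquoted.
check() {
    local quoted unquoted
    # Quoted: the contents of $pattern are searched for as a literal string.
    [[ $1 =~ "$pattern" ]] && quoted="match" || quoted="no match"
    # Unquoted: the contents of $pattern are used as a regular expression.
    [[ $1 =~ $pattern ]] && unquoted="match" || unquoted="no match"
    echo "quoted: $quoted, unquoted: $unquoted"
}

check 42                 # a number
check '^[+-]?[0-9]+$'    # the pattern text itself
```

With a number as input only the unquoted test succeeds; with the pattern text itself as input only the quoted (literal) test succeeds.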

How to match version of a command using bash "=~"?

luajit -v
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
I want to check that the output does not match the expected version, and I have LUAJIT_VERSION="2.1.0-beta3" set at the beginning of my bash script. I use:
if ! [[ "$(luajit -v)" =~ LuaJIT\s+"$LUAJIT_VERSION".* ]]; then
#rest of code
But it does not seem to work, whether or not I put $LUAJIT_VERSION between double quotes. The documentation says:
Any part of the pattern may be quoted to force the quoted portion to be matched as a string ... If the pattern is stored in a shell variable, quoting the variable expansion forces the entire pattern to be matched as a string.
Bash docs
Can you tell me what's the correct way to do this task?
\s is not a recognized character class in bash; you need to use [[:blank:]] instead:
if ! [[ "$(luajit -v)" =~ LuaJIT[[:blank:]]+"$LUAJIT_VERSION" ]]; then
(The trailing .* isn't necessary, since regular expressions aren't anchored to the start or end of the string.)
However, it's not clear your regular expression needs to be that general. It looks like you can use a single, literal space
if ! [[ "$(luajit -v)" =~ LuaJIT\ "$LUAJIT_VERSION" ]]; then
or simply use pattern matching:
if [[ "$(luajit -v)" != LuaJIT\ "$LUAJIT_VERSION"* ]]; then
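A minimal sketch of the regex variant follows. To make it runnable without luajit installed, the banner text is passed as an argument instead of being read from luajit -v; the helper name has_expected_version and the banner variable are hypothetical.

```bash
#!/bin/bash
LUAJIT_VERSION="2.1.0-beta3"

# Hypothetical helper: succeeds if the banner text contains the expected version.
# The quoted "$LUAJIT_VERSION" is matched literally, so its dots are not wildcards.
has_expected_version() {
    [[ $1 =~ LuaJIT[[:blank:]]+"$LUAJIT_VERSION" ]]
}

# Sample banner copied from the question; in a real script this would be "$(luajit -v)".
banner='LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/'

if ! has_expected_version "$banner"; then
    echo "version mismatch"
else
    echo "version ok"
fi
```

Quoting only the variable part keeps [[:blank:]]+ active as a regex while the version string is compared character for character.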

Bash: avoiding filename expansion inside double bracket regular expression test [duplicate]

I am trying to do a regular expression match with a variable that contains an asterisk.
The following set of commands in Bash does filename expansion with the asterisk in the variable on the left-hand side of the operator.
test='part1 * part2'
[[ "$test" =~ ^(.+)\ .\ (.+)$ ]] && echo $BASH_REMATCH
Results in: part1 FILE1 FILE2 part2
But it should result in: part1 * part2
I have searched and searched but cannot figure out why this is happening.
I realized while asking, the regular expression matching is working fine. There is no expansion happening inside the double brackets. The expansion is occurring after the match, when the result is echoed. The $BASH_REMATCH variable contains an asterisk, and needs to be double-quoted.
The correct set of commands is:
test='part1 * part2'
regex='^(.+) . (.+)$'
[[ "$test" =~ $regex ]] && echo "$BASH_REMATCH"
UPDATE: Set regular expression outside of test.
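Individual capture groups are available as elements of the BASH_REMATCH array and need the same double quotes when expanded. The sketch below adds a capture group around the middle character, which is an assumption beyond the original regex, and uses illustrative variable names.

```bash
#!/bin/bash
test_string='part1 * part2'
# Assumption: a third group around the middle character, to show indexed access.
regex='^(.+) (.) (.+)$'

if [[ $test_string =~ $regex ]]; then
    left=${BASH_REMATCH[1]}     # text before the separator
    middle=${BASH_REMATCH[2]}   # the single separator character
    right=${BASH_REMATCH[3]}    # text after the separator
    # Quote every expansion so the asterisk is not expanded to filenames.
    printf 'left=%s middle=%s right=%s\n' "$left" "$middle" "$right"
fi
```

Element 0 of the array holds the whole match, which is what an unindexed $BASH_REMATCH expands to.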

bash why no match with number on regexp

I'm curious why this code does not match, i.e. why it does not take the "then" branch. It echoes "no match". Can you please advise?
#!/bin/bash
suffix="2"
if [[ $suffix =~ "^[0-9]+$" ]]
then
echo "match"
else
echo "no match"
fi
Quoting the right-hand side of an = or =~ operation within [[ ]] in modern (3.2+) releases of bash makes the string literal, i.e. no longer a regex or pattern.
From the manual:
Any part of the pattern may be quoted to force the quoted portion to be matched as a string.
For consistent behavior across releases supporting =~ (if one needs to support versions prior to 3.2), accepted best practice is to put your regex in a variable, and use that variable unquoted on the right-hand side of =~:
re='^[0-9]+$'
[[ $suffix =~ $re ]]

What is the regex matching operator in bourne shell script?

I am trying to validate user input against a regular expression.
vari=A
if [ $vari =~ [A-Z] ] ;
then
echo "hurray"
fi
The output I am getting is swf.sh[3]: =~: unknown test operator.
Can you please let me know the test operator I can use?
It's not built into the Bourne shell; you need to use grep:
if echo "$vari" | grep -q '[A-Z]'; then
echo hurray
fi
If you want to match the whole string, remember to use the regex anchors, ^ and $. Note that the -q flag makes grep quiet, so its only result is the exit status, which indicates match or no match.
POSIX shell doesn't have a regular expression operator (or rather, the POSIX test command does not). Instead, you use the expr command to do a (limited) form of regular expression matching.
if expr "$vari" : '[A-Z]' > /dev/null; then
(I say "limited" because it always matches at the beginning of the string, as if the regular expression started with ^.) The exit status is 0 if a match is made; it also writes the number of characters matched to standard output, hence the redirect to /dev/null.
If you are actually using bash, you need to use the [[ command:
if [[ $vari =~ [A-Z] ]]; then
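For whole-string validation without [[ ]], the grep and expr approaches might be wrapped as follows. This is a sketch: the function names are hypothetical, and the X prefix in the expr call is a common precaution (an addition here) so that a value resembling an expr operator is not misparsed.

```bash
#!/bin/sh
# Hypothetical helper: whole-string match with grep.
# -E selects extended regular expressions, -q suppresses output.
is_upper_grep() {
    printf '%s\n' "$1" | grep -Eq '^[A-Z]+$'
}

# Hypothetical helper: whole-string match with expr.
# expr anchors at the start implicitly; the trailing $ anchors the end.
is_upper_expr() {
    expr "X$1" : 'X[A-Z][A-Z]*$' >/dev/null
}

for value in HELLO A1; do
    if is_upper_grep "$value" && is_upper_expr "$value"; then
        echo "$value: match"
    else
        echo "$value: no match"
    fi
done
```

Note that expr uses basic regular expressions, which is why the sketch writes [A-Z][A-Z]* rather than [A-Z]+.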
