What does the "=~" operator do in shell scripts? - bash

It seems to be some sort of comparison operator, but what exactly does it do in, for example, the following code (taken from https://github.com/lvv/git-prompt/blob/master/git-prompt.sh#L154)?
if [[ $LC_CTYPE =~ "UTF" && $TERM != "linux" ]]; then
elipses_marker="…"
else
elipses_marker="..."
fi
I'm currently trying to make git-prompt work under MinGW, and the shell supplied with MinGW doesn't seem to support this operator:
conditional binary operator expected
syntax error near `=~'
` if [[ $LC_CTYPE =~ "UTF" && $TERM != "linux" ]]; then'
In this specific case I can just replace the entire block with elipses_marker="…" (as I know my terminal supports Unicode), but what exactly does =~ do?

It's a non-POSIX addition to the [[ conditional command (available in bash since version 3.0) that performs regexp matching. Since the match doesn't have to cover the full string, the tilde in the symbol hints at an "inexact" match.
In this case, it tests whether $LC_CTYPE contains the string "UTF".
More portable version:
if test "$(echo "$LC_CTYPE" | grep -c UTF)" -ne 0 && test "$TERM" != "linux"
then
...
else
...
fi
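Where spawning grep is undesirable, the same substring test can be sketched with a POSIX case statement. The function name use_unicode_ellipsis is made up for illustration, and it takes the locale and terminal strings as arguments instead of reading $LC_CTYPE and $TERM itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the locale string contains "UTF"
# and the terminal is not the Linux console.
use_unicode_ellipsis() {
    locale_name=$1
    term_name=$2
    case $locale_name in
        *UTF*) ;;            # substring found, keep checking
        *) return 1 ;;       # no "UTF" anywhere in the string
    esac
    [ "$term_name" != "linux" ]
}

if use_unicode_ellipsis "$LC_CTYPE" "$TERM"; then
    elipses_marker="…"
else
    elipses_marker="..."
fi
```

Since case uses the same wildcard patterns as file name globbing, no regular expression support is needed at all.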

It's regular expression matching. I guess your bash version doesn't support it yet.
In this particular case, I'd suggest replacing it with simpler (and faster) pattern matching:
[[ $LC_CTYPE == *UTF* && $TERM != "linux" ]]
(note that * must not be quoted here)
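The effect of quoting can be checked directly. In this sketch the sample value and the two helper function names are assumptions made for the demonstration: the unquoted form treats * as a wildcard, while the quoted form compares against the literal characters *UTF*.

```shell
#!/usr/bin/env bash
# Hypothetical sample value, not read from the environment.
sample="en_US.UTF-8"

# Unquoted right-hand side: * is a wildcard, so this is a substring test.
glob_match() { [[ $1 == *UTF* ]]; }

# Quoted right-hand side: the asterisks are literal characters.
literal_match() { [[ $1 == "*UTF*" ]]; }

glob_match "$sample" && echo "glob: match"
literal_match "$sample" || echo "literal: no match"
literal_match "*UTF*" && echo "literal: matches only the exact string *UTF*"
```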

As in Ruby, it tests whether the left-hand string matches the regular expression given as the right-hand operand.

It matches regular expressions.
Refer to the following example from http://tldp.org/LDP/abs/html/bashver3.html#REGEXMATCHREF
#!/bin/bash
input=$1
if [[ "$input" =~ [0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9] ]]
# ^ NOTE: As of version 3.2 of Bash, the regex must be left unquoted;
# a quoted regex is matched as a literal string.
# NNN-NN-NNNN (where each N is a digit).
then
echo "Social Security number."
# Process SSN.
else
echo "Not a Social Security number!"
# Or, ask for corrected input.
fi

Related

Bash Regex if else

In Bash I'm trying to check if a string is in the appropriate format.
#!/bin/bash
COMMIT_MSG="release/patch/JIRA-123"
[[ $COMMIT_MSG =~ 'release\/(major|minor|patch)\/[A-Z\d]+-\d+' ]] && echo "yes" || echo "no"
This is the regex I've used to match the string: patch could also be major or minor, and JIRA-123 is the Jira ticket ID. When I try it in Bash, however, it always returns no.
Bash uses a simpler regex flavor called "Extended Regular Expressions" (ERE). \d doesn't exist in it, so use [0-9] instead.
Additionally, you shouldn't quote the regex in the condition.
[[ $COMMIT_MSG =~ release/(major|minor|patch)/[A-Z0-9]+-[0-9]+ ]] && echo "yes" || echo "no"
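Storing the expression in a variable keeps the quoting question out of the test itself. The sketch below assumes the whole string must match, so it adds ^ and $ anchors that the original expression did not have; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the entire string must have the release/<level>/<TICKET>
# shape, hence the ^ and $ anchors.
release_re='^release/(major|minor|patch)/[A-Z0-9]+-[0-9]+$'

is_release_ref() {
    # The variable is expanded unquoted so it is treated as a regex.
    [[ $1 =~ $release_re ]]
}

for ref in "release/patch/JIRA-123" "release/hotfix/JIRA-123" "release/minor/JIRA-"; do
    if is_release_ref "$ref"; then
        echo "$ref: yes"
    else
        echo "$ref: no"
    fi
done
```

Only the first sample is accepted; the second uses an unknown level and the third lacks the ticket number.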

Detecting when a string exists but doesn't start with - in bash

I am trying to make a bash program that saves results to a file with the name of the user's choosing if the program is supplied the --file argument followed by an option, in which the option should not start with a dash. So I used the following conditional:
if [[ -n $2 && !($2="[^-]") ]]
But that didn't work. It still saves the output to a file even if the second argument starts with a dash. I also tried using this:
if ! [[ -z $2 && ($2="[^-]") ]]
It behaved the same as the previous one. What's the problem? Thanks in advance!
As a pattern match, this might look like:
[[ $2 ]] && [[ $2 != -* ]]
Note:
Moving && outside of [[ ]] isn't mandatory, but it is good form: It ensures that your code can be rewritten to work with the POSIX test command without either using obsolescent functionality (-a and -o) or needing to restructure.
Whitespace is mandatory. In !($2="[^-]"), neither the ! nor the ( and ) nor the = are parsed as separate operators.
= and != check for pattern matches, not regular expressions. The regular expression operator in [[ ]] is =~. Among the differences, anchors (^ to match at the beginning of a string, or $ to match at the end) are implicit in a pattern whereas they need to be explicit in a regex, and * has a very different meaning (* in a pattern means the same thing as .* in a regex).
The ^ in [^-] already negates the -, so by using ! in addition, you're making your code only match when there is a dash in the second argument.
To test this yourself:
$ check_args() { [[ $2 ]] && [[ $2 != -* ]]; echo $?; }
$ check_args one --two
1
$ check_args one two
0
$ check_args one
1
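If the script has to run under a shell without [[ ]], the same check could be sketched with [ ] and case. The function name check_args_posix is illustrative only.

```shell
#!/bin/sh
# Succeeds only when a second argument exists and does not start with a dash.
check_args_posix() {
    [ -n "${2-}" ] || return 1     # missing or empty second argument
    case $2 in
        -*) return 1 ;;            # looks like another option
        *)  return 0 ;;
    esac
}

check_args_posix one --two; echo $?
check_args_posix one two;   echo $?
check_args_posix one;       echo $?
```

The three calls mirror the test session above and print 1, 0 and 1 in that order.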

How to check if a file name matches regex in shell script

I have a shell script that needs to check if a file name matches a certain regex, but it always shows "not match". Can anyone let me know what's wrong with my code?
fileNamePattern=abcd_????_def_*.txt
realFilePath=/data/file/abcd_12bd_def_ghijk.txt
if [[ $realFilePath =~ $fileNamePattern ]]
then
echo $realFilePath match $fileNamePattern
else
echo $realFilePath not match $fileNamePattern
fi
There is a confusion between regexes and the simpler "glob"/"wildcard"/"normal" patterns – whatever you want to call them. You're using the latter, but call it a regex.
If you want to use a pattern, you should
Quote it when assigning [1]:
fileNamePattern="abcd_????_def_*.txt"
You don't want anything to expand quite yet.
Make it match the complete path. This doesn't match:
$ mypath="/mydir/myfile1.txt"
$ mypattern="myfile?.txt"
$ [[ $mypath == $mypattern ]] && echo "Matches!" || echo "Doesn't match!"
Doesn't match!
But after extending the pattern to start with *:
$ mypattern="*myfile?.txt"
$ [[ $mypath == $mypattern ]] && echo "Matches!" || echo "Doesn't match!"
Matches!
The first one doesn't match because it matches only the filename, but not the complete path. Alternatively, you could use the first pattern, but remove the rest of the path with parameter expansion:
$ mypattern="myfile?.txt"
$ mypath="/mydir/myfile1.txt"
$ echo "${mypath##*/}"
myfile1.txt
$ [[ ${mypath##*/} == $mypattern ]] && echo "Matches!" || echo "Doesn't match!"
Matches!
Use == and not =~, as shown in the above examples. You could also use the more portable = instead, but since we're already using the non-POSIX [[ ]] instead of [ ], we may as well use ==.
If you want to use a regex, you should:
Write your pattern as one: ? and * have a different meaning in regexes; they modify what they stand after, whereas in glob patterns, they can stand on their own (see the manual). The corresponding pattern would become:
fileNameRegex='abcd_.{4}_def_.*\.txt'
and could be used like this:
$ mypath="/data/file/abcd_12bd_def_ghijk.txt"
$ [[ $mypath =~ $fileNameRegex ]] && echo "Matches!" || echo "Doesn't match!"
Matches!
Keep your habit of writing the regex into a separate parameter and then use it unquoted in the conditional operator [[ ]], or escaping gets very messy – it's also more portable across Bash versions.
The BashGuide has a great article about the different types of patterns in Bash.
Notice that quoting your parameters is almost always a good habit. It's not required in conditional expressions in [[ ]], and actually suppresses interpretation of the right-hand side as a pattern or regex. If you were using [ ] (which doesn't support regexes and patterns anyway), quoting would be required to avoid unexpected side effects of special characters and empty strings.
[1] Not exactly true in this case, actually. When assigning to a variable, the manual says that the following happens:
[...] tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, and quote removal [...]
i.e., no pathname (glob) expansion. While in this very case using
fileNamePattern=abcd_????_def_*.txt
would work just as well as the quoted version, using quotes prevents surprises in many other cases and is required as soon as you have a blank in the pattern.
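Both approaches can be put side by side. In this sketch the two function names are invented for illustration, and the regex is anchored at the end with $, which is an assumption about the intended match.

```shell
#!/usr/bin/env bash
file_glob='abcd_????_def_*.txt'
file_regex='abcd_.{4}_def_.*\.txt$'

# Glob match against the file name only (directory part stripped).
matches_glob() { [[ ${1##*/} == $file_glob ]]; }

# Regex match against the full path; =~ matches anywhere unless anchored.
matches_regex() { [[ $1 =~ $file_regex ]]; }

path=/data/file/abcd_12bd_def_ghijk.txt
matches_glob "$path" && echo "glob: match"
matches_regex "$path" && echo "regex: match"
matches_glob /data/file/abcd_1_def_x.txt || echo "glob: no match"
```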
Use RegExs instead of wildcards:
fileNamePattern="abcd_...._def_.*\.txt"
realFilePath=/data/file/abcd_12bd_def_ghijk.txt
if [[ $realFilePath =~ $fileNamePattern ]]
then
echo "$realFilePath match $fileNamePattern"
else
echo "$realFilePath not match $fileNamePattern"
fi
Output:
/data/file/abcd_12bd_def_ghijk.txt match abcd_...._def_.*\.txt

RE matching operator not working in this script

Strangely enough, [[ 111-11-1111 =~ "[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]" ]] succeeds on the command line.
But this script does not give the same result when I run bash re.sh 111-11-1111
#!/bin/bash
# re.sh
input=$1
if [[ "$input" =~ "[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]" ]]
# ^ NOTE: Quoting not necessary, as of version 3.2 of Bash.
# NNN-NN-NNNN (where each N is a digit).
then
echo "Social Security number."
# Process SSN.
else
echo "Not a Social Security number!"
# Or, ask for corrected input.
fi
why?
As others have mentioned, you should remove the quotes on the regular expression if you're using bash 3.2 or higher. Also though, here's a shorter expression:
if [[ $input =~ ^[0-9]{3}-[0-9]{2}-[0-9]{4}$ ]]
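The difference the quotes make can be seen in a short sketch (bash 3.2 or later assumed; the function names are hypothetical). With the right-hand side quoted, the brackets and braces are compared as literal characters, so a well-formed number no longer matches.

```shell
#!/usr/bin/env bash
ssn_re='^[0-9]{3}-[0-9]{2}-[0-9]{4}$'

# Unquoted expansion: the value is interpreted as a regular expression.
is_ssn() { [[ $1 =~ $ssn_re ]]; }

# Quoted expansion: the value is matched as a literal string.
is_ssn_quoted() { [[ $1 =~ "$ssn_re" ]]; }

is_ssn 111-11-1111 && echo "unquoted: match"
is_ssn_quoted 111-11-1111 || echo "quoted: no match"
is_ssn 111-11-11112 || echo "unquoted: anchors reject the extra digit"
```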

"sh: : unknown operand" in Yocto

The following works in Ubuntu but not Yocto (Poky).
root#system:~/# x='abc'
root#system:~/# y=''
root#system:~/# [[ $(echo $x) != '' ]] && echo true
true
root#system:~/# [[ $(echo $y) != '' ]] && echo true
sh: : unknown operand
In Ubuntu the last line returns nothing (as expected). Any ideas why it's throwing an error in Yocto?
The problem seems to be that $(echo $y) is expanding to an empty string, and then [[ isn't handling it correctly. The solution to that would be to quote the command substitution like
[[ "$(echo "$y")" != '' ]] && echo true
though it's probably better still to use printf than echo so you might do it as
[[ "$(printf '%s' "$y")" != '' ]] && echo true
just in case $y ends up containing special characters that can trip up echo or similar commands.
Apparently, busybox ash has a rather simplistic implementation of [[. It is the same as [ except that it expects a ]] instead of ] final argument. This misses the point of why [[ can be useful at all: [[ is supposed to be a keyword with special parsing and using it looks more beautiful and avoids various pitfalls (while adding some of its own). I guess they added it so a few more bash scripts run unmodified on busybox ash.
To avoid confusion, I recommend not using [[ in busybox at all. Use [ and quote all command substitutions and parameter expansions.
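Following that recommendation, the test from the question could be sketched with [ and full quoting. The helper name is made up; -n tests for a non-empty string, which stands in for the comparison against ''.

```shell
#!/bin/sh
# Succeeds when the argument is a non-empty string.
is_nonempty() {
    [ -n "$1" ]
}

x='abc'
y=''
is_nonempty "$x" && echo true
is_nonempty "$y" || echo "y is empty"
```

Because the argument is quoted, an empty value still reaches [ as one (empty) word instead of disappearing from the command line.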
