How to check if a file name matches regex in shell script - bash

I have a shell script that needs to check if a file name matches a certain regex, but it always shows "not match". Can anyone let me know what's wrong with my code?
fileNamePattern=abcd_????_def_*.txt
realFilePath=/data/file/abcd_12bd_def_ghijk.txt
if [[ $realFilePath =~ $fileNamePattern ]]
then
echo $realFilePath match $fileNamePattern
else
echo $realFilePath not match $fileNamePattern
fi

There is a confusion between regexes and the simpler "glob"/"wildcard"/"normal" patterns – whatever you want to call them. You're using the latter, but call it a regex.
If you want to use a pattern, you should
Quote it when assigning [1]:
fileNamePattern="abcd_????_def_*.txt"
You don't want anything to expand quite yet.
Make it match the complete path. This doesn't match:
$ mypath="/mydir/myfile1.txt"
$ mypattern="myfile?.txt"
$ [[ $mypath == $mypattern ]] && echo "Matches!" || echo "Doesn't match!"
Doesn't match!
But after extending the pattern to start with *:
$ mypattern="*myfile?.txt"
$ [[ $mypath == $mypattern ]] && echo "Matches!" || echo "Doesn't match!"
Matches!
The first one doesn't match because it matches only the filename, but not the complete path. Alternatively, you could use the first pattern, but remove the rest of the path with parameter expansion:
$ mypattern="myfile?.txt"
$ mypath="/mydir/myfile1.txt"
$ echo "${mypath##*/}"
myfile1.txt
$ [[ ${mypath##*/} == $mypattern ]] && echo "Matches!" || echo "Doesn't match!"
Matches!
Use == and not =~, as shown in the above examples. You could also use the more portable = instead, but since we're already using the non-POSIX [[ ]] instead of [ ], we may as well use ==.
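The three points above can be combined into one small helper. This is only a sketch: the function name matches_glob is made up for illustration and is not part of the original answer; the pattern and path are the ones from the question.

```shell
#!/usr/bin/env bash
# matches_glob PATH PATTERN  (hypothetical helper, name chosen for this example)
# Succeeds when the file-name part of PATH matches the glob PATTERN.
matches_glob() {
    local path=$1 pattern=$2
    # ${path##*/} removes the longest prefix ending in "/", leaving the file name.
    # The right-hand side stays unquoted so that it is treated as a pattern.
    [[ ${path##*/} == $pattern ]]
}

fileNamePattern="abcd_????_def_*.txt"
realFilePath=/data/file/abcd_12bd_def_ghijk.txt

if matches_glob "$realFilePath" "$fileNamePattern"; then
    echo "$realFilePath matches $fileNamePattern"
else
    echo "$realFilePath does not match $fileNamePattern"
fi
```

With the values from the question, the first branch is taken.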
If you want to use a regex, you should:
Write your pattern as one: ? and * have a different meaning in regexes; they modify what they stand after, whereas in glob patterns, they can stand on their own (see the manual). The corresponding pattern would become:
fileNameRegex='abcd_.{4}_def_.*\.txt'
and could be used like this:
$ mypath="/data/file/abcd_12bd_def_ghijk.txt"
$ [[ $mypath =~ $fileNameRegex ]] && echo "Matches!" || echo "Doesn't match!"
Matches!
Keep your habit of writing the regex into a separate parameter and then use it unquoted in the conditional operator [[ ]], or escaping gets very messy – it's also more portable across Bash versions.
The BashGuide has a great article about the different types of patterns in Bash.
Notice that quoting your parameters is almost always a good habit. It's not required in conditional expressions in [[ ]], and actually suppresses interpretation of the right-hand side as a pattern or regex. If you were using [ ] (which doesn't support regexes and patterns anyway), quoting would be required to avoid unexpected side effects of special characters and empty strings.
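The effect of quoting the right-hand side can be seen in a short experiment. The variable names and the sample file name below are assumptions made for this sketch, and the behaviour shown for =~ applies to Bash 3.2 and later.

```shell
#!/usr/bin/env bash
name=foo.txt
glob='f*.txt'
regex='^f.*\.txt$'

# Unquoted right-hand side: interpreted as a glob pattern or as a regex.
if [[ $name == $glob ]]; then echo "glob, unquoted: match"; fi
if [[ $name =~ $regex ]]; then echo "regex, unquoted: match"; fi

# Quoted right-hand side: compared as a literal string, so neither test succeeds.
if [[ $name == "$glob" ]]; then echo "glob, quoted: match"; else echo "glob, quoted: no match"; fi
if [[ $name =~ "$regex" ]]; then echo "regex, quoted: match"; else echo "regex, quoted: no match"; fi
```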
[1] Not exactly true in this case, actually. When assigning to a variable, the manual says that the following happens:
[...] tilde expansion, parameter and variable expansion, command substitution, arithmetic expansion, and quote removal [...]
i.e., no pathname (glob) expansion. While in this very case using
fileNamePattern=abcd_????_def_*.txt
would work just as well as the quoted version, using quotes prevents surprises in many other cases and is required as soon as you have a blank in the pattern.
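To see that an assignment really does not glob-expand, one can try it in a scratch directory that contains a matching file. This is a sketch under assumptions: it relies on mktemp -d being available, and the variable names stored and expanded are invented for the example.

```shell
#!/usr/bin/env bash
orig_dir=$PWD
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch abcd_12bd_def_ghijk.txt      # a file the pattern would match

# Assignment: no pathname expansion, so the variable holds the pattern itself.
fileNamePattern=abcd_????_def_*.txt
stored=$fileNamePattern

# Unquoted expansion as a command argument: here the glob is expanded.
expanded=$(printf '%s\n' $fileNamePattern)

echo "stored:   $stored"
echo "expanded: $expanded"

cd "$orig_dir" && rm -r "$tmpdir"
```

The first line shows the pattern unchanged, while the second shows the name of the file created in the scratch directory.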

Use regexes instead of wildcards:
fileNamePattern="abcd_...._def_.*\.txt"
realFilePath=/data/file/abcd_12bd_def_ghijk.txt
if [[ $realFilePath =~ $fileNamePattern ]]
then
echo "$realFilePath match $fileNamePattern"
else
echo "$realFilePath not match $fileNamePattern"
fi
Output:
/data/file/abcd_12bd_def_ghijk.txt match abcd_...._def_.*\.txt
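Note that a regex used with =~ matches anywhere in the string unless it is anchored. The sketch below contrasts the regex from this answer with a stricter variant; the stricter regex and the .bak path are assumptions added for illustration, not part of the original answer.

```shell
#!/usr/bin/env bash
loose='abcd_...._def_.*\.txt'
# Assumed tightening: the name must begin a path component and end in ".txt".
strict='(^|/)abcd_.{4}_def_[^/]*\.txt$'

for path in /data/file/abcd_12bd_def_ghijk.txt \
            /data/file/abcd_12bd_def_ghijk.txt.bak; do
    if [[ $path =~ $loose ]]; then echo "loose matches:  $path"; fi
    if [[ $path =~ $strict ]]; then echo "strict matches: $path"; fi
done
```

The loose regex accepts both paths, while the strict one accepts only the path that actually ends in .txt.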

Related

In Bash, is it possible to match a string variable containing wildcards to another string

I am trying to compare strings against a list of other strings read from a file.
However some of the strings in the file contain wildcard characters (both ? and *) which need to be taken into account when matching.
I am probably missing something but I am unable to see how to do it
Eg.
I have strings from file in an array which could be anything alphanumeric (and include commas and full stops) with wildcards : (a?cd, xy, q?hz, j,h-??)
and I have another string I wish to compare with each item in the list in turn. Any of the strings may contain spaces.
so what I want is something like
teststring="abcdx.rubb ish,y"
matchstrings=("a?cd" "*x*y" "q?h*z" "j*,h-??")
for i in "${matchstrings[@]}" ; do
if [[ "$i" == "$teststring" ]]; then # this test here is the problem
<do something>
else
<do something else>
fi
done
This should match on the second "matchstring" but not any others
Any help appreciated
Yes; you just have the two operands to == reversed; the glob goes on the right (and must not be quoted):
if [[ $teststring == $i ]]; then
Example:
$ i=f*
$ [[ foo == $i ]] && echo pattern match
pattern match
If you quote the parameter expansion, the operation is treated as a literal string comparison, not a pattern match.
$ [[ foo == "$i" ]] || echo "foo != f*"
foo != f*
Spaces in the pattern are not a problem:
$ i="foo b*"
$ [[ "foo bar" == $i ]] && echo pattern match
pattern match
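Applied to the loop from the question, the fix could look like the following sketch. The matched array and the echo messages are additions for illustration; the test string and patterns are taken from the question.

```shell
#!/usr/bin/env bash
teststring="abcdx.rubb ish,y"
matchstrings=("a?cd" "*x*y" "q?h*z" "j*,h-??")

matched=()
for pattern in "${matchstrings[@]}"; do
    # String on the left, unquoted pattern on the right.
    if [[ $teststring == $pattern ]]; then
        matched+=("$pattern")
        echo "match:    $pattern"
    else
        echo "no match: $pattern"
    fi
done
```

Only the second pattern, *x*y, ends up in the matched array.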
You can do this even completely within POSIX, since case alternatives undergo parameter substitution:
#!/bin/sh
teststring="abcdx.rubbish,y"
while IFS= read -r matchstring; do
case $teststring in
($matchstring) echo "$matchstring";;
esac
done << "EOF"
a?cd
*x*y
q?h*z
j*,h-??
EOF
This outputs only *x*y as desired.

how to check whether a string starts with xx and ends with yy in shellscript?

In the below example I want to find whether the sentence starts with 'ap' and ends with 'e'.
example: a="apple"
if [[ "$a" == ^"ap"+$ ]]
This is not giving proper output.
You don't mention which shell you're using, but the [[ in your attempt suggests you're using one that expands upon the base POSIX sh language. The following works with at least bash, zsh and ksh93:
$ a=apple
$ [[ $a == ap*e ]] && echo matches # Wildcard pattern
matches
$ [[ $a =~ ^ap.*e$ ]] && echo matches # Regular expression - note the =~
matches
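If the script has to run under a plain POSIX sh, where [[ ]] is not available, the same wildcard test can be written with case. This is a sketch; the helper name starts_ends is hypothetical.

```shell
#!/bin/sh
# starts_ends STRING PREFIX SUFFIX  (hypothetical helper)
# Succeeds when STRING starts with PREFIX and ends with SUFFIX.
starts_ends() {
    case $1 in
        # The quoted expansions are taken literally; only the bare * is a wildcard.
        "$2"*"$3") return 0 ;;
        *)         return 1 ;;
    esac
}

a=apple
if starts_ends "$a" ap e; then
    echo matches
fi
```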

Detecting when a string exists but doesn't start with - in bash

I am trying to make a bash program that saves results to a file with the name of the user's choosing if the program is supplied the --file argument followed by an option, in which the option should not start with a dash. So I used the following conditional:
if [[ -n $2 && !($2="[^-]") ]]
But that didn't work. It still saves the output to a file even if the second argument starts with a dash. I also tried using this:
if ! [[ -z $2 && ($2="[^-]") ]]
It also did as the previous one. What's the problem? Thanks in advance!
As a pattern match, this might look like:
[[ $2 ]] && [[ $2 != -* ]]
Note:
Moving && outside of [[ ]] isn't mandatory, but it is good form: It ensures that your code can be rewritten to work with the POSIX test command without either using obsolescent functionality (-a and -o) or needing to restructure.
Whitespace is mandatory. In !($2="[^-]"), neither the ! nor the ( and ) nor the = are parsed as separate operators.
= and != check for pattern matches, not regular expressions. The regular expression operator in [[ ]] is =~. Among the differences, anchors (^ to match at the beginning of a string, or $ to match at the end) are implicit in a pattern whereas they need to be explicit in a regex, and * has a very different meaning (* in a pattern means the same thing as .* in a regex).
The ^ in [^-] already negates the -, so by using ! in addition, you're making your code only match when there is a dash in the second argument.
To test this yourself:
$ check_args() { [[ $2 ]] && [[ $2 != -* ]]; echo $?; }
$ check_args one --two
1
$ check_args one two
0
$ check_args one
1
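The same test can be dropped into a small argument parser. The following is only a sketch of one possible shape: the function name parse_file_arg and its error message are invented here, and the original script's option handling is not known.

```shell
#!/usr/bin/env bash
# parse_file_arg ARGS...  (hypothetical helper)
# Prints the value that follows --file and succeeds. Fails when --file is
# absent, or when its value is missing, empty, or starts with a dash.
parse_file_arg() {
    while [[ $# -gt 0 ]]; do
        if [[ $1 == --file ]]; then
            if [[ $2 ]] && [[ $2 != -* ]]; then
                printf '%s\n' "$2"
                return 0
            fi
            echo "--file needs a name that does not start with -" >&2
            return 1
        fi
        shift
    done
    return 1
}
```

For example, parse_file_arg --file out.txt prints out.txt, whereas parse_file_arg --file --verbose fails with the error message.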

BASH: Everything but not slash? IF STATEMENT (STRING COMPARISON)

I'm trying to match any strings that start with /John/ but do not contain / after /John/
if
[ $string == /John/[!/]+ ]; then ....
fi
This is what I got and it doesn't seem to be working.
So I tried
if
[[ $string =~ ^/John/[!/]+$ ]]; then ....
fi
It still didn't work, and so I changed it to
if
[[ $string =~ /John/[^/] ]]; then ....
fi
It worked, but it also matches all the strings that have a / after /John/.
For bash you want [[ $string =~ ^/John/[^/]*$ ]]: the ^ anchors the match at the start of the string, and the end-of-line anchor $ ensures there are no slashes after the last acceptable slash.
How about "the string starts with '/John/' and doesn't contain any slashes after '/John/'"?
[[ $string = /John/* && $string != /John/*/* ]]
Or you could compare against a parameter expansion that only expands if the conditions are met. This says "after stripping off everything including and after the last slash, the string is /John":
[[ ${string%/*} = /John ]]
In fact, this last solution is the only entirely POSIXLY_STRICT one I can come up with without multiple test expressions.
[ "${string%/*}" = /John ]
By the way, your problem is probably simply that you are using double-equals inside a single-bracket test expression. bash actually does accept them inside double-bracket test expressions, but a single equals is a better idea.
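To compare the approaches from these answers side by side, they can be wrapped in functions and run over a few sample strings. This is a sketch: the function names and the sample strings are made up, and the regex is anchored with ^ as well as $, since the question asks for strings that start with /John/.

```shell
#!/usr/bin/env bash
# Each function succeeds when its argument starts with /John/ and contains
# no further slash after that prefix. The names are illustrative only.
by_regex() { local re='^/John/[^/]*$'; [[ $1 =~ $re ]]; }
by_globs() { [[ $1 = /John/* && $1 != /John/*/* ]]; }
by_strip() { [ "${1%/*}" = /John ]; }

for s in /John/ /John/Lennon /John/Lennon/Ono /Paul/John/x /John; do
    for f in by_regex by_globs by_strip; do
        if "$f" "$s"; then
            echo "$f: $s matches"
        else
            echo "$f: $s does not match"
        fi
    done
done
```

All three functions accept /John/ and /John/Lennon and reject the other three strings.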
You can also use plain old grep:
string='/John Lennon/Yoko Ono'
if echo "$string" | grep -q "/John[^/]" ; then
echo "matched"
else
echo "no match found"
fi
This only fails if /John is at the very end of the string... if that's a possibility then you can tweak to handle that case, for instance:
string='/John Lennon/Yoko Ono'
if echo "$string" | grep -qP "(/John[^/])|(/John$)" ; then
echo "matched"
else
echo "no match found"
fi
Not sure what language you're using, but normal negative character classes are prefixed with a ^
e.g.
[^/]
You can also put in start/end qualifiers (clojure example, so Java's regex engine). Usually ^ at beginning and $ at end.
user => (re-matches #"^/[a-zA-Z]+[^/]$" "/John/")
nil

What does the "=~" operator do in shell scripts?

It seems that it is sort of comparison operator, but what exactly it does in e.g. the following code (taken from https://github.com/lvv/git-prompt/blob/master/git-prompt.sh#L154)?
if [[ $LC_CTYPE =~ "UTF" && $TERM != "linux" ]]; then
elipses_marker="…"
else
elipses_marker="..."
fi
I'm currently trying to make git-prompt to work under MinGW, and the shell supplied with MinGW doesn't seem to support this operator:
conditional binary operator expected
syntax error near `=~'
` if [[ $LC_CTYPE =~ "UTF" && $TERM != "linux" ]]; then'
In this specific case I can just replace the entire block with elipses_marker="…" (as I know my terminal supports unicode), but what exactly this =~ does?
It's a non-POSIX addition to the built-in [[ command in bash (zsh and ksh93 have it too), performing regexp matching. Since it doesn't have to be an exact match of the full string, the operator includes a tilde to indicate an "inexact" match.
In this case, it tests whether $LC_CTYPE contains the string "UTF".
More portable version:
if test `echo $LC_CTYPE | grep -c UTF` -ne 0 -a "$TERM" != "linux"
then
...
else
...
fi
It performs regular expression matching. I guess your bash version doesn't support that yet.
In this particular case, I'd suggest replacing it with simpler (and faster) pattern matching:
[[ $LC_CTYPE == *UTF* && $TERM != "linux" ]]
(note that * must not be quoted here)
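For a shell that lacks [[ ]] altogether, the same wildcard test can also be written with case, which is plain POSIX sh. This is a sketch: the function pick_marker is a hypothetical wrapper introduced so the logic can be tried with different values; the variable name elipses_marker is the one from the question.

```shell
#!/bin/sh
# pick_marker LC_CTYPE_VALUE TERM_VALUE  (hypothetical helper)
# Prints the Unicode ellipsis when the locale mentions UTF and the terminal
# is not the Linux console, and three ASCII dots otherwise.
pick_marker() {
    case $1 in
        *UTF*)
            if [ "$2" != "linux" ]; then
                echo "…"
                return
            fi
            ;;
    esac
    echo "..."
}

elipses_marker=$(pick_marker "$LC_CTYPE" "$TERM")
```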
Like Ruby, it matches where the RHS operand is a regular expression.
It matches regular expressions
Refer to following example from http://tldp.org/LDP/abs/html/bashver3.html#REGEXMATCHREF
#!/bin/bash
input=$1
if [[ "$input" =~ [0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9] ]]
# ^ NOTE: The regex must not be quoted as of version 3.2 of Bash; a quoted regex is matched as a literal string.
# NNN-NN-NNNN (where each N is a digit).
then
echo "Social Security number."
# Process SSN.
else
echo "Not a Social Security number!"
# Or, ask for corrected input.
fi

Resources