In Bash, is it possible to match a string variable containing wildcards to another string - bash

I am trying to compare strings against a list of other strings read from a file.
However some of the strings in the file contain wildcard characters (both ? and *) which need to be taken into account when matching.
I am probably missing something but I am unable to see how to do it
Eg.
I have strings from a file in an array, which could be anything alphanumeric (including commas and full stops) with wildcards: (a?cd, *x*y, q?h*z, j*,h-??)
and I have another string I wish to compare with each item in the list in turn. Any of the strings may contain spaces.
so what I want is something like
teststring="abcdx.rubb ish,y"
matchstrings=("a?cd" "*x*y" "q?h*z" "j*,h-??")
for i in "${matchstrings[@]}" ; do
if [[ "$i" == "$teststring" ]]; then # this test here is the problem
<do something>
else
<do something else>
fi
done
This should match on the second "matchstring" but not any others
Any help appreciated

Yes; you just have the two operands to == reversed; the glob goes on the right (and must not be quoted):
if [[ $teststring == $i ]]; then
Example:
$ i=f*
$ [[ foo == $i ]] && echo pattern match
pattern match
If you quote the parameter expansion, the operation is treated as a literal string comparison, not a pattern match.
$ [[ foo == "$i" ]] || echo "foo != f*"
foo != f*
Spaces in the pattern are not a problem:
$ i="foo b*"
$ [[ "foo bar" == $i ]] && echo pattern match
pattern match
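Applied to the loop from the question, the fix can be sketched as follows. The matched array is added here only to make the result easy to inspect; it is not part of the original script.

```bash
#!/usr/bin/env bash
teststring="abcdx.rubb ish,y"
matchstrings=("a?cd" "*x*y" "q?h*z" "j*,h-??")
matched=()   # illustrative only: collects the patterns that matched

for pattern in "${matchstrings[@]}"; do
  # String on the left, unquoted pattern on the right.
  if [[ $teststring == $pattern ]]; then
    matched+=("$pattern")
    printf 'match:    %s\n' "$pattern"
  else
    printf 'no match: %s\n' "$pattern"
  fi
done
```

Only *x*y is reported as a match, even though the test string contains a space.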

You can do this even completely within POSIX, since case alternatives undergo parameter substitution:
#!/bin/sh
teststring="abcdx.rubbish,y"
while IFS= read -r matchstring; do
case $teststring in
($matchstring) echo "$matchstring";;
esac
done << "EOF"
a?cd
*x*y
q?h*z
j*,h-??
EOF
This outputs only *x*y as desired.
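If the test is needed in several places, the same case technique can be wrapped in a small function. This is a minimal sketch; the name glob_match is made up for illustration.

```sh
#!/bin/sh
# glob_match STRING PATTERN
# Succeeds if STRING matches the shell pattern PATTERN.
# $2 is left unquoted inside the case pattern so that ? and * stay wildcards.
glob_match() {
  case $1 in
    ($2) return 0 ;;
    (*)  return 1 ;;
  esac
}

if glob_match "abcdx.rubbish,y" "*x*y"; then
  echo "matched"
fi
```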

Related

BASH regex syntax for replacing a sub-string

I'm working in bash and I want to remove a substring from a string, I use grep to detect the string and that works as I want, my if conditions are true, I can test them in other tools and they select exactly the string element I want.
When it comes to removing the element from the string I'm having difficulty.
I want to remove something like ": Series 1", where there could be different numbers including 0 padded, a lower case s or extra spaces.
temp='Testing: This is a test: Series 1'
echo "A. "$temp
if echo "$temp" | grep -q -i ":[ ]*[S|s]eries[ ]*[0-9]*" && [ "$temp" != "" ]; then
title=$temp
echo "B. "$title
temp=${title//:[ ]*[S|s]eries[ ]*[0-9]*/ }
echo "C. "$temp
fi
# I trim temp for spaces here
series_title=${temp// /_}
echo "D. "$series_title
The problem I have is that at points C & D
Give me:
C. Testing
D. Testing_
You can perform regex matching from bash alone without using external tools.
It's not clear what your requirement is. But from your code, I guess following will help.
temp='Testing: This is a test: Series 1'
# Following will do a regex match and extract necessary parts
# i.e. extract everything before `:` if the entire pattern is matched
[[ $temp =~ (.*):\ *[Ss]eries\ *[0-9]* ]] || { echo "regex match failed"; exit; }
# now you can use the extracted groups as follows
echo "${BASH_REMATCH[1]}" # Output = Testing: This is a test
As mentioned in the comments, if you need to extract parts both before and after the removed section,
temp='Testing: This is a test: Series 1 <keep this>'
[[ $temp =~ (.*):\ *[Ss]eries\ *[0-9]*\ *(.*) ]] || { echo "invalid"; exit; }
echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}" # Output = Testing: This is a test <keep this>
Keep in mind that [0-9]* will match zero lengths too. If you need to force that there need to be at least single digit, use [0-9]+ instead. Same goes for <space here>* (i.e. zero or more spaces) and others.
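For context, the ${title//pattern/ } attempt in the question fails because parameter expansion uses glob patterns, not regular expressions: there, * matches any string, so everything from the first ": " onwards is replaced. A sketch of the whole flow using =~ instead might look like this, assuming at least one digit is required and that the series marker sits at the end of the string. The variable regex is introduced here for readability.

```bash
#!/usr/bin/env bash
temp='Testing: This is a test: Series 1'

# Keeping the regex in a variable avoids having to escape spaces in [[ ]].
regex='^(.*):[[:space:]]*[Ss]eries[[:space:]]*[0-9]+[[:space:]]*$'

if [[ $temp =~ $regex ]]; then
  title=${BASH_REMATCH[1]}   # everything before the ": Series N" suffix
else
  title=$temp                # no series suffix, keep the string unchanged
fi

series_title=${title// /_}   # replace spaces with underscores
echo "$series_title"
```

With the sample input this prints Testing:_This_is_a_test.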

Counting filenames matching a regex in bash

I have the following script
setup=`ls ./test | egrep 'm-ha-.........js'`
regex="m-ha-(........)\.js"
if [[ "$setup" =~ $regex ]]
then
checksum=${BASH_REMATCH[1]}
fi
I noticed that if [[ "$setup" =~ $regex ]] returns the first file that matches the regex in BASH_REMATCH.
Is there a way to test how many files matches the regex? I want to return an error, if there are multiple files that matches the regex.
You don't need a regex, or ls, for this.
matches=(./test/m-ha-????????.js)
[[ ${#matches[*]} -gt 1 ]] && echo "More than one."
We expand the wildcard into an array and examine the number of elements in the array.
If you want to strip the prefix, ${matches[0]#./test/m-ha-} returns the first element with the prefix removed. The % operator similarly strips a suffix, e.g. ${matches[0]%.js}. You can't strip from both ends in a single expansion, but you can loop over the matches:
for match in "${matches[@]%.js}"; do
echo "${match#./test/m-ha-}"
done
Notice that the array won't be empty if there are no matches unless you explicitly set the nullglob option; without it, the unmatched pattern itself is kept as the single element.
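A minimal sketch of the counting step with nullglob enabled is shown below. The function name count_matches is hypothetical, and the function body runs in a subshell so that nullglob does not leak into the rest of the script.

```bash
#!/usr/bin/env bash
# count_matches DIR
# Prints how many files in DIR look like m-ha-<8 characters>.js.
count_matches() (
  shopt -s nullglob                    # an unmatched glob expands to nothing
  matches=("$1"/m-ha-????????.js)
  echo "${#matches[@]}"
)

n=$(count_matches ./test)
if (( n > 1 )); then
  echo "More than one." >&2
elif (( n == 0 )); then
  echo "No match." >&2
fi
```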

Case insensitive comparison in If condition

I have this CSV file and I need to count the number of rows that satisfy two conditions: the row's year is within a certain range, and the artist_name matches the name argument. The string matching should be case-insensitive. How do I achieve that in the if condition?
I am a beginner, so please bear with me
#!/bin/bash
file="$1"
artist="$2"
from_year="$(($3-1))"
to_year="$(($4+1))"
count=0
while IFS="," read arr1 arr2 arr3 arr4 arr5 arr6 arr7 arr8 arr9 arr10 arr11 ; do
if [[ $arr11 -gt $from_year ]] && [[ $arr11 -lt $to_year ]] && [[ $arr7 =~ $artist ]]; then
count=$((count+1))
fi
done < "$file"
echo $count
The $arr7 =~ $artist part is where I need to make the modification
Bash has a builtin method for converting strings to lower case. Once they are both lower case, you can compare them for equality. For example:
$ arr7="Rolling Stones"
$ artist="rolling stoneS"
$ [ "${arr7,,}" = "${artist,,}" ] && echo "Matches!"
Matches!
$ [[ ${arr7,,} =~ ${artist,,} ]] && echo "Matches!"
Matches!
Details
${parameter,,} converts all characters in a string to lower case. If you wanted to convert to upper case, use ${parameter^^}. If you want to convert just some of the characters, use ${parameter,,pattern} where only those characters matching pattern are changed. Still more details on this are documented by man bash:
${parameter^pattern}
${parameter^^pattern}
${parameter,pattern}
${parameter,,pattern}
Case modification. This expansion modifies the case of alphabetic characters in parameter. The pattern is expanded to produce a pattern just as in pathname expansion. The ^ operator converts lowercase letters matching pattern to uppercase; the , operator converts matching uppercase letters to lowercase. The ^^ and ,, expansions convert each matched character in the expanded value; the ^ and , expansions match and convert only the first character in the expanded value. If pattern is omitted, it is treated like a ?, which matches every character. If parameter is @ or *, the case modification operation is applied to each positional parameter in turn, and the expansion is the resultant list. If parameter is an array variable subscripted with @ or *, the case modification operation is applied to each member of the array in turn, and the expansion is the resultant list.
Compatibility
These case modification methods require bash version 4 (released on 2009-Feb-20) or better.
The bash case-transformations (${var,,} and ${var^^}) were introduced (some time ago) in bash version 4. However, if you are using Mac OS X, you most likely have bash v3.2 which doesn't implement case-transformation natively.
In that case, you can do lower-cased comparison less efficiently and with a lot more typing using tr:
if [[ $(tr "[:upper:]" "[:lower:]" <<<"$arr7") = $(tr "[:upper:]" "[:lower:]" <<<"$artist") ]]; then
# ...
fi
By the way, =~ does a regular expression comparison, not a string comparison. You almost certainly wanted =. Also, instead of [[ $x -lt $y ]] you can use an arithmetic compound command: (( x < y )). (In arithmetic expansions, it is not necessary to use $ to indicate variables.)
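Putting these suggestions together, the counting loop from the question could be sketched as below. It assumes bash 4 or later, that the artist name is the 7th column and the year the 11th (as in the question), and that rows with a non-numeric year (such as a header) should be skipped. The function name count_rows and the column variable names are made up for illustration.

```bash
#!/usr/bin/env bash
# count_rows FILE ARTIST FROM_YEAR TO_YEAR
# Counts rows whose year is in [FROM_YEAR, TO_YEAR] and whose artist
# equals ARTIST, ignoring case.
count_rows() {
  local file=$1 artist=$2 from=$3 to=$4 count=0
  local c1 c2 c3 c4 c5 c6 name c8 c9 c10 year
  while IFS=, read -r c1 c2 c3 c4 c5 c6 name c8 c9 c10 year; do
    [[ $year =~ ^[0-9]+$ ]] || continue   # skip header or malformed rows
    # Quoting the right-hand side forces a literal (non-pattern) comparison.
    if (( year >= from && year <= to )) && [[ ${name,,} == "${artist,,}" ]]; then
      count=$((count + 1))
    fi
  done < "$file"
  echo "$count"
}

# Usage: script.sh FILE ARTIST FROM_YEAR TO_YEAR
if (( $# == 4 )); then
  count_rows "$1" "$2" "$3" "$4"
fi
```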
Use shopt -s nocasematch
demo
#!/bin/bash
words=(Cat dog mouse cattle scatter)
#Print words from list that match pat
print_matches()
{
pat=$1
echo "Pattern to match is '$pat'"
for w in "${words[@]}"
do
[[ $w =~ $pat ]] && echo "$w"
done
echo
}
echo -e "Wordlist: (${words[@]})\n"
echo "Normal matching"
print_matches 'cat'
print_matches 'Cat'
echo -e "-------------------\n"
echo "Case-insensitive matching"
shopt -s nocasematch
print_matches 'cat'
print_matches 'CAT'
echo -e "-------------------\n"
echo "Back to normal matching"
shopt -u nocasematch
print_matches 'cat'
output
Wordlist: (Cat dog mouse cattle scatter)
Normal matching
Pattern to match is 'cat'
cattle
scatter
Pattern to match is 'Cat'
Cat
-------------------
Case-insensitive matching
Pattern to match is 'cat'
Cat
cattle
scatter
Pattern to match is 'CAT'
Cat
cattle
scatter
-------------------
Back to normal matching
Pattern to match is 'cat'
cattle
scatter
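Since nocasematch also affects == inside [[ ]], it can be used for a plain equality test as well. The sketch below scopes the option to a single comparison by running the function body in a subshell; the function name equals_ignore_case is hypothetical.

```bash
#!/usr/bin/env bash
# equals_ignore_case A B
# Succeeds if A and B are equal when case is ignored.
# The ( ) body runs in a subshell, so nocasematch is not left enabled.
equals_ignore_case() (
  shopt -s nocasematch
  [[ $1 == "$2" ]]    # quoted right-hand side: literal, not a pattern
)

if equals_ignore_case "Rolling Stones" "rolling stoneS"; then
  echo "Matches!"
fi
```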

How do I compare two strings in if condition in bash

s="STP=20"
if [[ "$x" == *"$s"* ]]
The if condition is always false; why?
Try this: http://tldp.org/LDP/abs/html/comparison-ops.html
string comparison
=
is equal to
if [ "$a" = "$b" ]
There is a difference in testing for equality between [ ... ] and [[ ... ]].
The [ ... ] is an alias to the test command:
STRING1 = STRING2 the strings are equal
However, when using [[ ... ]]
When the == and != operators are used, the string to the right of the operator is considered a pattern and matched according to the rules described below under Pattern Matching. If the shell option nocasematch is enabled, the match is performed without regard to the case of alphabetic characters. The return value is 0 if the string matches (==) or does not match (!=) the pattern, and 1 otherwise. Any part of the pattern may be quoted to force it to be matched as a string.
The same seems to be true with just the = sign:
$ foo=bar
$ if [[ $foo = *ar ]]
> then
> echo "These patterns match"
> else
> echo "These two strings aren't equal"
> fi
These patterns match
Note the difference:
$ foo=bar
> if [ $foo = *ar ]
> then
> echo "These patterns match"
> else
> echo "These two strings aren't equal"
> fi
These two strings aren't equal
However, there are a few traps with the [ $foo = *ar ] syntax. This is the same as:
test $foo = *ar
Which means the shell will interpolate glob expressions and variables before executing the statement. If $foo is empty, the command will become equivalent to:
test = *ar # or [ = *ar ]
Since the = now sits where test expects a unary operator, you'll get an error like:
bash: [: =: unary operator expected
Which means the [ was expecting a parameter found in the test manpage.
And, if I happen to have a file bar in my directory, the shell will replace *ar with all files that match that pattern (in this case bar), so the command will become:
[ $foo = bar ]
which IS true.
To get around the various issues with [ ... ], you should always put quotes around the parameters. This will prevent the shell from interpolating globs and will help with variables that have no values:
[ "$foo" = "*ar" ]
This will test whether the variable $foo is equal to the string *ar. It will work even if $foo is empty because the quotation marks will force an empty string comparison. The quotes around *ar will prevent the shell from interpolating the glob. This is a true equality.
Of course, it just so happens that if you use quotation marks when using [[ ... ]], you'll force a string match too:
foo=bar
if [[ $foo == "*ar" ]]
then
echo "These strings are equal"
else
echo "These strings don't match"
fi
So, in the end, if you want to test for string equality, you can use either [ ... ] or [[ ... ]], but you must quote your parameters. If you want to do glob pattern matching, you must leave off the quotes, and use [[ ... ]].
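The test from the question combines both rules: the unquoted * on each side are globs, while the quoted "$s" is matched literally. A small sketch follows; the value of x is invented for the example, since the question does not show how x is set, and the function name contains is hypothetical.

```bash
#!/usr/bin/env bash
# contains HAYSTACK NEEDLE
# Succeeds if NEEDLE occurs literally somewhere in HAYSTACK.
contains() {
  [[ $1 == *"$2"* ]]
}

s="STP=20"
x="mode=fast STP=20 retries=3"   # assumed sample value

if contains "$x" "$s"; then
  echo "x contains $s"
else
  echo "x does not contain $s"
fi
```

If x is empty or unset when the test runs, the condition is false, which would be consistent with the behaviour described in the question.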
To compare two strings in variables x and y for equality, use
if test "$x" = "$y"; then
printf '%s\n' "equal"
else
printf '%s\n' "not equal"
fi
To test whether x appears somewhere in y, use
case $y in
(*"$x"*)
printf '%s\n' "$y contains $x"
;;
(*)
printf '%s\n' "$y does not contain $x"
;;
esac
Note that these constructs are portable to any POSIX shell, not just bash. The [[ ]] construct for tests is not (yet) a standard shell feature.
I do not know where you came up with the *, but you were real close:
s="STP=20"
if [[ "STP=20" == "$s" ]]; then
echo "It worked!"
fi
You need to escape = using \ in the string s="STP=20"
s="STP\=20"
if [[ "STP\=20" == "$s" ]]; then echo Hi; else echo Bye; fi

BASH: Everything but not slash? IF STATEMENT (STRING COMPARISON)

I'm trying to match any string that starts with /John/ but does not contain a / after /John/
if
[ $string == /John/[!/]+ ]; then ....
fi
This is what I got and it doesn't seem to be working.
So I tried
if
[[ $string =~ ^/John/[!/]+$ ]]; then ....
fi
It still didn't work, and so I changed it to
if
[[ $string =~ /John/[^/] ]]; then ....
fi
It worked, but it also matches strings that have a / after /John/.
For bash you want [[ $string =~ ^/John/[^/]*$ ]] -- the ^ anchors the match at the start of the string, and the end-of-line anchor ensures there are no slashes after the last acceptable slash.
How about "the string starts with '/John/' and doesn't contain any slashes after '/John/'"?
[[ $string = /John/* && $string != /John/*/* ]]
Or you could compare against a parameter expansion that only expands if the conditions are met. This says "after stripping off everything including and after the last slash, the string is /John":
[[ ${string%/*} = /John ]]
In fact, this last solution is the only strictly POSIX one I can come up with without multiple test expressions:
[ "${string%/*}" = /John ]
By the way, your problem is probably simply that you are using double-equals inside a single-bracket test expression. bash actually does accept them inside double-bracket test expressions, but a single equals is a better idea.
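The glob pair and the parameter-expansion test from this answer can be compared side by side with a few sample paths. This is only a sketch; the function names and the sample paths are made up for illustration.

```bash
#!/usr/bin/env bash
# Starts with /John/ and has no further slash (two glob tests, bash only).
under_john_glob() {
  [[ $1 == /John/* && $1 != /John/*/* ]]
}

# Same check via suffix stripping; this form also works in a POSIX sh.
under_john_strip() {
  [ "${1%/*}" = /John ]
}

for s in /John/notes.txt /John/music/track.mp3 /Johnny/notes.txt; do
  if under_john_glob "$s" && under_john_strip "$s"; then
    echo "$s: match"
  else
    echo "$s: no match"
  fi
done
```

Only /John/notes.txt is reported as a match. Note that both checks also accept the bare string /John/, since nothing is required after the final slash.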
You can also use plain old grep:
string='/John Lennon/Yoko Ono'
if echo "$string" | grep -q "/John[^/]" ; then
echo "matched"
else
echo "no match found"
fi
This only fails if /John is at the very end of the string... if that's a possibility then you can tweak to handle that case, for instance:
string='/John Lennon/Yoko Ono'
if echo "$string" | grep -qP "(/John[^/])|(/John$)" ; then
echo "matched"
else
echo "no match found"
fi
Not sure what language you're using, but normal negative character classes are prefixed with a ^
e.g.
[^/]
You can also put in start/end qualifiers (clojure example, so Java's regex engine). Usually ^ at beginning and $ at end.
user => (re-matches #"^/[a-zA-Z]+[^/]$" "/John/")
nil
