How to check for strings between certain strings in unix scripting? - shell

String:
name#gmail.com
Checking for:
#
.com
My code
if [[ $word =~ "#" ]]
then
if [[ $word =~ ".com" || $word =~ ".ca" ]]
My problem
name#.com
The above example passes the check, which is not what I want. How do I check that there are one or more characters between "#" and ".com"?

You can use a very basic regex:
[[ $var =~ ^[a-z]+#[a-z]+\.[a-z]+$ ]]
It looks for a string being exactly like this:
at least one a-z char
#
at least one a-z char
.
at least one a-z char
It can get as complicated as you want; see, for example, "Email check regular expression with bash script".
See it in action:
$ var="a#b.com"
$ [[ $var =~ ^[a-z]+#[a-z]+\.[a-z]+$ ]] && echo "kind of valid email"
kind of valid email
$ var="a#.com"
$ [[ $var =~ ^[a-z]+#[a-z]+\.[a-z]+$ ]] && echo "kind of valid email"
$
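A minimal sketch of the same idea narrowed to the question's requirement (at least one character between "#" and a trailing ".com" or ".ca"). The function name has_text_between and the exact regex are assumptions made for illustration; they are not part of the answer above.

```bash
# Hypothetical helper: succeeds when at least one character sits
# between "#" and a trailing ".com" or ".ca".
has_text_between() {
    # Keeping the regex in a variable avoids quoting surprises inside [[ ]].
    local re='^[^#]+#[^#]+\.(com|ca)$'
    [[ $1 =~ $re ]]
}

for word in 'name#gmail.com' 'name#.com' 'name#site.ca'; do
    if has_text_between "$word"; then
        echo "$word: ok"
    else
        echo "$word: rejected"
    fi
done
```

Here name#.com is rejected because [^#]+ has to consume at least one character before the final \.com.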

Why not use another tool, such as Perl?
> echo "x#gmail.com" | perl -lne 'print $1 if(/#(.*?)\.com/)'
gmail
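For comparison, a similar extraction can be sketched in bash alone with the BASH_REMATCH array, which holds the capture groups of the last successful =~ match. The helper name extract_between is hypothetical. POSIX extended regexes have no lazy quantifier like Perl's .*?, so this sketch anchors ".com" to the end of the string and requires at least one character in between.

```bash
# Hypothetical helper: print the text between "#" and a trailing ".com".
extract_between() {
    local re='#(.+)\.com$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] is the first parenthesised group.
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

extract_between 'x#gmail.com'   # prints: gmail
```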

The glob pattern would be: [[ $word == ?*#?*.@(com|ca) ]]
? matches any single character and * matches zero or more characters, so ?* matches one or more characters.
@(p1|p2|p3|...) is an extended globbing pattern that matches exactly one of the given patterns. This requires:
shopt -s extglob
testing:
$ for word in '#.com' '#a.ca' 'a#.com' 'a#b.ca' 'a#b.org'; do
echo -ne "$word\t"
[[ $word == ?*#?*.@(com|ca) ]] && echo matches || echo does not match
done
#.com does not match
#a.ca does not match
a#.com does not match
a#b.ca matches
a#b.org does not match
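If enabling extglob is not an option, the same check can be sketched with a plain case statement, where | separates alternative patterns. The function name check_word is an assumption for illustration.

```bash
# Hypothetical helper using only portable case patterns (no extglob needed).
check_word() {
    case $1 in
        # ?* means "one or more characters"; the "#" is quoted for clarity.
        ?*'#'?*.com | ?*'#'?*.ca) echo "matches" ;;
        *) echo "does not match" ;;
    esac
}

for word in '#.com' 'a#.com' 'a#b.ca' 'a#b.org'; do
    printf '%s\t%s\n' "$word" "$(check_word "$word")"
done
```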

Related

how to check strings first char in bash

I want to check if a string's first char is uppercase, lowercase or anything else. I tried this code but I can't get to the last else although the first two conditions are false.
#!/bin/bash
echo "enter var: "
read var
if [[ {$var::1 =~ [A-Z] ]]
then
echo "UpperCase"
elif [[ {$var::1} =~ [a-z] ]]
then
echo "LowerCase"
else
echo "Digit or a symbol"
fi
exit
When I enter 1hello I get: "LowerCase"
What am I missing here?!
You don't necessarily need to extract the first character; you can compare the whole string to a pattern.
Here, I'm using the POSIX character classes [:upper:] and [:lower:] which I find more descriptive. They also handle non-ASCII letters.
Regex matching:
if [[ $var =~ ^[[:upper:]] ]]; then echo starts with an upper
elif [[ $var =~ ^[[:lower:]] ]]; then echo starts with a lower
else echo does not start with a letter
fi
With shell glob patterns (within [[ ... ]], the == operator does pattern matching, not just string equality):
if [[ $var == [[:upper:]]* ]]; then echo starts with an upper
elif [[ $var == [[:lower:]]* ]]; then echo starts with a lower
else echo does not start with a letter
fi
A case statement would work here as well
case "$var" in
[[:upper:]]*) echo starts with an upper ;;
[[:lower:]]*) echo starts with a lower ;;
*) echo does not start with a letter ;;
esac
Neither of your parameter expansions is correct. {$var::1 evaluates to {1hello::1, not 1, and {$var::1} likewise evaluates to {1hello::1}.
The expansion you want is ${var::1}, which does expand to 1 as intended.
You don't need a fancy parameter expansion anyway; you can match against the first character using regular expressions alone
[[ $var =~ ^[a-z] ]]
or pattern-matching
[[ $var = [a-z]* ]]
Regular expressions are not implicitly anchored, so you can use ^ to explicitly match the beginning of the string; the remainder of the string can be ignored.
Pattern matches are implicitly anchored to the start and end of the string, so you need * to match everything (if anything) that follows the initial character of the string.
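Putting the pieces together, here is a sketch of the original script with the expansion corrected to ${var::1}. Wrapping it in a function named classify_first_char and using [[:upper:]] / [[:lower:]] instead of [A-Z] / [a-z] are choices made for this example only.

```bash
# Hypothetical helper: classify the first character of its argument.
classify_first_char() {
    local var=$1
    local first=${var::1}    # first character, or empty if var is empty
    if [[ $first == [[:upper:]] ]]; then
        echo "UpperCase"
    elif [[ $first == [[:lower:]] ]]; then
        echo "LowerCase"
    else
        echo "Digit or a symbol"
    fi
}

classify_first_char "1hello"   # prints: Digit or a symbol
classify_first_char "Hello"    # prints: UpperCase
classify_first_char "hello"    # prints: LowerCase
```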

How can I check if a variable contains only letters

I tried to check the following case:
#!/bin/bash
line="abc"
if [[ "${line}" != [a-z] ]]; then
echo INVALID
fi
And I get INVALID as output. But why?
Doesn't it check whether $line contains only characters in the range [a-z]?
Use the regular expression matching operator =~:
#!/bin/bash
line="abc"
if [[ "${line}" =~ [^a-zA-Z] ]]; then
echo INVALID
fi
Works in any Bourne shell and wastes no pipes/forks:
case $var in
("") echo "empty";;
(*[!a-z]*) echo "contains a non-alphabetic";;
(*) echo "just alphabetics";;
esac
Use [!a-zA-Z] if you want to allow upper case as well.
You could try the following:
line="abc"
if echo "$line" | grep -i -q '^[a-z]*$'
then
echo "MATCHED."
else
echo "NOT-MATCHED."
fi
Pattern matches are anchored to the beginning and end of the string, so your code checks if $line is not a single lowercase character. You want to match an arbitrary sequence of lowercase characters, which you can do using extended patterns:
if [[ $line != +([a-z]) ]]; then
or using the regular-expression operator:
if ! [[ $line =~ ^[a-z]+$ ]]; then # there is no negative regex operator like Perl's !~
Why? Because != means "does not match". You are telling bash to compare abc with the pattern [a-z], which matches exactly one character, so the test is true.
Try echo "$line" | grep -i -q -x '[a-z]*'.
The flag -i makes grep case-insensitive.
The flag -x means match the whole line.
The flag -q means print nothing to stdout; grep just sets its exit status (0 on a match, 1 otherwise).
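The anchored-regex approach can be wrapped in a small function, sketched below. The name is_all_letters is hypothetical, and treating the empty string as invalid (by using + rather than *) is an assumption of this example; the grep variants above accept an empty line.

```bash
# Hypothetical helper: succeeds only for a non-empty string of letters.
is_all_letters() {
    local re='^[[:alpha:]]+$'
    [[ $1 =~ $re ]]
}

for line in "abc" "abc1" ""; do
    if is_all_letters "$line"; then
        echo "'$line': VALID"
    else
        echo "'$line': INVALID"
    fi
done
```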

what is if [[ ! $1 =~ ^# ]]; in unix

I am reading a shell script in Unix.
I have come upon the below line:
if [[ ! $1 =~ ^# ]];
I understand the part on the left side of the equal sign, but what does ~^# mean?
According to http://wiki.bash-hackers.org/syntax/ccmd/conditional_expression the =~ is:
<STRING> =~ <ERE>: <STRING> is checked against the extended regular expression <ERE>; TRUE on a match.
So ^# is an extended regular expression. Since # is not a special character in extended regex, it matches a literal #, and ^ anchors the match to the start of the string. The if therefore checks that the string in $1 does not start with #. On the command line:
$ if [[ ! '#' =~ ^# ]]; then echo matches; else echo no match; fi
no match
$ if [[ ! 'b' =~ ^# ]]; then echo matches; else echo no match; fi
matches
The =~ operator performs POSIX extended regular expression matching (regex).
The ^ is a special character that anchors the match to the beginning of the string.
In your case, ^# means a string beginning with #. Your condition is therefore true only for strings that do not begin with #.
In shell scripting, lines beginning with # are comments, that are not evaluated as commands by the shell.
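As an illustration of where such a test is typically useful, the sketch below filters comment lines out of its input. The function name skip_comment_lines is hypothetical and not taken from the script in the question.

```bash
# Hypothetical filter: copy stdin to stdout, dropping lines that start with "#".
skip_comment_lines() {
    local line
    # The "|| [[ -n $line ]]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ ! $line =~ ^# ]]; then
            printf '%s\n' "$line"
        fi
    done
}

printf '%s\n' '# a comment' 'echo hello' '  # indented' | skip_comment_lines
```

Only the first line is dropped: the indented one starts with a space, so ^# does not match it.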

How can you search for a char in a string in bash?

If I have a string var="root/Desktop", how can I determine whether var contains a '/' character?
Bash can match against regular expressions with =~. Try:
[[ $var =~ "/" ]] && echo "contains a slash"
The following would work
[[ "$var" = */* ]]
The portable solution that works in any Bourne-heritage shell and needs no expensive forks or pipes:
case $var in
(*/*) printf 'Has a slash.\n';;
(*) printf 'No slash.\n';;
esac
echo "${var}" | grep '/' should work.
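The case approach generalises to any character or substring. A minimal sketch, with the function name contains chosen for this example:

```bash
# Hypothetical helper: succeeds if the first argument contains the second.
contains() {
    case $1 in
        # Quoting "$2" makes it match literally, even if it holds * or ?.
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

var="root/Desktop"
if contains "$var" "/"; then
    echo "contains a slash"
fi
```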

Matching one of several possible characters in a string

In a bash (version 3.2.48) script I get a string that can be something like:
'XY'
' Y'
'YY'
etc
So, I have either an alphabetic character OR a space (first slot), then the relevant character (second slot). I tried some variation (without grep, sed, ...) like:
if [[ $string =~ ([[:space]]{1}|[[:alpha:]]{1})M ]]; then
and
if [[ $string =~ (\s{1}|.{1})M ]]; then
but my solutions did not always work correctly (matching correctly every combination).
This should work for you:
if [[ $string =~ [[:space:][:alpha:]]M ]]; then
echo Heureka
fi
Alternatively, check the second character directly:
if [[ ${string:1:1} == "M" ]]; then
echo Heureka
fi
or (if you want to do it with a regular expression)
if [[ $string =~ ([[:space:]]|[[:alpha:]])M ]]; then
echo Heureka
fi
or (even simpler)
if [[ $string == ?M ]]; then
echo Heureka
fi
Without using regular expressions, simple pattern matching is sufficient:
if [[ $string == [[:upper:]\ ]M ]]; then
echo match
fi
Given your example, you want [[:upper:]] rather than merely [[:alpha:]]
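A small sketch for trying the pattern against several inputs. The function name matches_slot is hypothetical; the pattern is stored in a variable and left unquoted on the right-hand side of == so that bash treats it as a pattern rather than a literal string.

```bash
# Hypothetical helper: first slot is an uppercase letter or a space,
# second slot is the letter M, and nothing else follows.
matches_slot() {
    local pat='[[:upper:] ]M'
    [[ $1 == $pat ]]    # unquoted $pat is used as a glob pattern
}

for string in 'XM' ' M' '1M' 'XY'; do
    if matches_slot "$string"; then
        echo "'$string': match"
    else
        echo "'$string': no match"
    fi
done
```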
