Bash check for http or https regex

I'm trying to check if a url starts with http|https and ends with jpg|png. I have searched, but the answers don't work for me.
I have this currently:
if [[ $url = ^https?://.*jpg ]]
then
wget -O webcam.jpg $url
fi
But it fails to wget. What am I doing wrong?

You have to use the =~ operator for regular-expression matching; = only does pattern matching. The equivalent pattern would be http?(s)://*jpg*. (Recent versions of bash always use extended patterns inside [[ ... ]]; older versions may require they be turned on explicitly with shopt -s extglob.)
(I added the trailing * to the pattern because patterns are anchored to both ends of the string by default, while regular expressions require ^ and $ explicitly. Since you did not have $ at the end of your regular expression, I made the pattern open at the end as well.)
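A minimal sketch of the =~ approach, assuming the goal is "starts with http:// or https:// and ends in .jpg or .png". The function name is_image_url and the sample URL are made up for illustration. The regex is kept in a variable so that the shell's own quoting rules don't interfere with it:

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds if $1 starts with http:// or https://
# and ends with .jpg or .png. Only POSIX ERE constructs are used.
is_image_url() {
  local re='^https?://.*\.(jpg|png)$'
  [[ $1 =~ $re ]]   # $re must be unquoted to be treated as a regex
}

url='https://example.com/webcam/latest.jpg'   # sample value, not from the question
if is_image_url "$url"; then
  echo "would download: $url"   # e.g. wget -O webcam.jpg "$url"
fi
```

Unlike the regex in the question, this one is anchored with $ and escapes the dot, so a URL such as https://example.com/notjpg.html is rejected.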

Related

How to match version of a command using bash "=~"?

luajit -v
LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/
I want a negated match on the version part, and I have LUAJIT_VERSION="2.1.0-beta3" defined at the beginning of my bash script. I use:
if ! [[ "$(luajit -v)" =~ LuaJIT\s+"$LUAJIT_VERSION".* ]]; then
#rest of code
But it doesn't seem to work, whether I put $LUAJIT_VERSION between "" or not:
Any part of the pattern may be quoted to force the quoted portion to be matched as a string ... If the pattern is stored in a shell variable, quoting the variable expansion forces the entire pattern to be matched as a string.
Bash docs
Can you tell me what's the correct way to do this task?

\s is not a recognized character class in bash; you need to use [[:blank:]] instead:
if ! [[ "$(luajit -v)" =~ LuaJIT[[:blank:]]+"$LUAJIT_VERSION" ]]; then
(The trailing .* isn't necessary, since regular expressions aren't anchored to the start or end of the string.)
However, it's not clear your regular expression needs to be that general. It looks like you can use a single, literal space
if ! [[ "$(luajit -v)" =~ LuaJIT\ "$LUAJIT_VERSION" ]];
or simply use pattern matching:
if [[ "$(luajit -v)" != LuaJIT\ "$LUAJIT_VERSION"* ]];
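A small sketch that puts the pieces above together. The helper name banner_has_version is hypothetical, and the banner is passed in as a string so that the example runs without luajit installed; in a real script it would come from "$(luajit -v)":

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds if the banner in $1 reports exactly the
# version in $2.
banner_has_version() {
  local banner=$1 version=$2
  # [[:blank:]]+ is the regex part. The quoted expansion is matched as a
  # literal string, so the dots in the version do not mean "any character".
  # The trailing group keeps 2.1.0-beta3 from matching 2.1.0-beta31.
  [[ $banner =~ ^LuaJIT[[:blank:]]+"$version"([[:blank:]]|$) ]]
}

banner='LuaJIT 2.1.0-beta3 -- Copyright (C) 2005-2017 Mike Pall. http://luajit.org/'
if ! banner_has_version "$banner" '2.1.0-beta3'; then
  echo "unexpected LuaJIT version" >&2
fi
```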

Regex word break not working in newer version of bash [duplicate]

I am trying to match on the presence of a word in a list before adding that word again (to avoid duplicates). I am using bash 4.2.24 and am trying the below:
[[ $foo =~ \bmyword\b ]]
also
[[ $foo =~ \<myword\> ]]
However, neither seems to work. They are mentioned in the bash docs example: http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_01.html.
I presume I am doing something wrong but I am not sure what.

tl;dr
To be safe, do not use a regex literal with =~.
Instead, use:
either: an auxiliary variable - see @Eduardo Ivancec's answer.
or: a command substitution that outputs a string literal - see @ruakh's comment on @Eduardo Ivancec's answer
Note that both must be used unquoted as the =~ RHS.
Whether \b and \< / \> are supported at all depends on the host platform, not Bash:
they DO work on Linux,
but NOT on BSD-based platforms such as macOS; there, use [[:<:]] and [[:>:]] instead, which, in the context of an unquoted regex literal, must be escaped as [[:\<:]] and [[:\>:]]; the following works as expected, but only on BSD/macOS:
[[ ' myword ' =~ [[:\<:]]myword[[:\>:]] ]] && echo YES # OK
The problem wouldn't arise - on any platform - if you limited your regex to the constructs in the POSIX ERE (extended regular expression) specification.
Unfortunately, POSIX EREs do not support word-boundary assertions, though you can emulate them - see the last section.
As on macOS, no \-prefixed constructs are supported, so that handy character-class shortcuts such as \s and \w aren't available either.
However, the upside is that such ERE-compliant regexes are then portable (they work on both Linux and macOS, for instance).
=~ is the rare case (the only case?) of a built-in Bash feature whose behavior is platform-dependent: It uses the regex libraries of the platform it is running on, resulting in different regex flavors on different platforms.
Thus, it is generally non-trivial and requires extra care to write portable code that uses the =~ operator.
Sticking with POSIX EREs is the only robust approach, which means that you have to work around their limitations - see bottom section.
If you want to know more, read on.
On Bash v3.2+ (unless the compat31 shopt option is set), the RHS (right-hand side operand) of the =~ operator must be unquoted in order to be recognized as a regex (if you quote the right operand, =~ performs regular string comparison instead).
More accurately, at least the special regex characters and sequences must be unquoted, so it's OK and useful to quote those substrings that should be taken literally; e.g., [[ '*' =~ ^'*' ]] matches, because ^ is unquoted and thus correctly recognized as the start-of-string anchor, whereas *, which is normally a special regex char, matches literally due to the quoting.
However, there appears to be a design limitation in (at least) bash 3.x that prevents use of \-prefixed regex constructs (e.g., \<, \>, \b, \s, \w, ...) in a literal =~ RHS; the limitation affects Linux, whereas BSD/macOS versions are not affected, due to fundamentally not supporting any \-prefixed regex constructs:
# Linux only:
# PROBLEM (see details further below):
# Seen by the regex engine as: <word>
# The shell eats the '\' before the regex engine sees them.
[[ ' word ' =~ \<word\> ]] && echo MATCHES # !! DOES NOT MATCH
# Causes syntax error, because the shell considers the < unquoted.
# If you used \\bword\\b, the regex engine would see that as-is.
[[ ' word ' =~ \\<word\\> ]] && echo MATCHES # !! BREAKS
# Using the usual quoting rules doesn't work either:
# Seen by the regex engine as: \\<word\\> instead of \<word\>
[[ ' word ' =~ \\\<word\\\> ]] && echo MATCHES # !! DOES NOT MATCH
# WORKAROUNDS
# Aux. variable.
re='\<word\>'; [[ ' word ' =~ $re ]] && echo MATCHES # OK
# Command substitution
[[ ' word ' =~ $(printf %s '\<word\>') ]] && echo MATCHES # OK
# Change option compat31, which then allows use of '...' as the RHS
# CAVEAT: Stays in effect until you reset it, may have other side effects.
# Using (...) around the command confines the effect to a subshell.
(shopt -s compat31; [[ ' word ' =~ '\<word\>' ]] && echo MATCHES) # OK
The problem:
Tip of the hat to Fólkvangr for his input.
A literal RHS of =~ is by design parsed differently than unquoted tokens as arguments, in an attempt to allow the user to focus on escaping characters just for the regex, without also having to worry about the usual shell escaping rules in unquoted tokens.
For instance,
[[ 'a[b' =~ a\[b ]] && echo MATCHES # OK
matches, because the \ is passed through to the regex engine (that is, the regex engine too sees literal a\[b), whereas if you used the same unquoted token as a regular argument, the usual shell expansions applied to unquoted tokens would "eat" the \, because it is interpreted as a shell escape character:
$ printf %s a\[b
a[b # '\' was removed by the shell.
However, in the context of =~ this exceptional passing through of \ is only applied before characters that are regex metacharacters by themselves, as defined by the ERE (extended regular expressions) POSIX specification (in order to escape them for the regex, so that they're treated as literals):
\ ^ $ [ { . ? * + ( ) |
Conversely, these regex metacharacters may exceptionally be used unquoted - and indeed must be left unquoted to have their special regex meaning - even though most of them normally require \-escaping in unquoted tokens to prevent the shell from interpreting them.
Yet, a subset of the shell metacharacters do still need escaping, for the shell's sake, so as not to break the syntax of the [[ ... ]] conditional:
& ; < > space
Since these characters aren't also regex metacharacters, there is no need to also support escaping them on the regex side, so that, for instance, the regex engine seeing \& in the RHS as just & works fine.
For any other character preceded by \, the shell removes the \ before sending the string to the regex engine (as it does during normal shell expansion), which is unfortunate, because then even characters that the shell doesn't consider special cannot be passed as \<char> to the regex engine, because the shell invariably passes them as just <char>.
E.g., \b is invariably seen as just b by the regex engine.
It is therefore currently impossible to use a (by definition non-POSIX) regex construct in the form \<char> (e.g., \<, \>, \b, \s, \w, \d, ...) in a literal, unquoted =~ RHS, because no form of escaping can ensure that these constructs are seen by the regex engine as such, after parsing by the shell:
Since neither <, >, nor b are regex metacharacters, the shell removes the \ from \<, \>, \b (as happens in regular shell expansion). Therefore, passing \<word\>, for instance, makes the regex engine see <word>, which is not the intent:
[[ '<word>' =~ \<word\> ]] && echo YES matches, because the regex engine sees <word>.
[[ 'boo' =~ ^\boo ]] && echo YES matches, because the regex engine sees ^boo.
Trying \\<word\\> breaks the command, because the shell treats each \\ as an escaped \, which means that metacharacter < is then considered unquoted, causing a syntax error:
[[ ' word ' =~ \\<word\\> ]] && echo YES causes a syntax error.
This wouldn't happen with \\b, but \\b is passed through (due to the \ preceding a regex metachar, \), which also doesn't work:
[[ '\boo' =~ ^\\boo ]] && echo YES matches, because the regex engine sees \\boo, which matches literal \boo.
Trying \\\<word\\\> - which by normal shell expansion rules results in \<word\> (try printf %s \\\<word\\\>) - also doesn't work:
What happens is that the shell eats the \ in \< (ditto for \b and other \-prefixed sequences), and then passes the preceding \\ through to the regex engine as-is (again, because \ is preserved before a regex metachar):
[[ ' \<word\> ' =~ \\\<word\\\> ]] && echo YES matches, because the regex engine sees \\<word\\>, which matches literal \<word\>.
In short:
Bash's parsing of =~ RHS literals was designed with single-character regex metacharacters in mind, and does not support multi-character constructs that start with \, such as \<.
Because POSIX EREs support no such constructs, =~ works as designed if you limit yourself to such regexes.
However, even within this constraint the design is somewhat awkward, due to the need to mix regex-related and shell-related \-escaping (quoting).
Fólkvangr found the official design rationale in the Bash FAQ here, which, however, neither addresses said awkwardness nor the lack of support for (invariably non-POSIX) \<char> regex constructs; it does mention using an aux. variable as a workaround, however, although only with respect to making it easier to represent whitespace.
All these parsing problems go away if the string that the regex engine should see is provided via a variable or via the output from a command substitution, as demonstrated above.
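To illustrate the variable approach with a regex that stays within POSIX EREs, here is a sketch with made-up sample strings. The regex contains a space, an escaped [ and an alternation, all of which would need careful shell escaping if written inline:

```bash
#!/usr/bin/env bash

# Inside single quotes the shell leaves every character alone, so the
# regex engine receives exactly this string.
re='^a\[b (yes|no)$'

for candidate in 'a[b yes' 'a[b no' 'a[b maybe'; do
  if [[ $candidate =~ $re ]]; then   # $re must stay unquoted
    echo "match:    $candidate"
  else
    echo "no match: $candidate"
  fi
done
```

Only the first two candidates should be reported as matches.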
Optional reading: A portable emulation of word-boundary assertions with POSIX-compliant EREs (extended regular expressions):
(^|[^[:alnum:]_]) instead of \< / [[:<:]]
([^[:alnum:]_]|$) instead of \> / [[:>:]]
Note: \b can't be emulated with a SINGLE expression - use the above in the appropriate places.
The potential caveat is that the above expressions will also capture the non-word character being matched, whereas true assertions such as \< / [[:<:]] and \> / [[:>:]] do not.
foo='myword'
[[ $foo =~ (^|[^[:alnum:]_])myword([^[:alnum:]_]|$) ]] && echo YES
The above matches, as expected.
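The emulation can be wrapped in a small function. This is only a sketch: the name contains_word is hypothetical, and it assumes the word being searched for contains no regex metacharacters.

```bash
#!/usr/bin/env bash

# Hypothetical helper: succeeds if word $2 occurs in string $1 as a whole
# word, using only POSIX ERE constructs.
contains_word() {
  local text=$1 word=$2
  # (^|[^[:alnum:]_]) stands in for \<, ([^[:alnum:]_]|$) for \>
  local re="(^|[^[:alnum:]_])${word}([^[:alnum:]_]|$)"
  [[ $text =~ $re ]]
}

contains_word 'foo myword bar' myword && echo 'whole word found'
contains_word 'foo mywords bar' myword || echo 'no whole-word match'
```

Because the regex is built in a variable, none of the inline-literal parsing issues described earlier apply.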

Yes, all the listed regex extensions are supported but you'll have better luck putting the pattern in a variable before using it. Try this:
re=\\bmyword\\b
[[ $foo =~ $re ]]
Digging around I found this question, whose answers seem to explain why the behaviour changes when the regex is written inline as in your example.
Editor's note: The linked question does not explain the OP's problem; it merely explains how starting with Bash version 3.2 regexes (or at least the special regex chars.) must by default be unquoted to be treated as such - which is exactly what the OP attempted.
However, the workarounds in this answer are effective.
You'll probably have to rewrite your tests so as to use a temporary variable for your regexes, or use the 3.1 compatibility mode:
shopt -s compat31

Not exactly "\b", but for me more readable (and portable) than the other suggestions:
[[ $foo =~ (^| )myword($| ) ]]

The accepted answer focuses on using auxiliary variables to deal with the syntax oddities of regular expressions in Bash's [[ ... ]] expressions. Very good info.
However, the real answer is:
\b \< and \> do not work on OS X 10.11.5 (El Capitan) with bash version 4.3.42(1)-release (x86_64-apple-darwin15.0.0).
Instead, use [[:<:]] and [[:>:]].

Tangential to your question, but if you can use grep -E (or egrep, its effective, but obsolescent alias) in your script:
if grep -q -E "\b${myword}\b" <<<"$foo"; then
I ended up using this after flailing with bash's =~.
Note that while regex constructs \<, \>, and \b are not POSIX-compliant, both the BSD (macOS) and GNU (Linux) implementations of grep -E support them, which makes this approach widely usable in practice.
Small caveat (not an issue in the case at hand): By not using =~, you lose the ability to inspect capturing subexpressions (capture groups) via ${BASH_REMATCH[@]} later.

I've used the following to match word boundaries on older systems. The key is to wrap $foo with spaces since [^[:alpha:]] will not match words at the beginning or end of the list.
[[ " $foo " =~ [^[:alpha:]]myword[^[:alpha:]] ]]
Tweak the character class as needed based on the expected contents of myword, otherwise this may not be a good solution.
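Applied to the original goal (adding a word to a list only if it isn't already there), the idea might look like the sketch below. The helper name add_unique is made up, and it assumes a space-separated list and a word without regex metacharacters:

```bash
#!/usr/bin/env bash

# Hypothetical helper: prints the list $1 with word $2 appended, unless the
# word is already present as a whole item.
add_unique() {
  local list=$1 word=$2
  local re="[^[:alpha:]]${word}[^[:alpha:]]"
  if [[ " $list " =~ $re ]]; then        # padding lets the edges match too
    printf '%s\n' "$list"                # already present: unchanged
  else
    printf '%s\n' "${list:+$list }$word" # append, with a separator if needed
  fi
}

foo='alpha myword beta'
foo=$(add_unique "$foo" myword)   # already present, list stays the same
foo=$(add_unique "$foo" gamma)    # not present, gets appended
echo "$foo"
```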

This worked for me
bar='\<myword\>'
[[ $foo =~ $bar ]]

You can use grep, which is more portable than bash's regex matching, like this (note the quotes around $foo):
if echo "$foo" | grep -q '\<myword\>'; then
echo "MATCH";
else
echo "NO MATCH";
fi

Can you match numbers using shell parameter expansion?

I'm trying to remove some numbers from end of a string using parameter expansion, like:
export ENV=dev12
echo ${ENV##[0-9]+}
But it doesn't work and I can't find anything on google on how to do this? Anyone know?

Parameter expansion uses glob syntax, not the more canonical regex syntax that grep and other tools use. You also have to use %% since ## is for prefixes.
There's no plain glob equivalent for what you want to do, but since you're using bash you can enable extglob and use +([0-9]):
shopt -s extglob
ENV=dev12
echo ${ENV%%+([0-9])}
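The same idea as a reusable function, as a sketch; the name strip_trailing_digits is hypothetical:

```bash
#!/usr/bin/env bash
shopt -s extglob   # enables +(...) in parameter-expansion patterns

# Hypothetical helper: prints $1 with any run of trailing digits removed.
strip_trailing_digits() {
  # %% removes the longest suffix matching the pattern;
  # +([0-9]) means "one or more digits".
  printf '%s\n' "${1%%+([0-9])}"
}

strip_trailing_digits dev12     # prints: dev
strip_trailing_digits staging   # prints: staging
```

Digits in the middle of the string are left alone, since the pattern only applies to the suffix.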

Using BASH regex you can do:
str='dev12'
[[ $str =~ ^(.*[^[:digit:]])[[:digit:]]+$ ]] && echo "${BASH_REMATCH[1]}"
dev

Trying to use wildcards in bash conditional statement/case mixed with exact alpha char and failing

Essentially, I'm testing a variable to ensure its contents match a specific time format: 4 digits, am/pm/AM/PM, no spaces (e.g. 1204pm). I've gotten this much to work:
tm0=1204pm; [[ $tm0 == [0-2###aApP]* ]] && echo PASS
or
tm0=1203pm; case $tm0 in [0-2###apAP]*) echo PASS; esac
But when I try to specify the last character as "m" (Originally I was trying for [Mm] but that didn't work either) it fails.
tm0=1204pm; [[ $tm0 == [0-2###aApP]m ]] && echo PASS
Any help, please and thanks.

Using globs:
[[ $tm0 == [01][0-9][0-5][0-9][aApP][mM] ]]
Note that this will validate, e.g., 1900pm. If you don't want that:
[[ $tm0 == @(0[0-9]|1[0-2])[0-5][0-9][aApP][mM] ]]
This uses extended globs. Note that you don't need shopt -s extglob to use extended globs inside [[ ... ]]: in section Condition Constructs, for the doc about [[ ... ]] you can read:
When the == and != operators are used, the string to the right of the operator is considered a pattern and matched according to the rules described below in Pattern Matching, as if the extglob shell option were enabled.
To use this feature in a case statement, you need to enable extglob.
Using regex:
[[ $tm0 =~ ^(0[0-9]|1[0-2])([0-5][0-9])([aApP][mM])$ ]]
With these groupings, you get the hour in BASH_REMATCH[1], the minutes in BASH_REMATCH[2] and the am/pm in BASH_REMATCH[3] (BASH_REMATCH[0] holds the entire match).
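A sketch of the regex variant wrapped in a function; the name parse_time and the output format are made up for illustration:

```bash
#!/usr/bin/env bash

# Hypothetical helper: validates a time such as 1204pm and prints its parts.
# Returns non-zero if the input does not match the format.
parse_time() {
  local re='^(0[0-9]|1[0-2])([0-5][0-9])([aApP][mM])$'
  [[ $1 =~ $re ]] || return 1
  # BASH_REMATCH[0] is the whole match; capture groups start at index 1.
  printf 'hour=%s minute=%s meridiem=%s\n' \
    "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
}

parse_time 1204pm                     # prints: hour=12 minute=04 meridiem=pm
parse_time 1300pm || echo 'invalid'   # prints: invalid
```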

bash patterns are not regular expressions. They are also not Java patterns, which I think is what you're trying to use (although it's not at all clear).
You can (and should) read the bash manual chapter on pattern matching, which is a complete list of pattern features. In that, you will see that:
[...] matches a single character which is one of the characters in the enclosed character class description
* matches any number of arbitrary characters
So [0-2###apAP]* matches one of the characters 0, 1, 2, #, a, p, A, or P, followed by any number of characters (including none).
What I think you are looking for is:
[01][0-9][0-5][0-9][aApP][mM]
although that is slightly generous since it will match, for example, 1300pm ("It was a bright cold day in April, and the clocks were striking thirteen.")

How to use [[ == ]] properly to match a glob?

Bash's manpage teaches that [[ == ]] matches patterns. In Bash therefore, why does the following not print matched?
Z=abc; [[ "$Z" == 'a*' ]] && echo 'matched'
The following however does indeed print matched:
Z=abc; [[ "$Z" == a* ]] && echo 'matched'
Isn't this exactly backward? Why does the a*, without the quotes, not immediately expand to list whatever filenames happen to begin with the letter a in the current directory? And besides, why doesn't the quoted 'a*' work in any case?

The glob pattern must not be quoted for it to work.
It also works when only the glob characters are left unquoted while the static text is still quoted:
[[ "$Z" == "a"* ]] && echo 'matched'
matched
[[ "$Z" == "ab"* ]] && echo 'matched'
matched
Explanation snippet from man page:
When the == and != operators are used, the string to the right of
the operator is considered a pattern and matched according to the
rules described below under Pattern Matching. If the shell option
nocasematch is enabled, the match is performed without regard to
the case of alphabetic characters. The return value is 0 if the
string matches (==) or does not match (!=) the pattern, and 1
otherwise. Any part of the pattern may be quoted to force it to be
matched as a string.
Additionally, one of the reasons to use [[ over [ is that [[ is a shell keyword rather than an ordinary built-in command like [, and thus can have its own syntax and doesn't need to follow the normal expansion rules (which is why the arguments to [[ aren't subject to word-splitting, for example).

While the existing answer is correct, I don't believe that it tells the full story.
Globs have two uses. There is a difference in behaviour between globs inside a [[ ]] construct, which test the contents of a variable against a pattern, and other globs, which expand to a list of files. In either case, if you put quotes around a character, it will be interpreted literally and not expanded.
It is also worth mentioning that the variable on the left hand side doesn't need to be quoted after the [[, so you could write your code like this:
Z=abc; [[ $Z == a* ]] && echo 'matched'
It is also possible to use a single = but the == looks more familiar to those coming from other coding backgrounds, so personally I prefer to use it in bash as well. As mentioned in the comments, the single = is the more widely compatible, as it is used to test string equality in all of POSIX-compliant shells, e.g. [ "$a" = "abc" ]. For this reason you may prefer to use it in bash as well.
As always, Greg's wiki contains some good information on the subject of pattern matching in bash.
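The three cases discussed in this thread, side by side, as a short sketch (the variable name prefix is only for illustration):

```bash
#!/usr/bin/env bash

Z=abc

# Unquoted: a* is a pattern ("a" followed by anything). No filename
# expansion happens on the right-hand side of == inside [[ ... ]].
[[ $Z == a* ]] && echo 'unquoted pattern: matched'

# Quoted: 'a*' is compared as the literal two-character string a*.
[[ $Z == 'a*' ]] || echo 'quoted pattern: not matched'

# Mixed: the literal part is quoted, the wildcard is left unquoted.
prefix='a'
[[ $Z == "$prefix"* ]] && echo 'quoted prefix + unquoted *: matched'
```

Quoting the variable part, as in the last test, is the usual way to keep wildcard characters inside $prefix from being treated as part of the pattern.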
