Regex - Validate Email Domain and Full email - ruby

Here is the regex that I have:
\Ame\..*$
And I want it to match on:
me.com
me.ca
Bill#me.com
Bill.Smith#me.com
It also must not match on:
me.you#mean.com
me.you#foo
Currently it only matches the domain and not the full email.
I am using ruby for this.
I have been using http://rubular.com/ to try and solve this.

The following works, if I understand your requirements correctly:
\bme\.[^.#]*\z
Explanation:
\b # Match a word boundary (here, the start of a word)
me # Match "me"
\. # Match "."
[^.#]* # Match any run of characters that are neither "." nor "#"
\z # Match the end of the string
(I used \z here instead of the $ in my Rubular example, because $ also matches at the end of a line.)
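A quick way to check the pattern against the examples from the question is a short Ruby script. This is only a sketch: the constant names are made up for illustration, and the sample strings are copied exactly as they appear above.

```ruby
# Pattern from the answer above.
ME_DOMAIN = /\bme\.[^.#]*\z/

# Sample strings from the question.
SHOULD_MATCH     = ["me.com", "me.ca", "Bill#me.com", "Bill.Smith#me.com"]
SHOULD_NOT_MATCH = ["me.you#mean.com", "me.you#foo"]

(SHOULD_MATCH + SHOULD_NOT_MATCH).each do |s|
  puts "#{s.inspect}: #{ME_DOMAIN.match?(s)}"
end
```

The first four strings should print true and the last two false.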

Related

chef inspec output consists of error due to regex

When I execute the Chef InSpec command below, I get an error.
describe command ("cat sql.conf | grep 'log_filename'") do
its('stdout') {should match (/^'sql-(\d)+.log'/)}
end
The pattern is expected to match sql-20201212.log. Please check.
This regex /^'sql-(\d)+.log'/ doesn't match this string sql-20201212.log. You can try it out on https://regexr.com/
There are a few problems with your regex:
' is in your regex but not in your string
. matches any character except line breaks; if you want to match only a literal dot, you need to escape it: \.
you probably don't need to put \d in a group, i.e. wrap it in ()
So the regex ^sql-\d+\.log$ would match the string sql-20201212.log. I also added $ to anchor the match at the end of the line.
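The points above can be checked in plain Ruby, outside InSpec. This sketch tests the patterns against the bare filename only, not against the real output of the grep command (which is not shown in the question); the constant names are hypothetical.

```ruby
ORIGINAL  = /^'sql-(\d)+.log'/   # pattern from the question
CORRECTED = /^sql-\d+\.log$/     # pattern suggested above
FILENAME  = "sql-20201212.log"

puts ORIGINAL.match?(FILENAME)    # false: the quotes are not in the string
puts CORRECTED.match?(FILENAME)   # true

# An unescaped dot accepts any character in that position.
puts(/^sql-\d+.log$/.match?("sql-20201212xlog"))   # true
puts CORRECTED.match?("sql-20201212xlog")          # false
```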

Why am I not able to match multiple lines with this regex on rubular?

I'm working with the following regex (taken from the devise.rb file that devise generates):
\A[^#\s]+#[^#\s]+\z
Usually, when I'm learning about a regex I use rubular. For example, if I wanted to learn about the regex /.a./, I would set up my workspace as shown here:
Notice how I'm using multiple examples:
foo
bar
baz
And rubular is giving me feedback that both bar and baz match.
Now I'd like to learn about the regex that devise generates: /\A[^#\s]+#[^#\s]+\z/. So I set up my rubular workspace as shown here:
There isn't a match. It's because I have two examples:
foo#foo.com
cats#cat.com
But I was expecting them both to match. Why aren't both test strings matching?
This is because the regex /\A[^#\s]+#[^#\s]+\z/ anchors the match to the start of the string with \A and to the end of the string with \z. Rubular treats the whole test box as one string, so the newline between the two addresses (whitespace, which [^#\s] excludes) prevents a match.
If you remove both \A and \z and instead try to match /[^#\s]+#[^#\s]+/ then it will match both email addresses as shown here:
Also, it's worth mentioning that the start and end of a string are different from the start and end of a line. They are represented by four different anchors, listed below and in Rubular's Regex quick reference:
^ - Start of line
$ - End of line
\A - Start of string
\z - End of string
There can be multiple lines in a string; however, a single string goes from \A to \z. To continue with the two-email example: replacing the start and end of string anchors with the start and end of line anchors gives /^[^#\s]+#[^#\s]+$/, which also matches both addresses on Rubular.
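The same behaviour can be reproduced in irb. A minimal sketch, assuming the two test lines are joined into one string with a newline, as a multi-line test input would be; the constant names are made up for illustration.

```ruby
# Both test lines as one string.
TWO_EMAILS = "foo#foo.com\ncats#cat.com"

STRING_ANCHORED = /\A[^#\s]+#[^#\s]+\z/
LINE_ANCHORED   = /^[^#\s]+#[^#\s]+$/

p STRING_ANCHORED.match?(TWO_EMAILS)   # false
p TWO_EMAILS.scan(LINE_ANCHORED)       # ["foo#foo.com", "cats#cat.com"]

# Checking each line as its own string works with the string anchors too.
p TWO_EMAILS.lines.map { |line| STRING_ANCHORED.match?(line.chomp) }   # [true, true]
```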

Regex match anything except ending string

I'm trying to make a regex that matches anything except an exact ending string, in this case, the extension '.exe'.
Examples for a file named:
'foo' (no extension) I want to get 'foo'
'foo.bar' I want to get 'foo.bar'
'foo.exe.bar' I want to get 'foo.exe.bar'
'foo.exe1' I want to get 'foo.exe1'
'foo.bar.exe' I want to get 'foo.bar'
'foo.exe' I want to get 'foo'
So far I created the regex /.*\.(?!exe$)[^.]*/
but it doesn't work for cases 1 and 6.
You can use a positive lookahead.
^.+?(?=\.exe$|$)
^ start of string
.+? match one or more characters, non-greedily...
(?=\.exe$|$) ...until a literal .exe occurs at the end; otherwise, match up to the end.
See demo at Rubular.com
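To check the lookahead against all six cases from the question without Rubular, something like the following sketch can be used. The constant names are hypothetical; the expected values are the ones listed in the question.

```ruby
# Lookahead pattern from the answer above.
STEM = /^.+?(?=\.exe$|$)/

# Filename => expected result, taken from the question.
EXPECTED = {
  "foo"         => "foo",
  "foo.bar"     => "foo.bar",
  "foo.exe.bar" => "foo.exe.bar",
  "foo.exe1"    => "foo.exe1",
  "foo.bar.exe" => "foo.bar",
  "foo.exe"     => "foo"
}

EXPECTED.each do |name, want|
  got = name[STEM]   # String#[] with a regex returns the matched text
  puts "#{name} -> #{got} (#{got == want ? 'ok' : 'MISMATCH'})"
end
```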
Wouldn't a simple replacement work?
string.sub(/\.exe\z/, "")
Do you mean regex matching or capturing?
There may be a regex only answer, but it currently eludes me. Based on your test data and what you want to match, doing something like the following would cover both what you want to match and capture:
name = 'foo.bar.exe'
match = /(.*).exe$/.match(name)
if match == nil
# then this filename matches your conditions
print name
else
# otherwise match[1] is the capture - filename without .exe extension
print match[1]
end
string pattern = @" (?x) (.* (?= \.exe$ )) | ((?=.*\.exe).*)";
First match is a positive look-ahead that checks if your string
ends with .exe. The condition is not included in the match.
Second match is a positive look-ahead with the condition included in the
match. It only checks if you have something followed by .exe.
(?x) means that white spaces inside the pattern string are ignored.
Or don't use (?x) and just delete all white spaces.
It works for all the 6 scenarios provided.

Ruby advanced gsub

I've got a string like this one below:
My first LINK
and my second LINK
How do I substitute all the links in this string from href="URL" to href="/redirect?url=URL" so that it becomes
My first LINK
and my second LINK
Thanks!
Given your case, we can construct the following regex:
re = /
href= # Match attribute we are looking for
[\'"]? # Optionally match opening single or double quote
\K # Drop everything matched so far from the result, as we don't need to replace it
([^\'" >]+) # Capture group of characters except quotes, space and close bracket
/x
Now you can replace the captured group with the string you need (use \1 to refer to the group):
str.gsub(re, '/redirect?url=\1')
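Put together, the approach can be tried on a small sample. The markup of the original string is not shown above, so the input below is a hypothetical stand-in with placeholder example.com URLs; the constant names are also made up for illustration.

```ruby
# Regex from the answer above.
HREF_URL = /
  href=          # Match the attribute we are looking for
  [\'"]?         # Optionally match an opening single or double quote
  \K             # Drop what was matched so far from the result
  ([^\'" >]+)    # Capture the URL: anything except quotes, space and >
/x

# Hypothetical input; the URLs are placeholders, not from the question.
HTML = %q(<a href="http://example.com/a">LINK</a> and <a href='http://example.com/b'>LINK</a>)

# Because of \K only the URL is replaced; href= and the quote stay in place.
REWRITTEN = HTML.gsub(HREF_URL, '/redirect?url=\1')
puts REWRITTEN
```

The replacement string is single-quoted so that \1 reaches gsub as a backreference rather than being interpreted as a string escape.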
gsub allows you to match regex patterns and use captured substrings in the substitution:
x = <<-EOS
My first LINK
and my second LINK
EOS
x.gsub(/href="([^"]*)"/, 'href="/redirect?url=\1"') # the \1 refers to the stuff captured
# by the ([^"]*); [^"]* stops at the closing quote, unlike a greedy .*

Difference between \A \z and ^ $ in Ruby regular expressions

In the documentation I read:
Use \A and \z to match the start and end of the string, ^ and $ match the start/end of a line.
I am going to apply a regular expression to check username (or e-mail is the same) submitted by user. Which expression should I use with validates_format_of in model? I can't understand the difference: I've always used ^ and $ ...
If you're depending on the regular expression for validation, you always want to use \A and \z. ^ and $ only match up to a newline character, which means someone could submit an email like me#example.com\n<script>dangerous_stuff();</script> and still have it validate, since the regex only needs the text before the \n to match.
My recommendation would just be completely stripping new lines from a username or email beforehand, since there's pretty much no legitimate reason for one. Then you can safely use EITHER \A \z or ^ $.
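The difference is easy to see with the payload from the answer above. This is a sketch with a deliberately simplified email pattern (borrowed from the devise question earlier on this page, with # as written there), not a recommended validator; the constant names are hypothetical.

```ruby
LOOSE_EMAIL  = /^[^#\s]+#[^#\s]+$/     # line anchors
STRICT_EMAIL = /\A[^#\s]+#[^#\s]+\z/   # string anchors

PAYLOAD = "me#example.com\n<script>dangerous_stuff();</script>"

p LOOSE_EMAIL.match?(PAYLOAD)             # true: the first line alone satisfies the pattern
p STRICT_EMAIL.match?(PAYLOAD)            # false: the newline and script are part of the string
p STRICT_EMAIL.match?("me#example.com")   # true
```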
According to Pickaxe:
^ - Matches the beginning of a line.
$ - Matches the end of a line.
\A - Matches the beginning of the string.
\z - Matches the end of the string.
\Z - Matches the end of the string unless the string ends with a "\n", in which case it matches just before the "\n".
So, use \A and lowercase \z. If you use \Z someone could sneak in a newline character. This is not dangerous I think, but might screw up algorithms that assume that there's no whitespace in the string. Depending on your regex and string-length constraints someone could use an invisible name with just a newline character.
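A short sketch of the \Z versus \z difference described above (the constant names are made up for illustration):

```ruby
UPPER_Z = /\Afoo\Z/
LOWER_Z = /\Afoo\z/

p UPPER_Z.match?("foo\n")   # true: \Z also matches just before a final newline
p LOWER_Z.match?("foo\n")   # false: \z only matches at the very end
p UPPER_Z.match?("foo")     # true
p LOWER_Z.match?("foo")     # true
```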
JavaScript's regex implementation treats \A as a literal 'A'. So watch yourself out there and test.
Difference By Example
/^foo$/ matches each of the following three strings; /\Afoo\z/ matches none of them:
"whatever1\nfoo\nwhatever2"
"foo\nwhatever2"
"whatever1\nfoo"
/^foo$/ and /\Afoo\z/ both match the following:
foo
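The examples above can be run directly. This sketch assumes the multi-line samples are three separate strings (whatever1/foo/whatever2, foo/whatever2 and whatever1/foo); the constant names are hypothetical.

```ruby
LINE_FOO   = /^foo$/
STRING_FOO = /\Afoo\z/

# Assumed grouping of the multi-line samples into three strings.
MULTILINE_SAMPLES = ["whatever1\nfoo\nwhatever2", "foo\nwhatever2", "whatever1\nfoo"]

(MULTILINE_SAMPLES + ["foo"]).each do |s|
  puts "#{s.inspect}: ^$ => #{LINE_FOO.match?(s)}, \\A\\z => #{STRING_FOO.match?(s)}"
end
```

Only the last string, the bare foo, should report true for both patterns.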
The start and end of a string may not necessarily be the same thing as the start and end of a line. Imagine if you used the following as your test string:
my
name
is
Andrew
Notice that the string has many lines in it - the ^ and $ characters allow you to match the beginning and end of those lines (basically treating the \n character as a delimiter) while \A and \z allow you to match the beginning and end of the entire string.
