Match all email addresses belonging to a specific domain and its subdomains - ruby

I am looking to match all email addresses from a specific domain.
Any address at example.com or foo.example.com should match; everything else should be rejected. Basic string matching (checking whether the string ends with, or contains, example.com) would mostly work, but it would also let something like fooexample.com pass.
Based on these requirements, I started working on a pattern that accepts the domain and its subdomains, and came up with the following regex:
`/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.example.com\b/i`
This only matches subdomains. I have also seen the pattern in "How to match all email addresses at a specific domain using regex?", which handles the main domain.
Is there a way to combine these two into something that works for any address from example.com?

How about
/\b(?:(?![_.-])(?!.*[_.-]{2})[a-z0-9_.-]+(?<![_.-]))@(?:(?!-)(?!.*--)[a-z0-9-]+(?<!-)\.)*example\.com\b/i

This one would also match 'tagged' and 'tagged-subdomain' addresses like a+b@example.com and a+b@i.example.com:
(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@(?:(?!-)(?!.*--)[a-z0-9-]+(?<!-)\.)*example\.com\b
Hope it helps you

I'd recommend reading "Stop Validating Email Addresses With Your Complex Regex".
From that point, I'd look for:
/@.*\bexample\.com/
For instance:
%w[foo@example.com foo@barexample.com foo@subdomain.example.com].grep(/@.*\bexample\.com/)
# => ["foo@example.com", "foo@subdomain.example.com"]
It's too easy to end up with a regex that is a maintenance nightmare, and that doesn't accomplish what you need. I highly recommend keeping it simple.
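One caveat with a `\b`-based pattern: a hyphen also counts as a word boundary, and nothing anchors the end of the string, so hosts such as bad-example.com or example.com.evil.org would still get through. Below is a minimal sketch that compares the host part directly instead, assuming the target domain is example.com. The helper name `from_domain?` and the constant `EXAMPLE_COM` are made up for illustration.

```ruby
# Hypothetical helper: true when the address belongs to `domain` or one of its subdomains.
def from_domain?(address, domain = "example.com")
  # Split on the LAST "@" so that only the host part is compared.
  local, at, host = address.to_s.strip.downcase.rpartition("@")
  return false if at.empty? || local.empty? || host.empty?

  host == domain || host.end_with?(".#{domain}")
end

# Roughly equivalent anchored regex; \z pins the match to the end of the string.
EXAMPLE_COM = /@(?:[a-z0-9-]+\.)*example\.com\z/i

%w[foo@example.com foo@sub.example.com foo@fooexample.com
   foo@bad-example.com foo@example.com.evil.org].each do |addr|
  puts "#{addr}: #{from_domain?(addr)}"
end
```

Only the first two addresses print `true`. The string comparison keeps the rule readable, which fits the "keep it simple" advice; neither variant tries to validate the local part.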

Related

MS Access email validation rule fails

I have used the rule ((Like "*?@?*.?*") And (Not Like "*[ ,;]*")) in MS Access for email validation. It works fine, but when I type email@youdomain.com@@@hello, it is accepted as well, even with the extra @ signs. How can I solve this? The rule is taken from here.

You can't reliably validate e-mail addresses using an Access SQL statement, or a regex for that matter. See this for an example of a regex that still only works on prepared mail addresses; Access SQL is substantially more limited than regex for text pattern matching.
However, fixing this specific issue is easy:
Just add Not Like "*@*@*" to your statement to disallow multiple @ characters:
((Like "*?@?*.?*") And (Not Like "*[ ,;]*")) And Not Like "*@*@*"

Check if a regex is a subset of another or equal

I have a page where a user can add an IP address to a whitelist; the input is checked to make sure it is a valid IP.
I'd like to add functionality so that regexes can also be entered. I would like to verify that the regex only matches valid IP addresses (i.e. the regex entered by the user is a subset of the regex that is specified in the code).
IP_Regex: ^(?:[0-9]{1,3}\.){3}[0-9]{1,3}$
Example: A user must input a string matching the specifications of IP_Regex (such as 10.111.111.111) or a subset of it (such as 12(?>\.\d{1,3}){3})
I'm not sure how to go about this. Most posts seem to just cite math theory but don't mention how to go about this when programming.

I don't think it is dangerous to allow your users to input regexes, so you don't have to be 100% accurate.
Therefore I would randomly generate some slightly invalid IPs and make sure the regexes fail on those.
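A rough sketch of that idea in Ruby: it mutates valid addresses into slightly invalid ones and checks that a user-supplied regex rejects all of them. The mutation list and the helper names (`random_ips`, `invalid_variants`, `rejects_invalid_ips?`) are assumptions made for illustration. Passing the check only means that no counterexample was found; it does not prove that the regex is a subset of IP_Regex.

```ruby
# Reference pattern from the question, with \A and \z anchoring the whole string.
IP_REGEX = /\A(?:[0-9]{1,3}\.){3}[0-9]{1,3}\z/

# Generate `count` random dotted-quad addresses (seeded for repeatable runs).
def random_ips(count, seed: 42)
  rng = Random.new(seed)
  Array.new(count) { Array.new(4) { rng.rand(0..255) }.join(".") }
end

# Hypothetical set of mutations that turn a valid address into a slightly invalid one.
def invalid_variants(ip)
  octets = ip.split(".")
  [
    octets.first(3).join("."),                        # too few octets
    (octets + ["1"]).join("."),                       # too many octets
    ip.sub(".", ""),                                  # one dot missing
    ip.tr(".", ","),                                  # wrong separator
    "#{ip}x",                                         # trailing garbage
    ([octets[0], "1234"] + octets[2..-1]).join(".")   # octet with four digits
  ]
end

# True when the regex rejects every mutated address.
def rejects_invalid_ips?(user_regex, samples = random_ips(50))
  samples.all? do |ip|
    invalid_variants(ip).none? { |bad| user_regex.match?(bad) }
  end
end

puts rejects_invalid_ips?(/\A12(?>\.\d{1,3}){3}\z/)           # anchored pattern
puts rejects_invalid_ips?(/12(?>\.\d{1,3}){3}/, ["12.1.2.3"]) # unanchored pattern
```

The unanchored pattern fails because it still matches inside "12.1.2.3.1", which suggests requiring user patterns to be anchored before accepting them.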

Documentation for SpamAssassin rules (HTML_30_40)

I'd like to refine the password reset mails sent by my web application so that they are not mistaken for spam. A customer forwarded me a mail header that contains several SpamAssassin rule names.
Some of the rules I could find, e.g. BAYES_40, but others I couldn't find there; those are:
HTML_30_40
TO_NO_BRKTS_HTML_ONLY
TO_NO_BRKTS_NORDNS
TO_NO_BRKTS_NORDNS_HTML
What do these rules mean; are there documentation pages somewhere?
The SpamAssassin which reported them is version 3.3.2; the latest version as of now is 3.4.1. Do those rules still exist?

The HTML_30_40 rule is no longer included in SpamAssassin, but if I remember correctly it was a test that fired when 30-40% of the email consisted of HTML markup. I cannot see why that has any relevance for spam filtering, which is probably why it is no longer present. :)
Those other rules still exist in SpamAssassin version 3.4.1. There is no explicit documentation per rule, other than an occasional comment or description along the rule implementation itself:
describe TO_NO_BRKTS_HTML_ONLY To: misformatted and HTML only
describe TO_NO_BRKTS_NORDNS_HTML To: misformatted and no rDNS and HTML only
You are probably sending emails from an IP address with no reverse-DNS name, and the To: line is poorly formatted. Things should improve significantly if you get the DNS problems fixed (or relay the emails via your ISP) and format the To: line in the email properly, e.g.
To: "J Random User" <jrnd@email>

Trying to figure out spamassassin globbing rules

How do the globbing rules for SpamAssassin work? I've looked at the docs, but they are not clear on whether subdomains are included in a whitelist rule. For example, does:
whitelist_from *@somewhere.com
also whitelist addresses from subdomain.somewhere.com? This seems not to be the case, as mail from subdomains is still labeled as spam if it fails the checks.
Should I use something like this:
whitelist_from *@*.somewhere.com
I've added this for some addresses to find out, and it passes spamassassin --lint, but it may be a while before I get another email from one of those subdomains, so I thought I'd just ask here.
Thanks

I eventually found the answer. I can use the whitelist_from_rcvd directive instead.
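On the original glob question: if the patterns behave like ordinary file globs that must match the whole address (an assumption here, not something verified against SpamAssassin itself), the two rules cover different sets of senders. Ruby's `File.fnmatch` is used below purely as a stand-in for that glob matching, and `glob_match?` is a hypothetical helper.

```ruby
# Assumption: a whitelist pattern is a simple glob where * matches any run of
# characters and the whole address has to match. File.fnmatch approximates that.
def glob_match?(pattern, address)
  File.fnmatch(pattern, address, File::FNM_CASEFOLD)
end

puts glob_match?("*@somewhere.com", "user@somewhere.com")         # => true
puts glob_match?("*@somewhere.com", "user@sub.somewhere.com")     # => false
puts glob_match?("*@*.somewhere.com", "user@sub.somewhere.com")   # => true
puts glob_match?("*@*.somewhere.com", "user@somewhere.com")       # => false
```

Under that assumption, covering both the main domain and its subdomains takes both patterns.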

Check well formatted email address

I have a text file of e-mails like this:
10:info@example.com;dev@example.com
12:john@host.com; "George <g.top@host.com>"
43:jim.p@web.com.;sue-allen@web.com
...
I want to check whether the list contains well formatted entries. Do you know any tool or web-service to check and give me a list of invalid addresses?
Update: Dear all, thank you for your input. I was really looking for a basic syntax check, so I will go with Rafe's idea (I will do it in Java).

Read this so you are doing it the RFC compliant way:
http://www.eph.co.uk/resources/email-address-length-faq/

Probably the simplest way to validate an email address is to send a message to it. As Sean points out, this can leave you open to DoS attacks, but from your description it seems you have a text file rather than a web page, so this shouldn't be a problem.
Regular expressions are not a good tool for matching email addresses: there are a lot of valid addresses that naive matching will reject. Check out this comparison of attempts to validate emails with regex for details.
If you have to check them offline, I would split each address into parts (i.e. the part before the @ and the part after it); you could then create a custom validator (or regex) for each part.
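A minimal sketch of the split-and-validate idea, in Ruby. The two patterns are deliberately loose and are illustrative assumptions, not a full RFC grammar; `check_address`, `LOCAL_PART`, and `DOMAIN_LABEL` are hypothetical names.

```ruby
# Dot-separated atoms: no leading, trailing, or doubled dots in the local part.
LOCAL_PART = /\A[a-z0-9_%+-]+(?:\.[a-z0-9_%+-]+)*\z/i
# One DNS label: letters, digits, inner hyphens, at most 63 characters.
DOMAIN_LABEL = /\A[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\z/i

# Returns a list of problems; an empty list means the address passed both checks.
def check_address(address)
  local, at, domain = address.rpartition("@")
  return [:missing_at] if at.empty?

  problems = []
  problems << :bad_local_part unless LOCAL_PART.match?(local)

  # The -1 limit keeps trailing empty labels, so "web.com." is caught.
  labels = domain.split(".", -1)
  valid_domain = labels.size >= 2 && labels.all? { |label| DOMAIN_LABEL.match?(label) }
  problems << :bad_domain unless valid_domain
  problems
end

p check_address("info@example.com")   # => []
p check_address("jim.p@web.com.")     # => [:bad_domain]
```

Reporting which part failed makes it easier to review a list by hand than a single pass/fail regex would.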

Email validation is not as simple as a regular expression
First, I would read the article "I Knew How To Validate An Email Address Until I Read The RFC".
Back in the days of yore, you could just connect to the user's mail server and use the VRFY command and verify that an email address was valid, but spammers abused that privilege and we all lost out.
Now, I would recommend a three-part approach:
1. Verify the syntactic validity. You can use the monster regex from the Mail perl module to make sure that the email address is well formed. Then make sure to blacklist localhost domains/IPs as part of your check.
2. Verify that the domain is live. Do a DNS validation check on the domain. You could take this one step further and use an SMTP check to make sure that you can connect to a valid mail server for the domain. However, there may be some false negative results due to virtual hosting schemes.
3. Send an actual email, but include a single image that links to a script on your server. When the email is read with the image, your server will be notified that the image was downloaded and hence that the address is alive and valid. However, nowadays many email clients do not load images by default for this very reason, so it won't be 100% effective.
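The DNS part of the second step could be sketched in Ruby with the standard `resolv` library. This is only an illustration: the blocked-host list and the helper name `mail_domain_live?` are assumptions, and the fallback to an address record reflects the usual rule that a domain without MX records can still receive mail at its own address.

```ruby
require "resolv"

# Hosts that should never pass the check (illustrative list, extend as needed).
BLOCKED_HOSTS = %w[localhost localhost.localdomain].freeze

# True when the domain publishes MX records, or at least an A record as a fallback.
# `resolver` is injectable so the check can be exercised without network access.
def mail_domain_live?(domain, resolver: Resolv::DNS.new)
  host = domain.to_s.downcase
  return false if host.empty? || BLOCKED_HOSTS.include?(host)

  mx = resolver.getresources(host, Resolv::DNS::Resource::IN::MX)
  return true unless mx.empty?

  !resolver.getresources(host, Resolv::DNS::Resource::IN::A).empty?
rescue Resolv::ResolvError
  false
end
```

Calling `mail_domain_live?("example.com")` performs real DNS lookups. A result of `false` can also come from a temporary resolver problem, so it is safer to treat it as "unverified" than as proof that the address is dead.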
Resources
Validating Email Addresses in ASP (online)
Validating Email Addresses in PHP (code examples)
This commercial product does bulk email verification ← This is probably what you are looking for
SO Question: How to check if an email address exists without sending an email

I wrote a simple Perl script that uses the Email::Address module to validate these addresses:
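#!/usr/bin/env perl
use strict;
use warnings;
use Email::Address;

# Read lines such as "10:a@example.com;b@example.com" and print every entry
# that Email::Address cannot parse.
while (<>) {
    chomp;
    s/^\d+://;    # drop the leading record number shown in the sample file
    my @addresses = split /\s*;\s*/;
    foreach my $address (@addresses) {
        if (!Email::Address->parse($address)) {
            print $address, "\n";
        }
    }
}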
You'll just need to install the module. Its home page is:
http://emailproject.perl.org/wiki/Email::Address
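For comparison, here is a rough Ruby sketch of the same loop that needs no extra module. It assumes the "NN:addr;addr" record format from the question and uses `URI::MailTo::EMAIL_REGEXP` from Ruby's standard library, which is a fairly strict pattern for bare addresses rather than a full parser; `invalid_entries` is a hypothetical helper name.

```ruby
require "uri"

# Returns the entries of one "NN:addr;addr" line that fail a basic syntax check.
def invalid_entries(line)
  _id, _sep, list = line.chomp.partition(":")
  entries = list.split(";").map(&:strip).reject(&:empty?)
  entries.reject { |entry| URI::MailTo::EMAIL_REGEXP.match?(entry) }
end

sample = <<~TEXT
  10:info@example.com;dev@example.com
  12:john@host.com; "George <g.top@host.com>"
  43:jim.p@web.com.;sue-allen@web.com
TEXT

sample.each_line do |line|
  invalid_entries(line).each { |entry| puts entry }
end
```

On the sample this prints the quoted "George" entry and jim.p@web.com. with its trailing dot. Entries that carry a display name are flagged because the pattern only accepts bare addresses; a parser that understands display names would treat them differently.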

This problem is harder than it appears. When faced with it, I stole the code from the mf.c module in the NMH sources. I then imported the address parser into Lua so I could handle email addresses from scripts.
Using somebody else's code saved me a world of pain.
