Check for well-formed email addresses - validation

I have a text file of e-mails like this:
10:info@example.com;dev@example.com
12:john@host.com; "George <g.top@host.com>"
43:jim.p@web.com.;sue-allen@web.com
...
I want to check whether the list contains only well-formed entries. Do you know of any tool or web service that can check them and give me a list of the invalid addresses?
Update: Thank you all for your input. I was really looking for a basic syntax check, so I will go with Rafe's idea (implemented in Java).

Read this so that you do it the RFC-compliant way:
http://www.eph.co.uk/resources/email-address-length-faq/

Probably the simplest way to validate an email address is to send a message to it. As Sean points out, this can leave you open to DoS attacks, but from your description you have a text file rather than a web page, so this shouldn't be a problem.
Regular expressions are not a good tool for matching email addresses: there are a lot of valid addresses that naive matching will reject. Check out this comparison of attempts to validate emails with regex for details.
If you have to check them offline, I would split each address into its two parts (the local part before the @ and the domain after it); you could then create a custom validator (or regex) for each part.
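A minimal JavaScript sketch of that split-and-check idea, applied to the line format from the question. The function names (isWellFormed, invalidEntries) and the per-part rules are assumptions for illustration: they accept only the common dot-atom form of RFC 5322, so quoted local parts and display-name forms are reported as invalid.

```javascript
// Basic syntax check: split at the last "@" and validate each part on its own.
// Assumption: only the dot-atom form of RFC 5322 is accepted; quoted local parts
// ("john smith"@host.com) and display-name forms are reported as invalid.
const LOCAL_ATOM = /^[A-Za-z0-9!#$%&'*+\/=?^_`{|}~-]+$/;
const DOMAIN_LABEL = /^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$/;

function isWellFormed(address) {
  const at = address.lastIndexOf("@");
  if (at <= 0 || at === address.length - 1) return false; // need text on both sides

  const local = address.slice(0, at);
  const domain = address.slice(at + 1);

  // Local part: non-empty atoms separated by single dots, at most 64 characters.
  const localOk =
    local.length <= 64 && local.split(".").every((atom) => LOCAL_ATOM.test(atom));

  // Domain: at least two labels of 1-63 characters, no leading/trailing hyphen.
  const labels = domain.split(".");
  const domainOk =
    domain.length <= 253 &&
    labels.length >= 2 &&
    labels.every((label) => DOMAIN_LABEL.test(label));

  return localOk && domainOk;
}

// Hypothetical helper for the question's file format: "NN:addr;addr;...".
// Returns the entries on one line that fail the check.
function invalidEntries(line) {
  return line
    .replace(/^\d+:/, "") // drop the leading "NN:" prefix
    .split(";")
    .map((entry) => entry.trim())
    .filter((entry) => entry !== "" && !isWellFormed(entry));
}
```

On the three sample lines from the question, this flags "George <g.top@host.com>" (a display-name form rather than a bare address) and jim.p@web.com. (trailing dot in the domain).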

Email validation is not as simple as a regular expression
First, I would read this article: I Knew How To Validate An Email Address Until I Read The RFC.
Back in the days of yore, you could just connect to the user's mail server and use the VRFY command to verify that an email address was valid, but spammers abused that privilege and we all lost out.
Now, I would recommend a three-part approach:
Verify the syntactic validity. You can use the monster regex from the Mail Perl module to make sure that the email address is well formed. Then make sure to blacklist localhost domains/IPs as part of your check.
Verify that the domain is live. Do a DNS validation check on the domain. You could take this one step further and use an SMTP check to make sure that you can connect to a valid mail server for the domain. However, there may be some false negatives due to virtual hosting schemes.
Send an actual email that includes a single image linking to a script on your server. When the email is read and the image is loaded, your server is notified that the image was downloaded, and hence that the address is live and valid. However, many email clients nowadays do not load images by default for this very reason, so this won't be 100% effective.
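Steps 1 and 2 can be sketched in Node.js as follows. The blocked-domain list is a hypothetical, non-exhaustive example, and the liveness test only asks DNS for MX records through dns.promises.resolveMx; it does not attempt the SMTP connection described above. RFC 5321 also allows delivery to a domain's A record when it has no MX, so a missing MX is a strong hint rather than proof.

```javascript
const { resolveMx } = require("dns").promises;

// Step 1 (partial): refuse domains that point back at the local machine.
// Hypothetical, non-exhaustive blacklist for illustration only.
const BLOCKED_DOMAINS = new Set(["localhost", "localhost.localdomain", "[127.0.0.1]"]);

function isBlockedDomain(domain) {
  const d = domain.toLowerCase().replace(/\.$/, ""); // ignore a trailing root dot
  return BLOCKED_DOMAINS.has(d) || d.endsWith(".localhost");
}

// Step 2: treat the domain as live if DNS returns at least one usable MX record.
// A single MX of "." (RFC 7505 "null MX") means the domain accepts no mail.
// No SMTP connection is made; any DNS error counts as "not live".
async function hasMailExchanger(domain) {
  try {
    const records = await resolveMx(domain);
    return records.some((r) => r.exchange !== "" && r.exchange !== ".");
  } catch {
    return false;
  }
}
```

hasMailExchanger needs network access and returns a promise, so call it as hasMailExchanger(domain).then(...) for each domain that passed step 1.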
Resources
Validating Email Addresses in ASP (online)
Validating Email Addresses in PHP (code examples)
This commercial product does bulk email verification ← This is probably what you are looking for
SO Question: How to check if an email address exists without sending an email

I wrote a simple Perl script that uses the Email::Address module to validate these addresses:
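#!/usr/bin/env perl
use strict;
use warnings;
use Email::Address;

# Reads the file(s) named on the command line (or STDIN) and prints
# every entry that Email::Address cannot parse.
while (my $line = <>) {
    chomp $line;
    $line =~ s/^\d+://;                  # drop the leading "NN:" line prefix
    my @addresses = split /;/, $line;
    foreach my $address (@addresses) {
        $address =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
        next if $address eq '';
        my @parsed = Email::Address->parse($address);
        print $address, "\n" unless @parsed;
    }
}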
You'll just need to install the module. Its home page is:
http://emailproject.perl.org/wiki/Email::Address

This problem is harder than it appears. When faced with it, I stole the code from the mf.c module in the NMH sources. I then imported the address parser into Lua so I could handle email addresses from scripts.
Using somebody else's code saved me a world of pain.

Related

MS Access email validation rule fails

I have used this rule ((Like "*?@?*.?*") And (Not Like "*[ ,;]*")) in MS Access for email validation and it works fine, but when I type email@youdomain.com@@@hello it is also accepted, despite the extra @ signs. How can I solve this? The rule is taken from here.
You can't reliably validate e-mail addresses using an Access SQL statement, or a regex for that matter (see this for an example of a regex that still only works on prepared mail addresses), and Access SQL is substantially more limited than regex for text pattern matching.
However, fixing this specific issue is easy:
Just add Not Like "*@*@*" to your rule to disallow multiple @ characters:
((Like "*?@?*.?*") And (Not Like "*[ ,;]*")) And Not Like "*@*@*"
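To see what the corrected rule accepts, here is a JavaScript translation of it for experimenting outside Access (the function name is hypothetical). In Access LIKE patterns, * matches any run of characters, ? exactly one character, and [ ,;] one of the listed characters; # would match a single digit, which is why the @ has to appear literally.

```javascript
// JavaScript translation of the Access validation rule, one test per clause.
function passesAccessRule(value) {
  const shape = /^.+@.+\..+$/s.test(value); // Like "*?@?*.?*"
  const noSeparators = !/[ ,;]/.test(value); // Not Like "*[ ,;]*"
  const singleAt = !/@.*@/s.test(value); // Not Like "*@*@*" (the added clause)
  return shape && noSeparators && singleAt;
}
```

The first two clauses alone accept email@youdomain.com@@@hello; only the third clause rejects it.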

Security code generation algorithm

Alright, here's the story:
I'm getting married soon, and I'd like to create a website (or an app).
Obviously, I'd like only guests to be able to access it.
So I was thinking about a system that requires a security code to sign up.
The problem is that I don't trust everyone to keep the code to themselves, so I was thinking about giving a different code to every couple (or family) of invited people.
On the sign-up form, I would then verify that the entered code has not already been used.
But since I don't know who will sign up to the app, and I don't really have time to register each guest manually, I won't have a database recording which code was given to whom.
So I need an algorithm to generate a random security code, and the reverse one, to check whether a given string is a valid security code.
I need the algorithm to be complex enough that people cannot guess the magic behind the code they received. (I know, it feels pretty paranoid.)
The generated security code should be fairly simple, like 6 to 8 characters (a mix of digits and upper- and lower-case letters).
The main issue is that I have no clue how to build a reliable system to generate and validate security codes.
I feel like I should have a secret key stored on the server side, which would be needed to generate a code, and which I would have to recover when checking whether a given string is a valid code.
Let's say secret is my private key.
The generation algorithm would be something like secret + whatever = generated code (where the + whatever operation remains to be defined).
But then how could I check a given string? string - whatever =? secret would be the solution (where - whatever is the reverse operation of + whatever).
Well, I actually have no clue what whatever could (or should) be.
Do you have any advice or guidance?
For the technical part, I will probably code this in JS (with a NodeJS server).
But as I'm talking about the concept of security code generation, any pseudo-code will do the job.
Generate a hash of the person's email address (upper-cased) and use the first n characters as the code. For example, if the email address is TOUPYE@GMAIL.COM, the SHA-256 hash would be 038122aedbf777b8c7c3aaed14ae7c08249a9d47f82f4455a0d667cacc57d383, so the code would be "038122". Generate a code this way for each person/family. If someone has no email address, use their telephone number; if they have no telephone, use their postal address.
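A minimal Node.js sketch of that scheme, using the built-in crypto module. The function names and the whitespace trimming are assumptions added for illustration; the hash, upper-casing and truncation follow the answer.

```javascript
const { createHash } = require("crypto");

// SHA-256 of the upper-cased identifier (email address, else phone number,
// else postal address), keeping the first `length` hex characters.
// Trimming surrounding whitespace is an added assumption.
function inviteCode(identifier, length = 6) {
  const normalized = identifier.trim().toUpperCase();
  return createHash("sha256").update(normalized, "utf8").digest("hex").slice(0, length);
}

// Generate the codes once for the whole guest list; a submitted code is
// accepted if it is in this set (and, per the question, not used yet).
function buildCodeSet(identifiers, length = 6) {
  return new Set(identifiers.map((id) => inviteCode(id, length)));
}
```

Because a plain hash involves no secret, anyone who learns the scheme can compute a valid code for any guest's address. Feeding the server-side secret from the question into crypto.createHmac("sha256", secret) instead of createHash would close that gap while keeping the same truncation. Note also that the codes are hex only (digits and a-f), not the mixed-case alphabet the question asked for.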

Documentation for SpamAssassin rules (HTML_30_40)

I'd like to refine the password-reset emails sent by my web application so that they aren't mistaken for spam; a customer forwarded me a mail header that contains several SpamAssassin rule names.
I could find some of the rules, e.g. BAYES_40, but not these:
HTML_30_40
TO_NO_BRKTS_HTML_ONLY
TO_NO_BRKTS_NORDNS
TO_NO_BRKTS_NORDNS_HTML
What do these rules mean; are there documentation pages somewhere?
The SpamAssassin which reported them is version 3.3.2; the latest version as of now is 3.4.1. Do those rules still exist?
The HTML_30_40 rule is no longer included in SpamAssassin, but if I remember correctly it was a test that concluded the email consisted of 30-40% HTML markup. Why that has any relevance for spam filtering I cannot see, and that is probably why it is no longer present. :)
Those other rules still exist in SpamAssassin 3.4.1. There is no explicit documentation per rule, other than an occasional comment or description alongside the rule definition itself:
describe TO_NO_BRKTS_HTML_ONLY To: misformatted and HTML only
describe TO_NO_BRKTS_NORDNS_HTML To: misformatted and no rDNS and HTML only
You are probably sending emails from an IP address with no reverse-DNS name, and the To: line is poorly formatted. Things should improve significantly if you fix the DNS problem (or relay the emails via your ISP) and format the To: line in the email properly, e.g.
To: "J Random User" <jrnd@email>
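If the header is built in application code, a small helper keeps the To: line in that quoted form. This is a hedged sketch (the function name is hypothetical): it escapes backslashes and double quotes as an RFC 5322 quoted-string requires and rejects CR/LF to prevent header injection, but it does not perform the RFC 2047 encoding that non-ASCII display names would need.

```javascript
// Builds a To: header in the quoted "Display Name" <address> form shown above.
function formatToHeader(displayName, address) {
  if (/[\r\n]/.test(displayName + address)) {
    throw new Error("CR/LF is not allowed in a header value"); // blocks header injection
  }
  // Escape backslashes and double quotes inside the quoted display name.
  const quoted = displayName.replace(/[\\"]/g, (ch) => "\\" + ch);
  return `To: "${quoted}" <${address}>`;
}
```

For example, formatToHeader("J Random User", "jrnd@example.com") yields To: "J Random User" <jrnd@example.com>.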

Match all email addresses belonging to a specific domain and its subdomains

I am looking to match all email addresses from a specific domain.
Any email coming from example.com or foo.example.com should match; everything else should be rejected. To do this, I could do some basic string matching to check whether the given string ends with, or contains, example.com, which would work fine, but it also means that something like fooexample.com would pass.
Hence, based on the above requirements, I started working on a pattern that would match the domain and its subdomains. I was able to come up with the following regex pattern:
`/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.example.com\b/i`
This only matched subdomains, but I have seen the pattern at "How to match all email addresses at a specific domain using regex?" which handles the main domain.
Is there a way to combine these two into something that works for any address from example.com?
How about:
/\b(?:(?![_.-])(?!.*[_.-]{2})[a-z0-9_.-]+(?<![_.-]))@(?:(?!-)(?!.*--)[a-z0-9-]+(?<!-)\.)*example\.com\b/i
This one would also match "tagged" and "tagged-subdomain" addresses like a+b@example.com and a+b@i.example.com:
(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@(?:(?!-)(?!.*--)[a-z0-9-]+(?<!-)\.)*example\.com\b
Hope this helps.
I'd recommend reading "Stop Validating Email Addresses With Your Complex Regex".
From that point, I'd look for:
/@.*\bexample\.com/
For instance:
%w[foo@example.com foo@barexample.com foo@subdomain.example.com].grep(/@.*\bexample\.com/)
=> ["foo@example.com", "foo@subdomain.example.com"]
It's too easy to end up with a regex that is a maintenance nightmare, and that doesn't accomplish what you need. I highly recommend keeping it simple.
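In the same keep-it-simple spirit, the check can skip regex entirely: take everything after the last "@" and accept the domain itself or anything ending in "." plus the domain. This JavaScript sketch (the function name is hypothetical) compares whole labels, so unlike /@.*\bexample\.com/ it also rejects foo@bar-example.com and foo@example.com.evil.org.

```javascript
// Accepts an address whose domain is `domain` or any subdomain of it.
function belongsToDomain(address, domain = "example.com") {
  const at = address.lastIndexOf("@");
  if (at < 1) return false; // no "@" or nothing before it
  const host = address.slice(at + 1).toLowerCase();
  const target = domain.toLowerCase();
  // Either an exact match or a whole-label suffix ("sub.example.com").
  return host === target || host.endsWith("." + target);
}
```

Usage mirroring the Ruby example: ["foo@example.com", "foo@barexample.com", "foo@subdomain.example.com"].filter((a) => belongsToDomain(a)) gives ["foo@example.com", "foo@subdomain.example.com"].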

URI encoding in Yahoo mail compose link

I have a link-generating web app. I'd like to make it easy for users to email the links they create to others using Gmail, Yahoo Mail, etc. Yahoo Mail has a particular quirk that I need a workaround for.
If you have a Yahoo mail account, please follow this link:
http://compose.mail.yahoo.com/?body=http%3A%2F%2Flocalhost%3A8000%2Fpath%23anchor
Notice that Yahoo redirects to a specific mail server (e.g. http://us.mc431.mail.yahoo.com/mc/compose). As it does, it decodes the hex codes. One of them, %23, is a hash symbol; once decoded, an unescaped # marks the start of the URL fragment instead of being part of the query-string value, so all info after %23 is lost.
All my links are broken, and just using another character is not an option.
Calling us.mc431.yahoo.com directly works for me, but probably not for all users, depending on their location.
I've tried setting html=true|false and putting the URL in an HTML tag. Nothing works. Has anyone got a reliable workaround for this particular quirk?
Note: any server-based workaround is a non-starter for me. This has to be a link that's just between Yahoo and the end-user.
Thanks
Here is how I do it:
run window.escape on these characters: & ' " # > < \
run encodeURIComponent on the full string
It works in most of my cases, though newline (\n) is still an issue; I replace \n with a space, and that works fine for me.
I have been dealing with the same problem the last couple of hours and I found a workaround!
If you double-encode the anchor, Yahoo will interpret it correctly. That means changing %23 to %2523 (%25 is the encoded percent sign).
So your URI will be:
http://compose.mail.yahoo.com/?body=http%3A%2F%2Flocalhost%3A8000%2Fpath%2523anchor
The same workaround can be used for the ampersand. If you only encode it as %26, Yahoo will convert it back to "&", which discards the rest of the message. Use the same procedure as above: change %26 to %2526.
I still haven't found a solution to the newline-problem though (%0D and %0A).
For the newline, add it as <BR> and double-encode that as well; it is then interpreted as a newline in the new message.
I think you're at the mercy of what Yahoo's server does when it issues the HTTP redirect. It seems like it should preserve the URL escaping on the redirect, but it doesn't. Without knowledge of their underlying application, though, it's hard to say why. Perhaps it's just an unintended side effect (or bug), or perhaps some of the JavaScript features on that page require them to do some finagling with the hash character.
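The double-encoding workaround can be sketched in JavaScript as follows (the function name is hypothetical; the compose URL is the one from the question). It encodes the body once with encodeURIComponent, then re-encodes the "%" of %23 (#) and %26 (&) so that Yahoo's redirect decodes them only once. Only those two characters are handled here.

```javascript
// Builds a Yahoo compose link whose body survives one extra decoding pass.
function yahooComposeLink(body) {
  const once = encodeURIComponent(body);
  // After encodeURIComponent, every "%" starts an escape, so %23 and %26
  // can only come from "#" and "&"; turn them into %2523 and %2526.
  const twice = once.replace(/%(23|26)/g, "%25$1");
  return "http://compose.mail.yahoo.com/?body=" + twice;
}
```

For the body http://localhost:8000/path#anchor, this produces exactly the URL given in the answer: http://compose.mail.yahoo.com/?body=http%3A%2F%2Flocalhost%3A8000%2Fpath%2523anchor.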

Resources