Validation of email addresses. What a painful subject.
About a year ago I looked into the possibility of using SMTP to validate email addresses. As it turns out, it’s not perfect: it lets a lot of bad addresses through (false negatives). But it’s very reliable in the other direction: when it says an address is bad, it is (I’ve yet to see a false positive).
The following code outlines how this works (written in PHP, because I was thinking about it in the context of web-apps):
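<?php
// Cheap syntax check, so we never open a socket for obvious junk.
function validate_email_regex ($email) {
    // Anchored with ^ and \z so the whole string must match (this also keeps
    // CR/LF out of the SMTP commands below); TLDs can exceed four letters.
    return preg_match ('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\z/i', $email) === 1;
}

// Read one SMTP reply and return its three-digit code, or false on a read
// failure. Replies may span several lines: "250-..." continues, "250 ..." ends.
function smtp_reply_code ($fp) {
    do {
        $line = fgets ($fp);
        if ($line === false) return false;
    } while (isset ($line[3]) && $line[3] == '-');
    return substr ($line, 0, 3);
}

// Send one command and return the code of the server's reply.
function smtp_command ($fp, $command) {
    fwrite ($fp, "$command\r\n");
    return smtp_reply_code ($fp);
}

function validate_email_smtp ($email) {
    if (!validate_email_regex ($email)) return false;
    list ($user, $domain) = explode ('@', $email, 2); // split() was removed in PHP 7
    if (!getmxrr ($domain, $mxhosts, $weights)) return false;
    array_multisort ($weights, $mxhosts); // most-preferred (lowest weight) MX first
    $thishost = isset ($_SERVER['SERVER_NAME']) ? $_SERVER['SERVER_NAME'] : gethostname ();
    $i = 0;
    foreach ($mxhosts as $mxhost) {
        if (++$i > 3) break; // don't check more than 3 MX records
        $fp = fsockopen ("tcp://$mxhost", 25, $errno, $errstr, 10);
        if (!$fp) continue;
        stream_set_timeout ($fp, 10); // cap each read, not just the connect
        $ok = smtp_reply_code ($fp) == '220'
            && smtp_command ($fp, "HELO $thishost") == '250'
            && smtp_command ($fp, "MAIL FROM:<mail@$thishost>") == '250';
        $rcpt = $ok ? smtp_command ($fp, "RCPT TO:<$email>") : false;
        fwrite ($fp, "QUIT\r\n");
        fclose ($fp); // always closed before leaving the iteration
        if (!$ok) continue; // this server wouldn't talk to us; try the next MX
        // Only a permanent 5xx rejection means "bad user", and then there is no
        // point trying other MX records. A 4xx reply is temporary (greylisting,
        // for example) and is treated as "can't tell".
        return !($rcpt !== false && $rcpt[0] == '5');
    }
    return true; // no MX gave an answer: unknown, so not rejected
}
?>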
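The function above only ever answers true or false, which hides the difference between "the server accepted it" and "I couldn’t tell". A rough sketch of a three-way classification of the RCPT TO reply follows. The name `classify_rcpt_reply` is made up for illustration, and even "accepted" is no proof that the mailbox exists, since some servers accept any recipient (which is where the false negatives come from).

```php
<?php
// Hypothetical helper: map the reply code for RCPT TO onto three outcomes.
function classify_rcpt_reply ($code) {
    // Anything that isn't a three-digit SMTP code (including a failed read).
    if (!is_string ($code) || !preg_match ('/^[2-5][0-9]{2}\z/', $code)) return 'unknown';
    if ($code[0] == '2') return 'accepted'; // e.g. 250, or 251 (will forward)
    if ($code[0] == '5') return 'rejected'; // permanent failure, e.g. 550
    return 'unknown';                       // 3xx/4xx: temporary or unexpected
}
```

Only 'rejected' is safe to treat as "definitely bad"; the other two outcomes are best left alone.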
Good idea, Matt, although it would require a connection to be made for each email address, which may not be ideal depending on your level of traffic.
In general I think with email address validation the best solution is to not do it: there are so many ways of typing out an email address wrongly that there’s just no point.
If you really need an email address to be valid, send an email to that address with a validation link. Just my two pence anyway 🙂
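For what it’s worth, here is a minimal sketch of how such a validation link might be built, assuming a server-side secret and an HMAC over the address so that no token needs storing. The function names, the secret and the URL are all placeholders, not anything from the post.

```php
<?php
// Hypothetical helpers for a confirmation link; $secret must stay server-side.
function make_confirmation_token ($email, $secret) {
    return hash_hmac ('sha256', $email, $secret);
}

function check_confirmation_token ($email, $token, $secret) {
    // hash_equals compares in constant time, so the token can't be guessed
    // one character at a time from response timing.
    return hash_equals (make_confirmation_token ($email, $secret), $token);
}

function confirmation_link ($baseUrl, $email, $secret) {
    return $baseUrl . '?email=' . urlencode ($email)
         . '&token=' . make_confirmation_token ($email, $secret);
}
```

The address is only marked as valid once someone follows the link and `check_confirmation_token` returns true for the `email` and `token` parameters it carries.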
Yeah, you make some very good points.
I’ve been experimenting with using it “offline” (out of hours) to try and sanitize a database of email addresses – get rid of the ones which are definitely bad!
The connection time varies massively: if you’re lucky, it’s one or two seconds. If you’re not, you can be waiting for a timeout on each of 3 MX records (somewhere around 45 seconds).
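One way to cut down the waiting in an offline run might be to group the addresses by domain first, so that a domain with no MX record is looked up once and all of its addresses are dropped together. This is only a sketch; `group_by_domain` is a name invented here, and it does no validation of its own.

```php
<?php
// Hypothetical helper: bucket addresses by lower-cased domain.
function group_by_domain (array $emails) {
    $groups = array ();
    foreach ($emails as $email) {
        $at = strrpos ($email, '@');
        if ($at === false) continue; // no domain part; leave it to the syntax check
        $domain = strtolower (substr ($email, $at + 1));
        $groups[$domain][] = $email;
    }
    return $groups;
}
```

Each key of the result can then be passed to `getmxrr` once, with the SMTP conversation reserved for domains that actually have a mail server.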