When Not To Program (Email Address Validation)

Have you ever been denied a purchase because your email address, one you use successfully for many other purchases, was deemed “invalid” by a third-party site handling purchases for a software company?

Enter the commonly-used regular expression, as found all over the Internet, to validate email addresses:

^([a-zA-Z0-9_\-\.]+)@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$

This regular expression allows any number of letters, digits, dashes, underscores, or periods before the “@”. But that’s it.

So an email address like this:

mail-----i_____l_D@somedomain.com

is valid.

But what about special characters?
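
To make the behaviour concrete, here is a minimal Python sketch that runs the pattern quoted above against a few addresses. The helper name `is_accepted` and the two addresses containing special characters are illustrative assumptions, not taken from the reader's report.

```python
import re

# The commonly copied pattern quoted above, unchanged.
SIMPLE_EMAIL = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
)


def is_accepted(address: str) -> bool:
    """Return True when the simple pattern matches the whole address."""
    return SIMPLE_EMAIL.match(address) is not None


# Odd-looking but accepted: only letters, dashes, and underscores before the "@".
print(is_accepted("mail-----i_____l_D@somedomain.com"))  # True

# Hypothetical addresses with special characters in the local part: rejected.
print(is_accepted("first+shop@somedomain.com"))  # False
print(is_accepted("o'brien@somedomain.com"))     # False
```

Both rejected addresses use characters that are permitted in the local part of an email address, which is the kind of mismatch described next.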

This is exactly the problem our reader encountered. The real failure, however, was not just an overly simplistic regular expression but the fact that the email address did not need to be validated at all.

Imagine this workflow:

1. The user goes to a site to buy a neat software product upgrade.

2. He chooses to pay with PayPal.

3. He goes to the PayPal site, logs in, using his PayPal email with the special character(s) in his email address.

4. He is then redirected, after PayPal validates his credentials, back to the third-party payment site to complete the software purchase.

5. But the third-party payment site, for whatever reason, chooses to validate his PayPal email address. Then, because of a simplistic regular expression like the one above, it denies him the purchase with: “Email address is invalid.”

To add insult to injury, beyond (1) the faulty use-case analysis and (2) the wrong (too simplistic) regular expression, the email address field is read-only, so the user has no way to complete the sale!

One of our favorite user interaction books, “The Inmates Are Running the Asylum” by Alan Cooper, describes these kinds of ridiculous user-technology interactions.

The problem here, completely missed by the programmers at the payment site, was that the PayPal email address should not have been validated in the first place. If it’s good enough for PayPal, it’s already valid.
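
As a rough sketch of that idea, the checkout code could record where an address came from and skip its own validation whenever the payment provider has already authenticated the buyer. Everything named here (`BuyerEmail`, `verified_by_provider`, `needs_local_validation`) is hypothetical and does not reflect any real PayPal or payment-site API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuyerEmail:
    """Hypothetical record of an address and where it came from."""
    address: str
    # True when the payment provider already authenticated the buyer
    # with this address (assumption: the checkout flow knows this).
    verified_by_provider: bool


def needs_local_validation(email: BuyerEmail) -> bool:
    """Only addresses typed into our own form need checking by us."""
    return not email.verified_by_provider


paypal_email = BuyerEmail("first+shop@somedomain.com", verified_by_provider=True)
typed_email = BuyerEmail("first+shop@somedomain.com", verified_by_provider=False)

print(needs_local_validation(paypal_email))  # False: the provider vouched for it
print(needs_local_validation(typed_email))   # True: the buyer typed it by hand
```

The point of the sketch is the branch, not the data structure: an address that arrives from the provider after a successful login never reaches the regular expression at all.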

Apparently, no code reviews ever caught this problem either.

However, even with the incorrect use-case analysis, the third-party payment vendor should probably have consulted even the most basic of resources for a more general regular expression that covers most, if not all, email addresses:

https://en.wikipedia.org/wiki/E-mail_address

This article discusses, among its many considerations, possible special characters in the ‘local part’ of the email address.

Thus, the following regular expression, though still unnecessary given the use-case and user workflow, would have worked and completed the sale (the user’s goal, after all):

^([a-zA-Z0-9_\-\.!#$%&'*+/=?^]+)@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$
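
The following Python sketch compares the simple pattern with an extended one. The extended pattern is written with a straight apostrophe and without an unescaped hyphen between “+” and “/”, because inside a character class `+-/` is read as a range that also admits a comma. The test addresses and the helper `accepts` are illustrative assumptions.

```python
import re

SIMPLE = re.compile(
    r"^([a-zA-Z0-9_\-\.]+)@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
)
# Extended local part: adds ! # $ % & ' * + / = ? ^ to the allowed characters.
EXTENDED = re.compile(
    r"^([a-zA-Z0-9_\-\.!#$%&'*+/=?^]+)@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"
)


def accepts(pattern: re.Pattern, address: str) -> bool:
    """Return True when the pattern matches the whole address."""
    return pattern.match(address) is not None


for address in ("first+shop@somedomain.com", "o'brien@somedomain.com"):
    print(address, accepts(SIMPLE, address), accepts(EXTENDED, address))
# first+shop@somedomain.com False True
# o'brien@somedomain.com False True

# A comma is still rejected, since "+" and "/" no longer form a range.
print(accepts(EXTENDED, "first,shop@somedomain.com"))  # False

# The domain part is unchanged, so a four-letter TLD is still rejected.
print(accepts(EXTENDED, "first+shop@somedomain.info"))  # False
```

The last line shows that this is still only an approximation: the domain part keeps the original restrictions, which is one more reason not to validate an address that a payment provider has already accepted.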

--------

Conclusion:

The difference between a programmer and a computer scientist, sometimes, is that the programmer will just “code”, whereas the computer scientist will “think” first.

In today’s world of “how fast can you create this functionality?”, thinking is sometimes a scarce commodity for anyone. Yet, for successful Web sites, thinking about “user goals” is key. User goal analysis is beyond the scope of this article, but it’s a fascinating topic.

Here, the user goal was simply: COMPLETE THE SALE for me.

Sadly, the user goal was never correctly considered, and it failed for our reader. After being denied the purchase, he notified the software company involved, but nothing was done to fix the problem, so he decided to stick with his old version of the software.

And the software vendor LOST A SALE for a completely unnecessary reason: sloppy coding when no coding was necessary and when no coding would have been better.

Enjoy!!!
