Software // Information Management
Commentary
9/27/2007
08:59 AM
Joe Celko
Joe Celko
Commentary
50%
50%

And the 'Email Validation' SQL Puzzle Winner Is...

The winner of last week's 'Email Address Validation' SQL puzzle is "Guest" (see comments), because he/she bothered to do the research and come up with an answer that is generic enough to port to any SQL dialect with a SIMILAR TO or a regexp() function. So, "Guest" please email me with your snail mail address (and some attempt to validate your SQL mastery/identity) and I'll send you one of my books. Here's my answer to last week's puzzle...

The winner of last week's 'Email Address Validation' SQL puzzle is "Guest" (see comments), because he/she bothered to do the research and come up with an answer that is generic enough to port to any SQL dialect with a SIMILAR TO or a regexp() function. So, "Guest" please email me with your snail mail address (and some attempt to validate your SQL mastery/identity) and I'll send you one of my books.

My answer to last week's puzzle is as follows:You can find regular expressions at http://regexlib.com/ and copy them into your code, making changes for your dialect. For example, this regular expression checks an email format against the RFC 3696 Standard and was written by David Thompson

^[a-z0-9!$'*+\-_]+ (\.[a-z0-9!$'*+\-_]+)* @ ([a-z0-9]+(-+[a-z0-9]+)*\.)+ ([a-z]{2} |aero|arpa|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|travel)$

If you do not read regular expressions, this says that a valid email address is one or more groups of strings of alphanumeric characters and some limited punctuation marks, optionally separated by a period. Then there is one "little snail" or "at-sign" in the middle. This followed by more groups of strings of alphanumeric characters and a more limited set of punctuation marks separated by periods. The string finally ends with either a two-letter country code or one of several explicit domain codes.

The problem is that the "[a-z]{2}" pattern matches any two letters even when they are not a valid country code.

If you do not have a SIMILAR TO predicate in your SQL, there is another approach. Set up a CREATE TRANSLATION declaration that maps the legal postfixes into a single unique token not used in an email address. This result is then passed on to another TRANSLATE () expression which reduces the alphanumeric and punctuation characters to a second unique token. Eventually, you wind up with a reduced pattern made up of the two tokens and the at-sign, say '#@#?' since neither '#' nor '?' appear in an email address.

A third approach is to use the TRIM( FROM ) function to reduce the suspect email address to a single at-sign or empty string.

The same effect can be had with nested REPLACE expressions in some dialects but the nesting can be pretty deep. If you are a LISP programmer, you will not mind a bit.

Are these good methods to use in place of using an external call to an external procedure in a 3GL language or SQL/PSM? Baroque as these suggestions are, they often run much faster than the external call and they are portable. But they are ugly to maintain.

Joe Celko is an independent consultant in Austin, Texas, and the author of SQL Puzzles and Answers (2006), Joe Celko's SQL for Smarties: Advanced SQL Programming (2005), and Joe Celko's Trees and Hierarchies in SQL for Smarties (2004).The winner of last week's 'Email Address Validation' SQL puzzle is "Guest" (see comments), because he/she bothered to do the research and come up with an answer that is generic enough to port to any SQL dialect with a SIMILAR TO or a regexp() function. So, "Guest" please email me with your snail mail address (and some attempt to validate your SQL mastery/identity) and I'll send you one of my books. Here's my answer to last week's puzzle...

Comment  | 
Print  | 
More Insights
The Agile Archive
The Agile Archive
When it comes to managing data, donít look at backup and archiving systems as burdens and cost centers. A well-designed archive can enhance data protection and restores, ease search and e-discovery efforts, and save money by intelligently moving data from expensive primary storage systems.
Register for InformationWeek Newsletters
White Papers
Current Issue
InformationWeek Tech Digest, Nov. 10, 2014
Just 30% of respondents to our new survey say their companies are very or extremely effective at identifying critical data and analyzing it to make decisions, down from 42% in 2013. What gives?
Video
Slideshows
Twitter Feed
InformationWeek Radio
Sponsored Live Streaming Video
Everything You've Been Told About Mobility Is Wrong
Attend this video symposium with Sean Wisdom, Global Director of Mobility Solutions, and learn about how you can harness powerful new products to mobilize your business potential.