informa
/
Commentary

And the 'Email Validation' SQL Puzzle Winner Is...

The winner of last week's 'Email Address Validation' SQL puzzle is "Guest" (see comments), because he/she bothered to do the research and come up with an answer that is generic enough to port to any SQL dialect with a SIMILAR TO or a regexp() function. So, "Guest" please email me with your snail mail address (and some attempt to validate your SQL mastery/identity) and I'll send you one of my books. Here's my answer to last week's puzzle...
The winner of last week's 'Email Address Validation' SQL puzzle is "Guest" (see comments), because he/she bothered to do the research and come up with an answer that is generic enough to port to any SQL dialect with a SIMILAR TO or a regexp() function. So, "Guest" please email me with your snail mail address (and some attempt to validate your SQL mastery/identity) and I'll send you one of my books.

My answer to last week's puzzle is as follows:You can find regular expressions at http://regexlib.com/ and copy them into your code, making changes for your dialect. For example, this regular expression checks an email format against the RFC 3696 Standard and was written by David Thompson

^[a-z0-9!$'*+\-_]+ (\.[a-z0-9!$'*+\-_]+)* @ ([a-z0-9]+(-+[a-z0-9]+)*\.)+ ([a-z]{2} |aero|arpa|biz|cat|com|coop|edu|gov|info|int|jobs|mil|mobi|museum|name|net|org|pro|travel)$

If you do not read regular expressions, this says that a valid email address is one or more groups of strings of alphanumeric characters and some limited punctuation marks, optionally separated by a period. Then there is one "little snail" or "at-sign" in the middle. This followed by more groups of strings of alphanumeric characters and a more limited set of punctuation marks separated by periods. The string finally ends with either a two-letter country code or one of several explicit domain codes.

The problem is that the "[a-z]{2}" pattern matches any two letters even when they are not a valid country code.

If you do not have a SIMILAR TO predicate in your SQL, there is another approach. Set up a CREATE TRANSLATION declaration that maps the legal postfixes into a single unique token not used in an email address. This result is then passed on to another TRANSLATE () expression which reduces the alphanumeric and punctuation characters to a second unique token. Eventually, you wind up with a reduced pattern made up of the two tokens and the at-sign, say '#@#?' since neither '#' nor '?' appear in an email address.

A third approach is to use the TRIM( FROM ) function to reduce the suspect email address to a single at-sign or empty string.

The same effect can be had with nested REPLACE expressions in some dialects but the nesting can be pretty deep. If you are a LISP programmer, you will not mind a bit.

Are these good methods to use in place of using an external call to an external procedure in a 3GL language or SQL/PSM? Baroque as these suggestions are, they often run much faster than the external call and they are portable. But they are ugly to maintain.

Joe Celko is an independent consultant in Austin, Texas, and the author of SQL Puzzles and Answers (2006), Joe Celko's SQL for Smarties: Advanced SQL Programming (2005), and Joe Celko's Trees and Hierarchies in SQL for Smarties (2004).The winner of last week's 'Email Address Validation' SQL puzzle is "Guest" (see comments), because he/she bothered to do the research and come up with an answer that is generic enough to port to any SQL dialect with a SIMILAR TO or a regexp() function. So, "Guest" please email me with your snail mail address (and some attempt to validate your SQL mastery/identity) and I'll send you one of my books. Here's my answer to last week's puzzle...