informa
/
News

Celko on SQL: Natural, Artificial and Surrogate Keys Explained

SQL programming guru Joe Celko offers a classification scheme and advice on using the right keys.
Physical Appearance of a Key

This is part of designing data, as opposed to designing a database. It's a matter of heuristics, based on human psychology, not on mathematical rules. It might sound a little silly, but they do seem to work. So, let's assume you now have to invent a key from scratch.

1. Fixed length strings are usually better than varying length strings. This gives you another constraint for validation and it is very easy to enforce on both the front and back ends. For example, you can tell if a string of digits is a ZIP code or a telephone number just by counting the digits.

2. Avoid mixing digits and letters in the same positions in the string. This will let you write a very simple regular expression for validation. It also lets a human being figure out the code. For example, the Canadian Postal codes use '[A-Z][0-9][A-Z] -[0-9][A-Z]{0-9]' as their regular expression pattern. It also makes check digits easier if you use only digits (if you don't understand check digits, check out Identification Numbers and Check Digit Schemes, by Joseph Kirkland, at www.maa.org).

3. Allow for growth. That sort of contradicts heuristic #1 about a fixed-length string. But you can use zeroes, blanks or other characters (or define certain code ranges as unused) to provide room for growth.

In the old days of punch cards, the fields were always fixed length, so if you used the digits for a code in a one-character card column and suddenly found you needed more than ten values, you had to use alphas or special characters. This messed up a lot of things. Your sort order now depended on ASCII or EBCDIC hardware. People could not read the codes easily. Special characters varied from key punch machine to key punch machine. If you thought ahead and allowed two-card columns with a leading zero, you were sitting pretty.

Finding a Key

So, how do you start to look for a key when you are setting up a database? I recommend these three steps in the order presented below:

1. Look for an industry standard and the trusted external source that maintains and verifies it. Almost every industry standard code has a regular expression and/or a check digit of some kind for validation. Today, you can Google all of these things or go to industry related chat groups and discover industry standards with less than a week of research even if you are totally ignorant of that industry.

You can often download the code from a Web site. Even better, the industry group that set up the standard will maintain it for you, providing updates for small changes and migration paths in the event of radical changes.

I classify industry standards as natural keys because they share the same properties of familiarity, validation and verification that physically derived natural keys do.

2. Look for a natural key in the attributes. Example: (longitude, latitude) makes a good key for a geographical location. A GPS can be used to verify it. There are some special problems with this kind of key. You need to determine a unit of measure and a scale. I could also use a polar coordinate system for a geographical location.

3. If you must design a new identifier, plan it carefully — especially if people will see and use it. You have to be able to verify it in application programs, so you should have a regular expression, other syntax rules or check digits. You have to be able to verify in the reality of the model or with a trusted source that you maintain.

The "auto-increment" crowd thinks their "fake pointer chains" or "fake record numbers" are surrogate keys. They are actually exposed physical locators. To reiterate an earlier point, there is no such thing as a "universal, one-size-fits-all" key. This is a superstitious belief taken from Kabala and other forms of Numerology. Why would anyone think that everything in the Universe has a mystic number divinely reveled by an internal counter in the current release of one vendor's database product? Now, if you were implementing your RDBMS on a Magic Eight Ball, that would be different!

Don't miss this month's Celko Puzzle on Email Address Validation.

Joe Celko is an independent consultant in Austin, Texas, and the author of SQL Puzzles and Answers (2006), Joe Celko's SQL for Smarties: Advanced SQL Programming (2005), and Joe Celko's Trees and Hierarchies in SQL for Smarties (2004).