Celko On SQL: Identifiers and the Properties of Relational Keys

He's back! Author, SQL expert, puzzlemaster and Ming the Merciless look-alike Joe Celko offers his sage (if offbeat) advice on unique identifiers, validation, verification, trusted sources and relational keys.
Invariant Value, Format and Change

Many people believe that a primary key:

Must have an invariant value

Must have an invariant format

They use this as an excuse for auto-numbering and other local, exposed physical locators in place of correct relational keys.

This year, the retail industry in the United States is switching from the 10-digit UPC barcode on products to the 13-digit EAN system and the ISBN (International Standard Book Number) is falling under the same scheme. Clearly, this violates both those assumptions. Hell, even the check digit is different! But the retail industry is still alive and well. Why?

The most important property of a key is that it must ensure uniqueness. But that uniqueness does not have to be eternal. Nor does the format have to be fixed for all time. They simply have to be valid and verifiable at the time I use it.

The retail industry has assured that the old and the new barcodes will identify the same products by a carefully planned migration path that will allow us to change the values and the formats of one of the most common identifiers on Earth. The migration path started with changing the length of the old UPC code columns from 10 to 13 and padding them with leftmost zeros.

In a well-designed RDBMS product, referenced keys are easy to change. Thus, I might have an Inventory table that is referenced in the Orders table. The physical implementation could be a pointer in the Orders table back to the single value in the Inventory table. Or I could use bit vectors that reference domains and simply change the domains. Or it could be a minimal perfect hashing function.

Weird Ideas

One of the weirdest Newsgroup posting I found on this topic was the assertion that industry standard identifiers were just as arbitrary as a locally generated auto-number. The reasoning seemed to be that since both use the same alphanumeric characters, they were alike. Perhaps this "Cowboy Coder," who could ignore ISO, ANSI, the entire EU and every trade organization on Earth that sets standards, wants to replace the Metric System (more properly named "Systme International d'Units" or SI ) with Donald Knuth's "Potrzebie System of Weights and Measures," from MAD magazine #33. Hey, it was invented by a famous computer scientist and it has a decimal base. Since it has a decimal base it is as good as SI!

Joe Celko is an independent consultant in Austin, Texas, and the author of SQL Puzzles and Answers (2006), Joe Celko's SQL for Smarties: Advanced SQL Programming (2005), and Joe Celko's Trees and Hierarchies in SQL for Smarties (2004).