IBM DB2's 25th Anniversary: Birth Of An Accidental Empire

At first, relational database was a highly mocked product, halting in its performance compared to the programmed-path systems. Now it represents an $18.6 billion a year market.

Charles Babcock, Editor at Large, Cloud

June 9, 2008

Saturday, June 7, was the 25th anniversary of DB2. Ingres and Oracle preceded it as commercial products by a narrow margin, but the launch of DB2 on June 7, 1983, marked the birth of relational database as a cornerstone for the enterprise.

DB2 came about as a result of an unprecedented collaboration between academic and commercial database researchers. Both groups took Edgar Codd's elegant theoretical concept and built on it working examples of a flexible system capable of serving data to thousands of users at a time. Relational database was the opposite of everything that had gone before in data storage.

"You could dynamically add tables or change tables without taking the system down. It doesn't take much imagination now to see this was a huge leap forward," recalled Don Haderle, chief architect of DB2, referred to within IBM ranks as the father of DB2.

At first, relational database was a highly mocked product, halting in its performance compared to the programmed-path systems. One skeptic, John Cullinane, founder of Cullinet Software, once took this reporter aside to instruct him that relational database would never amount to anything compared to his firm's IDMS product. Last year, relational database represented an $18.6 billion-a-year market, according to IDC. Ecommerce would be impossible without it.

Researchers at IBM's Santa Teresa Lab in San Jose were frequently drawn from the ranks of early computer science students at the nearby University of California at Berkeley, where Bob Epstein, Jerry Held, Jim Gray, Michael Stonebraker and others made such rapid progress on the experimental Ingres system that it put pressure on Big Blue to productize its own invention.

Predecessor systems like IBM's IMS and VSAM on the mainframe could store megabytes of data, but it had to be entered and retrieved in the same structured way every time. Changing the structure or sequence of data meant taking the database offline, with programmers laboring over it for hours to make sure the changes didn't torpedo the handful of systems that depended on the storage mechanism. With relational database, dozens or hundreds of applications can be added to the same system, each getting the data it needs in the desired format.
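The contrast can be sketched in a few lines. The example below is a minimal illustration, not DB2 itself: it uses Python's built-in sqlite3 module as a stand-in relational engine, and the table, column and supplier names are hypothetical. An existing application keeps running its query unchanged while the structure underneath it changes and a second application starts using the new data.

```python
import sqlite3

# In-memory SQLite database used as a stand-in relational engine (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (part_no INTEGER PRIMARY KEY, description TEXT)")
conn.execute("INSERT INTO parts VALUES (1, 'rivet')")


def inventory_app():
    """An existing application that names only the columns it needs."""
    return conn.execute(
        "SELECT part_no, description FROM parts ORDER BY part_no"
    ).fetchall()


before = inventory_app()

# Change the structure while the database stays online:
# add a column to an existing table and create a new table.
conn.execute("ALTER TABLE parts ADD COLUMN supplier TEXT")
conn.execute("CREATE TABLE suppliers (name TEXT PRIMARY KEY, city TEXT)")
conn.execute("UPDATE parts SET supplier = 'Hypothetical Fasteners' WHERE part_no = 1")

# The old application is unaffected by the change.
after = inventory_app()

# A second, newer application asks for the same data in a different shape.
purchasing = conn.execute("SELECT description, supplier FROM parts").fetchall()

print(before, after, purchasing)
```

Because each application states which columns it wants rather than following a fixed storage path, the first query returns the same rows before and after the change.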

Initially, however, relational database was the subject of serious experimentation at Berkeley, not product development. IBM researcher Edgar Codd had written a series of papers in the 1960s and 1970s proposing a new kind of database. Codd died in April 2003. "His approach was not, shall we say, welcomed with open arms at IBM. It was a revolutionary approach," Harwood Kolsky, a physicist who had worked with Codd, told The New York Times for his obituary.

Don Haderle, project lead for the IBM DB2 team, now retired, confirmed in a recent interview that there was internal opposition. DB2 development was not funded through the existing IMS and mainframe software group, which opposed it. "IMS had been developed for McDonnell Douglas by a field development group [closely aligned with sales], not by research," recalled Haderle. McDonnell Douglas had asked for a system to manage its inventory of parts, and IBM field engineers produced IMS, he noted.

The IBM storage organization, realizing that every IMS customer tended to increase by a hundredfold the amount of storage it bought, agreed to fund the DB2 project. It was a long shot, but just maybe the pointy-headed researchers would produce another product that increased storage sales, Haderle recalled.

Work on DB2 progressed from 1978 through 1983, when DB2 was launched. The storage unit continued to fund its development through 1986, the year DB2's revenues first covered the amount invested in it that year. Profits were soon to follow, but as that prospect loomed, IBM's Senior VP Earl Wheeler moved DB2 out of the storage unit and into a new IBM software division. Haderle said he has been ribbed by former storage executives ever since for taking their funds but never returning a profit.

During the years of development, "the IMS team kept asking, 'why do we need you?' IMS stood alone at that time. You didn't have to convince the customer to get IMS [versus something else]," Haderle recalled. "When the head of the storage division announced DB2, he didn't want to antagonize his IMS customers. So he announced a new storage system, and at the bottom, there was 100 words, 'by the way, I have this DB2 thing for decision support,'" he added.

"We could do online transactions, but he pigeon-holed us as decision support," or combing static data in response to high level executive queries, usually a resource hog type of application, he noted.

Haderle says, in fairness to the IBM storage unit, that its director "didn't want to confuse the sales force or confuse customer perceptions." In 1983, it was hard to explain what relational systems were because they functioned so differently from what had gone before. A piece of data in one row of a table could be used to cross-reference information in a totally separate table. Information on a customer could be pulled out of sales, field service and accounting databases to begin to correlate a single customer's information.
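A small example makes the cross-referencing idea concrete. This is a sketch under stated assumptions: it runs on Python's built-in sqlite3 module rather than DB2, it keeps everything in one database for brevity, and the customers, sales and service_calls tables and their contents are invented for illustration. The shared customer_id value is what ties a row in one table to rows in the others.

```python
import sqlite3

# Hypothetical schema and data; none of these names come from DB2 or IBM.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE TABLE service_calls (call_id INTEGER PRIMARY KEY, customer_id INTEGER, issue TEXT);
    INSERT INTO customers VALUES (1, 'Acme Aerospace'), (2, 'Globex');
    INSERT INTO sales VALUES (10, 1, 2500.0), (11, 1, 400.0), (12, 2, 90.0);
    INSERT INTO service_calls VALUES (100, 1, 'disk replacement');
""")

# Join: each sales row is matched to its customer through customer_id.
sales_by_name = conn.execute("""
    SELECT c.name, s.amount
    FROM customers c
    JOIN sales s ON s.customer_id = c.customer_id
    ORDER BY s.sale_id
""").fetchall()

# One summary row per customer, drawing on both of the other tables.
summary = conn.execute("""
    SELECT c.name,
           (SELECT SUM(s.amount) FROM sales s
             WHERE s.customer_id = c.customer_id) AS total_sales,
           (SELECT COUNT(*) FROM service_calls v
             WHERE v.customer_id = c.customer_id) AS service_calls
    FROM customers c
    ORDER BY c.customer_id
""").fetchall()

print(sales_by_name)
print(summary)
```

Neither query says how to navigate the storage; it only states which values must match, which is the part that was hard to explain to a sales force used to programmed-path systems.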

"It was quite hard to explain how it would give you this enormous leap in productivity," said Haderle.

Until Pat Selinger and her team did their work on DB2 queries, the productivity of relational systems was in doubt.

If McDonnell Douglas had been instrumental in developing IMS, Boeing proved a strong partner to IBM in developing DB2. The IT group at Boeing implementing DB2 used up its budget before the end of February one year and called a meeting with Haderle and Selinger. Instead of getting chewed out, they were told, "'we've been getting answers we were never able to get before.' They were thrilled with the data they were getting. Boeing was a phenomenal partner for us," as work continued on performance tuning, said Haderle.

Queries to the database could be resource hogs that tied up the CPU for minutes, if not hours. Selinger had joined the DB2 team as it got underway in 1978 and assumed management of a team to optimize relational database queries to get better performance.

Her first hire was Jim Gray, who wanted to work for the relational database team and had quit his job with the IBM Thomas J. Watson Research Center outside New York when his boss refused to allow him to transfer. After moving himself west, he was promptly hired by Selinger in San Jose. Bruce Lindsay became the third member of the team and the three of them developed the "cost basis" of formulating queries, recalled Selinger, now retired, in an interview. A query optimizer would weigh the resources required by a query and, if it was greedy, drop it down in the queue for execution or issue a prompt for its restructuring. With cost basis querying, it started getting more difficult to mock relational performance.
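The weighing step can be illustrated with a toy model. What follows is not IBM's optimizer and does not cover the queueing behavior described above; it is a minimal sketch of the general cost-based idea, in which the engine estimates the resources each candidate way of running a query would consume and prefers the cheapest. The Plan fields, the weights and the numbers are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """A candidate way of executing one query (hypothetical model)."""
    name: str
    pages_read: int     # estimated disk pages fetched
    rows_examined: int  # estimated rows the CPU must inspect


def estimated_cost(plan, io_weight=1.0, cpu_weight=0.01):
    """Combine I/O and CPU estimates into one number; weights are assumptions."""
    return io_weight * plan.pages_read + cpu_weight * plan.rows_examined


def choose_plan(plans):
    """Pick the candidate with the lowest estimated cost."""
    return min(plans, key=estimated_cost)


plans = [
    Plan("full table scan", pages_read=10_000, rows_examined=1_000_000),
    Plan("index lookup", pages_read=30, rows_examined=200),
]

best = choose_plan(plans)
costs = {p.name: estimated_cost(p) for p in plans}
print(best.name, costs)
```

With these made-up estimates the index lookup wins by a wide margin, which is the kind of difference that separates a query that ties up the CPU for hours from one that returns promptly.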

Gray was representative of the computer scientists coming out of the University of California at Berkeley at that time. Students such as Jerry Held, who founded Tandem; Bob Epstein, who founded Sybase; and Michael Stonebraker, who founded Ingres, Illustra, Streambase, and Vertica, along with numerous IBM researchers "were all buddies. Half the guys went to school together. We all went to the same 'church,'" recalls Haderle, meaning they were all believers in the power of the new relational systems.

Meetings of the Special Interest Group on Management of Data, sponsored by the Association for Computing Machinery, brought IBM and academic developers together outside IBM's commercial environment. In the Bay Area, the team working on Ingres at Berkeley met regularly with their counterparts working on DB2 at IBM, said Stonebraker in an interview.

As a new assistant computer science professor at Berkeley, Stonebraker saw in 1972 the theoretical elegance of Codd's ideas but wondered, "could you implement the relational model?" The work he and his students did on Ingres became a driving force behind relational database, sometimes gaining steps on IBM's System R, started a year later, and well before System R became productized into DB2 between 1978 and 1983. Developers of the two relational systems tracked each other's work closely.

"Between the two groups, all the major ideas were in one or the other," said Stonebraker.

Ingres commanded respect because it was gathering pioneering efforts from several sources. The project drew outside academic support because it had been written, not for an IBM mainframe, but for a little-known public operating system. "We had written Ingres for Unix, an unknown operating system. By 1977, it was widely run in universities, and 200 copies of Ingres were in use. Anybody could get it," recalls Stonebraker.

Between 1973 and 1974, Stonebraker recalls three meetings where the IBM group went to Berkeley or the Berkeley group visited San Jose. Ingres had pioneered the use of virtual tables or "views" that were not bound by the physical tables from which they originated, a key relational concept that found its way into DB2.
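A view is easiest to see in a short example. The sketch below is an illustration only: it uses SQL syntax on Python's built-in sqlite3 module, whereas Ingres at the time used Quel, and the parts table and low_stock view are hypothetical. The view stores no rows of its own, so it reflects whatever the underlying table holds at the moment it is queried.

```python
import sqlite3

# Hypothetical table and view; SQLite stands in for a relational engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts (part_no INTEGER PRIMARY KEY, description TEXT, qty INTEGER);
    INSERT INTO parts VALUES (1, 'rivet', 5000), (2, 'wing spar', 3);

    -- A virtual table defined by a query rather than by stored rows.
    CREATE VIEW low_stock AS
        SELECT part_no, description FROM parts WHERE qty < 10;
""")

before = conn.execute(
    "SELECT description FROM low_stock ORDER BY part_no"
).fetchall()

# Change the physical table; the view is never updated directly.
conn.execute("UPDATE parts SET qty = 4 WHERE part_no = 1")

after = conn.execute(
    "SELECT description FROM low_stock ORDER BY part_no"
).fetchall()

print(before, after)
```

An application that reads low_stock never needs to know how the parts table is laid out, which is the sense in which a view is not bound by the physical tables behind it.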

Don Chamberlin at IBM, for example, developed the SQL data access language. When Stonebraker founded the database firm Ingres in 1980, he believed in the superior capabilities of the Quel data access language, which had come out of Berkeley. But IBM established SQL as an ANSI standard. Agencies of the U.S. government and many enterprises would only buy the approach that had been standardized, and Quel fell by the wayside.

Larry Ellison, familiar with the work of both groups, realized Ingres was leaving an opening by sticking with Quel. He founded Relational Software Inc. to produce Oracle in 1979, running on Digital Equipment Corp.'s VMS and on Unix, and using SQL. Renamed Oracle Corp. in 1986 for its product, his firm eventually became the dominant database supplier.

Ingres supplier Relational Technology Inc. was sold to Ask Computer in 1990, resold to Computer Associates in 1994, and became open source code, backed by Ingres Corp., in 2004.

"Anybody who wanted to implement a relational database started with Ingres, or picked up on System R," recalls Stonebraker. He drummed up money for the Ingres research project at Berkeley through the university's economics group and by tapping defense agencies: the Air Force Office of Scientific Research, the Army Research Office and the Navy Electronics Systems Command.

About the Author(s)

Charles Babcock

Editor at Large, Cloud

Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive Week. He is a graduate of Syracuse University where he obtained a bachelor's degree in journalism. He joined the publication in 2003.