5. Carefully Plan the ETL Architecture
Our approach describes a formal data-staging area, much like the kitchen in a restaurant, with detailed ETL processes required to bridge the gap between the production system source data and the presentation area dimensional schema. The approach further defines cleaning and conforming activities as part of the transformation process.
Make no mistake: the ETL effort is hard work. The ETL system is often estimated to consume 70 percent of the time and effort of building a business intelligence environment. Too often, little thought goes into architecting a robust ETL system, and it ends up as an uncoordinated, spaghetti-like mess of tables, modules, processes, scripts, triggers, alerts, and job schedules. This sort of design approach has derailed many business intelligence efforts.
The Kimball Method describes a comprehensive set of ETL subsystems that together make up a robust set of ETL best practices, including those required to support real-time requirements (see The Subsystems of ETL).
Be wary of any approach that suggests that ETL is no longer a required architectural component. Some architects believe that a simple intermediate data structure or an integration software layer is all that's needed to perform translation on the fly. Unfortunately, true data integration can only succeed if the textual descriptors in each separate source are physically altered so they have the same label (column name) and content (data domain values). If it sounds too easy, it is (see Beware the Objection Removers for more on these kinds of misleading claims).
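To make the point about labels and content concrete, here is a minimal sketch of a conforming step, assuming two hypothetical source systems that describe the same product category under different column names and different code values. The source names, column names, and mappings are invented for illustration and are not taken from the Kimball Method materials.

```python
# Conformed label (column name) that every source must end up using.
TARGET_COLUMN = "category"

# Hypothetical per-source rules: the source's own column name and a mapping
# from its local domain values (lowercased) to the conformed domain values.
CONFORM_RULES = {
    "orders_system": {
        "column": "prod_cat",
        "values": {"bev": "Beverages", "snk": "Snacks"},
    },
    "inventory_system": {
        "column": "category_name",
        "values": {"beverages": "Beverages", "snack foods": "Snacks"},
    },
}


def conform_row(source: str, row: dict) -> dict:
    """Return a copy of row with the source descriptor physically rewritten
    to the conformed column name and conformed domain value."""
    rule = CONFORM_RULES[source]
    raw = row[rule["column"]]
    key = raw.strip().lower()
    if key not in rule["values"]:
        # Surface unmapped values instead of letting them pass through silently.
        raise ValueError(f"unmapped {rule['column']} value {raw!r} from {source}")
    conformed = {k: v for k, v in row.items() if k != rule["column"]}
    conformed[TARGET_COLUMN] = rule["values"][key]
    return conformed


if __name__ == "__main__":
    print(conform_row("orders_system", {"sku": "A1", "prod_cat": "BEV"}))
    print(conform_row("inventory_system", {"sku": "A1", "category_name": "Snack Foods"}))
```

After this step, rows from both systems carry the same column name and the same value for the same real-world category, which is what allows them to be compared or combined downstream. Rejecting unmapped values is a design choice in this sketch; a production ETL system might instead route such rows to an error table for review.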
This article has highlighted five best practices drawn from the Kimball Method. We recommend that designers study them carefully in order to avoid the misrepresentations sometimes heard in various teaching and writing venues. As a designer, you are free to choose any approach you are comfortable with, but we want you to think critically when you are making these choices.
Bob Becker is a member of the Kimball Group. He has focused on dimensional data warehouse consulting and education since 1989. Contact him at email@example.com. Ralph Kimball, founder of the Kimball Group, teaches dimensional data warehouse and ETL design through Kimball University and reviews large warehouses. He has four best-selling data warehousing books in print, including The Data Warehouse ETL Toolkit. Write to him at firstname.lastname@example.org.
Related books include The Data Warehouse Toolkit, Second Edition, The Data Warehouse ETL Toolkit and The Data Warehouse Lifecycle Toolkit.