You can still hand-code an extract, transform and load system, but in most cases the self-documentation, structured development path and extensibility of an ETL tool are well worth the cost. Here's a close look at the pros and cons of buying rather than building.
DISADVANTAGES OF ETL TOOLS
Software licensing cost. The greatest disadvantage of ETL tools in comparison to hand-crafted systems is the cost of licensing the software. Costs vary widely in the ETL space, from several thousand dollars to hundreds of thousands of dollars.
Uncertainty. We've spoken with many ETL teams that are uncertain – and sometimes misinformed – about what an ETL tool will do for them. Some teams under-value ETL tools, believing they are simply a visual way to connect SQL scripts together. Other teams unrealistically over-value ETL tools, imagining that building the ETL system with such a tool will be more like installing and configuring software than developing an application.
Reduced flexibility. A tool-based approach limits you to the tool vendor's abilities and scripting languages.
Build a Solid Foundation
There are some over-arching themes in successful ETL system deployments regardless of which tools and technologies are used. Most important — and most frequently neglected — is the practice of designing the ETL system before development begins. Too often we see systems that just evolved without any initial planning. These systems are inefficient and slow, they break down all the time, and they're unmanageable. The data warehouse team has no idea how to pinpoint the bottlenecks and problem areas of the system. A solid system design should incorporate the concepts described in detail in Kimball University: The Subsystems of ETL Revisited, by Bob Becker.
Good ETL system architects will design standard solutions to common problems such as surrogate key assignment. Excellent ETL systems will implement these standard solutions most of the time but offer enough flexibility to deviate from those standards where necessary. There are usually half a dozen ways to solve any ETL problem, and each one may be the best solution in a specific set of circumstances. Depending on your personality and fondness for solving puzzles, this can be either a blessing or a curse.
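As an illustration of what a standard solution to surrogate key assignment might look like, here is a minimal Python sketch. The class name `SurrogateKeyAssigner`, the `key_for` method and the sample customer IDs are hypothetical, chosen only for this example. A real ETL system would typically load the existing key map from the dimension table and persist new keys back to it.

```python
from itertools import count


class SurrogateKeyAssigner:
    """Maps natural (source-system) keys to warehouse surrogate keys.

    Hypothetical helper for illustration; not part of any ETL tool.
    """

    def __init__(self, existing=None, start=1):
        # existing: natural key -> surrogate key pairs already in the dimension
        self._keys = dict(existing or {})
        # Continue numbering after the highest key already assigned
        first_free = max(self._keys.values(), default=start - 1) + 1
        self._next = count(first_free)

    def key_for(self, natural_key):
        """Return the surrogate key, assigning a new one on first sight."""
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]


# Two customers already exist in the dimension; two new ones arrive.
assigner = SurrogateKeyAssigner(existing={"CUST-001": 1, "CUST-002": 2})
incoming = ["CUST-002", "CUST-003", "CUST-003", "CUST-004"]
assigned = [assigner.key_for(k) for k in incoming]
print(assigned)  # [2, 3, 3, 4]
```

Wrapping the logic in one reusable component is what makes it a standard solution: every dimension load calls the same code, and a load with unusual requirements can still bypass it.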
One of the rules you should try to follow is to write data as seldom as possible during the ETL process. Writing data, especially to the relational database, is one of the most expensive tasks that the ETL system performs. ETL tools contain functionality to operate on data in memory and guide the developer along a path to minimize database writes until the data is clean and ready to go into the data warehouse table. However, the relational engine is excellent at some tasks, particularly joining related data. There are times when it is more efficient to write data to a table, even index it, and let the relational engine perform a join than it is to use the ETL tool's lookup or merge operators. We usually want to use those operators, but don't overlook the powerful relational database when trying to solve a thorny performance problem.
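The trade-off between an in-memory lookup and a relational join can be sketched in Python with the standard library's `sqlite3` module. This is only an illustration under stated assumptions: the table names (`dim_customer`, `fact_sales`, `stage_sales`), the sample rows and both function names are hypothetical, and SQLite stands in for whatever relational engine hosts the warehouse. The first function resolves keys in memory and writes once; the second writes the raw rows to an indexed staging table and lets the database perform the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id TEXT)"
)
conn.execute("CREATE TABLE fact_sales (customer_key INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?)", [(1, "CUST-001"), (2, "CUST-002")]
)

# Hypothetical extracted rows: (natural customer id, sale amount)
source_rows = [("CUST-001", 10.0), ("CUST-002", 25.5), ("CUST-001", 4.5)]


def load_with_memory_lookup(conn, rows):
    """Resolve surrogate keys in memory, then write the clean rows once."""
    lookup = {
        cid: key
        for key, cid in conn.execute(
            "SELECT customer_key, customer_id FROM dim_customer"
        )
    }
    resolved = [(lookup[cid], amount) for cid, amount in rows]
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", resolved)


def load_with_staged_join(conn, rows):
    """Stage the raw rows, index them, and let the database do the join."""
    conn.execute("CREATE TABLE stage_sales (customer_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO stage_sales VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX ix_stage_customer ON stage_sales (customer_id)")
    conn.execute(
        """
        INSERT INTO fact_sales (customer_key, amount)
        SELECT d.customer_key, s.amount
        FROM stage_sales AS s
        JOIN dim_customer AS d ON d.customer_id = s.customer_id
        """
    )
    conn.execute("DROP TABLE stage_sales")


def fact_rows(conn):
    return conn.execute(
        "SELECT customer_key, amount FROM fact_sales ORDER BY customer_key, amount"
    ).fetchall()


load_with_memory_lookup(conn, source_rows)
memory_result = fact_rows(conn)

conn.execute("DELETE FROM fact_sales")
load_with_staged_join(conn, source_rows)
join_result = fact_rows(conn)

print(memory_result == join_result)  # True: both paths load the same facts
```

Both paths produce the same fact rows. The staged version performs an extra write, which only pays off when the join is large enough that the relational engine outperforms the in-memory lookup.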