Whether your ETL system is hand-coded or tool-based, it's your job to design the system for manageability, auditability, and restartability. Your ETL system should tag all rows in the data warehouse with some kind of batch identifier or audit key that describes exactly which process loaded the data. Your ETL system should log information about its operations, so your team can always know exactly where the process is now and how long each step is expected to take. You should build and test procedures for backing out a load, and, ideally, the system should roll back transactions in the event of a midstream failure. The best systems monitor data health during extraction and transformation, and they either improve the data or issue alerts if data quality is substandard. ETL tools can help you with the implementation of these features, but the design is up to you and your team.
SUMMING IT UP
Should you use an ETL tool? Yes. Do you have to use an ETL tool? No. For teams building their first or second ETL system, the main advantage of visual tools are self-documentation and a structured development path. For neophytes, these advantages are worth the cost of the tool. If you're a seasoned expert — perhaps a consultant who has built dozens of ETL systems by hand — it's tempting to stick to what has worked well in the past. With this level of expertise, you can probably build a system that performs as well, operates as smoothly, and perhaps costs less to develop than a tool-based ETL system. But many seasoned experts are consultants, so you should think objectively about how maintainable and extensible a hand-crafted ETL system might be once the consultant has moved on.
Don't expect to reap a positive return on investment in an ETL tool during the development of your first system. The advantages will come as that first phase moves into operation, as it's modified over time, and as your data warehouse grows with the addition of new business process models and associated ETL systems.
For more on ETL design and best practices see The Data Warehouse ETL Toolkit, by Ralph Kimball and Joe Caserta, or consider attending an upcoming Kimball University class on "ETL Architecture in Depth."
Joy Mundy has worked in the date warehousing/business intelligence industry since 1992, including a stint at Microsoft's SQL Server product development organization. She co-authored The Microsoft Data Warehouse Toolkit.