Kimball University: Data Stewardship 101: First Step to Quality and Consistency
Data stewards are the liaisons between business users and the data warehouse team, and they ensure consistent, accurate, well-documented and timely insight into data resources and requirements.
Bob Becker
University DW/BI Best Practices
Consistent data is the Holy Grail for most data warehouse initiatives, and data stewards are the crusaders who fearlessly strive toward that goal. An active data stewardship program identifies, defines and protects data across the organization. Stewardship ensures the initial effort to populate the data warehouse is done correctly, while significantly reducing the amount of rework necessary down the road. Data stewards enforce discipline and serve as a conduit between the business and IT.
The primary focus of a stewardship team is determining an organization's data warehouse content, defining common definitions, assuring data quality and managing appropriate access. They help create and enforce firm vocabularies and associated business rules. In many organizations, the same words are used to describe different things, different words are used to describe the same thing, and the same descriptor or value may have several different meanings. A stewardship team can reach across the organization to develop consistently defined business terminology.
Why Stewardship Is Essential
An active stewardship program lets organizations improve their understanding of corporate data assets, discover the relationships among the data, consolidate metadata describing the data and ultimately transform data into actionable information.
An enterprise data warehousing effort committed to dimensional modeling must commit to conformed dimensions to ensure consistency across multiple business processes. (A conformed dimension is a single, coherent view of the same piece of data throughout the organization.) An active data stewardship program can help large enterprises tackle the difficult task of agreeing upon conformed dimensions, which is more of a communication challenge than a technical one. Various groups across the enterprise are often committed to their own proprietary business rules and definitions. Data stewards must work closely with all interest groups, cajoling them where necessary, to develop and embrace common business rules and definitions across the enterprise.
The primary goal of a data stewardship program is to provide the enterprise with legible, consistent, accurate, documented and timely information about its data resources. Stewardship also ensures that authorized individuals use corporate data correctly and to its fullest potential.
Data stewardship also demystifies the analytic process for the rest of the organization. It provides resources to resolve data-related questions such as:
• Where do I get this information?
• What does the data mean?
• How does it relate to other data?
• Where did this data come from?
• How frequently is this data updated?
• How reliable is this data?
• How much history do we have for this data?
A key goal is ensuring that data warehousing efforts align with the business strategy. Stewards spend a great deal of time working outside the data warehouse team. They should be available to business users, offering a one-stop source for analytic knowledge. They are the primary resource for business users starting a new analytic process, and they can ensure that these users are going in the right direction, potentially saving hours or days of unproductive effort. The data warehouse team can more quickly deliver new iterations of the data warehouse by relying on steward knowledge. Stewards ensure that the organization can develop consistent fact-based analytic applications.
Roles and responsibilities may vary depending on whether the steward is responsible for dimension tables, fact tables or both. In general, a data steward must:
• Become familiar with the business users and their various usage profiles to convey requirements and ease-of-use concerns to the data warehouse project team.
• Understand business requirements and how the data supports those requirements to help users leverage corporate data.
• Develop in-depth knowledge of the structure and content of the data warehouse (including tables, views, aggregates, attributes, metrics, indexes, primary and foreign keys, and joins) to answer data-related questions and enable a broader audience to analyze the data directly.
• Interpret new and changing business requirements to determine their impact on data warehouse design and to propose enhancements and changes to meet these new requirements.
• Analyze the potential impact of data definition changes proposed by the business and communicate related requirements to the entire data warehouse team.
• Get involved early in source-system enhancement or content changes to ensure that the data warehouse team is prepared to accept these changes.
• Comply with corporate and regulatory policies to verify data quality, accuracy and reliability, including establishing validation procedures to be performed after each data load and prior to its release to the business. Stewards must withhold new data and communicate status if significant errors are identified.
• Establish and perform data certification processes and procedures while exercising proper due diligence in ensuring compliance with related corporate and regulatory requirements.
• Provide metadata that describes the data, offers a business description/definition and identifies the source data element(s) and any business rules or transformations used to deliver the data.
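The post-load validation and release step described in the list above could be automated along the following lines. This is a minimal sketch, not a prescribed procedure: it assumes a load arrives as a list of Python dicts, and the check names, the blocking flag (standing in for "significant" errors) and the sample rows are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Row = dict  # hypothetical: one loaded record as a column -> value mapping


@dataclass
class Check:
    """A hypothetical post-load validation rule agreed with the data steward."""
    name: str
    run: Callable[[list[Row]], list[str]]  # returns problem descriptions; empty list means pass
    blocking: bool = True  # assumption: only blocking ("significant") failures withhold the load


@dataclass
class LoadStatus:
    released: bool
    issues: dict[str, list[str]] = field(default_factory=dict)


def min_row_count(expected_min: int) -> Check:
    def run(rows: list[Row]) -> list[str]:
        if len(rows) < expected_min:
            return [f"expected at least {expected_min} rows, got {len(rows)}"]
        return []
    return Check(f"min_row_count:{expected_min}", run)


def not_null(column: str, blocking: bool = True) -> Check:
    def run(rows: list[Row]) -> list[str]:
        return [f"row {i}: {column} is null"
                for i, row in enumerate(rows) if row.get(column) is None]
    return Check(f"not_null:{column}", run, blocking)


def validate_load(rows: list[Row], checks: list[Check]) -> LoadStatus:
    """Run every check after a load; release only if no blocking check failed."""
    issues: dict[str, list[str]] = {}
    blocked = False
    for check in checks:
        problems = check.run(rows)
        if problems:
            issues[check.name] = problems  # kept for the status report to business users
            blocked = blocked or check.blocking
    return LoadStatus(released=not blocked, issues=issues)


rows = [{"order_id": 1, "amount": 10.0, "promo": None},
        {"order_id": 2, "amount": None, "promo": "SPRING"}]
status = validate_load(rows, [min_row_count(2), not_null("amount"),
                              not_null("promo", blocking=False)])
print(status.released)  # False
print(status.issues)
```

Every failure is recorded so the steward can communicate the state of the load, but only failures marked as blocking hold the data back from the business; how a team decides which checks are blocking is a stewardship decision, not something the code can settle.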
In addition to all of the above, data stewards who are specifically responsible for conformed dimensions for the enterprise must help forge agreement on their definition and use in downstream analytic environments. They also must determine departmental interdependencies and ensure that the conformed dimensions meet business needs across business processes and departments. In some large organizations, developing consensus on conformed dimensions can be a significant political challenge, so data stewards need to communicate and coordinate with the other stewards, reach agreement on data definitions and domain values, and minimize conflicting or redundant efforts. When conflicts arise, stewards must get data warehouse senior management sponsors involved to resolve cross-departmental issues.
Stewards who support fact tables must ensure that conformed dimensions are used in their creation to avoid redundant or nonconforming tables. They must also ensure that any metrics used in multiple fact tables are conformed across business events. Finally, they must understand any consolidated or aggregate tables built on their fact tables, and they must put processes in place to remove aggregated tables that may be invalidated by slowly changing dimensions.
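Two of the conformance checks a fact-table steward might ask for can be sketched as follows. This is a minimal illustration assuming dimension and fact tables are available as lists of Python dicts; the function names, the product_key column and the sample values are hypothetical.

```python
from __future__ import annotations

Row = dict  # hypothetical: one table row as a column -> value mapping


def orphan_keys(fact_rows: list[Row], fact_key: str,
                dimension_rows: list[Row], dim_key: str) -> set:
    """Key values used in the fact rows that have no matching conformed-dimension row."""
    conformed_keys = {row[dim_key] for row in dimension_rows}
    return {row[fact_key] for row in fact_rows} - conformed_keys


def attribute_conflicts(local_dim: list[Row], conformed_dim: list[Row],
                        key: str, attributes: list[str]) -> list[tuple]:
    """(key, attribute, local value, conformed value) wherever a locally built copy
    of a dimension disagrees with the conformed dimension. Keys missing from the
    conformed dimension are skipped here; orphan_keys can be used to find those."""
    conformed_by_key = {row[key]: row for row in conformed_dim}
    conflicts = []
    for row in local_dim:
        master = conformed_by_key.get(row[key])
        if master is None:
            continue
        for attr in attributes:
            if row.get(attr) != master.get(attr):
                conflicts.append((row[key], attr, row.get(attr), master.get(attr)))
    return conflicts


product_dim = [{"product_key": 1, "category": "Beverages"},
               {"product_key": 2, "category": "Snacks"}]
sales_fact = [{"product_key": 1, "units": 5}, {"product_key": 3, "units": 2}]
marketing_products = [{"product_key": 1, "category": "Drinks"},
                      {"product_key": 2, "category": "Snacks"}]

print(orphan_keys(sales_fact, "product_key", product_dim, "product_key"))  # {3}
print(attribute_conflicts(marketing_products, product_dim, "product_key", ["category"]))
# [(1, 'category', 'Drinks', 'Beverages')]
```

The first check flags fact rows that point at keys the conformed dimension does not contain; the second flags a departmental copy of a dimension that has drifted from the agreed definitions, the kind of redundant or nonconforming table the steward is trying to prevent.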
Data stewards should be well-respected, experienced subject-matter experts with a solid understanding of the business area supported and a commitment to working through the inevitable cross-functional challenges. Data stewards need strong communication skills to talk to business users in their language while translating their requirements for the data warehouse team. Stewardship typically involves more cultural than technical challenges, so these individuals need to be organizationally and politically savvy. An effective data steward needs a mature attitude toward interpersonal relationships and organizational wrangling in order to deal with conflict, and should also be comfortable with technology and have a working knowledge of database concepts. Depending on the complexity of the source systems, the industry and the breadth of the data warehouse environment, it may take one or two years before a new data steward becomes truly productive.
Keys to Rewarding Relationships
Data stewards are an integral part of the data warehouse team. They need to take part in ETL design and development to make sure the transformation rules correctly interpret the business rules. Stewards interact with the data model design team to ensure that table designs support business requirements and are easy to use in terms of both efficient query performance and consistent data retrieval. They work closely with the testing team to validate that the data being populated in the data warehouse is correct and meets users' requirements.
Data warehouses must be business-focused to be successful, so data stewards must interact with business constituents constantly. Data stewards who become too data- or IT-centric risk losing touch with the needs of their business users. At the same time, data stewards must get the business involved in the data warehouse effort and keep it involved. Depending on organizational and cultural considerations, this might mean participation in requirements definition and data model design sessions or simple validation of plans. The more involved users are, the more likely they will be to embrace the data warehouse environment and the more likely the warehouse will remain in sync with business objectives and strategies. Business-user expectations are a key input to the data warehouse team, and data stewards must communicate those expectations.
Communication Tools and Techniques
Stewards need to foster an open, approachable atmosphere so business users are at ease approaching them for assistance in framing an analytic request to the data warehouse team. Stewards also need more formal communications approaches, such as maintaining an e-mail distribution list and relaying "the state of the data," including known issues and inconsistencies, to interested business users. Many data warehouse teams use Web sites to communicate with business constituencies, in which case stewards should help determine and provide the site content.
Data stewards should also participate in and present at meetings and educational venues provided by peers in the business community, taking advantage of any chance to increase corporate awareness and knowledge of the data warehouse and its capabilities. They also can use these opportunities to gather feedback, including suggestions for warehouse improvement and requests for future warehouse iterations.
How To Get Started
Every organization, whether or not it is successful in its data warehousing efforts, has individuals fulfilling the roles and responsibilities of data stewardship. To develop a formal stewardship program, identify the individuals handling these responsibilities and organize their activities.
Establishing an effective stewardship program requires a strong leader with a solid vision of potential benefits. Gaining senior management support for the initiative is critical. In the early stages of the program, it may be necessary to involve senior management to help arbitrate and ensure consensus across the enterprise. Your organization's unique circumstances will drive the strategy; however you get there, your data warehousing efforts will be much more successful with a solid data stewardship program in place.
Quick Study
A data stewardship program identifies, defines and protects data across the organization, and it's particularly beneficial to the many organizations that struggle to embrace a unified set of conformed dimensions. The primary goal of data stewardship is to provide consistent, accurate, documented and timely information about data resources. Data stewards are the liaisons between users and the data warehouse team, and they must communicate and balance business requirements and technical constraints, a role that requires cultural sensitivity and political skill. Most important, stewards must:
* Interpret new and changing business requirements to determine the impact on data warehouse design.
* Analyze the impact of data definition and source-system changes.
* Comply with corporate and regulatory policies to verify data quality, accuracy and reliability.
* Establish and perform data certification processes.
* Provide metadata that describes the data, identifies source data element(s) and details business rules or transformations.
It's a tough assignment, but one that's better addressed through a formal stewardship program than through disorganized, ad hoc efforts to solve problems after time and energy have been wasted on conflicting approaches.
Bob Becker is a member of the Kimball Group. He has focused on dimensional data warehouse consulting and education since 1989.