Kimball University: Five Alternatives for Better Employee Dimension Modeling
The employee dimension presents one of the trickier challenges in data warehouse modeling. These five approaches ease the complication of designing and maintaining a 'Reports To' hierarchy for ever-changing reporting relationships and organizational structures.
Alternative 4: Forced Fixed Depth Hierarchy Technique
It is tempting to force the structure into a fixed depth hierarchy. Even a very large company probably has fewer than 15-20 layers of management, which would be modeled as 15-20 additional columns in the Employee dimension. You'll need to implement a method of handling the inevitable future exceptions. A fixed depth employee dimension table is illustrated in Figure 5.
The Employee Org Level Number tells us what level from the top of the hierarchy we'll find this employee. Usually we fill in the lower levels with the employee's name.
At query time, the forced fixed depth hierarchy approach will work smoothly with both relational and OLAP data access. The biggest awkwardness is to train the users to query the Org Level Number first to find out the level where the employee is located -- for example Level 5 -- and then constrain on that column (Level05 Manager Name). A design that uses this approach must very carefully evaluate whether this two step query procedure is actually workable with a particular query tool and consider the training costs for the business users. Query performance should be substantially better than designs that include a bridge table.
The forced fixed depth approach is maintainable, but you will see a lot of propagation of Type 2 rows. If the entire fixed depth hierarchy is managed as Type2, then a new CEO (Level01 Manager) would result in a new row for every employee. Some organizations compromise by managing the top several levels as Type1.
Alternative 5: The PathString Attribute
By now you're probably desperate for a recommendation. Two years ago, a clever student in a Kimball University modeling class described an approach that allows complex ragged hierarchies to be modeled without using a bridge table. Furthermore, this approach avoids the Type 2 SCD explosion described in Alternative #1, and it works equally well in both OLAP and ROLAP environments.
The PathString attribute is a field in the Employee dimension that contains an encoding of the path from the supreme top level manager down to the specific employee. At each level of the hierarchy, the nodes are labeled left to right as A, B, C, D, etc. and the entire path from the supreme parent is encoded in the PathString attribute. Every employee has a PathString attribute. The supreme top level manager has a PathString value of "A". The "A" indicates that this employee is the left most (and only) employee at that level. Two additional columns would hold the level number, and an indicator of whether the employee is a manager or an individual contributor. Figure 6 shows a sample organization chart with PathString values for each node.
Users query the tree by creating a filter condition on the PathString column in the Employee dimension. For example, we can find all the people who report (directly or indirectly) to the employee with PathString ACB by filtering WHERE PathString LIKE 'ACB%'. We can find direct reports by adding a clause AND OrgLevel = 4.
The advantage of the PathString approach is its maintainability. Because of this clever structure, you will see substantially fewer Type 2 rows cascading through the dimension. An organizational change high in the tree -- such as creating a new VP organization and moving many people from one node to another -- will result in a substantial restatement of the tree. If you're tracking the organizational structure itself as Type 2, this would mean many new rows in the employee dimension. But it's still fewer rows than the alternative approaches.
The main disadvantage of the PathString approach is the awkwardness of the business user query experience. This solution will require substantial marketing and education of the user community for it to be palatable.
Hopefully when you study these alternatives, you will see one that meets your needs. A Type2 Reports To or variable-depth hierarchy is a challenging beast to include in your DW/BI design. This is particularly true if you want to support ad hoc use of the structure as you'll need to balance ease of use and query performance against some very difficult maintenance problems. The decision matrix is complicated by the different capabilities of alternative storage engines, especially the differences between relational and OLAP.
The sad conclusion is that there is no universally great solution to the problem. To make the best choice, you need to have a deep understanding of both your data and your business users' requirements. It's always best to strive for that understanding, but in this case, it's imperative.
The Agile ArchiveWhen it comes to managing data, don’t look at backup and archiving systems as burdens and cost centers. A well-designed archive can enhance data protection and restores, ease search and e-discovery efforts, and save money by intelligently moving data from expensive primary storage systems.
2014 Analytics, BI, and Information Management SurveyIT’s tried for years to simplify data analytics and business intelligence efforts. Have visual analysis tools and Hadoop and NoSQL databases helped? Respondents to our 2014 InformationWeek Analytics, Business Intelligence, and Information Management Survey have a mixed outlook.
Join InformationWeek’s Lorna Garey and Mike Healey, president of Yeoman Technology Group, an engineering and research firm focused on maximizing technology investments, to discuss the right way to go digital.