Build A Data Warehouse That Cuts Costs--Not Corners
Smart planning and open-source tools deliver enterprise-quality data warehouse apps at prices smaller firms can afford, too.
It's not surprising that business intelligence and data warehouse solutions are pricey. User counts are rising, data volumes are growing, the variety of required data sources is widening, and refresh rates are quickening from weekly to near-real time. The big question is whether you're getting the bang for your investment buck that you need.
This article picks up on a theme I introduced in "BI on a Budget." There, I examined three ways to maximize your BI budget: nontraditional combinations of technologies, Web services, and in-database functions. All of these remain important strategies for getting the most out of the money you spend, but they aren't the only ones. This article explores project tasks you can execute and technologies you can implement that will squeeze the most out of your BI budget.
If you're really serious about maximizing your BI money, then start with your project planning. I can't tell you how many wonderfully detailed, well-researched, well-documented plans I've seen fail or at least stall. The reason is often a lack of explicit consideration of risk. This is a very important point. BI and data warehouse projects are initiated to support analytic applications, and analytic applications are some of the trickiest applications to develop. Why? Put simply, analytic applications are difficult to specify. Typical IT development projects, like building an order entry system, are much easier to define than analytic applications, which are stuffed with concepts like slice-and-dice, ad hoc queries, data pivoting, and drill-through. All these functionalities must support the dynamic interrogation process of an analyst.
Therefore, the risk associated with analytic applications is that users often don't know exactly what they want you to build until they start seeing part of the application. In other words, you must build an application before it's fully defined and specified. Risky, right?
So, in your project planning you must consider risk explicitly. To that end, there are two project tasks you must include and execute early in your plans: conducting a data quality audit and creating prototypes. Both will reduce much of the risk associated with analytic applications while saving considerable money.
The data quality audit answers the fundamental question of whether the source data supports the analysis required. Please read two of my columns for details on how to conduct such an audit: "Data Quality Discipline" and "The Architecture of Enterprise Data Quality." For now, suffice it to say that if you don't have the data necessary to support the BI application requirements, there are only three options available:
1. Clean the data at the source before you spend any money on the BI/DW application.

2. Attempt to clean the data during the ETL portion of the application, if possible. The audit will tell you whether you can achieve this task and give you a good sense of what that effort might entail. You can then go to the user community, armed with quantified information, and examine how that affects the budget before you start any effort.

3. Adjust the scope of your BI/DW application.
In all cases, you save your company money, time, and resources that would have otherwise been spent trying to complete a project that was destined to fail from the start.
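To make the audit concrete, here's a minimal sketch of the kind of column profiling such an audit starts with: measuring how complete and how varied each source column actually is. The function name, the sample data, and the patchy "region" column are all hypothetical, purely for illustration.

```python
import csv
import io

def audit_columns(rows):
    """Profile each column: rate of blank values and count of distinct non-blank values."""
    stats = {}
    total = len(rows)
    for col in rows[0].keys():
        values = [r[col].strip() for r in rows]
        blanks = sum(1 for v in values if not v)
        distinct = len({v for v in values if v})
        stats[col] = {
            "blank_rate": blanks / total if total else 0.0,
            "distinct": distinct,
        }
    return stats

# Hypothetical extract from a source system; note the missing regions.
sample = io.StringIO(
    "order_id,region,amount\n"
    "1001,East,250\n"
    "1002,,125\n"
    "1003,West,300\n"
    "1004,,90\n"
)
rows = list(csv.DictReader(sample))
report = audit_columns(rows)
print(report["region"])  # half the region values are blank
```

A report like this is exactly the "quantified information" you bring to the user community: if region drives the required analysis and half its values are blank, you now know that before any budget is spent.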
Another important task you can perform is prototyping. Your goal is to quickly source the data into a rough prototype of the deliverable. It can be cobbled together quickly by sidestepping all the formal processing and persistent data structures ultimately built for the final product. The value of the prototype, however, is often immeasurable. It gives the user a sense of what you're building and educates you on the challenges you'll face during the main project. For both the BI planners and the users, you now have an opportunity to modify the requirements, timeline, and budget before you start any project effort in earnest. This saves money!
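The prototyping idea above can be sketched in a few lines: a throwaway in-memory database stands in for the eventual warehouse, and a rough aggregate gives users something concrete to react to. The table, columns, and sample figures here are illustrative assumptions, not a proposed schema.

```python
import sqlite3

# Throwaway in-memory store; no formal ETL, no persistent structures.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "widgets", 250), ("East", "gadgets", 125),
     ("West", "widgets", 300), ("West", "gadgets", 90)],
)

# A rough slice of the data -- totals by region -- for users to critique.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

Showing even a crude result set like this surfaces requirement changes while they are still cheap to make.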
Open Source Software: FUD Vs. Reality
Fear, uncertainty, and doubt (FUD): the marketing strategy executed by established vendors when technically viable alternatives are available at incredible savings. FUD is used to raise concern in the minds of customers about adopting a technology with a smaller presence as opposed to a behemoth whose own product doesn't possess equivalent features or pricing. And, in many cases, it works. No decision maker wants to invest in a technology that's destined to fail or recommend a solution that's not readily recognized by company executives. Open source products are certainly nontraditional technology, which makes them an easy target for FUD.
Project planners and architects must recognize FUD when they hear it from vendors. The fact is that several prominent open source products have been effective tools in IT for years. Forward-thinking, enterprise-minded planners even have a software stack they explore called LAMP (Linux, Apache, MySQL, PHP/Python/Perl). Each product in the stack has a growing body of evidence substantiating it as proven technology. Everyone in the technical community has heard of them, and a couple of the products are even familiar to executives. Put simply: Open source works. These products shouldn't be dismissed as ineffective or unstable.
Of this LAMP software stack, I want to draw your attention to two great open source alternative technologies: Linux and MySQL. Both of these products can support small and large BI-centric projects, and they are especially important to companies on tight budgets. With no reservations, I believe that any BI-on-a-budget effort should have these two candidates on its short list. Their presence in your project will have a positive, if not dramatic, impact on your budget.