Get Started With Data Mining Now - InformationWeek

Get Started With Data Mining Now

Are you missing valuable data mining opportunities?

Data mining has come into its own over the past decade, taking a central role in many businesses. We're all the subject of data mining dozens of times a day—from the direct mail we receive to the fraud-detection algorithms that scrutinize our every credit card purchase.

Data mining is widespread because it works. The techniques can significantly improve an organization's ability to reach its goals. Data mining's popularity is also rising because the tools are better, more broadly available, cheaper and easier to use.

Many data warehouse/business intelligence (DW/BI) teams aren't sure how to get started with data mining. This column presents a business-based approach that will help you successfully add data mining to your DW/BI system.

The data mining process must begin with an understanding of business opportunities. The diagram on the right shows the three phases of the data mining process, major task areas within those phases and common iteration points.

The Business Phase

This first phase is a more focused version of the overall DW/BI requirements gathering process. Identify and prioritize a list of opportunities that can have a significant business impact. The business opportunities and data understanding tasks in the diagram are linked because any opportunity you identify must be grounded in the data that is actually available. By the same token, the data itself may suggest business opportunities.

As always, the most important step in successful BI isn't about technology; it's about understanding the business. Meet with businesspeople about potential opportunities and the associated relationships and behaviors captured in the data. The goal of these meetings is to identify several high-value opportunities and carefully examine each one. First, describe business objectives in measurable ways. "Increase sales" is too broad; "reduce the monthly churn rate" is more manageable. Next, think about what factors influence the objective. What might indicate that someone is likely to churn? How can you tell if someone would be interested in a given product? While you're discussing these factors, try to translate them into specific attributes and behaviors that are known to exist in a usable, accessible form.
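To make "measurable" concrete, here is a minimal sketch of one way to compute a monthly churn rate. The definition used (customers lost during the month divided by customers active at the start of the month) is an assumption for illustration, as are the function name and the sample figures; your business may define churn differently.

```python
def monthly_churn_rate(customers_at_start, customers_lost):
    """Fraction of the month's starting customers who left during the month.

    Assumed definition: customers_lost / customers_at_start.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start


# Hypothetical figures: 2,000 active customers on the 1st, 50 gone by month end.
rate = monthly_churn_rate(customers_at_start=2000, customers_lost=50)
print(f"Monthly churn rate: {rate:.1%}")
```

Agreeing on a definition like this with the business up front gives later modeling work a target that can be tracked month over month.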

After several meetings with different groups to identify and prioritize a range of opportunities, take the top-priority business opportunity and its associated list of potential variables back to the DW for further exploration. Spend a lot of time exploring the data sets that might be relevant to the business opportunities discussed. At this stage, the goal is to verify that the data needed to support the business opportunity is available and clean enough to be usable.

You can discover many of the content, relationship and quality problems firsthand through data profiling—using query and reporting tools to get a sense of the content under investigation. While data profiling can be as simple as writing some SQL SELECT statements with COUNTs and DISTINCTs, several data profiling tools can provide complex analysis that goes well beyond simple queries.
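As a rough illustration of the simple end of data profiling, the sketch below runs a COUNT and COUNT(DISTINCT) query against one column using Python's built-in sqlite3 module. The customer table, its columns and the sample rows are all hypothetical; in practice you would point the same kind of query at your own DW tables.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return total rows, null count and distinct non-null values for a column."""
    # Table and column names cannot be bound as SQL parameters,
    # so pass only trusted identifiers to this helper.
    sql = (
        f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    )
    total, non_null, distinct = conn.execute(sql).fetchone()
    return {"rows": total, "nulls": total - non_null, "distinct": distinct}


# Hypothetical sample data standing in for a DW customer dimension.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER, customer_type TEXT, marital_status TEXT)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (1, "retail", "married"),
        (2, "retail", None),
        (3, "wholesale", "single"),
        (4, "Retail", "married"),  # inconsistent capitalization
    ],
)

customer_type_profile = profile_column(conn, "customer", "customer_type")
marital_status_profile = profile_column(conn, "customer", "marital_status")
print(customer_type_profile)
print(marital_status_profile)
```

Even this small profile surfaces two typical quality problems: a missing marital_status value, and a customer_type column with three distinct values where the business probably expects two.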

Once you have a clear, viable opportunity identified, document the following:

  • Business opportunity description
  • Expected data issues
  • Modeling process description
  • Implementation plan
  • Maintenance plan

Finally, review the opportunity and documentation with businesspeople to make sure you understand their needs and they understand how you intend to meet them.

The Data Mining Phase

Now you get to build some data mining models. The three major tasks in this phase involve preparing the data, developing alternative models and comparing their accuracy, and validating the final model. As shown in the diagram at right, this is a highly iterative process.

The first task in this phase is to build the data mining case sets. A case set includes one row per instance or event. For many data mining models, this means a data set with one row per customer. Models based on simple customer attributes, such as gender and marital status, work at the one-row-per-customer level. Models that include repeated behaviors, such as purchases, include data at the one-row-per-event level.
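The difference between the two grains can be sketched with a few hypothetical records. The field names, the sample data and the helper function below are assumptions for illustration only; a real case set would be drawn from your DW tables.

```python
# One row per customer: simple attributes only.
customers = [
    {"customer_id": 1, "gender": "F", "marital_status": "married"},
    {"customer_id": 2, "gender": "M", "marital_status": "single"},
]

# Repeated behavior: each purchase is a separate event.
purchases = [
    {"customer_id": 1, "product": "tent", "amount": 120.0},
    {"customer_id": 1, "product": "stove", "amount": 35.0},
    {"customer_id": 2, "product": "boots", "amount": 80.0},
]


def build_event_case_set(customers, purchases):
    """Return one row per purchase, each carrying its customer's attributes."""
    by_id = {c["customer_id"]: c for c in customers}
    return [{**by_id[p["customer_id"]], **p} for p in purchases]


customer_cases = customers  # one-row-per-customer grain
event_cases = build_event_case_set(customers, purchases)  # one-row-per-event grain

for row in event_cases:
    print(row)
```

Here the customer-level case set has two rows, while the event-level case set has three, one for each purchase, with the customer attributes repeated on every row.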

A well-designed and built dimensional DW is a perfect source for data mining case data. Ideally, many variables identified in the business opportunity already exist as cleansed DW attributes—often true with fields such as customer_type or product_color. The data miner's world gets even better when demographics and other external data are already loaded into the DW using conformed dimensions.
