
Successful Mining

Understand both business and technology

By Herb Edelstein
Issue Date: Jan. 8, 1996

The key to data mining success is understanding both the business problem and the technology. All too often, people try to apply a data mining tool blindly on a database and expect to get a usable result. Without an understanding of the problem domain, it's very easy to misuse a data mining product.

For example, many products require you to use portions of the available data. These portions may be a subset of the rows (a sample of the data), a selection of columns (variables), or both. You can only choose this subset properly if you understand your problem and your data. Some products will try to sample the data automatically; the cautious user must understand whether the basis the product uses for row selection will give the desired result.
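To make the row and column choice concrete, here is a minimal sketch in Python. It is not how any particular product samples data. The customer records, the `defaulted` column, and the helper names are hypothetical. It draws the same fraction from each group so that a rare outcome is not lost from the sample, then keeps only the chosen variables.

```python
import random
from collections import defaultdict

def select_columns(rows, columns):
    """Keep only the named columns (variables) from each row."""
    return [{c: row[c] for c in columns} for row in rows]

def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from each group of `key`, keeping at least one row per group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    sample = []
    for group_rows in groups.values():
        k = max(1, round(len(group_rows) * fraction))
        sample.extend(rng.sample(group_rows, k))
    return sample

# Hypothetical data: 990 ordinary customers and 10 who defaulted.
rows = [{"id": i, "income": 30000 + i, "defaulted": i < 10} for i in range(1000)]
sample = stratified_sample(rows, key="defaulted", fraction=0.1)
subset = select_columns(sample, ["income", "defaulted"])
```

A simple random sample of the same size could, by chance, contain few or none of the ten defaulting customers. Whether stratifying is the right basis depends on the business question, which is the point of the paragraph above.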

Furthermore, not every approach or algorithm is appropriate for every problem. Understanding the limitations of algorithms and how they use data is essential in interpreting results. A result that is too close to perfect may indicate that the pattern you are searching for is already coded into the data in a disguised format; for example, a variable may depend on a value calculated from other parts of the database.
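One rough way to look for a disguised copy of the answer is to ask how well each variable predicts the target on its own. The sketch below is an illustration under assumed data. The column names, the 0.99 threshold, and both function names are hypothetical. It flags any variable whose values almost perfectly determine the outcome.

```python
from collections import Counter, defaultdict

def single_variable_accuracy(rows, variable, target):
    """Accuracy of predicting `target` with the majority outcome for each value of `variable`."""
    outcomes = defaultdict(Counter)
    for row in rows:
        outcomes[row[variable]][row[target]] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in outcomes.values())
    return correct / len(rows)

def suspicious_variables(rows, variables, target, threshold=0.99):
    """Return the variables that, taken alone, predict the target at or above `threshold`."""
    return [v for v in variables
            if single_variable_accuracy(rows, v, target) >= threshold]

# Hypothetical data: account_status is set only after a customer has already churned.
rows = [
    {"region": "east", "account_status": "closed", "churned": True},
    {"region": "east", "account_status": "open", "churned": False},
    {"region": "west", "account_status": "closed", "churned": True},
    {"region": "west", "account_status": "open", "churned": False},
]
flagged = suspicious_variables(rows, ["region", "account_status"], "churned")
```

A flagged variable is a prompt for investigation, not proof of a problem. Variables with nearly unique values, such as identifiers, will also score high on this check and should be excluded beforehand.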

Perhaps the biggest problem in data warehousing in general and data mining in particular is the quality of the data. One of the earliest principles of data processing applies here: garbage in, garbage out. It is absolutely critical to ensure that the data is as clean as possible and has as few missing values as possible. Because there inevitably will be a certain amount of bad and missing data in the data warehouse, you will need to understand how this can affect results.

If your model is highly sensitive to a particular variable, you should make sure that small amounts of incorrect or missing data in that variable haven't yielded skewed results. You must continually monitor data quality as you add data to your warehouse.
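A simple way to gauge this sensitivity is to fill the missing values with the lowest and then the highest plausible value and compare the results. The sketch below assumes a hypothetical income cutoff rule standing in for a model. The figures, the bounds, and the function names are illustrative only.

```python
def sensitivity_to_missing(values, statistic, low, high):
    """Return `statistic` with missing entries (None) filled by `low`, then by `high`."""
    filled_low = [low if v is None else v for v in values]
    filled_high = [high if v is None else v for v in values]
    return statistic(filled_low), statistic(filled_high)

def approval_rate(incomes, cutoff=40000):
    """Hypothetical rule: share of records with income at or above the cutoff."""
    return sum(1 for x in incomes if x >= cutoff) / len(incomes)

# Hypothetical data: two of ten income values are missing.
incomes = [52000, None, 38000, 61000, None, 45000, 39000, 70000, 41000, 36000]
lo_rate, hi_rate = sensitivity_to_missing(incomes, approval_rate, low=0, high=100000)
```

Here the rate ranges from 0.5 to 0.7 depending only on how the two missing values are treated. A spread that wide suggests the variable deserves closer attention before the result is trusted.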

A formal examination of your data can help you build your model and improve its quality. This can range from a series of queries to some preliminary data mining.
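At the query end of that range, an examination can be as basic as counting missing and distinct values and checking numeric ranges for each column. The following is a minimal sketch over in-memory rows. The `profile` function and the sample records are hypothetical, and a real warehouse would more likely run equivalent queries in the database.

```python
def profile(rows):
    """Summarize each column: missing count, distinct count, and numeric range."""
    columns = sorted({name for row in rows for name in row})
    report = {}
    for name in columns:
        present = [row[name] for row in rows if row.get(name) is not None]
        summary = {"missing": len(rows) - len(present),
                   "distinct": len(set(present))}
        # Report a range only when every present value is numeric.
        if present and all(isinstance(v, (int, float)) for v in present):
            summary["min"], summary["max"] = min(present), max(present)
        report[name] = summary
    return report

# Hypothetical data with typical problems: an impossible age, a zero age,
# inconsistent capitalization, and missing values.
rows = [
    {"age": 34, "state": "NY"},
    {"age": 199, "state": "ny"},
    {"age": None, "state": "CA"},
    {"age": 0, "state": None},
]
report = profile(rows)
```

In this sample the report shows an age range of 0 to 199 and three distinct states where there are really two, both of which point to cleanup work before any model is built.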

No matter how good your model is, it will likely need to change over time. A classification scheme that works in an era of 3% inflation may not be as effective in an era of 6% inflation.



InformationWeek http://techweb.cmp.com/iwk



