Labs

March 20, 2000

Data Mining:
New Patterns In Your Data

DataDesk helps uncover information that other data mining tools might obscure. The product takes time to learn, but for those who need to analyze data frequently, the rewards are great.

By Michael Dineen

Related links:
  • SAS Tackles E-Intelligence (2/28/00)

  • Vendors Offer New Ways To Analyze Data (2/21/00)

  • Tools Aim To Help E-Businesses Improve Service (2/14/00)
    Products that turn data into information are more important than ever. The term "data mining," with its gritty image of soot-covered workers toiling in dark caves, underscores the perceived laboriousness of the task. Now, a set of tools and processes called DataDesk 6.1 for Windows, from Data Description Inc., offers valuable help.

    DataDesk helps find patterns, relationships, and trends in data sets, a process known in statistical jargon as exploratory data analysis, and in popular terms as data mining. A key to successful data mining is ease of manipulation; important patterns will remain undiscovered if you lack the tools to unearth them. DataDesk lets you explore data intuitively: it makes it extremely easy to play with your data and get visual feedback on relationships among your variables, so hidden patterns can be tracked down and articulated.

    By letting you manipulate the data directly, DataDesk lets you explore it without a programming interface getting in the way. Though the program is quick and intuitive once learned, you can't just load it and start working. It's a complex program, and users will need some time and serious willpower to learn it.

    Starting DataDesk for the first time can be a little disconcerting. It's essentially an empty screen, with several toolbars, a color palette, and a tiny text box. If you're used to Windows, the interface will look unfamiliar, as it's much more like that of a Macintosh. The window-manipulation icons (close, minimize, resize) look different, though they're in the same places they would be in a standard Microsoft window. You can't resize a window by dragging its border. Right-clicking usually has the same effect as left-clicking (the Mac mouse has only one button). There are no tool tips to explain what the buttons do.

    Importing data is extremely easy. DataDesk reads delimited or fixed-width text files, but the easiest way to import data is to paste it from the clipboard. I copied an Excel spreadsheet with 18,000 records and 40 variables to the clipboard, then opened DataDesk, which asked if I wanted to use the column headers as variable names. I clicked the affirmative and, within a few seconds, my data was ready to explore. Copying from a database table or query works the same way.
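The import step described above, where the first row of pasted or delimited text supplies the variable names, can be illustrated with a minimal Python sketch. This is not DataDesk's own mechanism; the function name `read_delimited`, the `Var1`, `Var2` fallback names, and the sample data are all hypothetical, and the sketch assumes every row has the same number of fields.

```python
import csv
import io

def read_delimited(text, delimiter="\t", header=True):
    """Parse delimited text into a dict mapping variable name -> list of values.

    Hypothetical helper for illustration; assumes rows are not ragged.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return {}
    if header:
        # Use the first row as variable names, as the review describes.
        names, body = rows[0], rows[1:]
    else:
        # No header row: invent placeholder names (an assumption of this sketch).
        names = [f"Var{i + 1}" for i in range(len(rows[0]))]
        body = rows
    # Store the data column by column, one list per variable.
    return {name: [row[i] for row in body] for i, name in enumerate(names)}

# Tab-delimited text, the form a spreadsheet typically puts on the clipboard.
sample = "age\tweight\n34\t150\n51\t172\n"
variables = read_delimited(sample)
print(variables["age"])
```

Storing one list per variable, rather than one record per row, mirrors the variable-centered view the review describes in the next paragraph.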

    Variables in DataDesk are stored as icons that you manipulate by clicking and dragging, rather than by using the variable names. The idea is to let you manipulate concepts, rather than variables, and is in keeping with the strategy of the program: to remain transparent and let you interact with the data directly. If you want to see the data values, double-click the icon and a list of values appears. Tables, plots, and other object types generated in the analysis are also stored as icons, each in a folder of similar objects.

    DataDesk doesn't use standard data types; you can mix numeric, string, or date values in the same variable. DataDesk's plots are controlled graphically rather than through textual commands, which is more natural, makes the program easier to use, and is consistent with its direct-manipulation design. Derived variables are always dynamically linked to their source variables--so if you change the source variables, any derived variables will change, too.
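One way to picture a dynamically linked derived variable is a value that is recomputed from its source each time it is read. The sketch below is only an illustration of that idea, not a description of how DataDesk implements it; the `Variable` and `DerivedVariable` classes and the sample values are hypothetical.

```python
class Variable:
    """A named column of values (hypothetical, for illustration)."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)


class DerivedVariable:
    """A variable defined by applying a function to a source variable.

    Nothing is cached: values are recomputed from the source on every
    access, so a change to the source shows up immediately.
    """

    def __init__(self, name, source, func):
        self.name = name
        self.source = source
        self.func = func

    @property
    def values(self):
        return [self.func(v) for v in self.source.values]


weight = Variable("weight", [150, 172])
doubled = DerivedVariable("weight_x2", weight, lambda v: v * 2)
print(doubled.values)   # computed from the current source values

weight.values[0] = 200  # edit the source variable
print(doubled.values)   # the derived variable reflects the edit
```

Recomputing on access is the simplest way to keep the two in step; a real tool would more likely cache results and invalidate them when the source changes.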

    If the data is unfamiliar to the analyst, he or she can begin an exploration with some bar, pie, and frequency charts by clicking on a variable icon and selecting the plot from a drop-down menu. To explore relationships between variables, select the target icons and pick a scatterplot from the menu. Interaction with charts is completely graphical--there's no specialized terminology to learn. Once you get used to it, manipulating data is very natural.

    Displays work dynamically with one another. For example, clicking a slice in a pie chart immediately highlights all the elements in that sector in a scatterplot or table. Niceties like this make this application effective for detecting trends and patterns. It's also easy to perform data transformations in order to make relationships more apparent, or to force a distribution to comply with the assumptions of parametric statistical techniques.
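The linked-display behavior can be thought of as a shared selection of row numbers: choosing a category in one view yields a set of rows, and every other view marks those same rows. The following sketch assumes that model; the function names and the sample data are hypothetical and are not taken from DataDesk.

```python
def select_rows(category_values, chosen):
    """Return the set of row indices whose category equals `chosen`."""
    return {i for i, value in enumerate(category_values) if value == chosen}


def highlight(points, selected):
    """Pair each point with a flag saying whether its row is selected."""
    return [(point, i in selected) for i, point in enumerate(points)]


# One categorical variable (what a pie chart would show) and one set of
# (x, y) pairs (what a scatterplot would show), row-aligned.
region = ["west", "east", "west", "south"]
points = [(1.0, 2.0), (2.5, 1.0), (3.0, 4.5), (0.5, 0.5)]

# "Clicking" the west slice selects rows; the scatterplot marks the same rows.
selected = select_rows(region, "west")
for point, is_selected in highlight(points, selected):
    print(point, "*" if is_selected else "")
```

Because the selection is expressed as row indices rather than as chart elements, any number of views over the same rows can share it.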

    The program even provides expert suggestions for where to go next with an analysis path. These hyperview menus help guide you through an analysis by providing suggestions for related tables, charts, or statistical tests. The program accumulates result icons in the results folder in the order they were produced, letting you go back to follow other analysis paths.

    New users will need a lot of help, and DataDesk provides it. The best way to get started is with the printed Quickstart Guide, which leads you through a hypothetical exploratory analysis and overview of the product's capabilities. Two manuals, a handbook and a statistics guide, give details of DataDesk's features. The online help is detailed but may leave novice users wondering how to get started. There's also a frequently-asked-questions page at the vendor's Web site.

    DataDesk demands a substantial initial outlay of time to learn, but it promises considerable rewards once you've mastered it. The high-investment/high-reward character of this product means it's not right for everyone. If you rarely need to do data mining, then it's probably not worth your time to use it. But if you or your business could profit from having the ability to uncover patterns in your data that have been overlooked by others, then I highly recommend DataDesk.

    Michael Dineen is a data analyst with the Colorado Department of Public Health. He can be reached at mdineen@uswest.net.

