
Predictive Analytics: March Madness Style

Two university professors used predictive analytics to pick this year's "at-large" NCAA basketball tournament invitees with 94 percent accuracy, and they're developing ways to predict actual game winners.
Each year the NCAA Basketball Tournament brings a level of energy that few sporting events equal. It conjures images of tourneys past and gives fans and players plenty of things to consider. Business intelligence, it's safe to say, is not usually one of them. For two business professors, however, BI is at the heart of predicting who will make it to "The Big Dance."

Back in the 1990s, Allen Lynch of Mercer University in Georgia and Jay Coleman of the University of North Florida combined their business skills and passion for college hoops to create the Dance Card, a formula designed to predict which teams would receive "at-large" bids to the tournament. "At-large" refers to those teams that do not receive automatic invites to the tournament, and until Selection Sunday no one knows exactly who those teams will be. "It's anyone's guess" has no doubt been uttered around an office water cooler or two. This is where Lynch and Coleman's Dance Card comes into play. Using a predictive analytics technology developed by SAS, the pair has removed much of the guesswork normally associated with the selection process.

To create the Dance Card, Lynch and Coleman pulled data from the old RPI index (available at CollegeRPI.com) and various public Web sources. Next came the process of actually loading all the data and creating a formula, which could seem like a grueling and time-consuming task. But as Lynch explained, the SAS program allowed them to manipulate the large data sets very efficiently. By using algorithms supplied in the program, they were able to conduct statistical analysis with just a few simple commands.

Flexibility was another important factor for the pair, given not only the large amount of data, but also the variables they needed to work with. The SAS model gives different interfaces to different users, so they can work on their data sets in a way that works best for them. As Coleman noted, the goal was to identify the most important information and boil it down to one relevant number. According to Anne Milley, Director of Analytic Strategy at SAS, the goal of the company's predictive models, which are typically used in business scenarios, is to help pinpoint what will benefit a business by creating variables from the information it has collected and putting those variables to work. The bottom line is to make the most of the data: to go beyond OLAP and reporting on what has happened, and begin to understand what will happen.

This is precisely what Lynch and Coleman have done with the Dance Card. This year they correctly predicted 32 of the 34 "at-large" teams (94 percent) based on the new RPI index adopted by the NCAA this year. Using the old RPI index (which the Dance Card was based on), they went 33 of 34 (65 of 66 total) for 97 percent accuracy. Overall accuracy for the past twelve seasons is 94 percent.
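For readers who want to check the arithmetic behind those percentages, here is a minimal Python sketch. The function name accuracy_pct is made up for this illustration; only the counts (32 of 34 and 33 of 34) come from the figures above.

```python
def accuracy_pct(correct, total):
    """Return the share of correct picks as a percentage."""
    return 100.0 * correct / total

# Counts taken from the article's reported at-large results.
new_rpi_pct = accuracy_pct(32, 34)  # picks scored against the new RPI index
old_rpi_pct = accuracy_pct(33, 34)  # picks scored against the old RPI index

print(f"New RPI: {new_rpi_pct:.0f} percent")
print(f"Old RPI: {old_rpi_pct:.0f} percent")
```

Rounded to the nearest whole number, the two results match the 94 and 97 percent figures quoted above.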

The predictive model Lynch and Coleman used helped ensure more consistent results, Lynch said. "It limits variability associated with repeated decisions," he noted.

Recently, Lynch and Coleman added a new twist with the Score Card, which predicts the results of specific Tournament games, and has no doubt turned heads among Las Vegas bookmakers. Employing the same predictive model, the pair used four variables on each team. Once the conference championship games were over, they entered those variables into a formula to arrive at a Power Index for each team. The team with the higher Power Index was predicted to be the winner. When the formula was run against results from the 2001-2004 tournaments, overall accuracy was 75 percent.
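The general idea of comparing two teams on a single index, then checking the picks against past results, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the professors' actual formula: the article does not name the four variables or how they are weighted, so the weights, team names, team values, and game results below are all invented, and a simple weighted sum stands in for whatever formula the Score Card really uses.

```python
# Illustrative only: the article does not name the four variables or their weights.
HYPOTHETICAL_WEIGHTS = (0.4, 0.3, 0.2, 0.1)

def power_index(variables, weights=HYPOTHETICAL_WEIGHTS):
    """Combine a team's four variables into one index (a weighted sum here)."""
    return sum(w * v for w, v in zip(weights, variables))

def predict_winner(team_a, team_b, teams):
    """Pick the team with the higher index; ties go to team_a."""
    if power_index(teams[team_a]) >= power_index(teams[team_b]):
        return team_a
    return team_b

def backtest(games, teams):
    """Fraction of past games whose actual winner matches the prediction."""
    hits = sum(predict_winner(a, b, teams) == winner for a, b, winner in games)
    return hits / len(games)

# Made-up teams, each with four made-up variable values.
teams = {
    "Team X": (10, 8, 6, 4),
    "Team Y": (9, 9, 5, 5),
    "Team Z": (6, 7, 8, 9),
}

# Made-up past games: (team_a, team_b, actual_winner).
games = [
    ("Team X", "Team Y", "Team X"),
    ("Team Y", "Team Z", "Team Z"),  # an upset the index misses
    ("Team X", "Team Z", "Team X"),
]

share_correct = backtest(games, teams)
print(f"Predicted {share_correct:.0%} of the toy games correctly")
```

Running the same kind of backtest over real tournament results is what produces a figure like the 75 percent reported for 2001-2004; the toy data here says nothing about that number.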

A recent IDC study shows that worldwide revenue from core and predictive analytics will surpass $11 billion by 2008. SAS recently integrated its advanced analytics (including predictive components) with its BI platform (SAS 9).