9/30/2013
09:25 AM

Big Data, 'Breaking Bad' And Orange Juice

MailChimp's chief data scientist has a unique way of explaining data science to the masses. If you know a little Excel, you're ready to learn.

What's the best way to teach data science to people who lack sufficient training in analytics, computer science, modeling and statistics? A little humor can't hurt, perhaps, particularly when delving into potentially mind-numbing topics like algorithms and programming.

John Foreman, chief data scientist for MailChimp, an email marketing service provider, is doing his part to demystify the dark art of data science by speaking and writing extensively on the subject. His data science blog features an eclectic blend of real-world, if somewhat unorthodox, examples of big data in action, including fictionalized accounts of Breaking Bad-style revenue and production dilemmas that illegal drug dealers might encounter.

It's all in fun, of course, and not all of Foreman's data science examples are nefarious by nature.

"The blog was supposed to be like Breaking Bad for data science, so all the examples were from a drug dealer's perspective," he said in a phone interview with InformationWeek.

In his upcoming book, Data Smart: Using Data Science to Transform Information into Insight, Foreman ditches the druggie references (a decision designed to please both his publisher and his parents) and focuses on more agreeable examples, such as how to devise an optimization model that keeps bottled orange juice tasting equally sweet throughout the year. (Foreman knows this problem well. Before joining MailChimp, he was a management consultant who did analytics work for Coca-Cola, maker of Simply Orange, a not-from-concentrate juice.)
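The orange juice problem is a classic blending question: mix batches of juice that differ in sweetness so that every bottle lands in the same taste range, at the lowest possible cost. The article doesn't say how Data Smart sets up its model, so the Python sketch below is only an illustration under our own assumptions. The three juice lots, their sweetness scores and per-gallon costs, and the target sweetness band are all invented, and `best_blend` is a hypothetical helper. Rather than the linear-programming solver a real model would typically use, it simply tries every blend in 5% steps and keeps the cheapest one that hits the target.

```python
from itertools import product

# Hypothetical juice lots: (name, sweetness score, cost per gallon in dollars).
# These numbers are invented for illustration only.
LOTS = [
    ("Lot A", 12.0, 1.10),
    ("Lot B", 10.0, 0.80),
    ("Lot C", 14.0, 1.30),
]

# Hypothetical acceptable sweetness band for every bottle.
TARGET_LOW, TARGET_HIGH = 11.5, 12.5


def best_blend(lots, low, high, step=5):
    """Return (percentages, cost, sweetness) for the cheapest blend whose
    weighted sweetness falls in [low, high], or None if no blend qualifies.

    Percentages are tried in multiples of `step` and must sum to 100.
    This is brute-force search, not a linear-programming solver.
    """
    best = None
    for pcts in product(range(0, 101, step), repeat=len(lots)):
        if sum(pcts) != 100:
            continue  # not a complete blend, skip it
        # Weighted averages of sweetness and cost across the blend.
        sweetness = sum(p * s for p, (_, s, _) in zip(pcts, lots)) / 100
        cost = sum(p * c for p, (_, _, c) in zip(pcts, lots)) / 100
        # Keep the cheapest blend that stays inside the sweetness band.
        if low <= sweetness <= high and (best is None or cost < best[1]):
            best = (pcts, cost, sweetness)
    return best


if __name__ == "__main__":
    pcts, cost, sweetness = best_blend(LOTS, TARGET_LOW, TARGET_HIGH)
    for (name, _, _), pct in zip(LOTS, pcts):
        print(f"{name}: {pct}%")
    print(f"sweetness {sweetness:.2f}, cost ${cost:.3f}/gallon")
```

With these made-up numbers, the search settles on 5% Lot A, 60% Lot B and 35% Lot C, for a weighted sweetness of 11.5 at about $0.99 per gallon. Brute force is fine for three lots, but the number of candidate blends grows exponentially with the number of lots, which is one reason blending models are usually written as linear programs instead.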

Foreman's message is this: Don't fear data science. In fact, with a little effort, you might even learn it yourself.

"I think of data science as taking raw data and turning it into something you can make business decisions off of. But big data just means you're doing math with a lot of data," said Foreman.

His data science book, available in October, falls somewhere between rigorous, highly technical textbooks chock-full of mathematical equations and lightweight overviews that don't teach algorithms or how data science models are actually built.

If you're not a coder, fear not. A spreadsheet is all you need to get started, he said.

"The cool thing about a spreadsheet is that you see every step. They're really unsexy, but that's fine because the book is not supposed to be sexy. It's not supposed to mystify people," said Foreman. "We do spreadsheets to show people that data science is tool-agnostic."

After covering eight core data science techniques, the book introduces readers to the programming language R, which is commonly used by data scientists to develop predictive models.

"I say, 'Hey, we're going to start at the very beginning with R, and everything you just did in the previous chapters, we're going to replicate it in R. You're going to see how easy it is, and why everyone does it in code as opposed to a spreadsheet,'" said Foreman.

He added: "You actually build the model yourself, by hand, on a simple example with clear explanations. That way people can feel comfortable, and not just plug in their data and pray that it's going to work."

The book is designed to teach business folks who may feel left behind by the big data juggernaut.

"It seems like there are a lot of people who don't want to raise their hands, speak up, and say, 'Hey, I'm kind of afraid to learn how to code, and learn how to do math at the same time,'" Foreman said. "They feel embarrassed to say that they're left behind."