Big Data. Big Decisions
InformationWeek
Special Coverage Series

# Big Data Education: When Should It Start?

When elementary school students find data that interests them, they're ready to learn basic statistics concepts. The key: make data analysis relevant to young learners.

How early should one's big data education begin? If we followed the classical music paradigm, then in utero is not too early. But what genre of music is most suitable for future big data analysts? Perhaps improvisational jazz, to foster exploratory analysis? Sousa marches, to inspire dedicated data preparation? Honky-tonk -- well, maybe not.

A related and more pertinent question: When should one's data analysis education begin? A few years ago, I visited my daughter in Japan, where she was teaching English as a second language through the wonderful JET program. In a third-grade mathematics class, the day's lesson involved collecting data on each student's favorite sport. One by one, the roughly 35 kids came to the front of the room, picked a magnetic plaque bearing the name of their favorite sport (soccer, running, table tennis, etc.) and put it on the blackboard.

In short order, the teacher had constructed a physical histogram (strictly speaking a bar chart, since favorite sport is a categorical variable). The frequency counts showed some variability, and it was also evident that the proportions varied by gender. By the end of the class the students had developed, very painlessly, a good feel for histogram counts and variability. The exercise was fun and interactive, and the learning was implicit rather than authoritarian. Every kid in the class had a very good chance of retaining the gist of the lesson indefinitely. Data analysis education should commence on the first occasion that data are collected.
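
For readers who want to reproduce the blackboard exercise, here is a minimal Python sketch of the same tally. The responses are invented for illustration (the actual classroom counts were not recorded), and the function names `tally`, `tally_by_gender` and `text_bars` are hypothetical helpers, not part of any library.

```python
from collections import Counter

# Hypothetical survey responses as (gender, favorite sport) pairs.
# These values are invented for illustration only.
responses = [
    ("girl", "soccer"), ("boy", "soccer"), ("boy", "soccer"),
    ("girl", "running"), ("girl", "table tennis"), ("boy", "running"),
    ("girl", "table tennis"), ("boy", "soccer"),
]

def tally(pairs):
    """Count how many students chose each sport."""
    return Counter(sport for _, sport in pairs)

def tally_by_gender(pairs):
    """Count sports separately for each gender."""
    counts = {}
    for gender, sport in pairs:
        counts.setdefault(gender, Counter())[sport] += 1
    return counts

def text_bars(counts):
    """Render counts as rows of marks, one mark per student, most popular first."""
    return [f"{sport:<13}{'#' * n}" for sport, n in counts.most_common()]

if __name__ == "__main__":
    for line in text_bars(tally(responses)):
        print(line)
    for gender, counts in tally_by_gender(responses).items():
        print(gender, dict(counts))
```

Each `#` plays the role of one magnetic plaque, and the per-gender tally is the simplest way to see whether the proportions differ between groups.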

Aside from plotting data, I observed some other features of the school's operations that could have an indirect bearing on the students' ability to work in data analytics. Lunch was consumed not in a cafeteria but in individual classrooms. A few students were sent to pick up the food in the school kitchen, others donned aprons and became servers, and lunch was not over until everything served was consumed. The dishes were collected, and a cleanup crew of students marched the used dishes back to the kitchen.

Then something even more remarkable happened: After lunch, each student went to an assigned area of the school campus to clean it. Using brooms, sponges or other cleaning materials, each student performed their assigned duty. Only when this activity was complete could the kids go to the playground for a brief play recess.

This discipline and attention to detail in exhaustive cleaning has a parallel in data preparation, where the entire data file is examined and cleaned, missing values are imputed, and the result is prepared for analysis. I observed no resistance, dawdling, or impertinence. (The only regrettable part of the visit was a school newspaper photo, taken unbeknownst to me, in which I am evidently impatiently checking my watch during recess.)

I am not advocating the imposition of janitorial duties on elementary school students -- just commenting on my observations and speculating that these kids could do backroom data preparation jobs.

Getting back to the question of when to start big data education: I contend that the best time to commence data analysis education is when the student encounters data of interest. A classic example of introducing histograms is to march a large class of captive statistics students to a field and arrange them in columns by comparable height -- a living histogram. Surely, the students participating will remember the experience and maybe even recall something about bimodality. That such an exercise is needed at all suggests that earlier opportunities to teach statistics had not been exploited.
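
The living histogram can be imitated in a few lines of Python. This is only a sketch: the heights are invented for illustration, the 5 cm bin width is an arbitrary choice, and `local_peaks` is a deliberately crude, hypothetical way to spot two humps, not a formal test for bimodality.

```python
from collections import Counter

# Hypothetical student heights in centimeters, invented for illustration.
heights_cm = [158, 160, 161, 163, 164, 165, 166, 171,
              176, 177, 178, 179, 180, 181, 183]

def bin_heights(heights, width=5):
    """Group heights into bins of `width` cm, keyed by each bin's lower edge."""
    return Counter((h // width) * width for h in heights)

def local_peaks(counts, width=5):
    """Return bin edges whose count exceeds that of both neighboring bins."""
    return [
        edge for edge in sorted(counts)
        if counts[edge] > counts.get(edge - width, 0)
        and counts[edge] > counts.get(edge + width, 0)
    ]

if __name__ == "__main__":
    columns = bin_heights(heights_cm)
    for edge in sorted(columns):
        # One mark per student standing in this column.
        print(f"{edge}-{edge + 4} cm  {'#' * columns[edge]}")
    print("peaks at bins starting:", local_peaks(columns))
```

With these made-up heights the columns rise, dip and rise again, which is the two-humped shape the students would see from the edge of the field.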

Advanced Placement (AP) statistics courses are available at the high school level and have experienced increasing enrollment since their inception in 1997. A test score of at least 4 (out of 5) is required to get college credit at some universities. Only about one-third of those taking the AP statistics test achieve this level, so there is no guarantee of meeting this threshold after taking the course. My own limited experience with college students who have taken AP statistics in high school is that they are in my introductory statistics class because they did not place out of the requirement. Moreover, they may bring some unfortunate statistical baggage in the form of misconceptions about statistics -- for example, they remember some stuff about t-tests and Z-things but do not really understand what they were doing back then. To them, statistics is somehow a set of formulas awaiting injection of numbers and is thus the epitome of boredom.

I personally would like to see statistical concepts introduced in the context of applications of interest to the student, regardless of their age or grade level. Wouldn't it be great if our kids could get some statistical feedback every time they conquer the next level of Angry Birds or Mario Brothers? They could see how they are doing in each level and how they compare to other players of their age.
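
As a rough idea of what such feedback could look like, the Python sketch below compares one player's score on a level with the scores of same-age peers. Everything here is an assumption made for illustration: the scores are invented, no real game exposes this data in this form, and `percentile_rank` and `level_feedback` are hypothetical helper names.

```python
from statistics import mean, median

def percentile_rank(score, peer_scores):
    """Percentage of peer scores that are less than or equal to `score`."""
    if not peer_scores:
        raise ValueError("peer_scores must not be empty")
    at_or_below = sum(1 for s in peer_scores if s <= score)
    return 100.0 * at_or_below / len(peer_scores)

def level_feedback(score, peer_scores):
    """Summarize how one score compares with a peer group."""
    return {
        "your_score": score,
        "peer_mean": mean(peer_scores),
        "peer_median": median(peer_scores),
        "percentile": percentile_rank(score, peer_scores),
    }

if __name__ == "__main__":
    # Hypothetical scores of same-age players on one level.
    peers = [1200, 1500, 1800, 2100, 2400, 2700, 3000, 3300]
    print(level_feedback(2400, peers))
```

A summary like this introduces the mean, the median and percentiles without a single formula appearing on screen.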

Learning some elementary statistics in a play environment is painless and generates some interest in the summaries. Long ago I developed some intuition about probability and statistics via dice games (Monopoly and Risk) and card games (War and Pinochle). Data from electronic games or social media sites are more the realm of interest of K-12 and college kids. Augmenting, and hopefully enhancing, these experiences with related statistics and some analyses would pay off in later, more important activities.
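
The intuition that dice games build can be made explicit with a short Python sketch. It enumerates every outcome of rolling two fair six-sided dice, as in Monopoly, and reports the exact probability of each total. The function name `two_dice_distribution` is a hypothetical helper, and fair dice are assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def two_dice_distribution(sides=6):
    """Exact probability of each total when rolling two fair dice."""
    # Count how many of the sides * sides equally likely outcomes give each total.
    ways = Counter(a + b for a, b in product(range(1, sides + 1), repeat=2))
    outcomes = sides ** 2
    return {total: Fraction(n, outcomes) for total, n in sorted(ways.items())}

if __name__ == "__main__":
    for total, prob in two_dice_distribution().items():
        print(f"{total:>2}: {prob}")
```

The output confirms what a seasoned player senses at the board: a total of 7 is the most likely roll, and 2 and 12 are the rarest.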
