Big Data Education: 3 Steps Universities Must Take
How can universities help meet the growing demand for data scientists? Consider this advice from a professor working in the trenches with tomorrow's analytics pros.
By now, we all know that the "sexiest job of the 21st century" is the data scientist. A scan of articles and blogs describing data scientists and their raw material, big data, reveals several "sexy" themes. First, data is ubiquitous, big and coming at us with increasing velocity. Second, the traditional tools used to extract and analyze 20th-century data don't work with big data. Third, very few people have the skills necessary to translate this tsunami of data into meaningful information, which makes them the hotshots of the job market.
McKinsey estimates that by 2018 there will be a shortage of almost 200,000 people with deep analytical talent. No doubt data scientists' dance cards will be full.
So, with all of this demand and a high national unemployment rate, university students must be beating down the doors to get into the data science programs on campus, right? Sadly, the answer is "no," but not because students lack interest in courses aligned with data science. They want to be job hotshots. The issue is that no university in the country has a program in data science.
We understand the general reasons why universities have not pivoted to better meet the demands of the market in this space: ivory-tower mentalities, a shortage of academics with the experience or skill set to teach big data analytics, and a lack of actual big data for the classroom.
As an academic, and a former practicing statistician/consultant, I believe universities have to address the challenge and partner with the private and public sectors to close this talent gap. Specifically, I recommend three considerations for universities in the area of data science:
1. Data science should not be an undergraduate degree. It's too broad, too nuanced and too demanding for an 18-year-old student to understand. Undergraduate students who are interested in eventually pursuing data science should study mathematics or computer science and take elective courses in a content area like finance, biology or sociology. During their undergraduate studies, students should be building the absorptive capacity needed to acquire the deep and broad skills required to be competitive in this space.
2. Any graduate degree in data science must integrate the disciplines of mathematics, statistics and computer science. This is a particularly daunting challenge for many universities, as these disciplines typically are housed in different departments or even different colleges. Data science is inherently interdisciplinary. Any Master's or doctoral degree would necessarily include:
a. A foundation in computational mathematics, such as matrix algebra, combinatorics and graph theory. This is critical because the other skills cannot be developed without some numerate orientation.
b. Programming, namely strong analytically oriented programming in tools such as SAS and R, as well as strong general-purpose programming in languages such as C++, Java or Python, along with experience in frameworks such as Hadoop. Some coursework in high-performance analytics is particularly valuable.
c. Statistical analysis, model development and data visualizations. These skills are not going away; they are evolving.
d. A working knowledge of a content area. After all, data science has power in application, not in theory.
e. A practicum/work experience component. This cannot be overemphasized. If you try to teach someone to swim through lessons from a textbook, they will drown when thrown into a pool. Graduate students studying data science need practical experience working with complex, unstructured data. While we try to create realistic experiences in the classroom, ultimately, they are not real.
3. Research. This is a nascent but expanding field of study. New problems emerge every day. Data science, much like medicine, lends itself to applied (versus traditionally theoretical) research. Conferences provide great opportunities for graduate students to present white papers on new code, develop creative solutions to new problems and even give name and structure to emerging issues. These are all part of the fertile field of data science research and scholarship.
Some companies, such as EMC/Greenplum and IBM, are bypassing universities altogether and developing data scientists in-house. This is a reasonable short-term response, given the absence of programs of study. However, if the talent gap is to be closed over the long term, universities are going to have to rethink how they approach the science of analytics.
Dr. Jennifer Priestley is an Associate Professor of Statistics at Kennesaw State University, where she is the Director of the Center for Statistics and Analytical Services. She also oversees the undergraduate curriculum in Statistics, and was recognized by the SAS Institute as the 2012 Distinguished Statistics Professor of the Year. She served as the Co-Chair of the 2012 National Analytics Conference in Las Vegas, NV.