Starting Wednesday, Berkeley will be taking registrations for its Master of Information and Data Science program, which will begin in January 2014. The program is being offered by the School of Information, which emphasizes management of information over information technology. That means students will be expected to have a working competence with software such as R for statistical analysis and will be introduced to technologies such as Hadoop, but won't get quite as deep into the design of systems or advanced algorithms as they would in a computer science graduate program. Instead, they will develop the skills to put big data to work for all the businesses angling to make more productive use of more data, from a wider variety of sources.
This is a completely new graduate program, offered only online, although it has some overlap with the school's on-campus Master of Information Management and Systems program. The I School, as it's known, also offers a Ph.D. program. Dean AnnaLee Saxenian said it made sense to offer the data science program online because it is the Internet era that has given rise to a huge demand for people who can make sense of all the data produced by the online world. Although she tries to stay away from the term "big data," saying "it's become so amorphous it means everything or nothing," there is a reason "data scientist" has emerged as a new job title at so many companies, she said.
"We're inundated with all this data thrown off by the Web, by sensor networks, and by the mobile devices we carry," Saxenian said. New tools have emerged to analyze masses of heterogeneous data, but the process is different from the methods that worked with structured data, and it tends to require a more cross-disciplinary approach, she said. "You need not simply be a programmer or computer scientist, you also need the tools of a statistician. You need to understand research design and how to communicate what comes out of the data to decision makers."
I School faculty will teach their curriculum alongside experienced data science professionals. Classes will range from an introduction to machine learning and data storage and retrieval to the privacy, security and ethics of data. Machine learning is the intersection of computer science and statistics that focuses on finding patterns in data.
"Data science and big data are a very important place for job growth," said Chip Paucek, CEO and co-founder of 2U, which is providing the technical platform and support services for the program. "I know, as a CEO, I will end up hiring several people from the program" because the skills are so scarce and valuable, he said.
A mockup of the Berkeley online program in 2U.
Saxenian said she expects the program to attract computer scientists but also students from other majors such as philosophy and the social sciences. In industry, data scientists typically work in partnership with hard-core programmers, she said. "They need to understand the pitfalls of diving into data and be able to come up with good research questions. They also need to understand our cognitive biases, such as confirmation bias. They need to understand both the opportunities presented by data and the ways you can go wrong with it." These are the people who decide what data to collect, how to collect it, how to analyze it, and particularly how to visualize it, she said.
"We are awash with data, but the expertise to analyze and exploit that data is in short supply. The mission of the MIDS degree is to provide that expertise," said Hal Varian, a professor emeritus at the I School and chief economist with Google, in a statement.