My first InformationWeek column on big data incited some hot responses, such as: "Who has the time to waste working with software that is so ill featured and unreliable that it is like something out of the '80s?" and "If certain data is important to an institution, then that data belongs in the data warehouse, not in some newfangled database that doesn't even impose validity constraints." The meanest comment was, "Big data is a fad and you're just a shill for the vendors who created the fad because they had run out of things to sell us."
My column stressed the need for CIOs to get involved in big data in order to provide support to the faculty who were either teaching or doing research in this area.
A few of the CIOs who responded confessed that they did not know much about the software and tools I had cited, and they sure didn't have a clue as to why their faculty wanted these tools. I tried to help by pointing them to some explanatory websites for Hadoop, MapReduce, Cassandra and R.
The responses I found most intriguing were from CIOs who said that they had already taken steps to provide their faculty with access to big data tools, and were now itching to expand their own role with these new technologies by using big data analytics to improve IT operations.
I asked what was stopping them, and they said: "I have no money" and "My staff is already overwhelmed dealing with clouds and mobile and virtualizing and distance learning and every other game changer that's been happening in higher ed technology. How can I tell them they now have to learn Hadoop?" A fair question, given the sad state of Hadoop's interface -- a question that can't be answered just by my sharing links to websites.
My advice to them was to first identify the pain points in IT operations that happen to be awash in unstructured data. A pain point might be trying to keep the network secure. Or perhaps they are getting too many complaints about network performance or reliability. These pain points do not suffer from a lack of data. The log files produce so much data that staff can barely skim them each day. This massive amount of data creates its own problems, since the attention one can pay to a warning signal is inversely proportional to the number of warning signals being emitted. (I just made that up, but I'm pretty sure it's true.)
If the data for a major pain point is generated, but it is way too much to comprehend, you have a perfect big data problem!
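To make the warning-signal point concrete, here is a minimal sketch of how a flood of log lines can be collapsed into a handful of signals worth a person's attention. Everything in it is an assumption for illustration: the sample log lines, the "date time LEVEL message" layout, and the helper names `to_template` and `summarize` are hypothetical, and a real deployment would more likely run this kind of counting at scale on the tools mentioned earlier.

```python
import re
from collections import Counter

# Hypothetical sample lines; a real log format will differ.
SAMPLE_LOG = [
    "2013-02-01 10:00:01 WARN auth failed login for user 1042 from 10.0.0.7",
    "2013-02-01 10:00:03 WARN auth failed login for user 2210 from 10.0.0.9",
    "2013-02-01 10:00:04 INFO dhcp lease renewed for 10.0.1.15",
    "2013-02-01 10:00:09 WARN switch port 12 link down",
    "2013-02-01 10:00:12 WARN auth failed login for user 1042 from 10.0.0.7",
]


def to_template(line):
    """Drop the timestamp and mask numbers so similar events group together."""
    # Assumes a 'date time rest-of-message' layout (an assumption, not a standard).
    message = line.split(" ", 2)[2]
    # Mask IPv4-style addresses first, then any remaining runs of digits.
    message = re.sub(r"\d+(\.\d+){3}", "<ip>", message)
    return re.sub(r"\d+", "<n>", message)


def summarize(lines, top=3):
    """Return the most frequent warning templates with their counts."""
    counts = Counter(to_template(line) for line in lines if " WARN " in line)
    return counts.most_common(top)


if __name__ == "__main__":
    for template, count in summarize(SAMPLE_LOG):
        print(count, template)
```

In this toy sample, four warning lines reduce to two distinct kinds of warning, which is the sort of reduction that lets staff pay attention to each signal again.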
Those CIOs who have been seeing their budgets increase each year can just hire a consulting team to come in and set up systems to tackle the pain points. Unfortunately, none of my respondents seems to be in that situation. And they are not likely to have any data scientists on staff.
So where can higher ed IT leaders get the people to create the research design and the predictive algorithms, deploy the software and tools, mine the data, and then reach that Eureka moment, known as the "big insight"?
The experts tell us that anyone starting a big data analytics project will likely need a cross-functional team with at least one person who has the knowledge and skills to use these new tools, one who has predictive analytics skills, and one who is great at visualization. They point out that the team also needs a project manager who has superior business domain knowledge.
My respondents admitted that they do have folks with business domain knowledge, but the other skills ... no way! Yet these are cutting-edge leaders who have deployed big data tools for faculty and students, something that few higher ed CIOs have done. This means that every day in their institutions students are gaining experience in using R, Hadoop, MapReduce and a variety of visualization and predictive analytics tools.
To these CIOs I'd like to ask: Could some of these students be recruited by your IT organization to be part of a junior data sciences team? If assembling a student team seems too much of a challenge, then how about tapping into existing student teams in your school's big data programs? One example is the program at George Mason, which has teams of students from many disciplines engaging in a final project involving a big data set -- which could be your set of log data!
Bill Gates was a high school student when mainframe computers first came to Seattle. His teacher set up a link with a local corporation and then organized the students into teams to learn computer programming in order to help the high school and the corporation take advantage of this new technology.
Big data projects could provide the higher ed IT leader with the opportunity to become this kind of world-changing teacher.
I look forward to your comments. If you are using big data analytic tools to address pain points in your IT department, please share! Your colleagues are especially interested in knowing how you managed to build your data sciences team, so share that, too.