Whether or not big data has yet reached its inflection point, it is clearly transforming IT organizations and the enterprises they support. Yet even as big data proliferates rapidly, governments at every level, from local to federal, are still sitting on the sidelines, waiting to see what happens next before making a play.
Instead, they should do what leading businesses are doing: become thoroughly familiar with the five major trends their IT organizations must address as they plan for the future.
These rapidly advancing trends are challenging many fundamental assumptions about IT strategy and planning, and therefore have the potential to revolutionize how business is conducted and managed. In the same way, they offer governments opportunities to significantly improve their efficiency and performance and to better serve their constituencies. Understanding these big data trends is thus essential for governments that want to prepare for the changes ahead.
1. Open source is the future of big data. Enterprises are increasingly adopting open-source technologies, which they see as a competitive differentiator that enables agility, mitigates risk, and lowers costs. In big data, open-source software, led by Hadoop, is driving the most significant innovations. With low implementation costs and high adoption levels -- including direct support from trendsetting technology organizations such as Facebook, Twitter, Amazon, and LinkedIn -- open-source software is spreading. Emerging open-source frameworks and technologies of particular interest in big data include Storm, Kafka, and S4 for stream processing; Drill (modeled on Google's Dremel) for large-scale querying; and R for statistical computing.
2. Hadoop is set to replace EDWs. Traditional enterprise data warehouses (EDWs), typically designed to house an enterprise's core data, are expensive and ill-equipped for solving big data problems. Data flows from operational systems (such as ERP, financial, and HR) into EDWs that in turn provide consistent and structured data for reporting and business uses.
Hadoop, on the other hand, is an open-source framework built around a high-volume distributed architecture that runs well on low-cost commodity hardware. This architecture and its associated languages and tools allow for solving complex analytics problems relatively quickly.
Hadoop is the ideal platform for analyzing ERP data in conjunction with disparate data sources. For example, a company could combine ERP information with sensor data, weather information, and transportation rates (data sources with different structures or no structure at all) to determine the most cost-effective time and place to ship perishable products. All the loading, structuring, analysis, and reporting can be done directly and rapidly in Hadoop without moving data into or out of the EDW.
Some companies are already augmenting EDWs with Hadoop, offloading traditional ETL (extract, transform, load) functions and making use of the distributed processing capability. Others are using Hadoop to replace EDWs altogether. EMC and Teradata, among other major vendors, have already made bold moves into the Hadoop space.
3. Big data and analytics are increasingly embedded into devices. Hadoop's flexibility and open architecture make it a natural fit for embedding directly into devices ranging from medical equipment to drones. Future "smart" devices will process data and conduct advanced analytics at the source, much as mobile phones have evolved from simple handsets into minicomputers. Embedding Hadoop into devices will accelerate and streamline the collection and processing of high-volume data such as video and audio.
4. Software giants play big data catch-up. Although they are unquestionably leaders in analytics, the world's largest software providers have been slow to adopt solid big data strategies. Even database leaders and visualization and business intelligence companies are struggling to bring products to market fast enough to keep pace with the changes.
With so much big data innovation occurring in the open-source space, commercial software companies cannot afford to be reactive. Traditional analytics software companies will still play a role, especially in large enterprises, but their leadership will take on a new meaning, and licensing costs could decrease dramatically as customers discover many low-cost options.
Universities, longtime training grounds for specialized skills in data and analytics software (especially SAS and SPSS), are embracing open-source tools, which are freely available and generally a better fit for the academic environment. This trend could radically change the nature of the entry-level market by providing candidates with skills that more closely match employers' increasing demand for open-source experts.
5. Siri is just the beginning. Science fiction has long captivated audiences with sketches of the ideal computing interface: think Dave and HAL in 2001: A Space Odyssey, and Jarvis in Iron Man. Similarly, for big data and real-time analytics, the tipping point, both for corporations and for society in general, will be the day a non-technical, non-mathematical, non-engineering user can ask questions of the data without typing or using a mouse. That day is not far off. Siri, the iPhone's interactive personal assistant, was the first commercial success in this area. Experts think that by 2017, nearly two-thirds of analytics vendors will incorporate voice recognition into their software.
These trends are truly disruptive and call into question the basic tenets of IT strategy and planning. To grasp the opportunities, forward-looking businesses are revisiting their big data plans with an eye toward this rapidly changing landscape.
Governments should be doing the same.