Improved security and data lifecycle management also matter greatly when you try to establish a general-purpose enterprise big-data platform that serves many different departments, use cases and data policies. Security is delivered via Knox, a system that provides a single point of secure access for Apache Hadoop clusters. Falcon provides the data lifecycle management framework: a declarative language (think of XML) used to orchestrate data movement, coordinate data pipelines, and set lifecycle policies and processing rules for data sets.
Most importantly perhaps, as Hadoop enterprise adoption accelerated, it became clear that multiple processing models, moving beyond batch, were critical for Hadoop to broaden its applicability for mainstream enterprise use. The common pattern is that enterprises want to store data in the Hadoop Distributed File System (HDFS) and then access it in a variety of ways, simultaneously, and with a consistent level of service. Hadoop 2.0 also includes YARN, a resource manager that isolates different applications and supports many use cases beyond batch processing, such as interactive, online, streaming and graph processing. It's fair to say that Hadoop has evolved from an inexpensive parking lot for your data into a framework that can help make timely decisions.
A great example is Gigaset, a former unit of the German tech conglomerate Siemens well-known for its mobile phones. With its new smart home system for security and assisted living called "Elements," the company has jumped on the new possibilities now available. What's even more interesting is how Hadoop helped the company unlock an entirely new market, with additional business models on the horizon.
Elements is a cluster of small sensors that can be quickly installed in any home, slapped on doors or cabinets. Designed to be robust and foolproof, Elements observes and pipes data into a Hadoop cloud via a base station. That sounds easy enough, but the alerts, events and diagnostic pings flow to the tune of three terabytes or 10 billion messages per day in 2014. The sheer volume of traffic from millions of doors being opened and closed resembles a denial-of-service (DoS) attack.
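To get a feel for those numbers, here is a rough back-of-the-envelope sketch. It assumes that the three terabytes and the 10 billion messages describe the same daily traffic, and that a terabyte means 10^12 bytes; neither assumption is confirmed by Gigaset, and the function names are purely illustrative.

```python
# Back-of-the-envelope arithmetic for the daily traffic quoted above.
# Assumptions (not confirmed by the source): both figures describe the
# same daily traffic, and one terabyte is 10**12 bytes.

BYTES_PER_DAY = 3 * 10**12       # three terabytes per day
MESSAGES_PER_DAY = 10 * 10**9    # 10 billion messages per day
SECONDS_PER_DAY = 24 * 60 * 60


def average_message_size(bytes_per_day, messages_per_day):
    """Average size of a single message, in bytes."""
    return bytes_per_day / messages_per_day


def messages_per_second(messages_per_day):
    """Average number of messages arriving each second."""
    return messages_per_day / SECONDS_PER_DAY


def bytes_per_second(bytes_per_day):
    """Average sustained ingest rate, in bytes per second."""
    return bytes_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    size = average_message_size(BYTES_PER_DAY, MESSAGES_PER_DAY)
    rate = messages_per_second(MESSAGES_PER_DAY)
    ingest = bytes_per_second(BYTES_PER_DAY)
    print(f"average message size: {size:.0f} bytes")
    print(f"average message rate: {rate:,.0f} messages per second")
    print(f"average ingest rate:  {ingest / 10**6:.1f} MB per second")
```

Under these assumptions the individual messages are tiny, a few hundred bytes each, but they arrive at well over 100,000 per second on average. That helps explain why the load looks less like a bulk file transfer and more like a flood of small requests.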
This ocean of raw data is sorted by statistical relevance only, leaving the interpretation and decision-making to individual customers who can see data visualizations on their smartphone or computer. Customers can decide to relay the data stream to third-party service providers such as ambulances or security services. This new real-time information system for consumers, anchored in the emerging Internet of Things, is worlds apart from the old handset business, admits Gigaset's Nicholas Ord, in this video.
That's the story of one company taking the plunge with Hadoop, but when will others follow? I predict that by 2015, more than half of the top 2,000 global enterprises will have a Hadoop deployment in production. I also expect that in five years, we'll see meaningful differences in profitability across many industries. Enterprises that have fully embraced Hadoop will come out ahead.