On April 1, EMC and VMware will spin out Pivotal Inc., a potential giant of the big data analytics field. They hope the company will grow to $1 billion in revenue within five years.
The budding company, which brings together a set of software components that were not originally designed to work together, will produce a data analysis platform that can capture large amounts of data in one system, address it with SQL-like queries, come up with answers in near real time and store the data in a multi-petabyte, scale-out storage system. In other words, it will combine the best of what have until now been separate worlds and usher VMware customers into a new era of data management. Pivotal will give VMware users a highly automated tool with which to compete with challengers such as Amazon.
Furthermore, it will exhibit unusual performance. The Pivotal data handling platform will be based on a Greenplum data warehouse system working with Pivotal HD, a data querying system working with Hadoop. The Greenplum data warehouse team, part of EMC, has 10 years of experience in working with parallel queries and parallel databases and knows how to stage a query into Pivotal HD, resulting in performance that beats Facebook's Hive or Cloudera's Impala, two leading ways to work with Hadoop. "We smoke all these technologies," said Paul Maritz in a presentation posted on the VMware investor relations website.
Pivotal will require an investment of $400 million this year and next to reach the company's goal of $300 million in revenue this year, growing to $1 billion by 2017. That was the heart of the presentation that Paul Maritz, the prospective CEO of Pivotal, gave to a large group of financial analysts at VMware's Strategic Investor Forum March 13 in New York. Maritz said big data analytics will be a $20 billion market by 2017; other estimates have ranged even higher than that.
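To put the revenue target in perspective, the annual growth it implies can be worked out with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes "this year" is 2013, treats the $300 million and $1 billion figures as full-year revenue, and assumes steady compound growth, none of which Maritz spelled out. The function name is made up for illustration.

```python
def implied_annual_growth(start_revenue, end_revenue, years):
    """Compound annual growth rate needed to go from start to end revenue."""
    return (end_revenue / start_revenue) ** (1 / years) - 1


# Assumption: $300 million in 2013 growing to $1 billion in 2017 (four years).
rate = implied_annual_growth(300e6, 1e9, 2017 - 2013)
print(f"Implied annual growth: {rate:.1%}")
```

Under those assumptions, Pivotal would need to grow revenue by roughly 35% a year, every year, to hit the target.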
In short, Pivotal is a new product line being launched by EMC and VMware to sit atop the world they have already created: the virtualized data center working with deep pools of storage. These resources will function as a VMware cloud and in most cases will host a Pivotal data platform inside the enterprise data center. "I am convinced that ten years from now, someone will have come up with a platform in this area," said Maritz, and he wants it to be his company.
But an alternative to the VMware-virtualized data center is emerging -- the more highly automated, uniform environment of a cloud provider. Many startups obtain such an infrastructure from a public cloud, and fast-growing Internet companies such as Google, Zynga and Facebook build their own. When such companies emerge as competitors to traditional companies, as Amazon has, they pose a challenge.
"The consumer Internet giants use IT in a fundamentally different way… They operate at scale in a highly automated way. The other thing they do is reason over very large data sets and use that to drive new customer experiences and business models," Maritz told analysts.
The Pivotal platform will be available for more traditional companies to compete with this disruptive force. Pivotal plans to package its Cloud Foundry software so that it can be used to host Pivotal atop other clouds as well, including Amazon Web Services' EC2. Pivotal data management will be available for use in a variety of cloud settings and will not be dependent on a proprietary VMware system, Maritz said.
To ensure that industry-neutral approach, Pivotal will be broken out of EMC and VMware as an independent company, with its own board of directors and 500 employees drawn from VMware and EMC. It will not be required to stick to a VMware-only game plan. EMC Chairman Joe Tucci is following the pattern he set with VMware, which remained independent after EMC acquired it. VMware maximized the opportunity without marching in lockstep with EMC; Pivotal will be expected to do the same.
EMC and VMware are using an assortment of components to build the new platform. One central piece is the Greenplum data warehouse system, built on the open source PostgreSQL relational database. PostgreSQL is somewhat under-productized as open source code due to the popularity of MySQL, a simpler system suited to many read-only uses. But PostgreSQL is a full ANSI-standard relational database system and Greenplum has used it to build a data warehouse system with parallel processing characteristics. That experience, it turns out, is very useful when trying to figure out how to address a large parallel system like Hadoop.
Another component is analytics startup Cetas, acquired last April, which performs fast analysis of Hadoop data. VMware will also contribute GemFire, an in-memory data management system, to serve as a caching layer for a Pivotal analytics platform. The company gets its name from VMware acquisition Pivotal Labs, a 250-employee "extreme programming shop" in Silicon Valley, which turned out applications by forming joint agile development teams for startups.
Skilled at working with Hadoop, Pivotal Labs developed HAWQ, or Hadoop With Query, which can provide SQL queries on top of Hadoop data handling. The new Pivotal's first product will be launched in April as Pivotal HD; during the second half of 2013 it will be grown into the Pivotal Platform and offer other data handling characteristics. With Pivotal HD, companies will be able to conduct queries against very large data sets, larger than any relational database system can handle efficiently. "In the Hadoop view of the world, the more data you have in one place, the more you're ahead," Maritz told analysts.
The software components will be given new Pivotal names as they are added to the Pivotal Platform, and they will be used to create next-generation data repositories or warehouses (sometimes referred to as "data lakes"). Analytics will be combined with application development and legacy application integration through capabilities supplied by the Spring Framework.
Nowhere is the need for new ways of managing data more evident than in the telecommunications industry. Telcos know exactly how many calls they've dropped each hour, but they don't know which calls were dropped. If you ask your service provider why a particular call was dropped, Maritz pointed out, it will take them several days to confirm and reply.
The Pivotal Platform's fast analytics might aim to inform a telco when it drops the call of a particularly valued customer, for example, or a customer who may be showing signs of moving elsewhere, perhaps triggering an apologetic text.
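The kind of rule described here can be sketched in a few lines of Python. This is purely illustrative and is not Pivotal's software: the record types, field names and thresholds are all hypothetical, and a real deployment would run against streaming call data rather than in-memory lists.

```python
from dataclasses import dataclass


@dataclass
class CallEvent:
    customer_id: str
    dropped: bool


@dataclass
class Customer:
    customer_id: str
    monthly_spend: float  # hypothetical measure of customer value
    churn_risk: float     # hypothetical score from 0.0 to 1.0


def customers_to_notify(events, customers, spend_threshold=100.0,
                        churn_threshold=0.7):
    """Return IDs of customers who had a dropped call and are either
    high-value or showing signs of leaving (each ID listed once)."""
    flagged = []
    for event in events:
        if not event.dropped or event.customer_id in flagged:
            continue
        customer = customers.get(event.customer_id)
        if customer is None:
            continue
        if (customer.monthly_spend >= spend_threshold
                or customer.churn_risk >= churn_threshold):
            flagged.append(customer.customer_id)
    return flagged


# Made-up sample data for illustration only.
customers = {
    "a": Customer("a", monthly_spend=250.0, churn_risk=0.1),
    "b": Customer("b", monthly_spend=30.0, churn_risk=0.9),
    "c": Customer("c", monthly_spend=30.0, churn_risk=0.1),
}
events = [
    CallEvent("a", dropped=True),
    CallEvent("b", dropped=True),
    CallEvent("c", dropped=True),
    CallEvent("a", dropped=False),
]
print(customers_to_notify(events, customers))
```

The point is that the telco needs per-call, per-customer data in one place to do this at all; hourly totals of dropped calls cannot tell it whom to text.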
Such an analytics platform might also have operational uses in VMware's future, for a software-defined data center, and for managing storage in a cloud-based system. For now, however, Pivotal's main goal is to forge a place for itself in the analytics marketplace. With Pivotal, VMware and EMC are positioning for dramatic growth. By creating another public company on the order of VMware to address the analytics initiative, they are not seeking to simply defend their existing customer bases but to augment them. It's an ambitious undertaking -- but Pivotal has an unusual combination of software assets. And with Paul Maritz, the company has a leader who's done it before.