EMC Release Promises All-Purpose Big-Data Platform
EMC Greenplum Unified Analytics Platform unites relational database, Hadoop, and a mix of query and processing techniques.
EMC describes UAP as a single software platform on which data and analytics teams can seamlessly share information and collaborate on analyses without having to work in--or move data between--separate silos. As such, EMC said it presents a more flexible and productive approach than the separate relational and Hadoop environments offered by IBM and planned by Oracle through its separate Oracle Big Data Appliance.
"We're enabling a diversity of tools to plug into an infrastructure that is unified, that is simple to manage, and that makes it easier to do your job," said Luke Lonergan, chief technology officer of ECM's Data Computing Products division, in an interview with InformationWeek.
UAP includes the EMC Greenplum relational database, the EMC Greenplum HD Hadoop distribution, and EMC Greenplum Chorus, which is a collaborative, social-network-style interface for the entire data-analysis team. These teams likely include PhD-level data scientists and other experts in predictive and statistical analytics, as well as data-integration experts, data analysts, business-intelligence experts, database administrators, and the line-of-business users and managers who commission the work. Chorus provides a sandbox environment for testing analyses and Facebook-style collaboration features that help team members discuss results.
"Data scientists are in short supply, so we think this collaborative layer is very important because it ties everybody together and creates a shared workflow," Lonergan said, adding that Chorus helps the data scientist share their expertise with other team members.
EMC in September introduced a modular EMC Data Computing Appliance (DCA) capable of running and scaling up both the Greenplum relational database and Hadoop nodes within a single box. Competitors charged that it was a Pyrrhic victory, no better integrated than separate relational and Hadoop deployments.
The DCA does offer a shared Command Center interface that lets administrators monitor, manage, and provision both Greenplum database and Hadoop system performance and capacity. But with the release of UAP software, which will run on the DCA, Lonergan said EMC has taken the next step of unifying data access, management, and workflow.
"With UAP, the Greenplum database can access Hadoop data directly, and over the next 12 to 18 months, you're going to see a fusion of Greeplum and Hadoop services," he explained. The goal is to create a store-once, use-many platform in which all data processing and analysis steps can be handled from a single data-access-and-query layer, he said.
For now, Chorus lets users see the same data sets whether they're in Hadoop or the Greenplum database. And UAP effectively supports a mix of structured and unstructured processing, query, and analysis methods. That includes conventional SQL processing and querying; Hadoop MapReduce and data transformation; and Message Passing Interface (MPI), a method used in scientific programming, supercomputing, and a soon-to-be-released SAS High-Performance Analytics offering that EMC will support.
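To make the contrast between SQL querying and MapReduce-style processing concrete, here is a minimal sketch that computes the same per-key totals both ways. It is an illustration only, not EMC's implementation: it uses Python's built-in sqlite3 module as a stand-in for a relational database and plain functions as a stand-in for Hadoop MapReduce, and the sample records, table name, and function names are all hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample records: (region, sale_amount). Not from the article.
RECORDS = [("east", 10), ("west", 5), ("east", 7), ("west", 3), ("north", 4)]


def totals_via_sql(records):
    """Aggregate with a conventional SQL GROUP BY query (sqlite3 as a stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    conn.close()
    return dict(rows)


def map_phase(record):
    """Map step: emit one (key, value) pair per input record."""
    region, amount = record
    yield region, amount


def reduce_phase(key, values):
    """Reduce step: collapse all values that share a key into one total."""
    return key, sum(values)


def totals_via_mapreduce(records):
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_phase(record):
            groups[key].append(value)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(totals_via_sql(RECORDS))
    print(totals_via_mapreduce(RECORDS))
```

Both functions return the same totals for the sample data. The difference is in how the work is expressed: the SQL version declares the result it wants, while the MapReduce version spells out per-record and per-key steps that a framework such as Hadoop can distribute across nodes.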
On the Hadoop front, there's lots of speculation about whether commercial vendors like EMC, IBM, Microsoft, and Oracle will be cost-competitive in a market that is used to open-source software deployed on commodity hardware. EMC has matched Cloudera on its subscription list pricing of $4,000 per node, per year, according to Lonergan. He added that customers are willing to pay more for Hadoop-customized storage capabilities such as snapshotting and other features offered by Isilon, which is a division of EMC.
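As a rough illustration of what that list price means at cluster scale, the sketch below multiplies the quoted $4,000 per node, per year by a node count and a term. The cluster size and term in the example are hypothetical, and the figure covers list-price subscription only; it assumes no discounts and excludes hardware and storage costs, none of which the article quantifies.

```python
# List price quoted in the article: $4,000 per node, per year.
LIST_PRICE_PER_NODE_PER_YEAR = 4_000


def subscription_cost(nodes, years=1):
    """Return the list-price subscription cost in dollars.

    Assumes a flat per-node, per-year price with no discounts (an assumption;
    actual negotiated pricing is not given in the article).
    """
    if nodes < 0 or years < 0:
        raise ValueError("nodes and years must be non-negative")
    return nodes * years * LIST_PRICE_PER_NODE_PER_YEAR


if __name__ == "__main__":
    # Hypothetical cluster: 50 nodes over a 3-year term.
    print(subscription_cost(50, years=3))  # 600000
```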
The EMC Greenplum-Isilon storage pairing was recently mirrored by a Cloudera alliance with NetApp. Cloudera and NetApp said their Open Solution decouples storage and compute while also providing higher availability and reliability, and improved manageability for enterprise environments.
EMC's Lonergan agreed that storage infrastructure decisions will increasingly be decoupled from processing concerns, giving customers greater choice and mission-critical storage and performance options. It's one of many changes that will take place, he said, as Hadoop matures to meet enterprise-grade performance expectations.