Hadoop distributor Hortonworks used its Hadoop Summit in San Jose this week to get a little closer to one of its top cloud technology partners -- Microsoft.
The big data company announced that Microsoft Azure HDInsight is its Premier partner for Connected Data Platforms -- Hortonworks Data Platform for data at rest and Hortonworks DataFlow for data in motion.
"Azure HDInsight as our Premier Connected Data Platforms cloud solution gives customers flexibility to future proof their architecture as more workloads move to the cloud," Hortonworks CEO Rob Bearden wrote in a prepared statement released June 28.
The closer partnership with Microsoft was one of several announcements from Hortonworks during the Hadoop Summit this week. The company also updated its Hortonworks Data Platform package with features for enterprise customers, introduced a new precision medicine consortium to explore a next-generation open source platform for genomics research, and struck a partnership with AtScale to advance business intelligence on Hadoop.
New Features for Enterprise
Hortonworks Data Platform (HDP) 2.5 is the newest version. The company says it offers enterprise-ready features, including comprehensive security and trusted data governance, integrated through Apache Atlas and Apache Ranger. The company has also included a host of other open source big data technologies to make the package an enterprise-grade experience.
The platform now also offers the web-based data science notebook, Apache Zeppelin, for interactive data analytics and the creation of interactive documents with SQL, Scala, Python, and other tools.
The inclusion of the most recent version of Apache Ambari gives enterprises support for planning, installing, and securely configuring HDP, and for performing ongoing maintenance and management of the systems. Also, a new role-based access control model now lets administrators provide different users with different functional access to the cluster.
To improve developer productivity, the company has added Apache Phoenix Query Server, which gives developers a wider choice of languages for accessing data stored in HBase. Apache Storm now allows for large-scale deployments for real-time stream processing. The new version also includes new connectors for search and NoSQL databases, according to Hortonworks.
Business Intelligence on Hadoop
Hortonworks also announced a new partnership with AtScale, under which it will offer the startup's technology for running SQL-style queries against data stored in Hadoop.
"From day one, our goal has been to make BI and Hadoop work in harmony by erasing the friction associated with moving data and forcing end users to learn new BI tools," wrote AtScale CEO Dave Mariani in a prepared statement. AtScale's technology will be available via Hortonworks in the third quarter, the companies said.
Precision Medicine Consortium for Genomics
Hortonworks also announced its own plan to participate in the precision medicine space with the formation of a new consortium "to define and develop an open source genomics platform to accelerate genomics based precision medicine in research and clinical care."
In addition to Hortonworks, initial members of this consortium include Arizona State University, Baylor College of Medicine, Booz Allen Hamilton, Mayo Clinic, OneOme, and Yale New Haven Health.
Hortonworks said that this consortium will take on the task of defining the requirements and addressing the limitations of current technology for storing massive volumes of genomic information, analyzing it, and querying it at scale in real time.
Hortonworks noted the consortium will apply "Design Thinking" to this problem.
"Unleashing the power of data through open community and collaboration is the right approach to solve a complex problem like precision medicine," DJ Patil, chief data scientist, White House Office of Science and Technology Policy, wrote in a prepared statement. "Initiatives like this one will break data silos and share data in an open platform across industries to speed genomics-based research and ultimately save lives."