Microsoft is building on its cloud offerings with an expanded Azure Data Lake, arriving with analytics tools designed to simplify big data, and with a new query language.
We first learned about the Azure Data Lake when Microsoft announced it at the Build conference back in April. The data repository handles data of any size, type, and speed. It aims to eliminate the complexities of processing and storing data while making it easier for businesses to get up and running with analytics.
The Azure Data Lake Store, as it has been renamed, will store structured, semi-structured, and unstructured data without forcing application changes as data scales. Data located in the Data Lake Store can be securely shared. It is also accessible from sensors connected to the Internet of Things.
According to a blog post published Sept. 28, the Azure Data Lake Store supports development of big data solutions through a variety of languages and frameworks. The new store is compatible with the Hadoop Distributed File System (HDFS), so Hadoop distributions such as Hortonworks, Cloudera, and MapR can access the data they need for processing.
Microsoft also today announced Azure Data Lake Analytics, a cloud-based data processing and analytics service. The tool is built on Apache YARN. It scales instantly according to the power needed for each job. It's also cost-efficient; customers only pay for jobs when those jobs are running.
Azure Data Lake Analytics includes U-SQL, a new and scalable query language built on the same runtime that powers Microsoft's big data systems. With U-SQL, users can run queries to analyze data located in the Azure Data Lake Store, as well as information stored in SQL Server in Azure, Azure SQL Database, and Azure SQL Data Warehouse.
T.K. "Ranga" Rengarajan, corporate vice president of data platforms at Microsoft, acknowledges that developers and data scientists struggle to use existing big data technologies successfully.
"Code-based solutions offer great power, but require significant investments to master, while SQL-based tools make it easy to get started but are difficult to extend," he wrote on Microsoft's TechNet blog. "We've faced the same problems inside Microsoft and that's why we introduced U-SQL, a new query language that unifies the ease of use of SQL with the expressive power of C#."
Both the Azure Data Lake Store and Azure Data Lake Analytics will be available in preview later this year, Microsoft reports.
Microsoft adds that the Azure Data Lake is supported by Azure Data Lake Tools for Visual Studio, which are designed to provide an integrated development environment across the Azure Data Lake. It's also supported by Hadoop ISV applications spanning security, governance, data preparation, and analytics that can be deployed from the Azure Marketplace.
Ready today is HDInsight, the Apache Hadoop-based service included in Azure Data Lake that works with analytics frameworks like Hive, Storm, HBase, and Spark. Managed clusters on Linux are now generally available, Microsoft reports.