LinkedIn Donates Dr. Elephant, MapR's Drill 1.6: Big Data Roundup

LinkedIn contributes new technology for monitoring and managing Hadoop and Spark, MapR announces Apache Drill 1.6, SAP and IBM take their relationship to the next level, and more in our Big Data Roundup for the week ending April 10, 2016.

This week in big data brought plenty of news on the artificial intelligence (AI) front, a new open source big data contribution from LinkedIn, updates from MapR, and a new partnership between two giants, IBM and SAP. Plus, Time magazine has created a new calculator from a public data set that lets you see how your salary stacks up against others in your profession based on age and gender.

But let's start with LinkedIn, which has incubated much data science technology and talent, including data scientists who have gone on to companies like Salesforce, Jawbone, and Confluent (Apache Kafka). This week LinkedIn contributed another incubated technology to open source.

Dr. Elephant offers performance monitoring for Apache Hadoop and Apache Spark. The technology automates gathering and analyzing metrics for all the flows, providing information to help team members tune Hadoop and Spark to improve operations.

Automating the tuning is important at LinkedIn because the company runs about 100,000 Hadoop and Spark jobs every day, and users have different levels of experience with Hadoop, according to Akshay Rai, posting about the contribution of Dr. Elephant to GitHub this week.

"It is more important to optimize for the time of these people than just for the hardware resources they use," he wrote. Dr. Elephant was created to improve developer productivity and increase cluster efficiency by making it easier to tune Hadoop jobs. Rai said that Dr. Elephant is popular at LinkedIn and people love its simplicity. Like a doctor, it solves about 80% of problems through simple diagnosis.


Hadoop distribution company MapR announced the availability of Apache Drill 1.6 as part of the unified SQL layer for the MapR Converged Data Platform, with tighter integration with MapR-DB. The integration improves the flexibility of reporting and analytics on JSON data stored in MapR-DB tables, delivering faster insights from operational data, the company said in a statement.

Apache Drill is a distributed SQL engine that enables data exploration and analytics on non-relational datastores, letting users query with standard SQL and BI tools without the need for creating schemas. Version 1.6 of Apache Drill on the MapR Converged Data Platform offers a new MapR-DB document database plugin, enhanced performance and scale, and optimization for Tableau and other BI tools.

This Week In AI

On the AI side of the house, Salesforce this week quietly acquired MetaMind, a company working on deep learning for automated image recognition. Big technology companies from Facebook to Google to IBM to Hewlett Packard Enterprise are developing or acquiring technology and talent that add AI, deep learning, and machine learning to their arsenals. This is not the first AI acquisition for Salesforce, and the company has also been hiring top talent in data science and deep learning development in recent years.

Facebook this week also introduced a technology called automatic alternative text, designed to help the visually impaired community experience the social network in the same way that sighted users enjoy it. It uses object recognition to generate a description of a photo.


IBM And SAP Take The Next Step

Meanwhile, IBM and SAP announced that they are taking their relationship to the next level. Where they used to have their technologies running side by side, they will now collaborate on their offerings and establish co-locations in Walldorf, Germany, and Palo Alto, Calif. The companies will focus on six areas for collaboration, among them putting SAP's HANA Enterprise Cloud platform and applications onto IBM's infrastructure and software to run as a private cloud service. And IBM, which has staked much of its future business on cognitive computing and analytics with IBM Watson, plans to develop such solutions for SAP S/4HANA as part of the expanded partnership. There are several other planned collaborations as well.

Gender, Age, Salaries, And Data

Time has taken data from IPUMS-USA and created an interactive calculator that may just make you feel terrible, depending on your gender and your age. Plug in your profession, your age, and your gender, and find out how much you make vs. your counterpart of another gender and the same age. If you are a 40-year-old female in the category of "Computer Scientists and Systems Analysts/Network systems Analysts/Web Developers," for instance, you'd make 25% more if you were a man. For "Mathematical science occupations (all others)," if you are a 40-year-old woman, your male counterpart makes 36% more than you do. The pay gap gets bigger the older you are for most professions, it appears.
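
For readers who want to check the arithmetic behind a figure like "25% more," here is a minimal Python sketch. The salary numbers are hypothetical and chosen only to illustrate the calculation; they are not taken from Time's calculator or the IPUMS-USA data, and the function names are our own.

```python
def percent_more(higher: float, lower: float) -> float:
    """How much larger `higher` is than `lower`, as a percentage of `lower`."""
    if lower <= 0:
        raise ValueError("lower must be positive")
    return (higher - lower) / lower * 100.0


def percent_less(higher: float, lower: float) -> float:
    """How much smaller `lower` is than `higher`, as a percentage of `higher`."""
    if higher <= 0:
        raise ValueError("higher must be positive")
    return (higher - lower) / higher * 100.0


# Hypothetical salaries, for illustration only.
woman_salary = 80_000.0
man_salary = 100_000.0

print(f"He makes {percent_more(man_salary, woman_salary):.0f}% more")
print(f"She makes {percent_less(man_salary, woman_salary):.0f}% less")
```

Note that the two percentages differ because they use different baselines: with these made-up numbers, "25% more" for him is the same gap as "20% less" for her.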