How eBay's Kylin Tool Makes Sense Of Big Data - InformationWeek

12:06 PM

When eBay needed to gain more value out of masses of Hadoop data, the company created its own open source tool called Kylin. Now Kylin is an Apache Software Foundation project.


Inside the eBay operations "war room" last December, data analysts and data scientists had one big question on their minds as traffic approached its holiday crescendo: What was the hottest-selling item among the 800 million items listed on the eBay website?

The answer wasn't one that many of them had expected.

"We found that every 12 seconds, we were selling a hoverboard," recalls Debashis Saha, vice president of Commerce Platform and Infrastructure. "It was our hottest-selling item" and one that previously hadn't even shown up on eBay's radar.

With that information in hand, eBay executives could contact suppliers and manufacturers of hoverboards, alert them to the unexpectedly high demand, and urge them to keep their manufacturing going and inventories stocked. It was a way of keeping customers satisfied and safeguarding eBay's own business, one made possible through a fast data analysis system called Kylin.

(Image: Nancy Nehring/iStockphoto)

Kylin is open source software that began as an internal eBay project, when the company was casting about for a tool that could help it make sense of all the data flowing into its Hadoop implementations.

By 2012 and 2013, there were already plenty of front-end tools extending Hadoop's basic distributed file system and MapReduce functionality.

However, eBay needed to look at 10 billion rows of data from multiple angles, and do it quickly. In addition to its Hadoop-tolerant big data scientists, it had a staff of data analysts accustomed to working with the precision of ANSI-standard SQL queries. They were frustrated by the tools then available.

Apache Hive was an existing data warehouse system that worked with Hadoop. While it offered a SQL-like query language, it did not yet support ANSI-standard SQL at the time eBay needed it.

Sorting Through Data

"We had started to create a data ocean on Hadoop, but we weren't getting value out of it," recalled Saha in an interview with InformationWeek. Data analysts were exporting data out of Hadoop into OLAP and other SQL query-based systems, so they could find what they wanted, but that added steps to a process that needed to occur faster.

"We needed near real-time decisions on these extremely large data sets. Without them, we couldn’t respond fast enough," recalled Saha.

Saha was also troubled by a growing gap between the data analysts who preferred to work with SQL and the data scientists accustomed to working within Hadoop's limitations.

A small team of developers in his organization set about addressing the problem in late 2013. By October 2014, they were far enough along with the SQL-standard, Hadoop-compatible Kylin project to propose it as an Apache Software Foundation project. A little over a year later, it was out of incubation and a full-fledged, top-level Apache project with 32 core developers.

Ten of them are eBay employees.

Kylin leverages Hadoop's ability to scale out to thousands of nodes on a server cluster and makes use of the distributed processing enabled by MapReduce. At the same time, it can accept ANSI-standard SQL queries from a data visualization tool like Tableau and return results.
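To make "fielding a SQL query" concrete outside a BI tool: besides the ODBC driver that tools such as Tableau use, Kylin's REST server accepts queries over HTTP. The Python sketch below only builds the request and does not send it. It assumes the `/kylin/api/query` endpoint, the default port 7070, HTTP Basic authentication, and the `sql`, `project`, `offset`, `limit` and `acceptPartial` JSON fields as Kylin's REST API documents them; check these against the documentation for your Kylin version. The host, project name, table and credentials are hypothetical.

```python
import base64
import json
import urllib.request


def build_kylin_query_request(host, project, sql, user, password,
                              port=7070, limit=50000):
    """Build (but do not send) a POST request for Kylin's REST query endpoint."""
    url = f"http://{host}:{port}/kylin/api/query"
    # JSON body: the SQL text plus the Kylin project whose cubes should answer it.
    body = json.dumps({
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }).encode("utf-8")
    # HTTP Basic auth header built from user:password.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical host, project, table and placeholder credentials.
req = build_kylin_query_request(
    host="kylin.example.com",
    project="sales_analytics",
    sql="SELECT region, team, SUM(price) AS gmv FROM sales GROUP BY region, team",
    user="ANALYST",
    password="secret",
)
```

Sending it would be a single `urllib.request.urlopen(req)` call, which returns a JSON body containing the result rows. Keeping request construction separate from sending makes it easy to inspect or test without a live cluster.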

OLAP (online analytical processing) technology is not new. Building data cubes that can be viewed from a variety of angles was a well-established practice before Hadoop was invented. What Kylin added was cube-building on a massive scale. Before those views are available, hundreds of billions of rows in Hadoop must be indexed. Kylin's ability to build "smart indexes" at that scale is one of the things that sets it apart, said Saha.

Debashis Saha, vice president of commerce platform and infrastructure at eBay.

With the indexes already built, Kylin users can get faster views and more useful results from large amounts of Hadoop data. "You can take a more granular level of the data and find results that satisfy these (specific) criteria," he said.
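The idea behind cube-building can be shown at toy scale. This Python sketch is not Kylin's code: it assumes a made-up sales table with three dimensions and one measure, precomputes SUM(price) for every combination of dimensions (each combination is called a cuboid, so three dimensions yield eight), and then answers a filtered GROUP BY by reading one small precomputed cuboid instead of rescanning the raw rows. Kylin does the equivalent work with MapReduce over Hadoop data and offers ways to limit which cuboids get built; here every cuboid is materialized.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and made-up rows; not eBay data.
DIMENSIONS = ("region", "team", "category")
ROWS = [
    {"region": "Southeast", "team": "Panthers", "category": "jersey", "price": 80},
    {"region": "Southeast", "team": "Broncos", "category": "jersey", "price": 90},
    {"region": "Mountain", "team": "Broncos", "category": "cap", "price": 25},
    {"region": "Midwest", "team": "Broncos", "category": "jersey", "price": 85},
    {"region": "Southeast", "team": "Panthers", "category": "cap", "price": 20},
]


def build_cube(rows, dims=DIMENSIONS, measure="price"):
    """Precompute SUM(measure) for every subset of dims, i.e. every cuboid."""
    cube = {}
    for k in range(len(dims) + 1):
        for subset in combinations(dims, k):  # subsets keep the order of dims
            agg = defaultdict(int)
            for row in rows:
                agg[tuple(row[d] for d in subset)] += row[measure]
            cube[subset] = dict(agg)
    return cube


def query(cube, group_by, filters=None, dims=DIMENSIONS):
    """SUM ... WHERE dim = value ... GROUP BY group_by, answered from one cuboid."""
    filters = filters or {}
    needed = set(group_by) | set(filters)
    # The cuboid holding exactly the grouped and filtered dimensions.
    cuboid_dims = tuple(d for d in dims if d in needed)
    result = defaultdict(int)
    for key, total in cube[cuboid_dims].items():
        values = dict(zip(cuboid_dims, key))
        if all(values[d] == v for d, v in filters.items()):
            result[tuple(values[d] for d in group_by)] += total
    return dict(result)


cube = build_cube(ROWS)
# Which team's gear sells best in one region?
southeast = query(cube, ["team"], {"region": "Southeast"})
# {('Panthers',): 100, ('Broncos',): 90}
```

Because a query reads only the cuboid that matches its dimensions, its cost depends on the number of distinct dimension combinations rather than on the number of raw rows. The trade-off is build time and storage: the number of cuboids doubles with each added dimension.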

Broncos Win

Among other things, eBay data researchers wanted to know which team's paraphernalia was selling best in the run-up to the Super Bowl.

Carolina Panthers gear was selling extremely well in their home region, but the Denver Broncos, and Peyton Manning in particular, had a broader appeal across much of the country. That information could guide eBay operations in making sure the right resources were behind the right memorabilia vendors.

A query handled by Kylin can obtain sub-second results from a data cube representing 10 billion rows, yielding information that's timely in terms of Super Bowl sales, Saha said. Kylin completes 90% of its queries in five seconds or less, according to a Dec. 8 eBay blog post.

Kylin isn't the only tool invented at eBay to work with Hadoop.

Its developer teams have also produced Eagle, a data monitoring tool that quickly detects unauthorized access to sensitive data or malicious activity connected to data, as well as Pulsar, a data visualization and reporting framework. Both are also open source code.

However, Kylin has won the widest following. It's now used by many other companies, including Baidu, Expedia, and China Mobile.


"In eBay, we collect every user behavior on any eBay screen. While other OLAP engine struggles with the data volume, Kylin enables milliseconds response," Wilson Pang, eBay's senior director of behavior insights, wrote in the December blog post.

"All together, Kylin serves as a critical backend component for eBay's product analytics platform... It's the best OLAP engine on big data so far," Pang wrote.


Charles Babcock is an editor-at-large for InformationWeek and author of Management Strategies for the Cloud Revolution, a McGraw-Hill book. He is the former editor-in-chief of Digital News, former software editor of Computerworld and former technology editor of Interactive ...

Charlie Babcock (author), 2/12/2016 | 4:32:56 PM
Kylin components use other parts of the Hadoop ecosystem
Kylin can read data from Hive, run sorting and pre-calculations against the data via MapReduce, and store the resulting cubes in HBase, using ZooKeeper to coordinate jobs, according to some of the project's documentation. Its components include a Metadata Manager, a REST Server, an ODBC Driver, a Query Engine and a Storage Engine.
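The pre-calculation step described in this comment can be sketched on a single machine. The example assumes the layer-by-layer ("by layer") cubing approach Kylin describes for its MapReduce engine: the base cuboid, which contains every dimension, is aggregated from the raw rows, and each later layer, with one dimension fewer, is aggregated from the layer above rather than from the source data. Each cuboid is identified by a bitmask with one bit per dimension. The dimension names, bit ordering and sample rows are assumptions for illustration; in Kylin each layer runs as a MapReduce job and the results are written to HBase.

```python
from collections import defaultdict

# Hypothetical dimensions; the first dimension maps to the highest bit (an assumption).
DIMS = ("region", "team", "category")
N = len(DIMS)
BASE_ID = (1 << N) - 1  # 0b111: the base cuboid, which has every dimension

# Made-up sample rows for illustration.
ROWS = [
    {"region": "Southeast", "team": "Panthers", "category": "jersey", "price": 80},
    {"region": "Southeast", "team": "Broncos", "category": "jersey", "price": 90},
    {"region": "Mountain", "team": "Broncos", "category": "cap", "price": 25},
    {"region": "Midwest", "team": "Broncos", "category": "jersey", "price": 85},
    {"region": "Southeast", "team": "Panthers", "category": "cap", "price": 20},
]


def dims_of(cuboid_id):
    """Dimensions whose bit is set in cuboid_id, in DIMS order."""
    return tuple(d for i, d in enumerate(DIMS) if cuboid_id & (1 << (N - 1 - i)))


def base_cuboid(rows, measure="price"):
    """First layer: aggregate the raw rows over all dimensions."""
    agg = defaultdict(int)
    for row in rows:
        agg[tuple(row[d] for d in DIMS)] += row[measure]
    return dict(agg)


def roll_up(parent_id, parent, child_id):
    """Aggregate a parent cuboid into a child that has one dimension fewer."""
    child_dims = dims_of(child_id)
    keep = [i for i, d in enumerate(dims_of(parent_id)) if d in child_dims]
    child = defaultdict(int)
    for key, total in parent.items():
        child[tuple(key[i] for i in keep)] += total
    return dict(child)


def build_by_layer(rows):
    """Build all 2**N cuboids, each layer computed from the layer above it."""
    cube = {BASE_ID: base_cuboid(rows)}
    layer = [BASE_ID]
    while layer:
        next_layer = []
        for parent_id in layer:
            for bit in range(N):
                mask = 1 << bit
                child_id = parent_id & ~mask  # drop one dimension
                # Compute each child once, from the first parent that reaches it.
                if parent_id & mask and child_id not in cube:
                    cube[child_id] = roll_up(parent_id, cube[parent_id], child_id)
                    next_layer.append(child_id)
        layer = next_layer
    return cube


cube = build_by_layer(ROWS)
print(dims_of(0b010), cube[0b010])
# ('team',) {('Panthers',): 100, ('Broncos',): 200}
```

Reusing the previous layer is the point of the design: each roll-up reads an already-aggregated parent cuboid, which is usually far smaller than the source table.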
