Spotify Embraces Hortonworks, Dumps Cloudera - InformationWeek
09:15 AM

World's largest music service switches Hadoop distributions to take advantage of Hortonworks Hive improvements, support services.

Spotify, the 24-million-user-strong music service based in Stockholm and London, announced Monday that it's migrating its massive, 690-node Hadoop cluster from Cloudera's software distribution to the Hortonworks Data Platform (HDP) and Hortonworks enterprise support.

Among the largest Hadoop implementations in Europe, Spotify's cluster is used to develop analytics that drive the company's personalized services, such as Spotify Radio. It also powers analyses for advertisers and partners. For example, Spotify can segment listeners to help advertisers place ads. It can also run geospatial analyses of listening patterns to help record labels and artists determine optimal concert locations.

"[Hortonworks'] true open source approach and the work they have done to improve the Apache Hive data warehouse system aligns well with our needs," said Wouter de Bie, team lead for data infrastructure at Spotify, in a statement. "We use Hive extensively for ad-hoc queries and for the analysis of large data sets."

Most Hadoop software distributors have supported the so-called SQL-on-Hadoop movement this year -- Cloudera with Impala, IBM with Big SQL, MapR with Drill, and Pivotal with HAWQ -- but Hortonworks is alone in doing so by focusing on improving Hadoop's existing Hive interface through its Stinger initiative.

Hive relies on behind-the-scenes MapReduce processing, which has a reputation for being slow, but Hortonworks executives insist that the company's design improvements will drive a 100X performance improvement that will yield ad-hoc query results within "a handful of seconds."

"Spotify is undertaking some really innovative work in the data analytics field and realized the need for a deep level of open source Apache Hadoop domain experience and expertise," commented Herb Cunitz, president of Hortonworks, in a statement.

Spotify launched in 2008 and soon thereafter deployed a 30-node cluster on Amazon Web Services. The company switched to an on-premises 60-node cluster less than two years ago and quickly scaled it out to today's 690 nodes. The company collects more than 200 gigabytes of compressed user activity data per day and has more than 4 petabytes of capacity in its cluster.

Spotify could not be reached in time to comment on whether it's simply using Cloudera's distribution of open source software or also employing its commercial management software and support services. Spotify is said to have a highly skilled, 12-plus-engineer internal Hadoop team that would seem quite capable of running Hadoop independently. That team developed Luigi, a Python framework for batch data processing, dependency resolution and monitoring of Hadoop that Spotify has since contributed to open source.

"The cultural fit was an important factor in our selection and we have appreciated Hortonworks' relaxed, helpful and open approach," said Wouter de Bie. "We were looking for a true partner relationship and the team at Hortonworks [is] committed to enabling the overall ecosystem."

D. Henschen
User Rank: Author
9/26/2013 | 8:09:53 PM
re: Spotify Embraces Hortonworks, Dumps Cloudera
I finally got the telling answer from Wouter at Spotify and it's as I suspected:

"We were not using Cloudera's commercial management software or support beforehand," says Wouter. "Everything was done in-house, but we were running CDH."

Kind of takes the bite out of "dumps" for Cloudera.
User Rank: Apprentice
9/26/2013 | 8:06:44 PM
re: Spotify Embraces Hortonworks, Dumps Cloudera
"We were not using Cloudera's commercial management software or support beforehand," says Wouter. "Everything was done in-house, but we were running CDH."
D. Henschen
User Rank: Author
9/17/2013 | 3:54:16 PM
re: Spotify Embraces Hortonworks, Dumps Cloudera
Check out this big presentation from Wouter de Bie on Spotify's implementation and uses of Hadoop. I didn't see any mention of Cloudera in the slides, so I suspect Spotify is another of the many enterprises that have been setting up and supporting Hadoop clusters on their own (without the benefit of support from the likes of Cloudera or Hortonworks). That's clearly changing now at Spotify with the selection of Hortonworks, but I'm still waiting to hear whether it was actually using proprietary Cloudera management software and/or support services.