9/17/2013
09:15 AM

Spotify Embraces Hortonworks, Dumps Cloudera

World's largest music service switches Hadoop distributions to take advantage of Hortonworks Hive improvements, support services.

Spotify, the 24-million-user-strong music service based in Stockholm and London, announced Monday that it's migrating its massive, 690-node Hadoop cluster from Cloudera's software distribution to the Hortonworks Data Platform (HDP) and Hortonworks enterprise support.

Among the largest Hadoop implementations in Europe, Spotify's cluster is used to develop the analytics behind the company's personalized services, such as Spotify Radio. It also powers analyses for advertisers and partners. For example, Spotify can segment listeners to help advertisers place ads, and it can run geospatial analyses of listening patterns to help record labels and artists determine optimal concert locations.

"[Hortonworks'] true open source approach and the work they have done to improve the Apache Hive data warehouse system aligns well with our needs," said Wouter de Bie, team lead for data infrastructure at Spotify, in a statement. "We use Hive extensively for ad-hoc queries and for the analysis of large data sets."

Most Hadoop software distributors have supported the so-called SQL-on-Hadoop movement this year -- Cloudera with Impala, IBM with Big SQL, MapR with Drill, and Pivotal with HAWQ -- but Hortonworks alone is doing so by improving Hadoop's existing Hive interface through its Stinger initiative.


Hive relies on behind-the-scenes MapReduce processing, which has a reputation for being slow, but Hortonworks executives insist that the company's design changes will deliver a 100X performance improvement, yielding ad-hoc query results within "a handful of seconds."

"Spotify is undertaking some really innovative work in the data analytics field and realized the need for a deep level of open source Apache Hadoop domain experience and expertise," commented Herb Cunitz, president of Hortonworks, in a statement.

Spotify launched in 2008 and soon thereafter set up a 30-node cluster on Amazon Web Services. The company switched to an on-premises 60-node cluster less than two years ago and quickly scaled it out to today's 690 nodes. The company collects more than 200 gigabytes of compressed user activity data per day and has more than 4 petabytes of capacity in its cluster.

Spotify could not be reached in time to comment on whether it's simply using Cloudera's distribution of open source software or also employing its commercial management software and support services. Spotify is said to have a highly skilled, 12-plus-engineer internal Hadoop team that would seem quite capable of running Hadoop independently. That team developed Luigi, a Python framework for batch data processing, dependency resolution and monitoring of Hadoop that Spotify has since contributed to open source.

"The cultural fit was an important factor in our selection and we have appreciated Hortonworks' relaxed, helpful and open approach," said Wouter de Bie. "We were looking for a true partner relationship and the team at Hortonworks [is] committed to enabling the overall ecosystem."

Comments
D. Henschen,
User Rank: Author
9/26/2013 | 8:09:53 PM
re: Spotify Embraces Hortonworks, Dumps Cloudera
I finally got the telling answer from Wouter at Spotify and it's as I suspected:

"We were not using Cloudera's commercial management software or support
beforehand," says Wouter. "Everything was done in-house, but we were
running CDH."

Kind of takes the bite out of "dumps" for Cloudera.
Guest,
User Rank: Apprentice
9/26/2013 | 8:06:44 PM
re: Spotify Embraces Hortonworks, Dumps Cloudera
"We were not using Cloudera's commercial
management software or support beforehand," says Wouter. "Everything was done in-house, but we
were running CDH."
D. Henschen,
User Rank: Author
9/17/2013 | 3:54:16 PM
re: Spotify Embraces Hortonworks, Dumps Cloudera
Check out this big presentation from Wouter de Bie on Spotify's implementation and uses of Hadoop (http://bit.ly/153evDr). I didn't see any mention of Cloudera in the slides, so I suspect Spotify is another of the many enterprises that have been setting up and supporting Hadoop clusters on their own (without the benefit of support from the likes of Cloudera or Hortonworks). That's clearly changing now at Spotify with the selection of Hortonworks, but I'm still waiting to hear whether it was actually using proprietary Cloudera management software and/or support services.