Greenplum Out, Teradata In at eBay - InformationWeek
Commentary
10/6/2010
02:31 PM
Curt Monash

Greenplum Out, Teradata In at eBay

I chatted with Oliver Ratzesberger of eBay around a Stanford picnic table yesterday (the XLDB 4 conference is being held at Jacek Becla's home base of SLAC, which used to stand for "Stanford Linear Accelerator Center"). Todd Walter of Teradata also sat in on the latter part of the conversation. Things I learned included:

  • eBay has thrown out Greenplum. (Edit: Oliver Ratzesberger responds that the "thrown out" part "could not be further from the truth. The answer to a casual question over lunch was: Do you still use vendor XYZ? And my response was a simple 'No.' ... we have simply selected a different vendor for V2 or our Singularity project... " See comments here for more detail.) eBay's 6.5-petabyte Greenplum database has turned into a >10-petabyte Teradata database, which will grow 2.5x further in size soon.
    • Specifically, Oliver told me there are 8 petabytes of spinning disk, with 80% compression. So that's 8 / (1 - 0.8) = 40 petabytes of uncompressed capacity before you multiply by a reducing factor to cover mirroring, temp space, and so on. My low end for that factor would be 25-28%; my high end would be 35-40%; either way, we're talking about >10 petabytes of true user data.
    • The 8 petabytes of spinning disk are headed to 20 petabytes next year.
    • Oliver gave the impression that Greenplum got thrown out more for reliability reasons than performance. (While eBay saw a major performance difference between Teradata and Greenplum, Oliver previously indicated he was inclined to attribute this more to specific Sun Thumper hardware/storage choices than to software.)
  • That database, called "Singularity," has some interesting aspects -- notably, a character field that's a string of name-value pairs -- on which you can do views and so on for virtual tables -- in a table that otherwise has dozens of conventional relational columns.
    • The system ingests log data in the form of lots and lots of name-value pairs.
    • The most commonly found ones go into columns in the usual way.
    • The rest are strung together into, well, a character string.
    • Teradata has developed some features for eBay that make it easier to index, query, etc. on that character string of name-value pairs.
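
The Singularity layout described above (common names as columns, the rest strung into one character field, with views on top for virtual tables) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `;` and `=` delimiters, the `nv_pairs` field name, the sample log records, and every function name are hypothetical, and none of this represents the features Teradata actually built for eBay.

```python
from collections import Counter

PAIR_SEP = ";"  # assumed delimiter between pairs (not from the article)
KV_SEP = "="    # assumed delimiter between a name and its value


def choose_columns(records, n_columns):
    """Pick the n most frequently seen names to become ordinary columns."""
    counts = Counter(name for rec in records for name in rec)
    return [name for name, _ in counts.most_common(n_columns)]


def to_row(record, columns):
    """Split one log record into fixed columns plus a residual string."""
    row = {col: record.get(col) for col in columns}
    rest = {k: v for k, v in record.items() if k not in columns}
    # Everything that did not earn a column is strung together.
    row["nv_pairs"] = PAIR_SEP.join(
        f"{k}{KV_SEP}{v}" for k, v in sorted(rest.items())
    )
    return row


def nv_lookup(row, name):
    """Pull a single value back out of the residual string."""
    for pair in row["nv_pairs"].split(PAIR_SEP):
        key, sep, value = pair.partition(KV_SEP)
        if sep and key == name:
            return value
    return None


def virtual_table(rows, names):
    """A view-like projection that exposes chosen names as columns."""
    return [{name: nv_lookup(row, name) for name in names} for row in rows]


# Hypothetical log records, each a bag of name-value pairs.
records = [
    {"user": "u1", "page": "home", "ab_test": "B"},
    {"user": "u2", "page": "item", "coupon": "X9"},
    {"user": "u1", "page": "search", "ab_test": "A"},
]
columns = choose_columns(records, 2)
rows = [to_row(rec, columns) for rec in records]
ab_view = virtual_table(rows, ["ab_test"])
```

Here `user` and `page` appear in every record, so they become conventional columns, while `ab_test` and `coupon` stay in the string and are reachable only through the lookup. A real system would also need escaping for values that contain the delimiters, which this sketch ignores.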

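The capacity estimate for Singularity given earlier in this list is simple arithmetic, and it can be checked with a short sketch. It assumes, as the estimate itself does, that "80% compression" means compressed data occupies 20% of its original size; the function and parameter names are hypothetical. Percentages are passed as whole numbers to keep the arithmetic free of floating-point noise.

```python
def raw_capacity_pb(disk_pb, compression_pct):
    """Uncompressed petabytes that fit on disk_pb of compressed storage."""
    return disk_pb * 100 / (100 - compression_pct)


def user_data_pb(disk_pb, compression_pct, usable_pct):
    """True user data after mirroring, temp space, and similar overhead."""
    return raw_capacity_pb(disk_pb, compression_pct) * usable_pct / 100


# Today: 8 PB of spinning disk at 80% compression.
today_raw = raw_capacity_pb(8, 80)    # 40 PB uncompressed
today_low = user_data_pb(8, 80, 25)   # low end of the reducing factor
today_high = user_data_pb(8, 80, 40)  # high end of the reducing factor

# Next year: 20 PB of spinning disk, same assumptions.
next_low = user_data_pb(20, 80, 25)
next_high = user_data_pb(20, 80, 40)

print(f"today: {today_low:.1f} to {today_high:.1f} PB of user data")
print(f"next year: {next_low:.1f} to {next_high:.1f} PB of user data")
```

With these inputs the current system lands between 10 and 16 petabytes of user data. Going from 8 to 20 petabytes of disk is the same 2.5x growth quoted for the database as a whole, which would put next year's figure between 25 and 40 petabytes if compression and overhead stay the same.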
  • eBay's more EDW-like (Enterprise Data Warehouse) multi-petabyte Teradata database continues to grow, with the main system apparently up to 4.5 petabytes from the previous 2.5.
  • I took the opportunity to ask what kinds of data marts (virtual or otherwise) were spun out in practice.
    • In Oliver's ranking,
      • #1 was derived data based on other data already in the data warehouse.
      • #2 was other data within eBay that had never been put into the data warehouse in the first place.
      • #3 was data that truly came from outside eBay.
    • Todd Walter chimed in to point out that at other Teradata customers, who perhaps didn't have as fully fleshed out an EDW, #1 and #2 could be reversed.

  • eBay sees Hadoop as an interesting tool for certain special purposes.
    • eBay likes Hadoop for certain tasks such as image analysis. (Edit: And analysis of search results.)
    • eBay doesn't like Hadoop for anything that requires data movement, such as a join.
    • Similarly, eBay doesn't like HBase.

  • eBay is enamored of the idea of doing "social networking around analytics."
    • This is something that has been built but not rolled out yet.
    • It seems more focused on actual business intelligence than on the underlying data, unlike Greenplum Chorus, which seems more focused on the databases themselves.
    • Since it hasn't been rolled out yet, we don't know which (if any) of activity streams, forums, or whatever will actually get significant adoption.