The InformationWeek -- Blogs
Wolfe's Den Blog


Bug In AMD's Quad-Core Barcelona And Phenom May Be More Serious Than Previously Suspected


Posted by Alexander Wolfe, Dec 9, 2007 09:26 AM

On Friday, I thought I'd identified the translation-lookaside buffer (TLB) bug that AMD said was responsible for the problems it's having with its new Barcelona and Phenom quad-core processors. Now, two readers claim the bug is more serious than I suggested. Their reasoning: while there is a BIOS workaround, they say it carries a big performance penalty. (There's also an operating-system fix, reportedly with no performance hit.) This may be why heavy-volume shipments don't seem to be in the cards until Q1, when updated silicon, now being readied, becomes available.


Okay, here's the deal. On Thursday, AMD issued a statement in which it said: "There has been some talk about an erratum relative to our TLB cache in Barcelona as well a Phenom processor resulting in delays. AMD notified customers of this erratum and released a BIOS fix prior to the Nov. 19th launch that resolves it."

I looked through my AMD documentation and came up with what I thought was the bug (erratum) to which the statement referred. I figured it was number 122, "TLB Flush Filter May Cause Coherency Problem in Multicore Systems." Erratum 122 isn't a huge deal; it can be managed by disabling the TLB flush filter.

However, by Friday evening two anonymous readers had posted comments claiming that bug 122 wasn't the bug at issue, and in fact the glitch affecting Barcelona and Phenom is more serious than anyone thinks.

Here's what commenter "Fred" wrote:

"Alex, you're wrong. The erratum is not yet in AMD's public documentation. It's #298, or something, and it most certainly IS a show-stopper.

The patch, needed to avoid random crashes, results in [an approximately] 13% penalty to desktop apps and a huge penalty to virtualization. The penalty is so bad that no Tier 1 OEM will ship Barcelona servers until the B3 stepping in Q1.

AMD is left foisting these defective parts on HPC installations willing to take them at a steep discount, and, until recently, an unsuspecting consumer public that was buying 9500 and 9600 Phenoms. AMD made sure these parts were benchmarked by review sites without the performance-killing fix, which is shameless. It really is surprising that there hasn't been a recall."

Here's what the second poster, self-identified using the Slashdot slang "Anonymous Coward," wrote:

"Errata definitely exists, and as Fred pointed out, it's a new number. There are actually two "fixes" for this bug.

1) BIOS-level fix (the 13-20% performance penalty). I've read this errata, it sets two specific hidden registers, surprisingly simple ... which means I'll bet the BIOS-level fix actually disables the L3 cache, the performance penalty is about right.

2) Operating system workaround, word is the performance cost is effectively zero. RedHat has a fix, Microsoft has a fix, VMware (who would take the biggest performance hit) could do one, it's easy to do. Catch is, the OEMs can't guarantee the end customer runs a patched OS, so the OEM would rather wait three months to ship a fixed processor.

One of the rumors I heard suggested this bug affects all processors above a certain speed, 2.0GHz chips wouldn't hit it but 2.4+ are likely, so everything ends up in a low speed bin. Just a rumor."

So, in summary, these posters are claiming that the bug in Phenom and Barcelona causes random crashes and that there are both BIOS and operating-system workarounds, but that one of those fixes -- the BIOS -- results in a big performance penalty.

I was going to cut the post off here, but I have one final thought. "Anonymous Coward's" closing rumor, above, that the bug affects all processors above a certain speed, makes me wonder whether it has something to do with erratum #169, "System May Hang Due To DMA or Stalled Probe Response." This is an obscure glitch where, under certain timing conditions, the Northbridge hangs. Interestingly, the fix is a BIOS workaround.

Anyway, I have a query in to AMD, and I also invite any readers with knowledge of the situation to comment below.

Finally, I want to state that I remain a huge fan of AMD's innovative new 10h architecture, which is making its first appearance in Barcelona and Phenom. I remain convinced that the success of both of these processor families is important for the industry. I refer you to my earlier piece, Inside AMD's Phenom And Opteron Quad-Core Architectures.

P.S. Readers who wish to comment directly can e-mail me at alex@alexwolfe.net



Detailed description of erratum 169, which isn't the one the commenters are talking about, just one that I think might be relevant.

P.P.S. For the latest update, see AMD's Quad-Core Barcelona Bug Revealed.
