Commentary

Alexander Wolfe
 

Bug In AMD's Quad-Core Barcelona And Phenom May Be More Serious Than Previously Suspected

On Friday, I thought I'd identified the translation-lookaside buffer (TLB) bug that AMD said was responsible for the problems it's having with its new Barcelona and Phenom quad-core processors. Now, two readers claim that the bug is more serious than I suggested: while there is a BIOS workaround, they say it carries a big performance penalty. (They say there's also an operating system fix with essentially no performance hit.) This may be why heavy volume shipments don't seem to be in the cards until Q1, when updated silicon, now being readied, becomes available.

Okay, here's the deal. AMD on Thursday issued a statement where it said: "There has been some talk about an erratum relative to our TLB cache in Barcelona as well a Phenom processor resulting in delays. AMD notified customers of this erratum and released a BIOS fix prior to the Nov. 19th launch that resolves it."

I looked through my AMD documentation and came up with what I thought was the bug (erratum) to which the statement referred. I figured it was number 122, "TLB Flush Filter May Cause Coherency Problem in Multicore Systems." Erratum 122 isn't a huge deal; it can be managed by disabling the TLB flush filter.
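
For readers wondering what "disabling the TLB flush filter" amounts to in practice, here is a minimal Python sketch. It assumes the register layout used by the Linux kernel's workaround for erratum 122 on earlier AMD parts, where the flush-filter disable flag (FFDIS) is bit 6 of the HWCR model-specific register at address 0xC0010015. The function and constant names are illustrative, and the code only computes the new register value; it does not touch hardware.

```python
# Illustrative sketch only: computes the HWCR value with the TLB flush
# filter disabled. It does not read or write any real hardware register.

HWCR_MSR_ADDRESS = 0xC0010015  # assumed: AMD hardware configuration register (HWCR)
FFDIS_BIT = 6                  # assumed: "flush filter disable" bit within HWCR


def with_flush_filter_disabled(hwcr_value: int) -> int:
    """Return hwcr_value with the FFDIS bit set, leaving other bits unchanged."""
    return hwcr_value | (1 << FFDIS_BIT)


def flush_filter_is_disabled(hwcr_value: int) -> bool:
    """Report whether the FFDIS bit is already set in hwcr_value."""
    return bool(hwcr_value & (1 << FFDIS_BIT))
```

A BIOS or kernel would read the current register value on each core, apply the equivalent of `with_flush_filter_disabled`, and write the result back. The newer erratum the commenters describe below is said to involve different, hidden registers, so this sketch would not apply to it.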



However, by Friday evening two anonymous readers had posted comments claiming that bug 122 wasn't the bug at issue, and in fact the glitch affecting Barcelona and Phenom is more serious than anyone thinks.

Here's what commenter "Fred" wrote:

"Alex, you're wrong. The erratum is not yet in AMD's public documentation. It's #298, or something, and it most certainly IS a show-stopper.

The patch, needed to avoid random crashes, results in [an approximately] 13% penalty to desktop apps and a huge penalty to virtualization. The penalty is so bad that no Tier 1 OEM will ship Barcelona servers until the B3 stepping in Q1.

AMD is left foisting these defective parts on HPC installations willing to take them at a steep discount, and, until recently, an unsuspecting consumer public that was buying 9500 and 9600 Phenoms. AMD made sure these parts were benchmarked by review sites without the performance-killing fix, which is shameless. It really is surprising that there hasn't been a recall."

Here's what the second poster, who used the Slashdot-style handle "Anonymous Coward," wrote:

"Errata definitely exists, and as Fred pointed out, it's a new number. There are actually two "fixes" for this bug.

1) BIOS-level fix (the 13-20% performance penalty). I've read this errata, it sets two specific hidden registers, surprisingly simple ... which means I'll bet the BIOS-level fix actually disables the L3 cache, the performance penalty is about right.

2) Operating system workaround, word is the performance cost is effectively zero. RedHat has a fix, Microsoft has a fix, VMware (who would take the biggest performance hit) could do one, it's easy to do. Catch is, the OEMs can't guarantee the end customer runs a patched OS, so the OEM would rather wait three months to ship a fixed processor.

One of the rumors I heard suggested this bug affects all processors above a certain speed, 2.0GHz chips wouldn't hit it but 2.4+ are likely, so everything ends up in a low speed bin. Just a rumor."

So, in summary, these posters claim that the bug in Phenom and Barcelona causes random crashes and that there are both BIOS and operating-system workarounds, but that one of them, the BIOS fix, carries a big performance penalty.

I was going to cut the post off here, but I have one final thought. "Anonymous Coward's" closing rumor, that the bug affects all processors above a certain speed, makes me wonder whether it has something to do with erratum #169, "System May Hang Due To DMA or Stalled Probe Response." This is an obscure glitch in which, under certain timing conditions, the Northbridge hangs. Interestingly, the fix is a BIOS workaround.

Anyway, I have a query in to AMD, and I also invite any readers with knowledge of the situation to comment below.

Finally, I want to state that I remain a huge fan of AMD's innovative new 10h architecture, which is making its first appearance in Barcelona and Phenom. I remain convinced that the success of both of these processor families is important for the industry. I refer you to my earlier piece, "Inside AMD's Phenom And Opteron Quad-Core Architectures."

P.S. Readers who wish to comment directly can e-mail me at alex@alexwolfe.net



Figure: Detailed description of erratum 169, which isn't the one the commenters are talking about, just one that I think might be relevant.

P.P.S. For the latest update, see AMD's Quad-Core Barcelona Bug Revealed.

