Inside Intel's Core Architecture--What Makes It Better?

The introduction of Core is one of the most important technology developments for Intel in years and may determine its success for the rest of the decade and beyond.

Darrell Dunn, Contributor

March 8, 2006


By midyear, the new Core architecture introduced at Intel's Developer Forum this week in San Francisco will provide a single framework that will extend across virtually all of the company's future processor offerings, from desktops to mobile computers to servers.

Core becomes Intel's single architecture for all of the company's mainstream processor offerings, with the exception of Itanium. It will let software developers more easily create applications that port across both client and server implementations.

Intel promises Core will provide increased performance while maintaining or reducing power requirements for running its processors. It's a promise the company must keep if it expects to counter gains made by rival Advanced Micro Devices in the server market with its more efficient Opteron processors.

The advances in Core are tied to five specific innovations: wide dynamic execution, intelligent power capability, advanced smart cache, smart memory access, and advanced digital media boost.

While much of Core is based on the low-power Yonah processor architecture used in Intel's recent Core Duo processors for mobile computers, Stephen Pawlowski, chief technical officer for Intel's digital enterprise group and manager of architecture and planning, says Core is "a new design from the ground up, and not simply a re-do of Yonah."

A simple way to improve the performance of any processor is to increase its clock frequency and the number of instructions it can handle in each clock cycle. The heat created in the operation of a processor, however, increases with every frequency bump. Increasing the number of instructions the engine can handle in each cycle is a better method for improving performance while maintaining or reducing power consumption, Pawlowski says.

The most obvious advancement in the Core architecture beyond NetBurst--the architecture introduced with the Pentium 4 and now used in Xeon and many Pentium processors--is the ability of each core in the processor to fetch, dispatch, execute, and retire up to four full instructions simultaneously, instead of the three instructions possible in prior-generation architectures. The "four wide" architecture provides a 33% increase in the processor's ability to handle instructions, letting a dual-core device handle 16 instructions simultaneously.

The Core architecture also includes a third Arithmetic Logic Unit--two ALUs are used in earlier generation processors--that has been enhanced to handle macrofused instructions. Macrofusion enables common instruction pairs to be combined into a single internal instruction during the decoding process. Two program instructions can then be executed together, reducing the work the processor must complete in a cycle of operation.

Further efficiency improvements include more accurate branch prediction and deeper instruction buffers for greater execution flexibility.

The advanced digital media boost in Core can double the throughput of certain 128-bit-wide instructions, called Streaming SIMD Extensions (SSE). SSE instructions are used in applications such as video, audio, and image processing, as well as in encryption, financial, and scientific applications. Because earlier cores push SSE data through 64-bit-wide execution units, a 128-bit instruction is often executed over two clock cycles. Core executes those wide instructions in a single cycle.

The advanced smart cache was first used in the Core Duo processor. The feature is designed to increase the probability that each execution core in a dual-core processor can access data from the highest-performing memory cache subsystem by using a shared level 2 cache instead of independent caches dedicated to each processor core. When one core has minimal cache requirements, the second core (or cores, in future multicore processors) can increase its share of the level 2 cache, reducing the incidence of cache misses.

Smart memory access improves system performance by optimizing the use of available data bandwidth from the memory subsystem and hiding delays. The result is to ensure that needed data can be used as quickly as possible and is located as close as possible to where it will eventually be needed.

Intelligent power capability provides a more finely grained ability to turn off segments of a processor that aren't being used, as the Core Duo processor did, and even subsegments of the processor that aren't being fully used.

The resulting improvements will be seen in each of the processors that will begin shipping in the second half of this year. The Woodcrest processor for servers will provide an 80% performance increase over existing Xeon processors, while reducing power consumption by 35%; the Conroe processor for desktop PCs will provide a 40% increase over existing Pentium processors, while reducing power demands by 40%; and the Merom processor for mobile PCs will provide a 20% performance boost over Core Duo, while maintaining the same power level, according to Pawlowski.
