Aug 01, 2011
Optimizing an application's use of cache memory is critical to its overall performance. Modern multi-core, pipelined CPUs operate on data much faster than main system memory can supply it. If the cache does not hold the data the CPU needs next, all of that processing power stalls, sitting idle until the next piece of work arrives.
There are several distinct steps in optimizing an application for good cache performance, each targeting a different category of problems. The order of the steps matters: some problems obscure others, and some optimizations enable further optimizations or make them unnecessary. Following this workflow avoids wasted effort on unneeded optimizations and missed opportunities that may not present themselves clearly if the steps are done in another order.
The workflow outlined in this whitepaper addresses data layout and data access patterns, explores data reuse opportunities, considers multithreading issues, and finally puts on the finishing touches. It provides a solid framework within which developers can optimize the performance of key parts of an application. The steps, from optimizing general access patterns through advanced techniques to multithreading concerns, cover the areas where cache memory bottlenecks occur. Developers who follow them will reach a faster application sooner than those who work ad hoc and risk spending time optimizing the wrong parts of their code.