Santa Clara, Calif. -- The tools needed to program and debug multicore ICs are in the "dark ages," according to a keynoter at last week's Multicore Expo here. Solutions are emerging, but the dearth of parallel-programming tools and lack of expertise among embedded designers threaten to slow the progress of multicore architectures.
Although designs studded with two, four or even eight processor cores on the same die are fast becoming commonplace in embedded applications, tool support for multicore programming is about where VLSI design tools were in the 1980s--"still in the dark ages," said Anant Agarwal, professor of electrical engineering and computer science at the Massachusetts Institute of Technology.
In his keynote here, Agarwal called for new tools, standards and ecosystems. "Who will be the Microsoft, the Cadence or the Synopsys of multicore?" he asked.
Driven by performance and power concerns, multicore ICs are fast finding favor among designers--so much so that observers warn that in a few years, multicore ICs will have hundreds of cores. Meanwhile, programmers are struggling to cope with today's designs.
"Multicore is hard," said Tomas Evensen, CTO of Wind River Systems. "There are ways to make it easer, but there's a lot of history around sequential programming that makes it hard to move to multicore. A lot of code is written in a single-threaded way, and people don't want to start from scratch and rewrite."
Multicore architectures rely on multiprocessing, and taking advantage of it requires parallel programming. But few embedded designers have the expertise. "Parallel programming was hot 15 years ago in academic circles, and then it wasn't," said Michael McCool, chief scientist at RapidMind Inc. "There's a whole generation of programmers who don't know how to program in parallel. All programmers will have to become parallel programmers, and quickly, because all programs will be parallel."
McCool noted, however, that "compilers do a terrible job extracting parallelism." Multicore debugging is also challenging, because programmers must track interactions between cores and ferret out deadlocks, data races, memory corruption and stalls. Different processors typically come with their own debugging environments, making it tough to get one view of what's going on in the system.
Solutions, however, are emerging. New and existing companies at the Multicore Expo presented compilers, software development platforms, analysis tools and debugging architectures that claim to ease--though not fully automate--the transition to multicore application development. New multicore development capabilities will also be shown at this week's Embedded Systems Conference (see related story, page 42).
Various multicore architectures pose different programming and debugging challenges. Homogeneous multicore ICs, such as the ARM11 MPCore, use very similar or identical processor cores. Heterogeneous multicore architectures, like Texas Instruments Inc.'s OMAP, use different types of processors.
Some homogeneous multicore ICs use symmetric multiprocessing (SMP), in which there's shared memory and a single operating system that automatically assigns processes to different cores. With asymmetric multiprocessing (AMP), the user manually assigns tasks.
Heterogeneous multicore ICs raise a raft of programming challenges, noted Greg Davis, technical lead for compilers at Green Hills Software. Different CPUs may require different compilers, dialects and pragmas, he said, and some have "flaky tools." Auxiliary cores may have limited memory banks and must interact with a master core to swap in memory.
SMP is an attractive programming model, because some existing prepartitioned code will "just run faster," Davis said. But SMP systems may exhibit nondeterminism, inefficiency and latent race conditions. AMP provides more user control over efficiency and determinism, but results in less portable software with higher up-front costs, he said.
Frank Schirrmeister, vice president of marketing for stealth-mode startup Imperas Inc., presented four "axes" for categorizing multicore systems: processors, communications, memory architectures and "specificity" for applications. All affect programming. For some types of designs, the big challenge is mapping tasks to the right processor; for others, it's run-time mapping to determine available compute space.
The shared-bus systems used for many multicore ICs are difficult to program and debug and prone to deadlocks and data races, Schirrmeister said. And the choice of memory architecture affects task execution times, he said.
Multiprocessing presents three major challenges, Schirrmeister said: partitioning, parallelization and optimization. What's needed, he said, is a programming model that makes it possible to create parallel applications, optimize the mapping of those applications onto parallel hardware and gather data to guide the optimization decisions.
Providers are promoting varying approaches to multicore programming. For SMP systems, Posix threads and processes provide a way to add concurrency to programs, said David Kleidermacher, Green Hills Software CTO. He advocated "partition scheduling" at the application level rather than the thread level as a way of managing CPU execution time.
MIT's Agarwal said that Posix threads will do in the short term, but they offer no encapsulation or modularity. A more promising concept, he said, is one that's already used for ASIC design: streaming data from one compute unit to another.
Streaming is fast and efficient and is similar to the sockets used for networking applications, Agarwal said. A "socketlike" stream-based application programming interface could benefit multicore devices, Agarwal said, noting that the Multicore Association's proposed Communications API standard is such an interface.
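The streaming idea can be illustrated with a minimal sketch, written here in Python for brevity. It assumes a standard-library queue as a stand-in for a socketlike channel between two compute units; it does not use the Multicore Association's proposed API, and the names producer, consumer, run_pipeline and SENTINEL are hypothetical.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker


def producer(out_stream, samples):
    """First compute unit: pushes each sample into the stream, then closes it."""
    for sample in samples:
        out_stream.put(sample)
    out_stream.put(SENTINEL)


def consumer(in_stream, results):
    """Second compute unit: reads until end of stream and squares each sample."""
    while True:
        sample = in_stream.get()
        if sample is SENTINEL:
            break
        results.append(sample * sample)


def run_pipeline(samples):
    # Bounded queue: a fast producer blocks instead of flooding the consumer.
    stream = queue.Queue(maxsize=4)
    results = []
    stages = [
        threading.Thread(target=producer, args=(stream, samples)),
        threading.Thread(target=consumer, args=(stream, results)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results
```

The two stages share nothing except the stream, so each can be written and tested on its own, which is the encapsulation that raw threads with shared variables do not provide.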
RapidMind, which provides a software development platform for the IBM Cell Broadband Engine and Nvidia graphics processors, advocates a programming model called "single program, multiple data." It includes single-instruction, multiple-data concepts, but unlike SIMD, it doesn't assume a constant time per kernel.
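The general shape of a single-program, multiple-data computation can be sketched as follows. This is not RapidMind's platform or API; it is an illustration in Python using only the standard library, and the names kernel, split and run_spmd are hypothetical. Under the standard CPython interpreter, threads will not speed up this kind of arithmetic, so the sketch shows the structure only.

```python
from concurrent.futures import ThreadPoolExecutor


def kernel(chunk):
    """The single program: every worker runs this on its own slice of the data."""
    return [2 * x + 1 for x in chunk]


def split(data, parts):
    """Cut data into at most `parts` contiguous chunks of near-equal size."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def run_spmd(data, workers=4):
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whichever worker finishes first.
        mapped = pool.map(kernel, chunks)
        return [y for chunk in mapped for y in chunk]
```

Because the workers share no mutable state and the results are gathered in input order, the output of this sketch is the same on every run, however long each chunk takes.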
"This model lets you think in parallel and express locality," said McCool. "It's deterministic and safe. You can't get deadlock or race conditions."
Regardless of the programming approach, multicore developers will need analysis and debugging tools. According to Wind River's Evensen, they will need hierarchical profiling tools to partition code and find bottlenecks. And run-time analysis tools will help identify race conditions that can occur when multiple threads have access to the same data.
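The kind of race condition described here arises when several threads read, modify and write the same variable. A minimal sketch of the usual remedy, a lock around the shared data, follows in Python; the names Counter and count_in_parallel are hypothetical and are not taken from any vendor's tools.

```python
import threading


class Counter:
    """Shared data guarded by a lock so that read-modify-write is atomic."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # without this, two threads could read the same old value
            self.value += 1


def count_in_parallel(num_threads, increments_per_thread):
    counter = Counter()

    def work():
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

With the lock in place, count_in_parallel(4, 1000) returns 4000. Removing it may lose updates only occasionally, which is why such bugs tend to need run-time analysis rather than inspection to find.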
Limited visibility makes multiprocessor debugging difficult, said Jakob Engblom, business development manager at Virtutech. Memory caches hide data, he said, and there's "time-sensitive, chaotic behavior" and a lack of determinism to contend with. Also, the system will keep running even if one core has stopped.
ARM's multicore debug solution is CoreSight, a technology that uses ARM's embedded trace macrocells. CoreSight includes a debug access port, an embedded cross-trigger mechanism, a "trace funnel" that combines multiple trace sources onto a single debug register bus, an embedded trace buffer and a trace port interface unit. Andrew Swaine, CoreSight team lead at ARM, said the technology is independent of the ARM architecture and is available for royalty-free use.
CoreSight was just one of a number of multicore solutions presented at the expo. Virtutech offers a "virtualized" software development environment that's said to ease multicore debugging. David Stewart, CEO of CriticalBlue, showed how his company's Cascade product can generate application-specific coprocessors from legacy software. Martijn de Lange, founder and CEO of Associated Compiler Experts, discussed multicore applications for his company's CoSy compiler generation system.