Computationally complex problems that are constrained by conventional machine architectures might become more tractable through hardware/software co-design, in which components are purpose-built for specific applications.

Co-Design Approach to Supercomputing

In consumer electronics such as microwave ovens, cameras, and cellphones, every component is optimized for that particular device’s application. This approach is known as co-design, whereby systems are purpose-built for a specific application from the ground up.

Researchers at Lawrence Berkeley National Laboratory, UC Berkeley, and elsewhere are investigating the concept of co-design for supercomputers. Specifically, they have looked at applying the technique to computationally complex climate models, such as those used for studying clouds.

In a paper titled “Hardware/Software Co-design of Global Cloud System Resolving Models,” published in the Journal of Advances in Modeling Earth Systems, these scientists use global cloud-resolving models as a case study and posit that an aggressive co-design approach to scientific computing could increase code efficiency and enable chip designers to optimize the trade-offs between energy efficiency, cost, and application performance.

The current global climate models that are used to project the effects of greenhouse gases and other pollutants cannot resolve individual clouds or cloud systems because of computational constraints. In their paper, the researchers state, “we have estimated the computational requirements necessary to integrate a Global Cloud System Resolving Model (GCSRM) at rates appropriate for climate research by analyzing each of its major components separately. We find that a sustained computational rate of 28 petaflops and 1.8PB of total memory would enable such a model to simulate the atmosphere one thousand times faster than real time. These requirements are machine architecture independent.” Such performance greatly exceeds the capability of machines available today; the world’s fastest supercomputer currently clocks in at 10 petaflops and 1.4PB.
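To put those figures in perspective, the short Python sketch below works through the arithmetic using only the numbers quoted above. The function names and the 365-day year are illustrative assumptions, not anything taken from the paper, and the comparison is rough: the 28-petaflop requirement is a sustained rate, while the 10-petaflop figure for the fastest machine may have been measured differently.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the article.
# Names and the 365-day year are illustrative assumptions.

REQUIRED_PFLOPS = 28.0         # sustained rate estimated for the GCSRM
REQUIRED_MEMORY_PB = 1.8       # total memory estimated for the GCSRM
FASTEST_PFLOPS = 10.0          # fastest supercomputer cited in the article
FASTEST_MEMORY_PB = 1.4
SPEEDUP_OVER_REAL_TIME = 1000.0


def shortfall(required: float, available: float) -> float:
    """Return how many times larger the requirement is than what is available."""
    return required / available


def wall_clock_days(simulated_years: float,
                    speedup: float = SPEEDUP_OVER_REAL_TIME,
                    days_per_year: float = 365.0) -> float:
    """Days of wall-clock time needed to simulate the given number of years."""
    return simulated_years * days_per_year / speedup


compute_gap = shortfall(REQUIRED_PFLOPS, FASTEST_PFLOPS)
memory_gap = shortfall(REQUIRED_MEMORY_PB, FASTEST_MEMORY_PB)
century_days = wall_clock_days(100)

print(f"Compute shortfall: {compute_gap:.1f}x")
print(f"Memory shortfall:  {memory_gap:.2f}x")
print(f"Simulated century: {century_days:.1f} days of wall-clock time")
```

Under these assumptions, the model would need nearly three times the quoted compute rate of the fastest machine, and a simulated century would still take about five weeks of wall-clock time at one thousand times real time.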

The co-design study offers an alternative strategy. In their paper, the researchers present a detailed overview of their proposed co-designed system and demonstrate that a hardware/software co-design approach based on low-power embedded processor technology could be used to create a custom machine with relatively affordable cost and power requirements.

To find out more about this climate computer concept design, which researchers have named “Green Flash” after the atmospheric phenomenon, I contacted lead researchers Michael Wehner and John Shalf.

AA: In your paper, you describe the strawman for this design in detail. Are there plans to build a prototype of this concept computer? If so, what are the immediate challenges and when might it be built?

MW: No; current funding is not available to actually build anything.

JS: We are able to do a detailed node design, but currently there is no funding to tape out a full chip. So, we are collaborating with a group on the UC Berkeley campus to tape out a test chip to demonstrate that the design can meet all of its power and performance targets. However, it will just be a demo chip (not a full system design) because we are doing this on essentially zero budget.

AA: You note that one criticism of this approach is that co-designed machines would be highly specialized – although fully programmable. How might the design approach differ for machines intended for other supercomputing applications?

MW: I presume that you mean how co-designed machines would differ between different targeted supercomputing applications. The general procedure of diagnosing the application for what is actually needed would remain the same. But all the details about the number of processors, the processor characteristics (cache, clock speed, etc.), the communications network, and especially the mix of instructions contained in hardware would differ.

JS: It isn’t a requirement that a co-designed machine is entirely specialized to a single application. That is a parameter that we can decide on as part of the design process. However, the more specialized the device is to the application, the more energy efficiency you can derive (so the trade-off is generality vs. energy efficiency, but that is a selectable design choice for co-design).

One observation [was] that we derived most of our energy efficiency benefits from what we removed from the design rather than what was added. There are numerous features of modern processors that are of no use to any scientific application. Many include instructions that are decades old only to maintain binary compatibility, but are not used by any modern application (and that consumes power). There are instructions that support banking systems or Internet video but would never be used in a scientific application. So, there is a lot of “fat” to cut out of a chip design that would benefit all scientific applications. We get an extra boost by targeting climate, but just targeting scientific computing is a big win in terms of energy efficiency.

MW: The important point is that the hardware itself is no longer the commodity component; rather, the embedded processor design technique is the new “commodity.”

JS: Right, we are not designing custom circuits here. The embedded processing community provides many useful “LEGO” blocks that we can assemble in a manner that is better suited for scientific computation. The “LEGO” blocks that we can buy from the embedded market are themselves commodities, but the costs are amortized over a much larger variety of designs than is the case for commodity chips in the desktop/server space. So, this project is a reformulation of what it means to leverage a commodity market.

In the embedded space, the chip is not the commodity. It is the commodity circuit blocks (“LEGO” blocks) that you put onto the chip that are the commodity.

AA: Have you been working with particular vendors on these hardware concepts?

MW: We have obtained and used the design tools from Tensilica, an embedded processor design firm. But, we have not worked with any traditional supercomputing vendors on this concept (yet …).

JS: I have presented this concept to a number of supercomputing vendors and other research groups. One group in the EU adopted the idea and has a joint project with ARM to develop a supercomputer using ARM cores. We currently are using our architectural simulation platform that was developed for Green Flash to prototype alternative node designs and inform vendors about design alternatives for the DOE exascale program. This new project is called CoDEx (CoDesign for Exascale).

It is repurposing the tools to push the HPC hardware companies to create more energy-efficient solutions by demonstrating (using hardware simulation) what the opportunities are for improving over the current design practice.

AA: What’s next for this project?

MW: For the climate applications, it is critical that we examine other credible codes at this scale. We have identified the NASA/NOAA finite volume model and the DOE/NSF spectral element model as alternatives to the Colorado State University model. Both of these models are based on a cubed-sphere mesh rather than the icosahedral mesh of the CSU model. The outstanding issue is to quantify how similar (or different) the computational requirements for these three approaches to cloud system-resolving physics are. This would allow us to determine if an appropriate set of co-design choices could be made to accommodate more than one particular physics approach. (Again, we need to procure funding to pursue this line of research.)

JS: We have also been expanding the scope of applications that we cover to include seismic imaging applications for the oil and gas industry. Our simulation results for the prototype node design for seismic imaging were recently published at the Supercomputing 2011 conference.

This provides the most detailed energy, performance and cost estimates that we have produced to date. The model includes the full simulation of the memory and other components required for a supercomputer node design. Interestingly, it is not hugely different from the design required for the climate modeling machine.

AA: Is there anything you’d like to mention about the project that I’ve not asked?

MW: Our feeling is that aggressive hardware/software co-design is the most economical (and hence practical) way to achieve exascale scientific computing. Our ideas are not without risk, but the payout could be very significant.

Co-design allows us to design computers to answer specific questions, rather than limit our questions by available machines. What I mean by this is that numerical experimentalists could take a lesson from actual experimentalists who build machines like the Large Hadron Collider or the Hubble Space Telescope to answer very specific questions deemed important by the larger community. Similar scientific questions, amenable to computational analysis, can be answered by co-designed computers at the exascale.

Additional details about the Green Flash project and architecture are available at, and you can read the complete research paper online at