John Fruehe, AMD’s Director of Product Marketing for the server and embedded space, gives you an inside look at what’s coming with Bulldozer and what it means for HPC vendors and customers.

Interview: AMD’s John Fruehe, Director of Server Product Marketing

Power Talk

ADMIN Magazine: The world is eagerly awaiting the appearance of the latest AMD chip line, which includes the Bulldozer series. How long have you been planning and developing this latest generation of processors? 

John Fruehe: This generation of processors has been in development for a few years now. This is a completely new architecture with a modular design intended to increase core density through the use of a module – something that hasn’t been done before. We have spent extensive time in development, simulation, and modeling to ensure a robust platform. Our partners have been working with various levels of silicon for the past six months or more. In September, they began receiving their first production parts. These will be the parts used for the final validation. Although the processor design is a clear departure from where we have been before, the platforms – the AMD Opteron 4000 and 6000 series – have been on the market for some time now, so customers will be able to easily integrate this new design into their data centers.

AM: The high-tech world is constantly evolving. What do you see as the biggest changes to the IT landscape since you rolled out your last generation of chips, and how have you addressed those changes through the features you’ve incorporated into Bulldozer? 

JF: We are starting to get to the point where higher thread count and better power efficiency are trumping raw clock speed. Virtualization is now very mainstream and everyone is doing it, so utilization is going up and concurrency is more important. Cloud is really taking off; our research shows that more than a third of the companies we surveyed are already doing cloud-type deployments. With the strong push in both cloud and virtualization, having lots of cores and higher power efficiency seem to be the keys to better value. Cloud workloads tend to be “spiky” with peaks where more resources are needed, and valleys where more power efficiency is needed. The multi-threading capabilities are clearly leading the market towards more cores and away from higher clock speeds because efficiency and throughput are more critical for enterprise applications. 

AM: Based on the information out so far, it seems like Bulldozer is tailored for HPC and data center environments. Is this your target customer?

JF: Threaded environments are the target. This includes HPC, but also includes cloud, virtualization, and database. Single-threaded applications become less important with every day that goes by. For HPC, we have plenty of new capabilities, including many new SSE instructions, support for 256-bit AVX, and even some instructions that are available only on AMD-based systems (XOP and FMA4). The new Flex FP floating point complex is specifically tuned for those technical workloads. But the modularity of the Bulldozer architecture allows it to match a variety of applications. On the client side, the AMD FX processor will have configurations that are perfect for different client workloads as well.

AM: Will programs written for previous chip families see a benefit from running on Bulldozer? Or are the most advanced features only for programs written specifically for the Bulldozer platform? 

JF: Because it supports all of the existing x86 code, any application can immediately run on these platforms. However, recompiling will let you take advantage of the latest features. But this is no different from most product introductions; to take advantage of new features, you will need to have support in the software. Some of the new instructions, such as SSSE3, SSE4.1, SSE4.2, and AES-NI, are already supported in existing software, so you will be able to take advantage of those instructions with little change. With AVX, all software will need to be updated, whether it is running on our platform or our competitors’ platform.
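
Before recompiling, an administrator might want to confirm which of these instruction set extensions a given Linux host actually reports. The following is a minimal sketch, not an AMD-supplied tool: it assumes a Linux system that exposes a `flags` line in `/proc/cpuinfo`, and the helper names `cpu_flags` and `report` are hypothetical. The flag spellings (`sse4_1`, `aes`, and so on) are the ones the Linux kernel uses, which differ slightly from the marketing names.

```python
# Sketch: report which instruction set extensions a Linux host advertises.
# Assumes /proc/cpuinfo contains a "flags" line (x86 Linux); helper names are illustrative.

# Kernel flag names for SSSE3, SSE4.1, SSE4.2, AES-NI, AVX, XOP, and FMA4.
WANTED = ("ssse3", "sse4_1", "sse4_2", "aes", "avx", "xop", "fma4")


def cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line, or an empty set."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text):
    """Map each wanted flag name to True or False depending on whether it is present."""
    flags = cpu_flags(cpuinfo_text)
    return {name: name in flags for name in WANTED}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            text = handle.read()
    except OSError:
        text = ""  # Not a Linux host, or the file is unreadable.
    for name, present in report(text).items():
        print(f"{name:8s} {'yes' if present else 'no'}")
```

A host that reports `avx` but not `xop` or `fma4` would still run existing binaries; the check only indicates which extensions a recompiled build could target.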

AM: TDP Power Cap seems like one of the most exciting new features of the Bulldozer series. Could you explain what TDP Power Cap is and how it works? 

JF: TDP Power Cap lets you set a custom TDP (power limit) for the processor. Many customers are trying to optimize their data centers and get the best density possible. Because they have a predefined power budget per rack, TDP power capping allows a customer to take the TDP down a bit, freeing up some power headroom so that they can maximize the space in their rack. A good example is a customer I met with in Europe. They had power budgets for the rack and always ran out of power first, so they had 3-5 slots free at the top of the rack. With TDP Power Cap, they can reduce the TDP and populate the extra slots at the top of the rack, which allows them to reduce the total number of racks and grow their server count without taking up more floor space.

AM: How are these power thresholds set? Are these BIOS settings that the sys admin or cluster manager can set manually? Is it scriptable? 

JF: They are set in the BIOS. If you have a management tool that is APML compatible, you can set this on the fly or script it. 

AM: Do you imagine that the owner of the system will adjust the TDP Power Cap settings once, for the initial stages of testing and rollout, then leave it alone, or is this the kind of thing that will get continually tweaked and adjusted with changing conditions?

JF: In most cases, this is set at the beginning and not tweaked. Let’s say that you have 8,000 watts of power budget available for the rack and your servers consume 300W at max (during reboot, before power management is enabled through the OS). You have to provision for the “worst-case scenario,” assuming that the entire rack reboots at the same time. So, with 8,000 watts, you can fit 26 servers in the 42U rack because you can’t exceed the power budget – even if the servers only run at 200W on average. Now, to address this, you could take the TDP down by 20W on each processor, allowing you to put 30 servers in the rack instead of 26. Because your average workload is around 200W, with a 260W max power for the server, you would probably not see any impact in performance because you aren’t running above the power requirements of the workload. More flexibility, more density, little – if any – impact on performance. 
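
The rack arithmetic in this answer can be checked with a few lines of Python. This is a sketch under stated assumptions: the 8,000W budget, 300W peak, 20W reduction per processor, and 42U rack come from the answer above, while the two-socket configuration (implied by the drop from 300W to 260W) and the 1U server height are assumptions, and the function name `servers_per_rack` is hypothetical.

```python
# Sketch: how many servers fit in a rack when provisioning for worst-case power.
# Figures are from the interview; the 2-socket, 1U server shape is an assumption.

def servers_per_rack(rack_budget_w, server_peak_w, rack_units=42, server_height_u=1):
    """Return the server count allowed by both the power budget and the rack space."""
    by_power = rack_budget_w // server_peak_w   # worst case: every server at peak
    by_space = rack_units // server_height_u    # physical slots available
    return min(by_power, by_space)


RACK_BUDGET_W = 8000
SERVER_PEAK_W = 300
SOCKETS = 2             # assumed: two processors per server
TDP_REDUCTION_W = 20    # per processor, as in the interview

baseline = servers_per_rack(RACK_BUDGET_W, SERVER_PEAK_W)
capped = servers_per_rack(RACK_BUDGET_W, SERVER_PEAK_W - SOCKETS * TDP_REDUCTION_W)

print(f"uncapped: {baseline} servers, capped: {capped} servers")
```

With these inputs the power budget, not the 42U of space, is the limiting factor in both cases, which matches the 26 and 30 servers quoted in the answer.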

AM: Power use is one side of this, and the other side is cooling (or actually heat dissipation), which is often a limiting factor for HPC implementations. By controlling power usage, do you also gain some control over the temperature, which could conceivably allow vendors to build bigger clusters or pack more nodes in a single building without overheating? 

JF: Yes, definitely. Power and cooling are critical to cluster size and go hand in hand. The bigger issue for customers, however, is core density. With Interlagos, our 16-core processors, we are increasing core densities by 33%, yet power stays the same. So, the power per core is lower, allowing customers either to pack the same performance into a smaller number of nodes (saving power and space) or to get more cores into the same platform footprint, which means better performance in the same power/thermal range as the previous generation.
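
To put a rough number on "power per core is lower": if the core count rises by 33% while socket power stays flat, the power available per core falls by about a quarter. The sketch below only restates that ratio; the function name `power_per_core_change` is hypothetical, and it assumes power is shared evenly across cores, which real workloads will not do exactly.

```python
# Sketch: relative change in power per core when core count and socket power change.
# Assumes power is spread evenly across cores, which is a simplification.

def power_per_core_change(core_increase, power_increase=0.0):
    """Return the fractional change in power per core (negative means lower)."""
    return (1 + power_increase) / (1 + core_increase) - 1


change = power_per_core_change(0.33)   # 33% more cores, same power
print(f"power per core changes by {change:.1%}")
```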

AM: Any other cool Bulldozer features you want to share with us? 

JF: There is another feature called AMD Turbo CORE that allows us to boost frequency on processors when there is additional power headroom available. For HPC workloads that are integer-based, there may be headroom to boost, and workloads like cloud, virtualization, and database will clearly be able to boost. We have two levels of boost: an all-core boost of 300-500MHz for most workloads, and an even higher boost if some of the cores are inactive. For heavy FP-based workloads, customers will probably be close to the limit on power, so the ability to boost may not be there.