What’s new at the 2024 International Supercomputing Conference and in the TOP500.

ISC 2024 from a Distance

I have only been to one International Supercomputing Conference (ISC), but I really enjoyed it because it is much smaller than the Supercomputing Conference (SC) and allows for more conversation. You also get the European perspective on high-performance computing (HPC), along with a “six-month checkup” on what came out at the previous SC. It can take that long for the community to digest what was announced or shown at SC, so ISC is perfectly timed for the community to understand, test, and develop opinions that then feed into the upcoming SC six months later, and the cycle continues. I find it a good process: announce new things, digest and understand them, and make new announcements based on that understanding.

Because I didn’t go to ISC this year, I decided to write a summary of the conference from a distance, which means I derived all my information from press releases, social media, and any email I received from friends. This method was actually quite fun.

In this summary I present my own opinions, although I am a bit restricted: my employer won’t allow me to write anything about them directly. I can mention their name if I have to, but only in the context of public announcements. One last stipulation I set for myself is that I will only discuss topics I think are interesting or important enough to discuss.

Exascale (In General)

Exascale was a very hot topic at ISC, not just because of the TOP500. The exascale system Aurora came into the TOP500 at number 2, which caused lots of discussion on social media. The upcoming European exascale system, code-named Jupiter, was also a big part of the discussion, particularly around the main processor and the updated timeline. The European processor, named Rhea-2, was announced as coming out in 2025, with the full system ready in 2026. The processor is based on the Arm Neoverse V2 core; it also appears that NVIDIA GPUs will be used, along with NVIDIA InfiniBand.

The upcoming “El Capitan” system at Lawrence Livermore National Laboratory also generated some excitement. It uses AMD MI300A accelerated processing units (APUs), and the HPE Slingshot network in a Dragonfly topology. Much of the discussion was around where it will place on the fall TOP500 list, especially whether it will beat Frontier for number 1.


The TOP500

Closely related to the discussion of exascale was the TOP500 list itself. Almost everyone anticipates the announcement of the new TOP500 list at the ISC and SC conferences, even though many people don’t think it is a useful benchmark for ranking current systems. ISC 2024 was no exception. The most discussed topics I followed were:

  • Aurora not being number 1

Rumors had been circulating, and people had been hoping, that the full Argonne Aurora would exceed (or come close to) 2 exaflops (2x10^18 floating point operations per second) using the high-performance Linpack (HPL) benchmark. However, this was not the case. It came in at number 2 on the TOP500 list, sending people to Twitter/X and other social media to discuss why (and making lots of predictions).

For ISC 2024, Aurora used 87% of the system’s nodes for its submission, and some conversation indicated that HPL has not yet been completely tuned at scale. Some people thought Aurora might be able to get close to Frontier on the next TOP500 if the entire system is used with a tuned HPL.

A fact that received many comments was that Aurora reached number 2 using 38.698MW, resulting in a low performance/power ratio of 26.15 gigaflops per watt (Gflops/W). In comparison, Frontier at number 1 reached about 1.2 exaflops using 22.78MW, a ratio of 52.7Gflops/W, which rather points to Aurora needing to tune HPL. If you want to speculate about Aurora, assume it achieves the same performance/power ratio as Frontier, and it would exceed 2 exaflops.

In another list, Aurora took the number 1 spot for the mixed-precision HPL-MxP benchmark, reaching 10.6 exaflops and passing Frontier’s performance of 10.2 exaflops. Using the power numbers, Aurora has a performance/power ratio of 273.9Gflops/W, whereas Frontier’s is about 448Gflops/W. Ah, the potential of lower precision.
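These ratios are easy to check. A quick back-of-the-envelope sketch, where Aurora’s HPL score of roughly 1.012 exaflops is inferred from the quoted 26.15Gflops/W at 38.698MW rather than taken directly from the list:

```python
def gflops_per_watt(exaflops, megawatts):
    """Performance/power ratio in gigaflops per watt."""
    return exaflops * 1e18 / (megawatts * 1e6) / 1e9

# HPL ratios quoted above:
aurora_hpl = gflops_per_watt(1.012, 38.698)    # ~26.2 Gflops/W
frontier_hpl = gflops_per_watt(1.2, 22.78)     # ~52.7 Gflops/W

# Speculation from the text: Aurora at Frontier's ratio tops 2 exaflops.
aurora_potential_ef = frontier_hpl * 1e9 * 38.698e6 / 1e18   # ~2.04

# Mixed-precision HPL-MxP ratios:
aurora_mxp = gflops_per_watt(10.6, 38.698)     # ~274 Gflops/W
frontier_mxp = gflops_per_watt(10.2, 22.78)    # ~448 Gflops/W
```

The exact decimals depend on which power measurement the list used, but the overall picture, roughly a 2x efficiency gap on HPL, holds.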

  • Frontier remaining number 1

Of course, the news of Aurora not being number 1 caused people to also talk about how Frontier is still number 1, which led to lots of pontification on the advantages of one architecture over the other. The HPC community loves to make comparisons on the basis of the TOP500 lists and then dwell on how one architecture is superior to another. I think these pontifications can be safely ignored because (1) Aurora and Frontier are both x86 and GPU based, and (2) HPL is one workload, and you can't decide between two HPC systems on the basis of a single number. Instead, I offer that the two systems, as well as the upcoming El Capitan, illustrate the community's determination to reach "exascale."

  • Once again, no new Chinese systems on the list (even though everyone knows they have exascale systems)

China did not submit results for its rumored exascale systems (more than one), which have been talked about for more than a year, since before the US embargo on high-end computing processors. However, people also noticed that China did not submit ANY new systems to the TOP500.

One last comment about the TOP500: keep your eyes on Intel’s promised development of zettascale HPC in just a couple of years. I’m sure they won’t let the world down by not reaching that level of performance by 2026 (BTW – just a little tongue-in-cheek).

Related to TOP500

Almost everyone thinks of the TOP500 as HPL, when in fact it also embraces the HPCG (high-performance conjugate gradient) and Green500 lists. The intent of developing the HPCG benchmark was to create a complement to HPL: the memory and compute patterns of HPCG are different from those of HPL. Interestingly, HPCG results are a much, much lower percentage of a system’s total capable flops.
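A rough way to see why HPCG lands so far below peak is arithmetic intensity: sparse matrix-vector products move many more bytes per flop than the dense matrix multiplies at the heart of HPL, so memory bandwidth, not compute, becomes the limit. The numbers below are illustrative assumptions (CSR storage, double precision, an assumed 256-wide cache tile), not measured values:

```python
# HPCG-like sparse matrix-vector product (CSR, double precision):
# 2 flops per nonzero against ~20 bytes of traffic (8-byte value,
# 4-byte column index, ~8 bytes of vector data with little cache reuse).
spmv_flops_per_byte = 2 / 20           # 0.1 flops/byte

# HPL-like blocked dense matrix multiply with an assumed 256-wide cache
# tile: each 8-byte element loaded is reused ~256 times at 2 flops each.
gemm_flops_per_byte = 2 * 256 / 8      # 64 flops/byte

ratio = gemm_flops_per_byte / spmv_flops_per_byte   # ~640x
```

On machines with far more flops than memory bandwidth, which describes essentially every modern system, that gap is why HPL runs near peak while HPCG runs at a few percent of it.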

The 2024 HPCG results still had Fugaku in Japan as the number 1 system, which is notable for many reasons, starting with the fact that Fugaku has held the number 1 ranking since it was installed in 2020. Moreover, Fugaku is a CPU-only system (no accelerators), which makes it a very important system: it is different from the others, achieving very high performance on a test other than HPL.

One comment made on the announced list is that Aurora submitted a result that finished third, but only 40% of the system was used. The expectation is that once Aurora is fully up and running, the entire system will be used for an HPCG submission.

The Graph500 benchmark is used for testing graph processing capabilities. In a similar vein to the Green500 is the Green Graph500. I’m not a “graph” person, so I can’t comment much on the results, and I didn’t really see any buzz about it, but it is another useful benchmark for understanding the characteristics of large HPC systems.

Green500 and Power in General

The Green500 was created to measure how efficiently a TOP500 system uses power. It is simply the HPL performance divided by the total measured power used by the system (which is not always easy to measure). At ISC 2024, it seemed like the Green500 was more a source of discussion than the TOP500. Europe has always been a bit more energy conscious than North America, so this topic perhaps carries more meaning, and gets more notice, than the TOP500 itself.

The ISC 2024 Green500 list was very interesting because of some movement in the rankings and the top few systems. The number 1 system used the new NVIDIA Grace Hopper (GH200) Superchip and quad-rail NDR200 NVIDIA InfiniBand. It achieved an energy efficiency of 72.733 gigaflops per watt (Gflops/W).

In fact, eight of the top 10 systems were NVIDIA based, with the other two being AMD based. These rankings and performance were quite a change from the previous list, but it points to GPUs being very power efficient on the HPL benchmark.

I got curious about the change in gigaflops per watt over time. On the June 2013 Green500 list, the number 1 system also used GPUs (NVIDIA K20 with QDR InfiniBand); its energy efficiency was 3,208.8Mflops/W (3.21Gflops/W). Compare that with the number 1 system on the ISC 2024 list, and energy efficiency has improved by a factor of about 22.7 over 11 years, which is roughly a doubling every two and a half years.
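The improvement factor and the doubling period follow directly from the two efficiency numbers:

```python
import math

# Green500 number 1 efficiencies: June 2013 vs. June 2024 (Gflops/W).
improvement = 72.733 / 3.2088                               # ~22.7x over 11 years
doubling_years = 11 * math.log(2) / math.log(improvement)   # ~2.4 years
```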

To me, this improvement rate is astonishing. There is currently an active discussion about the energy usage of HPC in general (more on that below) and of artificial intelligence (AI) in particular. Andrew Jones has a good Twitter/X post discussing energy efficiency in HPC:

HPC helps understand climate change, sustainable energy, more efficient energy use across transport, buildings, manufacturing, etc.
But what about climate impact of HPC?
Global electricity consumption ~3TW. Top500 supercomputers total ~700MW.
So ~0.02% of global electricity.

An even better version of this tweet is on LinkedIn.

However, I’m not sure of the numbers, so I did a little digging. I found a reference stating that the total world electricity consumption in 2022 was 25,530 terawatt-hours (TWh), and another reference putting the energy consumption of all of IT in the world in 2022 at 240-340TWh. If these numbers are correct, then IT – all of IT – consumes 1.33% of the world’s electricity (assuming the high number of 340TWh).

According to Andrew, the TOP500 systems together draw about 700MW. Over a year, that works out to roughly 6.1TWh, which is about 0.02% of the world’s electricity consumption (and around 2% of all of IT’s). Although the absolute values are staggering and make me want to turn off all the lights and turn the thermostat to 80°F (26°C) in the summer and 50°F (10°C) in the winter, HPC relative to the rest of the world is very small. Even if the HPC world doubled its power use in the next year, it would still be only roughly 0.05% of the world’s consumption, and the TOP500 figure includes the big AI training and cloud systems that submit results. Remember the people complaining that AI was going to increase climate change massively? On these numbers, you could safely say that only a tiny fraction of electricity-related emissions comes from HPC. They need to find something else to complain about before coming after HPC.
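These percentages are easy to recompute from the figures above (the 700MW is Andrew’s estimate, converted here from power draw to annual energy):

```python
# Rough energy accounting with the figures cited above.
WORLD_TWH = 25_530      # world electricity consumption, 2022
IT_TWH = 340            # high estimate for all of IT, 2022
HPC_MW = 700            # approximate combined draw of the TOP500 systems

hpc_twh = HPC_MW * 8760 / 1e6            # MW x hours/year -> TWh, ~6.1
it_share = IT_TWH / WORLD_TWH            # ~1.3% of world electricity
hpc_world_share = hpc_twh / WORLD_TWH    # ~0.024% of world electricity
hpc_it_share = hpc_twh / IT_TWH          # ~1.8% of IT's consumption
```

The conversion assumes the systems run flat out all year, so if anything it overstates HPC’s share.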

Another observation comes from combining the TOP500 and Green500 (Table 1): some of the highest ranked HPC systems are also among the most energy efficient. For example, the number 8 system on the Green500, the NVIDIA-based Venado system, is number 11 on the TOP500, and the number 9 Green500 system, the AMD-based Adastra system, is number 20 on the TOP500.

Table 1: Green500 and TOP500 rankings.

System               Green500 Rank   TOP500 Rank
Jedi                 1               189
Isambard-AI Phase 1  2               128
Helios GPU           3               55
Henri                4               328
preAlps              5               71
HoreKa-Teal          6               299
Frontier TDS         7               54
Venado               8               11
Adastra              9               20
Setonix - GPU        10              28
Dardel GPU           11              114
LUMI                 12              5
Frontier             13              1
Alps                 14              6
MareNostrum 5 ACC    15              8
CEA-HE               16              8
Goethe-NHR           17              104
Greene-H100          18              410
ATOS THX.A.B         19              252
Pegasus              20              255

The HPC community can be proud of what it has achieved in terms of energy efficiency, but some of the efficiency came from necessity. Without better efficiency, there would not be enough power for such large systems.


The IO500

I always like to look at the IO500 because the list is so diverse, and I/O can be an important part of HPC workloads. Moreover, remember that Amdahl’s Law implies that serial portions of a workload, such as I/O, limit how much performance can scale. ISC 2024 announced no real changes at the top of the list. The highest ranked new entry was at number 15.

One thing I like to do is go through the IO500 list and look at the top-ranked “common” filesystems in the research category. Because I focus on the more common filesystems, this list is not comprehensive. Moreover, the rankings do not mean that one filesystem is better than another; it's just a list of the top I/O systems in the world for these filesystems. 

Filesystem                                                           Ranking
BeeGFS                                                               43
CephFS                                                               89
IBM Storage Scale (originally called GPFS; including DDN EXAScaler)
OrangeFS                                                             115
Panasas                                                              114

AI Conversation

One could argue that users in Europe, the Middle East, and Africa (EMEA) have been more focused on HPC simulation and slower to embrace AI in HPC. However, reading several articles and tweets, it appears that the HPC conversation now includes AI more than it did before.

One conversation that floated around at the time of ISC asked, "Is AI its own thing, or is it an HPC workload?" I think this is a very interesting conversation, even though I’ve made up my mind. The interesting aspect is: Who is answering the question? Are they a long-time HPC user, or are they coming to large systems from AI? I suspect the answer correlates strongly with the answerer’s background. For what it’s worth, HPC people by and large think of AI as a workload (but I’m a long-time HPC user), whereas AI people think of AI as something new that requires different approaches, despite it looking just like HPC.


Accelerators

In some form or another, accelerators are part of every discussion, as they were at ISC 2024. The highest ranked non-accelerator system is, of course, Fugaku, number 4 on the TOP500, where it has been for quite some time as the pinnacle of CPU-only systems. Other CPU-only systems are nowhere near the top.

Accelerators now top the TOP500 and appear throughout and beyond the list. They are making a huge impact on HPC for simulation workloads, other workloads, and especially AI, and they are here to stay.

Among the many possible accelerator forms, GPUs absolutely dominate. According to one article, 193 systems in the TOP500 use GPUs (38.6%). However, the systems that use GPUs account for 75.3% of the flops on the list, which illustrates the big boost GPUs offer to the HPL benchmark.

The speedup that GPUs potentially offer is very attractive. One of my favorite tweets around ISC 2024 is from Jeff Hammond, speaking about rewriting code for GPUs:

Hammond’s law: almost no one will rewrite their code for less than 2x and almost everyone will rewrite their code for more than 5x.
This is an obvious, empirical statement I make regularly to explain most of the history of programming models.

Although I have seen exceptions to his law, I bet Hammond’s law is mostly accurate. Jeff has massive experience in HPC and has seen it all, so there is little reason to doubt him.

Linux Foundation for HPC

Another announcement from ISC 2024 that caught me a little off guard came from the Linux Foundation (LF): the creation of the High Performance Software Foundation (HPSF). The goal of HPSF is to “build, promote, and advance a portable core software stack.” The announcement further states that HPSF will provide a neutral space for pivotal projects, with participation spanning government, industry, and academia.

The initial projects under the HPSF umbrella are:

  • Spack: the HPC package manager.
  • Kokkos: a performance-portable programming model for writing modern C++ applications in a hardware-agnostic way.
  • Viskores (formerly VTK-m): a toolkit of scientific visualization algorithms for accelerator architectures. 
  • HPCToolkit: performance measurement and analysis tools for computers ranging from desktop systems to GPU-accelerated supercomputers.
  • Apptainer: formerly known as Singularity, a Linux Foundation project that provides a high-performance, full-featured HPC and computing-optimized container subsystem.
  • E4S: a curated, hardened distribution of scientific software packages.

The immediate question is what will happen to any funds collected by the HPSF. According to the original article, they have stated the following goals:

  • Continuous integration resources tailored for HPC projects
  • Continuously built, turn-key software stacks
  • Architecture support
  • Performance regression testing and benchmarking
  • Collaborations with other LF projects, such as the Open Source Security Foundation (OpenSSF), the Ultra Ethernet Consortium (UEC), the Unified Acceleration (UXL) Foundation, and the Cloud Native Computing Foundation (CNCF)

Although this list sounds really great, I have some doubts. First, notice the projects selected by HPSF: they come mostly from government labs. In my experience, most industry and academic institutions don’t use most of these tools, which is especially true for smaller systems rather than massive lab systems. Is this an attempt to get everyone to pay for the lab projects?

Continuing with this argument, notice that HPSF says nothing about the long-term maintenance of these packages. It will primarily make sure they work together and then promote them. There is no mention of paying for the development of these tools; then again, the fees to join HPSF are fairly low ($175,000).

Finally, I didn't hear of any metrics or ways to measure whether HPSF is effective. Any such data gathering should exclude existing users, because they don’t reflect the effect of HPSF’s efforts; what matters is net new “customers.” How will they measure that? (Does anyone else remember when you had to self-report your use of Linux?)

I’m not the only one who asked these questions. I hope my trepidations, as well as those of others, will be addressed at some point.