A Brief History of Supercomputers

Trajectory

Of course, clock speeds are not the best basis for comparison, but in the absence of any benchmark that lasted for more than 15 years during that period, they will have to serve as a guide. The absolute clock speed numbers are not important; what is relevant is the growth in clock speeds, as well as the relative values.

From the 1990s through the early 2000s, you can see the trajectory of CPUs. PC CPUs were gaining performance very quickly (e.g., by adding SIMD instructions). They became 64-bit in 2003 with the AMD Athlon 64. Clock speeds were also increasing quickly, to well over 1GHz and on to 3GHz. Then, in 2005, the Athlon 64 X2 introduced multiple cores on a single die, with clock speeds starting at 1.9GHz.

At the same time, supercomputer processors, which were made in much, much smaller quantities, still had lower clock speeds. Cray was using vector processors that could run vectorizable code extremely fast, but even then, their clock speeds barely reached 1GHz in 2005, when PC CPUs were approaching 2GHz. The SGI MIPS processors, which were also made for the workstation market, were still under 1GHz when the Origin 3000 launched in 2000.

During the 1990s, the pace of PC CPU development was quickening, with good increases in clock speed and increasing parallelism. The L2 cache was also increasing in capacity over time. Then, in 2003, PC CPUs reached 64-bit with high clock speeds, quickly followed by two cores on a die with clock speeds of 2GHz and greater.

Supercomputers enjoyed a great period of growth in the early 1990s, with better clock speeds, great vectorization, and even additional parallelism across nodes. The early experiments of the late 1980s and early 1990s showed that parallelism from large numbers of processors was possible, although software had challenges trying to take advantage of all that processing.

At the same time, Cray was only making a small number of processors compared with the PC market. Large investments were spread across the development of a small number of processors. However, Cray also used workstation processors, specifically DEC Alpha processors, to reduce costs while still maintaining great performance, as reflected in the popularity of the Cray T3D and T3E systems.

SGI also tried using their MIPS processors in both their workstations and Origin supercomputers to help keep system prices down, making them competitive with Cray.

Overall system performance for these supercomputers was increasingly driven by parallel processing across multiple nodes. Meanwhile, PC processors were very quickly catching up to and surpassing supercomputer processors. Tables 1 and 2 show a brief glimpse of the trajectory of PC CPUs and supercomputer processors from the late 1980s into the early 2000s.

Table 1: PC Processor Progression 

| Date | Processor | Highlights |
|---|---|---|
| Apr 1989 | 486DX | On-die L1 cache, much better performance than 386; L2 on motherboard |
| Mar 1992 | i486DX2 | 2:1 clock multiplier; 40/20, 50/25, 66/33 speeds; L2 on motherboard |
| Mar 1994 | i486DX4 | 3:1 clock multiplier; 75/25, 100/33 speeds; 16KB L1 cache on-die, L2 on motherboard |
| Mar 1993 | Pentium | Data bus width doubled to 64 bits, superscalar, FSB of 60–66MHz, clock multiplier of 1; 16–32KiB L1, still external L2 cache |
| Nov 1995 | Pentium Pro | 150–200MHz; on-package L2 cache (256KB to 1MB); decoupled, superscalar, 14-stage super-pipelined, out-of-order execution, two integer units |
| Jan 1997 | Pentium MMX | SIMD (MMX), 166–200MHz |
| Apr 1997 | AMD K6 | Supports MMX, 166–300MHz; L1 cache 32+32KB, L2 on motherboard |
| May 1997 | Pentium II | Improved Pentium Pro, first Xeon naming, 233–450MHz |
| May 1998 | AMD K6-2 | MMX and 3DNow! SIMD, 200–570MHz; 64KiB L1 cache |
| Jun 1998 | Pentium II Xeon | SIMD; L2 cache from 512KB to 2MB |
| Feb 1999 | Pentium III | 9.5 million transistors, 450 and 500MHz clock speeds (600MHz in 1999); new SIMD, SSE, introduced; achieved 1GHz in early 2001; max. clock speed of 1.3GHz |
| Feb 1999 | AMD K6-III | 400 and 450MHz initial clock speed, ending at 500MHz; L2 cache of 256KB; Socket 7; MMX and 3DNow! SIMD instructions |
| Jun 1999 | AMD Athlon | 500–700MHz |
| Nov 2000 | Pentium 4 | NetBurst architecture (not successful); introduced SSE2 (still used today); code could be fast but needed new code optimizations; eventually reached 3.8GHz |
| Early 2001 | Pentium III | ≥1.0GHz |
| May 2001 | Xeon | 32-bit; 1.4, 1.5, 1.7GHz |
| Sep 2001 | Xeon | 2.0–3.6GHz |
| Sep 2003 | Athlon 64 | 1.0–3.2GHz |
| Feb 2005 | Pentium 4F | 64-bit, 2.8–3.8GHz |
| May 2005 | Pentium D, Smithfield | Dual-core, 2.66–3.2GHz |
| May 2005 | Athlon 64 X2 | Dual-core, 1.9–3.2GHz |
| Dec 2006 | Xeon Clovertown | Quad-core, 1.86–2.66GHz |
| Jan 2010 | Nehalem | Dual-core; 32+32KB L1, 256KB L2, 3MB L3; 2.8GHz, two threads per core |
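
The clock multipliers listed for the i486DX2 and i486DX4 in Table 1 relate the core clock to the external bus clock: the core runs at the bus speed times the multiplier. The following is a minimal sketch of that arithmetic; the function name is hypothetical, and the bus speeds are simply the second number in each core/bus pair from the table.

```python
def core_clock_mhz(bus_mhz: float, multiplier: float) -> float:
    """Core clock is the external bus clock times the clock multiplier."""
    return bus_mhz * multiplier


# Bus speeds (MHz) taken from the core/bus pairs listed in Table 1.
dx2 = [(bus, core_clock_mhz(bus, 2)) for bus in (20, 25, 33)]  # 2:1 multiplier
dx4 = [(bus, core_clock_mhz(bus, 3)) for bus in (25, 33)]      # 3:1 multiplier

for name, pairs in (("i486DX2", dx2), ("i486DX4", dx4)):
    for bus, core in pairs:
        print(f"{name}: {bus}MHz bus -> {core}MHz core")
```

The "33MHz" bus is nominally 33.3MHz, which is why the 3:1 part is listed as 100MHz rather than the 99MHz this rounded arithmetic produces.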

Table 2: Supercomputer Processor Progression 

| Date | Processor | Highlights |
|---|---|---|
| 1985 | NEC SX-1, SX-2 | SX-2: four sets of high-performance vector operation pipelines with up to a maximum of 16 arithmetic units, capable of multiple/parallel operation |
| 1988 | Cray Y-MP | Eight 32-bit vector processors, 167MHz, SRAM main memory, single-vector pipeline |
| 1990 | NEC SX-3 | SIMD, MIMD, four arithmetic processors, up to four sharing the same main memory |
| 1991 | Cray C90 | Dual-vector pipeline, 244MHz, three times Y-MP performance |
| 1994 | Cray T3D | DEC Alpha 21064 processors, 3D Torus, 64-bit |
| 1994 | Cray J90 | Up to 32 vector processors, 100MHz, 4GB of memory, 32-processor T932 costing $59.76 million in 2020 dollars |
| 1994 | NEC SX-4 | First shipped in 1995; several CPUs arranged into a parallel vector processing node; those nodes were then installed into a regular SMP arrangement |
| 1995 | Cray T90 | Evolution of C90, 450MHz processors |
| 1996 | Cray T3E | DEC Alpha 21164 processor, 300MHz; later processors at 450, 600, and even 675MHz; can scale from 8 to 2,176 PEs, each PE with between 64MB and 2GB of memory |
| 1996 | SGI Origin 2000 | R10000 MIPS processor, 180 to 300 and 400MHz |
| 1998 | Cray SV-1 | Vector cache, 300MHz, later ran at 500MHz |
| 1998 | NEC SX-5 | 4TFLOPS, each node used 16 CPUs, up to 128GB memory |
| 2001 | NEC SX-6 | Single node, up to eight vector processors, up to 64GB of memory, connect up to 128 nodes in a single system; became Earth Simulator |
| 2003 | Cray X1 | NUMA, vector, 800MHz, eight-wide vector; air-cooled, up to 64 processors; liquid-cooled, 4,096 processors; 1,024 SMP nodes in 2D Torus; programmed with Parallel Virtual Machine (PVM) and Message Passing Interface (MPI) |
| 2004 | SGI Origin 3000 | R12000 MIPS processor, up to 360MHz; later R14000 up to 500MHz |
| 2005 | Cray X1E | Dual-core, 1,150MHz |
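
Because the argument rests on the growth in clock speeds rather than on absolute values, a compound annual growth rate is one simple way to compare the two tables. The sketch below is illustrative only: the function name is hypothetical, and the choice of endpoints (i486DX2 at 66MHz in 1992 to Pentium 4F at 2.8GHz in 2005, and Cray Y-MP at 167MHz in 1988 to Cray X1E at 1,150MHz in 2005) is an assumption drawn from the table rows, not a benchmark.

```python
def annual_growth(start_mhz: float, end_mhz: float, years: int) -> float:
    """Compound annual growth rate between two clock speeds."""
    return (end_mhz / start_mhz) ** (1.0 / years) - 1.0


# Endpoints picked from Tables 1 and 2 (the choice of rows is an assumption).
pc_growth = annual_growth(66, 2800, 2005 - 1992)     # i486DX2 -> Pentium 4F
cray_growth = annual_growth(167, 1150, 2005 - 1988)  # Cray Y-MP -> Cray X1E

print(f"PC CPUs:     {pc_growth:.1%} per year")
print(f"Cray vector: {cray_growth:.1%} per year")
print(f"Clock ratio in 2005 (PC / Cray): {2800 / 1150:.1f}x")
```

With these endpoints, PC clock speeds grow by roughly a third per year, compared with a little over a tenth per year for the Cray vector processors. Different endpoints would shift the numbers, but not the overall picture.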

Commodity Networking

A critical aspect of making distributed computers work together is networking. When PCs were still in their infancy, specialized networks were awfully expensive and sometimes a little fragile. They were used for critical information transmission in industries such as telecommunications, finance, and government. Supercomputers through the 1990s used some of this specialized networking to achieve high bandwidth and low latency for that time.

For PCs, networking had to match PC pricing. You could not have a $500 to $2,000 PC with a $10,000 networking interface. The specialized networks did not match the low-cost expectation. PCs had to wait for cheaper networking to be developed. This came from Ethernet.

Ethernet was developed around 1973 and 1974 at Xerox PARC, as were so many innovative technologies. Initially, Ethernet ran at 2.94Mbps and was used in several server applications, but not with PCs. In 1980, the Ethernet specification was upgraded to a 10Mbps protocol. Version 2 of the specification, known as Ethernet II, was published in November 1982. By the end of the 1980s, Ethernet had become the overall dominant network technology.

In the early 1980s, Ethernet used 10BASE5 with thick coaxial cable (remember the “vampire taps”?), which later gave way to the thinner 10BASE2 coaxial cabling many should remember. Then the world moved on to 10BASE-T, which used twisted-pair cables, as are still used today for common networking.

With 10BASE2 coaxial cabling, the use of Ethernet started to grow outside of supercomputers and specialized networks. That growth brought down the prices of Ethernet hardware, including Ethernet switches and routers, which led to more usage, and so on.

Around 1995, the next generation of Ethernet, Fast Ethernet, was introduced. This is probably the start of true commodity networking, with a performance of 100Mbps, 10 times faster than the previous generation. It was a quantum leap in performance, with prices dropping rapidly to the point where it became ubiquitous. The low prices allowed Fast Ethernet network interface cards (NICs), Ethernet switches, and Ethernet routers to be put into homes.

The first cluster I helped bring into Lockheed Martin used Fast Ethernet as the cluster interconnect. For the computational fluid dynamics applications, it allowed code to scale very well, to the point of running a single application across the entire cluster. Granted, it was only 64 nodes with dual processors at that time, but the price and performance were revelations to us.

Gigabit Ethernet, commonly referred to as “GigE,” runs at 1,000Mbps and was introduced in 1999. It could still use twisted-pair cabling (1000BASE-T), keeping prices low, and delivered another 10 times jump in performance for commodity networking. GigE is still going strong for small HPC systems and in homes.
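
To put the two tenfold jumps in perspective, the sketch below computes the ideal time to move a fixed payload over each Ethernet generation. It is a back-of-the-envelope calculation only: the function name and the 100MB payload are hypothetical, and protocol overhead, latency, and contention are all ignored, so real transfers would be slower.

```python
# Nominal line rates in Mbps for the three generations discussed above.
ETHERNET_MBPS = {"10BASE-T": 10, "Fast Ethernet": 100, "GigE": 1000}


def ideal_transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Ideal time to move size_mb megabytes over a link_mbps link.

    Ignores protocol overhead and latency (an assumption for illustration).
    """
    bits = size_mb * 8_000_000  # 1MB = 10^6 bytes = 8 * 10^6 bits
    return bits / (link_mbps * 1_000_000)


payload_mb = 100  # hypothetical payload size
for name, mbps in ETHERNET_MBPS.items():
    print(f"{name:>13}: {ideal_transfer_seconds(payload_mb, mbps):6.1f} s")
```

Each generation cuts the ideal transfer time by a factor of 10, which is the kind of improvement that made commodity networks practical as cluster interconnects.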

Commodity networking, starting with Fast Ethernet, came about around the same time as commodity processors (PC CPUs). In a very real sense, they fed off each other. As networking got less expensive, it was cost effective to buy more PCs and add more capability, pushing PC prices down. As processors got less expensive, more systems were purchased, all of which needed networking; the higher volume in turn drove down networking costs.