How Linux and Beowulf Drove Desktop Supercomputing


Although most of you will know at least some of the history of Linux, I think it’s important to review it quickly, particularly in the context of the 1990s and early 2000s. Without going back to the Multics operating system (OS), I’ll start with efforts to create PC versions of Unix.

In 1987, Professor Andrew Tanenbaum created MINIX, a Unix-like OS that he used for teaching operating systems. At that time, it was a 16-bit OS, although 32-bit processors were already available. Even though MINIX was not ported to 32-bit and was typecast as “educational,” it had a great effect on those wanting Unix on PC processors.

Linus Torvalds began a project in 1991 to write a kernel for his hardware, a 32-bit Intel 80386 processor. He used MINIX and GCC as the development platform, and this initial effort grew into an OS kernel, which he announced on August 25, 1991, on the comp.os.minix newsgroup.

Components were added to this kernel very quickly, including those from the GNU project. I remember when networking was added because there was a great deal of excitement on the newsgroups. In 1992, the X Window System was ported to Linux, supplying a GUI. From then on, Linux usage exploded.

The success of Linux is based on a few factors, all of which combined to launch Linux as an operating system: (1) Its license was free (open source), allowing a veritable army of coders to contribute. (2) Unix had been in use for a long time at universities and research institutes. (3) Computer science students had been learning MINIX for years and writing Unix code. (4) The GNU project had created a great deal of code for Unix-like operating systems.


Beowulf clusters were born from the simple need for a commodity-based cluster system designed as a cost-effective alternative to large supercomputers. Jim Fischer at NASA Goddard created a project plan for the High-Performance Computing and Communications (HPCC)/Earth and Space Sciences project that included “a task for development of a prototype scalable workstation and leading a mass buy procurement for scalable workstations for all the HPCC projects.” The word “workstation” carries a very specific meaning, in that it is designed for individual users. Therefore, although perhaps not explicitly stated, the goal was to create a desktop cluster for use by individuals.

In 1994, on the basis of this project and task, Thomas Sterling and Don Becker built the first Beowulf cluster. It was a modest system consisting of 16 i486DX4 processors and 10Mbps Ethernet. The processors were too fast for a single 10Mbps Ethernet connection, so Becker, an author of Ethernet drivers for Linux, modified his Ethernet drivers for channel bonding two Ethernet interfaces to act as one. The initial system cost roughly $50,000.
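
The channel bonding idea described above can be illustrated with a small sketch. This is not Becker's driver code; it is a simplified Python model that assumes a round-robin policy, in which outgoing frames alternate between two links so that together they behave like one faster link. The interface names eth0 and eth1 and both function names are hypothetical, and the bandwidth figure is an idealized upper bound that ignores protocol overhead and frame reordering.

```python
from itertools import cycle


def bond_round_robin(frames, links):
    """Assign each frame to a link in strict round-robin order.

    Returns a dict mapping each link name to the list of frames
    that would be sent over it.
    """
    queues = {link: [] for link in links}
    for frame, link in zip(frames, cycle(links)):
        queues[link].append(frame)
    return queues


def ideal_aggregate_mbps(link_mbps, n_links):
    """Idealized upper bound on bonded bandwidth (no overhead assumed)."""
    return link_mbps * n_links


# Six numbered frames spread over two hypothetical 10Mbps interfaces.
queues = bond_round_robin(list(range(6)), ["eth0", "eth1"])
bandwidth = ideal_aggregate_mbps(10, 2)
```

In this model, each interface carries half of the frames, which is why two bonded 10Mbps links can approach, but in practice not fully reach, 20Mbps.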

Sterling and Becker developed more systems, as did others. Within a year, the original components for the first Beowulf cost $28,000; other systems had moved on to faster Ethernet connections and faster processors, snowballing into larger systems with faster networks and processors with more cores.

One thing all of these systems had in common was that the software tools used in Beowulf clusters were free, open source, or both. Almost all of them used Linux (there were exceptions, of course, but these were small in number). The combination of free and open source software, Linux, and now Beowulf drove the rapid adoption of clusters.

To clarify, there is no such thing as “Beowulf software.” Beowulf clusters are designed to be extremely flexible, so you can use whichever software tools help you solve your problems. Some of these tools were developed with Beowulf clusters in mind, but their sum does not equal Beowulf software. Rather, Beowulf clusters comprise Linux, additional software tools and projects, best practices, tutorials, and a community whose members are willing to help one another.

By the end of the 1990s, 28 clusters were in the TOP500. Figure 1 (source: TOP500 Development Over Time Statistics) shows the history of system architectures on the list. Notice how quickly clusters came to dominate it.

Figure 1: TOP500 architecture history.