The US Department of Energy (DOE) in December issued a detailed report summarizing the Magellan project, which investigated the role of cloud computing in addressing the computing needs of scientists funded by the DOE Office of Science.

The Magellan Report on Cloud Computing

The Magellan project is a two-year research and development effort to establish a nationwide scientific mid-range distributed computing and data analysis testbed. The testbed has two sites – the Argonne Leadership Computing Facility (ALCF) and the National Energy Research Scientific Computing Center (NERSC) – with, according to the report, "multiple 10s of teraflops and multiple petabytes of storage, as well as appropriate cloud software tuned for moderate concurrency." The project, which was funded through the DOE Office of Advanced Scientific Computing Research, investigated the potential role of cloud computing, particularly in serving mid-range and data-intensive computing workloads.

The testbed, which was designed to explore various computing models and hardware design points, consists of IBM iDataPlex servers and a mix of specialized servers, including Active Storage, Big Memory, and GPU servers, connected through an InfiniBand fabric. The testbed has a mix of storage options, including distributed and global disk storage, archival storage, and two classes of flash storage. The system provides both a high-bandwidth, low-latency, quad-data-rate InfiniBand network and a commodity Gigabit Ethernet network. According to the report, this configuration is different from a typical cloud infrastructure but is more suitable for the needs of scientific applications.

During the past two years, the project looked at various cloud models such as Infrastructure as a Service (IaaS) and Platform as a Service (PaaS), virtual software stacks, MapReduce and its open source implementation (Hadoop), and resource provider and user perspectives. Specifically, the Magellan project was charged with answering the following research questions:

  • Are the open source cloud software stacks ready for DOE HPC science?
  • Can DOE cybersecurity requirements be met within a cloud?
  • Are the new cloud programming models useful for scientific computing?
  • Can DOE HPC applications run efficiently in the cloud? What applications are suitable for clouds?
  • How usable are cloud environments for scientific applications?
  • When is it cost effective to run DOE HPC science in a cloud?
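For readers unfamiliar with the MapReduce programming model that the project evaluated, the following is a minimal single-process sketch in Python. It is not taken from the report and does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, map_reduce) and the word-count task are hypothetical and chosen only to show the map, group-by-key, and reduce steps that a framework such as Hadoop distributes across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit (key, value) pairs for one input record: one (word, 1) per word."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence in a single process."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical input records, used only for illustration.
counts = map_reduce(["cloud HPC cloud", "HPC science"])
print(counts)
```

Because each call to map_phase depends only on its own record, the map step can be spread across many nodes with no communication between them, which is why the model suits loosely coupled, data-parallel work.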

The 169-page report summarizes the findings as follows:

• Cloud approaches provide many advantages, including customized environments that enable users to bring their own software stack and try new computing environments without significant administration overhead, the ability to quickly surge resources to address larger problems, and the advantages that come from increased economies of scale. Virtualization is the primary strategy of providing these capabilities. Our experience working with application scientists using the cloud demonstrated the power of virtualization to enable fully customized environments and flexible resource management, and their potential value to scientists.

• Cloud computing can require significant initial effort and skills to port applications to these new models. This is also true for some of the emerging programming models used in cloud computing. Scientists should consider this upfront investment in any economic analysis when deciding whether to move to the cloud.

• Significant gaps and challenges exist in the areas of managing virtual environments, workflows, data, security, and others. Further research and development is needed to ensure that scientists can easily and effectively harness the capabilities exposed with these new computing models. This would include tools to simplify using cloud environments, improvements to open-source cloud software stacks, providing base images that help bootstrap users while allowing them flexibility to customize these stacks, investigation of new security techniques and approaches, and enhancements to MapReduce models to better fit scientific data and workflows. In addition, there are opportunities in exploring ways to enable these capabilities in traditional HPC platforms, thus combining the flexibility of cloud models with the performance of HPC systems.

• The key economic benefit of clouds comes from the consolidation of resources across a broad community. Existing DOE centers already achieve many of the benefits of cloud computing because these centers consolidate computing across multiple program offices, deploy at large scales, and continuously refine and improve operational efficiency.

According to the report, cloud models often provide additional capabilities and flexibility that are helpful to certain workloads. The report recommends that DOE labs and centers consider adopting and integrating these features of cloud computing.

Specifically, the report's findings are as follows:

Finding 1. Scientific applications have special requirements that require solutions that are tailored to these needs.
Finding 2. Scientific applications with minimal communication and I/O are best suited for clouds.
Finding 3. Clouds require significant programming and system administration support.
Finding 4. Significant gaps and challenges exist in current open-source virtualized cloud software stacks for production science use.
Finding 5. Clouds expose a different risk model requiring different security practices and policies.
Finding 6. MapReduce shows promise in addressing scientific needs but current implementations have gaps and challenges.
Finding 7. Public clouds can be more expensive than in-house large systems.
Finding 8. DOE supercomputing centers already approach energy efficiency levels achieved in commercial cloud centers.
Finding 9. Cloud is a business model and can be applied at DOE supercomputing centers.

Additionally, the report states that performance of tightly coupled applications running on virtualized clouds using commodity networks can be significantly lower than on clusters optimized for these workloads. This can be true even at mid-range computing scales. As a result, the report concludes that current cloud systems are best suited for high-throughput, loosely coupled applications with modest data requirements.
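One way to see why the interconnect matters so much more for tightly coupled applications is a back-of-the-envelope runtime model, sketched here in Python. This is not the report's methodology, and every number below is an illustrative assumption rather than a measurement from Magellan: the function estimated_runtime, the two network profiles, and the workload sizes are all hypothetical, and virtualization overhead is not modeled at all.

```python
def estimated_runtime(work_seconds, nodes, sync_steps,
                      latency_s, bytes_per_step, bandwidth_bytes_per_s):
    """Toy model: perfectly divided compute time plus per-step communication cost."""
    compute = work_seconds / nodes
    communicate = sync_steps * (latency_s + bytes_per_step / bandwidth_bytes_per_s)
    return compute + communicate


# Assumed order-of-magnitude network profiles (not figures from the report).
LOW_LATENCY_FABRIC = {"latency_s": 2e-6, "bandwidth_bytes_per_s": 4e9}
COMMODITY_ETHERNET = {"latency_s": 50e-6, "bandwidth_bytes_per_s": 125e6}

# Assumed workloads: same total compute, very different synchronization counts.
TIGHTLY_COUPLED = {"work_seconds": 3600, "nodes": 64,
                   "sync_steps": 1_000_000, "bytes_per_step": 10_000}
LOOSELY_COUPLED = {"work_seconds": 3600, "nodes": 64,
                   "sync_steps": 10, "bytes_per_step": 10_000}

tight_fabric = estimated_runtime(**TIGHTLY_COUPLED, **LOW_LATENCY_FABRIC)
tight_ethernet = estimated_runtime(**TIGHTLY_COUPLED, **COMMODITY_ETHERNET)
loose_fabric = estimated_runtime(**LOOSELY_COUPLED, **LOW_LATENCY_FABRIC)
loose_ethernet = estimated_runtime(**LOOSELY_COUPLED, **COMMODITY_ETHERNET)

for label, seconds in [("tightly coupled, low-latency fabric", tight_fabric),
                       ("tightly coupled, commodity Ethernet", tight_ethernet),
                       ("loosely coupled, low-latency fabric", loose_fabric),
                       ("loosely coupled, commodity Ethernet", loose_ethernet)]:
    print(f"{label}: {seconds:.2f} s")
```

Under these assumptions, the job that synchronizes a million times slows down severalfold on the commodity network, while the job that synchronizes only a handful of times is nearly unaffected, which is consistent with the report's conclusion about which workloads fit current clouds.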

You can read the complete report here.