S3QL filesystem for cloud backups

Cloud Storage

Cloud computing provides computing resources and storage that can be accessed by any system connected to the Internet. Generally, this takes the form of virtual machines (VMs) and storage both in the VMs (direct attached) and over the network (usually block devices formatted and exported with NFS). Typically, users configure the VMs with their operating system and applications of choice, perhaps configure some storage, and start running. When the applications finish, the resulting data is copied back to permanent storage (perhaps local). However, this is not the only way to utilize cloud resources. In this article, I'll focus on the storage aspect of the cloud, which can be used for backups and duplicates of data on a large or small scale.

Amazon S3 Storage

Although several cloud storage options are available, I'll focus on Amazon S3 [1] because, arguably, Amazon is the 800-pound gorilla in cloud storage. The exact details of S3 are not public, but you can think of it as an object-based storage system. To begin using S3, you create one or more buckets. Each bucket contains objects, and there is no limit on the number of objects per bucket. Each object consists of a file and any associated metadata (e.g., ACLs).
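To make the bucket-and-object picture concrete, here is a minimal in-memory sketch in Python. It is a toy model only, not the S3 API: the class names `Bucket` and `StoredObject`, the `put`/`get`/`delete` methods, and the bucket name `example-backups` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    """One object: the file contents plus any associated metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)


@dataclass
class Bucket:
    """A named container holding any number of objects, addressed by key."""
    name: str
    objects: dict = field(default_factory=dict)

    def put(self, key, data, metadata=None):
        # Writing to an existing key replaces the object stored there.
        self.objects[key] = StoredObject(data, dict(metadata or {}))

    def get(self, key):
        # Raises KeyError if no object is stored under this key.
        return self.objects[key]

    def delete(self, key):
        # Deleting a missing key is silently ignored in this toy model.
        self.objects.pop(key, None)


# Hypothetical bucket and key names, for illustration only.
bucket = Bucket("example-backups")
bucket.put("etc/hosts", b"127.0.0.1 localhost\n", {"acl": "private"})
print(bucket.get("etc/hosts").metadata)
bucket.delete("etc/hosts")
print(len(bucket.objects))
```

The point of the sketch is that a bucket is a flat namespace: a key such as `etc/hosts` is just a string, not a path through real directories.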

Currently, each object can be up to 5 terabytes (TB) in size, accompanied by up to 2KB of metadata. However, a single write operation can handle at most 5GB, so files larger than 5GB must be uploaded in multiple pieces. You typically interact with the objects (files) through a few simple operations: write (PUT), read (GET), or delete (DELETE). The usual method of interacting with S3 is via a web console [2]. Amazon also offers a command-line interface (CLI) [3] that can be used to interact with S3. The basic

...
