Scalable mail storage with Dovecot and Amazon S3

Storage Space

Ceph Basics

These concerns, however, do not mean that you need to give up the convenience of S3 storage in Dovecot entirely. Because the S3 protocol is publicly documented, several projects provide S3 storage on a FLOSS basis, including the rising star of the storage world, Ceph.

Ceph has received much publicity in recent months, especially because Red Hat acquired Inktank, the company behind it. Most admins are therefore familiar with Ceph, which can be seen as an object store with various front ends. The ability to provide multiple front ends sets it apart from other object stores such as OpenStack Swift. Ceph was designed as a universal store for almost everything that happens in a modern data center.

A Ceph cluster ideally consists of at least three machines. Various Ceph components run on these machines, including a monitoring server (MON) on each host and an object storage daemon (OSD) for each hard drive.

The monitoring servers are the guards within the storage architecture: They monitor the quorum using the Paxos algorithm to avoid split-brain situations. A cluster partition is only considered quorate if it contains more than half of all MONs, that is, a strict majority. Consequently, in a three-node cluster, a partition is quorate if it sees two MONs; Ceph automatically shuts down a partition that sees only one MON.
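The majority rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Ceph code; the function name `is_quorate` is made up for this example.

```python
def is_quorate(visible_mons: int, total_mons: int) -> bool:
    """Return True if a partition that sees `visible_mons` out of
    `total_mons` monitors holds a strict majority."""
    if not 0 <= visible_mons <= total_mons:
        raise ValueError("visible_mons must be between 0 and total_mons")
    # Strict majority: more than half, so an even split is never quorate.
    return 2 * visible_mons > total_mons


if __name__ == "__main__":
    for visible in range(4):
        print(f"{visible} of 3 MONs visible -> quorate: {is_quorate(visible, 3)}")
```

The strict comparison is the reason an even number of MONs buys nothing: with four MONs, a partition still needs three of them, so a two-against-two split leaves neither side quorate.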


Furthermore, the MONs act as the directory for the Ceph cluster. Clients talk directly to the hard drives in the cluster (i.e., the OSDs), but to do so, they need to know how to reach them. The MONs maintain dynamic lists of the existing OSDs and the existing MONs (the OSD and MON maps) and serve up both maps when clients ask for them. Using the CRUSH algorithm, clients can then calculate the correct position of binary objects themselves. In this sense, Ceph does not have a central directory that records the target disks for each individual binary object.
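The idea of placement by calculation rather than by lookup can be illustrated with a small sketch. Note that this is not the CRUSH algorithm: it uses plain rendezvous (highest random weight) hashing as a stand-in, and the function name, the object name, and the replica count are assumptions made for the example. What it has in common with CRUSH is that every client holding the same OSD list arrives at the same answer without asking a central directory.

```python
import hashlib


def place_object(object_name: str, osd_ids: list[int], replicas: int = 3) -> list[int]:
    """Pick `replicas` OSDs for an object using rendezvous hashing.

    Stand-in for CRUSH: the result depends only on the object name
    and the set of OSD IDs, so any client can compute it locally.
    """
    def score(osd_id: int) -> int:
        digest = hashlib.sha256(f"{object_name}:{osd_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    # Rank all OSDs by their per-object score and keep the top ones.
    return sorted(osd_ids, key=score, reverse=True)[:replicas]


if __name__ == "__main__":
    osd_map = [0, 1, 2, 3, 4, 5]  # hypothetical OSD IDs taken from an OSD map
    print(place_object("mail-0001", osd_map))
```

Two clients that receive the same OSD map, in whatever order, compute the same target OSDs for a given object, which is the property the MON-provided maps are meant to enable.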

Parallelism is another great advantage of Ceph; it is inherent to almost all Ceph services. A client that wants to store a 16MB file in Ceph usually divides it into four blocks of 4MB. It then uploads all four blocks to four OSDs in the cluster at the same time, leveraging the combined write speed of four hard drives. The more spindles there are in a cluster, the more requests the Ceph cluster can handle simultaneously.
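The split-and-upload pattern might look roughly like the following sketch. It makes several simplifying assumptions: the OSDs are modeled as in-memory dictionaries, blocks are assigned round-robin instead of by CRUSH, and the names `split_into_chunks` and `store_striped` are invented for the example. It demonstrates the pattern only; an in-memory demo cannot show the throughput gain of real disks.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB blocks, as in the text


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut `data` into consecutive blocks of at most `chunk_size` bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def store_striped(name: str, data: bytes, osds: list[dict]) -> list[tuple[int, str]]:
    """Write the blocks of `data` to the fake OSDs concurrently.

    Returns (osd_index, key) pairs in block order so the object
    can be reassembled later.
    """
    if not osds:
        raise ValueError("at least one OSD is required")

    def write(item: tuple[int, bytes]) -> tuple[int, str]:
        index, chunk = item
        osd_index = index % len(osds)  # round-robin, an assumption
        key = f"{name}.{index:08d}"
        osds[osd_index][key] = chunk
        return osd_index, key

    with ThreadPoolExecutor(max_workers=len(osds)) as pool:
        # map() returns results in input order, so the layout stays sorted.
        return list(pool.map(write, enumerate(split_into_chunks(data))))


if __name__ == "__main__":
    fake_osds = [{} for _ in range(4)]
    layout = store_striped("message", bytes(16 * 1024 * 1024), fake_osds)
    print(layout)
```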

This is an important prerequisite for use in the S3 example: Even mail systems under heavy load can easily store many email messages in Ceph in the background at the same time. Ceph usually only performs badly compared with conventional storage solutions when it comes to the latency of sequential writes; however, this is irrelevant for the Dovecot S3 example.

Ceph Front Ends

Even the most attractive object store is useless if clients cannot communicate with it directly. Ceph provides multiple options for clients to contact it. The RADOS block device emulates a normal hard drive on Linux. What appears to be a locally installed disk on the client computer is, in fact, a virtual block device. Writes to this block device go directly to Ceph in the background.

Ceph FS is a POSIX-compatible filesystem; however, it is Inktank's eternal problem child, which has stubbornly remained unfinished for years. The Ceph Object Gateway, in contrast, is really interesting for the S3 example shown in Figures 3, 4, and 5. The gateway, previously called RADOS Gateway, is built on Librados, which allows direct and native access to Ceph objects. On top of that, it exposes RESTful APIs that follow the syntax of either Amazon S3 or OpenStack Swift.

Figure 3: An existing Ceph cluster can be extended quickly with a Ceph Object Gateway using an entry like this.
Figure 4: Ceph Object Gateway needs a FastCGI-enabled web server for it to work. Apache with mod_fastcgi is the typical combination.
Figure 5: The proof: The Ceph Object Gateway lifts its head when a URL is called, indicating that no buckets are stored for the anonymous user.
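A client can evaluate a response like the one in Figure 5 programmatically. The sketch below assumes that the gateway answers with an S3-style `ListAllMyBucketsResult` XML document; the sample text is an illustrative reconstruction, not output captured from a real gateway, and the function name `parse_bucket_listing` is made up for the example.

```python
import xml.etree.ElementTree as ET

# XML namespace used by S3-style bucket listings.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

# Illustrative sample: an anonymous user who owns no buckets.
SAMPLE_RESPONSE = """\
<ListAllMyBucketsResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Owner><ID>anonymous</ID><DisplayName></DisplayName></Owner>
  <Buckets></Buckets>
</ListAllMyBucketsResult>"""


def parse_bucket_listing(xml_text: str) -> tuple[str, list[str]]:
    """Return the owner ID and the bucket names from a bucket listing."""
    root = ET.fromstring(xml_text)
    owner = root.findtext("s3:Owner/s3:ID", default="", namespaces=S3_NS)
    names = [
        element.text or ""
        for element in root.findall("s3:Buckets/s3:Bucket/s3:Name", S3_NS)
    ]
    return owner, names


if __name__ == "__main__":
    owner, buckets = parse_bucket_listing(SAMPLE_RESPONSE)
    print(f"owner={owner!r}, buckets={buckets}")
```

An empty bucket list for the anonymous owner is enough to confirm that the web server, FastCGI, and the gateway are wired together correctly.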

The Ceph Object Gateway might not implement the S3 specification completely, but it does cover the main features. This completes the solution: A cluster of at least three nodes runs Ceph on its local disks. Additionally, a server runs the Ceph Object Gateway and allows various instances of Dovecot with the S3 plugin to store email.

Such a solution scales on all levels: If the Ceph cluster needs more space, you can just add more computers. If the load on the Dovecot servers becomes too high, you can add more computers there, too. As long as there is enough room for the hardware, this principle can be expanded with practically no limits.
