Photo by Tim Foster on Unsplash

Detect failures and ensure high availability

On the Safe Side

Article from ADMIN 75/2023

By Petros Koutoupis

Eliminate single points of failure and service downtime with the DRBD distributed replicated storage system and the Corosync and Pacemaker services.

Many clustering and high-availability frameworks have been developed for Linux, but in this article, I focus on the more mainstream and widely used Corosync and Pacemaker services and DRBD. If you follow along, you'll learn how to configure an active-passive dual-node cluster that replicates local storage (the storage that vital applications or services read from and write to) to its neighboring node. In this way, you'll be able to keep hosting data and serving requests as long as a single node of the cluster remains online.

High Availability

As we depend more and more on technology and the services it provides, availability becomes increasingly important. In the recent past, hardware vendors and solution providers charged a lot of money for proprietary products that ensured a high level of tolerance for hardware and I/O path failures. Those days have come and gone. Today, the data center has evolved, and with that evolution comes the adoption of more commodity hardware (i.e., hardware that is not built with the level of resilience or sophistication typically seen on a mainframe, an IBM POWER processor, or other equivalent machines).

These non-commodity machines were designed from the ground up to sustain all sorts of internal hardware and path failures – redundant memory, CPUs, network interfaces, power supplies, and more. However, that type of functionality came at a much higher cost. Those same proprietary systems also introduced a level of complexity: Diagnosing, replacing, and repairing faulty or problematic components required both deep pockets and a well-trained technician. These factors were, and continue to be, the primary reasons for opting for commodity technology (which contains only a subset of that hardware redundancy) and relying instead on software to handle all sorts of failure scenarios.

More affordable off-the-shelf server solutions provided the data center with the

...
