Lead Image © madpixblue, 123RF.com

High Availability without Pacemaker


Article from ADMIN 21/2014
Managing your cluster could be so simple if it weren't so complicated. The object of many an admin's wrath in such cases is often a single component: Pacemaker. Luckily, other open source tools offer alternative options for high availability.

The Pacemaker cluster resource manager is not the friendliest of applications. In the best-case scenario, Pacemaker keeps the load balanced on your cluster and monitors the back-end servers, rerouting work to maintain high availability when a system goes down. Pacemaker has come a long way since its predecessor, Heartbeat 2 – an unfriendly and unstable tool that could only be managed at all by hand-editing XML snippets.

Pacemaker no longer has Heartbeat's shortcomings, but it still has not outgrown some of the original usability issues. The Pacemaker resource manager has ambitious goals, but what looks promising in the lab often fails in production because it is too difficult to use (Figure 1). Nor has Pacemaker's development progressed particularly smoothly: Although the project can look back on several years of existence, bugs have repeatedly reared their heads, undermining confidence in the Pacemaker stack.

Figure 1: Object of hate: The Pacemaker cluster manager provokes spontaneous nightmares for many admins. Granted, the software is not very intuitive.

Admins often face a difficult choice: Introducing Pacemaker might solve the HA problem, but it means installing a "black box" in the environment that apparently does whatever it wants. Of course, not having high availability at all is not a genuine alternative. This dilemma leads to the question of whether it is possible to achieve meaningful high availability in some other way. A number of FOSS solutions vie for the admin's attention.

Understanding High Availability

The universal goal of high availability is that a user must never notice that a server just crashed. For most users, this is already an inherent requirement: If you sit down in front of your computer at three o'clock in the morning, you expect your provider's email service to work just as it would at three in the afternoon. In this kind of construct, users are not interested in which server delivers their mail, they just want to be able to access their mail at any time.

The "failover" principle achieves transparent high availability by working with dynamic IP addresses. An IP address is assigned to a service and migrates with the service from one server to another if the original server fails.

This kind of magic works well for stateless protocols; in HTTP, for example, it does not matter whether Server A or Server B delivers the data  – incoming requests always arrive at the server that currently has the "Service IP" address. Stateful connections are more complex, but most applications that establish stateful connections have features that automatically reopen a connection that has broken down. The users do not notice anything more than a brief hiccup in the service – and definitely not a failure.
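At its core, migrating a service IP boils down to two steps on the takeover node: claiming the address and announcing the change to the network. A minimal sketch of a manual failover might look like this (the interface name and address are example values, and the commands require root; a cluster manager simply automates this sequence):

```shell
# Claim the service IP on the surviving server
# (eth0 and 192.0.2.10 are placeholder values):
ip addr add 192.0.2.10/24 dev eth0

# Send gratuitous ARP replies so that switches and clients
# update their ARP caches and traffic moves to this server:
arping -U -I eth0 -c 3 192.0.2.10
```

Without the gratuitous ARP announcement, neighboring hosts would keep sending packets to the failed server's MAC address until their ARP caches expire.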

Migrating IP addresses in this way is precisely the kind of task that normally requires a service like a cluster manager. However, some solutions let you get along without Pacemaker, or demote Pacemaker to a simple auxiliary service that no longer controls the whole cluster. In this article, I show how this works through four individual examples: HAProxy as a load balancer, VRRP by means of keepalived for routing, inherent high availability in the ISC DHCP server, and classic scale-out HA based on Galera.

Variant 1: Load Balancing

Load balancing belongs to the category of scale-out solutions and, as a principle, has existed for several decades. The pivotal idea is that a single service, the load balancer, accepts connections that arrive at an IP address and forwards them in the background to target servers. The number of possible target servers is not limited, and target servers can also be added or removed dynamically. If a service provider notices that the platform no longer offers sufficient performance, it simply adds more target servers, providing more resources for the setup to utilize.

Although originally designed for HTTP, the load balancer concept now includes just about every conceivable protocol. Essentially, it's always just a matter of forwarding an incoming connection on a TCP/IP port to a port on another server, which is something that almost any protocol can do without much difficulty.

Numerous load balancers for Linux keep the flag flying for this type of program; HAProxy [1] in particular has a large community. HAProxy can use different redirect methods, depending on which protocol the application speaks (Figure 2). With the help of load balancers, admins can build good, reliable failsafes for mail servers, web servers, and other simple services.

Figure 2: HAProxy is a classic load balancer and ensures that incoming requests are distributed to many back ends.
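The basic HAProxy pattern described above – one listening address in front, several back ends behind it – can be sketched in a few lines of configuration. This is only a minimal illustration; the names and addresses are placeholders, not values from the article:

```
# Minimal HAProxy sketch: requests arriving on port 80 are
# distributed round-robin across two back-end web servers.
frontend www
    bind *:80
    default_backend webfarm

backend webfarm
    balance roundrobin
    # "check" enables periodic health checks; a failed server
    # is taken out of rotation until it recovers.
    server web1 192.168.1.11:80 check
    server web2 192.168.1.12:80 check
```

The health checks are what make this an HA mechanism rather than just a traffic distributor: HAProxy stops routing requests to a back end as soon as its check fails.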

A setup in which HAProxy ensures the permanent availability of a service involves some challenges. Challenge number 1 is the availability of data; you will usually want the same data to be available across all your web servers. To achieve this goal, you could use a classic NFS setup, but newer solutions such as GlusterFS are also definite options.

Factory-supplied appliances often have built-in HA features out of the box, so admins do not have to worry about implementing them. However, if you use plain vanilla HAProxy, a basic cluster manager is still necessary: something has to move the load balancer's own IP address to another machine if the active HAProxy node fails. This does not call the concept into question, because the cluster manager in this example would only move a single IP address from one computer to another and back – unless you use VRRP.

Variant 2: Routing with VRRP

One scenario in which the use of Pacemaker is particularly painful involves classic firewall and router systems. If you do not use a hardware router from the outset, you are likely to assume that a simple Linux-based firewall is the best solution for security and routing on the corporate network. There is nothing wrong with this assumption – such a setup rarely involves more than a DHCP server and a set of iptables rules. Of course, the corporate firewall must somehow be highly available; if the server fails, work grinds to a standstill in the company.

The classic approach using Pacemaker involves operating services such as the DHCP server or the firewall, along with the appropriate masquerading configuration, as separate services in Pacemaker. This method inevitably drags in the entire Pacemaker stack, which is precisely the component you wanted to avoid. VRRP can provide efficient routing functions on multiple systems.

VRRP stands for "Virtual Router Redundancy Protocol." It dates back to 1998, and companies such as IBM, Microsoft, and Nokia were involved in its development. VRRP works on a simple principle: Rather than acting as individual devices on the network, all routers configured for VRRP form a logical group. The logical router has both a virtual MAC address and a virtual IP address, and at any given time, one member of the VRRP group handles the actual routing, as determined by the configuration.

Failover is an inherent feature of the VRRP protocol: If one router fails, another takes over the virtual router MAC address and the virtual IP address. For the user, this means, at most, a small break but not a noticeable loss.

On Linux systems, VRRP setups can be created relatively easily with the keepalived software [2] mentioned earlier. Keepalived was not originally conceived as a VRRP tool, but VRRP is probably its most common use today.
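A keepalived VRRP setup of the kind described above can be sketched with a single configuration block on each router. The interface name, router ID, and addresses below are example values; the backup router uses the same block with `state BACKUP` and a lower priority, and whichever live router holds the highest priority owns the virtual IP:

```
# /etc/keepalived/keepalived.conf on the master router
# (example values throughout)
vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51      # must match on all group members
    priority 150              # highest priority wins the election
    advert_int 1              # advertisement interval in seconds
    virtual_ipaddress {
        192.168.1.1/24        # the address clients use as gateway
    }
}
```

If the master stops sending advertisements, the backup promotes itself and takes over the virtual IP and MAC address – the failover behavior inherent to VRRP, with no cluster manager involved.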
