Highly available load balancers

This post is from 2017 and some details (such as OS versions and tooling) may now be outdated. It is kept for historical reference.

Failure is inevitable. Given enough time and pressure, any component in your system can fail; the best you can do is understand the likelihood and behavior of each failure and take action to limit its impact.

One such action is to run a redundant set of load balancers, since these are often the gatekeepers to your application. Note that high availability is not a magic solution you can bolt on afterwards; it needs to be part of the very core of the architecture.

In this example, we are running a fairly standard setup with load balancers, application servers, and databases. To further increase our redundancy, we want our application servers to be stateless, which eases scaling and availability, and we want our database cluster set up with the write consistency we need for our use case. None of this can be assumed; it needs to be tested. You need to know that each application server is independent enough to be replaced seamlessly, and you need to know how your databases behave when a component fails. While this ultimately needs to be tested in production, it is advisable to start with extensive testing in separate environments.

Virtual IP on CentOS 7

In this example, we will be using CentOS 7 and pcs/pacemaker/corosync to control our virtual IPs. We'll set up three load balancers of which two are active, because in the scenario this was configured for, a single load balancer could not handle the load during peak hours. For n+1 redundancy, we therefore need three nodes (two active to handle the load and one hot standby).
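The n+1 sizing described here is simple arithmetic: work out how many nodes are needed to carry peak load, then add one spare. A minimal sketch follows; the request rates are made-up figures for illustration, not measurements from this setup, and `nodes_needed` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# nodes_needed PEAK PER_NODE
# Prints ceil(PEAK / PER_NODE) + 1: enough active nodes for peak load plus one spare.
nodes_needed() {
	local peak=$1
	local per_node=$2
	# Integer ceiling division.
	local active=$(( (peak + per_node - 1) / per_node ))
	echo $(( active + 1 ))
}

# Hypothetical figures: 15000 req/s at peak, 10000 req/s per load balancer.
nodes_needed 15000 10000
```

With these assumed numbers two active nodes are required, so the function prints 3.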

For this, we'll be using two Virtual IPs that are shared between three servers. These IP addresses are then configured with round-robin DNS to spread the load somewhat evenly; they should never be located on the same server, as that would overload that server.

Initial setup

For initial setup, it's easier to first try the configuration in a VM. For this, you could use the following Vagrantfile simulating three nodes, each with two network interfaces:

Vagrant.configure(2) do |config|
	config.vm.box = "centos/7"
	(1..3).each do |i|
		config.vm.define vm_name = "node%d" % i do |config|
			config.vm.hostname = vm_name
			# Internal network
			config.vm.network "private_network", ip: "10.2.0.%d" % (i+10)
			# Pretend "External" network
			config.vm.network "private_network", ip: "172.29.0.%d" % (i+10)
		end
	end
end

The Vagrantfile is provided only for convenience and is not required to follow this guide. After running vagrant up, you should end up with the following configuration:

Name	Internal IP	"External" IP
node1	10.2.0.11	172.29.0.11
node2	10.2.0.12	172.29.0.12
node3	10.2.0.13	172.29.0.13
VIP1	N/A	172.29.0.21
VIP2	N/A	172.29.0.22

VIP1 and VIP2 are still to be configured. For simplicity, we'll let the hosts talk to each other using their short hostnames rather than IP addresses. For now, instead of configuring this via DNS, we'll just add the following to /etc/hosts on each server:

10.2.0.11	node1
10.2.0.12	node2
10.2.0.13	node3
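Since these entries have to be added on every node, it can help to script the step so that it is safe to run more than once. The following is a rough sketch, assuming the addresses from the table above; `add_host_entries` is a hypothetical helper name, and the file path is passed in so the function can be tried on a scratch file first.

```shell
#!/usr/bin/env bash
# add_host_entries FILE
# Appends the three node entries to FILE, skipping any hostname already present.
add_host_entries() {
	local file=$1
	local i name
	for i in 1 2 3; do
		name="node$i"
		# Look for the hostname as a whole word preceded by whitespace.
		if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$file" 2>/dev/null; then
			printf '10.2.0.%d\t%s\n' "$((i + 10))" "$name" >> "$file"
		fi
	done
}
```

Running `add_host_entries /etc/hosts` as root on each node would add the entries; a second run leaves the file unchanged.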

We start by installing pcs:

$ yum install pcs

As you may note, this will install multiple dependencies. Services of note are:

pacemaker - cluster resource manager, responsible for starting, stopping, and monitoring services.
corosync - cluster messaging and membership layer (heartbeat), responsible for keeping track of which nodes are alive.
pcs - pacemaker/corosync configuration system, responsible for controlling and configuring pacemaker and corosync.

Start and enable the pcs daemon to run on boot:

$ systemctl start pcsd
$ systemctl enable pcsd

Set the password for the hacluster user, using the same password on every node (pcs uses this account to authenticate the nodes to each other):

$ passwd hacluster

Firewall

Configure the firewall to allow communication on the internal interface:

For TCP: Ports 2224 (pcsd), 3121 (pacemaker_remote)
For UDP: Ports 5404, 5405 (corosync)
For DLM (if using the DLM lock manager with clvm/GFS2): TCP port 21064

For simplicity, these are grouped under the service high-availability:

$ firewall-cmd --add-service=high-availability --zone internal
$ firewall-cmd --permanent --add-service=high-availability --zone internal

Ensure the internal interface is configured to use the internal zone. Set ZONE=internal in /etc/sysconfig/network-scripts/ifcfg-{DEVICE}, then restart the network service and verify the zone:

$ systemctl restart network
$ firewall-cmd --list-all --zone internal
internal (active)
  interfaces: eth1
  sources:
  services: high-availability ssh
  ports:
  masquerade: no
  forward-ports:
  icmp-blocks:
  rich rules:
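Before authenticating the nodes, it can be worth confirming that the firewall really lets pcsd traffic through. The sketch below is one possible check, assuming bash (for its /dev/tcp redirection) and the coreutils `timeout` command are available; `port_open` and `check_pcsd` are hypothetical helper names. It only covers TCP, so it says nothing about the corosync UDP ports.

```shell
#!/usr/bin/env bash
# port_open HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds within 2 seconds.
port_open() {
	timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_pcsd NODE...
# Reports whether pcsd (TCP 2224) is reachable on each given node.
check_pcsd() {
	local node
	for node in "$@"; do
		if port_open "$node" 2224; then
			echo "$node: pcsd reachable"
		else
			echo "$node: pcsd NOT reachable"
		fi
	done
}
```

Running `check_pcsd node1 node2 node3` from each node prints one line per target.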

Ensure you have run the above on all nodes and they can all communicate with each other using the hostnames, then authenticate the nodes:

$ pcs cluster auth node1 node2 node3 -u hacluster
Password: <enter hacluster password>
node1: Authorized
node3: Authorized
node2: Authorized

Set up the cluster:

$ pcs cluster setup --name lb_cluster1 node1 node2 node3
Destroying cluster on nodes: node1, node2, node3...
node3: Stopping Cluster (pacemaker)...
node2: Stopping Cluster (pacemaker)...
node1: Stopping Cluster (pacemaker)...
node1: Successfully destroyed cluster
node3: Successfully destroyed cluster
node2: Successfully destroyed cluster

Sending 'pacemaker_remote authkey' to 'node1', 'node2', 'node3'
node2: successful distribution of the file 'pacemaker_remote authkey'
node1: successful distribution of the file 'pacemaker_remote authkey'
node3: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
node1: Succeeded
node2: Succeeded
node3: Succeeded

Synchronizing pcsd certificates on nodes node1, node2, node3...
node1: Success
node3: Success
node2: Success
Restarting pcsd on the nodes in order to reload the certificates...
node1: Success
node3: Success
node2: Success

Start the cluster:

$ pcs cluster start --all
node3: Starting Cluster...
node2: Starting Cluster...
node1: Starting Cluster...

Enable the cluster:

$ pcs cluster enable --all
node1: Cluster Enabled
node2: Cluster Enabled
node3: Cluster Enabled

Verify the status:

$ pcs status
Cluster name: lb_cluster1
WARNING: no stonith devices and stonith-enabled is not false
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Nov 15 16:13:05 2017
Last change: Wed Nov 15 16:11:44 2017 by hacluster via crmd on node1

3 nodes configured
0 resources configured

Online: [ node1 node2 node3 ]

No resources


Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Disable STONITH

STONITH (Shoot The Other Node In The Head) is a fencing mechanism that prevents "split-brain" scenarios where multiple nodes believe they own a resource. For stateful clusters (like databases), STONITH is mandatory to prevent data corruption. However, for a stateless load balancer cluster where the only shared resource is an IP address, we can sometimes get away with disabling it if we accept a brief period of IP conflict during a failure.

$ pcs property set stonith-enabled=false

Set up the virtual IPs

$ pcs resource create VIP1 ocf:heartbeat:IPaddr2 ip=172.29.0.21 nic=eth2 cidr_netmask=24 op monitor interval=20s
$ pcs resource create VIP2 ocf:heartbeat:IPaddr2 ip=172.29.0.22 nic=eth2 cidr_netmask=24 op monitor interval=20s

Prevent the virtual IPs from being assigned to the same node:

$ pcs constraint colocation add VIP1 with VIP2 score=-INFINITY
$ pcs constraint colocation add VIP2 with VIP1 score=-INFINITY

Load Balancer

Now that we have our Virtual IPs failing over between nodes, you should ensure your actual load balancer service (e.g., HAProxy or Nginx) is installed and configured on all nodes. For a truly high-availability setup, you should also have Pacemaker manage the load balancer service as a resource (or a cloned resource) to ensure it is always running alongside your VIPs.
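Having Pacemaker manage the load balancer could look roughly like the sketch below. It is not part of the configuration shown in this post: it assumes HAProxy is installed on every node with a systemd unit named haproxy, and the resource name `haproxy`, the monitor interval, and the function name `configure_lb_resource` are arbitrary choices for illustration. The `PCS` variable can be overridden to preview the commands without touching the cluster.

```shell
#!/usr/bin/env bash
# Assumptions: HAProxy is installed on all nodes as the systemd unit "haproxy";
# the resource name "haproxy" is an arbitrary choice for this sketch.
# Set PCS="echo pcs" to print the commands instead of running them.
PCS=${PCS:-pcs}

configure_lb_resource() {
	# Let Pacemaker manage the systemd unit and run one copy on every node.
	$PCS resource create haproxy systemd:haproxy op monitor interval=10s
	$PCS resource clone haproxy
	# Only place a VIP on a node where the load balancer is running.
	$PCS constraint colocation add VIP1 with haproxy-clone score=INFINITY
	$PCS constraint colocation add VIP2 with haproxy-clone score=INFINITY
}
```

Calling `configure_lb_resource` on one node is enough, since pcs applies the change cluster-wide. With such a resource in place, `pcs status` would list the cloned service in addition to the two VIPs.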

Everything should now be configured; verify by using:

$ pcs status
Cluster name: lb_cluster1
Stack: corosync
Current DC: node1 (version 1.1.16-12.el7_4.4-94ff4df) - partition with quorum
Last updated: Wed Nov 15 16:28:31 2017
Last change: Wed Nov 15 16:28:29 2017 by root via cibadmin on node2

3 nodes configured
2 resources configured

Online: [ node1 node2 node3 ]

Full list of resources:

 VIP1	(ocf::heartbeat:IPaddr2):	Started node1
 VIP2	(ocf::heartbeat:IPaddr2):	Started node2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
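The placement shown in this output can also be checked mechanically, which is handy when rehearsing a failover. The following sketch parses `pcs status` text and confirms that no node holds both VIPs; it assumes the resource names start with "VIP" as in this post, and all function names are hypothetical. The drill function uses the `pcs cluster standby` syntax of the pcs version shipped with CentOS 7, and the 10 second pause is an arbitrary guess at how long resources take to move.

```shell
#!/usr/bin/env bash
# vip_nodes: reads `pcs status` text on stdin, prints the node of each started VIP.
vip_nodes() {
	awk '$1 ~ /^VIP/ && $(NF-1) == "Started" { print $NF }'
}

# vips_on_distinct_nodes: succeeds if at least one VIP is started
# and no node appears more than once.
vips_on_distinct_nodes() {
	local nodes
	nodes=$(vip_nodes)
	[ -n "$nodes" ] && [ -z "$(printf '%s\n' "$nodes" | sort | uniq -d)" ]
}

# failover_drill NODE: put NODE in standby, check placement, then bring it back.
failover_drill() {
	pcs cluster standby "$1"
	sleep 10   # arbitrary pause to let resources move
	if pcs status | vips_on_distinct_nodes; then
		echo "OK: VIPs are on separate nodes"
	else
		echo "PROBLEM: VIPs share a node or none is started"
	fi
	pcs cluster unstandby "$1"
}
```

For example, `failover_drill node1` takes node1 out of service, reports where the VIPs ended up, and then returns node1 to the cluster.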

Once you are confident everything is working as expected, set up A records for both VIPs in DNS, and you should be done. A lookup of the name should then return both addresses:

;; ANSWER SECTION:
example.com.	IN	A	172.29.0.21
example.com.	IN	A	172.29.0.22
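To confirm that both records are actually being served, the lookup result can be checked with a small helper. This is a sketch only: `has_both_vips` is a hypothetical name, and example.com stands in for whatever name the records are published under.

```shell
#!/usr/bin/env bash
# has_both_vips: reads one IP address per line on stdin and succeeds
# only if both virtual IPs are present.
has_both_vips() {
	local answers
	answers=$(cat)
	printf '%s\n' "$answers" | grep -qxF '172.29.0.21' &&
		printf '%s\n' "$answers" | grep -qxF '172.29.0.22'
}

# Example against live DNS (example.com is a placeholder):
#   dig +short example.com A | has_both_vips && echo "both VIPs published"
```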