Utilizing Keepalived Unicast for Kea 1.3.0 Site Failover

ISC Kea 1.3.0 does not yet have built-in support for same-site or cross-site failover, hot standby, or an active/active load-balanced setup. These HA features are on the roadmap for Kea 1.4.0.

This article discusses a multi-site failover setup, which can be solved relatively easily by combining Keepalived's unicast features with two shell scripts and a suitable network setup. Unicast VRRP is well suited to cross-VLAN communication, since multicast VRRP does not traverse VLANs without multicast routing.

Network Requirements

Because the Kea servers are at two different sites, on different subnets, and no floating VIP via anycast is used, two DHCP helper addresses need to be configured on the network equipment. In this setup only one of those helper addresses will be active at any given time.

Configuration Management

Since there is no virtual IP in this setup, the configurations differ between the two Kea servers. This can be handled by a configuration management system like Puppet. If possible, I would recommend keeping the more complex configuration in a git repository and including it from there, which makes rollback easier and allows Kea syntax checking via git hooks. I will cover why combining configuration management and git is a sane solution in more detail in an upcoming article.
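As a rough idea of what such a git hook could look like, here is a minimal sketch. It assumes that the installed kea-dhcp6 binary supports the -t option (check a configuration file without starting the server) and that the Kea configuration files in the repository end in .conf. The function name check_kea_configs and the KEA_BIN variable are made up for this example.

```shell
#!/bin/bash
# Sketch of a pre-commit check for Kea configuration files.
# Assumptions: kea-dhcp6 supports "-t <file>", and configs are named *.conf.
# KEA_BIN and check_kea_configs are hypothetical names used for illustration.
KEA_BIN="${KEA_BIN:-kea-dhcp6}"

# Reads file names on stdin (one per line) and runs the Kea syntax check
# on every *.conf file. Returns non-zero if at least one file is rejected.
check_kea_configs() {
  local status=0 file
  while IFS= read -r file; do
    case "$file" in
      *.conf) ;;          # looks like a Kea config, check it
      *) continue ;;      # anything else is skipped
    esac
    if ! "$KEA_BIN" -t "$file" >/dev/null 2>&1; then
      echo "Kea rejected $file" >&2
      status=1
    fi
  done
  return "$status"
}

# In .git/hooks/pre-commit the function would be fed the staged files:
#   git diff --cached --name-only --diff-filter=ACM | check_kea_configs || exit 1
```

Keeping the check in a function that reads file names from stdin makes it easy to reuse the same logic in a server-side hook or in a CI job.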

Kea Requirements

The Kea servers should be set up using a shared lease backend, such as MySQL.

A minimal configuration could look like this (here with DHCPv6):

{
    "Dhcp6": {
        "interfaces-config": {
            "interfaces": [
                "<network interface/ipv6 address specific for server>"
            ]
        },
        "control-socket": {
            "socket-type": "unix",
            "socket-name": "/tmp/kea-dhcp6-ctrl.sock"
        },
        "lease-database": {
            "type": "mysql",
            "name": "<database name>",
            "user": "<database user>",
            "password": "<mysql password>",
            "host": "<database server>",
            "port": 3306
        },
        "expired-leases-processing": {
            "reclaim-timer-wait-time": 10,
            "flush-reclaimed-timer-wait-time": 25,
            "hold-reclaimed-time": 3600,
            "max-reclaim-leases": 100,
            "max-reclaim-time": 250,
            "unwarned-reclaim-cycles": 5
        },
        "renew-timer": 1000,
        "rebind-timer": 2000,
        "preferred-lifetime": 3000,
        "valid-lifetime": 4000,
        "shared-networks": [
            {
                "name": "Test",
                "subnet6": [
                    {
                        "interface": "<network interface>",
                        "subnet": "<ipv6 subnet>",
                        "reservations": [
                            {
                                "hw-address": "<hw address>",
                                "ip-addresses": [
                                    "<reserved ip>"
                                ]
                            }
                        ],
                        "pools": [
                            {
                                "pool": "<pool range>"
                            }
                        ]
                    }
                ]
            }
        ]
    },
    "Logging": {
        "loggers": [
            {
                "name": "kea-dhcp6",
                "output_options": [
                    {
                        "output": "/var/log/kea-dhcp6.log"
                    }
                ],
                "severity": "INFO",
                "debuglevel": 0
            }
        ]
    }
}

The configuration should be identical on both nodes, with the exception of the interface name and the IPv6 address (helper address).

Keepalived Setup

Here is an illustration which outlines the basic setup:

[Figure: Kea 1.3.0 Site Failover]

The flow of events in case of failure looks like this:

  1. Keepalived monitors Kea's health. If the Kea health check fails, Keepalived goes into FAULT state.
  2. The fallback node is promoted from BACKUP to MASTER state, and its notify script starts Kea.
  3. When the Kea health check on the primary node succeeds again, the primary node immediately becomes MASTER again, because its priority is higher than that of the fallback node.
  4. The fallback node returns to BACKUP state and Kea is stopped on it immediately.

The Keepalived configuration itself, here for the primary node:

global_defs {
   router_id aek
   smtp_server "smtp server ip"
   smtp_connect_timeout 30
   notification_email_from noreply@test.com
   notification_email {
        team@test.com
   }
}

vrrp_script kea_check  {
        script       "/usr/local/bin/kea_check.sh"
        interval 1
        fall 2
        rise 2
}

vrrp_instance unicast_instance {
        state MASTER
        interface "interface name"
        virtual_router_id "number between 1-255"
        priority 151
        advert_int 1
        track_interface {
               "interface name"
        }
        notify /usr/local/bin/keepalived_notify.sh
        track_script {
                kea_check
        }


        authentication {
                auth_type PASS
                auth_pass keaaek 
        }

        unicast_src_ip "host ip"
        unicast_peer {
            "the peer ip" 
        }
}
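The configuration above has state MASTER and priority 151, so it belongs to the primary node. The fallback node's configuration is not shown. The following is a minimal sketch of how it could be derived, assuming it differs only in state (BACKUP), a lower priority, and swapped unicast addresses. The function name make_fallback_conf and the default priority of 150 are assumptions made for this example.

```shell
#!/bin/bash
# Sketch: derive the fallback node's keepalived.conf from the primary's.
# Assumptions: only "state" and "priority" are rewritten here; the default
# priority 150 is an illustrative value that merely has to be lower than 151.
# unicast_src_ip and unicast_peer still have to be swapped by hand.
make_fallback_conf() {
  local prio=${1:-150}
  sed -E \
    -e 's/^([[:space:]]*state)[[:space:]]+MASTER/\1 BACKUP/' \
    -e "s/^([[:space:]]*priority)[[:space:]]+[0-9]+/\1 ${prio}/"
}

# Usage: make_fallback_conf < keepalived.conf.primary > keepalived.conf.fallback
```

Generating one file from the other keeps the two nodes from drifting apart in everything except the few lines that really have to differ.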

The important part of the Keepalived configuration is the kea_check script:

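#!/bin/bash
# Health check run by Keepalived: exit 0 means healthy, non-zero means failed.

STATE_FILE=/var/run/keepalived.state

# Succeeds only if the kea-dhcp6 systemd unit is active.
function kea_check {
  /bin/systemctl -q is-active kea-dhcp6
}

# BACKUP: Kea is stopped on purpose, so a stopped Kea counts as healthy.
if grep -q BACKUP "$STATE_FILE"; then
  if ! kea_check; then
    exit 0
  fi
fi

# MASTER: Kea has to be running, otherwise the check fails.
if grep -q MASTER "$STATE_FILE"; then
  if ! kea_check; then
    exit 1
  fi
fi

# Every other case, including FAULT, is reported as healthy.
exit 0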

It reads the Keepalived state, which is written to /var/run/keepalived.state by the keepalived_notify.sh script, and it checks the health of the Kea process. Exiting with 0 is fine when Keepalived is in BACKUP state and Kea is not running. In MASTER state the opposite applies: a stopped Kea makes the check fail with exit code 1.
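To make the decision logic easier to see, here is the check restated as a pure function that takes the state and the Kea status as arguments instead of reading them from the system. This is only an illustration of the logic described above, not a replacement for kea_check.sh; the function name check_result is made up for this example.

```shell
#!/bin/bash
# Restates the kea_check.sh decision without touching systemd or the state file.
# check_result STATE KEA_ACTIVE   (KEA_ACTIVE is "yes" or "no")
# Prints the exit code the health check would return.
check_result() {
  local state=$1 kea_active=$2
  if [ "$state" = "MASTER" ] && [ "$kea_active" = "no" ]; then
    echo 1    # MASTER without a running Kea is the only failing case
  else
    echo 0    # BACKUP, FAULT, or a running Kea all pass
  fi
}

# Print the full table of combinations.
for state in MASTER BACKUP FAULT; do
  for active in yes no; do
    printf '%-6s kea_active=%-3s -> exit %s\n' \
      "$state" "$active" "$(check_result "$state" "$active")"
  done
done
```

The table shows that the check also passes in FAULT state. As far as the script logic goes, this is what lets a failed primary node leave FAULT again, at which point the notify script attempts to start Kea.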

The keepalived_notify.sh script keeps track of Keepalived state and starts or stops Kea depending on the state:

#!/bin/sh
# Called by Keepalived on every state transition.

TYPE=$1    # "GROUP" or "INSTANCE"
NAME=$2    # name of the group or instance
STATE=$3   # target state: MASTER, BACKUP or FAULT

case "$STATE" in
        "MASTER") /bin/systemctl start kea-dhcp6
                  echo "$STATE" > /var/run/keepalived.state
                  exit 0
                  ;;
        "BACKUP") /bin/systemctl stop kea-dhcp6
                  echo "$STATE" > /var/run/keepalived.state
                  exit 0
                  ;;
        "FAULT")  /bin/systemctl stop kea-dhcp6
                  echo "$STATE" > /var/run/keepalived.state
                  exit 0
                  ;;
        *)        echo "unknown state"
                  exit 1
                  ;;
esac

What About the Clients?

What happens to the clients when their designated DHCP server disappears?

  1. If a Kea node goes down, each client bound to it sends a RENEW request a number of times.
  2. These requests are ignored by the now active fallback Kea node because of the server-id mismatch, and the clients eventually begin to send REBIND requests instead.
  3. Since the primary Kea node and the fallback node share lease storage, the fallback node can identify the client and will rebind the lease.
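How long the RENEW and REBIND phases last follows from the timers in the sample Kea configuration above. The small calculation below is a sketch using those values (renew-timer 1000, rebind-timer 2000, valid-lifetime 4000); the variable names are chosen for this example.

```shell
#!/bin/bash
# Timer values from the sample Kea configuration, in seconds.
renew_timer=1000       # T1: client starts sending RENEW to its own server
rebind_timer=2000      # T2: client switches to REBIND, addressed to any server
valid_lifetime=4000    # lease expires if nothing has succeeded by then

# Longest time a client keeps sending RENEW before it switches to REBIND.
renew_window=$(( rebind_timer - renew_timer ))

# Time left for a REBIND to succeed before the lease expires.
rebind_window=$(( valid_lifetime - rebind_timer ))

echo "RENEW phase lasts up to ${renew_window}s"
echo "REBIND phase lasts up to ${rebind_window}s"
```

With these values a client may spend up to 1000 seconds sending RENEW requests that the fallback node ignores, and it then has 2000 seconds in which a REBIND can succeed. The client keeps its address during the whole period, so the failover should not interrupt connectivity as long as the fallback node is up before the lease expires.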