// Copyright (C) 2017-2021 Internet Systems Consortium, Inc. ("ISC") // // This Source Code Form is subject to the terms of the Mozilla Public // License, v. 2.0. If a copy of the MPL was not distributed with this // file, You can obtain one at http://mozilla.org/MPL/2.0/. /** @page libdhcp_ha Kea High Availability Hooks Library Welcome to Kea High Availability Hooks Library. This documentation is addressed at developers who are interested in internal operation of the library. This file provides information needed to understand and perhaps extend this library. @section haOverview Overview The High Availability (HA) hooks library is intended for DHCP deployments in which there is a need to sustain the DHCP service in the event if one of the servers becomes unavailable as a result of a crash, power outage or other unexpected situation. The other server belonging to this setup should be able to handle the entire DHCP traffic directed to the system, including the traffic that would be normally handled by the server which became unavailable. Many of the concepts behind the HA hooks library are derived from the DHCP Failover protocol, however this solution has different architecture, uses different state machine and different message formats for communication between the participating servers. This solution is not a DHCP Failover implementation and, therefore, this documentation purposely avoids using the word "Failover" in the context of this library. The HA feature design can be found at Kea HA Design page. @section haWhyHookLibrary Why Hook Library? High Availability is a very important requirement for various DHCP deployments. It is a valid question why such a generic feature is placed in a hook library rather implemented as an integral part of the Kea DHCP servers. If the HA is implemented in the loadable library, users who don't use HA or who don't want to use this particular solution for HA will simply not load this library. The server code without the HA implementation is lighter, easier to understand and debug. High Availability is a pretty complex feature and will certainly keep growing both in size and complexity. Keeping it in a separate code base makes it easier to maintain and use. Also, the HA hooks library requires Kea lease_cmds hook library to be loaded on the participating servers. It would clearly be a bad design to introduce the feature relying on the presence the loadable (lease_cmds) module in the main Kea code. @section haNotableDifferences Notable Differences to ISC DHCP It is worth to briefly explain what are the major differences between Kea HA implementation and the failover implemented in ISC DHCP. There are two protocols that IETF attempted to standardize: DHCPv4 Failover draft, which was an Internet Draft status that had expired Sept. 2003. The other one is RFC8156: DHCPv6 Failover, which was published as Proposed Standard. ISC DHCP implemented the former, but not the latter. As such, ISC DHCP is able to provide failover for DHCPv4 only, not DHCPv6. The second major difference is that both IETF failover protocols are based on MCLT (or Maximum Client Lead Time), sometimes referenced to as lazy updates. This mechanism lets a server respond immediately, which improves latency, but it does so at the cost of greatly increased complexity. The lease is assigned with a very short lifetime, then an update is sent to the other server with a lifetime greater than the client requested. Once the other server confirms the lease, the client's renewal is being updated with a longer lifetime. This approach generates more traffic and causes lease lifetimes to fluctuate greatly, despite an administrator setting it to a specific value. Kea HA does not implement this complexity. It is much simpler and easier to use and understand its operation, although the price to pay for this relative simplicity is a longer response time and somewhat decreased performance. Third difference is that in ISC DHCP the failover relationship is strictly a pair (i.e. two) of servers. On the other hand Kea HA is able to define additional backup servers. While they're not technically participating in the HA relationship, their databases are kept up to date and can be used are replacements that are almost ready to take over the traffic. However, replacing primary or secondary server with a backup requires manual administrator's intervention. The fourth difference is that Kea HA does not support pool rebalancing yet. When running in load balancing mode, Kea uses hashing mechanism to segregate clients into one of two pools. It is unlikely, but possible that a network would be visited by clients that are predominantly assigned to one server. As a result, this server could ran out of addresses, while its underutilized partner could still have many addresses available. This unfortunate, but unlikely limitation will be removed in the future Kea releases. @section haAyncCommunication Asynchronous Communication with Boost Asio One of the major technical problems with High Availability is that the participating servers must constantly communicate with each other. When one of the servers allocates a lease it must notify its peer about this allocation and provide it with a full information about the allocated lease. The server which has allocated the lease must not respond to the client until its partner confirms that it has saved the lease in its database. This guarantees that, at any given time, both servers hold the most current lease information and any of the servers can take responsibility for managing existing leases if the partner server becomes unavailable. This is similar to the requirement on a single DHCP server which must store the lease information on the persistent storage before responding to the client. Failing to do so may cause the lease information to get lost if the server crashes before writing it to the lease file. The requirement for the partner to store the lease in its lease database and confirming this fact to the server allocating the lease results in increased latency of the DHCP responses to the clients. In order to minimize the latency the idea of "parking" DHCP packets has been introduced. This is a solution for pseudo parallel processing of multiple DHCP packets and to prevent blocking wait during the communication with the other server. When the HA hooks library needs to send a lease update to the partner, the client's packet associated with this lease is "parked", waiting for the communication with the partner to complete. Meanwhile, other incoming DHCP packets are processed (and also parked if necessary). The client which sent the DHCP packet still has to wait for the communication with the partner to complete, but it doesn't have to wait for the server to receive its packet (and start processing it) while previous DHCP transaction is still in progress. This solution requires that the communication between the servers is asynchronous and the most obvious framework for this was Boost ASIO, as it is already used in many different areas of the code. The DHCP servers are processing incoming packets synchronously (in a loop), but each loop pass contains a call to: @code getIOService()->poll(); @endcode which executes callbacks for completed asynchronous operations, such as timers, asynchronous sends and receives. The instance of the IOService is owned by the DHCP servers, but hooks libraries must have access to it and must use this instance to schedule asynchronous tasks. This is why the new hook points "dhcp4_srv_configured" and "dhcp6_srv_configured" have been introduced. These hook points are used by the DHCPv4 and the DHCPv6 servers respectively, to pass the instance of the IOService (via "io_context" argument) to the hooks libraries which require to schedule asynchronous tasks. It is also worth to note that the blocking reception of the DHCP packets may cause up to 1 second delays in the asynchronous operations. This is due to the structure of the main server loop: @code bool Dhcpv4Srv::run() { while (!shutdown_) { try { run_one(); getIOService()->poll(); } catch (const std::exception& e) { // General catch-all exception that are not caught by more specific // catches. This one is for exceptions derived from std::exception. LOG_ERROR(packet4_logger, DHCP4_PACKET_PROCESS_STD_EXCEPTION) .arg(e.what()); } catch (...) { // General catch-all exception that are not caught by more specific // catches. This one is for other exceptions, not derived from // std::exception. LOG_ERROR(packet4_logger, DHCP4_PACKET_PROCESS_EXCEPTION); } } return (true); } @endcode The @c run_one() call includes a @c select() invocation with a timeout of 1 second. The @c poll() is not invoked for at most 1 second while the server is performing this blocking @c select(). Future Kea releases should mitigate this problem by introducing some mechanisms for concurrent reception and processing of the DHCP packets. @section haClientClassification Client Classification in Load Balancing One of the top requirements for the HA was to support load balancing between two participating servers. Even though, current implementation supports only 50/50 split of packets between two servers, the implementation can easily be extended to support different splits. Another supported mode of operation is the "hot-standby" mode in which one of the servers handles the entire traffic and the other server is simply receiving lease updates from it. In case of the failure of the first server, the standby server can automatically switch to handle the DHCP traffic directed to the system. The "load-balancing" mode is more complex in that it requires isolation of address/prefix pools from which the respective servers are allocating leases for the clients. If the two servers were sharing address pools they would frequently run into the conflict whereby both of them would allocate the same address to different clients. This is not a problem in the "hot-standby" mode because there is only one server allocating leases at the given time. The most challenging part in case of load balancing is the configuration of the address pools on respective servers. At the time when the HA design was created, there was no requirement on the HA hooks library to be able to rebalance the pools, e.g. in case one of the pools is nearly exhausted and the other pool include many available addresses or prefixes. This requirement may come in the future, in which case the current approach to the configuration may be enhanced. The current approach uses existing client classification mechanism to statically split allocations accross multiple pools. Client classification was designed to serve as a generic framework to support various scenarios in which clients need to be segregated and associated with selected pools, subnets and shared networks. The load balancing in HA hooks library is nothing else but another use case for client classification. Should new requirements be created for the HA hooks library in the future (e.g. rebalancing), the client classification will need to be extended to adopt those requirements. In fact, client classification was already extended for the Kea 1.4.0 release to allow for selecting a specific pool based on combinations of classes, rather than a single class associated with the server by the HA load balancing algorithm. The examples of the pools split between different device types (e.g. laptops and telephones) and between load balancing servers (e.g. "server1" and "server2") can be found in the Kea Administrator's Manual. @section haCodeStructure HA Hooks Library Code Structure @subsection haService HA Service Class The @c isc::ha::HAService class is a heart of the HA system. It implements the HA state machine. It is derived from the @c isc::util::StateModel class. The states are documented both in the Kea Administrator's Manual and the HA design. The declarations of the states can be found in the @c ha_service_states.h header file because they are used by multiple C++ classes. Besides running the state machine transitions, the @c HAService class serves the following purposes: - Assigns class to the received DHCP packet appropriate for the server selected to process the DHCP packet as a result of load balancing. - Measures the clock skew between the active servers. If the clock skew is too high, it can either log an error or stop the HA function. - Sends lease updates to the partner and receives responses. - Sends heartbeat command to the partner to verify partner's state and its notion of time (for clock skew). - Controls whether the DHCP server should respond to the queries from clients or not. - Synchronizes local lease database by fetching the leases from the partner server. - Controls which packets the server responds to (HA scopes). As of Kea 1.4.0 release, there is only one instance of the @c HAService class created by the HA hooks library. In the future, multiple @c HAService instances may co-exist, each handling an independent HA relationship with another server. For example: a server could be configured to respond to devices in two subnets and establish a connection with two different servers for respective subnets. Lease updates pertaining to the first subnet would be sent via first connection and those pertaining to the second subnet would be sent via the second connection. As of Kea 1.4.0 release, there is exactly one relationship that the Kea server instance can participate in. @subsection haImplementation HA Implementation Class The @c isc::ha::HAImpl class implements callouts and command handlers supported by the HA hooks library. Its methods expect @c isc::hooks::CalloutHandle as arguments and are usually directly called by the callout functions such as @c pkt4_receive etc. This makes it more natural to unit test those implementations because the tests can invoke methods of the @c HAImpl class, rather than the "extern" functions. Internally, the @c HAImpl class methods call methods of the @c HAService class to perform certain actions, such as triggering lease updates, sending heartbeat to another server etc. However, the @c HAImpl still includes a fair amount of logic to retrieve and validate the arguments provided within the @c isc::hooks::CalloutHandle. The @c isc::ha::HAImpl::buffer4Receive and @c isc::ha::HAImpl::buffer6Receive functions deserve some detailed explanation, because not only do they retrieve the arguments provided to the callouts but also perform parsing of the received DHCP queries. The DHCP query parsing is normally performed by the server. In most cases a hooks library would not have to parse the DHCP packets on its own. If the hooks library needs to access some information, e.g. DHCP options or BOOTP message fields, it is sufficient to implement the @c pkt4_receive or @c pkt6_receive callout, which is invoked after the server has parsed the packet. However, this approach would not work in case of the HA hooks library. This library assigns classes as a result of the load balancing to the incoming packets. This assignment must take place before the server evaluates classes specified in the configuration file, i.e. before the @c pkt4_receive and @c pkt6_receive hook point. This implies that the HA specific classification must be performed within the @c buffer4_receive or @c buffer6_receive callouts. These callouts must parse (unpack) the received buffers to have an access into the data used by the load balancing algorithm, such as: MAC address, client identifier or DUID. @subsection haQueryFilter Query Filter Class The @c isc::ha::QueryFilter class is used to control which DHCP queries are to be processed by respective servers. It implements the load balancing algorithm which is triggered by cooperating servers against each incoming packet and results in assigning the packet to one of the served "scopes". Scopes are associated with the servers and are named after the servers. In the load balancing case there are two scopes, e.g. "server1" and "server2". The Load balancing algorithm selects one of the scopes for the packet. During the normal operation, each server handles its own scope. In the "partner-down" state, the surviving server would handle both scopes. The selection of the scopes to be served by the server instance is usually made automatically as a result of transitioning to some new state within the @c HAService class. However, the scopes assignment can also be made via control channel as a result of an administrative action. @subsection haCommunicationState Communication State Class The @c CommunicationState class is used by the @c HAService to control all aspects of the communication between the active servers, i.e.: - Scheduling periodic heartbeat commands using Boost ASIO timers. - Holding the state of the partner returned in response to the heartbeat command. - Recording when the last successful heartbeat has been sent, i.e. how long the partner server has been unresponsive. - Analyzing DHCP queries to detect whether the partner server is not responsive by checking whether the values in the 'secs' field or Elapsed Time option are too high. - Monitoring the clocks skew between the active servers, which is calculated by substracting the current time (on the local server) from the time returned by the partner in response to the heartbeat command. The large part of this class is common for the DHCPv4 and DHCPv6 servers. However, there are differences in how the DHCPv4 and the DHCPv6 messages are analyzed to detect whether the partner server has stopped responding: - The DHCPv4 server uses 'secs' field, while the DHCPv6 server looks into the DHCPv6 specific Elapsed Time option. - When the DHCPv4 server records a client information in case if the DHCPv4 server fails to respond the client's query, it records both the client identifier and the MAC address. The DHCPv6 server uses the DUID to record the client. Those differences led to creation of DHCPv4 and DHCPv6 specific derivations of the @c CommunicationState class, which differently deal with analysis of the queries. The clock skew is checked by the @c QueryFilter class every time it is updated as a result of receiving a response to the heartbeat. If the clock skew is in the range of 30 to 60 seconds, the @c clockSkewShouldWarn returns true to indicate to the @c HAService that a warning should be logged. In order to prevent too frequent warnings (especially when heartbeats are sent frequently), this method implements a simple gating algorithm, which would not return true (trigger the warning) more often than every 60 seconds. The @c isc::ha::CommunicationState::clockSkewShouldTerminate informs whether the clock skew has exceeded 60 seconds, in which case the @c HAService class would transition to the "terminated" state. @subsection haCommandCreator Command Creator Class The @c CommandCreator is a collection of static methods which create commands issued between the HA-enabled DHCP servers. These JSON commands are sent over the @c isc::http::HttpClient from the @c HAService class. @section haShortcomings Future HA Hooks Library Improvement Ideas The HA hooks library was first released with Kea 1.4.0. There are numerous enhancements to this library considered for the future releases. Some of them are briefly described in this section. @subsection haStateMachineControl Controlling State Machine As of Kea 1.4.0, there are no control commands allowing for setting or influencing the transitions between states. In particular, there is no way to pause the HA state machine on the selected state to perform some administrative actions before transitioning to the normal operation state. @subsection haNameUpdates DNS Updates are not Coordinated When one of the servers allocates the lease this server is responsible or sending a DNS update if configured to send such updates. The partner server receives the lease update (including the inserted hostname) so it knows that the hostname was stored in the DNS. When this lease subsequently expires, the hostname must be removed from the DNS. The HA hooks library, however, has no means to record which server has allocated this lease in the lease database. If recording such information had been possible, the same server which allocated the lease would have sent the removal name change request (NCR) to the D2. Because this information is unavailable, both servers will send the removal NCRs. One of those NCRs will succeed, another one will fail. Addressing this issue requires two enhancements: - Implementing "user context" for leases, which could be used for storing custom type of information, e.g. server identifier, along with the leases. - Implementing callouts for the "lease4_expire" and "lease6_expire" hook points via which the server removing the lease from the database could notify the partner about such removal. @section haMTCompatibility Multi-Threading Compatibility The High Availability hooks library is compatible with multi-threading. */