FAILOVER

The failover feature allows back ends to automatically switch to a different server if the current server fails.

Failover Syntax

The list of servers is given as a comma-separated list; any number of spaces is allowed around the comma. The servers are listed in order of preference. The list can contain any number of servers.

For each failover-enabled config option, two variants exist: primary and backup. The idea is that servers in the primary list are preferred and backup servers are only searched if no primary servers can be reached. If a backup server is selected, a timeout of 31 seconds is set. After this timeout SSSD will periodically try to reconnect to one of the primary servers. If it succeeds, it will replace the current active (backup) server.

The Failover Mechanism

The failover mechanism distinguishes between a machine and a service. The back end first tries to resolve the hostname of a given machine; if this resolution attempt fails, the machine is considered offline. No further attempts are made to connect to this machine for any other service. If the resolution attempt succeeds, the back end tries to connect to a service on this machine. If the service connection attempt fails, then only this particular service is considered offline and the back end automatically switches over to the next service. The machine is still considered online and might still be tried for another service.

Further connection attempts are made to machines or services marked as offline after a specified period of time; this is currently hard coded to 30 seconds.

If there are no more machines to try, the back end as a whole switches to offline mode, and then attempts to reconnect every 30 seconds.

Failover time outs and tuning

Resolving a server to connect to can be as simple as running a single DNS query or can involve several steps, such as finding the correct site or trying out multiple host names in case some of the configured servers are not reachable.
The more complex scenarios can take some time, and SSSD has to balance giving the resolution process enough time to finish against trying for too long before falling back to offline mode. If the SSSD debug logs show that server resolution is timing out before a live server is contacted, consider changing the time outs.

This section lists the available tunables. Please refer to their descriptions in the sssd.conf(5) manual page.

dns_resolver_server_timeout
    Time in milliseconds that SSSD talks to a single DNS server before trying the next one.

    Default: 1000

dns_resolver_op_timeout
    Time in seconds that SSSD tries to resolve a single DNS query (e.g. resolution of a hostname or an SRV record) before trying the next hostname or discovery domain.

    Default: 3

dns_resolver_timeout
    Time in seconds that SSSD tries to resolve a failover service. This service resolution internally might include several steps, such as resolving DNS SRV queries or locating the site.

    Default: 6

For LDAP-based providers, the resolve operation is performed as part of an LDAP connection operation. Therefore, ldap_opt_timeout should also be set to a larger value than dns_resolver_timeout, which in turn should be set to a larger value than dns_resolver_op_timeout, which should be larger than dns_resolver_server_timeout. Note that dns_resolver_server_timeout is given in milliseconds while the other resolver time outs are given in seconds, so convert the units before comparing the values.
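
The ordering rule above is easy to get wrong because of the mixed units. The following Python sketch is one possible way to check a set of values before putting them into sssd.conf. It is not part of SSSD: the function name check_timeouts, the section name domain/example.com, and all sample values (including the ldap_opt_timeout value) are hypothetical and chosen only for illustration. The defaults used for the three resolver options are the ones listed above.

```python
import configparser

# Defaults quoted in the text above. No default for ldap_opt_timeout is given
# there, so this sketch only checks it when it is set explicitly.
DEFAULTS = {
    "dns_resolver_server_timeout": 1000,  # milliseconds
    "dns_resolver_op_timeout": 3,         # seconds
    "dns_resolver_timeout": 6,            # seconds
}


def check_timeouts(section):
    """Return a list of ordering violations (empty if the chain is consistent).

    `section` is any mapping of option name to value, for example a
    configparser section. check_timeouts is a hypothetical helper.
    """
    def get(name):
        # Fall back to the documented default when the option is not set.
        return float(section[name]) if name in section else float(DEFAULTS[name])

    # Build the chain from the smallest expected value to the largest,
    # converting the millisecond option to seconds first.
    chain = [
        ("dns_resolver_server_timeout", get("dns_resolver_server_timeout") / 1000.0),
        ("dns_resolver_op_timeout", get("dns_resolver_op_timeout")),
        ("dns_resolver_timeout", get("dns_resolver_timeout")),
    ]
    if "ldap_opt_timeout" in section:
        chain.append(("ldap_opt_timeout", float(section["ldap_opt_timeout"])))

    problems = []
    for (small_name, small), (large_name, large) in zip(chain, chain[1:]):
        if not small < large:
            problems.append(
                f"{large_name} ({large:g}s) should be larger than "
                f"{small_name} ({small:g}s)"
            )
    return problems


# Illustrative values only; the domain name and numbers are made up.
SAMPLE = """
[domain/example.com]
dns_resolver_server_timeout = 2000
dns_resolver_op_timeout = 2
dns_resolver_timeout = 10
ldap_opt_timeout = 12
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)
for problem in check_timeouts(parser["domain/example.com"]):
    print(problem)
```

In the sample, dns_resolver_server_timeout is 2000 milliseconds, which equals the 2 seconds given for dns_resolver_op_timeout, so the check reports that pair. Raising dns_resolver_op_timeout or lowering dns_resolver_server_timeout would make the chain consistent.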