summaryrefslogtreecommitdiffstats
path: root/doc/dnssec-guide/troubleshooting.rst
diff options
context:
space:
mode:
Diffstat (limited to 'doc/dnssec-guide/troubleshooting.rst')
-rw-r--r--doc/dnssec-guide/troubleshooting.rst589
1 files changed, 589 insertions, 0 deletions
diff --git a/doc/dnssec-guide/troubleshooting.rst b/doc/dnssec-guide/troubleshooting.rst
new file mode 100644
index 0000000..cdc40cc
--- /dev/null
+++ b/doc/dnssec-guide/troubleshooting.rst
@@ -0,0 +1,589 @@
+.. Copyright (C) Internet Systems Consortium, Inc. ("ISC")
+..
+.. SPDX-License-Identifier: MPL-2.0
+..
+.. This Source Code Form is subject to the terms of the Mozilla Public
+.. License, v. 2.0. If a copy of the MPL was not distributed with this
+.. file, you can obtain one at https://mozilla.org/MPL/2.0/.
+..
+.. See the COPYRIGHT file distributed with this work for additional
+.. information regarding copyright ownership.
+
+.. _dnssec_troubleshooting:
+
+Basic DNSSEC Troubleshooting
+----------------------------
+
+In this chapter, we cover some basic troubleshooting
+techniques, some common DNSSEC symptoms, and their causes and solutions. This
+is not a comprehensive "how to troubleshoot any DNS or DNSSEC problem"
+guide, because that could easily be an entire book by itself.
+
+.. _troubleshooting_query_path:
+
+Query Path
+~~~~~~~~~~
+
+The first step in troubleshooting DNS or DNSSEC should be to
+determine the query path. Whenever you are working with a DNS-related issue, it is
+always a good idea to determine the exact query path to identify the
+origin of the problem.
+
+End clients, such as laptop computers or mobile phones, are configured
+to talk to a recursive name server, and the recursive name server may in
+turn forward requests on to other recursive name servers before arriving at the
+authoritative name server. The giveaway is the presence of the
+Authoritative Answer (``aa``) flag in a query response: when present, we know we are talking
+to the authoritative server; when missing, we are talking to a recursive
+server. The example below shows an answer to a query for
+``www.example.com`` without the Authoritative Answer flag:
+
+::
+
+ $ dig @10.53.0.3 www.example.com A
+
+ ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.com a
+ ; (1 server found)
+ ;; global options: +cmd
+ ;; Got answer:
+ ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 62714
+ ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
+
+ ;; OPT PSEUDOSECTION:
+ ; EDNS: version: 0, flags:; udp: 4096
+ ; COOKIE: c823fe302625db5b010000005e722b504d81bb01c2227259 (good)
+ ;; QUESTION SECTION:
+ ;www.example.com. IN A
+
+ ;; ANSWER SECTION:
+ www.example.com. 60 IN A 10.1.0.1
+
+ ;; Query time: 3 msec
+ ;; SERVER: 10.53.0.3#53(10.53.0.3)
+ ;; WHEN: Wed Mar 18 14:08:16 GMT 2020
+ ;; MSG SIZE rcvd: 88
+
+Not only do we not see the ``aa`` flag, we see an ``ra``
+flag, which indicates Recursion Available. This indicates that the
+server we are talking to (10.53.0.3 in this example) is a recursive name
+server: although we were able to get an answer for
+``www.example.com``, we know that the answer came from somewhere else.
+
+If we query the authoritative server directly, we get:
+
+::
+
+ $ dig @10.53.0.2 www.example.com A
+
+ ; <<>> DiG 9.16.0 <<>> @10.53.0.2 www.example.com a
+ ; (1 server found)
+ ;; global options: +cmd
+ ;; Got answer:
+ ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 39542
+ ;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
+ ;; WARNING: recursion requested but not available
+ ...
+
+The ``aa`` flag tells us that we are now talking to the
+authoritative name server for ``www.example.com``, and that this is not a
+cached answer it obtained from some other name server; it served this
+answer to us right from its own database. In fact,
+the Recursion Available (``ra``) flag is not present, which means this
+name server is not configured to perform recursion (at least not for
+this client), so it could not have queried another name server to get
+cached results.
+
+.. _troubleshooting_visible_symptoms:
+
+Visible DNSSEC Validation Symptoms
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+After determining the query path, it is necessary to
+determine whether the problem is actually related to DNSSEC
+validation. You can use the ``+cd`` flag in ``dig`` to disable
+validation, as described in
+:ref:`how_do_i_know_validation_problem`.
+
+When there is indeed a DNSSEC validation problem, the visible symptoms,
+unfortunately, are very limited. With DNSSEC validation enabled, if a
+DNS response is not fully validated, it results in a generic
+SERVFAIL message, as shown below when querying against a recursive name
+server at 192.168.1.7:
+
+::
+
+ $ dig @10.53.0.3 www.example.org. A
+
+ ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.org A
+ ; (1 server found)
+ ;; global options: +cmd
+ ;; Got answer:
+ ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 28947
+ ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
+
+ ;; OPT PSEUDOSECTION:
+ ; EDNS: version: 0, flags:; udp: 4096
+ ; COOKIE: d1301968aca086ad010000005e723a7113603c01916d136b (good)
+ ;; QUESTION SECTION:
+ ;www.example.org. IN A
+
+ ;; Query time: 3 msec
+ ;; SERVER: 10.53.0.3#53(10.53.0.3)
+ ;; WHEN: Wed Mar 18 15:12:49 GMT 2020
+ ;; MSG SIZE rcvd: 72
+
+With ``delv``, a "resolution failed" message is output instead:
+
+::
+
+ $ delv @10.53.0.3 www.example.org. A +rtrace
+ ;; fetch: www.example.org/A
+ ;; resolution failed: SERVFAIL
+
+BIND 9 logging features may be useful when trying to identify
+DNSSEC errors.
+
+.. _troubleshooting_logging:
+
+Basic Logging
+~~~~~~~~~~~~~
+
+DNSSEC validation error messages show up in ``syslog`` as a
+query error by default. Here is an example of what it may look like:
+
+::
+
+ validating www.example.org/A: no valid signature found
+ RRSIG failed to verify resolving 'www.example.org/A/IN': 10.53.0.2#53
+
+Usually, this level of error logging is sufficient.
+Debug logging, described in
+:ref:`troubleshooting_logging_debug`, gives information on how
+to get more details about why DNSSEC validation may have
+failed.
+
+.. _troubleshooting_logging_debug:
+
+BIND DNSSEC Debug Logging
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A word of caution: before you enable debug logging, be aware that this
+may dramatically increase the load on your name servers. Enabling debug
+logging is thus not recommended for production servers.
+
+With that said, sometimes it may become necessary to temporarily enable
+BIND debug logging to see more details of how and whether DNSSEC is
+validating. DNSSEC-related messages are not recorded in ``syslog`` by default,
+even if query log is enabled; only DNSSEC errors show up in ``syslog``.
+
+The example below shows how to enable debug level 3 (to see full DNSSEC
+validation messages) in BIND 9 and have it sent to ``syslog``:
+
+::
+
+ logging {
+ channel dnssec_log {
+ syslog daemon;
+ severity debug 3;
+ print-category yes;
+ };
+ category dnssec { dnssec_log; };
+ };
+
+The example below shows how to log DNSSEC messages to their own file
+(here, ``/var/log/dnssec.log``):
+
+::
+
+ logging {
+ channel dnssec_log {
+ file "/var/log/dnssec.log";
+ severity debug 3;
+ };
+ category dnssec { dnssec_log; };
+ };
+
+After turning on debug logging and restarting BIND, a large
+number of log messages appear in
+``syslog``. The example below shows the log messages as a result of
+successfully looking up and validating the domain name ``ftp.isc.org``.
+
+::
+
+ validating ./NS: starting
+ validating ./NS: attempting positive response validation
+ validating ./DNSKEY: starting
+ validating ./DNSKEY: attempting positive response validation
+ validating ./DNSKEY: verify rdataset (keyid=20326): success
+ validating ./DNSKEY: marking as secure (DS)
+ validating ./NS: in validator_callback_dnskey
+ validating ./NS: keyset with trust secure
+ validating ./NS: resuming validate
+ validating ./NS: verify rdataset (keyid=33853): success
+ validating ./NS: marking as secure, noqname proof not needed
+ validating ftp.isc.org/A: starting
+ validating ftp.isc.org/A: attempting positive response validation
+ validating isc.org/DNSKEY: starting
+ validating isc.org/DNSKEY: attempting positive response validation
+ validating isc.org/DS: starting
+ validating isc.org/DS: attempting positive response validation
+ validating org/DNSKEY: starting
+ validating org/DNSKEY: attempting positive response validation
+ validating org/DS: starting
+ validating org/DS: attempting positive response validation
+ validating org/DS: keyset with trust secure
+ validating org/DS: verify rdataset (keyid=33853): success
+ validating org/DS: marking as secure, noqname proof not needed
+ validating org/DNSKEY: in validator_callback_ds
+ validating org/DNSKEY: dsset with trust secure
+ validating org/DNSKEY: verify rdataset (keyid=9795): success
+ validating org/DNSKEY: marking as secure (DS)
+ validating isc.org/DS: in fetch_callback_dnskey
+ validating isc.org/DS: keyset with trust secure
+ validating isc.org/DS: resuming validate
+ validating isc.org/DS: verify rdataset (keyid=33209): success
+ validating isc.org/DS: marking as secure, noqname proof not needed
+ validating isc.org/DNSKEY: in validator_callback_ds
+ validating isc.org/DNSKEY: dsset with trust secure
+ validating isc.org/DNSKEY: verify rdataset (keyid=7250): success
+ validating isc.org/DNSKEY: marking as secure (DS)
+ validating ftp.isc.org/A: in fetch_callback_dnskey
+ validating ftp.isc.org/A: keyset with trust secure
+ validating ftp.isc.org/A: resuming validate
+ validating ftp.isc.org/A: verify rdataset (keyid=27566): success
+ validating ftp.isc.org/A: marking as secure, noqname proof not needed
+
+Note that these log messages indicate that the chain of trust has been
+established and ``ftp.isc.org`` has been successfully validated.
+
+If validation had failed, you would see log messages indicating errors.
+We cover some of the most validation problems in the next section.
+
+.. _troubleshooting_common_problems:
+
+Common Problems
+~~~~~~~~~~~~~~~
+
+.. _troubleshooting_security_lameness:
+
+Security Lameness
+^^^^^^^^^^^^^^^^^
+
+Similar to lame delegation in traditional DNS, security lameness refers to the
+condition when the parent zone holds a set of DS records that point to
+something that does not exist in the child zone. As a result,
+the entire child zone may "disappear," having been marked as bogus by
+validating resolvers.
+
+Below is an example attempting to resolve the A record for a test domain
+name ``www.example.net``. From the user's perspective, as described in
+:ref:`how_do_i_know_validation_problem`, only a SERVFAIL
+message is returned. On the validating resolver, we see the
+following messages in ``syslog``:
+
+::
+
+ named[126063]: validating example.net/DNSKEY: no valid signature found (DS)
+ named[126063]: no valid RRSIG resolving 'example.net/DNSKEY/IN': 10.53.0.2#53
+ named[126063]: broken trust chain resolving 'www.example.net/A/IN': 10.53.0.2#53
+
+This gives us a hint that it is a broken trust chain issue. Let's take a
+look at the DS records that are published for the zone (with the keys
+shortened for ease of display):
+
+::
+
+ $ dig @10.53.0.3 example.net. DS
+
+ ; <<>> DiG 9.16.0 <<>> @10.53.0.3 example.net DS
+ ; (1 server found)
+ ;; global options: +cmd
+ ;; Got answer:
+ ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 59602
+ ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
+
+ ;; OPT PSEUDOSECTION:
+ ; EDNS: version: 0, flags:; udp: 4096
+ ; COOKIE: 7026d8f7c6e77e2a010000005e735d7c9d038d061b2d24da (good)
+ ;; QUESTION SECTION:
+ ;example.net. IN DS
+
+ ;; ANSWER SECTION:
+ example.net. 256 IN DS 14956 8 2 9F3CACD...D3E3A396
+
+ ;; Query time: 0 msec
+ ;; SERVER: 10.53.0.3#53(10.53.0.3)
+ ;; WHEN: Thu Mar 19 11:54:36 GMT 2020
+ ;; MSG SIZE rcvd: 116
+
+Next, we query for the DNSKEY and RRSIG of ``example.net`` to see if
+there's anything wrong. Since we are having trouble validating, we
+can use the ``+cd`` option to temporarily disable checking and return
+results, even though they do not pass the validation tests. The
+``+multiline`` option tells ``dig`` to print the type, algorithm type,
+and key id for DNSKEY records. Again,
+some long strings are shortened for ease of display:
+
+::
+
+ $ dig @10.53.0.3 example.net. DNSKEY +dnssec +cd +multiline
+
+ ; <<>> DiG 9.16.0 <<>> @10.53.0.3 example.net DNSKEY +cd +multiline +dnssec
+ ; (1 server found)
+ ;; global options: +cmd
+ ;; Got answer:
+ ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42980
+ ;; flags: qr rd ra cd; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 1
+
+ ;; OPT PSEUDOSECTION:
+ ; EDNS: version: 0, flags: do; udp: 4096
+ ; COOKIE: 4b5e7c88b3680c35010000005e73722057551f9f8be1990e (good)
+ ;; QUESTION SECTION:
+ ;example.net. IN DNSKEY
+
+ ;; ANSWER SECTION:
+ example.net. 287 IN DNSKEY 256 3 8 (
+ AwEAAbu3NX...ADU/D7xjFFDu+8WRIn
+ ) ; ZSK; alg = RSASHA256 ; key id = 35328
+ example.net. 287 IN DNSKEY 257 3 8 (
+ AwEAAbKtU1...PPP4aQZTybk75ZW+uL
+ 6OJMAF63NO0s1nAZM2EWAVasbnn/X+J4N2rLuhk=
+ ) ; KSK; alg = RSASHA256 ; key id = 27247
+ example.net. 287 IN RRSIG DNSKEY 8 2 300 (
+ 20811123173143 20180101000000 27247 example.net.
+ Fz1sjClIoF...YEjzpAWuAj9peQ== )
+ example.net. 287 IN RRSIG DNSKEY 8 2 300 (
+ 20811123173143 20180101000000 35328 example.net.
+ seKtUeJ4/l...YtDc1rcXTVlWIOw= )
+
+ ;; Query time: 0 msec
+ ;; SERVER: 10.53.0.3#53(10.53.0.3)
+ ;; WHEN: Thu Mar 19 13:22:40 GMT 2020
+ ;; MSG SIZE rcvd: 962
+
+Here is the problem: the parent zone is telling the world that
+``example.net`` is using the key 14956, but the authoritative server
+indicates that it is using keys 27247 and 35328. There are several
+potential causes for this mismatch: one possibility is that a malicious
+attacker has compromised one side and changed the data. A more likely
+scenario is that the DNS administrator for the child zone did not upload
+the correct key information to the parent zone.
+
+.. _troubleshooting_incorrect_time:
+
+Incorrect Time
+^^^^^^^^^^^^^^
+
+In DNSSEC, every record comes with at least one RRSIG, and each RRSIG
+contains two timestamps: one indicating when it becomes valid, and
+one when it expires. If the validating resolver's current system time does
+not fall within the two RRSIG timestamps, error messages
+appear in the BIND debug log.
+
+The example below shows a log message when the RRSIG appears to have
+expired. This could mean the validating resolver system time is
+incorrectly set too far in the future, or the zone administrator has not
+kept up with RRSIG maintenance.
+
+::
+
+ validating example.com/DNSKEY: verify failed due to bad signature (keyid=19036): RRSIG has expired
+
+The log below shows that the RRSIG validity period has not yet begun. This could mean
+the validation resolver's system time is incorrectly set too far in the past, or
+the zone administrator has incorrectly generated signatures for this
+domain name.
+
+::
+
+ validating example.com/DNSKEY: verify failed due to bad signature (keyid=4521): RRSIG validity period has not begun
+
+.. _troubleshooting_unable_to_load_keys:
+
+Unable to Load Keys
+^^^^^^^^^^^^^^^^^^^
+
+This is a simple yet common issue. If the key files are present but
+unreadable by ``named`` for some reason, the ``syslog`` returns clear error
+messages, as shown below:
+
+::
+
+ named[32447]: zone example.com/IN (signed): reconfiguring zone keys
+ named[32447]: dns_dnssec_findmatchingkeys: error reading key file Kexample.com.+008+06817.private: permission denied
+ named[32447]: dns_dnssec_findmatchingkeys: error reading key file Kexample.com.+008+17694.private: permission denied
+ named[32447]: zone example.com/IN (signed): next key event: 27-Nov-2014 20:04:36.521
+
+However, if no keys are found, the error is not as obvious. Below shows
+the ``syslog`` messages after executing ``rndc
+reload`` with the key files missing from the key directory:
+
+::
+
+ named[32516]: received control channel command 'reload'
+ named[32516]: loading configuration from '/etc/bind/named.conf'
+ named[32516]: reading built-in trusted keys from file '/etc/bind/bind.keys'
+ named[32516]: using default UDP/IPv4 port range: [1024, 65535]
+ named[32516]: using default UDP/IPv6 port range: [1024, 65535]
+ named[32516]: sizing zone task pool based on 6 zones
+ named[32516]: the working directory is not writable
+ named[32516]: reloading configuration succeeded
+ named[32516]: reloading zones succeeded
+ named[32516]: all zones loaded
+ named[32516]: running
+ named[32516]: zone example.com/IN (signed): reconfiguring zone keys
+ named[32516]: zone example.com/IN (signed): next key event: 27-Nov-2014 20:07:09.292
+
+This happens to look exactly the same as if the keys were present and
+readable, and appears to indicate that ``named`` loaded the keys and signed the zone. It
+even generates the internal (raw) files:
+
+::
+
+ # cd /etc/bind/db
+ # ls
+ example.com.db example.com.db.jbk example.com.db.signed
+
+If ``named`` really loaded the keys and signed the zone, you should see
+the following files:
+
+::
+
+ # cd /etc/bind/db
+ # ls
+ example.com.db example.com.db.jbk example.com.db.signed example.com.db.signed.jnl
+
+So, unless you see the ``*.signed.jnl`` file, your zone has not been
+signed.
+
+.. _troubleshooting_invalid_trust_anchors:
+
+Invalid Trust Anchors
+^^^^^^^^^^^^^^^^^^^^^
+
+In most cases, you never need to explicitly configure trust
+anchors. ``named`` supplies the current root trust anchor and,
+with the default setting of ``dnssec-validation``, updates it on the
+infrequent occasions when it is changed.
+
+However, in some circumstances you may need to explicitly configure
+your own trust anchor. As we saw in the :ref:`trust_anchors_description`
+section, whenever a DNSKEY is received by the validating resolver, it is
+compared to the list of keys the resolver explicitly trusts to see if
+further action is needed. If the two keys match, the validating resolver
+stops performing further verification and returns the answer(s) as
+validated.
+
+But what if the key file on the validating resolver is misconfigured or
+missing? Below we show some examples of log messages when things are not
+working properly.
+
+First of all, if the key you copied is malformed, BIND does not even
+start and you will likely find this error message in syslog:
+
+::
+
+ named[18235]: /etc/bind/named.conf.options:29: bad base64 encoding
+ named[18235]: loading configuration: failure
+
+If the key is a valid base64 string but the key algorithm is incorrect,
+or if the wrong key is installed, the first thing you will notice is
+that virtually all of your DNS lookups result in SERVFAIL, even when
+you are looking up domain names that have not been DNSSEC-enabled. Below
+shows an example of querying a recursive server 10.53.0.3:
+
+::
+
+ $ dig @10.53.0.3 www.example.com. A
+
+ ; <<>> DiG 9.16.0 <<>> @10.53.0.3 www.example.org A +dnssec
+ ; (1 server found)
+ ;; global options: +cmd
+ ;; Got answer:
+ ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 29586
+ ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
+
+ ;; OPT PSEUDOSECTION:
+ ; EDNS: version: 0, flags: do; udp: 4096
+ ; COOKIE: ee078fc321fa1367010000005e73a58bf5f205ca47e04bed (good)
+ ;; QUESTION SECTION:
+ ;www.example.org. IN A
+
+``delv`` shows a similar result:
+
+::
+
+ $ delv @192.168.1.7 www.example.com. +rtrace
+ ;; fetch: www.example.com/A
+ ;; resolution failed: SERVFAIL
+
+The next symptom you see is in the DNSSEC log messages:
+
+::
+
+ managed-keys-zone: DNSKEY set for zone '.' could not be verified with current keys
+ validating ./DNSKEY: starting
+ validating ./DNSKEY: attempting positive response validation
+ validating ./DNSKEY: no DNSKEY matching DS
+ validating ./DNSKEY: no DNSKEY matching DS
+ validating ./DNSKEY: no valid signature found (DS)
+
+These errors are indications that there are problems with the trust
+anchor.
+
+.. _troubleshooting_nta:
+
+Negative Trust Anchors
+~~~~~~~~~~~~~~~~~~~~~~
+
+BIND 9.11 introduced Negative Trust Anchors (NTAs) as a means to
+*temporarily* disable DNSSEC validation for a zone when you know that
+the zone's DNSSEC is misconfigured.
+
+NTAs are added using the ``rndc`` command, e.g.:
+
+::
+
+ $ rndc nta example.com
+ Negative trust anchor added: example.com/_default, expires 19-Mar-2020 19:57:42.000
+
+
+The list of currently configured NTAs can also be examined using
+``rndc``, e.g.:
+
+::
+
+ $ rndc nta -dump
+ example.com/_default: expiry 19-Mar-2020 19:57:42.000
+
+
+The default lifetime of an NTA is one hour, although by default, BIND
+polls the zone every five minutes to see if the zone correctly
+validates, at which point the NTA automatically expires. Both the
+default lifetime and the polling interval may be configured via
+``named.conf``, and the lifetime can be overridden on a per-zone basis
+using the ``-lifetime duration`` parameter to ``rndc nta``. Both timer
+values have a permitted maximum value of one week.
+
+.. _troubleshooting_nsec3:
+
+NSEC3 Troubleshooting
+~~~~~~~~~~~~~~~~~~~~~
+
+BIND includes a tool called ``nsec3hash`` that runs through the same
+steps as a validating resolver, to generate the correct hashed name
+based on NSEC3PARAM parameters. The command takes the following
+parameters in order: salt, algorithm, iterations, and domain. For
+example, if the salt is 1234567890ABCDEF, hash algorithm is 1, and
+iteration is 10, to get the NSEC3-hashed name for ``www.example.com`` we
+would execute a command like this:
+
+::
+
+ $ nsec3hash 1234567890ABCEDF 1 10 www.example.com
+ RN7I9ME6E1I6BDKIP91B9TCE4FHJ7LKF (salt=1234567890ABCEDF, hash=1, iterations=10)
+
+Zero-length salt can be specified as ``-``.
+
+While it is unlikely you would construct a rainbow table of your own
+zone data, this tool may be useful when troubleshooting NSEC3 problems.