Troubleshooting bind issues

From Notes_Wiki

Home > CentOS > CentOS 6.x > Bind DNS server configuration > Troubleshooting bind issues

Very high CPU usage (200%+) by bind

When bind runs in a chroot environment with a sufficiently complex configuration, its CPU usage may rise above 200%. This happens when the configuration file mentions directories such as '/var/named/data' or '/var/named/dynamic' that do not exist inside the chroot, that is, at '/var/named/chroot/var/named/data' or '/var/named/chroot/var/named/dynamic'. To solve the problem, create every such directory under the chrooted 'var/named' folder and make it owned by named:named. Then restart bind; the CPU usage should drop back to its usual low level.
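The check described above can be sketched as a small shell function. This is only an illustration: the function name missing_chroot_dirs is made up for this example, and the directory list must be taken by hand from your own named.conf.

```shell
# Print every directory that is missing inside the chroot.
# Usage: missing_chroot_dirs CHROOT_ROOT DIR...
# (hypothetical helper, not part of bind)
missing_chroot_dirs() {
    chroot_root=$1
    shift
    for dir in "$@"; do
        # A path such as /var/named/data must exist at CHROOT_ROOT/var/named/data
        if [ ! -d "${chroot_root}${dir}" ]; then
            printf '%s\n' "${chroot_root}${dir}"
        fi
    done
}

# Example, run as root (commented out so that sourcing this file changes nothing):
# missing_chroot_dirs /var/named/chroot /var/named/data /var/named/dynamic |
# while read -r d; do
#     mkdir -p "$d" && chown named:named "$d"
# done
# service named restart
```

The function only reports; creating the directories and changing ownership is left as an explicit step so that nothing is modified by accident.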


broken trust chain error

If bind logs show 'broken trust chain' such as:

15-Apr-2014 06:06:11.667 lame-servers: info: error (no valid RRSIG) resolving 'google.co.in/DS/IN': 125.19.40.90#53
15-Apr-2014 06:06:11.942 lame-servers: info: error (no valid RRSIG) resolving 'google.co.in/DS/IN': 199.7.87.1#53
15-Apr-2014 06:06:12.212 lame-servers: info: error (no valid RRSIG) resolving 'google.co.in/DS/IN': 199.253.57.1#53
15-Apr-2014 06:06:12.334 lame-servers: info: error (no valid RRSIG) resolving 'google.co.in/DS/IN': 194.0.1.7#53
15-Apr-2014 06:06:12.379 lame-servers: info: error (no valid RRSIG) resolving 'google.co.in/DS/IN': 115.249.164.142#53
15-Apr-2014 06:06:12.470 lame-servers: info: error (no valid RRSIG) resolving 'google.co.in/DS/IN': 199.249.125.1#53
15-Apr-2014 06:06:12.618 lame-servers: info: error (no valid RRSIG) resolving 'google.co.in/DS/IN': 199.249.117.1#53
15-Apr-2014 06:06:12.860 lame-servers: info: error (no valid RRSIG) resolving 'google.co.in/DS/IN': 199.253.56.1#53
15-Apr-2014 06:06:12.861 lame-servers: info: error (no valid DS) resolving 'www.google.co.in/A/IN': 216.239.34.10#53
15-Apr-2014 06:06:12.985 lame-servers: info: error (broken trust chain) resolving 'www.google.co.in/A/IN': 216.239.36.10#53
15-Apr-2014 06:06:13.055 lame-servers: info: error (broken trust chain) resolving 'www.google.co.in/A/IN': 216.239.34.10#53

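Then the most probable cause is an incorrect system time: DNSSEC signatures (RRSIG records) are only valid within a fixed time window, so a clock that is far off makes validation fail. To resolve this permanently, configure an NTP server or client on each system. For a quick fix use: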

ntpdate -b 0.centos.pool.ntp.org

This assumes that 0.centos.pool.ntp.org can be resolved using some other DNS server.
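To see why a wrong clock breaks validation, it can help to compare the system time with the validity window of a signature. The sketch below assumes timestamps in the YYYYMMDDHHMMSS form that dig prints for RRSIG records (expiration first, then inception); the function name rrsig_time_ok is invented for this example.

```shell
# Succeed if NOW lies inside the signature validity window.
# All three arguments are UTC timestamps in YYYYMMDDHHMMSS form.
# Usage: rrsig_time_ok NOW INCEPTION EXPIRATION
# (hypothetical helper, not part of bind)
rrsig_time_ok() {
    now=$1
    inception=$2
    expiration=$3
    # Fixed-width timestamps can be compared as plain integers
    [ "$now" -ge "$inception" ] && [ "$now" -le "$expiration" ]
}

# Example (commented out; the timestamps must be copied from real dig output):
# dig +dnssec google.co.in DS          # shows RRSIG expiration and inception
# rrsig_time_ok "$(date -u +%Y%m%d%H%M%S)" <inception> <expiration> \
#     || echo "system clock is outside the signature window"
```

If the function fails for signatures fetched through a working resolver, the local clock is the first thing to check.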


bind fails to stop and hence fails to start without any good reason

Sometimes, especially after an unclean shutdown, bind may fail to stop and start. To solve this, try the following steps:

1. Use 'ps aux | grep named' and ensure that bind is not running. Kill the process if necessary.
2. Use 'mount' and verify that nothing is mounted inside '/var/named/chroot'. Unmount all folders and files mounted inside this folder.
3. Go to the '/var/named/chroot/var/run/named' folder and delete any pid files that exist.
4. Now try 'service named restart' again.
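The pid file step above can be made a little safer by listing only those pid files whose process is no longer running. The following is a rough sketch; stale_pid_files is a name made up for this example, and it should be run as root because 'kill -0' also fails for processes owned by another user.

```shell
# Print pid files in PID_DIR whose recorded process is not running.
# Usage: stale_pid_files PID_DIR
# (hypothetical helper, not part of bind)
stale_pid_files() {
    pid_dir=$1
    for f in "$pid_dir"/*.pid; do
        [ -f "$f" ] || continue
        pid=$(cat "$f")
        # kill -0 sends no signal; it only tests whether the process
        # exists and may be signalled by the current user
        if ! kill -0 "$pid" 2>/dev/null; then
            printf '%s\n' "$f"
        fi
    done
}

# Example, run as root (commented out so that sourcing this file changes nothing):
# mount | grep /var/named/chroot        # should print nothing
# stale_pid_files /var/named/chroot/var/run/named | while read -r f; do rm -f "$f"; done
# service named restart
```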


Tracing DNS resolution

If bind results are incorrect then tracing DNS resolution in a recursive query may be helpful. Use:

dig +trace +recurse +all +qr -t any <domain-name>

Note that if nscd is running it might interfere with proper resolution, hence while debugging it is recommended to have nscd stopped:

service nscd stop

To make bind purge its old cache, restart it using:

service named restart

If the values are still wrong, check which upstream DNS server resolves queries for the current host. Normally the servers listed in /etc/resolv.conf are used. If the DNS server is configured to use itself (127.0.0.1), the forwarders specified in /etc/named.conf may be in use instead. At least one of these upstream DNS servers must have a stale cache or an incorrect configuration.
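One way to find the misbehaving upstream server is to ask each one directly and compare the answers. The sketch below only reads resolv.conf; forwarders from /etc/named.conf have to be added by hand. The function name list_nameservers and the domain example.com are placeholders chosen for this example.

```shell
# Print the nameserver addresses listed in a resolv.conf-style file.
# Usage: list_nameservers [FILE]   (defaults to /etc/resolv.conf)
# (hypothetical helper, not part of bind)
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example (commented out so that sourcing this file sends no queries):
# for server in $(list_nameservers); do
#     echo "== $server"
#     dig @"$server" +short example.com A
# done
```

A server whose answer differs from the others, or from the result of the 'dig +trace' command above, is the likely source of the wrong values.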


