DNS: client does not receive an NXDOMAIN when 1 of 3 servers times out #2613

Open
jbergler opened this issue Jan 19, 2021 · 1 comment

@jbergler

Hi there,

We discovered an issue today with how queries forwarded to external DNS servers are retried after an NXDOMAIN response is received.

# /etc/resolv.conf (host)
search example.com
nameserver 10.0.0.1
nameserver 10.0.0.2
nameserver 10.0.0.3
nameserver 10.0.0.4
# /etc/resolv.conf (container)
search example.com
nameserver 127.0.0.11
options single-request timeout:1 ndots:0

Under normal circumstances, DNS works as expected. The NXDOMAIN is received by the client, resulting in the search domain being appended and the query retried.

$ time host usw125
Host usw125 not found: 3(NXDOMAIN)
real    0m0.052s
user    0m0.000s
sys     0m0.010s

However, when 10.0.0.3 goes offline, the container never receives the NXDOMAIN and thus never tries to resolve the query with the search domain.

$ time host foo
;; connection timed out; no servers could be reached
real    0m10.011s
user    0m0.004s
sys     0m0.007s

We can see that the Docker daemon tried 3 servers (expected, since the NXDOMAIN is not authoritative).
It received 2 NXDOMAIN responses followed by a timeout.
At this point we've hit our limit (since maxExtDNS = 3), and we fall through without sending any response to the client.

# /var/log/dockerd.log
Name To resolve: foo.
[resolver] query foo. (A) from 172.18.0.5:55172, forwarding to udp:10.0.0.1
[resolver] external DNS udp:10.0.0.1 responded with NXDOMAIN for "foo."
[resolver] query foo. (A) from 172.18.0.5:39805, forwarding to udp:10.0.0.2
[resolver] external DNS udp:10.0.0.2 responded with NXDOMAIN for "foo."
[resolver] query foo. (A) from 172.18.0.5:54945, forwarding to udp:10.0.0.3
[resolver] read from DNS server failed, read udp 172.18.0.5:54945->10.0.0.3:53: i/o timeout

It seems to me that in this failure mode the resolver should (or at least could) return one of the NXDOMAIN responses it already received. That would let the client carry on with the search domain instead of hanging for an extended period as if every DNS server were unreachable.
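
To make the suggestion concrete, here is a minimal, self-contained sketch of the fallback I have in mind. It is not the libnetwork code: `response`, `exchangeFunc`, and `forward` are hypothetical names I made up for illustration, and only `maxExtDNS = 3` comes from the behaviour described above. The idea is simply to remember the most recent NXDOMAIN while iterating and hand it back if no later server gives a better answer.

```go
package main

import (
	"errors"
	"fmt"
)

// maxExtDNS mirrors the limit mentioned in this issue (maxExtDNS = 3).
const maxExtDNS = 3

// rcode is a simplified stand-in for a DNS response code.
type rcode int

const (
	rcodeSuccess  rcode = 0
	rcodeNXDomain rcode = 3
)

// response is a hypothetical, minimal model of an upstream reply.
type response struct {
	server string
	code   rcode
}

// errTimeout models the "i/o timeout" seen in the daemon log.
var errTimeout = errors.New("i/o timeout")

// exchangeFunc is a hypothetical hook that sends the query to one server.
type exchangeFunc func(server string) (*response, error)

// forward tries at most maxExtDNS servers in order. It returns the first
// reply that is not NXDOMAIN; otherwise it falls back to the last NXDOMAIN
// it saw. It returns nil only if no server produced any reply.
func forward(servers []string, exchange exchangeFunc) *response {
	var lastNX *response
	for i, s := range servers {
		if i >= maxExtDNS {
			break
		}
		resp, err := exchange(s)
		if err != nil {
			continue // timeout or other I/O error: try the next server
		}
		if resp.code == rcodeNXDomain {
			lastNX = resp // remember it, but keep looking for a better answer
			continue
		}
		return resp
	}
	return lastNX
}

func main() {
	servers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"}

	// Simulate the scenario above: 10.0.0.3 is offline, the others say NXDOMAIN.
	exchange := func(server string) (*response, error) {
		if server == "10.0.0.3" {
			return nil, errTimeout
		}
		return &response{server: server, code: rcodeNXDomain}, nil
	}

	if resp := forward(servers, exchange); resp != nil {
		fmt.Printf("reply to client: rcode=%d (from %s)\n", resp.code, resp.server)
	} else {
		fmt.Println("no reply to client")
	}
}
```

With the simulated servers above, the client would get the NXDOMAIN from 10.0.0.2 rather than nothing. Whether the real resolver should prefer the first or the last NXDOMAIN, or keep trying beyond the limit, is an open design question; the sketch just picks the last one seen.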

@jbergler
Author

@thaJeztah if, on the off chance, you have some time to take a look at this, it would be much appreciated.
From my debugging, I believe this may have been introduced in a86d276.
