Non-FQDN resolving/conditional forwarding doesn't work properly with two search domains configured #2085

Open
kaechele opened this issue Sep 13, 2024 · 7 comments

Comments

@kaechele

Versions

$ pihole -v
Core
    Version is v5.18.3-457-ga8d305d5 (Latest: null)
    Branch is development-v6
    Hash is a8d305d5 (Latest: a8d305d5)
Web
    Version is v5.21-929-g085c2880 (Latest: null)
    Branch is development-v6
    Hash is 085c2880 (Latest: 085c2880)
FTL
    Version is vDev-e5a24bd (Latest: null)
    Branch is development-v6
    Hash is e5a24bdd (Latest: e5a24bdd)

Platform

  • OS and version: Fedora 40
  • Platform: KVM

Expected behavior

When a client has two search domains configured and Pi-Hole has more than one conditional forwarder, Pi-Hole should answer NXDOMAIN for a name that does not exist in the first domain, instead of marking it as Blocked (external, NXRA) and answering 0.0.0.0 / ::. Without an NXDOMAIN response, the client never tries to resolve the non-FQDN hostname using the second configured search domain.

Actual behavior / bug

Pihole responds A 0.0.0.0 / AAAA :: to the non-FQDN query for host1:

$ host host1
host1.domain1.lan has address 0.0.0.0
host1.domain1.lan has IPv6 address ::

Therefore the client never tries to resolve host1.domain2.lan, which would trigger conditional forwarding to the other internal DNS server, which has a valid entry for host1.domain2.lan and could satisfy the request.

Steps to reproduce

Scenario:

  • Two networks with separate domains and resolvers exist:
    • domain1.lan with 10.0.0.0/24 and resolver 10.0.0.10
    • domain2.lan with 192.168.0.0/24 and resolver 192.168.0.10
  • a Pi-Hole instance is running at 10.0.0.40

The client has the following DNS settings:

$ resolvectl
Link 1 (wlp2s0)
    Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
         Protocols: +DefaultRoute LLMNR=resolve -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.0.0.40
       DNS Servers: 10.0.0.40
        DNS Domain: domain1.lan domain2.lan

Configuration for Pi-Hole under Settings -> DNS -> Conditional Forwarding

true,10.0.0.0/24,10.0.0.10,domain1.lan
true,192.168.0.0/24,192.168.0.10,domain2.lan

Steps to reproduce the behavior:

  1. User tries to query the non-FQDN host host1
  2. The client expands this to host1.domain1.lan due to the search domain setting of domain1.lan domain2.lan
  3. Pi-Hole receives a query for host1.domain1.lan
  4. The first DNS server (10.0.0.10) that receives this request due to conditional forwarding does not have a valid RRSet for this domain
  5. Pi-Hole receives the NXDOMAIN from 10.0.0.10 and decides to block the request as it doesn't allow this request to be forwarded to the internet
  6. The client receives an A 0.0.0.0 / AAAA :: response from Pi-Hole and is satisfied. Had it received an NXDOMAIN response, it would have tried querying host1.domain2.lan, which would have yielded the desired response.
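
The steps above can be sketched as a small simulation of how a client walks its search list. This is a simplified illustration, not how systemd-resolved or any real stub resolver is implemented: `resolve_with_search`, `lookup_null_mode` and `lookup_nxdomain_mode` are hypothetical names, and the two lookup functions simply hard-code the answers described in this report.

```c
#include <stdio.h>
#include <string.h>

/* Outcome of one lookup as the client sees it (illustrative only). */
enum outcome { OUT_NXDOMAIN, OUT_ANSWER };

/* Hypothetical lookup callback: outcome for one fully qualified name. */
typedef enum outcome (*lookup_fn)(const char *fqdn);

/*
 * Try host.<domain> for each search domain in order and stop at the first
 * name that yields an answer. Returns the index of the accepted search
 * domain, or -1 if every candidate ended in NXDOMAIN.
 */
int resolve_with_search(const char *host, const char *const *domains, int n,
                        lookup_fn lookup, char *out, size_t outlen)
{
    for (int i = 0; i < n; i++) {
        snprintf(out, outlen, "%s.%s", host, domains[i]);
        if (lookup(out) == OUT_ANSWER)
            return i;
    }
    if (outlen > 0)
        out[0] = '\0';
    return -1;
}

/* Assumed behaviour in NULL blocking mode, as reported above:
 * host1.domain1.lan is answered with 0.0.0.0 / ::, which the client
 * accepts as a valid answer. */
enum outcome lookup_null_mode(const char *fqdn)
{
    if (strcmp(fqdn, "host1.domain1.lan") == 0)
        return OUT_ANSWER;
    if (strcmp(fqdn, "host1.domain2.lan") == 0)
        return OUT_ANSWER;
    return OUT_NXDOMAIN;
}

/* Expected behaviour: only host1.domain2.lan exists, everything else
 * is NXDOMAIN. */
enum outcome lookup_nxdomain_mode(const char *fqdn)
{
    if (strcmp(fqdn, "host1.domain2.lan") == 0)
        return OUT_ANSWER;
    return OUT_NXDOMAIN;
}
```

With the search list `domain1.lan domain2.lan`, the first lookup function makes the walk stop at index 0 (the bogus 0.0.0.0 answer), while the second lets it continue to index 1, where the real record lives.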

Debug Token


This issue is stale because it has been open 30 days with no activity. Please comment or update this issue or it will be closed in 5 days.

@github-actions github-actions bot added the stale label Oct 13, 2024
@PromoFaux
Member

Do you see the same behaviour if you set the blocking mode to NXDOMAIN rather than NULL?

@kaechele
Author

When I change the blocking mode to NXDOMAIN the behaviour changes to working as intended:

  • I issue host host1 on my client
  • Client automatically appends first configured search domain
  • Client queries PiHole with host1.domain1.lan
  • PiHole sends an NXDOMAIN for host1.domain1.lan
  • Client retries with host1.domain2.lan (the second configured search domain)
  • PiHole forwards this to the DNS server configured in conditional forwarding for this domain
  • Client receives a correct result for host1.domain2.lan from the forwarded server via PiHole

@PromoFaux
Member

Thanks for the update.

@DL6ER any thoughts here?

@DL6ER
Member

DL6ER commented Oct 15, 2024

@kaechele This is not necessarily a setup I can easily reproduce here, but let me start by asking whether this is still an issue with the most recent development-v6. I recall us having fixed something concerning the detection of the external blocked status a few weeks ago; this may have coincided with your issue ticket, which I unfortunately missed myself. I will move this to the right repository.

If it still exists with your previous configuration (which may be the case), please run

sudo pihole-FTL --config debug.queries true

and run host host1 on your client again. The related content in /var/log/pihole/FTL.log should give us a better picture of what is going on here (and hopefully why FTL seems to have detected an upstream blocking attempt with NXRA).

@DL6ER DL6ER transferred this issue from pi-hole/pi-hole Oct 15, 2024
@kaechele
Author

I'm pretty sure the culprit is this:

FTL/src/dnsmasq_interface.c

Lines 2617 to 2626 in 61a211f

// Check if RA bit is unset in DNS header and rcode is NXDOMAIN
// If the response code (rcode) is NXDOMAIN, we may be seeing a response from
// an externally blocked query. As they are not always accompany a necessary
// SOA record, they are not getting added to our cache and, therefore,
// FTL_reply() is never getting called from within the cache routines.
// Hence, we have to store the necessary information about the NXDOMAIN
// reply already here.
if(!(header4 & 0x80) && rcode == NXDOMAIN)
	// RA bit is not set and rcode is NXDOMAIN
	FTL_mark_externally_blocked(id, file, line);

Context

I reverted dns.blocking.mode back to NULL (the default) and set debug.queries to true to capture the following log:

Query Log for host1 (non-FQDN)
2024-10-17 02:33:13.758 UTC [1023M] DEBUG_QUERIES: **** new UDP IPv4 query[A] query "host1.domain1.lan" from eth0/10.0.0.151#58470 (ID 9977176, FTL 84021, src/dnsmasq/forward.c:1815)
2024-10-17 02:33:13.758 UTC [1023M] DEBUG_QUERIES: host1.domain1.lan is not known
2024-10-17 02:33:13.766 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in antigravity (exact): no
2024-10-17 02:33:13.766 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in gravity (exact): no
2024-10-17 02:33:13.767 UTC [1023M] DEBUG_QUERIES: DNS cache: A/10.0.0.151/host1.domain1.lan is not blocked (domainlist ID: -1)
2024-10-17 02:33:13.767 UTC [1023M] DEBUG_QUERIES: **** forwarded host1.domain1.lan to 10.0.0.10#53 (ID 9977176, src/dnsmasq/forward.c:559)
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: **** host1.domain1.lan externally blocked (ID 9977176, FTL 84021, /app/src/dnsmasq/rfc1035.c:797)
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: DNS cache: A/10.0.0.151/host1.domain1.lan is blocked upstream with NXDOMAIN and unset RA bit, expires in 86017s
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: Set reply to NXDOMAIN (2) in src/dnsmasq_interface.c:2731
2024-10-17 02:33:13.769 UTC [1023M] DEBUG_QUERIES: **** got upstream reply from 10.0.0.10#53: host1.domain1.lan is blocked due to upstream response (header) (ID 9977176, src/dnsmasq/rfc1035.c:802)
2024-10-17 02:33:13.770 UTC [1023M] DEBUG_QUERIES: Preparing reply for "host1.domain1.lan", EDE: N/A (-1)
2024-10-17 02:33:13.770 UTC [1023M] DEBUG_QUERIES:   Adding RR: "host1.domain1.lan A 0.0.0.0"
2024-10-17 02:33:13.770 UTC [1023M] DEBUG_QUERIES: **** got cache reply: host1.domain1.lan is 0.0.0.0 (ID 9977176, src/dnsmasq_interface.c:404)
2024-10-17 02:33:13.778 UTC [1023M] DEBUG_QUERIES: **** new UDP IPv4 query[AAAA] query "host1.domain1.lan" from eth0/10.0.0.151#45799 (ID 9977177, FTL 84022, src/dnsmasq/forward.c:1815)
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: host1.domain1.lan is not known
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in antigravity (exact): no
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in gravity (exact): no
2024-10-17 02:33:13.779 UTC [1023M] DEBUG_QUERIES: DNS cache: AAAA/10.0.0.151/host1.domain1.lan is not blocked (domainlist ID: -1)
2024-10-17 02:33:13.780 UTC [1023M] DEBUG_QUERIES: **** forwarded host1.domain1.lan to 10.0.0.10#53 (ID 9977177, src/dnsmasq/forward.c:559)
2024-10-17 02:33:13.781 UTC [1023M] DEBUG_QUERIES: **** host1.domain1.lan externally blocked (ID 9977177, FTL 84022, /app/src/dnsmasq/rfc1035.c:797)
2024-10-17 02:33:13.781 UTC [1023M] DEBUG_QUERIES: DNS cache: AAAA/10.0.0.151/host1.domain1.lan is blocked upstream with NXDOMAIN and unset RA bit, expires in 86017s
2024-10-17 02:33:13.781 UTC [1023M] DEBUG_QUERIES: Set reply to NXDOMAIN (2) in src/dnsmasq_interface.c:2731
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: **** got upstream reply from 10.0.0.10#53: host1.domain1.lan is blocked due to upstream response (header) (ID 9977177, src/dnsmasq/rfc1035.c:802)
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: Preparing reply for "host1.domain1.lan", EDE: N/A (-1)
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES:   Adding RR: "host1.domain1.lan AAAA ::"
2024-10-17 02:33:13.782 UTC [1023M] DEBUG_QUERIES: **** got cache reply: host1.domain1.lan is :: (ID 9977177, src/dnsmasq_interface.c:439)
2024-10-17 02:33:13.787 UTC [1023M] DEBUG_QUERIES: **** new UDP IPv4 query[MX] query "host1.domain1.lan" from eth0/10.0.0.151#44066 (ID 9977178, FTL 84023, src/dnsmasq/forward.c:1815)
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: host1.domain1.lan is not known
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in antigravity (exact): no
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: Checking if "host1.domain1.lan" is in gravity (exact): no
2024-10-17 02:33:13.788 UTC [1023M] DEBUG_QUERIES: DNS cache: MX/10.0.0.151/host1.domain1.lan is not blocked (domainlist ID: -1)
2024-10-17 02:33:13.789 UTC [1023M] DEBUG_QUERIES: **** forwarded host1.domain1.lan to 10.0.0.10#53 (ID 9977178, src/dnsmasq/forward.c:559)
2024-10-17 02:33:13.790 UTC [1023M] DEBUG_QUERIES: **** host1.domain1.lan externally blocked (ID 9977178, FTL 84023, /app/src/dnsmasq/rfc1035.c:797)
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: DNS cache: MX/10.0.0.151/host1.domain1.lan is blocked upstream with NXDOMAIN and unset RA bit, expires in 86017s
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: Set reply to NXDOMAIN (2) in src/dnsmasq_interface.c:2731
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: **** got upstream reply from 10.0.0.10#53: host1.domain1.lan is blocked due to upstream response (header) (ID 9977178, src/dnsmasq/rfc1035.c:802)
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: Preparing reply for "host1.domain1.lan", EDE: N/A (-1)
2024-10-17 02:33:13.791 UTC [1023M] DEBUG_QUERIES: **** got cache reply: host1.domain1.lan is (NODATA) (ID 9977178, src/dnsmasq_interface.c:457)

My read of what's happening here:

  • 10.0.0.10, the upstream server set in the conditional forwarding for domain1.lan, is a PowerDNS Authoritative DNS server.
  • It receives DDNS updates from the Kea DHCP server responsible for the LAN with the domain domain1.lan, so it can answer both A queries for domain1.lan and PTR queries for 0.0.10.in-addr.arpa.
  • The PowerDNS Authoritative server does not do recursion, hence the unset RA bit.
  • Pi-Hole interprets an NXDOMAIN with unset RA bit as the name being blocked upstream, is satisfied with that and considers the domain blocked for itself as well.
  • However (and this gets lost here), the PowerDNS server set the AA bit because it is authoritative for domain1.lan. It knows for sure this name doesn't exist.

In this case the upstream server behaves correctly, because it doesn't have an entry for this host but it also cannot do recursion. It also doesn't need to, because it is authoritative for domain1.lan.

This is what the query looks like towards 10.0.0.10 using dig:

; <<>> DiG 9.18.28 <<>> host1.domain1.lan @10.0.0.10
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 50978
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;host1.domain1.lan.	IN	A

;; AUTHORITY SECTION:
domain1.lan.	3600	IN	SOA	dns.domain1.lan. hostmaster.domain1.lan. 2024101608 10800 3600 604800 3600

;; Query time: 22 msec
;; SERVER: 10.0.0.10#53(10.0.0.10) (UDP)
;; WHEN: Wed Oct 16 22:53:51 EDT 2024
;; MSG SIZE  rcvd: 111

I believe the root cause here is that PiHole should consider a domain blocked upstream only if neither the RA bit nor the AA bit is set. If the AA bit is set, PiHole should treat any NXDOMAIN response as authoritatively non-existent rather than blocked.
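
A minimal sketch of that proposed check, written as a standalone function rather than as a patch to FTL. The function name `looks_externally_blocked` and the macro names are made up for illustration and are not FTL or dnsmasq identifiers; the bit positions are the standard ones from the DNS header layout (AA in the third header byte, RA and the rcode in the fourth).

```c
#include <stdbool.h>
#include <stdint.h>

/* DNS header flag masks (RFC 1035, section 4.1.1). Names are illustrative. */
#define FLAG_AA        0x04u /* third header byte: Authoritative Answer */
#define FLAG_RA        0x80u /* fourth header byte: Recursion Available */
#define RCODE_MASK     0x0Fu /* fourth header byte: response code */
#define RCODE_NXDOMAIN 3u

/*
 * Hypothetical helper: treat a reply as "blocked upstream" only when it is
 * an NXDOMAIN with BOTH the RA and the AA bit unset. An authoritative
 * NXDOMAIN (AA set) is taken at face value: the name does not exist.
 */
bool looks_externally_blocked(uint8_t hb3, uint8_t hb4)
{
    const bool nxdomain = (hb4 & RCODE_MASK) == RCODE_NXDOMAIN;
    const bool ra = (hb4 & FLAG_RA) != 0;
    const bool aa = (hb3 & FLAG_AA) != 0;
    return nxdomain && !ra && !aa;
}
```

For the PowerDNS reply shown above (flags `qr aa rd`, status NXDOMAIN) the header bytes are 0x85 and 0x03, so the function returns false. Whether the extra AA test is the right fix for FTL is for the maintainers to decide; this only shows the logic being suggested.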

For comparison, here is a response from 9.9.9.9 for a known Malware domain that this server blocks:

; <<>> DiG 9.18.28 <<>> 1312services.ru @9.9.9.9
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 29040
;; flags: qr rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;1312services.ru.		IN	A

;; Query time: 29 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (UDP)
;; WHEN: Wed Oct 16 23:00:36 EDT 2024
;; MSG SIZE  rcvd: 44

No RA bit but also no AA bit. It's probably fine to continue considering this type of response as "blocked externally".
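
To make the comparison between the two replies concrete, here is a small sketch that turns the two flag bytes of a DNS header into the flag string dig prints. `format_dns_flags` is a hypothetical helper written for this comment, not part of dig, dnsmasq or FTL; the masks follow the standard header layout.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Write the set header flags into buf in dig's order (qr aa tc rd ra ad cd),
 * separated by spaces. hb3/hb4 are the third and fourth header bytes.
 * Returns the number of characters written (excluding the terminator).
 */
size_t format_dns_flags(uint8_t hb3, uint8_t hb4, char *buf, size_t len)
{
    static const struct { int byte; uint8_t mask; const char *name; } flags[] = {
        {3, 0x80, "qr"}, {3, 0x04, "aa"}, {3, 0x02, "tc"}, {3, 0x01, "rd"},
        {4, 0x80, "ra"}, {4, 0x20, "ad"}, {4, 0x10, "cd"},
    };
    size_t used = 0;

    if (len == 0)
        return 0;
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        const uint8_t value = (flags[i].byte == 3) ? hb3 : hb4;
        if (!(value & flags[i].mask))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? " " : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* buffer too small, stop with what fits */
        used += (size_t)n;
    }
    return used;
}
```

The PowerDNS reply corresponds to bytes 0x85 / 0x03 and formats as `qr aa rd`; the Quad9 reply corresponds to 0x81 / 0x23 and formats as `qr rd ad`. Both lack `ra`, and only the first carries `aa`.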

@DL6ER
Member

DL6ER commented Oct 17, 2024

Thank you, this is about what I was assuming. Also thank you very much for the proposed fix already :-)

I will review/verify this after returning from work today (it's still earlyish morning on this side of the planet)
