Nslookup does not work in latest busybox image #48

Closed
krishshenoy opened this issue Jul 17, 2018 · 47 comments
Labels
question Usability question, not directly related to an error with the image

Comments

@krishshenoy

I deployed a Kubernetes pod using the latest version of the busybox image.
After the pod was successfully deployed, I tried to run:
kubectl exec busybox nslookup kubernetes.default

The nslookup command no longer works.

shenoyk-m01:image-pipeline shenoyk$ kubectl exec busybox nslookup kubernetes.default
Server: 10.0.0.10
Address: 10.0.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

The same command works when the image is pinned to busybox:1.28. nslookup started failing with the latest version.

busybox.yaml is below.

apiVersion: v1
kind: Pod
metadata:
  name: busybox
  namespace: default
spec:
  containers:
  - image: busybox
    command:
    - sleep
    - "3600"
    imagePullPolicy: Always
    name: busybox
  restartPolicy: Always

@wglambert

Seems to be a kubernetes configuration issue

Not able to reproduce the issue through Docker standalone

$ docker run --rm -dit --name busybox busybox:latest
$ docker exec -it busybox sh

# ping google.com
PING google.com (172.217.11.174): 56 data bytes
64 bytes from 172.217.11.174: seq=0 ttl=53 time=14.993 ms
64 bytes from 172.217.11.174: seq=1 ttl=53 time=14.598 ms
64 bytes from 172.217.11.174: seq=2 ttl=53 time=14.039 ms
^C
# nslookup github.com
Server:    8.8.8.8
Address 1: 8.8.8.8 google-public-dns-a.google.com

Name:      github.com
Address 1: 192.30.255.112 lb-192-30-255-112-sea.github.com
Address 2: 192.30.255.113 lb-192-30-255-113-sea.github.com
# nslookup google.com
Server:    8.8.8.8
Address 1: 8.8.8.8 google-public-dns-a.google.com

Name:      google.com
Address 1: 2607:f8b0:4007:804::200e lax28s15-in-x0e.1e100.net
Address 2: 216.58.219.14 lax17s03-in-f14.1e100.net

Kubernetes with hostNetwork: true

$ kubectl exec busybox-7cc555b5d6-2mmcr ping google.com
PING google.com (172.217.11.174): 56 data bytes
64 bytes from 172.217.11.174: seq=0 ttl=54 time=13.444 ms
64 bytes from 172.217.11.174: seq=1 ttl=54 time=14.249 ms
64 bytes from 172.217.11.174: seq=2 ttl=54 time=20.149 ms
^C

$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup google.com 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8:53

Non-authoritative answer:
Name:   google.com
Address: 172.217.11.174

*** Can't find google.com: No answer

$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup kubernetes.default 8.8.8.8
Server:         8.8.8.8
Address:        8.8.8.8:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

$ kubectl exec busybox-7cc555b5d6-2mmcr nslookup kubernetes.default
Server:         127.0.0.53
Address:        127.0.0.53:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

This seems to be the most relevant issue I found kubernetes/kubernetes#33798

@wglambert wglambert added the question Usability question, not directly related to an error with the image label Jul 17, 2018
@tianon
Member

tianon commented Jul 17, 2018

This reminds me of the fun we had back in #9, but that doesn't seem related. 😞

@krishshenoy
Author

I have a Kubernetes cluster monitoring test that continually deploys a busybox pod in a cluster and verifies DNS resolution within the pod by running nslookup through kubectl exec. It started failing as soon as I pulled the latest busybox image. With a busybox pod on the previous version of the image, 1.28, nslookup works. All signs point to a change in the latest version that is causing the failure.

@tianon
Member

tianon commented Jul 17, 2018

Unfortunately, that only narrows it down to somewhere in the sea of 438 files changed, 9453 insertions(+), 4480 deletions(-) (from 1_28_4 to 1_29_1 in the Git tags of the two releases).

@tianon
Member

tianon commented Jul 17, 2018

Something in here seems most likely:

$ git log --oneline 1_28_4...1_29_1 -- networking/nslookup.c
2f7738e47 nslookup: placate "warning: unused variable i"
c72499584 nslookup: simplify make_ptr
71e4b3f48 nslookup: get rid of query::rlen field
58e43a4c4 nslookup: move array of queries to "globals"
4b6091f92 nslookup: accept lowercase -type=soa, document query types
6cdc3195a nslookup: change -stats to -debug (it's a bug in bind that it accepts -s)
d4461ef9f nslookup: rework option parsing
a980109c6 nslookup: smaller qtypes[] array
2cf75b3c8 nslookup: process replies immediately, do not store them
4e73c0f65 nslookup: fix output corruption for "nslookup 1.2.3.4"
cf950cd3e nslookup: more closely resemble output format of bind-utils-9.11.3
71e016d80 nslookup: shrink send_queries()
db93b21ec nslookup: use xmalloc_sockaddr2dotted() instead of homegrown function
55bc8e882 nslookup: usee bbox network functions instead of opne-coded mess
0dd3be8c0 nslookup: add openwrt / lede version

@djsly

djsly commented Jul 18, 2018

/label bug
We are having the same issue.
1.27 and 1.28 work; 1.29 and 1.29.1 do not.

kubectl run --attach busybox --rm --image=busybox:1.27 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.

Server:    192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 192.168.0.1 kubernetes.default.svc.cluster.local

kubectl run --attach busybox --rm --image=busybox:1.28 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.

Server:    192.168.0.10
Address 1: 192.168.0.10 kube-dns.kube-system.svc.cluster.local

Name:      kubernetes.default
Address 1: 192.168.0.1 kubernetes.default.svc.cluster.local

kubectl run --attach busybox --rm --image=busybox:1.29 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.

Server:         192.168.0.10
Address:        192.168.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

kubectl run --attach busybox --rm --image=busybox:1.29.1 --restart=Never -- sh -c "sleep 4 && nslookup kubernetes.default"
If you don't see a command prompt, try pressing enter.


Server:         192.168.0.10
Address:        192.168.0.10:53

** server can't find kubernetes.default: NXDOMAIN

*** Can't find kubernetes.default: No answer

@tianon
Member

tianon commented Jul 18, 2018 via email

@djsly

djsly commented Jul 18, 2018

@tokiwinter

Same issue here. Reverting to 1.28 fixed the issue for me.

@tianon
Member

tianon commented Jul 20, 2018

How does this relate to #27? Are they the same issue?

@tianon
Member

tianon commented Jul 20, 2018

From what I can tell, the new resolver in BusyBox's nslookup doesn't support DNS search domains at all, which seems like a pretty hefty regression.
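
For anyone unfamiliar with search domains, here is a rough sketch of the expansion a resolver is normally expected to perform on a short name such as kubernetes.default. It is not BusyBox code: `expand_search` is a made-up helper, and the ndots value of 5 and the three search suffixes are assumed values typical of a Kubernetes pod, passed as arguments rather than read from /etc/resolv.conf. The ordering (search suffixes first when the name has fewer dots than ndots) follows the resolv.conf man page.

```shell
#!/bin/sh
# expand_search NAME NDOTS SUFFIX...
# Print, one per line, the candidate names a resolver would try for NAME.
# Hypothetical helper for illustration only; not part of BusyBox.
expand_search() {
    name=$1
    ndots=$2
    shift 2

    # A trailing dot marks the name as fully qualified: no expansion.
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac

    # Count the dots in the name.
    only_dots=$(printf '%s' "$name" | tr -cd '.')
    dots=${#only_dots}

    # With at least ndots dots, the bare name is tried first.
    if [ "$dots" -ge "$ndots" ]; then
        printf '%s\n' "$name"
    fi

    # Then each search suffix is appended in order.
    for suffix in "$@"; do
        printf '%s.%s\n' "$name" "$suffix"
    done

    # With fewer than ndots dots, the bare name is tried last.
    if [ "$dots" -lt "$ndots" ]; then
        printf '%s\n' "$name"
    fi
}

# Assumed values: ndots:5 and a typical pod search list.
expand_search kubernetes.default 5 \
    default.svc.cluster.local svc.cluster.local cluster.local
```

With those assumed values, the second candidate printed is kubernetes.default.svc.cluster.local, which is the name that resolves in the working 1.27/1.28 output earlier in this thread. A resolver that skips this step only ever asks for the bare kubernetes.default and gets NXDOMAIN.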

@krishshenoy
Author

Thanks tianon. How will this be addressed?

@tianon
Member

tianon commented Jul 21, 2018 via email

@hickeng

hickeng commented Jul 26, 2018

@djsly Try using "sleep 4 && nslookup -type=a kubernetes.default"

I've added my findings here: https://bugs.busybox.net/show_bug.cgi?id=11161#c4

@piersharding

As a suggestion, would it be possible to revert the :latest tag to point to 1.28.x until this is resolved upstream?

@cparjaszewski

See this issue:
kubernetes/kubernetes#66924

@tianon
Member

tianon commented Aug 9, 2018

Given that the upstream change was intentional and is a reflection of upstream, I'm not comfortable changing latest back to 1.28 (especially given that 1.29 is considered "stable" by upstream) -- I'd recommend instead pinning usage to busybox:1.28 (or more specifically, busybox:1.28-variant) for now until the updated functionality which resolves this issue is implemented upstream. (Pinning to a particular release or release series of dependencies is generally good advice anyhow, and it looks like Busybox upstream might intend to get more aggressive about changes in the future, so it seems more prudent than ever.)
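
As a rough illustration of pinning, the sketch below rewrites untagged busybox image references in manifest text to busybox:1.28. The name `pin_busybox` is made up for this example, and the sed expression only handles the simple `image: busybox` form at the end of a line; references that already carry a tag are left untouched.

```shell
#!/bin/sh
# pin_busybox: read manifest text on stdin and write it to stdout with
# untagged "image: busybox" references pinned to busybox:1.28.
# Hypothetical helper; it only matches a reference at the end of a line.
pin_busybox() {
    sed -E 's/(image:[[:space:]]*busybox)[[:space:]]*$/\1:1.28/'
}

# Example input: one untagged reference and one already-tagged reference.
printf '%s\n' \
    '  - image: busybox' \
    '  - image: busybox:1.29' | pin_busybox
```

Only the first line of the example input is rewritten; the second already names a release and passes through unchanged.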

@cparjaszewski

cparjaszewski commented Aug 9, 2018

For some people it’s still difficult to admit a mistake. Being aggressive and brave with new changes is one thing; breaking stuff that worked before is another, especially these days when a lot of people use “:latest” by default. Introducing a backward-compatibility break and calling it intentional is far from wise.

Please read more about semantic versioning as well.

@piersharding

Hi @tianon - I can understand that you don't want to have a regression on :latest, but there is a surprising amount of fallout from this simple issue because so many people and documentation out there use busybox:latest as the "Hello, World" example. Temporarily changing the tag would help mitigate that pain and these unintended consequences.

Cheers,
Piers.

@xingchijin

Tried 1.33; the issue is still there.

@Piotr1215

Same happens on 1.33.1. For comparison, the image gcr.io/kubernetes-e2e-test-images/dnsutils:1.3, used in the Kubernetes documentation's DNS troubleshooting example, still works as expected, as mentioned by @blodone.

Any chance on getting it fixed soon?

@yosifkit
Member

Any chance on getting it fixed soon?

#48 (comment):

It needs to be addressed upstream -- we simply package what they provide.

@ChristopherHanson

Same happens on 1.33.1. For comparison, the image gcr.io/kubernetes-e2e-test-images/dnsutils:1.3, used in the Kubernetes documentation's DNS troubleshooting example, still works as expected, as mentioned by @blodone.

Any chance on getting it fixed soon?

Just use 1.27; nslookup in that version has always worked.


guettli added a commit to guettli/website that referenced this issue Mar 2, 2022
Changes were made with these commands:

reprec 'image: busybox(?!:)' 'image: busybox:1.28' */docs */examples
reprec -- '--image=busybox(?!:)' '--image=busybox:1.28' */docs */examples

Related issues:

 docker-library/busybox#48
 kubernetes/kubernetes#66924
@gaganyaan2

gaganyaan2 commented Mar 7, 2022

kubernetes/kubernetes#66924 (comment)

It succeeds very infrequently. I ran nslookup in a while loop about 200 times with a 2-second timeout, setting ndots:5, ndots:7, and ndots:10 in /etc/resolv.conf. The results are below.

  • ndots:5 = nslookup succeeded 39 of 200 times
  • ndots:7 = nslookup succeeded 22 of 200 times
  • ndots:10 = nslookup succeeded 16 of 200 times

Below is the shell script I used to collect these results.

# Write the test loop to a script file.
echo '#!/bin/sh
while true; do
	nslookup -timeout=2 kubernetes > /dev/null 2>&1
	result=$?
	# Log a timestamp, the exit status, and pass/fail for each attempt.
	if [ "$result" -eq 0 ]; then
		echo "$(date +%s) : $result : pass" >> /tmp/nslookup_status
	else
		echo "$(date +%s) : $result : fail" >> /tmp/nslookup_status
	fi
done' > nslookup_status.sh

chmod +x nslookup_status.sh
./nslookup_status.sh &
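
To turn the log into counts like the ones quoted above, something along these lines could be used. This is a sketch, not the exact command behind the numbers in this comment: `tally_status` is a made-up name, and it assumes each log line ends in ": pass" or ": fail" as written by the script.

```shell
#!/bin/sh
# tally_status FILE: print "passes/total" for a log in which every line
# ends in ": pass" or ": fail".
# Hypothetical helper for illustration.
tally_status() {
    awk '/: pass$/ { pass++ } END { printf "%d/%d\n", pass, NR }' "$1"
}

# Usage, assuming the log path used by the script:
# tally_status /tmp/nslookup_status
```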

busybox-pod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: "busybox1"
spec:
  containers:
  - image: busybox
    name: busybox
    command: [ "sleep","6000"]
  dnsConfig:
    options:
      - name: ndots
        value: "7"

busybox Image hash : busybox:latest@sha256:34c3559bbdedefd67195e766e38cfbb0fcabff4241dbee3f390fd6e3310f5ebc

@guettli

guettli commented Mar 16, 2022

Just for the record, I opened a new issue in the busybox bug tracker: https://bugs.busybox.net/show_bug.cgi?id=14671

@astraw99

astraw99 commented Oct 4, 2022

Encountered the same issue in busybox:1.35.
Is anyone working on getting this resolved?

@tianon tianon pinned this issue Oct 18, 2022
xcompass added a commit to ubc/charts that referenced this issue Dec 13, 2022
There is a bug related to nslookup in busybox
(docker-library/busybox#48). nslookup doesn't
return 0 when one of the hostname + search-suffix combinations fails to
resolve. The suffixes are listed in /etc/resolv.conf, e.g.
default.svc.cluster.local svc.cluster.local cluster.local

Also, checking DNS doesn't mean the service is up. wait4x will try
to make a connection and validate that the service is up and running.
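
One alternative to relying on the exit status described above is to inspect the output instead. This is not what the commit does (it switches to wait4x); it is only a sketch. `has_answer` is a made-up helper, and the "Name:" line format is assumed from the transcripts in this thread, so it may differ between BusyBox versions.

```shell
#!/bin/sh
# has_answer: read nslookup output on stdin and succeed if it contains at
# least one "Name:" record, regardless of the exit status of nslookup.
# Hypothetical helper; the output format is assumed from this thread.
has_answer() {
    grep -q '^Name:'
}

# Usage (needs a pod with DNS access, so it is left commented out):
# nslookup kubernetes.default 2>&1 | has_answer && echo resolved
```

As the commit message notes, a successful lookup still says nothing about whether the service behind the name is accepting connections.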
@lhzw

lhzw commented May 17, 2023

1.34.1 is unreliable: it fails most of the time and only occasionally works:

# ./dnstest.sh
dnstest
/ #
/ #
/ # nslookup es
Server:         169.254.25.10
Address:        169.254.25.10:53

** server can't find es.default.svc.cluster.local: NXDOMAIN

*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer
*** Can't find es.default.svc.cluster.local: No answer
*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer

/ # ping es
PING es (10.233.36.216): 56 data bytes
64 bytes from 10.233.36.216: seq=0 ttl=64 time=0.069 ms
64 bytes from 10.233.36.216: seq=1 ttl=64 time=0.108 ms
^C
--- es ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.069/0.088/0.108 ms
/ #

1.34.1 worked once in my tests:

/ # busybox | head -1
BusyBox v1.34.1 (2021-12-29 21:12:15 UTC) multi-call binary.
/ # nslookup es
Server:         169.254.25.10
Address:        169.254.25.10:53

Name:   es.default.svc.cluster.local
Address: 10.233.36.216

*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer
*** Can't find es.default.svc.cluster.local: No answer
*** Can't find es.svc.cluster.local: No answer
*** Can't find es.cluster.local: No answer

/ # ping es
PING es (10.233.36.216): 56 data bytes
64 bytes from 10.233.36.216: seq=0 ttl=64 time=0.117 ms
64 bytes from 10.233.36.216: seq=1 ttl=64 time=0.117 ms
^C
--- es ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.117/0.117/0.117 ms
/ #

1.36.0 works fine:

/ # busybox | head -1
BusyBox v1.36.0 (2023-05-11 16:48:06 UTC) multi-call binary.
/ # nslookup es
Server:         169.254.25.10
Address:        169.254.25.10:53

** server can't find es.svc.cluster.local: NXDOMAIN

Name:   es.default.svc.cluster.local
Address: 10.233.36.216

** server can't find es.svc.cluster.local: NXDOMAIN


** server can't find es.cluster.local: NXDOMAIN

** server can't find es.cluster.local: NXDOMAIN

/ # ping es
PING es (10.233.36.216): 56 data bytes
64 bytes from 10.233.36.216: seq=0 ttl=64 time=0.078 ms
64 bytes from 10.233.36.216: seq=1 ttl=64 time=0.128 ms
^C
--- es ping statistics ---
2 packets transmitted, 2 packets received, 0% packet loss
round-trip min/avg/max = 0.078/0.103/0.128 ms
/ #
docker images | grep busy
busybox                                                1.36.0     af2c3e96bcf1   5 days ago      4.86MB
busybox                                                1.34.1     beae173ccac6   16 months ago   1.24MB
busybox                                                latest     beae173ccac6   16 months ago   1.24MB

@BSWANG

BSWANG commented Mar 11, 2024

Same issue on BusyBox v1.36.1. nslookup fails, but wget can resolve the short name kubernetes.default.

/ #  busybox | head -1
BusyBox v1.36.1 (2023-05-18 22:34:17 UTC) multi-call binary.
/ # nslookup kubernetes.default
Server:		172.16.0.10
Address:	172.16.0.10:53

** server can't find kubernetes.default: NXDOMAIN

** server can't find kubernetes.default: NXDOMAIN

/ # wget -O- kubernetes.default
Connecting to kubernetes.default (172.16.0.1:80)

@LaurentGoderre
Member

@BSWANG nslookup doesn't use the hosts file to resolve names; it only uses the DNS server.

@zhangguanzhang

zhangguanzhang commented Mar 19, 2024

Any update? I decided to use alpine:

$ getent ahostsv4 kubernetes.default | awk '/STREAM/ {print $1;exit; }'
10.96.0.1
$ getent ahostsv4 kube-dns.kube-system | awk '/STREAM/ {print $1;exit; }'
10.96.0.10

@tianon
Member

tianon commented Mar 19, 2024

Given that this issue is an upstream issue (not something we've introduced), that it is appropriately filed at https://bugs.busybox.net/show_bug.cgi?id=11161, and apparently will be fixed in the next release (https://git.busybox.net/busybox/commit/?id=9408978a438ac6c3becb2216d663216d27b59eab), I'm going to close.

It would appear that Kubernetes has adjusted to use busybox:1.28 explicitly in the meantime (kubernetes/website#9901), which is the simplest workaround for folks affected by this upstream change.


@TheDevilDan

I have a workaround for now, if you want nslookup in your pods:

Add the bind-tools package:

    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "apk add bind-tools"]

Or add the package to the Docker image, and it works.

I use the latest 8.1-fpm-alpine image. Without bind-tools, short names are not expanded to full domains and the lookup exits with code 1:

MyServer# kubectl exec -n mynamespace -it AlpineBindUtils -- nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

MyServer# kubectl exec -n mynamespace -it AlpineWithoutBindUtils -- nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find kubernetes.default: NXDOMAIN

** server can't find kubernetes.default: NXDOMAIN

command terminated with exit code 1

@der-ali

der-ali commented Jul 12, 2024

I was really racking my brain over why PQDNs (e.g. kubernetes.default) weren't being resolved by nslookup in my debugging pod.
