Docker-in-docker: Add retry mechanism into the docker init script (Failed to connect to Docker) #634
For the last two weeks we have been experiencing the same issue. Running the script recovers the docker instance.
I wanted to add that I've also been experiencing this issue much more frequently than normal.
Thanks for reporting! Opened #637 which adds retry logic to address the issue ^
@samruddhikhandale We have just tried version 2.3.0 of Docker in Docker, but we are still observing the issue with the Docker daemon.
@AndriiTsok Hmm, in the creation logs, do you see this log message ^? I would like to validate whether the retry is actually happening. I wonder if it's conflicting with (or running in parallel with) the
@samruddhikhandale Thank you for your fast reply! We created more than 30 codespaces during the last few hours to try to reproduce the issue; at the moment we are not able to reproduce it. Today, we made sure that the prebuilt image was not used and that the codespaces were created from new branches. We also explicitly set 2.3.0 for the D-in-D feature. We will keep an eye on the logs and stability and will create another issue if it becomes reproducible.
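(For reference, pinning the Feature to an explicit release as described above is done via the version tag in the feature ID in devcontainer.json. A minimal sketch, assuming the standard feature ID on ghcr.io; the base image here is only a placeholder:)

```json
{
  "image": "mcr.microsoft.com/devcontainers/base:ubuntu",
  "features": {
    "ghcr.io/devcontainers/features/docker-in-docker:2.3.0": {}
  }
}
```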
@samruddhikhandale We have just managed to catch the error again. Here is the creation log: https://gist.github.com/AndriiTsok/1a62138fca79da47cb8d90db1b87ca9f
"Failed to start docker" line: https://gist.github.com/AndriiTsok/1a62138fca79da47cb8d90db1b87ca9f#file-gistfile1-txt-L284
Line with the Docker daemon error: https://gist.github.com/AndriiTsok/1a62138fca79da47cb8d90db1b87ca9f#file-gistfile1-txt-L307

We are simply trying to list the existing clusters and remove them when rebuilding the codespaces:

```bash
# Get the list of all existing clusters
clusters=$(k3d cluster list -o json | jq -r '.[].name')

# Iterate over each cluster and delete it
for cluster in $clusters; do
  echo "Deleting cluster $cluster"
  k3d cluster delete "$cluster"
done

# Create a new cluster with the given configuration
k3d cluster create --config .devcontainer/k3d/config.yaml --kubeconfig-update-default
```

It fails trying to get the clusters.

P.S.
@samruddhikhandale I can confirm the issue is still occurring randomly; I would say it is around 50/50, and usually a rebuild helps to restore the codespace. One of the observed log excerpts:

```
[161525 ms] Start: Run: docker run --sig-proxy=false -a STDOUT -a STDERR --mount type=bind,src=/var/lib/docker/codespacemount/workspace,dst=/workspaces --mount type=volume,src=dind-var-lib-docker-0hcbhh2c7vldoj773drm1bjldnb89u0rt96sl9nju22d9ou8d14n,dst=/var/lib/docker --mount type=volume,src=minikube-config,dst=/home/vscode/.minikube --mount source=/root/.codespaces/shared,target=/workspaces/.codespaces/shared,type=bind --mount source=/var/lib/docker/codespacemount/.persistedshare,target=/workspaces/.codespaces/.persistedshare,type=bind --mount source=/.codespaces/agent/mount,target=/.codespaces/bin,type=bind --mount source=/mnt/containerTmp,target=/tmp,type=bind --mount type=bind,src=/.codespaces/agent/mount/cache,dst=/vscode -l Type=codespaces -e CODESPACES=******** -e ContainerVersion=13 -e RepositoryName=Monorepo --label ContainerVersion=13 --hostname codespaces-86fa16 --add-host codespaces-86fa16:127.0.0.1 --cap-add sys_nice --network host --privileged --entrypoint /bin/sh vsc-monorepo-9081da56175f3b1ac597257c0566d7ce76b18fbc1a048e05bdbd04f7efb0dfca-features -c echo Container started
Container started
sed: couldn't flush stdout: Device or resource busy
Outcome: success User: node WorkspaceFolder: /workspaces/Monorepo
devcontainer process exited with exit code 0
Running blocking commands... sed: couldn't flush stdout: Device or resource busy
```
@AndriiTsok Thanks for the update.
Looks like the retry mechanism is triggered; we retry five times until the docker daemon starts. From the logs, we can only see one such log statement, hence I can think of two things which might be happening:
@AndriiTsok Would it be possible to provide a sample repro (i.e. a sample dev container config)? I'd like to experiment with a few things. If not, no worries, I can play around by adding docker commands within onCreateCommand. @AndriiTsok In the meantime, would it be possible to add similar retry logic (which starts the docker daemon) to your onCreateCommand script? Let me know if this works!
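A minimal sketch of what such retry logic at the top of an onCreateCommand script could look like. This is illustrative only, not the Feature's actual implementation; the retry count, sleep interval, and the use of /usr/local/share/docker-init.sh (the init script mentioned elsewhere in this thread) are assumptions:

```bash
#!/bin/bash
# Sketch of a retry guard at the top of an onCreateCommand script.
# Assumes /usr/local/share/docker-init.sh is what starts the daemon;
# the retry count and sleep interval are arbitrary.

for attempt in 1 2 3 4 5; do
    if docker info > /dev/null 2>&1; then
        echo "Docker daemon is up (attempt $attempt)"
        break
    fi
    echo "Docker daemon not responding, starting it (attempt $attempt)..."
    /usr/local/share/docker-init.sh || true
    sleep 3
done

# Give up loudly if the daemon still is not reachable.
docker info > /dev/null 2>&1 || { echo "Docker daemon failed to start" >&2; exit 1; }

# ... the k3d cleanup/creation commands would follow here ...
```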
We have experienced this issue consistently over the last 4 days. It happens during the codespaces prebuild. I see the line
I wonder if adding retry logic is somehow breaking the codespace prebuilds. Thanks @tom-growthbox for reporting the issue and providing a temporary solution. @tom-growthbox Would it be possible to provide a sample repro (i.e. dev container config)? It would help me investigate and fix the underlying issue, thanks!
I created a sample repo with a configuration similar to ours. However, the prebuild doesn't fail on this one. I would need to spend some time on it to reproduce the issue.
@samruddhikhandale I created a repro container as well: https://github.com/TRYON-Technology/Docker-in-Docker-Issue
Same error here; it took 15 new codespaces instances to finally trigger and catch it. Version used: Published
Discussion and context: https://github.com/orgs/community/discussions/63776
@AndriiTsok Unfortunately, I don't see any dev container files added to https://github.com/TRYON-Technology/Docker-in-Docker-Issue. Am I missing something? 🤔
Hi @mandrasch 👋
In your dev container, docker is added by the
Hence, adding the
The prod image was built with Feature version 2.2.1 (released on Aug 3rd), which does not contain the retry logic. I'll work on releasing a new
Let me know if that makes sense.
Opened devcontainers/images#705. In the meantime, @mandrasch, can you use the dev image? ( Also, can you remove the
The image is live now!
Hi @samruddhikhandale! Thanks so much for explaining this! 🙏 🙏 I have now removed the
Since the bug occurred only 1 time out of 15, I can't really say whether it really fixes the problem now. I'll post here again if it happens again, but hopefully that won't be the case. 👍 Question regarding this: Is there a way to check which
@mandrasch One more thing, the
Unfortunately, I don't think there's a direct way to find out the Feature version.
Hi @samruddhikhandale I have just re-pushed the container files to https://github.com/TRYON-Technology/Docker-in-Docker-Issue. I also added an error.log showing the issue:
@samruddhikhandale Thanks so much for the detailed technical background information, very helpful! 🙏
If anyone needs a quick fix: I've gotten into the habit of doing this recently, and it has been working well.

```bash
#!/bin/bash
echo "Waiting for Docker to start"

# Wait for Docker to start up
while [ ! -S /var/run/docker.sock ]; do
  echo -n "."
  /usr/local/share/docker-init.sh
  sleep 1
done

echo "Docker is running!"
```

```json
{
  "postCreateCommand": "bash .devcontainer/start-docker.sh"
}
```
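One caveat worth noting about this workaround: the existence of /var/run/docker.sock does not guarantee that the daemon is already answering requests, so a stricter readiness check is to poll a command such as `docker info` (as in the sketch earlier in this thread) and only continue once it succeeds.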
@bherbruck I ran the
@darmalovan I get that same output if I run
Important note: Opened devcontainers/spec#299, which requests new semantics for "blocking" entrypoints that the CLI waits for. This way we can ensure that docker is already up and running for the mentioned ^ lifecycle scripts and is available in the container. Closing in favor of #671. Feel free to reopen if needed, or comment on #671 if you still run into docker-not-running issues. Thank you!
Sometimes docker fails to start within the container with the following error 👇

As manually running `/usr/local/share/docker-init.sh` fixes this issue, add a retry mechanism to the docker-init script.
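For illustration only, the kind of retry wrapper the issue asks for might look roughly like the sketch below. The start_dockerd function is a hypothetical stand-in for whatever the real docker-init.sh does to launch the daemon, and the retry count and sleep interval are arbitrary; the actual fix landed in #637 and may differ.

```bash
#!/bin/bash
# Hypothetical sketch only: retry starting the Docker daemon until it responds.
# start_dockerd stands in for whatever the real docker-init.sh does.
start_dockerd() {
    dockerd > /var/log/dockerd.log 2>&1 &
}

retries=5
until docker info > /dev/null 2>&1; do
    if [ "$retries" -le 0 ]; then
        echo "Docker daemon failed to start after multiple attempts" >&2
        exit 1
    fi
    echo "Docker daemon not ready, (re)starting it..."
    start_dockerd
    retries=$((retries - 1))
    sleep 5
done
echo "Docker daemon is running."
```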