[docker-in-docker] - Use blocking entrypoints to ensure dockerd startup #671

Open
samruddhikhandale opened this issue Aug 24, 2023 · 7 comments

Comments

@samruddhikhandale
Member

We have seen flaky docker startups with the docker-in-docker Feature, where docker is not running in the container. See #634 and #660.

#669 adds a few retry mechanisms to attempt to fix this issue.

/usr/local/share/docker-init.sh which starts/retries dockerd is added to the entrypoint property for the Feature. This command runs in the background and is not a blocking script for the container startup. Since it's in the background, onCreateCommand/postCreateCommand/postStartCommand could all start executing before docker is fully running. If it takes docker too long, that could introduce flakiness in those lifecycle scripts.
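
Until blocking entrypoints exist, one possible workaround is for a lifecycle script to poll the daemon itself before using docker. The following is a minimal sketch, not part of the Feature: `wait_for_docker` is a hypothetical helper name, and it assumes that `docker info` exits non-zero while dockerd is still starting.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with docker-in-docker):
# poll until the Docker daemon answers, or give up after a fixed number of tries.
wait_for_docker() {
  local max_attempts="${1:-60}"   # how many times to poll (default 60)
  local interval="${2:-1}"        # seconds to sleep between polls (default 1)
  local attempt=1

  # `docker info` needs a reachable daemon, so it fails until dockerd is up.
  until docker info >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "docker not ready after ${max_attempts} attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
}
```

A postCreateCommand script could source this and call `wait_for_docker 60 1` before its first docker-dependent command, failing early with a clear message if the daemon never comes up.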

Opened devcontainers/spec#299, which requests new semantics for "blocking" entrypoints that the CLI waits for. This way we can ensure that docker is already up and running for the lifecycle scripts mentioned above and is 💯 available in the container.

This issue tracks updates to the docker-in-docker Feature once blocking entrypoints are available.

Blocked by -

@Christopholus

Seconding this! We are unfortunately still running into this issue. We're using DDEV and transitioned from docker-in-docker to the universal:2 image, as per the conversation over in #634, yet every DDEV command fails with the following error message:
Could not connect to a docker provider. Please start or install a docker provider. For install help go to: https://ddev.readthedocs.io/en/latest/users/install/

Neither a partial nor a full rebuild fixes the issue. The only thing we've found that works is completely deleting the Codespace and recreating it from scratch. However, after a single Codespace inactivity timeout, we are back to the error where Docker is no longer recognized.

Staying tuned here to see if there's any activity on this fix 🤞

If it's helpful, here's a snapshot of our devcontainer.json

{
	"image": "mcr.microsoft.com/devcontainers/universal:2",
	"features": {
		"ghcr.io/ddev/ddev/install-ddev:latest": {}
	},
	"forwardPorts": [3000],
	"portsAttributes": {
		"3000": {
			"label": "Vite DevServer"
		},
		"3306": {
			"label": "MySQL"
		},
		"6006": {
			"label": "Storybook"
		},
		"8027": {
			"label": "Mailhog"
		},
		"8036": {
			"label": "PHPMyAdmin"
		},
		"8443": {
			"label": "Public (HTTPS)"
		},
		"8080": {
			"label": "Public (HTTP)"
		}
	},
	"postCreateCommand": "bash -c 'ddev poweroff && ddev start -y && ddev composer install'"
}

@samruddhikhandale
Member Author

samruddhikhandale commented Aug 31, 2023

Hi @Christopholus 👋

We released a new universal image yesterday with docker-in-docker:v2.4.0 (which includes retries). See https://github.com/devcontainers/images/releases/tag/v0.3.17. However, we are still in the process of updating the Codespaces cache with this new image.

If you run the devcontainer-info command, I don't think you'd see v0.3.17 yet. Can you try that?

Thanks for your patience, we will soon have the Codespaces cache updated. However, you should get the latest image with a full rebuild of the codespace.

Update: You could pin the universal image; however, as it's not cached yet, that would lead to slower codespace creation (unless you use prebuilds).

"image": "mcr.microsoft.com/devcontainers/universal:2.5.2",

@Christopholus

Thanks @samruddhikhandale!

I just took a look with our team, and when we use the devcontainer-info command, we see the following output:
[image: devcontainer-info output]

So it does look like we're seeing v0.3.17 - and a release date of yesterday, which lines up with your comment. Unfortunately, we are still seeing the Could not connect to a docker provider error when trying to run DDEV.

Please let us know if you have any other ideas about what might be happening - and thanks for your help!

@samruddhikhandale
Member Author

samruddhikhandale commented Aug 31, 2023

Interesting, investigating with the provided devcontainer ^.

In the meantime, @Christopholus, if docker ps fails within a codespace, you can manually start docker with 👇

sudo pkill dockerd && sudo pkill containerd && /usr/local/share/docker-init.sh 

Just curious, does pinning the universal version to v2.5.1 improve or worsen the docker startup?
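
The manual restart above can be wrapped in a small guard so that it only runs when docker is actually down. This is a sketch under stated assumptions, not an official script: `restart_docker_if_down` and the `DOCKER_INIT` override variable are hypothetical names. It also tolerates `pkill` finding nothing to kill, since `pkill` exits non-zero when no process matches and would otherwise stop an `&&` chain before the init script runs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the manual restart command from this thread.
restart_docker_if_down() {
  # DOCKER_INIT is an assumed override for illustration; the default path is
  # the init script named in this issue.
  local init_script="${DOCKER_INIT:-/usr/local/share/docker-init.sh}"

  # Nothing to do if the daemon already answers.
  if docker ps >/dev/null 2>&1; then
    return 0
  fi

  # Stop any half-started daemons; ignore "no process matched" failures.
  sudo pkill dockerd || true
  sudo pkill containerd || true

  # Start (and retry) dockerd via the Feature's init script.
  "$init_script"
}
```

Calling `restart_docker_if_down` from a terminal in the codespace has no effect when `docker ps` already succeeds, so it should be safe to run repeatedly.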

@Christopholus

Thanks for this @samruddhikhandale. After some review with our team - it looks like pinning the universal version to v2.5.1 worked like a charm! We've had several successful startups in a row.

We'll roll this tweak out across the other projects we're using Codespaces for, and monitor to see if the issue crops back up. I'll be sure to poke back in here if we have any more Docker drama. I'm hopeful that in a few weeks' time we can unpin the version (as the cache may have been updated by then 🤞)

Thanks again for your prompt responses here!

@mandrasch

mandrasch commented Nov 30, 2023

hey, we noticed in recent startups that docker is not ready (again) when using the postCreateCommand. Were there any major changes? (We follow this guide in the DDEV community https://ddev.readthedocs.io/en/latest/users/install/ddev-installation/#github-codespaces). One user reported that postCreateCommand failed, but postAttachCommand worked and docker was available there.

Anyone else noticing this?

Thanks for fixing this the first time! 🙏 (https://github.com/orgs/community/discussions/63776#discussioncomment-6745270)

@mandrasch

Reproduction with https://github.com/mandrasch/ddev-craftcms-vite

2023-12-03 15:19:38.906Z: Running the postCreateCommand from devcontainer.json...

2023-12-03 15:19:38.915Z: chmod +x ./.devcontainer/postCreateCommand.sh && ./.devcontainer/postCreateCommand.sh
2023-12-03 15:19:39.083Z: + ddev config global --omit-containers=ddev-router
2023-12-03 15:19:50.015Z: ERRO[0010] app.FindContainerByType(web) failed 
ERRO[0010] app.FindContainerByType(db) failed  
Could not connect to a Docker provider. Please start or install a Docker provider.
For install help go to: https://ddev.readthedocs.io/en/latest/users/install/
2023-12-03 15:19:50.060Z: {"outcome":"error","message":"Command failed: /bin/sh -c chmod +x ./.devcontainer/postCreateCommand.sh && ./.devcontainer/postCreateCommand.sh","description":"The postCreateCommand in the devcontainer.json failed.","containerId":"f60437911600ca72d26ac8d58c7b4f846bf51b3e7a6fca6b3533082cc957148e"}
2023-12-03 15:19:50.070Z: postCreateCommand failed with exit code 1. Skipping any further user-provided commands.

If I run ./.devcontainer/postCreateCommand.sh manually in a terminal afterwards, it works.
