FATAL ERROR: error setting up chroot: error remounting chroot in read-only: device or resource busy #10965

nt opened this issue Sep 27, 2024 · 14 comments
Labels
type: bug Something isn't working

Comments

nt commented Sep 27, 2024

Description

Starting a sandbox can randomly fail with FATAL ERROR: error setting up chroot: error remounting chroot in read-only: device or resource busy

Steps to reproduce

Happens on ~0.8% of container start attempts.

runsc version

20240916.0

docker version (if using docker)

No response

uname

No response

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

-> logs in comments

nt added the type: bug label on Sep 27, 2024

EtiennePerot commented Sep 27, 2024

This appears to be a very old version of gVisor (the logs say VERSION_MISSING, but newer releases have a more detailed error message around this part of the sandbox startup process). Can you update to a newer build?

(Also, from context, it appears you are benchmarking gVisor. Please make sure to read the performance section of the Production Guide as you do this.)

nt commented Sep 30, 2024

runsc.log.20240930-135720.868828.boot.txt
runsc.log.20240930-135720.868828.gofer.txt
runsc.log.20240930-135720.868828.restore.txt

Thanks for looking, Etienne. Here are some logs with release-20240916.0.

milantracy commented

I don't understand the logs here.

This chroot error happens when a container starts, at

return fmt.Errorf("error remounting chroot in read-only: %v", err)

From the boot.txt, the container starts with no issue and the application runs.
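
For reference, the failing step behind that error message is a bind remount that flips the chroot mount to read-only. A minimal sketch of that kind of call (the function name and exact flags are illustrative assumptions, not runsc's actual code):

```go
package sketch

import (
	"fmt"

	"golang.org/x/sys/unix"
)

// remountChrootReadOnly sketches the step that produces the error above:
// an existing bind mount is remounted with MS_RDONLY. The kernel refuses
// with EBUSY if anything still holds a writable reference on the mount.
func remountChrootReadOnly(chroot string) error {
	flags := uintptr(unix.MS_REMOUNT | unix.MS_BIND | unix.MS_RDONLY)
	if err := unix.Mount("", chroot, "", flags, ""); err != nil {
		return fmt.Errorf("error remounting chroot in read-only: %v", err)
	}
	return nil
}
```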

copybara-service bot pushed a commit that referenced this issue Sep 30, 2024
Earlier runsc was using /tmp as the sandbox chroot. However, as indicated in
#10965, some sandbox launches could fail with EBUSY while trying to remount the
/tmp chroot as read-only. We suspect that this could be due to the Golang
runtime or an imported library having an open FD within /tmp.

So we first create a new tmpfs mount over /tmp (like we did before). Then we
additionally create a subdirectory within /tmp and create a second tmpfs mount
there. This subdirectory is used as the chroot. This should protect against the
hypothesized failure scenarios.

Updates #10965

PiperOrigin-RevId: 680652316
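
A rough sketch of the layout this commit describes, with illustrative mount source names and directory name (not the actual gVisor change):

```go
package sketch

import (
	"os"
	"path/filepath"

	"golang.org/x/sys/unix"
)

// setUpNestedChroot sketches the idea from the commit message: mount a
// fresh tmpfs over /tmp, then create a subdirectory with its own tmpfs
// mount and use that inner mount as the sandbox chroot. A stray FD opened
// in /tmp would then live on the outer mount and not block remounting the
// inner mount read-only.
func setUpNestedChroot() (string, error) {
	if err := unix.Mount("runsc-root", "/tmp", "tmpfs", 0, ""); err != nil {
		return "", err
	}
	chroot := filepath.Join("/tmp", "sandbox-chroot") // name is illustrative
	if err := os.Mkdir(chroot, 0755); err != nil {
		return "", err
	}
	if err := unix.Mount("runsc-chroot", chroot, "tmpfs", 0, ""); err != nil {
		return "", err
	}
	return chroot, nil
}
```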

ayushr2 commented Sep 30, 2024

@nt The attached logs do not show the FATAL ERROR: error setting up chroot: error remounting chroot in read-only: device or resource busy issue. The logs show that the sandbox was running and then received SIGTERM and was killed.

Anyway, @nixprime had a hypothesis about what could be going on. runsc creates a new tmpfs mount at /tmp and then creates the sandbox chroot there. This mount is remounted read-only once the sandbox chroot is prepared. Between the time we create the tmpfs mount at /tmp and the time it is remounted, we hypothesize that either the Golang runtime or some library opens a file descriptor within /tmp which is not closed at the time of the remount, causing the remount to fail with EBUSY.

Could you patch in #10975 and give it a try?
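
The hypothesized failure mode can be reproduced outside of runsc: a read-only remount fails with EBUSY while any process holds a file open for writing on that mount. A standalone illustration (needs root; everything here is a stand-in, not runsc code):

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	dir, err := os.MkdirTemp("", "ebusy-demo")
	if err != nil {
		panic(err)
	}
	// A fresh tmpfs mount, standing in for the /tmp chroot mount.
	if err := unix.Mount("demo", dir, "tmpfs", 0, ""); err != nil {
		panic(err)
	}
	defer unix.Unmount(dir, 0)

	// Hold a file open for writing inside the mount, standing in for the
	// hypothesized stray FD from the Go runtime or an imported library.
	f, err := os.Create(dir + "/held-open")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	// The read-only remount now fails.
	err = unix.Mount("", dir, "", unix.MS_REMOUNT|unix.MS_RDONLY, "")
	fmt.Println("remount error:", err) // expected: device or resource busy
}
```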

nt commented Oct 2, 2024

Hi @ayushr2, thanks for looking. Unfortunately we can't upgrade gvisor as frequently as we'd like because we care about checkpoint stability. I will make sure to include that patch in our next update.

andrew-anthropic commented

Hi Ayush, I built a new version of runsc with the patch above and load-tested it. Unfortunately, it does not appear to solve the issue. If it's helpful, I uploaded fresh debug logs from when the issue occurred while on that build.
gofer.txt
boot.txt
run.txt

avagin commented Oct 4, 2024

What kernel do you use?

andrew-anthropic commented

@avagin If you mean the host, this is on 6.1.90.

uname -a
Linux sandboxing-20241003-202901-v1-84dc78694c-8c597 6.1.90+ #1 SMP PREEMPT_DYNAMIC Fri Jul 12 16:04:16 UTC 2024 x86_64 GNU/Linux

ayushr2 commented Oct 4, 2024

Thanks @andrew-anthropic. Could you try patching #10994 and see if it helps? We may need to adjust the interval being used there. We suspect that there may be a race with programs like fuser or lsof that are running externally and inspecting the sandbox process. (In your benchmarking setup, do you have any external observing processes which might be invoking readlink(2) on entries in /proc/sandbox_pid/fd/*?)
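
For reference, that kind of observer boils down to readlink(2) calls on the entries under /proc/&lt;pid&gt;/fd, roughly like the sketch below (the PID is passed in; this is the same class of inspection that tools like lsof perform, and run against the sandbox PID it also shows whether anything is still open under the /tmp chroot):

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// A minimal stand-in for the kind of external observer described above: it
// resolves every symlink under /proc/<pid>/fd, which is exactly the
// readlink(2) traffic in question.
func main() {
	pid := os.Args[1] // e.g. the sandbox PID
	fds, err := filepath.Glob(filepath.Join("/proc", pid, "fd", "*"))
	if err != nil {
		panic(err)
	}
	for _, fd := range fds {
		if target, err := os.Readlink(fd); err == nil {
			fmt.Printf("%s -> %s\n", fd, target)
		}
	}
}
```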

andrew-anthropic commented

Hi Ayush -- I'm happy to confirm that #10994 does indeed fix the issue! I'm not sure offhand what might be iterating through /proc/ -- the gVisor sandboxes are running on a pod in privileged mode on a kube node running many other processes, including a suite of observability jobs, so it's certainly possible.

Let me know if it would help to dig into this further (I'd probably run lsof in a tight loop and try to catch something accessing those entries). But as nt mentioned above, we mitigated this pretty easily by just retrying sandbox creation in a retry loop, so we're perfectly happy to close the issue at this point.
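
For anyone else hitting this, the mitigation mentioned here is just a bounded retry around sandbox creation, along these lines (the runsc command line, retry budget, and delay are illustrative assumptions, not our exact setup):

```go
package sketch

import (
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// startSandboxWithRetry sketches the caller-side mitigation: retry sandbox
// creation a few times when the transient chroot EBUSY failure shows up,
// and give up on any other error.
func startSandboxWithRetry(bundle, id string) error {
	var lastErr error
	for attempt := 0; attempt < 3; attempt++ {
		out, err := exec.Command("runsc", "run", "--bundle", bundle, "--detach", id).CombinedOutput()
		if err == nil {
			return nil
		}
		lastErr = fmt.Errorf("runsc run: %v: %s", err, out)
		if !strings.Contains(string(out), "error remounting chroot in read-only") {
			return lastErr // not the transient failure; don't retry
		}
		time.Sleep(100 * time.Millisecond)
	}
	return lastErr
}
```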

ayushr2 commented Oct 14, 2024

I'd prefer to not merge #10994 because it is very hacky. The change uses an arbitrary sleep and number of retries. From what I can tell, this is not a gVisor bug, and is specific to the environment being run in.

> But as nt mentioned above, we mitigated this pretty easily by just retrying sandbox creation in a retry-loop, so we're perfectly happy to close the issue at this point.

OK closing!

ayushr2 closed this as completed on Oct 14, 2024

jseba commented Nov 12, 2024

Just ran across this as well; we've also been seeing it here at Cloudflare. It only affects the first few stages of our release targets, so we know there's a subtle configuration difference between those environments and the rest of our production fleet. We applied basically the same patch for a month and saw the problem go away; we removed it and it came back within a week. My suspicion is that something BPF/kernel-related is scanning mounts as they are created and holding open filesystems, or files within them. Given that the only processes in the system that know what's going on in those namespaces are gVisor and the kernel (if you want to call the kernel a process 😄), my current theory is that something is sticking its nose where it doesn't belong. With the explosion of BPF tracing, I suspect something has gotten a bit overzealous about monitoring or scanning, especially endpoint security tooling.

ayushr2 reopened this on Dec 5, 2024

ayushr2 commented Dec 5, 2024

Given that multiple users are hitting this, I am open to submitting something like #10994.

ayushr2 commented Dec 6, 2024

#10994 actually had a bug: after 3 EBUSY failures, it would just skip remounting as read-only. So patching it makes the error go away, but that's not the desired state.

@jseba @andrew-anthropic @nt Could you try patching #11259 instead and see if the error goes away?
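
To make the distinction concrete, the corrected shape would retry the remount but still fail hard if it never succeeds, rather than silently leaving the chroot writable. A rough sketch of that behavior (not the actual #10994 or #11259 code; the retry budget and delay are placeholders):

```go
package sketch

import (
	"errors"
	"fmt"
	"time"

	"golang.org/x/sys/unix"
)

// remountReadOnlyWithRetry retries the read-only remount a few times on
// EBUSY, but if it keeps failing, it returns the error instead of skipping
// the remount and leaving the chroot writable.
func remountReadOnlyWithRetry(chroot string) error {
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		err = unix.Mount("", chroot, "", unix.MS_REMOUNT|unix.MS_BIND|unix.MS_RDONLY, "")
		if err == nil {
			return nil
		}
		if !errors.Is(err, unix.EBUSY) {
			break // a different error; retrying will not help
		}
		time.Sleep(10 * time.Millisecond)
	}
	return fmt.Errorf("error remounting chroot in read-only: %v", err)
}
```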
