No local fallback after cache timeout #20123
Comments
The intention behind One possibility is to fold this work into #19904 (but that would be a fairly large project, so we might still consider implementing this differently in the interim). |
We used |
@tjgq: @JSGette's comment made me re-check our logs. All the examples I can find relate to actions that download .d files as part of cc_common.compile(). Our builds are about two-thirds compile actions, but even so the number of examples is starting to look suspicious. There are also timeouts in actions that aren't part of cc_common.compile(); those correctly show up as warnings and the action is run locally. Could there be something special about these .d files? I know that cc_common.compile() runs a special end-of-action step to trim dependencies, which I assume uses the .d files. Could something about that step make cache timeouts behave differently in this case? I'll try to investigate further on our end as well. I did set up my own HTTP cache that would time out for a specific CAS entry corresponding to a .d file, but so far I haven't reproduced the issue. |
I'm also seeing something similar to this in Bazel 7 without BwtB:
Interestingly we're only seeing this for the |
@tjgq is this actually a feature request? It feels like a bug since this works just fine for us in Bazel 6 |
There are two distinct issues here.
A missing digest means that Bazel was previously made aware of the existence of a digest in the remote cache, but it's no longer there by the time it tries to download it. The "build without the bytes" default has changed between Bazel 6 and 7, which widens the window between these two events (during which the blob can be evicted, i.e., deleted from the cache). It's completely up to the remote cache to decide for how long to keep an entry around; Bazel does not set an explicit lifetime nor ask for entries to be deleted. Does the remote cache implementation you're using provide any sort of log that could be used to determine whether the missing digest used to be there, and if so, the reason why it was evicted? The fact that this only happens with |
Thanks for the reply. Yes, the errors are slightly different but related, because both involve the .d files. FWIW we're using remote_download_outputs=all, so I was expecting nothing to change here for us. Any ideas what to check next besides the remote cache logs? I can open a separate issue for this if you think that makes sense. |
I have a hunch: does setting Otherwise, capturing a |
Thanks for the suggestion we're testing out |
@tjgq So |
Thanks for confirming my suspicion; that gives me a hint as to where the problem might be. Do you mind filing a fresh issue so we can track it separately? |
I filed #22387 thanks! |
Description of the bug:
While running a build, our Artifactory HTTP cache timed out on a request. We're looking into why that happened, but we expected Bazel to fall back to running the action locally. Instead, it failed the build:
ERROR: /<workspace path>/BUILD:1802:10: Compiling <source file>.c failed: unable to finalize action: Download of '/<artifactory repo path>/cas/b8f31e5fda95495273a86cc5c7395298eb321490395ca90815a9184f2a9ec980' timed out. Received 0 bytes.
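To make the expectation concrete, here is a minimal sketch of the behaviour we assumed: a cache download that times out is reported as a warning, treated like a cache miss, and the action then runs locally. This is only an illustration of the expected semantics, not Bazel's actual code; `CacheTimeout`, `run_action`, `cache_lookup` and `execute_locally` are hypothetical names.

```python
class CacheTimeout(Exception):
    """Raised by the (hypothetical) cache client when a download times out."""


def run_action(action, cache_lookup, execute_locally, warn=print):
    """Return the action's outputs, preferring the cache but never depending on it."""
    try:
        cached = cache_lookup(action)
    except CacheTimeout as err:
        # Expected behaviour: warn and continue as if the entry were missing.
        warn(f"WARNING: remote cache unavailable for {action}: {err}")
        cached = None
    if cached is not None:
        return cached
    # Cache miss or cache failure: run the action locally.
    return execute_locally(action)
```

What we observed instead is that the timeout surfaced as a build error while finalizing the action.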
The documentation suggests --remote_local_fallback only applies to remote execution, but some comments on the bug tracker suggested it might also apply to remote caching, so we tried it and still saw the issue.
I can try to recreate the issue, but it's not totally trivial, as I'll need to set up an HTTP server that can deliberately time out. So I thought I'd check whether this is expected behavior, or whether there is a bug that is obvious to someone who knows the Bazel source.
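For reference, a server of the kind mentioned above could look roughly like the sketch below. It is an in-memory HTTP cache that answers GET and PUT on arbitrary paths (Bazel's HTTP cache protocol uses /ac/ and /cas/ prefixes) and deliberately stalls, sending nothing, when a chosen CAS digest is requested. The class attributes `stall_digests` and `stall_seconds` and the `serve` helper are hypothetical names, and the sketch assumes uploads carry a Content-Length header; it has not been shown to reproduce the failure.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CacheHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"

    # Hypothetical knobs: CAS digests to stall on, and how long to stall.
    stall_digests = set()
    stall_seconds = 120.0

    # In-memory blob store shared by all requests: path -> bytes.
    store = {}

    def do_PUT(self):
        # Assumes the client sends a Content-Length header with the upload.
        length = int(self.headers.get("Content-Length", 0))
        self.store[self.path] = self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        digest = self.path.rsplit("/", 1)[-1]
        if self.path.startswith("/cas/") and digest in self.stall_digests:
            # Send nothing at all, so the client times out with 0 bytes received.
            time.sleep(self.stall_seconds)
            self.close_connection = True
            return
        body = self.store.get(self.path)
        if body is None:
            self.send_response(404)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the console quiet


def serve(port=8080):
    """Start the cache on localhost in a background thread and return the server."""
    server = ThreadingHTTPServer(("127.0.0.1", port), CacheHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The idea would be to add the digest of a .d file to `CacheHandler.stall_digests`, call `serve()`, keep the process alive, and point a build at it with --remote_cache=http://127.0.0.1:8080 and a short --remote_timeout.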
Which category does this issue belong to?
Core
What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Not trivial to reproduce! I can work on that if it would be useful.
Which operating system are you running Bazel on?
Linux - RHEL 8
What is the output of bazel info release?
release 7.0.0-pre.20231011.2- (@Non-Git)
If bazel info release returns development version or (@non-git), tell us how you built Bazel.
Built from the 7.0.0-pre.20231011.2 release tag.
What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD?
No response
Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.
No response
Have you found anything relevant by searching the web?
No
Any other information, logs, or outputs that you want to share?
No response