[WIP]Implement reproducible out/ folder contents across different filesystem layouts #3765

Draft: wants to merge 4 commits into base: main

@rahat2134 commented Oct 17, 2024

This PR addresses issue #3660 by making the contents of the out/ folder more reproducible and independent of the filesystem layout. This allows the out/ folder to be reused as a build cache across different machines, supporting both coarse-grained caching (e.g., transferring a zip file) and fine-grained caching (via the Bazel remote cache protocol).

Key Changes:

  • Updated PathRef to normalize paths relative to workspace, Coursier cache, and home directory
  • Implemented NonDeterministicFiles to handle non-deterministic file content
  • Updated JsonFormatters to use the new path normalization methods
  • Added integration test ReproducibleOutTest to verify reproducibility
  • Updated build.mill in the integration test to define the necessary project structure
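The PR text does not show how the normalized paths are spelled, so the following is only a minimal sketch of the idea behind the PathRef change: an absolute path under a known root is rewritten as a placeholder plus a relative path before serialization, and expanded back to the local root on load. It is written in Python for brevity rather than Mill's Scala, assumes POSIX-style paths, and the tokens `$WORKSPACE`, `$COURSIER_CACHE` and `$HOME` as well as the `normalize`/`denormalize` helpers are hypothetical.

```python
from pathlib import PurePosixPath


def normalize(path: str, roots: dict[str, str]) -> str:
    """Rewrite an absolute path as PLACEHOLDER/relative/path if it lies under a known root.

    The placeholder tokens passed in `roots` are hypothetical, not Mill's actual format.
    """
    p = PurePosixPath(path)
    # Try the longest root first: the Coursier cache and the workspace
    # usually sit inside $HOME, and the most specific match should win.
    for token, root in sorted(roots.items(), key=lambda kv: len(kv[1]), reverse=True):
        r = PurePosixPath(root)
        if p == r:
            return token
        # Compare whole path components, so /home/alice/projX is not under /home/alice/proj.
        if r in p.parents:
            return f"{token}/{p.relative_to(r).as_posix()}"
    return path  # outside every known root: left as-is (not relocatable)


def denormalize(path: str, roots: dict[str, str]) -> str:
    """Inverse of normalize: expand a leading placeholder to this machine's root."""
    for token, root in roots.items():
        if path == token:
            return root
        if path.startswith(token + "/"):
            return root.rstrip("/") + path[len(token):]
    return path
```

Checking the longest root first matters because the Coursier cache and the workspace typically live under the home directory; otherwise a Coursier path would be recorded as `$HOME/...` and would no longer resolve on a machine whose cache lives somewhere else.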

Reproducibility is achieved by:

  1. Normalizing paths in serialized PathRefs
  2. Handling non-deterministic files (e.g., mill-profile.json, worker JSONs)
  3. Zeroing out modification times for zip and jar files
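For step 3, here is a minimal sketch using Python's standard `zipfile` module (not the PR's actual code) that rewrites an existing zip or jar so every entry carries the same fixed timestamp. Zip entries store MS-DOS timestamps, whose earliest representable value is 1980-01-01 00:00:00, so that is what "zeroing out" amounts to here. The `normalize_zip` helper is hypothetical.

```python
import io
import zipfile

# The zip format stores MS-DOS timestamps; 1980-01-01 00:00:00 is the earliest
# representable value, so it serves as the "zero" timestamp.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def normalize_zip(data: bytes) -> bytes:
    """Return a copy of a zip/jar archive whose entries all carry FIXED_DATE_TIME."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        # Keep the original entry order: jar readers such as
        # java.util.jar.JarInputStream look for the manifest at the start.
        for info in src.infolist():
            fixed = zipfile.ZipInfo(info.filename, date_time=FIXED_DATE_TIME)
            fixed.compress_type = info.compress_type
            fixed.external_attr = info.external_attr  # keep permission bits
            fixed.create_system = 3  # record "Unix" regardless of the host OS
            # Extra fields (which may hold extended Unix mtimes) are deliberately
            # not copied, so no timestamp survives outside the DOS date field.
            dst.writestr(fixed, src.read(info))
    # ZipFile does not close a file object it was handed, so `out` is still readable.
    return out.getvalue()
```

Entry order is preserved rather than sorted; if the tool that produced the archive emits entries in a non-deterministic order, sorting the names would be needed as well.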

Testing:

  • Added new integration test ReproducibleOutTest
  • Existing tests have been updated and pass
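The PR does not show what ReproducibleOutTest asserts, so this is only a minimal sketch of the kind of comparison such a test could make: build the same project from two different checkout locations, then diff the two out/ folders file by file while ignoring the known non-deterministic files. `mill-profile.json` is named in the PR; the `*worker*.json` pattern and the `tree_digest`/`diff_trees` helpers are hypothetical.

```python
import hashlib
from fnmatch import fnmatch
from pathlib import Path

# File-name patterns excluded from the comparison. "mill-profile.json" is named
# in the PR; the worker pattern is a hypothetical placeholder.
NON_DETERMINISTIC = ["mill-profile.json", "*worker*.json"]


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file under root (POSIX-style relative path) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if any(fnmatch(path.name, pattern) for pattern in NON_DETERMINISTIC):
            continue
        rel = path.relative_to(root).as_posix()
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(a: Path, b: Path) -> list[str]:
    """Relative paths that exist in only one tree or whose contents differ."""
    da, db = tree_digest(a), tree_digest(b)
    return sorted(k for k in da.keys() | db.keys() if da.get(k) != db.get(k))
```

An empty result from `diff_trees` means the two out/ folders are interchangeable apart from the excluded files, which is the property a shared build cache relies on.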

Documentation:

  • Updated relevant comments in the code

Performance Impact:

  • Initial testing shows minimal impact on build times, but more extensive performance testing may be needed for large projects

@rahat2134 changed the title from "Implement reproducible out/ folder contents across different filesystem layouts" to "[WIP]Implement reproducible out/ folder contents across different filesystem layouts" on Oct 17, 2024
@rahat2134 (Author) commented Oct 25, 2024

@lihaoyi, can you take a top-level look at the changes and let me know whether they are aligned with the intended solution?
I am working on getting the tests to pass.

(Also, if you can tell why the tests are failing, that would be a great help!)
Thanks!

@lihaoyi (Member) commented Oct 26, 2024

@rahat2134 the top-level changes look reasonable. The CI logs seem to be overloaded with all the printlns you added and won't load for me; try removing them and running the failing tests locally to debug further.
