Cannot deploy due to BH_1.84.0-0.data file size #59

Closed
pawelru opened this issue Jul 30, 2024 · 7 comments

@pawelru

pawelru commented Jul 30, 2024

Hello.

I encountered the following problem when trying to deploy by pushing changes to the gh_pages branch:

remote: error: File stable/site_libs/quarto-contrib/shinylive-0.5.0/shinylive/webr/packages/BH/BH_1.84.0-0.data is 121.02 MB; this exceeds GitHub's file size limit of 100.00 MB

BH is needed by anytime.
anytime is needed by shinyWidgets.

This is an indirect dependency of the app code so I can't really control this.

Can you please advise what can be done about it? Can we safely remove a package and expect it to be downloaded on the client side? If so, how?
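For anyone who wants to check in advance which files would hit this limit, here is a minimal sketch in base R. The function name `find_large_files` and the `_site` directory in the usage comment are assumptions for illustration, not part of shinylive or Quarto; the default threshold of 100 * 1024^2 bytes is meant to match the limit reported in the error above.

```r
# List files under a directory that exceed a size limit (default: 100 * 1024^2 bytes).
# `find_large_files` is a hypothetical helper, not a shinylive function.
find_large_files <- function(dir, limit_bytes = 100 * 1024^2) {
  files <- list.files(dir, recursive = TRUE, full.names = TRUE)
  sizes <- file.size(files)
  keep <- !is.na(sizes) & sizes > limit_bytes
  data.frame(
    path = files[keep],
    size_mb = round(sizes[keep] / 1024^2, 2),
    stringsAsFactors = FALSE
  )
}

# Example (the "_site" output directory is an assumption; adjust as needed):
# find_large_files("_site")
```

Running this on the rendered site before pushing should list the offending .data files, if any.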

@georgestagg
Collaborator

Currently some R packages for Wasm are very large. This is the result of a design decision introduced in webR in the interest of improving loading times for most (much smaller!) packages, at the cost of storing uncompressed package data. I'm currently thinking about how to improve the situation for packages that compress particularly well, such as BH. That work is tracked in r-wasm/webr#460.

For now, it looks like these assets are too large for GitHub, and we don't currently have a mechanism to handle that. Unfortunately, you cannot simply delete the .data file: the app will expect to find it and will crash when it is missing.

When exporting apps using the shinylive R package, you can export the app without bundling packages:

shinylive::export("myapp", "site", wasm_packages = FALSE)

This means the app won't ship with any bundled WebAssembly R package binaries. This would solve the problem in the short term, but unfortunately I can't see a simple way to export with this option set from Quarto documents.

So, there are several things we need to do here:

  1. Handle packages too large for GitHub in some way: posit-dev/r-shinylive#112 ("Gracefully handle R packages too large for GitHub Pages").

  2. Make wasm_packages = FALSE available from Quarto: #60 ("Support setting R shinylive export's wasm_packages option from Quarto").

  3. Better handle missing .data files as a Shinylive app starts: posit-dev/shinylive#163 ("webR: Handle missing .data files gracefully").

  4. (Future) Compress large Wasm package binaries in webR.

@pawelru
Author

pawelru commented Jul 30, 2024

Thank you for the very detailed explanation and for turning this into actionable backlog items. I'm looking forward to all of them, especially the one in the shinylive Quarto extension, because that's the interface I'm interacting with.

I read through some of the source code, did some reverse engineering, and came up with the following:

packages_path <- sprintf("_site/site_libs/quarto-contrib/shinylive-%s/shinylive/webr/packages", shinylive::assets_version())

# remove package dirs that contain any file larger than 100 MB
for (x in list.dirs(packages_path, recursive = FALSE)) {
    x_files <- file.info(list.files(x, full.names = TRUE))
    if (any(x_files$size > 100 * 1024^2, na.rm = TRUE)) {
        print(x)
        unlink(x, recursive = TRUE)
    }
}

# refresh the `metadata.rds` file so it only lists packages still on disk
metadata_path <- file.path(packages_path, "metadata.rds")
metadata <- readRDS(metadata_path)
new_metadata <- metadata[intersect(names(metadata), list.dirs(packages_path, full.names = FALSE, recursive = FALSE))]
saveRDS(new_metadata, metadata_path)

This looks through the packages directory and deletes a package directory if any of its files exceeds 100 MB. It then drops the corresponding entries from the metadata.rds file for consistency.
This way I was able to deploy and (looking briefly) everything looks fine. That might be because BH-dependent functionality is not used at all (note it's an indirect dependency). It might be worse for packages that are direct dependencies; I haven't tested that.
Sharing this for anyone who runs into a similar issue until a more elegant solution is available (see above).

@georgestagg
Collaborator

georgestagg commented Jul 30, 2024

Yes, that should work OK as long as there are no entries in metadata.rds without the corresponding .data assets available. That said, keep in mind that metadata.rds is intended to be an internal structure, so there's no guarantee we won't change it going forward.

If you find you need BH, you should be able to install it at runtime with install.packages("BH"). Without the bundled asset, webR will download it from the public webR package repo instead.
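The consistency condition above (no metadata entries without matching .data assets) can be checked with a short sketch. It assumes, as the script earlier in this thread does, that `names(metadata)` are package names and that each package lives in its own subdirectory of the packages path. Since metadata.rds is internal, that layout may change. The helper name `orphaned_entries` is hypothetical.

```r
# Return metadata entries that have no package directory containing a .data file.
# Assumption: each package lives in <packages_path>/<name>/ and ships a *.data file.
# `orphaned_entries` is a hypothetical helper, not part of shinylive or webR.
orphaned_entries <- function(metadata_names, packages_path) {
  has_data <- vapply(metadata_names, function(pkg) {
    pkg_dir <- file.path(packages_path, pkg)
    length(list.files(pkg_dir, pattern = "\\.data$")) > 0
  }, logical(1))
  metadata_names[!has_data]
}

# Example, reusing `packages_path` from the script earlier in this thread:
# metadata <- readRDS(file.path(packages_path, "metadata.rds"))
# orphaned_entries(names(metadata), packages_path)
```

An empty result would suggest that the metadata and the assets on disk agree.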

@wch
Collaborator

wch commented Jul 31, 2024

BH is listed as a LinkingTo package, and I think in the specific case of BH, it is used for header files at compile-time and is not actually needed at run time. (I don't know if that is true in general for all LinkingTo packages, though).

I don't understand why the package is so large, though. On CRAN, the source package is about 13MB, the Mac binary package is about 12MB, and the Windows binary package is about 20MB.

@georgestagg Would it be possible to special-case BH so that if it's only in the LinkingTo section, webR won't try to bundle it? It would be good to check with the tidyverse team to see if this is a safe strategy, and if it could be applied in general for LinkingTo packages, or at least to some specific packages.

@georgestagg
Collaborator

georgestagg commented Aug 1, 2024

> I don't understand why the package is so large

It contains a copy of Boost, which is gigantic. Since it's just a bunch of text in C++ template files it compresses really, really well though. GitHub Pages won't compress .data files over the wire (😭), so I want to re-enable compression for webR packages. But it requires some thought to keep things snappy (i.e. avoiding R's built-in decompression routines).
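To illustrate why repetitive source text compresses so well, here is a small sketch using base R's `memCompress()`. The sample line is a made-up stand-in for header content, not actual Boost code, and the extreme repetition overstates the ratio real headers would reach. It also uses R's built-in routines purely for demonstration, which is not necessarily how webR would implement compression.

```r
# Build a repetitive, header-like text blob (hypothetical stand-in for Boost headers).
line <- "template <typename T> struct wrapper { typedef T type; };\n"
blob <- charToRaw(paste(rep(line, 5000), collapse = ""))

# Compress with gzip and compute compressed size relative to the original.
compressed <- memCompress(blob, type = "gzip")
ratio <- length(compressed) / length(blob)

# Round-trip to confirm the data survives compression unchanged.
restored <- memDecompress(compressed, type = "gzip")
stopifnot(identical(restored, blob))

cat(sprintf("original: %d bytes, compressed: %d bytes\n",
            length(blob), length(compressed)))
```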

I think you're right. IIUC LinkingTo is specifically for packages required at build time but not at runtime. The configuration for webR should be tweaked to ignore LinkingTo during package dependency resolution, and similarly for the r-shinylive/renv/pkgdepends logic that resolves app dependencies. I'll check with the r-lib team first, though.
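As a rough sketch of the idea, `tools::package_dependencies()` can resolve dependencies with and without the LinkingTo field. The toy `db` matrix below is a hypothetical stand-in for the output of `available.packages()`, with field contents chosen only for illustration. This is not the actual webR or r-shinylive resolution logic.

```r
# Toy package database (hypothetical; real data would come from available.packages()).
db <- rbind(
  c(Package = "shinyWidgets", Depends = NA, Imports = "anytime", LinkingTo = NA),
  c(Package = "anytime",      Depends = NA, Imports = "Rcpp",    LinkingTo = "Rcpp, BH"),
  c(Package = "Rcpp",         Depends = NA, Imports = NA,        LinkingTo = NA),
  c(Package = "BH",           Depends = NA, Imports = NA,        LinkingTo = NA)
)
rownames(db) <- db[, "Package"]

# Dependencies needed at runtime: LinkingTo is ignored.
runtime_deps <- tools::package_dependencies(
  "shinyWidgets", db = db,
  which = c("Depends", "Imports"), recursive = TRUE
)[["shinyWidgets"]]

# Dependencies when LinkingTo is also followed.
all_deps <- tools::package_dependencies(
  "shinyWidgets", db = db,
  which = c("Depends", "Imports", "LinkingTo"), recursive = TRUE
)[["shinyWidgets"]]

# Packages pulled in only through LinkingTo; candidates to skip when bundling.
link_only <- setdiff(all_deps, runtime_deps)
print(link_only)
```

In this toy database, BH shows up only in the LinkingTo-inclusive result, which is the case discussed above.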

@georgestagg
Collaborator

> Would it be possible to special-case BH so that if it's only in the LinkingTo section, webR won't try to bundle it?

In addition to the other work to avoid bundling and downloading packages only in the LinkingTo section, I've also uploaded a special version of BH to the webR public wasm package repo with the include/boost directory removed. I don't believe anyone will be negatively affected -- the directory includes only header files, which can't be used under WebAssembly anyway. The package is now just a few kb in size.

With this, even older shinylive deployments that request BH should benefit by having a much smaller download footprint.

BH will eventually be replaced when a new version is released on CRAN, but by that point the issues linked above will be deployed and it will no longer do the same damage.

@georgestagg
Collaborator

Closed by posit-dev/r-shinylive#115.
