-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feasibility check #1
Comments
It failed to install for me. Showing only the truncated output of First 55 lines:
Last 65 lines:
|
That seems to be the |
Thanks, I got it to compile/install on my laptop (Ubuntu 16.04 LTS). Took only 2m18s. Nice. On Mac OS X (merida1, El Capitan), it fails early with:
|
It does not get very far in Windows at the moment:
|
Thanks guys. @hpages I had previously modified the build set-up for Macs, but forgot to commit it... here it is, on the
I don't know whether that's a more general problem. I presume it could be solved with some autoconf-fu. I also want to see if @grimbough We probably need to write a |
Thanks Aaron. I'll try on merida1 again today and will let you know. |
Might want to hold off for a bit; I modified the build last night so that it would work on my linux machine at home, so now I have to check again that it still works on my mac at work! Edit: Okay, the good news is that The bad news is that the Mac install fails to find an appropriate version of OpenSSL, which means that the |
I'll look at this. I vaguely remember the OpenSSL thing being an issue at some point on our Mac builders. Can't remember what it was exactly and how we addressed it. Will need to dig around. |
Right. So to sum up the current state of play, you can (on Linux):
library(son.of.basilisk)
test() I've switched to using virtual environments, which means that each Bioc package can define its own desired suite of Python package versions to use. Especially cool is the fact that you can define multiple environments within and across R packages, so you can have one R package that can use functionality from mutually incompatible versions of Python packages! The Mac version is currently broken because of OpenSSL. Hoping @hpages has some ideas here. I have no ideas on what to do with Windows. @grimbough seems to have stuffed the static libraries for Rhdf5lib in the Git repository itself. Was this the only way? |
If you can build the binaries using the standard R Windows toolchain then I think it's fine to do that via I think there's a middle ground where the BioC build system has CMake installed, so you can compile there, but not expect most users to be able to 'install-from-source'. I'll see how far it gets if I just copy the commands into |
I've spent quite a while trying to get package to download and install the embeded version of Python, since that's just a zip file with pre-compiled Python.exe inside. However I've hit trouble installing support for virtualenv and now realised that the embeded version inclulde the text "Using pip to manage dependencies as for a regular Python installation is not supported with this distribution" Seems like this a bit of a deal breaker to this approach, but I don't use python often enough to say for certain. Any thoughts? Also, I think https://github.com/LTLA/basilisk/blob/8819464e47044e8c424691fb7207e21d0e76769e/R/useVirtualEnv.R#L81 should have |
I wonder whether the miniconda stuff in the linked issue above might help here. We just rely on miniconda (via rminiconda or otherwise) to give us a usable Python installation on all systems, which would avoid the need for having to package it ourselves. It seems to work for me by using: useBiocPython <- function() {
use_python(rminiconda::find_miniconda_python(), required=TRUE)
} If this also works on Windows, it might be the most expedient solution. |
The |
Sorry for the slow answer. I can reproduce the SSL error on merida1 (El Capitan). When installing basilisk with
even though we have a working installation of OpenSSL on the machine (that we installed with The Note that Python's configure script lets the user specify the root of the OpenSSL directory via the Otherwise, yes, one option would be to include your own H. |
Yeah, the Anyway, the way ahead might be to just use Miniconda to deliver a Python installation + all of its libraries into an R-controlled directory that is isolated from the system Python. The |
I went on merida1 again to test the However, on my laptop (Ubuntu 16.04 LTS), even though I can install the
Looks like it's picking up the wrong Python. Also I should have mentioned that rtracklayer does find the brew-installed openssl (see here) so no reason a priori the |
Huh. That's weird. What happens if you do something like: library(basilisk)
useBiocPython()
reticulate::py_config() ? I get:
Maybe I should explicitly set the path in |
Right, 862c098 contains an explicitly set |
So before or after commit 862c098
Looks like my problem is that I had the Anyway, the following (somewhat hacky) change seems to help in my case:
Then:
So far so good. Then I can install son.of.basilisk but then:
Not sure what to do next. |
Was there anything interesting in the son.of.basilisk installation logs? The virtual environments are currently created upon installation, so any error messages would have occurred there. |
Full log:
|
I bet that, in the new R session created by callr, the (You might ask why we create a new R session to call Python. Recall that reticulate only supports one version of Python in a single R session, and specifically, only one version of any given module can be loaded. If two packages rely on incompatible module versions... well, that's too bad. By doing Python operations in a separate R process, we guarantee that any given package can operate successfully regardless of how other packages are behaving.) |
Yup. After adding Something I didn't notice before is that during
I don't have a I'm leaving the computer now. Done for today. |
I think so... I've been ignoring it. Don't know what that's about either. |
@hpages could you do me a favor and make a PR to It would also be desirable to wrap it in: old <- Sys.getenv("RETICULATE_PYTHON")
Sys.unsetenv("RETICULATE_PYTHON")
on.exit(Sys.setenv(RETICULATE_PYTHON=old)) ... which I would do myself, but I don't know whether that would re-introduce the problem. |
Yes it does re-introduce the problem. Looks like there are other places in your code that benefit from the fact that we've unset |
I added |
works for me now |
Alright, it's showtime again. I think I addressed all the comments from #2 and now it's lean and mean on Linux and Mac. @grimbough can you try |
The file name for the Windows miniconda binary needs to be changed to:
After that building seems to work and it downloads the binary (an I presume executes it), but it doesn't seem to install in the correct place. When R tries to load the package after it's been built I get an error, which I think the pertinent bits are:
|
Thanks Mike. I changed the installer name but I don't know why it's not installing in the right place. Do you know where it's going instead? What's the value of |
@hpages I forsee some problems with the production of the binaries for basilisk and its client packages. If R literally
Is there a way to tell R to exclude certain directories from the binary? I never build binaries, so I don't know if |
I think the installation was failing because However I then run into issues with |
Does it just need reticulate does a lot of |
Hi Aaron, Don't know if there is a way to tell R to exclude certain directories from the binary. Never had this need before It's easy to build a package binary on Linux:
On Mac things are more complicated, especially if the package contains native code. We use the following script on our Mac builders: https://github.com/Bioconductor/BBS/blob/master/utils/build-universal.sh The script is a slightly modified version of a script that Simon Urbanek uses to produce the Mac package binaries on CRAN. You can see the exact command we use on the Windows and Mac builders here: https://bioconductor.org/checkResults/3.11/bioc-LATEST/beachmat/tokay2-buildbin.html I wonder about having basilisk's clients installing stuff under basilisk's installation folder. Would this happen during the installation of the client or later when code in the client actually needs to access some Python modules? It's probably fine if it's the former (granted concurrent attempts to install things are handled gracefully e.g. in the context of parallel installation) but the latter could be problematic. I've always seen package installation directories treated as read-only places (except during the installation process itself of course). FWIW it's a big "NO" during package review if we see that a package is writing to its own installation directory. Would be an even bigger "NOOOOO" if it were writing to the installation directory of another package. The CRAN people sometimes put the package library folder on a read-only partition to run the CRAN checks just to catch those packages that try to write there. We've been considering doing the same thing for a while. |
Oh, you definitely won't like how basilisk works now, then. I had initially stuffed things in the installation directory because I thought a frozen Python instance would be added during package installation, and that it would never change, much like a native library. Then things got out of hand with lazy installation, etc. I could probably switch this easily to use rappdirs, which would (i) avoid packaging the Python instance or venvs into the tarball, and (ii) avoid problems with the deletion of lazy history when basilisk is reinstalled but its clients are not. However, that introduces its own sequence of headaches related to critical content being in a separate directory, e.g., whether the selected directory for the installing process is the same as the selected directory for the process that ultimately uses the package. It's not clear that it would be, so I'd need to add a level of redirection to ensure I get the correct path. |
Does it make any sense to manage this process with BiocFileCache? It seems to boil down to the placement of software components which are files, and management of paths related to those files. |
Alternatively the simple/straightforward way to go is to have basilisk install the "core" Python modules and to NOT delay that. 624M of stuff is not a big deal (many BSgenome data packages are bigger than that) even if it might grow to 1G in the future. This would stick to the "installation folders are treated as read-only" paradigm and thus would avoid many potential complications. Sounds like a KISS situation to me. |
OTOH if all this stuff is not going to be bundled in the binary then downloading/installing this stuff cannot be part of the basilisk installation procedure (installation of a binary package is basically just extracting a tarball or zip file at the right location, no post-install hook is triggered AFAIK). Which means that it would need to happen later (but not at load-time please). Which means it should go to a place that is not under the package installation folder (to stick to the "installation folders are treated as read-only" paradigm). All this is if all the stuffing (miniconda + core modules) is not going to be bundled in the binary. |
Not really, I need a directory rather than paths to individual files. I don't think BFC handles that.
This could work, but |
Yep I have these "storing paths of more than 100 bytes is not portable" warnings too: https://bioconductor.org/checkResults/3.11/bioc-LATEST/BSgenome/malbec2-buildsrc.html AFAIK they've never caused problems (sounds like an overzealous warning from |
Switching to an Anaconda installer is very satisfactory but If we can't turn off the warnings, it's not going to build in a reasonable time, which would require us to use an external directory for the files. This could follow a BiocFileCache-like model where client packages set up a basilisk instance at run time, and the set-up is a no-op if an instance is already there. But on servers, every individual user will be pulling down their own Anaconda instance rather than using a centrally installed instance... and that would suck. I guess I could add an environment variable directing basilisk to look for content in a user/sysadmin-specified location first before creating an new instance. |
You can try to bring the issues with |
Who's in charge of |
I notice that we could set the |
Looks like this works. Would this be a palatable change for the BioC build system? export R_INSTALL_TAR=tar |
Successful builds in Bioconductor/Contributions#1318 indicate that this is indeed feasible. |
Tagging @hpages and @grimbough.
The idea is to package up a self-contained instance of Python for distribution via the R package management system. This aims to control the version of both Python and all of its modules for a given Bioconductor release, thus allowing BioC packages to reliably deploy with Python code.
The current version of this package works well enough on Unix, provided libffi is available. The tricky bit is what to do with the pre-built binaries for macs and windows. The Python executable can either be linked to a shared library or a static library, and while the latter is more obviously portable, the former is necessary for reticulate to work at all. And we really want reticulate to work here, otherwise no one's going to get on board with this solution.
So, this means that we need to distribute the compiled shared library along with the executable, which is not great. However, R already faces (and solves) this challenge when it distributes pre-built binaries of any package with native code, as these also contain shared libraries. Casual inspection of those libraries indicate that they have hard-coded paths like:
So in principle, we could just piggy-back off that by hard-coding similar paths into the Python shared library (which would also be dumped somewhere in the basilisk installation directory, so it wouldn't be any different from what we see above).
I also had some difficulties linking to OpenSSL on my mac, but I don't know whether that would be a more general problem or just something to do with my system. Looking at the openssl
configure
file suggests that some fiddling is required there.The text was updated successfully, but these errors were encountered: