Add a Linux ARM64 builder node #255

Closed
wants to merge 6 commits into from


martin-g
Contributor

Name: kunpeng1

Signed-off-by: Martin Tzvetanov Grigorov [email protected]

@hpages
Contributor

hpages commented Jan 27, 2023

Hi @martin-g,

Thanks for the PR. After some internal discussion we decided that it would be easier if you configured kunpeng1 as a standalone builder (as opposed to a non-standalone builder that participates in our official daily builds). At least for now.

See our Prepare-Ubuntu-20.04-HOWTO.md document for the difference between a standalone builder and a non-standalone builder (a.k.a. secondary builder). The major differences between the standalone vs non-standalone setups are:

  • With the standalone setup, kunpeng1 won't need to communicate with our central builder nebbiolo1.
  • It will also be able to run the software builds at its own cadence (it seems that kunpeng1 has only 8 CPUs, so there's no way it could keep up with the cadence of our daily builds).
  • It will generate its own local build report.

Note that the standalone setup requires installing and running the Apache server on the machine.

In addition, you will need to make the following changes to your current PR:

  • Add the prerun.sh and postrun.sh scripts to the BBS/3.17/bioc/kunpeng1/ folder (you can copy them from BBS/3.17/bioc/nebbiolo1/). Oh but it looks like you have them already! Note that these would not be needed if we were going for a non-standalone setup for kunpeng1, since the secondary nodes only need to run the run.sh script.

  • You need to replace your current config.sh file with something similar to what's used for nebbiolo1. The config.sh file you have right now is the typical config.sh that we use on our secondary nodes like merida1, that is, on nodes that participate in our daily software builds for BioC 3.17 (and communicate with nebbiolo1 during those builds). OTOH the config.sh file in BBS/3.17/bioc/nebbiolo1/ is the typical config.sh file to use on a standalone builder. Note that it is a more complete version of the simpler config.sh file used for secondary nodes, i.e. it defines more environment variables.

  • Do not define the BBS_OUTGOING_MAP and BBS_FINAL_REPO variables. Those are only needed if you are going to propagate packages to a CRAN-style repository.

  • Set BBS_REPORT_NODES to "kunpeng1".

  • Only set the BBS_PUBLISHED_REPORT_* variables if you're going to publish the build report to another machine. The postrun.sh script will generate the report locally on kunpeng1, in ~biocbuild/public_html/BBS/3.17/bioc/report/, and it will additionally push this report to a different machine (e.g. a public machine). If you're not going to do that, then you don't need to define the BBS_PUBLISHED_REPORT_* variables. In that case you should also comment out the last line in postrun.sh.

  • You don't need to define the BBS_NOTIFY_NODES variable either.
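Taken together, the config.sh adjustments listed above could look roughly like the excerpt below. This is only a sketch: it assumes the rest of the file is copied from BBS/3.17/bioc/nebbiolo1/config.sh, and the unset line is just one hypothetical way of making sure the unneeded variables stay undefined (simply deleting their definitions works too).

```shell
# Hypothetical excerpt of BBS/3.17/bioc/kunpeng1/config.sh (standalone builder).
# Everything not shown is assumed to come from the nebbiolo1 version of the file.

# Only this node appears in the locally generated build report.
export BBS_REPORT_NODES="kunpeng1"

# No propagation to a CRAN-style repository and no notifications,
# so these variables stay undefined.
unset BBS_OUTGOING_MAP BBS_FINAL_REPO BBS_NOTIFY_NODES

# The BBS_PUBLISHED_REPORT_* variables are left undefined as well, because
# in this sketch the report is not pushed to another machine.
```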

Once you are ready to run the builds, try to run them manually first before you run them via a crontab.

First try to run the prerun.sh script and see how it goes:

cd /home/biocbuild/BBS/3.17/bioc/`hostname` && ./prerun.sh

If everything is going as expected, this will produce a long output showing progress. If it completes successfully (this might take an hour or two), then the last lines of output should be something like this:

...
BBS> END STAGE1 loop.
BBS> -------------------------------------------------------------
BBS> STAGE1 SUMMARY:
BBS>     o Working dir: /home/biocbuild/bbs-3.17-bioc/meat
BBS>     o 2160 pkg(s) listed in file meat-index.dcf
BBS>     o 2160 pkg dir(s) queued and processed
BBS>     o 2160 srcpkg file(s) produced
BBS>     o Total time: 889.83 seconds
BBS> -------------------------------------------------------------
BBS> START creating PACKAGES index file for target repo at Thu Jan 26 14:37:02 2023...
BBS>   Copying PACKAGES to BBS_CENTRAL_BASEURL/src/contrib/:
BBS>   runJob(): /usr/bin/rsync -rl --delete --delete-excluded --exclude='.svn' --exclude='.git' --exclude='.github' --exclude='.git_*' PACKAGES /home/biocbuild/public_html/BBS/3.17/bioc/src/contrib
BBS>     pid = 3692967 / time [] = 0.10 seconds / retcode = 0 / OK
BBS> DONE creating PACKAGES index file for target repo at Thu Jan 26 14:39:13 2023.
BBS> [makeTargetRepo] DONE.
BBS> [prerun] DONE make-target-repo at Thu Jan 26 14:39:13 2023.

Next thing to try is the run.sh script:

cd /home/biocbuild/BBS/3.17/bioc/`hostname` && ./run.sh

This might take between 1 and 4 days, or even more, to complete, depending on how powerful your machine is and how you've set the BBS_NB_CPU, BBS_BUILD_NB_CPU, and BBS_CHECK_NB_CPU variables.

If it completes successfully, then the last lines of output should be something like this:

...
BBS> END STAGE4 loop.
BBS> -------------------------------------------------------------
BBS> STAGE4 SUMMARY:
BBS>   o Working dir: /home/biocbuild/bbs-3.17-bioc/meat
BBS>   o 2077 srcpkg file(s) in working dir
BBS>   o 2077 srcpkg file(s) queued and processed
BBS>   o Total time: 22236.41 seconds
BBS> -------------------------------------------------------------
BBS> [STAGE4] DONE at Thu Jan 26 00:56:27 2023.
BBS> START writing BBS_EndOfRun.txt ticket.
BBS>   cd BBS_MEAT_PATH
BBS>   Copying BBS_EndOfRun.txt to BBS_CENTRAL_BASEURL/products-in/nebbiolo1/:
BBS>   runJob(): /usr/bin/rsync -rl --delete --delete-excluded --exclude='.svn' --exclude='.git' --exclude='.github' --exclude='.git_*' BBS_EndOfRun.txt /home/biocbuild/public_html/BBS/3.17/bioc/products-in/nebbiolo1
BBS>     pid = 1553743 / time [] = 0.10 seconds / retcode = 0 / OK
BBS> END writing BBS_EndOfRun.txt ticket.

Last step is to run the postrun.sh script (this will generate the build report):

cd /home/biocbuild/BBS/3.17/bioc/`hostname` && ./postrun.sh

If this completes successfully (should only take a couple of minutes or less), then the last lines of output should be something like this:

...
BBS> [stage6d] Will generate HTML report for nodes: kunpeng1
BBS> [make_all_LeafReports] Current working dir '/home/biocbuild/public_html/BBS/3.17/bioc/report'
BBS> [make_all_LeafReports] Creating report package subfolders and populating them with index.html files ... OK
BBS> [make_node_LeafReports] Node kunpeng1: BEGIN ...
BBS> [make_node_LeafReports] Node kunpeng1: END.
BBS> [write_node_report] Node kunpeng1: BEGIN ...
BBS> [write_node_report] Node kunpeng1: END.
BBS> [make_BioC_MainReport] BEGIN ...
BBS> [make_BioC_MainReport] END.
BBS> [stage6d] DONE at Thu Jan 26 11:11:54 2023.

The HTML report should be available at http://localhost/BBS/3.17/bioc/report/
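Once each script has succeeded on its own, the three manual steps above could be chained so that a later stage only starts if the previous one succeeded. The following is a minimal sketch; run_bbs_stages is a hypothetical helper and not part of BBS.

```shell
# Hypothetical helper (not part of BBS): run prerun.sh, run.sh and postrun.sh
# from the given node directory, stopping at the first script that fails.
run_bbs_stages() {
    node_dir="$1"
    for script in prerun.sh run.sh postrun.sh; do
        echo "=== $script started at $(date) ==="
        # The subshell keeps the caller's working directory unchanged.
        if ! (cd "$node_dir" && "./$script"); then
            echo "=== $script FAILED, stopping ===" >&2
            return 1
        fi
    done
    echo "=== all stages completed ==="
}

# Example call (commented out so that sourcing this file runs nothing):
# run_bbs_stages "/home/biocbuild/BBS/3.17/bioc/$(hostname)"
```

Since run.sh can take days, it is probably worth redirecting the output of such a call to a log file.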

Let us know how it goes.

Best,
H.

@martin-g martin-g marked this pull request as draft January 27, 2023 13:01
@Yikun

Yikun commented Jan 30, 2023

@hpages Thanks for your detailed guide. @martin-g helped configure and launch BBS on a local aarch64 Linux machine.

We finally got the first report, and I also uploaded it here so that others can conveniently take a look (just for review):

https://yikun.github.io/bioconductor/report/long-report.html
https://yikun.github.io/bioconductor-0131/report/long-report.html

We took a rough look at the check results. The reports mainly contain the following kinds of errors:

1. (1320 counts) WARNING about checking loading without being on the library search path ...

* checking loading without being on the library search path ... WARNING
Error in library(a4, lib.loc = "/home/biocbuild/bbs-3.17-bioc/R/library") : 
  there is no package called ‘a4’
Execution halted

It seems the search path should be /home/biocbuild/bbs-3.17-bioc/R/site-library rather than library. Also, Linux x86_64 doesn't seem to have this check.

2. (58 counts) TIMEOUT in BUILD and CHECK

This is probably due to the build machine's performance; we will try to upgrade the build machine flavor.

3. (784 counts) INSTALL and BUILD error:

Some basic packages are not installed or not available; these may need to be fixed one by one.

* installing to library ‘/home/biocbuild/bbs-3.17-bioc/R/site-library’
ERROR: dependency ‘BSgenome.Hsapiens.UCSC.hg18’ is not available for package ‘VanillaICE’
* installing to library ‘/home/biocbuild/bbs-3.17-bioc/R/site-library’
ERROR: dependency ‘tkrplot’ is not available for package ‘affylmGUI’
--- re-building ‘APAlyzer.Rmd’ using rmarkdown
Quitting from lines 69-73 (APAlyzer.Rmd) 
Error: processing vignette 'APAlyzer.Rmd' failed with diagnostics:
there is no package called 'TBX20BamSubset'
--- failed re-building ‘APAlyzer.Rmd’
--- re-building ‘bambu.Rmd’ using rmarkdown
Quitting from lines 117-119 (bambu.Rmd) 
Error: processing vignette 'bambu.Rmd' failed with diagnostics:
there is no package called 'BSgenome.Hsapiens.NCBI.GRCh38'
--- failed re-building ‘bambu.Rmd’

I don't know if this is the same as when the Mac arm64 CI first started. Please let us know if you have any successful experience you can share with us. We'll also try to fix them.

@martin-g
Contributor Author

@hpages @vjcitn Do you have any hints on how to approach issue number 3 above (the missing dependencies)?
What is the proper way to make sure all package dependencies are provided for the package's build and check?

@vjcitn
Contributor

vjcitn commented Jan 31, 2023

The first step in making a builder is going through the appropriate BiocPkgTools *PkgDependency* function and actually installing, without any checking, all needed packages in the R that you are using for building. After that is done you will find that that builder will succeed much more often. There are issues related to completeness of dependency declarations that are under discussion but I hope this first step is clear enough.

@vjcitn
Contributor

vjcitn commented Jan 31, 2023

You will need ~20GB of disk for that installation? Maybe more, maybe less. Be sure not to fill the disk. Also manage the TMPDIR variable well.

@martin-g
Contributor Author

There is more than 350GB of disk space!
/tmp is not a separate partition, so it is part of these 350GB.

I didn't quite understand whether I need to do something manually with BiocPkgTools PkgDependency, or whether this was already done by the first report build so that there will be fewer issues of this kind in the second and following build runs?

@vjcitn
Contributor

vjcitn commented Jan 31, 2023

If there are a lot of failures due to unavailable packages that have not been dropped from CRAN or Bioc, I think you can conclude that the builder just wasn't set up with the 4000+ packages assumed to be available for all installations to succeed. So get the list of all packages and use BiocManager to install them all ... it will avoid redundancies and can use multicore via the Ncpus= argument.

@vjcitn
Contributor

vjcitn commented Jan 31, 2023

Herve will have more definitive information

@martin-g
Contributor Author

I have installed all apt and pip dependencies from https://github.com/Bioconductor/BBS/tree/master/Ubuntu-files/20.04 as explained in https://github.com/Bioconductor/BBS/blob/master/Doc/Prepare-Ubuntu-20.04-HOWTO.md

But I haven't seen anything explaining that I need to install 4000+ R packages from CRAN. Maybe this is the step I missed ?!

@hpages
Contributor

hpages commented Jan 31, 2023

@martin-g @Yikun

You don't need to install any package manually, that would be wild! 😉

The build system normally takes care of installing all Bioconductor software packages + their deps. That's 4000+ packages! Seems that you actually have them: https://yikun.github.io/bioconductor/report/kunpeng1-R-instpkgs.html (to get to that page, click on the link under Installed pkgs at the top of the main page of the report).

About those warnings:

* checking loading without being on the library search path ... WARNING
Error in library(a4, lib.loc = "/home/biocbuild/bbs-3.17-bioc/R/library") : 
  there is no package called ‘a4’
Execution halted

It looks like this package has a loading problem when not on .libPaths:
see the messages for details.

Never seen them before but I suspect they're kind of related to the use of the <R_HOME>/site-library/ folder to install packages. Funny enough, this is a setup that we're planning to use very soon on our Linux builders but that we're not using yet. I'm sorry that you're using a setup that we've not fully tested yet and that is apparently causing problems.

Anyways, these look like spurious warnings to me. An easy way to get rid of them, if you wanted to (and if my diagnosis is correct), would be to nuke the <R_HOME>/site-library/ folder and to re-run the builds (prerun.sh+run.sh+postrun.sh). This will reinstall everything under <R_HOME>/library/ like on our builders.

Best,
H.

@Yikun

Yikun commented Feb 1, 2023

> Never seen them before but I suspect they're kind of related to the use of the <R_HOME>/site-library/ folder to install packages

@hpages Ah, thanks for your reply. I remember that I ran into an issue after moving to the new R version (4.2.x). We ended up setting R_LIBS_SITE manually to solve it. This might also be related.

@Yikun

Yikun commented Feb 3, 2023

> would be to nuke the <R_HOME>/site-library/ folder and to re-run the builds (prerun.sh+run.sh+postrun.sh)

@hpages Looks like it works:

https://yikun.github.io/bioconductor-0201/report/long-report.html

Although there are still some errors, I think we are on the right track!

@vjcitn
Contributor

vjcitn commented Feb 3, 2023

I wonder why

--- re-building ‘DESeq2.Rmd’ using rmarkdown
Quitting from lines 310-318 (DESeq2.Rmd) 
Error: processing vignette 'DESeq2.Rmd' failed with diagnostics:
there is no package called 'tximportData'

@hpages
Contributor

hpages commented Feb 3, 2023

@Yikun Great!

Looks like tximportData failed to install. What do you see in the logs for this? Look at DESeq2.install-summary.dcf and DESeq2.install-out.txt, and at tximportData.install-summary.dcf and tximportData.install-out.txt, in ~/public_html/BBS/3.17/bioc/products-in/kunpeng1/install/. Maybe you can share this here?

@Yikun

Yikun commented Feb 3, 2023

##############################################################################
##############################################################################
###
### Running command:
###
###   /home/biocbuild/bbs-3.17-bioc/R/bin/Rscript -e "source('/home/biocbuild/BBS/utils/installNonTargetPkg.R');installNonTargetPkg('tximportData')"
###
##############################################################################
##############################################################################


trying URL 'https://bioconductor.org/packages/3.17/data/experiment/src/contrib/tximportData_1.27.0.tar.gz'
Content type 'application/x-gzip' length 316755255 bytes (302.1 MB)
============
downloaded 73.8 MB

Error in download.file(url, destfile, method, mode = "wb", ...) : 
  download from 'https://bioconductor.org/packages/3.17/data/experiment/src/contrib/tximportData_1.27.0.tar.gz' failed
In addition: Warning messages:
1: In download.file(url, destfile, method, mode = "wb", ...) :
  downloaded length 77336018 != reported length 316755255
2: In download.file(url, destfile, method, mode = "wb", ...) :
  URL 'https://bioconductor.org/packages/3.17/data/experiment/src/contrib/tximportData_1.27.0.tar.gz': Timeout of 600 seconds was reached
Warning in download.packages(pkgs, destdir = tmpd, available = available,  :
  download of package ‘tximportData’ failed
Package: tximportData
Version: None
Command: /home/biocbuild/bbs-3.17-bioc/R/bin/Rscript -e "source('/home/biocbuild/BBS/utils/installNonTargetPkg.R');installNonTargetPkg('tximportData')"
StartedAt: 2023-02-01 05:35:31 -0000 (Wed, 01 Feb 2023)
EndedAt: 2023-02-01 05:45:38 -0000 (Wed, 01 Feb 2023)
EllapsedTime: 607.0 seconds
RetCode: 0
Status: ERROR

@hpages Hmm, looks like it's due to a network issue...

@vjcitn
Contributor

vjcitn commented Feb 4, 2023

make sure

options(timeout=3600)

or some such setting is in force for the builder

@hpages
Contributor

hpages commented Feb 4, 2023

Yeah options(timeout=3600) might help, which you can achieve by setting R_DEFAULT_INTERNET_TIMEOUT to 3600 in ~/BBS/3.17/Renviron.bioc (note that it's already set to 600 there).

Anyways 10 min to download 73.8 MB seems extremely slow to me, especially for a server that runs the builds. Was probably some intermittent network outage.
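For reference, at the rate seen in the log above (73.8 MB in 600 seconds, roughly 0.12 MB/s) the full 302.1 MB tarball would need about 41 minutes, so a 3600-second limit leaves some headroom. A minimal sketch of changing the setting follows; set_renviron_var is a hypothetical helper, and it assumes that Renviron.bioc holds one KEY=value definition per line.

```shell
# Hypothetical helper: set KEY=VALUE in an Renviron-style file, replacing an
# existing "KEY=..." line if there is one. Assumes one definition per line.
set_renviron_var() {
    file="$1"; key="$2"; value="$3"
    tmp="$(mktemp)"
    # Copy every line except a previous definition of the key
    # (grep exits non-zero when nothing is left, hence "|| true").
    grep -v "^${key}=" "$file" > "$tmp" || true
    echo "${key}=${value}" >> "$tmp"
    # Write back in place so the original file keeps its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example call (commented out):
# set_renviron_var ~/BBS/3.17/Renviron.bioc R_DEFAULT_INTERNET_TIMEOUT 3600
```

The new definition ends up as the last line of the file, which should not matter for independent KEY=value lines.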

@Yikun

Yikun commented Feb 4, 2023

@vjcitn @hpages Thanks! Besides this one, there are still many R package install timeouts [1]. The build machine is in the Singapore region, and I'm not sure whether it is too far from the bioc repo.

Anyway, I have set R_DEFAULT_INTERNET_TIMEOUT and re-run the CI. Let's see how it goes.

[1] https://gist.github.com/Yikun/e9101cb48396f2a698374394cdca525c

@Yikun

Yikun commented Feb 5, 2023

I'm not sure why the package install timeout is 2400:

Package: SNPlocs.Hsapiens.dbSNP155.GRCh37
Version: None
Command: /home/biocbuild/bbs-3.17-bioc/R/bin/Rscript -e "source('/home/biocbuild/BBS/utils/installNonTargetPkg.R');installNonTargetPkg('SNPlocs.Hsapiens.dbSNP155.GRCh37')"
StartedAt: 2023-02-05 09:06:54 -0000 (Sun, 05 Feb 2023)
EndedAt: 2023-02-05 09:46:56 -0000 (Sun, 05 Feb 2023)
EllapsedTime: 2402.0 seconds
RetCode: None
Status: TIMEOUT

But I already set R_DEFAULT_INTERNET_TIMEOUT to 6000 (some packages, like SNPlocs.Hsapiens.dbSNP155.GRCh37 (5GB+), need more time). Another data point: the timeout is 6000 when I execute the command manually:

./config.sh
/home/biocbuild/bbs-3.17-bioc/R/bin/Rscript -e "source('/home/biocbuild/BBS/utils/installNonTargetPkg.R');installNonTargetPkg('SNPlocs.Hsapiens.dbSNP155.GRCh37')"

So I'm just curious where the 2400 timeout is set / configured / imported.

@hpages
Contributor

hpages commented Feb 6, 2023

bioconductor.org is served via Amazon CloudFront so physical distance from the bioc repo should not matter. You can see this with ping bioconductor.org, it should resolve to an IP in your region.

@hpages
Contributor

hpages commented Feb 6, 2023

BBS has its own timeout limit of 40 min per command (INSTALL, BUILD, CHECK). You can change this via environment variables BBS_INSTALL_TIMEOUT, BBS_BUILD_TIMEOUT, and/or BBS_CHECK_TIMEOUT. You would typically define them in ~/BBS/3.17/bioc/kunpeng1/config.sh e.g. BBS_INSTALL_TIMEOUT=6500 (time must be specified in seconds).
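The 40 min limit corresponds to 2400 seconds, which matches the 2402.0 seconds reported in the TIMEOUT entry above. As a sketch, the addition to config.sh could look like this; the use of export mirrors other settings shown in this thread, and the BUILD and CHECK values are placeholders that simply restate the 40 min limit.

```shell
# Hypothetical addition to ~/BBS/3.17/bioc/kunpeng1/config.sh.
# BBS timeouts are given in seconds.

# Value suggested above, a bit over 108 minutes.
export BBS_INSTALL_TIMEOUT=6500

# Placeholders: these keep the 40 min limit (40 * 60 = 2400 seconds)
# and can be raised the same way if BUILD or CHECK need more time.
export BBS_BUILD_TIMEOUT=$((40 * 60))
export BBS_CHECK_TIMEOUT=$((40 * 60))
```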

@Yikun

Yikun commented Feb 8, 2023

Here is latest results: https://yikun.github.io/bioconductor-0208/report/long-report.html

According to the latest results:

1. 94+ INSTALL ERROR:

  • 28 of these ERRORs are valid, and we recorded them here.
  • The remaining 66 ERRORs are in packages that depend on those 28 packages (dependency not available for package).

2. 240+ BUILD ERROR:

3. CHECK ERROR

Some CHECK errors are due to limited network and performance. We are going to upgrade from 8 cores to 32 cores and rerun the job to see whether there is any improvement.

@hpages
Contributor

hpages commented Feb 8, 2023

I'm just going to focus on INSTALL errors for now. Furthermore, I'm just going to focus on CRAN packages that failed to install for now. Unfortunately, the build report doesn't provide the details for these failures.

So here is what I did:

The online report contains a tarball, report.tgz, that I downloaded to take a closer look at the INSTALL errors:

wget https://yikun.github.io/bioconductor-0208/report/report.tgz
tar zxvf report.tgz
cd report/
grep 'is not available for package' */kunpeng1-install.html | cut -d ' ' -f 3 | sort | uniq | wc
#     31      31     478

So 31 packages didn't get installed. To get the list:

grep 'is not available for package' */kunpeng1-install.html | cut -d ' ' -f 3 | sort | uniq

This produces the following output:

‘animation’
‘celda’
‘ChemmineR’
‘CompoundDb’
‘cytomapper’
‘EntropyExplorer’
‘flowClust’
‘gmapR’
‘IHW’
‘kmlShape’
‘lfa’
‘lpsymphony’
‘Maaslin2’
‘magick’
‘mGSZ’
‘mppa’
‘msa’
‘multipanelfigure’
‘openCyto’
‘propr’
‘Rbowtie2’
‘Rbwa’
‘ReactomeContentService4R’
‘ReorderCluster’
‘rsvg’
‘SpatialExperiment’
‘spatstat.core’
‘spp’
‘summarytools’
‘taRifx’
‘tiledb’

Now some of those packages are CRAN packages (e.g. animation) and others are Bioconductor packages (e.g. ChemmineR).

Focusing on CRAN packages for now:

CRAN packages can fail to install for different reasons. One common reason is that the package got removed from CRAN. This is the case for example for EntropyExplorer (required by Bioconductor package epihet): https://cran.r-project.org/package=EntropyExplorer

Other CRAN packages that got removed are: kmlShape (required by Bioconductor package tscR), mGSZ (required by Bioconductor package ASpediaFI), mppa (required by Bioconductor package NBSplice), propr (required by Bioconductor package timeOmics), ReorderCluster (required by Bioconductor package AneuFinder), spatstat.core (required by Bioconductor package Statial), spp (required by Bioconductor package ChIC), and taRifx (required by Bioconductor package pulsedSilac).

Note that those removals from CRAN cause the same breakage on our daily builds: https://bioconductor.org/checkResults/3.17/bioc-LATEST/ Not much we can do, so we're just gonna have to ignore those failures.

Other CRAN packages that didn't install are: magick, rsvg, multipanelfigure, and summarytools. The *.install-out.txt files in ~/public_html/BBS/3.17/bioc/products-in/kunpeng1/install/ on kunpeng1 should contain the details of what went wrong.

Note that magick and rsvg have system requirements: libmagick++-dev and librsvg2, respectively (see https://cran.r-project.org/package=magick and https://cran.r-project.org/package=rsvg for the details). You might want to double check that you have them on kunpeng1. magick and rsvg have many reverse dependencies (direct and indirect) so the fact that they failed to install on kunpeng1 is actually causing a lot of damage. The complete list of deb packages that need to be installed on a Linux builder is documented there: https://github.com/Bioconductor/BBS/blob/master/Doc/Prepare-Ubuntu-20.04-HOWTO.md#18-install-ubuntudeb-packages

For multipanelfigure and summarytools I have no idea what happened. Wanna share multipanelfigure.install-out.txt and summarytools.install-out.txt here?

Thanks,
H.

@Yikun

Yikun commented Feb 8, 2023

BTW, grep 'is not available for package' should probably be changed to grep 'not available for package' (remove "is"), because some failures are due to multiple missing packages (the message then says "are").

@hpages
Contributor

hpages commented Feb 9, 2023

> FYI, I also uploaded the install detail before:
> https://github.com/Yikun/yikun.github.com/blob/master/bioconductor-0208/products-in/kunpeng1/install/

Very useful, thanks!

> So we can see multipanelfigure and summarytools is due to unavailable magick package.

Good to know. That reduces the CRAN installation failures to address to magick and rsvg only.

> rsvg seems due to deps

Easy to fix: sudo apt-get install librsvg2-dev

> No idea about magick

They don't have an error message that is as clear/useful as rsvg, but it looks to me that this is a similar issue. What does dpkg-query -s libmagick++-dev say? Fix with sudo apt-get install libmagick++-dev

> BTW, grep 'is not available for package' might be change to grep 'not available for package' (remove is),
> because some pkg is due to multiple pkgs (are).

You're right, thanks for the catch. Looks like I didn't miss any CRAN package installation failures though, only a few Bioconductor package installation failures, so I was lucky. But good to know for next time.
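A variant of the pipeline that covers both wordings could look like the sketch below. list_missing_deps is a hypothetical helper; it assumes the matching lines have the plain form shown in this thread ("dependency ... is not available for package ..." or "dependencies ..., ... are not available for package ...") and that package names consist only of letters, digits and dots.

```shell
# Hypothetical helper: read R install output on stdin and print the unique
# names of dependencies reported as "not available for package".
# Handles the singular form (dependency 'x' is not available ...) and
# the plural form (dependencies 'x', 'y' are not available ...).
list_missing_deps() {
    grep 'not available for package' \
        | sed -e 's/.*dependenc[a-z]* //' \
              -e 's/ [a-z]* not available for package.*//' \
        | tr -c 'A-Za-z0-9.\n' '\n' \
        | grep -v '^$' \
        | LC_ALL=C sort -u
}

# Example call from inside the unpacked report/ folder (commented out):
# cat */kunpeng1-install.html | list_missing_deps
```

The tr step turns every character that cannot be part of a package name (quotes, commas, spaces) into a line break, so each dependency ends up on its own line without the surrounding quotes.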

If you re-run the builds (after installing the missing external libs for magick and rsvg), you should get much cleaner INSTALL results. Next step we'll focus on the remaining INSTALL failures that are ARM64 -specific i.e. that we see on your report but not on nebbiolo1.

H.

@Yikun

Yikun commented Feb 9, 2023

# dpkg-query -s libmagick++-dev
Package: libmagick++-dev
Status: install ok installed
Priority: optional
Section: oldlibs
Installed-Size: 12
Maintainer: Ubuntu Developers <[email protected]>
Architecture: all
Source: imagemagick
Version: 8:6.9.11.60+dfsg-1.3build2
Depends: imagemagick-6-common (= 8:6.9.11.60+dfsg-1.3build2), libmagick++-6.q16-dev
Pre-Depends: dpkg (>= 1.15.7.2)
Description: object-oriented C++ interface to ImageMagick -- dummy package
 The Magick++ library was a set of C++ wrapper classes that provides access
 to the ImageMagick package functionality from within a C++ application.
 .
 This is a transitional package to help migrate systems to the new
 ABI of libmagick++-6 development files for default channel depth.
 .
 This is a dummy package.  You can safely purge or remove it.
Original-Maintainer: ImageMagick Packaging Team <[email protected]>
Homepage: https://www.imagemagick.org/

I installed them, and will re-run the builds soon:

apt install librsvg2-dev libmagick++-dev -y

Looks like they also need to be added to the following file?
https://github.com/Bioconductor/BBS/blob/master/Ubuntu-files/20.04/apt_cran.txt

@hpages
Contributor

hpages commented Feb 9, 2023

Ok. Are you able to install magick and rsvg manually? Start R and do BiocManager::install(c("magick", "rsvg")).

> Looks like need also add them to ?
> https://github.com/Bioconductor/BBS/blob/master/Ubuntu-files/20.04/apt_cran.txt

Aren't they here already?

libmagick++-dev # for magick
and
librsvg2-dev # for rsvg

Make sure everything listed in all the files in BBS/Ubuntu-files/20.04/ is installed.
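One way to check this is sketched below. Both functions are hypothetical helpers, and they assume that each line of a list file holds one deb package name, optionally followed by a "# comment", as in the two lines quoted above.

```shell
# Print the package names found in a list file, without comments or blanks.
list_debs() {
    sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' "$1"
}

# Print the packages from a list file that dpkg does not report as installed.
report_missing_debs() {
    list_debs "$1" | while read -r pkg; do
        status="$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null)"
        case "$status" in
            "install ok installed") ;;
            *) echo "missing: $pkg" ;;
        esac
    done
}

# Example call (commented out):
# report_missing_debs BBS/Ubuntu-files/20.04/apt_cran.txt
```

Running the second function on each file in that folder would give a quick list of anything that still needs to be installed.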

@Yikun

Yikun commented Feb 9, 2023

> Are you able to install magick and rsvg manually?

Yes, I can install successfully!

> Make sure everything listed in all the files in BBS/Ubuntu-files/20.04/ is installed.

Ah, I will double-check! Also cc @martin-g

@hpages
Contributor

hpages commented Feb 10, 2023

54 packages with an INSTALL failure on the latest report. That's good progress!

These failures fall in 3 categories:

  1. 19 of these packages are also failing to INSTALL on nebbiolo1 (x86) so we should ignore them. We should also ignore deprecated packages like ArrayExpressHTS. So let's focus on INSTALL failures that are specific to the Linux ARM64 platform.

  2. I see 13 Bioconductor packages that fail to INSTALL because of a configure or compilation error that seems to be specific to the Linux ARM64 platform: bgx, Rbowtie2, FLAMES, flowClust, gmapR, LEA, lpsymphony, msa, NetPathMiner, Rbec, Rbwa, Rhisat2, and SAIGEgds.

    Note that some of them (bgx, flowClust, lpsymphony, msa) fail because they use an outdated configure script that fails to recognize the Linux ARM64 platform. For example, for bgx:

    ...
    checking whether C compiler accepts -O3... yes
    checking build system type... ./config.guess: unable to guess system type
    
    This script, last modified 2006-07-02, has failed to recognize
    the operating system you are using. It is advised that you
    download the most up to date version of the config scripts from
    
      http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.guess
    and
      http://savannah.gnu.org/cgi-bin/viewcvs/*checkout*/config/config/config.sub
    ...
    

    What's interesting is that we also see a configure script error for CRAN package tiledb (which Bioconductor package TileDBArray depends on):

    ...
    checking for unistd.h... yes
    checking for tiledb/tiledb... no
    configure: error: currently unsupported system Linux on aarch64
    ERROR: configuration failed for package ‘tiledb’
    * removing ‘/home/biocbuild/bbs-3.17-bioc/R/library/tiledb’
    

    As you can see here, CRAN doesn't check packages for the Linux ARM64 platform, so, not surprisingly, the Check Results for tiledb are clean. Furthermore, tiledb is unlikely to be an isolated case, and I suspect that there are probably many more CRAN packages that fail to install on this platform.

    This actually brings up the important question of whether it makes sense for Bioconductor to try to support this new platform if CRAN doesn't support it yet.

  3. Other than that, the 22 remaining INSTALL failures are indirect failures i.e. these packages fail to install because they depend on another package that failed to install (e.g. CircSeqAlignTk fails because Rbowtie2 and Rhisat2 fail).
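To spot other packages that ship an old config.guess (like the one dated 2006-07-02 in the bgx output above), the date recorded in each script could be listed. This is a sketch: config_guess_date is a hypothetical helper, and it assumes the script contains a line of the form timestamp='YYYY-MM-DD', as GNU config.guess files normally do. Which date is recent enough to recognize Linux ARM64 is not stated in this thread.

```shell
# Hypothetical helper: print the date recorded in a config.guess script.
# Assumes a line of the form timestamp='YYYY-MM-DD' near the top of the file.
config_guess_date() {
    sed -n "s/^timestamp='\([0-9-]*\)'.*/\1/p" "$1" | head -n 1
}

# Example: list the config.guess date of every unpacked package source
# under the current directory (commented out):
# find . -name config.guess | while read -r f; do
#     echo "$(config_guess_date "$f")  $f"
# done
```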

The big number of R CMD build and R CMD check TIMEOUTs and ERRORs that you get on your report is probably due to the excessive load that you are putting on kunpeng1 by using 8 workers throughout the entire builds. Right now the load is so high that the Linux kernel is apparently constantly killing jobs randomly during the builds in a desperate effort to keep the machine alive (you can see this by the huge number of "Killed" that can be found in the CHECK logs). I suggest that you reduce the load by using the following settings (we use something similar on our Mac arm64 builder):

export BBS_NB_CPU=6         # 8 cores are available
export BBS_BUILD_NB_CPU=4   # 8 cores are available
export BBS_CHECK_NB_CPU=5   # 8 cores are available

Yes, the fact that the basic M1 chip has only 8 logical cores is a serious limitation as far as the builds are concerned 😞

H.

@Yikun

Yikun commented Feb 11, 2023

> I see 13 Bioconductor packages that fail to INSTALL because of a configure or compilation error that seems to be specific to the Linux ARM64 platform: bgx, Rbowtie2, FLAMES, flowClust, gmapR, LEA, lpsymphony, msa, NetPathMiner, Rbec, Rbwa, Rhisat2, and SAIGEgds.

Yes, the failures fall into the categories below, and we have started working with upstream to fix them:

> This actually brings the important question of whether it makes sense for Bioconductor to try to support this new platform if CRAN doesn't support it yet.

Actually, this is separate work that has already made some pretty good progress with AHUG (Arm HPC User Group). From the shared results we can see that the problems of CRAN on aarch64 Linux are nothing too serious.

cc @tom91136 @jyoung3131

> I suggest that you reduce the load by using the following settings

Thanks, we have already upgraded the node to a more powerful flavor (32 cores, 64G memory). Currently we set BBS_NB_CPU to 26; it looks like we need to reduce it? I'm assuming the values should scale by 4, the same as the total number of cores (8 --> 32)?

export BBS_NB_CPU=24         # 32 cores are available
export BBS_BUILD_NB_CPU=16   # 32 cores are available
export BBS_CHECK_NB_CPU=20   # 32 cores are available

Or do you have any other suggestions?

@hpages
Contributor

hpages commented Feb 11, 2023

> Thanks, we already upgrade the node to a more powerful flavor 32 Core 64G Mem, current
> we set BBS_NB_CPU to 26, looks like we need reduce them? I'm assuming it should be 4
> times the same as the total (8 --> 32)?
>
> export BBS_NB_CPU=24         # 32 cores are available
> export BBS_BUILD_NB_CPU=16   # 32 cores are available
> export BBS_CHECK_NB_CPU=20   # 32 cores are available
>
> Or could you give some more suggestion?

Probably a good start. Let's see how it goes and you will be able to adjust if needed based on the results.
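The scaling used here keeps the ratios of the 8-core suggestion (6/8, 4/8 and 5/8 of the available cores). As a sketch, it could be written as a small helper; suggest_bbs_workers is hypothetical and only reproduces that proportional rule, it does not measure the actual load.

```shell
# Hypothetical helper: print BBS worker settings for a machine with the given
# number of cores, keeping the 6/8, 4/8 and 5/8 ratios suggested for 8 cores.
# Integer division rounds down.
suggest_bbs_workers() {
    cores="$1"
    echo "export BBS_NB_CPU=$((cores * 6 / 8))"
    echo "export BBS_BUILD_NB_CPU=$((cores * 4 / 8))"
    echo "export BBS_CHECK_NB_CPU=$((cores * 5 / 8))"
}

# Example call (commented out):
# suggest_bbs_workers 32
```

For 32 cores this gives 24, 16 and 20, the same values as proposed above. If many "Killed" entries still show up in the CHECK logs, the numbers would need to go down further.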

@tom91136

@Yikun Thanks for CCing me, the work for getting CRAN to build is very much active. At the current stage we've offered our dual socket TX2 nodes for build checking and there seems to be some interest from CRAN maintainers.
However, the email was sent during SC22 (Nov last year), and we were told they had very little capacity for new projects at that time.
FYI, our original plan was to get CRAN to mostly work and then extend the same offer for BioC.

During SC22 we've discussed with AWS the requirements to conduct these checks in a sustainable way as the package count is still growing.
We're currently working with them to find an open-source way of doing the daily checks.
It's in their interest to offer and showcase compute as they are pushing Graviton (AArch64 Neoverse) for EC2.

The issues you've mentioned pretty much match what we're seeing on CRAN as well (the AHUG presentation talks about these too):

Here's a non-exhaustive list I've compiled of bad CRAN installs on AArch64 with AlmaLinux 8 (we're doing this for HPC use cases, hence a RHEL derivative):

  • tiledb =>
    NEED: stdcxx fs
    If you manage to get the correct packages to install, you get this at runtime:

       Error: package or namespace load failed for ‘tiledb’ in dyn.load(file, DLLpath = DLLpath, ...):
        unable to load shared object ‘/root/tmp/R.check/r-devel-gcc/work/libdir/00LOCK-tiledb/00new/tiledb/libs/tiledb.so’:
         /root/tmp/R.check/r-devel-gcc/work/libdir/00LOCK-tiledb/00new/tiledb/libs/tiledb.so: undefined symbol: _ZNSt10filesystem6statusERKNS_7__cxx114pathE
       Error: loading failed
       Execution halted
       ERROR: loading failed

    I've written code that uses #include <filesystem> before and it never ended up doing this, so it may be a build system issue.

  • float =>

      unable to load shared object ‘/root/tmp/R.check/r-devel-gcc/Work/build/Packages/00LOCK-float/00new/float/libs/float.so’:
      /root/tmp/R.check/r-devel-gcc/Work/build/Packages/00LOCK-float/00new/float/libs/float.so: undefined symbol: iparam2stage_
    
  • oolong =>

    Error in normalize(self$components, "l1") : 
      could not find function "normalize"
  • SIMLR =>

    tsne.cpp:883:64: error: too few arguments to function ‘void dgemm_(const char*, const char*, const int*, const int*, const int*, const double*, const double*, const int*, const double*, const int*, const double*, double*, const int*, size_t, size_t)’
        dgemm_("T", "N", &N, &N, &D, &a1, X, &D, X, &D, &a2, DD, &N);
    
  • patternplot =>

    Error in title(...) : 
      X11 font -adobe-helvetica-%s-%s-*-*-%d-*-*-*-*-*-*-*, face 1 at size 12 could not be loaded
    
  • lfa => error: too few arguments to function ‘dgemv_’

  • aplpack => X connection to :104 broken (explicit kill or server shutdown).
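
For the tiledb item, here is a small triage sketch that recognises this kind of load error from a log line. The function name `is_stdfs_link_error` is hypothetical. The hint it prints assumes a GCC 8 toolchain, where std::filesystem lives in a separate library linked with -lstdc++fs; that would be consistent with the "NEED: stdcxx fs" note but has not been confirmed as the cause here.

```shell
#!/bin/sh
# Hypothetical triage helper: flag a load error caused by an unresolved
# std::filesystem symbol. "_ZNSt10filesystem" is the mangled prefix of names
# in namespace std::filesystem (Itanium C++ ABI, as used by GCC).
is_stdfs_link_error() {
    printf '%s\n' "$1" | grep -q 'undefined symbol: _ZNSt10filesystem'
}

msg='tiledb.so: undefined symbol: _ZNSt10filesystem6statusERKNS_7__cxx114pathE'
if is_stdfs_link_error "$msg"; then
    # Assumption: the toolchain is GCC 8, where std::filesystem is shipped
    # in a separate static library that must be linked with -lstdc++fs.
    echo "hint: try adding -lstdc++fs to the link line"
fi
```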

@Yikun

Yikun commented Mar 1, 2023

Here is the latest report: https://yikun.github.io/bioconductor-0301/report/long-report.html

31 packages with an INSTALL failure on the latest report.

I see 13 Bioconductor packages that fail to INSTALL because of a configure or compilation error that seems to be specific to the Linux ARM64 platform: bgx, Rbowtie2, FLAMES, flowClust, gmapR, LEA, lpsymphony, msa, NetPathMiner, Rbec, Rbwa, Rhisat2, and SAIGEgds.

In the last month, 8 packages completed their Linux ARM64 fix: bgx, Rbowtie2, FLAMES, gmapR, LEA, msa, Rbwa and SAIGEgds.

(Many thanks to @hpages for the guidance and to the package maintainers for their review and help!)

There are still 5 fixes that need to be merged upstream:

@hpages
Contributor

hpages commented Mar 1, 2023

Only 5 Linux ARM64 specific INSTALL failures compared to 13 three weeks ago. That's great progress!

Next we'll need to start looking at Linux ARM64 specific BUILD and CHECK failures like basilisk, biobtreeR, decoupleR, DESeq2, etc... Unfortunately there are many of those. In some cases the error seems to be due to slightly different results produced by operations that involve floating point arithmetic. These are not going to be easy to troubleshoot 😟

@martin-g
Contributor Author

Hello,

Here is a summary of the first report for Bioconductor 3.18:

  • INSTALL errors

    • 19 packages in total
    • 7 of them also fail on x86_64 (BGmix, biomvRCNS, CancerInSilico, OmicsLonDA, qrqc, Travel, trena)
    • 12 are ARM64 specific:
      • CircSeqAlignTk (depends on Rhisat2)
      • ideal (depends on IHW) (same for macOS arm64)
      • IHW (depends on lpsymphony)
      • lpsymphony (also fails on macOS arm64)
      • Maaslin2 (depends on lpsymphony) (also fails on macOS arm64)
      • Macarron (depends on Maaslin2) (also fails on macOS arm64)
      • MMUPHin (depends on Maaslin2) (also fails on macOS arm64)
      • Rarr (fatal error: R_ext/Error.h: No such file or directory)
      • Rbec (rcpp_test.cpp:3:10: fatal error: emmintrin.h: No such file or directory)
      • rfaRm (error: Timeout was reached: [rfam.org] SSL connection timeout)
      • Rhisat2 (error: unrecognized command-line option ‘-m64’, g++: error: unrecognized command-line option ‘-msse2’)
      • vsclust (RcppExports.o: error adding symbols: file in wrong format)
  • BUILD errors

    • 83 in total
    • 31 of them fail also on x86_64 (ATACCoGAPS, baySeq, BioMM, biomvRCNS, cliqueMS, dce, deco, DMRforPairs, easier, EpiMix, fcoex, genbankr, GeneGA, gespeR, HelloRanges, HPAStainR, imageHTS, mAPKL, MEIGOR, Metab, MSstatsSampleSize, netbiov, NeuCA, OmicsLonDA, pareg, PFP, projectR, psygenet2r, ReportingTools, RNAdecay, TCseq, VplotR)
    • the rest are ARM64 only:
      • biobtreeR (sh: 1: /tmp/RtmpwnUI9L/biobtree: Exec format error), see tamerh/biobtree#20 (Add support for Linux ARM64)
      • cfTools (PackagesNotFoundError: The following packages are not available from current channels: - python=3.7.0)
      • CHRONOS (killed)
      • CNVRanger (reason: database disk image is malformed)
      • customCMPdb (database disk image is malformed)
      • densvis (PackagesNotFoundError: The following packages are not available from current channels: - umap-learn=0.5.0)
      • DEWSeq (object 'ihw' not found)
      • DMRcate (reason: error reading from connection)
      • doppelgangR (there is no package called 'curatedOvarianData')
      • eisaR (The Rhisat2 package is required for alignments, but not installed.)
      • enrichTF (Looks like you have more than one installed BSgenome data package that matches genome: hg19)
      • exomePeak2 - Looks like you have more than one installed BSgenome data package that matches genome: hg19
      • fobitools - Timeout was reached: [www.metabolomicsworkbench.org] Resolving timed out after 10000 milliseconds
      • gcapc - Looks like you have more than one installed BSgenome data package that matches genome: hg19
      • GEOmetadb - download from 'https://gbnci.cancer.gov/geo/GEOmetadb_demo.sqlite.gz' failed
      • GEOquery - Timeout was reached: [] Operation timed out after 120000 milliseconds with 1245184 out of 8480960 bytes received
      • HiCool - PackagesNotFoundError: The following packages are not available from current channels:
        • samtools=1.16.1
        • bowtie2=2.5.0
      • ideal - ERROR: dependency ‘IHW’ is not available for package ‘ideal’
      • ImmuneSpaceR - Invalid credential or deactivated account. Check your account in the portal.
      • LowMACA - http status: 400 Bad Request Job 'clustalo-R20230509-040817-0450-86040759-p1m' is still queued
      • MADSEQ - Looks like you have more than one installed BSgenome data package that matches genome: hg19
      • maser - Error in the HTTP2 framing layer
      • mbOmic - cannot open the connection to 'http://enterotypes.org/ref_samples_abundance_MetaHIT.txt'
      • megadepth - '/tmp/Rtmp3Nnjqz/bw.mean.annotation.tsv' does not exist
      • MetaboAnnotation - database disk image is malformed
      • metaseqR2 - sh: 1: /tmp/RtmpFCRgU7/test_custom/genePredToGtf: Exec format error
      • motifmatchr - Looks like you have more than one installed BSgenome data package that matches genome: hg19
      • pipeFrame - Looks like you have more than one installed BSgenome data package that matches genome: hg19
      • QuasR - The Rhisat2 package is required for alignments, but not installed.
      • RcisTarget - cannot open URL 'https://gbiomed.kuleuven.be/apps/lcb/i-cisTarget/examples/input_files/human/peaks/Encode_GATA1_peaks.bed'
      • RGMQL - Utils: Service 'sparkDriver' could not bind on a random free port
      • scviR - ERROR: No matching distribution found for jaxlib
      • slingshot - At core/paths/dijkstra.c:364 : Weight vector must not contain NaN values, Invalid value
      • TileDBArray - Expecting an external pointer: [type=NULL]
      • TOP - there is no package called 'curatedOvarianData'
      • velociraptor - PackagesNotFoundError: The following packages are not available from current channels:
        • anndata=0.7.4
        • umap-learn=0.4.6
        • stdlib-list=0.6.0
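
Several of the ARM64-only INSTALL failures above (Rhisat2 with -m64 and -msse2, Rbec with the SSE2 header emmintrin.h) come from x86-specific compiler flags or headers. One common pattern is to choose such flags from the machine name. The sketch below is only an illustration: the function name `arch_flags` is hypothetical, and this is not the fix that was applied to either package.

```shell
#!/bin/sh
# Hypothetical helper, not taken from Rhisat2 or Rbec: choose x86-only
# compiler flags from the machine name reported by `uname -m`.
arch_flags() {
    case "$1" in
        x86_64|amd64)  echo "-m64 -msse2" ;;  # x86-only flags
        aarch64|arm64) echo "" ;;             # g++ on ARM64 rejects -m64 and -msse2
        *)             echo "" ;;             # unknown machine: add nothing
    esac
}

# Example: flags for the current machine.
echo "extra flags: $(arch_flags "$(uname -m)")"
```

A configure script or Makefile could call something like this instead of hard-coding the flags, so that the same sources build on both x86_64 and ARM64.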

@martin-g
Contributor Author

martin-g commented May 11, 2023

Could someone please check the new configs for 3.18 in this PR ?
We ask because we have some doubts whether the latest report really uses Bioc 3.18.

$ R
...
> BiocManager::version()
[1] ‘3.17’

Trying to re-install it still uses 3.17:

> install.packages("BiocManager")
trying URL 'https://cloud.r-project.org/src/contrib/BiocManager_1.30.20.tar.gz'
Content type 'application/x-gzip' length 265248 bytes (259 KB)
==================================================
downloaded 259 KB

* installing *source* package 'BiocManager' ...
** package 'BiocManager' successfully unpacked and MD5 sums checked
** using staged installation
** R
** inst
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (BiocManager)

The downloaded source packages are in
	'/tmp/RtmpXa8oMh/downloaded_packages'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
> BiocManager::version()
[1] '3.17'

Do we need to update something more ?
Thanks !

For the last run the installation of R was in /home/biocbuild/bbs-3.17-bioc/R (BBS_R_HOME=$BBS_WORK_TOPDIR/R) but this caused confusion and I moved it to $HOME/R-devel_VERSION

@jwokaty
Collaborator

jwokaty commented May 11, 2023 via email

@martin-g
Contributor Author

> BiocManager::install(version="devel")
'getOption("repos")' replaces Bioconductor standard repositories, see
'help("repositories", package = "BiocManager")' for details.
Replacement repositories:
    CRAN: https://cloud.r-project.org
Upgrade 1456 packages to Bioconductor version '3.18'? [y/n] y

Thank you, @jwokaty !
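
To catch this kind of mismatch before a run, a pre-flight check along the following lines might help. It is a sketch only: the function name `versions_match` and the `expected` variable are hypothetical, and the only R call used is `BiocManager::version()`, the same one as in the session above.

```shell
#!/bin/sh
# Hypothetical pre-flight check: compare the Bioconductor version that R
# reports with the one the builder is expected to use.
versions_match() {
    # $1 = expected version, $2 = version reported by BiocManager
    [ "$1" = "$2" ]
}

expected="3.18"
if command -v Rscript >/dev/null 2>&1; then
    # An empty result (e.g. BiocManager not installed) counts as a mismatch.
    actual=$(Rscript -e 'cat(as.character(BiocManager::version()))' 2>/dev/null) || actual=""
    if versions_match "$expected" "$actual"; then
        echo "OK: Bioconductor $actual"
    else
        echo "Mismatch: expected $expected, got '$actual'"
    fi
else
    echo "Rscript not found; skipping check"
fi
```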

@martin-g
Contributor Author

I just re-tested all build failures caused by "Looks like you have more than one installed BSgenome data package that matches genome: hg19" and they no longer fail!
I guess BiocManager::install(version="devel") has fixed the problem somehow.

For example:

 R CMD build --keep-empty-dirs --no-resave-data motifmatchr
* checking for file ‘motifmatchr/DESCRIPTION’ ... OK
* preparing ‘motifmatchr’:
* checking DESCRIPTION meta-information ... OK
* cleaning src
* running ‘cleanup’
* installing the package to build vignettes
* creating vignettes ... OK
* cleaning src
* running ‘cleanup’
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* looking to see if a ‘data/datalist’ file should be added
* building ‘motifmatchr_1.23.0.tar.gz’

biocbuild@kunpeng1 ~/git> git clone https://git.bioconductor.org/packages/MADSEQ
Cloning into 'MADSEQ'...
remote: Enumerating objects: 560, done.
remote: Counting objects: 100% (560/560), done.
remote: Compressing objects: 100% (369/369), done.
remote: Total 560 (delta 311), reused 293 (delta 171), pack-reused 0
Receiving objects: 100% (560/560), 45.60 MiB | 1.23 MiB/s, done.
Resolving deltas: 100% (311/311), done.
biocbuild@kunpeng1 ~/git> R CMD build --keep-empty-dirs --no-resave-data MADSEQ
* checking for file ‘MADSEQ/DESCRIPTION’ ... OK
* preparing ‘MADSEQ’:
* checking DESCRIPTION meta-information ... OK
* installing the package to build vignettes
* creating vignettes ... OK
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
  NB: this package now depends on R (>= 3.5.0)
  WARNING: Added dependency on R >= 3.5.0 because serialized objects in
  serialize/load version 3 cannot be read in older versions of R.
  File(s) containing such objects:
    ‘MADSEQ/inst/AQP7/hg19_AQP7_gr.RDS’
    ‘MADSEQ/inst/AQP7/hg38_AQP7_gr.RDS’
    ‘MADSEQ/inst/AQP7/hs37d5_AQP7_gr.RDS’
    ‘MADSEQ/inst/HLA/hg19_HLA_gr.RDS’ ‘MADSEQ/inst/HLA/hg38_HLA_gr.RDS’
    ‘MADSEQ/inst/HLA/hs37d5_HLA_gr.RDS’ ‘MADSEQ/inst/gap/hg19_gap_gr.RDS’
    ‘MADSEQ/inst/gap/hg38_gap_gr.RDS’ ‘MADSEQ/inst/gap/hs37d5_gap_gr.RDS’
* building ‘MADSEQ_1.27.0.tar.gz’
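
The re-test above can be repeated for a list of packages with a small loop. This is a sketch: the function name `rebuild_pkg` and the `DRY_RUN` switch are hypothetical, while the clone URL pattern and the `R CMD build` flags are the ones shown in the session above. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: print (or run) the clone-and-build steps for a package.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run.
rebuild_pkg() {
    pkg=$1
    for cmd in \
        "git clone https://git.bioconductor.org/packages/$pkg" \
        "R CMD build --keep-empty-dirs --no-resave-data $pkg"
    do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "$cmd"
        else
            # Unquoted on purpose so the string splits into command + arguments.
            $cmd || return 1
        fi
    done
}

for pkg in motifmatchr MADSEQ; do
    rebuild_pkg "$pkg"
done
```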

@emiliofernandes

I want to thank all the people involved in adding support for Linux ARM64!
In our department we are moving to ARM64 to reduce expenses and so far everything works quite well!
I hope to see this PR merged and Linux ARM64 become part of the official test reports soon!

@markjens

Great work! We are also considering using ARM64 machines for our work!
What are the plans for this PR? Will Linux ARM64 reports be built regularly for 3.18, as for the other platforms?

@martin-g
Contributor Author

Thank you for the nice words, @emiliofernandes & @markjens !
Please let us know if you face any issues related to Linux ARM64 and we will try to help resolve them!

Here is a link to the latest (from 31.05.2023) run of BBS on Ubuntu ARM64 - https://yikun.github.io/latest-bioc/report/long-report.html
I also hope Linux ARM64 will be added to the official BBS runs soon !

@hpages
Contributor

hpages commented Jun 2, 2023

@emiliofernandes @markjens We're working on adding Linux ARM64 to the official BBS runs. See #292

Thanks for your interest!

Name: kunpeng1

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
The files are copied from nebbiolo1.

According to
https://github.com/Bioconductor/BBS/blob/master/Doc/Prepare-Ubuntu-20.04-HOWTO.md#25-add-software-builds-to-biocbuilds-crontab
those should be added to crontab.
But for some reason only nebbiolo1 has them. The other builders don't
have these scripts.

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
For the time being kunpeng1 won't be used as a secondary builder

Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
Signed-off-by: Martin Tzvetanov Grigorov <[email protected]>
@hpages
Contributor

hpages commented Jun 8, 2023

First Bioconductor daily report with kunpeng2 results: https://bioconductor.org/checkResults/3.18/bioc-LATEST/long-report.html 🎉

@martin-g @Yikun Can we close this PR? It's been superseded by #293

@martin-g
Contributor Author

martin-g commented Jun 8, 2023

Thank you all!
