This PR adds support for GA4GH's htsget protocol. To test it, I ran htsget-rs, our own htsget server at @umccr, like so:
$ docker run --platform linux/amd64 -p 8081:8081 -p 8080:8080 -v $HOME/dev/umccr/sage-data/sample_data:/data/bam ghcr.io/umccr/htsget-rs:latest
And then ran SAGE from the command line as follows:
Please note the `htsget://` URIs in `-reference_bam` and `-tumor_bam`. That is the addition in this pull request: being able to access resources remotely rather than from a local filesystem. This change targets SAGE, but there's no reason to believe it couldn't be applied to (all?) the other tools in this repo, extending the distributed storage benefit to your whole toolchain (and oncoanalyser).

The command line arguments above (and the `-v` argument on the docker container) assume data stored in `sage-data` that is both large and private, so I'll not be able to share it publicly, but I hope you can reproduce the setup with your own data. I found it hard to put together a minimal integration test for this since it involves quite big files. On the unit/functional side, I'm assuming that htsjdk already has enough test coverage for htsget, but I'd be happy to take guidance on any tests you find lacking in this PR.

It would be preferable to extend this htsget support to VCF files as well as BAM files, but unfortunately the `htsjdk` library does not support that at the time of writing, /cc @lbergelson, @cmnbroad.

Thanks @scwatts @ohofmann @reisingerf @mmalenic for making this possible!
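
For illustration only, here is a minimal sketch of the kind of check that tells an `htsget://` argument apart from a local path. It is not the code in this PR and it does not use htsjdk: the `BamLocation` class, the `isHtsget` method, and the example URI path are all hypothetical, and it relies only on `java.net.URI` from the standard library.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration; not part of this PR or of htsjdk.
public final class BamLocation {
    private BamLocation() {}

    /** Returns true when the argument parses as a URI whose scheme is "htsget". */
    public static boolean isHtsget(String bamArg) {
        try {
            String scheme = new URI(bamArg).getScheme();
            // getScheme() returns null for plain paths such as /data/bam/x.bam
            return "htsget".equalsIgnoreCase(scheme);
        } catch (URISyntaxException e) {
            // Not parseable as a URI (e.g. a path containing spaces): treat as local.
            return false;
        }
    }

    public static void main(String[] args) {
        // The URI path below is an assumed example, not a real endpoint.
        System.out.println(isHtsget("htsget://localhost:8080/reads/sample"));
        System.out.println(isHtsget("/data/bam/sample.bam"));
    }
}
```

A tool could use a check like this to pick a remote reader for `-reference_bam` and `-tumor_bam` and fall back to the existing file-based path otherwise.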