v2 release #135

Merged 205 commits on Aug 7, 2023
21faebb
update schema to support traits, publications, and pgs_ids
nebfield Nov 24, 2022
53de43e
fix required element (sample -> sampleset)
nebfield Nov 28, 2022
c5f4026
fix *_path -> file triplet (e.g. bed, bim, fam)
nebfield Dec 1, 2022
c370b28
setup ancestry prep
nebfield Dec 8, 2022
fcfc92f
Add high-LD exclusion regions for GRCh37/38 along with source informa…
smlmbrt Dec 8, 2022
851de75
Remove chr prefix to match our variant indexing
smlmbrt Dec 8, 2022
0976285
set up qc
nebfield Dec 8, 2022
8174273
move reference file into ancestry assets dir
nebfield Dec 8, 2022
6d42b09
Add data provenance of population reference datasets
smlmbrt Dec 8, 2022
b2e6128
refactor structure
nebfield Dec 9, 2022
245f413
automatically detect compressed input without using parameter
nebfield Dec 9, 2022
111ce3b
stop collisions in storedir
nebfield Dec 9, 2022
e2ac495
finish subworkflow
nebfield Dec 13, 2022
72fe638
integrate bootstrap subworkflow
nebfield Dec 14, 2022
9c0578b
set up pca subworkflow
nebfield Dec 14, 2022
ba6fb9e
integrate ancestry projection
nebfield Dec 15, 2022
da36436
Output key ancestry files
smlmbrt Dec 15, 2022
4778527
Output uncompressed PCA projections
smlmbrt Dec 15, 2022
971b498
Should be pruning on the reference dataset, not the target dataset (w…
smlmbrt Dec 15, 2022
66aea62
bump nextflow required version (query param bug fix needed)
nebfield Dec 16, 2022
bb1edef
PCA should only be calculated using autosomal variants from reference
smlmbrt Dec 16, 2022
956cce6
Merge branch 'ancestry' of https://github.com/PGScatalog/pgsc_calc in…
smlmbrt Dec 16, 2022
479149f
rename quality_control -> filter_reference
nebfield Dec 16, 2022
a6d2547
Fix/update incorrect groupby in report (will require utils v0.4.0)
smlmbrt Jan 10, 2023
414922a
draft awk implementation
nebfield Jan 10, 2023
afa7071
sorting while printing is much faster
nebfield Jan 11, 2023
72b6bca
A more straightforward bash script to replace the awk implementation
smlmbrt Jan 11, 2023
3944504
fix psam url
nebfield Jan 11, 2023
accd8ea
remove empty metadata
nebfield Jan 11, 2023
b8be6de
add build to output name and remove build from storeDir path
nebfield Jan 11, 2023
0e3c03e
force bash to explode loudly if anything goes wrong
nebfield Jan 12, 2023
65a99d5
fix + check for multiple files in relabel_pvar output
nebfield Jan 12, 2023
ab23005
use zstd for ref database, gz is very slow
nebfield Jan 12, 2023
5c4637c
remove uuoc + fix missing print for bims
nebfield Jan 13, 2023
20616d4
add intersect_variants module
nebfield Jan 13, 2023
b431a8f
split matching from make_compatible into match subworkflow
nebfield Jan 13, 2023
e04b1bf
fix storeDir directory path
nebfield Jan 13, 2023
cbcf7f1
fix extract
nebfield Jan 13, 2023
e0873bf
update filtering steps
nebfield Jan 13, 2023
849fcf9
get plink2_pca working
nebfield Jan 16, 2023
007b98b
fix import error
nebfield Jan 16, 2023
3a7e40a
Identify strand-ambiguous variants in matches. Remove old awk impleme…
smlmbrt Jan 16, 2023
33c8286
write and compress strand ambiguous
nebfield Jan 16, 2023
e1eb854
fix --extract input
nebfield Jan 16, 2023
fe3b720
add vmiss output to modules
nebfield Jan 16, 2023
ac320b3
add vmiss to module inputs
nebfield Jan 16, 2023
e23330c
filter extract_database output to match target build
nebfield Jan 16, 2023
35ff6d0
re-enable plink2_project
nebfield Jan 16, 2023
c5ac596
Simplify & fix variant filtering for PCA derivation
smlmbrt Jan 16, 2023
7814810
fix subworkflow input
nebfield Jan 17, 2023
aeb9056
Fix duplicated plink flag in relabelpvar
smlmbrt Jan 17, 2023
8035576
fix duplicate missing flag
nebfield Jan 17, 2023
b673d6a
how did this get here?
nebfield Jan 17, 2023
1dbc1df
add checksum to make ref db errors more obvious
nebfield Jan 17, 2023
73945ac
optionally include matched variants in match_combine process input
nebfield Jan 18, 2023
16a55e0
intersect requires higher memory to avoid crashing
smlmbrt Jan 18, 2023
1a18e1c
Merge branch 'ancestry' of https://github.com/PGScatalog/pgsc_calc in…
smlmbrt Jan 18, 2023
83aaa20
only extract current target build
nebfield Jan 19, 2023
f0e4322
make projection ignore extra chroms
nebfield Jan 19, 2023
0f24631
only set scratch on HPCs to simplify debugging
nebfield Jan 20, 2023
3cc7173
cut intersected variants properly
nebfield Jan 20, 2023
5d4cebe
better caching
nebfield Jan 20, 2023
1b31d1d
fix null tag
nebfield Jan 20, 2023
8b9644f
check match_combine always runs (--only_projection failed randomly)
nebfield Jan 20, 2023
c98e46e
skip header in filter ID file
nebfield Jan 20, 2023
ccee200
fix combining when skipping ancestry projection
nebfield Jan 20, 2023
a9889cc
enable skipping ancestry
nebfield Jan 20, 2023
3f68b52
fix chain files missing from reference db
nebfield Jan 20, 2023
ff8d003
check ancestry input isn't split
nebfield Jan 20, 2023
b009a21
Initial commit of python helper script to swap IDs between datasets (…
smlmbrt Jan 20, 2023
79e55b6
Remove multi-allelics from the PCA variant set (causing mis-alignment…
smlmbrt Jan 24, 2023
b4b3d6f
Fix input
smlmbrt Jan 24, 2023
63a0248
Fix escape character in nextflow module
smlmbrt Jan 26, 2023
2db340c
Increase # of PCs calculated to 20 (ToDO: consider making this a modi…
smlmbrt Jan 27, 2023
300c6bb
Limit matching to a specific chromosome in the REFERENCE dataset. (Us…
smlmbrt Jan 27, 2023
74c49b1
Ability to subset reference dataset to specific chromosome (speeds up…
smlmbrt Jan 30, 2023
7656d20
centralise module container definitions with labels
nebfield Jan 31, 2023
4c855ef
add --only_match parameter
nebfield Jan 31, 2023
a75de32
Merge branch 'main' into ancestry
nebfield Jan 31, 2023
5ad9910
prevent unhelpful error message if workflow fails for any reason
nebfield Feb 1, 2023
1881901
fix incompatible option with --only_input
nebfield Feb 1, 2023
89df509
prevent output collisions in filter_variants
nebfield Feb 1, 2023
a2b6b02
Capture variants used in plink scoring commands
smlmbrt Feb 1, 2023
fd6bbc0
fix filter variants
nebfield Feb 2, 2023
9e278a4
add meta to output
nebfield Feb 2, 2023
7d5bc85
update channels to handle split data across samplesets
nebfield Feb 2, 2023
606ecd2
remove error check for split data
nebfield Feb 2, 2023
8c239af
add chrom to intersect_variants meta map
nebfield Feb 2, 2023
2fb9f18
use a consistent container definition across all packages
nebfield Feb 2, 2023
24b8dd2
add build to output
nebfield Feb 2, 2023
ce9e807
drop meta from pca output (build isn't important)
nebfield Feb 2, 2023
4384167
add relabel_id module
nebfield Feb 2, 2023
bbc13bf
add support for a list of mapping files
nebfield Feb 2, 2023
c169ba0
drop chrom from meta (broke groupTuple)
nebfield Feb 2, 2023
0d41e56
fix updating map in output
nebfield Feb 3, 2023
eacb862
branch relabelled id output
nebfield Feb 3, 2023
8b4eb3c
set up multi-chrom target projection input channel
nebfield Feb 3, 2023
5b3c11b
update projection with target genomes split by chromosome
nebfield Feb 3, 2023
27d8242
test singularity unit test
nebfield Feb 3, 2023
e41473a
Improve CI workflow (#83)
nebfield Feb 9, 2023
366b4f6
fix match_combine with --skip_ancestry
nebfield Feb 9, 2023
7d9bf65
migrate relabel_ids to pgscatalog_utils
nebfield Feb 9, 2023
d667660
fix jobs that got mangled from merge conflict
nebfield Feb 9, 2023
d545960
drop CI from PR
nebfield Feb 9, 2023
b20c11e
fix indentation
nebfield Feb 9, 2023
46ef73e
add CI to PRs on dev & main
nebfield Feb 9, 2023
287ead0
emit extracted reference data from projection subworkflow
nebfield Feb 10, 2023
332a3d1
fix reference genome input to apply_score
nebfield Feb 10, 2023
48f5414
fix apply_score input channel with --skip_ancestry
nebfield Feb 13, 2023
bc3f5cb
check make_compatible subworkflow produces output
nebfield Feb 13, 2023
9a7fb79
Initial commit of file to process ancestry data
smlmbrt Feb 13, 2023
7722ae1
Merge branch 'ancestry' of https://github.com/PGScatalog/pgsc_calc in…
smlmbrt Feb 13, 2023
49b53a9
Use zstd image because plink module doesn't contain join on singularity
smlmbrt Feb 13, 2023
c23e281
reduce zstd docker image from 450mb -> 94mb
nebfield Feb 13, 2023
3134bf8
fix version extraction
nebfield Feb 13, 2023
b96dbbe
reduce image size
nebfield Feb 13, 2023
100ca64
bump pgscatalog_utils version
nebfield Feb 13, 2023
3f6836e
stage intersection files in apply_score subworkflow
nebfield Feb 13, 2023
ced351d
add chrom to intersection meta
nebfield Feb 13, 2023
b63e3ae
add ps utility to new docker images
nebfield Feb 13, 2023
6976422
Change software version extraction to reflect zstd image with base tools
smlmbrt Feb 14, 2023
e7d8a86
Needs to filter using ID_TAREGT (col 7) instead of the reference IDs
smlmbrt Feb 15, 2023
4ae1603
fix error when projecting autosomes
nebfield Feb 16, 2023
8392e0f
remove make_compatible check because it hides other errors
nebfield Feb 16, 2023
83655c0
improve relabelling scorefiles
nebfield Feb 16, 2023
6d65fb6
add chrom to map output
nebfield Feb 16, 2023
7b970ad
score reference data
nebfield Feb 16, 2023
306ebb8
get correlation test working with ancestry options
nebfield Feb 17, 2023
ea65111
fix test discovery
nebfield Feb 17, 2023
d66fde0
fix test execution (but it fails)
nebfield Feb 17, 2023
20901e3
always score reference genomes with a combined scoring file
nebfield Feb 20, 2023
b4e3ab7
fix scoring reference with multiple combined files (e.g. duplicates)
nebfield Feb 21, 2023
dc9fe8f
Stage prelim outputs
smlmbrt Feb 16, 2023
876d2e0
Stage ancestry outputs in results directory
smlmbrt Feb 21, 2023
dc7442a
always output split and combined when relabelling
nebfield Feb 21, 2023
1557077
fix ancestry projection
nebfield Feb 21, 2023
fe6c6a3
fix apply_score with new relabel_id behaviour
nebfield Feb 21, 2023
52aff04
add tag to plink2_pca
nebfield Feb 21, 2023
1efd67c
delete mispelled groovy library file
nebfield Feb 24, 2023
83620d3
add groovy functions to Utils
nebfield Feb 24, 2023
2b4a7b1
update convenience functions
nebfield Feb 27, 2023
b228665
use the groovy convenience functions
nebfield Feb 27, 2023
c7b1d1b
fix pca projection using relabelled files on reference data
nebfield Feb 28, 2023
5960f9d
preserve symlinks when copying genotypes/samples after relabelling
nebfield Feb 28, 2023
8ed399b
Merge branch 'main' into dev
nebfield Mar 2, 2023
3e275fb
Implement `ancestry_oadp` subworkflow (#95)
nebfield May 2, 2023
d4187a0
compile report for each samplesheet
nebfield May 2, 2023
946fb78
skip ancestry subworkflows by default
nebfield May 2, 2023
267f66e
Updated N_PCs for PCA normalization based on PMID:35710995
smlmbrt May 3, 2023
4e8a881
fix per-sampleset report
nebfield May 3, 2023
7cb9b04
don't error if non-autosomes appear
nebfield May 4, 2023
cf9efe5
output allelic frequencies from filtered reference data
nebfield May 4, 2023
2da504a
disable mean imputation
nebfield May 4, 2023
f0963d5
load allelic frequencies when ancestry subworkflows are run
nebfield May 4, 2023
1df54d4
Update README.md
nebfield May 5, 2023
c1ca990
remove old test
nebfield May 5, 2023
a87462c
relabel afreq files ref -> target
nebfield May 10, 2023
b394995
Preload container images before running tests (#103)
nebfield May 10, 2023
9e61aa4
refactor --ref parameter -> run_ancestry
nebfield May 11, 2023
500cb63
fix report not running and staging results out
nebfield May 15, 2023
d64fc2b
make sure report file is written if subworkflow runs
nebfield May 15, 2023
0741870
fix apply_score subworkflow test
nebfield May 15, 2023
d2b5be4
add docker performance to docs
nebfield May 15, 2023
214d891
add ancestry parameter to report
nebfield May 15, 2023
5c65305
Proper section headings
smlmbrt May 17, 2023
3c9d372
Placholder PCA plot: needs better labelling, and a table of the ances…
smlmbrt May 17, 2023
21cfd9d
Placholder PCA plot: plot multiple pairs
smlmbrt May 17, 2023
e7f9ec3
Typo: axis swap
smlmbrt May 17, 2023
18d417c
refactor to remove multiple sampleset support
nebfield May 16, 2023
d419860
set up thousand genomes colour palette for pca
nebfield May 18, 2023
049c59e
add population similarity table
nebfield May 18, 2023
1acf285
fix density plot for ancestry data
nebfield May 18, 2023
4256802
fix for custom ancestry
nebfield May 30, 2023
8e8c374
fix intersecting bims
nebfield Jun 29, 2023
688a113
set up ancestry test
nebfield Jul 18, 2023
0bb5854
add ancestry action
nebfield Jul 18, 2023
211c962
fix action
nebfield Jul 18, 2023
51579b6
fix cache miss
nebfield Jul 19, 2023
e114c85
bump version
nebfield Jul 19, 2023
7f6c56c
Use reusable workflows in CI (#125)
nebfield Jul 20, 2023
b907f96
make --hg19_chain and --hg38_chain optional
nebfield Jul 21, 2023
82a9891
update docs
nebfield Jul 21, 2023
7e15351
fix test
nebfield Jul 21, 2023
6f236f0
add link to docs
nebfield Jul 21, 2023
7743cdd
Merge pull request #126 from PGScatalog/chain
smlmbrt Jul 21, 2023
2d1a163
Report edits for v2 (#128)
smlmbrt Aug 1, 2023
083fd0a
update pgscatalog_utils container reference
nebfield Aug 1, 2023
a9113b1
fix container reference
nebfield Aug 1, 2023
85f1286
fix intersect count with multiple chromosomes
nebfield Aug 2, 2023
5517024
Fix conda profile (#136)
nebfield Aug 2, 2023
432e9d8
Increase stringency of variant selection for reference panel PCA (mai…
smlmbrt Aug 3, 2023
54ca464
Lift to .config
smlmbrt Aug 3, 2023
ebd7814
add conda test
nebfield Aug 3, 2023
ffd749a
fix trigger
nebfield Aug 3, 2023
b244573
Merge pull request #138 from PGScatalog/improve_pca
smlmbrt Aug 4, 2023
eed3ffd
add --error-on-freq-calc to improve scoring efficiency (#139)
nebfield Aug 7, 2023
ff9dc9f
Update docs for ancestry release (#105)
nebfield Aug 7, 2023
b3c9c01
fix link
nebfield Aug 7, 2023
259de04
fix index
nebfield Aug 7, 2023
5e17360
fix development statement
nebfield Aug 7, 2023
623d776
Small edits
smlmbrt Aug 7, 2023
64d88c6
Additional figure to visualize changes in PGS distributions across PC…
smlmbrt Aug 7, 2023
e6f052b
Note to check PCA in outputs page
smlmbrt Aug 7, 2023
9d77e80
Additional context for population reference panel and labels
smlmbrt Aug 7, 2023
03a4d64
Update zenodo to the permanent doi that will always resolve
smlmbrt Aug 7, 2023
148 changes: 148 additions & 0 deletions .github/workflows/ancestry.yml
@@ -0,0 +1,148 @@
name: Run ancestry test with singularity or docker profiles

on:
  workflow_call:
    inputs:
      container-cache-key:
        type: string
        required: true
      ancestry-cache-key:
        type: string
        required: true
      docker:
        type: boolean
      singularity:
        type: boolean

env:
  NXF_SINGULARITY_CACHEDIR: ${{ github.workspace }}/singularity
  SINGULARITY_VERSION: 3.8.3

jobs:
  docker:
    if: ${{ inputs.docker }}
    runs-on: ubuntu-latest

    steps:
      - name: Set environment variables
        run: |
          echo "ANCESTRY_REF_DIR=$RUNNER_TEMP" >> $GITHUB_ENV
          echo "ANCESTRY_TARGET_DIR=$RUNNER_TEMP" >> $GITHUB_ENV

      - name: Check out pipeline code
        uses: actions/checkout@v3

      - uses: nf-core/setup-nextflow@v1

      - name: Restore docker images
        id: restore-docker
        uses: actions/cache/restore@v3
        with:
          path: ${{ runner.temp }}/docker
          key: ${{ inputs.container-cache-key }}
          fail-on-cache-miss: true

      - name: Load docker images from cache
        run: |
          find $HOME -name '*.tar'
          find ${{ runner.temp }}/docker/ -name '*.tar' -exec sh -c 'docker load < {}' \;

      - name: Restore reference data
        uses: actions/cache/restore@v3
        with:
          path: |
            ${{ env.ANCESTRY_TARGET_DIR }}/GRCh38_HAPNEST_TARGET_ALL.pgen
            ${{ env.ANCESTRY_TARGET_DIR }}/GRCh38_HAPNEST_TARGET_ALL.psam
            ${{ env.ANCESTRY_TARGET_DIR }}/GRCh38_HAPNEST_TARGET_ALL.pvar.zst
            ${{ env.ANCESTRY_REF_DIR }}/GRCh38_HAPNEST_reference.tar.zst
          key: ${{ inputs.ancestry-cache-key }}
          fail-on-cache-miss: true

      - name: Set up test requirements
        uses: actions/setup-python@v3
        with:
          python-version: '3.10'
          cache: 'pip'

      - run: pip install -r ${{ github.workspace }}/tests/requirements.txt

      - name: Run ancestry test
        run: TMPDIR=~ PROFILE=docker pytest --kwdof --symlink --git-aware --wt 2 --tag "ancestry" --ignore tests/bin

      - name: Upload logs on failure
        if: failure()
        uses: actions/upload-artifact@v3
        with:
          name: logs-docker-ancestry
          path: |
            /home/runner/pytest_workflow_*/*/.nextflow.log
            /home/runner/pytest_workflow_*/*/log.out
            /home/runner/pytest_workflow_*/*/log.err
            /home/runner/pytest_workflow_*/*/output/*

  singularity:
    if: ${{ inputs.singularity }}
    runs-on: ubuntu-latest

    steps:
      - name: Set environment variables
        run: |
          echo "ANCESTRY_REF_DIR=$RUNNER_TEMP" >> $GITHUB_ENV
          echo "ANCESTRY_TARGET_DIR=$RUNNER_TEMP" >> $GITHUB_ENV

      - name: Check out pipeline code
        uses: actions/checkout@v3

      - uses: nf-core/setup-nextflow@v1

      - name: Restore singularity setup
        id: restore-singularity-setup
        uses: actions/cache@v3
        with:
          path: /opt/hostedtoolcache/singularity/${{ env.SINGULARITY_VERSION }}/x64
          key: ${{ runner.os }}-singularity-${{ env.SINGULARITY_VERSION }}
          fail-on-cache-miss: true

      - name: Add singularity to path
        run: |
          echo "/opt/hostedtoolcache/singularity/${{ env.SINGULARITY_VERSION }}/x64/bin" >> $GITHUB_PATH

      - name: Restore singularity container images
        id: restore-singularity
        uses: actions/cache@v3
        with:
          path: ${{ env.NXF_SINGULARITY_CACHEDIR }}
          key: ${{ inputs.container-cache-key }}

      - name: Restore reference data
        uses: actions/cache/restore@v3
        with:
          path: |
            ${{ env.ANCESTRY_TARGET_DIR }}/GRCh38_HAPNEST_TARGET_ALL.pgen
            ${{ env.ANCESTRY_TARGET_DIR }}/GRCh38_HAPNEST_TARGET_ALL.psam
            ${{ env.ANCESTRY_TARGET_DIR }}/GRCh38_HAPNEST_TARGET_ALL.pvar.zst
            ${{ env.ANCESTRY_REF_DIR }}/GRCh38_HAPNEST_reference.tar.zst
          key: ${{ inputs.ancestry-cache-key }}
          fail-on-cache-miss: true

      - name: Set up test requirements
        uses: actions/setup-python@v3
        with:
          python-version: '3.10'
          cache: 'pip'

      - run: pip install -r ${{ github.workspace }}/tests/requirements.txt

      - name: Run ancestry test
        run: TMPDIR=~ PROFILE=singularity pytest --kwdof --symlink --git-aware --wt 2 --tag "ancestry" --ignore tests/bin

      - name: Upload logs on failure
        if: failure()
        uses: actions/upload-artifact@v3
        with:
          name: logs-singularity-ancestry
          path: |
            /home/runner/pytest_workflow_*/*/.nextflow.log
            /home/runner/pytest_workflow_*/*/log.out
            /home/runner/pytest_workflow_*/*/log.err
            /home/runner/pytest_workflow_*/*/output/*
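
Review note: because ancestry.yml is triggered only by `workflow_call`, it does nothing until another workflow calls it. The sketch below shows roughly what a caller might look like. The caller's name, its trigger, and both cache key values are hypothetical and are not taken from this PR; only the input names and the workflow path come from the diff above.

```yaml
# Hypothetical caller workflow: a sketch, not a file from this PR.
name: CI (sketch)

on:
  pull_request:
    branches: [dev, main]   # assumed trigger

jobs:
  ancestry:
    # Call the reusable workflow added in this PR (same repository).
    uses: ./.github/workflows/ancestry.yml
    with:
      # Assumed key values: they must match the keys used by whichever
      # jobs saved the container images and the reference data caches.
      container-cache-key: example-containers-v1
      ancestry-cache-key: example-ancestry-v1
      docker: true
      singularity: true
```

Most restore steps in ancestry.yml set `fail-on-cache-miss: true`, so a caller like this would presumably need earlier jobs (not shown) to populate those caches first.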