Update docs for ancestry release #105

Merged
merged 42 commits into from
Aug 7, 2023
42 commits
6c58cd1
add plink2 explanation
nebfield May 15, 2023
700c6d9
docs review
nebfield Jul 17, 2023
f8c8892
be clear about --max_cpus and --max_memory
nebfield Jul 17, 2023
f5379ed
Update README.md
nebfield Jul 20, 2023
fc91a66
minor fixes
nebfield Jul 18, 2023
3972d5c
ancestry how-to
nebfield Jul 21, 2023
6ffcb17
update docs
nebfield Jul 24, 2023
eedce61
make downloading clearer
nebfield Jul 24, 2023
a565785
Edit TOC, plink2, and name of ancestry doc
smlmbrt Jul 26, 2023
5da4b8a
Edit plink2 + link
smlmbrt Jul 26, 2023
50aa80a
Edit results dirs
smlmbrt Jul 26, 2023
f0975ec
Add motivation (test figure caption insertion)
smlmbrt Jul 26, 2023
035c3cb
Try to fix figure caption
smlmbrt Jul 26, 2023
6559eb7
Try to fix figure caption 2
smlmbrt Jul 26, 2023
f1e8459
Add new figure with ancestry adjustment methods (using the new labels)
smlmbrt Jul 26, 2023
f7ea9e3
Change size of figures
smlmbrt Jul 26, 2023
a50cba2
Test citations
smlmbrt Jul 27, 2023
470ec0b
Change citations to numbered footnotes
smlmbrt Jul 27, 2023
b09556a
Add more explanation
smlmbrt Jul 27, 2023
19da777
Edits to previous sections
smlmbrt Jul 28, 2023
6ebde5e
Scaffold implementation
smlmbrt Jul 28, 2023
682675b
More implementation details
smlmbrt Jul 28, 2023
9fd6adb
More implementation details
smlmbrt Jul 28, 2023
9d4c42b
List formatting (very tedious spacing requirements)
smlmbrt Jul 28, 2023
80deecf
List formatting (very tedious spacing requirements) v2
smlmbrt Jul 28, 2023
c114781
ToDo: add back interpretation section (e.g. example analysis in UKB)
smlmbrt Jul 28, 2023
070ed0f
Update README.md
nebfield Jul 31, 2023
caa875f
Update index.rst
nebfield Jul 31, 2023
2bc9a87
Edits
smlmbrt Jul 31, 2023
8fef8a3
Citation edits
smlmbrt Jul 31, 2023
a888728
Edits to report/output page
smlmbrt Jul 31, 2023
66cc378
Edit to html description
smlmbrt Jul 31, 2023
91e6c2e
Added extra column descriptions for the output tables (variant matching)
smlmbrt Jul 31, 2023
3fd0a0f
Additional edits
smlmbrt Aug 1, 2023
c9343cb
More doc edits
smlmbrt Aug 1, 2023
7b9975a
Edits from Mike
smlmbrt Aug 2, 2023
ebefbba
Citation edits
smlmbrt Aug 2, 2023
af2199d
Add information about feedback & link to discussion board
smlmbrt Aug 3, 2023
456025f
Additional information re: PCA to reflect new defaults and future imp…
smlmbrt Aug 4, 2023
6b5d6ca
Bring nextflow_schema.json up to date with origin/dev
smlmbrt Aug 4, 2023
054951d
Update changelog.rst
smlmbrt Aug 6, 2023
edfadd8
Update geneticancestry.rst
nebfield Aug 7, 2023
6 changes: 3 additions & 3 deletions CITATIONS.md
@@ -13,12 +13,12 @@
* [PGS Catalog API](https://pubmed.ncbi.nlm.nih.gov/33692568/)
> Lambert SA, Gil L, Jupp S, Ritchie SC, Xu Y, Buniello A, McMahon A, Abraham G, Chapman M, Parkinson H, Danesh J. The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation. Nature Genetics. 2021 Apr;53(4):420-5. doi: 10.1038/s41588-021-00783-5. PubMed PMID: 33692568.

* [PLINK 1](https://pubmed.ncbi.nlm.nih.gov/17701901/)
> Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, De Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. The American journal of human genetics. 2007 Sep 1;81(3):559-75. doi: 10.1086/519795. PubMed PMID: 17701901. PubMed Central PMCID: PMC1950838.

* [PLINK 2](https://pubmed.ncbi.nlm.nih.gov/25722852/)
> Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015 Dec 1;4(1):s13742-015. doi: 10.1186/s13742-015-0047-8. PubMed PMID: 25722852. PubMed Central PMCID: PMC4342193.

* [FRAPOSA](https://pubmed.ncbi.nlm.nih.gov/32196066/)
> Zhang, D., et al. (2020) Fast and robust ancestry prediction using principal component analysis. Bioinformatics 36(11):3439–3446. https://doi.org/10.1093/bioinformatics/btaa152

## Software packaging/containerisation tools

* [Anaconda](https://anaconda.com)
21 changes: 12 additions & 9 deletions README.md
@@ -5,9 +5,9 @@
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7577371.svg)](https://doi.org/10.5281/zenodo.7577371)

[![Nextflow](https://img.shields.io/badge/nextflow%20DSL2-≥22.10.0-23aa62.svg?labelColor=000000)](https://www.nextflow.io/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with docker](https://img.shields.io/badge/run%20with-docker-0db7ed?labelColor=000000&logo=docker)](https://www.docker.com/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1d355c.svg?labelColor=000000)](https://sylabs.io/docs/)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)

## Introduction

@@ -19,23 +19,26 @@ and/or user-defined PGS/PRS.
## Pipeline summary

<p align="center">
<img width="70%" src="https://user-images.githubusercontent.com/11425618/195053396-a3eaf31c-b3d5-44ff-a36c-4ef6d7958668.png">
<img width="80%" src="https://github.com/PGScatalog/pgsc_calc/assets/11425618/f766b28c-0f75-4344-abf3-3463946e36cc">
</p>

The workflow performs the following steps:

* Downloading scoring files using the PGS Catalog API in a specified genome build (GRCh37 and GRCh38).
* Reading custom scoring files (and performing a liftover if genotyping data is in a different build).
* Automatically combines and creates scoring files for efficient parallel computation of multiple PGS
- Matching variants in the scoring files against variants in the target dataset (in plink bfile/pfile or VCF format)
* Calculates PGS for all samples (linear sum of weights and dosages)
* Creates a summary report to visualize score distributions and pipeline metadata (variant matching QC)
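
The scoring step above ("linear sum of weights and dosages") can be illustrated with a minimal sketch. This is not the pipeline's implementation: the variant IDs, weights, dosages, and the `polygenic_score` helper are all hypothetical and only show the arithmetic.

```python
# Hypothetical toy data: variant IDs, effect weights, and dosages are
# illustrative only and do not come from any real scoring file.
weights = {"rs1": 0.12, "rs2": -0.05, "rs3": 0.30}
dosages = {
    "sampleA": {"rs1": 2, "rs2": 1, "rs3": 0},
    "sampleB": {"rs1": 0, "rs2": 2, "rs3": 1},
}


def polygenic_score(sample_dosages, weights):
    """Sum (effect weight x effect-allele dosage) over matched variants."""
    return sum(
        weight * sample_dosages[variant]
        for variant, weight in weights.items()
        if variant in sample_dosages  # skip variants missing from the target data
    )


scores = {sample: polygenic_score(d, weights) for sample, d in dosages.items()}
```

Variants that are absent from the target data are simply skipped here; how unmatched variants are actually handled is a matter for the workflow's variant matching step and is not shown.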

### Features in development
And optionally:

- Genetic Ancestry: calculate similarity of target samples to populations in a
reference dataset ([1000 Genomes (1000G)](http://www.nature.com/nature/journal/v526/n7571/full/nature15393.html)), using principal components analysis (PCA)
- PGS Normalization: Using reference population data and/or PCA projections to report
individual-level PGS predictions (e.g. percentiles, z-scores) that account for genetic ancestry

- *Genetic Ancestry*: calculate similarity of target samples to populations in a
reference dataset (e.g. [1000 Genomes (1000G)](http://www.nature.com/nature/journal/v526/n7571/full/nature15393.html),
[Human Genome Diversity Project (HGDP)](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7115999/)) using principal components analysis (PCA).
- *PGS Normalization*: Using reference population data and/or PCA projections to report
individual-level PGS predictions (e.g. percentiles, z-scores) that account for genetic ancestry.
See documentation for a list of planned [features under development](https://pgsc-calc.readthedocs.io/en/latest/index.html#Features-under-development).
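
As a rough illustration of the normalization idea described above, a score can be expressed relative to the distribution of scores in a reference population. The sketch below assumes a plain list of reference scores and uses two hypothetical helpers, `z_score` and `percentile`; it does not reproduce the workflow's ancestry adjustment methods.

```python
from bisect import bisect_right
from statistics import mean, stdev


def z_score(score, reference_scores):
    """Standardize a score against the reference mean and sample standard deviation."""
    return (score - mean(reference_scores)) / stdev(reference_scores)


def percentile(score, reference_scores):
    """Percentage of reference scores that are less than or equal to the score."""
    ranked = sorted(reference_scores)
    return 100.0 * bisect_right(ranked, score) / len(ranked)


# Hypothetical reference distribution, for illustration only.
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
target_z = z_score(3.0, reference)
target_pct = percentile(3.0, reference)
```

In practice the reference distribution would be the PGS of the reference panel population most similar to the target sample, which is an assumption of this sketch rather than something it computes.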

## Quick start

@@ -87,7 +90,7 @@ manuscript describing the tool is in preparation. In the meantime if you use the
tool we ask you to cite the repo and the paper describing the PGS Catalog
resource:

- >PGS Catalog Calculator _(in development)_. PGS Catalog
- >PGS Catalog Calculator _(preprint forthcoming)_. PGS Catalog
Team. [https://github.com/PGScatalog/pgsc_calc](https://github.com/PGScatalog/pgsc_calc)
- >Lambert _et al._ (2021) The Polygenic Score Catalog as an open database for
reproducibility and systematic evaluation. Nature Genetics. 53:420–425
7 changes: 4 additions & 3 deletions docs/_templates/globaltoc.html
@@ -6,12 +6,12 @@ <h3>Contents</h3>
<li><a href="{{ pathto('how-to/index') }}">How-to guides</a></li>
<li><a href="{{ pathto('reference/index') }}">Reference guides</a></li>
<ul>
<li><a href="{{ pathto('reference/params') }}">Input Parameters/Flags</a></li>
<li><a href="{{ pathto('reference/input') }}">Samplesheet schema</a></li>
<li><a href="{{ pathto('reference/containers') }}">Containers</a></li>
<li><a href="{{ pathto('reference/params') }}">Workflow parameters</a></li>
</ul>
<li><a href="{{ pathto('explanation/index') }}">Explanations</a></li>
<ul>
<li><a href="{{ pathto('explanation/plink2') }}">Why not use plink2?</a></li>
<li><a href="{{ pathto('explanation/geneticancestry') }}">Adjusting PGS with genetic ancestry</a></li>
<li><a href="{{ pathto('explanation/output') }}">Outputs & report</a></li>
</ul>
<li><a href="{{ pathto('troubleshooting') }}">Troubleshooting</a></li>
@@ -32,6 +32,7 @@ <h3>Useful links</h3>
<li><a href="https://github.com/pgscatalog/pgsc_calc">pgsc_calc Github</a></li>
<ul>
<li><a href="https://github.com/pgscatalog/pgsc_calc/issues">Issue tracker</a></li>
<li><a href="https://github.com/PGScatalog/pgsc_calc/discussions">Discussion board</a></li>
</ul>
<li><a href="https://github.com/PGScatalog/pgscatalog_utils">pgscatalog_utils Github</a></li>
</ul>
22 changes: 20 additions & 2 deletions docs/changelog.rst
@@ -8,7 +8,25 @@ will only occur in major versions with changes noted in this changelog.

.. _`semantic versioning`: https://semver.org/

pgsc_calc v1.3.2 (2022-01-27)
pgsc_calc v2.0.0 (2023-08-08)
-----------------------------

This major release features breaking changes to samplesheet structure to provide
more flexible support for extra genomic file types in the future. Two major new
features were implemented in this release:

- Genetic ancestry group similarity is calculated to a population reference panel
(default: 1000 Genomes) when the ``--run_ancestry`` flag is supplied. This runs
using PCA and projection implemented in the ``fraposa_pgsc (v0.1.0)`` package.
- Calculated PGS can be adjusted for genetic ancestry using empirical PGS distributions
from the most similar reference panel population or continuous PCA-based regressions.

These new features are optional and don't run in the default workflow. Other features
included in the release are:

- Speed optimizations for PGS scoring (skipping allele frequency calculation)

pgsc_calc v1.3.2 (2023-01-27)
-----------------------------

This patch fixes a bug that made some PGS Catalog scoring files incompatible
@@ -18,7 +36,7 @@ reporting the problem.

.. _`@j0n-a`: https://github.com/PGScatalog/pgsc_calc/issues/79

pgsc_calc v1.3.1 (2022-01-24)
pgsc_calc v1.3.1 (2023-01-24)
-----------------------------

This patch fixes a bug that breaks the workflow if all variants in one or more