Releases · EBIvariation/CMAT

02 Jun 13:43

tskir

v2.2.1

194063f

v2.2.1: Minor updates for the 21.06 submission

Update ClinVar investigations report (#243)
Bump Open Targets schema version to 2.0.9

This was the code version used to process the 21.06 submission.

26 May 07:19

tskir

v2.2.0

df902ec

v2.2.0: Data and operational updates for batch 21.06; Major test suite revamp

Data updates

Stricter duplication checks for evidence strings, with corresponding documentation updates (#229 by @M-casado)
Flatten cohortPhenotypes representation to include all names instead of only the primary ones (#238 by @apriltuesday)
Report all evidence regardless of ontology mapping status (#239 by @apriltuesday)
Remove nonspecific allele origin values from the evidence strings. Evidence strings are now reported even if they have no allele origin values; these are assumed to be germline by default (#240 by @apriltuesday)

Operational updates and bug fixes

Strip whitespace from ontology identifiers (#216 by @afoix)
Update FTP upload documentation (#219 by @afoix)
Fixes for new OMIM identifier format in ClinVar (#230 by @apriltuesday)
Bump Open Targets schema to v2.0.8 (#241 by @apriltuesday, #242 by @tskir)

Test suite revamp

Update VEP pipeline tests to use XML input (#217 by @apriltuesday)
Update tests to support VEP 104 (#225 by @apriltuesday)
Migrate the tests to GitHub Actions (#227 by @afoix)
Improve and unify testing (#228 by @apriltuesday)
Fix VEP tests and GitHub Actions (#234 by @apriltuesday)

01 Apr 20:06

tskir

v2.1.0

8a6aab5

v2.1.0: Updated ClinVar model investigations; Quality control system revamp

This release leaves the actual evidence strings unchanged compared to v2.0.2, but introduces other important changes:

#208 Significant updates to the ClinVar data model investigation scripts & the resulting report
#212, #214 Major refactor of the quality control system and the associated spreadsheet
#213 Update the workflow diagram to reflect changes in v2.0.0...v2.0.2 of the pipeline.

The reason for the minor version change is that the quality control metrics are now more precise and not always comparable to the metrics generated previously.

22 Mar 13:13

tskir

v2.0.2

c8d8cb5

v2.0.2: Corrections and updates for the Open Targets batch 2021.04

Resolves several issues with evidence string loss in v2.0.0 and v2.0.1 compared to v1.3.2:

All ClinVar traits are now processed instead of only “Disease” type traits.
All names of a ClinVar trait are now used for looking up the corresponding ontology term. Previously, only the preferred name was used, which does not always correspond to the one in the string-to-ontology mapping database.
Reintroduced processing of mitochondrial variants and variants containing IUPAC ambiguity bases. They were previously skipped due to not being supported by the Open Targets schema.
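
The effect of looking up all trait names rather than only the preferred one can be sketched as follows. This is a minimal illustration, not the pipeline's code: the function name `find_ontology_terms`, the `string_to_ontology` dictionary, the lower-casing of names, and the placeholder identifier `EXAMPLE_0001` are all assumptions made for the example.

```python
from typing import Dict, Iterable, List


def find_ontology_terms(trait_names: Iterable[str],
                        string_to_ontology: Dict[str, List[str]]) -> List[str]:
    """Return ontology terms matched by any name of a trait (hypothetical helper)."""
    matched: List[str] = []
    for name in trait_names:
        # Assumption: mappings are keyed by stripped, lower-cased trait names.
        for term in string_to_ontology.get(name.strip().lower(), []):
            if term not in matched:  # keep first-seen order, avoid duplicates
                matched.append(term)
    return matched


# Toy mapping database; the identifier is a placeholder, not a real ontology term.
mappings = {"example disorder": ["EXAMPLE_0001"]}

# The first name plays the role of the preferred name, the second is an alternative.
names = ["Example disease", "Example disorder"]

print(find_ontology_terms(names[:1], mappings))  # preferred name only
print(find_ontology_terms(names, mappings))      # all names
```

With only the preferred name the lookup comes back empty, while using every name finds the mapping, which is the kind of loss this release addresses.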

Technical changes:

Updated the Open Targets schema version: 2.0.5 → 2.0.6.
Updated test data and assertions to fix some inconsistencies which were not spotted in the v2.0.0 and v2.0.1 releases.

15 Mar 13:30

tskir

v2.0.1

8f7856c

v2.0.1: Evidence string duplication, literature references, and ontology mapping adjustments

Version 2.0.1 addresses three groups of issues.

Evidence string duplication
- Verified that all known problems have been resolved (#185).
- Introduced additional checks prior to submission (#178, #188).
Processing of PubMed references
- Investigated the three types of ClinVar literature references and provided a report (#166, #192).
- Clarified the different types of PubMed references in the ClinVar XML parser class (#182).
Handling string-to-ontology mappings
- Verified that the preferred trait names are used consistently across the pipeline (#177).
- Verified that multiple string-to-ontology mappings are consistently supported across the pipeline, fixed a minor bug and amended documentation (#115).
- Prevented non-specific terms like “disease” from reappearing in the manual curation results (#179).
- Fixed a bug in construction of MONDO IRIs from ClinVar data (#175).

v2.0.0: Major refactor of ClinVar input, repeat expansion pipeline, and JSON schema

ClinVar input rewrite

All components of the pipeline now use the comprehensive XML data dump from ClinVar as input. The use of VCF and TSV summary files has been discontinued. This should make the results more consistent and comprehensive.

This is made possible by the new clinvar_xml_utils module, which provides a Python interface to work with ClinVar data. External users with similar goals are welcome to also try it out.
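
For readers unfamiliar with streaming over a large XML dump, the general idea can be sketched with the Python standard library. This is explicitly not the API of clinvar_xml_utils, which is not described in these notes. The sample document is hand-written rather than real ClinVar data, and the element names (`ClinVarSet`, `ReferenceClinVarAssertion`, `TraitSet`, `Trait`, `Name`, `ElementValue`) are an assumption about the ClinVar XML layout.

```python
import io
import xml.etree.ElementTree as ET

# A tiny hand-written stand-in for the ClinVar XML dump (not real data).
SAMPLE_XML = """<ReleaseSet>
  <ClinVarSet>
    <ReferenceClinVarAssertion>
      <ClinVarAccession Acc="RCV000000001"/>
      <TraitSet Type="Disease">
        <Trait Type="Disease">
          <Name><ElementValue Type="Preferred">Example disease</ElementValue></Name>
          <Name><ElementValue Type="Alternate">Example disorder</ElementValue></Name>
        </Trait>
      </TraitSet>
    </ReferenceClinVarAssertion>
  </ClinVarSet>
</ReleaseSet>"""


def iterate_trait_names(xml_stream):
    """Yield (accession, [all trait names]) for each ClinVarSet record."""
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != "ClinVarSet":
            continue
        rcv = elem.find("ReferenceClinVarAssertion")
        accession = rcv.find("ClinVarAccession").attrib["Acc"]
        names = [ev.text for ev in rcv.findall("TraitSet/Trait/Name/ElementValue")]
        yield accession, names
        elem.clear()  # drop the processed record so memory use stays bounded


records = list(iterate_trait_names(io.BytesIO(SAMPLE_XML.encode("utf-8"))))
print(records)
```

Processing one record at a time with `iterparse` keeps memory use roughly constant, which matters for a comprehensive dump that is too large to load as a single tree.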

Repeat expansion pipeline refactor

Under the new approach, the following Microsatellite records are considered repeat expansion events:

1. Variants with explicit allele sequences which represent insertions of 12 bases or more;
2. Variants without explicit allele sequences, the HGVS-like notation of which does not represent a deletion.

The old approach was essentially confined to category (2). As a result, the number of repeat expansion consequences processed is now larger by approximately a factor of 6.
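
The two rules above can be sketched as a single predicate. This is a simplified illustration under stated assumptions, not the pipeline's implementation: the function names are hypothetical, the insertion length is taken as the difference between alternate and reference allele lengths, and a deletion is recognised only by a trailing `del` (optionally followed by bases) in the notation. Real handling of the HGVS-like notation is likely to be more involved.

```python
import re
from typing import Optional

MIN_INSERTION_LENGTH = 12  # threshold stated in the release notes


def is_deletion(hgvs: str) -> bool:
    """Assumption: a deletion ends in 'del', optionally followed by the deleted bases."""
    return re.search(r"del[ACGTN]*$", hgvs) is not None


def is_repeat_expansion(ref: Optional[str], alt: Optional[str],
                        hgvs: Optional[str] = None) -> bool:
    """Decide whether a Microsatellite record counts as a repeat expansion (sketch)."""
    if ref is not None and alt is not None:
        # Explicit allele sequences: require a net insertion of at least 12 bases.
        return len(alt) - len(ref) >= MIN_INSERTION_LENGTH
    # No explicit allele sequences: accept unless the notation describes a deletion.
    return hgvs is not None and not is_deletion(hgvs)


# Illustrative inputs only; the coordinates and notations are made up.
print(is_repeat_expansion("A", "A" + "CAG" * 4))                     # 12 inserted bases
print(is_repeat_expansion("A", "ACAG"))                              # 3 inserted bases
print(is_repeat_expansion(None, None, "NM_000044.3:c.172_174del"))   # deletion
print(is_repeat_expansion(None, None, "NM_000044.3:c.172CAG[30]"))   # not a deletion
```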

JSON schema migration

The pipeline output was migrated to accommodate the new major version of the Open Targets JSON schema, 2.0.5 (up from 1.7.5), described and discussed in detail in #189.

Other changes

Substantial refactoring and documentation updates under the hood.
A copy of the JSON schema is no longer stored in the repository; it is fetched on the fly instead.
Manual curation protocol now includes a “Notes” column, which stores the “NT expansion” annotation without replacing the trait frequency.
Removed a number of unused modules, including the old ClinVar XML parser written in Java.

22 Jan 07:08

tskir

v1.3.2

a2fd920

v1.3.2: Minor updates for the 2021.02 batch

Migrated to Open Targets schema version 1.7.5.

21 Oct 07:41

tskir

v1.3.1

b1f78a1

v1.3.1: Minor updates for the 2020.11 batch

Evidence string related changes
- Migrated to Open Targets schema version 1.7.3.
- Minor updates to the evidence string generation review checklist.
- Evidence string name format changed from DD-MM-YYYY to YYYY-MM-DD.
Other changes
- Minor fixes to the manual curation protocol to ensure stable sort order.
- ClinVar data examination script now calculates distributions of allele origins as well.
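
The date format change listed above can be illustrated with a short conversion. The notes do not state the motivation; one practical property of YYYY-MM-DD, shown here, is that plain string sorting matches chronological order. The helper name `to_iso_date` and the sample dates are made up for the example.

```python
from datetime import datetime


def to_iso_date(name_date: str) -> str:
    """Convert a DD-MM-YYYY date string to YYYY-MM-DD."""
    return datetime.strptime(name_date, "%d-%m-%Y").strftime("%Y-%m-%d")


old_dates = ["21-01-2021", "05-11-2020", "16-09-2020"]
new_dates = [to_iso_date(d) for d in old_dates]

print(sorted(old_dates))  # string order does not follow the calendar
print(sorted(new_dates))  # string order is chronological
```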

The latest ClinVar version with which this pipeline will work is 2020/08. After that release, the variant_summary.tsv format changed and no longer includes an “NT expansion” category.

16 Sep 14:42

tskir

v1.3

e657989

v1.3: Process additional ClinVar attributes

These changes introduce additional ClinVar attributes into the evidence strings, in preparation for implementing a better and more comprehensive scoring mechanism. All changes affect both genetic_association and somatic_mutation evidence strings.

#146 Report records with all clinical significance levels
- Removed filtering by clinical significance throughout the pipeline.
- Format and process the clinical significance levels according to the new schema, allowing multiple values per record.
- Removed the obsolete target.activity attribute.
- Always set the evidence.gene2variant.is_associated and evidence.variant2disease.is_associated fields to True.
#148 Add ClinVar star rating and review status
- Add star rating, which ranges from 0 to 4.
- Add review status, e.g. “criteria provided, conflicting interpretations”.
#149 Add mode of inheritance
- Reported as strings verbatim from ClinVar and not additionally processed.
- This field will contain an array, even when there is only one mode of inheritance (which is true for the majority of all records), for consistency between all records.
#150 Add last evaluated date
- This field tracks the timestamp of the most recent clinically meaningful update of the record: essentially, the latest (re)evaluation of the clinical significance level.
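
How the star rating, review status and mode of inheritance fit together can be sketched as below. The mapping from review status to stars follows ClinVar's published review status scheme as it stood around the time of this release. The function name and the output field names are hypothetical and do not reflect the actual evidence string schema.

```python
from typing import Dict, List

# Review status to star rating, following ClinVar's review status documentation.
REVIEW_STATUS_TO_STARS: Dict[str, int] = {
    "practice guideline": 4,
    "reviewed by expert panel": 3,
    "criteria provided, multiple submitters, no conflicts": 2,
    "criteria provided, conflicting interpretations": 1,
    "criteria provided, single submitter": 1,
    "no assertion for the individual variant": 0,
    "no assertion criteria provided": 0,
    "no assertion provided": 0,
}


def build_clinvar_attributes(review_status: str,
                             modes_of_inheritance: List[str]) -> dict:
    """Assemble the extra attributes for one record (field names are hypothetical)."""
    return {
        "star_rating": REVIEW_STATUS_TO_STARS[review_status],
        "review_status": review_status,
        # Always a list, even for a single value, so every record has the same shape.
        "mode_of_inheritance": list(modes_of_inheritance),
    }


example = build_clinvar_attributes(
    "criteria provided, conflicting interpretations",
    ["Autosomal dominant inheritance"],
)
print(example)
```

Keeping the mode of inheritance as a list in every record means downstream consumers never need to branch on whether the value is a string or an array.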

11 Aug 14:54

tskir

v1.2

681c804

v1.2: Technical improvements and bug fixes

#138 Refactor approach for submitting and reusing ZOOMA feedback
- Trait-to-ontology mappings from previous iterations of manual curation are now reused directly, rather than relying on ZOOMA feedback files. The feedback files themselves are now generated at more appropriate stages of the pipeline.
- This solves a number of issues which occur when two iterations of manual curation happen back to back without evidence string generation in between.
#140 Use virtualenv, reorganise dependencies and pin their versions
- For more consistent dependency management, the pipeline now uses virtualenv for all purposes.
- The list of dependencies was reorganised and their versions were pinned.
- Worked around multiple regressions introduced in Pandas 1.1.0 by downgrading to Pandas 1.0.5.
#141 Changes for batch 2020.09. Includes update from JSON schema 1.6.7 to 1.7.1 (only test files and version updates, no actual evidence string format changes necessary) and minor documentation fixes.
