v2.0.0-beta.3 #360

nebfield · 2024-08-09T06:47:40Z

nebfield
Aug 9, 2024
Maintainer

Changelog

Important fix: Fix splitting duplicated variant IDs across multiple scoring files

Background

The MATCH_COMBINE step writes new scoring files for input to plink2 --score.
When plink2 encounters the same variant ID on multiple rows of a scoring file, it ignores the duplicates and warns about them.
This only happens when the same variant ID has different effect alleles on different rows.
- A variant ID with the same effect allele and weights in multiple score columns is OK: the scores are simply calculated in parallel.

Example

When PGS000039, PGS000040, and PGS000041 are calculated in parallel, some variants have different effect alleles at the same coordinates. For example:

22:40682469:T:C with effect allele T (PGS000041_hmPOS_GRCh38)
22:40682469:T:C with effect allele C (PGS000039_hmPOS_GRCh38)
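
The idea behind the fix can be illustrated with a minimal sketch. This is not the MATCH_COMBINE implementation: the function name `split_scoring_rows`, the row layout, the weights, and the third variant are all hypothetical and only serve to show how rows that share a variant ID but differ in effect allele could be routed to separate scoring files.

```python
from collections import defaultdict


def split_scoring_rows(rows):
    """Group (variant_id, effect_allele, score_name, weight) rows into files so
    that no file holds the same variant ID with two different effect alleles.

    Hypothetical helper for illustration only.
    """
    alleles_seen = defaultdict(list)  # variant ID -> effect alleles, in order seen
    files = defaultdict(list)         # file index -> rows assigned to that file
    for row in rows:
        variant_id, effect_allele = row[0], row[1]
        seen = alleles_seen[variant_id]
        if effect_allele not in seen:
            seen.append(effect_allele)
        # The n-th distinct effect allele of a variant ID goes to the n-th file
        files[seen.index(effect_allele)].append(row)
    return [files[i] for i in sorted(files)]


# Weights and the last variant are made up for the example
rows = [
    ("22:40682469:T:C", "T", "PGS000041_hmPOS_GRCh38", 0.1),
    ("22:40682469:T:C", "C", "PGS000039_hmPOS_GRCh38", 0.2),
    ("1:1000:A:G", "A", "PGS000040_hmPOS_GRCh38", 0.3),
]
scoring_files = split_scoring_rows(rows)
for index, file_rows in enumerate(scoring_files):
    print(index, [(r[0], r[1]) for r in file_rows])
```

In this sketch the row with effect allele C ends up alone in a second file, so plink2 would never see both alleles of 22:40682469:T:C in one scoring file.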

Impact

In versions v2.0.0-beta, beta.1, and beta.2 the duplicated variant is written to the same scoring file and ignored by plink2. The duplicated variant doesn't contribute to the final calculated PGS.

In all v2.0.0-alpha versions and beta.3 a second scoring file is correctly written containing the other allele (additional alleles create extra scoring files automatically within the updated MATCH_COMBINE process). We have also updated the software tests to ensure this error doesn't occur in future releases.

This problem is more likely to happen when larger scores are calculated in parallel. The more scores are calculated together, the more likely it is that the same variant ID appears with different effect alleles, so that one of the rows is ignored during the score calculation stage.

While the overall impact on the final score is likely to be small we encourage users to upgrade to beta.3, especially if they calculate larger scores in parallel.

How do I know if my data are affected?

$ cd work/71/35fa3c977993b71d5a85fb6721e8c3 # cd to a scoring process directory 
$ comm -3 <(sort hgdp_22_additive_0.sscore.vars) <(zcat hgdp_22_additive_0.scorefile.gz | tail -n +2 | cut -f 1 | sort)
	22:40682469:T:C

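Any variant printed by this command is present in the scoring file but was not used by plink2 in the calculation. Here one missing variant appears in the output. This check is now included in the scoring module.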
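
For anyone who prefers to run the check outside the shell, here is a rough Python equivalent. It is a sketch under assumptions taken from the command above: the `.sscore.vars` file holds one variant ID per line, and the gzipped scoring file is tab-separated with a header row and the variant ID in the first column. The function name `missing_variants` is made up, and unlike `comm -3` it reports only variants that are in the scoring file but absent from the `.vars` file.

```python
import gzip


def missing_variants(vars_path, scorefile_path):
    """Return sorted variant IDs listed in the scoring file but not in .sscore.vars."""
    with open(vars_path) as handle:
        used = {line.strip() for line in handle if line.strip()}
    with gzip.open(scorefile_path, "rt") as handle:
        next(handle, None)  # skip the header row
        wanted = {line.split("\t", 1)[0].strip() for line in handle if line.strip()}
    return sorted(wanted - used)


if __name__ == "__main__":
    import os

    vars_file = "hgdp_22_additive_0.sscore.vars"
    score_file = "hgdp_22_additive_0.scorefile.gz"
    # Only run when called from inside a scoring process directory
    if os.path.exists(vars_file) and os.path.exists(score_file):
        for variant in missing_variants(vars_file, score_file):
            print(variant)
```

An empty result suggests that every variant in the scoring file was used.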

Other fixes

Fix --keep_ambiguous parameter (Issue with '--keep_ambiguous' Option and Possible Bug #346)
Fix variant matching information getting dropped from log when scores didn't pass the match rate threshold
Fix fraposa-pgsc handling exclusively numeric IIDs (v1.0.1, fraposa_pgsc#18)

This discussion was created from the release v2.0.0-beta.3.

smlmbrt · 2024-08-09T08:41:14Z

smlmbrt
Aug 9, 2024
Maintainer

Just a note, it's important to update any beta version you're using to this most recent release.
