@ElixBaSe I made a fresh issue in this (the calculator) repository because the question was more calculator related 😄
Your custom scoring file matches your target genomes poorly, so the workflow stops and raises an error: only 1.78% of the variants in the scoring file are found in your input genomes.
If you want to test if the workflow will run on your computer, you could try adding the parameter --min_overlap 0.01. However, the calculated scores won't be biologically meaningful even if the pipeline finishes, because you're not using 98% of the variants in your scoring file.
There are a few reasons why you might have such a poor overlap:
- Was your custom scoring file developed using genotypes in the same genome build as your new input genomes?
- Did you use the same imputation panel across the development cohort and the new input cohort?
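To get a rough feel for the first question, one can count how many scoring-file positions exist in the target variant list at all. The sketch below is a simplified illustration, not how the calculator matches variants (the real matching also considers alleles, strand flips and ambiguity, as the log shows). The function names are hypothetical; the default threshold of 0.75 is taken from the `--min_overlap 0.75` shown in the failing command.

```python
from typing import Iterable, Set, Tuple

Variant = Tuple[str, int]  # (chromosome, position)


def normalise_chrom(chrom: str) -> str:
    """Strip an optional 'chr' prefix so '1' and 'chr1' compare equal."""
    chrom = chrom.strip()
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def position_overlap(score_variants: Iterable[Variant],
                     target_variants: Iterable[Variant]) -> float:
    """Fraction of scoring-file positions that also exist in the target."""
    score: Set[Variant] = {(normalise_chrom(c), p) for c, p in score_variants}
    target: Set[Variant] = {(normalise_chrom(c), p) for c, p in target_variants}
    if not score:
        return 0.0
    return len(score & target) / len(score)


def passes_threshold(overlap: float, min_overlap: float = 0.75) -> bool:
    """Mimic the pass/fail decision: overlap must reach the minimum."""
    return overlap >= min_overlap


if __name__ == "__main__":
    # Positions from the example scoring file; the target list is made up.
    score = [("1", 20729451), ("1", 39870793), ("1", 46358862)]
    target = [("chr1", 20729451), ("chr1", 12345)]
    overlap = position_overlap(score, target)
    print(f"overlap = {overlap:.2%}, passes = {passes_threshold(overlap)}")
```

If the positions barely overlap on the build named in the scoring file header, a build mismatch between the scoring file coordinates and the target genotypes is a likely suspect.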
The variant weights in my scoring file were originally derived using the GRCh37 genome build. However, I used GRCh38 to create my score file.
I will review the details on the imputation panels used.
Hello, I'm trying to calculate a custom score and I get a similar error. The part of the log that catches my attention is this line:
pgscatalog_utils.match.read: 2023-10-23 15:26:08 DEBUG --chrom parameter not set, using all variants in scoring file
I'm not sure what it means, and I can't find any information in the documentation.
I tried specifying the --chrome parameter in my script, but now it says it was an unexpected parameter.
This is an example of my scorefile:
#format_version=2.0
pgs_name=DIA_HIS_T2D
trait_reported=Type 2 diabetes
genome_build=GRCh38
chr_name chr_position effect_allele other_allele effect_weight
1 20729451 G C 0.018
1 39870793 T C 0.041
1 46358862 G A 0.008
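For readers who want to inspect a file like this programmatically, here is a minimal sketch of a reader. It assumes whitespace-separated columns and `key=value` metadata lines before the column header (with or without a leading `#`); `read_scorefile` is a hypothetical helper, not part of pgscatalog_utils.

```python
from typing import Dict, List, Tuple


def read_scorefile(text: str) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split a scoring file into header metadata and variant rows."""
    header: Dict[str, str] = {}
    body_lines: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # Plain comment lines (no key=value) carry no metadata: skip them.
        if stripped.startswith("#") and "=" not in stripped:
            continue
        key, sep, value = stripped.lstrip("#").partition("=")
        # Metadata lines come before the column header and have a key
        # without whitespace, e.g. "genome_build=GRCh38".
        if not body_lines and sep and not any(ch.isspace() for ch in key):
            header[key] = value.strip()
        else:
            body_lines.append(stripped)
    if not body_lines:
        return header, []
    columns = body_lines[0].split()
    rows = [dict(zip(columns, row.split())) for row in body_lines[1:]]
    return header, rows


if __name__ == "__main__":
    example = """#format_version=2.0
pgs_name=DIA_HIS_T2D
trait_reported=Type 2 diabetes
genome_build=GRCh38
chr_name chr_position effect_allele other_allele effect_weight
1 20729451 G C 0.018
1 39870793 T C 0.041
1 46358862 G A 0.008
"""
    header, rows = read_scorefile(example)
    print(header["genome_build"], len(rows))
```

All values are kept as strings; converting `chr_position` and `effect_weight` to numbers is left to the caller.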
This is an example of my samplesheet.csv file:
sampleset,path_prefix,chrom,format
MCPS,data/genetics_regeneron/freeze_150k/data/imputation/oxford_qcd/per_chromosome/pgen_hds/mcps-freeze150k_qcd_chr1,1,pfile
MCPS,/data/genetics_regeneron/freeze_150k/data/imputation/oxford_qcd/per_chromosome/pgen_hds/mcps-freeze150k_qcd_chr2,2,pfile
MCPS,/data/genetics_regeneron/freeze_150k/data/imputation/oxford_qcd/per_chromosome/pgen_hds/mcps-freeze150k_qcd_chr3,3,pfile
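Note that the `path_prefix` in the first row above has no leading slash, unlike the other rows. That may be intentional (a relative path can resolve correctly from the launch directory) or a copy-paste slip. A small sketch for flagging such rows, assuming the four-column layout shown above; `relative_path_rows` is a hypothetical helper name:

```python
import csv
import io
from pathlib import PurePosixPath
from typing import List


def relative_path_rows(samplesheet_text: str) -> List[str]:
    """Return the chrom label of every row whose path_prefix is not absolute."""
    reader = csv.DictReader(io.StringIO(samplesheet_text))
    return [
        row["chrom"]
        for row in reader
        if not PurePosixPath(row["path_prefix"]).is_absolute()
    ]


if __name__ == "__main__":
    # Shortened, made-up paths that mirror the pattern in the samplesheet.
    sheet = (
        "sampleset,path_prefix,chrom,format\n"
        "MCPS,data/pgen_hds/example_chr1,1,pfile\n"
        "MCPS,/data/pgen_hds/example_chr2,2,pfile\n"
        "MCPS,/data/pgen_hds/example_chr3,3,pfile\n"
    )
    print(relative_path_rows(sheet))
```

The check only reports inconsistency between rows; it does not test whether the files exist.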
These are the options that I'm using in my script:
`echo start
cd "$1"
echo "$PWD"
source "$2"
echo "environment from $2"
module load Anaconda3/2022.05 PLINK/2.00a2.3_x86_64 Python/3.10.4-GCCcore-11.3.0 Java/11.0.2 R/4.2.1-foss-2022a yaml-cpp/0.7.0-GCCcore-11.3.0
pip install pyyaml
# Loop over the PGS names passed as the third and later arguments
for pgs_name in "${@:3}";
do
  echo "$pgs_name start computation"
  ./nextflow run pgscatalog/pgsc_calc -profile conda \
    --input sample_sheet.csv \
    --target_build GRCh38 \
    --parallel \
    --outdir "PRS_calculated/MCPS_$pgs_name" \
    --scorefile scorefile_338_DIA_HIS.txt
  echo "$pgs_name finished"
done
echo all PRS in the list computed
# Usage: bash 1.2.run_pgscatalog_custom_scorefile.sh working_directory anaconda_environment PGS_name1 PGS_name2`
This is the error:
executor > local (28)
[8b/dd59d6] process > PGSCATALOG_PGSCALC:PGSCALC:INPUT_CHECK:SAMPLESHEET_JSON (sample_sheet.csv) [100%] 1 of 1 ✔
[7d/dc9e54] process > PGSCATALOG_PGSCALC:PGSCALC:INPUT_CHECK:COMBINE_SCOREFILES (1) [100%] 1 of 1 ✔
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM -
[skipped ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (MCPS chromosome 5) [100%] 23 of 23, stored: 23 ✔
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_VCF -
[cc/ff8584] process > PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_VARIANTS (MCPS chromosome 1) [100%] 23 of 23 ✔
[54/c3811f] process > PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS) [100%] 3 of 3, failed: 3, retries: 2 ✘
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:PLINK2_SCORE -
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:SCORE_AGGREGATE -
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:REPORT:SCORE_REPORT -
[- ] process > PGSCATALOG_PGSCALC:PGSCALC:DUMPSOFTWAREVERSIONS -
[67/413e7f] NOTE: Process `PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS)` terminated with an error exit status (1) -- Execution is retried (1)
[a5/c006f8] NOTE: Process `PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS)` terminated with an error exit status (1) -- Execution is retried (2)
ERROR ~ Error executing process > 'PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS)'
Caused by:
Process `PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS)` terminated with an error exit status (1)
Command executed:
export POLARS_MAX_THREADS=2
combine_matches --dataset MCPS --scorefile scorefiles.txt.gz --matches *.ipc.zst -n 2 --min_overlap 0.75 --outdir $PWD --split -v
cat <<-END_VERSIONS > versions.yml
MATCH_COMBINE:
    pgscatalog_utils: $(echo $(python -c 'import pgscatalog_utils; print(pgscatalog_utils.__version__)'))
END_VERSIONS
Command exit status:
1
Command output:
(empty)
Command error:
root: 2023-10-23 15:01:45 DEBUG Verbose logging enabled
pgscatalog_utils.config: 2023-10-23 15:01:45 DEBUG Using 2 threads to read CSVs
pgscatalog_utils.config: 2023-10-23 15:01:45 DEBUG polars threadpool size: 2
pgscatalog_utils.match.read: 2023-10-23 15:01:45 DEBUG Reading scorefile
pgscatalog_utils.match.read: 2023-10-23 15:01:45 DEBUG --chrom parameter not set, using all variants in scoring file
pgscatalog_utils.match.preprocess: 2023-10-23 15:01:45 DEBUG Complementing column effect_allele
pgscatalog_utils.match.preprocess: 2023-10-23 15:01:45 DEBUG Complementing column other_allele
pgscatalog_utils.match.combine_matches: 2023-10-23 15:01:45 DEBUG Reading matches
pgscatalog_utils.match.combine_matches: 2023-10-23 15:01:45 DEBUG Labelling match candidates
pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG Labelling best match type (refalt > altref > ...)
pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG Labelling duplicated best match: keeping first instance as best_match = True
pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG Labelling multiple scoring file lines (accession/row_nr) that best_match to the same variant
pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG Labelling all duplicates with exclude flag
pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG Labelling ambiguous variants
pgscatalog_utils.match.preprocess: 2023-10-23 15:01:45 DEBUG Complementing column REF
pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG Labelling ambiguous variants with exclude flag
pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG Labelling multiallelic matches with exclude flag
pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG Not excluding flipped matches
pgscatalog_utils.match.filter: 2023-10-23 15:01:45 DEBUG Filtering to best_match variants (with exclude flag = False)
pgscatalog_utils.match.filter: 2023-10-23 15:01:45 DEBUG Calculating overlap between target genome and scoring file
pgscatalog_utils.match.filter: 2023-10-23 15:01:46 ERROR Score scorefile_338_DIA_HIS fails minimum matching threshold (1.78% variants match)
pgscatalog_utils.match.match_variants: 2023-10-23 15:01:46 CRITICAL Error: no target variants match any variants in scoring files
Traceback (most recent call last):
  File "/gpfs3/well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS/env_prs/projectA-skylake/bin/combine_matches", line 8, in <module>
    sys.exit(combine_matches())
  File "/gpfs3/well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS/env_prs/projectA-skylake/lib/python3.10/site-packages/pgscatalog_utils/match/combine_matches.py", line 40, in combine_matches
    log_and_write(matches=matches, scorefile=scorefile, dataset=dataset, args=args)
  File "/gpfs3/well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS/env_prs/projectA-skylake/lib/python3.10/site-packages/pgscatalog_utils/match/match_variants.py", line 90, in log_and_write
    raise Exception("No valid matches found")
Exception: No valid matches found
Work dir:
/gpfs3/well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS/work/54/c3811f5c0147be7e6f41910bebde79
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
-- Check '.nextflow.log' file for details
ERROR ~ ERROR: Matching subworkflow failed
-- Check '.nextflow.log' file for details
ERROR ~ ERROR: No results report written!
-- Check '.nextflow.log' file for details
ERROR ~ ERROR: No scores calculated!
-- Check '.nextflow.log' file for details
I'm hoping that you could provide some guidance or assistance in resolving it. Your help in this matter would be greatly appreciated.
Originally posted by @ElixBaSe in PGScatalog/pgscatalog_utils#60 (comment)