MATCH_COMBINE assertion error when match dataframe is empty #60

nebfield · 2023-10-23T12:39:25Z

I have the same error, and I'm wondering if it has to do with multiallelic variants. In my original BGEN file I have many multiallelic variants, for example:

alternate_ids rsid chromosome position number_of_alleles first_allele alternative_alleles
. 21:10968913_G/A 21 10968913 2 A G
. 21:10968913_G/C 21 10968913 2 C G
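The two records above share a chromosome and position but carry different alternative alleles, which is how a split multiallelic site looks. Below is a minimal sketch for spotting such sites in a variant listing. It assumes the rows have already been parsed into tuples, and the function name is hypothetical, not part of pgscatalog_utils:

```python
from collections import defaultdict

# Rows from the listing above: (rsid, chromosome, position, first_allele, alternative_alleles)
variants = [
    ("21:10968913_G/A", "21", 10968913, "A", "G"),
    ("21:10968913_G/C", "21", 10968913, "C", "G"),
]


def find_multiallelic_sites(rows):
    """Return {(chrom, pos): alleles} for positions seen with more than two distinct alleles."""
    alleles_at_site = defaultdict(set)
    for _rsid, chrom, pos, allele1, allele2 in rows:
        alleles_at_site[(chrom, pos)].update((allele1, allele2))
    return {site: alleles for site, alleles in alleles_at_site.items() if len(alleles) > 2}


print(find_multiallelic_sites(variants))
```

Grouping on (chromosome, position) rather than on the ID works because split records get distinct IDs such as `21:10968913_G/A` and `21:10968913_G/C`.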

I'm just using one chunked pgen in my samplesheet to test:

sampleset,path_prefix,chrom,format
test,/home/bwolford/archive/pgen/h234_hrc_chr21_chunk1,21,pfile

I tried the --keep_multiallelic option but I get the same error.

nextflow run pgscatalog/pgsc_calc -profile conda --input samplesheet.csv --pgs_id PGS000752 --target_build GRCh38 --chrom 21 --keep_multiallelic

ERROR ~ Error executing process > 'PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (test)'

Caused by:
  Process `PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (test)` terminated with an error exit status (1)

Command executed:

  export POLARS_MAX_THREADS=2

  combine_matches                  --dataset test         --scorefile scorefiles.txt.gz         --matches *.ipc.zst         -n 2         --min_overlap 0.75                  --keep_multiallelic                  --outdir $PWD         --split                  -v

  cat <<-END_VERSIONS > versions.yml
  MATCH_COMBINE:
      pgscatalog_utils: $(echo $(python -c 'import pgscatalog_utils; print(pgscatalog_utils.__version__)'))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  root: 2023-10-22 22:02:53 DEBUG    Verbose logging enabled
  pgscatalog_utils.config: 2023-10-22 22:02:53 DEBUG    Using 2 threads to read CSVs
  pgscatalog_utils.config: 2023-10-22 22:02:53 DEBUG    polars threadpool size: 2
  pgscatalog_utils.match.read: 2023-10-22 22:02:53 DEBUG    Reading scorefile
  pgscatalog_utils.match.read: 2023-10-22 22:02:53 DEBUG    --chrom parameter not set, using all variants in scoring file
  pgscatalog_utils.match.preprocess: 2023-10-22 22:02:53 DEBUG    Complementing column effect_allele
  pgscatalog_utils.match.preprocess: 2023-10-22 22:02:53 DEBUG    Complementing column other_allele
  pgscatalog_utils.match.combine_matches: 2023-10-22 22:02:53 DEBUG    Reading matches
  pgscatalog_utils.match.combine_matches: 2023-10-22 22:02:53 DEBUG    Labelling match candidates
  pgscatalog_utils.match.label: 2023-10-22 22:02:53 DEBUG    Labelling best match type (refalt > altref > ...)
  pgscatalog_utils.match.label: 2023-10-22 22:02:53 DEBUG    Labelling duplicated best match: keeping first instance as best_match = True
  pgscatalog_utils.match.label: 2023-10-22 22:02:53 DEBUG    Labelling multiple scoring file lines (accession/row_nr) that best_match to the same variant
  pgscatalog_utils.match.label: 2023-10-22 22:02:53 DEBUG    Labelling all duplicates with exclude flag
  pgscatalog_utils.match.label: 2023-10-22 22:02:53 DEBUG    Labelling ambiguous variants
  pgscatalog_utils.match.preprocess: 2023-10-22 22:02:53 DEBUG    Complementing column REF
  pgscatalog_utils.match.label: 2023-10-22 22:02:53 DEBUG    Labelling ambiguous variants with exclude flag
  pgscatalog_utils.match.label: 2023-10-22 22:02:53 DEBUG    Not excluding multiallelic variants
  pgscatalog_utils.match.label: 2023-10-22 22:02:53 DEBUG    Not excluding flipped matches
  Traceback (most recent call last):
    File "/home/bwolford/pgs_calc/work/conda/pgscatalog_utils-b4f3f611180e4ff75ddd463e7ba86339/bin/combine_matches", line 8, in <module>
      sys.exit(combine_matches())
    File "/home/bwolford/pgs_calc/work/conda/pgscatalog_utils-b4f3f611180e4ff75ddd463e7ba86339/lib/python3.10/site-packages/pgscatalog_utils/match/combine_matches.py", line 37, in combine_matches
      _check_duplicate_vars(matches)
    File "/home/bwolford/pgs_calc/work/conda/pgscatalog_utils-b4f3f611180e4ff75ddd463e7ba86339/lib/python3.10/site-packages/pgscatalog_utils/match/combine_matches.py", line 52, in _check_duplicate_vars
      assert max_occurrence == [1], "Duplicate IDs in final matches"
  AssertionError: Duplicate IDs in final matches

Work dir:
  /home/bwolford/pgs_calc/work/cd/da3fd357a9ab8d0b9d74c011c291ed

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: Matching subworkflow failed

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: No results report written!

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: No scores calculated!

 -- Check '.nextflow.log' file for details

Originally posted by @bnwolford in PGScatalog/pgsc_calc#72 (comment)
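The traceback ends at `assert max_occurrence == [1]`. If the matches table is empty, there is no maximum count to compare, so an equality test against `[1]` fails even though no ID is actually duplicated. That is the situation the issue title describes. The plain-Python sketch below illustrates this failure mode under that assumption. It does not reproduce the pgscatalog_utils code, and both function names are hypothetical:

```python
from collections import Counter


def naive_check(ids):
    """Compare the highest per-ID count against [1], like the assert in the traceback.

    Illustrative only: for an empty input there is no maximum, so the list is
    empty and the comparison fails even though nothing is duplicated.
    """
    counts = Counter(ids)
    max_occurrence = [max(counts.values())] if counts else []
    return max_occurrence == [1]


def check_duplicate_ids(ids):
    """Raise ValueError only when some ID really occurs more than once."""
    counts = Counter(ids)
    # default=0 makes max() safe on an empty sequence
    if max(counts.values(), default=0) > 1:
        duplicated = sorted(i for i, n in counts.items() if n > 1)
        raise ValueError(f"Duplicate IDs in final matches: {duplicated}")


print(naive_check([]))  # False: an empty match table trips the equality test
check_duplicate_ids([])  # no error; an empty table is left for a separate "no matches" check
```

Separating "duplicated" from "empty" lets an empty result surface as a no-match error instead of a misleading duplicate-ID assertion.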


nebfield · 2023-10-23T12:40:01Z

probably linked to #52

ElixBaSe · 2023-10-23T15:25:54Z

Hello, I'm trying to calculate a custom score and I get a similar error. This part of the log caught my attention:

pgscatalog_utils.match.read: 2023-10-23 15:26:08 DEBUG --chrom parameter not set, using all variants in scoring file

I'm not sure what it means, and I can't find any information about it in the documentation.
I tried specifying the --chrome parameter in my script, but then I get an error saying it is an unexpected parameter.
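The quoted line is logged at DEBUG level, so it reads as informational rather than as an error. Going by its wording, when no chromosome filter is given, every scoring-file variant is considered for matching. The sketch below shows that behaviour using a hypothetical `select_scorefile_rows` helper that is not part of pgscatalog_utils. The rows are the three variants from the scorefile example in this comment:

```python
def select_scorefile_rows(rows, chrom=None):
    """Hypothetical filter: keep rows on `chrom`, or every row when chrom is None."""
    if chrom is None:
        return list(rows)
    return [row for row in rows if row["chr_name"] == str(chrom)]


# The three variants from the scorefile example, parsed into dicts
rows = [
    {"chr_name": "1", "chr_position": 20729451, "effect_allele": "G", "other_allele": "C", "effect_weight": 0.018},
    {"chr_name": "1", "chr_position": 39870793, "effect_allele": "T", "other_allele": "C", "effect_weight": 0.041},
    {"chr_name": "1", "chr_position": 46358862, "effect_allele": "G", "other_allele": "A", "effect_weight": 0.008},
]

print(len(select_scorefile_rows(rows)))           # 3: no chrom given, all variants kept
print(len(select_scorefile_rows(rows, chrom=1)))  # 3
print(len(select_scorefile_rows(rows, chrom=2)))  # 0
```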

This is an example of my scorefile:

#format_version=2.0
pgs_name=DIA_HIS_T2D
trait_reported=Type 2 diabetes
genome_build=GRCh38
chr_name chr_position effect_allele other_allele effect_weight
1 20729451 G C 0.018
1 39870793 T C 0.041
1 46358862 G A 0.008
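The paste above may have lost its column separators, and only `#format_version` carries a leading `#`. The sketch below writes the same three variants with every metadata line prefixed by `#` and tab-separated columns. It assumes that layout is what pgsc_calc expects for custom scoring files, so check the pgsc_calc documentation for the exact required header keys. The `write_scorefile` helper is hypothetical:

```python
import csv
import io

HEADER = {
    "format_version": "2.0",
    "pgs_name": "DIA_HIS_T2D",
    "trait_reported": "Type 2 diabetes",
    "genome_build": "GRCh38",
}
COLUMNS = ["chr_name", "chr_position", "effect_allele", "other_allele", "effect_weight"]
ROWS = [
    ("1", 20729451, "G", "C", 0.018),
    ("1", 39870793, "T", "C", 0.041),
    ("1", 46358862, "G", "A", 0.008),
]


def write_scorefile(handle, header, columns, rows):
    """Write '#key=value' metadata lines, then a tab-separated column header and data rows."""
    for key, value in header.items():
        handle.write(f"#{key}={value}\n")
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(rows)


buffer = io.StringIO()
write_scorefile(buffer, HEADER, COLUMNS, ROWS)
print(buffer.getvalue())
```

To write a real file, pass `open("scorefile.txt", "w", newline="")` instead of the `StringIO` buffer.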

This is an example of my samplesheet.csv file:

sampleset,path_prefix,chrom,format
MCPS,data/genetics_regeneron/freeze_150k/data/imputation/oxford_qcd/per_chromosome/pgen_hds/mcps-freeze150k_qcd_chr1,1,pfile
MCPS,/data/genetics_regeneron/freeze_150k/data/imputation/oxford_qcd/per_chromosome/pgen_hds/mcps-freeze150k_qcd_chr2,2,pfile
MCPS,/data/genetics_regeneron/freeze_150k/data/imputation/oxford_qcd/per_chromosome/pgen_hds/mcps-freeze150k_qcd_chr3,3,pfile
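The chromosome 1 `path_prefix` above lacks the leading `/` that the chromosome 2 and 3 rows have, so it would presumably be treated as a relative path. Generating every row from one template avoids that kind of drift. Below is a sketch that assumes the absolute prefix used for chromosomes 2 and 3 applies to all chromosomes, and that chromosomes 1 to 22 are wanted. Extend the list if you also have other chromosomes. `build_samplesheet` is a hypothetical helper:

```python
import csv
import io

# Template built from the chr2/chr3 rows above; assumes only the chromosome number varies
PREFIX_TEMPLATE = (
    "/data/genetics_regeneron/freeze_150k/data/imputation/oxford_qcd/"
    "per_chromosome/pgen_hds/mcps-freeze150k_qcd_chr{chrom}"
)


def build_samplesheet(sampleset, chromosomes, template):
    """Return samplesheet CSV text with one pfile row per chromosome."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sampleset", "path_prefix", "chrom", "format"])
    for chrom in chromosomes:
        writer.writerow([sampleset, template.format(chrom=chrom), chrom, "pfile"])
    return out.getvalue()


sheet = build_samplesheet("MCPS", range(1, 23), PREFIX_TEMPLATE)
print(sheet)
```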

These are the options I'm using in my script:

```bash
echo start
cd "$1"
echo "$PWD"

source "$2"
echo environment from "$2"

module load Anaconda3/2022.05 PLINK/2.00a2.3_x86_64 Python/3.10.4-GCCcore-11.3.0 Java/11.0.2 R/4.2.1-foss-2022a yaml-cpp/0.7.0-GCCcore-11.3.0
pip install pyyaml

# Every argument after the first two is treated as a PGS name
for pgs_name in "${@:3}"; do
    echo "$pgs_name start computation"
    # Backslashes keep the multi-line nextflow call a single command
    ./nextflow run pgscatalog/pgsc_calc -profile conda \
        --input sample_sheet.csv \
        --target_build GRCh38 \
        --parallel \
        --outdir "PRS_calculated/MCPS_$pgs_name" \
        --scorefile scorefile_338_DIA_HIS.txt
    echo "$pgs_name finished"
done

echo all PRS in the list computed

# usage: bash 1.2.run_pgscatalog_custom_scorefile.sh working_directory anaconda_environment PGS_name1 PGS_name2
```

This is the error:

executor >  local (28)
[8b/dd59d6] process > PGSCATALOG_PGSCALC:PGSCALC:INPUT_CHECK:SAMPLESHEET_JSON (sample_sheet.csv) [100%] 1 of 1 ✔
[7d/dc9e54] process > PGSCATALOG_PGSCALC:PGSCALC:INPUT_CHECK:COMBINE_SCOREFILES (1) [100%] 1 of 1 ✔
[-        ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_RELABELBIM -
[skipped  ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_RELABELPVAR (MCPS chromosome 5) [100%] 23 of 23, stored: 23 ✔
[-        ] process > PGSCATALOG_PGSCALC:PGSCALC:MAKE_COMPATIBLE:PLINK2_VCF -
[cc/ff8584] process > PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_VARIANTS (MCPS chromosome 1) [100%] 23 of 23 ✔
[54/c3811f] process > PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS) [100%] 3 of 3, failed: 3, retries: 2 ✘
[-        ] process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:PLINK2_SCORE -
[-        ] process > PGSCATALOG_PGSCALC:PGSCALC:APPLY_SCORE:SCORE_AGGREGATE -
[-        ] process > PGSCATALOG_PGSCALC:PGSCALC:REPORT:SCORE_REPORT -
[-        ] process > PGSCATALOG_PGSCALC:PGSCALC:DUMPSOFTWAREVERSIONS -
[67/413e7f] NOTE: Process `PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS)` terminated with an error exit status (1) -- Execution is retried (1)
[a5/c006f8] NOTE: Process `PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS)` terminated with an error exit status (1) -- Execution is retried (2)
ERROR ~ Error executing process > 'PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS)'

Caused by:
  Process `PGSCATALOG_PGSCALC:PGSCALC:MATCH:MATCH_COMBINE (MCPS)` terminated with an error exit status (1)

Command executed:

  export POLARS_MAX_THREADS=2

  combine_matches --dataset MCPS --scorefile scorefiles.txt.gz --matches *.ipc.zst -n 2 --min_overlap 0.75 --outdir $PWD --split -v

  cat <<-END_VERSIONS > versions.yml
  MATCH_COMBINE:
      pgscatalog_utils: $(echo $(python -c 'import pgscatalog_utils; print(pgscatalog_utils.__version__)'))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  root: 2023-10-23 15:01:45 DEBUG    Verbose logging enabled
  pgscatalog_utils.config: 2023-10-23 15:01:45 DEBUG    Using 2 threads to read CSVs
  pgscatalog_utils.config: 2023-10-23 15:01:45 DEBUG    polars threadpool size: 2
  pgscatalog_utils.match.read: 2023-10-23 15:01:45 DEBUG    Reading scorefile
  pgscatalog_utils.match.read: 2023-10-23 15:01:45 DEBUG    --chrom parameter not set, using all variants in scoring file
  pgscatalog_utils.match.preprocess: 2023-10-23 15:01:45 DEBUG    Complementing column effect_allele
  pgscatalog_utils.match.preprocess: 2023-10-23 15:01:45 DEBUG    Complementing column other_allele
  pgscatalog_utils.match.combine_matches: 2023-10-23 15:01:45 DEBUG    Reading matches
  pgscatalog_utils.match.combine_matches: 2023-10-23 15:01:45 DEBUG    Labelling match candidates
  pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG    Labelling best match type (refalt > altref > ...)
  pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG    Labelling duplicated best match: keeping first instance as best_match = True
  pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG    Labelling multiple scoring file lines (accession/row_nr) that best_match to the same variant
  pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG    Labelling all duplicates with exclude flag
  pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG    Labelling ambiguous variants
  pgscatalog_utils.match.preprocess: 2023-10-23 15:01:45 DEBUG    Complementing column REF
  pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG    Labelling ambiguous variants with exclude flag
  pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG    Labelling multiallelic matches with exclude flag
  pgscatalog_utils.match.label: 2023-10-23 15:01:45 DEBUG    Not excluding flipped matches
  pgscatalog_utils.match.filter: 2023-10-23 15:01:45 DEBUG    Filtering to best_match variants (with exclude flag = False)
  pgscatalog_utils.match.filter: 2023-10-23 15:01:45 DEBUG    Calculating overlap between target genome and scoring file
  pgscatalog_utils.match.filter: 2023-10-23 15:01:46 ERROR    Score scorefile_338_DIA_HIS fails minimum matching threshold (1.78% variants match)
  pgscatalog_utils.match.match_variants: 2023-10-23 15:01:46 CRITICAL Error: no target variants match any variants in scoring files
  Traceback (most recent call last):
    File "/gpfs3/well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS/env_prs/projectA-skylake/bin/combine_matches", line 8, in <module>
      sys.exit(combine_matches())
    File "/gpfs3/well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS/env_prs/projectA-skylake/lib/python3.10/site-packages/pgscatalog_utils/match/combine_matches.py", line 40, in combine_matches
      log_and_write(matches=matches, scorefile=scorefile,
                    dataset=dataset, args=args)
    File "/gpfs3/well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS/env_prs/projectA-skylake/lib/python3.10/site-packages/pgscatalog_utils/match/match_variants.py", line 90, in log_and_write
      raise Exception("No valid matches found")
  Exception: No valid matches found

Work dir:
  /gpfs3/well/emberson/users/rgu572/GWAS_Elix/GWAS_Regenie_NoBMI/PRS/work/54/c3811f5c0147be7e6f41910bebde79

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: Matching subworkflow failed

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: No results report written!

 -- Check '.nextflow.log' file for details
ERROR ~ ERROR: No scores calculated!

 -- Check '.nextflow.log' file for details

I'm hoping that you could provide some guidance or assistance in resolving it. Your help in this matter would be greatly appreciated.
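The failing check in this log compares the reported match rate (1.78%) against `--min_overlap 0.75` from the executed command, meaning at least 75% of scoring-file variants must match. The sketch below shows that comparison. It assumes the overlap is simply matched variants divided by scoring-file variants, though pgscatalog_utils may compute it differently (for example, per score or after filtering). The helper name and the counts in the usage line are hypothetical, chosen only to reproduce a 1.78% rate:

```python
def passes_min_overlap(n_matched, n_total, min_overlap=0.75):
    """Return (passes, fraction) for the share of scoring-file variants that matched."""
    if n_total == 0:
        return False, 0.0
    fraction = n_matched / n_total
    return fraction >= min_overlap, fraction


passes, fraction = passes_min_overlap(178, 10_000)  # hypothetical counts
print(passes, f"{fraction:.2%}")  # False 1.78%
```

At 1.78% the score is far below the 75% threshold, so it is dropped. With no other score left, the run ends in "No valid matches found".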

nebfield · 2023-12-05T14:19:06Z

https://github.com/PGScatalog/pgscatalog_utils/releases/tag/v0.4.3

nebfield added the bug Something isn't working label Oct 23, 2023

nebfield mentioned this issue Oct 23, 2023

Custom score low overlap error PGScatalog/pgsc_calc#202

Closed

nebfield mentioned this issue Nov 28, 2023

remove assert in match_combine #67

Merged

nebfield added this to the v0.4.3 milestone Nov 29, 2023

nebfield closed this as completed Dec 5, 2023
