
Merging all changes from the IMBG-refactoring branch to the main branch #4

Open
wants to merge 109 commits into base: main
109 commits
b9afcf9
add test data config and TODOs for steps 1.1-2.3
d-goryslavets Jun 28, 2024
1d2aabb
add test suite skeleton
d-goryslavets Jun 28, 2024
9ed44a1
add TODOs for steps 2.1-2.5 and update paths to point to test data
d-goryslavets Jul 2, 2024
117b2d1
minimal refactoring. add tmp_dir config. add mk43_ prefix to shared i…
mkrooted256 Jul 4, 2024
df04bd0
draft of a get_path util
mkrooted256 Jul 4, 2024
268affd
successful 2.1. replace + with os.path.join. annotate hardcoded paths…
mkrooted256 Jul 4, 2024
8354844
add test suite skeleton and test data
d-goryslavets Jul 15, 2024
0023053
Merge branch 'mk43/first-run' into imbg-refactoring. Refactor steps 1…
d-goryslavets Jul 15, 2024
5890acd
add tmp_dir to test data configs
d-goryslavets Jul 15, 2024
f77d107
cleanup 1.1
mkrooted256 Jul 15, 2024
7bec087
first attempts
mkrooted256 Jul 16, 2024
2cdf407
steps 1.1-2.1: refactor config and move hardcoded values
mkrooted256 Jul 18, 2024
68c2413
add todo
mkrooted256 Jul 23, 2024
1412199
upload documentation
lbombini Jul 25, 2024
73a6373
change smoke test design and add unit test template
d-goryslavets Jul 18, 2024
efab22d
change string concat to os.path.join in steps 1-2, make some function…
d-goryslavets Jul 25, 2024
e2cf204
add integration tests that run main() function of each step
d-goryslavets Jul 25, 2024
261ac9b
add unit tests for steps 1.1-2.2
d-goryslavets Jul 25, 2024
052ee3d
add a snippet for new config system. research possible ways to refact…
mkrooted256 Jul 25, 2024
b5329a8
add config utils and config ipynb playground
mkrooted256 Jul 25, 2024
1990d99
refactor path config for 1.1
mkrooted256 Jul 25, 2024
f02b394
Finish config refactoring for 2.1.
mkrooted256 Jul 26, 2024
e6c5f64
update the docs
lbombini Jul 31, 2024
10a28fa
Merge branch 'imbg-refactoring' of https://github.com/wtsi-hgi/wes-qc…
lbombini Jul 31, 2024
835f77a
fix the code blocks
lbombini Aug 1, 2024
6ee2fcb
refactor path concatenation in steps 2.3-2.5
d-goryslavets Aug 1, 2024
0de9822
add unit and integration tests for steps 2.3-2.5
d-goryslavets Aug 1, 2024
ffe66fd
Merge branch 'imbg-refactoring' into mk43/refactor-hardcoded-config/2.1
mkrooted256 Aug 4, 2024
3b8fb10
add config tests. move config utils to a separate file.
mkrooted256 Aug 4, 2024
adfc9c1
fix a probable config bug
mkrooted256 Aug 4, 2024
c44116a
refactor unittests for 1.1-2.1
mkrooted256 Aug 7, 2024
7790826
minor changes to run 1.1 and 2.1
lbombini Aug 8, 2024
1beef0f
revert stratified_sample_qc to be general
y-popov Aug 15, 2024
9e05cee
fix circular import and name mangling in utils
d-goryslavets Aug 20, 2024
f24d3e2
fix step that required its own output files to run
d-goryslavets Aug 20, 2024
0a4d776
upd unit tests to match refactored steps 1.1-2.1
d-goryslavets Aug 20, 2024
55bd50c
Merge pull request #1 from wtsi-hgi/fix-2.4
d-goryslavets Aug 21, 2024
1404f95
upd unit test after stratified sample qc change
d-goryslavets Aug 22, 2024
36e6588
Refactor 2.2. Minor change in config naming. Change config fields an…
mkrooted256 Aug 22, 2024
32ad76a
Remove sanity checks in direct kwargs config groups. Fix ld_prune con…
mkrooted256 Aug 22, 2024
3001b1c
integrate test data download
lbombini Aug 22, 2024
4d3635f
Merge branch 'mk43/refactor-hardcoded-config/2.2' into dh24/refactor-…
d-goryslavets Aug 27, 2024
5e956b7
Extract hardcoded values from step 2.4; add stub 2.3 config
mkrooted256 Aug 29, 2024
c11b704
Add path type adapters in step 2.4
mkrooted256 Aug 29, 2024
090b5b7
Complete function annotations with indirect config fields in 2.4
mkrooted256 Aug 29, 2024
80f0c01
refactor step 2.3
d-goryslavets Aug 29, 2024
58074c4
upd integration test config structure
d-goryslavets Aug 29, 2024
42a0972
Extract hardcoded values from 2.5; add path type adapters
mkrooted256 Aug 29, 2024
ac10add
update unit tests to reflect refactored steps 2.2
lbombini Aug 29, 2024
721f5f6
download data for integration tests from s3 bucket
d-goryslavets Aug 30, 2024
469b6b4
Merge branch 'vk11/refactor-unit-test/2.2' into vk11/refactor-unit-te…
lbombini Aug 30, 2024
f1a6b36
download only required data from the bucket and fix back stratified s…
d-goryslavets Aug 30, 2024
8d81160
update the unit test to reflect changes in step 2.3
lbombini Aug 30, 2024
99151ea
fix config and add ref_mtdir
lbombini Aug 30, 2024
0f7b5b1
Merge branch 'mk43/refactor-hardcoded-config/2.5' of https://github.c…
lbombini Sep 2, 2024
b26eb21
rename path >> file and 2.2 functions in config
lbombini Sep 2, 2024
0874a4b
Merge branch 'dh24/refactor-hardcoded-config/2.3' into imbg-refactori…
d-goryslavets Sep 3, 2024
635a6ed
Merge branch 'vk11/refactor-unit-test/2.3' into imbg-refactoring, upd…
d-goryslavets Sep 3, 2024
240995d
minor fixes to run 2.4 test
lbombini Sep 3, 2024
9ca3f28
Merge branch 'mk43/refactor-hardcoded-config/2.5' into imbg-refactori…
d-goryslavets Sep 3, 2024
15a995e
Merge branch 'vk11/refactor-unit-test/2.4-2.5' into imbg-refactoring
d-goryslavets Sep 3, 2024
d606ccc
Add stub config argument for consistency
mkrooted256 Sep 4, 2024
de8c82e
more config tests. first version of new config parser. config docs
mkrooted256 Sep 5, 2024
c2100da
add test data downloading from s3 to integration tests, upd configs, …
d-goryslavets Sep 6, 2024
cf2d327
refactor test data downloading into utils func, use it in unit tests
d-goryslavets Sep 6, 2024
6cdf3ab
new config parser. add tests. add docs. enhance code structure.
mkrooted256 Sep 6, 2024
b1f45c3
extract hardcoded params and add path type adapters for trios and non…
d-goryslavets Sep 9, 2024
4dd26f6
step 3.2: extract hardcoded params and add path type adapters for tri…
d-goryslavets Sep 9, 2024
b2fed93
step 3.3: extract hardcoded params and add path type adapters
d-goryslavets Sep 17, 2024
9c770a5
rename config tests dir back to local_tests for a merge
mkrooted256 Sep 17, 2024
c84c2de
Merge branch 'mk43/new-config-parser' into mk43/parser/v2.0-pre
mkrooted256 Sep 17, 2024
d6419d2
start fixing config tests. prepare a new config system
mkrooted256 Sep 17, 2024
445219d
rename local_tests to config_tests
mkrooted256 Sep 17, 2024
e6f968b
finish parser v2.0-pre. Ready for beta
mkrooted256 Sep 17, 2024
1b37f73
steps 3.4-3.9: extract hardcoded params, add path type adapters, fix …
d-goryslavets Sep 19, 2024
3122a1e
download RF training data from the s3 bucket in tests
d-goryslavets Sep 19, 2024
2f71069
steps 3.4-3.9: add missing path adapters, fix incorrect paths, refact…
d-goryslavets Sep 19, 2024
04892e1
3.4-3.9 add missing adapters, fix bokeh imports and other bugs
d-goryslavets Sep 20, 2024
1c67082
3.9 add missing path adapter
d-goryslavets Sep 20, 2024
d1ae745
fix and improve integration tests config
d-goryslavets Sep 20, 2024
09fee1e
refactor step 4: move hardcoded params to config, add path type adapt…
d-goryslavets Sep 20, 2024
6d291b5
add hail logs to gitignore
d-goryslavets Sep 20, 2024
3bf274c
add step 3 integration tests
lbombini Sep 21, 2024
6df11c8
allow to manually set run id in 3-train_rf
d-goryslavets Sep 23, 2024
b8f12f0
add var_qc_rf_dir to config variables
d-goryslavets Sep 23, 2024
ace32eb
switch to http-based test data downloading in unit tests; fix incorre…
d-goryslavets Sep 23, 2024
afe7460
add updated integration tests template for step 3-variant_qc
d-goryslavets Sep 23, 2024
f2a98fe
Merge branch 'dh24/refactor-hardcoded-config/4' into imbg-refactoring
d-goryslavets Sep 24, 2024
504d3a9
add missing path type adapters in 3.4
d-goryslavets Sep 24, 2024
831f558
upd example config and add script running all steps of QC
d-goryslavets Sep 24, 2024
a47837a
improve helper script to run all steps
d-goryslavets Sep 24, 2024
15d9742
Merge branch 'mk43/parser/v2.0-pre' into imbg-refactoring
d-goryslavets Sep 24, 2024
3afeaa1
Make parse_config_file return nested config and not flat. Add corresp…
mkrooted256 Sep 24, 2024
8b0e7b5
Merge branch 'mk43/parser/v2.0-pre' into imbg-refactoring; fix bugs
d-goryslavets Sep 24, 2024
0f12c2f
change dots to underscores in qc plot settings names
d-goryslavets Sep 24, 2024
03cea82
update config parsing function names in scripts
d-goryslavets Sep 24, 2024
88f98ef
upd test suite to use new config system; add integration tests for 4-…
d-goryslavets Sep 27, 2024
c270e9b
fix plot settings names in integration tests config template
d-goryslavets Sep 27, 2024
0613997
adjust variant qc scripts
y-popov Oct 1, 2024
2112dad
refactor evaluation script
y-popov Oct 3, 2024
e720546
fix indentation and partition calcs
y-popov Oct 3, 2024
0a1defa
make giab sample optional for compare combinations
y-popov Oct 4, 2024
d802b70
add readme for the test suite
d-goryslavets Oct 7, 2024
c23ce27
more robust path join
y-popov Oct 7, 2024
3793952
Merge pull request #2 from wtsi-hgi/ip13-1kg-trio-test
d-goryslavets Oct 11, 2024
0bd3b8d
add skeleton for the remaining regression tests
d-goryslavets Oct 16, 2024
bd5bbe1
add draft for downloading test data using list of files instead of an…
d-goryslavets Oct 16, 2024
6a7e641
create plots dir if doesn't exist; refactor path joining for RF outdir
d-goryslavets Oct 18, 2024
7fca5f9
switch to downloading unarchived test data; upd and refactor tests ac…
d-goryslavets Oct 18, 2024
3 changes: 3 additions & 0 deletions .gitignore
@@ -1,3 +1,6 @@
# Hail run logs
hail-*.log

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
67 changes: 44 additions & 23 deletions 1-import_data/1-import_gatk_vcfs_to_hail.py
@@ -1,42 +1,63 @@
# Load GATK VCFs into hail and save as matrixtable
import hail as hl
import pyspark
import yaml
import os
import sys
from wes_qc.utils.utils import parse_config
import re
import hail as hl
from utils.utils import parse_config, path_local, path_spark

# DEBUG: for some reason, paths prefix is `file:`, not a `file://`
VCF_PATTERN = re.compile("file:.*vcf.b?gz")

def load_vcfs_to_mt(indir, outdir, tmp_dir, header):
def load_vcfs_to_mt(config):
'''
load VCFs and save as hail mt
load VCFs and save as hail mt.
Save mt as outdir/gatk_unprocessed.mt

### Config fields
```
step1.gatk_vcf_header_infile
step1.gatk_vcf_indir
step1.gatk_mt_outfile
```
'''
objects = hl.utils.hadoop_ls(indir)
vcfs = [vcf["path"] for vcf in objects if (vcf["path"].startswith("file") and vcf["path"].endswith("vcf.gz"))]
print("Loading VCFs")
indir, header, outfile = (
config['step1']['gatk_vcf_indir'],
config['step1'].get('gatk_vcf_header_infile'), # optional
config['step1']['gatk_mt_outfile']
)

objects = hl.utils.hadoop_ls(path_spark(indir))

# get paths of all vcf files
vcfs = [vcf["path"] for vcf in objects if VCF_PATTERN.match(vcf["path"])]
print(f"info: Found {len(vcfs)} VCFs in {indir}")
#create and save MT
mt = hl.import_vcf(vcfs, array_elements_required=False, force_bgz=True, header_file = header)
print("Saving as hail mt")
mt_out_file = outdir + "gatk_unprocessed.mt"
if header:
print("info: Loading VCFs with header")
mt = hl.import_vcf(vcfs, array_elements_required=False, force_bgz=True, header_file=header)
else:
print("info: Loading VCFs WITHOUT header")
mt = hl.import_vcf(vcfs, array_elements_required=False, force_bgz=True)

mt_out_file = path_spark(outfile)
print(f"Saving as hail mt to {mt_out_file}")
mt.write(mt_out_file, overwrite=True)


def main():
#set up input variables
inputs = parse_config()
vcf_header = inputs['gatk_vcf_header']
import_vcf_dir = inputs['gatk_import_lustre_dir']
mtdir = inputs['matrixtables_lustre_dir']

config = parse_config()

#initialise hail
tmp_dir = "hdfs://spark-master:9820/"
sc = pyspark.SparkContext()
hadoop_config = sc._jsc.hadoopConfiguration()
hl.init(sc=sc, tmp_dir=tmp_dir, default_reference="GRCh38")
tmp_dir = config['general']['tmp_dir']
# sc = pyspark.SparkContext()
sc = pyspark.SparkContext.getOrCreate()
hadoop_config = sc._jsc.hadoopConfiguration() # unused
hl.init(sc=sc, tmp_dir=tmp_dir, default_reference="GRCh38", idempotent=True)

#load VCFs
load_vcfs_to_mt(import_vcf_dir, mtdir, tmp_dir, vcf_header)

load_vcfs_to_mt(config)

if __name__ == '__main__':
main()
main()
114 changes: 77 additions & 37 deletions 2-sample_qc/1-hard_filters_sex_annotation.py
@@ -1,68 +1,104 @@
#apply gnomad's hard filters and impute sex
#input gatk_unprocessed.mt from step 1.1
import os
import hail as hl
import hailtop.fs as hfs
import pyspark
from utils.utils import parse_config
from utils.utils import parse_config, path_local, path_spark
import os

def apply_hard_filters(mt: hl.MatrixTable, mtdir: str) -> hl.MatrixTable:
def apply_hard_filters(mt: hl.MatrixTable, config: dict) -> hl.MatrixTable:
'''
Applies hard filters and annotates samples in the filtered set with call rate
:param MatrixTable mt: MT containing samples to be ascertained for sex
:param str mtdir: directory output matrix tables are written to
:param dict config:
:return: MatrixTable with hard filtering annotation
:rtype: MatrixTable

### Config fields
step2.sex_annotation_hard_filters.filtered_mt_outfile : path
step2.sex_annotation_hard_filters.n_alt_alleles_threshold : float
step2.sex_annotation_hard_filters.defined_gt_frac_threshold : float
'''
conf = config['step2']['sex_annotation_hard_filters']

print("Applying hard filters")
filtered_mt_file = mtdir + "mt_hard_filters_annotated.mt"
filtered_mt_file = path_spark(conf['filtered_mt_outfile']) # output

mt = mt.filter_rows((hl.len(mt.alleles) == 2) & hl.is_snp(mt.alleles[0], mt.alleles[1]) &
(hl.agg.mean(mt.GT.n_alt_alleles()) / 2 > 0.001) &
(hl.agg.fraction(hl.is_defined(mt.GT)) > 0.99))
(hl.agg.mean(mt.GT.n_alt_alleles()) / 2 > conf['n_alt_alleles_threshold']) &
(hl.agg.fraction(hl.is_defined(mt.GT)) > conf['defined_gt_frac_threshold']))
mt = mt.annotate_cols(callrate=hl.agg.fraction(hl.is_defined(mt.GT)))
mt.write(filtered_mt_file, overwrite=True)

return mt


def impute_sex(mt: hl.MatrixTable, mtdir: str, annotdir: str, male_threshold: float = 0.8, female_threshold: float = 0.5) -> hl.MatrixTable:
def impute_sex(mt: hl.MatrixTable, config: dict) -> hl.MatrixTable:
'''
Imputes sex, exports data, and annotates mt with this data
:param MatrixTable mt: MT containing samples to be ascertained for sex
:param str mtdir: directory output matrix tables are written to
:param str annotdir: directory annotation files are written to
:param dict config:
:return: MatrixTable with imputed sex annotations stashed in column annotation 'sex_check'
:rtype: MatrixTable

### Config fields
step2.impute_sex.sex_ht_outfile : path
step2.impute_sex.sex_mt_outfile : path
step2.impute_sex.female_threshold : float
step2.impute_sex.male_threshold : float
step2.impute_sex.aaf_threshold : float
'''
print("Imputing sex with male_threshold = " + str(male_threshold) + " and female threshold = " + str(female_threshold))

conf = config['step2']['impute_sex']
print("Imputing sex with male_threshold = " + str(conf['male_threshold']) + " and female threshold = " + str(conf['female_threshold']))

#filter to X and select unphased diploid genotypes - no need to filter to X as impute_sex takes care of this
#mt1 = hl.filter_intervals(mt, [hl.parse_locus_interval('chrX')])
mt1 = hl.split_multi_hts(mt)
mtx_unphased = mt1.select_entries(GT=hl.unphased_diploid_gt_index_call(mt1.GT.n_alt_alleles()))
#imput sex on the unphased diploid GTs
sex_ht = hl.impute_sex(mtx_unphased.GT, aaf_threshold=0.05, female_threshold=female_threshold, male_threshold=male_threshold)
sex_ht = hl.impute_sex(mtx_unphased.GT, aaf_threshold=conf['aaf_threshold'], female_threshold=conf['female_threshold'], male_threshold=conf['male_threshold'])
#export
sex_ht.export(annotdir + '/sex_annotated.sex_check.txt.bgz')
sex_ht.export(path_spark(conf['sex_ht_outfile'])) # output
#annotate input (all chroms) mt with imputed sex and write to file
sex_colnames = ['f_stat', 'is_female']
sex_ht = sex_ht.select(*sex_colnames)
mt = mt.annotate_cols(**sex_ht[mt.col_key])
sex_mt_file = mtdir + "mt_sex_annotated.mt"
sex_mt_file = path_spark(conf['sex_mt_outfile']) # output
print("Writing to " + sex_mt_file)
mt.write(sex_mt_file, overwrite=True)

return mt


def identify_inconsistencies(mt: hl.MatrixTable, mtdir: str, annotdir: str, resourcedir: str):
def identify_inconsistencies(mt: hl.MatrixTable, config: dict):
'''
Find samples where annotated sex conflicts with the sex in our metadata
Find samples where sex is not annotated
Find samples where f_stat is between 0.2 and 0.8
Find samples where f_stat is between fstat_low and fstat_high
:param MatrixTable mt: MT containing imputed sex in column 'sex_check'
:param str mtdir: directory output matrix tables are written to
:param str annotdir: directory annotation files are written to
:param str resourcedir: directory annotation files are written to
:param dict config:

### Config fields
step2.sex_inconsistencies.sex_metadata_file : input path : TODO explain metadata structure and constants
step2.sex_inconsistencies.conflicting_sex_report_file : output path : TODO
step2.sex_inconsistencies.fstat_outliers_report_file : output path : TODO
step2.sex_inconsistencies.fstat_low : float
step2.sex_inconsistencies.fstat_high : float
'''
conf = config['step2']['sex_inconsistencies']

# TODO: do we need such a detailed logging, or a single if (... and ... and ...) will suffice?
error = False
if not hfs.exists(conf['sex_metadata_file']):
print("error: identify_inconsistencies: missing input: sex_metadata_file")
error = True
if error:
print("skip identify_inconsistencies because of previous errors")
return

print("Annotating samples with inconsistencies:")
qc_ht = mt.cols()
#convert is_female boolean to sex
@@ -73,8 +109,8 @@ def identify_inconsistencies(mt: hl.MatrixTable, mtdir: str, annotdir: str, reso
qc_ht = qc_ht.annotate(sex=sex_expr).key_by('s')

#annotate with manifest sex - keyed on ega to match identifiers in matrixtable
metadata_file = resourcedir + '/mlwh_sample_and_sex.txt'
metadata_ht = hl.import_table(metadata_file, delimiter="\t").key_by('accession_number')

metadata_ht = hl.import_table(path_spark(conf['sex_metadata_file']), delimiter="\t").key_by('accession_number')
#we only want those from the metadata file where sex is known
metadata_ht = metadata_ht.filter((metadata_ht.gender == 'Male') | (metadata_ht.gender == 'Female'))

@@ -84,41 +120,45 @@ def identify_inconsistencies(mt: hl.MatrixTable, mtdir: str, annotdir: str, reso
#identify samples where imputed sex and manifest sex conflict
conflicting_sex_ht = ht_joined.filter(((ht_joined.sex == 'male') & (ht_joined.manifest_sex == 'Female')) | (
(ht_joined.sex == 'female') & (ht_joined.manifest_sex == 'Male')))
conflicting_sex_ht.export(annotdir + '/conflicting_sex.txt.bgz')

#identify samples where f stat is between 0.2 and 0.8
f_stat_ht = qc_ht.filter( (qc_ht.f_stat > 0.2) & (qc_ht.f_stat < 0.8) )
f_stat_ht.export(annotdir + '/sex_annotation_f_stat_outliers.txt.bgz')
# TODO: do we need this redundancy? the paths already have the "file://" prefix
conflicting_sex_ht.export(path_spark(conf['conflicting_sex_report_file'])) # output

#identify samples where f stat is between fstat_low and fstat_high
f_stat_ht = qc_ht.filter( (qc_ht.f_stat > conf['fstat_low']) & (qc_ht.f_stat < conf['fstat_high']) )
f_stat_ht.export(path_spark(conf['fstat_outliers_report_file'])) # output


def main():
#set up
inputs = parse_config()
config = parse_config()
#importmtdir = inputs['load_matrixtables_lustre_dir']
mtdir = inputs['matrixtables_lustre_dir']
annotdir = inputs['annotation_lustre_dir']
resourcedir = inputs['resource_dir']

#initialise hail
tmp_dir = "hdfs://spark-master:9820/"
sc = pyspark.SparkContext()
#initialise hailS
tmp_dir = config['general']['tmp_dir']
# sc = pyspark.SparkContext()
sc = pyspark.SparkContext.getOrCreate()
hadoop_config = sc._jsc.hadoopConfiguration()
hl.init(sc=sc, tmp_dir=tmp_dir, default_reference="GRCh38")
hl.init(sc=sc, tmp_dir=tmp_dir, default_reference="GRCh38", idempotent=True)

mt_in_file = mtdir + "/gatk_unprocessed.mt"
mt_infile = config['step1']['gatk_mt_outfile'] # input from 1.1
print("Reading input matrix")
mt_unfiiltered = hl.read_matrix_table(mt_in_file)
mt_unfiltered = hl.read_matrix_table(path_spark(mt_infile))

#apply hard fitlers
mt_filtered = apply_hard_filters(mt_unfiiltered, mtdir)
mt_filtered = apply_hard_filters(mt_unfiltered, config)

#impute sex
mt_sex = impute_sex(mt_filtered, mtdir, annotdir, male_threshold=0.6)
mt_sex = impute_sex(mt_filtered, config)

# TODO: where is this function?
# annotate_ambiguous_sex(mt_sex, mtdir)
identify_inconsistencies(mt_sex, mtdir, annotdir, resourcedir)


# TODO: make this optional and check how it affects the downstream steps
# there is no metadata for our contrived test datasets
#identify_inconsistencies
identify_inconsistencies(mt_sex, config)

if __name__ == '__main__':
main()
