ClinVar XML to RDF Converter
- Docker
$ docker build --tag clinvar-rdf .
Replace "[yyyymmdd]" below with the latest release date listed on the ClinVar FTP site.
mkdir clinvar_[yyyymmdd]
cd $_
Splitting the XML into several pieces is recommended to reduce processing time.
Check that you are in the working directory
pwd
# => /.../clinvar_[yyyymmdd]
docker run --rm -v $(pwd):/data clinvar-rdf /clinvar-rdf/bin/split $(ls variation_archive_*.xml.gz)
The XML is split into pieces of 10,000 records each.
Check the target files
ls variation_archive_*_*.xml.gz
# => variation_archive_[yyyymmdd]_1.xml.gz variation_archive_[yyyymmdd]_2.xml.gz ...
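Before converting, it may be worth confirming that every split piece is an intact gzip archive. The sketch below is an optional check and not part of the converter itself; the function name check_gzip_pieces is hypothetical, and it relies only on the standard `gzip -t` integrity test.

```shell
# Hypothetical helper: report any file that fails gzip's integrity test.
# Returns 0 if every file passes, 1 if at least one does not.
check_gzip_pieces() {
  pieces_status=0
  for piece in "$@"; do
    if ! gzip -t "$piece" 2>/dev/null; then
      echo "corrupt or incomplete: $piece" >&2
      pieces_status=1
    fi
  done
  return "$pieces_status"
}
```

It could be called as `check_gzip_pieces variation_archive_*_*.xml.gz && echo "all pieces OK"`.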
Ensure there is only one XSD file in the directory, because the convert step passes the result of $(ls *.xsd) to --xsd
ls *.xsd
# => variation_archive_[yyyymmdd].xsd
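The single-XSD requirement can also be checked by a small guard instead of by eye. This is a minimal sketch, assuming a POSIX shell; the function name require_single_xsd is hypothetical and does not ship with the converter.

```shell
# Hypothetical guard: print the path of the XSD file if exactly one exists
# in the given directory (default: current directory), otherwise fail.
require_single_xsd() {
  xsd_dir=${1:-.}
  # Reuse the function's positional parameters to hold the glob matches.
  set -- "$xsd_dir"/*.xsd
  if [ "$#" -eq 1 ] && [ -e "$1" ]; then
    printf '%s\n' "$1"
  else
    echo "expected exactly one .xsd file in $xsd_dir" >&2
    return 1
  fi
}
```

For example, `xsd=$(require_single_xsd) || exit 1` stops a script early when zero or several XSD files are present.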
Run the conversion with 10 parallel processes
ls variation_archive_*_*.xml.gz | xargs -n1 -P10 -I{} bash -c "f={}; zcat \${f} | docker run --rm -i -v $(pwd):/data clinvar-rdf convert --xsd $(ls *.xsd) 2>\${f%%.*}.log | gzip -c >\${f%%.*}.ttl.gz"
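The log and output names in the command above come from the `${f%%.*}` parameter expansion, which removes everything from the first dot onward. The snippet below illustrates this with a made-up file name; the date 20240101 is only a placeholder.

```shell
# Illustration of how the command above derives its log and output names.
f=variation_archive_20240101_1.xml.gz   # placeholder file name
base=${f%%.*}                            # drop the longest suffix starting at a "."
echo "${base}.log"      # prints variation_archive_20240101_1.log
echo "${base}.ttl.gz"   # prints variation_archive_20240101_1.ttl.gz
```

Because the expansion cuts at the first dot anywhere in the string, it assumes the file names are given without a directory part that contains a dot, as is the case when running from the working directory.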
