Recently, the Telomere-to-Telomere (T2T) consortium published a complete reconstruction of a human genome, with the exception of 5 gaps. To convert coordinates between hg38 and the T2T assembly, one can use the tool liftOver. liftOver requires a so-called chain file to convert a set of coordinates from one assembly to the other. This repository contains such chain files, the shell scripts that were used to produce them, and a simple Python script to convert a single coordinate on the fly.
For coordinate conversions using an existing chain file, such as the one provided in this repository, only pyliftover is needed. To create your own chain file using the provided scripts, the following tools need to be installed:
- faToTwoBit
- faSplit
- twoBitInfo
- blat
- liftUp
- axtChain
- chainMergeSort
- chainNet
- netChainSubset
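Before running the chain-building scripts, it can help to confirm that all of the tools listed above are reachable on your PATH. The following is a minimal sketch that is not part of this repository; the names REQUIRED_TOOLS and missing_tools are hypothetical and only serve the illustration.

```python
import shutil

# Tools listed above as requirements for building a chain file.
REQUIRED_TOOLS = [
    "faToTwoBit", "faSplit", "twoBitInfo", "blat", "liftUp",
    "axtChain", "chainMergeSort", "chainNet", "netChainSubset",
]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling missing_tools() returns an empty list when every tool is installed, and otherwise the names of the tools that still need to be installed.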
There exist several versions of hg38. Here, GCA_000001405.15_GRCh38_no_alt_analysis_set was used, together with version 1.0 of the T2T assembly.
To convert a coordinate on hg38 to its corresponding coordinate on the T2T assembly, use the script conversion.py as follows:
python3 conversion.py hg38_to_chm13v1.chain chr2:125398
The output has the following format:
chrom:pos strand score
Here, score is the alignment score of the chain used for the conversion, as recorded in the chain file.
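For readers who want to perform the same lookup from their own code, here is a minimal sketch built on pyliftover. It is not the contents of conversion.py; the function names parse_position, format_hit and lift are hypothetical. Note that pyliftover's convert_coordinate treats positions as 0-based, and this sketch passes the position through unchanged, which is an assumption about the intended convention.

```python
def parse_position(text):
    """Split a string such as 'chr2:125398' into ('chr2', 125398)."""
    chrom, sep, pos = text.partition(":")
    if not sep or not chrom or not pos:
        raise ValueError(f"expected chrom:pos, got {text!r}")
    return chrom, int(pos)


def format_hit(hit):
    """Format a (chrom, pos, strand, score) tuple as 'chrom:pos strand score'."""
    chrom, pos, strand, score = hit
    return f"{chrom}:{pos} {strand} {score}"


def lift(chain_path, text):
    """Convert one coordinate using a local chain file (hypothetical helper)."""
    # Imported here so the helpers above work without pyliftover installed.
    from pyliftover import LiftOver

    chrom, pos = parse_position(text)
    # With a single argument, LiftOver reads the given chain file directly.
    hits = LiftOver(chain_path).convert_coordinate(chrom, pos)
    # pyliftover returns None for an unknown chromosome and an empty list
    # when the position has no match in the target assembly.
    return [format_hit(hit) for hit in hits or []]
```

For example, lift("hg38_to_chm13v1.chain", "chr2:125398") returns a list of formatted matches, which is empty if the position cannot be converted.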