seq2HLAallele

This package addresses the challenge of HLA allele typing for sequences from crystallographic structures, even when the available sequences are incomplete. It uses the BLAST tool to identify the closest matching alleles in the HLA database. It then calculates the mean allele frequency (alleles_over_2n) of each match across diverse populations to determine the most probable allele. Finally, it outputs the matched allele with the highest frequency in standard HLA nomenclature.
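
The "closest matching allele" step can be pictured with a minimal sketch. It assumes BLAST+ was run with the standard tabular output (`-outfmt 6`) and that every hit tied at the top bit score counts as a candidate for the frequency step. The function name `best_blast_hits`, the tie rule, and the example subject IDs are hypothetical illustrations, not necessarily how seq2hla parses its BLAST results.

```python
import csv
import io

# Standard column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_blast_hits(outfmt6_text):
    """Return the subject IDs that share the highest bit score (hypothetical helper)."""
    hits = []
    for row in csv.reader(io.StringIO(outfmt6_text), delimiter="\t"):
        if not row:
            continue  # ignore blank lines
        record = dict(zip(OUTFMT6_FIELDS, row))
        hits.append((record["sseqid"], float(record["bitscore"])))
    if not hits:
        return []
    top_score = max(score for _, score in hits)
    # Keep every subject tied at the top score, in order, without duplicates.
    best = []
    for sseqid, score in hits:
        if score == top_score and sseqid not in best:
            best.append(sseqid)
    return best


# Hypothetical tabular output for one query against three database entries.
example = (
    "query\tA*02:01:01:01\t100.0\t180\t0\t0\t1\t180\t25\t204\t1e-120\t370\n"
    "query\tA*02:01:02\t100.0\t180\t0\t0\t1\t180\t25\t204\t1e-120\t370\n"
    "query\tA*02:03:01\t99.4\t180\t1\t0\t1\t180\t25\t204\t1e-118\t365\n"
)
print(best_blast_hits(example))  # ['A*02:01:01:01', 'A*02:01:02']
```

Keeping all tied hits rather than only the first one matters for incomplete sequences, where several alleles can match the available residues equally well and only the frequency data can break the tie.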

The output allele follows the standard HLA naming convention:

HLA-<gene>*<allele_group>:<specific_HLA_protein>

For example, HLA-A*02:01.
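
To illustrate the naming format, here is a minimal sketch that truncates a full IMGT/HLA allele name (up to four fields, optionally with an expression suffix) to the two-field form `HLA-<gene>*<allele_group>:<specific_HLA_protein>`. The helper `to_two_field` is hypothetical and not part of the seq2hla API. It also drops any expression suffix such as `N`, which a real tool may want to preserve.

```python
import re

# Matches names such as "A*02:01:01:01" or "HLA-DRB1*15:01:01:01".
ALLELE_RE = re.compile(r"^(?:HLA-)?([A-Z0-9]+)\*(\d+):(\d+)")


def to_two_field(allele):
    """Truncate an HLA allele name to HLA-<gene>*<allele_group>:<protein>."""
    match = ALLELE_RE.match(allele.strip())
    if match is None:
        raise ValueError(f"Not an HLA allele name: {allele!r}")
    gene, group, protein = match.groups()
    # Only the first two fields identify the protein; later fields are dropped.
    return f"HLA-{gene}*{group}:{protein}"


print(to_two_field("A*02:01:01:01"))         # HLA-A*02:01
print(to_two_field("HLA-DRB1*15:01:01:01"))  # HLA-DRB1*15:01
```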

Installation

Environment

Clone this repo, create a Python 3.11 environment, and install the requirements:

conda create -n seq2hla python=3.11
conda activate seq2hla
# Make sure you are in this repo folder
pip install -r requirements.txt
python setup.py install

Make sure the BLAST+ command-line tools are installed on your computer. If they are not, you can install them on Debian/Ubuntu with:

sudo apt install ncbi-blast+
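
A quick way to confirm the installation from Python is to look the executables up on `PATH`. This is a minimal sketch: `makeblastdb` is used later in this README, while `blastp` is an assumption based on the protein database type. The helper name `missing_blast_tools` is hypothetical.

```python
import shutil

# makeblastdb appears in this README; blastp is assumed for protein searches.
REQUIRED_TOOLS = ("makeblastdb", "blastp")


def missing_blast_tools(tools=REQUIRED_TOOLS):
    """Return the names of BLAST+ executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_blast_tools()
    if missing:
        print("Missing BLAST+ tools:", ", ".join(missing))
    else:
        print("BLAST+ tools found.")
```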

Download databases

Before using the package, you must download two databases:

- HLA sequence database
- HLA frequencies database

The database files must be placed under the databases/ folder with the following structure:

- databases/
   |-> hla_freq/
   |     - afnd.tsv
   |-> hla_seqs/
        - A_prot.fasta
        - all_hla_seq.fasta
        - B_prot.fasta
        - C_prot.fasta
        - DPA1_prot.fasta
        - DPB1_prot.fasta
        - DQA1_prot.fasta
        - DQB1_prot.fasta
        - DRB1_prot.fasta
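
Before running the tool, you can check that this layout is in place. The sketch below only tests for the files listed in the tree above. It does not check the index files that `makeblastdb` creates, and the helper name `missing_database_files` is hypothetical.

```python
from pathlib import Path

# Files listed in the expected databases/ layout.
EXPECTED_FILES = [
    "hla_freq/afnd.tsv",
    "hla_seqs/all_hla_seq.fasta",
    *(f"hla_seqs/{gene}_prot.fasta"
      for gene in ("A", "B", "C", "DPA1", "DPB1", "DQA1", "DQB1", "DRB1")),
]


def missing_database_files(root="databases"):
    """Return the expected database files (relative to root) that do not exist."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_database_files()
    print("Missing:", missing if missing else "none")
```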

HLA sequence database

You can download the latest version of the database and build the BLAST database automatically with:

python seq2hla/download_imgthla_database.py

Alternatively, you can manually download the HLA amino acid sequences from IMGT/HLA. The per-gene protein files are located under fasta/*_prot.fasta. Concatenate them into a single file called all_hla_seq.fasta, place it under the databases/hla_seqs/ folder, and then run:

makeblastdb -in databases/hla_seqs/all_hla_seq.fasta -dbtype prot -parse_seqids
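
The manual route can also be scripted. This is a minimal sketch, assuming the IMGT/HLA `fasta/` folder has already been downloaded locally: it concatenates every `*_prot.fasta` file into `all_hla_seq.fasta` and then calls the same `makeblastdb` command shown above. The function `build_hla_blast_db` and its `run_makeblastdb` switch are hypothetical; the bundled `download_imgthla_database.py` script may work differently.

```python
import subprocess
from pathlib import Path


def build_hla_blast_db(fasta_dir, out_dir="databases/hla_seqs", run_makeblastdb=True):
    """Concatenate per-gene protein FASTA files and build a protein BLAST DB."""
    fasta_dir, out_dir = Path(fasta_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    combined = out_dir / "all_hla_seq.fasta"
    # Concatenate every per-gene protein FASTA (A_prot.fasta, B_prot.fasta, ...).
    with combined.open("w") as out:
        for fasta in sorted(fasta_dir.glob("*_prot.fasta")):
            text = fasta.read_text()
            out.write(text)
            if text and not text.endswith("\n"):
                out.write("\n")  # keep the next record header on its own line
    if run_makeblastdb:
        # Same command as in this README.
        subprocess.run(
            ["makeblastdb", "-in", str(combined), "-dbtype", "prot", "-parse_seqids"],
            check=True,
        )
    return combined
```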

HLA frequencies database

You can manually download the HLA frequencies database file afnd.tsv from this GitHub repo and place it under the databases/hla_freq/ folder.

You can also download the latest version of the database automatically using the code provided by this GitHub repo:

python seq2hla/download_allele_freq_database.py

Warning: This download might take ~1-2 hours.

Alternatively, you can download the prebuilt afnd.tsv file directly from the repo by passing the --fast (or -f) flag:

python seq2hla/download_allele_freq_database.py --fast
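
Once afnd.tsv is available, the frequency step described at the top of this README can be sketched as follows. This sketch makes several assumptions. It assumes afnd.tsv is tab-separated with a header row containing an `allele` column and the `alleles_over_2n` column mentioned above (the `allele` column name is an assumption). It also assumes that allele names in the file match the candidate names, and that the mean is a simple arithmetic mean over the populations that report a value. The helpers `mean_allele_frequencies` and `most_frequent` are hypothetical.

```python
import csv
from collections import defaultdict


def mean_allele_frequencies(tsv_lines, candidates):
    """Average alleles_over_2n per candidate allele across populations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    wanted = set(candidates)
    for row in csv.DictReader(tsv_lines, delimiter="\t"):
        allele, value = row.get("allele"), row.get("alleles_over_2n")
        if allele not in wanted or not value:
            continue  # skip other alleles and populations without a frequency
        totals[allele] += float(value)
        counts[allele] += 1
    return {allele: totals[allele] / counts[allele] for allele in counts}


def most_frequent(candidates, means):
    """Return the candidate allele(s) with the highest mean frequency."""
    scored = [a for a in candidates if a in means]
    if not scored:
        return []
    best = max(means[a] for a in scored)
    return [a for a in scored if means[a] == best]


if __name__ == "__main__":
    candidates = ["A*02:01", "A*02:03"]  # hypothetical BLAST candidates
    with open("databases/hla_freq/afnd.tsv", newline="") as fh:
        means = mean_allele_frequencies(fh, candidates)
    for allele in most_frequent(candidates, means):
        print(allele, means[allele])
```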

Usage

Command line interface:

python -m seq2hla.main sequence.fasta
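
The command expects a FASTA file as input. If you start from a raw sequence string, for example a chain sequence copied from a structure, a minimal sketch for writing a single-record protein FASTA file is shown below. The helper name `write_query_fasta`, the record name, and the 60-residue line width are assumptions. This README does not state whether the CLI accepts multi-record files.

```python
from pathlib import Path


def write_query_fasta(sequence, path="sequence.fasta", name="query", width=60):
    """Write one protein sequence to a single-record FASTA file and return its path."""
    # Remove all whitespace and use upper-case one-letter amino acid codes.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    # Wrap the sequence at `width` residues per line, as is conventional for FASTA.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    path = Path(path)
    path.write_text(f">{name}\n" + "\n".join(lines) + "\n")
    return path


# Usage (chain_sequence is a hypothetical string from your structure):
# write_query_fasta(chain_sequence, "sequence.fasta", name="chain_A")
# then: python -m seq2hla.main sequence.fasta
```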

Using Python:

from seq2hla import get_most_freq_allele_from_seq

highest_frequency_alleles, mean_frequencies = \
    get_most_freq_allele_from_seq("sequence.fasta")

for allele in highest_frequency_alleles:
    print(f"\t{allele}\tMean Frequency: {mean_frequencies[allele]}")

Repository: annadiarov/seq2HLAallele

Folders and files

Latest commit

History

Repository files navigation

seq2HLAallele

Installation

Environment

Download databases

HLA sequence database

HLA frequencies database

Usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages