
LiuShuoJiang/Protein-Information-Retrieval


Protein Structure Information Retrieval

Description

Like Foldseek, this project implements a method for searching protein structure databases. Unlike Foldseek, it learns protein structure representations with GVP-GNN (geometric vector perceptron graph neural networks).
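Once structures are encoded as vectors, a search reduces to ranking database entries by similarity to the query. The sketch below assumes each structure has already been mapped (for example by the GVP-GNN encoder) to a fixed-length embedding and uses cosine similarity; the similarity measure, the `search` helper, the entry IDs and the toy 3-dimensional vectors are all hypothetical, not taken from this repository.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: Sequence[float],
           db: Dict[str, Sequence[float]],
           k: int = 5) -> List[Tuple[str, float]]:
    """Hypothetical helper: return the k entries most similar to the query embedding."""
    scored = [(name, cosine(query, emb)) for name, emb in db.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for encoder outputs (made-up IDs and values).
target_db = {
    "AF-P00001": [1.0, 0.0, 0.0],
    "AF-P00002": [0.7, 0.7, 0.0],
    "AF-P00003": [0.0, 0.0, 1.0],
}
hits = search([0.9, 0.1, 0.0], target_db, k=2)
print(hits)
```

A brute-force scan like this is only practical for small databases; at the scale of the Swiss-Prot target set an approximate nearest-neighbour index would likely be needed.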

Training Dataset

We use Foldseek to generate the ground-truth datasets.
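`foldseek easy-search` writes its hits as tab-separated lines, by default in the 12-column BLAST-style order listed in the code. A minimal sketch of turning that output into per-query positive pairs, assuming an E-value cutoff is used to decide which hits count as ground truth; the `1e-3` threshold, the `load_positive_pairs` name and the sample lines are hypothetical, not the project's actual labeling rule.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO

# Default column order of `foldseek easy-search` tabular output.
COLUMNS = ["query", "target", "fident", "alnlen", "mismatch", "gapopen",
           "qstart", "qend", "tstart", "tend", "evalue", "bits"]


def load_positive_pairs(handle: TextIO, max_evalue: float = 1e-3) -> Dict[str, List[str]]:
    """Group hits by query, keeping targets whose E-value is <= max_evalue (hypothetical rule)."""
    positives: Dict[str, List[str]] = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        if float(record["evalue"]) <= max_evalue:
            positives[record["query"]].append(record["target"])
    return dict(positives)


# Two made-up hit lines: one significant, one not.
example = io.StringIO(
    "q1\tt1\t0.45\t120\t60\t3\t1\t120\t5\t124\t1.2e-10\t250\n"
    "q1\tt2\t0.20\t80\t60\t5\t10\t90\t1\t80\t5.0e-01\t30\n"
)
pairs = load_positive_pairs(example)
print(pairs)
```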

Query Database

We use the CATH/Gene3D dataset; see this page to download it in .pdb format.
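GVP-style encoders commonly take per-residue backbone coordinates as input. As an illustration only, the sketch below pulls N, CA, C and O coordinates out of a .pdb file using the fixed PDB column layout and the standard library; whether this project featurizes exactly these atoms is an assumption, and `read_backbone` and `_atom` are hypothetical helpers (in practice Biopython or Biotite, both listed as requirements, would do the parsing).

```python
from typing import Dict, List, Optional, Tuple

Coord = Tuple[float, float, float]
BACKBONE = ("N", "CA", "C", "O")


def read_backbone(pdb_text: str, chain: Optional[str] = None) -> List[List[Coord]]:
    """Return one [N, CA, C, O] coordinate list per residue, in file order.

    Only the first model is read; the first alternate location of each atom wins,
    and residues missing any backbone atom are skipped.
    """
    residues: Dict[Tuple[str, int, str], Dict[str, Coord]] = {}
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model
        if not line.startswith("ATOM"):
            continue
        if chain is not None and line[21] != chain:
            continue
        name = line[12:16].strip()
        if name not in BACKBONE:
            continue
        # Key on chain ID, residue number and insertion code (PDB columns 22, 23-26, 27).
        key = (line[21], int(line[22:26]), line[26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        residues.setdefault(key, {}).setdefault(name, xyz)
    return [[atoms[a] for a in BACKBONE]
            for atoms in residues.values() if all(a in atoms for a in BACKBONE)]


def _atom(serial: int, name: str, res_seq: int, x: float, y: float, z: float) -> str:
    # Fixed-column ATOM record for demo purposes (made-up coordinates).
    return f"ATOM  {serial:5d}  {name:<3s} ALA A{res_seq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


demo_lines = [_atom(i + 1, name, 1, float(i), 0.0, 0.0) for i, name in enumerate(BACKBONE)]
demo_lines.append(_atom(5, "CB", 1, 9.0, 9.0, 9.0))  # side-chain atom, ignored
coords = read_backbone("\n".join(demo_lines))
print(len(coords), coords[0])
```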

Target Database

We use the AlphaFold Protein Structure Database; see this page to download the Swiss-Prot dataset (very large: about 26 GB compressed).
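Because the Swiss-Prot download is so large, it can help to read structures straight from the archive instead of unpacking it first. A minimal sketch, assuming the download is a tar archive whose members are individually gzip-compressed `.pdb.gz` files; the `iter_structures` name and the demo member names are hypothetical.

```python
import gzip
import io
import tarfile
from typing import BinaryIO, Iterator, Optional, Tuple


def iter_structures(path: Optional[str] = None,
                    fileobj: Optional[BinaryIO] = None,
                    suffix: str = ".pdb.gz") -> Iterator[Tuple[str, str]]:
    """Yield (member name, decompressed text) for each matching archive member.

    Mode "r|*" reads the tar as a forward-only stream (compression auto-detected),
    so members are processed one at a time in archive order without extraction.
    """
    with tarfile.open(name=path, fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            # Each entry is a single small structure, so decompressing it whole is fine.
            yield member.name, gzip.decompress(handle.read()).decode("utf-8")


# Build a tiny in-memory archive with made-up members for demonstration.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    for name, body in [("AF-DEMO-F1-model_v4.pdb.gz", b"HEADER    DEMO\nEND\n"),
                       ("AF-DEMO-F1-model_v4.cif.gz", b"data_demo\n")]:
        payload = gzip.compress(body)
        info = tarfile.TarInfo(name=name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
buf.seek(0)

entries = list(iter_structures(fileobj=buf))
print(entries)
```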

App

The app has not been built yet.

Package Requirements

PyTorch Geometric

Biopython

Biotite

FAIR-ESM

Foldseek

Pandas

WandB
