models TamGen
TamGen is a 100-million-parameter model that generates compounds from input protein information. It is pre-trained on 10 million compounds from PubChem and fine-tuned on the CrossDocked and PDB datasets. On existing benchmarks, TamGen achieves top performance. It has also identified novel inhibitors for tuberculosis, which were subsequently validated in wet-lab experiments.
To use TamGen, please follow the responsible AI policy:
- Do not use TamGen to generate any harmful/toxic compounds.
- Only use TamGen for legitimate purposes and in compliance with all applicable laws and regulations.
- Implement proper safety protocols and ethical review processes before synthesizing or testing any compounds generated by TamGen.
TamGen has two main functions:
- Generate compounds based on the input protein information.
- Optimise an existing compound for the input protein.
The TamGen framework is composed of three integral components:
- Protein encoder: Converts the three-dimensional structure of a protein into a hidden vector representation.
- Molecule decoder: Trained on a dataset of 10 million SMILES (Simplified Molecular Input Line Entry System) strings; constructs chemically valid SMILES strings for new molecules.
- Contextual encoder: Integrates information from both the protein and the compound, enabling targeted compound optimisation.
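Because the decoder's output is a SMILES string, a quick syntactic pre-filter can be useful before handing results to a full cheminformatics toolkit. The sketch below is not part of TamGen: `looks_like_balanced_smiles` is a hypothetical helper that only checks that parentheses, square brackets, and ring-closure labels are paired. It does not check valence, aromaticity, or any other aspect of chemical validity.

```python
DIGITS = "0123456789"


def looks_like_balanced_smiles(smiles: str) -> bool:
    """Rough syntactic check: branches, bracket atoms and ring closures are paired.

    Hypothetical helper for illustration; passing this check does NOT mean
    the string is a chemically valid molecule.
    """
    if not smiles:
        return False
    depth = 0            # current nesting depth of branch parentheses
    in_bracket = False   # True while inside a [bracket atom]
    open_rings = set()   # ring-closure numbers seen an odd number of times
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside a bracket atom are isotopes, H counts or charges.
            if ch == "[":
                return False
            if ch == "]":
                in_bracket = False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring-closure label, written as %nn.
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or any(c not in DIGITS for c in label):
                return False
            open_rings ^= {int(label)}  # toggle: first use opens, second closes
            i += 2
        elif ch in DIGITS:
            open_rings ^= {int(ch)}
        i += 1
    return depth == 0 and not in_bracket and not open_rings
```

For example, `looks_like_balanced_smiles("c1ccccc1")` (benzene) returns `True`, while `looks_like_balanced_smiles("C1CC")` returns `False` because ring label 1 is never closed.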
To generate a compound based on the input protein information:
- Gather the relevant protein data.
- Input it into the protein encoder.
- Retrieve the corresponding SMILES string from the decoder.
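The three steps above can be sketched as a small pipeline. This is a minimal illustration of the data flow only: `ProteinInput`, `ProteinEncoder`, `MoleculeDecoder`, and `generate_compound` are hypothetical names, not TamGen's actual API, and the toy encoder and decoder return placeholder values instead of running a trained model.

```python
from dataclasses import dataclass
from typing import List, Protocol, Sequence, Tuple

Coordinate = Tuple[float, float, float]


@dataclass(frozen=True)
class ProteinInput:
    """Hypothetical container for the protein data gathered in step 1."""
    residues: str                      # amino-acid sequence, one letter per residue
    coordinates: Sequence[Coordinate]  # one 3D position per residue


class ProteinEncoder(Protocol):
    def encode(self, protein: ProteinInput) -> List[float]: ...


class MoleculeDecoder(Protocol):
    def decode(self, hidden: List[float]) -> str: ...


class ToyProteinEncoder:
    """Placeholder: averages the coordinates instead of running a neural network."""

    def encode(self, protein: ProteinInput) -> List[float]:
        n = len(protein.coordinates)
        if n == 0 or n != len(protein.residues):
            raise ValueError("expected one coordinate per residue")
        return [sum(c[axis] for c in protein.coordinates) / n for axis in range(3)]


class ToyMoleculeDecoder:
    """Placeholder: always returns the SMILES string for ethanol."""

    def decode(self, hidden: List[float]) -> str:
        return "CCO"


def generate_compound(
    protein: ProteinInput,
    encoder: ProteinEncoder,
    decoder: MoleculeDecoder,
) -> str:
    hidden = encoder.encode(protein)  # step 2: protein -> hidden vector
    return decoder.decode(hidden)     # step 3: hidden vector -> SMILES string


if __name__ == "__main__":
    protein = ProteinInput("MK", [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)])
    print(generate_compound(protein, ToyProteinEncoder(), ToyMoleculeDecoder()))
```

Keeping the encoder and decoder behind small interfaces means the toy classes could be swapped for calls to a deployed model without changing `generate_compound`.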
To optimise an existing compound relative to a specific protein:
- Input the protein information into the protein encoder.
- Process the protein and the initial compound information with the contextual encoder.
- Channel the output into the decoder to generate an optimised SMILES string for the compound.
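The optimisation workflow adds one stage between the protein encoder and the decoder. The sketch below shows only the order of operations. The function names are hypothetical, the toy stand-ins return placeholder values, and it assumes the contextual encoder consumes the protein encoder's output together with the initial compound's SMILES string, which the description above does not state explicitly.

```python
from typing import Callable, List

HiddenVector = List[float]


def optimise_compound(
    protein_features: HiddenVector,
    seed_smiles: str,
    encode_protein: Callable[[HiddenVector], HiddenVector],
    encode_context: Callable[[HiddenVector, str], HiddenVector],
    decode: Callable[[HiddenVector], str],
) -> str:
    """Run the three optimisation steps in order (illustrative only)."""
    protein_hidden = encode_protein(protein_features)            # step 1
    context_hidden = encode_context(protein_hidden, seed_smiles)  # step 2
    return decode(context_hidden)                                 # step 3


# Toy stand-ins (assumptions, not TamGen components).
def toy_encode_protein(features: HiddenVector) -> HiddenVector:
    return list(features)  # placeholder: pass the features through unchanged


def toy_encode_context(protein_hidden: HiddenVector, seed_smiles: str) -> HiddenVector:
    # Placeholder: append the seed compound's string length as one extra feature.
    return protein_hidden + [float(len(seed_smiles))]


def toy_decode(hidden: HiddenVector) -> str:
    return "CCN"  # placeholder SMILES (ethylamine), not a real optimisation result


if __name__ == "__main__":
    result = optimise_compound(
        [0.5, 1.5], "CCO", toy_encode_protein, toy_encode_context, toy_decode
    )
    print(result)
```

Passing the three stages in as callables keeps the ordering explicit and makes each stage easy to replace or test in isolation.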
Version: 1
task : protein-design
disable-batch : true
Preview
inference_supported_envs : None
license : mit
author : Microsoft
hiddenlayerscanned : true
SharedComputeCapacityEnabled
inference_compute_allow_list : ['Standard_NC6s_v3', 'Standard_NC12s_v3', 'Standard_NC24s_v3', 'Standard_NC24ads_A100_v4', 'Standard_NC48ads_A100_v4', 'Standard_NC96ads_A100_v4', 'Standard_ND96asr_v4', 'Standard_ND96amsr_A100_v4', 'Standard_ND40rs_v2']
View in Studio: https://ml.azure.com/registries/azureml/models/TamGen/version/1
License: mit
inference-min-sku-spec: 6|1|112|64
inference-recommended-sku: Standard_NC6s_v3, Standard_NC12s_v3, Standard_NC24s_v3, Standard_NC24ads_A100_v4, Standard_NC48ads_A100_v4, Standard_NC96ads_A100_v4, Standard_ND96asr_v4, Standard_ND96amsr_A100_v4, Standard_ND40rs_v2
languages: en
SharedComputeCapacityEnabled: True