This project aims to predict the chemical structures of metabolites from LC-MS/MS spectra using Deep Canonical Correlation Analysis (DeepCCA), a deep learning extension of CCA. The work is done in three phases, as outlined below.
In this notebook, we clean and integrate the structure and spectra datasets and generate embeddings for both.
This notebook contains the DeepCCA optimization code.
Here we perform cross-modal retrieval: the model takes in a spectrum embedding and outputs the most likely structure for that spectrum. We then use Tanimoto scores to evaluate whether the predicted structure is similar to the true structure.
After selecting the best-performing hyperparameters, we train the final model in this notebook.
The final model is used to predict the structures of query spectra in this notebook.
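
The cross-modal retrieval and Tanimoto evaluation steps described above can be illustrated with a minimal sketch. It is not the code from the notebooks: it assumes that retrieval ranks candidate structures by cosine similarity between embeddings in the shared space, and that each structure is represented by a fingerprint stored as a set of on-bit indices. The function names (`cosine`, `tanimoto`, `retrieve`) and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def retrieve(query_emb, candidate_embs):
    """Return the index of the candidate embedding most similar to the query."""
    scores = [cosine(query_emb, c) for c in candidate_embs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy data (hypothetical): three candidate structures, each with an
# embedding in the shared space and a fingerprint.
candidate_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
candidate_fps = [{1, 2, 3}, {4, 5}, {1, 2, 4}]

# A query spectrum embedding and the fingerprint of its true structure.
query_emb = [0.9, 0.1]
true_fp = {1, 2, 3, 6}

best = retrieve(query_emb, candidate_embs)
score = tanimoto(candidate_fps[best], true_fp)
print(f"retrieved candidate {best}, Tanimoto to true structure = {score:.2f}")
```

A Tanimoto score of 1 means the retrieved and true fingerprints are identical, and a score of 0 means they share no on-bits. In practice the fingerprints would come from a cheminformatics toolkit and the embeddings from the trained DeepCCA model.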