Applying Machine Learning Algorithms to New Psychoactive Substances Screening

The Project

The goal of this project is to train and test various machine learning models on the task of screening new psychoactive substances.

The project was inspired by the paper Screening unknown novel psychoactive substances using GC–MS based machine learning (Wong et al., 2023) [1]. In the article, the reseachers employ an ANN, a CNN and a Balanced Random Forest to classify various substances.

This project "extends" their work by testing more and different ML models and analysing their feature importances.

The problem

New Psychoactive Substances

According to UNODC, New Psychoactive Substances (NPS) are substances of abuse, either in a pure form or a preparation, that are not controlled by the 1961 Single Convention on Narcotic Drugs or the 1971 Convention on Psychotropic Substances, but which may pose a public health threat [2]. These compounds present a significant challenge since their potential adverse health effects and social harms are often not documented.

Moreover, the analysis and identification of a large number of chemically diverse substances present in drug markets at the same time is demanding [2]. Therefore, extensive efforts have been made to develop robust and efficient methods for the detection of NPS.

Classical approaches to screening NPS

Due to its ease of operation and ability to distinguish compound mixtures, GC–MS is one of the most widely used techniques for substance identification. Traditionally, the identification of a compound involves matching its retention time and mass spectrum with known records in a library using similarity algorithms. For accurate detection, this approach requires a comprehensive database. However, in the case of NPS, creating an exhaustive database is challenging due to the constantly increasing structural diversity of the substances. Thus, current GC–MS database are unable to exhaustively cover the spectrum of all possible NPSs [1].

The machine learning solution

Newer approaches classify unknown substances using machine learning (ML) models trained on compounds in the available library (dataset). These methods are very effective for two reasons:

Pattern recognition: machine learning algorithms excel at recognizing patterns and trends within complex datasets;
Generalization: if trained correctly, ML models can generalize the knowledge learned from the available data to new unseen substances, making them particularly suit to detecting substances with always new structures.

An example of this new approach is [1], where the authors developed a machine learning based model that can classify an unknown NPS to its drug class based solely on its GC–MS, with no knowledge of its nominal molecular mass.

The results

Model	Accuracy
XGBoost	93.85%
Stacking	93.30%
Majority Voting	93.30%
Random Forest	93.30%
Extra Trees	90.50%
Decision Tree	79.89%
Support Vector Machines	77.65%
Logistic Regression	76.54%
Gaussian Naive Bayes	68.16%
Nearest Centroid	67.04%
K-Nearest Neighbours	65.92%

Credits

Credit is given to the authors of [1] for inspiring the project and providing the data that was used for training the models. This dataset can be found under the paper's Appendix D. Supplementary data.

Bibliography

[1] Wong, S. L., Ng, L. T., Tan, J., & Pan, J. (2023). Screening unknown novel psychoactive substances using GC–MS based machine learning. Forensic Chemistry, 34, 100499. https://www.sciencedirect.com/science/article/pii/S2468170923000358

[2] United Nations on Drugs and Crime (UNODC) - Early Warning Advisory on New Psychoactive Substances. https://www.unodc.org/LSS/Page/NPS

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
Data		Data
Applying Machine Learning Algorithms to New Psychoactive Substances Screening.ipynb		Applying Machine Learning Algorithms to New Psychoactive Substances Screening.ipynb
Applying Machine Learning Algorithms to New Psychoactive Substances Screening.pdf		Applying Machine Learning Algorithms to New Psychoactive Substances Screening.pdf
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Applying Machine Learning Algorithms to New Psychoactive Substances Screening

The Project

The problem

New Psychoactive Substances

Classical approaches to screening NPS

The machine learning solution

The results

Credits

Bibliography

About

Releases

Packages

Languages

mattiaferrarini/Applying-ML-to-NPS-Screening

Folders and files

Latest commit

History

Repository files navigation

Applying Machine Learning Algorithms to New Psychoactive Substances Screening

The Project

The problem

New Psychoactive Substances

Classical approaches to screening NPS

The machine learning solution

The results

Credits

Bibliography

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages