Suicidal Ideation Detection based on Social Media Dataset using Semantic, Contextual and Graph Neural Network based Hybrid Approach
This project aims to develop a system that detects suicidal ideation (SI) in posts from Facebook, Twitter and Reddit using Natural Language Processing (NLP) and Deep Learning (DL) models, including Long Short-Term Memory (LSTM) networks and Graph Neural Networks (GNN). We develop two pipelines. The first is LSI-based: LSI topic modeling is performed on the data, and the output of LSI is embedded with word2vec. The original data is also embedded with Bidirectional Encoder Representations from Transformers (BERT). The concatenated word2vec and BERT embeddings are used as input to an LSTM to detect SI. The second pipeline combines the power of lexical features with a cutting-edge technique to construct a lexical, psycholinguistic knowledge-guided graph neural network model for SI detection. We employ LIWC to extract psycholinguistic features from the collected and pre-processed text data, use these features to build a k-nearest-neighbour graph, and then apply a graph neural network to that graph for SI detection. The system aims to identify individuals who may be at risk of suicide and to contribute to suicide prevention and related policy-making.
We collect a total of 785 posts, of which 386, 321 and 78 posts are from Reddit, Facebook and Twitter, respectively. We crawl and scrape data from those platforms with search keywords such as "suicide", "suicidal", "self injury", "self harm", and other related terms. The collected data is annotated by one behavioural scientist as 'YES' for suicidal and 'NO' for non-suicidal posts. In total, 405 posts are annotated as 'YES' and 380 posts as 'NO'.
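As a minimal sketch of the keyword-based collection step, the snippet below gathers candidate Reddit posts with PRAW. The credentials, the site-wide search scope, the per-keyword limit and the abbreviated keyword list are placeholders and assumptions; equivalent collectors would be needed for Facebook and Twitter, and all retrieved posts are still annotated manually.

```python
# Hedged sketch: keyword-based Reddit collection with PRAW.
# Credentials, search scope and keyword list are placeholders, not the
# project's exact collection setup.
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",          # placeholder credentials
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="si-detection-research",
)

KEYWORDS = ["suicide", "suicidal", "self injury", "self harm"]

posts = []
for keyword in KEYWORDS:
    # Search site-wide; keep each submission's title and body for annotation.
    for submission in reddit.subreddit("all").search(keyword, limit=100):
        posts.append({"id": submission.id,
                      "text": f"{submission.title} {submission.selftext}"})

print(f"Collected {len(posts)} candidate posts for manual annotation.")
```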
Since the collected data can be noisy, we carefully clean the textual data before using it in the SI detection task. We pre-process the data for both pipelines. The pre-processing steps include removing irrelevant characters, stemming, lemmatization and stop-word removal. Nonsensical characters are not recognizable to the machine learning models and make the text noisy, so they must be removed to ease the classification task. Emojis, URLs, punctuation, extra white space, numerals and user references are removed from the text using regular expressions. We apply the Porter stemmer and WordNet lemmatizer from NLTK to perform stemming and lemmatization, which improves text categorization accuracy. Unimportant and frequently occurring words that carry little or no grammatical weight for text classification are identified as stop words. We use the NLTK stop-word corpus to eliminate them so the models concentrate on the relevant information.
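The following is an illustrative sketch of this pre-processing chain with regular expressions and NLTK, following the steps named above; the exact regular expressions and the order of operations are assumptions.

```python
# Sketch of the cleaning pipeline: regex removal of URLs, user references,
# numerals and punctuation, then NLTK stop-word removal, Porter stemming and
# WordNet lemmatization. Patterns and ordering are illustrative assumptions.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def clean_text(text: str) -> str:
    text = text.lower()
    text = re.sub(r"http\S+|www\.\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)               # user references
    text = re.sub(r"[^a-z\s]", " ", text)           # emojis, numerals, punctuation
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    tokens = [lemmatizer.lemmatize(stemmer.stem(t)) for t in tokens]
    return " ".join(tokens)

print(clean_text("I can't take it anymore... https://example.com @friend"))
```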
In pipeline 1, we combine the strength of word semantics (LSI) with the ability to preserve long-range dependencies in text (LSTM), producing an integrated LSI-LSTM model for SI detection. We employ TF-IDF to convert the text data into vectors; applying TF-IDF before LSI ensures that the term-document matrix emphasises important words. The TF-IDF vectors are passed through LSI for topic modeling, and the output of LSI is embedded with word2vec. The original text data is embedded with BERT. In pipeline 2, we employ LIWC to extract psycholinguistic features from the collected and pre-processed text data.
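A minimal sketch of the pipeline-1 feature extraction is given below, assuming TruncatedSVD over TF-IDF vectors as the LSI step, word2vec embeddings of the top LSI topic terms as one reading of "embedding the LSI output", and bert-base-uncased [CLS] vectors for the original posts; the number of topics, embedding sizes and model names are illustrative assumptions.

```python
# Sketch: TF-IDF -> LSI topics, word2vec over topic terms, BERT over raw posts.
import numpy as np
import torch
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD            # LSI over TF-IDF vectors
from gensim.models import Word2Vec
from transformers import AutoTokenizer, AutoModel

docs = ["cleaned post about feeling hopeless", "cleaned post about a good day"]

# TF-IDF term-document matrix, then LSI topic space.
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)
lsi = TruncatedSVD(n_components=2, random_state=42)
doc_topics = lsi.fit_transform(X)                          # (n_docs, n_topics)

# word2vec trained on the tokenized corpus; top LSI topic terms are embedded.
w2v = Word2Vec([d.split() for d in docs], vector_size=100, min_count=1)
terms = tfidf.get_feature_names_out()
top_terms = [terms[i] for i in lsi.components_[0].argsort()[::-1][:5]]
topic_vec = np.mean([w2v.wv[t] for t in top_terms], axis=0)  # 100-d topic embedding

# BERT [CLS] embeddings of the original posts (768-d each).
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
with torch.no_grad():
    enc = tok(docs, padding=True, truncation=True, return_tensors="pt")
    cls_vecs = bert(**enc).last_hidden_state[:, 0, :]
```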
The concatenated word2vec and BERT embeddings are fed into the LSTM model as input for SI detection. By incorporating BERT, we include the power of contextual word embeddings from a pre-trained language model.
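A minimal Keras sketch of this classifier is shown below, assuming each post is represented as a fixed-length sequence of concatenated word2vec (100-d) and BERT (768-d) token embeddings; the sequence length, layer sizes and training hyperparameters are assumptions rather than the project's final configuration.

```python
# Sketch of the LSTM classifier over concatenated word2vec + BERT embeddings.
import tensorflow as tf

MAX_LEN = 64                 # assumed tokens per post
EMB_DIM = 100 + 768          # concatenated word2vec + BERT embedding size

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(MAX_LEN, EMB_DIM)),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # YES / NO
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_concat, y, epochs=10, batch_size=16, validation_split=0.1)
```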
The LIWC features are used to construct a graph with the k-nearest-neighbour method, and a graph neural network is applied to this graph for SI detection.
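The sketch below illustrates this pipeline with a k-NN graph built from LIWC feature vectors and a two-layer GCN node classifier in PyTorch Geometric; the choice of k, the GCN variant, the feature dimension and the random placeholder data are assumptions for illustration only.

```python
# Sketch of pipeline 2: k-NN graph over LIWC features + two-layer GCN classifier.
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.neighbors import kneighbors_graph
from torch_geometric.data import Data
from torch_geometric.nn import GCNConv

liwc = torch.randn(785, 93)              # placeholder LIWC feature matrix (posts x features)
labels = torch.randint(0, 2, (785,))     # placeholder YES (1) / NO (0) labels

# Connect each post to its k nearest neighbours in LIWC feature space.
adj = kneighbors_graph(liwc.numpy(), n_neighbors=5, mode="connectivity")
edge_index = torch.tensor(np.array(adj.nonzero()), dtype=torch.long)
data = Data(x=liwc, edge_index=edge_index, y=labels)

class GCN(torch.nn.Module):
    def __init__(self, in_dim, hidden=64, classes=2):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, classes)

    def forward(self, data):
        h = F.relu(self.conv1(data.x, data.edge_index))
        return self.conv2(h, data.edge_index)

model = GCN(liwc.size(1))
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
for epoch in range(100):                  # full-batch training on the whole graph
    optimizer.zero_grad()
    loss = F.cross_entropy(model(data), data.y)
    loss.backward()
    optimizer.step()
```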