---
title: Toxic Tweets
emoji: 🐢
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.17.0
app_file: app.py
pinned: false
---
An introduction to natural language processing and sentiment analysis through the lens of analyzing negative online behaviors.
For Windows users, first install WSL2 (install link here):

```bash
wsl --install
```
Containers are lightweight abstractions at the application layer that package code and dependencies together. Because they share the host operating system kernel, containers take up less space and run as isolated namespaces.
This project uses Docker.
Docker Desktop is an option for Windows users that provides a GUI on top of the WSL subsystem. Understand the tradeoffs of using Docker Desktop before choosing it.
Note that the `docker-ce` packages come from Docker's own apt repository, which must be added before the install step (see Docker's install documentation):

```bash
sudo apt-get update
sudo apt-get upgrade
sudo apt install docker-ce docker-ce-cli containerd.io
```
Located here.
Install the Dev Containers extension. With this extension, you can open any directory in a development container through the bottom-left icon in VS Code.
In Docker Desktop, you can find the container ID under the Containers tab.

In WSL2, execute

```bash
docker ps
```

to show all running containers on the system.

Copy the ID and run

```bash
docker exec -it [container-id] /bin/sh
```

to start a shell in the container.
The model was trained on the Toxic Comment Classification Dataset found here.
The challenge aims to classify toxicity into the following categories:
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate
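For orientation, here is a minimal sketch of loading those labels, assuming the Kaggle release of the dataset (a `train.csv` with a `comment_text` column and one 0/1 column per category; the file path is a placeholder):

```python
import pandas as pd

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Placeholder path to the Kaggle train split.
df = pd.read_csv("train.csv")

texts = df["comment_text"].tolist()           # raw comment strings
labels = df[LABELS].values.astype("float32")  # (n_samples, 6) multi-label matrix
```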
The model was trained using Google Colab and exported to Hugging Face Models. The base model, "distilbert-base-uncased", served as the starting point and was fine-tuned on the dataset.
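A sketch of what that fine-tuning setup could look like with the Hugging Face `transformers` API; the model name comes from the text above, while the toy batch, learning rate, and labels are illustrative only:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=6,                               # one output per toxicity category
    problem_type="multi_label_classification",  # sigmoid + BCE instead of softmax
)

# Illustrative training step on a toy batch of two texts.
texts = ["toy example one", "toy example two"]
targets = torch.tensor([[0, 0, 0, 0, 0, 0],
                        [1, 0, 0, 0, 1, 0]], dtype=torch.float)

batch = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

outputs = model(**batch, labels=targets)  # BCE-with-logits loss computed internally
outputs.loss.backward()
optimizer.step()
```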
The output of the model is a set of scores, one per category, indicating how strongly the input tweet expresses each kind of toxicity.
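Concretely, scoring a tweet might look like the sketch below; the Hub repo id "your-username/toxic-tweets" is a placeholder for wherever the fine-tuned model was exported:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

# Placeholder repo id; substitute the actual exported model.
tokenizer = AutoTokenizer.from_pretrained("your-username/toxic-tweets")
model = AutoModelForSequenceClassification.from_pretrained("your-username/toxic-tweets")

inputs = tokenizer("example tweet text", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 6), one logit per category
scores = torch.sigmoid(logits)[0]    # independent score per label, not a softmax

for label, score in zip(LABELS, scores):
    print(f"{label}: {score:.3f}")
```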
I calculated loss using binary cross-entropy. For every batch, we can call the model to measure its loss; it hovers around 0.03 to 0.06, sometimes dropping to 0.01.
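A minimal sketch of that per-batch measurement, using PyTorch's built-in BCE-with-logits loss (the shapes and values here are illustrative, not the training data):

```python
import torch

# Illustrative batch: logits for 4 tweets x 6 categories, with 0/1 targets.
logits = torch.randn(4, 6)
targets = torch.randint(0, 2, (4, 6)).float()

loss_fn = torch.nn.BCEWithLogitsLoss()  # applies sigmoid, then binary cross-entropy
batch_loss = loss_fn(logits, targets)   # scalar averaged over all label slots
print(batch_loss.item())
```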