Toxic Media - language-models

title	emoji	colorFrom	colorTo	sdk	sdk_version	app_file	pinned
Toxic Tweets	🐢	gray	blue	streamlit	1.17.0	app.py	false

link to hugging face model

An introduction to natural language processing and sentiment analysis through the lens of analyzing negative online behavior.

Requirements

WSL 2

For Windows users, first install WSL2. Install link here

PowerShell

wsl --install

Containers are lightweight abstractions at the application layer that package code and dependencies together. Because they share the host operating system's kernel, containers take up less space than virtual machines and run as isolated processes in their own namespaces.

Docker

This project uses Docker.

Docker Desktop is an option for Windows users who prefer a GUI to working inside the WSL subsystem. Understand the tradeoffs before using Docker Desktop.

Update repository

sudo apt-get update
sudo apt-get upgrade

Install Docker packages

sudo apt install docker-ce docker-ce-cli containerd.io

Install VSCode

Download VS Code here.

Install the Dev Containers extension. With this extension, you can open any directory in a development container using the icon in the bottom-left corner of the window.

Accessing the container

In Docker Desktop, you can find the container ID under the Containers tab.

In WSL2, execute

docker ps

to show all running containers on the system.

Copy the ID and run the command

docker exec -it [container-id] /bin/sh

to start a shell in the container.
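The two steps above can also be scripted. The following is a minimal sketch, not part of this repository: the helper names (`parse_container_ids`, `exec_shell_argv`, `list_running_containers`) are hypothetical, and it assumes the `docker` CLI is on your `PATH`. It uses `docker ps --format` to print only the container ID and name, then builds the same `docker exec` command shown above.

```python
import subprocess

# Go-template format string understood by `docker ps --format`:
# one line per container, ID and name separated by a tab.
PS_FORMAT = "{{.ID}}\t{{.Names}}"


def parse_container_ids(ps_output: str) -> dict:
    """Map container name to container ID from tab-separated `docker ps` output."""
    containers = {}
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        container_id, name = line.split("\t", 1)
        containers[name.strip()] = container_id.strip()
    return containers


def exec_shell_argv(container_id: str, shell: str = "/bin/sh") -> list:
    """Build the argument list for `docker exec -it <id> <shell>`."""
    return ["docker", "exec", "-it", container_id, shell]


def list_running_containers() -> dict:
    """Run `docker ps` and return {name: id}. Requires a running Docker daemon."""
    result = subprocess.run(
        ["docker", "ps", "--format", PS_FORMAT],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_container_ids(result.stdout)
```

To open a shell from a script, pass the result of `exec_shell_argv(...)` to `subprocess.run` in an interactive terminal. Parsing is kept in its own function so it can be checked without Docker installed.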

Writeup

The model was trained on the Toxic Comment Classification Dataset found here.

The challenge aims to classify toxicity into the following categories:

toxic
severe_toxic
obscene
threat
insult
identity_hate

The model was trained in Google Colab and exported to Hugging Face Models. The base model, "distilbert-base-uncased", was fine-tuned on this dataset.

The model outputs one value per category, indicating how strongly the input tweet matches that category.
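As an illustration of what such an output looks like, here is a minimal sketch in plain Python. It assumes the model emits one raw logit per category and that each logit is passed through a sigmoid independently (the usual multi-label setup, consistent with a binary cross-entropy loss). The example logits, the helper names, and the 0.5 threshold are made up for illustration and are not taken from the trained model.

```python
import math

# The six categories from the Toxic Comment Classification challenge.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def sigmoid(x: float) -> float:
    """Squash a raw logit into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def scores_from_logits(logits):
    """Pair each label with an independent probability-like score."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return {label: sigmoid(z) for label, z in zip(LABELS, logits)}


def flagged(scores, threshold=0.5):
    """Return the labels whose score meets the (assumed) threshold."""
    return [label for label, p in scores.items() if p >= threshold]


# Hypothetical logits for a single tweet, in LABELS order.
example_logits = [2.0, -3.0, 0.5, -4.0, 1.0, -2.5]
scores = scores_from_logits(example_logits)
for label, p in scores.items():
    print(f"{label:14s} {p:.3f}")
print("flagged:", flagged(scores))
```

Because each category is scored independently, the values do not sum to 1, and a tweet can be flagged for several categories at once.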

Google site

Loss

I calculated the loss using binary cross-entropy. Calling the model on each batch returns its loss, which hovers around 0.03 to 0.06 and sometimes drops to 0.01.
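For reference, binary cross-entropy over predicted probabilities p_i and 0/1 targets y_i is the mean of -(y_i * log(p_i) + (1 - y_i) * log(1 - p_i)). The sketch below computes it in plain Python for one example with six labels. The probabilities and targets are invented for illustration, and the clamping constant `eps` is an assumption to avoid `log(0)`. The training notebook may compute the loss differently, for instance directly from logits.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    total = 0.0
    for p, y in zip(probs, targets):
        # Clamp to avoid log(0) when a prediction is exactly 0 or 1.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical scores for one tweet across the six categories,
# with the matching ground-truth labels.
probs = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02]
targets = [1, 0, 1, 0, 1, 0]
loss = binary_cross_entropy(probs, targets)
print(f"loss = {loss:.4f}")
```

Confident, correct predictions push the loss toward 0, which is why values in the 0.01 to 0.06 range indicate that most labels are being predicted with high confidence.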

youtube link
