The Webis Clickbait Spoiling Corpus 2022 (Webis-Clickbait-22) contains 4,000 spoiled clickbait posts crawled from Facebook, Reddit, and Twitter. This corpus supports the task of clickbait spoiling, which deals with generating a short text that satisfies the curiosity induced by a clickbait post.
The dataset comes with predefined train/validation/test splits.
Installation of dependencies (to run on Colab, follow the section "Setting up data for colab" instead, as the dependencies are already installed in the Google Colab environment)
- To install pip on Windows, Mac, or Linux, please refer to this link: https://www.makeuseof.com/tag/install-pip-for-python/
- Please create a virtual environment using venv (https://docs.python.org/3/library/venv.html)
- pip install -U pip
- pip install -r requirements.txt
- After git cloning this project, please go to the data folder and unzip the qa file.
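The unzip step can also be done from Python. The sketch below is a minimal example using only the standard library; the archive name `qa.zip` in the usage comment is a hypothetical placeholder, since the exact file name is not given here.

```python
import zipfile
from pathlib import Path


def unzip_archive(zip_path, dest_dir=None):
    """Extract a zip archive and return the sorted names of its members."""
    zip_path = Path(zip_path)
    # Default to extracting next to the archive, i.e. inside the data folder.
    dest_dir = Path(dest_dir) if dest_dir is not None else zip_path.parent
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return sorted(archive.namelist())


# Usage (hypothetical archive name; substitute the real one):
# unzip_archive(Path("data") / "qa.zip")
```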
Setting up data for colab
- Before running the notebooks, right-click the folders named "data" and "models" located in the "NLP243_2022/Group1" folder of Google Drive.
- Select "Add shortcut to Drive"
- Select "My Drive"
- Select "ADD SHORTCUT"
This will ensure all the data paths are set up to run in Colab.
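To confirm that the shortcuts are visible before running a notebook, a check along the following lines may help. It is a sketch under assumptions: it assumes Drive is mounted at `/content/drive` and that shortcuts added to "My Drive" appear directly under `MyDrive`. The helper names are illustrative and are not taken from the notebooks.

```python
import sys
from pathlib import Path

# Assumption: Drive is mounted at /content/drive and the "data" and "models"
# shortcuts sit directly under MyDrive.
COLAB_DRIVE_ROOT = Path("/content/drive/MyDrive")


def running_in_colab():
    """Return True when the google.colab package is loaded in this process."""
    return "google.colab" in sys.modules


def resolve_project_root(local_root, in_colab=None):
    """Pick the folder expected to contain "data" and "models"."""
    if in_colab is None:
        in_colab = running_in_colab()
    return COLAB_DRIVE_ROOT if in_colab else Path(local_root)


def missing_folders(root, required=("data", "models")):
    """List the required folder names that are not present under root."""
    root = Path(root)
    return [name for name in required if not (root / name).is_dir()]
```

An empty list from `missing_folders(resolve_project_root("."))` suggests the paths are in place. Otherwise, the returned names indicate which shortcut still needs to be added.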
The project contains three folders:
- data -> Contains all the raw train and validation data for all models
- models -> Folder where we save the best models for spoiler classification.
- notebooks -> Folder containing model notebooks
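For a quick look at the raw data outside the notebooks, a loader like the one below could be used. It is only a sketch: it assumes the data folder holds JSON Lines files named `train.jsonl` and `validation.jsonl`, which is not stated here, so adjust the names and format to match the actual files.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_splits(data_dir, names=("train", "validation")):
    """Load each <name>.jsonl found in data_dir; missing files are skipped."""
    data_dir = Path(data_dir)
    # The split file names are assumptions, not confirmed by the project.
    return {
        name: load_jsonl(data_dir / f"{name}.jsonl")
        for name in names
        if (data_dir / f"{name}.jsonl").is_file()
    }
```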
In the notebooks folder, there are five notebooks, one for each model we designed and tested:
- nlp_243_project_classification_svm.ipynb
- nlp_243_project_classification_bert.ipynb
- nlp_243_project_classification_bert_lstm.ipynb
- nlp_243_project_qa_bert.ipynb
- nlp_243_project_qa_roberta.ipynb
Execute each of the notebooks sequentially (in the same order as shown above):
- To execute them, please change (in the second cell) the "root_path" to the path of the data folder and execute the notebook.
- To run in Colab:
  - If you have set up the shortcuts as described in the section "Setting up data for colab", you won't have to change "root_path", as all the paths are already set up.
  - Select "Runtime"
  - Select "Change Runtime Type"
  - Set "Hardware Accelerator" to "GPU"
  - Select "Run All"
  - Select "Connect to Google Drive" when prompted
  - Select the account you want to use
  - Select "Allow"
- The results and graphs of each notebook are present at the bottom.
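For local runs, the sequential execution described above could be scripted instead of opening each notebook by hand. The sketch below assumes that `jupyter nbconvert` is installed (this is not confirmed by requirements.txt) and that "root_path" has already been edited in each notebook. The helper names `build_commands` and `run_all` are illustrative.

```python
import subprocess
from pathlib import Path

# Notebook order as listed in this README.
NOTEBOOKS = [
    "nlp_243_project_classification_svm.ipynb",
    "nlp_243_project_classification_bert.ipynb",
    "nlp_243_project_classification_bert_lstm.ipynb",
    "nlp_243_project_qa_bert.ipynb",
    "nlp_243_project_qa_roberta.ipynb",
]


def build_commands(notebook_dir="notebooks"):
    """Build one nbconvert command per notebook, preserving the listed order."""
    notebook_dir = Path(notebook_dir)
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute",
         "--inplace", str(notebook_dir / name)]
        for name in NOTEBOOKS
    ]


def run_all(notebook_dir="notebooks"):
    """Execute the notebooks one after another, stopping at the first failure."""
    for command in build_commands(notebook_dir):
        subprocess.run(command, check=True)
```

Calling `run_all()` from the project root overwrites each notebook with its executed version, so the results and graphs end up at the bottom of the same files.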