# LiU Exjobb Crawler

## Table of Contents

- [Introduction](#introduction)
- [Features](#features)
- [Prerequisites](#prerequisites)
- [Installation](#installation)
- [Usage](#usage)
- [Directory Structure](#directory-structure)
- [Automated Updates](#automated-updates)
- [Contributing](#contributing)
- [License](#license)
- [Contact](#contact)
## Introduction

This project automates the process of scraping Exjobb (master's thesis) project data from Linköping University using Selenium. The scraped data is stored in a CSV file and visualized through an interactive Streamlit application. Additionally, a GitHub Actions workflow runs the scraper daily, keeping the data up to date.
## Features

- **Automated Web Scraping**: Uses Selenium to extract project details such as title, organization, research field, and application deadlines.
- **Data Storage**: Saves scraped data in a structured CSV format for easy access and analysis (a loading sketch follows this list).
- **Interactive Visualization**: The Streamlit app provides interactive charts and filters to explore the data.
- **Daily Updates**: A GitHub Actions workflow runs the scraper daily, keeping the dataset current.
- **Data Download**: Users can download filtered data in CSV and Excel formats directly from the Streamlit app.
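As a quick illustration of the dataset these features produce, the CSV can be loaded with pandas. The column names below are assumptions inferred from the fields listed above, not confirmed against the scraper's output:

```python
import pandas as pd

# Load the scraped dataset (path given in the Usage section below).
df = pd.read_csv("data/exjobb_projects.csv")

# Assumed columns, inferred from the feature list; adjust to the actual header.
print(df[["title", "organization", "research_field", "deadline"]].head())
```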
## Prerequisites

Before you begin, ensure you have met the following requirements:

- Python 3.7 or higher installed on your machine. You can download it from [python.org](https://www.python.org/downloads/).
- Google Chrome browser installed. Download it from [google.com/chrome](https://www.google.com/chrome/).
- ChromeDriver compatible with your Chrome version. You can download it from the [ChromeDriver downloads page](https://chromedriver.chromium.org/downloads).
## Installation

1. **Clone the Repository**

   ```bash
   git clone https://github.com/your-username/liu-exjobb-crawler.git
   cd liu-exjobb-crawler
   ```
2. **Create a Virtual Environment (Optional but Recommended)**

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
3. **Install Dependencies**

   ```bash
   pip install -r requirements.txt
   ```
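   For reference, `requirements.txt` likely includes at least the packages below. This list is an assumption inferred from the tools this README describes (Selenium for scraping, pandas for CSV handling, Streamlit for the app, and openpyxl for the Excel export), not a verbatim copy of the file:

   ```text
   selenium
   pandas
   streamlit
   openpyxl
   ```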
4. **Set Up ChromeDriver**

   - Download the ChromeDriver version that matches your installed Chrome browser.
   - Extract the `chromedriver` executable and place it in a directory that's in your system's `PATH`, or specify its path in the `liu_data.py` script.
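If you would rather point Selenium at an explicit driver location than rely on `PATH`, Selenium 4's `Service` object accepts a path. This is a generic sketch, not a quote from `liu_data.py`; adjust the path to wherever you placed the binary:

```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Point Selenium at a specific ChromeDriver binary instead of a PATH lookup.
service = Service("/usr/local/bin/chromedriver")
driver = webdriver.Chrome(service=service)

driver.get("https://www.example.com")  # quick sanity check that the driver launches
print(driver.title)
driver.quit()
```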
## Usage

### Running the Scraper

The scraper script `liu_data.py` uses Selenium to navigate the Exjobb website, extract project information, and save it to a CSV file.

```bash
python liu_data.py
```

After running, the scraped data will be available at `data/exjobb_projects.csv`.
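For orientation, the core scraping loop looks roughly like the sketch below. The URL, CSS selectors, and column names are placeholders for illustration, not the actual values used in `liu_data.py`:

```python
import os

import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()  # assumes chromedriver is on PATH
driver.get("https://www.example.com/exjobb")  # placeholder URL

rows = []
# Placeholder selectors: each project is assumed to be rendered as one card.
for card in driver.find_elements(By.CSS_SELECTOR, ".project-card"):
    rows.append({
        "title": card.find_element(By.CSS_SELECTOR, "h3").text,
        "organization": card.find_element(By.CSS_SELECTOR, ".org").text,
        "research_field": card.find_element(By.CSS_SELECTOR, ".field").text,
        "deadline": card.find_element(By.CSS_SELECTOR, ".deadline").text,
    })
driver.quit()

os.makedirs("data", exist_ok=True)
pd.DataFrame(rows).to_csv("data/exjobb_projects.csv", index=False)
```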
### Running the Streamlit App

The Streamlit application `streamlit_app.py` visualizes the scraped data.

```bash
streamlit run streamlit_app.py
```

This command will open a new tab in your default web browser displaying the interactive dashboard.
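A minimal version of such a dashboard, with one filter and the CSV download mentioned under Features, could look like the sketch below. The column name and widget layout are assumptions, not excerpts from `streamlit_app.py`:

```python
import pandas as pd
import streamlit as st

df = pd.read_csv("data/exjobb_projects.csv")

# Assumed column name: filter projects by research field in the sidebar.
fields = st.sidebar.multiselect(
    "Research field", options=sorted(df["research_field"].dropna().unique())
)
filtered = df[df["research_field"].isin(fields)] if fields else df

st.dataframe(filtered)
st.download_button(
    "Download CSV",
    data=filtered.to_csv(index=False).encode("utf-8"),
    file_name="exjobb_projects_filtered.csv",
    mime="text/csv",
)
```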
## Directory Structure

```text
├── data
│   └── exjobb_projects.csv   # Scraped project data
├── liu_data.py               # Web scraper script
├── README.md                 # Project documentation
├── requirements.txt          # Python dependencies
└── streamlit_app.py          # Streamlit visualization app
```
- `data/`: Contains the CSV file with the scraped Exjobb project data.
- `liu_data.py`: Python script that performs web scraping using Selenium.
- `streamlit_app.py`: Streamlit application for data visualization and interaction.
- `requirements.txt`: Lists all Python libraries required to run the project.
- `README.md`: Provides an overview and instructions for the project.
## Automated Updates

To ensure that the scraped data is updated daily, a GitHub Actions workflow is set up (a sketch of such a workflow appears after the steps below).

1. **GitHub Actions Workflow**

   The workflow file `.github/workflows/auto.yml` is configured to run the `liu_data.py` script daily at 02:00 UTC.

2. **Setup Steps**

   - Ensure that `.github/workflows/auto.yml` is present in your repository.
   - The workflow installs the necessary dependencies, Chrome, and ChromeDriver before running the scraper.
   - After scraping, it commits and pushes the updated `exjobb_projects.csv` back to the repository.

3. **Monitoring**

   - Navigate to the Actions tab in your GitHub repository to monitor workflow runs.
   - Ensure that the workflow completes successfully and updates the data as expected.
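For reference, a minimal version of such a workflow might look like the sketch below. The step details (runner image, commit method, Python version) are assumptions; consult the actual `.github/workflows/auto.yml` in the repository:

```yaml
name: Daily scrape

on:
  schedule:
    - cron: "0 2 * * *"  # 02:00 UTC daily
  workflow_dispatch:     # allow manual runs

permissions:
  contents: write        # needed to push the updated CSV

jobs:
  scrape:
    runs-on: ubuntu-latest  # GitHub-hosted Ubuntu images ship with Chrome
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt
      - run: python liu_data.py
      - name: Commit and push updated data
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "github-actions[bot]@users.noreply.github.com"
          git add data/exjobb_projects.csv
          git commit -m "Update scraped data" || echo "No changes to commit"
          git push
```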
## Contributing

Contributions are welcome! Follow these steps to contribute:

1. **Fork the Repository**

2. **Create a New Branch**

   ```bash
   git checkout -b feature/YourFeature
   ```

3. **Make Changes and Commit**

   ```bash
   git commit -m "Add new feature"
   ```

4. **Push to the Branch**

   ```bash
   git push origin feature/YourFeature
   ```

5. **Open a Pull Request**
## License

This project is licensed under the MIT License.
## Contact

For any inquiries or suggestions, please contact me.

🚀