GitHub-Issue-Classifier

Overview

GitHub-Issue-Classifier is a machine learning tool that automatically labels GitHub issues based on their text. It preprocesses issue text with natural language processing (NLP) techniques, then trains several machine learning models to predict the most appropriate label for each issue. This makes incoming issues easier to manage and triage.

Features

Text Processing Techniques: Cleans and prepares text data by removing punctuation and stopwords, and by applying lemmatization to reduce words to their base forms.
Text Representations:
- TF-IDF Vectors (Bag of Words): Transforms text into numeric vectors that weight each term by how often it appears in an issue and how rare it is across all issues, which makes issues easy to compare.
- Topic Models (LDA): Uses Latent Dirichlet Allocation to discover abstract topics within text.
- Word Embeddings: Uses dense vector representations of words, in which words with related meanings lie close together.
Machine Learning Models:
- Random Forests
- Gaussian Naive Bayes
- Neural Network
- Support Vector Machine (SVM)
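
The preprocessing and TF-IDF steps listed above can be sketched as follows. This is a minimal illustration that uses only the Python standard library so it runs without the project's dependencies; it is not the code in github_issue_classifier.py. The stopword list, the lemma table, the function names, and the sample issues are all hypothetical, and a real pipeline would normally rely on a full NLP library for stopwords and lemmatization.

```python
import math
import string
from collections import Counter

# Hypothetical, deliberately tiny stopword list and lemma table (assumptions
# for illustration; a real pipeline would use a full NLP library for both).
STOPWORDS = {"the", "a", "an", "is", "on", "when", "to", "and", "of", "in"}
LEMMAS = {"crashes": "crash", "crashed": "crash", "clicking": "click"}


def preprocess(text):
    """Lowercase, strip punctuation, drop stopwords, map words to base forms."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = text.lower().translate(table).split()
    return [LEMMAS.get(tok, tok) for tok in tokens if tok not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up sample issues, for illustration only.
issues = [
    "The app crashes when clicking the save button!",
    "Add a dark mode to the settings page.",
    "App crashed on startup.",
]
tokens = [preprocess(issue) for issue in issues]
vectors = tfidf(tokens)
print(tokens[0])
print(vectors[0])
```

In this sketch a term that appears in only one issue (such as "save") receives a higher weight than a term shared by several issues (such as "app"), which is the property that makes TF-IDF vectors useful for telling issues apart.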

The system trains every combination of these text representations and models on a dataset, then selects the best-performing configuration.
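
One way to picture this search is as a loop over every (representation, model) pair. The sketch below is an assumption about the overall shape, not the repository's implementation: the identifier strings, `select_best`, and `placeholder_score` are hypothetical names, and the placeholder scorer returns arbitrary numbers that carry no meaning. In practice the scorer would train the model on the chosen representation and return a validation metric.

```python
from itertools import product

# Identifiers mirror the Features list; the exact strings are assumptions.
REPRESENTATIONS = ["tfidf", "lda", "embeddings"]
MODELS = ["random_forest", "gaussian_nb", "neural_network", "svm"]


def select_best(evaluate):
    """Score every (representation, model) pair and return the best one."""
    scores = {
        (rep, model): evaluate(rep, model)
        for rep, model in product(REPRESENTATIONS, MODELS)
    }
    best = max(scores, key=scores.get)
    return best, scores


def placeholder_score(representation, model):
    """Stand-in scorer: an arbitrary deterministic number, NOT a real result.

    A real scorer would train `model` on `representation` and return a
    validation metric such as accuracy or F1.
    """
    return ((len(representation) * len(model)) % 11) / 10


best, scores = select_best(placeholder_score)
print(f"{len(scores)} combinations evaluated; selected: {best}")
```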

Prerequisites

Ensure you have Python installed on your system to run the scripts. You can download Python from python.org.

Getting Started

Step 1: Clone the Repository

To get started, clone this repository to your local machine using the following command:

git clone [repository-url]
cd GitHub-Issue-Classifier

Replace [repository-url] with the actual URL of the repository.

Step 2: Install Dependencies

Install the necessary Python libraries using:

pip install -r requirements.txt

Step 3: Run the Classifier

Execute the script by running:

python github_issue_classifier.py

Follow the on-screen instructions to input issue data and receive classifications.

How It Works

Data Preprocessing: The script first preprocesses the text data using the specified text processing techniques.
Feature Extraction: It then converts the preprocessed text into one of the selected representations.
Model Training: Various models are trained using combinations of preprocessing techniques and text representations.
Evaluation and Selection: The trained models are evaluated, and the best-performing combination is selected for use.
Classification: New issues are classified using the selected model and settings.
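
The train, evaluate, and classify steps above can be illustrated with a toy example. This sketch swaps in a simple word-overlap classifier as a stand-in for the models named earlier (Random Forests, Gaussian Naive Bayes, Neural Network, SVM), skips the preprocessing step, and uses a tiny made-up dataset; the labels "bug" and "enhancement" and all function names are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict

# Tiny made-up labelled dataset, for illustration only.
TRAIN = [
    ("app crash on startup", "bug"),
    ("crash when saving file", "bug"),
    ("add dark mode option", "enhancement"),
    ("add export to pdf option", "enhancement"),
]
HELD_OUT = [
    ("crash on login", "bug"),
    ("add keyboard shortcut option", "enhancement"),
]


def train(examples):
    """Sum word counts per label (a stand-in for a real model's fit step)."""
    profiles = defaultdict(Counter)
    for text, label in examples:
        profiles[label].update(text.split())
    return profiles


def classify(profiles, text):
    """Pick the label whose word profile overlaps most with the issue text."""
    words = text.split()
    return max(profiles, key=lambda label: sum(profiles[label][w] for w in words))


def accuracy(profiles, examples):
    """Fraction of held-out examples whose predicted label matches."""
    hits = sum(classify(profiles, text) == label for text, label in examples)
    return hits / len(examples)


model = train(TRAIN)
print("held-out accuracy:", accuracy(model, HELD_OUT))
print("new issue label:", classify(model, "crash after update"))
```

Evaluating on examples that were not used for training, as `HELD_OUT` is here, is what lets the selection step compare configurations fairly.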

