AI Based Intrusion Detection System (IDS)

Implementation of an NLP-based Intrusion Detection System (IDS) that classifies detected packets as attack or non-attack.
This work won 1st place (Minister of Science and ICT Award, 과기정통부 장관상) in the Cybersecurity AI Big Data Challenge (Nov 2022).

📋 Task

The task is to classify each intrusion detection system (IDS) result as either an attack packet or a non-attack packet, i.e. binary classification.

🤖 Model

Base Model: RoBERTa (SecureBERT)
- Fine-tuned on IDS-related binary classification data.
- Leverages pre-trained language model capabilities for analyzing attack packet data.

📊 Dataset

Intrusion Detection System Dataset

Contains labeled samples for binary classification.
Size: N million samples.
Includes features extracted from network traffic packets:

'PAYLOAD', 'APP_PROTO', 'SRC_PORT', 'DST_PORT', 'IMPACT', 'RISK', 'JUDGEMENT', 'Method', 'Method-URL', 'HTTP', 'Host', 'User-Agent', 'Accept', 'Accept-Encoding', 'Accept-Language', 'Accept-Charset', 'Content-Type', 'Content-Length', 'Connection', 'Cookie', 'Upgrade-Insecure-Requests', 'Pragma', 'Cache-Control', 'Body'
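
Because the model is a language model, each record has to reach the tokenizer as text. The sketch below shows one possible way to flatten a record into a single string. It is an illustration only: `serialize_record`, `TEXT_FIELDS`, the choice of fields, and the `NAME: value` layout are assumptions, not the logic in data_preprocess.py. It also assumes that JUDGEMENT holds the label and is therefore kept out of the input text.

```python
from typing import Mapping

# Column names copied from the feature list above. JUDGEMENT is left out on the
# assumption that it is the label column; check this against your own data.
TEXT_FIELDS = [
    "APP_PROTO", "SRC_PORT", "DST_PORT", "Method", "Method-URL", "Host",
    "User-Agent", "Content-Type", "Body", "PAYLOAD",
]


def serialize_record(record: Mapping[str, object], fields=TEXT_FIELDS, sep=" | ") -> str:
    """Join the non-empty fields of one record as 'NAME: value' segments."""
    parts = []
    for name in fields:
        value = record.get(name)
        if value is None:
            continue
        # Collapse runs of whitespace (including newlines) into single spaces.
        text = " ".join(str(value).split())
        if text:
            parts.append(f"{name}: {text}")
    return sep.join(parts)


# Hypothetical record, for illustration only.
example = {
    "APP_PROTO": "http",
    "DST_PORT": 80,
    "Method": "GET",
    "Method-URL": "/index.php?id=1' OR '1'='1",
    "Host": "",
    "PAYLOAD": None,
}
print(serialize_record(example))
```

Keeping the field names in the text gives the model a hint about where each value came from. Whether that helps in practice depends on the data and the tokenizer.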

📂 Repository Structure

IDS-BERT/
├── ckpt/                 
│   ├── pretrained/              
│   └── trained/                 
├── dataset/                    
├── data_preprocess.py        
├── train.py    
├── inference.py          
├── utils.py                 
├── main.ipynb                       
└── README.md

📚 Requirements

Python: 3.9+
CUDA: 11.7+ (for GPU-based training and inference)
For a complete list of dependencies, see requirements.txt.

🚀 Getting Started

1. Clone the Repository

git clone https://github.com/cv-lee/IDS-BERT.git

2. Prepare the Dataset & Pretrained Model

Place your dataset files in the dataset/ folder.
Place your pretrained model files in the ckpt/pretrained/ folder.
Open the main.ipynb file.
Execute the data preprocessing step:
```
python3 data_preprocess.py
```
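
Before running the preprocessing step, it can help to confirm that both input folders actually contain files. The following is a minimal sketch of such a check. `missing_inputs` is a hypothetical helper, not part of this repository, and it assumes the script is run from the repository root.

```python
from pathlib import Path


def missing_inputs(repo_root: str, required=("dataset", "ckpt/pretrained")) -> list:
    """Return the required folders that are absent or contain no files."""
    root = Path(repo_root)
    missing = []
    for rel in required:
        folder = root / rel
        # A folder counts as ready only if it holds at least one regular file.
        has_file = folder.is_dir() and any(p.is_file() for p in folder.rglob("*"))
        if not has_file:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_inputs(".")
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Inputs found; ready to run data_preprocess.py")
```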

3. Train the Model

Fine-tune the RoBERTa model on the preprocessed dataset, starting from the pretrained model:
```
python3 train.py
```

4. Run Inference

Use the trained model for binary classification:
```
python3 inference.py
```
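
A two-class classification head typically emits one logit per class, and the final decision comes from turning those logits into a probability. The sketch below illustrates that last step in plain Python. It is not the code in inference.py: the helper names, the label order (index 0 = non-attack, index 1 = attack), and the 0.5 threshold are all assumptions.

```python
import math

# Assumed index order: 0 = non-attack, 1 = attack.
LABELS = ("non-attack", "attack")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decide(logits, threshold=0.5):
    """Return (label, attack_probability) for one pair of class logits."""
    attack_prob = softmax(logits)[1]
    label = LABELS[1] if attack_prob >= threshold else LABELS[0]
    return label, attack_prob


# Hypothetical logits, for illustration only.
label, prob = decide([-1.2, 2.3])
print(label, round(prob, 3))
```

Raising the threshold trades missed attacks for fewer false alarms. The right value depends on how costly each kind of error is in your deployment.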

📄 Configuration

{
  "max_seq_length": 512,
  "batch_size": 16,
  "learning_rate": 1e-5,
  "weight_decay": 0.01,
  "num_epochs": 10,
  "device": "cuda"
}
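
The README does not say where these settings are stored or how the scripts read them. As a sketch, assuming the settings are available as JSON text, they could be parsed and sanity-checked as shown below. `TrainConfig` and `parse_config` are hypothetical names, and the validation rules are illustrative rather than taken from the repository.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    max_seq_length: int
    batch_size: int
    learning_rate: float
    weight_decay: float
    num_epochs: int
    device: str

    def __post_init__(self):
        # RoBERTa-base style models usually accept at most 512 tokens.
        if not 0 < self.max_seq_length <= 512:
            raise ValueError("max_seq_length must be in 1..512")
        if self.batch_size <= 0 or self.num_epochs <= 0:
            raise ValueError("batch_size and num_epochs must be positive")
        if self.learning_rate <= 0 or self.weight_decay < 0:
            raise ValueError("learning_rate must be > 0 and weight_decay >= 0")


def parse_config(text: str) -> TrainConfig:
    """Parse JSON text into a validated TrainConfig."""
    return TrainConfig(**json.loads(text))


# Same values as the configuration shown above.
CONFIG_JSON = """
{
  "max_seq_length": 512,
  "batch_size": 16,
  "learning_rate": 1e-5,
  "weight_decay": 0.01,
  "num_epochs": 10,
  "device": "cuda"
}
"""

config = parse_config(CONFIG_JSON)
print(config)
```

Validating at load time surfaces a mistyped value before any GPU time is spent.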

📬 Contact

For any questions or issues, please contact:

Joohyun Lee
