- Implementation of an NLP-based Intrusion Detection System (IDS) for binary classification of detected attack packets.
- This project won 1st place (Minister of Science and ICT Award) in the Cybersecurity AI Big Data Challenge (Nov 2022).
The primary task is binary classification: each intrusion detection system (IDS) result is labeled as either an attack packet or a non-attack packet.
- Base Model: RoBERTa (SecureBERT)
- Fine-tuned on IDS-related binary classification data.
- Leverages pre-trained language model capabilities for analyzing attack packet data.
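To make the binary decision concrete: a RoBERTa-style sequence classifier with two output classes typically emits one logit per class, and the prediction is read off after a softmax. The sketch below shows only that last step in plain Python. The label order (`0` = non-attack, `1` = attack), the `classify` helper, and the `threshold` parameter are assumptions for illustration, not taken from this repository.

```python
import math

# Assumption: index 0 = non-attack, index 1 = attack. The README does not
# state the label order, so check it against your own training labels.
LABELS = ("non-attack", "attack")


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.5):
    """Return (label, attack_probability) for one pair of class logits."""
    if len(logits) != 2:
        raise ValueError("expected exactly two logits for binary classification")
    attack_prob = softmax(logits)[1]
    label = LABELS[1] if attack_prob >= threshold else LABELS[0]
    return label, attack_prob


if __name__ == "__main__":
    # Made-up logits, for illustration only.
    print(classify([-1.2, 2.3]))
    print(classify([0.8, -0.4]))
```

Raising `threshold` above 0.5 trades missed attacks for fewer false alarms; the right value depends on how the alerts are consumed.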
- Intrusion Detection System Dataset
- Contains labeled samples for binary classification.
- Size: N million samples.
- Includes features extracted from network traffic packets:
'PAYLOAD', 'APP_PROTO', 'SRC_PORT', 'DST_PORT', 'IMPACT', 'RISK', 'JUDGEMENT', 'Method', 'Method-URL', 'HTTP', 'Host', 'User-Agent', 'Accept', 'Accept-Encoding', 'Accept-Language', 'Accept-Charset', 'Content-Type', 'Content-Length', 'Connection', 'Cookie', 'Upgrade-Insecure-Requests', 'Pragma', 'Cache-Control', 'Body'
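A language model consumes text, so tabular fields like these usually have to be flattened into one string per sample before tokenization. The following is a minimal sketch of one way to do that, not the preprocessing used in `data_preprocess.py`. The `record_to_text` helper, the choice of `TEXT_FIELDS`, and the `name: value` layout are all hypothetical. Which column holds the label is not stated here, so label handling is left out.

```python
# Hypothetical subset of the columns listed above; the fields actually fed to
# the model are not documented in this README.
TEXT_FIELDS = ["APP_PROTO", "SRC_PORT", "DST_PORT", "Method", "Host", "User-Agent", "PAYLOAD"]


def record_to_text(record, fields=TEXT_FIELDS, sep=" | "):
    """Join selected columns of one record into a single "name: value" string.

    Missing or empty values are skipped so they do not consume tokens.
    """
    parts = []
    for name in fields:
        value = record.get(name)
        if value is None:
            continue
        # Collapse newlines and runs of whitespace into single spaces.
        text = " ".join(str(value).split())
        if text:
            parts.append(f"{name}: {text}")
    return sep.join(parts)


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    sample = {
        "APP_PROTO": "HTTP",
        "SRC_PORT": 51234,
        "DST_PORT": 80,
        "Method": "GET",
        "Host": "example.com",
        "PAYLOAD": "GET /index.php?id=1 HTTP/1.1\r\nHost: example.com",
    }
    print(record_to_text(sample))
```

Keeping the field names in the string gives the model a cue about what each value means, at the cost of a few extra tokens per sample.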
IDS-BERT/
├── ckpt/
│ ├── pretrained/
│ └── trained/
├── dataset/
├── data_preprocess.py
├── train.py
├── inference.py
├── utils.py
├── main.ipynb
└── README.md
- Python: 3.9+
- CUDA: 11.7+ (for GPU-based training and inference)
- For a complete list of dependencies, see `requirements.txt`.
git clone https://github.com/cv-lee/IDS-BERT.git
- Place your dataset files in the `dataset/` folder.
- Place your pretrained files in the `ckpt/pretrained` folder.
- Open the `main.ipynb` file.
- Execute the data preprocessing step:
python3 data_preprocess.py
- Train the RoBERTa model using the preprocessed dataset and pretrained model:
python3 train.py
- Use the trained model for binary classification:
python3 inference.py
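The steps above can also be chained from a single script. The sketch below checks that the two input folders are populated and then runs the three scripts in order, stopping at the first failure. It assumes the scripts take no command-line arguments, as in the commands shown; `missing_inputs` and `run_pipeline` are hypothetical helper names and are not part of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Folder and script names are taken from the directory tree in this README.
REQUIRED_DIRS = ["dataset", "ckpt/pretrained"]
PIPELINE = ["data_preprocess.py", "train.py", "inference.py"]


def missing_inputs(root):
    """Return the required directories that are absent or empty under root."""
    root = Path(root)
    missing = []
    for rel in REQUIRED_DIRS:
        path = root / rel
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


def run_pipeline(root, python=sys.executable):
    """Run preprocessing, training, and inference in order."""
    missing = missing_inputs(root)
    if missing:
        raise FileNotFoundError(f"missing or empty: {', '.join(missing)}")
    for script in PIPELINE:
        # check=True raises CalledProcessError if a step exits non-zero,
        # so later steps never run on the output of a failed one.
        subprocess.run([python, script], cwd=root, check=True)
```

Calling `run_pipeline(".")` from the repository root would then be equivalent to running the three commands by hand.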
{
"max_seq_length": 512,
"batch_size": 16,
"learning_rate": 1e-5,
"weight_decay": 0.01,
"num_epochs": 10,
"device": "cuda"
}
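As a sketch of how these hyperparameters might be loaded and sanity-checked before training, the example below parses the JSON into a typed object. `TrainConfig`, `load_config`, and `total_steps` are hypothetical names, and the 512-token upper bound assumes a RoBERTa-base-sized model. How `train.py` actually reads its settings is not shown in this README.

```python
import json
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    max_seq_length: int
    batch_size: int
    learning_rate: float
    weight_decay: float
    num_epochs: int
    device: str

    def __post_init__(self):
        # Assumption: a RoBERTa-base-sized model, which handles at most 512 tokens.
        if not 1 <= self.max_seq_length <= 512:
            raise ValueError("max_seq_length must be between 1 and 512")
        if self.batch_size < 1 or self.num_epochs < 1:
            raise ValueError("batch_size and num_epochs must be positive")
        if self.learning_rate <= 0 or self.weight_decay < 0:
            raise ValueError("learning_rate must be > 0 and weight_decay >= 0")


def load_config(text):
    """Parse a JSON string with the keys shown above into a TrainConfig.

    Unknown or missing keys raise TypeError from the dataclass constructor.
    """
    return TrainConfig(**json.loads(text))


def total_steps(cfg, num_samples):
    """Optimizer steps for a given dataset size, assuming no gradient accumulation."""
    return math.ceil(num_samples / cfg.batch_size) * cfg.num_epochs


if __name__ == "__main__":
    raw = (
        '{"max_seq_length": 512, "batch_size": 16, "learning_rate": 1e-5, '
        '"weight_decay": 0.01, "num_epochs": 10, "device": "cuda"}'
    )
    cfg = load_config(raw)
    # 1000 is a made-up sample count, used only to exercise the helper.
    print(cfg, total_steps(cfg, 1000))
```

Failing fast on a bad value is cheaper than discovering it partway through a multi-epoch run on a GPU.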
For any questions or issues, please contact: