Code for Mateen: Adaptive Ensemble Learning for Network Anomaly Detection

ICL-ml4csec/Mateen

Overview

Mateen is an ensemble framework designed to enhance AutoEncoder (AE)-based one-class network intrusion detection systems by effectively managing distribution shifts in network traffic. It comprises four key components:

Shift Detection Function

  • Purpose: Detects distribution shifts in network traffic using statistical methods.
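As an illustration only, the sketch below flags a shift by comparing a reference sample with a new window using a two-sample Kolmogorov-Smirnov test. This README does not name the statistical test Mateen uses, so the choice of test, the function names, and the reading of the threshold as a significance level are all assumptions; see the paper for the actual method.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, window):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, win = sorted(reference), sorted(window)
    gap = 0.0
    for x in ref + win:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_win = bisect_right(win, x) / len(win)
        gap = max(gap, abs(cdf_ref - cdf_win))
    return gap


def ks_p_value(d, n, m, terms=100):
    """Asymptotic p-value of the KS statistic (Kolmogorov series approximation)."""
    root = math.sqrt(n * m / (n + m))
    lam = (root + 0.12 + 0.11 / root) * d
    if lam < 0.2:
        return 1.0  # the series converges poorly here; the p-value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * (k * lam) ** 2)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def shift_detected(reference, window, threshold=0.05):
    """Hypothetical rule: report a shift when the p-value drops below threshold."""
    d = ks_statistic(reference, window)
    return ks_p_value(d, len(reference), len(window)) < threshold


# Toy data standing in for per-sample scores; not taken from any dataset.
reference_scores = [i / 100 for i in range(100)]
same_window = list(reference_scores)
moved_window = [s + 10.0 for s in reference_scores]
print(shift_detected(reference_scores, same_window))   # False
print(shift_detected(reference_scores, moved_window))  # True
```

The sketch uses only the standard library so it can be run without the project's dependencies; `scipy.stats.ks_2samp` provides the same test if SciPy is installed.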

Sample Selection

  • Subset Selection: Identifies a representative subset of the network traffic samples that reflects the overall distribution after a shift.
  • Labeling and Update Decision: The subset is manually labeled to decide whether an update to the ensemble is necessary.

Shift Adaptation Module

  • Incremental Model Update: Integrates the benign data of the labeled subset with the existing training set, then updates the incremental model on this expanded set.
  • Temporary Model Training: Initializes a new temporary model with the same weights as the incremental model, then trains this model exclusively on the benign data of the labeled subset.

Complexity Reduction Module

  • Model Merging: Merges temporary models that perform similarly.
  • Model Pruning: Removes models that underperform compared to the best-performing model.
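The pruning step can be pictured with a small sketch. It assumes each model has a single validation score where higher is better, and that "underperforming" means falling more than a fixed tolerance below the best score; the function name, the `tolerance` parameter, and the example scores are hypothetical and not taken from the Mateen code. Model merging is not covered here.

```python
def prune_ensemble(scores, tolerance=0.01, max_size=3):
    """Keep models within `tolerance` of the best score, capped at `max_size`.

    `scores` maps a model name to a validation score (higher is better).
    Both the tolerance rule and the cap are illustrative assumptions.
    """
    best = max(scores.values())
    kept = [name for name, score in scores.items() if best - score <= tolerance]
    kept.sort(key=lambda name: scores[name], reverse=True)
    return kept[:max_size]


# Made-up scores for illustration only.
scores = {"incremental": 0.97, "temp_1": 0.99, "temp_2": 0.90, "temp_3": 0.985}
print(prune_ensemble(scores))  # ['temp_1', 'temp_3']
```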

For further details, please refer to the main paper.

Pre-requisites and requirements

Ensure the following dependencies are installed before running Mateen. You can install them using the command below:

pip install -r requirements.txt

Contents of 'requirements.txt':

torch==2.0.1
numpy==1.25.0
pandas==1.5.3
scipy==1.10.1
scikit-learn==1.2.2
tqdm==4.65.0

Models and Data

You can download the pre-trained models, the processed data, as well as the results CSV files from the following link:

Google Drive Folder

The contents of the folder are as follows:

  • Datasets.zip: Contains the processed data.
  • Models.zip: Contains the pre-trained models.
  • Results.zip: Prediction results and probabilities across datasets.

Ensure these files are placed in the Mateen/ directory after downloading and extracting.

How to Use Mateen

To utilize Mateen with our settings, please follow these steps to set up the required datasets and run the framework.

Dataset Setup

First, download the datasets as mentioned in the Models and Data section. Ensure that the files are organized in the following directories:

  • Datasets/CICIDS2017/ for IDS2017
  • Datasets/IDS2018/ for IDS2018
  • Datasets/Kitsune/ for Kitsune and its variants.

You can directly download and unzip the datasets into the main directory of Mateen (i.e., Mateen/).
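Before running the framework, it can help to confirm that the extracted folders are where this section expects them. The following is a minimal sketch, assuming it is run from the Mateen/ directory; the helper name `missing_dataset_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory names as listed in this README.
EXPECTED_DIRS = ("Datasets/CICIDS2017", "Datasets/IDS2018", "Datasets/Kitsune")


def missing_dataset_dirs(root="."):
    """Return the expected dataset directories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dataset_dirs()
    if missing:
        print("Missing dataset directories:", ", ".join(missing))
    else:
        print("All dataset directories found.")
```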

Running Mateen

To run Mateen, use the following command:

python Mateen.py

Command-Line Options

You can customize the execution using various command-line options:

Dataset Selection

Switch between datasets using the '--dataset_name' option.

Example:

python Mateen.py --dataset_name "IDS2017"
Options: "IDS2017", "IDS2018", "Kitsune", "mKitsune", and "rKitsune".

Window Size

Set the window size using the '--window_size' option.

Example:

python Mateen.py --dataset_name "IDS2017" --window_size 50000
Options: 10000, 50000, and 100000.

Shift Detection Threshold

Set the shift detection threshold using the '--shift_threshold' option.

Example:

python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05
Options: 0.05, 0.1, and 0.2.

Performance Threshold

Set the minimum acceptable performance using the '--performance_thres' option.

Example:

python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99
Options: 0.99, 0.95, 0.90, 0.85, and 0.8.

Maximum Ensemble Size

Set the maximum acceptable ensemble size using the '--max_ensemble_length' option.

Example:

python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3
Options: 3, 5, and 7.

Selection Rate

Set the selection rate for building a subset for manual labeling using the '--selection_budget' option.

Example:

python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3 --selection_budget 0.01
Options: 0.005, 0.01, 0.05, and 0.1.
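To get a feel for the labeling effort a given rate implies, the sketch below multiplies the window size by the selection rate. It assumes the rate is a fraction of each window's samples, which this README suggests but does not state outright; the helper name is hypothetical.

```python
def labels_per_window(window_size, selection_budget):
    """Approximate number of samples sent for manual labeling per window.

    Assumes the budget is a fraction of the window (an interpretation,
    not confirmed by the README).
    """
    return round(window_size * selection_budget)


# Listed window sizes and selection rates from this README.
for window_size in (10000, 50000, 100000):
    for budget in (0.005, 0.01, 0.05, 0.1):
        print(window_size, budget, labels_per_window(window_size, budget))
```

Under this reading, the example command above (window size 50000, rate 0.01) would request about 500 labels per window.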

Mini Batch Size for Sample Selection

Choose the mini-batch size using the '--mini_batch_size' option.

Example:

python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3 --selection_budget 0.01 --mini_batch_size 1000
Options: 500, 1000, and 1500.

Retention Rate

Set the retention rate using the '--retention_rate' option.

Example:

python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3 --selection_budget 0.01 --mini_batch_size 1000 --retention_rate 0.3
Options: 0.3, 0.5, and 0.9.

Lambda 0 value

Use the '--lambda_0' option to set the weight assigned to uniqueness scores during the sample selection process.

Example:

python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3 --selection_budget 0.01 --mini_batch_size 1000 --retention_rate 0.3 --lambda_0 0.1
Options: 0.1, 0.5, and 1.0.

Hyperparameter Selection

For further details about the hyperparameter selection, please refer to the main paper, Appendix C.
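When comparing several settings, the command lines can be generated rather than typed by hand. The sketch below builds one command per combination of two of the options listed above; it uses only flags and values that appear in this README, while the helper name `build_commands` and the choice of which options to sweep are illustrative. It prints the commands instead of running them.

```python
import itertools
import shlex

# A small grid over two of the documented options; extend as needed.
GRID = {
    "--window_size": [10000, 50000, 100000],
    "--shift_threshold": [0.05, 0.1, 0.2],
}


def build_commands(dataset_name, grid):
    """Return one argument list per combination of the grid values."""
    flags = list(grid)
    commands = []
    for values in itertools.product(*(grid[flag] for flag in flags)):
        cmd = ["python", "Mateen.py", "--dataset_name", dataset_name]
        for flag, value in zip(flags, values):
            cmd += [flag, str(value)]
        commands.append(cmd)
    return commands


commands = build_commands("IDS2017", GRID)
for cmd in commands:
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to execute the runs one after another.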

Citation

@inproceedings{alotaibi24mateen,
  title={Mateen: Adaptive Ensemble Learning for Network Anomaly Detection},
  author={Alotaibi, Fahad and Maffeis, Sergio},
  booktitle={the 27th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2024)},
  year={2024},
  organization={Association for Computing Machinery}
}

Contact

If you have any questions or need further assistance, please feel free to reach out to me at any time:
