Mateen is an ensemble framework designed to enhance AutoEncoder (AE)-based one-class network intrusion detection systems by effectively managing distribution shifts in network traffic. It comprises four key components:
- Shift Detection: Detects distribution shifts in network traffic using statistical methods.
- Sample Selection:
  - Subset Selection: Identifies a representative subset of the network traffic samples that reflects the overall distribution after a shift.
  - Labeling and Update Decision: The subset is manually labeled, and the labels are used to decide whether the ensemble needs to be updated.
- Shift Adaptation:
  - Incremental Model Update: Adds the benign samples of the labeled subset to the existing training set and then updates the incremental model on this expanded set.
  - Temporary Model Training: Initializes a new temporary model with the same weights as the incremental model and then trains it exclusively on the benign samples of the labeled subset.
- Complexity Reduction:
  - Model Merging: Merges temporary models that perform similarly.
  - Model Pruning: Removes models that underperform compared to the best-performing model.
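The merging and pruning steps above can be illustrated with a small sketch. It assumes each model has a single score on the labeled subset and that merged models are combined by averaging their (flattened) parameters. The tolerance values, the score-based grouping, parameter averaging, and the helpers prune() and merge_similar() are all hypothetical and are not taken from Mateen's code.

```python
from statistics import mean

# Hypothetical helpers illustrating merging and pruning; not Mateen's code.


def prune(scores: dict[str, float], tolerance: float) -> dict[str, float]:
    """Keep only models whose score is within `tolerance` of the best score."""
    best = max(scores.values())
    return {name: s for name, s in scores.items() if s >= best - tolerance}


def merge_similar(weights: dict[str, list[float]],
                  scores: dict[str, float],
                  tolerance: float) -> dict[str, list[float]]:
    """Greedily group models whose scores are within `tolerance` of the best
    remaining model, then replace each group by the element-wise mean of its
    parameters (parameter averaging is an assumption)."""
    merged: dict[str, list[float]] = {}
    remaining = sorted(scores, key=scores.get, reverse=True)
    while remaining:
        anchor = remaining.pop(0)
        group = [anchor] + [m for m in remaining
                            if abs(scores[m] - scores[anchor]) <= tolerance]
        remaining = [m for m in remaining if m not in group]
        merged["+".join(group)] = [mean(vals) for vals in
                                   zip(*(weights[m] for m in group))]
    return merged


scores = {"incremental": 0.97, "temp_1": 0.95, "temp_2": 0.80}
print(prune(scores, tolerance=0.05))  # {'incremental': 0.97, 'temp_1': 0.95}
```

In a real AE ensemble the "parameters" would be network weights rather than short lists; the lists here only keep the sketch self-contained.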
For further details, please refer to the main paper.
Ensure the following dependencies are installed before running Mateen. You can install them using the command below:
pip install -r requirements.txt
Contents of 'requirements.txt':
torch==2.0.1
numpy==1.25.0
pandas==1.5.3
scipy==1.10.1
scikit-learn==1.2.2
tqdm==4.65.0
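Before running Mateen, it can help to confirm that the installed versions match these pins. A small sketch using the standard library's importlib.metadata follows; check_pins() is a hypothetical helper, and it uses a plain string comparison, so local build tags such as "+cu118" on a torch install are reported as mismatches.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def check_pins(requirement_lines):
    """Map each 'name==version' pin to 'ok', 'missing', or the version found."""
    report = {}
    for line in requirement_lines:
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, pinned = (part.strip() for part in line.split("==", 1))
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        # Plain string comparison: '2.0.1+cu118' does not equal '2.0.1'.
        report[name] = "ok" if installed == pinned else f"found {installed}"
    return report


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        for pkg, status in check_pins(req.read_text().splitlines()).items():
            print(f"{pkg}: {status}")
```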
You can download the pre-trained models, the processed data, as well as the results CSV files from the following link:
The contents of the folder are as follows:
- Datasets.zip: Contains the processed data.
- Models.zip: Contains the pre-trained models.
- Results.zip: Contains the prediction results and probabilities across datasets.
After downloading, extract these files into the Mateen/ directory.
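If you prefer to script the extraction, a minimal sketch with Python's zipfile module follows. It assumes the three archives sit together in one download folder and that each archive already contains its top-level folder (e.g. Datasets/); extract_archives() is a hypothetical helper.

```python
import zipfile
from pathlib import Path

ARCHIVES = ("Datasets.zip", "Models.zip", "Results.zip")


def extract_archives(download_dir, mateen_dir):
    """Extract every archive found in download_dir into mateen_dir and
    return the names of the archives that were extracted."""
    download_dir, mateen_dir = Path(download_dir), Path(mateen_dir)
    extracted = []
    for name in ARCHIVES:
        archive = download_dir / name
        if not archive.exists():
            print(f"skipping {name}: not found in {download_dir}")
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(mateen_dir)  # creates mateen_dir if needed
        extracted.append(name)
    return extracted
```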
To utilize Mateen with our settings, please follow these steps to set up the required datasets and run the framework.
First, download the datasets as mentioned in the Models and Data section. Ensure that the files are organized in the following directories:
- Datasets/CICIDS2017/ for IDS2017
- Datasets/IDS2018/ for IDS2018
- Datasets/Kitsune/ for Kitsune and its variants
You can download and unzip the datasets directly into the main directory of Mateen (i.e., Mateen/).
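A quick check that the dataset folders ended up where Mateen expects them is sketched below. missing_dataset_dirs() is a hypothetical helper; it assumes you run it from (or point it at) the Mateen/ directory, and that the Kitsune folder also serves the mKitsune and rKitsune variants.

```python
from pathlib import Path

# Folders expected by this README and the dataset names they (presumably) serve.
EXPECTED_DIRS = {
    "Datasets/CICIDS2017": "IDS2017",
    "Datasets/IDS2018": "IDS2018",
    "Datasets/Kitsune": "Kitsune, mKitsune, rKitsune",
}


def missing_dataset_dirs(mateen_root="."):
    """Return the expected dataset folders that do not exist under mateen_root."""
    root = Path(mateen_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    for d in missing_dataset_dirs():
        print(f"missing {d} (needed for {EXPECTED_DIRS[d]})")
```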
To run Mateen, use the following command:
python Mateen.py
You can customize the execution using various command-line options:
Switch between datasets using the '--dataset_name' option.
Example:
python Mateen.py --dataset_name "IDS2017"
Options: "IDS2017", "IDS2018", "Kitsune", "mKitsune", and "rKitsune"
Set the window size using the '--window_size' option.
Example:
python Mateen.py --dataset_name "IDS2017" --window_size 50000
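The '--window_size' option suggests that the traffic stream is processed in fixed-size windows. A minimal sketch, assuming consecutive, non-overlapping windows (the README does not say whether windows overlap); windows() is a hypothetical helper, not part of Mateen.py.

```python
from itertools import islice


def windows(samples, window_size):
    """Yield consecutive, non-overlapping windows of window_size samples.
    The final window is shorter if the stream does not divide evenly."""
    it = iter(samples)
    while True:
        chunk = list(islice(it, window_size))
        if not chunk:
            return
        yield chunk


sizes = [len(w) for w in windows(range(120_000), 50_000)]
print(sizes)  # [50000, 50000, 20000]
```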
Options: 10000, 50000, and 100000
Set the shift threshold using the '--shift_threshold' option.
Example:
python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05
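The README says only that shifts are detected with statistical methods. As one possible illustration, the sketch below compares the reconstruction errors of a reference window and the current window with a two-sample Kolmogorov-Smirnov statistic and flags a shift when it exceeds the threshold. Both the choice of test and this reading of '--shift_threshold' are assumptions; ks_statistic() and shift_detected() are hypothetical helpers (scipy.stats.ks_2samp computes the same statistic along with a p-value).

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between the
    empirical CDFs of the two samples, checked at every observed value."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    for x in ref + cur:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def shift_detected(reference_errors, window_errors, shift_threshold):
    """Flag a shift when the KS statistic exceeds the threshold (assumed rule)."""
    return ks_statistic(reference_errors, window_errors) > shift_threshold


print(shift_detected([0.1, 0.2, 0.3], [0.5, 0.6, 0.7], 0.05))  # True
```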
Options: 0.05, 0.1, and 0.2
Set the minimum acceptable performance using the '--performance_thres' option.
Example:
python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99
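A minimal sketch of the update decision, assuming accuracy on the manually labeled subset is the measure compared against '--performance_thres' (the metric Mateen actually uses may differ); needs_update() is a hypothetical helper.

```python
def needs_update(y_true, y_pred, performance_thres):
    """Return True when accuracy on the labeled subset falls below the threshold."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need equally sized, non-empty label lists")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true) < performance_thres


labels = [0] * 100
preds = [0] * 98 + [1, 1]  # 98% accuracy
print(needs_update(labels, preds, 0.99), needs_update(labels, preds, 0.95))  # True False
```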
Options: 0.99, 0.95, 0.90, 0.85, and 0.8
Set the maximum acceptable ensemble size using the '--max_ensemble_length' option.
Example:
python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3
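One way to enforce '--max_ensemble_length' is sketched below. It assumes models are ranked by a score on the labeled subset and that the incremental model is always kept; both are assumptions, and cap_ensemble() is a hypothetical helper.

```python
def cap_ensemble(scores, max_ensemble_length, keep=("incremental",)):
    """Keep at most max_ensemble_length models: always retain the names in
    `keep`, then fill the remaining slots with the highest-scoring models."""
    kept = [m for m in keep if m in scores]
    others = sorted((m for m in scores if m not in kept),
                    key=scores.get, reverse=True)
    return kept + others[: max(0, max_ensemble_length - len(kept))]


scores = {"incremental": 0.90, "temp_1": 0.95, "temp_2": 0.80, "temp_3": 0.92}
print(cap_ensemble(scores, 3))  # ['incremental', 'temp_1', 'temp_3']
```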
Options: 3, 5, and 7
Set the selection rate for building the subset for manual labeling using the '--selection_budget' option.
Example:
python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3 --selection_budget 0.01
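Assuming the selection budget is a fraction of the window size, the number of samples sent for manual labeling per window works out as below. For example, a budget of 0.01 with a 50,000-sample window gives 500 samples. subset_size() is a hypothetical helper.

```python
def subset_size(window_size, selection_budget):
    """Samples to label per window, assuming the budget is a fraction of it."""
    return max(1, round(window_size * selection_budget))


for budget in (0.005, 0.01, 0.05, 0.1):
    print(budget, subset_size(50_000, budget))
```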
Options: 0.005, 0.01, 0.05, and 0.1
Choose the mini-batch size using the '--mini_batch_size' option.
Example:
python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3 --selection_budget 0.01 --mini_batch_size 1000
Options: 500, 1000, and 1500
Set the retention rate using the '--retention_rate' option.
Example:
python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3 --selection_budget 0.01 --mini_batch_size 1000 --retention_rate 0.3
Options: 0.3, 0.5, and 0.9
Use the '--lambda_0' option to adjust the weight assigned to uniqueness scores during the sample selection process.
Example:
python Mateen.py --dataset_name "IDS2017" --window_size 50000 --shift_threshold 0.05 --performance_thres 0.99 --max_ensemble_length 3 --selection_budget 0.01 --mini_batch_size 1000 --retention_rate 0.3 --lambda_0 0.1
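The README states only that lambda_0 weights uniqueness scores during sample selection. A minimal sketch, assuming each sample's selection score is its representativeness plus lambda_0 times its uniqueness and that the top-k samples are kept; the scoring formula and select_samples() are illustrative assumptions, not Mateen's implementation.

```python
def select_samples(representativeness, uniqueness, lambda_0, k):
    """Rank samples by representativeness + lambda_0 * uniqueness and
    return the indices of the top-k samples."""
    scores = [r + lambda_0 * u for r, u in zip(representativeness, uniqueness)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


rep = [0.9, 0.5, 0.4]
uniq = [0.0, 0.2, 1.0]
print(select_samples(rep, uniq, 0.1, 2))  # [0, 1]
print(select_samples(rep, uniq, 1.0, 2))  # [2, 0]
```

Under this scoring, a small lambda_0 lets representative samples dominate, while a larger value pushes unusual samples into the subset.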
Options: 0.1, 0.5, and 1.0
For further details about hyperparameter selection, please refer to Appendix C of the main paper.
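To run several settings in one go, the documented flags can be combined in a sweep. A minimal sketch that only builds and prints the command lines for a small, arbitrary subset of the option values listed above; GRID and build_commands() are hypothetical names.

```python
import itertools
import shlex

# Arbitrary subset of the documented option values; extend as needed.
GRID = {
    "--dataset_name": ["IDS2017", "IDS2018"],
    "--window_size": [10000, 50000],
    "--selection_budget": [0.01],
}


def build_commands(grid):
    """Yield one Mateen.py command (as an argument list) per combination."""
    flags = list(grid)
    for values in itertools.product(*(grid[f] for f in flags)):
        cmd = ["python", "Mateen.py"]
        for flag, value in zip(flags, values):
            cmd += [flag, str(value)]
        yield cmd


if __name__ == "__main__":
    for cmd in build_commands(GRID):
        print(shlex.join(cmd))
```

Replacing the print with subprocess.run(cmd, check=True) would execute each run in turn.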
@inproceedings{alotaibi24mateen,
title={Mateen: Adaptive Ensemble Learning for Network Anomaly Detection},
author={Alotaibi, Fahad and Maffeis, Sergio},
booktitle={the 27th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2024)},
year={2024},
organization={Association for Computing Machinery}
}
If you have any questions or need further assistance, please feel free to reach out to me at any time:
- Email: [email protected]
- Alternate Email: [email protected]