Intrusion-Detection-System-using-Machine-Learning-Methods
Intrusion detection systems (IDS) are an integral part of modern communication networks. Business environments require a high level of security to protect their private data from unauthorized access. Current intrusion detection systems go a step beyond conventional anti-virus software. Based on how they work, they fall into two main categories:
• Network Intrusion Detection Systems (NIDS): these systems continuously monitor network traffic and analyze packets for possible rule violations.
• Host-based Intrusion Detection Systems (HIDS): these systems monitor the operating system files of an end-user machine to detect malicious software that might tamper with its normal functioning.
The model block diagram shows the flow of the entire process. We start with the raw data in the train.csv file and use the pandas library to manage it efficiently. Every dataset needs preprocessing before a model is trained, so we used the scikit-learn library to normalize the data, remove outliers, and perform similar cleaning steps. We then used recursive feature elimination (RFE) wrapped around a random forest classifier to extract the most influential features. Next, we built two different models to compare their accuracy and results. Finally, we applied the more accurate model to the test.csv dataset to classify its connections.
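The preprocessing stage can be illustrated with a small example. This is a minimal sketch, not the authors' code: it assumes a tiny hypothetical DataFrame in place of train.csv, assumed column names (protocol_type, src_bytes, dst_bytes, class), the interquartile-range (IQR) rule for outlier removal, and min-max scaling for normalization. The helpers remove_outliers_iqr and preprocess are hypothetical names.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-in for a few rows of train.csv; column names are assumptions.
df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp", "icmp", "tcp", "tcp"],
    "src_bytes": [201, 239, 235, 219, 217, 50_000_000],
    "dst_bytes": [5450, 486, 1337, 1337, 2032, 0],
    "class": ["normal", "normal", "anomaly", "anomaly", "normal", "anomaly"],
})


def remove_outliers_iqr(frame, column, k=1.5):
    """Drop rows whose value in `column` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = frame[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]


def preprocess(frame):
    """Hypothetical cleaning step: outlier removal, one-hot encoding, min-max scaling."""
    frame = remove_outliers_iqr(frame, "src_bytes")
    # One-hot encode the categorical column so scikit-learn models can use it.
    frame = pd.get_dummies(frame, columns=["protocol_type"])
    numeric = ["src_bytes", "dst_bytes"]
    # Rescale each numeric column to the range [0, 1].
    frame[numeric] = MinMaxScaler().fit_transform(frame[numeric])
    return frame


clean = preprocess(df)
print(clean)
```

The IQR rule drops the 50,000,000-byte row here; other outlier rules (for example z-scores) would work just as well, and which one the project actually used is not stated.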
The dataset used for this project is available on Kaggle; it was generated by simulating a wide variety of intrusions in a military network environment. For each connection (row), 41 features were obtained from both normal and attack traffic. The class variable labels each connection as either normal or anomalous.
(Figure: features for intrusion in network.)
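Loading the data with pandas and separating the features from the label might look like the sketch below. It is an illustration under assumptions: a five-row inline CSV stands in for train.csv, only four of the 41 feature columns are shown, and the label column is assumed to be named class with the values normal and anomaly.

```python
import io

import pandas as pd

# Tiny inline stand-in for train.csv; column names and label values are assumptions.
SAMPLE_CSV = """duration,protocol_type,num_failed_logins,su_attempted,class
0,tcp,0,0,normal
0,udp,0,0,normal
2,tcp,3,1,anomaly
0,tcp,0,0,normal
5,tcp,4,0,anomaly
"""

train = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Separate the feature columns from the binary class label (1 = anomaly).
X = train.drop(columns=["class"])
y = (train["class"] == "anomaly").astype(int)

# Check the class balance before modelling.
print(train["class"].value_counts())
```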
Data obtained from https://www.kaggle.com/sampadab17/network-intrusion-detection
At this stage we analyzed two key features: the number of failed logins and whether a superuser login was attempted. This gives a rough idea of how attempts to gain superuser access relate to failed logins: during an attack, someone impersonating a legitimate user is likely to make multiple login attempts.
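A cross-tabulation is one simple way to examine this relationship. The sketch below uses eight hypothetical rows and assumes the columns are named num_failed_logins and su_attempted; it counts how often a superuser attempt co-occurs with each number of failed logins.

```python
import pandas as pd

# Hypothetical rows; the column names are assumptions.
df = pd.DataFrame({
    "num_failed_logins": [0, 0, 1, 3, 0, 4, 0, 2],
    "su_attempted":      [0, 0, 0, 1, 0, 1, 0, 1],
})

# Rows: number of failed logins; columns: whether su was attempted (0/1).
table = pd.crosstab(df["num_failed_logins"], df["su_attempted"])
print(table)

# Fraction of connections with at least one failed login that also attempted su.
failed = df[df["num_failed_logins"] > 0]
share = failed["su_attempted"].mean()
print(f"su attempted in {share:.0%} of connections with failed logins")
```

With this toy data, three of the four connections that had failed logins also attempted su; on the real dataset the proportion would of course differ.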
We used recursive feature elimination (RFE) with a random forest classifier (RFC) as its core estimator to select the top 10 features. The selected features, in descending order of priority, are:
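RFE with a random forest can be sketched in scikit-learn as follows. The data here is synthetic (make_classification with 41 features, matching the dataset's feature count), and the hyperparameters are assumptions. Because RFE assigns rank 1 to every feature it keeps, the sketch orders the 10 survivors by the final forest's feature_importances_; this is one possible way to obtain a priority order, not necessarily the one used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in for the preprocessed training matrix (41 features).
X, y = make_classification(n_samples=500, n_features=41, n_informative=6,
                           n_redundant=2, random_state=0)

# RFE refits the forest and drops the least important feature each round
# (step=1) until 10 features remain.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=10,
    step=1,
)
selector.fit(X, y)

selected = [i for i, keep in enumerate(selector.support_) if keep]

# estimator_ is the forest refitted on the selected columns only, so its
# importances line up with `selected`; sort them in descending order.
importances = selector.estimator_.feature_importances_
order = sorted(zip(selected, importances), key=lambda pair: pair[1], reverse=True)
print("selected features by importance:", [idx for idx, _ in order])
```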
Here we look at the correlation between the selected features.
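A correlation matrix over the selected features can reveal redundant pairs. In this sketch the three feature names are hypothetical, the data is randomly generated (one column is deliberately built to track another), and the 0.9 threshold is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Hypothetical selected features; dst_host_srv_count is constructed to follow count.
count = rng.integers(0, 500, size=n)
df = pd.DataFrame({
    "count": count,
    "dst_host_srv_count": count // 2 + rng.integers(0, 10, size=n),
    "src_bytes": rng.integers(0, 10_000, size=n),
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))

# Strongly correlated features carry largely redundant information,
# so list every pair above the threshold.
threshold = 0.9
pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]
print(pairs)
```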
We created two different models, a C4.5 decision tree and a random forest, to compare and evaluate their performance. We split the training data into training and test subsets to measure each model's results.
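Training and evaluating both models on a held-out split could look like this sketch. It assumes synthetic data with 10 features (standing in for the selected ones), a 70/30 stratified split, and that the first model is a C4.5-style decision tree. Note that scikit-learn's DecisionTreeClassifier implements CART, not C4.5; setting criterion="entropy" only approximates C4.5's information-gain splitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 10 selected features and the binary label.
X, y = make_classification(n_samples=1000, n_features=10, n_informative=6,
                           random_state=42)

# Hold out 30% of the labelled rows, preserving the class ratio.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    # CART with entropy splits, used here as an approximation of C4.5.
    "decision_tree": DecisionTreeClassifier(criterion="entropy", random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_val, model.predict(X_val))
    print(f"{name}: validation accuracy = {scores[name]:.3f}")
```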
We compared the two models and chose the better one for the final prediction of the unknown class labels in the test dataset.
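Selecting the better model and applying it to unlabelled data might be sketched as below. The data is synthetic: the last 200 generated rows play the role of test.csv with their labels discarded. As a swapped-in technique, the comparison uses 5-fold cross-validated accuracy instead of a single split; that choice, like the hyperparameters, is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic labelled rows (stand-in for train.csv) and unlabelled rows (stand-in for test.csv).
X, y = make_classification(n_samples=1200, n_features=10, n_informative=6,
                           random_state=7)
X_labelled, y_labelled = X[:1000], y[:1000]
X_unlabelled = X[1000:]  # labels deliberately discarded

candidates = {
    "decision_tree": DecisionTreeClassifier(criterion="entropy", random_state=7),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=7),
}

# Mean 5-fold accuracy gives a steadier comparison than one train/test split.
cv_scores = {
    name: cross_val_score(model, X_labelled, y_labelled, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}
best_name = max(cv_scores, key=cv_scores.get)

# Refit the winner on all labelled rows, then classify the unlabelled ones.
best = candidates[best_name].fit(X_labelled, y_labelled)
predictions = best.predict(X_unlabelled)
print(cv_scores, "->", best_name)
```

cross_val_score fits clones of each estimator, so the chosen model is refitted explicitly on the full labelled set before predicting.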
iampopg Matric Number: *******
Neetesh Bhati, Email: ********
Puneet Agarwal