Machine learning approaches to network intrusion detection for contemporary internet traffic

M.U. Ilyas; S.A. Alharbi

doi:10.1007/s00607-021-01050-5

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

All organizations, be they businesses, governments, infrastructure or utility providers, depend on the availability and functioning of their computers, computer networks and data centers for all or part of their operations. Network intrusion detection systems are the first line of defense protecting computing infrastructure from external attacks. In this study we develop five different machine learning classifiers to detect a number of attack types. We used the CSE-CIC-IDS2018 dataset, developed in a collaborative effort between the Communications Security Establishment and the Canadian Institute for Cybersecurity. It is an extensive, relatively recent network traffic trace dataset that captures multiple attacks. The previous major dataset used for developing network intrusion detection systems is the KDD Cup’99 dataset, now nearly 22 years old, which predates mobile computing, Web 2.0/3.0, social media, streaming video and the widespread use of SSL. These significant Internet trends of the last two decades demand a reevaluation and redevelopment of intrusion detectors. Prior studies that designed machine learning classifiers using the CSE-CIC-IDS2018 dataset use a large and rich set of features, at least one of which is not dataset-invariant. Almost none have examined whether it is appropriate to use all available features when a dataset contains only a few hundred samples of an attack class. The classifiers developed in this study rely on a justifiable number of features, and their performance is reviewed for stability and generalization by reporting not just the average performance over 10-fold cross-validation but also the degree of variation from one fold to the next.
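
The evaluation protocol described in the abstract, reporting the spread of scores across the ten folds rather than only their average, can be sketched as follows. This is a minimal Python illustration, not the authors' code: the stratified round-robin fold assignment, the hypothetical `fit_predict` callback, and the use of accuracy as the per-fold metric are all assumptions made for the example. Stratification is used because, with only a few hundred attack samples, an unstratified split can leave some folds with very few attacks.

```python
import random
import statistics
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical callback type (an assumption, not from the paper):
# trains on (train_X, train_y) and returns one prediction per row of test_X.
FitPredict = Callable[[List[Sequence[float]], List[Hashable], List[Sequence[float]]], List[Hashable]]


def stratified_folds(labels: Sequence[Hashable], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds so each class is spread as evenly as possible."""
    if not 2 <= k <= len(labels):
        raise ValueError("need 2 <= k <= number of samples")
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds: List[List[int]] = [[] for _ in range(k)]
    pos = 0  # running position keeps overall fold sizes within one of each other
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[pos % k].append(i)  # deal each class round-robin across the folds
            pos += 1
    return folds


def cross_validate(
    fit_predict: FitPredict,
    X: Sequence[Sequence[float]],
    y: Sequence[Hashable],
    k: int = 10,
    seed: int = 0,
) -> Tuple[float, float, List[float]]:
    """Return (mean, sample standard deviation, per-fold scores) of accuracy over k folds."""
    folds = stratified_folds(y, k, seed)
    scores: List[float] = []
    for f, test_idx in enumerate(folds):
        # Train on every fold except the held-out one.
        train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
        preds = fit_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in test_idx]
        )
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return statistics.mean(scores), statistics.stdev(scores), scores


if __name__ == "__main__":
    # Toy, made-up data: 80 benign (0) and 20 attack (1) samples with one feature.
    y = [0] * 80 + [1] * 20
    X = [[0.0]] * 80 + [[1.0]] * 20

    def majority(train_X, train_y, test_X):
        # Always predicts the most frequent training label.
        return [max(set(train_y), key=train_y.count)] * len(test_X)

    mean, sd, _ = cross_validate(majority, X, y)
    print(f"majority-class accuracy: {mean:.2f} +/- {sd:.2f}")
```

A large standard deviation relative to the mean signals that a classifier's score depends heavily on which samples land in the test fold. On imbalanced traffic, a per-class metric such as attack-class recall is usually more informative than accuracy: in the toy run above, the majority-class predictor scores 0.80 accuracy on every fold while detecting no attacks.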

Original language	English
Pages (from-to)	1061–1076
Journal	Computing
Volume	104
Issue number	5
Early online date	4 Jan 2022
DOIs	https://doi.org/10.1007/s00607-021-01050-5
Publication status	Published - May 2022

Keywords

CSE-CIC-IDS2018
Machine learning
Malware
Network intrusion detection system

ASJC Scopus subject areas

Theoretical Computer Science
Software
Numerical Analysis
Computer Science Applications
Computational Theory and Mathematics
Computational Mathematics

Access to Document

10.1007/s00607-021-01050-5 (Licence: None, all rights reserved)

Cite this

@article{d70f31f98e234995a2b988ab942ba3b2,

title = "Machine learning approaches to network intrusion detection for contemporary internet traffic",

abstract = "All organizations, be they businesses, governments, infrastructure or utility providers, depend on the availability and functioning of their computers, computer networks and data centers for all or part of their operations. Network intrusion detection systems are the first line of defense that protect computing infrastructure from external attacks. In this study we develop five different Machine Learning classifiers for a number of attacks. We used the CSE-CIC-IDS2018 dataset, developed in a collaborative effort between the Communications Security Establishment and the Canadian Institute for Cybersecurity. It is an extensive network traffic trace dataset that captures multiple attacks and has become available relatively recently. The previous major dataset used for the development of network intrusion detection systems is the KDD Cup{\textquoteright}99 dataset, now going on 22 years, which predates mobile computing, Web 2.0/3.0, social media, streaming video and widespread use of SSL. These significant Internet trends of the last two decades demand a reevaluation and redevelopment of intrusion detectors. Prior studies that designed Machine Learning classifiers using the CSE-CIC-IDS2018 dataset use a large and rich set of features, of which at least one is not dataset-invariant. Almost none have explored the appropriateness of using all available features with datasets containing only a few hundred attack class samples. The classifiers developed in this study rely on a justifiable number of features and their performance is reviewed for stability and generalization by reporting not just average performance over 10 fold cross-validation but also the degree of variation from one fold to the next.",

keywords = "CSE-CIC-IDS2018, Machine learning, Malware, Network intrusion detection system",

author = "M.U. Ilyas and S.A. Alharbi",

year = "2022",

month = may,

doi = "10.1007/s00607-021-01050-5",

language = "English",

volume = "104",

pages = "1061--1076",

journal = "Computing",

issn = "0010-485X",

publisher = "Springer",

number = "5",

}

TY  - JOUR
T1  - Machine learning approaches to network intrusion detection for contemporary internet traffic
AU  - Ilyas, M.U.
AU  - Alharbi, S.A.
PY  - 2022/5
Y1  - 2022/5
N2  - All organizations, be they businesses, governments, infrastructure or utility providers, depend on the availability and functioning of their computers, computer networks and data centers for all or part of their operations. Network intrusion detection systems are the first line of defense that protect computing infrastructure from external attacks. In this study we develop five different Machine Learning classifiers for a number of attacks. We used the CSE-CIC-IDS2018 dataset, developed in a collaborative effort between the Communications Security Establishment and the Canadian Institute for Cybersecurity. It is an extensive network traffic trace dataset that captures multiple attacks and has become available relatively recently. The previous major dataset used for the development of network intrusion detection systems is the KDD Cup’99 dataset, now going on 22 years, which predates mobile computing, Web 2.0/3.0, social media, streaming video and widespread use of SSL. These significant Internet trends of the last two decades demand a reevaluation and redevelopment of intrusion detectors. Prior studies that designed Machine Learning classifiers using the CSE-CIC-IDS2018 dataset use a large and rich set of features, of which at least one is not dataset-invariant. Almost none have explored the appropriateness of using all available features with datasets containing only a few hundred attack class samples. The classifiers developed in this study rely on a justifiable number of features and their performance is reviewed for stability and generalization by reporting not just average performance over 10 fold cross-validation but also the degree of variation from one fold to the next.
AB  - All organizations, be they businesses, governments, infrastructure or utility providers, depend on the availability and functioning of their computers, computer networks and data centers for all or part of their operations. Network intrusion detection systems are the first line of defense that protect computing infrastructure from external attacks. In this study we develop five different Machine Learning classifiers for a number of attacks. We used the CSE-CIC-IDS2018 dataset, developed in a collaborative effort between the Communications Security Establishment and the Canadian Institute for Cybersecurity. It is an extensive network traffic trace dataset that captures multiple attacks and has become available relatively recently. The previous major dataset used for the development of network intrusion detection systems is the KDD Cup’99 dataset, now going on 22 years, which predates mobile computing, Web 2.0/3.0, social media, streaming video and widespread use of SSL. These significant Internet trends of the last two decades demand a reevaluation and redevelopment of intrusion detectors. Prior studies that designed Machine Learning classifiers using the CSE-CIC-IDS2018 dataset use a large and rich set of features, of which at least one is not dataset-invariant. Almost none have explored the appropriateness of using all available features with datasets containing only a few hundred attack class samples. The classifiers developed in this study rely on a justifiable number of features and their performance is reviewed for stability and generalization by reporting not just average performance over 10 fold cross-validation but also the degree of variation from one fold to the next.
KW  - CSE-CIC-IDS2018
KW  - Machine learning
KW  - Malware
KW  - Network intrusion detection system
UR  - http://www.scopus.com/inward/record.url?eid=2-s2.0-85122292635&partnerID=MN8TOARS
U2  - 10.1007/s00607-021-01050-5
DO  - 10.1007/s00607-021-01050-5
M3  - Article
SN  - 0010-485X
VL  - 104
SP  - 1061
EP  - 1076
JO  - Computing
JF  - Computing
IS  - 5
ER  -
