Towards an understanding of the misclassification rates of machine learning-based malware detection systems

Nada Alruhaily; Behzad Bordbar; Tom Chothia

doi:10.5220/0006174301010112

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

A number of machine-learning-based malware detection systems have been suggested as replacements for signature-based detection methods. These systems have been shown to provide a high detection rate on previously unseen malware samples. However, in systems based on behavioural features, some new malware can go undetected because its behaviour differs from that of the training data. In this paper we analysed misclassified malware instances and investigated whether there were recognisable patterns across these misclassifications. Several questions needed to be answered: Can we claim that malware changes over time directly affect the detection rate? Do changes that affect classification occur at the level of families, where all instances that belong to certain families are hard to detect? Alternatively, can such changes be traced back to certain malware variants instead of families? Our experiments showed that these misclassifications are mostly due to behavioural changes at the level of variants across malware families, where variants did not behave as expected. This can be because the variants adopted anti-virtualisation techniques, because they required a specific argument to be activated, or because they were actually corrupted.
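The family-versus-variant question raised in the abstract can be illustrated with a minimal sketch. It is not the authors' method or data: the family names, variant names, and detection outcomes below are invented, and `miss_rates` is a hypothetical helper. It only shows how grouping the same misclassified samples at two levels can reveal whether misses are spread across a family or concentrated in particular variants.

```python
from collections import defaultdict

# Hypothetical records: (family, variant, detected). "detected" is True when
# the classifier correctly flagged the malware sample. All values are invented.
SAMPLES = [
    ("FamA", "FamA.a", True),
    ("FamA", "FamA.a", True),
    ("FamA", "FamA.b", False),
    ("FamA", "FamA.b", False),
    ("FamB", "FamB.a", True),
    ("FamB", "FamB.a", True),
    ("FamB", "FamB.c", False),
    ("FamB", "FamB.c", True),
]


def miss_rates(samples, level):
    """Return {group: fraction of samples missed}, grouped by 'family' or 'variant'."""
    index = {"family": 0, "variant": 1}[level]
    totals = defaultdict(int)
    misses = defaultdict(int)
    for record in samples:
        group = record[index]
        totals[group] += 1
        if not record[2]:  # sample was not detected
            misses[group] += 1
    return {group: misses[group] / totals[group] for group in totals}


family_rates = miss_rates(SAMPLES, "family")
variant_rates = miss_rates(SAMPLES, "variant")

for name, rate in sorted(variant_rates.items()):
    print(f"{name}: {rate:.2f}")
```

In this toy data no family is missed entirely, while one variant is missed every time, which is the kind of variant-level pattern the abstract reports.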

Original language	English
Title of host publication	ICISSP 2017 - Proceedings of the 3rd International Conference on Information Systems Security and Privacy
Editors	Paolo Mori, Steven Furnell, Olivier Camp
Publisher	SciTePress
Pages	101-112
Number of pages	12
Volume	2017-January
ISBN (Electronic)	9789897582097
DOIs	https://doi.org/10.5220/0006174301010112
Publication status	Published - 19 Feb 2017
Event	3rd International Conference on Information Systems Security and Privacy, ICISSP 2017 - Porto, Portugal
Duration	19 Feb 2017 → 21 Feb 2017

Conference

Conference	3rd International Conference on Information Systems Security and Privacy, ICISSP 2017
Country/Territory	Portugal
City	Porto
Period	19/02/17 → 21/02/17

Keywords

Behavioural Analysis
Classification Algorithms
Machine Learning
Malware

ASJC Scopus subject areas

Computer Networks and Communications
Information Systems
Safety, Risk, Reliability and Quality
Computer Science Applications

Access to Document

10.5220/0006174301010112
Licence: None: All rights reserved

Cite this

Alruhaily, N., Bordbar, B., & Chothia, T. (2017). Towards an understanding of the misclassification rates of machine learning-based malware detection systems. In P. Mori, S. Furnell, & O. Camp (Eds.), ICISSP 2017 - Proceedings of the 3rd International Conference on Information Systems Security and Privacy (Vol. 2017-January, pp. 101-112). SciTePress. https://doi.org/10.5220/0006174301010112

@inproceedings{e3d8ae28e78f4a75b1e34c3db9103989,
title = "Towards an understanding of the misclassification rates of machine learning-based malware detection systems",
keywords = "Behavioural Analysis, Classification Algorithms, Machine Learning, Malware",
author = "Nada Alruhaily and Behzad Bordbar and Tom Chothia",
year = "2017",
month = feb,
day = "19",
doi = "10.5220/0006174301010112",
language = "English",
volume = "2017-January",
pages = "101--112",
editor = "Mori, Paolo and Furnell, Steven and Camp, Olivier",
booktitle = "ICISSP 2017 - Proceedings of the 3rd International Conference on Information Systems Security and Privacy",
publisher = "SciTePress",
note = "3rd International Conference on Information Systems Security and Privacy, ICISSP 2017; Conference date: 19-02-2017 through 21-02-2017",
}

TY - GEN
T1 - Towards an understanding of the misclassification rates of machine learning-based malware detection systems
AU - Alruhaily, Nada
AU - Bordbar, Behzad
AU - Chothia, Tom
PY - 2017/2/19
Y1 - 2017/2/19
KW - Behavioural Analysis
KW - Classification Algorithms
KW - Machine Learning
KW - Malware
UR - http://www.scopus.com/inward/record.url?scp=85049078445&partnerID=8YFLogxK
U2 - 10.5220/0006174301010112
DO - 10.5220/0006174301010112
M3 - Conference contribution
AN - SCOPUS:85049078445
VL - 2017-January
SP - 101
EP - 112
BT - ICISSP 2017 - Proceedings of the 3rd International Conference on Information Systems Security and Privacy
A2 - Mori, Paolo
A2 - Furnell, Steven
A2 - Camp, Olivier
PB - SciTePress
T2 - 3rd International Conference on Information Systems Security and Privacy, ICISSP 2017
Y2 - 19 February 2017 through 21 February 2017
ER -
