A novel data stream learning approach to tackle one-sided label noise from verification latency

Liyan Song; Shuxian Li; Leandro Minku; Xin Yao

doi:10.1109/IJCNN55064.2022.9891911

Liyan Song, Shuxian Li, Leandro Minku^*, Xin Yao^*

^*Corresponding author for this work

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Many real-world data stream applications suffer from verification latency, where the labels of the training examples arrive with a delay. In binary classification problems, the labeling process frequently involves waiting for a pre-determined period of time to observe an event that assigns the example to a given class. Once this time passes, if no such labeling event has occurred, the example is labeled as belonging to the other class. For example, in software defect prediction, one may wait to see if a defect is associated with a software change implemented by a developer, producing a defect-inducing training example. If no defect is found during the waiting time, the training example is labeled as clean. Such verification latency inherently causes label noise associated with insufficient waiting time. For example, a defect may be observed only after the pre-defined waiting time has passed, resulting in a noisy example of the clean class. Due to the nature of the waiting time, such noise is frequently one-sided, meaning that it only occurs in examples of one of the classes. However, no existing work tackles label noise associated with verification latency. This paper proposes a novel data stream learning approach that estimates the confidence in the labels assigned to the training examples and uses this to improve predictive performance in problems with one-sided label noise. Our experiments with 14 real-world datasets from the domain of software defect prediction demonstrate the effectiveness of the proposed approach compared to existing ones.
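
The waiting-time labeling process described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's method or code. It only shows, under assumed names (`Change`, `commit_day`, `defect_found_day`, `observed_label`, `true_label`, `waiting_days`, all hypothetical), why a fixed waiting time can mislabel defect-inducing changes as clean but never the reverse.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """A software change; field names are hypothetical, for illustration only."""
    commit_day: int                  # day the change was committed
    defect_found_day: Optional[int]  # day a defect was linked to it, None if never


def observed_label(change: Change, today: int, waiting_days: int) -> Optional[str]:
    """Label available on day `today` under a fixed waiting time."""
    found = change.defect_found_day
    if found is not None and found <= today:
        return "defect-inducing"  # the labeling event has been observed
    if today - change.commit_day >= waiting_days:
        return "clean"            # waiting time elapsed with no defect seen so far
    return None                   # still waiting: no label yet


def true_label(change: Change) -> str:
    """Ground truth, known only with unlimited hindsight."""
    return "clean" if change.defect_found_day is None else "defect-inducing"


# A defect that surfaces on day 120, after an assumed 90-day waiting time.
late_defect = Change(commit_day=0, defect_found_day=120)
label_at_90 = observed_label(late_defect, today=90, waiting_days=90)
label_at_120 = observed_label(late_defect, today=120, waiting_days=90)
```

In this sketch `label_at_90` is a noisy clean label, because the true class is defect-inducing, while `label_at_120` is correct. A defect-inducing label is only ever assigned after a defect has actually been observed, so the noise affects the clean class alone.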

Original language	English
Title of host publication	2022 International Joint Conference on Neural Networks (IJCNN)
Place of Publication	Piscataway, NJ
Publisher	IEEE
Pages	1-8
Number of pages	8
ISBN (Electronic)	978-1-7281-8671-9
ISBN (Print)	978-1-6654-9526-4
DOIs	https://doi.org/10.1109/IJCNN55064.2022.9891911
Publication status	E-pub ahead of print - 30 Sept 2022
Event	2022 International Joint Conference on Neural Networks (IJCNN), University of Padua, Padua, Italy. Duration: 18 Jul 2022 → 23 Jul 2022

Publication series

Name	International Joint Conference on Neural Networks (IJCNN)
Publisher	IEEE
ISSN (Print)	2161-4393
ISSN (Electronic)	2161-4407

Conference

Conference	2022 International Joint Conference on Neural Networks (IJCNN)
Abbreviated title	IJCNN 2022
Country/Territory	Italy
City	Padua
Period	18/07/22 → 23/07/22

Access to Document

10.1109/IJCNN55064.2022.9891911
Licence: None: All rights reserved

SongL2022novel
This is the Accepted Author Manuscript (AAM) of an article published by IEEE, L. Song, S. Li, L. L. Minku and X. Yao, "A Novel Data Stream Learning Approach to Tackle One-Sided Label Noise From Verification Latency," 2022 International Joint Conference on Neural Networks (IJCNN), 2022, pp. 1-8, doi: 10.1109/IJCNN55064.2022.9891911. © 20XX IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Accepted author manuscript, 531 KB
Licence: Other (please specify with Rights Statement)

Cite this

@inproceedings{2d06adfb61fa4bf68184c491f928a958,
title = "A novel data stream learning approach to tackle one-sided label noise from verification latency",
abstract = "Many real-world data stream applications suffer from verification latency, where the labels of the training examples arrive with a delay. In binary classification problems, the labeling process frequently involves waiting for a pre-determined period of time to observe an event that assigns the example to a given class. Once this time passes, if no such labeling event has occurred, the example is labeled as belonging to the other class. For example, in software defect prediction, one may wait to see if a defect is associated with a software change implemented by a developer, producing a defect-inducing training example. If no defect is found during the waiting time, the training example is labeled as clean. Such verification latency inherently causes label noise associated with insufficient waiting time. For example, a defect may be observed only after the pre-defined waiting time has passed, resulting in a noisy example of the clean class. Due to the nature of the waiting time, such noise is frequently one-sided, meaning that it only occurs in examples of one of the classes. However, no existing work tackles label noise associated with verification latency. This paper proposes a novel data stream learning approach that estimates the confidence in the labels assigned to the training examples and uses this to improve predictive performance in problems with one-sided label noise. Our experiments with 14 real-world datasets from the domain of software defect prediction demonstrate the effectiveness of the proposed approach compared to existing ones.",
author = "Liyan Song and Shuxian Li and Leandro Minku and Xin Yao",
year = "2022",
month = sep,
day = "30",
doi = "10.1109/IJCNN55064.2022.9891911",
language = "English",
isbn = "978-1-6654-9526-4",
series = "International Joint Conference on Neural Networks (IJCNN)",
publisher = "IEEE",
pages = "1--8",
booktitle = "2022 International Joint Conference on Neural Networks (IJCNN)",
note = "2022 International Joint Conference on Neural Networks (IJCNN), IJCNN 2022 ; Conference date: 18-07-2022 Through 23-07-2022",
}

Song, L, Li, S, Minku, L & Yao, X 2022, A novel data stream learning approach to tackle one-sided label noise from verification latency. in 2022 International Joint Conference on Neural Networks (IJCNN). International Joint Conference on Neural Networks (IJCNN), IEEE, Piscataway, NJ, pp. 1-8, 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18/07/22. https://doi.org/10.1109/IJCNN55064.2022.9891911

A novel data stream learning approach to tackle one-sided label noise from verification latency. / Song, Liyan; Li, Shuxian; Minku, Leandro et al.
2022 International Joint Conference on Neural Networks (IJCNN). Piscataway, NJ: IEEE, 2022. p. 1-8 (International Joint Conference on Neural Networks (IJCNN)).

TY  - GEN
T1  - A novel data stream learning approach to tackle one-sided label noise from verification latency
AU  - Song, Liyan
AU  - Li, Shuxian
AU  - Minku, Leandro
AU  - Yao, Xin
PY  - 2022/9/30
Y1  - 2022/9/30
N2  - Many real-world data stream applications suffer from verification latency, where the labels of the training examples arrive with a delay. In binary classification problems, the labeling process frequently involves waiting for a pre-determined period of time to observe an event that assigns the example to a given class. Once this time passes, if no such labeling event has occurred, the example is labeled as belonging to the other class. For example, in software defect prediction, one may wait to see if a defect is associated with a software change implemented by a developer, producing a defect-inducing training example. If no defect is found during the waiting time, the training example is labeled as clean. Such verification latency inherently causes label noise associated with insufficient waiting time. For example, a defect may be observed only after the pre-defined waiting time has passed, resulting in a noisy example of the clean class. Due to the nature of the waiting time, such noise is frequently one-sided, meaning that it only occurs in examples of one of the classes. However, no existing work tackles label noise associated with verification latency. This paper proposes a novel data stream learning approach that estimates the confidence in the labels assigned to the training examples and uses this to improve predictive performance in problems with one-sided label noise. Our experiments with 14 real-world datasets from the domain of software defect prediction demonstrate the effectiveness of the proposed approach compared to existing ones.
AB  - Many real-world data stream applications suffer from verification latency, where the labels of the training examples arrive with a delay. In binary classification problems, the labeling process frequently involves waiting for a pre-determined period of time to observe an event that assigns the example to a given class. Once this time passes, if no such labeling event has occurred, the example is labeled as belonging to the other class. For example, in software defect prediction, one may wait to see if a defect is associated with a software change implemented by a developer, producing a defect-inducing training example. If no defect is found during the waiting time, the training example is labeled as clean. Such verification latency inherently causes label noise associated with insufficient waiting time. For example, a defect may be observed only after the pre-defined waiting time has passed, resulting in a noisy example of the clean class. Due to the nature of the waiting time, such noise is frequently one-sided, meaning that it only occurs in examples of one of the classes. However, no existing work tackles label noise associated with verification latency. This paper proposes a novel data stream learning approach that estimates the confidence in the labels assigned to the training examples and uses this to improve predictive performance in problems with one-sided label noise. Our experiments with 14 real-world datasets from the domain of software defect prediction demonstrate the effectiveness of the proposed approach compared to existing ones.
UR  - https://ieeexplore.ieee.org/xpl/conhome/1000500/all-proceedings
UR  - https://wcci2022.org/
U2  - 10.1109/IJCNN55064.2022.9891911
DO  - 10.1109/IJCNN55064.2022.9891911
M3  - Conference contribution
SN  - 978-1-6654-9526-4
T3  - International Joint Conference on Neural Networks (IJCNN)
SP  - 1
EP  - 8
BT  - 2022 International Joint Conference on Neural Networks (IJCNN)
PB  - IEEE
CY  - Piscataway, NJ
T2  - 2022 International Joint Conference on Neural Networks (IJCNN)
Y2  - 18 July 2022 through 23 July 2022
ER  -
