A Novel Data Stream Learning Approach to Tackle One-Sided Label Noise From Verification Latency

Liyan Song, Shuxian Li, Leandro Minku, Xin Yao

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Many real-world data stream applications suffer
from verification latency, where the labels of the training examples
arrive with a delay. In binary classification problems, the
labeling process frequently involves waiting for a pre-determined
period of time to observe an event that assigns the example to
a given class. If no such labeling event has occurred once this time
passes, the example is labeled as belonging to the other class.
For example, in software defect prediction, one may wait to see
if a defect is associated with a software change implemented by a
developer, producing a defect-inducing training example. If no
defect is found during the waiting time, the training example
is labeled as clean. Such verification latency inherently causes
label noise associated with insufficient waiting time. For example,
a defect may be observed only after the pre-defined waiting time
has passed, resulting in a noisy example of the clean class. Due
to the nature of the waiting time, such noise is frequently one-sided,
meaning that it affects examples of only one of the
classes. However, no existing work tackles label noise associated
with verification latency. This paper proposes a novel data stream
learning approach that estimates the confidence in the labels
assigned to the training examples and uses this estimate to improve
predictive performance in problems with one-sided label noise.
Our experiments with 14 real-world datasets from the domain
of software defect prediction demonstrate the effectiveness of the
proposed approach compared to existing ones.
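
The waiting-time labeling rule described above, and the one-sided noise it produces, can be illustrated with a minimal Python sketch. It is not the approach proposed in the paper: the names `Change`, `observed_label`, and `true_label`, the numeric time units, and the choice to revise a label once a defect is eventually observed are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

CLEAN, DEFECT = 0, 1  # hypothetical class encoding


@dataclass
class Change:
    """A software change (hypothetical record, not from the paper)."""
    commit_time: float
    # Time at which a defect was linked to the change; None if never.
    defect_time: Optional[float] = None


def observed_label(change: Change, now: float, waiting_time: float) -> Optional[int]:
    """Label available at time `now`, or None if the example is still unlabeled."""
    # The labeling event has been observed: the example is defect-inducing.
    if change.defect_time is not None and change.defect_time <= now:
        return DEFECT
    # The waiting time has passed without a labeling event: label as clean.
    if now - change.commit_time >= waiting_time:
        return CLEAN
    # Still inside the waiting period: no label yet (verification latency).
    return None


def true_label(change: Change) -> int:
    """Ground truth, which is only known in hindsight."""
    return DEFECT if change.defect_time is not None else CLEAN


# A defect found after the waiting time yields a noisy clean example.
late = Change(commit_time=0.0, defect_time=120.0)
print(observed_label(late, now=50.0, waiting_time=90.0))   # still unlabeled
print(observed_label(late, now=90.0, waiting_time=90.0))   # labeled clean
print(true_label(late))                                    # actually defect-inducing
```

In this sketch a defect label is only ever assigned after a defect has actually been observed, so only examples labeled as clean can be noisy, which is the one-sided property the abstract refers to.
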
Original language: English
Title of host publication: International Joint Conference on Neural Networks (IJCNN 2022)
Publication status: Accepted/In press - 26 Apr 2022


