TY - GEN
T1 - On Reporting Performance and Accuracy Bugs for Deep Learning Frameworks
T2 - The International Conference on Evaluation and Assessment in Software Engineering 2022
AU - Long, Guoming
AU - Chen, Tao
PY - 2022/6/13
Y1 - 2022/6/13
N2 - The tremendous success of Deep Learning (DL) has significantly boosted the number of open-sourced DL frameworks hosted on GitHub. Among others, performance and accuracy bugs are critical factors that affect the reputation of these DL frameworks; therefore, understanding the practice of discovering and investigating them for DL is important. In this paper, we conduct an exploratory study on the nature of reporting performance and accuracy bugs for DL frameworks, aiming to improve our knowledge on this topic. Our study covers the 10 most popular open-sourced DL frameworks on GitHub (e.g., TensorFlow, Keras, and PyTorch), based on which we sample 664 representative performance and accuracy bug reports out of a total population of 22,522. Through systematic analysis, we found that: (1) low speed is the primary reason that a performance bug related report is submitted, but we see no consistent pattern for accuracy related ones; (2) most of the reports are about issues encountered in the training stage; (3) only a small proportion of the reports provide insufficient information to investigate; (4) the majority of the performance and accuracy bug reports (from 69% to 100%) are not related to the actual bug or regarded as unclassified; (5) around 50% of the performance and accuracy bug reports, which indeed reveal bugs, are not resolved by direct patches. Deriving from the above, we discuss a set of actionable implications for researchers, maintainers, and report submitters. To promote open science, the labeled dataset has been made publicly available at https://zenodo.org/record/6371676 .
KW - Empirical software engineering
KW - mining software repositories
KW - artificial intelligence
KW - performance engineering
M3 - Conference contribution
T3 - EASE: Evaluation and Assessment in Software Engineering
SP - 90
EP - 99
BT - EASE '22
A2 - Staron, Miroslaw
A2 - Berger, Christian
A2 - Simmonds, Jocelyn
A2 - Prikladnicki, Rafael
PB - Association for Computing Machinery (ACM)
Y2 - 13 June 2022 through 15 June 2022
ER -