TY - GEN
T1 - Measuring data completeness for microbial genomics database
AU - Emran, Nurul A.
AU - Embury, Suzanne
AU - Missier, Paolo
AU - Isa, Mohd Noor Mat
AU - Muda, Azah Kamilah
PY - 2013
Y1 - 2013
AB - Poor-quality data, such as data with missing values (or records), cause negative consequences in many application domains. An important aspect of data quality is completeness. One problem in data completeness is that of missing individuals in data sets, where the individuals are the real-world entities whose information is recorded. So far, however, completeness studies have included little discussion of how missing individuals are assessed. In this paper, we propose the notion of population-based completeness (PBC), which addresses the missing-individuals problem, with the aim of investigating what is required to measure PBC and identifying what is needed to support PBC measurements in practice. The paper explores the need for PBC in microbial genomics, using real sample data sets retrieved from a microbial database called Comprehensive Microbial Resources (CMR).
KW - completeness measurement
KW - data completeness
KW - population-based completeness (PBC)
UR - http://www.scopus.com/inward/record.url?scp=84874616141&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-36546-1_20
DO - 10.1007/978-3-642-36546-1_20
M3 - Conference contribution
AN - SCOPUS:84874616141
SN - 9783642365454
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 186
EP - 195
BT - Intelligent Information and Database Systems - 5th Asian Conference, ACIIDS 2013, Proceedings
T2 - 5th Asian Conference on Intelligent Information and Database Systems, ACIIDS 2013
Y2 - 18 March 2013 through 20 March 2013
ER -