Improving data quality in practice: A case study in the Italian public administration

P. Missier*, G. Lalk, V. Verykios, F. Grillo, T. Lorusso, P. Angeletti

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

Abstract

Assessing and improving the quality of data stored in information systems are both important and difficult tasks. For an increasing number of companies that rely on information as one of their most important assets, enforcing high data quality levels represents a strategic investment aimed at preserving the value of those assets. For a public administration or a government, good data quality translates into good service and good relationships with citizens. Achieving high quality standards, however, is a major task because of the variety of ways in which errors can be introduced into a system and the difficulty of correcting them systematically. Problems with data quality tend to fall into two categories. The first is inconsistency among systems, such as format, syntax and semantic inconsistencies. The second is inconsistency with reality, as exemplified by missing, obsolete and incorrect data values and by outliers. In this paper, we describe a real-life case study on assessing and improving the quality of data in the Italian Public Administration. The domain of study is taxpayers' data maintained by the Italian Ministry of Finances. In this context, we provide the Administration with a quantitative assessment of specific problems such as record duplication, address mismatch and address obsolescence; we suggest a set of guidelines for setting precise quality improvement goals; and we illustrate analysis techniques for achieving those goals. Our guidelines emphasize the importance of data flow analysis and of defining measurable quality indicators. The quality indicators that we propose are generic and can be used to describe a variety of data quality problems, and thus offer a possible reference framework for practitioners. Finally, we investigate ways to partially automate the analysis of the causes of poor data quality.

Original language: English
Pages (from-to): 135-160
Number of pages: 26
Journal: Distributed and Parallel Databases
Volume: 13
Issue number: 2
Publication status: Published - Mar 2003

Keywords

  • Automatic data error detection
  • Data quality management
  • E-government
  • Record matching

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Information Systems and Management
