Understanding data quality: ensuring data quality by design in the rail industry

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Railways worldwide are increasingly looking to the integration of their data resources, coupled with advanced analytics, to enhance traffic management, to provide new insights on the health of infrastructure assets, to provide soft linkages to other transport modes, and ultimately to enable them to better serve their customers. As in many industrial sectors, over the past decade the rail industry has been investing heavily in sensing technologies that record every aspect of the operation of the railway network. However, as any data scientist knows, it does not matter how good an algorithm is: if you put rubbish in, you get rubbish out. As the traditional industry model of working with data only within the system that collected it becomes increasingly fragile, the industry is discovering that it knows less than it thought about the data it is gathering. When coupled with legacy data resources of unknown accuracy, such as design diagrams for assets that in many cases are decades old, the rail industry now faces a crisis in which its data may become essentially worthless because its quality is poorly understood. This paper reports the findings of the first phase of a three-phase systematic review of the literature on how data quality can be managed and evaluated in the rail domain. It begins by discussing why data quality matters in a rail context, before going on to define quality and to introduce and expand the concept of a data quality schema.

Details

Original language: English
Title of host publication: Proceedings of the 2017 IEEE International Conference on Big Data (BIGDATA)
Publication status: Published - 15 Jan 2018
Event: 2017 IEEE International Conference on Big Data - Westin Copley Plaza Hotel, 10 Huntington Avenue, Boston, MA 02116, Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017

Conference

Conference: 2017 IEEE International Conference on Big Data
Abbreviated title: BigData 2017
Country: United States
City: Boston
Period: 11/12/17 to 14/12/17

Keywords

  • Data quality, Rail, Quality by design, Data quality schema