Understanding data quality: ensuring data quality by design in the rail industry

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

3 Citations (Scopus)
1122 Downloads (Pure)


Railways worldwide are increasingly looking to integrate their data resources and combine them with advanced analytics to enhance traffic management, to provide new insights into the health of infrastructure assets, to provide soft linkages to other transport modes, and ultimately to better serve their customers. As in many industrial sectors, the rail industry has invested heavily over the past decade in sensing technologies that record every aspect of the operation of the railway network. However, as any data scientist knows, it does not matter how good an algorithm is: if you put rubbish in, you get rubbish out. As the traditional industry model of working with data only within the system that collected it becomes increasingly fragile, the industry is discovering that it knows less than it thought about the data it is gathering. When coupled with legacy data resources of unknown accuracy, such as design diagrams for assets that in many cases are decades old, this leaves the rail industry facing a crisis in which its data may become essentially worthless because its quality is poorly understood. This paper reports the findings of the first phase of a three-phase systematic review of the literature on how data quality can be managed and evaluated in the rail domain. It begins by discussing why data quality matters in a rail context, before going on to define quality and to introduce and expand the concept of a data quality schema.
Original language: English
Title of host publication: Proceedings of the 2017 IEEE International Conference on Big Data (BIGDATA)
Publisher: IEEE Xplore
ISBN (Electronic): 9781538627150
Publication status: Published - 15 Jan 2018
Event: 2017 IEEE International Conference on Big Data - Westin Copley Plaza Hotel, 10 Huntington Avenue, Boston, MA 02116, Boston, United States
Duration: 11 Dec 2017 to 14 Dec 2017


Conference: 2017 IEEE International Conference on Big Data
Abbreviated title: BigData 2017
Country/Territory: United States


  • Data quality
  • Rail
  • Quality by design
  • Data quality schema

