Data wrangling for big data: challenges and opportunities

Tim Furche, Georg Gottlob, Leonid Libkin, Giorgio Orsi, Norman W Paton

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Data wrangling is the process by which the data required by an application
is identified, extracted, cleaned and integrated, to yield a
data set that is suitable for exploration and analysis. Although there
are widely used Extract, Transform and Load (ETL) techniques and
platforms, they often require manual work from technical and domain
experts at different stages of the process. When confronted
with the 4 V’s of big data (volume, velocity, variety and veracity),
manual intervention may make ETL prohibitively expensive. This
paper argues that providing cost-effective, highly automated approaches
to data wrangling involves significant research challenges,
requiring fundamental changes to established areas such as data extraction,
integration and cleaning, and to the ways in which these
areas are brought together. Specifically, the paper discusses the importance
of comprehensive support for context awareness within
data wrangling, and the need for adaptive, pay-as-you-go solutions
that automatically tune the wrangling process to the requirements
and resources of the specific application.
Original language: English
Title of host publication: Advances in Database Technology - EDBT 2016
Subtitle of host publication: 19th International Conference on Extending Database Technology Bordeaux, France, March 15–18, 2016 Proceedings
Editors: Evaggelia Pitoura, Sofian Maabout, Georgia Koutrika, Amelie Marian, Letizia Tanca, Ioana Manolescu, Kostas Stefanidis
Publisher: OpenProceedings
Pages: 473-478
ISBN (Electronic): 978-3-89318-070-7
Publication status: Published - 9 Mar 2016
Event: 19th International Conference on Extending Database Technology (EDBT) - Bordeaux, France
Duration: 15 Mar 2016 – 16 Mar 2016

Publication series

Name: Advances in Database Technology
ISSN (Electronic): 2367-2005

Conference

Conference: 19th International Conference on Extending Database Technology (EDBT)
Country/Territory: France
City: Bordeaux
Period: 15/03/16 – 16/03/16
