Abstract
Data wrangling is the process by which the data required by an application
is identified, extracted, cleaned and integrated, to yield a
data set that is suitable for exploration and analysis. Although there
are widely used Extract, Transform and Load (ETL) techniques and
platforms, they often require manual work from technical and domain
experts at different stages of the process. When confronted
with the 4 V’s of big data (volume, velocity, variety and veracity),
manual intervention may make ETL prohibitively expensive. This
paper argues that providing cost-effective, highly automated approaches
to data wrangling involves significant research challenges,
requiring fundamental changes to established areas such as data extraction,
integration and cleaning, and to the ways in which these
areas are brought together. Specifically, the paper discusses the importance
of comprehensive support for context awareness within
data wrangling, and the need for adaptive, pay-as-you-go solutions
that automatically tune the wrangling process to the requirements
and resources of the specific application.
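As an illustration only, the stages named in the definition above (extracting the data an application needs, cleaning it, and integrating it) can be sketched as a toy pipeline over in-memory records. The function names, field names and cleaning rules below are hypothetical and are not taken from the paper. Each step is hard-coded here, which is the kind of manual configuration that the paper argues should instead be automated and tuned to the application's context.

```python
from typing import Dict, List

Record = Dict[str, object]  # hypothetical record type: field name -> value


def extract(rows: List[Record], fields: List[str]) -> List[Record]:
    """Keep only the fields the application needs (missing fields become None)."""
    return [{f: r.get(f) for f in fields} for r in rows]


def clean(rows: List[Record], key: str) -> List[Record]:
    """Trim and lower-case string values, then drop rows whose key is empty."""
    out: List[Record] = []
    for r in rows:
        norm = {k: v.strip().lower() if isinstance(v, str) else v for k, v in r.items()}
        if norm.get(key) in (None, ""):
            continue  # a record without a usable key cannot be integrated
        out.append(norm)
    return out


def integrate(left: List[Record], right: List[Record], key: str) -> List[Record]:
    """Inner-join two cleaned sources on a shared key (right-hand values win on clashes)."""
    index = {r[key]: r for r in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]


if __name__ == "__main__":
    # Two hypothetical sources with inconsistent formatting and a missing key.
    shops = [{"name": " Acme ", "city": "Leeds", "noise": 1}, {"name": "", "city": "York"}]
    prices = [{"name": "ACME", "price": 10}]
    shops_c = clean(extract(shops, ["name", "city"]), "name")
    prices_c = clean(extract(prices, ["name", "price"]), "name")
    print(integrate(shops_c, prices_c, "name"))
```

Running the script merges the two sources into a single record for `acme`, with the row that lacked a name discarded during cleaning.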
Original language | English |
---|---|
Title of host publication | Advances in Database Technology — EDBT 2016 |
Subtitle of host publication | 19th International Conference on Extending Database Technology Bordeaux, France, March 15–18, 2016 Proceedings |
Editors | Evaggelia Pitoura, Sofian Maabout, Georgia Koutrika, Amelie Marian, Letizia Tanca, Ioana Manolescu, Kostas Stefanidis |
Publisher | OpenProceedings |
Pages | 473-478 |
ISBN (Electronic) | 978-3-89318-070-7 |
Publication status | Published - 9 Mar 2016 |
Event | 19th International Conference on Extending Database Technology (EDBT), Bordeaux, France; duration: 15 Mar 2016 → 16 Mar 2016 |
Publication series
Name | Advances in Database Technology |
---|---|
ISSN (Electronic) | 2367-2005 |
Conference
Conference | 19th International Conference on Extending Database Technology (EDBT) |
---|---|
Country/Territory | France |
City | Bordeaux |
Period | 15/03/16 → 16/03/16 |