Extracting PROV provenance traces from Wikipedia history pages

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Wikipedia history pages contain provenance metadata describing the revision history of each Wikipedia article. We have developed a simple extractor which, starting from a user-specified article page, crawls the graph of its associated history pages and encodes the essential elements of those pages according to the PROV data model. The crawling is performed on the live pages using the Wikipedia REST interface. The resulting PROV provenance graphs are stored in a graph database (Neo4J), where they can be queried using the Cypher graph query language (proprietary to Neo4J) or traversed programmatically using the Neo4J Java Traversal API.
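
As a concrete illustration of the encoding the abstract describes, the sketch below shows how a single revision step from a history page might be stored as PROV in an embedded Neo4J database: each article revision becomes a prov:Entity node, the editing user a prov:Agent, the edit itself a prov:Activity, and successive revisions are linked by a wasRevisionOf relationship. This is a minimal sketch, not the paper's code: the node labels, relationship types, property names, and the use of the Neo4J 3.x embedded Java API are all illustrative assumptions.

    import org.neo4j.graphdb.*;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class ProvSketch {
        // Hypothetical PROV relationship types; the paper's actual schema may differ.
        enum ProvRel implements RelationshipType {
            WAS_REVISION_OF,     // prov:wasRevisionOf between successive revisions
            WAS_ATTRIBUTED_TO,   // prov:wasAttributedTo from revision to editor
            WAS_GENERATED_BY,    // prov:wasGeneratedBy from revision to edit activity
            WAS_ASSOCIATED_WITH  // prov:wasAssociatedWith from activity to editor
        }

        public static void main(String[] args) {
            GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase(new java.io.File("prov-db"));
            try (Transaction tx = db.beginTx()) {
                // prov:Entity nodes for two successive revisions of one article
                Node oldRev = db.createNode(Label.label("Entity"));
                oldRev.setProperty("revid", 1001L);
                Node newRev = db.createNode(Label.label("Entity"));
                newRev.setProperty("revid", 1002L);

                // prov:Agent node for the editor named in the history page
                Node editor = db.createNode(Label.label("Agent"));
                editor.setProperty("user", "ExampleUser");

                // prov:Activity node for the edit that produced the new revision
                Node edit = db.createNode(Label.label("Activity"));
                edit.setProperty("timestamp", "2013-03-18T10:00:00Z");

                newRev.createRelationshipTo(oldRev, ProvRel.WAS_REVISION_OF);
                newRev.createRelationshipTo(editor, ProvRel.WAS_ATTRIBUTED_TO);
                newRev.createRelationshipTo(edit, ProvRel.WAS_GENERATED_BY);
                edit.createRelationshipTo(editor, ProvRel.WAS_ASSOCIATED_WITH);
                tx.success();
            }
            db.shutdown();
        }
    }

Once stored, such a graph can be queried in Cypher (again assuming the hypothetical schema above), for instance MATCH (r:Entity)-[:WAS_ATTRIBUTED_TO]->(a:Agent {user: 'ExampleUser'}) RETURN r.revid to list every revision attributed to one editor, or traversed from a chosen revision node using the Java Traversal API.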

Original language: English
Title of host publication: Joint EDBT/ICDT 2013 Workshops - Proceedings
Pages: 327-330
Number of pages: 4
DOIs
Publication status: Published - 2013
Event: Joint EDBT/ICDT 2013 Workshops - Genoa, Italy
Duration: 18 Mar 2013 - 22 Mar 2013

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: Joint EDBT/ICDT 2013 Workshops
Country/Territory: Italy
City: Genoa
Period: 18/03/13 - 22/03/13

Keywords

  • Design
  • E [Data]: General
  • H.2.3 [Database Management]: Languages - Data description languages

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
