TY - GEN
T1 - Extracting PROV provenance traces from Wikipedia history pages
AU - Missier, Paolo
AU - Chen, Ziyu
PY - 2013
Y1 - 2013
N2 - Wikipedia History pages contain provenance metadata that describes the history of revisions of each Wikipedia article. We have developed a simple extractor which, starting from a user-specified article page, crawls through the graph of its associated history pages, and encodes the essential elements of those pages according to the PROV data model. The crawling is performed on the live pages using the Wikipedia REST interface. The resulting PROV provenance graphs are stored in a graph database (Neo4J), where they can be queried using the Cypher graph query language (proprietary to Neo4J), or traversed programmatically using the Neo4J Java Traversal API.
KW - Design
KW - E [Data]: General
KW - H.2.3 [Database Management]: Languages - Data description languages
UR - https://www.scopus.com/pages/publications/84876789082
U2 - 10.1145/2457317.2457375
DO - 10.1145/2457317.2457375
M3 - Conference contribution
AN - SCOPUS:84876789082
SN - 9781450315999
T3 - ACM International Conference Proceeding Series
SP - 327
EP - 330
BT - Joint EDBT/ICDT 2013 Workshops - Proceedings
T2 - Joint EDBT/ICDT 2013 Workshops
Y2 - 18 March 2013 through 22 March 2013
ER -