Distant Supervision is frequently used for Relation Extraction, and its evaluation is typically carried out with Precision-Recall curves and/or Precision at N. However, such evaluation is challenging because the instance labels result from an automatic process that can introduce noise: the labels are not necessarily correct, which affects both the learning process and the interpretation of the evaluation results. This research therefore aims to show that the performance measured with these evaluation strategies varies significantly when the correct labels are used during evaluation, calling the current interpretation of these measures into question. To this end, we manually labeled a subset of a well-known data set and evaluated the performance of six traditional Distant Supervision approaches. We demonstrate quantitative differences in the evaluation scores when considering manually versus automatically labeled subsets; consequently, the performance ranking of the Distant Supervision methods differs between the two labelings.
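As a minimal sketch, the Precision at N measure, and the way noisy automatic labels can distort it, can be illustrated as follows (the confidence scores and labels below are purely hypothetical, not taken from the evaluated data set):

```python
def precision_at_n(scores, labels, n):
    """Precision@N: fraction of the N highest-confidence
    predictions whose label marks a correct relation."""
    # Rank prediction indices by confidence, descending, and keep the top N.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return sum(labels[i] for i in top) / n

# Hypothetical model confidences and binary labels (1 = correct relation).
scores = [0.95, 0.90, 0.70, 0.60, 0.40]
noisy_labels = [1, 1, 1, 0, 0]   # labels from the automatic (distant) process
manual_labels = [1, 0, 1, 1, 0]  # labels after manual verification

print(precision_at_n(scores, noisy_labels, 3))   # 1.0 under noisy labels
print(precision_at_n(scores, manual_labels, 3))  # ~0.67 under manual labels
```

The same ranking of predictions thus yields different Precision at N scores depending on which labeling is trusted, which is the effect the abstract describes.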
Bibliographical note
Funding Information:
The present work was supported by CONACyT/México (scholarship 937210 and grant CB-2015-01-257383). Additionally, the authors thank CONACYT for the computer resources provided through the INAOE Supercomputing Laboratory’s Deep Learning Platform for Language Technologies.
© 2022 Sociedad Española para el Procesamiento del Lenguaje Natural. All rights reserved.
Keywords
- Distant Supervision evaluation
- Precision at N
- Precision-Recall curves
- Relation Extraction
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language
- Computer Science Applications