Dependency parsing of learner English

Yan Huang; Akira Murakami; Theodora Alexopoulou; Anna Korhonen

doi:10.1075/ijcl.16080.hua

English, Drama and Creative Studies

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

Current syntactic annotation of large-scale learner corpora relies mainly on “standard parsers” trained on native-language data. Understanding how these parsers perform on learner data is important for downstream research and applications involving learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are threefold. Firstly, we demonstrate that the common practice of constructing a gold standard by manually correcting the pre-annotation of a single parser can introduce bias into parser evaluation. We propose an alternative annotation method that can control for this annotation bias. Secondly, we quantify the influence of learner errors on parsing errors and identify the learner errors that affect parsing most. Finally, we compare the performance of the parsers on learner English and native English. Our results have useful implications for how to select a standard parser for learner English.
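The abstract does not name the accuracy metric used. Dependency parsers are commonly scored with the unlabelled attachment score (UAS, share of tokens assigned the correct head) and the labelled attachment score (LAS, correct head and correct relation label). The sketch below is a minimal illustration of these scores, assuming gold and predicted parses share the same tokenisation and are given as (head, label) pairs per token; the function name `attachment_scores` and the data layout are hypothetical, not the authors' evaluation code.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: one (head, label) pair per token.
# Heads are 1-based token indices; 0 marks the root (CoNLL-style convention).
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[List[Arc]],
                      pred: Sequence[List[Arc]]) -> Tuple[float, float]:
    """Return (UAS, LAS) as proportions over all tokens in the corpus."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted corpora differ in sentence count")
    total = head_ok = both_ok = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("tokenisation mismatch between gold and prediction")
        for (g_head, g_label), (p_head, p_label) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                head_ok += 1          # counts towards UAS
                if g_label == p_label:
                    both_ok += 1      # counts towards LAS
    if total == 0:
        raise ValueError("empty corpus")
    return head_ok / total, both_ok / total


if __name__ == "__main__":
    # "She likes tea": the parser finds every head but mislabels "tea".
    gold = [[(2, "nsubj"), (0, "root"), (2, "dobj")]]
    pred = [[(2, "nsubj"), (0, "root"), (2, "nmod")]]
    print(attachment_scores(gold, pred))  # → (1.0, 0.6666666666666666)
```

This sketch scores every token; evaluations often exclude punctuation, which would require filtering tokens before counting. Because the score is computed against a gold standard, any bias in how that gold standard was built carries directly into the reported UAS and LAS.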

Original language	English
Pages (from-to)	28-57
Number of pages	27
Journal	International Journal of Corpus Linguistics
Volume	23
Issue number	1
Early online date	31 May 2018
DOIs	https://doi.org/10.1075/ijcl.16080.hua
Publication status	Published - 2018

Keywords

dependency parsing
parsing accuracy
learner error
learner English
annotation bias

Access to Document

10.1075/ijcl.16080.hua

https://www.repository.cam.ac.uk/handle/1810/275806 (Licence: none, all rights reserved)

Cite this

@article{4914311948b34de095185a080ffa353f,
  title = "Dependency parsing of learner English",
  abstract = "Current syntactic annotation of large-scale learner corpora mainly resorts to “standard parsers” trained on native language data. Understanding how these parsers perform on learner data is important for downstream research and application related to learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are three-fold. Firstly, we demonstrate that the common practice of constructing a gold standard – by manually correcting the pre-annotation of a single parser – can introduce bias to parser evaluation. We propose an alternative annotation method which can control for the annotation bias. Secondly, we quantify the influence of learner errors on parsing errors, and identify the learner errors that impact on parsing most. Finally, we compare the performance of the parsers on learner English and native English. Our results have useful implications on how to select a standard parser for learner English.",
  keywords = "dependency parsing, parsing accuracy, learner error, learner English, annotation bias",
  author = "Yan Huang and Akira Murakami and Theodora Alexopoulou and Anna Korhonen",
  year = "2018",
  doi = "10.1075/ijcl.16080.hua",
  language = "English",
  volume = "23",
  pages = "28--57",
  journal = "International Journal of Corpus Linguistics",
  issn = "1384-6655",
  publisher = "John Benjamins Publishing",
  number = "1",
}

TY  - JOUR
T1  - Dependency parsing of learner English
AU  - Huang, Yan
AU  - Murakami, Akira
AU  - Alexopoulou, Theodora
AU  - Korhonen, Anna
PY  - 2018
Y1  - 2018
N2  - Current syntactic annotation of large-scale learner corpora mainly resorts to “standard parsers” trained on native language data. Understanding how these parsers perform on learner data is important for downstream research and application related to learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are three-fold. Firstly, we demonstrate that the common practice of constructing a gold standard – by manually correcting the pre-annotation of a single parser – can introduce bias to parser evaluation. We propose an alternative annotation method which can control for the annotation bias. Secondly, we quantify the influence of learner errors on parsing errors, and identify the learner errors that impact on parsing most. Finally, we compare the performance of the parsers on learner English and native English. Our results have useful implications on how to select a standard parser for learner English.
AB  - Current syntactic annotation of large-scale learner corpora mainly resorts to “standard parsers” trained on native language data. Understanding how these parsers perform on learner data is important for downstream research and application related to learner language. This study evaluates the performance of multiple standard probabilistic parsers on learner English. Our contributions are three-fold. Firstly, we demonstrate that the common practice of constructing a gold standard – by manually correcting the pre-annotation of a single parser – can introduce bias to parser evaluation. We propose an alternative annotation method which can control for the annotation bias. Secondly, we quantify the influence of learner errors on parsing errors, and identify the learner errors that impact on parsing most. Finally, we compare the performance of the parsers on learner English and native English. Our results have useful implications on how to select a standard parser for learner English.
KW  - dependency parsing
KW  - parsing accuracy
KW  - learner error
KW  - learner English
KW  - annotation bias
U2  - 10.1075/ijcl.16080.hua
DO  - 10.1075/ijcl.16080.hua
M3  - Article
SN  - 1384-6655
VL  - 23
SP  - 28
EP  - 57
JO  - International Journal of Corpus Linguistics
JF  - International Journal of Corpus Linguistics
IS  - 1
ER  - 