Can vectors read minds better than experts? Comparing data augmentation strategies for the automated scoring of children's mindreading ability

Venelin Kovatchev; Phillip Smith; Mark Lee; Rory Devine

Research output: Working paper/Preprint › Preprint

Abstract

In this paper we implement and compare 7 different data augmentation strategies for the task of automatic scoring of children's ability to understand others' thoughts, feelings, and desires (or "mindreading"). We recruit in-domain experts to re-annotate augmented samples and determine to what extent each strategy preserves the original rating. We also carry out multiple experiments to measure how much each augmentation strategy improves the performance of automatic scoring systems. To determine the capabilities of automatic systems to generalize to unseen data, we create UK-MIND-20 - a new corpus of children's performance on tests of mindreading, consisting of 10,320 question-answer pairs. We obtain a new state-of-the-art performance on the MIND-CA corpus, improving macro-F1-score by 6 points. Results indicate that both the number of training examples and the quality of the augmentation strategies affect the performance of the systems. The task-specific augmentations generally outperform task-agnostic augmentations. Automatic augmentations based on vectors (GloVe, FastText) perform the worst. We find that systems trained on MIND-CA generalize well to UK-MIND-20. We demonstrate that data augmentation strategies also improve the performance on unseen data.

Original language	English
Publisher	arXiv
Pages	1-11
Number of pages	11
Publication status	Published - 3 Jun 2021

Bibliographical note

The paper will be presented at ACL-IJCNLP 2021

Keywords

cs.CL
cs.LG

Access to Document

KovatchevV2021vectors (Other version, 199 KB). Licence: Creative Commons Attribution (CC BY)

https://arxiv.org/abs/2106.01635v1 (Licence: Creative Commons Attribution (CC BY))

Cite this

@techreport{77b60307ef4e4531a4562ea240952155,
  title = "Can vectors read minds better than experts? Comparing data augmentation strategies for the automated scoring of children's mindreading ability",
  abstract = "In this paper we implement and compare 7 different data augmentation strategies for the task of automatic scoring of children's ability to understand others' thoughts, feelings, and desires (or {"}mindreading{"}). We recruit in-domain experts to re-annotate augmented samples and determine to what extent each strategy preserves the original rating. We also carry out multiple experiments to measure how much each augmentation strategy improves the performance of automatic scoring systems. To determine the capabilities of automatic systems to generalize to unseen data, we create UK-MIND-20 - a new corpus of children's performance on tests of mindreading, consisting of 10,320 question-answer pairs. We obtain a new state-of-the-art performance on the MIND-CA corpus, improving macro-F1-score by 6 points. Results indicate that both the number of training examples and the quality of the augmentation strategies affect the performance of the systems. The task-specific augmentations generally outperform task-agnostic augmentations. Automatic augmentations based on vectors (GloVe, FastText) perform the worst. We find that systems trained on MIND-CA generalize well to UK-MIND-20. We demonstrate that data augmentation strategies also improve the performance on unseen data.",
  keywords = "cs.CL, cs.LG",
  author = "Venelin Kovatchev and Phillip Smith and Mark Lee and Rory Devine",
  note = "The paper will be presented at ACL-IJCNLP 2021",
  year = "2021",
  month = jun,
  day = "3",
  language = "English",
  pages = "1--11",
  publisher = "arXiv",
  type = "WorkingPaper",
  institution = "arXiv",
}

TY - UNPB

T1 - Can vectors read minds better than experts?

T2 - Comparing data augmentation strategies for the automated scoring of children's mindreading ability

AU - Kovatchev, Venelin

AU - Smith, Phillip

AU - Lee, Mark

AU - Devine, Rory

N1 - The paper will be presented at ACL-IJCNLP 2021

PY - 2021/6/3

Y1 - 2021/6/3

N2 - In this paper we implement and compare 7 different data augmentation strategies for the task of automatic scoring of children's ability to understand others' thoughts, feelings, and desires (or "mindreading"). We recruit in-domain experts to re-annotate augmented samples and determine to what extent each strategy preserves the original rating. We also carry out multiple experiments to measure how much each augmentation strategy improves the performance of automatic scoring systems. To determine the capabilities of automatic systems to generalize to unseen data, we create UK-MIND-20 - a new corpus of children's performance on tests of mindreading, consisting of 10,320 question-answer pairs. We obtain a new state-of-the-art performance on the MIND-CA corpus, improving macro-F1-score by 6 points. Results indicate that both the number of training examples and the quality of the augmentation strategies affect the performance of the systems. The task-specific augmentations generally outperform task-agnostic augmentations. Automatic augmentations based on vectors (GloVe, FastText) perform the worst. We find that systems trained on MIND-CA generalize well to UK-MIND-20. We demonstrate that data augmentation strategies also improve the performance on unseen data.

KW - cs.CL

KW - cs.LG

M3 - Preprint

SP - 1

EP - 11

BT - Can vectors read minds better than experts?

PB - arXiv

ER -