Can vectors read minds better than experts? Comparing data augmentation strategies for the automated scoring of children's mindreading ability

Venelin Kovatchev; Phillip Smith; Mark Lee; Rory Devine

doi:10.18653/v1/2021.acl-long.96

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we implement and compare 7 different data augmentation strategies for the task of automatic scoring of children's ability to understand others' thoughts, feelings, and desires (or “mindreading”). We recruit in-domain experts to re-annotate augmented samples and determine to what extent each strategy preserves the original rating. We also carry out multiple experiments to measure how much each augmentation strategy improves the performance of automatic scoring systems. To determine how well automatic systems generalize to unseen data, we create UK-MIND-20, a new corpus of children's performance on tests of mindreading, consisting of 10,320 question-answer pairs. We obtain a new state-of-the-art performance on the MIND-CA corpus, improving the macro-F1 score by 6 points. Results indicate that both the number of training examples and the quality of the augmentation strategies affect the performance of the systems. The task-specific augmentations generally outperform task-agnostic augmentations. Automatic augmentations based on vectors (GloVe, FastText) perform the worst. We find that systems trained on MIND-CA generalize well to UK-MIND-20. We demonstrate that data augmentation strategies also improve performance on unseen data.
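The abstract reports its headline result as a macro-averaged F1 score. As a minimal sketch of how that metric is usually computed (per-class F1, then an unweighted mean over classes), the Python below may help. The function name `macro_f1`, the three labels 0/1/2, and the toy predictions are illustrative assumptions, not the paper's evaluation code or data.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over every label seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for class g
        else:
            fp[p] += 1          # predicted class p, but it was wrong
            fn[g] += 1          # true class g was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical ratings on an assumed three-point scale (not real corpus data).
gold = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(gold, pred), 3))  # 0.656
```

Because every class contributes equally to the mean, a system cannot reach a high macro-F1 by doing well only on the most frequent rating. Read this way, a gain of "6 points" would correspond to 0.06 on the 0 to 1 scale used in the sketch.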

Original language	English
Title of host publication	Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Editors	Chengqing Zong, Fei Xia, Wenjie Li, Roberto Navigli
Publisher	Association for Computational Linguistics, ACL
Pages	1196-1206
Number of pages	11
Volume	1
ISBN (Print)	9781954085527
DOIs	https://doi.org/10.18653/v1/2021.acl-long.96
Publication status	Published - 6 Aug 2021
Event	Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021 - Virtual, Online
Duration	1 Aug 2021 → 6 Aug 2021

Publication series

Name	International Joint Conference on Natural Language Processing (IJCNLP)

Conference

Conference	Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021
City	Virtual, Online
Period	1/08/21 → 6/08/21

Bibliographical note

Funding Information:
We would like to thank Imogen Grumley Traynor and Irene Luque Aguilera for the annotation and the creation of the lists of synonyms and phrases. We also want to thank the anonymous reviewers for their feedback and suggestions. This project was funded by a grant from Wellcome to R. T. Devine.

Publisher Copyright:
© 2021 Association for Computational Linguistics

ASJC Scopus subject areas

Software
Computational Theory and Mathematics
Linguistics and Language
Language and Linguistics

Access to Document

10.18653/v1/2021.acl-long.96, Licence: Creative Commons: Attribution (CC BY)

KovatchevV2021vectors, Final published version, 218 KB, Licence: Creative Commons: Attribution (CC BY)

https://arxiv.org/abs/2106.01635, Licence: Creative Commons: Attribution (CC BY)

3 Article
1 Chapter (peer-reviewed)

Machine learning and deep learning systems for automated measurement of ‘advanced’ theory of mind: reliability and validity in children and adolescents
Devine, R. T., Kovatchev, V., Grumley Traynor, I., Smith, P. & Lee, M., 28 Feb 2023, In: Psychological Assessment. 35, 2, p. 165-177 13 p.
Research output: Contribution to journal › Article › peer-review

Open Access
“What is on your mind?” Automated Scoring of Mindreading in Childhood and Early Adolescence
Kovatchev, V., Smith, P., Lee, M., Grumley Traynor, I., Luque Aguilera, I. & Devine, R. T., 2 Dec 2020, Proceedings of the 28th International Conference on Computational Linguistics. Scott, D., Bel, N. & Zong, C. (eds.). Barcelona, Spain: International Committee on Computational Linguistics, p. 6217-6228 12 p. (Proceedings of the International Conference on Computational Linguistics).
Research output: Chapter in Book/Report/Conference proceeding › Chapter (peer-reviewed) › peer-review

Open Access
Measuring theory of mind across middle childhood: Reliability and validity of the Silent Films and Strange Stories tasks
Devine, R. & Hughes, C., Sept 2016, In: Journal of Experimental Child Psychology. 149, p. 23-40 18 p.
Research output: Contribution to journal › Article › peer-review
38 Citations (Scopus)
Silent Films and Strange Stories: Theory of Mind, Gender, and Social Experiences in Middle Childhood
Devine, R. & Hughes, C., 8 May 2013, In: Child Development. 84, 3, p. 989-1003 15 p.
Research output: Contribution to journal › Article › peer-review
135 Citations (Scopus)

1 Finished

Mindreading, Psychopathological and Social Adjustment in Middle Childhood
Devine, R.
THE WELLCOME TRUST
1/08/19 → 31/01/22
Project: Research

2 Guest lecture or Invited talk

Children's Understanding of Others' Minds: Implications for Social Adjustment and Mental Health
Rory Devine (Invited speaker)
25 Jan 2023
Activity: Academic and Industrial events › Guest lecture or Invited talk
Theory of Mind in Middle Childhood and Adolescence
Rory Devine (Invited speaker)
11 May 2021
Activity: Academic and Industrial events › Guest lecture or Invited talk

Cite this

Kovatchev, V., Smith, P., Lee, M., & Devine, R. (2021). Can vectors read minds better than experts? Comparing data augmentation strategies for the automated scoring of children's mindreading ability. In C. Zong, F. Xia, W. Li, & R. Navigli (Eds.), Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (Vol. 1, pp. 1196-1206). (International Joint Conference on Natural Language Processing (IJCNLP)). Association for Computational Linguistics, ACL. https://doi.org/10.18653/v1/2021.acl-long.96
