
A corpus-based developmental investigation of lexical richness and syntactic complexity in children's written stories

  • Ya-Ling Hsiao
  • Nicola J. Dawson
  • Nilanjana Banerji
  • Kate Nation

Research output: Working paper/Preprint (Preprint)

Abstract

We analysed narrative writing development using a large corpus of short stories (N > 100,000) written by children aged 5-13 in the UK. Linguistic complexity was assessed using both lexical (N = 30) and syntactic (N = 14) measures. Most measures were associated with age, with older children’s writing showing greater lexical density, sophistication, and diversity than writing by younger children. Older children also used longer sentences, and longer T-units and clauses, and the density of smaller syntactic units inside larger units was also higher for older children. Principal Component Analysis identified a number of dimensions associated with complexity, with the first two dimensions capturing nearly 50% of variance. Lexical diversity was mainly represented on the first dimension and syntactic complexity on the second. Across all age categories, there was wide variation in syntactic complexity, suggesting that the ability to construct complex sentences may be less uniform across children of different ages compared to being able to use a diverse set of lexical items. We discuss the utility of analysing children’s writing development using a computational, data-driven approach.
Original language: English
Publisher: SSRN
Number of pages: 42
Publication status: Published - 23 Aug 2022
