TY - UNPB
T1 - A corpus-based developmental investigation of lexical richness and syntactic complexity in children's written stories
AU - Hsiao, Ya-Ling
AU - Dawson, Nicola J.
AU - Banerji, Nilanjana
AU - Nation, Kate
PY - 2022/8/23
Y1 - 2022/8/23
N2 - We analysed narrative writing development using a large corpus of short stories (N > 100,000) written by children aged 5-13 in the UK. Linguistic complexity was assessed using both lexical (N = 30) and syntactic (N = 14) measures. Most measures were associated with age, with older children’s writing showing greater lexical density, sophistication, and diversity than writing by younger children. Older children also used longer sentences, T-units, and clauses, and their writing showed a higher density of smaller syntactic units within larger units. Principal Component Analysis identified a number of dimensions associated with complexity, with the first two dimensions capturing nearly 50% of the variance. Lexical diversity was mainly represented on the first dimension and syntactic complexity on the second. Across all age categories, there was wide variation in syntactic complexity, suggesting that the ability to construct complex sentences may be less uniform across children of different ages than the ability to use a diverse set of lexical items. We discuss the utility of analysing children’s writing development using a computational, data-driven approach.
AB - We analysed narrative writing development using a large corpus of short stories (N > 100,000) written by children aged 5-13 in the UK. Linguistic complexity was assessed using both lexical (N = 30) and syntactic (N = 14) measures. Most measures were associated with age, with older children’s writing showing greater lexical density, sophistication, and diversity than writing by younger children. Older children also used longer sentences, T-units, and clauses, and their writing showed a higher density of smaller syntactic units within larger units. Principal Component Analysis identified a number of dimensions associated with complexity, with the first two dimensions capturing nearly 50% of the variance. Lexical diversity was mainly represented on the first dimension and syntactic complexity on the second. Across all age categories, there was wide variation in syntactic complexity, suggesting that the ability to construct complex sentences may be less uniform across children of different ages than the ability to use a diverse set of lexical items. We discuss the utility of analysing children’s writing development using a computational, data-driven approach.
M3 - Preprint
BT - A corpus-based developmental investigation of lexical richness and syntactic complexity in children's written stories
PB - SSRN
ER -