Variation-based distance and similarity modeling: a case study in world Englishes

Benedikt Szmrecsanyi, Jason Grafmiller, Laura Rosseel

Research output: Contribution to journal › Article › peer-review

Abstract

Inspired by work in comparative sociolinguistics and quantitative dialectometry, we sketch a corpus-based method (Variation-Based Distance & Similarity Modeling—VADIS for short) to rigorously quantify the similarity between varieties and dialects as a function of the correspondence of the ways in which language users choose between different ways of saying the same thing. To showcase the potential of the method, we present a case study that investigates three syntactic alternations in some nine international varieties of English. Key findings include that (a) probabilistic grammars are remarkably similar and stable across the varieties under study; (b) in many cases we see a cluster of “native” (a.k.a. Inner Circle) varieties, such as British English, whereas “non-native” (a.k.a. Outer Circle) varieties, such as Indian English, are a more heterogeneous group; and (c) coherence across alternations is less than perfect.
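The abstract does not spell out how the "correspondence" between varieties is measured, so the following is only a minimal sketch of one way to turn it into a distance. It assumes that each variety has already been summarized by a vector of regression coefficients for a single alternation (one coefficient per constraint on the choice) and compares those vectors by Euclidean distance. The coefficient values, the constraint count, the label `VarX`, and all function names are hypothetical; nothing here reproduces the authors' actual procedure or results.

```python
from itertools import combinations
from math import sqrt

# Hypothetical, made-up coefficients: one vector per variety, one entry per
# constraint on a single alternation. These are NOT values from the study.
coefficients = {
    "BrE": [0.9, -0.4, 0.3],   # British English (illustrative numbers)
    "IndE": [0.2, -0.1, 0.7],  # Indian English (illustrative numbers)
    "VarX": [0.8, -0.5, 0.3],  # placeholder third variety
}


def euclidean(u, v):
    """Euclidean distance between two equal-length coefficient vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def pairwise_distances(vectors):
    """Distance for every unordered pair of varieties, keyed by sorted name pair."""
    return {
        (a, b): euclidean(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }


def to_similarity(distances):
    """Rescale so distance 0 maps to 1 and the largest observed distance maps to 0."""
    largest = max(distances.values())
    if largest == 0:
        return {pair: 1.0 for pair in distances}
    return {pair: 1.0 - d / largest for pair, d in distances.items()}


distances = pairwise_distances(coefficients)
similarities = to_similarity(distances)

# Print pairs from most to least similar.
for pair, score in sorted(similarities.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

With these invented numbers, `BrE` and `VarX` come out as the closest pair simply because their vectors were chosen to be close. Rescaling by the largest observed distance is one arbitrary normalization among several and makes scores comparable only within a single set of varieties.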
Original language: English
Article number: 23
Number of pages: 14
Journal: Frontiers in Artificial Intelligence
Volume: 2
Publication status: Published, 5 Nov 2019

Keywords

  • Comparative sociolinguistics
  • VADIS
  • Probabilistic grammar
  • Dialectometry
  • Variationist linguistics
