Variation-based distance and similarity modeling: a case study in world Englishes

Benedikt Szmrecsanyi; Jason Grafmiller; Laura Rosseel

doi:10.3389/frai.2019.00023

English Language and Linguistics

Research output: Contribution to journal › Article › peer-review

Abstract

Inspired by work in comparative sociolinguistics and quantitative dialectometry, we sketch a corpus-based method (Variation-Based Distance & Similarity Modeling—VADIS for short) to rigorously quantify the similarity between varieties and dialects as a function of the correspondence of the ways in which language users choose between different ways of saying the same thing. To showcase the potential of the method, we present a case study that investigates three syntactic alternations in some nine international varieties of English. Key findings include that (a) probabilistic grammars are remarkably similar and stable across the varieties under study; (b) in many cases we see a cluster of “native” (a.k.a. Inner Circle) varieties, such as British English, whereas “non-native” (a.k.a. Outer Circle) varieties, such as Indian English, are a more heterogeneous group; and (c) coherence across alternations is less than perfect.

Original language	English
Article number	23
Number of pages	14
Journal	Frontiers in Artificial Intelligence
Volume	2
DOIs	https://doi.org/10.3389/frai.2019.00023
Publication status	Published - 5 Nov 2019

Keywords

Comparative sociolinguistics
VADIS
Probabilistic grammar
Dialectometry
Variationist linguistics

Access to Document

10.3389/frai.2019.00023
Licence: Creative Commons: Attribution (CC BY)

Szmrecsanyib2019variat
© 2019 Szmrecsanyi, Grafmiller and Rosseel.
Final published version, 608 KB
Licence: Creative Commons: Attribution (CC BY)

https://www.frontiersin.org/articles/10.3389/frai.2019.00023/full
Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{87635722c0e944e7ae6040c6c4958171,

title = "Variation-based distance and similarity modeling: a case study in world Englishes",

abstract = "Inspired by work in comparative sociolinguistics and quantitative dialectometry, we sketch a corpus-based method (Variation-Based Distance \& Similarity Modeling—VADIS for short) to rigorously quantify the similarity between varieties and dialects as a function of the correspondence of the ways in which language users choose between different ways of saying the same thing. To showcase the potential of the method, we present a case study that investigates three syntactic alternations in some nine international varieties of English. Key findings include that (a) probabilistic grammars are remarkably similar and stable across the varieties under study; (b) in many cases we see a cluster of “native” (a.k.a. Inner Circle) varieties, such as British English, whereas “non-native” (a.k.a. Outer Circle) varieties, such as Indian English, are a more heterogeneous group; and (c) coherence across alternations is less than perfect.",

keywords = "Comparative sociolinguistics, VADIS, Probabilistic grammar, Dialectometry, Variationist linguistics",

author = "Benedikt Szmrecsanyi and Jason Grafmiller and Laura Rosseel",

year = "2019",

month = nov,

day = "5",

doi = "10.3389/frai.2019.00023",

language = "English",

volume = "2",

journal = "Frontiers in Artificial Intelligence",

issn = "2624-8212",

publisher = "Frontiers Media S.A.",

}

TY - JOUR

T1 - Variation-based distance and similarity modeling

T2 - a case study in world Englishes

AU - Szmrecsanyi, Benedikt

AU - Grafmiller, Jason

AU - Rosseel, Laura

PY - 2019/11/5

Y1 - 2019/11/5

N2 - Inspired by work in comparative sociolinguistics and quantitative dialectometry, we sketch a corpus-based method (Variation-Based Distance & Similarity Modeling—VADIS for short) to rigorously quantify the similarity between varieties and dialects as a function of the correspondence of the ways in which language users choose between different ways of saying the same thing. To showcase the potential of the method, we present a case study that investigates three syntactic alternations in some nine international varieties of English. Key findings include that (a) probabilistic grammars are remarkably similar and stable across the varieties under study; (b) in many cases we see a cluster of “native” (a.k.a. Inner Circle) varieties, such as British English, whereas “non-native” (a.k.a. Outer Circle) varieties, such as Indian English, are a more heterogeneous group; and (c) coherence across alternations is less than perfect.

AB - Inspired by work in comparative sociolinguistics and quantitative dialectometry, we sketch a corpus-based method (Variation-Based Distance & Similarity Modeling—VADIS for short) to rigorously quantify the similarity between varieties and dialects as a function of the correspondence of the ways in which language users choose between different ways of saying the same thing. To showcase the potential of the method, we present a case study that investigates three syntactic alternations in some nine international varieties of English. Key findings include that (a) probabilistic grammars are remarkably similar and stable across the varieties under study; (b) in many cases we see a cluster of “native” (a.k.a. Inner Circle) varieties, such as British English, whereas “non-native” (a.k.a. Outer Circle) varieties, such as Indian English, are a more heterogeneous group; and (c) coherence across alternations is less than perfect.

KW - Comparative sociolinguistics

KW - VADIS

KW - Probabilistic grammar

KW - Dialectometry

KW - Variationist linguistics

U2 - 10.3389/frai.2019.00023

DO - 10.3389/frai.2019.00023

M3 - Article

SN - 2624-8212

VL - 2

JO - Frontiers in Artificial Intelligence

JF - Frontiers in Artificial Intelligence

M1 - 23

ER -
