Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

K. Benoit; D. Conway; B.E. Lauderdale; M. Laver; S. Mikhaylov

doi:10.1017/S0003055416000058

Government

Research output: Contribution to journal › Article › peer-review

Abstract

Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.

Original language	English
Pages (from-to)	278-295
Journal	American Political Science Review
Volume	110
Issue number	2
DOIs	https://doi.org/10.1017/S0003055416000058
Publication status	Published - 2016

Cite this

@article{cb0f1ba404444cb18ae348337c3373b1,

title = "Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data",

abstract = "Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers{\textquoteright} attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.",

author = "K. Benoit and D. Conway and B.E. Lauderdale and M. Laver and S. Mikhaylov",

year = "2016",

doi = "10.1017/S0003055416000058",

language = "English",

volume = "110",

pages = "278--295",

journal = "American Political Science Review",

issn = "0003-0554",

publisher = "Cambridge University Press",

number = "2",

}

TY - JOUR

T1 - Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data

AU - Benoit, K.

AU - Conway, D.

AU - Lauderdale, B.E.

AU - Laver, M.

AU - Mikhaylov, S.

PY - 2016

Y1 - 2016

AB - Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers’ attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.

UR - http://www.scopus.com/inward/record.url?eid=2-s2.0-84982918683&partnerID=MN8TOARS

U2 - 10.1017/S0003055416000058

DO - 10.1017/S0003055416000058

M3 - Article

SN - 0003-0554

VL - 110

SP - 278

EP - 295

JO - American Political Science Review

JF - American Political Science Review

IS - 2

ER -
