Tracking dengue epidemics using twitter content classification and topic modelling

Paolo Missier*, Alexander Romanovsky, Tudor Miu, Atinder Pal, Michael Daniilakis, Alessandro Garcia, Diego Cedrim, Leonardo da Silva Sousa

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Detecting and preventing outbreaks of mosquito-borne diseases such as Dengue and Zika in Brazil and other tropical regions has long been a priority for governments in affected areas. Streaming social media content, such as Twitter, is increasingly being used for health vigilance applications such as flu detection. However, previous work has not addressed the complexity of drastic seasonal changes in Twitter content across multiple epidemic outbreaks. To address this gap, this paper contrasts two complementary approaches to detecting Twitter content that is relevant for Dengue outbreak detection, namely supervised classification and unsupervised clustering using topic modelling. Each approach has benefits and shortcomings. Our classifier achieves a prediction accuracy of about 80% based on a small training set of about 1,000 instances, but the need for manual annotation makes it hard to track seasonal changes in the nature of the epidemics, such as the emergence of new types of virus in certain geographical locations. In contrast, topic modelling based on Latent Dirichlet Allocation (LDA) scales well, generating cohesive and well-separated clusters from larger samples. However, although clusters can easily be regenerated following changes in the epidemics, this approach makes it hard to clearly segregate relevant tweets into well-defined clusters.

Original language: English
Title of host publication: Current Trends in Web Engineering - ICWE 2016 International Workshops DUI, TELERISE, SoWeMine, and Liquid Web, Revised Selected Papers
Editors: Cesare Pautasso, Sven Casteleyn, Peter Dolog
Publisher: Springer Verlag
Pages: 80-92
Number of pages: 13
ISBN (Print): 9783319469621
Publication status: Published - 2016
Event: International Conference on Web Engineering, ICWE 2016 and 2nd International Workshop on TEchnical and LEgal aspects of data pRIvacy and SEcurity, TELERISE 2016, 2nd International Workshop on Mining the Social Web, SoWeMine 2016, 1st International Workshop on Liquid Multi-Device Software for the Web, LiquidWS 2016, 5th Workshop on Distributed User Interfaces: Distributing Interactions, DUI 2016 - Lugano, Switzerland
Duration: 6 Jun 2016 to 9 Jun 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9881 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Web Engineering, ICWE 2016 and 2nd International Workshop on TEchnical and LEgal aspects of data pRIvacy and SEcurity, TELERISE 2016, 2nd International Workshop on Mining the Social Web, SoWeMine 2016, 1st International Workshop on Liquid Multi-Device Software for the Web, LiquidWS 2016, 5th Workshop on Distributed User Interfaces: Distributing Interactions, DUI 2016
Country/Territory: Switzerland
City: Lugano
Period: 6/06/16 to 9/06/16

Bibliographical note

Funding Information:
This work has been supported by MRC UK and FAPERJ Brazil within the Newton Fund Project entitled "A Software Infrastructure for Promoting Efficient Entomological Monitoring of Dengue Fever". The authors would like to thank Oswaldo G. Cruz (Fundação Oswaldo Cruz, Programa de Computação Científica) and Leonardo Frajhof (Unirio, Rio de Janeiro, Brazil) for their contributions to this paper, and Prof. Wagner Meira Jr. and his team for sharing their 2009–2011 Twitter datasets [].

Publisher Copyright:
© Springer International Publishing AG 2016.

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
