A platform for analysing stream and historic data with efficient and scalable design patterns

Rebecca Simmonds, Paul Watson, Jonathan Halliday, Paolo Missier

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

Social media is an increasingly popular method for people to share information and interact with each other. Analysis of social media data has the potential to provide useful insights in a wide range of domains, including social science, advertising and policing. Social media information is produced in real time, so analysis that gives insights into events as they occur can be particularly valuable. Similarly, analytics platforms that provide low-latency query responses can improve the user experience of ad-hoc exploration of historic data sets. However, the rate at which new data is generated makes it difficult to design a system that meets both of these requirements. This paper describes the design and evaluation of such a system. Firstly, it describes how a meta-analysis of the types of questions being asked of Twitter data led to the identification of a small set of queries that could be used to answer the majority of them. Secondly, it describes the design of a scalable platform for answering these and other queries. The architecture is cloud-based and combines continuous query processing with NoSQL database technology. Evaluation results show that the system can scale to process queries on streaming data arriving at the rate of the full Twitter firehose. Experiments also show that queries on large repositories of stored historic data can be answered with low latency. Finally, we present the results of queries that combine both streaming and historic data.
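
The abstract does not specify the platform's queries or interfaces. As a rough illustration of the general pattern it describes (a continuous query over a live stream, plus a store of historic data, plus a query that combines the two), here is a minimal Python sketch. It assumes a time-based sliding window as the continuous query and an in-memory counter standing in for the NoSQL store; every name (Tweet, SlidingWindowQuery, HistoricStore, ingest, recent_vs_historic) is hypothetical and this is not the authors' implementation.

```python
from __future__ import annotations

from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass
class Tweet:
    """Hypothetical minimal tweet record."""
    timestamp: float  # seconds since an arbitrary epoch
    hashtags: list[str] = field(default_factory=list)


class HistoricStore:
    """Hypothetical stand-in for a NoSQL store: hashtag -> all-time count."""

    def __init__(self) -> None:
        self._totals: Counter[str] = Counter()

    def insert(self, tweet: Tweet) -> None:
        self._totals.update(tweet.hashtags)

    def total(self, tag: str) -> int:
        return self._totals[tag]


class SlidingWindowQuery:
    """Hypothetical continuous query: hashtag counts over the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self._buffer: deque[Tweet] = deque()
        self._counts: Counter[str] = Counter()

    def push(self, tweet: Tweet) -> None:
        self._buffer.append(tweet)
        self._counts.update(tweet.hashtags)
        # Evict tweets whose timestamp is at or before the window's lower bound.
        cutoff = tweet.timestamp - self.window
        while self._buffer and self._buffer[0].timestamp <= cutoff:
            old = self._buffer.popleft()
            for tag in old.hashtags:
                self._counts[tag] -= 1
                if self._counts[tag] == 0:
                    del self._counts[tag]

    def count(self, tag: str) -> int:
        return self._counts[tag]


def ingest(tweet: Tweet, live: SlidingWindowQuery, store: HistoricStore) -> None:
    """Feed one tweet to both the continuous query and the historic store."""
    live.push(tweet)
    store.insert(tweet)


def recent_vs_historic(
    tag: str, live: SlidingWindowQuery, store: HistoricStore
) -> tuple[int, int]:
    """Combined query: (count in the current window, all-time stored count)."""
    return live.count(tag), store.total(tag)


if __name__ == "__main__":
    live = SlidingWindowQuery(window=60.0)
    store = HistoricStore()
    for t in [Tweet(0, ["a"]), Tweet(30, ["a", "b"]), Tweet(90, ["a"])]:
        ingest(t, live, store)
    print(recent_vs_historic("a", live, store))  # (1, 3)
```

In this sketch every tweet is written to both paths, so the window answers "what is happening now" while the store answers "what has happened overall"; a real deployment would replace the in-memory counter with a distributed database and the deque with a stream-processing engine.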

Original language: English
Title of host publication: Proceedings - 2014 IEEE 10th World Congress on Services, SERVICES 2014
Editors: Liang-Jie Zhang, Rami Bahsoon
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 174-182
Number of pages: 9
ISBN (Electronic): 9781479950690
Publication status: Published - 18 Sept 2014
Event: 2014 IEEE 10th World Congress on Services, SERVICES 2014 - Anchorage, United States
Duration: 27 Jun 2014 to 2 Jul 2014

Publication series

Name: Proceedings - 2014 IEEE 10th World Congress on Services, SERVICES 2014

Conference

Conference: 2014 IEEE 10th World Congress on Services, SERVICES 2014
Country/Territory: United States
City: Anchorage
Period: 27/06/14 to 2/07/14

Bibliographical note

Funding Information:
This work was supported by the Research Councils UK Digital Economy Programme [grant number EP/G066019/1 - SIDE: Social Inclusion through the Digital Economy]

Publisher Copyright:
© 2014 IEEE

Keywords

  • Complex event processing
  • Distributed database
  • NoSQL
  • Scalability
  • Social media

ASJC Scopus subject areas

  • General Computer Science
  • General Business, Management and Accounting
