Towards unified secure on- and off-line analytics at scale

P. Coetzee*, M. Leeke, S. Jarvis

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)


Data scientists have applied various analytic models and techniques to address the oft-cited problems of large volume, high velocity data rates and diversity in semantics. Such approaches have traditionally employed analytic techniques in a streaming or batch processing paradigm. This paper presents CRUCIBLE, a first-in-class framework for the analysis of large-scale datasets that exploits both streaming and batch paradigms in a unified manner. The CRUCIBLE framework includes a domain specific language for describing analyses as a set of communicating sequential processes, a common runtime model for analytic execution in multiple streamed and batch environments, and an approach to automating the management of cell-level security labelling that is applied uniformly across runtimes. This paper shows the applicability of CRUCIBLE to a variety of state-of-the-art analytic environments, and compares a range of runtime models for their scalability and performance against a series of native implementations. The work demonstrates the significant impact of runtime model selection, including improvements of between 2.3× and 480× between runtime models, with an average performance gap of just 14× between CRUCIBLE and a suite of equivalent native implementations.

Original language: English
Pages (from-to): 738-753
Number of pages: 16
Journal: Parallel Computing
Issue number: 10
Publication status: Published - Dec 2014

Bibliographical note

Publisher Copyright:
© 2014 The Authors.


Keywords

  • Analytics
  • Data intensive computing
  • Data science
  • Domain specific languages
  • Hadoop
  • Streaming analysis

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Networks and Communications
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

