Crafting scalable analytics in order to extract actionable business intelligence is a challenging endeavour, requiring multiple layers of expertise and experience. Often, this expertise is irreconcilably split between an organisation's engineers and subject matter or domain experts. Previous approaches to this problem have relied on technically adept users with tool-specific training. These approaches have generally not targeted the levels of performance and scalability required to harness the sheer volume and velocity of large-scale data analytics. In this paper, we present a novel approach to the automated planning of scalable analytics using a semantically rich type system, the use of which requires little programming expertise from the user. This approach is the first of its kind to permit domain experts with little or no technical expertise to assemble complex and scalable analytics, for execution both on- and off-line, with no lower-level engineering support. We describe in detail (i) an abstract model of analytic assembly and execution, (ii) goal-based planning and (iii) code generation using this model for both on- and off-line analytics. Our implementation of this model, Mendeleev, is used to (iv) demonstrate the applicability of our approach through a series of case studies, in which a single interface is used to create analytics that can be run in real-time (on-line) and batch (off-line) environments. We (v) analyse the performance of the planner, and (vi) show that the performance of Mendeleev's generated code is comparable with that of hand-written analytics.
|Title of host publication||Proceedings - 9th IEEE International Conference on Big Data Science and Engineering, BigDataSE 2015|
|Publisher||Institute of Electrical and Electronics Engineers (IEEE)|
|Number of pages||10|
|Publication status||Published - 2 Dec 2015|
|Event||14th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2015 - Helsinki, Finland|
Duration: 20 Aug 2015 → 22 Aug 2015
|Name||Proceedings - 14th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2015|
|Conference||14th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, TrustCom 2015|
|Period||20/08/15 → 22/08/15|
Bibliographical note
Funding Information:
This work was funded under an Industrial EPSRC CASE Studentship, entitled "Platforms for Deploying Scalable Parallel Analytic Jobs over High Frequency Data Streams".
© 2015 IEEE.
ASJC Scopus subject areas
- Computer Networks and Communications