Online density estimation of heterogeneous data streams in higher dimensions

Michael Geilke, Andreas Karwath, Stefan Kramer

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

The joint density of a data stream is suitable for performing data mining tasks without having access to the original data. However, the methods proposed so far only target a small to medium number of variables, since their estimates rely on representing all the interdependencies between the variables of the data. High-dimensional data streams, which are becoming more and more frequent due to increasing numbers of interconnected devices, are, therefore, pushing these methods to their limits. To mitigate these limitations, we present an approach that projects the original data stream into a vector space and uses a set of representatives to provide an estimate. Due to the structure of the estimates, it enables the density estimation of higher-dimensional data and approaches the true density with increasing dimensionality of the vector space. Moreover, it is not only designed to estimate homogeneous data, i.e., where all variables are nominal or all variables are numeric, but it can also estimate heterogeneous data. The evaluation is conducted on synthetic and real-world data. The software related to this paper is available at https://github.com/geilke/mideo.
Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: ECML PKDD 2016
Publisher: Springer
Pages: 65-80
ISBN (Electronic): 9783319461281
ISBN (Print): 9783319461274
Publication status: E-pub ahead of print - 4 Sept 2016
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases - Riva del Garda, Italy
Duration: 19 Sept 2016 – 23 Sept 2016

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 9851

Conference

Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Abbreviated title: ECML-PKDD 2016
Country/Territory: Italy
City: Riva del Garda
Period: 19/09/16 – 23/09/16

Keywords

  • data mining
  • density estimation
  • stream mining
