Structural clustering of millions of molecular graphs

Madeleine Seeland, Andreas Karwath, Stefan Kramer

Research output: Contribution to conference (unpublished), Paper, peer-review

3 Citations (Scopus)


Statistical machine learning algorithms that build on patterns found by pattern mining algorithms have to cope with large solution sets and, as a result, with a high-dimensional feature space. At the same time, pattern mining algorithms are frequently applied to irrelevant instances, which introduces noise into their output. The solution sets of pattern mining algorithms also typically grow with the size of the input dataset. This paper proposes an approach to overcome these limitations. The approach extracts information from trained support vector machines, in particular their support vectors and the relevance of each support vector as given by its coefficient. The support vectors, weighted by their coefficients, are then used as input to pattern mining algorithms that can handle weighted instances. Our experiments in the domain of graph mining and molecular graphs show that the resulting models are not significantly less accurate than models trained on the full datasets, yet require only a fraction of the time and use much smaller sets of patterns.
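The abstract does not name the pattern miner or the exact weighting scheme, so the following is only a minimal sketch of the general idea under stated assumptions. Each support vector is represented as a set of substructure labels (itemsets stand in for subgraphs here, which is a simplification). Its weight is assumed to be the absolute value of its dual coefficient; for a binary scikit-learn `SVC`, the indices and signed coefficients are available as `support_` and `dual_coef_`. The function `weighted_frequent_patterns`, the threshold `min_wsupport` and the toy data are hypothetical. Because all weights are non-negative, weighted support is anti-monotone, so Apriori-style pruning remains valid.

```python
from itertools import combinations


def weighted_frequent_patterns(instances, weights, min_wsupport, max_size=3):
    """Hypothetical helper: level-wise (Apriori-style) mining of itemsets whose
    weighted support (sum of the weights of the instances that contain the
    itemset, divided by the total weight) is at least min_wsupport."""
    total = sum(weights)
    items = sorted({i for inst in instances for i in inst})

    def wsupport(pattern):
        return sum(w for inst, w in zip(instances, weights) if pattern <= inst) / total

    result = {}
    level = [frozenset([i]) for i in items]
    size = 1
    while level and size <= max_size:
        frequent = []
        for p in level:
            s = wsupport(p)
            if s >= min_wsupport:
                result[p] = s
                frequent.append(p)
        size += 1
        # Join frequent patterns pairwise; keep a candidate only if all of its
        # (size - 1)-subsets are frequent (anti-monotonicity of weighted support).
        cands = set()
        for a, b in combinations(frequent, 2):
            u = a | b
            if len(u) == size and all(
                frozenset(sub) in result for sub in combinations(u, size - 1)
            ):
                cands.add(u)
        level = sorted(cands, key=sorted)
    return result


# Toy example (invented data): support vectors as sets of substructure labels,
# with signed dual coefficients y_i * alpha_i from a hypothetical trained SVM.
support_vectors = [{"C", "N", "ring"}, {"C", "O"}, {"C", "N"}, {"N", "ring"}]
dual_coefs = [0.9, -0.2, 0.5, -0.4]
weights = [abs(c) for c in dual_coefs]  # relevance = |coefficient| (assumption)

patterns = weighted_frequent_patterns(support_vectors, weights, min_wsupport=0.5)
for p, s in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(p), round(s, 3))
```

Only the support vectors are mined instead of the full dataset, and instances with small coefficients contribute little to a pattern's support. This reflects the motivation given in the abstract, although the actual graph-mining setup in the paper may differ.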
Original language: English
Publication status: Published, 2014

