Structural clustering of millions of molecular graphs

Madeleine Seeland; Andreas Karwath; Stefan Kramer

doi:10.1145/2554850.2555063

Research output: Contribution to conference (unpublished) › Paper › peer-review

3 Citations (Scopus)

Abstract

Statistical machine learning algorithms that build on patterns found by pattern mining algorithms have to cope with large solution sets and thus with a high-dimensional feature space. Conversely, pattern mining algorithms are frequently applied to irrelevant instances, which causes noise in the output. Solution sets of pattern mining algorithms also typically grow as the input datasets grow. This paper proposes an approach to overcome these limitations. The approach extracts information from trained support vector machines, in particular their support vectors and the relevance of those vectors according to their coefficients. It uses the support vectors, along with their coefficients, as input to pattern mining algorithms that can handle weighted instances. Our experiments in the domain of graph mining and molecular graphs show that the resulting models are not significantly less accurate than models trained on the full datasets, yet require only a fraction of the time and use much smaller sets of patterns.
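The weighted-instance idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the support vectors and their coefficients have already been extracted from a trained SVM, it uses plain itemsets of substructure labels as a stand-in for mined graph patterns, and the names `weighted_frequent_itemsets`, `support_vectors`, `alphas`, and `min_weight` are hypothetical. A pattern's weighted support is taken to be the sum of the weights of the instances that contain it.

```python
from itertools import combinations


def weighted_frequent_itemsets(instances, weights, min_weight, max_size=2):
    """Return {itemset: weighted support} for itemsets of up to max_size items.

    instances: list of frozensets of item labels (stand-in for graphs).
    weights:   one non-negative weight per instance (assumed here to be the
               absolute value of an SVM coefficient).
    """
    items = sorted(set().union(*instances))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            # Weighted support: add up the weights of all instances
            # that contain every item of the candidate pattern.
            support = sum(w for inst, w in zip(instances, weights)
                          if itemset <= inst)
            if support >= min_weight:
                result[itemset] = support
    return result


# Hypothetical support vectors, each described by substructure labels,
# with hypothetical coefficient magnitudes as instance weights.
support_vectors = [
    frozenset({"C-C", "C=O"}),
    frozenset({"C-C", "C-N"}),
    frozenset({"C-C", "C=O", "C-N"}),
]
alphas = [0.9, 0.2, 0.5]

patterns = weighted_frequent_itemsets(support_vectors, alphas, min_weight=1.0)
for itemset, support in sorted(patterns.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), round(support, 3))
```

Because only the support vectors are mined, and instances with small coefficients contribute little to any pattern's support, the miner sees fewer instances and keeps fewer patterns than it would on the full dataset. A real graph miner would replace the brute-force enumeration with subgraph pattern growth.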

Original language	English
Pages	121-128
DOIs	https://doi.org/10.1145/2554850.2555063
Publication status	Published - 2014

Cite this

@conference{9a6dc69f2e94470b9bf88b0838f86820,
title = "Structural clustering of millions of molecular graphs",
author = "Madeleine Seeland and Andreas Karwath and Stefan Kramer",
year = "2014",
doi = "10.1145/2554850.2555063",
language = "English",
pages = "121--128",
}

TY  - CONF
T1  - Structural clustering of millions of molecular graphs
AU  - Seeland, Madeleine
AU  - Karwath, Andreas
AU  - Kramer, Stefan
PY  - 2014
Y1  - 2014
U2  - 10.1145/2554850.2555063
DO  - 10.1145/2554850.2555063
M3  - Paper
SP  - 121
EP  - 128
ER  -
