Large-scale attribute selection using wrappers

Martin Gütlein; Eibe Frank; Mark A. Hall; Andreas Karwath

doi:10.1109/CIDM.2009.4938668

Research output: Contribution to conference (unpublished) › Paper › peer-review

182 Citations (Scopus)

Abstract

Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it can run the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they face a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, like microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets and can even increase the accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed "optimal" subset size. We show that this technique reduces subset size while maintaining comparable accuracy.
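The idea of limiting attribute expansions per forward selection step can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the abstract says only that Linear Forward Selection reduces the number of attribute expansions in each step. Ranking attributes by their single-attribute score and keeping the top k is one assumed way to do that, and the names `linear_forward_selection`, `evaluate`, `k`, `max_size` and `toy_evaluate` are hypothetical. In a real wrapper, `evaluate` would return a cross-validated accuracy estimate for the target classifier; here a toy scoring function stands in for it. The `max_size` argument stands in for stopping at a predetermined subset size; how such a size would be computed is not shown.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence

# A subset evaluator maps a set of attribute indices to a merit score.
# In a wrapper this would be cross-validated accuracy (assumption).
Evaluator = Callable[[FrozenSet[int]], float]


def linear_forward_selection(
    attributes: Sequence[int],
    evaluate: Evaluator,
    k: int,
    max_size: Optional[int] = None,
) -> List[int]:
    """Greedy forward selection restricted to the k best-ranked attributes."""
    # Assumed limiting strategy: rank attributes by their individual score
    # and only ever consider the top k as expansion candidates.
    ranked = sorted(attributes, key=lambda a: evaluate(frozenset([a])), reverse=True)
    candidates = ranked[:k]

    selected: List[int] = []
    best_score = float("-inf")
    while candidates and (max_size is None or len(selected) < max_size):
        # Evaluate every one-attribute expansion of the current subset.
        scored = [(evaluate(frozenset(selected + [a])), a) for a in candidates]
        score, attr = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break  # no expansion improves the score, so stop
        selected.append(attr)
        candidates.remove(attr)
        best_score = score
    return selected


def toy_evaluate(subset: FrozenSet[int]) -> float:
    """Hypothetical stand-in for a wrapper evaluator: three attributes are
    useful, and every attribute carries a small size penalty."""
    weights = {0: 0.5, 3: 0.3, 7: 0.1}
    return sum(weights.get(a, 0.0) for a in subset) - 0.01 * len(subset)


if __name__ == "__main__":
    print(linear_forward_selection(range(10), toy_evaluate, k=5))
    print(linear_forward_selection(range(10), toy_evaluate, k=5, max_size=2))
```

With n attributes, unrestricted forward selection evaluates up to n expansions in the first step, whereas this sketch evaluates at most k per step after the initial ranking pass. That is where the runtime saving in the sketch comes from.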

Original language	English
Pages	332-339
DOIs	https://doi.org/10.1109/CIDM.2009.4938668
Publication status	Published - 2009

Keywords

crossvalidation, machine learning


Cite this

@conference{0f8194579e82489c8a727ac7c2385d9e,

title = "Large-scale attribute selection using wrappers",

abstract = "Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it can run the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they face a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, like microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets and can even increase the accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed ``optimal'' subset size. We show that this technique reduces subset size while maintaining comparable accuracy.",

keywords = "crossvalidation, machine learning",

author = "Martin G{\"u}tlein and Eibe Frank and Hall, {Mark A.} and Andreas Karwath",

year = "2009",

doi = "10.1109/CIDM.2009.4938668",

language = "English",

pages = "332--339",

}

TY - CONF

T1 - Large-scale attribute selection using wrappers

AU - Gütlein, Martin

AU - Frank, Eibe

AU - Hall, Mark A.

AU - Karwath, Andreas

PY - 2009

Y1 - 2009

N2 - Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it can run the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they face a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, like microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets and can even increase the accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed "optimal" subset size. We show that this technique reduces subset size while maintaining comparable accuracy.

AB - Scheme-specific attribute selection with the wrapper and variants of forward selection is a popular attribute selection technique for classification that yields good results. However, it can run the risk of overfitting because of the extent of the search and the extensive use of internal cross-validation. Moreover, although wrapper evaluators tend to achieve superior accuracy compared to filters, they face a high computational cost. The problems of overfitting and high runtime occur in particular on high-dimensional datasets, like microarray data. We investigate Linear Forward Selection, a technique to reduce the number of attribute expansions in each forward selection step. Our experiments demonstrate that this approach is faster, finds smaller subsets and can even increase the accuracy compared to standard forward selection. We also investigate a variant that applies explicit subset size determination in forward selection to combat overfitting, where the search is forced to stop at a precomputed "optimal" subset size. We show that this technique reduces subset size while maintaining comparable accuracy.

KW - crossvalidation, machine learning

U2 - 10.1109/CIDM.2009.4938668

DO - 10.1109/CIDM.2009.4938668

M3 - Paper

SP - 332

EP - 339

ER -
