On Bayesian Classification with Laplace Priors

Ata Kaban

doi:10.1016/j.patrec.2007.02.010

Computer Science

Research output: Contribution to journal › Article

32 Citations (Scopus)

Abstract

We present a new classification approach, using a variational Bayesian estimation of probit regression with Laplace priors. Laplace priors have previously been used extensively as a sparsity-inducing mechanism to perform feature selection simultaneously with classification or regression. However, contrary to the 'myth' of sparse Bayesian learning with Laplace priors, we find that the sparsity effect is due to a property of the maximum a posteriori (MAP) parameter estimates only. The Bayesian estimates, in turn, induce a posterior weighting rather than a hard selection of features, and have different advantageous properties: (1) they provide better estimates of the prediction uncertainty; (2) they are able to retain correlated features favouring generalisation; (3) they are more stable with respect to the hyperparameter choice; and (4) they produce a weight-based ranking of the features, suited for interpretation. We analyse the behaviour of the Bayesian estimate in comparison with its MAP counterpart, as well as other related models, (a) through a graphical interpretation of the associated shrinkage and (b) by controlled numerical simulations in a range of testing conditions. The results pinpoint the situations in which the advantages of Bayesian estimates are feasible to exploit. Finally, we demonstrate the working of our method in a gene expression classification task. (C) 2007 Elsevier B.V. All rights reserved.
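
The contrast between MAP sparsity and Bayesian posterior weighting can be illustrated with a minimal sketch. It is not the paper's variational probit method: it assumes a much simpler one-dimensional setting with a Gaussian likelihood, y ~ N(w, sigma^2), and a Laplace prior p(w) ∝ exp(-|w|/b). The function names, the parameters `sigma` and `b`, and the grid settings are all hypothetical choices made for illustration. In this toy model the MAP estimate is the familiar soft-thresholding rule, which returns exactly zero for small |y|, while the posterior mean (computed here by brute-force numerical integration) shrinks towards zero without reaching it.

```python
import numpy as np


def map_estimate(y, sigma=1.0, b=1.0):
    """MAP of w for y ~ N(w, sigma^2) with a Laplace(0, b) prior.

    Minimising (y - w)^2 / (2 sigma^2) + |w| / b gives soft-thresholding.
    """
    threshold = sigma ** 2 / b
    return np.sign(y) * max(abs(y) - threshold, 0.0)


def posterior_mean(y, sigma=1.0, b=1.0, half_width=20.0, n=40001):
    """Posterior mean of w, approximated on a uniform grid (illustrative only)."""
    w = np.linspace(-half_width, half_width, n)
    # Unnormalised log posterior: Gaussian log-likelihood plus Laplace log-prior.
    log_post = -0.5 * ((y - w) / sigma) ** 2 - np.abs(w) / b
    # Subtract the maximum before exponentiating for numerical stability.
    weights = np.exp(log_post - log_post.max())
    # On a uniform grid the spacing cancels in the ratio of the two sums.
    return float(np.sum(w * weights) / np.sum(weights))


# Compare the two estimates for a few observed values.
for y in (0.5, 1.0, 2.0, 4.0):
    print(y, map_estimate(y), round(posterior_mean(y), 4))
```

Under these assumptions, observations below the threshold sigma^2 / b are set to zero by the MAP rule but receive a small non-zero weight under the posterior mean, which is the one-dimensional analogue of hard selection versus posterior weighting.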

Original language	English
Pages (from-to)	1271-1282
Number of pages	12
Journal	Pattern Recognition Letters
Volume	28
Issue number	10
DOIs	https://doi.org/10.1016/j.patrec.2007.02.010
Publication status	Published - 15 Jul 2007

Keywords

microarray gene expressions
Laplace prior
shrinkage effect
predictive features
variational Bayes
sparsity

Access to Document

10.1016/j.patrec.2007.02.010

Cite this

@article{70350998dcec45fc9d41fa7b2419e2c2,
  title = "On Bayesian Classification with Laplace Priors",
  abstract = "We present a new classification approach, using a variational Bayesian estimation of probit regression with Laplace priors. Laplace priors have previously been used extensively as a sparsity-inducing mechanism to perform feature selection simultaneously with classification or regression. However, contrary to the 'myth' of sparse Bayesian learning with Laplace priors, we find that the sparsity effect is due to a property of the maximum a posteriori (MAP) parameter estimates only. The Bayesian estimates, in turn, induce a posterior weighting rather than a hard selection of features, and have different advantageous properties: (1) they provide better estimates of the prediction uncertainty; (2) they are able to retain correlated features favouring generalisation; (3) they are more stable with respect to the hyperparameter choice; and (4) they produce a weight-based ranking of the features, suited for interpretation. We analyse the behaviour of the Bayesian estimate in comparison with its MAP counterpart, as well as other related models, (a) through a graphical interpretation of the associated shrinkage and (b) by controlled numerical simulations in a range of testing conditions. The results pinpoint the situations in which the advantages of Bayesian estimates are feasible to exploit. Finally, we demonstrate the working of our method in a gene expression classification task. (C) 2007 Elsevier B.V. All rights reserved.",
  keywords = "microarray gene expressions, Laplace prior, shrinkage effect, predictive features, variational Bayes, sparsity",
  author = "Ata Kaban",
  year = "2007",
  month = jul,
  day = "15",
  doi = "10.1016/j.patrec.2007.02.010",
  language = "English",
  volume = "28",
  pages = "1271--1282",
  journal = "Pattern Recognition Letters",
  publisher = "Elsevier",
  number = "10",
}

TY  - JOUR
T1  - On Bayesian Classification with Laplace Priors
AU  - Kaban, Ata
PY  - 2007/7/15
Y1  - 2007/7/15
N2  - We present a new classification approach, using a variational Bayesian estimation of probit regression with Laplace priors. Laplace priors have previously been used extensively as a sparsity-inducing mechanism to perform feature selection simultaneously with classification or regression. However, contrary to the 'myth' of sparse Bayesian learning with Laplace priors, we find that the sparsity effect is due to a property of the maximum a posteriori (MAP) parameter estimates only. The Bayesian estimates, in turn, induce a posterior weighting rather than a hard selection of features, and have different advantageous properties: (1) they provide better estimates of the prediction uncertainty; (2) they are able to retain correlated features favouring generalisation; (3) they are more stable with respect to the hyperparameter choice; and (4) they produce a weight-based ranking of the features, suited for interpretation. We analyse the behaviour of the Bayesian estimate in comparison with its MAP counterpart, as well as other related models, (a) through a graphical interpretation of the associated shrinkage and (b) by controlled numerical simulations in a range of testing conditions. The results pinpoint the situations in which the advantages of Bayesian estimates are feasible to exploit. Finally, we demonstrate the working of our method in a gene expression classification task. (C) 2007 Elsevier B.V. All rights reserved.
AB  - We present a new classification approach, using a variational Bayesian estimation of probit regression with Laplace priors. Laplace priors have previously been used extensively as a sparsity-inducing mechanism to perform feature selection simultaneously with classification or regression. However, contrary to the 'myth' of sparse Bayesian learning with Laplace priors, we find that the sparsity effect is due to a property of the maximum a posteriori (MAP) parameter estimates only. The Bayesian estimates, in turn, induce a posterior weighting rather than a hard selection of features, and have different advantageous properties: (1) they provide better estimates of the prediction uncertainty; (2) they are able to retain correlated features favouring generalisation; (3) they are more stable with respect to the hyperparameter choice; and (4) they produce a weight-based ranking of the features, suited for interpretation. We analyse the behaviour of the Bayesian estimate in comparison with its MAP counterpart, as well as other related models, (a) through a graphical interpretation of the associated shrinkage and (b) by controlled numerical simulations in a range of testing conditions. The results pinpoint the situations in which the advantages of Bayesian estimates are feasible to exploit. Finally, we demonstrate the working of our method in a gene expression classification task. (C) 2007 Elsevier B.V. All rights reserved.
KW  - microarray gene expressions
KW  - Laplace prior
KW  - shrinkage effect
KW  - predictive features
KW  - variational Bayes
KW  - sparsity
U2  - 10.1016/j.patrec.2007.02.010
DO  - 10.1016/j.patrec.2007.02.010
M3  - Article
VL  - 28
SP  - 1271
EP  - 1282
JO  - Pattern Recognition Letters
JF  - Pattern Recognition Letters
IS  - 10
ER  -
