I tried a bunch of things: The dangers of unexpected overfitting in classification of brain data

Mahan Hosseini, Michael Powell, John Collins, Chloe Callahan-Flintoft, William Jones, Howard Bowman, Brad Wyble

Research output: Contribution to journal › Review article › peer-review

4 Citations (Scopus)

Abstract

Machine learning has enhanced the ability of neuroscientists to interpret information in EEG, fMRI, and MEG data. With these powerful techniques comes the danger of overfitting hyperparameters, which can render results invalid. We refer to this problem as 'over-hyping' and show that it is pernicious despite commonly used precautions. Over-hyping occurs when analysis decisions are made after observing analysis outcomes, and it can produce results that are partially or even completely spurious. It is commonly assumed that cross-validation is an effective protection against overfitting or over-hyping, but this is not the case. In this article, we show that spurious results can be obtained on random data by modifying hyperparameters in seemingly innocuous ways, despite the use of cross-validation. We recommend a number of techniques for limiting over-hyping, such as lock boxes, blind analyses, pre-registrations, and nested cross-validation. These techniques are common in other fields that use machine learning, including computer science and physics. Adopting similar safeguards is critical for ensuring the robustness of machine-learning techniques in the neurosciences.
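The core claim, that cross-validation alone does not protect against over-hyping, can be illustrated with a small simulation. The sketch below is a hypothetical illustration, not the authors' own simulation: it generates pure-noise "brain data" with random class labels, then "tries a bunch of things" by choosing many random feature subsets (standing in for hyperparameters) and keeping whichever gives the best 5-fold cross-validated accuracy. A separate lock-box dataset, never touched during tuning, then gives an honest estimate of the chosen configuration. The nearest-centroid classifier, the feature-subset "hyperparameter", the sample sizes, and all function names are assumptions made for illustration.

```python
import random
import statistics


def make_random_data(n_samples, n_features, rng):
    """Pure-noise features with balanced random labels: no real signal exists."""
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    rng.shuffle(y)  # labels are independent of X
    return X, y


def fit_predict(X_train, y_train, X_test, features):
    """Nearest-centroid classifier restricted to the chosen feature subset."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X_train, y_train) if t == label]
        cents[label] = [sum(r[f] for r in rows) / len(rows) for f in features]
    preds = []
    for x in X_test:
        dists = {
            label: sum((x[f] - c) ** 2 for f, c in zip(features, cents[label]))
            for label in cents
        }
        preds.append(min(dists, key=dists.get))
    return preds


def cv_accuracy(X, y, features, k=5):
    """Plain k-fold cross-validated accuracy (folds assigned by index mod k)."""
    n = len(X)
    correct = 0
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        preds = fit_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx], features)
        correct += sum(p == y[i] for p, i in zip(preds, test_idx))
    return correct / n


def over_hype_demo(n_configs=200, seed=0, n_features=50):
    """Tune on CV scores of random data, then check the winner on a lock box."""
    rng = random.Random(seed)
    X_dev, y_dev = make_random_data(40, n_features, rng)
    X_lock, y_lock = make_random_data(40, n_features, rng)  # sealed until the end
    scores = []
    for _ in range(n_configs):
        features = rng.sample(range(n_features), 3)  # one "thing we tried"
        scores.append((cv_accuracy(X_dev, y_dev, features), features))
    best_cv, best_features = max(scores)  # decision made after seeing outcomes
    preds = fit_predict(X_dev, y_dev, X_lock, best_features)
    lock_acc = sum(p == t for p, t in zip(preds, y_lock)) / len(y_lock)
    return best_cv, [s for s, _ in scores], lock_acc


if __name__ == "__main__":
    best_cv, all_cv, lockbox = over_hype_demo()
    print(f"best CV accuracy after tuning:  {best_cv:.2f}")
    print(f"mean CV accuracy over configs:  {statistics.mean(all_cv):.2f}")
    print(f"lock-box accuracy of the winner: {lockbox:.2f}")
```

Because the winning configuration is selected by its cross-validated score on the same data, that score is an optimistically biased estimate; one would typically expect it to sit well above the 0.5 chance level while the lock-box accuracy falls back toward chance, though the exact numbers depend on the seed. Nested cross-validation addresses the same problem differently, by moving the configuration search inside an inner loop so that each outer test fold is never used for selection.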

Original language: English
Journal: Neuroscience and Biobehavioral Reviews
Publication status: E-pub ahead of print, 6 Oct 2020

Bibliographical note

Copyright © 2020. Published by Elsevier Ltd.
