Image-based consensus molecular subtype (imCMS) classification of colorectal cancer using deep learning

Research output: Contribution to journal › Article › peer-review

Authors

  • S:CORT Consortium

External organisations

  • Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, UK; Department of Oncology, University of Oxford, Oxford, UK.
  • Leeds Institute of Cancer and Pathology
  • Queen's University Belfast
  • OXFORD UNIVERSITY HOSPITALS NHS TRUST
  • Gastrointestinal Stem cell Biology Laboratory, Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK.
  • Wellcome Trust Sanger Institute
  • Institute of Cancer and Genomic Sciences
  • School of Cancer Sciences
  • Current address: MRC Clinical Trials Unit, University College London, London, UK.
  • Almac Diagnostics Ltd, Craigavon, UK.
  • ABERDEEN UNIVERSITY
  • Department of Clinical Oncology, Kent Oncology Centre, Maidstone, England.
  • Edinburgh Cancer Research Centre, University of Edinburgh
  • MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, United Kingdom.
  • The University of Edinburgh, Cancer Research UK Edinburgh Centre, Western General Hospital, EH4 University Cancer Centre, University of Edinburgh, Edinburgh, UK.

Abstract

OBJECTIVE: Complex phenotypes captured on histological slides represent the biological processes at play in individual cancers, but the link to underlying molecular classification has not been clarified or systematised. In colorectal cancer (CRC), histological grading is a poor predictor of disease progression, and consensus molecular subtypes (CMSs) cannot be distinguished without gene expression profiling. We hypothesise that image analysis is a cost-effective tool to associate complex features of tissue organisation with molecular and outcome data and to resolve unclassifiable or heterogeneous cases. In this study, we present an image-based approach to predict CRC CMS from standard H&E sections using deep learning.

DESIGN: Training and evaluation of a neural network were performed using a total of n=1206 tissue sections with comprehensive multi-omic data from three independent datasets (training on FOCUS trial, n=278 patients; test on rectal cancer biopsies, GRAMPIAN cohort, n=144 patients; and The Cancer Genome Atlas (TCGA), n=430 patients). Ground truth CMS calls were ascertained by matching random forest and single sample predictions from the CMS classifier.
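
The matching step used to ascertain ground truth can be illustrated with a minimal sketch. It assumes that a sample receives a ground truth label only when the random forest and single sample predictions agree, and is otherwise left unlabelled; the abstract does not spell out how disagreements were handled. The function name, sample IDs and calls below are hypothetical and are not taken from the study.

```python
from typing import Dict, Optional

CMS_LABELS = ("CMS1", "CMS2", "CMS3", "CMS4")


def consensus_call(rf_call: Optional[str], ss_call: Optional[str]) -> Optional[str]:
    """Return a CMS label only when both predictions agree on a valid subtype.

    Assumption: disagreement or a missing call leaves the sample unlabelled (None).
    """
    if rf_call in CMS_LABELS and rf_call == ss_call:
        return rf_call
    return None


# Hypothetical per-sample calls from the two prediction modes.
rf_calls: Dict[str, Optional[str]] = {"S1": "CMS2", "S2": "CMS4", "S3": "CMS1", "S4": None}
ss_calls: Dict[str, Optional[str]] = {"S1": "CMS2", "S2": "CMS3", "S3": "CMS1", "S4": "CMS3"}

# Keep one ground truth entry per sample; None marks samples without a matched call.
ground_truth = {
    sample_id: consensus_call(rf_calls[sample_id], ss_calls.get(sample_id))
    for sample_id in rf_calls
}
```

In this sketch, samples S2 and S4 stay unlabelled because the two calls do not match, which is one way a sample might end up excluded from the ground truth set.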

RESULTS: Image-based CMS (imCMS) accurately classified slides in unseen datasets from TCGA (n=431 slides, AUC=0.84) and rectal cancer biopsies (n=265 slides, AUC=0.85). imCMS spatially resolved intratumoural heterogeneity and provided secondary calls correlating with bioinformatic prediction from molecular data. imCMS classified samples previously unclassifiable by RNA expression profiling, reproduced the expected correlations with genomic and epigenetic alterations and showed prognostic associations similar to those of transcriptomic CMS.

CONCLUSION: This study shows that a prediction of RNA expression classifiers can be made from H&E images, opening the door to simple, cheap and reliable biological stratification within routine workflows.

Bibliographic note

© Author(s) (or their employer(s)) 2020. Re-use permitted under CC BY. Published by BMJ.

Details

Original language: English
Journal: Gut
Publication status: E-pub ahead of print - 20 Jul 2020