Bayesian clinical classification from high-dimensional data: Signatures versus variability

Akram Shalabi, Masato Inoue, Johnathan Watkins, Emanuele De Rinaldis, Anthony C.C. Coolen*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)


When data exhibit imbalance between a large number d of covariates and a small number n of samples, clinical outcome prediction is impaired by overfitting and prohibitive computational demands. Here we study two simple Bayesian prediction protocols that can be applied to data of any dimension and any number of outcome classes. Calculating the Bayesian integrals and optimal hyperparameters analytically leaves only a small number of numerical integrations, and CPU demands scale as O(nd). We compare their performance on synthetic and genomic data with that of the mclustDA method of Fraley and Raftery. For small d, they perform as well as mclustDA or better. For d = 10,000 or more, mclustDA breaks down computationally, while the Bayesian methods remain efficient. This allows us to explore phenomena typical of classification in high-dimensional spaces, such as overfitting and the reduced discriminative effectiveness of signatures compared to intra-class variability.
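To make the O(nd) scaling concrete, the following is a minimal sketch of a generative Bayesian classifier whose training and prediction each touch every sample/covariate entry a fixed number of times. It is not the authors' protocols: the abstract does not specify them, and this sketch replaces the analytically evaluated Bayesian integrals and optimised hyperparameters with a plug-in Gaussian model that uses a shared diagonal covariance and a fixed shrinkage term. The names `fit_diagonal_gaussians`, `posterior` and `prior_var` are hypothetical and chosen only for illustration.

```python
import math


def fit_diagonal_gaussians(samples, labels, prior_var=1.0):
    """Fit class means, pooled per-covariate variances and class priors.

    Assumption: Gaussian classes with a shared diagonal covariance.
    Cost is O(n*d): two passes over the n-by-d data matrix.
    """
    n, d = len(samples), len(samples[0])
    classes = sorted(set(labels))
    counts = {c: 0 for c in classes}
    sums = {c: [0.0] * d for c in classes}
    for x, y in zip(samples, labels):
        counts[y] += 1
        row = sums[y]
        for j in range(d):
            row[j] += x[j]
    means = {c: [s / counts[c] for s in sums[c]] for c in classes}

    # Pooled within-class squared deviations per covariate.
    var = [0.0] * d
    for x, y in zip(samples, labels):
        m = means[y]
        for j in range(d):
            var[j] += (x[j] - m[j]) ** 2
    # Fixed shrinkage (an assumption): prior_var acts as one pseudo-sample,
    # which keeps every variance strictly positive even when n is tiny.
    var = [(v + prior_var) / (n + 1) for v in var]

    priors = {c: counts[c] / n for c in classes}
    return classes, means, var, priors


def posterior(x, model):
    """Return class probabilities for one sample x, in O(d) per class."""
    classes, means, var, priors = model
    log_scores = []
    for c in classes:
        m = means[c]
        score = math.log(priors[c])
        for j in range(len(x)):
            score -= 0.5 * (x[j] - m[j]) ** 2 / var[j]
        log_scores.append(score)
    # Normalise in log space to avoid underflow when d is large.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return {c: w / total for c, w in zip(classes, weights)}


if __name__ == "__main__":
    train = [[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [5.2, 4.9]]
    outcome = [0, 0, 1, 1]
    model = fit_diagonal_gaussians(train, outcome)
    print(posterior([5.0, 5.0], model))
```

Because the variance is shared across classes, the Gaussian normalisation constants cancel in the posterior, so only the squared distances to each class mean are needed. With many covariates and few samples, the summed squared deviations can be dominated by covariates that carry no class signal, which is one simple way to see the tension between signatures and intra-class variability that the abstract refers to.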

Original language: English
Pages (from-to): 336-351
Number of pages: 16
Journal: Statistical Methods in Medical Research
Issue number: 2
Publication status: Published - 2018 Feb 1


Keywords

  • Bayesian classification
  • Discriminant analysis
  • Curse of dimensionality
  • Outcome prediction
  • Overfitting

ASJC Scopus subject areas

  • Epidemiology
  • Statistics and Probability
  • Health Information Management

