GigaScience
Published

A large dataset of brain imaging linked to health systems data: curation and access to a whole system national cohort from NHS Scotland

Authors

Michael P J Camilleri, Dorian Gouzou, Salim Al-Wasity, Muthu R K Mookiah, María Valdes Hernandez, Bea Alex, Sotirios A Tsaftaris, Andrew Brooks, Ruairidh MacLeod, Honghan Wu, Brenda Bauer, Claire Grover, Parminder Reel, Susan Krueger, Richard Tobin, J Douglas Steele, Grant Mair, Joanna Wardlaw, Alexander Doney, Emanuele Trucco, William Whiteley

Abstract

Gigascience. 2026 Jun 9:giag072. doi: 10.1093/gigascience/giag072. Online ahead of print.

We present the design and implementation of a data curation framework to generate a large-scale clinical brain imaging dataset suitable for artificial intelligence (AI) enabled image analysis. The dataset is accessible through the Brain Health Data (BHD) initiative. It comprises approximately 417,341 magnetic resonance imaging (MRI) and 846,077 computerized tomography (CT) head studies, together with linked electronic health records (EHRs) and the associated free-text imaging reports from clinical practice in Scotland between 2010 and 2018, and exceeds 185 TB in size. The data curation framework was developed during the SCottish AI in Neuroimaging to predict Dementia and Neurodegenerative Disease (SCANDAN) study, which used a subset of 41,966 MRI series from the BHD for dementia prediction. We describe the processing of the BHD metadata and our multilabel classification output. We discuss the strengths of the BHD: clinical relevance thanks to its unprecedented scale, population-wide representativeness of a national healthcare system that is free at the point of delivery, long-term follow-up for neurodegenerative disease, and real-world variability. We describe the challenges and lessons learnt in developing a framework to curate the data, including the time needed to obtain permissions, the need for easily accessible, secure, responsive and affordable computational environments, the variability of clinical data, and the difficulty of extracting linked clinical data and images at scale. This resource will be crucial for clinical research, fostering the development of personalized medicine approaches and fast-tracking the implementation of AI models in clinical workflows. We encourage use of the BHD data through a streamlined application to the Public Benefit and Privacy Panel for Health and Care via the Data Research and Innovation Service of Public Health Scotland (eDRIS).

PMID:42264156 | DOI:10.1093/gigascience/giag072

UK DRI Authors

Prof Joanna Wardlaw

Group Leader and Clinical Director of the CVDR

Causes, mechanisms, clinical features, diagnosis, consequences, outcomes and treatment of cerebral small vessel disease focusing on clinical studies.