"We aim to integrate biological data with large genetic datasets to detect disease-relevant mechanisms, develop novel mathematical approaches tailored to identifying patterns in non-linear multidimensional “omics” space, utilise high performance computing to process and analyse “BIG DATA”."
Valentina Escott-Price
UK DRI Group Leader
An expert in ‘big data’, Prof Valentina Escott-Price uses large datasets to identify risk genes and biological pathways for disease. First studying mathematics at St. Petersburg University, Russia, she went on to obtain her PhD in Statistics from Cardiff University in 2001. As part of the International Genomics of Alzheimer’s Project consortium, Valentina helped discover 11 new susceptibility loci for Alzheimer’s disease. She will continue this work at the UK DRI at Cardiff, developing cutting-edge mathematical approaches to study mechanisms of disease.
1. At a glance
Using 'Big Data' to accelerate discoveries into dementia
Our genes play a key role in determining our risk of developing different diseases. Whilst faults in certain high-risk genes cause rare inherited forms of Alzheimer’s disease (AD), scientists have also identified several low-risk genetic variations that can increase the likelihood of developing this neurodegenerative condition.
Scientists are now building on these genetic discoveries by carrying out laboratory investigations into the functional effects of Alzheimer’s risk genes – as well as clinical studies involving many thousands of people to explore how they interact with other environmental and lifestyle factors.
Doctors currently diagnose neurodegenerative diseases based on a person's symptoms. A more accurate diagnosis based on the actual genes and biological processes involved will lead to more personalised treatments that can target the molecular causes. Achieving this will require combining huge amounts of data collected from laboratory experiments and from individuals taking part in large clinical studies.
Prof Valentina Escott-Price is using computational approaches to make sense of the increasingly large datasets generated by genetic, clinical and laboratory studies into dementia. She aims to identify key drug targets and to improve how AD and other dementias are diagnosed in the future.
2. Scientific goals
Genome-wide association studies (GWAS) of Alzheimer’s disease (AD) have identified multiple loci containing common variant risk alleles. These findings offer new routes to understanding disease biology that could be used to design novel therapies. However, the causal genes, pathways and processes are yet to be fully identified.
There are several major challenges in attempting to translate findings from GWAS into an appreciation of altered biological function. Specifically, index GWAS variants are usually in linkage disequilibrium (LD) with many other single nucleotide polymorphisms (SNPs), any of which might credibly be the causal variant(s). There is also strong evidence that most causal alleles reside in non-coding regions of the genome, making immediate recognition and functional interpretation difficult. Finally, non-coding elements are often associated with genes over large chromosomal distances, and in a cell type-specific manner, hampering the ability to identify true AD risk genes.
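To make the LD challenge concrete, here is a minimal sketch of how one might list the variants that "travel with" an index variant. It assumes genotypes are coded as 0/1/2 allele counts and uses the squared Pearson correlation between genotype columns as a common approximation to LD r². The toy genotype matrix, the function names `ld_r2` and `tagged_variants`, and the 0.8 threshold are all illustrative assumptions, not taken from the programme described here.

```python
import numpy as np

def ld_r2(genotypes: np.ndarray) -> np.ndarray:
    """Pairwise squared correlation between variant columns.

    genotypes: array of shape (individuals, variants), coded 0/1/2.
    """
    corr = np.corrcoef(genotypes, rowvar=False)
    return corr ** 2

def tagged_variants(r2: np.ndarray, index: int, threshold: float = 0.8) -> list:
    """Indices of variants whose r2 with the index variant meets the threshold."""
    return [j for j in range(r2.shape[0])
            if j != index and r2[index, j] >= threshold]

# Hypothetical toy data: 8 individuals x 4 variants (not real genotypes).
genotypes = np.array([
    [0, 0, 2, 1],
    [1, 1, 1, 0],
    [2, 2, 0, 1],
    [1, 1, 1, 2],
    [0, 0, 2, 0],
    [2, 2, 0, 2],
    [1, 0, 1, 1],
    [0, 0, 2, 1],
])

r2 = ld_r2(genotypes)
# Variants in high LD with variant 0: each is an equally credible causal candidate.
candidates = tagged_variants(r2, index=0)
print(candidates)
```

In this toy example the association signal at variant 0 cannot be separated statistically from variants 1 and 2, which is why functional data are needed to narrow down the causal variant.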
Common variant risk for AD, as for other complex disorders, is highly polygenic. Prof Valentina Escott-Price and her team have recently shown that AD has a significant polygenic component that is useful for predicting AD risk and could be a valuable research tool complementing experimental designs, including preventative clinical trials, stem cell selection and high/low risk clinical studies. In addition, recent evidence has shown that common transcriptional mechanisms operate across risk loci, and that polygenic risk therefore resides in specific transcriptional networks.
Current diagnostic categories do not map directly onto underlying biology and are at odds with the continuous nature of many disease phenotypes. There is evidence for shared genetic risk across neurodegenerative disorders and for genetic strata within disorders. The inclusion of phenotyping measures to improve the prediction models of AD subphenotypes and to link these to biological pathways is therefore of high interest.
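As an illustration of what a polygenic score is, the sketch below computes the widely used weighted-sum form: each person's score is the sum of their risk-allele counts multiplied by the per-variant effect sizes (for example, log odds ratios from GWAS summary statistics), restricted to variants passing a p-value threshold. The effect sizes, p-values, dosages, the 0.5 threshold and the function names are hypothetical, and the sketch assumes the variants have already been pruned for LD. It is not a description of the team's own scoring pipeline.

```python
import numpy as np

def polygenic_risk_score(dosages, betas, pvalues, p_threshold=0.5):
    """Weighted sum of allele dosages over variants with p <= p_threshold.

    dosages: (individuals, variants) risk-allele counts, 0/1/2.
    betas:   per-variant effect sizes (e.g. log odds ratios).
    pvalues: per-variant GWAS association p-values.
    """
    dosages = np.asarray(dosages, dtype=float)
    betas = np.asarray(betas, dtype=float)
    keep = np.asarray(pvalues, dtype=float) <= p_threshold
    return dosages[:, keep] @ betas[keep]

def standardise(scores):
    """Convert raw scores to z-scores within the sample."""
    return (scores - scores.mean()) / scores.std()

# Hypothetical summary statistics for 4 LD-pruned variants.
betas = np.array([0.20, -0.10, 0.05, 0.30])
pvalues = np.array([1e-8, 0.03, 0.70, 0.20])

# Hypothetical dosages for 3 individuals.
dosages = np.array([
    [2, 0, 1, 1],
    [0, 2, 2, 0],
    [1, 1, 0, 2],
])

raw = polygenic_risk_score(dosages, betas, pvalues)
z = standardise(raw)
print(raw, z)
```

Standardised scores of this kind are what make it possible to rank individuals as relatively high or low risk, for instance when selecting participants or cell lines for follow-up studies.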
Main objectives and research goals:
This UK DRI programme led by Prof Valentina Escott-Price aims to integrate data from different sources, species and formats, applying analytical approaches to accelerate new diagnostics and effective treatments for dementia.
1. Integrate genetic and functional data to identify drug targets and enhance risk prediction.
2. Identify functional non-coding AD risk variants that operate in microglia (causal variants).
3. Link non-coding variants to gene targets (causal genes).
4. Determine microglia transcriptional networks that mediate polygenic AD risk (causal networks).
5. Identify genetic, chemical and environmental regulators of AD-relevant transcriptional networks.
6. Identify novel biologically valid diagnostic categories to inform precision medicine.
7. Investigate the ability of machine learning methods to improve biologically-based classification in Alzheimer’s disease and other dementias.
8. Explore the disease pathways involved in AD using genetic data, neuroimaging and blood biomarkers.
9. Apply multivariate computational approaches to integrate two modalities of cellular molecular data (molecular and phenotypic expression profiling) derived from human stem cells.
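For the last objective, one familiar way to relate two data modalities measured on the same samples is canonical correlation analysis (CCA), which finds linear combinations of each modality that are maximally correlated. The sketch below is a minimal NumPy version using the QR-plus-SVD formulation, run on simulated data in which both modalities share one latent factor. The data, dimensions and function name are assumptions for illustration only, and the sketch assumes each centred matrix has full column rank.

```python
import numpy as np

def canonical_correlations(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """Canonical correlations between two (samples x features) matrices.

    Centres each matrix, takes an orthonormal basis of its column space via
    QR, then reads the correlations off the singular values of Qx^T Qy.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)  # guard against tiny numerical overshoot

# Simulated example: 200 samples, two modalities driven by one shared factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
X = latent @ np.array([[1.0, 0.5, -0.3]]) + 0.1 * rng.normal(size=(200, 3))
Y = latent @ np.array([[0.8, -0.6]]) + 0.1 * rng.normal(size=(200, 2))

rho = canonical_correlations(X, Y)
print(rho)  # first value near 1 (shared factor), second much smaller (noise)
```

With real expression data, which typically has far more features than samples, a regularised or sparse variant of CCA would be needed instead of this plain version.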
3. Team members
Dr Dobril Ivanov (Research Fellow)
Dr Emily Baker (PostDoc)
Dr Janet Harwood (PostDoc)
Ganna Leonenko (PostDoc)
Thomas Rowe (PhD Student)
Karen Crawford (PhD Student)
Ioanna Katzourou (PhD Student)
4. Collaborations
Within UK DRI:
- UK DRI at Imperial: Investigation of Alzheimer’s disease Genetic Risk Score with post-head injury, dementia & neurodegeneration.
- UK DRI at Cambridge: Investigation of the role of ER morphogens in dementia
- UK DRI at UCL: Translating Individual Alzheimer Genetic risk into disease phenotypes
Beyond UK DRI:
- Dementia Platforms UK (DPUK): implementation of genetic/phenotypic data analyses
- C-FoS/AD consortium: identification of optimal polygenic risk score (PRS) strategies for iPSC selection for functional studies; defining and testing high performance computing (HPC) systems for big data analyses.
- Psychiatric Genomics Consortium (PGC): advanced genomic/phenotypic data analyses
- Genetic and Environmental Risk in Alzheimer's Disease (GERAD) consortium: advanced genomic/phenotypic data analyses
- International Genomics of Alzheimer's Project (IGAP) consortium: advanced genomic/phenotypic data analyses, application of machine learning approaches to the stratification of Alzheimer's disease sub-phenotypes.
- International Frontotemporal Dementia Genomics Consortium (IFGC): genomic data analyses
5. Topics
Biostatistics, Bioinformatics, Genetic Epidemiology, functional genomics
6. Techniques
Computational biology, machine learning (support vector machines (SVM), random forests and neural networks (NN)), GWAS, ChIP-seq, ATAC-seq, canonical correlation analysis
7. Key publications
Leonenko G, Shoai M, Bellou E, Sims R, Williams J, Hardy J, Escott-Price V (2019) Genetic risk for Alzheimer’s disease and for amyloid deposition is distinct. Annals of Neurology (doi:10.1002/ana.25530)
Baker E, Sims R, Leonenko G, Frizzati A, Harwood J, Grozeva D, Genetic and Environmental Risk in Alzheimer's Disease (GERAD) Consortium, PERADES consortium, IGAP consortia, Morgan K, Passmore P, Holmes C, Powell J, Brayne C, Gill M, Mead S, Heun R, Bossu P, Spalletta G, Goate A, Cruchaga C, van Duijn C, Maier W, Ramirez A, Jones L, Hardy J, Ivanov D, Hill M, Holmans P, Allen N, Morgan P, Williams J, Escott-Price V (2019). Gene based Analysis in HRC Imputed Genome Wide Association Data Identifies Three Novel Genes for Alzheimer's Disease. PloS One (doi.org/10.1371/journal.pone.0218111)
Leonenko G, Sims R, Shoai M, Frizzati A, Bossu P, Spalletta G, Fox N, Williams J, Genetic and Environmental Risk in Alzheimer's Disease (GERAD) Consortium, Hardy J, Escott-Price V. Polygenic Risk and Hazard Scores for Alzheimer’s disease prediction. Annals of Clinical and Translational Neurology 2019 (https://doi.org/10.1002/acn3.7...)
Escott-Price V, Baker E, Myers A, Huentelman M, Hardy J. Genetic analysis suggests high misassignment rates in clinical Alzheimer's cases and controls. Neurobiology of aging 2019; 77: 178-182
Baker E, Schmidt KM, Sims R, O'Donovan MC, Williams J, Holmans P, Escott-Price V. POLARIS: Polygenic LD-adjusted risk score approach for set-based analysis of GWAS data. Genetic Epidemiology 2018; 42(4): 366-377
Escott-Price V, Myers A, Huentelman M, Hardy J. Polygenic Risk Score Analysis of Alzheimer's Disease in cases without APOE4 or APOE2 alleles. J Prev Alzheimer's Dis 2019; 6(1): 16-19
Salih D, Bayram S, Guelfi M, Reynolds R, Shoai M, Ryten M, Brenton J, Zhang D, Matarin M, Botia J, Shah R, Brookes K, Guetta-Baranes T, Morgan K, Bellou E, Cummings D, Hardy J, Edwards F, Escott-Price V. Genetic variability in response to Aβ deposition influences Alzheimer's risk. BioRxiv 2018; Preprint doi: (https://doi.org/10.1101/437657)