Prof Valentina Escott-Price, UK DRI Group Leader at Cardiff, studied mathematics at St Petersburg University in Russia before embarking on her PhD in statistics at Cardiff University. She has helped discover 11 new susceptibility loci for Alzheimer’s disease. Here she talks to us about big data, working in high-dimensional space and the challenge of persuading others to share their datasets.
You started out as a mathematician, then became a statistician. What first got you interested in applying maths to the study of disease?
My first research contract after my PhD in statistics was with Cardiff University’s School of Medicine. My task was to fix and maintain a database for a schizophrenia research team. Fixing the database took about a week and maintenance didn’t require much time, so I volunteered to do some genetic data analysis. I’m from the Soviet Union, where genetics and cybernetics were two areas of research that were suppressed, so for me, working in these two fields at the same time was just fascinating. I then got interested in genetic epidemiology, which struck me as being entirely logical and mathematical. I worked for about ten years in schizophrenia, bipolar disorder and depression, before becoming more and more involved in Alzheimer’s disease and dementia.
What is the biggest challenge you face when trying to make sense of large datasets?
Actually, working with large datasets in high-dimensional space isn’t so much the challenge – for me this is the fun bit! A point in a dataset in high-dimensional space behaves very differently from one in low-dimensional space, and this is a mathematical challenge, but a really interesting one. One of the main challenges is actually to explain to biologists and clinicians that things aren’t as simple as they might expect when looking at large, complex datasets. We sometimes need to encourage our colleagues to step outside the comfort zone of their common sense.
Another big challenge is data sharing. We need to trust each other, link our data sets together, harmonise them, then move outside the standard analysis to come up with something new and better. We mathematicians can’t come up with any new ways of analysing data if it is not shared with us.
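As an editorial illustration of the remark that points in high-dimensional space behave differently, the sketch below shows one well-known effect: distances between random points become nearly indistinguishable as the number of dimensions grows. It is not taken from Prof Escott-Price's own work. The function name `relative_contrast`, the choice of uniform random points in a unit cube, and the sample size are all assumptions made for the example.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max_dist - min_dist) / min_dist from one random query point
    to n_points random points, all drawn uniformly from the unit cube."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    return (max(dists) - min(dists)) / min(dists)


# Illustrative dimensions only; the contrast shrinks as dim grows.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

In two dimensions the nearest point is far closer than the farthest one, while in a thousand dimensions the nearest and farthest points sit at almost the same distance. This is one reason why intuition built on low-dimensional pictures can mislead when applied to large genetic datasets.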
What does it mean to say that Alzheimer’s has a ‘significant polygenic component’?
When we say that a complex neurological disorder has a high polygenic component, we simply mean that multiple small effects of genes combine – along with environmental factors – to trigger disease. All of us carry some risk genes, and people carry different genes in different chromosomal positions. So it’s often impossible to say one gene is responsible for a disease – it’s much more complex than that. What we try to do is use datasets that combine genetic and environmental factors for the same people – like with the UK Biobank – so we have genetic information but also know about people’s lifestyle, exercise, medical conditions and so on. We can then model these together to find the best prediction model of a disease.
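The idea of many small genetic effects adding up is often summarised as a polygenic score: a weighted sum of how many copies of each risk variant a person carries. The sketch below is a minimal illustration of that arithmetic, not the group's actual prediction model. The variant names, effect sizes and the helper `polygenic_score` are hypothetical, and the environmental factors mentioned above are left out.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1 or 2 copies per variant).

    Only variants present in both dictionaries contribute to the score.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[v] * dosages[v] for v in shared)


# Hypothetical per-allele effect sizes (not real estimates).
weights = {"variant_A": 0.05, "variant_B": 0.02, "variant_C": -0.03}

# Hypothetical genotype: number of risk alleles carried at each variant.
person = {"variant_A": 2, "variant_B": 1, "variant_C": 0}

score = polygenic_score(person, weights)
print(score)
```

In practice a score like this would be one input among several, alongside lifestyle and medical information, in a statistical model fitted to data such as the UK Biobank.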
Valentina visiting the National Museum of Computing near Bletchley Park in March 2017
How do you think your work might shape approaches to diagnosing dementia?
Our aim is to be able to predict when people will become ill and how quickly illness will progress, so the right treatments can be developed and given at the right time. At the moment, we’re quite good at predicting genetic risk of Alzheimer’s disease because we have standard case-control studies that tell us who has the disease and who doesn’t. To be able to predict when people will develop disease and which genes are involved, we need data about age of onset. To predict how disease will progress, we need data quantifying cognitive decline at several points in time. So, again, it comes back to having access to the right data.
And how might it help to develop personalised treatments?
Many different factors – genetic and environmental – lead to people developing disease. So the first step is to use genetics to identify subsets of people where disease has developed biologically in the same way. That way, the effectiveness of different treatments can be tested. But ideally, we would go even further and take into account people’s life events, other illnesses, habits, lifestyle, and so on. Then we are embarking on personalised medicine in the truest sense.
What is the most exciting thing happening now in the field of genetics? What do you hope will be possible in five years’ time?
I think the most exciting thing is that geneticists are inviting mathematicians like me to work with them and quite often even to lead projects. Involving specialists who understand the complexities of high-dimensional data is very important.
In five years’ time, I hope we will be developing hypotheses using data-driven analysis, rather than speculating and then testing.
What positive changes have you seen in academia and what do you think the main challenges still are?
For me, the most exciting change is in technology. We mathematicians love our high-performance computing and our parallel processing! We love to process data quickly and efficiently. Technological progress also lets us collaborate more. We can reach people wherever they are in the world.
I’ve already mentioned the challenge of data sharing, and bureaucracy within academia can often get in the way. We can get very excited and want to do things quickly, but if we can’t apply our methods to real data, that can be quite demotivating. Quite often, the will to collaborate is there, but legal or technological barriers come up. A lot of work is being done to remedy this, though, so I am hopeful.
Lastly, I think there’s still perhaps too much distance between those who make high-level decisions and the researchers who actually make things happen.
What has your experience been of working as part of the UK DRI?
Compared to much of academia, the UK DRI is more dynamic and more collaborative! We have a lot of young researchers who are enthusiastic and are using fresh thinking. This enthusiasm moves you forward much more quickly. Something else I really like about working for the UK DRI is that I am not afraid to make mistakes. Research is about trying things out – trial and error – but quite often in academia, we worry about making mistakes. We are constantly expected to report on our achievements. But the UK DRI is a supportive environment where I can make a mistake and know I won’t be punished for it. I can be more creative and fearless.
What advice would you give to a student of statistics wondering what direction to take their career in?
Most of my students know what they want, and this is most often to stay in academia and have a career in medical research. So they really don’t need my advice! I think the important thing for myself and other Group Leaders is to show our enthusiasm – that speaks for itself! If we are interested and enthusiastic about our area of work, the students will follow.
Article published: 06 May 2020