UK DRI Prize for Computational Reproducibility: Insights from last year’s winners

Applications for the 2022 UK DRI Prize for ‘Computational Reproducibility in Dementia Research’ are now open. The scheme is designed to promote the development and open release of sustainable and reproducible code as part of dementia research output.

We spoke to last year’s winners, PhD students Kitty Murphy and Brian Schilder, both from the UK DRI at Imperial, about their experiences, and they shared some advice and resources for anyone thinking about applying this year.

Last year's winners, Kitty Murphy (left) and Brian Schilder (right), from UK DRI at Imperial.

So, first things first, how would you define computational reproducibility?

Brian: “In general, computational reproducibility means you can run the same code with the same data, and easily get the same result. People often forget the ‘easily’ part! One of the goals I always aim for is to think about how I can make something as user-friendly as possible. If you reduce the complexity, there is less chance of someone using the software incorrectly or in a way that’s unexpected.”

Kitty: “You could think of it like this: say you really liked a cake someone baked, and you wanted to bake the same cake. If they only told you the ingredients, that they mixed them, put them in a tin and baked them, you probably wouldn’t get the same cake, right? Whereas if they give you step-by-step instructions, tell you the brand and quantity of each ingredient, and whether the oven is fan assisted, then you’re much more likely to get the same cake. Computational reproducibility is the same kind of concept. When you publish a study, alongside that, you want to share the data you used and describe the computational steps in enough detail, so if someone else were to use your data and follow the steps, they would be able to get the exact same results.”
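As a rough illustration of the "same code, same data, same result" idea described above, here is a minimal Python sketch. It is not taken from either winner's software: the function names (`fingerprint`, `bootstrap_mean`, `run_analysis`) and the numbers are hypothetical, and the same pattern applies equally in R. It assumes three simple habits: record a checksum of the input data, fix the random seed, and note the software version alongside the result.

```python
import hashlib
import json
import platform
import random
import statistics


def fingerprint(values):
    """Return a SHA-256 digest of the input data.

    Publishing this digest lets a reader confirm they are starting
    from exactly the same data.
    """
    payload = json.dumps(values).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def bootstrap_mean(values, n_resamples=1000, seed=2022):
    """Estimate the mean by bootstrap resampling with a fixed seed."""
    # A local generator with an explicit seed avoids hidden global state,
    # so repeated runs draw exactly the same resamples.
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        sample = rng.choices(values, k=len(values))
        means.append(statistics.fmean(sample))
    return statistics.fmean(means)


def run_analysis(values):
    """Run the analysis and record what is needed to repeat it."""
    return {
        "data_sha256": fingerprint(values),
        "python_version": platform.python_version(),
        "bootstrap_mean": bootstrap_mean(values),
    }


if __name__ == "__main__":
    data = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7]  # made-up numbers for illustration
    first = run_analysis(data)
    second = run_analysis(data)
    print(first)
    print("identical:", first == second)
```

Running the script twice on the same machine should report identical results. Matching results across machines may also depend on pinning package versions or using a container, which this sketch does not cover.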

Why did you decide to enter the competition last year?

B: “I had a preprint under review at the time and I’d just started my PhD at Imperial. The software I submitted for the competition was actually a project I had done at my previous job, where I was a bioinformatician at the Icahn School of Medicine at Mount Sinai in New York City. I made this software to automate fine-mapping to identify causal variants within genome-wide association studies. So the timing made sense: I had already spent a lot of time making the pipeline reproducible and putting it into an R package. I saw the competition and figured it was a good opportunity to showcase my work. Also, I think it’s great that this competition values reproducibility. Everyone knows there is a reproducibility crisis in science, but people have different definitions, so it has become a bit of a buzzword. It’s helpful to have standards in place to make sure things are reproducible.”

K: “The timing made sense, as I’d just published a preprint where I had developed an R package. Also, I think that computational reproducibility should be standard practice for computational studies, so this competition was a great opportunity to promote that.”

What was the application process like?

B: “One of the appeals for me was that it didn’t require a lot of work, certainly less than submitting a grant or a paper! I thought that was great because we’re all busy people. It was nice to have a short and succinct application form that gave the judging panel what they needed without taking up too much time to prepare.”

K: “The process was really straightforward, and anything I wasn’t sure about, I’d look at GitHub repositories of R packages that I liked and used, to make sure I was including everything I should be.”

What advice would you give someone entering this year’s competition?

B: “It’s a good idea to think about the different ways you can improve reproducibility and try to hit at least two or three of them: providing the code on GitHub, containerisation, and making it crystal clear how to use the software, what exactly it does and why. Really try to make it accessible to someone without specialist knowledge of that particular field. Provide links to other resources where relevant, because you can’t always go into a lot of detail.”

K: “The key word is share. Share your data, share your source code, share as much detail as you can. If you’ve developed a software package, have a vignette with a tutorial that has some sample data so someone can easily run through your code. And if it’s not a software package, something I personally like when I look at someone’s source code for their analysis is if it’s commented. Comments above different chunks of code saying exactly what’s been done make it much easier to follow what the code is doing. Also, as an author, it’s really helpful if you want to go back and look at code you wrote years ago.”
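To make the advice about sample data and commented chunks concrete, here is a small hypothetical sketch in Python (the same layout works in an R script or an R package vignette). The dataset, column names and function names are invented for illustration only. Each chunk carries a comment saying what it does, and the tiny embedded dataset means a reader can run it from top to bottom without downloading anything.

```python
import csv
import io
import statistics

# --- Sample data -----------------------------------------------------------
# A tiny made-up dataset shipped with the code, so the example runs as-is.
SAMPLE_CSV = """sample_id,group,expression
s1,control,2.0
s2,control,3.0
s3,case,5.0
s4,case,7.0
"""


# --- Step 1: load the data ---------------------------------------------------
def load_rows(text):
    """Parse CSV text into a list of dicts, converting expression to float."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        row["expression"] = float(row["expression"])
    return rows


# --- Step 2: summarise each group --------------------------------------------
def mean_by_group(rows):
    """Return the mean expression value for every group label."""
    groups = {}
    for row in rows:
        groups.setdefault(row["group"], []).append(row["expression"])
    return {name: statistics.fmean(values) for name, values in groups.items()}


# --- Step 3: compare case against control ------------------------------------
def difference_in_means(rows, case="case", control="control"):
    """Return mean(case) minus mean(control)."""
    means = mean_by_group(rows)
    return means[case] - means[control]


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    print(mean_by_group(rows))        # {'control': 2.5, 'case': 6.0}
    print(difference_in_means(rows))  # 3.5
```

The section comments double as a table of contents for the script, which also helps the author when returning to the code years later.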

Any resources you’d like to share?

B: “In the first year and a half or so of my PhD I’ve put a lot of time into making templates and resources that are widely usable for other people in my lab, at Imperial and at research institutions at large. Some good resources would be our Neurogenomics lab GitHub, and this template for making R packages that are Bioconductor and CRAN friendly. You can also check out this repository I made for a presentation on documenting and sharing code and results in research.”

K: “Generally, if I’m reading a paper, I’ll always look at the code availability statement and the data availability statement to see what other people have done. If someone is developing their first R package, there is a really nice tutorial that I’ve used. If you’re creating a vignette for your package, I’d suggest using R Markdown. And of course, using GitHub.”

For more information and to apply for the competition, UK DRI researchers can visit the Portal Awards page.

Article published: 31 May 2022
