The ingredients that you need to make a datathon
5 April 2019
Ingredient 1: the tech.
The richest and most dementia-relevant datasets in the world would not be useful in a datathon if we didn’t have the computing power and flexibility to handle them. That’s why the Data Portal infrastructure in Swansea, on the UK Secure eResearch Platform, is the first crucial ingredient for a datathon. The Data Portal provides scalable and customisable infrastructure with a central safe-haven analysis environment. It can adapt to the needs of the scientific user group. The ambition of a datathon is to bring scientists together, combine skillsets and approaches, and see what insights they can come up with whilst interrogating the rich data resources. Attendees often don’t have the same scientific backgrounds, and it’s not a case of 'one size fits all' with their technical needs either. For our DPUK datathon series, we are offering bespoke software and hardware: for example, installing data science software such as Python with a customised set of pre-installed packages, or increasing the compute power of the desktop being used to suit a machine learning use case. The Data Portal works via remote access – the data itself never moves from storage in Swansea. All of this is managed by our central security system in Swansea to allow secure analysis.
Ingredient 2: the data.
Rich data are critical for a datathon to work, and very often this means more than one dataset. In our Exeter datathon, for example, the scientists worked with the ELSA and NACC cohort datasets. Combining the healthy population cohort (ELSA) with the Alzheimer’s cohort (NACC) gives the datathon scientists a better sample of people with dementia, and more measures to look at. It means that the scientists’ analyses will be strengthened.
With eight data collection waves, the ELSA cohort is one of the powerhouses when it comes to rich data that are extremely relevant to dementia. The numbers are the first incredible thing: in the latest data collection wave alone, the ELSA data include over 6000 variables for 8500 participants. But it’s not just the numbers. What makes ELSA such a great dataset for dementia investigation is the nature of the particular records that it contains: there’s a great mix of cognitive and mental health test data – including measures of verbal fluency, reasoning, recall memory, and depression. Recent scientific findings suggest important links between mental health and dementia, so it’s really important for data scientists to explore the patterns between these variables in more depth.
Ingredient 3: the ideas.
The data scientists and their varied expertise are the final and most vital ingredient in a datathon. Machine learning is one set of advanced techniques that they will bring to the table, but the beauty of a datathon is that the different attendees contribute expertise in different techniques. Bouncing ideas off each other is the most exciting part of a successful datathon, and indeed is the essence of it. In a datathon, the real hope for dementia lies in all the ideas and techniques that the attendees bring to building prediction models for dementia. This could turn out to be a critical step on the path to finding a cure.
Are you a data scientist interested in dementia? We are now recruiting data scientists with the right skills to take part in the virtual DPUK datathon. Apply here before 15 May 2020.