Since it opened in 1963 with just 230 students, the University of York has grown to be a premier research institution with over 17,000 students and 4,000 staff members. It is a member of the Russell Group of research-intensive universities in the UK.
It has become one of the world’s leading universities, carving out a reputation as an academic powerhouse whose clear focus on excellence has secured national and international recognition alongside longer-established institutions.
The Bioscience Technology Facility within the Department of Biology provides access to cutting-edge research facilities and equipment to researchers within the University, as well as to external academic and commercial research groups. The combination of equipment and local staff expertise makes available a wide range of techniques that would be uneconomical to establish within a single group.
The University of York is committed to enhancing its position as one of the world’s premier institutions for inspirational and life-changing research. Having previously relied on high-performance computing (HPC) clusters, the research team wanted to explore the possibilities of the cloud.
Professor James Chong, Royal Society Industry Fellow at the University of York, applies his expertise in biology to study the dynamics of anaerobic microbial communities. His aim is to make sewage sludge processing and waste treatment more efficient by recovering resources and reducing emissions of environmentally harmful greenhouse gases.
To do this, Prof Chong and his group collect gigabases of DNA sequence from mixed microbial communities. They then work with colleagues Dr John Davey (Bioinformatician) and Dr Peter Ashton (Head of the Genomics and Bioinformatics Laboratory), both in the University’s Bioscience Technology Facility, to analyse the data on their HPC clusters.
Ashton and his lab use nanopore sequencing to produce “long reads” containing hundreds of thousands of DNA base pairs, and Davey assembles these reads by comparing overlapping sections to piece them together.
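The idea of piecing reads together by their overlaps can be illustrated with a toy sketch. This is not the assembler the York team used (the article does not name it), and it assumes exact, error-free overlaps, which real long reads do not provide. The function names and the example reads are hypothetical.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the longest exact overlap.

    Toy illustration only: assumes error-free reads from a single strand.
    Returns the contigs left when no pair overlaps by at least `min_overlap`.
    """
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                length = overlap_length(left, right, min_overlap)
                if length > best_len:
                    best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs
        # Join the two sequences, keeping the shared section only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three made-up reads that overlap end to end.
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))  # ['ATGGCGTACCTGA']
```

The all-against-all comparison in this sketch grows quadratically with the number of reads, which gives a rough intuition for why assembling very large datasets is demanding.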
The researchers perform “metagenome assembly”, which requires a large amount of RAM and disk space. One 60-gigabase dataset had failed to assemble on a small local cluster. After several unsuccessful attempts, the team spoke to Google, who pointed them in the direction of CTS, who in turn helped them to pilot their workflow on Google Compute Engine’s virtual machines (VMs). CTS can offer unique services to the European research and education community thanks to its partnership with Google Cloud and GÉANT.
In consulting with CTS, the University of York found the company’s expertise and level of engagement to be the key drivers in forming a partnership to implement Google Compute Engine. Chong explained: “The team at CTS were amazing. They understood what we were trying to achieve straight away, and helped us get there. It was the people that made the decision for us.”
Working with CTS, the York team started running the genome assembly with 3TB of disk space but found they needed even more storage. CTS created a ‘Quick Start’ five-day tailored training package that gave the research team the specific tools and knowledge they needed to get started with their cloud solution. Within these five days they had solved the problem, completing their pipeline for the first time on a single Google Compute Engine VM set up as a virtual 96-core server attached to a 4x8TB striped LVM partition. It is a clear example of how the scale and capabilities of Google Cloud make previously impossible research possible.
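To make the storage layout concrete, here is a minimal sketch of how a striped volume spreads data across its disks. It assumes that “4x8TB” means four 8 TB disks combined into one logical volume. The 64 KiB stripe size is an illustrative assumption, since the article does not state the value used, and the function names are hypothetical.

```python
DISK_COUNT = 4                # from the article: four disks
DISK_SIZE_TB = 8              # from the article: 8 TB each
STRIPE_SIZE = 64 * 1024       # ASSUMED stripe size in bytes; not given in the article


def striped_capacity_tb(disk_count: int, disk_size_tb: int) -> int:
    """Striping adds no redundancy, so usable capacity is the plain sum."""
    return disk_count * disk_size_tb


def locate(logical_offset: int,
           disk_count: int = DISK_COUNT,
           stripe_size: int = STRIPE_SIZE) -> tuple[int, int]:
    """Map a byte offset in the logical volume to (disk index, offset on that disk).

    Consecutive stripes are placed on the disks in round-robin order, so large
    sequential reads and writes are shared between all disks.
    """
    stripe_index, offset_in_stripe = divmod(logical_offset, stripe_size)
    row, disk = divmod(stripe_index, disk_count)
    return disk, row * stripe_size + offset_in_stripe


print(striped_capacity_tb(DISK_COUNT, DISK_SIZE_TB))  # 32
print(locate(0))                # (0, 0): the first stripe is on disk 0
print(locate(STRIPE_SIZE))      # (1, 0): the next stripe moves to disk 1
print(locate(4 * STRIPE_SIZE))  # (0, 65536): the fifth stripe wraps back to disk 0
```

Under these assumptions the volume offers 32 TB, roughly ten times the 3TB the team started with.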
The success of the deployment will have a huge impact on the researchers at the University of York, especially in the way they work. Ashton said: “We hadn’t been able to run this workflow at all, but using Google VMs makes this metagenome assembly possible, accessible to more researchers, and more affordable. With the power of Google Compute, it has made our work much easier, accurate and it’s more responsive.”
“The team is what makes CTS, you become a partnership and everyone is really helpful. Especially Tim Ellis-Smith, he is great at what he does, invaluable resource and quite frankly essential to the whole project – we couldn’t do it without him.”
Working with CTS, the research team at the University of York were able to complete the metagenome assembly for 60 gigabases of microbial DNA using Google Compute Engine. Having already seen benefits such as uptime and scalability, the research team believe that scalable solutions like Google Cloud will be crucial to next-generation gene assembly, and are already reviewing large projects that will run on Google Cloud Platform.