Bioinformatics, Sequencing & Proteomics

Our group

Advances in high-throughput techniques in recent years mean that biological experiments increasingly rely on datasets of unprecedented volume and complexity. Current sequencing platforms, such as the NovaSeq, can sequence up to 48 human genomes and generate approximately 6 TB of data in as little as 40 hours. These developments make it possible to analyse biological processes at nucleotide-level precision, even within a single cell. In addition, multiple sequencing applications can be combined to provide a better understanding of the layered and interconnected processes underpinning biological systems. Both the generation and the analysis of such complex and bespoke datasets often pose novel challenges.

Our aims

The Bioinformatics, Sequencing and Proteomics Group collaborates with research groups across the Institute, enabling the application of modern high-throughput technologies to study virus-host interactions. The group’s approach is two-fold.

Firstly, we specialise in the production of bespoke datasets using high-throughput sequencing (Illumina and third-generation sequencing technologies). These range from the interrogation of viral genome diversity during outbreaks of veterinary diseases to the transcriptomic profiling of virally infected cells and livestock genome sequencing. These genomic data can also be combined with functional and proteomic approaches to characterise variants and motif phenotypes (through collaboration with the University of Liverpool).

Secondly, we develop and apply bioinformatics tools and integrative methods to extract and interpret valuable information from high-throughput data. This includes the accurate annotation of large livestock genomes and the development of novel sequence assembly pipelines, as well as metagenomic analyses of vaccine-derived cell lines and the identification of opportunistic pathogens.

Our research

Our activities as a core capability include the management of a state-of-the-art sequencing unit located within high containment. This incorporates both liquid-handling robotics and single-cell technologies, aligned with bespoke Illumina and third-generation library preparation. We have also developed bioinformatic pipelines tailored to the viruses and host species we study, with analysis undertaken on the Institute’s in-house high-performance computing cluster. Our research focusses on integrating these experimental (RNA sequencing, spatial gene expression and ribosome profiling) and in silico approaches to analyse veterinary viruses and their hosts. We also provide hands-on training in both sequencing and bioinformatics for Institute scientists, students and their collaborating partners.

Example projects have included:

Whole genome sequencing of African swine fever virus (ASFV)
Sexual development in mosquitoes
Investigating technical bias in Illumina sequencing of avian coronaviruses
Phylogeography of foot-and-mouth disease virus
Single cell sequencing approaches
Characterising transcriptomic responses to avian influenza infections and the impact of interferon pre-treatment on these responses
Metagenomics approaches for veterinary diseases

Our impact

We have had significant input into research at the Institute through successful collaborations with other specialist groups, which have resulted in several breakthrough applications across multiple disciplines.

As part of a major international consortium, we used novel metagenomic workflows to rule out viruses of major veterinary significance as causative agents of a mass die-off of saiga antelope in Kazakhstan in 2015. Furthermore, this research implicated Pasteurella multocida bacteria as the main pathogen driving the event, advancing our understanding of these mysterious phenomena.

Ongoing collaborations with the ASFV Vaccinology Group have focused on combining sequencing data from multiple platforms to validate the accuracy of existing and novel ASFV genome sequences. We have also worked with the Coronavirus Group and industrial partners on sequencing and interpreting data on viral variation during the production of live attenuated vaccines. This work will significantly advance our knowledge and accelerate the design of improved and more robust live attenuated coronavirus vaccines.

In collaboration with the World Reference Laboratory for FMDV, we have constructed high-quality FMDV genome assemblies for use in global surveillance databases and developed specialised toolkits for analysing FMDV sequence data. The group also plays a central role in research at Pirbright by producing and standardising tailored analysis pipelines, giving Institute researchers access to cutting-edge tools for performing viral assemblies, transcriptomic profiling and metagenomics.

Finally, large annotation projects, such as the annotation of the Culex quinquefasciatus mosquito genome, have enabled us to contribute to global entomology research.