Abstract
EnteroBase is an integrated software environment that supports the identification of global population structures within several bacterial genera that include pathogens. Here we provide an overview of how EnteroBase works, what it can do, and its future prospects. EnteroBase has currently assembled more than 300,000 genomes from Illumina short reads from Salmonella, Escherichia, Yersinia, Clostridioides, Helicobacter, Vibrio, and Moraxella, and genotyped those assemblies by core genome Multilocus Sequence Typing (cgMLST). Hierarchical clustering of cgMLST sequence types allows a new bacterial strain to be mapped to predefined population structures at multiple levels of resolution within a few hours of uploading its short reads. Case study 1 illustrates this process for local transmissions of Salmonella enterica serovar Agama between neighboring social groups of badgers and humans. EnteroBase also supports SNP calls both from genomic assemblies and after extraction from metagenomic sequences, as illustrated by case study 2, which summarizes the microevolution of Yersinia pestis over the last 5,000 years of pandemic plague. EnteroBase can also provide a global overview of the genomic diversity within an entire genus, as illustrated by case study 3, which presents a novel, global overview of the population structure of all of the species, subspecies and clades within Escherichia.
Footnotes
§ The co-authors included in the Agama Study Group consist of: Derek Brown (Scottish Salmonella Reference Laboratory, Glasgow, UK); Marie Chattaway and Tim Dallman (PHE - Public Health England, Colindale, UK); Richard Delahay (National Wildlife Management Centre, APHA, Sand Hutton, York, UK); Christian Kornschober and Ariane Pietzka (AGES - Austrian Agency for Health and Food Safety, Institute for Medical Microbiology and Hygiene Graz, Austria); Burkhard Malorny (German Federal Institute for Risk Assessment, Berlin, Germany [Study Centre for Genome Sequencing and Analysis]); Liljana Petrovska and Rob Davies (APHA - Animal and Plant Health Agency, Addlestone, UK); Andy Robertson (Environment & Sustainability Institute, University of Exeter, Penryn, UK); William Tyne (Warwick Medical School, University of Warwick, Coventry, UK); François-Xavier Weill and Marie Accou-Demartin (Institut Pasteur, Paris, France); Nicola Williams (Department of Epidemiology and Population Health, Institute of Infection and Global Health, University of Liverpool).
Abbreviations
- wgMLST: whole genome MultiLocus Sequence Typing (Maiden et al. 2013);
- cgMLST: core genome MultiLocus Sequence Typing (Mellmann et al. 2011);
- rMLST: ribosomal MultiLocus Sequence Typing (Jolley et al. 2012).