Accurate, multi-kb reads resolve complex populations and detect rare microorganisms

  Jillian F. Banfield (1, 6)

  1. Department of Earth and Planetary Science, University of California, Berkeley, Berkeley, California 94720, USA;
  2. Department of Bioengineering, Stanford University and Howard Hughes Medical Institute, Stanford University, Stanford, California 94305, USA;
  3. Department of Physics, Stanford University, Stanford, California 94305, USA;
  4. Illumina Inc. Technology Development, Hayward, California 94545, USA;
  5. Department of Energy, Joint Genome Institute, Walnut Creek, California 94598, USA;
  6. Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA

  Corresponding author: jbanfield{at}berkeley.edu

Abstract

Accurate evaluation of microbial communities is essential for understanding global biogeochemical processes and can guide bioremediation and medical treatments. Metagenomics is most commonly used to analyze microbial diversity and metabolic potential, but assemblies of the short reads generated by current sequencing platforms may fail to recover heterogeneous strain populations and rare organisms. Here we used short (150-bp) and long (multi-kb) synthetic reads to evaluate strain heterogeneity and study microorganisms at low abundance in complex microbial communities from terrestrial sediments. The long-read data revealed multiple (probably dozens of) closely related species and strains from previously undescribed Deltaproteobacteria and Aminicenantes (candidate phylum OP8). Notably, these are the most abundant organisms in the communities, yet short-read assemblies achieved only partial genome coverage, mostly in the form of short scaffolds (N50 = ∼2200 bp). Genome architecture and metabolic potential for these lineages were reconstructed using a new synteny-based method. Analysis of long-read data also revealed thousands of species whose abundances were <0.1% in all samples. Most of the organisms in this “long tail” of rare organisms belong to phyla that are also represented by abundant organisms. Genes encoding glycosyl hydrolases are significantly more abundant than expected in rare genomes, suggesting that rare species may augment the capability for carbon turnover and confer resilience to changing environmental conditions. Overall, the study showed that a diversity of closely related strains and rare organisms account for a major portion of the communities. These are probably common features of many microbial communities and can be effectively studied using a combination of long and short reads.
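The abstract relies on two simple quantitative definitions: the scaffold N50 used to characterize the short-read assemblies (N50 = ∼2200 bp), and a "rare" organism whose abundance is <0.1% in all samples. The sketch below illustrates both definitions only. It is not the authors' pipeline. The function names `n50` and `rare_taxa` are hypothetical. The sketch assumes that abundances are supplied as per-sample relative fractions (0.001 = 0.1%); how the study actually estimated abundance is not described here.

```python
from typing import Dict, Iterable, List


def n50(lengths: Iterable[int]) -> int:
    """Hypothetical helper: return the N50, i.e. the length L such that
    scaffolds of length >= L together hold at least half of all bases."""
    sorted_lengths = sorted(lengths, reverse=True)
    total = sum(sorted_lengths)
    if total <= 0:
        raise ValueError("no positive sequence lengths given")
    running = 0
    for length in sorted_lengths:
        running += length
        # Stop at the first scaffold that brings the cumulative sum to >= 50%.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for a positive total")


def rare_taxa(abundances: Dict[str, List[float]], threshold: float = 0.001) -> List[str]:
    """Hypothetical helper: return taxa whose relative abundance is below
    `threshold` (assumed fraction; 0.001 == 0.1%) in every sample."""
    return sorted(
        taxon
        for taxon, per_sample in abundances.items()
        if all(value < threshold for value in per_sample)
    )


if __name__ == "__main__":
    print(n50([2, 3, 4, 5, 6]))  # 5
    print(rare_taxa({"a": [0.0005, 0.0009], "b": [0.0005, 0.002]}))  # ['a']
```

Taxon "b" is excluded because it reaches 0.2% in one sample. The "all samples" criterion therefore requires a taxon to stay below the threshold everywhere, not merely on average.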

Footnotes

  • Received August 14, 2014.
  • Accepted February 6, 2015.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
