Keywords
16S rRNA amplicon analysis, microbial community analysis, microbial ecology, next-generation sequencing, bioinformatic pipeline
Recent advances in massive high-throughput, short-amplicon sequencing are revolutionizing efforts to describe microbial diversity within and across complex biomes1. Cultivation-independent whole metagenome sequencing has received increasing attention in the functional characterization of individual communities. These efforts, however, remain relatively expensive on a per sample basis, and the richer but much more unstructured information content requires complex data modelling and analysis procedures2. Therefore targeted surveys for specific taxonomic marker genes, such as the 16S ribosomal RNA (rRNA) gene3,4, remain essential in many microbial ecological studies. These surveys rely on sequencing of short, PCR amplified, hypervariable subregions rather than of the full-length gene, mostly for reasons of throughput, sequence depth and cost-efficiency.
There have been great efforts to address the accuracy and reproducibility of findings from 16S rRNA gene amplicon sequencing studies through increased levels of standardization, and software pipelines provide comprehensive protocols to analyze microbial ecology datasets. However, these efforts have arguably enhanced replicability rather than reproducibility, by providing widely adopted defaults5. To this end, Drummond6 suggested that exact replication of an experiment (i.e., replicability) is less informative (although a necessary pre-requisite for any scientific endeavour) than the corroboration of findings by reproduction in different independent setups (i.e., reproducibility)7, because biological findings that are robust to independent methodologies are arguably more dependable than any single-track analysis5. This distinction is highly relevant for the field of microbial ecology, where replicability is often confused with reproducibility, which is apparent from many often non-interchangeable methodologies.
Accuracy can typically be evaluated by the addition of positive controls. Generally these are synthetic or mock communities (MCs) consisting of phylotypes that, ideally, are representative of the ecosystem of interest. MCs allow researchers to answer two essential questions concerning accuracy. 1) Do I retrieve the number of species I put in, and if so are they correctly assigned? 2) How well does the sequencing and data analysis procedure reproduce species relative abundances? Reproducibility can be evaluated by comparing separate sequencing runs and different primer pairs that cover distinct 16S rRNA gene regions. Although replicability is often achieved, accuracy has been shown to be challenging especially at higher taxonomic resolution such as at genus level8,9.
Central to all 16S rRNA gene amplicon studies are Operational Taxonomic Units (OTUs). These are often regarded as the synthetic proxy for microbial species and are typically clustered at 97% sequence similarity. However, the prokaryotic species definition remains a hotly debated topic without any satisfying solution so far10–12. Moreover, the 97% sequence similarity threshold is based on the complete 16S rRNA gene (~1500 nt), and although sequence variability is not evenly distributed along the gene, the threshold is routinely applied to short reads of 100–500 nt. Different regions would therefore require their own species level cut-off. This combination of an ambiguous prokaryotic species definition and its application to short reads is the foundation for many complications regarding ‘correct’ OTU clustering. Hence there is little consensus on key experimental choices such as primers, targeted variable regions and OTU picking/clustering algorithms. Each of these technical aspects generates biases, and different methods produce clearly distinct results, leading to a situation where results of current studies cannot be easily compared or extrapolated to other study designs. Therefore there is a strong need for standardization.
Historically, 16S rRNA gene sequences generated in a project were initially clustered de novo into OTUs at >97% sequence similarity using various clustering algorithms, mostly because available 16S rRNA gene reference databases were thought to provide insufficient coverage13–16. Although new clustering algorithms that reduce the influence of clustering parameters, such as a hard cutoff for cluster similarity, have been specifically developed for amplicons17, cluster generation is context-dependent, i.e. different datasets generate different clusters, and different algorithms may produce different end-results5,18. Therefore, even though the same analysis framework is used, independent studies remain incomparable at OTU level. Consequently, reference-based OTU clustering has received increasing attention, due to the need for standardization, and because de novo OTU clustering for very large datasets, such as those generated by HiSeq and MiSeq sequencers, has become computationally very intensive unless greedy heuristics are employed, which suffer from the problems described above. With reference-based OTU clustering, sequences are mapped to pre-clustered reference sets of curated 16S rRNA gene sequences, provided by dedicated databases such as the Ribosomal Database Project (RDP), Greengenes and SILVA19–21. The consequence of this approach is that the ‘quality’ of the clustering of the reference set propagates to reference-picked OTUs. Clustering has limited robustness5,18,22, and imbalances in databases due to over- or under-representation of certain species, as well as error hotspots that are not necessarily matched to the variable regions23, can potentially lead to biased cluster formation driven by non-biological factors. These effects have been previously ignored or underestimated in reference OTU picking protocols5.
Another essential experimental choice concerns the selection of targeted variable region, because it should represent the sequence variability encountered with the full-length gene. Despite several studies comparing the performance of diverse regions, sequence lengths, sequencing platforms and taxon assignment methodologies, both within and across laboratories23–29, there still is no standard or consensus of best choices for variable regions. There are several factors that can lead to the commonly observed highly region-specific behaviour across datasets: 1) PCR bias of varying degrees23,27,30, 2) different regions are associated with different error profiles and different rates of chimera formation23,31, and 3) the actual variation contained in the sequence is dissimilar (e.g. some regions are not variable enough to differentiate between genera, while others are), which in turn can affect clustering5.
Apart from the use of a diverse range of primers and OTU picking protocols that can cause differences in results between studies and/or laboratories, sequencing error is a third important factor that defines data quality. Massive high throughput, short read length sequencing platforms have not been developed for amplicon sequencing but rather for whole genome sequencing, where sequence errors in individual reads are less important. However, in 16S rRNA gene amplicon sequencing every sequencing error could potentially lead to the false discovery of a new species. To avoid overestimation of microbial diversity, stringent quality filtering is therefore considered essential9.
Methodology rather than biology has often been shown to be the largest driver of variation in microbiome studies5,18,23,27,29,32–34, and this aspect of amplicon sequencing is increasingly addressed in literature. Nevertheless, a satisfactory solution has not been found. To address the aforementioned challenges we have applied several recommendations from literature to validate high throughput, high-resolution microbiota profiling, using Illumina Hiseq2000 101nt paired end sequencing data as a test case. We implemented redundancy by sequencing two tandem variable 16S rRNA gene regions in parallel (V4 and V5-V6). To find optimal filtering settings and to empirically determine the noise floor, multiple standardized mock communities specifically designed to tackle issues associated with filtering parameter optimization were added to each sequencing run. To our knowledge a similar setup with multiple MCs and different variable regions has not been applied to datasets generated by the Illumina platform. This set-up has enabled us to develop NG-Tax, a pipeline that better accounts for error associated with a range of technical aspects of 16S rRNA gene amplicon sequencing. NG-Tax will improve comparability by removing technical bias and facilitate efforts towards standardization, by focusing on reproducibility as well as accuracy. To assess the performance we benchmarked the results obtained with NG-Tax with results obtained with QIIME35, a common pipeline used for the analysis of this type of data.
NG-Tax consists of three core elements, namely barcode-primer filtering, OTU-picking and taxonomic assignment (Figure 1).
Barcode-Primer filtering. In a first step, paired end libraries are combined, and only read pairs with perfectly matching primers and barcodes are retained. To this end, both primers are barcoded to facilitate identification of chimeras produced during library generation after pooling of individual PCR products.
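The filtering step can be illustrated with a minimal sketch. It is not the NG-Tax implementation: the read layout (barcode immediately followed by primer, no linker), the requirement that both reads of a pair carry the same barcode, and the names `matches_primer` and `filter_pairs` are assumptions made for illustration only. Degenerate primer positions are matched with the standard IUPAC codes.

```python
# IUPAC degenerate nucleotide codes, so that primers such as 515F
# (which contains the degenerate base M) can be matched exactly.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches_primer(seq, primer):
    """True if seq starts with a sequence compatible with the degenerate primer."""
    if len(seq) < len(primer):
        return False
    # An ambiguous base call (N) in the read never counts as a perfect match.
    return all(base in IUPAC[p] for base, p in zip(seq, primer))


def filter_pairs(pairs, barcodes, fwd_primer, rev_primer, barcode_len=8):
    """Yield (sample, fwd_insert, rev_insert) for perfectly matching read pairs.

    pairs    : iterable of (forward_read, reverse_read) strings
    barcodes : dict mapping barcode sequence -> sample name
    Assumed layout of each read: barcode, then primer, then insert.
    """
    for fwd, rev in pairs:
        bc_f, bc_r = fwd[:barcode_len], rev[:barcode_len]
        # Assumption: both reads carry the same barcode; a disagreement
        # suggests a chimera formed after pooling, so the pair is dropped.
        if bc_f != bc_r or bc_f not in barcodes:
            continue
        f_rest, r_rest = fwd[barcode_len:], rev[barcode_len:]
        if matches_primer(f_rest, fwd_primer) and matches_primer(r_rest, rev_primer):
            yield barcodes[bc_f], f_rest[len(fwd_primer):], r_rest[len(rev_primer):]
```

Requiring exact matches on both reads is deliberately strict: a pair is discarded rather than corrected, which trades read depth for confidence in sample assignment.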
OTU picking. For each sample an OTU table is created with the most abundant sequences, using a minimum user-defined relative abundance threshold. In this particular study we employed a threshold of 0.1% minimum relative abundance. Lowering the threshold will lead to the acceptance of low-abundance OTUs, with an increased probability of these OTUs being artifacts due to sequencing and PCR errors. Abundance thresholds are commonly used to remove spurious OTUs generated by sequencing and PCR errors8,36, but previous studies applied a fraction threshold defined by the complete dataset under study, thereby ignoring sample size heterogeneity, which may lead to under-representation of asymmetrically distributed OTUs.
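A per-sample abundance threshold can be sketched as follows. This is an illustrative reading of the step, not the NG-Tax code: the function name `pick_otus`, the use of the total read count of the sample as denominator, and the inclusive comparison (`>=`) are assumptions.

```python
from collections import Counter


def pick_otus(reads, min_rel_abundance=0.001):
    """Split the reads of ONE sample into accepted OTUs and discarded reads.

    reads             : list of sequence strings from a single sample
    min_rel_abundance : 0.001 corresponds to the 0.1% threshold used in the text
    Returns (accepted, discarded), where accepted maps sequence -> read count.
    """
    counts = Counter(reads)
    total = sum(counts.values())
    # The threshold is relative to this sample only, so a deeply sequenced
    # sample does not dictate what is kept in a shallow one.
    accepted = {seq: n for seq, n in counts.items() if n / total >= min_rel_abundance}
    discarded = [r for r in reads if r not in accepted]
    return accepted, discarded
```

Because the function is called once per sample, an OTU that is abundant in a single sample is retained there even if it is negligible across the dataset as a whole.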
Commonly employed quality filtering parameters based on Phred score, such as minimum average Phred score, maximum number of ambiguous positions, maximum bad run length, trimming and minimum read length after quality trimming, are not utilized in NG-Tax because quality scores from the Illumina base caller have been shown to be of limited use for the identification of actual sequence errors for 16S rRNA gene amplicon studies9,37. Additionally, these quality scores only check for errors that occurred during sequencing, but do not account for other sources of error, such as PCR amplification, whereas quality filtering by abundance is sensitive to any source of error. Moreover, the application of global parameters (e.g. average Phred score) ignores that error is sequence-specific, and hence some sequences could be affected more than others. If a species specific amplicon is more prone to PCR or sequencing errors, the relative abundance of that particular species will be underestimated. To compensate for this potential bias, discarded reads are clustered to the OTUs with one mismatch.
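The compensation step, in which discarded reads are assigned to accepted OTUs that differ by one mismatch, might look like the sketch below. It assumes that a mismatch means a single substitution between sequences of equal length, and that a read matching several OTUs goes to the most abundant one; neither detail is specified in the text, and `rescue_discarded` is a hypothetical name.

```python
def within_one_mismatch(a, b):
    """True if a and b have equal length and differ at no more than one position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1


def rescue_discarded(otu_counts, discarded_reads):
    """Add discarded reads to the OTU they match with at most one mismatch.

    otu_counts      : dict mapping accepted OTU sequence -> read count
    discarded_reads : list of reads that fell below the abundance threshold
    Returns a new dict with updated counts; unmatched reads stay discarded.
    """
    counts = dict(otu_counts)
    # Assumption: ties are resolved in favour of the most abundant OTU
    # (ranked on the original counts, then alphabetically for determinism).
    ranked = sorted(otu_counts, key=lambda s: (-otu_counts[s], s))
    for read in discarded_reads:
        for otu in ranked:
            if within_one_mismatch(read, otu):
                counts[otu] += 1
                break
    return counts
```

Recovering near-identical reads in this way restores abundance to templates that are more error-prone, without admitting the erroneous sequences as OTUs in their own right.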
Finally, all OTUs are subjected to non-reference based chimera checking according to the following principle: given three OTUs named A, B and C, C will be considered a chimera when the following conditions are satisfied: C and A 5’ reads are identical, C and B 3’ reads are identical and both OTUs, A and B, are at least twice as abundant as OTU C. A complete overview of the number of sequences retained in both pipelines, i.e. NG-Tax and QIIME, as well as the final number of OTUs, is provided in Dataset 1.
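The chimera rule stated above translates almost directly into code. In this sketch each OTU is represented by its forward (5’) and reverse (3’) read plus an abundance; the `OTU` tuple and the function name `find_chimeras` are illustrative choices rather than part of NG-Tax, and "at least twice as abundant" is read as `>= 2 *`.

```python
from typing import List, NamedTuple


class OTU(NamedTuple):
    name: str
    fwd: str        # 5' read of the pair
    rev: str        # 3' read of the pair
    abundance: int


def find_chimeras(otus: List[OTU]) -> List[str]:
    """Return names of OTUs that satisfy the three-OTU chimera rule.

    OTU C is flagged when some OTU A shares its 5' read, some OTU B shares
    its 3' read, and both A and B are at least twice as abundant as C.
    """
    chimeras = []
    for c in otus:
        parents_5 = [a for a in otus
                     if a is not c and a.fwd == c.fwd and a.abundance >= 2 * c.abundance]
        parents_3 = [b for b in otus
                     if b is not c and b.rev == c.rev and b.abundance >= 2 * c.abundance]
        if parents_5 and parents_3:
            chimeras.append(c.name)
    return chimeras
```

No reference database is consulted; the check relies only on the OTUs observed in the sample, which is what makes it non-reference based.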
Taxonomic assignment. In the current version of NG-Tax, taxonomy is assigned to OTUs utilizing the uclust algorithm16 and the Silva_111_SSU Ref database, containing 731,863 unique full length 16S rRNA gene sequences. To ensure maximum resolution and avoid the risk of errors due to clustering-associated flaws (e.g. reference sequence error hotspots, overrepresentation of certain species and lack of robustness in cluster formation by clustering algorithms), we use the non-clustered database. To speed up the procedure by several orders of magnitude, 16S rRNA gene sequences from the reference database are trimmed to contain only the region amplified by the primers. For each OTU, a taxonomic assignment is retrieved at six different identity thresholds levels (100%, 98%, 97%, 95%, 92% and 90%) and at two taxonomic levels (genus and family). The final taxonomic label is determined by the assignments that show concordance at the highest taxonomic resolution.
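Trimming reference sequences to the amplified region amounts to an in silico PCR. The following sketch is one possible way to do this and is not taken from NG-Tax: it assumes exact matching of degenerate primers via regular expressions, excludes the primers from the returned region, and uses the hypothetical function name `trim_to_amplicon`.

```python
import re

# Regular-expression classes for the IUPAC degenerate nucleotide codes.
IUPAC_RE = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_regex(primer):
    """Translate a degenerate primer into a regular expression."""
    return "".join(IUPAC_RE[base] for base in primer)


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def trim_to_amplicon(reference, fwd_primer, rev_primer):
    """Return the region between the primer sites (primers excluded), or None.

    The reverse primer anneals to the opposite strand, so its reverse
    complement is searched for downstream of the forward primer site.
    """
    pattern = (primer_regex(fwd_primer) + "(.+?)"
               + primer_regex(reverse_complement(rev_primer)))
    match = re.search(pattern, reference)
    return match.group(1) if match else None
```

Searching only the short trimmed regions instead of full length sequences is what makes the comparison against a non-clustered database tractable.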
Our main objective was to develop a pipeline that accurately reproduces the synthetic MCs and also reduces the impact of experimental choices on the results. To achieve this goal, four synthetic communities of varying complexity were created, consisting of 16S rRNA gene amplicons of phylotypes (PTs) associated with the human GI-tract (Table 1). This specific setup limited the likelihood of overfitting to a particular OTU composition or distribution and allowed us to assess (1) the quantification potential, (2) the noise floor and (3) the effect of richness and diversity on quality filtering parameters, thus ensuring a higher fidelity with biological samples than by using a single MC. As a reference, to assess the quality of the taxonomic classifications, full length sequences for all PTs were obtained through Sanger sequencing. Expected MCs were created by trimming the full length sequences to the sequenced region. MC1 and MC2 consisted of equimolar amounts of 17 and 55 PTs, respectively. MC3 contained 55 PTs in staggered concentrations typical for the human GI-tract, and MC4 included 50 PTs with relative abundances ranging between 0.001 and 2.49%. To account for pipetting errors, each of the four MCs was produced in triplicate. To design a pipeline that puts more focus on biology, these 12 MC templates were used to sequence the MCs under different conditions that cover most of the technical bias associated with 16S rRNA gene amplicon studies reported in literature. To this end, we 1) targeted either region V4 or region V5-V6, 2) used four PCR protocols differing in the number of PCR cycles and reaction volumes, 3) analysed PCR products in three different sequencing runs and in seven different libraries, and 4) applied two different library preparation protocols (with and without an extra amplification of 10 cycles) (Table 2). In addition, the sequencing depth ranged from 2363 to 335822 reads per sample (Dataset 1).
To evaluate the accuracy and reproducibility of taxonomic classification using a low information content of ~140 nt compared to a maximal information content of ~1500 nt, we compared the NG-Tax classification of all 55 reference sequences used in this study trimmed to V4 and V5-V6, with a classification of the corresponding full length reference sequences using the RDP classifier (RDPc)24 (Figure 2). At family level, all three classifications (i.e. full length, V4 and V5-V6) were in complete concordance for all phylotypes. Correspondingly, the consistency at genus level was very high. Only a few phylotypes (nine and five for V4 and V5-V6 amplicons, respectively) that belong to poorly classified families such as Peptostreptococcaceae, Ruminococcaceae and Enterobacteriaceae attained higher resolution using the full length sequences and RDPc. In turn, for Pseudobutyrivibrio (PT35), a higher resolution was attained with short reads due to the high specificity of the hypervariable regions, which can be overshadowed when using the full length sequence. Lastly, only two (PT51, PT52, V4) and one (PT51, V5-V6) assignments at genus level (both members of the Enterobacteriaceae) were incongruent between classification of the short and full length sequences. Overall, the V5-V6 amplicons outperformed the V4 amplicons because this region allowed for differentiation between members of the Enterobacteriaceae. The average taxonomic specificity (percentage of hits with an identical taxonomic label) for all reference phylotypes was 96% for both regions, with an average of 1485 and 635 hits for regions V4 and V5-V6, respectively. The high specificity and high number of hits at very high identity thresholds, combined with the fact that the vast majority of V4 and V5-V6 based assignments matched, testify to the reliability and quality of the assignments.
To assess the ability to reproduce the expected composition of the MCs we benchmarked NG-Tax with QIIME, a commonly employed 16S rRNA gene amplicon analysis pipeline. The reproduction of MC compositional profiles generated by amplicon sequencing on Illumina platforms commonly suffers from a high fraction of poorly classified and spurious OTUs9. Using QIIME, up to 20% of OTUs per sample could not be assigned beyond the class level (Figure 3). In contrast, with NG-Tax we observed excellent reproduction of the expected profiles (Figure 4). An average of 92.02% of the reads could be assigned to genus level and 99.94% to at least family level. Spurious genera (Robinsoniella, Subdoligranulum, Cupriavidus, Ralstonia, Kluyvera and Pantoea) represented on average only 0.02% of the reads per sample compared to an average of 23% misclassified reads using QIIME38. One template, PT17 (Parabacteroides), attracted so much sequencing error in the V4 region that it was rendered undetectable although it was amplified by the primers (Supplementary Figure 1). Therefore, to test both pipelines without this sequencing anomaly, it was removed from the analysis.
Richness and diversity measures are important for understanding community complexity and dynamics. Among these measures, α-diversity is defined as the diversity within a sample, which is often estimated based on the abundance distribution (evenness) and number (richness) of species, whereas β-diversity is defined as the partitioning of diversity among communities. The ability of researchers to quantify richness and diversity hinges on an accurate assessment of the composition of these communities39. For microbial communities, this has been particularly challenging, as none of the existing molecular microbial ecology methods normally capture more than a small proportion of the estimated total richness in most microbial communities40. For deep sequencing based approaches, filtering strategies that remove low-abundance reads make it impossible to apply richness estimation metrics such as the Chao1 index and the ACE coverage estimator, because low-abundance read counts are included in their calculations. Conversely, richness estimates based on unfiltered datasets are unlikely to be accurate, if many of the reads actually represent PCR and/or sequencing errors9. In contrast to purely OTU-based methods, divergence-based methods account for the fact that not all species within a sample are equally related to each other, considering two communities to be similar if they harbour the same phylogenetic lineages, even if the species representing those lineages in each of the communities are different. Consequently, these methods are more powerful than purely OTU-based methods, because similarity in 16S rRNA gene sequence often correlates with phenotypic similarity in key features such as metabolic capabilities. An added benefit is that small errors that are likely due to unfiltered sequencing errors, are punished less severely because OTUs that are only a few nt distant from each other due to error are still closely related using divergence based indices. 
Therefore, these indices probably provide a better estimate of the true diversity for data generated by high throughput next generation technology sequencers.
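The dependence of richness estimators on low-abundance counts can be made concrete with the Chao1 index. The sketch below uses the bias-corrected form, S_obs + F1(F1 − 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs observed exactly once and twice; the toy read counts are invented for illustration and are not data from this study.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 richness estimate from a list of OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                          # observed richness
    f1 = sum(1 for c in counts if c == 1)        # singletons
    f2 = sum(1 for c in counts if c == 2)        # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def abundance_filter(otu_counts, min_rel_abundance=0.001):
    """Keep only OTUs at or above the relative abundance threshold."""
    total = sum(otu_counts)
    return [c for c in otu_counts if c / total >= min_rel_abundance]


# Hypothetical sample: four real OTUs plus a tail of rare, possibly spurious ones.
raw = [5000, 3000, 1500, 400, 2, 2, 1, 1, 1, 1, 1, 1]
filtered = abundance_filter(raw)

print(chao1(raw))       # estimate driven by the singleton/doubleton tail
print(chao1(filtered))  # no singletons left, so the estimate equals S_obs
```

After filtering, F1 is zero and the estimator can no longer extrapolate beyond the observed richness, whereas on unfiltered data the extrapolation is driven by a tail that may consist largely of errors.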
Because the focus of NG-Tax is to retain as much biological signal as possible while minimizing the impact of any technical choice, divergence-based α-diversity (Phylogenetic Diversity (PD)41) and β-diversity (UniFrac39) metrics were used to visualize the diversity within and between MCs (Figure 5). The results obtained with QIIME suffered from all of the previously described technological artifacts. The MCs clustered by primer pair instead of MC, and within each cluster the structure, i.e. the position of MCs relative to each other, was different. More importantly, the true biological variation depicted by the expected composition was reproduced by neither primer pair (Figure 5C). Based on these results, not only would the Principal Coordinates Analysis (PCoA) based conclusions have been different for both primer pairs, but the differences in taxonomic classification could also lead to significant changes in identified biomarkers, in line with what has previously been observed by He and co-workers28. Here we show that replicability within a variable region was attained. The more important reproducibility, however, i.e. the corroboration of findings by reproduction in different independent setups that use e.g. different primers, was not. This is an important observation because biological findings should be insensitive to independent methodologies5. In line with the above, the observed α-diversity (PD) was also highly inflated and the biological order was not reproduced (Figure 5D). In contrast, NG-Tax provided a clear separation of samples by MC type and their representative expected samples regardless of variable region, PCR protocol, sequencing run, library and sequencing depth. These results are remarkable, given the biases associated with each of these categories and the difference in resolution between the two regions (Figure 5A).
Moreover, MC2, MC3 and MC4 were very similar, sharing the same genera and largely the same phylotypes, only differing in relative distribution (Table 1). Correspondingly, rarefaction curves for α-diversity (Figure 5B) showed excellent reproduction of the true diversity. A perfect overlap cannot be achieved since the expected MCs do not account for sequencing or PCR errors, and these factors cannot be completely removed from real sequencing data.
An increasing number of studies have shown that the chosen methodology rather than the natural variance is responsible for the greatest variance in microbiome studies5,18,23,27,32–34. Some authors raised their concern with comparing data generated using different strategies29, which basically suggests that true reproducibility (i.e. using different approaches and drawing the same biological conclusions) cannot be attained. This is an alarming observation since studies are often used to identify biomarker organisms associated with certain host phenotypes (often comparing a diseased state to a healthy state), yet the use of different primers might show different biomarkers8,22,23,27,28,30. So far, neither currently available pipelines nor taxonomic classifiers have been able to efficiently reduce the noise in this type of data. Nevertheless, in properly de-noised datasets, taxonomical profiles, richness and diversity should be close to the expected values, and the abundance of unassigned and poorly assigned reads should be low except when dealing with largely unexplored environments that are not yet sufficiently covered by the reference databases. At lower noise levels different variable regions should yield similar conclusions with small variations due to region specific resolution, and minor changes in the experiment should still deliver the same biological conclusions. Here we presented NG-Tax, an improved pipeline for 16S rRNA gene amplicon sequencing data, which continues to be a backbone in the analysis of microbial ecosystems. Several novel steps ensure much needed improved robustness against errors associated with technical aspects of these studies, such as PCR protocols, choice of 16S rRNA gene variable region and variable rates of sequencing error27,29,32.
The commonly reported problems, such as many un- or poorly classified OTUs, inflated richness and diversity, taxonomic profiles that do not match the expected ones, region dependent taxonomic classification and results being highly dependent on minor changes in the experimental setup, have been tackled with NG-Tax. Despite the short read length (~140 nt) and all technical biases, the average taxonomic assignment specificity for the phylotypes included in the MCs was 96%. In addition, 92.02% of the reads could be assigned at genus level and 99.94% to at least family level. Spurious genera represented only 0.02% of the reads per sample. More importantly, rarefaction curves and PCoA plots confirmed improved performance of NG-Tax with respect to clustering based on biology rather than technical aspects, such as sequencing run, library or choice of 16S rRNA gene region. Therefore NG-Tax represents a method for 16S rRNA gene amplicon analysis with improved qualitative and quantitative representation of the true sample composition. Additionally, the high robustness against technical bias associated with 16S rRNA gene amplicon studies will improve comparability between studies and facilitate efforts towards standardization.
Primer pairs 515F (5’-GTGCCAGCMGCCGCGGTAA) - 806R (5’-GGACTACHVGGGTWTCTAAT) and BSF784F (5’-RGGATTAGATACCC) - 1064R (5’-CGACRRCCATGCANCACCT) have been previously reported for amplification of the V48 and V5-V627 regions of the bacterial 16S rRNA gene, respectively. They were selected based on 1) experimental validation, 2) taxonomic coverage of the relevant ecosystem (Supplementary Figure 2) and 3) adherence to specific rules associated with the sequencing platform, such as an amplicon size of <500 nt. Unless noted otherwise all primers were ordered at Biolegio (Nijmegen, Netherlands).
At the time of sequencing Illumina’s Hiseq2000 allowed for multiplexing of up to 12 samples per lane using an index or barcode read provided by Illumina. To achieve optimal sample throughput and phylogenetic depth, 70 primers containing a custom designed 8 nt barcode were developed to combine with the Illumina barcodes to reach a maximum throughput of 12×70 samples per lane. Each set of 70 barcoded samples is referred to as a “library”. Low diversity samples, such as 16S rRNA gene amplicons, can lead to problems with base calling due to overexposure of fluorescent labels. Therefore, the set of 70 barcodes was specifically designed to possess an equal base distribution over their complete length. Additionally, to avoid differential amplification, a two-base “linker” sequence that is not complementary to any 16S rRNA sequence at the corresponding position, from a database that contains 1132 phylotypes associated with the human GI tract42, was inserted between the primer and barcode. The resulting set of 70 barcoded primers was checked for avoidance of secondary structure formation within or between primers (i.e., primer-dimers) or between barcodes and primers, using PrimerProspector43.
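The requirement of an equal base distribution over the complete barcode length can be checked with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the four toy barcodes are not the barcodes used in this study, and "equal distribution" is interpreted as each of the four bases occurring at a fraction of 0.25 at every position.

```python
from collections import Counter


def position_base_fractions(barcodes):
    """Return, for each barcode position, the fraction of A, C, G and T."""
    length = len(barcodes[0])
    if any(len(b) != length for b in barcodes):
        raise ValueError("all barcodes must have the same length")
    n = len(barcodes)
    fractions = []
    for i in range(length):
        column = Counter(b[i] for b in barcodes)
        fractions.append({base: column.get(base, 0) / n for base in "ACGT"})
    return fractions


def max_imbalance(barcodes):
    """Largest deviation from the ideal fraction of 0.25 at any position."""
    return max(abs(frac - 0.25)
               for position in position_base_fractions(barcodes)
               for frac in position.values())


# Toy barcode set in which every base occurs once at every position.
balanced = ["ACGT", "CGTA", "GTAC", "TACG"]
print(max_imbalance(balanced))

# Number of samples per lane when 12 Illumina indices are combined
# with 70 custom barcodes, as described in the text.
print(12 * 70)
```

A value close to zero indicates that all four fluorescent channels are used evenly during the barcode cycles, which is the property the barcode design aims for.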
All MCs were mixed in triplicate to account for pipetting error. These MCs ranged from 17–55 species in both equimolar and staggered compositions. One MC contained members at very low abundances of 0.1, 0.01 and 0.001% (Table 2). Amplicons were generated either from cloned 16S rRNA gene amplicons, isolates available in the local culture collection of the Laboratory of Microbiology, Wageningen University, or strains ordered from DSMZ and cultured according to DSMZ recommendations, after which genomic DNA was isolated using the Genejet genomic DNA isolation kit (Thermo Fisher Scientific AG, Reinach, Switzerland). A 16S rRNA gene specific PCR was performed using the universal primers 27F (5’-GTTTGATCCTGGCTCAG) - 1492R (5’-GGTTACCTTGTTACGACTT) to obtain full length amplicons. Size and concentration of the amplicons were checked on a 1% agarose gel, after which they were column purified and quantified with the Qubit 2.0 fluorometer and dsDNA BR assay kit (Invitrogen, Eugene, USA). Amplicons were mixed in the MCs to obtain the specified relative abundances. High quality full length reference sequences of all MC members were obtained by Sanger sequencing at GATC Biotech AG (Constance, Germany) with sequencing primers 27F - 1492R in order to confirm their identity. The MCs were diluted 10³-fold and subsequently used as templates in PCRs for the generation of barcoded PCR products.
Unless noted otherwise, each sample was amplified in triplicate using Phusion hot start II high fidelity polymerase (Thermo Fisher Scientific AG), checked for correct size and concentration on a 1% agarose gel and subsequently combined and column-purified with the High pure PCR cleanup micro kit (Roche diagnostics, Mannheim, Germany). Each 40 μL PCR reaction contained 28.4 μL nucleotide free water (Promega, Madison, USA), 0.4 μL of 2 U/μL polymerase, 8 μL of 5× HF buffer, 0.8 μL of 10 μM stock solutions of each of the forward (515F) and reverse (806R) primers, 0.8 μL 10 mM dNTPs (Promega) and 0.8 μL template DNA (10³-fold diluted 200 ng/μL stock). Reactions were held at 98°C for 30 s, and amplification proceeded for 25 cycles at 98°C for 10 s, 50°C for 10 s and 72°C for 10 s, followed by a final extension of 7 min at 72°C. Purified amplicons were quantified using Qubit. For primer pair BSF784F-1064R the thermal cycling conditions were identical to those detailed above except that the annealing temperature was 42°C. To quantify noise generated by the PCR protocol, several reactions were performed with 30 or 35 cycles and with a single 100 μL reaction instead of three pooled 40 μL reactions (Table 2).
A composite sample for sequencing was created by combining equimolar amounts of amplicons from the individual samples, followed by gel purification and ethanol precipitation to remove any remaining contaminants. The resulting libraries were sent to GATC Biotech AG for sequencing on an Illumina Hiseq2000 instrument.
We have used QIIME to benchmark NG-Tax. Illumina fastq files were de-multiplexed, quality filtered and analyzed using QIIME (v. 1.9)35 with closed reference OTU picking, using default settings and quality parameters as previously reported9.
The NG-Tax pipeline, user manual, and the files and code needed to reproduce the presented results are available for download at http://www.systemsbiology.nl/NG-Tax/.
rRNA: ribosomal RNA; MC: Mock Community; OTU: Operational Taxonomic Unit; PT: Phylotype; RDP: Ribosomal Database Project; RDPc: RDP classifier; PD: Phylogenetic Diversity; PCoA: Principal Coordinates Analysis
F1000Research: Dataset 1. Raw data of NG-Tax pipeline for analysis of 16S rRNA amplicons from complex biome, 10.5256/f1000research.9227.d130120 (ref. 44)
Sequence data have been deposited in the European Nucleotide Archive45 under accession numbers [ENA:PRJEB11702] (http://www.ebi.ac.uk/ena/data/view/PRJEB11702; amplicon sequencing data for all 49 samples) and [ENA:LN907729-LN907783] (full length 16S rRNA gene sequences for all 55 PTs).
GDAH and JRG wrote the manuscript. JRG conceived NG-Tax and performed the statistical analysis. GDAH, PS, EGZ and HS designed the experiment, GDAH constructed the MCs and prepared libraries 1-2 for sequencing. DS and CG provided the data for libraries 3-7. HS, DS, EGZ and PS helped to draft the manuscript, of which the final version was read and approved by all the authors.
This work was funded by Top Institute Food and Nutrition (TIFN, Wageningen, The Netherlands), a public-private partnership on precompetitive research in food and nutrition. We are grateful for additional support from the European Community’s Seventh Framework Program (FP7/2007–2013) under grant agreement no. 227197 Promicrobe.
We thank Gianina Bacanu for generating libraries 3–7 and Jesse van Dam for revising the scripts.
Competing Interests: No competing interests were disclosed.