Abstract
Background Targeted gene surveys of the 16S rRNA gene have become a standard method for profiling the membership and biodiversity of microbial communities. These studies rely upon specialized databases that provide reference sequences and their corresponding taxonomic classifications, but few independent evaluations of the nomenclature used in the taxonomic classifications have been performed.
Results Nomenclature data collected from the List of Prokaryotic names with Standing in Nomenclature, Prokaryotic Nomenclature Up-to-Date, and CyanoDB databases were used to validate the nomenclature contained in the taxonomic classifications in the Greengenes, RDP, and SILVA 16S rRNA gene reference databases. Across the three reference databases, between 82% and 97% of the genus annotations assigned to 16S rRNA gene reference sequences were deemed valid. Between 18% and 97% of the species annotations in Greengenes and SILVA were deemed valid. Misannotations included the use of metadata in place of taxonomic classifications, non-adherence to binomial nomenclature, and sequences classified as eukaryotic organelles or taxa.
Conclusions The misannotations identified in public 16S rRNA gene databases call into question the reliability of research conducted using these resources. As targeted gene surveys depend on high-quality marker gene databases, improved nomenclature accuracy will be necessary.
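The validation summarized in the Results can be illustrated with a minimal sketch. It is not the authors' pipeline: the name sets, the function names, and the regular expression below are assumptions made for illustration only, and a real check would load the full name lists from the nomenclature databases named above.

```python
import re

# Hypothetical excerpts of validly published names; in practice these
# would be loaded from nomenclature resources such as those named in
# the Results, not hard-coded.
VALID_GENERA = {"Escherichia", "Bacillus", "Vibrio"}
VALID_SPECIES = {"Escherichia coli", "Bacillus subtilis"}

# Assumed, simplified shape of a binomial: a capitalized genus followed
# by a lowercase specific epithet, with nothing else on the line.
BINOMIAL = re.compile(r"^[A-Z][a-z]+ [a-z]+$")


def classify_species(annotation, valid_species=VALID_SPECIES):
    """Label one species annotation as valid, unrecognized, or not_binomial."""
    name = annotation.strip()
    if not BINOMIAL.match(name):
        # Covers metadata such as "uncultured bacterium" and
        # placeholders such as "Bacillus sp."
        return "not_binomial"
    return "valid" if name in valid_species else "unrecognized"


def percent_valid(annotations, valid_names):
    """Return the percentage of annotations found in the set of valid names."""
    if not annotations:
        return 0.0
    hits = sum(1 for a in annotations if a.strip() in valid_names)
    return 100.0 * hits / len(annotations)


if __name__ == "__main__":
    genera = ["Escherichia", "Chloroplast", "Bacillus", "marine metagenome"]
    print(percent_valid(genera, VALID_GENERA))
    for s in ["Escherichia coli", "Bacillus sp.", "uncultured bacterium"]:
        print(s, "->", classify_species(s))
```

In this sketch an annotation that matches the binomial pattern but is absent from the reference set is reported as "unrecognized" rather than invalid, since a short hard-coded list cannot distinguish a misannotation from a name that is simply missing.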