Data Note

The Microbe Directory: An annotated, searchable inventory of microbes’ characteristics

[version 1; peer review: 1 approved, 3 approved with reservations]
PUBLISHED 05 Jan 2018

Abstract

The Microbe Directory is a collective research effort to profile and annotate more than 7,500 unique microbial species from the MetaPhlAn2 database that includes bacteria, archaea, viruses, fungi, and protozoa. By collecting and summarizing data on various microbes’ characteristics, the project comprises a database that can be used downstream of large-scale metagenomic taxonomic analyses, allowing one to interpret and explore their taxonomic classifications to have a deeper understanding of the microbial ecosystem they are studying. Such characteristics include, but are not limited to: optimal pH, optimal temperature, Gram stain, biofilm-formation, spore-formation, antimicrobial resistance, and COGEM class risk rating. The database has been manually curated by trained student-researchers from Weill Cornell Medicine and CUNY—Hunter College, and its analysis remains an ongoing effort with open-source capabilities so others can contribute. Available in SQL, JSON, and CSV (i.e. Excel) formats, the Microbe Directory can be queried for the aforementioned parameters by a microorganism’s taxonomy. In addition to the raw database, The Microbe Directory has an online counterpart (https://microbe.directory/) that provides a user-friendly interface for storage, retrieval, and analysis into which other microbial database projects could be incorporated. The Microbe Directory was primarily designed to serve as a resource for researchers conducting metagenomic analyses, but its online web interface should also prove useful to any individual who wishes to learn more about any particular microbe.

Keywords

Microbe, Metagenomics, Microbiome, Next-Generation Sequencing, Metadata, Database

Introduction

With the advent of next-generation sequencing technologies, there has been a surge of metagenomic and microbiome studies in the last decade, ranging from studies of the human microbiome1 to the environment (water and soil)25 and city surfaces6,7. All these studies depend heavily on bioinformatics analyses that translate the sequences they uncover into taxonomic profiles of their samples. However, an immediate challenge posed by taxonomic outputs is the interpretation of the data. Learning more about a microorganism’s properties, such as optimal pH and temperature, presence in the human microbiome, ability to form spores or biofilms, and antimicrobial sensitivity, amongst many others, is key to understanding the biochemical and ecological dynamics of the microbiomes under study. Despite the presence of several databases that include some of this information, such as MicrobeWiki, PATRIC, ARDB, and IMG-JGI, these databases are either incomplete or focus on a specific characteristic (e.g. antimicrobial resistance). The Microbe Directory seeks to fill this gap with an online tool that aggregates these data and expands their annotations, providing a resource for exploring functional, medical, or biological traits found in any microbial community.

Methods

MetaPhlAn2 list of species

The list of distinct species that was subject to curation was generated from the MetaPhlAn2 database, a computational tool for profiling the composition of microbial communities from sequencing data. MetaPhlAn2 works by relying on unique clade-specific marker genes identified from more than 16,000 reference genomes from NCBI and RefSeq8. It provides a 7-level (kingdom to strain) consistent taxonomic characterization of known domains of life and currently has identified >7,500 unique species in its database. This database was specifically chosen for the Microbe Directory due to its prevalent usage in microbiome and metagenomic studies9, allowing researchers to directly integrate the Microbe Directory into their research to learn more from the MetaPhlAn output10. Furthermore, there is a built-in capability for researchers to contribute and expand the Microbe Directory beyond the species currently curated in the database (see Using the Microbe Directory).
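As an illustration of the taxonomic characterization described above, the following minimal sketch splits a MetaPhlAn2-style clade name into its ranks. It assumes the pipe-delimited format with rank prefixes (`k__`, `p__`, and so on, with `t__` for strain) that MetaPhlAn2 profiles typically use; the function names are hypothetical and are not part of the Microbe Directory code.

```python
# Rank prefixes assumed for MetaPhlAn2 clade names, from kingdom down to strain.
RANK_PREFIXES = {
    "k": "kingdom", "p": "phylum", "c": "class", "o": "order",
    "f": "family", "g": "genus", "s": "species", "t": "strain",
}


def parse_clade(clade):
    """Split a pipe-delimited clade string into a {rank: name} dict."""
    lineage = {}
    for part in clade.split("|"):
        prefix, sep, name = part.partition("__")
        if sep and prefix in RANK_PREFIXES:
            lineage[RANK_PREFIXES[prefix]] = name
    return lineage


def species_name(clade):
    """Return the species name with spaces, or None above species level."""
    name = parse_clade(clade).get("species")
    return name.replace("_", " ") if name else None
```

The species name recovered this way is the natural key for looking up a record in the directory.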

Selection and training of researchers

The Microbe Directory database was curated by a team of trained undergraduate, graduate, and medical students from City University of New York (CUNY) Hunter College, Macaulay Honors College, and Weill Cornell Medicine (see full list of students in Acknowledgements). The student-researchers were selected from a pool of applicants and underwent a three-hour training session that a) explained the objective of the research project and the desired outcome, b) provided a detailed and thorough explanation of each of the parameters that were the subject of research, and c) provided clear instructions on how to search the internet for the parameters for each species. They were also given a tutorial on how to conduct the research for a sample of 10 species. They were given a list of annotation-based websites to assist their research, but they were not limited to using only those sites (see Annotation Tutorial and Guidelines in Supplementary File 1).

After every entry, students inserted citation links to the sources they utilized for the information they inputted. Each student-researcher independently worked 4–5 hours per week to curate parameters for 10 species per week, for a total of 20 weeks. To ensure that students were not making errors during curation, the first three weeks of the project were heavily monitored and entries were manually checked for inaccuracies by the project leads. After the first 3-week trial, only two randomly selected species were checked manually from every submitted entry of 10 species per week, per student. Considerable error rates (3 or more incorrect annotations out of 10 being the threshold) consequently meant the student had to resubmit the entire set of 10 species the following week. While there is always the potential for human error in manually curated databases, the Microbe Directory has a feature where anyone can make an account and submit edits and changes to the information hosted in the database. Thus, there is potential for the Microbe Directory to continue to grow and expand, but also ensure minimal errors in its database.
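The spot-check procedure described above can be sketched as follows. This is only an illustration under stated assumptions: the text does not say how the two species were drawn or how incorrect annotations were counted, so the function names and the use of Python's `random` module are hypothetical.

```python
import random


def pick_spot_checks(species, n_checks=2, seed=None):
    """Randomly pick n_checks distinct species from one weekly submission."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(species), n_checks)


def must_resubmit(n_incorrect, threshold=3):
    """Apply the stated threshold: 3 or more incorrect annotations."""
    return n_incorrect >= threshold
```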

Building the microbe directory

Table 1 defines the various microbial characteristics and categories of information that were curated to build the Microbe Directory. The parameters chosen were strictly objective features of microbes that are important to help interpret and understand the findings and context of whatever microbiome a researcher is studying. There is built-in potential to expand the Microbe Directory and for researchers to contribute more characteristics of these microbes, including native location, industrial applications, and associated symptoms/diseases; these features were considered for inclusion in the Microbe Directory but, due to their subjective nature, were omitted to maintain the quality control outlined above. Several databases were used to collect this information, including COGEM, MicrobeWiki, BacMap, ATCC, PATRIC, ARDB, GOLD, HOMD, and BEI Resources (see Annotation Tutorial and Guidelines and Links in Supplementary File 1). These peer-reviewed resources and databases have been well-established in the literature as reliable sources of information for researchers. Now, this information can be housed in one place, allowing for more efficient and comprehensive interpretation of microbiome analysis. Figure 1 is a heatmap summarizing the current information hosted in the Microbe Directory’s database across all species and parameters.

Table 1. The Microbe Directory inventory parameters and descriptions.

Parameter | Definition and notes
Optimal pH | The optimal pH at which this species grows. If the species was not widely studied, the American Type Culture Collection (ATCC) was used to determine the optimal pH for storage. If two widely separated pH ranges were reported, the average was taken.
Optimal temperature | The optimal temperature at which this species grows. If the species was not widely studied, the ATCC was used to determine the optimal temperature for storage. If two widely separated temperature ranges were reported, the average was taken.
COGEM pathogenicity rating | COGEM released a comprehensive database of pathogenicity assessments for around 2575 bacterial species in 2011 (ref. 10). The database ranks the pathogenicity of species on a scale of 1 to 4: 1 denotes a species not belonging to a recognized group of disease-invoking agents in humans or animals and having an extended history of safe usage; 4 denotes a species that can cause a very serious human disease for which no prophylaxis is known.
Antimicrobial susceptibility | Are there any known antibiotics that this species is sensitive to? No = 0, Yes = 1
Spore-formation | Is the species spore-forming? No = 0, Yes = 1
Biofilm-formation | Is the species biofilm-forming? No = 0, Yes = 1
Extremophile | Extremophiles are organisms that live in extreme environments, as opposed to organisms that live in moderate (mesophilic) environments. This category includes acidophiles, thermophiles, osmophiles, halophiles, oligotrophs, and others. Mesophile = 0, Extremophile = 1
Gram-stain | Negative = 0, Positive = 1, Indeterminate = 2
Found in human microbiome | Microbes that live anywhere in the human body and are not pathogenic to humans (i.e. not capable of causing human disease). No = 0, Yes = 1
Plant pathogen | Does the species cause disease in plants? No = 0, Yes = 1
Animal pathogen | Does the species cause disease in animals? No = 0, Yes = 1
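
To make the coding scheme of Table 1 concrete, here is a minimal sketch that turns the integer codes into readable labels. The field names (`gram_stain`, `spore_forming`, and so on) are hypothetical stand-ins and may not match the column names used in the actual database; only the code values are taken from the table.

```python
# Code values follow Table 1; the field names are assumptions for this sketch.
GRAM_STAIN = {0: "negative", 1: "positive", 2: "indeterminate"}
BINARY_FIELDS = (
    "antimicrobial_susceptibility", "spore_forming", "biofilm_forming",
    "extremophile", "human_microbiome", "plant_pathogen", "animal_pathogen",
)


def decode_record(record):
    """Translate Table 1 integer codes into labels.

    Missing or unrecognized values are reported as None.
    """
    decoded = {}
    for field in BINARY_FIELDS:
        decoded[field] = {0: False, 1: True}.get(record.get(field))
    decoded["gram_stain"] = GRAM_STAIN.get(record.get("gram_stain"))
    cogem = record.get("cogem_class")
    decoded["cogem_class"] = cogem if cogem in (1, 2, 3, 4) else None
    return decoded
```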

Figure 1. Microbe Directory heatmap.

Annotation types (x-axis) are represented across the online database and the numbers of each category (y-axis, left side) are shown, including Viroids (purple), Viruses (yellow), Eukaryotes (blue), Prokaryotes (green), and Fungi (red). The scales for each of the types of metadata (right) are also shown for binary classifications (black, white) and quantitative traits (red scales). The heatmap was constructed using R (version 3) and Illustrator.

Pre-search. Before assignments were given to the student-researchers, the databases listed above were pre-searched in order to collect as much information as possible about the microbes. This was done using each website's search page. The species name was used as the search query, and the search results HTML page was parsed using regular expressions. The first search result that contained the microbe's binomial name and a link to the website's entry for that microbe was used as the pre-search's result. Such links for each microbe were compiled and given to each student with his or her weekly assignments. The student-researchers were only given the link to the entry, and they then had to manually find the relevant information (e.g. "optimal pH"). Such a system allowed the students to manually confirm that the pre-search identified the correct entry for the microbe and not just a microbe with a similar name. We also supplemented the manual curation by parsing MicrobeWiki for common keywords that could indicate particular features. We found that we could extract useful data for pathogenicity, biofilm-formation, microbe shape, halophilicity, spore formation, and metabolism. We were able to extract some subset of these features for 331 of the microbes that had been manually curated.
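
The pre-search step described above might look roughly like the sketch below. It is not the project's actual code: the regular expressions, the keyword lists, and the function names are all assumptions made for illustration, and real search pages would need site-specific patterns.

```python
import re

# Matches <a ... href="...">text</a>; the link text may contain inline tags.
LINK_RE = re.compile(
    r'<a\s[^>]*?href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL
)
TAG_RE = re.compile(r"<[^>]+>")

# Hypothetical keyword lists; the keywords actually used are not given.
FEATURE_KEYWORDS = {
    "spore_forming": ("spore-forming", "endospore"),
    "biofilm_forming": ("biofilm",),
    "halophile": ("halophil",),
}


def first_matching_link(html, binomial):
    """Return the href of the first link whose text contains the binomial name."""
    words = r"\s+".join(re.escape(w) for w in binomial.split())
    wanted = re.compile(r"\b" + words + r"\b", re.IGNORECASE)
    for href, inner in LINK_RE.findall(html):
        if wanted.search(TAG_RE.sub("", inner)):
            return href
    return None


def keyword_hits(text):
    """Return candidate features whose keywords appear in the text.

    Hits are only candidates: a phrase such as "non-spore-forming" would
    also match, so each hit still needs manual confirmation.
    """
    lowered = text.lower()
    return {
        feature
        for feature, keywords in FEATURE_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    }
```

Returning only the link, rather than scraped values, mirrors the workflow in the text: the student still opens the entry and confirms that it refers to the right microbe.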

Text validation and normalization. Student-researchers filled out the columns for a given microbe using an Excel spreadsheet. Each entry was filled out as free-form text, so it was necessary to later normalize and validate the text. Valid column types included positive real numbers (e.g. optimal pH), ranges of positive real numbers (e.g. range of optimal pH values), series of ranges (e.g. multiple optimal pH ranges), binary values (e.g. spore forming or non-forming), ternary values (e.g. Gram-positive, Gram-negative, Gram-indeterminate), and quaternary values (e.g. COGEM Classes 1-4). Regular expressions (RegEx) were used to ensure that a given column entry conformed to the correct type (i.e. validation); validated columns were then transformed to a common form (i.e. normalization). The common form for each entry is the form used in the database.
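
A minimal sketch of this validate-then-normalize approach is shown below. The patterns, accepted spellings, and common forms are assumptions chosen for illustration; the project's actual regular expressions are not reproduced here.

```python
import re

NUMBER = r"\d+(?:\.\d+)?"  # unsigned decimal number
NUMBER_RE = re.compile(rf"^\s*({NUMBER})\s*$")
RANGE_RE = re.compile(
    rf"^\s*({NUMBER})\s*(?:-|–|to)\s*({NUMBER})\s*$", re.IGNORECASE
)
BINARY = {"no": 0, "n": 0, "0": 0, "yes": 1, "y": 1, "1": 1}
GRAM = {"negative": 0, "positive": 1, "indeterminate": 2}


def normalize_numeric(entry):
    """Validate a number or range and return it as a (low, high) tuple."""
    match = NUMBER_RE.match(entry)
    if match:
        value = float(match.group(1))
        return (value, value)
    match = RANGE_RE.match(entry)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        if low > high:
            raise ValueError(f"range is reversed: {entry!r}")
        return (low, high)
    raise ValueError(f"not a number or range: {entry!r}")


def normalize_series(entry):
    """Normalize several ranges separated by commas or semicolons."""
    return [normalize_numeric(p) for p in re.split(r"[;,]", entry) if p.strip()]


def normalize_binary(entry):
    """Map free-form yes/no text to 1/0."""
    key = entry.strip().lower()
    if key not in BINARY:
        raise ValueError(f"not a binary value: {entry!r}")
    return BINARY[key]


def normalize_gram(entry):
    """Map text such as 'Gram-positive' to the ternary code."""
    key = re.sub(r"^gram[\s-]*", "", entry.strip().lower())
    if key not in GRAM:
        raise ValueError(f"not a Gram-stain value: {entry!r}")
    return GRAM[key]
```

Raising `ValueError` on anything that fails validation keeps malformed free-form entries out of the normalized output, so they can be sent back for manual correction.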

Using the Microbe Directory

The Microbe Directory can be accessed online at https://microbe.directory. This interface provides individual users a way to browse and search the directory’s contents in an interactive format. Such a representation should prove useful for researchers who need information for a particular microbe. While viewing the page for a given microbe, registered users can also submit edits to that microbe’s data. Individuals can register to contribute to the Microbe Directory by signing up on the website. The edits are then put in a queue to be later reviewed by The Microbe Directory team (HS, DAW, RS).

In addition to the interactive web interface, the main website provides links to the project’s GitHub and BitBucket repositories. From the GitHub repository, users can download the SQLite database used to power the website. Users will also find JSON and CSV (i.e. Excel) representations of the database, which are auto-generated from the SQLite database using Python scripts. Since the Microbe Directory is meant to grow and expand over time, researchers wanting to make more substantial contributions can suggest changes to the database through our GitHub page. The requested changes will be merged as appropriate and could be incorporated into future releases. Moreover, there is a tutorial on the GitHub repository that shows users how they can use the JSON version of the database given a MetaPhlAn2 output file. Finally, the website used to power the web interface can also be accessed and modified through a separate BitBucket repository, which can also be accessed through the main website.
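
The following sketch illustrates one way to join a MetaPhlAn2 profile to the JSON version of the database. It is not the tutorial from the repository. It assumes that the JSON is an array of objects carrying a `species` key (a hypothetical schema) and that the profile is tab-separated with `#` header lines and clade names in the first column.

```python
import json


def load_directory(json_text, name_key="species"):
    """Index records by species name (assumed schema: a JSON array of objects)."""
    return {record[name_key]: record for record in json.loads(json_text)}


def species_abundances(profile_lines):
    """Collect {species name: relative abundance} from species-level rows."""
    result = {}
    for line in profile_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        clade, abundance = line.rstrip("\n").split("\t")[:2]
        last = clade.split("|")[-1]
        if last.startswith("s__"):  # keep species-level rows only
            result[last[3:].replace("_", " ")] = float(abundance)
    return result


def annotate(profile_lines, directory):
    """Attach directory traits (or None if absent) to each detected species."""
    return [
        {"species": name, "abundance": abundance, "traits": directory.get(name)}
        for name, abundance in species_abundances(profile_lines).items()
    ]
```

In practice the two inputs would be read from files, for example by passing an open profile file as `profile_lines` and the contents of the downloaded JSON file as `json_text`.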

The Microbe Directory was designed to help researchers in the microbiome and metagenomics fields to learn more about the organisms they are identifying through their bioinformatics analyses. While this is only version 1.0 of the Microbe Directory, it is readily able to incorporate any contributions to the database to expand the microbial features included in our inventory. For more information on how to contribute to the project visit https://microbe.directory/.

Data availability

The web interface for the Microbe Directory can be found at https://microbe.directory/

The database and other files can also be found on the GitHub repository here: https://github.com/microbe-directory/microbe-directory and the BitBucket repository here: https://bitbucket.org/account/signin/?next=/microbedb/microbedb. Note: BitBucket requires a login, but account generation is free and there are no restrictions for signing up.

Archived code as at time of publication:

License: MIT

Comments on this article Comments (3)

  • Reader Comment 19 Feb 2018
    Qunfeng Dong, at Loyola University Chicago Medical School, USA
    19 Feb 2018
    Reader Comment
    Very interesting work! I think that this resource can be potentially very helpful for metagenomics research.  I applaud the research team's enormous efforts to organize students for this tedious yet ... Continue reading
  • Author Response 08 Feb 2018
    Christopher Mason, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, 10065, USA
    08 Feb 2018
    Author Response
    We agree and are discussing this now with the editors.
    Competing Interests: No competing interests were disclosed.
  • Reader Comment 06 Feb 2018
    Daniel McDonald, University of California, San Diego, USA
    06 Feb 2018
    Reader Comment
    The article states that "Each student-researcher independently worked 4–5 hours per week to curate parameters for 10 species per week, for a total of 20 weeks." Despite 80-100 hours of ... Continue reading
how to cite this article
Shaaban H, Westfall DA, Mohammad R et al. The Microbe Directory: An annotated, searchable inventory of microbes’ characteristics [version 1; peer review: 1 approved, 3 approved with reservations] Gates Open Res 2018, 2:3 (https://doi.org/10.12688/gatesopenres.12772.1)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to reviewer statuses:
Approved: The paper is scientifically sound in its current form and only minor, if any, improvements are suggested.
Approved with reservations: A number of small changes, sometimes more significant revisions, are required to address specific details and improve the paper's academic merit.
Not approved: Fundamental flaws in the paper seriously undermine the findings and conclusions.
Reviewer Report 22 Mar 2018
Elisabeth M. Bik, uBiome, San Francisco, CA, USA 
Approved with Reservations
Shaaban et al. describe The Microbe Directory, a database with more than 7,500 microbial species. This is a great initiative, in which a group of academic researchers, helped by a team of (under)graduate students have annotated bacteria, archaea, viruses and ... Continue reading
HOW TO CITE THIS REPORT
Bik EM. Reviewer Report For: The Microbe Directory: An annotated, searchable inventory of microbes’ characteristics [version 1; peer review: 1 approved, 3 approved with reservations]. Gates Open Res 2018, 2:3 (https://doi.org/10.21956/gatesopenres.13832.r26308)
  • Author Response 11 Sep 2018
    David Westfall, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, 10065, USA
    11 Sep 2018
    Author Response
    Hello Dr. Bik,

    Thank you for your comments on the manuscript and database. Please see below for responses to your points.

    Manuscript:
    1. Yes, "taxonomoic" was a
    ... Continue reading
Reviewer Report 15 Mar 2018
Nicole M. Vega, Biology Department, Emory University, Atlanta, GA, USA 
Approved
In this manuscript, the authors describe the creation and construction of the Microbe Directory, a resource for profiling and annotating species after large-scale metagenomic taxonomic analyses.

I very much like the idea of the Microbe Directory and ... Continue reading
HOW TO CITE THIS REPORT
Vega NM. Reviewer Report For: The Microbe Directory: An annotated, searchable inventory of microbes’ characteristics [version 1; peer review: 1 approved, 3 approved with reservations]. Gates Open Res 2018, 2:3 (https://doi.org/10.21956/gatesopenres.13832.r26309)
  • Author Response 11 Sep 2018
    David Westfall, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, 10065, USA
    11 Sep 2018
    Author Response
    Hello Dr. Vega,

    Thank you for your review. I have addressed your comments below:

    As far as updating the database in real-time from the sources is concerned, there ... Continue reading
Reviewer Report 15 Mar 2018
James E. McDonald, School of Biological Sciences, Bangor University, Bangor, UK 
Approved with Reservations
Concept:

The microbe directory is an excellent concept and aims to provide phenotypic and ecological profiles of approx. 7500 microbial species represented in the MetaPhlAn2 database. Although some information is present in other repositories, the Microbe Directory ... Continue reading
HOW TO CITE THIS REPORT
McDonald JE. Reviewer Report For: The Microbe Directory: An annotated, searchable inventory of microbes’ characteristics [version 1; peer review: 1 approved, 3 approved with reservations]. Gates Open Res 2018, 2:3 (https://doi.org/10.21956/gatesopenres.13832.r26238)
  • Author Response 11 Sep 2018
    David Westfall, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, 10065, USA
    11 Sep 2018
    Author Response
    Hello Dr. McDonald,

    Thank you very much for your review. Please find responses to your comments below:

    Manuscript:
    1. Abstract
      1. The directory was
    ... Continue reading
Reviewer Report 12 Jan 2018
David A. Coil, Genome Center, University of California, Davis, Davis, CA, USA 
Approved with Reservations
I love the idea behind “The Microbe Directory”. I think this information will be of great value and I really like the way it was generated with the help of students. With the ability to expand the database, I think ... Continue reading
HOW TO CITE THIS REPORT
Coil DA. Reviewer Report For: The Microbe Directory: An annotated, searchable inventory of microbes’ characteristics [version 1; peer review: 1 approved, 3 approved with reservations]. Gates Open Res 2018, 2:3 (https://doi.org/10.21956/gatesopenres.13832.r26191)
  • Author Response 09 Feb 2018
    Heba Shaaban, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, 10065, USA
    09 Feb 2018
    Author Response
    Thank you for your review, Dr. Coil.
     
    Paper:
     
    The edits you suggested regarding the paper will be published on version 2 of the manuscript. 
     
    As for the edits that contributors will be making ... Continue reading