Abstract
Most microbes inhabiting the planet cannot be easily grown in the lab. Metagenomic techniques provide a means to study these organisms, and recent advances in the field have enabled the resolution of individual genomes from metagenomes, so-called metagenome-assembled genomes (MAGs). In addition to expanding the catalog of known microbial diversity, the systematic retrieval of MAGs stands as a tenable divide-and-conquer reduction of metagenome analysis to the simpler problem of single genome analysis. Many leading approaches to MAG retrieval depend upon time-series or transect data, and their effectiveness is a function of community complexity, target abundance and depth of sequencing. Promising alternative methods, which do not require time-series data, are based upon the high-throughput sequencing technique called Hi-C.
The Hi-C technique produces read-pairs which capture in vivo DNA-DNA proximity interactions (contacts). The physical structure of the community modulates the signal derived from these interactions, and a hierarchy of interaction rates exists (intra-chromosomal > inter-chromosomal > inter-cellular).
We describe an unsupervised method, bin3C, that exploits the hierarchical nature of Hi-C interaction rates to resolve MAGs from a single time-point. As a quantitative demonstration, we validate the method against the ground truth of a simulated human faecal microbiome. Lastly, we directly compare our method against a recently announced proprietary service, ProxiMeta, which also performs MAG retrieval using Hi-C data.
bin3C has been implemented as a simple open-source pipeline and makes use of the unsupervised community detection algorithm Infomap (https://github.com/cerebis/bin3C).
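The idea of using the interaction-rate hierarchy to separate genomes can be illustrated with a toy example. The sketch below is not the bin3C implementation: the contig names and contact counts are invented, and Infomap is replaced by a much simpler stand-in (connected components of the contact graph after dropping weak edges), chosen only so that the example runs without external dependencies. Real Hi-C counts would likely also need normalisation before any such grouping.

```python
from collections import defaultdict

# Hypothetical Hi-C read-pair counts between contigs (invented for illustration).
# Contigs a1..a3 belong to one genome, b1..b2 to another; contacts within a
# genome are far more frequent than contacts between cells.
contacts = {
    ("a1", "a2"): 120,
    ("a2", "a3"): 95,
    ("a1", "a3"): 80,
    ("b1", "b2"): 110,
    ("a1", "b1"): 3,   # weak inter-cellular signal
    ("a3", "b2"): 2,   # weak inter-cellular signal
}


def bin_contigs(contacts, min_contacts):
    """Group contigs into bins: connected components of the contact graph
    after discarding edges with fewer than min_contacts read-pairs.

    This thresholding step is an assumed stand-in for community detection,
    not the algorithm used by bin3C.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (u, v), n in contacts.items():
        ru, rv = find(u), find(v)  # registers both contigs, even for weak edges
        if n >= min_contacts and ru != rv:
            parent[ru] = rv

    bins = defaultdict(set)
    for contig in list(parent):
        bins[find(contig)].add(contig)
    return sorted(sorted(members) for members in bins.values())


bins = bin_contigs(contacts, min_contacts=10)
print(bins)
```

With the threshold set above the weak inter-cellular counts, the two toy genomes fall into separate bins; lowering it to 1 merges everything into a single bin, which is why a fixed cutoff is a poor substitute for a proper community detection algorithm on real data.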
List of abbreviations
- AMI: adjusted mutual information
- ANI: average nucleotide identity
- GOLD: Genomes Online Database
- GSC: Genomic Standards Consortium
- GTDB: Genome Taxonomy Database
- MAG: metagenome-assembled genome
- MIMAG: Minimum information about a metagenome-assembled genome
- MIxS: Minimum information about “some” sequence
- 3C: chromosome conformation capture