HaploMerger: Reconstructing allelic relationships for polymorphic diploid genome assemblies

  1. Anlong Xu2
  1. State Key Laboratory of Biocontrol, Guangdong Key Laboratory of Pharmaceutical Functional Genes, College of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, People's Republic of China
    1. 1 These authors contributed equally to this work.

    Abstract

    Whole-genome shotgun assembly has been a long-standing issue for highly polymorphic genomes, and the advent of next-generation sequencing technologies has made the issue more challenging than ever. Here we present an automated pipeline, HaploMerger, for reconstructing allelic relationships in a diploid assembly. HaploMerger combines a LASTZ-ChainNet alignment approach with a novel graph-based structure, which helps to untangle allelic relationships between two haplotypes and guides the subsequent creation of reference haploid assemblies. The pipeline provides flexible parameters and schemes to improve the contiguity, continuity, and completeness of the reference assemblies. We show that HaploMerger produces efficient and accurate results in simulations and has advantages over manual curation when applied to real polymorphic assemblies (e.g., 4%–5% heterozygosity). We also used HaploMerger to analyze the diploid assembly of a single Chinese amphioxus (Branchiostoma belcheri) and compared the resulting haploid assemblies with EST sequences, which revealed that the two haplotypes are not only divergent but also highly complementary to each other. Taken together, we have demonstrated that HaploMerger is an effective tool for analyzing and exploiting polymorphic genome assemblies.

    Footnotes

    • Received October 19, 2011.
    • Accepted May 2, 2012.

    This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported License), as described at http://creativecommons.org/licenses/by-nc/3.0/.

    | Table of Contents

    Preprint Server