An improved assembly and annotation of the allohexaploid wheat genome identifies complete families of agronomic genes and provides genomic evidence for chromosomal translocations
- Bernardo J. Clavijo1,9,
- Luca Venturini1,9,
- Christian Schudoma1,
- Gonzalo Garcia Accinelli1,
- Gemy Kaithakottil1,
- Jonathan Wright1,
- Philippa Borrill2,
- George Kettleborough1,
- Darren Heavens1,
- Helen Chapman1,
- James Lipscombe1,
- Tom Barker1,
- Fu-Hao Lu2,
- Neil McKenzie2,
- Dina Raats1,
- Ricardo H. Ramirez-Gonzalez1,2,
- Aurore Coince1,
- Ned Peel1,
- Lawrence Percival-Alwyn1,
- Owen Duncan3,
- Josua Trösch3,
- Guotai Yu2,
- Dan M. Bolser4,
- Guy Namaati4,
- Arnaud Kerhornou4,
- Manuel Spannagl5,
- Heidrun Gundlach5,
- Georg Haberer5,
- Robert P. Davey1,6,
- Christine Fosker1,
- Federica Di Palma1,6,
- Andrew L. Phillips7,
- A. Harvey Millar3,
- Paul J. Kersey4,
- Cristobal Uauy2,
- Ksenia V. Krasileva1,6,8,
- David Swarbreck1,6,
- Michael W. Bevan2 and
- Matthew D. Clark1,6
- 1Earlham Institute, Norwich, NR4 7UZ, United Kingdom;
- 2John Innes Centre, Norwich, NR4 7UH, United Kingdom;
- 3ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Crawley Western Australia 6009, Australia;
- 4EMBL European Bioinformatics Institute, Hinxton, CB10 1SD, United Kingdom;
- 5Plant Genome and Systems Biology, Helmholtz Center Munich, 85764 Neuherberg, Germany;
- 6University of East Anglia, Norwich, NR4 7TJ, United Kingdom;
- 7Rothamsted Research, Harpenden, AL5 2JQ, United Kingdom;
- 8The Sainsbury Laboratory, Norwich, NR4 7UH, United Kingdom
- Corresponding authors: matt.clark{at}earlham.ac.uk, David.Swarbreck{at}earlham.ac.uk, michael.bevan{at}jic.ac.uk
- 9These authors contributed equally to this work.
Abstract
Advances in genome sequencing and assembly technologies are generating many high-quality genome sequences, but assemblies of large, repeat-rich polyploid genomes, such as that of bread wheat, remain fragmented and incomplete. We have generated a new wheat whole-genome shotgun sequence assembly using a combination of optimized data types and an assembly algorithm designed to deal with large and complex genomes. The new assembly represents >78% of the genome, has a scaffold N50 of 88.8 kb, and shows high fidelity to the input data. Our new annotation combines strand-specific Illumina RNA-seq and Pacific Biosciences (PacBio) full-length cDNAs to identify 104,091 high-confidence protein-coding genes and 10,156 noncoding RNA genes. We confirmed three known genome rearrangements and identified one novel rearrangement. Our approach enables the rapid and scalable assembly of wheat genomes, the identification of structural variants, and the definition of complete gene models, all powerful resources for trait analysis and breeding of this key global crop.
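For readers unfamiliar with the scaffold N50 statistic quoted in the abstract, the following is a minimal sketch of how it is conventionally computed: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembled length. The function name `n50` and the toy scaffold lengths are illustrative assumptions, not data or code from this article, and the abstract does not state whether the reported value was computed against the assembly size or an estimated genome size.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Conventional definition (assumed here): sort lengths from longest to
    shortest and return the length at which the running total first
    reaches at least half of the summed length.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length


# Made-up scaffold lengths for illustration only (not from the wheat assembly).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # 100 + 80 = 180 >= 150, so this prints 80
```

Because N50 depends only on the distribution of scaffold lengths, it measures contiguity and says nothing by itself about correctness, which is why the abstract reports fidelity to the input data separately.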
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.217117.116.
- Freely available online through the Genome Research Open Access option.
- Received October 13, 2016.
- Accepted March 14, 2017.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.