Detection of long repeat expansions from PCR-free whole-genome sequence data
- Egor Dolzhenko1,18,
- Joke J.F.A. van Vugt2,18,
- Richard J. Shaw3,4,
- Mitchell A. Bekritsky3,
- Marka van Blitterswijk5,
- Giuseppe Narzisi6,
- Subramanian S. Ajay1,
- Vani Rajan1,
- Bryan R. Lajoie1,
- Nathan H. Johnson1,
- Zoya Kingsbury3,
- Sean J. Humphray3,
- Raymond D. Schellevis2,
- William J. Brands2,
- Matt Baker5,
- Rosa Rademakers5,
- Maarten Kooyman7,
- Gijs H.P. Tazelaar2,
- Michael A. van Es2,
- Russell McLaughlin8,9,
- William Sproviero10,
- Aleksey Shatunov10,
- Ashley Jones10,
- Ahmad Al Khleifat10,
- Alan Pittman11,
- Sarah Morgan11,
- Orla Hardiman8,9,
- Ammar Al-Chalabi10,
- Chris Shaw10,
- Bradley Smith10,
- Edmund J. Neo10,
- Karen Morrison12,
- Pamela J. Shaw13,
- Catherine Reeves6,
- Lara Winterkorn6,
- Nancy S. Wexler14,15,
- The US–Venezuela Collaborative Research Group16,
- David E. Housman17,
- Christopher W. Ng17,
- Alina L. Li17,
- Ryan J. Taft1,
- Leonard H. van den Berg2,
- David R. Bentley3,
- Jan H. Veldink2,18 and
- Michael A. Eberle1,18
- 1Illumina Incorporated, San Diego, California 92122, USA;
- 2Department of Neurology, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht University, 3584 CX Utrecht, The Netherlands;
- 3Illumina Limited, Chesterford Research Park, Little Chesterford, Nr Saffron Walden, Essex, CB10 1XL, United Kingdom;
- 4Repositive Limited, Future Business Centre, Cambridge CB4 2HY, United Kingdom;
- 5Department of Neuroscience, Mayo Clinic, Jacksonville, Florida 32224, USA;
- 6New York Genome Center, New York, New York 10013, USA;
- 7SURFsara, 1098 XG Amsterdam, The Netherlands;
- 8Academic Unit of Neurology, Trinity College Dublin, Trinity Biomedical Sciences Institute, Dublin 2, Republic of Ireland;
- 9Department of Neurology, Beaumont Hospital, Dublin 9, Republic of Ireland;
- 10Department of Basic and Clinical Neuroscience, Maurice Wohl Clinical Neuroscience Institute, King's College London, London SE5 9RX, United Kingdom;
- 11Department of Molecular Neuroscience, UCL Institute of Neurology, London WC1N 3BG, United Kingdom;
- 12University of Southampton, Southampton SO17 1BJ, United Kingdom;
- 13Sheffield Institute for Translational Neuroscience, University of Sheffield, Sheffield S10 2HQ, United Kingdom;
- 14Columbia University, New York, New York 10032, USA;
- 15Hereditary Disease Foundation, New York, New York 10032, USA;
- 16The US–Venezuela Collaborative Research Group;
- 17Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- 18These authors contributed equally to this work.
Abstract
Identifying large expansions of short tandem repeats (STRs), such as those that cause amyotrophic lateral sclerosis (ALS) and fragile X syndrome, is challenging for short-read whole-genome sequencing (WGS) data. A solution to this problem is an important step toward integrating WGS into precision medicine. We developed a software tool called ExpansionHunter that, using PCR-free WGS short-read data, can genotype repeats at the locus of interest, even if the expanded repeat is larger than the read length. We applied our algorithm to WGS data from 3001 ALS patients who have been tested for the presence of the C9orf72 repeat expansion with repeat-primed PCR (RP-PCR). Compared against this truth data, ExpansionHunter correctly classified all (212/212, 95% CI [0.98, 1.00]) of the expanded samples as either expansions (208) or potential expansions (4). Additionally, 99.9% (2786/2789, 95% CI [0.997, 1.00]) of the wild-type samples were correctly classified as wild type by this method with the remaining three samples identified as possible expansions. We further applied our algorithm to a set of 152 samples in which every sample had one of eight different pathogenic repeat expansions, including those associated with fragile X syndrome, Friedreich's ataxia, and Huntington's disease, and correctly flagged all but one of the known repeat expansions. Thus, ExpansionHunter can be used to accurately detect known pathogenic repeat expansions and provides researchers with a tool that can be used to identify new pathogenic repeat expansions.
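The counts and confidence intervals quoted in the abstract can be checked with a short calculation. The sketch below is illustrative only: the abstract does not say which interval method the authors used, so the Wilson score interval is an assumption here, and the function name `wilson_interval` and the variable names are hypothetical. Only the counts (212, 208, 4, 2786, 2789, 3001) come from the text.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95% interval. This method is an
    assumption; the abstract does not name the one the authors used.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Counts taken from the abstract.
expanded_total = 212            # RP-PCR-positive samples
expanded_flagged = 208 + 4      # called expansions + potential expansions
wild_type_total = 2789          # RP-PCR-negative samples
wild_type_correct = 2786        # called wild type

# The two groups should add up to the 3001 tested patients.
assert expanded_total + wild_type_total == 3001

sens_ci = wilson_interval(expanded_flagged, expanded_total)
spec_ci = wilson_interval(wild_type_correct, wild_type_total)

print(f"expanded samples flagged:  {expanded_flagged}/{expanded_total}, "
      f"95% CI [{sens_ci[0]:.2f}, {sens_ci[1]:.2f}]")
print(f"wild-type samples correct: {wild_type_correct}/{wild_type_total}, "
      f"95% CI [{spec_ci[0]:.3f}, {spec_ci[1]:.3f}]")
```

Under this assumption the lower bounds agree with the reported values to the precision given in the abstract. An exact (Clopper-Pearson) interval would give similar bounds at these sample sizes.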
Footnotes
- [Supplemental material is available for this article.]
- Article published online before print. Article, supplemental material, and publication date are at http://www.genome.org/cgi/doi/10.1101/gr.225672.117.
- Freely available online through the Genome Research Open Access option.
- Received June 1, 2017.
- Accepted August 28, 2017.
This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.