Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II

Peter Parham 1

  1 Departments of Structural Biology and Microbiology & Immunology, Stanford University School of Medicine, Stanford, California 94305, USA
  2 Illumina Incorporated, San Diego, California 92122, USA
  3 Department of Neurology, University of California San Francisco School of Medicine, San Francisco, California 94158, USA
  4 Present address: Equipe ATIP, URMITE UM 63 CNRS 7278 IRD 198 Inserm U1095, IHU Méditerranée Infection, Aix-Marseille Université, Marseille 13005, France

Corresponding author: paul.norman@stanford.edu

Abstract

The most polymorphic part of the human genome, the MHC, encodes over 160 proteins of diverse function. Half of them, including the HLA class I and II genes, are directly involved in immune responses. Consequently, the MHC region strongly associates with numerous diseases and clinical therapies. Notoriously, the MHC region has been intractable to high-throughput analysis at complete sequence resolution, and current reference haplotypes are inadequate for large-scale studies. To address these challenges, we developed a method that specifically captures and sequences the 4.8-Mbp MHC region from genomic DNA. For 95 MHC-homozygous cell lines we assembled, de novo, a set of high-fidelity contigs and a sequence scaffold, representing a mean of 98% of the target region. Included are six alternative MHC reference sequences of the human genome that we completed and refined. Characterization of the sequence and structural diversity of the MHC region shows that the approach accurately determines the sequences of the highly polymorphic HLA class I and HLA class II genes and the complex structural diversity of complement factor C4A/C4B. It has also uncovered extensive and unexpected diversity in other MHC genes; an example is MUC22, which encodes a lung mucin and exhibits more coding sequence alleles than any HLA class I or II gene studied here. More than 60% of the coding sequence alleles analyzed were previously uncharacterized. We have created a substantial database of robust reference MHC haplotype sequences that will enable future population-scale studies of this complicated and clinically important region of the human genome.

Footnotes

  • Received August 1, 2016.
  • Accepted February 10, 2017.

This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
