Founder population-specific HapMap panel increases power in GWA studies through improved imputation accuracy and CNV tagging

  Samuli Ripatti (1, 2, 12)

  1. Institute for Molecular Medicine Finland, FIMM, University of Helsinki, FI-00014 Helsinki, Finland;
  2. Public Health Genomics Unit, National Institute for Health and Welfare, FI-00271 Helsinki, Finland;
  3. Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, United Kingdom;
  4. Department of Human Genetics, Leiden University Medical Centre, Leiden, The Netherlands;
  5. Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom;
  6. Department of Health Promotion and Chronic Disease Prevention, National Institute for Health and Welfare, FI-00271 Helsinki, Finland;
  7. Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA;
  8. Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts 02114, USA;
  9. Department of Medical Genetics, University of Helsinki, Helsinki University Hospital, FI-00014 Helsinki, Finland
  10. These authors contributed equally to this work.

    Abstract

    The combining of genome-wide association (GWA) data across populations represents a major challenge for massive global meta-analyses. Genotype imputation using densely genotyped reference samples facilitates the combination of data across different genotyping platforms. HapMap data is typically used as a reference for single nucleotide polymorphism (SNP) imputation and tagging copy number polymorphisms (CNPs). However, the advantage of having population-specific reference panels for founder populations has not been evaluated. We looked at the properties and impact of adding 81 individuals from a founder population to HapMap3 reference data on imputation quality, CNP tagging, and power to detect association in simulations and in an independent cohort of 2138 individuals. The gain in SNP imputation accuracy was highest among low-frequency markers (minor allele frequency [MAF] < 5%), for which adding the population-specific samples to the reference set increased the median R² between imputed and genotyped SNPs from 0.90 to 0.94. Accuracy also increased in regions with high recombination rates. Similarly, a reference set with population-specific extension facilitated the identification of better tag-SNPs for a subset of CNPs; for 4% of CNPs the R² between SNP genotypes and CNP intensity in the independent population cohort was at least twice as high as without the extension. We conclude that even a relatively small population-specific reference set yields considerable benefits in SNP imputation, CNP tagging accuracy, and the power to detect associations in founder populations and population isolates in particular.
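
The abstract reports accuracy as the R² between imputed and genotyped SNPs, stratified by minor allele frequency. It does not say how that statistic was computed, so the following is only a minimal sketch under one common assumption: R² is taken as the squared Pearson correlation between imputed allele dosages (values from 0 to 2) and genotyped allele counts (0, 1, or 2). The function names `maf` and `r_squared` and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def maf(genotypes):
    """Minor allele frequency from per-individual allele counts (0, 1, or 2)."""
    freq = sum(genotypes) / (2 * len(genotypes))
    return min(freq, 1 - freq)


def r_squared(imputed, genotyped):
    """Squared Pearson correlation between imputed dosages and genotyped counts.

    Assumption: this is the accuracy measure; the abstract does not define it.
    """
    mx, my = mean(imputed), mean(genotyped)
    cov = sum((x - mx) * (y - my) for x, y in zip(imputed, genotyped))
    vx = sum((x - mx) ** 2 for x in imputed)
    vy = sum((y - my) ** 2 for y in genotyped)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when either vector has no variation
    return cov * cov / (vx * vy)


# Toy data (invented for illustration): one low-frequency SNP in 20 individuals.
genotyped = [0] * 18 + [1, 0]
imputed_base = [0.0] * 18 + [0.6, 0.1]      # hypothetical dosages, base panel
imputed_extended = [0.0] * 18 + [0.9, 0.0]  # hypothetical dosages, extended panel

print("MAF:", maf(genotyped))
print("R2, base panel:", r_squared(imputed_base, genotyped))
print("R2, extended panel:", r_squared(imputed_extended, genotyped))
```

In a comparison of this kind, the statistic would be computed per SNP under each reference panel and then summarized (for example, by the median) within MAF bins.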

    Footnotes

    • Received February 11, 2010.
    • Accepted July 12, 2010.