Haplotype Block Partitioning and Tag SNP Selection Using Genotype Data and Their Applications to Association Studies

Kui Zhang (1,2), Zhaohui S. Qin (3,4), Jun S. Liu (4), Ting Chen (1), Michael S. Waterman (1), and Fengzhu Sun (1,5)

  1. Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089-1113, USA
  2. Section on Statistical Genetics, Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
  3. Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
  4. Department of Statistics, Harvard University, Cambridge, Massachusetts 02138, USA

Abstract

Recent studies have revealed that linkage disequilibrium (LD) patterns vary across the human genome, with regions of high LD interspersed with regions of low LD. A small fraction of SNPs (tag SNPs) is sufficient to capture most of the haplotype structure of the human genome. In this paper, we develop a method to partition haplotypes into blocks and to identify tag SNPs from genotype data. The method combines a dynamic programming algorithm for haplotype block partitioning and tag SNP selection, which operates on haplotype data, with a variation of the expectation maximization (EM) algorithm for haplotype inference. Using extensive simulations, we assess the effects of using either haplotype or genotype data in haplotype block identification and tag SNP selection as a function of several factors, including sample size, density or number of SNPs studied, allele frequencies, fraction of missing data, and genotyping error rate. We find that a modest number of haplotype or genotype samples results in consistent block partitions and tag SNP selection. The power of association studies based on tag SNPs selected from genotype data is similar to that of studies based on tag SNPs selected from haplotype data.
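To make the haplotype inference step concrete, the following is a minimal sketch of the textbook EM algorithm for estimating haplotype frequencies from unphased genotypes. It is not the variation of EM developed in this paper, and it does not perform block partitioning or tag SNP selection. The function names (`compatible_pairs`, `em_haplotype_frequencies`), the 0/1/2 genotype coding, the assumption of Hardy-Weinberg equilibrium, and the toy data are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with one unphased genotype.

    Assumed coding: each entry is 0, 1, or 2 = copies of allele 1 at that SNP.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous sites fixed; het sites filled below
    pairs = set()
    for bits in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for i, b in zip(het, bits):
            h1[i], h2[i] = b, 1 - b  # the two haplotypes differ at heterozygous sites
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, n_iter=100, tol=1e-8):
    """Plain EM estimate of haplotype frequencies (assumes a non-empty sample)."""
    pair_lists = [compatible_pairs(g) for g in genotypes]
    haps = sorted({h for pairs in pair_lists for pair in pairs for h in pair})
    freq = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in pair_lists:
            # E-step: posterior probability of each phase under Hardy-Weinberg
            weights = [(1 if a == b else 2) * freq[a] * freq[b] for a, b in pairs]
            total = sum(weights)
            for (a, b), w in zip(pairs, weights):
                counts[a] += w / total
                counts[b] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes
        new_freq = {h: counts[h] / (2 * len(genotypes)) for h in haps}
        delta = max(abs(new_freq[h] - freq[h]) for h in haps)
        freq = new_freq
        if delta < tol:
            break
    return freq


# Toy data: two SNPs, four individuals, one double heterozygote.
toy_genotypes = [(0, 0), (0, 0), (2, 2), (1, 1)]
toy_freq = em_haplotype_frequencies(toy_genotypes)
for hap, f in toy_freq.items():
    print(hap, round(f, 4))
```

In this toy sample the double heterozygote is resolved almost entirely as the pair (0, 0) and (1, 1), so the estimates approach 0.625 and 0.375 for those two haplotypes. Enumerating every compatible pair grows exponentially with the number of heterozygous sites, which is one reason practical methods work on short segments or blocks rather than on long stretches of SNPs.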

Footnotes

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1837404. Article published online before print in April 2004.

  • 5 Corresponding author. E-MAIL fsun{at}email.usc.edu; FAX (213) 740-2437.

  • Received August 1, 2003.

  • Accepted January 12, 2004.