Human Whole-Genome Shotgun Sequencing

James L. Weber (1,3) and Eugene W. Myers (2)

(1) Center for Medical Genetics, Marshfield Medical Research Foundation, Marshfield, Wisconsin 54449; (2) Department of Computer Science, University of Arizona, Tucson, Arizona 85721

This extract was created in the absence of an abstract.

Large-scale sequencing of the human genome is now under way (Boguski et al. 1996; Marshall and Pennisi 1996). Although many doubted the scientific value of sequencing the entire human genome at the beginning of the Genome Project, these doubts have evaporated almost entirely (Gibbs 1995; Olson 1995). Primary reasons for generating the human genomic sequence are listed in Table 1.

Table 1. Primary Reasons for Sequencing Human Genomic DNA

The approach being taken for human genomic sequencing is the same as that used for the Saccharomyces cerevisiae and Caenorhabditis elegans genomes, namely construction of overlapping arrays of large-insert Escherichia coli clones, followed by complete sequencing of these clones one at a time. In this article, we outline an alternative approach to sequencing the human and other large genomes, which we argue is less costly and more informative than the clone-by-clone approach.

A Plan for Human Whole-Genome Shotgun Sequencing

Although there are many conceivable variations, the crux of our plan involves high-quality, semiautomated sequencing from both ends of very large numbers of randomly selected human genomic DNA fragments. DNA of high molecular weight purified from at least a few different human donors would be sheared, size-selected, and cloned into E. coli. Insert sizes would fall into two classes. Long inserts would be 5–20 kb in size and would be cloned into plasmid, phage, or possibly cosmid vectors. Short inserts would be 0.4–1.2 kb in size and would be cloned into plasmid vectors. Read lengths would be of sufficient magnitude so that the two sequence reads from the ends of the short inserts overlap. The ratio of long to short inserts would be ⩾1. Standard, gel-based methods would be utilized to generate at least 30 billion nucleotides of raw sequence (10-fold coverage of the genome). Many laboratories throughout the world could participate in raw sequence generation, …
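The arithmetic behind these figures can be sketched as follows. This is a rough illustration, not part of the authors' plan: the genome size of 3 billion bp is simply what the quoted numbers imply (30 billion nucleotides at 10-fold coverage), the 500-bp read length is an assumed value that the text does not state, and the Poisson estimate of uncovered sequence is a standard idealization for uniformly random reads that is brought in here only for illustration. All function names are hypothetical.

```python
import math

# Figures taken from the text.
RAW_SEQUENCE_BP = 30_000_000_000   # "at least 30 billion nucleotides of raw sequence"
GENOME_SIZE_BP = 3_000_000_000     # implied by 30 billion nt at 10-fold coverage

# Assumption for illustration only; the text gives no read length.
ASSUMED_READ_LENGTH_BP = 500


def fold_coverage(raw_bp: int, genome_bp: int) -> float:
    """Average number of times each genomic base is sequenced."""
    return raw_bp / genome_bp


def reads_needed(raw_bp: int, read_length_bp: int) -> int:
    """Number of reads of a given length needed to produce raw_bp bases."""
    return math.ceil(raw_bp / read_length_bp)


def end_reads_overlap(insert_bp: int, read_length_bp: int) -> bool:
    """True when reads from the two ends of an insert share at least one base."""
    return 2 * read_length_bp > insert_bp


def expected_uncovered_fraction(coverage: float) -> float:
    """Poisson approximation: fraction of bases hit by no read is e^(-coverage).

    Assumes idealized, uniformly random read placement.
    """
    return math.exp(-coverage)


if __name__ == "__main__":
    c = fold_coverage(RAW_SEQUENCE_BP, GENOME_SIZE_BP)
    print("fold coverage:", c)
    print("reads needed:", reads_needed(RAW_SEQUENCE_BP, ASSUMED_READ_LENGTH_BP))
    # Short inserts span 0.4-1.2 kb in the text; check both extremes.
    for insert_bp in (400, 1200):
        print(insert_bp, "bp insert, end reads overlap:",
              end_reads_overlap(insert_bp, ASSUMED_READ_LENGTH_BP))
    print("expected uncovered fraction:", expected_uncovered_fraction(c))
```

Under the assumed 500-bp read length, only the shorter inserts in the 0.4–1.2 kb class would yield overlapping end reads, which shows how the overlap requirement ties insert size to achievable read length.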
