Published August 29, 2023 | Version v1
Dataset Open

Simulated reads for benchmarking SARS-CoV-2 lineage abundance estimation

  • 1. Delft University of Technology
  • 2. Texas A&M University - San Antonio

Description

To evaluate the accuracy of lineage abundance estimates from amplicon-based and whole-genome sequencing, we simulated paired-end reads from amplicons determined by AmpliDiff as well as reads spanning full genomes. The abundance of each lineage is based on that lineage's relative abundance within the corresponding genome dataset. The data consist of the following 8 independent datasets:

  • 200 bp reads from the Netherlands based on AmpliDiff amplicons (1, 2, 5 or 10 amplicons) at 1000x coverage,
  • 400 bp reads from the Netherlands based on AmpliDiff amplicons (1, 2, 5 or 10 amplicons) at 1000x coverage,
  • 200 bp reads from the Netherlands based on whole genome sequencing at 100x coverage,
  • 400 bp reads from the Netherlands based on whole genome sequencing at 100x coverage,
  • 200 bp reads from Texas based on AmpliDiff amplicons (1, 2, 5 or 10 amplicons) at 1000x coverage,
  • 400 bp reads from Texas based on AmpliDiff amplicons (1, 2, 5 or 10 amplicons) at 1000x coverage,
  • 200 bp reads from Texas based on whole genome sequencing at 100x coverage,
  • 400 bp reads from Texas based on whole genome sequencing at 100x coverage.
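
The eight datasets listed above are the combinations of two regions, two sequencing strategies and two read lengths. The following minimal Python sketch enumerates that grid and estimates how many reads a given coverage implies. The function names and the genome length of 29,903 bp (the length of the Wuhan-Hu-1 reference, NC_045512.2) are assumptions introduced for illustration; the simulated genomes may differ in length, and the read simulator used for this dataset may define coverage differently (for example per read pair).

```python
import math
from itertools import product

REGIONS = ("Netherlands", "Texas")
READ_LENGTHS_BP = (200, 400)
# Sequencing strategy -> simulated coverage, as stated in the list above.
COVERAGE_BY_STRATEGY = {"AmpliDiff amplicons": 1000, "whole genome": 100}

# Assumption: length of the Wuhan-Hu-1 reference genome (NC_045512.2).
# The genomes actually used for the simulations may be slightly longer or shorter.
ASSUMED_GENOME_LENGTH_BP = 29_903


def dataset_grid():
    """Return one dict per independent dataset (region x strategy x read length)."""
    return [
        {
            "region": region,
            "strategy": strategy,
            "read_length_bp": read_length,
            "coverage": COVERAGE_BY_STRATEGY[strategy],
        }
        for region, strategy, read_length in product(
            REGIONS, COVERAGE_BY_STRATEGY, READ_LENGTHS_BP
        )
    ]


def approx_read_count(target_length_bp, read_length_bp, coverage):
    """Smallest number of reads whose total bases reach coverage x target length."""
    return math.ceil(coverage * target_length_bp / read_length_bp)


if __name__ == "__main__":
    for dataset in dataset_grid():
        print(dataset)
    # Rough read count for whole-genome simulation at 100x with 200 bp reads.
    print(approx_read_count(ASSUMED_GENOME_LENGTH_BP, 200, 100))
```

Under these assumptions, whole-genome sequencing at 100x needs roughly 15,000 reads of 200 bp, or about half as many reads of 400 bp. The amplicon-based datasets are not estimated here because the amplicon lengths are not stated in this record.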

Each independent dataset contains 20 sets of reads, generated with different random seeds. The genomes used for the Netherlands-based simulations can be obtained via GISAID through accession ID EPI_SET_230825fe, and the genomes used for the Texas-based simulations can be obtained via GISAID through accession ID EPI_SET_230825pe.
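
The description states that lineage abundances are based on the relative abundance of a lineage within the dataset. One plausible reading, sketched below in Python, is that the ground-truth abundance of a lineage is the fraction of genomes in the set that carry that lineage label. This is an assumption about the procedure rather than the authors' code, and the lineage labels in the usage example are illustrative, not taken from the GISAID sets.

```python
from collections import Counter


def lineage_abundances(lineages):
    """Map each lineage label to its fraction of all genomes.

    `lineages` holds one label per genome, e.g. as read from a metadata table.
    """
    counts = Counter(lineages)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one genome is required")
    return {lineage: count / total for lineage, count in counts.items()}


if __name__ == "__main__":
    # Illustrative labels only; not the composition of the actual datasets.
    example = ["BA.2", "BA.2", "BA.5", "XBB.1.5"]
    for lineage, fraction in lineage_abundances(example).items():
        print(f"{lineage}\t{fraction:.2f}")
```

The returned fractions sum to 1, so they can be compared directly against the abundance estimates produced by the tool under evaluation.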

Files

Files (37.1 GB)

Checksum and size of each file:

  • md5:ed890c54529ebbfdc998d844e156675a (2.0 GB)
  • md5:e26c29969082ef4f9a10a67937d2a2af (1.8 GB)
  • md5:161ffc4b119eb1fcfcbd427e36d44bb9 (7.3 GB)
  • md5:c5c9451d28a2c6f1d18411998259add9 (2.9 GB)
  • md5:e093c9eae0687adc83f1a7a39fab5dda (3.4 GB)
  • md5:52d09f19130defaf7647bc5b14127bcd (3.2 GB)
  • md5:be1b08038a5eb791cbe9bfbe254f3474 (12.6 GB)
  • md5:e17030c2de9a82cbf2e7855fd0ee6d5f (3.8 GB)
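
Because the files are several gigabytes each, it can be worth checking a download against the MD5 checksums listed above. The following is a minimal Python sketch using the standard `hashlib` module; the function names and the file path in the usage comment are hypothetical, since the file names are not shown in this listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected


# Usage (hypothetical local path; substitute the file you downloaded):
# matches_listed_checksum("downloaded_file", "md5:ed890c54529ebbfdc998d844e156675a")
```

Reading in chunks keeps memory use constant, which matters for the larger files in this record.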

Additional details

Related works

Is cited by
Preprint: 10.1101/2023.07.22.550164 (DOI)

References

  • Khare, S., et al. (2021). GISAID's Role in Pandemic Response. China CDC Weekly, 3(49): 1049-1051. doi: 10.46234/ccdcw2021.255. PMCID: 8668406