Data for Bovine breed-specific augmented reference graphs facilitate accurate sequence read mapping and unbiased variant discovery

doi:10.5281/zenodo.3759712

Published December 11, 2019 | Version 1.1

Dataset Open

1. Animal Genomics ETH Zürich

Description of the datasets

The data are organized into folders and compressed as tar.gz archives.

There are two compressed data folders: data, which was used for the cattle genome graph experiments, and data_human, which was used for the human genome graph experiments.

Cattle genome graphs experiments

First, extract the archive with the command tar -xvzf data.tar.gz. After extraction, the data folder is organized as follows:

Utilities: contains the bovine ARS-UCD 1.2 FASTA reference with the accompanying index.
Bin: contains the software used in the paper (vg, liftover, vcf2diploid).
Part1: data used for the analysis in the variant prioritization section, further subdivided into:
- vcf_sim: variant files from four animals per breed, used to simulate reads.
- reads_sim: simulated short reads used for read mapping.
- vcf_freq: variants, filtered by allele frequency, that were used to augment the graphs.
Part2: data used for the analysis in the section on graph mapping with breed-filtered variants, further subdivided into:
- vcf_breed: variant files used for graph construction.
Part3: data used for the analysis in the consensus genome section, further subdivided into:
- read_sims: simulated reads as in Part1, but with coordinates lifted over to the new consensus genomes.
- reference: contains the original reference and the consensus references.
- vcf_consensus: contains the major-allele variants used to construct the consensus genomes.
Part4: data used for the analysis in the section on whole-genome graph construction and variant genotyping, further subdivided into:
- vcf_construct: variants on chromosomes 1-29 from 82 Brown Swiss animals, used to construct the BSW whole-genome graph.
- BSW_graph: whole-genome Brown Swiss graph with its three accompanying indexes (xg, gcsa, and gbwt).
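
As a minimal sketch, the folder layout described above can be checked by listing the paths inside the archive before unpacking all of it. The helper names list_folders and extract_archive are hypothetical, and the sketch assumes that the archive unpacks into a top-level data folder; the exact folder names should be taken from what the archive actually contains.

```python
import tarfile
from pathlib import PurePosixPath


def list_folders(archive_path, depth=2):
    """Return the sorted, distinct archive paths truncated to `depth` components."""
    found = set()
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar:
            parts = PurePosixPath(member.name).parts
            # Skip entries that are shallower than the requested depth.
            if len(parts) >= depth:
                found.add("/".join(parts[:depth]))
    return sorted(found)


def extract_archive(archive_path, destination="."):
    """Unpack the whole archive, similar to `tar -xzf archive_path -C destination`.

    Only use this on archives from a source you trust.
    """
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(destination)
```

If the archive follows the layout above, list_folders("data.tar.gz") would be expected to report entries such as data/Part1. A gzip-compressed tar archive has to be read sequentially, so listing an archive of this size can take a while.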

Human genome graphs experiments

First, extract the data_human archive with the command tar -xvzf data_human.tar.gz. After extraction, the data folder is organized as follows:

reference: the g1k_v37 reference used as the graph backbone.
vcf_sim: variant files from four individuals per population, used to simulate reads.
reads_sim: simulated short reads used for read mapping.
vcf_freq: variants, filtered by allele frequency, that were used to augment the graphs.

Files

Files (35.4 GB)

Name	MD5	Size
data.tar.gz	61174b9dcfeba5e0fd4378e8b8fb7e2e	24.8 GB
data_human.tar.gz	59f7cd71bbfc0661a24844b5e8433410	10.6 GB
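
Because both archives are large, it can be worth confirming that a download is complete before extracting it. The sketch below computes a file's MD5 digest in chunks and compares it with the checksums listed for the two archives. The names EXPECTED_MD5, md5_of_file, and matches_published_md5 are hypothetical and only serve this illustration.

```python
import hashlib

# Checksums as published in the file listing of this record.
EXPECTED_MD5 = {
    "data.tar.gz": "61174b9dcfeba5e0fd4378e8b8fb7e2e",
    "data_human.tar.gz": "59f7cd71bbfc0661a24844b5e8433410",
}


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reading keeps memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, name):
    """Return True if the file at `path` has the checksum listed for `name`."""
    return md5_of_file(path) == EXPECTED_MD5[name]
```

A call such as matches_published_md5("data.tar.gz", "data.tar.gz") returning False would suggest a truncated or corrupted download.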
