Keywords
sugarcane, long reads, polyploid, genomics
This article is included in the Agriculture, Food and Nutrition gateway.
This article is included in the Genomics and Genetics gateway.
This article is included in the Data: Use and Reuse collection.
Sugarcane is an economically important crop used as a source of sugar and ethanol and for electricity generation1. Sugarcane has a haploid genome of ~1 Gbp; however, modern sugarcane cultivars are polyploids derived from interspecific hybridization between S. officinarum L. and S. spontaneum L., with up to 130 chromosomes distributed among ~12 homo(eo)logous groups2,3 and a total genome size of up to 10 Gbp4. This complex genome structure has hampered genome sequencing, assembly and annotation. Partial genomic sequences5–8 and transcriptome sequences9–11 are available, but no whole-genome assembly is available to date. Here we used the Illumina TruSeq Synthetic Long-Read sequencing technology to survey the genome of cultivar SP80-3280. The generated long reads and their assembly have been made publicly available and will provide useful information for functional genomics studies.
Leaf rolls from two-month-old, greenhouse-grown plants of sugarcane cultivar SP80-3280 (provided by Centro de Tecnologia Canavieira, Piracicaba, São Paulo) were collected and immediately frozen in liquid nitrogen. The tissue was ground to a fine powder, and high-molecular-weight DNA was extracted from 100 mg of frozen tissue using CTAB (Sigma-Aldrich, USA) and chloroform:isoamyl alcohol (Sigma-Aldrich, USA) as previously described12.
A total of 6 µg of DNA was sent to Illumina (CA, USA) for sequencing with the TruSeq Synthetic Long-Read technology13 through their FastTrack Sequencing Service. Sequencing was performed on an Illumina HiSeq 2000 system using paired-end chemistry. Nine long-read libraries were generated, each yielding approximately 600 Mbp, for an estimated 4x to 5x coverage of the monoploid genome. In total, 1,378,917 reads longer than 1.5 kbp (5,642,855,018 bases) were obtained. The underlying 1,966,604,928 short reads amount to 393,320,985,600 bp, corresponding to an estimated 393x coverage of the haploid genome. The maximum read length was 20,918 bp, and 36% of the reads were longer than 4.5 kbp.
Possible contaminants were removed by comparing the reads against NCBI's nucleotide database using BLAST14 and keeping only reads whose best hits were to Viridiplantae, leaving 1,224,061 reads for assembly. Before assembly, reads originating from the mitochondrial (NC_008360.1) and chloroplast (NC_005878.2) genomes were excluded using mirabait (http://mira-assembler.sourceforge.net/). Reads longer than 1.5 kbp were assembled with Celera's WGS Assembler v8.2 using parameters similar to those previously described13, i.e., 'unitigger=bogart, merSize=31, ovlMinLen=100', except that the error parameters ovlErrorRate, cnsErrorRate, cgwErrorRate, utgGraphErrorRate, utgGraphErrorLimit, utgMergeErrorRate and utgMergeErrorLimit were left at their default settings.
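The coverage and read-length figures above follow from simple ratios of the reported totals. The Python sketch below recomputes some of them, assuming a monoploid genome size of exactly 1 Gbp (the "~1 Gbp" figure from the introduction); the function and constant names are illustrative and not part of the original workflow.

```python
# Back-of-the-envelope read statistics for the SP80-3280 data set.
# ASSUMPTION: a monoploid genome size of exactly 1.0 Gbp, taken from the
# "~1 Gbp" figure in the introduction; the true size is only approximate.

MONOPLOID_GENOME_BP = 1_000_000_000  # assumed, not measured


def fold_coverage(total_bases: int, genome_bp: int = MONOPLOID_GENOME_BP) -> float:
    """Expected fold coverage = sequenced bases / genome size."""
    return total_bases / genome_bp


def mean_read_length(total_bases: int, n_reads: int) -> float:
    """Average read length in bp."""
    return total_bases / n_reads


def retained_fraction(kept: int, total: int) -> float:
    """Fraction of reads that survive a filtering step."""
    return kept / total


# Totals reported in the text
SHORT_READ_BASES = 393_320_985_600
LONG_READ_BASES = 5_642_855_018
LONG_READS = 1_378_917
LONG_READS_AFTER_BLAST_FILTER = 1_224_061

if __name__ == "__main__":
    print(f"short-read coverage: {fold_coverage(SHORT_READ_BASES):.0f}x")
    print(f"mean long-read length: {mean_read_length(LONG_READ_BASES, LONG_READS):,.0f} bp")
    kept = retained_fraction(LONG_READS_AFTER_BLAST_FILTER, LONG_READS)
    print(f"long reads kept after contamination filter: {kept:.1%}")
```

Under the 1 Gbp assumption this reproduces the ~393x short-read coverage, gives a mean long-read length of about 4.1 kbp, and shows that roughly 89% of the long reads passed the Viridiplantae filter. Passing a different `genome_bp` to `fold_coverage` shows how sensitive the coverage estimate is to the assumed genome size.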
A non-redundant assembly was created using CD-HIT15, merging 100% identical sequences and sub-sequences.
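The CD-HIT step described above collapses contigs that are exact duplicates or exact sub-sequences of longer contigs. The Python sketch below illustrates that redundancy criterion with a naive, longest-first, quadratic comparison; it is not CD-HIT's algorithm, which performs greedy incremental clustering with short-word filtering and is far faster on real assemblies. The function names and contig IDs are hypothetical, and because the text does not state whether reverse-complement matches were collapsed, strand handling is exposed as an assumed `both_strands` option.

```python
from typing import Dict, List

# Complement table for DNA (upper and lower case, N kept as N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def remove_contained(contigs: Dict[str, str], both_strands: bool = True) -> List[str]:
    """Return IDs of contigs kept after dropping exact duplicates and exact sub-sequences.

    Hypothetical illustration: contigs are visited longest first (ties keep input
    order), and a contig is discarded if it, or optionally its reverse complement,
    occurs verbatim inside a contig that was already kept. Runtime is quadratic.
    """
    upper = {cid: s.upper() for cid, s in contigs.items()}
    kept: List[str] = []
    for cid in sorted(upper, key=lambda k: len(upper[k]), reverse=True):
        seq = upper[cid]
        queries = [seq, revcomp(seq)] if both_strands else [seq]
        redundant = any(q in upper[k] for k in kept for q in queries)
        if not redundant:
            kept.append(cid)
    return kept


if __name__ == "__main__":
    toy = {
        "ctg1": "ACGTACGTAA",
        "ctg2": "ACGTACGTAA",  # identical to ctg1
        "ctg3": "GTACG",       # sub-sequence of ctg1
        "ctg4": "TTACGTACGT",  # reverse complement of ctg1
        "ctg5": "GGGGCCCC",    # unrelated
    }
    print(remove_contained(toy))
    print(remove_contained(toy, both_strands=False))
```

On this toy input the first call keeps `ctg1` and `ctg5`; with `both_strands=False` the reverse-complement copy `ctg4` is also kept. Visiting the longest contigs first guarantees that any sequence contained in another has already had its container considered.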
Raw sequencing data are available from the NCBI SRA: the long reads under accession number SRX845504, and the underlying short reads under accessions SRX853961 to SRX853969. The SP80-3280 assembly is available under accession number GCA_002018215.1. All data can be found under the BioProject.
This work was supported by institutional funds from CTBE/CNPEM to DMRP and a Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) grant to LM (2012/23345-0). The research was developed with support from CENAPAD-SP (Centro Nacional de Processamento de Alto Desempenho em São Paulo), project UNICAMP/FINEP-MCT.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The authors are grateful to Larissa Prado da Cruz (CTBE/CNPEM) for assistance with molecular biology procedures.
Is the rationale for creating the dataset(s) clearly described?
Yes
Are the protocols appropriate and is the work technically sound?
Yes
Are sufficient details of methods and materials provided to allow replication by others?
Yes
Are the datasets clearly presented in a useable and accessible format?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Sugarcane genetic engineering, transcriptomics
Is the rationale for creating the dataset(s) clearly described?
Yes
Are the protocols appropriate and is the work technically sound?
Yes
Are sufficient details of methods and materials provided to allow replication by others?
Yes
Are the datasets clearly presented in a useable and accessible format?
Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Genome assembly
Alongside their report, reviewers assign a status to the article:
Invited Reviewers | 1 | 2 |
---|---|---|
Version 2 (revision), 03 Jul 17 | report | |
Version 1, 09 Jun 17 | report | report |