Software Tool Article

FastQ Screen: A tool for multi-genome mapping and quality control

[version 1; peer review: 3 approved, 1 approved with reservations]
PUBLISHED 24 Aug 2018

Abstract

DNA sequencing analysis typically involves mapping reads to just one reference genome.  Mapping against multiple genomes is necessary, however, when the genome of origin requires confirmation. Mapping against multiple genomes is also advisable for detecting contamination or for identifying sample swaps which, if left undetected, may lead to incorrect experimental conclusions.  Consequently, we present FastQ Screen, a tool to validate the origin of DNA samples by quantifying the proportion of reads that map to a panel of reference genomes. FastQ Screen is intended to be used routinely as a quality control measure and for analysing samples in which the origin of the DNA is uncertain or has multiple sources.

Keywords

Bioinformatics Contamination FastQC Illumina Metagenomics NGS QC Sequencing

Introduction

In general, reaching sound conclusions from sequencing experiments requires the origin of a sample to be identified correctly prior to mapping. To reduce the risk of contaminants leading to incorrect inferences, it is advisable to map sequencing results against not only the expected reference genome but also against reasonable sources of contamination. Common reasons for contamination include amplifying the wrong target molecule, unwanted DNA being present in reagents used in library generation, carry-over from samples previously loaded onto a sequencing machine or sample swaps.

FastQ Screen uses Bowtie [1], Bowtie 2 [2] or BWA [3], as preferred by the user, to map reads against pre-specified genomes. It presents the mapping results in both text and graphical formats, thereby allowing the user to confirm the genomic origin of a sample or identify sources of DNA contamination. The tool summarises the proportion of reads that map to a single genome or to multiple genomes. In addition, it reports whether those alignments are to a unique position, or to more than one location, within the genome of interest (Figure 1).


Figure 1. Graphical output from FastQ Screen after mapping a publicly available RNA-Seq sample (SRR5100711) against several reference genomes.

Reads either i) mapped uniquely to one genome only (light blue), ii) multi-mapped to one genome only (dark blue), iii) mapped uniquely to a given genome and mapped to at least one other genome (light red) or iv) multi-mapped to a given genome and mapped to at least one other genome (dark red). The reads represented by blue shading are significant because these sequences align to one genome only; consequently, if they are observed in an unexpected genome, they suggest contamination.

FastQ Screen functionality is generally independent of the laboratory protocol followed and so can be used to analyse genomic DNA, RNA-Seq [4], ChIP-Seq or Hi-C experiments. In addition, FastQ Screen is compatible with Bismark [5], and so can also be used to process bisulfite sequence data.

Other tools exist with similar functionality to FastQ Screen, most notably Multi Genome Alignment (MGA) [6]. FastQ Screen has a number of advantages over these tools, including directly reporting the proportion of multi-mapping reads, thereby helping identify DNA populations rich in low-complexity sequences. Another benefit of our program is the capability to create filtered FASTQ files. FastQ Screen is also the only quality control (QC) tool that aligns reads to multiple bisulfite reference genomes.

Methods

Implementation

The program utilises a short read sequence aligner to map FASTQ reads against pre-defined reference genomes. The tool records against which genome or genomes each read maps and summarises the results in graphical and text formats.
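The bookkeeping described above can be illustrated with a minimal sketch. It assumes the aligner output has already been reduced to, for each read, the number of alignment positions found in each genome. The function names, category labels and example genome names are hypothetical and are not taken from the FastQ Screen source code.

```python
from collections import Counter


def classify_read(hits, genome):
    """Return the category of one read with respect to `genome`.

    `hits` maps genome name -> number of alignment positions for the read
    (0 or absent means the read did not map to that genome).
    """
    n = hits.get(genome, 0)
    if n == 0:
        return None  # the read does not map to this genome
    maps_elsewhere = any(
        count > 0 for name, count in hits.items() if name != genome
    )
    uniqueness = "one_hit" if n == 1 else "multiple_hits"
    scope = "multiple_genomes" if maps_elsewhere else "one_genome"
    return f"{uniqueness}_{scope}"


def summarise(reads, genomes):
    """Tally the four categories per genome, plus reads that hit no genome."""
    summary = {g: Counter() for g in genomes}
    no_hits = 0
    for hits in reads:
        if not any(hits.get(g, 0) for g in genomes):
            no_hits += 1
            continue
        for g in genomes:
            category = classify_read(hits, g)
            if category:
                summary[g][category] += 1
    return summary, no_hits


# Illustrative input only: six made-up reads and their hit counts.
reads = [
    {"Mouse": 1},              # unique, one genome
    {"Mouse": 3},              # multi-mapped, one genome
    {"Mouse": 1, "Rat": 1},    # unique in each, two genomes
    {"Mouse": 2, "Human": 1},  # multi-mapped in Mouse, unique in Human
    {"Human": 1},              # unique, one genome
    {},                        # maps nowhere
]
summary, no_hits = summarise(reads, ["Human", "Mouse", "Rat"])
```

The four category labels correspond to the light blue, dark blue, light red and dark red classes of Figure 1; dividing each tally by the number of reads screened gives the proportions that are plotted.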

Operation

We coded FastQ Screen in Perl and made use of the CPAN module GD::Graph for the generation of summary bar plots. The software requires a functional version of Bowtie, Bowtie 2 or BWA, and should be run on a Linux-based operating system. FastQ Screen uses Plotly to enable visualisation of results in a web browser. The tool takes as input a text configuration file and FASTQ files, which are sub-sampled by default to 100,000 reads to reduce running times, and then mapped to a panel of pre-specified genomes.
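To illustrate the sub-sampling step, the sketch below draws a fixed-size random subset from a FASTQ stream in a single pass. It uses reservoir sampling, which is a technique chosen here for illustration and may differ from the way FastQ Screen itself selects reads; the function names and the `seed` parameter are hypothetical.

```python
import io
import random


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a 4-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def subsample(records, target=100_000, seed=0):
    """Reservoir-sample up to `target` records without knowing the total in advance."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < target:
            reservoir.append(record)
        else:
            # Replace an existing entry with probability target / (i + 1).
            j = rng.randint(0, i)
            if j < target:
                reservoir[j] = record
    return reservoir


# Illustrative input only: five made-up reads held in memory.
demo = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(5))
sample = subsample(read_fastq(io.StringIO(demo)), target=3)
everything = subsample(read_fastq(io.StringIO(demo)), target=10)
```

When the file holds fewer reads than the target, all reads are returned in their original order, so small datasets are screened in full.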

Use cases

Preliminary sequencing QC: FastQ Screen provides preliminary evidence on whether a sequencing run has been successful, as demonstrated in Figure 1, which shows results using a publicly available RNA-Seq sample (SRR5100711) labelled as mouse. The software processed the deposited FASTQ file to generate summary results in text, HTML and PNG format. As expected, the dataset contained a substantial proportion of reads that mapped only to the mouse genome. A sizeable proportion of reads mapped to both the mouse and rat genomes, but that may also have been expected considering the close evolutionary relationship between those two species. Of concern, however, was the discovery that 11.4% of the reads mapped solely to the human genome, suggesting the sample was contaminated. This may prove problematic if human-derived reads that also align to the mouse reference genome are not removed, since differences between mouse samples may then reflect variation in the degree of contamination between the samples rather than genuine biological differences. Very few reads aligned to adapter sequences, which was an encouraging observation.
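The reasoning in this use case can be expressed as a simple check: reads that map to one genome only are the informative ones, so a non-trivial share of such reads in an unexpected genome is a warning sign. The sketch below is a hypothetical helper, not part of FastQ Screen. The 1% threshold is an arbitrary assumption, and the counts are invented for illustration, apart from the human figure, which is chosen to reproduce the 11.4% reported above.

```python
def flag_unexpected(one_genome_counts, total_reads, expected, threshold_pct=1.0):
    """Return {genome: percent} for unexpected genomes whose share of reads
    mapping to that genome alone reaches `threshold_pct`."""
    flagged = {}
    for genome, count in one_genome_counts.items():
        pct = 100.0 * count / total_reads
        if genome != expected and pct >= threshold_pct:
            flagged[genome] = round(pct, 1)
    return flagged


# Illustrative counts of reads mapping to one genome only, out of a
# hypothetical 100,000-read subset of a sample labelled as mouse.
counts = {"Mouse": 60_000, "Rat": 300, "Human": 11_400, "Adapters": 50}
flagged = flag_unexpected(counts, total_reads=100_000, expected="Mouse")
```

Reads shared between closely related genomes, such as mouse and rat, are deliberately left out of the check because they do not discriminate between the candidate sources.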

Identifying sample origin from a range of alternatives: FastQ Screen was recently used by researchers to identify the origin of the clothes of the Tyrolean Iceman (popularly named Ötzi), a famous 5,300-year-old natural mummy discovered in 1991 in the Italian Ötztal Alps. By screening sequences against probable sources of preserved leathers, the research team showed that the Iceman’s hat came from brown bear, his quiver from roe deer and his loincloth from sheep [7]. In a similar fashion, FastQ Screen has been used to determine the animal origin of vellum found in 13th-century Bibles [8].

Filtering results: FastQ Screen can also be used to filter reads mapping (or not mapping) to specified genomes. This has numerous applications, most typically to remove DNA contaminants, as exemplified by a recent clinical microbial metagenomics study in which nucleic acids were extracted from porcine faeces [9]. FastQ Screen was used to filter out host sequences, and the remaining reads were then mapped, leading to the identification of over 1,600 bacterial and archaeal species and strains of virus.

In contrast, in some experiments the source of contamination may be completely unpredictable and so we have incorporated a setting in which all unsuccessfully mapped reads are written to a FASTQ format output file. This may then be used by other resources, such as BLAST, to determine the origin of those sequences.
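The two filtering modes described above (removing reads from a known contaminant, and collecting reads that map nowhere) can be sketched as a single partitioning step. This is a minimal illustration that assumes the mapping results are available as a dictionary from read identifier to the set of genomes hit. The function names and the genome labels "Pig" and "Ecoli" are hypothetical, and the sketch does not reproduce FastQ Screen's own filtering options.

```python
def split_by_mapping(records, hits_by_read, remove_genomes):
    """Partition FASTQ records into kept, removed and unmapped lists.

    records:        iterable of (header, sequence, plus, quality) tuples
    hits_by_read:   dict of read id -> set of genome names the read maps to
    remove_genomes: set of genome names whose reads should be discarded
    """
    kept, removed, unmapped = [], [], []
    for record in records:
        # The read id is the header without '@' and without any description.
        read_id = record[0][1:].split()[0]
        genomes = hits_by_read.get(read_id, set())
        if not genomes:
            unmapped.append(record)  # candidates for a later BLAST search
        elif genomes & remove_genomes:
            removed.append(record)   # e.g. host-derived reads
        else:
            kept.append(record)
    return kept, removed, unmapped


def to_fastq(records):
    """Serialise records back to FASTQ text, four lines per read."""
    return "".join("\n".join(r) + "\n" for r in records)


# Illustrative input only: three made-up reads.
records = [
    ("@r1 extra", "ACGT", "+", "IIII"),
    ("@r2", "TTGA", "+", "IIII"),
    ("@r3", "GGCC", "+", "IIII"),
]
hits_by_read = {"r1": {"Pig"}, "r2": {"Ecoli"}}
kept, removed, unmapped = split_by_mapping(records, hits_by_read, {"Pig"})
host_depleted = to_fastq(kept + unmapped)
```

For host depletion, the kept and unmapped reads together form the input for downstream mapping, whereas for contaminant discovery only the unmapped reads are passed on.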

Summary

Since its release, FastQ Screen has been used to analyse a myriad of sequencing datasets. We initially envisioned the software as a QC tool to complement our related program FastQC, but we subsequently used the software to confirm the origin of samples and added functionality for filtering FASTQ reads. The program may be used in conjunction with several common aligners, including Bismark for processing bisulfite libraries. FastQ Screen has been incorporated by other groups into bioinformatics workflows, was reimplemented in the recently released QC tool Aozan [10], and is compatible with MultiQC [11], a tool to aid comparison of samples with respect to a large number of QC metrics.

Software availability

FastQ Screen is available from: https://www.bioinformatics.babraham.ac.uk/projects/fastq_screen

Source code available from: https://github.com/StevenWingett/FastQ-Screen

Archived source code as at time of publication: http://doi.org/10.5281/zenodo.1344584 [12]

License: GNU GPL 3.0

How to cite this article
Wingett SW and Andrews S. FastQ Screen: A tool for multi-genome mapping and quality control [version 1; peer review: 3 approved, 1 approved with reservations] F1000Research 2018, 7:1338 (https://doi.org/10.12688/f1000research.15931.1)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Reviewer Report 17 Sep 2018
Matthew D. Teasdale, BioArCh, University of York, York, UK 
Approved
In this paper Wingett and Andrews describe FastQ Screen a program for quality control and source species identification. The paper is well written with clear example use cases and I am very happy to recommend FastQ Screen for indexing. 

...
HOW TO CITE THIS REPORT
Teasdale MD. Reviewer Report For: FastQ Screen: A tool for multi-genome mapping and quality control [version 1; peer review: 3 approved, 1 approved with reservations]. F1000Research 2018, 7:1338 (https://doi.org/10.5256/f1000research.17398.r37623)
  • Author Response 17 Sep 2018
    Steven Wingett, Bioinformatics, Babraham Institute, Cambridge, CB22 3AT, UK
    17 Sep 2018
    Author Response
    We are delighted that FastQ Screen has proven useful in your research and we hope it will remain part of your analysis pipeline as we add new features to the ...
Reviewer Report 06 Sep 2018
Stéphane Le Crom, Sorbonne Université, Univ Antilles, Univ Nice Sophia Antipolis,  Paris, France;  Sorbonne Université, UMS Omique, Plateforme Post-génomique de la Pitié-Salpêtrière, Paris, France;  Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, Paris, France 
Laurent Jourdren, Institut de biologie de l’Ecole normale supérieure (IBENS), Ecole normale supérieure, Paris, France 
Approved with Reservations
When dealing with multiple high throughput sequencing experiments, especially for core facilities, you need to pay great attention to quality controls. Contaminations from different species you are working with are one of the potential problems you can encounter. The FastQ ...
HOW TO CITE THIS REPORT
Le Crom S and Jourdren L. Reviewer Report For: FastQ Screen: A tool for multi-genome mapping and quality control [version 1; peer review: 3 approved, 1 approved with reservations]. F1000Research 2018, 7:1338 (https://doi.org/10.5256/f1000research.17398.r37622)
  • Author Response 17 Sep 2018
    Steven Wingett, Bioinformatics, Babraham Institute, Cambridge, CB22 3AT, UK
    17 Sep 2018
    Author Response
    Thank you for your detailed feedback. Both you and the reviewer Dr Hamilton pointed out that FastQ Screen would be better served if we made pre-made genome indices available.  As ...
Reviewer Report 04 Sep 2018
Ian J. Donaldson, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, UK 
Approved
FastQ Screen by Wingett and Andrews is a tool to map a sample of sequenced reads against a panel of reference genomes. 

The tool is comprehensively documented and is available from the authors' web site and via ...
HOW TO CITE THIS REPORT
Donaldson IJ. Reviewer Report For: FastQ Screen: A tool for multi-genome mapping and quality control [version 1; peer review: 3 approved, 1 approved with reservations]. F1000Research 2018, 7:1338 (https://doi.org/10.5256/f1000research.17398.r37624)
  • Author Response 17 Sep 2018
    Steven Wingett, Bioinformatics, Babraham Institute, Cambridge, CB22 3AT, UK
    17 Sep 2018
    Author Response
    Thank you for your comments and we are pleased that you find FastQ Screen useful in your research.  We have updated the manuscript to correct the typographical errors.
    Competing Interests: No competing interests were disclosed.
Reviewer Report 29 Aug 2018
Russell Hamilton, University of Cambridge, Cambridge, UK 
Approved
Wingett and Andrews present FastQ Screen for mapping sequencing reads to multiple genomes with the goal of identifying the genome of origin. FastQ Screen is well documented, open source and freely available via GitHub and their own website.
 
...
HOW TO CITE THIS REPORT
Hamilton R. Reviewer Report For: FastQ Screen: A tool for multi-genome mapping and quality control [version 1; peer review: 3 approved, 1 approved with reservations]. F1000Research 2018, 7:1338 (https://doi.org/10.5256/f1000research.17398.r37620)
  • Author Response 17 Sep 2018
    Steven Wingett, Bioinformatics, Babraham Institute, Cambridge, CB22 3AT, UK
    17 Sep 2018
    Author Response
    We agree with the excellent suggestion to create pre-built genomes for users.  Indeed, the latest version of FastQ Screen (v0.13.0) now has a new option (--get_genomes) which instructs the script ...