Keywords
Drug-gene interaction, genomics, next-generation sequencing, annotation, somatic variant, clinical application, Bioconductor package, pipeline.
This article is included in the Bioinformatics gateway.
This article is included in the Bioconductor gateway.
This article is included in the RPackage gateway.
To address the reviewers' comments, we made the following changes to the previous version:
- Added a paragraph under "R session setup"
- As requested by the reviewer, we added a figure (Figure 1) and refer to it in "Query drug-gene interactions"
- Added a section on "Version numbers of DGIdb integrated resources" towards the end of the manuscript
See the authors' detailed response to the review by Ankush Sharma and Md. Sahidul Islam
See the authors' detailed response to the review by Christopher Southan
In recent years, next-generation sequencing (NGS) pipelines have been established and employed extensively in research settings. These efforts have helped tremendously to improve our understanding of genetic malignancies such as cancer. More recently, joint efforts of research groups and clinics aim to further enhance our knowledge of these malignancies for better diagnostic and treatment options. For example, the Cancer Genome Atlas (TCGA)1 Consortium has sequenced several thousand samples of more than 20 different cancer types. One of the aims of this project is to better characterize different cancer types, for example through identification of distinct molecular sub-types.
There are also substantial efforts to move NGS technologies and pipelines into molecular diagnostics, for example, for the characterization of somatic variants of individual tumor samples through targeted panel sequencing. Targeted panel sequencing covers a specific set of genes or locations, typically between 50 and a few hundred. Panels focus on frequently mutated or otherwise altered genes or genomic locations. Currently, several generic cancer panels and panels for specific cancer types are available2,3. Based on the panel characterization, targeted therapies for the specific genetic aberrations can be applied.
The number of targeted therapies for cancer available today is still relatively small and their approval is typically limited to one or several cancer sub-types4. However, as the therapeutic options increase, more patients can benefit from these targeted therapies. As a consequence, several clinics or institutes developed and implemented molecular diagnostic approaches based on whole-exome and/or whole-genome sequencing5–8. Unlike targeted panels, whole-exome or whole-genome sequencing is not limited to a set of pre-selected genes, but allows for the detection of somatic aberrations across all protein coding sequences or the entire genome, respectively.
Exome- or genome-wide approaches offer a major advantage over targeted gene panels: they provide a more comprehensive picture of the mutational landscape of a specific tumor. In addition, with more such data available and a better understanding of gene-gene and drug-gene interactions, prediction of drug efficacy as well as adverse drug reactions may become feasible. However, workflows based on whole-exome or whole-genome sequencing require clinical interpretation of the identified genetic variants. The result of an NGS pipeline is generally a list of genes harboring somatic variants or other genomic aberrations. To identify clinically actionable targets, these genomic aberrations need to be associated with drugs specifically targeting them.
Here we suggest a workflow to automate the identification of potential drug targets from a list of genomic aberrations, represented by a list of genes harboring them. For these genes, we mine drug-gene interactions using the drug-gene interaction database (DGIdb)9. DGIdb integrates drug-gene interactions from 15 different resources. We provide the R/Bioconductor package rDGIdb (http://bioconductor.org/packages/rDGIdb/), which allows efficient integration of drug-gene annotation into NGS pipelines. rDGIdb can query DGIdb and filter results on different levels, i.e., source databases, interaction types, and gene categories. Through the rDGIdb package, drug-gene interaction mining can be automated and incorporated easily into NGS pipelines. Moreover, the rDGIdb package also provides functionality to visualize results.
Somatic variants or other genomic aberrations are identified from raw sequencing data and filtered using a standard NGS pipeline. The number of somatic variants might vary substantially, depending on the sequencing approach used and the levels or stringency of filtering employed. Next, somatic variants are annotated with gene names, for which interacting drugs can then be queried through rDGIdb.
Provided a list of genes with genomic aberrations, we identify aberrations targetable with a drug or compound. The R/Bioconductor package rDGIdb provides functionality to query drug-gene interactions provided by DGIdb and to apply filtering on different levels.
The package can be installed from an open R session. Instructions are provided on the rDGIdb Bioconductor page (http://bioconductor.org/packages/rDGIdb/). After installation of the package and all its dependencies, rDGIdb needs to be attached and a gene vector prepared. Gene names can be loaded from a text file or manually entered. The code below illustrates how to load gene names from a text file called aberrated-genes.txt, assuming the text file lists one gene symbol per line.
    library("rDGIdb")
    genes <- read.table("aberrated-genes.txt", sep = "\t", header = FALSE, stringsAsFactors = FALSE)
    genes <- genes[,1]
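Gene lists exported from a pipeline often contain stray whitespace, blank lines, or repeated symbols. The following is a minimal base-R sketch of cleaning such a list before it is passed to a query. It does not use rDGIdb; the temporary file and its contents are made up for illustration.

```r
# Write a small example file so the sketch is self-contained
# (file contents are illustrative, not real pipeline output).
gene_file <- tempfile(fileext = ".txt")
writeLines(c("DDR2", " KRAS", "BRAF ", "", "DDR2"), gene_file)

# Read one gene symbol per line.
raw <- readLines(gene_file)

# Trim surrounding whitespace, then drop blank lines and duplicates.
genes <- trimws(raw)
genes <- unique(genes[nzchar(genes)])

print(genes)
```

The cleaned character vector has the same form as the `genes` vector prepared above.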
Alternatively, variants can be loaded from a variant call format (VCF) file and annotated using the Bioconductor VariantAnnotation workflow10 (http://bioconductor.org/packages/VariantAnnotation). This is illustrated in the rDGIdb package vignette.
To query DGIdb, the rDGIdb package provides a simple query function, queryDGIdb. The function takes a vector of official gene symbols for which drug-gene interactions are to be queried. This is the only required argument to the query function; all other arguments are optional.
    genes <- c("DDR2")
    queryResult <- queryDGIdb(genes)
The function returns the query result as an object of type rDGIdbResult. The result is accessible through S4 methods. These methods format the result according to the result tabs provided on the DGIdb web interface. More specifically, the package provides four methods that return result data resembling the format provided through the DGIdb web interface, namely “Results Summary”, “Detailed Results”, “By Gene”, and “Search Term Summary”.
    resultSummary(queryResult)      # Summary table of the results
    detailedResults(queryResult)    # Detailed result table listing source and interaction type
    byGene(queryResult)             # Gene summary
    searchTermSummary(queryResult)  # Genes successfully mapped
An example output of resultSummary for the DDR2 gene is shown in Table 1. Interactions are illustrated as a drug-gene interaction network in Figure 1. The figure further shows the resource that reported a specific interaction. Query results can either be further processed using R or saved to a text file for analysis with other software tools.
Gene | Drug | DrugBank | MyCancerGenomeClinicalTrial | GuideToPharmacologyInteractions | CIViC | DoCM | Score |
---|---|---|---|---|---|---|---|
DDR2 | DASATINIB | 0 | 1 | 0 | 1 | 1 | 3 |
DDR2 | ERLOTINIB | 0 | 0 | 0 | 1 | 1 | 2 |
DDR2 | REGORAFENIB | 1 | 1 | 0 | 0 | 0 | 2 |
DDR2 | SORAFENIB | 0 | 0 | 1 | 0 | 0 | 1 |
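In Table 1, the Score of each drug-gene interaction appears to equal the number of listed source databases that report it. The sketch below rebuilds Table 1 by hand as a data frame (it does not query DGIdb), checks that relationship, and writes the table to a tab-separated text file for use with other tools. The object and file names are illustrative assumptions.

```r
# Values copied by hand from Table 1; no query is performed here.
table1 <- data.frame(
  Gene = "DDR2",
  Drug = c("DASATINIB", "ERLOTINIB", "REGORAFENIB", "SORAFENIB"),
  DrugBank = c(0, 0, 1, 0),
  MyCancerGenomeClinicalTrial = c(1, 0, 1, 0),
  GuideToPharmacologyInteractions = c(0, 0, 0, 1),
  CIViC = c(1, 1, 0, 0),
  DoCM = c(1, 1, 0, 0),
  Score = c(3, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Count the supporting source databases per row and compare with Score.
source_cols <- setdiff(names(table1), c("Gene", "Drug", "Score"))
support <- rowSums(table1[, source_cols])
stopifnot(all(support == table1$Score))

# Save the table as tab-separated text (temporary file for illustration).
out_file <- tempfile(fileext = ".tsv")
write.table(table1, out_file, sep = "\t", quote = FALSE, row.names = FALSE)
```

In a live session, the same `write.table` call could be applied to the data frame returned by `resultSummary`.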
Depending on the application, it may be desirable to filter for specific drug-gene interactions. The rDGIdb package allows filtering on the level of (1) source database, (2) gene category, (3) interaction type, and (4) other criteria, applied directly to the query result.
DGIdb aggregates drug-gene interactions from 15 source databases, which are summarized in Table 2. Depending on the application, one or several source databases might be more relevant than others. The database or group of databases to be queried is specified through the sourceDatabases argument, and rDGIdb will only return hits listed in the respective source databases. For example, the query below returns drug-gene interactions from the MyCancerGenome and MyCancerGenomeClinicalTrial databases only.
    genes <- c("KRAS", "BRAF")
    databases <- c("MyCancerGenome", "MyCancerGenomeClinicalTrial")
    filter1 <- queryDGIdb(genes, sourceDatabases = databases)
Source | Link | Reference |
---|---|---|
CancerCommons | https://www.cancercommons.org | 11 |
ChEMBL | https://www.ebi.ac.uk/chembl | 12 |
CIViC | https://civic.genome.wustl.edu | 13 |
ClearityFoundationBiomarkers | http://www.clearityfoundation.org | 14 |
ClearityFoundationClinicalTrial | http://www.clearityfoundation.org/clinical-trials | 14 |
DoCM | http://docm.genome.wustl.edu | 15 |
DrugBank | http://www.drugbank.ca | 16 |
GuideToPharmacologyInteractions | http://www.guidetopharmacology.org | 17 |
MyCancerGenome | https://www.mycancergenome.org | 4 |
MyCancerGenomeClinicalTrial | https://www.mycancergenome.org/clinicaltrials | 4 |
PharmGKB | https://www.pharmgkb.org/ | 18 |
TALC | – | 19 |
TEND | – | 20 |
TdgClinicalTrial | – | 21 |
TTD | http://bidd.nus.edu.sg/group/cjttd | 22 |
The package provides a helper function that prints a list of all available source databases.
sourceDatabases()
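Because source databases are selected by name, a misspelled name is an easy mistake to make. The following sketch checks requested names against the 15 sources listed in Table 2 before any query is sent. `check_sources` is a hypothetical helper, not part of rDGIdb, and the hard-coded vector is an assumption copied from Table 2; in a live session the output of `sourceDatabases()` would be the authoritative list.

```r
# Source names copied from Table 2 (assumed to match the package's list).
known_sources <- c(
  "CancerCommons", "ChEMBL", "CIViC", "ClearityFoundationBiomarkers",
  "ClearityFoundationClinicalTrial", "DoCM", "DrugBank",
  "GuideToPharmacologyInteractions", "MyCancerGenome",
  "MyCancerGenomeClinicalTrial", "PharmGKB", "TALC", "TEND",
  "TdgClinicalTrial", "TTD"
)

# Hypothetical helper: stop with a clear message on unknown names.
check_sources <- function(requested, known = known_sources) {
  unknown <- setdiff(requested, known)
  if (length(unknown) > 0) {
    stop("Unknown source database(s): ", paste(unknown, collapse = ", "))
  }
  invisible(requested)
}

# Valid names pass silently.
check_sources(c("MyCancerGenome", "DrugBank"))

# A name with the wrong capitalization is reported.
msg <- tryCatch(check_sources("Drugbank"),
                error = function(e) conditionMessage(e))
print(msg)
```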
Similarly, we can filter for specific gene categories. With the gene categories filter, drug interactions for genes with a specific category label can be queried. Examples of gene categories are clinically actionable, kinase, or tumor suppressor. The optional geneCategories argument can be used to filter by gene categories.
    categories <- c("clinically actionable", "kinase", "tumor suppressor")
    filter2 <- queryDGIdb(genes, geneCategories = categories)
There are 41 different gene categories available. The following command lists all available gene categories.
geneCategories()
Finally, the package provides filtering by interaction type. An interaction type is a label for the type of drug-gene interaction. There are 33 different interaction types available; examples include activator, inhibitor, cofactor, and modulator. The code below illustrates how to filter for specific interaction types.
    interactions <- c("activator", "inhibitor")
    filter3 <- queryDGIdb(genes, interactionTypes = interactions)
To print a list of all available interaction types, one can use the following method:
interactionTypes()
Depending on the requirement of a specific application, additional filtering might be applied directly on the query results. For example, to increase confidence of results, drug-gene interactions might be filtered by setting a minimum cutoff on the score. As a result, only drug-gene interactions supported by a minimum number of source databases will be reported. Different score cutoffs may be employed, depending on whether the aim is to query interactions with support from multiple source databases or to include as many drug-gene interactions as there are available in the source databases. The example below illustrates how to filter out drug-gene interactions with only a single supporting source database from the result summary table.
subset(resultSummary(filter2), Score > 1)
rDGIdb returns information on the type of interacting drug (such as inhibitor) to assist the follow-up interpretation of drug-gene interactions. Nevertheless, querying and filtering through rDGIdb has limitations. For example, it is not possible to filter for specific drug-variant interactions: variants at different locations in the same gene might have different biological effects in a cell or tumor, but because querying is done at the gene level, such variants cannot be distinguished. Additional expert knowledge or other approaches will have to be employed to exclude non-relevant drug-gene interactions from the query results.
The package allows basic plotting of the results. Specifically, the number of interactions by source database can be visualized. An example plot is provided in Figure 2. This plot indicates which source databases report a particularly large or small number of drug-gene interactions.
plotInteractionsBySource(filter2)
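To make clear what such a plot summarizes, the sketch below counts interactions per source database and draws a bar plot with base R. It is an approximation for illustration only, not the implementation of plotInteractionsBySource. The long-format table is entered by hand from the DDR2 entries in Table 1, and its column names are assumptions.

```r
# One row per (drug, source) pair, entered by hand from Table 1.
interactions <- data.frame(
  Drug = c("DASATINIB", "DASATINIB", "DASATINIB", "ERLOTINIB",
           "ERLOTINIB", "REGORAFENIB", "REGORAFENIB", "SORAFENIB"),
  Source = c("MyCancerGenomeClinicalTrial", "CIViC", "DoCM", "CIViC",
             "DoCM", "DrugBank", "MyCancerGenomeClinicalTrial",
             "GuideToPharmacologyInteractions"),
  stringsAsFactors = FALSE
)

# Number of interactions reported by each source database.
counts <- sort(table(interactions$Source), decreasing = TRUE)
print(counts)

# Avoid writing a plot file when run non-interactively.
if (!interactive()) pdf(file = NULL)
barplot(counts, las = 2, cex.names = 0.6, ylab = "Number of interactions")
if (!interactive()) invisible(dev.off())
```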
The rDGIdb package provides a function to print the version numbers of all resources integrated in DGIdb. This function helps users to decide if the resource versions available through rDGIdb are sufficient for their intended purpose.
resourceVersions()
We have described a workflow to identify potentially actionable genomic aberrations. More specifically, we have introduced the R/Bioconductor package rDGIdb, which provides an interface to query DGIdb using R. Given a list of genes with genomic aberrations, rDGIdb queries drug-gene interactions. The package allows filtering on different levels and visualization of the results. The rDGIdb package further includes detailed documentation and a vignette, which provides a step-by-step description of the workflow.
rDGIdb depends on jsonlite and httr, which are available in R version 3.3.1 or higher. Briefly, rDGIdb queries the API provided by DGIdb (http://dgidb.genome.wustl.edu/api) using the POST function implemented in httr. Drug-gene interactions are returned by DGIdb in JSON format. Next, the data is deserialized into an R list object using the jsonlite package. Finally, the list is parsed and stored as an object of type rDGIdbResult. In order for rDGIdb to work, jsonlite, httr, and their dependencies need to be installed. A complete sessionInfo() output is provided below, which includes minimal version numbers of all dependencies.
• R version 3.3.1 (2016-06-21), x86_64-apple-darwin13.4.0
• Locale: en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
• Base packages: base, datasets, graphics, grDevices, methods, stats, utils
• Other packages: rDGIdb 0.99.4
• Loaded via a namespace (and not attached): httr 1.1.0, jsonlite 1.0, R6 2.1.2, tools 3.3.1
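The deserialization step described above can be sketched with jsonlite alone. The example parses a small hand-written JSON string into a nested R list and flattens it into a data frame. The JSON structure, field names, and interaction type values are assumptions made for illustration and may differ from what the DGIdb API actually returns; no HTTP request is sent.

```r
library(jsonlite)

# Hypothetical response body (structure and field names are assumed).
payload <- '{
  "matchedTerms": [
    {"searchTerm": "DDR2", "geneName": "DDR2",
     "interactions": [
       {"drugName": "DASATINIB", "source": "CIViC",
        "interactionType": "inhibitor"},
       {"drugName": "SORAFENIB", "source": "GuideToPharmacologyInteractions",
        "interactionType": "inhibitor"}
     ]}
  ],
  "unmatchedTerms": []
}'

# Deserialize JSON into a nested R list.
parsed <- fromJSON(payload, simplifyVector = FALSE)

# Flatten: one data frame row per drug-gene interaction.
rows <- lapply(parsed$matchedTerms, function(term) {
  do.call(rbind, lapply(term$interactions, function(x) {
    data.frame(Gene = term$geneName, Drug = x$drugName,
               Source = x$source, InteractionType = x$interactionType,
               stringsAsFactors = FALSE)
  }))
})
interactions <- do.call(rbind, rows)
print(interactions)
```

Using `simplifyVector = FALSE` keeps the nested list structure intact, which makes the explicit flattening step easy to follow.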
1. Software available from: http://bioconductor.org/packages/rDGIdb/
2. Latest source code: https://github.com/Bioconductor-mirror/rDGIdb
3. Archived source code as at time of publication: http://dx.doi.org/10.5281/zenodo.5925323
4. License: MIT license
TT and FS designed the query framework, tested the package, and wrote the manuscript. TT implemented the package. NB and DS supervised the work. All authors read and approved the manuscript.
This work was supported by EU Horizon 2020 PHC grant No. 633974 (SOUND – Statistical multi-Omics UNDerstanding of Patient Samples).
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
The authors acknowledge Anja Irmisch and Mitchell Levesque from the University Hospital Zurich (USZ) for their valuable feedback on filtering and interpretation of drug-gene interactions.
Competing Interests: No competing interests were disclosed.
Version 2 (revision): 24 Oct 16
Version 1: 12 Aug 16