Keywords
machine learning, multi-omics, data integration, data harmonisation, multivariate analysis
Major pipeline updates, including to the API, the way optimal hyperparameters are stored and handled, and parallelisation. We also removed redundant plots and now correctly skip cases where the number of components is too few for a 2D plot. The existing case study was replaced with a separate, simplified case study compatible with our latest pipeline API. Installation was reworked to fix dependency issues, and new automated tests were included. The introduction was rewritten. The repository now contains information about R version compatibility and operating system requirements.
See the authors' detailed response to the review by Javad Zahiri
See the authors' detailed response to the review by Arjun Krishnan
A biological phenotype is an emergent property of a complex network of biological interactions. Since relying on a single layer of omics data to test a biological hypothesis results in an incomplete perspective of a biological system, interest in multi-omics data integration is steadily increasing as a means to decipher complex biological phenotypes.1
As a result, methods have been developed to leverage the multitude of data modalities in characterising biological systems. While many tools are available, most of these methods are heavily customised to fit a specific experimental design, and are not flexible enough to handle generic use cases.1 Furthermore, many tools that claim to perform data integration actually perform high-level data aggregation, where datasets are processed individually and only summarised.1 Of these algorithms, few perform data integration of multiple layers of omics data simultaneously, which we refer to specifically as “data harmonisation” to distinguish it from the more general term of “data integration”.1
Contributing to the lack of a generic “data harmonisation” tool is the nature of conventional biological pipelines, where multiple layers of data preprocessing and summarisation cause an irreversible loss of information during analysis. Therefore, in the context of this article, unrefined and information-rich data refers to data in a primary form, before heavy information loss occurs, such as matrices of molecular abundance data. By exploiting these low-level correlations instead of high-level summarised information, it is even possible to identify relationships between individual biological molecules.
We illustrate these points with a hypothetical case of measuring protein and transcript levels in the same set of matched samples. A correlation across transcript and protein abundance functions as an interpretable association metric, highlighting interesting features (strong correlations) for further investigation [Figure 1].2 Furthermore, increasing the number of omics layers theoretically increases resolution, and therefore the information obtained. Published multi-omics studies discovering novel biological insights which are not possible with single-omics data further support our points.3–9 With the increasing volume of multi-omics data present in publicly accessible biological data repositories,10–12 multi-omics data integration is expected to be the core strategy of modern and future biological data analyses.
The rectangles represent different layers of omics data (e.g. proteome, transcriptome and lipidome) while the circles represent features within their respective omics data layer. Black single-line arrows show correlation between features within the omics data (e.g. a regulatory factor) while blue double-lines show correlation between features across different omics data layers. A powerful abstraction of the system under study can be obtained by reviewing multiple layers of omics data holistically.
It is important to note that at this time, no end-to-end pipeline or framework exists which allows the user to quickly and easily input unrefined data, run a pipeline and export output data which can be used for downstream analyses. Therefore, to facilitate this, we developed multiomics, a flexible, easy-to-install and easy-to-use pipeline targeted at bioinformaticians.13 We implemented functions from the mixOmics14 R package, as it is one of the only methods in the field which is generic in scope, makes no restrictive assumptions and integrates data at the level of individual molecules. It can be installed as a conventional R15 package or used by cloning the associated git repository.16 A series of quality control plots are generated automatically and compiled into a pdf file. There is seamless integration with mixOmics, where data generated by the pipeline is exported automatically as an R data object of mixOmics classes, allowing expert users to intervene where needed, while allowing new users to perform a comprehensive screen of their data. As a form of checkpointing, the R data object is updated at every major stage of the pipeline, and can be loaded directly into the mixOmics suite of tools for further investigation or plot customisation. To increase reproducibility, command line arguments and parameters are also exported as files which can be rerun directly to reproduce the output. For convenience, the option to provide command line arguments as a json file is also available.
Detailed documentation is provided both within the source git repository and as vignettes in the R package. Multiple installation methods are shown in the git repository to maximise accessibility of our pipeline for users.17–19 Additionally, walkthroughs of three case studies are included. Complete and detailed examples of input data format are also provided, including a sample dataset which can be loaded directly from the R package. In this manuscript, we summarise this information and show a minimum working example to highlight some of the key features of our pipeline.
Quick install
You can install this directly as an R package from GitLab:
install.packages("devtools")
library("devtools")
install_gitlab("tyagilab/sars-cov-2", subdir="multiomics", INSTALL_opts="--no-multiarch")
Manual install
If the above automated install steps do not work, detailed manual installation instructions are available in the source git repository at https://github.com/tyronechen/SARS-CoV-2 and https://gitlab.com/tyagilab/sars-cov-2/-/tree/master for conda and R.
You may need to install mixOmics from source. If needed, please follow the installation instructions on https://github.com/mixOmicsTeam/mixOmics:
install_github("mixOmicsTeam/mixOmics")
The pipeline is not run by calling a package function directly; it is provided as a standalone script. Running the following command will show you the path to the script. A copy is also available in the source git repository.
# inside R: show the path to the script
system.file("scripts", "run_pipeline.R", package="multiomics")

# outside of R
Rscript run_pipeline.R -h
Example input
Three elements are the minimum required input for the pipeline [Figure 2]. First, a file containing biological class information is required. Next, at least two files corresponding to omics data blocks are required. Finally, a list of unique names labelling each data block is required. Examples of these input files and their internal data structure as they appear in the pipeline are shown.
# data is included within the package
# for demonstration purposes we extract the data into files,
# since the pipeline takes files as input
library(multiomics)
data(BPH2819)
names(BPH2819)
# [1] "classes"       "metabolome"    "proteome"      "transcriptome"
export <- function(name, data) {
  write.table(
    data.frame(data), paste(name, ".tsv", sep=""),
    quote=FALSE, sep="\t", row.names=TRUE, col.names=NA
  )
}
mapply(export, names(BPH2819), BPH2819, SIMPLIFY=FALSE)
# if the above does not work, they are available online
# note: use the raw file URLs, not the github "blob" page URLs
url_class <- "https://raw.githubusercontent.com/tyronechen/SARS-CoV-2/master/multiomics/data/classes.tsv"
url_meta <- "https://raw.githubusercontent.com/tyronechen/SARS-CoV-2/master/multiomics/data/metabolome.tsv"
url_prot <- "https://raw.githubusercontent.com/tyronechen/SARS-CoV-2/master/multiomics/data/proteome.tsv"
url_tran <- "https://raw.githubusercontent.com/tyronechen/SARS-CoV-2/master/multiomics/data/transcriptome.tsv"

urls <- c(url_class, url_meta, url_prot, url_tran)
file_names <- sapply(strsplit(urls, "/"), tail, 1)
mapply(function(x, y) download.file(x, y), urls, file_names, SIMPLIFY=FALSE)
if (!all(file.exists(file_names))) {stop("Files incorrectly downloaded!")}
Note that column names and row names should be truncated to avoid bugs in the pipeline associated with name length. Furthermore, usage of non-alphanumeric characters in their names should be avoided as R quietly replaces these with “.” (periods).
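For instance, R's make.names(), which read.table() applies to column names by default via check.names=TRUE, shows exactly how such characters are rewritten:

```r
# read.table() silently rewrites non-syntactic column names
# via make.names() (check.names=TRUE by default): non-alphanumeric
# characters become "." and names starting with a digit gain "X"
make.names(c("3-Aminoglutaric acid", "HMDB0000005", "gene-id#1"))
# [1] "X3.Aminoglutaric.acid" "HMDB0000005"           "gene.id.1"
```

The first example matches the mangled metabolite name visible in the case study data later in this article.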
The pipeline is run with the command Rscript run_pipeline.R, passing a list of command line arguments either as strings of text or in a json file (recommended). Running the full pipeline can take some time. The main bottleneck is parameter tuning, which scales exponentially with the number of omics data blocks; it is possible to disable this if the user wants to perform a test run or already knows the parameters. We note that R data objects are periodically exported, allowing seamless integration with functions in the underlying mixOmics package when needed. A secondary bottleneck is data imputation, which scales with the number of components used and the dimensions of the input data. If needed, it is possible to impute and export the imputed data either with the pipeline or with the underlying mixOmics function, and then substitute that as input. The user can adjust the number of CPUs to speed up the process. Data imputation can be skipped if it is not required.
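As an illustrative sketch only (the exact argument names and accepted values should be confirmed against Rscript run_pipeline.R -h or the example json files in the repository), a minimal json parameter file could look like this:

```json
{
  "classes": "classes.tsv",
  "data": ["metabolome.tsv", "proteome.tsv", "transcriptome.tsv"],
  "data_names": ["metabolome", "proteome", "transcriptome"],
  "ncpus": 4,
  "plsdacomp": 2,
  "splsdacomp": 2,
  "diablocomp": 0,
  "dist_plsda": "centroids.dist",
  "dist_splsda": "centroids.dist",
  "dist_diablo": "centroids.dist"
}
```

Setting diablocomp to 0, as described later in this article, triggers internal parameter tuning.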
Code for the pipeline can be examined in detail from the git repository or individual functions can be inspected directly after loading the R multiomics package.
Output files include a pdf file compiling all graphical output.20–24 Note that this can be quite large, especially for large datasets. A graphml file is also exported for input into Cytoscape.25 Due to the size and volume of plots, we provide a link to some example plots here. A manuscript using figures generated from this pipeline is also available for reference.26
Each analysis generates a series of text files containing feature weights. In some ways, these are functionally analogous to differential expression analyses, where these coefficients summarise the features with the most phenotypically relevant information. At the same time, a table of feature correlations across multi-omics data is generated. Some examples of these are shown below:
# download single-omic variable weights
url <- paste(
  "https://raw.githubusercontent.com/tyronechen/",
  "SARS-CoV-2/master/results/case_study_3/",
  "Metabolomics_GC_MS_1_PLSDA_max.txt", sep=""
)
download.file(url, "Metabolomics_GC_MS_1_PLSDA_max.txt")
url <- paste(
  "https://raw.githubusercontent.com/tyronechen/",
  "SARS-CoV-2/master/results/case_study_3/",
  "Metabolomics_GC_MS_1_sPLSDA_max.txt", sep=""
)
download.file(url, "Metabolomics_GC_MS_1_sPLSDA_max.txt")
# download multi-omic variable weights
# this is for a single block of omics data
url <- paste(
  "https://raw.githubusercontent.com/tyronechen/",
  "SARS-CoV-2/master/results/case_study_3/",
  "Metabolomics_GC_MS_1_DIABLO_var_keepx_max.txt", sep=""
)
download.file(url, "Metabolomics_GC_MS_1_DIABLO_var_keepx_max.txt")
# download multi-omic correlations
url <- paste(
  "https://raw.githubusercontent.com/tyronechen/",
  "SARS-CoV-2/master/results/case_study_3/",
  "DIABLO_var_keepx_correlations.txt", sep=""
)
download.file(url, "DIABLO_var_keepx_correlations.txt")
metabolomics_plsda <- read.table(
  "Metabolomics_GC_MS_1_PLSDA_max.txt",
  header=TRUE, sep="\t", row.names=1
)
colnames(metabolomics_plsda)
# [1] "Sera"         "Contrib.RPMI" "Contrib.Sera" "Contrib"      "GroupContrib"
# [6] "color"        "importance"
metabolomics_splsda <- read.table(
  "Metabolomics_GC_MS_1_sPLSDA_max.txt",
  header=TRUE, sep="\t", row.names=1
)
colnames(metabolomics_splsda)
# [1] "Sera"         "Contrib.RPMI" "Contrib.Sera" "Contrib"      "GroupContrib"
# [6] "color"        "importance"
head(metabolomics_splsda[,1:2])
#                   Sera Contrib.RPMI
# HMDB0000673 -0.9031770    0.9405335
# HMDB0000067 -0.9197936    0.9595830
# HMDB0000273 -0.9501236    0.9371693
# HMDB0000207 -0.9487778    1.0114701
# HMDB0003229 -1.0032847    1.0099811
# HMDB0001043 -0.7016579    0.9593041
metabolomics_diablo <- read.table(
  "Metabolomics_GC_MS_1_DIABLO_var_keepx_max.txt",
  header=TRUE, sep="\t", row.names=1
)
colnames(metabolomics_diablo)
# [1] "More.severe"         "Contrib.Less.severe" "Contrib.More.severe" "Contrib"
# [5] "GroupContrib"        "color"               "importance"
head(metabolomics_diablo[,1:2])
#                                      Sera Contrib.RPMI
# Metabolomics_GC_MS_HMDB0000067 -0.9197936    0.9595830
# Metabolomics_GC_MS_HMDB0000673 -0.9031770    0.9405335
# Metabolomics_GC_MS_HMDB0000273 -0.9501236    0.9371693
# Metabolomics_GC_MS_HMDB0000207 -0.9487778    1.0114701
# Metabolomics_GC_MS_HMDB0003229 -1.0032847    1.0099811
correlations <- read.table(
  "DIABLO_var_keepx_correlations.txt",
  header=TRUE, sep="\t", row.names=1
)
dim(correlations)
# [1] 15 15
head(correlations[,1:2])
#                                   Metabolomics_GC_MS_HMDB0000207
# Metabolomics_GC_MS_HMDB0000067                         0.9746416
# Metabolomics_GC_MS_HMDB0000207                         0.9727075
# Metabolomics_GC_MS_HMDB0000273                         0.9785230
# Metabolomics_GC_MS_HMDB0000673                         0.9803517
# Metabolomics_GC_MS_HMDB0003229                         0.9648338
# Proteomics_MS1_DDA_WP_000514408.1                     -0.9706518
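As a quick illustration, strongly correlated feature pairs can be pulled directly from this correlation table. This snippet is a generic sketch, not a pipeline function, and the 0.9 cutoff is arbitrary:

```r
# sketch: list feature pairs with strong correlations across omics
# the 0.9 cutoff is arbitrary and should be adjusted per dataset
cm <- as.matrix(correlations)
idx <- which(abs(cm) > 0.9 & upper.tri(cm), arr.ind=TRUE)
pairs <- data.frame(
  feature_a = rownames(cm)[idx[, 1]],
  feature_b = colnames(cm)[idx[, 2]],
  r = cm[idx]
)
pairs[order(-abs(pairs$r)), ]
```

Pairs spanning two different data blocks (e.g. a metabolite against a protein) are the cross-omics associations of interest.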
An R data file containing all of the information above and a script containing command line arguments which can be used to reproduce the analysis are also exported to enable full reproducibility.
Examples of these output files for three case studies are included in the source git repository.
We demonstrate a sample use case of our pipeline with reference to an earlier re-analysis of a published dataset.13,26 For simplicity, we highlight one case study only in this manuscript, but note that detailed walkthroughs for all three are available in the source git repository.27–29 Our tool takes as input at least two data files present as tables of quantitative information, with samples as rows and features as columns. A list of names corresponding to these data blocks is required. A file containing class information is also required as a list of newline separated values. Examples of these data and class files for three case studies are included in the source git repository. Other command line arguments control the distance metric used for prediction, the number of features to select, and more. A full description can be obtained by running Rscript run_pipeline.R -h, which lists every flag in detail. Because of the number of command line arguments, an option is provided to pass these parameters as a json file to the pipeline. Examples of these json files for three case studies are included in the source git repository.
Regarding input data, some example data29 is provided as part of our R package.
library(multiomics)
data(BPH2819)
names(BPH2819)
# [1] "classes"       "metabolome"    "proteome"      "transcriptome"
sapply(BPH2819, dim)
# $classes
# NULL
#
# $metabolome
# [1]  12 153
#
# $proteome
# [1]   12 1451
#
# $transcriptome
# [1]   12 2771
Alternatively, you may download this from our git repository directly. This is a subset of sepsis data generated in a separate publication.29
We provide a fully processed dataset as a guide for the user. The steps below can be reproduced by downloading the R data object with the following command:
url <- paste(
  "https://github.com/tyronechen/SARS-CoV-2/",
  "raw/master/results/case_study_3/data.RData", sep=""
)
download.file(url, "RData.RData")
load("RData.RData")
ls()
#  [1] "argpath"               "argv"                  "classes"
#  [4] "contrib"               "corr_cutoff"           "correlations"
#  [7] "data"                  "data_imp"              "data_names"
# [10] "data_pca_multilevel"   "data_plsda"            "data_splsda"
# [13] "design"                "diablo"                "diablo_input"
# [16] "diablo_keepx"          "diablo_ncomp"          "dimensions"
# [19] "dist_diablo"           "dist_plsda"            "dist_splsda"
# [22] "export"                "heatmaps"              "i"
# [25] "input_data"            "linkage"               "low_var"
# [28] "mappings"              "metabolomics_diablo"   "metabolomics_plsda"
# [31] "metabolomics_splsda"   "missing"               "optimal_params"
# [34] "optimal_params_values" "outdir"                "paths"
# [37] "pca_impute"            "pca_withna"            "pch"
# [40] "perf_diablo"           "plot"                  "plsda_ncomp"
# [43] "rdata"                 "splsda_keepx"          "splsda_ncomp"
# [46] "tuned_diablo"          "tuned_splsda"          "url"
# [49] "x"                     "y"
Inspecting the minimum required input (classes and data) reveals the following:
# number of samples
length(classes)
# [1] 12

# data dimensions
sapply(data, dim)
#      Metabolomics_GC_MS Proteomics_MS1_DDA RNA_Seq
# [1,]                 12                 12      12
# [2,]                153               1451    2771

table(classes)
# classes
# RPMI Sera
#    6    6

head(data$Metabolomics_GC_MS[,1:3])
#        X3.Aminoglutaric.acid HMDB0000005 HMDB0000008
# RPMI_0            -1.7814083   -9.103010   -3.471373
# RPMI_1            -1.9108074   -5.401229   -3.488496
# RPMI_2            -1.5458964  -10.898804   -2.845025
# RPMI_3            -2.1842312   -9.563557   -1.232155
# RPMI_4            -1.3106881   -4.755440   -1.723564
# RPMI_5            -0.9600247   -4.771127   -1.403044
First, data is filtered if associated options are specified by the user. Features with missing values across sample groups are discarded by default. The user can also choose to filter out features (columns) exceeding a certain threshold of missing values.
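The idea behind this filter can be sketched in a few lines of base R. Note that drop_sparse_features and its 0.5 default are illustrative, not the pipeline's actual internals:

```r
# drop features (columns) whose proportion of missing values
# exceeds a threshold across all samples
drop_sparse_features <- function(x, max_na_prop=0.5) {
  keep <- colMeans(is.na(x)) <= max_na_prop
  x[, keep, drop=FALSE]
}

# example: a 4 x 3 matrix where the third feature is mostly missing
m <- matrix(c(1, 2, 3, 4, 5, NA, 7, 8, NA, NA, NA, 12), nrow=4)
dim(drop_sparse_features(m))
# [1] 4 2
```

In the pipeline, the threshold corresponds to the user-facing missing-value proportion option.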
Imputing missing values is optional as PLS-derived methods can function without this step. However, we include this information in case the user would like to perform this step manually. Remaining missing values can be imputed, with the number of components controlled by the user-specified --icomp flag. Imputation is effective when the quantity of missing values is below 20% of the data. To investigate whether the data has been significantly changed, the user can plot a correlation plot of the principal components before and after imputation. Since imputation can take a long time, especially for large datasets, the imputed data is saved by default and the user can load it in directly as input if desired.
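The before/after check described above can be sketched with mixOmics. The object names here mirror those in the pipeline's exported RData (input_data, data_imp, pca_withna, pca_impute), but treat this as an illustrative sketch rather than the pipeline's exact code, since the stored objects' structures may differ:

```r
library(mixOmics)

# pca() in mixOmics tolerates missing values (NIPALS), so the
# un-imputed data can be decomposed directly for comparison
pca_withna <- pca(input_data, ncomp=10)
pca_impute <- pca(data_imp, ncomp=10)

# per-component correlations near 1 suggest imputation has not
# distorted the major axes of variation
diag(cor(pca_withna$variates$X, pca_impute$variates$X))
```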
If the study design is longitudinal (e.g. has repeated measurements on the same sample), then the --pch flag should be enabled by the user. The user should pass in a file with the same format as the classes file, but containing information regarding the repeated measurements.23,30 Providing this information allows the pipeline to adjust for this internally.
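As a hypothetical illustration (not a file from the repository), a repeated-measures file for six samples drawn from three subjects could contain one value per line, in the same order as the samples:

```text
subject_1
subject_1
subject_2
subject_2
subject_3
subject_3
```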
Most of the parameters for the machine learning algorithms are specified by the user. These cover the three methods PLSDA (partial least squares discriminant analysis), sPLSDA (sparse PLSDA) and multi-block sPLSDA (also known as DIABLO). The underlying methods are implemented within the mixOmics software package and more information is available on their website http://mixomics.org/. For each method, a distance metric is specified, either “max.dist”, “centroids.dist” or “mahalanobis.dist”. Unlike PLSDA, sPLSDA and multi-block sPLSDA focus on selecting a subset of the most relevant features and therefore require a user-specified list describing the quantity of features to be selected from the data. The number of components to derive for each method is also provided. Several exploratory runs across a wide range of feature counts can be carried out to find the optimal configuration, e.g. starting at 5, 10, 30, 50, 100, inspecting the output and narrowing the range further. The user can specify a few additional special parameters to the multi-block sPLSDA (block.splsda) function. The linkage parameter is a continuous value from 0 to 1 and describes the type of analysis, with a value closer to 0 prioritising class discrimination and a value closer to 1 prioritising correlation between data sets. Meanwhile, setting the number of multi-block sPLSDA components to 0 causes the pipeline to perform parameter tuning internally. Note that this can take a long time, and scales exponentially per added block of omics data. The user can also specify the number of CPUs to be used for parallel processing, which mainly affects parameter tuning. Using our example, these arguments are provided here:
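To make these parameters concrete, the core DIABLO call in mixOmics looks roughly like the following. The block matrices (metab, prot, rna) and keepX values are placeholders for illustration, not the pipeline's defaults:

```r
library(mixOmics)

# blocks: named list of matrices (samples x features), rows matched
blocks <- list(metabolome=metab, proteome=prot, transcriptome=rna)

# design matrix: the off-diagonal "linkage" value (here 0.1)
# prioritises class discrimination; values near 1 would instead
# prioritise correlation between data sets
design <- matrix(0.1, nrow=length(blocks), ncol=length(blocks),
                 dimnames=list(names(blocks), names(blocks)))
diag(design) <- 0

# keepX: number of features to select per block, per component
keepx <- list(metabolome=c(5, 10), proteome=c(5, 10), transcriptome=c(5, 10))

diablo <- block.splsda(X=blocks, Y=classes, ncomp=2, keepX=keepx, design=design)
```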
> argv
# …
[[1]]
[1] FALSE

$help
[1] FALSE

$low_var
[1] FALSE

$mini_run
[1] FALSE

$progress_bar
[1] TRUE

$opts
[1] NA

$json
[1] NA

$classes
[1] "BPH2819_info_all.tsv"

$classes_secondary
[1] NA

$dropna_classes
[1] FALSE

$dropna_prop
[1] 0.6

$data
[1] "Staphylococcus_aureus_BPH2819_Metabolomics_GC_MS/BPH2819.tsv"
[2] "Staphylococcus_aureus_BPH2819_Proteomics_MS1_DDA/BPH2819.tsv"
[3] "Staphylococcus_aureus_BPH2819_RNA_Seq/BPH2819.tsv"

$data_names
[1] "Metabolomics_GC_MS" "Proteomics_MS1_DDA" "RNA_Seq"

$force_unique
[1] TRUE

$mappings
[1] NA

$ncpus
[1] 24

$diablocomp
[1] 2

$linkage
[1] 0.1

$diablo_keepx
[1] NA

$icomp
[1] 12

$zero_as_na
[1] TRUE

$replace_missing
[1] TRUE

$pcomp
[1] 10

$plsdacomp
[1] 2

$splsdacomp
[1] 2

$splsda_keepx
[1] NA

$dist_plsda
[1] "centroids.dist"

$dist_splsda
[1] "centroids.dist"

$dist_diablo
[1] "centroids.dist"

$cross_val
[1] "Mfold"

$cross_val_nrepeat
[1] 50

$cross_val_folds
[1] 5

$contrib
[1] "max"

$corr_cutoff
[1] 0.1

$optimal_params
[1] NA
# …
Results as well as quality control metrics (including cross-validation error rates) are exported in a series of plots and compiled into a pdf [Figure 3]. They can also be accessed internally from our provided R data object. Some sample output is shown below.
Pipeline output can be controlled by specifying a number of flags. By default, the pipeline deposits data in the current working directory; this behaviour can be easily modified. Setting outfile_dir specifies the master output directory. An R data object containing the objects shown in the loaded RData file can be renamed with the rdata option, generating a file similar to the one used in this example. The plot flag names the multi-page pdf file compiling all graphical output generated in the pipeline. A reproducible script is generated and named by the user with the args flag (this defaults to Rscript.sh).
> argv
# continued from previous …
$outfile_dir
[1] "/path/to/outdir"

$rdata
[1] "./data.RData"

$plot
[1] "./Rplots.pdf"

$args
[1] "Rscript.sh"
# …
Finally, the pipeline has limited checkpointing built in. At each milestone, the relevant output is saved and written out as an RData file, similar to the one presented above. This allows the user to manually inspect the data and adjust it where needed. In the case of completed output, the user can further customise plots and data exports for publication or downstream analysis. Importantly, the data objects are compatible with core mixOmics functions, allowing seamless integration with the mixOmics suite of tools if the user intends to extend the analysis or build custom workflows.
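For example, a checkpointed RData file can be loaded and the fitted DIABLO object handed straight to standard mixOmics plotting functions. This is a sketch; object names follow the ls() output shown earlier:

```r
load("RData.RData")  # restores pipeline objects, e.g. `diablo`
library(mixOmics)

plotIndiv(diablo, legend=TRUE)  # samples in component space
plotVar(diablo)                 # correlation circle of selected features
plotLoadings(diablo, comp=1)    # per-block feature loadings
```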
Conceptualization, S.T., T.C.; Data Curation, S.T., T.C.; Formal Analysis, K-A.L-C., T.C.; Funding Acquisition, K-A.L-C., S.T.; Methodology, A.J.A., K-A.L-C.; Project Administration, S.T.; Resources, S.T.; Supervision, K-A.L-C., S.T.; Software, A.J.A., K-A.L-C., T.C.; Validation, A.J.A., K-A.L-C., S.T., T.C.; Visualization, A.J.A., K-A.L-C.; Writing - Original Draft Preparation, S.T., T.C.; Writing - Review & Editing, A.J.A., K-A.L-C., S.T., T.C.
Primary data was generated by third parties and is publicly available.27–29 For case study 1, translatome data is available from the source publication as Supplementary Table 1 and proteome data as Supplementary Table 2. For case study 2, the authors provided their data in an SQL database. For case study 3, data is provided in publicly available accessions.
Zenodo: Multi-omics data harmonisation for the discovery of COVID-19 drug targets. https://doi.org/10.5281/zenodo.4602867.13
This project contains the following data.
• Documentation in markdown format describing pipeline usage on two case studies.
• Input data files in plain text (see Source Data for more information).
• Graphical output as pdf files and feature weights as text files.
• Source code, including code to reproduce figures in this article and source code for the R package.
• Docker file specifications for use with Docker and singularity images.
Github: SARS-CoV-2. https://github.com/tyronechen/SARS-CoV-2
Gitlab: SARS-CoV-2. https://gitlab.com/tyagilab/sars-cov-2.13
• Documentation in markdown format describing pipeline usage on three case studies.
• Input data files in plain text (see Source Data for more information).
• Graphical output as pdf files and feature weights as text files.
• Source code, including code to reproduce figures in this article and source code for the R package.
• Docker file specifications for use with Docker and singularity images.
The following underlying data is used in this article:
• metabolome.tsv (Text file as raw input data (metabolomics) for case study 3)
• proteome.tsv (Text file as raw input data (proteomics) for case study 3.)
• transcriptome.tsv (Text file as raw input data (transcriptome) for case study 3.)
• classes.tsv (Text file as raw input data (biological classes) for case study 3.)
• data.RData (R data object containing all input, intermediate and output data for case study 3.)
• manuscript_figures (Example output plots that can be generated by the pipeline.)27,29
Code and data is available under the MIT license. Documentation is available under the CC-BY-3.0 AU license.
The following extended data is available in the same repository:
• data/case_study_1 (All raw input data for case study 1.)
• data/case_study_2 (All raw input data for case study 2.)
• data/case_study_3 (All raw input data for case study 3.)
• results/case_study_1 (Example output data for case study 1.)
• results/case_study_2 (Example output data for case study 2.)
• results/case_study_3 (Example output data for case study 3.)
Similar to underlying data, extended code and data is available under the MIT license. Documentation is available under the CC-BY-3.0 AU license.
• Software available through R directly:

install.packages("devtools")
library("devtools")
install_github("tyronechen/SARS-CoV-2", subdir="multiomics", INSTALL_opts="--no-multiarch")
The pipeline is not run by calling a package function directly; it is provided as a standalone script.
# this will show you the path to the script
system.file("scripts", "run_pipeline.R", package="multiomics")
• Source code available from: https://github.com/tyronechen/SARS-CoV-2
• Archived source code at time of publication: https://doi.org/10.5281/zenodo.4562009
• License: MIT License. Documentation provided under a CC-BY-3.0 AU license
The specific version numbers of the packages used are shown below, along with the version of the R installation.
> library(multiomics)
> sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /fs04/lz25/tyronec/envs/multiomics/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_AU.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_AU.UTF-8        LC_COLLATE=en_AU.UTF-8
 [5] LC_MONETARY=en_AU.UTF-8    LC_MESSAGES=en_AU.UTF-8
 [7] LC_PAPER=en_AU.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_AU.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] multiomics_1.2.1

loaded via a namespace (and not attached):
[1] compiler_4.2.3 tools_4.2.3
Data was generated as part of the Antibiotic Resistant Pathogens Framework Data Initiative. The authors thank the HPC team at Monash eResearch Centre for their continuous personnel support. This work was supported by the MASSIVE HPC facility. We acknowledge and pay respects to the Elders and Traditional Owners of the land on which our 4 Australian campuses stand.
References
1. Chen T, Philip M, Lê Cao KA, Tyagi S: A multi-modal data harmonisation approach for discovery of COVID-19 drug targets. Brief Bioinform. 2021.
Version 2 (revision) published 02 Aug 23; Version 1 published 06 Jul 21.