Software Tool Article
Revised

multiomics: A user-friendly multi-omics data harmonisation R pipeline

[version 2; peer review: 2 not approved]
PUBLISHED 02 Aug 2023

This article is included in the RPackage gateway.

This article is included in the Bioinformatics gateway.

This article is included in the Artificial Intelligence and Machine Learning gateway.

Abstract

Data from multiple omics layers of a biological system are growing in quantity, heterogeneity and dimensionality. Simultaneous multi-omics data integration is of immense interest to researchers, as it has the potential to unlock previously hidden biomolecular relationships, which could lead to early diagnosis, prognosis and expedited treatment. Many tools for multi-omics data integration have been developed. However, these tools are often restricted to highly specific experimental designs, types of omics data and data formats. A major limitation of the field is the lack of a pipeline that can accept data in unrefined form, preserving as much biological information as possible in each individual dataset prior to integration. We fill this gap by developing multiomics, a flexible, generic multi-omics pipeline that facilitates general-purpose exploration and analysis of heterogeneous data. The pipeline takes unrefined multi-omics data, sample information and user-specified parameters as input, and generates a set of plots and data tables for quality control and downstream analysis. We demonstrate its application on a sepsis case study. We also implemented limited checkpointing, where intermediate output is staged so that the pipeline can continue after errors or interruptions, and the pipeline generates a script that reproduces the analysis. Our pipeline can be installed as an R package or manually from the git repository, and is accompanied by detailed documentation with walkthroughs of three case studies.

Keywords

machine learning, multi-omics, data integration, data harmonisation, multivariate analysis

Revised Amendments from Version 1

Major pipeline updates, including changes to the API, to the way optimal hyperparameters are stored and handled, and to parallelisation. Redundant plots were removed, and cases where the number of components is too small for a 2D plot are now correctly skipped. The existing case study was replaced with a separate, simplified case study that is compatible with the latest pipeline API. Installation was reworked to fix dependency issues, and new automated tests were added. The introduction was rewritten. The repository now contains information about R version compatibility and operating system requirements.

See the authors' detailed response to the review by Javad Zahiri
See the authors' detailed response to the review by Arjun Krishnan

Introduction

A biological phenotype is an emergent property of a complex network of biological interactions. Since relying on a single layer of omics data to test a biological hypothesis results in an incomplete perspective of a biological system, interest in multi-omics data integration is steadily increasing as a means to decipher complex biological phenotypes.1

As a result, methods have been developed to leverage the multitude of data modalities in characterising biological systems. While many tools are available, most of these methods are heavily customised to fit a specific experimental design, and are not flexible enough to handle generic use cases.1 Furthermore, many tools that claim to perform data integration actually perform high-level data aggregation, where datasets are processed individually and only summarised.1 Of these algorithms, few perform data integration of multiple layers of omics data simultaneously, which we refer to specifically as “data harmonisation” to distinguish it from the more general term of “data integration”.1

Contributing to the lack of a generic “data harmonisation” tool is the nature of conventional biological pipelines, in which multiple rounds of data preprocessing and summarisation cause an irreversible loss of information. Therefore, in the context of this article, unrefined and information-rich data refers to data in a primary form before heavy information loss occurs, such as matrices of molecular abundance data. By exploiting low-level correlations in such data instead of high-level summarised information, it is even possible to identify relationships between individual biological molecules.

We illustrate these points with a hypothetical case of measuring protein and transcript levels in the same set of matched samples. A correlation between transcript and protein abundance serves as an interpretable association metric, highlighting interesting features (strong correlations) for further investigation [Figure 1].2 Furthermore, increasing the number of omics layers theoretically increases resolution, and with it the information that can be obtained. Published multi-omics studies reporting novel biological insights that are not possible with single-omics data further support this view.3–9 With the increasing volume of multi-omics data in publicly accessible biological data repositories,10–12 multi-omics data integration is expected to become a core strategy of modern and future biological data analysis.


Figure 1. An illustration of a hypothetical multi-omics perspective on a simple biological system.

The rectangles represent different layers of omics data (e.g. proteome, transcriptome and lipidome) while the circles represent features within their respective omics data layer. Black single-line arrows show correlation between features within the omics data (e.g. a regulatory factor) while blue double-lines show correlation between features across different omics data layers. A powerful abstraction of the system under study can be obtained by reviewing multiple layers of omics data holistically.
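As a minimal sketch of the cross-layer correlations drawn as blue double lines in Figure 1, the base R snippet below simulates transcript and protein abundances for three hypothetical genes in the same 12 matched samples, then computes a Pearson correlation matrix across the two layers. The simulated data, gene names and the 0.7 cutoff are illustrative assumptions only; they are not values produced or used by multiomics.

```r
# Purely illustrative, simulated data: 12 matched samples measured in two layers.
set.seed(42)
n_samples <- 12
sample_ids <- paste0("S", seq_len(n_samples))
transcripts <- matrix(rnorm(n_samples * 3), nrow = n_samples,
                      dimnames = list(sample_ids, c("geneA_rna", "geneB_rna", "geneC_rna")))
# Make each protein track its own transcript (plus noise) so matched pairs correlate.
proteins <- transcripts + matrix(rnorm(n_samples * 3, sd = 0.3), nrow = n_samples)
colnames(proteins) <- c("geneA_prot", "geneB_prot", "geneC_prot")

# Rows of both layers must refer to the same samples, in the same order.
stopifnot(identical(rownames(transcripts), rownames(proteins)))

# Pearson correlation of every transcript (rows) against every protein (columns).
cross_cor <- cor(transcripts, proteins, method = "pearson")

# Keep strongly associated cross-layer pairs as candidates for follow-up.
strong <- which(abs(cross_cor) >= 0.7, arr.ind = TRUE)
strong_pairs <- data.frame(transcript = rownames(cross_cor)[strong[, "row"]],
                           protein    = colnames(cross_cor)[strong[, "col"]],
                           r          = cross_cor[strong])
strong_pairs
```

Because each simulated protein is its transcript plus noise, the matched pairs on the diagonal of cross_cor should dominate the list of strong pairs.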

It is important to note that, at the time of writing, no end-to-end pipeline or framework exists that allows the user to quickly and easily input unrefined data, run a pipeline and export output data for downstream analyses. To address this, we developed multiomics, a flexible, easy-to-install and easy-to-use pipeline targeted at bioinformaticians.13 We use functions from the mixOmics14 R package, as it is one of the few methods in the field that is generic in scope, makes no restrictive assumptions and integrates data at the level of individual molecules. The pipeline can be installed as a conventional R15 package or used by cloning the associated git repository.16 A series of quality control plots is generated automatically and compiled into a pdf file. The pipeline integrates seamlessly with mixOmics: its data is exported automatically as an R data object of mixOmics classes, so that expert users can intervene where needed while new users can perform a comprehensive screen of their data. As a form of checkpointing, the R data object is updated at every major stage of the pipeline, and can be loaded directly into the mixOmics suite of tools for further investigation or plot customisation. To improve reproducibility, command line arguments and parameters are also exported as files that can be rerun directly to reproduce the output. For convenience, command line arguments can also be provided as a JSON file.

Detailed documentation is provided both in the source git repository and as vignettes in the R package. Multiple installation methods are described in the git repository to maximise the accessibility of our pipeline.17–19 Walkthroughs of three case studies are also included. Complete and detailed examples of the input data format are provided, including a sample dataset that can be loaded directly from the R package. In this manuscript, we summarise this information and show a minimal working example to highlight some of the key features of our pipeline.

Methods

Implementation

Quick install

You can install the pipeline directly as an R package from GitLab:

install.packages("devtools")
library("devtools")
install_gitlab("tyagilab/sars-cov-2", subdir="multiomics", INSTALL_opts="--no-multiarch")

Manual install

If the above automated install steps do not work, detailed manual installation instructions are available in the source git repository at https://github.com/tyronechen/SARS-CoV-2 and https://gitlab.com/tyagilab/sars-cov-2/-/tree/master for conda and R.

You may need to install mixOmics from source. If needed, please follow the installation instructions on https://github.com/mixOmicsTeam/mixOmics:

install_github("mixOmicsTeam/mixOmics")

The pipeline itself is not an R function that can be called directly; it is provided as a separate script. Running the following command in R shows the path to the installed script. A copy is also available in the source git repository.

# in R: print the path to the bundled pipeline script
system.file("scripts", "run_pipeline.R", package="multiomics")
# outside of R (in a shell): list all available options
Rscript run_pipeline.R -h

Operation

Example input

Three elements are the minimum required input for the pipeline [Figure 2]. First, a file containing biological class information is required. Next, at least two files corresponding to omics data blocks are required. Finally, a list of unique names labelling each data block is required. Examples of these input files and their internal data structure as they appear in the pipeline are shown.

# data is included within the package
# for demonstration purposes we extract the data into files, since the pipeline takes files as input.
library (multiomics)
data (BPH2819)
names (BPH2819)
# [1] "classes"    "metabolome"    "proteome"    "transcriptome"
export <- function (name, data) {
  write.table(
   data.frame (data),
   paste (name, ".tsv", sep=""),
   quote=FALSE, sep="\t",
   row.names=TRUE,
   col.names=NA
   )
  }
mapply (export, names (BPH2819), BPH2819, SIMPLIFY=FALSE)
# if the above does not work, the same files are available online
# (the /raw/ path returns the file itself rather than the GitHub web page)
url_class <- "https://github.com/tyronechen/SARS-CoV-2/raw/master/multiomics/data/classes.tsv"
url_meta <- "https://github.com/tyronechen/SARS-CoV-2/raw/master/multiomics/data/metabolome.tsv"
url_prot <- "https://github.com/tyronechen/SARS-CoV-2/raw/master/multiomics/data/proteome.tsv"
url_tran <- "https://github.com/tyronechen/SARS-CoV-2/raw/master/multiomics/data/transcriptome.tsv"
urls <- c (url_class, url_meta, url_prot, url_tran)
file_names <- sapply (strsplit (urls, "/"), tail, 1)
mapply (function(x, y) download.file(x, y), urls, file_names, SIMPLIFY=FALSE)
# stop if any of the files is missing
if (!all (file.exists (file_names))) {stop("Files incorrectly downloaded!")}
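Before launching a full run, it can help to check that the three minimum inputs agree with each other. The sketch below defines a hypothetical helper, check_inputs(), which is not part of multiomics. It assumes the layout described in this article (samples as rows and features as columns, one class label per sample in the same order as the rows) and uses small in-memory toy tables rather than the files above.

```r
# Hypothetical pre-flight check; not a multiomics function.
check_inputs <- function(classes, blocks, block_names) {
  # At least two omics blocks, each with its own unique name.
  if (length(blocks) < 2) stop("at least two omics data blocks are required")
  if (length(block_names) != length(blocks) || anyDuplicated(block_names) > 0)
    stop("need exactly one unique name per data block")
  n <- length(classes)
  for (i in seq_along(blocks)) {
    # Samples are rows, so every block needs one row per class label.
    if (nrow(blocks[[i]]) != n)
      stop(sprintf("block '%s' has %d rows but there are %d class labels",
                   block_names[i], nrow(blocks[[i]]), n))
    # Sample (row) names should be identical across blocks, in the same order.
    if (!identical(rownames(blocks[[i]]), rownames(blocks[[1]])))
      stop(sprintf("sample names in block '%s' differ from block '%s'",
                   block_names[i], block_names[1]))
  }
  invisible(TRUE)
}

# Toy example: 4 samples and two blocks with different numbers of features.
toy_samples <- c("RPMI_0", "RPMI_1", "Sera_0", "Sera_1")
toy_classes <- c("RPMI", "RPMI", "Sera", "Sera")
metab <- matrix(rnorm(4 * 3), nrow = 4, dimnames = list(toy_samples, paste0("m", 1:3)))
prot  <- matrix(rnorm(4 * 5), nrow = 4, dimnames = list(toy_samples, paste0("p", 1:5)))
check_inputs(toy_classes, list(metab, prot), c("metabolome", "proteome"))
```

The helper stops with an informative message on the first mismatch and otherwise returns TRUE invisibly.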


Figure 2. Technical notes for the pipeline.

We summarise pipeline installation steps and the flow of data through the pipeline. This figure was originally published on gitlab under a CC-BY-3.0 AU license and is reproduced here with permission.

Note that column and row names should be kept short to avoid bugs in the pipeline associated with name length. Non-alphanumeric characters should also be avoided in names, as R silently replaces them with “.” (periods) when reading tables.
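The silent renaming comes from base R's make.names(), which read.table() and data.frame() apply to column names by default (check.names = TRUE); it also prefixes an "X" to names that start with a digit, which is likely why a metabolite column in the example data appears as X3.Aminoglutaric.acid. The snippet below shows that effect on a few example names, then sketches one possible explicit clean-up. The helper clean_names() and its 30-character limit are assumptions for illustration, not the pipeline's own rule.

```r
# Example feature names (illustrative) containing characters R does not allow
raw_names <- c("3-Aminoglutaric acid", "HMDB0000005", "WP_000514408.1", "gene (putative)")

# What R does silently when reading tables with check.names = TRUE
make.names(raw_names)

# Hypothetical explicit clean-up, so the renaming stays under your control
clean_names <- function(x, max_len = 30) {
  x <- gsub("[^A-Za-z0-9]+", "_", x)   # runs of other characters become "_"
  x <- substr(x, 1, max_len)           # keep names short
  make.names(x, unique = TRUE)         # add a leading "X" if needed, de-duplicate
}
clean_names(raw_names)
```

Because the cleaned names are already valid, R leaves them unchanged on the next read, so the names seen in the pipeline output match those in your input files.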

In addition to this case study, data and metadata for two other case studies are included in the source git repository. Please refer to the corresponding case study walkthroughs on GitHub for more information.

Running the pipeline

The pipeline is run with the command Rscript run_pipeline.R, passing command line arguments either as text on the command line or in a JSON file (recommended). A full run can take some time. The main bottleneck is parameter tuning, which scales exponentially with the number of omics data blocks; tuning can be disabled if the user wants to perform a test run or already knows suitable parameters. A secondary bottleneck is data imputation, which scales with the number of components used and the dimensions of the input data. If needed, the user can impute and export the data once, either with the pipeline or with the underlying mixOmics function, and then supply the imputed data as input; imputation can also be skipped entirely if it is not required. The number of CPUs can be increased to speed up processing. R data objects are exported periodically during a run, allowing seamless integration with functions in the underlying mixOmics package when needed.
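To see why tuning grows so quickly, assume (as a simplification) that tuning evaluates every combination of candidate keepX values across blocks, as a full grid search does. The number of combinations per component is then the number of candidates raised to the number of blocks, and each combination is refitted for every cross-validation fold and repeat. The candidate list below is hypothetical; the default of 50 repeats and 5 folds mirrors the cross_val_nrepeat and cross_val_folds values in the example arguments shown later in this article.

```r
# Hypothetical candidate numbers of features to keep in each block
test_keepx <- c(5, 10, 30, 50, 100)

# Model fits for ONE component under a full grid search across blocks,
# repeated for every cross-validation fold and repeat.
grid_cost <- function(n_blocks, n_candidates, nrepeat = 50, folds = 5) {
  n_candidates^n_blocks * nrepeat * folds
}

n_blocks <- 2:4
data.frame(blocks       = n_blocks,
           combinations = length(test_keepx)^n_blocks,
           model_fits   = grid_cost(n_blocks, length(test_keepx)))
```

Each extra block multiplies the work by the number of candidates, which is why a short candidate list, fewer repeats or skipping tuning entirely makes a large difference for runs with three or more blocks.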

Code for the pipeline can be examined in detail from the git repository or individual functions can be inspected directly after loading the R multiomics package.

Example output

Output files include a pdf file compiling all graphical output.20–24 Note that this file can be quite large, especially for large datasets. A GraphML file is also exported for input into Cytoscape.25 Due to the size and volume of plots, we provide a link to some example plots here. A manuscript using figures generated from this pipeline is also available for reference.26

Each analysis generates a series of text files containing feature weights. These are loosely analogous to the output of a differential expression analysis: the coefficients summarise the features carrying the most phenotypically relevant information. A table of feature correlations across the multi-omics data is also generated. Some examples are shown below:

# download single-omic (PLSDA and sPLSDA) variable weights,
# multi-omic (DIABLO) variable weights for a single block of omics data,
# and the table of multi-omic correlations
base_url <- paste(
   "https://raw.githubusercontent.com/tyronechen/",
   "SARS-CoV-2/master/results/case_study_3/",
   sep=""
   )
files <- c(
   "Metabolomics_GC_MS_1_PLSDA_max.txt",
   "Metabolomics_GC_MS_1_sPLSDA_max.txt",
   "Metabolomics_GC_MS_1_DIABLO_var_keepx_max.txt",
   "DIABLO_var_keepx_correlations.txt"
   )
for (f in files) {
   download.file (paste (base_url, f, sep=""), f)
   }
metabolomics_plsda <- read.table(
   "Metabolomics_GC_MS_1_PLSDA_max.txt",
   header=TRUE, sep="\t", row.names=1
   )
colnames (metabolomics_plsda)
# [1] "Sera"         "Contrib.RPMI" "Contrib.Sera" "Contrib"      "GroupContrib"
# [6] "color"        "importance"
metabolomics_splsda <- read.table(
   "Metabolomics_GC_MS_1_sPLSDA_max.txt",
   header=TRUE, sep="\t", row.names=1
   )
colnames (metabolomics_splsda)
# [1] "Sera"         "Contrib.RPMI" "Contrib.Sera" "Contrib"      "GroupContrib"
# [6] "color"        "importance"
head (metabolomics_splsda[,1:2])
#                   Sera Contrib.RPMI
# HMDB0000673 -0.9031770    0.9405335
# HMDB0000067 -0.9197936    0.9595830
# HMDB0000273 -0.9501236    0.9371693
# HMDB0000207 -0.9487778    1.0114701
# HMDB0003229 -1.0032847    1.0099811
# HMDB0001043 -0.7016579    0.9593041
metabolomics_diablo <- read.table(
   "Metabolomics_GC_MS_1_DIABLO_var_keepx_max.txt",
   header=TRUE, sep="\t", row.names=1
   )
colnames (metabolomics_diablo)
# [1] "More.severe"     "Contrib.Less.severe" "Contrib.More.severe" "Contrib"
# [5] "GroupContrib"     "color"     "importance"
head (metabolomics_diablo[,1:2])
#                                      Sera Contrib.RPMI
# Metabolomics_GC_MS_HMDB0000067 -0.9197936    0.9595830
# Metabolomics_GC_MS_HMDB0000673 -0.9031770    0.9405335
# Metabolomics_GC_MS_HMDB0000273 -0.9501236    0.9371693
# Metabolomics_GC_MS_HMDB0000207 -0.9487778    1.0114701
# Metabolomics_GC_MS_HMDB0003229 -1.0032847    1.0099811
correlations <- read.table(
   "DIABLO_var_keepx_correlations.txt",
   header=TRUE, sep="\t", row.names=1
   )
dim (correlations)
# [1] 15 15
head (correlations[,1:2])
#                                     Metabolomics_GC_MS_HMDB0000207
# Metabolomics_GC_MS_HMDB0000067      0.9746416
# Metabolomics_GC_MS_HMDB0000207      0.9727075
# Metabolomics_GC_MS_HMDB0000273      0.9785230
# Metabolomics_GC_MS_HMDB0000673      0.9803517
# Metabolomics_GC_MS_HMDB0003229      0.9648338
# Proteomics_MS1_DDA_WP_000514408.1   -0.9706518
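The correlation table above is a matrix of selected features. As a sketch of how such a matrix can be turned into a network edge list (for example, for loading into Cytoscape), the hypothetical helper below keeps each feature pair once and drops pairs whose absolute correlation falls below a cutoff. The helper name, the toy matrix and the 0.5 cutoff are assumptions for illustration; the pipeline's own GraphML export and its corr_cutoff handling may differ. The helper also assumes a square matrix whose row and column names are in the same order.

```r
# Hypothetical helper: square correlation matrix -> edge list above an |r| cutoff
corr_to_edges <- function(corr, cutoff = 0.5) {
  corr <- as.matrix(corr)
  stopifnot(nrow(corr) == ncol(corr), identical(rownames(corr), colnames(corr)))
  keep <- upper.tri(corr) & abs(corr) >= cutoff   # each pair once, no self-pairs
  idx <- which(keep, arr.ind = TRUE)
  edges <- data.frame(source = rownames(corr)[idx[, "row"]],
                      target = colnames(corr)[idx[, "col"]],
                      r      = corr[idx],
                      stringsAsFactors = FALSE)
  edges[order(-abs(edges$r)), , drop = FALSE]     # strongest associations first
}

# Toy 3 x 3 matrix with made-up values (not pipeline output)
toy <- matrix(c( 1.0,  0.9, -0.2,
                 0.9,  1.0, -0.7,
                -0.2, -0.7,  1.0), nrow = 3,
              dimnames = list(c("m1", "p1", "t1"), c("m1", "p1", "t1")))
corr_to_edges(toy, cutoff = 0.5)
```

Keeping the sign of r in the edge list preserves the distinction between positive and negative associations, which network tools can map to edge colour.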

An R data file containing all of the information above and a script containing command line arguments which can be used to reproduce the analysis are also exported to enable full reproducibility.

Examples of these output files for three case studies are included in the source git repository.

Use cases

We demonstrate a use case of our pipeline with reference to an earlier re-analysis of a published dataset.13,26 For simplicity, we highlight only one case study in this manuscript, but detailed walkthroughs for all three are available in the source git repository.27–29 Our tool takes as input at least two data files containing tables of quantitative information, with samples as rows and features as columns. A list of names for these data blocks is required, as is a file of class information given as newline-separated values. Optional command line arguments control, among other things, the distance metric used for prediction and the number of features to select. A full description can be obtained by running Rscript run_pipeline.R -h, which lists every flag in detail. Because of the number of command line arguments, these parameters can also be passed to the pipeline as a JSON file. Example data files, class files and JSON files for all three case studies are included in the source git repository.
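Because a full run needs many arguments, writing them to a JSON file is convenient. The sketch below writes a flat JSON object using base R only. It assumes that the JSON keys mirror the argument names in the argv listing shown later in this article (classes, data, data_names and so on) and that multi-value arguments are JSON arrays; both are assumptions, so compare against the example JSON files in the source git repository for the exact format. write_args_json() is a hypothetical helper, not part of multiomics, and for simplicity it refuses strings containing double quotes or backslashes instead of escaping them.

```r
# Hypothetical helper: write a named list of scalars/vectors as a flat JSON object.
write_args_json <- function(args, path) {
  to_json <- function(v) {
    if (length(v) == 1 && is.na(v)) return("null")   # unset option
    if (is.character(v)) {
      # this sketch does not escape special characters, so refuse them
      stopifnot(!any(grepl("\"", v, fixed = TRUE) | grepl("\\", v, fixed = TRUE)))
      v <- paste0("\"", v, "\"")
    } else if (is.logical(v)) {
      v <- tolower(as.character(v))                  # TRUE -> true
    } else {
      v <- as.character(v)                           # numbers as written by R
    }
    if (length(v) == 1) v else paste0("[", paste(v, collapse = ", "), "]")
  }
  fields <- vapply(names(args),
                   function(k) sprintf("  \"%s\": %s", k, to_json(args[[k]])),
                   character(1))
  writeLines(c("{", paste(fields, collapse = ",\n"), "}"), path)
  invisible(path)
}

# Illustrative arguments; file names are placeholders for your own data
args <- list(
  classes    = "classes.tsv",
  data       = c("metabolome.tsv", "proteome.tsv", "transcriptome.tsv"),
  data_names = c("metabolome", "proteome", "transcriptome"),
  ncpus      = 2,
  linkage    = 0.1,
  mini_run   = TRUE
)
json_path <- file.path(tempdir(), "args.json")
write_args_json(args, json_path)
cat(readLines(json_path), sep = "\n")
```

The argv listing includes a json entry; run Rscript run_pipeline.R -h to confirm the exact flag used to pass such a file to the pipeline.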

Example data included within the multiomics package

Example input data29 is provided as part of our R package.

library (multiomics)
data (BPH2819)
names (BPH2819)
# [1] "classes"     "metabolome"     "proteome"     "transcriptome"
sapply (BPH2819, dim)
# $classes
# NULL
# $metabolome
# [1]  12 153
# $proteome
# [1]  12 1451
# $transcriptome
# [1]  12 2771

Alternatively, you may download this from our git repository directly. This is a subset of sepsis data generated in a separate publication.29

Example processing workflow

We provide a fully processed dataset as a guide for the user. The steps below can be reproduced by downloading the R data object with the following command:

url <- paste(
   "https://github.com/tyronechen/SARS-CoV-2/",
   "raw/master/results/case_study_3/data.RData",
   sep=""
)
# mode="wb" keeps the binary RData file intact on Windows
download.file (url, "RData.RData", mode="wb")
load("RData.RData")
ls()
# [1] "argpath"                "argv"                "classes"
# [4] "contrib"                "corr_cutoff"         "correlations"
# [7] "data"                   "data_imp"            "data_names"
# [10] "data_pca_multilevel"   "data_plsda"          "data_splsda"
# [13] "design"                "diablo"              "diablo_input"
# [16] "diablo_keepx"          "diablo_ncomp"        "dimensions"
# [19] "dist_diablo"           "dist_plsda"          "dist_splsda"
# [22] "export"                "heatmaps"            "i"
# [25] "input_data"            "linkage"             "low_var"
# [28] "mappings"              "metabolomics_diablo" "metabolomics_plsda"
# [31] "metabolomics_splsda"   "missing"             "optimal_params"
# [34] "optimal_params_values" "outdir"              "paths"
# [37] "pca_impute"            "pca_withna"          "pch"
# [40] "perf_diablo"           "plot"                "plsda_ncomp"
# [43] "rdata"                 "splsda_keepx"        "splsda_ncomp"
# [46] "tuned_diablo"          "tuned_splsda"        "url"
# [49] "x"                     "y"
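load() writes every stored object into the current workspace and overwrites anything with the same name (for example url, x or y in the listing above). A safer pattern is to load into a dedicated environment and pull out only what you need. The sketch below demonstrates this with a small stand-in RData file created on the spot, since the real file is large; the object names in it are illustrative only.

```r
# Create a small stand-in RData file (the real file is the one downloaded above)
classes <- c("RPMI", "RPMI", "Sera", "Sera")
data <- list(block1 = matrix(1:8, nrow = 4))
rdata_path <- file.path(tempdir(), "example.RData")
save(classes, data, file = rdata_path)
rm(classes, data)

# Load into a separate environment so nothing in the workspace is overwritten
pipeline_env <- new.env()
loaded <- load(rdata_path, envir = pipeline_env)   # load() returns the object names
loaded
table(pipeline_env$classes)
# For the real file: load("RData.RData", envir = pipeline_env)
```

Objects are then accessed as pipeline_env$name, which also makes it obvious in later code which values came from the pipeline.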

Inspecting the minimum required input (classes and data) reveals the following:

# number of samples
length (classes)
# [1] 12
# data dimensions
sapply (data, dim)
#     Metabolomics_GC_MS Proteomics_MS1_DDA RNA_Seq
# [1,]                    12                 12      12
# [2,]                   153               1451    2771
table (classes)
# classes
# RPMI Sera
#    6    6
head (data$Metabolomics_GC_MS[,1:3])
#        X3.Aminoglutaric.acid HMDB0000005 HMDB0000008
# RPMI_0            -1.7814083   -9.103010   -3.471373
# RPMI_1            -1.9108074   -5.401229   -3.488496
# RPMI_2            -1.5458964  -10.898804   -2.845025
# RPMI_3            -2.1842312   -9.563557   -1.232155
# RPMI_4            -1.3106881   -4.755440   -1.723564
# RPMI_5            -0.9600247   -4.771127   -1.403044

Data preprocessing

First, data is filtered if associated options are specified by the user. Features with missing values across sample groups are discarded by default. The user can also choose to filter out features (columns) exceeding a certain threshold of missing values.
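A minimal sketch of the two filters described above, assuming a samples-by-features matrix and one class label per row: drop features that are entirely missing within any class, then drop features whose overall proportion of missing values exceeds a threshold (0.6 here, matching the dropna_prop value in the example arguments). The function filter_missing() is hypothetical and its rules may not match the pipeline's exactly.

```r
# Hypothetical helper; the pipeline's own filtering rules may differ.
filter_missing <- function(x, classes, max_prop = 0.6) {
  # 1. features with no observed value in at least one class
  all_na_in_a_class <- apply(x, 2, function(col) any(tapply(is.na(col), classes, all)))
  # 2. features whose overall proportion of missing values exceeds max_prop
  too_sparse <- colMeans(is.na(x)) > max_prop
  x[, !(all_na_in_a_class | too_sparse), drop = FALSE]
}

# Toy data: 6 samples (rows), 4 features (columns), two classes
toy_classes <- rep(c("RPMI", "Sera"), each = 3)
toy_x <- cbind(
  f1 = c(1, 2, 3, 4, 5, 6),        # complete: kept
  f2 = c(NA, NA, NA, 4, 5, 6),     # missing in every RPMI sample: dropped
  f3 = c(1, NA, NA, NA, NA, 6),    # 4/6 missing (> 0.6): dropped
  f4 = c(1, NA, 3, 4, NA, 6)       # 2/6 missing: kept
)
colnames(filter_missing(toy_x, toy_classes))
```

Filtering per class first matters because a feature that is never observed in one class cannot be compared between classes, regardless of how complete it is overall.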

Imputing missing values is optional, as PLS-derived methods can handle missing values without this step. However, the option is included in case the user wants to impute the data explicitly. Remaining missing values can be imputed by setting the --icomp flag, which sets the number of components used for imputation. Imputation is most reliable when missing values make up less than 20% of the data. To check whether imputation has substantially changed the data, the user can plot the correlation between principal components computed before and after imputation. Since imputation can take a long time, especially for large datasets, the imputed data is saved by default and can be loaded directly as input in later runs.
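The check described above can be sketched in base R. The pipeline relies on mixOmics for imputation (recent mixOmics versions provide impute.nipals()); to keep this example self-contained, it swaps in simple per-feature mean imputation, a much cruder method. The snippet simulates data, masks 10% of the values, compares the missing fraction with the 20% guideline, and correlates first principal component scores of the complete and imputed data. Because the data are simulated, the complete matrix is available as a reference; with real data, the pre-imputation PCA would need a method that tolerates missing values, such as NIPALS.

```r
set.seed(1)
# Simulated data: 12 samples x 20 features, two groups with a clear shift
n <- 12; p <- 20
group <- rep(c("A", "B"), each = n / 2)
complete <- matrix(rnorm(n * p), n, p) + ifelse(group == "B", 3, 0)

# Mask exactly 10% of the entries at random
with_na <- complete
with_na[sample(length(with_na), round(0.1 * length(with_na)))] <- NA
missing_fraction <- mean(is.na(with_na))
if (missing_fraction >= 0.2) warning("20% or more missing: imputation may be unreliable")

# Stand-in imputation (NOT the pipeline's method): replace NAs with the column mean
imputed <- apply(with_na, 2, function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
})

# Compare PC1 scores; the sign of a PC is arbitrary, so use the absolute correlation
pc1_complete <- prcomp(complete, scale. = TRUE)$x[, 1]
pc1_imputed  <- prcomp(imputed,  scale. = TRUE)$x[, 1]
abs(cor(pc1_complete, pc1_imputed))
```

A correlation close to 1 suggests that imputation has preserved the main structure of the data; repeating the comparison for further components gives the full before-and-after picture.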

If the study design involves repeated measurements on the same sample (e.g. a longitudinal design), the --pch flag should be enabled. The user should pass a file with the same format as the classes file, but containing information on the repeated measurements.23,30 Providing this information allows the pipeline to account for the repeated measures internally.
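One common way to account for repeated measurements is the multilevel decomposition used in mixOmics (its withinVariation() function): each feature is centred within each individual, removing between-individual differences while keeping within-individual changes. This article does not state exactly which transformation the pipeline applies, although the data_pca_multilevel object in the RData listing suggests a multilevel analysis. The base R helper within_centre() below is a hypothetical sketch of the idea, not the mixOmics implementation.

```r
# Hypothetical helper: centre every feature within each individual
within_centre <- function(x, individual) {
  sums   <- rowsum(x, individual)                      # per-individual column sums
  counts <- as.vector(table(individual)[rownames(sums)])
  means  <- sums / counts                              # per-individual means
  x - means[as.character(individual), , drop = FALSE]  # subtract each row's own mean
}

# Toy design: 3 individuals, each measured under 2 conditions
individual <- c("P1", "P1", "P2", "P2", "P3", "P3")
x_rm <- cbind(f1 = c(10, 12, 20, 22, 30, 32),
              f2 = c(5, 4, 7, 6, 9, 8))
within_centre(x_rm, individual)
```

After centring, the large baseline differences between P1, P2 and P3 in f1 disappear and only the consistent within-individual change remains, which is the signal a repeated-measures analysis is after.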

Method parameters

Most of the parameters for the machine learning algorithms are specified by the user. These cover three methods: PLSDA (partial least squares discriminant analysis), sPLSDA (sparse PLSDA) and multi-block sPLSDA (also known as DIABLO). The underlying methods are implemented in the mixOmics package, and more information is available on its website, http://mixomics.org/. For each method, a distance metric for prediction is specified: “max.dist”, “centroids.dist” or “mahalanobis.dist”. Unlike PLSDA, sPLSDA and multi-block sPLSDA select a subset of the most relevant features, and therefore require a user-specified list giving the number of features to select. The number of components to derive for each method is also provided. To choose these values, several exploratory runs over a wide range can be carried out, e.g. starting with 5, 10, 30, 50 and 100 features, inspecting the output and then narrowing the range. A few additional parameters can be passed to the multi-block sPLSDA (block.splsda) function. The linkage parameter is a continuous value from 0 to 1 that sets the type of analysis: values closer to 0 prioritise class discrimination, while values closer to 1 prioritise correlation between data sets. Setting the number of multi-block sPLSDA components to 0 causes the pipeline to perform parameter tuning internally. Note that this can take a long time and scales exponentially with each added block of omics data. The user can also specify the number of CPUs used for parallel processing, which mainly affects parameter tuning. The arguments used in our example are shown here:

> argv
# …
[[1]]
[1] FALSE

$help
[1] FALSE

$low_var
[1] FALSE

$mini_run
[1] FALSE

$progress_bar
[1] TRUE

$opts
[1] NA

$json
[1] NA

$classes
[1] "BPH2819_info_all.tsv"

$classes_secondary
[1] NA

$dropna_classes
[1] FALSE

$dropna_prop
[1] 0.6

$data
[1] "Staphylococcus_aureus_BPH2819_Metabolomics_GC_MS/BPH2819.tsv"
[2] "Staphylococcus_aureus_BPH2819_Proteomics_MS1_DDA/BPH2819.tsv"
[3] "Staphylococcus_aureus_BPH2819_RNA_Seq/BPH2819.tsv"

$data_names
[1] "Metabolomics_GC_MS" "Proteomics_MS1_DDA" "RNA_Seq"

$force_unique
[1] TRUE

$mappings
[1] NA

$ncpus
[1] 24

$diablocomp
[1] 2

$linkage
[1] 0.1

$diablo_keepx
[1] NA

$icomp
[1] 12

$zero_as_na
[1] TRUE

$replace_missing
[1] TRUE

$pcomp
[1] 10

$plsdacomp
[1] 2

$splsdacomp
[1] 2

$splsda_keepx
[1] NA

$dist_plsda
[1] "centroids.dist"

$dist_splsda
[1] "centroids.dist"

$dist_diablo
[1] "centroids.dist"

$cross_val
[1] "Mfold"

$cross_val_nrepeat
[1] 50

$cross_val_folds
[1] 5

$contrib
[1] "max"

$corr_cutoff
[1] 0.1

$optimal_params
[1] NA
# …
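In mixOmics, a linkage value like the one above is expressed through the design argument of block.splsda(): a square matrix with one row and column per block, where off-diagonal entries give the assumed strength of connection between blocks and the diagonal is 0. The sketch below builds such a matrix from the example's three block names and linkage of 0.1. The helper make_design() is hypothetical, and it assumes the pipeline applies a single linkage value to every pair of blocks.

```r
# Hypothetical helper: one linkage value for every pair of blocks
make_design <- function(block_names, linkage) {
  stopifnot(linkage >= 0, linkage <= 1)
  k <- length(block_names)
  design <- matrix(linkage, nrow = k, ncol = k,
                   dimnames = list(block_names, block_names))
  diag(design) <- 0   # a block is not linked to itself
  design
}

design <- make_design(c("Metabolomics_GC_MS", "Proteomics_MS1_DDA", "RNA_Seq"),
                      linkage = 0.1)
design
# With mixOmics installed, this could be passed on as, e.g.,
# block.splsda(X = list_of_blocks, Y = classes, ncomp = 2, design = design)
```

A low value such as 0.1 tells the model to weight class discrimination above agreement between blocks, consistent with the description of the linkage parameter above.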

Result and quality control metrics visualisation

Results as well as quality control metrics (including cross-validation error rates) are exported in a series of plots and compiled into a pdf [Figure 3]. They can also be accessed internally from our provided R data object. Some sample output is shown below.


Figure 3. Example results visualisation.

(a) Upper left: Multi-block sPLSDA (DIABLO) correlation plots, with numbers showing the Pearson correlation between omics data blocks, and corresponding scatter plots. (b) Upper right: Clustered image maps show the relationship between variables and omics data blocks. (c) Lower left: Barplots of loading weights show the contribution of each variable towards each biological condition for each block. (d) Lower right: Circos plot depicting strong multivariate correlations between the selected features from each block. Red and blue colours indicate positive and negative correlations, respectively.

Output control

Pipeline output can be controlled with a number of flags. By default, the pipeline writes output to the current working directory; setting outfile_dir changes the main output directory. The rdata option sets the name of the R data object containing all pipeline objects, similar to the RData file loaded in this example. The plot flag sets the name of the multi-page pdf file containing all graphical output generated by the pipeline. The args flag sets the name of the generated reproducible script (Rscript.sh by default).

> argv
# continued from previous

$outfile_dir
[1] "/path/to/outdir"

$rdata
[1] "./data.RData"

$plot
[1] "./Rplots.pdf"

$args
[1] "Rscript.sh"
# …
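The exported Rscript.sh records the arguments of a run. As an illustrative sketch, not the pipeline's own code, the hypothetical helper build_command() below rebuilds a command line from a named argument list. It assumes long flags of the form --name value (consistent with the --icomp and --pch flags mentioned earlier), space-separated values for multi-value arguments, that unset (NA) and FALSE options can be omitted, and that TRUE options are bare switches; check Rscript run_pipeline.R -h for the real syntax.

```r
# Hypothetical helper: named argument list -> shell command string
build_command <- function(args, script = "run_pipeline.R") {
  parts <- c("Rscript", script)
  for (name in names(args)) {
    value <- args[[name]]
    if (length(value) == 1 && is.na(value)) next              # unset option: omit
    if (is.logical(value)) {
      if (isTRUE(value)) parts <- c(parts, paste0("--", name)) # bare switch
      next
    }
    parts <- c(parts, paste0("--", name), shQuote(as.character(value)))
  }
  paste(parts, collapse = " ")
}

args <- list(classes = "classes.tsv",
             data = c("metabolome.tsv", "proteome.tsv"),
             data_names = c("metabolome", "proteome"),
             linkage = 0.1, mini_run = TRUE, optimal_params = NA)
cmd <- build_command(args)
cmd
# writeLines(c("#!/bin/sh", cmd), file.path(tempdir(), "Rscript.sh"))
```

Quoting every value with shQuote() keeps file paths containing spaces intact when the script is rerun.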

Reproducibility and integration with mixOmics

Finally, the pipeline has limited checkpointing built in. At each milestone, the relevant output is saved and written out as an RData file, similar to the one presented above. This allows the user to inspect the data manually and adjust it as needed. Once a run is complete, the user can further customise plots and data exports for publication or downstream analysis. Importantly, the data objects are compatible with core mixOmics functions, allowing seamless integration with the mixOmics suite of tools if the user wants to extend the analysis or build a custom workflow.
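The checkpointing idea can be sketched as follows: before running an expensive step, look for a saved result on disk and reuse it; otherwise compute the result and save it. The helper checkpoint() below is hypothetical and stores single objects with saveRDS()/readRDS(), whereas the pipeline writes a combined RData file; it illustrates the pattern only.

```r
# Hypothetical helper: reuse a saved result if present, otherwise compute and save it
checkpoint <- function(name, expr, dir = tempdir()) {
  path <- file.path(dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    message("Reusing checkpoint: ", path)
    return(readRDS(path))
  }
  result <- expr            # lazy evaluation: expr is only computed here
  saveRDS(result, path)
  result
}

# Count how often the "expensive" step really runs
calls <- 0
slow_step <- function() {
  calls <<- calls + 1
  colMeans(matrix(1:6, nrow = 2))
}

ckpt_dir <- file.path(tempdir(), "checkpoints")
dir.create(ckpt_dir, showWarnings = FALSE)
first  <- checkpoint("col_means", slow_step(), dir = ckpt_dir)  # computes and saves
second <- checkpoint("col_means", slow_step(), dir = ckpt_dir)  # loads from disk
c(identical = identical(first, second), calls = calls)
```

Because R evaluates function arguments lazily, the expensive expression is never run when a checkpoint already exists, so after an interruption only the unfinished steps are recomputed.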

Author contributions

Conceptualization, S. T, T. C; Data Curation, S. T, T. C; Formal Analysis, K-A. L-C, T. C; Funding Acquisition, K-A. L-C, S. T; Methodology, A. J. A, K-A. L-C; Project Administration, S. T; Resources, S. T; Supervision, K-A. L-C, S. T; Software, A. J. A, K-A. L-C, T. C; Validation, A. J. A, K-A. L-C, S. T, T. C; Visualization, A. J. A, K-A. L-C; Writing Original Draft Preparation, S. T, T. C; Writing Review & Editing, A. J. A, K-A. L-C, S. T, T. C.

Version 2
VERSION 2 PUBLISHED 06 Jul 2021
how to cite this article
Chen T, Abadi AJ, Lê Cao KA and Tyagi S. multiomics: A user-friendly multi-omics data harmonisation R pipeline [version 2; peer review: 2 not approved] F1000Research 2023, 10:538 (https://doi.org/10.12688/f1000research.53453.2)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Key to Reviewer Statuses
Approved - The paper is scientifically sound in its current form and only minor, if any, improvements are suggested
Approved with reservations - A number of small changes, sometimes more significant revisions are required to address specific details and improve the paper's academic merit.
Not approved - Fundamental flaws in the paper seriously undermine the findings and conclusions
Version 1
PUBLISHED 06 Jul 2021
Reviewer Report 08 Dec 2021
Arjun Krishnan, Department of Computational Mathematics, Science, and Engineering & Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824-1226, USA 
Not Approved
In this article, Chen and colleagues present an R pipeline for multi-omics data analysis that can potentially accept unrefined data and produce convenient outputs. The pipeline is available as an R package and as Docker/Singularity containers. It is built on ...
HOW TO CITE THIS REPORT
Krishnan A. Reviewer Report For: multiomics: A user-friendly multi-omics data harmonisation R pipeline [version 2; peer review: 2 not approved]. F1000Research 2023, 10:538 (https://doi.org/10.5256/f1000research.56837.r89102)
  • Author Response 17 Nov 2023
    Tyrone Chen, School of Biological Sciences, Monash University, Clayton, 3800, Australia
    17 Nov 2023
    Author Response
    > In this article, Chen and colleagues present an R pipeline for multi-omics data analysis that can potentially accept unrefined data and produce convenient outputs. The pipeline is available as ...
Reviewer Report 29 Nov 2021
Javad Zahiri, Department of Neuroscience, University of California San Diego, La Jolla, California, USA 
Not Approved
In the present study, "multiomics: A user-friendly multi-omics data harmonisation R pipeline" the authors tried to develop a tool for multiple omics integration and analysis. The problem is of utmost importance. However, the tool needs more work to be suitable ...
HOW TO CITE THIS REPORT
Zahiri J. Reviewer Report For: multiomics: A user-friendly multi-omics data harmonisation R pipeline [version 2; peer review: 2 not approved]. F1000Research 2023, 10:538 (https://doi.org/10.5256/f1000research.56837.r98916)
  • Author Response 17 Nov 2023
    Tyrone Chen, School of Biological Sciences, Monash University, Clayton, 3800, Australia
    17 Nov 2023
    Author Response
    > In the present study, "multiomics: A user-friendly multi-omics data harmonisation R pipeline" the authors tried to develop a tool for multiple omics integration and analysis. The problem is of ...