Method Article
Revised

Ensemble machine learning modeling for the prediction of artemisinin resistance in malaria

[version 5; peer review: 1 approved, 2 approved with reservations]
PUBLISHED 25 Jun 2020

This article is included in the Artificial Intelligence and Machine Learning gateway.

This article is included in the Software and Hardware Engineering gateway.

Abstract

Resistance in malaria is a growing concern affecting many areas of Sub-Saharan Africa and Southeast Asia. Since the emergence of artemisinin resistance in the late 2000s in Cambodia, research into the underlying mechanisms has been underway.
The 2019 Malaria Challenge posed the task of developing computational models that address important problems in advancing the fight against malaria. The first goal was to accurately predict artemisinin drug resistance levels of Plasmodium falciparum isolates, as quantified by the IC50. The second goal was to predict the parasite clearance rate of malaria parasite isolates based on in vitro transcriptional profiles.
In this work, we develop machine learning models using novel methods for transforming isolate data and handling the tens of thousands of variables that result from these data transformation exercises. This is demonstrated by using massively parallel processing of the data vectorization for use in scalable machine learning. In addition, we show the utility of ensemble machine learning modeling for highly effective predictions of both goals of this challenge. This is demonstrated by the use of multiple machine learning algorithms combined with various scaling and normalization preprocessing steps. Then, using a voting ensemble, multiple models are combined to generate a final model prediction.

Keywords

malaria, Plasmodium falciparum, machine learning, parallel computing, Apache Spark, big data, artemisinin, bioinformatics, DREAM Competition

Revised Amendments from Version 4

In this revision, we have addressed the latest reviewer's comments around the applicability of this work to the broader field of parasitology, and have also included some new work from Birnbaum et al. 2020. In addition, we discuss the need for lab-based (in vitro) validation of these in silico findings, though this work helps to highlight the most probable/important things to test first.

It should be noted that there is some specific information that reviewers are asking about the input data that we do not have yet as this is part of a larger DREAM Challenge. Once this information is public, we will likely add it to this work as well.

See the authors' detailed response to the review by Jeremy Burrows
See the authors' detailed response to the review by Stefan Jaeger and Sameer K. Antani

Introduction

Malaria is a serious disease caused by parasites belonging to the genus Plasmodium, which are transmitted by mosquitoes in the genus Anopheles. The World Health Organization (WHO) reports that there were 219 million cases of malaria in 2017 across 87 countries [1]. Plasmodium falciparum poses one of the greatest health threats in Southeast Asia, being responsible for 62.8% of malaria cases in the region in 2017 [1].

Artemisinin-based therapies are among the best treatment options for malaria caused by P. falciparum [2]. The use of artemisinin in combination with other drugs, called artemisinin combination therapy, is the best treatment option available today against malaria infections.

However, the emergence of artemisinin resistance in Thailand and Cambodia in 2007 has prompted research into its causes [3]. While there are polymorphisms in the kelch domain–carrying protein K13 in P. falciparum that are known to be associated with artemisinin resistance, many of the underlying molecular mechanisms that confer resistance remain unknown [4]. In early 2020, Birnbaum et al. discovered that the highly conserved gene kelch13 is associated with a molecular mechanism that allows the parasite to feed on host erythrocytes by endocytosis of hemoglobin [5]. Given that artemisinin is activated by hemoglobin degradation products, these mutations can confer resistance to artemisinin.

The established pharmacodynamics benchmark for P. falciparum sensitivity to artemisinin-based therapy is the parasite clearance rate [6,7]. Resistance to artemisinin-based therapy is considered to be present with a parasite clearance rate greater than five hours [8]. By understanding the genetic factors that affect resistance in malaria, targeted development can occur in an effort to abate further resistance or infections of resistant strains.

Previous research has shown success in applying similar machine learning methods to explain genetic differences in plants [9], fungi [10], and even humans [11]. Previous work in machine learning-based tropical disease research, including malaria and other diseases, has proven effective in drug discovery [12,13] and in the understanding of degradomes [14]. Other machine learning work in malaria has focused on the identification and diagnosis of malaria using image classification [15–17].

In this work, we create multiple machine learning-based models to address these issues around artemisinin resistance and parasite clearance. Given that the interpretation and analysis of many genes and their effects on resistance may be tedious, machine learning allows for a more powerful investigation into this relationship. In addition, we employ model explainability methods to help rank particular genes of interest in the malaria genome.

Prediction of artemisinin IC50

First, we created a machine learning model to predict the IC50 of malaria parasites based on transcription profiles of experimentally-tested isolates. IC50, also known as the half maximal inhibitory concentration, is the drug concentration at which 50% of parasites die. This value indicates a population of parasites’ ability to withstand various doses of anti-malarial drugs, such as artemisinin.

Methods

Training data was obtained from the 2019 DREAM Malaria Challenge [18,19]. The training data consists of gene expression data of 5,540 genes of 30 isolates from the malaria parasite, Plasmodium falciparum. For each malaria parasite isolate, transcription data was collected at two time points [6 hours post invasion (hpi) and 24 hpi], with and without treatment of dihydroartemisinin (the metabolically active form of artemisinin), each with a biological replicate. This yields a total of at least eight data points for each isolate. The initial form of the training dataset contains 272 rows and 5,546 columns, as shown in Table 1.

Table 1. Initial IC50 model training data format.

Note that for Treatment, UT represents untreated samples and DHA represents samples treated with dihydroartemisinin.

Sample_Name | Isolate | Timepoint | Treatment | BioRep | Gene1 | ... | Gene5540 | DHA_IC50
isolate_01.24HR.DHA.BRep1 | isolate_01 | 24HR | DHA | BRep1 | 0.008286 | ... | -2.48653 | 2.177
isolate_01.24HR.DHA.BRep2 | isolate_01 | 24HR | DHA | BRep2 | -0.87203 | ... | -1.79457 | 2.177
isolate_01.24HR.UT.BRep1 | isolate_01 | 24HR | UT | BRep1 | 0.03948 | ... | -2.49517 | 2.177
isolate_01.24HR.UT.BRep2 | isolate_01 | 24HR | UT | BRep2 | 0.125177 | ... | -1.73531 | 2.177
isolate_01.6HR.DHA.BRep1 | isolate_01 | 6HR | DHA | BRep1 | 1.354956 | ... | -0.82169 | 2.177
isolate_01.6HR.DHA.BRep2 | isolate_01 | 6HR | DHA | BRep2 | -0.21807 | ... | -1.61839 | 2.177
isolate_01.6HR.UT.BRep1 | isolate_01 | 6HR | UT | BRep1 | 1.31135 | ... | -2.62262 | 2.177
isolate_01.6HR.UT.BRep2 | isolate_01 | 6HR | UT | BRep2 | 0.997722 | ... | -2.24719 | 2.177
... | ... | ... | ... | ... | ... | ... | ... | ...
isolate_30.6HR.UT.BRep2 | isolate_30 | 6HR | UT | BRep2 | -0.26639 | ... | -1.72273 | 1.363

The transcription data was collected as described in Table 2. The transcription data set includes 92 non-coding RNAs (denoted by gene IDs that begin with 'MAL'), while the rest are protein-coding genes (denoted by gene IDs that start with 'PF3D7'). The feature to predict is DHA_IC50.

Table 2. IC50 training data information.

(Adapted from Turnbull et al. (2017), PLoS One [23].)

Attribute | Training Set
Array | Bozdech
Platform | Printed
Plexes | 1
Unique Probes | 10159
Range of Probes per Exon | N/A
Average Probes per Gene | 2
Genes Represented | 5363
Transcript Isoform Profiling | No
ncRNAs | No
Channel Detection Method | Two Color
Scanner | PowerScanner
Data Extraction | GenePix Pro

Data preparation

We used Apache Spark [20] to pivot the dataset such that each isolate was its own row and each combination of gene and attributes (i.e. timepoint, treatment, biological replicate) was its own column holding the corresponding transcription value. This exercise transformed the training dataset from 272 rows and 5,546 columns to 30 rows and 44,343 columns, as shown in Table 3. We completed this pivot by slicing the data by each of the eight combinations of timepoint, treatment, and biological replicate, dynamically renaming the variables (genes) for each slice, and then joining all eight slices back together.

By using the massively parallel architecture of Spark, this transformation can be completed in a minimal amount of time on a relatively small cluster environment (e.g., <10 minutes using an 8-worker/36-core cluster with PySpark on Apache Spark 2.4.3).
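The slice, rename, and join steps described above can be sketched in plain Python. This is only an illustration of the reshaping logic, not the authors' PySpark code: the helper names (`slice_prefix`, `pivot_to_wide`) are hypothetical, only two genes are included, and the column naming simply follows the pattern visible in Table 3. How repeated records for the same slice were handled is not stated in the text; in this sketch a later record would overwrite an earlier one.

```python
from collections import defaultdict

# Long-format records shaped like Table 1 (only two genes shown for brevity).
LONG_ROWS = [
    {"Isolate": "isolate_01", "Timepoint": "24HR", "Treatment": "DHA",
     "BioRep": "BRep1", "Gene1": 0.008286, "Gene5540": -2.48653, "DHA_IC50": 2.177},
    {"Isolate": "isolate_01", "Timepoint": "24HR", "Treatment": "DHA",
     "BioRep": "BRep2", "Gene1": -0.87203, "Gene5540": -1.79457, "DHA_IC50": 2.177},
    {"Isolate": "isolate_01", "Timepoint": "6HR", "Treatment": "UT",
     "BioRep": "BRep2", "Gene1": 0.997722, "Gene5540": -2.24719, "DHA_IC50": 2.177},
]

# Columns that describe the sample rather than a gene expression value.
ID_COLUMNS = {"Sample_Name", "Isolate", "Timepoint", "Treatment", "BioRep", "DHA_IC50"}


def slice_prefix(row):
    """Build a prefix such as 'hr24_trDHA_br1' from one record's attributes."""
    hours = row["Timepoint"].replace("HR", "")
    replicate = row["BioRep"].replace("BRep", "")
    return f"hr{hours}_tr{row['Treatment']}_br{replicate}"


def pivot_to_wide(long_rows):
    """Collapse long-format records into one wide record per isolate."""
    wide = defaultdict(dict)
    for row in long_rows:
        record = wide[row["Isolate"]]
        record["Isolate"] = row["Isolate"]
        record["DHA_IC50"] = row["DHA_IC50"]
        prefix = slice_prefix(row)
        for column, value in row.items():
            if column not in ID_COLUMNS:
                # Rename each gene column with the slice it came from.
                record[f"{prefix}_{column}"] = value
    return list(wide.values())


wide_rows = pivot_to_wide(LONG_ROWS)
print(sorted(wide_rows[0].keys()))
```

With all eight slices and 5,540 genes, the same logic would give roughly 8 x 5,540 gene columns per isolate, which is the order of magnitude reported for the transformed dataset.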

Table 3. Post-transformation format of the IC50 model training data.

Isolate | DHA_IC50 | hr24_trDHA_br1_Gene1 | hr24_trDHA_br2_Gene1 | ... | hr6_trUT_br2_Gene5540
isolate_01 | 2.177 | 0.008286 | -0.87203 | ... | -2.24719
... | ... | ... | ... | ... | ...
isolate_30 | 1.363 | 0.195032 | 0.031504 | ... | -1.72273

Lastly, the dataset is vectorized using the Spark VectorAssembler and converted into a NumPy [21]-compatible array. Vectorization allows for highly scalable parallelization of the machine learning modeling in the next step.

Machine learning

We used the Microsoft Azure Machine Learning Service [23] as the tracking platform for retaining model performance metrics as the various models were generated. For this use case, 498 machine learning models were trained using various scaling techniques and algorithms. Scaling and normalization methods are shown in Table 14. We then created two ensemble models from the individual models using stack ensemble and voting ensemble methods.

The Microsoft AutoML package [24] allows for the parallel creation and testing of various models, fitting based on a primary metric. For this use case, models were trained using Decision Tree, Elastic Net, Extreme Random Tree, Gradient Boosting, Lasso Lars, LightGBM, RandomForest, and Stochastic Gradient Descent algorithms along with various scaling methods: Maximum Absolute Scaler, Min/Max Scaler, Principal Component Analysis, Robust Scaler, Sparse Normalizer, Standard Scale Wrapper, and Truncated Singular Value Decomposition Wrapper (as defined in Table 14). All of the machine learning algorithms are from the scikit-learn package [25] except for LightGBM, which is from the LightGBM package [26]. The settings for the model sweep are defined in Table 4. The 'Preprocess Data?' parameter enables the scaling and imputation of the features in the data. Note that these models were evaluated using random sampling of the input training dataset provided by the DREAM Challenge, though the evaluation within the challenge was performed on an unlabelled testing dataset. The metrics in the Results section below reflect the evaluation on the sampled training data.

Table 4. Model search parameter setting for the IC50 model search.

Parameter | Value
Task | Regression
Number of Iterations | 500
Iteration Timeout (minutes) | 20
Max Cores per Iteration | 7
Primary Metric | Normalized Root Mean Squared Error
Preprocess Data? | True
k-Fold Cross-Validations | 20 folds

Once the 498 individual models were trained, two ensemble models (voting ensemble and stack ensemble) were then created and tested. The voting ensemble method makes a prediction based on the weighted average of the previous models' predicted regression outputs, whereas the stacking ensemble method combines the previous models and trains a meta-model using the elastic net algorithm based on the output from the previous models. The model selection method used was the Caruana ensemble selection algorithm [27].
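As a rough illustration of how a weighted-average voting ensemble and Caruana-style selection fit together, the sketch below runs greedy forward selection with replacement over a small library of model predictions and uses the resulting pick counts as voting weights. It is a generic rendering of the published selection procedure under simplifying assumptions, not the AutoML implementation; the function names and the three toy prediction vectors are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def weighted_vote(predictions, weights):
    """Weighted average of per-model predictions (one list per model)."""
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


def greedy_ensemble_selection(predictions, y_true, n_rounds=10):
    """Greedy forward selection with replacement; returns pick counts per model."""
    counts = [0] * len(predictions)
    for _ in range(n_rounds):
        best_model, best_score = None, float("inf")
        for m in range(len(predictions)):
            trial = counts.copy()
            trial[m] += 1  # tentatively add one more copy of model m
            score = rmse(y_true, weighted_vote(predictions, trial))
            if score < best_score:
                best_model, best_score = m, score
        counts[best_model] += 1
    return counts


# Toy validation data (hypothetical values, for illustration only).
MODEL_PREDICTIONS = [
    [1.2, 2.2, 3.2],  # model 0: consistently too high
    [0.8, 1.8, 2.8],  # model 1: consistently too low
    [3.0, 3.0, 3.0],  # model 2: ignores the input
]
Y_TRUE = [1.0, 2.0, 3.0]

counts = greedy_ensemble_selection(MODEL_PREDICTIONS, Y_TRUE, n_rounds=4)
ensemble = weighted_vote(MODEL_PREDICTIONS, counts)
print(counts, rmse(Y_TRUE, ensemble))
```

In this toy case the two models with opposite biases are selected equally often and the uninformative model receives zero weight, which is the behaviour that makes a voting ensemble able to outperform each of its members.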

Results

The voting ensemble model (using soft voting) was selected as the best model, having the lowest normalized Root Mean Squared Error (RMSE), as shown in Table 5. The top 10 models trained are reported in Table 6. Having a normalized RMSE of only 0.1228 and a Mean Absolute Percentage Error (MAPE) of 24.27%, this model is expected to accurately predict IC50 in malaria isolates. See Figure 1 for a visualization of the experiment runs and Figure 2 for the distribution of residuals on the best model.

Table 5. Model metrics of the final IC50 ensemble model.

Metric | Value
Normalized Root Mean Squared Error | 0.1228
Root Mean Squared Log Error | 0.1336
Normalized Mean Absolute Error | 0.1097
Mean Absolute Percentage Error | 24.27
Normalized Median Absolute Error | 0.1097
Root Mean Squared Error | 0.3398
Explained Variance | -1.755
Normalized Root Mean Squared Log Error | 0.1379
Median Absolute Error | 0.3035
Mean Absolute Error | 0.3035
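For readers unfamiliar with the metrics in Table 5, the following sketch computes RMSE, normalized RMSE, MAE, and MAPE for a pair of toy vectors. It assumes that "normalized" means division by the range of the observed target values; that definition is an assumption here rather than something stated in the article, although the ratios in Table 5 (0.3398 / 0.1228 and 0.3035 / 0.1097) both imply a common normalizer of about 2.77. The function name and the toy values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, range-normalized RMSE, MAE, and MAPE (in percent)."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when a true value is zero; IC50 values are positive.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    value_range = max(y_true) - min(y_true)  # assumed normalizer
    return {
        "rmse": rmse,
        "normalized_rmse": rmse / value_range,
        "mae": mae,
        "mape": mape,
    }


# Toy values (hypothetical), not taken from the challenge data.
metrics = regression_metrics([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(metrics)
```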

Table 6. Top 10 training iterations of the IC50 model search, evaluated by Root Mean Squared Error.

Note that the top performing model (VotingEnsemble) is the final IC50 model discussed in this paper.

Iteration | Preprocessor | Algorithm | Normalized RMSE
498 | | VotingEnsemble | 0.12283293
370 | SparseNormalizer | RandomForest | 0.132003138
432 | StandardScalerWrapper | LightGBM | 0.133180215
240 | SparseNormalizer | RandomForest | 0.133779391
430 | StandardScalerWrapper | RandomForest | 0.137084337
65 | SparseNormalizer | RandomForest | 0.13884791
56 | SparseNormalizer | RandomForest | 0.14417843
68 | MaxAbsScaler | ExtremeRandomTrees | 0.151925822
470 | StandardScalerWrapper | RandomForest | 0.152262231
181 | MinMaxScaler | LightGBM | 0.15279075

Figure 1. Root Mean Squared Error (RMSE) by iteration of the IC50 model search.

Each orange dot is an iteration with the blue line representing the minimum RMSE up to that iteration.


Figure 2. Model residuals of the final IC50 ensemble model.

Prediction of resistance status

The second task of this work was to create a machine learning model that can predict the parasite clearance rate (fast versus slow) of malaria isolates. When resistance rates change in a pathogen, it can be indicative of regulatory changes in the pathogen’s genome. These changes can be exploited for the prevention of further resistance spread. Thus, a goal of this work is to understand genes important in the prediction of artemisinin resistance. The relationship of this use case to the first is that parasite clearance is a measure of the effectiveness of a treatment regimen. While the first use case looked at the drug concentration, this use case looks into the speed at which the parasites are cleared as a result of a standard treatment.

Methods

An in vivo transcription data set from Mok et al. (2015), Science [28], was used to predict the parasite clearance rate of malaria parasite isolates based on in vitro transcriptional profiles (see Table 8).

The training data consists of 1,043 isolates with 4,952 genes from the malaria parasite Plasmodium falciparum. For each malaria parasite isolate, transcription data was collected for various PF3D7 genes. The training dataset contains 1,043 rows and 4,957 columns, as shown in Table 7. The feature to predict is ClearanceRate.

Table 7. Format of the clearance rate model training data.

Sample_Names | Country | Asexual_stage_hpi_ | Kmeans_Grp | PF3D7_0100100 | ... | PF3D7_1480100 | ClearanceRate
GSM1427365 | Bangladesh | 20 | B | 0.226311 | ... | -0.64171 | Fast
GSM1427537 | Cambodia | 12 | C | 0.81096 | ... | -1.72825 | Slow
GSM1428407 | Vietnam | 8 | A | 0.999095 | ... | NaN | Fast

Table 8. Training dataset information from Mok et al., 2015 [28].

Attribute | Training Set
Number of isolates | 1043
Isolate collection site | Southeast Asia
Isolate collection years | 2012–2014
Sample type | in vivo
Synchronized? | Not synchronized
Number of samples per isolate | 1
Additional attributes | ~18 hpi, Non-perturbed, No replicates

Data preparation

The training data for this use case did not require the same pivoting transformations as in the last use case, as each record describes a single isolate. Thus, only the vectorization of the data was necessary, which was performed using the Spark VectorAssembler, with the result then converted into a NumPy-compatible array [22]. Note that this vectorization kept only the gene expression columns; the Country, Kmeans_Grp, and Asexual_stage_hpi_ attributes were excluded as they are either absent or contain non-matching factors (i.e. a different set of countries) in the testing data.
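A minimal sketch of this column selection is shown below in plain Python rather than Spark. It keeps only columns whose names start with `PF3D7_`, which is an assumption made here as one simple way to exclude the Country, Kmeans_Grp, and Asexual_stage_hpi_ attributes; the function name `build_matrix` is hypothetical, and the three records are the example rows from Table 7 with only two gene columns.

```python
import math

# Example records from Table 7 (two gene columns shown).
RAW_ROWS = [
    {"Sample_Names": "GSM1427365", "Country": "Bangladesh", "Asexual_stage_hpi_": 20,
     "Kmeans_Grp": "B", "PF3D7_0100100": 0.226311, "PF3D7_1480100": -0.64171,
     "ClearanceRate": "Fast"},
    {"Sample_Names": "GSM1427537", "Country": "Cambodia", "Asexual_stage_hpi_": 12,
     "Kmeans_Grp": "C", "PF3D7_0100100": 0.81096, "PF3D7_1480100": -1.72825,
     "ClearanceRate": "Slow"},
    {"Sample_Names": "GSM1428407", "Country": "Vietnam", "Asexual_stage_hpi_": 8,
     "Kmeans_Grp": "A", "PF3D7_0100100": 0.999095, "PF3D7_1480100": math.nan,
     "ClearanceRate": "Fast"},
]


def build_matrix(rows, feature_prefix="PF3D7_", target="ClearanceRate"):
    """Return (feature_names, X, y) using only gene expression columns."""
    feature_names = sorted(
        {key for row in rows for key in row if key.startswith(feature_prefix)}
    )
    # Missing genes become NaN so that a later imputation step can handle them.
    X = [[float(row.get(name, math.nan)) for name in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return feature_names, X, y


feature_names, X, y = build_matrix(RAW_ROWS)
print(feature_names, len(X), y)
```

Fixing the column order in `feature_names` matters because the same order has to be applied to the testing data before prediction.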

Machine learning

Once the 98 individual models were trained, two ensemble models (voting ensemble and stack ensemble) were then created and tested as before. Model search parameters are shown in Table 9.

Table 9. Model search parameter settings for the clearance rate model search.

Parameter | Value
Task | Classification
Number of iterations | 100
Iteration timeout (minutes) | 20
Max cores per iteration | 14
Primary metric | Weighted area under the receiver operating characteristic curve (AUC)
Preprocess data? | True
k-Fold cross-validations | 10 folds

Results

The voting ensemble model (using soft voting) was selected as the best model, having the highest area under the receiver operating characteristic curve (AUC), as shown in Table 11. The top 10 of the 100 models trained are reported in Table 10. Having a weighted AUC of 0.87 and a weighted F1 score of 0.80, this model is expected to accurately predict isolate clearance rates. A confusion matrix of the predicted results versus actuals is shown in Table 12. See Figure 3 for a visualization of the experiment runs and Figure 4 and Figure 5 for the ROC and Precision-Recall curves of the best model. Note that, as with the IC50 model, these models were evaluated using random sampling of the input training dataset provided by the DREAM Challenge, though the evaluation within the challenge was performed on an unlabelled testing dataset. The metrics in this Results section reflect the evaluation on the sampled training data.

Note that the averages reported in Figure 4 and Figure 5 are defined as follows:

  • ‘micro’: Computed globally by combining the true positives and false positives from each class at each cutoff.

  • ‘macro’: The arithmetic mean for each class. This does not take class imbalance into account.

  • ‘weighted’: The arithmetic mean of the score for each class, weighted by the number of true instances in each class (support).
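The difference between the three averages can be seen in a small worked example. The sketch below computes micro, macro, and weighted precision at a single decision threshold from a toy two-class confusion matrix; the counts are hypothetical and the function name is illustrative. The curves in Figure 4 and Figure 5 apply the same idea at every cutoff.

```python
def precision_averages(confusion):
    """confusion[i][j] = number of samples with actual class i predicted as class j."""
    n = len(confusion)
    true_pos = [confusion[i][i] for i in range(n)]
    predicted = [sum(confusion[i][j] for i in range(n)) for j in range(n)]  # TP + FP
    support = [sum(confusion[i]) for i in range(n)]  # true instances per class
    per_class = [true_pos[k] / predicted[k] if predicted[k] else 0.0 for k in range(n)]
    return {
        # micro: pool true positives and false positives over all classes
        "micro": sum(true_pos) / sum(predicted),
        # macro: plain mean of per-class scores, ignoring class imbalance
        "macro": sum(per_class) / n,
        # weighted: mean of per-class scores weighted by support
        "weighted": sum(p * s for p, s in zip(per_class, support)) / sum(support),
    }


# Toy imbalanced example (hypothetical counts): 10 samples of class 0, 2 of class 1.
averages = precision_averages([[8, 2], [1, 1]])
print(averages)
```

Because the minority class is scored poorly here, the macro average falls well below the micro and weighted averages, which is the pattern to look for when classes are imbalanced.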

Table 10. Top 10 training iterations of the clearance rate model search.

Note that the top performing model (VotingEnsemble) is the clearance rate model discussed in this paper.

Iteration | Preprocessor | Algorithm | Weighted AUC
98 | | VotingEnsemble | 0.870471056
99 | | StackEnsemble | 0.865215516
65 | StandardScalerWrapper | LogisticRegression | 0.86062304
33 | StandardScalerWrapper | LogisticRegression | 0.859881677
97 | StandardScalerWrapper | LogisticRegression | 0.858791006
44 | StandardScalerWrapper | LogisticRegression | 0.856105491
73 | StandardScalerWrapper | LogisticRegression | 0.855502817
17 | RobustScaler | SVM | 0.855452622
43 | StandardScalerWrapper | LogisticRegression | 0.855368394
61 | RobustScaler | LogisticRegression | 0.854357599

Table 11. Model metrics of the final clearance rate ensemble model.

Metric | Value
f1_score_macro | 0.6084
AUC_micro | 0.9445
AUC_macro | 0.8475
recall_score_micro | 0.8101
recall_score_weighted | 0.8101
average_precision_score_weighted | 0.8707
weighted_accuracy | 0.8585
precision_score_macro | 0.6217
precision_score_micro | 0.8101
balanced_accuracy | 0.6027
log_loss | 0.4455
recall_score_macro | 0.6027
precision_score_weighted | 0.8
AUC_weighted | 0.8705
average_precision_score_micro | 0.8911
f1_score_weighted | 0.8019
f1_score_micro | 0.8101
norm_macro_recall | 0.354
average_precision_score_macro | 0.7344
accuracy | 0.8101

Figure 3. Area under the receiver operating characteristic curve (AUC) by iteration of the clearance rate model.

Each orange dot is an iteration with the blue line representing the maximum AUC up to that iteration.


Figure 4. Receiver operating characteristic curve of the clearance rate model.


Figure 5. Precision-Recall curve of the clearance rate model.

Feature importance

Feature importances were calculated using mimic-based model explanation of the ensemble model [29]. The mimic explainer works by training global surrogate models to mimic black-box models (i.e. complex models that are difficult to explain). The surrogate model is an interpretable model, trained to approximate the predictions of a black-box model as accurately as possible [30]. In Figure 6 and Table 13, the feature importance values for each class ("Slow", "Fast", and NULL) are shown. These values indicate which genes are important in the prediction of clearance rate.

The mimic explainer was chosen over other traditional methods such as principal component analysis (PCA) because of its ability to provide clearer interpretations of the features' importance. PCA obscures the true values of individual features by summarising multiple features together. Given that insights into the importance of particular genes for resistance were desired here, the mimic explainer provides this output in a more straightforward manner.
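The global-surrogate idea can be illustrated with a small, self-contained sketch. Here a hypothetical opaque function stands in for the ensemble, a linear surrogate is fitted to its outputs (not to the true labels), and the absolute surrogate coefficients are read as feature importances. Which surrogate learner the authors used is not stated in this section, so the linear surrogate, the toy `black_box` function, and the grid of inputs are all assumptions made for illustration.

```python
import itertools
import math


def black_box(x):
    """Hypothetical opaque model standing in for the ensemble."""
    return math.tanh(3.0 * x[0]) + 0.2 * x[2]


def fit_linear_surrogate(X, targets, learning_rate=0.1, epochs=500):
    """Fit y ~ bias + weights . x to the black-box outputs by batch gradient descent."""
    n_samples, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, target in zip(X, targets):
            error = bias + sum(w * v for w, v in zip(weights, x)) - target
            grad_b += error
            for j in range(n_features):
                grad_w[j] += error * x[j]
        bias -= learning_rate * grad_b / n_samples
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
    return weights, bias


# Evaluate the black box on a regular grid of three features on a common scale.
GRID = [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [list(point) for point in itertools.product(GRID, repeat=3)]
black_box_outputs = [black_box(x) for x in X]

weights, bias = fit_linear_surrogate(X, black_box_outputs)
importances = [abs(w) for w in weights]
ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
print(ranking, [round(v, 3) for v in importances])
```

The surrogate ranks feature 0 first, feature 2 second, and assigns essentially no importance to feature 1, which the black box never uses. Comparing coefficients this way is only meaningful when the features share a scale, as they do on this grid.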


Figure 6. Derived feature importances using the black box mimic model explanation of the clearance rate model.

(Shown: Top 30 genes.)

Table 12. Confusion matrix of clearance rate predictions versus actual.

Actual \ Predicted | Fast (ID: 0) | Slow (ID: 1) | Null (ID: 2)
Fast (ID: 0) | 661 | 74 | 0
Slow (ID: 1) | 115 | 184 | 0
Null (ID: 2) | 6 | 3 | 0
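The counts in Table 12 can be turned into summary statistics directly. The sketch below computes overall accuracy and per-class recall from the matrix; the function names are illustrative. The accuracy from these counts (845 correct out of 1,043) is approximately 0.810, in line with the accuracy reported in Table 11, while the per-class recalls make the weaker performance on "Slow" isolates explicit.

```python
LABELS = ["Fast", "Slow", "Null"]

# Rows are actual classes, columns are predicted classes (values from Table 12).
CONFUSION = [
    [661, 74, 0],
    [115, 184, 0],
    [6, 3, 0],
]


def accuracy(confusion):
    """Fraction of samples on the diagonal of the confusion matrix."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def recall_per_class(confusion, labels):
    """Recall for each actual class: correct predictions divided by row total."""
    return {
        label: (row[i] / sum(row) if sum(row) else 0.0)
        for i, (label, row) in enumerate(zip(labels, confusion))
    }


overall_accuracy = accuracy(CONFUSION)
recalls = recall_per_class(CONFUSION, LABELS)
print(round(overall_accuracy, 4), {k: round(v, 4) for k, v in recalls.items()})
```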

Table 13. Top 10 PF3D7 genes (features) in predicting clearance rate.

Rank | PF3D7 Gene | Slow Importance | Fast Importance | NULL Importance | Overall Importance
1 | PF3D7_1245300 | 0.292 | 0.118 | 0.000 | 0.410
2 | PF3D7_1107700 | 0.020 | 0.274 | 0.000 | 0.294
3 | PF3D7_1328400 | 0.154 | 0.123 | 0.000 | 0.277
4 | PF3D7_1372000 | 0.172 | 0.095 | 0.000 | 0.267
5 | PF3D7_1115600 | 0.083 | 0.179 | 0.000 | 0.262
6 | PF3D7_0608100 | 0.000 | 0.000 | 0.243 | 0.243
7 | PF3D7_0523000 | 0.154 | 0.087 | 0.000 | 0.241
8 | PF3D7_1205300 | 0.000 | 0.002 | 0.197 | 0.199
9 | PF3D7_1129100 | 0.008 | 0.191 | 0.000 | 0.199
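In Table 13, each overall importance equals the sum of the three per-class importances (for example, 0.292 + 0.118 + 0.000 = 0.410). Assuming that this is how the overall ranking was formed, which is an inference from the table rather than a stated method, the ranking can be reproduced as follows for a subset of the genes. The function name is illustrative.

```python
# Per-class importances for four of the genes listed in Table 13.
IMPORTANCES = {
    "PF3D7_1107700": {"Slow": 0.020, "Fast": 0.274, "NULL": 0.000},
    "PF3D7_0608100": {"Slow": 0.000, "Fast": 0.000, "NULL": 0.243},
    "PF3D7_1245300": {"Slow": 0.292, "Fast": 0.118, "NULL": 0.000},
    "PF3D7_1328400": {"Slow": 0.154, "Fast": 0.123, "NULL": 0.000},
}


def rank_by_overall(importances):
    """Sum per-class importances per gene and sort from most to least important."""
    overall = {gene: sum(per_class.values()) for gene, per_class in importances.items()}
    return sorted(overall.items(), key=lambda item: item[1], reverse=True)


ranked = rank_by_overall(IMPORTANCES)
for gene, score in ranked:
    print(gene, round(score, 3))
```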

Table 14. Scaling function information for machine learning model search [31].

Scaling and Normalization | Description
StandardScaleWrapper | Standardizes features by removing the mean and scaling to unit variance.
MinMaxScaler | Transforms features by scaling each feature by that column's minimum and maximum.
MaxAbsScaler | Scales each feature by its maximum absolute value.
RobustScaler | Scales features by their quantile range.
PCA | Linear dimensionality reduction using singular value decomposition of the data to project it to a lower dimensional space.
TruncatedSVDWrapper | Performs linear dimensionality reduction by means of truncated singular value decomposition. Contrary to PCA, this estimator does not center the data before computing the singular value decomposition, which means it can work efficiently with sparse matrices.
SparseNormalizer | Each sample (each record of the data) with at least one non-zero component is re-scaled independently of other samples so that its norm (L1 or L2) equals one.
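The scaling functions in Table 14 are simple to state in code. The sketch below gives plain-Python versions of the column-wise scalers and the row-wise normalizer for illustration; it follows the textbook definitions (population standard deviation for standardization, median and interquartile range for robust scaling, L2 norm for row normalization) and is not the library code used in the model search. Constant columns would cause a division by zero and are not handled here.

```python
import math
import statistics


def standard_scale(column):
    """Remove the mean and scale to unit variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


def min_max_scale(column):
    """Map the column's minimum to 0 and its maximum to 1."""
    low, high = min(column), max(column)
    return [(v - low) / (high - low) for v in column]


def max_abs_scale(column):
    """Divide by the largest absolute value so results lie in [-1, 1]."""
    peak = max(abs(v) for v in column)
    return [v / peak for v in column]


def robust_scale(column):
    """Center on the median and scale by the interquartile range."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    return [(v - median) / (q3 - q1) for v in column]


def l2_normalize_row(row):
    """Rescale one sample so that its L2 norm equals one (zero rows are left as-is)."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm else list(row)


column = [1.0, 2.0, 3.0, 4.0, 5.0]
print(min_max_scale(column))
print(robust_scale(column))
print(l2_normalize_row([3.0, 4.0]))
```

Note the difference in direction: the first four functions operate on one feature across all samples, whereas the row normalizer operates on one sample across all features.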

Discussion

By using distributed processing for the data preparation, we can successfully shape and manage large malaria datasets. We efficiently transformed a matrix of over 40,000 genetic attributes for the IC50 use case and over 4,000 genetic attributes for the clearance rate use case. This was completed with scalable vectorization of the training data, which allowed for many machine learning models to be generated. By tracking the individual performance results of each machine learning model, we can determine which model is most useful. In addition, ensemble modeling of the various individual models proved effective for both tasks in this work. While the number of training observations for each use case could be improved, the use of adequate cross-validation can help to reduce the risk of overfitting models to such a small dataset. Also note that there is an imbalance in the number of samples in each class in the clearance rate experiment, which should be remedied in future work. There are more than twice as many "Fast" clearance rate isolates as "Slow" ones. The effect of this imbalance can be seen in the variation in model performance indicated by the macro average Precision-Recall curve (Figure 5).

The resulting model performance of both the IC50 model and the clearance rate model shows relatively adequate fitting of the data for their respective predictions. While additional model tuning may provide a lift in model performance, we have demonstrated the utility of ensemble modeling in these predictive use cases in malaria. In both models, we show that IC50 and clearance rate can be effectively predicted using transcriptomic analysis data with machine learning. By extension, this also predicts the phenotypic result of the genetic variations among the samples as it relates to resistance.

In a broader sense for the field of parasitology, this exercise helps to quantify the importance of genetic features, spotlighting potential genes that are significant in artemisinin resistance. This work showcases the utility of machine learning in understanding the underlying genetic/transcriptomic mechanisms that affect drug performance.

Specific examples include PF3D7_1245300, the most important feature in predicting slow parasite clearance. PF3D7_1245300 is the gene that codes for the NEDD8-conjugating enzyme UBC12 (UniProt ID: Q8I4X8), a ligase used in the ubiquitin conjugating pathway. Another example, PF3D7_1107700, is the most important gene for fast clearance rate. PF3D7_1107700 (UniProt ID: Q8IIS5) is important in the regulation of the cell cycle, specifically in the maturation of ribosomal RNAs and in the formation of the large ribosomal subunit. Future in vitro experiments should be performed to validate these in silico findings. While biological confirmation of these genetic factors is needed, this analysis helps to rank the most probable factors by importance, thereby reducing the in vitro work to be performed.

These two examples of important genes identified here, along with the others, may one day be targets for future drugs or may prove integral to the overall understanding of how resistance works in P. falciparum. These models can help direct the development of alternative treatments or the coordination of combination therapies in resistant infections, and they provide an example of the use of machine learning in identifying important genetic features in infectious disease research.

Preprint

An earlier version of this article can be found on bioRxiv (doi:10.1101/856922).

Data availability

Underlying data

The challenge datasets are available from Synapse (https://www.synapse.org/; Synapse ID: syn18089524). Access to the data requires registration and agreement to the conditions for use at: https://www.synapse.org/#!Synapse:syn18089524.

Challenge documentation, including the detailed description of the Challenge design, data description, and overall results can be found at: https://www.synapse.org/#!Synapse:syn16924919/wiki/583955.

Whole genome expression profiling of artemisinin-resistant Plasmodium falciparum field isolates, Accession number GSE59099: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE59099.

Zenodo: colbyford/malaria_DREAM2019: Ensemble Machine Learning Modeling for the Prediction of Artemisinin Resistance in Malaria - Initial Code Release for Research Publication (F1000). https://doi.org/10.5281/zenodo.3590459 [32].

This project contains the following underlying data:

  • /SubChallenge1/data/sc1_X_train.pkl (Pickle file of the SubChallenge 1 independent variables, pivoted by Timepoint, Treatment, and BioRep.)

  • /SubChallenge1/data/sc1_y_train.pkl (Pickle file of the SubChallenge 1 dependent variable, DHA_IC50.)

  • /SubChallenge2/data/sc2_X_train.pkl (Pickle file of the SubChallenge 2 independent variables.)

  • /SubChallenge2/data/sc2_y_train.pkl (Pickle file of the SubChallenge 2 dependent variable, ClearanceRate.)

Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

Software availability

VERSION 5 PUBLISHED 29 Jan 2020
Ford CT and Janies D. Ensemble machine learning modeling for the prediction of artemisinin resistance in malaria [version 5; peer review: 1 approved, 2 approved with reservations] F1000Research 2020, 9:62 (https://doi.org/10.12688/f1000research.21539.5)
NOTE: it is important to ensure the information in square brackets after the title is included in all citations of this article.

Open Peer Review

Version 5
Reviewer Report 11 Jul 2022
Alyssa E Barry, IMPACT, School of Medicine, Deakin University and Burnet Institute, Melbourne, Victoria, Australia 
Myo Naung, Walter and Eliza Hall Institute, University of Melbourne, Melbourne, Australia 
Approved with Reservations
This is commendable work by the authors making use of two publicly available datasets – the 2019 DREAM Malaria Challenge and an in vivo transcription data set from Mok et al., (2015) to create a confident machine learning model predicting ... Continue reading
Barry AE and Naung M. Reviewer Report For: Ensemble machine learning modeling for the prediction of artemisinin resistance in malaria [version 5; peer review: 1 approved, 2 approved with reservations]. F1000Research 2020, 9:62 (https://doi.org/10.5256/f1000research.27621.r139927)
Reviewer Report 10 Jul 2020
Sameer K. Antani, Communications Engineering Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA 
Approved
The authors have updated the article but there is limited update on machine learning elements, or it is not apparent from the web-based interface. I am willing to accept the article related to prior comments, and also recognizing that the ... Continue reading
Antani SK. Reviewer Report For: Ensemble machine learning modeling for the prediction of artemisinin resistance in malaria [version 5; peer review: 1 approved, 2 approved with reservations]. F1000Research 2020, 9:62 (https://doi.org/10.5256/f1000research.27621.r65540)
Version 4
Reviewer Report 22 Jun 2020
Jeremy Burrows, Medicines for Malaria Venture (MMV), Geneva, Switzerland 
Approved with Reservations
Page 3: Artemisinin-based therapies are described as being among the best treatment options for falciparum malaria. ACTs are the mainstay therapy and are, definitively, the best treatment options. This should be altered.

Page 3: The underlying biology ... Continue reading
Burrows J. Reviewer Report For: Ensemble machine learning modeling for the prediction of artemisinin resistance in malaria [version 5; peer review: 1 approved, 2 approved with reservations]. F1000Research 2020, 9:62 (https://doi.org/10.5256/f1000research.26770.r63887)
  • Author Response 25 Jun 2020
    Colby Ford, University of North Carolina at Charlotte, USA
    25 Jun 2020
    Author Response
    Thank you for your review. We have added the additional context about ACTs, and the kelch13 gene from the Birnbaum paper. In addition, we have included information about how this ... Continue reading
VERSION 3
PUBLISHED 29 Apr 2020
Revised
Reviewer Report 18 May 2020
Stefan Jaeger, National Library of Medicine, National Institutes of Health, Bethesda, USA 
Sameer K. Antani, Communications Engineering Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA 
Not Approved
The authors have addressed several, but not all of the reviewers’ comments. The description of the state-of-the-art could be stronger. For example, the authors should discuss the status quo in machine learning for malaria drug-resistance detection, and the status/results of ... Continue reading
HOW TO CITE THIS REPORT
Jaeger S and Antani SK. Reviewer Report For: Ensemble machine learning modeling for the prediction of artemisinin resistance in malaria [version 5; peer review: 1 approved, 2 approved with reservations]. F1000Research 2020, 9:62 (https://doi.org/10.5256/f1000research.25874.r62868)
  • Author Response 21 May 2020
    Colby Ford, University of North Carolina at Charlotte, USA
    21 May 2020
    Author Response
    We appreciate the reviewer's comments and have made some updates to the manuscript to reflect some figure quality issues and to address some points of confusion.

    In this revision, ... Continue reading
VERSION 2
PUBLISHED 04 Feb 2020
Revised
Reviewer Report 17 Mar 2020
Stefan Jaeger, National Library of Medicine, National Institutes of Health, Bethesda, USA 
Sameer K. Antani, Communications Engineering Branch, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA 
Not Approved
The authors present a machine learning approach for detecting malaria drug-resistance based on genetic attributes. To this end, they train many different models, which they combine with known ensemble methods like voting. The detection of malaria drug resistance is an ... Continue reading
HOW TO CITE THIS REPORT
Jaeger S and Antani SK. Reviewer Report For: Ensemble machine learning modeling for the prediction of artemisinin resistance in malaria [version 5; peer review: 1 approved, 2 approved with reservations]. F1000Research 2020, 9:62 (https://doi.org/10.5256/f1000research.24636.r60584)
  • Author Response 29 Apr 2020
    Colby Ford, University of North Carolina at Charlotte, USA
    29 Apr 2020
    Author Response
    We sincerely appreciate the reviewers' feedback on this work and have improved the article based on your recommendations.

    We have addressed each comment as follows in the article:
      ... Continue reading

      VERSION 5 PUBLISHED 29 Jan 2020
      Alongside their report, reviewers assign a status to the article:
      Approved - the paper is scientifically sound in its current form and only minor, if any, improvements are suggested
      Approved with reservations - a number of small changes, and sometimes more significant revisions, are required to address specific details and improve the paper's academic merit
      Not approved - fundamental flaws in the paper seriously undermine the findings and conclusions