Prediction of Disease-related microRNAs through Integrating Attributes of microRNA Nodes and Multiple Kinds of Connecting Edges

Xuan, Ping; Li, Lingling; Zhang, Tiangang; Zhang, Yan; Song, Yingying

doi:10.3390/molecules24173099


¹ School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China

² School of Mathematical Science, Heilongjiang University, Harbin 150080, China

* Author to whom correspondence should be addressed.

Molecules 2019, 24(17), 3099; https://0-doi-org.brum.beds.ac.uk/10.3390/molecules24173099

Submission received: 13 June 2019 / Revised: 9 August 2019 / Accepted: 14 August 2019 / Published: 26 August 2019

(This article belongs to the Special Issue Molecular Computing and Bioinformatics II)


Abstract

Identifying disease-associated microRNAs (disease miRNAs) contributes to the understanding of disease pathogenesis. Most previous computational biology studies focused on multiple kinds of connecting edges of miRNAs and diseases, including miRNA–miRNA similarities, disease–disease similarities, and miRNA–disease associations. Few methods exploited the node attribute information related to miRNA family and cluster. The previous methods do not completely consider the sparsity of node attributes. Additionally, it is challenging to deeply integrate the node attributes of miRNAs and the similarities and associations related to miRNAs and diseases. In the present study, we propose a novel method, known as MDAPred, based on nonnegative matrix factorization to predict candidate disease miRNAs. MDAPred integrates the node attributes of miRNAs and the related similarities and associations of miRNAs and diseases. Since a miRNA is typically subordinate to a family or a cluster, the node attributes of miRNAs are sparse. Similarly, the data for miRNA and disease similarities are sparse. Projecting the miRNA and disease similarities and miRNA node attributes into a common low-dimensional space contributes to estimating miRNA-disease associations. Simultaneously, the possibility that a miRNA is associated with a disease depends on the miRNA’s neighbour information. Therefore, MDAPred deeply integrates projections of multiple kinds of connecting edges, projections of miRNAs node attributes, and neighbour information of miRNAs. The cross-validation results showed that MDAPred achieved superior performance compared to other state-of-the-art methods for predicting disease-miRNA associations. MDAPred can also retrieve more actual miRNA-disease associations at the top of prediction results, which is very important for biologists. Additionally, case studies of breast, lung, and pancreatic cancers further confirmed the ability of MDAPred to discover potential miRNA–disease associations.

Keywords:

miRNA–disease associations; projection of node attributes; nonnegative matrix factorization; projection of connecting edges; low-dimensional feature vector

1. Introduction

MicroRNAs (miRNAs) are small, noncoding, single-stranded RNAs of approximately 22–24 nucleotides that are encoded by endogenous genes [1,2,3,4]. MiRNAs play important regulatory roles in animals and plants by targeting messenger RNAs for splicing or translational inhibition [5]. Increasing evidence shows that miRNAs are involved in the development and progression of many diseases [6,7,8,9]. Therefore, identifying the regulatory relationships between diseases and miRNAs can help researchers explore the pathogenesis of disease.

Early studies mainly used biological experiments, which yield high-accuracy results that directly demonstrate associations between miRNAs and diseases. However, experimental methods are costly and time-consuming and have low success rates. In recent years, researchers have increasingly turned to computational methods to predict disease miRNAs, with good results. Previous work can be divided into two categories. The first category [10,11,12,13] infers candidates from the regulatory relationships between miRNAs and their targets. Since the number of experimentally validated targets is insufficient, a set of putative targets is typically inferred by a prediction program. The targets and the genes associated with known diseases are then used to calculate miRNA similarities. However, the prediction programs have high false-positive rates, which reduces the performance of such methods.

The second category is based on the observation that miRNAs with similar functions are typically associated with similar diseases, which is useful for predicting disease-related candidates [14,15,16,17,18]. First, Wang et al. [19] used miRNA-associated diseases to calculate miRNA similarities. Several studies then built miRNA networks from these similarities and applied random walks on the networks to capture topology information [20,21,22] and infer miRNA–disease associations. Other methods incorporated miRNA similarities into nonnegative matrix factorization models [23,24,25] to predict disease miRNAs. These methods rely on the miRNAs already known to be associated with a given disease, so they do not apply to new diseases without known related miRNAs. For heterogeneous networks that combine disease similarities, miRNA similarities, and miRNA–disease associations, many different prediction methods exist. Some use machine learning [26,27,28], such as ensemble learning [29], to predict disease-associated miRNAs. Others exploit path information [30] in the heterogeneous network, or apply matrix factorization or random walks on it [31]. However, most methods consider neither the node attributes of miRNAs nor low-dimensional projection representations of miRNAs and diseases.

Rfam [32] groups multiple miRNAs with similar mature sequences into the same miRNA family through multiple sequence alignment. MiRNAs in the same family share a consistent seed region. The seed region refers to bases 2–8 at the 5′ end of a mature miRNA and is the key region for the interaction between a miRNA and its target gene. Therefore, miRNAs belonging to the same family may regulate similar target genes and thus may be associated with similar diseases. Previous studies showed that some human miRNAs are located very close to each other in the genome (<20 kb), i.e., they are distributed in clusters. Multiple miRNAs belonging to the same cluster are typically transcribed synchronously and perform certain functions in coordination. Thus, miRNAs in the same cluster are more likely to be associated with similar diseases. Therefore, it is necessary to take the family and cluster information of miRNAs into account [33,34]. Treating this information as miRNA node attributes, we project the miRNA similarity matrix, the disease similarity matrix, and the miRNA node attributes into a representative low-dimensional space. Previous approaches that integrated miRNA family and cluster information did not project such information into a low-dimensional feature space. The advantage of projection is that it extracts representative low-dimensional features, which in turn helps to improve the performance of disease-associated miRNA prediction.

We propose MDAPred, a new method for predicting candidate disease miRNAs. MDAPred integrates the node attributes of miRNAs and the related similarities and associations of miRNAs and diseases. It deeply integrates the projections of miRNA, disease, miRNA family, and miRNA cluster information into low-dimensional feature spaces. Projecting miRNAs, diseases, and miRNA node attributes into a common low-dimensional space is useful for measuring the distance between miRNAs and diseases, and this distance is closely related to the association of a miRNA with a disease. Because miRNAs with similar neighbours are more likely to be associated with similar diseases, the model also makes full use of the neighbour information of miRNAs. Thus, a predictive model based on the various projections and miRNA neighbour information was built, and an iterative algorithm was developed to solve the model and obtain the predicted miRNA–disease associations. Experimental results based on cross-validation showed that MDAPred has superior performance compared to several other state-of-the-art methods. In particular, when focusing on the top part of the prediction results, MDAPred retrieved more real disease miRNAs. Case studies of three cancers further confirmed the ability of MDAPred to discover potential miRNA–disease associations.

2. Results and Discussion

2.1. Evaluation Metrics

We used 5-fold cross-validation to evaluate the performance of miRNA–disease association prediction. We randomly divided all known miRNA–disease associations into five equal parts. Four parts were used as the training set for training the model, and the remaining part was used as the test set for evaluation. We regarded the associations in the test set as positive samples and all unobserved miRNA–disease pairs as negative samples. In the ranked prediction results, higher rankings of the positive samples indicate better prediction performance.
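The splitting step described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `five_fold_splits`, the fixed random seed, and the toy list of (miRNA, disease) index pairs are hypothetical and are not taken from the paper.

```python
import random

def five_fold_splits(pairs, n_folds=5, seed=0):
    """Yield (train, test) lists so that each known association is tested exactly once."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)          # random division of the known associations
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]                            # one part is held out for evaluation
        train = [p for j, fold in enumerate(folds) if j != k for p in fold]
        yield train, test

# Toy data (hypothetical): 20 known (miRNA index, disease index) pairs.
known = [(m, d) for m in range(10) for d in range(2)]
splits = list(five_fold_splits(known))
```

In each round, the held-out pairs would be treated as unobserved when the model is trained, and their ranks among all unobserved pairs would then be evaluated.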

Using the model based on nonnegative matrix factorization, we obtained predicted association scores for miRNA–disease pairs and ranked them in descending order. The higher the positive samples rank in this order, the better the prediction performance. If the predicted association score of a miRNA–disease pair is higher than a threshold $\delta$, the pair is judged as a positive sample; otherwise, if the score is lower than $\delta$, it is judged as a negative sample. By varying the threshold $\delta$, the corresponding true-positive rate (TPR) and false-positive rate (FPR) can be obtained, which are defined as follows,

$$TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{TN + FP}, \tag{1}$$

where TP is the number of positive samples correctly identified as positive, TN is the number of negative samples correctly identified as negative, FN is the number of positive samples misidentified as negative, and FP is the number of negative samples misidentified as positive. TPR is the proportion of correctly identified positive samples among all positive samples, and FPR is the proportion of misidentified negative samples among all negative samples. By changing the threshold $\delta$, we obtain different TPR and FPR values, which are used to plot the receiver operating characteristic (ROC) curve. The overall prediction performance was evaluated by calculating the area under the ROC curve (AUC).
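As a small illustration of the threshold sweep, the sketch below computes (FPR, TPR) points and the trapezoidal AUC in plain Python. The helper names, the toy scores and labels, and the choice to count a score equal to the threshold as positive are assumptions made for the example, not details from the paper.

```python
def roc_points(scores, labels):
    """Sweep the threshold over every distinct score and return [(FPR, TPR), ...]."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for delta in sorted(set(scores), reverse=True):
        # Assumption: a score equal to the threshold is predicted positive.
        tp = sum(1 for s, y in zip(scores, labels) if s >= delta and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= delta and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy example (made-up scores): 1 = held-out known association, 0 = unobserved pair.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
roc = roc_points(scores, labels)
roc_auc = auc(roc)
```

For this toy input, eight of the nine (positive, negative) pairs are ranked in the right order, so the AUC is 8/9.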

Since the ratio of the number of known associations (positive samples) to the number of unobserved miRNA–disease associations (negative samples) was 1:30, there was a serious class imbalance between the positive and negative samples. Therefore, we used the precision-recall (PR) curve, which is more informative than the ROC curve under class imbalance [35], as another evaluation standard. Similarly, by changing the threshold, different precision and recall values can be obtained to draw the PR curve, and the area under the PR curve (AUPR) is calculated. Precision and recall are defined as follows,

$$Precision = \frac{TP}{TP + FP}, \quad Recall = \frac{TP}{TP + FN}, \tag{2}$$

where precision is the proportion of correctly identified positive samples among the retrieved samples, and recall is the proportion of correctly identified positive samples among all positive samples.
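The two quantities can be computed for a single threshold as in the following sketch. The function name, the toy data, and the convention that an empty retrieved set has precision 1.0 are assumptions for illustration only.

```python
def precision_recall_at(scores, labels, delta):
    """Precision and recall when every pair scoring >= delta is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= delta and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= delta and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < delta and y == 1)
    # Assumption: precision is taken as 1.0 when nothing is retrieved.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn)
    return precision, recall

# Toy example (made-up scores): 1 = held-out known association, 0 = unobserved pair.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
loose = precision_recall_at(scores, labels, 0.6)   # retrieves four pairs
strict = precision_recall_at(scores, labels, 0.8)  # retrieves two pairs
```

Lowering the threshold raises recall at the cost of precision, which is the trade-off the PR curve traces out.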

Additionally, biologists typically select the top-ranked miRNA candidates in the prediction results to verify their associations with diseases through biological experiments. The more positive samples appear among the top k predictions, the more valuable the predictions are. Therefore, we calculated the recall rate of the top k, which is the ratio of the positive samples in the top k to the total number of positive samples, as another criterion for evaluating prediction performance.
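A minimal sketch of the top-k recall, assuming candidates are ranked by descending score; the function name and the toy data are hypothetical.

```python
def recall_at_k(scores, labels, k):
    """Fraction of all positive samples that appear among the k highest-scoring candidates."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:k])
    return hits / sum(labels)

# Toy example (made-up scores): 1 = held-out known association, 0 = unobserved pair.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels = [1, 1, 0, 1, 0, 0]
top2 = recall_at_k(scores, labels, 2)
top4 = recall_at_k(scores, labels, 4)
```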

Currently available miRNA–disease association data show that most diseases are associated with only a few miRNAs, which leaves insufficient association data for evaluating prediction models. Therefore, we selected 15 common, well-characterized diseases from the database for cross-validation and simulation experiments, each of which is associated with at least 80 miRNAs.

2.2. Comparison with Other Methods

To better evaluate the predictive performance of MDAPred, we compared it with GSTRW [22], BNPMDA [36], Liu’s method [37], PBMDA [30], and DMPred [23], which are state-of-the-art methods for predicting miRNA–disease associations. We adjusted the hyperparameters of these comparison methods to achieve their best prediction performance. Based on the cross-validation results, the values of the hyperparameters $\alpha_{1}, \alpha_{2}, \alpha_{3}$, and $\alpha_{4}$ of MDAPred were selected from $\{0.01, 0.1, 1, 10\}$. MDAPred showed the best performance when $\alpha_{1} = 0.1$, $\alpha_{2} = 0.1$, $\alpha_{3} = 0.1$, and $\alpha_{4} = 0.1$. For the comparison methods, we based the hyperparameters on the best parameters in the corresponding papers ($\gamma = \theta = 0.2$, $\alpha = \beta = 0.8$, $\omega = 0.6$ for GSTRW; $\lambda_{M} = \frac{1}{7}$, $\lambda_{D} = \frac{1}{10}$, $\theta = \frac{1}{20}$ for DMPred; $\lambda = 0.8$, $\delta = 0.9$, $\eta = 0.1$, $\lambda = 0.5$ for Liu’s method; $L = 3$, $\alpha = 2.26$ for PBMDA; and $\alpha = 0.5$, $\beta = 0.5$ for BNPMDA).

DMPred exploits nonnegative matrix factorization to predict candidate miRNAs. You et al. proposed PBMDA, which infers disease-related miRNAs by exploiting the paths connecting miRNAs and diseases. GSTRW predicts miRNA–disease associations based on random walks. Liu’s method infers potential candidate miRNAs by exploiting network topology information. BNPMDA predicts disease-related miRNAs based on hierarchical clustering. Figure 1 shows the receiver operating characteristic (ROC) and precision-recall (PR) curves of MDAPred and the other five methods.

As shown in Figure 1A and Table 1, MDAPred achieved the best average performance (AUC = 0.964) over the 15 diseases considered. In particular, it outperformed DMPred by 3.1%, PBMDA by 9.1%, GSTRW by 15.8%, Liu’s method by 6.0%, and BNPMDA by 12.5%. Table 1 lists the AUCs of all six methods on the 15 well-characterized human diseases; MDAPred yielded the best performance for 13 of them. GSTRW used disease similarities and miRNA similarities when predicting candidate miRNAs but did not consider the miRNA–disease associations, and therefore showed the lowest performance. As shown in Figure 1A, the ROC curves of BNPMDA and PBMDA overlapped, with PBMDA, which uses path information, performing better than BNPMDA, which uses hierarchical clustering. Liu’s method achieved better results than these two methods. Although these methods use different calculations, they all make full use of the topology information of heterogeneous networks. DMPred, based on nonnegative matrix factorization, used network topology and the original features of miRNAs and diseases for predicting associations and achieved competitive prediction performance. MDAPred is also based on nonnegative matrix factorization. Unlike DMPred, it not only considers node attributes but also uses projection to obtain the association predictions. Overall, Figure 1A and Table 1 show that MDAPred performed best on the 15 common diseases.

As shown in Figure 1B, the average PR curve of MDAPred over the 15 common diseases was higher than those of the other five methods. The average AUPR of MDAPred was 10.3% better than that of DMPred, 16.7% better than that of PBMDA, 38% better than that of GSTRW, 14% better than that of Liu’s method, and 24.4% better than that of BNPMDA. MDAPred showed the best performance for 14 of the 15 common diseases (Table 2).

A higher recall rate among the top k candidates indicates that more miRNAs truly associated with diseases are correctly identified. The average top-k recall rates for the 15 common diseases are shown in Figure 2. For every value of k, the recall of MDAPred was significantly higher than those of the other methods. For MDAPred, the recall rate was 0.641 in the top 30, 0.862 in the top 60, and 0.965 in the top 90. For DMPred, it was 0.448 in the top 30, 0.675 in the top 60, and 0.791 in the top 90. Most recall values of PBMDA were close to those of Liu’s method: the top 30, top 60, and top 120 recall values were 0.390, 0.580, and 0.680 for PBMDA and 0.402, 0.594, and 0.705 for Liu’s method, respectively. BNPMDA’s top 30, top 60, and top 90 recall values were 0.465, 0.653, and 0.764, respectively. GSTRW showed the worst performance, with a top 240 recall value of only 0.79.

In addition, to further verify that the AUCs and AUPRs of MDAPred were significantly higher than those of the other methods, we performed paired t-tests. All p-values of the paired t-tests were less than 0.05, which indicates that the performance of MDAPred was significantly better than that of the other methods (Table 3).

2.3. Case Studies

To demonstrate the ability of MDAPred to discover high-quality candidate miRNAs, we conducted case studies of breast, pancreatic, and lung cancers. Because breast cancer is one of the most common cancers, we used it as an example to analyze its top 50 candidates in detail (Table 4).

Xie et al. used text mining techniques to extract experimentally validated associations between miRNAs and diseases [38]. These associations were further manually verified and incorporated into the miRCancer database, which contains 632 cancer-associated 6323 miRNA–disease associations. dbDEMC [39] is a database of differentially expressed miRNAs in human cancers that contains 2224 miRNAs differentially expressed in 36 cancers. As shown in Table 4, 39 of the 50 candidate miRNAs were included in the dbDEMC database and 21 candidates were included in the miRCancer database. This suggests that these miRNAs are abnormally expressed in breast cancer and are associated with breast cancer.

The PhenomiR database [45] contains miRNAs that are differentially expressed in diseased tissues compared with normal tissues. Twenty-six candidate miRNAs are present in the PhenomiR database, indicating that they are upregulated or downregulated in breast cancer. Although hsa-mir-4480 [41] had a centrality score of 9, it was still described as a breast cancer-related miRNA in the SKBR3 network. Hsa-mir-885 [40] directly targets B7-H3 by association with the B7-H3 3′-UTR region, suggesting that hsa-mir-885 has a direct role in modulating B7-H3 protein expression in breast cancer. Chaluvally-Raghavan et al. [42] demonstrated that hsa-miR-569, which is overexpressed in a subset of ovarian and breast cancers, at least in part owing to the 3q26.2 amplicon, alters cell survival and proliferation. Xian Wang et al. [43] performed a differential expression profile analysis of hsa-mir-4454 in breast cancer cells. Junjun et al. [44] confirmed that hsa-mir-3135b is differentially expressed in the breast cancer cell line MCF7. Hsa-mir-6838 is marked “Unconfirmed” because it is not currently supported by the databases or the relevant literature.

Supplementary Table S1 lists the top 50 candidates associated with lung cancer. The dbDEMC database contains 35 candidates showing abnormal expression in lung cancer, and 31 candidate miRNAs are present in the miRCancer database, demonstrating their association with lung cancer. Thirty-seven candidate miRNAs are present in the PhenomiR database, showing that their expression levels are significantly altered in lung cancer cells. NCIH460, a lung cancer cell line, was treated with a screening library, revealing the ability of hsa-mir-4480 [46] to inhibit the growth of lung cancer cells. Park et al. [47] showed that hsa-mir-1843 is significantly upregulated compared with normal lung tissue. Long noncoding RNA NEAT1 promotes non-small cell lung cancer progression through regulation of the hsa-mir-4262 pathway [48]. In addition, EZH2 and miR-4448 show mutual negative regulation for tumor progression via epithelial mesenchymal transition in small cell lung cancer [49]. Hsa-mir-3161 is listed as a differentially expressed miRNA in lung adenocarcinoma by Gou et al. [50]. Hsa-mir-3074-5p is also significantly correlated with small cell lung cancer metastasis [51].

For pancreatic cancer, the top 50 candidate associations are listed in Supplementary Table S2. Forty-eight and 18 candidates are present in the dbDEMC and miRCancer databases, respectively, indicating that they are associated with the disease. Forty candidate miRNAs are present in the PhenomiR database, suggesting that their expression levels in pancreatic cancer cells differ significantly from those in normal tissues.

The miRNA–disease association data used herein were derived from the latest Human miRNA–Disease Database (HMDD, released in March 2019) [52], which contains 7908 miRNA–disease association pairs that have been validated by biological experiments. Disease terms from the U.S. National Library of Medicine (MeSH, http://www.ncbi.nlm.nih.gov/mesh) were used to construct directed acyclic graphs (DAGs) to calculate the semantic similarities of diseases. We obtained the disease phenotypic similarity information [53] from previous work. The information on 530 miRNA families was extracted from miRBase (version 22.1) [54]. Following previous studies, we obtained 1309 clusters by requiring the distance between two miRNAs in a cluster to be no more than 20 kb.

The primary goal of the study was to predict disease–miRNA associations. To integrate miRNA similarities, disease similarities, miRNA–disease associations, and miRNA node attributes, a model based on nonnegative matrix factorization was constructed (Figure 3) and then solved with an iterative algorithm. The model yields an association score for each miRNA $m_{i}$ and disease $d_{j}$. A higher association score indicates a greater likelihood of an association.

3. Materials and Methods

3.1. Data Representation of miRNAs and Diseases

MiRNA similarities. It is well known that miRNAs with similar functions are often associated with similar diseases. Wang et al. [19] calculated miRNA similarities by using miRNA-associated diseases. For instance, if diseases $d_{1}, d_{5}, d_{6}$ are associated with miRNA $m_{a}$ and diseases $d_{2}, d_{4}, d_{5}, d_{6}$ are associated with miRNA $m_{b}$, then the similarity between the disease sets $S_{a} = \{d_{1}, d_{5}, d_{6}\}$ and $S_{b} = \{d_{2}, d_{4}, d_{5}, d_{6}\}$ is taken as the similarity $M(m_{a}, m_{b})$ of $m_{a}$ and $m_{b}$ (Figure 3a). The miRNA similarity matrix is $M = [M_{ij}] \in \mathbb{R}^{N_{m} \times N_{m}}$, where $N_{m}$ is the number of miRNAs and $M_{ij}$ is the similarity of $m_{i}$ and $m_{j}$. Generally, $M_{ij}$ is greater than or equal to 0, and a higher value indicates greater similarity between $m_{i}$ and $m_{j}$.

Disease similarities. We measured the similarity of two diseases from the dual perspectives of disease semantics and phenotypes (signs and symptoms). A DAG was used to represent the semantic terms related to a disease. The more terms the DAGs of two diseases share, the more similar the two diseases are. Likewise, the more phenotypes two diseases have in common, the more similar they are. Therefore, we quantified disease similarities based on both the semantics and the phenotypes of the diseases (Figure 3b). Xuan et al. [21,23,31,55] integrated this information to calculate disease similarities, and we obtained the similarities with their method. In the similarity matrix $D = [D_{ij}] \in \mathbb{R}^{N_{d} \times N_{d}}$ of the $N_{d}$ diseases, $D_{ij}$ is the similarity of diseases $d_{i}$ and $d_{j}$; a larger value indicates greater similarity, and the value of $D_{ij}$ is generally between 0 and 1.

MiRNA–disease associations. According to the known associations between miRNAs and diseases, an association matrix $A = [A_{ij}] \in \mathbb{R}^{N_{m} \times N_{d}}$ was constructed (Figure 3c). Each row of $A$ corresponds to a miRNA and each column corresponds to a disease. If miRNA $m_{i}$ is associated with disease $d_{j}$, then $A_{ij} = 1$. If $m_{i}$ and $d_{j}$ are not associated, or no association has been observed so far, then $A_{ij} = 0$.
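Building $A$ from a list of known pairs could look like the sketch below. The identifiers `m1`, `d1`, and so on are placeholders and the pairs are made up; they are not entries from HMDD, and the function name is hypothetical.

```python
import numpy as np

def build_association_matrix(pairs, mirnas, diseases):
    """Return A with A[i, j] = 1 if (mirnas[i], diseases[j]) is a known association, else 0."""
    m_index = {m: i for i, m in enumerate(mirnas)}
    d_index = {d: j for j, d in enumerate(diseases)}
    A = np.zeros((len(mirnas), len(diseases)))
    for m, d in pairs:
        A[m_index[m], d_index[d]] = 1.0
    return A

# Placeholder identifiers and made-up pairs, for illustration only.
mirnas = ["m1", "m2", "m3"]
diseases = ["d1", "d2"]
pairs = [("m1", "d1"), ("m1", "d2"), ("m2", "d2")]
A = build_association_matrix(pairs, mirnas, diseases)
```

A zero entry only records that no association has been observed, which is why the unobserved pairs are ranked rather than treated as confirmed negatives.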

MiRNA node attributes. $C \in \mathbb{R}^{N_{m} \times (N_{f} + N_{c})}$ is the miRNA family and cluster attribute matrix, whose rows represent miRNAs and whose columns represent families or clusters (Figure 3d). The row vector $C_{i}$ records which of the $N_{f}$ families and $N_{c}$ clusters miRNA $m_{i}$ belongs to; these memberships are regarded as its node attributes. $C_{ij} = 1$ ($C_{i(N_{f} + k)} = 1$) indicates that $m_{i}$ belongs to the $j$-th family ($k$-th cluster); otherwise, the value is 0.
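The layout of $C$, with family columns first and cluster columns offset by $N_{f}$, can be sketched as follows. The dictionaries mapping a miRNA index to a family or cluster index are hypothetical inputs chosen for the example, and 0-based indices are used.

```python
import numpy as np

def build_attribute_matrix(family_of, cluster_of, n_mirnas, n_families, n_clusters):
    """Return C of shape (n_mirnas, n_families + n_clusters) with one-hot memberships."""
    C = np.zeros((n_mirnas, n_families + n_clusters))
    for i, j in family_of.items():
        C[i, j] = 1.0                      # m_i belongs to family j
    for i, k in cluster_of.items():
        C[i, n_families + k] = 1.0         # m_i belongs to cluster k (columns offset by N_f)
    return C

# Toy memberships (made up): miRNA 3 has neither a family nor a cluster.
family_of = {0: 0, 1: 0, 2: 1}
cluster_of = {0: 1, 2: 0}
C = build_attribute_matrix(family_of, cluster_of, n_mirnas=4, n_families=2, n_clusters=2)
```

Most entries of such a matrix are zero, which is the sparsity of node attributes that the method is designed to handle.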

3.2. Prediction Models for Disease–miRNA Associations

A model based on nonnegative matrix factorization was constructed, which integrates miRNA similarities, disease similarities, miRNA–disease associations, and miRNA family and cluster information. Let $U \in \mathbb{R}^{N_{m} \times N_{d}}$ denote the matrix of predicted miRNA–disease association scores, where $N_{m}$ is the number of miRNAs, $N_{d}$ is the number of diseases, and $U_{ij}$ is the association score of $m_{i}$ and $d_{j}$. A larger score means that $m_{i}$ and $d_{j}$ are more likely to be associated, and $U_{ij}$ is greater than or equal to 0.

Projection of miRNAs, diseases, and node attributes. We projected the information related to miRNAs and diseases into a low-dimensional space to extract representative low-dimensional feature vectors. For the miRNAs, $M$ denotes the miRNA similarity matrix, which is projected into the $c$-dimensional space. $X \in \mathbb{R}^{N_{m} \times c}$ is the projection matrix of the miRNA similarities, $MX \in \mathbb{R}^{N_{m} \times c}$ is the low-dimensional feature matrix of the miRNAs, and the $i$-th row of $MX$ is the low-dimensional feature vector of $m_{i}$.

For the diseases, $D$ is the disease similarity matrix, which is projected into the low-dimensional space to obtain the low-dimensional feature matrix. $Y \in \mathbb{R}^{N_{d} \times c}$ is the projection matrix of the disease similarities, $DY \in \mathbb{R}^{N_{d} \times c}$ is the low-dimensional feature matrix of the diseases, and the $j$-th row of $DY$ is the low-dimensional feature vector of $d_{j}$.

For the node attributes of the miRNAs, $C \in \mathbb{R}^{N_{m} \times (N_{f} + N_{c})}$ is the feature matrix of families and clusters, which is projected into the low-dimensional space to obtain the low-dimensional feature matrix of the miRNA node attributes. $Z \in \mathbb{R}^{(N_{f} + N_{c}) \times c}$ is the projection matrix of the node attributes. $CZ \in \mathbb{R}^{N_{m} \times c}$ is the low-dimensional feature matrix of the miRNAs based on their node attributes, and its $i$-th row is the low-dimensional feature vector of the family and cluster memberships of $m_{i}$.
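The shapes involved in the three projections can be checked with a short NumPy sketch. All matrices here are random stand-ins with toy sizes; the variable names mirror the symbols in the text, and the values carry no biological meaning.

```python
import numpy as np

rng = np.random.default_rng(0)
n_m, n_d, n_f, n_c, c = 6, 4, 3, 2, 2      # toy sizes; c is the shared latent dimension

M = rng.random((n_m, n_m))                 # stand-in for the miRNA similarity matrix
D = rng.random((n_d, n_d))                 # stand-in for the disease similarity matrix
C = (rng.random((n_m, n_f + n_c)) < 0.3).astype(float)  # stand-in for node attributes

X = rng.random((n_m, c))                   # projection of miRNA similarities
Y = rng.random((n_d, c))                   # projection of disease similarities
Z = rng.random((n_f + n_c, c))             # projection of node attributes

MX = M @ X                                 # row i: latent vector of m_i from similarities
DY = D @ Y                                 # row j: latent vector of d_j
CZ = C @ Z                                 # row i: latent vector of m_i from family/cluster

edge_scores = MX @ DY.T                    # entry (i, j) is the inner product (MX)_i (DY)_j^T
attr_scores = CZ @ DY.T                    # same, using the attribute-based miRNA vectors
```

Because all three feature matrices share the same $c$ columns, the miRNA and disease vectors can be compared directly through inner products.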

Modelling miRNA–disease associations. In the association matrix $A$, a value of 1 represents an observed miRNA–disease association and a value of 0 indicates that no association has been observed; most values of 0 indicate that the miRNA is not associated with the disease. The association matrix $A$ thus reflects the true associations between miRNAs and diseases. The element $U_{ij}$ of the score matrix $U$ indicates the possibility that miRNA $m_{i}$ is associated with disease $d_{j}$. The estimated score matrix $U$ should be as consistent as possible with the actual associations. The objective function is obtained as follows,

$$\min_{U \geq 0} \| U - A \|_{F}^{2}, \tag{3}$$

where $\| \cdot \|_{F}$ is the Frobenius norm of a matrix.

Modelling similarities of miRNAs and diseases. The $i$-th row of the low-dimensional feature matrix $MX \in \mathbb{R}^{N_{m} \times c}$ is the feature vector of miRNA $m_{i}$ in the $c$-dimensional space. Similarly, the $j$-th column of $(DY)^{T} \in \mathbb{R}^{c \times N_{d}}$ is the feature vector of disease $d_{j}$ in the $c$-dimensional space. The closer $m_{i}$ is to $d_{j}$ in the $c$-dimensional space, i.e., the larger the value of $(MX)_{i} (DY)_{j}^{T}$, the more likely $m_{i}$ is to be associated with $d_{j}$. The element $U_{ij}$ of the score matrix denotes the predicted likelihood that $m_{i}$ is associated with $d_{j}$, so $U_{ij}$ and $(MX)_{i} (DY)_{j}^{T}$ should be as consistent as possible. The objective function is extended as follows,

$$\min_{U \geq 0} \| U - A \|_{F}^{2} + \alpha_{1} \| U - MX(DY)^{T} \|_{F}^{2}, \tag{4}$$

where $\alpha_{1}$ is a hyperparameter that adjusts the contribution of the second term.

Modelling node attributes of miRNAs. $(CZ)_{i}$ is the $i$-th row of the matrix $CZ \in \mathbb{R}^{N_{m} \times c}$, which records the low-dimensional feature vector of $m_{i}$ based on its node attributes. Correspondingly, $(DY)_{j}^{T}$ is the $j$-th column of the matrix $(DY)^{T}$, which records the low-dimensional feature vector of $d_{j}$. The more consistent $(CZ)_{i}$ and $(DY)_{j}^{T}$ are, the more likely $m_{i}$ is to be associated with $d_{j}$. $U_{ij}$ is the estimated association score of $m_{i}$ and $d_{j}$. To make the predicted score matrix $U$ as consistent as possible with the scores calculated from the node attributes, the objective function is extended as follows,

$$\min_{U \geq 0} \| U - A \|_{F}^{2} + \alpha_{1} \| U - MX(DY)^{T} \|_{F}^{2} + \alpha_{2} \| U - CZ(DY)^{T} \|_{F}^{2}, \tag{5}$$

where $\alpha_{2}$ is a hyperparameter that adjusts the contribution of the node attribute information.

Modelling the topological structure of miRNAs. A miRNA and its k neighbours are more likely to be associated with similar diseases. A graph model $S$ based on the miRNA–miRNA similarities was created,

$$S_{ij} = \begin{cases} 1, & \text{if miRNA } m_{i} \text{ is one of the similar neighbours of miRNA } m_{j}, \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$

The graph Laplacian matrix $L$ of the miRNA feature graph $S$ is defined as follows,

$$L = W - S, \tag{7}$$

where $W$ is a diagonal matrix with $W(i, i) = \sum_{j=1}^{N_{m}} S(i, j)$. The graph model introduces a smoothness regularization, since miRNAs with similar features should be associated with similar diseases. The graph model thus reflects the similarity relationships between different miRNAs. The objective function is extended as follows,

$$\min_{U \geq 0} \| U - A \|_{F}^{2} + \alpha_{1} \| U - MX(DY)^{T} \|_{F}^{2} + \alpha_{2} \| U - CZ(DY)^{T} \|_{F}^{2} + \alpha_{3} \mathrm{Tr}(U^{T} L U), \tag{8}$$

where $\alpha_{3}$ is a hyperparameter that adjusts the contribution of the graph regularization to the entire objective function and $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix.
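One way to build a neighbour graph and its Laplacian is sketched below. The text does not specify how the k neighbours are selected or whether $S$ is symmetrised, so the k-nearest-neighbour rule by similarity, the exclusion of self-neighbours, and the symmetrisation step are all assumptions of this example.

```python
import numpy as np

def knn_graph(M, k):
    """S[i, j] = 1 if m_i is among the k most similar miRNAs to m_j (self excluded)."""
    n = M.shape[0]
    S = np.zeros((n, n))
    for j in range(n):
        sims = M[:, j].astype(float)       # astype returns a copy, so M is not modified
        sims[j] = -np.inf                  # a miRNA is not its own neighbour
        neighbours = np.argsort(-sims)[:k]
        S[neighbours, j] = 1.0
    return np.maximum(S, S.T)              # assumption: symmetrise the neighbour relation

def graph_laplacian(S):
    """L = W - S with W(i, i) equal to the sum of row i of S."""
    W = np.diag(S.sum(axis=1))
    return W - S

# Toy symmetric similarity matrix (made-up values).
M = np.array([[1.0, 0.9, 0.1, 0.2],
              [0.9, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.8],
              [0.2, 0.1, 0.8, 1.0]])
S = knn_graph(M, k=1)
L = graph_laplacian(S)
```

With a symmetric $S$, the term $\mathrm{Tr}(U^{T} L U)$ equals half the sum of $S_{ij} \| U_{i} - U_{j} \|^{2}$ over all pairs of rows, so it is small when neighbouring miRNAs receive similar score vectors.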

Considering the sparseness of associations. Since a disease is associated with only a limited number of miRNAs, we imposed $l_{1}$-regularization on $U$ to learn sparse associations. The objective function is extended as follows,

$$\min_{U \geq 0} \| U - A \|_{F}^{2} + \alpha_{1} \| U - MX(DY)^{T} \|_{F}^{2} + \alpha_{2} \| U - CZ(DY)^{T} \|_{F}^{2} + \alpha_{3} \mathrm{Tr}(U^{T} L U) + \alpha_{4} \| U \|_{1}. \tag{9}$$
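To make the five terms of Equation (9) concrete, the following sketch evaluates the objective for given matrices. The function name and argument order are hypothetical, and the toy input is chosen only so that the value is easy to verify by hand.

```python
import numpy as np

def objective(U, A, M, D, C, X, Y, Z, L, a1, a2, a3, a4):
    """Value of the objective in Equation (9) for fixed U, X, Y, Z."""
    DY = D @ Y
    fit = np.linalg.norm(U - A, "fro") ** 2                  # ||U - A||_F^2
    edge = np.linalg.norm(U - (M @ X) @ DY.T, "fro") ** 2    # ||U - MX(DY)^T||_F^2
    attr = np.linalg.norm(U - (C @ Z) @ DY.T, "fro") ** 2    # ||U - CZ(DY)^T||_F^2
    smooth = np.trace(U.T @ L @ U)                           # Tr(U^T L U)
    sparse = np.abs(U).sum()                                 # ||U||_1
    return fit + a1 * edge + a2 * attr + a3 * smooth + a4 * sparse

# Toy check: U = A = I and zero projections, so only three terms are non-zero.
I2 = np.eye(2)
Z2 = np.zeros((2, 2))
value = objective(I2, I2, M=I2, D=I2, C=I2, X=Z2, Y=Z2, Z=Z2, L=Z2,
                  a1=0.1, a2=0.1, a3=0.1, a4=0.1)
```

In this toy case the fit and smoothness terms vanish, and the two projection terms and the $l_{1}$ term each contribute $0.1 \times 2$.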

3.3. Optimization

The objective function $L(U, X, Y, Z)$ in Equation (9) is a non-convex function, and it is impractical to obtain its global optimal solution. We divided the problem into four subproblems to obtain a near-optimal solution.

U-subproblem. When X, Y, and Z are fixed, the subproblem for solving U is as follows,

\begin{aligned} \min_{U \geq 0} L (U) = {} & ‖ U - A ‖_{F}^{2} + α_{1} ‖ U - M X (D Y)^{T} ‖_{F}^{2} \\ & + α_{2} ‖ U - C Z (D Y)^{T} ‖_{F}^{2} + α_{3} \mathrm{Tr} (U^{T} L U) + α_{4} ‖ U ‖_{1} . \end{aligned}

(10)

According to the properties of the trace and the Frobenius norm of a matrix, L (U) can be rewritten as follows,

\begin{aligned} \min_{U \geq 0} L (U) & = ‖ U - A ‖_{F}^{2} + α_{1} ‖ U - M X (D Y)^{T} ‖_{F}^{2} + α_{2} ‖ U - C Z (D Y)^{T} ‖_{F}^{2} + α_{3} \mathrm{Tr} (U^{T} L U) + α_{4} ‖ U ‖_{1} \\ & = \mathrm{Tr} (U U^{T} - U A^{T} - A U^{T} + A A^{T}) \\ & \quad + α_{1} \mathrm{Tr} (U U^{T} - U D Y (M X)^{T} - M X (D Y)^{T} U^{T} + M X (D Y)^{T} D Y (M X)^{T}) \\ & \quad + α_{2} \mathrm{Tr} (U U^{T} - U D Y (C Z)^{T} - C Z (D Y)^{T} U^{T} + C Z (D Y)^{T} D Y (C Z)^{T}) \\ & \quad + α_{3} \mathrm{Tr} (U^{T} W U - U^{T} S U) + α_{4} ‖ U ‖_{1}, \end{aligned}

(11)

where Tr(·) is the trace of a matrix. By setting the derivative of L (U) with respect to U to 0, we obtain the following equation,

2 U - 2 A + 2 α_{1} U - 2 α_{1} M X {(D Y)}^{T} + 2 α_{2} U - 2 α_{2} C Z {(D Y)}^{T} + 2 α_{3} W U - 2 α_{3} S U + α_{4} B = 0,

(12)

where B = [B_{i j}] \in ℜ^{N_{m} \times N_{d}} is a matrix whose elements are all 1. By multiplying both sides of Equation (12) element-wise by U_{i j}, we obtain the following equation,

\begin{matrix} (2 U - 2 A + 2 α_{1} U - 2 α_{1} M X {(D Y)}^{T} + 2 α_{2} U \\ - 2 α_{2} C Z {(D Y)}^{T} + 2 α_{3} W U - 2 α_{3} S U + α_{4} B)_{i j} U_{i j} = 0 . \end{matrix}

(13)

Finally, according to the coordinate descent algorithm, we obtain the update rule for U_{i j} by multiplying its current value by the ratio of the negative terms to the positive terms of Equation (13),

U_{i j}^{\text{new}} \leftarrow U_{i j} \cdot \frac{(2 A + 2 α_{1} M X (D Y)^{T} + 2 α_{2} C Z (D Y)^{T} + 2 α_{3} S U)_{i j}}{(2 U + 2 α_{1} U + 2 α_{2} U + 2 α_{3} W U + α_{4} B)_{i j}} .

(14)
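Equation (14) is an element-wise multiplicative update, so it maps directly onto array operations. The following sketch is illustrative only: the function name `update_U`, the argument layout, the small constant `eps` added to the denominator, and the toy data are assumptions rather than details taken from the paper.

```python
import numpy as np

def update_U(U, A, MX, CZ, DY, S, alphas, eps=1e-12):
    """One multiplicative step for U following Equation (14).

    MX, CZ and DY are the products M @ X, C @ Z and D @ Y, held fixed here.
    eps is an assumed safeguard against division by zero.
    """
    a1, a2, a3, a4 = alphas
    W = np.diag(S.sum(axis=1))                      # degree matrix of Equation (7)
    B = np.ones_like(U)                             # all-ones matrix of Equation (12)
    numer = 2*A + 2*a1*MX @ DY.T + 2*a2*CZ @ DY.T + 2*a3*S @ U
    denom = 2*U + 2*a1*U + 2*a2*U + 2*a3*W @ U + a4*B
    return U * numer / (denom + eps)                # element-wise ratio

# Toy data (assumed sizes): 5 miRNAs, 4 diseases, rank 2.
rng = np.random.default_rng(0)
A = rng.random((5, 4))
MX, CZ, DY = rng.random((5, 2)), rng.random((5, 2)), rng.random((4, 2))
S = np.ones((5, 5)) - np.eye(5)
U = rng.random((5, 4))
U_new = update_U(U, A, MX, CZ, DY, S, alphas=(0.5, 0.5, 0.1, 0.1))
```

Because every factor in the ratio is non-negative, a non-negative U stays non-negative after the step, which is how the constraint U ≥ 0 is maintained without an explicit projection. With all weights set to zero the step reduces to U ← A.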

X-subproblem. When U, Y, and Z are fixed, the subproblem for solving X is,

\min_{X \geq 0} L (X) = α_{1} ‖ U - M X (D Y)^{T} ‖_{F}^{2} .

(15)

According to the properties of the trace and the Frobenius norm of a matrix, L (X) can be rewritten as,

L (X) = α_{1} \mathrm{Tr} (U U^{T} - U D Y (M X)^{T} - M X (D Y)^{T} U^{T} + M X (D Y)^{T} D Y (M X)^{T}) .

(16)

By setting the derivative of L (X) with respect to X to 0, we obtain the following equation,

- 2 α_{1} M^{T} U D Y + 2 α_{1} M^{T} M X {(D Y)}^{T} D Y = 0 .

(17)

By multiplying both sides of Equation (17) by X_{i j}, we obtain the following equation,

{(- 2 α_{1} M^{T} U D Y + 2 α_{1} M^{T} M X {(D Y)}^{T} D Y)}_{i j} X_{i j} = 0 .

(18)

The update rule for X obtained by applying the coordinate gradient descent algorithm is as follows,

X_{i j}^{\text{new}} \leftarrow X_{i j} \cdot \frac{(M^{T} U D Y)_{i j}}{(M^{T} M X (D Y)^{T} D Y)_{i j}} .

(19)
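A small numerical sketch can make the behaviour of Equation (19) concrete. The names `update_X` and `eps` and all toy matrices are assumptions. Two properties are checked on the toy data: if U is reproduced exactly by M X (DY)^T, the ratio equals 1 and X is left unchanged, and a step from a random non-negative X does not increase the residual of Equation (15).

```python
import numpy as np

def update_X(X, U, M, DY, eps=1e-12):
    """One multiplicative step for X following Equation (19).

    DY is the product D @ Y, held fixed; eps is an assumed safeguard.
    """
    numer = M.T @ U @ DY                 # negative part of the gradient in Equation (17)
    denom = M.T @ M @ X @ DY.T @ DY      # positive part of the gradient in Equation (17)
    return X * numer / (denom + eps)

# Toy data (assumed sizes): 5 miRNAs, 4 diseases, rank 3.
rng = np.random.default_rng(1)
M, D = rng.random((5, 5)), rng.random((4, 4))
X_true, Y = rng.random((5, 3)), rng.random((4, 3))
DY = D @ Y
U_exact = M @ X_true @ DY.T              # U that the projection reproduces exactly

X_same = update_X(X_true, U_exact, M, DY)    # fixed point: ratio is 1 everywhere

X0 = rng.random((5, 3))                  # arbitrary non-negative starting point
X1 = update_X(X0, U_exact, M, DY)
res_before = np.linalg.norm(U_exact - M @ X0 @ DY.T)
res_after = np.linalg.norm(U_exact - M @ X1 @ DY.T)
```

The same pattern applies to the Z update, with C in place of M.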

Y-subproblem. When U, X, and Z are fixed, the subproblem for solving Y is as follows,

\min_{Y \geq 0} L (Y) = α_{1} ‖ U - M X (D Y)^{T} ‖_{F}^{2} + α_{2} ‖ U - C Z (D Y)^{T} ‖_{F}^{2} .

(20)

We rewrote the Frobenius norms of the matrices in L (Y) in terms of matrix traces, which gives,

\begin{aligned} L (Y) & = α_{1} ‖ U - M X (D Y)^{T} ‖_{F}^{2} + α_{2} ‖ U - C Z (D Y)^{T} ‖_{F}^{2} \\ & = α_{1} \mathrm{Tr} (U U^{T} - U D Y (M X)^{T} - M X (D Y)^{T} U^{T} + M X (D Y)^{T} D Y X^{T} M^{T}) \\ & \quad + α_{2} \mathrm{Tr} (U U^{T} - U D Y (C Z)^{T} - C Z (D Y)^{T} U^{T} + C Z (D Y)^{T} D Y Z^{T} C^{T}) . \end{aligned}

(21)

By setting the derivative of L (Y) with respect to Y to 0, we obtain the following,

2 α_{1} D^{T} D Y {(M X)}^{T} M X - 2 α_{1} D^{T} U^{T} M X + 2 α_{2} D^{T} D Y {(C Z)}^{T} C Z - 2 α_{2} D^{T} U^{T} C Z = 0 .

(22)
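The left-hand side of Equation (22) can be checked numerically against a finite-difference gradient of L (Y). The sketch below is only a sanity check under assumed toy sizes; the helper names `loss_Y`, `grad_Y`, and `numeric_grad` are hypothetical.

```python
import numpy as np

def loss_Y(Y, U, MX, CZ, D, a1, a2):
    """L(Y) of Equation (20), with MX = M @ X and CZ = C @ Z held fixed."""
    DY = D @ Y
    return a1 * np.sum((U - MX @ DY.T) ** 2) + a2 * np.sum((U - CZ @ DY.T) ** 2)

def grad_Y(Y, U, MX, CZ, D, a1, a2):
    """Left-hand side of Equation (22), i.e. the gradient of L(Y)."""
    DY = D @ Y
    return (2*a1*D.T @ DY @ MX.T @ MX - 2*a1*D.T @ U.T @ MX
            + 2*a2*D.T @ DY @ CZ.T @ CZ - 2*a2*D.T @ U.T @ CZ)

def numeric_grad(f, Y, h=1e-6):
    """Central finite differences, one entry of Y at a time."""
    g = np.zeros_like(Y)
    for idx in np.ndindex(*Y.shape):
        step = np.zeros_like(Y)
        step[idx] = h
        g[idx] = (f(Y + step) - f(Y - step)) / (2 * h)
    return g

# Toy data (assumed sizes): 5 miRNAs, 4 diseases, rank 2.
rng = np.random.default_rng(2)
U = rng.random((5, 4))
MX, CZ = rng.random((5, 2)), rng.random((5, 2))
D, Y = rng.random((4, 4)), rng.random((4, 2))
a1, a2 = 0.5, 0.3                        # arbitrary weights for the check

analytic = grad_Y(Y, U, MX, CZ, D, a1, a2)
numeric = numeric_grad(lambda y: loss_Y(y, U, MX, CZ, D, a1, a2), Y)
max_gap = np.max(np.abs(analytic - numeric))
```

Since L (Y) is quadratic in Y, the central difference is exact up to rounding, so `max_gap` should be tiny if the four terms of Equation (22) carry the right signs and transposes.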

After both sides of Equation (22) are multiplied by Y_{i j}, we obtain the following equation,

(2 α_{1} D^{T} D Y {(M X)}^{T} M X - 2 α_{1} D^{T} U^{T} M X + 2 α_{2} D^{T} D Y {(C Z)}^{T} C Z - 2 α_{2} D^{T} U^{T} C Z)_{i j} Y_{i j} = 0 .

(23)

The update rule for Y obtained by applying the coordinate gradient descent algorithm is as follows,

Y_{i j}^{\text{new}} \leftarrow Y_{i j} \cdot \frac{(α_{1} D^{T} U^{T} M X + α_{2} D^{T} U^{T} C Z)_{i j}}{(α_{1} D^{T} D Y (M X)^{T} M X + α_{2} D^{T} D Y (C Z)^{T} C Z)_{i j}} .

(24)

Z-subproblem. When U, X, and Y are fixed, the subproblem for solving Z is as follows,

\min_{Z \geq 0} L (Z) = α_{2} ‖ U - C Z (D Y)^{T} ‖_{F}^{2} .

(25)

Similar to the process for solving the subproblems of U, X, and Y, L (Z) is first rewritten according to the properties of the matrix trace. The derivative is then taken with respect to Z. Finally, the gradient descent algorithm is applied to obtain the update rule for Z,

Z_{i j}^{\text{new}} \leftarrow Z_{i j} \cdot \frac{(C^{T} U D Y)_{i j}}{(C^{T} C Z (D Y)^{T} D Y)_{i j}} .

(26)

The iterative process terminates when the absolute difference of L (U, X, Y, Z) between two consecutive iterations is less than a threshold (ε = 10⁻⁶) or when the maximum number of iterations, 100, is reached. Finally, U_{i j} is regarded as the estimated association score between miRNA m_{i} and disease d_{j} (Figure 4).
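The four update rules and the stopping criterion can be combined into one loop. The following is a minimal sketch and not the authors' code: the function names `objective` and `fit`, the random non-negative initialisation, the update order U, X, Y, Z within an iteration, the constant `EPS`, the rank, the toy data, and the hyperparameter values are all assumptions. Only the threshold 10⁻⁶ and the limit of 100 iterations come from the text.

```python
import numpy as np

EPS = 1e-12  # assumption: small constant that keeps denominators positive

def objective(U, X, Y, Z, A, M, D, C, S, W, alphas):
    """Value of Equation (9) with L = W - S."""
    a1, a2, a3, a4 = alphas
    DY = D @ Y
    return (np.sum((U - A) ** 2)
            + a1 * np.sum((U - M @ X @ DY.T) ** 2)
            + a2 * np.sum((U - C @ Z @ DY.T) ** 2)
            + a3 * np.trace(U.T @ (W - S) @ U)
            + a4 * np.abs(U).sum())

def fit(A, M, D, C, S, alphas, rank=2, tol=1e-6, max_iter=100, seed=0):
    """Alternate Equations (14), (19), (24) and (26) until convergence."""
    a1, a2, a3, a4 = alphas
    rng = np.random.default_rng(seed)
    U = rng.random(A.shape)                  # random non-negative start (assumed)
    X = rng.random((M.shape[1], rank))
    Y = rng.random((D.shape[1], rank))
    Z = rng.random((C.shape[1], rank))
    W = np.diag(S.sum(axis=1))
    prev = objective(U, X, Y, Z, A, M, D, C, S, W, alphas)
    for n_iter in range(1, max_iter + 1):
        MX, CZ, DY = M @ X, C @ Z, D @ Y
        # Equation (14): update U with X, Y, Z fixed
        U = U * (2*A + 2*a1*MX @ DY.T + 2*a2*CZ @ DY.T + 2*a3*S @ U) / (
            (2 + 2*a1 + 2*a2) * U + 2*a3*W @ U + a4 + EPS)
        # Equation (19): update X
        X = X * (M.T @ U @ DY) / (M.T @ MX @ DY.T @ DY + EPS)
        MX = M @ X
        # Equation (24): update Y
        Y = Y * (a1*D.T @ U.T @ MX + a2*D.T @ U.T @ CZ) / (
            a1*D.T @ DY @ MX.T @ MX + a2*D.T @ DY @ CZ.T @ CZ + EPS)
        DY = D @ Y
        # Equation (26): update Z
        Z = Z * (C.T @ U @ DY) / (C.T @ CZ @ DY.T @ DY + EPS)
        cur = objective(U, X, Y, Z, A, M, D, C, S, W, alphas)
        if abs(prev - cur) < tol:            # stopping rule from the text
            break
        prev = cur
    return U, n_iter

# Toy data (assumed sizes): 6 miRNAs, 5 diseases, 3 attribute columns.
rng = np.random.default_rng(1)
A = (rng.random((6, 5)) > 0.6).astype(float)
M = rng.random((6, 6)); M = (M + M.T) / 2
D = rng.random((5, 5)); D = (D + D.T) / 2
C = (rng.random((6, 3)) > 0.5).astype(float)
S = (M > 0.6).astype(float)
np.fill_diagonal(S, 0)
scores, n_iter = fit(A, M, D, C, S, alphas=(0.5, 0.5, 0.1, 0.1))
```

Here `scores[i, j]` plays the role of the estimated association score U_ij; ranking the entries of a column in descending order would give the candidate list for the corresponding disease.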

4. Conclusions

In the current study, MDAPred, a new method based on nonnegative matrix factorization, was developed for predicting potential disease–miRNA candidates. MDAPred deeply integrates the projections of multiple kinds of connecting edges and the node attributes of miRNAs to enhance the detection of disease–miRNA associations. MDAPred also takes full advantage of information about the neighbours of miRNAs to capture the local topology of miRNAs. A sparse penalty was introduced to improve the performance of MDAPred. An iterative algorithm was developed to solve the resulting optimization problem. MDAPred was superior to the other tested methods in terms of both AUC and AUPR. Additionally, MDAPred is useful for biologists, as it retrieves more real disease–miRNA associations at the top of its ranking list. Case studies of three diseases confirmed the ability of MDAPred to identify potential candidates. Therefore, MDAPred can serve as a prioritization tool for screening candidate disease miRNAs to be validated through wet-lab experiments.

Supplementary Materials

The following are available online. Table S1: The top 50 candidates for lung cancer. Table S2: The top 50 candidates for pancreatic cancer. Table S3: The top 50 potential candidates for 341 diseases.

Author Contributions

P.X. and L.L. conceived the prediction method, and L.L. wrote the paper. Y.Z. and Y.S. developed the computer programs. P.X. and T.Z. analyzed the results and revised the paper.

Funding

The work was supported by the Natural Science Foundation of China (61972135), the Natural Science Foundation of Heilongjiang Province (LH2019F049, LH2019A029), the China Postdoctoral Science Foundation (2019M650069), the Heilongjiang Postdoctoral Scientific Research Starting Foundation (BHL-Q18104), the Fundamental Research Foundation of Universities in Heilongjiang Province for Technology Innovation (KJCX201805), the Fundamental Research Foundation of Universities in Heilongjiang Province for Youth Innovation Team (RCYJTD201805), and the Heilongjiang University Key Laboratory jointly built by Heilongjiang Province and the Ministry of Education (Heilongjiang University).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Meister, G.; Tuschl, T. Mechanisms of gene silencing by double-stranded RNA. Nature 2004, 431, 343–349.
2. Bartel, D.P. MicroRNAs: Genomics, biogenesis, mechanism, and function. Cell 2004, 116, 281–297.
3. Ambros, V.R. The functions of animal microRNAs. Nature 2004, 431, 350–355.
4. Ambros, V.R. microRNAs: Tiny Regulators with Great Potential. Cell 2001, 107, 823–826.
5. Xu, Y.; Guo, M.; Liu, X.; Wang, C.; Liu, Y.; Liu, G. Identify bilayer modules via pseudo-3D clustering: applications to miRNA-gene bilayer networks. Nucl. Acids Res. 2016, 44, e152.
6. Calin, G.A.; Croce, C.M. MicroRNA-Cancer Connection: The Beginning of a New Tale. Cancer Res. 2006, 66, 7390–7394.
7. Xu, P.; Guo, M.; Hay, B.A. MicroRNAs and the regulation of cell death. Trends Genet. 2004, 20, 617–624.
8. Cheng, A.M.; Byrom, M.; Shelton, J.; Ford, L.P. Antisense inhibition of human miRNAs and indications for an involvement of miRNA in cell growth and apoptosis. Nucl. Acids Res. 2005, 33, 1290–1297.
9. Fernando, T.R.; Rodriguezmalave, N.I.; Rao, D.S. MicroRNAs in B cell development and malignancy. J. Hematol. Oncol. 2012, 5, 7.
10. Lewis, B.P.; Shih, I.; Jonesrhoades, M.W.; Bartel, D.P.; Burge, C.B. Prediction of Mammalian MicroRNA Targets. Cell 2003, 115, 787–798.
11. Li, X.; Wang, Q.; Zheng, Y.; Lv, S.; Ning, S.; Sun, J.; Huang, T.; Zheng, Q.; Ren, H.; Xu, J. Prioritizing human cancer microRNAs based on genes’ functional consistency between microRNA and cancer. Nucl. Acids Res. 2011, 39, e153.
12. John, B.; Enright, A.J.; Aravin, A.A.; Tuschl, T.; Sander, C.; Marks, D.S. Human MicroRNA Targets. PLoS Biol. 2004, 2, e363.
13. Kertesz, M.; Iovino, N.; Unnerstall, U.; Gaul, U.; Segal, E. The role of site accessibility in microRNA target recognition. Nat. Genet. 2007, 39, 1278–1284.
14. Chen, X.; Huang, L. LRSSLMDA: Laplacian regularized sparse subspace learning for MiRNA-disease association prediction. PLoS Comput. Biol. 2017, 13, e1005912.
15. Chen, X.; Xie, D.; Zhao, Q.; You, Z.-H. MicroRNAs and complex diseases: From experimental results to computational models. Brief. Bioinform. 2019, 20, 515–539.
16. Chen, X.; Yan, C.C.; Zhang, X.; You, Z.-H.; Deng, L.; Liu, Y.; Zhang, Y.; Dai, Q. WBSMDA: Within and between score for MiRNA-disease association prediction. Sci. Rep. 2016, 6, 21106.
17. Li, J.-Q.; Rong, Z.-H.; Chen, X.; Yan, G.-Y.; You, Z.-H. MCMDA: Matrix completion for MiRNA-disease association prediction. Oncotarget 2017, 8, 21187.
18. Chen, X.; Wang, L.; Qu, J.; Guan, N.-N.; Li, J.-Q. Predicting miRNA–disease association based on inductive matrix completion. Bioinformatics 2018, 34, 4256–4265.
19. Wang, D.; Wang, J.; Lu, M.; Song, F.; Cui, Q. Inferring the human microRNA functional similarity and functional network based on microRNA-associated diseases. Bioinformatics 2010, 26, 1644–1650.
20. Chen, X.; Liu, M.; Yan, G. RWRMDA: Predicting novel human microRNA–disease associations. Mol. BioSyst. 2012, 8, 2792–2798.
21. Xuan, P.; Han, K.; Guo, Y.; Li, J.; Li, X.; Zhong, Y.; Zhang, Z.; Ding, J. Prediction of potential disease-associated microRNAs based on random walk. Bioinformatics 2015, 31, 1805–1815.
22. Chen, M.; Liao, B.; Li, Z. Global Similarity Method Based on a Two-tier Random Walk for the Prediction of microRNA–Disease Association. Sci. Rep. 2018, 8, 6481.
23. Zhong, Y.; Xuan, P.; Wang, X.; Zhang, T.; Li, J.; Liu, Y.; Zhang, W. A non-negative matrix factorization based method for predicting disease-associated miRNAs in miRNA-disease bilayer network. Bioinformatics 2018, 34, 267–277.
24. Tan, V.Y.F.; Fevotte, C. Automatic Relevance Determination in Nonnegative Matrix Factorization with the β-Divergence. IEEE Trans. Pattern Anal. Machine Intellig. 2013, 35, 1592–1605.
25. Chen, X.; Yin, J.; Qu, J.; Huang, L. MDHGI: Matrix Decomposition and Heterogeneous Graph Inference for miRNA-disease association prediction. PLoS Comput. Biol. 2018, 14, e1006418.
26. Chen, X.; Huang, L.; Xie, D.; Zhao, Q. EGBMMDA: Extreme gradient boosting machine for MiRNA-disease association prediction. Cell Death Disease 2018, 9, 3.
27. Zhao, Y.; Chen, X.; Yin, J. Adaptive boosting-based computational model for predicting potential miRNA-disease associations. Bioinformatics 2019, 1, 9.
28. Wang, L.; You, Z.-H.; Chen, X.; Li, Y.-M.; Dong, Y.-N.; Li, L.-P.; Zheng, K. LMTRDA: Using logistic model tree to predict MiRNA-disease associations by fusing multi-source information of sequences and similarities. PLoS Comput. Biol. 2019, 15, e1006865.
29. Chen, X.; Zhou, Z.; Zhao, Y. ELLPMDA: Ensemble learning and link prediction for miRNA-disease association prediction. RNA Biol. 2018, 15, 807–818.
30. You, Z.; Huang, Z.; Zhu, Z.; Yan, G.; Li, Z.; Wen, Z.; Chen, X. PBMDA: A novel and effective path-based computational model for miRNA-disease association prediction. PLoS Comput. Biol. 2017, 13, e1005455.
31. Xuan, P.; Shen, T.; Wang, X.; Zhang, T.; Zhang, W. Inferring disease-associated microRNAs in heterogeneous networks with node attributes. IEEE/ACM Trans. Comput. Biol. Bioinform. 2018, 20, 1.
32. Gardner, P.P.; Daub, J.; Tate, J.G.; Nawrocki, E.P.; Kolbe, D.L.; Lindgreen, S.; Wilkinson, A.C.; Finn, R.D.; Griffithsjones, S.; Eddy, S.R. Rfam: Updates to the RNA families database. Nucl. Acids Res. 2009, 37, 136–140.
33. Gu, C.; Liao, B.; Li, X.; Li, K. Network consistency projection for human miRNA-disease associations inference. Sci. Rep. 2016, 6, 36054.
34. Lu, M.; Zhang, Q.; Deng, M.; Miao, J.; Guo, Y.; Gao, W.; Cui, Q. An analysis of human microRNA and disease associations. PLoS ONE 2008, 3, e3420.
35. Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 2015, 10, e0118432.
36. Chen, X.; Xie, D.; Wang, L.; Zhao, Q.; You, Z.; Liu, H. BNPMDA: Bipartite Network Projection for MiRNA–Disease Association prediction. Bioinformatics 2018, 34, 3178–3186.
37. Liu, Y.; Zeng, X.; He, Z.; Zou, Q. Inferring MicroRNA-Disease Associations by Random Walk on a Heterogeneous Network with Multiple Data Sources. IEEE/ACM Trans. Comput. Biol. Bioinform. 2017, 14, 905–915.
38. Xie, B.; Ding, Q.; Han, H.; Wu, D. miRCancer: A microRNA–cancer association database constructed by text mining on literature. Bioinformatics 2013, 29, 638–644.
39. Yang, Z.; Ren, F.; Liu, C.; He, S.; Sun, G.; Gao, Q.; Yao, L.; Zhang, Y.; Miao, R.; Cao, Y. dbDEMC: A database of differentially expressed miRNAs in human cancers. BMC Genom. 2010, 11, 1–8.
40. Nygren, M.K.; Tekle, C.; Ingebrigtsen, V.; Mäkelä, R.; Krohn, M.; Aure, M.; Nunes-Xavier, C.; Perälä, M.; Tramm, T.; Alsner, J. Identifying microRNAs regulating B7-H3 in breast cancer: The clinical impact of microRNA-29c. Br. J. Cancer 2014, 110, 2072.
41. Cilek, E.E.; Ozturk, H.; Dedeoglu, B.G. Construction of miRNA-miRNA networks revealing the complexity of miRNA-mediated mechanisms in trastuzumab treated breast cancer cell lines. PLoS ONE 2017, 12, e0185558.
42. Chaluvally-Raghavan, P.; Zhang, F.; Pradeep, S.; Hamilton, M.P.; Zhao, X.; Rupaimoole, R.; Moss, T.; Lu, Y.; Yu, S.; Pecot, C.V. Copy number gain of hsa-miR-569 at 3q26.2 leads to loss of TP53INP1 and aggressiveness of epithelial cancers. Cancer Cell 2014, 26, 863–879.
43. Wang, X.; Jiang, D.; Xu, C.; Zhu, G.; Wu, Z.; Wu, Q. Differential expression profile analysis of miRNAs with HER-2 overexpression and intervention in breast cancer cells. Int. J. Clin. Exp. Pathol. 2017, 10, 5039–5062.
44. Chu, J.; Zhu, Y.; Liu, Y.; Sun, L.; Lv, X.; Wu, Y.; Hu, P.; Su, F.; Gong, C.; Song, E. E2F7 overexpression leads to tamoxifen resistance in breast cancer cells by competing with E2F1 at miR-15a/16 promoter. Oncotarget 2015, 6, 31944.
45. Ruepp, A.; Kowarsch, A.; Schmidl, D.; Buggenthin, F.; Brauner, B.; Dunger, I.; Fobo, G.; Frishman, G.; Montrone, C.; Theis, F.J. PhenomiR: A knowledgebase for microRNA expression in diseases and biological processes. Genome Biol. 2010, 11, 1–11.
46. Lee, T.; Shim, S.; Yu, U.; Park, H.O. Pharmaceutical Composition for Treating Cancer Comprising Microrna as Active Ingredient. WO 2016/137235, 1 September 2016.
47. Park, J.; Jeong, S.; Park, K.; Yang, K.; Shin, S. Expression profile of microRNAs following bone marrow-derived mesenchymal stem cell treatment in lipopolysaccharide-induced acute lung injury. Exp. Therap. Med. 2018, 15, 5495–5502.
48. Sun, C.; Li, S.; Zhang, F.; Xi, Y.; Wang, L.; Bi, Y.; Li, D. Long non-coding RNA NEAT1 promotes non-small cell lung cancer progression through regulation of miR-377-3p-E2F3 pathway. Oncotarget 2016, 7, 51784.
49. Koyama, N.; Ishikawa, Y.; Iwai, Y.; Aoshiba, K.; Nakamura, H.; Hagiwara, K. Mutual Negative Regulation of EZH2 and miR-4448 for Tumor Progression via Epithelial Mesenchymal Transition in Small Cell Lung Cancer; AACR: Atlanta, GA, USA, 2019.
50. Guo, H.; Chen, J.; Meng, F. Identification of novel diagnosis biomarkers for lung adenocarcinoma from the cancer genome atlas. Orig. Artic 2016, 9, 7908–7918.
51. Zhou, R.; Zhou, X.; Yin, Z.; Guo, J.; Hu, T.; Jiang, S.; Liu, L.; Dong, X.; Zhang, S.; Wu, G. Tumor invasion and metastasis regulated by microRNA-184 and microRNA-574-5p in small-cell lung cancer. Oncotarget 2015, 6, 44609.
52. Li, Y.; Qiu, C.; Tu, J.; Geng, B.; Yang, J.; Jiang, T.; Cui, Q. HMDD v2.0: A database for experimentally supported human microRNA and disease associations. Nucl. Acids Res. 2014, 42, 1070–1074.
53. Hoehndorf, R.; Schofield, P.N.; Gkoutos, G.V. Analysis of the human diseasome using phenotype similarity between common, genetic, and infectious diseases. Sci. Rep. 2015, 5, 10888.
54. Kozomara, A.; Birgaoanu, M.; Griffiths-Jones, S. miRBase: From microRNA sequences to function. Nucl. Acids Res. 2018, 47, D155–D162.
55. Xuan, P.; Han, K.; Guo, M.; Guo, Y.; Li, J.; Ding, J.; Liu, Y.; Dai, Q.; Li, J.; Teng, Z. Prediction of microRNAs Associated with Human Diseases Based on Weighted k Most Similar Neighbors. PLoS ONE 2013, 8, e70204.

Figure 1. Receiver operating characteristic (ROC) and precision-recall (PR) curves of MDAPred and the other five methods. (A) ROC curves (B) PR curves.

Figure 2. Recall rates of 15 diseases under different top k.

Figure 3. Multiple data representations of miRNAs and diseases: (a) calculate miRNA similarities through miRNA–associated diseases, (b) calculate the similarities of disease by combining disease semantic similarities and disease phenotypic similarities, (c) establish association matrix A based on known associations between miRNAs and diseases, and (d) create a representation matrix of miRNA families and clusters.

Figure 4. Iterative algorithm for estimation of the miRNA–disease association scores.

Table 1. Areas under the ROC curves (AUCs) of MDAPred and other methods on 15 diseases.

Disease Name	MDAPred	DMPred	PBMDA	GSTRW	Liu’s Method	BNPMDA
Breast neoplasms	0.986	0.974	0.906	0.837	0.920	0.902
Hepatocellular carcinoma	0.982	0.931	0.910	0.791	0.929	0.900
Glioma	0.957	0.855	0.882	0.786	0.914	0.843
Acute myeloid leukemia	0.979	0.963	0.885	0.796	0.910	0.865
Lung neoplasms	0.964	0.944	0.862	0.813	0.906	0.855
Melanoma	0.978	0.910	0.849	0.758	0.893	0.839
Osteosarcoma	0.968	0.985	0.860	0.771	0.897	0.859
Ovarian neoplasms	0.970	0.967	0.888	0.844	0.918	0.877
Pancreatic neoplasms	0.956	0.821	0.879	0.833	0.902	0.870
Alzheimer Disease	0.968	0.958	0.833	0.816	0.875	0.830
Carcinoma, Renal Cell	0.964	0.894	0.856	0.784	0.900	0.854
Diabetes Mellitus, Type 2	0.964	0.936	0.870	0.870	0.905	0.869
Glioblastoma	0.938	0.951	0.849	0.759	0.889	0.843
Heart failure	0.962	0.959	0.884	0.814	0.909	0.882
Atherosclerosis	0.962	0.955	0.891	0.822	0.910	0.876
Average AUC	0.964	0.933	0.873	0.806	0.904	0.839

The bold values indicate the higher AUCs.

Table 2. AUPRs of MDAPred and other methods on 15 diseases.

Disease Name	MDAPred	DMPred	PBMDA	GSTRW	Liu’s Method	BNPMDA
Breast neoplasms	0.818	0.800	0.718	0.389	0.725	0.566
Hepatocellular carcinoma	0.816	0.715	0.767	0.483	0.749	0.676
Glioma	0.613	0.175	0.390	0.224	0.436	0.386
Acute myeloid leukemia	0.544	0.466	0.386	0.122	0.408	0.324
Lung neoplasms	0.686	0.620	0.561	0.370	0.596	0.542
Melanoma	0.689	0.366	0.482	0.205	0.524	0.491
Osteosarcoma	0.601	0.620	0.356	0.181	0.373	0.327
Ovarian neoplasms	0.714	0.366	0.529	0.400	0.236	0.496
Pancreatic neoplasms	0.692	0.569	0.457	0.333	0.556	0.478
Alzheimer Disease	0.522	0.351	0.136	0.086	0.485	0.220
Carcinoma, Renal Cell	0.481	0.206	0.314	0.135	0.143	0.299
Diabetes Mellitus, Type 2	0.549	0.398	0.259	0.132	0.356	0.268
Glioblastoma	0.533	0.284	0.346	0.161	0.303	0.336
Heart failure	0.599	0.393	0.301	0.134	0.348	0.300
Atherosclerosis	0.315	0.309	0.304	0.084	0.297	0.218
Average AUPR	0.603	0.500	0.436	0.233	0.463	0.359

The bold values indicate the higher AUPRs.

Table 3. Comparison of different methods based on AUCs and AUPRs with a paired t-test.

p-Value between MDAPred and Other Methods	DMPred	PBMDA	GSTRW	Liu’s Method	BNPMDA
p-values of ROC curves	2.4983 × 10⁻⁴¹	3.2311 × 10⁻⁵	6.3212 × 10⁻¹⁶	6.9812 × 10⁻⁸	2.9742 × 10⁻⁶
p-values of PR curves	2.2341 × 10⁻³⁵	1.8643 × 10⁻⁶	1.6542 × 10⁻⁶	3.4521 × 10⁻⁵	8.8432 × 10⁻⁴

Table 4. The top 50 breast cancer-related candidates.

Rank	MiRNA name	Evidence	Rank	MiRNA name	Evidence
1	hsa-mir-186	dbDEMC, PhenomiR	26	hsa-mir-885	literature [40]
2	hsa-mir-99b	dbDEMC, PhenomiR	27	hsa-mir-6838	Unconfirmed
3	hsa-mir-483	PhenomiR	28	hsa-mir-323a	dbDEMC, PhenomiR
4	hsa-mir-4480	literature [41]	29	hsa-mir-1244	dbDEMC
5	hsa-mir-181d	dbDEMC, PhenomiR, miRCancer	30	hsa-mir-361	PhenomiR, miRCancer
6	hsa-mir-28	dbDEMC, PhenomiR	31	hsa-mir-216a	dbDEMC, PhenomiR, miRCancer
7	hsa-mir-455	PhenomiR, miRCancer	32	hsa-mir-136	dbDEMC, PhenomiR
8	hsa-mir-154	dbDEMC, PhenomiR, miRCancer	33	hsa-mir-569	literature [42]
9	hsa-mir-330	dbDEMC, PhenomiR, miRCancer	34	hsa-mir-336	dbDEMC
10	hsa-mir-454	dbDEMC, PhenomiR	35	hsa-mir-325	dbDEMC, PhenomiR
11	hsa-mir-181	dbDEMC, PhenomiR, miRCancer	36	hsa-mir-571	dbDEMC
12	hsa-mir-208b	dbDEMC, PhenomiR	37	hsa-mir-95	dbDEMC, PhenomiR
13	hsa-mir-663	dbDEMC, PhenomiR	38	hsa-mir-517b	dbDEMC, PhenomiR, miRCancer
14	hsa-mir-133	dbDEMC, PhenomiR, miRCancer	39	hsa-mir-323	dbDEMC, PhenomiR
15	hsa-mir-30	dbDEMC, PhenomiR, miRCancer	40	hsa-mir-633	dbDEMC
16	hsa-mir-504	dbDEMC	41	hsa-mir-1183	dbDEMC
17	hsa-mir-543	dbDEMC	42	hsa-mir-4454	literature [43]
18	hsa-mir-217	dbDEMC, PhenomiR, miRCancer	43	hsa-mir-705	dbDEMC
19	hsa-mir-33	dbDEMC, PhenomiR, miRCancer	44	hsa-mir-532	dbDEMC, PhenomiR
20	hsa-mir-211	dbDEMC, PhenomiR, miRCancer	45	hsa-mir-126a	dbDEMC, miRCancer
21	hsa-mir-449b	dbDEMC, PhenomiR, miRCancer	46	hsa-mir-1909	dbDEMC
22	hsa-mir-362	miRCancer	47	hsa-mir-539	dbDEMC, PhenomiR, miRCancer
23	hsa-mir-208	dbDEMC, PhenomiR	48	hsa-mir-520f	PhenomiR, miRCancer
24	hsa-mir-433	dbDEMC, PhenomiR, miRCancer	49	hsa-mir-498	miRCancer
25	hsa-mir-520e	dbDEMC, PhenomiR, miRCancer	50	hsa-mir-3135b	literature [44]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

