A Hybrid Linear and Nonlinear Network-based Method on Multi-source MiRNA-Disease Association Prediction
Abstract: MicroRNAs (miRNAs) are small, single-stranded non-coding RNAs that are closely related to human diseases. Predicting miRNA-disease associations helps explain the pathogenesis of diseases at the molecular level and so provides a basis for studying their prognosis, diagnosis, evaluation and treatment. Most existing prediction methods take only miRNA functional similarity and disease semantic similarity as input and ignore miRNA sequence similarity, disease functional similarity and hamming similarity. During feature extraction they also do not consider the complementary information carried by linear and nonlinear features, which lowers the quality of the extracted miRNA and disease features. We therefore propose a novel miRNA-disease association prediction model, GCNMSF. First, we introduce miRNA sequence similarity, disease functional similarity and hamming similarity, and use the similarity kernel fusion method to integrate the multi-source similarities of miRNAs and of diseases separately. Then, we use a graph convolutional network (GCN) to learn nonlinear features, with a convolutional attention block embedded into the GCN to optimize the feature distribution. In parallel, non-negative matrix factorization is used to learn linear features of miRNAs and diseases, which enriches the feature space and improves the ability to predict miRNA-disease associations. Finally, we fuse the linear and nonlinear features of miRNAs and diseases to predict miRNA-disease associations. We evaluate GCNMSF with five-fold cross validation, and the experimental results show that it outperforms existing miRNA-disease association prediction methods.
In addition, we conduct ablation experiments and case studies to evaluate the effectiveness and applicability of the model. The ablation results verify that both the fusion of multi-source similarity information and the combination of linear and nonlinear features help miRNA-disease association prediction. The case studies on lung cancer and breast cancer further confirm that GCNMSF can not only predict potential miRNA-disease associations but also discover miRNA-disease associations for unknown diseases.
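The combination of linear and nonlinear features described in the abstract can be illustrated with a minimal sketch. It is not the GCNMSF implementation: the toy matrices, the feature ranks, the untrained random GCN weight and the use of plain concatenation as the fusion step are all assumptions made for illustration, and the convolutional attention block is omitted. The names `nmf` and `gcn_layer` are hypothetical helpers.

```python
import numpy as np

def nmf(A, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix A (n x m) as W @ H using
    Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def gcn_layer(S, X, weight):
    """One graph-convolution step: ReLU(D^-1/2 (S + I) D^-1/2 @ X @ weight)."""
    S_hat = S + np.eye(S.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(S_hat.sum(axis=1))  # degrees are > 0 thanks to self-loops
    S_norm = S_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(S_norm @ X @ weight, 0.0)

# Toy data (hypothetical): 6 miRNAs x 4 diseases and a fused miRNA similarity matrix.
rng = np.random.default_rng(42)
A = (rng.random((6, 4)) > 0.5).astype(float)
S_m = rng.random((6, 6))
S_m = (S_m + S_m.T) / 2                            # symmetric, non-negative

W, H = nmf(A, rank=2)                              # rows of W: linear miRNA features
gcn_weight = rng.standard_normal((4, 3))           # untrained, for shape illustration only
nonlinear_m = gcn_layer(S_m, A, gcn_weight)        # nonlinear miRNA features
fused_m = np.concatenate([W, nonlinear_m], axis=1) # assumed fusion: concatenation
print(fused_m.shape)  # (6, 5)
```

Disease features would be obtained the same way from the transposed association matrix and the disease similarity matrix. In a trained model the GCN weight would be learned rather than sampled.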
Table 1. Multi-source similarities of miRNA and disease

| Similarity | Symbol | Database | Dimension |
|---|---|---|---|
| miRNA functional similarity | $ {K}_{m,1} $ | MISIM | 495×495 |
| miRNA sequence similarity | $ {K}_{m,2} $ | miRBase | 495×495 |
| Disease semantic similarity | $ {K}_{d,1} $ | MeSH | 383×383 |
| Disease functional similarity | $ {K}_{d,2} $ | HumanNet | 383×383 |
| miRNA hamming similarity | $ {K}_{m,3} $ | - | 495×495 |
| Disease hamming similarity | $ {K}_{d,3} $ | - | 383×383 |
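The hamming similarities $ {K}_{m,3} $ and $ {K}_{d,3} $ have no source database, which suggests they are derived from the known association matrix itself. The sketch below uses one common definition, the fraction of positions at which two binary association profiles agree. The source does not state its exact formula, so this definition, the helper name `hamming_similarity` and the toy matrix are assumptions.

```python
def hamming_similarity(profiles):
    """Pairwise similarity of binary profiles: 1 - (Hamming distance / profile length)."""
    n = len(profiles)
    length = len(profiles[0])
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            mismatches = sum(a != b for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = sim[j][i] = 1.0 - mismatches / length
    return sim

# Toy association matrix (hypothetical): 3 miRNAs (rows) x 4 diseases (columns).
A = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]

sim_mirna = hamming_similarity(A)                             # compares rows
sim_disease = hamming_similarity([list(c) for c in zip(*A)])  # compares columns

print(sim_mirna[0][1])    # 0.75
print(sim_disease[1][3])  # 1.0
```

With the dimensions in Table 1, the same function applied to a 495×383 association matrix would give a 495×495 miRNA matrix and a 383×383 disease matrix.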
Table 2. Ablation experiments with different similarity combinations
| Similarity combination | AUC | AUPR | F1 score |
|---|---|---|---|
| MSS+DSS | 0.9314 | 0.9324 | 0.8572 |
| MFS+DSS | 0.9322 | 0.9324 | 0.8594 |
| MSS+DFS | 0.9381 | 0.9383 | 0.8693 |
| MFS+DFS | 0.9364 | 0.9374 | 0.8659 |
| MSS+DFS+DFS+DSS | 0.9394 | 0.9391 | 0.8708 |
| MSS+DFS+DFS+DSS+HMS | 0.9452 | 0.9470 | 0.8748 |
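For readers unfamiliar with the three metrics in Table 2, the following sketch computes them from predicted scores on a toy example. It assumes standard definitions: AUC as the probability that a positive pair outranks a negative pair, AUPR approximated by average precision, and F1 at a 0.5 threshold. The threshold and the exact AUPR computation used in the paper are not stated here, and the labels and scores are made up. Tied scores and label sets without positives or negatives are not handled.

```python
def auc_score(labels, scores):
    """AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values measured at each positive, in descending score order."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / hits

def f1_score(labels, scores, threshold=0.5):
    """F1 of the binary predictions obtained by thresholding the scores."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    return 2 * tp / (2 * tp + fp + fn)

# Toy predictions (hypothetical): 1 = known association, 0 = unlabeled pair.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(round(auc_score(labels, scores), 4))          # 0.8889
print(round(average_precision(labels, scores), 4))  # 0.9167
print(round(f1_score(labels, scores), 4))           # 0.8571
```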
Table 3. The top 50 predicted miRNAs associated with lung cancer
| Rank | miRNA | Database | Rank | miRNA | Database |
|---|---|---|---|---|---|
| 1 | hsa-mir-16 | dbDEMC3; miR2Disease | 26 | hsa-mir-328 | dbDEMC3 |
| 2 | hsa-mir-195 | dbDEMC3; miR2Disease | 27 | hsa-mir-148b | dbDEMC3 |
| 3 | hsa-mir-106b | dbDEMC3 | 28 | hsa-mir-23b | dbDEMC3 |
| 4 | hsa-mir-193b | dbDEMC3 | 29 | hsa-mir-99a | dbDEMC3; miR2Disease |
| 5 | hsa-mir-141 | dbDEMC3; miR2Disease | 30 | hsa-mir-196b | dbDEMC3 |
| 6 | hsa-mir-15a | dbDEMC3 | 31 | hsa-mir-302a | dbDEMC3 |
| 7 | hsa-mir-302b | dbDEMC3 | 32 | hsa-mir-452 | dbDEMC3 |
| 8 | hsa-mir-451a | dbDEMC3 | 33 | hsa-mir-122 | dbDEMC3 |
| 9 | hsa-mir-429 | dbDEMC3; miR2Disease | 34 | hsa-mir-520a | dbDEMC3 |
| 10 | hsa-mir-378a | unconfirmed | 35 | hsa-mir-152 | dbDEMC3 |
| 11 | hsa-mir-342 | dbDEMC3 | 36 | hsa-mir-194 | dbDEMC3 |
| 12 | hsa-mir-296 | unconfirmed | 37 | hsa-mir-215 | dbDEMC3; |
| 13 | hsa-mir-320a | dbDEMC3 | 38 | hsa-mir-92b | dbDEMC3 |
| 14 | hsa-mir-151a | unconfirmed | 39 | hsa-mir-376c | dbDEMC3 |
| 15 | hsa-mir-204 | dbDEMC3; miR2Disease | 40 | hsa-mir-520d | dbDEMC3 |
| 16 | hsa-mir-302c | dbDEMC3 | 41 | hsa-mir-367 | dbDEMC3 |
| 17 | hsa-mir-149 | dbDEMC3 | 42 | hsa-mir-708 | dbDEMC3 |
| 18 | hsa-mir-130a | dbDEMC3; miR2Disease | 43 | hsa-mir-345 | dbDEMC3; miR2Disease |
| 19 | hsa-mir-625 | dbDEMC3 | 44 | hsa-mir-423 | dbDEMC3; miR2Disease |
| 20 | hsa-mir-15b | dbDEMC3 | 45 | hsa-mir-520c | unconfirmed |
| 21 | hsa-mir-20b | dbDEMC3 | 46 | hsa-mir-650 | dbDEMC3; miR2Disease |
| 22 | hsa-mir-10a | dbDEMC3 | 47 | hsa-mir-130b | dbDEMC3 |
| 23 | hsa-mir-129 | dbDEMC3 | 48 | hsa-mir-302d | dbDEMC3 |
| 24 | hsa-mir-373 | dbDEMC3 | 49 | hsa-mir-449b | dbDEMC3 |
| 25 | hsa-mir-139 | dbDEMC3; miR2Disease | 50 | hsa-mir-520b | dbDEMC3; miR2Disease |
Table 4. The top 50 predicted miRNAs associated with breast cancer
| Rank | miRNA | Database | Rank | miRNA | Database |
|---|---|---|---|---|---|
| 1 | hsa-mir-21 | dbDEMC3; miR2Disease | 26 | hsa-mir-182 | dbDEMC3; miR2Disease |
| 2 | hsa-mir-155 | dbDEMC3; miR2Disease | 27 | hsa-let-7c | dbDEMC3 |
| 3 | hsa-mir-17 | dbDEMC3; miR2Disease | 28 | hsa-mir-223 | dbDEMC3 |
| 4 | hsa-mir-145 | dbDEMC3; miR2Disease | 29 | hsa-mir-210 | dbDEMC3; miR2Disease |
| 5 | hsa-mir-34a | dbDEMC3 | 30 | hsa-mir-19b | dbDEMC3 |
| 6 | hsa-mir-125b | dbDEMC3; miR2Disease | 31 | hsa-mir-27a | dbDEMC3; miR2Disease |
| 7 | hsa-mir-20a | dbDEMC3; miR2Disease | 32 | hsa-mir-124 | dbDEMC3; miR2Disease |
| 8 | hsa-mir-146a | dbDEMC3; miR2Disease | 33 | hsa-mir-200a | dbDEMC3; miR2Disease |
| 9 | hsa-mir-126 | dbDEMC3; miR2Disease | 34 | hsa-mir-199a | dbDEMC3 |
| 10 | hsa-let-7a | dbDEMC3; miR2Disease | 35 | hsa-mir-146b | miR2Disease |
| 11 | hsa-mir-221 | dbDEMC3; miR2Disease | 36 | hsa-let-7e | dbDEMC3 |
| 12 | hsa-mir-92a | dbDEMC3 | 37 | hsa-let-7g | dbDEMC3 |
| 13 | hsa-mir-200c | dbDEMC3; miR2Disease | 38 | hsa-mir-10b | dbDEMC3; miR2Disease |
| 14 | hsa-mir-16 | dbDEMC3 | 39 | hsa-mir-30a | dbDEMC3 |
| 15 | hsa-mir-18a | dbDEMC3; miR2Disease | 40 | hsa-mir-101 | dbDEMC3 |
| 16 | hsa-mir-143 | dbDEMC3; miR2Disease | 41 | hsa-mir-29b | dbDEMC3; miR2Disease |
| 17 | hsa-mir-31 | dbDEMC3; miR2Disease | 42 | hsa-mir-181b | dbDEMC3; miR2Disease |
| 18 | hsa-mir-200b | dbDEMC3; miR2Disease | 43 | hsa-mir-196a | dbDEMC3; miR2Disease |
| 19 | hsa-mir-1 | dbDEMC3 | 44 | hsa-mir-148a | dbDEMC3; miR2Disease |
| 20 | hsa-mir-34c | dbDEMC3 | 45 | hsa-mir-379 | dbDEMC3 |
| 21 | hsa-mir-222 | dbDEMC3; miR2Disease | 46 | hsa-mir-183 | dbDEMC3 |
| 22 | hsa-mir-19a | dbDEMC3 | 47 | hsa-mir-15a | dbDEMC3 |
| 23 | hsa-mir-29a | dbDEMC3 | 48 | hsa-mir-1469 | dbDEMC3 |
| 24 | hsa-mir-218 | dbDEMC3 | 49 | hsa-mir-34b | dbDEMC3 |
| 25 | hsa-let-7b | dbDEMC3 | 50 | hsa-mir-195 | dbDEMC3; miR2Disease |
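The case-study tables list the highest-scoring candidate miRNAs for a disease together with the databases that support each one. A minimal sketch of how such a ranking could be assembled is given below. The helper names, the placeholder miRNA identifiers, the scores and the database contents are all hypothetical, and the paper's actual case-study procedure may differ.

```python
def top_k_candidates(scores, known, k):
    """Rank miRNAs that are not already known for the disease by descending score."""
    candidates = [(m, s) for m, s in scores.items() if m not in known]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [m for m, _ in candidates[:k]]

def evidence(mirna, databases):
    """List the validation databases containing this miRNA, or 'unconfirmed'."""
    found = [name for name, members in databases.items() if mirna in members]
    return "; ".join(found) if found else "unconfirmed"

# Hypothetical toy inputs: predicted scores for one disease, its known miRNAs,
# and the miRNAs each validation database reports for that disease.
scores = {"mir-a": 0.97, "mir-b": 0.91, "mir-c": 0.88, "mir-d": 0.40, "mir-e": 0.95}
known = {"mir-e"}
databases = {"dbDEMC3": {"mir-a", "mir-b"}, "miR2Disease": {"mir-a"}}

for rank, mirna in enumerate(top_k_candidates(scores, known, k=3), start=1):
    print(rank, mirna, evidence(mirna, databases))
# 1 mir-a dbDEMC3; miR2Disease
# 2 mir-b dbDEMC3
# 3 mir-c unconfirmed
```

For a disease treated as unknown, `known` would be empty, so every miRNA is a candidate.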