
    Citation: YANG Yaxin, WANG Jingde, SUN Wei. Screening of Antitumor Drug Based on the Combination of Featured Structure Description[J]. Journal of East China University of Science and Technology, 2023, 49(6): 907-914. DOI: 10.14135/j.cnki.1006-3080.20220908001

    Screening of Antitumor Drug Based on the Combination of Featured Structure Description

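    • Summary: A total of 200 anticancer drugs and 10940 non-anticancer drugs were collected, and balanced datasets were obtained by weighting and by under-sampling. To find, among the many available structural fingerprints and descriptors, a short combination that contributes most to anticancer drug screening, two correlation-based feature selection methods were used to simplify the fingerprints or descriptors, and the selected features were combined with a decision tree to screen anticancer drugs. For each of the three descriptor types, the selection produced a combination of 10 structural descriptors that most effectively identifies anticancer drugs. Among these, the 10-bit MACCS fingerprint obtained after feature selection performed best, identifying 81% of the anticancer drugs, which indicates that the two correlation-based feature selection methods effectively improved anticancer drug screening.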

       

      Abstract: To screen anticancer drugs accurately and quickly using a small number of effective structural descriptors, this work applies correlation-based feature selection methods to enhance the ability of molecular fingerprints and descriptors to describe the structure of anticancer drugs. Drug information is collected from DrugBank and PubChem, two databases of chemical compounds, with each antitumor drug labeled 1 and each non-antitumor drug labeled 0. The resulting unbalanced dataset of 200 antitumor drugs and 10940 non-antitumor drugs is cleaned. Weighting coefficients and under-sampling are used to deal with the imbalance and obtain two different balanced datasets. RDKit molecular descriptors, MACCS fingerprints, and Mordred molecular descriptors are calculated to describe the structural information of the drugs. Correlation feature selection methods are employed to reduce the redundancy among these fingerprints and descriptors: the Pearson correlation coefficient and the chi-squared ($\chi^2$) test, each combined with a decision tree, are used to simplify the structural descriptors and to select the combination of featured structures with the best screening performance. The results show that feature selection enhances the identification of antitumor drugs. The combination of 10 selected MACCS fingerprint bits performs best, identifying about 81% of the antitumor drugs, and the structural combination most indicative of anticancer activity is thereby selected. In conclusion, these feature selection methods can effectively simplify molecular fingerprints and descriptors and improve the screening of antitumor drugs.
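The under-sampling and chi-squared selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`undersample`, `chi2_score`, `select_top_k`) and the four-bit toy "fingerprints" are hypothetical, the data are synthetic, and the decision tree step is left out. The sketch assumes binary fingerprint bits and scores each bit with the uncorrected chi-squared statistic of its 2x2 contingency table against the class label.

```python
import random

def undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in keep], [y[i] for i in keep]

def chi2_score(bits, y):
    """Chi-squared statistic of the 2x2 table (bit value vs. class label)."""
    n = len(y)
    a = sum(1 for b, l in zip(bits, y) if b == 1 and l == 1)  # bit on, label 1
    b_ = sum(1 for b, l in zip(bits, y) if b == 1 and l == 0)  # bit on, label 0
    c = sum(1 for b, l in zip(bits, y) if b == 0 and l == 1)  # bit off, label 1
    d = n - a - b_ - c                                         # bit off, label 0
    denom = (a + b_) * (c + d) * (a + c) * (b_ + d)
    if denom == 0:  # constant bit or single-class data carries no information
        return 0.0
    return n * (a * d - b_ * c) ** 2 / denom

def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring bits and all scores."""
    scores = [chi2_score([row[j] for row in X], y) for j in range(len(X[0]))]
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k]), scores

# Synthetic toy data (hypothetical): 4 "antitumor" rows, 12 "non-antitumor" rows.
# Bit 0 matches the label exactly, bit 1 is constant, bits 2 and 3 are noisier.
X = [[1, 1, i % 2, 0] for i in range(4)]
X += [[0, 1, i % 2, 1 if i % 3 else 0] for i in range(12)]
y = [1] * 4 + [0] * 12

X_bal, y_bal = undersample(X, y, seed=0)
selected, scores = select_top_k(X_bal, y_bal, k=1)
print("balanced class counts:", y_bal.count(1), y_bal.count(0))
print("selected bit indices:", selected)
```

For binary bits, this statistic equals the number of samples times the squared Pearson correlation between the bit and the label, so on data of this kind the two criteria produce the same ranking of individual bits.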

       
