Encoded DNA Molecules Identification Based on the Improved Generative Adversarial Network

Sui Xuejie; Wang Huifeng; Yan Bingyong

doi:10.14135/j.cnki.1006-3080.20191216001

Abstract

Nanopore single-molecule detection studies individual molecules by capturing the changes in ionic current produced as a molecule passes through a nanopore. However, because a nanopore captures different molecules at different rates, the collected single-molecule datasets are imbalanced, which lowers the accuracy of molecule identification. Based on the blockade events of encoded DNA molecules, this paper builds a model on the deep convolutional generative adversarial network (DCGAN) framework to expand the minority-class samples and thereby balance the nanopore dataset. QuipuNet is then trained and evaluated on the dataset before and after balancing. The results show that, after the dataset is balanced with DCGAN, the recognition accuracy of the trained QuipuNet for some “100”-encoded molecules improves by 14%, and its average recognition accuracy is higher than that obtained with the other dataset-expansion methods. This verifies that expanding encoded DNA molecule data with DCGAN to balance the dataset can effectively improve the trained model's recognition accuracy on real signals.
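The balancing step described in the abstract can be illustrated with a minimal sketch of the bookkeeping involved: given the class labels of the collected events, compute how many synthetic samples each minority class would need in order to match the largest class. The function name `synthetic_counts` and the label counts below are hypothetical and chosen only for illustration; they are not the paper's data, and the sketch does not cover the DCGAN generator itself, which would supply the synthetic events.

```python
from collections import Counter


def synthetic_counts(labels):
    """Return, per class, how many synthetic samples are needed
    to bring that class up to the size of the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Hypothetical label counts for three barcode classes (illustrative only).
labels = ["000"] * 5 + ["100"] * 2 + ["111"] * 4
deficits = synthetic_counts(labels)
print(deficits)  # {'000': 0, '100': 3, '111': 1}
```

In a pipeline of this kind, each non-zero deficit would be filled with events drawn from a generator trained on that class, and the classifier would then be trained on the combined real and synthetic set.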
