Abstract:
Accurate identification of enhancer-promoter interactions (EPIs) is important for tracing the sources of disease and developing gene therapies. Existing EPI prediction methods mainly extract sequence features at a single level and pay little attention to fusing multi-level features of enhancer and promoter sequences. By introducing fine-grained and coarse-grained views of the sequences, an EPI prediction model based on a parallel bidirectional gating unit attention network, EPI-PBGA, is proposed to extract features at different levels and explore the complementarity between them. Through two sub-networks, the hierarchical bidirectional gating unit attention (TBGA) sub-network and the convolutional neural network (CNN) sub-network, EPI-PBGA learns the multi-granularity features of sequences separately. Because EPIs are ubiquitously cell-specific, the optimal coarse grain size is determined individually for each cell line by a sequence segmentation strategy. At the coarse granularity, TBGA processes the component subsequences through a component-to-global progressive strategy and obtains multiple component-level feature vectors, so that the model can capture potential association information between these vectors, including promoter-promoter, enhancer-enhancer, and enhancer-promoter association information that is often ignored. Moreover, a CNN with relatively few filters is still applied at the fine granularity, because CNNs have shown good feature extraction performance in previous studies. Multi-granularity information is obtained by fusing the high-dimensional features extracted by the two sub-networks. Together, the CNN and TBGA sub-networks enable the model not only to explore the complementarity between features of different granularity, but also to address the feature loss caused by sequence segmentation. The experimental results show that EPI-PBGA can effectively combine information of different granularity.
Compared with previous EPI prediction methods, EPI-PBGA performs better on the datasets of six cell lines and can effectively predict EPIs.
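
As a rough illustration of the coarse-grained view, the sequence segmentation step can be sketched as splitting a sequence into component subsequences of a chosen grain size. The abstract does not specify the segmentation details, so this sketch assumes consecutive, non-overlapping windows; the names `segment_sequence`, `one_hot`, and `grain_size` are hypothetical and not taken from the EPI-PBGA implementation.

```python
BASES = "ACGT"


def segment_sequence(seq, grain_size):
    """Split seq into consecutive, non-overlapping component subsequences.

    Assumption: fixed-length windows; the last component may be shorter
    when len(seq) is not a multiple of grain_size.
    """
    if grain_size <= 0:
        raise ValueError("grain_size must be positive")
    return [seq[i:i + grain_size] for i in range(0, len(seq), grain_size)]


def one_hot(subseq):
    """One-hot encode a subsequence over A, C, G, T.

    Any other symbol (for example N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in subseq]


# Toy enhancer fragment; real inputs would be far longer.
enhancer = "ACGTACGTAC"
components = segment_sequence(enhancer, grain_size=4)
encoded = [one_hot(component) for component in components]
```

In a setup like the one described above, the grain size would be chosen separately for each cell line, for example by comparing validation performance over a few candidate values.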