
    LI Xue, SHI Jinxue, WANG Huiqing, YAN Aoyu, WANG Sen. Transcription Factor Binding Site Prediction Model Based on MCSP and Swin Transformer[J]. Journal of East China University of Science and Technology. DOI: 10.14135/j.cnki.1006-3080.20241104001

    Transcription Factor Binding Site Prediction Model Based on MCSP and Swin Transformer

    Predicting transcription factor binding sites (TFBS) helps identify the specific regulatory mechanisms of cells and tissues and is crucial for understanding how gene expression is regulated. Existing methods combine DNA sequence and shape information for TFBS prediction. However, they typically derive shape information only from neighboring nucleotides and neglect the influence of longer flanking regions. In the sequence branch, these methods ignore the complementarity of features across channels. In the shape branch, they do not adequately capture the local correlations and long-range dependencies of shape information. This shallow exploitation of both sequence and shape information limits prediction performance. To address these issues, this paper proposes MSSW, a novel model for predicting transcription factor binding sites. First, Deep DNAshape is used to generate shape information that accounts for long flanking regions, giving the shape branch a more comprehensive set of shape features. Second, a Swin Transformer extracts features from the shape information. It captures local correlations through window-based self-attention and long-range dependencies through shifted windows. Third, multi-scale convolution and split attention (MCSP) extract multi-scale cross-channel features from the sequence. The sequence and shape features are then fused to predict transcription factor binding sites. MSSW is evaluated on 165 ChIP-seq datasets. The experimental results show that it outperforms existing TFBS prediction models, and ablation studies validate the effectiveness of MCSP and the Swin Transformer. The model's generalization is further verified across different cell lines, providing insights for predicting TFBS in various cellular contexts. MSSW achieves strong predictive performance on datasets of different scales and performs particularly well on medium-sized and small datasets.
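The abstract gives no implementation details, so the following is only a minimal sketch, under stated assumptions, of the Swin-style mechanism it describes. It operates on a 1D sequence of per-position shape vectors. Self-attention inside fixed windows captures local correlations. A second pass over windows shifted by half a window then lets positions on either side of a window boundary attend to each other, so stacking such blocks propagates information over longer ranges.

The function names, the window size of 4 and the toy two-dimensional "shape" vectors are hypothetical. The sketch also omits learned projections, multiple heads, relative position bias, LayerNorm, the MLP, residual connections and Swin's attention mask. It is therefore not the MSSW implementation.

```python
import math


def window_partition(seq, window):
    """Split a sequence of per-position vectors into non-overlapping windows."""
    if len(seq) % window:
        raise ValueError("sequence length must be a multiple of the window size")
    return [seq[i:i + window] for i in range(0, len(seq), window)]


def roll(seq, shift):
    """Cyclically shift a sequence left by `shift` positions (negative shifts right)."""
    shift %= len(seq)
    return seq[shift:] + seq[:shift]


def window_self_attention(window):
    """Scaled dot-product self-attention restricted to one window.

    Simplification (assumption): queries, keys and values are the raw input
    vectors, with no learned projections, heads or position bias.
    """
    d = len(window[0])
    out = []
    for q in window:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in window]
        m = max(scores)                       # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]   # softmax over positions in this window
        out.append([sum(w * v[j] for w, v in zip(weights, window)) for j in range(d)])
    return out


def attend_in_windows(seq, window):
    """Run self-attention independently in each window, then flatten back."""
    return [vec for win in window_partition(seq, window)
            for vec in window_self_attention(win)]


def swin_block_1d(seq, window):
    """One regular-window pass followed by one shifted-window pass.

    Unlike Swin, no mask is applied, so the wrapped-around last shifted
    window mixes positions from both ends of the sequence.
    """
    shift = window // 2
    x = attend_in_windows(seq, window)   # regular windows: local correlations
    x = roll(x, shift)                   # shift the window grid by half a window
    x = attend_in_windows(x, window)     # shifted windows bridge neighbouring windows
    return roll(x, -shift)               # undo the shift to restore original positions


positions = list(range(8))
print(window_partition(positions, 4))           # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(window_partition(roll(positions, 2), 4))  # [[2, 3, 4, 5], [6, 7, 0, 1]]

# Hypothetical per-position shape vectors (two toy features per nucleotide).
shape = [[0.1 * i, 1.0 - 0.1 * i] for i in range(8)]
features = swin_block_1d(shape, 4)
print(len(features), len(features[0]))          # 8 2
```

With a window size of 4, positions 3 and 4 fall in different regular windows but share a shifted window. This is how the shifted pass connects adjacent windows while each attention computation stays local.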