Abstract:
Retrosynthetic analysis works backward from a target product molecule to identify commercially available precursor molecules from which it can be synthesized. Existing single-step retrosynthesis methods are limited in generalization ability and prediction accuracy. To address these limitations, this paper proposes an ensemble prediction model that integrates molecular fingerprints with graph neural networks, which also addresses the limited scalability and interpretability of single models. The method builds multidimensional molecular representations and dynamically selects the best prediction strategy, improving the accuracy and stability of retrosynthetic prediction. Molecular structural features are extracted with message-passing neural networks (MPNN), global self-attention, and Attentive FP networks. Extended-connectivity fingerprints (ECFP) are added so that local structural information and global molecular features are modeled jointly. This yields two fusion architectures: MPNN+FPS and Attentive FP+FPS. Extensive experiments on the USPTO-50K and natural-product datasets validate the effectiveness of the method. The proposed ensemble approach improves both prediction accuracy and generalization ability, offering a new technical route for the retrosynthetic analysis of complex molecules and natural products.
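The fusion and selection ideas summarized above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes an untrained, sum-aggregation message-passing step in place of a learned MPNN or Attentive FP encoder, and a toy graph format of atom feature vectors plus an edge list. The ECFP bits are passed in directly, whereas in practice they would come from a cheminformatics toolkit such as RDKit's Morgan fingerprints. The "dynamic selection" rule shown is also an assumption: it keeps the model whose top-ranked candidate scores highest.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy molecular graph: per-atom feature vectors plus undirected edges.
Graph = Tuple[List[List[float]], List[Tuple[int, int]]]


def message_passing_readout(graph: Graph, steps: int = 2) -> List[float]:
    """Untrained stand-in for an MPNN: at each step every atom adds the sum of
    its neighbours' states (synchronous update), then atoms are mean-pooled
    into a single graph-level vector."""
    states, edges = graph
    states = [list(s) for s in states]
    for _ in range(steps):
        messages = [[0.0] * len(s) for s in states]
        for u, v in edges:
            for k in range(len(states[u])):
                messages[u][k] += states[v][k]
                messages[v][k] += states[u][k]
        states = [[a + m for a, m in zip(st, msg)] for st, msg in zip(states, messages)]
    n = len(states)
    return [sum(col) / n for col in zip(*states)]


def fuse(graph_vec: Sequence[float], ecfp_bits: Sequence[int]) -> List[float]:
    """Joint representation (assumed to be simple concatenation): the graph
    embedding followed by the fingerprint's substructure bits."""
    return list(graph_vec) + [float(b) for b in ecfp_bits]


def select_prediction(
    predictions: Dict[str, List[Tuple[str, float]]]
) -> Tuple[str, str]:
    """Hypothetical selection rule: each model supplies a ranked, non-empty list
    of (reactant SMILES, score); return the model whose top candidate scores
    highest, together with that candidate."""
    best_model = max(predictions, key=lambda m: predictions[m][0][1])
    return best_model, predictions[best_model][0][0]


if __name__ == "__main__":
    toy = ([[1.0, 0.0], [0.0, 1.0]], [(0, 1)])
    print(fuse(message_passing_readout(toy), [1, 0, 1]))
    # Made-up scores, for illustration only.
    print(select_prediction({
        "MPNN+FPS": [("CCO.CC(=O)Cl", 0.62)],
        "Attentive FP+FPS": [("CCO.CC(=O)O", 0.81)],
    }))
```

Concatenation is the simplest way to let a downstream predictor see both the learned global embedding and the local substructure bits. Learned gating or attention over the two parts would be natural alternatives.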