Abstract:
Optical chemical structure recognition (OCSR) from scientific publications is essential for rediscovering chemical structures reported in the literature and is of great significance for drug research and natural product research. Existing OCSR methods suffer from problems such as low recognition rates. To improve recognition performance, this paper proposes DeepOCSR, a deep learning method for optical chemical structure recognition. Built on an encoder-decoder architecture, the method uses Transformer and ResNeSt models to convert chemical structure images from publications into SMILES sequences. To train and validate the proposed method, two novel chemical structure datasets are constructed, one of which contains substituents common in the chemical literature. The proposed method is compared with existing deep learning approaches, and the experimental results show that it is superior to them on key indicators such as similarity and effectiveness.