Semantic segmentation of test papers based on subspace multi-scale feature fusion
-
Abstract: Separating printed and handwritten regions is a key step in the semantic segmentation of test papers. To improve segmentation performance, an improved attention algorithm based on the Mask R-CNN network is proposed. The algorithm embeds a Subspace Multiscale Feature Fusion (SMFF) module into the feature pyramid structure of Mask R-CNN. The SMFF module computes attention features over channel subspaces, which reduces spatial and channel redundancy in the feature maps, while its multi-scale feature fusion effectively extracts features from text regions of different sizes and strengthens the correlations between them. Experimental results on a test-paper image dataset show that Mask R-CNN equipped with the SMFF module improves average precision over the original Mask R-CNN by 15.8% on object detection and by 10.2% on semantic segmentation, and also clearly outperforms Mask R-CNN variants built on commonly used attention modules.
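The mechanism described above lends itself to a short sketch. Below is a minimal PyTorch rendering of the SMFF idea, assuming ULSAM-style subspace attention [16] (channels split into groups, one spatial attention map per group) followed by multi-scale fusion with parallel depthwise convolutions; the group count, kernel sizes, and residual connections are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of subspace attention + multi-scale feature fusion (SMFF).
# Hyperparameters (groups=4, kernels 1/3/5, residual adds) are assumptions.
import torch
from torch import nn

class SubspaceAttention(nn.Module):
    """Split channels into subspaces; give each its own spatial attention map."""
    def __init__(self, channels, groups=4):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        sub = channels // groups
        # One small conv per subspace predicts a single-channel attention map.
        self.att = nn.ModuleList(
            nn.Conv2d(sub, 1, kernel_size=3, padding=1) for _ in range(groups)
        )

    def forward(self, x):
        out = []
        for sub, conv in zip(torch.chunk(x, self.groups, dim=1), self.att):
            b, c, h, w = sub.shape
            # Softmax over spatial positions yields a normalized attention map.
            a = conv(sub).flatten(2).softmax(dim=-1).view(b, 1, h, w)
            out.append(sub + sub * a)  # residual keeps the original signal
        return torch.cat(out, dim=1)

class SMFF(nn.Module):
    """Subspace attention followed by multi-scale feature fusion."""
    def __init__(self, channels, groups=4, kernels=(1, 3, 5)):
        super().__init__()
        self.subspace = SubspaceAttention(channels, groups)
        # Parallel depthwise convs see text regions at different scales.
        self.scales = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernels
        )
        self.fuse = nn.Conv2d(channels * len(kernels), channels, kernel_size=1)

    def forward(self, x):
        x = self.subspace(x)
        return x + self.fuse(torch.cat([s(x) for s in self.scales], dim=1))
```

Because `SMFF(256)` maps a feature map to the same shape (`SMFF(256)(torch.randn(1, 256, 64, 64))` returns a 1×256×64×64 tensor), it can sit between the feature pyramid and the detection heads without changing anything else in the network.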
-
Table 1. Comparison of the effects of attention modules (AP values in %; MR = Mask R-CNN baseline; reg = object detection, seg = instance segmentation)

| Method | reg AP | reg AP50 | reg AP75 | reg APm | reg APl | seg AP | seg AP50 | seg AP75 | seg APm | seg APl | Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MR | 57.9 | 89.8 | 66.4 | 40.7 | 55.0 | 49.4 | 90.6 | 41.1 | 44.4 | 46.7 | 6.654 |
| MR-CC | 65.6 | 93.9 | 77.7 | 45.9 | 64.4 | 53.3 | 93.8 | 46.9 | 47.3 | 51.4 | 6.765 |
| MR-SE | 67.6 | 94.9 | 79.6 | 48.4 | 66.4 | 55.8 | 96.1 | 53.0 | 56.4 | 53.8 | 6.656 |
| MR-sc | 68.2 | 95.7 | 82.1 | 47.8 | 67.9 | 55.4 | 96.5 | 49.5 | 49.1 | 53.8 | 6.656 |
| MR-ECA | 68.8 | 96.4 | 82.6 | 48.1 | 68.9 | 56.2 | 96.7 | 52.0 | 52.0 | 55.1 | 6.655 |
| MR-UM | 70.3 | 96.2 | 85.2 | 54.7 | 70.0 | 56.5 | 96.1 | 52.8 | 55.6 | 55.0 | 9.588 |
| SMFF | 73.7 | 97.5 | 89.0 | 55.2 | 74.7 | 59.6 | 97.8 | 61.3 | 54.9 | 58.9 | 8.709 |
Table 2. Comparison of the effects of feature-fusion modules (same metrics as Table 1)

| Method | reg AP | reg AP50 | reg AP75 | reg APm | reg APl | seg AP | seg AP50 | seg AP75 | seg APm | seg APl | Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MR | 57.9 | 89.8 | 66.4 | 40.7 | 55.0 | 49.4 | 90.6 | 41.1 | 44.4 | 46.7 | 6.654 |
| MR-DW | 59.4 | 89.6 | 70.2 | 42.2 | 57.3 | 49.8 | 90.9 | 42.1 | 43.1 | 47.5 | 8.042 |
| MR-Mix | 65.0 | 93.8 | 78.5 | 44.8 | 63.7 | 52.4 | 94.0 | 44.7 | 46.6 | 49.7 | 7.857 |
| MR-PS | 64.6 | 93.1 | 77.0 | 44.2 | 63.3 | 53.5 | 93.3 | 47.3 | 47.1 | 51.0 | 8.118 |
| SMFF | 73.7 | 97.5 | 89.0 | 55.2 | 74.7 | 59.6 | 97.8 | 61.3 | 54.9 | 58.9 | 8.709 |
Table 3. Ablation experiment on the SMFF module (√ marks the component enabled on the MR backbone)

| Backbone | Subspace | Feature fusion | reg AP | reg AP50 | reg AP75 | reg APm | reg APl | seg AP | seg AP50 | seg AP75 | seg APm | seg APl | Time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MR | | | 57.9 | 89.8 | 66.4 | 40.7 | 55.0 | 49.4 | 90.6 | 41.1 | 44.4 | 46.7 | 6.654 |
| MR | √ | | 70.0 | 95.8 | 84.5 | 52.8 | 70.9 | 56.4 | 96.5 | 54.3 | 58.1 | 55.7 | 7.619 |
| MR | √ | √ | 73.7 | 97.5 | 89.0 | 55.2 | 74.7 | 59.6 | 97.8 | 61.3 | 54.9 | 58.9 | 8.709 |
-
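For an ablation like Table 3, the module has to be attached to every pyramid level. The wrapper below is a hedged sketch using torchvision's Mask R-CNN (torchvision ≥ 0.13); the level keys '0'-'3' and 'pool' match torchvision's default FPN, `SMFF` is the class sketched above, and `num_classes=3` (background, printed text, handwritten text) is an assumption about the dataset rather than a detail from the paper.

```python
# Hedged sketch: refine each FPN level with an SMFF block before the
# RPN and ROI heads of torchvision's Mask R-CNN.
from collections import OrderedDict
import torchvision
from torch import nn

class SMFFBackbone(nn.Module):
    def __init__(self, backbone, levels=('0', '1', '2', '3', 'pool')):
        super().__init__()
        self.backbone = backbone
        self.out_channels = backbone.out_channels  # read by the detection heads
        # One SMFF block per pyramid level (keys follow torchvision's FPN).
        self.smff = nn.ModuleDict({k: SMFF(backbone.out_channels) for k in levels})

    def forward(self, x):
        feats = self.backbone(x)  # OrderedDict: level name -> feature map
        return OrderedDict((k, self.smff[k](v)) for k, v in feats.items())

# num_classes=3 is an assumption: background, printed text, handwritten text.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=None, num_classes=3)
model.backbone = SMFFBackbone(model.backbone)
```

Returning directly after `self.subspace` in `SMFF.forward` would correspond to the subspace-only setting in the middle row of Table 3; the full module corresponds to the last row.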
[1] ZHENG Y, LI H, DOERMANN D. Machine printed text and handwriting identification in noisy document images[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, 26(3): 337-353. doi: 10.1109/TPAMI.2004.1262324
[2] DING H, ZHANG X F. Discrimination of touching handwritten and printed text in non-uniformly illuminated images[J]. Computer Engineering and Design, 2012, 33(12): 4634-4638. (in Chinese) doi: 10.3969/j.issn.1000-7024.2012.12.042
[3] KAVALLIERATOU E, STAMATATOS S. Discrimination of machine-printed from handwritten text using simple structural characteristics[C]//Proceedings of the 17th International Conference on Pattern Recognition (ICPR 2004). IEEE, 2004, 1: 437-440.
[4] SHIRDHONKAR M S, KOKARE M B. Discrimination between printed and handwritten text in documents[J]. International Journal of Computer Applications, 2010, RTIPPR(3): 131-134.
[5] KOYAMA J, HIROSE A, KATO M. Local-spectrum-based distinction between handwritten and machine-printed characters[C]//2008 15th IEEE International Conference on Image Processing. IEEE, 2008: 1021-1024.
[6] GARLAPATI B M, CHALAMALA S R. A system for handwritten and printed text classification[C]//2017 UKSim-AMSS 19th International Conference on Computer Modelling & Simulation (UKSim). IEEE, 2017: 50-54.
[7] PENG X, SETLUR S, GOVINDARAJU V, et al. Handwritten text separation from annotated machine printed documents using Markov random fields[J]. International Journal on Document Analysis and Recognition (IJDAR), 2013, 16(1): 1-16. doi: 10.1007/s10032-011-0179-z
[8] LIN Q, XIA J F, TU Z Z, et al. Classification of handwritten and printed text based on frame features and Viterbi decoding[J]. Laser & Optoelectronics Progress, 2019, 56(6): 123-129. (in Chinese)
[9] LONG J, SHELHAMER E, DARRELL T. Fully convolutional networks for semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015: 3431-3440.
[10] RONNEBERGER O, FISCHER P, BROX T. U-Net: Convolutional networks for biomedical image segmentation[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2015: 234-241.
[11] CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[J]. arXiv preprint arXiv:1706.05587, 2017.
[12] CHEN L C, ZHU Y, PAPANDREOU G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 801-818.
[13] NIRKIN Y, WOLF L, HASSNER T. HyperSeg: Patch-wise hypernetwork for real-time semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 4061-4070.
[14] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7132-7141.
[15] WANG Q, WU B, ZHU P, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2020.
[16] SAINI R, JHA N K, DAS B, et al. ULSAM: Ultra-lightweight subspace attention module for compact convolutional neural networks[C]//Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. 2020: 1627-1636.
[17] ROY A G, NAVAB N, WACHINGER C. Concurrent spatial and channel 'squeeze & excitation' in fully convolutional networks[C]//International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2018: 421-429.
[18] HE K, GKIOXARI G, DOLLÁR P, et al. Mask R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2961-2969.
[19] GIRSHICK R. Fast R-CNN[C]//IEEE International Conference on Computer Vision (ICCV). 2015: 1440-1448.
[20] HUANG Z, WANG X, HUANG L, et al. CCNet: Criss-cross attention for semantic segmentation[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 603-612.
[21] LI D, YAO A, CHEN Q. PSConv: Squeezing feature pyramid into one compact poly-scale convolutional layer[C]//Computer Vision - ECCV 2020: 16th European Conference, Glasgow, UK, August 23-28, 2020, Proceedings, Part XXI. Springer International Publishing, 2020: 615-632.
[22] CHOLLET F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1251-1258.
[23] TAN M, LE Q V. MixConv: Mixed depthwise convolutional kernels[J]. arXiv preprint arXiv:1907.09595, 2019.
-