Abstract:
Semantic segmentation aims to assign a class label to each pixel in an image and has a wide range of applications. It requires large numbers of high-quality labels, which are costly and labor-intensive to obtain. Furthermore, a semantic segmentation model trained on one domain often fails to generalize to other domains, which is a key obstacle to its practical application. Unsupervised pixel-level intra-domain adaptation for semantic segmentation has proven to be an effective way to address this problem. However, this method cannot effectively exploit spatial location information and is adversely affected by noisy pseudo-labels. In this work, we propose a confidence-guided multi-level domain adaptation approach to overcome these limitations. Specifically, we design a multi-level domain adaptation framework that simultaneously reduces pixel-level differences and aligns the spatial location information of images. Moreover, to prevent overfitting to noisy pseudo-labels from degrading the performance of the segmentation network, we construct a confidence loss function to constrain network training. We also propose a pseudo-label selection method that acquires higher-quality pseudo-labels than existing methods. We demonstrate the effectiveness of our approach through synthetic-to-real adaptation experiments. Compared with unsupervised pixel-level intra-domain adaptation for semantic segmentation, our method achieves 6.5% and 2.8% relative improvements in mean intersection-over-union on the "GTA5 to Cityscapes" and "SYNTHIA to Cityscapes" tasks, respectively.