Abstract:
Gas cylinders are common laboratory equipment. They are used in large numbers, their hazards are easily concealed, and accidents involving them can cause severe harm, so cylinder supervision is essential to laboratory safety management. Video monitoring is an effective safety-management tool, but the footage must be watched by dedicated staff whose skills vary, so there is no guarantee that dangerous information in the video frames will be recognized. This paper therefore proposes an image caption generation method for laboratory gas-cylinder scenes that combines object detection with text recognition. The method identifies potential hazards in the cylinder scene and alerts monitoring personnel in the form of text. First, features of the scene objects and of the text printed on the cylinder bodies are extracted and mapped into a multimodal embedding space. Then, a Transformer is used to generate the captions. Finally, the generated caption is used to judge whether the scene is dangerous. Experimental results show that the captions generated by this method can effectively identify the dangerous substances in laboratory cylinder scenes and the causes of the danger.
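The abstract does not specify how the final step, judging danger from a caption, is carried out. The sketch below shows one simple way such a step could work, assuming keyword matching over the generated caption. The vocabularies `HAZARDOUS_GASES` and `UNSAFE_CONDITIONS`, the `judge_caption` function, and the rule "hazardous gas plus unsafe condition means dangerous" are all hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass

# Hypothetical vocabularies: the paper does not publish its rule set.
HAZARDOUS_GASES = {"hydrogen", "acetylene", "chlorine", "ammonia", "carbon monoxide"}
UNSAFE_CONDITIONS = {"not fixed", "near fire", "lying down", "no chain"}


@dataclass
class Verdict:
    dangerous: bool
    substances: list[str]  # hazardous gases mentioned in the caption
    causes: list[str]      # unsafe conditions mentioned in the caption


def judge_caption(caption: str) -> Verdict:
    """Assumed rule: a scene is dangerous when the caption names a hazardous
    gas together with an unsafe condition (plain substring matching)."""
    text = caption.lower()
    substances = sorted(g for g in HAZARDOUS_GASES if g in text)
    causes = sorted(c for c in UNSAFE_CONDITIONS if c in text)
    return Verdict(dangerous=bool(substances and causes),
                   substances=substances, causes=causes)


if __name__ == "__main__":
    print(judge_caption("A hydrogen cylinder is not fixed beside the bench."))
    print(judge_caption("A nitrogen cylinder is chained to the wall."))
```

Because the verdict also returns the matched substances and causes, a warning message to monitoring staff could name both. A learned classifier over the caption would be a natural alternative to fixed keyword lists.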