Abstract:
Non-contact deception detection technology has significant applications in areas such as the judicial system. To improve the accuracy of deception detection, this paper proposes a multi-modal fusion based deception detection model that completes the evaluation task by processing only the video and audio signals of the subject while speaking. Heart rate may reflect the emotional changes of a liar. We extract heart-rate features via photoplethysmography and a fully connected network, and extract video and text features through 3D convolutional neural networks and Word2Vec+CNN, respectively. All extracted features are fused, and a linear support vector machine classifies the fused features. Simulation experiments are carried out on the open-source real-life trial dataset. Compared with the latest MLP H+C multi-modal model, the proposed deception detection model improves accuracy by 2.74%–23% in the three-mode setting. To evaluate whether combining features from different modalities improves performance, we conduct experiments with each combination of modality features. The accuracy of every combination exceeds 70%, and the combination of text and heart-rate features reaches 96.89%. In particular, the combination of text, video, and heart rate achieves the highest accuracy, 98.88%, with an AUC of 0.9883. Three-mode prediction outperforms single-mode and dual-mode prediction. The experimental results show that the proposed multi-modal model effectively improves the accuracy of deception detection.
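The fusion-and-classification stage described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the 3D-CNN, Word2Vec+CNN, and photoplethysmography feature extractors are stubbed out with synthetic vectors, the feature dimensions are assumptions, and the linear SVM is trained here by subgradient descent on the regularized hinge loss rather than a library solver.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_features(n, dim, offset):
    """Stand-in for a per-modality feature extractor (dimensions are illustrative)."""
    return rng.normal(loc=offset, scale=1.0, size=(n, dim))

def fuse(*feature_blocks):
    """Feature-level fusion: concatenate per-modality feature vectors."""
    return np.concatenate(feature_blocks, axis=1)

def train_linear_svm(X, y, lam=1e-3, lr=0.1, epochs=200):
    """Linear SVM trained by subgradient descent on the regularized hinge loss."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        mask = margins < 1  # samples violating the margin
        if mask.any():
            grad_w = lam * w - (y[mask, None] * X[mask]).mean(axis=0)
            grad_b = -y[mask].mean()
        else:
            grad_w = lam * w
            grad_b = 0.0
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

n = 100
# Truthful (+1) vs. deceptive (-1) samples for each modality (synthetic data)
video = np.vstack([make_features(n, 64, 0.3), make_features(n, 64, -0.3)])
text  = np.vstack([make_features(n, 32, 0.3), make_features(n, 32, -0.3)])
hr    = np.vstack([make_features(n, 8, 0.3),  make_features(n, 8, -0.3)])
y = np.concatenate([np.ones(n), -np.ones(n)])

X = fuse(video, text, hr)          # concatenated three-mode feature matrix
w, b = train_linear_svm(X, y)
acc = (np.sign(X @ w + b) == y).mean()
print(f"training accuracy: {acc:.2f}")
```

In the paper's setting, the synthetic `make_features` calls would be replaced by the real per-modality extractors; the fusion step and linear-SVM classifier are the part the sketch is meant to show.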