Abstract:
Three-dimensional (3D) reconstruction driven by brain signals is an important research direction in Brain-Computer Interfaces (BCIs). Existing studies rely mainly on functional magnetic resonance imaging (fMRI), which offers high spatial resolution but is costly and poorly suited to real-time use. Electroencephalography (EEG), by contrast, is low-cost, non-invasive, and has high temporal resolution, making it better suited to practical interactive applications. However, the low signal-to-noise ratio and limited spatial resolution of EEG make it difficult to reconstruct complex 3D scenes directly from neural signals. To address this issue, this paper proposes Mind2Matter, a novel EEG-driven 3D scene reconstruction framework built as a two-stage pipeline. In the first stage, a hierarchical EEG encoder extracts discriminative spatiotemporal features from EEG signals, and a large language model (LLM) then generates semantic text descriptions through cross-modal semantic alignment between EEG features and visual embeddings. In the second stage, the generated text descriptions condition a layout-constrained 3D Gaussian Splatting framework that reconstructs semantically consistent 3D scenes. Experimental results on the EEG-Image dataset demonstrate that the proposed method achieves superior performance in both semantic understanding and geometric reconstruction. In the EEG-to-text generation task, it obtains a ROUGE-1 F-score of 34.21%, a BLEU-4 score of 7.62%, and a BERTScore F-score of 37.19%. In the 3D reconstruction task, it achieves a CLIP Similarity score of 0.701 while reducing the Chamfer Distance (CD) and Earth Mover’s Distance (EMD) to 4.66 and 10.93, respectively, outperforming existing methods. These results verify the feasibility of EEG-driven 3D reconstruction and provide a new solution for brain-computer interaction and virtual reality applications.
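
For readers unfamiliar with the geometric metrics reported above, the following is a minimal sketch of a symmetric Chamfer Distance between two 3D point sets. The abstract does not specify which CD variant, point sampling, or scaling produced the reported value, so the squared-distance formulation and the names `sq_dist` and `chamfer_distance` are illustrative assumptions, not the paper's evaluation code.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between two non-empty point sets.

    Assumed variant: mean squared nearest-neighbour distance from a to b,
    plus the same quantity from b to a. Other papers use unsquared
    distances or rescale the result.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")

    def mean_nearest_sq(src, dst):
        # Average, over src, of the squared distance to the closest point in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return mean_nearest_sq(a, b) + mean_nearest_sq(b, a)


# Toy example: two small point sets that differ in one point.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(reference, reconstructed))  # 1.0
```

Lower values indicate closer geometric agreement. This brute-force version compares every pair of points, so it is only practical for small point sets.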