Abstract:
The rapid development of 3D application domains such as game development, virtual reality (VR), and digital twins has created an urgent demand for the automated generation of high-quality, large-scale, continuous virtual terrain scenes. Traditional manual modeling is laborious and requires expertise, while procedural generation techniques, although more efficient, still rely on graphical interfaces and specialized parameter tuning and struggle to capture a user's semantic intent flexibly. This paper proposes an automated, text-driven continuous terrain generation method based on Large Language Models (LLMs) that drives the generation process directly through natural language interaction, enabling the efficient construction of continuous virtual terrain scenes. The core of the method is a multi-stage task decomposition strategy: the terrain generation pipeline is broken down into seven sequential steps that integrate prompt engineering with procedural generation techniques to establish an end-to-end mapping from text descriptions to large-scale 3D continuous terrain. To address the inherent limitations of LLMs in complex tasks, such as multi-hop reasoning deficits, output randomness, and context loss, the method introduces cross-stage key information delivery, error retry, and historical log learning, which improve system reliability and consistency. A prototype system developed in the Unity engine validates the effectiveness of the method. Evaluation on four typical terrain themes shows that the full system achieves an average error rate of 12.0%, approximately 19 to 47 percentage points lower than that of ablated systems in which key mechanisms were removed. In terms of efficiency, generating a single preview terrain chunk (a 1024 × 1024 vertex mesh) takes approximately 42.14 s on average, with manageable resource consumption. Comparative experiments further indicate that the generated virtual terrain meets the expected requirements in terms of generation efficiency, visual consistency, and scalability. The method improves generation efficiency and usability and provides a viable solution for non-experts to rapidly construct high-quality, large-scale 3D terrains using natural language.
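
To make the three reliability mechanisms named above more concrete, the following is a minimal Python sketch of how a staged pipeline might combine them. It is an illustration under stated assumptions, not the paper's implementation: the prototype itself runs in Unity, the abstract does not name the seven steps, and every identifier here (`Stage`, `run_pipeline`, `make_flaky_llm`, and the stage labels `theme` and `heightmap_params`) is hypothetical. The LLM is replaced by a stub so that the example runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stage:
    name: str  # hypothetical stage label; the abstract does not list the real steps
    build_prompt: Callable[[Dict[str, str], List[str]], str]
    validate: Callable[[str], bool]


@dataclass
class RunResult:
    key_info: Dict[str, str] = field(default_factory=dict)  # carried across stages
    log: List[str] = field(default_factory=list)            # history of attempts


def run_pipeline(stages: List[Stage], llm: Callable[[str], str],
                 user_text: str, max_retries: int = 3) -> RunResult:
    result = RunResult(key_info={"user_text": user_text})
    for stage in stages:
        for attempt in range(1, max_retries + 1):
            # Cross-stage delivery: each prompt sees the key outputs so far.
            # Log learning: each prompt also sees the history of past attempts.
            prompt = stage.build_prompt(result.key_info, result.log)
            output = llm(prompt)
            if stage.validate(output):
                result.key_info[stage.name] = output
                result.log.append(f"{stage.name}: ok on attempt {attempt}")
                break
            # Error retry: record the rejected output and try again.
            result.log.append(f"{stage.name}: rejected attempt {attempt}: {output!r}")
        else:
            raise RuntimeError(f"stage {stage.name!r} failed after {max_retries} attempts")
    return result


def make_flaky_llm() -> Callable[[str], str]:
    """Stub standing in for an LLM; its first reply is deliberately malformed."""
    calls = {"n": 0}

    def llm(prompt: str) -> str:
        calls["n"] += 1
        return "???" if calls["n"] == 1 else f"answer#{calls['n']}"

    return llm


def prompt_with_context(stage_name: str) -> Callable[[Dict[str, str], List[str]], str]:
    def build(key_info: Dict[str, str], log: List[str]) -> str:
        context = "; ".join(f"{k}={v}" for k, v in key_info.items())
        failures = [line for line in log if "rejected" in line]
        return f"[{stage_name}] context: {context} | past failures: {failures}"

    return build


stages = [
    Stage(name, prompt_with_context(name), lambda out: out.startswith("answer"))
    for name in ("theme", "heightmap_params")
]
result = run_pipeline(stages, make_flaky_llm(), "snowy mountain range")
```

In this sketch the first stage fails validation once and succeeds on the retry, and the second stage receives the first stage's accepted output through `key_info`. A real system would replace the stub with an actual LLM call and the prefix check with validation of the structured output that each stage is expected to produce.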