A Text-Driven Continuous Terrain Generation Method for Virtual Scenes Based on Large Language Models

王唯佳; 陆元军; 吉斌; 李建华; 郭卫斌

doi:10.14135/j.cnki.1006-3080.20251221001

Abstract

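Game development, virtual reality (VR), digital twins and other 3D technologies are advancing quickly, and the need to generate high-quality, large-scale, continuous virtual terrain scenes automatically is becoming urgent. Traditional manual, interactive modeling is tedious and demands specialist expertise. Procedural generation improves efficiency, but it still depends on graphical interfaces and the tuning of specialized parameters, so it cannot flexibly capture the user's semantic intent. This paper proposes an automated, text-driven method for generating continuous virtual terrain based on a large language model (LLM). In this method, natural language directly drives the generation process so that continuous virtual terrain scenes can be built efficiently. The method adopts a multi-stage task decomposition strategy that splits the terrain generation workflow into seven steps. It combines prompt engineering with procedural generation to build an end-to-end mapping from text to large-scale, continuous 3D terrain. LLMs have three main limitations on complex tasks: weak multi-hop reasoning, random output, and loss of memory. To counter them, the method introduces three mechanisms: cross-stage key information passing, error retry, and historical log learning. These mechanisms improve system reliability and consistency. A prototype system built on the Unity engine validates the method. Tests on four typical terrain themes show that the Full Step System (FSS) achieves an average error rate of only 12.0%. This is roughly 19 to 47 percentage points lower than comparison systems from which key mechanisms were removed. For efficiency, generating a single preview terrain chunk (a 1024×1024 vertex grid) takes about 42.14 s on average, with controllable resource consumption. Comparative experiments further show that the generated virtual terrain meets the expected requirements for generation efficiency, visual consistency, and scalability. The method substantially improves generation efficiency and ease of use. It offers a practical way for non-expert users to quickly build high-quality, large-scale 3D terrain through natural language.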
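The abstract names three reliability mechanisms (cross-stage key information passing, error retry, and historical log learning) but does not describe how they are implemented. The following is a minimal sketch of one plausible way to wire them around a staged LLM pipeline. It is an illustration under stated assumptions, not the authors' implementation. The `LLM` callable type, the JSON reply format, the `Stage`/`PipelineState` structures, the stage names and prompts, the retry limit, and the "last three errors" window are all hypothetical. A fake model (`make_fake_llm`) stands in for a real LLM so the code runs offline.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical LLM interface: prompt text in, raw reply text out.
LLM = Callable[[str], str]


@dataclass
class Stage:
    name: str                 # hypothetical label; the paper's 7 steps are not named here
    instruction: str          # task prompt for this step
    required_keys: List[str]  # fields the JSON reply must contain


@dataclass
class PipelineState:
    key_info: Dict[str, object] = field(default_factory=dict)  # facts passed across stages
    error_log: List[str] = field(default_factory=list)         # past failures, reused as hints


def build_prompt(stage: Stage, state: PipelineState, last_error: str = "") -> str:
    parts = [stage.instruction,
             "Key information from earlier stages: " + json.dumps(state.key_info, sort_keys=True)]
    if state.error_log:
        # Assumed form of "historical log learning": show recent mistakes to the model.
        parts.append("Avoid these past errors: " + " | ".join(state.error_log[-3:]))
    if last_error:
        parts.append("Your previous answer was invalid: " + last_error)
    parts.append("Reply with a JSON object containing: " + ", ".join(stage.required_keys))
    return "\n".join(parts)


def run_stage(stage: Stage, state: PipelineState, llm: LLM, max_retries: int = 3) -> Dict[str, object]:
    last_error = ""
    for _ in range(max_retries):
        reply = llm(build_prompt(stage, state, last_error))
        try:
            data = json.loads(reply)  # json.JSONDecodeError subclasses ValueError
            if not isinstance(data, dict):
                raise ValueError("reply is not a JSON object")
            missing = [k for k in stage.required_keys if k not in data]
            if missing:
                raise ValueError(f"missing keys {missing}")
        except ValueError as exc:
            # Error retry: record the failure and re-prompt with the error message.
            last_error = f"{stage.name}: {exc}"
            state.error_log.append(last_error)
            continue
        # Cross-stage key information passing: keep only the validated fields.
        state.key_info.update({k: data[k] for k in stage.required_keys})
        return data
    raise RuntimeError(f"{stage.name} failed after {max_retries} attempts")


def run_pipeline(stages: List[Stage], llm: LLM, max_retries: int = 3) -> PipelineState:
    state = PipelineState()
    for stage in stages:
        run_stage(stage, state, llm, max_retries)
    return state


def make_fake_llm() -> LLM:
    """Offline stand-in for a real model: malformed reply on the first call only."""
    calls = {"n": 0}

    def fake_llm(prompt: str) -> str:
        calls["n"] += 1
        if calls["n"] == 1:
            return "not json"
        # Echo back the keys requested on the prompt's last line.
        keys = prompt.rsplit("containing: ", 1)[1].split(", ")
        return json.dumps({k: f"value_of_{k}" for k in keys})

    return fake_llm


# Two hypothetical stages, standing in for the seven in the paper.
stages = [
    Stage("stage_1", "Extract the terrain theme from the user's text.", ["theme", "biome"]),
    Stage("stage_2", "Choose height-map parameters for that theme.", ["roughness"]),
]
state = run_pipeline(stages, make_fake_llm())
print(state.key_info)
print(state.error_log)
```

In this toy run, `stage_1` fails once. It is then retried with the error message in its prompt, and its validated fields appear in the `stage_2` prompt. The abstract does not state which model, reply format, or validation rules the paper actually uses. Swapping `make_fake_llm()` for a real client is therefore left as an assumption.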
