A Text-Driven Continuous Terrain Generation Method for Virtual Scenes Based on Large Language Models

王唯佳; 陆元军; 吉斌; 李建华; 郭卫斌

doi:10.14135/j.cnki.1006-3080.20251221001

Abstract

With the rapid development of 3D technologies such as game development, virtual reality (VR), and digital twins, demand is growing for the automated generation of high-quality, large-scale, continuous virtual terrain. Manual interactive workflows are laborious and require specialist expertise. Procedural generation improves efficiency but still depends on graphical interfaces and the tuning of specialized parameters, which makes it hard to capture a user's semantic intent flexibly. This paper proposes an automated, text-driven method for generating continuous virtual terrain based on a large language model (LLM), in which natural language drives the generation process directly. The method adopts a multi-stage task decomposition strategy that splits the terrain generation pipeline into seven steps and combines prompt engineering with procedural generation to build an end-to-end mapping from text to large-scale continuous 3D terrain. To address LLM limitations in complex tasks, namely weak multi-hop reasoning, output randomness, and memory loss, the method introduces cross-stage key information passing, error retry, and learning from historical logs, which improve system reliability and consistency. A prototype system built on the Unity engine validates the method. In tests on four typical terrain themes, the Full Step System (FSS) achieves an average error rate of 12.0%, approximately 19–47 percentage points lower than comparison systems with key mechanisms removed. Generating a single preview terrain tile (1024 × 1024 vertex mesh) takes about 42.14 s on average, with controllable resource consumption. Comparative experiments further indicate that the generated terrain meets the expected requirements for generation efficiency, visual consistency, and scalability. The method improves generation efficiency and usability, and offers non-expert users a feasible way to build high-quality, large-scale 3D terrain quickly from natural language.
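The three reliability mechanisms named in the abstract (cross-stage key information passing, error retry, and learning from historical logs) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not list the seven steps, so the two stage names, the `Pipeline` class, the `StageError` exception, and the stubbed stage functions below are all hypothetical stand-ins, and the LLM calls are replaced by deterministic stubs so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stage signature: (shared context, error history) -> new key facts.
StageFn = Callable[[dict, list[str]], dict]


class StageError(Exception):
    """Raised when a stage's output fails validation (hypothetical)."""


@dataclass
class Pipeline:
    stages: list[tuple[str, StageFn]]
    max_retries: int = 3
    history: list[str] = field(default_factory=list)  # failure log shown to later attempts

    def run(self, prompt: str) -> dict:
        context = {"prompt": prompt}  # key information carried across stages
        for name, stage in self.stages:
            for attempt in range(1, self.max_retries + 1):
                try:
                    context.update(stage(context, self.history))
                    break  # stage succeeded, move on to the next one
                except StageError as exc:
                    # Record the failure so the next attempt can take it into account.
                    self.history.append(f"{name} attempt {attempt}: {exc}")
            else:
                raise RuntimeError(f"stage {name!r} failed after {self.max_retries} attempts")
        return context


def parse_theme(context: dict, history: list[str]) -> dict:
    # Stub for an LLM call that extracts a terrain theme from the user's text.
    theme = "desert" if "desert" in context["prompt"].lower() else "plain"
    return {"theme": theme}


def choose_height_range(context: dict, history: list[str]) -> dict:
    # Stub for an unreliable LLM call: it fails until its own earlier failure
    # appears in the history log, then returns a value based on the theme.
    if not any(entry.startswith("height_range") for entry in history):
        raise StageError("max height missing from model output")
    return {"max_height_m": 120 if context["theme"] == "desert" else 40}


def run_demo() -> tuple[dict, list[str]]:
    pipeline = Pipeline(stages=[("theme", parse_theme), ("height_range", choose_height_range)])
    result = pipeline.run("A wide desert with rolling dunes")
    return result, pipeline.history
```

In this sketch the second stage reads the `theme` written by the first, which mirrors the idea of passing key information between stages rather than relying on the model to remember it. A real system would replace the stubs with prompted LLM calls and include the logged failures in the retry prompt.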
