
    Citation: ZHANG Chengbo, YU Huiqun, GUO Jianmei, YANG Dingyu, FAN Guisheng. DFMapper: An Automatic Query-Tree Based SQL-to-HiveQL Translator[J]. Journal of East China University of Science and Technology, 2019, 45(1): 148-155. DOI: 10.14135/j.cnki.1006-3080.20171126002

    DFMapper: An Automatic Query-Tree Based SQL-to-HiveQL Translator

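    • Abstract: As a data warehouse built on Hadoop, Hive has become the first choice of many enterprises for processing big data. However, many legacy applications in traditional enterprises depend on traditional relational database management systems (RDBMS), so migrating them requires translating a large number of query statements. This paper proposes a query-tree based approach for automatically translating SQL into HiveQL. The approach uses an SQL parser to parse SQL statements into query trees, provides eight rewriting strategies to restructure the query trees, and then converts the trees into correct HiveQL statements; the approach is implemented in a translation tool, DFMapper. Query experiments on the TPC-DS benchmark show that DFMapper correctly translates the vast majority of queries and is highly extensible.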
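The parse, rewrite, and emit pipeline described above can be sketched as follows. The abstract does not give DFMapper's tree structures or strategy interface, so everything here is a hypothetical illustration: the node classes `Select` and `SetOp`, the functions `rewrite_union_distinct`, `apply_bottom_up`, and `translate`, and the choice of a bottom-up traversal are assumptions, and the example strategy is not necessarily one of DFMapper's eight. It targets a real Hive limitation: releases before Hive 1.2.0 supported only `UNION ALL`, so a distinct `UNION` has to be rewritten as `SELECT DISTINCT` over a `UNION ALL` subquery.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical, heavily simplified query-tree nodes; DFMapper's real tree
# (produced by its SQL parser) is not described in the abstract.
@dataclass
class Select:
    columns: List[str]
    source: Union[str, "Select", "SetOp"]  # table name or nested subquery
    distinct: bool = False


@dataclass
class SetOp:
    op: str        # e.g. "UNION", "INTERSECT"
    is_all: bool   # True for the bag variant, e.g. UNION ALL
    left: "Node"
    right: "Node"


Node = Union[Select, SetOp]
Strategy = Callable[[Node], Node]  # a rewrite strategy maps a tree to a tree


def to_sql(node: Node) -> str:
    """Emit query text from a (rewritten) query tree."""
    if isinstance(node, Select):
        if isinstance(node.source, str):
            src = node.source
        else:
            # Hive requires an alias for a subquery in FROM.
            src = f"({to_sql(node.source)}) t"
        keyword = "SELECT DISTINCT" if node.distinct else "SELECT"
        return f"{keyword} {', '.join(node.columns)} FROM {src}"
    suffix = " ALL" if node.is_all else ""
    return f"{to_sql(node.left)} {node.op}{suffix} {to_sql(node.right)}"


def rewrite_union_distinct(node: Node) -> Node:
    """Illustrative strategy: A UNION B -> SELECT DISTINCT * FROM (A UNION ALL B) t."""
    if isinstance(node, SetOp) and node.op == "UNION" and not node.is_all:
        return Select(["*"], SetOp("UNION", True, node.left, node.right), distinct=True)
    return node


def apply_bottom_up(node: Node, strategy: Strategy) -> Node:
    """Rewrite the children first, then offer the rebuilt node to the strategy."""
    if isinstance(node, SetOp):
        node = SetOp(node.op, node.is_all,
                     apply_bottom_up(node.left, strategy),
                     apply_bottom_up(node.right, strategy))
    elif not isinstance(node.source, str):
        node = Select(node.columns, apply_bottom_up(node.source, strategy), node.distinct)
    return strategy(node)


def translate(tree: Node, strategies: List[Strategy]) -> str:
    """Apply each strategy over the whole tree in order, then emit the text."""
    for strategy in strategies:
        tree = apply_bottom_up(tree, strategy)
    return to_sql(tree)
```

For example, `translate(SetOp("UNION", False, Select(["id"], "a"), Select(["id"], "b")), [rewrite_union_distinct])` returns `SELECT DISTINCT * FROM (SELECT id FROM a UNION ALL SELECT id FROM b) t`, while a tree that already uses `UNION ALL` is emitted unchanged. Keeping each strategy a pure tree-to-tree function is what lets a pipeline enable or disable strategies independently.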

       

      Abstract: As the volume of data to be stored and processed grows, traditional RDBMSs encounter performance bottlenecks. Hive, a data warehouse built on Hadoop that provides data analysis and summarization, has become a suitable alternative to the traditional RDBMS and, owing to its massive scale-out and fault-tolerance capabilities, the first choice of many enterprises for handling big data. In traditional enterprises in particular, a wide variety of legacy applications depend on traditional RDBMSs. When these applications must be migrated to Hive, a large number of queries need to be translated, which costs a great deal of labor and time if done manually. This paper proposes a query-tree based approach for automatically translating SQL queries written for an RDBMS into correct HiveQL. An SQL parser parses each SQL statement into a query tree, which is annotated with the correspondence between tables and columns during preprocessing. Taking into account set operations, correlated subqueries, and other constructs that HiveQL supports poorly, the paper proposes eight rewriting strategies that restructure the query trees so that they can in turn be transformed into HiveQL statements. The resulting translation tool, DFMapper, provides a strategy loader that dynamically adjusts which strategies are applied according to actual requirements (e.g., the Hive version or the SQL dialect) through changes to an externalized configuration. In addition, a validator verifies the correctness of each translation by comparing the result sets of the query executed in the RDBMS and in Hive. Experiments on the TPC-DS benchmark, which consists of 99 queries covering a wide variety of ANSI SQL syntax, demonstrate that DFMapper correctly translates the vast majority of general queries and is highly extensible.
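The abstract names two supporting components, a configuration-driven strategy loader and a result-set validator, without giving their interfaces. The following is a minimal sketch under stated assumptions: the registry `REGISTRY`, the `register` decorator, the JSON key `strategies`, the strategy names, and the six-digit float tolerance are all hypothetical, and the strategy bodies are placeholders. Result sets are compared as multisets because SQL does not define row order without `ORDER BY`.

```python
import json
from collections import Counter
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical registry mapping a strategy name used in the external
# configuration to the function that rewrites a query tree.
Strategy = Callable[[Any], Any]
REGISTRY: Dict[str, Strategy] = {}


def register(name: str) -> Callable[[Strategy], Strategy]:
    """Decorator that records a rewrite strategy under a config-visible name."""
    def deco(fn: Strategy) -> Strategy:
        REGISTRY[name] = fn
        return fn
    return deco


@register("union_distinct")
def union_distinct(tree: Any) -> Any:
    return tree  # placeholder: a real strategy would rewrite the tree


@register("correlated_subquery")
def correlated_subquery(tree: Any) -> Any:
    return tree  # placeholder


def load_strategies(config_text: str) -> List[Strategy]:
    """Return the strategies listed in a JSON config, in the listed order.

    Editing the config (e.g. per Hive version or SQL dialect) changes the
    pipeline without touching translator code. Unknown names raise KeyError.
    """
    config = json.loads(config_text)
    return [REGISTRY[name] for name in config["strategies"]]


Row = Tuple[Any, ...]


def _normalize(row: Row, ndigits: int = 6) -> Row:
    # Round floats so tiny numeric differences between engines are ignored.
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)


def same_results(rdbms_rows: Iterable[Row], hive_rows: Iterable[Row],
                 ordered: bool = False) -> bool:
    """Compare two result sets: row by row if ordered, else as multisets."""
    a = [_normalize(r) for r in rdbms_rows]
    b = [_normalize(r) for r in hive_rows]
    return a == b if ordered else Counter(a) == Counter(b)
```

Comparing multisets rather than sets matters: a translation that silently drops duplicate rows (for example, by turning `UNION ALL` into `UNION`) would pass a set comparison but fails this one.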

       
