    Citation: QIAN Neng-wu, GUO Wei-bin, FAN Gui-sheng. Approach of Distributed Small File Storage Based on Association Rule Mining[J]. Journal of East China University of Science and Technology (Natural Science Edition), 2016, (5): 708-714. DOI: 10.14135/j.cnki.1006-3080.2016.05.019

    Approach of Distributed Small File Storage Based on Association Rule Mining

    Abstract: The Hadoop Distributed File System (HDFS) was originally designed for processing large files and cannot store small files efficiently. This paper therefore proposes ARMFS, an efficient small-file storage method based on association rule mining. ARMFS mines association rules from the Hadoop audit logs to obtain the correlations among small files, and a file-merging algorithm then merges and compresses the small files for storage in HDFS. When a file is requested from HDFS, a prefetching algorithm uses the high-frequency access table and the prefetching table derived from the mined association rules to further improve access efficiency. Experimental results show that ARMFS significantly improves memory utilization on the NameNode and effectively improves the download speed and access efficiency of small files.
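The abstract does not give the mining or prefetching algorithms themselves, so the following is only a minimal sketch of the general idea, not the authors' ARMFS implementation. It assumes the audit log has already been grouped into access sessions (lists of file names), mines pairwise rules using the standard support and confidence measures, and builds a lookup table of files to prefetch. The function names, thresholds, and sample file names are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return {(x, y): confidence} for pairwise rules x -> y.

    `sessions` is a list of file-name lists; how audit-log entries are
    grouped into sessions is an assumption, not taken from the paper.
    """
    n = len(sessions)
    item_count = Counter()
    pair_count = Counter()
    for session in sessions:
        files = sorted(set(session))  # ignore repeated accesses within a session
        item_count.update(files)
        pair_count.update(combinations(files, 2))

    rules = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:  # support: fraction of sessions with both files
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # estimate of P(y accessed | x accessed)
            if confidence >= min_confidence:
                rules[(x, y)] = confidence
    return rules


def build_prefetch_table(rules):
    """Map each file to its associated files, strongest rule first."""
    table = {}
    ordered = sorted(rules.items(), key=lambda kv: (-kv[1], kv[0]))
    for (x, y), _confidence in ordered:
        table.setdefault(x, []).append(y)
    return table


# Hypothetical sessions standing in for parsed audit-log entries.
sessions = [
    ["a.log", "b.log", "c.log"],
    ["a.log", "b.log"],
    ["a.log", "b.log", "d.log"],
    ["c.log", "d.log"],
]
rules = mine_pair_rules(sessions)
prefetch = build_prefetch_table(rules)
print(prefetch)
```

In this sketch, files linked by a strong rule are candidates both for being merged into the same stored file and for being fetched together when one of them is requested.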

       
