Abstract:
The Hadoop Distributed File System (HDFS) was originally designed for processing large files and is inefficient at storing small files. This paper proposes an efficient method for distributed small-file storage based on association rule mining, named ARMFS. Associations among small files are obtained by analyzing the audit logs, and the associated files are then merged and compressed into HDFS by a file merging algorithm. When a file is requested from HDFS, a prefetching algorithm further improves access efficiency by consulting a high-frequency access table and a prefetching table built from the association rules. The experimental results show that ARMFS significantly improves memory efficiency on the NameNode and the access efficiency of small files on HDFS.
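
The idea of deriving a prefetching table from access associations can be illustrated with a minimal sketch. It is not the ARMFS algorithm itself. It assumes the audit log has already been parsed into per-session lists of accessed file names, mines only pairwise rules, and uses hypothetical function names (`mine_pair_rules`, `build_prefetch_table`) and arbitrary thresholds.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(sessions, min_support=0.5, min_confidence=0.6):
    """Return {(x, y): confidence} for pairwise rules x -> y.

    Assumption: each session is a list of file names accessed together,
    already extracted from the audit log. Thresholds are illustrative.
    """
    n = len(sessions)
    item_count = Counter()
    pair_count = Counter()
    for session in sessions:
        files = sorted(set(session))  # ignore repeated accesses in a session
        item_count.update(files)
        pair_count.update(combinations(files, 2))

    rules = {}
    for (a, b), count in pair_count.items():
        if count / n < min_support:  # support of the pair {a, b}
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # estimate of P(y | x)
            if confidence >= min_confidence:
                rules[(x, y)] = confidence
    return rules


def build_prefetch_table(rules):
    """Map each file to the files to prefetch, highest confidence first."""
    table = {}
    for (x, y), confidence in rules.items():
        table.setdefault(x, []).append((y, confidence))
    return {
        x: [y for y, _ in sorted(ys, key=lambda t: (-t[1], t[0]))]
        for x, ys in table.items()
    }


# Hypothetical sessions standing in for parsed audit-log entries.
sessions = [
    ["a.txt", "b.txt", "c.txt"],
    ["a.txt", "b.txt"],
    ["a.txt", "b.txt", "d.txt"],
    ["c.txt", "d.txt"],
]
rules = mine_pair_rules(sessions)
prefetch = build_prefetch_table(rules)
print(prefetch)
```

In this toy input only `a.txt` and `b.txt` co-occur often enough to pass the support threshold, so each becomes a prefetch candidate for the other. Under the same assumptions, files linked by such rules would also be natural candidates for merging into one stored file.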