Malware Detection Method Based on LSTM-SVM Model
Keywords:
- Android malware
- static detection
- long short-term memory (LSTM)
- support vector machine (SVM)
Abstract: To improve the detection accuracy of Android malware, this paper proposes a static detection method based on an LSTM-SVM (long short-term memory and support vector machine) model. First, the APK (Android Package) file of an Android application is decompiled, and three types of information (permissions, components, and intents) are extracted from the AndroidManifest.xml file to form the XML features. Next, the API features are built by analyzing the application's API (Application Programming Interface) calls. Taking into account the temporal nature of malware execution and the dimensionality of the features, an LSTM anomaly detection model is built on the XML features and an SVM anomaly detection model is built on the API features. The two models run in parallel, and a probability-difference fusion algorithm combines their outputs into the final detection result. Experiments on the CICAndMal2017 dataset show that the proposed method reaches a detection accuracy above 98%.
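The XML feature extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the decompilation step has already produced a plain-text AndroidManifest.xml; the function name `extract_xml_features`, the choice of component tags, the use of `<action>` names as the intent features, and the sample manifest are all assumptions made for illustration, not details taken from the paper.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that decoded manifests use for android:* attributes.
ANDROID_NS = "{http://schemas.android.com/apk/res/android}"
# Assumption: these four tags are what the paper counts as "components".
COMPONENT_TAGS = ("activity", "service", "receiver", "provider")


def extract_xml_features(manifest_text):
    """Collect permission, component and intent names from manifest XML."""
    root = ET.fromstring(manifest_text)
    name = ANDROID_NS + "name"

    def names(tag):
        # Keep only elements that actually carry an android:name attribute.
        return [e.get(name) for e in root.iter(tag) if e.get(name)]

    return {
        "permission": names("uses-permission"),
        "component": [n for tag in COMPONENT_TAGS for n in names(tag)],
        # Assumption: intent features are the <action> names in intent filters.
        "intent": names("action"),
    }


# Hypothetical manifest used only to exercise the function.
SAMPLE = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.demo">
  <uses-permission android:name="android.permission.ACCESS_WIFI_STATE"/>
  <application>
    <activity android:name=".MenuAboutActivity">
      <intent-filter>
        <action android:name="android.intent.action.MAIN"/>
      </intent-filter>
    </activity>
  </application>
</manifest>"""

features = extract_xml_features(SAMPLE)
```

The three lists would still have to be encoded (for example as indices into a vocabulary) before they could be fed to an LSTM; that encoding is not described on this page and is left out here.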
Table 1. Examples of XML features

| Type | Feature |
| --- | --- |
| Permission | Android.permission.WRITE_SMS, Android.permission.ACCESS_FINE_LOCATION, Android.permission.ACCESS_WIFI_STATE, ... |
| Component | MenuAboutActivity, TrashClearActivity, BoostMainActivity, AppManagerActivity, ... |
| Intent | ACTION_GOSTATICSDK, REGISTRATION, ValentinesMessages, inigoandroid, foursquared, ... |

Table 2. Data set partition

| Class | Training set | Validation set | Testing set |
| --- | --- | --- | --- |
| Benign | 1020 | 340 | 340 |
| Malicious | 255 | 85 | 86 |

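Table 3. Category and quantity of features

| Feature category | Subtype | Quantity |
| --- | --- | --- |
| API feature | - | 2253 |
| XML feature | Permission | 414 |
| XML feature | Component | 20653 |
| XML feature | Intent | 2826 |
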
Table 4. Basic evaluation metrics

| Real classification | Predicted: Malicious | Predicted: Benign | Total |
| --- | --- | --- | --- |
| Malicious | TN | FP | N |
| Benign | FN | TP | P |

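The layout of Table 4 treats benign samples as the positive class (row total P) and malicious samples as the negative class (row total N). A short sketch of how ACC, TPR and FPR follow from those four counts is given here, using the standard definitions; the formulas themselves are not printed on this page. The example counts are not from the source: they are the integer counts that are consistent with the test-set sizes in Table 2 (340 benign, 86 malicious) and with the LSTM-SVM row of Table 9.

```python
def evaluation_metrics(tp, fn, fp, tn):
    """Return (ACC, TPR, FPR) in percent, following the layout of Table 4."""
    p = tp + fn  # all benign samples (row total P)
    n = tn + fp  # all malicious samples (row total N)
    acc = 100.0 * (tp + tn) / (p + n)  # share of all samples classified correctly
    tpr = 100.0 * tp / p               # benign samples recognised as benign
    fpr = 100.0 * fp / n               # malicious samples mistaken for benign
    return acc, tpr, fpr


# Inferred (not reported) counts for the 340 benign / 86 malicious test set.
acc, tpr, fpr = evaluation_metrics(tp=338, fn=2, fp=6, tn=80)
print(f"ACC={acc:.2f} TPR={tpr:.2f} FPR={fpr:.2f}")
# prints: ACC=98.12 TPR=99.41 FPR=6.98
```

Under this convention the FPR is the fraction of malicious samples that slip through as benign, so a lower FPR means fewer missed malware samples.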
Table 5. Comparison of detection results for different API feature sublist lengths

| n | ACC/% | TPR/% | FPR/% |
| --- | --- | --- | --- |
| 700 | 94.13 | 97.94 | 20.93 |
| 1000 | 94.60 | 97.65 | 17.44 |
| 1500 | 94.13 | 96.47 | 15.12 |
| 2000 | 96.01 | 97.65 | 10.47 |
| 2500 | 95.77 | 97.65 | 11.63 |
| 3000 | 96.48 | 98.53 | 11.63 |
| 3500 | 96.71 | 98.82 | 11.63 |

Table 6. Comparison of detection results for different feature fusion methods

| Method | ACC/% | TPR/% | FPR/% |
| --- | --- | --- | --- |
| XML | 96.48 | 97.35 | 6.98 |
| XML+API | 97.42 | 99.71 | 11.63 |
| XML-API | 97.89 | 99.12 | 6.98 |

Table 7. Comparison of detection results for different models based on XML features

| Model | ACC/% | TPR/% | FPR/% |
| --- | --- | --- | --- |
| SVM | 95.77 | 97.06 | 9.30 |
| RF | 96.71 | 97.94 | 8.14 |
| LSTM | 96.48 | 97.35 | 6.98 |
| MLP | 96.01 | 96.76 | 6.98 |
| CNN | 96.48 | 97.65 | 8.14 |

Table 8. Comparison of detection results for different models based on API features

| Model | ACC/% | TPR/% | FPR/% |
| --- | --- | --- | --- |
| SVM (Linear) | 96.01 | 97.65 | 10.47 |
| SVM (RBF) | 92.25 | 96.18 | 23.26 |
| SVM (Poly) | 93.19 | 93.24 | 6.98 |
| RF | 94.84 | 97.65 | 16.28 |
| LSTM | 95.54 | 98.24 | 15.12 |
| MLP | 94.37 | 97.65 | 18.60 |
| CNN | 95.07 | 97.94 | 16.28 |

Table 9. Comparison of experimental results for parallel models

| Model | ACC/% | TPR/% | FPR/% |
| --- | --- | --- | --- |
| Baseline | 96.48 | 97.35 | 6.98 |
| LSTM-SVM | 98.12 | 99.41 | 6.98 |
| LSTM-RF | 97.42 | 98.53 | 6.98 |
| LSTM-LSTM | 97.89 | 99.12 | 6.98 |
| LSTM-MLP | 97.42 | 98.82 | 8.14 |
| LSTM-CNN | 97.65 | 99.41 | 9.30 |

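The parallel models in Table 9 combine an LSTM branch with a second classifier through what the abstract calls a probability-difference fusion algorithm, but this page does not state the rule itself. The sketch below shows one plausible reading, offered purely as an assumption: each branch reports a probability that the sample is malicious, the gap between its two class probabilities is taken as its confidence, and the more confident branch decides. The function name, the 0.5 threshold and the tie-breaking in favour of the LSTM branch are all hypothetical.

```python
def fuse_by_probability_difference(p_lstm, p_svm):
    """Fuse two malicious-class probabilities (hypothetical rule, not the paper's).

    For a two-class model with malicious probability p, the gap between
    the class probabilities is |p - (1 - p)| = |2p - 1|. The branch with
    the larger gap is treated as more confident and makes the decision.
    """
    gap_lstm = abs(2.0 * p_lstm - 1.0)
    gap_svm = abs(2.0 * p_svm - 1.0)
    # Assumption: ties go to the LSTM branch.
    chosen = p_lstm if gap_lstm >= gap_svm else p_svm
    return "malicious" if chosen >= 0.5 else "benign"


# The LSTM is barely above 0.5 while the SVM is confident, so the SVM decides.
label = fuse_by_probability_difference(p_lstm=0.55, p_svm=0.10)
```

This reading requires the SVM branch to output a probability rather than a raw decision value, which in practice means some form of calibration; whether the paper does this is not stated here.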