Abstract:
Detecting domain names produced by domain generation algorithms (DGAs) is a key technique for detecting malicious command-and-control (C&C) communication. Many existing detection methods, e.g., machine learning methods based on statistical features and deep learning methods based on recurrent neural networks, rely on the randomness of the domain name, and therefore suffer from a high false positive rate and a low detection rate on domain names with low randomness. A main reason is that these methods cannot effectively extract the highly random portions of a low-randomness domain name. As a result, normal domain names are often falsely reported as DGA domain names, which adds unnecessary load to the security system and reduces its reliability. To address this shortcoming, this paper proposes a multi-character randomness extraction method for domain names. A gated recurrent unit (GRU) is used to encode multi-character combinations and extract the randomness of the domain name. In addition, an attention mechanism is introduced to weight the randomness of individual characters in the domain name and to strengthen its highly random features. On this basis, a DGA domain name detection algorithm built on an attention-based recurrent neural network, ATT-GRU, is proposed to improve detection effectiveness on low-randomness DGA domain names. Experimental results show that the ATT-GRU algorithm achieves higher accuracy and a lower false positive rate than traditional algorithms in detecting DGA domain names.
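The encoding idea summarized above can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes that "multi-character combinations" are overlapping character n-grams of the leftmost label, that each n-gram is mapped to a fixed pseudo-random vector, and that attention is a softmax over dot products with a query vector. The GRU omits bias terms, all weights are untrained, and every function name (`char_ngrams`, `embed`, `gru_step`, `attention_pool`, `encode`) is hypothetical.

```python
import math
import random


def char_ngrams(domain, n=2):
    """Overlapping n-character windows of the leftmost label (a simplification)."""
    label = domain.lower().split(".")[0]
    return [label[i:i + n] for i in range(len(label) - n + 1)]


def embed(gram, dim):
    """Deterministic pseudo-random vector per n-gram (stand-in for a learned embedding)."""
    rng = random.Random(gram)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def gru_step(x, h, p):
    """One GRU update without biases: gates z and r, candidate c, new state."""
    z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
    r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
    rh = [ri * hi for ri, hi in zip(r, h)]
    c = [math.tanh(a + b) for a, b in zip(matvec(p["Wh"], x), matvec(p["Uh"], rh))]
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, c)]


def attention_pool(states, query):
    """Softmax over dot(h_t, query), then the weighted sum of the hidden states."""
    scores = [sum(a * b for a, b in zip(h, query)) for h in states]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(len(query))]
    return weights, context


def init_params(in_dim, hid_dim, seed=0):
    """Random (untrained) weights; a real model would learn these from labeled data."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    params = {name: mat(hid_dim, in_dim) for name in ("Wz", "Wr", "Wh")}
    params.update({name: mat(hid_dim, hid_dim) for name in ("Uz", "Ur", "Uh")})
    params["query"] = [rng.uniform(-0.5, 0.5) for _ in range(hid_dim)]
    return params


def encode(domain, params, n=2):
    """Run the GRU over the n-grams and pool the hidden states with attention."""
    grams = char_ngrams(domain, n)
    if not grams:
        raise ValueError("label is shorter than n")
    in_dim = len(params["Wz"][0])
    h = [0.0] * len(params["query"])
    states = []
    for gram in grams:
        h = gru_step(embed(gram, in_dim), h, params)
        states.append(h)
    weights, context = attention_pool(states, params["query"])
    return grams, weights, context


if __name__ == "__main__":
    grams, weights, context = encode("google.com", init_params(4, 3))
    for gram, weight in zip(grams, weights):
        print(gram, round(weight, 3))
```

In a trained detector along these lines, the pooled context vector would presumably feed a binary classifier, and the attention weights would indicate which character windows contributed most to the decision. With the random weights used here the printed values carry no meaning beyond showing the data flow.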