Evaluating deep learning approaches to characterize and classify the DGAs at scale

Vinayakumar, R.; Soman, K.P.; Poornachandran, Prabaharan; Sachin Kumar, S.

doi:10.3233/JIFS-169423

Issue title: Special Section: Soft Computing and Intelligent Systems: Techniques and Applications

Guest editors: Sabu M. Thampi, El-Sayed M. El-Alfy, Sushmita Mitra and Ljiljana Trajkovic

Article type: Research Article

Authors: Vinayakumar, R.^{a; *} | Soman, K.P.^a | Poornachandran, Prabaharan^b | Sachin Kumar, S.^a

Affiliations: [a] Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Amrita University, India | [b] Center for Cyber Security Systems and Networks, Amrita School of Engineering, Amritapuri, Amrita Vishwa Vidyapeetham, Amrita University, India

Correspondence: [*] Corresponding author. R. Vinayakumar, Centre for Computational Engineering and Networking (CEN), Amrita School of Engineering, Coimbatore, Amrita Vishwa Vidyapeetham, Amrita University, India. E-mail: [email protected].

Abstract: In recent years, domain generation algorithms (DGAs) have become the foundational mechanism for many malware families, mainly because a DGA can generate an immense number of pseudo-random domain names with which to reach a command and control (C2) infrastructure. This paper focuses on detecting and classifying pseudo-random domain names with deep learning approaches, without relying on feature engineering or on any other linguistic, contextual, semantic or statistical information. Deep learning is a more complex form of traditional machine learning that has received renewed interest by solving long-standing tasks in artificial intelligence (AI), in fields such as natural language processing, image recognition, speech processing and many others. Deep learning models have an immense capability to extract optimal feature representations directly from raw input text. To leverage this capability and transfer the performance gains from those areas to characterizing, detecting and classifying DGA-generated domain names by malware family, this paper trains deep learning models at the character and bigram level on one million known benign domain names from Alexa and OpenDNS and on a corpus of malicious domain names generated in real time from 17 DGA malware families; the trained models are evaluated in real time on the OSINT data set. Specifically, to understand the effectiveness of various deep learning mechanisms, we used recurrent neural network (RNN), identity-recurrent neural network (I-RNN), long short-term memory (LSTM), convolutional neural network (CNN), and convolutional neural network-long short-term memory (CNN-LSTM) architectures. Additionally, to find an optimal architecture, experiments are carried out with various configurations of network parameters and network structures. All experiments are run for up to 1000 epochs with a learning rate in the range [0.01-0.5].
Overall, deep learning approaches, particularly the recurrent neural network family and a hybrid network (in which the first layer is a CNN and the subsequent layer is an LSTM), showed significant performance, with highest detection rates of 0.9945 and 0.9879 respectively. The main reason is that deep learning approaches have inherent mechanisms for hierarchical feature extraction and for capturing long-range dependencies in sequence inputs.
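
Illustrative note: the abstract states that training is done at the character and bigram level on raw domain names, but it does not describe the preprocessing itself. The following is a minimal sketch of how such an encoding might look, not the authors' implementation. The function names (char_tokens, bigram_tokens, build_vocab, encode), the reserved indices (0 for padding, 1 for unknown tokens) and the fixed-length padding scheme are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List

PAD, UNK = 0, 1  # assumed reserved indices: padding and unseen tokens


def char_tokens(domain: str) -> List[str]:
    """Split a domain name into single characters."""
    return list(domain.lower())


def bigram_tokens(domain: str) -> List[str]:
    """Split a domain name into overlapping character bigrams."""
    d = domain.lower()
    return [d[i:i + 2] for i in range(len(d) - 1)]


def build_vocab(token_lists: Iterable[List[str]]) -> Dict[str, int]:
    """Assign an integer index (starting at 2) to each distinct token."""
    vocab: Dict[str, int] = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 2
    return vocab


def encode(tokens: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to indices, then truncate or right-pad to max_len."""
    ids = [vocab.get(tok, UNK) for tok in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    # Hypothetical sample names, not taken from the paper's data.
    domains = ["example.com", "xkqzvhw.net"]
    char_vocab = build_vocab(char_tokens(d) for d in domains)
    bigram_vocab = build_vocab(bigram_tokens(d) for d in domains)
    print(encode(char_tokens(domains[0]), char_vocab, 16))
    print(encode(bigram_tokens(domains[0]), bigram_vocab, 16))
```

Integer sequences of this kind would typically be fed to an embedding layer ahead of the recurrent or convolutional layers. No hand-crafted features are computed here, in line with the abstract's stated aim of working from raw input text.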

Keywords: Domain generation algorithms (DGAs), deep learning mechanisms, recurrent neural network (RNN), identity-recurrent neural network (IRNN), long short-term memory (LSTM), convolution neural network (CNN), convolutional neural network-long short-term memory (CNN-LSTM)

Journal: Journal of Intelligent & Fuzzy Systems, vol. 34, no. 3, pp. 1265-1276, 2018

Published: 22 March 2018
